> On Mar 21, 2016, at 9:50 PM, Satish Balay <[email protected]> wrote:
>
> BTW: perils of using 'gitcommit=origin/master'
> http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2016/03/21/master.html
>
> Perhaps we should switch superlu_dist to use a working snapshot?
> self.gitcommit = '35c3b21630d93b3f8392a68e607467c247b5e053'
>
> balay@asterix /home/balay/petsc (master=)
> $ git grep origin/master config
> config/BuildSystem/config/packages/Chombo.py:       self.gitcommit = 'origin/master'
> config/BuildSystem/config/packages/SuperLU_DIST.py: self.gitcommit = 'origin/master'
> config/BuildSystem/config/packages/amanzi.py:       self.gitcommit = 'origin/master'
> config/BuildSystem/config/packages/saws.py:         self.gitcommit = 'origin/master'
Satish and Hong,

  Sherry has changed SuperLU_dist so that it no longer has name conflicts with SuperLU; this means we need to update the SuperLU_dist interface to fix these problems. Once things have settled down we can use a release commit instead of master.

  Barry

>
> Satish
>
> On Mon, 21 Mar 2016, Satish Balay wrote:
>
>> Hm - get a gtx 950 [2GB] and replace? [or gtx 970 4GB?]
>>
>> http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-950/specifications
>> http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-970/specifications
>>
>> There is a different machine with 2 M2090 cards - so I'll switch the
>> builds to that [es.mcs.anl.gov]. I was previously avoiding builds on
>> it - as it's used as a general-use machine [and perhaps occasionally
>> for benchmark runs]
>>
>> Satish
>>
>> On Mon, 21 Mar 2016, Dominic Meiser wrote:
>>
>>> Hi Jiri,
>>>
>>> Thanks very much for the fast response. That's very useful
>>> information. I had no idea the memory footprint of the contexts
>>> was this large.
>>>
>>> Satish, Barry, is there any chance we can upgrade the GPU in the
>>> test machine to at least the Fermi generation? That way I can help
>>> much more easily because I'd be able to reproduce your setup
>>> locally.
>>>
>>> Cheers,
>>> Dominic
>>>
>>> On Mon, Mar 21, 2016 at 08:01:10PM +0000, Jiri Kraus wrote:
>>>> Hi Dominic,
>>>>
>>>> I think the error messages you get are pretty descriptive regarding the
>>>> root cause. You are probably running out of GPU memory. Since you are
>>>> running on a GTX 285 you can't use MPS [1], so each MPI process has
>>>> its own context on the GPU. Each context needs to initialize some data on
>>>> the GPU (used for local variables and so on). The amount required for
>>>> this depends on the size of the GPU (it essentially correlates with the
>>>> maximum number of concurrently active threads). This can easily be
>>>> 50-100MB. So with only 1GB of GPU memory you are probably using all of the
>>>> GPU's memory for context data and nothing is available for your application.
>>>> Unfortunately there is no good way to debug this with GeForce. On Tesla,
>>>> nvidia-smi does show you all processes that have a context on a GPU
>>>> together with their memory consumption.
>>>>
>>>> Hope this helps
>>>>
>>>> Jiri
>>>>
>>>>
>>>> [1] https://docs.nvidia.com/deploy/mps/index.html
>>>>
>>>>> -----Original Message-----
>>>>> From: Dominic Meiser [mailto:[email protected]]
>>>>> Sent: Monday, March 21, 2016 19:17
>>>>> To: Jiri Kraus <[email protected]>
>>>>> Cc: Karl Rupp <[email protected]>; Barry Smith <[email protected]>;
>>>>> [email protected]
>>>>> Subject: mpi/cuda issue
>>>>>
>>>>> Hi Jiri,
>>>>>
>>>>> Hope things are going well. We are trying to understand an
>>>>> mpi+cuda issue in the tests of the PETSc library and I was
>>>>> wondering if you could help us out.
>>>>>
>>>>> The behavior we're seeing is that some of the tests fail intermittently
>>>>> with "out of memory" errors, e.g.
>>>>>
>>>>> terminate called after throwing an instance of
>>>>> 'thrust::system::detail::bad_alloc'
>>>>>   what():  std::bad_alloc: out of memory
>>>>>
>>>>> Other tests hang when we oversubscribe the GPU with a largish number of
>>>>> MPI processes (32 in one case). Satish obtained info on the GPU
>>>>> configuration using nvidia-smi below.
>>>>>
>>>>> Could you remind us what the requirements for MPI+cuda are, especially
>>>>> regarding oversubscription?
>>>>>
>>>>> Are there any other tools we can use to debug this problem? Any
>>>>> suggestions on what we should look at next?
>>>>>
>>>>> Thanks very much in advance.
>>>>> Cheers,
>>>>> Dominic
>>>>>
>>>>> On Mon, Mar 21, 2016 at 01:09:14PM -0500, Satish Balay wrote:
>>>>>> balay@frog ~ $ nvidia-smi
>>>>>> Mon Mar 21 13:07:36 2016
>>>>>> +------------------------------------------------------+
>>>>>> | NVIDIA-SMI 340.93     Driver Version: 340.93         |
>>>>>> |-------------------------------+----------------------+----------------------+
>>>>>> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
>>>>>> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
>>>>>> |===============================+======================+======================|
>>>>>> |   0  GeForce GTX 285     Off  | 0000:03:00.0     N/A |                  N/A |
>>>>>> | 40%   66C    P0    N/A /  N/A |      3MiB /  1023MiB |     N/A      Default |
>>>>>> +-------------------------------+----------------------+----------------------+
>>>>>>
>>>>>> +-----------------------------------------------------------------------------+
>>>>>> | Compute processes:                                               GPU Memory |
>>>>>> |  GPU       PID  Process name                                     Usage      |
>>>>>> |=============================================================================|
>>>>>> |    0            Not Supported                                               |
>>>>>> +-----------------------------------------------------------------------------+
>>>>>>
>>>>>>
>>>>>> balay@frog ~/soft/NVIDIA_CUDA-5.5_Samples/bin/x86_64/linux/release $ ./deviceQuery
>>>>>> ./deviceQuery Starting...
>>>>>>
>>>>>>  CUDA Device Query (Runtime API) version (CUDART static linking)
>>>>>>
>>>>>> Detected 1 CUDA Capable device(s)
>>>>>>
>>>>>> Device 0: "GeForce GTX 285"
>>>>>>   CUDA Driver Version / Runtime Version          6.5 / 5.5
>>>>>>   CUDA Capability Major/Minor version number:    1.3
>>>>>>   Total amount of global memory:                 1024 MBytes (1073414144 bytes)
>>>>>>   (30) Multiprocessors, (  8) CUDA Cores/MP:     240 CUDA Cores
>>>>>>   GPU Clock rate:                                1476 MHz (1.48 GHz)
>>>>>>   Memory Clock rate:                             1242 Mhz
>>>>>>   Memory Bus Width:                              512-bit
>>>>>>   Maximum Texture Dimension Size (x,y,z)         1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)
>>>>>>   Maximum Layered 1D Texture Size, (num) layers  1D=(8192), 512 layers
>>>>>>   Maximum Layered 2D Texture Size, (num) layers  2D=(8192, 8192), 512 layers
>>>>>>   Total amount of constant memory:               65536 bytes
>>>>>>   Total amount of shared memory per block:       16384 bytes
>>>>>>   Total number of registers available per block: 16384
>>>>>>   Warp size:                                     32
>>>>>>   Maximum number of threads per multiprocessor:  1024
>>>>>>   Maximum number of threads per block:           512
>>>>>>   Max dimension size of a thread block (x,y,z):  (512, 512, 64)
>>>>>>   Max dimension size of a grid size (x,y,z):     (65535, 65535, 1)
>>>>>>   Maximum memory pitch:                          2147483647 bytes
>>>>>>   Texture alignment:                             256 bytes
>>>>>>   Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
>>>>>>   Run time limit on kernels:                     No
>>>>>>   Integrated GPU sharing Host Memory:            No
>>>>>>   Support host page-locked memory mapping:       Yes
>>>>>>   Alignment requirement for Surfaces:            Yes
>>>>>>   Device has ECC support:                        Disabled
>>>>>>   Device supports Unified Addressing (UVA):      No
>>>>>>   Device PCI Bus ID / PCI location ID:           3 / 0
>>>>>>   Compute Mode:
>>>>>>      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>>>>>>
>>>>>> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = GeForce GTX 285
>>>>>> Result = PASS
>>>>>> balay@frog ~/soft/NVIDIA_CUDA-5.5_Samples/bin/x86_64/linux/release $
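A minimal sketch of the per-rank check that Jiri's explanation points to: each MPI rank forces creation of its CUDA context and then reports free vs. total device memory with cudaMemGetInfo, so the 50-100MB per-context overhead on the 1023MiB card shown above can be measured directly. This is illustrative only; the file name and build line are assumptions, not anything taken from PETSc or the logs above.

/* ctx_overhead.c - rough check of per-process CUDA context overhead.
 * Illustrative sketch; assumes MPI and the CUDA runtime are installed.
 * Build (flags are placeholders):  mpicc ctx_overhead.c -lcudart -o ctx_overhead
 * Run:                             mpiexec -n 8 ./ctx_overhead
 */
#include <stdio.h>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
  int         rank, size;
  size_t      freeb = 0, totalb = 0;
  cudaError_t err;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* Force creation of this rank's context on device 0 */
  err = cudaSetDevice(0);
  if (err == cudaSuccess) err = cudaFree(0);

  /* Wait until every rank has (tried to) create its context */
  MPI_Barrier(MPI_COMM_WORLD);

  if (err != cudaSuccess) {
    printf("[%d/%d] context creation failed: %s\n", rank, size, cudaGetErrorString(err));
  } else if (cudaMemGetInfo(&freeb, &totalb) == cudaSuccess) {
    /* With all contexts resident, 'free' shows what is left of the card
       after the per-context overhead Jiri describes */
    printf("[%d/%d] free %zu MiB of %zu MiB\n", rank, size,
           freeb/(1024*1024), totalb/(1024*1024));
  } else {
    printf("[%d/%d] cudaMemGetInfo failed\n", rank, size);
  }

  MPI_Finalize();
  return 0;
}

Running this with increasing -n would show at which process count the free memory collapses, which is the regime where the thrust bad_alloc failures described above are to be expected.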
>>>>>>
>>>>>> On Mon, 21 Mar 2016, Dominic Meiser wrote:
>>>>>>
>>>>>>> I have used oversubscription of GPUs fairly routinely but it
>>>>>>> requires driver support (and I think at some point it also required
>>>>>>> a patched mpich, but that requirement is gone AFAIK). I don't
>>>>>>> remember what driver version is needed. Can you get the driver
>>>>>>> version on the test machine with nvidia-smi?
>>>>>>>
>>>>>>> Also, oversubscription by such a large factor could be an issue.
>>>>>>> But given that the example doesn't actually use GPUs one would hope
>>>>>>> that it shouldn't matter ...
>>>>>>>
>>>>>>> Karl, have you been able to reproduce this issue on a different
>>>>>>> machine? Or any idea what's needed to reproduce the failures?
>>>>>>> I can try and hunt down an sm_13 GPU but if there's an easier way to
>>>>>>> reproduce that would be great.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dominic
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 21, 2016 at 12:11:08PM -0500, Satish Balay wrote:
>>>>>>>> I attempted to manually run the tests after the reboot - and then
>>>>>>>> they crashed/hung at:
>>>>>>>>
>>>>>>>> [14]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>>>>>> [14]PETSC ERROR: Error in external library
>>>>>>>> [14]PETSC ERROR: CUBLAS error 1
>>>>>>>> [14]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>>>>>> [14]PETSC ERROR: Petsc Development GIT revision: pre-tsfc-2225-g6da9565  GIT Date: 2016-03-20 23:47:14 -0500
>>>>>>>> [14]PETSC ERROR: ./ex36 on a arch-cuda-double named frog by balay Mon Mar 21 10:49:24 2016
>>>>>>>> [14]PETSC ERROR: Configure options --with-cuda=1 --with-cusp=1
>>>>>>>> -with-cusp-dir=/home/balay/soft/cusplibrary-0.4.0 --with-thrust=1
>>>>>>>> --with-precision=double --with-clanguage=c --with-cuda-arch=sm_13
>>>>>>>> --with-no-output -PETSC_ARCH=arch-cuda-double
>>>>>>>> -PETSC_DIR=/home/balay/petsc.clone
>>>>>>>> [14]PETSC ERROR: #1 PetscInitialize() line 922 in /home/balay/petsc.clone/src/sys/objects/pinit.c
>>>>>>>>
>>>>>>>>
>>>>>>>> This one does: 'mpiexec -n 32 ./ex36'
>>>>>>>>
>>>>>>>> Is such oversubscription of the GPU supposed to work? BTW: I don't
>>>>>>>> think this example is using cuda [but there is still cublas
>>>>>>>> initialization?]
>>>>>>>>
>>>>>>>> I've rebooted the machine again - and the 'day' builds have just
>>>>>>>> started..
>>>>>>>>
>>>>>>>> Satish
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, 21 Mar 2016, Karl Rupp wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> the reboot may help, yes. I've observed such weird test failures
>>>>>>>>> twice over the years. In both cases they were gone after
>>>>>>>>> powering the machine off and powering it on again (at least in
>>>>>>>>> one case it was not sufficient to reboot).
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Karli
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03/21/2016 04:38 PM, Satish Balay wrote:
>>>>>>>>>> The test mode [on this machine] didn't change in the past few months..
>>>>>>>>>>
>>>>>>>>>> I've rebooted the box now..
>>>>>>>>>>
>>>>>>>>>> Satish
>>>>>>>>>>
>>>>>>>>>> On Mon, 21 Mar 2016, Dominic Meiser wrote:
>>>>>>>>>>
>>>>>>>>>>> Really odd that these out-of-memory errors are occurring now.
>>>>>>>>>>> AFAIK nothing related to this has changed in the code. Are
>>>>>>>>>>> the tests run any differently? Perhaps more tests in
>>>>>>>>>>> parallel? Is it possible to reset the driver or to reboot the test machine?
>>>>>>>>>>>
>>>>>>>>>>> Dominic
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Mar 20, 2016 at 09:12:36PM -0500, Barry Smith wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> ftp://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2016/03/20/examples_master_arch-cuda-double_frog.log
>>>>>>>>>>>>
>>>>>
>>>>> --
>>>>> Dominic Meiser
>>>>> Tech-X Corporation - 5621 Arapahoe Avenue - Boulder, CO 80303
>>>>
>>>> NVIDIA GmbH, Wuerselen, Germany, Amtsgericht Aachen, HRB 8361
>>>> Managing Director: Karen Theresa Burns
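Relatedly, the "CUBLAS error 1" that Satish hit inside PetscInitialize() with 'mpiexec -n 32 ./ex36' can be exercised outside PETSc. The sketch below is illustrative only (the file name and build line are assumptions, and this is not PETSc code): every rank attempts cublasCreate() on the shared card, which involves the same implicit context creation that fails when the GeForce board is oversubscribed.

/* cublas_init_test.c - stand-alone check of per-rank cuBLAS handle creation.
 * Illustrative sketch; build line is a placeholder:
 *   mpicc cublas_init_test.c -lcublas -lcudart -o cublas_init_test
 *   mpiexec -n 32 ./cublas_init_test
 */
#include <stdio.h>
#include <mpi.h>
#include <cublas_v2.h>

int main(int argc, char **argv)
{
  int            rank, size;
  cublasHandle_t handle;
  cublasStatus_t stat;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* cublasCreate() implicitly creates this rank's CUDA context; with many
     ranks sharing a 1GB GeForce card the later ranks should fail here,
     mirroring the CUBLAS error reported inside PetscInitialize() above */
  stat = cublasCreate(&handle);
  if (stat != CUBLAS_STATUS_SUCCESS) {
    printf("[%d/%d] cublasCreate failed with status %d\n", rank, size, (int)stat);
  } else {
    printf("[%d/%d] cublasCreate succeeded\n", rank, size);
    cublasDestroy(handle);
  }

  MPI_Finalize();
  return 0;
}

If this stand-alone test fails at the same process counts as ex36, the problem is the per-context GPU memory on the 1GB card rather than anything PETSc-specific.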
