On Mon, May 29, 2017 at 11:19 AM, Xinzhe Wu <[email protected]> wrote:
> Dear all,
>
> We have developed code with PETSc + SLEPc that works well on the CPU.
> Now we want to try this code with GPU + MPI, but we get some weird
> errors, shown below.
>
> I found someone discussing this problem here
> http://lists.mcs.anl.gov/pipermail/petsc-dev/2016-March/018836.html, but
> I can hardly understand it. Can anyone help me with these issues?
>

The answer is here:

>>>> I think the error messages you get are pretty descriptive regarding the
>>>> root cause. You are probably running out of GPU memory. Since you are
>>>> running on a GTX 285, you can't use MPS [1], so each MPI process has
>>>> its own context on the GPU. Each context needs to initialize some data on
>>>> the GPU (used for local variables and so on). The amount needed for
>>>> this depends on the size of the GPU (it essentially correlates with the
>>>> maximum number of concurrently active threads). This can easily be
>>>> 50-100 MB. So with only 1 GB of GPU memory you are probably using all of
>>>> the GPU's memory for context data, and nothing is available for your
>>>> application. Unfortunately there is no good way to debug this with
>>>> GeForce. On Tesla, nvidia-smi does show you all processes that have a
>>>> context on a GPU together with their memory consumption.

It appears that you are running out of GPU memory. This can happen if you
use too many MPI processes on a single GPU.

  Thanks,

     Matt

> Thank you in advance!
>
>
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Error in external library
> [0]PETSC ERROR: CUBLAS error 1
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: [2]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [2]PETSC ERROR: Error in external library
> [2]PETSC ERROR: CUBLAS error 1
> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [2]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3965-gf375733  GIT Date: 2017-05-28 10:32:02 -0500
> [2]PETSC ERROR: ./hyperh on a arch-linux2-c-debug named romeo44 by xinzhewu Mon May 29 18:03:58 2017
> [2]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++
> --with-fc=gfortran --download-mpich --download-fblaslapack
> --with-visibility=0 --with-shared-libraries=0 --with-cuda=1 --with-thrust=1
> --with-precision=double --with-clanguage=c
> --with-pestc-arch=linux-c-no-debug-complex
> --with-scalar-type=complex
> [2]PETSC ERROR: #1 PetscInitialize() line 906 in /home/xinzhewu/Petsc-GPUs/petsc/src/sys/objects/pinit.c
> [2]PETSC ERROR: #2 SlepcInitialize() line 259 in /home/xinzhewu/Petsc-GPUs/slepc/src/sys/slepcinit.c
>
>
> --
> Xinzhe WU
> Ph.D Student of Computer Science
> Maison de la Simulation, CNRS USR3441
> Building 565, CEA Saclay
> 91191, Gif-sur-Yvette, France
> Tel: +33 (0) 1 69 08 59 93
>

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

http://www.caam.rice.edu/~mk51/
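A back-of-the-envelope sketch of the memory estimate quoted in the reply, in C. It assumes that every MPI process pays a fixed per-context overhead on the shared GPU (no MPS) and that the application needs a fixed total amount on top of that. The function names `free_mem_mb` and `max_ranks` and the example figures (a 1024 MB card, 100 MB per context from the upper end of the quoted 50-100 MB range, 200 MB for the solver) are hypothetical illustrations, not measured values; the real overhead should be measured on the actual hardware.

```c
#include <assert.h>

/* Hypothetical helper: memory (MB) left for the application, i.e. vectors,
   matrices and CUBLAS work space, when `nranks` MPI processes each create
   their own CUDA context on one GPU, as happens without MPS.
   Assumes a fixed overhead of `ctx_mb` per context. A negative result
   means the contexts alone already exceed the card's memory. */
double free_mem_mb(double gpu_mb, double ctx_mb, int nranks)
{
    return gpu_mb - ctx_mb * (double)nranks;
}

/* Hypothetical helper: largest number of ranks that can share one GPU
   while still leaving at least `app_mb` free for the solver's own data. */
int max_ranks(double gpu_mb, double ctx_mb, double app_mb)
{
    assert(ctx_mb > 0.0); /* with zero overhead the loop would never stop */
    int n = 0;
    while (free_mem_mb(gpu_mb, ctx_mb, n + 1) >= app_mb)
        n++;
    return n;
}
```

With these assumed figures, `max_ranks(1024.0, 100.0, 200.0)` returns 8: eight contexts take 800 MB and leave 224 MB, whereas a ninth would leave only 124 MB. With eleven ranks, `free_mem_mb` is already negative, which matches a failure as early as `PetscInitialize()`.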
