On Mon, May 29, 2017 at 11:19 AM, Xinzhe Wu <[email protected]> wrote:

> Dear all,
>
> We have developed a code with PETSc + SLEPc that works well on the CPU.
> Now we want to run it with GPU + MPI, but we get the errors shown
> below.
>
> I found a discussion of this problem here:
> http://lists.mcs.anl.gov/pipermail/petsc-dev/2016-March/018836.html , but
> I can hardly understand it. Can anyone help me with these issues?
>

The answer is here:

>>>> I think the error message you get is pretty descriptive regarding the
>>>> root cause. You are probably running out of GPU memory. Since you are
>>>> running on a GTX 285 you can't use MPS [1], therefore each MPI process has
>>>> its own context on the GPU. Each context needs to initialize some data on
>>>> the GPU (used for local variables and so on). The amount needed
>>>> for this depends on the size of the GPU (essentially correlates with the
>>>> maximum number of concurrently active threads). This can easily be
>>>> 50-100MB. So with only 1GB of GPU memory you are probably using all of the
>>>> GPU's memory for context data and nothing is available for your application.
>>>> Unfortunately there is no good way to debug this with GeForce. On Tesla,
>>>> nvidia-smi does show you all processes that have a context on a GPU
>>>> together with their memory consumption.
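
To see which processes hold a context on the GPU, the nvidia-smi query mentioned in the quoted message can be scripted. This is only a minimal Python sketch: the helper names `parse_compute_apps` and `gpu_processes` are made up for illustration, and it assumes nvidia-smi is on the PATH. On cards where the driver does not report per-process memory, the memory field is not a number, so the sketch stores None there.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, used_memory' rows into a list of (pid, MiB or None)."""
    rows = []
    for line in csv_text.strip().splitlines():
        pid_field, mem_field = [f.strip() for f in line.split(",", 1)]
        try:
            used_mib = int(mem_field)
        except ValueError:
            # The driver reported something like "[N/A]" instead of a number.
            used_mib = None
        rows.append((int(pid_field), used_mib))
    return rows


def gpu_processes():
    """Ask nvidia-smi for every compute process and its GPU memory use."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_compute_apps(result.stdout)
```

Calling `gpu_processes()` while the MPI job is running should return one entry per rank that has created a context on that node's GPUs.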

It appears that you are running out of GPU memory. This can happen if you
use too many MPI processes for a single GPU.
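
A back-of-the-envelope version of that arithmetic, as a Python sketch. The 50-100MB per-context figure and the 1GB card come from the message quoted above, not from measurements on your machine, and `app_mb_per_proc` is a hypothetical parameter standing for whatever your vectors and matrices need per rank.

```python
def free_after_contexts(total_mb, n_procs, context_mb):
    """GPU memory (MB) left for application data once every MPI
    process sharing the GPU has created its own context."""
    return total_mb - n_procs * context_mb


def max_procs(total_mb, context_mb, app_mb_per_proc):
    """Largest number of processes per GPU whose contexts plus
    per-process application data still fit in GPU memory."""
    return total_mb // (context_mb + app_mb_per_proc)
```

With the quoted upper estimate, `free_after_contexts(1024, 10, 100)` leaves only 24 MB on a 1GB card, so about ten ranks on one such GPU would consume nearly everything before any application data is allocated.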

  Thanks,

     Matt


> Thank you in advance!
>
>
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Error in external library
> [0]PETSC ERROR: CUBLAS error 1
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: [2]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [2]PETSC ERROR: Error in external library
> [2]PETSC ERROR: CUBLAS error 1
> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [2]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3965-gf375733  GIT
> Date: 2017-05-28 10:32:02 -0500
> [2]PETSC ERROR: ./hyperh on a arch-linux2-c-debug named romeo44 by
> xinzhewu Mon May 29 18:03:58 2017
> [2]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++
> --with-fc=gfortran --download-mpich --download-fblaslapack
> --with-visibility=0 --with-shared-libraries=0 --with-cuda=1 --with-thrust=1
> --with-precision=double --with-clanguage=c 
> --with-pestc-arch=linux-c-no-debug-complex
> --with-scalar-type=complex
> [2]PETSC ERROR: #1 PetscInitialize() line 906 in /home/xinzhewu/Petsc-GPUs/
> petsc/src/sys/objects/pinit.c
> [2]PETSC ERROR: #2 SlepcInitialize() line 259 in /home/xinzhewu/Petsc-GPUs/
> slepc/src/sys/slepcinit.c
>
>
> --
> Xinzhe WU
> Ph.D Student of Computer Science
> Maison de la Simulation, CNRS USR3441
> Building 565, CEA Saclay
> 91191, Gif-sur-Yvette, France
> Tel: +33 (0) 1 69 08 59 93
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

http://www.caam.rice.edu/~mk51/
