Re: [petsc-users] GAMG crash during setup when using multiple GPUs
Hi Mark,

Thanks for the information. @Junchao: Given that there are known issues with GPU-aware MPI, it might be best to wait until there is an updated version of cray-mpich (which hopefully contains the relevant fixes).

Thank You,
Sajid Ali (he/him) | Research Associate
Scientific Computing Division
Fermi National Accelerator Laboratory
s-sajid-ali.github.io

From: Mark Adams
Sent: Thursday, February 10, 2022 8:47 PM
To: Junchao Zhang
Cc: Sajid Ali Syed; petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] GAMG crash during setup when using multiple GPUs

Perlmutter has problems with GPU-aware MPI. This is being actively worked on at NERSC.

Mark
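Until a fixed cray-mpich ships, GPU-aware MPI can be disabled as an interim measure. The sketch below shows one way to do that on both the MPI side and the PETSc side; `MPICH_GPU_SUPPORT_ENABLED` is cray-mpich's switch for its GPU Transport Layer (confirm it in `man mpi` for your release), and the surrounding script context is assumed, not taken from the actual submit script.

```shell
# Interim workaround sketch while waiting on a fixed cray-mpich: disable
# GPU-aware transfers on both sides so messages are staged through host memory.
# MPICH_GPU_SUPPORT_ENABLED controls cray-mpich's GPU Transport Layer (GTL);
# check `man mpi` on the system to confirm the variable for your release.
export MPICH_GPU_SUPPORT_ENABLED=0
# Tell PETSc the same thing, so it does not hand device pointers to MPI.
# PETSC_OPTIONS is read by PetscInitialize at program startup.
PETSC_OPTIONS="-use_gpu_aware_mpi 0"
export PETSC_OPTIONS
echo "MPI GPU support: ${MPICH_GPU_SUPPORT_ENABLED}, PETSc options: ${PETSC_OPTIONS}"
```

Keeping the two settings in agreement avoids the case where PETSc passes CUDA device buffers to an MPI build whose GPU path has been turned off.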
From: Mark Adams
Sent: Thursday, February 10, 2022 8:47 PM
Perlmutter has problems with GPU-aware MPI. This is being actively worked on at NERSC.

Mark
From: Junchao Zhang
Sent: Thursday, February 10, 2022 9:22 PM
Hi, Sajid Ali,

I have no clue. I have access to Perlmutter, and I am thinking about how to debug this. If your app is open source and easy to build, I can build and debug it myself. Otherwise, if you build and install PETSc (only with the options needed by your app) to a shared directory where I can access your executable (which uses RPATH for libraries), then maybe I can debug it there (I would only need to install my own PETSc to the shared directory).

--Junchao Zhang

On Thu, Feb 10, 2022 at 6:04 PM Sajid Ali Syed wrote:

> Hi Junchao,
>
> With "-use_gpu_aware_mpi 0" there is no error. I'm attaching the log for
> this case with this email.
>
> I also ran with GPU-aware MPI to see if I could reproduce the error and got
> the error, but from a different location. This logfile is also attached.
>
> This was using the newest cray-mpich on NERSC-Perlmutter (8.1.12). Let me
> know if I can share further information to help with debugging this.
>
> Thank You,
> Sajid Ali (he/him) | Research Associate
> Scientific Computing Division
> Fermi National Accelerator Laboratory
> s-sajid-ali.github.io
From: Junchao Zhang
Sent: Thursday, February 10, 2022 1:43 PM
Also, try "-use_gpu_aware_mpi 0" to see if there is a difference.

--Junchao Zhang
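The suggested comparison can be made without touching the batch environment by appending the option to the run line. A minimal sketch, in which the executable name `./app` and the 64-rank `srun` launch are assumptions for illustration:

```shell
# Hypothetical A/B experiment: the same run with and without GPU-aware MPI
# inside PETSc. "./app" and the launcher line are assumed, not taken from
# the actual submit script.
APP=./app
LAUNCH="srun -n 64"
echo "${LAUNCH} ${APP}                        # baseline: GPU-aware MPI on"
echo "${LAUNCH} ${APP} -use_gpu_aware_mpi 0   # comparison: stage through host memory"
```

If only the second run completes, the failure is isolated to the GPU-aware MPI path rather than to the solver configuration.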
From: Junchao Zhang
Sent: Thursday, February 10, 2022 1:40 PM
Did it fail without GPU at 64 MPI ranks?

--Junchao Zhang

On Thu, Feb 10, 2022 at 1:22 PM Sajid Ali Syed wrote:

> Hi PETSc-developers,
>
> I'm seeing the following crash that occurs during the setup phase of the
> preconditioner when using multiple GPUs. The relevant error trace is shown
> below:
>
> (GTL DEBUG: 26) cuIpcOpenMemHandle: resource already mapped, CUDA_ERROR_ALREADY_MAPPED, line no 272
> [24]PETSC ERROR: --------------------- Error Message --------------------------------------------------
> [24]PETSC ERROR: General MPI error
> [24]PETSC ERROR: MPI error 1 Invalid buffer pointer
> [24]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [24]PETSC ERROR: Petsc Development GIT revision: f351d5494b5462f62c419e00645ac2e477b88cae  GIT Date: 2022-02-08 15:08:19 +
> ...
> [24]PETSC ERROR: #1 PetscSFLinkWaitRequests_MPI() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/impls/basic/sfmpi.c:54
> [24]PETSC ERROR: #2 PetscSFLinkFinishCommunication() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/include/../src/vec/is/sf/impls/basic/sfpack.h:274
> [24]PETSC ERROR: #3 PetscSFBcastEnd_Basic() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/impls/basic/sfbasic.c:218
> [24]PETSC ERROR: #4 PetscSFBcastEnd() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/sf.c:1499
> [24]PETSC ERROR: #5 VecScatterEnd_Internal() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/vscat.c:87
> [24]PETSC ERROR: #6 VecScatterEnd() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/vscat.c:1366
> [24]PETSC ERROR: #7 MatMult_MPIAIJCUSPARSE() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:302
> [24]PETSC ERROR: #8 MatMult() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/mat/interface/matrix.c:2438
> [24]PETSC ERROR: #9 PCApplyBAorAB() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/pc/interface/precon.c:730
> [24]PETSC ERROR: #10 KSP_PCApplyBAorAB() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/include/petsc/private/kspimpl.h:421
> [24]PETSC ERROR: #11 KSPGMRESCycle() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/impls/gmres/gmres.c:162
> [24]PETSC ERROR: #12 KSPSolve_GMRES() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/impls/gmres/gmres.c:247
> [24]PETSC ERROR: #13 KSPSolve_Private() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/interface/itfunc.c:925
> [24]PETSC ERROR: #14 KSPSolve() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/interface/itfunc.c:1103
> [24]PETSC ERROR: #15 PCGAMGOptProlongator_AGG() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/pc/impls/gamg/agg.c:1127
> [24]PETSC ERROR: #16 PCSetUp_GAMG() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/pc/impls/gamg/gamg.c:626
> [24]PETSC ERROR: #17 PCSetUp() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/pc/interface/precon.c:1017
> [24]PETSC ERROR: #18 KSPSetUp() at /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/interface/itfunc.c:417
> [24]PETSC ERROR: #19 main() at poisson3d.c:69
> [24]PETSC ERROR: PETSc Option Table entries:
> [24]PETSC ERROR: -dm_mat_type aijcusparse
> [24]PETSC ERROR: -dm_vec_type cuda
> [24]PETSC ERROR: -ksp_monitor
> [24]PETSC ERROR: -ksp_norm_type unpreconditioned
> [24]PETSC ERROR: -ksp_type cg
> [24]PETSC ERROR: -ksp_view
> [24]PETSC ERROR: -log_view
> [24]PETSC ERROR: -mg_levels_esteig_ksp_type cg
> [24]PETSC ERROR: -mg_levels_ksp_type chebyshev
> [24]PETSC ERROR: -mg_levels_pc_type jacobi
> [24]PETSC ERROR: -pc_gamg_agg_nsmooths 1
> [24]PETSC ERROR: -pc_gamg_square_graph 1
> [24]PETSC ERROR: -pc_gamg_threshold 0.0
> [24]PETSC ERROR: -pc_gamg_threshold_scale 0.0
> [24]PETSC ERROR: -pc_gamg_type agg
> [24]PETSC ERROR: -pc_type gamg
> [24]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-ma...@mcs.anl.gov----------
>
> Attached with this email is the full error log and the submit script for an
> 8-node/64-GPU/64-MPI-rank job. I'll also note that the same program did not
> crash
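For reference, the option table in the trace above can be collected into a single launch line. This is a hypothetical reconstruction: the executable name `./poisson3d` is inferred from `poisson3d.c` in the trace, and the `srun` flags are assumptions, since the real job used the submit script attached to the original mail.

```shell
# Hypothetical reconstruction of the failing run's command line from the
# PETSc option table in the error message; "./poisson3d" and the srun flags
# are assumptions, not taken from the actual submit script.
GPU_OPTS="-dm_mat_type aijcusparse -dm_vec_type cuda"
KSP_OPTS="-ksp_type cg -ksp_norm_type unpreconditioned -ksp_monitor -ksp_view -log_view"
GAMG_OPTS="-pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 \
-pc_gamg_square_graph 1 -pc_gamg_threshold 0.0 -pc_gamg_threshold_scale 0.0 \
-mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi -mg_levels_esteig_ksp_type cg"
echo "srun -n 64 ./poisson3d ${GPU_OPTS} ${KSP_OPTS} ${GAMG_OPTS}"
```

Note that the crash occurs inside GAMG setup (frame #15, `PCGAMGOptProlongator_AGG`), where the smoothed-aggregation eigenvalue-estimate solve triggers the `MatMult` whose vector scatter fails, so any of the GAMG options that reach that code path would reproduce it.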