Re: [petsc-users] GAMG crash during setup when using multiple GPUs

2022-02-11 Thread Sajid Ali Syed
Hi Mark,

Thanks for the information.

@Junchao: Given that there are known issues with GPU-aware MPI, it might be
best to wait until there is an updated version of cray-mpich (which hopefully
contains the relevant fixes).

Thank You,
Sajid Ali (he/him) | Research Associate
Scientific Computing Division
Fermi National Accelerator Laboratory
s-sajid-ali.github.io



Re: [petsc-users] GAMG crash during setup when using multiple GPUs

2022-02-10 Thread Mark Adams
Perlmutter has problems with GPU-aware MPI.
This is being actively worked on at NERSC.

Mark


Re: [petsc-users] GAMG crash during setup when using multiple GPUs

2022-02-10 Thread Junchao Zhang
Hi, Sajid Ali,
  I have no clue yet. I have access to Perlmutter, and I am thinking about how
to debug this.
  If your app is open source and easy to build, then I can build and debug it
myself. Otherwise, if you build and install PETSc (with only the options your
app needs) to a shared directory, and I can access your executable (which uses
RPATH for its libraries), then maybe I can debug it there (I would only need
to install my own PETSc to the shared directory).
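
For concreteness, a minimal sketch of this shared-prefix workflow (all paths
are hypothetical, the configure options would need to match your app, and
PETSc prints the precise make commands to run after configure):

    # configure PETSc with an install prefix on a shared filesystem
    ./configure --prefix=/path/to/shared/petsc --with-cuda=1
    make && make install
    # link the app against the shared install; -Wl,-rpath embeds the library
    # path in the executable so it resolves without LD_LIBRARY_PATH
    # (cc is the Cray compiler wrapper on Perlmutter)
    cc poisson3d.c -o poisson3d -I/path/to/shared/petsc/include \
       -L/path/to/shared/petsc/lib -lpetsc \
       -Wl,-rpath,/path/to/shared/petsc/lib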

--Junchao Zhang


On Thu, Feb 10, 2022 at 6:04 PM Sajid Ali Syed  wrote:

> Hi Junchao,
>
> With "-use_gpu_aware_mpi 0" there is no error. I'm attaching the log for
> this case with this email.
>
> I also ran with GPU-aware MPI to see if I could reproduce the error, and
> got the error again, but from a different location. This logfile is also
> attached.
>
> This was using the newest cray-mpich (8.1.12) on NERSC Perlmutter. Let me
> know if I can share further information to help with debugging this.
>
> Thank You,
> Sajid Ali (he/him) | Research Associate
> Scientific Computing Division
> Fermi National Accelerator Laboratory
> s-sajid-ali.github.io

Re: [petsc-users] GAMG crash during setup when using multiple GPUs

2022-02-10 Thread Junchao Zhang
Also, try "-use_gpu_aware_mpi 0" to see if there is a difference.
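
As a hypothetical illustration (the launcher flags and executable name here
are assumptions based on the submit script and traceback elsewhere in this
thread), the option goes on the launch line like any other PETSc option:

    # with 0, PETSc stages MPI messages through host buffers instead of
    # handing device pointers to MPI
    srun -n 64 ./poisson3d -use_gpu_aware_mpi 0 -ksp_type cg -pc_type gamg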

--Junchao Zhang



Re: [petsc-users] GAMG crash during setup when using multiple GPUs

2022-02-10 Thread Junchao Zhang
Did it fail without GPU at 64 MPI ranks?
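
For instance, a hypothetical CPU-only counterpart of the run quoted below
would swap the CUDA-backed types from the option table for their host
equivalents (launcher and executable name assumed, as above):

    # aijcusparse -> aij and cuda -> standard keep the same solver setup
    # but run entirely on the CPU
    srun -n 64 ./poisson3d -dm_mat_type aij -dm_vec_type standard \
        -ksp_type cg -pc_type gamg -log_view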

--Junchao Zhang


On Thu, Feb 10, 2022 at 1:22 PM Sajid Ali Syed  wrote:

> Hi PETSc-developers,
>
> I’m seeing the following crash that occurs during the setup phase of the
> preconditioner when using multiple GPUs. The relevant error trace is shown
> below:
>
> (GTL DEBUG: 26) cuIpcOpenMemHandle: resource already mapped, 
> CUDA_ERROR_ALREADY_MAPPED, line no 272
> [24]PETSC ERROR: - Error Message 
> --
> [24]PETSC ERROR: General MPI error
> [24]PETSC ERROR: MPI error 1 Invalid buffer pointer
> [24]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [24]PETSC ERROR: Petsc Development GIT revision: 
> f351d5494b5462f62c419e00645ac2e477b88cae  GIT Date: 2022-02-08 15:08:19 +
> ...
> [24]PETSC ERROR: #1 PetscSFLinkWaitRequests_MPI() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/impls/basic/sfmpi.c:54
> [24]PETSC ERROR: #2 PetscSFLinkFinishCommunication() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/include/../src/vec/is/sf/impls/basic/sfpack.h:274
> [24]PETSC ERROR: #3 PetscSFBcastEnd_Basic() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/impls/basic/sfbasic.c:218
> [24]PETSC ERROR: #4 PetscSFBcastEnd() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/sf.c:1499
> [24]PETSC ERROR: #5 VecScatterEnd_Internal() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/vscat.c:87
> [24]PETSC ERROR: #6 VecScatterEnd() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/vscat.c:1366
> [24]PETSC ERROR: #7 MatMult_MPIAIJCUSPARSE() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:302
> [24]PETSC ERROR: #8 MatMult() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/mat/interface/matrix.c:2438
> [24]PETSC ERROR: #9 PCApplyBAorAB() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/pc/interface/precon.c:730
> [24]PETSC ERROR: #10 KSP_PCApplyBAorAB() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/include/petsc/private/kspimpl.h:421
> [24]PETSC ERROR: #11 KSPGMRESCycle() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/impls/gmres/gmres.c:162
> [24]PETSC ERROR: #12 KSPSolve_GMRES() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/impls/gmres/gmres.c:247
> [24]PETSC ERROR: #13 KSPSolve_Private() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/interface/itfunc.c:925
> [24]PETSC ERROR: #14 KSPSolve() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/interface/itfunc.c:1103
> [24]PETSC ERROR: #15 PCGAMGOptProlongator_AGG() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/pc/impls/gamg/agg.c:1127
> [24]PETSC ERROR: #16 PCSetUp_GAMG() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/pc/impls/gamg/gamg.c:626
> [24]PETSC ERROR: #17 PCSetUp() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/pc/interface/precon.c:1017
> [24]PETSC ERROR: #18 KSPSetUp() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/interface/itfunc.c:417
> [24]PETSC ERROR: #19 main() at poisson3d.c:69
> [24]PETSC ERROR: PETSc Option Table entries:
> [24]PETSC ERROR: -dm_mat_type aijcusparse
> [24]PETSC ERROR: -dm_vec_type cuda
> [24]PETSC ERROR: -ksp_monitor
> [24]PETSC ERROR: -ksp_norm_type unpreconditioned
> [24]PETSC ERROR: -ksp_type cg
> [24]PETSC ERROR: -ksp_view
> [24]PETSC ERROR: -log_view
> [24]PETSC ERROR: -mg_levels_esteig_ksp_type cg
> [24]PETSC ERROR: -mg_levels_ksp_type chebyshev
> [24]PETSC ERROR: -mg_levels_pc_type jacobi
> [24]PETSC ERROR: -pc_gamg_agg_nsmooths 1
> [24]PETSC ERROR: -pc_gamg_square_graph 1
> [24]PETSC ERROR: -pc_gamg_threshold 0.0
> [24]PETSC ERROR: -pc_gamg_threshold_scale 0.0
> [24]PETSC ERROR: -pc_gamg_type agg
> [24]PETSC ERROR: -pc_type gamg
> [24]PETSC ERROR: End of Error Message ---send entire 
> error message to petsc-ma...@mcs.anl.gov--
>
> Attached with this email is the full error log and the submit script for an
> 8-node/64-GPU/64-MPI-rank job. I’ll also note that the same program did not
> crash