I don't see a problem in the matrix assembly. If you point me to your repo and show me how to build it, I can try to reproduce.
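In the meantime, a minimal stand-alone sketch along the lines below could help isolate the assembly + offload path from the rest of the application. It is hypothetical and untested (the tridiagonal stencil, the sizes and the ./repro name are made up, not taken from your code), but it follows the same pattern as the Fortran loop quoted below: MatCreateAIJ with fixed d_nz/o_nz preallocation, MatSetFromOptions, values inserted one at a time, then a KSP solve so the -mat_type aijcusparse / -vec_type mpicuda options can be exercised:

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    Mat         A;
    Vec         x, b;
    KSP         ksp;
    PetscInt    nlocal = 100, N, Istart, Iend, i, col;
    PetscScalar v;

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    /* Same creation pattern as the Fortran code below: fixed d_nz/o_nz preallocation,
       then MatSetFromOptions so -mat_type mpiaijcusparse can be picked up. */
    PetscCall(MatCreateAIJ(PETSC_COMM_WORLD, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE,
                           7, NULL, 7, NULL, &A));
    PetscCall(MatSetFromOptions(A));
    PetscCall(MatGetSize(A, &N, NULL));
    PetscCall(MatGetOwnershipRange(A, &Istart, &Iend));
    for (i = Istart; i < Iend; i++) { /* insert one value at a time, like the FDS loop */
      v = 2.0;
      PetscCall(MatSetValues(A, 1, &i, 1, &i, &v, INSERT_VALUES));
      v = -1.0;
      if (i > 0)     { col = i - 1; PetscCall(MatSetValues(A, 1, &i, 1, &col, &v, INSERT_VALUES)); }
      if (i < N - 1) { col = i + 1; PetscCall(MatSetValues(A, 1, &i, 1, &col, &v, INSERT_VALUES)); }
    }
    PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

    PetscCall(MatCreateVecs(A, &x, &b));
    PetscCall(VecSet(b, 1.0));
    PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
    PetscCall(KSPSetOperators(ksp, A, A));
    PetscCall(KSPSetFromOptions(ksp));
    PetscCall(KSPSolve(ksp, b, x));

    PetscCall(KSPDestroy(&ksp));
    PetscCall(VecDestroy(&x));
    PetscCall(VecDestroy(&b));
    PetscCall(MatDestroy(&A));
    PetscCall(PetscFinalize());
    return 0;
  }

Run with, e.g., mpirun -n 2 ./repro -mat_type mpiaijcusparse -vec_type mpicuda -pc_type gamg to see whether the same thrust error appears outside of FDS.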
--Junchao Zhang

On Mon, Aug 14, 2023 at 2:53 PM Vanella, Marcos (Fed) <marcos.vane...@nist.gov> wrote:

> Hi Junchao, I've tried my case using -ksp_type gmres and -pc_type asm with
> -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse, as (I understand)
> is done in ex60. The error is always the same, so it seems it is not related to
> the KSP or PC. Indeed, it seems to happen when trying to offload the matrix to
> the GPU; both MPI ranks abort with the same trace:
>
> terminate called after throwing an instance of 'thrust::system::system_error'
>   what():  parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
> #0  0x2000397fcd8f in ???
> ...
> #8  0x20003935fc6b in ???
> #9  0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc
>     at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225
> #10 0x11ec769b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_
>     at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88
> #11 0x11efd6a3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_
>     at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55
> #12 0x11efd6a3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_
>     at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93
> #13 0x11efd6a3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_
>     at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104
> #14 0x11efd6a3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm
>     at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254
> #15 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
>     at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220
> #16 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
>     at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213
> #17 0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em
>     at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65
> #18 0x11edb287 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em
>     at /usr/local/cuda-11.7/include/thrust/device_vector.h:88
> #19 0x11edb287 in MatSeqAIJCUSPARSECopyToGPU
>     at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488
> #20 0x11edfd1b in MatSeqAIJCUSPARSEGetIJ
> ...
> ...
>
> This is the piece of Fortran code I have doing this within my Poisson solver:
>
>   ! Create parallel PETSc sparse matrix for this ZSL: set diag/off-diag block nonzeros per row to 7.
>   CALL MATCREATEAIJ(MPI_COMM_WORLD,ZSL%NUNKH_LOCAL,ZSL%NUNKH_LOCAL,ZSL%NUNKH_TOTAL,ZSL%NUNKH_TOTAL,&
>                     7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,ZSL%PETSC_ZS%A_H,PETSC_IERR)
>   CALL MATSETFROMOPTIONS(ZSL%PETSC_ZS%A_H,PETSC_IERR)
>   DO IROW=1,ZSL%NUNKH_LOCAL
>      DO JCOL=1,ZSL%NNZ_D_MAT_H(IROW)
>         ! PETSc expects zero-based indices: 1 row, global I position (zero based); 1 column, global J position (zero based).
>         CALL MATSETVALUES(ZSL%PETSC_ZS%A_H,1,ZSL%UNKH_IND(NM_START)+IROW-1,1,ZSL%JD_MAT_H(JCOL,IROW)-1,&
>                           ZSL%D_MAT_H(JCOL,IROW),INSERT_VALUES,PETSC_IERR)
>      ENDDO
>   ENDDO
>   CALL MATASSEMBLYBEGIN(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR)
>   CALL MATASSEMBLYEND(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR)
>
> Note that I allocate d_nz=7 and o_nz=7 per row (more than enough size), and add
> nonzero values one by one. I wonder if there is something related to this that
> the copying to GPU does not like.
> Thanks,
> Marcos
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zh...@gmail.com>
> *Sent:* Monday, August 14, 2023 3:24 PM
> *To:* Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
> *Cc:* PETSc users list <petsc-users@mcs.anl.gov>; Satish Balay <ba...@mcs.anl.gov>
> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
>
> Yeah, it looks like ex60 was run correctly.
> Double-check your code again and if you still run into errors, we can try to
> reproduce on our end.
>
> Thanks.
> --Junchao Zhang
>
> On Mon, Aug 14, 2023 at 1:05 PM Vanella, Marcos (Fed) <marcos.vane...@nist.gov> wrote:
>
> Hi Junchao, I compiled and ran ex60 through slurm on our Enki system. The batch
> script for slurm submission, ex60.log and gpu stats files are attached.
> Nothing stands out as wrong to me but please have a look.
> I'll revisit running the original 2 MPI process + 1 GPU Poisson problem.
> Thanks!
> Marcos
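Regarding the suspicion above that the failure happens when the assembled matrix is offloaded to the GPU: one way to check that, independent of the KSP/PC setup, is to force the copy right after assembly. A minimal sketch in C (the equivalent calls are available from Fortran; "ForceMatOffload" and the use of MatMult to trigger the copy are my own untested suggestion, not FDS or PETSc code):

  #include <petscmat.h>

  /* Force the GPU copy of an assembled matrix once, outside the solver, so a
     failure in MatSeqAIJCUSPARSECopyToGPU shows up on its own. */
  static PetscErrorCode ForceMatOffload(Mat A)
  {
    Vec x, y;

    PetscFunctionBeginUser;
    PetscCall(MatCreateVecs(A, &x, &y)); /* vectors of the matrix's own (possibly CUDA) type */
    PetscCall(VecSet(x, 1.0));
    PetscCall(MatMult(A, x, y));         /* for aijcusparse this should trigger the copy to the GPU */
    PetscCall(VecDestroy(&x));
    PetscCall(VecDestroy(&y));
    PetscFunctionReturn(PETSC_SUCCESS);
  }

Calling something like this right after MATASSEMBLYEND would tell whether the thrust abort comes from offloading this matrix or from something the solver builds later.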
> ------------------------------
> *From:* Junchao Zhang <junchao.zh...@gmail.com>
> *Sent:* Friday, August 11, 2023 5:52 PM
> *To:* Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
> *Cc:* PETSc users list <petsc-users@mcs.anl.gov>; Satish Balay <ba...@mcs.anl.gov>
> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
>
> Before digging into the details, could you try to run src/ksp/ksp/tests/ex60.c to
> make sure the environment is ok.
>
> The comment at the end shows how to run it:
>   test:
>     requires: cuda
>     suffix: 1_cuda
>     nsize: 4
>     args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse
>
> --Junchao Zhang
>
> On Fri, Aug 11, 2023 at 4:36 PM Vanella, Marcos (Fed) <marcos.vane...@nist.gov> wrote:
>
> Hi Junchao, thank you for the info. I compiled the main branch of PETSc on another
> machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain and don't see the
> Fortran compilation error. It might have been related to gcc-9.3.
> I tried the case again, 2 CPUs and one GPU, and get this error now (both MPI ranks
> print the same message and backtrace):
>
> terminate called after throwing an instance of 'thrust::system::system_error'
>   what():  parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
> #0  0x2000397fcd8f in ???
> #1  0x2000397fb657 in ???
> #2  0x2000000604d7 in ???
> #3  0x200039cb9628 in ???
> #4  0x200039c93eb3 in ???
> #5  0x200039364a97 in ???
> #6  0x20003935f6d3 in ???
> #7  0x20003935f78f in ???
> #8  0x20003935fc6b in ???
> #9  0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc
>     at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225
> #10 0x11ec425b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_
>     at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88
> #11 0x11efa263 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_
>     at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55
> #12 0x11efa263 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_
>     at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93
> #13 0x11efa263 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_
>     at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104
> #14 0x11efa263 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm
>     at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254
> #15 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
>     at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220
> #16 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
>     at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213
> #17 0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em
>     at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65
> #18 0x11ed7e47 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em
>     at /usr/local/cuda-11.7/include/thrust/device_vector.h:88
> #19 0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU
>     at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488
> #20 0x11eef623 in MatSeqAIJCUSPARSEMergeMats
>     at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4696
> #21 0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE
>     at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:251
> #22 0x133f141f in MatMPIAIJGetLocalMatMerge
>     at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342
> #23 0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND
>     at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368
> #24 0x1377e1df in MatProductSymbolic
>     at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795
> #25 0x11e4dd1f in MatPtAP
>     at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934
> #26 0x130d792f in MatCoarsenApply_MISK_private
>     at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283
> #27 0x130db89b in MatCoarsenApply_MISK
>     at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368
> #28 0x130bf5a3 in MatCoarsenApply
>     at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97
> #29 0x141518ff in PCGAMGCoarsen_AGG
>     at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524
> #30 0x13b3a43f in PCSetUp_GAMG
>     at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631
> #31 0x1276845b in PCSetUp
>     at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069
> #32 0x127d6cbb in KSPSetUp
>     at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415
> #33 0x127dddbf in KSPSolve_Private
>     at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836
> #34 0x127e4987 in KSPSolve
>     at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082
> #35 0x1280b18b in kspsolve_
>     at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335
> #36 0x1140945f in __globmat_solver_MOD_glmat_solver
>     at ../../Source/pres.f90:3128
> #37 0x119f8853 in pressure_iteration_scheme
>     at ../../Source/main.f90:1449
> #38 0x11969bd3 in fds
>     at ../../Source/main.f90:688
> #39 0x11a10167 in main
>     at ../../Source/main.f90:6
> srun: error: enki12: tasks 0-1: Aborted (core dumped)
>
> This was the slurm submission script in this case:
>
> #!/bin/bash
> # ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds
> #SBATCH -J test
> #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err
> #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log
> #SBATCH --partition=debug
> #SBATCH --ntasks=2
> #SBATCH --nodes=1
> #SBATCH --cpus-per-task=1
> #SBATCH --ntasks-per-node=2
> #SBATCH --time=01:00:00
> #SBATCH --gres=gpu:1
>
> export OMP_NUM_THREADS=1
>
> # PETSc dir and arch:
> export PETSC_DIR=/home/mnv/Software/petsc
> export PETSC_ARCH=arch-linux-c-dbg
>
> # SYSTEM name:
> export MYSYSTEM=enki
>
> # modules
> module load cuda/11.7
> module load gcc/11.2.1/toolset
> module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7
>
> cd /home/mnv/Firemodels_fork/fds/Issues/PETSc
> srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg
>
> The configure.log for the PETSc build is attached. Another clue to what is happening
> is that even setting the matrices/vectors to be mpi (-vec_type mpi -mat_type mpiaij)
> and not requesting a gpu, I get a GPU warning:
>
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [1]PETSC ERROR: GPU error
> [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected
> [1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
> [0]PETSC ERROR: GPU error
> [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100 (cudaErrorNoDevice) : no CUDA-capable device is detected
> [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
> [0]PETSC ERROR: Option left: name:-pc_type value: gamg source: command line
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [1]PETSC ERROR: Option left: name:-pc_type value: gamg source: command > line > [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting. > [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad > GIT Date: 2023-08-11 15:13:02 +0000 > [0]PETSC ERROR: > /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db > on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023 > [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2" > FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2" > --with-debugging=yes --with-shared-libraries=0 --download-suitesparse > --download-hypre --download-fblaslapack --with-cuda > ... > > I would have expected not to see GPU errors being printed out, given I did > not request cuda matrix/vectors. The case run anyways, I assume it > defaulted to the CPU solver. > Let me know if you have any ideas as to what is happening. Thanks, > Marcos > > > ------------------------------ > *From:* Junchao Zhang <junchao.zh...@gmail.com> > *Sent:* Friday, August 11, 2023 3:35 PM > *To:* Vanella, Marcos (Fed) <marcos.vane...@nist.gov>; PETSc users list < > petsc-users@mcs.anl.gov>; Satish Balay <ba...@mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Marcos, > We do not have good petsc/gpu documentation, but see > https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires: > cuda" in petsc tests and you will find examples using GPU. > For the Fortran compile errors, attach your configure.log and Satish > (Cc'ed) or others should know how to fix them. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 2:22 PM Vanella, Marcos (Fed) < > marcos.vane...@nist.gov> wrote: > > Hi Junchao, thanks for the explanation. Is there some development > documentation on the GPU work? I'm interested learning about it. > I checked out the main branch and configured petsc. when compiling with > gcc/gfortran I come across this error: > > .... > CUDAC > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > CUDAC.dep > arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o > FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61: > > 37 | subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z) > | 1 > *Error: Symbol ‘pcasmcreatesubdomains2d’ at (1) already has an explicit > interface* > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13: > > 38 | import tIS > | 1 > Error: IMPORT statement at (1) only permitted in an INTERFACE body > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80: > > 39 | PetscInt a ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80: > > 40 | PetscInt b ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80: > > 41 | PetscInt c ! 
PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80: > > 42 | PetscInt d ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80: > > 43 | PetscInt e ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80: > > 44 | PetscInt f ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80: > > 45 | PetscInt g ! PetscInt > | > 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30: > > 46 | IS h ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30: > > 47 | IS i ! IS > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43: > > 48 | PetscErrorCode z > | 1 > Error: Unexpected data declaration statement in INTERFACE block at (1) > > /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10: > > 49 | end subroutine PCASMCreateSubdomains2D > | 1 > Error: Expecting END INTERFACE statement at (1) > make[3]: *** [gmakefile:225: > arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1 > make[3]: *** Waiting for unfinished jobs.... > CC > arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o > CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o > CUDAC > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > CUDAC.dep > arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o > make[3]: Leaving directory '/home/mnv/Software/petsc' > make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs] > Error 2 > make[2]: Leaving directory '/home/mnv/Software/petsc' > **************************ERROR************************************* > Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log > Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to > petsc-ma...@mcs.anl.gov > ******************************************************************** > make[1]: *** [makefile:45: all] Error 1 > make: *** [GNUmakefile:9: all] Error 2 > ------------------------------ > *From:* Junchao Zhang <junchao.zh...@gmail.com> > *Sent:* Friday, August 11, 2023 3:04 PM > *To:* Vanella, Marcos (Fed) <marcos.vane...@nist.gov> > *Cc:* petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Macros, > I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack. > We recently refactored the COO code and got rid of that function. So could > you try petsc/main? > We map MPI processes to GPUs in a round-robin fashion. We query the > number of visible CUDA devices (g), and assign the device (rank%g) to the > MPI process (rank). In that sense, the work distribution is totally > determined by your MPI work partition (i.e, yourself). 
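For illustration, the round-robin idea described above amounts to something like the following (a simplified sketch, not the actual PETSc device-initialization code):

  #include <mpi.h>
  #include <cuda_runtime.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
    int rank, ndev = 0, dev;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaGetDeviceCount(&ndev);   /* g = number of visible CUDA devices */
    if (ndev > 0) {
      dev = rank % ndev;         /* rank r gets device r % g */
      cudaSetDevice(dev);
      printf("rank %d -> CUDA device %d of %d visible\n", rank, dev, ndev);
    } else {
      printf("rank %d sees no CUDA device\n", rank);
    }
    MPI_Finalize();
    return 0;
  }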
> On clusters, this MPI process to GPU binding is usually done by the job > scheduler like slurm. You need to check your cluster's users' guide to see > how to bind MPI processes to GPUs. If the job scheduler has done that, the > number of visible CUDA devices to a process might just appear to be 1, > making petsc's own mapping void. > > Thanks. > --Junchao Zhang > > > On Fri, Aug 11, 2023 at 12:43 PM Vanella, Marcos (Fed) < > marcos.vane...@nist.gov> wrote: > > Hi Junchao, thank you for replying. I compiled petsc in debug mode and > this is what I get for the case: > > terminate called after throwing an instance of > 'thrust::system::system_error' > what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an > illegal memory access was encountered > > Program received signal SIGABRT: Process abort signal. > > Backtrace for this error: > #0 0x15264731ead0 in ??? > #1 0x15264731dc35 in ??? > #2 0x15264711551f in ??? > #3 0x152647169a7c in ??? > #4 0x152647115475 in ??? > #5 0x1526470fb7f2 in ??? > #6 0x152647678bbd in ??? > #7 0x15264768424b in ??? > #8 0x1526476842b6 in ??? > #9 0x152647684517 in ??? > #10 0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc > at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224 > #11 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316 > #12 0x55bb46342ebb in > _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544 > #13 0x55bb46342ebb in > _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_ > at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669 > #14 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_ > at /usr/local/cuda/include/thrust/detail/sort.inl:115 > #15 0x55bb46317bc5 in > _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_ > at /usr/local/cuda/include/thrust/detail/sort.inl:305 > #16 0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/ > aijcusparse.cu:4452 > #17 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:173 > #18 0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE > at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/ > mpiaijcusparse.cu:222 > #19 0x55bb468e01cf in MatSetPreallocationCOO > at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606 > #20 0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND > at 
/home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547 > #21 0x55bb469015e5 in MatProductSymbolic > at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803 > #22 0x55bb4694ade2 in MatPtAP > at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897 > #23 0x55bb4696d3ec in MatCoarsenApply_MISK_private > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283 > #24 0x55bb4696eb67 in MatCoarsenApply_MISK > at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368 > #25 0x55bb4695bd91 in MatCoarsenApply > at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97 > #26 0x55bb478294d8 in PCGAMGCoarsen_AGG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524 > #27 0x55bb471d1cb4 in PCSetUp_GAMG > at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631 > #28 0x55bb464022cf in PCSetUp > at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994 > #29 0x55bb4718b8a7 in KSPSetUp > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406 > #30 0x55bb4718f22e in KSPSolve_Private > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824 > #31 0x55bb47192c0c in KSPSolve > at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070 > #32 0x55bb463efd35 in kspsolve_ > at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320 > #33 0x55bb45e94b32 in ??? > #34 0x55bb46048044 in ??? > #35 0x55bb46052ea1 in ??? > #36 0x55bb45ac5f8e in ??? > #37 0x1526470fcd8f in ??? > #38 0x1526470fce3f in ??? > #39 0x55bb45aef55d in ??? > #40 0xffffffffffffffff in ??? > -------------------------------------------------------------------------- > Primary job terminated normally, but 1 process returned > a non-zero exit code. Per user-direction, the job has been aborted. > -------------------------------------------------------------------------- > -------------------------------------------------------------------------- > mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited > on signal 6 (Aborted). > -------------------------------------------------------------------------- > > BTW, I'm curious. If I set n MPI processes, each of them building a part > of the linear system, and g GPUs, how does PETSc distribute those n pieces > of system matrix and rhs in the g GPUs? Does it do some load balancing > algorithm? Where can I read about this? > Thank you and best Regards, I can also point you to my code repo in GitHub > if you want to take a closer look. > > Best Regards, > Marcos > > ------------------------------ > *From:* Junchao Zhang <junchao.zh...@gmail.com> > *Sent:* Friday, August 11, 2023 10:52 AM > *To:* Vanella, Marcos (Fed) <marcos.vane...@nist.gov> > *Cc:* petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov> > *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi > processes and 1 GPU > > Hi, Marcos, > Could you build petsc in debug mode and then copy and paste the whole > error stack message? > > Thanks > --Junchao Zhang > > > On Thu, Aug 10, 2023 at 5:51 PM Vanella, Marcos (Fed) via petsc-users < > petsc-users@mcs.anl.gov> wrote: > > Hi, I'm trying to run a parallel matrix vector build and linear solution > with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix > build and solution is successful in CPUs only. I'm using cuda 11.5 and cuda > enabled openmpi and gcc 9.3. 
> When I run the job with GPU enabled I get the following error (both MPI processes
> print the same thing):
>
> terminate called after throwing an instance of 'thrust::system::system_error'
>   what():  merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
>
> I'm new to submitting jobs in slurm that also use GPU resources, so I might be
> doing something wrong in my submission script. This is it:
>
> #!/bin/bash
> #SBATCH -J test
> #SBATCH -e /home/Issues/PETSc/test.err
> #SBATCH -o /home/Issues/PETSc/test.log
> #SBATCH --partition=batch
> #SBATCH --ntasks=2
> #SBATCH --nodes=1
> #SBATCH --cpus-per-task=1
> #SBATCH --ntasks-per-node=2
> #SBATCH --time=01:00:00
> #SBATCH --gres=gpu:1
>
> export OMP_NUM_THREADS=1
> module load cuda/11.5
> module load openmpi/4.1.1
>
> cd /home/Issues/PETSc
> mpirun -n 2 /home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg
>
> If anyone has any suggestions on how to troubleshoot this please let me know.
> Thanks!
> Marcos
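One quick way to troubleshoot the resource side of this, before looking at the solver, is to run a tiny diagnostic under the exact same submission script and mpirun line and see what each rank reports. This is a hypothetical helper (not part of FDS or PETSc), just plain MPI plus the CUDA runtime:

  #include <mpi.h>
  #include <cuda_runtime.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
    int rank, ndev = 0, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (cudaGetDeviceCount(&ndev) != cudaSuccess) ndev = 0; /* e.g. cudaErrorNoDevice */
    printf("rank %d: %d visible CUDA device(s)\n", rank, ndev);
    for (i = 0; i < ndev; i++) {
      struct cudaDeviceProp prop;
      cudaGetDeviceProperties(&prop, i);
      printf("rank %d: device %d = %s\n", rank, i, prop.name);
    }
    MPI_Finalize();
    return 0;
  }

If both ranks report the single V100 (or one rank reports none), that already narrows down whether the crash is a GPU-access problem or something inside the matrix offload.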