And the nvidia build is OK:

12:16 nid002872 main= perlmutter:~/petsc$ make PETSC_DIR=/global/homes/m/madams/petsc PETSC_ARCH=arch-perlmutter-opt-nvidia-kokkos-cuda -f gmakefile test search='snes_tutorials-ex19_cuda'
Using MAKEFLAGS: -- search=snes_tutorials-ex19_cuda PETSC_ARCH=arch-perlmutter-opt-nvidia-kokkos-cuda PETSC_DIR=/global/homes/m/madams/petsc
CC arch-perlmutter-opt-nvidia-kokkos-cuda/tests/snes/tutorials/ex19.o
CLINKER arch-perlmutter-opt-nvidia-kokkos-cuda/tests/snes/tutorials/ex19
TEST arch-perlmutter-opt-nvidia-kokkos-cuda/tests/counts/snes_tutorials-ex19_cuda.counts
ok snes_tutorials-ex19_cuda
ok diff-snes_tutorials-ex19_cuda
12:17 nid002872 main= perlmutter:~/petsc$
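For comparing the same test across builds, a small script can reduce the harness output to one status per test. What follows is only a minimal sketch, not part of the PETSc test harness: the function name `summarize`, the status labels, and the "worst result wins" ranking are assumptions made here for illustration. It expects the raw `ok` / `not ok` lines as printed by `make test`, without mail quoting.

```python
import re

# Matches result lines such as:
#   ok snes_tutorials-ex19_cuda
#   not ok snes_tutorials-ex19_cuda # Error code: 97
#   ok snes_tutorials-ex19_cuda # SKIP Command failed so no diff
RESULT_RE = re.compile(r"^\s*(not ok|ok)\s+(\S+)(?:\s+#\s*(.*))?\s*$")

# Hypothetical severity ranking: a failure is never overwritten by a later
# "ok ... # SKIP" line reported for the same test name.
RANK = {"ok": 0, "skip": 1, "fail": 2}


def summarize(output):
    """Return {test_name: 'ok' | 'skip' | 'fail'} for raw harness output."""
    results = {}
    for line in output.splitlines():
        match = RESULT_RE.match(line)
        if match is None:
            continue  # diagnostic lines ("# [0]PETSC ERROR: ...") are ignored
        verdict, name, comment = match.groups()
        if verdict == "not ok":
            status = "fail"
        elif comment and comment.startswith("SKIP"):
            status = "skip"
        else:
            status = "ok"
        previous = results.get(name)
        if previous is None or RANK[status] > RANK[previous]:
            results[name] = status
    return results


if __name__ == "__main__":
    import sys

    # Example use: make ... test search='...' 2>&1 | python summarize.py
    for test_name, status in sorted(summarize(sys.stdin.read()).items()):
        print(f"{status:5s} {test_name}")
```

Keeping the worst status per test name matters for the failing gcc build, where the harness prints `not ok` for the run and then `ok ... # SKIP` for the diff under the same name.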
On Fri, Jan 7, 2022 at 2:23 PM Mark Adams <mfad...@lbl.gov> wrote:

> And it looks universal:
>
> 11:21 nid001544 main= perlmutter:~/petsc$ make
> PETSC_ARCH=arch-perlmutter-opt-gcc-kokkos-cuda -f gmakefile test
> search='snes_tutorials-ex19_cuda'
> Using MAKEFLAGS: -- search=snes_tutorials-ex19_cuda
> PETSC_ARCH=arch-perlmutter-opt-gcc-kokkos-cuda
> CC
> arch-perlmutter-opt-gcc-kokkos-cuda/tests/snes/tutorials/ex19.o
> CLINKER arch-perlmutter-opt-gcc-kokkos-cuda/tests/snes/tutorials/ex19
> TEST
> arch-perlmutter-opt-gcc-kokkos-cuda/tests/counts/snes_tutorials-ex19_cuda.counts
> # retrying snes_tutorials-ex19_cuda
> not ok snes_tutorials-ex19_cuda # Error code: 97
> # lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
> # [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> # [0]PETSC ERROR: GPU error
> # [0]PETSC ERROR: cuBLAS error 13 (CUBLAS_STATUS_EXECUTION_FAILED)
> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> # [0]PETSC ERROR: Petsc Development GIT revision: v3.16.3-511-g96172674f3
> GIT Date: 2022-01-06 23:44:32 +0000
> # [0]PETSC ERROR:
> /global/u2/m/madams/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/tests/snes/tutorials/runex19_cuda/../ex19
> on a arch-perlmutter-opt-gcc-kokkos-cuda named nid001544 by madams Fri Jan
> 7 11:22:35 2022
> # [0]PETSC ERROR: Configure options --CFLAGS=" -g -DLANDAU_DIM=2
> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g -DLANDAU_DIM=2
> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g -Xcompiler
> -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4"
> --with-cc=cc --with-cxx=CC --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91
> --with-cudac=/global/common/software/nersc/cos1.3/cuda/11.3.0/bin/nvcc
> --COPTFLAGS=" -O3" --CXXOPTFLAGS=" -O3" --FOPTFLAGS=" -O3"
> --with-debugging=0 --download-metis --download-parmetis --with-cuda=1
> --with-cuda-arch=80 --with-mpiexec=srun --with-batch=0 --download-p4est=1
> --with-zlib=1 --download-kokkos --download-kokkos-kernels
> --with-kokkos-kernels-tpl=0 --with-make-np=8
> PETSC_ARCH=arch-perlmutter-opt-gcc-kokkos-cuda
> # [0]PETSC ERROR: #1 VecNorm_SeqCUDA() at
> /global/u2/m/madams/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu:994
> # [0]PETSC ERROR: #2 VecNorm() at
> /global/u2/m/madams/petsc/src/vec/vec/interface/rvector.c:228
> # [0]PETSC ERROR: #3 SNESSolve_NEWTONLS() at
> /global/u2/m/madams/petsc/src/snes/impls/ls/ls.c:179
> # [0]PETSC ERROR: #4 SNESSolve() at
> /global/u2/m/madams/petsc/src/snes/interface/snes.c:4810
> # [0]PETSC ERROR: #5 main() at
> /global/u2/m/madams/petsc/src/snes/tutorials/ex19.c:159
> # [0]PETSC ERROR: PETSc Option Table entries:
> # [0]PETSC ERROR: -check_pointer_intensity 0
> # [0]PETSC ERROR: -dm_mat_type aijcusparse
> # [0]PETSC ERROR: -dm_vec_type cuda
> # [0]PETSC ERROR: -error_output_stdout
> # [0]PETSC ERROR: -ksp_type fgmres
> # [0]PETSC ERROR: -malloc_dump
> # [0]PETSC ERROR: -nox
> # [0]PETSC ERROR: -nox_warning
> # [0]PETSC ERROR: -pc_type none
> # [0]PETSC ERROR: -snes_monitor_short
> # [0]PETSC ERROR: -snes_rtol 1.e-5
> # [0]PETSC ERROR: -use_gpu_aware_mpi 0
> # [0]PETSC ERROR: ----------------End of Error Message -------send entire
> error message to petsc-ma...@mcs.anl.gov----------
> # MPICH Notice [Rank 0] [job id 1041592.1] [Fri Jan 7 11:22:36 2022]
> [nid001544] - Abort(97) (rank 0 in comm 0): application called
> MPI_Abort(MPI_COMM_WORLD, 97) - process 0
> #
> # Kokkos::Cuda ERROR: Failed to call Kokkos::Cuda::finalize()
> # srun: error: nid001544: task 0: Exited with exit code 97
> # srun: launch/slurm: _step_signal: Terminating StepId=1041592.1
> ok snes_tutorials-ex19_cuda # SKIP Command failed so no diff
>
>
>
> On Fri, Jan 7, 2022 at 2:20 PM Mark Adams <mfad...@lbl.gov> wrote:
>
>> Well, this sure looks like it is deterministic:
>>
>> 11:00 130 nid002645 main= perlmutter:~/petsc$ make
>> PETSC_ARCH=arch-perlmutter-opt-gcc-kokkos-cuda -f gmakefile test
>> search='ts_utils_dmplexlandau_tutorials-ex1_cuda'
>> Using MAKEFLAGS: -- search=ts_utils_dmplexlandau_tutorials-ex1_cuda
>> PETSC_ARCH=arch-perlmutter-opt-gcc-kokkos-cuda
>> CC
>> arch-perlmutter-opt-gcc-kokkos-cuda/tests/ts/utils/dmplexlandau/tutorials/ex1.o
>> CLINKER
>> arch-perlmutter-opt-gcc-kokkos-cuda/tests/ts/utils/dmplexlandau/tutorials/ex1
>> TEST
>> arch-perlmutter-opt-gcc-kokkos-cuda/tests/counts/ts_utils_dmplexlandau_tutorials-ex1_cuda.counts
>> # retrying ts_utils_dmplexlandau_tutorials-ex1_cuda
>> not ok ts_utils_dmplexlandau_tutorials-ex1_cuda # Error code: 97
>> # masses: e= 9.109e-31; ions in proton mass units: 2.000e+00
>> 4.000e+00 ...
>> # charges: e=-1.602e-19; charges in elementary units: 1.000e+00
>> 1.800e+01
>> # n: e: 1.000e+00 i: 1.000e+00
>> 1.000e-05
>> # thermal T (K): e= 5.802e+07 i= 5.802e+07 5.802e+06. v_0= 2.965e+07 (
>> 9.892e-02c) n_0= 1.000e+20 t_0= 6.470e-05, classical, Intuitive, 1 batched
>> # Domain radius (AMR levels) grid 0: 5. (2) , 1: 8.252e-02 (1)
>> # 0) FormLandau 352 IPs, 22 cells total, Nb=16, Nq=16, dim=2, Tab: Nb=16
>> Nf=3 Np=16 cdim=2 N=324
>> # 0) species-0: charge density= -1.6024538233648e+01 z-momentum=
>> -1.7133689250463e-19 energy= 1.2009868166183e+05
>> # 0) species-1: charge density= 1.6068752193414e+01 z-momentum=
>> -5.3757901114069e-19 energy= 1.1752123333205e+05
>> # 0) species-2: charge density= 2.7152642046328e-04 z-momentum=
>> 3.4016005887213e-21 energy= 1.1701658725855e+00
>> # 0) Total: charge density= 4.4485486187274e-02, momentum=
>> -7.0551430305659e-19, energy= 2.3762108515975e+05 (m_i[0]/m_e = 3670.94,
>> 14 cells on electron grid)
>> # 0 TS dt 0.1 time 0.
>> # [0]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> # [0]PETSC ERROR: GPU error
>> # [0]PETSC ERROR: cuBLAS error 13 (CUBLAS_STATUS_EXECUTION_FAILED)
>> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble
>> shooting.
>> # [0]PETSC ERROR: Petsc Development GIT revision: v3.16.3-511-g96172674f3
>> GIT Date: 2022-01-06 23:44:32 +0000
>> # [0]PETSC ERROR:
>> /global/u2/m/madams/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/tests/ts/utils/dmplexlandau/tutorials/runex1_cuda/../ex1
>> on a arch-perlmutter-opt-gcc-kokkos-cuda named nid002645 by madams Fri Jan
>> 7 11:15:42 2022
>> # [0]PETSC ERROR: Configure options --CFLAGS=" -g -DLANDAU_DIM=2
>> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g -DLANDAU_DIM=2
>> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g -Xcompiler
>> -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4"
>> --with-cc=cc --with-cxx=CC --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91
>> --with-cudac=/global/common/software/nersc/cos1.3/cuda/11.3.0/bin/nvcc
>> --COPTFLAGS=" -O3" --CXXOPTFLAGS=" -O3" --FOPTFLAGS=" -O3"
>> --with-debugging=0 --download-metis --download-parmetis --with-cuda=1
>> --with-cuda-arch=80 --with-mpiexec=srun --with-batch=0 --download-p4est=1
>> --with-zlib=1 --download-kokkos --download-kokkos-kernels
>> --with-kokkos-kernels-tpl=0 --with-make-np=8
>> PETSC_ARCH=arch-perlmutter-opt-gcc-kokkos-cuda
>> # [0]PETSC ERROR: #1 VecNorm_SeqCUDA() at
>> /global/u2/m/madams/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu:994
>> # [0]PETSC ERROR: #2 VecNorm() at
>> /global/u2/m/madams/petsc/src/vec/vec/interface/rvector.c:228
>> # [0]PETSC ERROR: #3 SNESSolve_NEWTONLS() at
>> /global/u2/m/madams/petsc/src/snes/impls/ls/ls.c:179
>> # [0]PETSC ERROR: #4 SNESSolve() at
>> /global/u2/m/madams/petsc/src/snes/interface/snes.c:4810
>> # [0]PETSC ERROR: #5 TSStep_ARKIMEX() at
>> /global/u2/m/madams/petsc/src/ts/impls/arkimex/arkimex.c:845
>> # [0]PETSC ERROR: #6 TSStep() at
>> /global/u2/m/madams/petsc/src/ts/interface/ts.c:3572
>> # [0]PETSC ERROR: #7 TSSolve() at
>> /global/u2/m/madams/petsc/src/ts/interface/ts.c:3971
>> # [0]PETSC ERROR: #8 main() at
>> /global/u2/m/madams/petsc/src/ts/utils/dmplexlandau/tutorials/ex1.c:45
>> # [0]PETSC ERROR: PETSc Option Table entries:
>> # [0]PETSC ERROR: -check_pointer_intensity 0
>> # [0]PETSC ERROR: -dm_landau_amr_levels_max 2,1
>> # [0]PETSC ERROR: -dm_landau_device_type cuda
>> # [0]PETSC ERROR: -dm_landau_ion_charges 1,18
>>
>> On Fri, Jan 7, 2022 at 1:52 PM Junchao Zhang <junchao.zh...@gmail.com>
>> wrote:
>>
>>>
>>>
>>>
>>> On Fri, Jan 7, 2022 at 11:17 AM Mark Adams <mfad...@lbl.gov> wrote:
>>>
>>>> These are cuda/cusparse tests. The Kokkos versions are fine and
>>>> cusparse w/o a Kokkos build is fine.
>>>>
>>>> I do have some #ifdefs in the code. Maybe something snuck into the
>>>> #ifdef KOKKOS, but I can't imagine what that could even be.
>>>>
>>>> I have had problems with very large "cuda" jobs (on Summit with 21 MPI
>>>> processes per GPU) running out of "resources" with a Kokkos build, that
>>>> went away with a pure CUDA build (ie, w/o Kokkos), but these are tiny
>>>> tests.
>>>>
>>> If Kokkos is initialized on MPI ranks, then each rank will consume
>>> resources on GPU.
>>>
>>>>
>>>> I will try it again.
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> On Fri, Jan 7, 2022 at 12:06 PM Junchao Zhang <junchao.zh...@gmail.com>
>>>> wrote:
>>>>
>>>>> It failed when you did not even pass any vec/mat kokkos options? It
>>>>> does not make sense and you need to double check that.
>>>>> --Junchao Zhang
>>>>>
>>>>>
>>>>> On Thu, Jan 6, 2022 at 9:33 PM Mark Adams <mfad...@lbl.gov> wrote:
>>>>>
>>>>>> I seem to have a regression with using aijcusparse in a kokkos build.
>>>>>> It's OK with a straight CUDA build.
>>>>>>
>>>>>> # [0]PETSC ERROR: --------------------- Error Message
>>>>>> --------------------------------------------------------------
>>>>>> # [0]PETSC ERROR: GPU error
>>>>>> # [0]PETSC ERROR: cuBLAS error 13 (CUBLAS_STATUS_EXECUTION_FAILED)
>>>>>> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble
>>>>>> shooting.
>>>>>> # [0]PETSC ERROR: Petsc Development GIT revision:
>>>>>> v3.16.3-511-g96172674f3 GIT Date: 2022-01-06 23:44:32 +0000
>>>>>> # [0]PETSC ERROR:
>>>>>> /global/u2/m/madams/petsc_install/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/tests/ts/utils/dmplexlandau/tutorials/runex1_cuda/../ex1
>>>>>> on a arch-perlmutter-opt-gcc-kokkos-cuda named nid003188 by madams Thu
>>>>>> Jan
>>>>>> 6 19:29:06 2022
>>>>>> # [0]PETSC ERROR: Configure options --CFLAGS=" -g -DLANDAU_DIM=2
>>>>>> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g -DLANDAU_DIM=2
>>>>>> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g -Xcompiler
>>>>>> -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4"
>>>>>> --with-cc=cc --with-cxx=CC --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91
>>>>>> --with-cudac=/global/common/software/nersc/cos1.3/cuda/11.3.0/bin/nvcc
>>>>>> --COPTFLAGS=" -O3" --CXXOPTFLAGS=" -O3" --FOPTFLAGS=" -O3"
>>>>>> --with-debugging=0 --download-metis --download-parmetis --with-cuda=1
>>>>>> --with-cuda-arch=80 --with-mpiexec=srun --with-batch=0 --download-p4est=1
>>>>>> --with-zlib=1 --download-kokkos --download-kokkos-kernels
>>>>>> --with-kokkos-kernels-tpl=0 --with-make-np=8
>>>>>> PETSC_DIR=/global/homes/m/madams/petsc_install/petsc
>>>>>> PETSC_ARCH=arch-perlmutter-opt-gcc-kokkos-cuda
>>>>>> # [0]PETSC ERROR: #1 VecNorm_SeqCUDA() at
>>>>>> /global/u2/m/madams/petsc_install/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu:994
>>>>>> # [0]PETSC ERROR: #2 VecNorm() at
>>>>>> /global/u2/m/madams/petsc_install/petsc/src/vec/vec/interface/rvector.c:228
>>>>>> # [0]PETSC ERROR: #3 SNESSolve_NEWTONLS() at
>>>>>> /global/u2/m/madams/petsc_install/petsc/src/snes/impls/ls/ls.c:179
>>>>>> # [0]PETSC ERROR: #4 SNESSolve() at
>>>>>> /global/u2/m/madams/petsc_install/petsc/src/snes/interface/snes.c:4810
>>>>>> # [0]PETSC ERROR: #5 TSStep_ARKIMEX() at
>>>>>> /global/u2/m/madams/petsc_install/petsc/src/ts/impls/arkimex/arkimex.c:845
>>>>>> # [0]PETSC ERROR: #6 TSStep() at
>>>>>> /global/u2/m/madams/petsc_install/petsc/src/ts/interface/ts.c:3572
>>>>>> # [0]PETSC ERROR: #7 TSSolve() at
>>>>>> /global/u2/m/madams/petsc_install/petsc/src/ts/interface/ts.c:3971
>>>>>> # [0]PETSC ERROR: #8 main() at
>>>>>> /global/u2/m/madams/petsc_install/petsc/src/ts/utils/dmplexlandau/tutorials/ex1.c:45
>>>>>> # [0]PETSC ERROR: PETSc Option Table entries:
>>>>>> # [0]PETSC ERROR: -check_pointer_intensity 0
>>>>>> # [0]PETSC ERROR: -dm_landau_amr_levels_max 2,1
>>>>>> # [0]PETSC ERROR: -dm_landau_device_type cuda
>>>>>> # [0]PETSC ERROR: -dm_landau_ion_charges 1,18
>>>>>> # [0]PETSC ERROR: -dm_landau_ion_masses 2,4
>>>>>> # [0]PETSC ERROR: -dm_landau_n 1.00018,1,1e-5
>>>>>> # [0]PETSC ERROR: -dm_landau_n_0 1e20
>>>>>> # [0]PETSC ERROR: -dm_landau_num_species_grid 1,2
>>>>>> # [0]PETSC ERROR: -dm_landau_thermal_temps 5,5,.5
>>>>>> # [0]PETSC ERROR: -dm_landau_type p4est
>>>>>> # [0]PETSC ERROR: -dm_mat_type aijcusparse
>>>>>> # [0]PETSC ERROR: -dm_preallocate_only false
>>>>>> # [0]PETSC ERROR: -dm_vec_type cuda
>>>>>> # [0]PETSC ERROR: -error_output_stdout
>>>>>> # [0]PETSC ERROR: -ksp_type preonly
>>>>>> # [0]PETSC ERROR: -malloc_dump
>>>>>> # [0]PETSC ERROR: -mat_cusparse_use_cpu_solve
>>>>>> # [0]PETSC ERROR: -nox
>>>>>> # [0]PETSC ERROR: -nox_warning
>>>>>> # [0]PETSC ERROR: -pc_type lu
>>>>>> # [0]PETSC ERROR: -petscspace_degree 3
>>>>>> # [0]PETSC ERROR: -petscspace_poly_tensor 1
>>>>>> # [0]PETSC ERROR: -snes_converged_reason
>>>>>> # [0]PETSC ERROR: -snes_monitor
>>>>>> # [0]PETSC ERROR: -snes_rtol 1.e-14
>>>>>> # [0]PETSC ERROR: -snes_stol 1.e-14
>>>>>> # [0]PETSC ERROR: -ts_adapt_clip .5,1.25
>>>>>> # [0]PETSC ERROR: -ts_adapt_scale_solve_failed 0.75
>>>>>> # [0]PETSC ERROR: -ts_adapt_time_step_increase_delay 5
>>>>>> # [0]PETSC ERROR: -ts_arkimex_type 1bee
>>>>>> # [0]PETSC ERROR: -ts_dt 1.e-1
>>>>>> # [0]PETSC ERROR: -ts_max_snes_failures -1
>>>>>> # [0]PETSC ERROR: -ts_max_steps 1
>>>>>> # [0]PETSC ERROR: -ts_max_time 1
>>>>>> # [0]PETSC ERROR: -ts_monitor
>>>>>> # [0]PETSC ERROR: -ts_rtol 1e-1
>>>>>> # [0]PETSC ERROR: -ts_type arkimex
>>>>>> # [0]PETSC ERROR: -use_gpu_aware_mpi 0
>>>>>> # [0]PETSC ERROR: ----------------End of Error Message -------send
>>>>>> entire error message to petsc-ma...@mcs.anl.gov----------
>>>>>>
>>>>>