> > > > Configure sets that automatically from --with-hip-arch - which is > auto-detected from 'rocminfo' [which appears to work on spock] > > OK, removed that.
> Perhaps it should also set --with-magma-gputarget the same way. > > > On Fri, Dec 10, 2021 at 11:08 AM Mark Adams <mfad...@lbl.gov> wrote: > > > > > It seems to be hanging on the 2 processor test. > > > I'll try running jobs manually. > > > Hm - perhaps the srun command you need is different? > > '--with-mpiexec=srun -p ecp -N 1 -A csc314 -t 00:10:00' > I use this now. I am a member of csc314. Did you ever get a debugger to work? gdb seems stuck 'reading symbols' > > Satish > > > > > > > On Fri, Dec 10, 2021 at 9:34 AM Satish Balay <ba...@mcs.anl.gov> > wrote: > > > > > >> Merged now. And the following now works [for me]. > > >> > > >> 1025 git fetch -p > > >> 1026 git checkout origin/main > > >> 1027 ./config/examples/arch-olcf-spock.py && make > > >> 1028 MPIR_CVAR_GPU_EAGER_DEVICE_MEM=0 MPICH_GPU_SUPPORT_ENABLED=1 > > >> MPICH_SMP_SINGLE_COPY_MODE=CMA make check > > >> > > >> Satish > > >> > > >> On Fri, 10 Dec 2021, Satish Balay via petsc-dev wrote: > > >> > > >> > Works for me [per instructions in balay/update-spock, > > >> config/examples/arch-olcf-spock.py] with main - without these > additional > > >> options > > >> > > > >> > I'll go ahead and merge in balay/update-spock > > >> > > > >> > Satish > > >> > > > >> > ----- > > >> > > > >> > 1009 git fetch -p > > >> > 1015 module load emacs > > >> > 1016 module load rocm/4.3.0 > > >> > 1018 git reset --hard > > >> > 1019 git checkout origin/main > > >> > 1020 git merge origin/balay/update-spock > > >> > 1021 ./config/examples/arch-olcf-spock.py && make > > >> > > > >> > > > >> > > > >> > [balay@login2.spock petsc]$ MPIR_CVAR_GPU_EAGER_DEVICE_MEM=0 > > >> MPICH_GPU_SUPPORT_ENABLED=1 MPICH_SMP_SINGLE_COPY_MODE=CMA make check > > >> > Running check examples to verify correct installation > > >> > Using PETSC_DIR=/autofs/nccs-svm1_home1/balay/petsc and > > >> PETSC_ARCH=arch-olcf-spock > > >> > C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI > > >> process > > >> > C/C++ example 
src/snes/tutorials/ex19 run successfully with 2 MPI > > >> processes > > >> > C/C++ example src/snes/tutorials/ex3k run successfully with > > >> kokkos-kernels > > >> > *******************Error detected during compile or > > >> link!******************* > > >> > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > >> > /ccs/home/balay/petsc/src/snes/tutorials ex5f > > >> > ********************************************************* > > >> > ftn -fPIC -fPIC -I/autofs/nccs-svm1_home1/balay/petsc/include > > >> -I/autofs/nccs-svm1_home1/balay/petsc/arch-olcf-spock/include > > >> -I/opt/rocm-4.3.0/include ex5f.F90 > > >> -Wl,-rpath,/autofs/nccs-svm1_home1/balay/petsc/arch-olcf-spock/lib > > >> -L/autofs/nccs-svm1_home1/balay/petsc/arch-olcf-spock/lib > > >> -Wl,-rpath,/autofs/nccs-svm1_home1/balay/petsc/arch-olcf-spock/lib > > >> -L/autofs/nccs-svm1_home1/balay/petsc/arch-olcf-spock/lib > > >> -Wl,-rpath,/opt/rocm-4.3.0/lib -L/opt/rocm-4.3.0/lib > > >> -Wl,-rpath,/opt/cray/pe/mpich/8.1.10/gtl/lib > > >> -L/opt/cray/pe/mpich/8.1.10/gtl/lib > > >> -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 > > >> -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/ > > >> 21.08.1.2/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/ > > >> 21.08.1.2/CRAY/9.0/x86_64/lib > > >> -Wl,-rpath,/opt/cray/pe/mpich/8.1.10/ofi/cray/10.0/lib > > >> -L/opt/cray/pe/mpich/8.1.10/ofi/cray/10.0/lib > > >> -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib > > >> -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib > -Wl,-rpath,/opt/cray/pe/pmi/6.0.14/lib > > >> -L/opt/cray/pe/pmi/6 > > >> > .0.14/li > > >> > b -Wl,-rpath,/opt/cray/pe/cce/12.0.3/cce/x86_64/lib > > >> -L/opt/cray/pe/cce/12.0.3/cce/x86_64/lib > > >> -Wl,-rpath,/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 > > >> -L/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 > > >> > -Wl,-rpath,/opt/cray/pe/cce/12.0.3/cce-clang/x86_64/lib/clang/12.0.0/lib/linux > > >> -L/opt/cray/pe/cce/12.0.3/cce-clang/x86_64/lib/clang/12.0.0/lib/linux > > >> 
-Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 > > >> -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 > > >> > -Wl,-rpath,/opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-unknown-linux-gnu/lib > > >> -L/opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-unknown-linux-gnu/lib > > >> -lpetsc -lmagma -lkokkoskernels -lkokkoscontainers -lkokkoscore > -lhipsparse > > >> -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 > -lstdc++ > > >> -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 > -lxpmem > > >> -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup > -lgfortran > > >> -lpthread -lgcc_eh -lm -lclang_rt.craypg > > >> > o-x86_64 > > >> > -lclang_rt.builtins-x86_64 -lquadmath -lstdc++ -ldl -lmpi_gtl_hsa > -o > > >> > ex5f/opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld: > > >> warning: alignment 128 of symbol > > >> `$host_init$$runtime_init_for_iso_c_binding$iso_c_binding_' in > > >> /opt/cray/pe/cce/12.0.3/cce/x86_64/lib/libmodules.so is smaller than > 256 in > > >> /tmp/pe_202599/ex5f_1.o > > >> > /opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld: > > >> warning: alignment 64 of symbol `$data_init$iso_c_binding_' in > > >> /opt/cray/pe/cce/12.0.3/cce/x86_64/lib/libmodules.so is smaller than > 256 in > > >> /tmp/pe_202599/ex5f_1.o > > >> > Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI > > >> process > > >> > Completed test examples > > >> > [balay@login2.spock petsc]$ > > >> > > > >> > > > >> > On Fri, 10 Dec 2021, Mark Adams wrote: > > >> > > > >> > > FWIW, here is my current status. 
> > >> > > > > >> > > 08:08 main= spock:/gpfs/alpine/csc314/scratch/adams/petsc$ make > > >> > > PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc > > >> > > PETSC_ARCH=arch-olcf-spock check > > >> > > Running check examples to verify correct installation > > >> > > Using PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc and > > >> > > PETSC_ARCH=arch-olcf-spock > > >> > > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI > > >> process > > >> > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > >> > > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. > > >> > > 0 KSP Residual norm 0.0406612 > > >> > > 1 KSP Residual norm 0.036923 > > >> > > 2 KSP Residual norm 0.0191849 > > >> > > 3 KSP Residual norm 0.00201589 > > >> > > 4 KSP Residual norm 0.000376045 > > >> > > 5 KSP Residual norm 4.2974e-05 > > >> > > 6 KSP Residual norm 5.96585e-06 > > >> > > 7 KSP Residual norm 4.5398e-07 > > >> > > 8 KSP Residual norm 6.30474e-08 > > >> > > 9 KSP Residual norm 5.55518e-09 > > >> > > 10 KSP Residual norm 6.180e-10 > > >> > > 11 KSP Residual norm 6.211e-11 > > >> > > Linear solve converged due to CONVERGED_RTOL iterations 11 > > >> > > 0 KSP Residual norm 3.32845e-06 > > >> > > 1 KSP Residual norm 9.0003e-07 > > >> > > 2 KSP Residual norm 1.32594e-07 > > >> > > 3 KSP Residual norm 1.49857e-08 > > >> > > 4 KSP Residual norm 1.31887e-09 > > >> > > 5 KSP Residual norm 2.105e-10 > > >> > > 6 KSP Residual norm 2.827e-11 > > >> > > 7 KSP Residual norm < 1.e-11 > > >> > > 8 KSP Residual norm < 1.e-11 > > >> > > 9 KSP Residual norm < 1.e-11 > > >> > > 10 KSP Residual norm < 1.e-11 > > >> > > Linear solve converged due to CONVERGED_RTOL iterations 10 > > >> > > Number of SNES iterations = 2 > > >> > > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI > > >> processes > > >> > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > >> > > lid velocity = 0.0016, prandtl # = 1., grashof # = 1. 
> > >> > > 0 KSP Residual norm 0.0406612 > > >> > > 1 KSP Residual norm 0.0281101 > > >> > > 2 KSP Residual norm 0.00773873 > > >> > > 3 KSP Residual norm 0.00165731 > > >> > > 4 KSP Residual norm 0.000395614 > > >> > > 5 KSP Residual norm 8.67655e-05 > > >> > > 6 KSP Residual norm 1.69495e-05 > > >> > > 7 KSP Residual norm 3.70051e-06 > > >> > > 8 KSP Residual norm 5.97067e-07 > > >> > > 9 KSP Residual norm 1.02242e-07 > > >> > > 10 KSP Residual norm 1.75727e-08 > > >> > > 11 KSP Residual norm 3.84826e-09 > > >> > > 12 KSP Residual norm 6.414e-10 > > >> > > 13 KSP Residual norm 1.380e-10 > > >> > > Linear solve converged due to CONVERGED_RTOL iterations 13 > > >> > > 0 KSP Residual norm 3.32846e-06 > > >> > > 1 KSP Residual norm 8.99139e-07 > > >> > > 2 KSP Residual norm 1.72893e-07 > > >> > > 3 KSP Residual norm 3.733e-08 > > >> > > 4 KSP Residual norm 6.67427e-09 > > >> > > 5 KSP Residual norm 1.22785e-09 > > >> > > 6 KSP Residual norm 2.551e-10 > > >> > > 7 KSP Residual norm 5.458e-11 > > >> > > 8 KSP Residual norm 1.050e-11 > > >> > > 9 KSP Residual norm < 1.e-11 > > >> > > 10 KSP Residual norm < 1.e-11 > > >> > > 11 KSP Residual norm < 1.e-11 > > >> > > 12 KSP Residual norm < 1.e-11 > > >> > > Linear solve converged due to CONVERGED_RTOL iterations 12 > > >> > > Number of SNES iterations = 2 > > >> > > 3,5c3,14 > > >> > > < 1 SNES Function norm 4.12227e-06 > > >> > > < 2 SNES Function norm 6.098e-11 > > >> > > < Number of SNES iterations = 2 > > >> > > --- > > >> > > > 0 KSP Residual norm 0.0406612 > > >> > > > 1 KSP Residual norm 0.21263 > > >> > > > 2 KSP Residual norm 1.09192 > > >> > > > 3 KSP Residual norm 6.9087 > > >> > > > 4 KSP Residual norm 23.4292 > > >> > > > 5 KSP Residual norm 57.7558 > > >> > > > 6 KSP Residual norm 118.076 > > >> > > > 7 KSP Residual norm 213.527 > > >> > > > 8 KSP Residual norm 354.101 > > >> > > > 9 KSP Residual norm 550.58 > > >> > > > Linear solve did not converge due to DIVERGED_DTOL iterations > 9 > > >> > > > Number of 
SNES iterations = 0 > > >> > > /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials > > >> > > Possible problem with ex19 running with hypre, diffs above > > >> > > ========================================= > > >> > > gmake[3]: [makefile:115: runex3k_kokkos] Error 134 (ignored) > > >> > > 21,25c21,26 > > >> > > < 1 SNES Function norm 2.952582418265e-01 > > >> > > < 2 SNES Function norm 4.502293658739e-04 > > >> > > < 3 SNES Function norm 1.389665806646e-09 > > >> > > < Number of SNES iterations = 3 > > >> > > < Norm of error 1.49752e-10 Iterations 3 > > >> > > --- > > >> > > > Memory access fault by GPU node-4 (Agent handle: 0xb08c90) on > > >> address > > >> > > 0xe17000. Reason: Page not present or supervisor privilege. > > >> > > > Memory access fault by GPU node-5 (Agent handle: 0xb0d3c0) on > > >> address > > >> > > 0xe11000. Reason: Page not present or supervisor privilege. > > >> > > > srun: error: spock25: task 0: Aborted > > >> > > > srun: launch/slurm: _step_signal: Terminating StepId=304034.3 > > >> > > > slurmstepd: error: *** STEP 304034.3 ON spock25 CANCELLED AT > > >> > > 2021-12-10T08:08:40 *** > > >> > > > srun: error: spock25: task 1: Aborted (core dumped) > > >> > > /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials > > >> > > Possible problem with ex3k running with kokkos-kernels, diffs > above > > >> > > ========================================= > > >> > > *******************Error detected during compile or > > >> link!******************* > > >> > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > >> > > /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials ex5f > > >> > > ********************************************************* > > >> > > ftn -fPIC -fPIC > -I/gpfs/alpine/csc314/scratch/adams/petsc/include > > >> > > -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-spock/include > > >> > > > > >> > -I/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray/include > > >> > > -I/opt/rocm-4.3.0/include 
ex5f.F90 > > >> > > > > >> -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-spock/lib > > >> > > -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-spock/lib > > >> > > > > >> > -Wl,-rpath,/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray/lib > > >> > > > > >> > -L/gpfs/alpine/geo127/proj-shared/spock/petsc/current/arch-opt-cray/lib > > >> > > -Wl,-rpath,/opt/rocm-4.3.0/lib -L/opt/rocm-4.3.0/lib > > >> > > -Wl,-rpath,/opt/cray/pe/mpich/8.1.10/gtl/lib > > >> > > -L/opt/cray/pe/mpich/8.1.10/gtl/lib > > >> > > -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 > > >> > > -L/opt/cray/pe/gcc/8.1.0/snos/lib64 > -Wl,-rpath,/opt/cray/pe/libsci/ > > >> > > 21.08.1.2/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/ > > >> > > 21.08.1.2/CRAY/9.0/x86_64/lib > > >> > > -Wl,-rpath,/opt/cray/pe/mpich/8.1.10/ofi/cray/10.0/lib > > >> > > -L/opt/cray/pe/mpich/8.1.10/ofi/cray/10.0/lib > > >> > > -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib > > >> > > -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib > > >> -Wl,-rpath,/opt/cray/pe/pmi/6.0.14/lib > > >> > > -L/opt/cray/pe/pmi/6.0.14/lib > > >> > > -Wl,-rpath,/opt/cray/pe/cce/12.0.3/cce/x86_64/lib > > >> > > -L/opt/cray/pe/cce/12.0.3/cce/x86_64/lib > > >> > > -Wl,-rpath,/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 > > >> > > -L/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64 > > >> > > > > >> > -Wl,-rpath,/opt/cray/pe/cce/12.0.3/cce-clang/x86_64/lib/clang/12.0.0/lib/linux > > >> > > > -L/opt/cray/pe/cce/12.0.3/cce-clang/x86_64/lib/clang/12.0.0/lib/linux > > >> > > > -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 > > >> > > -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 > > >> > > > > >> > -Wl,-rpath,/opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-unknown-linux-gnu/lib > > >> > > > -L/opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-unknown-linux-gnu/lib > > >> > > -lpetsc -lHYPRE -lkokkoskernels -lkokkoscontainers -lkokkoscore > > >> -lhipsparse > > >> > > -lhipblas -lrocsparse 
-lrocsolver -lrocblas -lrocrand -lamdhip64 > > >> -lstdc++ > > >> > > -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 > > >> -lxpmem > > >> > > -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup > > >> -lgfortran > > >> > > -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 > > >> -lclang_rt.builtins-x86_64 > > >> > > -lquadmath -lstdc++ -ldl -lmpi_gtl_hsa -o ex5f > > >> > > > /opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld: > > >> > > warning: alignment 128 of symbol > > >> > > `$host_init$$runtime_init_for_iso_c_binding$iso_c_binding_' in > > >> > > /opt/cray/pe/cce/12.0.3/cce/x86_64/lib/libmodules.so is smaller > than > > >> 256 in > > >> > > /tmp/pe_46424/ex5f_1.o > > >> > > > /opt/cray/pe/cce/12.0.3/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld: > > >> > > warning: alignment 64 of symbol `$data_init$iso_c_binding_' in > > >> > > /opt/cray/pe/cce/12.0.3/cce/x86_64/lib/libmodules.so is smaller > than > > >> 256 in > > >> > > /tmp/pe_46424/ex5f_1.o > > >> > > Possible error running Fortran example src/snes/tutorials/ex5f > with 1 > > >> MPI > > >> > > process > > >> > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > >> > > 0 KSP Residual norm < 1.e-11 > > >> > > Linear solve converged due to CONVERGED_ATOL iterations 0 > > >> > > > > >> > > On Fri, Dec 10, 2021 at 8:07 AM Mark Adams <mfad...@lbl.gov> > wrote: > > >> > > > > >> > > > I am trying to get Spock working (again) and am having problems. > > >> > > > > > >> > > > * make check seems to fail but it is hard to see what is going > on. > > >> Maybe > > >> > > > we should start here, but let me continue. > > >> > > > > > >> > > > * GAMG seems to work on the CPU > > >> > > > > > >> > > > * I have this for configuring with Kokkos. I am guessing these > > >> versions > > >> > > > are out of date.
What is current practice:
> > >> > > > '--with-kokkos-hip-arch=VEGA908',
> > >> > > > '--download-kokkos-commit=3.4.01',
> > >> > > > '--download-kokkos-kernels-commit=3.4.01',
> > >> > > >
> > >> > > > * Should I hold off (and tell my eager user to do same)?
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Mark
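The thread notes that configure derives --with-hip-arch from 'rocminfo', and suggests that --with-magma-gputarget could be set the same way. The following is only a rough sketch of that idea, not PETSc's actual configure code: the function names `detect_gfx_arch` and `hip_configure_options` are hypothetical, and it assumes that rocminfo-style output names the GPU target with a token such as `gfx908`.

```python
import re
from typing import List, Optional


def detect_gfx_arch(rocminfo_text: str) -> Optional[str]:
    """Return the first gfx target (e.g. 'gfx908') found in rocminfo-style text.

    Hypothetical helper: it only scans for a 'gfx<hex digits>' token and
    returns None when no such token is present.
    """
    match = re.search(r"\bgfx[0-9a-f]+\b", rocminfo_text)
    return match.group(0) if match else None


def hip_configure_options(rocminfo_text: str) -> List[str]:
    """Build the two options discussed in the thread from one detected target.

    Assumption: the same target string is acceptable for both options,
    which is what the thread proposes but does not confirm.
    """
    arch = detect_gfx_arch(rocminfo_text)
    if arch is None:
        return []  # nothing detected; leave both options unset
    return [
        f"--with-hip-arch={arch}",
        f"--with-magma-gputarget={arch}",
    ]


if __name__ == "__main__":
    # Illustrative input only, not captured from a real machine.
    sample = "  Name:                    gfx908\n"
    for option in hip_configure_options(sample):
        print(option)
```

In a real configure script the text would come from running rocminfo on a node with a GPU attached; the result could then be appended to the options list in a script such as config/examples/arch-olcf-spock.py.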