Re: [petsc-users] Compiling PETSc in Polaris with gnu
Thank you Satish, I'll give it a try.

From: Satish Balay
Sent: Thursday, May 2, 2024 5:51 PM
To: Vanella, Marcos (Fed)
Cc: Junchao Zhang; petsc-users; Mueller, Eric V. (Fed)
Subject: Re: [petsc-users] Compiling PETSc in Polaris with gnu

Perhaps you need to:

module load craype-accel-nvidia80

and then rebuild PETSc and your application, and have the same list of modules loaded at runtime.

Satish

On Thu, 2 May 2024, Vanella, Marcos (Fed) via petsc-users wrote:

> Thank you Satish and Junchao! I was able to compile PETSc with your configure
> options + suitesparse and hypre, and then compile my fortran code linking to
> PETSc.
> But when I try to run my test case I'm picking up an error at the very
> beginning:
>
> MPICH ERROR [Rank 0] [job id 01eb3c4a-28a7-4178-aced-512b4fb704c6] [Thu May
> 2 20:44:26 2024] [x3006c0s19b1n0] - Abort(-1) (rank 0 in comm 0):
> MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not
> linked
> (Other MPI error)
>
> aborting job:
> MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not
> linked
>
> The Polaris user guide says:
>
> The environment variable MPICH_GPU_SUPPORT_ENABLED=1 needs to be set if your
> application requires MPI-GPU support whereby the MPI library sends and
> receives data directly from GPU buffers. In this case, it will be important
> to have the craype-accel-nvidia80 module loaded both when compiling your
> application and during runtime to correctly link against a GPU Transport
> Layer (GTL) MPI library. Otherwise, you'll likely see "GPU_SUPPORT_ENABLED is
> requested, but GTL library is not linked" errors during runtime.
>
> I tried loading this module (also needed to add nvhpc-mixed) in my
> submission script but I get the same result.
> I'll get in touch with ALCF help on this.
>
> From: Satish Balay
> Sent: Thursday, May 2, 2024 11:58 AM
> To: Junchao Zhang
> Cc: petsc-users; Vanella, Marcos (Fed); Mueller, Eric V. (Fed)
> Subject: Re: [petsc-users] Compiling PETSc in Polaris with gnu
>
> I just tried a build (used default versions) - and the following builds for
> me [on the login node]:
>
> module use /soft/modulefiles
> module load PrgEnv-gnu
> module load cudatoolkit-standalone
> module load cray-libsci
> ./configure --with-cc=cc --with-fc=ftn --with-cxx=CC --with-make-np=4 \
>   --with-cuda=1 --with-cudac=nvcc --with-cuda-arch=80 \
>   --with-debugging=0 COPTFLAGS=-O2 CXXOPTFLAGS=-O2 FOPTFLAGS=-O2 \
>   CUDAOPTFLAGS=-O2 --download-kokkos --download-kokkos-kernels
> make
>
> Satish
>
> ---
>
> balay@polaris-login-01:~> module list
>
> Currently Loaded Modules:
>   1) libfabric/1.15.2.0       4) darshan/3.4.4     7) cray-dsmml/0.2.2   10) cray-pals/1.3.4     13) PrgEnv-gnu/8.5.0
>   2) craype-network-ofi       5) gcc-native/12.3   8) cray-mpich/8.1.28  11) cray-libpals/1.3.4  14) cudatoolkit-standalone/12.2.2
>   3) perftools-base/23.12.0   6) craype/2.7.30     9) cray-pmi/6.1.13    12) craype-x86-milan    15) cray-libsci/23.12.5
>
> On Thu, 2 May 2024, Junchao Zhang wrote:
>
> > I used cudatoolkit-standalone/12.4.1 and gcc-12.3.
> >
> > Be sure to use the latest petsc/main or petsc/release, which contains fixes
> > for Polaris.
> >
> > --Junchao Zhang
> >
> > On Thu, May 2, 2024 at 10:23 AM Satish Balay via petsc-users <petsc-users@mcs.anl.gov> wrote:
> >
> > > Try:
> > >
> > > module use /soft/modulefiles
> > >
> > > Satish
> > >
> > > On Thu, 2 May 2024, Vanella, Marcos (Fed) via petsc-users wrote:
> > >
> > > > Hi all, it seems the modules in Polaris have changed (can't find
> > > > cudatoolkit-standalone anymore).
> > > > Does anyone have recent experience compiling the library with gnu and
> > > > cuda on the machine?
> > > > Thank you!
> > > > Marcos
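For reference, a minimal Polaris job-script sketch combining the advice in this thread — same modules at build and run time, plus the GTL environment variable. The account name, node/rank layout, and executable name are placeholders; the module names are the ones reported above:

```shell
#!/bin/bash
#PBS -l select=1:system=polaris
#PBS -l walltime=00:30:00
#PBS -A MyProject               # placeholder project/account

# Load the same modules that were loaded when PETSc and the application
# were built, plus craype-accel-nvidia80 so the GTL library is picked up.
module use /soft/modulefiles
module load PrgEnv-gnu cudatoolkit-standalone craype-accel-nvidia80

# Required when the MPI library sends/receives directly from GPU buffers.
export MPICH_GPU_SUPPORT_ENABLED=1

cd "${PBS_O_WORKDIR}"
mpiexec -n 4 --ppn 4 ./my_app input.txt   # placeholder executable and input
```

The key point from Satish's reply is that craype-accel-nvidia80 must be loaded in all three places: when building PETSc, when building the application, and at run time.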
Re: [petsc-users] Compiling PETSc in Polaris with gnu
Thank you Satish and Junchao! I was able to compile PETSc with your configure options + suitesparse and hypre, and then compile my fortran code linking to PETSc. But when I try to run my test case I'm picking up an error at the very beginning:

MPICH ERROR [Rank 0] [job id 01eb3c4a-28a7-4178-aced-512b4fb704c6] [Thu May 2 20:44:26 2024] [x3006c0s19b1n0] - Abort(-1) (rank 0 in comm 0):
MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked
(Other MPI error)

aborting job:
MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked

The Polaris user guide says:

The environment variable MPICH_GPU_SUPPORT_ENABLED=1 needs to be set if your application requires MPI-GPU support whereby the MPI library sends and receives data directly from GPU buffers. In this case, it will be important to have the craype-accel-nvidia80 module loaded both when compiling your application and during runtime to correctly link against a GPU Transport Layer (GTL) MPI library. Otherwise, you'll likely see "GPU_SUPPORT_ENABLED is requested, but GTL library is not linked" errors during runtime.

I tried loading this module (also needed to add nvhpc-mixed) in my submission script but I get the same result. I'll get in touch with ALCF help on this.

From: Satish Balay
Sent: Thursday, May 2, 2024 11:58 AM
To: Junchao Zhang
Cc: petsc-users; Vanella, Marcos (Fed); Mueller, Eric V. (Fed)
Subject: Re: [petsc-users] Compiling PETSc in Polaris with gnu

I just tried a build (used default versions) - and the following builds for me [on the login node]:
module use /soft/modulefiles
module load PrgEnv-gnu
module load cudatoolkit-standalone
module load cray-libsci
./configure --with-cc=cc --with-fc=ftn --with-cxx=CC --with-make-np=4 \
  --with-cuda=1 --with-cudac=nvcc --with-cuda-arch=80 \
  --with-debugging=0 COPTFLAGS=-O2 CXXOPTFLAGS=-O2 FOPTFLAGS=-O2 CUDAOPTFLAGS=-O2 \
  --download-kokkos --download-kokkos-kernels
make

Satish

---

balay@polaris-login-01:~> module list

Currently Loaded Modules:
  1) libfabric/1.15.2.0       4) darshan/3.4.4     7) cray-dsmml/0.2.2   10) cray-pals/1.3.4     13) PrgEnv-gnu/8.5.0
  2) craype-network-ofi       5) gcc-native/12.3   8) cray-mpich/8.1.28  11) cray-libpals/1.3.4  14) cudatoolkit-standalone/12.2.2
  3) perftools-base/23.12.0   6) craype/2.7.30     9) cray-pmi/6.1.13    12) craype-x86-milan    15) cray-libsci/23.12.5

On Thu, 2 May 2024, Junchao Zhang wrote:

> I used cudatoolkit-standalone/12.4.1 and gcc-12.3.
>
> Be sure to use the latest petsc/main or petsc/release, which contains fixes
> for Polaris.
>
> --Junchao Zhang
>
> On Thu, May 2, 2024 at 10:23 AM Satish Balay via petsc-users <petsc-users@mcs.anl.gov> wrote:
>
> > Try:
> >
> > module use /soft/modulefiles
> >
> > Satish
> >
> > On Thu, 2 May 2024, Vanella, Marcos (Fed) via petsc-users wrote:
> >
> > > Hi all, it seems the modules in Polaris have changed (can't find
> > > cudatoolkit-standalone anymore).
> > > Does anyone have recent experience compiling the library with gnu and
> > > cuda on the machine?
> > > Thank you!
> > > Marcos
[petsc-users] Compiling PETSc in Polaris with gnu
Hi all, it seems the modules in Polaris have changed (I can't find cudatoolkit-standalone anymore). Does anyone have recent experience compiling the library with gnu and cuda on the machine?
Thank you!
Marcos
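As answered elsewhere in this thread, the modulefiles moved under /soft/modulefiles. A short sketch of how to locate and load them on a Polaris login node (the `module use` line is the one Satish reported; the `avail`/`load` follow-ups are a suggested workflow, not from the thread):

```shell
# The relocated modulefiles tree must be added to the search path first.
module use /soft/modulefiles

# Then the CUDA toolkit builds become visible again.
module avail cudatoolkit-standalone

# Load the programming environment and a toolkit before configuring PETSc.
module load PrgEnv-gnu cudatoolkit-standalone
```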
Re: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time
Thank you Barry and Satish. Trying it now.

From: Barry Smith
Sent: Monday, April 29, 2024 12:15 PM
To: Vanella, Marcos (Fed)
Cc: ba...@mcs.anl.gov; petsc-users
Subject: Re: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time

--with-x=0

On Apr 29, 2024, at 12:05 PM, Vanella, Marcos (Fed) via petsc-users wrote:

Hi Satish,
Ok, thank you for clarifying. I don't need to include Metis in the config phase then (not using it anywhere else). Is there a way I can configure PETSc to not require X11 (Xgraph functions, etc.)?
Thank you,
Marcos

From: Satish Balay <ba...@mcs.anl.gov>
Sent: Monday, April 29, 2024 12:00 PM
To: Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time

# Other CMakeLists.txt files inside SuiteSparse are from dependent packages
# (LAGraph/deps/json_h, GraphBLAS/cpu_features, and CHOLMOD/SuiteSparse_metis
# which is a slightly revised copy of METIS 5.0.1) but none of those
# CMakeLists.txt files are used to build any package in SuiteSparse.

So suitesparse includes a copy of metis sources - i.e. it does not use an external metis library?

>> balay@pj01:~/petsc/arch-linux-c-debug/lib$ nm -Ao *.so | grep METIS_PartGraphKway
libcholmod.so:0026e500 T SuiteSparse_metis_METIS_PartGraphKway <<<

And metis routines are already in -lcholmod [with some namespace fixes]

Satish

On Mon, 29 Apr 2024, Vanella, Marcos (Fed) via petsc-users wrote:

> Hi all, I'm wondering: is it possible to get SuiteSparse to use Metis at
> configure time with PETSc? Using Metis for reordering at the symbolic
> factorization phase gives lower fill-in for the factorization matrices than
> AMD in some cases (faster solution phase).
> I tried this with gcc compilers and openmpi:
>
> $ ./configure LDFLAGS="-ld_classic" COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" \
>     FOPTFLAGS="-O2 -g" --with-debugging=0 --with-shared-libraries=0 \
>     --download-metis --download-suitesparse --download-hypre \
>     --download-fblaslapack --download-make --force
>
> and get for SuiteSparse:
>
> metis:
>   Version:   5.1.0
>   Includes:  -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include
>   Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib
>              -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lmetis
> SuiteSparse:
>   Version:   7.7.0
>   Includes:  -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include/suitesparse
>              -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include
>   Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib
>              -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lspqr
>              -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd
>              -lsuitesparseconfig
>
> for which I see Metis will be compiled but I don't have -lmetis linking in
> the SuiteSparse Libraries.
> Thank you for your time!
> Marcos
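Putting Barry's and Satish's answers together, the configure line from the original question becomes the sketch below: the only changes are adding --with-x=0 (no X11/Xgraph dependency) and dropping --download-metis, since CHOLMOD already carries its own namespaced METIS copy and Metis is not needed elsewhere here:

```shell
# Same options as in the original question, minus --download-metis,
# plus --with-x=0 to build PETSc without X11.
./configure LDFLAGS="-ld_classic" COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" \
  FOPTFLAGS="-O2 -g" --with-debugging=0 --with-shared-libraries=0 \
  --with-x=0 \
  --download-suitesparse --download-hypre \
  --download-fblaslapack --download-make --force
```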
Re: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time
Hi Satish,
Ok, thank you for clarifying. I don't need to include Metis in the config phase then (not using it anywhere else). Is there a way I can configure PETSc to not require X11 (Xgraph functions, etc.)?
Thank you,
Marcos

From: Satish Balay
Sent: Monday, April 29, 2024 12:00 PM
To: Vanella, Marcos (Fed)
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time

# Other CMakeLists.txt files inside SuiteSparse are from dependent packages
# (LAGraph/deps/json_h, GraphBLAS/cpu_features, and CHOLMOD/SuiteSparse_metis
# which is a slightly revised copy of METIS 5.0.1) but none of those
# CMakeLists.txt files are used to build any package in SuiteSparse.

So suitesparse includes a copy of metis sources - i.e. it does not use an external metis library?

>> balay@pj01:~/petsc/arch-linux-c-debug/lib$ nm -Ao *.so | grep METIS_PartGraphKway
libcholmod.so:0026e500 T SuiteSparse_metis_METIS_PartGraphKway <<<

And metis routines are already in -lcholmod [with some namespace fixes]

Satish

On Mon, 29 Apr 2024, Vanella, Marcos (Fed) via petsc-users wrote:

> Hi all, I'm wondering: is it possible to get SuiteSparse to use Metis at
> configure time with PETSc? Using Metis for reordering at the symbolic
> factorization phase gives lower fill-in for the factorization matrices than
> AMD in some cases (faster solution phase).
> I tried this with gcc compilers and openmpi:
>
> $ ./configure LDFLAGS="-ld_classic" COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" \
>     FOPTFLAGS="-O2 -g" --with-debugging=0 --with-shared-libraries=0 \
>     --download-metis --download-suitesparse --download-hypre \
>     --download-fblaslapack --download-make --force
>
> and get for SuiteSparse:
>
> metis:
>   Version:   5.1.0
>   Includes:  -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include
>   Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib
>              -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lmetis
> SuiteSparse:
>   Version:   7.7.0
>   Includes:  -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include/suitesparse
>              -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include
>   Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib
>              -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lspqr
>              -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd
>              -lsuitesparseconfig
>
> for which I see Metis will be compiled but I don't have -lmetis linking in
> the SuiteSparse Libraries.
> Thank you for your time!
> Marcos
[petsc-users] Asking SuiteSparse to use Metis at PETSc config time
Hi all, I'm wondering: is it possible to get SuiteSparse to use Metis at configure time with PETSc? Using Metis for reordering at the symbolic factorization phase gives lower fill-in for the factorization matrices than AMD in some cases (faster solution phase). I tried this with gcc compilers and openmpi:

$ ./configure LDFLAGS="-ld_classic" COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" \
    FOPTFLAGS="-O2 -g" --with-debugging=0 --with-shared-libraries=0 \
    --download-metis --download-suitesparse --download-hypre \
    --download-fblaslapack --download-make --force

and get for SuiteSparse:

metis:
  Version:   5.1.0
  Includes:  -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include
  Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib
             -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lmetis
SuiteSparse:
  Version:   7.7.0
  Includes:  -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include/suitesparse
             -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include
  Libraries: -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib
             -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lspqr
             -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd
             -lsuitesparseconfig

for which I see Metis will be compiled but I don't have -lmetis linking in the SuiteSparse Libraries.
Thank you for your time!
Marcos
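Satish's nm check from this thread generalizes to any PETSc build of SuiteSparse; a sketch, assuming a shared-library arch under $PETSC_DIR/$PETSC_ARCH (the symbol name is the one reported in the thread):

```shell
# CHOLMOD embeds a slightly revised copy of METIS 5.0.1, so the reordering
# routines live inside libcholmod with a SuiteSparse_metis_ prefix rather
# than in a separate libmetis.
cd "$PETSC_DIR/$PETSC_ARCH/lib"
nm -Ao libcholmod.so | grep METIS_PartGraphKway
# A hit such as
#   libcholmod.so:... T SuiteSparse_metis_METIS_PartGraphKway
# confirms METIS-based ordering is available without --download-metis.
```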
[petsc-users] Compiling PETSc with strumpack in ORNL Frontier
Hi all, we are trying to compile PETSc in Frontier using the structured-matrix hierarchical solver strumpack, which uses the GPU and might be a good candidate for our Poisson discretization. The configure line I used for PETSc in this case is:

$ ./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" \
    HIPOPTFLAGS="-O3 --offload-arch=gfx90a" --with-debugging=0 \
    --with-cc=cc --with-cxx=CC --with-fc=ftn \
    --with-hip --with-hip-arch=gfx908 --with-hipc=hipcc \
    --LIBS="-L${MPICH_DIR}/lib -lmpi ${CRAY_XPMEM_POST_LINK_OPTS} -lxpmem ${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIBS_amd_gfx90a}" \
    --download-kokkos --download-kokkos-kernels --download-suitesparse \
    --download-hypre --download-superlu_dist --download-strumpack \
    --download-metis --download-slate --download-magma --download-parmetis \
    --download-ptscotch --download-zfp --download-butterflypack \
    --with-openmp-dir=/opt/cray/pe/gcc/12.2.0/snos --download-scalapack \
    --download-cmake --force

I'm getting an error at configure time:

...
Trying to download https://github.com/liuyangzhuan/ButterflyPACK for BUTTERFLYPACK
= Configuring BUTTERFLYPACK with CMake; this may take several minutes =
= Compiling and installing BUTTERFLYPACK; this may take several minutes =
Trying to download https://github.com/pghysels/STRUMPACK for STRUMPACK
= Configuring STRUMPACK with CMake; this may take several minutes =
= Compiling and installing STRUMPACK; this may take several minutes =

* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details):
- Error running make on STRUMPACK
*

Looking in the configure.log file I see errors like this related to the strumpack compilation:

/opt/cray/pe/craype/2.7.19/bin/CC -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -Dstrumpack_EXPORTS -I/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/src -I/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/petsc-build -isystem /autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/include -isystem /opt/rocm-5.4.0/include -isystem /opt/rocm-5.4.0/hip/include -isystem /opt/rocm-5.4.0/llvm/lib/clang/15.0.0/.. -Wno-lto-type-mismatch -Wno-psabi -O3 -fPIC -fopenmp -Wno-lto-type-mismatch -Wno-psabi -O3 -fPIC -fopenmp -fPIC -Wall -Wno-overloaded-virtual -fopenmp -x hip --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 -MD -MT CMakeFiles/strumpack.dir/src/clustering/NeighborSearch.cpp.o -MF CMakeFiles/strumpack.dir/src/clustering/NeighborSearch.cpp.o.d -o CMakeFiles/strumpack.dir/src/clustering/NeighborSearch.cpp.o -c /autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/src/clustering/NeighborSearch.cpp
gmake[2]: Leaving directory '/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/petsc-build'
gmake[1]: Leaving directory '/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/petsc-build'

stdout:
g++: error: unrecognized command-line option '--offload-arch=gfx900'
g++: error: unrecognized command-line
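The failing flags come from STRUMPACK's CMake HIP setup: its build emits --offload-arch flags for several default gfx targets, and here they reach a plain g++ that does not understand them. Note also that the configure line above mixes --with-hip-arch=gfx908 with gfx90a elsewhere; Frontier's MI250X GPUs are gfx90a. One thing to try, assuming PETSc's configure accepts extra CMake arguments for downloaded CMake-built packages (the option name below is an assumption and should be verified against ./configure --help), is to pin STRUMPACK's HIP architecture:

```shell
# Hypothetical sketch: --download-strumpack-cmake-arguments is assumed to be
# supported for CMake-built external packages; verify first with:
#   ./configure --help | grep -i strumpack
# Pinning the HIP target to Frontier's MI250X (gfx90a) keeps STRUMPACK's build
# from emitting --offload-arch flags for every default architecture.
./configure <other options as in the configure line above> \
  --download-strumpack \
  --download-strumpack-cmake-arguments="-DCMAKE_HIP_ARCHITECTURES=gfx90a"
```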
Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs
Ok, thanks. I'll try it when the machine comes back online.
Cheers,
M

From: Mark Adams
Sent: Tuesday, March 19, 2024 5:15 PM
To: Vanella, Marcos (Fed)
Cc: PETSc users list
Subject: Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs

You want: -mat_type aijhipsparse

On Tue, Mar 19, 2024 at 5:06 PM Vanella, Marcos (Fed) <marcos.vane...@nist.gov> wrote:

Hi Mark, thanks. I'll try your suggestions. So, I would keep -mat_type mpiaijkokkos but -vec_type hip as runtime options?
Thanks,
Marcos

From: Mark Adams <mfad...@lbl.gov>
Sent: Tuesday, March 19, 2024 4:57 PM
To: Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
Cc: PETSc users list <petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs

[keep on list]

I have little experience with running hypre on GPUs but others might have more.

1M dofs/node is not a lot, and NVIDIA has larger L1 cache and more mature compilers, etc., so it is not surprising that NVIDIA is faster. I suspect the gap would narrow with a larger problem.

Also, why are you using Kokkos? It should not make a difference, but you could check easily: just use -vec_type hip with your current code.

You could also test with GAMG: -pc_type gamg

Mark

On Tue, Mar 19, 2024 at 4:12 PM Vanella, Marcos (Fed) <marcos.vane...@nist.gov> wrote:

Hi Mark, I ran a canonical test we have to time our code. It is a propane fire on a burner within a box, with around 1 million cells. I split the problem across 4 GPUs, single node, both in Polaris and Frontier. I compiled PETSc with gnu and HYPRE being downloaded, with the following configure options:

* Polaris:
$ ./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" CUDAOPTFLAGS="-O3" --with-debugging=0 --download-suitesparse --download-hypre --with-cuda --with-cc=cc --with-cxx=CC --with-fc=ftn --with-cudac=nvcc --with-cuda-arch=80 --download-cmake

* Frontier:
$ ./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" HIPOPTFLAGS="-O3" --with-debugging=0 --with-cc=cc --with-cxx=CC --with-fc=ftn --with-hip --with-hipc=hipcc --LIBS="-L${MPICH_DIR}/lib -lmpi ${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIBS_amd_gfx90a}" --download-kokkos --download-kokkos-kernels --download-suitesparse --download-hypre --download-cmake

Our code was also compiled with gnu compilers and the -O3 flag. I used the latest (from this week) PETSc repo update. These are the timings for the test case:

* 8 meshes + 1 million cells case, 8 MPI processes, 4 GPUs, 2 MPI procs per GPU, 1 sec run time (~580 time steps, ~1160 Poisson solves):

System     Poisson Solver   GPU Implementation   Poisson Wall time (sec)   Total Wall time (sec)
Polaris    CG + HYPRE PC    CUDA                  80                       287
Frontier   CG + HYPRE PC    Kokkos + HIP         158                       401

It is interesting to see that the Poisson solves take twice the time in Frontier compared with Polaris. Do you have experience running HYPRE AMG on these machines? Is this difference between the CUDA implementation and Kokkos-kernels to be expected? I can run the case on both computers with the log flags you suggest; that might give more information on where the differences are.

Thank you for your time,
Marcos

From: Mark Adams <mfad...@lbl.gov>
Sent: Tuesday, March 5, 2024 2:41 PM
To: Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs

You can run with -log_view_gpu_time to get rid of the nans and get more data.

You can run with -ksp_view to get more info on the solver and send that output. -options_left is also good to use so we can see what parameters you used.

The last 100 in this row:

KSPSolve 1197 0.0 2.0291e+02 0.0 2.55e+11 0.0 3.9e+04 8.0e+04 3.1e+04 12 100 100 100 49 12 100 100 100 98 2503 -nan 0 1.80e-05 0 0.00e+00 100

tells us that all the flops were logged on GPUs. You do need at least 100K equations per GPU to see speedup, so don't worry about small problems.

Mark

On Tue, Mar 5, 2024 at 12:52 PM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote:

Hi all, I compiled the latest PETSc source in Frontier using gcc+kokkos and hip options:

./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" HIPOPTFLAGS="-O3" --with-debugging=0
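Mark's three diagnostic flags and his -vec_type suggestion can be combined in one run line; a sketch in which the launcher layout, executable, and input file are placeholders:

```shell
# -vec_type hip:       Mark's check - native HIP vectors instead of Kokkos
# -log_view_gpu_time:  time GPU events so the profile loses its -nan columns
# -ksp_view:           print the solver/preconditioner actually used
# -options_left:       report options that were set but never consumed
# srun layout, ./my_app, and input.fds are placeholders.
srun -n 8 ./my_app input.fds -vec_type hip \
  -log_view -log_view_gpu_time -ksp_view -options_left
```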
Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs
Hi Mark, thanks. I'll try your suggestions. So, I would keep -mat_type mpiaijkokkos but -vec_type hip as runtime options?
Thanks,
Marcos

From: Mark Adams
Sent: Tuesday, March 19, 2024 4:57 PM
To: Vanella, Marcos (Fed)
Cc: PETSc users list
Subject: Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs

[keep on list]

I have little experience with running hypre on GPUs but others might have more.

1M dofs/node is not a lot, and NVIDIA has larger L1 cache and more mature compilers, etc., so it is not surprising that NVIDIA is faster. I suspect the gap would narrow with a larger problem.

Also, why are you using Kokkos? It should not make a difference, but you could check easily: just use -vec_type hip with your current code.

You could also test with GAMG: -pc_type gamg

Mark

On Tue, Mar 19, 2024 at 4:12 PM Vanella, Marcos (Fed) <marcos.vane...@nist.gov> wrote:

Hi Mark, I ran a canonical test we have to time our code. It is a propane fire on a burner within a box, with around 1 million cells. I split the problem across 4 GPUs, single node, both in Polaris and Frontier. I compiled PETSc with gnu and HYPRE being downloaded, with the following configure options:

* Polaris:
$ ./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" CUDAOPTFLAGS="-O3" --with-debugging=0 --download-suitesparse --download-hypre --with-cuda --with-cc=cc --with-cxx=CC --with-fc=ftn --with-cudac=nvcc --with-cuda-arch=80 --download-cmake

* Frontier:
$ ./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" HIPOPTFLAGS="-O3" --with-debugging=0 --with-cc=cc --with-cxx=CC --with-fc=ftn --with-hip --with-hipc=hipcc --LIBS="-L${MPICH_DIR}/lib -lmpi ${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIBS_amd_gfx90a}" --download-kokkos --download-kokkos-kernels --download-suitesparse --download-hypre --download-cmake

Our code was also compiled with gnu compilers and the -O3 flag. I used the latest (from this week) PETSc repo update. These are the timings for the test case:

* 8 meshes + 1 million cells case, 8 MPI processes, 4 GPUs, 2 MPI procs per GPU, 1 sec run time (~580 time steps, ~1160 Poisson solves):

System     Poisson Solver   GPU Implementation   Poisson Wall time (sec)   Total Wall time (sec)
Polaris    CG + HYPRE PC    CUDA                  80                       287
Frontier   CG + HYPRE PC    Kokkos + HIP         158                       401

It is interesting to see that the Poisson solves take twice the time in Frontier compared with Polaris. Do you have experience running HYPRE AMG on these machines? Is this difference between the CUDA implementation and Kokkos-kernels to be expected? I can run the case on both computers with the log flags you suggest; that might give more information on where the differences are.

Thank you for your time,
Marcos

From: Mark Adams <mfad...@lbl.gov>
Sent: Tuesday, March 5, 2024 2:41 PM
To: Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs

You can run with -log_view_gpu_time to get rid of the nans and get more data.

You can run with -ksp_view to get more info on the solver and send that output. -options_left is also good to use so we can see what parameters you used.

The last 100 in this row:

KSPSolve 1197 0.0 2.0291e+02 0.0 2.55e+11 0.0 3.9e+04 8.0e+04 3.1e+04 12 100 100 100 49 12 100 100 100 98 2503 -nan 0 1.80e-05 0 0.00e+00 100

tells us that all the flops were logged on GPUs. You do need at least 100K equations per GPU to see speedup, so don't worry about small problems.

Mark

On Tue, Mar 5, 2024 at 12:52 PM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote:

Hi all, I compiled the latest PETSc source in Frontier using gcc+kokkos and hip options:

./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" HIPOPTFLAGS="-O3" --with-debugging=0 --with-cc=cc --with-cxx=CC --with-fc=ftn --with-hip --with-hipc=hipcc --LIBS="-L${MPICH_DIR}/lib -lmpi ${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIB
Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs
Thank you Mark, I'll try the options you suggest to get more info. I'm also building PETSc and the code with the cray compiler suite to test. The test I'm running has 1 million unknowns. I was able to see good scaling up to 4 GPUs on this case in Polaris.
Talk soon,
Marcos

From: Mark Adams
Sent: Tuesday, March 5, 2024 2:41 PM
To: Vanella, Marcos (Fed)
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs

You can run with -log_view_gpu_time to get rid of the nans and get more data.

You can run with -ksp_view to get more info on the solver and send that output. -options_left is also good to use so we can see what parameters you used.

The last 100 in this row:

KSPSolve 1197 0.0 2.0291e+02 0.0 2.55e+11 0.0 3.9e+04 8.0e+04 3.1e+04 12 100 100 100 49 12 100 100 100 98 2503 -nan 0 1.80e-05 0 0.00e+00 100

tells us that all the flops were logged on GPUs. You do need at least 100K equations per GPU to see speedup, so don't worry about small problems.

Mark

On Tue, Mar 5, 2024 at 12:52 PM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote:

Hi all, I compiled the latest PETSc source in Frontier using gcc+kokkos and hip options:

./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" FCOPTFLAGS="-O3" HIPOPTFLAGS="-O3" --with-debugging=0 --with-cc=cc --with-cxx=CC --with-fc=ftn --with-hip --with-hipc=hipcc --LIBS="-L${MPICH_DIR}/lib -lmpi ${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIBS_amd_gfx90a}" --download-kokkos --download-kokkos-kernels --download-suitesparse --download-hypre --download-cmake

and have started testing our code solving a Poisson linear system with CG + HYPRE preconditioner. Timings look rather high compared to compilations done on other machines that have NVIDIA cards. They are also not changing when using more than one GPU for the simple test I'm doing. Does anyone happen to know if HYPRE has a HIP GPU implementation for BoomerAMG, and whether it is compiled when configuring PETSc?
Thanks!
Marcos

PS: This is what I see in the log file (-log_view) when running the case with 2 GPUs in the node:

-- PETSc Performance Summary: --

/ccs/home/vanellam/Firemodels_fork/fds/Build/mpich_gnu_frontier/fds_mpich_gnu_frontier on a arch-linux-frontier-opt-gcc named frontier04119 with 4 processors, by vanellam Tue Mar 5 12:42:29 2024
Using Petsc Development GIT revision: v3.20.5-713-gabdf6bc0fcf  GIT Date: 2024-03-05 01:04:54 +

                      Max         Max/Min    Avg        Total
Time (sec):           8.368e+02   1.000      8.368e+02
Objects:              0.000e+00   0.000      0.000e+00
Flops:                2.546e+11   0.000      1.270e+11  5.079e+11
Flops/sec:            3.043e+08   0.000      1.518e+08  6.070e+08
MPI Msg Count:        1.950e+04   0.000      9.748e+03  3.899e+04
MPI Msg Len (bytes):  1.560e+09   0.000      7.999e+04  3.119e+09
MPI Reductions:       6.331e+04   2877.545

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg        %Total     Count   %Total
 0:      Main Stage: 8.3676e+02 100.0%  5.0792e+11 100.0%  3.899e+04 100.0%  7.999e+04     100.0%   3.164e+04  50.0%

See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase
      %F - percent flop in this phase
      %M - percent messages in th
[petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs
Hi all, I compiled the latest PETSc source in Frontier using gcc+kokkos and hip options:

./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" HIPOPTFLAGS="-O3" --with-debugging=0 --with-cc=cc --with-cxx=CC --with-fc=ftn --with-hip --with-hipc=hipcc --LIBS="-L${MPICH_DIR}/lib -lmpi ${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIBS_amd_gfx90a}" --download-kokkos --download-kokkos-kernels --download-suitesparse --download-hypre --download-cmake

and have started testing our code solving a Poisson linear system with CG + HYPRE preconditioner. Timings look rather high compared to compilations done on other machines that have NVIDIA cards, and they do not change when using more than one GPU for the simple test I am doing. Does anyone happen to know if HYPRE has a HIP GPU implementation of BoomerAMG, and whether it gets compiled when configuring PETSc? Thanks! Marcos

PS: This is what I see in the log file (-log_view) when running the case with 2 GPUs in the node:

-- PETSc Performance Summary: --

/ccs/home/vanellam/Firemodels_fork/fds/Build/mpich_gnu_frontier/fds_mpich_gnu_frontier on a arch-linux-frontier-opt-gcc named frontier04119 with 4 processors, by vanellam Tue Mar 5 12:42:29 2024
Using Petsc Development GIT revision: v3.20.5-713-gabdf6bc0fcf GIT Date: 2024-03-05 01:04:54 +

                          Max        Max/Min   Avg        Total
Time (sec):               8.368e+02  1.000     8.368e+02
Objects:                  0.000e+00  0.000     0.000e+00
Flops:                    2.546e+11  0.000     1.270e+11  5.079e+11
Flops/sec:                3.043e+08  0.000     1.518e+08  6.070e+08
MPI Msg Count:            1.950e+04  0.000     9.748e+03  3.899e+04
MPI Msg Len (bytes):      1.560e+09  0.000     7.999e+04  3.119e+09
MPI Reductions:           6.331e+04  2877.545

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:      - Time -          - Flop -           --- Messages ---   -- Message Lengths --  -- Reductions --
                        Avg     %Total    Avg        %Total  Count    %Total    Avg        %Total      Count    %Total
 0:  Main Stage:        8.3676e+02 100.0% 5.0792e+11 100.0%  3.899e+04 100.0%   7.999e+04  100.0%      3.164e+04  50.0%

See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
  Count: number of times phase was executed
  Time and Flop: Max - maximum over all processors
                 Ratio - ratio of maximum to minimum over all processors
  Mess: number of messages sent
  AvgLen: average message length (bytes)
  Reduct: number of global reductions
  Global: entire computation
  Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
    %T - percent time in this phase
    %F - percent flop in this phase
    %M - percent messages in this phase
    %L - percent message lengths in this phase
    %R - percent reductions in this phase
  Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
  GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
  CpuToGpu Count: total number of CPU to GPU copies per processor
  CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
  GpuToCpu Count: total number of GPU to CPU copies per processor
  GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
  GPU %F: percent flops on GPU in this event

Event             Count      Time (sec)   Flop                                --- Global ---  --- Stage ----  Total   GPU     - CpuToGpu -  - GpuToCpu -  GPU
                  Max Ratio  Max   Ratio  Max      Ratio  Mess    AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R  Mflop/s Mflop/s Count  Size   Count  Size   %F

--- Event Stage 0: Main Stage

BuildTwoSided      1201 0.0   nan   nan   0.00e+00 0.0    2.0e+00 4.0e+00 6.0e+02  0  0  0  0  1   0  0  0  0  2  -nan    -nan       0 0.00e+00    0 0.00e+00  0
BuildTwoSidedF     1200 0.0   nan   nan   0.00e+00 0.0    0.0e+00 0.0e+00 6.0e+02  0  0  0  0  1   0  0  0  0  2  -nan    -nan       0 0.00e+00    0 0.00e+00  0
MatMult           19494 0.0   nan   nan   1.35e+11 0.0    3.9e+04 8.0e+04 0.0e+00  7 53
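For reference, hypre's BoomerAMG does have a HIP/ROCm backend, and configuring PETSc with --with-hip together with --download-hypre is expected to build it; whether the solve actually runs on the device is easiest to confirm at runtime. Below is a hedged sketch of a run line (a config fragment for illustration, not a tested recipe — the executable name is taken from the log above):

```
# Illustrative: force HIP vector/matrix types so hypre receives device data,
# then inspect the "GPU %F" column of -log_view afterwards; values near zero
# there mean the work stayed on the CPU.
srun -n 4 ./fds_mpich_gnu_frontier test.fds \
    -vec_type hip -mat_type aijhipsparse \
    -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg \
    -log_view
```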
Re: [petsc-users] Using Sundials from PETSc
Hi Matt, very interesting project you are working on. We haven't gone deep on how we would do this on GPUs and are starting to look at options. We will explore whether it is possible to batch the work needed for several cells within a thread group on the GPU. We use a single Cartesian mesh per MPI process (usually with 40^3 to 50^3 cells). Something I implemented to avoid MPI-process over-subscription of the GPU with PETSc solvers was to cluster several MPI processes per GPU on resource sets. Then, the processes in the set pass the matrix (at setup) and RHS to a single process (the set master), which communicates with the GPU. The GPU solution is then brought back to the set master, which distributes it to the MPI processes in the set as needed. So, only a set of processes as large as the number of GPUs in the calculation (with their own MPI communicator) call the PETSc matrix and vector building and solve routines. The neat thing is that all MPI communications are local to the node. This idea is not new; it was developed by the researchers at GWU that interfaced PETSc to AMGx back when there were no native GPU solvers in PETSc, HYPRE and other libs (~2016). Best, Marcos

From: Matthew Knepley
Sent: Monday, October 16, 2023 4:31 PM
To: Vanella, Marcos (Fed)
Cc: petsc-users@mcs.anl.gov ; Paul, Chandan (IntlAssoc)
Subject: Re: [petsc-users] Using Sundials from PETSc

On Mon, Oct 16, 2023 at 4:08 PM Vanella, Marcos (Fed) <marcos.vane...@nist.gov> wrote: Hi Matthew, we have code that time splits the combustion step from the chemical species transport, so on each computational cell for each fluid flow time step, once transport is done we have the mixture chemical composition as the initial condition. We are looking into doing finite rate chemistry with skeletal combustion models (20+ equations) in each cell for each fluid time step. Sundials provides the CVODE solver for the time integration of these, and it would be interesting to see if we can make use of GPU acceleration.
From their User Guide for Version 6.6.0 there are several GPU implementations for building the RHS and using linear, nonlinear and stiff ODE solvers. We are doing a similar thing in CHREST (https://www.buffalo.edu/chrest.html). Since we normally use hundreds of species and thousands of reactions for the reduced mechanism, we are using TChem2 to build and solve the system in each cell. Since these systems are so small, you are likely to need some way of batching them within a warp. Do you have an idea for this already? Thanks, Matt Thank you Satish for the comment. It might be better at this point to first get an idea of what the implementation in our code using Sundials directly would look like. Then, we can see if it is possible and makes sense to access it through PETSc. We have things working on CPU making use of an older version of CVODE. BTW, after some changes in our code we are starting to run larger cases using GPU accelerated iterative solvers from PETSc, so we have PETSc interfaced already. Thanks!

From: Matthew Knepley <knep...@gmail.com>
Sent: Monday, October 16, 2023 3:03 PM
To: Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
Cc: petsc-users@mcs.anl.gov ; Paul, Chandan (IntlAssoc) <chandan.p...@nist.gov>
Subject: Re: [petsc-users] Using Sundials from PETSc

On Mon, Oct 16, 2023 at 2:29 PM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote: Hi, we were wondering if it would be possible to call the latest version of Sundials from PETSc? The short answer is, no. We are at v2.5 and they are at v6.5. There were no dates on the version history page, so I do not know how out of date we are. There have not been any requests for an update until now. We would be happy to get an MR for the updates if you want to try it. We are interested in doing chemistry using GPUs and already have interfaces to PETSc from our code.
How does the GPU interest interact with the SUNDIALS version? Thanks, Matt Thanks, Marcos -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/
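The rank-clustering scheme described earlier in the thread reduces, at setup time, to computing a rank-to-set map plus one set master per GPU. A minimal sketch in Python (a hypothetical helper for illustration, assuming contiguous rank blocks per GPU set — not code from FDS or PETSc):

```python
def resource_sets(n_ranks, n_gpus):
    """Split n_ranks MPI ranks into n_gpus 'resource sets'; the lowest
    rank of each set acts as set master, i.e. the only process of the
    set that calls the PETSc build/solve routines and talks to a GPU."""
    base, extra = divmod(n_ranks, n_gpus)
    sets, start = [], 0
    for g in range(n_gpus):
        size = base + (1 if g < extra else 0)  # spread any remainder over the first sets
        sets.append(list(range(start, start + size)))
        start += size
    masters = [s[0] for s in sets]
    return sets, masters

# 8 ranks sharing 4 GPUs: masters 0, 2, 4, 6 gather the RHS and scatter solutions
sets, masters = resource_sets(8, 4)
```

In practice each set would get its own MPI communicator (e.g. via MPI_Comm_split keyed on the set index), so the gather/scatter traffic stays node-local as described above.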
Re: [petsc-users] Using Sundials from PETSc
Hi Matthew, we have code that time splits the combustion step from the chemical species transport, so on each computational cell for each fluid flow time step, once transport is done we have the mixture chemical composition as the initial condition. We are looking into doing finite rate chemistry with skeletal combustion models (20+ equations) in each cell for each fluid time step. Sundials provides the CVODE solver for the time integration of these, and it would be interesting to see if we can make use of GPU acceleration. From their User Guide for Version 6.6.0 there are several GPU implementations for building the RHS and using linear, nonlinear and stiff ODE solvers. Thank you Satish for the comment. It might be better at this point to first get an idea of what the implementation in our code using Sundials directly would look like. Then, we can see if it is possible and makes sense to access it through PETSc. We have things working on CPU making use of an older version of CVODE. BTW, after some changes in our code we are starting to run larger cases using GPU accelerated iterative solvers from PETSc, so we have PETSc interfaced already. Thanks!

From: Matthew Knepley
Sent: Monday, October 16, 2023 3:03 PM
To: Vanella, Marcos (Fed)
Cc: petsc-users@mcs.anl.gov ; Paul, Chandan (IntlAssoc)
Subject: Re: [petsc-users] Using Sundials from PETSc

On Mon, Oct 16, 2023 at 2:29 PM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote: Hi, we were wondering if it would be possible to call the latest version of Sundials from PETSc? The short answer is, no. We are at v2.5 and they are at v6.5. There were no dates on the version history page, so I do not know how out of date we are. There have not been any requests for an update until now. We would be happy to get an MR for the updates if you want to try it. We are interested in doing chemistry using GPUs and already have interfaces to PETSc from our code. How does the GPU interest interact with the SUNDIALS version?
Thanks, Matt Thanks, Marcos -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/
[petsc-users] Using Sundials from PETSc
Hi, we were wondering if it would be possible to call the latest version of Sundials from PETSc? We are interested in doing chemistry using GPUs and already have interfaces to PETSc from our code. Thanks, Marcos
Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
Thank you Matt and Junchao. I've been testing further with nvhpc on Summit. You might have an idea of what is going on here. These are my modules:

Currently Loaded Modules:
  1) lsf-tools/2.0   3) darshan-runtime/3.4.0-lite   5) DefApps        7) spectrum-mpi/10.4.0.3-20210112   9) nsight-systems/2021.3.1.54
  2) hsi/5.0.2.p5    4) xalt/1.2.1                   6) nvhpc/22.11    8) nsight-compute/2021.2.1         10) cuda/11.7.1

I configured and compiled petsc with these options:

./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" FCOPTFLAGS="-O2" CUDAOPTFLAGS="-O2" --with-debugging=0 --download-suitesparse --download-hypre --download-fblaslapack --with-cuda

without issues. The MPI checks did not go through as this was done on the login node. Then, I started getting (similarly to what I saw with pgi and gcc on Summit) ambiguous interface errors related to MPI routines. I was able to make a simple piece of code that reproduces this. It has to do with having a USE PETSC statement in a module (TEST_MOD) and a USE MPI_F08 in the main program (MAIN) using that module, even though the PRIVATE statement has been used in said (TEST_MOD) module.

MODULE TEST_MOD
! In this module we use PETSC.
USE PETSC
!USE MPI
IMPLICIT NONE
PRIVATE
PUBLIC :: TEST1
CONTAINS
SUBROUTINE TEST1(A)
IMPLICIT NONE
REAL, INTENT(INOUT) :: A
INTEGER :: IERR
A=0.
ENDSUBROUTINE TEST1
ENDMODULE TEST_MOD

PROGRAM MAIN
! Assume in main we use some MPI_F08 features.
USE MPI_F08
USE TEST_MOD, ONLY : TEST1
IMPLICIT NONE
INTEGER :: MY_RANK,IERR=0
INTEGER :: PNAMELEN=0
INTEGER :: PROVIDED
INTEGER, PARAMETER :: REQUIRED=MPI_THREAD_FUNNELED
REAL :: A=0.
CALL MPI_INIT_THREAD(REQUIRED,PROVIDED,IERR)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, MY_RANK, IERR)
CALL TEST1(A)
CALL MPI_FINALIZE(IERR)
ENDPROGRAM MAIN

Leaving the USE PETSC statement in TEST_MOD, this is what I get when trying to compile this code:

vanellam@login5 test_spectrum_issue $ mpifort -c -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/" -I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-c-opt-nvhpc/include" mpitest.f90
NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_init_thread (mpitest.f90: 34)
NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_finalize (mpitest.f90: 37)
  0 inform,   0 warnings,   2 severes, 0 fatal for main

Now, if I replace USE PETSC with USE MPI in the module TEST_MOD, compilation proceeds correctly. If I leave the USE PETSC statement in the module and change the statement in main to USE MPI, compilation also goes through. So it seems to be something related to using the PETSC and MPI_F08 modules together. My take is that it is related to spectrum-mpi, as I haven't had issues compiling FDS+PETSc with openmpi on other systems. Please let me know if you have any ideas on what might be going on. I'll move to Polaris and try with mpich too. Thanks! Marcos

From: Junchao Zhang
Sent: Tuesday, August 22, 2023 5:25 PM
To: Matthew Knepley
Cc: Vanella, Marcos (Fed) ; PETSc users list ; Guan, Collin X. (Fed)
Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU

Macros, yes, refer to the example script Matt mentioned for Summit. Feel free to turn on/off options in the file. In my experience, gcc is easier to use. Also, I found https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus, which might be similar to your machine (4 GPUs per node). The key point is: The Cray MPI on Polaris does not currently support binding MPI ranks to GPUs.
For applications that need this support, this instead can be handled by use of a small helper script that will appropriately set CUDA_VISIBLE_DEVICES for each MPI rank. So you can try the helper script set_affinity_gpu_polaris.sh to manually set CUDA_VISIBLE_DEVICES. In other words, put the script on your PATH and then run your job with

srun -N 2 -n 16 set_affinity_gpu_polaris.sh /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -pc_type gamg -mat_type aijcusparse -vec_type cuda

Then, check again with nvidia-smi to see if GPU memory is evenly allocated. --Junchao Zhang

On Tue, Aug 22, 2023 at 3:03 PM Matthew Knepley <knep...@gmail.com> wrote: On Tue, Aug 22, 2023 at 2:54 PM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote: Hi Junchao, both the slurm scontrol show job_id -dd and looking at CUDA_VISIBLE_DEVICES do not provide information about which MPI process is associated to which GPU in the node on our system. I can see this with nvidia-smi, but if you have any other suggestion using slurm I would like to hear it. I've been trying to compile the code+PETSc on Summit, but have been having all sorts of issues related to spectrum-mpi, and the different compilers they provide (I trie
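For reference, a wrapper of this kind can be very small. The sketch below is hypothetical (it is not ALCF's actual set_affinity_gpu_polaris.sh): it maps each node-local MPI rank to one of the node's 4 GPUs before exec'ing the real binary.

```shell
#!/bin/bash
# Hypothetical affinity wrapper: assign one of 4 GPUs per node-local rank.
# Cray MPICH exports PMI_LOCAL_RANK; other launchers use different variables
# (e.g. OMPI_COMM_WORLD_LOCAL_RANK under Open MPI), so adjust for your MPI.
num_gpus=4
export CUDA_VISIBLE_DEVICES=$(( ${PMI_LOCAL_RANK:-0} % num_gpus ))
# Run the real application with all original arguments
exec "$@"
```

Launched as `srun ... wrapper.sh ./app args`, each rank then sees exactly one device, which it addresses as device 0.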
Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
Hi Junchao, neither slurm's scontrol show job_id -dd nor looking at CUDA_VISIBLE_DEVICES provides information about which MPI process is associated to which GPU in the node on our system. I can see this with nvidia-smi, but if you have any other suggestion using slurm I would like to hear it. I've been trying to compile the code+PETSc on Summit, but have been having all sorts of issues related to spectrum-mpi, and the different compilers they provide (I tried gcc, nvhpc, pgi, xl. Some of them don't handle Fortran 2018, others give issues of repeated MPI definitions, etc.). I also wanted to ask you, do you know if it is possible to compile PETSc with the xl/16.1.1-10 suite? Thanks! I configured the library --with-cuda and when compiling I get a compilation error with CUDAC:

CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o
In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1:
In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:
In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:
In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:
In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:
In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:
In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:
In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:
/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: warning: Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.
[-W#pragma-messages]
   THRUST_COMPILER_DEPRECATION(Clang 7.0);
   ^
/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: expanded from macro 'THRUST_COMPILER_DEPRECATION'
   THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
   ^
/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: expanded from macro 'THRUST_COMP_DEPR_IMPL'
#  define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg)
   ^
/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: expanded from macro 'THRUST_COMP_DEPR_IMPL0'
#  define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
   ^
:141:6: note: expanded from here
GCC warning "Thrust requires at least Clang 7.0. Define THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
   ^
In file included from /autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2:
In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:
In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:
In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:
In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:
In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:
In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:
In file included from /sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:
In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:
In file included from /sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:
In file included from /sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:
In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:
/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. [-W#pragma-messages]
   CUB_COMPILER_DEPRECATION(Clang 7.0);
   ^
/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded from macro 'CUB_COMPILER_DEPRECATION'
   CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
   ^
/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded from macro 'CUB_COMP_DEPR_IMPL'
#  define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)
   ^
/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded from macro 'CUB_COMP_DEPR_IMPL0'
#  define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
   ^
:198:6: note: expanded from here
GCC warning "CUB requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
   ^
Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
Ok thanks Junchao, so is GPU 0 actually allocating memory for the 8 MPI processes' meshes but only working on 2 of them? The nvidia-smi output says it has allocated 2.4GB. Best, Marcos

From: Junchao Zhang
Sent: Monday, August 21, 2023 3:29 PM
To: Vanella, Marcos (Fed)
Cc: PETSc users list ; Guan, Collin X. (Fed)
Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU

Hi, Macros, If you look at the PIDs of the nvidia-smi output, you will only find 8 unique PIDs, which is expected since you allocated 8 MPI ranks per node. The duplicate PIDs are usually for threads spawned by the MPI runtime (for example, progress threads in the MPI implementation). So your job script and output are all good. Thanks.

On Mon, Aug 21, 2023 at 2:00 PM Vanella, Marcos (Fed) <marcos.vane...@nist.gov> wrote: Hi Junchao, something I'm noting related to running with cuda enabled linear solvers (CG+HYPRE, CG+GAMG) is that for multi cpu-multi gpu calculations, GPU 0 in the node is taking what seems to be all sub-matrices corresponding to all the MPI processes in the node. This is the result of the nvidia-smi command on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 GPU V100s:

Mon Aug 21 14:36:07 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB           On  | 0004:04:00.0     Off |                    0 |
| N/A   34C    P0              63W / 300W |  2488MiB / 16384MiB  |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-16GB           On  | 0004:05:00.0     Off |                    0 |
| N/A   38C    P0              56W / 300W |   638MiB / 16384MiB  |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2-16GB           On  | 0035:03:00.0     Off |                    0 |
| N/A   35C    P0              52W / 300W |   638MiB / 16384MiB  |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2-16GB           On  | 0035:04:00.0     Off |                    0 |
| N/A   38C    P0              53W / 300W |   638MiB / 16384MiB  |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                           GPU Memory  |
|        ID   ID                                                            Usage       |
|=======================================================================================|
|    0   N/A  N/A    214626      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    318MiB   |
|    0   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    308MiB   |
|    0   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    308MiB   |
|    0   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    308MiB   |
|    0   N/A  N/A    214630      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    318MiB   |
|    0   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    308MiB   |
|    0   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    308MiB   |
|    0   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    308MiB   |
|    1   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    318MiB   |
|    1   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    318MiB   |
|    2   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    318MiB   |
|    2   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    318MiB   |
|    3   N/A  N/A    214629
Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
Hi Junchao, something I'm noting related to running with cuda enabled linear solvers (CG+HYPRE, CG+GAMG) is that for multi cpu-multi gpu calculations, the GPU 0 in the node is taking what seems to be all sub-matrices corresponding to all the MPI processes in the node. This is the result of the nvidia-smi command on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 GPU V100s:

Mon Aug 21 14:36:07 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB           On  | 0004:04:00.0     Off |                    0 |
| N/A   34C    P0              63W / 300W |  2488MiB / 16384MiB  |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-16GB           On  | 0004:05:00.0     Off |                    0 |
| N/A   38C    P0              56W / 300W |   638MiB / 16384MiB  |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2-16GB           On  | 0035:03:00.0     Off |                    0 |
| N/A   35C    P0              52W / 300W |   638MiB / 16384MiB  |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2-16GB           On  | 0035:04:00.0     Off |                    0 |
| N/A   38C    P0              53W / 300W |   638MiB / 16384MiB  |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                           GPU Memory  |
|        ID   ID                                                            Usage       |
|=======================================================================================|
|    0   N/A  N/A    214626      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    318MiB   |
|    0   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    308MiB   |
|    0   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    308MiB   |
|    0   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    308MiB   |
|    0   N/A  N/A    214630      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    318MiB   |
|    0   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    308MiB   |
|    0   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    308MiB   |
|    0   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    308MiB   |
|    1   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    318MiB   |
|    1   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    318MiB   |
|    2   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    318MiB   |
|    2   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    318MiB   |
|    3   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    318MiB   |
|    3   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux    318MiB   |
+---------------------------------------------------------------------------------------+

You can see that GPU 0 is connected to all 8 MPI Processes, each taking about 300MB on it, whereas GPUs 1, 2 and 3 are working with 2 MPI Processes. I'm wondering if this is expected or there are some changes I need to do on my submission script/runtime parameters. This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPU/node):

#!/bin/bash
# ../../Utilities/Scripts/qfds.sh -p 2 -T db -d test.fds
#SBATCH -J test
#SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err
#SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log
#SBATCH --partition=gpu
#SBATCH --ntasks=16
#SBATCH --ntasks-per-node=8
Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited on signal 6 (Aborted).

BTW, I'm curious: if I set n MPI processes, each of them building a part of the linear system, and g GPUs, how does PETSc distribute those n pieces of the system matrix and rhs among the g GPUs? Does it do some load balancing algorithm? Where can I read about this? Thank you and best regards. I can also point you to my code repo in GitHub if you want to take a closer look. Best Regards, Marcos

From: Junchao Zhang
Sent: Friday, August 11, 2023 10:52 AM
To: Vanella, Marcos (Fed)
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU

Hi, Marcos, Could you build petsc in debug mode and then copy and paste the whole error stack message? Thanks --Junchao Zhang

On Thu, Aug 10, 2023 at 5:51 PM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote: Hi, I'm trying to run a parallel matrix vector build and linear solution with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix build and solution is successful on CPUs only. I'm using cuda 11.5, cuda-enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the following error:

terminate called after throwing an instance of 'thrust::system::system_error'
  what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
terminate called after throwing an instance of 'thrust::system::system_error'
  what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
Program received signal SIGABRT: Process abort signal.

I'm new to submitting jobs in slurm that also use GPU resources, so I might be doing something wrong in my submission script.
This is it:

#!/bin/bash
#SBATCH -J test
#SBATCH -e /home/Issues/PETSc/test.err
#SBATCH -o /home/Issues/PETSc/test.log
#SBATCH --partition=batch
#SBATCH --ntasks=2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=2
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:1

export OMP_NUM_THREADS=1
module load cuda/11.5
module load openmpi/4.1.1

cd /home/Issues/PETSc
mpirun -n 2 /home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg

If anyone has any suggestions on how to troubleshoot this please let me know. Thanks! Marcos
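On the distribution question in this thread: as far as I understand, PETSc does not re-balance work across GPUs on its own. Each MPI rank owns a contiguous block of matrix rows (split nearly evenly, in the spirit of PetscSplitOwnership), and a rank's local block lives on whichever GPU that rank is bound to, so the GPU balance follows directly from the rank-to-GPU binding. A sketch of the ownership arithmetic (an illustrative re-implementation, not PETSc source code):

```python
def split_ownership(n_rows, n_ranks):
    """PETSc-style default ownership: n_rows // n_ranks rows per rank,
    with the first n_rows % n_ranks ranks taking one extra row."""
    base, extra = divmod(n_rows, n_ranks)
    ranges, start = [], 0
    for r in range(n_ranks):
        local = base + (1 if r < extra else 0)
        ranges.append((start, start + local))  # [rstart, rend) for rank r
        start += local
    return ranges

# e.g. 10 rows over 3 ranks -> (0,4), (4,7), (7,10)
rows = split_ownership(10, 3)
```

With this layout, binding two ranks to each of 4 GPUs puts two contiguous row blocks on each device, which matches the per-process memory pattern seen in the nvidia-smi output earlier in the thread.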
[petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU
Hi, I'm trying to run a parallel matrix vector build and linear solution with PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix build and solution is successful on CPUs only. I'm using cuda 11.5, cuda-enabled openmpi and gcc 9.3. When I run the job with GPU enabled I get the following error:

terminate called after throwing an instance of 'thrust::system::system_error'
  what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
terminate called after throwing an instance of 'thrust::system::system_error'
  what(): merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
Program received signal SIGABRT: Process abort signal.

I'm new to submitting jobs in slurm that also use GPU resources, so I might be doing something wrong in my submission script. This is it:

#!/bin/bash
#SBATCH -J test
#SBATCH -e /home/Issues/PETSc/test.err
#SBATCH -o /home/Issues/PETSc/test.log
#SBATCH --partition=batch
#SBATCH --ntasks=2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=2
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:1

export OMP_NUM_THREADS=1
module load cuda/11.5
module load openmpi/4.1.1

cd /home/Issues/PETSc
mpirun -n 2 /home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg

If anyone has any suggestions on how to troubleshoot this please let me know. Thanks! Marcos
Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution
Sorry, meant 100K to 200K cells. Also, check the release page of suitesparse. The multi-GPU version of cholmod might be coming soon: https://people.engr.tamu.edu/davis/SuiteSparse/index.html

From: Vanella, Marcos (Fed)
Sent: Tuesday, June 27, 2023 2:56 PM
To: Matthew Knepley
Cc: Mark Adams ; petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution

Thank you Matt. I'll try the flags you recommend for monitoring. Correct, I'm trying to see if GPU would provide an advantage for this particular Poisson solution we do in our code. Our grids are staggered with the Poisson unknown in cell centers. All my tests for single mesh runs with 100K to 200K meshes show MKL PARDISO as the fastest option for these meshes considering the mesh as unstructured (an implementation separate from the PETSc option). We have the option of Fishpack (fast trigonometric solvers), but that is not as general (requires solution on the whole mesh + a special treatment of immersed geometry). The single mesh solver is used as a black box within a fixed point domain decomposition iteration in multi-mesh cases. The approximation error in this method is confined to the mesh boundaries. The other option I have tried with MKL is to build the global matrix across all meshes and use the MKL cluster sparse solver. The problem becomes a memory one for meshes that go over a couple million unknowns due to the exact Cholesky factorization matrix storage. I'm thinking the other possibility using PETSc is to build in parallel the global matrix (as done for the MKL global solver) and try the GPU accelerated Krylov + multigrid preconditioner. If this can bring down the time to solution to what we get for the previous scheme and keep memory use under control it would be a good option for CPU+GPU systems. The thing is, we need to bring the residual of the equation to ~10^-10 or less to avoid instability, so it might still be costly. I'll keep you updated.
Thanks, Marcos

From: Matthew Knepley
Sent: Tuesday, June 27, 2023 2:08 PM
To: Vanella, Marcos (Fed)
Cc: Mark Adams ; petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution

On Tue, Jun 27, 2023 at 11:23 AM Vanella, Marcos (Fed) <marcos.vane...@nist.gov> wrote: Hi Mark and Matt, I tried swapping the preconditioner to cholmod and also the hypre Boomer AMG. They work just fine for my case. I also got my hands on a machine with NVIDIA gpus in one of our AI clusters. I compiled PETSc to make use of cuda and cuda-enabled openmpi (with gcc). I'm running the previous tests and want to also check some of the cuda enabled solvers. I was able to submit a case for the default Krylov solver with these runtime flags: -vec_type seqcuda -mat_type seqaijcusparse -pc_type cholesky -pc_factor_mat_solver_type cusparse. The case ran to completion. I guess my question now is how do I monitor (if there is a way) that the GPU is being used in the calculation, and any other stats? You should get that automatically with -log_view If you want finer-grained profiling of the kernels, you can use -log_view_gpu_time but it can slow things down. Also, which other solver combination using GPU would you recommend for me to try? Can we compile PETSc with the cuda enabled version for CHOLMOD and HYPRE? Hypre has GPU support but not CHOLMOD. There are no rules of thumb right now for GPUs. It depends on what card you have, what version of the driver, what version of the libraries, etc. It is very fragile. Hopefully this period ends soon, but I am not optimistic. Unless you are very confident that GPUs will help, I would not recommend spending the time. Thanks, Matt Thank you for your help!
Marcos From: Matthew Knepley <knep...@gmail.com> Sent: Monday, June 26, 2023 12:11 PM To: Vanella, Marcos (Fed) <marcos.vane...@nist.gov> Cc: Mark Adams <mfad...@lbl.gov>; petsc-users@mcs.anl.gov Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution On Mon, Jun 26, 2023 at 12:08 PM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote: Thank you Matt and Mark, I'll try your suggestions. To configure with hypre can I just use the --download-hypre configure line? Yes, Thanks, Matt That is what I did with suitesparse, very nice. From: Mark Adams <mfad...@lbl.gov> Sent: Monday, June 26, 2023 12:05 PM To: Vanella, Marcos (Fed) <marcos.vane...@nist.gov> Cc: petsc-users@mcs.anl.gov Subject: Re: [petsc-users]
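[Editor's note] The monitoring flags and the planned parallel GPU experiment discussed above can all be driven from runtime options. A sketch, assuming a CUDA-enabled PETSc 3.19 build; the executable name ./myapp and the process count are placeholders, not taken from this thread:

```shell
# Single-GPU run as tested above, with -log_view to confirm GPU execution;
# in the event table -log_view prints, check the GPU flop fraction column
# and the CpuToGpu/GpuToCpu transfer counts.
./myapp -vec_type seqcuda -mat_type seqaijcusparse \
        -pc_type cholesky -pc_factor_mat_solver_type cusparse -log_view

# Finer-grained kernel timing (slower, per Matt's caveat):
./myapp -vec_type seqcuda -mat_type seqaijcusparse \
        -pc_type cholesky -pc_factor_mat_solver_type cusparse \
        -log_view -log_view_gpu_time

# The parallel global-matrix + GPU Krylov/AMG experiment, with the ~1e-10
# residual target mentioned above:
mpirun -n 4 ./myapp -vec_type cuda -mat_type aijcusparse \
  -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg \
  -ksp_rtol 1e-10 -ksp_monitor_true_residual -log_view
```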
Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution
Thank you Matt and Mark, I'll try your suggestions. To configure with hypre can I just use the --download-hypre configure line? That is what I did with suitesparse, very nice. From: Mark Adams Sent: Monday, June 26, 2023 12:05 PM To: Vanella, Marcos (Fed) Cc: petsc-users@mcs.anl.gov Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution I'm not sure what MG is doing with an "unstructured" problem. I assume you are not using DMDA. -pc_type gamg should work. I would configure with hypre and try that also: -pc_type hypre. As Matt said, MG should be faster. How many iterations was it taking? Try a 100^3 and check that the iteration count does not change much, if at all. Mark On Mon, Jun 26, 2023 at 11:35 AM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote: Hi, I was wondering if anyone has experience on which combinations are more efficient to solve a Poisson problem derived from a 7 point stencil on a single mesh (serial). I've been doing some tests of multigrid and Cholesky on a 50^3 mesh. -pc_type mg takes about 75% more time than -pc_type cholesky -pc_factor_mat_solver_type cholmod for the case I'm testing. I'm new to PETSc, so any suggestions are most welcome and appreciated, Marcos
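[Editor's note] A sketch of trying these suggestions, under stated assumptions: the configure line follows the same --download-<package> pattern already used for suitesparse, and ./poisson_test with a hypothetical -n grid-size option stands in for the actual test driver:

```shell
# Build PETSc with both SuiteSparse and hypre support:
./configure --download-suitesparse --download-hypre

# Mark's check: for a scalable MG preconditioner the iteration count
# should stay roughly flat when the grid is refined from 50^3 to 100^3.
./poisson_test -n 50  -ksp_type cg -pc_type gamg  -ksp_converged_reason
./poisson_test -n 100 -ksp_type cg -pc_type gamg  -ksp_converged_reason
./poisson_test -n 100 -ksp_type cg -pc_type hypre -ksp_converged_reason
```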
[petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution
Hi, I was wondering if anyone has experience on which solver + preconditioner combinations are more efficient for solving a Poisson problem derived from a 7 point stencil on a single mesh (serial). I've been doing some tests of multigrid and Cholesky on a 50^3 mesh. -pc_type mg takes about 75% more time than -pc_type cholesky -pc_factor_mat_solver_type cholmod for the case I'm testing. I'm new to PETSc, so any suggestions are most welcome and appreciated, Marcos
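[Editor's note] The two setups being compared here are selected purely through runtime options. A sketch with a hypothetical driver name; -log_view is added to see where the time goes:

```shell
# Multigrid-preconditioned Krylov solve:
./poisson_test -pc_type mg -ksp_monitor -log_view

# Direct sparse Cholesky through SuiteSparse/CHOLMOD
# (PETSc must be configured with --download-suitesparse):
./poisson_test -ksp_type preonly -pc_type cholesky \
               -pc_factor_mat_solver_type cholmod -log_view
```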
Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI
May 15, 2023 12:08 PM To: Vanella, Marcos (Fed) <marcos.vane...@nist.gov> Cc: petsc-users@mcs.anl.gov Subject: Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI On Mon, May 15, 2023 at 11:19 AM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote: Hello, I'm trying to compile the PETSc library version 3.19.1 with OpenMPI 4.1.4 and the OneAPI 2022 Update 2 Intel Compiler suite on a Mac with OSX Ventura 13.3.1. I can compile PETSc in debug mode with these configure and make lines. I can run the PETSc tests, which seem fine. When I compile the library in optimized mode, either using -O3 or -O1, for example configuring with: I hate to yell "compiler bug" when this happens, but it sure seems like one. Can you just use --with-debugging=0 without the custom COPTFLAGS, CXXOPTFLAGS, FOPTFLAGS? If that works, it is almost certainly a compiler bug. If not, then we can go in the debugger and see what is failing. Thanks, Matt $ ./configure --prefix=/opt/petsc-oneapi22u3 --with-blaslapack-dir=/opt/intel/oneapi/mkl/2022.2.1 COPTFLAGS='-m64 -O1 -g -diag-disable=10441' CXXOPTFLAGS='-m64 -O1 -g -diag-disable=10441' FOPTFLAGS='-m64 -O1 -g' LDFLAGS='-m64' --with-debugging=0 --with-shared-libraries=0 --download-make and using mpicc (icc), mpif90 (ifort) from Open MPI, the static lib compiles. 
Yet, I see right off the bat this segfault error in the first PETSc example: $ make PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 PETSC_ARCH=arch-darwin-c-opt test /Users/mnv/Documents/Software/petsc-3.19.1/arch-darwin-c-opt/bin/make --no-print-directory -f /Users/mnv/Documents/Software/petsc-3.19.1/gmakefile.test PETSC_ARCH=arch-darwin-c-opt PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 test /opt/intel/oneapi/intelpython/latest/bin/python3 /Users/mnv/Documents/Software/petsc-3.19.1/config/gmakegentest.py --petsc-dir=/Users/mnv/Documents/Software/petsc-3.19.1 --petsc-arch=arch-darwin-c-opt --testdir=./arch-darwin-c-opt/tests Using MAKEFLAGS: --no-print-directory -- PETSC_ARCH=arch-darwin-c-opt PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 CC arch-darwin-c-opt/tests/sys/classes/draw/tests/ex1.o In file included from /Users/mnv/Documents/Software/petsc-3.19.1/include/petscsys.h(44), from /Users/mnv/Documents/Software/petsc-3.19.1/src/sys/classes/draw/tests/ex1.c(4): /Users/mnv/Documents/Software/petsc-3.19.1/include/petscsystypes.h(68): warning #2621: attribute "warn_unused_result" does not apply here PETSC_ERROR_CODE_TYPEDEF enum PETSC_ERROR_CODE_NODISCARD { ^ CLINKER arch-darwin-c-opt/tests/sys/classes/draw/tests/ex1 TEST arch-darwin-c-opt/tests/counts/sys_classes_draw_tests-ex1_1.counts not ok sys_classes_draw_tests-ex1_1 # Error code: 139 # [excess:98681] *** Process received signal *** # [excess:98681] Signal: Segmentation fault: 11 (11) # [excess:98681] Signal code: Address not mapped (1) # [excess:98681] Failing at address: 0x7f # [excess:98681] *** End of error message *** # -- # Primary job terminated normally, but 1 process returned # a non-zero exit code. Per user-direction, the job has been aborted. # -- # -- # mpiexec noticed that process rank 0 with PID 0 on node excess exited on signal 11 (Segmentation fault: 11). 
# -- ok sys_classes_draw_tests-ex1_1 # SKIP Command failed so no diff I see the same segfault error in all PETSc examples. Any help is mostly appreciated, I'm starting to work with PETSc. Our plan is to use the linear solver from PETSc for the Poisson equation on our numerical scheme and test this on a GPU cluster. So also, any guideline on how to interface PETSc with a fortran code and personal experience is also most appreciated! Marcos -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/<http://www.cse.buffalo.edu/~knepley/> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/<http://www.cse.buffalo.edu/~knepley/>
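[Editor's note] Matt's sanity check, spelled out as a configure line: the original command with the custom COPTFLAGS/CXXOPTFLAGS/FOPTFLAGS/LDFLAGS removed, so PETSc chooses its own optimization flags. Paths are copied from the command quoted above:

```shell
./configure --prefix=/opt/petsc-oneapi22u3 \
  --with-blaslapack-dir=/opt/intel/oneapi/mkl/2022.2.1 \
  --with-debugging=0 --with-shared-libraries=0 --download-make
```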
Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI
has been aborted. -- -- mpiexec noticed that process rank 0 with PID 0 on node excess exited on signal 11 (Segmentation fault: 11). -- Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes See https://petsc.org/release/faq/ [excess:37831] *** Process received signal *** [excess:37831] Signal: Segmentation fault: 11 (11) [excess:37831] Signal code: Address not mapped (1) [excess:37831] Failing at address: 0x7f [excess:37831] *** End of error message *** [excess:37832] *** Process received signal *** [excess:37832] Signal: Segmentation fault: 11 (11) [excess:37832] Signal code: Address not mapped (1) [excess:37832] Failing at address: 0x7f [excess:37832] *** End of error message *** -- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -- -- mpiexec noticed that process rank 1 with PID 0 on node excess exited on signal 11 (Segmentation fault: 11). -- Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI process See https://petsc.org/release/faq/ forrtl: severe (174): SIGSEGV, segmentation fault occurred Image PCRoutineLineSource libifcoremt.dylib 00010B7F7FE4 for__signal_handl Unknown Unknown libsystem_platfor 7FF8024C25ED _sigtramp Unknown Unknown ex5f 0001087AFA38 PetscGetArchType Unknown Unknown ex5f 00010887913B PetscErrorPrintfI Unknown Unknown ex5f 00010878D227 PetscInitialize_C Unknown Unknown ex5f 00010879D289 petscinitializef_ Unknown Unknown ex5f 000108713C09 petscsys_mp_petsc Unknown Unknown ex5f 000108710B5D MAIN__Unknown Unknown ex5f 000108710AEE main Unknown Unknown dyld 7FF80213B41F start Unknown Unknown -- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -- -- mpiexec detected that one or more processes exited with non-zero status, thus causing the job to be terminated. 
The first process to do so was: Process name: [[48108,1],0] Exit code: 174 -- Completed test examples Error while running make check make[1]: *** [check] Error 1 make: *** [check] Error 2 From: Vanella, Marcos (Fed) <marcos.vane...@nist.gov> Sent: Monday, May 15, 2023 12:20 PM To: Matthew Knepley <knep...@gmail.com> Cc: petsc-users@mcs.anl.gov Subject: Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI Thank you Matt, I'll try this and let you know. Marcos
Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI
Hi Satish, yes the -m64 flag tells the compilers the target CPU is Intel 64. The only reason I'm trying to get PETSc working with Intel is that the bundles for the software we release use Intel compilers for Linux, Mac and Windows (OneAPI IntelMPI for Linux and Windows, OpenMPI compiled with Intel for MacOS). I'm just trying to get PETSc compiled with Intel to maintain the scheme we have and keep these compilers, which would be handy if we are to release an alternative Poisson solver using PETSc in the future. For our research projects I'm thinking we'll use gcc/openmpi on Linux clusters. Marcos From: Satish Balay Sent: Monday, May 15, 2023 12:48 PM To: Vanella, Marcos (Fed) Cc: petsc-users Subject: Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI Oops - for some reason I assumed this build is on a Mac M1 [likely due to the usage of '-m64' - that was strange]. But yeah - our general usage on Mac is with xcode/clang and brew gfortran (on both Intel and ARM CPUs) - and unless you need Intel compilers for specific needs - clang/gfortran should work better for this development work. Satish On Mon, 15 May 2023, Vanella, Marcos (Fed) via petsc-users wrote: > Hi Satish, well turns out this is not an M1 Mac, it is an older Intel Mac > (2019). > I'm trying to get a local computer to do development and tests, but I also > have access to linux clusters with GPU which we plan to go to next. > Thanks for the suggestion, I might also try compiling a gcc/gfortran version > of the lib on this computer. > Marcos > > From: Satish Balay > Sent: Monday, May 15, 2023 12:10 PM > To: Vanella, Marcos (Fed) > Cc: petsc-users@mcs.anl.gov > Subject: Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and > OpenMPI > > I see Intel compilers here are building x86_64 binaries - that get run on the > Arm M1 CPU - perhaps there are issues here with this mode of usage.. > > > I'm starting to work with PETSc. 
Our plan is to use the linear solver from > > PETSc for the Poisson equation on our numerical scheme and test this on a > > GPU cluster. > > What does intel compilers provide you for this use case? > > Why not use xcode/clang with gfortran here - i.e native ARM binaries? > > > Satish > > On Mon, 15 May 2023, Vanella, Marcos (Fed) via petsc-users wrote: > > > Hello, I'm trying to compile the PETSc library version 3.19.1 with OpenMPI > > 4.1.4 and the OneAPI 2022 Update 2 Intel Compiler suite on a Mac with OSX > > Ventura 13.3.1. > > I can compile PETSc in debug mode with this configure and make lines. I can > > run the PETSC tests, which seem fine. > > When I compile the library in optimized mode, either using -O3 or O1, for > > example configuring with: > > > > $ ./configure --prefix=/opt/petsc-oneapi22u3 > > --with-blaslapack-dir=/opt/intel/oneapi/mkl/2022.2.1 COPTFLAGS='-m64 -O1 -g > > -diag-disable=10441' CXXOPTFLAGS='-m64 -O1 -g -diag-disable=10441' > > FOPTFLAGS='-m64 -O1 -g' LDFLAGS='-m64' --with-debugging=0 > > --with-shared-libraries=0 --download-make > > > > and using mpicc (icc), mpif90 (ifort) from Open MPI, the static lib > > compiles. 
Yet, I see right off the bat this segfault error in the first > > PETSc example: > > > > $ make PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 > > PETSC_ARCH=arch-darwin-c-opt test > > /Users/mnv/Documents/Software/petsc-3.19.1/arch-darwin-c-opt/bin/make > > --no-print-directory -f > > /Users/mnv/Documents/Software/petsc-3.19.1/gmakefile.test > > PETSC_ARCH=arch-darwin-c-opt > > PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 test > > /opt/intel/oneapi/intelpython/latest/bin/python3 > > /Users/mnv/Documents/Software/petsc-3.19.1/config/gmakegentest.py > > --petsc-dir=/Users/mnv/Documents/Software/petsc-3.19.1 > > --petsc-arch=arch-darwin-c-opt --testdir=./arch-darwin-c-opt/tests > > Using MAKEFLAGS: --no-print-directory -- PETSC_ARCH=arch-darwin-c-opt > > PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 > > CC arch-darwin-c-opt/tests/sys/classes/draw/tests/ex1.o > > In file included from > > /Users/mnv/Documents/Software/petsc-3.19.1/include/petscsys.h(44), > > from > > /Users/mnv/Documents/Software/petsc-3.19.1/src/sys/classes/draw/tests/ex1.c(4): > > /Users/mnv/Documents/Software/petsc-3.19.1/include/petscsystypes.h(68): > > warning #2621: attribute "warn_unused_result" does not apply here > > PETSC_ERROR_CODE_TYPEDEF enum PETSC_ERROR_CODE_NODISCARD { > > ^ > > > > CLINKER arch-darwin-c-opt/tests/sys/c
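[Editor's note] For the gcc/gfortran fallback Marcos mentions, a minimal configure sketch. The compiler names are the usual Xcode clang and Homebrew gfortran and are assumptions, not taken from this thread:

```shell
# Native toolchain instead of Intel OneAPI; let PETSc download MPI and BLAS/LAPACK:
./configure --with-cc=clang --with-cxx=clang++ --with-fc=gfortran \
  --download-openmpi --download-fblaslapack --with-debugging=0
```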
Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI
Hi Satish, well turns out this is not an M1 Mac, it is an older Intel Mac (2019). I'm trying to get a local computer to do development and tests, but I also have access to linux clusters with GPU which we plan to go to next. Thanks for the suggestion, I might also try compiling a gcc/gfortran version of the lib on this computer. Marcos From: Satish Balay Sent: Monday, May 15, 2023 12:10 PM To: Vanella, Marcos (Fed) Cc: petsc-users@mcs.anl.gov Subject: Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI I see Intel compilers here are building x86_64 binaries - that get run on the Arm M1 CPU - perhaps there are issues here with this mode of usage.. > I'm starting to work with PETSc. Our plan is to use the linear solver from > PETSc for the Poisson equation on our numerical scheme and test this on a GPU > cluster. What does intel compilers provide you for this use case? Why not use xcode/clang with gfortran here - i.e native ARM binaries? Satish On Mon, 15 May 2023, Vanella, Marcos (Fed) via petsc-users wrote: > Hello, I'm trying to compile the PETSc library version 3.19.1 with OpenMPI > 4.1.4 and the OneAPI 2022 Update 2 Intel Compiler suite on a Mac with OSX > Ventura 13.3.1. > I can compile PETSc in debug mode with this configure and make lines. I can > run the PETSC tests, which seem fine. > When I compile the library in optimized mode, either using -O3 or O1, for > example configuring with: > > $ ./configure --prefix=/opt/petsc-oneapi22u3 > --with-blaslapack-dir=/opt/intel/oneapi/mkl/2022.2.1 COPTFLAGS='-m64 -O1 -g > -diag-disable=10441' CXXOPTFLAGS='-m64 -O1 -g -diag-disable=10441' > FOPTFLAGS='-m64 -O1 -g' LDFLAGS='-m64' --with-debugging=0 > --with-shared-libraries=0 --download-make > > and using mpicc (icc), mpif90 (ifort) from Open MPI, the static lib > compiles. 
> Yet, I see right off the bat this segfault error in the first PETSc
> example:
>
> $ make PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1
> PETSC_ARCH=arch-darwin-c-opt test
> /Users/mnv/Documents/Software/petsc-3.19.1/arch-darwin-c-opt/bin/make
> --no-print-directory -f
> /Users/mnv/Documents/Software/petsc-3.19.1/gmakefile.test
> PETSC_ARCH=arch-darwin-c-opt
> PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 test
> /opt/intel/oneapi/intelpython/latest/bin/python3
> /Users/mnv/Documents/Software/petsc-3.19.1/config/gmakegentest.py
> --petsc-dir=/Users/mnv/Documents/Software/petsc-3.19.1
> --petsc-arch=arch-darwin-c-opt --testdir=./arch-darwin-c-opt/tests
> Using MAKEFLAGS: --no-print-directory -- PETSC_ARCH=arch-darwin-c-opt
> PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1
> CC arch-darwin-c-opt/tests/sys/classes/draw/tests/ex1.o
> In file included from
> /Users/mnv/Documents/Software/petsc-3.19.1/include/petscsys.h(44),
> from
> /Users/mnv/Documents/Software/petsc-3.19.1/src/sys/classes/draw/tests/ex1.c(4):
> /Users/mnv/Documents/Software/petsc-3.19.1/include/petscsystypes.h(68):
> warning #2621: attribute "warn_unused_result" does not apply here
> PETSC_ERROR_CODE_TYPEDEF enum PETSC_ERROR_CODE_NODISCARD {
> ^
>
> CLINKER arch-darwin-c-opt/tests/sys/classes/draw/tests/ex1
> TEST arch-darwin-c-opt/tests/counts/sys_classes_draw_tests-ex1_1.counts
> not ok sys_classes_draw_tests-ex1_1 # Error code: 139
> # [excess:98681] *** Process received signal ***
> # [excess:98681] Signal: Segmentation fault: 11 (11)
> # [excess:98681] Signal code: Address not mapped (1)
> # [excess:98681] Failing at address: 0x7f
> # [excess:98681] *** End of error message ***
> # --
> # Primary job terminated normally, but 1 process returned
> # a non-zero exit code. Per user-direction, the job has been aborted.
> # --
> # --
> # mpiexec noticed that process rank 0 with PID 0 on node excess exited on
> signal 11 (Segmentation fault: 11).
> # --
> ok sys_classes_draw_tests-ex1_1 # SKIP Command failed so no diff
>
> I see the same segfault error in all PETSc examples.
> Any help is mostly appreciated, I'm starting to work with PETSc. Our plan is
> to use the linear solver from PETSc for the Poisson equation on our numerical
> scheme and test this on a GPU cluster. So also, any guideline on how to
> interface PETSc with a fortran code and personal experience is also most
> appreciated!
>
> Marcos
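One quick way to check the x86_64-vs-ARM hypothesis raised above is the macOS `file` utility, which reports the Mach-O architecture of a binary. A minimal sketch, assuming the build-tree paths from the log in this thread (the library path is an assumption; substitute your own artifacts):

```shell
# Report the architecture of the built PETSc library and of the MPI
# compiler wrapper itself. On an Apple Silicon Mac, x86_64 binaries run
# under Rosetta 2, while arm64 binaries are native; a mismatch between
# the compilers and the host CPU would show up here.
# (Paths are assumptions based on the build log in this thread.)
file /Users/mnv/Documents/Software/petsc-3.19.1/arch-darwin-c-opt/lib/libpetsc.a
file "$(which mpicc)"
```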
Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI
Thank you Matt, I'll try this and let you know.

Marcos

From: Matthew Knepley
Sent: Monday, May 15, 2023 12:08 PM
To: Vanella, Marcos (Fed)
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI

On Mon, May 15, 2023 at 11:19 AM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote:

Hello, I'm trying to compile the PETSc library version 3.19.1 with OpenMPI 4.1.4 and the OneAPI 2022 Update 2 Intel Compiler suite on a Mac with OSX Ventura 13.3.1. I can compile PETSc in debug mode with these configure and make lines. I can run the PETSc tests, which seem fine. When I compile the library in optimized mode, either using -O3 or -O1, for example configuring with:

I hate to yell "compiler bug" when this happens, but it sure seems like one. Can you just use --with-debugging=0 without the custom COPTFLAGS, CXXOPTFLAGS, FOPTFLAGS? If that works, it is almost certainly a compiler bug. If not, then we can go in the debugger and see what is failing.

Thanks,

Matt

$ ./configure --prefix=/opt/petsc-oneapi22u3 --with-blaslapack-dir=/opt/intel/oneapi/mkl/2022.2.1 COPTFLAGS='-m64 -O1 -g -diag-disable=10441' CXXOPTFLAGS='-m64 -O1 -g -diag-disable=10441' FOPTFLAGS='-m64 -O1 -g' LDFLAGS='-m64' --with-debugging=0 --with-shared-libraries=0 --download-make

and using mpicc (icc), mpif90 (ifort) from Open MPI, the static lib compiles.
Yet, I see right off the bat this segfault error in the first PETSc example:

$ make PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 PETSC_ARCH=arch-darwin-c-opt test
/Users/mnv/Documents/Software/petsc-3.19.1/arch-darwin-c-opt/bin/make --no-print-directory -f /Users/mnv/Documents/Software/petsc-3.19.1/gmakefile.test PETSC_ARCH=arch-darwin-c-opt PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 test
/opt/intel/oneapi/intelpython/latest/bin/python3 /Users/mnv/Documents/Software/petsc-3.19.1/config/gmakegentest.py --petsc-dir=/Users/mnv/Documents/Software/petsc-3.19.1 --petsc-arch=arch-darwin-c-opt --testdir=./arch-darwin-c-opt/tests
Using MAKEFLAGS: --no-print-directory -- PETSC_ARCH=arch-darwin-c-opt PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1
CC arch-darwin-c-opt/tests/sys/classes/draw/tests/ex1.o
In file included from /Users/mnv/Documents/Software/petsc-3.19.1/include/petscsys.h(44),
from /Users/mnv/Documents/Software/petsc-3.19.1/src/sys/classes/draw/tests/ex1.c(4):
/Users/mnv/Documents/Software/petsc-3.19.1/include/petscsystypes.h(68): warning #2621: attribute "warn_unused_result" does not apply here
PETSC_ERROR_CODE_TYPEDEF enum PETSC_ERROR_CODE_NODISCARD {
^

CLINKER arch-darwin-c-opt/tests/sys/classes/draw/tests/ex1
TEST arch-darwin-c-opt/tests/counts/sys_classes_draw_tests-ex1_1.counts
not ok sys_classes_draw_tests-ex1_1 # Error code: 139
# [excess:98681] *** Process received signal ***
# [excess:98681] Signal: Segmentation fault: 11 (11)
# [excess:98681] Signal code: Address not mapped (1)
# [excess:98681] Failing at address: 0x7f
# [excess:98681] *** End of error message ***
# --
# Primary job terminated normally, but 1 process returned
# a non-zero exit code. Per user-direction, the job has been aborted.
# --
# --
# mpiexec noticed that process rank 0 with PID 0 on node excess exited on signal 11 (Segmentation fault: 11).
# --
ok sys_classes_draw_tests-ex1_1 # SKIP Command failed so no diff

I see the same segfault error in all PETSc examples.
Any help is mostly appreciated, I'm starting to work with PETSc. Our plan is to use the linear solver from PETSc for the Poisson equation on our numerical scheme and test this on a GPU cluster. So also, any guideline on how to interface PETSc with a Fortran code and personal experience is also most appreciated!

Marcos

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
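The suggestion above amounts to rerunning configure with the custom optimization flags dropped, so PETSc chooses its own defaults. A sketch of that simplified line, reusing the prefix and MKL path already given in this thread:

```shell
# Same configure as before, minus COPTFLAGS/CXXOPTFLAGS/FOPTFLAGS/LDFLAGS.
# If the examples then run cleanly, the custom flags (or an icc optimizer
# bug triggered by them) are the likely culprit.
./configure --prefix=/opt/petsc-oneapi22u3 \
  --with-blaslapack-dir=/opt/intel/oneapi/mkl/2022.2.1 \
  --with-debugging=0 --with-shared-libraries=0 --download-make
```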
[petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI
Hello, I'm trying to compile the PETSc library version 3.19.1 with OpenMPI 4.1.4 and the OneAPI 2022 Update 2 Intel Compiler suite on a Mac with OSX Ventura 13.3.1. I can compile PETSc in debug mode with these configure and make lines. I can run the PETSc tests, which seem fine.

When I compile the library in optimized mode, either using -O3 or -O1, for example configuring with:

$ ./configure --prefix=/opt/petsc-oneapi22u3 --with-blaslapack-dir=/opt/intel/oneapi/mkl/2022.2.1 COPTFLAGS='-m64 -O1 -g -diag-disable=10441' CXXOPTFLAGS='-m64 -O1 -g -diag-disable=10441' FOPTFLAGS='-m64 -O1 -g' LDFLAGS='-m64' --with-debugging=0 --with-shared-libraries=0 --download-make

and using mpicc (icc), mpif90 (ifort) from Open MPI, the static lib compiles. Yet, I see right off the bat this segfault error in the first PETSc example:

$ make PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 PETSC_ARCH=arch-darwin-c-opt test
/Users/mnv/Documents/Software/petsc-3.19.1/arch-darwin-c-opt/bin/make --no-print-directory -f /Users/mnv/Documents/Software/petsc-3.19.1/gmakefile.test PETSC_ARCH=arch-darwin-c-opt PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 test
/opt/intel/oneapi/intelpython/latest/bin/python3 /Users/mnv/Documents/Software/petsc-3.19.1/config/gmakegentest.py --petsc-dir=/Users/mnv/Documents/Software/petsc-3.19.1 --petsc-arch=arch-darwin-c-opt --testdir=./arch-darwin-c-opt/tests
Using MAKEFLAGS: --no-print-directory -- PETSC_ARCH=arch-darwin-c-opt PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1
CC arch-darwin-c-opt/tests/sys/classes/draw/tests/ex1.o
In file included from /Users/mnv/Documents/Software/petsc-3.19.1/include/petscsys.h(44),
from /Users/mnv/Documents/Software/petsc-3.19.1/src/sys/classes/draw/tests/ex1.c(4):
/Users/mnv/Documents/Software/petsc-3.19.1/include/petscsystypes.h(68): warning #2621: attribute "warn_unused_result" does not apply here
PETSC_ERROR_CODE_TYPEDEF enum PETSC_ERROR_CODE_NODISCARD {
^

CLINKER arch-darwin-c-opt/tests/sys/classes/draw/tests/ex1
TEST arch-darwin-c-opt/tests/counts/sys_classes_draw_tests-ex1_1.counts
not ok sys_classes_draw_tests-ex1_1 # Error code: 139
# [excess:98681] *** Process received signal ***
# [excess:98681] Signal: Segmentation fault: 11 (11)
# [excess:98681] Signal code: Address not mapped (1)
# [excess:98681] Failing at address: 0x7f
# [excess:98681] *** End of error message ***
# --
# Primary job terminated normally, but 1 process returned
# a non-zero exit code. Per user-direction, the job has been aborted.
# --
# --
# mpiexec noticed that process rank 0 with PID 0 on node excess exited on signal 11 (Segmentation fault: 11).
# --
ok sys_classes_draw_tests-ex1_1 # SKIP Command failed so no diff

I see the same segfault error in all PETSc examples.
Any help is mostly appreciated, I'm starting to work with PETSc. Our plan is to use the linear solver from PETSc for the Poisson equation on our numerical scheme and test this on a GPU cluster. So also, any guideline on how to interface PETSc with a Fortran code and personal experience is also most appreciated!

Marcos