Re: [petsc-users] Compiling PETSc in Polaris with gnu

2024-05-03 Thread Vanella, Marcos (Fed) via petsc-users
Thank you Satish, I'll give it a try.

From: Satish Balay 
Sent: Thursday, May 2, 2024 5:51 PM
To: Vanella, Marcos (Fed) 
Cc: Junchao Zhang ; petsc-users 
; Mueller, Eric V. (Fed) 
Subject: Re: [petsc-users] Compiling PETSc in Polaris with gnu

Perhaps you need to:

module load craype-accel-nvidia80

And then rebuild PETSc and your application,

And have the same list of modules loaded at runtime.
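A minimal sketch of that sequence (module names taken from the build recipe quoted below):

module use /soft/modulefiles
module load PrgEnv-gnu
module load cudatoolkit-standalone
module load craype-accel-nvidia80
# reconfigure and rebuild PETSc with the same options as before,
# then rebuild the application with this same module set loaded
./configure ...
make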

Satish

On Thu, 2 May 2024, Vanella, Marcos (Fed) via petsc-users wrote:

> Thank you Satish and Junchao! I was able to compile PETSc with your configure
> options + suitesparse and hypre, and then compile my Fortran code linking to
> PETSc.
> But when I try to run my test case, I pick up an error at the very
> beginning:
>
> MPICH ERROR [Rank 0] [job id 01eb3c4a-28a7-4178-aced-512b4fb704c6] [Thu May  
> 2 20:44:26 2024] [x3006c0s19b1n0] - Abort(-1) (rank 0 in comm 0): 
> MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not 
> linked
>  (Other MPI error)
>
> aborting job:
> MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not 
> linked
>
> It says in the Polaris user guide that:
>
> The environment variable MPICH_GPU_SUPPORT_ENABLED=1 needs to be set if your
> application requires MPI-GPU support whereby the MPI library sends and
> receives data directly from GPU buffers. In this case, it will be important
> to have the craype-accel-nvidia80 module loaded both when compiling your
> application and during runtime to correctly link against a GPU Transport
> Layer (GTL) MPI library. Otherwise, you'll likely see "GPU_SUPPORT_ENABLED is
> requested, but GTL library is not linked" errors during runtime.
>
> I tried loading this module (I also needed to add nvhpc-mixed) in my
> submission script but I get the same result.
> I'll get in touch with ALCF help on this.
>
>
>
> 
> From: Satish Balay 
> Sent: Thursday, May 2, 2024 11:58 AM
> To: Junchao Zhang 
> Cc: petsc-users ; Vanella, Marcos (Fed) 
> ; Mueller, Eric V. (Fed) 
> Subject: Re: [petsc-users] Compiling PETSc in Polaris with gnu
>
> I just tried a build (using default versions) - and the following builds for
> me [on the login node].
>
>
> module use /soft/modulefiles
> module load PrgEnv-gnu
> module load cudatoolkit-standalone
> module load cray-libsci
> ./configure --with-cc=cc --with-fc=ftn --with-cxx=CC --with-make-np=4 
> --with-cuda=1 --with-cudac=nvcc --with-cuda-arch=80 \
>   --with-debugging=0 COPTFLAGS=-O2 CXXOPTFLAGS=-O2 FOPTFLAGS=-O2 
> CUDAOPTFLAGS=-O2 --download-kokkos --download-kokkos-kernels
> make
>
> Satish
>
> ---
>
> balay@polaris-login-01:~> module list
>
> Currently Loaded Modules:
>   1) libfabric/1.15.2.0       4) darshan/3.4.4     7) cray-dsmml/0.2.2   10) cray-pals/1.3.4     13) PrgEnv-gnu/8.5.0
>   2) craype-network-ofi       5) gcc-native/12.3   8) cray-mpich/8.1.28  11) cray-libpals/1.3.4  14) cudatoolkit-standalone/12.2.2
>   3) perftools-base/23.12.0   6) craype/2.7.30     9) cray-pmi/6.1.13    12) craype-x86-milan    15) cray-libsci/23.12.5
>
>
> On Thu, 2 May 2024, Junchao Zhang wrote:
>
> > I used cudatoolkit-standalone/12.4.1 and gcc-12.3.
> >
> > Be sure to use the latest petsc/main or petsc/release, which contains fixes
> > for Polaris.
> >
> > --Junchao Zhang
> >
> >
> > On Thu, May 2, 2024 at 10:23 AM Satish Balay via petsc-users <
> > petsc-users@mcs.anl.gov> wrote:
> >
> > > Try:
> > >
> > > module use /soft/modulefiles
> > >
> > > Satish
> > >
> > > On Thu, 2 May 2024, Vanella, Marcos (Fed) via petsc-users wrote:
> > >
> > > > Hi all, it seems the modules in Polaris have changed (can't find
> > > cudatoolkit-standalone anymore).
> > > > Does anyone have recent experience compiling the library with gnu and
> > > cuda on the machine?
> > > > Thank you!
> > > > Marcos
> > > >
> > >
> > >
> >
>


Re: [petsc-users] Compiling PETSc in Polaris with gnu

2024-05-02 Thread Vanella, Marcos (Fed) via petsc-users
Thank you Satish and Junchao! I was able to compile PETSc with your configure
options + suitesparse and hypre, and then compile my Fortran code linking to
PETSc.
But when I try to run my test case, I pick up an error at the very beginning:

MPICH ERROR [Rank 0] [job id 01eb3c4a-28a7-4178-aced-512b4fb704c6] [Thu May  2 
20:44:26 2024] [x3006c0s19b1n0] - Abort(-1) (rank 0 in comm 0): 
MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked
 (Other MPI error)

aborting job:
MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked

It says in the Polaris user guide that:

The environment variable MPICH_GPU_SUPPORT_ENABLED=1 needs to be set if your
application requires MPI-GPU support whereby the MPI library sends and receives
data directly from GPU buffers. In this case, it will be important to have the
craype-accel-nvidia80 module loaded both when compiling your application and
during runtime to correctly link against a GPU Transport Layer (GTL) MPI
library. Otherwise, you'll likely see "GPU_SUPPORT_ENABLED is requested, but GTL
library is not linked" errors during runtime.

I tried loading this module (I also needed to add nvhpc-mixed) in my
submission script but I get the same result.
I'll get in touch with ALCF help on this.
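For anyone hitting the same GTL error, a minimal submission-script sketch (the PBS directives, rank counts and application name are placeholders; the module names are the ones used in this thread):

#!/bin/bash
#PBS -l select=1:system=polaris
#PBS -l walltime=00:30:00
module use /soft/modulefiles
module load PrgEnv-gnu cudatoolkit-standalone craype-accel-nvidia80
# request GPU-aware MPI; the same modules must be loaded as at build time
export MPICH_GPU_SUPPORT_ENABLED=1
mpiexec -n 4 --ppn 4 ./my_app my_input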




From: Satish Balay 
Sent: Thursday, May 2, 2024 11:58 AM
To: Junchao Zhang 
Cc: petsc-users ; Vanella, Marcos (Fed) 
; Mueller, Eric V. (Fed) 
Subject: Re: [petsc-users] Compiling PETSc in Polaris with gnu

I just tried a build (using default versions) - and the following builds for me
[on the login node].


module use /soft/modulefiles
module load PrgEnv-gnu
module load cudatoolkit-standalone
module load cray-libsci
./configure --with-cc=cc --with-fc=ftn --with-cxx=CC --with-make-np=4 
--with-cuda=1 --with-cudac=nvcc --with-cuda-arch=80 \
  --with-debugging=0 COPTFLAGS=-O2 CXXOPTFLAGS=-O2 FOPTFLAGS=-O2 
CUDAOPTFLAGS=-O2 --download-kokkos --download-kokkos-kernels
make

Satish

---

balay@polaris-login-01:~> module list

Currently Loaded Modules:
  1) libfabric/1.15.2.0       4) darshan/3.4.4     7) cray-dsmml/0.2.2   10) cray-pals/1.3.4     13) PrgEnv-gnu/8.5.0
  2) craype-network-ofi       5) gcc-native/12.3   8) cray-mpich/8.1.28  11) cray-libpals/1.3.4  14) cudatoolkit-standalone/12.2.2
  3) perftools-base/23.12.0   6) craype/2.7.30     9) cray-pmi/6.1.13    12) craype-x86-milan    15) cray-libsci/23.12.5


On Thu, 2 May 2024, Junchao Zhang wrote:

> I used cudatoolkit-standalone/12.4.1 and gcc-12.3.
>
> Be sure to use the latest petsc/main or petsc/release, which contains fixes
> for Polaris.
>
> --Junchao Zhang
>
>
> On Thu, May 2, 2024 at 10:23 AM Satish Balay via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
> > Try:
> >
> > module use /soft/modulefiles
> >
> > Satish
> >
> > On Thu, 2 May 2024, Vanella, Marcos (Fed) via petsc-users wrote:
> >
> > > Hi all, it seems the modules in Polaris have changed (can't find
> > cudatoolkit-standalone anymore).
> > > Does anyone have recent experience compiling the library with gnu and
> > cuda on the machine?
> > > Thank you!
> > > Marcos
> > >
> >
> >
>


[petsc-users] Compiling PETSc in Polaris with gnu

2024-05-02 Thread Vanella, Marcos (Fed) via petsc-users
Hi all, it seems the modules in Polaris have changed (I can't find
cudatoolkit-standalone anymore).
Does anyone have recent experience compiling the library with gnu and cuda on
the machine?
Thank you!
Marcos


Re: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time

2024-04-29 Thread Vanella, Marcos (Fed) via petsc-users
Thank you Barry and Satish.
Trying it now.

From: Barry Smith 
Sent: Monday, April 29, 2024 12:15 PM
To: Vanella, Marcos (Fed) 
Cc: ba...@mcs.anl.gov ; petsc-users 
Subject: Re: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time


--with-x=0
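
That is, appended to the existing configure invocation; a sketch:

$./configure [same options as before] --with-x=0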


On Apr 29, 2024, at 12:05 PM, Vanella, Marcos (Fed) via petsc-users 
 wrote:

Hi Satish,
Ok, thank you for clarifying. I don't need to include Metis in the config phase
then (I'm not using it anywhere else).
Is there a way I can configure PETSc to not require X11 (Xgraph functions, 
etc.)?
Thank you,
Marcos

From: Satish Balay <ba...@mcs.anl.gov>
Sent: Monday, April 29, 2024 12:00 PM
To: Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
Cc: petsc-users <petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time


# Other CMakeLists.txt files inside SuiteSparse are from dependent packages
# (LAGraph/deps/json_h, GraphBLAS/cpu_features, and CHOLMOD/SuiteSparse_metis
# which is a slightly revised copy of METIS 5.0.1) but none of those
# CMakeLists.txt files are used to build any package in SuiteSparse.


So suitesparse includes a copy of metis sources - i.e. it does not use an
external metis library?

>>
balay@pj01:~/petsc/arch-linux-c-debug/lib$ nm -Ao *.so |grep METIS_PartGraphKway
libcholmod.so:0026e500 T SuiteSparse_metis_METIS_PartGraphKway
<<<

And metis routines are already in -lcholmod [with some namespace fixes]

Satish

On Mon, 29 Apr 2024, Vanella, Marcos (Fed) via petsc-users wrote:

> Hi all, I'm wondering: is it possible to get SuiteSparse to use Metis at
> configure time with PETSc? Using Metis for reordering at the symbolic
> factorization phase gives lower fill in the factors than AMD in some cases
> (and a faster solution phase).
> I tried this with gcc compilers and openmpi:
>
> $./configure LDFLAGS="-ld_classic" COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" 
> FOPTFLAGS="-O2 -g" --with-debugging=0 --with-shared-libraries=0 
> --download-metis --download-suitesparse --download-hypre 
> --download-fblaslapack --download-make --force
>
> and get for SuiteSparse:
>
> metis:
>   Version:5.1.0
>   Includes:   
> -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include
>   Libraries:  
> -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib 
> -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lmetis
> SuiteSparse:
>   Version:7.7.0
>   Includes:   
> -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include/suitesparse 
> -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include
>   Libraries:  
> -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib 
> -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lspqr 
> -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd 
> -lsuitesparseconfig
>
> for which I see Metis will be compiled, but -lmetis is not linked in
> the SuiteSparse libraries.
> Thank you for your time!
> Marcos
>



Re: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time

2024-04-29 Thread Vanella, Marcos (Fed) via petsc-users
Hi Satish,
Ok, thank you for clarifying. I don't need to include Metis in the config phase
then (I'm not using it anywhere else).
Is there a way I can configure PETSc to not require X11 (Xgraph functions, 
etc.)?
Thank you,
Marcos

From: Satish Balay 
Sent: Monday, April 29, 2024 12:00 PM
To: Vanella, Marcos (Fed) 
Cc: petsc-users@mcs.anl.gov 
Subject: Re: [petsc-users] Asking SuiteSparse to use Metis at PETSc config time


# Other CMakeLists.txt files inside SuiteSparse are from dependent packages
# (LAGraph/deps/json_h, GraphBLAS/cpu_features, and CHOLMOD/SuiteSparse_metis
# which is a slightly revised copy of METIS 5.0.1) but none of those
# CMakeLists.txt files are used to build any package in SuiteSparse.


So suitesparse includes a copy of metis sources - i.e. it does not use an
external metis library?

>>
balay@pj01:~/petsc/arch-linux-c-debug/lib$ nm -Ao *.so |grep METIS_PartGraphKway
libcholmod.so:0026e500 T SuiteSparse_metis_METIS_PartGraphKway
<<<

And metis routines are already in -lcholmod [with some namespace fixes]

Satish

On Mon, 29 Apr 2024, Vanella, Marcos (Fed) via petsc-users wrote:

> Hi all, I'm wondering: is it possible to get SuiteSparse to use Metis at
> configure time with PETSc? Using Metis for reordering at the symbolic
> factorization phase gives lower fill in the factors than AMD in some cases
> (and a faster solution phase).
> I tried this with gcc compilers and openmpi:
>
> $./configure LDFLAGS="-ld_classic" COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" 
> FOPTFLAGS="-O2 -g" --with-debugging=0 --with-shared-libraries=0 
> --download-metis --download-suitesparse --download-hypre 
> --download-fblaslapack --download-make --force
>
> and get for SuiteSparse:
>
> metis:
>   Version:5.1.0
>   Includes:   
> -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include
>   Libraries:  
> -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib 
> -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lmetis
> SuiteSparse:
>   Version:7.7.0
>   Includes:   
> -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include/suitesparse 
> -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include
>   Libraries:  
> -Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib 
> -L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lspqr 
> -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd 
> -lsuitesparseconfig
>
> for which I see Metis will be compiled, but -lmetis is not linked in
> the SuiteSparse libraries.
> Thank you for your time!
> Marcos
>



[petsc-users] Asking SuiteSparse to use Metis at PETSc config time

2024-04-29 Thread Vanella, Marcos (Fed) via petsc-users
Hi all, I'm wondering: is it possible to get SuiteSparse to use Metis at
configure time with PETSc? Using Metis for reordering at the symbolic
factorization phase gives lower fill in the factors than AMD in some cases (and
a faster solution phase).
I tried this with gcc compilers and openmpi:

$./configure LDFLAGS="-ld_classic" COPTFLAGS="-O2 -g" CXXOPTFLAGS="-O2 -g" 
FOPTFLAGS="-O2 -g" --with-debugging=0 --with-shared-libraries=0 
--download-metis --download-suitesparse --download-hypre --download-fblaslapack 
--download-make --force

and get for SuiteSparse:

metis:
  Version:5.1.0
  Includes:   -I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include
  Libraries:  
-Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib 
-L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lmetis
SuiteSparse:
  Version:7.7.0
  Includes:   
-I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include/suitesparse 
-I/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/include
  Libraries:  
-Wl,-rpath,/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib 
-L/Users/mnv/Documents/Software/petsc/arch-darwin-opt-gcc/lib -lspqr -lumfpack 
-lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig

for which I see Metis will be compiled, but -lmetis is not linked in the
SuiteSparse libraries.
Thank you for your time!
Marcos


[petsc-users] Compiling PETSc with strumpack in ORNL Frontier

2024-04-05 Thread Vanella, Marcos (Fed) via petsc-users
Hi all, we are trying to compile PETSc in Frontier using the structured matrix 
hierarchical solver strumpack, which uses GPU and might be a good candidate for 
our Poisson discretization.
The list of options I used for PETSc in this case is:

$./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" 
HIPOPTFLAGS="-O3 --offload-arch=gfx90a" --with-debugging=0 --with-cc=cc 
--with-cxx=CC --with-fc=ftn --with-hip --with-hip-arch=gfx908 --with-hipc=hipcc 
  --LIBS="-L${MPICH_DIR}/lib -lmpi ${CRAY_XPMEM_POST_LINK_OPTS} -lxpmem 
${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIBS_amd_gfx90a}" 
--download-kokkos --download-kokkos-kernels --download-suitesparse 
--download-hypre --download-superlu_dist --download-strumpack --download-metis 
--download-slate --download-magma --download-parmetis --download-ptscotch 
--download-zfp --download-butterflypack 
--with-openmp-dir=/opt/cray/pe/gcc/12.2.0/snos --download-scalapack 
--download-cmake --force

I'm getting an error at configure time:

...
  Trying to download https://github.com/liuyangzhuan/ButterflyPACK for BUTTERFLYPACK
=============================================================================
      Configuring BUTTERFLYPACK with CMake; this may take several minutes
=============================================================================
      Compiling and installing BUTTERFLYPACK; this may take several minutes
=============================================================================
  Trying to download https://github.com/pghysels/STRUMPACK for STRUMPACK
=============================================================================
      Configuring STRUMPACK with CMake; this may take several minutes
=============================================================================
      Compiling and installing STRUMPACK; this may take several minutes
=============================================================================

*******************************************************************************
  UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details):
-------------------------------------------------------------------------------
  Error running make on STRUMPACK
*******************************************************************************

Looking in the configure.log file I see errors like this related to the
strumpack compilation:

/opt/cray/pe/craype/2.7.19/bin/CC -D__HIP_PLATFORM_AMD__=1 
-D__HIP_PLATFORM_HCC__=1 -Dstrumpack_EXPORTS 
-I/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/src
 
-I/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/petsc-build
 -isystem 
/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/include
 -isystem /opt/rocm-5.4.0/include -isystem /opt/rocm-5.4.0/hip/include -isystem 
/opt/rocm-5.4.0/llvm/lib/clang/15.0.0/.. -Wno-lto-type-mismatch -Wno-psabi -O3 
-fPIC -fopenmp -Wno-lto-type-mismatch -Wno-psabi -O3 -fPIC -fopenmp -fPIC -Wall 
-Wno-overloaded-virtual -fopenmp -x hip --offload-arch=gfx900 
--offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a 
--offload-arch=gfx1030 -MD -MT 
CMakeFiles/strumpack.dir/src/clustering/NeighborSearch.cpp.o -MF 
CMakeFiles/strumpack.dir/src/clustering/NeighborSearch.cpp.o.d -o 
CMakeFiles/strumpack.dir/src/clustering/NeighborSearch.cpp.o -c 
/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/src/clustering/NeighborSearch.cpp
gmake[2]: Leaving directory 
'/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/petsc-build'
gmake[1]: Leaving directory 
'/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-frontier-opt-gcc-direct/externalpackages/git.strumpack/petsc-build'
stdout:
g++: error: unrecognized command-line option '--offload-arch=gfx900'
g++: error: unrecognized command-line 

Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs

2024-03-19 Thread Vanella, Marcos (Fed) via petsc-users
Ok, thanks. I'll try it when the machine comes back online.
Cheers,
M

From: Mark Adams 
Sent: Tuesday, March 19, 2024 5:15 PM
To: Vanella, Marcos (Fed) 
Cc: PETSc users list 
Subject: Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs

You want: -mat_type aijhipsparse
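
For example, as runtime options (a sketch; launcher, executable and input names are placeholders):

srun -N 1 -n 4 ./my_app my_input -vec_type hip -mat_type aijhipsparse -ksp_type cg -pc_type hypre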

On Tue, Mar 19, 2024 at 5:06 PM Vanella, Marcos (Fed)
<marcos.vane...@nist.gov> wrote:
Hi Mark, thanks. I'll try your suggestions. So, I would keep -mat_type 
mpiaijkokkos but -vec_type hip as runtime options?
Thanks,
Marcos

From: Mark Adams <mfad...@lbl.gov>
Sent: Tuesday, March 19, 2024 4:57 PM
To: Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
Cc: PETSc users list <petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs

[keep on list]

I have little experience with running hypre on GPUs but others might have more.

1M dofs/node is not a lot, and NVIDIA has larger L1 cache and more mature
compilers, etc., so it is not surprising that NVIDIA is faster.
I suspect the gap would narrow with a larger problem.

Also, why are you using Kokkos? It should not make a difference but you could 
check easily. Just use -vec_type hip with your current code.

You could also test with GAMG, -pc_type gamg

Mark


On Tue, Mar 19, 2024 at 4:12 PM Vanella, Marcos (Fed)
<marcos.vane...@nist.gov> wrote:
Hi Mark, I ran a canonical test we have to time our code. It is a propane fire
on a burner within a box with around 1 million cells.
I split the problem across 4 GPUs, single node, both on Polaris and Frontier. I
compiled PETSc with gnu and HYPRE being downloaded and the following configure
options:


  *   Polaris:
$./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" 
CUDAOPTFLAGS="-O3" --with-debugging=0 --download-suitesparse --download-hypre 
--with-cuda --with-cc=cc --with-cxx=CC --with-fc=ftn --with-cudac=nvcc 
--with-cuda-arch=80 --download-cmake


  *   Frontier:
$./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" 
HIPOPTFLAGS="-O3" --with-debugging=0 --with-cc=cc --with-cxx=CC --with-fc=ftn 
--with-hip --with-hipc=hipcc --LIBS="-L${MPICH_DIR}/lib -lmpi 
${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIBS_amd_gfx90a}" 
--download-kokkos --download-kokkos-kernels --download-suitesparse 
--download-hypre --download-cmake

Our code was also compiled with gnu compilers and the -O3 flag. I used the
latest (from this week) PETSc repo update. These are the timings for the test
case:


  *   8 meshes + 1 million cells case, 8 MPI processes, 4 GPUs, 2 MPI procs per
GPU, 1 sec run time (~580 time steps, ~1160 Poisson solves):

System     Poisson Solver   GPU Implementation   Poisson Wall time (sec)   Total Wall time (sec)
Polaris    CG + HYPRE PC    CUDA                 80                        287
Frontier   CG + HYPRE PC    Kokkos + HIP         158                       401

It is interesting to see that the Poisson solves take twice as long on
Frontier as on Polaris.
Do you have experience running HYPRE AMG on these machines? Is this
difference between the CUDA implementation and Kokkos-kernels to be expected?

I can run the case on both computers with the log flags you suggest. That might
give more information on where the differences are.

Thank you for your time,
Marcos



From: Mark Adams <mfad...@lbl.gov>
Sent: Tuesday, March 5, 2024 2:41 PM
To: Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
Cc: petsc-users <petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs

You can run with -log_view_gpu_time to get rid of the nans and get more data.

You can run with -ksp_view to get more info on the solver and send that output.

-options_left is also good to use so we can see what parameters you used.

The last 100 in this row:

KSPSolve            1197 0.0 2.0291e+02 0.0 2.55e+11 0.0 3.9e+04 8.0e+04 3.1e+04 12 100 100 100 49  12 100 100 100 98  2503    -nan      0 1.80e-05    0 0.00e+00  100

tells us that all the flops were logged on GPUs.

You do need at least 100K equations per GPU to see speedup, so don't worry 
about small problems.

Mark




On Tue, Mar 5, 2024 at 12:52 PM Vanella, Marcos (Fed) via petsc-users
<petsc-users@mcs.anl.gov> wrote:
Hi all, I compiled the latest PETSc source in Frontier using gcc+kokkos and hip 
options: ./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" 
FCOPTFLAGS="-O3" HIPOPTFLAGS="-O3" --with-debugging=0

Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs

2024-03-19 Thread Vanella, Marcos (Fed) via petsc-users
Hi Mark, thanks. I'll try your suggestions. So, I would keep -mat_type 
mpiaijkokkos but -vec_type hip as runtime options?
Thanks,
Marcos

From: Mark Adams 
Sent: Tuesday, March 19, 2024 4:57 PM
To: Vanella, Marcos (Fed) 
Cc: PETSc users list 
Subject: Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs

[keep on list]

I have little experience with running hypre on GPUs but others might have more.

1M dofs/node is not a lot, and NVIDIA has larger L1 cache and more mature
compilers, etc., so it is not surprising that NVIDIA is faster.
I suspect the gap would narrow with a larger problem.

Also, why are you using Kokkos? It should not make a difference but you could 
check easily. Just use -vec_type hip with your current code.

You could also test with GAMG, -pc_type gamg

Mark


On Tue, Mar 19, 2024 at 4:12 PM Vanella, Marcos (Fed)
<marcos.vane...@nist.gov> wrote:
Hi Mark, I ran a canonical test we have to time our code. It is a propane fire
on a burner within a box with around 1 million cells.
I split the problem across 4 GPUs, single node, both on Polaris and Frontier. I
compiled PETSc with gnu and HYPRE being downloaded and the following configure
options:


  *   Polaris:
$./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" 
CUDAOPTFLAGS="-O3" --with-debugging=0 --download-suitesparse --download-hypre 
--with-cuda --with-cc=cc --with-cxx=CC --with-fc=ftn --with-cudac=nvcc 
--with-cuda-arch=80 --download-cmake


  *   Frontier:
$./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" 
HIPOPTFLAGS="-O3" --with-debugging=0 --with-cc=cc --with-cxx=CC --with-fc=ftn 
--with-hip --with-hipc=hipcc --LIBS="-L${MPICH_DIR}/lib -lmpi 
${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIBS_amd_gfx90a}" 
--download-kokkos --download-kokkos-kernels --download-suitesparse 
--download-hypre --download-cmake

Our code was also compiled with gnu compilers and the -O3 flag. I used the
latest (from this week) PETSc repo update. These are the timings for the test
case:


  *   8 meshes + 1 million cells case, 8 MPI processes, 4 GPUs, 2 MPI procs per
GPU, 1 sec run time (~580 time steps, ~1160 Poisson solves):

System     Poisson Solver   GPU Implementation   Poisson Wall time (sec)   Total Wall time (sec)
Polaris    CG + HYPRE PC    CUDA                 80                        287
Frontier   CG + HYPRE PC    Kokkos + HIP         158                       401

It is interesting to see that the Poisson solves take twice as long on
Frontier as on Polaris.
Do you have experience running HYPRE AMG on these machines? Is this
difference between the CUDA implementation and Kokkos-kernels to be expected?

I can run the case on both computers with the log flags you suggest. That might
give more information on where the differences are.

Thank you for your time,
Marcos



From: Mark Adams <mfad...@lbl.gov>
Sent: Tuesday, March 5, 2024 2:41 PM
To: Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
Cc: petsc-users <petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs

You can run with -log_view_gpu_time to get rid of the nans and get more data.

You can run with -ksp_view to get more info on the solver and send that output.

-options_left is also good to use so we can see what parameters you used.

The last 100 in this row:

KSPSolve            1197 0.0 2.0291e+02 0.0 2.55e+11 0.0 3.9e+04 8.0e+04 3.1e+04 12 100 100 100 49  12 100 100 100 98  2503    -nan      0 1.80e-05    0 0.00e+00  100

tells us that all the flops were logged on GPUs.

You do need at least 100K equations per GPU to see speedup, so don't worry 
about small problems.

Mark




On Tue, Mar 5, 2024 at 12:52 PM Vanella, Marcos (Fed) via petsc-users
<petsc-users@mcs.anl.gov> wrote:
Hi all, I compiled the latest PETSc source in Frontier using gcc+kokkos and hip 
options:

./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" 
HIPOPTFLAGS="-O3" --with-debugging=0 --with-cc=cc --with-cxx=CC --with-fc=ftn 
--with-hip --with-hipc=hipcc --LIBS="-L${MPICH_DIR}/lib -lmpi 
${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIB

Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs

2024-03-05 Thread Vanella, Marcos (Fed) via petsc-users
Thank you Mark, I'll try the options you suggest to get more info. I'm also 
building PETSc and the code with the cray compiler suite to test.
The test I'm running has 1 million unknowns. I was able to see good scaling up
to 4 GPUs on this case on Polaris.
Talk soon,
Marcos

From: Mark Adams 
Sent: Tuesday, March 5, 2024 2:41 PM
To: Vanella, Marcos (Fed) 
Cc: petsc-users@mcs.anl.gov 
Subject: Re: [petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs

You can run with -log_view_gpu_time to get rid of the nans and get more data.

You can run with -ksp_view to get more info on the solver and send that output.

-options_left is also good to use so we can see what parameters you used.

The last 100 in this row:

KSPSolve            1197 0.0 2.0291e+02 0.0 2.55e+11 0.0 3.9e+04 8.0e+04 3.1e+04 12 100 100 100 49  12 100 100 100 98  2503    -nan      0 1.80e-05    0 0.00e+00  100

tells us that all the flops were logged on GPUs.

You do need at least 100K equations per GPU to see speedup, so don't worry 
about small problems.

Mark




On Tue, Mar 5, 2024 at 12:52 PM Vanella, Marcos (Fed) via petsc-users
<petsc-users@mcs.anl.gov> wrote:
Hi all, I compiled the latest PETSc source in Frontier using gcc+kokkos and hip 
options:

./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" 
HIPOPTFLAGS="-O3" --with-debugging=0 --with-cc=cc --with-cxx=CC --with-fc=ftn 
--with-hip --with-hipc=hipcc --LIBS="-L${MPICH_DIR}/lib -lmpi 
${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIBS_amd_gfx90a}" 
--download-kokkos --download-kokkos-kernels --download-suitesparse 
--download-hypre --download-cmake

and have started testing our code solving a Poisson linear system with CG + 
HYPRE preconditioner. Timings look rather high compared to compilations done on 
other machines that have NVIDIA cards. They are also not changing when using
more than one GPU for the simple test I'm doing.
Does anyone happen to know if HYPRE has a HIP GPU implementation for BoomerAMG
and whether it is compiled when configuring PETSc?

Thanks!

Marcos


PS: This is what I see on the log file (-log_view) when running the case with 2 
GPUs in the node:


---------------------------------------------------------------- PETSc Performance Summary: ----------------------------------------------------------------

/ccs/home/vanellam/Firemodels_fork/fds/Build/mpich_gnu_frontier/fds_mpich_gnu_frontier
 on a arch-linux-frontier-opt-gcc named frontier04119 with 4 processors, by 
vanellam Tue Mar  5 12:42:29 2024
Using Petsc Development GIT revision: v3.20.5-713-gabdf6bc0fcf  GIT Date: 
2024-03-05 01:04:54 +

 Max   Max/Min Avg   Total
Time (sec):   8.368e+02 1.000   8.368e+02
Objects:  0.000e+00 0.000   0.000e+00
Flops:2.546e+11 0.000   1.270e+11  5.079e+11
Flops/sec:3.043e+08 0.000   1.518e+08  6.070e+08
MPI Msg Count:1.950e+04 0.000   9.748e+03  3.899e+04
MPI Msg Len (bytes):  1.560e+09 0.000   7.999e+04  3.119e+09
MPI Reductions:   6.331e+04   2877.545

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:  Main Stage: 8.3676e+02 100.0%  5.0792e+11 100.0%  3.899e+04 100.0%  7.999e+04  100.0%  3.164e+04  50.0%


See the 'Profiling' chapter of the users' manual for details on interpreting 
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and 
PetscLogStagePop().
  %T - percent time in this phase %F - percent flop in this phase
  %M - percent messages in th

[petsc-users] Running CG with HYPRE AMG preconditioner in AMD GPUs

2024-03-05 Thread Vanella, Marcos (Fed) via petsc-users
Hi all, I compiled the latest PETSc source in Frontier using gcc+kokkos and hip 
options:

./configure COPTFLAGS="-O3" CXXOPTFLAGS="-O3" FOPTFLAGS="-O3" FCOPTFLAGS="-O3" 
HIPOPTFLAGS="-O3" --with-debugging=0 --with-cc=cc --with-cxx=CC --with-fc=ftn 
--with-hip --with-hipc=hipcc --LIBS="-L${MPICH_DIR}/lib -lmpi 
${PE_MPICH_GTL_DIR_amd_gfx90a} ${PE_MPICH_GTL_LIBS_amd_gfx90a}" 
--download-kokkos --download-kokkos-kernels --download-suitesparse 
--download-hypre --download-cmake

and have started testing our code solving a Poisson linear system with CG + 
HYPRE preconditioner. Timings look rather high compared to compilations done on 
other machines that have NVIDIA cards. They are also not changing when using
more than one GPU for the simple test I'm doing.
Does anyone happen to know if HYPRE has a HIP GPU implementation for BoomerAMG
and whether it is compiled when configuring PETSc?
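
A quick way to check, as a sketch (assuming configure.log sits in the directory where configure was run):

grep -i hypre configure.log | grep -i hip

which may show whether HIP-related flags were passed down to hypre's build.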

Thanks!

Marcos


PS: This is what I see on the log file (-log_view) when running the case with 2 
GPUs in the node:


---------------------------------------------------------------- PETSc Performance Summary: ----------------------------------------------------------------

/ccs/home/vanellam/Firemodels_fork/fds/Build/mpich_gnu_frontier/fds_mpich_gnu_frontier
 on a arch-linux-frontier-opt-gcc named frontier04119 with 4 processors, by 
vanellam Tue Mar  5 12:42:29 2024
Using Petsc Development GIT revision: v3.20.5-713-gabdf6bc0fcf  GIT Date: 
2024-03-05 01:04:54 +

 Max   Max/Min Avg   Total
Time (sec):   8.368e+02 1.000   8.368e+02
Objects:  0.000e+00 0.000   0.000e+00
Flops:2.546e+11 0.000   1.270e+11  5.079e+11
Flops/sec:3.043e+08 0.000   1.518e+08  6.070e+08
MPI Msg Count:1.950e+04 0.000   9.748e+03  3.899e+04
MPI Msg Len (bytes):  1.560e+09 0.000   7.999e+04  3.119e+09
MPI Reductions:   6.331e+04   2877.545

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:  Main Stage: 8.3676e+02 100.0%  5.0792e+11 100.0%  3.899e+04 100.0%  7.999e+04  100.0%  3.164e+04  50.0%


See the 'Profiling' chapter of the users' manual for details on interpreting 
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and 
PetscLogStagePop().
  %T - percent time in this phase %F - percent flop in this phase
  %M - percent messages in this phase %L - percent message lengths in this phase
  %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all 
processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time 
over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per 
processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per 
processor)
   GPU %F: percent flops on GPU in this event

Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided       1201 0.0   nan nan 0.00e+00 0.0 2.0e+00 4.0e+00 6.0e+02  0  0  0  0  1   0  0  0  0  2  -nan    -nan      0 0.00e+00    0 0.00e+00  0
BuildTwoSidedF      1200 0.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+02  0  0  0  0  1   0  0  0  0  2  -nan    -nan      0 0.00e+00    0 0.00e+00  0
MatMult            19494 0.0   nan nan 1.35e+11 0.0 3.9e+04 8.0e+04 0.0e+00  7 53

Re: [petsc-users] Using Sundials from PETSc

2023-10-16 Thread Vanella, Marcos (Fed) via petsc-users
Hi Matt, very interesting project you are working on. We haven't gone deep on
how we would do this on GPUs and are starting to look at options. We will
explore if it is possible to batch the work needed for several cells within a
thread group on the GPU.

We use a single Cartesian mesh per MPI process (usually with 40^3 to 50^3
cells). Something I implemented to avoid MPI process over-subscription of the
GPU with PETSc solvers was to cluster several MPI processes per GPU on resource
sets. Then, the processes in the set pass the matrix (at setup) and RHS to a
single process (the set master) which communicates with the GPU.
The GPU solution is then brought back to the set master, which distributes it to
the MPI processes in the set as needed.
So, only a set of processes as large as the number of GPUs in the calculation
(with their own MPI communicator) calls the PETSc matrix and vector building
and solve routines. The neat thing is that all MPI communications are local to
the node. This idea is not new; it was developed by researchers at GWU who
interfaced PETSc to AMGx back when there were no native GPU solvers in PETSc,
HYPRE and other libs (~2016).

Best,
Marcos


From: Matthew Knepley 
Sent: Monday, October 16, 2023 4:31 PM
To: Vanella, Marcos (Fed) 
Cc: petsc-users@mcs.anl.gov ; Paul, Chandan 
(IntlAssoc) 
Subject: Re: [petsc-users] Using Sundials from PETSc

On Mon, Oct 16, 2023 at 4:08 PM Vanella, Marcos (Fed)
<marcos.vane...@nist.gov> wrote:
Hi Matthew, we have code that time-splits the combustion step from the chemical
species transport, so on each computational cell for each fluid flow time step,
once transport is done we have the mixture chemical composition as the initial
condition. We are looking into doing finite rate chemistry with skeletal
combustion models (20+ equations) in each cell for each fluid time step.
Sundials provides the CVODE solver for the time integration of these, and it
would be interesting to see if we can make use of GPU acceleration. From their
User Guide for Version 6.6.0 there are several GPU implementations for building
the RHS and using linear, nonlinear and stiff ODE solvers.

We are doing a similar thing in CHREST (https://www.buffalo.edu/chrest.html). 
Since we normally use hundreds of species and thousands of reactions for the 
reduced mechanism, we are using TChem2 to build and solve the system in each 
cell.

Since these systems are so small, you are likely to need some way of batching 
them within a warp. Do you have an idea for this already?

  Thanks,

 Matt

Thank you Satish for the comment. It might be better at this point to first get
an idea of what the implementation in our code using Sundials directly would
look like. Then, we can see if it is possible and makes sense to access it
through PETSc.
We have things working on CPU making use of an older version of CVODE.

BTW, after some changes in our code we are starting to run larger cases using
GPU accelerated iterative solvers from PETSc, so we have PETSc interfaced
already.

Thanks!


From: Matthew Knepley <knep...@gmail.com>
Sent: Monday, October 16, 2023 3:03 PM
To: Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
Cc: petsc-users <petsc-users@mcs.anl.gov>; Paul, Chandan (IntlAssoc) <chandan.p...@nist.gov>
Subject: Re: [petsc-users] Using Sundials from PETSc

On Mon, Oct 16, 2023 at 2:29 PM Vanella, Marcos (Fed) via petsc-users
<petsc-users@mcs.anl.gov> wrote:
Hi, we were wondering if it would be possible to call the latest version of 
Sundials from PETSc?

The short answer is, no. We are at v2.5 and they are at v6.5. There were no
dates on the version history page, so I do not know how out of date we are.
There have not been any requests for an update until now.

We would be happy to get an MR for the updates if you want to try it.

We are interested in doing chemistry using GPUs and already have interfaces to 
PETSc from our code.

How does the GPU interest interact with the SUNDIALS version?

  Thanks,

 Matt

Thanks,
Marcos


--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/


--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/


Re: [petsc-users] Using Sundials from PETSc

2023-10-16 Thread Vanella, Marcos (Fed) via petsc-users
Hi Matthew, we have code that time-splits the combustion step from the chemical
species transport, so on each computational cell for each fluid flow time step,
once transport is done we have the mixture chemical composition as the initial
condition. We are looking into doing finite rate chemistry with skeletal
combustion models (20+ equations) in each cell for each fluid time step.
Sundials provides the CVODE solver for the time integration of these, and it
would be interesting to see if we can make use of GPU acceleration. From their
User Guide for Version 6.6.0 there are several GPU implementations for building
the RHS and using linear, nonlinear and stiff ODE solvers.

Thank you Satish for the comment. It might be better at this point to first get
an idea of what the implementation in our code using Sundials directly would
look like. Then, we can see if it is possible and makes sense to access it
through PETSc.
We have things working on CPU making use of an older version of CVODE.

BTW, after some changes in our code we are starting to run larger cases using
GPU accelerated iterative solvers from PETSc, so we have PETSc interfaced
already.

Thanks!


From: Matthew Knepley 
Sent: Monday, October 16, 2023 3:03 PM
To: Vanella, Marcos (Fed) 
Cc: petsc-users@mcs.anl.gov ; Paul, Chandan 
(IntlAssoc) 
Subject: Re: [petsc-users] Using Sundials from PETSc

On Mon, Oct 16, 2023 at 2:29 PM Vanella, Marcos (Fed) via petsc-users
<petsc-users@mcs.anl.gov> wrote:
Hi, we were wondering if it would be possible to call the latest version of 
Sundials from PETSc?

The short answer is, no. We are at v2.5 and they are at v6.5. There were no
dates on the version history page, so I do not know how out of date we are.
There have not been any requests for an update until now.

We would be happy to get an MR for the updates if you want to try it.

We are interested in doing chemistry using GPUs and already have interfaces to 
PETSc from our code.

How does the GPU interest interact with the SUNDIALS version?

  Thanks,

 Matt

Thanks,
Marcos


--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/


[petsc-users] Using Sundials from PETSc

2023-10-16 Thread Vanella, Marcos (Fed) via petsc-users
Hi, we were wondering if it would be possible to call the latest version of 
Sundials from PETSc?
We are interested in doing chemistry using GPUs and already have interfaces to 
PETSc from our code.
Thanks,
Marcos


Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU

2023-08-24 Thread Vanella, Marcos (Fed) via petsc-users
Thank you Matt and Junchao. I've been testing further with nvhpc on Summit. You
might have an idea of what is going on here.
These are my modules:

Currently Loaded Modules:
  1) lsf-tools/2.0   3) darshan-runtime/3.4.0-lite   5) DefApps       7) spectrum-mpi/10.4.0.3-20210112   9) nsight-systems/2021.3.1.54
  2) hsi/5.0.2.p5    4) xalt/1.2.1                   6) nvhpc/22.11   8) nsight-compute/2021.2.1         10) cuda/11.7.1

I configured and compiled petsc with these options:

./configure COPTFLAGS="-O2" CXXOPTFLAGS="-O2" FOPTFLAGS="-O2" FCOPTFLAGS="-O2" 
CUDAOPTFLAGS="-O2" --with-debugging=0 --download-suitesparse --download-hypre 
--download-fblaslapack --with-cuda

without issues. The MPI checks did not go through, as this was done on the login
node.

Then, I started getting (similarly to what I saw with pgi and gcc on Summit)
ambiguous interface errors related to MPI routines. I was able to make a simple
piece of code that reproduces this. It has to do with having a USE PETSC
statement in a module (TEST_MOD) and a USE MPI_F08 in the main program (MAIN)
using that module, even though the PRIVATE statement has been used in said
(TEST_MOD) module.

MODULE TEST_MOD
! In this module we use PETSC.
USE PETSC
!USE MPI
IMPLICIT NONE
PRIVATE
PUBLIC :: TEST1

CONTAINS
SUBROUTINE TEST1(A)
IMPLICIT NONE
REAL, INTENT(INOUT) :: A
INTEGER :: IERR
A=0.
ENDSUBROUTINE TEST1

ENDMODULE TEST_MOD


PROGRAM MAIN

! Assume in main we use some MPI_F08 features.
USE MPI_F08
USE TEST_MOD, ONLY : TEST1
IMPLICIT NONE
INTEGER :: MY_RANK,IERR=0
INTEGER :: PNAMELEN=0
INTEGER :: PROVIDED
INTEGER, PARAMETER :: REQUIRED=MPI_THREAD_FUNNELED
REAL :: A=0.
CALL MPI_INIT_THREAD(REQUIRED,PROVIDED,IERR)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, MY_RANK, IERR)
CALL TEST1(A)
CALL MPI_FINALIZE(IERR)

ENDPROGRAM MAIN

Leaving the USE PETSC statement in TEST_MOD this is what I get when trying to 
compile this code:

vanellam@login5 test_spectrum_issue $ mpifort -c 
-I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/" 
-I"/autofs/nccs-svm1_home1/vanellam/Software/petsc/arch-linux-c-opt-nvhpc/include"
  mpitest.f90
NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_init_thread 
(mpitest.f90: 34)
NVFORTRAN-S-0155-Ambiguous interfaces for generic procedure mpi_finalize 
(mpitest.f90: 37)
  0 inform,   0 warnings,   2 severes, 0 fatal for main

Now, if I change USE PETSC to USE MPI in the module TEST_MOD, compilation
proceeds correctly. If I leave the USE PETSC statement in the module and change
the statement in main to USE MPI, compilation also goes through. So it seems to
be something related to using the PETSC and MPI_F08 modules together. My take is
that it is related to spectrum-mpi, as I haven't had issues compiling FDS+PETSc
with openmpi on other systems.

Well, please let me know if you have any ideas on what might be going on. I'll
move to Polaris and try with mpich too.

Thanks!
Marcos



From: Junchao Zhang 
Sent: Tuesday, August 22, 2023 5:25 PM
To: Matthew Knepley 
Cc: Vanella, Marcos (Fed) ; PETSc users list 
; Guan, Collin X. (Fed) 
Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi 
processes and 1 GPU

Marcos,
  yes, refer to the example script Matt mentioned for Summit.  Feel free to 
turn on/off options in the file.  In my experience, gcc is easier to use.
  Also, I found 
https://docs.alcf.anl.gov/polaris/running-jobs/#binding-mpi-ranks-to-gpus, 
which might be similar to your machine (4 GPUs per node).  The key point is: 
The Cray MPI on Polaris does not currently support binding MPI ranks to GPUs. 
For applications that need this support, this instead can be handled by use of 
a small helper script that will appropriately set CUDA_VISIBLE_DEVICES for each 
MPI rank.
  So you can try the helper script set_affinity_gpu_polaris.sh to manually set
CUDA_VISIBLE_DEVICES. In other words, put the script on your PATH and then
run your job with
  srun -N 2 -n 16 set_affinity_gpu_polaris.sh 
/home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds 
-pc_type gamg -mat_type aijcusparse -vec_type cuda
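
For reference, a minimal sketch of what such a wrapper can look like (the actual ALCF script may differ; the local-rank environment variable depends on the launcher and is an assumption here):

#!/bin/bash
# Map each local MPI rank to one of the node's GPUs by setting
# CUDA_VISIBLE_DEVICES before exec'ing the application.
num_gpus=$(nvidia-smi -L | wc -l)
# SLURM_LOCALID is set by srun; PALS_LOCAL_RANKID by Cray PALS mpiexec (assumption).
local_rank=${SLURM_LOCALID:-${PALS_LOCAL_RANKID:-0}}
export CUDA_VISIBLE_DEVICES=$((local_rank % num_gpus))
exec "$@"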

  Then, check again with nvidia-smi to see if GPU memory is evenly allocated.
--Junchao Zhang


On Tue, Aug 22, 2023 at 3:03 PM Matthew Knepley <knep...@gmail.com> wrote:
On Tue, Aug 22, 2023 at 2:54 PM Vanella, Marcos (Fed) via petsc-users
<petsc-users@mcs.anl.gov> wrote:
Hi Junchao, neither slurm's scontrol show job_id -dd nor looking at
CUDA_VISIBLE_DEVICES provides information about which MPI process is
associated with which GPU in the node on our system. I can see this with
nvidia-smi, but if you have any other suggestion using slurm I would like to
hear it.

I've been trying to compile the code+PETSc on Summit, but have been having all
sorts of issues related to spectrum-mpi and the different compilers they
provide (I trie

Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU

2023-08-22 Thread Vanella, Marcos (Fed) via petsc-users
Hi Junchao, neither slurm's scontrol show job_id -dd nor looking at
CUDA_VISIBLE_DEVICES provides information about which MPI process is
associated with which GPU in the node on our system. I can see this with
nvidia-smi, but if you have any other suggestion using slurm I would like to
hear it.

I've been trying to compile the code+PETSc on Summit, but have been having all
sorts of issues related to spectrum-mpi and the different compilers they
provide (I tried gcc, nvhpc, pgi, xl. Some of them don't handle Fortran 2018,
others give issues of repeated MPI definitions, etc.).

I also wanted to ask you, do you know if it is possible to compile PETSc with 
the xl/16.1.1-10 suite?

Thanks!

I configured the library --with-cuda and when compiling I get a compilation 
error with CUDAC:

CUDAC arch-linux-opt-xl/obj/src/sys/classes/random/impls/curand/curand2.o
In file included from 
/autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:1:
In file included from 
/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/randomimpl.h:5:
In file included from 
/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petsc/private/petscimpl.h:7:
In file included from 
/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsys.h:44:
In file included from 
/autofs/nccs-svm1_home1/vanellam/Software/petsc/include/petscsystypes.h:532:
In file included from /sw/summit/cuda/11.7.1/include/thrust/complex.h:24:
In file included from /sw/summit/cuda/11.7.1/include/thrust/detail/config.h:23:
In file included from 
/sw/summit/cuda/11.7.1/include/thrust/detail/config/config.h:27:
/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:112:6: 
warning: Thrust requires at least Clang 7.0. Define 
THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message. 
[-W#pragma-messages]
 THRUST_COMPILER_DEPRECATION(Clang 7.0);
 ^
/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:101:3: note: 
expanded from macro 'THRUST_COMPILER_DEPRECATION'
  THRUST_COMP_DEPR_IMPL(Thrust requires at least REQ. Define 
THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
  ^
/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:95:38: note: 
expanded from macro 'THRUST_COMP_DEPR_IMPL'
#  define THRUST_COMP_DEPR_IMPL(msg) THRUST_COMP_DEPR_IMPL0(GCC warning #msg)
 ^
/sw/summit/cuda/11.7.1/include/thrust/detail/config/cpp_dialect.h:96:40: note: 
expanded from macro 'THRUST_COMP_DEPR_IMPL0'
#  define THRUST_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
   ^
:141:6: note: expanded from here
 GCC warning "Thrust requires at least Clang 7.0. Define 
THRUST_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
 ^
In file included from 
/autofs/nccs-svm1_home1/vanellam/Software/petsc/src/sys/classes/random/impls/curand/curand2.cu:2:
In file included from /sw/summit/cuda/11.7.1/include/thrust/transform.h:721:
In file included from 
/sw/summit/cuda/11.7.1/include/thrust/detail/transform.inl:27:
In file included from 
/sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.h:104:
In file included from 
/sw/summit/cuda/11.7.1/include/thrust/system/detail/generic/transform.inl:19:
In file included from /sw/summit/cuda/11.7.1/include/thrust/for_each.h:277:
In file included from 
/sw/summit/cuda/11.7.1/include/thrust/detail/for_each.inl:27:
In file included from 
/sw/summit/cuda/11.7.1/include/thrust/system/detail/adl/for_each.h:42:
In file included from 
/sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/for_each.h:35:
In file included from 
/sw/summit/cuda/11.7.1/include/thrust/system/cuda/detail/util.h:36:
In file included from 
/sw/summit/cuda/11.7.1/include/cub/detail/device_synchronize.cuh:19:
In file included from /sw/summit/cuda/11.7.1/include/cub/util_arch.cuh:36:
/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:123:6: warning: CUB 
requires at least Clang 7.0. Define CUB_IGNORE_DEPRECATED_CPP_DIALECT to 
suppress this message. [-W#pragma-messages]
 CUB_COMPILER_DEPRECATION(Clang 7.0);
 ^
/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:112:3: note: expanded 
from macro 'CUB_COMPILER_DEPRECATION'
  CUB_COMP_DEPR_IMPL(CUB requires at least REQ. Define 
CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message.)
  ^
/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:106:35: note: expanded 
from macro 'CUB_COMP_DEPR_IMPL'
#  define CUB_COMP_DEPR_IMPL(msg) CUB_COMP_DEPR_IMPL0(GCC warning #msg)
  ^
/sw/summit/cuda/11.7.1/include/cub/util_cpp_dialect.cuh:107:37: note: expanded 
from macro 'CUB_COMP_DEPR_IMPL0'
#  define CUB_COMP_DEPR_IMPL0(expr) _Pragma(#expr)
^
:198:6: note: expanded from here
 GCC warning "CUB requires at least Clang 7.0. Define 
CUB_IGNORE_DEPRECATED_CPP_DIALECT to suppress this message."
 ^

Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU

2023-08-21 Thread Vanella, Marcos (Fed) via petsc-users
Ok thanks Junchao, so is GPU 0 actually allocating memory for the 8 MPI
processes' meshes but only working on 2 of them?
It says in the script it has allocated 2.4GB
Best,
Marcos

From: Junchao Zhang 
Sent: Monday, August 21, 2023 3:29 PM
To: Vanella, Marcos (Fed) 
Cc: PETSc users list ; Guan, Collin X. (Fed) 

Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi 
processes and 1 GPU

Hi, Marcos,
  If you look at the PIDs of the nvidia-smi output, you will only find 8 unique
PIDs, which is expected since you allocated 8 MPI ranks per node.
  The duplicate PIDs are usually for threads spawned by the MPI runtime (for
example, progress threads in the MPI implementation). So your job script and
output are all good.

  Thanks.

On Mon, Aug 21, 2023 at 2:00 PM Vanella, Marcos (Fed)
<marcos.vane...@nist.gov> wrote:
Hi Junchao, something I'm noting related to running with cuda enabled linear 
solvers (CG+HYPRE, CG+GAMG) is that for multi cpu-multi gpu calculations, the 
GPU 0 in the node is taking what seems to be all sub-matrices corresponding to 
all the MPI processes in the node. This is the result of the nvidia-smi command 
on a node with 8 MPI processes (each advancing the same number of unknowns in 
the calculation) and 4 GPU V100s:

Mon Aug 21 14:36:07 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB           On  | 0004:04:00.0    Off  |                    0 |
| N/A   34C    P0             63W / 300W  |   2488MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-16GB           On  | 0004:05:00.0    Off  |                    0 |
| N/A   38C    P0             56W / 300W  |    638MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2-16GB           On  | 0035:03:00.0    Off  |                    0 |
| N/A   35C    P0             52W / 300W  |    638MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2-16GB           On  | 0035:04:00.0    Off  |                    0 |
| N/A   38C    P0             53W / 300W  |    638MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    214626      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
|    0   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
|    0   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
|    0   N/A  N/A    214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
|    0   N/A  N/A    214630      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
|    0   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
|    0   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
|    0   N/A  N/A    214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
|    1   N/A  N/A    214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
|    1   N/A  N/A    214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
|    2   N/A  N/A    214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
|    2   N/A  N/A    214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
|    3   N/A  N/A    214629

Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU

2023-08-21 Thread Vanella, Marcos (Fed) via petsc-users
Hi Junchao, something I'm noticing related to running with CUDA-enabled linear solvers (CG+HYPRE, CG+GAMG) is that for multi-CPU, multi-GPU calculations, GPU 0 in the node is taking what seems to be all the sub-matrices corresponding to all the MPI processes in the node. This is the result of the nvidia-smi command on a node with 8 MPI processes (each advancing the same number of unknowns in the calculation) and 4 V100 GPUs:

Mon Aug 21 14:36:07 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03    Driver Version: 535.54.03    CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------|
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2-16GB  On | 0004:04:00.0    Off  |                    0 |
| N/A   34C   P0    63W / 300W  |  2488MiB / 16384MiB  |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-16GB  On | 0004:05:00.0    Off  |                    0 |
| N/A   38C   P0    56W / 300W  |   638MiB / 16384MiB  |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2-16GB  On | 0035:03:00.0    Off  |                    0 |
| N/A   35C   P0    52W / 300W  |   638MiB / 16384MiB  |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2-16GB  On | 0035:04:00.0    Off  |                    0 |
| N/A   38C   P0    53W / 300W  |   638MiB / 16384MiB  |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     214626      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
|    0   N/A  N/A     214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
|    0   N/A  N/A     214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
|    0   N/A  N/A     214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
|    0   N/A  N/A     214630      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
|    0   N/A  N/A     214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
|    0   N/A  N/A     214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
|    0   N/A  N/A     214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     308MiB |
|    1   N/A  N/A     214627      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
|    1   N/A  N/A     214631      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
|    2   N/A  N/A     214628      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
|    2   N/A  N/A     214632      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
|    3   N/A  N/A     214629      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
|    3   N/A  N/A     214633      C   ...d/ompi_gnu_linux/fds_ompi_gnu_linux     318MiB |
+-----------------------------------------------------------------------------+


You can see that GPU 0 is connected to all 8 MPI processes, each taking about 300MiB on it, whereas GPUs 1, 2 and 3 are each working with 2 MPI processes. I'm wondering if this is expected, or whether there are changes I need to make to my submission script/runtime parameters.
This is the script in this case (2 nodes, 8 MPI processes/node, 4 GPUs/node):

#!/bin/bash
# ../../Utilities/Scripts/qfds.sh -p 2  -T db -d test.fds
#SBATCH -J test
#SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err
#SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log
#SBATCH --partition=gpu
#SBATCH --ntasks=16
#SBATCH --ntasks-per-node=8
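
For reference, one common way to give each rank its own GPU on multi-GPU nodes (a sketch, not from this thread; set_gpu_rank is a hypothetical wrapper script and assumes OpenMPI exports OMPI_COMM_WORLD_LOCAL_RANK):

#!/bin/bash
# set_gpu_rank (hypothetical wrapper): map each local MPI rank to one of
# the node's 4 GPUs so that every rank only sees a single device.
export CUDA_VISIBLE_DEVICES=$(( OMPI_COMM_WORLD_LOCAL_RANK % 4 ))
exec "$@"

The job would then launch with, e.g., mpirun -n 16 ./set_gpu_rank ./fds_ompi_gnu_linux test.fds ... so the processes on each node spread across GPUs 0-3 instead of all initializing on GPU 0. Whether this interacts well with PETSc's own device selection is worth verifying.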

Re: [petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU

2023-08-11 Thread Vanella, Marcos (Fed) via petsc-users
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited on signal 6 (Aborted).
--------------------------------------------------------------------------

BTW, I'm curious: if I set n MPI processes, each of them building a part of the linear system, and g GPUs, how does PETSc distribute those n pieces of the system matrix and RHS among the g GPUs? Does it use some load-balancing algorithm? Where can I read about this?
Thank you and best regards. I can also point you to my code repo on GitHub if you want to take a closer look.

Best Regards,
Marcos


From: Junchao Zhang 
Sent: Friday, August 11, 2023 10:52 AM
To: Vanella, Marcos (Fed) 
Cc: petsc-users@mcs.anl.gov 
Subject: Re: [petsc-users] CUDA error trying to run a job with two mpi 
processes and 1 GPU

Hi, Marcos,
  Could you build petsc in debug mode and then copy and paste the whole error 
stack message?

   Thanks
--Junchao Zhang
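
For reference, a minimal sketch of such a debug rebuild (the configure options shown are illustrative, not taken from this thread):

$ ./configure PETSC_ARCH=arch-cuda-debug --with-debugging=1 --with-cuda=1
$ make PETSC_ARCH=arch-cuda-debug all
# rerun the failing case against the debug build to get the full error stack:
$ mpirun -n 2 ./fds_ompi_gnu_linux test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg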







[petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU

2023-08-10 Thread Vanella, Marcos (Fed) via petsc-users
Hi, I'm trying to run a parallel matrix-vector build and linear solution with PETSc on 2 MPI processes + one V100 GPU. I have verified that the matrix build and solution succeed when running on CPUs only. I'm using CUDA 11.5, CUDA-enabled OpenMPI, and gcc 9.3. When I run the job with the GPU enabled I get the following error:

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  merge_sort: failed to synchronize: cudaErrorIllegalAddress: an 
illegal memory access was encountered

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  merge_sort: failed to synchronize: cudaErrorIllegalAddress: an 
illegal memory access was encountered

Program received signal SIGABRT: Process abort signal.

I'm new to submitting jobs in slurm that also use GPU resources, so I might be 
doing something wrong in my submission script. This is it:

#!/bin/bash
#SBATCH -J test
#SBATCH -e /home/Issues/PETSc/test.err
#SBATCH -o /home/Issues/PETSc/test.log
#SBATCH --partition=batch
#SBATCH --ntasks=2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=2
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:1

export OMP_NUM_THREADS=1
module load cuda/11.5
module load openmpi/4.1.1

cd /home/Issues/PETSc
mpirun -n 2 /home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds 
-vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg

If anyone has any suggestions on how to troubleshoot this, please let me know.
Thanks!
Marcos





Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution

2023-06-27 Thread Vanella, Marcos (Fed) via petsc-users
Sorry, meant 100K to 200K cells.

Also, check the release page of SuiteSparse. The multi-GPU version of CHOLMOD might be coming soon:

https://people.engr.tamu.edu/davis/SuiteSparse/index.html


Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution

2023-06-27 Thread Vanella, Marcos (Fed) via petsc-users
Thank you Matt. I'll try the flags you recommend for monitoring. Correct, I'm 
trying to see if GPU would provide an advantage for this particular Poisson 
solution we do in our code.

Our grids are staggered with the Poisson unknown in cell centers. All my tests 
for single mesh runs with 100K to 200K meshes show MKL PARDISO as the faster 
option for these meshes considering the mesh as unstructured (an implementation 
separate from the PETSc option). We have the option of Fishpack (fast 
trigonometric solvers), but that is not as general (requires solution on the 
whole mesh + a special treatment of immersed geometry). The single mesh solver 
is used as a black box within a fixed point domain decomposition iteration in 
multi-mesh cases. The approximation error in this method is confined to the 
mesh boundaries.

The other option I have tried with MKL is to build the global matrix across all meshes and use the MKL cluster sparse solver. The problem becomes one of memory for meshes that go over a couple million unknowns, due to the storage of the exact Cholesky factorization matrix. I'm thinking the other possibility using PETSc is to build the global matrix in parallel (as done for the MKL global solver) and try the GPU-accelerated Krylov + multigrid preconditioner. If this can bring the time to solution down to what we get with the previous scheme and keep memory use under control, it would be a good option for CPU+GPU systems. The thing is, we need to bring the residual of the equation to ~10^-10 or less to avoid instability, so it might still be costly.
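
For reference, a run along those lines might combine options like the following (a sketch; the solver settings mirror the plan described above, and the executable/input names are illustrative):

$ mpirun -n 8 ./fds_ompi_gnu_linux case.fds \
    -ksp_type cg -pc_type gamg -ksp_rtol 1e-10 \
    -vec_type mpicuda -mat_type mpiaijcusparse \
    -ksp_monitor_true_residual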

I'll keep you updated. Thanks,
Marcos

From: Matthew Knepley 
Sent: Tuesday, June 27, 2023 2:08 PM
To: Vanella, Marcos (Fed) 
Cc: Mark Adams ; petsc-users@mcs.anl.gov 

Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil 
(unstructured) poisson solution

On Tue, Jun 27, 2023 at 11:23 AM Vanella, Marcos (Fed) <marcos.vane...@nist.gov> wrote:
Hi Mark and Matt, I tried swapping the preconditioner to CHOLMOD and also the hypre BoomerAMG. They work just fine for my case. I also got my hands on a machine with NVIDIA GPUs in one of our AI clusters. I compiled PETSc to make use of CUDA and CUDA-enabled OpenMPI (with gcc).
I'm running the previous tests and also want to check some of the CUDA-enabled solvers. I was able to submit a case for the default Krylov solver with these runtime flags: -vec_type seqcuda -mat_type seqaijcusparse -pc_type cholesky -pc_factor_mat_solver_type cusparse. The case ran to completion.

I guess my question now is how do I monitor (if there is a way) that the GPU is 
being used in the calculation, and any other stats?

You should get that automatically with

  -log_view

If you want finer-grained profiling of the kernels, you can use

  -log_view_gpu_time

but it can slow things down.
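
For example, reusing the runtime flags quoted above, a profiled run might look like this (a sketch; the executable and input names are illustrative):

$ ./fds_ompi_gnu_linux case.fds -vec_type seqcuda -mat_type seqaijcusparse \
    -pc_type cholesky -pc_factor_mat_solver_type cusparse \
    -log_view -log_view_gpu_time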

Also, which other solver combination using GPU would you recommend for me to 
try? Can we compile PETSc with the cuda enabled version for CHOLMOD and HYPRE?

Hypre has GPU support but not CHOLMOD. There are no rules of thumb right now 
for GPUs. It depends on what card you have, what version of the driver, what 
version of the libraries, etc. It is very fragile. Hopefully this period ends 
soon, but I am not optimistic. Unless you are very confident that GPUs will 
help,
I would not recommend spending the time.

  Thanks,

 Matt

Thank you for your help!
Marcos


From: Matthew Knepley <knep...@gmail.com>
Sent: Monday, June 26, 2023 12:11 PM
To: Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
Cc: Mark Adams <mfad...@lbl.gov>; petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution

On Mon, Jun 26, 2023 at 12:08 PM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote:
Thank you Matt and Mark, I'll try your suggestions. To configure with hypre, can I just use the --download-hypre configure line?

Yes,

  Thanks,

Matt

That is what I did with suitesparse, very nice.
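
For reference, a configure sketch combining both packages (illustrative; keep whatever compiler/MPI options you already use):

$ ./configure --download-hypre --download-suitesparse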

From: Mark Adams <mfad...@lbl.gov>
Sent: Monday, June 26, 2023 12:05 PM
To: Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution

I'm not sure what MG is doing with an "unstructured" problem. I assume you are 
not using DMDA.
-pc_type gamg should work
I would configure with hypre and try that also: -pc_type hypre

As Matt said, MG should be faster. How many iterations was it taking?
Try a 100^3 mesh and check that the iteration count does not change much, if at all.

Mark
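
To check the iteration counts Mark mentions, standard PETSc monitoring options can be added at runtime (a sketch; the executable name is illustrative):

$ ./my_app -pc_type gamg -ksp_monitor -ksp_converged_reason
# run once on the 50^3 mesh and once on the 100^3 mesh and compare the counts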


On Mon, Jun 26, 2023 at 11:35 AM V

Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution

2023-06-27 Thread Vanella, Marcos (Fed) via petsc-users
Hi Mark and Matt, I tried swapping the preconditioner to CHOLMOD and also the hypre BoomerAMG. They work just fine for my case. I also got my hands on a machine with NVIDIA GPUs in one of our AI clusters. I compiled PETSc to make use of CUDA and CUDA-enabled OpenMPI (with gcc).
I'm running the previous tests and also want to check some of the CUDA-enabled solvers. I was able to submit a case for the default Krylov solver with these runtime flags: -vec_type seqcuda -mat_type seqaijcusparse -pc_type cholesky -pc_factor_mat_solver_type cusparse. The case ran to completion.

I guess my question now is how do I monitor (if there is a way) that the GPU is being used in the calculation, and any other stats? Also, which other solver combinations using the GPU would you recommend for me to try? Can we compile PETSc with the CUDA-enabled versions of CHOLMOD and HYPRE?

Thank you for your help!
Marcos


From: Matthew Knepley 
Sent: Monday, June 26, 2023 12:11 PM
To: Vanella, Marcos (Fed) 
Cc: Mark Adams ; petsc-users@mcs.anl.gov 

Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil 
(unstructured) poisson solution

On Mon, Jun 26, 2023 at 12:08 PM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote:
Thank you Matt and Mark, I'll try your suggestions. To configure with hypre, can I just use the --download-hypre configure line?

Yes,

  Thanks,

Matt

That is what I did with suitesparse, very nice.

From: Mark Adams <mfad...@lbl.gov>
Sent: Monday, June 26, 2023 12:05 PM
To: Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution

I'm not sure what MG is doing with an "unstructured" problem. I assume you are 
not using DMDA.
-pc_type gamg should work
I would configure with hypre and try that also: -pc_type hypre

As Matt said, MG should be faster. How many iterations was it taking?
Try a 100^3 mesh and check that the iteration count does not change much, if at all.

Mark


On Mon, Jun 26, 2023 at 11:35 AM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote:
Hi, I was wondering if anyone has experience with which combinations are more efficient for solving a Poisson problem derived from a 7-point stencil on a single mesh (serial).
I've been doing some tests of multigrid and Cholesky on a 50^3 mesh. -pc_type mg takes about 75% more time than -pc_type cholesky -pc_factor_mat_solver_type cholmod for the case I'm testing.
I'm new to PETSc, so any suggestions are most welcome and appreciated,
Marcos


--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/


Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution

2023-06-26 Thread Vanella, Marcos (Fed) via petsc-users
Thank you Matt and Mark, I'll try your suggestions. To configure with hypre, can I just use the --download-hypre configure line?
That is what I did with suitesparse, very nice.

From: Mark Adams 
Sent: Monday, June 26, 2023 12:05 PM
To: Vanella, Marcos (Fed) 
Cc: petsc-users@mcs.anl.gov 
Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil 
(unstructured) poisson solution

I'm not sure what MG is doing with an "unstructured" problem. I assume you are 
not using DMDA.
-pc_type gamg should work
I would configure with hypre and try that also: -pc_type hypre

As Matt said, MG should be faster. How many iterations was it taking?
Try a 100^3 mesh and check that the iteration count does not change much, if at all.

Mark


On Mon, Jun 26, 2023 at 11:35 AM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote:
Hi, I was wondering if anyone has experience with which combinations are more efficient for solving a Poisson problem derived from a 7-point stencil on a single mesh (serial).
I've been doing some tests of multigrid and Cholesky on a 50^3 mesh. -pc_type mg takes about 75% more time than -pc_type cholesky -pc_factor_mat_solver_type cholmod for the case I'm testing.
I'm new to PETSc, so any suggestions are most welcome and appreciated,
Marcos


[petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution

2023-06-26 Thread Vanella, Marcos (Fed) via petsc-users
Hi, I was wondering if anyone has experience with which combinations are more efficient for solving a Poisson problem derived from a 7-point stencil on a single mesh (serial).
I've been doing some tests of multigrid and Cholesky on a 50^3 mesh. -pc_type mg takes about 75% more time than -pc_type cholesky -pc_factor_mat_solver_type cholmod for the case I'm testing.
I'm new to PETSc, so any suggestions are most welcome and appreciated,
Marcos


Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI

2023-05-15 Thread Vanella, Marcos (Fed) via petsc-users
has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node excess exited on signal 11 (Segmentation fault: 11).
--------------------------------------------------------------------------
Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
See https://petsc.org/release/faq/
[excess:37831] *** Process received signal ***
[excess:37831] Signal: Segmentation fault: 11 (11)
[excess:37831] Signal code: Address not mapped (1)
[excess:37831] Failing at address: 0x7f
[excess:37831] *** End of error message ***
[excess:37832] *** Process received signal ***
[excess:37832] Signal: Segmentation fault: 11 (11)
[excess:37832] Signal code: Address not mapped (1)
[excess:37832] Failing at address: 0x7f
[excess:37832] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 0 on node excess exited on signal 11 (Segmentation fault: 11).
--------------------------------------------------------------------------
Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI process
See https://petsc.org/release/faq/
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line     Source
libifcoremt.dylib  00010B7F7FE4  for__signal_handl     Unknown  Unknown
libsystem_platfor  7FF8024C25ED  _sigtramp             Unknown  Unknown
ex5f               0001087AFA38  PetscGetArchType      Unknown  Unknown
ex5f               00010887913B  PetscErrorPrintfI     Unknown  Unknown
ex5f               00010878D227  PetscInitialize_C     Unknown  Unknown
ex5f               00010879D289  petscinitializef_     Unknown  Unknown
ex5f               000108713C09  petscsys_mp_petsc     Unknown  Unknown
ex5f               000108710B5D  MAIN__                Unknown  Unknown
ex5f               000108710AEE  main                  Unknown  Unknown
dyld               7FF80213B41F  start                 Unknown  Unknown
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[48108,1],0]
  Exit code:    174
--------------------------------------------------------------------------
Completed test examples
Error while running make check
make[1]: *** [check] Error 1
make: *** [check] Error 2


From: Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
Sent: Monday, May 15, 2023 12:20 PM
To: Matthew Knepley <knep...@gmail.com>
Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI

Thank you Matt, I'll try this and let you know.
Marcos

From: Matthew Knepley <knep...@gmail.com>
Sent: Monday, May 15, 2023 12:08 PM
To: Vanella, Marcos (Fed) <marcos.vane...@nist.gov>
Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI

On Mon, May 15, 2023 at 11:19 AM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote:
Hello, I'm trying to compile the PETSc library version 3.19.1 with OpenMPI 
4.1.4 and the OneAPI 2022 Update 2 Intel Compiler suite on a Mac with OSX 
Ventura 13.3.1.
I can compile PETSc in debug mode with this configure and make lines. I can run 
the PETSC tests, which seem fine.
When I compile the library in optimized mode, either using -O3 or O1, for 
example configuring with:

I hate to yell "compiler bug" when this happens, but it sure seems like one. 
Can you just use

  --with-debugging=0

without the custom COPTFLAGS, CXXOPTFLAGS, FOPTFLAGS? If that works, it is 
almost
certainly a compiler bug. If not, then we can go in the debugger and see what 
is failing.

  Thanks,

Matt

$ ./configure --prefix=/opt/petsc-oneapi22u3 
--with-blaslapack-dir=/opt/intel/oneapi/mkl/2022.2.1 COPTFLAGS='-m64 -O1 -g 
-diag-disable=10441' CXXOPTFLAGS='-m64 -O1 -g -diag-disable=10441' 
FOPTFLAGS='-m64 -O1 -g' LDFLAGS='-m64' --with-debugging=0 
--with-shared-l

Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI

2023-05-15 Thread Vanella, Marcos (Fed) via petsc-users
Hi Satish, yes, the -m64 flag tells the compilers that the target CPU is Intel 64.

The only reason I'm trying to get PETSc working with Intel is that the bundles for the software we release use Intel compilers for Linux, Mac and Windows (OneAPI Intel MPI for Linux and Windows, OpenMPI compiled with Intel for macOS). I'm just trying to get PETSc compiled with Intel to maintain the scheme we have and keep these compilers, which would be handy if we are to release an alternative Poisson solver using PETSc in the future.
For our research projects I'm thinking we'll use gcc/OpenMPI on Linux clusters.

Marcos

From: Satish Balay 
Sent: Monday, May 15, 2023 12:48 PM
To: Vanella, Marcos (Fed) 
Cc: petsc-users 
Subject: Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and 
OpenMPI

Oops - for some reason I assumed this build was on a Mac M1 [likely due to the usage of '-m64' - that was strange].

But yeah - our general usage on Mac is with xcode/clang and brew gfortran (on 
both Intel and ARM CPUs) - and unless you need Intel compilers for specific 
needs - clang/gfortran should work better for this development work.

Satish

On Mon, 15 May 2023, Vanella, Marcos (Fed) via petsc-users wrote:

> Hi Satish, well turns out this is not an M1 Mac, it is an older Intel Mac 
> (2019).
> I'm trying to get a local computer to do development and tests, but I also 
> have access to linux clusters with GPU which we plan to go to next.
> Thanks for the suggestion, I might also try compiling a gcc/gfortran version 
> of the lib on this computer.
> Marcos
> 
> From: Satish Balay 
> Sent: Monday, May 15, 2023 12:10 PM
> To: Vanella, Marcos (Fed) 
> Cc: petsc-users@mcs.anl.gov 
> Subject: Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and 
> OpenMPI
>
> I see Intel compilers here are building x86_64 binaries - that get run on the 
> Arm M1 CPU - perhaps there are issues here with this mode of usage..
>
> > I'm starting to work with PETSc. Our plan is to use the linear solver from 
> > PETSc for the Poisson equation on our numerical scheme and test this on a 
> > GPU cluster.
>
> What does intel compilers provide you for this use case?
>
> Why not use xcode/clang with gfortran here - i.e native ARM binaries?
>
>
> Satish
>
> On Mon, 15 May 2023, Vanella, Marcos (Fed) via petsc-users wrote:
>
> > Hello, I'm trying to compile the PETSc library version 3.19.1 with OpenMPI 
> > 4.1.4 and the OneAPI 2022 Update 2 Intel Compiler suite on a Mac with OSX 
> > Ventura 13.3.1.
> > I can compile PETSc in debug mode with this configure and make lines. I can 
> > run the PETSC tests, which seem fine.
> > When I compile the library in optimized mode, either using -O3 or O1, for 
> > example configuring with:
> >
> > $ ./configure --prefix=/opt/petsc-oneapi22u3 
> > --with-blaslapack-dir=/opt/intel/oneapi/mkl/2022.2.1 COPTFLAGS='-m64 -O1 -g 
> > -diag-disable=10441' CXXOPTFLAGS='-m64 -O1 -g -diag-disable=10441' 
> > FOPTFLAGS='-m64 -O1 -g' LDFLAGS='-m64' --with-debugging=0 
> > --with-shared-libraries=0 --download-make
> >
> > and using mpicc (icc), mpif90 (ifort) from  Open MPI, the static lib 
> > compiles. Yet, I see right off the bat this segfault error in the first 
> > PETSc example:
> >
> > $ make PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 
> > PETSC_ARCH=arch-darwin-c-opt test
> > /Users/mnv/Documents/Software/petsc-3.19.1/arch-darwin-c-opt/bin/make 
> > --no-print-directory -f 
> > /Users/mnv/Documents/Software/petsc-3.19.1/gmakefile.test 
> > PETSC_ARCH=arch-darwin-c-opt 
> > PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 test
> > /opt/intel/oneapi/intelpython/latest/bin/python3 
> > /Users/mnv/Documents/Software/petsc-3.19.1/config/gmakegentest.py 
> > --petsc-dir=/Users/mnv/Documents/Software/petsc-3.19.1 
> > --petsc-arch=arch-darwin-c-opt --testdir=./arch-darwin-c-opt/tests
> > Using MAKEFLAGS: --no-print-directory -- PETSC_ARCH=arch-darwin-c-opt 
> > PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1
> >  CC arch-darwin-c-opt/tests/sys/classes/draw/tests/ex1.o
> > In file included from 
> > /Users/mnv/Documents/Software/petsc-3.19.1/include/petscsys.h(44),
> >  from 
> > /Users/mnv/Documents/Software/petsc-3.19.1/src/sys/classes/draw/tests/ex1.c(4):
> > /Users/mnv/Documents/Software/petsc-3.19.1/include/petscsystypes.h(68): 
> > warning #2621: attribute "warn_unused_result" does not apply here
> >   PETSC_ERROR_CODE_TYPEDEF enum PETSC_ERROR_CODE_NODISCARD {
> > ^
> >
> > CLINKER arch-darwin-c-opt/tests/sys/c

Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI

2023-05-15 Thread Vanella, Marcos (Fed) via petsc-users
Hi Satish, well, it turns out this is not an M1 Mac; it is an older Intel Mac (2019).
I'm trying to set up a local computer for development and tests, but I also have access to Linux clusters with GPUs, which we plan to move to next.
Thanks for the suggestion, I might also try compiling a gcc/gfortran version of the lib on this computer.
Marcos

From: Satish Balay 
Sent: Monday, May 15, 2023 12:10 PM
To: Vanella, Marcos (Fed) 
Cc: petsc-users@mcs.anl.gov 
Subject: Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and 
OpenMPI

I see Intel compilers here are building x86_64 binaries - that get run on the 
Arm M1 CPU - perhaps there are issues here with this mode of usage..

> I'm starting to work with PETSc. Our plan is to use the linear solver from 
> PETSc for the Poisson equation on our numerical scheme and test this on a GPU 
> cluster.

What does intel compilers provide you for this use case?

Why not use xcode/clang with gfortran here - i.e native ARM binaries?


Satish

On Mon, 15 May 2023, Vanella, Marcos (Fed) via petsc-users wrote:

> Hello, I'm trying to compile the PETSc library version 3.19.1 with OpenMPI 
> 4.1.4 and the OneAPI 2022 Update 2 Intel Compiler suite on a Mac with OSX 
> Ventura 13.3.1.
> I can compile PETSc in debug mode with this configure and make lines. I can 
> run the PETSC tests, which seem fine.
> When I compile the library in optimized mode, either using -O3 or O1, for 
> example configuring with:
>
> $ ./configure --prefix=/opt/petsc-oneapi22u3 
> --with-blaslapack-dir=/opt/intel/oneapi/mkl/2022.2.1 COPTFLAGS='-m64 -O1 -g 
> -diag-disable=10441' CXXOPTFLAGS='-m64 -O1 -g -diag-disable=10441' 
> FOPTFLAGS='-m64 -O1 -g' LDFLAGS='-m64' --with-debugging=0 
> --with-shared-libraries=0 --download-make
>
> and using mpicc (icc), mpif90 (ifort) from  Open MPI, the static lib 
> compiles. Yet, I see right off the bat this segfault error in the first PETSc 
> example:
>
> $ make PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 
> PETSC_ARCH=arch-darwin-c-opt test
> /Users/mnv/Documents/Software/petsc-3.19.1/arch-darwin-c-opt/bin/make 
> --no-print-directory -f 
> /Users/mnv/Documents/Software/petsc-3.19.1/gmakefile.test 
> PETSC_ARCH=arch-darwin-c-opt 
> PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 test
> /opt/intel/oneapi/intelpython/latest/bin/python3 
> /Users/mnv/Documents/Software/petsc-3.19.1/config/gmakegentest.py 
> --petsc-dir=/Users/mnv/Documents/Software/petsc-3.19.1 
> --petsc-arch=arch-darwin-c-opt --testdir=./arch-darwin-c-opt/tests
> Using MAKEFLAGS: --no-print-directory -- PETSC_ARCH=arch-darwin-c-opt 
> PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1
>  CC arch-darwin-c-opt/tests/sys/classes/draw/tests/ex1.o
> In file included from 
> /Users/mnv/Documents/Software/petsc-3.19.1/include/petscsys.h(44),
>  from 
> /Users/mnv/Documents/Software/petsc-3.19.1/src/sys/classes/draw/tests/ex1.c(4):
> /Users/mnv/Documents/Software/petsc-3.19.1/include/petscsystypes.h(68): 
> warning #2621: attribute "warn_unused_result" does not apply here
>   PETSC_ERROR_CODE_TYPEDEF enum PETSC_ERROR_CODE_NODISCARD {
> ^
>
> CLINKER arch-darwin-c-opt/tests/sys/classes/draw/tests/ex1
>TEST arch-darwin-c-opt/tests/counts/sys_classes_draw_tests-ex1_1.counts
> not ok sys_classes_draw_tests-ex1_1 # Error code: 139
> # [excess:98681] *** Process received signal ***
> # [excess:98681] Signal: Segmentation fault: 11 (11)
> # [excess:98681] Signal code: Address not mapped (1)
> # [excess:98681] Failing at address: 0x7f
> # [excess:98681] *** End of error message ***
> # 
> --
> # Primary job  terminated normally, but 1 process returned
> # a non-zero exit code. Per user-direction, the job has been aborted.
> # 
> --
> # 
> --
> # mpiexec noticed that process rank 0 with PID 0 on node excess exited on 
> signal 11 (Segmentation fault: 11).
> # 
> --
>  ok sys_classes_draw_tests-ex1_1 # SKIP Command failed so no diff
>
> I see the same segfault error in all PETSc examples.
> Any help is mostly appreciated, I'm starting to work with PETSc. Our plan is 
> to use the linear solver from PETSc for the Poisson equation on our numerical 
> scheme and test this on a GPU cluster. So also, any guideline on how to 
> interface PETSc with a fortran code and personal experience is also most 
> appreciated!
>
> Marcos
>
>
>
>


Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI

2023-05-15 Thread Vanella, Marcos (Fed) via petsc-users
Thank you Matt, I'll try this and let you know.
Marcos

From: Matthew Knepley 
Sent: Monday, May 15, 2023 12:08 PM
To: Vanella, Marcos (Fed) 
Cc: petsc-users@mcs.anl.gov 
Subject: Re: [petsc-users] Compiling PETSC with Intel OneAPI compilers and 
OpenMPI

On Mon, May 15, 2023 at 11:19 AM Vanella, Marcos (Fed) via petsc-users <petsc-users@mcs.anl.gov> wrote:
Hello, I'm trying to compile the PETSc library version 3.19.1 with OpenMPI 
4.1.4 and the OneAPI 2022 Update 2 Intel Compiler suite on a Mac with OSX 
Ventura 13.3.1.
I can compile PETSc in debug mode with this configure and make lines. I can run 
the PETSC tests, which seem fine.
When I compile the library in optimized mode, either using -O3 or O1, for 
example configuring with:

I hate to yell "compiler bug" when this happens, but it sure seems like one. 
Can you just use

  --with-debugging=0

without the custom COPTFLAGS, CXXOPTFLAGS, FOPTFLAGS? If that works, it is 
almost
certainly a compiler bug. If not, then we can go in the debugger and see what 
is failing.

  Thanks,

Matt
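
Concretely, a stripped-down configure along those lines might be (a sketch based on the configure line quoted below, with the custom optimization flags removed):

$ ./configure --prefix=/opt/petsc-oneapi22u3 --with-blaslapack-dir=/opt/intel/oneapi/mkl/2022.2.1 --with-debugging=0 --with-shared-libraries=0 --download-make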

$ ./configure --prefix=/opt/petsc-oneapi22u3 
--with-blaslapack-dir=/opt/intel/oneapi/mkl/2022.2.1 COPTFLAGS='-m64 -O1 -g 
-diag-disable=10441' CXXOPTFLAGS='-m64 -O1 -g -diag-disable=10441' 
FOPTFLAGS='-m64 -O1 -g' LDFLAGS='-m64' --with-debugging=0 
--with-shared-libraries=0 --download-make

and using mpicc (icc) and mpif90 (ifort) from Open MPI, the static lib compiles. Yet, I see right off the bat this segfault error in the first PETSc example:

$ make PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 
PETSC_ARCH=arch-darwin-c-opt test
/Users/mnv/Documents/Software/petsc-3.19.1/arch-darwin-c-opt/bin/make 
--no-print-directory -f 
/Users/mnv/Documents/Software/petsc-3.19.1/gmakefile.test 
PETSC_ARCH=arch-darwin-c-opt 
PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 test
/opt/intel/oneapi/intelpython/latest/bin/python3 
/Users/mnv/Documents/Software/petsc-3.19.1/config/gmakegentest.py 
--petsc-dir=/Users/mnv/Documents/Software/petsc-3.19.1 
--petsc-arch=arch-darwin-c-opt --testdir=./arch-darwin-c-opt/tests
Using MAKEFLAGS: --no-print-directory -- PETSC_ARCH=arch-darwin-c-opt 
PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1
 CC arch-darwin-c-opt/tests/sys/classes/draw/tests/ex1.o
In file included from 
/Users/mnv/Documents/Software/petsc-3.19.1/include/petscsys.h(44),
 from 
/Users/mnv/Documents/Software/petsc-3.19.1/src/sys/classes/draw/tests/ex1.c(4):
/Users/mnv/Documents/Software/petsc-3.19.1/include/petscsystypes.h(68): warning 
#2621: attribute "warn_unused_result" does not apply here
  PETSC_ERROR_CODE_TYPEDEF enum PETSC_ERROR_CODE_NODISCARD {
^

CLINKER arch-darwin-c-opt/tests/sys/classes/draw/tests/ex1
   TEST arch-darwin-c-opt/tests/counts/sys_classes_draw_tests-ex1_1.counts
not ok sys_classes_draw_tests-ex1_1 # Error code: 139
# [excess:98681] *** Process received signal ***
# [excess:98681] Signal: Segmentation fault: 11 (11)
# [excess:98681] Signal code: Address not mapped (1)
# [excess:98681] Failing at address: 0x7f
# [excess:98681] *** End of error message ***
# --------------------------------------------------------------------------
# Primary job terminated normally, but 1 process returned
# a non-zero exit code. Per user-direction, the job has been aborted.
# --------------------------------------------------------------------------
# --------------------------------------------------------------------------
# mpiexec noticed that process rank 0 with PID 0 on node excess exited on signal 11 (Segmentation fault: 11).
# --------------------------------------------------------------------------
 ok sys_classes_draw_tests-ex1_1 # SKIP Command failed so no diff

I see the same segfault error in all PETSc examples.
Any help is most appreciated; I'm starting to work with PETSc. Our plan is to use the linear solver from PETSc for the Poisson equation in our numerical scheme and test this on a GPU cluster. Also, any guidelines on how to interface PETSc with a Fortran code, and any personal experience with that, would be most appreciated!

Marcos





--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/


[petsc-users] Compiling PETSC with Intel OneAPI compilers and OpenMPI

2023-05-15 Thread Vanella, Marcos (Fed) via petsc-users
Hello, I'm trying to compile the PETSc library version 3.19.1 with OpenMPI 
4.1.4 and the OneAPI 2022 Update 2 Intel Compiler suite on a Mac with OSX 
Ventura 13.3.1.
I can compile PETSc in debug mode with this configure and make lines. I can run 
the PETSC tests, which seem fine.
When I compile the library in optimized mode, either using -O3 or O1, for 
example configuring with:

$ ./configure --prefix=/opt/petsc-oneapi22u3 
--with-blaslapack-dir=/opt/intel/oneapi/mkl/2022.2.1 COPTFLAGS='-m64 -O1 -g 
-diag-disable=10441' CXXOPTFLAGS='-m64 -O1 -g -diag-disable=10441' 
FOPTFLAGS='-m64 -O1 -g' LDFLAGS='-m64' --with-debugging=0 
--with-shared-libraries=0 --download-make

and using mpicc (icc) and mpif90 (ifort) from Open MPI, the static lib compiles. Yet, I see right off the bat this segfault error in the first PETSc example:

$ make PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 
PETSC_ARCH=arch-darwin-c-opt test
/Users/mnv/Documents/Software/petsc-3.19.1/arch-darwin-c-opt/bin/make 
--no-print-directory -f 
/Users/mnv/Documents/Software/petsc-3.19.1/gmakefile.test 
PETSC_ARCH=arch-darwin-c-opt 
PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1 test
/opt/intel/oneapi/intelpython/latest/bin/python3 
/Users/mnv/Documents/Software/petsc-3.19.1/config/gmakegentest.py 
--petsc-dir=/Users/mnv/Documents/Software/petsc-3.19.1 
--petsc-arch=arch-darwin-c-opt --testdir=./arch-darwin-c-opt/tests
Using MAKEFLAGS: --no-print-directory -- PETSC_ARCH=arch-darwin-c-opt 
PETSC_DIR=/Users/mnv/Documents/Software/petsc-3.19.1
 CC arch-darwin-c-opt/tests/sys/classes/draw/tests/ex1.o
In file included from 
/Users/mnv/Documents/Software/petsc-3.19.1/include/petscsys.h(44),
 from 
/Users/mnv/Documents/Software/petsc-3.19.1/src/sys/classes/draw/tests/ex1.c(4):
/Users/mnv/Documents/Software/petsc-3.19.1/include/petscsystypes.h(68): warning 
#2621: attribute "warn_unused_result" does not apply here
  PETSC_ERROR_CODE_TYPEDEF enum PETSC_ERROR_CODE_NODISCARD {
^

CLINKER arch-darwin-c-opt/tests/sys/classes/draw/tests/ex1
   TEST arch-darwin-c-opt/tests/counts/sys_classes_draw_tests-ex1_1.counts
not ok sys_classes_draw_tests-ex1_1 # Error code: 139
# [excess:98681] *** Process received signal ***
# [excess:98681] Signal: Segmentation fault: 11 (11)
# [excess:98681] Signal code: Address not mapped (1)
# [excess:98681] Failing at address: 0x7f
# [excess:98681] *** End of error message ***
# --------------------------------------------------------------------------
# Primary job terminated normally, but 1 process returned
# a non-zero exit code. Per user-direction, the job has been aborted.
# --------------------------------------------------------------------------
# --------------------------------------------------------------------------
# mpiexec noticed that process rank 0 with PID 0 on node excess exited on signal 11 (Segmentation fault: 11).
# --------------------------------------------------------------------------
 ok sys_classes_draw_tests-ex1_1 # SKIP Command failed so no diff

I see the same segfault error in all PETSc examples.
Any help is most appreciated; I'm starting to work with PETSc. Our plan is to use the linear solver from PETSc for the Poisson equation in our numerical scheme and test this on a GPU cluster. Also, any guidelines on how to interface PETSc with a Fortran code, and any personal experience with that, would be most appreciated!

Marcos