Can you pinpoint the MPI calls (and the routines in PETSc or hypre that they are in) that are not using CUDA-aware message passing? That is, inside the MPI call, they copy to host memory and do the needed inter-process communication from there? I do not understand the graphics you have sent.

Or is it possible the buffers passed to MPI are not on the GPU, so the MPI is naturally done from host memory? If so, where?
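One quick way to narrow that down is to query the buffer itself right before the MPI call in question. The helper below is only a minimal sketch built on the CUDA runtime API (report_buffer_location is a made-up name, not anything that already exists in PETSc, hypre, or MPI); placed next to a suspect MPI_Isend/MPI_Irecv it reports whether the pointer being handed to MPI is device, managed, or plain host memory:

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

/* Report whether a buffer is device, managed, or host memory.
   Intended to be called on the send/recv buffers immediately before
   the MPI call (or VecScatter) being investigated. */
static void report_buffer_location(const void *buf, const char *label)
{
  struct cudaPointerAttributes attr;
  cudaError_t                  err = cudaPointerGetAttributes(&attr, buf);

  if (err != cudaSuccess) { /* older CUDA runtimes return an error for plain host pointers */
    (void)cudaGetLastError(); /* clear the sticky error */
    printf("%s: unregistered host memory\n", label);
    return;
  }
  switch (attr.type) {
  case cudaMemoryTypeDevice:  printf("%s: device memory\n", label); break;
  case cudaMemoryTypeManaged: printf("%s: managed memory\n", label); break;
  case cudaMemoryTypeHost:    printf("%s: registered host memory\n", label); break;
  default:                    printf("%s: unregistered host memory\n", label); break;
  }
}

/* Small self-test: one host buffer, one device buffer. */
int main(void)
{
  double *h = (double *)malloc(100 * sizeof(double));
  double *d = NULL;

  cudaMalloc((void **)&d, 100 * sizeof(double));
  report_buffer_location(h, "host buffer");
  report_buffer_location(d, "device buffer");
  cudaFree(d);
  free(h);
  return 0;
}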
   Barry

> On Aug 31, 2025, at 1:30 PM, LEDAC Pierre <pierre.le...@cea.fr> wrote:
>
> Ok, I just tried --enable-gpu-aware-mpi passed to Hypre.
>
> Hypre_config.h now defines HYPRE_USING_GPU_AWARE_MPI 1,
>
> but there is still no D2D copy near the MPI calls in the ex46.c example.
>
> Probably an obvious thing I forgot during PETSc configure, but I don't see it...
>
> Pierre LEDAC
> Commissariat à l'énergie atomique et aux énergies alternatives
> Centre de SACLAY
> DES/ISAS/DM2S/SGLS/LCAN
> Bâtiment 451 – point courrier n°41
> F-91191 Gif-sur-Yvette
> +33 1 69 08 04 03
> +33 6 83 42 05 79
>
> From: LEDAC Pierre
> Sent: Sunday, August 31, 2025 19:13:36
> To: Barry Smith
> Cc: petsc-users@mcs.anl.gov
> Subject: RE: [petsc-users] [MPI][GPU]
>
> Barry,
>
> It solved the unrecognized option, but MPI messages are still exchanged through the host.
>
> I switched to a simpler test case that does not read a matrix (src/ksp/ksp/tutorials/ex46.c) but got the same behaviour.
>
> In the Nsys profile for ex46, the MPI synchronizations occur during PCApply, so now I am wondering whether the issue is that Hypre is not configured/enabled with GPU-aware MPI in the PETSc build. I will give it a try with --enable-gpu-aware-mpi passed to Hypre.
>
> Do you know of an example in PETSc that specifically benchmarks with/without CUDA-aware MPI enabled?
>
> <pastedImage.png>
>
> Pierre LEDAC
>
> From: Barry Smith <bsm...@petsc.dev>
> Sent: Sunday, August 31, 2025 16:33:38
> To: LEDAC Pierre
> Cc: petsc-users@mcs.anl.gov
> Subject: Re: [petsc-users] [MPI][GPU]
>
>    Ahh, that ex10.c is missing a VecSetFromOptions() call before the VecLoad() and friends. In contrast, the matrix has a MatSetFromOptions(). Can you try adding it to ex10.c and see if that resolves the problem with ex10.c (and may be a path forward for your code)?
>
>    Barry
>
>> On Aug 31, 2025, at 4:32 AM, LEDAC Pierre <pierre.le...@cea.fr> wrote:
>>
>> Yes, but I was surprised it was not used, so I removed it (same for -vec_type mpicuda):
>>
>> mpirun -np 2 ./ex10 2 -f Matrix_3133717_rows_1_cpus.petsc -ksp_view -log_view -ksp_monitor -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg -pc_hypre_boomeramg_strong_threshold 0.7 -mat_type aijcusparse -vec_type cuda
>> ...
>> WARNING! There are options you set that were not used!
>> WARNING! could be spelling mistake, etc!
>> There is one unused database option. It is:
>> Option left: name:-vec_type value: cuda source: command lin
>>
>> Pierre LEDAC
>>
>> From: Barry Smith <bsm...@petsc.dev>
>> Sent: Saturday, August 30, 2025 21:47:07
>> To: LEDAC Pierre
>> Cc: petsc-users@mcs.anl.gov
>> Subject: Re: [petsc-users] [MPI][GPU]
>>
>>    Did you try the additional option -vec_type cuda with ex10.c?
>>
>>> On Aug 30, 2025, at 1:16 PM, LEDAC Pierre <pierre.le...@cea.fr> wrote:
>>>
>>> Hello,
>>>
>>> My code is built with PETSc 3.23 + OpenMPI 4.1.6 (CUDA support enabled), and profiling indicates that MPI communications are done between GPUs everywhere in the code except in the PETSc part, where D2H transfers occur.
>>>
>>> I reproduced the PETSc issue with the example under src/ksp/ksp/tutorials/ex10 on 2 MPI ranks. See the output in ex10.log.
>>>
>>> Also below is the Nsys profiling of ex10, with D2H and H2D copies before/after the MPI calls.
>>>
>>> Thanks for your help,
>>>
>>> <pastedImage.png>
>>>
>>> Pierre LEDAC
>>>
>>> <ex10.log>
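For reference, the fix suggested above for ex10.c boils down to letting the vector consult the options database before it is loaded, so that -vec_type cuda actually takes effect. The snippet below is only a sketch of the generic VecCreate() / VecSetFromOptions() / VecLoad() pattern (the -f option handling and variable names are illustrative, not an exact diff of the tutorial):

#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec         u;
  PetscViewer viewer;
  char        file[PETSC_MAX_PATH_LEN];
  PetscBool   flg;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(PetscOptionsGetString(NULL, NULL, "-f", file, sizeof(file), &flg));
  PetscCheck(flg, PETSC_COMM_WORLD, PETSC_ERR_USER, "Provide a PETSc binary file with -f");
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, file, FILE_MODE_READ, &viewer));

  PetscCall(VecCreate(PETSC_COMM_WORLD, &u));
  PetscCall(VecSetFromOptions(u)); /* honors -vec_type cuda, so the loaded vector is created as a GPU vector */
  PetscCall(VecLoad(u, viewer));   /* without the VecSetFromOptions() above, -vec_type cuda is ignored and the vector stays on the host */

  PetscCall(PetscViewerDestroy(&viewer));
  PetscCall(VecDestroy(&u));
  PetscCall(PetscFinalize());
  return 0;
}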