Thank you very much, Matthew. I did what you suggested, and I also added
ierr = MatView(*amat, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr);
Now that I can see the matrices, I notice that some values differ. I will debug and simplify my code to try to understand where the difference comes from. As soon as I have a clearer picture I will contact you again.

Best,

Herbert Owen
Senior Researcher, Dpt. Computer Applications in Science and Engineering
Barcelona Supercomputing Center (BSC-CNS)
Tel: +34 93 413 4038
Skype: herbert.owen
https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en

> On 13 Nov 2025, at 18:23, Matthew Knepley <[email protected]> wrote:
>
> On Thu, Nov 13, 2025 at 12:11 PM howen via petsc-users <[email protected]> wrote:
>> Dear Junchao,
>>
>> Thank you for your response, and sorry for taking so long to answer back.
>> I cannot avoid using the NVIDIA tools: gfortran is not mature for OpenACC and gives us problems when compiling our code.
>> To be able to use the latest PETSc, I have written my own C code to call PETSc.
>> I have little experience with C and it took me some time, but I can now use PETSc 3.24.1 ;)
>>
>> The behaviour remains the same as in my original email: parallel+GPU gives bad results, while CPU (serial and parallel) and serial GPU all work and give the same result.
>>
>> I have gone a bit into PETSc, comparing the CPU and GPU versions with 2 MPI ranks.
>> I see that the difference starts in src/ksp/ksp/impls/cg/cg.c, line 170:
>> PetscCall(KSP_PCApply(ksp, R, Z)); /* z <- Br */
>> I have printed the vectors R and Z and the norm dp. R is identical on CPU and GPU, but Z differs.
>> The correct value of dp (the first time this line is reached) is 14.3014, while running on the GPU with 2 MPI ranks gives 14.7493.
>> If you wish, I can send you the prints I introduced in cg.c.
>
> Thank you for all the detail in this report. However, since you see a problem in KSPCG, I believe we can reduce the complexity. You can use
>
> -ksp_view_mat binary:A.bin -ksp_view_rhs binary:b.bin
>
> and send us those files. Then we can run your system directly using KSP ex10 (and so can you).
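
For reference, here is a minimal (untested) sketch of a standalone driver that loads such a dump and re-solves it with CG; this is essentially what KSP ex10 already does, so it is only illustrative. The file names A.bin and b.bin are taken from the options above; everything else (variable names, runtime options) is an assumption, not code from this thread:

/* Load a matrix and right-hand side written with
   -ksp_view_mat binary:A.bin -ksp_view_rhs binary:b.bin
   and solve the system with CG. File names are assumptions. */
#include <petsc.h>

int main(int argc, char **argv)
{
  Mat         A;
  Vec         b, x;
  KSP         ksp;
  PetscViewer viewer;
  PetscInt    its;
  PetscReal   rnorm;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* Matrix: the type can be chosen at run time, e.g. -mat_type aijcusparse
     to exercise the GPU path (vectors created from it inherit the type). */
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetFromOptions(A));
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "A.bin", FILE_MODE_READ, &viewer));
  PetscCall(MatLoad(A, viewer));
  PetscCall(PetscViewerDestroy(&viewer));

  /* Right-hand side and solution vector, compatible with A */
  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "b.bin", FILE_MODE_READ, &viewer));
  PetscCall(VecLoad(b, viewer));
  PetscCall(PetscViewerDestroy(&viewer));

  /* CG solve; preconditioner and tolerances can be set via -pc_type, -ksp_rtol, ... */
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetType(ksp, KSPCG));
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPSolve(ksp, b, x));

  PetscCall(KSPGetIterationNumber(ksp, &its));
  PetscCall(KSPGetResidualNorm(ksp, &rnorm));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "iterations %" PetscInt_FMT ", residual norm %g\n", its, (double)rnorm));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(MatDestroy(&A));
  PetscCall(VecDestroy(&b));
  PetscCall(VecDestroy(&x));
  PetscCall(PetscFinalize());
  return 0;
}

Running this (or ex10) on 1 vs 2 MPI ranks, with and without -mat_type aijcusparse, should show whether the CPU/GPU difference is reproducible outside the application.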
>
> Thanks,
>
> Matt
>
>> The folder with the input files to run the case can be downloaded from
>> https://b2drop.eudat.eu/s/wKRQ4LK7RTKz2iQ
>>
>> For submitting the GPU run I use
>> mpirun -np 2 --map-by ppr:4:node:PE=20 --report-bindings ./mn5_bind.sh /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_gpu/src/app_sod2d/sod2d ChannelFlowSolverIncomp.json
>>
>> For the CPU run
>> mpirun -np 2 /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_cpu/src/app_sod2d/sod2d ChannelFlowSolverIncomp.json
>>
>> Our code can be downloaded with:
>> git clone --recursive https://gitlab.com/bsc_sod2d/sod2d_gitlab.git
>>
>> and the branch I am using with
>> git checkout 140-add-petsc
>>
>> To use exactly the same commit I am using
>> git checkout 09a923c9b57e46b14ae54b935845d50272691ace
>>
>> I am currently using:
>> Currently Loaded Modules:
>>   1) nvidia-hpc-sdk/25.1   2) hdf5/1.14.1-2-nvidia-nvhpcx   3) cmake/3.25.1
>> I guess/hope similar modules should be available in any supercomputer.
>>
>> To build the CPU version:
>> mkdir build_cpu
>> cd build_cpu
>>
>> export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241_cpu/hhinstal
>> export LD_LIBRARY_PATH=$PETSC_INSTALL/lib:$LD_LIBRARY_PATH
>> export LIBRARY_PATH=$PETSC_INSTALL/lib:$LIBRARY_PATH
>> export C_INCLUDE_PATH=$PETSC_INSTALL/include:$C_INCLUDE_PATH
>> export CPLUS_INCLUDE_PATH=$PETSC_INSTALL/include:$CPLUS_INCLUDE_PATH
>> export PKG_CONFIG_PATH=$PETSC_INSTALL/lib/pkgconfig:$PKG_CONFIG_PATH
>>
>> cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=OFF -DDEBUG_MODE=OFF ..
>> make -j 80
>>
>> I have built PETSc myself as follows:
>>
>> git clone -b release https://gitlab.com/petsc/petsc.git petsc
>> cd petsc
>> git checkout v3.24.1
>> module purge
>> module load nvidia-hpc-sdk/25.1 hdf5/1.14.1-2-nvidia-nvhpcx cmake/3.25.1
>> ./configure --PETSC_DIR=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/petsc --prefix=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal --with-fortran-bindings=0 --with-fc=0 --with-petsc-arch=linux-x86_64-opt --with-scalar-type=real --with-debugging=yes --with-64-bit-indices=1 --with-precision=single --download-hypre CFLAGS=-I/apps/ACC/HDF5/1.14.1-2/NVIDIA/NVHPCX/include CXXFLAGS= FCFLAGS= --with-shared-libraries=1 --with-mpi=1 --with-blacs-lib=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/lib/intel64/libmkl_blacs_openmpi_lp64.a --with-blacs-include=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/include --with-mpi-dir=/apps/ACC/NVIDIA-HPC-SDK/25.1/Linux_x86_64/25.1/comm_libs/12.6/hpcx/latest/ompi/ --download-ptscotch=yes --download-metis --download-parmetis
>> make all check
>> make install
>>
>> -------------------
>> For the GPU version, when configuring PETSc I add: --with-cuda
>>
>> I then change PETSC_INSTALL to
>> export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal
>> and repeat all the other exports.
>>
>> mkdir build_gpu
>> cd build_gpu
>> cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=ON -DDEBUG_MODE=OFF ..
>> make -j 80
>>
>> As you can see from the submit instructions, the executable is found in sod2d_gitlab/build_gpu/src/app_sod2d/sod2d
>>
>> I hope I have not forgotten anything and that my instructions are 'easy' to follow. If you have any issue, do not hesitate to contact me.
>> The wiki for our code can be found at
>> https://gitlab.com/bsc_sod2d/sod2d_gitlab/-/wikis/home
>>
>> Best,
>>
>> Herbert Owen
>>
>> Herbert Owen
>> Senior Researcher, Dpt. Computer Applications in Science and Engineering
>> Barcelona Supercomputing Center (BSC-CNS)
>> Tel: +34 93 413 4038
>> Skype: herbert.owen
>> https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en
>>
>>> On 16 Oct 2025, at 18:30, Junchao Zhang <[email protected]> wrote:
>>>
>>> Hi, Herbert,
>>> I don't have much experience with OpenACC, and the PETSc CI doesn't have such tests. Could you avoid using nvfortran and instead use gfortran to compile your Fortran + OpenACC code?
>>> If you can, then you can use the latest PETSc code, which would make our debugging easier.
>>> Also, could you provide us with a test and instructions to reproduce the problem?
>>>
>>> Thanks!
>>> --Junchao Zhang
>>>
>>>
>>> On Thu, Oct 16, 2025 at 5:07 AM howen via petsc-users <[email protected]> wrote:
>>>> Dear All,
>>>>
>>>> I am interfacing our CFD code (Fortran + OpenACC) to PETSc.
>>>> Since we use OpenACC, the natural choice for us is NVIDIA's nvhpc compiler. The GNU compiler does not work well, and we do not have access to the Cray compiler.
>>>>
>>>> I already know that the latest version of PETSc does not compile with nvhpc; I am therefore using version 3.21.
>>>> I get good results on the CPU, both in serial and in parallel (MPI). However, the GPU implementation, which is what we are interested in, only works correctly in serial. In parallel the results are different, even for a CG solve.
>>>>
>>>> I would like to know whether you have experience with the NVIDIA compiler, and in particular whether you have already observed issues with it. Your opinion on whether to put further effort into trying to find a bug I may have introduced during the interfacing is highly appreciated.
>>>>
>>>> Best,
>>>>
>>>> Herbert Owen
>>>> Senior Researcher, Dpt. Computer Applications in Science and Engineering
>>>> Barcelona Supercomputing Center (BSC-CNS)
>>>> Tel: +34 93 413 4038
>>>> Skype: herbert.owen
>>>> https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en
>
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
