Dear Matthew and Junchao,

I finally found my error; now everything works fine. I was a bit stuck at some point, and your small comments were very helpful.
Thanks!!!

Herbert Owen
Senior Researcher, Dpt. Computer Applications in Science and Engineering
Barcelona Supercomputing Center (BSC-CNS)
Tel: +34 93 413 4038
Skype: herbert.owen
https://urldefense.us/v3/__https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en__;!!G_uCfscf7eWS!f2aJLIJjM-mazjlNYof4HlyYtStFvSqBARrm1edFZiRRKxeneBWEc4um7RyuOjp8es6iTTywGdvGPHvmzdeHRfz8esh1$

> On 14 Nov 2025, at 16:08, howen <[email protected]> wrote:
>
> Thank you very much, Matthew.
>
> I did what you suggested, and I also added
>
> ierr = MatView(*amat, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr);
>
> Now that I can see the matrices, I notice that some values differ. I will
> debug and simplify my code to try to understand where the difference comes
> from.
>
> As soon as I have a clearer picture I will contact you again.
>
> Best,
>
> Herbert Owen
>
>> On 13 Nov 2025, at 18:23, Matthew Knepley <[email protected]> wrote:
>>
>> On Thu, Nov 13, 2025 at 12:11 PM howen via petsc-users
>> <[email protected]> wrote:
>>> Dear Junchao,
>>>
>>> Thank you for your response, and sorry for taking so long to answer.
>>> I cannot avoid using the NVIDIA tools. gfortran is not mature for OpenACC
>>> and gives us problems when compiling our code.
>>> To enable using the latest PETSc, I have written my own C code to call PETSc.
>>> I have little experience with C and it took me some time, but I can now
>>> use PETSc 3.24.1 ;)
>>>
>>> The behaviour remains the same as in my original email.
>>> Parallel+GPU gives bad results.
>>> CPU (serial and parallel) and GPU serial all work OK and give the same result.
>>>
>>> I have dug a bit into PETSc, comparing the CPU and GPU versions with 2 MPI ranks.
>>> I see that the difference starts in src/ksp/ksp/impls/cg/cg.c, line 170:
>>>
>>> PetscCall(KSP_PCApply(ksp, R, Z)); /* z <- Br */
>>>
>>> I have printed the vectors R and Z and the norm dp.
>>> R is identical on both CPU and GPU, but Z differs.
>>> The correct value of dp (the first time it enters) is 14.3014, while
>>> running on the GPU with 2 MPI ranks gives 14.7493.
>>> If you wish, I can send you the prints I introduced in cg.c.
>>
>> Thank you for all the detail in this report. However, since you see a
>> problem in KSPCG, I believe we can reduce the complexity. You can use
>>
>> -ksp_view_mat binary:A.bin -ksp_view_rhs binary:b.bin
>>
>> and send us those files. Then we can run your system directly using KSP
>> ex10 (and so can you).
>>
>> Thanks,
>>
>> Matt
>>
>>> The folder with the input files to run the case can be downloaded from
>>> https://urldefense.us/v3/__https://b2drop.eudat.eu/s/wKRQ4LK7RTKz2iQ__;!!G_uCfscf7eWS!f2aJLIJjM-mazjlNYof4HlyYtStFvSqBARrm1edFZiRRKxeneBWEc4um7RyuOjp8es6iTTywGdvGPHvmzdeHRSYLlx3K$
>>>
>>> For submitting the GPU run I use:
>>> mpirun -np 2 --map-by ppr:4:node:PE=20 --report-bindings ./mn5_bind.sh
>>> /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_gpu/src/app_sod2d/sod2d
>>> ChannelFlowSolverIncomp.json
>>>
>>> For the CPU run:
>>> mpirun -np 2
>>> /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_cpu/src/app_sod2d/sod2d
>>> ChannelFlowSolverIncomp.json
>>>
>>> Our code can be downloaded with:
>>> git clone --recursive
>>> https://urldefense.us/v3/__https://gitlab.com/bsc_sod2d/sod2d_gitlab.git__;!!G_uCfscf7eWS!f2aJLIJjM-mazjlNYof4HlyYtStFvSqBARrm1edFZiRRKxeneBWEc4um7RyuOjp8es6iTTywGdvGPHvmzdeHRQCr0eq8$
>>>
>>> and the branch I am using with:
>>> git checkout 140-add-petsc
>>>
>>> To use exactly the same commit as me:
>>> git checkout 09a923c9b57e46b14ae54b935845d50272691ace
>>>
>>> I am currently using:
>>> Currently Loaded Modules:
>>> 1) nvidia-hpc-sdk/25.1  2) hdf5/1.14.1-2-nvidia-nvhpcx  3) cmake/3.25.1
>>> I guess/hope similar modules should be available on any supercomputer.
>>>
>>> To build the CPU version:
>>> mkdir build_cpu
>>> cd build_cpu
>>>
>>> export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241_cpu/hhinstal
>>> export LD_LIBRARY_PATH=$PETSC_INSTALL/lib:$LD_LIBRARY_PATH
>>> export LIBRARY_PATH=$PETSC_INSTALL/lib:$LIBRARY_PATH
>>> export C_INCLUDE_PATH=$PETSC_INSTALL/include:$C_INCLUDE_PATH
>>> export CPLUS_INCLUDE_PATH=$PETSC_INSTALL/include:$CPLUS_INCLUDE_PATH
>>> export PKG_CONFIG_PATH=$PETSC_INSTALL/lib/pkgconfig:$PKG_CONFIG_PATH
>>>
>>> cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=OFF -DDEBUG_MODE=OFF ..
>>> make -j 80
>>>
>>> I have built PETSc myself as follows:
>>>
>>> git clone -b release https://urldefense.us/v3/__https://gitlab.com/petsc/petsc.git__;!!G_uCfscf7eWS!f2aJLIJjM-mazjlNYof4HlyYtStFvSqBARrm1edFZiRRKxeneBWEc4um7RyuOjp8es6iTTywGdvGPHvmzdeHRZKzWAoJ$ petsc
>>> cd petsc
>>> git checkout v3.24.1
>>> module purge
>>> module load nvidia-hpc-sdk/25.1 hdf5/1.14.1-2-nvidia-nvhpcx cmake/3.25.1
>>> ./configure
>>> --PETSC_DIR=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/petsc
>>> --prefix=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal
>>> --with-fortran-bindings=0 --with-fc=0 --with-petsc-arch=linux-x86_64-opt
>>> --with-scalar-type=real --with-debugging=yes --with-64-bit-indices=1
>>> --with-precision=single --download-hypre
>>> CFLAGS=-I/apps/ACC/HDF5/1.14.1-2/NVIDIA/NVHPCX/include CXXFLAGS= FCFLAGS=
>>> --with-shared-libraries=1 --with-mpi=1
>>> --with-blacs-lib=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/lib/intel64/libmkl_blacs_openmpi_lp64.a
>>> --with-blacs-include=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/include
>>> --with-mpi-dir=/apps/ACC/NVIDIA-HPC-SDK/25.1/Linux_x86_64/25.1/comm_libs/12.6/hpcx/latest/ompi/
>>> --download-ptscotch=yes --download-metis --download-parmetis
>>> make all check
>>> make install
>>>
>>> -------------------
>>> For the GPU version, when configuring PETSc I add: --with-cuda
>>>
>>> I then change the export PETSC_INSTALL to:
>>> export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal
>>> and repeat all the other exports.
>>>
>>> mkdir build_gpu
>>> cd build_gpu
>>> cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=ON -DDEBUG_MODE=OFF ..
>>> make -j 80
>>>
>>> As you can see from the submit instructions, the executable is found in
>>> sod2d_gitlab/build_gpu/src/app_sod2d/sod2d
>>>
>>> I hope I have not forgotten anything and my instructions are 'easy' to
>>> follow. If you have any issue, do not hesitate to contact me.
>>> The wiki for our code can be found at:
>>> https://urldefense.us/v3/__https://gitlab.com/bsc_sod2d/sod2d_gitlab/-/wikis/home__;!!G_uCfscf7eWS!f2aJLIJjM-mazjlNYof4HlyYtStFvSqBARrm1edFZiRRKxeneBWEc4um7RyuOjp8es6iTTywGdvGPHvmzdeHRTtC2VEI$
>>>
>>> Best,
>>>
>>> Herbert Owen
>>>
>>>> On 16 Oct 2025, at 18:30, Junchao Zhang <[email protected]> wrote:
>>>>
>>>> Hi, Herbert,
>>>> I don't have much experience with OpenACC, and the PETSc CI doesn't have
>>>> such tests. Could you avoid using nvfortran and instead use gfortran to
>>>> compile your Fortran + OpenACC code? If so, you could use the latest
>>>> PETSc code, which would make our debugging easier.
>>>> Also, could you provide us with a test and instructions to reproduce
>>>> the problem?
>>>>
>>>> Thanks!
>>>> --Junchao Zhang
>>>>
>>>> On Thu, Oct 16, 2025 at 5:07 AM howen via petsc-users
>>>> <[email protected]> wrote:
>>>>> Dear All,
>>>>>
>>>>> I am interfacing our CFD code (Fortran + OpenACC) to PETSc.
>>>>> Since we use OpenACC, the natural choice for us is NVIDIA's nvhpc
>>>>> compiler. The GNU compiler does not work well, and we do not have access
>>>>> to the Cray compiler.
>>>>>
>>>>> I already know that the latest version of PETSc does not compile with
>>>>> nvhpc; I am therefore using version 3.21.
>>>>> I get good results on the CPU, both in serial and in parallel (MPI).
>>>>> However, the GPU implementation, which is what we are interested in,
>>>>> only works correctly in serial. In parallel, the results are different,
>>>>> even for a CG solve.
>>>>>
>>>>> I would like to know whether you have experience with the NVIDIA
>>>>> compiler, and in particular whether you have already observed issues
>>>>> with it. Your opinion on whether to put further effort into trying to
>>>>> find a bug I may have introduced during the interfacing is highly
>>>>> appreciated.
>>>>>
>>>>> Best,
>>>>>
>>>>> Herbert Owen
>>
>> --
>> What most experimenters take for granted before they begin their experiments
>> is infinitely more interesting than any results to which their experiments
>> lead.
>> -- Norbert Wiener
>>
>> https://urldefense.us/v3/__https://www.cse.buffalo.edu/*knepley/__;fg!!G_uCfscf7eWS!f2aJLIJjM-mazjlNYof4HlyYtStFvSqBARrm1edFZiRRKxeneBWEc4um7RyuOjp8es6iTTywGdvGPHvmzdeHRY11r9Bz$
