Thank you very much, Matthew,

I did what you suggested and I also added 

ierr = MatView(*amat, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr);
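
For context, a minimal sketch of how the check is set up (the assembly calls are standard PETSc and shown only for completeness; amat is the Mat* my wrapper assembled):

/* the matrix must be fully assembled before it can be viewed */
ierr = MatAssemblyBegin(*amat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
ierr = MatAssemblyEnd(*amat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
/* print the parallel matrix collectively to stdout */
ierr = MatView(*amat, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr);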

Now that I can see the matrices, I notice that some values differ. I will debug 
and simplify my code to try to understand where the difference comes from.

As soon as I have a clearer picture, I will get back to you.

Best, 


Herbert Owen
Senior Researcher, Dept. Computer Applications in Science and Engineering
Barcelona Supercomputing Center (BSC-CNS)
Tel: +34 93 413 4038
Skype: herbert.owen

https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en

> On 13 Nov 2025, at 18:23, Matthew Knepley <[email protected]> wrote:
> 
> On Thu, Nov 13, 2025 at 12:11 PM howen via petsc-users 
> <[email protected]> wrote:
>> Dear Junchao,
>> 
>> Thank you for your response, and sorry for taking so long to answer. 
>> I cannot avoid using the NVIDIA tools: gfortran's OpenACC support is not 
>> mature and gives us problems when compiling our code.
>> What I have done to enable using the latest PETSc is to write my own C code 
>> to call PETSc. 
>> I have little experience with C and it took me some time, but I can now use 
>> PETSc 3.24.1  ;)
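>> 
>> (For context, a minimal sketch of the kind of C wrapper I wrote; the 
>> function and argument names here are illustrative, not my actual code:)
>> 
>> #include <petscksp.h>
>> 
>> /* called from our Fortran code: build a KSP and solve A x = b */
>> PetscErrorCode solve_with_petsc(Mat A, Vec b, Vec x)
>> {
>>   KSP ksp;
>> 
>>   PetscFunctionBeginUser;
>>   PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
>>   PetscCall(KSPSetOperators(ksp, A, A));
>>   PetscCall(KSPSetFromOptions(ksp)); /* solver/PC from the options database */
>>   PetscCall(KSPSolve(ksp, b, x));
>>   PetscCall(KSPDestroy(&ksp));
>>   PetscFunctionReturn(PETSC_SUCCESS);
>> }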
>> 
>> The behaviour remains the same as in my original email. 
>> Parallel+GPU gives bad results. CPU (serial and parallel) and serial GPU all 
>> work correctly and give the same result.
>> 
>> I have dug a bit into PETSc, comparing the CPU and GPU versions with 2 MPI 
>> ranks. I see that the difference starts at src/ksp/ksp/impls/cg/cg.c, line 170:
>>     PetscCall(KSP_PCApply(ksp, R, Z));  /*  z <- Br  */
>> I have printed the vectors R and Z and the norm dp.
>> R is identical on both CPU and GPU, but Z differs.
>> The correct value of dp (the first time it is computed) is 14.3014, while 
>> running on the GPU with 2 MPI ranks it gives 14.7493.
>> If you wish, I can send you the prints I introduced in cg.c; a sketch follows.
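>> 
>> (The prints were along these lines; a sketch, not my exact diff:)
>> 
>>   PetscCall(PetscPrintf(PETSC_COMM_WORLD, "dp = %g\n", (double)dp));
>>   PetscCall(VecView(R, PETSC_VIEWER_STDOUT_WORLD));
>>   PetscCall(VecView(Z, PETSC_VIEWER_STDOUT_WORLD));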
> 
> Thank you for all the detail in this report. However, since you see a problem 
> in KSPCG, I believe we can reduce the complexity. You can use
> 
>   -ksp_view_mat binary:A.bin -ksp_view_rhs binary:b.bin
> 
> and send us those files. Then we can run your system directly using KSP ex10 
> (and so can you).
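> 
> A sketch of how that run might look (assuming ex10's usual -f0 option for 
> the matrix file; check ex10's source for the exact flags to load the RHS):
> 
>   cd $PETSC_DIR/src/ksp/ksp/tutorials && make ex10
>   mpirun -np 2 ./ex10 -f0 A.bin -ksp_type cg -ksp_monitor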
> 
>   Thanks,
> 
>       Matt
>  
>> The folder with the input files to run the case can be downloaded from 
>> https://b2drop.eudat.eu/s/wKRQ4LK7RTKz2iQ
>> 
>> For submitting the GPU run I use 
>> mpirun -np 2 --map-by ppr:4:node:PE=20 --report-bindings ./mn5_bind.sh \
>>   /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_gpu/src/app_sod2d/sod2d \
>>   ChannelFlowSolverIncomp.json
>> 
>> For the CPU run 
>> mpirun -np 2 \
>>   /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_cpu/src/app_sod2d/sod2d \
>>   ChannelFlowSolverIncomp.json
>> 
>> Our code can be downloaded with:
>> git clone --recursive https://gitlab.com/bsc_sod2d/sod2d_gitlab.git
>> 
>> and the branch I am using with
>> git checkout 140-add-petsc
>> 
>> To use exactly the same commit I am using:
>> git checkout 09a923c9b57e46b14ae54b935845d50272691ace
>> 
>> 
>> I am currently using the following modules:
>>   1) nvidia-hpc-sdk/25.1   2) hdf5/1.14.1-2-nvidia-nvhpcx   3) cmake/3.25.1
>> I guess/hope similar modules should be available on any supercomputer.
>> 
>> To build the CPU version:
>> mkdir build_cpu
>> cd build_cpu
>> 
>> export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241_cpu/hhinstal
>> export LD_LIBRARY_PATH=$PETSC_INSTALL/lib:$LD_LIBRARY_PATH
>> export LIBRARY_PATH=$PETSC_INSTALL/lib:$LIBRARY_PATH
>> export C_INCLUDE_PATH=$PETSC_INSTALL/include:$C_INCLUDE_PATH
>> export CPLUS_INCLUDE_PATH=$PETSC_INSTALL/include:$CPLUS_INCLUDE_PATH
>> export PKG_CONFIG_PATH=$PETSC_INSTALL/lib/pkgconfig:$PKG_CONFIG_PATH
>> 
>> cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=OFF -DDEBUG_MODE=OFF ..
>> make -j 80
>> 
>> I built PETSc myself as follows:
>> 
>> git clone -b release https://gitlab.com/petsc/petsc.git petsc
>> cd petsc
>> git checkout v3.24.1
>> module purge
>> module load nvidia-hpc-sdk/25.1   hdf5/1.14.1-2-nvidia-nvhpcx cmake/3.25.1 
>> ./configure \
>>   --PETSC_DIR=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/petsc \
>>   --prefix=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal \
>>   --with-fortran-bindings=0 --with-fc=0 --with-petsc-arch=linux-x86_64-opt \
>>   --with-scalar-type=real --with-debugging=yes --with-64-bit-indices=1 \
>>   --with-precision=single --download-hypre \
>>   CFLAGS=-I/apps/ACC/HDF5/1.14.1-2/NVIDIA/NVHPCX/include CXXFLAGS= FCFLAGS= \
>>   --with-shared-libraries=1 --with-mpi=1 \
>>   --with-blacs-lib=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/lib/intel64/libmkl_blacs_openmpi_lp64.a \
>>   --with-blacs-include=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/include \
>>   --with-mpi-dir=/apps/ACC/NVIDIA-HPC-SDK/25.1/Linux_x86_64/25.1/comm_libs/12.6/hpcx/latest/ompi/ \
>>   --download-ptscotch=yes --download-metis --download-parmetis
>> make all check
>> make install
>> 
>> -------------------
>> For the GPU version, when configuring PETSc I add --with-cuda.
>> 
>> I then change the export PETSC_INSTALL to
>> export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal
>> and repeat all the other exports.
>> 
>> mkdir build_gpu
>> cd build_gpu
>> cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=ON -DDEBUG_MODE=OFF ..
>> make -j 80
>> 
>> As you can see from the submit instructions, the executable is found in 
>> sod2d_gitlab/build_gpu/src/app_sod2d/sod2d
>> 
>> I hope I have not forgotten anything and that my instructions are 'easy' to 
>> follow. If you have any issues, do not hesitate to contact me.
>> The wiki for our code can be found at 
>> https://gitlab.com/bsc_sod2d/sod2d_gitlab/-/wikis/home
>> 
>> Best, 
>> 
>> Herbert Owen
>> Senior Researcher, Dept. Computer Applications in Science and Engineering
>> Barcelona Supercomputing Center (BSC-CNS)
>> Tel: +34 93 413 4038
>> Skype: herbert.owen
>> 
>> https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en
>> 
>>> On 16 Oct 2025, at 18:30, Junchao Zhang <[email protected]> wrote:
>>> 
>>> Hi, Herbert,
>>>    I don't have much experience with OpenACC, and the PETSc CI doesn't have 
>>> such tests. Could you avoid nvfortran and instead use gfortran to compile 
>>> your Fortran + OpenACC code? If you can, that would let you use the latest 
>>> PETSc code and make our debugging easier. 
>>>    Also, could you provide us with a test and instructions to reproduce the 
>>> problem?
>>>    
>>>    Thanks!
>>> --Junchao Zhang
>>> 
>>> 
>>> On Thu, Oct 16, 2025 at 5:07 AM howen via petsc-users 
>>> <[email protected]> wrote:
>>>> Dear All,
>>>> 
>>>> I am interfacing our CFD code (Fortran + OpenACC) to PETSc. 
>>>> Since we use OpenACC, the natural choice for us is NVIDIA's nvhpc compiler. 
>>>> The GNU compiler does not work well, and we do not have access to the Cray 
>>>> compiler.  
>>>> 
>>>> I already know that the latest version of PETSc does not compile with 
>>>> nvhpc, so I am using version 3.21.  
>>>> I get good results on the CPU, both in serial and in parallel (MPI). 
>>>> However, the GPU implementation, which is what we are interested in, only 
>>>> works correctly in serial. In parallel, the results are different, even 
>>>> for a CG solve. 
>>>> 
>>>> I would like to know if you have experience with the NVIDIA compiler, and 
>>>> in particular whether you have already observed issues with it. 
>>>> Your opinion on whether to put further effort into trying to find a bug I 
>>>> may have introduced during the interfacing is highly appreciated.
>>>> 
>>>> Best,
>>>> 
>>>> Herbert Owen
>>>> Senior Researcher, Dpt. Computer Applications in Science and Engineering
>>>> Barcelona Supercomputing Center (BSC-CNS)
>>>> Tel: +34 93 413 4038
>>>> Skype: herbert.owen
>>>> 
>>>> https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en
>>>> 
> 
> --
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/
