> On Oct 22, 2025, at 2:55 AM, LEDAC Pierre <[email protected]> wrote:
>
> Barry,
>
> We are currently using more and more GPU computation and rely heavily on PETSc solvers (through the boomeramg, amgx, and gamg preconditioners), so yes, we will report any issues or bottlenecks to you.
>
> This leads to my next question: any hope, one day, of a MatGetDiagonal_SeqAIJHIPSPARSE implementation?
I'll put that on my list. I should be able to get something working in a few days. Thanks for the reminder.

Barry

> We know that there is the Kokkos backend as a workaround, though.
>
> Thanks again,
>
> Pierre LEDAC
> Commissariat à l’énergie atomique et aux énergies alternatives
> Centre de SACLAY
> DES/ISAS/DM2S/SGLS/LCAN
> Bâtiment 451 – point courrier n°41
> F-91191 Gif-sur-Yvette
> +33 1 69 08 04 03
> +33 6 83 42 05 79
>
> From: Barry Smith <[email protected]>
> Sent: Tuesday, October 21, 2025 16:35:24
> To: LEDAC Pierre
> Cc: Junchao Zhang; [email protected]; BOURGEOIS Rémi
> Subject: Re: [petsc-users] [GPU] Jacobi preconditioner
>
> That is clearly a dramatic amount! Of course, the previous code was absurd: copying all the nonzero entries to the host, finding the diagonal entries, and then copying them back to the GPU.
>
> If, through Nsight, you find other similar performance bottlenecks, please let us know, and I can try to resolve them.
>
> Barry
>
>> On Oct 21, 2025, at 5:55 AM, LEDAC Pierre <[email protected]> wrote:
>>
>> Hello,
>>
>> Thanks for the work!
>> It is OK now; I checked with Nsight Systems, and the diagonal is indeed computed on the device.
>>
>> How much time does it save? I guess it depends on the number of GMRES iterations: the fewer there are, the more significant it is.
>> In my case, with 5,158,400 rows in the matrix and 45 GMRES iterations, the time to solve decreased from 1.160 s to 0.671 s on an RTX A6000.
>>
>> So thanks again,
>>
>> Pierre LEDAC
>> Commissariat à l’énergie atomique et aux énergies alternatives
>> Centre de SACLAY
>> DES/ISAS/DM2S/SGLS/LCAN
>> Bâtiment 451 – point courrier n°41
>> F-91191 Gif-sur-Yvette
>> +33 1 69 08 04 03
>> +33 6 83 42 05 79
>>
>> From: Barry Smith <[email protected]>
>> Sent: Friday, October 17, 2025 23:27:19
>> To: LEDAC Pierre
>> Cc: Junchao Zhang; [email protected]
>> Subject: Re: [petsc-users] [GPU] Jacobi preconditioner
>>
>> I have updated the MR with what I think is now correct code for computing the diagonal on the GPU. Could you please try it again and let me know if it works and how much time it saves? (I think it should be significant.)
>>
>> Thanks for your patience,
>>
>> Barry
>>
>>> On Oct 2, 2025, at 1:16 AM, LEDAC Pierre <[email protected]> wrote:
>>>
>>> Yes, that is probably the reason I also saw a crash in my test case after a quick fix of the integer conversion.
>>>
>>> Pierre LEDAC
>>> Commissariat à l’énergie atomique et aux énergies alternatives
>>> Centre de SACLAY
>>> DES/ISAS/DM2S/SGLS/LCAN
>>> Bâtiment 451 – point courrier n°41
>>> F-91191 Gif-sur-Yvette
>>> +33 1 69 08 04 03
>>> +33 6 83 42 05 79
>>>
>>> From: Barry Smith <[email protected]>
>>> Sent: Thursday, October 2, 2025 02:16:40
>>> To: LEDAC Pierre
>>> Cc: Junchao Zhang; [email protected]
>>> Subject: Re: [petsc-users] [GPU] Jacobi preconditioner
>>>
>>> Sorry about that. The current code is buggy anyway; I will let you know when I have tested it extensively so you can try again.
>>>
>>> Barry
>>>
>>>> On Oct 1, 2025, at 3:47 PM, LEDAC Pierre <[email protected]> wrote:
>>>>
>>>> Sorry, the correct error is:
>>>>
>>>> /export/home/catA/pl254994/trust/petsc/lib/src/LIBPETSC/build/petsc-barry-2025-09-30-add-matgetdiagonal-cuda/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu(3765): error: argument of type "int*" is incompatible with parameter of type "const PetscInt *"
>>>> GetDiagonal_CSR<<<(int)((n + 255) / 256), 256, 0, PetscDefaultCudaStream>>>(cusparsestruct->rowoffsets_gpu->data().get(), matstruct->cprowIndices->data().get(), cusparsestruct->workVector->data().get(), n, darray);
>>>>
>>>> Pierre LEDAC
>>>> Commissariat à l’énergie atomique et aux énergies alternatives
>>>> Centre de SACLAY
>>>> DES/ISAS/DM2S/SGLS/LCAN
>>>> Bâtiment 451 – point courrier n°41
>>>> F-91191 Gif-sur-Yvette
>>>> +33 1 69 08 04 03
>>>> +33 6 83 42 05 79
>>>>
>>>> From: LEDAC Pierre
>>>> Sent: Wednesday, October 1, 2025 21:46:00
>>>> To: Barry Smith
>>>> Cc: Junchao Zhang; [email protected]
>>>> Subject: RE: [petsc-users] [GPU] Jacobi preconditioner
>>>>
>>>> Hi all,
>>>>
>>>> Thanks for the MR; there is a build issue because we use --with-64-bit-indices:
>>>>
>>>> /export/home/catA/pl254994/trust/petsc/lib/src/LIBPETSC/build/petsc-barry-2025-09-30-add-matgetdiagonal-cuda/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu(3765): error: argument of type "PetscInt" is incompatible with parameter of type "const PetscInt *"
>>>> GetDiagonal_CSR<<<(int)((n + 255) / 256), 256, 0, PetscDefaultCudaStream>>>(cusparsestruct->rowoffsets_gpu->data().get(), matstruct->cprowIndices->data().get(), cusparsestruct->workVector->data().get(), n, darray);
>>>>
>>>> Thanks,
>>>>
>>>> Pierre LEDAC
>>>> Commissariat à l’énergie atomique et aux énergies alternatives
>>>> Centre de SACLAY
>>>> DES/ISAS/DM2S/SGLS/LCAN
>>>> Bâtiment 451 – point courrier n°41
>>>> F-91191 Gif-sur-Yvette
>>>> +33 1 69 08 04 03
>>>> +33 6 83 42 05 79
>>>>
>>>> From: Barry Smith <[email protected]>
>>>> Sent: Wednesday, October 1, 2025 18:48:37
>>>> To: LEDAC Pierre
>>>> Cc: Junchao Zhang; [email protected]
>>>> Subject: Re: [petsc-users] [GPU] Jacobi preconditioner
>>>>
>>>> I have finally created an MR that moves the Jacobi preconditioner's access of the diagonal to the GPU, which should improve the GPU performance of your code:
>>>> https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/merge_requests/8756__;!!G_uCfscf7eWS!Z3vNRk5sR_97xbqL3Cns8okunsbBvctJMySGgbt7k5XRvpmQ2mg2SoVEfyRr96Lw69iLdV1KRBASzeT7a35k-9U$
>>>>
>>>> Please give it a try and let us know if it causes any difficulties or, hopefully, improves your code's performance significantly.
>>>>
>>>> Sorry for the long delay; NVIDIA is hiring too many PETSc developers away from us.
>>>>
>>>> Barry
>>>>
>>>>> On Jul 31, 2025, at 6:46 AM, LEDAC Pierre <[email protected]> wrote:
>>>>>
>>>>> Thanks, Barry. I agree, but I didn't dare ask for that.
>>>>>
>>>>> Pierre LEDAC
>>>>> Commissariat à l’énergie atomique et aux énergies alternatives
>>>>> Centre de SACLAY
>>>>> DES/ISAS/DM2S/SGLS/LCAN
>>>>> Bâtiment 451 – point courrier n°41
>>>>> F-91191 Gif-sur-Yvette
>>>>> +33 1 69 08 04 03
>>>>> +33 6 83 42 05 79
>>>>>
>>>>> From: Barry Smith <[email protected]>
>>>>> Sent: Wednesday, July 30, 2025 20:34:26
>>>>> To: Junchao Zhang
>>>>> Cc: LEDAC Pierre; [email protected]
>>>>> Subject: Re: [petsc-users] [GPU] Jacobi preconditioner
>>>>>
>>>>> We absolutely should have a MatGetDiagonal_SeqAIJCUSPARSE(). It's somewhat embarrassing that we don't provide this.
>>>>>
>>>>> I have found some potential code at https://urldefense.us/v3/__https://stackoverflow.com/questions/60311408/how-to-get-the-diagonal-of-a-sparse-matrix-in-cusparse__;!!G_uCfscf7eWS!Z3vNRk5sR_97xbqL3Cns8okunsbBvctJMySGgbt7k5XRvpmQ2mg2SoVEfyRr96Lw69iLdV1KRBASzeT7a3YPyyk$
>>>>>
>>>>> Barry
>>>>>
>>>>>> On Jul 28, 2025, at 11:43 AM, Junchao Zhang <[email protected]> wrote:
>>>>>>
>>>>>> Yes, MatGetDiagonal_SeqAIJCUSPARSE hasn't been implemented. The petsc/cuda and petsc/kokkos backends are separate code.
>>>>>> If petsc/kokkos meets your needs, then just use it. For PETSc users, we hope the only difference will be the extra --download-kokkos --download-kokkos-kernels options at configuration time.
>>>>>>
>>>>>> --Junchao Zhang
>>>>>>
>>>>>> On Mon, Jul 28, 2025 at 2:51 AM LEDAC Pierre <[email protected]> wrote:
>>>>>>> Hello all,
>>>>>>>
>>>>>>> We are using PETSc to solve a linear system that is updated every time step (constant stencil, but changing coefficients).
>>>>>>>
>>>>>>> The matrix is preallocated once with MatSetPreallocationCOO(), then filled each time step with MatSetValuesCOO(), and we use device pointers for coo_i, coo_j, and the coefficient values.
>>>>>>>
>>>>>>> It works fine with a GMRES KSP solver and a Jacobi PC, but we are surprised to see that at every time step, during PCSetUp, MatGetDiagonal_SeqAIJ is called even though the matrix is on the device. Looking at the API, it seems there is no MatGetDiagonal_SeqAIJCUSPARSE(), only a MatGetDiagonal_SeqAIJKOKKOS().
>>>>>>>
>>>>>>> Does this mean we should use the Kokkos backend in PETSc to have the Jacobi preconditioner built directly on the device? Or am I doing something wrong?
>>>>>>> NB: GMRES is running well on the device.
>>>>>>>
>>>>>>> I could use -ksp_reuse_preconditioner to avoid the Jacobi preconditioner being rebuilt on the host at each solve, but it significantly increases the number of iterations.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> <pastedImage.png>
>>>>>>>
>>>>>>> Pierre LEDAC
>>>>>>> Commissariat à l’énergie atomique et aux énergies alternatives
>>>>>>> Centre de SACLAY
>>>>>>> DES/ISAS/DM2S/SGLS/LCAN
>>>>>>> Bâtiment 451 – point courrier n°41
>>>>>>> F-91191 Gif-sur-Yvette
>>>>>>> +33 1 69 08 04 03
>>>>>>> +33 6 83 42 05 79
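
For reference, the device-side diagonal extraction discussed in this thread can be sketched roughly as follows. This is only an illustration of the approach, not PETSc's actual GetDiagonal_CSR implementation: the kernel name ExtractDiagonal_CSR and the launch configuration are hypothetical, the compressed-row handling visible in the quoted error messages (cprowIndices, workVector) is omitted, and real PetscScalar values are assumed. The index arrays are typed PetscInt so the sketch would also compile under --with-64-bit-indices, which was the source of the build error above.

/* Hypothetical sketch: extract the diagonal of a CSR matrix already resident on the GPU. */
#include <petscsys.h>

__global__ static void ExtractDiagonal_CSR(const PetscInt *rowptr, const PetscInt *colidx,
                                           const PetscScalar *vals, PetscInt n, PetscScalar *diag)
{
  /* One thread per row; PetscInt indexing stays correct with --with-64-bit-indices. */
  PetscInt row = (PetscInt)blockIdx.x * blockDim.x + threadIdx.x;
  if (row < n) {
    diag[row] = 0.0; /* rows with no stored diagonal entry get zero */
    for (PetscInt k = rowptr[row]; k < rowptr[row + 1]; k++) {
      if (colidx[k] == row) { /* found the diagonal entry of this row */
        diag[row] = vals[k];
        break;
      }
    }
  }
}

/* Possible launch, assuming device pointers d_rowptr, d_colidx, d_vals, d_diag and n rows:
   ExtractDiagonal_CSR<<<(int)((n + 255) / 256), 256>>>(d_rowptr, d_colidx, d_vals, n, d_diag); */

Everything stays on the device, so a Jacobi PCSetUp built on such a kernel avoids the host round trip that the old MatGetDiagonal_SeqAIJ path incurred.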
