> On Oct 22, 2025, at 2:55 AM, LEDAC Pierre <[email protected]> wrote:
> 
> Barry,
> 
> We are currently doing more and more GPU computation and relying heavily on
> PETSc solvers (through the boomeramg, amgx, and gamg preconditioners), so yes,
> we will report any issues or bottlenecks to you.
> 
> This leads to my next question: is there any hope, one day, of a
> MatGetDiagonal_SeqAIJHIPSPARSE implementation?

   I'll put that on my list. I should be able to get something working in a few 
days. Thanks for the reminder.
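
   For reference, the kernel itself is tiny. An illustrative sketch (not
necessarily the exact code that will land in PETSc, and ignoring the
compressed-row/cprowIndices case) is below; the HIP version would be
essentially the same kernel:

      // Sketch only: extract the diagonal of a square CSR matrix on the device.
      // Assumes <petscsystypes.h> for PetscInt/PetscScalar; all names are illustrative.
      __global__ void GetDiagonal_CSR_Sketch(const PetscInt *rowoffsets, const PetscInt *colidx,
                                             const PetscScalar *values, PetscInt n, PetscScalar *diag)
      {
        PetscInt row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < n) {
          diag[row] = 0.0; // rows with no stored diagonal entry get a zero
          for (PetscInt k = rowoffsets[row]; k < rowoffsets[row + 1]; k++) {
            if (colidx[k] == row) {
              diag[row] = values[k];
              break;
            }
          }
        }
      }
      // launched with one thread per row, e.g.
      //   GetDiagonal_CSR_Sketch<<<(n + 255) / 256, 256, 0, stream>>>(rowoffsets, colidx, values, n, diag);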

   Barry

> 
> We know that the Kokkos backend exists as a workaround, though.
> 
> Thanks again,
> 
> Pierre LEDAC
> Commissariat à l’énergie atomique et aux énergies alternatives
> Centre de SACLAY
> DES/ISAS/DM2S/SGLS/LCAN
> Bâtiment 451 – point courrier n°41
> F-91191 Gif-sur-Yvette
> +33 1 69 08 04 03
> +33 6 83 42 05 79
> From: Barry Smith <[email protected]>
> Sent: Tuesday, October 21, 2025 16:35:24
> To: LEDAC Pierre
> Cc: Junchao Zhang; [email protected]; BOURGEOIS Rémi
> Subject: Re: [petsc-users] [GPU] Jacobi preconditioner
>  
> 
>    That is clearly a dramatic improvement! Of course, the previous code was 
> absurd: it copied all the nonzero entries to the host, found the diagonal 
> entries there, and then copied them back to the GPU. 
> 
>     If, through Nsight, you find other similar performance bottlenecks, 
> please let us know, and I can try to resolve them.
> 
>    Barry
> 
> 
>> On Oct 21, 2025, at 5:55 AM, LEDAC Pierre <[email protected]> wrote:
>> 
>> Hello,
>> 
>> Thanks for the work!
>> It is OK now; I checked with Nsight Systems, and the diagonal is indeed
>> computed on the device.
>> 
>> How much time does it save? I guess it depends on the number of GMRES
>> iterations: the fewer the iterations, the more significant the gain.
>> In my case, with a matrix of 5,158,400 rows and 45 GMRES iterations, the time
>> to solve decreased from 1.160 s to 0.671 s on an RTX A6000.
>> 
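>> For reference, the check was simply done by profiling the run with Nsight
>> Systems, e.g. (the executable name and solver options are only an example):
>> 
>>   nsys profile --trace=cuda,nvtx -o jacobi_check \
>>     ./my_app -ksp_type gmres -pc_type jacobi -log_view
>> 
>> and then looking at whether PCSetUp still triggers host<->device copies in
>> the timeline.
>> 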
>> So thanks again,
>> 
>> Pierre LEDAC
>> Commissariat à l’énergie atomique et aux énergies alternatives
>> Centre de SACLAY
>> DES/ISAS/DM2S/SGLS/LCAN
>> Bâtiment 451 – point courrier n°41
>> F-91191 Gif-sur-Yvette
>> +33 1 69 08 04 03
>> +33 6 83 42 05 79
>> From: Barry Smith <[email protected]>
>> Sent: Friday, October 17, 2025 23:27:19
>> To: LEDAC Pierre
>> Cc: Junchao Zhang; [email protected]
>> Subject: Re: [petsc-users] [GPU] Jacobi preconditioner
>>  
>> 
>>   I have updated the MR with what I think is now correct code for computing
>> the diagonal on the GPU. Could you please try it again and let me know whether
>> it works and how much time it saves (I think it should be significant)?
>> 
>>    Thanks for your patience,
>> 
>>   Barry
>> 
>> 
>>> On Oct 2, 2025, at 1:16 AM, LEDAC Pierre <[email protected]> wrote:
>>> 
>>> Yes, that is probably why I also saw a crash in my test case after a quick
>>> fix of the integer conversion. 
>>> 
>>> Pierre LEDAC
>>> Commissariat à l’énergie atomique et aux énergies alternatives
>>> Centre de SACLAY
>>> DES/ISAS/DM2S/SGLS/LCAN
>>> Bâtiment 451 – point courrier n°41
>>> F-91191 Gif-sur-Yvette
>>> +33 1 69 08 04 03
>>> +33 6 83 42 05 79
>>>  
>>> From: Barry Smith <[email protected]>
>>> Sent: Thursday, October 2, 2025 02:16:40
>>> To: LEDAC Pierre
>>> Cc: Junchao Zhang; [email protected]
>>> Subject: Re: [petsc-users] [GPU] Jacobi preconditioner
>>>  
>>> 
>>>   Sorry about that. The current code is buggy anyway; I will let you know
>>> when I have tested it extensively so you can try again.
>>> 
>>>   Barry
>>> 
>>> 
>>>> On Oct 1, 2025, at 3:47 PM, LEDAC Pierre <[email protected]> wrote:
>>>> 
>>>> Sorry, the correct error is:
>>>> 
>>>> /export/home/catA/pl254994/trust/petsc/lib/src/LIBPETSC/build/petsc-barry-2025-09-30-add-matgetdiagonal-cuda/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu(3765):
>>>>  error: argument of type "int*" is incompatible with parameter of type 
>>>> "const PetscInt *"
>>>>         GetDiagonal_CSR<<<(int)((n + 255) / 256), 256, 0, 
>>>> PetscDefaultCudaStream>>>(cusparsestruct->rowoffsets_gpu->data().get(), 
>>>> matstruct->cprowIndices->data().get(), 
>>>> cusparsestruct->workVector->data().get(), n, darray);
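>>>> 
>>>> For context, this is just the 32- vs 64-bit index mismatch: with
>>>> --with-64-bit-indices, PetscInt is 64-bit, so a device array of plain int
>>>> cannot be passed where a const PetscInt * parameter is expected. A minimal
>>>> illustration of the mismatch (not the PETSc source itself):
>>>> 
>>>>   #include <petscsystypes.h>
>>>>   __global__ void TakesPetscInt(const PetscInt *rowoffsets) { (void)rowoffsets; }
>>>> 
>>>>   void Launch(const int *rowoffsets32, int n, cudaStream_t stream)
>>>>   {
>>>>     // The next line is rejected by nvcc when PetscInt is 64-bit, which is
>>>>     // exactly the error above:
>>>>     // TakesPetscInt<<<(n + 255) / 256, 256, 0, stream>>>(rowoffsets32);
>>>>     (void)rowoffsets32; (void)n; (void)stream;
>>>>   }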
>>>> 
>>>> 
>>>> Pierre LEDAC
>>>> Commissariat à l’énergie atomique et aux énergies alternatives
>>>> Centre de SACLAY
>>>> DES/ISAS/DM2S/SGLS/LCAN
>>>> Bâtiment 451 – point courrier n°41
>>>> F-91191 Gif-sur-Yvette
>>>> +33 1 69 08 04 03
>>>> +33 6 83 42 05 79
>>>> From: LEDAC Pierre
>>>> Sent: Wednesday, October 1, 2025 21:46:00
>>>> To: Barry Smith
>>>> Cc: Junchao Zhang; [email protected]
>>>> Subject: RE: [petsc-users] [GPU] Jacobi preconditioner
>>>>  
>>>> Hi all,
>>>> 
>>>> Thanks for the MR; there is a build issue because we use
>>>> --with-64-bit-indices:
>>>> 
>>>> /export/home/catA/pl254994/trust/petsc/lib/src/LIBPETSC/build/petsc-barry-2025-09-30-add-matgetdiagonal-cuda/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu(3765):
>>>>  error: argument of type "PetscInt" is incompatible with parameter of type 
>>>> "const PetscInt *"
>>>>         GetDiagonal_CSR<<<(int)((n + 255) / 256), 256, 0, 
>>>> PetscDefaultCudaStream>>>(cusparsestruct->rowoffsets_gpu->data().get(), 
>>>> matstruct->cprowIndices->data().get(), 
>>>> cusparsestruct->workVector->data().get(), n, darray);
>>>> 
>>>> Thanks,
>>>> 
>>>> Pierre LEDAC
>>>> Commissariat à l’énergie atomique et aux énergies alternatives
>>>> Centre de SACLAY
>>>> DES/ISAS/DM2S/SGLS/LCAN
>>>> Bâtiment 451 – point courrier n°41
>>>> F-91191 Gif-sur-Yvette
>>>> +33 1 69 08 04 03
>>>> +33 6 83 42 05 79
>>>> From: Barry Smith <[email protected]>
>>>> Sent: Wednesday, October 1, 2025 18:48:37
>>>> To: LEDAC Pierre
>>>> Cc: Junchao Zhang; [email protected]
>>>> Subject: Re: [petsc-users] [GPU] Jacobi preconditioner
>>>>  
>>>> 
>>>>     I have finally created an MR that moves the diagonal access needed by
>>>> Jacobi onto the GPU, which should improve the GPU performance of your
>>>> code:
>>>> https://gitlab.com/petsc/petsc/-/merge_requests/8756
>>>> 
>>>> 
>>>>     Please give it a try and let us know if it causes any difficulties or, 
>>>> hopefully, improves your code's performance significantly.
>>>> 
>>>>    Sorry for the long delay; NVIDIA is hiring too many PETSc developers
>>>> away from us.
>>>> 
>>>>    Barry
>>>> 
>>>>> On Jul 31, 2025, at 6:46 AM, LEDAC Pierre <[email protected]> wrote:
>>>>> 
>>>>> Thanks Barry, I agree but didn't dare asking for that.
>>>>> 
>>>>> Pierre LEDAC
>>>>> Commissariat à l’énergie atomique et aux énergies alternatives
>>>>> Centre de SACLAY
>>>>> DES/ISAS/DM2S/SGLS/LCAN
>>>>> Bâtiment 451 – point courrier n°41
>>>>> F-91191 Gif-sur-Yvette
>>>>> +33 1 69 08 04 03
>>>>> +33 6 83 42 05 79
>>>>>     
>>>>> From: Barry Smith <[email protected]>
>>>>> Sent: Wednesday, July 30, 2025 20:34:26
>>>>> To: Junchao Zhang
>>>>> Cc: LEDAC Pierre; [email protected]
>>>>> Subject: Re: [petsc-users] [GPU] Jacobi preconditioner
>>>>>  
>>>>> 
>>>>>    We absolutely should have a MatGetDiagonal_SeqAIJCUSPARSE(). It's 
>>>>> somewhat embarrassing that we don't provide this.
>>>>> 
>>>>>    I have found some potential code at 
>>>>> https://stackoverflow.com/questions/60311408/how-to-get-the-diagonal-of-a-sparse-matrix-in-cusparse
>>>>> 
>>>>> 
>>>>>    Barry
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jul 28, 2025, at 11:43 AM, Junchao Zhang <[email protected]> wrote:
>>>>>> 
>>>>>> Yes, MatGetDiagonal_SeqAIJCUSPARSE hasn't been implemented.  The petsc/cuda
>>>>>> and petsc/kokkos backends are separate code.
>>>>>> If petsc/kokkos meets your needs, then just use it.  For PETSc users, we
>>>>>> hope the only difference will be the extra --download-kokkos
>>>>>> --download-kokkos-kernels options at configure time.
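>>>>>> For example, something like this (illustrative; adapt to your existing
>>>>>> configure line):
>>>>>> 
>>>>>>   ./configure --with-cuda --download-kokkos --download-kokkos-kernels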
>>>>>> 
>>>>>> --Junchao Zhang
>>>>>> 
>>>>>> 
>>>>>> On Mon, Jul 28, 2025 at 2:51 AM LEDAC Pierre <[email protected]> wrote:
>>>>>>> Hello all,
>>>>>>> 
>>>>>>> We are using PETSc to solve a linear system that is updated every time
>>>>>>> step (constant stencil, but changing coefficients).
>>>>>>> 
>>>>>>> The matrix is preallocated once with MatSetPreallocationCOO(), then
>>>>>>> filled each time step with MatSetValuesCOO(), and we pass device pointers
>>>>>>> for coo_i, coo_j, and the coefficient values.
>>>>>>> 
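>>>>>>> For reference, our assembly is roughly the following sketch (function and
>>>>>>> array names are placeholders; A is assumed to be already created and
>>>>>>> sized, and the d_* arrays live on the device):
>>>>>>> 
>>>>>>>   #include <petscmat.h>
>>>>>>>   static PetscErrorCode FillMatrix(Mat A, PetscCount ncoo, PetscInt *d_coo_i,
>>>>>>>                                    PetscInt *d_coo_j, const PetscScalar *d_coo_v,
>>>>>>>                                    PetscBool first_time)
>>>>>>>   {
>>>>>>>     PetscFunctionBeginUser;
>>>>>>>     if (first_time) {
>>>>>>>       PetscCall(MatSetType(A, MATAIJCUSPARSE));                     /* or -mat_type aijcusparse */
>>>>>>>       PetscCall(MatSetPreallocationCOO(A, ncoo, d_coo_i, d_coo_j)); /* sparsity pattern set once */
>>>>>>>     }
>>>>>>>     /* new coefficients, already on the device, every time step */
>>>>>>>     PetscCall(MatSetValuesCOO(A, d_coo_v, INSERT_VALUES));
>>>>>>>     PetscFunctionReturn(PETSC_SUCCESS);
>>>>>>>   }
>>>>>>> 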
>>>>>>> It works fine with a GMRES KSP solver and a Jacobi PC, but we are
>>>>>>> surprised to see that at every time step, during PCSetUp,
>>>>>>> MatGetDiagonal_SeqAIJ is called even though the matrix is on the device.
>>>>>>> Looking at the API, it seems there is no
>>>>>>> MatGetDiagonal_SeqAIJCUSPARSE(), only a MatGetDiagonal_SeqAIJKOKKOS().
>>>>>>> 
>>>>>>> Does this mean we should use the Kokkos backend in PETSc to have the
>>>>>>> Jacobi preconditioner built directly on the device? Or am I doing
>>>>>>> something wrong?
>>>>>>> NB: GMRES itself runs well on the device.
>>>>>>> 
>>>>>>> I could use -ksp_reuse_preconditioner to avoid the Jacobi preconditioner
>>>>>>> being rebuilt on the host at each solve, but it significantly increases
>>>>>>> the number of iterations.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> <pastedImage.png>
>>>>>>> 
>>>>>>> 
>>>>>>> Pierre LEDAC
>>>>>>> Commissariat à l’énergie atomique et aux énergies alternatives
>>>>>>> Centre de SACLAY
>>>>>>> DES/ISAS/DM2S/SGLS/LCAN
>>>>>>> Bâtiment 451 – point courrier n°41
>>>>>>> F-91191 Gif-sur-Yvette
>>>>>>> +33 1 69 08 04 03
>>>>>>> +33 6 83 42 05 79
