Re: [petsc-users] [Xolotl-psi-development] [EXTERNAL] Re: Unexpected performance losses switching to COO interface

2023-11-29 Thread Jed Brown
"Blondel, Sophie"  writes:

> Hi Jed,
>
> I'm not sure I'm going to reply to your question correctly because I don't 
> really understand how the split is done. Is it related to the on-diagonal and 
> off-diagonal parts? If so, the off-diagonal part is usually pretty small (fewer 
> than 20 DOFs) and related to diffusion, while the diagonal part involves 
> thousands of DOFs for the reaction term.

From the run-time options, it'll be a default (additive) split, and we're 
interested in the two diagonal blocks. One currently has a cheap solver that 
would only be efficient with a well-conditioned positive definite matrix, and 
the other is using a direct solver ('redundant'). If you were to run with 
-ksp_view and share the output, it would be informative.
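For readers following the split discussion, here is a sketch of what a default additive field split applies. It assumes the Jacobian J is ordered by the two split index sets; the block names A_ij, M_0, r_i and y_i are illustrative notation, not PETSc identifiers.

```latex
J = \begin{pmatrix} A_{00} & A_{01} \\ A_{10} & A_{11} \end{pmatrix},
\qquad
P_{\mathrm{add}}^{-1} = \begin{pmatrix} M_0^{-1} & 0 \\ 0 & A_{11}^{-1} \end{pmatrix},
\qquad
\begin{pmatrix} y_0 \\ y_1 \end{pmatrix}
  = P_{\mathrm{add}}^{-1} \begin{pmatrix} r_0 \\ r_1 \end{pmatrix}
  = \begin{pmatrix} M_0^{-1} r_0 \\ A_{11}^{-1} r_1 \end{pmatrix}
```

Here M_0^{-1} is the cheap split-0 solver, an approximation of A_00^{-1} (for point Jacobi, M_0 = diag(A_00)), and A_11^{-1} is applied by the direct ('redundant') solve, assuming its inner factorization is an exact LU. The coupling blocks A_01 and A_10 are not used by the preconditioner at all; only the outer Krylov iteration sees them.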

Either way, I'd like to understand what physics are behind the equation 
currently being solved with 'redundant'. If it's diffusive, then algebraic 
multigrid would be a good place to start.
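A minimal, self-contained sketch (plain C, no PETSc) of why a cheap pointwise method struggles on diffusive problems: it counts point-Jacobi iterations on the 1D diffusion operator A = tridiag(-1, 2, -1) as the grid is refined. The helper name `jacobi_iterations`, the right-hand side, and the tolerance are hypothetical choices for illustration; this does not implement AMG, it only shows the growth that multigrid is meant to avoid.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: count point-Jacobi iterations needed to reduce the
   residual 2-norm of A x = b by the factor tol, where A = tridiag(-1, 2, -1)
   of size n (1D diffusion with zero Dirichlet boundaries), b = (1, ..., 1)
   and x0 = 0. Returns -1 if allocation fails, max_it if tol is not reached. */
static int jacobi_iterations(int n, double tol, int max_it)
{
    assert(n > 0 && tol > 0.0 && max_it > 0);
    double *x = calloc((size_t)n, sizeof *x);
    double *x_new = calloc((size_t)n, sizeof *x_new);
    if (x == NULL || x_new == NULL) {
        free(x);
        free(x_new);
        return -1;
    }
    /* ||b - A x0||^2 = n because b is all ones and x0 = 0. */
    const double target2 = tol * tol * (double)n;
    int it;
    for (it = 0; it < max_it; ++it) {
        double rnorm2 = 0.0;
        for (int i = 0; i < n; ++i) {
            double left = (i > 0) ? x[i - 1] : 0.0;
            double right = (i < n - 1) ? x[i + 1] : 0.0;
            double r = 1.0 - (2.0 * x[i] - left - right); /* residual of row i */
            rnorm2 += r * r;
            x_new[i] = x[i] + r / 2.0; /* Jacobi: x + D^{-1} r with D = 2 I */
        }
        if (rnorm2 <= target2) {
            break; /* the current x already meets the tolerance */
        }
        double *tmp = x; /* swap buffers: x_new becomes the current iterate */
        x = x_new;
        x_new = tmp;
    }
    free(x);
    free(x_new);
    return it;
}
```

For this matrix the Jacobi iteration matrix I - D^{-1}A has spectral radius cos(pi/(n+1)), so the count grows roughly like (n+1)^2: going from n = 10 to n = 40 should cost on the order of 14 times as many iterations. For well-behaved elliptic problems, multigrid is designed to keep that count roughly independent of n.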

> Let us know what we can do to answer this question more accurately.
>
> Cheers,
>
> Sophie
> 
> From: Jed Brown 
> Sent: Tuesday, November 28, 2023 19:07
> To: Fackler, Philip ; Junchao Zhang 
> 
> Cc: petsc-users@mcs.anl.gov ; 
> xolotl-psi-developm...@lists.sourceforge.net 
> 
> Subject: Re: [Xolotl-psi-development] [petsc-users] [EXTERNAL] Re: Unexpected 
> performance losses switching to COO interface
>
> "Fackler, Philip via petsc-users"  writes:
>
>> That makes sense. Here are the arguments that I think are relevant:
>>
>> -fieldsplit_1_pc_type redundant -fieldsplit_0_pc_type sor -pc_type 
>> fieldsplit -pc_fieldsplit_detect_coupling
>
> What sort of physics are in splits 0 and 1?
>
> SOR is not a good GPU algorithm, so we'll want to change that one way or 
> another. Are the splits of similar size or very different?
>
>> What would you suggest to make this better?
>>
>> Also, note that the cases marked "serial" are running on CPU only, that is, 
>> using only the SERIAL backend for Kokkos.
>>
>> Philip Fackler
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> Oak Ridge National Laboratory
>> 
>> From: Junchao Zhang 
>> Sent: Tuesday, November 28, 2023 15:51
>> To: Fackler, Philip 
>> Cc: petsc-users@mcs.anl.gov ; 
>> xolotl-psi-developm...@lists.sourceforge.net 
>> 
>> Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
>> switching to COO interface
>>
>> Hi, Philip,
>> I opened hpcdb-PSI_9-serial and it seems you used PCLU. Since Kokkos 
>> does not have a GPU LU implementation, we do it on CPU via 
>> MatLUFactorNumeric_SeqAIJ(). Perhaps you can try other PC types?
>>
>> [Screenshot 2023-11-28 at 2.43.03 PM.png]
>> --Junchao Zhang
>>
>>
>> On Wed, Nov 22, 2023 at 10:43 AM Fackler, Philip 
>> <fackle...@ornl.gov> wrote:
>> I definitely dropped the ball on this. I'm sorry for that. I have new 
>> profiling data using the latest petsc/main (as of yesterday). I've put 
>> them in a single Google Drive folder linked here:
>>
>> https://drive.google.com/drive/folders/14ScvyfxOzc4OzXs9HZVeQDO-g6FdIVAI?usp=drive_link
>>
>> Have a happy holiday weekend!
>>
>> Thanks,
>>
>> Philip Fackler
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> Oak Ridge National Laboratory
>> 
>> From: Junchao Zhang <junchao.zh...@gmail.com>
>> Sent: Monday, October 16, 2023 15:24
>> To: Fackler, Philip <fackle...@ornl.gov>
>> Cc: petsc-users@mcs.anl.gov; 
>> xolotl-psi-developm...@lists.sourceforge.net
>> Subject: Re: [

2023-11-29 Thread Fackler, Philip via petsc-users
I'm sorry for the extra confusion. I copied those arguments from the wrong 
place. We're actually using jacobi instead of sor for fieldsplit 0.

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

From: Blondel, Sophie 
Sent: Wednesday, November 29, 2023 11:03
To: Brown, Jed ; Fackler, Philip ; 
Junchao Zhang 
Cc: petsc-users@mcs.anl.gov ; 
xolotl-psi-developm...@lists.sourceforge.net 

Subject: Re: [Xolotl-psi-development] [petsc-users] [EXTERNAL] Re: Unexpected 
performance losses switching to COO interface


2023-11-29 Thread Blondel, Sophie via petsc-users
Hi Jed,

I'm not sure I'm going to reply to your question correctly because I don't 
really understand how the split is done. Is it related to the on-diagonal and 
off-diagonal parts? If so, the off-diagonal part is usually pretty small (fewer 
than 20 DOFs) and related to diffusion, while the diagonal part involves 
thousands of DOFs for the reaction term.

Let us know what we can do to answer this question more accurately.

Cheers,

Sophie

From: Jed Brown 
Sent: Tuesday, November 28, 2023 19:07
To: Fackler, Philip ; Junchao Zhang 

Cc: petsc-users@mcs.anl.gov ; 
xolotl-psi-developm...@lists.sourceforge.net 

Subject: Re: [Xolotl-psi-development] [petsc-users] [EXTERNAL] Re: Unexpected 
performance losses switching to COO interface

"Fackler, Philip via petsc-users"  writes:

> That makes sense. Here are the arguments that I think are relevant:
>
> -fieldsplit_1_pc_type redundant -fieldsplit_0_pc_type sor -pc_type fieldsplit 
> -pc_fieldsplit_detect_coupling

What sort of physics are in splits 0 and 1?

SOR is not a good GPU algorithm, so we'll want to change that one way or 
another. Are the splits of similar size or very different?
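To make the GPU remark concrete, here is a minimal sketch in plain C (not PETSc's PCSOR or PCJACOBI code) of one sweep of each method on the same hypothetical 1D operator A = tridiag(-1, 2, -1). The function names `jacobi_sweep` and `sor_sweep` are illustrative; the point is the data dependence between rows, not performance.

```c
#include <assert.h>

/* One point-Jacobi sweep for A = tridiag(-1, 2, -1) with right-hand side b.
   x_new depends only on x_old, so the iterations of this loop are independent
   of each other and could run concurrently (e.g. one GPU thread per row). */
static void jacobi_sweep(int n, const double *b, const double *x_old,
                         double *x_new)
{
    assert(n > 0 && x_old != x_new); /* Jacobi needs separate buffers */
    for (int i = 0; i < n; ++i) {
        double left = (i > 0) ? x_old[i - 1] : 0.0;
        double right = (i < n - 1) ? x_old[i + 1] : 0.0;
        x_new[i] = (b[i] + left + right) / 2.0;
    }
}

/* One forward SOR sweep for the same A, updating x in place (omega = 1 is
   Gauss-Seidel). Row i reads x[i - 1], which was already overwritten earlier
   in this sweep: a loop-carried dependence that forces rows to run in order. */
static void sor_sweep(int n, const double *b, double omega, double *x)
{
    assert(n > 0 && omega > 0.0 && omega < 2.0);
    for (int i = 0; i < n; ++i) {
        double left = (i > 0) ? x[i - 1] : 0.0;      /* new value */
        double right = (i < n - 1) ? x[i + 1] : 0.0; /* old value */
        double gauss_seidel = (b[i] + left + right) / 2.0;
        x[i] = (1.0 - omega) * x[i] + omega * gauss_seidel;
    }
}
```

Starting from x = 0 with b = (1, 1, 1, 1), one Jacobi sweep gives (0.5, 0.5, 0.5, 0.5), while one Gauss-Seidel sweep gives (0.5, 0.75, 0.875, 0.9375): each entry already reflects the update to its left neighbor. Parallel Gauss-Seidel/SOR variants typically rely on reordering (for example red-black or multicolor orderings), which changes the method's update order and therefore its convergence behavior.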

> What would you suggest to make this better?
>
> Also, note that the cases marked "serial" are running on CPU only, that is, 
> using only the SERIAL backend for Kokkos.
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> 
> From: Junchao Zhang 
> Sent: Tuesday, November 28, 2023 15:51
> To: Fackler, Philip 
> Cc: petsc-users@mcs.anl.gov ; 
> xolotl-psi-developm...@lists.sourceforge.net 
> 
> Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
> switching to COO interface
>
> Hi, Philip,
> I opened hpcdb-PSI_9-serial and it seems you used PCLU. Since Kokkos does 
> not have a GPU LU implementation, we do it on CPU via 
> MatLUFactorNumeric_SeqAIJ(). Perhaps you can try other PC types?
>
> [Screenshot 2023-11-28 at 2.43.03 PM.png]
> --Junchao Zhang
>
>
> On Wed, Nov 22, 2023 at 10:43 AM Fackler, Philip 
> <fackle...@ornl.gov> wrote:
> I definitely dropped the ball on this. I'm sorry for that. I have new 
> profiling data using the latest petsc/main (as of yesterday). I've put 
> them in a single Google Drive folder linked here:
>
> https://drive.google.com/drive/folders/14ScvyfxOzc4OzXs9HZVeQDO-g6FdIVAI?usp=drive_link
>
> Have a happy holiday weekend!
>
> Thanks,
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> 
> From: Junchao Zhang <junchao.zh...@gmail.com>
> Sent: Monday, October 16, 2023 15:24
> To: Fackler, Philip <fackle...@ornl.gov>
> Cc: petsc-users@mcs.anl.gov; 
> xolotl-psi-developm...@lists.sourceforge.net
> Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
> switching to COO interface
>
> Hi, Philip,
> That branch was merged to petsc/main today. Let me know once you have new 
> profiling results.
>
> Thanks.
> --Junchao Zhang
>
>
> On Mon, Oct 16, 2023 at 9:33 AM Fackler, Philip 
> <fackle...@ornl.gov> wrote:
> Junchao,
>
> I've attached updated timing plots (red and blue are swapped from before; 
> yellow is the new one). There is an improvement for the NE_3 case only with 
> CUDA. Serial stays the same, and the PSI cases stay the same. In the PSI 
> cases, MatShift doesn't show up (I assume because we're using different 
> preconditioner arguments). So, there must be some other primary culprit. I'll 
> try to get updated profiling data to you soon.
>
> Thanks,
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Comput