Re: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface

2023-11-28 Thread Jed Brown
"Fackler, Philip via petsc-users"  writes:

> That makes sense. Here are the arguments that I think are relevant:
>
> -fieldsplit_1_pc_type redundant -fieldsplit_0_pc_type sor -pc_type fieldsplit 
> -pc_fieldsplit_detect_coupling

What sort of physics are in splits 0 and 1?

SOR is not a good GPU algorithm, so we'll want to change that one way or 
another. Are the splits of similar size or very different?
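
One GPU-friendlier possibility, for illustration only, is to swap the SOR split for point Jacobi (the choice the original comparison runs already used for -fieldsplit_0_pc_type). Below is a minimal sketch of presetting those options from code via the options database; the helper name, the assumption that split 0 is the SOR split, and the jacobi choice itself are illustrative, not a recommendation for this specific problem.

    #include <petscsys.h>

    /* Sketch only: preload GPU-friendlier fieldsplit options programmatically.
     * Equivalent to passing the same options on the command line; must run
     * before the preconditioner is set up. */
    static PetscErrorCode SetGpuFriendlySplitOptions(void)
    {
      PetscFunctionBeginUser;
      PetscCall(PetscOptionsSetValue(NULL, "-pc_type", "fieldsplit"));
      PetscCall(PetscOptionsSetValue(NULL, "-pc_fieldsplit_detect_coupling", NULL));
      PetscCall(PetscOptionsSetValue(NULL, "-fieldsplit_0_pc_type", "jacobi"));   /* instead of sor */
      PetscCall(PetscOptionsSetValue(NULL, "-fieldsplit_1_pc_type", "redundant"));
      PetscFunctionReturn(PETSC_SUCCESS);
    }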

> What would you suggest to make this better?
>
> Also, note that the cases marked "serial" are running on CPU only, that is, 
> using only the SERIAL backend for kokkos.
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> 
> From: Junchao Zhang 
> Sent: Tuesday, November 28, 2023 15:51
> To: Fackler, Philip 
> Cc: petsc-users@mcs.anl.gov ; 
> xolotl-psi-developm...@lists.sourceforge.net 
> 
> Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
> switching to COO interface
>
> Hi, Philip,
>I opened hpcdb-PSI_9-serial and it seems you used PCLU.  Since Kokkos does 
> not have a GPU LU implementation, we do it on CPU via 
> MatLUFactorNumeric_SeqAIJ(). Perhaps you can try other PC types?
>
> [Screenshot 2023-11-28 at 2.43.03 PM.png]
> --Junchao Zhang
>
>
> On Wed, Nov 22, 2023 at 10:43 AM Fackler, Philip 
> <fackle...@ornl.gov> wrote:
> I definitely dropped the ball on this. I'm sorry for that. I have new 
> profiling data using the latest (as of yesterday) of petsc/main. I've put 
> them in a single google drive folder linked here:
>
> https://drive.google.com/drive/folders/14ScvyfxOzc4OzXs9HZVeQDO-g6FdIVAI?usp=drive_link
>
> Have a happy holiday weekend!
>
> Thanks,
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> 
> From: Junchao Zhang <junchao.zh...@gmail.com>
> Sent: Monday, October 16, 2023 15:24
> To: Fackler, Philip <fackle...@ornl.gov>
> Cc: petsc-users@mcs.anl.gov; xolotl-psi-developm...@lists.sourceforge.net
> Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
> switching to COO interface
>
> Hi, Philip,
>That branch was merged to petsc/main today. Let me know once you have new 
> profiling results.
>
>Thanks.
> --Junchao Zhang
>
>
> On Mon, Oct 16, 2023 at 9:33 AM Fackler, Philip 
> <fackle...@ornl.gov> wrote:
> Junchao,
>
> I've attached updated timing plots (red and blue are swapped from before; 
> yellow is the new one). There is an improvement for the NE_3 case only with 
> CUDA. Serial stays the same, and the PSI cases stay the same. In the PSI 
> cases, MatShift doesn't show up (I assume because we're using different 
> preconditioner arguments). So, there must be some other primary culprit. I'll 
> try to get updated profiling data to you soon.
>
> Thanks,
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> 
> From: Fackler, Philip via Xolotl-psi-development 
> <xolotl-psi-developm...@lists.sourceforge.net>
> Sent: Wednesday, October 11, 2023 11:31
> To: Junchao Zhang <junchao.zh...@gmail.com>
> Cc: petsc-users@mcs.anl.gov; xolotl-psi-developm...@lists.sourceforge.net
> Subject: Re: [Xolotl-psi-development] [EXTERNAL] Re: [petsc-users] Unexpected 
> performance losses switching to COO interface
>
> I'm on it.
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
>

Re: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface

2023-11-22 Thread Fackler, Philip via petsc-users
I definitely dropped the ball on this. I'm sorry for that. I have new profiling 
data using the latest (as of yesterday) of petsc/main. I've put them in a 
single google drive folder linked here:

https://drive.google.com/drive/folders/14ScvyfxOzc4OzXs9HZVeQDO-g6FdIVAI?usp=drive_link

Have a happy holiday weekend!

Thanks,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

From: Junchao Zhang 
Sent: Monday, October 16, 2023 15:24
To: Fackler, Philip 
Cc: petsc-users@mcs.anl.gov ; 
xolotl-psi-developm...@lists.sourceforge.net 

Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
switching to COO interface

Hi, Philip,
   That branch was merged to petsc/main today. Let me know once you have new 
profiling results.

   Thanks.
--Junchao Zhang


On Mon, Oct 16, 2023 at 9:33 AM Fackler, Philip 
<fackle...@ornl.gov> wrote:
Junchao,

I've attached updated timing plots (red and blue are swapped from before; 
yellow is the new one). There is an improvement for the NE_3 case only with 
CUDA. Serial stays the same, and the PSI cases stay the same. In the PSI cases, 
MatShift doesn't show up (I assume because we're using different preconditioner 
arguments). So, there must be some other primary culprit. I'll try to get 
updated profiling data to you soon.

Thanks,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

From: Fackler, Philip via Xolotl-psi-development 
<xolotl-psi-developm...@lists.sourceforge.net>
Sent: Wednesday, October 11, 2023 11:31
To: Junchao Zhang <junchao.zh...@gmail.com>
Cc: petsc-users@mcs.anl.gov; xolotl-psi-developm...@lists.sourceforge.net
Subject: Re: [Xolotl-psi-development] [EXTERNAL] Re: [petsc-users] Unexpected 
performance losses switching to COO interface

I'm on it.

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

From: Junchao Zhang <junchao.zh...@gmail.com>
Sent: Wednesday, October 11, 2023 10:14
To: Fackler, Philip <fackle...@ornl.gov>
Cc: petsc-users@mcs.anl.gov; xolotl-psi-developm...@lists.sourceforge.net; Blondel, Sophie <sblon...@utk.edu>
Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
switching to COO interface

Hi,  Philip,
  Could you try this branch 
jczhang/2023-10-05/feature-support-matshift-aijkokkos ?

  Thanks.
--Junchao Zhang


On Thu, Oct 5, 2023 at 4:52 PM Fackler, Philip 
<fackle...@ornl.gov> wrote:
Aha! That makes sense. Thank you.

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

From: Junchao Zhang <junchao.zh...@gmail.com>
Sent: Thursday, October 5, 2023 17:29
To: Fackler, Philip <fackle...@ornl.gov>
Cc: petsc-users@mcs.anl.gov; xolotl-psi-developm...@lists.sourceforge.net; Blondel, Sophie <sblon...@utk.edu>
Subject: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching 
to COO interface

Wait a moment, it seems it was because we do not have a GPU implementation of 
MatShift...
Let me see how to add it.
--Junchao Zhang


On Thu, Oct 5, 2023 at 10:58 AM Junchao Zhang 
<junchao.zh...@gmail.com> wrote:
Hi, Philip,
  I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() 
instead of the COO interface?  MatSetValues() needs to copy the data from 
device to host and thus is expensive.
  Do you have profiling results with COO enabled?

[Screenshot 2023-10-05 at 10.55.29 AM.png]


--Junchao Zhang


On Mon, Oct 2, 2023 at 9:52 AM Junchao Zhang 
<junchao.zh...@gmail.com> wrote:
Hi, Philip,
  I will look into the tarballs and get back to you.
   Thanks.
--Junchao Zhang


On Mon, Oct 2, 2023 at 9:41 AM Fackler, Philip via petsc-users 
<petsc-users@mcs.anl.gov> wrote:

Re: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface

2023-10-16 Thread Junchao Zhang
Hi, Philip,
   That branch was merged to petsc/main today. Let me know once you have
new profiling results.

   Thanks.
--Junchao Zhang


On Mon, Oct 16, 2023 at 9:33 AM Fackler, Philip  wrote:

> Junchao,
>
> I've attached updated timing plots (red and blue are swapped from before;
> yellow is the new one). There is an improvement for the NE_3 case only with
> CUDA. Serial stays the same, and the PSI cases stay the same. In the PSI
> cases, MatShift doesn't show up (I assume because we're using different
> preconditioner arguments). So, there must be some other primary culprit.
> I'll try to get updated profiling data to you soon.
>
> Thanks,
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> --
> *From:* Fackler, Philip via Xolotl-psi-development <
> xolotl-psi-developm...@lists.sourceforge.net>
> *Sent:* Wednesday, October 11, 2023 11:31
> *To:* Junchao Zhang 
> *Cc:* petsc-users@mcs.anl.gov ;
> xolotl-psi-developm...@lists.sourceforge.net <
> xolotl-psi-developm...@lists.sourceforge.net>
> *Subject:* Re: [Xolotl-psi-development] [EXTERNAL] Re: [petsc-users]
> Unexpected performance losses switching to COO interface
>
> I'm on it.
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> --
> *From:* Junchao Zhang 
> *Sent:* Wednesday, October 11, 2023 10:14
> *To:* Fackler, Philip 
> *Cc:* petsc-users@mcs.anl.gov ;
> xolotl-psi-developm...@lists.sourceforge.net <
> xolotl-psi-developm...@lists.sourceforge.net>; Blondel, Sophie <
> sblon...@utk.edu>
> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses
> switching to COO interface
>
> Hi,  Philip,
>   Could you try this branch
> jczhang/2023-10-05/feature-support-matshift-aijkokkos ?
>
>   Thanks.
> --Junchao Zhang
>
>
> On Thu, Oct 5, 2023 at 4:52 PM Fackler, Philip  wrote:
>
> Aha! That makes sense. Thank you.
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> --
> *From:* Junchao Zhang 
> *Sent:* Thursday, October 5, 2023 17:29
> *To:* Fackler, Philip 
> *Cc:* petsc-users@mcs.anl.gov ;
> xolotl-psi-developm...@lists.sourceforge.net <
> xolotl-psi-developm...@lists.sourceforge.net>; Blondel, Sophie <
> sblon...@utk.edu>
> *Subject:* [EXTERNAL] Re: [petsc-users] Unexpected performance losses
> switching to COO interface
>
> Wait a moment, it seems it was because we do not have a GPU implementation
> of MatShift...
> Let me see how to add it.
> --Junchao Zhang
>
>
> On Thu, Oct 5, 2023 at 10:58 AM Junchao Zhang 
> wrote:
>
> Hi, Philip,
>   I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues()
> instead of the COO interface?  MatSetValues() needs to copy the data from
> device to host and thus is expensive.
>   Do you have profiling results with COO enabled?
>
> [image: Screenshot 2023-10-05 at 10.55.29 AM.png]
>
>
> --Junchao Zhang
>
>
> On Mon, Oct 2, 2023 at 9:52 AM Junchao Zhang 
> wrote:
>
> Hi, Philip,
>   I will look into the tarballs and get back to you.
>Thanks.
> --Junchao Zhang
>
>
> On Mon, Oct 2, 2023 at 9:41 AM Fackler, Philip via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
> We finally have xolotl ported to use the new COO interface and the
> aijkokkos implementation for Mat (and kokkos for Vec). Comparing this port
> to our previous version (using MatSetValuesStencil and the default Mat and
> Vec implementations), we expected to see an improvement in performance for
> both the "serial" and "cuda" builds (here I'm referring to the kokkos
> configuration).
>
> Attached are two plots that show timings for three different cases. All of
> these were run on Ascent (the Summit-like training system) with 6 MPI tasks
> (on a single node). The CUDA cases were given one GPU per task (and used
> CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases
> we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent
> as possible.
>
> The performance of RHSJacobian (where the bulk of computation happens in
> xolotl) behaved basically as expected (better than expected in the serial
> build).

Re: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface

2023-10-16 Thread Fackler, Philip via petsc-users
Junchao,

I've attached updated timing plots (red and blue are swapped from before; 
yellow is the new one). There is an improvement for the NE_3 case only with 
CUDA. Serial stays the same, and the PSI cases stay the same. In the PSI cases, 
MatShift doesn't show up (I assume because we're using different preconditioner 
arguments). So, there must be some other primary culprit. I'll try to get 
updated profiling data to you soon.

Thanks,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

From: Fackler, Philip via Xolotl-psi-development 

Sent: Wednesday, October 11, 2023 11:31
To: Junchao Zhang 
Cc: petsc-users@mcs.anl.gov ; 
xolotl-psi-developm...@lists.sourceforge.net 

Subject: Re: [Xolotl-psi-development] [EXTERNAL] Re: [petsc-users] Unexpected 
performance losses switching to COO interface

I'm on it.

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

From: Junchao Zhang 
Sent: Wednesday, October 11, 2023 10:14
To: Fackler, Philip 
Cc: petsc-users@mcs.anl.gov ; 
xolotl-psi-developm...@lists.sourceforge.net 
; Blondel, Sophie 

Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
switching to COO interface

Hi,  Philip,
  Could you try this branch 
jczhang/2023-10-05/feature-support-matshift-aijkokkos ?

  Thanks.
--Junchao Zhang


On Thu, Oct 5, 2023 at 4:52 PM Fackler, Philip 
<fackle...@ornl.gov> wrote:
Aha! That makes sense. Thank you.

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

From: Junchao Zhang <junchao.zh...@gmail.com>
Sent: Thursday, October 5, 2023 17:29
To: Fackler, Philip <fackle...@ornl.gov>
Cc: petsc-users@mcs.anl.gov; xolotl-psi-developm...@lists.sourceforge.net; Blondel, Sophie <sblon...@utk.edu>
Subject: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching 
to COO interface

Wait a moment, it seems it was because we do not have a GPU implementation of 
MatShift...
Let me see how to add it.
--Junchao Zhang


On Thu, Oct 5, 2023 at 10:58 AM Junchao Zhang 
<junchao.zh...@gmail.com> wrote:
Hi, Philip,
  I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() 
instead of the COO interface?  MatSetValues() needs to copy the data from 
device to host and thus is expensive.
  Do you have profiling results with COO enabled?

[Screenshot 2023-10-05 at 10.55.29 AM.png]


--Junchao Zhang


On Mon, Oct 2, 2023 at 9:52 AM Junchao Zhang 
<junchao.zh...@gmail.com> wrote:
Hi, Philip,
  I will look into the tarballs and get back to you.
   Thanks.
--Junchao Zhang


On Mon, Oct 2, 2023 at 9:41 AM Fackler, Philip via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
We finally have xolotl ported to use the new COO interface and the aijkokkos 
implementation for Mat (and kokkos for Vec). Comparing this port to our 
previous version (using MatSetValuesStencil and the default Mat and Vec 
implementations), we expected to see an improvement in performance for both the 
"serial" and "cuda" builds (here I'm referring to the kokkos configuration).

Attached are two plots that show timings for three different cases. All of 
these were run on Ascent (the Summit-like training system) with 6 MPI tasks (on 
a single node). The CUDA cases were given one GPU per task (and used CUDA-aware 
MPI). The labels on the blue bars indicate speedup. In all cases we used 
"-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent as possible.

The performance of RHSJacobian (where the bulk of computation happens in 
xolotl) behaved basically as expected (better than expected in the serial 
build). NE_3 case in CUDA was the only one that performed worse, but not 
surprisingly, since its workload for the GPUs is much smaller. We've still got 
more optimization to do on this.

The real surprise was how much worse the overall solve times were. This seems 
to be due simply to switching to the kokkos-based implementation. I'm wondering 
if there are any changes we can make in configuration or runtime arguments to 
help with PETSc's performance here. Any help looking into this would be 
appreciated.

The tarballs linked 
here <https://drive.google.com/file/d/19X_L3SVkGBM9YUzXnRR_kVWFG0JFwqZ3/view?usp=drive_link>

Re: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface

2023-10-11 Thread Fackler, Philip via petsc-users
I'm on it.

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

From: Junchao Zhang 
Sent: Wednesday, October 11, 2023 10:14
To: Fackler, Philip 
Cc: petsc-users@mcs.anl.gov ; 
xolotl-psi-developm...@lists.sourceforge.net 
; Blondel, Sophie 

Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
switching to COO interface

Hi,  Philip,
  Could you try this branch 
jczhang/2023-10-05/feature-support-matshift-aijkokkos ?

  Thanks.
--Junchao Zhang


On Thu, Oct 5, 2023 at 4:52 PM Fackler, Philip 
<fackle...@ornl.gov> wrote:
Aha! That makes sense. Thank you.

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

From: Junchao Zhang <junchao.zh...@gmail.com>
Sent: Thursday, October 5, 2023 17:29
To: Fackler, Philip <fackle...@ornl.gov>
Cc: petsc-users@mcs.anl.gov; xolotl-psi-developm...@lists.sourceforge.net; Blondel, Sophie <sblon...@utk.edu>
Subject: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching 
to COO interface

Wait a moment, it seems it was because we do not have a GPU implementation of 
MatShift...
Let me see how to add it.
--Junchao Zhang


On Thu, Oct 5, 2023 at 10:58 AM Junchao Zhang 
<junchao.zh...@gmail.com> wrote:
Hi, Philip,
  I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() 
instead of the COO interface?  MatSetValues() needs to copy the data from 
device to host and thus is expensive.
  Do you have profiling results with COO enabled?

[Screenshot 2023-10-05 at 10.55.29 AM.png]


--Junchao Zhang


On Mon, Oct 2, 2023 at 9:52 AM Junchao Zhang 
<junchao.zh...@gmail.com> wrote:
Hi, Philip,
  I will look into the tarballs and get back to you.
   Thanks.
--Junchao Zhang


On Mon, Oct 2, 2023 at 9:41 AM Fackler, Philip via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
We finally have xolotl ported to use the new COO interface and the aijkokkos 
implementation for Mat (and kokkos for Vec). Comparing this port to our 
previous version (using MatSetValuesStencil and the default Mat and Vec 
implementations), we expected to see an improvement in performance for both the 
"serial" and "cuda" builds (here I'm referring to the kokkos configuration).

Attached are two plots that show timings for three different cases. All of 
these were run on Ascent (the Summit-like training system) with 6 MPI tasks (on 
a single node). The CUDA cases were given one GPU per task (and used CUDA-aware 
MPI). The labels on the blue bars indicate speedup. In all cases we used 
"-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent as possible.

The performance of RHSJacobian (where the bulk of computation happens in 
xolotl) behaved basically as expected (better than expected in the serial 
build). NE_3 case in CUDA was the only one that performed worse, but not 
surprisingly, since its workload for the GPUs is much smaller. We've still got 
more optimization to do on this.

The real surprise was how much worse the overall solve times were. This seems 
to be due simply to switching to the kokkos-based implementation. I'm wondering 
if there are any changes we can make in configuration or runtime arguments to 
help with PETSc's performance here. Any help looking into this would be 
appreciated.

The tarballs linked here 
<https://drive.google.com/file/d/19X_L3SVkGBM9YUzXnRR_kVWFG0JFwqZ3/view?usp=drive_link> 
and here 
<https://drive.google.com/file/d/15yDBN7-YlO1g6RJNPYNImzr611i1Ffhv/view?usp=drive_link> 
are profiling databases which, once extracted, can be viewed with hpcviewer. I 
don't know how helpful that will be, but hopefully it can give you some 
direction.

Thanks for your help,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory


Re: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface

2023-10-06 Thread Zhang, Hong via petsc-users
I noticed that you are using ARKIMEX in the code. A temporary workaround you 
can try is to disable adaptive time stepping, e.g. by using the option 
-ts_adapt_type none. Then MatShift() will not be called when the Jacobians are 
computed.
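
For reference, a minimal sketch of applying the same workaround from code rather than the command line; the helper name is made up, and 'ts' is assumed to be the application's existing TS object. This only turns off adaptivity; it does not add the missing GPU MatShift.

    #include <petscts.h>

    /* Sketch only: equivalent to -ts_adapt_type none. With adaptivity disabled,
     * MatShift() is not called when the Jacobians are computed. */
    static PetscErrorCode DisableTimeStepAdaptivity(TS ts)
    {
      TSAdapt adapt;

      PetscFunctionBeginUser;
      PetscCall(TSGetAdapt(ts, &adapt));
      PetscCall(TSAdaptSetType(adapt, TSADAPTNONE));
      PetscFunctionReturn(PETSC_SUCCESS);
    }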

Hong (Mr.)

On Oct 5, 2023, at 4:52 PM, Fackler, Philip via petsc-users 
 wrote:

Aha! That makes sense. Thank you.

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

From: Junchao Zhang 
Sent: Thursday, October 5, 2023 17:29
To: Fackler, Philip 
Cc: petsc-users@mcs.anl.gov ; 
xolotl-psi-developm...@lists.sourceforge.net 
; Blondel, Sophie 

Subject: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching 
to COO interface

Wait a moment, it seems it was because we do not have a GPU implementation of 
MatShift...
Let me see how to add it.
--Junchao Zhang


On Thu, Oct 5, 2023 at 10:58 AM Junchao Zhang 
<junchao.zh...@gmail.com> wrote:
Hi, Philip,
  I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() 
instead of the COO interface?  MatSetValues() needs to copy the data from 
device to host and thus is expensive.
  Do you have profiling results with COO enabled?




--Junchao Zhang


On Mon, Oct 2, 2023 at 9:52 AM Junchao Zhang 
<junchao.zh...@gmail.com> wrote:
Hi, Philip,
  I will look into the tarballs and get back to you.
   Thanks.
--Junchao Zhang


On Mon, Oct 2, 2023 at 9:41 AM Fackler, Philip via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
We finally have xolotl ported to use the new COO interface and the aijkokkos 
implementation for Mat (and kokkos for Vec). Comparing this port to our 
previous version (using MatSetValuesStencil and the default Mat and Vec 
implementations), we expected to see an improvement in performance for both the 
"serial" and "cuda" builds (here I'm referring to the kokkos configuration).

Attached are two plots that show timings for three different cases. All of 
these were run on Ascent (the Summit-like training system) with 6 MPI tasks (on 
a single node). The CUDA cases were given one GPU per task (and used CUDA-aware 
MPI). The labels on the blue bars indicate speedup. In all cases we used 
"-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent as possible.

The performance of RHSJacobian (where the bulk of computation happens in 
xolotl) behaved basically as expected (better than expected in the serial 
build). NE_3 case in CUDA was the only one that performed worse, but not 
surprisingly, since its workload for the GPUs is much smaller. We've still got 
more optimization to do on this.

The real surprise was how much worse the overall solve times were. This seems 
to be due simply to switching to the kokkos-based implementation. I'm wondering 
if there are any changes we can make in configuration or runtime arguments to 
help with PETSc's performance here. Any help looking into this would be 
appreciated.

The tarballs linked here 
<https://drive.google.com/file/d/19X_L3SVkGBM9YUzXnRR_kVWFG0JFwqZ3/view?usp=drive_link> 
and here 
<https://drive.google.com/file/d/15yDBN7-YlO1g6RJNPYNImzr611i1Ffhv/view?usp=drive_link> 
are profiling databases which, once extracted, can be viewed with hpcviewer. I 
don't know how helpful that will be, but hopefully it can give you some 
direction.

Thanks for your help,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory