I'm on it.

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
________________________________
From: Junchao Zhang <junchao.zh...@gmail.com>
Sent: Wednesday, October 11, 2023 10:14
To: Fackler, Philip <fackle...@ornl.gov>
Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; 
xolotl-psi-developm...@lists.sourceforge.net 
<xolotl-psi-developm...@lists.sourceforge.net>; Blondel, Sophie 
<sblon...@utk.edu>
Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
switching to COO interface

Hi, Philip,
  Could you try the branch 
jczhang/2023-10-05/feature-support-matshift-aijkokkos?

  Thanks.
--Junchao Zhang


On Thu, Oct 5, 2023 at 4:52 PM Fackler, Philip 
<fackle...@ornl.gov> wrote:
Aha! That makes sense. Thank you.

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
________________________________
From: Junchao Zhang <junchao.zh...@gmail.com>
Sent: Thursday, October 5, 2023 17:29
To: Fackler, Philip <fackle...@ornl.gov>
Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; 
xolotl-psi-developm...@lists.sourceforge.net 
<xolotl-psi-developm...@lists.sourceforge.net>; Blondel, Sophie 
<sblon...@utk.edu>
Subject: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching 
to COO interface

Wait a moment, it seems it was because we do not have a GPU implementation of 
MatShift...
Let me see how to add it.
--Junchao Zhang
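
For context: PETSc's MatShift(Y, a) computes Y = Y + a*I, i.e. it adds a 
scalar to every diagonal entry. Without a device implementation, the matrix 
data presumably has to be brought back to the host for this step. The sketch 
below is a plain-C model of the operation on a CSR matrix, not PETSc code; the 
function name csr_shift_diagonal and the CSR array names are hypothetical, and 
it assumes every diagonal entry is already present in the sparsity pattern.

```c
/* Hypothetical model of a diagonal shift, Y = Y + a*I, on a CSR matrix.
 * Assumes each row already stores its diagonal entry; rows without one
 * are left unchanged. This is an illustration, not PETSc's implementation. */
void csr_shift_diagonal(int nrows, const int *rowptr, const int *colidx,
                        double *vals, double a)
{
    for (int i = 0; i < nrows; ++i) {
        /* Scan the stored entries of row i and update the diagonal one. */
        for (int k = rowptr[i]; k < rowptr[i + 1]; ++k) {
            if (colidx[k] == i) {
                vals[k] += a;
            }
        }
    }
}
```

Each row is independent, so the loop over rows is the part a device version 
would run in parallel.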


On Thu, Oct 5, 2023 at 10:58 AM Junchao Zhang 
<junchao.zh...@gmail.com> wrote:
Hi, Philip,
  I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() 
instead of the COO interface?  MatSetValues() needs to copy the data from 
device to host and thus is expensive.
  Do you have profiling results with COO enabled?
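
For readers unfamiliar with the COO interface: in PETSc it consists of 
MatSetPreallocationCOO(), which registers the (row, column) index arrays once, 
and MatSetValuesCOO(), which afterwards takes only an array of values in the 
same order. The sketch below is a plain-C model of that two-phase idea on a 
small dense matrix, not PETSc code; CooPattern and coo_add_values are 
hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical model of COO-style assembly; not PETSc's implementation.
 * The index pattern is fixed once, then reused for every value update. */
typedef struct {
    size_t n;         /* number of (row, col) pairs in the pattern */
    const int *rows;  /* row index of each entry */
    const int *cols;  /* column index of each entry */
} CooPattern;

/* Add v[k] into the dense row-major matrix a (ncols columns) at position
 * (rows[k], cols[k]). Repeated (row, col) pairs accumulate. */
void coo_add_values(const CooPattern *p, const double *v, double *a, int ncols)
{
    for (size_t k = 0; k < p->n; ++k) {
        a[p->rows[k] * ncols + p->cols[k]] += v[k];
    }
}
```

Because the indices are known up front, each assembly call only needs the 
value array, which is what lets the values stay on the device instead of 
passing through per-entry host calls.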

[Screenshot 2023-10-05 at 10.55.29 AM.png]


--Junchao Zhang


On Mon, Oct 2, 2023 at 9:52 AM Junchao Zhang 
<junchao.zh...@gmail.com> wrote:
Hi, Philip,
  I will look into the tarballs and get back to you.
   Thanks.
--Junchao Zhang


On Mon, Oct 2, 2023 at 9:41 AM Fackler, Philip via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
We finally have xolotl ported to use the new COO interface and the aijkokkos 
implementation for Mat (and kokkos for Vec). Comparing this port to our 
previous version (using MatSetValuesStencil and the default Mat and Vec 
implementations), we expected to see an improvement in performance for both the 
"serial" and "cuda" builds (here I'm referring to the kokkos configuration).

Attached are two plots that show timings for three different cases. All of 
these were run on Ascent (the Summit-like training system) with 6 MPI tasks (on 
a single node). The CUDA cases were given one GPU per task (and used CUDA-aware 
MPI). The labels on the blue bars indicate speedup. In all cases we used 
"-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent as possible.

The performance of RHSJacobian (where the bulk of computation happens in 
xolotl) behaved basically as expected (better than expected in the serial 
build). The NE_3 case in CUDA was the only one that performed worse, which is 
not surprising, since its workload for the GPUs is much smaller. We still have 
more optimization to do on this.

The real surprise was how much worse the overall solve times were. This seems 
to be due simply to switching to the kokkos-based implementation. I'm wondering 
if there are any changes we can make in configuration or runtime arguments to 
help with PETSc's performance here. Any help looking into this would be 
appreciated.

The tarballs linked 
here<https://urldefense.us/v2/url?u=https-3A__drive.google.com_file_d_19X-5FL3SVkGBM9YUzXnRR-5FkVWFG0JFwqZ3_view-3Fusp-3Ddrive-5Flink&d=DwMFaQ&c=v4IIwRuZAmwupIjowmMWUmLasxPEgYsgNI-O7C4ViYc&r=DAkLCjn8leYU-uJ-kfNEQMhPZWx9lzc4d5KgIR-RZWQ&m=GTpC2k9hIdMhUg_aJkeAqd-1CP5M8bwJMJjTriVE1k-j36ZnEHerQkZOzszxWoG2&s=GW0ImGWhWr4rR5AoSULCnaP1CN1QWxTSeMDhdOuhTEA&e=>
 and 
here<https://urldefense.us/v2/url?u=https-3A__drive.google.com_file_d_15yDBN7-2DYlO1g6RJNPYNImzr611i1Ffhv_view-3Fusp-3Ddrive-5Flink&d=DwMFaQ&c=v4IIwRuZAmwupIjowmMWUmLasxPEgYsgNI-O7C4ViYc&r=DAkLCjn8leYU-uJ-kfNEQMhPZWx9lzc4d5KgIR-RZWQ&m=GTpC2k9hIdMhUg_aJkeAqd-1CP5M8bwJMJjTriVE1k-j36ZnEHerQkZOzszxWoG2&s=tO-BnNY2myA-pIsRnBjQNoaOSjn-B3-lWGiQp7XXJwk&e=>
 are profiling databases which, once extracted, can be viewed with hpcviewer. I 
don't know how helpful that will be, but hopefully it can give you some 
direction.

Thanks for your help,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
