"Fackler, Philip via petsc-users" <petsc-users@mcs.anl.gov> writes:

> That makes sense. Here are the arguments that I think are relevant:
>
> -fieldsplit_1_pc_type redundant -fieldsplit_0_pc_type sor -pc_type fieldsplit 
> -pc_fieldsplit_detect_coupling

What sort of physics are in splits 0 and 1?

SOR is not a good GPU algorithm, so we'll want to change that one way or 
another. Are the splits of similar size or very different?
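
For a first experiment, something along these lines may be worth trying (a
sketch only, not a tuned recommendation; Jacobi runs on the device, and
point-block Jacobi, -fieldsplit_0_pc_type pbjacobi, is an alternative if your
dofs couple in small dense blocks):

  -pc_type fieldsplit -pc_fieldsplit_detect_coupling
  -fieldsplit_0_pc_type jacobi
  -fieldsplit_1_pc_type redundant

Whether that converges acceptably depends on the answers to the questions
above.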

> What would you suggest to make this better?
>
> Also, note that the cases marked "serial" are running on CPU only, that is, 
> using only the SERIAL backend for Kokkos.
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> ________________________________
> From: Junchao Zhang <junchao.zh...@gmail.com>
> Sent: Tuesday, November 28, 2023 15:51
> To: Fackler, Philip <fackle...@ornl.gov>
> Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; 
> xolotl-psi-developm...@lists.sourceforge.net 
> <xolotl-psi-developm...@lists.sourceforge.net>
> Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
> switching to COO interface
>
> Hi, Philip,
>    I opened hpcdb-PSI_9-serial and it seems you used PCLU. Since Kokkos does 
> not have a GPU LU implementation, we do it on the CPU via 
> MatLUFactorNumeric_SeqAIJ(). Perhaps you can try other PC types?
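
Other PC types can be selected at runtime without code changes; a minimal
sketch, assuming the default options prefix (adding -ksp_view will report
which solver and preconditioner actually ran):

  -pc_type jacobi -ksp_view
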
>
> [Screenshot 2023-11-28 at 2.43.03 PM.png]
> --Junchao Zhang
>
>
> On Wed, Nov 22, 2023 at 10:43 AM Fackler, Philip 
> <fackle...@ornl.gov> wrote:
> I definitely dropped the ball on this. I'm sorry for that. I have new 
> profiling data using the latest (as of yesterday) of petsc/main. I've put 
> them in a single google drive folder linked here:
>
> https://drive.google.com/drive/folders/14ScvyfxOzc4OzXs9HZVeQDO-g6FdIVAI?usp=drive_link
>
> Have a happy holiday weekend!
>
> Thanks,
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> ________________________________
> From: Junchao Zhang <junchao.zh...@gmail.com>
> Sent: Monday, October 16, 2023 15:24
> To: Fackler, Philip <fackle...@ornl.gov>
> Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; 
> xolotl-psi-developm...@lists.sourceforge.net 
> <xolotl-psi-developm...@lists.sourceforge.net>
> Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
> switching to COO interface
>
> Hi, Philip,
>    That branch was merged to petsc/main today. Let me know once you have new 
> profiling results.
>
>    Thanks.
> --Junchao Zhang
>
>
> On Mon, Oct 16, 2023 at 9:33 AM Fackler, Philip 
> <fackle...@ornl.gov> wrote:
> Junchao,
>
> I've attached updated timing plots (red and blue are swapped from before; 
> yellow is the new one). There is an improvement for the NE_3 case only with 
> CUDA. Serial stays the same, and the PSI cases stay the same. In the PSI 
> cases, MatShift doesn't show up (I assume because we're using different 
> preconditioner arguments). So, there must be some other primary culprit. I'll 
> try to get updated profiling data to you soon.
>
> Thanks,
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> ________________________________
> From: Fackler, Philip via Xolotl-psi-development 
> <xolotl-psi-developm...@lists.sourceforge.net>
> Sent: Wednesday, October 11, 2023 11:31
> To: Junchao Zhang <junchao.zh...@gmail.com>
> Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; 
> xolotl-psi-developm...@lists.sourceforge.net 
> <xolotl-psi-developm...@lists.sourceforge.net>
> Subject: Re: [Xolotl-psi-development] [EXTERNAL] Re: [petsc-users] Unexpected 
> performance losses switching to COO interface
>
> I'm on it.
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> ________________________________
> From: Junchao Zhang <junchao.zh...@gmail.com>
> Sent: Wednesday, October 11, 2023 10:14
> To: Fackler, Philip <fackle...@ornl.gov>
> Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; 
> xolotl-psi-developm...@lists.sourceforge.net 
> <xolotl-psi-developm...@lists.sourceforge.net>; 
> Blondel, Sophie <sblon...@utk.edu>
> Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
> switching to COO interface
>
> Hi, Philip,
>   Could you try this branch 
> jczhang/2023-10-05/feature-support-matshift-aijkokkos ?
>
>   Thanks.
> --Junchao Zhang
>
>
> On Thu, Oct 5, 2023 at 4:52 PM Fackler, Philip 
> <fackle...@ornl.gov> wrote:
> Aha! That makes sense. Thank you.
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> ________________________________
> From: Junchao Zhang <junchao.zh...@gmail.com>
> Sent: Thursday, October 5, 2023 17:29
> To: Fackler, Philip <fackle...@ornl.gov>
> Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>; 
> xolotl-psi-developm...@lists.sourceforge.net 
> <xolotl-psi-developm...@lists.sourceforge.net>; 
> Blondel, Sophie <sblon...@utk.edu>
> Subject: [EXTERNAL] Re: [petsc-users] Unexpected performance losses switching 
> to COO interface
>
> Wait a moment, it seems it was because we do not have a GPU implementation of 
> MatShift...
> Let me see how to add it.
> --Junchao Zhang
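
For context, MatShift(A, a) performs A <- A + a*I in place. A minimal sketch
of the call in question, where A and a stand in for the application's matrix
and shift value:

  /* Add a multiple of the identity to the matrix. Without a device
     implementation for the matrix type, this runs on the host. */
  PetscCall(MatShift(A, a));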
>
>
> On Thu, Oct 5, 2023 at 10:58 AM Junchao Zhang 
> <junchao.zh...@gmail.com> wrote:
> Hi, Philip,
>   I looked at the hpcdb-NE_3-cuda file. It seems you used MatSetValues() 
> instead of the COO interface? MatSetValues() needs to copy the data from 
> device to host and thus is expensive.
>   Do you have profiling results with COO enabled?
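
For reference, a minimal sketch of the COO assembly pattern (ncoo, coo_i,
coo_j, and coo_v are illustrative names): the nonzero pattern is described
once, after which fresh values can be supplied repeatedly, including directly
from device memory.

  /* Describe the nonzero pattern once during setup ... */
  PetscCall(MatSetPreallocationCOO(A, ncoo, coo_i, coo_j));
  /* ... then push new values, e.g. on each Jacobian evaluation. */
  PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));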
>
> [Screenshot 2023-10-05 at 10.55.29 AM.png]
>
>
> --Junchao Zhang
>
>
> On Mon, Oct 2, 2023 at 9:52 AM Junchao Zhang 
> <junchao.zh...@gmail.com> wrote:
> Hi, Philip,
>   I will look into the tarballs and get back to you.
>    Thanks.
> --Junchao Zhang
>
>
> On Mon, Oct 2, 2023 at 9:41 AM Fackler, Philip via petsc-users 
> <petsc-users@mcs.anl.gov> wrote:
> We finally have xolotl ported to use the new COO interface and the aijkokkos 
> implementation for Mat (and Kokkos for Vec). Comparing this port to our 
> previous version (using MatSetValuesStencil and the default Mat and Vec 
> implementations), we expected to see an improvement in performance for both 
> the "serial" and "cuda" builds (here I'm referring to the Kokkos 
> configuration).
>
> Attached are two plots that show timings for three different cases. All of 
> these were run on Ascent (the Summit-like training system) with 6 MPI tasks 
> (on a single node). The CUDA cases were given one GPU per task (and used 
> CUDA-aware MPI). The labels on the blue bars indicate speedup. In all cases 
> we used "-fieldsplit_0_pc_type jacobi" to keep the comparison as consistent 
> as possible.
>
> The performance of RHSJacobian (where the bulk of computation happens in 
> xolotl) behaved basically as expected (better than expected in the serial 
> build). The NE_3 case in CUDA was the only one that performed worse, which is 
> not surprising since its GPU workload is much smaller. We've still got more 
> optimization to do on this.
>
> The real surprise was how much worse the overall solve times were. This seems 
> to be due simply to switching to the Kokkos-based implementation. I'm 
> wondering if there are any changes we can make in configuration or runtime 
> arguments to help with PETSc's performance here. Any help looking into this 
> would be appreciated.
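
A hedged aside: PETSc's built-in profiling, enabled with the -log_view runtime
option, is often the quickest way to see which operations dominate a run (the
executable name and other arguments below are placeholders):

  ./xolotl <args> -log_view

The resulting per-event summary (MatSetValues, PCApply, etc.) usually narrows
down where a regression lives.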
>
> The tarballs linked here 
> <https://drive.google.com/file/d/19X_L3SVkGBM9YUzXnRR_kVWFG0JFwqZ3/view?usp=drive_link> 
> and here 
> <https://drive.google.com/file/d/15yDBN7-YlO1g6RJNPYNImzr611i1Ffhv/view?usp=drive_link> 
> are profiling databases which, once extracted, can be viewed with hpcviewer. 
> I don't know how helpful that will be, but hopefully it can give you some 
> direction.
>
> Thanks for your help,
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
