An overlap of 16 is huge and would rarely, if ever, be used in practice. It is
not surprising that the subproblems become as large as they do with such a
large overlap.
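
For reference, the setup under discussion corresponds to options along the
lines of (a sketch, not copied from your run):

  -ksp_type gmres -pc_type asm -pc_asm_overlap 16 -sub_pc_type lu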

  Here is how the overlap is used:

  while (overlap--) {
     add to the subproblem every degree of freedom that is coupled by a nonzero
     in the matrix to any degree of freedom currently in the subproblem;
  }

  So an overlap of 16 means 16 rounds of grabbing all the neighbors.
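
Internally PCASM does this expansion with MatIncreaseOverlap(). If you want to
watch the growth yourself, a rough sketch along these lines should work (the
helper name and the printing are mine, and it assumes a recent PETSc with
PetscCall()); it starts from the rows this rank owns and expands the index set
by the requested number of levels:

  #include <petscmat.h>

  /* hypothetical helper: report how large this rank's subdomain becomes
     after 'overlap' rounds of grabbing on the (assembled) matrix A */
  PetscErrorCode CheckOverlapGrowth(Mat A, PetscInt overlap)
  {
    PetscInt rstart, rend, nlocal;
    IS       rows;

    PetscFunctionBeginUser;
    PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
    /* the subdomain before any overlap: just the locally owned rows */
    PetscCall(ISCreateStride(PETSC_COMM_SELF, rend - rstart, rstart, 1, &rows));
    /* each level adds every dof coupled by a nonzero to a dof already in the set */
    PetscCall(MatIncreaseOverlap(A, 1, &rows, overlap));
    PetscCall(ISGetLocalSize(rows, &nlocal));
    PetscCall(PetscPrintf(PETSC_COMM_SELF, "overlap %" PetscInt_FMT ": %" PetscInt_FMT " dofs in subdomain\n", overlap, nlocal));
    PetscCall(ISDestroy(&rows));
    PetscFunctionReturn(PETSC_SUCCESS);
  }

With a 4x4-cell box and overlap 16, those rounds keep adding layers until they
run into the edges of the global grid, which is consistent with the block size
you see growing with the global problem size instead of staying fixed.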



> On Nov 5, 2025, at 3:05 PM, Angus, Justin Ray via petsc-dev 
> <[email protected]> wrote:
> 
> I think the issue is my overlap is too large. Perhaps I don’t fully 
> understand how the overlap parameter is used. Let me explain my setup below.
> 
> My vector of unknowns is the electric field on a Yee grid in a 2D geometry. 
> I’m using 4x4 grid cells per rank. This gives 4*5 = 20 degrees of freedom for 
> each of the two in-plane components of E, and 5*5 = 25 for the out-of-plane 
> component. The total is 65 degrees of freedom per rank. My global problem 
> size is 224x16 on 224 ranks for one case, and 224x32 on 448 ranks for 
> another. Using ASM overlap 4, I get the following for PC sub blocks on a rank:
> PC Object: (sub_) 1 MPI process
>       type: lu
>         out-of-place factorization
>         Reusing fill from past factorization
>         Reusing reordering from past factorization
>         tolerance for zero pivot 2.22045e-14
>         matrix ordering: nd
>         factor fill ratio given 5., needed 4.02544
>           Factored matrix follows:
>             Mat Object: (sub_) 1 MPI process
>               type: seqaij
>               rows=402, cols=402
>               package used to perform factorization: petsc
>               total: nonzeros=16295, allocated nonzeros=16295
>                 not using I-node routines
> 
> The above is for a 224x16 size domain on 224 total ranks, but I get the same 
> thing for a 224x32 size domain on 448 ranks, which is what I expected to 
> get.
> 
> However, if I set the overlap to 16 (which is larger than my box size on a 
> given rank), I get the following:
> 224x16 grid on 112 ranks: 
> PC Object: (sub_) 1 MPI process
>       type: lu
>         out-of-place factorization
>         Reusing fill from past factorization
>         Reusing reordering from past factorization
>         tolerance for zero pivot 2.22045e-14
>         matrix ordering: nd
>         factor fill ratio given 5., needed 6.52557
>           Factored matrix follows:
>             Mat Object: (sub_) 1 MPI process
>               type: seqaij
>               rows=1316, cols=1316
>               package used to perform factorization: petsc
>               total: nonzeros=95195, allocated nonzeros=95195
>                 not using I-node routines
> 
> 224x16 grid on 112 ranks: 
> PC Object: (sub_) 1 MPI process
>       type: lu
>         out-of-place factorization
>         Reusing fill from past factorization
>         Reusing reordering from past factorization
>         tolerance for zero pivot 2.22045e-14
>         matrix ordering: nd
>         factor fill ratio given 5., needed 8.59182
>           Factored matrix follows:
>             Mat Object: (sub_) 1 MPI process
>               type: seqaij
>               rows=2632, cols=2632
>               package used to perform factorization: petsc
>               total: nonzeros=250675, allocated nonzeros=250675
>                 not using I-node routines
> 
> In this case, with an overlap much larger than the box size, the rows/cols 
> per rank go up by a factor of 2 when doubling the problem size at fixed work 
> per rank. 
> 
> Why is this?
> How exactly is the overlap parameter used?
> 
> Thank you.
> 
> -Justin
> 
> From: Angus, Justin Ray <[email protected]>
> Date: Wednesday, November 5, 2025 at 8:17 AM
> To: [email protected] <[email protected]>, Matthew Knepley <[email protected]>
> Cc: [email protected] <[email protected]>
> Subject: Re: [petsc-dev] Additive Schwarz Method + ILU on GPU platforms
> 
> Thanks for the reply.
> 
> The work per block should be the same for the weak scaling. I know LU is not 
> scalable with respect to the block size.
> 
> Perhaps our setup is not doing what we think it is doing. I’ll look into it 
> further.
> 
> -Justin
> 
> From: Mark Adams <[email protected]>
> Date: Wednesday, November 5, 2025 at 6:14 AM
> To: Matthew Knepley <[email protected]>
> Cc: Angus, Justin Ray <[email protected]>, [email protected] 
> <[email protected]>
> Subject: Re: [petsc-dev] Additive Schwarz Method + ILU on GPU platforms
> 
> And we do not have sparse LU on GPUs so that is done on the CPU.
> 
> And I don't know why it would not weak scale well. 
> Your results are consistent with just using one process with one domain (re 
> Matt), while you double the problem size.
> 
> On Tue, Nov 4, 2025 at 2:27 PM Matthew Knepley <[email protected]> wrote:
> On Tue, Nov 4, 2025 at 1:25 PM Angus, Justin Ray via petsc-dev 
> <[email protected]> wrote:
> Hi Junchao,
> 
> We have recently been using ASM + LU for 2D problems on both CPU and GPU. 
> However, I found that this method has very bad weak scaling. I find that the 
> cost of PCApply increases by about a factor of 4 each time I increase the 
> problem size in 1 dimension by a factor of 2 while keeping the load per 
> core/gpu the same. The total number of GMRES iterations does not increase, 
> just the cost of PCApply (and PCSetup). Is this scaling behavior expected? 
> Any ideas of how to optimize the preconditioner?
> 
> The cost of PCApply for ASM is dominated by the cost of process-local block 
> solves. You are using LU for the block solve. (Sparse) LU has cost roughly 
> O(N^2) for the apply (depending on the structure of the matrix). So, if you 
> double the size of a local block, your runtime should increase by about 4x. 
> Thus LU is not a scalable method.
> 
>   Thanks,
> 
>      Matt
>  
> Thank you.
> 
> -Justin
> 
> From: Junchao Zhang <[email protected]>
> Date: Monday, April 14, 2025 at 7:35 PM
> To: Angus, Justin Ray <[email protected]>
> Cc: [email protected] <[email protected]>, Ghosh, Debojyoti 
> <[email protected]>
> Subject: Re: [petsc-dev] Additive Schwarz Method + ILU on GPU platforms
> 
> PETSc supports ILU0/ICC0 numeric factorization (without reordering) and then 
> triangular solves on GPUs. It is done by calling vendor libraries (e.g., 
> cuSPARSE).
> We have the options -pc_factor_mat_factor_on_host <bool> and 
> -pc_factor_mat_solve_on_host <bool> to force doing the factorization and 
> MatSolve on the host for device matrix types.
> 
> You can try to see if it works for your case.
> 
> --Junchao Zhang
> 
> 
> On Mon, Apr 14, 2025 at 4:39 PM Angus, Justin Ray via petsc-dev 
> <[email protected]> wrote:
> Hello,
> 
>  
> A project I work on uses GMRES via PETSc. In particular, we have had good 
> success using the Additive Schwarz Method + ILU preconditioner setup in a 
> CPU-based code. I found it stated online that “Parts of most preconditioners 
> run directly on the GPU” (https://petsc.org/release/faq/). Is ASM + ILU also 
> available for GPU platforms?
> 
>  
> -Justin
> 
> 
> 
> --
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/
