> On 2 Sep 2021, at 2:31 PM, Pierre Jolivet <[email protected]> wrote:
> 
> 
> 
>> On 2 Sep 2021, at 2:07 PM, Viktor Nazdrachev <[email protected]> wrote:
>> 
>> Hello, Pierre!
>> 
>> Thank you for your response!
>> I attached the log files (txt files with the convergence behavior and the RAM 
>> usage log in separate txt files) and the resulting table with the convergence 
>> investigation data (xls). Data for the main non-regular grid with 500K cells 
>> and heterogeneous properties are in the 500K folder, whereas data for the 
>> simple uniform 125K-cell grid with constant properties are in the 125K folder.  
>>  
>> >Dear Viktor,
>> > 
>> >> On 1 Sep 2021, at 10:42 AM, Наздрачёв Виктор <numbersixvs at gmail.com> wrote:
>> >>
>> >> Dear all,
>> >>
>> >> I have a 3D elasticity problem with heterogeneous properties. There is an 
>> >> unstructured grid with aspect ratios varying from 4 to 25. Zero Dirichlet 
>> >> BCs are imposed on the bottom face of the mesh, and Neumann (traction) BCs 
>> >> are imposed on the side faces. The gravity load is also accounted for. The 
>> >> grid I use consists of 500k cells (approximately 1.6M DOFs).
>> >>
>> >> The best performance and memory usage for a single MPI process were 
>> >> obtained with the HPDDM (BFBCG) solver
>> >>
>> >Block Krylov solvers are (most often) only useful if you have multiple 
>> >right-hand sides, e.g., in the context of elasticity, multiple loadings.
>> >Is that really the case? If not, you may as well stick to “standard” CG 
>> >instead of the breakdown-free block (BFB) variant.
>> > 
>>  
>> In that case only a single right-hand side is used, so I switched to the 
>> “standard” CG solver (-ksp_hpddm_type cg), but I noticed an interesting 
>> convergence behavior. For the non-regular grid with 500K cells and 
>> heterogeneous properties, CG reported convergence after just 1 iteration, 
>> i.e., it did not actually solve the system 
>> (log_hpddm(cg)_gamg_nearnullspace_1_mpi.txt), whereas for the simpler uniform 
>> grid with 125K cells and homogeneous properties CG solves the linear system 
>> successfully (log_hpddm(cg)_gamg_nearnullspace_1_mpi.txt).
>> The BFBCG solver works properly for both grids.
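>> 
>> For reference, a minimal sketch of how a genuine multi-loading block solve 
>> would look (a hedged illustration; B and X are assumed to be dense matrices 
>> whose columns hold the right-hand sides and the corresponding solutions):
>> 
>> KSPSetType(ksp, KSPHPDDM);
>> KSPHPDDMSetType(ksp, KSP_HPDDM_TYPE_BFBCG); /* same as -ksp_hpddm_type bfbcg */
>> KSPMatSolve(ksp, B, X);                     /* one block solve for all columns */
>> 
>> With a single right-hand side there is nothing for the block method to exploit.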
> 
> Just stick to -ksp_type cg or maybe -ksp_type gmres 
> -ksp_gmres_modifiedgramschmidt (even if the problem is SPD).
> Sorry if I repeat myself, but KSPHPDDM methods are mostly useful for either 
> blocking or recycling.
> If you use something as simple as CG, you’ll get better diagnostics and error 
> handling if you use the native PETSc implementation (KSPCG) instead of the 
> external implementation (-ksp_hpddm_type cg).
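> 
> A minimal sketch of that native setup (a hedged illustration only; A, b, and x 
> are assumed to be the already assembled operator and vectors):
> 
> KSP ksp;
> KSPCreate(PETSC_COMM_WORLD, &ksp);
> KSPSetOperators(ksp, A, A);
> KSPSetType(ksp, KSPCG);                  /* i.e., -ksp_type cg */
> /* or, for GMRES with the more robust orthogonalization mentioned above:
>    KSPSetType(ksp, KSPGMRES);
>    KSPGMRESSetOrthogonalization(ksp, KSPGMRESModifiedGramSchmidtOrthogonalization); */
> KSPSetFromOptions(ksp);                  /* command-line options still apply */
> KSPSolve(ksp, b, x);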
> 
>>   
>> >> and bjacobi + ICC(1) in the subdomains as the preconditioner; it took 
>> >> 1 m 45 s and 5.0 GB of RAM. Parallel computation with 4 MPI processes took 
>> >> 2 m 46 s using 5.6 GB of RAM. This is because the number of iterations 
>> >> required to achieve the same tolerance increases significantly.
>> >>
>> >> I've also tried the PCGAMG (agg) preconditioner with an ICC(1) 
>> >> sub-preconditioner. For a single MPI process, the calculation took 10 min 
>> >> and 3.4 GB of RAM. To improve the convergence rate, the nullspace was 
>> >> attached using the MatNullSpaceCreateRigidBody and MatSetNearNullSpace 
>> >> subroutines. This reduced the calculation time to 3 m 58 s using 4.3 GB of 
>> >> RAM. There is also a peak memory usage of 14.1 GB, which appears just 
>> >> before the start of the iterations. Parallel computation with 4 MPI 
>> >> processes took 2 m 53 s using 8.4 GB of RAM. In that case the peak memory 
>> >> usage is about 22 GB.
>> >> 
>> >I’m surprised that GAMG is converging so slowly. What do you mean by 
>> >"ICC(1) sub-preconditioner"? Do you use that as a smoother or as a coarse 
>> >level solver?
>> > 
>> 
>> Sorry for the confusion: ICC is used only with the BJACOBI preconditioner; 
>> there is no ICC for GAMG.
>>  
>> >How many iterations are required to reach convergence?
>> >Could you please maybe run the solver with -ksp_view -log_view and send us 
>> >the output?
>> > 
>>  
>> For the case with 4 MPI processes and the attached nullspace, 177 iterations 
>> are required to reach convergence (see the detailed log in 
>> log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt and the memory usage log in 
>> RAM_log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, 90 
>> iterations are required for the sequential run 
>> (log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt).
>> 
>> 
>> >Most of the default parameters of GAMG should be good enough for 3D 
>> >elasticity, provided that your MatNullSpace is correct.
>> > 
>>  
>> How can I be sure that the nullspace is attached correctly? Is there any way 
>> to self-check this (perhaps by computing some quantities from the matrix and 
>> the solution vector)? 
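>> 
>> One possible self-check (a sketch under the assumption that A is the assembled 
>> stiffness matrix and coords is the packed vector of nodal coordinates): apply 
>> A to each rigid-body mode and print the norms; on rows not touched by the 
>> Dirichlet BCs the products should be close to zero, so wrong coordinates or a 
>> wrong block size show up immediately.
>> 
>> MatNullSpace nsp;
>> const Vec   *modes;
>> PetscInt     nmodes, i;
>> MatNullSpaceCreateRigidBody(coords, &nsp);       /* 6 rigid-body modes in 3D */
>> MatNullSpaceGetVecs(nsp, NULL, &nmodes, &modes);
>> for (i = 0; i < nmodes; ++i) {
>>   Vec       y;
>>   PetscReal nrm;
>>   VecDuplicate(modes[i], &y);
>>   MatMult(A, modes[i], y);
>>   VecNorm(y, NORM_2, &nrm);
>>   PetscPrintf(PETSC_COMM_WORLD, "||A * mode %D|| = %g\n", i, (double)nrm);
>>   VecDestroy(&y);
>> }
>> MatSetNearNullSpace(A, nsp);
>> MatNullSpaceDestroy(&nsp);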
>>  
>> >One parameter that may need some adjustments though is the aggregation 
>> >threshold -pc_gamg_threshold (you could try values in the [0.01; 0.1] 
>> >range, that’s what I always use for elasticity problems).
>> > 
>>  
>> I tried to find an optimal value for this option and set -pc_gamg_threshold 
>> 0.01 and -pc_gamg_threshold_scale 2, but I didn't notice any significant 
>> changes (I need more time for experiments). 
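>> 
>> For reference, the same options can also be set from code before 
>> KSPSetFromOptions() is called (a hedged illustration; the values are not 
>> taken from the logs):
>> 
>> PetscOptionsSetValue(NULL, "-pc_gamg_threshold", "0.05");      /* try values in [0.01, 0.1] */
>> PetscOptionsSetValue(NULL, "-pc_gamg_threshold_scale", "2.0"); /* scales the threshold on coarser levels */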
>> 
> I don’t see anything too crazy in your logs at first sight. In addition to 
> maybe trying GMRES with a more robust orthogonalization scheme, here is what 
> I would do:
> 1) MatSetBlockSize(Pmat, 6), it seems to be missing right now, cf. the 
> -ksp_view excerpt below (see also the sketch after this list):

Sorry for the noise, but this should read 3, not 6…

Thanks,
Pierre

>   linear system matrix = precond matrix:
>   Mat Object: 4 MPI processes
>     type: mpiaij
>     rows=1600200, cols=1600200
>     total: nonzeros=124439742, allocated nonzeros=259232400
>     total number of mallocs used during MatSetValues calls=0
>       has attached near null space
> 2) -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu
> 3) more playing around with the threshold, this can be critical for hard 
> problems
> If you can share your matrix/nullspace/RHS, we could have a crack at it as 
> well.
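> 
> A minimal sketch tying 1)-3) together (a hedged illustration: Pmat and coords 
> stand in for your own matrix and packed coordinate vector, and 3 is the 
> corrected block size from the follow-up above):
> 
> MatNullSpace nsp;
> MatSetBlockSize(Pmat, 3);                  /* 3 displacement DOFs per node */
> MatNullSpaceCreateRigidBody(coords, &nsp); /* 6 rigid-body modes in 3D */
> MatSetNearNullSpace(Pmat, nsp);
> MatNullSpaceDestroy(&nsp);
> /* plus runtime options along the lines of 2) and 3), e.g.:
>      -pc_type gamg -pc_gamg_threshold 0.05
>      -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu */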
> 
> Thanks,
> Pierre 
> 
>> Kind regards,
>>  
>> Viktor Nazdrachev
>>  
>> R&D senior researcher
>>  
>> Geosteering Technologies LLC 
>> 
>> 
>> On Wed, 1 Sep 2021 at 12:01, Pierre Jolivet <[email protected]> wrote:
>> Dear Viktor,
>> 
>>> On 1 Sep 2021, at 10:42 AM, Наздрачёв Виктор <[email protected]> wrote:
>>> 
>>> Dear all,
>>> 
>>> I have a 3D elasticity problem with heterogeneous properties. There is an 
>>> unstructured grid with aspect ratios varying from 4 to 25. Zero Dirichlet 
>>> BCs are imposed on the bottom face of the mesh, and Neumann (traction) BCs 
>>> are imposed on the side faces. The gravity load is also accounted for. The 
>>> grid I use consists of 500k cells (approximately 1.6M DOFs).
>>> 
>>> The best performance and memory usage for a single MPI process were 
>>> obtained with the HPDDM (BFBCG) solver
>>> 
>> Block Krylov solvers are (most often) only useful if you have multiple 
>> right-hand sides, e.g., in the context of elasticity, multiple loadings.
>> Is that really the case? If not, you may as well stick to “standard” CG 
>> instead of the breakdown-free block (BFB) variant.
>> 
>>> and bjacobi + ICC(1) in the subdomains as the preconditioner; it took 
>>> 1 m 45 s and 5.0 GB of RAM. Parallel computation with 4 MPI processes took 
>>> 2 m 46 s using 5.6 GB of RAM. This is because the number of iterations 
>>> required to achieve the same tolerance increases significantly.
>>> 
>>> I've also tried the PCGAMG (agg) preconditioner with an ICC(1) 
>>> sub-preconditioner. For a single MPI process, the calculation took 10 min 
>>> and 3.4 GB of RAM. To improve the convergence rate, the nullspace was 
>>> attached using the MatNullSpaceCreateRigidBody and MatSetNearNullSpace 
>>> subroutines. This reduced the calculation time to 3 m 58 s using 4.3 GB of 
>>> RAM. There is also a peak memory usage of 14.1 GB, which appears just before 
>>> the start of the iterations. Parallel computation with 4 MPI processes took 
>>> 2 m 53 s using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.
>>> 
>> I’m surprised that GAMG is converging so slowly. What do you mean by "ICC(1) 
>> sub-preconditioner"? Do you use that as a smoother or as a coarse level 
>> solver?
>> How many iterations are required to reach convergence?
>> Could you please maybe run the solver with -ksp_view -log_view and send us 
>> the output?
>> Most of the default parameters of GAMG should be good enough for 3D 
>> elasticity, provided that your MatNullSpace is correct.
>> One parameter that may need some adjustments though is the aggregation 
>> threshold -pc_gamg_threshold (you could try values in the [0.01; 0.1] range, 
>> that’s what I always use for elasticity problems).
>> 
>> Thanks,
>> Pierre
>> 
>>> Are there ways to avoid the degradation of the convergence rate for the 
>>> bjacobi preconditioner in parallel mode? Does it make sense to use 
>>> hierarchical or nested Krylov methods with a local GMRES solver 
>>> (-sub_ksp_type gmres) and some sub-preconditioner (for example, 
>>> -sub_pc_type bjacobi)?
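>>> 
>>> A hedged illustration of what such a nested setup could look like (note that 
>>> with a Krylov method inside the preconditioner, the outer method should be a 
>>> flexible one, e.g. FGMRES); the option values here are assumptions, not a 
>>> recommendation:
>>> 
>>> PetscOptionsSetValue(NULL, "-ksp_type", "fgmres");
>>> PetscOptionsSetValue(NULL, "-pc_type", "bjacobi");
>>> PetscOptionsSetValue(NULL, "-sub_ksp_type", "gmres"); /* inner solver on each block */
>>> PetscOptionsSetValue(NULL, "-sub_pc_type", "ilu");    /* illustrative sub-preconditioner */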
>>> 
>>>  
>>> Is this peak memory usage expected for the GAMG preconditioner? Is there 
>>> any way to reduce it?
>>> 
>>>  
>>> What advice would you give to improve the convergence rate with multiple 
>>> MPI processes while keeping memory consumption reasonable?
>>> 
>>>  
>>> Kind regards,
>>> 
>>> Viktor Nazdrachev
>>> 
>>> R&D senior researcher
>>> 
>>> Geosteering Technologies LLC
>>> 
>> 
>> <logs.rar>
