> On 2 Sep 2021, at 2:31 PM, Pierre Jolivet <[email protected]> wrote:
>
>> On 2 Sep 2021, at 2:07 PM, Viktor Nazdrachev <[email protected]> wrote:
>>
>> Hello, Pierre!
>>
>> Thank you for your response!
>> I attached the log files (txt files with the convergence behavior and separate txt files with the RAM usage logs) and the resulting table with the convergence investigation data (xls). Data for the main non-regular grid with 500K cells and heterogeneous properties are in the 500K folder, whereas data for the simple uniform grid with 125K cells and constant properties are in the 125K folder.
>>
>> > Dear Viktor,
>> >
>> >> On 1 Sep 2021, at 10:42 AM, Наздрачёв Виктор <numbersixvs at gmail.com> wrote:
>> >>
>> >> Dear all,
>> >>
>> >> I have a 3D elasticity problem with heterogeneous properties. The grid is unstructured, with cell aspect ratios varying from 4 to 25. Zero Dirichlet BCs are imposed on the bottom face of the mesh, and Neumann (traction) BCs are imposed on the side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (approximately 1.6M DOFs).
>> >>
>> >> The best performance and memory usage for a single MPI process was obtained with the HPDDM (BFBCG) solver
>> >>
>> > Block Krylov solvers are (most often) only useful if you have multiple right-hand sides, e.g., in the context of elasticity, multiple loadings.
>> > Is that really the case? If not, you may as well stick to “standard” CG instead of the breakdown-free block (BFB) variant.
>> >
>> In that case only a single right-hand side is used, so I switched to the “standard” CG solver (-ksp_hpddm_type cg), but I noticed some interesting convergence behavior. For the non-regular grid with 500K cells and heterogeneous properties the CG solver converged after just 1 iteration (log_hpddm(cg)_gamg_nearnullspace_1_mpi.txt), whereas for the simpler uniform grid with 125K cells and homogeneous properties CG solves the linear system successfully (log_hpddm(cg)_gamg_nearnullspace_1_mpi.txt).
>> The BFBCG solver works properly for both grids.
>
> Just stick to -ksp_type cg or maybe -ksp_type gmres -ksp_gmres_modifiedgramschmidt (even if the problem is SPD).
> Sorry if I repeat myself, but KSPHPDDM methods are mostly useful for either blocking or recycling.
> If you use something as simple as CG, you’ll get better diagnostics and error handling with the native PETSc implementation (KSPCG) than with the external implementation (-ksp_hpddm_type cg).
>
>> >> and block Jacobi (bjacobi) + ICC(1) in the subdomains as the preconditioner, it took 1 m 45 s and 5.0 GB of RAM. Parallel computation with 4 MPI processes took 2 m 46 s using 5.6 GB of RAM. This is because the number of iterations required to reach the same tolerance increases significantly.
>> >>
>> >> I’ve also tried the PCGAMG (agg) preconditioner with an ICC(1) sub-preconditioner. For a single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using the MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This reduced the calculation time to 3 m 58 s using 4.3 GB of RAM. There is also a peak memory usage of 14.1 GB, which occurs just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.
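For reference, the near-nullspace setup mentioned above (MatNullSpaceCreateRigidBody + MatSetNearNullSpace) typically looks like the following minimal sketch; the vector name coords and the function name are illustrative placeholders, not taken from the poster's code, and coords is assumed to hold the interlaced nodal (x,y,z) coordinates with the same parallel layout as the solution vector.

#include <petscmat.h>

/* Build the six rigid-body modes of 3D elasticity from the nodal coordinates
   and attach them as a near nullspace, so that PCGAMG can use them when
   constructing its coarse spaces. */
static PetscErrorCode AttachRigidBodyModes(Mat A, Vec coords)
{
  MatNullSpace   nearnull;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatNullSpaceCreateRigidBody(coords, &nearnull);CHKERRQ(ierr);
  ierr = MatSetNearNullSpace(A, nearnull);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&nearnull);CHKERRQ(ierr); /* the matrix keeps its own reference */
  PetscFunctionReturn(0);
}

Calling this once on the (preconditioning) matrix before KSPSolve is enough; -ksp_view should then report "has attached near null space", as in the log excerpt quoted further down.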
>> >>
>> > I’m surprised that GAMG is converging so slowly. What do you mean by "ICC(1) sub-preconditioner"? Do you use that as a smoother or as a coarse level solver?
>> >
>> Sorry for the confusion: ICC is used only with the BJACOBI preconditioner; there is no ICC for GAMG.
>>
>> > How many iterations are required to reach convergence?
>> > Could you please maybe run the solver with -ksp_view -log_view and send us the output?
>> >
>> For the case with 4 MPI processes and the attached nullspace, 177 iterations are required to reach convergence (see the detailed log in log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt and the memory usage log in RAM_log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, 90 iterations are required for the sequential run (log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt).
>>
>> > Most of the default parameters of GAMG should be good enough for 3D elasticity, provided that your MatNullSpace is correct.
>> >
>> How can I be sure that the nullspace is attached correctly? Is there any way to check this myself (perhaps by computing some quantities from the matrix and the solution vector)?
>>
>> > One parameter that may need some adjustments though is the aggregation threshold -pc_gamg_threshold (you could try values in the [0.01; 0.1] range, that’s what I always use for elasticity problems).
>> >
>> I tried to find the optimal value of this option, setting -pc_gamg_threshold 0.01 and -pc_gamg_threshold_scale 2, but I did not notice any significant changes (I need more time for experiments).
>>
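Regarding the self-checking question above, one rough sanity check (a sketch, not from the thread; the function name is illustrative) is to pull the attached near nullspace back out of the matrix, confirm it holds the six rigid-body vectors, and compare ||A z_i|| against ||A z|| for a generic vector z: rigid-body motions generate no internal elastic forces, so the former should be much smaller, though not exactly zero because the rows modified by the Dirichlet BCs do not annihilate the modes.

#include <petscmat.h>

/* Rough self-check of the near nullspace attached for 3D elasticity. */
static PetscErrorCode CheckNearNullSpace(Mat A)
{
  MatNullSpace   nearnull;
  PetscBool      has_const;
  PetscInt       i, n;
  const Vec     *z;
  Vec            Az, zrand;
  PetscReal      nrm, ref;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatGetNearNullSpace(A, &nearnull);CHKERRQ(ierr);
  if (!nearnull) {
    ierr = PetscPrintf(PETSC_COMM_WORLD, "No near nullspace attached!\n");CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }
  ierr = MatNullSpaceGetVecs(nearnull, &has_const, &n, &z);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "near nullspace holds %d vectors (expect 6 for 3D elasticity)\n", (int)n);CHKERRQ(ierr);
  ierr = MatCreateVecs(A, &zrand, &Az);CHKERRQ(ierr);
  ierr = VecSetRandom(zrand, NULL);CHKERRQ(ierr);      /* reference: a generic unit vector */
  ierr = VecNormalize(zrand, NULL);CHKERRQ(ierr);
  ierr = MatMult(A, zrand, Az);CHKERRQ(ierr);
  ierr = VecNorm(Az, NORM_2, &ref);CHKERRQ(ierr);
  for (i = 0; i < n; ++i) {
    ierr = MatMult(A, z[i], Az);CHKERRQ(ierr);
    ierr = VecNorm(Az, NORM_2, &nrm);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "||A z_%d|| = %g  (||A z_random|| = %g)\n", (int)i, (double)nrm, (double)ref);CHKERRQ(ierr);
  }
  ierr = VecDestroy(&zrand);CHKERRQ(ierr);
  ierr = VecDestroy(&Az);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}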
> I don’t see anything too crazy in your logs at first sight. In addition to maybe trying GMRES with a more robust orthogonalization scheme, here is what I would do:
> 1) MatSetBlockSize(Pmat, 6), it seems to be missing right now, cf.

Sorry for the noise, but this should read 3, not 6…

Thanks,
Pierre

> linear system matrix = precond matrix:
> Mat Object: 4 MPI processes
>   type: mpiaij
>   rows=1600200, cols=1600200
>   total: nonzeros=124439742, allocated nonzeros=259232400
>   total number of mallocs used during MatSetValues calls=0
>     has attached near null space
> 2) -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu
> 3) more playing around with the threshold, this can be critical for hard problems
> If you can share your matrix/nullspace/RHS, we could have a crack at it as well.
>
> Thanks,
> Pierre
>
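To make suggestion 1), with the 6-to-3 correction above, concrete: a small sketch of how the block size and the run-time options from 2) and 3) could be wired in. The function name is illustrative, the options shown are only the ones suggested in the thread, and they can of course simply be passed on the command line instead.

#include <petscksp.h>

/* Declare the 3-DOF-per-node blocking on the preconditioning matrix
   (do this before preallocation/assembly of Pmat) and register the
   suggested GAMG coarse-solver and threshold options. */
static PetscErrorCode ApplyGamgSuggestions(Mat Pmat)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatSetBlockSize(Pmat, 3);CHKERRQ(ierr); /* 3 displacement DOFs per node, not 6 */
  ierr = PetscOptionsInsertString(NULL,
           "-mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu "
           "-pc_gamg_threshold 0.01");CHKERRQ(ierr); /* try values in [0.01, 0.1] */
  PetscFunctionReturn(0);
}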
>> Kind regards,
>>
>> Viktor Nazdrachev
>>
>> R&D senior researcher
>>
>> Geosteering Technologies LLC
>>
>>
>> Wed, 1 Sep 2021 at 12:01, Pierre Jolivet <[email protected]>:
>> Dear Viktor,
>>
>>> On 1 Sep 2021, at 10:42 AM, Наздрачёв Виктор <[email protected]> wrote:
>>>
>>> Dear all,
>>>
>>> I have a 3D elasticity problem with heterogeneous properties. The grid is unstructured, with cell aspect ratios varying from 4 to 25. Zero Dirichlet BCs are imposed on the bottom face of the mesh, and Neumann (traction) BCs are imposed on the side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (approximately 1.6M DOFs).
>>>
>>> The best performance and memory usage for a single MPI process was obtained with the HPDDM (BFBCG) solver
>>>
>> Block Krylov solvers are (most often) only useful if you have multiple right-hand sides, e.g., in the context of elasticity, multiple loadings.
>> Is that really the case? If not, you may as well stick to “standard” CG instead of the breakdown-free block (BFB) variant.
>>
>>> and block Jacobi (bjacobi) + ICC(1) in the subdomains as the preconditioner, it took 1 m 45 s and 5.0 GB of RAM. Parallel computation with 4 MPI processes took 2 m 46 s using 5.6 GB of RAM. This is because the number of iterations required to reach the same tolerance increases significantly.
>>>
>>> I’ve also tried the PCGAMG (agg) preconditioner with an ICC(1) sub-preconditioner. For a single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the nullspace was attached using the MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This reduced the calculation time to 3 m 58 s using 4.3 GB of RAM. There is also a peak memory usage of 14.1 GB, which occurs just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.
>>>
>> I’m surprised that GAMG is converging so slowly. What do you mean by "ICC(1) sub-preconditioner"? Do you use that as a smoother or as a coarse level solver?
>> How many iterations are required to reach convergence?
>> Could you please maybe run the solver with -ksp_view -log_view and send us the output?
>> Most of the default parameters of GAMG should be good enough for 3D elasticity, provided that your MatNullSpace is correct.
>> One parameter that may need some adjustments though is the aggregation threshold -pc_gamg_threshold (you could try values in the [0.01; 0.1] range, that’s what I always use for elasticity problems).
>>
>> Thanks,
>> Pierre
>>
>>> Are there ways to avoid the decrease of the convergence rate for the bjacobi preconditioner in parallel mode? Does it make sense to use hierarchical or nested Krylov methods with a local GMRES solver (-sub_ksp_type gmres) and some sub-preconditioner (for example, -sub_pc_type bjacobi)?
>>>
>>> Is this peak memory usage expected for the gamg preconditioner? Is there any way to reduce it?
>>>
>>> What advice would you give to improve the convergence rate with multiple MPI processes, but keep memory consumption reasonable?
>>>
>>> Kind regards,
>>>
>>> Viktor Nazdrachev
>>>
>>> R&D senior researcher
>>>
>>> Geosteering Technologies LLC
>>>
>>
>> <logs.rar>
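On the nested-Krylov question quoted above (a GMRES sub-solver inside block Jacobi), one way to try it is sketched below; the specific option values are illustrative guesses, not advice from the thread, and the inner preconditioner simply reuses the ICC(1) already used elsewhere in the discussion. Since an inner Krylov solve makes the preconditioner change from one outer iteration to the next, a flexible outer method such as FGMRES is then required.

#include <petscksp.h>

/* Illustrative options for block Jacobi with a few GMRES/ICC(1) iterations
   per subdomain, driven by a flexible outer FGMRES. */
static PetscErrorCode SetNestedKrylovOptions(void)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscOptionsInsertString(NULL,
           "-ksp_type fgmres -pc_type bjacobi "
           "-sub_ksp_type gmres -sub_ksp_max_it 5 "
           "-sub_pc_type icc -sub_pc_factor_levels 1");CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The same strings can equivalently be passed on the command line; whether the extra inner work pays off compared with plain bjacobi + ICC(1) is exactly the kind of question the logs attached to this thread are meant to answer.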
