Hi Stephan,

Yes, the MIS(A^T A) -> MIS(MIS(A)) change?
Yep, that is it. This change was required because forming A^T A is very expensive. The change did not make much difference in my tests, but this is a complex issue. I am traveling now, but I can get to this in a few days. You have provided me with a lot of data and I will take a look, but I think we need to look at the parameters.

Thanks,
Mark

On Wed, Aug 9, 2023 at 10:08 AM Stephan Kramer <s.kra...@imperial.ac.uk> wrote:

> Dear petsc devs
>
> We have noticed a performance regression using GAMG as the preconditioner
> to solve the velocity block in a Stokes equations saddle point system with
> variable viscosity, solved on a 3D hexahedral mesh of a spherical shell
> using Q2-Q1 elements. This compares performance from the beginning of last
> year (petsc 3.16.4) with a more recent petsc master (from around May this
> year). This is the weak scaling analysis we published in
> https://doi.org/10.5194/gmd-15-5127-2022
> Previously the number of iterations for the velocity block (inner solve of
> the Schur complement) started at 40 iterations
> (https://gmd.copernicus.org/articles/15/5127/2022/gmd-15-5127-2022-f10-web.png)
> and only went up slowly for larger problems (+ more cores). Now the number
> of iterations starts at 60
> (https://github.com/stephankramer/petsc-scaling/blob/main/after/SPD_Combined_Iterations.png),
> with the same tolerances, again slowly going up with increasing size, and
> the cost per iteration has also gone up (slightly) - resulting in an
> increased runtime of > 50%.
>
> The main change we can see is that the coarsening seems to have become a
> lot less aggressive at the first coarsening stage (finest -> one-but-finest)
> - presumably after the MIS(A^T A) -> MIS(MIS(A)) change? The performance
> issues might be similar to
> https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2023-April/048366.html ?
>
> As an example, at "Level 7" (6,389,890 vertices, run on 1536 cpus) on the
> older petsc version we had:
>
>   rows=126, cols=126, bs=6
>   total: nonzeros=15876, allocated nonzeros=15876
>   --
>   rows=3072, cols=3072, bs=6
>   total: nonzeros=3344688, allocated nonzeros=3344688
>   --
>   rows=91152, cols=91152, bs=6
>   total: nonzeros=109729584, allocated nonzeros=109729584
>   --
>   rows=2655378, cols=2655378, bs=6
>   total: nonzeros=1468980252, allocated nonzeros=1468980252
>   --
>   rows=152175366, cols=152175366, bs=3
>   total: nonzeros=29047661586, allocated nonzeros=29047661586
>
> Whereas with the newer version we get:
>
>   rows=420, cols=420, bs=6
>   total: nonzeros=176400, allocated nonzeros=176400
>   --
>   rows=6462, cols=6462, bs=6
>   total: nonzeros=10891908, allocated nonzeros=10891908
>   --
>   rows=91716, cols=91716, bs=6
>   total: nonzeros=81687384, allocated nonzeros=81687384
>   --
>   rows=5419362, cols=5419362, bs=6
>   total: nonzeros=3668190588, allocated nonzeros=3668190588
>   --
>   rows=152175366, cols=152175366, bs=3
>   total: nonzeros=29047661586, allocated nonzeros=29047661586
>
> So in the first step it coarsens from 150e6 to 5.4e6 DOFs instead of to
> 2.6e6 DOFs. Note that we are providing the rigid body near nullspace,
> hence bs=3 on the finest level and bs=6 on the coarser levels.
> We have tried different values for -pc_gamg_threshold but it doesn't seem
> to significantly alter the amount of coarsening in that first step.
>
> Do you have any suggestions for further things we should try/look at?
> Any feedback would be much appreciated
>
> Best wishes
> Stephan Kramer
>
> Full logs including log_view timings are available from
> https://github.com/stephankramer/petsc-scaling/
>
> In particular:
>
> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat
> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat
> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat
> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat
> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat
> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat
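
For reference, a minimal sketch of the GAMG runtime options that control the coarsening discussed above, assuming a recent PETSc master (option names and defaults differ between releases, so please check -help or the PCGAMG manual page for your version; the values are only illustrative starting points):

  -pc_type gamg
  -pc_gamg_aggressive_coarsening 1   # number of levels that use aggressive (MIS(MIS(A))) coarsening
  -pc_gamg_threshold 0.01            # drop tolerance for the strength-of-connection graph
  -pc_gamg_threshold_scale 0.5       # scales the threshold from one level to the next

On petsc 3.16 the first of these was controlled by -pc_gamg_square_graph <n>, i.e. how many levels square the graph (the old MIS(A^T A) behaviour).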
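
Since the rigid body near nullspace comes up above (the bs=3 -> bs=6 observation), here is a minimal sketch in PETSc's C API of how that near nullspace is typically attached to the velocity operator; the function and variable names are purely illustrative and not taken from the actual application code:

  #include <petscmat.h>

  /* Attach the six 3D rigid body modes (3 translations + 3 rotations) as a near
     nullspace; GAMG uses these to build the coarse basis, which is why the
     coarse levels end up with bs=6 while the finest level has bs=3. */
  PetscErrorCode AttachRigidBodyModes(Mat A, Vec coordinates)
  {
    MatNullSpace nearnull;

    PetscFunctionBeginUser;
    /* 'coordinates' holds the nodal coordinates, block size 3 */
    PetscCall(MatNullSpaceCreateRigidBody(coordinates, &nearnull));
    PetscCall(MatSetNearNullSpace(A, nearnull)); /* consulted by GAMG, not by the KSP */
    PetscCall(MatNullSpaceDestroy(&nearnull));
    PetscFunctionReturn(PETSC_SUCCESS);
  }

(PetscCall and PETSC_SUCCESS assume a recent PETSc; older versions would use ierr/CHKERRQ and return 0 instead.)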