BTW, the MR has been merged to main.

Thanks,
Mark

On Wed, Oct 11, 2023 at 1:46 AM Pierre Jolivet <pie...@joliv.et> wrote:
> On 11 Oct 2023, at 6:41 AM, Stephan Kramer <s.kra...@imperial.ac.uk> wrote:
>
>> On 07/10/2023 06:51, Pierre Jolivet wrote:
>> Hello Stephan,
>> Could you share the Amat/Pmat in binary format of the specific fieldsplit block, as well as all inputs needed to generate the same grid hierarchy (block size, options used, near kernel)?
>> Alternatively, have you been able to generate the same error in a plain PETSc example?
>
> I could, but unfortunately, as Mark indicated, we only see this on a very large system, run on 1536 cores. The matrix dump appears to be 300G. If you want, I could try to make it available, but I imagine it's not the most practical thing.

It should be OK on my end.

> We have tried the one-line change you suggested below and it indeed prevents the problem - i.e. on the adams/gamg-fast-filter branch we get the "inconsistent data" error with -pc_gamg_low_memory_filter True, but not if we change that line as suggested.

OK, then that means the bug is indeed pretty localized.
Either MatEliminateZeros(), MatDuplicate(), or MatHeaderReplace().
Hong (Mr.), do you think there is something missing in MatEliminateZeros_MPIAIJ()? Maybe a call to MatDisAssemble_MPIAIJ() — I have no idea what this function does.

> Note that for our uses, we're happy to just not use the low memory filter (as is now the default in main), but let us know if we can provide any further help.

I'm not happy with the same function being twice in the library, and having an "improved" version only available to a part of the library.
I'm also not happy with GAMG having tons of MatAIJ-specific code, which makes it unusable with other MatType, e.g., we can't even use MatBAIJ or MatSBAIJ, whereas PCHYPRE works even though it's an external package (a good use case here would have been to ask you to use a MatBAIJ with bs = 1 to incriminate MatEliminateZeros_MPIAIJ() or not, but we can't).
But that's just my opinion.

Thanks,
Pierre

> Thanks
> Stephan
>
>> I'm suspecting a bug in MatEliminateZeros(). If you have the chance to, could you please edit src/mat/impls/aij/mpi/mpiaij.c and change the line that looks like:
>>
>>   PetscCall(MatFilter(Gmat, filter, PETSC_TRUE, PETSC_TRUE));
>>
>> into:
>>
>>   PetscCall(MatFilter(Gmat, filter, PETSC_FALSE, PETSC_TRUE));
>>
>> and give that a go? It will be extremely memory-inefficient, but this is just to confirm my intuition.
>>
>> Thanks,
>> Pierre

On 6 Oct 2023, at 1:22 AM, Stephan Kramer <s.kra...@imperial.ac.uk> wrote:

Great, that seems to fix the issue indeed - i.e. on the branch with the low memory filtering switched off (by default) we no longer see the "inconsistent data" error or hangs, and going back to the square graph aggressive coarsening brings us back the old performance. So we'd be keen to have that branch merged indeed.
Many thanks for your assistance with this
Stephan

On 05/10/2023 01:11, Mark Adams wrote:

Thanks Stephan,

It looks like the matrix is in a bad/incorrect state and parallel Mat-Mat is waiting for messages that were not sent. A bug.

Can you try my branch, which is ready to merge: adams/gamg-fast-filter.
We added a new filtering method in main that uses low memory, but I found it was slow, so this branch brings back the old filter code, used by default, and keeps the low memory version as an option.
It is possible this low memory filtering messed up the internals of the Mat in some way.
I hope this is it, but if not we can continue.

This MR also makes square graph the default.
I have found it does create better aggregates, and on GPUs, with Kokkos bug fixes from Junchao, Mat-Mat is fast. (It might be slow on CPUs.)

Mark
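For reference, the kind of Amat/Pmat dump requested at the top of this thread can be produced with a PETSc binary viewer. Below is a minimal sketch (not code from the thread; the function name and the filename argument are placeholders); the resulting file can later be read back on any number of ranks with MatCreate()/MatLoad() and a FILE_MODE_READ viewer.

  #include <petscmat.h>
  #include <petscviewer.h>

  /* Sketch: write a (parallel) Mat in PETSc binary format so it can be shared
     and reloaded elsewhere with MatLoad(). */
  static PetscErrorCode DumpMatrixBinary(Mat A, const char filename[])
  {
    PetscViewer viewer;

    PetscFunctionBeginUser;
    PetscCall(PetscViewerBinaryOpen(PetscObjectComm((PetscObject)A), filename, FILE_MODE_WRITE, &viewer));
    PetscCall(MatView(A, viewer)); /* writes the matrix in PETSc's binary format */
    PetscCall(PetscViewerDestroy(&viewer));
    PetscFunctionReturn(PETSC_SUCCESS);
  }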
On Wed, Oct 4, 2023 at 12:30 AM Stephan Kramer <s.kra...@imperial.ac.uk> wrote:

Hi Mark

Thanks again for re-enabling the square graph aggressive coarsening option, which seems to have restored performance for most of our cases. Unfortunately we do have a remaining issue, which only seems to occur for the larger mesh size ("level 7", which has 6,389,890 vertices and which we normally run on 1536 cpus): we either get a "Petsc has generated inconsistent data" error, or a hang - both when constructing the square graph matrix. So this is with the new -pc_gamg_aggressive_square_graph=true option; without the option there's no error, but of course we would be back to the worse performance.

Backtrace for the "inconsistent data" error. Note this is actually just petsc main from 17 Sep, git 9a75acf6e50cfe213617e - so after your merge of adams/gamg-add-old-coarsening into main - with one unrelated commit from firedrake.

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Petsc has generated inconsistent data
[0]PETSC ERROR: j 8 not equal to expected number of sends 9
[0]PETSC ERROR: Petsc Development GIT revision: v3.4.2-43104-ga3b76b71a1  GIT Date: 2023-09-18 10:26:04 +0100
[0]PETSC ERROR: stokes_cubed_sphere_7e3_A3_TS1.py on a named gadi-cpu-clx-0241.gadi.nci.org.au by sck551 Wed Oct 4 14:30:41 2023
[0]PETSC ERROR: Configure options --prefix=/tmp/firedrake-prefix --with-make-np=4 --with-debugging=0 --with-shared-libraries=1 --with-fortran-bindings=0 --with-zlib --with-c2html=0 --with-mpiexec=mpiexec --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --download-hdf5 --download-hypre --download-superlu_dist --download-ptscotch --download-suitesparse --download-pastix --download-hwloc --download-metis --download-scalapack --download-mumps --download-chaco --download-ml CFLAGS=-diag-disable=10441 CXXFLAGS=-diag-disable=10441
[0]PETSC ERROR: #1 PetscGatherMessageLengths2() at /jobfs/95504034.gadi-pbs/petsc/src/sys/utils/mpimesg.c:270
[0]PETSC ERROR: #2 MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() at /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1867
[0]PETSC ERROR: #3 MatProductSymbolic_AtB_MPIAIJ_MPIAIJ() at /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071
[0]PETSC ERROR: #4 MatProductSymbolic() at /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795
[0]PETSC ERROR: #5 PCGAMGSquareGraph_GAMG() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489
[0]PETSC ERROR: #6 PCGAMGCoarsen_AGG() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969
[0]PETSC ERROR: #7 PCSetUp_GAMG() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645
[0]PETSC ERROR: #8 PCSetUp() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069
[0]PETSC ERROR: #9 PCApply() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484
[0]PETSC ERROR: #10 PCApply() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487
[0]PETSC ERROR: #11 KSP_PCApply() at /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383
[0]PETSC ERROR: #12 KSPSolve_CG() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162
[0]PETSC ERROR: #13 KSPSolve_Private() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910
[0]PETSC ERROR: #14 KSPSolve() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082
[0]PETSC ERROR: #15 PCApply_FieldSplit_Schur() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1175
[0]PETSC ERROR: #16 PCApply() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487
[0]PETSC ERROR: #17 KSP_PCApply() at /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383
[0]PETSC ERROR: #18 KSPSolve_PREONLY() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/preonly/preonly.c:25
[0]PETSC ERROR: #19 KSPSolve_Private() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910
[0]PETSC ERROR: #20 KSPSolve() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082
[0]PETSC ERROR: #21 SNESSolve_KSPONLY() at /jobfs/95504034.gadi-pbs/petsc/src/snes/impls/ksponly/ksponly.c:49
[0]PETSC ERROR: #22 SNESSolve() at /jobfs/95504034.gadi-pbs/petsc/src/snes/interface/snes.c:4635

Last -info :pc messages:

[0] <pc:gamg> PCSetUp(): Setting up PC for first time
[0] <pc:gamg> PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: level 0) N=152175366, n data rows=3, n data cols=6, nnz/row (ave)=191, np=1536
[0] <pc:gamg> PCGAMGCreateGraph_AGG(): Filtering left 100. % edges in graph (1.588710e+07 1.765233e+06)
[0] <pc:gamg> PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: Square Graph on level 1
[0] <pc:gamg> fixAggregatesWithSquare(): isMPI = yes
[0] <pc:gamg> PCGAMGProlongator_AGG(): Stokes_fieldsplit_0_assembled_: New grid 380144 nodes
[0] <pc:gamg> PCGAMGOptProlongator_AGG(): Stokes_fieldsplit_0_assembled_: Smooth P0: max eigen=4.489376e+00 min=9.015236e-02 PC=jacobi
[0] <pc:gamg> PCGAMGOptProlongator_AGG(): Stokes_fieldsplit_0_assembled_: Smooth P0: level 0, cache spectra 0.0901524 4.48938
[0] <pc:gamg> PCGAMGCreateLevel_GAMG(): Stokes_fieldsplit_0_assembled_: Coarse grid reduction from 1536 to 1536 active processes
[0] <pc:gamg> PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: 1) N=2280864, n data cols=6, nnz/row (ave)=503, 1536 active pes
[0] <pc:gamg> PCGAMGCreateGraph_AGG(): Filtering left 36.2891 % edges in graph (5.310360e+05 5.353000e+03)
[0] <pc:gamg> PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: Square Graph on level 2

The hang (on a slightly different model configuration but on the same mesh and number of cores) seems to occur in the same location.
If I use gdb to attach to the running processes, it seems that on some cores it has somehow managed to fall out of the PCSetUp and is waiting in the first norm calculation in the outside CG iteration:

#0  0x000014cce9999119 in hmca_bcol_basesmuma_bcast_k_nomial_knownroot_progress () from /apps/hcoll/4.7.3202/lib/hcoll/hmca_bcol_basesmuma.so
#1  0x000014ccef2c2737 in _coll_ml_allreduce () from /apps/hcoll/4.7.3202/lib/libhcoll.so.1
#2  0x000014ccef5dd95b in mca_coll_hcoll_allreduce (sbuf=0x1, rbuf=0x7fff74ecbee8, count=1, dtype=0x14cd26ce6f80 <ompi_mpi_double>, op=0x14cd26cfbc20 <ompi_mpi_op_sum>, comm=0x3076fb0, module=0x43a0110) at /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/mca/coll/hcoll/coll_hcoll_ops.c:228
#3  0x000014cd26a1de28 in PMPI_Allreduce (sendbuf=0x1, recvbuf=<optimized out>, count=1, datatype=<optimized out>, op=0x14cd26cfbc20 <ompi_mpi_op_sum>, comm=0x3076fb0) at pallreduce.c:113
#4  0x000014cd271c9889 in VecNorm_MPI_Default (xin=<optimized out>, type=<optimized out>, z=<optimized out>, VecNorm_SeqFn=<optimized out>) at /jobfs/95504034.gadi-pbs/petsc/include/../src/vec/vec/impls/mpi/pvecimpl.h:168
#5  VecNorm_MPI (xin=0x14ccee1ddb80, type=3924123648, z=0x22d) at /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/impls/mpi/pvec2.c:39
#6  0x000014cd2718cddd in VecNorm (x=0x14ccee1ddb80, type=3924123648, val=0x22d) at /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/interface/rvector.c:214
#7  0x000014cd27f5a0b9 in KSPSolve_CG (ksp=0x14ccee1ddb80) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:163
etc.

but with other cores still stuck at:

#0  0x000015375cf41e8a in ucp_worker_progress () from /apps/ucx/1.12.0/lib/libucp.so.0
#1  0x000015377d4bd57b in opal_progress () at /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/runtime/opal_progress.c:231
#2  0x000015377d4c3ba5 in ompi_sync_wait_mt (sync=sync@entry=0x7ffd6aedf6f0) at /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/threads/wait_sync.c:85
#3  0x000015378bf7cf38 in ompi_request_default_wait_any (count=8, requests=0x8d465a0, index=0x7ffd6aedfa60, status=0x7ffd6aedfa10) at /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/request/req_wait.c:124
#4  0x000015378bfc1b4b in PMPI_Waitany (count=8, requests=0x8d465a0, indx=0x7ffd6aedfa60, status=<optimized out>) at pwaitany.c:86
#5  0x000015378c88ef2c in MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ (P=0x2cc7500, A=0x1, fill=2.1219957934356005e-314, C=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1884
#6  0x000015378c88dd4f in MatProductSymbolic_AtB_MPIAIJ_MPIAIJ (C=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071
#7  0x000015378cc665b8 in MatProductSymbolic (mat=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795
#8  0x000015378d294473 in PCGAMGSquareGraph_GAMG (a_pc=0x2cc7500, Gmat1=0x1, Gmat2=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489
#9  0x000015378d27b83e in PCGAMGCoarsen_AGG (a_pc=0x2cc7500, a_Gmat1=0x1, agg_lists=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969
#10 0x000015378d294c73 in PCSetUp_GAMG (pc=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645
#11 0x000015378d215721 in PCSetUp (pc=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069
#12 0x000015378d216b82 in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484
#13 0x000015378eb91b2f in __pyx_pw_8petsc4py_5PETSc_2PC_45apply (__pyx_v_self=0x2cc7500, __pyx_args=0x1, __pyx_nargs=3237876524, __pyx_kwds=0x1) at src/petsc4py/PETSc.c:259082
#14 0x000015379e0a69f7 in method_vectorcall_FASTCALL_KEYWORDS (func=0x15378f302890, args=0x83b3218, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/descrobject.c:405
#15 0x000015379e11d435 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x83b3218, callable=0x15378f302890, tstate=0x23e0020) at ../Include/cpython/abstract.h:114
#16 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x83b3218, callable=0x15378f302890) at ../Include/cpython/abstract.h:123
#17 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7ffd6aee0390, tstate=<optimized out>) at ../Python/ceval.c:5867
#18 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:4198
#19 0x000015379e11b63b in _PyEval_EvalFrame (throwflag=0, f=0x83b3080, tstate=0x23e0020) at ../Include/internal/pycore_ceval.h:46
#20 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=4, kwnames=<optimized out>) at ../Python/ceval.c:5065
#21 0x000015378ee1e057 in __Pyx_PyObject_FastCallDict (func=<optimized out>, args=0x1, _nargs=<optimized out>, kwargs=<optimized out>) at src/petsc4py/PETSc.c:548022
#22 __pyx_f_8petsc4py_5PETSc_PCApply_Python (__pyx_v_pc=0x2cc7500, __pyx_v_x=0x1, __pyx_v_y=0xc0fe132c) at src/petsc4py/PETSc.c:31979
#23 0x000015378d216cba in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487
#24 0x000015378d4d153c in KSP_PCApply (ksp=0x2cc7500, x=0x1, y=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383
#25 0x000015378d4d1097 in KSPSolve_CG (ksp=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162

Let me know if there is anything further we can try to debug this issue.

Kind regards
Stephan Kramer

On 02/09/2023 01:58, Mark Adams wrote:

Fantastic!

I fixed a memory free problem. You should be OK now.
I am pretty sure you are good, but I would like to wait to get any feedback from you.
We should have a release at the end of the month and it would be nice to get this into it.

Thanks,
Mark

On Fri, Sep 1, 2023 at 7:07 AM Stephan Kramer <s.kra...@imperial.ac.uk> wrote:

Hi Mark

Sorry it took a while to report back. We have tried your branch but hit a few issues, some of which we're not entirely sure are related.

First, switching off minimum degree ordering, and then switching to the old version of aggressive coarsening, as you suggested, got us back to the coarsening behaviour that we had previously, but then we also observed an even further worsening of the iteration count: it had already gone up by 50% (with the newer main petsc), but now it was more than double that of "old" petsc. It took us a while to realize this was due to the default smoother changing from Cheby+SOR to Cheby+Jacobi. Switching this back to the old default as well, we get back to very similar coarsening levels (see below for more details if it is of interest) and iteration counts.
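The smoother change mentioned here concerns the GAMG level solvers. A minimal sketch of switching them back to Chebyshev+SOR via the options database (not code from the thread; the bare -mg_levels_ prefix is an assumption - inside a fieldsplit these options carry the corresponding solver prefix):

  #include <petscksp.h>

  /* Sketch: select the "old" Chebyshev + SOR level smoother for a GAMG-preconditioned KSP. */
  static PetscErrorCode SelectChebySORSmoother(KSP ksp)
  {
    PetscFunctionBeginUser;
    PetscCall(PetscOptionsSetValue(NULL, "-mg_levels_ksp_type", "chebyshev"));
    PetscCall(PetscOptionsSetValue(NULL, "-mg_levels_pc_type", "sor"));
    PetscCall(KSPSetFromOptions(ksp)); /* let the GAMG level solvers pick up the options */
    PetscFunctionReturn(PETSC_SUCCESS);
  }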
So that's all very good news. However, we also started seeing memory errors (double free or corruption) when we switched off the minimum degree ordering. Because this was with an earlier version of your branch, we then rebuilt, hoping this was just an earlier bug that had since been fixed, but then we were having MPI lock-up issues. We have now figured out that the MPI issues are completely unrelated - some combination of a newer MPI build and firedrake on our cluster, which also occurs using the main branches of everything. So, switching back to an older MPI build, we are hoping to now test your most recent version of adams/gamg-add-old-coarsening with these options and see whether the memory errors are still there. Will let you know.

Best wishes
Stephan Kramer

Coarsening details with various options for Level 6 of the test case:

In our original setup (using "old" petsc), we had:

        rows=516, cols=516, bs=6
        rows=12660, cols=12660, bs=6
        rows=346974, cols=346974, bs=6
        rows=19169670, cols=19169670, bs=3

Then with the newer main petsc we had:

        rows=666, cols=666, bs=6
        rows=7740, cols=7740, bs=6
        rows=34902, cols=34902, bs=6
        rows=736578, cols=736578, bs=6
        rows=19169670, cols=19169670, bs=3

Then on your branch with minimum_degree_ordering False:

        rows=504, cols=504, bs=6
        rows=2274, cols=2274, bs=6
        rows=11010, cols=11010, bs=6
        rows=35790, cols=35790, bs=6
        rows=430686, cols=430686, bs=6
        rows=19169670, cols=19169670, bs=3

And with minimum_degree_ordering False and use_aggressive_square_graph True:

        rows=498, cols=498, bs=6
        rows=12672, cols=12672, bs=6
        rows=346974, cols=346974, bs=6
        rows=19169670, cols=19169670, bs=3

So that is indeed pretty much back to what it was before.

On 31/08/2023 23:40, Mark Adams wrote:

Hi Stephan,

This branch is settling down: adams/gamg-add-old-coarsening
<https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening>
I made the old, not minimum degree, ordering the default but kept the new "aggressive" coarsening as the default, so I am hoping that just adding "-pc_gamg_use_aggressive_square_graph true" to your regression tests will get you back to where you were before.
Fingers crossed ... let me know if you have any success or not.

Thanks,
Mark

On Tue, Aug 15, 2023 at 1:45 PM Mark Adams <mfad...@lbl.gov> wrote:

Hi Stephan,

I have a branch that you can try: adams/gamg-add-old-coarsening
<https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening>

Things to test:
* First, verify that nothing unintended changed by reproducing your bad results with this branch (the defaults are the same).
* Try not using the minimum degree ordering that I suggested, with: -pc_gamg_use_minimum_degree_ordering false
  -- I am eager to see if that is the main problem.
* Go back to what I think is the old method: -pc_gamg_use_minimum_degree_ordering false -pc_gamg_use_aggressive_square_graph true

When we get back to where you were, I would like to try to get modern stuff working.
I did add a -pc_gamg_aggressive_mis_k <2>.
You could do another step of MIS coarsening with -pc_gamg_aggressive_mis_k 3.

Anyway, lots to look at, but, alas, AMG does have a lot of parameters.

Thanks,
Mark
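For reference, a minimal sketch of setting the branch options listed above programmatically rather than on the command line (not code from the thread; the option names are the ones Mark quotes, which at this point only exist on the adams/gamg-add-old-coarsening branch, and the empty options prefix is an assumption):

  #include <petscksp.h>

  /* Sketch: request the "old" GAMG coarsening path via the options named in the thread. */
  static PetscErrorCode UseOldGAMGCoarsening(KSP ksp)
  {
    PC pc;

    PetscFunctionBeginUser;
    PetscCall(KSPGetPC(ksp, &pc));
    PetscCall(PCSetType(pc, PCGAMG));
    PetscCall(PetscOptionsInsertString(NULL, "-pc_gamg_use_minimum_degree_ordering false "
                                             "-pc_gamg_use_aggressive_square_graph true"));
    PetscCall(KSPSetFromOptions(ksp)); /* PCGAMG picks these up during setup */
    PetscFunctionReturn(PETSC_SUCCESS);
  }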
On Mon, Aug 14, 2023 at 4:26 PM Mark Adams <mfad...@lbl.gov> wrote:

On Mon, Aug 14, 2023 at 11:03 AM Stephan Kramer <s.kra...@imperial.ac.uk> wrote:

> Many thanks for looking into this, Mark
>
>> My 3D tests were not that different and I see you lowered the threshold.
>> Note, you can set the threshold to zero, but your test is running so much differently than mine there is something else going on.
>> Note, the new, bad, coarsening rate of 30:1 is what we tend to shoot for in 3D.
>> So it is not clear what the problem is. Some questions:
>> * do you have a picture of this mesh to show me?
>
> It's just a standard hexahedral cubed sphere mesh with the refinement level giving the number of times each of the six sides have been subdivided: so Level_5 means 2^5 x 2^5 squares, which is extruded to 16 layers. So the total number of elements at Level_5 is 6 x 32 x 32 x 16 = 98304 hexes. And everything doubles in all 3 dimensions (so 2^3) going to the next Level.

I see, and I assume these are pretty stretched elements.

>> * what do you mean by Q1-Q2 elements?
>
> Q2-Q1, basically Taylor-Hood on hexes, so (tri)quadratic for velocity and (tri)linear for pressure.
> I guess you could argue we could/should just do good old geometric multigrid instead. More generally, we use this solver configuration a lot for tetrahedral Taylor-Hood (P2-P1), in particular also for our adaptive mesh runs - would it be worth seeing if we have the same performance issues with tetrahedral P2-P1?

No, you have a clear reproducer, if not minimal.
The first coarsening is very different.
I am working on this and I see that I added a heuristic for thin bodies where you order the vertices in greedy algorithms with minimum degree first.
This will tend to pick corners first, edges then faces, etc.
That may be the problem. I would like to understand it better (see below).

>> It would be nice to see if the new and old codes are similar without aggressive coarsening.
>> This was the intended change of the major change in this time frame, as you noticed.
>> If these jobs are easy to run, could you check that the old and new versions are similar with "-pc_gamg_square_graph 0" (and you only need one time step)?
>> All you need to do is check that the first coarse grid has about the same number of equations (large).
>
> Unfortunately we're seeing some memory errors when we use this option, and I'm not entirely clear whether we're just running out of memory and need to put it on a special queue.
>
> The run with square_graph 0 using new PETSc managed to get through one solve at level 5, and is giving the following mg levels:
>
>         rows=174, cols=174, bs=6
>           total: nonzeros=30276, allocated nonzeros=30276
>         --
>         rows=2106, cols=2106, bs=6
>           total: nonzeros=4238532, allocated nonzeros=4238532
>         --
>         rows=21828, cols=21828, bs=6
>           total: nonzeros=62588232, allocated nonzeros=62588232
>         --
>         rows=589824, cols=589824, bs=6
>           total: nonzeros=1082528928, allocated nonzeros=1082528928
>         --
>         rows=2433222, cols=2433222, bs=3
>           total: nonzeros=456526098, allocated nonzeros=456526098
>
> comparing with square_graph 100 with new PETSc:
>
>         rows=96, cols=96, bs=6
>           total: nonzeros=9216, allocated nonzeros=9216
>         --
>         rows=1440, cols=1440, bs=6
>           total: nonzeros=647856, allocated nonzeros=647856
>         --
>         rows=97242, cols=97242, bs=6
>           total: nonzeros=65656836, allocated nonzeros=65656836
>         --
>         rows=2433222, cols=2433222, bs=3
>           total: nonzeros=456526098, allocated nonzeros=456526098
>
> and old PETSc with square_graph 100:
>
>         rows=90, cols=90, bs=6
>           total: nonzeros=8100, allocated nonzeros=8100
>         --
>         rows=1872, cols=1872, bs=6
>           total: nonzeros=1234080, allocated nonzeros=1234080
>         --
>         rows=47652, cols=47652, bs=6
>           total: nonzeros=23343264, allocated nonzeros=23343264
>         --
>         rows=2433222, cols=2433222, bs=3
>           total: nonzeros=456526098, allocated nonzeros=456526098
>         --
>
> Unfortunately old PETSc with square_graph 0 did not complete a single solve before giving the memory error.

OK, thanks for trying.

I am working on this and I will give you a branch to test, but if you can rebuild PETSc here is a quick test that might fix your problem.
In src/ksp/pc/impls/gamg/agg.c you will see:

  PetscCall(PetscSortIntWithArray(nloc, degree, permute));

If you can comment this out in the new code and compare with the old, that might fix the problem.

Thanks,
Mark

>> BTW, I am starting to think I should add the old method back as an option.
>> I did not think this change would cause large differences.
>
> Yes, I think that would be much appreciated. Let us know if we can do any testing.
>
> Best wishes
> Stephan
>
>> Thanks,
>> Mark
>
>>> Note that we are providing the rigid body near nullspace, hence the bs=3 to bs=6.
>>> We have tried different values for the gamg_threshold but it doesn't really seem to significantly alter the coarsening amount in that first step.
>>> Do you have any suggestions for further things we should try/look at? Any feedback would be much appreciated.
>>>
>>> Best wishes
>>> Stephan Kramer
>>>
>>> Full logs including log_view timings available from https://github.com/stephankramer/petsc-scaling/
>>> In particular:
>>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat
>>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat
>>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat
>>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat
>>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat
>>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat
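As context for the "rigid body near nullspace, hence the bs=3 to bs=6" remark above: a minimal sketch (not taken from the thread; the coordinate Vec is assumed to hold the 3D nodal coordinates with block size 3) of how such a near nullspace is typically attached in PETSc, giving six rigid-body modes that GAMG then carries as six data columns on the coarse levels. In Firedrake this is usually supplied through the solver's near nullspace argument rather than called directly.

  #include <petscmat.h>

  /* Sketch: attach 3D rigid-body modes (3 translations + 3 rotations) as the near
     nullspace of the velocity block. "coords" is assumed to be a Vec of nodal
     coordinates with block size 3. */
  static PetscErrorCode AttachRigidBodyModes(Mat A, Vec coords)
  {
    MatNullSpace nearnull;

    PetscFunctionBeginUser;
    PetscCall(MatNullSpaceCreateRigidBody(coords, &nearnull)); /* builds the 6 modes */
    PetscCall(MatSetNearNullSpace(A, nearnull));               /* GAMG uses this when building the prolongator */
    PetscCall(MatNullSpaceDestroy(&nearnull));
    PetscFunctionReturn(PETSC_SUCCESS);
  }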