Pierre (moved to dev),

It looks like there is a subtle bug in the new MatFilter. My guess is that after the compression/filter, the communication buffers and lists need to be recomputed because the graph has changed. The Mat-Mat Mults then failed or hung because the communication requirements, as seen in the graph, did not match the cached communication lists. The old way just created a whole new matrix, which took care of that.

Mark
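For illustration only, a rough petsc4py sketch of the "create a whole new matrix" approach described above; the function name filter_graph and the plain drop tolerance are assumptions for the sketch and this is not the MatFilter/PCGAMG code itself. The point it shows: assembling the filtered entries into a fresh Mat forces the parallel communication structures to be rebuilt to match the new nonzero graph, which is what an in-place compression would have to redo explicitly.

    from petsc4py import PETSc

    def filter_graph(A, tol):
        """Return a new Mat keeping diagonal entries and off-diagonal entries with |a_ij| > tol."""
        B = PETSc.Mat().create(comm=A.comm)
        B.setSizes(A.getSizes())
        B.setType(A.getType())
        B.setUp()                      # no preallocation: fine for a sketch, slow at real sizes
        rstart, rend = A.getOwnershipRange()
        for i in range(rstart, rend):
            cols, vals = A.getRow(i)
            keep = [(c, v) for c, v in zip(cols, vals) if c == i or abs(v) > tol]
            if keep:
                B.setValues([i], [c for c, _ in keep], [v for _, v in keep])
        # Assembly builds the off-process communication lists for B's (new) graph,
        # so subsequent Mat-Mat products see consistent communication requirements.
        B.assemble()
        return B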
On Thu, Oct 5, 2023 at 8:51 PM Mark Adams <mfad...@lbl.gov> wrote:

Fantastic, it will get merged soon.

Thank you for your diligence and patience. This would have been a time bomb waiting to explode.

Mark

On Thu, Oct 5, 2023 at 7:23 PM Stephan Kramer <s.kra...@imperial.ac.uk> wrote:

Great, that seems to fix the issue indeed - i.e. on the branch with the low memory filtering switched off (by default) we no longer see the "inconsistent data" error or hangs, and going back to the square graph aggressive coarsening brings us back to the old performance. So we'd be keen to have that branch merged indeed.

Many thanks for your assistance with this
Stephan

On 05/10/2023 01:11, Mark Adams wrote:

Thanks Stephan,

It looks like the matrix is in a bad/incorrect state and the parallel Mat-Mat is waiting for messages that were not sent. A bug.

Can you try my branch, which is ready to merge: adams/gamg-fast-filter. We added a new filtering method in main that uses low memory, but I found it was slow, so this branch brings back the old filter code, used by default, and keeps the low memory version as an option. It is possible this low memory filtering messed up the internals of the Mat in some way. I hope this is it, but if not we can continue.

This MR also makes square graph the default. I have found it does create better aggregates and, on GPUs with Kokkos bug fixes from Junchao, Mat-Mat is fast (it might be slow on CPUs).

Mark

On Wed, Oct 4, 2023 at 12:30 AM Stephan Kramer <s.kra...@imperial.ac.uk> wrote:

Hi Mark

Thanks again for re-enabling the square graph aggressive coarsening option, which seems to have restored performance for most of our cases. Unfortunately we do have a remaining issue, which only seems to occur for the larger mesh size ("level 7", which has 6,389,890 vertices and we normally run on 1536 cpus): we either get a "Petsc has generated inconsistent data" error, or a hang - both when constructing the square graph matrix. So this is with the new -pc_gamg_aggressive_square_graph=true option; without the option there's no error, but of course we would get back to the worse performance.

Backtrace for the "inconsistent data" error.
Note this is actually just petsc main from 17 Sep, git 9a75acf6e50cfe213617e - so after your merge of adams/gamg-add-old-coarsening into main - with one unrelated commit from firedrake

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Petsc has generated inconsistent data
[0]PETSC ERROR: j 8 not equal to expected number of sends 9
[0]PETSC ERROR: Petsc Development GIT revision: v3.4.2-43104-ga3b76b71a1  GIT Date: 2023-09-18 10:26:04 +0100
[0]PETSC ERROR: stokes_cubed_sphere_7e3_A3_TS1.py on a named gadi-cpu-clx-0241.gadi.nci.org.au by sck551 Wed Oct 4 14:30:41 2023
[0]PETSC ERROR: Configure options --prefix=/tmp/firedrake-prefix --with-make-np=4 --with-debugging=0 --with-shared-libraries=1 --with-fortran-bindings=0 --with-zlib --with-c2html=0 --with-mpiexec=mpiexec --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --download-hdf5 --download-hypre --download-superlu_dist --download-ptscotch --download-suitesparse --download-pastix --download-hwloc --download-metis --download-scalapack --download-mumps --download-chaco --download-ml CFLAGS=-diag-disable=10441 CXXFLAGS=-diag-disable=10441
[0]PETSC ERROR: #1 PetscGatherMessageLengths2() at /jobfs/95504034.gadi-pbs/petsc/src/sys/utils/mpimesg.c:270
[0]PETSC ERROR: #2 MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() at /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1867
[0]PETSC ERROR: #3 MatProductSymbolic_AtB_MPIAIJ_MPIAIJ() at /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071
[0]PETSC ERROR: #4 MatProductSymbolic() at /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795
[0]PETSC ERROR: #5 PCGAMGSquareGraph_GAMG() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489
[0]PETSC ERROR: #6 PCGAMGCoarsen_AGG() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969
[0]PETSC ERROR: #7 PCSetUp_GAMG() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645
[0]PETSC ERROR: #8 PCSetUp() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069
[0]PETSC ERROR: #9 PCApply() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484
[0]PETSC ERROR: #10 PCApply() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487
[0]PETSC ERROR: #11 KSP_PCApply() at /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383
[0]PETSC ERROR: #12 KSPSolve_CG() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162
[0]PETSC ERROR: #13 KSPSolve_Private() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910
[0]PETSC ERROR: #14 KSPSolve() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082
[0]PETSC ERROR: #15 PCApply_FieldSplit_Schur() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c:1175
[0]PETSC ERROR: #16 PCApply() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487
[0]PETSC ERROR: #17 KSP_PCApply() at /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383
[0]PETSC ERROR: #18 KSPSolve_PREONLY() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/preonly/preonly.c:25
[0]PETSC ERROR: #19 KSPSolve_Private() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:910
[0]PETSC ERROR: #20 KSPSolve() at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/interface/itfunc.c:1082
[0]PETSC ERROR: #21 SNESSolve_KSPONLY() at /jobfs/95504034.gadi-pbs/petsc/src/snes/impls/ksponly/ksponly.c:49
[0]PETSC ERROR: #22 SNESSolve() at /jobfs/95504034.gadi-pbs/petsc/src/snes/interface/snes.c:4635

Last -info :pc messages:

[0] <pc:gamg> PCSetUp(): Setting up PC for first time
[0] <pc:gamg> PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: level 0) N=152175366, n data rows=3, n data cols=6, nnz/row (ave)=191, np=1536
[0] <pc:gamg> PCGAMGCreateGraph_AGG(): Filtering left 100. % edges in graph (1.588710e+07 1.765233e+06)
[0] <pc:gamg> PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: Square Graph on level 1
[0] <pc:gamg> fixAggregatesWithSquare(): isMPI = yes
[0] <pc:gamg> PCGAMGProlongator_AGG(): Stokes_fieldsplit_0_assembled_: New grid 380144 nodes
[0] <pc:gamg> PCGAMGOptProlongator_AGG(): Stokes_fieldsplit_0_assembled_: Smooth P0: max eigen=4.489376e+00 min=9.015236e-02 PC=jacobi
[0] <pc:gamg> PCGAMGOptProlongator_AGG(): Stokes_fieldsplit_0_assembled_: Smooth P0: level 0, cache spectra 0.0901524 4.48938
[0] <pc:gamg> PCGAMGCreateLevel_GAMG(): Stokes_fieldsplit_0_assembled_: Coarse grid reduction from 1536 to 1536 active processes
[0] <pc:gamg> PCSetUp_GAMG(): Stokes_fieldsplit_0_assembled_: 1) N=2280864, n data cols=6, nnz/row (ave)=503, 1536 active pes
[0] <pc:gamg> PCGAMGCreateGraph_AGG(): Filtering left 36.2891 % edges in graph (5.310360e+05 5.353000e+03)
[0] <pc:gamg> PCGAMGSquareGraph_GAMG(): Stokes_fieldsplit_0_assembled_: Square Graph on level 2

The hang (on a slightly different model configuration but on the same mesh and number of cores) seems to occur in the same location.
If I use gdb to attach to the running processes, it seems that on some cores it has somehow managed to fall out of the PCSetUp and is waiting in the first norm calculation in the outer CG iteration:

#0  0x000014cce9999119 in hmca_bcol_basesmuma_bcast_k_nomial_knownroot_progress () from /apps/hcoll/4.7.3202/lib/hcoll/hmca_bcol_basesmuma.so
#1  0x000014ccef2c2737 in _coll_ml_allreduce () from /apps/hcoll/4.7.3202/lib/libhcoll.so.1
#2  0x000014ccef5dd95b in mca_coll_hcoll_allreduce (sbuf=0x1, rbuf=0x7fff74ecbee8, count=1, dtype=0x14cd26ce6f80 <ompi_mpi_double>, op=0x14cd26cfbc20 <ompi_mpi_op_sum>, comm=0x3076fb0, module=0x43a0110) at /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/mca/coll/hcoll/coll_hcoll_ops.c:228
#3  0x000014cd26a1de28 in PMPI_Allreduce (sendbuf=0x1, recvbuf=<optimized out>, count=1, datatype=<optimized out>, op=0x14cd26cfbc20 <ompi_mpi_op_sum>, comm=0x3076fb0) at pallreduce.c:113
#4  0x000014cd271c9889 in VecNorm_MPI_Default (xin=<optimized out>, type=<optimized out>, z=<optimized out>, VecNorm_SeqFn=<optimized out>) at /jobfs/95504034.gadi-pbs/petsc/include/../src/vec/vec/impls/mpi/pvecimpl.h:168
#5  VecNorm_MPI (xin=0x14ccee1ddb80, type=3924123648, z=0x22d) at /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/impls/mpi/pvec2.c:39
#6  0x000014cd2718cddd in VecNorm (x=0x14ccee1ddb80, type=3924123648, val=0x22d) at /jobfs/95504034.gadi-pbs/petsc/src/vec/vec/interface/rvector.c:214
#7  0x000014cd27f5a0b9 in KSPSolve_CG (ksp=0x14ccee1ddb80) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:163
etc.

but with other cores still stuck at:

#0  0x000015375cf41e8a in ucp_worker_progress () from /apps/ucx/1.12.0/lib/libucp.so.0
#1  0x000015377d4bd57b in opal_progress () at /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/runtime/opal_progress.c:231
#2  0x000015377d4c3ba5 in ompi_sync_wait_mt (sync=sync@entry=0x7ffd6aedf6f0) at /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/opal/threads/wait_sync.c:85
#3  0x000015378bf7cf38 in ompi_request_default_wait_any (count=8, requests=0x8d465a0, index=0x7ffd6aedfa60, status=0x7ffd6aedfa10) at /jobfs/35226956.gadi-pbs/0/openmpi/4.0.7/source/openmpi-4.0.7/ompi/request/req_wait.c:124
#4  0x000015378bfc1b4b in PMPI_Waitany (count=8, requests=0x8d465a0, indx=0x7ffd6aedfa60, status=<optimized out>) at pwaitany.c:86
#5  0x000015378c88ef2c in MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ (P=0x2cc7500, A=0x1, fill=2.1219957934356005e-314, C=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:1884
#6  0x000015378c88dd4f in MatProductSymbolic_AtB_MPIAIJ_MPIAIJ (C=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:2071
#7  0x000015378cc665b8 in MatProductSymbolic (mat=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/mat/interface/matproduct.c:795
#8  0x000015378d294473 in PCGAMGSquareGraph_GAMG (a_pc=0x2cc7500, Gmat1=0x1, Gmat2=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:489
#9  0x000015378d27b83e in PCGAMGCoarsen_AGG (a_pc=0x2cc7500, a_Gmat1=0x1, agg_lists=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/agg.c:969
#10 0x000015378d294c73 in PCSetUp_GAMG (pc=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/impls/gamg/gamg.c:645
#11 0x000015378d215721 in PCSetUp (pc=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:1069
#12 0x000015378d216b82 in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:484
#13 0x000015378eb91b2f in __pyx_pw_8petsc4py_5PETSc_2PC_45apply (__pyx_v_self=0x2cc7500, __pyx_args=0x1, __pyx_nargs=3237876524, __pyx_kwds=0x1) at src/petsc4py/PETSc.c:259082
#14 0x000015379e0a69f7 in method_vectorcall_FASTCALL_KEYWORDS (func=0x15378f302890, args=0x83b3218, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/descrobject.c:405
#15 0x000015379e11d435 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x83b3218, callable=0x15378f302890, tstate=0x23e0020) at ../Include/cpython/abstract.h:114
#16 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x83b3218, callable=0x15378f302890) at ../Include/cpython/abstract.h:123
#17 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7ffd6aee0390, tstate=<optimized out>) at ../Python/ceval.c:5867
#18 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:4198
#19 0x000015379e11b63b in _PyEval_EvalFrame (throwflag=0, f=0x83b3080, tstate=0x23e0020) at ../Include/internal/pycore_ceval.h:46
#20 _PyEval_Vector (tstate=<optimized out>, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=4, kwnames=<optimized out>) at ../Python/ceval.c:5065
#21 0x000015378ee1e057 in __Pyx_PyObject_FastCallDict (func=<optimized out>, args=0x1, _nargs=<optimized out>, kwargs=<optimized out>) at src/petsc4py/PETSc.c:548022
#22 __pyx_f_8petsc4py_5PETSc_PCApply_Python (__pyx_v_pc=0x2cc7500, __pyx_v_x=0x1, __pyx_v_y=0xc0fe132c) at src/petsc4py/PETSc.c:31979
#23 0x000015378d216cba in PCApply (pc=0x2cc7500, x=0x1, y=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/pc/interface/precon.c:487
#24 0x000015378d4d153c in KSP_PCApply (ksp=0x2cc7500, x=0x1, y=0xc0fe132c) at /jobfs/95504034.gadi-pbs/petsc/include/petsc/private/kspimpl.h:383
#25 0x000015378d4d1097 in KSPSolve_CG (ksp=0x2cc7500) at /jobfs/95504034.gadi-pbs/petsc/src/ksp/ksp/impls/cg/cg.c:162

Let me know if there is anything further we can try to debug this issue.

Kind regards
Stephan Kramer

On 02/09/2023 01:58, Mark Adams wrote:

Fantastic!

I fixed a memory free problem. You should be OK now. I am pretty sure you are good, but I would like to wait to get any feedback from you. We should have a release at the end of the month and it would be nice to get this into it.

Thanks,
Mark

On Fri, Sep 1, 2023 at 7:07 AM Stephan Kramer <s.kra...@imperial.ac.uk> wrote:

Hi Mark

Sorry it took a while to report back. We have tried your branch but hit a few issues, some of which we're not entirely sure are related.

First switching off minimum degree ordering, and then switching to the old version of aggressive coarsening, as you suggested, got us back to the coarsening behaviour that we had previously, but then we also observed an even further worsening of the iteration count: it had previously gone up by 50% already (with the newer main petsc), but now was more than double that of "old" petsc. It took us a while to realize this was due to the default smoother changing from Cheby+SOR to Cheby+Jacobi. Switching this also back to the old default, we get back to very similar coarsening levels (see below for more details if it is of interest) and iteration counts.

So that's all very good news. However, we were also starting to see memory errors (double free or corruption) when we switched off the minimum degree ordering. Because this was at an earlier version of your branch we then rebuilt, hoping this was just an earlier bug that had been fixed, but then we were having MPI-lockup issues. We have now figured out the MPI issues are completely unrelated - some combination of a newer MPI build and firedrake on our cluster, which also occurs using main branches of everything. So, switching back to an older MPI build, we are hoping to now test your most recent version of adams/gamg-add-old-coarsening with these options and see whether the memory errors are still there. Will let you know.

Best wishes
Stephan Kramer

Coarsening details with various options for Level 6 of the test case:

In our original setup (using "old" petsc), we had:

  rows=516, cols=516, bs=6
  rows=12660, cols=12660, bs=6
  rows=346974, cols=346974, bs=6
  rows=19169670, cols=19169670, bs=3

Then with the newer main petsc we had:

  rows=666, cols=666, bs=6
  rows=7740, cols=7740, bs=6
  rows=34902, cols=34902, bs=6
  rows=736578, cols=736578, bs=6
  rows=19169670, cols=19169670, bs=3

Then on your branch with minimum_degree_ordering False:

  rows=504, cols=504, bs=6
  rows=2274, cols=2274, bs=6
  rows=11010, cols=11010, bs=6
  rows=35790, cols=35790, bs=6
  rows=430686, cols=430686, bs=6
  rows=19169670, cols=19169670, bs=3

And with minimum_degree_ordering False and use_aggressive_square_graph True:

  rows=498, cols=498, bs=6
  rows=12672, cols=12672, bs=6
  rows=346974, cols=346974, bs=6
  rows=19169670, cols=19169670, bs=3

So that is indeed pretty much back to what it was before.

On 31/08/2023 23:40, Mark Adams wrote:

Hi Stephan,

This branch is settling down: adams/gamg-add-old-coarsening
<https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening>

I made the old, not minimum degree, ordering the default but kept the new "aggressive" coarsening as the default, so I am hoping that just adding "-pc_gamg_use_aggressive_square_graph true" to your regression tests will get you back to where you were before.

Fingers crossed ... let me know if you have any success or not.

Thanks,
Mark
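For illustration, a minimal petsc4py sketch of setting these options programmatically before PCSetUp runs; in a Firedrake run like the one in the traces above they would more typically be passed on the command line or via solver_parameters, and any solver-specific option prefix (such as the Stokes_fieldsplit_0_assembled_ prefix visible in the -info output earlier) would need to be prepended. The option names are the ones quoted in the messages; everything else here is an assumed setup, not part of the original exchange.

    from petsc4py import PETSc

    opts = PETSc.Options()
    # old square-graph aggressive coarsening (option named in the message above)
    opts["pc_gamg_use_aggressive_square_graph"] = "true"
    # old, non minimum-degree, vertex ordering (option named in the message below)
    opts["pc_gamg_use_minimum_degree_ordering"] = "false"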
On Tue, Aug 15, 2023 at 1:45 PM Mark Adams <mfad...@lbl.gov> wrote:

Hi Stephan,

I have a branch that you can try: adams/gamg-add-old-coarsening
<https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening>

Things to test:
* First, verify that nothing unintended changed by reproducing your bad results with this branch (the defaults are the same).
* Try not using the minimum degree ordering that I suggested, with: -pc_gamg_use_minimum_degree_ordering false
  -- I am eager to see if that is the main problem.
* Go back to what I think is the old method: -pc_gamg_use_minimum_degree_ordering false -pc_gamg_use_aggressive_square_graph true

When we get back to where you were, I would like to try to get modern stuff working. I did add a -pc_gamg_aggressive_mis_k <2>. You could do another step of MIS coarsening with -pc_gamg_aggressive_mis_k 3.

Anyway, lots to look at but, alas, AMG does have a lot of parameters.

Thanks,
Mark

On Mon, Aug 14, 2023 at 4:26 PM Mark Adams <mfad...@lbl.gov> wrote:

On Mon, Aug 14, 2023 at 11:03 AM Stephan Kramer <s.kra...@imperial.ac.uk> wrote:

> Many thanks for looking into this, Mark
>
>> My 3D tests were not that different and I see you lowered the threshold.
>> Note, you can set the threshold to zero, but your test is running so much
>> differently than mine there is something else going on.
>> Note, the new, bad, coarsening rate of 30:1 is what we tend to shoot for in 3D.
>>
>> So it is not clear what the problem is. Some questions:
>>
>> * do you have a picture of this mesh to show me?
>
> It's just a standard hexahedral cubed sphere mesh with the refinement
> level giving the number of times each of the six sides have been
> subdivided: so Level_5 means 2^5 x 2^5 squares, which is extruded to 16
> layers. So the total number of elements at Level_5 is 6 x 32 x 32 x 16 =
> 98304 hexes. And everything doubles in all 3 dimensions (so 2^3) going
> to the next Level.

I see, and I assume these are pretty stretched elements.

>> * what do you mean by Q1-Q2 elements?
>
> Q2-Q1, basically Taylor-Hood on hexes, so (tri)quadratic for velocity
> and (tri)linear for pressure.
>
> I guess you could argue we could/should just do good old geometric
> multigrid instead. More generally, we do use this solver configuration a
> lot for tetrahedral Taylor-Hood (P2-P1), in particular also for our
> adaptive mesh runs - would it be worth seeing if we have the same
> performance issues with tetrahedral P2-P1?

No, you have a clear reproducer, if not minimal. The first coarsening is very different.
I am working on this and I see that I added a heuristic for thin bodies where you order the vertices in greedy algorithms with minimum degree first. This will tend to pick corners first, edges then faces, etc. That may be the problem. I would like to understand it better (see below).

>> It would be nice to see if the new and old codes are similar without
>> aggressive coarsening. This was the intended change of the major change
>> in this time frame, as you noticed.
>> If these jobs are easy to run, could you check that the old and new
>> versions are similar with "-pc_gamg_square_graph 0" (and you only need
>> one time step).
>> All you need to do is check that the first coarse grid has about the
>> same number of equations (large).
>
> Unfortunately we're seeing some memory errors when we use this option,
> and I'm not entirely clear whether we're just running out of memory and
> need to put it on a special queue.
>
> The run with square_graph 0 using new PETSc managed to get through one
> solve at level 5, and is giving the following mg levels:
>
>   rows=174, cols=174, bs=6
>     total: nonzeros=30276, allocated nonzeros=30276
>   --
>   rows=2106, cols=2106, bs=6
>     total: nonzeros=4238532, allocated nonzeros=4238532
>   --
>   rows=21828, cols=21828, bs=6
>     total: nonzeros=62588232, allocated nonzeros=62588232
>   --
>   rows=589824, cols=589824, bs=6
>     total: nonzeros=1082528928, allocated nonzeros=1082528928
>   --
>   rows=2433222, cols=2433222, bs=3
>     total: nonzeros=456526098, allocated nonzeros=456526098
>
> comparing with square_graph 100 with new PETSc
>
>   rows=96, cols=96, bs=6
>     total: nonzeros=9216, allocated nonzeros=9216
>   --
>   rows=1440, cols=1440, bs=6
>     total: nonzeros=647856, allocated nonzeros=647856
>   --
>   rows=97242, cols=97242, bs=6
>     total: nonzeros=65656836, allocated nonzeros=65656836
>   --
>   rows=2433222, cols=2433222, bs=3
>     total: nonzeros=456526098, allocated nonzeros=456526098
>
> and old PETSc with square_graph 100
>
>   rows=90, cols=90, bs=6
>     total: nonzeros=8100, allocated nonzeros=8100
>   --
>   rows=1872, cols=1872, bs=6
>     total: nonzeros=1234080, allocated nonzeros=1234080
>   --
>   rows=47652, cols=47652, bs=6
>     total: nonzeros=23343264, allocated nonzeros=23343264
>   --
>   rows=2433222, cols=2433222, bs=3
>     total: nonzeros=456526098, allocated nonzeros=456526098
>
> Unfortunately old PETSc with square_graph 0 did not complete a single
> solve before giving the memory error.

OK, thanks for trying.

I am working on this and I will give you a branch to test, but if you can rebuild PETSc here is a quick test that might fix your problem.

In src/ksp/pc/impls/gamg/agg.c you will see:

    PetscCall(PetscSortIntWithArray(nloc, degree, permute));

If you can comment this out in the new code and compare with the old, that might fix the problem.

Thanks,
Mark

>> BTW, I am starting to think I should add the old method back as an option.
>> I did not think this change would cause large differences.
>
> Yes, I think that would be much appreciated. Let us know if we can do
> any testing.
>
> Best wishes
> Stephan
>
>> Thanks,
>> Mark
>>
>>> Note that we are providing the rigid body near nullspace,
>>> hence the bs=3 to bs=6.
>>> We have tried different values for the gamg_threshold but it doesn't
>>> really seem to significantly alter the coarsening amount in that first
>>> step.
>>>
>>> Do you have any suggestions for further things we should try/look at?
>>> Any feedback would be much appreciated
>>>
>>> Best wishes
>>> Stephan Kramer
>>>
>>> Full logs including log_view timings available from
>>> https://github.com/stephankramer/petsc-scaling/
>>>
>>> In particular:
>>>
>>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat
>>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat
>>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat
>>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat
>>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat
>>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat
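As a footnote to the ordering heuristic discussed in that exchange, here is a toy Python sketch (purely illustrative, not the agg.c implementation) of greedy MIS-style aggregation with and without the minimum-degree visit order that the quoted PetscSortIntWithArray(nloc, degree, permute) call sets up. On a thin, strip-like graph the minimum-degree order starts from the low-degree end vertices (corners, then edges, then faces), which changes which vertices become aggregate roots and hence the coarsening.

    def greedy_mis(adjacency, use_minimum_degree_ordering=True):
        """adjacency: dict mapping vertex -> set of neighbour vertices."""
        vertices = list(adjacency)
        if use_minimum_degree_ordering:
            # visit low-degree vertices (corners, then edges, then faces) first
            vertices.sort(key=lambda v: len(adjacency[v]))
        state = {v: "undecided" for v in vertices}
        roots = []
        for v in vertices:
            if state[v] != "undecided":
                continue
            roots.append(v)              # v becomes an aggregate root
            state[v] = "root"
            for w in adjacency[v]:       # undecided neighbours join v's aggregate
                if state[w] == "undecided":
                    state[w] = "aggregated"
        return roots

    # 1x4 strip: minimum-degree ordering roots the end vertices, natural ordering does not
    strip = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    print(greedy_mis(strip, True))   # -> [0, 3]
    print(greedy_mis(strip, False))  # -> [0, 2]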