Hi Mark,

Thanks for your reply.
On Wed, Apr 12, 2017 at 9:16 AM, Mark Adams <mfad...@lbl.gov> wrote:

> The problem comes from setting the number of MG levels (-pc_mg_levels 2).
> Not your fault, it looks like the GAMG logic is faulty, in your version at
> least.

What I want is for GAMG to coarsen the fine matrix once and then stop. I do
not see any benefit to having more levels when the number of processors is
small.

> GAMG will force the coarsest grid to one processor by default, in newer
> versions. You can override the default with:
>
>   -pc_gamg_use_parallel_coarse_grid_solver
>
> Your coarse grid solver is ASM with these 37 equations per process and
> 512 processes. That is bad.

Why is this bad? Is the subdomain problem too small?

> Note, you could run this on one process to see the proper convergence
> rate.

The convergence rate of which part? The coarse solver, or the subdomain
solver?

> You can fix this with parameters:
>
>   -pc_gamg_process_eq_limit <50>: Limit (goal) on number of equations
>     per process on coarse grids (PCGAMGSetProcEqLim)
>   -pc_gamg_coarse_eq_limit <50>: Limit on number of equations for the
>     coarse grid (PCGAMGSetCoarseEqLim)
>
> If you really want two levels then set something like
> -pc_gamg_coarse_eq_limit 18145 (or higher).

Could we instead have something like: make the coarse problem 1/8 as large
as the original problem? Otherwise, this number is just problem dependent.

> You can run with -info and grep on GAMG and you will see meta-data for
> each level. You should see "npe=1" for the coarsest, last, grid. Or use a
> parallel direct solver.

I will try.

> Note, you should not see much degradation as you increase the number of
> levels. 18145 eqs on a 3D problem will probably be noticeable. I generally
> aim for about 3000.

It should be fine as long as the coarse problem is solved by a parallel
solver.
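To keep these straight for myself, here is roughly how I would combine the
options above (a sketch only; "./myapp" is a placeholder for our executable,
and the limit values are illustrative, not tuned):

  # Let GAMG pick the number of levels, but bound the coarse-problem size:
  ./myapp -pc_type gamg \
          -pc_gamg_process_eq_limit 50 \
          -pc_gamg_coarse_eq_limit 3000

  # Or force exactly two levels and keep the coarse solve parallel
  # (the override flag only exists in newer versions, per your note):
  ./myapp -pc_type gamg -pc_mg_levels 2 \
          -pc_gamg_coarse_eq_limit 18145 \
          -pc_gamg_use_parallel_coarse_grid_solver

  # Inspect the per-level sizes and process counts:
  ./myapp -info | grep GAMG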
Fande,

> On Mon, Apr 10, 2017 at 12:17 PM, Kong, Fande <fande.k...@inl.gov> wrote:
>
>> On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams <mfad...@lbl.gov> wrote:
>>
>>> You seem to have two levels here and 3M eqs on the fine grid and 37 on
>>> the coarse grid.
>>
>> 37 is on the subdomain.
>>
>> rows=18145, cols=18145 on the entire coarse grid.
>>
>>> I don't understand that.
>>>
>>> You are also calling the AMG setup a lot, but not spending much time
>>> in it. Try running with -info and grep on "GAMG".
>>>
>>> On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande <fande.k...@inl.gov> wrote:
>>> > Thanks, Barry.
>>> >
>>> > It works.
>>> >
>>> > GAMG is three times better than ASM in terms of the number of linear
>>> > iterations, but it is five times slower than ASM. Any suggestions to
>>> > improve the performance of GAMG? Log files are attached.
>>> >
>>> > Fande,
>>> >
>>> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <bsm...@mcs.anl.gov>
>>> > wrote:
>>> >>
>>> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande <fande.k...@inl.gov> wrote:
>>> >> >
>>> >> > Thanks, Mark and Barry,
>>> >> >
>>> >> > It works pretty well in terms of the number of linear iterations
>>> >> > (using "-pc_gamg_sym_graph true"), but it is horrible in the
>>> >> > compute time. I am using the two-level method via "-pc_mg_levels 2".
>>> >> > The reason the compute time is larger than with other
>>> >> > preconditioning options is that a matrix-free method is used on the
>>> >> > fine level, and in my particular problem the function evaluation is
>>> >> > expensive.
>>> >> >
>>> >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free
>>> >> > Newton method, but I do not think I want to make the
>>> >> > preconditioning part matrix-free. Do you guys know how to turn off
>>> >> > the matrix-free method for GAMG?
>>> >>
>>> >>    -pc_use_amat false
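If I understand Barry's flag correctly, the full combination for
Jacobian-free Newton with an assembled preconditioning matrix would look
something like this (a sketch; "./myapp" stands in for our executable, and
I have not verified the behavior on our PETSc version):

  # Amat is applied matrix-free (MFFD), while the assembled Pmat is what
  # GAMG coarsens; -pc_use_amat false keeps the MG cycle on Pmat as well:
  ./myapp -snes_mf_operator 1 \
          -pc_type gamg -pc_mg_levels 2 \
          -pc_use_amat false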
>>> >> >
>>> >> > Here is the detailed solver:
>>> >> >
>>> >> > SNES Object: 384 MPI processes
>>> >> >   type: newtonls
>>> >> >   maximum iterations=200, maximum function evaluations=10000
>>> >> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
>>> >> >   total number of linear solver iterations=20
>>> >> >   total number of function evaluations=166
>>> >> >   norm schedule ALWAYS
>>> >> >   SNESLineSearch Object: 384 MPI processes
>>> >> >     type: bt
>>> >> >       interpolation: cubic
>>> >> >       alpha=1.000000e-04
>>> >> >     maxstep=1.000000e+08, minlambda=1.000000e-12
>>> >> >     tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>>> >> >     maximum iterations=40
>>> >> >   KSP Object: 384 MPI processes
>>> >> >     type: gmres
>>> >> >       GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>> >> >       GMRES: happy breakdown tolerance 1e-30
>>> >> >     maximum iterations=100, initial guess is zero
>>> >> >     tolerances: relative=0.001, absolute=1e-50, divergence=10000.
>>> >> >     right preconditioning
>>> >> >     using UNPRECONDITIONED norm type for convergence test
>>> >> >   PC Object: 384 MPI processes
>>> >> >     type: gamg
>>> >> >       MG: type is MULTIPLICATIVE, levels=2 cycles=v
>>> >> >         Cycles per PCApply=1
>>> >> >         Using Galerkin computed coarse grid matrices
>>> >> >         GAMG specific options
>>> >> >           Threshold for dropping small values from graph 0.
>>> >> >           AGG specific options
>>> >> >             Symmetric graph true
>>> >> >     Coarse grid solver -- level -------------------------------
>>> >> >       KSP Object: (mg_coarse_) 384 MPI processes
>>> >> >         type: preonly
>>> >> >         maximum iterations=10000, initial guess is zero
>>> >> >         tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>> >> >         left preconditioning
>>> >> >         using NONE norm type for convergence test
>>> >> >       PC Object: (mg_coarse_) 384 MPI processes
>>> >> >         type: bjacobi
>>> >> >           block Jacobi: number of blocks = 384
>>> >> >           Local solve is same for all blocks, in the following KSP and PC objects:
>>> >> >         KSP Object: (mg_coarse_sub_) 1 MPI processes
>>> >> >           type: preonly
>>> >> >           maximum iterations=1, initial guess is zero
>>> >> >           tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>> >> >           left preconditioning
>>> >> >           using NONE norm type for convergence test
>>> >> >         PC Object: (mg_coarse_sub_) 1 MPI processes
>>> >> >           type: lu
>>> >> >             LU: out-of-place factorization
>>> >> >             tolerance for zero pivot 2.22045e-14
>>> >> >             using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>>> >> >             matrix ordering: nd
>>> >> >             factor fill ratio given 5., needed 1.31367
>>> >> >               Factored matrix follows:
>>> >> >                 Mat Object: 1 MPI processes
>>> >> >                   type: seqaij
>>> >> >                   rows=37, cols=37
>>> >> >                   package used to perform factorization: petsc
>>> >> >                   total: nonzeros=913, allocated nonzeros=913
>>> >> >                   total number of mallocs used during MatSetValues calls =0
>>> >> >                     not using I-node routines
>>> >> >           linear system matrix = precond matrix:
>>> >> >           Mat Object: 1 MPI processes
>>> >> >             type: seqaij
>>> >> >             rows=37, cols=37
>>> >> >             total: nonzeros=695, allocated nonzeros=695
>>> >> >             total number of mallocs used during MatSetValues calls =0
>>> >> >               not using I-node routines
>>> >> >         linear system matrix = precond matrix:
>>> >> >         Mat Object: 384 MPI processes
>>> >> >           type: mpiaij
>>> >> >           rows=18145, cols=18145
>>> >> >           total: nonzeros=1709115, allocated nonzeros=1709115
>>> >> >           total number of mallocs used during MatSetValues calls =0
>>> >> >             not using I-node (on process 0) routines
>>> >> >     Down solver (pre-smoother) on level 1 -------------------------------
>>> >> >       KSP Object: (mg_levels_1_) 384 MPI processes
>>> >> >         type: chebyshev
>>> >> >           Chebyshev: eigenvalue estimates:  min = 0.133339, max = 1.46673
>>> >> >           Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
>>> >> >           KSP Object: (mg_levels_1_esteig_) 384 MPI processes
>>> >> >             type: gmres
>>> >> >               GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>> >> >               GMRES: happy breakdown tolerance 1e-30
>>> >> >             maximum iterations=10, initial guess is zero
>>> >> >             tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
>>> >> >             left preconditioning
>>> >> >             using PRECONDITIONED norm type for convergence test
>>> >> >         maximum iterations=2
>>> >> >         tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>> >> >         left preconditioning
>>> >> >         using nonzero initial guess
>>> >> >         using NONE norm type for convergence test
>>> >> >       PC Object: (mg_levels_1_) 384 MPI processes
>>> >> >         type: sor
>>> >> >           SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>>> >> >         linear system matrix followed by preconditioner matrix:
>>> >> >         Mat Object: 384 MPI processes
>>> >> >           type: mffd
>>> >> >           rows=3020875, cols=3020875
>>> >> >             Matrix-free approximation:
>>> >> >               err=1.49012e-08 (relative error in function evaluation)
>>> >> >               Using wp compute h routine
>>> >> >                   Does not compute normU
>>> >> >         Mat Object: () 384 MPI processes
>>> >> >           type: mpiaij
>>> >> >           rows=3020875, cols=3020875
>>> >> >           total: nonzeros=215671710, allocated nonzeros=241731750
>>> >> >           total number of mallocs used during MatSetValues calls =0
>>> >> >             not using I-node (on process 0) routines
>>> >> >     Up solver (post-smoother) same as down solver (pre-smoother)
>>> >> >   linear system matrix followed by preconditioner matrix:
>>> >> >   Mat Object: 384 MPI processes
>>> >> >     type: mffd
>>> >> >     rows=3020875, cols=3020875
>>> >> >       Matrix-free approximation:
>>> >> >         err=1.49012e-08 (relative error in function evaluation)
>>> >> >         Using wp compute h routine
>>> >> >             Does not compute normU
>>> >> >   Mat Object: () 384 MPI processes
>>> >> >     type: mpiaij
>>> >> >     rows=3020875, cols=3020875
>>> >> >     total: nonzeros=215671710, allocated nonzeros=241731750
>>> >> >     total number of mallocs used during MatSetValues calls =0
>>> >> >       not using I-node (on process 0) routines
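Looking at the view above, the coarse solve is block Jacobi with a
sequential LU on each of the 384 blocks. If I follow Mark's earlier
suggestion of a parallel direct solver, I would guess at options like the
following (assuming our PETSc build includes SuperLU_DIST; option names are
from the 3.7-era interface and untested by me):

  # Solve the 18145x18145 coarse problem with one parallel direct solve
  # instead of 384 independent block LU solves:
  ./myapp -mg_coarse_ksp_type preonly \
          -mg_coarse_pc_type lu \
          -mg_coarse_pc_factor_mat_solver_package superlu_dist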
>>> >> >
>>> >> > Fande,
>>> >> >
>>> >> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <mfad...@lbl.gov> wrote:
>>> >> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>> >> > >
>>> >> > >> Does this mean that GAMG works for the symmetrical matrix only?
>>> >> > >
>>> >> > >    No, it means that for non-symmetric nonzero structure you need
>>> >> > > the extra flag. So use the extra flag. The reason we don't always
>>> >> > > use the flag is because it adds extra cost and isn't needed if the
>>> >> > > matrix already has a symmetric nonzero structure.
>>> >> >
>>> >> > BTW, if you have symmetric non-zero structure you can just set
>>> >> > '-pc_gamg_threshold -1.0'; note the "or" in the message.
>>> >> >
>>> >> > If you want to mess with the threshold then you need to use the
>>> >> > symmetrized flag.
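To summarize my reading of that last exchange (a sketch; the threshold
value is illustrative and untested on our build):

  # Non-symmetric nonzero structure, or any use of the drop threshold,
  # needs the symmetrized graph (at some extra setup cost):
  ./myapp -pc_type gamg -pc_gamg_sym_graph true -pc_gamg_threshold 0.05

  # Symmetric nonzero structure: skip the flag and disable dropping:
  ./myapp -pc_type gamg -pc_gamg_threshold -1.0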