On Wed, Apr 26, 2017 at 7:30 PM, Barry Smith <[email protected]> wrote:
>
>    Yes, you asked for LU so it used LU!
>
>    Of course for smaller coarse grids and large numbers of processes this
> is very inefficient.
>
>    The default behavior for GAMG is probably what you want. In that case it is
> equivalent to -mg_coarse_pc_type bjacobi -mg_coarse_sub_pc_type lu. But GAMG
> tries hard

No, it just slams those puppies onto proc 0 :)

> to put all the coarse grid degrees of freedom on the first process and none on
> the rest, so you do end up with the exact equivalent of a direct solver.
> Try -ksp_view in that case.
>
>    There is also -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu.
> In that case it makes a copy of the coarse matrix on EACH process and each
> process does its own factorization and solve. This saves one phase of
> communication for each V cycle: since every process has the entire solution,
> it just grabs from itself the values it needs without communication.
>
>
> > On Apr 26, 2017, at 5:25 PM, Garth N. Wells <[email protected]> wrote:
> >
> > I'm a bit confused by the selection of the coarse grid solver for
> > multigrid. For the demo ksp/ex56, if I do:
> >
> >   mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg
> >     -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
> >
> > I see
> >
> >   Coarse grid solver -- level -------------------------------
> >     KSP Object: (mg_coarse_) 1 MPI processes
> >       type: preonly
> >       maximum iterations=10000, initial guess is zero
> >       tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >       left preconditioning
> >       using NONE norm type for convergence test
> >     PC Object: (mg_coarse_) 1 MPI processes
> >       type: lu
> >         out-of-place factorization
> >         tolerance for zero pivot 2.22045e-14
> >         matrix ordering: nd
> >         factor fill ratio given 5., needed 1.
> >           Factored matrix follows:
> >             Mat Object: 1 MPI processes
> >               type: seqaij
> >               rows=6, cols=6, bs=6
> >               package used to perform factorization: petsc
> >               total: nonzeros=36, allocated nonzeros=36
> >               total number of mallocs used during MatSetValues calls =0
> >                 using I-node routines: found 2 nodes, limit used is 5
> >       linear system matrix = precond matrix:
> >       Mat Object: 1 MPI processes
> >         type: seqaij
> >         rows=6, cols=6, bs=6
> >         total: nonzeros=36, allocated nonzeros=36
> >         total number of mallocs used during MatSetValues calls =0
> >           using I-node routines: found 2 nodes, limit used is 5
> >
> > which is what I expect. Increasing from 1 to 2 processes:
> >
> >   mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg
> >     -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
> >
> > I see
> >
> >   Coarse grid solver -- level -------------------------------
> >     KSP Object: (mg_coarse_) 2 MPI processes
> >       type: preonly
> >       maximum iterations=10000, initial guess is zero
> >       tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >       left preconditioning
> >       using NONE norm type for convergence test
> >     PC Object: (mg_coarse_) 2 MPI processes
> >       type: lu
> >         out-of-place factorization
> >         tolerance for zero pivot 2.22045e-14
> >         matrix ordering: natural
> >         factor fill ratio given 0., needed 0.
> >           Factored matrix follows:
> >             Mat Object: 2 MPI processes
> >               type: superlu_dist
> >               rows=6, cols=6
> >               package used to perform factorization: superlu_dist
> >               total: nonzeros=0, allocated nonzeros=0
> >               total number of mallocs used during MatSetValues calls =0
> >                 SuperLU_DIST run parameters:
> >                   Process grid nprow 2 x npcol 1
> >                   Equilibrate matrix TRUE
> >                   Matrix input mode 1
> >                   Replace tiny pivots FALSE
> >                   Use iterative refinement FALSE
> >                   Processors in row 2 col partition 1
> >                   Row permutation LargeDiag
> >                   Column permutation METIS_AT_PLUS_A
> >                   Parallel symbolic factorization FALSE
> >                   Repeated factorization SamePattern
> >       linear system matrix = precond matrix:
> >       Mat Object: 2 MPI processes
> >         type: mpiaij
> >         rows=6, cols=6, bs=6
> >         total: nonzeros=36, allocated nonzeros=36
> >         total number of mallocs used during MatSetValues calls =0
> >           using I-node (on process 0) routines: found 2 nodes, limit used is 5
> >
> > Note that the coarse grid is now using superlu_dist. Is the coarse
> > grid being solved in parallel?
> >
> > Garth
> >
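
For reference, an untested sketch of how the two alternatives Barry describes
above would be passed to the same ex56 run; the option names are exactly the
ones quoted above, only their combination on one command line is assumed here:

  # coarse grid: block Jacobi with an LU factorization of each local block
  # (what Barry says the GAMG default is equivalent to)
  mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg \
      -mg_coarse_ksp_type preonly \
      -mg_coarse_pc_type bjacobi -mg_coarse_sub_pc_type lu

  # coarse grid: each process factors and solves its own copy of the
  # coarse matrix, avoiding one communication phase per V cycle
  mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg \
      -mg_coarse_ksp_type preonly \
      -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu

With -ksp_view included, the output should show which solver each variant
actually ends up using on the coarse level.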
