Barry, we seem to get an error when you explicitly set this.

Garth, maybe to set the default explicitly you need to use -pc_type asm -sub_pc_type lu. That is the true default.
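Spelled out on the ex56 command line used in this thread, that would look something like the rough sketch below (assuming these options belong under the mg_coarse_ prefix; untested):

  mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view \
      -mg_coarse_pc_type asm -mg_coarse_sub_pc_type lu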
More below, but this is the error message:

17:46 knepley/feature-plasma-example *= ~/Codes/petsc/src/ksp/ksp/examples/tutorials$ /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu -mg_coarse_pc_factor_mat_solver_package petsc
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for possible LU and Cholesky solvers
[0]PETSC ERROR: MatSolverPackage petsc does not support matrix type mpiaij
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3658-g99fa2798da  GIT Date: 2017-04-25 12:56:20 -0500
[0]PETSC ERROR: ./ex56 on a arch-macosx-gnu-g named MarksMac-5.local by markadams Wed Apr 26 17:46:28 2017
[0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ COPTFLAGS="-g -O0 -mavx2" CXXOPTFLAGS="-g -O0 -mavx2" F

On Thu, Apr 27, 2017 at 1:59 AM, Garth N. Wells <[email protected]> wrote:

> On 27 April 2017 at 00:30, Barry Smith <[email protected]> wrote:
> >
> >    Yes, you asked for LU so it used LU!
> >
> >    Of course for smaller coarse grids and large numbers of processes
> > this is very inefficient.
> >
> >    The default behavior for GAMG is probably what you want. In that
> > case it is equivalent to -mg_coarse_pc_type bjacobi
> > -mg_coarse_sub_pc_type lu. But GAMG tries hard to put all the coarse
> > grid degrees of freedom on the first process and none on the rest, so
> > you do end up with the exact equivalent of a direct solver.
> > Try -ksp_view in that case.
> >
>
> Thanks, Barry.
>
> I'm struggling a little to understand the matrix data structure for
> the coarse grid. Is it just an mpiaij matrix, with all entries
> (usually) on one process?
>

Yes.

> Is there an options key prefix for the matrix on different levels?
> E.g., to turn on a viewer?
>

Something like -mg_level_1_ksp_view should work (run with -help to get the correct syntax).

> If I get GAMG to use more than one process for the coarse grid (a GAMG
> setting), can I get a parallel LU (exact) solver to solve it using
> only the processes that store parts of the coarse grid matrix?
>

No. We should make a sub-communicator for the active processes only, but I am not too motivated to do this, because the only reason it matters is if 1) a solver (i.e., the parallel direct solver) is lazy and puts reductions everywhere for no good reason, or 2) you use a Krylov solver (very uncommon). All of the communication in a non-Krylov solver is point-to-point, and there is no win that I know of with a sub-communicator. Note that the redundant coarse grid solver does use a sub-communicator, obviously, but I think it is hardwired to PETSC_COMM_SELF, though maybe not?

> Related to all this, do the parallel LU solvers internally
> re-distribute a matrix over the whole MPI communicator as part of
> their re-ordering phase?
>

They had better not! I doubt any solver would be that eager by default.

> Garth
>
> >    There is also -mg_coarse_pc_type redundant
> > -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the
> > coarse matrix on EACH process and each process does its own
> > factorization and solve. This saves one phase of the communication for
> > each V cycle, since every process has the entire solution; it just
> > grabs the values it needs from itself without communication.
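For reference, a rough, untested sketch of the three coarse-solve variants mentioned above, written out for the ex56 run used in this thread (the superlu_dist line assumes PETSc was configured with SuperLU_DIST):

  # GAMG's default spelled out: block Jacobi with LU on the (single nonempty) block
  mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view \
      -mg_coarse_pc_type bjacobi -mg_coarse_sub_pc_type lu

  # a genuinely parallel coarse-grid LU via an external package
  mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view \
      -mg_coarse_pc_type lu -mg_coarse_pc_factor_mat_solver_package superlu_dist

  # redundant: every process factors its own full copy of the coarse matrix
  mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view \
      -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu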
> >
> >
> >> On Apr 26, 2017, at 5:25 PM, Garth N. Wells <[email protected]> wrote:
> >>
> >> I'm a bit confused by the selection of the coarse grid solver for
> >> multigrid. For the demo ksp/ex56, if I do:
> >>
> >> mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg
> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
> >>
> >> I see
> >>
> >>   Coarse grid solver -- level -------------------------------
> >>     KSP Object: (mg_coarse_) 1 MPI processes
> >>       type: preonly
> >>       maximum iterations=10000, initial guess is zero
> >>       tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >>       left preconditioning
> >>       using NONE norm type for convergence test
> >>     PC Object: (mg_coarse_) 1 MPI processes
> >>       type: lu
> >>         out-of-place factorization
> >>         tolerance for zero pivot 2.22045e-14
> >>         matrix ordering: nd
> >>         factor fill ratio given 5., needed 1.
> >>           Factored matrix follows:
> >>             Mat Object: 1 MPI processes
> >>               type: seqaij
> >>               rows=6, cols=6, bs=6
> >>               package used to perform factorization: petsc
> >>               total: nonzeros=36, allocated nonzeros=36
> >>               total number of mallocs used during MatSetValues calls =0
> >>                 using I-node routines: found 2 nodes, limit used is 5
> >>       linear system matrix = precond matrix:
> >>       Mat Object: 1 MPI processes
> >>         type: seqaij
> >>         rows=6, cols=6, bs=6
> >>         total: nonzeros=36, allocated nonzeros=36
> >>         total number of mallocs used during MatSetValues calls =0
> >>           using I-node routines: found 2 nodes, limit used is 5
> >>
> >> which is what I expect. Increasing from 1 to 2 processes:
> >>
> >> mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg
> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
> >>
> >> I see
> >>
> >>   Coarse grid solver -- level -------------------------------
> >>     KSP Object: (mg_coarse_) 2 MPI processes
> >>       type: preonly
> >>       maximum iterations=10000, initial guess is zero
> >>       tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> >>       left preconditioning
> >>       using NONE norm type for convergence test
> >>     PC Object: (mg_coarse_) 2 MPI processes
> >>       type: lu
> >>         out-of-place factorization
> >>         tolerance for zero pivot 2.22045e-14
> >>         matrix ordering: natural
> >>         factor fill ratio given 0., needed 0.
> >>           Factored matrix follows:
> >>             Mat Object: 2 MPI processes
> >>               type: superlu_dist
> >>               rows=6, cols=6
> >>               package used to perform factorization: superlu_dist
> >>               total: nonzeros=0, allocated nonzeros=0
> >>               total number of mallocs used during MatSetValues calls =0
> >>                 SuperLU_DIST run parameters:
> >>                   Process grid nprow 2 x npcol 1
> >>                   Equilibrate matrix TRUE
> >>                   Matrix input mode 1
> >>                   Replace tiny pivots FALSE
> >>                   Use iterative refinement FALSE
> >>                   Processors in row 2 col partition 1
> >>                   Row permutation LargeDiag
> >>                   Column permutation METIS_AT_PLUS_A
> >>                   Parallel symbolic factorization FALSE
> >>                   Repeated factorization SamePattern
> >>       linear system matrix = precond matrix:
> >>       Mat Object: 2 MPI processes
> >>         type: mpiaij
> >>         rows=6, cols=6, bs=6
> >>         total: nonzeros=36, allocated nonzeros=36
> >>         total number of mallocs used during MatSetValues calls =0
> >>           using I-node (on process 0) routines: found 2 nodes, limit used is 5
> >>
> >> Note that the coarse grid is now using superlu_dist. Is the coarse
> >> grid being solved in parallel?
> >>
> >> Garth
> >
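To see what each multigrid level (and the coarse grid) is actually doing in the runs above, a sketch along the lines of Mark's viewer suggestion, assuming the standard mg_levels_<n>_ and mg_coarse_ option prefixes (verify the exact names with -help, as he says; untested):

  mpirun -np 2 ./ex56 -ne 16 -pc_type gamg \
      -mg_levels_1_ksp_view -mg_coarse_ksp_view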
