On 27 April 2017 at 13:45, Mark Adams <[email protected]> wrote:
> Barry, we seem to get an error when you explicitly set this.
>
> Garth, maybe to set the default explicitly you need to use -pc_type asm
> -sub_pc_type lu. That is the true default.
>
> More below, but this is the error message:
>
> 17:46 knepley/feature-plasma-example *=
> ~/Codes/petsc/src/ksp/ksp/examples/tutorials$
> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -np 2 ./ex56 -ne
> 16 -pc_type gamg -ksp_view -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
> -mg_coarse_pc_factor_mat_solver_package petsc
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for possible LU and Cholesky solvers
> [0]PETSC ERROR: MatSolverPackage petsc does not support matrix type mpiaij
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3658-g99fa2798da  GIT Date: 2017-04-25 12:56:20 -0500
> [0]PETSC ERROR: ./ex56 on a arch-macosx-gnu-g named MarksMac-5.local by markadams Wed Apr 26 17:46:28 2017
> [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ COPTFLAGS="-g -O0 -mavx2" CXXOPTFLAGS="-g -O0 -mavx2" F
>
>
> On Thu, Apr 27, 2017 at 1:59 AM, Garth N. Wells <[email protected]> wrote:
>>
>> On 27 April 2017 at 00:30, Barry Smith <[email protected]> wrote:
>> >
>> >   Yes, you asked for LU so it used LU!
>> >
>> >   Of course for smaller coarse grids and large numbers of processes
>> > this is very inefficient.
>> >
>> >   The default behavior for GAMG is probably what you want. In that case
>> > it is equivalent to -mg_coarse_pc_type bjacobi -mg_coarse_sub_pc_type lu.
>> > But GAMG tries hard to put all the coarse grid degrees of freedom on the
>> > first process and none on the rest, so you do end up with the exact
>> > equivalent of a direct solver. Try -ksp_view in that case.
>> >
>>
>> Thanks, Barry.
>>
>> I'm struggling a little to understand the matrix data structure for
>> the coarse grid. Is it just an mpiaij matrix, with all entries
>> (usually) on one process?
>
> Yes.
>
>> Is there an options key prefix for the matrix on different levels?
>> E.g., to turn on a viewer?
>
> Something like -mg_level_1_ksp_view should work (run with -help to get the
> correct syntax).
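
For the archives, I take it that means viewer runs along these lines
(untested; the exact per-level prefix, mg_levels_1_ versus mg_level_1_, is
what -help should confirm):

  # view the coarse-level solver (options prefix mg_coarse_)
  mpirun -np 2 ./ex56 -ne 16 -pc_type gamg -mg_coarse_ksp_view

  # view the solver on level 1 (prefix assumed here to be mg_levels_1_)
  mpirun -np 2 ./ex56 -ne 16 -pc_type gamg -mg_levels_1_ksp_view
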
Do the matrix operator(s) associated with the ksp have an options prefix?

>> If I get GAMG to use more than one process for the coarse grid (a GAMG
>> setting), can I get a parallel LU (exact) solver to solve it using
>> only the processes that store parts of the coarse grid matrix?
>
> No, we should make a sub communicator for the active processes only, but I
> am not too motivated to do this because the only reason that this matters
> is if 1) a solver (i.e., the parallel direct solver) is lazy and puts
> reductions everywhere for no good reason, or 2) you use a Krylov solver
> (very uncommon). All of the communication in a non-Krylov solver is point
> to point, and there is no win that I know of with a sub communicator.
>
> Note, the redundant coarse grid solver does use a subcommunicator,
> obviously, but I think it is hardwired to PETSC_COMM_SELF, but maybe not?
>
>> Related to all this, do the parallel LU solvers internally
>> re-distribute a matrix over the whole MPI communicator as part of
>> their re-ordering phase?
>
> They better not!

I did a test with MUMPS, and from the MUMPS diagnostics (memory use per
process) it appears that it does split the matrix across all processes.

Garth

> I doubt any solver would be that eager by default.
>
>> Garth
>>
>> > There is also -mg_coarse_pc_type redundant
>> > -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the
>> > coarse matrix on EACH process, and each process does its own
>> > factorization and solve. This saves one phase of the communication for
>> > each V cycle, since every process has the entire solution; it just
>> > grabs from itself the values it needs without communication.
>> >
>> >> On Apr 26, 2017, at 5:25 PM, Garth N. Wells <[email protected]> wrote:
>> >>
>> >> I'm a bit confused by the selection of the coarse grid solver for
>> >> multigrid. For the demo ksp/ex56, if I do:
>> >>
>> >> mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg
>> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
>> >>
>> >> I see
>> >>
>> >> Coarse grid solver -- level -------------------------------
>> >>   KSP Object: (mg_coarse_) 1 MPI processes
>> >>     type: preonly
>> >>     maximum iterations=10000, initial guess is zero
>> >>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >>     left preconditioning
>> >>     using NONE norm type for convergence test
>> >>   PC Object: (mg_coarse_) 1 MPI processes
>> >>     type: lu
>> >>       out-of-place factorization
>> >>       tolerance for zero pivot 2.22045e-14
>> >>       matrix ordering: nd
>> >>       factor fill ratio given 5., needed 1.
>> >>         Factored matrix follows:
>> >>           Mat Object: 1 MPI processes
>> >>             type: seqaij
>> >>             rows=6, cols=6, bs=6
>> >>             package used to perform factorization: petsc
>> >>             total: nonzeros=36, allocated nonzeros=36
>> >>             total number of mallocs used during MatSetValues calls =0
>> >>               using I-node routines: found 2 nodes, limit used is 5
>> >>     linear system matrix = precond matrix:
>> >>     Mat Object: 1 MPI processes
>> >>       type: seqaij
>> >>       rows=6, cols=6, bs=6
>> >>       total: nonzeros=36, allocated nonzeros=36
>> >>       total number of mallocs used during MatSetValues calls =0
>> >>         using I-node routines: found 2 nodes, limit used is 5
>> >>
>> >> which is what I expect.
>> >> Increasing from 1 to 2 processes:
>> >>
>> >> mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg
>> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
>> >>
>> >> I see
>> >>
>> >> Coarse grid solver -- level -------------------------------
>> >>   KSP Object: (mg_coarse_) 2 MPI processes
>> >>     type: preonly
>> >>     maximum iterations=10000, initial guess is zero
>> >>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>> >>     left preconditioning
>> >>     using NONE norm type for convergence test
>> >>   PC Object: (mg_coarse_) 2 MPI processes
>> >>     type: lu
>> >>       out-of-place factorization
>> >>       tolerance for zero pivot 2.22045e-14
>> >>       matrix ordering: natural
>> >>       factor fill ratio given 0., needed 0.
>> >>         Factored matrix follows:
>> >>           Mat Object: 2 MPI processes
>> >>             type: superlu_dist
>> >>             rows=6, cols=6
>> >>             package used to perform factorization: superlu_dist
>> >>             total: nonzeros=0, allocated nonzeros=0
>> >>             total number of mallocs used during MatSetValues calls =0
>> >>               SuperLU_DIST run parameters:
>> >>                 Process grid nprow 2 x npcol 1
>> >>                 Equilibrate matrix TRUE
>> >>                 Matrix input mode 1
>> >>                 Replace tiny pivots FALSE
>> >>                 Use iterative refinement FALSE
>> >>                 Processors in row 2 col partition 1
>> >>                 Row permutation LargeDiag
>> >>                 Column permutation METIS_AT_PLUS_A
>> >>                 Parallel symbolic factorization FALSE
>> >>                 Repeated factorization SamePattern
>> >>     linear system matrix = precond matrix:
>> >>     Mat Object: 2 MPI processes
>> >>       type: mpiaij
>> >>       rows=6, cols=6, bs=6
>> >>       total: nonzeros=36, allocated nonzeros=36
>> >>       total number of mallocs used during MatSetValues calls =0
>> >>         using I-node (on process 0) routines: found 2 nodes, limit used is 5
>> >>
>> >> Note that the coarse grid is now using superlu_dist. Is the coarse
>> >> grid being solved in parallel?
>> >>
>> >> Garth
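
To summarise the coarse-grid variants discussed above in one place (untested
as written here; the option names are as given by Barry and Mark):

  # default-equivalent: block Jacobi with LU on each block; GAMG tries hard
  # to put all coarse degrees of freedom on the first process, so this ends
  # up equivalent to a direct solve
  mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg \
      -mg_coarse_pc_type bjacobi -mg_coarse_sub_pc_type lu

  # explicit LU on the coarse grid; on more than one process PETSc selects a
  # parallel factorization package (superlu_dist above), since the built-in
  # petsc LU does not support mpiaij
  mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg \
      -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu

  # redundant: each process factors and solves its own full copy of the
  # coarse matrix, saving one communication phase per V-cycle
  mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg \
      -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu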
