> On Apr 27, 2017, at 8:27 AM, Garth N. Wells <[email protected]> wrote:
>
> On 27 April 2017 at 13:45, Mark Adams <[email protected]> wrote:
>> Barry, we seem to get an error when you explicitly set this.
>>
>> Garth, maybe to set the default explicitly you need to use -pc_type asm -sub_pc_type lu. That is the true default.
>>
>> More below, but this is the error message:
>>
>> 17:46 knepley/feature-plasma-example *= ~/Codes/petsc/src/ksp/ksp/examples/tutorials$
>> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu -mg_coarse_pc_factor_mat_solver_package petsc
>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for possible LU and Cholesky solvers
>> [0]PETSC ERROR: MatSolverPackage petsc does not support matrix type mpiaij
>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3658-g99fa2798da GIT Date: 2017-04-25 12:56:20 -0500
>> [0]PETSC ERROR: ./ex56 on a arch-macosx-gnu-g named MarksMac-5.local by markadams Wed Apr 26 17:46:28 2017
>> [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ COPTFLAGS="-g -O0 -mavx2" CXXOPTFLAGS="-g -O0 -mavx2" F
>>
>> On Thu, Apr 27, 2017 at 1:59 AM, Garth N. Wells <[email protected]> wrote:
>>>
>>> On 27 April 2017 at 00:30, Barry Smith <[email protected]> wrote:
>>>>
>>>> Yes, you asked for LU, so it used LU!
>>>>
>>>> Of course, for smaller coarse grids and large numbers of processes this is very inefficient.
>>>>
>>>> The default behavior for GAMG is probably what you want. In that case it is equivalent to -mg_coarse_pc_type bjacobi -mg_coarse_sub_pc_type lu. But GAMG tries hard to put all the coarse grid degrees of freedom on the first process and none on the rest, so you do end up with the exact equivalent of a direct solver. Try -ksp_view in that case.
>>>
>>> Thanks, Barry.
>>>
>>> I'm struggling a little to understand the matrix data structure for the coarse grid. Is it just an mpiaij matrix, with all entries (usually) on one process?
>>
>> Yes.
>>
>>> Is there an options key prefix for the matrix on different levels? E.g., to turn on a viewer?
>>
>> Something like -mg_level_1_ksp_view should work (run with -help to get the correct syntax).
>
> Do the matrix operator(s) associated with the KSP have an options prefix?

No, because the matrices are created independently of the KSP/PC infrastructure. You can use -mg_coarse_ksp_view_pmat to print the matrix for just the coarse level, and do things like -mg_coarse_ksp_view_pmat ::ascii_info to display information about the matrix.
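For concreteness, here is a sketch of how those viewer options might be attached to the ex56 runs in this thread. This is untested; the option names above come straight from the discussion, but the exact spellings can be confirmed by running with -help.

# print the coarse-level preconditioner matrix entries
mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -mg_coarse_ksp_view_pmat

# print only summary information (type, size, nonzero counts) about the coarse-level matrix
mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -mg_coarse_ksp_view_pmat ::ascii_info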
>>> If I get GAMG to use more than one process for the coarse grid (a GAMG setting), can I get a parallel LU (exact) solver to solve it using only the processes that store parts of the coarse grid matrix?
>>
>> No, we should make a subcommunicator for the active processes only, but I am not too motivated to do this because the only reason it matters is if 1) a solver (i.e., the parallel direct solver) is lazy and puts reductions everywhere for no good reason, or 2) you use a Krylov solver (very uncommon). All of the communication in a non-Krylov solver is point to point, and there is no win that I know of with a subcommunicator.
>>
>> Note, the redundant coarse grid solver does use a subcommunicator, obviously, but I think it is hardwired to PETSC_COMM_SELF, but maybe not?
>>
>>> Related to all this, do the parallel LU solvers internally redistribute the matrix over the whole MPI communicator as part of their reordering phase?
>>
>> They better not!
>
> I did a test with MUMPS, and from the MUMPS diagnostics (memory use per process) it appears that it does split the matrix across all processes.
>
> Garth
>
>> I doubt any solver would be that eager by default.
>>
>>> Garth
>>>
>>>> There is also -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the coarse matrix on EACH process, and each process does its own factorization and solve. This saves one phase of communication for each V-cycle, since every process has the entire solution and just grabs the values it needs from itself without communication.
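As a concrete sketch (untested), that redundant coarse solve would look like this for the ex56 example used in this thread; since each process factors its own sequential copy of the small coarse matrix, PETSc's own (sequential) LU should suffice and no external parallel solver is involved:

# every process gets a full copy of the coarse matrix and factors it locally with LU
mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu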
>>>>> On Apr 26, 2017, at 5:25 PM, Garth N. Wells <[email protected]> wrote:
>>>>>
>>>>> I'm a bit confused by the selection of the coarse grid solver for multigrid. For the demo ksp/ex56, if I do:
>>>>>
>>>>> mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
>>>>>
>>>>> I see
>>>>>
>>>>> Coarse grid solver -- level -------------------------------
>>>>>   KSP Object: (mg_coarse_) 1 MPI processes
>>>>>     type: preonly
>>>>>     maximum iterations=10000, initial guess is zero
>>>>>     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>>>     left preconditioning
>>>>>     using NONE norm type for convergence test
>>>>>   PC Object: (mg_coarse_) 1 MPI processes
>>>>>     type: lu
>>>>>       out-of-place factorization
>>>>>       tolerance for zero pivot 2.22045e-14
>>>>>       matrix ordering: nd
>>>>>       factor fill ratio given 5., needed 1.
>>>>>         Factored matrix follows:
>>>>>           Mat Object: 1 MPI processes
>>>>>             type: seqaij
>>>>>             rows=6, cols=6, bs=6
>>>>>             package used to perform factorization: petsc
>>>>>             total: nonzeros=36, allocated nonzeros=36
>>>>>             total number of mallocs used during MatSetValues calls =0
>>>>>               using I-node routines: found 2 nodes, limit used is 5
>>>>>     linear system matrix = precond matrix:
>>>>>     Mat Object: 1 MPI processes
>>>>>       type: seqaij
>>>>>       rows=6, cols=6, bs=6
>>>>>       total: nonzeros=36, allocated nonzeros=36
>>>>>       total number of mallocs used during MatSetValues calls =0
>>>>>         using I-node routines: found 2 nodes, limit used is 5
>>>>>
>>>>> which is what I expect. Increasing from 1 to 2 processes:
>>>>>
>>>>> mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
>>>>>
>>>>> I see
>>>>>
>>>>> Coarse grid solver -- level -------------------------------
>>>>>   KSP Object: (mg_coarse_) 2 MPI processes
>>>>>     type: preonly
>>>>>     maximum iterations=10000, initial guess is zero
>>>>>     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>>>     left preconditioning
>>>>>     using NONE norm type for convergence test
>>>>>   PC Object: (mg_coarse_) 2 MPI processes
>>>>>     type: lu
>>>>>       out-of-place factorization
>>>>>       tolerance for zero pivot 2.22045e-14
>>>>>       matrix ordering: natural
>>>>>       factor fill ratio given 0., needed 0.
>>>>>         Factored matrix follows:
>>>>>           Mat Object: 2 MPI processes
>>>>>             type: superlu_dist
>>>>>             rows=6, cols=6
>>>>>             package used to perform factorization: superlu_dist
>>>>>             total: nonzeros=0, allocated nonzeros=0
>>>>>             total number of mallocs used during MatSetValues calls =0
>>>>>               SuperLU_DIST run parameters:
>>>>>                 Process grid nprow 2 x npcol 1
>>>>>                 Equilibrate matrix TRUE
>>>>>                 Matrix input mode 1
>>>>>                 Replace tiny pivots FALSE
>>>>>                 Use iterative refinement FALSE
>>>>>                 Processors in row 2 col partition 1
>>>>>                 Row permutation LargeDiag
>>>>>                 Column permutation METIS_AT_PLUS_A
>>>>>                 Parallel symbolic factorization FALSE
>>>>>                 Repeated factorization SamePattern
>>>>>     linear system matrix = precond matrix:
>>>>>     Mat Object: 2 MPI processes
>>>>>       type: mpiaij
>>>>>       rows=6, cols=6, bs=6
>>>>>       total: nonzeros=36, allocated nonzeros=36
>>>>>       total number of mallocs used during MatSetValues calls =0
>>>>>         using I-node (on process 0) routines: found 2 nodes, limit used is 5
>>>>>
>>>>> Note that the coarse grid is now using superlu_dist. Is the coarse grid being solved in parallel?
>>>>>
>>>>> Garth
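For completeness, a sketch (untested) of the block Jacobi form suggested earlier in the thread, which keeps an exact coarse solve with PETSc's own sequential LU even on two processes. Because GAMG tries to put all coarse grid degrees of freedom on the first process, the only nonempty Jacobi block is the entire coarse matrix, and each block is a sequential (seqaij) matrix, so the "MatSolverPackage petsc does not support matrix type mpiaij" error from the run at the top of the thread should not arise:

# coarse solve via block Jacobi with a sequential LU per block
# (with GAMG, only process 0 has a nonempty block, so this is still an exact solve)
mpiexec -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg -mg_coarse_pc_type bjacobi -mg_coarse_sub_pc_type lu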
