> On Apr 27, 2017, at 7:45 AM, Mark Adams <[email protected]> wrote:
> 
> Barry, we seem to get an error when you explicitly set this. 

  Of course you get an error: you are asking PETSc to do a parallel LU, and 
PETSc does NOT have a parallel LU, as you well know. How could you possibly 
think this would work?

   Note that if you have, for example, superlu_dist or mumps installed and do 
NOT pass -mg_coarse_pc_factor_mat_solver_package petsc, then it will default to 
one of the parallel solvers automatically (it just looks through the list of 
installed solvers that support that matrix type and picks the first one).
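   For example, assuming superlu_dist was included in your PETSc build, an 
untested sketch based on the ex56 command line below

      mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view \
        -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu \
        -mg_coarse_pc_factor_mat_solver_package superlu_dist

should run the coarse-grid LU through superlu_dist in parallel.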



> 
> Garth, maybe to set the default explicitly you need to use -pc_type asm 
> -sub_pc_type lu. That is the true default.
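> (Presumably with the coarse-grid prefix if you only want to change the coarse 
> solve, i.e. something like -mg_coarse_pc_type asm -mg_coarse_sub_pc_type lu; 
> untested.)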
> 
> More below, but this is the error message:
> 
> 17:46 knepley/feature-plasma-example *= 
> ~/Codes/petsc/src/ksp/ksp/examples/tutorials$ 
> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -np 2 ./ex56 -ne 
> 16 -pc_type gamg -ksp_view -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu 
> -mg_coarse_pc_factor_mat_solver_package petsc
> [0]PETSC ERROR: --------------------- Error Message 
> --------------------------------------------------------------
> [0]PETSC ERROR: See 
> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for 
> possible LU and Cholesky solvers
> [0]PETSC ERROR: MatSolverPackage petsc does not support matrix type mpiaij
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
> trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3658-g99fa2798da  GIT 
> Date: 2017-04-25 12:56:20 -0500
> [0]PETSC ERROR: ./ex56 on a arch-macosx-gnu-g named MarksMac-5.local by 
> markadams Wed Apr 26 17:46:28 2017
> [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ 
> COPTFLAGS="-g -O0 -mavx2" CXXOPTFLAGS="-g -O0 -mavx2" F
> 
> 
> On Thu, Apr 27, 2017 at 1:59 AM, Garth N. Wells <[email protected]> wrote:
> On 27 April 2017 at 00:30, Barry Smith <[email protected]> wrote:
> >
> >   Yes, you asked for LU so it used LU!
> >
> >    Of course for smaller coarse grids and large numbers of processes this 
> > is very inefficient.
> >
> >    The default behavior for GAMG is probably what you want. In that case it 
> > is equivalent to -mg_coarse_pc_type bjacobi -mg_coarse_sub_pc_type lu. But 
> > GAMG tries hard to put all the coarse grid degrees of freedom on the first 
> > process and none on the rest, so you do end up with the exact equivalent of 
> > a direct solver.
> > Try -ksp_view in that case.
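> > (For example, untested: with your ex56 run that would be
> >
> >     mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg \
> >       -mg_coarse_pc_type bjacobi -mg_coarse_sub_pc_type lu
> >
> > and -ksp_view should show the block Jacobi coarse solve.)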
> >
> 
> Thanks, Barry.
> 
> I'm struggling a little to understand the matrix data structure for
> the coarse grid. Is it just an mpiaij matrix, with all entries
> (usually) on one process?
> 
> Yes.
>  
> 
> Is there an options key prefix for the matrix on different levels?
> E.g., to turn on a viewer?
> 
> Something like -mg_levels_1_ksp_view should work (run with -help to get the 
> correct syntax).
>  
> 
> If I get GAMG to use more than one process for the coarse grid (a GAMG
> setting), can I get a parallel LU (exact) solver to solve it using
> only the processes that store parts of the coarse grid matrix?
> 
> No, we should make a subcommunicator for the active processes only, but I am 
> not too motivated to do this, because the only reason it matters is if 
> 1) a solver (i.e., the parallel direct solver) is lazy and puts reductions 
> everywhere for no good reason, or 2) you use a Krylov solver (very 
> uncommon). All of the communication in a non-Krylov solver is point to point, 
> and there is no win that I know of with a subcommunicator. 
> 
> Note that the redundant coarse grid solver does use a subcommunicator, 
> obviously, but I think it is hardwired to PETSC_COMM_SELF (or maybe not?).
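> 
> As a rough sketch (not existing PETSc code; the helper name and placement are 
> made up), an active-ranks subcommunicator could be built with MPI_Comm_split 
> from the coarse matrix's local row count:
> 
>   #include <petscmat.h>
> 
>   static PetscErrorCode BuildActiveSubcomm(Mat A, MPI_Comm *subcomm)
>   {
>     PetscInt       nlocal;
>     PetscMPIInt    rank;
>     PetscErrorCode ierr;
> 
>     PetscFunctionBeginUser;
>     ierr = MatGetLocalSize(A, &nlocal, NULL);CHKERRQ(ierr);
>     ierr = MPI_Comm_rank(PetscObjectComm((PetscObject)A), &rank);CHKERRQ(ierr);
>     /* ranks with no local rows pass MPI_UNDEFINED and get MPI_COMM_NULL back */
>     ierr = MPI_Comm_split(PetscObjectComm((PetscObject)A),
>                           nlocal > 0 ? 0 : MPI_UNDEFINED, rank, subcomm);CHKERRQ(ierr);
>     PetscFunctionReturn(0);
>   }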
>  
> 
> Related to all this, do the parallel LU solvers internally
> re-distribute a matrix over the whole MPI communicator as part of
> their re-ordering phase?
> 
> They better not!
> 
> I doubt any solver would be that eager by default.
>  
> 
> Garth
> 
> >    There is also -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type 
> > lu. In that case it makes a copy of the coarse matrix on EACH process, and 
> > each process does its own factorization and solve. This saves one phase of 
> > communication for each V cycle: since every process has the entire 
> > solution, it just grabs the values it needs from itself without 
> > communication.
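> >    For example (untested), with ex56 that would look like
> >
> >      mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg \
> >        -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu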
> >
> >
> >
> >
> >> On Apr 26, 2017, at 5:25 PM, Garth N. Wells <[email protected]> wrote:
> >>
> >> I'm a bit confused by the selection of the coarse grid solver for
> >> multigrid. For the demo ksp/ex56, if I do:
> >>
> >>    mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg
> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
> >>
> >> I see
> >>
> >>  Coarse grid solver -- level -------------------------------
> >>    KSP Object: (mg_coarse_) 1 MPI processes
> >>      type: preonly
> >>      maximum iterations=10000, initial guess is zero
> >>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >>      left preconditioning
> >>      using NONE norm type for convergence test
> >>    PC Object: (mg_coarse_) 1 MPI processes
> >>      type: lu
> >>        out-of-place factorization
> >>        tolerance for zero pivot 2.22045e-14
> >>        matrix ordering: nd
> >>        factor fill ratio given 5., needed 1.
> >>          Factored matrix follows:
> >>            Mat Object: 1 MPI processes
> >>              type: seqaij
> >>              rows=6, cols=6, bs=6
> >>              package used to perform factorization: petsc
> >>              total: nonzeros=36, allocated nonzeros=36
> >>              total number of mallocs used during MatSetValues calls =0
> >>                using I-node routines: found 2 nodes, limit used is 5
> >>      linear system matrix = precond matrix:
> >>      Mat Object: 1 MPI processes
> >>        type: seqaij
> >>        rows=6, cols=6, bs=6
> >>        total: nonzeros=36, allocated nonzeros=36
> >>        total number of mallocs used during MatSetValues calls =0
> >>          using I-node routines: found 2 nodes, limit used is 5
> >>
> >> which is what I expect. Increasing from 1 to 2 processes:
> >>
> >>    mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg
> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
> >>
> >> I see
> >>
> >>  Coarse grid solver -- level -------------------------------
> >>    KSP Object: (mg_coarse_) 2 MPI processes
> >>      type: preonly
> >>      maximum iterations=10000, initial guess is zero
> >>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
> >>      left preconditioning
> >>      using NONE norm type for convergence test
> >>    PC Object: (mg_coarse_) 2 MPI processes
> >>      type: lu
> >>        out-of-place factorization
> >>        tolerance for zero pivot 2.22045e-14
> >>        matrix ordering: natural
> >>        factor fill ratio given 0., needed 0.
> >>          Factored matrix follows:
> >>            Mat Object: 2 MPI processes
> >>              type: superlu_dist
> >>              rows=6, cols=6
> >>              package used to perform factorization: superlu_dist
> >>              total: nonzeros=0, allocated nonzeros=0
> >>              total number of mallocs used during MatSetValues calls =0
> >>                SuperLU_DIST run parameters:
> >>                  Process grid nprow 2 x npcol 1
> >>                  Equilibrate matrix TRUE
> >>                  Matrix input mode 1
> >>                  Replace tiny pivots FALSE
> >>                  Use iterative refinement FALSE
> >>                  Processors in row 2 col partition 1
> >>                  Row permutation LargeDiag
> >>                  Column permutation METIS_AT_PLUS_A
> >>                  Parallel symbolic factorization FALSE
> >>                  Repeated factorization SamePattern
> >>      linear system matrix = precond matrix:
> >>      Mat Object: 2 MPI processes
> >>        type: mpiaij
> >>        rows=6, cols=6, bs=6
> >>        total: nonzeros=36, allocated nonzeros=36
> >>        total number of mallocs used during MatSetValues calls =0
> >>          using I-node (on process 0) routines: found 2 nodes, limit used 
> >> is 5
> >>
> >> Note that the coarse grid is now using superlu_dist. Is the coarse
> >> grid being solved in parallel?
> >>
> >> Garth
> >
> 
