On Fri, Apr 7, 2017 at 3:52 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> On Apr 7, 2017, at 4:46 PM, Kong, Fande <fande.k...@inl.gov> wrote:
>
> > On Fri, Apr 7, 2017 at 3:39 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> >
> > > > Using Petsc Release Version 3.7.5, unknown
> > >
> > > So are you using the release or are you using the master branch?
> >
> > I am working on the maint branch. This is what I did two months ago:
> >
> >     git clone -b maint https://bitbucket.org/petsc/petsc petsc
> >
> > I am interested in improving the GAMG performance.
>
> Why? Why not use the best solver for your problem?

I am just curious. I want to understand the potential of these interesting preconditioners.

> > Is it possible that it cannot beat ASM at all? A multilevel method should be better than a one-level method when the number of processor cores is large.
>
> ASM is taking 30 iterations, which is fantastic; it is really going to be tough to get GAMG to be faster (the setup time for GAMG is high).
>
> What happens to both with 10 times as many processes? With 100 times as many?

I have not tried many processes yet.

Fande,

> Barry
>
> > > If you use master, the ASM will be even faster.
> >
> > What's new in master?
> >
> > Fande,
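For reference, switching the existing clone from maint to master to test Barry's claim might look like the following, a minimal sketch using standard git commands (PETSc would then need to be reconfigured and rebuilt; configure options are site-specific):

    cd petsc
    git checkout master
    git pull
    # reconfigure and rebuild PETSc after changing branches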
> > > On Apr 7, 2017, at 4:29 PM, Kong, Fande <fande.k...@inl.gov> wrote:
> > >
> > > > Thanks, Barry.
> > > >
> > > > It works.
> > > >
> > > > GAMG is three times better than ASM in terms of the number of linear iterations, but it is five times slower than ASM. Any suggestions to improve the performance of GAMG? Log files are attached.
> > > >
> > > > Fande,
> > > >
> > > > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> > > >
> > > > On Apr 6, 2017, at 9:39 AM, Kong, Fande <fande.k...@inl.gov> wrote:
> > > >
> > > > > Thanks, Mark and Barry,
> > > > >
> > > > > It works pretty well in terms of the number of linear iterations (using "-pc_gamg_sym_graph true"), but it is horrible in terms of compute time. I am using the two-level method via "-pc_mg_levels 2". The compute time is larger than with other preconditioning options because a matrix-free method is used on the fine level, and in my particular problem the function evaluation is expensive.
> > > > >
> > > > > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton method, but I do not think I want to make the preconditioning part matrix-free. Do you guys know how to turn off the matrix-free method for GAMG?
> > > >
> > > >    -pc_use_amat false
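For concreteness, a minimal sketch of how the pieces discussed above combine on one command line (the executable name and process count are placeholders; the option spellings are the ones quoted in this thread):

    mpiexec -n 384 ./my_app \
        -snes_mf_operator 1 \
        -pc_type gamg -pc_mg_levels 2 -pc_gamg_sym_graph true \
        -pc_use_amat false

With -pc_use_amat false the multigrid smoothers are applied with the assembled preconditioning matrix rather than the matrix-free (mffd) operator, so each smoother iteration should no longer trigger the expensive function evaluations.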
> > > > > Here is the detailed solver:

SNES Object: 384 MPI processes
  type: newtonls
  maximum iterations=200, maximum function evaluations=10000
  tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
  total number of linear solver iterations=20
  total number of function evaluations=166
  norm schedule ALWAYS
  SNESLineSearch Object: 384 MPI processes
    type: bt
      interpolation: cubic
      alpha=1.000000e-04
    maxstep=1.000000e+08, minlambda=1.000000e-12
    tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
    maximum iterations=40
  KSP Object: 384 MPI processes
    type: gmres
      GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
      GMRES: happy breakdown tolerance 1e-30
    maximum iterations=100, initial guess is zero
    tolerances: relative=0.001, absolute=1e-50, divergence=10000.
    right preconditioning
    using UNPRECONDITIONED norm type for convergence test
  PC Object: 384 MPI processes
    type: gamg
      MG: type is MULTIPLICATIVE, levels=2 cycles=v
        Cycles per PCApply=1
        Using Galerkin computed coarse grid matrices
        GAMG specific options
          Threshold for dropping small values from graph 0.
          AGG specific options
            Symmetric graph true
    Coarse grid solver -- level -------------------------------
      KSP Object: (mg_coarse_) 384 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (mg_coarse_) 384 MPI processes
        type: bjacobi
          block Jacobi: number of blocks = 384
          Local solve is same for all blocks, in the following KSP and PC objects:
        KSP Object: (mg_coarse_sub_) 1 MPI processes
          type: preonly
          maximum iterations=1, initial guess is zero
          tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
          left preconditioning
          using NONE norm type for convergence test
        PC Object: (mg_coarse_sub_) 1 MPI processes
          type: lu
            LU: out-of-place factorization
            tolerance for zero pivot 2.22045e-14
            using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
            matrix ordering: nd
            factor fill ratio given 5., needed 1.31367
              Factored matrix follows:
                Mat Object: 1 MPI processes
                  type: seqaij
                  rows=37, cols=37
                  package used to perform factorization: petsc
                  total: nonzeros=913, allocated nonzeros=913
                  total number of mallocs used during MatSetValues calls =0
                    not using I-node routines
          linear system matrix = precond matrix:
          Mat Object: 1 MPI processes
            type: seqaij
            rows=37, cols=37
            total: nonzeros=695, allocated nonzeros=695
            total number of mallocs used during MatSetValues calls =0
              not using I-node routines
        linear system matrix = precond matrix:
        Mat Object: 384 MPI processes
          type: mpiaij
          rows=18145, cols=18145
          total: nonzeros=1709115, allocated nonzeros=1709115
          total number of mallocs used during MatSetValues calls =0
            not using I-node (on process 0) routines
    Down solver (pre-smoother) on level 1 -------------------------------
      KSP Object: (mg_levels_1_) 384 MPI processes
        type: chebyshev
          Chebyshev: eigenvalue estimates: min = 0.133339, max = 1.46673
          Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
          KSP Object: (mg_levels_1_esteig_) 384 MPI processes
            type: gmres
              GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
              GMRES: happy breakdown tolerance 1e-30
            maximum iterations=10, initial guess is zero
            tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
            left preconditioning
            using PRECONDITIONED norm type for convergence test
        maximum iterations=2
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using nonzero initial guess
        using NONE norm type for convergence test
      PC Object: (mg_levels_1_) 384 MPI processes
        type: sor
          SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
        linear system matrix followed by preconditioner matrix:
        Mat Object: 384 MPI processes
          type: mffd
          rows=3020875, cols=3020875
            Matrix-free approximation:
              err=1.49012e-08 (relative error in function evaluation)
              Using wp compute h routine
                Does not compute normU
        Mat Object: () 384 MPI processes
          type: mpiaij
          rows=3020875, cols=3020875
          total: nonzeros=215671710, allocated nonzeros=241731750
          total number of mallocs used during MatSetValues calls =0
            not using I-node (on process 0) routines
    Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix followed by preconditioner matrix:
  Mat Object: 384 MPI processes
    type: mffd
    rows=3020875, cols=3020875
      Matrix-free approximation:
        err=1.49012e-08 (relative error in function evaluation)
        Using wp compute h routine
          Does not compute normU
  Mat Object: () 384 MPI processes
    type: mpiaij
    rows=3020875, cols=3020875
    total: nonzeros=215671710, allocated nonzeros=241731750
    total number of mallocs used during MatSetValues calls =0
      not using I-node (on process 0) routines

> > > > > Fande,
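The view above is the 384-process run. A sketch of the scaling experiment Barry suggests earlier in the thread (10 and 100 times as many processes; the executable name and any problem-specific options are placeholders, and -log_view is PETSc's profiling switch):

    for n in 384 3840 38400; do
        mpiexec -n $n ./my_app -pc_type asm  -log_view > asm_${n}.log
        mpiexec -n $n ./my_app -pc_type gamg -pc_gamg_sym_graph true -log_view > gamg_${n}.log
    done

As the process count grows, the one-level ASM iteration count typically climbs while GAMG's stays roughly flat, so the comparison may flip at scale.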
> > > > > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <mfad...@lbl.gov> wrote:
> > > > > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> > > > >
> > > > > > > Does this mean that GAMG works for symmetric matrices only?
> > > > > >
> > > > > > No, it means that for a nonsymmetric nonzero structure you need the extra flag. So use the extra flag. The reason we do not always use the flag is that it adds extra cost and is not needed if the matrix already has a symmetric nonzero structure.
> > > > >
> > > > > BTW, if you have a symmetric nonzero structure you can just set "-pc_gamg_threshold -1.0"; note the "or" in the message.
> > > > >
> > > > > If you want to mess with the threshold then you need to use the symmetrized flag.

<asm.txt><gamg.txt>
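As a summary sketch of the two alternatives Mark describes (option names as quoted in this thread):

    # Nonsymmetric nonzero structure, or when using a drop threshold:
    -pc_gamg_sym_graph true

    # Symmetric nonzero structure with no dropping (no symmetrization needed):
    -pc_gamg_threshold -1.0

Symmetrizing the graph adds extra setup cost, which is why it is not done unconditionally.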