On Fri, Apr 7, 2017 at 3:52 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> On Apr 7, 2017, at 4:46 PM, Kong, Fande <fande.k...@inl.gov> wrote:
>
> > On Fri, Apr 7, 2017 at 3:39 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> >
> > > > Using Petsc Release Version 3.7.5, unknown
> > >
> > > So are you using the release or are you using the master branch?
> >
> > I am working on the maint branch. This is what I did two months ago:
> >
> >     git clone -b maint https://bitbucket.org/petsc/petsc petsc
> >
> > I am interested in improving the GAMG performance.
>
> Why? Why not use the best solver for your problem?

I am just curious. I want to understand the potential of these interesting preconditioners.

> > Is it possible that it cannot beat ASM at all? A multilevel method should be better than a one-level method when the number of processor cores is large.
>
> ASM is taking 30 iterations, which is fantastic; it is really going to be tough to get GAMG to be faster (the setup time for GAMG is high).
>
> What happens to both with 10 times as many processes? With 100 times as many?

I have not tried many processes yet.

Fande,

> Barry
>
> > > If you use master, the ASM will be even faster.
> >
> > What's new in master?
> >
> > Fande,
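For reference, switching the existing clone from maint to master to test Barry's claim might look like the following, a minimal sketch using standard git commands (PETSc would then need to be reconfigured and rebuilt; configure options are site-specific):

    cd petsc
    git checkout master
    git pull
    # reconfigure and rebuild PETSc after changing branches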
> > > On Apr 7, 2017, at 4:29 PM, Kong, Fande <fande.k...@inl.gov> wrote:
> > >
> > > > Thanks, Barry.
> > > >
> > > > It works.
> > > >
> > > > GAMG is three times better than ASM in terms of the number of linear iterations, but it is five times slower than ASM. Any suggestions to improve the performance of GAMG? Log files are attached.
> > > >
> > > > Fande,
> > > >
> > > > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> > > >
> > > > On Apr 6, 2017, at 9:39 AM, Kong, Fande <fande.k...@inl.gov> wrote:
> > > >
> > > > > Thanks, Mark and Barry,
> > > > >
> > > > > It works pretty well in terms of the number of linear iterations (using "-pc_gamg_sym_graph true"), but it is horrible in terms of compute time. I am using the two-level method via "-pc_mg_levels 2". The compute time is larger than with other preconditioning options because a matrix-free method is used on the fine level, and in my particular problem the function evaluation is expensive.
> > > > >
> > > > > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton method, but I do not think I want to make the preconditioning part matrix-free. Do you guys know how to turn off the matrix-free method for GAMG?
> > > >
> > > >    -pc_use_amat false
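For concreteness, a minimal sketch of how the pieces discussed above combine on one command line (the executable name and process count are placeholders; the option spellings are the ones quoted in this thread):

    mpiexec -n 384 ./my_app \
        -snes_mf_operator 1 \
        -pc_type gamg -pc_mg_levels 2 -pc_gamg_sym_graph true \
        -pc_use_amat false

With -pc_use_amat false the multigrid smoothers are applied with the assembled preconditioning matrix rather than the matrix-free (mffd) operator, so each smoother iteration should no longer trigger the expensive function evaluations.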
> > > > > Here is the detailed solver:

SNES Object: 384 MPI processes
  type: newtonls
  maximum iterations=200, maximum function evaluations=10000
  tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
  total number of linear solver iterations=20
  total number of function evaluations=166
  norm schedule ALWAYS
  SNESLineSearch Object: 384 MPI processes
    type: bt
      interpolation: cubic
      alpha=1.000000e-04
    maxstep=1.000000e+08, minlambda=1.000000e-12
    tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
    maximum iterations=40
  KSP Object: 384 MPI processes
    type: gmres
      GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
      GMRES: happy breakdown tolerance 1e-30
    maximum iterations=100, initial guess is zero
    tolerances: relative=0.001, absolute=1e-50, divergence=10000.
    right preconditioning
    using UNPRECONDITIONED norm type for convergence test
  PC Object: 384 MPI processes
    type: gamg
      MG: type is MULTIPLICATIVE, levels=2 cycles=v
        Cycles per PCApply=1
        Using Galerkin computed coarse grid matrices
        GAMG specific options
          Threshold for dropping small values from graph 0.
          AGG specific options
            Symmetric graph true
    Coarse grid solver -- level -------------------------------
      KSP Object: (mg_coarse_) 384 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (mg_coarse_) 384 MPI processes
        type: bjacobi
          block Jacobi: number of blocks = 384
          Local solve is same for all blocks, in the following KSP and PC objects:
        KSP Object: (mg_coarse_sub_) 1 MPI processes
          type: preonly
          maximum iterations=1, initial guess is zero
          tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
          left preconditioning
          using NONE norm type for convergence test
        PC Object: (mg_coarse_sub_) 1 MPI processes
          type: lu
            LU: out-of-place factorization
            tolerance for zero pivot 2.22045e-14
            using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
            matrix ordering: nd
            factor fill ratio given 5., needed 1.31367
              Factored matrix follows:
                Mat Object: 1 MPI processes
                  type: seqaij
                  rows=37, cols=37
                  package used to perform factorization: petsc
                  total: nonzeros=913, allocated nonzeros=913
                  total number of mallocs used during MatSetValues calls =0
                    not using I-node routines
          linear system matrix = precond matrix:
          Mat Object: 1 MPI processes
            type: seqaij
            rows=37, cols=37
            total: nonzeros=695, allocated nonzeros=695
            total number of mallocs used during MatSetValues calls =0
              not using I-node routines
        linear system matrix = precond matrix:
        Mat Object: 384 MPI processes
          type: mpiaij
          rows=18145, cols=18145
          total: nonzeros=1709115, allocated nonzeros=1709115
          total number of mallocs used during MatSetValues calls =0
            not using I-node (on process 0) routines
    Down solver (pre-smoother) on level 1 -------------------------------
      KSP Object: (mg_levels_1_) 384 MPI processes
        type: chebyshev
          Chebyshev: eigenvalue estimates: min = 0.133339, max = 1.46673
          Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
          KSP Object: (mg_levels_1_esteig_) 384 MPI processes
            type: gmres
              GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
              GMRES: happy breakdown tolerance 1e-30
            maximum iterations=10, initial guess is zero
            tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
            left preconditioning
            using PRECONDITIONED norm type for convergence test
        maximum iterations=2
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using nonzero initial guess
        using NONE norm type for convergence test
      PC Object: (mg_levels_1_) 384 MPI processes
        type: sor
          SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
        linear system matrix followed by preconditioner matrix:
        Mat Object: 384 MPI processes
          type: mffd
          rows=3020875, cols=3020875
            Matrix-free approximation:
              err=1.49012e-08 (relative error in function evaluation)
              Using wp compute h routine
                Does not compute normU
        Mat Object: () 384 MPI processes
          type: mpiaij
          rows=3020875, cols=3020875
          total: nonzeros=215671710, allocated nonzeros=241731750
          total number of mallocs used during MatSetValues calls =0
            not using I-node (on process 0) routines
    Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix followed by preconditioner matrix:
  Mat Object: 384 MPI processes
    type: mffd
    rows=3020875, cols=3020875
      Matrix-free approximation:
        err=1.49012e-08 (relative error in function evaluation)
        Using wp compute h routine
          Does not compute normU
  Mat Object: () 384 MPI processes
    type: mpiaij
    rows=3020875, cols=3020875
    total: nonzeros=215671710, allocated nonzeros=241731750
    total number of mallocs used during MatSetValues calls =0
      not using I-node (on process 0) routines

> > > > > Fande,
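The view above is the 384-process run. A sketch of the scaling experiment Barry suggests earlier in the thread (10 and 100 times as many processes; the executable name and any problem-specific options are placeholders, and -log_view is PETSc's profiling switch):

    for n in 384 3840 38400; do
        mpiexec -n $n ./my_app -pc_type asm  -log_view > asm_${n}.log
        mpiexec -n $n ./my_app -pc_type gamg -pc_gamg_sym_graph true -log_view > gamg_${n}.log
    done

As the process count grows, the one-level ASM iteration count typically climbs while GAMG's stays roughly flat, so the comparison may flip at scale.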
> > > > > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams <mfad...@lbl.gov> wrote:
> > > > > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> > > > >
> > > > > > > Does this mean that GAMG works for symmetric matrices only?
> > > > > >
> > > > > > No, it means that for a nonsymmetric nonzero structure you need the extra flag. So use the extra flag. The reason we do not always use the flag is that it adds extra cost and is not needed if the matrix already has a symmetric nonzero structure.
> > > > >
> > > > > BTW, if you have a symmetric nonzero structure you can just set "-pc_gamg_threshold -1.0"; note the "or" in the message.
> > > > >
> > > > > If you want to mess with the threshold then you need to use the symmetrized flag.

<asm.txt><gamg.txt>
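As a summary sketch of the two alternatives Mark describes (option names as quoted in this thread):

    # Nonsymmetric nonzero structure, or when using a drop threshold:
    -pc_gamg_sym_graph true

    # Symmetric nonzero structure with no dropping (no symmetrization needed):
    -pc_gamg_threshold -1.0

Symmetrizing the graph adds extra setup cost, which is why it is not done unconditionally.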