On Oct 9, 2013, at 11:32 AM, Matthew Knepley <[email protected]> wrote:
> On Wed, Oct 9, 2013 at 8:34 AM, Pierre Jolivet <[email protected]> wrote:
> Mark and Barry,
> You will find attached the log for BoomerAMG (better, but still slow
> imho), ML (still lost), GAMG (better, I took Jed's advice and recompiled
> petsc-maint but forgot to relink my app so please discard the time spent
> in MatView again) and GASM (best, really ? for a Poisson equation ?).
>
> I'll try bigger matrices (that is likely the real problem now, at least
> for GAMG), but if you still see something fishy that I might need to
> adjust in the parameters, please tell me.
>
> I strongly suspect something wrong with the formulation. There are plenty of
> examples of this same thing in PETSc that work fine. SNES ex12, although it
> is an unstructured grid, can do 3d Poisson and you can run GAMG on it to
> show good linear scaling. KSP ex56 is 3d elasticity and what Mark uses to
> benchmark GAMG.

GAMG is coarsening super fast. This is not a big deal at this stage, but I would add '-pc_gamg_square_graph false'.

Also, the flop rates are terrible (2 Mflop/s/core). Is this a 3D cube that is partitioned lexicographically?

>
> Thanks,
>
>    Matt
>
> Also, the first results I got for elasticity (before going back to plain
> scalar diffusion) were at least as bad. Do you have any tips for such
> problems besides setting the correct BlockSize and MatNearNullSpace and
> using parameters similar to the ones you just gave me or the ones that can
> be found here
> http://lists.mcs.anl.gov/pipermail/petsc-users/2012-April/012790.html ?
>
> Thanks for your help,
> Pierre
>
>
> > On Oct 8, 2013, at 8:18 PM, Barry Smith <[email protected]> wrote:
> >
> >>
> >> MatView 6 1.0 3.4042e+01269.1 0.00e+00 0.0 0.0e+00
> >> 0.0e+00 4.0e+00 18 0 0 0 0 25 0 0 0 0 0
> >>
> >> Something is seriously wrong with the default matview (or pcview) for
> >> PC GAMG? It is printing way too much for the default view and thus
> >> totally hosing the timings. The default PCView() is supposed to be
> >> very lightweight (not do excessive communication) and provide very
> >> high level information.
> >>
> >
> > Oh, I think the problem is that GAMG sets the coarse grid solver
> > explicitly as a block jacobi with LU local. GAMG ensures that all
> > equations are on one PE for the coarsest grid. ML uses redundant. You
> > should be able to use redundant in GAMG, it is just not the default. This
> > is not tested. So I'm guessing the problem is that block Jacobi is noisy.
> >
> >>
> >> Barry
> >>
> >>
> >> On Oct 8, 2013, at 6:50 PM, "Mark F. Adams" <[email protected]> wrote:
> >>
> >>> Something is going terribly wrong with the setup in hypre and ML.
> >>> hypre's default parameters are not set up well for 3D. I use:
> >>>
> >>> -pc_hypre_boomeramg_no_CF
> >>> -pc_hypre_boomeramg_agg_nl 1
> >>> -pc_hypre_boomeramg_coarsen_type HMIS
> >>> -pc_hypre_boomeramg_interp_type ext+i
> >>>
> >>> I'm not sure what is going wrong with ML's setup.
> >>>
> >>> GAMG is converging terribly. Is this just a simple 7-point Laplacian?
> >>> It looks like the eigen estimate is low on the finest grid, which
> >>> messes up the smoother. Try running with these parameters and send the
> >>> output:
> >>>
> >>> -pc_gamg_agg_nsmooths 1
> >>> -pc_gamg_verbose 2
> >>> -mg_levels_ksp_type richardson
> >>> -mg_levels_pc_type sor
> >>>
> >>> Mark
> >>>
> >>> On Oct 8, 2013, at 5:46 PM, Pierre Jolivet <[email protected]> wrote:
> >>>
> >>>> Please find the log for BoomerAMG, ML and GAMG attached. The set up for
> >>>> GAMG doesn't look so bad compared to the other packages, so I'm wondering
> >>>> what is going on with those ?
> >>>>
> >>>>>
> >>>>> We need the output from running with -log_summary -pc_mg_log
> >>>>>
> >>>>> Also you can run with PETSc's AMG called GAMG (run with -pc_type gamg)
> >>>>> This will give the most useful information about where it is spending
> >>>>> the time.
> >>>>>
> >>>>>
> >>>>> Barry
> >>>>>
> >>>>>
> >>>>> On Oct 8, 2013, at 4:11 PM, Pierre Jolivet <[email protected]> wrote:
> >>>>>
> >>>>>> Dear all,
> >>>>>> I'm trying to compare linear solvers for a simple Poisson equation in 3D.
> >>>>>> I thought that MG was the way to go, but looking at my log, the
> >>>>>> performance looks abysmal (I know that the matrices are way too small
> >>>>>> but if I go bigger, it just never performs a single iteration ..). Even
> >>>>>> though this is neither the BoomerAMG nor the ML mailing list, could you
> >>>>>> please tell me if PETSc sets some default flags that make the setup for
> >>>>>> those solvers so slow for this simple problem ? The performance of (G)ASM
> >>>>>> is in comparison much better.
> >>>>>>
> >>>>>> Thanks in advance for your help.
> >>>>>>
> >>>>>> PS: first the BoomerAMG log, then ML (much more verbose, sorry).
> >>>>>>
> >>>>>>  0 KSP Residual norm 1.599647112604e+00
> >>>>>>  1 KSP Residual norm 5.450838232404e-02
> >>>>>>  2 KSP Residual norm 3.549673478318e-03
> >>>>>>  3 KSP Residual norm 2.901826808841e-04
> >>>>>>  4 KSP Residual norm 2.574235778729e-05
> >>>>>>  5 KSP Residual norm 2.253410171682e-06
> >>>>>>  6 KSP Residual norm 1.871067784877e-07
> >>>>>>  7 KSP Residual norm 1.681162800670e-08
> >>>>>>  8 KSP Residual norm 2.120841512414e-09
> >>>>>> KSP Object: 2048 MPI processes
> >>>>>>   type: gmres
> >>>>>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> >>>>>>     GMRES: happy breakdown tolerance 1e-30
> >>>>>>   maximum iterations=200, initial guess is zero
> >>>>>>   tolerances: relative=1e-08, absolute=1e-50, divergence=10000
> >>>>>>   left preconditioning
> >>>>>>   using PRECONDITIONED norm type for convergence test
> >>>>>> PC Object: 2048 MPI processes
> >>>>>>   type: hypre
> >>>>>>     HYPRE BoomerAMG preconditioning
> >>>>>>     HYPRE BoomerAMG: Cycle type V
> >>>>>>     HYPRE BoomerAMG: Maximum number of levels 25
> >>>>>>     HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
> >>>>>>     HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
> >>>>>>     HYPRE BoomerAMG: Threshold for strong coupling 0.25
> >>>>>>     HYPRE BoomerAMG: Interpolation truncation factor 0
> >>>>>>     HYPRE BoomerAMG: Interpolation: max elements per row 0
> >>>>>>     HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
> >>>>>>     HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
> >>>>>>     HYPRE BoomerAMG: Maximum row sums 0.9
> >>>>>>     HYPRE BoomerAMG: Sweeps down 1
> >>>>>>     HYPRE BoomerAMG: Sweeps up 1
> >>>>>>     HYPRE BoomerAMG: Sweeps on coarse 1
> >>>>>>     HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
> >>>>>>     HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
> >>>>>>     HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
> >>>>>>     HYPRE BoomerAMG: Relax weight (all) 1
> >>>>>>     HYPRE BoomerAMG: Outer relax weight (all) 1
> >>>>>>     HYPRE BoomerAMG: Using CF-relaxation
> >>>>>>     HYPRE BoomerAMG: Measure type local
> >>>>>>     HYPRE BoomerAMG: Coarsen type Falgout
> >>>>>>     HYPRE BoomerAMG: Interpolation type classical
> >>>>>>   linear system matrix = precond matrix:
> >>>>>>   Matrix Object: 2048 MPI processes
> >>>>>>     type: mpiaij
> >>>>>>     rows=4173281, cols=4173281
> >>>>>>     total: nonzeros=102576661, allocated nonzeros=102576661
> >>>>>>     total number of mallocs used during MatSetValues calls =0
> >>>>>>       not using I-node (on process 0) routines
> >>>>>> --- system solved with PETSc (in 1.005199e+02 seconds)
> >>>>>>
> >>>>>>  0 KSP Residual norm 2.368804472986e-01
> >>>>>>  1 KSP Residual norm 5.676430019132e-02
> >>>>>>  2 KSP Residual norm 1.898005876002e-02
> >>>>>>  3 KSP Residual norm 6.193922902926e-03
> >>>>>>  4 KSP Residual norm 2.008448794493e-03
> >>>>>>  5 KSP Residual norm 6.390465670228e-04
> >>>>>>  6 KSP Residual norm 2.157709394389e-04
> >>>>>>  7 KSP Residual norm 7.295973819979e-05
> >>>>>>  8 KSP Residual norm 2.358343271482e-05
> >>>>>>  9 KSP Residual norm 7.489696222066e-06
> >>>>>> 10 KSP Residual norm 2.390946857593e-06
> >>>>>> 11 KSP Residual norm 8.068086385140e-07
> >>>>>> 12 KSP Residual norm 2.706607789749e-07
> >>>>>> 13 KSP Residual norm 8.636910863376e-08
> >>>>>> 14 KSP Residual norm 2.761981175852e-08
> >>>>>> 15 KSP Residual norm 8.755459874369e-09
> >>>>>> 16 KSP Residual norm 2.708848598341e-09
> >>>>>> 17 KSP Residual norm 8.968748876265e-10
> >>>>>> KSP Object: 2048 MPI processes
> >>>>>>   type: gmres
> >>>>>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> >>>>>>     GMRES: happy breakdown tolerance 1e-30
> >>>>>>   maximum iterations=200, initial guess is zero
> >>>>>>   tolerances: relative=1e-08, absolute=1e-50, divergence=10000
> >>>>>>   left preconditioning
> >>>>>>   using PRECONDITIONED norm type for convergence test
> >>>>>> PC Object: 2048 MPI processes
> >>>>>>   type: ml
> >>>>>>     MG: type is MULTIPLICATIVE, levels=3 cycles=v
> >>>>>>       Cycles per PCApply=1
> >>>>>>       Using Galerkin computed coarse grid matrices
> >>>>>>   Coarse grid solver -- level -------------------------------
> >>>>>>     KSP Object: (mg_coarse_) 2048 MPI processes
> >>>>>>       type: preonly
> >>>>>>       maximum iterations=1, initial guess is zero
> >>>>>>       tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> >>>>>>       left preconditioning
> >>>>>>       using NONE norm type for convergence test
> >>>>>>     PC Object: (mg_coarse_) 2048 MPI processes
> >>>>>>       type: redundant
> >>>>>>         Redundant preconditioner: First (color=0) of 2048 PCs follows
> >>>>>>       KSP Object: (mg_coarse_redundant_) 1 MPI processes
> >>>>>>         type: preonly
> >>>>>>         maximum iterations=10000, initial guess is zero
> >>>>>>         tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> >>>>>>         left preconditioning
> >>>>>>         using NONE norm type for convergence test
> >>>>>>       PC Object: (mg_coarse_redundant_) 1 MPI processes
> >>>>>>         type: lu
> >>>>>>           LU: out-of-place factorization
> >>>>>>           tolerance for zero pivot 2.22045e-14
> >>>>>>           using diagonal shift on blocks to prevent zero pivot
> >>>>>>           matrix ordering: nd
> >>>>>>           factor fill ratio given 5, needed 4.38504
> >>>>>>             Factored matrix follows:
> >>>>>>               Matrix Object: 1 MPI processes
> >>>>>>                 type: seqaij
> >>>>>>                 rows=2055, cols=2055
> >>>>>>                 package used to perform factorization: petsc
> >>>>>>                 total: nonzeros=2476747, allocated nonzeros=2476747
> >>>>>>                 total number of mallocs used during MatSetValues calls =0
> >>>>>>                   using I-node routines: found 1638 nodes, limit used is 5
> >>>>>>         linear system matrix = precond matrix:
> >>>>>>         Matrix Object: 1 MPI processes
> >>>>>>           type: seqaij
> >>>>>>           rows=2055, cols=2055
> >>>>>>           total: nonzeros=564817, allocated nonzeros=1093260
> >>>>>>           total number of mallocs used during MatSetValues calls =0
> >>>>>>             not using I-node routines
> >>>>>>       linear system matrix = precond matrix:
> >>>>>>       Matrix Object: 2048 MPI processes
> >>>>>>         type: mpiaij
> >>>>>>         rows=2055, cols=2055
> >>>>>>         total: nonzeros=564817, allocated nonzeros=564817
> >>>>>>         total number of mallocs used during MatSetValues calls =0
> >>>>>>           not using I-node (on process 0) routines
> >>>>>>   Down solver (pre-smoother) on level 1 -------------------------------
> >>>>>>     KSP Object: (mg_levels_1_) 2048 MPI processes
> >>>>>>       type: richardson
> >>>>>>         Richardson: damping factor=1
> >>>>>>       maximum iterations=2
> >>>>>>       tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> >>>>>>       left preconditioning
> >>>>>>       using nonzero initial guess
> >>>>>>       using NONE norm type for convergence test
> >>>>>>     PC Object: (mg_levels_1_) 2048 MPI processes
> >>>>>>       type: sor
> >>>>>>         SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
> >>>>>>       linear system matrix = precond matrix:
> >>>>>>       Matrix Object: 2048 MPI processes
> >>>>>>         type: mpiaij
> >>>>>>         rows=30194, cols=30194
> >>>>>>         total: nonzeros=3368414, allocated nonzeros=3368414
> >>>>>>         total number of mallocs used during MatSetValues calls =0
> >>>>>>           not using I-node (on process 0) routines
> >>>>>>   Up solver (post-smoother) same as down solver (pre-smoother)
> >>>>>>   Down solver (pre-smoother) on level 2 -------------------------------
> >>>>>>     KSP Object: (mg_levels_2_) 2048 MPI processes
> >>>>>>       type: richardson
> >>>>>>         Richardson: damping factor=1
> >>>>>>       maximum iterations=2
> >>>>>>       tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> >>>>>>       left preconditioning
> >>>>>>       using nonzero initial guess
> >>>>>>       using NONE norm type for convergence test
> >>>>>>     PC Object: (mg_levels_2_) 2048 MPI processes
> >>>>>>       type: sor
> >>>>>>         SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
> >>>>>>       linear system matrix = precond matrix:
> >>>>>>       Matrix Object: 2048 MPI processes
> >>>>>>         type: mpiaij
> >>>>>>         rows=531441, cols=531441
> >>>>>>         total: nonzeros=12476324, allocated nonzeros=12476324
> >>>>>>         total number of mallocs used during MatSetValues calls =0
> >>>>>>           not using I-node (on process 0) routines
> >>>>>>   Up solver (post-smoother) same as down solver (pre-smoother)
> >>>>>>   linear system matrix = precond matrix:
> >>>>>>   Matrix Object: 2048 MPI processes
> >>>>>>     type: mpiaij
> >>>>>>     rows=531441, cols=531441
> >>>>>>     total: nonzeros=12476324, allocated nonzeros=12476324
> >>>>>>     total number of mallocs used during MatSetValues calls =0
> >>>>>>       not using I-node (on process 0) routines
> >>>>>> --- system solved with PETSc (in 2.407844e+02 seconds)
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>> <log-GAMG><log-ML><log-BoomerAMG>
> >>>
> >>
> >
> >
>
> --
> What most experimenters take for granted before they begin their experiments
> is infinitely more interesting than any results to which their experiments
> lead.
> -- Norbert Wiener
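
A note for anyone reproducing these runs: the solver options suggested in the thread are meant to be combined with the monitoring flags Barry asked for. A sketch of the two invocations follows; the launcher, process count, and executable name (./my_app) are placeholders and not something specified in the thread, while every solver option comes straight from Mark's and Barry's messages.

mpiexec -n 2048 ./my_app -ksp_monitor -ksp_view -log_summary \
    -pc_type hypre -pc_hypre_type boomeramg \
    -pc_hypre_boomeramg_no_CF -pc_hypre_boomeramg_agg_nl 1 \
    -pc_hypre_boomeramg_coarsen_type HMIS \
    -pc_hypre_boomeramg_interp_type ext+i

mpiexec -n 2048 ./my_app -ksp_monitor -log_summary -pc_mg_log \
    -pc_type gamg -pc_gamg_agg_nsmooths 1 -pc_gamg_verbose 2 \
    -pc_gamg_square_graph false \
    -mg_levels_ksp_type richardson -mg_levels_pc_type sor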
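
On the coarse-grid point raised above (GAMG defaulting to block Jacobi with a local LU while ML uses redundant): GAMG is built on PCMG, so its coarsest level answers to the same mg_coarse_ prefix that shows up in the ML view above, and the redundant variant can in principle be requested at runtime. As noted in the thread this path is not tested, so treat the following options as an experiment rather than a recommendation:

-mg_coarse_ksp_type preonly
-mg_coarse_pc_type redundant
-mg_coarse_redundant_pc_type lu

This simply mirrors what the ML output above reports for its coarse level: preonly, a redundant PC, and an LU factorization on each duplicated copy.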
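
On Pierre's elasticity question, the usual recipe is exactly the two things he names: a block size matching the number of degrees of freedom per node, and a near-null space built from the rigid-body modes. Below is a minimal sketch in C; it assumes A is the 3D elasticity operator and coords a Vec holding the interlaced (x,y,z) nodal coordinates, the helper name is invented for illustration, and in real code the block-size call normally has to happen before the matrix is preallocated.

#include <petscksp.h>

/* Sketch only: attach rigid-body modes so an AMG preconditioner (GAMG, ML)
   can build good coarse spaces for 3D elasticity.  Assumes 3 dofs per node
   and that "coords" is an interlaced coordinate vector with block size 3. */
static PetscErrorCode AttachElasticityNearNullSpace(Mat A, Vec coords)
{
  MatNullSpace   nullsp;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* 3 displacement components per node; normally set before preallocation */
  ierr = MatSetBlockSize(A, 3);CHKERRQ(ierr);
  /* 6 rigid-body modes (3 translations + 3 rotations) from the coordinates */
  ierr = MatNullSpaceCreateRigidBody(coords, &nullsp);CHKERRQ(ierr);
  ierr = MatSetNearNullSpace(A, nullsp);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

With that in place, the GAMG options suggested above are the natural starting point; KSP ex56, mentioned in the thread as Mark's elasticity benchmark, is a good place to check the details.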
