We need the output from running with -log_summary -pc_mg_log. Also, you can run with PETSc's AMG, called GAMG (run with -pc_type gamg). This will give the most useful information about where it is spending the time.
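For example, something along these lines (the executable name and launcher are placeholders; adjust to your own job script):

    mpiexec -n 2048 ./your_poisson_app -pc_type gamg -log_summary -pc_mg_log

The same -log_summary flag can also be appended to the hypre and ML runs so the setup and solve times can be compared directly.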
Barry

On Oct 8, 2013, at 4:11 PM, Pierre Jolivet <[email protected]> wrote:

> Dear all,
> I'm trying to compare linear solvers for a simple Poisson equation in 3D.
> I thought that MG was the way to go, but looking at my log, the
> performance looks abysmal (I know that the matrices are way too small, but
> if I go bigger, it just never performs a single iteration...). Even though
> this is neither the BoomerAMG nor the ML mailing list, could you please
> tell me if PETSc sets some default flags that make the setup for those
> solvers so slow for this simple problem? The performance of (G)ASM is in
> comparison much better.
>
> Thanks in advance for your help.
>
> PS: first the BoomerAMG log, then ML (much more verbose, sorry).
>
>   0 KSP Residual norm 1.599647112604e+00
>   1 KSP Residual norm 5.450838232404e-02
>   2 KSP Residual norm 3.549673478318e-03
>   3 KSP Residual norm 2.901826808841e-04
>   4 KSP Residual norm 2.574235778729e-05
>   5 KSP Residual norm 2.253410171682e-06
>   6 KSP Residual norm 1.871067784877e-07
>   7 KSP Residual norm 1.681162800670e-08
>   8 KSP Residual norm 2.120841512414e-09
> KSP Object: 2048 MPI processes
>   type: gmres
>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>     GMRES: happy breakdown tolerance 1e-30
>   maximum iterations=200, initial guess is zero
>   tolerances: relative=1e-08, absolute=1e-50, divergence=10000
>   left preconditioning
>   using PRECONDITIONED norm type for convergence test
> PC Object: 2048 MPI processes
>   type: hypre
>     HYPRE BoomerAMG preconditioning
>     HYPRE BoomerAMG: Cycle type V
>     HYPRE BoomerAMG: Maximum number of levels 25
>     HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>     HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>     HYPRE BoomerAMG: Threshold for strong coupling 0.25
>     HYPRE BoomerAMG: Interpolation truncation factor 0
>     HYPRE BoomerAMG: Interpolation: max elements per row 0
>     HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>     HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>     HYPRE BoomerAMG: Maximum row sums 0.9
>     HYPRE BoomerAMG: Sweeps down 1
>     HYPRE BoomerAMG: Sweeps up 1
>     HYPRE BoomerAMG: Sweeps on coarse 1
>     HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>     HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>     HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>     HYPRE BoomerAMG: Relax weight (all) 1
>     HYPRE BoomerAMG: Outer relax weight (all) 1
>     HYPRE BoomerAMG: Using CF-relaxation
>     HYPRE BoomerAMG: Measure type local
>     HYPRE BoomerAMG: Coarsen type Falgout
>     HYPRE BoomerAMG: Interpolation type classical
>   linear system matrix = precond matrix:
>   Matrix Object: 2048 MPI processes
>     type: mpiaij
>     rows=4173281, cols=4173281
>     total: nonzeros=102576661, allocated nonzeros=102576661
>     total number of mallocs used during MatSetValues calls =0
>       not using I-node (on process 0) routines
> --- system solved with PETSc (in 1.005199e+02 seconds)
>
>   0 KSP Residual norm 2.368804472986e-01
>   1 KSP Residual norm 5.676430019132e-02
>   2 KSP Residual norm 1.898005876002e-02
>   3 KSP Residual norm 6.193922902926e-03
>   4 KSP Residual norm 2.008448794493e-03
>   5 KSP Residual norm 6.390465670228e-04
>   6 KSP Residual norm 2.157709394389e-04
>   7 KSP Residual norm 7.295973819979e-05
>   8 KSP Residual norm 2.358343271482e-05
>   9 KSP Residual norm 7.489696222066e-06
>  10 KSP Residual norm 2.390946857593e-06
>  11 KSP Residual norm 8.068086385140e-07
>  12 KSP Residual norm 2.706607789749e-07
>  13 KSP Residual norm 8.636910863376e-08
>  14 KSP Residual norm 2.761981175852e-08
>  15 KSP Residual norm 8.755459874369e-09
>  16 KSP Residual norm 2.708848598341e-09
>  17 KSP Residual norm 8.968748876265e-10
> KSP Object: 2048 MPI processes
>   type: gmres
>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>     GMRES: happy breakdown tolerance 1e-30
>   maximum iterations=200, initial guess is zero
>   tolerances: relative=1e-08, absolute=1e-50, divergence=10000
>   left preconditioning
>   using PRECONDITIONED norm type for convergence test
> PC Object: 2048 MPI processes
>   type: ml
>     MG: type is MULTIPLICATIVE, levels=3 cycles=v
>       Cycles per PCApply=1
>       Using Galerkin computed coarse grid matrices
>   Coarse grid solver -- level -------------------------------
>     KSP Object: (mg_coarse_) 2048 MPI processes
>       type: preonly
>       maximum iterations=1, initial guess is zero
>       tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>       left preconditioning
>       using NONE norm type for convergence test
>     PC Object: (mg_coarse_) 2048 MPI processes
>       type: redundant
>         Redundant preconditioner: First (color=0) of 2048 PCs follows
>       KSP Object: (mg_coarse_redundant_) 1 MPI processes
>         type: preonly
>         maximum iterations=10000, initial guess is zero
>         tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>         left preconditioning
>         using NONE norm type for convergence test
>       PC Object: (mg_coarse_redundant_) 1 MPI processes
>         type: lu
>           LU: out-of-place factorization
>           tolerance for zero pivot 2.22045e-14
>           using diagonal shift on blocks to prevent zero pivot
>           matrix ordering: nd
>           factor fill ratio given 5, needed 4.38504
>           Factored matrix follows:
>             Matrix Object: 1 MPI processes
>               type: seqaij
>               rows=2055, cols=2055
>               package used to perform factorization: petsc
>               total: nonzeros=2476747, allocated nonzeros=2476747
>               total number of mallocs used during MatSetValues calls =0
>                 using I-node routines: found 1638 nodes, limit used is 5
>         linear system matrix = precond matrix:
>         Matrix Object: 1 MPI processes
>           type: seqaij
>           rows=2055, cols=2055
>           total: nonzeros=564817, allocated nonzeros=1093260
>           total number of mallocs used during MatSetValues calls =0
>             not using I-node routines
>       linear system matrix = precond matrix:
>       Matrix Object: 2048 MPI processes
>         type: mpiaij
>         rows=2055, cols=2055
>         total: nonzeros=564817, allocated nonzeros=564817
>         total number of mallocs used during MatSetValues calls =0
>           not using I-node (on process 0) routines
>   Down solver (pre-smoother) on level 1 -------------------------------
>     KSP Object: (mg_levels_1_) 2048 MPI processes
>       type: richardson
>         Richardson: damping factor=1
>       maximum iterations=2
>       tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>       left preconditioning
>       using nonzero initial guess
>       using NONE norm type for convergence test
>     PC Object: (mg_levels_1_) 2048 MPI processes
>       type: sor
>         SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
>       linear system matrix = precond matrix:
>       Matrix Object: 2048 MPI processes
>         type: mpiaij
>         rows=30194, cols=30194
>         total: nonzeros=3368414, allocated nonzeros=3368414
>         total number of mallocs used during MatSetValues calls =0
>           not using I-node (on process 0) routines
>   Up solver (post-smoother) same as down solver (pre-smoother)
>   Down solver (pre-smoother) on level 2 -------------------------------
>     KSP Object: (mg_levels_2_) 2048 MPI processes
>       type: richardson
>         Richardson: damping factor=1
>       maximum iterations=2
>       tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>       left preconditioning
>       using nonzero initial guess
>       using NONE norm type for convergence test
>     PC Object: (mg_levels_2_) 2048 MPI processes
>       type: sor
>         SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
>       linear system matrix = precond matrix:
>       Matrix Object: 2048 MPI processes
>         type: mpiaij
>         rows=531441, cols=531441
>         total: nonzeros=12476324, allocated nonzeros=12476324
>         total number of mallocs used during MatSetValues calls =0
>           not using I-node (on process 0) routines
>   Up solver (post-smoother) same as down solver (pre-smoother)
>   linear system matrix = precond matrix:
>   Matrix Object: 2048 MPI processes
>     type: mpiaij
>     rows=531441, cols=531441
>     total: nonzeros=12476324, allocated nonzeros=12476324
>     total number of mallocs used during MatSetValues calls =0
>       not using I-node (on process 0) routines
> --- system solved with PETSc (in 2.407844e+02 seconds)
>
>
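A minimal sketch (hypothetical helper, not code from this thread) of how such a solver comparison is typically driven in PETSc: every solver choice is left to the options database, so -pc_type hypre, -pc_type ml, or -pc_type gamg can be swapped at run time without recompiling. It assumes A, b, and x are already assembled and uses the current KSPSetOperators signature (releases from the era of this thread take an extra MatStructure argument).

#include <petscksp.h>

/* Hypothetical helper: solve A x = b with a runtime-selectable Krylov method
   and preconditioner.  Error checking is abbreviated to CHKERRQ. */
PetscErrorCode solve_poisson(Mat A, Vec b, Vec x)
{
  KSP            ksp;
  PetscErrorCode ierr;

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);                   /* same matrix as operator and preconditioner */
  ierr = KSPSetTolerances(ksp, 1e-8, 1e-50, 1e4, 200);CHKERRQ(ierr); /* matches the tolerances shown in the logs */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);                       /* picks up -ksp_type, -pc_type, -ksp_monitor, -ksp_view, ... */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  return 0;
}

Running with -ksp_monitor -ksp_view produces the kind of output quoted above.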
