> On Aug 19, 2020, at 7:56 PM, Jed Brown <j...@jedbrown.org> wrote:
>
> Manav Bhatia <bhatiama...@gmail.com> writes:
>
>> Thanks for the followup, Jed.
>>
>>> On Aug 19, 2020, at 7:42 PM, Jed Brown <j...@jedbrown.org> wrote:
>>>
>>> Can you share a couple example stack traces from that debugging?
>>
>> Do you mean a similar screenshot at different system sizes? Or a different
>> format?
>
> Sorry, I missed the screenshots (they were tucked away in the text/html and I
> was reading the text/plain version of your message).

Glad you found them. Please let me know if more information would help.

>
>>> About how many nonzeros per row?
>>
>> This is a 3D elasticity run with Hex8 elements. So, each row has 81 non-zero
>> entries, although I have not verified that (I will do so now). Is there a
>> command line argument that will print this for the matrix? Although, on
>> second thought that will not be printed unless the Assembly routine has
>> finished.
>
> You could run a smaller problem size with -snes_view, which would show matrix
> stats.

Here is the information from a case with 2e6 DoFs.

KSP Object: 8 MPI processes
  type: gmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: gamg
    type is MULTIPLICATIVE, levels=5 cycles=v
      Cycles per PCApply=1
      Using externally compute Galerkin coarse grid matrices
      GAMG specific options
        Threshold for dropping small values in graph on each level = 0. 0. 0.
        Threshold scaling factor for each level not specified = 1.
        AGG specific options
          Symmetric graph false
          Number of levels to square graph 1
          Number smoothing steps 1
        Complexity: grid = 1.16005
  Coarse grid solver -- level -------------------------------
    KSP Object: (mg_coarse_) 8 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_coarse_) 8 MPI processes
      type: bjacobi
        number of blocks = 8
        Local solve is same for all blocks, in the following KSP and PC objects:
      KSP Object: (mg_coarse_sub_) 1 MPI processes
        type: preonly
        maximum iterations=1, initial guess is zero
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (mg_coarse_sub_) 1 MPI processes
        type: lu
          out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
          matrix ordering: nd
          factor fill ratio given 5., needed 1.
            Factored matrix follows:
              Mat Object: 1 MPI processes
                type: seqaij
                rows=12, cols=12, bs=6
                package used to perform factorization: petsc
                total: nonzeros=144, allocated nonzeros=144
                total number of mallocs used during MatSetValues calls=0
                  using I-node routines: found 3 nodes, limit used is 5
        linear system matrix = precond matrix:
        Mat Object: 1 MPI processes
          type: seqaij
          rows=12, cols=12, bs=6
          total: nonzeros=144, allocated nonzeros=144
          total number of mallocs used during MatSetValues calls=0
            using I-node routines: found 3 nodes, limit used is 5
      linear system matrix = precond matrix:
      Mat Object: 8 MPI processes
        type: mpiaij
        rows=12, cols=12, bs=6
        total: nonzeros=144, allocated nonzeros=144
        total number of mallocs used during MatSetValues calls=0
          using I-node (on process 0) routines: found 3 nodes, limit used is 5
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object: (mg_levels_1_) 8 MPI processes
      type: chebyshev
        eigenvalue estimates used: min = 0.16303, max = 1.79333
        eigenvalues estimate via gmres min 0.0108937, max 1.6303
        eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
        KSP Object: (mg_levels_1_esteig_) 8 MPI processes
          type: gmres
            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            happy breakdown tolerance 1e-30
          maximum iterations=10, initial guess is zero
          tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
          left preconditioning
          using PRECONDITIONED norm type for convergence test
        estimating eigenvalues using noisy right hand side
      maximum iterations=4, nonzero initial guess
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_1_) 8 MPI processes
      type: sor
        type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object: 8 MPI processes
        type: mpiaij
        rows=240, cols=240, bs=6
        total: nonzeros=51912, allocated nonzeros=51912
        total number of mallocs used during MatSetValues calls=0
          using I-node (on process 0) routines: found 13 nodes, limit used is 5
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 -------------------------------
    KSP Object: (mg_levels_2_) 8 MPI processes
      type: chebyshev
        eigenvalue estimates used: min = 0.146755, max = 1.6143
        eigenvalues estimate via gmres min 0.00483441, max 1.46755
        eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
        KSP Object: (mg_levels_2_esteig_) 8 MPI processes
          type: gmres
            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            happy breakdown tolerance 1e-30
          maximum iterations=10, initial guess is zero
          tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
          left preconditioning
          using PRECONDITIONED norm type for convergence test
        estimating eigenvalues using noisy right hand side
      maximum iterations=4, nonzero initial guess
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_2_) 8 MPI processes
      type: sor
        type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object: 8 MPI processes
        type: mpiaij
        rows=6336, cols=6336, bs=6
        total: nonzeros=3902760, allocated nonzeros=3902760
        total number of mallocs used during MatSetValues calls=0
          using nonscalable MatPtAP() implementation
          using I-node (on process 0) routines: found 228 nodes, limit used is 5
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 3 -------------------------------
    KSP Object: (mg_levels_3_) 8 MPI processes
      type: chebyshev
        eigenvalue estimates used: min = 0.1525, max = 1.67751
        eigenvalues estimate via gmres min 0.0281517, max 1.525
        eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
        KSP Object: (mg_levels_3_esteig_) 8 MPI processes
          type: gmres
            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            happy breakdown tolerance 1e-30
          maximum iterations=10, initial guess is zero
          tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
          left preconditioning
          using PRECONDITIONED norm type for convergence test
        estimating eigenvalues using noisy right hand side
      maximum iterations=4, nonzero initial guess
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_3_) 8 MPI processes
      type: sor
        type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object: 8 MPI processes
        type: mpiaij
        rows=87246, cols=87246, bs=6
        total: nonzeros=21279420, allocated nonzeros=21279420
        total number of mallocs used during MatSetValues calls=0
          using nonscalable MatPtAP() implementation
          using I-node (on process 0) routines: found 3552 nodes, limit used is 5
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 -------------------------------
    KSP Object: (mg_levels_4_) 8 MPI processes
      type: chebyshev
        eigenvalue estimates used: min = 0.160784, max = 1.76862
        eigenvalues estimate via gmres min 0.0293826, max 1.60784
        eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
        KSP Object: (mg_levels_4_esteig_) 8 MPI processes
          type: gmres
            restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            happy breakdown tolerance 1e-30
          maximum iterations=10, initial guess is zero
          tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
          left preconditioning
          using PRECONDITIONED norm type for convergence test
        estimating eigenvalues using noisy right hand side
      maximum iterations=4, nonzero initial guess
      tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_levels_4_) 8 MPI processes
      type: sor
        type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
      linear system matrix = precond matrix:
      Mat Object: () 8 MPI processes
        type: mpiaij
        rows=2000103, cols=2000103, bs=3
        total: nonzeros=157666509, allocated nonzeros=160054056
        total number of mallocs used during MatSetValues calls=0
          has attached near null space
          using I-node (on process 0) routines: found 86672 nodes, limit used is 5
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object: () 8 MPI processes
    type: mpiaij
    rows=2000103, cols=2000103, bs=3
    total: nonzeros=157666509, allocated nonzeros=160054056
    total number of mallocs used during MatSetValues calls=0
      has attached near null space
      using I-node (on process 0) routines: found 86672 nodes, limit used is 5

>
> Can you try running with -matstash_legacy?

Will do and report results shortly.

>
> What version of Open MPI is this?

This is Open MPI 4.0.1, installed using MacPorts:

InfiHorizon:opt manav$ mpiexec-openmpi-clang --version
mpiexec-openmpi-clang (OpenRTE) 4.0.1
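The 81-nonzeros-per-row estimate for Hex8 elasticity can be cross-checked against the finest-level matrix statistics in the view output above. The Python sketch below assumes a structured Hex8 mesh with 3 displacement DoFs per node. `hex8_interior_nnz_per_row` and `average_nnz_per_row` are hypothetical helpers for illustration, not PETSc functions, and the totals were transcribed by hand from the output.

```python
# Finest-level (level 4) totals transcribed from the -snes_view output.
FINE_ROWS = 2_000_103
FINE_NNZ_USED = 157_666_509
FINE_NNZ_ALLOC = 160_054_056


def hex8_interior_nnz_per_row(dofs_per_node: int) -> int:
    """Row length for an interior node of a structured Hex8 mesh (assumed).

    An interior node is shared by 2 x 2 x 2 = 8 trilinear elements whose
    nodes form a 3 x 3 x 3 patch, so the node couples to 27 nodes, each
    contributing `dofs_per_node` columns.
    """
    nodes_in_patch = 3 * 3 * 3
    return nodes_in_patch * dofs_per_node


def average_nnz_per_row(nnz: int, rows: int) -> float:
    """Average row length implied by the matrix totals in the view output."""
    return nnz / rows


if __name__ == "__main__":
    # 3D elasticity: 3 displacement DoFs per node.
    print("interior bound:", hex8_interior_nnz_per_row(3))
    print(f"average used:  {average_nnz_per_row(FINE_NNZ_USED, FINE_ROWS):.2f}")
    print(f"average alloc: {average_nnz_per_row(FINE_NNZ_ALLOC, FINE_ROWS):.2f}")
```

Run as a script, it reports an interior bound of 81 and averages of about 78.83 (used) and 80.02 (allocated) nonzeros per row. Rows for nodes on the domain boundary couple to fewer neighbors, so an average slightly below 81 is what this estimate would predict.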
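The view output also lists rows and nonzeros for every level of the GAMG hierarchy. That makes it possible to see how row length changes as the Galerkin coarse operators are formed. This sketch uses numbers transcribed by hand and hypothetical function names. It computes nonzeros per row on each level and two complexity ratios. The nonzero-based ratio comes out at 1.16005, matching the printed `Complexity: grid = 1.16005`. That suggests this PETSc version computes the figure from nonzeros rather than rows. The row-based ratio is about 1.047.

```python
# (level, rows, nonzeros) transcribed from the -snes_view output;
# level 0 is the coarse grid and level 4 is the original (finest) matrix.
LEVELS = [
    (0, 12, 144),
    (1, 240, 51_912),
    (2, 6_336, 3_902_760),
    (3, 87_246, 21_279_420),
    (4, 2_000_103, 157_666_509),
]


def nnz_complexity(levels):
    """Total nonzeros over all levels divided by finest-level nonzeros."""
    finest_nnz = levels[-1][2]
    return sum(nnz for _, _, nnz in levels) / finest_nnz


def row_complexity(levels):
    """Total rows over all levels divided by finest-level rows."""
    finest_rows = levels[-1][1]
    return sum(rows for _, rows, _ in levels) / finest_rows


if __name__ == "__main__":
    for level, rows, nnz in LEVELS:
        print(f"level {level}: {rows:>9} rows, {nnz / rows:7.1f} nnz/row")
    print(f"nnz-based complexity: {nnz_complexity(LEVELS):.5f}")
    print(f"row-based complexity: {row_complexity(LEVELS):.5f}")
```

On this run the coarse operators on levels 2 and 3 (built with MatPtAP()) average roughly 616 and 244 nonzeros per row. That is several times denser than the roughly 79 of the fine matrix.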