The branch should now be good to go (https://gitlab.com/petsc/petsc/-/merge_requests/6841).
Sorry, I made a mistake before, hence the error on PetscObjectQuery().
I'm not sure the code will be covered by the pipeline, but I have tested this on a Raviart-Thomas discretization with PCFIELDSPLIT.
You'll see in the attached logs that:
1) the numerics match
2) in the SBAIJ case, PCFIELDSPLIT extracts the (non-symmetric) A_{01} block from the global (symmetric) A, and we get the A_{10} block cheaply by just using MatCreateHermitianTranspose() instead of calling MatCreateSubMatrix() a second time
Please let me know if you have some time to test the branch and whether it fails or succeeds on your test cases.
Also, I do not agree with what Hong said.
Sometimes, the assembly of a coefficient can be more expensive than the communication of said coefficient.
So there are instances where SBAIJ would be more efficient than AIJ even if it required more communication; it is not a black-and-white picture.

Thanks,
Pierre
  0 KSP Residual norm 3.873169750889e+00
    Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 25
  1 KSP Residual norm 1.182487410355e-01
    Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 28
  2 KSP Residual norm 1.102241338775e-02
    Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 26
  3 KSP Residual norm 2.301967727513e-03
    Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 27
  4 KSP Residual norm 1.597010741936e-04
    Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 28
  5 KSP Residual norm 5.540316293664e-05
    Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 29
  6 KSP Residual norm 6.398182217972e-06
KSP Object: 4 MPI processes
  type: fgmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
  type: fieldsplit
    FieldSplit with Schur preconditioner, factorization FULL
    Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses A00's diagonal's inverse
    Split info:
    Split number 0 Defined by IS
    Split number 1 Defined by IS
    KSP solver for A00 block
      KSP Object: (fieldsplit_0_) 4 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (fieldsplit_0_) 4 MPI processes
        type: bjacobi
          number of blocks = 4
          Local solver information for first block is in the following KSP and PC objects on rank 0:
          Use -fieldsplit_0_ksp_view ::ascii_info_detail to display information for all blocks
          KSP Object: (fieldsplit_0_sub_) 1 MPI process
            type: preonly
            maximum iterations=10000, initial guess is zero
            tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
            left preconditioning
            using NONE norm type for convergence test
          PC Object: (fieldsplit_0_sub_) 1 MPI process
            type: icc
              out-of-place factorization
              0 levels of fill
              tolerance for zero pivot 2.22045e-14
              using Manteuffel shift [POSITIVE_DEFINITE]
              matrix ordering: natural
              factor fill ratio given 1., needed 1.
                Factored matrix follows:
                  Mat Object: (fieldsplit_0_sub_) 1 MPI process
                    type: seqsbaij
                    rows=2121, cols=2121
                    package used to perform factorization: petsc
                    total: nonzeros=21972, allocated nonzeros=21972
                    block size is 1
            linear system matrix = precond matrix:
            Mat Object: (fieldsplit_0_sub_) 1 MPI process
              type: seqsbaij
              rows=2121, cols=2121
              total: nonzeros=21972, allocated nonzeros=21972
              total number of mallocs used during MatSetValues calls=0
              block size is 1
        linear system matrix = precond matrix:
        Mat Object: (fieldsplit_0_) 4 MPI processes
          type: mpisbaij
          rows=10116, cols=10116
          total: nonzeros=105912, allocated nonzeros=105912
          total number of mallocs used during MatSetValues calls=0
          block size is 1
    KSP solver for S = A11 - A10 inv(A00) A01
      KSP Object: (fieldsplit_1_) 4 MPI processes
        type: gmres
          restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
          happy breakdown tolerance 1e-30
        maximum iterations=10000, initial guess is zero
        tolerances: relative=0.0001, absolute=1e-50, divergence=10000.
        left preconditioning
        using PRECONDITIONED norm type for convergence test
      PC Object: (fieldsplit_1_) 4 MPI processes
        type: hypre
          HYPRE BoomerAMG preconditioning
            Cycle type V
            Maximum number of levels 25
            Maximum number of iterations PER hypre call 1
            Convergence tolerance PER hypre call 0.
            Threshold for strong coupling 0.25
            Interpolation truncation factor 0.
            Interpolation: max elements per row 0
            Number of levels of aggressive coarsening 0
            Number of paths for aggressive coarsening 1
            Maximum row sums 0.9
            Sweeps down 1
            Sweeps up 1
            Sweeps on coarse 1
            Relax down symmetric-SOR/Jacobi
            Relax up symmetric-SOR/Jacobi
            Relax on coarse Gaussian-elimination
            Relax weight (all) 1.
            Outer relax weight (all) 1.
            Maximum size of coarsest grid 9
            Minimum size of coarsest grid 1
            Using CF-relaxation
            Not using more complex smoothers.
            Measure type local
            Coarsen type Falgout
            Interpolation type classical
            SpGEMM type hypre
        linear system matrix followed by preconditioner matrix:
        Mat Object: (fieldsplit_1_) 4 MPI processes
          type: schurcomplement
          rows=5712, cols=5712
            Schur complement A11 - A10 inv(A00) A01
            A11
              Mat Object: (fieldsplit_1_) 4 MPI processes
                type: mpisbaij
                rows=5712, cols=5712
                total: nonzeros=19992, allocated nonzeros=19992
                total number of mallocs used during MatSetValues calls=0
                block size is 1
            A10
              Mat Object: 4 MPI processes
                type: hermitiantranspose
                rows=5712, cols=10116
            KSP solver for A00 block viewable with the additional option -fieldsplit_0_ksp_view
            A01
              Mat Object: 4 MPI processes
                type: mpiaij
                rows=10116, cols=5712
                total: nonzeros=85680, allocated nonzeros=85680
                total number of mallocs used during MatSetValues calls=0
                using I-node (on process 0) routines: found 692 nodes, limit used is 5
        Mat Object: 4 MPI processes
          type: mpiaij
          rows=5712, cols=5712
          total: nonzeros=134208, allocated nonzeros=134208
          total number of mallocs used during MatSetValues calls=0
          using I-node (on process 0) routines: found 470 nodes, limit used is 5
  linear system matrix = precond matrix:
  Mat Object: 4 MPI processes
    type: mpisbaij
    rows=15828, cols=15828
    total: nonzeros=211584, allocated nonzeros=211584
    total number of mallocs used during MatSetValues calls=0
    block size is 1
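In the view above, A10 is reported as type hermitiantranspose with no stored nonzeros of its own, whereas A01 is an assembled mpiaij matrix. The idea can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: dense lists of real numbers (so the Hermitian transpose is the plain transpose), and the names `LazyTranspose` and `extract` are hypothetical helpers, not PETSc API.

```python
# Illustrative sketch only: plain Python lists, not PETSc data structures.

class LazyTranspose:
    """Hypothetical wrapper: keeps a reference to B and applies B^T to a
    vector without ever building B^T explicitly."""

    def __init__(self, b):
        self.b = b  # reference only: no copy, no second extraction

    def mult(self, x):
        rows, cols = len(self.b), len(self.b[0])
        y = [0.0] * cols
        for i in range(rows):
            for j in range(cols):
                y[j] += self.b[i][j] * x[i]
        return y


def extract(a, rows, cols):
    """Explicit sub-block extraction (the operation avoided for A10)."""
    return [[a[i][j] for j in cols] for i in rows]


# Small real symmetric matrix, split into fields {0, 1, 2} and {3, 4}.
A = [
    [4.0, 1.0, 0.0, 2.0, 0.0],
    [1.0, 5.0, 1.0, 0.0, 3.0],
    [0.0, 1.0, 6.0, 1.0, 1.0],
    [2.0, 0.0, 1.0, 7.0, 0.0],
    [0.0, 3.0, 1.0, 0.0, 8.0],
]
is0, is1 = [0, 1, 2], [3, 4]

A01 = extract(A, is0, is1)   # extracted once
A10 = LazyTranspose(A01)     # obtained without a second extraction

x = [1.0, 2.0, 3.0]
y_lazy = A10.mult(x)

# Reference: extract A10 explicitly from A and multiply.
A10_explicit = extract(A, is1, is0)
y_ref = [sum(row[k] * x[k] for k in range(len(x))) for row in A10_explicit]
```

Both products agree because A is symmetric, so the explicit extraction of A10 adds nothing that the wrapper around A01 does not already provide.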
  0 KSP Residual norm 3.873169750889e+00
    Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 25
  1 KSP Residual norm 1.182487410353e-01
    Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 28
  2 KSP Residual norm 1.102241338764e-02
    Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 26
  3 KSP Residual norm 2.301967727466e-03
    Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 27
  4 KSP Residual norm 1.597010741933e-04
    Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 28
  5 KSP Residual norm 5.540316293543e-05
    Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 29
  6 KSP Residual norm 6.398182217802e-06
KSP Object: 4 MPI processes
  type: fgmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
  type: fieldsplit
    FieldSplit with Schur preconditioner, factorization FULL
    Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses A00's diagonal's inverse
    Split info:
    Split number 0 Defined by IS
    Split number 1 Defined by IS
    KSP solver for A00 block
      KSP Object: (fieldsplit_0_) 4 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (fieldsplit_0_) 4 MPI processes
        type: bjacobi
          number of blocks = 4
          Local solver information for first block is in the following KSP and PC objects on rank 0:
          Use -fieldsplit_0_ksp_view ::ascii_info_detail to display information for all blocks
          KSP Object: (fieldsplit_0_sub_) 1 MPI process
            type: preonly
            maximum iterations=10000, initial guess is zero
            tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
            left preconditioning
            using NONE norm type for convergence test
          PC Object: (fieldsplit_0_sub_) 1 MPI process
            type: ilu
              out-of-place factorization
              0 levels of fill
              tolerance for zero pivot 2.22045e-14
              matrix ordering: natural
              factor fill ratio given 1., needed 1.
                Factored matrix follows:
                  Mat Object: (fieldsplit_0_sub_) 1 MPI process
                    type: seqaij
                    rows=2121, cols=2121
                    package used to perform factorization: petsc
                    total: nonzeros=41823, allocated nonzeros=41823
                    using I-node routines: found 686 nodes, limit used is 5
            linear system matrix = precond matrix:
            Mat Object: (fieldsplit_0_sub_) 1 MPI process
              type: seqaij
              rows=2121, cols=2121
              total: nonzeros=41823, allocated nonzeros=41823
              total number of mallocs used during MatSetValues calls=0
              using I-node routines: found 686 nodes, limit used is 5
        linear system matrix = precond matrix:
        Mat Object: (fieldsplit_0_) 4 MPI processes
          type: mpiaij
          rows=10116, cols=10116
          total: nonzeros=201708, allocated nonzeros=201708
          total number of mallocs used during MatSetValues calls=0
          using I-node (on process 0) routines: found 686 nodes, limit used is 5
    KSP solver for S = A11 - A10 inv(A00) A01
      KSP Object: (fieldsplit_1_) 4 MPI processes
        type: gmres
          restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
          happy breakdown tolerance 1e-30
        maximum iterations=10000, initial guess is zero
        tolerances: relative=0.0001, absolute=1e-50, divergence=10000.
        left preconditioning
        using PRECONDITIONED norm type for convergence test
      PC Object: (fieldsplit_1_) 4 MPI processes
        type: hypre
          HYPRE BoomerAMG preconditioning
            Cycle type V
            Maximum number of levels 25
            Maximum number of iterations PER hypre call 1
            Convergence tolerance PER hypre call 0.
            Threshold for strong coupling 0.25
            Interpolation truncation factor 0.
            Interpolation: max elements per row 0
            Number of levels of aggressive coarsening 0
            Number of paths for aggressive coarsening 1
            Maximum row sums 0.9
            Sweeps down 1
            Sweeps up 1
            Sweeps on coarse 1
            Relax down symmetric-SOR/Jacobi
            Relax up symmetric-SOR/Jacobi
            Relax on coarse Gaussian-elimination
            Relax weight (all) 1.
            Outer relax weight (all) 1.
            Maximum size of coarsest grid 9
            Minimum size of coarsest grid 1
            Using CF-relaxation
            Not using more complex smoothers.
            Measure type local
            Coarsen type Falgout
            Interpolation type classical
            SpGEMM type hypre
        linear system matrix followed by preconditioner matrix:
        Mat Object: (fieldsplit_1_) 4 MPI processes
          type: schurcomplement
          rows=5712, cols=5712
            Schur complement A11 - A10 inv(A00) A01
            A11
              Mat Object: (fieldsplit_1_) 4 MPI processes
                type: mpiaij
                rows=5712, cols=5712
                total: nonzeros=34272, allocated nonzeros=34272
                total number of mallocs used during MatSetValues calls=0
                using I-node (on process 0) routines: found 470 nodes, limit used is 5
            A10
              Mat Object: 4 MPI processes
                type: mpiaij
                rows=5712, cols=10116
                total: nonzeros=85680, allocated nonzeros=85680
                total number of mallocs used during MatSetValues calls=0
                using I-node (on process 0) routines: found 469 nodes, limit used is 5
            KSP solver for A00 block viewable with the additional option -fieldsplit_0_ksp_view
            A01
              Mat Object: 4 MPI processes
                type: mpiaij
                rows=10116, cols=5712
                total: nonzeros=85680, allocated nonzeros=85680
                total number of mallocs used during MatSetValues calls=0
                using I-node (on process 0) routines: found 692 nodes, limit used is 5
        Mat Object: 4 MPI processes
          type: mpiaij
          rows=5712, cols=5712
          total: nonzeros=134208, allocated nonzeros=134208
          total number of mallocs used during MatSetValues calls=0
          using I-node (on process 0) routines: found 470 nodes, limit used is 5
  linear system matrix = precond matrix:
  Mat Object: 4 MPI processes
    type: mpiaij
    rows=15828, cols=15828
    total: nonzeros=407340, allocated nonzeros=407340
    total number of mallocs used during MatSetValues calls=0
    using I-node (on process 0) routines: found 965 nodes, limit used is 5
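As a quick sanity check on the two logs, the short script below compares the outer residual norms and the reported nonzero counts. All numbers are copied from the logs above (first log: SBAIJ, second log: AIJ). The relation tested for the nonzeros is an assumption on my side, not something PETSc prints: if SBAIJ stores only the upper triangle including a full diagonal of a structurally symmetric matrix, its count should be (AIJ nonzeros + rows) / 2.

```python
# Outer KSP residual norms, copied from the two logs above.
sbaij_residuals = [
    3.873169750889e+00, 1.182487410355e-01, 1.102241338775e-02,
    2.301967727513e-03, 1.597010741936e-04, 5.540316293664e-05,
    6.398182217972e-06,
]
aij_residuals = [
    3.873169750889e+00, 1.182487410353e-01, 1.102241338764e-02,
    2.301967727466e-03, 1.597010741933e-04, 5.540316293543e-05,
    6.398182217802e-06,
]

# Largest relative difference between the two runs.
max_rel_diff = max(
    abs(s - a) / a for s, a in zip(sbaij_residuals, aij_residuals)
)

# (rows, AIJ nonzeros, SBAIJ nonzeros), copied from the logs above.
blocks = {
    "A": (15828, 407340, 211584),
    "A00": (10116, 201708, 105912),
    "A11": (5712, 34272, 19992),
    "A00 local block on rank 0": (2121, 41823, 21972),
}


def upper_triangle_count(rows, full_nnz):
    """Expected nonzeros when only the upper triangle is kept.

    Assumes a structurally symmetric pattern with every diagonal
    entry stored (an assumption, not taken from the logs).
    """
    return (full_nnz + rows) // 2


consistent = {
    name: upper_triangle_count(rows, aij_nnz) == sbaij_nnz
    for name, (rows, aij_nnz, sbaij_nnz) in blocks.items()
}

print(f"max relative residual difference: {max_rel_diff:.2e}")
for name, ok in consistent.items():
    print(f"{name}: SBAIJ count matches upper-triangle estimate: {ok}")
```

The residual histories differ by a relative amount below 1e-10, and each SBAIJ count matches the upper-triangle estimate, so SBAIJ holds roughly half the coefficients of AIJ for these blocks.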