Re: [petsc-users] PCFIELDSPLIT with MATSBAIJ

Pierre Jolivet Mon, 04 Sep 2023 12:31:52 -0700

The branch should now be good to go (https://gitlab.com/petsc/petsc/-/merge_requests/6841).

Sorry, I made a mistake before, hence the error on PetscObjectQuery().

I’m not sure the code will be covered by the pipeline, but I have tested this on a Raviart—Thomas discretization with PCFIELDSPLIT.

You’ll see in the attached logs that:

1) the numerics match

2) in the SBAIJ case, PCFIELDSPLIT extract the (non-symmetric) A_{01} block from the global (symmetric) A and we get the A_{10} block cheaply by just using MatCreateHermitianTranspose(), instead of calling another time MatCreateSubMatrix()

Please let me know if you have some time to test the branch and whether it fails or succeeds on your test cases.

Also, I do not agree with what Hong said.

Sometimes, the assembly of a coefficient can be more expensive than the communication of the said coefficient.

So they are instances where SBAIJ would be more efficient than AIJ even if it would require more communication, it is not a black and white picture.

Thanks,

Pierre

  0 KSP Residual norm 3.873169750889e+00 
  Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 25
  1 KSP Residual norm 1.182487410355e-01 
  Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 28
  2 KSP Residual norm 1.102241338775e-02 
  Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 26
  3 KSP Residual norm 2.301967727513e-03 
  Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 27
  4 KSP Residual norm 1.597010741936e-04 
  Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 28
  5 KSP Residual norm 5.540316293664e-05 
  Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 29
  6 KSP Residual norm 6.398182217972e-06 
KSP Object: 4 MPI processes
  type: fgmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization 
with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
  type: fieldsplit
    FieldSplit with Schur preconditioner, factorization FULL
    Preconditioner for the Schur complement formed from Sp, an assembled 
approximation to S, which uses A00's diagonal's inverse
    Split info:
    Split number 0 Defined by IS
    Split number 1 Defined by IS
    KSP solver for A00 block
      KSP Object: (fieldsplit_0_) 4 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (fieldsplit_0_) 4 MPI processes
        type: bjacobi
          number of blocks = 4
          Local solver information for first block is in the following KSP and 
PC objects on rank 0:
          Use -fieldsplit_0_ksp_view ::ascii_info_detail to display information 
for all blocks
        KSP Object: (fieldsplit_0_sub_) 1 MPI process
          type: preonly
          maximum iterations=10000, initial guess is zero
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
          left preconditioning
          using NONE norm type for convergence test
        PC Object: (fieldsplit_0_sub_) 1 MPI process
          type: icc
            out-of-place factorization
            0 levels of fill
            tolerance for zero pivot 2.22045e-14
            using Manteuffel shift [POSITIVE_DEFINITE]
            matrix ordering: natural
            factor fill ratio given 1., needed 1.
              Factored matrix follows:
                Mat Object: (fieldsplit_0_sub_) 1 MPI process
                  type: seqsbaij
                  rows=2121, cols=2121
                  package used to perform factorization: petsc
                  total: nonzeros=21972, allocated nonzeros=21972
                      block size is 1
          linear system matrix = precond matrix:
          Mat Object: (fieldsplit_0_sub_) 1 MPI process
            type: seqsbaij
            rows=2121, cols=2121
            total: nonzeros=21972, allocated nonzeros=21972
            total number of mallocs used during MatSetValues calls=0
                block size is 1
        linear system matrix = precond matrix:
        Mat Object: (fieldsplit_0_) 4 MPI processes
          type: mpisbaij
          rows=10116, cols=10116
          total: nonzeros=105912, allocated nonzeros=105912
          total number of mallocs used during MatSetValues calls=0
              block size is 1
    KSP solver for S = A11 - A10 inv(A00) A01 
      KSP Object: (fieldsplit_1_) 4 MPI processes
        type: gmres
          restart=30, using Classical (unmodified) Gram-Schmidt 
Orthogonalization with no iterative refinement
          happy breakdown tolerance 1e-30
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=0.0001, absolute=1e-50, divergence=10000.
        left preconditioning
        using PRECONDITIONED norm type for convergence test
      PC Object: (fieldsplit_1_) 4 MPI processes
        type: hypre
          HYPRE BoomerAMG preconditioning
            Cycle type V
            Maximum number of levels 25
            Maximum number of iterations PER hypre call 1
            Convergence tolerance PER hypre call 0.
            Threshold for strong coupling 0.25
            Interpolation truncation factor 0.
            Interpolation: max elements per row 0
            Number of levels of aggressive coarsening 0
            Number of paths for aggressive coarsening 1
            Maximum row sums 0.9
            Sweeps down         1
            Sweeps up           1
            Sweeps on coarse    1
            Relax down          symmetric-SOR/Jacobi
            Relax up            symmetric-SOR/Jacobi
            Relax on coarse     Gaussian-elimination
            Relax weight  (all)      1.
            Outer relax weight (all) 1.
            Maximum size of coarsest grid 9
            Minimum size of coarsest grid 1
            Using CF-relaxation
            Not using more complex smoothers.
            Measure type        local
            Coarsen type        Falgout
            Interpolation type  classical
            SpGEMM type         hypre
        linear system matrix followed by preconditioner matrix:
        Mat Object: (fieldsplit_1_) 4 MPI processes
          type: schurcomplement
          rows=5712, cols=5712
            Schur complement A11 - A10 inv(A00) A01
            A11
              Mat Object: (fieldsplit_1_) 4 MPI processes
                type: mpisbaij
                rows=5712, cols=5712
                total: nonzeros=19992, allocated nonzeros=19992
                total number of mallocs used during MatSetValues calls=0
                    block size is 1
            A10
              Mat Object: 4 MPI processes
                type: hermitiantranspose
                rows=5712, cols=10116
            KSP solver for A00 block viewable with the additional option 
-fieldsplit_0_ksp_view
            A01
              Mat Object: 4 MPI processes
                type: mpiaij
                rows=10116, cols=5712
                total: nonzeros=85680, allocated nonzeros=85680
                total number of mallocs used during MatSetValues calls=0
                  using I-node (on process 0) routines: found 692 nodes, limit 
used is 5
        Mat Object: 4 MPI processes
          type: mpiaij
          rows=5712, cols=5712
          total: nonzeros=134208, allocated nonzeros=134208
          total number of mallocs used during MatSetValues calls=0
            using I-node (on process 0) routines: found 470 nodes, limit used 
is 5
  linear system matrix = precond matrix:
  Mat Object: 4 MPI processes
    type: mpisbaij
    rows=15828, cols=15828
    total: nonzeros=211584, allocated nonzeros=211584
    total number of mallocs used during MatSetValues calls=0
        block size is 1

  0 KSP Residual norm 3.873169750889e+00 
  Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 25
  1 KSP Residual norm 1.182487410353e-01 
  Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 28
  2 KSP Residual norm 1.102241338764e-02 
  Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 26
  3 KSP Residual norm 2.301967727466e-03 
  Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 27
  4 KSP Residual norm 1.597010741933e-04 
  Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 28
  5 KSP Residual norm 5.540316293543e-05 
  Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 29
  6 KSP Residual norm 6.398182217802e-06 
KSP Object: 4 MPI processes
  type: fgmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization 
with no iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
  type: fieldsplit
    FieldSplit with Schur preconditioner, factorization FULL
    Preconditioner for the Schur complement formed from Sp, an assembled 
approximation to S, which uses A00's diagonal's inverse
    Split info:
    Split number 0 Defined by IS
    Split number 1 Defined by IS
    KSP solver for A00 block
      KSP Object: (fieldsplit_0_) 4 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (fieldsplit_0_) 4 MPI processes
        type: bjacobi
          number of blocks = 4
          Local solver information for first block is in the following KSP and 
PC objects on rank 0:
          Use -fieldsplit_0_ksp_view ::ascii_info_detail to display information 
for all blocks
        KSP Object: (fieldsplit_0_sub_) 1 MPI process
          type: preonly
          maximum iterations=10000, initial guess is zero
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
          left preconditioning
          using NONE norm type for convergence test
        PC Object: (fieldsplit_0_sub_) 1 MPI process
          type: ilu
            out-of-place factorization
            0 levels of fill
            tolerance for zero pivot 2.22045e-14
            matrix ordering: natural
            factor fill ratio given 1., needed 1.
              Factored matrix follows:
                Mat Object: (fieldsplit_0_sub_) 1 MPI process
                  type: seqaij
                  rows=2121, cols=2121
                  package used to perform factorization: petsc
                  total: nonzeros=41823, allocated nonzeros=41823
                    using I-node routines: found 686 nodes, limit used is 5
          linear system matrix = precond matrix:
          Mat Object: (fieldsplit_0_sub_) 1 MPI process
            type: seqaij
            rows=2121, cols=2121
            total: nonzeros=41823, allocated nonzeros=41823
            total number of mallocs used during MatSetValues calls=0
              using I-node routines: found 686 nodes, limit used is 5
        linear system matrix = precond matrix:
        Mat Object: (fieldsplit_0_) 4 MPI processes
          type: mpiaij
          rows=10116, cols=10116
          total: nonzeros=201708, allocated nonzeros=201708
          total number of mallocs used during MatSetValues calls=0
            using I-node (on process 0) routines: found 686 nodes, limit used 
is 5
    KSP solver for S = A11 - A10 inv(A00) A01 
      KSP Object: (fieldsplit_1_) 4 MPI processes
        type: gmres
          restart=30, using Classical (unmodified) Gram-Schmidt 
Orthogonalization with no iterative refinement
          happy breakdown tolerance 1e-30
        maximum iterations=10000, initial guess is zero
        tolerances:  relative=0.0001, absolute=1e-50, divergence=10000.
        left preconditioning
        using PRECONDITIONED norm type for convergence test
      PC Object: (fieldsplit_1_) 4 MPI processes
        type: hypre
          HYPRE BoomerAMG preconditioning
            Cycle type V
            Maximum number of levels 25
            Maximum number of iterations PER hypre call 1
            Convergence tolerance PER hypre call 0.
            Threshold for strong coupling 0.25
            Interpolation truncation factor 0.
            Interpolation: max elements per row 0
            Number of levels of aggressive coarsening 0
            Number of paths for aggressive coarsening 1
            Maximum row sums 0.9
            Sweeps down         1
            Sweeps up           1
            Sweeps on coarse    1
            Relax down          symmetric-SOR/Jacobi
            Relax up            symmetric-SOR/Jacobi
            Relax on coarse     Gaussian-elimination
            Relax weight  (all)      1.
            Outer relax weight (all) 1.
            Maximum size of coarsest grid 9
            Minimum size of coarsest grid 1
            Using CF-relaxation
            Not using more complex smoothers.
            Measure type        local
            Coarsen type        Falgout
            Interpolation type  classical
            SpGEMM type         hypre
        linear system matrix followed by preconditioner matrix:
        Mat Object: (fieldsplit_1_) 4 MPI processes
          type: schurcomplement
          rows=5712, cols=5712
            Schur complement A11 - A10 inv(A00) A01
            A11
              Mat Object: (fieldsplit_1_) 4 MPI processes
                type: mpiaij
                rows=5712, cols=5712
                total: nonzeros=34272, allocated nonzeros=34272
                total number of mallocs used during MatSetValues calls=0
                  using I-node (on process 0) routines: found 470 nodes, limit 
used is 5
            A10
              Mat Object: 4 MPI processes
                type: mpiaij
                rows=5712, cols=10116
                total: nonzeros=85680, allocated nonzeros=85680
                total number of mallocs used during MatSetValues calls=0
                  using I-node (on process 0) routines: found 469 nodes, limit 
used is 5
            KSP solver for A00 block viewable with the additional option 
-fieldsplit_0_ksp_view
            A01
              Mat Object: 4 MPI processes
                type: mpiaij
                rows=10116, cols=5712
                total: nonzeros=85680, allocated nonzeros=85680
                total number of mallocs used during MatSetValues calls=0
                  using I-node (on process 0) routines: found 692 nodes, limit 
used is 5
        Mat Object: 4 MPI processes
          type: mpiaij
          rows=5712, cols=5712
          total: nonzeros=134208, allocated nonzeros=134208
          total number of mallocs used during MatSetValues calls=0
            using I-node (on process 0) routines: found 470 nodes, limit used 
is 5
  linear system matrix = precond matrix:
  Mat Object: 4 MPI processes
    type: mpiaij
    rows=15828, cols=15828
    total: nonzeros=407340, allocated nonzeros=407340
    total number of mallocs used during MatSetValues calls=0
      using I-node (on process 0) routines: found 965 nodes, limit used is 5

On 28 Aug 2023, at 12:12 PM, Pierre Jolivet <pierre.joli...@lip6.fr> wrote:

On 28 Aug 2023, at 6:50 PM, Carl-Johan Thore <carljohanth...@gmail.com> wrote:

I've tried the new files, and with them, PCFIELDSPLIT now gets set up without crashes (but the setup is significantly slower than for MATAIJ)

I’ll be back from Japan at the end of this week, my schedule is too packed to get anything done in the meantime.
But I’ll let you know when things are working properly (last I checked, I think it was working, but I may have forgotten about a corner case or two).
But yes, though one would except things to be faster and less memory intensive with SBAIJ, it’s unfortunately not always the case.

Thanks,
Pierre

Unfortunately I still get errors later in the process:

[0]PETSC ERROR: Null argument, when expecting valid pointer
[0]PETSC ERROR: Null Pointer: Parameter # 1
[0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-1023-ga6d78fcba1d GIT Date: 2023-08-22 20:32:33 -0400
[0]PETSC ERROR: Configure options -f --with-fortran-bindings=0 --with-cuda --with-cusp --download-scalapack --download-hdf5 --download-zlib --download-mumps --download-parmetis --download-metis --download-ptscotch --download-hypre --download-spai
[0]PETSC ERROR: #1 PetscObjectQuery() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/sys/objects/inherit.c:742
[0]PETSC ERROR: #2 MatCreateSubMatrix_MPISBAIJ() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/mat/impls/sbaij/mpi/mpisbaij.c:1414
[0]PETSC ERROR: #3 MatCreateSubMatrix() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/mat/interface/matrix.c:8476
[0]PETSC ERROR: #4 PCSetUp_FieldSplit() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/ksp/pc/impls/fieldsplit/fieldsplit.c:826
[0]PETSC ERROR: #5 PCSetUp() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/ksp/pc/interface/precon.c:1069
[0]PETSC ERROR: #6 KSPSetUp() at /mnt/c/mathware/petsc/petsc-v3-19-4/src/ksp/ksp/interface/itfunc.c:415

The code I'm running here works without any problems for MATAIJ. To run it with MATSBAIJ I've simply used the command-line option
-dm_mat_type sbaij

Kind regards,
Carl-Johan

On Sat, Aug 26, 2023 at 5:21 PM Pierre Jolivet via petsc-users <petsc-users@mcs.anl.gov> wrote:

On 27 Aug 2023, at 12:14 AM, Carl-Johan Thore <carl-johan.th...@liu.se> wrote:

“Well, your A00 and A11 will possibly be SBAIJ also, so you’ll end up with the same issue.”
I’m not sure I follow. Does PCFIELDSPLIT extract further submatrices from these blocks, or is there
somewhere else in the code that things will go wrong?

Ah, no, you are right, in that case it should work.

For the MATNEST I was thinking to get some savings from the block-symmetry at least
even if symmetry in A00 and A11 cannot be exploited; using SBAIJ for them would just be a
(pretty big) bonus.

“I’ll rebase on top of main and try to get it integrated if it could be useful to you (but I’m traveling
right now so everything gets done more slowly, sorry).”
Sound great, Thanks again!

The MR is there https://gitlab.com/petsc/petsc/-/merge_requests/6841.
I need to add a new code path in MatCreateRedundantMatrix() to make sure the resulting Mat is indeed SBAIJ, but that is orthogonal to the PCFIELDSPLIT issue.
The branch should be usable in its current state.

Thanks,
Pierre

From: Pierre Jolivet <pierre.joli...@lip6.fr>
Sent: Saturday, August 26, 2023 4:36 PM
To: Carl-Johan Thore <carl-johan.th...@liu.se>
Cc: Carl-Johan Thore <carljohanth...@gmail.com>; petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] PCFIELDSPLIT with MATSBAIJ

On 26 Aug 2023, at 11:16 PM, Carl-Johan Thore <carl-johan.th...@liu.se> wrote:

"(Sadly) MATSBAIJ is extremely broken, in particular, it cannot be used to retrieve rectangular blocks in MatCreateSubMatrices, thus you cannot get the A01 and A10 blocks in PCFIELDSPLIT.
I have a branch that fixes this, but I haven’t rebased in a while (and I’m AFK right now), would you want me to rebase and give it a go, or must you stick to a release tarball?"

Ok, would be great if you could look at this! I don't need to stick to any particular branch.

Do you think MATNEST could be an alternative here?

Well, your A00 and A11 will possibly be SBAIJ also, so you’ll end up with the same issue.
I’m using both approaches (monolithic SBAIJ or Nest + SBAIJ), it was crashing but I think it was thoroughly fixed in https://gitlab.com/petsc/petsc/-/commits/jolivet/feature-matcreatesubmatrices-rectangular-sbaij/
It is ugly code on top of ugly code, so I didn’t try to get it integrated and just used the branch locally, and then moved to some other stuff.
I’ll rebase on top of main and try to get it integrated if it could be useful to you (but I’m traveling right now so everything gets done more slowly, sorry).

Thanks,
Pierre

My matrix is
[A00 A01;
A01^t A11]
so perhaps with MATNEST I can make use of the block-symmetry at least, and then use MATSBAIJ for
A00 and A11 if it's possible to combine matrix types which the manual seems to imply.

Kind regards
Carl-Johan

On 26 Aug 2023, at 10:09 PM, Carl-Johan Thore <carljohanth...@gmail.com> wrote:

Hi,

I'm trying to use PCFIELDSPLIT with MATSBAIJ in PETSc 3.19.4.
According to the manual "[t]he fieldsplit preconditioner cannot
currently be used with the MATBAIJ or MATSBAIJ data formats if the
blocksize is larger than 1". Since my blocksize is exactly 1 it would seem that I can use PCFIELDSPLIT. But this fails with "PETSC ERROR: For symmetric format, iscol must equal isrow"
from MatCreateSubMatrix_MPISBAIJ. Tracing backwards one ends up in
fieldsplit.c at

/* extract the A01 and A10 matrices */ ilink = jac->head;
PetscCall(ISComplement(ilink->is_col, rstart, rend, &ccis)); if
(jac->offdiag_use_amat) { PetscCall(MatCreateSubMatrix(pc->mat,
ilink->is, ccis, MAT_INITIAL_MATRIX, &jac->B)); } else {
PetscCall(MatCreateSubMatrix(pc->pmat, ilink->is, ccis,
MAT_INITIAL_MATRIX, &jac->B)); }

This, since my A01 and A10 are not square, seems to explain why iscol is not equal to isrow.
From this I gather that it is in general NOT possible to use
PCFIELDSPLIT with MATSBAIJ even with block size 1?

Kind regards,
Carl-Johan

Re: [petsc-users] PCFIELDSPLIT with MATSBAIJ

Reply via email to