Hello Pierre,

After applying your patch to my local version of PETSc, all of the cases that previously caused the provided test to fail are now running smoothly. In a more complex context (with more processes and colors in my application), no errors are found and the sub-matrices look OK.

Thank you very much for your time. This debugged function will greatly simplify the development of my application.

Best regards

A.S.

On 26/08/2025 at 15:46, Matthew Knepley wrote:
On Tue, Aug 26, 2025 at 9:42 AM Pierre Jolivet <pie...@joliv.et> wrote:

    It’s indeed very suspicious (to me) that we are using rmap to
    change a column index.
    Switching to cmap gets your code running, but I’ll need to see if
    this triggers regressions.


That looks right to me. I am sure this has only been tested for GASM, which would be symmetric.

  Thanks,

     Matt

    Thanks for the report,
    Pierre

    diff --git a/src/mat/impls/aij/mpi/mpiov.c b/src/mat/impls/aij/mpi/mpiov.c
    index d1037d7d817..051981ebe9a 100644
    --- a/src/mat/impls/aij/mpi/mpiov.c
    +++ b/src/mat/impls/aij/mpi/mpiov.c
    @@ -2948,3 +2948,3 @@ PetscErrorCode MatSetSeqMats_MPIAIJ(Mat C, IS rowemb, IS dcolemb, IS ocolemb, Ma
    -    PetscCall(PetscLayoutGetRange(C->rmap, &rstart, &rend));
    +    PetscCall(PetscLayoutGetRange(C->cmap, &rstart, &rend));
         shift      = rend - rstart;
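    For a rectangular layout the row and column ownership ranges differ,
    which is why shifting a column index with rmap can go out of range.
    A minimal user-level sketch showing the two ranges (hypothetical
    12 x 4 matrix on 3 ranks; this is not the internals of
    MatSetSeqMats_MPIAIJ):

    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat      A;
      PetscInt rStart, rEnd, cStart, cEnd;

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
      /* 12 x 4 matrix on 3 ranks: 4 local rows each, but only 1 or 2 local columns */
      PetscCall(MatCreateAIJ(PETSC_COMM_WORLD, 4, PETSC_DECIDE, 12, 4, 1, NULL, 1, NULL, &A));
      PetscCall(MatGetOwnershipRange(A, &rStart, &rEnd));       /* range of A->rmap */
      PetscCall(MatGetOwnershipRangeColumn(A, &cStart, &cEnd)); /* range of A->cmap */
      PetscCall(PetscSynchronizedPrintf(PETSC_COMM_WORLD, "rows [%" PetscInt_FMT ", %" PetscInt_FMT ")  cols [%" PetscInt_FMT ", %" PetscInt_FMT ")\n", rStart, rEnd, cStart, cEnd));
      PetscCall(PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT));
      PetscCall(MatDestroy(&A));
      PetscCall(PetscFinalize());
      return 0;
    }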

    $ cat proc_0_output.txt
    rstart 0 rend 4
    Mat Object: 3 MPI processes
      type: mpiaij
      row 0:   (0, 101.)    (3, 104.)    (6, 107.)  (9, 110.)
      row 1:   (2, 203.)    (5, 206.)    (8, 209.)  (11, 212.)
      row 2:   (1, 302.)    (4, 305.)    (7, 308.)  (10, 311.)
      row 3:   (0, 401.)    (3, 404.)    (6, 407.)  (9, 410.)
      row 4:   (2, 503.)    (5, 506.)    (8, 509.)  (11, 512.)
      row 5:   (1, 602.)    (4, 605.)    (7, 608.)  (10, 611.)
      row 6:   (0, 701.)    (3, 704.)    (6, 707.)  (9, 710.)
      row 7:   (2, 803.)    (5, 806.)    (8, 809.)  (11, 812.)
      row 8:   (1, 902.)    (4, 905.)    (7, 908.)  (10, 911.)
      row 9:   (0, 1001.)    (3, 1004.)    (6, 1007.)  (9, 1010.)
      row 10:   (2, 1103.)    (5, 1106.)    (8, 1109.)    (11, 1112.)
      row 11:   (1, 1202.)    (4, 1205.)    (7, 1208.)    (10, 1211.)
    idxr proc
    IS Object: 2 MPI processes
      type: general
    [0] Number of indices in set 4
    [0] 0 0
    [0] 1 1
    [0] 2 2
    [0] 3 3
    [1] Number of indices in set 4
    [1] 0 4
    [1] 1 5
    [1] 2 6
    [1] 3 7
    idxc proc
    IS Object: 2 MPI processes
      type: general
    [0] Number of indices in set 2
    [0] 0 0
    [0] 1 1
    [1] Number of indices in set 2
    [1] 0 6
    [1] 1 7
    Mat Object: 2 MPI processes
      type: mpiaij
      row 0:   (0, 101.)    (2, 107.)
      row 1:
      row 2:   (1, 302.)    (3, 308.)
      row 3:   (0, 401.)    (2, 407.)
      row 4:
      row 5:   (1, 602.)    (3, 608.)
      row 6:   (0, 701.)    (2, 707.)
      row 7:
    rstart 0 rend 4
    local row 0: ( 0 , 1.010000e+02) ( 2 , 1.070000e+02)
    local row 1:
    local row 2: ( 1 , 3.020000e+02) ( 3 , 3.080000e+02)
    local row 3: ( 0 , 4.010000e+02) ( 2 , 4.070000e+02)

    On 26 Aug 2025, at 3:18 PM, Pierre Jolivet <pie...@joliv.et> wrote:


    On 26 Aug 2025, at 12:50 PM, Alexis SALZMAN
    <alexis.salz...@ec-nantes.fr> wrote:

    Mark, you were right and I was wrong about the dense matrix.
    Adding explicit zeros to the distributed matrix from which the
    sub-matrices are extracted (making it effectively dense) does not
    change the behaviour in my test: there is still an error.

    I am finding it increasingly difficult to understand the logic of
    the row and column 'IS' creation. Some of my tests do produce the
    desired result, a rectangular sub-matrix (so rectangular as well as
    square sub-matrices appear to be possible), but many others end
    with the same kind of error.

    This may be a PETSc bug in MatSetSeqMats_MPIAIJ().
    -> 2965   PetscCall(MatSetValues(aij->B, 1, &row, 1, &col, &v,
    INSERT_VALUES));
    col has a value of 4, which doesn’t make sense since the output
    Mat has 4 columns (thus, as the error message suggests, the value
    should be less than or equal to 3).

    Thanks,
    Pierre

    From what I observed, the test only works if the column selection
    contribution (size_c in the test) has a specific value related to
    the row selection contribution (size_r in the test) on proc 0
    (which has rank 0 in both the communicator and the sub-communicator):

      * if size_r==2, then it works as long as size_c<=2;
      * if 3<=size_r<=5, then size_c==size_r is the only working case.

    This holds "regardless" of what is requested on proc 1 and in
    selr/selc (they cannot be dummy settings, though). In any case,
    this is certainly not an exhaustive analysis.

    Many thanks to anyone who can explain to me the logic behind the
    construction of row and column 'IS'.
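    For reference, the kind of construction I am attempting looks
    roughly like the sketch below (the index values and sizes are
    placeholders, not the ones from the attached test):

    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat         A, *submats;
      IS          irow, icol;
      MPI_Comm    subcomm;
      PetscMPIInt rank;
      PetscInt    rows[2], cols[1];

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
      PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
      /* 3 processes assumed: colour 0 = ranks 0 and 1, colour 1 = rank 2 */
      PetscCallMPI(MPI_Comm_split(PETSC_COMM_WORLD, rank < 2 ? 0 : 1, rank, &subcomm));

      /* 12 x 12 distributed matrix to extract from; filling it is omitted here */
      PetscCall(MatCreateAIJ(PETSC_COMM_WORLD, 4, 4, 12, 12, 4, NULL, 8, NULL, &A));
      PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
      PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

      /* global indices of A, spread over the sub-communicator: placeholder
         choice giving a 4 x 2 sub-matrix on colour 0 (2 rows, 1 column per rank) */
      rows[0] = 2 * rank;
      rows[1] = 2 * rank + 1;
      cols[0] = rank;
      PetscCall(ISCreateGeneral(subcomm, 2, rows, PETSC_COPY_VALUES, &irow));
      PetscCall(ISCreateGeneral(subcomm, 1, cols, PETSC_COPY_VALUES, &icol));

      PetscCall(MatCreateSubMatricesMPI(A, 1, &irow, &icol, MAT_INITIAL_MATRIX, &submats));

      PetscCall(MatDestroySubMatrices(1, &submats));
      PetscCall(ISDestroy(&irow));
      PetscCall(ISDestroy(&icol));
      PetscCall(MatDestroy(&A));
      PetscCallMPI(MPI_Comm_free(&subcomm));
      PetscCall(PetscFinalize());
      return 0;
    }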

    Regards

    A.S.


    On 25/08/2025 at 20:00, Alexis SALZMAN wrote:

    Thanks Mark for your attention.

    The full, uncleaned error message (unlike the trimmed one in my
    July post) is as follows:

    [0]PETSC ERROR: --------------------- Error Message
    --------------------------------------------------------------
    [0]PETSC ERROR: Argument out of range
    [0]PETSC ERROR: Column too large: col 4 max 3
    [0]PETSC ERROR: See https://petsc.org/release/faq/
    for trouble shooting.
    [0]PETSC ERROR: Petsc Release Version 3.22.2, unknown
    [0]PETSC ERROR: subnb with 3 MPI process(es) and PETSC_ARCH on
    pc-str97.ec-nantes.fr by salzman Mon Aug 25 19:11:37 2025
    [0]PETSC ERROR: Configure options: PETSC_ARCH=real_fc41_Release_gcc_i4
    PETSC_DIR=/home/salzman/devel/ExternalLib/build/PETSC/petsc --doCleanup=1
    --with-scalar-type=real --known-level1-dcache-linesize=64 --with-cc=gcc
    --CFLAGS="-fPIC " --CC_LINKER_FLAGS=-fopenmp --with-cxx=g++
    --with-cxx-dialect=c++20 --CXXFLAGS="-fPIC " --CXX_LINKER_FLAGS=-fopenmp
    --with-fc=gfortran --FFLAGS="-fPIC " --FC_LINKER_FLAGS=-fopenmp
    --with-debugging=0 --with-fortran-bindings=0 --with-fortran-kernels=1
    --with-mpi-compilers=0 --with-mpi-include=/usr/include/openmpi-x86_64
    --with-mpi-lib="[/usr/lib64/openmpi/lib/libmpi.so,/usr/lib64/openmpi/lib/libmpi.so,/usr/lib64/openmpi/lib/libmpi_mpifh.so]"
    --with-blas-lib="[/opt/intel/oneapi/mkl/latest/lib/libmkl_intel_lp64.so,/opt/intel/oneapi/mkl/latest/lib/libmkl_gnu_thread.so,/opt/intel/oneapi/mkl/latest/lib/libmkl_core.so]"
    --with-lapack-lib="[/opt/intel/oneapi/mkl/latest/lib/libmkl_intel_lp64.so,/opt/intel/oneapi/mkl/latest/lib/libmkl_gnu_thread.so,/opt/intel/oneapi/mkl/latest/lib/libmkl_core.so]"
    --with-mumps=1 --with-mumps-include=/home/salzman/local/i4_gcc/include
    --with-mumps-lib="[/home/salzman/local/i4_gcc/lib/libdmumps.so,/home/salzman/local/i4_gcc/lib/libmumps_common.so,/home/salzman/local/i4_gcc/lib/libpord.so]"
    --with-scalapack-lib="[/opt/intel/oneapi/mkl/latest/lib/libmkl_scalapack_lp64.so,/opt/intel/oneapi/mkl/latest/lib/libmkl_blacs_openmpi_lp64.so]"
    --with-mkl_pardiso=1 --with-mkl_pardiso-include=/opt/intel/oneapi/mkl/latest/include
    --with-mkl_pardiso-lib="[/opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_lp64.so]"
    --with-hdf5=1 --with-hdf5-include=/usr/include/openmpi-x86_64
    --with-hdf5-lib="[/usr/lib64/openmpi/lib/libhdf5.so]" --with-pastix=0
    --download-pastix=no --with-hwloc=1 --with-hwloc-dir=/home/salzman/local/i4_gcc
    --download-hwloc=no --with-ptscotch-include=/home/salzman/local/i4_gcc/include
    --with-ptscotch-lib="[/home/salzman/local/i4_gcc/lib/libptscotch.a,/home/salzman/local/i4_gcc/lib/libptscotcherr.a,/home/salzman/local/i4_gcc/lib/libptscotcherrexit.a,/home/salzman/local/i4_gcc/lib/libscotch.a,/home/salzman/local/i4_gcc/lib/libscotcherr.a,/home/salzman/local/i4_gcc/lib/libscotcherrexit.a]"
    --with-hypre=1 --download-hypre=yes --with-suitesparse=1
    --with-suitesparse-include=/home/salzman/local/i4_gcc/include
    --with-suitesparse-lib="[/home/salzman/local/i4_gcc/lib/libsuitesparseconfig.so,/home/salzman/local/i4_gcc/lib/libumfpack.so,/home/salzman/local/i4_gcc/lib/libklu.so,/home/salzman/local/i4_gcc/lib/libcholmod.so,/home/salzman/local/i4_gcc/lib/libspqr.so,/home/salzman/local/i4_gcc/lib/libcolamd.so,/home/salzman/local/i4_gcc/lib/libccolamd.so,/home/salzman/local/i4_gcc/lib/libcamd.so,/home/salzman/local/i4_gcc/lib/libamd.so,/home/salzman/local/i4_gcc/lib/libmetis.so]"
    --download-suitesparse=no --with-python-exec=python3.12 --have-numpy=1
    ---with-petsc4py=1 ---with-petsc4py-test-np=4 ---with-mpi4py=1
    --prefix=/home/salzman/local/i4_gcc/real_arithmetic
    COPTFLAGS="-O3 -g " CXXOPTFLAGS="-O3 -g " FOPTFLAGS="-O3 -g "
    [0]PETSC ERROR: #1 MatSetValues_SeqAIJ() at
    /home/salzman/devel/PETSc/petsc/src/mat/impls/aij/seq/aij.c:426
    [0]PETSC ERROR: #2 MatSetValues() at
    /home/salzman/devel/PETSc/petsc/src/mat/interface/matrix.c:1543
    [0]PETSC ERROR: #3 MatSetSeqMats_MPIAIJ() at
    /home/salzman/devel/PETSc/petsc/src/mat/impls/aij/mpi/mpiov.c:2965
    [0]PETSC ERROR: #4 MatCreateSubMatricesMPI_MPIXAIJ() at
    /home/salzman/devel/PETSc/petsc/src/mat/impls/aij/mpi/mpiov.c:3163
    [0]PETSC ERROR: #5 MatCreateSubMatricesMPI_MPIAIJ() at
    /home/salzman/devel/PETSc/petsc/src/mat/impls/aij/mpi/mpiov.c:3196
    [0]PETSC ERROR: #6 MatCreateSubMatricesMPI() at
    /home/salzman/devel/PETSc/petsc/src/mat/interface/matrix.c:7293
    [0]PETSC ERROR: #7 main() at subnb.c:181
    [0]PETSC ERROR: No PETSc Option Table entries
    [0]PETSC ERROR: ----------------End of Error Message
    -------send entire error message to
    petsc-ma...@mcs.anl.gov----------
    --------------------------------------------------------------------------

    This message comes from executing the attached test (I
    simplified the test by removing the block size from the matrix
    used for extraction, compared to the July test). In
    proc_xx_output.txt, you will find the output from the code
    execution with the -ok option (i.e. irow/idxr and icol/idxc are
    the same, i.e. a square sub-block for colour 0 distributed
    across the first two processes).

    As expected, in this case we obtain the 0,3,6,9 sub-block terms,
    which are distributed across processes 0 and 1 (two rows per proc).

    When asking for a rectangular sub-block (i.e. with no option), it
    crashes on process 0 with "Column too large: col 4 max 3", even
    though I ask for 4 rows and 2 columns on this process.

    Otherwise, I mentioned the dense aspect of the matrix in ex183.c
    because, in that case, all terms are non-zero no matter what
    selection is requested. If there were an issue with the way the
    selection is coded in the user program, I think it would be masked
    by the full graph representation. However, this may not be the
    case; I should test it.

    I'll take a look at ex23.c.

    Thanks,

    A.S.



    On 25/08/2025 at 17:55, Mark Adams wrote:
    Ah, OK, never say never.

    MatCreateSubMatrices seems to support creating a new matrix
    with the communicator of the IS.
    It just needs to read from the input matrix and does not use
    it for communication, so it can do that.

    As far as rectangular matrices go, there is no reason not to
    support that (the row IS and column IS can be distinct).
    Can you send the whole error message?
    There may not be a test that does this, but src/mat/tests/ex23.c
    looks like it may produce a rectangular matrix output.

    And it should not matter if the input matrix is a 100% full
    sparse matrix; it is still MatAIJ.
    The semantics and the API are the same for sparse or dense matrices.
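
    If it helps, one quick way to check on which communicator an
    extracted matrix lives (hypothetical helper, to be passed one of
    the matrices returned by MatCreateSubMatricesMPI):

    #include <petscmat.h>

    /* Hypothetical helper: report the size of the communicator a (sub-)matrix lives on. */
    static PetscErrorCode ReportMatComm(Mat sub)
    {
      MPI_Comm    comm;
      PetscMPIInt size;

      PetscFunctionBeginUser;
      PetscCall(PetscObjectGetComm((PetscObject)sub, &comm));
      PetscCallMPI(MPI_Comm_size(comm, &size));
      PetscCall(PetscPrintf(PETSC_COMM_SELF, "sub-matrix lives on a communicator of size %d\n", (int)size));
      PetscFunctionReturn(PETSC_SUCCESS);
    }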

    Thanks,
    Mark

    On Mon, Aug 25, 2025 at 7:31 AM Alexis SALZMAN
    <alexis.salz...@ec-nantes.fr> wrote:

        Hi,

        Thanks for your answer, Mark. Perhaps
        MatCreateSubMatricesMPI is the only PETSc function that
        acts on a sub-communicator — I'm not sure — but it's clear
        that there's no ambiguity on that point. The first line of
        the documentation for that function states that it 'may
        live on subcomms'. This is confirmed by the
        'src/mat/tests/ex183.c' test case. I used this test case
        to understand the function, which helped me with my code
        and the example I provided in my initial post.
        Unfortunately, in this example, the matrix from which the
        sub-matrices are extracted is dense, even though it uses a
        sparse structure. This does not clarify how to define
        sub-matrices when extracting from a sparse distributed
        matrix. Since my initial post, I have discovered that
        having more columns than rows can also result in the same
        error message.

        So, my questions boil down to:

        Can MatCreateSubMatricesMPI extract rectangular matrices
        from a square distributed sparse matrix?

        If not, the fact that only square matrices can be
        extracted in this context should perhaps be mentioned in
        the documentation.

        If so, I would be very grateful for any assistance in
        defining an IS pair in this context.

        Regards

        A.S.

        On 27/07/2025 at 00:15, Mark Adams wrote:
        First, you cannot mix communicators in PETSc calls in
        general (ever?), but this error looks like you might be
        asking for a row from the matrix that does not exist.
        You should start with a PETSc example code. Test it and
        modify it to suit your needs.

        Good luck,
        Mark

        On Fri, Jul 25, 2025 at 9:31 AM Alexis SALZMAN
        <alexis.salz...@ec-nantes.fr> wrote:

            Hi,

            As I am relatively new to PETSc, I may have misunderstood
            how to use the MatCreateSubMatricesMPI function. The
            attached code is tuned for three processes and extracts,
            from an MPIAIJ matrix, one matrix for each colour of a
            sub-communicator created with MPI_Comm_split. The following
            error message appears when the code is set to its default
            configuration (i.e. when a rectangular matrix is extracted
            with more rows than columns for colour 0):

            [0]PETSC ERROR: --------------------- Error Message
            --------------------------------------------------------------
            [0]PETSC ERROR: Argument out of range
            [0]PETSC ERROR: Column too large: col 4 max 3
            [0]PETSC ERROR: See https://petsc.org/release/faq/
            for trouble shooting.
            [0]PETSC ERROR: Petsc Release Version 3.22.2, unknown

            ... PETSc git hash 2a89477b25f, compiled on a Dell i9
            computer with GCC 14.3, MKL 2025.2, .....
            [0]PETSC ERROR: #1 MatSetValues_SeqAIJ() at
            ...petsc/src/mat/impls/aij/seq/aij.c:426
            [0]PETSC ERROR: #2 MatSetValues() at
            ...petsc/src/mat/interface/matrix.c:1543
            [0]PETSC ERROR: #3 MatSetSeqMats_MPIAIJ() at
            .../petsc/src/mat/impls/aij/mpi/mpiov.c:2965
            [0]PETSC ERROR: #4 MatCreateSubMatricesMPI_MPIXAIJ() at
            .../petsc/src/mat/impls/aij/mpi/mpiov.c:3163
            [0]PETSC ERROR: #5 MatCreateSubMatricesMPI_MPIAIJ() at
            .../petsc/src/mat/impls/aij/mpi/mpiov.c:3196
            [0]PETSC ERROR: #6 MatCreateSubMatricesMPI() at
            .../petsc/src/mat/interface/matrix.c:7293
            [0]PETSC ERROR: #7 main() at sub.c:169

            When the '-ok' option is selected, the code extracts a
            square matrix for colour 0, and this case runs smoothly.
            Selecting the '-trans' option swaps the row and column
            selection indices and produces the transposed sub-matrix
            without problems. For colour 1, which uses only one process
            and is therefore sequential, rectangular extraction works
            regardless of the shape.

            Is this dependency on the shape expected? Have I
            missed an important
            tuning step somewhere?

            Thank you in advance for any clarification.

            Regards

            A.S.

            P.S.: I'm sorry, but I am leaving my office this evening
            for the next few weeks, so I won't be very responsive
            during this period.






--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
