Hi Matthew,

Thanks for having a look; your C example behaves just like mine in Fortran.

In serial, the value (0.0,0.0) was inserted even though it shouldn't have been.

In parallel, you'll see that an error "Inserting a new nonzero at global row/column" is triggered.

In both cases, MatSetValues inserts a zero value even though MAT_IGNORE_ZERO_ENTRIES was set. That's what Barry is looking into, if I'm not mistaken.
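
For reference, the behaviour described above boils down to the pattern below (a C sketch in the spirit of your main.c; the sizes, values and the (0,9) location are only illustrative, not the attached test verbatim):

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  PetscInt       i, rstart, rend, row = 0, col = 9;
  PetscScalar    one = 1.0, zero = 0.0; /* 'zero' is an exact (0.0,0.0) in complex builds */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* 10x10 AIJ matrix preallocated for the diagonal only */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, 10, 10, 1, NULL, 0, NULL, &A);CHKERRQ(ierr);
  ierr = MatSetOption(A, MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE);CHKERRQ(ierr);

  /* First assembly: fill the diagonal, which defines the non-zero structure */
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    ierr = MatSetValues(A, 1, &i, 1, &i, &one, ADD_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* The full code also forbids new locations once the structure is fixed,
     which is where the "Inserting a new nonzero at global row/column"
     message comes from:
     MatSetOption(A, MAT_NEW_NONZERO_LOCATION_ERR, PETSC_TRUE); */

  /* Second assembly: add an exact zero at (0,9), outside the preallocated
     pattern. With MAT_IGNORE_ZERO_ENTRIES this entry should simply be
     dropped; whether it is dropped, inserted, or triggers the new-nonzero
     error is exactly what is at issue here. */
  if (rstart == 0) {
    ierr = MatSetValues(A, 1, &row, 1, &col, &zero, ADD_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = MatView(A, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}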


Thibaut


On 24/10/2019 02:31, Matthew Knepley wrote:
On Tue, Oct 22, 2019 at 1:37 PM Thibaut Appel <[email protected]> wrote:

    Hi both,

    Please find attached a tiny example (in Fortran, sorry Matthew)
    that - I think - reproduces the problem we mentioned.

    Let me know.

Okay, I converted it to C so I could understand it, and it runs fine for me:

master *:~/Downloads/tmp$ PETSC_ARCH=arch-master-complex-debug make main

/PETSc3/petsc/bin/mpicc -o main.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g3 -I/PETSc3/petsc/petsc-dev/include -I/PETSc3/petsc/petsc-dev/arch-master-complex-debug/include -I/PETSc3/petsc/include -I/opt/X11/include `pwd`/main.c

/PETSc3/petsc/bin/mpicc -Wl,-multiply_defined,suppress -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first -Wl,-no_compact_unwind -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g3 -o main main.o -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -Wl,-rpath,/PETSc3/petsc/lib -L/PETSc3/petsc/lib -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -Wl,-rpath,/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0 -L/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0 -Wl,-rpath,/usr/local/lib -L/usr/local/lib -lpetsc -lfftw3_mpi -lfftw3 -llapack -lblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lchaco -lparmetis -lmetis -ltriangle -lz -lX11 -lctetgen -lstdc++ -ldl -lmpichf90 -lpmpich -lmpich -lopa -lmpl -lpthread -lgfortran -lgcc_s.10.5 -lstdc++ -ldl

master *:~/Downloads/tmp$ ./main

After first assembly:
Mat Object: 1 MPI processes
type: seqaij
row 0: (0, 1. + 1. i)
row 1: (1, 1. + 1. i)
row 2: (2, 1. + 1. i)
row 3: (3, 1. + 1. i)
row 4: (4, 1. + 1. i)
row 5: (5, 1. + 1. i)
row 6: (6, 1. + 1. i)
row 7: (7, 1. + 1. i)
row 8: (8, 1. + 1. i)
row 9: (9, 1. + 1. i)
After second assembly:
Mat Object: 1 MPI processes
type: seqaij
row 0: (0, 1. + 1. i)
row 1: (1, 1. + 1. i)
row 2: (2, 1. + 1. i)
row 3: (3, 1. + 1. i)
row 4: (4, 1. + 1. i)
row 5: (5, 1. + 1. i)
row 6: (6, 1. + 1. i)
row 7: (7, 1. + 1. i)
row 8: (8, 1. + 1. i)
row 9: (9, 1. + 1. i)
row 0 col: 9 val: 0. + 0. i

I attach my code.  So then I ran your Fortran:

/PETSc3/petsc/bin/mpif90 -c -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -Wno-unused-variable -g -I/PETSc3/petsc/petsc-dev/include -I/PETSc3/petsc/petsc-dev/arch-master-complex-debug/include -I/PETSc3/petsc/include -I/opt/X11/include -o main2.o main2.F90

/PETSc3/petsc/bin/mpif90 -Wl,-multiply_defined,suppress -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first -Wl,-no_compact_unwind -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -Wno-unused-variable -g -o main2 main2.o -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -Wl,-rpath,/PETSc3/petsc/lib -L/PETSc3/petsc/lib -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -Wl,-rpath,/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0 -L/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0 -Wl,-rpath,/usr/local/lib -L/usr/local/lib -lpetsc -lfftw3_mpi -lfftw3 -llapack -lblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lchaco -lparmetis -lmetis -ltriangle -lz -lX11 -lctetgen -lstdc++ -ldl -lmpichf90 -lpmpich -lmpich -lopa -lmpl -lpthread -lgfortran -lgcc_s.10.5 -lstdc++ -ldl

master *:~/Downloads/tmp$ ./main2

After first assembly:
Mat Object: 1 MPI processes
type: seqaij
row 0: (0, 1.)
row 1: (1, 1.)
row 2: (2, 1.)
row 3: (3, 1.)
row 4: (4, 1.)
row 5: (5, 1.)
row 6: (6, 1.)
row 7: (7, 1.)
row 8: (8, 1.)
row 9: (9, 1.)
After second assembly:
Mat Object: 1 MPI processes
type: seqaij
row 0: (0, 1.)
row 1: (1, 1.)
row 2: (2, 1.)
row 3: (3, 1.)
row 4: (4, 1.)
row 5: (5, 1.)
row 6: (6, 1.)
row 7: (7, 1.)
row 8: (8, 1.)
row 9: (9, 1.)
row:0 col:9 val:0.000000000000000000E+000.000000000000000000E+00


I am not seeing an error. Am I not running it correctly?

  Thanks,

     Matt

    Thibaut


    On 22/10/2019 17:48, Matthew Knepley wrote:
    On Tue, Oct 22, 2019 at 12:43 PM Thibaut Appel via petsc-users
    <[email protected]> wrote:

        Hi Hong,

        Thank you for having a look. I copied/pasted your code
        snippet into ex28.c and the error indeed appears if you
        change that col[0]: that's because the option
        MAT_NEW_NONZERO_LOCATION_ERR forbids inserting at a new
        non-zero location in the matrix.

        I spent the day debugging the code and already checked my
        calls to MatSetValues:

        For all MatSetValues calls corresponding to the row/col
        location in the error messages of the subsequent assembly,
        the numerical value associated with that row/col was exactly
        (0.0,0.0) (complex arithmetic), so it should not have been
        inserted given the option MAT_IGNORE_ZERO_ENTRIES. It seems
        MatSetValues inserted it anyway.

    Okay, let's solve this problem first. You say that

      - You called MatSetOption(A,  MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE)
      - You called MatSetValues(A, ,,,, ADD_VALUES, ..., val) and val
    had a complex 0 in it
      - PETSc tried to insert the complex 0

    This should be really easy to test in a tiny example. Do you mind
    making it? If it's broken, I will fix it.

      Thanks,

        Matt

        I was able to solve the problem by adding
        MatSetOption(L,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) after
        my first assembly. However, I don't know why that fixed it,
        as the manual seems to say this option is just for efficiency
        purposes:

        It "should be specified after the first matrix has been fully
        assembled. This option ensures that certain data structures
        and communication information will be reused (instead of
        regenerated) during successive steps, thereby increasing
        efficiency."

        So I'm still puzzled by why I got that error in the first
        place. Unless "regenerated" implies resetting some attributes
        of the preallocated non-zero structure / first assembly?
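
        For the record, the call sequence with that workaround is, in
        order (a C-style sketch of the call order only, with error
        checking omitted; my actual code is in Fortran):

        /* first assembly defines the non-zero structure */
        MatAssemblyBegin(L, MAT_FINAL_ASSEMBLY);
        MatAssemblyEnd(L, MAT_FINAL_ASSEMBLY);

        /* workaround: from now on, attempts to insert at new locations are ignored */
        MatSetOption(L, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);

        /* later iterations only refill the existing pattern */
        MatZeroEntries(L);
        /* ... MatSetValues(L, ..., ADD_VALUES) on the same locations ... */
        MatAssemblyBegin(L, MAT_FINAL_ASSEMBLY);
        MatAssemblyEnd(L, MAT_FINAL_ASSEMBLY);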


        Thibaut


        On 22/10/2019 17:06, Zhang, Hong wrote:
        Thibaut:
        Check your code around MatSetValues(); most likely you set a
        value at "a new nonzero at global row/column (200, 160)" in
        the matrix L.
        I tested petsc/src/mat/examples/tests/ex28.c by adding
        @@ -95,6 +95,26 @@ int main(int argc,char **args)
           /* Compute numeric factors using same F, then solve */
           for (k = 0; k < num_numfac; k++) {
             /* Get numeric factor of A[k] */
        +    if (k==0) {
        +      ierr = MatZeroEntries(A[0]);CHKERRQ(ierr);
        +      for (i=rstart; i<rend; i++) {
        +        col[0] = i-1; col[1] = i; col[2] = i+1;
        +        if (i == 0) {
        +          ierr = MatSetValues(A[k],1,&i,2,col+1,value+1,INSERT_VALUES);CHKERRQ(ierr);
        +        } else if (i == N-1) {
        +          ierr = MatSetValues(A[k],1,&i,2,col,value,INSERT_VALUES);CHKERRQ(ierr);
        +        } else {
        +          ierr = MatSetValues(A[k],1,&i,3,col,value,INSERT_VALUES);CHKERRQ(ierr);
        +        }
        +      }
        +      if (!rank) {
        +        i = N - 1; col[0] = N - 1;
        +        ierr = MatSetValues(A[k],1,&i,1,col,value,INSERT_VALUES);CHKERRQ(ierr);
        +      }
        +      ierr = MatAssemblyBegin(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
        +      ierr = MatAssemblyEnd(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
        +    }
        +

        It works in both sequential and parallel. If I set col[0] =
        0, then I get the same error as yours.
        Hong

            Dear PETSc developers,

            I'm extending a validated matrix preallocation/assembly
            part of my code to solve multiple linear systems with
            MUMPS at each iteration of a main loop, following the
            example src/mat/examples/tests/ex28.c that Hong Zhang
            added a few weeks ago. The difference is that I'm using
            just 1 matrix to solve different systems.

            I'm trying to investigate a nasty bug arising when I try
            to assemble "for a second time" that MPIAIJ matrix. The
            issue arises only in parallel; serial works fine.

            I create one MPIAIJ matrix and preallocate it "perfectly"
            for the case where I have the fewest zero entries in the
            non-zero structure, before getting its symbolic
            factorization.

            Further in the main loop, I solely change its entries,
            *retaining the non-zero structure*.

            Here is the simplified Fortran code I'm using:

            ! Fill (M,N) case to ensure all non-zero entries are preallocated
            CALL set_equations(M,N)

            CALL alloc_matrix(L)
              ! --> Call MatSeqAIJSetPreallocation/MatMPIAIJSetPreallocation
              ! --> Sets MAT_IGNORE_ZERO_ENTRIES, MAT_NEW_NONZERO_ALLOCATION_ERR,
              !     MAT_NO_OFF_PROC_ENTRIES to true

            CALL assemble_matrix(L)
              ! --> Calls MatSetValues with ADD_VALUES
              ! --> Call MatAssemblyBegin/MatAssemblyEnd

            ! Tell PETSc that new non-zero insertions in matrix are forbidden
            CALL MatSetOption(L,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ierr)

            CALL set_mumps_parameters()

            ! Get symbolic LU factorization using MUMPS
            CALL MatGetFactor(L,MATSOLVERMUMPS,MAT_FACTOR_LU,F,ierr)
            CALL MatGetOrdering(L,MATORDERINGNATURAL,rperm,cperm,ierr)
            CALL MatLUFactorSymbolic(F,L,rperm,cperm,info,ierr)

            CALL initialize_right_hand_sides()

            ! Zero matrix entries
            CALL MatZeroEntries(L,ierr)

            ! Main loop
            DO itr = 1, maxitr

              DO m = 1, M
                DO n = 1, N

                  CALL set_equations(m,n)
                  CALL assemble_matrix(L) ! ERROR HERE when m=1, n=1, CRASH IN MatSetValues call

                  ! Solving the linear system associated with (m,n)
                  CALL MatLUFactorNumeric(F,L,info,ierr)
                  CALL MatSolve(F,v_rhs(m,n),v_sol(m,n),ierr)

                  ! Process v_rhs's from v_sol's for next iteration

                  CALL MatZeroEntries(L,ierr)

                END DO
              END DO

            END DO


            Testing on a small case, the error I get is

            [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
            [1]PETSC ERROR: Argument out of range
            [1]PETSC ERROR: Inserting a new nonzero at global row/column (200, 160) into matrix
            [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
            [1]PETSC ERROR: Petsc Release Version 3.12.0, unknown
            [1]PETSC ERROR: Configure options --PETSC_ARCH=cplx_gcc_debug --with-scalar-type=complex --with-precision=double --with-debugging=1 --with-valgrind=1 --with-debugger=gdb --with-fortran-kernels=1 --download-mpich --download-fblaslapack --download-scalapack --download-metis --download-parmetis --download-ptscotch --download-mumps --download-slepc --COPTFLAGS="-O0 -g" --CXXOPTFLAGS="-O0 -g" --FOPTFLAGS="-O0 -g -fbacktrace"
            [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 634 in /home/thibaut/Packages/petsc/src/mat/impls/aij/mpi/mpiaij.c
            [1]PETSC ERROR: #2 MatSetValues() line 1375 in /home/thibaut/Packages/petsc/src/mat/interface/matrix.c
            [1]PETSC ERROR: #3 User provided function() line 0 in User file
            application called MPI_Abort(MPI_COMM_SELF, 63) - process 0


            which I don't understand. That element was not in the
            non-zero structure and wasn't preallocated. I printed the
            value to be inserted at this location (200,160) and it is
            exactly (0.0000000000000000,0.0000000000000000), so this
            entry should not have been inserted given
            MAT_IGNORE_ZERO_ENTRIES; however, it seems it was. I'm
            using ADD_VALUES in MatSetValues, but that is the only
            call where (200,160) is inserted.
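
            As a stopgap I can of course filter exact zeros myself
            before calling MatSetValues, along the lines of the guard
            below (a C-style sketch, helper name made up; my actual
            assembly routine is in Fortran):

            /* Sketch: hand only genuinely non-zero entries of one row to
               MatSetValues instead of relying on MAT_IGNORE_ZERO_ENTRIES */
            static PetscErrorCode SetRowSkipZeros(Mat L, PetscInt row, PetscInt ncols,
                                                  const PetscInt cols[], const PetscScalar vals[])
            {
              PetscErrorCode ierr;
              PetscInt       j;

              PetscFunctionBeginUser;
              for (j = 0; j < ncols; j++) {
                if (PetscAbsScalar(vals[j]) == 0.0) continue; /* skip exact (0.0,0.0) */
                ierr = MatSetValues(L, 1, &row, 1, &cols[j], &vals[j], ADD_VALUES);CHKERRQ(ierr);
              }
              PetscFunctionReturn(0);
            }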


                - I zero the matrix entries with MatZeroEntries, which
            retains the non-zero structure (checked when I print the
            matrix), but I also tried commenting out the corresponding
            calls.

                - I tried setting MAT_NEW_NONZERO_LOCATION_ERR and
            MAT_NEW_NONZERO_ALLOCATION_ERR to PETSC_FALSE, without
            effect.


            Perhaps there's something fundamentally wrong in my
            approach; in any case, would you have any suggestions to
            identify the exact problem?

            Using PETSc 3.12.0. Thanks for your support,


            Thibaut






--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
