Yes, OK, so it seems the problem arises in parallel only, as mentioned in my first email. That output confused me,

Thibaut

On 25/10/2019 15:19, Smith, Barry F. wrote:
   That is just a write statement in the code. You are trying to put in the 0 
value, but MatSetValues is ignoring it as it is supposed to, so all is good for 
the sequential case.



On Oct 25, 2019, at 8:56 AM, Thibaut Appel <[email protected]> wrote:

Hi Barry,

I was referring to:

row 0: (0, 1.)
row 1: (1, 1.)
row 2: (2, 1.)
row 3: (3, 1.)
row 4: (4, 1.)
row 5: (5, 1.)
row 6: (6, 1.)
row 7: (7, 1.)
row 8: (8, 1.)
row 9: (9, 1.)
row:  0 col:  9 val:  0.000000000000000000E+00  0.000000000000000000E+00


The last line. But I was probably mistaken: if the value had actually been 
inserted, the first line would instead have read

row 0: (0, 1.), (9, 0.)

is that right?

Thibaut



On 25/10/2019 14:41, Smith, Barry F. wrote:

On Oct 24, 2019, at 5:09 AM, Thibaut Appel <[email protected]> wrote:

Hi Matthew,

Thanks for having a look, your example runs just like mine in Fortran.

In serial, the value (0.0,0.0) was inserted when it shouldn't have been.
     I'm sorry, I don't see this for the serial case:

$ petscmpiexec -n 1 ./ex241f
Mat Object: 1 MPI processes
   type: seqaij
row 0: (0, 2.)
row 1: (1, 2.)
row 2: (2, 2.)
row 3: (3, 2.)
row 4: (4, 2.)
row 5: (5, 2.)
row 6: (6, 2.)
row 7: (7, 2.)
row 8: (8, 2.)
row 9: (9, 2.)

Where is the "(0.0,0.0) was inserted when it shouldn't have been"?


Barry


In parallel, you'll see that an "Inserting a new nonzero at global row/column" 
error is triggered.

In both cases, MatSetValues tries to insert a zero value even though 
MAT_IGNORE_ZERO_ENTRIES was set. That's what Barry is looking into, if I'm not 
mistaken.



Thibaut



On 24/10/2019 02:31, Matthew Knepley wrote:
On Tue, Oct 22, 2019 at 1:37 PM Thibaut Appel <[email protected]> wrote:
Hi both,

Please find attached a tiny example (in Fortran, sorry Matthew) that - I think 
- reproduces the problem we mentioned.

Let me know.

Okay, I converted to C so I could understand, and it runs fine for me:

master *:~/Downloads/tmp$ PETSC_ARCH=arch-master-complex-debug make main
/PETSc3/petsc/bin/mpicc -o main.o -c -Wall -Wwrite-strings -Wno-strict-aliasing 
-Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden 
-g3   -I/PETSc3/petsc/petsc-dev/include 
-I/PETSc3/petsc/petsc-dev/arch-master-complex-debug/include 
-I/PETSc3/petsc/include -I/opt/X11/include    `pwd`/main.c
/PETSc3/petsc/bin/mpicc -Wl,-multiply_defined,suppress -Wl,-multiply_defined 
-Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first 
-Wl,-no_compact_unwind  -Wall -Wwrite-strings -Wno-strict-aliasing 
-Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden 
-g3  -o main main.o 
-Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib 
-L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib 
-Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib 
-L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib 
-Wl,-rpath,/PETSc3/petsc/lib -L/PETSc3/petsc/lib -Wl,-rpath,/opt/X11/lib 
-L/opt/X11/lib -Wl,-rpath,/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0 
-L/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0 -Wl,-rpath,/usr/local/lib 
-L/usr/local/lib -lpetsc -lfftw3_mpi -lfftw3 -llapack -lblas -lhdf5hl_fortran 
-lhdf5_fortran -lhdf5_hl -lhdf5 -lchaco -lparmetis -lmetis -ltriangle -lz -lX11 
-lctetgen -lstdc++ -ldl -lmpichf90 -lpmpich -lmpich -lopa -lmpl -lpthread 
-lgfortran -lgcc_s.10.5 -lstdc++ -ldl
master *:~/Downloads/tmp$ ./main
After first assembly:
Mat Object: 1 MPI processes
   type: seqaij
row 0: (0, 1. + 1. i)
row 1: (1, 1. + 1. i)
row 2: (2, 1. + 1. i)
row 3: (3, 1. + 1. i)
row 4: (4, 1. + 1. i)
row 5: (5, 1. + 1. i)
row 6: (6, 1. + 1. i)
row 7: (7, 1. + 1. i)
row 8: (8, 1. + 1. i)
row 9: (9, 1. + 1. i)
After second assembly:
Mat Object: 1 MPI processes
   type: seqaij
row 0: (0, 1. + 1. i)
row 1: (1, 1. + 1. i)
row 2: (2, 1. + 1. i)
row 3: (3, 1. + 1. i)
row 4: (4, 1. + 1. i)
row 5: (5, 1. + 1. i)
row 6: (6, 1. + 1. i)
row 7: (7, 1. + 1. i)
row 8: (8, 1. + 1. i)
row 9: (9, 1. + 1. i)
row 0 col: 9 val: 0. + 0. i

I attach my code.  So then I ran your Fortran:

/PETSc3/petsc/bin/mpif90 -c -Wall -ffree-line-length-0 
-Wno-unused-dummy-argument -Wno-unused-variable -g    
-I/PETSc3/petsc/petsc-dev/include 
-I/PETSc3/petsc/petsc-dev/arch-master-complex-debug/include 
-I/PETSc3/petsc/include -I/opt/X11/include      -o main2.o main2.F90
/PETSc3/petsc/bin/mpif90 -Wl,-multiply_defined,suppress -Wl,-multiply_defined 
-Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first 
-Wl,-no_compact_unwind  -Wall -ffree-line-length-0 -Wno-unused-dummy-argument 
-Wno-unused-variable -g   -o main2 main2.o 
-Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib 
-L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib 
-Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib 
-L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib 
-Wl,-rpath,/PETSc3/petsc/lib -L/PETSc3/petsc/lib -Wl,-rpath,/opt/X11/lib 
-L/opt/X11/lib -Wl,-rpath,/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0 
-L/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0 -Wl,-rpath,/usr/local/lib 
-L/usr/local/lib -lpetsc -lfftw3_mpi -lfftw3 -llapack -lblas -lhdf5hl_fortran 
-lhdf5_fortran -lhdf5_hl -lhdf5 -lchaco -lparmetis -lmetis -ltriangle -lz -lX11 
-lctetgen -lstdc++ -ldl -lmpichf90 -lpmpich -lmpich -lopa -lmpl -lpthread 
-lgfortran -lgcc_s.10.5 -lstdc++ -ldl
master *:~/Downloads/tmp$ ./main2
  After first assembly:
Mat Object: 1 MPI processes
   type: seqaij
row 0: (0, 1.)
row 1: (1, 1.)
row 2: (2, 1.)
row 3: (3, 1.)
row 4: (4, 1.)
row 5: (5, 1.)
row 6: (6, 1.)
row 7: (7, 1.)
row 8: (8, 1.)
row 9: (9, 1.)
  After second assembly:
Mat Object: 1 MPI processes
   type: seqaij
row 0: (0, 1.)
row 1: (1, 1.)
row 2: (2, 1.)
row 3: (3, 1.)
row 4: (4, 1.)
row 5: (5, 1.)
row 6: (6, 1.)
row 7: (7, 1.)
row 8: (8, 1.)
row 9: (9, 1.)
  row:  0 col:  9 val:  0.000000000000000000E+00  0.000000000000000000E+00

I am not seeing an error. Am I not running it correctly?

   Thanks,

      Matt
Thibaut



On 22/10/2019 17:48, Matthew Knepley wrote:
On Tue, Oct 22, 2019 at 12:43 PM Thibaut Appel via petsc-users 
<[email protected]> wrote:
Hi Hong,

Thank you for having a look. I copied/pasted your code snippet into ex28.c, and 
the error indeed appears if you change that col[0]. That's because new non-zero 
locations in the matrix are disallowed via the option MAT_NEW_NONZERO_LOCATION_ERR.

I spent the day debugging the code and already checked my calls to MatSetValues:

For all MatSetValues calls corresponding to the row/col locations reported in the 
error messages during the subsequent assembly, the numerical value associated with 
that row/col was exactly (0.0,0.0) (complex arithmetic), so it should not have been 
inserted given the option MAT_IGNORE_ZERO_ENTRIES. It seems MatSetValues still 
inserted it anyway.

Okay, let's solve this problem first. You say that:

   - You called MatSetOption(A,  MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE)
   - You called MatSetValues(A, ,,,, ADD_VALUES, ..., val) and val had a 
complex 0 in it
   - PETSc tried to insert the complex 0

This should be really easy to test in a tiny example. Do you mind making it? If 
it's broken, I will fix it.
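
To be concrete, a minimal sketch of the kind of test meant here (in C, with 
hypothetical names; not the example attached elsewhere in this thread) could be:

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  PetscInt       i, rstart, rend, col, n = 10;
  PetscScalar    one = 1.0, zero = 0.0; /* 'zero' is a complex (0.0,0.0) in a complex build */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  /* Diagonal-only preallocation: exactly one nonzero per row */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, 1, NULL, 0, NULL, &A);CHKERRQ(ierr);
  ierr = MatSetOption(A, MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE);CHKERRQ(ierr);

  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    ierr = MatSetValues(A, 1, &i, 1, &i, &one, ADD_VALUES);CHKERRQ(ierr);
  }
  /* Exact zero aimed at an unpreallocated off-diagonal location (0, n-1): with
     MAT_IGNORE_ZERO_ENTRIES it should be dropped; if it is not, in parallel this
     may trigger the "Inserting a new nonzero" error seen in this thread. */
  if (rstart == 0) {
    i = 0; col = n - 1;
    ierr = MatSetValues(A, 1, &i, 1, &col, &zero, ADD_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatView(A, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); /* row 0 should show only (0, 1.) */
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Running it on one and two ranks and checking whether row 0 picks up a (9, 0.) 
entry would exercise exactly the three points above.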

   Thanks,

     Matt
I was able to solve the problem by adding 
MatSetOption(L,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) after my first assembly. 
However, I don't know why it fixed it, as the manual seems to say this option is 
just for efficiency purposes:

It "should be specified after the first matrix has been fully assembled. This 
option ensures that certain data structures and communication information will be 
reused (instead of regenerated during successive steps), thereby increasing 
efficiency."

So I'm still puzzled by why I got that error in the first place. Unless 
"regenerated" implies resetting some attributes of the preallocated non-zero 
structure / first assembly?
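
For concreteness, here is roughly where that workaround sits, sketched in C (a 
hypothetical helper name; my actual code is Fortran):

#include <petscmat.h>

/* Sketch of the workaround ordering: finish the first full assembly, then freeze
   the nonzero structure so later insertions that fall outside the assembled
   pattern are ignored rather than allocated or flagged as errors. */
static PetscErrorCode FirstAssemblyThenFreeze(Mat L)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatAssemblyBegin(L, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(L, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatSetOption(L, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}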



Thibaut



On 22/10/2019 17:06, Zhang, Hong wrote:
Thibaut:
Check your code around MatSetValues(); likely you set a value "to a new nonzero at 
global row/column (200, 160) into matrix" L.
I tested petsc/src/mat/examples/tests/ex28.c by adding
@@ -95,6 +95,26 @@ int main(int argc,char **args)
    /* Compute numeric factors using same F, then solve */
    for (k = 0; k < num_numfac; k++) {
      /* Get numeric factor of A[k] */
+    if (k==0) {
+      ierr = MatZeroEntries(A[0]);CHKERRQ(ierr);
+      for (i=rstart; i<rend; i++) {
+        col[0] = i-1; col[1] = i; col[2] = i+1;
+        if (i == 0) {
+          ierr = MatSetValues(A[k],1,&i,2,col+1,value+1,INSERT_VALUES);CHKERRQ(ierr);
+        } else if (i == N-1) {
+          ierr = MatSetValues(A[k],1,&i,2,col,value,INSERT_VALUES);CHKERRQ(ierr);
+        } else {
+          ierr = MatSetValues(A[k],1,&i,3,col,value,INSERT_VALUES);CHKERRQ(ierr);
+        }
+      }
+      if (!rank) {
+        i = N - 1; col[0] = N - 1;
+        ierr = MatSetValues(A[k],1,&i,1,col,value,INSERT_VALUES);CHKERRQ(ierr);
+      }
+      ierr = MatAssemblyBegin(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
+      ierr = MatAssemblyEnd(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
+    }
+

It works in both sequential and parallel. If I set col[0] = 0, then I get the 
same error as yours.
Hong
Dear PETSc developers,

I'm extending a validated matrix preallocation/assembly part of my code to 
solve multiple linear systems with MUMPS at each iteration of a main loop, 
following the example src/mat/examples/tests/ex28.c that Hong Zhang added a few 
weeks ago. The difference is that I'm using just 1 matrix to solve different 
systems.

I'm trying to investigate a nasty bug that arises when I try to assemble that 
MPIAIJ matrix "for a second time". The issue arises only in parallel; serial works fine.

I'm creating 1 MPIAIJ matrix and preallocating it "perfectly" with the case where 
I have the fewest zero entries in the non-zero structure, before getting its 
symbolic factorization.

Further in the main loop, I'm solely changing its entries retaining the 
non-zero structure.

Here is the simplified Fortran code I'm using:

! Fill (M,N) case to ensure all non-zero entries are preallocated
CALL set_equations(M,N)

CALL alloc_matrix(L)
   ! --> Call MatSeqAIJSetPreallocation/MatMPIAIJSetPreallocation
   ! --> Sets MAT_IGNORE_ZERO_ENTRIES, MAT_NEW_NONZERO_ALLOCATION_ERR,
   !     MAT_NO_OFF_PROC_ENTRIES to true

CALL assemble_matrix(L)
   ! --> Calls MatSetValues with ADD_VALUES
   ! --> Call MatAssemblyBegin/MatAssemblyEnd

! Tell PETSc that new non-zero insertions in matrix are forbidden
CALL MatSetOption(L,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ierr)

CALL set_mumps_parameters()

! Get symbolic LU factorization using MUMPS
CALL MatGetFactor(L,MATSOLVERMUMPS,MAT_FACTOR_LU,F,ierr)
CALL MatGetOrdering(L,MATORDERINGNATURAL,rperm,cperm,ierr)
CALL MatLUFactorSymbolic(F,L,rperm,cperm,info,ierr)

CALL initialize_right_hand_sides()

! Zero matrix entries
CALL MatZeroEntries(L,ierr)

! Main loop
DO itr=1, maxitr

   DO m = 1, M
     DO n = 1, N

     CALL set_equations(m,n)
     CALL assemble_matrix(L) ! ERROR HERE when m=1, n=1: CRASH IN MatSetValues call

     ! Solving the linear system associated with (m,n)
     CALL MatLUFactorNumeric(F,L,info,ierr)
     CALL MatSolve(F,v_rhs(m,n),v_sol(m,n),ierr)

     ! Process v_rhs's from v_sol's for next iteration

     CALL MatZeroEntries(L,ierr)

     END DO
   END DO

END DO



Testing on a small case, the error I get is

[1]PETSC ERROR: --------------------- Error Message 
--------------------------------------------------------------
[1]PETSC ERROR: Argument out of range
[1]PETSC ERROR: Inserting a new nonzero at global row/column (200, 160) into 
matrix
[1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.
[1]PETSC ERROR: Petsc Release Version 3.12.0, unknown
[1]PETSC ERROR: Configure options --PETSC_ARCH=cplx_gcc_debug --with-scalar-type=complex 
--with-precision=double --with-debugging=1 --with-valgrind=1 --with-debugger=gdb --with-fortran-kernels=1 
--download-mpich --download-fblaslapack --download-scalapack --download-metis --download-parmetis 
--download-ptscotch --download-mumps --download-slepc --COPTFLAGS="-O0 -g" --CXXOPTFLAGS="-O0 
-g" --FOPTFLAGS="-O0 -g -fbacktrace"
[1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 634 in 
/home/thibaut/Packages/petsc/src/mat/impls/aij/mpi/mpiaij.c
[1]PETSC ERROR: #2 MatSetValues() line 1375 in 
/home/thibaut/Packages/petsc/src/mat/interface/matrix.c
[1]PETSC ERROR: #3 User provided function() line 0 in User file
application called MPI_Abort(MPI_COMM_SELF, 63) - process 0
which I don't understand. That element was not in the non-zero structure and 
wasn't preallocated. I printed the value to be inserted at this location (200,160) 
and it is exactly (0.0000000000000000,0.0000000000000000), so this entry should 
not be inserted because of MAT_IGNORE_ZERO_ENTRIES; however, it seems it is. I'm 
using ADD_VALUES in MatSetValues, but that is the only call where (200,160) is set.
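
In the meantime, one defensive thing I could do on my side (a sketch in C with a 
hypothetical helper name; my code is Fortran, and this does not address the root 
cause) is to filter exact zeros before they ever reach MatSetValues:

#include <petscmat.h>

/* Add only the nonzero entries of one row, so exact (0.0,0.0) values never reach
   MatSetValues regardless of how MAT_IGNORE_ZERO_ENTRIES is handled. */
static PetscErrorCode AddRowSkippingZeros(Mat A, PetscInt row, PetscInt ncols,
                                          const PetscInt cols[], const PetscScalar vals[])
{
  PetscErrorCode ierr;
  PetscInt       j;

  PetscFunctionBegin;
  for (j = 0; j < ncols; j++) {
    if (vals[j] == 0.0) continue; /* skip exact zeros */
    ierr = MatSetValues(A, 1, &row, 1, &cols[j], &vals[j], ADD_VALUES);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}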



     - I zero the matrix entries with MatZeroEntries, which retains the non-zero 
structure (checked when I print the matrix), but I also tried commenting out the 
corresponding calls.

     - I tried to set MAT_NEW_NONZERO_LOCATION_ERR and 
MAT_NEW_NONZERO_ALLOCATION_ERR to PETSC_FALSE, without effect.



Perhaps there's something fundamentally wrong in my approach; in any case, would 
you have any suggestions to identify the exact problem?

Using PETSc 3.12.0. Thanks for your support,



Thibaut


--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

