Thanks for the test case. There is a bug in the code; the check is not in the correct place. I'll be working on a patch for 3.12.
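Until the patch lands, a caller-side guard reproduces the intended MAT_IGNORE_ZERO_ENTRIES behavior no matter which internal path (diagonal block, off-diagonal block, or off-process stash) a value would take. A minimal sketch, assuming exact zeros; AddRowSkippingZeros is a hypothetical helper, not a PETSc API:

  #include <petscmat.h>

  /* Hypothetical helper, not a PETSc API: drop exact zeros before MatSetValues
     ever sees them, so no insertion path can turn a zero into a new nonzero
     location. This mimics what MAT_IGNORE_ZERO_ENTRIES is meant to guarantee. */
  static PetscErrorCode AddRowSkippingZeros(Mat A, PetscInt row, PetscInt n,
                                            const PetscInt cols[], const PetscScalar vals[])
  {
    PetscErrorCode ierr;
    PetscInt       j;

    PetscFunctionBegin;
    for (j = 0; j < n; j++) {
      if (vals[j] == 0.0) continue; /* exact (0.0,0.0) in complex builds, too */
      ierr = MatSetValues(A, 1, &row, 1, &cols[j], &vals[j], ADD_VALUES);CHKERRQ(ierr);
    }
    PetscFunctionReturn(0);
  }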
Barry

> On Oct 23, 2019, at 8:31 PM, Matthew Knepley via petsc-users <[email protected]> wrote:
>
> On Tue, Oct 22, 2019 at 1:37 PM Thibaut Appel <[email protected]> wrote:
> Hi both,
>
> Please find attached a tiny example (in Fortran, sorry Matthew) that - I think - reproduces the problem we mentioned.
>
> Let me know.
>
> Okay, I converted it to C so I could understand it, and it runs fine for me:
>
> master *:~/Downloads/tmp$ PETSC_ARCH=arch-master-complex-debug make main
> /PETSc3/petsc/bin/mpicc -o main.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g3 -I/PETSc3/petsc/petsc-dev/include -I/PETSc3/petsc/petsc-dev/arch-master-complex-debug/include -I/PETSc3/petsc/include -I/opt/X11/include `pwd`/main.c
> /PETSc3/petsc/bin/mpicc -Wl,-multiply_defined,suppress -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first -Wl,-no_compact_unwind -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g3 -o main main.o -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -Wl,-rpath,/PETSc3/petsc/lib -L/PETSc3/petsc/lib -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -Wl,-rpath,/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0 -L/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0 -Wl,-rpath,/usr/local/lib -L/usr/local/lib -lpetsc -lfftw3_mpi -lfftw3 -llapack -lblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lchaco -lparmetis -lmetis -ltriangle -lz -lX11 -lctetgen -lstdc++ -ldl -lmpichf90 -lpmpich -lmpich -lopa -lmpl -lpthread -lgfortran -lgcc_s.10.5 -lstdc++ -ldl
> master *:~/Downloads/tmp$ ./main
> After first assembly:
> Mat Object: 1 MPI processes
>   type: seqaij
> row 0: (0, 1. + 1. i)
> row 1: (1, 1. + 1. i)
> row 2: (2, 1. + 1. i)
> row 3: (3, 1. + 1. i)
> row 4: (4, 1. + 1. i)
> row 5: (5, 1. + 1. i)
> row 6: (6, 1. + 1. i)
> row 7: (7, 1. + 1. i)
> row 8: (8, 1. + 1. i)
> row 9: (9, 1. + 1. i)
> After second assembly:
> Mat Object: 1 MPI processes
>   type: seqaij
> row 0: (0, 1. + 1. i)
> row 1: (1, 1. + 1. i)
> row 2: (2, 1. + 1. i)
> row 3: (3, 1. + 1. i)
> row 4: (4, 1. + 1. i)
> row 5: (5, 1. + 1. i)
> row 6: (6, 1. + 1. i)
> row 7: (7, 1. + 1. i)
> row 8: (8, 1. + 1. i)
> row 9: (9, 1. + 1. i)
> row 0 col: 9 val: 0. + 0. i
>
> I attach my code.
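One caveat with reading that last value back: for AIJ matrices, querying a location that is not in the stored nonzero structure simply yields zero, so a printed "0. + 0. i" cannot distinguish "never inserted" from "inserted as an explicit zero". A minimal sketch of a structure-level check instead (C; the matrix name and the (0, 9) location are illustrative, and the call must run on the rank that owns row 0):

  #include <petscmat.h>

  /* Report how many entries row 0 stores, rather than reading a value that
     would print as zero either way. ncols == 1 (just the diagonal) means the
     zero at (0, 9) was ignored; ncols == 2 means it entered the structure. */
  static PetscErrorCode CheckRow0(Mat A)
  {
    PetscInt       ncols;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = MatGetRow(A, 0, &ncols, NULL, NULL);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_SELF, "row 0 stores %D entries\n", ncols);CHKERRQ(ierr);
    ierr = MatRestoreRow(A, 0, &ncols, NULL, NULL);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }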
> So then I ran your Fortran:
>
> /PETSc3/petsc/bin/mpif90 -c -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -Wno-unused-variable -g -I/PETSc3/petsc/petsc-dev/include -I/PETSc3/petsc/petsc-dev/arch-master-complex-debug/include -I/PETSc3/petsc/include -I/opt/X11/include -o main2.o main2.F90
> /PETSc3/petsc/bin/mpif90 -Wl,-multiply_defined,suppress -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first -Wl,-no_compact_unwind -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -Wno-unused-variable -g -o main2 main2.o -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -Wl,-rpath,/PETSc3/petsc/lib -L/PETSc3/petsc/lib -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -Wl,-rpath,/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0 -L/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0 -Wl,-rpath,/usr/local/lib -L/usr/local/lib -lpetsc -lfftw3_mpi -lfftw3 -llapack -lblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lchaco -lparmetis -lmetis -ltriangle -lz -lX11 -lctetgen -lstdc++ -ldl -lmpichf90 -lpmpich -lmpich -lopa -lmpl -lpthread -lgfortran -lgcc_s.10.5 -lstdc++ -ldl
> master *:~/Downloads/tmp$ ./main2
> After first assembly:
> Mat Object: 1 MPI processes
>   type: seqaij
> row 0: (0, 1.)
> row 1: (1, 1.)
> row 2: (2, 1.)
> row 3: (3, 1.)
> row 4: (4, 1.)
> row 5: (5, 1.)
> row 6: (6, 1.)
> row 7: (7, 1.)
> row 8: (8, 1.)
> row 9: (9, 1.)
> After second assembly:
> Mat Object: 1 MPI processes
>   type: seqaij
> row 0: (0, 1.)
> row 1: (1, 1.)
> row 2: (2, 1.)
> row 3: (3, 1.)
> row 4: (4, 1.)
> row 5: (5, 1.)
> row 6: (6, 1.)
> row 7: (7, 1.)
> row 8: (8, 1.)
> row 9: (9, 1.)
> row: 0 col: 9 val: 0.000000000000000000E+00 0.000000000000000000E+00
>
> I am not seeing an error. Am I not running it correctly?
>
> Thanks,
>
>    Matt
>
> Thibaut
>
> On 22/10/2019 17:48, Matthew Knepley wrote:
>> On Tue, Oct 22, 2019 at 12:43 PM Thibaut Appel via petsc-users <[email protected]> wrote:
>> Hi Hong,
>>
>> Thank you for having a look. I copied/pasted your code snippet into ex28.c, and the error indeed appears if you change that col[0]. That's because you did not allow a new non-zero location in the matrix with the option MAT_NEW_NONZERO_LOCATION_ERR.
>>
>> I spent the day debugging the code and already checked my calls to MatSetValues:
>>
>> For all MatSetValues calls corresponding to the row/col location in the error messages in the subsequent assembly, the numerical value associated with that row/col was exactly (0.0,0.0) (complex arithmetic), so it shouldn't be inserted w.r.t. the option MAT_IGNORE_ZERO_ENTRIES. It seems MatSetValues still did it anyway.
>>
>> Okay, let's solve this problem first. You say that
>>
>> - You called MatSetOption(A, MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE)
>> - You called MatSetValues(A, ,,,, ADD_VALUES, ..., val) and val had a complex 0 in it
>> - PETSc tried to insert the complex 0
>>
>> This should be really easy to test in a tiny example. Do you mind making it? If it's broken, I will fix it.
>>
>> Thanks,
>>
>>    Matt
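A minimal sketch of such a tiny test (C; the 10x10 size, two-rank layout, and the (0, 9) location are illustrative, not taken from the attached examples). It preallocates the diagonal only, assembles once, forbids new nonzero locations, then adds an explicit zero at a never-assembled location: if MAT_IGNORE_ZERO_ENTRIES is honored, the second assembly is a no-op; if the zero slips through, MatSetValues raises the "Inserting a new nonzero" error. On two ranks, column 9 of row 0 lands in the off-diagonal block:

  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat            A;
    PetscInt       i, rstart, rend, col;
    PetscScalar    one = 1.0, zero = 0.0; /* zero is (0.0,0.0) in complex builds */
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
    ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, 10, 10, 1, NULL, 1, NULL, &A);CHKERRQ(ierr);
    ierr = MatSetOption(A, MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE);CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);

    /* First assembly: diagonal only, defining the nonzero structure. */
    for (i = rstart; i < rend; i++) {
      ierr = MatSetValues(A, 1, &i, 1, &i, &one, ADD_VALUES);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatSetOption(A, MAT_NEW_NONZERO_LOCATION_ERR, PETSC_TRUE);CHKERRQ(ierr);

    /* Second assembly: add an explicit zero at a location outside the
       structure. This should be dropped by MAT_IGNORE_ZERO_ENTRIES. */
    ierr = MatZeroEntries(A);CHKERRQ(ierr);
    if (rstart == 0) {
      i = 0; col = 9;
      ierr = MatSetValues(A, 1, &i, 1, &col, &zero, ADD_VALUES);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "second assembly completed: zero was ignored\n");CHKERRQ(ierr);

    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }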
>> I was able to solve the problem by adding MatSetOption(L,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) after my first assembly. However, I don't know why it fixed it, as the manual seems to say this is just for efficiency purposes:
>>
>> It "should be specified after the first matrix has been fully assembled. This option ensures that certain data structures and communication information will be reused (instead of regenerated during successive steps), thereby increasing efficiency."
>>
>> So I'm still puzzled by why I got that error in the first place. Unless "regenerated" implies resetting some attributes of the preallocated non-zero structure / first assembly?
>>
>> Thibaut
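The likely reason the workaround hides the symptom is the documented semantics of the option rather than efficiency alone: with MAT_NEW_NONZERO_LOCATIONS set to PETSC_FALSE, any set or add that would create a new location in the nonzero structure is silently ignored, so a stray zero that slipped past MAT_IGNORE_ZERO_ENTRIES is dropped instead of triggering the error. A hedged C sketch of the two policies (FreezeStructure is a hypothetical helper; the matrix name L mirrors the Fortran above):

  #include <petscmat.h>

  /* Freeze the nonzero structure of L after its first, structure-defining
     assembly. PETSC_FALSE for MAT_NEW_NONZERO_LOCATIONS silently drops
     would-be new locations; MAT_NEW_NONZERO_LOCATION_ERR errors instead. */
  static PetscErrorCode FreezeStructure(Mat L, PetscBool errorOnNew)
  {
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = MatAssemblyBegin(L, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(L, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    if (errorOnNew) {
      ierr = MatSetOption(L, MAT_NEW_NONZERO_LOCATION_ERR, PETSC_TRUE);CHKERRQ(ierr);
    } else {
      ierr = MatSetOption(L, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);CHKERRQ(ierr);
    }
    PetscFunctionReturn(0);
  }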
>> On 22/10/2019 17:06, Zhang, Hong wrote:
>>> Thibaut:
>>> Check your code on MatSetValues(); likely you set a value at "a new nonzero at global row/column (200, 160)" in matrix L.
>>> I tested petsc/src/mat/examples/tests/ex28.c by adding
>>> @@ -95,6 +95,26 @@ int main(int argc,char **args)
>>>    /* Compute numeric factors using same F, then solve */
>>>    for (k = 0; k < num_numfac; k++) {
>>>      /* Get numeric factor of A[k] */
>>> +    if (k==0) {
>>> +      ierr = MatZeroEntries(A[0]);CHKERRQ(ierr);
>>> +      for (i=rstart; i<rend; i++) {
>>> +        col[0] = i-1; col[1] = i; col[2] = i+1;
>>> +        if (i == 0) {
>>> +          ierr = MatSetValues(A[k],1,&i,2,col+1,value+1,INSERT_VALUES);CHKERRQ(ierr);
>>> +        } else if (i == N-1) {
>>> +          ierr = MatSetValues(A[k],1,&i,2,col,value,INSERT_VALUES);CHKERRQ(ierr);
>>> +        } else {
>>> +          ierr = MatSetValues(A[k],1,&i,3,col,value,INSERT_VALUES);CHKERRQ(ierr);
>>> +        }
>>> +      }
>>> +      if (!rank) {
>>> +        i = N - 1; col[0] = N - 1;
>>> +        ierr = MatSetValues(A[k],1,&i,1,col,value,INSERT_VALUES);CHKERRQ(ierr);
>>> +      }
>>> +      ierr = MatAssemblyBegin(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>> +      ierr = MatAssemblyEnd(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>> +    }
>>> +
>>>
>>> It works in both sequential and parallel. If I set col[0] = 0, then I get the same error as yours.
>>> Hong
>>>
>>> Dear PETSc developers,
>>>
>>> I'm extending a validated matrix preallocation/assembly part of my code to solve multiple linear systems with MUMPS at each iteration of a main loop, following the example src/mat/examples/tests/ex28.c that Hong Zhang added a few weeks ago. The difference is that I'm using just one matrix to solve different systems.
>>>
>>> I'm trying to investigate a nasty bug arising when I try to assemble that MPIAIJ matrix "for a second time". The issue arises only in parallel; serial works fine.
>>>
>>> I create one MPIAIJ matrix, preallocate it "perfectly" with the case where I have the fewest zero entries in the non-zero structure, before getting its symbolic factorization.
>>>
>>> Further in the main loop, I'm solely changing its entries, retaining the non-zero structure.
>>>
>>> Here is the simplified Fortran code I'm using:
>>>
>>> ! Fill (M,N) case to ensure all non-zero entries are preallocated
>>> CALL set_equations(M,N)
>>>
>>> CALL alloc_matrix(L)
>>> ! --> Calls MatSeqAIJSetPreallocation/MatMPIAIJSetPreallocation
>>> ! --> Sets MAT_IGNORE_ZERO_ENTRIES, MAT_NEW_NONZERO_ALLOCATION_ERR, MAT_NO_OFF_PROC_ENTRIES to true
>>>
>>> CALL assemble_matrix(L)
>>> ! --> Calls MatSetValues with ADD_VALUES
>>> ! --> Calls MatAssemblyBegin/MatAssemblyEnd
>>>
>>> ! Tell PETSc that new non-zero insertions in the matrix are forbidden
>>> CALL MatSetOption(L,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ierr)
>>>
>>> CALL set_mumps_parameters()
>>>
>>> ! Get symbolic LU factorization using MUMPS
>>> CALL MatGetFactor(L,MATSOLVERMUMPS,MAT_FACTOR_LU,F,ierr)
>>> CALL MatGetOrdering(L,MATORDERINGNATURAL,rperm,cperm,ierr)
>>> CALL MatLUFactorSymbolic(F,L,rperm,cperm,info,ierr)
>>>
>>> CALL initialize_right_hand_sides()
>>>
>>> ! Zero matrix entries
>>> CALL MatZeroEntries(L,ierr)
>>>
>>> ! Main loop
>>> DO itr = 1, maxitr
>>>
>>>   DO m = 1, M
>>>     DO n = 1, N
>>>
>>>       CALL set_equations(m,n)
>>>       CALL assemble_matrix(L) ! ERROR HERE when m=1, n=1: CRASH IN MatSetValues call
>>>
>>>       ! Solving the linear system associated with (m,n)
>>>       CALL MatLUFactorNumeric(F,L,info,ierr)
>>>       CALL MatSolve(F,v_rhs(m,n),v_sol(m,n),ierr)
>>>
>>>       ! Process v_rhs's from v_sol's for next iteration
>>>
>>>       CALL MatZeroEntries(L,ierr)
>>>
>>>     END DO
>>>   END DO
>>>
>>> END DO
>>>
>>> Testing on a small case, the error I get is
>>>
>>> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>> [1]PETSC ERROR: Argument out of range
>>> [1]PETSC ERROR: Inserting a new nonzero at global row/column (200, 160) into matrix
>>> [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>> [1]PETSC ERROR: Petsc Release Version 3.12.0, unknown
>>> [1]PETSC ERROR: Configure options --PETSC_ARCH=cplx_gcc_debug --with-scalar-type=complex --with-precision=double --with-debugging=1 --with-valgrind=1 --with-debugger=gdb --with-fortran-kernels=1 --download-mpich --download-fblaslapack --download-scalapack --download-metis --download-parmetis --download-ptscotch --download-mumps --download-slepc --COPTFLAGS="-O0 -g" --CXXOPTFLAGS="-O0 -g" --FOPTFLAGS="-O0 -g -fbacktrace"
>>> [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 634 in /home/thibaut/Packages/petsc/src/mat/impls/aij/mpi/mpiaij.c
>>> [1]PETSC ERROR: #2 MatSetValues() line 1375 in /home/thibaut/Packages/petsc/src/mat/interface/matrix.c
>>> [1]PETSC ERROR: #3 User provided function() line 0 in User file
>>> application called MPI_Abort(MPI_COMM_SELF, 63) - process 0
>>>
>>> which I don't understand. That element was not in the non-zero structure and wasn't preallocated. I printed the value to be inserted at this location (200,160) and it is exactly (0.0000000000000000,0.0000000000000000), so this entry should not be inserted, due to MAT_IGNORE_ZERO_ENTRIES; however, it seems it is. I'm using ADD_VALUES in MatSetValues, but it is the only call where (200,160) is inserted.
>>>
>>> - I zero the matrix entries with MatZeroEntries, which retains the non-zero structure (checked when I print the matrix), but I also tried commenting out the corresponding calls.
>>>
>>> - I tried to set MAT_NEW_NONZERO_LOCATION_ERR and MAT_NEW_NONZERO_ALLOCATION_ERR to PETSC_FALSE, without effect.
>>>
>>> Perhaps there's something fundamentally wrong in my approach; in any case, would you have any suggestions to identify the exact problem?
>>>
>>> Using PETSc 3.12.0. Thanks for your support,
>>>
>>> Thibaut
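A self-contained C sketch of the same symbolic-once / numeric-per-iteration MUMPS workflow described above (assumes a PETSc build with MUMPS, as in the configure line; the 10x10 diagonal system and two iterations are illustrative stand-ins for set_equations/assemble_matrix):

  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat            L, F;
    Vec            rhs, sol;
    IS             rperm, cperm;
    MatFactorInfo  info;
    PetscInt       i, rstart, rend, itr;
    PetscScalar    v;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
    ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, 10, 10, 1, NULL, 0, NULL, &L);CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(L, &rstart, &rend);CHKERRQ(ierr);

    /* First assembly fixes the nonzero structure (diagonal here). */
    for (i = rstart; i < rend; i++) {
      v = 2.0;
      ierr = MatSetValues(L, 1, &i, 1, &i, &v, ADD_VALUES);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(L, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(L, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    /* Symbolic LU with MUMPS, done once. */
    ierr = MatGetFactor(L, MATSOLVERMUMPS, MAT_FACTOR_LU, &F);CHKERRQ(ierr);
    ierr = MatGetOrdering(L, MATORDERINGNATURAL, &rperm, &cperm);CHKERRQ(ierr);
    ierr = MatFactorInfoInitialize(&info);CHKERRQ(ierr);
    ierr = MatLUFactorSymbolic(F, L, rperm, cperm, &info);CHKERRQ(ierr);

    ierr = MatCreateVecs(L, &sol, &rhs);CHKERRQ(ierr);
    ierr = VecSet(rhs, 1.0);CHKERRQ(ierr);

    /* Refill with the same structure, then numeric factorization and solve. */
    for (itr = 0; itr < 2; itr++) {
      ierr = MatZeroEntries(L);CHKERRQ(ierr);
      for (i = rstart; i < rend; i++) {
        v = 2.0 + itr;
        ierr = MatSetValues(L, 1, &i, 1, &i, &v, ADD_VALUES);CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(L, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(L, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatLUFactorNumeric(F, L, &info);CHKERRQ(ierr);
      ierr = MatSolve(F, rhs, sol);CHKERRQ(ierr);
    }

    ierr = ISDestroy(&rperm);CHKERRQ(ierr);
    ierr = ISDestroy(&cperm);CHKERRQ(ierr);
    ierr = VecDestroy(&rhs);CHKERRQ(ierr);
    ierr = VecDestroy(&sol);CHKERRQ(ierr);
    ierr = MatDestroy(&F);CHKERRQ(ierr);
    ierr = MatDestroy(&L);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }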
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
> <main.c>
