On Tue, Oct 22, 2019 at 1:37 PM Thibaut Appel <t.appe...@imperial.ac.uk> wrote:
> Hi both, > > Please find attached a tiny example (in Fortran, sorry Matthew) that - I > think - reproduces the problem we mentioned. > > Let me know. > > Okay, I converted to C so I could understand, and it runs fine for me: master *:~/Downloads/tmp$ PETSC_ARCH=arch-master-complex-debug make main /PETSc3/petsc/bin/mpicc -o main.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g3 -I/PETSc3/petsc/petsc-dev/include -I/PETSc3/petsc/petsc-dev/arch-master-complex-debug/include -I/PETSc3/petsc/include -I/opt/X11/include `pwd`/main.c /PETSc3/petsc/bin/mpicc -Wl,-multiply_defined,suppress -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first -Wl,-no_compact_unwind -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g3 -o main main.o -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -Wl,-rpath,/PETSc3/petsc/lib -L/PETSc3/petsc/lib -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -Wl,-rpath,/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0 -L/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0 -Wl,-rpath,/usr/local/lib -L/usr/local/lib -lpetsc -lfftw3_mpi -lfftw3 -llapack -lblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lchaco -lparmetis -lmetis -ltriangle -lz -lX11 -lctetgen -lstdc++ -ldl -lmpichf90 -lpmpich -lmpich -lopa -lmpl -lpthread -lgfortran -lgcc_s.10.5 -lstdc++ -ldl master *:~/Downloads/tmp$ ./main After first assembly: Mat Object: 1 MPI processes type: seqaij row 0: (0, 1. + 1. i) row 1: (1, 1. + 1. i) row 2: (2, 1. + 1. i) row 3: (3, 1. + 1. i) row 4: (4, 1. + 1. i) row 5: (5, 1. + 1. i) row 6: (6, 1. + 1. i) row 7: (7, 1. + 1. i) row 8: (8, 1. + 1. i) row 9: (9, 1. + 1. i) After second assembly: Mat Object: 1 MPI processes type: seqaij row 0: (0, 1. + 1. i) row 1: (1, 1. + 1. i) row 2: (2, 1. + 1. i) row 3: (3, 1. + 1. i) row 4: (4, 1. + 1. i) row 5: (5, 1. + 1. i) row 6: (6, 1. + 1. i) row 7: (7, 1. + 1. i) row 8: (8, 1. + 1. i) row 9: (9, 1. + 1. i) row 0 col: 9 val: 0. + 0. i I attach my code. So then I ran your Fortran: /PETSc3/petsc/bin/mpif90 -c -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -Wno-unused-variable -g -I/PETSc3/petsc/petsc-dev/include -I/PETSc3/petsc/petsc-dev/arch-master-complex-debug/include -I/PETSc3/petsc/include -I/opt/X11/include -o main2.o main2.F90 /PETSc3/petsc/bin/mpif90 -Wl,-multiply_defined,suppress -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first -Wl,-no_compact_unwind -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -Wno-unused-variable -g -o main2 main2.o -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib -Wl,-rpath,/PETSc3/petsc/lib -L/PETSc3/petsc/lib -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -Wl,-rpath,/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0 -L/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0 -Wl,-rpath,/usr/local/lib -L/usr/local/lib -lpetsc -lfftw3_mpi -lfftw3 -llapack -lblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lchaco -lparmetis -lmetis -ltriangle -lz -lX11 -lctetgen -lstdc++ -ldl -lmpichf90 -lpmpich -lmpich -lopa -lmpl -lpthread -lgfortran -lgcc_s.10.5 -lstdc++ -ldl master *:~/Downloads/tmp$ ./main2 After first assembly: Mat Object: 1 MPI processes type: seqaij row 0: (0, 1.) row 1: (1, 1.) row 2: (2, 1.) row 3: (3, 1.) row 4: (4, 1.) row 5: (5, 1.) row 6: (6, 1.) row 7: (7, 1.) row 8: (8, 1.) row 9: (9, 1.) After second assembly: Mat Object: 1 MPI processes type: seqaij row 0: (0, 1.) row 1: (1, 1.) row 2: (2, 1.) row 3: (3, 1.) row 4: (4, 1.) row 5: (5, 1.) row 6: (6, 1.) row 7: (7, 1.) row 8: (8, 1.) row 9: (9, 1.) row: 0 col: 9 val: 0.000000000000000000E+00 0.000000000000000000E+00 I am not seeing an error. Am I not running it correctly? Thanks, MAtt > Thibaut > > > On 22/10/2019 17:48, Matthew Knepley wrote: > > On Tue, Oct 22, 2019 at 12:43 PM Thibaut Appel via petsc-users < > petsc-users@mcs.anl.gov> wrote: > >> Hi Hong, >> >> Thank you for having a look, I copied/pasted your code snippet into >> ex28.c and the error indeed appears if you change that col[0]. That's >> because you did not allow a new non-zero location in the matrix with the >> option MAT_NEW_NONZERO_LOCATION_ERR. >> >> I spent the day debugging the code and already checked my calls to >> MatSetValues: >> >> For all MatSetValues calls corresponding to the row/col location in the >> error messages in the subsequent assembly, the numerical value associated >> with that row/col was exactly (0.0,0.0) (complex arithmetic) so it >> shouldn't be inserted w.r.t. the option MAT_IGNORE_ZERO_ENTRIES. It seems >> MatSetValues still did it anyway. >> > Okay, lets solve this problem first. You say that > > - You called MatSetOption(A, MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE) > - You called MatSetValues(A, ,,,, ADD_VALUES, ..., val) and val had a > complex 0 in it > - PETSc tried to insert the complex 0 > > This should be really easy to test in a tiny example. Do you mind making > it? If its broken, I will fix it. > > Thanks, > > Matt > >> I was able to solve the problem by adding >> MatSetOption(L,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) after my first >> assembly. However I don't know why it fixed it as the manual seems to say >> this is just for efficiency purposes >> >> It "should be specified after the first matrix has been fully assembled. >> This option ensures that certain data structures and communication >> information will be reused (instead of regenerated during successive steps, >> thereby increasing efficiency" >> >> So I'm still puzzled by why I got that error in the first place. Unless >> "regenerated" implies resetting some attributes of the preallocated >> non-zero structure / first assembly? >> >> >> Thibaut >> >> >> On 22/10/2019 17:06, Zhang, Hong wrote: >> >> Thibaut: >> Check your code on MatSetValues(), likely you set a value "to a new >> nonzero at global row/column (200, 160) into matrix" L. >> I tested petsc/src/mat/examples/tests/ex28.c by adding >> @@ -95,6 +95,26 @@ int main(int argc,char **args) >> /* Compute numeric factors using same F, then solve */ >> for (k = 0; k < num_numfac; k++) { >> /* Get numeric factor of A[k] */ >> + if (k==0) { >> + ierr = MatZeroEntries(A[0]);CHKERRQ(ierr); >> + for (i=rstart; i<rend; i++) { >> + col[0] = i-1; col[1] = i; col[2] = i+1; >> + if (i == 0) { >> + ierr = >> MatSetValues(A[k],1,&i,2,col+1,value+1,INSERT_VALUES);CHKERRQ(ierr); >> + } else if (i == N-1) { >> + ierr = >> MatSetValues(A[k],1,&i,2,col,value,INSERT_VALUES);CHKERRQ(ierr); >> + } else { >> + ierr = >> MatSetValues(A[k],1,&i,3,col,value,INSERT_VALUES);CHKERRQ(ierr); >> + } >> + } >> + if (!rank) { >> + i = N - 1; col[0] = N - 1; >> + ierr = >> MatSetValues(A[k],1,&i,1,col,value,INSERT_VALUES);CHKERRQ(ierr); >> + } >> + ierr = MatAssemblyBegin(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >> + ierr = MatAssemblyEnd(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr); >> + } >> + >> >> It works in both sequential and parallel. If I set col[0] = 0, then I get >> the same error as yours. >> Hong >> >>> Dear PETSc developers, >>> >>> I'm extending a validated matrix preallocation/assembly part of my code >>> to solve multiple linear systems with MUMPS at each iteration of a main >>> loop, following the example src/mat/examples/tests/ex28.c that Hong Zhang >>> added a few weeks ago. The difference is that I'm using just 1 matrix to >>> solve different systems. >>> >>> I'm trying to investigate a nasty bug arising when I try to assemble >>> "for a second time" that MPIAIJ matrix. The issue arises only in parallel, >>> serial works fine. >>> >>> Creating 1 MPIAIJ matrix, preallocating it "perfectly" with the case >>> where I have the fewest zero entries in the non-zero structure, before >>> getting its symbolic factorization. >>> >>> Further in the main loop, I'm solely changing its entries *retaining >>> the non-zero structure*. >>> >>> Here is the simplified Fortran code I'm using: >>> >>> ! Fill (M,N) case to ensure all non-zero entries are preallocated >>> CALL set_equations(M,N) >>> >>> CALL alloc_matrix(L) >>> ! --> Call MatSeqAIJSetPreallocation/MatMPIAIJSetPreallocation >>> ! --> Sets MAT_IGNORE_ZERO_ENTRIES, MAT_NEW_NONZERO_ALLOCATION_ERR, >>> MAT_NO_OFF_PROC_ENTRIES to true >>> >>> CALL assemble_matrix(L) >>> ! --> Calls MatSetValues with ADD_VALUES >>> ! --> Call MatAssemblyBegin/MatAssemblyEnd >>> >>> ! Tell PETSc that new non-zero insertions in matrix are forbidden >>> CALL MatSetOption(L,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ierr) >>> >>> CALL set_mumps_parameters() >>> >>> ! Get symbolic LU factorization using MUMPS >>> CALL MatGetFactor(L,MATSOLVERMUMPS,MAT_FACTOR_LU,F,ierr) >>> CALL MatGetOrdering(L,MATORDERINGNATURAL,rperm,cperm,ierr) >>> CALL MatLUFactorSymbolic(F,L,rperm,cperm,info,ierr) >>> >>> CALL initialize_right_hand_sides() >>> >>> ! Zero matrix entries >>> CALL MatZeroEntries(L,ierr) >>> >>> ! Main loop >>> DO itr=1, maxitr >>> >>> DO m = 1, M >>> DO n = 1, N >>> >>> CALL set_equations(m,n) >>> CALL assemble_matrix(L) ! ERROR HERE when m=1, n=1, CRASH IN >>> MatSetValues call >>> >>> ! Solving the linear system associated with (m,n) >>> CALL MatLUFactorNumeric(F,L,info,ierr) >>> CALL MatSolve(F,v_rhs(m,n),v_sol(m,n),ierr) >>> >>> ! Process v_rhs's from v_sol's for next iteration >>> >>> CALL MatZeroEntries(L,ierr) >>> >>> END DO >>> END DO >>> >>> END DO >>> >>> >>> Testing on a small case, the error I get is >>> [1]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [1]PETSC ERROR: Argument out of range >>> [1]PETSC ERROR: Inserting a new nonzero at global row/column (200, 160) >>> into matrix >>> [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html >>> for trouble shooting. >>> [1]PETSC ERROR: Petsc Release Version 3.12.0, unknown >>> [1]PETSC ERROR: Configure options --PETSC_ARCH=cplx_gcc_debug >>> --with-scalar-type=complex --with-precision=double --with-debugging=1 >>> --with-valgrind=1 --with-debugger=gdb --with-fortran-kernels=1 >>> --download-mpich --download-fblaslapack --download-scalapack >>> --download-metis --download-parmetis --download-ptscotch --download-mumps >>> --download-slepc --COPTFLAGS="-O0 -g" --CXXOPTFLAGS="-O0 -g" >>> --FOPTFLAGS="-O0 -g -fbacktrace" >>> [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 634 in >>> /home/thibaut/Packages/petsc/src/mat/impls/aij/mpi/mpiaij.c >>> [1]PETSC ERROR: #2 MatSetValues() line 1375 in >>> /home/thibaut/Packages/petsc/src/mat/interface/matrix.c >>> [1]PETSC ERROR: #3 User provided function() line 0 in User file >>> application called MPI_Abort(MPI_COMM_SELF, 63) - process 0 >>> >>> >>> >>> which I don't understand. That element was not in the non-zero structure >>> and wasn't preallocated. I printed the value to be inserted at this >>> location (200,160) and it is exactly >>> (0.0000000000000000,0.0000000000000000) so this entry should not be >>> inserted due to MAT_IGNORE_ZERO_ENTRIES, however it seems it is. I'm using >>> ADD_VALUES in MatSetValues but it is the only call where (200,160) is >>> inserted. >>> >>> >>> - I zero the matrix entries with MatZeroEntries which retains the >>> non-zero structure (checked when I print the matrix) but tried to comment >>> the corresponding calls. >>> >>> - I tried to set MAT_NEW_NONZERO_LOCATION_ERR AND >>> MAT_NEW_NONZERO_ALLOCATION_ERR to PETSC_FALSE without effect. >>> >>> >>> Perhaps there's something fundamentally wrong in my approach, in any >>> case would you have any suggestions to identify the exact problem? >>> >>> Using PETSc 3.12.0. Thanks for your support, >>> >>> >>> Thibaut >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > <http://www.cse.buffalo.edu/~knepley/> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
main.c
Description: Binary data