Satish,
I tested your fix on ex51f.F90 (modified from build_nullbasis_petsc_mumps.F90); it gives clean results with valgrind.
Could you patch it into petsc-maint? I would also like to add ex51f.F90 (contributed by Constantin) to petsc/src/ksp/ksp/examples/tests/.

Hong

On Thu, May 26, 2016 at 5:15 PM, Hong <[email protected]> wrote:

> Satish found a problem in using inode routines.
>
> In addition, the user code has bugs. I modified
> build_nullbasis_petsc_mumps.F90 into ex51f.F90 (attached)
> which works well with option '-mat_no_inode'.
>
> ex51f.F90 differs from build_nullbasis_petsc_mumps.F90 in
> 1) use MATAIJ/MATDENSE instead of MATMPIAIJ/MATMPIDENSE
> MATAIJ wraps MATSEQAIJ and MATMPIAIJ.
>
> 2)
> MatConvert(x, MATMPIAIJ, MAT_REUSE_MATRIX, x,ierr)
> ->
> MatConvert(x, MATMPIAIJ, MAT_INPLACE_MATRIX, x,ierr)
> see
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatConvert.html
>
> Hong
>
> On Thu, May 26, 2016 at 3:05 PM, Satish Balay <[email protected]> wrote:
>
>> Well looks like MatGetBrowsOfAoCols_MPIAIJ() issue is primarily
>> setting some local variables with uninitialized data [that's primarily
>> set/used for parallel communication]. So valgrind flags it - but I
>> don't think it gets used later on.
>>
>> [perhaps most of the code should be skipped for a sequential run..]
>>
>> The primary issue here is MatGetRowIJ_SeqAIJ_Inode_Symmetric() called
>> by MatGetOrdering_ND().
>>
>> The workaround is to not use ND with:
>> call PCFactorSetMatOrderingType(pc,MATORDERINGNATURAL,ierr)
>>
>> But I think the following might be the fix [have to recheck]..
>> The test code works with this change [with the default ND]
>>
>> diff --git a/src/mat/impls/aij/seq/inode.c b/src/mat/impls/aij/seq/inode.c
>> index 9af404e..49f76ce 100644
>> --- a/src/mat/impls/aij/seq/inode.c
>> +++ b/src/mat/impls/aij/seq/inode.c
>> @@ -97,6 +97,7 @@ static PetscErrorCode MatGetRowIJ_SeqAIJ_Inode_Symmetric(Mat A,const PetscInt *i
>>
>>      j = aj + ai[row] + ishift;
>>      jmax = aj + ai[row+1] + ishift;
>> +    if (j==jmax) continue; /* empty row */
>>      col = *j++ + ishift;
>>      i2 = tvc[col];
>>      while (i2<i1 && j<jmax) { /* 1.[-xx-d-xx--] 2.[-xx-------],off-diagonal elemets */
>> @@ -125,6 +126,7 @@ static PetscErrorCode MatGetRowIJ_SeqAIJ_Inode_Symmetric(Mat A,const PetscInt *i
>>    for (i1=0,row=0; i1<nslim_row; row += ns_row[i1],i1++) {
>>      j = aj + ai[row] + ishift;
>>      jmax = aj + ai[row+1] + ishift;
>> +    if (j==jmax) continue; /* empty row */
>>      col = *j++ + ishift;
>>      i2 = tvc[col];
>>      while (i2<i1 && j<jmax) {
>>
>> Satish
>>
>> On Thu, 26 May 2016, Hong wrote:
>>
>> > I'll investigate this - had a day off since yesterday.
>> > Hong
>> >
>> > On Thu, May 26, 2016 at 12:04 PM, Barry Smith <[email protected]> wrote:
>> >
>> > >
>> > > Hong needs to run with this matrix and add appropriate error checkers in
>> > > the matrix routines to detect "incomplete" matrices and likely just error
>> > > out.
>> > >
>> > > Barry
>> > >
>> > > > On May 26, 2016, at 11:23 AM, Satish Balay <[email protected]> wrote:
>> > > >
>> > > > Mat Object: 1 MPI processes
>> > > > type: mpiaij
>> > > > row 0: (0, 0.) (1, 0.486111)
>> > > > row 1: (0, 0.486111) (1, 0.)
>> > > > row 2: (2, 0.) (3, 0.486111)
>> > > > row 3: (4, 0.486111) (5, -0.486111)
>> > > > row 4:
>> > > > row 5:
>> > > >
>> > > > The matrix created is funny (empty rows at the end) - so perhaps it's
>> > > > exposing bugs in Mat code? [is that a valid matrix for this code?]
>> > > >
>> > > > ==21091== Use of uninitialised value of size 8
>> > > > ==21091==    at 0x57CA16B: MatGetRowIJ_SeqAIJ_Inode_Symmetric (inode.c:101)
>> > > > ==21091==    by 0x57CBA1C: MatGetRowIJ_SeqAIJ_Inode (inode.c:241)
>> > > > ==21091==    by 0x537C0B5: MatGetRowIJ (matrix.c:7274)
>> > > > ==21091==    by 0x53072FD: MatGetOrdering_ND (spnd.c:18)
>> > > > ==21091==    by 0x530BC39: MatGetOrdering (sorder.c:260)
>> > > > ==21091==    by 0x530A72D: MatGetOrdering (sorder.c:202)
>> > > > ==21091==    by 0x5DDD764: PCSetUp_LU (lu.c:124)
>> > > > ==21091==    by 0x5EBFE60: PCSetUp (precon.c:968)
>> > > > ==21091==    by 0x5FDA1B3: KSPSetUp (itfunc.c:390)
>> > > > ==21091==    by 0x601C17D: kspsetup_ (itfuncf.c:252)
>> > > > ==21091==    by 0x4028B9: MAIN__ (ex1f.F90:104)
>> > > > ==21091==    by 0x403535: main (ex1f.F90:185)
>> > > >
>> > > >
>> > > > This goes away if I add:
>> > > >
>> > > > call PCFactorSetMatOrderingType(pc,MATORDERINGNATURAL,ierr)
>> > > >
>> > > > And then there is also:
>> > > >
>> > > > ==21275== Invalid read of size 8
>> > > > ==21275==    at 0x584DE93: MatGetBrowsOfAoCols_MPIAIJ (mpiaij.c:4734)
>> > > > ==21275==    by 0x58970A8: MatMatMultSymbolic_MPIAIJ_MPIAIJ_nonscalable (mpimatmatmult.c:198)
>> > > > ==21275==    by 0x5894A54: MatMatMult_MPIAIJ_MPIAIJ (mpimatmatmult.c:34)
>> > > > ==21275==    by 0x539664E: MatMatMult (matrix.c:9510)
>> > > > ==21275==    by 0x53B3201: matmatmult_ (matrixf.c:1157)
>> > > > ==21275==    by 0x402FC9: MAIN__ (ex1f.F90:149)
>> > > > ==21275==    by 0x4035B9: main (ex1f.F90:186)
>> > > > ==21275==  Address 0xa3d20f0 is 0 bytes after a block of size 48 alloc'd
>> > > > ==21275==    at 0x4C2DF93: memalign (vg_replace_malloc.c:858)
>> > > > ==21275==    by 0x4FDE05E: PetscMallocAlign (mal.c:28)
>> > > > ==21275==    by 0x5240240: VecScatterCreate (vscat.c:1220)
>> > > > ==21275==    by 0x5857708: MatSetUpMultiply_MPIAIJ (mmaij.c:116)
>> > > > ==21275==    by 0x581C31E: MatAssemblyEnd_MPIAIJ (mpiaij.c:747)
>> > > > ==21275==    by 0x53680F2: MatAssemblyEnd (matrix.c:5187)
>> > > > ==21275==    by 0x53B24D2: matassemblyend_ (matrixf.c:926)
>> > > > ==21275==    by 0x40262C: MAIN__ (ex1f.F90:60)
>> > > > ==21275==    by 0x4035B9: main (ex1f.F90:186)
>> > > >
>> > > >
>> > > > Satish
>> > > >
>> > > > -----------
>> > > >
>> > > > $ diff build_nullbasis_petsc_mumps.F90 ex1f.F90
>> > > > 3,7c3
>> > > > < #include <petsc/finclude/petscsys.h>
>> > > > < #include "petsc/finclude/petscvec.h"
>> > > > < #include "petsc/finclude/petscmat.h"
>> > > > < #include "petsc/finclude/petscpc.h"
>> > > > < #include "petsc/finclude/petscksp.h"
>> > > > ---
>> > > > > #include "petsc/finclude/petsc.h"
>> > > > 40,41c36,37
>> > > > < call PetscViewerBinaryOpen(PETSC_COMM_WORLD, "mat_c_bin.txt", 0, viewer, ierr)
>> > > > < call MatLoad(mat_c, viewer)
>> > > > ---
>> > > > > call PetscViewerBinaryOpen(PETSC_COMM_WORLD, "mat_c_bin.txt", FILE_MODE_READ, viewer, ierr)
>> > > > > call MatLoad(mat_c, viewer,ierr)
>> > > > 75a72
>> > > > > call PCFactorSetMatOrderingType(pc,MATORDERINGNATURAL,ierr)
>> > > > 150c147
>> > > > < call MatConvert(x, MATMPIAIJ, MAT_REUSE_MATRIX, x, ierr)
>> > > > ---
>> > > > > call MatConvert(x, MATMPIAIJ, MAT_INPLACE_MATRIX, x, ierr)
>> > > >
>> > > >
>> > > > On Thu, 26 May 2016, Matthew Knepley wrote:
>> > > >
>> > > >> Usually this means you have an uninitialized variable that is causing you
>> > > >> to overwrite memory. Fortran
>> > > >> is so lax in checking this, it's one reason to switch to C.
>> > > >>
>> > > >> Thanks,
>> > > >>
>> > > >> Matt
>> > > >>
>> > > >> On Thu, May 26, 2016 at 1:46 AM, Constantin Nguyen Van <
>> > > >> [email protected]> wrote:
>> > > >>
>> > > >>> Thanks for all your answers.
>> > > >>> I'm sorry for the syntax mistake in MatLoad, it was done afterwards.
>> > > >>>
>> > > >>> I recompiled PETSc with --with-debugging=yes and ran my code again.
>> > > >>> Now, I also have this strange behaviour.
>> > > >>> When I run the code without
>> > > >>> valgrind and with one proc, I have this error message:
>> > > >>>
>> > > >>> BEGIN PROC 0
>> > > >>> ITERATION 1
>> > > >>> ECHO 1
>> > > >>> ECHO 2
>> > > >>> INFOG(28): 2
>> > > >>> BASIS OK 0
>> > > >>> END PROC 0
>> > > >>> BEGIN PROC 0
>> > > >>> ITERATION 2
>> > > >>> ECHO 1
>> > > >>> ECHO 2
>> > > >>> INFOG(28): 2
>> > > >>> BASIS OK 0
>> > > >>> END PROC 0
>> > > >>> BEGIN PROC 0
>> > > >>> ITERATION 3
>> > > >>> ECHO 1
>> > > >>> [0]PETSC ERROR: ------------------------------------------------------------------------
>> > > >>> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>> > > >>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> > > >>> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> > > >>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> > > >>> [0]PETSC ERROR: likely location of problem given in stack below
>> > > >>> [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>> > > >>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> > > >>> [0]PETSC ERROR: INSTEAD the line number of the start of the function
>> > > >>> [0]PETSC ERROR: is given.
>> > > >>> [0]PETSC ERROR: [0] MatGetRowIJ_SeqAIJ_Inode_Symmetric line 69 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/impls/aij/seq/inode.c
>> > > >>> [0]PETSC ERROR: [0] MatGetRowIJ_SeqAIJ_Inode line 235 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/impls/aij/seq/inode.c
>> > > >>> [0]PETSC ERROR: [0] MatGetRowIJ line 7099 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/interface/matrix.c
>> > > >>> [0]PETSC ERROR: [0] MatGetOrdering_ND line 17 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/order/spnd.c
>> > > >>> [0]PETSC ERROR: [0] MatGetOrdering line 185 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/order/sorder.c
>> > > >>> [0]PETSC ERROR: [0] MatGetOrdering line 185 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/mat/order/sorder.c
>> > > >>> [0]PETSC ERROR: [0] PCSetUp_LU line 99 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/ksp/pc/impls/factor/lu/lu.c
>> > > >>> [0]PETSC ERROR: [0] PCSetUp line 945 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/ksp/pc/interface/precon.c
>> > > >>> [0]PETSC ERROR: [0] KSPSetUp line 247 /home/j10077/librairie/petsc-mumps/petsc-3.6.4/src/ksp/ksp/interface/itfunc.c
>> > > >>>
>> > > >>> But when I run it with valgrind, it does work well.
>> > > >>>
>> > > >>> On 2016-05-25 20:04, Barry Smith wrote:
>> > > >>>
>> > > >>>> First run with valgrind
>> > > >>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> > > >>>>
>> > > >>>> On May 25, 2016, at 2:35 AM, Constantin Nguyen Van
>> > > >>>>> <[email protected]> wrote:
>> > > >>>>>
>> > > >>>>> Hi,
>> > > >>>>>
>> > > >>>>> I'm a new user of PETSc and I try to use it with MUMPS
>> > > >>>>> functionalities to compute a nullbasis.
>> > > >>>>> I wrote a code where I compute 4 times the same nullbasis.
>> > > >>>>> It does work well when I run it with several procs but with only one
>> > > >>>>> processor I get an error on the 2nd iteration when KSPSetUp is
>> > > >>>>> called. Furthermore when it is run with a debugger
>> > > >>>>> (--with-debugging=yes), it works fine with one or several processors.
>> > > >>>>> Have you got any idea about why it doesn't work with one processor
>> > > >>>>> and no debugger?
>> > > >>>>>
>> > > >>>>> Thanks.
>> > > >>>>> Constantin.
>> > > >>>>>
>> > > >>>>> PS: You can find the code and the files required to run it enclosed.
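
For reference, the empty-row guard in the inode.c patch quoted above can be illustrated outside PETSc. This is only a minimal sketch under stated assumptions, not PETSc code: first_columns and its arguments are hypothetical names, plain int stands in for PetscInt, and there is no ishift or inode handling. It walks a CSR matrix the same way (j = aj + ai[row], jmax = aj + ai[row+1]) and skips a row when j == jmax, so the first column is never read from a row that has no entries.

```c
#include <assert.h>

/* Hypothetical helper, not a PETSc routine.
 * For each row of a CSR matrix, store the first column index in first[row],
 * or -1 when the row is empty. ai has nrows+1 entries, aj holds the column
 * indices. Returns the number of empty rows. */
int first_columns(int nrows, const int *ai, const int *aj, int *first)
{
  int empty = 0;
  for (int row = 0; row < nrows; row++) {
    assert(ai[row] <= ai[row + 1]); /* row offsets must be non-decreasing */
    const int *j    = aj + ai[row];
    const int *jmax = aj + ai[row + 1];
    if (j == jmax) {
      /* Empty row: *j would belong to the next row, or lie past the end
       * of aj when the empty rows are the last ones. */
      first[row] = -1;
      empty++;
      continue;
    }
    first[row] = *j;
  }
  return empty;
}
```

With the 6-row matrix printed earlier in the thread (ai = {0,2,4,6,8,8,8}, aj = {0,1,0,1,2,3,4,5}), rows 4 and 5 take the guard; without it, the read of *j for those rows would go past the 8 stored column indices, which matches the kind of access valgrind reported.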
