It is almost surely some subtle memory corruption somewhere; use http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind to track it down.
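In practice that FAQ entry boils down to running the entire parallel job under valgrind and letting each process write its own report, roughly like the following (a sketch only, not the FAQ text verbatim; the process count and executable name are placeholders taken from the run quoted below):

    mpiexec -n 4 valgrind --tool=memcheck -q --num-callers=20 \
        --log-file=valgrind.log.%p ./shympi -malloc off

Here -malloc off disables PETSc's own allocation tracking so that valgrind, rather than PETSc, reports the corruption, and %p expands to each process id so the four reports end up in separate files.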

> On Oct 6, 2016, at 4:22 AM, Ivano Barletta <[email protected]> wrote:
>
> Hello everyone
>
> Recently I resumed the task of nesting Petsc into
> this fem ocean model, for the solution of a linear system.
>
> I followed your suggestions and "almost" everything works.
>
> The problem arose during a run with 4 CPUs, when I got this
> error:
>
> 3:[3]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> 3:[3]PETSC ERROR: Petsc has generated inconsistent data
> 3:[3]PETSC ERROR: Negative MPI source!
> 3:[3]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> 3:[3]PETSC ERROR: Petsc Release Version 3.7.1, May, 15, 2016
> 3:[3]PETSC ERROR: /users/home/ib04116/shympi_last4/fem3d/shympi on a linux-gnu-intel named n243.cluster.net by ib04116 Thu Oct 6 10:37:01 2016
> 3:[3]PETSC ERROR: Configure options CFLAGS=-I/users/home/opt/netcdf/netcdf-4.2.1.1/include
>   -I/users/home/opt/szip/szip-2.1/include -I/users/home/opt/hdf5/hdf5-1.8.10-patch1/include
>   -I/usr/include -I/users/home/opt/netcdf/netcdf-4.3/include -I/users/home/opt/hdf5/hdf5-1.8.11/include
>   FFLAGS=-xHost -no-prec-div -O3 -I/users/home/opt/netcdf/netcdf-4.2.1.1/include
>   -I/users/home/opt/netcdf/netcdf-4.3/include
>   LDFLAGS=-L/users/home/opt/netcdf/netcdf-4.2.1.1/lib -lnetcdff -L/users/home/opt/szip/szip-2.1/lib
>   -L/users/home/opt/hdf5/hdf5-1.8.10-patch1/lib -L/users/home/opt/netcdf/netcdf-4.2.1.1/lib
>   -L/usr/lib64/ -lz -lnetcdf -lnetcdf -lgpfs -L/users/home/opt/netcdf/netcdf-4.3/lib
>   -L/users/home/opt/hdf5/hdf5-1.8.11/lib -L/users/home/opt/netcdf/netcdf-4.3/lib -lcurl
>   --PETSC_ARCH=linux-gnu-intel --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicpc
>   --with-mpiexec=mpirun --with-blas-lapack-dir=/users/home/opt/intel/composer_xe_2013/mkl
>   --with-scalapack-lib="-L/users/home/opt/intel/composer_xe_2013/mkl//lib/intel64 -lmkl_scalapack_ilp64 -lmkl_blacs_intelmpi_ilp64"
>   --with-scalapack-include=/users/home/opt/intel/composer_xe_2013/mkl/include
>   --download-metis --download-parmetis --download-mumps --download-superlu
> 3:[3]PETSC ERROR: #1 MatStashScatterGetMesg_Ref() line 692 in /users/home/sco116/petsc/petsc-3.7.1/src/mat/utils/matstash.c
> 3:[3]PETSC ERROR: #2 MatStashScatterGetMesg_Private() line 663 in /users/home/sco116/petsc/petsc-3.7.1/src/mat/utils/matstash.c
> 3:[3]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 713 in /users/home/sco116/petsc/petsc-3.7.1/src/mat/impls/aij/mpi/mpiaij.c
> 3:[3]PETSC ERROR: #4 MatAssemblyEnd() line 5187 in /users/home/sco116/petsc/petsc-3.7.1/src/mat/interface/matrix.c
>
> The code is in Fortran and the Petsc version is 3.7.1.
>
> This error looks quite strange to me, because it doesn't always happen in the
> same situation. The model goes through several time steps, but the error is not
> always raised at the same one: it has happened at the fourth time step, for example,
> and at the fifth. What is even more odd is that once the run of the model
> (720 time steps) completed without any error.
>
> What I do to solve the linear system for each time step is the following:
>
> call petsc_solve( ..arguments..)
>
> subroutine petsc_solve(..args)
>    call PetscInitialize(PETSC_NULL_CHARACTER)
>
>    call MatCreate
>    ...
>    ...
>    call KSPSolve(...)
>
>    call XXXDestroy()
>    call PetscFinalize
> end subroutine
>
> Do you think that calling PetscInitialize and PetscFinalize
> several times might cause problems?
> I guess Petsc uses
> the same communicator as the model, which is MPI_COMM_WORLD.
>
> I don't have hints to troubleshoot this, since it is not a
> reproducible error and I don't know where to look to
> sort it out.
>
> Have you got any suggestion?
>
> Thanks in advance
>
> Ivano
>
>
> 2016-07-13 5:16 GMT+02:00 Barry Smith <[email protected]>:
>
> > On Jul 12, 2016, at 4:13 AM, Matthew Knepley <[email protected]> wrote:
> >
> > On Tue, Jul 12, 2016 at 3:35 AM, Ivano Barletta <[email protected]> wrote:
> > Dear Petsc users
> >
> > my aim is to parallelize the solution of a linear
> > system in a finite element
> > ocean model.
> >
> > The model has been almost entirely parallelized, with
> > a partitioning of the domain made element-wise through
> > the use of the Zoltan libraries, so the subdomains
> > share the nodes lying on the edges.
> >
> > The linear system includes node-to-node dependencies,
> > so my guess is that I need to create a halo surrounding
> > each subdomain, to allow connections of edge nodes with
> > those of neighbouring subdomains.
> >
> > Apart from that, my question is whether Petsc accepts a
> > previously made partitioning (maybe taking the halo into account)
> > using the data structures coming out of it.
> >
> > Has anybody of you ever faced a similar problem?
> >
> > If all you want to do is construct a PETSc Mat and Vec for the linear
> > system, just give PETSc the non-overlapping partition to create those
> > objects. You can input values on off-process partitions automatically
> > using MatSetValues() and VecSetValues().
>
> Note that by just using the VecSetValues() and MatSetValues() PETSc will
> manage all the halo business needed by the linear algebra system solver
> automatically. You don't need to provide any halo information to PETSc. It is
> really straightforward.
>
> Barry
>
> > Thanks,
> >
> > Matt
> >
> > Thanks in advance
> > Ivano
> >
> > --
> > What most experimenters take for granted before they begin their
> > experiments is infinitely more interesting than any results to which their
> > experiments lead.
> > -- Norbert Wiener
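On the PetscInitialize/PetscFinalize question in the quoted message: the usual pattern is to initialize PETSc exactly once when the model starts up and finalize it exactly once at the end, not inside the per-time-step solve routine; by default PETSc then attaches to MPI_COMM_WORLD (or to an MPI that the model has already initialized). Below is a minimal structural sketch, assuming PETSc 3.7-era Fortran bindings; the include path differs in later releases, and the names model and petsc_solve_step are hypothetical placeholders rather than the poster's actual code:

    program model
      implicit none
    #include <petsc/finclude/petscsys.h>
      PetscErrorCode ierr
      PetscInt       step

      ! Initialize PETSc once, at start-up.  If the model has already called
      ! MPI_Init, PETSc detects that and attaches to the existing
      ! MPI_COMM_WORLD instead of initializing MPI itself.
      call PetscInitialize(PETSC_NULL_CHARACTER, ierr)

      do step = 1, 720
         ! Per-step work: assemble the matrix/right-hand side and call
         ! KSPSolve.  Mat/Vec/KSP objects can be created once before the loop
         ! and reused; there is no PetscInitialize/PetscFinalize in here.
         call petsc_solve_step(step)
      end do

      ! Finalize PETSc once, at the very end of the run.
      call PetscFinalize(ierr)

    contains

      subroutine petsc_solve_step(istep)
        PetscInt istep
        ! placeholder for the actual assembly and KSPSolve
      end subroutine petsc_solve_step

    end program model

Creating the KSP, Mat, and Vec once and reusing them across time steps also avoids paying the setup cost on every step.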
