On Thu, Oct 17, 2013 at 5:30 PM, Barry Smith <[email protected]> wrote:
> On Oct 17, 2013, at 9:26 AM, Bishesh Khanal <[email protected]> wrote:
>
> > --------------------------------------------------------------------------
> > [0]PETSC ERROR: ------------------------------------------------------------------------
> > [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
> > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> > [0]PETSC ERROR: likely location of problem given in stack below
> > [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
> > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> > [0]PETSC ERROR:       INSTEAD the line number of the start of the function
> > [0]PETSC ERROR:       is given.
> > [0]PETSC ERROR: [0] MatSetValues_MPIAIJ line 505 /tmp/petsc-3.4.3/src/mat/impls/aij/mpi/mpiaij.c
> > [0]PETSC ERROR: [0] MatSetValues line 1071 /tmp/petsc-3.4.3/src/mat/interface/matrix.c
> > [0]PETSC ERROR: [0] MatSetValuesLocal line 1935 /tmp/petsc-3.4.3/src/mat/interface/matrix.c
> > [0]PETSC ERROR: [0] DMCreateMatrix_DA_3d_MPIAIJ line 1051 /tmp/petsc-3.4.3/src/dm/impls/da/fdda.c
> > [0]PETSC ERROR: [0] DMCreateMatrix_DA line 627 /tmp/petsc-3.4.3/src/dm/impls/da/fdda.c
> > [0]PETSC ERROR: [0] DMCreateMatrix line 900 /tmp/petsc-3.4.3/src/dm/interface/dm.c
> > [0]PETSC ERROR: [0] KSPSetUp line 192 /tmp/petsc-3.4.3/src/ksp/ksp/interface/itfunc.c
> > [0]PETSC ERROR: [0] solveModel line 122 "unknowndirectory/"/epi/asclepios2/bkhanal/works/AdLemModel/src/PetscAdLemTaras3D.cxx
> > [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> > [0]PETSC ERROR: Signal received!
> > [0]PETSC ERROR: ------------------------------------------------------------------------
> > [0]PETSC ERROR: Petsc Release Version 3.4.3, Oct, 15, 2013
> > [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> > [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> > [0]PETSC ERROR: See docs/index.html for manual pages.
> > [0]PETSC ERROR: ------------------------------------------------------------------------
> > [0]PETSC ERROR: /epi/asclepios2/bkhanal/works/AdLemModel/build/src/AdLemMain on a arch-linux2-cxx-debug named nef002 by bkhanal Thu Oct 17 15:55:33 2013
> > [0]PETSC ERROR: Libraries linked from /epi/asclepios2/bkhanal/petscDebug/lib
> > [0]PETSC ERROR: Configure run at Wed Oct 16 14:18:48 2013
> > [0]PETSC ERROR: Configure options --with-mpi-dir=/opt/openmpi-gcc/current/ --with-shared-libraries --prefix=/epi/asclepios2/bkhanal/petscDebug -download-f-blas-lapack=1 --download-metis --download-parmetis --download-superlu_dist --download-scalapack --download-mumps --download-hypre --with-clanguage=cxx
> > [0]PETSC ERROR: ------------------------------------------------------------------------
> > [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
> > ==47363==
> > ==47363== HEAP SUMMARY:
> > ==47363==     in use at exit: 10,939,838,029 bytes in 8,091 blocks
> > ==47363==   total heap usage: 1,936,963 allocs, 1,928,872 frees, 11,530,164,042 bytes allocated
> > ==47363==
> >
> > Does it mean it is crashing near MatSetValues_MPIAIJ ?
>
>    It is not really "crashing" here.
> The most likely cause is that the OS has run out of memory to provide, so it
> has started shutting down processes; this is the "unfriendly" way that Unix
> handles running out of memory. A much less likely possibility (since the
> program has just started) is that the process has run out of allocated time
> and the OS is shutting it down.
>
>    I am confident that in this case it is simply a matter of the system
> running out of memory. What happens if you run the exact same job without
> the valgrind? Please send the ENTIRE error message.
>
>    Barry

Thanks!! Yes, it seems that it was because the job ran out of memory. I
realized that in the job script I submitted to the cluster I had limited the
memory to a certain value. I deleted that line and resubmitted the job; now
the job gets killed because the wall time (of 3 hours) is exceeded, unlike
the previous case where mpiexec killed the job quite quickly. I will run the
job again with an increased wall time, see whether it produces results, and
get back here. Thanks again!
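For reference, the stack trace shows the memory being allocated inside
DMCreateMatrix() for the DMDA: a 3-D box stencil of width 1 touches 27 grid
points, so with dof unknowns per point each matrix row holds 27*dof nonzeros,
and an AIJ matrix stores roughly 12 bytes per nonzero (an 8-byte double plus
a 4-byte column index). For a hypothetical 200^3 grid with dof = 4 that is
already 200^3 * 4 rows * 108 nonzeros * 12 bytes, about 41 GB, which easily
exceeds a single node's memory. The following minimal sketch (placeholder
grid sizes and dof, not the thread's actual model; error checking omitted;
the three-argument DMCreateMatrix() of PETSc 3.4) prints PETSc's
resident-memory counter around the call the trace points at:

    /* Hedged sketch: measure resident memory around the matrix allocation
     * seen in the stack trace.  Grid sizes and dof are placeholder values,
     * not the application's; error checking is omitted for brevity. */
    #include <petscdmda.h>

    int main(int argc, char **argv)
    {
      DM             da;
      Mat            A;
      PetscLogDouble mem;

      PetscInitialize(&argc, &argv, NULL, NULL);

      /* hypothetical 3-D grid: 128^3 points, 4 dof, box stencil of width 1 */
      DMDACreate3d(PETSC_COMM_WORLD,
                   DMDA_BOUNDARY_NONE, DMDA_BOUNDARY_NONE, DMDA_BOUNDARY_NONE,
                   DMDA_STENCIL_BOX,
                   -128, -128, -128,                        /* global sizes   */
                   PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                   4, 1,                                    /* dof, stencil w */
                   NULL, NULL, NULL, &da);

      PetscMemoryGetCurrentUsage(&mem);
      PetscPrintf(PETSC_COMM_WORLD, "before DMCreateMatrix: %.0f bytes\n", mem);

      DMCreateMatrix(da, MATAIJ, &A);    /* the call in the stack trace */

      PetscMemoryGetCurrentUsage(&mem);
      PetscPrintf(PETSC_COMM_WORLD, "after  DMCreateMatrix: %.0f bytes\n", mem);

      MatDestroy(&A);
      DMDestroy(&da);
      PetscFinalize();
      return 0;
    }

Running this with the application's real grid size and dof would show whether
the matrix alone already exceeds what one node can provide.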
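Independently of PETSc, one can also check from inside the job which limits
the batch script actually imposed: a memory cap like the one mentioned above
typically shows up as the process's address-space rlimit. A small
stand-alone sketch (plain POSIX, nothing PETSc-specific):

    /* Hedged sketch: print the address-space and CPU-time limits inherited
     * from the batch system; RLIM_INFINITY means no cap was set. */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
      struct rlimit as, cpu;

      getrlimit(RLIMIT_AS, &as);    /* virtual memory cap, in bytes   */
      getrlimit(RLIMIT_CPU, &cpu);  /* CPU time cap, in seconds       */

      printf("address space: soft=%llu hard=%llu\n",
             (unsigned long long)as.rlim_cur, (unsigned long long)as.rlim_max);
      printf("cpu time:      soft=%llu hard=%llu\n",
             (unsigned long long)cpu.rlim_cur, (unsigned long long)cpu.rlim_max);
      return 0;
    }

Note that wall-clock limits are usually enforced by the scheduler itself
rather than through rlimits, which is consistent with the job surviving the
memory fix but still being killed at the 3-hour wall-time limit.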
