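An absurd memory request such as "Memory requested 18446744068029169664" usually means that 32-bit integers are not large enough for the problem. That value is 2^64 - 5680381952, i.e. a negative size reinterpreted as an unsigned 64-bit number, which is the typical signature of an overflowed integer count. Try configuring PETSc on the Cray with --with-64-bit-indices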
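To make the arithmetic behind that diagnosis concrete, here is a minimal C sketch that decodes such a request. It is only an illustration: the helper names `as_signed_bytes` and `looks_like_int32_overflow` are hypothetical and not part of PETSc, and the element size of 8 bytes used in the note below is an assumption, since the error message does not say which array was being allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reinterpret an unsigned 64-bit byte count as a signed value without
 * relying on implementation-defined conversions. Requests of 2^63 or
 * more come back negative. */
static int64_t as_signed_bytes(uint64_t requested)
{
    if (requested > (uint64_t)INT64_MAX) {
        /* requested == 2^64 - N, so UINT64_MAX - requested == N - 1 */
        return -(int64_t)(UINT64_MAX - requested) - 1;
    }
    return (int64_t)requested;
}

/* Return 1 if signed_bytes is negative, an exact multiple of elem_size,
 * and the resulting element count still fits in a 32-bit signed integer.
 * That pattern is consistent with (but does not prove) a 32-bit count
 * that overflowed before being multiplied by the element size. */
static int looks_like_int32_overflow(int64_t signed_bytes, int64_t elem_size,
                                     int32_t *count)
{
    assert(count != NULL);
    if (signed_bytes >= 0 || elem_size <= 0) return 0;
    if (signed_bytes % elem_size != 0) return 0;

    int64_t q = signed_bytes / elem_size;
    if (q < INT32_MIN) return 0;

    *count = (int32_t)q;
    return 1;
}
```

Calling `as_signed_bytes(18446744068029169664ULL)` gives -5680381952, and with an assumed element size of 8 bytes `looks_like_int32_overflow` reports a count of -710047744, which fits in a 32-bit integer. A negative count of that kind is what one would expect if an index or nonzero count exceeded the 32-bit range.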
Barry

> On Jan 19, 2017, at 7:14 AM, Cyrill Vonplanta <[email protected]> wrote:
>
> Dear PETSc Users,
>
> I have a problem with a solver running on a cray machine that crashes at the
> command “MatMatMult” (see error message below). When i run the same solver on
> my machine in serial or parallel it runs through, also when I look at it with
> -malloc_debug there doesn’t seem to be any issues.
>
> Does someone have a clue what the cause of this failure could be?
>
> Best
> Cyrill
> --
>
> The line that causes the crash is this:
>
> ierr = MatMatMult(_O, _interpolations[0], MAT_INITIAL_MATRIX, PETSC_DEFAULT,
> &mmg->interpolations[mg_levels-2]); CHKERRQ(ierr);
>
> The error message:
>
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Out of memory. This could be due to allocating
> [0]PETSC ERROR: too large an object or bleeding by not properly
> [0]PETSC ERROR: destroying unneeded objects.
> [0]PETSC ERROR: Memory allocated 0 Memory used by process 61852
> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
> [0]PETSC ERROR: Memory requested 18446744068029169664
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016
> [0]PETSC ERROR: /scratch/snx3000/studi/./moose-passo-opt on a haswell named
> nid01137 by studi Thu Jan 19 14:03:27 2017
> [0]PETSC ERROR: Configure options --known-has-attribute-aligned=1
> --known-mpi-int64_t=0 --known-bits-per-byte=8 --known-sdot-returns-double=0
> --known-snrm2-returns-double=0 --known-level1-dcache-assoc=0
> --known-level1-dcache-linesize=32 --known-level1-dcache-size=32768
> --known-memcmp-ok=1 --known-mpi-c-double-complex=1 --known-mpi-long-double=1
> --known-mpi-shared-libraries=0 --known-sizeof-MPI_Comm=4
> --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8
> --known-sizeof-float=4 --known-sizeof-int=4 --known-sizeof-long-long=8
> --known-sizeof-long=8 --known-sizeof-short=2 --known-sizeof-size_t=8
> --known-sizeof-void-p=8 --with-ar=ar --with-batch=1 --with-cc=cc
> --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0
> --with-debugging=0 --with-dependencies=0 --with-fc=ftn
> --with-fortran-datatypes=0 --with-fortran-interfaces=0
> --with-fortranlib-autodetect=0 --with-ranlib=ranlib --with-scalar-type=real
> --with-shared-ld=ar --with-etags=0 --with-dependencies=0 --with-x=0
> --with-ssl=0 --with-shared-libraries=0 --with-dependencies=0
> --with-mpi-lib="[]" --with-mpi-include="[]"
> --with-blas-lapack-lib="-L/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/lib
> -lsci_gnu_mp" --with-superlu=1
> --with-superlu-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include
> --with-superlu-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lsuperlu"
> --with-superlu_dist=1
> --with-superlu_dist-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include
> --with-superlu_dist-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib
> -lsuperlu_dist" --with-parmetis=1
> --with-parmetis-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include
> --with-parmetis-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lparmetis"
> --with-metis=1
> --with-metis-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include
> --with-metis-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lmetis"
> --with-ptscotch=1
> --with-ptscotch-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include
> --with-ptscotch-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lptscotch
> -lscotch -lptscotcherr -lscotcherr" --with-scalapack=1
> --with-scalapack-include=/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/include
> --with-scalapack-lib="-L/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/lib
> -lsci_gnu_mpi_mp -lsci_gnu_mp" --with-mumps=1
> --with-mumps-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include
> --with-mumps-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lcmumps
> -ldmumps -lesmumps -lsmumps -lzmumps -lmumps_common -lptesmumps -lpord"
> --with-hdf5=1
> --with-hdf5-include=/opt/cray/hdf5-parallel/1.8.16/GNU/5.1/include
> --with-hdf5-lib="-L/opt/cray/hdf5-parallel/1.8.16/GNU/5.1/lib -lhdf5_parallel
> -lz -ldl" --CFLAGS="-march=haswell -fopenmp -O3 -ffast-math -fPIC"
> --CPPFLAGS= --CXXFLAGS="-march=haswell -fopenmp -O3 -ffast-math -fPIC"
> --FFLAGS="-march=haswell -fopenmp -O3 -ffast-math -fPIC" --LIBS=
> --CXX_LINKER_FLAGS= --PETSC_ARCH=haswell
> --prefix=/opt/cray/pe/petsc/3.7.2.1/real/GNU/5.1/haswell --with-hypre=1
> --with-hypre-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include
> --with-hypre-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lHYPRE"
> --with-sundials=1
> --with-sundials-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include
> --with-sundials-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib
> -lsundials_cvode -lsundials_cvodes -lsundials_ida -lsundials_idas
> -lsundials_kinsol -lsundials_nvecparallel -lsundials_nvecserial"
> [0]PETSC ERROR: #1 MatGetBrowsOfAoCols_MPIAIJ() line 4815 in
> src/mat/impls/aij/mpi/mpiaij.c
> [0]PETSC ERROR: #2 MatGetBrowsOfAoCols_MPIAIJ() line 4815 in
> src/mat/impls/aij/mpi/mpiaij.c
> [0]PETSC ERROR: #3 MatMatMultSymbolic_MPIAIJ_MPIAIJ_nonscalable() line 198 in
> src/mat/impls/aij/mpi/mpimatmatmult.c
> [0]PETSC ERROR: #4 MatMatMult_MPIAIJ_MPIAIJ() line 34 in
> src/mat/impls/aij/mpi/mpimatmatmult.c
> [0]PETSC ERROR: MMG Setup 30.868420 ms.
> #5 MatMatMult() line 9517 in src/mat/interface/matrix.c
> [0]PETSC ERROR: #6 MMGSetup() line 85 in
> /users/studi/src/moose-passo/src/passo/monotone_mg.C
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Arguments are incompatible
> [0]PETSC ERROR: Incompatible vector local lengths 666 != 10922
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016
> [0]PETSC ERROR: /scratch/snx3000/studi/./moose-passo-opt on a haswell named
> nid01137 by studi Thu Jan 19 14:03:27 2017
> [0]PETSC ERROR: Configure options --known-has-attribute-aligned=1
> --known-mpi-int64_t=0 --known-bits-per-byte=8 --known-sdot-returns-double=0
> --known-snrm2-returns-double=0 --known-level1-dcache-assoc=0
> --known-level1-dcache-linesize=32 --known-level1-dcache-size=32768
> --known-memcmp-ok=1 --known-mpi-c-double-complex=1 --known-mpi-long-double=1
> --known-mpi-shared-libraries=0 --known-sizeof-MPI_Comm=4
> --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8
> --known-sizeof-float=4 --known-sizeof-int=4 --known-sizeof-long-long=8
> --known-sizeof-long=8 --known-sizeof-short=2 --known-sizeof-size_t=8
> --known-sizeof-void-p=8 --with-ar=ar --with-batch=1 --with-cc=cc
> --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0
> --with-debugging=0 --with-dependencies=0 --with-fc=ftn
> --with-fortran-datatypes=0 --with-fortran-interfaces=0
> --with-fortranlib-autodetect=0 --with-ranlib=ranlib --with-scalar-type=real
> --with-shared-ld=ar --with-etags=0 --with-dependencies=0 --with-x=0
> --with-ssl=0 --with-shared-libraries=0 --with-dependencies=0
> --with-mpi-lib="[]" --with-mpi-include="[]"
> --with-blas-lapack-lib="-L/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/lib
> -lsci_gnu_mp" --with-superlu=1
> --with-superlu-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include
> --with-superlu-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lsuperlu"
> --with-superlu_dist=1
> --with-superlu_dist-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include
> --with-superlu_dist-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib
> -lsuperlu_dist" --with-parmetis=1
> --with-parmetis-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include
> --with-parmetis-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lparmetis"
> --with-metis=1
> --with-metis-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include
> --with-metis-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lmetis"
> --with-ptscotch=1
> --with-ptscotch-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include
> --with-ptscotch-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lptscotch
> -lscotch -lptscotcherr -lscotcherr" --with-scalapack=1
> --with-scalapack-include=/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/include
> --with-scalapack-lib="-L/opt/cray/libsci/13.2.0/GNU/5.1/x86_64/lib
> -lsci_gnu_mpi_mp -lsci_gnu_mp" --with-mumps=1
> --with-mumps-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include
> --with-mumps-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lcmumps
> -ldmumps -lesmumps -lsmumps -lzmumps -lmumps_common -lptesmumps -lpord"
> --with-hdf5=1
> --with-hdf5-include=/opt/cray/hdf5-parallel/1.8.16/GNU/5.1/include
> --with-hdf5-lib="-L/opt/cray/hdf5-parallel/1.8.16/GNU/5.1/lib -lhdf5_parallel
> -lz -ldl" --CFLAGS="-march=haswell -fopenmp -O3 -ffast-math -fPIC"
> --CPPFLAGS= --CXXFLAGS="-march=haswell -fopenmp -O3 -ffast-math -fPIC"
> --FFLAGS="-march=haswell -fopenmp -O3 -ffast-math -fPIC" --LIBS=
> --CXX_LINKER_FLAGS= --PETSC_ARCH=haswell
> --prefix=/opt/cray/pe/petsc/3.7.2.1/real/GNU/5.1/haswell --with-hypre=1
> --with-hypre-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include
> --with-hypre-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib -lHYPRE"
> --with-sundials=1
> --with-sundials-include=/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/include
> --with-sundials-lib="-L/opt/cray/tpsl/16.07.1/GNU/5.1/haswell/lib
> -lsundials_cvode -lsundials_cvodes -lsundials_ida -lsundials_idas
> -lsundials_kinsol -lsundials_nvecparallel -lsundials_nvecserial"
> [0]PETSC ERROR: #7 VecCopy() line 1639 in src/vec/vec/interface/vector.c
> Level 1, Presmoothing step 0 ... srun: error: nid01137: task 0:
> Trace/breakpoint trap
> srun: Terminating job step 349949.1
> slurmstepd: error: *** STEP 349949.1 ON nid01137 CANCELLED AT
> 2017-01-19T14:03:32 ***
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
> srun: error: nid01137: task 1: Killed
