MUMPS documentation (section 8) indicates that INFOG(1)=-9 means insufficient workspace. Try running with -st_mat_mumps_icntl_14 <percentage>, where <percentage> is the percentage by which you want to increase the workspace, e.g. 50 or 100 or more.
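As a rough aid for choosing the percentage, the INFO(2) values in the error output quoted later in this message can be decoded. The sketch below is not part of PETSc, SLEPc or MUMPS; the helper names are made up for illustration. It assumes the convention described in the MUMPS users' guide for error -9 (a positive INFO(2) is the number of missing workspace entries, a negative one is that count in millions) and 16 bytes per entry for a double-precision complex build.

```python
import re

# Assumption: for INFOG(1)=-9 the MUMPS users' guide describes INFO(2) as the
# number of missing workspace entries (if negative, its absolute value is in millions).
ERROR_RE = re.compile(r"INFOG\(1\)=(-?\d+), INFO\(2\)=(-?\d+)")

# Assumption: double-precision complex scalars, i.e. 16 bytes per workspace entry.
BYTES_PER_ENTRY = 16


def missing_entries(info2):
    """Decode INFO(2) of an INFOG(1)=-9 report into a number of entries."""
    return info2 if info2 >= 0 else -info2 * 1_000_000


def workspace_shortfalls(log_text):
    """Return the decoded shortfall for every INFOG(1)=-9 report in log_text."""
    return [
        missing_entries(int(info2))
        for infog1, info2 in ERROR_RE.findall(log_text)
        if int(infog1) == -9
    ]


def rerun_option(percent):
    """Build the command-line option suggested above (hypothetical helper)."""
    return f"-st_mat_mumps_icntl_14 {percent}"


if __name__ == "__main__":
    # Two of the reports from the quoted error output, reduced to the relevant part.
    sample = (
        "Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6\n"
        "Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21891045\n"
    )
    worst = max(workspace_shortfalls(sample))
    print(f"largest shortfall: {worst} entries, "
          f"about {worst * BYTES_PER_ENTRY / 2**20:.0f} MiB")
    print("try:", rerun_option(100))
```

The shortfall is only what was missing at the moment the error was raised, so it is a lower bound on the extra workspace needed, not a recommended value for the percentage.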
See ex43.c for an example showing how to set this option in code.

Jose

> On 25 Aug 2021, at 14:11, dazza simplythebest <[email protected]> wrote:
>
> From: dazza simplythebest <[email protected]>
> Sent: Wednesday, August 25, 2021 12:08 PM
> To: Matthew Knepley <[email protected]>
> Subject: Re: [petsc-users] Improving efficiency of slepc usage
>
> Dear Matthew and Jose,
> I have derived a smaller program from the original program by constructing
> matrices of the same size, but filling their entries randomly instead of
> computing the correct fluid dynamics values, just to allow faster
> experimentation. This modified code's behaviour seems to be similar, with
> the code again failing for the large matrix case with the SIGKILL error, so
> I first report results from that code here. Firstly, I can confirm that I am
> using Fortran, and I am compiling with the Intel compiler, which it seems
> places automatic arrays on the stack. The stack size, as determined by
> ulimit -a, is reported to be:
>     stack size (kbytes, -s) 8192
>
> [1] Okay, so I followed your suggestion and used ctrl-c followed by 'where'
> in one of the non-SIGKILL gdb windows. I have pasted the output at the
> bottom of this email (see [1] output). It does look like the problem occurs
> somewhere in the call to the MUMPS solver?
>
> [2] I have also today gained access to another workstation, and so have tried
> running the (original) code on that machine. This new machine has two (more
> powerful) CPU nodes and a larger memory (both machines feature Intel Xeon
> processors). On this new machine the large matrix case again failed with the
> familiar SIGKILL report when I used 16 or 12 MPI processes, ran to the end
> without error for 4 or 6 MPI processes, and failed, but with a PETSc error
> message, when I used 8 MPI processes, which I have pasted below (see [2] output).
> Does this point to some sort of resource demand that exceeds some limit as
> the number of MPI processes increases?
>
> Many thanks once again,
> Dan.
>
> [2] output
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Error in external library
> [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase:
> INFOG(1)=-9, INFO(2)=6
>
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed
> Aug 25 11:18:48 2021
> [0]PETSC ERROR: Configure options
> ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O"
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex
> --with-precision=double --with-debugging=0 --with-openmp
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --download-mumps --download-scalapack --download-cmake
> PETSC_ARCH=arch-omp_nodbug
> [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [1]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [1]PETSC ERROR: Error in external library
> [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase:
> INFOG(1)=-9, INFO(2)=6
>
> [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [1]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed > Aug 25 11:18:48 2021 > [1]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [1]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [1]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [1]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [1]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [2]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [2]PETSC ERROR: Error in external library > [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: > INFOG(1)=-9, INFO(2)=6 > > [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for > trouble shooting. 
> [2]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [2]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed > Aug 25 11:18:48 2021 > [2]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [2]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [2]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [2]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [2]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [3]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [3]PETSC ERROR: Error in external library > [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: > INFOG(1)=-9, INFO(2)=6 > > [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for > trouble shooting. 
> [3]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [3]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed > Aug 25 11:18:48 2021 > [3]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [3]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [3]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [3]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [3]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [3]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [3]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [3]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [4]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [4]PETSC ERROR: Error in external library > [4]PETSC ERROR: Error reported by MUMPS in numerical 
factorization phase: > INFOG(1)=-9, INFO(2)=6 > > [4]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for > trouble shooting. > [4]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [4]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed > Aug 25 11:18:48 2021 > [4]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > [4]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [4]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [4]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [4]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [4]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [4]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [4]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [4]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [4]PETSC ERROR: #9 EPSSolve() at > 
/data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [5]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [5]PETSC ERROR: Error in external library > [5]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: > INFOG(1)=-9, INFO(2)=6 > > [5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for > trouble shooting. > [5]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [5]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed > Aug 25 11:18:48 2021 > [5]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > [5]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [5]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [5]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [5]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [5]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [5]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [5]PETSC 
ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [5]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [5]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [6]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [6]PETSC ERROR: Error in external library > [6]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: > INFOG(1)=-9, INFO(2)=21891045 > > [6]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for > trouble shooting. > [6]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [6]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed > Aug 25 11:18:48 2021 > [6]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > [6]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [6]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [6]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [6]PETSC ERROR: #4 PCSetUp() at > 
/data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [6]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [6]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [6]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [6]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [6]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [7]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [7]PETSC ERROR: Error in external library > [7]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: > INFOG(1)=-9, INFO(2)=21841925 > > [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for > trouble shooting. 
> [7]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [7]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed > Aug 25 11:18:48 2021 > [7]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > [7]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [7]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [7]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [7]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [7]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [7]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [7]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [7]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [7]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [0]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > 
[0]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [0]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [0]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [0]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [0]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [0]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [0]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [1]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [1]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [1]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [1]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [2]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [2]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [2]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [2]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [3]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > > > [1] output > > Continuing. 
> [New Thread 0x7f6f5b2d2780 (LWP 794037)] > [New Thread 0x7f6f5aad0800 (LWP 794040)] > [New Thread 0x7f6f5a2ce880 (LWP 794041)] > ^C > Thread 1 "my.exe" received signal SIGINT, Interrupt. > 0x00007f72904927b0 in ofi_fastlock_release_noop () > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > (gdb) where > #0 0x00007f72904927b0 in ofi_fastlock_release_noop () > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > #1 0x00007f729049354b in ofi_cq_readfrom () > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > #2 0x00007f728ffe8f0e in rxm_ep_do_progress () > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #3 0x00007f728ffe2b7d in rxm_ep_recv_common_flags () > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #4 0x00007f728ffe30f8 in rxm_ep_trecvmsg () > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #5 0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392, > comm=1, flag=0x0, status=0xffffffffffffffff) > at /usr/include/rdma/fi_tagged.h:109 > #6 0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0, > v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90) > at ../../src/binding/fortran/mpif_h/iprobef.c:276 > #7 0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0, > blocking=<error reading variable: Cannot access memory at address 0x1>, > > --Type <RET> for more, q to quit, c to continue without paging--cont > irecv=<error reading variable: Cannot access memory at address 0x0>, > message_received=<error reading variable: Cannot access memory at address > 0xffffffffffffffff>, msgsou=1, msgtag=-1, status=..., bufr=..., > lbufr=320782504, lbufr_bytes=1283130016, procnode_steps=..., posfac=1, > iwpos=1, 
iwposcb=292535, iptrlu=2039063816, lrlu=2039063816, > lrlus=2039063816, n=50400, iw=..., liw=292563, a=..., la=2611636796, > ptrist=..., ptlust=..., ptrfac=..., ptrast=..., step=..., pimaster=..., > pamaster=..., nstk_s=..., comp=0, iflag=0, ierror=0, comm=-1006632958, > nbprocfils=..., ipool=..., lpool=5, leaf=1, nbfin=4, myid=1, slavef=4, > root=<error reading variable: value of type `zmumps_root_struc' requires > 766016 bytes, which is more than max-value-size>, opassw=0, opeliw=0, > itloc=..., rhs_mumps=..., fils=..., dad=..., ptrarw=..., ptraiw=..., > intarr=..., dblarr=..., icntl=..., keep=..., keep8=..., dkeep=..., nd=..., > frere=..., lptrar=50400, nelt=1, frtptr=..., frtelt=..., istep_to_iniv2=..., > tab_pos_in_pere=..., stack_right_authorized=4294967295, lrgroups=...) at > zfac_process_message.F:730 > #8 0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=..., > liw=<error reading variable: Cannot access memory at address 0x1>, a=..., > la=<error reading variable: Cannot access memory at address > 0xffffffffffffffff>, nstk_steps=..., nbprocfils=..., nd=..., fils=..., > step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., > tab_pos_in_pere=..., nstepsdone=1690339657, opass=<error reading variable: > Cannot access memory at address 0x5>, opeli=<error reading variable: Cannot > access memory at address 0x0>, nelva=50400, comp=259581, maxfrt=-1889517576, > nmaxnpiv=-1195144887, ntotpv=<error reading variable: Cannot access memory at > address 0x2>, noffnegpv=<error reading variable: Cannot access memory at > address 0x0>, nb22t1=<error reading variable: Cannot access memory at address > 0x0>, nb22t2=<error reading variable: Cannot access memory at address 0x0>, > nbtiny=<error reading variable: Cannot access memory at address 0x0>, > det_exp=<error reading variable: Cannot access memory at address 0x0>, > det_mant=<error reading variable: Cannot access memory at address 0x0>, > det_sign=<error reading variable: Cannot access memory at 
address 0x0>, > ptrist=..., ptrast=..., pimaster=..., pamaster=..., ptrarw=..., ptraiw=..., > itloc=..., rhs_mumps=..., ipool=..., lpool=<error reading variable: Cannot > access memory at address 0x0>, rinfo=<error reading variable: Cannot access > memory at address 0x0>, posfac=<error reading variable: Cannot access memory > at address 0x0>, iwpos=<error reading variable: Cannot access memory at > address 0x0>, lrlu=<error reading variable: Cannot access memory at address > 0x0>, iptrlu=<error reading variable: Cannot access memory at address 0x0>, > lrlus=<error reading variable: Cannot access memory at address 0x0>, > leaf=<error reading variable: Cannot access memory at address 0x0>, > nbroot=<error reading variable: Cannot access memory at address 0x0>, > nbrtot=<error reading variable: Cannot access memory at address 0x0>, > uu=<error reading variable: Cannot access memory at address 0x0>, > icntl=<error reading variable: Cannot access memory at address 0x0>, > ptlust=..., ptrfac=..., info=<error reading variable: Cannot access memory at > address 0x0>, keep=<error reading variable: Cannot access memory at address > 0x3ff0000000000000>, keep8=<error reading variable: Cannot access memory at > address 0x0>, procnode_steps=..., slavef=<error reading variable: Cannot > access memory at address 0x4ffffffff>, myid=<error reading variable: Cannot > access memory at address 0xffffffff>, comm_nodes=<error reading variable: > Cannot access memory at address 0x0>, myid_nodes=<error reading variable: > Cannot access memory at address 0x0>, bufr=..., lbufr=0, lbufr_bytes=5, > intarr=..., dblarr=..., root=..., perm=..., nelt=0, frtptr=..., frtelt=..., > lptrar=3, comm_load=-30, ass_irecv=30, seuil=2.1219957909652723e-314, > seuil_ldlt_niv2=4.2439866417681519e-314, mem_distrib=..., ne=..., dkeep=..., > pivnul_list=..., lpn_list=0, lrgroups=...) 
at zfac_par_m.F:182 > #9 0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., la=<error > reading variable: Cannot access memory at address 0x1>, liw=<error reading > variable: Cannot access memory at address 0x0>, sym_perm=..., na=..., lna=1, > ne_steps=..., nfsiz=..., fils=..., step=..., frere=..., dad=..., cand=..., > istep_to_iniv2=..., tab_pos_in_pere=..., ptrar=..., ldptrar=<error reading > variable: Cannot access memory at address 0x0>, ptrist=..., ptlust_s=..., > ptrfac=..., iw1=..., iw2=..., itloc=..., rhs_mumps=..., pool=..., > lpool=-1889529280, cntl1=-5.3576889161551131e-255, icntl=<error reading > variable: Cannot access memory at address 0x25344>, info=..., rinfo=..., > keep=..., keep8=..., procnode_steps=..., slavef=-1889504640, > comm_nodes=-2048052411, myid=<error reading variable: Cannot access memory at > address 0x81160>, myid_nodes=-1683330500, bufr=..., lbufr=<error reading > variable: Cannot access memory at address 0x11db4c>, lbufr_bytes=<error > reading variable: Cannot access memory at address 0xc4e0>, zmumps_lbuf=<error > reading variable: Cannot access memory at address 0x4>, intarr=..., > dblarr=..., root=<error reading variable: Cannot access memory at address > 0x11dbec>, nelt=<error reading variable: Cannot access memory at address > 0x3>, frtptr=..., frtelt=..., comm_load=<error reading variable: Cannot > access memory at address 0x0>, ass_irecv=<error reading variable: Cannot > access memory at address 0x0>, seuil=<error reading variable: Cannot access > memory at address 0x0>, seuil_ldlt_niv2=<error reading variable: Cannot > access memory at address 0x0>, mem_distrib=<error reading variable: Cannot > access memory at address 0x0>, dkeep=<error reading variable: Cannot access > memory at address 0x0>, pivnul_list=..., lpn_list=<error reading variable: > Cannot access memory at address 0x0>, lrgroups=...) 
at zfac_b.F:243 > #10 0x00007f7308610ff7 in zmumps_fac_driver (id=<error reading variable: > value of type `zmumps_struc' requires 386095520 bytes, which is more than > max-value-size>) at zfac_driver.F:2421 > #11 0x00007f7308569256 in zmumps (id=<error reading variable: value of type > `zmumps_struc' requires 386095520 bytes, which is more than max-value-size>) > at zmumps_driver.F:1883 > #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, par=<error reading > variable: Cannot access memory at address 0x1>, comm_f77=<error reading > variable: Cannot access memory at address 0x0>, n=<error reading variable: > Cannot access memory at address 0xffffffffffffffff>, nblk=1, icntl=..., > cntl=..., keep=..., dkeep=..., keep8=..., nz=0, nnz=0, irn=..., irnhere=0, > jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=0, nnz_loc=304384739, irn_loc=..., > irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., a_lochere=1, nelt=0, > eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, a_elt=..., a_elthere=0, > blkptr=..., blkptrhere=0, blkvar=..., blkvarhere=0, perm_in=..., > perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, info=..., > rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, size_schur=0, > listvar_schur=..., listvar_schurhere=0, schur=..., schurhere=0, wk_user=..., > wk_userhere=0, colsca=..., colscahere=0, rowsca=..., rowscahere=0, > instance_number=1, nrhs=1, lrhs=0, lredrhs=0, rhs_sparse=..., > rhs_sparsehere=0, sol_loc=..., sol_lochere=0, rhs_loc=..., rhs_lochere=0, > irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0, > isol_loc=..., isol_lochere=0, irhs_loc=..., irhs_lochere=0, nz_rhs=0, > lsol_loc=0, lrhs_loc=0, nloc_rhs=0, schur_mloc=0, schur_nloc=0, schur_lld=0, > mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=..., > write_problem=..., save_dir=..., save_prefix=..., tmpdirlen=20, prefixlen=20, > write_problemlen=20, save_dirlen=20, save_prefixlen=20, metis_options=...) 
at > zmumps_f77.F:289 > #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485 > #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248, > A=0x7ffda7afdae0, info=0x1) at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683 > #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248, > mat=0x7ffda7afdae0, info=0x1) at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > #20 0x00007f7309130462 in STSetUp (st=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248, __ierr=0x7ffda7afdae0) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85 > #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=..., > b_pet=..., jthisone=<error reading variable: Cannot access memory at address > 0x1>, isize=<error reading variable: Cannot access memory at address 0x0>) at > small_slepc_example_program.F:322 > #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549 > #26 0x00000000004023f2 in main () > #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0 <main>, argc=14, > argv=0x7ffda7b024e8, 
init=<optimized out>, fini=<optimized out>, > rtld_fini=<optimized out>, stack_end=0x7ffda7b024d8) at > ../csu/libc-start.c:308 > #28 0x00000000004022fe in _start () > > From: Matthew Knepley <[email protected]> > Sent: Tuesday, August 24, 2021 3:59 PM > To: dazza simplythebest <[email protected]> > Cc: Jose E. Roman <[email protected]>; PETSc <[email protected]> > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest <[email protected]> > wrote: > > Dear Matthew and Jose, > Apologies for the delayed reply, I had a couple of unforeseen days off > this week. > Firstly regarding Jose's suggestion re: MUMPS, the program is already using > MUMPS > to solve linear systems (the code is using a distributed MPI matrix to solve > the generalised > non-Hermitian complex problem). > > I have tried the gdb debugger as per Matthew's suggestion. > Just to note in case someone else is following this that at first it didn't > work (couldn't 'attach') , > but after some googling I found a tip suggesting the command; > echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope > which seemed to get it working. > > I then first ran the debugger on the small matrix case that worked. > That stopped in gdb almost immediately after starting execution > with a report regarding 'nanosleep.c': > ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > However, issuing the 'cont' command again caused the program to run through > to the end of the > execution w/out any problems, and with correct looking results, so I am > guessing this error > is not particularly important. > > We do that on purpose when the debugger starts up. Typing 'cont' is correct. > > I then tried the same debugging procedure on the large matrix case that fails. 
> The code again stopped almost immediately after the start of execution with > the same nanosleep error as before, and I was able to set the program running > again with 'cont' (see full output below). I was running the code with 4 MPI > processes, > and so had 4 gdb windows appear. Thereafter the code ran for some time until > completing the > matrix construction, and then one of the gdb process windows printed a > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > message. I then typed 'where' into this terminal but just received the > message > No stack. > > I have only seen this behavior one other time, and it was with Fortran. > Fortran allows you to declare really big arrays > on the stack by putting them at the start of a function (rather than F90 > malloc). When I had one of those arrays exceed > the stack space, I got this kind of an error where everything is destroyed > rather than just stopping. Could it be that you > have a large structure on the stack? > > Second, you can at least look at the stack for the processes that were not > killed. You type Ctrl-C, which should give you > the prompt and then "where". > > Thanks, > > Matt > > The other gdb windows basically seemed to be left in limbo until I issued the > 'quit' > command in the SIGKILL window, and then they vanished. > > I paste the full output from the gdb window that recorded the SIGKILL below > here. > I guess it is necessary to somehow work out where the SIGKILL originates? > > Thanks once again, > Dan. > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - - - - - - - - - - - - - - - > GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 > Copyright (C) 2020 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. 
> Type "show copying" and "show warranty" for details. > This GDB was configured as "x86_64-linux-gnu". > Type "show configuration" for configuration details. > For bug reporting instructions, please see: > <http://www.gnu.org/software/gdb/bugs/>. > Find the GDB manual and other documentation resources online at: > <http://www.gnu.org/software/gdb/documentation/>. > > For help, type "help". > Type "apropos word" to search for commands related to "word"... > Reading symbols from ./stab1.exe... > Attaching to program: > /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, > process 675919 > Reading symbols from > /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15... > Reading symbols from > /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type <RET> for > more, q to quit, c to continue without paging--cont > /intel64_lin/libmkl_intel_lp64.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so... 
> Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg... > Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so... > Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0... > Reading symbols from > /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug... > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". > Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg... > Reading symbols from /lib/x86_64-linux-gnu/librt.so.1... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so) > Reading symbols from /lib/x86_64-linux-gnu/libm.so.6... 
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so) > Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1... > (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so) > Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5) > Reading symbols from /lib64/ld-linux-x86-64.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so... 
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so) > Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2) > 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=<optimized out>, > clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffdc641a9a0, > rem=rem@entry=0x7ffdc641a9a0) at > ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 > 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or > directory. > (gdb) cont > Continuing. > [New Thread 0x7f9e49c02780 (LWP 676559)] > [New Thread 0x7f9e49400800 (LWP 676560)] > [New Thread 0x7f9e48bfe880 (LWP 676562)] > [Thread 0x7f9e48bfe880 (LWP 676562) exited] > [Thread 0x7f9e49400800 (LWP 676560) exited] > [Thread 0x7f9e49c02780 (LWP 676559) exited] > > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > (gdb) where > No stack. 
> > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - > > From: Matthew Knepley <[email protected]> > Sent: Friday, August 20, 2021 2:12 PM > To: dazza simplythebest <[email protected]> > Cc: Jose E. Roman <[email protected]>; PETSc <[email protected]> > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest <[email protected]> > wrote: > Dear Jose, > Many thanks for your response; I have been investigating this issue with > a few more calculations > today, hence the slightly delayed response. > > The problem is actually derived from a fluid dynamics problem, so to allow an > easier exploration of things > I first downsized the resolution of the underlying fluid solver while keeping > all the physical parameters > the same - i.e. I would get a smaller matrix that should be solving the same > physical problem as the original > larger matrix but to lower accuracy. > > Results > > Small matrix (N = 21168) - everything good! > This converged when using the -eps_largest_real approach (taking 92 > iterations for nev = 10, > tol = 5.0000E-06 and ncv = 300), and also when using the shift-invert > approach, converging > very impressively in a single iteration! Interestingly it did this both for > a non-zero -eps_target > and also for a zero -eps_target. > > Large matrix (N = 50400) - works for -eps_largest_real, fails for -st_type > sinvert > I have just double checked again that the code does run properly when we use > the -eps_largest_real > option - indeed I ran it with a small nev and large tolerance (nev = 4, > -eps_tol 5.0e-4, ncv = 300) > and with these parameters convergence was obtained in 164 iterations, which > took 6 hours on the > machine I was running it on. 
Furthermore the eigenvalues seem to be ballpark > correct; for this large > higher resolution case (although with a looser SLEPc tolerance) we obtain > 1789.56816314173 -4724.51319554773i > as the eigenvalue with largest real part, while the smaller matrix (same > physical problem but at lower resolution) > found this eigenvalue to be 1831.11845726501 -4787.54519511345i, which means > the agreement is in line > with expectations. > > Unfortunately, the code still crashes when I try to do > shift-invert for the large matrix case, > whether or not I use a non-zero -eps_target. For reference this is the > command line used: > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type > sinvert -eps_monitor :monitor_output05.txt > To be precise the code crashes soon after calling EPSSolve (it successfully > calls > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and > EPSSetFromOptions). > By crashes I mean that I do not even get any error messages from SLEPc/PETSc, > and do not even get the > 'EPS Object: 16 MPI processes' message - I simply get an MPI/Fortran 'KILLED > BY SIGNAL: 9 (Killed)' message > as soon as EPSSolve is called. > > Hi Dan, > > It would help track this error down if we had a stack trace. You can get a > stack trace from the debugger. You run with > > -start_in_debugger > > which should launch the debugger (usually), and then type > > cont > > to continue, and then > > where > > to get the stack trace when it crashes, or 'bt' on lldb. > > Thanks, > > Matt > > Do you have any ideas as to why this larger matrix case should fail when > using shift-invert but succeed when using > -eps_largest_real? The fact that the program works and produces correct > results > when using the -eps_largest_real option suggests that there is probably > nothing wrong with the specification > of the problem or the matrices? It is strange how there is no error message > from SLEPc / PETSc ... 
the > only idea I have at the moment is that perhaps max memory has been exceeded, > which could cause such a sudden > shutdown? For your reference when running the large matrix case with the > -eps_largest_real option I am using > about 36 GB of the 148 GB available on this machine - does the shift-invert > approach require substantially > more memory for example? > > I would be very grateful if you have any suggestions to resolve this issue > or even ways to clarify it further; > the performance I have seen with the shift-invert for the small matrix is so > impressive it would be great to > get that working for the full-size problem. > > Many thanks and best wishes, > Dan. > > > > From: Jose E. Roman <[email protected]> > Sent: Thursday, August 19, 2021 7:58 AM > To: dazza simplythebest <[email protected]> > Cc: PETSc <[email protected]> > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > In A) convergence may be slow, especially if the wanted eigenvalues have > small magnitude. I would not say 600 iterations is a lot; you probably need > many more. In most cases, approach B) is better because it improves > convergence of eigenvalues close to the target, but it requires prior > knowledge of your spectrum distribution in order to choose an appropriate > target. > > In B) what do you mean by "it crashes"? If you get an error about > factorization, it means that your A-matrix is singular. In that case, try > using a nonzero target -eps_target 0.1 > > Jose > > > > On 19 Aug 2021, at 7:12, dazza simplythebest <[email protected]> > > wrote: > > > > Dear All, > > I am planning on using SLEPc to do a large number of eigenvalue > > calculations > > of a generalized eigenvalue problem, called from a program written in > > Fortran using MPI. 
> > Thus far I have successfully installed the SLEPc/PETSc software, both > > locally and on a cluster, > > and on smaller test problems everything is working well; the matrices are > > efficiently and > > correctly constructed and SLEPc returns the correct spectrum. I am just now > > starting to move > > towards solving the full-size 'production run' problems, and would > > appreciate some > > general advice on how to improve the solver's performance. > > > > In particular, I am currently trying to solve the problem Ax = lambda Bx > > whose matrices > > are of size 50000 (this is the smallest 'production run' problem I will be > > tackling), and are > > complex, non-Hermitian. In most cases I aim to find the eigenvalues with > > the largest real part, > > although in other cases I will also be interested in finding the > > eigenvalues whose real part > > is close to zero. > > > > A) > > Calling SLEPc's EPS solver with the following options: > > > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol > > 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > > > > led to the code successfully running, but failing to find any eigenvalues > > within the maximum 600 iterations > > (examining the monitor output it did appear to be very slowly approaching > > convergence). > > > > B) > > On the same problem I have also tried a shift-invert transformation using > > the options > > > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > > > - in this case the code crashed at the point it tried to call SLEPc, so > > perhaps I have incorrectly specified these options? > > > > > > Does anyone have any suggestions as to how to improve this performance (or > > find out more about the problem)? > > In the case of A) I can see from watching the SLEPc videos that > > increasing ncv > > may help, but I am wondering, since 600 is a large number of iterations, > > whether there > > may be something else going on - e.g.
perhaps some alternative > > preconditioner may help ? > > In the case of B), I guess there must be some mistake in these command line > > options? > > Again, any advice will be greatly appreciated. > > Best wishes, Dan. > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/
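
For readers following the thread, the claim that the two runs agree "in line with expectations" can be checked with a few lines of arithmetic. The sketch below only uses the two eigenvalues quoted in the message of August 20; the helper name relative_difference is hypothetical and introduced purely for illustration. Since the two runs used different resolutions and different tolerances, a difference of a percent or so says only that they are consistent, not which one is closer to the true value.

```python
# Eigenvalues with largest real part, as quoted in the thread.
lam_large = complex(1789.56816314173, -4724.51319554773)  # N = 50400, -eps_tol 5.0e-4
lam_small = complex(1831.11845726501, -4787.54519511345)  # N = 21168, tol 5.0e-6


def relative_difference(a: complex, b: complex) -> float:
    """Return |a - b| / |a| (hypothetical helper, for illustration only)."""
    return abs(a - b) / abs(a)


rel = relative_difference(lam_large, lam_small)
print(f"relative difference between the two runs: {rel:.2%}")
```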

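The memory question raised in the thread can be made a little more concrete from numbers that appear in the stack trace and the messages: frame #12 reports nnz_loc=304384739 on the traced rank, the matrix order is N = 50400, and the zmumps_* routines work in double-precision complex (16 bytes per value). What follows is a rough back-of-the-envelope sketch, not a statement about what MUMPS actually allocates: the 4-byte index size is an assumption (a default 32-bit integer build), and the "dense factor" figure is only the size of one full N x N complex array, given as a reference scale because the local entry count is a sizeable fraction of N squared.

```python
N = 50_400               # matrix order, from the thread
NNZ_LOC = 304_384_739    # nnz_loc on the traced rank, from frame #12
BYTES_PER_VALUE = 16     # double-precision complex
BYTES_PER_INDEX = 4      # assumption: 32-bit row/column indices


def gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


# Local triplet storage (value + row index + column index per entry).
triplet_bytes = NNZ_LOC * (BYTES_PER_VALUE + 2 * BYTES_PER_INDEX)

# Share of a fully dense N x N matrix held by this one rank.
dense_fraction = NNZ_LOC / N**2

# Size of one dense N x N complex array, as a reference scale only.
dense_array_bytes = N**2 * BYTES_PER_VALUE

print(f"local triplets:        {gib(triplet_bytes):6.2f} GiB")
print(f"share of dense matrix: {dense_fraction:6.1%}")
print(f"one dense N x N array: {gib(dense_array_bytes):6.2f} GiB")
```

Under these assumptions a single rank already holds several GiB of input entries before any factorization workspace is counted, which is at least compatible with the suspicion voiced in the thread that the shift-invert run is being killed for exceeding available memory.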