MUMPS documentation (section 8) indicates that INFOG(1)=-9 means
insufficient workspace. Try running with
 -st_mat_mumps_icntl_14 <percentage>
where <percentage> is the percentage by which you want to increase the
workspace, e.g. 50 or 100 or more.

See ex43.c for an example showing how to set this option in code.

Jose


> On 25 Aug 2021, at 14:11, dazza simplythebest <[email protected]>
> wrote:
> 
> 
> 
> From: dazza simplythebest <[email protected]>
> Sent: Wednesday, August 25, 2021 12:08 PM
> To: Matthew Knepley <[email protected]>
> Subject: Re: [petsc-users] Improving efficiency of slepc usage
>  
> Dear Matthew and Jose,
> I have derived a smaller program from the original one that constructs
> matrices of the same size but fills their entries randomly, instead of
> computing the correct fluid dynamics values, just to allow faster
> experimentation. This modified code seems to behave similarly: it again
> fails for the large matrix case with the SIGKILL error, so I first report
> results from that code here. Firstly, I can confirm that I am using Fortran
> and compiling with the Intel compiler, which, it seems, places automatic
> arrays on the stack. The stack size, as reported by ulimit -a, is:
> stack size              (kbytes, -s) 8192
> 
> [1] Okay, so I followed your suggestion and used Ctrl-C followed by 'where'
> in one of the non-SIGKILL gdb windows. I have pasted the output at the
> bottom of this email (see [1] output). It does look like the problem occurs
> somewhere in the call to the MUMPS solver?
> 
> [2] I have also today gained access to another workstation, and so have
> tried running the (original) code on that machine. This new machine has two
> (more powerful) CPU nodes and a larger memory (both machines feature Intel
> Xeon processors). On this new machine the large matrix case again failed
> with the familiar SIGKILL report when I used 16 or 12 MPI processes, ran to
> the end without error for 4 or 6 MPI processes, and failed, but with a PETSc
> error message instead, when I used 8 MPI processes; I have pasted that
> message below (see [2] output). Does this point to some sort of resource
> demand that exceeds some limit as the number of MPI processes increases?
> 
>   Many thanks once again,
>             Dan.
> 
> [2] output
> [0]PETSC ERROR: --------------------- Error Message 
> --------------------------------------------------------------
> [0]PETSC ERROR: Error in external library
> [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: 
> INFOG(1)=-9, INFO(2)=6
> 
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for 
> trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed 
> Aug 25 11:18:48 2021
> [0]PETSC ERROR: Configure options 
> ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs 
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort 
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" 
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex 
> --with-precision=double --with-debugging=0 --with-openmp 
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl 
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  --download-mumps --download-scalapack --download-cmake 
> PETSC_ARCH=arch-omp_nodbug
> [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [1]PETSC ERROR: --------------------- Error Message 
> --------------------------------------------------------------
> [1]PETSC ERROR: Error in external library
> [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: 
> INFOG(1)=-9, INFO(2)=6
> 
> [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for 
> trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [1]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed 
> Aug 25 11:18:48 2021
> [1]PETSC ERROR: Configure options 
> ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs 
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort 
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" 
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex 
> --with-precision=double --with-debugging=0 --with-openmp 
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl 
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  --download-mumps --download-scalapack --download-cmake 
> PETSC_ARCH=arch-omp_nodbug
> [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [1]PETSC ERROR: #2 MatLUFactorNumeric() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [1]PETSC ERROR: #3 PCSetUp_LU() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [1]PETSC ERROR: #4 PCSetUp() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [1]PETSC ERROR: #5 KSPSetUp() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [2]PETSC ERROR: --------------------- Error Message 
> --------------------------------------------------------------
> [2]PETSC ERROR: Error in external library
> [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: 
> INFOG(1)=-9, INFO(2)=6
> 
> [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for 
> trouble shooting.
> [2]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [2]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed 
> Aug 25 11:18:48 2021
> [2]PETSC ERROR: Configure options 
> ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs 
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort 
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" 
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex 
> --with-precision=double --with-debugging=0 --with-openmp 
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl 
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  --download-mumps --download-scalapack --download-cmake 
> PETSC_ARCH=arch-omp_nodbug
> [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [2]PETSC ERROR: #2 MatLUFactorNumeric() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [2]PETSC ERROR: #3 PCSetUp_LU() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [2]PETSC ERROR: #4 PCSetUp() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [2]PETSC ERROR: #5 KSPSetUp() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [3]PETSC ERROR: --------------------- Error Message 
> --------------------------------------------------------------
> [3]PETSC ERROR: Error in external library
> [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: 
> INFOG(1)=-9, INFO(2)=6
> 
> [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for 
> trouble shooting.
> [3]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [3]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed 
> Aug 25 11:18:48 2021
> [3]PETSC ERROR: Configure options 
> ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs 
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort 
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" 
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex 
> --with-precision=double --with-debugging=0 --with-openmp 
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl 
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  --download-mumps --download-scalapack --download-cmake 
> PETSC_ARCH=arch-omp_nodbug
> [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [3]PETSC ERROR: #2 MatLUFactorNumeric() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [3]PETSC ERROR: #3 PCSetUp_LU() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [3]PETSC ERROR: #4 PCSetUp() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [3]PETSC ERROR: #5 KSPSetUp() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [3]PETSC ERROR: #6 STSetUp_Sinvert() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [3]PETSC ERROR: #7 STSetUp() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [3]PETSC ERROR: #8 EPSSetUp() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [4]PETSC ERROR: --------------------- Error Message 
> --------------------------------------------------------------
> [4]PETSC ERROR: Error in external library
> [4]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: 
> INFOG(1)=-9, INFO(2)=6
> 
> [4]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for 
> trouble shooting.
> [4]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [4]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed 
> Aug 25 11:18:48 2021
> [4]PETSC ERROR: Configure options 
> ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs 
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort 
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" 
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex 
> --with-precision=double --with-debugging=0 --with-openmp 
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl 
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  --download-mumps --download-scalapack --download-cmake 
> PETSC_ARCH=arch-omp_nodbug
> [4]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [4]PETSC ERROR: #2 MatLUFactorNumeric() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [4]PETSC ERROR: #3 PCSetUp_LU() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [4]PETSC ERROR: #4 PCSetUp() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [4]PETSC ERROR: #5 KSPSetUp() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [4]PETSC ERROR: #6 STSetUp_Sinvert() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [4]PETSC ERROR: #7 STSetUp() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [4]PETSC ERROR: #8 EPSSetUp() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [4]PETSC ERROR: #9 EPSSolve() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [5]PETSC ERROR: --------------------- Error Message 
> --------------------------------------------------------------
> [5]PETSC ERROR: Error in external library
> [5]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: 
> INFOG(1)=-9, INFO(2)=6
> 
> [5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for 
> trouble shooting.
> [5]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [5]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed 
> Aug 25 11:18:48 2021
> [5]PETSC ERROR: Configure options 
> ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs 
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort 
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" 
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex 
> --with-precision=double --with-debugging=0 --with-openmp 
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl 
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  --download-mumps --download-scalapack --download-cmake 
> PETSC_ARCH=arch-omp_nodbug
> [5]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [5]PETSC ERROR: #2 MatLUFactorNumeric() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [5]PETSC ERROR: #3 PCSetUp_LU() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [5]PETSC ERROR: #4 PCSetUp() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [5]PETSC ERROR: #5 KSPSetUp() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [5]PETSC ERROR: #6 STSetUp_Sinvert() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [5]PETSC ERROR: #7 STSetUp() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [5]PETSC ERROR: #8 EPSSetUp() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [5]PETSC ERROR: #9 EPSSolve() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [6]PETSC ERROR: --------------------- Error Message 
> --------------------------------------------------------------
> [6]PETSC ERROR: Error in external library
> [6]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: 
> INFOG(1)=-9, INFO(2)=21891045
> 
> [6]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for 
> trouble shooting.
> [6]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [6]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed 
> Aug 25 11:18:48 2021
> [6]PETSC ERROR: Configure options 
> ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs 
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort 
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" 
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex 
> --with-precision=double --with-debugging=0 --with-openmp 
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl 
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  --download-mumps --download-scalapack --download-cmake 
> PETSC_ARCH=arch-omp_nodbug
> [6]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [6]PETSC ERROR: #2 MatLUFactorNumeric() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [6]PETSC ERROR: #3 PCSetUp_LU() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [6]PETSC ERROR: #4 PCSetUp() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [6]PETSC ERROR: #5 KSPSetUp() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [6]PETSC ERROR: #6 STSetUp_Sinvert() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [6]PETSC ERROR: #7 STSetUp() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [6]PETSC ERROR: #8 EPSSetUp() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [6]PETSC ERROR: #9 EPSSolve() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [7]PETSC ERROR: --------------------- Error Message 
> --------------------------------------------------------------
> [7]PETSC ERROR: Error in external library
> [7]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: 
> INFOG(1)=-9, INFO(2)=21841925
> 
> [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for 
> trouble shooting.
> [7]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [7]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed 
> Aug 25 11:18:48 2021
> [7]PETSC ERROR: Configure options 
> ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs 
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort 
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" 
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex 
> --with-precision=double --with-debugging=0 --with-openmp 
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl 
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  --download-mumps --download-scalapack --download-cmake 
> PETSC_ARCH=arch-omp_nodbug
> [7]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [7]PETSC ERROR: #2 MatLUFactorNumeric() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [7]PETSC ERROR: #3 PCSetUp_LU() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [7]PETSC ERROR: #4 PCSetUp() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [7]PETSC ERROR: #5 KSPSetUp() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [7]PETSC ERROR: #6 STSetUp_Sinvert() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [7]PETSC ERROR: #7 STSetUp() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [7]PETSC ERROR: #8 EPSSetUp() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [7]PETSC ERROR: #9 EPSSolve() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [0]PETSC ERROR: #2 MatLUFactorNumeric() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [0]PETSC ERROR: #3 PCSetUp_LU() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [0]PETSC ERROR: #4 PCSetUp() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [0]PETSC ERROR: #5 KSPSetUp() at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [0]PETSC ERROR: #6 STSetUp_Sinvert() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [0]PETSC ERROR: #7 STSetUp() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [0]PETSC ERROR: #8 EPSSetUp() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [0]PETSC ERROR: #9 EPSSolve() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [1]PETSC ERROR: #6 STSetUp_Sinvert() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [1]PETSC ERROR: #7 STSetUp() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [1]PETSC ERROR: #8 EPSSetUp() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [1]PETSC ERROR: #9 EPSSolve() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [2]PETSC ERROR: #6 STSetUp_Sinvert() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [2]PETSC ERROR: #7 STSetUp() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [2]PETSC ERROR: #8 EPSSetUp() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [2]PETSC ERROR: #9 EPSSolve() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [3]PETSC ERROR: #9 EPSSolve() at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> 
> 
> 
> [1] output
> 
> Continuing.
> [New Thread 0x7f6f5b2d2780 (LWP 794037)]
> [New Thread 0x7f6f5aad0800 (LWP 794040)]
> [New Thread 0x7f6f5a2ce880 (LWP 794041)]
> ^C
> Thread 1 "my.exe" received signal SIGINT, Interrupt.
> 0x00007f72904927b0 in ofi_fastlock_release_noop ()
>    from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
> (gdb) where
> #0  0x00007f72904927b0 in ofi_fastlock_release_noop ()
>    from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
> #1  0x00007f729049354b in ofi_cq_readfrom ()
>    from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
> #2  0x00007f728ffe8f0e in rxm_ep_do_progress ()
>    from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
> #3  0x00007f728ffe2b7d in rxm_ep_recv_common_flags ()
>    from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
> #4  0x00007f728ffe30f8 in rxm_ep_trecvmsg ()
>    from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
> #5  0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392,
>     comm=1, flag=0x0, status=0xffffffffffffffff)
>     at /usr/include/rdma/fi_tagged.h:109
> #6  0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0,
>     v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90)
>     at ../../src/binding/fortran/mpif_h/iprobef.c:276
> #7  0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0,
>     blocking=<error reading variable: Cannot access memory at address 0x1>,
>     irecv=<error reading variable: Cannot access memory at address 0x0>, 
> message_received=<error reading variable: Cannot access memory at address 
> 0xffffffffffffffff>, msgsou=1, msgtag=-1, status=..., bufr=..., 
> lbufr=320782504, lbufr_bytes=1283130016, procnode_steps=..., posfac=1, 
> iwpos=1, iwposcb=292535, iptrlu=2039063816, lrlu=2039063816, 
> lrlus=2039063816, n=50400, iw=..., liw=292563, a=..., la=2611636796, 
> ptrist=..., ptlust=..., ptrfac=..., ptrast=..., step=..., pimaster=..., 
> pamaster=..., nstk_s=..., comp=0, iflag=0, ierror=0, comm=-1006632958, 
> nbprocfils=..., ipool=..., lpool=5, leaf=1, nbfin=4, myid=1, slavef=4, 
> root=<error reading variable: value of type `zmumps_root_struc' requires 
> 766016 bytes, which is more than max-value-size>, opassw=0, opeliw=0, 
> itloc=..., rhs_mumps=..., fils=..., dad=..., ptrarw=..., ptraiw=..., 
> intarr=..., dblarr=..., icntl=..., keep=..., keep8=..., dkeep=..., nd=..., 
> frere=..., lptrar=50400, nelt=1, frtptr=..., frtelt=..., istep_to_iniv2=..., 
> tab_pos_in_pere=..., stack_right_authorized=4294967295, lrgroups=...) at 
> zfac_process_message.F:730
> #8  0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=..., 
> liw=<error reading variable: Cannot access memory at address 0x1>, a=..., 
> la=<error reading variable: Cannot access memory at address 
> 0xffffffffffffffff>, nstk_steps=..., nbprocfils=..., nd=..., fils=..., 
> step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., 
> tab_pos_in_pere=..., nstepsdone=1690339657, opass=<error reading variable: 
> Cannot access memory at address 0x5>, opeli=<error reading variable: Cannot 
> access memory at address 0x0>, nelva=50400, comp=259581, maxfrt=-1889517576, 
> nmaxnpiv=-1195144887, ntotpv=<error reading variable: Cannot access memory at 
> address 0x2>, noffnegpv=<error reading variable: Cannot access memory at 
> address 0x0>, nb22t1=<error reading variable: Cannot access memory at address 
> 0x0>, nb22t2=<error reading variable: Cannot access memory at address 0x0>, 
> nbtiny=<error reading variable: Cannot access memory at address 0x0>, 
> det_exp=<error reading variable: Cannot access memory at address 0x0>, 
> det_mant=<error reading variable: Cannot access memory at address 0x0>, 
> det_sign=<error reading variable: Cannot access memory at address 0x0>, 
> ptrist=..., ptrast=..., pimaster=..., pamaster=..., ptrarw=..., ptraiw=..., 
> itloc=..., rhs_mumps=..., ipool=..., lpool=<error reading variable: Cannot 
> access memory at address 0x0>, rinfo=<error reading variable: Cannot access 
> memory at address 0x0>, posfac=<error reading variable: Cannot access memory 
> at address 0x0>, iwpos=<error reading variable: Cannot access memory at 
> address 0x0>, lrlu=<error reading variable: Cannot access memory at address 
> 0x0>, iptrlu=<error reading variable: Cannot access memory at address 0x0>, 
> lrlus=<error reading variable: Cannot access memory at address 0x0>, 
> leaf=<error reading variable: Cannot access memory at address 0x0>, 
> nbroot=<error reading variable: Cannot access memory at address 0x0>, 
> nbrtot=<error reading variable: Cannot access memory at address 0x0>, 
> uu=<error reading variable: Cannot access memory at address 0x0>, 
> icntl=<error reading variable: Cannot access memory at address 0x0>, 
> ptlust=..., ptrfac=..., info=<error reading variable: Cannot access memory at 
> address 0x0>, keep=<error reading variable: Cannot access memory at address 
> 0x3ff0000000000000>, keep8=<error reading variable: Cannot access memory at 
> address 0x0>, procnode_steps=..., slavef=<error reading variable: Cannot 
> access memory at address 0x4ffffffff>, myid=<error reading variable: Cannot 
> access memory at address 0xffffffff>, comm_nodes=<error reading variable: 
> Cannot access memory at address 0x0>, myid_nodes=<error reading variable: 
> Cannot access memory at address 0x0>, bufr=..., lbufr=0, lbufr_bytes=5, 
> intarr=..., dblarr=..., root=..., perm=..., nelt=0, frtptr=..., frtelt=..., 
> lptrar=3, comm_load=-30, ass_irecv=30, seuil=2.1219957909652723e-314, 
> seuil_ldlt_niv2=4.2439866417681519e-314, mem_distrib=..., ne=..., dkeep=..., 
> pivnul_list=..., lpn_list=0, lrgroups=...) at zfac_par_m.F:182
> #9  0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., la=<error 
> reading variable: Cannot access memory at address 0x1>, liw=<error reading 
> variable: Cannot access memory at address 0x0>, sym_perm=..., na=..., lna=1, 
> ne_steps=..., nfsiz=..., fils=..., step=..., frere=..., dad=..., cand=..., 
> istep_to_iniv2=..., tab_pos_in_pere=..., ptrar=..., ldptrar=<error reading 
> variable: Cannot access memory at address 0x0>, ptrist=..., ptlust_s=..., 
> ptrfac=..., iw1=..., iw2=..., itloc=..., rhs_mumps=..., pool=..., 
> lpool=-1889529280, cntl1=-5.3576889161551131e-255, icntl=<error reading 
> variable: Cannot access memory at address 0x25344>, info=..., rinfo=..., 
> keep=..., keep8=..., procnode_steps=..., slavef=-1889504640, 
> comm_nodes=-2048052411, myid=<error reading variable: Cannot access memory at 
> address 0x81160>, myid_nodes=-1683330500, bufr=..., lbufr=<error reading 
> variable: Cannot access memory at address 0x11db4c>, lbufr_bytes=<error 
> reading variable: Cannot access memory at address 0xc4e0>, zmumps_lbuf=<error 
> reading variable: Cannot access memory at address 0x4>, intarr=..., 
> dblarr=..., root=<error reading variable: Cannot access memory at address 
> 0x11dbec>, nelt=<error reading variable: Cannot access memory at address 
> 0x3>, frtptr=..., frtelt=..., comm_load=<error reading variable: Cannot 
> access memory at address 0x0>, ass_irecv=<error reading variable: Cannot 
> access memory at address 0x0>, seuil=<error reading variable: Cannot access 
> memory at address 0x0>, seuil_ldlt_niv2=<error reading variable: Cannot 
> access memory at address 0x0>, mem_distrib=<error reading variable: Cannot 
> access memory at address 0x0>, dkeep=<error reading variable: Cannot access 
> memory at address 0x0>, pivnul_list=..., lpn_list=<error reading variable: 
> Cannot access memory at address 0x0>, lrgroups=...) at zfac_b.F:243
> #10 0x00007f7308610ff7 in zmumps_fac_driver (id=<error reading variable: 
> value of type `zmumps_struc' requires 386095520 bytes, which is more than 
> max-value-size>) at zfac_driver.F:2421
> #11 0x00007f7308569256 in zmumps (id=<error reading variable: value of type 
> `zmumps_struc' requires 386095520 bytes, which is more than max-value-size>) 
> at zmumps_driver.F:1883
> #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, par=<error reading 
> variable: Cannot access memory at address 0x1>, comm_f77=<error reading 
> variable: Cannot access memory at address 0x0>, n=<error reading variable: 
> Cannot access memory at address 0xffffffffffffffff>, nblk=1, icntl=..., 
> cntl=..., keep=..., dkeep=..., keep8=..., nz=0, nnz=0, irn=..., irnhere=0, 
> jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=0, nnz_loc=304384739, irn_loc=..., 
> irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., a_lochere=1, nelt=0, 
> eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, a_elt=..., a_elthere=0, 
> blkptr=..., blkptrhere=0, blkvar=..., blkvarhere=0, perm_in=..., 
> perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, info=..., 
> rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, size_schur=0, 
> listvar_schur=..., listvar_schurhere=0, schur=..., schurhere=0, wk_user=..., 
> wk_userhere=0, colsca=..., colscahere=0, rowsca=..., rowscahere=0, 
> instance_number=1, nrhs=1, lrhs=0, lredrhs=0, rhs_sparse=..., 
> rhs_sparsehere=0, sol_loc=..., sol_lochere=0, rhs_loc=..., rhs_lochere=0, 
> irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0, 
> isol_loc=..., isol_lochere=0, irhs_loc=..., irhs_lochere=0, nz_rhs=0, 
> lsol_loc=0, lrhs_loc=0, nloc_rhs=0, schur_mloc=0, schur_nloc=0, schur_lld=0, 
> mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=..., 
> write_problem=..., save_dir=..., save_prefix=..., tmpdirlen=20, prefixlen=20, 
> write_problemlen=20, save_dirlen=20, save_prefixlen=20, metis_options=...) at 
> zmumps_f77.F:289
> #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485
> #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248, 
> A=0x7ffda7afdae0, info=0x1) at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683
> #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248, 
> mat=0x7ffda7afdae0, info=0x1) at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at 
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> #20 0x00007f7309130462 in STSetUp (st=0xd70248) at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248, __ierr=0x7ffda7afdae0) at 
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85
> #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=..., 
> b_pet=..., jthisone=<error reading variable: Cannot access memory at address 
> 0x1>, isize=<error reading variable: Cannot access memory at address 0x0>) at 
> small_slepc_example_program.F:322
> #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549
> #26 0x00000000004023f2 in main ()
> #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0 <main>, argc=14, 
> argv=0x7ffda7b024e8, init=<optimized out>, fini=<optimized out>, 
> rtld_fini=<optimized out>, stack_end=0x7ffda7b024d8) at 
> ../csu/libc-start.c:308
> #28 0x00000000004022fe in _start ()
> 
> From: Matthew Knepley <[email protected]>
> Sent: Tuesday, August 24, 2021 3:59 PM
> To: dazza simplythebest <[email protected]>
> Cc: Jose E. Roman <[email protected]>; PETSc <[email protected]>
> Subject: Re: [petsc-users] Improving efficiency of slepc usage
>  
> On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest <[email protected]> 
> wrote:
> 
> Dear Matthew and Jose,
>    Apologies for the delayed reply; I had a couple of unforeseen days off 
> this week.
> Firstly, regarding Jose's suggestion re: MUMPS, the program is already using 
> MUMPS to solve linear systems (the code uses a distributed MPI matrix to 
> solve the generalised non-Hermitian complex problem).
> 
> I have tried the gdb debugger as per Matthew's suggestion.
> Just to note, in case someone else is following this: at first it didn't 
> work (couldn't 'attach'), but after some googling I found a tip suggesting 
> the command
> echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
> which seemed to get it working.
> 
> I then first ran the debugger on the small matrix case that worked.
> That stopped in gdb almost immediately after starting execution, 
> with a report regarding 'nanosleep.c':
> ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory.
> However, issuing the 'cont' command set the program running again, and it ran 
> through to the end of the execution without any problems and with 
> correct-looking results, so I am guessing this error is not particularly 
> important.
> 
> We do that on purpose when the debugger starts up. Typing 'cont' is correct.
>  
> I then tried the same debugging procedure on the large matrix case that fails.
> The code again stopped almost immediately after the start of execution with 
> the same nanosleep message as before, and I was able to set the program 
> running again with 'cont' (see full output below). I was running the code 
> with 4 MPI processes, and so 4 gdb windows appeared. Thereafter the code ran 
> for some time until completing the matrix construction, and then one of the 
> gdb process windows printed the message
> Program terminated with signal SIGKILL, Killed.
> The program no longer exists.
> I then typed 'where' into this terminal but just received the message
> No stack.
> 
> I have only seen this behavior one other time, and it was with Fortran. 
> Fortran allows you to declare really big arrays on the stack by putting them 
> at the start of a function (rather than allocating them with F90 ALLOCATE). 
> When one of those arrays exceeded the stack space, I got this kind of error, 
> where everything is destroyed rather than just stopping. Could it be that you 
> have a large structure on the stack?
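Matt's hypothesis can be checked by comparing the size of a suspect automatic array with the process's soft stack limit. The C sketch below is illustrative only: the array dimensions are hypothetical, and it assumes a POSIX system, where getrlimit(RLIMIT_STACK, ...) reports in bytes the same soft limit that ulimit -s shows in kilobytes. It does not cover OpenMP worker threads, whose stack size is set separately (e.g. via OMP_STACKSIZE).

```c
#include <assert.h>
#include <complex.h>
#include <limits.h>
#include <sys/resource.h>

/* Bytes occupied by a hypothetical automatic array of rows*cols
 * double-complex entries, e.g. a Fortran COMPLEX*16 work(rows, cols)
 * declared at the top of a subroutine (such arrays may live on the stack). */
static unsigned long long automatic_array_bytes(unsigned long long rows,
                                                unsigned long long cols)
{
    const unsigned long long elem = sizeof(double complex);
    /* Guard against overflow of rows * cols * elem. */
    assert(cols == 0 || rows <= ULLONG_MAX / cols / elem);
    return rows * cols * elem;
}

/* Returns 1 if `bytes` is below the current soft stack limit, 0 if it is
 * not (or if the limit cannot be queried). An array close to the limit can
 * still overflow, since other stack frames also use stack space. */
static int fits_in_stack_limit(unsigned long long bytes)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return 0;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return bytes < (unsigned long long)rl.rlim_cur;
}
```

With the common 8192 KB default, a length-50400 complex vector (806,400 bytes) fits comfortably, whereas an automatic 50400 x 50400 complex array (about 40.6 GB) never could. In Fortran the usual remedies are to declare such arrays ALLOCATABLE, to compile with the Intel compiler's -heap-arrays option, or to raise the limit with ulimit -s.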
> 
> Second, you can at least look at the stack for the processes that were not 
> killed. Type Ctrl-C, which should give you the gdb prompt, and then type 
> "where".
> 
>   Thanks,
> 
>       Matt
>  
> The other gdb windows basically seemed to be left in limbo until I issued the 
> 'quit' command in the SIGKILL window, and then they vanished.
> 
> I paste the full output from the gdb window that recorded the SIGKILL below.
> I guess it is necessary to somehow work out where the SIGKILL originates from?
> 
>  Thanks once again,
>                          Dan.
> 
> 
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
> GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
> Copyright (C) 2020 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
>     <http://www.gnu.org/software/gdb/documentation/>.
> 
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from ./stab1.exe...
> Attaching to program: 
> /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, 
> process 675919
> Reading symbols from 
> /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15...
> Reading symbols from 
> /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15...
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so...
> (No debugging symbols found in 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so...
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so...
> (No debugging symbols found in 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so)
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so...
> (No debugging symbols found in 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so...
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg...
> Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so...
> Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...
> Reading symbols from 
> /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug...
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6...
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12...
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12...
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg...
> Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so...
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5...
> (No debugging symbols found in 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5)
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so...
> (No debugging symbols found in 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so)
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so...
> (No debugging symbols found in 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so)
> Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so...
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so...
> (No debugging symbols found in 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so)
> Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...
> (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1)
> Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0...
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0)
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so...
> (No debugging symbols found in 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so)
> Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so...
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so...
> (No debugging symbols found in 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so)
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5...
> (No debugging symbols found in 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5)
> Reading symbols from /lib64/ld-linux-x86-64.so.2...
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so...
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1...
> (No debugging symbols found in 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1)
> Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so...
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so)
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so...
> (No debugging symbols found in 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so)
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so...
> (No debugging symbols found in 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so)
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so...
> (No debugging symbols found in 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so)
> Reading symbols from 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so...
> (No debugging symbols found in 
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so)
> Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2...
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2)
> 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=<optimized out>, 
> clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffdc641a9a0, 
> rem=rem@entry=0x7ffdc641a9a0) at 
> ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
> 78      ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or 
> directory.
> (gdb) cont
> Continuing.
> [New Thread 0x7f9e49c02780 (LWP 676559)]
> [New Thread 0x7f9e49400800 (LWP 676560)]
> [New Thread 0x7f9e48bfe880 (LWP 676562)]
> [Thread 0x7f9e48bfe880 (LWP 676562) exited]
> [Thread 0x7f9e49400800 (LWP 676560) exited]
> [Thread 0x7f9e49c02780 (LWP 676559) exited]
> 
> Program terminated with signal SIGKILL, Killed.
> The program no longer exists.
> (gdb) where
> No stack.
> 
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
> - - - - - - - - - - - 
>   
> From: Matthew Knepley <[email protected]>
> Sent: Friday, August 20, 2021 2:12 PM
> To: dazza simplythebest <[email protected]>
> Cc: Jose E. Roman <[email protected]>; PETSc <[email protected]>
> Subject: Re: [petsc-users] Improving efficiency of slepc usage
>  
> On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest <[email protected]> 
> wrote:
> Dear Jose,
>     Many thanks for your response; I have been investigating this issue with 
> a few more calculations today, hence the slightly delayed response.
> 
> The problem is actually derived from a fluid dynamics problem, so to allow 
> easier exploration I first reduced the resolution of the underlying fluid 
> solver while keeping all the physical parameters the same, i.e. I would get 
> a smaller matrix that should describe the same physical problem as the 
> original larger matrix, but to lower accuracy.
> 
> Results
> 
> Small matrix (N = 21168): everything good!
> This converged when using the -eps_largest_real approach (taking 92 
> iterations for nev = 10, tol = 5.0000E-06 and ncv = 300), and also when using 
> the shift-invert approach, converging very impressively in a single 
> iteration! Interestingly, it did this both for a non-zero -eps_target and 
> for a zero -eps_target.
> 
> Large matrix (N = 50400): works for -eps_largest_real, fails for -st_type 
> sinvert
> I have double-checked that the code does run properly when we use the 
> -eps_largest_real option; indeed I ran it with a small nev and a large 
> tolerance (-eps_nev 4, -eps_tol 5.0e-4, -eps_ncv 300), and with these 
> parameters convergence was obtained in 164 iterations, which took 6 hours on 
> the machine I was running it on. Furthermore, the eigenvalues seem to be 
> ballpark correct: for this larger, higher-resolution case (although with a 
> looser SLEPc tolerance) we obtain
> 1789.56816314173 -4724.51319554773i
> as the eigenvalue with largest real part, while the smaller matrix (same 
> physical problem at lower resolution) gave
> 1831.11845726501 -4787.54519511345i
> for this eigenvalue, so the agreement is in line with expectations.
> 
> Unfortunately, the code still crashes when I try to do shift-invert for the 
> large matrix case, whether or not I use a non-zero -eps_target. For reference, 
> this is the command line used:
> -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt
> To be precise, the code crashes soon after calling EPSSolve (it successfully 
> calls MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and 
> EPSSetFromOptions). By 'crashes' I mean that I do not get any error messages 
> from SLEPc/PETSc, and do not even get the 'EPS Object: 16 MPI processes' 
> message; I simply get an MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message 
> as soon as EPSSolve is called.
> 
> Hi Dan,
> 
> It would help track this error down if we had a stack trace. You can get a 
> stack trace from the debugger. You run with
> 
>   -start_in_debugger
> 
> which should launch the debugger (usually), and then type
> 
>   cont
> 
> to continue, and then
> 
>   where
> 
> to get the stack trace when it crashes (or 'bt' in lldb).
> 
>   Thanks,
> 
>      Matt
>  
> Do you have any ideas as to why this larger matrix case should fail when 
> using shift-invert but succeed when using -eps_largest_real? The fact that 
> the program works and produces correct results when using the 
> -eps_largest_real option suggests that there is probably nothing wrong with 
> the specification of the problem or the matrices. It is strange that there 
> is no error message from SLEPc/PETSc; the only idea I have at the moment is 
> that perhaps the maximum memory has been exceeded, which could cause such a 
> sudden shutdown. For reference, when running the large matrix case with the 
> -eps_largest_real option I am using about 36 GB of the 148 GB available on 
> this machine. Does the shift-invert approach require substantially more 
> memory, for example?
> 
>   I would be very grateful if you have any suggestions to resolve this issue 
> or even ways to clarify it further; the performance I have seen with 
> shift-invert for the small matrix is so impressive that it would be great to 
> get it working for the full-size problem.
> 
>    Many thanks and best wishes,
>                                   Dan.
> 
> 
> 
> From: Jose E. Roman <[email protected]>
> Sent: Thursday, August 19, 2021 7:58 AM
> To: dazza simplythebest <[email protected]>
> Cc: PETSc <[email protected]>
> Subject: Re: [petsc-users] Improving efficiency of slepc usage
>  
> In A), convergence may be slow, especially if the wanted eigenvalues have 
> small magnitude. I would not say 600 iterations is a lot; you probably need 
> many more. In most cases, approach B) is better because it improves 
> convergence of eigenvalues close to the target, but it requires prior 
> knowledge of your spectrum distribution in order to choose an appropriate 
> target.
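Why B) improves convergence near the target can be seen from the shift-and-invert transformation selected by -st_type sinvert. For Ax = lambda Bx and target sigma, the operator (A - sigma B)^{-1} B has the same eigenvectors and eigenvalues theta = 1/(lambda - sigma), so the eigenvalues lambda closest to sigma become the largest-magnitude theta, which Krylov methods find first. The C sketch below only illustrates this scalar mapping (helper names hypothetical); SLEPc performs the transformation internally, applying the inverse through a linear solve, which in this thread is the MUMPS factorization.

```c
#include <assert.h>
#include <complex.h>

/* Shift-and-invert map for A x = lambda B x with target sigma:
 * (A - sigma B)^{-1} B x = theta x, where theta = 1 / (lambda - sigma). */
static double complex sinvert_theta(double complex lambda, double complex sigma)
{
    assert(lambda != sigma); /* lambda == sigma makes A - sigma B singular */
    return 1.0 / (lambda - sigma);
}

/* Inverse map: recover the original eigenvalue lambda from a computed theta. */
static double complex sinvert_lambda(double complex theta, double complex sigma)
{
    assert(theta != 0.0);
    return sigma + 1.0 / theta;
}
```

For example, with sigma = 0.1, a made-up eigenvalue lambda = 0.2 maps to theta = 10, while lambda = 100 maps to theta of about 0.01, so the eigenvalue near the target dominates after the transformation.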
> 
> In B), what do you mean by 'it crashes'? If you get an error about the 
> factorization, it means that your A matrix is singular. In that case, try 
> using a nonzero target, e.g. -eps_target 0.1
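The advice about a zero target follows from the fact that shift-invert factors A - sigma B: with sigma = 0 this is just A, so a singular A gives a zero pivot, whereas a nonzero sigma that is not itself an eigenvalue yields a nonsingular matrix. The C sketch below makes this concrete with a hypothetical 2x2 pair (not the matrices from this thread) by evaluating det(A - sigma B).

```c
#include <assert.h>
#include <complex.h>

/* Determinant of the shifted 2x2 matrix A - sigma*B. A zero determinant
 * means an LU factorization of A - sigma*B would hit a zero pivot
 * (in exact arithmetic, even with pivoting). */
static double complex shifted_det2(double complex A[2][2],
                                   double complex B[2][2],
                                   double complex sigma)
{
    double complex m00 = A[0][0] - sigma * B[0][0];
    double complex m01 = A[0][1] - sigma * B[0][1];
    double complex m10 = A[1][0] - sigma * B[1][0];
    double complex m11 = A[1][1] - sigma * B[1][1];
    return m00 * m11 - m01 * m10;
}
```

With A = [[1, 0], [0, 0]] (singular) and B = I, the determinant is 0 at sigma = 0 but (1 - 0.1)(-0.1) = -0.09 at sigma = 0.1.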
> 
> Jose
> 
> 
> > El 19 ago 2021, a las 7:12, dazza simplythebest <[email protected]> 
> > escribió:
> > 
> > Dear All,
> >             I am planning on using SLEPc to do a large number of eigenvalue 
> > calculations of a generalized eigenvalue problem, called from a program 
> > written in Fortran using MPI. Thus far I have successfully installed the 
> > SLEPc/PETSc software, both locally and on a cluster, and on smaller test 
> > problems everything is working well; the matrices are efficiently and 
> > correctly constructed and SLEPc returns the correct spectrum. I am now 
> > starting to move towards solving the full-size 'production run' problems, 
> > and would appreciate some general advice on how to improve the solver's 
> > performance.
> > 
> > In particular, I am currently trying to solve the problem Ax = lambda Bx, 
> > whose matrices are of size 50000 (this is the smallest 'production run' 
> > problem I will be tackling) and are complex and non-Hermitian. In most cases 
> > I aim to find the eigenvalues with the largest real part, although in other 
> > cases I will also be interested in finding the eigenvalues whose real part 
> > is close to zero.
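The two selection criteria described here can be stated as orderings on a set of computed eigenvalues: descending real part, and ascending |Re lambda|. The C sketch below (hypothetical helper names, invented sample values) sorts an array of eigenvalues with qsort; it only illustrates the criteria and is not how SLEPc orders its results internally, where the criterion is requested through options such as -eps_largest_real.

```c
#include <assert.h>
#include <complex.h>
#include <math.h>
#include <stdlib.h>

/* qsort comparator: descending real part ("largest real part" first). */
static int by_largest_real(const void *pa, const void *pb)
{
    double a = creal(*(const double complex *)pa);
    double b = creal(*(const double complex *)pb);
    return (a < b) - (a > b);
}

/* qsort comparator: ascending |Re(lambda)| ("real part closest to zero" first). */
static int by_real_closest_to_zero(const void *pa, const void *pb)
{
    double a = fabs(creal(*(const double complex *)pa));
    double b = fabs(creal(*(const double complex *)pb));
    return (a > b) - (a < b);
}

/* Sort n eigenvalues in place according to the given criterion. */
static void sort_eigenvalues(double complex *vals, size_t n,
                             int (*cmp)(const void *, const void *))
{
    assert(vals != NULL || n == 0);
    qsort(vals, n, sizeof *vals, cmp);
}
```

For instance, sorting {1 + 2i, -0.5, 3 - i} by largest real part gives 3 - i first, while the closest-to-zero criterion puts -0.5 first.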
> > 
> > A)
> > Calling SLEPc's EPS solver with the following options:
> > 
> > -eps_nev 10   -log_view -eps_view -eps_max_it 600 -eps_ncv 140  -eps_tol 
> > 5.0e-6  -eps_largest_real -eps_monitor :monitor_output.txt
> > 
> > 
> > led to the code running successfully, but failing to find any eigenvalues 
> > within the maximum of 600 iterations (examining the monitor output, it did 
> > appear to be approaching convergence very slowly).
> > 
> > B)
> > On the same problem I have also tried a shift-invert transformation using 
> > the options
> > 
> > -eps_nev 10    -eps_ncv 140    -eps_target 0.0+0.0i  -st_type sinvert
> > 
> > In this case the code crashed at the point where it tried to call SLEPc, so 
> > perhaps I have specified these options incorrectly?
> > 
> > 
> > Does anyone have any suggestions as to how to improve this performance (or 
> > find out more about the problem)?
> > In the case of A), I can see from watching the SLEPc videos that increasing 
> > ncv may help, but I am wondering, since 600 is a large number of iterations, 
> > whether there may be something else going on, e.g. perhaps some alternative 
> > preconditioner may help?
> > In the case of B), I guess there must be some mistake in these command line 
> > options?
> > Again, any advice will be greatly appreciated.
> >      Best wishes,  Dan.
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/
> 
