Hello, Dan,
You might want to have a look at the manual at https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html
Thanks.
--Junchao Zhang
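A note on reading the MUMPS error lines in the [2] output quoted further down this thread. The sketch below is only a minimal illustration, not anything taken from PETSc or MUMPS themselves. It assumes, from my reading of the MUMPS users' guide, that with INFOG(1)=-9 a positive INFO(2) on a failing rank is the number of workspace entries that were missing when the error was raised, that a negative INFO(2) is the same quantity in millions, and that on the remaining ranks INFO(2) is probably the rank on which the error was detected. The 16 bytes per entry is a further assumption based on the complex, double-precision build shown in the configure options. The function names and the "smaller than the number of ranks" heuristic are made up for illustration.

```python
import re

# Matches (unwrapped) lines such as:
# [6]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21891045
MUMPS_ERROR = re.compile(
    r"\[(\d+)\]PETSC ERROR: Error reported by MUMPS.*?"
    r"INFOG\(1\)=(-?\d+), INFO\(2\)=(-?\d+)"
)

# Assumption: complex double precision scalars, i.e. 2 x 8 bytes per entry.
BYTES_PER_ENTRY = 16


def parse_mumps_errors(log_text):
    """Return {rank: (infog1, info2)} for every MUMPS error line found."""
    return {
        int(rank): (int(infog1), int(info2))
        for rank, infog1, info2 in MUMPS_ERROR.findall(log_text)
    }


def missing_workspace_mib(info2):
    """Convert a count of missing workspace entries to MiB.

    Assumption: a negative INFO(2) gives the count in millions of entries.
    """
    entries = info2 if info2 >= 0 else -info2 * 1_000_000
    return entries * BYTES_PER_ENTRY / 2**20


if __name__ == "__main__":
    log = """
[0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6
[6]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21891045
[7]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21841925
"""
    nranks = 8  # the failing run in this thread used 8 MPI processes
    for rank, (infog1, info2) in sorted(parse_mumps_errors(log).items()):
        if 0 <= info2 < nranks:
            # Heuristic only: a value this small is more likely a rank id.
            print(f"rank {rank}: INFO(2)={info2}, probably the rank that hit the error")
        else:
            print(f"rank {rank}: short by roughly {missing_workspace_mib(info2):.0f} MiB")
```

The numbers this gives are only the shortfall at the moment of failure on each rank, not the total workspace MUMPS would need, so they are a rough guide to how far -st_mat_mumps_icntl_14 has to be raised rather than an exact target.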
On Thu, Aug 26, 2021 at 7:32 AM dazza simplythebest <[email protected]> wrote:
> Dear Jose and Matthew,
> Many thanks for your assistance, this would seem to explain what the problem was.
> So judging by this test case, there seems to be a memory vs computational time tradeoff involved in choosing whether to shift-invert or not; the shift-invert will greatly reduce the number of required iterations, but will require a higher memory cost?
> I have been trying a few values of -st_mat_mumps_icntl_14 (and also the alternative -st_mat_mumps_icntl_23) today but have not yet been able to select one that fits onto the workstation I am using (although it seems that setting these parameters at least guarantees that an error message is generated).
>
> Thus I will probably need to reduce the number of MPI processes (and thereby reduce the memory requirement). In this regard the MUMPS documentation suggests that a hybrid MPI-OpenMP approach is optimum for their software, whereas I remember reading somewhere else that OpenMP threading was not a good choice for using PETSc; would you have any general advice on this? I was thinking that maybe a version of SLEPc / PETSc compiled against OpenMP, with the number of threads set appropriately but without explicit OpenMP directives in the user's code, may be the way forward? That way PETSc will (?) just ignore the threading, whereas threading will be available to MUMPS when execution is passed to those routines?
>
> Many thanks once again,
> Dan.
>
> ------------------------------
> *From:* Jose E. Roman <[email protected]>
> *Sent:* Wednesday, August 25, 2021 1:40 PM
> *To:* dazza simplythebest <[email protected]>
> *Cc:* PETSc <[email protected]>
> *Subject:* Re: [petsc-users] Improving efficiency of slepc usage
>
> MUMPS documentation (section 8) indicates that the meaning of INFOG(1)=-9 is insufficient workspace.
> Try running with
>   -st_mat_mumps_icntl_14 <percentage>
> where <percentage> is the percentage by which you want to increase the workspace, e.g. 50 or 100 or more.
>
> See ex43.c for an example showing how to set this option in code.
>
> Jose
>
> > On 25 Aug 2021, at 14:11, dazza simplythebest <[email protected]> wrote:
> >
> > From: dazza simplythebest <[email protected]>
> > Sent: Wednesday, August 25, 2021 12:08 PM
> > To: Matthew Knepley <[email protected]>
> > Subject: Re: [petsc-users] Improving efficiency of slepc usage
> >
> > Dear Matthew and Jose,
> > I have derived a smaller program from the original program by constructing matrices of the same size, but filling their entries randomly instead of computing the correct fluid dynamics values, just to allow faster experimentation. This modified code's behaviour seems to be similar, with the code again failing for the large matrix case with the SIGKILL error, so I first report results from that code here. Firstly I can confirm that I am using Fortran, and I am compiling with the Intel compiler, which it seems places automatic arrays on the stack. The stack size, as determined by ulimit -a, is reported to be:
> > stack size (kbytes, -s) 8192
> >
> > [1] Okay, so I followed your suggestion and used ctrl-c followed by 'where' in one of the non-SIGKILL gdb windows. I have pasted the output into the bottom of this email (see [1] output) - it does look like the problem occurs somewhere in the call to the MUMPS solver?
> >
> > [2] I have also today gained access to another workstation, and so have tried running the (original) code on that machine. This new machine has two (more powerful) CPU nodes and a larger memory (both machines feature Intel Xeon processors).
> > On this new machine the large matrix case again failed with the familiar SIGKILL report when I used 16 or 12 MPI processes, ran to the end w/out error for 4 or 6 MPI processes, and failed but with a PETSC error message when I used 8 MPI processes, which I have pasted below (see [2] output). Does this point to some sort of resource demand that exceeds some limit as the number of MPI processes increases?
> >
> > Many thanks once again,
> > Dan.
> >
> > [2] output
> > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > [0]PETSC ERROR: Error in external library
> > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6
> >
> > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> > [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021
> > [0]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug
> > [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> > [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > [1]PETSC ERROR:
Error in external library > > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [1]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [1]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [1]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [1]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [1]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [1]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [1]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [2]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [2]PETSC ERROR: Error in external library > > [2]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [2]PETSC ERROR: See 
https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [2]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [2]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [2]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [2]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [2]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [2]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [2]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [3]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [3]PETSC ERROR: Error in external library > > [3]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [3]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [3]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [3]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [3]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [3]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [3]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [3]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [3]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [3]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [3]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [4]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [4]PETSC ERROR: Error in external library > > [4]PETSC ERROR: Error reported 
by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [4]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [4]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [4]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [4]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [4]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [4]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [4]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [4]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [4]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [4]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [4]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [4]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [4]PETSC ERROR: #9 
EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [5]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [5]PETSC ERROR: Error in external library > > [5]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [5]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [5]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [5]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [5]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [5]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [5]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [5]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [5]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [5]PETSC ERROR: #6 STSetUp_Sinvert() at > 
/data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [5]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [5]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [5]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [6]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [6]PETSC ERROR: Error in external library > > [6]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=21891045 > > > > [6]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [6]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [6]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [6]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [6]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [6]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [6]PETSC ERROR: #3 PCSetUp_LU() at > 
/data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [6]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [6]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [6]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [6]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [6]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [6]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [7]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [7]PETSC ERROR: Error in external library > > [7]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=21841925 > > > > [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [7]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [7]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [7]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [7]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [7]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [7]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [7]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [7]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [7]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [7]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [7]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [7]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [0]PETSC ERROR: #2 MatLUFactorNumeric() at > 
/data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [0]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [0]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [0]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [0]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [0]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [0]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [0]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [1]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [1]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [1]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [1]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [2]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [2]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [2]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [2]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [3]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > > > > > > > [1] output > > > > Continuing. 
> > [New Thread 0x7f6f5b2d2780 (LWP 794037)] > > [New Thread 0x7f6f5aad0800 (LWP 794040)] > > [New Thread 0x7f6f5a2ce880 (LWP 794041)] > > ^C > > Thread 1 "my.exe" received signal SIGINT, Interrupt. > > 0x00007f72904927b0 in ofi_fastlock_release_noop () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > > (gdb) where > > #0 0x00007f72904927b0 in ofi_fastlock_release_noop () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > > #1 0x00007f729049354b in ofi_cq_readfrom () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > > #2 0x00007f728ffe8f0e in rxm_ep_do_progress () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > > #3 0x00007f728ffe2b7d in rxm_ep_recv_common_flags () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > > #4 0x00007f728ffe30f8 in rxm_ep_trecvmsg () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > > #5 0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392, > > comm=1, flag=0x0, status=0xffffffffffffffff) > > at /usr/include/rdma/fi_tagged.h:109 > > #6 0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0, > > v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90) > > at ../../src/binding/fortran/mpif_h/iprobef.c:276 > > #7 0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0, > > blocking=<error reading variable: Cannot access memory at address > 0x1>, > > > > --Type <RET> for more, q to quit, c to continue without paging--cont > > irecv=<error reading variable: Cannot access memory at address 0x0>, > message_received=<error reading variable: Cannot access memory at address > 0xffffffffffffffff>, msgsou=1, msgtag=-1, status=..., bufr=..., > lbufr=320782504, 
lbufr_bytes=1283130016, procnode_steps=..., posfac=1, > iwpos=1, iwposcb=292535, iptrlu=2039063816, lrlu=2039063816, > lrlus=2039063816, n=50400, iw=..., liw=292563, a=..., la=2611636796, > ptrist=..., ptlust=..., ptrfac=..., ptrast=..., step=..., pimaster=..., > pamaster=..., nstk_s=..., comp=0, iflag=0, ierror=0, comm=-1006632958, > nbprocfils=..., ipool=..., lpool=5, leaf=1, nbfin=4, myid=1, slavef=4, > root=<error reading variable: value of type `zmumps_root_struc' requires > 766016 bytes, which is more than max-value-size>, opassw=0, opeliw=0, > itloc=..., rhs_mumps=..., fils=..., dad=..., ptrarw=..., ptraiw=..., > intarr=..., dblarr=..., icntl=..., keep=..., keep8=..., dkeep=..., nd=..., > frere=..., lptrar=50400, nelt=1, frtptr=..., frtelt=..., > istep_to_iniv2=..., tab_pos_in_pere=..., stack_right_authorized=4294967295, > lrgroups=...) at zfac_process_message.F:730 > > #8 0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=..., > liw=<error reading variable: Cannot access memory at address 0x1>, a=..., > la=<error reading variable: Cannot access memory at address > 0xffffffffffffffff>, nstk_steps=..., nbprocfils=..., nd=..., fils=..., > step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., > tab_pos_in_pere=..., nstepsdone=1690339657, opass=<error reading variable: > Cannot access memory at address 0x5>, opeli=<error reading variable: Cannot > access memory at address 0x0>, nelva=50400, comp=259581, > maxfrt=-1889517576, nmaxnpiv=-1195144887, ntotpv=<error reading variable: > Cannot access memory at address 0x2>, noffnegpv=<error reading variable: > Cannot access memory at address 0x0>, nb22t1=<error reading variable: > Cannot access memory at address 0x0>, nb22t2=<error reading variable: > Cannot access memory at address 0x0>, nbtiny=<error reading variable: > Cannot access memory at address 0x0>, det_exp=<error reading variable: > Cannot access memory at address 0x0>, det_mant=<error reading variable: > Cannot access memory at address 
0x0>, det_sign=<error reading variable: > Cannot access memory at address 0x0>, ptrist=..., ptrast=..., pimaster=..., > pamaster=..., ptrarw=..., ptraiw=..., itloc=..., rhs_mumps=..., ipool=..., > lpool=<error reading variable: Cannot access memory at address 0x0>, > rinfo=<error reading variable: Cannot access memory at address 0x0>, > posfac=<error reading variable: Cannot access memory at address 0x0>, > iwpos=<error reading variable: Cannot access memory at address 0x0>, > lrlu=<error reading variable: Cannot access memory at address 0x0>, > iptrlu=<error reading variable: Cannot access memory at address 0x0>, > lrlus=<error reading variable: Cannot access memory at address 0x0>, > leaf=<error reading variable: Cannot access memory at address 0x0>, > nbroot=<error reading variable: Cannot access memory at address 0x0>, > nbrtot=<error reading variable: Cannot access memory at address 0x0>, > uu=<error reading variable: Cannot access memory at address 0x0>, > icntl=<error reading variable: Cannot access memory at address 0x0>, > ptlust=..., ptrfac=..., info=<error reading variable: Cannot access memory > at address 0x0>, keep=<error reading variable: Cannot access memory at > address 0x3ff0000000000000>, keep8=<error reading variable: Cannot access > memory at address 0x0>, procnode_steps=..., slavef=<error reading variable: > Cannot access memory at address 0x4ffffffff>, myid=<error reading variable: > Cannot access memory at address 0xffffffff>, comm_nodes=<error reading > variable: Cannot access memory at address 0x0>, myid_nodes=<error reading > variable: Cannot access memory at address 0x0>, bufr=..., lbufr=0, > lbufr_bytes=5, intarr=..., dblarr=..., root=..., perm=..., nelt=0, > frtptr=..., frtelt=..., lptrar=3, comm_load=-30, ass_irecv=30, > seuil=2.1219957909652723e-314, seuil_ldlt_niv2=4.2439866417681519e-314, > mem_distrib=..., ne=..., dkeep=..., pivnul_list=..., lpn_list=0, > lrgroups=...) 
at zfac_par_m.F:182 > > #9 0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., > la=<error reading variable: Cannot access memory at address 0x1>, > liw=<error reading variable: Cannot access memory at address 0x0>, > sym_perm=..., na=..., lna=1, ne_steps=..., nfsiz=..., fils=..., step=..., > frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., > ptrar=..., ldptrar=<error reading variable: Cannot access memory at address > 0x0>, ptrist=..., ptlust_s=..., ptrfac=..., iw1=..., iw2=..., itloc=..., > rhs_mumps=..., pool=..., lpool=-1889529280, cntl1=-5.3576889161551131e-255, > icntl=<error reading variable: Cannot access memory at address 0x25344>, > info=..., rinfo=..., keep=..., keep8=..., procnode_steps=..., > slavef=-1889504640, comm_nodes=-2048052411, myid=<error reading variable: > Cannot access memory at address 0x81160>, myid_nodes=-1683330500, bufr=..., > lbufr=<error reading variable: Cannot access memory at address 0x11db4c>, > lbufr_bytes=<error reading variable: Cannot access memory at address > 0xc4e0>, zmumps_lbuf=<error reading variable: Cannot access memory at > address 0x4>, intarr=..., dblarr=..., root=<error reading variable: Cannot > access memory at address 0x11dbec>, nelt=<error reading variable: Cannot > access memory at address 0x3>, frtptr=..., frtelt=..., comm_load=<error > reading variable: Cannot access memory at address 0x0>, ass_irecv=<error > reading variable: Cannot access memory at address 0x0>, seuil=<error > reading variable: Cannot access memory at address 0x0>, > seuil_ldlt_niv2=<error reading variable: Cannot access memory at address > 0x0>, mem_distrib=<error reading variable: Cannot access memory at address > 0x0>, dkeep=<error reading variable: Cannot access memory at address 0x0>, > pivnul_list=..., lpn_list=<error reading variable: Cannot access memory at > address 0x0>, lrgroups=...) 
at zfac_b.F:243 > > #10 0x00007f7308610ff7 in zmumps_fac_driver (id=<error reading variable: > value of type `zmumps_struc' requires 386095520 bytes, which is more than > max-value-size>) at zfac_driver.F:2421 > > #11 0x00007f7308569256 in zmumps (id=<error reading variable: value of > type `zmumps_struc' requires 386095520 bytes, which is more than > max-value-size>) at zmumps_driver.F:1883 > > #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, par=<error reading > variable: Cannot access memory at address 0x1>, comm_f77=<error reading > variable: Cannot access memory at address 0x0>, n=<error reading variable: > Cannot access memory at address 0xffffffffffffffff>, nblk=1, icntl=..., > cntl=..., keep=..., dkeep=..., keep8=..., nz=0, nnz=0, irn=..., irnhere=0, > jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=0, nnz_loc=304384739, > irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., > a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, > a_elt=..., a_elthere=0, blkptr=..., blkptrhere=0, blkvar=..., blkvarhere=0, > perm_in=..., perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, > info=..., rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, > size_schur=0, listvar_schur=..., listvar_schurhere=0, schur=..., > schurhere=0, wk_user=..., wk_userhere=0, colsca=..., colscahere=0, > rowsca=..., rowscahere=0, instance_number=1, nrhs=1, lrhs=0, lredrhs=0, > rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, rhs_loc=..., > rhs_lochere=0, irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., > irhs_ptrhere=0, isol_loc=..., isol_lochere=0, irhs_loc=..., irhs_lochere=0, > nz_rhs=0, lsol_loc=0, lrhs_loc=0, nloc_rhs=0, schur_mloc=0, schur_nloc=0, > schur_lld=0, mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., > ooc_prefix=..., write_problem=..., save_dir=..., save_prefix=..., > tmpdirlen=20, prefixlen=20, write_problemlen=20, save_dirlen=20, > save_prefixlen=20, metis_options=...) 
at zmumps_f77.F:289
> > #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485
> > #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248, A=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683
> > #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248, mat=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> > #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> > #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> > #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> > #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> > #20 0x00007f7309130462 in STSetUp (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> > #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> > #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> > #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248, __ierr=0x7ffda7afdae0) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85
> > #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=..., b_pet=..., jthisone=<error reading variable: Cannot access memory at address 0x1>, isize=<error reading variable: Cannot access memory at address 0x0>) at small_slepc_example_program.F:322
> > #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549
> > #26 0x00000000004023f2 in main ()
> > #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0 <main>,
argc=14, argv=0x7ffda7b024e8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffda7b024d8) at ../csu/libc-start.c:308
> > #28 0x00000000004022fe in _start ()
> >
> > From: Matthew Knepley <[email protected]>
> > Sent: Tuesday, August 24, 2021 3:59 PM
> > To: dazza simplythebest <[email protected]>
> > Cc: Jose E. Roman <[email protected]>; PETSc <[email protected]>
> > Subject: Re: [petsc-users] Improving efficiency of slepc usage
> >
> > On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest <[email protected]> wrote:
> >
> > Dear Matthew and Jose,
> > Apologies for the delayed reply, I had a couple of unforeseen days off this week.
> > Firstly regarding Jose's suggestion re: MUMPS, the program is already using MUMPS
> > to solve linear systems (the code is using a distributed MPI matrix to solve the generalised
> > non-Hermitian complex problem).
> >
> > I have tried the gdb debugger as per Matthew's suggestion.
> > Just to note in case someone else is following this that at first it didn't work (couldn't 'attach'),
> > but after some googling I found a tip suggesting the command:
> > echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
> > which seemed to get it working.
> >
> > I then first ran the debugger on the small matrix case that worked.
> > That stopped in gdb almost immediately after starting execution
> > with a report regarding 'nanosleep.c':
> > ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory.
> > However, issuing the 'cont' command again caused the program to run through to the end of the
> > execution w/out any problems, and with correct looking results, so I am guessing this error
> > is not particularly important.
> >
> > We do that on purpose when the debugger starts up. Typing 'cont' is correct.
> >
> > I then tried the same debugging procedure on the large matrix case that fails.
> > The code again stopped almost immediately after the start of execution with
> > the same nanosleep error as before, and I was able to set the program running
> > again with 'cont' (see full output below). I was running the code with 4 MPI processes,
> > and so had 4 gdb windows appear. Thereafter the code ran for some time until completing the
> > matrix construction, and then one of the gdb process windows printed a
> > Program terminated with signal SIGKILL, Killed.
> > The program no longer exists.
> > message. I then typed 'where' into this terminal but just received the message
> > No stack.
> >
> > I have only seen this behavior one other time, and it was with Fortran. Fortran allows you to declare really big arrays
> > on the stack by putting them at the start of a function (rather than F90 malloc). When I had one of those arrays exceed
> > the stack space, I got this kind of an error where everything is destroyed rather than just stopping. Could it be that you
> > have a large structure on the stack?
> >
> > Second, you can at least look at the stack for the processes that were not killed. You type Ctrl-C, which should give you
> > the prompt and then "where".
> >
> > Thanks,
> >
> > Matt
> >
> > The other gdb windows basically seemed to be left in limbo until I issued the 'quit'
> > command in the SIGKILL window, and then they vanished.
> >
> > I paste the full output from the gdb window that recorded the SIGKILL below here.
> > I guess it is necessary to somehow work out where the SIGKILL originates from?
> >
> > Thanks once again,
> > Dan.
> >
> >
> > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> > GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
> > Copyright (C) 2020 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.
> > Type "show copying" and "show warranty" for details.
> > This GDB was configured as "x86_64-linux-gnu".
> > Type "show configuration" for configuration details.
> > For bug reporting instructions, please see:
> > <http://www.gnu.org/software/gdb/bugs/>.
> > Find the GDB manual and other documentation resources online at:
> > <http://www.gnu.org/software/gdb/documentation/>.
> >
> > For help, type "help".
> > Type "apropos word" to search for commands related to "word"...
> > Reading symbols from ./stab1.exe...
> > Attaching to program: /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, process 675919
> > Reading symbols from /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15...
> > Reading symbols from /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15...
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type <RET> for more, q to quit, c to continue without paging--cont
> > /intel64_lin/libmkl_intel_lp64.so...
> > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so...
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so...
> > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so)
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so...
> > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so...
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg...
> > Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...
> > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so...
> > Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...
> > Reading symbols from /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug...
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> > Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6...
> > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12...
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12...
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg...
> > Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...
> > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so...
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5...
> > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5)
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so...
> > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so)
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so...
> > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so)
> > Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...
> > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so...
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so...
> > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so)
> > Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...
> > (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1)
> > Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0...
> > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0)
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so...
> > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so)
> > Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
> > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so...
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so...
> > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so)
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5...
> > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5)
> > Reading symbols from /lib64/ld-linux-x86-64.so.2...
> > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so...
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1...
> > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1)
> > Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so...
> > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so)
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so...
> > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so)
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so...
> > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so)
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so...
> > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so)
> > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so...
> > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so)
> > Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2...
> > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2)
> > 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=<optimized out>, clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffdc641a9a0, rem=rem@entry=0x7ffdc641a9a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
> > 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory.
> > (gdb) cont
> > Continuing.
> > [New Thread 0x7f9e49c02780 (LWP 676559)]
> > [New Thread 0x7f9e49400800 (LWP 676560)]
> > [New Thread 0x7f9e48bfe880 (LWP 676562)]
> > [Thread 0x7f9e48bfe880 (LWP 676562) exited]
> > [Thread 0x7f9e49400800 (LWP 676560) exited]
> > [Thread 0x7f9e49c02780 (LWP 676559) exited]
> >
> > Program terminated with signal SIGKILL, Killed.
> > The program no longer exists.
> > (gdb) where
> > No stack.
> >
> > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> >
> > From: Matthew Knepley <[email protected]>
> > Sent: Friday, August 20, 2021 2:12 PM
> > To: dazza simplythebest <[email protected]>
> > Cc: Jose E. Roman <[email protected]>; PETSc <[email protected]>
> > Subject: Re: [petsc-users] Improving efficiency of slepc usage
> >
> > On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest <[email protected]> wrote:
> > Dear Jose,
> > Many thanks for your response, I have been investigating this issue with a few more calculations
> > today, hence the slightly delayed response.
> >
> > The problem is actually derived from a fluid dynamics problem, so to allow an easier exploration of things
> > I first downsized the resolution of the underlying fluid solver while keeping all the physical parameters
> > the same - i.e. I would get a smaller matrix that should be solving the same physical problem as the original
> > larger matrix but to lower accuracy.
> >
> > Results
> >
> > Small matrix (N = 21168) - everything good!
> > This converged when using the -eps_largest_real approach (taking 92 iterations for nev = 10,
> > tol = 5.0000E-06 and ncv = 300), and also when using the shift-invert approach, converging
> > very impressively in a single iteration! Interestingly it did this both for a non-zero -eps_target
> > and also for a zero -eps_target.
> >
> > Large matrix (N = 50400) - works for -eps_largest_real, fails for -st_type sinvert
> > I have just double checked again that the code does run properly when we use the -eps_largest_real
> > option - indeed I ran it with a small nev and large tolerance (nev = 4, tol = 5.0e-4, ncv = 300)
> > and with these parameters convergence was obtained in 164 iterations, which took 6 hours on the
> > machine I was running it on.
Furthermore the eigenvalues seem to be ballpark correct; for this large
> > higher resolution case (although with lower slepc tolerance) we obtain 1789.56816314173 -4724.51319554773i
> > as the eigenvalue with largest real part, while the smaller matrix (same physical problem but at lower resolution)
> > found this eigenvalue to be 1831.11845726501 -4787.54519511345i, which means the agreement is in line
> > with expectations.
> >
> > Unfortunately though the code does still crash when I try to do shift-invert for the large matrix case,
> > whether or not I use a non-zero -eps_target. For reference this is the command line used:
> > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt
> > To be precise the code crashes soon after calling EPSSolve (it successfully calls
> > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and EPSSetFromOptions).
> > By crashes I mean that I do not even get any error messages from slepc/PETSC, and do not even get the
> > 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message
> > as soon as EPSSolve is called.
> >
> > Hi Dan,
> >
> > It would help track this error down if we had a stack trace. You can get a stack trace from the debugger. You run with
> >
> > -start_in_debugger
> >
> > which should launch the debugger (usually), and then type
> >
> > cont
> >
> > to continue, and then
> >
> > where
> >
> > to get the stack trace when it crashes, or 'bt' on lldb.
> >
> > Thanks,
> >
> > Matt
> >
> > Do you have any ideas as to why this larger matrix case should fail when using shift-invert but succeed when using
> > -eps_largest_real? The fact that the program works and produces correct results
> > when using the -eps_largest_real option suggests that there is probably nothing wrong with the specification
> > of the problem or the matrices?
It is strange how there is no error message from slepc / Petsc ... the
> > only idea I have at the moment is that perhaps max memory has been exceeded, which could cause such a sudden
> > shutdown? For your reference when running the large matrix case with the -eps_largest_real option I am using
> > about 36 GB of the 148 GB available on this machine - does the shift-invert approach require substantially
> > more memory for example?
> >
> > I would be very grateful if you have any suggestions to resolve this issue or even ways to clarify it further,
> > the performance I have seen with the shift-invert for the small matrix is so impressive it would be great to
> > get that working for the full-size problem.
> >
> > Many thanks and best wishes,
> > Dan.
> >
> >
> >
> > From: Jose E. Roman <[email protected]>
> > Sent: Thursday, August 19, 2021 7:58 AM
> > To: dazza simplythebest <[email protected]>
> > Cc: PETSc <[email protected]>
> > Subject: Re: [petsc-users] Improving efficiency of slepc usage
> >
> > In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target.
> >
> > In B), what do you mean by "it crashes"? If you get an error about factorization, it means that your A-matrix is singular. In that case, try using a nonzero target -eps_target 0.1
> >
> > Jose
> >
> >
> > > On 19 Aug 2021, at 7:12, dazza simplythebest <[email protected]> wrote:
> > >
> > > Dear All,
> > > I am planning on using slepc to do a large number of eigenvalue calculations
> > > of a generalized eigenvalue problem, called from a program written in fortran using MPI.
> > > Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster,
> > > and on smaller test problems everything is working well; the matrices are efficiently and
> > > correctly constructed and slepc returns the correct spectrum. I am just now starting to move
> > > towards solving the full-size 'production run' problems, and would appreciate some
> > > general advice on how to improve the solver's performance.
> > >
> > > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices
> > > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are
> > > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part,
> > > although in other cases I will also be interested in finding the eigenvalues whose real part
> > > is close to zero.
> > >
> > > A)
> > > Calling slepc's EPS solver with the following options:
> > >
> > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt
> > >
> > >
> > > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations
> > > (examining the monitor output it did appear to be very slowly approaching convergence).
> > >
> > > B)
> > > On the same problem I have also tried a shift-invert transformation using the options
> > >
> > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert
> > >
> > > - in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options?
> > >
> > >
> > > Does anyone have any suggestions as to how to improve this performance (or find out more about the problem)?
> > > In the case of A) I can see from watching the slepc videos that increasing ncv
> > > may help, but I am wondering, since 600 is a large number of iterations, whether there
> > > may be something else going on - e.g. perhaps some alternative preconditioner may help?
> > > In the case of B), I guess there must be some mistake in these command line options?
> > > Again, any advice will be greatly appreciated.
> > > Best wishes, Dan.
> >
> >
> >
> > --
> > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> > -- Norbert Wiener
> >
> > https://www.cse.buffalo.edu/~knepley/
> >
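
For anyone following the thread, here is a minimal sketch of why shift-invert converges so quickly near the target, as Jose describes for approach B). For Ax = lambda Bx, -st_type sinvert makes the solver work with (A - sigma B)^{-1} B, whose eigenvalues are theta = 1/(lambda - sigma), so the lambda closest to sigma becomes the dominant theta. The eigenvalue list and the helper names shift_invert and back_transform are hypothetical, chosen only for illustration (they are not the thread's matrices or any SLEPc API); sigma = 0.1 mirrors -eps_target 0.1.

```python
# Hypothetical sample spectrum, for illustration only (not the thread's data).
eigenvalues = [1800.0 - 4700.0j, 0.3 + 2.0j, -5.0 + 1.0j, -250.0 - 40.0j]
sigma = 0.1 + 0.0j  # target, mirroring -eps_target 0.1


def shift_invert(lam, sigma):
    """Map an eigenvalue lam of A x = lam B x to theta = 1 / (lam - sigma)."""
    return 1.0 / (lam - sigma)


def back_transform(theta, sigma):
    """Recover lam = sigma + 1 / theta from a transformed eigenvalue."""
    return sigma + 1.0 / theta


transformed = [shift_invert(lam, sigma) for lam in eigenvalues]

# Index of the largest |theta| (what a Krylov method finds first) and
# index of the eigenvalue nearest the target; the two should coincide.
dominant = max(range(len(eigenvalues)), key=lambda i: abs(transformed[i]))
closest = min(range(len(eigenvalues)), key=lambda i: abs(eigenvalues[i] - sigma))

for lam, theta in zip(eigenvalues, transformed):
    print("lambda =", lam, " |theta| =", abs(theta))
print("dominant after transform:", eigenvalues[dominant])
print("closest to target:       ", eigenvalues[closest])
```

The price of this is the LU factorization of A - sigma B (the MatLUFactorNumeric / MUMPS frames in the backtrace above), which is where the extra memory goes compared with -eps_largest_real.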
