Dear All,
Okay, thanks for the tip and all the guidance thus far. I will
also investigate SuperLU_DIST as the linear solver.
At least I now have a good test problem!
Have a good weekend and many thanks once again,
Dan.
________________________________
From: Matthew Knepley <[email protected]>
Sent: Thursday, August 26, 2021 3:53 PM
To: dazza simplythebest <[email protected]>
Cc: Jose E. Roman <[email protected]>; PETSc <[email protected]>
Subject: Re: [petsc-users] Improving efficiency of slepc usage -memory
management when using shift-invert
On Thu, Aug 26, 2021 at 8:32 AM dazza simplythebest
<[email protected]> wrote:
Dear Jose and Matthew,
Many thanks for your assistance; this would seem to explain
what the problem was.
Judging by this test case, there seems to be a tradeoff between memory and
computational time in choosing whether or not to shift-invert: shift-invert
greatly reduces the number of required iterations, but at a higher memory cost?
I have been trying a few values of -st_mat_mumps_icntl_14 (and also the
alternative -st_mat_mumps_icntl_23) today, but have not yet found one that
fits onto the workstation I am using (although setting these parameters does
at least seem to guarantee that an error message is generated).
Thus I will probably need to reduce the number of MPI processes, and thereby
reduce the memory requirement. In this regard, the MUMPS documentation
suggests that a hybrid MPI-OpenMP approach is optimal for their software,
whereas I remember reading somewhere else that OpenMP threading was not a good
choice for use with PETSc. Would you have any general advice on this?
Memory does not really track the number of MPI processes. MUMPS does a lot of
things redundantly. For minimum memory, I would suggest trying SuperLU_dist:
--download-superlu_dist
I do not think OpenMP will have much influence at all.
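Once PETSc has been reconfigured with --download-superlu_dist, the factorization package used inside shift-invert can be switched at run time rather than in the source. A minimal sketch of the relevant options follows, written in Python only to assemble the option list. The helper name shift_invert_options is hypothetical; the option names are the usual PETSc/SLEPc ones with the st_ prefix, and the run line at the end is only an illustration.

```python
def shift_invert_options(solver, icntl_14=None):
    """Build PETSc/SLEPc command-line options for shift-invert with a direct solver.

    solver   -- factorization package name, e.g. "mumps" or "superlu_dist"
    icntl_14 -- optional MUMPS workspace increase in percent (MUMPS only)
    """
    opts = [
        "-st_type", "sinvert",        # shift-and-invert spectral transformation
        "-st_ksp_type", "preonly",    # apply the factorization only, no Krylov iterations
        "-st_pc_type", "lu",          # full LU factorization
        "-st_pc_factor_mat_solver_type", solver,
    ]
    # ICNTL(14) is a MUMPS control, so it is only added for that package.
    if solver == "mumps" and icntl_14 is not None:
        opts += ["-st_mat_mumps_icntl_14", str(icntl_14)]
    return opts


if __name__ == "__main__":
    # Hypothetical run line: executable name taken from the error log,
    # process count chosen arbitrarily.
    run = ["mpiexec", "-n", "4", "./stab1.exe"] + shift_invert_options("superlu_dist")
    print(" ".join(run))
```

Keeping the solver choice on the command line makes it easy to compare MUMPS and SuperLU_DIST on the same test problem without recompiling.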
Thanks,
Matt
I was thinking that maybe a version of SLEPc / PETSc compiled against OpenMP,
with the number of threads set appropriately but without explicit OpenMP
directives in the user's code, may be the way forward? That way PETSc will (?)
just ignore the threading, whereas threading will be available to MUMPS when
execution is passed to those routines?
Many thanks once again,
Dan.
________________________________
From: Jose E. Roman <[email protected]>
Sent: Wednesday, August 25, 2021 1:40 PM
To: dazza simplythebest <[email protected]>
Cc: PETSc <[email protected]>
Subject: Re: [petsc-users] Improving efficiency of slepc usage
The MUMPS documentation (section 8) indicates that INFOG(1)=-9 means
insufficient workspace. Try running with
-st_mat_mumps_icntl_14 <percentage>
where <percentage> is the percentage by which you want to increase the
workspace, e.g. 50 or 100 or more.
See ex43.c for an example showing how to set this option in code.
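The INFO(2) value printed next to INFOG(1)=-9 can also be read as a size. According to the description of error -9 in the MUMPS users' guide (an interpretation worth checking against the installed version), a positive INFO(2) is the number of workspace entries that were missing when the error was raised, and a negative INFO(2) gives that number in millions. A small sketch under that reading, assuming 16 bytes per entry (double-precision complex, as in the configure line of the log); the function names are hypothetical.

```python
BYTES_PER_ENTRY = 16  # assumption: double-precision complex scalars (2 x 8 bytes)


def missing_entries(info2):
    """Entries missing from the MUMPS workspace when INFOG(1) = -9."""
    # A negative INFO(2) means its absolute value is given in millions.
    return info2 if info2 >= 0 else -info2 * 1_000_000


def missing_bytes(info2, bytes_per_entry=BYTES_PER_ENTRY):
    """Workspace shortfall in bytes on the process that reported INFO(2)."""
    return missing_entries(info2) * bytes_per_entry


if __name__ == "__main__":
    # INFO(2) values reported by ranks 0 and 6 in the log below.
    for info2 in (6, 21891045):
        mib = missing_bytes(info2) / 2**20
        print(f"INFO(2)={info2}: short by {mib:.1f} MiB")
```

The shortfall is per process and reflects only the moment of failure, so it is better treated as a lower bound on the extra workspace needed than as an exact target for -st_mat_mumps_icntl_14.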
Jose
> On 25 Aug 2021, at 14:11, dazza simplythebest
> <[email protected]> wrote:
>
>
>
> From: dazza simplythebest <[email protected]>
> Sent: Wednesday, August 25, 2021 12:08 PM
> To: Matthew Knepley <[email protected]>
> Subject: Re: [petsc-users] Improving efficiency of slepc usage
>
> Dear Matthew and Jose,
> I have derived a smaller program from the original program by constructing
> matrices of the same size, but filling their entries randomly instead of
> computing the correct fluid dynamics values, just to allow faster
> experimentation. This modified code's behaviour seems to be similar, with the
> code again failing for the large matrix case with the SIGKILL error, so I
> first report results from that code here. Firstly, I can confirm that I am
> using Fortran and compiling with the Intel compiler, which it seems places
> automatic arrays on the stack. The stack size, as determined by ulimit -a, is
> reported to be:
> stack size (kbytes, -s) 8192
>
> [1] Okay, so I followed your suggestion and used ctrl-c followed by 'where'
> in one of the non-SIGKILL gdb windows.
> I have pasted the output at the bottom of this email (see [1] output). It
> does look like the problem occurs somewhere in the call to the MUMPS solver?
>
> [2] I have also today gained access to another workstation, and so have tried
> running the (original) code on that machine.
> This new machine has two (more powerful) CPU nodes and a larger memory
> (both machines feature Intel Xeon processors).
> On this new machine the large matrix case again failed with the familiar
> SIGKILL report when I used 16 or 12 MPI processes, ran to the end without
> error for 4 or 6 MPI processes, and failed, but with a PETSc error message,
> when I used 8 MPI processes; I have pasted that message below (see [2] output).
> Does this point to some sort of resource demand that exceeds some limit as
> the number of MPI processes increases?
>
> Many thanks once again,
> Dan.
>
> [2] output
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Error in external library
> [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase:
> INFOG(1)=-9, INFO(2)=6
>
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed
> Aug 25 11:18:48 2021
> [0]PETSC ERROR: Configure options
> ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O"
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex
> --with-precision=double --with-debugging=0 --with-openmp
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --download-mumps --download-scalapack --download-cmake
> PETSC_ARCH=arch-omp_nodbug
> [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [1]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [1]PETSC ERROR: Error in external library
> [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase:
> INFOG(1)=-9, INFO(2)=6
>
> [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [1]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed
> Aug 25 11:18:48 2021
> [1]PETSC ERROR: Configure options
> ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O"
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex
> --with-precision=double --with-debugging=0 --with-openmp
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --download-mumps --download-scalapack --download-cmake
> PETSC_ARCH=arch-omp_nodbug
> [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [1]PETSC ERROR: #2 MatLUFactorNumeric() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [1]PETSC ERROR: #3 PCSetUp_LU() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [1]PETSC ERROR: #4 PCSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [1]PETSC ERROR: #5 KSPSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [2]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [2]PETSC ERROR: Error in external library
> [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase:
> INFOG(1)=-9, INFO(2)=6
>
> [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [2]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [2]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed
> Aug 25 11:18:48 2021
> [2]PETSC ERROR: Configure options
> ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O"
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex
> --with-precision=double --with-debugging=0 --with-openmp
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --download-mumps --download-scalapack --download-cmake
> PETSC_ARCH=arch-omp_nodbug
> [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [2]PETSC ERROR: #2 MatLUFactorNumeric() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [2]PETSC ERROR: #3 PCSetUp_LU() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [2]PETSC ERROR: #4 PCSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [2]PETSC ERROR: #5 KSPSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [3]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [3]PETSC ERROR: Error in external library
> [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase:
> INFOG(1)=-9, INFO(2)=6
>
> [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [3]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [3]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed
> Aug 25 11:18:48 2021
> [3]PETSC ERROR: Configure options
> ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O"
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex
> --with-precision=double --with-debugging=0 --with-openmp
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --download-mumps --download-scalapack --download-cmake
> PETSC_ARCH=arch-omp_nodbug
> [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [3]PETSC ERROR: #2 MatLUFactorNumeric() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [3]PETSC ERROR: #3 PCSetUp_LU() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [3]PETSC ERROR: #4 PCSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [3]PETSC ERROR: #5 KSPSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [3]PETSC ERROR: #6 STSetUp_Sinvert() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [3]PETSC ERROR: #7 STSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [3]PETSC ERROR: #8 EPSSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [4]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [4]PETSC ERROR: Error in external library
> [4]PETSC ERROR: Error reported by MUMPS in numerical factorization phase:
> INFOG(1)=-9, INFO(2)=6
>
> [4]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [4]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [4]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed
> Aug 25 11:18:48 2021
> [4]PETSC ERROR: Configure options
> ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O"
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex
> --with-precision=double --with-debugging=0 --with-openmp
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --download-mumps --download-scalapack --download-cmake
> PETSC_ARCH=arch-omp_nodbug
> [4]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [4]PETSC ERROR: #2 MatLUFactorNumeric() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [4]PETSC ERROR: #3 PCSetUp_LU() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [4]PETSC ERROR: #4 PCSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [4]PETSC ERROR: #5 KSPSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [4]PETSC ERROR: #6 STSetUp_Sinvert() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [4]PETSC ERROR: #7 STSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [4]PETSC ERROR: #8 EPSSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [4]PETSC ERROR: #9 EPSSolve() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [5]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [5]PETSC ERROR: Error in external library
> [5]PETSC ERROR: Error reported by MUMPS in numerical factorization phase:
> INFOG(1)=-9, INFO(2)=6
>
> [5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [5]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [5]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed
> Aug 25 11:18:48 2021
> [5]PETSC ERROR: Configure options
> ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O"
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex
> --with-precision=double --with-debugging=0 --with-openmp
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --download-mumps --download-scalapack --download-cmake
> PETSC_ARCH=arch-omp_nodbug
> [5]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [5]PETSC ERROR: #2 MatLUFactorNumeric() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [5]PETSC ERROR: #3 PCSetUp_LU() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [5]PETSC ERROR: #4 PCSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [5]PETSC ERROR: #5 KSPSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [5]PETSC ERROR: #6 STSetUp_Sinvert() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [5]PETSC ERROR: #7 STSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [5]PETSC ERROR: #8 EPSSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [5]PETSC ERROR: #9 EPSSolve() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [6]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [6]PETSC ERROR: Error in external library
> [6]PETSC ERROR: Error reported by MUMPS in numerical factorization phase:
> INFOG(1)=-9, INFO(2)=21891045
>
> [6]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [6]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [6]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed
> Aug 25 11:18:48 2021
> [6]PETSC ERROR: Configure options
> ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O"
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex
> --with-precision=double --with-debugging=0 --with-openmp
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --download-mumps --download-scalapack --download-cmake
> PETSC_ARCH=arch-omp_nodbug
> [6]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [6]PETSC ERROR: #2 MatLUFactorNumeric() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [6]PETSC ERROR: #3 PCSetUp_LU() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [6]PETSC ERROR: #4 PCSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [6]PETSC ERROR: #5 KSPSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [6]PETSC ERROR: #6 STSetUp_Sinvert() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [6]PETSC ERROR: #7 STSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [6]PETSC ERROR: #8 EPSSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [6]PETSC ERROR: #9 EPSSolve() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [7]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [7]PETSC ERROR: Error in external library
> [7]PETSC ERROR: Error reported by MUMPS in numerical factorization phase:
> INFOG(1)=-9, INFO(2)=21841925
>
> [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [7]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [7]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed
> Aug 25 11:18:48 2021
> [7]PETSC ERROR: Configure options
> ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs
> --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort
> --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O"
> CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex
> --with-precision=double --with-debugging=0 --with-openmp
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
> --download-mumps --download-scalapack --download-cmake
> PETSC_ARCH=arch-omp_nodbug
> [7]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686
> [7]PETSC ERROR: #2 MatLUFactorNumeric() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [7]PETSC ERROR: #3 PCSetUp_LU() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [7]PETSC ERROR: #4 PCSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [7]PETSC ERROR: #5 KSPSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [7]PETSC ERROR: #6 STSetUp_Sinvert() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [7]PETSC ERROR: #7 STSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [7]PETSC ERROR: #8 EPSSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [7]PETSC ERROR: #9 EPSSolve() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [0]PETSC ERROR: #2 MatLUFactorNumeric() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> [0]PETSC ERROR: #3 PCSetUp_LU() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> [0]PETSC ERROR: #4 PCSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> [0]PETSC ERROR: #5 KSPSetUp() at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> [0]PETSC ERROR: #6 STSetUp_Sinvert() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [0]PETSC ERROR: #7 STSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [0]PETSC ERROR: #8 EPSSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [0]PETSC ERROR: #9 EPSSolve() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [1]PETSC ERROR: #6 STSetUp_Sinvert() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [1]PETSC ERROR: #7 STSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [1]PETSC ERROR: #8 EPSSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [1]PETSC ERROR: #9 EPSSolve() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [2]PETSC ERROR: #6 STSetUp_Sinvert() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> [2]PETSC ERROR: #7 STSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> [2]PETSC ERROR: #8 EPSSetUp() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> [2]PETSC ERROR: #9 EPSSolve() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> [3]PETSC ERROR: #9 EPSSolve() at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
>
>
>
> [1] output
>
> Continuing.
> [New Thread 0x7f6f5b2d2780 (LWP 794037)]
> [New Thread 0x7f6f5aad0800 (LWP 794040)]
> [New Thread 0x7f6f5a2ce880 (LWP 794041)]
> ^C
> Thread 1 "my.exe" received signal SIGINT, Interrupt.
> 0x00007f72904927b0 in ofi_fastlock_release_noop ()
> from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
> (gdb) where
> #0 0x00007f72904927b0 in ofi_fastlock_release_noop ()
> from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
> #1 0x00007f729049354b in ofi_cq_readfrom ()
> from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so
> #2 0x00007f728ffe8f0e in rxm_ep_do_progress ()
> from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
> #3 0x00007f728ffe2b7d in rxm_ep_recv_common_flags ()
> from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
> #4 0x00007f728ffe30f8 in rxm_ep_trecvmsg ()
> from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
> #5 0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392,
> comm=1, flag=0x0, status=0xffffffffffffffff)
> at /usr/include/rdma/fi_tagged.h:109
> #6 0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0,
> v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90)
> at ../../src/binding/fortran/mpif_h/iprobef.c:276
> #7 0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0,
> blocking=<error reading variable: Cannot access memory at address 0x1>,
>
> --Type <RET> for more, q to quit, c to continue without paging--cont
> irecv=<error reading variable: Cannot access memory at address 0x0>,
> message_received=<error reading variable: Cannot access memory at address
> 0xffffffffffffffff>, msgsou=1, msgtag=-1, status=..., bufr=...,
> lbufr=320782504, lbufr_bytes=1283130016, procnode_steps=..., posfac=1,
> iwpos=1, iwposcb=292535, iptrlu=2039063816, lrlu=2039063816,
> lrlus=2039063816, n=50400, iw=..., liw=292563, a=..., la=2611636796,
> ptrist=..., ptlust=..., ptrfac=..., ptrast=..., step=..., pimaster=...,
> pamaster=..., nstk_s=..., comp=0, iflag=0, ierror=0, comm=-1006632958,
> nbprocfils=..., ipool=..., lpool=5, leaf=1, nbfin=4, myid=1, slavef=4,
> root=<error reading variable: value of type `zmumps_root_struc' requires
> 766016 bytes, which is more than max-value-size>, opassw=0, opeliw=0,
> itloc=..., rhs_mumps=..., fils=..., dad=..., ptrarw=..., ptraiw=...,
> intarr=..., dblarr=..., icntl=..., keep=..., keep8=..., dkeep=..., nd=...,
> frere=..., lptrar=50400, nelt=1, frtptr=..., frtelt=..., istep_to_iniv2=...,
> tab_pos_in_pere=..., stack_right_authorized=4294967295, lrgroups=...) at
> zfac_process_message.F:730
> #8 0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=...,
> liw=<error reading variable: Cannot access memory at address 0x1>, a=...,
> la=<error reading variable: Cannot access memory at address
> 0xffffffffffffffff>, nstk_steps=..., nbprocfils=..., nd=..., fils=...,
> step=..., frere=..., dad=..., cand=..., istep_to_iniv2=...,
> tab_pos_in_pere=..., nstepsdone=1690339657, opass=<error reading variable:
> Cannot access memory at address 0x5>, opeli=<error reading variable: Cannot
> access memory at address 0x0>, nelva=50400, comp=259581, maxfrt=-1889517576,
> nmaxnpiv=-1195144887, ntotpv=<error reading variable: Cannot access memory at
> address 0x2>, noffnegpv=<error reading variable: Cannot access memory at
> address 0x0>, nb22t1=<error reading variable: Cannot access memory at address
> 0x0>, nb22t2=<error reading variable: Cannot access memory at address 0x0>,
> nbtiny=<error reading variable: Cannot access memory at address 0x0>,
> det_exp=<error reading variable: Cannot access memory at address 0x0>,
> det_mant=<error reading variable: Cannot access memory at address 0x0>,
> det_sign=<error reading variable: Cannot access memory at address 0x0>,
> ptrist=..., ptrast=..., pimaster=..., pamaster=..., ptrarw=..., ptraiw=...,
> itloc=..., rhs_mumps=..., ipool=..., lpool=<error reading variable: Cannot
> access memory at address 0x0>, rinfo=<error reading variable: Cannot access
> memory at address 0x0>, posfac=<error reading variable: Cannot access memory
> at address 0x0>, iwpos=<error reading variable: Cannot access memory at
> address 0x0>, lrlu=<error reading variable: Cannot access memory at address
> 0x0>, iptrlu=<error reading variable: Cannot access memory at address 0x0>,
> lrlus=<error reading variable: Cannot access memory at address 0x0>,
> leaf=<error reading variable: Cannot access memory at address 0x0>,
> nbroot=<error reading variable: Cannot access memory at address 0x0>,
> nbrtot=<error reading variable: Cannot access memory at address 0x0>,
> uu=<error reading variable: Cannot access memory at address 0x0>,
> icntl=<error reading variable: Cannot access memory at address 0x0>,
> ptlust=..., ptrfac=..., info=<error reading variable: Cannot access memory at
> address 0x0>, keep=<error reading variable: Cannot access memory at address
> 0x3ff0000000000000>, keep8=<error reading variable: Cannot access memory at
> address 0x0>, procnode_steps=..., slavef=<error reading variable: Cannot
> access memory at address 0x4ffffffff>, myid=<error reading variable: Cannot
> access memory at address 0xffffffff>, comm_nodes=<error reading variable:
> Cannot access memory at address 0x0>, myid_nodes=<error reading variable:
> Cannot access memory at address 0x0>, bufr=..., lbufr=0, lbufr_bytes=5,
> intarr=..., dblarr=..., root=..., perm=..., nelt=0, frtptr=..., frtelt=...,
> lptrar=3, comm_load=-30, ass_irecv=30, seuil=2.1219957909652723e-314,
> seuil_ldlt_niv2=4.2439866417681519e-314, mem_distrib=..., ne=..., dkeep=...,
> pivnul_list=..., lpn_list=0, lrgroups=...) at zfac_par_m.F:182
> #9 0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., la=<error
> reading variable: Cannot access memory at address 0x1>, liw=<error reading
> variable: Cannot access memory at address 0x0>, sym_perm=..., na=..., lna=1,
> ne_steps=..., nfsiz=..., fils=..., step=..., frere=..., dad=..., cand=...,
> istep_to_iniv2=..., tab_pos_in_pere=..., ptrar=..., ldptrar=<error reading
> variable: Cannot access memory at address 0x0>, ptrist=..., ptlust_s=...,
> ptrfac=..., iw1=..., iw2=..., itloc=..., rhs_mumps=..., pool=...,
> lpool=-1889529280, cntl1=-5.3576889161551131e-255, icntl=<error reading
> variable: Cannot access memory at address 0x25344>, info=..., rinfo=...,
> keep=..., keep8=..., procnode_steps=..., slavef=-1889504640,
> comm_nodes=-2048052411, myid=<error reading variable: Cannot access memory at
> address 0x81160>, myid_nodes=-1683330500, bufr=..., lbufr=<error reading
> variable: Cannot access memory at address 0x11db4c>, lbufr_bytes=<error
> reading variable: Cannot access memory at address 0xc4e0>, zmumps_lbuf=<error
> reading variable: Cannot access memory at address 0x4>, intarr=...,
> dblarr=..., root=<error reading variable: Cannot access memory at address
> 0x11dbec>, nelt=<error reading variable: Cannot access memory at address
> 0x3>, frtptr=..., frtelt=..., comm_load=<error reading variable: Cannot
> access memory at address 0x0>, ass_irecv=<error reading variable: Cannot
> access memory at address 0x0>, seuil=<error reading variable: Cannot access
> memory at address 0x0>, seuil_ldlt_niv2=<error reading variable: Cannot
> access memory at address 0x0>, mem_distrib=<error reading variable: Cannot
> access memory at address 0x0>, dkeep=<error reading variable: Cannot access
> memory at address 0x0>, pivnul_list=..., lpn_list=<error reading variable:
> Cannot access memory at address 0x0>, lrgroups=...) at zfac_b.F:243
> #10 0x00007f7308610ff7 in zmumps_fac_driver (id=<error reading variable:
> value of type `zmumps_struc' requires 386095520 bytes, which is more than
> max-value-size>) at zfac_driver.F:2421
> #11 0x00007f7308569256 in zmumps (id=<error reading variable: value of type
> `zmumps_struc' requires 386095520 bytes, which is more than max-value-size>)
> at zmumps_driver.F:1883
> #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, par=<error reading
> variable: Cannot access memory at address 0x1>, comm_f77=<error reading
> variable: Cannot access memory at address 0x0>, n=<error reading variable:
> Cannot access memory at address 0xffffffffffffffff>, nblk=1, icntl=...,
> cntl=..., keep=..., dkeep=..., keep8=..., nz=0, nnz=0, irn=..., irnhere=0,
> jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=0, nnz_loc=304384739, irn_loc=...,
> irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., a_lochere=1, nelt=0,
> eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, a_elt=..., a_elthere=0,
> blkptr=..., blkptrhere=0, blkvar=..., blkvarhere=0, perm_in=...,
> perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, info=...,
> rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, size_schur=0,
> listvar_schur=..., listvar_schurhere=0, schur=..., schurhere=0, wk_user=...,
> wk_userhere=0, colsca=..., colscahere=0, rowsca=..., rowscahere=0,
> instance_number=1, nrhs=1, lrhs=0, lredrhs=0, rhs_sparse=...,
> rhs_sparsehere=0, sol_loc=..., sol_lochere=0, rhs_loc=..., rhs_lochere=0,
> irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0,
> isol_loc=..., isol_lochere=0, irhs_loc=..., irhs_lochere=0, nz_rhs=0,
> lsol_loc=0, lrhs_loc=0, nloc_rhs=0, schur_mloc=0, schur_nloc=0, schur_lld=0,
> mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=...,
> write_problem=..., save_dir=..., save_prefix=..., tmpdirlen=20, prefixlen=20,
> write_problemlen=20, save_dirlen=20, save_prefixlen=20, metis_options=...) at
> zmumps_f77.F:289
> #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485
> #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248,
> A=0x7ffda7afdae0, info=0x1) at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683
> #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248,
> mat=0x7ffda7afdae0, info=0x1) at
> /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195
> #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131
> #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015
> #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at
> /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406
> #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123
> #20 0x00007f7309130462 in STSetUp (st=0xd70248) at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582
> #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350
> #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136
> #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248, __ierr=0x7ffda7afdae0) at
> /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85
> #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=...,
> b_pet=..., jthisone=<error reading variable: Cannot access memory at address
> 0x1>, isize=<error reading variable: Cannot access memory at address 0x0>) at
> small_slepc_example_program.F:322
> #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549
> #26 0x00000000004023f2 in main ()
> #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0 <main>, argc=14,
> argv=0x7ffda7b024e8, init=<optimized out>, fini=<optimized out>,
> rtld_fini=<optimized out>, stack_end=0x7ffda7b024d8) at
> ../csu/libc-start.c:308
> #28 0x00000000004022fe in _start ()
>
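For a sense of scale, frame #12 above reports nnz_loc=304384739 on this rank. A rough sketch of what that implies for the distributed matrix input alone, assuming 16 bytes per double-precision complex value (the "z" in zmumps) and 4-byte row and column indices (an assumption about this MUMPS build, not something the trace confirms):

```shell
#!/usr/bin/env bash
# Rough size of the per-rank matrix input handed to MUMPS in frame #12.
nnz_loc=304384739      # taken from the backtrace (zmumps_f77 arguments)
bytes_per_value=16     # double-precision complex
bytes_per_index=4      # ASSUMPTION: 32-bit MUMPS integers

values=$(( nnz_loc * bytes_per_value ))
indices=$(( 2 * nnz_loc * bytes_per_index ))   # irn_loc and jcn_loc
total=$(( values + indices ))

echo "values:  $values bytes"
echo "indices: $indices bytes"
echo "total:   $total bytes (~$(( total / 1000000000 )) GB)"
```

This counts only the matrix entries passed in on one rank; the LU factors and the MUMPS workspace come on top of it.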
> From: Matthew Knepley <[email protected]<mailto:[email protected]>>
> Sent: Tuesday, August 24, 2021 3:59 PM
> To: dazza simplythebest <[email protected]<mailto:[email protected]>>
> Cc: Jose E. Roman <[email protected]<mailto:[email protected]>>; PETSc
> <[email protected]<mailto:[email protected]>>
> Subject: Re: [petsc-users] Improving efficiency of slepc usage
>
> On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest
> <[email protected]<mailto:[email protected]>> wrote:
>
> Dear Matthew and Jose,
> Apologies for the delayed reply; I had a couple of unforeseen days off
> this week.
> Firstly, regarding Jose's suggestion re: MUMPS, the program is already using
> MUMPS to solve linear systems (the code is using a distributed MPI matrix to
> solve the generalised non-Hermitian complex problem).
>
> I have tried the gdb debugger as per Matthew's suggestion.
> Just to note, in case someone else is following this: at first it didn't
> work (couldn't 'attach'), but after some googling I found a tip suggesting
> the command:
> echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
> which seemed to get it working.
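
For anyone following along, the setting changed by that command can be inspected first. A minimal sketch, assuming a Linux kernel with the Yama security module; `describe_ptrace_scope` is a hypothetical helper name, and the value meanings follow the kernel's Yama documentation:

```shell
#!/usr/bin/env bash
# Describe a Yama ptrace_scope value.
describe_ptrace_scope() {
    case "$1" in
        0) echo "classic: a process may attach to any other process of the same user" ;;
        1) echo "restricted: only descendants (or declared tracers) may be attached to" ;;
        2) echo "admin-only: attaching requires CAP_SYS_PTRACE" ;;
        3) echo "no attach: ptrace attach is disabled" ;;
        *) echo "unknown value: $1" ;;
    esac
}

scope_file=/proc/sys/kernel/yama/ptrace_scope
if [ -r "$scope_file" ]; then
    describe_ptrace_scope "$(cat "$scope_file")"
else
    echo "Yama ptrace_scope is not available on this system"
fi
```

Writing 0 through /proc only lasts until the next reboot, so the change has to be repeated (or made persistent through sysctl configuration) for later debugging sessions.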
>
> I then first ran the debugger on the small matrix case that worked.
> That stopped in gdb almost immediately after starting execution
> with a report regarding 'nanosleep.c':
> ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory.
> However, issuing the 'cont' command set the program running again, and it
> ran through to the end of the execution without any problems and with
> correct-looking results, so I am guessing this message is not particularly
> important.
>
> We do that on purpose when the debugger starts up. Typing 'cont' is correct.
>
> I then tried the same debugging procedure on the large matrix case that fails.
> The code again stopped almost immediately after the start of execution with
> the same nanosleep error as before, and I was able to set the program running
> again with 'cont' (see full output below). I was running the code with 4 MPI
> processes, and so had 4 gdb windows appear. Thereafter the code ran for some
> time until completing the matrix construction, and then one of the gdb
> process windows printed the message
> Program terminated with signal SIGKILL, Killed.
> The program no longer exists.
> I then typed 'where' into this terminal but just received the message
> No stack.
>
> I have only seen this behavior one other time, and it was with Fortran.
> Fortran allows you to declare really big arrays
> on the stack by putting them at the start of a function (rather than F90
> malloc). When I had one of those arrays exceed
> the stack space, I got this kind of an error where everything is destroyed
> rather than just stopping. Could it be that you
> have a large structure on the stack?
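
One quick way to look into the stack hypothesis is to check the shell's stack limit before launching. A minimal sketch, assuming bash on Linux; whether the MPI launcher passes the limit on to the ranks depends on the launcher, so this is only a first check:

```shell
#!/usr/bin/env bash
# Print the soft stack limit of the current shell (in KiB, or "unlimited").
soft_stack=$(ulimit -S -s)
echo "soft stack limit: $soft_stack"

# Try to raise the soft limit to the hard limit for this shell and its children.
hard_stack=$(ulimit -H -s)
ulimit -S -s "$hard_stack" 2>/dev/null || echo "could not raise stack limit"
echo "soft stack limit now: $(ulimit -S -s)"
```

If the crash goes away after raising the limit, a large stack-allocated array becomes a more likely culprit.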
>
> Second, you can at least look at the stack for the processes that were not
> killed. You type Ctrl-C, which should give you
> the prompt and then "where".
>
> Thanks,
>
> Matt
>
> The other gdb windows basically seemed to be left in limbo until I issued the
> 'quit' command in the window that reported the SIGKILL, and then they
> vanished.
>
> I paste the full output from the gdb window that recorded the SIGKILL below.
> I guess it is necessary to somehow work out where the SIGKILL originates
> from?
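
On Linux, a SIGKILL with no stack and no PETSc error message is often the kernel's out-of-memory killer, which records each kill in the kernel log. A minimal sketch for checking that, assuming `dmesg` is readable on the machine (it may need sudo); `find_oom_kills` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Filter kernel log text on stdin for OOM-killer messages.
find_oom_kills() {
    grep -i -E 'out of memory|killed process' || true
}

# Check the live kernel log. Prints nothing if there were no OOM kills,
# or if dmesg is not readable without elevated privileges.
dmesg 2>/dev/null | find_oom_kills
```

A matching line naming the executable and its PID would point to memory exhaustion rather than a bug in the program.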
>
> Thanks once again,
> Dan.
>
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
> Copyright (C) 2020 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
>
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from ./stab1.exe...
> Attaching to program:
> /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe,
> process 675919
> Reading symbols from
> /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15...
> Reading symbols from
> /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15...
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type <RET> for
> more, q to quit, c to continue without paging--cont
> /intel64_lin/libmkl_intel_lp64.so...
> (No debugging symbols found in
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so)
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so...
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so...
> (No debugging symbols found in
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so)
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so...
> (No debugging symbols found in
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so)
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so...
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg...
> Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so...
> Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...
> Reading symbols from
> /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug...
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6...
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12...
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12...
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg...
> Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so...
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5...
> (No debugging symbols found in
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5)
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so...
> (No debugging symbols found in
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so)
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so...
> (No debugging symbols found in
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so)
> Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so...
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so...
> (No debugging symbols found in
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so)
> Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...
> (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1)
> Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0...
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0)
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so...
> (No debugging symbols found in
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so)
> Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so...
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so...
> (No debugging symbols found in
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so)
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5...
> (No debugging symbols found in
> /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5)
> Reading symbols from /lib64/ld-linux-x86-64.so.2...
> Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so...
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1...
> (No debugging symbols found in
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1)
> Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so...
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so)
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so...
> (No debugging symbols found in
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so)
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so...
> (No debugging symbols found in
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so)
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so...
> (No debugging symbols found in
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so)
> Reading symbols from
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so...
> (No debugging symbols found in
> /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so)
> Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2...
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2)
> 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=<optimized out>,
> clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffdc641a9a0,
> rem=rem@entry=0x7ffdc641a9a0) at
> ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
> 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or
> directory.
> (gdb) cont
> Continuing.
> [New Thread 0x7f9e49c02780 (LWP 676559)]
> [New Thread 0x7f9e49400800 (LWP 676560)]
> [New Thread 0x7f9e48bfe880 (LWP 676562)]
> [Thread 0x7f9e48bfe880 (LWP 676562) exited]
> [Thread 0x7f9e49400800 (LWP 676560) exited]
> [Thread 0x7f9e49c02780 (LWP 676559) exited]
>
> Program terminated with signal SIGKILL, Killed.
> The program no longer exists.
> (gdb) where
> No stack.
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> From: Matthew Knepley <[email protected]<mailto:[email protected]>>
> Sent: Friday, August 20, 2021 2:12 PM
> To: dazza simplythebest <[email protected]<mailto:[email protected]>>
> Cc: Jose E. Roman <[email protected]<mailto:[email protected]>>; PETSc
> <[email protected]<mailto:[email protected]>>
> Subject: Re: [petsc-users] Improving efficiency of slepc usage
>
> On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest
> <[email protected]<mailto:[email protected]>> wrote:
> Dear Jose,
> Many thanks for your response; I have been investigating this issue with a
> few more calculations today, hence the slightly delayed response.
>
> The problem is actually derived from a fluid dynamics problem, so to allow an
> easier exploration of things I first downsized the resolution of the
> underlying fluid solver while keeping all the physical parameters the same -
> i.e. I would get a smaller matrix that should be solving the same physical
> problem as the original larger matrix but to lower accuracy.
>
> Results
>
> Small matrix (N = 21168) - everything good!
> This converged when using the -eps_largest_real approach (taking 92
> iterations for nev = 10, tol = 5.0000E-06 and ncv = 300), and also when using
> the shift-invert approach, converging very impressively in a single
> iteration! Interestingly it did this both for a non-zero -eps_target
> and also for a zero -eps_target.
>
> Large matrix (N = 50400) - works for -eps_largest_real, fails for -st_type
> sinvert
> I have just double-checked that the code does run properly when we use the
> -eps_largest_real option - indeed I ran it with a small nev and large
> tolerance (nev = 4, -eps_tol 5.0e-4, ncv = 300) and with these parameters
> convergence was obtained in 164 iterations, which took 6 hours on the
> machine I was running it on. Furthermore the eigenvalues seem to be ballpark
> correct; for this large higher-resolution case (although with a looser slepc
> tolerance) we obtain
> 1789.56816314173 -4724.51319554773i
> as the eigenvalue with largest real part, while the smaller matrix (same
> physical problem but at lower resolution) found this eigenvalue to be
> 1831.11845726501 -4787.54519511345i, which means the agreement is in line
> with expectations.
>
> Unfortunately the code does still crash when I try to do shift-invert for
> the large matrix case, whether or not I use a non-zero -eps_target. For
> reference this is the command line used:
> -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type
> sinvert -eps_monitor :monitor_output05.txt
> To be precise the code crashes soon after calling EPSSolve (it successfully
> calls MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and
> EPSSetFromOptions).
> By crashes I mean that I do not even get any error messages from slepc/PETSc,
> and do not even get the 'EPS Object: 16 MPI processes' message - I simply get
> an MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message as soon as EPSSolve is
> called.
>
> Hi Dan,
>
> It would help track this error down if we had a stack trace. You can get a
> stack trace from the debugger. You run with
>
> -start_in_debugger
>
> which should launch the debugger (usually), and then type
>
> cont
>
> to continue, and then
>
> where
>
> to get the stack trace when it crashes, or 'bt' on lldb.
>
> Thanks,
>
> Matt
>
> Do you have any ideas as to why this larger matrix case should fail when
> using shift-invert but succeed when using -eps_largest_real? The fact that
> the program works and produces correct results when using the
> -eps_largest_real option suggests that there is probably nothing wrong with
> the specification of the problem or the matrices. It is strange that there is
> no error message from slepc/PETSc ... the only idea I have at the moment is
> that perhaps max memory has been exceeded, which could cause such a sudden
> shutdown? For your reference, when running the large matrix case with the
> -eps_largest_real option I am using about 36 GB of the 148 GB available on
> this machine - does the shift-invert approach require substantially more
> memory, for example?
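
As a rough upper bound on why shift-invert can need far more memory: it factorizes A - sigma*B, and the LU factors of a sparse matrix fill in, in the worst case up to fully dense. A back-of-the-envelope sketch for N = 50400, assuming 16-byte double-precision complex scalars; the real figure depends on the sparsity pattern, the ordering, and the solver's workspace, and `dense_factor_bytes` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Worst-case (fully dense) storage for the LU factors of an n-by-n matrix.
dense_factor_bytes() {
    local n=$1 bytes_per_scalar=$2
    echo $(( n * n * bytes_per_scalar ))
}

bytes=$(dense_factor_bytes 50400 16)
echo "dense LU bound: $bytes bytes (~$(( bytes / 1000000000 )) GB)"
```

Even this single-copy bound is of the same order as the 36 GB already in use, so exhausting memory during the factorization is plausible.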
>
> I would be very grateful if you have any suggestions to resolve this issue
> or even ways to clarify it further; the performance I have seen with
> shift-invert for the small matrix is so impressive that it would be great to
> get it working for the full-size problem.
>
> Many thanks and best wishes,
> Dan.
>
>
>
> From: Jose E. Roman <[email protected]<mailto:[email protected]>>
> Sent: Thursday, August 19, 2021 7:58 AM
> To: dazza simplythebest <[email protected]<mailto:[email protected]>>
> Cc: PETSc <[email protected]<mailto:[email protected]>>
> Subject: Re: [petsc-users] Improving efficiency of slepc usage
>
> In A) convergence may be slow, especially if the wanted eigenvalues have
> small magnitude. I would not say 600 iterations is a lot, you probably need
> many more. In most cases, approach B) is better because it improves
> convergence of eigenvalues close to the target, but it requires prior
> knowledge of your spectrum distribution in order to choose an appropriate
> target.
>
> In B) what do you mean by "it crashes"? If you get an error about
> factorization, it means that your A-matrix is singular. In that case, try
> using a nonzero target: -eps_target 0.1
>
> Jose
>
>
> > On 19 Aug 2021, at 7:12, dazza simplythebest
> > <[email protected]<mailto:[email protected]>> wrote:
> >
> > Dear All,
> > I am planning on using slepc to do a large number of eigenvalue
> > calculations of a generalized eigenvalue problem, called from a program
> > written in Fortran using MPI.
> > Thus far I have successfully installed the slepc/PETSc software, both
> > locally and on a cluster, and on smaller test problems everything is
> > working well; the matrices are efficiently and correctly constructed and
> > slepc returns the correct spectrum. I am just now starting to move
> > towards solving the full-size 'production run' problems, and would
> > appreciate some general advice on how to improve the solver's performance.
> >
> > In particular, I am currently trying to solve the problem Ax = lambda Bx
> > whose matrices are of size 50000 (this is the smallest 'production run'
> > problem I will be tackling), and are complex, non-Hermitian. In most cases
> > I aim to find the eigenvalues with the largest real part, although in
> > other cases I will also be interested in finding the eigenvalues whose
> > real part is close to zero.
> >
> > A)
> > Calling slepc 's EPS solver with the following options:
> >
> > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol
> > 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt
> >
> >
> > led to the code successfully running, but failing to find any eigenvalues
> > within the maximum 600 iterations
> > (examining the monitor output it did appear to be very slowly approaching
> > convergence).
> >
> > B)
> > On the same problem I have also tried a shift-invert transformation using
> > the options
> >
> > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert
> >
> > - in this case the code crashed at the point it tried to call slepc, so
> > perhaps I have incorrectly specified these options?
> >
> >
> > Does anyone have any suggestions as to how to improve this performance (or
> > find out more about the problem)?
> > In the case of A) I can see from watching the slepc videos that increasing
> > ncv may help, but I am wondering, since 600 is a large number of
> > iterations, whether there may be something else going on - e.g. perhaps
> > some alternative preconditioner may help?
> > In the case of B), I guess there must be some mistake in these command line
> > options?
> > Again, any advice will be greatly appreciated.
> > Best wishes, Dan.
>
>
>
> --
> What most experimenters take for granted before they begin their experiments
> is infinitely more interesting than any results to which their experiments
> lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
--
What most experimenters take for granted before they begin their experiments is
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/<http://www.cse.buffalo.edu/~knepley/>