Regarding the MUMPS issue, I’m not sure if this is useful, but when I run with the MUMPS option -mat_mumps_icntl_4 4 to see the progress, it hangs at this point:
... Structural symmetry (in percent)= 75
Density: NBdense, Average, Median = 2 9 7
Ordering based on METIS

-gideon

> On Aug 22, 2015, at 5:12 PM, Barry Smith <[email protected]> wrote:
>
>
>> On Aug 22, 2015, at 4:04 PM, Gideon Simpson <[email protected]> wrote:
>>
>> I’m having issues with both SuperLU_DIST and MUMPS, as compiled by PETSc, in
>> the following sense:
>>
>> 1. For large enough systems, which seems to vary depending on which
>> computer I’m on, MUMPS seems to just die and never start, when it’s used as
>> the linear solver within a SNES. There’s no error message, it just sits
>> there and doesn’t do anything.
>
> You will need to use a debugger to figure out where it is "hanging"; we
> haven't heard reports about this.
>>
>> 2. When running with SuperLU_DIST, I got the following error, with no
>> further information:
>
> The last release of SuperLU_DIST had some pretty nasty bugs, memory
> corruption that caused crashes etc. We think they are now fixed if you use
> the maint branch of the PETSc repository and --download-superlu_dist. If you
> stick with the PETSc release and SuperLU_DIST you are using you will keep
> seeing these crashes.
>
> Barry
>
>
>>
>> [3]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
>> probably memory access out of range
>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [3]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
>> to find memory corruption errors
>> [3]PETSC ERROR: likely location of problem given in stack below
>> [3]PETSC ERROR: --------------------- Stack Frames
>> ------------------------------------
>> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [3]PETSC ERROR: INSTEAD the line number of the start of the function
>> [3]PETSC ERROR: is given.
>> [3]PETSC ERROR: [3] SuperLU_DIST:pdgssvx line 161
>> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [3]PETSC ERROR: [3] MatSolve_SuperLU_DIST line 121
>> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [3]PETSC ERROR: [3] MatSolve line 3104
>> /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
>> [3]PETSC ERROR: [3] PCApply_LU line 194
>> /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
>> [3]PETSC ERROR: [3] KSP_PCApplyBAorAB line 258
>> /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
>> [3]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [3]PETSC ERROR: Signal received
>> [3]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
>> trouble shooting.
>> [3]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015
>> [3]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by
>> simpson Sat Aug 22 17:01:41 2015
>> [3]PETSC ERROR: Configure options
>> --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed
>> --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a
>> --with-lapack-lib=/liblapack.a --download-suitesparse=yes
>> --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes
>> --download-metis=yes --download-parmetis=yes --download-scalapack=yes
>> [3]PETSC ERROR: #1 User provided function() line 0 in unknown file
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
>> with errorcode 59.
>>
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>> --------------------------------------------------------------------------
>> [proteusi01:14037] 1 more process has sent help message help-mpi-api.txt /
>> mpi-abort
>> [proteusi01:14037] Set MCA parameter "orte_base_help_aggregate" to 0 to see
>> all help / error messages
>> [6]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [6]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
>> batch system) has told this process to end
>> [6]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [6]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [6]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
>> to find memory corruption errors
>> [6]PETSC ERROR: likely location of problem given in stack below
>> [6]PETSC ERROR: --------------------- Stack Frames
>> ------------------------------------
>> [6]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [6]PETSC ERROR: INSTEAD the line number of the start of the function
>> [6]PETSC ERROR: is given.
>> [6]PETSC ERROR: [6] SuperLU_DIST:pdgssvx line 161
>> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [6]PETSC ERROR: [6] MatSolve_SuperLU_DIST line 121
>> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [6]PETSC ERROR: [6] MatSolve line 3104
>> /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
>> [6]PETSC ERROR: [6] PCApply_LU line 194
>> /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
>> [6]PETSC ERROR: [6] KSP_PCApplyBAorAB line 258
>> /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
>> [6]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [6]PETSC ERROR: Signal received
>> [6]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
>> trouble shooting.
>> [6]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015
>> [6]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by
>> simpson Sat Aug 22 17:01:41 2015
>> [6]PETSC ERROR: Configure options
>> --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed
>> --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a
>> --with-lapack-lib=/liblapack.a --download-suitesparse=yes
>> --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes
>> --download-metis=yes --download-parmetis=yes --download-scalapack=yes
>> [6]PETSC ERROR: #1 User provided function() line 0 in unknown file
>> [7]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [7]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
>> batch system) has told this process to end
>> [7]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [7]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [7]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
>> to find memory corruption errors
>> [7]PETSC ERROR: likely location of problem given in stack below
>> [7]PETSC ERROR: --------------------- Stack Frames
>> ------------------------------------
>> [7]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [7]PETSC ERROR: INSTEAD the line number of the start of the function
>> [7]PETSC ERROR: is given.
>> [7]PETSC ERROR: [7] SuperLU_DIST:pdgssvx line 161
>> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [7]PETSC ERROR: [7] MatSolve_SuperLU_DIST line 121
>> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [7]PETSC ERROR: [7] MatSolve line 3104
>> /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
>> [7]PETSC ERROR: [7] PCApply_LU line 194
>> /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
>> [7]PETSC ERROR: [7] KSP_PCApplyBAorAB line 258
>> /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
>> [7]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [7]PETSC ERROR: Signal received
>> [7]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
>> trouble shooting.
>> [7]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015
>> [7]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by
>> simpson Sat Aug 22 17:01:41 2015
>> [7]PETSC ERROR: Configure options
>> --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed
>> --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a
>> --with-lapack-lib=/liblapack.a --download-suitesparse=yes
>> --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes
>> --download-metis=yes --download-parmetis=yes --download-scalapack=yes
>> [7]PETSC ERROR: #1 User provided function() line 0 in unknown file
>> [0]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
>> batch system) has told this process to end
>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [0]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
>> to find memory corruption errors
>> [0]PETSC ERROR: likely location of problem given in stack below
>> [0]PETSC ERROR: --------------------- Stack Frames
>> ------------------------------------
>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [0]PETSC ERROR: INSTEAD the line number of the start of the function
>> [0]PETSC ERROR: is given.
>> [0]PETSC ERROR: [0] SuperLU_DIST:pdgssvx line 161
>> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [0]PETSC ERROR: [0] MatSolve_SuperLU_DIST line 121
>> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [0]PETSC ERROR: [0] MatSolve line 3104
>> /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
>> [0]PETSC ERROR: [0] PCApply_LU line 194
>> /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
>> [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 258
>> /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
>> [0]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [0]PETSC ERROR: Signal received
>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
>> trouble shooting.
>> [0]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015
>> [0]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by
>> simpson Sat Aug 22 17:01:41 2015
>> [0]PETSC ERROR: Configure options
>> --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed
>> --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a
>> --with-lapack-lib=/liblapack.a --download-suitesparse=yes
>> --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes
>> --download-metis=yes --download-parmetis=yes --download-scalapack=yes
>> [0]PETSC ERROR: #1 User provided function() line 0 in unknown file
>> [1]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [1]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
>> batch system) has told this process to end
>> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [1]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
>> to find memory corruption errors
>> [1]PETSC ERROR: likely location of problem given in stack below
>> [1]PETSC ERROR: --------------------- Stack Frames
>> ------------------------------------
>> [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [1]PETSC ERROR: INSTEAD the line number of the start of the function
>> [1]PETSC ERROR: is given.
>> [1]PETSC ERROR: [1] SuperLU_DIST:pdgssvx line 161
>> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [1]PETSC ERROR: [1] MatSolve_SuperLU_DIST line 121
>> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [1]PETSC ERROR: [1] MatSolve line 3104
>> /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
>> [1]PETSC ERROR: [1] PCApply_LU line 194
>> /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
>> [1]PETSC ERROR: [1] KSP_PCApplyBAorAB line 258
>> /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
>> [1]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [1]PETSC ERROR: Signal received
>> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
>> trouble shooting.
>> [1]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015
>> [1]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by
>> simpson Sat Aug 22 17:01:41 2015
>> [1]PETSC ERROR: Configure options
>> --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed
>> --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a
>> --with-lapack-lib=/liblapack.a --download-suitesparse=yes
>> --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes
>> --download-metis=yes --download-parmetis=yes --download-scalapack=yes
>> [1]PETSC ERROR: #1 User provided function() line 0 in unknown file
>> [2]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [2]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
>> batch system) has told this process to end
>> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [2]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
>> to find memory corruption errors
>> [2]PETSC ERROR: likely location of problem given in stack below
>> [2]PETSC ERROR: --------------------- Stack Frames
>> ------------------------------------
>> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [2]PETSC ERROR: INSTEAD the line number of the start of the function
>> [2]PETSC ERROR: is given.
>> [2]PETSC ERROR: [2] SuperLU_DIST:pdgssvx line 161
>> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [2]PETSC ERROR: [2] MatSolve_SuperLU_DIST line 121
>> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [2]PETSC ERROR: [2] MatSolve line 3104
>> /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
>> [2]PETSC ERROR: [2] PCApply_LU line 194
>> /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
>> [2]PETSC ERROR: [2] KSP_PCApplyBAorAB line 258
>> /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
>> [2]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [2]PETSC ERROR: Signal received
>> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
>> trouble shooting.
>> [2]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015
>> [2]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by
>> simpson Sat Aug 22 17:01:41 2015
>> [2]PETSC ERROR: Configure options
>> --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed
>> --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a
>> --with-lapack-lib=/liblapack.a --download-suitesparse=yes
>> --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes
>> --download-metis=yes --download-parmetis=yes --download-scalapack=yes
>> [2]PETSC ERROR: #1 User provided function() line 0 in unknown file
>> [4]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [4]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
>> batch system) has told this process to end
>> [4]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [4]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [4]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
>> to find memory corruption errors
>> [4]PETSC ERROR: likely location of problem given in stack below
>> [4]PETSC ERROR: --------------------- Stack Frames
>> ------------------------------------
>> [4]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [4]PETSC ERROR: INSTEAD the line number of the start of the function
>> [4]PETSC ERROR: is given.
>> [4]PETSC ERROR: [4] SuperLU_DIST:pdgssvx line 161
>> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [4]PETSC ERROR: [4] MatSolve_SuperLU_DIST line 121
>> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [4]PETSC ERROR: [4] MatSolve line 3104
>> /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
>> [4]PETSC ERROR: [4] PCApply_LU line 194
>> /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
>> [4]PETSC ERROR: [4] KSP_PCApplyBAorAB line 258
>> /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
>> [4]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [4]PETSC ERROR: Signal received
>> [4]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
>> trouble shooting.
>> [4]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015
>> [4]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by
>> simpson Sat Aug 22 17:01:41 2015
>> [4]PETSC ERROR: Configure options
>> --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed
>> --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a
>> --with-lapack-lib=/liblapack.a --download-suitesparse=yes
>> --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes
>> --download-metis=yes --download-parmetis=yes --download-scalapack=yes
>> [4]PETSC ERROR: #1 User provided function() line 0 in unknown file
>> [5]PETSC ERROR:
>> ------------------------------------------------------------------------
>> [5]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
>> batch system) has told this process to end
>> [5]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [5]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [5]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
>> to find memory corruption errors
>> [5]PETSC ERROR: likely location of problem given in stack below
>> [5]PETSC ERROR: --------------------- Stack Frames
>> ------------------------------------
>> [5]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>> [5]PETSC ERROR: INSTEAD the line number of the start of the function
>> [5]PETSC ERROR: is given.
>> [5]PETSC ERROR: [5] SuperLU_DIST:pdgssvx line 161
>> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [5]PETSC ERROR: [5] MatSolve_SuperLU_DIST line 121
>> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
>> [5]PETSC ERROR: [5] MatSolve line 3104
>> /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
>> [5]PETSC ERROR: [5] PCApply_LU line 194
>> /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
>> [5]PETSC ERROR: [5] KSP_PCApplyBAorAB line 258
>> /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
>> [5]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [5]PETSC ERROR: Signal received
>> [5]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
>> trouble shooting.
>> [5]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015
>> [5]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by
>> simpson Sat Aug 22 17:01:41 2015
>> [5]PETSC ERROR: Configure options
>> --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed
>> --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a
>> --with-lapack-lib=/liblapack.a --download-suitesparse=yes
>> --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes
>> --download-metis=yes --download-parmetis=yes --download-scalapack=yes
>> [5]PETSC ERROR: #1 User provided function() line 0 in unknown file
>>
>> -gideon
>>
>>
>
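
A note on reading the quoted trace: only rank 3 caught signal 11 (SEGV). Ranks 0, 1, 2, 4, 5, 6, and 7 caught signal 15, which fits the Open MPI note that MPI_ABORT on rank 3 kills the remaining processes, so rank 3 looks like the one worth attaching a debugger to. Below is a minimal Python sketch, not part of PETSc, that pulls this per-signal summary out of saved output. The function name `ranks_by_signal` is hypothetical, and the pattern assumes the "[rank]PETSC ERROR: Caught signal number N" wording seen in this 3.5.4 trace.

```python
import re
from collections import defaultdict

# Matches lines such as "[3]PETSC ERROR: Caught signal number 11 SEGV: ..."
# (wording taken from the trace in this thread; other versions may differ).
SIGNAL_RE = re.compile(r"\[(\d+)\]PETSC ERROR: Caught signal number (\d+)")


def ranks_by_signal(log_text):
    """Map each signal number to the sorted list of MPI ranks that caught it."""
    found = defaultdict(set)
    for rank, signal in SIGNAL_RE.findall(log_text):
        found[int(signal)].add(int(rank))
    return {signal: sorted(ranks) for signal, ranks in found.items()}
```

Calling `ranks_by_signal(open("run.log").read())` on a saved copy of the output (the file name is only an example) returns a dictionary keyed by signal number, which makes the rank that actually crashed easy to tell apart from the ranks that were merely terminated.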
