> On Aug 22, 2015, at 4:04 PM, Gideon Simpson <[email protected]> wrote:
>
> I’m having issues with both SuperLU dist and MUMPS, as compiled by PETsc, in
> the following sense:
>
> 1. For large enough systems, which seems to vary depending on which computer
> I’m on, MUMPS seems to just die and never start, when it’s used as the linear
> solver within a SNES. There’s no error message, it just sits there and
> doesn’t do anything.

  You will need to use a debugger to figure out where it is "hanging"; we haven't heard reports about this.

>
> 2. When running with SuperLU dist, I got the following error, with no
> further information:

  The last release of SuperLU_DIST had some pretty nasty bugs, including memory corruption that caused crashes. We think they are now fixed if you use the maint branch of the PETSc repository and --download-superlu_dist.

  If you stick with the PETSc release and the SuperLU_DIST version you are using, you will keep seeing these crashes.

  Barry

>
> [3]PETSC ERROR:
> ------------------------------------------------------------------------
> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [3]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
> find memory corruption errors
> [3]PETSC ERROR: likely location of problem given in stack below
> [3]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [3]PETSC ERROR: INSTEAD the line number of the start of the function
> [3]PETSC ERROR: is given.
> [3]PETSC ERROR: [3] SuperLU_DIST:pdgssvx line 161
> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [3]PETSC ERROR: [3] MatSolve_SuperLU_DIST line 121
> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [3]PETSC ERROR: [3] MatSolve line 3104
> /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
> [3]PETSC ERROR: [3] PCApply_LU line 194
> /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
> [3]PETSC ERROR: [3] KSP_PCApplyBAorAB line 258
> /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
> [3]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [3]PETSC ERROR: Signal received
> [3]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [3]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015
> [3]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by
> simpson Sat Aug 22 17:01:41 2015
> [3]PETSC ERROR: Configure options
> --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed
> --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a
> --with-lapack-lib=/liblapack.a --download-suitesparse=yes
> --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes
> --download-metis=yes --download-parmetis=yes --download-scalapack=yes
> [3]PETSC ERROR: #1 User provided function() line 0 in unknown file
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
> with errorcode 59.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> [proteusi01:14037] 1 more process has sent help message help-mpi-api.txt /
> mpi-abort
> [proteusi01:14037] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> all help / error messages
> [6]PETSC ERROR:
> ------------------------------------------------------------------------
> [6]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch
> system) has told this process to end
> [6]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [6]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [6]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
> find memory corruption errors
> [6]PETSC ERROR: likely location of problem given in stack below
> [6]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> [6]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [6]PETSC ERROR: INSTEAD the line number of the start of the function
> [6]PETSC ERROR: is given.
> [6]PETSC ERROR: [6] SuperLU_DIST:pdgssvx line 161
> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [6]PETSC ERROR: [6] MatSolve_SuperLU_DIST line 121
> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [6]PETSC ERROR: [6] MatSolve line 3104
> /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
> [6]PETSC ERROR: [6] PCApply_LU line 194
> /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
> [6]PETSC ERROR: [6] KSP_PCApplyBAorAB line 258
> /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
> [6]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [6]PETSC ERROR: Signal received
> [6]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [6]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015
> [6]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by
> simpson Sat Aug 22 17:01:41 2015
> [6]PETSC ERROR: Configure options
> --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed
> --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a
> --with-lapack-lib=/liblapack.a --download-suitesparse=yes
> --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes
> --download-metis=yes --download-parmetis=yes --download-scalapack=yes
> [6]PETSC ERROR: #1 User provided function() line 0 in unknown file
> [7]PETSC ERROR:
> ------------------------------------------------------------------------
> [7]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch
> system) has told this process to end
> [7]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [7]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [7]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
> find memory corruption errors
> [7]PETSC ERROR: likely location of problem given in stack below
> [7]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> [7]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [7]PETSC ERROR: INSTEAD the line number of the start of the function
> [7]PETSC ERROR: is given.
> [7]PETSC ERROR: [7] SuperLU_DIST:pdgssvx line 161
> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [7]PETSC ERROR: [7] MatSolve_SuperLU_DIST line 121
> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [7]PETSC ERROR: [7] MatSolve line 3104
> /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
> [7]PETSC ERROR: [7] PCApply_LU line 194
> /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
> [7]PETSC ERROR: [7] KSP_PCApplyBAorAB line 258
> /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
> [7]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [7]PETSC ERROR: Signal received
> [7]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [7]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015
> [7]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by
> simpson Sat Aug 22 17:01:41 2015
> [7]PETSC ERROR: Configure options
> --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed
> --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a
> --with-lapack-lib=/liblapack.a --download-suitesparse=yes
> --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes
> --download-metis=yes --download-parmetis=yes --download-scalapack=yes
> [7]PETSC ERROR: #1 User provided function() line 0 in unknown file
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch
> system) has told this process to end
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
> find memory corruption errors
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [0]PETSC ERROR: INSTEAD the line number of the start of the function
> [0]PETSC ERROR: is given.
> [0]PETSC ERROR: [0] SuperLU_DIST:pdgssvx line 161
> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [0]PETSC ERROR: [0] MatSolve_SuperLU_DIST line 121
> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [0]PETSC ERROR: [0] MatSolve line 3104
> /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
> [0]PETSC ERROR: [0] PCApply_LU line 194
> /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
> [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 258
> /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Signal received
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015
> [0]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by
> simpson Sat Aug 22 17:01:41 2015
> [0]PETSC ERROR: Configure options
> --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed
> --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a
> --with-lapack-lib=/liblapack.a --download-suitesparse=yes
> --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes
> --download-metis=yes --download-parmetis=yes --download-scalapack=yes
> [0]PETSC ERROR: #1 User provided function() line 0 in unknown file
> [1]PETSC ERROR:
> ------------------------------------------------------------------------
> [1]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch
> system) has told this process to end
> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [1]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
> find memory corruption errors
> [1]PETSC ERROR: likely location of problem given in stack below
> [1]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [1]PETSC ERROR: INSTEAD the line number of the start of the function
> [1]PETSC ERROR: is given.
> [1]PETSC ERROR: [1] SuperLU_DIST:pdgssvx line 161
> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [1]PETSC ERROR: [1] MatSolve_SuperLU_DIST line 121
> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [1]PETSC ERROR: [1] MatSolve line 3104
> /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
> [1]PETSC ERROR: [1] PCApply_LU line 194
> /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
> [1]PETSC ERROR: [1] KSP_PCApplyBAorAB line 258
> /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
> [1]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [1]PETSC ERROR: Signal received
> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015
> [1]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by
> simpson Sat Aug 22 17:01:41 2015
> [1]PETSC ERROR: Configure options
> --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed
> --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a
> --with-lapack-lib=/liblapack.a --download-suitesparse=yes
> --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes
> --download-metis=yes --download-parmetis=yes --download-scalapack=yes
> [1]PETSC ERROR: #1 User provided function() line 0 in unknown file
> [2]PETSC ERROR:
> ------------------------------------------------------------------------
> [2]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch
> system) has told this process to end
> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [2]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
> find memory corruption errors
> [2]PETSC ERROR: likely location of problem given in stack below
> [2]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [2]PETSC ERROR: INSTEAD the line number of the start of the function
> [2]PETSC ERROR: is given.
> [2]PETSC ERROR: [2] SuperLU_DIST:pdgssvx line 161
> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [2]PETSC ERROR: [2] MatSolve_SuperLU_DIST line 121
> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [2]PETSC ERROR: [2] MatSolve line 3104
> /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
> [2]PETSC ERROR: [2] PCApply_LU line 194
> /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
> [2]PETSC ERROR: [2] KSP_PCApplyBAorAB line 258
> /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
> [2]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [2]PETSC ERROR: Signal received
> [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [2]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015
> [2]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by
> simpson Sat Aug 22 17:01:41 2015
> [2]PETSC ERROR: Configure options
> --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed
> --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a
> --with-lapack-lib=/liblapack.a --download-suitesparse=yes
> --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes
> --download-metis=yes --download-parmetis=yes --download-scalapack=yes
> [2]PETSC ERROR: #1 User provided function() line 0 in unknown file
> [4]PETSC ERROR:
> ------------------------------------------------------------------------
> [4]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch
> system) has told this process to end
> [4]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [4]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [4]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
> find memory corruption errors
> [4]PETSC ERROR: likely location of problem given in stack below
> [4]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> [4]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [4]PETSC ERROR: INSTEAD the line number of the start of the function
> [4]PETSC ERROR: is given.
> [4]PETSC ERROR: [4] SuperLU_DIST:pdgssvx line 161
> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [4]PETSC ERROR: [4] MatSolve_SuperLU_DIST line 121
> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [4]PETSC ERROR: [4] MatSolve line 3104
> /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
> [4]PETSC ERROR: [4] PCApply_LU line 194
> /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
> [4]PETSC ERROR: [4] KSP_PCApplyBAorAB line 258
> /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
> [4]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [4]PETSC ERROR: Signal received
> [4]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [4]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015
> [4]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by
> simpson Sat Aug 22 17:01:41 2015
> [4]PETSC ERROR: Configure options
> --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed
> --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a
> --with-lapack-lib=/liblapack.a --download-suitesparse=yes
> --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes
> --download-metis=yes --download-parmetis=yes --download-scalapack=yes
> [4]PETSC ERROR: #1 User provided function() line 0 in unknown file
> [5]PETSC ERROR:
> ------------------------------------------------------------------------
> [5]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch
> system) has told this process to end
> [5]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [5]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [5]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
> find memory corruption errors
> [5]PETSC ERROR: likely location of problem given in stack below
> [5]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> [5]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [5]PETSC ERROR: INSTEAD the line number of the start of the function
> [5]PETSC ERROR: is given.
> [5]PETSC ERROR: [5] SuperLU_DIST:pdgssvx line 161
> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [5]PETSC ERROR: [5] MatSolve_SuperLU_DIST line 121
> /home/simpson/software/petsc-3.5.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c
> [5]PETSC ERROR: [5] MatSolve line 3104
> /home/simpson/software/petsc-3.5.4/src/mat/interface/matrix.c
> [5]PETSC ERROR: [5] PCApply_LU line 194
> /home/simpson/software/petsc-3.5.4/src/ksp/pc/impls/factor/lu/lu.c
> [5]PETSC ERROR: [5] KSP_PCApplyBAorAB line 258
> /home/simpson/software/petsc-3.5.4/include/petsc-private/kspimpl.h
> [5]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [5]PETSC ERROR: Signal received
> [5]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [5]PETSC ERROR: Petsc Release Version 3.5.4, May, 23, 2015
> [5]PETSC ERROR: ./blowup_batch2 on a arch-linux2-c-debug named proteusi01 by
> simpson Sat Aug 22 17:01:41 2015
> [5]PETSC ERROR: Configure options
> --with-mpi-dir=/mnt/HA/opt/openmpi/intel/64/1.8.1-mlnx-ofed
> --with-blas-lib=/mnt/HA/opt/blas/gcc/64/20110419/libblas.a
> --with-lapack-lib=/liblapack.a --download-suitesparse=yes
> --download-superlu=yes --download-superlu_dist=yes --download-mumps=yes
> --download-metis=yes --download-parmetis=yes --download-scalapack=yes
> [5]PETSC ERROR: #1 User provided function() line 0 in unknown file
>
> -gideon
>
>
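
A side note on reading a trace like the one quoted above: only rank 3 reports signal 11 (SEGV). The other ranks report signal 15, most likely because Open MPI terminated them after MPI_ABORT was invoked on rank 3, so rank 3 is the one to look at in a debugger. The following is a minimal Python sketch for sorting that out in a long log. It is not part of PETSc; the function name ranks_by_signal and the shortened sample text are assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as "[3]PETSC ERROR: Caught signal number 11 SEGV: ..."
# The pattern is unanchored, so "> "-quoted log lines match as well.
SIGNAL_RE = re.compile(r"\[(\d+)\]PETSC ERROR: Caught signal number (\d+)")


def ranks_by_signal(log_text):
    """Map each signal number to the sorted list of MPI ranks that reported it."""
    found = defaultdict(set)
    for rank, signal in SIGNAL_RE.findall(log_text):
        found[int(signal)].add(int(rank))
    return {signal: sorted(ranks) for signal, ranks in found.items()}


# Hypothetical, shortened excerpt in the same shape as the quoted log.
sample = """\
> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> [6]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch
> [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch
"""

for signal, ranks in sorted(ranks_by_signal(sample).items()):
    print(f"signal {signal}: ranks {ranks}")
```

The rank listed under signal 11 is the one that actually crashed; the ranks under signal 15 were only told to stop.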
