> MPI_ABORT was invoked on rank 11 in communicator MPI_COMM_WORLD
Please send ALL the output. In particular, since rank 11 seems to have choked, we need to see all the messages from [11] to see what it thinks has gone wrong.

  Barry

On Aug 27, 2014, at 4:27 PM, Evan Um <[email protected]> wrote:

> Dear PETSc users,
>
> I am trying to solve a large problem (about 9,000,000 unknowns) with a large
> number of processes (about 400 processes and 1 TB of memory). I believe these
> resources are reasonable for this problem because I was able to solve the
> same problem using serial MUMPS with 500 GB, although that took a very long
> time. The same code was parallelized with PETSc. However, my PETSc code
> suddenly crashes after KSPSolve() successfully calls MUMPS, as shown below.
> If this problem comes from MUMPS, I would expect MUMPS to produce an error
> report (ICNTL(4)=3), but no error report was generated. Has anyone had a
> similar experience with PETSc+MUMPS? I would appreciate any comments on
> troubleshooting it. Thanks in advance for your help.
>
> Regards,
> Evan
>
> Code:
>
> KSPCreate(PETSC_COMM_WORLD, &ksp);
> KSPSetOperators(ksp, A, A);
> KSPSetType(ksp, KSPPREONLY);
> KSPGetPC(ksp, &pc);
> MatSetOption(A, MAT_SPD, PETSC_TRUE);
> PCSetType(pc, PCCHOLESKY);
> PCFactorSetMatSolverPackage(pc, MATSOLVERMUMPS);
> PCFactorSetUpMatSolverPackage(pc);
> PCFactorGetMatrix(pc, &F);
> KSPSetType(ksp, KSPCG);
> MPI_Barrier(MPI_COMM_WORLD);
> icntl = 29; ival = 2;    // ICNTL(29): use ParMETIS for parallel ordering
> MatMumpsSetIcntl(F, icntl, ival);
> icntl = 4; ival = 3;     // ICNTL(4): print errors, warnings, and statistics
> MatMumpsSetIcntl(F, icntl, ival);
> icntl = 23; ival = 1500; // ICNTL(23): working memory per process (MB)
> MatMumpsSetIcntl(F, icntl, ival);
> KSPSolve(ksp, b, x);
>
> Errors:
>
> Entering DMUMPS driver with JOB, N, NZ = 1 9778426 0
> DMUMPS 4.10.0
> L D L^T Solver for symmetric positive definite matrices
> Type of parallelism: Working host
> ****** ANALYSIS STEP ********
> Using ParMETIS for parallel ordering.
> Structual symmetry is:100%
> --------------------------------------------------------------------------
> WARNING: A process refused to die!
> Host: n0000.voltaire0
> PID: 28131
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------------------
> [n0000.voltaire0:28047] 1 more process has sent help message
> help-odls-default.txt / odls-default:could-not-kill
> [n0000.voltaire0:28047] Set MCA parameter "orte_base_help_aggregate" to 0 to
> see all help / error messages
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 11 in communicator MPI_COMM_WORLD
> with errorcode 59.
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> [1]PETSC ERROR: ------------------------------------------------------------------------
> [1]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch
> system) has told this process to end
> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [1]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to
> find memory corruption errors
> [1]PETSC ERROR: likely location of problem given in stack below
> [1]PETSC ERROR: --------------------- Stack Frames ------------------------------------
> [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [1]PETSC ERROR: INSTEAD the line number of the start of the function
> [1]PETSC ERROR: is given.
> [1]PETSC ERROR: [1] MatCholeskyFactorSymbolic_MUMPS line 1076
> /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/mat/impls/aij/mpi/mumps/mumps.c
> [1]PETSC ERROR: [1] MatCholeskyFactorSymbolic line 2995
> /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/mat/interface/matrix.c
> [1]PETSC ERROR: [1] PCSetUp_Cholesky line 88
> /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/ksp/pc/impls/factor/cholesky/cholesky.c
> [1]PETSC ERROR: [1] KSPSetUp line 219
> /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/ksp/ksp/interface/itfunc.c
> [1]PETSC ERROR: [1] KSPSolve line 381
> /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/ksp/ksp/interface/itfunc.c
> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [1]PETSC ERROR: Signal received
> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.5.0, Jun, 30, 2014
> [1]PETSC ERROR: fetdem3dp on a arch-linux2-c-debug named n0000.voltaire0 by
> esum Wed Aug 27 13:48:51 2014
> [1]PETSC ERROR: Configure options
> --prefix=/clusterfs/voltaire/home/software/modules/petsc/3.5.0
> --download-fblaslapack=1 --download-mumps=1 --download-parmetis=1
> --download-scalapack --download-metis=1
> --with-mpi-dir=/global/software/sl-6.x86_64/modules/gcc/4.4.7/openmpi/1.6.5-gcc/
> [1]PETSC ERROR: #1 User provided function() line 0 in unknown file
> [5]PETSC ERROR: ------------------------------------------------------------------------
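Since the job is killed before MUMPS's ICNTL(4) report ever appears, one way to capture MUMPS's own status is to run the factorization explicitly with KSPSetUp() and then read the MUMPS INFOG array through PETSc. The following is a minimal sketch, not code from the thread: it assumes PETSc 3.5's MatMumpsGetInfog() accessor and reuses the ksp, F, b, and x objects from the code above. Checking every call with CHKERRQ (which the code above omits) also helps PETSc produce a usable trace instead of a bare signal.

#include <petscksp.h>

/* Sketch (assumption, not the poster's code): run the MUMPS analysis and
 * factorization via KSPSetUp(), then query the MUMPS INFOG array so a
 * MUMPS-side failure is reported even when no ICNTL(4) output appears. */
static PetscErrorCode FactorAndCheckMumps(KSP ksp, Mat F, Vec b, Vec x)
{
  PetscErrorCode ierr;
  PetscInt       infog1, infog2;

  PetscFunctionBeginUser;
  ierr = KSPSetUp(ksp);CHKERRQ(ierr);                   /* triggers symbolic + numeric factorization */
  ierr = MatMumpsGetInfog(F, 1, &infog1);CHKERRQ(ierr); /* INFOG(1): 0 = success, <0 = error code */
  ierr = MatMumpsGetInfog(F, 2, &infog2);CHKERRQ(ierr); /* INFOG(2): additional detail for the error */
  if (infog1 < 0) {
    ierr = PetscPrintf(PETSC_COMM_WORLD,
                       "MUMPS reported an error: INFOG(1)=%D, INFOG(2)=%D\n",
                       infog1, infog2);CHKERRQ(ierr);
  }
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

This will not help if MUMPS aborts the whole job outright, but for recoverable failures (for example INFOG(1) = -9, workspace too small) it prints the code on rank 0, which then points at controls such as ICNTL(14) or ICNTL(23).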
