On Mon, Sep 19, 2016 at 7:26 PM, Dave May <dave.mayhe...@gmail.com> wrote:
>
> On 19 September 2016 at 21:05, David Knezevic <david.kneze...@akselos.com>
> wrote:
>
>> When I use MUMPS via PETSc, one issue is that it can sometimes fail with
>> MUMPS error -9, which means that MUMPS didn't allocate a big enough
>> workspace. This can typically be fixed by increasing MUMPS icntl 14, e.g.
>> via the command line option -mat_mumps_icntl_14.
>>
>> However, instead of having to run several times with different command
>> line options, I'd like to be able to automatically increment icntl 14 value
>> in a loop until the solve succeeds.
>>
>> I have a saved matrix which fails when I use it for a solve with MUMPS
>> with 4 MPI processes and the default icntl values, so I'm using this to
>> check that I can achieve the automatic icntl 14 update, as described above.
>> (The matrix is 14MB so I haven't attached it here, but I'd be happy to send
>> it to anyone else who wants to try this test case out.)
>>
>> I've pasted some test code below which provides a simple test of this
>> idea using two solves. The first solve uses the default value of icntl 14,
>> which fails, and then we update icntl 14 to 30 and solve again. The second
>> solve should succeed since icntl 14 of 30 is sufficient for MUMPS to
>> succeed in this case, but for some reason the second solve still fails.
>>
>> Below I've also pasted the output from -ksp_view, and you can see that
>> icntl 14 is being updated correctly (see the ICNTL(14) lines in the
>> output), so it's not clear to me why the second solve fails. It seems like
>> MUMPS is ignoring the update to the icntl value?
>>
>
> I believe this parameter is utilized during the numerical factorization
> phase.
> In your code, the operator hasn't changed, however you haven't signalled
> to the KSP that you want to re-perform the numerical factorization.
> You can do this by calling KSPSetOperators() before your second solve.
> I think if you do this (please try it), the factorization will be
> performed again and the new value of icntl will have an effect.
>
> Note this is a wild stab in the dark - I haven't dug through the
> petsc-mumps code in detail...
>

That sounds like a plausible guess to me, but unfortunately it didn't work.
I added KSPSetOperators(ksp,A,A); before the second solve and I got the
same behavior as before.

Thanks,
David

>> ------------------------------------------------------------
>> -----------------------------------------
>> Test code:
>>
>>   Mat A;
>>   MatCreate(PETSC_COMM_WORLD,&A);
>>   MatSetType(A,MATMPIAIJ);
>>
>>   PetscViewer petsc_viewer;
>>   PetscViewerBinaryOpen( PETSC_COMM_WORLD,
>>                          "matrix.dat",
>>                          FILE_MODE_READ,
>>                          &petsc_viewer);
>>   MatLoad(A, petsc_viewer);
>>   PetscViewerDestroy(&petsc_viewer);
>>
>>   PetscInt m, n;
>>   MatGetSize(A, &m, &n);
>>
>>   Vec x;
>>   VecCreate(PETSC_COMM_WORLD,&x);
>>   VecSetSizes(x,PETSC_DECIDE,m);
>>   VecSetFromOptions(x);
>>   VecSet(x,1.0);
>>
>>   Vec b;
>>   VecDuplicate(x,&b);
>>
>>   KSP ksp;
>>   PC pc;
>>
>>   KSPCreate(PETSC_COMM_WORLD,&ksp);
>>   KSPSetOperators(ksp,A,A);
>>
>>   KSPSetType(ksp,KSPPREONLY);
>>   KSPGetPC(ksp,&pc);
>>
>>   PCSetType(pc,PCCHOLESKY);
>>
>>   PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);
>>   PCFactorSetUpMatSolverPackage(pc);
>>
>>   KSPSetFromOptions(ksp);
>>   KSPSetUp(ksp);
>>
>>   KSPSolve(ksp,b,x);
>>
>>   {
>>     KSPConvergedReason reason;
>>     KSPGetConvergedReason(ksp, &reason);
>>     std::cout << "converged reason: " << reason << std::endl;
>>   }
>>
>>   Mat F;
>>   PCFactorGetMatrix(pc,&F);
>>   MatMumpsSetIcntl(F,14,30);
>>
>>   KSPSolve(ksp,b,x);
>>
>>   {
>>     KSPConvergedReason reason;
>>     KSPGetConvergedReason(ksp, &reason);
>>     std::cout << "converged reason: " << reason << std::endl;
>>   }
>>
>> ------------------------------------------------------------
>> -----------------------------------------
>> -ksp_view output (ICNTL(14) changes from 20 to 30, but we get "converged
>> reason: -11" for both solves)
>>
>> KSP Object: 4 MPI processes
>>   type: preonly
>>   maximum iterations=10000, initial guess is zero
>>   tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>   left preconditioning
>>   using NONE norm type for convergence test
>> PC Object: 4 MPI processes
>>   type: cholesky
>>     Cholesky: out-of-place factorization
>>     tolerance for zero pivot 2.22045e-14
>>     matrix ordering: natural
>>     factor fill ratio given 0., needed 0.
>>       Factored matrix follows:
>>         Mat Object: 4 MPI processes
>>           type: mpiaij
>>           rows=22878, cols=22878
>>           package used to perform factorization: mumps
>>           total: nonzeros=3361617, allocated nonzeros=3361617
>>           total number of mallocs used during MatSetValues calls =0
>>             MUMPS run parameters:
>>               SYM (matrix type): 2
>>               PAR (host participation): 1
>>               ICNTL(1) (output for error): 6
>>               ICNTL(2) (output of diagnostic msg): 0
>>               ICNTL(3) (output for global info): 0
>>               ICNTL(4) (level of printing): 0
>>               ICNTL(5) (input mat struct): 0
>>               ICNTL(6) (matrix prescaling): 7
>>               ICNTL(7) (sequentia matrix ordering):7
>>               ICNTL(8) (scalling strategy): 77
>>               ICNTL(10) (max num of refinements): 0
>>               ICNTL(11) (error analysis): 0
>>               ICNTL(12) (efficiency control): 0
>>               ICNTL(13) (efficiency control): 0
>>               ICNTL(14) (percentage of estimated workspace increase): 20
>>               ICNTL(18) (input mat struct): 3
>>               ICNTL(19) (Shur complement info): 0
>>               ICNTL(20) (rhs sparse pattern): 0
>>               ICNTL(21) (solution struct): 1
>>               ICNTL(22) (in-core/out-of-core facility): 0
>>               ICNTL(23) (max size of memory can be allocated locally):0
>>               ICNTL(24) (detection of null pivot rows): 0
>>               ICNTL(25) (computation of a null space basis): 0
>>               ICNTL(26) (Schur options for rhs or solution): 0
>>               ICNTL(27) (experimental parameter): -24
>>               ICNTL(28) (use parallel or sequential ordering): 1
>>               ICNTL(29) (parallel ordering): 0
>>               ICNTL(30) (user-specified set of entries in inv(A)): 0
>>               ICNTL(31) (factors is discarded in the solve phase): 0
>>               ICNTL(33) (compute determinant): 0
>>               CNTL(1) (relative pivoting threshold): 0.01
>>               CNTL(2) (stopping criterion of refinement): 1.49012e-08
>>               CNTL(3) (absolute pivoting threshold): 0.
>>               CNTL(4) (value of static pivoting): -1.
>>               CNTL(5) (fixation for null pivots): 0.
>>               RINFO(1) (local estimated flops for the elimination after analysis):
>>                 [0] 1.84947e+08
>>                 [1] 2.42065e+08
>>                 [2] 2.53044e+08
>>                 [3] 2.18441e+08
>>               RINFO(2) (local estimated flops for the assembly after factorization):
>>                 [0] 945938.
>>                 [1] 906795.
>>                 [2] 897815.
>>                 [3] 998840.
>>               RINFO(3) (local estimated flops for the elimination after factorization):
>>                 [0] 1.59835e+08
>>                 [1] 1.50867e+08
>>                 [2] 2.27932e+08
>>                 [3] 1.52037e+08
>>               INFO(15) (estimated size of (in MB) MUMPS internal data for running numerical factorization):
>>                 [0] 36
>>                 [1] 37
>>                 [2] 38
>>                 [3] 39
>>               INFO(16) (size of (in MB) MUMPS internal data used during numerical factorization):
>>                 [0] 36
>>                 [1] 37
>>                 [2] 38
>>                 [3] 39
>>               INFO(23) (num of pivots eliminated on this processor after factorization):
>>                 [0] 6450
>>                 [1] 5442
>>                 [2] 4386
>>                 [3] 5526
>>               RINFOG(1) (global estimated flops for the elimination after analysis): 8.98497e+08
>>               RINFOG(2) (global estimated flops for the assembly after factorization): 3.74939e+06
>>               RINFOG(3) (global estimated flops for the elimination after factorization): 6.9067e+08
>>               (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
>>               INFOG(3) (estimated real workspace for factors on all processors after analysis): 4082184
>>               INFOG(4) (estimated integer workspace for factors on all processors after analysis): 231846
>>               INFOG(5) (estimated maximum front size in the complete tree): 678
>>               INFOG(6) (number of nodes in the complete tree): 1380
>>               INFOG(7) (ordering option effectively use after analysis): 5
>>               INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
>>               INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 3521904
>>               INFOG(10) (total integer space store the matrix factors after factorization): 229416
>>               INFOG(11) (order of largest frontal matrix after factorization): 678
>>               INFOG(12) (number of off-diagonal pivots): 0
>>               INFOG(13) (number of delayed pivots after factorization): 0
>>               INFOG(14) (number of memory compress after factorization): 0
>>               INFOG(15) (number of steps of iterative refinement after solution): 0
>>               INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 39
>>               INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 150
>>               INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 39
>>               INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 150
>>               INFOG(20) (estimated number of entries in the factors): 3361617
>>               INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 35
>>               INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 136
>>               INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
>>               INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
>>               INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
>>               INFOG(28) (after factorization: number of null pivots encountered): 0
>>               INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 2931438
>>               INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
>>               INFOG(32) (after analysis: type of analysis done): 1
>>               INFOG(33) (value used for ICNTL(8)): 7
>>               INFOG(34) (exponent of the determinant if determinant is requested): 0
>>   linear system matrix = precond matrix:
>>   Mat Object: 4 MPI processes
>>     type: mpiaij
>>     rows=22878, cols=22878
>>     total: nonzeros=1219140, allocated nonzeros=1219140
>>     total number of mallocs used during MatSetValues calls =0
>>       using I-node (on process 0) routines: found 1889 nodes, limit used is 5
>> converged reason: -11
>> KSP Object: 4 MPI processes
>>   type: preonly
>>   maximum iterations=10000, initial guess is zero
>>   tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>   left preconditioning
>>   using NONE norm type for convergence test
>> PC Object: 4 MPI processes
>>   type: cholesky
>>     Cholesky: out-of-place factorization
>>     tolerance for zero pivot 2.22045e-14
>>     matrix ordering: natural
>>     factor fill ratio given 0., needed 0.
>>       Factored matrix follows:
>>         Mat Object: 4 MPI processes
>>           type: mpiaij
>>           rows=22878, cols=22878
>>           package used to perform factorization: mumps
>>           total: nonzeros=3361617, allocated nonzeros=3361617
>>           total number of mallocs used during MatSetValues calls =0
>>             MUMPS run parameters:
>>               SYM (matrix type): 2
>>               PAR (host participation): 1
>>               ICNTL(1) (output for error): 6
>>               ICNTL(2) (output of diagnostic msg): 0
>>               ICNTL(3) (output for global info): 0
>>               ICNTL(4) (level of printing): 0
>>               ICNTL(5) (input mat struct): 0
>>               ICNTL(6) (matrix prescaling): 7
>>               ICNTL(7) (sequentia matrix ordering):7
>>               ICNTL(8) (scalling strategy): 77
>>               ICNTL(10) (max num of refinements): 0
>>               ICNTL(11) (error analysis): 0
>>               ICNTL(12) (efficiency control): 0
>>               ICNTL(13) (efficiency control): 0
>>               ICNTL(14) (percentage of estimated workspace increase): 30
>>               ICNTL(18) (input mat struct): 3
>>               ICNTL(19) (Shur complement info): 0
>>               ICNTL(20) (rhs sparse pattern): 0
>>               ICNTL(21) (solution struct): 1
>>               ICNTL(22) (in-core/out-of-core facility): 0
>>               ICNTL(23) (max size of memory can be allocated locally):0
>>               ICNTL(24) (detection of null pivot rows): 0
>>               ICNTL(25) (computation of a null space basis): 0
>>               ICNTL(26) (Schur options for rhs or solution): 0
>>               ICNTL(27) (experimental parameter): -24
>>               ICNTL(28) (use parallel or sequential ordering): 1
>>               ICNTL(29) (parallel ordering): 0
>>               ICNTL(30) (user-specified set of entries in inv(A)): 0
>>               ICNTL(31) (factors is discarded in the solve phase): 0
>>               ICNTL(33) (compute determinant): 0
>>               CNTL(1) (relative pivoting threshold): 0.01
>>               CNTL(2) (stopping criterion of refinement): 1.49012e-08
>>               CNTL(3) (absolute pivoting threshold): 0.
>>               CNTL(4) (value of static pivoting): -1.
>>               CNTL(5) (fixation for null pivots): 0.
>>               RINFO(1) (local estimated flops for the elimination after analysis):
>>                 [0] 1.84947e+08
>>                 [1] 2.42065e+08
>>                 [2] 2.53044e+08
>>                 [3] 2.18441e+08
>>               RINFO(2) (local estimated flops for the assembly after factorization):
>>                 [0] 945938.
>>                 [1] 906795.
>>                 [2] 897815.
>>                 [3] 998840.
>>               RINFO(3) (local estimated flops for the elimination after factorization):
>>                 [0] 1.59835e+08
>>                 [1] 1.50867e+08
>>                 [2] 2.27932e+08
>>                 [3] 1.52037e+08
>>               INFO(15) (estimated size of (in MB) MUMPS internal data for running numerical factorization):
>>                 [0] 36
>>                 [1] 37
>>                 [2] 38
>>                 [3] 39
>>               INFO(16) (size of (in MB) MUMPS internal data used during numerical factorization):
>>                 [0] 36
>>                 [1] 37
>>                 [2] 38
>>                 [3] 39
>>               INFO(23) (num of pivots eliminated on this processor after factorization):
>>                 [0] 6450
>>                 [1] 5442
>>                 [2] 4386
>>                 [3] 5526
>>               RINFOG(1) (global estimated flops for the elimination after analysis): 8.98497e+08
>>               RINFOG(2) (global estimated flops for the assembly after factorization): 3.74939e+06
>>               RINFOG(3) (global estimated flops for the elimination after factorization): 6.9067e+08
>>               (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
>>               INFOG(3) (estimated real workspace for factors on all processors after analysis): 4082184
>>               INFOG(4) (estimated integer workspace for factors on all processors after analysis): 231846
>>               INFOG(5) (estimated maximum front size in the complete tree): 678
>>               INFOG(6) (number of nodes in the complete tree): 1380
>>               INFOG(7) (ordering option effectively use after analysis): 5
>>               INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
>>               INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 3521904
>>               INFOG(10) (total integer space store the matrix factors after factorization): 229416
>>               INFOG(11) (order of largest frontal matrix after factorization): 678
>>               INFOG(12) (number of off-diagonal pivots): 0
>>               INFOG(13) (number of delayed pivots after factorization): 0
>>               INFOG(14) (number of memory compress after factorization): 0
>>               INFOG(15) (number of steps of iterative refinement after solution): 0
>>               INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 39
>>               INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 150
>>               INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 39
>>               INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 150
>>               INFOG(20) (estimated number of entries in the factors): 3361617
>>               INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 35
>>               INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 136
>>               INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
>>               INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
>>               INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
>>               INFOG(28) (after factorization: number of null pivots encountered): 0
>>               INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 2931438
>>               INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
>>               INFOG(32) (after analysis: type of analysis done): 1
>>               INFOG(33) (value used for ICNTL(8)): 7
>>               INFOG(34) (exponent of the determinant if determinant is requested): 0
>>   linear system matrix = precond matrix:
>>   Mat Object: 4 MPI processes
>>     type: mpiaij
>>     rows=22878, cols=22878
>>     total: nonzeros=1219140, allocated nonzeros=1219140
>>     total number of mallocs used during MatSetValues calls =0
>>       using I-node (on process 0) routines: found 1889 nodes, limit used is 5
>> converged reason: -11
>>
>> ------------------------------------------------------------
>> -----------------------------------------
>>
>
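
For reference, here is a minimal sketch of the control flow for the "increment icntl 14 in a loop until the solve succeeds" idea from the quoted message. It is only the retry logic, with no PETSc calls. The function name solve_with_growing_workspace, its parameters, and the attempt callback are all hypothetical names chosen for illustration; none of them is part of PETSc or MUMPS.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper (not a PETSc or MUMPS function).
// `attempt(percent)` is assumed to set ICNTL(14) to `percent`, redo the
// factorization and the solve, and return true if the solve succeeded.
// Returns the first percentage that worked, or -1 if none did.
int solve_with_growing_workspace(const std::function<bool(int)>& attempt,
                                 int start_percent,
                                 int step,
                                 int max_percent)
{
  assert(step > 0);  // otherwise the loop below would never advance

  for (int percent = start_percent; percent <= max_percent; percent += step) {
    if (attempt(percent)) {
      return percent;  // stop at the first workspace increase that succeeds
    }
  }
  return -1;  // every value up to max_percent failed
}
```

In the setting of this thread, the callback would presumably wrap MatMumpsSetIcntl(F,14,percent), KSPSolve(ksp,b,x) and a check of KSPGetConvergedReason, as in the test code. As the output above shows, though, those calls alone do not make the second solve succeed, so the callback would also need whatever step forces a new numerical factorization, which this thread has not yet identified.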