> On 19 Mar 2021, at 5:00 AM, Barry Smith <[email protected]> wrote:
>
>
> Eric,
>
>> -Options_ProjectionL2_0mg_coarse_sub_mat_mumps_icntl_24 value: 1
>
> If an option is skipped it is often due to the exact string name used with
> the option. I see your KSP option is
> Options_ProjectionL2_0mg_coarse_sub_mat_mumps_icntl_24 but then below I see
>
>> Coarse grid solver -- level -------------------------------
>> KSP Object: (Options_ProjectionL2_0mg_coarse_) 1 MPI processes
>> type: preonly
>
> That is, it seems to be looking for an option without the sub. This is
> normal. For the coarsest level of multigrid it uses coarse otherwise it uses
> levels. Sub is used for block methods in PETSc such as block Jacobi methods
> but that doesn't seem to apply in your run with one process. It is possible
> since the run is on one process without a method such as block Jacobi the sub
> is just not relevant.
There is something wrong with the option, IMHO, Eric is right in thinking that
he should prepend sub_.
The prefix is not being propagated properly, I’ll investigate.
Here is a simple reproducer:
$ mpirun -n 1 src/ksp/ksp/tests/ex60 -ksp_view -pc_type bjacobi // KO
[…]
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
[…]
$ mpirun -n 1 src/ksp/ksp/tests/ex60 -ksp_view -pc_type asm // OK
[…]
linear system matrix = precond matrix:
Mat Object: (sub_) 1 MPI processes
type: seqaij
[…]
$ mpirun -n 1 src/ksp/ksp/tests/ex60 -ksp_view -pc_type gasm // OK
[…]
linear system matrix = precond matrix:
Mat Object: (sub_) 1 MPI processes
type: seqaij
[…]
$ mpirun -n 4 src/ksp/ksp/tests/ex60 -ksp_view -pc_type bjacobi
-pc_bjacobi_blocks 1 // OK
[…]
linear system matrix = precond matrix:
Mat Object: (sub_) 4 MPI processes
type: mpiaij
[…]
Eric, in the meantime, you can just put the MUMPS options in the global scope,
i.e., -mat_mumps_icntl_24 1, but this will apply to all unprefixed MUMPS
instances.
Thanks,
Pierre
> You can run any code with your current options and with -help and then grep
> for particular options that you may wish to add. Or you can run with
> -ts/snes/ksp_view to see the option prefixes needed for each inner solve.
>
> I am not sure how to make the code bullet proof in your situation. ideally it
> would explain why your options don't work but I am not sure if that is
> possible.
>
> Barry
>
>
>
>
>
>> On Mar 18, 2021, at 8:46 PM, Eric Chamberland
>> <[email protected]> wrote:
>>
>> Hi again,
>>
>> ok, just saw that some matrices have lines of "0" in case of 3D hermite DOFs
>> (ex: du/dz derivatives) when used into a 2D plane mesh...
>>
>> So, my last problem about hypre smoother is "normal".
>>
>> However, just to play with one of this matrix, I tried to do a "LU" with
>> mumps icntl_24 option activated on the global system: fine it works.
>>
>> Then I tried to switche to GAMG with mumps for the coarse_sub level, but it
>> seems my icntl_24 option is then ignored and I don't know why...
>>
>> See my KSP:
>>
>> KSP Object: (Options_ProjectionL2_0) 1 MPI processes
>> type: bcgs
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-15, absolute=1e-15, divergence=1e+12
>> left preconditioning
>> using PRECONDITIONED norm type for convergence test
>> PC Object: (Options_ProjectionL2_0) 1 MPI processes
>> type: gamg
>> type is MULTIPLICATIVE, levels=2 cycles=v
>> Cycles per PCApply=1
>> Using externally compute Galerkin coarse grid matrices
>> GAMG specific options
>> Threshold for dropping small values in graph on each level =
>> Threshold scaling factor for each level not specified = 1.
>> AGG specific options
>> Symmetric graph false
>> Number of levels to square graph 1
>> Number smoothing steps 1
>> Complexity: grid = 1.09756
>> Coarse grid solver -- level -------------------------------
>> KSP Object: (Options_ProjectionL2_0mg_coarse_) 1 MPI processes
>> type: preonly
>> maximum iterations=10000, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object: (Options_ProjectionL2_0mg_coarse_) 1 MPI processes
>> type: bjacobi
>> number of blocks = 1
>> Local solver is the same for all blocks, as in the following KSP and
>> PC objects on rank 0:
>> KSP Object: (Options_ProjectionL2_0mg_coarse_sub_) 1 MPI processes
>> type: preonly
>> maximum iterations=1, initial guess is zero
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object: (Options_ProjectionL2_0mg_coarse_sub_) 1 MPI processes
>> type: lu
>> out-of-place factorization
>> tolerance for zero pivot 2.22045e-14
>> using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>> matrix ordering: nd
>> factor fill ratio given 0., needed 0.
>> Factored matrix follows:
>> Mat Object: 1 MPI processes
>> type: mumps
>> rows=8, cols=8
>> package used to perform factorization: mumps
>> total: nonzeros=64, allocated nonzeros=64
>> MUMPS run parameters:
>> SYM (matrix type): 0
>> PAR (host participation): 1
>> ICNTL(1) (output for error): 6
>> ICNTL(2) (output of diagnostic msg): 0
>> ICNTL(3) (output for global info): 0
>> ICNTL(4) (level of printing): 0
>> ICNTL(5) (input mat struct): 0
>> ICNTL(6) (matrix prescaling): 7
>> ICNTL(7) (sequential matrix ordering):7
>> ICNTL(8) (scaling strategy): 77
>> ICNTL(10) (max num of refinements): 0
>> ICNTL(11) (error analysis): 0
>> ICNTL(12) (efficiency control): 1
>> ICNTL(13) (sequential factorization of the root node): 0
>> ICNTL(14) (percentage of estimated workspace increase): 20
>> ICNTL(18) (input mat struct): 0
>> ICNTL(19) (Schur complement info): 0
>> ICNTL(20) (RHS sparse pattern): 0
>> ICNTL(21) (solution struct): 0
>> ICNTL(22) (in-core/out-of-core facility): 0
>> ICNTL(23) (max size of memory can be allocated locally):0
>> ICNTL(24) (detection of null pivot rows): 0
>> ICNTL(25) (computation of a null space basis): 0
>> ICNTL(26) (Schur options for RHS or solution): 0
>> ICNTL(27) (blocking size for multiple RHS):
>> -32
>> ICNTL(28) (use parallel or sequential ordering): 1
>> ICNTL(29) (parallel ordering): 0
>> ICNTL(30) (user-specified set of entries in inv(A)): 0
>> ICNTL(31) (factors is discarded in the solve phase): 0
>> ICNTL(33) (compute determinant): 0
>> ICNTL(35) (activate BLR based factorization): 0
>> ICNTL(36) (choice of BLR factorization variant): 0
>> ICNTL(38) (estimated compression rate of LU factors):
>> 333
>> CNTL(1) (relative pivoting threshold): 0.01
>> CNTL(2) (stopping criterion of refinement): 1.49012e-08
>> CNTL(3) (absolute pivoting threshold): 0.
>> CNTL(4) (value of static pivoting): -1.
>> CNTL(5) (fixation for null pivots): 0.
>> CNTL(7) (dropping parameter for BLR): 0.
>> RINFO(1) (local estimated flops for the elimination after
>> analysis):
>> [0] 308.
>> RINFO(2) (local estimated flops for the assembly after
>> factorization):
>> [0] 0.
>> RINFO(3) (local estimated flops for the elimination after
>> factorization):
>> [0] 0.
>> INFO(15) (estimated size of (in MB) MUMPS internal data
>> for running numerical factorization):
>> [0] 0
>> INFO(16) (size of (in MB) MUMPS internal data used during
>> numerical factorization):
>> [0] 0
>> INFO(23) (num of pivots eliminated on this processor
>> after factorization):
>> [0] 6
>> RINFOG(1) (global estimated flops for the elimination
>> after analysis): 308.
>> RINFOG(2) (global estimated flops for the assembly after
>> factorization): 0.
>> RINFOG(3) (global estimated flops for the elimination
>> after factorization): 0.
>> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
>> (0.,0.)*(2^0)
>> INFOG(3) (estimated real workspace for factors on all
>> processors after analysis): 64
>> INFOG(4) (estimated integer workspace for factors on all
>> processors after analysis): 35
>> INFOG(5) (estimated maximum front size in the complete
>> tree): 8
>> INFOG(6) (number of nodes in the complete tree): 1
>> INFOG(7) (ordering option effectively use after
>> analysis): 2
>> INFOG(8) (structural symmetry in percent of the permuted
>> matrix after analysis): 100
>> INFOG(9) (total real/complex workspace to store the
>> matrix factors after factorization): 64
>> INFOG(10) (total integer space store the matrix factors
>> after factorization): 35
>> INFOG(11) (order of largest frontal matrix after
>> factorization): 8
>> INFOG(12) (number of off-diagonal pivots): 0
>> INFOG(13) (number of delayed pivots after factorization): >> 0
>> INFOG(14) (number of memory compress after
>> factorization): 0
>> INFOG(15) (number of steps of iterative refinement after
>> solution): 0
>> INFOG(16) (estimated size (in MB) of all MUMPS internal
>> data for factorization after analysis: value on the most memory consuming
>> processor): 0
>> INFOG(17) (estimated size of all MUMPS internal data for
>> factorization after analysis: sum over all processors): 0
>> INFOG(18) (size of all MUMPS internal data allocated
>> during factorization: value on the most memory consuming processor): 0
>> INFOG(19) (size of all MUMPS internal data allocated
>> during factorization: sum over all processors): 0
>> INFOG(20) (estimated number of entries in the factors): 64
>> INFOG(21) (size in MB of memory effectively used during
>> factorization - value on the most memory consuming processor): 0
>> INFOG(22) (size in MB of memory effectively used during
>> factorization - sum over all processors): 0
>> INFOG(23) (after analysis: value of ICNTL(6) effectively
>> used): 0
>> INFOG(24) (after analysis: value of ICNTL(12) effectively
>> used): 1
>> INFOG(25) (after factorization: number of pivots modified
>> by static pivoting): 0
>> INFOG(28) (after factorization: number of null pivots
>> encountered): 0
>> INFOG(29) (after factorization: effective number of
>> entries in the factors (sum over all processors)): 0
>> INFOG(30, 31) (after solution: size in Mbytes of memory
>> used during solution phase): 0, 0
>> INFOG(32) (after analysis: type of analysis done): 1
>> INFOG(33) (value used for ICNTL(8)): 7
>> INFOG(34) (exponent of the determinant if determinant is
>> requested): 0
>> INFOG(35) (after factorization: number of entries taking
>> into account BLR factor compression - sum over all processors): 0
>> INFOG(36) (after analysis: estimated size of all MUMPS
>> internal data for running BLR in-core - value on the most memory consuming
>> processor): 0
>> INFOG(37) (after analysis: estimated size of all MUMPS
>> internal data for running BLR in-core - sum over all processors): 0
>> INFOG(38) (after analysis: estimated size of all MUMPS
>> internal data for running BLR out-of-core - value on the most memory
>> consuming processor): 0
>> INFOG(39) (after analysis: estimated size of all MUMPS
>> internal data for running BLR out-of-core - sum over all processors): 0
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=8, cols=8, bs=4
>> total: nonzeros=64, allocated nonzeros=64
>> total number of mallocs used during MatSetValues calls=0
>> using I-node routines: found 2 nodes, limit used is 5
>> linear system matrix = precond matrix:
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=8, cols=8, bs=4
>> total: nonzeros=64, allocated nonzeros=64
>> total number of mallocs used during MatSetValues calls=0
>> using I-node routines: found 2 nodes, limit used is 5
>> Down solver (pre-smoother) on level 1 -------------------------------
>> KSP Object: (Options_ProjectionL2_0mg_levels_1_) 1 MPI processes
>> type: chebyshev
>> eigenvalue estimates used: min = 0., max = 0.
>> eigenvalues estimate via gmres min 0., max 0.
>> eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
>> KSP Object: (Options_ProjectionL2_0mg_levels_1_esteig_) 1 MPI
>> processes
>> type: gmres
>> restart=30, using Classical (unmodified) Gram-Schmidt
>> Orthogonalization with no iterative refinement
>> happy breakdown tolerance 1e-30
>> maximum iterations=10, initial guess is zero
>> tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using PRECONDITIONED norm type for convergence test
>> PC Object: (Options_ProjectionL2_0mg_levels_1_) 1 MPI processes
>> type: sor
>> type = local_symmetric, iterations = 1, local iterations = 1,
>> omega = 1.
>> linear system matrix = precond matrix:
>> Mat Object: (Options_ProjectionL2_0) 1 MPI processes
>> type: seqaij
>> rows=36, cols=36, bs=4
>> total: nonzeros=656, allocated nonzeros=656
>> total number of mallocs used during MatSetValues calls=0
>> using I-node routines: found 9 nodes, limit used is 5
>> estimating eigenvalues using noisy right hand side
>> maximum iterations=2, nonzero initial guess
>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> left preconditioning
>> using NONE norm type for convergence test
>> PC Object: (Options_ProjectionL2_0mg_levels_1_) 1 MPI processes
>> type: sor
>> type = local_symmetric, iterations = 1, local iterations = 1, omega =
>> 1.
>> linear system matrix = precond matrix:
>> Mat Object: (Options_ProjectionL2_0) 1 MPI processes
>> type: seqaij
>> rows=36, cols=36, bs=4
>> total: nonzeros=656, allocated nonzeros=656
>> total number of mallocs used during MatSetValues calls=0
>> using I-node routines: found 9 nodes, limit used is 5
>> Up solver (post-smoother) same as down solver (pre-smoother)
>> linear system matrix = precond matrix:
>> Mat Object: (Options_ProjectionL2_0) 1 MPI processes
>> type: seqaij
>> rows=36, cols=36, bs=4
>> total: nonzeros=656, allocated nonzeros=656
>> total number of mallocs used during MatSetValues calls=0
>> using I-node routines: found 9 nodes, limit used is 5
>>
>> but I have this option left:
>>
>> Option left: name:-Options_ProjectionL2_0mg_coarse_sub_mat_mumps_icntl_24
>> value: 1
>>
>> and as you can see above I end with:
>>
>> ICNTL(24) (detection of null pivot rows): 0
>>
>> which is fatal in my case...
>>
>> Can you see where I did wrong?
>>
>> Thanks,
>>
>> Eric
>>
>>
>