Re: [petsc-dev] Petsc "make test" have more failures for --with-openmp=1

Pierre Jolivet Fri, 19 Mar 2021 04:01:53 -0700

> On 19 Mar 2021, at 5:00 AM, Barry Smith <[email protected]> wrote:
> 
> 
>  Eric,
> 
>> -Options_ProjectionL2_0mg_coarse_sub_mat_mumps_icntl_24 value: 1
> 
>  If an option is skipped it is often due to the exact string name used with 
> the option. I see your KSP option is 
> Options_ProjectionL2_0mg_coarse_sub_mat_mumps_icntl_24 but then below I see 
> 
>>  Coarse grid solver -- level -------------------------------
>>    KSP Object: (Options_ProjectionL2_0mg_coarse_) 1 MPI processes
>>      type: preonly
> 
> That is, it seems to be looking for an option without the sub. This is 
> normal. For the coarsest level of multigrid it uses coarse; otherwise it uses 
> levels. Sub is used for block methods in PETSc, such as block Jacobi, but 
> that doesn't seem to apply in your run with one process. It is possible that, 
> since the run is on one process without a method such as block Jacobi, the 
> sub is just not relevant.

There is something wrong with how the option is handled. IMHO, Eric is right 
in thinking that he should prepend sub_.
The prefix is not being propagated properly to the Mat; I'll investigate.
Here is a simple reproducer:
$ mpirun -n 1 src/ksp/ksp/tests/ex60 -ksp_view -pc_type bjacobi // KO
[…]
    linear system matrix = precond matrix:
    Mat Object: 1 MPI processes
      type: seqaij
[…]
$ mpirun -n 1 src/ksp/ksp/tests/ex60 -ksp_view -pc_type asm // OK
[…]
    linear system matrix = precond matrix:
    Mat Object: (sub_) 1 MPI processes
      type: seqaij
[…]
$ mpirun -n 1 src/ksp/ksp/tests/ex60 -ksp_view -pc_type gasm // OK
[…]
      linear system matrix = precond matrix:
      Mat Object: (sub_) 1 MPI processes
        type: seqaij
[…]
$ mpirun -n 4 src/ksp/ksp/tests/ex60 -ksp_view -pc_type bjacobi -pc_bjacobi_blocks 1 // OK
[…]
    linear system matrix = precond matrix:
    Mat Object: (sub_) 4 MPI processes
      type: mpiaij
[…]
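
To compare the four runs above without reading the full -ksp_view output by eye, a short script can list which objects carry a prefix. The following is only a sketch: the function name list_prefixes and the regular expression are assumptions based on the "Mat Object: (sub_) 1 MPI processes" lines shown above, not part of PETSc.

```python
import re

# Matches lines such as "Mat Object: (sub_) 1 MPI processes" or
# "Mat Object: 1 MPI processes"; the parenthesised prefix is optional.
# The pattern is inferred from the output quoted in this thread.
OBJECT_LINE = re.compile(
    r"^\s*(\w+) Object:\s*(?:\(([^)]*)\)\s*)?\d+ MPI process"
)


def list_prefixes(view_text):
    """Return (class_name, prefix) pairs in order of appearance.

    The prefix is the empty string when the view prints none.
    """
    found = []
    for line in view_text.splitlines():
        match = OBJECT_LINE.match(line)
        if match:
            found.append((match.group(1), match.group(2) or ""))
    return found


# Excerpts copied from the bjacobi (KO) and asm (OK) runs above.
bjacobi_view = """\
    linear system matrix = precond matrix:
    Mat Object: 1 MPI processes
      type: seqaij
"""
asm_view = """\
    linear system matrix = precond matrix:
    Mat Object: (sub_) 1 MPI processes
      type: seqaij
"""

print(list_prefixes(bjacobi_view))  # [('Mat', '')]
print(list_prefixes(asm_view))      # [('Mat', 'sub_')]
```

An empty prefix on the Mat in the bjacobi case is the symptom: options given with sub_ have nothing to attach to.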

Eric, in the meantime, you can just put the MUMPS options in the global scope, 
i.e., -mat_mumps_icntl_24 1, but this will apply to all unprefixed MUMPS 
instances.
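
To illustrate why the global-scope option is picked up while the prefixed one is reported as left over, here is a toy model of a prefixed option lookup. It is a sketch under stated assumptions: ToyOptions and its methods are hypothetical and do not reproduce PETSc's options database; only the option names are taken from this thread.

```python
class ToyOptions:
    """Toy stand-in for an options database (hypothetical, not PETSc code)."""

    def __init__(self, args):
        self.values = dict(args)
        self.used = set()

    def get(self, prefix, name, default):
        # An object looks up "-" + its own prefix + the option name.
        key = "-" + prefix + name
        if key in self.values:
            self.used.add(key)
            return self.values[key]
        return default

    def left(self):
        # Options that no object ever asked for.
        return sorted(set(self.values) - self.used)


# Eric's run: the option carries the sub_ prefix, but the factored
# matrix has an empty prefix, so the lookup misses and the default is used.
prefixed = ToyOptions(
    {"-Options_ProjectionL2_0mg_coarse_sub_mat_mumps_icntl_24": "1"}
)
print(prefixed.get("", "mat_mumps_icntl_24", "0"))  # 0
print(prefixed.left())

# Workaround: the option in the global scope matches the empty prefix.
unprefixed = ToyOptions({"-mat_mumps_icntl_24": "1"})
print(unprefixed.get("", "mat_mumps_icntl_24", "0"))  # 1
print(unprefixed.left())  # []
```

In this model, any other object with an empty prefix that asks for mat_mumps_icntl_24 would receive the same value, which matches the caveat about unprefixed MUMPS instances.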

Thanks,
Pierre

> You can run any code with your current options and with -help  and then grep 
> for particular options that you may wish to add. Or you can run with 
> -ts/snes/ksp_view to see the option prefixes needed for each inner solve. 
> 
> I am not sure how to make the code bulletproof in your situation. Ideally it 
> would explain why your options don't work, but I am not sure if that is 
> possible.
> 
>  Barry
> 
> 
> 
> 
> 
>> On Mar 18, 2021, at 8:46 PM, Eric Chamberland 
>> <[email protected]> wrote:
>> 
>> Hi again,
>> 
>> OK, I just saw that some matrices have rows of zeros in the case of 3D 
>> Hermite DOFs (e.g., du/dz derivatives) used on a 2D planar mesh...
>> 
>> So, my last problem with the hypre smoother is "normal".
>> 
>> However, just to play with one of these matrices, I tried an LU 
>> factorization with the MUMPS icntl_24 option activated on the global system: 
>> it works fine.
>> 
>> Then I tried to switch to GAMG with MUMPS for the coarse_sub level, but it 
>> seems my icntl_24 option is then ignored and I don't know why...
>> 
>> See my KSP:
>> 
>> KSP Object: (Options_ProjectionL2_0) 1 MPI processes
>>  type: bcgs
>>  maximum iterations=10000, initial guess is zero
>>  tolerances:  relative=1e-15, absolute=1e-15, divergence=1e+12
>>  left preconditioning
>>  using PRECONDITIONED norm type for convergence test
>> PC Object: (Options_ProjectionL2_0) 1 MPI processes
>>  type: gamg
>>    type is MULTIPLICATIVE, levels=2 cycles=v
>>      Cycles per PCApply=1
>>      Using externally compute Galerkin coarse grid matrices
>>      GAMG specific options
>>        Threshold for dropping small values in graph on each level =
>>        Threshold scaling factor for each level not specified = 1.
>>        AGG specific options
>>          Symmetric graph false
>>          Number of levels to square graph 1
>>          Number smoothing steps 1
>>        Complexity:    grid = 1.09756
>>  Coarse grid solver -- level -------------------------------
>>    KSP Object: (Options_ProjectionL2_0mg_coarse_) 1 MPI processes
>>      type: preonly
>>      maximum iterations=10000, initial guess is zero
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (Options_ProjectionL2_0mg_coarse_) 1 MPI processes
>>      type: bjacobi
>>        number of blocks = 1
>>        Local solver is the same for all blocks, as in the following KSP and 
>> PC objects on rank 0:
>>      KSP Object: (Options_ProjectionL2_0mg_coarse_sub_) 1 MPI processes
>>        type: preonly
>>        maximum iterations=1, initial guess is zero
>>        tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>        left preconditioning
>>        using NONE norm type for convergence test
>>      PC Object: (Options_ProjectionL2_0mg_coarse_sub_) 1 MPI processes
>>        type: lu
>>          out-of-place factorization
>>          tolerance for zero pivot 2.22045e-14
>>          using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>>          matrix ordering: nd
>>          factor fill ratio given 0., needed 0.
>>            Factored matrix follows:
>>              Mat Object: 1 MPI processes
>>                type: mumps
>>                rows=8, cols=8
>>                package used to perform factorization: mumps
>>                total: nonzeros=64, allocated nonzeros=64
>>                  MUMPS run parameters:
>>                    SYM (matrix type):                   0
>>                    PAR (host participation):            1
>>                    ICNTL(1) (output for error):         6
>>                    ICNTL(2) (output of diagnostic msg): 0
>>                    ICNTL(3) (output for global info):   0
>>                    ICNTL(4) (level of printing):        0
>>                    ICNTL(5) (input mat struct):         0
>>                    ICNTL(6) (matrix prescaling):        7
>>                    ICNTL(7) (sequential matrix ordering):7
>>                    ICNTL(8) (scaling strategy):        77
>>                    ICNTL(10) (max num of refinements):  0
>>                    ICNTL(11) (error analysis):          0
>>                    ICNTL(12) (efficiency control):                         1
>>                    ICNTL(13) (sequential factorization of the root node):  0
>>                    ICNTL(14) (percentage of estimated workspace increase): 20
>>                    ICNTL(18) (input mat struct):                           0
>>                    ICNTL(19) (Schur complement info):                      0
>>                    ICNTL(20) (RHS sparse pattern):                         0
>>                    ICNTL(21) (solution struct):                            0
>>                    ICNTL(22) (in-core/out-of-core facility):               0
>>                    ICNTL(23) (max size of memory can be allocated locally):0
>>                    ICNTL(24) (detection of null pivot rows):               0
>>                    ICNTL(25) (computation of a null space basis):          0
>>                    ICNTL(26) (Schur options for RHS or solution):          0
>>                    ICNTL(27) (blocking size for multiple RHS):             
>> -32
>>                    ICNTL(28) (use parallel or sequential ordering):        1
>>                    ICNTL(29) (parallel ordering):                          0
>>                    ICNTL(30) (user-specified set of entries in inv(A)):    0
>>                    ICNTL(31) (factors is discarded in the solve phase):    0
>>                    ICNTL(33) (compute determinant):                        0
>>                    ICNTL(35) (activate BLR based factorization):           0
>>                    ICNTL(36) (choice of BLR factorization variant):        0
>>                    ICNTL(38) (estimated compression rate of LU factors):   
>> 333
>>                    CNTL(1) (relative pivoting threshold): 0.01
>>                    CNTL(2) (stopping criterion of refinement): 1.49012e-08
>>                    CNTL(3) (absolute pivoting threshold):      0.
>>                    CNTL(4) (value of static pivoting): -1.
>>                    CNTL(5) (fixation for null pivots):         0.
>>                    CNTL(7) (dropping parameter for BLR):       0.
>>                    RINFO(1) (local estimated flops for the elimination after 
>> analysis):
>>                      [0] 308.
>>                    RINFO(2) (local estimated flops for the assembly after 
>> factorization):
>>                      [0]  0.
>>                    RINFO(3) (local estimated flops for the elimination after 
>> factorization):
>>                      [0]  0.
>>                    INFO(15) (estimated size of (in MB) MUMPS internal data 
>> for running numerical factorization):
>>                    [0] 0
>>                    INFO(16) (size of (in MB) MUMPS internal data used during 
>> numerical factorization):
>>                      [0] 0
>>                    INFO(23) (num of pivots eliminated on this processor 
>> after factorization):
>>                      [0] 6
>>                    RINFOG(1) (global estimated flops for the elimination 
>> after analysis): 308.
>>                    RINFOG(2) (global estimated flops for the assembly after 
>> factorization): 0.
>>                    RINFOG(3) (global estimated flops for the elimination 
>> after factorization): 0.
>>                    (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): 
>> (0.,0.)*(2^0)
>>                    INFOG(3) (estimated real workspace for factors on all 
>> processors after analysis): 64
>>                    INFOG(4) (estimated integer workspace for factors on all 
>> processors after analysis): 35
>>                    INFOG(5) (estimated maximum front size in the complete 
>> tree): 8
>>                    INFOG(6) (number of nodes in the complete tree): 1
>>                    INFOG(7) (ordering option effectively use after 
>> analysis): 2
>>                    INFOG(8) (structural symmetry in percent of the permuted 
>> matrix after analysis): 100
>>                    INFOG(9) (total real/complex workspace to store the 
>> matrix factors after factorization): 64
>>                    INFOG(10) (total integer space store the matrix factors 
>> after factorization): 35
>>                    INFOG(11) (order of largest frontal matrix after 
>> factorization): 8
>>                    INFOG(12) (number of off-diagonal pivots): 0
>>                    INFOG(13) (number of delayed pivots after factorization): 0
>>                    INFOG(14) (number of memory compress after 
>> factorization): 0
>>                    INFOG(15) (number of steps of iterative refinement after 
>> solution): 0
>>                    INFOG(16) (estimated size (in MB) of all MUMPS internal 
>> data for factorization after analysis: value on the most memory consuming 
>> processor): 0
>>                    INFOG(17) (estimated size of all MUMPS internal data for 
>> factorization after analysis: sum over all processors): 0
>>                    INFOG(18) (size of all MUMPS internal data allocated 
>> during factorization: value on the most memory consuming processor): 0
>>                    INFOG(19) (size of all MUMPS internal data allocated 
>> during factorization: sum over all processors): 0
>>                    INFOG(20) (estimated number of entries in the factors): 64
>>                    INFOG(21) (size in MB of memory effectively used during 
>> factorization - value on the most memory consuming processor): 0
>>                    INFOG(22) (size in MB of memory effectively used during 
>> factorization - sum over all processors): 0
>>                    INFOG(23) (after analysis: value of ICNTL(6) effectively 
>> used): 0
>>                    INFOG(24) (after analysis: value of ICNTL(12) effectively 
>> used): 1
>>                    INFOG(25) (after factorization: number of pivots modified 
>> by static pivoting): 0
>>                    INFOG(28) (after factorization: number of null pivots 
>> encountered): 0
>>                    INFOG(29) (after factorization: effective number of 
>> entries in the factors (sum over all processors)): 0
>>                    INFOG(30, 31) (after solution: size in Mbytes of memory 
>> used during solution phase): 0, 0
>>                    INFOG(32) (after analysis: type of analysis done): 1
>>                    INFOG(33) (value used for ICNTL(8)): 7
>>                    INFOG(34) (exponent of the determinant if determinant is 
>> requested): 0
>>                    INFOG(35) (after factorization: number of entries taking 
>> into account BLR factor compression - sum over all processors): 0
>>                    INFOG(36) (after analysis: estimated size of all MUMPS 
>> internal data for running BLR in-core - value on the most memory consuming 
>> processor): 0
>>                    INFOG(37) (after analysis: estimated size of all MUMPS 
>> internal data for running BLR in-core - sum over all processors): 0
>>                    INFOG(38) (after analysis: estimated size of all MUMPS 
>> internal data for running BLR out-of-core - value on the most memory 
>> consuming processor): 0
>>                    INFOG(39) (after analysis: estimated size of all MUMPS 
>> internal data for running BLR out-of-core - sum over all processors): 0
>>        linear system matrix = precond matrix:
>>        Mat Object: 1 MPI processes
>>          type: seqaij
>>          rows=8, cols=8, bs=4
>>          total: nonzeros=64, allocated nonzeros=64
>>          total number of mallocs used during MatSetValues calls=0
>>            using I-node routines: found 2 nodes, limit used is 5
>>      linear system matrix = precond matrix:
>>      Mat Object: 1 MPI processes
>>        type: seqaij
>>        rows=8, cols=8, bs=4
>>        total: nonzeros=64, allocated nonzeros=64
>>        total number of mallocs used during MatSetValues calls=0
>>          using I-node routines: found 2 nodes, limit used is 5
>>  Down solver (pre-smoother) on level 1 -------------------------------
>>    KSP Object: (Options_ProjectionL2_0mg_levels_1_) 1 MPI processes
>>      type: chebyshev
>>        eigenvalue estimates used:  min = 0., max = 0.
>>        eigenvalues estimate via gmres min 0., max 0.
>>        eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
>>        KSP Object: (Options_ProjectionL2_0mg_levels_1_esteig_) 1 MPI 
>> processes
>>          type: gmres
>>            restart=30, using Classical (unmodified) Gram-Schmidt 
>> Orthogonalization with no iterative refinement
>>            happy breakdown tolerance 1e-30
>>          maximum iterations=10, initial guess is zero
>>          tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
>>          left preconditioning
>>          using PRECONDITIONED norm type for convergence test
>>        PC Object: (Options_ProjectionL2_0mg_levels_1_) 1 MPI processes
>>          type: sor
>>            type = local_symmetric, iterations = 1, local iterations = 1, 
>> omega = 1.
>>          linear system matrix = precond matrix:
>>          Mat Object: (Options_ProjectionL2_0) 1 MPI processes
>>            type: seqaij
>>            rows=36, cols=36, bs=4
>>            total: nonzeros=656, allocated nonzeros=656
>>            total number of mallocs used during MatSetValues calls=0
>>              using I-node routines: found 9 nodes, limit used is 5
>>        estimating eigenvalues using noisy right hand side
>>      maximum iterations=2, nonzero initial guess
>>      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>      left preconditioning
>>      using NONE norm type for convergence test
>>    PC Object: (Options_ProjectionL2_0mg_levels_1_) 1 MPI processes
>>      type: sor
>>        type = local_symmetric, iterations = 1, local iterations = 1, omega = 
>> 1.
>>      linear system matrix = precond matrix:
>>      Mat Object: (Options_ProjectionL2_0) 1 MPI processes
>>        type: seqaij
>>        rows=36, cols=36, bs=4
>>        total: nonzeros=656, allocated nonzeros=656
>>        total number of mallocs used during MatSetValues calls=0
>>          using I-node routines: found 9 nodes, limit used is 5
>>  Up solver (post-smoother) same as down solver (pre-smoother)
>>  linear system matrix = precond matrix:
>>  Mat Object: (Options_ProjectionL2_0) 1 MPI processes
>>    type: seqaij
>>    rows=36, cols=36, bs=4
>>    total: nonzeros=656, allocated nonzeros=656
>>    total number of mallocs used during MatSetValues calls=0
>>      using I-node routines: found 9 nodes, limit used is 5
>> 
>> but I have this option left:
>> 
>> Option left: name:-Options_ProjectionL2_0mg_coarse_sub_mat_mumps_icntl_24 
>> value: 1
>> 
>> and as you can see above I end up with:
>> 
>>                    ICNTL(24) (detection of null pivot rows):               0
>> 
>> which is fatal in my case...
>> 
>> Can you see where I went wrong?
>> 
>> Thanks,
>> 
>> Eric
>> 
>> 
>