I checked with -ksp_view (attached), but no prefix is associated with the matrix. Some prefixes are associated with the KSP and PC objects, but none with the Mat.
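For context, such a prefix would normally come from an explicit MatSetOptionsPrefix() (or KSPSetOptionsPrefix()) call in the application. Below is a minimal sketch, not taken from the actual code, using a hypothetical prefix "abc_", to show how a prefix is attached and queried, and how it changes the spelling of the PtAP option:

  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat         A;
    const char *prefix;

    PetscInitialize(&argc, &argv, NULL, NULL);
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 100, 100);
    MatSetType(A, MATMPIAIJ);

    /* Hypothetical prefix: once it is set, the PtAP algorithm has to be
       selected with -abc_matptap_via scalable instead of -matptap_via scalable. */
    MatSetOptionsPrefix(A, "abc_");
    MatSetFromOptions(A);

    /* Query the prefix back; -ksp_view prints it in parentheses after "Mat Object:". */
    MatGetOptionsPrefix(A, &prefix);
    PetscPrintf(PETSC_COMM_WORLD, "Mat options prefix: %s\n", prefix ? prefix : "(none)");

    MatDestroy(&A);
    PetscFinalize();
    return 0;
  }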
On 03/26/19 at 11:55, Dave May wrote:
>
> On Tue, 26 Mar 2019 at 10:36, Myriam Peyrounette <myriam.peyroune...@idris.fr> wrote:
>
>     Oh, you were right, the three options are unused (-matptap_via scalable, -inner_offdiag_matmatmult_via scalable and -inner_diag_matmatmult_via scalable). Does this mean I am not using the associated PtAP functions?
>
> No - not necessarily. All it means is that the options were not parsed.
>
> If your matrices have an options prefix associated with them (e.g. abc), then you need to provide the option as -abc_matptap_via scalable.
>
> If you are not sure whether your matrices have a prefix, look at the result of -ksp_view (see below for an example):
>
>     Mat Object: 2 MPI processes
>       type: mpiaij
>       rows=363, cols=363, bs=3
>       total: nonzeros=8649, allocated nonzeros=8649
>       total number of mallocs used during MatSetValues calls =0
>     Mat Object: (B_) 2 MPI processes
>       type: mpiaij
>       rows=363, cols=363, bs=3
>       total: nonzeros=8649, allocated nonzeros=8649
>       total number of mallocs used during MatSetValues calls =0
>
> The first matrix has no options prefix, but the second does, and it is called "B_".
>
>     Myriam
>
>     On 03/26/19 at 11:10, Dave May wrote:
>
>> On Tue, 26 Mar 2019 at 09:52, Myriam Peyrounette via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>
>>     How can I be sure they are indeed used? Can I print this information in some log file?
>>
>> Yes. Re-run the job with the command line option -options_left true. This will report all options parsed and, importantly, will also indicate if any options were unused.
>>
>> Thanks
>> Dave
>>
>>     Thanks in advance
>>
>>     Myriam
>>
>>     On 03/25/19 at 18:24, Matthew Knepley wrote:
>>
>>> On Mon, Mar 25, 2019 at 10:54 AM Myriam Peyrounette via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>>
>>>     Hi,
>>>
>>>     thanks for the explanations. I tried the latest PETSc version (commit fbc5705bc518d02a4999f188aad4ccff5f754cbf), which includes the patch you talked about. But the memory scaling shows no improvement (see scaling attached), even when using the "scalable" options :(
>>>
>>>     I had a look at the PETSc functions MatPtAPNumeric_MPIAIJ_MPIAIJ and MatPtAPSymbolic_MPIAIJ_MPIAIJ (especially at the differences before and after the first "bad" commit), but I can't find what induced this memory issue.
>>>
>>> Are you sure that the option was used? It just looks suspicious to me that they use exactly the same amount of memory. It should be different, even if it does not solve the problem.
>>>
>>> Thanks,
>>> Matt
>>>
>>>     Myriam
>>>
>>>     On 03/20/19 at 17:38, Fande Kong wrote:
>>>
>>>> Hi Myriam,
>>>>
>>>> There are three algorithms in PETSc to do PtAP (const char *algTypes[3] = {"scalable","nonscalable","hypre"};), and they can be selected with the PETSc option -matptap_via xxxx.
>>>>
>>>> (1) -matptap_via hypre: This calls the hypre package to do the PtAP through an all-at-once triple product. In our experience, it is the most memory-efficient, but it can be slow.
>>>>
>>>> (2) -matptap_via scalable: This involves a row-wise algorithm plus an outer product. It uses more memory than hypre, but is much faster. This used to have a bug that could take all your memory, and I have a fix at https://bitbucket.org/petsc/petsc/pull-requests/1452/mpiptap-enable-large-scale-simulations/diff. When using this option, we may want to add extra options such as -inner_offdiag_matmatmult_via scalable -inner_diag_matmatmult_via scalable to select inner scalable algorithms.
>>>>
>>>> (3) -matptap_via nonscalable: Supposed to be even faster, but uses more memory. It does dense matrix operations.
>>>>
>>>> Thanks,
>>>>
>>>> Fande Kong
>>>>
>>>> On Wed, Mar 20, 2019 at 10:06 AM Myriam Peyrounette via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>>>
>>>>     More precisely: something happens when upgrading the functions MatPtAPNumeric_MPIAIJ_MPIAIJ and/or MatPtAPSymbolic_MPIAIJ_MPIAIJ.
>>>>
>>>>     Unfortunately, there are a lot of differences between the old and new versions of these functions. I keep investigating, but if you have any idea, please let me know.
>>>>
>>>>     Best,
>>>>
>>>>     Myriam
>>>>
>>>>     On 03/20/19 at 13:48, Myriam Peyrounette wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I used git bisect to determine when the memory need increased. I found that the first "bad" commit is aa690a28a7284adb519c28cb44eae20a2c131c85.
>>>>>
>>>>> Barry was right, this commit seems to be about an evolution of MatPtAPSymbolic_MPIAIJ_MPIAIJ. You mentioned the option "-matptap_via scalable" but I can't find any information about it. Can you tell me more?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Myriam
>>>>>
>>>>> On 03/11/19 at 14:40, Mark Adams wrote:
>>>>>
>>>>>> Is there a difference in memory usage on your tiny problem? I assume not.
>>>>>>
>>>>>> I don't see anything that could come from GAMG other than the RAP stuff that you have discussed already.
>>>>>>
>>>>>> On Mon, Mar 11, 2019 at 9:32 AM Myriam Peyrounette <myriam.peyroune...@idris.fr> wrote:
>>>>>>
>>>>>>     The code I am using here is PETSc example 42 (https://www.mcs.anl.gov/petsc/petsc-3.9/src/ksp/ksp/examples/tutorials/ex42.c.html). Indeed, it solves the Stokes equation. I thought it was a good idea to use an example you might know (and I didn't find any that uses GAMG functions). I just changed the PCMG setup so that the memory problem appears. And it appears when adding PCGAMG.
>>>>>>
>>>>>>     I don't care about the performance or even the correctness of the result here, only about the difference in memory use between 3.6 and 3.10. Do you think finding a more suitable example would help?
>>>>>>
>>>>>>     I used the threshold of 0.1 only once, at the beginning, to test its influence. I used the default threshold (of 0, I guess) for all the other runs.
>>>>>>
>>>>>>     Myriam
>>>>>>
>>>>>>     On 03/11/19 at 13:52, Mark Adams wrote:
>>>>>>
>>>>>>> In looking at this larger scale run ...
>>>>>>>
>>>>>>> * Your eigen estimates are much lower than in your tiny test problem. But this is Stokes, apparently, and it should not work anyway. Maybe you have a small time step that adds a lot of mass, which brings the eigen estimates down. And your min eigenvalue (not used) is positive. I would expect negative for Stokes ...
>>>>>>>
>>>>>>> * You seem to be setting a threshold value of 0.1 -- that is very high.
>>>>>>>
>>>>>>> * v3.6 says "using nonzero initial guess" but this is not in v3.10. Maybe we just stopped printing that.
>>>>>>>
>>>>>>> * There were some changes to coarsening parameters in going from v3.6, but it does not look like your problem was affected. (The coarsening algorithm is non-deterministic by default and you can see small differences on different runs.)
>>>>>>>
>>>>>>> * We may have also added a "noisy" RHS for eigen estimates by default since v3.6.
>>>>>>>
>>>>>>> * And for non-symmetric problems you can try -pc_gamg_agg_nsmooths 0, but again, GAMG is not built for Stokes anyway.
>>>>>>>
>>>>>>> On Tue, Mar 5, 2019 at 11:53 AM Myriam Peyrounette <myriam.peyroune...@idris.fr> wrote:
>>>>>>>
>>>>>>>     I used PCView to display the size of the linear system at each level of the MG. You'll find the outputs attached to this mail (zip file) for both the default threshold value and a value of 0.1, and for both the 3.6 and 3.10 PETSc versions.
>>>>>>>
>>>>>>>     For convenience, I summarized the information in a graph, also attached (png file).
>>>>>>>
>>>>>>>     As you can see, there are slight differences between the two versions, but none is critical, in my opinion. Do you see anything suspicious in the outputs?
>>>>>>>
>>>>>>>     + I can't find the default threshold value. Do you know where I can find it?
>>>>>>>
>>>>>>>     Thanks for the follow-up
>>>>>>>
>>>>>>>     Myriam
>>>>>>>
>>>>>>>     On 03/05/19 at 14:06, Matthew Knepley wrote:
>>>>>>>
>>>>>>>> On Tue, Mar 5, 2019 at 7:14 AM Myriam Peyrounette <myriam.peyroune...@idris.fr> wrote:
>>>>>>>>
>>>>>>>>     Hi Matt,
>>>>>>>>
>>>>>>>>     I plotted the memory scalings using different threshold values. The two scalings are slightly shifted (by -22 to -88 MB) but this gain is negligible. The 3.6 scaling remains robust while the 3.10 scaling deteriorates.
>>>>>>>>
>>>>>>>>     Do you have any other suggestion?
>>>>>>>>
>>>>>>>> Mark, what is the option she can give to output all the GAMG data?
>>>>>>>>
>>>>>>>> Also, run using -ksp_view. GAMG will report all the sizes of its grids, so it should be easy to see if the coarse grid sizes are increasing, and also what the effect of the threshold value is.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Matt
>>>>>>>>
>>>>>>>>     Thanks
>>>>>>>>
>>>>>>>>     Myriam
>>>>>>>>
>>>>>>>>     On 03/02/19 at 02:27, Matthew Knepley wrote:
>>>>>>>>
>>>>>>>>> On Fri, Mar 1, 2019 at 10:53 AM Myriam Peyrounette via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>>>>>>>>
>>>>>>>>>     Hi,
>>>>>>>>>
>>>>>>>>>     I used to run my code with PETSc 3.6. Since I upgraded the PETSc version to 3.10, this code has shown a bad memory scaling.
>>>>>>>>>
>>>>>>>>>     To report this issue, I took the PETSc script ex42.c and slightly modified it so that the KSP and PC configurations are the same as in my code. In particular, I use a "personalised" multi-grid method. The modifications are indicated by the keyword "TopBridge" in the attached scripts.
>>>>>>>>>
>>>>>>>>>     To plot the memory (weak) scaling, I ran four calculations for each script, with increasing problem sizes and numbers of cores:
>>>>>>>>>
>>>>>>>>>     1. 100,000 elts on 4 cores
>>>>>>>>>     2. 1 million elts on 40 cores
>>>>>>>>>     3. 10 million elts on 400 cores
>>>>>>>>>     4. 100 million elts on 4,000 cores
>>>>>>>>>
>>>>>>>>>     The resulting graph is also attached. The scaling using PETSc 3.10 clearly deteriorates for large cases, while the one using PETSc 3.6 is robust.
>>>>>>>>>
>>>>>>>>>     After a few tests, I found that the scaling is mostly sensitive to the use of the AMG method for the coarse grid (line 1780 in main_ex42_petsc36.cc). In particular, the performance strongly deteriorates when commenting out lines 1777 to 1790 (in main_ex42_petsc36.cc).
>>>>>>>>>
>>>>>>>>>     Do you have any idea of what changed between version 3.6 and version 3.10 that may cause such a degradation?
>>>>>>>>>
>>>>>>>>> I believe the default values for PCGAMG changed between versions. It sounds like the coarsening rate is not high enough, so that these grids are too large. This can be set using:
>>>>>>>>>
>>>>>>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCGAMGSetThreshold.html
>>>>>>>>>
>>>>>>>>> There is some explanation of this effect on that page. Let us know if setting this does not correct the situation.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Matt
>>>>>>>>>
>>>>>>>>>     Let me know if you need further information.
>>>>>>>>>
>>>>>>>>>     Best,
>>>>>>>>>
>>>>>>>>>     Myriam Peyrounette
>>>>>>>>>     CNRS/IDRIS - HLST
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>>>>>> -- Norbert Wiener
>>>>>>>>> https://www.cse.buffalo.edu/~knepley/

--
Myriam Peyrounette
CNRS/IDRIS - HLST
--
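The options discussed above can also be fixed in the source, so that there is no doubt about what reaches the options database. The following is a minimal sketch, not taken from the actual application, using the PetscOptionsSetValue() signature of PETSc >= 3.7; it assumes the matrices carry no options prefix (with a prefix such as "abc_", each option name would need the prefix inserted after the leading dash):

  #include <petscsys.h>

  /* Sketch only: push the PtAP-related options into the global options
     database before MatSetFromOptions()/KSPSetFromOptions() are called. */
  static PetscErrorCode SetScalablePtAPOptions(void)
  {
    PetscErrorCode ierr;

    ierr = PetscOptionsSetValue(NULL, "-matptap_via", "scalable");CHKERRQ(ierr);
    ierr = PetscOptionsSetValue(NULL, "-inner_offdiag_matmatmult_via", "scalable");CHKERRQ(ierr);
    ierr = PetscOptionsSetValue(NULL, "-inner_diag_matmatmult_via", "scalable");CHKERRQ(ierr);
    /* Report any options that were never queried, as suggested above. */
    ierr = PetscOptionsSetValue(NULL, "-options_left", "true");CHKERRQ(ierr);
    return 0;
  }

With -options_left true, any option that is never queried (for instance because a prefix was required) is listed at the end of the run.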
KSP Object: 8 MPI processes
  type: gmres
    GMRES: restart=150, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=150
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1e+15
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: mg
    MG: type is MULTIPLICATIVE, levels=2 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object: (mg_coarse_) 8 MPI processes
      type: cg
      maximum iterations=200, initial guess is zero
      tolerances:  relative=1e-08, absolute=1e-50, divergence=1000
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_coarse_) 8 MPI processes
      type: gamg
        MG: type is MULTIPLICATIVE, levels=4 cycles=v
          Cycles per PCApply=1
          Using Galerkin computed coarse grid matrices
        GAMG specific options
          Threshold for dropping small values from graph 0
          AGG specific options
            Symmetric graph false
      Coarse grid solver -- level -------------------------------
        KSP Object: (mg_coarse_mg_coarse_) 8 MPI processes
          type: gmres
            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            GMRES: happy breakdown tolerance 1e-30
          maximum iterations=1, initial guess is zero
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using NONE norm type for convergence test
        PC Object: (mg_coarse_mg_coarse_) 8 MPI processes
          type: bjacobi
            block Jacobi: number of blocks = 8
            Local solve is same for all blocks, in the following KSP and PC objects:
          KSP Object: (mg_coarse_mg_coarse_sub_) 1 MPI processes
            type: preonly
            maximum iterations=1, initial guess is zero
            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
            left preconditioning
            using NONE norm type for convergence test
          PC Object: (mg_coarse_mg_coarse_sub_) 1 MPI processes
            type: lu
              LU: out-of-place factorization
              tolerance for zero pivot 2.22045e-14
              using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
              matrix ordering: nd
              factor fill ratio given 5, needed 1
                Factored matrix follows:
                  Mat Object: 1 MPI processes
                    type: seqaij
                    rows=18, cols=18, bs=3
                    package used to perform factorization: petsc
                    total: nonzeros=324, allocated nonzeros=324
                    total number of mallocs used during MatSetValues calls =0
                      using I-node routines: found 4 nodes, limit used is 5
            linear system matrix = precond matrix:
            Mat Object: 1 MPI processes
              type: seqaij
              rows=18, cols=18, bs=3
              total: nonzeros=324, allocated nonzeros=324
              total number of mallocs used during MatSetValues calls =0
                using I-node routines: found 4 nodes, limit used is 5
          linear system matrix = precond matrix:
          Mat Object: 8 MPI processes
            type: mpiaij
            rows=18, cols=18, bs=3
            total: nonzeros=324, allocated nonzeros=324
            total number of mallocs used during MatSetValues calls =0
              using I-node (on process 0) routines: found 4 nodes, limit used is 5
      Down solver (pre-smoother) on level 1 -------------------------------
        KSP Object: (mg_coarse_mg_levels_1_) 8 MPI processes
          type: chebyshev
            Chebyshev: eigenvalue estimates:  min = 0.151062, max = 1.66169
            Chebyshev: eigenvalues estimated using gmres with translations  [0 0.1; 0 1.1]
            KSP Object: (mg_coarse_mg_levels_1_esteig_) 8 MPI processes
              type: gmres
                GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
                GMRES: happy breakdown tolerance 1e-30
              maximum iterations=10, initial guess is zero
              tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
              left preconditioning
              using NONE norm type for convergence test
          maximum iterations=2
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using nonzero initial guess
          using NONE norm type for convergence test
        PC Object: (mg_coarse_mg_levels_1_) 8 MPI processes
          type: sor
            SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
          linear system matrix = precond matrix:
          Mat Object: 8 MPI processes
            type: mpiaij
            rows=111, cols=111, bs=3
            total: nonzeros=5661, allocated nonzeros=5661
            total number of mallocs used during MatSetValues calls =0
              using I-node (on process 0) routines: found 19 nodes, limit used is 5
      Up solver (post-smoother) same as down solver (pre-smoother)
      Down solver (pre-smoother) on level 2 -------------------------------
        KSP Object: (mg_coarse_mg_levels_2_) 8 MPI processes
          type: chebyshev
            Chebyshev: eigenvalue estimates:  min = 0.176955, max = 1.94651
            Chebyshev: eigenvalues estimated using gmres with translations  [0 0.1; 0 1.1]
            KSP Object: (mg_coarse_mg_levels_2_esteig_) 8 MPI processes
              type: gmres
                GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
                GMRES: happy breakdown tolerance 1e-30
              maximum iterations=10, initial guess is zero
              tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
              left preconditioning
              using NONE norm type for convergence test
          maximum iterations=2
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using nonzero initial guess
          using NONE norm type for convergence test
        PC Object: (mg_coarse_mg_levels_2_) 8 MPI processes
          type: sor
            SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
          linear system matrix = precond matrix:
          Mat Object: 8 MPI processes
            type: mpiaij
            rows=1239, cols=1239, bs=3
            total: nonzeros=82125, allocated nonzeros=82125
            total number of mallocs used during MatSetValues calls =0
              using I-node (on process 0) routines: found 47 nodes, limit used is 5
      Up solver (post-smoother) same as down solver (pre-smoother)
      Down solver (pre-smoother) on level 3 -------------------------------
        KSP Object: (mg_coarse_mg_levels_3_) 8 MPI processes
          type: chebyshev
            Chebyshev: eigenvalue estimates:  min = 0.170478, max = 1.87526
            Chebyshev: eigenvalues estimated using gmres with translations  [0 0.1; 0 1.1]
            KSP Object: (mg_coarse_mg_levels_3_esteig_) 8 MPI processes
              type: gmres
                GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
                GMRES: happy breakdown tolerance 1e-30
              maximum iterations=10, initial guess is zero
              tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
              left preconditioning
              using NONE norm type for convergence test
          maximum iterations=2
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using nonzero initial guess
          using NONE norm type for convergence test
        PC Object: (mg_coarse_mg_levels_3_) 8 MPI processes
          type: sor
            SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
          linear system matrix = precond matrix:
          Mat Object: 8 MPI processes
            type: mpiaij
            rows=46314, cols=46314, bs=3
            total: nonzeros=3.23669e+06, allocated nonzeros=3.23669e+06
            total number of mallocs used during MatSetValues calls =0
              using I-node (on process 0) routines: found 2016 nodes, limit used is 5
      Up solver (post-smoother) same as down solver (pre-smoother)
      linear system matrix = precond matrix:
      Mat Object: 8 MPI processes
        type: mpiaij
        rows=46314, cols=46314, bs=3
        total: nonzeros=3.23669e+06, allocated nonzeros=3.23669e+06
        total number of mallocs used during MatSetValues calls =0
          using I-node (on process 0) routines: found 2016 nodes, limit used is 5
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object: (mg_levels_1_) 8 MPI processes
      type: chebyshev
        Chebyshev: eigenvalue estimates:  min = 0.176403, max = 1.94043
        Chebyshev: eigenvalues estimated using gmres with translations  [0 0.1; 0 1.1]
        KSP Object: (mg_levels_1_esteig_) 8 MPI processes
          type: gmres
            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            GMRES: happy breakdown tolerance 1e-30
          maximum iterations=10, initial guess is zero
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using NONE norm type for convergence test
      maximum iterations=4
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object: (mg_levels_1_) 8 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object: 8 MPI processes
        type: mpiaij
        rows=332145, cols=332145, bs=3
        total: nonzeros=2.4896e+07, allocated nonzeros=2.4896e+07
        total number of mallocs used during MatSetValues calls =0
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaij
    rows=332145, cols=332145, bs=3
    total: nonzeros=2.4896e+07, allocated nonzeros=2.4896e+07
    total number of mallocs used during MatSetValues calls =0
KSP Object: 8 MPI processes
  type: gmres
    GMRES: restart=150, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=150
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1e+15
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: mg
    MG: type is MULTIPLICATIVE, levels=2 cycles=v
      Cycles per PCApply=1
      Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level -------------------------------
    KSP Object: (mg_coarse_) 8 MPI processes
      type: cg
      maximum iterations=200, initial guess is zero
      tolerances:  relative=1e-08, absolute=1e-50, divergence=1000
      left preconditioning
      using NONE norm type for convergence test
    PC Object: (mg_coarse_) 8 MPI processes
      type: gamg
        MG: type is MULTIPLICATIVE, levels=4 cycles=v
          Cycles per PCApply=1
          Using Galerkin computed coarse grid matrices
        GAMG specific options
          Threshold for dropping small values from graph 0
          AGG specific options
            Symmetric graph false
      Coarse grid solver -- level -------------------------------
        KSP Object: (mg_coarse_mg_coarse_) 8 MPI processes
          type: gmres
            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            GMRES: happy breakdown tolerance 1e-30
          maximum iterations=1, initial guess is zero
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using NONE norm type for convergence test
        PC Object: (mg_coarse_mg_coarse_) 8 MPI processes
          type: bjacobi
            block Jacobi: number of blocks = 8
            Local solve is same for all blocks, in the following KSP and PC objects:
          KSP Object: (mg_coarse_mg_coarse_sub_) 1 MPI processes
            type: preonly
            maximum iterations=1, initial guess is zero
            tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
            left preconditioning
            using NONE norm type for convergence test
          PC Object: (mg_coarse_mg_coarse_sub_) 1 MPI processes
            type: lu
              LU: out-of-place factorization
              tolerance for zero pivot 2.22045e-14
              using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
              matrix ordering: nd
              factor fill ratio given 5, needed 1
                Factored matrix follows:
                  Mat Object: 1 MPI processes
                    type: seqaij
                    rows=18, cols=18, bs=3
                    package used to perform factorization: petsc
                    total: nonzeros=324, allocated nonzeros=324
                    total number of mallocs used during MatSetValues calls =0
                      using I-node routines: found 4 nodes, limit used is 5
            linear system matrix = precond matrix:
            Mat Object: 1 MPI processes
              type: seqaij
              rows=18, cols=18, bs=3
              total: nonzeros=324, allocated nonzeros=324
              total number of mallocs used during MatSetValues calls =0
                using I-node routines: found 4 nodes, limit used is 5
          linear system matrix = precond matrix:
          Mat Object: 8 MPI processes
            type: mpiaij
            rows=18, cols=18, bs=3
            total: nonzeros=324, allocated nonzeros=324
            total number of mallocs used during MatSetValues calls =0
              using I-node (on process 0) routines: found 4 nodes, limit used is 5
      Down solver (pre-smoother) on level 1 -------------------------------
        KSP Object: (mg_coarse_mg_levels_1_) 8 MPI processes
          type: chebyshev
            Chebyshev: eigenvalue estimates:  min = 0.152589, max = 1.67848
            Chebyshev: eigenvalues estimated using gmres with translations  [0 0.1; 0 1.1]
            KSP Object: (mg_coarse_mg_levels_1_esteig_) 8 MPI processes
              type: gmres
                GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
                GMRES: happy breakdown tolerance 1e-30
              maximum iterations=10, initial guess is zero
              tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
              left preconditioning
              using NONE norm type for convergence test
          maximum iterations=2
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using nonzero initial guess
          using NONE norm type for convergence test
        PC Object: (mg_coarse_mg_levels_1_) 8 MPI processes
          type: sor
            SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
          linear system matrix = precond matrix:
          Mat Object: 8 MPI processes
            type: mpiaij
            rows=111, cols=111, bs=3
            total: nonzeros=5661, allocated nonzeros=5661
            total number of mallocs used during MatSetValues calls =0
              using I-node (on process 0) routines: found 19 nodes, limit used is 5
      Up solver (post-smoother) same as down solver (pre-smoother)
      Down solver (pre-smoother) on level 2 -------------------------------
        KSP Object: (mg_coarse_mg_levels_2_) 8 MPI processes
          type: chebyshev
            Chebyshev: eigenvalue estimates:  min = 0.177057, max = 1.94762
            Chebyshev: eigenvalues estimated using gmres with translations  [0 0.1; 0 1.1]
            KSP Object: (mg_coarse_mg_levels_2_esteig_) 8 MPI processes
              type: gmres
                GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
                GMRES: happy breakdown tolerance 1e-30
              maximum iterations=10, initial guess is zero
              tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
              left preconditioning
              using NONE norm type for convergence test
          maximum iterations=2
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using nonzero initial guess
          using NONE norm type for convergence test
        PC Object: (mg_coarse_mg_levels_2_) 8 MPI processes
          type: sor
            SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
          linear system matrix = precond matrix:
          Mat Object: 8 MPI processes
            type: mpiaij
            rows=1239, cols=1239, bs=3
            total: nonzeros=82125, allocated nonzeros=82125
            total number of mallocs used during MatSetValues calls =0
              using I-node (on process 0) routines: found 47 nodes, limit used is 5
      Up solver (post-smoother) same as down solver (pre-smoother)
      Down solver (pre-smoother) on level 3 -------------------------------
        KSP Object: (mg_coarse_mg_levels_3_) 8 MPI processes
          type: chebyshev
            Chebyshev: eigenvalue estimates:  min = 0.160141, max = 1.76155
            Chebyshev: eigenvalues estimated using gmres with translations  [0 0.1; 0 1.1]
            KSP Object: (mg_coarse_mg_levels_3_esteig_) 8 MPI processes
              type: gmres
                GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
                GMRES: happy breakdown tolerance 1e-30
              maximum iterations=10, initial guess is zero
              tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
              left preconditioning
              using NONE norm type for convergence test
          maximum iterations=2
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using nonzero initial guess
          using NONE norm type for convergence test
        PC Object: (mg_coarse_mg_levels_3_) 8 MPI processes
          type: sor
            SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
          linear system matrix = precond matrix:
          Mat Object: 8 MPI processes
            type: mpiaij
            rows=46314, cols=46314, bs=3
            total: nonzeros=3.23669e+06, allocated nonzeros=3.23669e+06
            total number of mallocs used during MatSetValues calls =0
              using I-node (on process 0) routines: found 2016 nodes, limit used is 5
      Up solver (post-smoother) same as down solver (pre-smoother)
      linear system matrix = precond matrix:
      Mat Object: 8 MPI processes
        type: mpiaij
        rows=46314, cols=46314, bs=3
        total: nonzeros=3.23669e+06, allocated nonzeros=3.23669e+06
        total number of mallocs used during MatSetValues calls =0
          using I-node (on process 0) routines: found 2016 nodes, limit used is 5
  Down solver (pre-smoother) on level 1 -------------------------------
    KSP Object: (mg_levels_1_) 8 MPI processes
      type: chebyshev
        Chebyshev: eigenvalue estimates:  min = 0.149352, max = 1.64288
        Chebyshev: eigenvalues estimated using gmres with translations  [0 0.1; 0 1.1]
        KSP Object: (mg_levels_1_esteig_) 8 MPI processes
          type: gmres
            GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
            GMRES: happy breakdown tolerance 1e-30
          maximum iterations=10, initial guess is zero
          tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
          left preconditioning
          using NONE norm type for convergence test
      maximum iterations=4
      tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
      left preconditioning
      using nonzero initial guess
      using NONE norm type for convergence test
    PC Object: (mg_levels_1_) 8 MPI processes
      type: sor
        SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
      linear system matrix = precond matrix:
      Mat Object: 8 MPI processes
        type: mpiaij
        rows=332145, cols=332145, bs=3
        total: nonzeros=2.4896e+07, allocated nonzeros=2.4896e+07
        total number of mallocs used during MatSetValues calls =0
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaij
    rows=332145, cols=332145, bs=3
    total: nonzeros=2.4896e+07, allocated nonzeros=2.4896e+07
    total number of mallocs used during MatSetValues calls =0
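One detail visible in the view above: the GAMG preconditioner sits inside the outer MG coarse solver, so it carries the options prefix mg_coarse_, and it reports the default threshold ("Threshold for dropping small values from graph 0"). The threshold discussed earlier in the thread would therefore have to be passed with that prefix. A minimal sketch, with a purely illustrative value of 0.01:

  #include <petscsys.h>

  /* Sketch only: the GAMG PC shown above has the prefix "mg_coarse_", so its
     threshold option is -mg_coarse_pc_gamg_threshold. The value 0.01 is
     illustrative, not a recommendation. */
  static PetscErrorCode SetCoarseGAMGThreshold(void)
  {
    PetscErrorCode ierr;

    ierr = PetscOptionsSetValue(NULL, "-mg_coarse_pc_gamg_threshold", "0.01");CHKERRQ(ierr);
    return 0;
  }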