> On Jul 26, 2015, at 12:11 PM, Dave May <[email protected]> wrote:
> 
> Hi Barry,
> 
> 
> On 26 July 2015 at 05:18, Barry Smith <[email protected]> wrote:
> 
>   This dmdarepart business, which I am guessing is running PCMG on smaller 
> sets of processes with a DMDA on that smaller set of processes
> 
> Yes. The implementation queries the PC for a DMDA, creates a new DMDA on a 
> communicator of smaller size and attaches this to a KSP which utilizes the 
> same reduced communicator. Matrices and vectors are permuted into the new 
> ordering associated with the reduced DMDA. By attaching the reduced DMDA to 
> the nested KSP, Galerkin MG can be invoked from the command line. 
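> 
> In sketch form, the pattern is roughly the following (a rough sketch only, 
> not the actual implementation; the helper name, options prefix, grid sizes 
> and dof count are hypothetical):
> 
>   #include <petscksp.h>
>   #include <petscdmda.h>
> 
>   /* Hypothetical helper: build a KSP over a reduced communicator with a DMDA
>      attached, so that e.g. -repart_pc_type mg -repart_pc_mg_galerkin can
>      drive Galerkin MG on the sub-communicator. */
>   PetscErrorCode CreateRepartitionedKSP(PC pc, PetscMPIInt nsub, KSP *subksp)
>   {
>     MPI_Comm       comm, subcomm;
>     DM             dasub;
>     PetscMPIInt    rank;
>     PetscInt       M = 33, N = 33, P = 33;  /* global grid sizes (illustrative) */
>     PetscErrorCode ierr;
> 
>     PetscFunctionBeginUser;
>     *subksp = NULL;
>     ierr = PetscObjectGetComm((PetscObject)pc, &comm);CHKERRQ(ierr);
>     ierr = MPI_Comm_rank(comm, &rank);CHKERRQ(ierr);
>     /* The first nsub ranks form the reduced communicator; the rest sit idle. */
>     ierr = MPI_Comm_split(comm, rank < nsub ? 0 : MPI_UNDEFINED, rank, &subcomm);CHKERRQ(ierr);
>     if (subcomm != MPI_COMM_NULL) {
>       /* Same global grid, but partitioned over the reduced communicator. */
>       ierr = DMDACreate3d(subcomm, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
>                           DMDA_STENCIL_BOX, M, N, P, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
>                           1, 1, NULL, NULL, NULL, &dasub);CHKERRQ(ierr);
>       ierr = KSPCreate(subcomm, subksp);CHKERRQ(ierr);
>       ierr = KSPSetDM(*subksp, dasub);CHKERRQ(ierr);
>       ierr = KSPSetDMActive(*subksp, PETSC_FALSE);CHKERRQ(ierr); /* operators supplied explicitly */
>       ierr = KSPSetOptionsPrefix(*subksp, "repart_");CHKERRQ(ierr);
>       ierr = KSPSetFromOptions(*subksp);CHKERRQ(ierr);
>     }
>     /* The caller still permutes the operator and vectors into the reduced DMDA
>        ordering and calls KSPSetOperators()/KSPSolve() on the ranks owning subksp. */
>     PetscFunctionReturn(0);
>   }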

   This is great
> 
> One notable difference from the subcomm creation routine used by PCREDUNDANT 
> is that this implementation introduces idle cores.

  This is fine
> 
> I specifically wanted to use a smaller communicator, rather than have many 
> cores with zero local length, because if I use a Krylov method requiring 
> global reductions, I want those reductions performed over fewer cores. For 
> the network on the Cray XE6 I was using when this was being developed, the 
> reduction times were absolutely killing my performance at around 8k-16k 
> cores.

   Totally agree, using a smaller communicator is better
>  
> for a coarse problem is a fine idea, but you should keep in mind the rule of 
> thumb that parallel iterative (and even more so direct) solvers don't do 
> well when there are roughly 10,000 or fewer degrees of freedom per processor. 
> 
> The original motivation for this repartitioning implementation was precisely 
> to deal with the issue you raise above. In my application, my subdomains are 
> on the order of 24^3 Q2 elements and, with the current DMDA implementation, 
> this means I can typically use 4 levels of multi-grid. For the large jobs 
> (> 4k cores), this results in large coarse grid problems. The coarse grids 
> would generally consist of a single Q2 element with at most 375 non-zeros per 
> row (3D, velocity vector).
> 
> In my experience, I found the setup cost associated with PCREDUNDANT + 
> sparse direct to be too high to be a practical option. In addition, the 
> elliptic operators from my application possessed large coefficient 
> variations, making convergence of any standard iterative solver 
> (GMRES/{ASM,BJACOBI}) difficult. Even with PCREDUNDANT on reduced comm sizes 
> in combination with SUPERLU_DIST or MUMPS, I failed to find a strategy which 
> was both reliable and sufficiently fast. 
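> 
> (The kind of options combination referred to here looks roughly like the 
> following; the coarse prefix and the number of redundant groups are 
> illustrative, not what was actually run:)
> 
>   -mg_coarse_pc_type redundant
>   -mg_coarse_pc_redundant_number 16
>   -mg_coarse_redundant_pc_type lu
>   -mg_coarse_redundant_pc_factor_mat_solver_package superlu_dist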
> 
> I should note that I did have a lot of success using GAMG as my coarse level 
> solver at the scale I wanted to run at (at the time that was <16k cores). At 
> 200k cores, however, this might not be the case due to AMG setup times.
> 
> However, since GMG is always faster when it is applicable, I really wanted to 
> be able to continue coarsening until I reach the point where the Galerkin 
> coarse operators no longer capture my coefficient variations. For my problem, 
> this occurs at approximately 6-7 levels of coarsening - but the 
> constraints imposed by the DMDA partitioning implementation prevented such a 
> hierarchy from being defined.

  Sure, understood
> 
> I've raised the issue before and the consensus seemed to be that faking a 
> reduced comm size by having zero-length local rows in a Mat/Vec is just fine. 
> That might be the case, but this philosophy cannot be used in conjunction 
> with the current DMDA infrastructure to perform a "full machine" run (for 
> strong or weak scaling studies) using a geometric multi-grid implementation 
> whilst simultaneously retaining the fastest, most scalable solution strategy.

   I agree with you more than the "consensus". I think the consensus holds 
just because doing this properly is perceived as too difficult, or because we 
don't have the right infrastructure to do it "correctly"
> 
> In the end that is what I want to do. :D
> 
> I would be happy to contribute a similar repartitioning preconditioner to 
> PETSc.

   We'd love to have this reduced processor repartitioning for both DMDA/PCMG 
and for PCGAMG in PETSc.

  Barry

> 
> 
> Cheers,
>   Dave
> 
> So you should definitely not be using SuperLU_DIST in parallel to solve a 
> problem with 1048 degrees of freedom on 128 processes; just use PCREDUNDANT 
> and its default (sequential) LU. That should be faster.
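> 
> That is, roughly:
> 
>   -mg_coarse_pc_type redundant
> 
> with no further sub-options, so that the small coarse problem is solved 
> redundantly with the default sequential LU.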
> 
>   Barry
> 
> > On Jul 25, 2015, at 10:09 PM, Barry Smith <[email protected]> wrote:
> >
> >
> >  Don't use
> >
> > -mg_coarse_pc_factor_mat_solver_package superlu_dist
> > -mg_coarse_pc_type lu
> >
> >  with 8000+ processes and 1 degree of freedom per process; SuperLU_DIST 
> > will be terrible. Just leave the defaults for this and send the -log_summary
> >
> >  Barry
> >
> >> On Jul 24, 2015, at 2:44 PM, Michele Rosso <[email protected]> wrote:
> >>
> >> Barry,
> >>
> >> I attached ksp_view and log_summary for two different setups:
> >>
> >> 1) Plain MG on 5 levels + LU at the coarse level (files ending in mg5)
> >> 2) Plain MG on 5 levels + custom PC + LU at the coarse level (files ending 
> >> in mg7)
> >>
> >> The custom PC works on a subset of processes, thus allowing the use of two 
> >> more levels of MG, for a total of 7.
> >> Case 1) is extremely slow ( ~ 20 sec per solve ) and converges in 21 
> >> iterations.
> >> Case 2) is way faster ( ~ 0.25 sec per solve ) and converges in 29 
> >> iterations.
> >>
> >> Thanks for your help!
> >>
> >> Michele
> >>
> >>
> >> On Fri, 2015-07-24 at 13:56 -0500, Barry Smith wrote:
> >>>  The coarse problem for the PCMG (geometric multigrid) is
> >>>
> >>> Mat Object:       8192 MPI processes
> >>>        type: mpiaij
> >>>        rows=8192, cols=8192
> >>>
> >>> then it tries to solve it with algebraic multigrid on 8192 processes 
> >>> (which is completely insane). A lot of the time is spent in setting up 
> >>> the algebraic multigrid (not surprisingly).
> >>>
> >>> 8192 is kind of small to parallelize.  Please run the same code but with 
> >>> the default coarse grid solver instead of PCGAMG and send us the 
> >>> -log_summary again
> >>>
> >>>  Barry
> >>>
> >>>
> >>>> On Jul 24, 2015, at 1:35 PM, Michele Rosso <[email protected]> wrote:
> >>>>
> >>>> Hi Mark and Barry,
> >>>>
> >>>> I am sorry for my late reply: it was a busy week!
> >>>> I ran a test case for a larger problem with as many levels of MG as I 
> >>>> could (i.e. 5) and GAMG as the PC at the coarse level. I attached the 
> >>>> output of -info (after grepping for "gamg"), ksp_view and log_summary.
> >>>> The solve takes about 2 seconds on 8192 cores, which is way too much. 
> >>>> The number of iterations to convergence is 24.
> >>>> I hope there is a way to speed it up.
> >>>>
> >>>> Thanks,
> >>>> Michele
> >>>>
> >>>>
> >>>> On Fri, 2015-07-17 at 09:38 -0400, Mark Adams wrote:
> >>>>>
> >>>>>
> >>>>> On Thu, Jul 16, 2015 at 8:18 PM, Michele Rosso <[email protected]> wrote:
> >>>>> Barry,
> >>>>>
> >>>>> thank you very much for the detailed answer.  I tried what you 
> >>>>> suggested and it works.
> >>>>> So far I have tried it only on a small system, but the final goal is to 
> >>>>> use it for very large runs. How does PCGAMG compare to PCMG as far as 
> >>>>> performance and scalability are concerned?
> >>>>> Also, could you help me to tune the GAMG part (my current setup is in 
> >>>>> the attached ksp_view.txt file)?
> >>>>>
> >>>>>
> >>>>>
> >>>>> I am going to add this to the document today but you can run with 
> >>>>> -info.  This is very noisy so you might want to do the next step at run 
> >>>>> time.  Then grep on GAMG.  This will be about 20 lines.  Send that to 
> >>>>> us and we can go from there.
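> >>>>>
> >>>>> For example (the launcher and executable names here are illustrative):
> >>>>>
> >>>>>   mpiexec -n 8192 ./my_app <your usual options> -info 2>&1 | grep GAMG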
> >>>>>
> >>>>>
> >>>>> Mark
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> I also tried to use superlu_dist for the LU decomposition on 
> >>>>> mg_coarse_mg_coarse_sub_:
> >>>>> -mg_coarse_mg_coarse_sub_pc_type lu
> >>>>> -mg_coarse_mg_coarse_sub_pc_factor_mat_solver_package superlu_dist
> >>>>>
> >>>>> but I got an error:
> >>>>>
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> ****** Error in MC64A/AD. INFO(1) = -2
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>> symbfact() error returns 0
> >>>>>
> >>>>>
> >>>>> Thank you,
> >>>>> Michele
> >>>>>
> >>>>>
> >>>>> On Thu, 2015-07-16 at 18:07 -0500, Barry Smith wrote:
> >>>>>>
> >>>>>>> On Jul 16, 2015, at 5:42 PM, Michele Rosso <[email protected]> wrote:
> >>>>>>>
> >>>>>>> Barry,
> >>>>>>>
> >>>>>>> thanks for your reply. So if I want it fixed, I will have to use the 
> >>>>>>> master branch, correct?
> >>>>>>
> >>>>>>
> >>>>>>  Yes, or edit mg.c and remove the offending lines of code (easy 
> >>>>>> enough).
> >>>>>>
> >>>>>>>
> >>>>>>> On a side note, what I am trying to achieve is to be able to use as 
> >>>>>>> many levels of MG as I want, despite the limitation imposed by the 
> >>>>>>> local number of grid nodes.
> >>>>>>
> >>>>>>
> >>>>>>   I assume you are talking about DMDA? There is no generic 
> >>>>>> limitation for PETSc's multigrid; it is only the way the DMDA 
> >>>>>> code figures out the interpolation that causes a restriction.
> >>>>>>
> >>>>>>
> >>>>>>> So far I am using a borrowed code that implements a PC that creates a 
> >>>>>>> sub-communicator and performs MG on it.
> >>>>>>> While reading the documentation I found out that PCMGSetLevels takes 
> >>>>>>> in an optional array of communicators. How does this work?
> >>>>>>
> >>>>>>
> >>>>>>   It doesn't work. It was an idea that never got pursued.
> >>>>>>
> >>>>>>
> >>>>>>> Can I simply define my matrix and rhs on the fine grid as I would 
> >>>>>>> do normally (I do not use kspsetoperators and kspsetrhs) and have KSP 
> >>>>>>> take care of it by using the correct communicator for each 
> >>>>>>> level?
> >>>>>>
> >>>>>>
> >>>>>>   No.
> >>>>>>
> >>>>>>   You can use the PCMG geometric multigrid with DMDA for as many 
> >>>>>> levels as it works and then use PCGAMG as the coarse grid solver. 
> >>>>>> PCGAMG automatically uses fewer processes for the coarse level 
> >>>>>> matrices and vectors. You could do this all from the command line 
> >>>>>> without writing code.
> >>>>>>
> >>>>>>   For example, if your code uses a DMDA and calls KSPSetDM(), use 
> >>>>>> something like -da_refine 3 -pc_type mg -pc_mg_galerkin 
> >>>>>> -mg_coarse_pc_type gamg -ksp_view
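> >>>>>>
> >>>>>>   A minimal sketch of the KSPSetDM() pattern those options assume, in 
> >>>>>> the style of the DMDA KSP examples (the two callbacks are user-supplied 
> >>>>>> and the names here are hypothetical):
> >>>>>>
> >>>>>>     /* #include <petscdmda.h> and <petscksp.h>; after PetscInitialize() */
> >>>>>>     DM  da;
> >>>>>>     KSP ksp;
> >>>>>>     KSPCreate(PETSC_COMM_WORLD, &ksp);
> >>>>>>     DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
> >>>>>>                  DM_BOUNDARY_NONE, DMDA_STENCIL_STAR, 17, 17, 17,
> >>>>>>                  PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 1, 1,
> >>>>>>                  NULL, NULL, NULL, &da);
> >>>>>>     KSPSetDM(ksp, da);
> >>>>>>     KSPSetComputeOperators(ksp, ComputeMatrix, NULL); /* user callback */
> >>>>>>     KSPSetComputeRHS(ksp, ComputeRHS, NULL);          /* user callback */
> >>>>>>     KSPSetFromOptions(ksp);  /* picks up -da_refine, -pc_type mg, ... */
> >>>>>>     KSPSolve(ksp, NULL, NULL);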
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>  Barry
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Michele
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, 2015-07-16 at 17:30 -0500, Barry Smith wrote:
> >>>>>>>>   Michele,
> >>>>>>>>
> >>>>>>>>    This is a very annoying feature that has been fixed in master
> >>>>>>>> http://www.mcs.anl.gov/petsc/developers/index.html
> >>>>>>>>  I would like to have changed it in maint but Jed would have a 
> >>>>>>>> shit-fit :-) since it changes behavior.
> >>>>>>>>
> >>>>>>>>  Barry
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Jul 16, 2015, at 4:53 PM, Michele Rosso <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I am performing a series of solves inside a loop. The matrix for 
> >>>>>>>>> each solve changes, but not enough to justify a rebuild of the PC at 
> >>>>>>>>> each solve.
> >>>>>>>>> Therefore I am using KSPSetReusePreconditioner to avoid rebuilding 
> >>>>>>>>> unless necessary. The solver is CG + MG with a custom PC at the 
> >>>>>>>>> coarse level.
> >>>>>>>>> If KSP is not updated each time, everything works as it is supposed 
> >>>>>>>>> to.
> >>>>>>>>> When instead I allow the default PETSc behavior, i.e. updating the 
> >>>>>>>>> PC every time the matrix changes, the coarse level KSP, initially 
> >>>>>>>>> set to PREONLY, is changed to GMRES
> >>>>>>>>> after the first solve. I am not sure where the problem lies (my PC 
> >>>>>>>>> or PETSc), so I would like to have your opinion on this.
> >>>>>>>>> I attached the ksp_view for the 2 successive solves and the options 
> >>>>>>>>> stack.
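> >>>>>>>>>
> >>>>>>>>> In sketch form, the intended usage is roughly (variable names are 
> >>>>>>>>> hypothetical, not the actual code):
> >>>>>>>>>
> >>>>>>>>>   KSPSetReusePreconditioner(ksp, PETSC_TRUE); /* keep PC across solves */
> >>>>>>>>>   for (step = 0; step < nsteps; step++) {
> >>>>>>>>>     /* update matrix entries; same nonzero pattern */
> >>>>>>>>>     KSPSetOperators(ksp, A, A);
> >>>>>>>>>     KSPSolve(ksp, b, x);
> >>>>>>>>>   }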
> >>>>>>>>>
> >>>>>>>>> Thanks for your help,
> >>>>>>>>> Michel
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> <ksp_view.txt><petsc_options.txt>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>> <info.txt><ksp_view.txt><log_gamg.txt>
> >>>
> >>>
> >>>
> >>
> >> <ksp_view_mg5.txt><ksp_view_mg7.txt><log_mg5.txt><log_mg7.txt>
> >
> 
> 
