On 14 October 2015 at 16:50, Matthew Knepley <[email protected]> wrote:
> On Wed, Oct 14, 2015 at 7:34 AM, Timothée Nicolas <[email protected]> wrote:
>
>> OK, I see. Does it mean that the coarse grid solver is by default set up
>> with the options -ksp_type preonly -pc_type lu? What about the
>> multiprocessor case?
>>
>
> Small scale: We use redundant LU
>
> Large Scale: We use GAMG
>

Is your answer what "you" recommend, or what PETSc does by default?

Your answer gives the impression that PETSc makes a decision regarding the
choice of either redundant/LU or gamg based on something - e.g. the size of
the matrix, the number of cores (or some combination of the two).
Is that really what is happening inside PCMG?

>    Matt
>
>> Thx
>>
>> Timothee
>>
>> 2015-10-14 21:22 GMT+09:00 Matthew Knepley <[email protected]>:
>>
>>> On Tue, Oct 13, 2015 at 9:23 PM, Timothée Nicolas <[email protected]> wrote:
>>>
>>>> Dear all,
>>>>
>>>> I have been playing around with multigrid recently, namely with
>>>> /ksp/ksp/examples/tutorials/ex42.c, with /snes/examples/tutorial/ex5.c and
>>>> with my own implementation of a Laplacian type problem. In all cases, I
>>>> have noted no improvement whatsoever in the performance, whether in CPU
>>>> time or KSP iterations, by varying the number of levels of the multigrid
>>>> solver. As an example, I have attached the log_summary for ex5.c with
>>>> nlevels = 2 to 7, launched by
>>>>
>>>> mpiexec -n 1 ./ex5 -da_grid_x 21 -da_grid_y 21 -ksp_rtol 1.0e-9
>>>> -da_refine 6 -pc_type mg -pc_mg_levels # -snes_monitor -ksp_monitor
>>>> -log_summary
>>>>
>>>> where -pc_mg_levels is set to a number between 2 and 7.
>>>>
>>>> So there is a noticeable CPU time improvement from 2 levels to 3 levels
>>>> (30%), and then no improvement whatsoever. I am surprised because with 6
>>>> levels of refinement of the DMDA the fine grid has more than 1200 points so
>>>> with 3 levels the coarse grid still has more than 300 points which is still
>>>> pretty large (I assume the ratio between grids is 2). I am wondering how
>>>> the coarse solver efficiently solves the problem on the coarse grid with
>>>> such a large number of points? Given the principle of multigrid which is
>>>> to erase the smooth part of the error with relaxation methods, which are
>>>> usually efficient only for high frequency, I would expect optimal
>>>> performance when the coarse grid is basically just a few points in each
>>>> direction. Does anyone know why the performance saturates at low number of
>>>> levels? Basically what happens internally seems to be quite different from
>>>> what I would expect...
>>>>
>>>
>>> A performance model that counts only flops is not sophisticated enough
>>> to understand this effect. Unfortunately, nearly all MG
>>> books/papers use this model. What we need is a model that incorporates
>>> memory bandwidth (for pulling down the values), and
>>> also maybe memory latency. For instance, your relaxation pulls down all
>>> the values and makes a little progress. It does few flops,
>>> but lots of memory access. An LU solve does a little memory access, many
>>> more flops, but makes a lot more progress. If memory
>>> access is more expensive, then we have a tradeoff, and can understand
>>> using a coarse grid which is not just a few points.
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>>> Best
>>>>
>>>> Timothee
>>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
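
As a side note on the grid sizes quoted in the original question, here is a
minimal Python sketch of the arithmetic. It assumes the DMDA is non-periodic
with a refinement factor of 2, so that each refinement maps n points per
direction to 2*(n - 1) + 1, and it assumes that -pc_mg_levels k coarsens the
refined grid k - 1 times. The function names are made up for illustration and
are not part of the PETSc API.

```python
def refine(points, factor=2):
    """Points per direction after one refinement (assumed non-periodic rule)."""
    return factor * (points - 1) + 1


def level_sizes(base, refinements, factor=2):
    """Points per direction on every level, from the base grid to the finest."""
    sizes = [base]
    for _ in range(refinements):
        sizes.append(refine(sizes[-1], factor))
    return sizes


def coarse_points(base, refinements, nlevels, factor=2):
    """Points per direction on the coarsest grid of an nlevels hierarchy."""
    sizes = level_sizes(base, refinements, factor)
    if not 1 <= nlevels <= len(sizes):
        raise ValueError("nlevels must be between 1 and %d" % len(sizes))
    # The finest grid is sizes[-1]; each extra level steps one grid coarser.
    return sizes[-nlevels]


# Values taken from the command line in the question:
# -da_grid_x 21 -da_grid_y 21 -da_refine 6, with -pc_mg_levels from 2 to 7.
for nlevels in range(2, 8):
    n = coarse_points(21, 6, nlevels)
    print("levels=%d  coarse grid=%d x %d  unknowns=%d" % (nlevels, n, n, n * n))
```

Under these assumptions the fine grid has 1281 points per direction and the
3-level coarse grid has 321, which is consistent with the "more than 1200" and
"more than 300" figures above. Since ex5 is a 2D problem, that coarse grid
would carry 321 * 321 = 103041 unknowns, not a few hundred.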
