Wow, quick response! Yes the times still indicate that after 4 levels you get no improvement in time.
t = [1.5629e+01, 6.2692e+00, 5.3451e+00, 5.4948e+00, 5.4940e+00, 5.7643e+00]

I'll look more specifically at the numbers to see where the time is being transformed tomorrow when I am less drunk. It is a trade off between the work saved in the direct solve vs the work needed for the coarser levels in the multigrid cycle.

Try refining the grid a couple more times, likely more levels will still help in that case

Ahh, you should also try -pc_mg_type full

  Barry

> On Oct 14, 2015, at 10:53 PM, Timothée Nicolas <[email protected]>
> wrote:
>
> OK,
>
> Richardson is 30-70% faster for these tests, but other than this I don't see
> any change.
>
> Timothee
>
> 2015-10-15 12:37 GMT+09:00 Barry Smith <[email protected]>:
>
> Timothee,
>
> Thank you for reporting this issue, it is indeed disturbing and could be
> due to a performance regression we may have introduced by being too clever
> for our own good. Could you please rerun with the additional option
> -mg_levels_ksp_type richardson and send the same output?
>
> Thanks
>
> Barry
>
> > On Oct 14, 2015, at 9:32 PM, Timothée Nicolas <[email protected]>
> > wrote:
> >
> > Thank you Barry for pointing this out. Indeed on a system with no debugging
> > the Jacobian evaluations no longer dominate the time (less than 10%).
> > However the rest is similar, except the improvement from 2 to 3 levels is
> > much better. Still it saturates after levels=3. I understand it in terms of
> > CPU time thanks to Matthew's explanations, however what surprises me more
> > is that KSP iterations are not more efficient. At the least, even if it
> > takes more time to have more levels because of memory issues, I would
> > expect KSP iterations to converge more rapidly with more levels, but it is
> > not the case as you can see. Probably there is also a rationale behind this
> > but I cannot see easily.
> >
> > I send the new outputs
> >
> > Best
> >
> > Timothee
> >
> > 2015-10-15 3:02 GMT+09:00 Barry Smith <[email protected]>:
> > 1) Your timings are meaningless! You cannot compare timings when built with
> > all debugging on, PERIOD!
> >
> > ##########################################################
> > #                                                        #
> > #                       WARNING!!!                       #
> > #                                                        #
> > #   This code was compiled with a debugging option,      #
> > #   To get timing results run ./configure                #
> > #   using --with-debugging=no, the performance will      #
> > #   be generally two or three times faster.              #
> > #                                                        #
> > ##########################################################
> >
> > 2) Please run with -snes_view .
> >
> > 3) Note that with 7 levels
> >
> > SNESJacobianEval 21 1.0 2.4364e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 54 0 0 0 0 54 0 0 0 0 0
> >
> > with 2 levels
> >
> > SNESJacobianEval 6 1.0 2.2441e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
> >
> > The Jacobian evaluation is dominating the time! Likely if you fix the
> > debugging this will be less the case
> >
> > Barry
> >
> > > On Oct 13, 2015, at 9:23 PM, Timothée Nicolas
> > > <[email protected]> wrote:
> > >
> > > Dear all,
> > >
> > > I have been playing around with multigrid recently, namely with
> > > /ksp/ksp/examples/tutorials/ex42.c, with /snes/examples/tutorial/ex5.c
> > > and with my own implementation of a laplacian type problem. In all cases,
> > > I have noted no improvement whatsoever in the performance, whether in CPU
> > > time or KSP iteration, by varying the number of levels of the multigrid
> > > solver. As an example, I have attached the log_summary for ex5.c with
> > > nlevels = 2 to 7, launched by
> > >
> > > mpiexec -n 1 ./ex5 -da_grid_x 21 -da_grid_y 21 -ksp_rtol 1.0e-9
> > > -da_refine 6 -pc_type mg -pc_mg_levels # -snes_monitor -ksp_monitor
> > > -log_summary
> > >
> > > where -pc_mg_levels is set to a number between 2 and 7.
> > >
> > > So there is a noticeable CPU time improvement from 2 levels to 3 levels
> > > (30%), and then no improvement whatsoever. I am surprised because with 6
> > > levels of refinement of the DMDA the fine grid has more than 1200 points
> > > so with 3 levels the coarse grid still has more than 300 points which is
> > > still pretty large (I assume the ratio between grids is 2). I am
> > > wondering how the coarse solver efficiently solves the problem on the
> > > coarse grid with such a large number of points ? Given the principle of
> > > multigrid which is to erase the smooth part of the error with relaxation
> > > methods, which are usually efficient only for high frequency, I would
> > > expect optimal performance when the coarse grid is basically just a few
> > > points in each direction. Does anyone know why the performance saturates
> > > at low number of levels ? Basically what happens internally seems to be
> > > quite different from what I would expect...
> > >
> > > Best
> > >
> > > Timothee
> > > <ex5_2_levels_of_multigrid.log><ex5_3_levels_of_multigrid.log><ex5_4_levels_of_multigrid.log><ex5_5_levels_of_multigrid.log><ex5_6_levels_of_multigrid.log><ex5_7_levels_of_multigrid.log>
> >
> > <ex5_2_multigrid_levels.log><ex5_3_multigrid_levels.log><ex5_4_multigrid_levels.log><ex5_5_multigrid_levels.log><ex5_6_multigrid_levels.log><ex5_7_multigrid_levels.log>
>
> <ex5_2_multigrid_levels_richardson.log><ex5_3_multigrid_levels_richardson.log><ex5_4_multigrid_levels_richardson.log><ex5_5_multigrid_levels_richardson.log><ex5_6_multigrid_levels_richardson.log><ex5_7_multigrid_levels_richardson.log>
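
The grid sizes quoted in the original question (more than 1200 points on the fine grid, more than 300 on the coarse grid with 3 levels) can be checked with a short sketch. It assumes a non-periodic grid refined by a factor of 2, so that m points become 2*(m-1)+1, and it assumes the multigrid hierarchy coarsens by the same factor. The function names are made up for illustration and are not PETSc calls.

```python
def refine(m: int) -> int:
    """Points per direction after one refinement by 2 (assumed non-periodic rule)."""
    return 2 * (m - 1) + 1


def coarsen(m: int) -> int:
    """Inverse of refine: halve the number of intervals."""
    return (m - 1) // 2 + 1


def level_sizes(base: int, refinements: int, levels: int) -> list[int]:
    """Points per direction on each multigrid level, finest first."""
    fine = base
    for _ in range(refinements):
        fine = refine(fine)
    sizes = [fine]
    for _ in range(levels - 1):
        sizes.append(coarsen(sizes[-1]))
    return sizes


if __name__ == "__main__":
    # -da_grid_x 21 -da_grid_y 21 -da_refine 6, with -pc_mg_levels from 2 to 7
    for levels in range(2, 8):
        sizes = level_sizes(21, 6, levels)
        coarse = sizes[-1]
        print(f"{levels} levels: fine {sizes[0]} per direction, "
              f"coarse {coarse} per direction ({coarse * coarse} points in 2D)")
```

Under these assumptions the point counts are per direction, so for a 2D problem the coarse grid that the coarse solver has to handle with 3 levels is far larger than 300 unknowns, and only with 7 levels does it get back down to the 21 x 21 base grid.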

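
To make the saturation in the timings at the top of the thread easier to see, here is a small sketch that computes the relative change from one level count to the next. It assumes the six entries of t correspond, in order, to -pc_mg_levels 2 through 7. The thread does not state that mapping explicitly, so treat it as a guess based on the six attached logs.

```python
# Timings copied from the message at the top of the thread.
times = [1.5629e+01, 6.2692e+00, 5.3451e+00, 5.4948e+00, 5.4940e+00, 5.7643e+00]
# Assumed mapping: one timing per -pc_mg_levels value from 2 to 7.
levels = list(range(2, 8))


def relative_changes(ts):
    """Relative change in time between consecutive entries (negative means faster)."""
    return [(b - a) / a for a, b in zip(ts, ts[1:])]


def best_level(lv, ts):
    """Level count with the smallest time."""
    return lv[min(range(len(ts)), key=ts.__getitem__)]


if __name__ == "__main__":
    for (lo, hi), change in zip(zip(levels, levels[1:]), relative_changes(times)):
        print(f"{lo} -> {hi} levels: {change:+.1%}")
    print("fastest run at", best_level(levels, times), "levels")
```

With that mapping, the large gains come from the first two steps and the remaining changes stay within a few percent, which matches the remark that nothing improves after 4 levels.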