Sorry, I forgot one more comment. Basically, I am comparing a "good" TAO solver (though this can be debated) with a "so-so" KSP solver, CG/Jacobi. If this "good" TAO solver cannot beat the performance of the "so-so" KSP, is there really any need to include the performance of the "good" KSP if my objective is focused on TAO and my methodology?
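[Aside: for concreteness, a minimal sketch of how the two solvers being compared might be selected through the PETSc/TAO API. The solver types (KSPCG, PCJACOBI, TAOBLMVM) and the 1e-8 tolerances are the ones named in this thread; the surrounding setup (object creation, operators) is assumed boilerplate, not code from the thread.

    #include <petscksp.h>
    #include <petsctao.h>

    /* Configure the "so-so" KSP baseline and the "good" TAO solver
       discussed in this thread. A, ksp, and tao are assumed to have
       been created and wired to the problem elsewhere. */
    PetscErrorCode ConfigureSolvers(Mat A, KSP ksp, Tao tao)
    {
      PC             pc;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      /* "so-so" baseline: CG with Jacobi preconditioning */
      ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
      ierr = KSPSetType(ksp, KSPCG);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCJACOBI);CHKERRQ(ierr);
      /* rtol = atol = 1e-8, as mentioned below */
      ierr = KSPSetTolerances(ksp, 1e-8, 1e-8, PETSC_DEFAULT, PETSC_DEFAULT);CHKERRQ(ierr);

      /* "good" TAO solver: bound-constrained limited-memory BLMVM */
      ierr = TaoSetType(tao, TAOBLMVM);CHKERRQ(ierr);
      ierr = TaoSetFromOptions(tao);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

The same choices can be made at run time with -ksp_type cg -pc_type jacobi -ksp_rtol 1e-8 -ksp_atol 1e-8 and -tao_type blmvm.]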
On Sunday, June 7, 2015, Justin Chang <[email protected]> wrote:

> Matt (Knepley),
>
> I see what you're saying and it makes perfect sense. The point of my work
> isn't necessarily to compare CG/Jacobi with GAMG. Rather, I am trying to
> compare both the numerical solution and the computational performance of
> my "correction" methodology (through optimization) with just solving the
> FEM problem normally. Of course this methodology is going to be more
> expensive, but I think it would be nice to have some "benchmark" to
> compare against. I have examples that show where the parallel efficiency
> of TAO overtakes CG/Jacobi, and I also have the AI showing that TAO is
> higher than CG/Jacobi and that both are invariant with respect to problem
> size.
>
> I ran some (smaller) experiments with GAMG and have noticed problems for
> which the GAMG wall-clock time is less than CG/Jacobi (though not by
> much). However, the problem is that it seems I cannot compute the
> arithmetic intensity for GAMG.
>
> The way I see it, I have these four options:
>
> 1) Stick with what I have and acknowledge that GAMG can be better for
> larger problems. Since I have compared TAO with CG/Jacobi, somebody else
> can compare GAMG with CG/Jacobi.
>
> 2) Do strong scaling studies with GAMG and TAO and forget about the AI
> stuff. If I do this, then IMHO the paper will lose much of its flavor.
>
> 3) Use a different performance model that can be used to measure GAMG. I
> can only imagine that the complexity of applying any other model would
> proliferate for GAMG.
>
> 4) Simply report FLOPS/s and the associated wall-clock times with respect
> to each solver. Yes, this is easily gamed, but I would think it can at
> least tell you something (i.e., if this metric drops for a given problem
> size, it can be an indicator that the program is losing some efficiency).
>
> Thoughts?
>
> Justin
>
> On Saturday, June 6, 2015, Matthew Knepley <[email protected]> wrote:
>
>> On Sat, Jun 6, 2015 at 4:29 AM, Justin Chang <[email protected]> wrote:
>>
>>> Matt and Mark, thank you for your responses.
>>>
>>> The reason I brought up GAMG is that it seems to me to be the
>>> preconditioner to use for elliptic problems. However, I am using
>>> CG/Jacobi for my larger problems and the solver converges (with
>>> -ksp_atol and -ksp_rtol set to 1e-8). Using GAMG I get roughly the
>>> same wall-clock time, but significantly fewer solver iterations.
>>>
>>> As I also mentioned in another mail, the ultimate purpose is to
>>> compare how this "correction" methodology using the TAO solver (with
>>> bound constraints) performs compared to the original methodology using
>>> the KSP solver (without constraints). I have the AI for BLMVM and
>>> CG/Jacobi, and they are roughly 0.3 and 0.2 respectively (do these
>>> sound about right?). Although the AI is higher for TAO, the ratio of
>>> actual FLOPS/s over AI*STREAMS BW is smaller, though I am not sure
>>> what conclusions to draw from that. This was also partly why I wanted
>>> to see what kind of metrics another KSP solver/preconditioner
>>> produces.
>>>
>>> Point being, if I were to draw such comparisons between TAO and KSP,
>>> would I get crucified if people found out I am using CG/Jacobi and not
>>> GAMG?
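[Aside: a minimal sketch of the bookkeeping behind the "ratio of actual FLOPS/s over AI*STREAMS BW" mentioned above. The flop count and timing would come from something like PETSc's -log_summary output and the bandwidth from a STREAM run; every number below is a placeholder, not a measurement from this thread.

    #include <stdio.h>

    int main(void)
    {
      /* Placeholder inputs: total flops, total bytes moved, wall-clock
         time, and measured STREAM bandwidth in bytes/s. */
      double flops     = 1.2e12;
      double bytes     = 4.0e12;
      double seconds   = 30.0;
      double stream_bw = 60.0e9;

      double ai       = flops / bytes;    /* arithmetic intensity (flops/byte) */
      double achieved = flops / seconds;  /* achieved FLOPS/s                  */
      double roofline = ai * stream_bw;   /* bandwidth-limited bound           */

      printf("AI       = %.3f flops/byte\n", ai);
      printf("achieved = %.3e FLOPS/s\n", achieved);
      printf("roofline = %.3e FLOPS/s\n", roofline);
      printf("fraction of roofline = %.2f\n", achieved / roofline);
      return 0;
    }

A drop in the "fraction of roofline" with problem size is the kind of efficiency-loss indicator that option 4 above is after.]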
>> Here is what someone like me reviewing your paper would say first. I
>> can believe that a well-conditioned problem would converge using
>> CG/Jacobi. However, if the highest-order derivative looks like the
>> Laplacian, then the condition number of the equations will be O(h^-2),
>> and even with CG the iteration count will be O(h^-1), so the number of
>> iterations should increase as the square root of the problem size (in
>> 2D), whereas GAMG should be constant. Thus at some size GAMG will be
>> more efficient. I would want to see where the crossover is for your
>> problem. If you do not get the O(h^-1) dependence, I would think that
>> there is a problem in the formulation.
>>
>> Thanks,
>>
>>    Matt
>>
>>> Thanks,
>>> Justin
>>>
>>> On Fri, Jun 5, 2015 at 2:02 PM, Mark Adams <[email protected]> wrote:
>>>
>>>>> The overwhelming cost of AMG is the Galerkin triple-product RAP.
>>>>
>>>> That is overstating it a bit. It can be, if you have a hard 3D
>>>> operator and coarsening slowly is best.
>>>>
>>>> A rule of thumb is that you spend 50% of the time in the solver and
>>>> 50% in the setup, which is often mostly the RAP (in 3D; 2D is much
>>>> faster). That way you are within 2x of optimal, and it often works
>>>> out that way anyway.
>>>>
>>>> Mark
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which
>> their experiments lead.
>> -- Norbert Wiener
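[Aside: spelling out the scaling argument in Matt's reply, using the standard estimates for a CG-solved second-order elliptic problem; this derivation is an addition, not part of the thread.

    \[
      \kappa(A) = \mathcal{O}(h^{-2}), \qquad
      n_{\mathrm{CG}} = \mathcal{O}\!\left(\sqrt{\kappa(A)}\right)
                      = \mathcal{O}(h^{-1}),
    \]
    \[
      \text{in 2D } N = \mathcal{O}(h^{-2})
      \;\Rightarrow\; n_{\mathrm{CG}} = \mathcal{O}(N^{1/2}),
      \qquad n_{\mathrm{GAMG}} = \mathcal{O}(1).
    \]

So the CG/Jacobi iteration count grows like the square root of the problem size while GAMG's stays bounded, which is what produces the crossover Matt asks to see.]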
