Thanks, Jed,

On Tue, Jun 9, 2020 at 3:19 PM Jed Brown <[email protected]> wrote:
> Fande Kong <[email protected]> writes:
>
> > Hi All,
> >
> > I am trying to interpret the results from "make stream" on two compute
> > nodes, where each node has 48 cores.
> >
> > If my calculations are memory bandwidth limited, such as AMG, MatVec,
> > GMRES, etc..
>
> There's a lot more to AMG setup than memory bandwidth (architecture
> matters a lot, even between different generation CPUs).

Could you elaborate a bit more on this? From my understanding, one big part of AMG setup is RAP, which should be pretty much bandwidth-limited. So is the graph coarsening part the one affected by architecture?

> MatMult and
> Krylov are almost pure bandwidth.
>
> > The best speedup I could get is 16.6938 if I start from one core?? The
> > speedup for function evaluations and Jacobian evaluations can be better
> > than 16.6938?
>
> Residual and Jacobians can be faster, especially if your code is slow
> (poorly vectorized, branchy, or has a lot of arithmetic).

It will be branchy when we handle complicated multiphysics.

> Are you trying to understand perf on current hardware or make decisions
> about new hardware?

The nodes are INL supercomputer nodes. I am trying to understand the best speedup I could get when running MOOSE/PETSc on that machine for the linear algebra part.

Thanks,

Fande,
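
As a rough illustration of the bound discussed above: for a kernel that is almost purely bandwidth-limited (MatMult, Krylov), the best speedup over one core is roughly the ratio of the aggregate memory bandwidth at N processes to the bandwidth at one process. The sketch below assumes you have one bandwidth rate per process count; the function name and the rates are hypothetical placeholders, not measurements from any machine.

```python
def bandwidth_speedup(rates_mb_s):
    """Map process count -> speedup relative to the 1-process rate.

    rates_mb_s: dict {number_of_processes: aggregate bandwidth in MB/s}.
    Must contain an entry for 1 process, which is used as the baseline.
    """
    base = rates_mb_s[1]
    return {n: rate / base for n, rate in sorted(rates_mb_s.items())}


# Hypothetical rates (MB/s), made up for illustration only.
example_rates = {1: 10000.0, 2: 20000.0, 4: 38000.0, 8: 60000.0}

speedups = bandwidth_speedup(example_rates)

# The largest ratio is an approximate ceiling for bandwidth-limited kernels.
best_n = max(speedups, key=speedups.get)

for n, s in speedups.items():
    print(f"{n:3d} processes: speedup {s:.4f}")
print(f"approximate ceiling: {speedups[best_n]:.4f} at {best_n} processes")
```

Kernels with more arithmetic per byte (residual and Jacobian evaluation) are not bound by this ratio, which is why they can scale better than the stream number suggests.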
