Thank you very much for the guidance. I'll get back with the answers tomorrow.

On Sat, May 30, 2015 at 11:50 PM, Jed Brown <[email protected]> wrote:

> Harshad Sahasrabudhe <[email protected]> writes:
>
> >> Surely you're familiar with this.
> >
> > Yes, I'm familiar with this. We are running on an Intel Xeon E5 processor.
> > It has enough bandwidth and performance.
>
> One core saturates a sizeable fraction of the memory bandwidth for the
> socket. You certainly can't expect 10x speedups when moving from 1 to
> 16 cores for a memory bandwidth limited application.
>
> >> Is the poor scaling due to increased iteration count? What method are
> >> you using?
> >
> > This is exactly why we have poor scaling. We have tried KSPGMRES.
>
> GMRES is secondary for this discussion; which preconditioner are you
> using and how many iterations does it require?
>
> >> This sounds like a problem with your code (non-scalable data structure).
> >
> > We need to work on the algorithm for matrix assembly. In its current
> > state, one CPU ends up doing much of the work. This could be the cause of
> > bad memory scaling. This doesn't contribute to the bad scaling of time
> > stepping; the time taken for time stepping is counted separately from
> > assembly.
>
> This is a linear autonomous system?
>
> >> How long does it take to solve that system stand-alone using MAGMA,
> >> including the data transfers?
> >
> > I'm still working on these tests.
>
> Do that first.
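
The bandwidth argument in the quoted reply can be made concrete with a rough model. The sketch below assumes a kernel limited purely by memory bandwidth on a single socket, so that speedup stops growing once the cores together saturate the socket. The function name and the 0.25 figure (one core drawing a quarter of the socket's bandwidth) are illustrative assumptions, not measurements from this thread.

```python
def bandwidth_limited_speedup(cores, single_core_fraction):
    """Upper bound on speedup for a purely bandwidth-bound kernel.

    cores: number of cores used on one socket.
    single_core_fraction: fraction of the socket's memory bandwidth that a
        single core can draw by itself (hypothetical input, in (0, 1]).
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 < single_core_fraction <= 1.0:
        raise ValueError("single_core_fraction must be in (0, 1]")
    # Speedup grows linearly with cores until the socket bandwidth is
    # saturated, then stays flat at 1 / single_core_fraction.
    return float(min(cores, 1.0 / single_core_fraction))


if __name__ == "__main__":
    # Assumed value for illustration only: one core uses 25% of the bandwidth.
    for n in (1, 2, 4, 8, 16):
        print(n, bandwidth_limited_speedup(n, 0.25))
```

Under that assumed fraction the bound flattens at 4x from 4 cores onward, which is the kind of ceiling the reply describes. A measured figure from a STREAM-style benchmark on the actual machine would replace the 0.25.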
