Harshad Sahasrabudhe <[email protected]> writes:

>> Surely you're familiar with this.
>
> Yes, I'm familiar with this. We are running on Intel Xeon E5 processor. It
> has enough bandwidth and performance.

One core saturates a sizeable fraction of the memory bandwidth for the
socket. You certainly can't expect 10x speedups when moving from 1 to 16
cores for a memory-bandwidth-limited application.

>> Is the poor scaling due to increased iteration count? What method are you
>> using?
>
> This is exactly why we have poor scaling. We have tried KSPGMRES.

GMRES is secondary for this discussion; which preconditioner are you
using and how many iterations does it require?

>> This sounds like a problem with your code (non-scalable data structure).
>
> We need to work on the algorithm for matrix assembly. In its current
> state, one CPU ends up doing much of the work. This could be the cause of
> bad memory scaling. This doesn't contribute to the bad scaling to time
> stepping, time taken for time stepping is counted separately from assembly.

This is a linear autonomous system?

>> How long does it take to solve that system stand-alone using MAGMA, including
>> the data transfers?
>
> I'm still working on these tests.

Do that first.
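
To make the memory bandwidth point above concrete, here is a minimal
sketch of the usual back-of-the-envelope model: a bandwidth-limited
kernel cannot speed up by more than the ratio of the total achievable
bandwidth to what a single core can draw. The function name and the
numbers (10 GB/s per core, 40 GB/s total) are hypothetical placeholders,
not measurements of the Xeon E5 in question; real values would have to
come from a STREAM-type benchmark on the actual machine.

```python
def bandwidth_bound_speedup(cores, per_core_bw, total_bw):
    """Upper bound on speedup for a memory-bandwidth-limited kernel.

    cores       -- number of cores used
    per_core_bw -- bandwidth one core can sustain alone (e.g. GB/s)
    total_bw    -- bandwidth all cores can sustain together (same unit)
    """
    if cores < 1 or per_core_bw <= 0 or total_bw <= 0:
        raise ValueError("cores must be >= 1 and bandwidths must be positive")
    # Cores add bandwidth linearly until the memory system saturates.
    achievable_bw = min(cores * per_core_bw, total_bw)
    # Run time scales inversely with achieved bandwidth.
    return achievable_bw / per_core_bw


if __name__ == "__main__":
    # Hypothetical figures, chosen only for illustration.
    PER_CORE_BW = 10.0  # GB/s, assumed
    TOTAL_BW = 40.0     # GB/s, assumed
    for p in (1, 2, 4, 8, 16):
        bound = bandwidth_bound_speedup(p, PER_CORE_BW, TOTAL_BW)
        print(f"{p:2d} cores: speedup <= {bound:.1f}")
```

With these assumed figures the bound flattens at 4x from 4 cores
onward, so 16 cores buy nothing beyond that for such a kernel.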
