Re: [petsc-users] Offloading linear solves in time stepper to GPU

Harshad Sahasrabudhe Sat, 30 May 2015 21:04:17 -0700

> which preconditioner are you using and how many iterations does it
> require?

> This is a linear autonomous system?

> > > How long does it take to solve that system stand-alone using MAGMA,
> > > including the data transfers?
> >
> > I'm still working on these tests.
>
> Do that first.


Thank you very much for the guidance. I'll get back with the answers
tomorrow.


On Sat, May 30, 2015 at 11:50 PM, Jed Brown <[email protected]> wrote:

> Harshad Sahasrabudhe <[email protected]> writes:
>
> >>
> >>  Surely you're familiar with this.
> >
> >
> > Yes, I'm familiar with this. We are running on an Intel Xeon E5 processor.
> > It has enough bandwidth and performance.
>
> One core saturates a sizeable fraction of the memory bandwidth for the
> socket.  You certainly can't expect 10x speedups when moving from 1 to
> 16 cores for a memory bandwidth limited application.
>
> >> Is the poor scaling due to increased iteration count?  What method are
> >> you using?
> >
> > This is exactly why we have poor scaling. We have tried KSPGMRES.
>
> GMRES is secondary for this discussion; which preconditioner are you
> using and how many iterations does it require?
>
> >> This sounds like a problem with your code (non-scalable data structure).
> >
> > We need to work on the algorithm for matrix assembly. In its current
> > state, one CPU ends up doing much of the work. This could be the cause of
> > bad memory scaling. This doesn't contribute to the bad scaling of the time
> > stepping; the time taken for time stepping is counted separately from
> > assembly.
>
> This is a linear autonomous system?
>
> >> How long does it take to solve that system stand-alone using MAGMA,
> >> including the data transfers?
> >
> > I'm still working on these tests.
>
> Do that first.
>
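
As a rough illustration of the memory-bandwidth point in the quoted message,
here is a minimal sketch of the speedup ceiling for a bandwidth-limited
kernel. It assumes a simple model that is not taken from the thread: each core
can draw a fixed fraction of one socket's memory bandwidth, and the kernel
stops getting faster once that bandwidth is saturated. The function name and
the 0.25 fraction are hypothetical placeholders; a value measured on the
actual machine with a STREAM-type benchmark should replace them.

```python
def bandwidth_bound_speedup(n_cores, core_fraction, n_sockets=1):
    """Upper bound on speedup over one core for a bandwidth-limited kernel.

    n_cores       -- number of cores used
    core_fraction -- fraction of ONE socket's memory bandwidth that a single
                     core can draw (hypothetical input, must be in (0, 1])
    n_sockets     -- number of sockets the cores are spread over

    Assumed model: speedup grows linearly with the core count until the
    combined bandwidth of all sockets is saturated, then stays flat.
    """
    if not 0.0 < core_fraction <= 1.0:
        raise ValueError("core_fraction must be in (0, 1]")
    if n_cores < 1 or n_sockets < 1:
        raise ValueError("n_cores and n_sockets must be at least 1")
    # Total bandwidth available, measured in units of one core's bandwidth.
    saturation_limit = n_sockets / core_fraction
    return min(float(n_cores), saturation_limit)


if __name__ == "__main__":
    # Hypothetical figure: one core draws 25% of a socket's bandwidth.
    for cores in (1, 2, 4, 8, 16):
        bound = bandwidth_bound_speedup(cores, core_fraction=0.25)
        print(f"{cores:2d} cores: speedup <= {bound:.1f}")
```

Under these assumed numbers the bound flattens at 4x on one socket (8x if the
16 cores are spread over two sockets), so a 10x speedup from 1 to 16 cores is
out of reach for a kernel that is limited by memory bandwidth.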
