Re: [petsc-users] Offloading linear solves in time stepper to GPU

Jed Brown Sat, 30 May 2015 20:51:07 -0700

Harshad Sahasrabudhe <[email protected]> writes:

>>
>>  Surely you're familiar with this.
>
>
> Yes, I'm familiar with this. We are running on an Intel Xeon E5 processor. It
> has enough bandwidth and performance.


One core saturates a sizeable fraction of the memory bandwidth for the
socket.  You certainly can't expect 10x speedups when moving from 1 to
16 cores for a memory bandwidth limited application.
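The point above can be made concrete with a simple bound. This is a minimal sketch that assumes the kernel is purely memory-bandwidth-bound, that a single core alone sustains `core_bw`, and that the socket saturates at `socket_bw`. The function name and the numbers in the usage note are hypothetical, not measurements of the Xeon E5 in question.

```python
def bandwidth_bound_speedup(cores: int, core_bw: float, socket_bw: float) -> float:
    """Upper bound on the speedup of a purely memory-bandwidth-bound kernel.

    Aggregate bandwidth grows linearly with the core count until it hits the
    socket limit; beyond that, adding cores does not move more data per second.
    """
    if cores < 1 or core_bw <= 0 or socket_bw <= 0:
        raise ValueError("cores, core_bw and socket_bw must be positive")
    achievable_bw = min(cores * core_bw, socket_bw)
    # Speedup relative to one core is the ratio of achievable bandwidths.
    return achievable_bw / core_bw
```

With a hypothetical 10 GB/s per core and 40 GB/s per socket, `bandwidth_bound_speedup(16, 10.0, 40.0)` gives 4.0: if one core already uses a quarter of the socket's bandwidth, 16 cores cannot do better than 4x on such a kernel.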

>> Is the poor scaling due to increased iteration count?  What method are you
>> using?
>
> This is exactly why we have poor scaling. We have tried KSPGMRES.

GMRES is secondary for this discussion; which preconditioner are you
using and how many iterations does it require?

>> This sounds like a problem with your code (non-scalable data structure).
>
> We need to work on the algorithm for matrix assembly. In its current
> state, one CPU ends up doing much of the work. This could be the cause of
> bad memory scaling. It doesn't contribute to the bad scaling of time
> stepping, since the time taken for time stepping is counted separately from
> assembly.

This is a linear autonomous system?
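One reason the question matters, offered as an interpretation rather than something stated in the thread: for a linear autonomous system u' = A u advanced with a fixed step dt by backward Euler, every step solves (I - dt A) u_{n+1} = u_n with the same matrix, so the expensive setup (here, a factorization) can be done once and each step reduces to cheap solves. The sketch below assumes a small dense A and uses a hand-written LU with partial pivoting purely for illustration; the function names are hypothetical, and a real code would use a library's sparse direct or iterative solver instead.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_factor(a: Matrix) -> Tuple[Matrix, List[int]]:
    """Dense LU with partial pivoting; returns packed L\\U and the row permutation."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))  # pivot row
        if lu[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu: Matrix, perm: List[int], b: List[float]) -> List[float]:
    """Solve A x = b given the factorization P A = L U from lu_factor."""
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):  # forward substitution with unit lower triangle
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):  # back substitution with upper triangle
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def backward_euler(a: Matrix, u0: List[float], dt: float, steps: int) -> List[float]:
    """Integrate u' = A u with backward Euler, factoring I - dt*A only once."""
    n = len(a)
    m = [[(1.0 if i == j else 0.0) - dt * a[i][j] for j in range(n)] for i in range(n)]
    lu, perm = lu_factor(m)  # constant operator: factor once, outside the loop
    u = u0[:]
    for _ in range(steps):
        u = lu_solve(lu, perm, u)  # each step is just two triangular solves
    return u
```

For the scalar case A = [[-1.0]], `backward_euler([[-1.0]], [1.0], 0.1, 10)` returns approximately (1/1.1)^10, the exact backward Euler result. The same reasoning applies to a GPU solver: if the operator is constant, setup and transfer of the matrix can be amortized over all time steps.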

>> How long does it take to solve that system stand-alone using MAGMA, including
>> the data transfers?
>
> I'm still working on these tests.

Do that first.
