Hi,

> Yes, each MPI process is responsible for solving a system of
> nonlinear equations on a number of grid cells.


Just to elaborate, and Ed can correct me, each MPI process has a few hundred
to a few thousand spatial cells.  We solve a (Fokker-Planck) system in
velocity space at each grid cell.

Thanks, Mark, this helps. Is there any chance you can collect a couple of spatial cells together and solve a bigger system consisting of decoupled subsystems?
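
To make the batching idea concrete, here is a minimal sketch in plain Python (not PETSc or ViennaCL code). It assumes each cell contributes one small dense linear system, for example a single linearization step of the nonlinear solve; the names block_diag, solve, cell_blocks, and cell_rhs are hypothetical and the numbers are made up for illustration. It only shows that one solve over the block-diagonal system returns the same result as the separate per-cell solves.

```python
# Sketch only: pure-Python illustration, not PETSc or ViennaCL code.

def block_diag(blocks):
    """Assemble square dense blocks (lists of rows) into one block-diagonal matrix."""
    n = sum(len(b) for b in blocks)
    big = [[0.0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                big[offset + i][offset + j] = float(value)
        offset += len(b)
    return big


def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs stay untouched.
    m = [list(map(float, a[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Two hypothetical "cells", each with its own small linearized system.
cell_blocks = [[[4.0, 1.0], [1.0, 3.0]], [[2.0, 0.0], [1.0, 5.0]]]
cell_rhs = [[1.0, 2.0], [4.0, 7.0]]

# One batched solve over the block-diagonal system ...
batched = solve(block_diag(cell_blocks), [v for r in cell_rhs for v in r])
# ... gives the same values as the per-cell solves, concatenated.
per_cell = [v for b, r in zip(cell_blocks, cell_rhs) for v in solve(b, r)]
```

In practice the big system would be stored in a sparse or batched format rather than as a dense matrix; the dense version here is only to keep the sketch short.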

Ideally you have more than 100k dofs for GPUs to perform well. Have a look at this figure here (cross-over at about 10k dofs for CUDA):
  http://viennacl.sourceforge.net/uploads/pics/cg-timings.png
to get an idea about the saturation of GPU solves at smaller system sizes. PCI-Express latency is the limiting factor here.

Best regards,
Karli
