Hi Ed,
> Yes, each MPI process is responsible for solving a system of nonlinear
> equations on a number of grid cells. The nonlinear equations are
> solved by Picard iteration, and the time-consuming part is the
> formation and solution of the nonsymmetric sparse linear system
> arising from a rectangular grid with a regular finite difference
> stencil. All the linear systems have the same sparsity pattern but may
> have different numerical values. Since there are 16 cores on each node
> of Titan, there can be 16 separate, independent linear systems being
> solved concurrently. One may not want to batch or synchronize the
> solvers, since different grid cells may require different numbers of
> Picard iterations.
Hmm, this does not sound like something I would consider a good fit
for GPUs. With 16 MPI processes per node you get additional contention
for the one or two GPUs on that node, so you would have to rethink the
solution procedure as a whole. I can think of a scheme where each of
these systems is solved on a separate streaming multiprocessor (one
work group per system, in OpenCL terms), where synchronization is
cheaper; however, this is not covered by PETSc's standard
functionality. Either way, you would certainly trade robustness of the
implementation and a substantial amount of development time for a
speedup of maybe 2x (if you're lucky).
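For reference, I am assuming that each rank essentially runs a loop
like the one sketched below (PETSc 3.4-style KSPSetOperators();
FormJacobian, FormRHS, and PicardConverged stand in for your own
routines). The linear algebra backend is selected purely through
runtime options, so this loop itself would not need to change as long
as the Mat and Vec are created via MatSetFromOptions() /
VecSetFromOptions():

#include <petscksp.h>

/* Placeholders for the application's own routines (not PETSc API): */
extern PetscErrorCode FormJacobian(Mat A, Vec x); /* new values, same pattern */
extern PetscErrorCode FormRHS(Vec b, Vec x);
extern PetscBool      PicardConverged(Vec x);

/* One Picard loop per MPI rank; each rank owns its own small system,
 * hence the KSP lives on PETSC_COMM_SELF. */
PetscErrorCode SolvePicard(Mat A, Vec b, Vec x)
{
  KSP            ksp;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = KSPCreate(PETSC_COMM_SELF, &ksp);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);  /* -ksp_type, -pc_type, ... */
  do {
    ierr = FormJacobian(A, x);CHKERRQ(ierr);
    ierr = FormRHS(b, x);CHKERRQ(ierr);
    /* sparsity pattern is fixed, only the values change per iteration */
    ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  } while (!PicardConverged(x));
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}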
If you want to give it a try nonetheless, use
  -vec_type cusp -mat_type aijcusp
and a simple preconditioner such as Jacobi in order to avoid
host<->device communication.
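For a first experiment on one Titan node, the full option set could
then look something like this (sketch only; aprun is Titan's launcher,
./yourapp a placeholder for your executable):

  aprun -n 16 ./yourapp -vec_type cusp -mat_type aijcusp \
        -ksp_type gmres -pc_type jacobi -log_summary

The -log_summary output should give you a first idea of where the time
actually goes.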
Best regards,
Karli
Ed
On 12/12/2013 04:15 PM, Karl Rupp wrote:
Hi Mark,
> We have a lot of 5-point stencil operators on ~50x100 grids to solve.
> These are not symmetric and we have been using LU. We want to move
> this onto GPUs (Titan). What resources are there to do this?
do you have lots of such problems to solve simultaneously, or is there
some other feature that makes the problem expensive? 50x100 would mean
a system size of about 5000 dofs, which is too small to really benefit
from GPUs.
Best regards,
Karli