Unification approach for OpenMP/Threads/OpenCL/CUDA: Part 1: Memory

Jed Brown Sat, 6 Oct 2012 20:00:56 -0500

On Sat, Oct 6, 2012 at 5:26 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:


> So, could we use a single kernel launcher for multi-core, CUDA, and OpenCL
> based on this principle? Then VecCUDAGetArray()-type functions would keep
> track of parts of Vecs based on an IS instead of all entries in the Vec.
> Similarly, there would be a VecMultiCoreGetArray(). Whenever possible,
> VecXXXGetArray() would not require copies. As part of this model, I'd
> also like to separate the "moving needed data" part of the kernel from the
> "computation on the data" part so that everything doesn't block while data
> is being moved around.
>

Hmm, "kernel" code is different in each case. I think it's premature to try
to share the launcher now, but perhaps it could be restructured to support
that case.

Note that sometimes (even now) we want to ensure that a memory copy is up
to date before launching a kernel. In the threads case, we could make a
collective VecXXXGetArray(), but on the device, we have to do the transfer
before landing in device code.


>
>    OK, how about moving this same model up to the MPI level? We already do
> this with an IS converted to a VecScatter (for performance) for updating
> ghost points (for matrix-vector products, for PDE ghost points, etc.) (note
> we can hide the VecScatter inside the IS and have it created as needed).
>

VecGetSubVector() sort of does this "hiding the VecScatter." In the general
MPI setting, this sort of subvector access needs a separate "start" phase so
that communication can overlap with computation.

I think a huge number of operations can be phrased as asynchronous access
to subvectors and submatrices, but that's a separate discussion.
[petsc-dev] Unification approach for OpenMP/Threads/OpenCL/CUDA: Part 1: Memory