On Sat, Oct 6, 2012 at 5:26 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> So, could we use a single kernel launcher for multi-core, CUDA, OpenCL
> based on this principle? Then VecCUDAGetArray() type things would keep
> track of parts of Vecs based on IS instead of all entries in the Vec.
> Similarly there would be a VecMultiCoreGetArray(). Whenever possible the
> VecXXXGetArray() would not require copies. As part of this model I'd
> also like to separate the "moving needed data" part of the kernel from the
> "computation on the data" so that everything doesn't block when data is
> being moved around.
>

Hmm, "kernel" code is different in each case. I think it's premature to
try to share the launcher now, but perhaps it could be restructured to
support that case. Note that sometimes (even now) we want to ensure that a
memory copy is up to date before launching a kernel. In the threads case,
we could make a collective VecXXGetArray(), but on the device, we have to
do the transfer before landing in device code.

>
> Ok, how about moving this same model up to the MPI level? We already do
> this with IS converted to VecScatter (for performance) for updating ghost
> points (for matrix-vector products, for PDE ghost points etc) (note we can
> hide the VecScatter inside the IS and have it created as needed).
>

VecGetSubVector() sort of does this "hiding the VecScatter". In the
general MPI world, we need a "start" for this sort of subvector access to
overlap comm with computation. I think a huge number of operations can be
phrased as asynchronous access to subvectors and submatrices, but that's a
separate discussion.
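To make the "start" idea concrete, here is a rough sketch in plain C of what split-phase subvector access could look like. None of this is PETSc API: SubAccess, SubAccessBegin(), SubAccessEnd(), SubAccessDestroy() and overlap_demo() are hypothetical names made up for illustration, and the "data movement" is just a local gather standing in for a VecScatter or a device transfer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical handle for asynchronous access to part of a vector.
   Nothing here is PETSc API; it only illustrates the begin/end split. */
typedef struct {
  const double *src;   /* storage of the parent vector */
  const int    *idx;   /* index set selecting the entries we need */
  int           n;     /* number of selected entries */
  double       *buf;   /* gathered copy, valid only after SubAccessEnd() */
  int           ready; /* nonzero once the data movement has completed */
} SubAccess;

/* "Start": record what is needed. A real implementation would post
   nonblocking communication or a device transfer here and return. */
static void SubAccessBegin(SubAccess *a, const double *src, const int *idx, int n)
{
  a->src   = src;
  a->idx   = idx;
  a->n     = n;
  a->buf   = malloc((size_t)n * sizeof(double));
  a->ready = 0;
  assert(a->buf != NULL);
}

/* "End": complete the movement and hand back the data.
   This sketch simply performs the gather at this point. */
static double *SubAccessEnd(SubAccess *a)
{
  for (int i = 0; i < a->n; i++) a->buf[i] = a->src[a->idx[i]];
  a->ready = 1;
  return a->buf;
}

static void SubAccessDestroy(SubAccess *a)
{
  free(a->buf);
  a->buf   = NULL;
  a->ready = 0;
}

/* Sum a 6-entry vector, treating the two end entries as "ghost" values
   that arrive through the split-phase access. */
static double overlap_demo(void)
{
  const double x[6]     = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
  const int    ghost[2] = {0, 5};
  SubAccess    a;
  double       sum = 0.0;

  SubAccessBegin(&a, x, ghost, 2);             /* request ghost entries */
  for (int i = 1; i <= 4; i++) sum += x[i];    /* interior work needs no ghosts */
  const double *g = SubAccessEnd(&a);          /* now wait for the ghosts */
  sum += g[0] + g[1];
  SubAccessDestroy(&a);
  return sum;
}
```

The point is only the shape of the interface: the caller states which entries it needs, does whatever work does not depend on them, and blocks only at the "end" call. Whether the handle hides a VecScatter, a thread-collective get, or a host-to-device copy would be an implementation detail.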
