On Tue, Apr 24, 2012 at 14:12, Daniel Lowell <redratio1 at gmail.com> wrote:

> I'm writing a vector type which uses flag syncing like you have in PETSc
> with Vec CUSP; however, it uses asynchronous kernel launches
> (pipelining, etc.) and autotuned kernels. Not quite ready for prime time, but
> we have seen the value of it in terms of speedup.


Okay, but why launch dozens of small kernels when all the data is
available up-front? I'm just skeptical that VecMDot should be implemented
for CUDA the way it currently is.
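
To make the question concrete, here is a minimal host-side sketch of the
fused alternative, assuming real scalars. It is not PETSc or CUSP code:
fused_mdot is a hypothetical name, and the loop body only stands in for what
a single GPU kernel could do. The point is that x is read once and reused for
every y_j, instead of doing one pass (or one launch) per dot product.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not a PETSc API.
// Computes out[j] = sum_i x[i] * ys[j][i] for every j in one pass over x.
std::vector<double> fused_mdot(const std::vector<double>& x,
                               const std::vector<std::vector<double>>& ys) {
    // All vectors must have the same length as x.
    for (const auto& y : ys) assert(y.size() == x.size());

    std::vector<double> out(ys.size(), 0.0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double xi = x[i];            // x[i] is loaded once ...
        for (std::size_t j = 0; j < ys.size(); ++j)
            out[j] += xi * ys[j][i];       // ... and reused for every y_j
    }
    return out;
}
```

A GPU version would presumably put the same inner loop into one kernel
followed by a reduction, but that part is left out here.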