On Tue, Apr 24, 2012 at 14:12, Daniel Lowell <redratio1 at gmail.com> wrote:
> I'm writing a vector type which uses flag syncing like you have in PETSc
> with Vec CUSP, however it uses asynchronous kernel launches
> (pipelining, etc.) and autotuned kernels. Not quite ready for primetime, but
> we have seen the value of it in terms of speedup.

Okay, but why do dozens of small kernel launches when all the data is
available up-front? I'm just skeptical that VecMDot should be implemented
for CUDA the way it currently is.
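
To make the "all the data is available up-front" point concrete, here is a minimal CPU sketch in plain C of the two strategies. It is not the PETSc or CUSP implementation: the names multi_dot_separate and multi_dot_fused are hypothetical, real scalars are assumed (no complex conjugation), and each call to the inner dot helper stands in for one small kernel launch.

```c
#include <stddef.h>

/* Hypothetical helper, not a PETSc function: one plain dot product.
   In the GPU analogy, each call corresponds to one small kernel launch. */
static double dot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Strategy 1: nv independent dot products, so x is traversed nv times
   (analogous to nv separate launches). */
void multi_dot_separate(size_t n, size_t nv, const double *x,
                        const double *const *y, double *result)
{
    for (size_t j = 0; j < nv; j++)
        result[j] = dot(n, x, y[j]);
}

/* Strategy 2: a single sweep over x that accumulates all nv results
   (analogous to one launch that handles every y[j] at once). */
void multi_dot_fused(size_t n, size_t nv, const double *x,
                     const double *const *y, double *result)
{
    for (size_t j = 0; j < nv; j++)
        result[j] = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double xi = x[i];          /* x[i] is loaded once ... */
        for (size_t j = 0; j < nv; j++)
            result[j] += xi * y[j][i];   /* ... and reused for every y[j] */
    }
}
```

Both functions compute the same result[j] = sum_i x[i] * y[j][i]. The fused form reads x once instead of nv times and, on a GPU, would pay the launch overhead once. Whether that wins in practice depends on the sizes involved and on how the reduction is done on the device, which this sketch does not model.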