You're probably aware the thrust library which you're already including also provides GPU algorithms for prefix-sum, reduce, sort etc eg. http://thrust.googlecode.com/svn/tags/1.2.0/doc/html/modules.html
On Mon, Aug 16, 2010 at 10:06 PM, Matthew Knepley <knepley at gmail.com> wrote: > I am no longer as pessimistic as I once was. To start, this is a good > report on prefix-sum in parallel: > > > http://www.cs.cmu.edu/~blelloch/papers/Ble93.pdf<http://www.cs.cmu.edu/%7Eblelloch/papers/Ble93.pdf> > > It can solve first-order recurrences optimally. Higher order recurrences > can be reduced to first-order, > however it is not work optimal (too big by a factor of the order, m), which > seems alright to me. Thus > we should be able to solve bandwidth m matrices efficiently. Here is a nice > implementation of prefix-sum > > http://code.google.com/p/cudpp/ > > This all leads me to believe that for a bounded number of nonzeros per row, > we can get an efficient > algorithm for triangular solve. > > Matt > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -- Chris Cooper Dr. D Studios -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20100817/5d3578e7/attachment.html>
