Hi Jack,

Thanks for your comments. I had not thought of using a wrapper. The overhead with small blocks is certainly worrisome. I have not really played around with BLAS in a long time, so I don't have a good idea of where the break-even size might be. I might experiment with this at some point just to satisfy my curiosity.
Thanks again,

Dave

Jack Poulson writes:
> Dave,
>
> That will probably not be a very good idea due to the overhead associated
> with transferring data to and from the GPU being more expensive than the
> computation itself for small problems. This issue can be somewhat avoided
> by writing trivial wrappers for routines like dgemm which only run the
> multiply on the GPU when the dimensions of the problem are above some
> threshold, but this would require slightly more work than simply replacing
> BLAS with CUBLAS.
>
> Jack
>
> On Fri, Feb 24, 2012 at 8:28 PM, Dave Nystrom <Dave.Nystrom at tachyonlogic.com> wrote:
>
> > I was wondering if anyone had ever tried using cuBlas as a substitute for
> > something like MKL with PETSc. I've been wondering if it would give better
> > performance than MKL for my direct solves with cholmod even though the
> > block sizes are small for cholmod i.e. 32x32 is the default I believe. If so,
> > were there any tricky aspects to using cuBlas in this way?
> >
> > Thanks,
> >
> > Dave
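
For reference, the size-threshold wrapper Jack describes could look roughly like the sketch below. It is written in Python only to keep it short and runnable. Every name in it (`cpu_gemm`, `gpu_gemm`, `choose_backend`, `gemm`, `GPU_THRESHOLD`) is hypothetical, the GPU path is a stub rather than a real CUBLAS call, and the cutoff of 256 is an arbitrary placeholder, since the actual break-even size would have to be measured on the target hardware.

```python
# Hypothetical sketch: none of these names come from PETSc, CHOLMOD, or CUBLAS.

GPU_THRESHOLD = 256  # assumed cutoff; the real break-even size must be measured


def cpu_gemm(a, b):
    """Plain triple-loop multiply, standing in for a host BLAS dgemm."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def gpu_gemm(a, b):
    """Placeholder for a GPU-backed multiply.

    A real wrapper would copy the operands to the device, run the GPU
    dgemm, and copy the result back. This stub reuses the CPU path so
    the sketch runs without a GPU.
    """
    return cpu_gemm(a, b)


def choose_backend(m, n, k, threshold=GPU_THRESHOLD):
    """Return "gpu" only when every dimension reaches the threshold."""
    return "gpu" if min(m, n, k) >= threshold else "cpu"


def gemm(a, b, threshold=GPU_THRESHOLD):
    """Compute A * B, dispatching on problem size; returns (result, backend)."""
    m, k, n = len(a), len(a[0]), len(b[0])
    if len(b) != k:
        raise ValueError("inner dimensions of A and B do not match")
    backend = choose_backend(m, n, k, threshold)
    multiply = gpu_gemm if backend == "gpu" else cpu_gemm
    return multiply(a, b), backend
```

With this cutoff, the 32x32 blocks mentioned for cholmod would always stay on the CPU path, which is the case the transfer-overhead argument is about. Whether a lower threshold would ever pay off is exactly the break-even question raised above.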
