Hi Bryan,

"W. Bryan Smith" <[email protected]> writes:
> i am just getting started with pycuda, and wanted to check the performance
> on matrix multiplication.
>
> i copied the demo at
> http://wiki.tiker.net/PyCuda/Examples/DemoMetaMatrixmulCheetah, and i can
> get it to run just fine. but the performance, as measured by the returned
> gputime, is consistently a little slower than using numpy's builtin
> linalg.dot() function. i have left all the default settings as defined in
> the demo file, and on a test matrix of size (10000,250) the pycuda version
> of the inner product takes about 10-15% longer than the numpy version. are
> there default settings i can tweak to make this faster? or, alternatively,
> is there something else i should be doing to test this?
>
> I am running the CUDA-5.0 libraries, pycuda 2012.1, and Cheetah 2.4.4 on
> OS X 10.7.4
First, DemoMetaMatrixmulCheetah is just a demo (tuned for the GT200
generation, IIRC). PyCUDA per se does not come with an optimized matmul
implementation, but scikits.cuda wraps CUBLAS and should give you
competitive performance:

http://lebedov.github.com/scikits.cuda/generated/scikits.cuda.linalg.dot.html

Also, the memory error that you saw was likely due to a refcounting bug
that was recently fixed in git.

Andreas

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
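[Editor's note: a minimal sketch of the scikits.cuda route suggested above,
assuming the scikits.cuda API of that era (`pycuda.autoinit`,
`pycuda.gpuarray.to_gpu`, `scikits.cuda.linalg.init`/`dot`); the matrix
sizes are illustrative, and the code falls back to a numpy-only check when
no CUDA stack is installed.]

```python
# Sketch: CUBLAS-backed dot via scikits.cuda, checked against numpy.
import numpy as np

a = np.random.rand(10000, 250).astype(np.float32)
b = np.random.rand(250, 32).astype(np.float32)
c_ref = np.dot(a, b)  # numpy reference result

try:
    import pycuda.autoinit              # creates a CUDA context on import
    import pycuda.gpuarray as gpuarray
    import scikits.cuda.linalg as culinalg
    culinalg.init()                     # loads CUBLAS

    a_gpu = gpuarray.to_gpu(a)          # host -> device copies
    b_gpu = gpuarray.to_gpu(b)
    c_gpu = culinalg.dot(a_gpu, b_gpu)  # CUBLAS gemm under the hood
    assert np.allclose(c_ref, c_gpu.get(), atol=1e-2)
    print("CUBLAS result matches numpy")
except ImportError:
    print("no CUDA stack available; numpy reference only")
```

Note that for skinny matrices like these, the host-to-device transfers can
dominate the total time, so a single GPU dot call will not necessarily beat
numpy unless the data already lives on the device.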
