Hi Bryan,

"W. Bryan Smith" <[email protected]> writes:
> I am just getting started with PyCUDA and wanted to check its performance
> on matrix multiplication.
>
> I copied the demo at
> http://wiki.tiker.net/PyCuda/Examples/DemoMetaMatrixmulCheetah, and I can
> get it to run just fine. But the performance, as measured by the returned
> gputime, is consistently a little slower than using numpy's builtin
> linalg.dot() function. I have left all the default settings as defined in
> the demo file, and on a test matrix of size (10000,250) the PyCUDA version
> of the inner product takes about 10-15% longer than the numpy version. Are
> there default settings I can tweak to make this faster? Or, alternatively,
> is there something else I should be doing to test this?
>
> I am running the CUDA 5.0 libraries, PyCUDA 2012.1, and Cheetah 2.4.4 on
> OS X 10.7.4.

First, DemoMetaMatrixmulCheetah is just a demo (tuned for the
GT200 generation, IIRC). PyCUDA per se does not come with an optimized
matmul implementation, but scikits.cuda wraps CUBLAS and should give you
competitive performance:

http://lebedov.github.com/scikits.cuda/generated/scikits.cuda.linalg.dot.html

Also, the memory error that you saw was likely due to a refcounting bug
that was recently fixed in git.

Andreas

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda