Hi all,

I am just getting started with PyCUDA and wanted to check its performance on
matrix multiplication.

I copied the demo at
http://wiki.tiker.net/PyCuda/Examples/DemoMetaMatrixmulCheetah, and I can
get it to run just fine.  But the performance, as measured by the returned
gputime, is consistently a little slower than NumPy's built-in dot()
function.  I have left all the default settings as defined in the demo file,
and on a test matrix of size (10000, 250) the PyCUDA version of the inner
product takes about 10-15% longer than the NumPy version.  Are there default
settings I can tweak to make this faster?  Or, alternatively, is there
something else I should be doing to test this?
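For reference, here is a stripped-down sketch of the kind of comparison I have
in mind.  It is not the Cheetah demo itself -- just a naive PyCUDA kernel I
wrote for illustration, timed with CUDA events against numpy's dot(); the
shapes, the float32 dtype, and the block size are my own stand-ins:

import time
import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# naive row-major matmul: C (m x n) = A (m x k) * B (k x n)
mod = SourceModule("""
__global__ void matmul(const float *a, const float *b, float *c,
                       int m, int k, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < m && col < n) {
        float acc = 0.0f;
        for (int i = 0; i < k; ++i)
            acc += a[row * k + i] * b[i * n + col];
        c[row * n + col] = acc;
    }
}
""")
matmul = mod.get_function("matmul")

# (10000, 250) test matrix; the "inner product" here is x.T.dot(x) -> (250, 250)
x = np.random.randn(10000, 250).astype(np.float32)
a = np.ascontiguousarray(x.T)   # (250, 10000), row-major
b = x                           # (10000, 250)
m, k, n = 250, 10000, 250
c = np.empty((m, n), dtype=np.float32)

# CPU side: wall-clock time around numpy's dot
t0 = time.time()
c_ref = np.dot(a, b)
cpu_ms = (time.time() - t0) * 1e3

# GPU side: copy inputs up front, then time just the kernel with CUDA events
a_gpu = drv.mem_alloc(a.nbytes)
b_gpu = drv.mem_alloc(b.nbytes)
c_gpu = drv.mem_alloc(c.nbytes)
drv.memcpy_htod(a_gpu, a)
drv.memcpy_htod(b_gpu, b)

block = (16, 16, 1)
grid = ((n + block[0] - 1) // block[0], (m + block[1] - 1) // block[1])
start, end = drv.Event(), drv.Event()
start.record()
matmul(a_gpu, b_gpu, c_gpu, np.int32(m), np.int32(k), np.int32(n),
       block=block, grid=grid)
end.record()
end.synchronize()
gpu_ms = start.time_till(end)

drv.memcpy_dtoh(c, c_gpu)
print("numpy: %.2f ms   kernel: %.2f ms   max abs diff: %g"
      % (cpu_ms, gpu_ms, np.abs(c - c_ref).max()))

Note that the event-based number in this sketch only covers the kernel itself,
not the host-to-device and device-to-host copies, while the numpy number is
plain wall-clock time.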

I am running the CUDA 5.0 libraries, PyCUDA 2012.1, and Cheetah 2.4.4 on
OS X 10.7.4.

thanks,
bryan