Hi all, I am just getting started with PyCUDA and wanted to check its performance on matrix multiplication.
I copied the demo at http://wiki.tiker.net/PyCuda/Examples/DemoMetaMatrixmulCheetah and can get it to run just fine. But the performance, as measured by the returned gputime, is consistently a little slower than numpy's built-in linalg.dot() function. With all the default settings from the demo file and a test matrix of size (10000, 250), the PyCUDA version of the inner product takes about 10-15% longer than the numpy version.

Are there default settings I can tweak to make this faster? Or, alternatively, is there something else I should be doing to test this?

I am running the CUDA 5.0 libraries, PyCUDA 2012.1, and Cheetah 2.4.4 on OS X 10.7.4.

thanks,
bryan
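P.S. In case it helps to see what I mean by "timing", here is roughly the kind of comparison I have in mind. This is only a minimal sketch, not the Cheetah-generated kernel from the demo: the kernel, the block size, and the matrix shapes below are all assumptions for illustration. It times the kernel alone with CUDA events (transfers excluded) and numpy.dot() with wall-clock time on the host.

import time
import numpy as np

import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# Naive one-thread-per-output-element matmul kernel (no shared-memory tiling).
mod = SourceModule("""
__global__ void matmul(const float *A, const float *B, float *C,
                       int M, int K, int N)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += A[row * K + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}
""")
matmul = mod.get_function("matmul")

# Shapes are assumptions: A matches the (10000, 250) test matrix, B is made up.
M, K, N = 10000, 250, 250
A = np.random.randn(M, K).astype(np.float32)
B = np.random.randn(K, N).astype(np.float32)
C = np.empty((M, N), dtype=np.float32)

# --- numpy timing (host, wall clock) ---
t0 = time.time()
C_ref = A.dot(B)
cpu_time = time.time() - t0

# --- GPU timing with CUDA events (kernel only; transfers not counted) ---
A_gpu = drv.mem_alloc(A.nbytes)
B_gpu = drv.mem_alloc(B.nbytes)
C_gpu = drv.mem_alloc(C.nbytes)
drv.memcpy_htod(A_gpu, A)
drv.memcpy_htod(B_gpu, B)

block = (16, 16, 1)
grid = ((N + block[0] - 1) // block[0], (M + block[1] - 1) // block[1])

start, end = drv.Event(), drv.Event()
start.record()
matmul(A_gpu, B_gpu, C_gpu,
       np.int32(M), np.int32(K), np.int32(N),
       block=block, grid=grid)
end.record()
end.synchronize()
gpu_time = start.time_till(end) * 1e-3   # time_till() returns milliseconds

drv.memcpy_dtoh(C, C_gpu)
print("numpy: %.4f s   kernel: %.4f s   max err: %g"
      % (cpu_time, gpu_time, np.abs(C - C_ref).max()))

Of course, whether the host-to-device/device-to-host copies are inside or outside the timed region, and whether the first (warm-up) launch is excluded, changes the comparison quite a bit, which is partly what I am unsure about.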
