I meant to paste the following in my initial post (this is with a 5000-by-250
matrix... somehow, after I ran it a couple of times, I was getting a memory
error with the 10K matrix):

In [79]: wwt_gpu, gputime  = matrixmul_opt(Wsub,Wsub.T)
number of registers used: 25

In [80]: gputime
Out[80]: 0.5596442222595215

In [81]: wwt_gpu, gputime  = matrixmul_opt(Wsub,Wsub.T)
number of registers used: 25

In [82]: gputime
Out[82]: 0.5658891201019287

In [83]: wwt_gpu, gputime  = matrixmul_opt(Wsub,Wsub.T)
number of registers used: 25

In [84]: gputime
Out[84]: 0.5873079299926758

In [85]: t = time(); wwt_cpu = Wsub.dot(Wsub.T); dt_cpu = time()-t

In [86]: dt_cpu
Out[86]: 0.4057891368865967

In [87]: t = time(); wwt_cpu = Wsub.dot(Wsub.T); dt_cpu = time()-t

In [88]: dt_cpu
Out[88]: 0.36435890197753906

In [89]: t = time(); wwt_cpu = Wsub.dot(Wsub.T); dt_cpu = time()-t

In [90]: dt_cpu
Out[90]: 0.3597581386566162
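For reference, the CPU numbers above come from wrapping a single dot() call in time(). A slightly more robust way to get the CPU-side figure is to take the best of several runs, which filters out one-off noise (the random Wsub below is a stand-in for the actual matrix, just to make the snippet self-contained):

```python
import numpy as np
from timeit import repeat

# Stand-in for Wsub: a random 5000-by-250 matrix, matching the case above
Wsub = np.random.rand(5000, 250)

# Best-of-5 wall-clock timing of the inner product Wsub . Wsub^T;
# number=1 means each sample times a single call
dt_cpu = min(repeat(lambda: Wsub.dot(Wsub.T), number=1, repeat=5))
print("cpu dot: %.4f s" % dt_cpu)
```

Taking the minimum over repeats is the usual convention for benchmarking, since the fastest run is the one least perturbed by other activity on the machine.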


On Sun, Sep 30, 2012 at 6:04 PM, W. Bryan Smith <[email protected]> wrote:

> hi all,
>
> I am just getting started with PyCUDA and wanted to check its performance
> on matrix multiplication.
>
> I copied the demo at
> http://wiki.tiker.net/PyCuda/Examples/DemoMetaMatrixmulCheetah, and I can
> get it to run just fine.  But the performance, as measured by the returned
> gputime, is consistently a little slower than using numpy's built-in
> linalg.dot() function.  I have left all the default settings as defined in
> the demo file, and on a test matrix of size (10000, 250) the PyCUDA version
> of the inner product takes about 10-15% longer than the numpy version.  Are
> there default settings I can tweak to make this faster?  Or, alternatively,
> is there something else I should be doing to test this?
>
> I am running the CUDA-5.0 libraries, PyCUDA 2012.1, and Cheetah 2.4.4 on
> OS X 10.7.4.
>
> thanks,
> bryan
>
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda