Following Andreas' remark, I replaced the following in the code:
import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
import pycuda.autoinit
import numpy
import time
a=numpy.float32(numpy.random.randn(4000,4000))
b=numpy.float32(numpy.random.randn(4000,4000))
tic=time.time()
axb=numpy.dot(a,b) # I assume this time it is matrix multiplication, according to the numpy tutorials I've read...
toc=time.time()-tic
print toc,"CPU"
tic=time.time()
a_gpu = gpuarray.to_gpu(a)
b_gpu = gpuarray.to_gpu(b)
axbGPU = (numpy.dot(a_gpu,b_gpu)).get() # ditto here
toc=time.time()-tic
print toc,"GPU"
Here are the results I get:
2.06739115715 CPU
0.171211004257 GPU
It speeds up the calculation about 12 times (2.067 / 0.171 ≈ 12.1) :)
But I can't try bigger matrices; I don't have enough RAM :(
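(A caveat worth checking here: PyCUDA's gpuarray provides elementwise operations, and I don't believe it implements a matrix product, so numpy.dot(a_gpu, b_gpu) may not be running a true matmul on the GPU at all. Below is a minimal sketch of an explicit GPU matmul with a hand-written kernel; the kernel and names are illustrative, not PyCUDA API, and it assumes n is a multiple of the 16x16 block size.)

import numpy
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

n = 4000  # assumed to be a multiple of the block size below

mod = SourceModule("""
__global__ void matmul(const float *a, const float *b, float *c, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float acc = 0.0f;
        for (int k = 0; k < n; ++k)
            acc += a[row * n + k] * b[k * n + col];
        c[row * n + col] = acc;
    }
}
""")
matmul = mod.get_function("matmul")

a = numpy.random.randn(n, n).astype(numpy.float32)
b = numpy.random.randn(n, n).astype(numpy.float32)
a_gpu = gpuarray.to_gpu(a)
b_gpu = gpuarray.to_gpu(b)
c_gpu = gpuarray.empty((n, n), numpy.float32)

# naive one-thread-per-output-element matmul
matmul(a_gpu.gpudata, b_gpu.gpudata, c_gpu.gpudata, numpy.int32(n),
       block=(16, 16, 1), grid=(n // 16, n // 16))
axb = c_gpu.get()  # .get() blocks until the kernel has finished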
Hua Wong wrote:
Thanks, I'm also puzzled by the results, because I thought a 1e4 x 1e4
matrix was already ginormous...
I expected something like the 49x speedup from
test_gpuarray_speed_random.py (size ~16000000 gives a ~49x speedup).
So I guess I'm doing something wrong somewhere. I will check the test
script...
Getting:
0.46285700798 CPU
0.728541851044 GPU
with your code on a CentOS machine with a GTX280 and two quad-core E5410s.
Per B. Sederberg wrote:
I modified your code slightly so that you're comparing apples to apples
a bit better, and I'm getting even worse performance from the GPU
(GTX285 on Debian Testing):
0.652935028076 CPU
1.61081981659 GPU
Here's the new code, which puts the sending and receiving of the data
to/from the card inside the timed section, and also has the CPU perform
a float32 operation just like the GPU:
import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
import pycuda.autoinit
import numpy
import time
a=numpy.float32(numpy.random.randn(10000,10000)) # randn takes integer dimensions
tic=time.time()
a_square=a*a
toc=time.time()-tic
print toc,"CPU"
tic=time.time()
a_gpu = gpuarray.to_gpu(a)
a_squared = (a_gpu*a_gpu).get()
toc=time.time()-tic
print toc,"GPU"
It looks like you'll need to have even larger matrices before you'll
see a major GPU benefit, though I'm a bit surprised by these results.
Best,
Per
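One thing that can skew these numbers: kernel launches are asynchronous, so wall-clock timing can attribute cost to the wrong step (.get() forces a sync, which is why the total comes out right but the breakdown stays invisible). Here is a sketch, assuming pycuda.driver.Event's record/synchronize/time_since behave as in the PyCUDA docs, that times the copy-in, the multiply, and the copy-out separately:

import numpy
import pycuda.autoinit
import pycuda.driver as cuda
import pycuda.gpuarray as gpuarray

a = numpy.random.randn(10000, 10000).astype(numpy.float32)
start = cuda.Event()
done = cuda.Event()

start.record()
a_gpu = gpuarray.to_gpu(a)        # host -> device copy
done.record()
done.synchronize()
print done.time_since(start), "ms host->device"

start.record()
a_sq_gpu = a_gpu * a_gpu          # elementwise multiply on the GPU
done.record()
done.synchronize()
print done.time_since(start), "ms multiply"

start.record()
a_squared = a_sq_gpu.get()        # device -> host copy
done.record()
done.synchronize()
print done.time_since(start), "ms device->host"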
On Thu, May 28, 2009 at 6:55 AM, Hua Wong <hua.w...@pasteur.fr> wrote:
Here are the results I get:
0.865973949432 CPU
0.582780122757 GPU
I kind of expected more... (the GPU is a GTX280)
Of course, I can't rule out that I did something stupid; in fact, I
expect it...
Is this the acceleration I should expect from this kind of matrix
operation?
If yes, well cool... I guess.
If not, did I miss something?
Here is the code I use:
import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
import pycuda.autoinit
import numpy
import time
a=numpy.random.randn(10000,10000) # randn takes integer dimensions
tic=time.time()
a_square=a*a
toc=time.time()-tic
print toc,"CPU"
a_gpu = gpuarray.to_gpu(a.astype(numpy.float32))
tic=time.time()
a_squared = (a_gpu*a_gpu).get()
toc=time.time()-tic
print toc,"GPU"
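Another possible confound, just a guess on my part: the very first elementwise gpuarray operation may include one-time kernel compilation and context setup, so a warm-up pass before the timed run could change the numbers. A sketch:

import time
import numpy
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

a = numpy.random.randn(10000, 10000).astype(numpy.float32)
a_gpu = gpuarray.to_gpu(a)

warm = (a_gpu * a_gpu).get()   # untimed warm-up: pays any one-time setup cost

tic = time.time()
a_squared = (a_gpu * a_gpu).get()
toc = time.time() - tic
print toc, "GPU (warmed up)"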
_______________________________________________
PyCuda mailing list
PyCuda@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net