Okay... after many dir(pycuda.gpuarray) calls, I am ready for another try.

import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
import pycuda.autoinit
import numpy
import time

a=numpy.float32(numpy.random.randn(4000,4000))
b=numpy.float32(numpy.random.randn(4000,4000))

tic=time.time()
axb=numpy.dot(a,b)   #still assuming this part is correct
toc=time.time()-tic
print toc,"CPU"


tic=time.time()
#a_gpu = gpuarray.to_gpu(a)
#b_gpu = gpuarray.to_gpu(b)
axbGPU = (gpuarray.numpy.dot(a,b)) #found this one while browsing the functions in gpuarray.numpy
toc=time.time()-tic
print toc,"GPU"

print axb
print axbGPU

axb and axbGPU are the same now.

Timing is not extraordinary though, so...
1.86853194237 CPU
1.67322492599 GPU (on average the GPU timing is lower than the CPU's, still... meh)
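
One thing that makes me suspicious about my own numbers (a quick sanity check; my guess is that pycuda.gpuarray simply does an import numpy at module level, so gpuarray.numpy would be the plain numpy module):

import numpy
import pycuda.gpuarray as gpuarray

# If this prints True, gpuarray.numpy.dot is just numpy.dot in disguise
# and the "GPU" timing above never touched the card at all, which would
# explain why it is so close to the CPU timing.
print gpuarray.numpy is numpy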

Have I done this right? Or did I make another stupid move?

Per B. Sederberg wrote:
I'm not sure your code is doing what you mean it to.  I get totally
different results when running a dot product of two gpuarrays.  Did
you check the output to show that it was doing what you expect?  I'm
actually surprised it ran at all.

As far as I know, you can't simply replace numpy arrays with gpuarrays
in any numpy method, but I would love to be wrong about this...
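
If you really want the multiply to happen on the card, I think you would need a hand-written kernel (or a CUBLAS sgemm binding). A minimal, naive sketch, with no shared-memory tiling, and with the kernel name, block size and tolerances being my own choices for illustration:

import numpy
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void matmul(const float *a, const float *b, float *c, int n)
{
    // one thread per output element, row-major storage
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float acc = 0.0f;
        for (int k = 0; k < n; ++k)
            acc += a[row * n + k] * b[k * n + col];
        c[row * n + col] = acc;
    }
}
""")
matmul = mod.get_function("matmul")

n = 1024                       # multiple of the block edge below
a = numpy.random.randn(n, n).astype(numpy.float32)
b = numpy.random.randn(n, n).astype(numpy.float32)
a_gpu = gpuarray.to_gpu(a)
b_gpu = gpuarray.to_gpu(b)
c_gpu = gpuarray.empty((n, n), numpy.float32)

matmul(a_gpu.gpudata, b_gpu.gpudata, c_gpu.gpudata, numpy.int32(n),
       block=(16, 16, 1), grid=(n // 16, n // 16))

# float32 accumulation over n terms, so compare loosely
print numpy.allclose(c_gpu.get(), numpy.dot(a, b), rtol=1e-3, atol=1e-2)

A tiled kernel or CUBLAS would be much faster, but even this naive version keeps the data on the device for the whole multiply.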

Best,
Per

On Thu, May 28, 2009 at 10:34 AM, Hua Wong <hua.w...@pasteur.fr> wrote:
Following Andreas's remark, I replaced the following in the code:

import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
import pycuda.autoinit
import numpy
import time

a=numpy.float32(numpy.random.randn(4000,4000))
b=numpy.float32(numpy.random.randn(4000,4000))

tic=time.time()
axb=numpy.dot(a,b)   # I assume this time it is matrix multiplication,
                     # according to the numpy tutorials I've read...
toc=time.time()-tic
print toc,"CPU"


tic=time.time()
a_gpu = gpuarray.to_gpu(a)
b_gpu = gpuarray.to_gpu(b)
axbGPU = (numpy.dot(a_gpu,b_gpu)).get()  # ditto here
toc=time.time()-tic
print toc,"GPU"

Here are the results I get:
2.06739115715 CPU
0.171211004257 GPU

It speeds up the calculation 11 times :)
But I can't try bigger matrices; I run out of RAM :(
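
It is probably worth verifying the output as well, not just the timing; a sketch of the check I have in mind, assuming the script above ran to completion:

# compare the gpuarray result against the CPU reference, loosely,
# since the GPU side computes in float32
print numpy.allclose(axb, axbGPU, rtol=1e-3, atol=1e-2)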

Hua Wong wrote:
Thanks, I'm also puzzled by the results because I thought a 1e4*1e4 matrix
was already ginormous...

I expected something like a 49x speedup, as in test_gpuarray_speed_random.py
(size ~16000000 gives a 49x speedup).

So I guess I'm doing something wrong somewhere. I will check the test
script...

Getting:
0.46285700798 CPU
0.728541851044 GPU

with your code on a CentOS machine with a GTX280 and two quad-core E5410s.

Per B. Sederberg wrote:
I modified your code slightly to make it so you are comparing apples
to apples a bit better and I'm getting even worse performance for the
GPU (GTX285 on Debian Testing):

0.652935028076 CPU
1.61081981659 GPU

Here's the new code, which puts the sending and receiving of the data
to/from the card in the loop and also has the CPU perform a float32
operation just like the GPU:

import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
import pycuda.autoinit
import numpy
import time

a=numpy.float32(numpy.random.randn(10000,10000))  # randn wants integer dimensions

tic=time.time()
a_square=a*a
toc=time.time()-tic
print toc,"CPU"


tic=time.time()
a_gpu = gpuarray.to_gpu(a)
a_squared = (a_gpu*a_gpu).get()
toc=time.time()-tic
print toc,"GPU"

It looks like you'll need to have even larger matrices before you'll
see a major GPU benefit, though I'm a bit surprised by these results.
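
Also, two caveats on the timing itself: kernel launches are asynchronous (the .get() is what forces the wait here), and with the transfers inside the timed region the number is dominated by the PCIe copies, since an elementwise multiply does almost no arithmetic per byte moved. A sketch of isolating the compute with CUDA events (my own variable names, smaller matrix to be gentle on RAM):

import numpy
import pycuda.autoinit
import pycuda.driver as cuda
import pycuda.gpuarray as gpuarray

a = numpy.float32(numpy.random.randn(4000, 4000))
a_gpu = gpuarray.to_gpu(a)        # pay the host-to-device copy once, untimed

start = cuda.Event()
stop = cuda.Event()
start.record()
a_sq_gpu = a_gpu * a_gpu          # kernel is queued asynchronously
stop.record()
stop.synchronize()                # block until the multiply has really finished
print stop.time_since(start) / 1000.0, "GPU, compute only (s)"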

Best,
Per


On Thu, May 28, 2009 at 6:55 AM, Hua Wong <hua.w...@pasteur.fr> wrote:

Here are the results I get:
0.865973949432 CPU
0.582780122757 GPU

I kind of expected more... (the GPU is a GTX280)

Of course, I never rule out that I did something stupid; in fact, I expect it...
Is this the acceleration I should expect from this kind of matrix operation?
If yes, well, cool... I guess.
If not, did I miss something?

Here is the code I use:

import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
import pycuda.autoinit
import numpy
import time

a=numpy.random.randn(10000,10000)  # randn wants integer dimensions

tic=time.time()
a_square=a*a
toc=time.time()-tic
print toc,"CPU"

a_gpu = gpuarray.to_gpu(a.astype(numpy.float32))

tic=time.time()
a_squared = (a_gpu*a_gpu).get()
toc=time.time()-tic
print toc,"GPU"
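
In hindsight, the CPU multiply above runs in float64 while the GPU one runs in float32, so the two timings are not quite comparable. A fairer CPU baseline (along the lines of what Per does further up) would be something like:

a32 = a.astype(numpy.float32)

tic=time.time()
a_square=a32*a32
toc=time.time()-tic
print toc,"CPU float32"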

