Hua Wong wrote:
Darn... you are right... Back to square one.

Per B. Sederberg wrote:
I'm not sure your code is doing what you mean it to.  I get totally
different results when running a dot product of two gpuarrays.  Did
you check the output to show that it was doing what you expect?  I'm
actually surprised it ran at all.

As far as I know, you can't simply replace numpy arrays with gpuarrays
in any numpy method, but I would love to be wrong about this...
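For what it's worth, here is a minimal sketch of how a real matrix product could be run on the GPU with PyCUDA's SourceModule. This sketch is not from the thread; the kernel, the 16x16 block size and the 1024x1024 matrices are illustrative choices, and a tuned SGEMM (for example NVIDIA's CUBLAS) would be far faster than this naive version. It only illustrates that a dedicated kernel is needed rather than calling numpy.dot on gpuarrays:

import numpy
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

# Naive one-thread-per-output-element multiply of square float32
# matrices; purely illustrative, not a tuned SGEMM.
mod = SourceModule("""
__global__ void naive_matmul(const float *a, const float *b, float *c, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n)
    {
        float acc = 0.0f;
        for (int k = 0; k < n; ++k)
            acc += a[row * n + k] * b[k * n + col];
        c[row * n + col] = acc;
    }
}
""")
naive_matmul = mod.get_function("naive_matmul")

n = 1024
a = numpy.random.randn(n, n).astype(numpy.float32)
b = numpy.random.randn(n, n).astype(numpy.float32)

a_gpu = gpuarray.to_gpu(a)
b_gpu = gpuarray.to_gpu(b)
c_gpu = gpuarray.empty((n, n), numpy.float32)

block = (16, 16, 1)
grid = ((n + block[0] - 1) // block[0], (n + block[1] - 1) // block[1])
naive_matmul(a_gpu.gpudata, b_gpu.gpudata, c_gpu.gpudata, numpy.int32(n),
             block=block, grid=grid)

# Check the result against the CPU product (loose tolerance for float32).
print numpy.allclose(c_gpu.get(), numpy.dot(a, b), atol=1e-2)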

Best,
Per

On Thu, May 28, 2009 at 10:34 AM, Hua Wong <hua.w...@pasteur.fr> wrote:
Following Andreas' remark, I replaced the following in the code:

import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
import pycuda.autoinit
import numpy
import time

a=numpy.float32(numpy.random.randn(4000,4000))
b=numpy.float32(numpy.random.randn(4000,4000))

tic=time.time()
axb=numpy.dot(a,b)  # I assume this time it is matrix multiplication, according to the numpy tutorials I've read...
toc=time.time()-tic
print toc,"CPU"


tic=time.time()
a_gpu = gpuarray.to_gpu(a)
b_gpu = gpuarray.to_gpu(b)
axbGPU = (numpy.dot(a_gpu,b_gpu)).get()  # ditto here
toc=time.time()-tic
print toc,"GPU"

Here are the results I get:
2.06739115715 CPU
0.171211004257 GPU

It speeds up the calculation 11 times :)
But I can't try bigger matrices; I don't have enough RAM :(
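A quick check of the kind Per asks about above would show whether that 11x "speedup" really computed the matrix product. This is a sketch added for illustration, meant to be appended to the end of the script above, where axb and axbGPU are already defined:

# If numpy.dot had really performed a matrix multiply on the gpuarrays,
# the CPU and "GPU" results should agree to within float32 rounding error.
print axbGPU.shape
print numpy.allclose(axb, axbGPU, atol=1e-2)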

Hua Wong wrote:
Thanks, I'm also puzzled by the results because I thought a 1e4*1e4 matrix
was already ginormous...

I expected something like a 49x speedup, as in test_gpuarray_speed_random.py (size ~16000000 gives a 49x speedup).

So I guess I'm doing something wrong somewhere. I will check the test
script...

Getting:
0.46285700798 CPU
0.728541851044 GPU

with your code on a CentOS machine with a GTX280 and two quad-core E5410s

Per B. Sederberg wrote:
I modified your code slightly so that you're comparing apples to apples a bit
better, and I'm getting even worse performance for the GPU (GTX285 on Debian
Testing):

0.652935028076 CPU
1.61081981659 GPU

Here's the new code, which puts the sending and receiving of the data
to/from the card inside the timed section and also has the CPU perform a
float32 operation, just like the GPU:

import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
import pycuda.autoinit
import numpy
import time

a=numpy.float32(numpy.random.randn(10000,10000))

tic=time.time()
a_square=a*a
toc=time.time()-tic
print toc,"CPU"


tic=time.time()
a_gpu = gpuarray.to_gpu(a)
a_squared = (a_gpu*a_gpu).get()
toc=time.time()-tic
print toc,"GPU"

It looks like you'll need to have even larger matrices before you'll
see a major GPU benefit, though I'm a bit surprised by these results.
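One way to see where the time goes is to time the transfers and the GPU multiply separately with CUDA events. The sketch below is not part of Per's code, and the 4000x4000 size is an illustrative choice:

import numpy
import pycuda.autoinit
import pycuda.driver as cuda
import pycuda.gpuarray as gpuarray

a = numpy.random.randn(4000, 4000).astype(numpy.float32)

start = cuda.Event()
after_upload = cuda.Event()
after_compute = cuda.Event()
after_download = cuda.Event()

start.record()
a_gpu = gpuarray.to_gpu(a)           # host -> device copy
after_upload.record()
a_sq_gpu = a_gpu * a_gpu             # elementwise multiply on the GPU
after_compute.record()
a_squared = a_sq_gpu.get()           # device -> host copy
after_download.record()
after_download.synchronize()

# time_till returns milliseconds between two recorded events
print start.time_till(after_upload), "ms upload"
print after_upload.time_till(after_compute), "ms GPU multiply"
print after_compute.time_till(after_download), "ms download"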

Best,
Per


On Thu, May 28, 2009 at 6:55 AM, Hua Wong <hua.w...@pasteur.fr> wrote:

Here are the results I get:
0.865973949432 CPU
0.582780122757 GPU

I kind of expected more... (the GPU is a GTX280)

Of course, I can't rule out that I did something stupid; in fact, I expect it...
Is this the acceleration I should expect from this kind of matrix operation?
If yes, well, cool... I guess.
If not, did I miss something?

Here is the code I use:

import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
import pycuda.autoinit
import numpy
import time

a=numpy.random.randn(10000,10000)

tic=time.time()
a_square=a*a
toc=time.time()-tic
print toc,"CPU"

a_gpu = gpuarray.to_gpu(a.astype(numpy.float32))

tic=time.time()
a_squared = (a_gpu*a_gpu).get()
toc=time.time()-tic
print toc,"GPU"


_______________________________________________
PyCuda mailing list
PyCuda@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net
