Following Andreas' remark, I replaced the following in the code:
import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
import pycuda.autoinit
import numpy
import time
a=numpy.float32(numpy.random.randn(4000,4000))
b=numpy.float32(numpy.random.randn(4000,4000))
tic=time.time()
axb=numpy.dot(a,b) # I assume this time it is matrix multiplication, according to the numpy tutorials I've read...
toc=time.time()-tic
print toc,"CPU"
tic=time.time()
a_gpu = gpuarray.to_gpu(a)
b_gpu = gpuarray.to_gpu(b)
axbGPU = (numpy.dot(a_gpu,b_gpu)).get() # ditto here
toc=time.time()-tic
print toc,"GPU"
Here are the results I get:
2.06739115715 CPU
0.171211004257 GPU
It speeds up the calculation about 12 times (2.067 / 0.171 ≈ 12.1) :)
But I can't try bigger matrices; I don't have enough RAM :(
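(A caveat worth checking here: PyCUDA's gpuarray provides elementwise operations, and I don't believe it implements a matrix product, so numpy.dot(a_gpu, b_gpu) may not be running a true matmul on the GPU at all. Below is a minimal sketch of an explicit GPU matmul with a hand-written kernel; the kernel and names are illustrative, not PyCUDA API, and it assumes n is a multiple of the 16x16 block size.)

import numpy
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

n = 4000  # assumed to be a multiple of the block size below

mod = SourceModule("""
__global__ void matmul(const float *a, const float *b, float *c, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float acc = 0.0f;
        for (int k = 0; k < n; ++k)
            acc += a[row * n + k] * b[k * n + col];
        c[row * n + col] = acc;
    }
}
""")
matmul = mod.get_function("matmul")

a = numpy.random.randn(n, n).astype(numpy.float32)
b = numpy.random.randn(n, n).astype(numpy.float32)
a_gpu = gpuarray.to_gpu(a)
b_gpu = gpuarray.to_gpu(b)
c_gpu = gpuarray.empty((n, n), numpy.float32)

# naive one-thread-per-output-element matmul
matmul(a_gpu.gpudata, b_gpu.gpudata, c_gpu.gpudata, numpy.int32(n),
       block=(16, 16, 1), grid=(n // 16, n // 16))
axb = c_gpu.get()  # .get() blocks until the kernel has finished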
Hua Wong wrote:
Thanks, I'm also puzzled by the results, because I thought a 1e4 x 1e4
matrix was already ginormous...
I expected something like the 49x speedup from
test_gpuarray_speed_random.py (size ~16000000 gives a ~49x speedup).
So I guess I'm doing something wrong somewhere. I will check the test
script...
Getting:
0.46285700798 CPU
0.728541851044 GPU
with your code on a CentOS machine with a GTX280 and two quad-core E5410s.
Per B. Sederberg wrote:
I modified your code slightly so that you're comparing apples to apples
a bit better, and I'm getting even worse performance from the GPU
(GTX285 on Debian Testing):
0.652935028076 CPU
1.61081981659 GPU
Here's the new code, which puts the sending and receiving of the data
to/from the card inside the timed section, and also has the CPU perform
a float32 operation just like the GPU:
import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
import pycuda.autoinit
import numpy
import time
a=numpy.float32(numpy.random.randn(10000,10000)) # randn takes integer dimensions
tic=time.time()
a_square=a*a
toc=time.time()-tic
print toc,"CPU"
tic=time.time()
a_gpu = gpuarray.to_gpu(a)
a_squared = (a_gpu*a_gpu).get()
toc=time.time()-tic
print toc,"GPU"
It looks like you'll need to have even larger matrices before you'll
see a major GPU benefit, though I'm a bit surprised by these results.
Best,
Per
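One thing that can skew these numbers: kernel launches are asynchronous, so wall-clock timing can attribute cost to the wrong step (.get() forces a sync, which is why the total comes out right but the breakdown stays invisible). Here is a sketch, assuming pycuda.driver.Event's record/synchronize/time_since behave as in the PyCUDA docs, that times the copy-in, the multiply, and the copy-out separately:

import numpy
import pycuda.autoinit
import pycuda.driver as cuda
import pycuda.gpuarray as gpuarray

a = numpy.random.randn(10000, 10000).astype(numpy.float32)
start = cuda.Event()
done = cuda.Event()

start.record()
a_gpu = gpuarray.to_gpu(a)        # host -> device copy
done.record()
done.synchronize()
print done.time_since(start), "ms host->device"

start.record()
a_sq_gpu = a_gpu * a_gpu          # elementwise multiply on the GPU
done.record()
done.synchronize()
print done.time_since(start), "ms multiply"

start.record()
a_squared = a_sq_gpu.get()        # device -> host copy
done.record()
done.synchronize()
print done.time_since(start), "ms device->host"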
On Thu, May 28, 2009 at 6:55 AM, Hua Wong <hua.w...@pasteur.fr> wrote:
Here are the results I get:
0.865973949432 CPU
0.582780122757 GPU
I kind of expected more... (the GPU is a GTX280)
Of course, I can't rule out that I did something stupid; in fact, I
expect it...
Is this the acceleration I should expect from this kind of matrix
operation?
If yes, well cool... I guess.
If not, did I miss something?
Here is the code I use:
import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
import pycuda.autoinit
import numpy
import time
a=numpy.random.randn(10000,10000) # randn takes integer dimensions
tic=time.time()
a_square=a*a
toc=time.time()-tic
print toc,"CPU"
a_gpu = gpuarray.to_gpu(a.astype(numpy.float32))
tic=time.time()
a_squared = (a_gpu*a_gpu).get()
toc=time.time()-tic
print toc,"GPU"
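Another possible confound, just a guess on my part: the very first elementwise gpuarray operation may include one-time kernel compilation and context setup, so a warm-up pass before the timed run could change the numbers. A sketch:

import time
import numpy
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

a = numpy.random.randn(10000, 10000).astype(numpy.float32)
a_gpu = gpuarray.to_gpu(a)

warm = (a_gpu * a_gpu).get()   # untimed warm-up: pays any one-time setup cost

tic = time.time()
a_squared = (a_gpu * a_gpu).get()
toc = time.time() - tic
print toc, "GPU (warmed up)"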
_______________________________________________
PyCuda mailing list
PyCuda@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net