On Thursday 28 May 2009 at 15:34:57, Hua Wong wrote:
> Thanks, I'm also puzzled by the results because I thought a 1e4*1e4
> matrix was already ginormous...
>
> I expected something like a 49x speedup, as in
> test_gpuarray_speed_random.py (size ~16000000 gives a x49 speedup).
>
> So I guess I'm doing something wrong somewhere. I will check the test
> script...

  You should ask on the CUDA forums, but you are unlikely to get a really
large speedup. You are performing an elementwise multiplication of two
arrays of size N, which means 2N main -> GPU memory transfers, then 2N
memory reads and N memory writes, then N GPU -> main memory transfers,
all that for only N floating-point operations!

   In other words, you're pretty much limited by memory transfers. If your
kernel had a higher ratio of operations to memory transfers (as in a matrix
multiplication), you'd get a better speedup.
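
   To put rough numbers on that ratio, here is a minimal sketch in plain
Python (no GPU needed). It only counts bytes and operations; it does not
measure anything. The function names are made up for this illustration, the
4-byte element size assumes float32 data, and the matrix multiplication count
of about 2*m**3 operations assumes the naive algorithm.

```python
def elementwise_multiply_cost(n, itemsize=4):
    """Counts for c[i] = a[i] * b[i] on arrays of n elements.

    itemsize=4 assumes float32; this is an assumption, not taken from the thread.
    """
    return {
        "host_to_device_bytes": 2 * n * itemsize,  # upload a and b
        "device_reads_bytes": 2 * n * itemsize,    # kernel reads a[i] and b[i]
        "device_writes_bytes": n * itemsize,       # kernel writes c[i]
        "device_to_host_bytes": n * itemsize,      # download c
        "flops": n,                                # one multiply per element
    }


def matmul_cost(m, itemsize=4):
    """Counts for C = A @ B with m x m matrices (naive algorithm assumed)."""
    n = m * m
    return {
        "host_to_device_bytes": 2 * n * itemsize,  # upload A and B
        "device_to_host_bytes": n * itemsize,      # download C
        "flops": 2 * m**3,                         # about m multiplies and m adds per output element
    }


def flops_per_transferred_byte(cost):
    """Floating-point operations per byte moved between main and GPU memory."""
    transferred = cost["host_to_device_bytes"] + cost["device_to_host_bytes"]
    return cost["flops"] / transferred


if __name__ == "__main__":
    m = 10_000  # the 1e4 x 1e4 case from the quoted message
    print(flops_per_transferred_byte(elementwise_multiply_cost(m * m)))
    print(flops_per_transferred_byte(matmul_cost(m)))
```

   Under these assumptions the elementwise product does 1/12 of an operation
per transferred byte whatever the array size, while the matrix product does
m/6 operations per byte, so its ratio improves as the matrices grow.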

    Vincent
-- 
Vincent Favre-Nicolin                    http://inac.cea.fr

CEA/Grenoble              Institut Nanosciences & Cryogénie
Laboratoire SP2M/Nano-structures et Rayonnement Synchrotron
17, rue des Martyrs
38054 Grenoble Cedex 9 - France

Université Joseph Fourier        http://www.ujf-grenoble.fr

tél: (+33) 4 38 78 95 40           fax: (+33) 4 38 78 51 38


_______________________________________________
PyCuda mailing list
PyCuda@tiker.net
http://tiker.net/mailman/listinfo/pycuda_tiker.net
