On Thursday 28 May 2009 15:34:57, Hua Wong wrote:
> Thanks, I'm also puzzled by the results because I thought a 1e4*1e4
> matrix was already ginormous...
>
> I expected something like a 49 time speedup like in the
> test_gpuarray_speed_random.py (size ~16000000 give a x49 speedup).
>
> So I guess I'm doing something wrong somewhere. I will check the test
> script...

You should ask on the CUDA forums, but you are unlikely to get a really
large speedup. You are performing elementwise multiplications on two
arrays of size N, which means you need 2N main -> GPU memory transfers,
then 2N memory reads and N memory writes on the GPU, then another N
GPU -> main memory transfers, all that for only N floating point
operations! In other words, you're pretty much limited by memory
transfers. If your kernel had a better ratio of operations to memory
transfers (as in a matrix multiplication), you'd get a better speedup.

Vincent
-- 
Vincent Favre-Nicolin                http://inac.cea.fr
CEA/Grenoble  Institut Nanosciences & Cryogénie
Laboratoire SP2M/Nano-structures et Rayonnement Synchrotron
17, rue des Martyrs
38054 Grenoble Cedex 9 - France
Université Joseph Fourier            http://www.ujf-grenoble.fr
tél: (+33) 4 38 78 95 40             fax: (+33) 4 38 78 51 38
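
To put rough numbers on the transfer argument above, here is a small back-of-the-envelope sketch in plain Python (no GPU needed). The function names are hypothetical and only for illustration. It assumes the counts given in the message for the elementwise case, and for the matrix product it assumes a naive algorithm costing about 2*n**3 floating point operations on n x n matrices. Only elements crossing the main <-> GPU boundary are counted in the ratio, so this is an estimate of the trend, not a prediction of the measured speedup.

```python
def elementwise_multiply_cost(n_elements):
    """Counts for c[i] = a[i] * b[i] on two arrays of n_elements values."""
    return {
        "host_to_gpu": 2 * n_elements,  # copy a and b to the GPU
        "gpu_reads": 2 * n_elements,    # kernel reads a[i] and b[i]
        "gpu_writes": n_elements,       # kernel writes c[i]
        "gpu_to_host": n_elements,      # copy c back to main memory
        "flops": n_elements,            # one multiplication per element
    }


def matrix_multiply_cost(n):
    """Counts for C = A x B with n x n matrices (naive algorithm, assumed)."""
    return {
        "host_to_gpu": 2 * n * n,       # copy A and B to the GPU
        "gpu_to_host": n * n,           # copy C back to main memory
        "flops": 2 * n ** 3,            # n multiplies and ~n adds per output
    }


def flops_per_transferred_element(cost):
    """Floating point operations per element moved between host and GPU."""
    return cost["flops"] / (cost["host_to_gpu"] + cost["gpu_to_host"])


if __name__ == "__main__":
    n = 10 ** 4  # the 1e4 x 1e4 case from the quoted message
    elementwise = elementwise_multiply_cost(n * n)
    matmul = matrix_multiply_cost(n)
    print("elementwise:", flops_per_transferred_element(elementwise))
    print("matrix multiply:", flops_per_transferred_element(matmul))
```

Under these assumptions the elementwise product does one third of an operation per transferred element whatever the array size, while the ratio for the matrix product grows linearly with n, which is why the latter can hide the transfer cost.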