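Agreed. This is from the earlier example I believe you gave. Try calling the sum on a small array first and then on a larger array. The small call gets the kernel into the card's memory, so that one-time cost stays out of the timed run. My card is OLD (like circa 2005 old), but on the 40-million-element array it runs about as fast as the brand new CPU (0.032 s vs. 0.040 s):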
import numpy
"""
"""
import pycuda.driver as cuda
import pycuda.tools
import pycuda.gpuarray as gpuarray
import pycuda.autoinit, pycuda.compiler
import time

a=numpy.arange(40000000)
a_gpu=gpuarray.arange(40000000,dtype=numpy.float32)
b = numpy.arange(400)
b_gpu=gpuarray.arange(400,dtype=numpy.float32)

out = gpuarray.sum(b_gpu).get()/b.size

start=cuda.Event()
end=cuda.Event()
start.record()
out = gpuarray.sum(a_gpu).get()/a.size
end.record()
end.synchronize()
print "GPU array time: %fs" %(start.time_till(end)*1e-3)
print out

start.record()
out = numpy.sum(a)/a.size
end.record()
end.synchronize()
print "numpy array time: %fs" %(start.time_till(end)*1e-3)
print out

****************************************
GPU array time: 0.032181s
20000000.0754
numpy array time: 0.040284s
19999999

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Eli Stevens (Gmail)
Sent: Monday, April 09, 2012 4:11 PM
To: Serra, Mr. Efren, Contractor, Code 7542
Cc: [email protected]
Subject: Re: [PyCUDA] numpy.sum 377x faster than gpuarray.sum

There are fixed startup costs that do not amortize well over only 400 elements. What happens when you vary the size of the array over several orders of magnitude?

Eli

On Mon, Apr 9, 2012 at 2:05 PM, Serra, Mr. Efren, Contractor, Code 7542 <[email protected]> wrote:
> import numpy
> """
> """
> import pycuda.driver as cuda
> import pycuda.tools
> import pycuda.gpuarray as gpuarray
> import pycuda.autoinit, pycuda.compiler
>
> a=numpy.arange(400)
> a_gpu=gpuarray.arange(400,dtype=numpy.float32)
>
> start=cuda.Event()
> end=cuda.Event()
> start.record()
> gpuarray.sum(a_gpu).get()/a.size
> end.record()
> end.synchronize()
> print "GPU array time: %fs" %(start.time_till(end)*1e-3)
>
> start.record()
> numpy.sum(a)/a.size
> end.record()
> end.synchronize()
> print "numpy array time: %fs" %(start.time_till(end)*1e-3)
>
> GPU array time: 0.000377s
> numpy array time: 0.000001s
>
> Efren A. Serra (Contractor)
> DeVine Consulting, Inc.
> Naval Research Laboratory
> Marine Meteorology Division
> 7 Grace Hopper Ave., STOP 2
> Monterey, CA 93943
> Code 7542
> Office: 831-656-4650
>
>
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pycuda
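
One way to follow up on Eli's question about varying the array size over several orders of magnitude is a small sweep harness. What follows is only a sketch, not code from this thread: the names time_call and sweep are hypothetical, it is written for Python 3 (time.perf_counter), and it uses the built-in sum on plain lists as a stand-in so it runs without a GPU. It does one untimed warm-up call and then reports the best of a few timed runs per size.

```python
import time


def time_call(fn, arg, repeats=5):
    """Return the best wall-clock time (seconds) of fn(arg) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - t0)
    return best


def sweep(fn, make_input, sizes, repeats=5):
    """Time fn on an input of each size, after one untimed warm-up call."""
    # Warm-up: pay any one-time setup cost outside the timed region.
    fn(make_input(sizes[0]))
    results = []
    for n in sizes:
        data = make_input(n)  # build the input outside the timed region
        results.append((n, time_call(fn, data, repeats)))
    return results


if __name__ == "__main__":
    # Stand-in workload: built-in sum over a list (an assumption for illustration).
    sizes = [4 * 10 ** k for k in range(2, 6)]  # 400 up to 400,000
    for n, seconds in sweep(sum, lambda n: list(range(n)), sizes):
        print("n=%8d  time=%.6fs  per element=%.3es" % (n, seconds, seconds / n))
```

To compare the two libraries from this thread, fn could presumably be a wrapper around gpuarray.sum(a_gpu).get() with make_input built on gpuarray.arange(n, dtype=numpy.float32), and numpy.sum with numpy.arange for the CPU side; I have not run that variant here. If the fixed startup cost dominates, the per-element column should fall steeply as n grows and then level off.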
