Agreed.  I believe this is based on the example you gave earlier.  Try calling 
the sum on a small array first and then on a larger array.  The small call gets 
the kernel into the card's memory, so the timed call on the larger array does 
not pay that startup cost.  My card is OLD (circa 2005) but it runs just as 
fast as the brand-new CPU:

import numpy
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import pycuda.gpuarray as gpuarray

a=numpy.arange(40000000)
a_gpu=gpuarray.arange(40000000,dtype=numpy.float32)

# Warm-up: sum a small array first so the kernel is already on the card
# before the timed run.
b = numpy.arange(400)
b_gpu=gpuarray.arange(400,dtype=numpy.float32)
out = gpuarray.sum(b_gpu).get()/b.size

start=cuda.Event()
end=cuda.Event()
start.record()
out = gpuarray.sum(a_gpu).get()/a.size
end.record()
end.synchronize()
print("GPU array time: %fs" % (start.time_till(end)*1e-3))
print(out)

# Reuse the same events to time the host-side numpy sum.
start.record()
out = numpy.sum(a)/a.size
end.record()
end.synchronize()
print("numpy array time: %fs" % (start.time_till(end)*1e-3))
print(out)

****************************************
GPU array time: 0.032181s
20000000.0754
numpy array time: 0.040284s
19999999
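
The two means above differ, and the reason is probably not the hardware. A minimal sketch of a likely explanation, assuming the script ran under Python 2 (the print statements suggest so) and that numpy.arange produced 64-bit integers: the numpy result is floored by integer division, while the GPU array holds float32 values, which cannot represent every integer above 2**24. The variable names here are illustrative only.

```python
import numpy

n = 40000000

# Exact integer sum of 0..n-1 in closed form (avoids building a 40M-element array).
exact_sum = n * (n - 1) // 2

# Python 2's "/" on two integers floors the result; Python 3 spells that "//".
floored_mean = exact_sum // n
true_mean = exact_sum / float(n)

# float32 has a 24-bit significand, so 2**24 + 1 is not representable
# and rounds to a neighbouring value.
rounded = numpy.float32(2 ** 24 + 1)

print(floored_mean, true_mean, int(rounded))
# prints: 19999999 19999999.5 16777216
```

The floored value matches the numpy output above. The GPU figure of 20000000.0754 is consistent with float32 rounding in both the array elements and the accumulation, although the exact digits depend on how the reduction is ordered on the card.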




-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of 
Eli Stevens (Gmail)
Sent: Monday, April 09, 2012 4:11 PM
To: Serra, Mr. Efren, Contractor, Code 7542
Cc: [email protected]
Subject: Re: [PyCUDA] numpy.sum 377x faster than gpuarray.sum

There are fixed startup costs that do not amortize well over only 400 elements.

What happens when you vary the size of the array over several orders
of magnitude?
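
A minimal sketch of such a sweep, assuming wall-clock timing with time.perf_counter and one untimed warm-up call per size. The helpers time_sum and sweep are hypothetical names introduced for illustration; they are not part of numpy or PyCUDA. The example runs on the host with numpy.sum only.

```python
import time
import numpy

def time_sum(sum_fn, arr, repeats=5):
    """Return the best wall-clock time in seconds of sum_fn(arr) over several runs."""
    sum_fn(arr)  # warm-up call, not timed
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        sum_fn(arr)
        best = min(best, time.perf_counter() - t0)
    return best

def sweep(sum_fn, make_array, sizes):
    """Time sum_fn on a freshly built array of each size."""
    return [(n, time_sum(sum_fn, make_array(n))) for n in sizes]

# 400, 4000, ..., 4000000 elements
sizes = [4 * 10 ** k for k in range(2, 7)]
results = sweep(numpy.sum,
                lambda n: numpy.arange(n, dtype=numpy.float32),
                sizes)
for n, seconds in results:
    print("%9d elements: %.6fs" % (n, seconds))
```

To time the GPU the same way, one could pass lambda x: gpuarray.sum(x).get() as sum_fn and lambda n: gpuarray.arange(n, dtype=numpy.float32) as make_array. The .get() call copies the result back to the host, so the wall-clock time covers the whole reduction. If fixed startup cost dominates, the GPU times should stay roughly flat across the small sizes and only grow for the large ones.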

Eli

On Mon, Apr 9, 2012 at 2:05 PM, Serra, Mr. Efren, Contractor, Code
7542 <[email protected]> wrote:
> import numpy
> """
> """
> import pycuda.driver as cuda
> import pycuda.tools
> import pycuda.gpuarray as gpuarray
> import pycuda.autoinit, pycuda.compiler
>
> a=numpy.arange(400)
> a_gpu=gpuarray.arange(400,dtype=numpy.float32)
>
> start=cuda.Event()
> end=cuda.Event()
> start.record()
> gpuarray.sum(a_gpu).get()/a.size
> end.record()
> end.synchronize()
> print "GPU array time: %fs" %(start.time_till(end)*1e-3)
>
> start.record()
> numpy.sum(a)/a.size
> end.record()
> end.synchronize()
> print "numpy array time: %fs" %(start.time_till(end)*1e-3)
>
> GPU array time: 0.000377s
> numpy array time: 0.000001s
>
> Efren A. Serra (Contractor)
> DeVine Consulting, Inc.
> Naval Research Laboratory
> Marine Meteorology Division
> 7 Grace Hopper Ave., STOP 2
> Monterey, CA 93943
> Code 7542
> Office: 831-656-4650
>
>
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pycuda
