Hello,
I just started working with PyCUDA, and CUDA as a whole is basically new to
me. I am trying to use the GPU to compute dot products of a large number of
vectors, because this was taking several days using multiple CPU cores.

But on my first try, I am sad to say that I did not see any boost in speed.
Here is a piece of code that I am currently running, just to see how much
speedup I would get. My vectors of interest have a dimension of around 3000,
so eventually I will be computing the dot product (or L2 norm) of those
vectors.
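
To make the eventual workload concrete, here is a minimal CPU-only sketch,
assuming the vectors are stacked as the rows of two arrays. The count of 1000
vectors and the names X, Y, batched_dots and batched_ssd are made up for
illustration; only the dimension of 3000 comes from the description above.

```python
import numpy

def batched_dots(X, Y):
    """Row-wise dot products: out[k] = dot(X[k], Y[k])."""
    return numpy.einsum("ij,ij->i", X, Y)

def batched_ssd(X, Y):
    """Row-wise sum of squared differences between X[k] and Y[k]."""
    D = X - Y
    return numpy.einsum("ij,ij->i", D, D)

# Hypothetical batch: 1000 pairs of vectors, each of dimension 3000.
rng = numpy.random.default_rng(0)
X = rng.standard_normal((1000, 3000))
Y = rng.standard_normal((1000, 3000))

dots = batched_dots(X, Y)
ssds = batched_ssd(X, Y)
print(dots.shape, ssds.shape)
```

The point of the sketch is only that many pairs are handled in one call
rather than one pair at a time; whether the same layout suits the GPU is
part of the question.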

I would highly appreciate it if someone could suggest what I am missing and
how I could achieve my goal.

I also see some differences between the results from numpy and from the GPU.
That is not as big a concern right now, but I am curious why.

Here is the sample code I am working with:

import pycuda.gpuarray as gpuarray
import pycuda.reduction as reduction
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
import time


krnl = reduction.ReductionKernel(numpy.float32, neutral="0",
        reduce_expr="a+b", map_expr="x[i]*y[i]",
        arguments="float *x, float *y")
ssd = reduction.ReductionKernel(numpy.float32, neutral="0",
        reduce_expr="a+b", map_expr="(x[i] - y[i])*(x[i] - y[i])",
        arguments="float *x, float *y")

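for i in range(10):
    # Fresh random test vectors (float64 on the host)
    a = numpy.random.randn(3000)
    b = numpy.random.randn(3000)

    # Copy to the GPU as float32 to match the kernel arguments
    a_gpu = gpuarray.to_gpu(a.astype(numpy.float32))
    b_gpu = gpuarray.to_gpu(b.astype(numpy.float32))

    # Time the dot product on the CPU
    start = time.time()
    numpy_dot = numpy.dot(a, b)
    end = time.time()
    dt = end - start

    # Sum of squared differences on the CPU (not timed)
    numpy_ssd = numpy.sum((a - b) ** 2)

    print("CPU time", dt)
    print("numpy_dot", numpy_dot)
    print("numpy_ssd", numpy_ssd)

    # Time the dot product on the GPU; .get() copies the result back
    start = time.time()
    my_dot_prod = krnl(a_gpu, b_gpu).get()
    end = time.time()
    dt = end - start

    # Sum of squared differences on the GPU (not timed)
    my_ssd = ssd(a_gpu, b_gpu).get()

    print("GPU time", dt)
    print("my dot product", my_dot_prod)
    print("my ssd", my_ssd)
    print("\n")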


Example timings are:
CPU time 5.9604644775390625e-06
numpy_dot -19.7736554062
numpy_ssd 5975.41368065
GPU time 0.0009388923645019531
my dot product -19.77365493774414
my ssd 5975.4140625
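
On the small difference between the numpy and GPU results: the code passes
float64 arrays to numpy.dot but converts them to float32 before sending them
to the GPU. A minimal CPU-only sketch, assuming that precision accounts for
at least part of the gap (this is an assumption, not something confirmed on
the GPU):

```python
import numpy

rng = numpy.random.default_rng(0)
a = rng.standard_normal(3000)
b = rng.standard_normal(3000)

# Same dot product at two precisions, both on the CPU
dot64 = numpy.dot(a, b)
dot32 = numpy.dot(a.astype(numpy.float32), b.astype(numpy.float32))

print("float64:", dot64)
print("float32:", dot32)
print("abs diff:", abs(dot64 - float(dot32)))
```

If the float32 result already differs from the float64 one in the later
digits, the GPU numbers are probably behaving as expected. The order in which
a parallel reduction adds its terms also differs from a sequential sum, which
can shift the last digits further.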


Thanks,
Arch