Hello!

I need to parallelize a computation of intersection of sets of keywords
over GPU .

As example, I will take a cosine similarity computing the intersection
between two sets.
(see also post:
http://stackoverflow.com/questions/22381939/python-calculate-cosine-similarity-of-two-dicts-faster
)

I want to compute the similiarity, for each key value pairs of large
dictionaries.

The value of a key is indeed a set of thousands of elements, and they can
be strings.

Using multiprocessing I was able to improve by 4x, but i would like to try
out GPU for really speed up the computation.

in the source module, i actually don't know how to declare my parameters
cause they are not float and i haven't found a tutorial using other data
structures than numerical arrays with numpy.
That's why I was I converted my lists of keywords in np.asarray() and I
have tried the following:



# convert list of strings into numpy array
key1 = 'key1'
array1 = np.asarray(D[key1])

# convert list of strings into numpy array
array2 = np.asarray(D[key2])

# assign memory to cuda

array1_cuda = cuda.mem_alloc(sys.getsizeof(array1))
array2_cuda = cuda.mem_alloc(sys.getsizeof(array2))

# and tried

mod = SourceModule("""
  __global__ void cosine(*a, *b)
  {
    int idx = threadIdx.x + threadIdx.y*4;
    proxy =
len(set(a[idx])&set(b[idx]))/math.sqrt(len(set(a[idx]))*len(set(b[idx])))

  }
  """)



a_gpu = gpuarray.to_gpu(array1)
b_gpu = gpuarray.to_gpu(array2)

proxy =
len(set(a_gpu)&set(b_gpu))/math.sqrt(len(set(a_gpu))*len(set(b_gpu)))




but I get

TypeError: GPUArrays are not hashable.


Is it a problem of data structure, or am I following a conceptual mistake ?


with multiprocessing (without pyCuda) my code is:

## Measuring Performance: 4x !
with Timer() as t:
    key = 'key1'
    setParent = D[key]
    ngbrProxy = set([])
    p = Pool()
    for ngbr in p.imap_unordered(cosine,setParent):
        ngbrProxy.add(ngbr)

print "=> elasped lpush: %s s" % t.secs

I wonder how I could exploit the GPU for this type of computation: I am not
working with numerical matrixes; on the documentation of pyCuda i read it
is possibile to assign any type of data structures, even str, but I
couldn't find an example.

Could you please help in working this out ?
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda

Reply via email to