Luigi,

here are a few problems with your approach:

- The contents of your SourceModule are not valid C (as in, C the
  programming language).

- 'set' is a Python data structure. PyCUDA will not magically swap out
  the code of 'set' and execute its operations on the GPU.

- Working with arrays of variable-size objects (such as strings) on the
  GPU is somewhat tricky. You'll have to come up with a good data
  structure.  In particular, just copying over a Python data structure
  will not help--if it succeeds, the pointers in the structure will
  point to CPU memory and be entirely useless on the GPU.
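To illustrate the last point, here is a rough CPU-side sketch (the data and names are made up for illustration) of one workable representation: map each keyword to an integer ID, store each set as a sorted int32 array, and compute the cosine overlap from the intersection size. Fixed-size integer arrays like these are something you can meaningfully copy to the GPU, unlike Python sets of strings:

```python
import math
import numpy as np

# Hypothetical example data: two keyword sets of Python strings.
set_a = {"gpu", "cuda", "python", "kernel"}
set_b = {"gpu", "python", "numpy"}

# Build a global string -> int32 ID mapping. Fixed-size elements
# are what the GPU can work with, not variable-length strings.
vocab = {w: i for i, w in enumerate(sorted(set_a | set_b))}

def to_ids(s):
    # Sorted int32 array: a GPU-friendly stand-in for a Python set.
    return np.array(sorted(vocab[w] for w in s), dtype=np.int32)

a = to_ids(set_a)
b = to_ids(set_b)

# Intersection size on sorted integer arrays. This is the part a
# CUDA kernel would have to reimplement (e.g. by merging the two
# sorted runs); np.intersect1d here just shows the semantics.
inter = np.intersect1d(a, b, assume_unique=True).size

# Cosine similarity as in your formula: |A & B| / sqrt(|A| * |B|)
proxy = inter / math.sqrt(a.size * b.size)
print(proxy)
```

Once every set is an int32 array (plus an offsets array giving where each set starts and ends), a kernel can compute one pairwise intersection per thread. But you have to design and fill that flat layout yourself; no part of it happens automatically.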

Andreas


Luigi Assom <[email protected]> writes:
> I need to parallelize a computation of the intersection of sets of keywords
> on the GPU.
>
> As an example, I will use a cosine similarity that computes the intersection
> between two sets.
> (see also post:
> http://stackoverflow.com/questions/22381939/python-calculate-cosine-similarity-of-two-dicts-faster
> )
>
> I want to compute the similarity for each key-value pair of large
> dictionaries.
>
> The value of a key is a set of thousands of elements, and these elements
> can be strings.
>
> Using multiprocessing I was able to get a 4x speedup, but I would like to
> try out the GPU to really speed up the computation.
>
> In the source module, I actually don't know how to declare my parameters,
> because they are not floats, and I haven't found a tutorial using data
> structures other than numerical numpy arrays.
> That's why I converted my lists of keywords with np.asarray() and
> tried the following:
>
>
>
> # convert list of strings into numpy array
> key1 = 'key1'
> array1 = np.asarray(D[key1])
>
> # convert list of strings into numpy array
> array2 = np.asarray(D[key2])
>
> # assign memory to cuda
>
> array1_cuda = cuda.mem_alloc(sys.getsizeof(array1))
> array2_cuda = cuda.mem_alloc(sys.getsizeof(array2))
>
> # and tried
>
> mod = SourceModule("""
>   __global__ void cosine(*a, *b)
>   {
>     int idx = threadIdx.x + threadIdx.y*4;
>     proxy =
> len(set(a[idx])&set(b[idx]))/math.sqrt(len(set(a[idx]))*len(set(b[idx])))
>
>   }
>   """)
>
>
>
> a_gpu = gpuarray.to_gpu(array1)
> b_gpu = gpuarray.to_gpu(array2)
>
> proxy =
> len(set(a_gpu)&set(b_gpu))/math.sqrt(len(set(a_gpu))*len(set(b_gpu)))
>
>
>
>
> but I get
>
> TypeError: GPUArrays are not hashable.
>
>
> Is it a problem with the data structure, or am I making a conceptual mistake?
>
>
> With multiprocessing (without PyCUDA) my code is:
>
> ## Measuring Performance: 4x !
> with Timer() as t:
>     key = 'key1'
>     setParent = D[key]
>     ngbrProxy = set([])
>     p = Pool()
>     for ngbr in p.imap_unordered(cosine,setParent):
>         ngbrProxy.add(ngbr)
>
> print "=> elapsed lpush: %s s" % t.secs
>
> I wonder how I could exploit the GPU for this type of computation: I am not
> working with numerical matrices; in the PyCUDA documentation I read that it
> is possible to assign any type of data structure, even str, but I
> couldn't find an example.
>
> Could you please help me work this out?
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pycuda
