Hello Andreas, thank you for your feedback:
Which prerequisites must a data structure meet to be suitable for the GPU? Should I allocate the exact size of memory for each array? Is it OK to use numpy data structures to build the arrays and execute operations on the GPU, instead of Python's 'set' data structure? For example, could np.intersect1d(['a','beta','gamma'],['gamma','delta','omega']) be parallelized?

As an approach, is it better to try to parallelize the intersection between two keys of a dictionary, or rather to import the whole dictionary (or a partition of it) into the GPU?

On Sun, Dec 21, 2014 at 9:44 PM, Andreas Kloeckner <[email protected]> wrote:
> Luigi,
>
> here are a few problems with your approach:
>
> - The contents of your SourceModule is not valid C (as in, C the
>   programming language)
>
> - 'set' is a Python data structure. PyCUDA will not magically swap out
>   the code of 'set' and execute its operations on the GPU.
>
> - Working with arrays of variable-size objects (such as strings) on the
>   GPU is somewhat tricky. You'll have to come up with a good data
>   structure. In particular, just copying over a Python data structure
>   will not help--if it succeeds, the pointers in the structure will
>   point to CPU memory and be entirely useless on the GPU.
>
> Andreas
>
> Luigi Assom <[email protected]> writes:
> > I need to parallelize the computation of intersections of sets of
> > keywords on the GPU.
> >
> > As an example, I will take a cosine similarity that computes the
> > intersection between two sets.
> > (see also this post:
> > http://stackoverflow.com/questions/22381939/python-calculate-cosine-similarity-of-two-dicts-faster
> > )
> >
> > I want to compute the similarity for each key-value pair of large
> > dictionaries.
> >
> > The value of a key is in fact a set of thousands of elements, and they
> > can be strings.
> >
> > Using multiprocessing I was able to improve performance by 4x, but I
> > would like to try the GPU to really speed up the computation.
> > In the source module, I actually don't know how to declare my
> > parameters because they are not floats, and I haven't found a tutorial
> > using data structures other than numerical arrays with numpy.
> > That's why I converted my lists of keywords with np.asarray() and
> > tried the following:
> >
> > # convert list of strings into numpy array
> > key1 = 'key1'
> > array1 = np.asarray(D[key1])
> >
> > # convert list of strings into numpy array
> > array2 = np.asarray(D[key2])
> >
> > # assign memory to cuda
> > array1_cuda = cuda.mem_alloc(sys.getsizeof(array1))
> > array2_cuda = cuda.mem_alloc(sys.getsizeof(array2))
> >
> > # and tried
> >
> > mod = SourceModule("""
> > __global__ void cosine(*a, *b)
> > {
> >   int idx = threadIdx.x + threadIdx.y*4;
> >   proxy = len(set(a[idx])&set(b[idx]))/math.sqrt(len(set(a[idx]))*len(set(b[idx])))
> > }
> > """)
> >
> > a_gpu = gpuarray.to_gpu(array1)
> > b_gpu = gpuarray.to_gpu(array2)
> >
> > proxy = len(set(a_gpu)&set(b_gpu))/math.sqrt(len(set(a_gpu))*len(set(b_gpu)))
> >
> > but I get
> >
> > TypeError: GPUArrays are not hashable.
> >
> > Is it a problem of data structure, or am I making a conceptual mistake?
> >
> > With multiprocessing (without PyCUDA) my code is:
> >
> > ## Measuring Performance: 4x !
> > with Timer() as t:
> >     key = 'key1'
> >     setParent = D[key]
> >     ngbrProxy = set([])
> >     p = Pool()
> >     for ngbr in p.imap_unordered(cosine, setParent):
> >         ngbrProxy.add(ngbr)
> >
> > print "=> elapsed lpush: %s s" % t.secs
> >
> > I wonder how I could exploit the GPU for this type of computation: I am
> > not working with numerical matrices; in the PyCUDA documentation I read
> > that it is possible to pass any type of data structure, even str, but I
> > couldn't find an example.
> >
> > Could you please help in working this out?
> > _______________________________________________
> > PyCUDA mailing list
> > [email protected]
> > http://lists.tiker.net/listinfo/pycuda

--
Luigi Assom
Skype contact: oggigigi
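[To make the np.intersect1d question above concrete, here is a minimal CPU-side sketch of the cosine set similarity, assuming the string keywords have first been mapped to integer IDs -- fixed-size elements in contiguous arrays are the kind of layout a GPU can work with, unlike variable-length strings. The vocab mapping and set contents are hypothetical examples, not code from the thread:]

```python
import numpy as np

def cosine_set_similarity(a, b):
    # Sets are encoded as arrays of integer IDs: fixed-size elements in
    # contiguous memory, unlike Python sets of strings.
    a = np.unique(np.asarray(a))   # sorted and deduplicated
    b = np.unique(np.asarray(b))
    inter = np.intersect1d(a, b, assume_unique=True)
    # |A & B| / sqrt(|A| * |B|)
    return inter.size / np.sqrt(a.size * b.size)

# Hypothetical vocabulary mapping keyword strings to integer IDs
vocab = {w: i for i, w in enumerate(['a', 'beta', 'gamma', 'delta', 'omega'])}
s1 = [vocab[w] for w in ['a', 'beta', 'gamma']]
s2 = [vocab[w] for w in ['gamma', 'delta', 'omega']]
print(cosine_set_similarity(s1, s2))  # one shared element out of 3 each -> 1/3
```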
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
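[On the question of moving the whole dictionary (or a partition of it) to the GPU: one GPU-friendly reformulation, sketched here in plain NumPy, is to pack all the sets into a binary set-by-vocabulary incidence matrix. All pairwise intersection sizes then become a single dense matrix product, which is exactly the kind of operation that maps well onto GPU linear algebra (e.g. pycuda.gpuarray or cuBLAS). The matrix below is a hypothetical 3-set example, not data from the thread:]

```python
import numpy as np

# Hypothetical example: 3 sets over a 5-word vocabulary.
# Row i marks which vocabulary words set i contains.
M = np.array([[1, 1, 1, 0, 0],   # {'a', 'beta', 'gamma'}
              [0, 0, 1, 1, 1],   # {'gamma', 'delta', 'omega'}
              [1, 0, 0, 0, 1]],  # {'a', 'omega'}
             dtype=np.float32)

inter = M @ M.T                   # all pairwise intersection sizes at once
sizes = M.sum(axis=1)             # cardinality of each set
sim = inter / np.sqrt(np.outer(sizes, sizes))  # full cosine-similarity matrix
print(sim[0, 1])  # sets 0 and 1 share one word out of 3 each, so ~0.333
```

The trade-off is memory: the dense matrix grows as (number of sets) x (vocabulary size), so large dictionaries would need to be partitioned or stored in a sparse format before this becomes practical.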
