Suppose I have a column-major array stored in linear memory on the
GPU, and I want to run a kernel on a single column.
One way would be to pass the base pointer and the offset into the
kernel as parameters and repeat the same address arithmetic in every
thread of my kernel. This seems unnecessarily inefficient.
Another way would be to allocate each column separately and keep
around vectors of pointers to columns for kernels that need to process
the whole array. This seems like a mess.
When calling a kernel from C, the handles to device arrays are just
addresses into device memory, so you can apply the offset before
launching the kernel; that seems like the right way to go about it
in C.
What's the right way to do this in a PyCUDA context?
I found something promising in one of the tests:
# now try with offsets
dest = numpy.zeros_like(a)
multiply_them(
    drv.Out(dest), numpy.intp(a_gpu)+1, b_gpu,
    block=(399,1,1))
and I found the same syntax in:
http://documen.tician.de/pycuda/tutorial.html?highlight=intp#structures
but this doesn't seem to be explicitly documented.
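For what it's worth, numpy.intp is just a pointer-sized integer, so the snippet above appears to work by plain address arithmetic: coercing the allocation to numpy.intp yields the device address, and adding a byte offset gives a new address that can be passed to the kernel in place of the allocation itself. A host-only sketch of that arithmetic (the base address and geometry are invented for illustration; note the offset is in bytes, not elements):

```python
import numpy as np

# Hypothetical device base address -- illustration only; for a real
# allocation this would be numpy.intp(a_gpu).
base = np.intp(0x10000)
rows = 399                                # rows of a column-major array
itemsize = np.dtype(np.float32).itemsize  # 4 bytes per element

# Address of column 4: scale by rows * itemsize to get a *byte* offset.
col = 4
col_ptr = base + np.intp(col * rows * itemsize)
print(hex(col_ptr))  # 0x10000 + 4*399*4 = 0x118f0
```

The resulting integer can then be passed as a kernel argument wherever a device pointer is expected, which matches what the test above does.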
Thanks!
Drew
_______________________________________________
PyCUDA mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net