On Tuesday 16 June 2009, Andrew Wagner wrote:
> Suppose I have a column-major array stored in linear memory on the
> gpu, and want to run a kernel on one column.
>
> One way would be to pass the base pointer and the offset into the
> kernel as parameters and do the same addition in each thread of my
> kernel.  This seems unnecessarily inefficient.
>
> Another way would be to allocate each column separately and keep
> around vectors of pointers to columns for kernels that need to process
> the whole array.  This seems like a mess.
>
> When calling a kernel from C the handles to device arrays are just
> addresses into device memory, and you can just apply the offset before
> calling the kernel, and this seems like the right way to go about it
> for C.

The "right" way isn't quite supported yet, which would be to just write a[:,i] 
and get the right view delivered. This is mostly because PyCUDA doesn't know 
about strides just yet, and assumes that arrays are contiguous chunks of 
memory. Of course, your particular case doesn't violate that assumption, so 
you can feel free to hack just that case into GPUArray.__getitem__. (It 
already deals with the 1D case.)  (Or feel free to hack stride treatment into 
PyCUDA--even a tiny step in that direction would be pretty cool.)
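
To make the `a[:,i]` idea concrete, here is a minimal, hypothetical sketch of what such a column slice could look like. `DeviceView` and its fields are illustrative stand-ins, not PyCUDA's actual `GPUArray` API, and plain integers stand in for device pointers so the sketch runs on the host:

```python
# Hypothetical sketch of a[:, i] for a contiguous column-major 2D array.
# DeviceView is NOT PyCUDA API; a plain int stands in for a device pointer.

class DeviceView:
    def __init__(self, ptr, shape, itemsize):
        self.ptr = ptr            # "device pointer" (here: a plain int)
        self.shape = shape
        self.itemsize = itemsize

    def __getitem__(self, idx):
        # Handle only the a[:, i] pattern discussed in this thread.
        if (isinstance(idx, tuple) and len(idx) == 2
                and idx[0] == slice(None) and isinstance(idx[1], int)):
            rows, cols = self.shape
            i = idx[1]
            if not 0 <= i < cols:
                raise IndexError(i)
            # Column i of a (rows x cols) column-major array starts
            # rows * i elements past the base pointer.
            offset = rows * i * self.itemsize
            return DeviceView(self.ptr + offset, (rows,), self.itemsize)
        raise NotImplementedError(idx)
```

For example, on a 4x3 array of float64 based at "pointer" 1000, `DeviceView(1000, (4, 3), 8)[:, 1]` yields a 1D view whose pointer is 1000 + 4*1*8 = 1032.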

Another way is to obtain a 1D view of the 2D array (which would require you to 
implement a .flat attribute mimicking numpy, also simple by copying what's 
happening in __getitem__).
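
A quick numpy demonstration (host-side only) of why a flat view is enough here: in a column-major (Fortran-order) array, each column is a contiguous run of the underlying buffer, so column i is simply flat elements [i*m, (i+1)*m):

```python
import numpy as np

# Column i of a column-major (m x n) array is a contiguous 1D slice
# of the flat view over the same memory.
m, n = 4, 3
a = np.asfortranarray(np.arange(m * n, dtype=np.float64).reshape(m, n))

i = 1
flat = a.ravel(order="F")          # 1D view over the same buffer
col = flat[i * m:(i + 1) * m]      # column i as a 1D slice

assert np.array_equal(col, a[:, i])
```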

The last way is to just grab the device pointer from ary.gpudata, increment it by 
the right multiple of ary.dtype.itemsize and run with that.
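
The byte-offset arithmetic can be checked on the host with numpy: column i of an (m x n) column-major array sits i * m * itemsize bytes past the base pointer. On the GPU, the same offset would be added to the GPUArray's device pointer instead:

```python
import numpy as np

# Verify the offset arithmetic against numpy's own data pointers:
# column i starts i * m * itemsize bytes past the array base.
m, n, i = 4, 3, 2
a = np.asfortranarray(np.zeros((m, n), dtype=np.float32))

base = a.__array_interface__["data"][0]
col_ptr = a[:, i].__array_interface__["data"][0]

assert col_ptr == base + i * m * a.dtype.itemsize
```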

Choose your weapon. :)

And be sure to send patches if you choose the first or second option.

Andreas

_______________________________________________
PyCUDA mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net