On Tuesday 16 June 2009, Andrew Wagner wrote:
> Suppose I have a column-major array stored in linear memory on the
> GPU, and want to run a kernel on one column.
>
> One way would be to pass the base pointer and the offset into the
> kernel as parameters and do the same addition in each thread of my
> kernel. This seems unnecessarily inefficient.
>
> Another way would be to allocate each column separately and keep
> around vectors of pointers to columns for kernels that need to process
> the whole array. This seems like a mess.
>
> When calling a kernel from C, the handles to device arrays are just
> addresses into device memory, and you can apply the offset before
> calling the kernel; this seems like the right way to go about it
> in C.
The "right" way isn't quite supported yet, which would be to just write
a[:,i] and get the right view delivered. This is mostly because PyCUDA
doesn't know about strides just yet and assumes that arrays are
contiguous chunks of memory. Your particular case doesn't violate that
assumption, though, so you have a few options:

1) Feel free to hack just that case into GPUArray.__getitem__. (It
already deals with the 1D case.) Or hack stride treatment into PyCUDA
proper--even a tiny step in that direction would be pretty cool.

2) Obtain a 1D view of the 2D array. This would require you to
implement a .flat attribute mimicking numpy, which is also simple by
copying what happens in __getitem__.

3) Just grab the pointer from ary.gpudata, increment it by the right
multiple of ary.dtype.itemsize, and run with that.

Choose your weapon. :) And be sure to send patches if you choose 1)
or 2).

Andreas
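For the last approach (raw pointer arithmetic): in a column-major array,
column `col` is a contiguous run starting col * nrows * itemsize bytes
past the base pointer. A minimal host-side sketch of that computation,
verified against numpy's own addressing (`column_offset_bytes` is a
hypothetical helper; on the device you would add the same offset to
int(ary.gpudata) before launching the kernel):

```python
import numpy as np

def column_offset_bytes(nrows, itemsize, col):
    # In a column-major (Fortran-ordered) 2D array stored as one
    # contiguous block, column `col` starts nrows * col elements
    # past the base pointer.
    return col * nrows * itemsize

# Sanity-check on the host: each column of a Fortran-ordered array
# begins exactly at the computed byte offset from the base address.
a = np.asfortranarray(np.arange(12, dtype=np.float32).reshape(3, 4))
base = a.ctypes.data
for col in range(4):
    assert a[:, col].ctypes.data == base + column_offset_bytes(3, a.itemsize, col)
```

The same arithmetic applied to the device pointer gives you a valid
argument for a kernel that processes one column, with no per-thread
offset addition needed.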
_______________________________________________
PyCUDA mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net
