>> Is the binding of a GPUArray to a Texture not supported, or did I miss something?

I was having the same problem, trying to pass GPUArrays' gpudata into my kernels to do other processing. I think I've figured out what's going on.

A quick search through the PyCUDA source code finds bind_to_texref in:
test/test_driver.py
test/test_texture_nan.py
pycuda/gpuarray.py

In test/test_driver.py, function test_fp_textures, texture fetching is done using the tex1Dfetch function in kernel code, and operates on a 1D texture. The script test/test_texture_nan.py does the same.

GPUArray also uses textures internally, for example when GPUArray.take() calls elementwise.get_take_kernel(). That function declares its texture references as 1D and, again, fetches from them with tex1Dfetch() in device code.

If I do likewise and set up my GPUArrays/textures in 1D, I can indeed bind the GPUArray to a texture and fetch its contents inside my device kernels - but only using tex1Dfetch(). tex1D() seems to always return zeros, or rather sometimes I can fetch only element 0 of my array.

What is the difference between tex1Dfetch() and tex1D()? From the CUDA Programming Guide:

tex1Dfetch(): fetch the region of linear memory bound to texture reference texRef using integer texture coordinate x. No texture filtering and addressing modes are supported.

tex1D/2D/3D(): fetches the CUDA array bound to texture reference using floating-point texture coordinates.

I'm piecing this together from various sources, but it looks like there's a difference between 'linear memory allocation' and 'pitch linear memory allocation' on the device. Pitch linear memory is nicely arranged for texture processing, multi-D addressing, etc. Linear memory is simply global device memory, accessed byte after byte.
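To make the distinction concrete, here is a minimal sketch of how the byte offset of element (row, col) differs between the two layouts. This is plain host-side arithmetic, not PyCUDA code. The function names are hypothetical, and the 256-byte alignment is an assumption for illustration only; the real pitch is whatever the pitched allocator hands back.

```python
def linear_offset(row, col, width, itemsize):
    """Byte offset in plain linear memory: rows are packed back to back."""
    return (row * width + col) * itemsize


def pitched_offset(row, col, pitch, itemsize):
    """Byte offset in pitched memory: each row starts at a multiple of pitch."""
    return row * pitch + col * itemsize


def round_up(nbytes, alignment):
    """Round nbytes up to the next multiple of alignment."""
    return -(-nbytes // alignment) * alignment


WIDTH = 100              # elements per row
ITEMSIZE = 4             # bytes per float32 element
ASSUMED_ALIGNMENT = 256  # assumption, not a value queried from the device

# A 400-byte row gets padded out to the next multiple of the alignment.
pitch = round_up(WIDTH * ITEMSIZE, ASSUMED_ALIGNMENT)

print(linear_offset(2, 3, WIDTH, ITEMSIZE))
print(pitched_offset(2, 3, pitch, ITEMSIZE))
```

The two offsets only agree when the row size in bytes happens to equal the pitch, which is why data laid out one way cannot simply be reinterpreted as the other.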

It's hard to keep things straight since we have numpy ndarrays, GPUArrays, and CUDA arrays at work here, and they are all often referred to as just 'an array'.

CUDA arrays (see pycuda.driver.Array) are a '2D or 3D memory block that can only be accessed via texture references'. That is, they are opaque: we can't directly read or write their bytes. They live in pitch linear memory.

PyCUDA GPUArrays, on the other hand, store their data in device 'linear memory': it is allocated using a normal cudaMalloc, not a cudaMallocPitch. This can be seen by looking at their 'allocator', which is set to drv.mem_alloc in the constructor. It would be mem_alloc_pitched if they were using pitched memory. Because the data is plain linear memory, it can be accessed directly in device kernels.

numpy ndarrays, of course, just keep their data in host memory. GPUArrays are designed to work like them syntactically for ease of use.

With these pieces of information, I tried the following:
1. Make a multidimensional GPUArray in Python on the host.
2. Bind it to a texture reference from a module that is declared as 1D.
3. Access it in a device kernel using tex1Dfetch(), not tex1D/2D/3D(), by building up a 1D 'flat' offset for the element I want.

It works!
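The 'flat' offset from the last step is just the usual row-major index arithmetic. Here is a minimal sketch, assuming the GPUArray is C-contiguous. The helper flat_index, the texture name tex and the kernel name fetch_flat are all hypothetical names for illustration. The kernel string is only there to show where the same arithmetic goes on the device side; it is not compiled or run here.

```python
import itertools


def flat_index(coords, shape):
    """Row-major (C-order) flat offset of coords in an array of the given shape."""
    if len(coords) != len(shape):
        raise ValueError("coords and shape must have the same length")
    offset = 0
    for c, n in zip(coords, shape):
        if not 0 <= c < n:
            raise IndexError("coordinate %d out of range for extent %d" % (c, n))
        offset = offset * n + c
    return offset


# Device-side counterpart for the 2D case (illustration only, not compiled here).
# It assumes a texture reference declared as 1D and bound to linear memory.
KERNEL_SOURCE = """
texture<float, 1, cudaReadModeElementType> tex;

__global__ void fetch_flat(float *out, int height, int width)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < height && col < width) {
        int flat = row * width + col;      /* same arithmetic as flat_index */
        out[flat] = tex1Dfetch(tex, flat); /* integer coordinate, no filtering */
    }
}
"""

shape = (2, 3, 4)
print(flat_index((1, 2, 3), shape))
```

Walking the coordinates in C order with itertools.product(*map(range, shape)) visits the flat offsets 0, 1, 2, ... in sequence, which is a quick way to check the helper against whatever indexing the kernel uses.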

The other approach is what you see in matrix_to_texref(matrix, texref, order), which takes a numpy array on the host and attaches it to a texref, using matrix_to_array() and bind_array_to_texref(). It does so by copying the data into device linear pitched memory using CUDA's memcpy2D function, and so is limited to 2D arrays for the moment. It looks like you have to do a memcpy from (host / device linear) -> (device linear pitched memory), in order to arrange the data nicely for multi-D texture fetching to work.

Can anyone confirm that this is correct?

-Andrew


