>> Is the binding of a GPUArray to a Texture not supported, or did I
>> miss something?
I was having the same problem, trying to pass GPUArrays' gpudata into
my kernels to do other processing. I think I've figured out what's
going on.
A quick search through the PyCUDA source code finds bind_to_texref in:
test/test_driver.py
test/test_texture_nan.py
pycuda/gpuarray.py
In test/test_driver.py, function test_fp_textures, texture fetching is
done using the tex1Dfetch function in kernel code, and operates on a
1D texture. The script test/test_texture_nan.py does the same.
GPUArray uses textures internally, when calling
elementwise.get_take_kernel() in GPUArray.take() etc.
elementwise.get_take_kernel() declares its texture references as 1D,
and again fetches them with tex1Dfetch() in device code.
If I do likewise and set up my GPUArrays/textures in 1D, I can indeed
bind the GPUArray to a texture and fetch its contents inside my device
kernels, but only using tex1Dfetch(). tex1D() mostly returns zeros; at
best I can fetch element 0 of my array.
What is the difference between tex1Dfetch() and tex1D()? From the
CUDA Programming Guide:
tex1Dfetch(): fetch the region of linear memory bound to texture
reference texRef using integer texture coordinate x. No texture
filtering and addressing modes are supported.
tex1D/2D/3D(): fetches the CUDA array bound to texture reference using
floating-point texture coordinates.
I'm piecing this together from various sources, but it looks like
there's a difference between 'linear memory allocation' and 'pitch
linear memory allocation' on the device. Pitch linear memory is nicely
arranged for texture processing, multi-D addressing, etc. Linear
memory is simply global device memory, accessed byte after byte.
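To make the distinction concrete, here is a small sketch of how a byte offset is computed in each layout. The function names and the 256-byte alignment are my own illustrative assumptions; on a real device the pitch is whatever the pitched allocator returns, not something you compute yourself.

```python
def linear_offset(row, col, width, elem_size):
    """Byte offset of element (row, col) in plain linear memory (row-major)."""
    return (row * width + col) * elem_size


def padded_pitch(width, elem_size, alignment):
    """Row length in bytes, rounded up to a multiple of `alignment`.

    Illustration only: the real pitch is chosen by the driver.
    """
    row_bytes = width * elem_size
    return -(-row_bytes // alignment) * alignment  # ceiling division


def pitched_offset(row, col, pitch, elem_size):
    """Byte offset of element (row, col) when every row starts at row * pitch."""
    return row * pitch + col * elem_size


# A row of 100 floats occupies 400 bytes, but with a hypothetical
# 256-byte alignment each row is padded out to 512 bytes.
pitch = padded_pitch(width=100, elem_size=4, alignment=256)
print(pitch)                          # 512
print(linear_offset(1, 0, 100, 4))    # 400
print(pitched_offset(1, 0, pitch, 4))  # 512
```

So the same (row, col) lands at a different byte address in the two layouts, which is why code written for one cannot simply index the other.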
It's hard to keep things straight since we have numpy ndarrays,
GPUArrays, and CUDA arrays at work here, and they are all often
referred to as just 'an array'.
CUDA arrays (see pycuda.driver.Array) are a '2D or 3D memory block that
can only be accessed via texture references'. That is, they are
opaque: we can't directly read or write their bytes. They live in
pitch linear memory.
PyCUDA GPUArrays, on the other hand, use device 'linear memory' to
store their data: it is allocated using a normal cudaMalloc, not a
cudaMallocPitch. This can be seen by looking at their 'allocator',
which is set to drv.mem_alloc in the constructor; it would be
mem_alloc_pitched if they were using pitched memory. This allows
accessing them directly in device kernels.
numpy ndarrays, of course, just keep their data in host memory.
GPUArrays are designed to work like them syntactically for ease of use.
With these pieces of information, I tried the following:
1. Make a multidimensional GPUArray in Python on the host.
2. Bind it to a texture reference from a module that is declared as 1D.
3. Access it in a device kernel using tex1Dfetch(), not tex1D/2D/3D(),
   by building up a 1D 'flat' offset for the element I want.
It works!
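Roughly, the steps above look like the sketch below. This is not my exact code: the kernel, the texture name "tex", and the helpers flat_offset() and roundtrip_through_texture() are made up for illustration. I use bind_to_texref_ext() here because it also sets the texture format, and I assume a 2D float32 array and a CUDA toolkit that still supports texture references.

```python
KERNEL_SOURCE = """
texture<float, 1, cudaReadModeElementType> tex;

__global__ void copy_from_texture(float *dest, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) {
        int flat = row * cols + col;         // row-major 'flat' offset
        dest[flat] = tex1Dfetch(tex, flat);  // integer index, linear memory
    }
}
"""


def flat_offset(index, shape):
    """Host-side version of the row-major offset the kernel builds."""
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError("index out of range")
        offset = offset * n + i
    return offset


def roundtrip_through_texture(host_array):
    """Copy a 2D array to the GPU and read it back through a 1D texture.

    Needs a CUDA device; the imports are local so that the helpers above
    can be used without one.
    """
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a context)
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    a = np.ascontiguousarray(host_array, dtype=np.float32)
    rows, cols = a.shape

    mod = SourceModule(KERNEL_SOURCE)
    texref = mod.get_texref("tex")

    a_gpu = gpuarray.to_gpu(a)          # multidimensional GPUArray
    a_gpu.bind_to_texref_ext(texref)    # bound as flat linear memory
    dest_gpu = gpuarray.empty_like(a_gpu)

    kernel = mod.get_function("copy_from_texture")
    block = (16, 16, 1)
    grid = ((cols + 15) // 16, (rows + 15) // 16)
    kernel(dest_gpu.gpudata, np.int32(rows), np.int32(cols),
           block=block, grid=grid, texrefs=[texref])
    return dest_gpu.get()
```

On a machine with a GPU, roundtrip_through_texture(a) should hand back a copy of a; for a 3x4 array, element (1, 2) is fetched from flat offset flat_offset((1, 2), (3, 4)), which is 6.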
The other approach is what you see in matrix_to_texref(matrix, texref,
order), which takes a numpy array on the host and attaches it to a
texref using matrix_to_array() and bind_array_to_texref(). It copies
the data into a CUDA array with a 2D memcpy (Memcpy2D), and so is
limited to 2D arrays for the moment. It looks like you have to do a
memcpy from (host / device linear) -> (CUDA array in device pitch
linear memory) in order to arrange the data nicely for multi-D texture
fetching to work.
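For comparison, a sketch of that second route. Again the kernel, the texture name "mtx_tex", and the helper copy_through_2d_texture() are invented for illustration. I assume a C-contiguous float32 matrix small enough to fit in a single thread block, and that with order="C" the texture x coordinate is the column and y is the row.

```python
KERNEL_2D = """
texture<float, 2, cudaReadModeElementType> mtx_tex;

__global__ void copy_texture(float *dest)
{
    int col = threadIdx.x;
    int row = threadIdx.y;
    int w = blockDim.x;                       // number of columns
    dest[row * w + col] = tex2D(mtx_tex, col, row);
}
"""


def copy_through_2d_texture(host_matrix):
    """Read a small 2D matrix back through a 2D texture.

    Needs a CUDA device; rows * cols must not exceed the maximum number
    of threads per block, since one block covers the whole matrix.
    """
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a context)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    a = np.ascontiguousarray(host_matrix, dtype=np.float32)
    rows, cols = a.shape

    mod = SourceModule(KERNEL_2D)
    texref = mod.get_texref("mtx_tex")

    # Host ndarray -> CUDA array -> texture reference.
    drv.matrix_to_texref(a, texref, order="C")

    dest = np.zeros_like(a)
    kernel = mod.get_function("copy_texture")
    kernel(drv.Out(dest), block=(cols, rows, 1), texrefs=[texref])
    return dest
```

Note that no GPUArray is involved here: the data goes straight from the host ndarray into the CUDA array, and the kernel only ever sees it through tex2D().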
Can anyone confirm that this is correct?
-Andrew