On Thu, Nov 1, 2012 at 12:23 AM, Ahmed Fasih <[email protected]> wrote:
>
> On Thu, Nov 1, 2012 at 12:07 AM, Ahmed Fasih <[email protected]> wrote:
>>
>> Hi PyCUDA community and author, thanks for your continued fantastic
>> support. Today I am running into an unexpected problem when I try to bind
>> a big chunk of memory to a texture. I've reduced the problem to the
>> following stand-alone illustrative example, but I apologize, because the
>> example contains some specifics about my use case (float2 data, 2D
>> texture---the end use of all this is to do matrix filtering):
>>
>> ### code starts
>> import pycuda.driver as cuda
>> import pycuda.autoinit
>> from pycuda.compiler import SourceModule
>>
>> # borrowed from PyCUDA test
>> mod = SourceModule("""
>> #define CHANNELS 2
>> texture<float2, 2, cudaReadModeElementType> mtx_tex;
>> __global__ void copy_texture(float *dest)
>> {
>> int row = threadIdx.x;
>> int col = threadIdx.y;
>> int w = blockDim.y;
>> float2 texval = tex2D(mtx_tex, row, col);
>> dest[(row*w+col)*CHANNELS + 0] = texval.x;
>> dest[(row*w+col)*CHANNELS + 1] = texval.y;
>> }
>> """)
>> mtx_tex = mod.get_texref("mtx_tex")
>>
>> width = 32*1024
>> height = 4330
>> nbytes = width * height * 2 * 4 # 2 floats per float2, 4 bytes per float
>> gpu = cuda.mem_alloc(nbytes)
>> HACK = 3
>> mtx_tex.set_address(gpu, nbytes / HACK)
>>
>> ### code ends
>>
>> The above code runs on my Tesla C2050 for values of "HACK" >= 3, that is,
>> when asking set_address() to bind only one-third (or less) of the device
>> memory needed into the texture unit. I need it to work for HACK=1, but I
>> get an error:
>>
>> ### error begins
>> Traceback (most recent call last):
>> File "texbind.py", line 26, in <module>
>> mtx_tex.set_address(gpu, nbytes / HACK)
>> pycuda._driver.LogicError: cuTexRefSetAddress failed: invalid value
>> ### error ends
>>
>
> Sorry for the extra email. I see on [1] that there is a limit to the width
> of a 1D linear memory texture reference, of 2**27, which is just a bit less
> than my width*height = 2**27.08. I'll try resolving this issue by using
> set_address_2d() instead of set_address(), and will post an update tomorrow.
> Again, apologies for additionally burdening your inboxes,
> Ahmed
>
> [1] http://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications
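
As a quick sanity check on the numbers quoted above, here is a minimal sketch in plain Python (no GPU needed). The 2**27 limit is simply the figure from [1], and the variable names n_elements and limit_1d are made up for illustration:

```python
import math

width = 32 * 1024
height = 4330

# Number of float2 texels that a single 1D binding would have to cover
n_elements = width * height

# Width limit for a 1D texture bound to linear memory, as quoted from [1]
limit_1d = 2 ** 27

print(n_elements)                          # 141885440
print(round(math.log2(n_elements), 2))     # 27.08
print(n_elements > limit_1d)               # True: over the quoted limit
```

This only reproduces the arithmetic. It does not try to predict exactly which values of HACK the driver accepts.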

Below is an enhanced micro-demo showing how I used set_address_2d() and
set_address() in conjunction with both mem_alloc() and mem_alloc_pitch(),
in the hope that a final bit of spam in your inboxes is worth it to
somebody down the road:

### begin code
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
mod = SourceModule("""
#define CHANNELS 2
texture<float2, 2, cudaReadModeElementType> mtx_tex;
__global__ void copy_texture(float *dest)
{
int row = threadIdx.x;
int col = threadIdx.y;
int w = blockDim.y;
float2 texval = tex2D(mtx_tex, row, col);
dest[(row*w+col)*CHANNELS + 0] = texval.x;
dest[(row*w+col)*CHANNELS + 1] = texval.y;
}
""")
mtx_tex = mod.get_texref("mtx_tex")
HACK = 1
width = 32*1024 // HACK  # in float2 elements
height = 4330
elem_bytes = 2 * 4  # 2 floats per float2, 4 bytes per float
nbytes = width * height * elem_bytes

if 1:
    # mem_alloc_pitch() takes the row width in bytes and returns the
    # pitch in bytes, so no further scaling of the pitch is needed
    gpu2, pitch = cuda.mem_alloc_pitch(width * elem_bytes, height, elem_bytes)
    print("Pitch in bytes: %d" % pitch)

    descr = cuda.ArrayDescriptor()
    descr.format = cuda.array_format.FLOAT
    descr.height = height
    descr.width = width
    descr.num_channels = 2
    mtx_tex.set_address_2d(gpu2, descr, pitch)  # method 1

    gpu = cuda.mem_alloc(nbytes)
    mtx_tex.set_address_2d(gpu, descr, width * elem_bytes)  # method 2

    if HACK <= 1:
        print("The following might fail.")
    mtx_tex.set_address(gpu, nbytes)  # method 3
else:
    # Alternative (not run): copy the data into a 2-channel CUDA array
    import numpy
    data = (numpy.ones((height, width))
            + 1j * numpy.ones((height, width))).astype(numpy.complex64)
    # Stack real and imaginary parts into a (height, width, 2) array
    channels = numpy.concatenate(
        (data.real[:, :, numpy.newaxis], data.imag[:, :, numpy.newaxis]), 2)
    carr = cuda.make_multichannel_2d_array(
        numpy.asarray(channels, order='C'), order='C')
    cuda.bind_array_to_texref(carr, mtx_tex)
### end code
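
A side note on the unused else branch above: it turns the complex64 matrix into a (height, width, 2) array of floats before handing it to make_multichannel_2d_array(). Here is a minimal NumPy-only sketch of that packing step on a tiny stand-in matrix. The view()/reshape() shortcut is my own suggestion rather than something the demo uses, and it assumes the array is C-contiguous:

```python
import numpy

# Tiny stand-in for the real (height, width) complex64 matrix
data = (numpy.arange(6) + 1j * numpy.arange(6, 12)).astype(numpy.complex64)
data = data.reshape(2, 3)

# The demo's approach: stack real and imaginary parts along a new last axis
stacked = numpy.concatenate(
    (data.real[:, :, numpy.newaxis], data.imag[:, :, numpy.newaxis]), 2)

# Possible shortcut (assumes C-contiguous data): complex64 is stored as
# interleaved float32 pairs, so it can be reinterpreted without copying
viewed = data.view(numpy.float32).reshape(data.shape + (2,))

print(stacked.shape, stacked.dtype)         # (2, 3, 2) float32
print(numpy.array_equal(stacked, viewed))   # True
```

Either way, channel 0 holds the real part and channel 1 the imaginary part, which is what the kernel reads back as texval.x and texval.y.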
Method 1: set_address_2d with mem_alloc_pitch
Method 2: set_address_2d with mem_alloc
Method 3: set_address with mem_alloc

All three work (as in, don't produce an error) when HACK>2, and the first
two (which bind the memory as an explicitly 2D texture) also work with
HACK=1, as desired. Thanks for your patience and moral support,
Ahmed