Re: [PyCUDA] Problem binding large DeviceAllocations to textures

Ahmed Fasih Thu, 01 Nov 2012 11:08:14 -0700

On Thu, Nov 1, 2012 at 12:23 AM, Ahmed Fasih <[email protected]> wrote:
>
> On Thu, Nov 1, 2012 at 12:07 AM, Ahmed Fasih <[email protected]> wrote:
>>
>> Hi PyCUDA community and author, thanks for your continued fantastic
>> support. Today I am running into an unexpected problem when I try to
>> bind a big chunk of memory to a texture. I've reduced the problem to the
>> following stand-alone illustrative example, but I apologize, because the
>> example contains some specifics about my use case (float2 data, 2D
>> texture---the end use of all this is to do matrix filtering):
>>
>> ### code starts
>> import pycuda.driver as cuda
>> import pycuda.autoinit
>> from pycuda.compiler import SourceModule
>>
>> # borrowed from PyCUDA test
>> mod = SourceModule("""
>>         #define CHANNELS 2
>>         texture<float2, 2, cudaReadModeElementType> mtx_tex;
>>         __global__ void copy_texture(float *dest)
>>         {
>>           int row = threadIdx.x;
>>           int col = threadIdx.y;
>>           int w = blockDim.y;
>>           float2 texval = tex2D(mtx_tex, row, col);
>>           dest[(row*w+col)*CHANNELS + 0] = texval.x;
>>           dest[(row*w+col)*CHANNELS + 1] = texval.y;
>>         }
>>         """)
>> mtx_tex = mod.get_texref("mtx_tex")
>>
>> width = 32*1024
>> height = 4330
>> nbytes = width * height * 2 * 4 # 2 floats per float2, 4 bytes per float
>> gpu = cuda.mem_alloc(nbytes)
>> HACK = 3
>> mtx_tex.set_address(gpu, nbytes / HACK)
>>
>> ### code ends
>>
>> The above code runs on my Tesla C2050 for values of "HACK" >= 3, that
>> is, when asking set_address() to bind only one-third (or less) of the
>> device memory needed into the texture unit. I need it to work for
>> HACK=1, but I get an error:
>>
>> ### error begins
>> Traceback (most recent call last):
>>   File "texbind.py", line 26, in <module>
>>     mtx_tex.set_address(gpu, nbytes / HACK)
>> pycuda._driver.LogicError: cuTexRefSetAddress failed: invalid value
>> ### error ends
>>
>
> Sorry for the extra email. I see on [1] that there is a limit to the width
> of a 1D linear memory texture reference, of 2**27, which is just a bit
> less than my width*height = 2**27.08. I'll try resolving this issue by
> using set_address_2d() instead of set_address(), and will post an update
> tomorrow.
> Again, apologies for additionally burdening your inboxes,
> Ahmed
>
> [1] http://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications
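
The arithmetic behind that limit can be checked with a few lines of plain
Python (no GPU needed). This is only a sketch: the 2**27 figure is the one
quoted above, and the helper names are made up for illustration. The 4-byte
column rests on an assumption that is not confirmed anywhere in this thread,
namely that a texture reference whose format was never set counts 4-byte
elements. It is shown only because it happens to reproduce the reported
"HACK >= 3" threshold.

```python
import math

MAX_1D_LINEAR_ELEMENTS = 2 ** 27  # limit quoted in the message above


def element_count(nbytes, bytes_per_element):
    """Number of whole texture elements covered by nbytes."""
    return nbytes // bytes_per_element


def fits_1d_limit(nbytes, bytes_per_element, limit=MAX_1D_LINEAR_ELEMENTS):
    """True if a 1D linear binding of nbytes stays within the element limit."""
    return element_count(nbytes, bytes_per_element) <= limit


width, height = 32 * 1024, 4330
nbytes = width * height * 2 * 4  # 2 floats per float2, 4 bytes per float

n_float2 = element_count(nbytes, 8)
print("float2 elements: %d = 2**%.2f" % (n_float2, math.log(n_float2, 2)))

# Columns: HACK, fits if elements are 8 bytes (float2),
# fits if elements are 4 bytes (assumption, see the text above)
for hack in (1, 2, 3):
    bound = nbytes // hack
    print(hack, fits_1d_limit(bound, 8), fits_1d_limit(bound, 4))
```

Counting float2 elements explains the HACK=1 failure; only the 4-byte
count would also rule out HACK=2.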



Below is an enhanced micro-demo showing how I used set_address_2d() and
set_address() in conjunction with both mem_alloc() and mem_alloc_pitch(),
in the hope that a final bit of spam in your inboxes is worth it to
somebody down the road:

### begin code
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule

mod = SourceModule("""
        #define CHANNELS 2
        texture<float2, 2, cudaReadModeElementType> mtx_tex;
        __global__ void copy_texture(float *dest)
        {
          int row = threadIdx.x;
          int col = threadIdx.y;
          int w = blockDim.y;
          float2 texval = tex2D(mtx_tex, row, col);
          dest[(row*w+col)*CHANNELS + 0] = texval.x;
          dest[(row*w+col)*CHANNELS + 1] = texval.y;
        }
        """)
mtx_tex = mod.get_texref("mtx_tex")

HACK = 1
width = 32*1024 // HACK # integer division, so width stays an int
height = 4330
nbytes = width * height * 2 * 4 # 2 floats per float2, 4 bytes per float

if 1:
    # mem_alloc_pitch() takes the row width in bytes and returns the pitch in bytes
    gpu2, pitch = cuda.mem_alloc_pitch(width*4*2, height, 4*2) # 2 floats per float2, 4 bytes per float
    print("Pitch in bytes: %d" % pitch)
    descr = cuda.ArrayDescriptor()
    descr.format = cuda.array_format.FLOAT
    descr.height = height
    descr.width  = width
    descr.num_channels = 2
    mtx_tex.set_address_2d(gpu2, descr, pitch) # method 1


    gpu = cuda.mem_alloc(nbytes)
    mtx_tex.set_address_2d(gpu, descr, width*4*2) # method 2

    if HACK <= 1:
        print("The following might fail.")
    mtx_tex.set_address(gpu, nbytes)          # method 3

else:
    import numpy
    data = (numpy.ones((height, width))
            + 1j * numpy.ones((height, width))).astype(numpy.complex64)

    # stack real and imaginary parts into a (height, width, 2) float32 array
    carr = cuda.make_multichannel_2d_array(numpy.asarray(numpy.concatenate(
        (data.real[:,:,numpy.newaxis], data.imag[:,:,numpy.newaxis]), 2),
        order='C'), order='C')
    cuda.bind_array_to_texref(carr, mtx_tex)

### end code

Method 1: set_address_2d with mem_alloc_pitch
Method 2: set_address_2d with mem_alloc
Method 3: set_address with mem_alloc
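
For the densely packed case (method 2), the numbers handed to
set_address_2d() can be worked out without a GPU. The sketch below is plain
Python; float2_binding_params() is a hypothetical helper written for this
illustration, not part of PyCUDA, and it assumes float2 data (2 channels of
4 bytes each) as in the demo above.

```python
def float2_binding_params(width, height, pitch_bytes=None):
    """Collect the values needed to bind a float2 matrix as a 2D texture.

    width and height are in float2 elements. pitch_bytes is the distance
    in bytes between the starts of consecutive rows; if omitted, rows are
    assumed to be densely packed, as in a plain mem_alloc() buffer.
    """
    channels = 2           # float2
    bytes_per_channel = 4  # 32-bit float
    row_bytes = width * channels * bytes_per_channel
    if pitch_bytes is None:
        pitch_bytes = row_bytes
    if pitch_bytes < row_bytes:
        raise ValueError("pitch is smaller than one row of data")
    return {
        "width": width,
        "height": height,
        "num_channels": channels,
        "pitch_bytes": pitch_bytes,
        "alloc_bytes": pitch_bytes * height,
    }


params = float2_binding_params(32 * 1024, 4330)
print(params)

# With PyCUDA available, these values would be used roughly as follows:
#   descr = cuda.ArrayDescriptor()
#   descr.format = cuda.array_format.FLOAT
#   descr.width, descr.height = params["width"], params["height"]
#   descr.num_channels = params["num_channels"]
#   gpu = cuda.mem_alloc(params["alloc_bytes"])
#   mtx_tex.set_address_2d(gpu, descr, params["pitch_bytes"])
```

For the sizes in this thread, each row is 32768 * 8 = 262144 bytes, so the
full matrix needs 262144 * 4330 bytes of device memory.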

All three work (as in, don't produce an error) when HACK>2, and the first
two (using explicitly 2D textures) work with HACK=1 as desired. Thanks for
your patience and moral support,
Ahmed

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
