On Wed, Nov 7, 2012 at 9:53 PM, Lev Givon <[email protected]> wrote:
> Received from Ahmed Fasih on Wed, Nov 07, 2012 at 09:02:15PM EST:
>> Hi folks, please take my continued stream of questions as an indication
>> of the versatility and flexibility of PyCUDA, instead of my slowness
>> with it.
>>
>> I have PyCUDA code that allocates memory on the host and on the
>> device to store the output of a kernel calculation. E.g.,
>>
>> ### start code
>> import numpy
>> import pycuda.driver as cuda
>> import pycuda.autoinit
>> image = numpy.zeros((1024, 1024), dtype=numpy.complex64, order='C')
>> image_gpu = cuda.mem_alloc(image.nbytes)
>>
>> # kernel invocation using image_gpu, works fine.
>> ### end code
>>
>> I'd like to replace this idiom with one involving page-locked memory,
>> specifically device-mapped memory. I tried the following:
>>
>> ### start code
>> image = cuda.aligned_empty((1024, 1024), dtype=numpy.complex64,
>>                            order='C')
>> image_gpu = cuda.register_host_memory(image,
>>     flags=cuda.mem_host_register_flags.DEVICEMAP)
>> ### end code
>>
>> The same kernel invocation using image_gpu fails with
>> "pycuda._driver.LogicError: cuLaunchKernel failed: invalid value".
>>
>> I also tried passing the kernel the integer (pointer) obtained by
>> calling get_device_pointer() on the base of register_host_memory()'s
>> return value. In other words, I tried this:
>>
>> ### start code
>> image_gpu_return = cuda.register_host_memory(image,
>>     flags=cuda.mem_host_register_flags.DEVICEMAP)
>> image_gpu = image_gpu_return.base.get_device_pointer()
>>
>> # kernel invocation using image_gpu, fails.
>> ### end code
>>
>> Note that it is the kernel invocation that throws the exception; the
>> aligned_empty, register_host_memory, and get_device_pointer calls all
>> succeed.
>>
>> Has anyone used this CUDA feature in PyCUDA and can shed some light
>> on how to do it right? Thanks,
>> Ahmed
>>
>> PS. Perhaps a note on why I'm trying to use device-mapped memory: any
>> memory allocated this way will be written only once by the kernel, and
>> it might not fit entirely in GPU memory.
>
> Not sure why you are observing a failure; the following gists run
> without error on my system (CUDA 4.2.9, PyCUDA 2012.1, NVIDIA driver
> 295.40, 64-bit Linux):
>
> https://gist.github.com/4036297
> https://gist.github.com/4036292
>
>                                                         L.G.

Thanks Lev! These gists were really useful in understanding how to use
these functions, and they work for me too. Nonetheless, I tried and
succeeded in breaking the second one: see
https://gist.github.com/4036693

First, I had to add "assert" around the calls to np.allclose so that
I'd actually be informed if the results weren't all close. Then I
extended the kernel to work with multiple blocks, and finally I moved
the unpinned test first. As I increased N from 20 to 22, both tests
passed. But at N=23 (a 23-by-23 array), although the unpinned version
still works, the pinned assertion fails and PyCUDA complains that
cleanup operations failed.
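In case it matters for anyone reproducing this: when I extend a kernel
to multiple blocks I add a bounds guard, since the grid can launch more
threads than there are elements. A sketch of the guard idiom I mean (my
own idiom, not necessarily what the gist uses, with float2 standing in
for complex64):

```python
# Sketch of a multi-block elementwise kernel source with a bounds
# guard; the last block may run past the element count n, and those
# threads must not touch memory.
kernel_source = """
__global__ void double_it(float2 *image, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  /* guard: grid may overshoot n */
    {
        image[i].x *= 2.0f;
        image[i].y *= 2.0f;
    }
}
"""
```

With a guard like this the grid can simply be
(n + block_size - 1) // block_size blocks of block_size threads.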

I can't find any documented limit on the size of page-locked memory
allocations, but it surely ought to be more than 3 kB, right?
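For scale, the allocation at the failing size is tiny. My arithmetic,
assuming only that numpy's complex64 is 8 bytes per element (two
float32s):

```python
import numpy as np

# The 23-by-23 complex64 array at which the pinned test starts
# failing; complex64 stores two float32s, so 8 bytes per element.
image = np.zeros((23, 23), dtype=np.complex64)
print(image.nbytes)  # 23 * 23 * 8 = 4232 bytes, roughly 4 kB
```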

Ubuntu 11.10, NVIDIA driver 304.51, CUDA 5, PyCUDA 2012.1, Tesla
C2050. If you or any other kind soul can successfully run this gist,
please let me know! https://gist.github.com/4036693
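For the record, here is the mapped-memory idiom I ended up with,
distilled from Lev's gists. This is a sketch under my own assumptions,
not a verified recipe: in particular, my reading is that the CUDA
context needs the MAP_HOST flag for mapped device pointers to be
valid, which pycuda.autoinit may not set.

```python
def mapped_device_pointer(ary):
    """Register an existing, suitably aligned array as device-mapped
    pinned memory and return the device pointer to pass to a kernel.

    Sketch only: pycuda is imported lazily because calling this needs
    a GPU, and the surrounding CUDA context must (as far as I can
    tell) be created with cuda.ctx_flags.MAP_HOST for the mapping to
    be usable from a kernel.
    """
    import pycuda.driver as cuda
    registered = cuda.register_host_memory(
        ary, flags=cuda.mem_host_register_flags.DEVICEMAP)
    # The registered array wraps the original; the device-side address
    # comes from its base RegisteredHostMemory object.
    return registered.base.get_device_pointer()
```

Called as image_gpu = mapped_device_pointer(image), image_gpu is then
what goes into the kernel launch in place of a mem_alloc pointer.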

Thanks again,
Ahmed

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
