Hi folks, please take my continued stream of questions as an indication
of the versatility and flexibility of PyCUDA, rather than of my slowness
with it.

I have PyCUDA code that allocates memory on the host and on the device
to store the output of a kernel calculation, e.g.:

### start code
import numpy
import pycuda.driver as cuda
import pycuda.autoinit
# Host-side result buffer and a matching device-side allocation.
image = numpy.zeros((1024, 1024), dtype=numpy.complex64, order='C')
image_gpu = cuda.mem_alloc(image.nbytes)

# kernel invocation using image_gpu, works fine.
### end code
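
For reference, the launch and copy-back in that working version look
roughly like this (the trivial fill kernel and the launch configuration
are just stand-ins for my real code):

### start code
from pycuda.compiler import SourceModule

# Stand-in kernel: each thread writes one complex64 (float2) element.
mod = SourceModule("""
__global__ void fill(float2 *out, int width)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    out[y * width + x] = make_float2(x, y);
}
""")
fill = mod.get_function("fill")

fill(image_gpu, numpy.int32(1024),
     block=(16, 16, 1), grid=(64, 64))
cuda.memcpy_dtoh(image, image_gpu)  # copy the result back to the host
### end code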

I'd like to replace this idiom with one involving page-locked memory,
specifically device-mapped memory. What I have so far is:

### start code
image = cuda.aligned_empty((1024, 1024), dtype=numpy.complex64,
                           order='C')
image_gpu = cuda.register_host_memory(
    image, flags=cuda.mem_host_register_flags.DEVICEMAP)
### end code

The same kernel invocation using image_gpu fails with
"pycuda._driver.LogicError: cuLaunchKernel failed: invalid value".

I also tried passing the kernel the integer (pointer) obtained by
calling get_device_pointer() on the base of the array returned by
register_host_memory(). In other words, I tried this:

### start code
image_gpu_return = cuda.register_host_memory(
    image, flags=cuda.mem_host_register_flags.DEVICEMAP)
image_gpu = image_gpu_return.base.get_device_pointer()

# kernel invocation using image_gpu, fails.
### end code

Note that it is the kernel invocation that throws the exception; the
aligned_empty, register_host_memory, and get_device_pointer calls
themselves succeed.
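
For concreteness, the failing launch is essentially the working one with
the mapped pointer substituted in. A minimal version, with the same
stand-in kernel as above and the raw integer wrapped as a pointer-sized
numpy.intp so it can be packed as a kernel argument, looks like this:

### start code
# image_gpu here is the integer returned by get_device_pointer() above.
fill(numpy.intp(image_gpu), numpy.int32(1024),
     block=(16, 16, 1), grid=(64, 64))  # <-- raises the LogicError
### end code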

Has anyone used this CUDA feature in PyCUDA who can shed some light on
how to do it right? Thanks,
Ahmed

PS. Perhaps a note on why I'm trying to use device-mapped memory: any
memory allocated this way will be written only once by the kernel, and
it might not fit entirely in GPU memory.
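
In case it helps anyone answering: the only other mapped-memory idiom I
know of in PyCUDA is to allocate the page-locked buffer directly with
pagelocked_empty() instead of registering an existing array afterwards.
A sketch of that variant (I haven't verified whether it behaves any
differently) would be:

### start code
# Allocate the host buffer already page-locked and device-mapped.
image = cuda.pagelocked_empty((1024, 1024), dtype=numpy.complex64,
                              mem_flags=cuda.host_alloc_flags.DEVICEMAP)
image_gpu = numpy.intp(image.base.get_device_pointer())
### end code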

