Hi folks, please take my continued stream of questions as an indication
of the versatility and flexibility of PyCUDA, rather than of my
slowness with it.
I have PyCUDA code that allocates memory on the host and on the device
to store the output of a kernel calculation. E.g.,
### start code
import numpy
import pycuda.driver as cuda
import pycuda.autoinit
image = numpy.zeros((1024, 1024), dtype=numpy.complex64, order='C')
image_gpu = cuda.mem_alloc(image.nbytes)
# kernel invocation using image_gpu, works fine.
### end code
I'd like to replace this idiom with one involving page-locked memory,
specifically device-mapped memory. I have something as follows:
### start code
image = cuda.aligned_empty((1024,1024), dtype=numpy.complex64,
order='C')
image_gpu = cuda.register_host_memory(image,
flags=cuda.mem_host_register_flags.DEVICEMAP)
### end code
The same kernel invocation using image_gpu then fails with:
"pycuda._driver.LogicError: cuLaunchKernel failed: invalid value"
I also tried passing the kernel the integer (device pointer) obtained
by calling get_device_pointer() on the .base attribute of the array
that register_host_memory() returns. In other words, I tried this:
### start code
image_gpu_return = cuda.register_host_memory(image,
flags=cuda.mem_host_register_flags.DEVICEMAP)
image_gpu = image_gpu_return.base.get_device_pointer()
# kernel invocation using image_gpu, fails.
### end code
Note that it is the kernel invocation that throws the exception; the
aligned_empty, register_host_memory, and get_device_pointer calls all
succeed.
Has anyone used this CUDA feature in PyCUDA who can shed some light on
how to do it right? Thanks,
Ahmed
PS. A note on why I'm trying to use device-mapped memory: the buffer
will be written only once by the kernel, and it might not fit entirely
in GPU memory.
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda