Alexander Bock <[email protected]> writes:
> I am creating some timing tests with PyCUDA for batch-loading an image
> sequence. I first tried timing a normal, synchronous transfer over global
> memory.
>
> Now I am looking to test pagelocked memory, specifically, I would like to
> test: Single-stream, pagelocked synchronous transfers, multi-stream,
> asynchronous pagelocked transfers and zero-copy memory using device mapped
> memory.
>
> For the first one, do I simply call pycuda.driver.memcpy_htod/dtoh using
> the pagelocked memory (I am using memflags=0 for creating the pagelocked
> memory, I assume it corresponds to cudaHostAllocDefault?) For the second, I
> would use the memcpy_(htod/dtoh)_async calls with more than one stream (my
> laptop supports concurrent kernels). For the final one, I would create my
> own context using pycuda.driver.make_context with the MAP_HOST flag,
> allocate the pagelocked memory using host_alloc_flags.DEVICE_MAP and call
> my kernel with the device pointer? Am I on the right track?

Yep, that sounds right. In terms of documentation, the CUDA programming
guide applies. One thing to note: look at the "driver" interface, not the
"runtime" interface, since the lowest layer of PyCUDA is just a coat of
Python paint on the driver API.

Example and docs contributions would be more than welcome!

Andreas
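
For illustration, the three variants described in the question might be
sketched roughly as below. This is a minimal sketch under several
assumptions, not a reference implementation: the names chunk_bounds and
time_transfers are hypothetical, the wall-clock timing with
time.perf_counter is a simplification, and the spellings mem_flags,
host_alloc_flags.DEVICEMAP and Device.make_context are the ones I believe
PyCUDA's driver layer uses (they differ slightly from the names in the
question), so please check them against your installed version.

```python
def chunk_bounds(n_items, n_chunks):
    """Split range(n_items) into contiguous (start, stop) pairs."""
    step = -(-n_items // n_chunks)  # ceiling division
    return [(lo, min(lo + step, n_items)) for lo in range(0, n_items, step)]


def time_transfers(n_floats=1 << 22, n_streams=4):
    """Time host-to-device copies; returns a dict of results (hypothetical)."""
    # Imported here so chunk_bounds stays usable on machines without a GPU.
    import time
    import numpy as np
    import pycuda.driver as cuda

    cuda.init()
    # Assumption: MAP_HOST has to be set when the context is created,
    # otherwise device-mapped (zero-copy) host memory is unavailable.
    ctx = cuda.Device(0).make_context(flags=cuda.ctx_flags.MAP_HOST)
    try:
        results = {}
        itemsize = np.dtype(np.float32).itemsize
        dev = cuda.mem_alloc(n_floats * itemsize)

        # 1. Single-stream, synchronous copy from page-locked memory.
        host = cuda.pagelocked_empty(n_floats, np.float32, mem_flags=0)
        host[:] = 1.0
        t0 = time.perf_counter()
        cuda.memcpy_htod(dev, host)
        results["pagelocked_sync_s"] = time.perf_counter() - t0

        # 2. Asynchronous copies, one contiguous chunk per stream.
        streams = [cuda.Stream() for _ in range(n_streams)]
        bounds = chunk_bounds(n_floats, n_streams)
        t0 = time.perf_counter()
        for stream, (lo, hi) in zip(streams, bounds):
            # Device address of the chunk, computed by byte offset.
            dest = int(dev) + lo * itemsize
            cuda.memcpy_htod_async(dest, host[lo:hi], stream)
        for stream in streams:
            stream.synchronize()
        results["pagelocked_async_s"] = time.perf_counter() - t0

        # 3. Zero-copy: page-locked memory mapped into the device's
        #    address space. The pointer below is what a kernel would
        #    receive in place of a regular device allocation.
        mapped = cuda.pagelocked_empty(
            n_floats, np.float32,
            mem_flags=cuda.host_alloc_flags.DEVICEMAP)
        mapped[:] = 1.0
        results["zero_copy_ptr"] = int(mapped.base.get_device_pointer())

        dev.free()
        return results
    finally:
        ctx.pop()
```

Calling time_transfers() on a machine with a CUDA-capable GPU should return
the two timings in seconds plus the mapped device pointer. The dtoh
direction would follow the same pattern with memcpy_dtoh and
memcpy_dtoh_async.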
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
