Alexander Bock writes:
> I am creating some timing tests with PyCUDA for batch-loading an image
> sequence. I first tried timing a normal, synchronous transfer over global
> memory.
>
> Now I would like to test pagelocked memory. Specifically, I would like to
> test single-stream, synchronous pagelocked transfers; multi-stream,
> asynchronous pagelocked transfers; and zero-copy access using device-mapped
> memory.
>
> For the first one, do I simply call pycuda.driver.memcpy_htod/dtoh using
> the pagelocked memory? (I am using memflags=0 when creating the pagelocked
> memory; I assume it corresponds to cudaHostAllocDefault.) For the second, I
> would use the memcpy_(htod/dtoh)_async calls with more than one stream (my
> laptop supports concurrent kernels). For the final one, I would create my
> own context using pycuda.driver.make_context with the MAP_HOST flag,
> allocate the pagelocked memory using host_alloc_flags.DEVICE_MAP and call
> my kernel with the device pointer? Am I on the right track?

Yep, that sounds right.
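
For reference, here is a rough sketch of the three variants in one place. It
is not a tested benchmark, just an illustration under a few assumptions:
device 0 is usable and can map host memory, nvcc is available for the small
kernel, and the names run_transfer_variants, scale, n and n_streams are made
up for this example. As far as I recall, the keyword on pagelocked_empty is
spelled mem_flags, the flag is host_alloc_flags.DEVICEMAP, and make_context
is a method on Device; please check those against your PyCUDA version. Timing
calls are left out to keep it short.

```python
def run_transfer_variants(n=1 << 20, n_streams=2):
    """Run the three transfer variants; return None without PyCUDA or a GPU."""
    try:
        import numpy as np
        import pycuda.driver as cuda
        from pycuda.compiler import SourceModule
        cuda.init()
        # MAP_HOST must be set on the context before mapped allocations.
        ctx = cuda.Device(0).make_context(flags=cuda.ctx_flags.MAP_HOST)
    except Exception:
        return None  # no PyCUDA or no usable device: nothing to demonstrate
    ok = {}
    try:
        src = np.arange(n, dtype=np.float32)
        d_buf = cuda.mem_alloc(src.nbytes)

        # 1. Single stream, synchronous copies from/to pagelocked memory.
        h_in = cuda.pagelocked_empty(n, np.float32, mem_flags=0)
        h_out = cuda.pagelocked_empty(n, np.float32, mem_flags=0)
        h_in[:] = src
        cuda.memcpy_htod(d_buf, h_in)
        cuda.memcpy_dtoh(h_out, d_buf)
        ok["sync"] = bool((h_out == src).all())

        # 2. Asynchronous copies, one contiguous chunk per stream.
        h_in += 1
        h_out[:] = 0
        streams = [cuda.Stream() for _ in range(n_streams)]
        chunk = (n + n_streams - 1) // n_streams
        for i, s in enumerate(streams):
            lo, hi = i * chunk, min((i + 1) * chunk, n)
            if lo >= hi:
                continue
            d_chunk = int(d_buf) + lo * src.itemsize  # raw device address
            cuda.memcpy_htod_async(d_chunk, h_in[lo:hi], s)
            cuda.memcpy_dtoh_async(h_out[lo:hi], d_chunk, s)
        for s in streams:
            s.synchronize()
        ok["async"] = bool((h_out == src + 1).all())

        # 3. Zero-copy: the kernel works directly on mapped host memory.
        h_map = cuda.pagelocked_empty(
            n, np.float32, mem_flags=cuda.host_alloc_flags.DEVICEMAP)
        h_map[:] = src
        d_ptr = h_map.base.get_device_pointer()
        scale = SourceModule("""
        __global__ void scale(float *a, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) a[i] *= 2.0f;
        }
        """).get_function("scale")
        scale(np.intp(d_ptr), np.int32(n),
              block=(256, 1, 1), grid=((n + 255) // 256, 1))
        cuda.Context.synchronize()  # wait before reading h_map on the host
        ok["zero_copy"] = bool((h_map == 2 * src).all())
        return ok
    finally:
        ctx.pop()


if __name__ == "__main__":
    print(run_transfer_variants())
```

For actual timings you would probably want to bracket each variant with
pycuda.driver.Event records rather than wall-clock time, and use more than one
chunk per stream so that copies have a chance to overlap.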

In terms of documentation, the CUDA programming guide applies. One thing
to note: look at the "driver" interface, not the "runtime" interface. The
lowest layer of PyCUDA is just a thin coat of Python paint on the driver
interface.

Example and docs contributions would be more than welcome!

Andreas