Hi Freddie,

Freddie Witherden <[email protected]> writes:
> I have finally bitten the bullet and have started porting my solver from
> CUDA to OpenCL.  During a time-step it is necessary for MPI ranks to
> exchange data.  With PyCUDA and mpi4py our application proceeds as follows:
>
> At start-up we allocate a page-locked buffer on the host and an
> equally-sized buffer on the device.  We also construct a persistent MPI
> request for sending the host buffer.  Then, when the time is
> right, we run a packing kernel on the device, initiate a device-to-host
> copy, and then start the persistent MPI request.
>
> Does anyone have any experience with performing this with OpenCL?  From
> what I can gather there are a variety of options, although none which
> jump off the page.  I am wary of having the device use a
> memory-mapped host pointer (when I tried this with CUDA our performance
> tanked).  Nor can I find a direct equivalent of pagelocked_empty in
> OpenCL.  ALLOC_HOST_PTR followed by an enqueue_map_buffer may be what I
> want, but I am unsure whether it fits in with persistent requests (it
> would need to stay mapped the whole time).

ALLOC_HOST_PTR with enqueue_map_buffer will give you page-locked memory
(and thus fast transfers) on Nvidia and AMD. You very likely do *not*
want to pass this buffer to any kernels though--just use it as a
transfer target.

Hope that helps,
Andreas


_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
