Simon Perkins <[email protected]> writes:

> Hi there
>
> Would it be possible to add an allocator keyword argument to
> ReductionKernel.__call__ and gpuarray.sum etc.?
>
> At the moment we have:
>
> krnl = ReductionKernel(...)
> result = krnl(a, stream)
>
> Now __call__() uses a.allocator to make device allocations, but unless a
> has been allocated using a DeviceMemoryPool, a device allocation and
> deallocation occur for the returned value. Additionally, this serialises
> asynchronous stream calls. One possible work-around is:
>
> pool = pycuda.tools.DeviceMemoryPool()
> tmp_alloc = a.allocator
> a.allocator = pool.allocate
> result = krnl(a, stream)
> a.allocator = tmp_alloc

I'd be happy to take a patch.
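
Until such a keyword exists, the work-around quoted above could be wrapped
so that the original allocator is restored even if the kernel call raises.
This is only a minimal sketch: temporary_allocator is a hypothetical helper,
not part of PyCUDA, and it assumes nothing beyond the array exposing a
writable allocator attribute, as in the quoted code.

```python
from contextlib import contextmanager


@contextmanager
def temporary_allocator(array, allocator):
    """Temporarily replace array.allocator, restoring it on exit.

    Hypothetical helper (not a PyCUDA API); it only assumes that
    `array` has a writable `allocator` attribute.
    """
    saved = array.allocator
    array.allocator = allocator
    try:
        yield array
    finally:
        # Put the original allocator back, even if the body raised.
        array.allocator = saved


# Intended use, with the names from the quoted message
# (stream is passed by keyword here):
#
#   pool = pycuda.tools.DeviceMemoryPool()
#   with temporary_allocator(a, pool.allocate):
#       result = krnl(a, stream=stream)
```

Compared with swapping the attribute by hand, the try/finally means an
exception inside the block cannot leave the array pointing at the pool.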

Andreas


_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
