Hi there

Would it be possible to add an allocator keyword argument to
ReductionKernel.__call__ and gpuarray.sum etc.?

At the moment we have:

krnl = ReductionKernel(...)
result = krnl(a, stream=stream)

At present, __call__() uses a.allocator for its device allocations, so
unless a was allocated from a DeviceMemoryPool, every call allocates and
then frees device memory for the returned value. This also serialises
otherwise asynchronous stream calls. One possible workaround is:

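import pycuda.tools

pool = pycuda.tools.DeviceMemoryPool()
tmp_alloc = a.allocator          # remember the original allocator
a.allocator = pool.allocate      # make the result come from the pool
result = krnl(a, stream=stream)
a.allocator = tmp_alloc          # restore the original allocator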
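
To make the swap-and-restore pattern above a bit safer, it could be wrapped
in a small context manager so the original allocator is put back even if the
kernel call raises. This is only a rough sketch: temporary_allocator is a
hypothetical helper name, not part of PyCUDA, and it assumes nothing more
than that the array exposes a writable allocator attribute, as the
workaround already does.

```python
from contextlib import contextmanager


@contextmanager
def temporary_allocator(array, allocator):
    """Temporarily replace array.allocator, restoring the old one on exit.

    Hypothetical helper (not a PyCUDA API). It only assumes that `array`
    has a writable `allocator` attribute.
    """
    saved = array.allocator
    array.allocator = allocator
    try:
        yield array
    finally:
        # Runs on normal exit and on exceptions alike, so the array is
        # never left pointing at the temporary allocator.
        array.allocator = saved


# Intended use with the names from the workaround (needs PyCUDA and a GPU,
# so it is left as a comment here):
#
#     with temporary_allocator(a, pool.allocate):
#         result = krnl(a, stream=stream)
```

This still mutates a for the duration of the call, so it remains a
stop-gap; an allocator keyword argument would avoid touching a at all.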

thanks!
  Simon
