Hi there,

Would it be possible to add an allocator keyword argument to ReductionKernel.__call__, gpuarray.sum, etc.?

At the moment we have:

    krnl = ReductionKernel(...)
    result = krnl(a, stream=stream)

Now __call__() uses a.allocator to make device allocations, but unless a has been allocated using a DeviceMemoryPool, a device allocation and deallocation occurs for the returned value. This allocation and deallocation also serialises otherwise asynchronous stream calls.

One possible work-around is:

    pool = pycuda.tools.DeviceMemoryPool()
    tmp_alloc = a.allocator
    a.allocator = pool.allocate
    result = krnl(a, stream=stream)
    a.allocator = tmp_alloc

Thanks!
Simon
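
P.S. The work-around can be wrapped in a small context manager so that the original allocator is restored even if the kernel call raises. This is only a minimal sketch: temporary_allocator is a made-up helper name, not part of PyCUDA, and it assumes nothing beyond the array exposing a writable allocator attribute, as in the work-around above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_allocator(array, allocator):
    """Temporarily replace array.allocator and restore it on exit.

    Hypothetical helper, not a PyCUDA API. It only assumes that
    `array` has a writable `allocator` attribute.
    """
    saved = array.allocator
    array.allocator = allocator
    try:
        yield array
    finally:
        # Runs on normal exit and on exceptions alike.
        array.allocator = saved


# Intended use (needs a CUDA device, so left as comments):
#
#     pool = pycuda.tools.DeviceMemoryPool()
#     with temporary_allocator(a, pool.allocate):
#         result = krnl(a, stream=stream)
```

This does not remove the need for an allocator keyword argument; it only makes the swap-and-restore less error-prone in the meantime.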
