Simon Perkins <[email protected]> writes:
> Hi there
>
> Would it be possible to add an allocator keyword argument to
> ReductionKernel.__call__ and gpuarray.sum etc.?
>
> At the moment we have:
>
> krnl = ReductionKernel(...)
> result = krnl(a, stream)
>
> Now __call__() uses a.allocator to make device allocations, but unless a
> has been allocated using a DeviceMemoryPool, a device allocation and
> deallocation occurs for the returned value. Additionally, this serialises
> asynchronous stream calls. One possible work-around is:
>
> pool = pycuda.tools.DeviceMemoryPool()
> tmp_alloc = a.allocator
> a.allocator = pool.allocate
> result = krnl(a, stream)
> a.allocator = tmp_alloc

I'd be happy to take a patch.

Andreas
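
Until such a keyword argument exists, the work-around quoted above could be wrapped so that the original allocator is restored even if the kernel call raises. The following is only a minimal sketch: temporary_allocator is a hypothetical helper, not part of PyCUDA, and it assumes nothing beyond what the quoted message states, namely that the array exposes a writable allocator attribute.

```python
from contextlib import contextmanager


@contextmanager
def temporary_allocator(array, allocator):
    """Temporarily replace array.allocator, restoring the old one on exit.

    Hypothetical helper (not a PyCUDA API). It works on any object that
    has a writable `allocator` attribute.
    """
    saved = array.allocator
    array.allocator = allocator
    try:
        yield array
    finally:
        # Put the original allocator back, even if the body raised.
        array.allocator = saved


# Intended use, mirroring the calls in the quoted message
# (needs PyCUDA and a GPU, so it is left as a comment here):
#
#   pool = pycuda.tools.DeviceMemoryPool()
#   with temporary_allocator(a, pool.allocate):
#       result = krnl(a, stream)
```

This only tidies up the swap-and-restore step; an allocator keyword on __call__ would still be the cleaner fix, since it would not mutate the input array at all.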
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
