On Wed, Mar 11, 2009 at 22:44, Andreas Klöckner <[email protected]> wrote:
> On Donnerstag 12 März 2009, Nicholas Tung wrote:
> > Is there merit in creating a "main device thread" and letting Python
> > threads post requests to it [memcpy, kernel invocation, etc.], which would
> > be synchronized through streams [typically one stream per thread, but
> > modifications for passing data between threads]? If so, I'd be happy to
> > contribute to any implementation.
>
> This would probably have a non-negligible latency penalty, rendering the
> approach useful to only a few applications. If you write something like this,
> please make it available so people with similar needs can find it. I'd also
> have no problem sticking it into examples/.

[From above: the "main device thread" is the "DeviceContextThread" (there could be several if there are multiple contexts), and the "Python threads post requests..." are the "ExecutionThreads" below.]

The memory freeing gets messy, though. Right now I have a DeviceContextThread which keeps a list of all allocated memory and makes the main thread drop the reference counts once the ExecutionThreads no longer hold any references. This can get tricky, because one has to ensure that ExecutionThread instances don't hold references to memory, since the thread objects stick around after the thread itself has finished. The other unfortunate aspect is that checks like "if getrefcount(ref) == 3" are potentially slow and unintuitive.

It would also be very nice if the memory pools had a try_allocate() function which would only grab already-allocated memory, avoiding the need to post an event for the allocation to the DeviceContextThread (otherwise, CUDA requires such calls to come from the thread that created the current context). Out of curiosity, did you write the memory allocator yourself? Sorry to be lazy and not look at the source. Unfortunately, the built-in free() functions fail if, e.g., I manually free() the memory when the DeviceContextThread exits, and the ExecutionThread is then deleted [upon program termination].
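For concreteness, here is a minimal sketch of the posting pattern described above, assuming a queue-based design; all names (DeviceContextThread, post, etc.) are illustrative, not PyCUDA API, and the real version would create the CUDA context inside run() so every driver call happens on that thread:

```python
# Hypothetical sketch of a "DeviceContextThread": one thread owns the CUDA
# context and executes callables that other ("execution") threads post to
# its queue. Plain Python, no CUDA calls, to show only the threading shape.
import threading
import queue


class DeviceContextThread(threading.Thread):
    def __init__(self):
        super().__init__(daemon=True)
        self._requests = queue.Queue()

    def run(self):
        # A real version would create the CUDA context here, so that all
        # driver calls below happen on this thread.
        while True:
            func, args, done, result = self._requests.get()
            if func is None:  # shutdown sentinel
                break
            try:
                result.append(func(*args))
            finally:
                done.set()  # wake the posting thread even on error

    def post(self, func, *args):
        # Called from ExecutionThreads; blocks until the context thread
        # has run the request, then returns its result.
        done = threading.Event()
        result = []
        self._requests.put((func, args, done, result))
        done.wait()
        return result[0]

    def shutdown(self):
        self._requests.put((None, (), threading.Event(), []))
```

Each post() round-trip costs a queue operation plus an Event wait, which is the latency penalty Andreas mentions; error propagation back to the posting thread is omitted here for brevity.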
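The try_allocate() idea could look roughly like the following, assuming the pool caches freed blocks by size; this is a toy stand-in (SimplePool and its methods are invented for illustration, not PyCUDA's actual pool), with the key property that try_allocate() never calls into the driver and is therefore safe from any thread:

```python
# Toy memory pool illustrating a hypothetical try_allocate(): hand out a
# block only if one of the right size is already cached; never fall through
# to the real allocator, which must run on the context thread.
class SimplePool:
    def __init__(self, alloc_func):
        self._alloc = alloc_func   # real allocator (context thread only)
        self._free_blocks = {}     # size -> list of cached allocations

    def allocate(self, size):
        blocks = self._free_blocks.get(size)
        if blocks:
            return blocks.pop()
        return self._alloc(size)   # may require posting to the context thread

    def try_allocate(self, size):
        # Only reuse already-allocated memory; return None instead of
        # touching the driver, so any thread may call this directly.
        blocks = self._free_blocks.get(size)
        return blocks.pop() if blocks else None

    def free(self, block, size):
        self._free_blocks.setdefault(size, []).append(block)
```

An ExecutionThread could then call try_allocate() first and only post a request to the DeviceContextThread when it returns None.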
Also, the "terminate called after cuda::error" messages don't seem to be catchable by Python try/except clauses, and exceptions don't print a stack trace when multiple threads are running, which is quite frustrating for development. If you have better ideas for any of these issues, I'd appreciate them.

Thanks,
Nicholas
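The hard C++ "terminate called" abort can't be intercepted from Python, but the missing stack traces from worker threads have a common workaround: wrap the thread body so any exception is printed with a full traceback before the thread dies (TracingThread is an illustrative name; newer Python also provides threading.excepthook for the same job):

```python
# Workaround for exceptions silently disappearing inside worker threads:
# print the full traceback from within the thread before re-raising.
import threading
import traceback


class TracingThread(threading.Thread):
    def run(self):
        try:
            super().run()  # runs the target passed to the constructor
        except Exception:
            traceback.print_exc()  # traceback survives even off the main thread
            raise
```

Using TracingThread in place of threading.Thread for the ExecutionThreads would at least make failures visible during development.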
_______________________________________________
PyCuda mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net
