On Wed, Mar 11, 2009 at 22:44, Andreas Klöckner <[email protected]> wrote:
> On Donnerstag 12 März 2009, Nicholas Tung wrote:
> > Is there merit in creating a "main device thread" and letting Python
> > threads post requests to it [memcpy, kernel invocation, etc.], which would
> > be synchronized through streams [typically one stream per thread, but
> > modifications for passing data between threads]? If so, I'd be happy to
> > contribute to any implementation.
>
> This would probably have a non-negligible latency penalty, rendering the
> approach useful to only a few applications. If you write something like this,
> please make it available so people with similar needs can find it. I'd also
> have no problem sticking it into examples/.

[From above: the "main device thread" is the "DeviceContextThread" (there could be several if there are multiple contexts), and the "Python threads post requests..." are the "ExecutionThreads" below.]

The memory freeing gets messy, though. Right now I have a DeviceContextThread which keeps a list of all allocated memory and makes the main thread drop the reference counts once the ExecutionThreads no longer hold any references. This can get tricky, because one has to ensure that ExecutionThread instances don't hold references to memory, since the thread objects stick around after the thread itself has finished. The other unfortunate aspect is that checks like "if getrefcount(ref) == 3" are potentially slow and unintuitive.

It would also be very nice if the memory pools had a try_allocate() function which would only grab already-allocated memory, avoiding the need to post an event for the allocation to the DeviceContextThread (otherwise, CUDA requires such calls to come from the thread that created the current context). Out of curiosity, did you write the memory allocator yourself? Sorry to be lazy and not look at the source. Unfortunately, the built-in free() functions fail if, e.g., I manually free() the memory when the DeviceContextThread exits, and the ExecutionThread is then deleted [upon program termination].
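For concreteness, here is a minimal sketch of the posting pattern described above, assuming a queue-based design; all names (DeviceContextThread, post, etc.) are illustrative, not PyCUDA API, and the real version would create the CUDA context inside run() so every driver call happens on that thread:

```python
# Hypothetical sketch of a "DeviceContextThread": one thread owns the CUDA
# context and executes callables that other ("execution") threads post to
# its queue. Plain Python, no CUDA calls, to show only the threading shape.
import threading
import queue


class DeviceContextThread(threading.Thread):
    def __init__(self):
        super().__init__(daemon=True)
        self._requests = queue.Queue()

    def run(self):
        # A real version would create the CUDA context here, so that all
        # driver calls below happen on this thread.
        while True:
            func, args, done, result = self._requests.get()
            if func is None:  # shutdown sentinel
                break
            try:
                result.append(func(*args))
            finally:
                done.set()  # wake the posting thread even on error

    def post(self, func, *args):
        # Called from ExecutionThreads; blocks until the context thread
        # has run the request, then returns its result.
        done = threading.Event()
        result = []
        self._requests.put((func, args, done, result))
        done.wait()
        return result[0]

    def shutdown(self):
        self._requests.put((None, (), threading.Event(), []))
```

Each post() round-trip costs a queue operation plus an Event wait, which is the latency penalty Andreas mentions; error propagation back to the posting thread is omitted here for brevity.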
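The try_allocate() idea could look roughly like the following, assuming the pool caches freed blocks by size; this is a toy stand-in (SimplePool and its methods are invented for illustration, not PyCUDA's actual pool), with the key property that try_allocate() never calls into the driver and is therefore safe from any thread:

```python
# Toy memory pool illustrating a hypothetical try_allocate(): hand out a
# block only if one of the right size is already cached; never fall through
# to the real allocator, which must run on the context thread.
class SimplePool:
    def __init__(self, alloc_func):
        self._alloc = alloc_func   # real allocator (context thread only)
        self._free_blocks = {}     # size -> list of cached allocations

    def allocate(self, size):
        blocks = self._free_blocks.get(size)
        if blocks:
            return blocks.pop()
        return self._alloc(size)   # may require posting to the context thread

    def try_allocate(self, size):
        # Only reuse already-allocated memory; return None instead of
        # touching the driver, so any thread may call this directly.
        blocks = self._free_blocks.get(size)
        return blocks.pop() if blocks else None

    def free(self, block, size):
        self._free_blocks.setdefault(size, []).append(block)
```

An ExecutionThread could then call try_allocate() first and only post a request to the DeviceContextThread when it returns None.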
Also, the "terminate called after cuda::error" messages don't seem to be catchable by Python try/except clauses, and exceptions don't print a stack trace when multiple threads are running, which is quite frustrating for development. If you have better ideas for any of these issues, I'd appreciate them.

Thanks,
Nicholas
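The hard C++ "terminate called" abort can't be intercepted from Python, but the missing stack traces from worker threads have a common workaround: wrap the thread body so any exception is printed with a full traceback before the thread dies (TracingThread is an illustrative name; newer Python also provides threading.excepthook for the same job):

```python
# Workaround for exceptions silently disappearing inside worker threads:
# print the full traceback from within the thread before re-raising.
import threading
import traceback


class TracingThread(threading.Thread):
    def run(self):
        try:
            super().run()  # runs the target passed to the constructor
        except Exception:
            traceback.print_exc()  # traceback survives even off the main thread
            raise
```

Using TracingThread in place of threading.Thread for the ExecutionThreads would at least make failures visible during development.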
_______________________________________________
PyCuda mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net
