Re: [PyCuda] async memcpy

Nicholas Tung Sat, 14 Mar 2009 19:54:44 -0700

On Sat, Mar 14, 2009 at 15:37, Nicholas Tung <[email protected]> wrote:


> On Fri, Mar 13, 2009 at 21:28, Andreas Klöckner <[email protected]> wrote:
>
>> On Friday 13 March 2009, Nicholas Tung wrote:
>> > On Wed, Mar 11, 2009 at 22:44, Andreas Klöckner <[email protected]> wrote:
>> > > On Thursday 12 March 2009, Nicholas Tung wrote:
>> > > > Is there merit in creating a "main device thread" and letting Python
>> > > > threads post requests to it [memcpy, kernel invocation, etc.], which
>> > > > would be synchronized through streams [typically one stream per
>> > > > thread, but with modifications for passing data between threads]?
>> > > > If so, I'd be happy to contribute to any implementation.
>> > >
>> > > This would probably have a non-negligible latency penalty, rendering
>> > > the approach useful to only a few applications. If you write something
>> > > like this, please make it available so people with similar needs can
>> > > find it. I'd also have no problem sticking it into examples/.
>> >
>> > [from above, the "main device thread" is "DeviceContextThread" (could be
>> > multiple if there are multiple contexts) and the "Python threads post
>> > requests..." are "ExecutionThreads" below]
>> >
>> > The memory freeing gets kind of bad though; right now, I have a
>> > DeviceContextThread which keeps a list of all memory allocated, and makes
>> > the main thread drop the ref counts when ExecutionThreads no longer have
>> > any references. This can get tricky, because one has to ensure that
>> > ExecutionThreads [instances] don't have any references to memory, as the
>> > thread objects stick around after the thread actually closes. The other
>> > unfortunate aspect is that the check, "if getrefcount(ref) == 3", is
>> > potentially slow and unintuitive.
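
For readers following the thread, here is a minimal sketch of the
request-posting design described above, in plain Python with no PyCUDA
calls. The class name DeviceContextThread comes from the message; post(),
shutdown(), and the use of queue.Queue and concurrent.futures.Future are
assumptions made for illustration, and the posted callable only stands in
for a memcpy or kernel launch.

```python
import queue
import threading
from concurrent.futures import Future


class DeviceContextThread(threading.Thread):
    """Single thread that runs every posted request (hypothetical sketch)."""

    def __init__(self):
        super().__init__(daemon=True)
        self._requests = queue.Queue()

    def post(self, fn, *args, **kwargs):
        """Called from any execution thread; returns a Future for the result."""
        fut = Future()
        self._requests.put((fut, fn, args, kwargs))
        return fut

    def shutdown(self):
        # Sentinel: the loop stops once all earlier requests have run.
        self._requests.put(None)

    def run(self):
        # Assumption: a real version would create the device context here,
        # so that only this thread ever touches it.
        while True:
            item = self._requests.get()
            if item is None:
                break
            fut, fn, args, kwargs = item
            try:
                fut.set_result(fn(*args, **kwargs))
            except Exception as exc:
                # The error is re-raised in whichever thread calls result().
                fut.set_exception(exc)
```

An execution thread would call dev.post(some_callable, ...) and block on
the returned future's result() only when it needs the value. Each request
costs one queue round trip, which is one plausible source of the latency
penalty mentioned earlier in the thread.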
>>
>> All types of memory handle in PyCUDA have an explicit 'free()'. Use that.
>> Forget refcounts.
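
One way to follow this advice without hand-tracking every allocation is to
tie the explicit free() to a with-block. This is only a sketch:
FakeAllocation is a hypothetical stand-in for a memory handle (no GPU is
involved), and freeing() is a helper invented here, not part of PyCUDA.
The only thing assumed about the real handles is what the message above
states, namely that they have an explicit free().

```python
from contextlib import contextmanager


class FakeAllocation:
    """Hypothetical stand-in for a memory handle with an explicit free()."""

    def __init__(self, nbytes):
        self.nbytes = nbytes
        self.freed = False

    def free(self):
        self.freed = True


@contextmanager
def freeing(handle):
    """Yield the handle, then call its free() when the block exits.

    The free() call happens even if the block raises, so the release point
    does not depend on reference counts or on thread objects going away.
    """
    try:
        yield handle
    finally:
        handle.free()
```

Usage would look like "with freeing(FakeAllocation(1024)) as buf: ...",
with the buffer released as soon as the block ends.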
>
>
> Are you suggesting writing C++-like code and tracking every piece of
> memory? I think this might take too much time, since I haven't been doing
> it from the beginning...
>
> Also, I think you have a bug...
>
> ~device_allocation()
>       {
>         if (m_valid)
>           free();
>       }
>
> however,
> ~pooled_allocation()
>       { free(); }
>
> Adding the if (m_valid) check [to avoid the exception in the code below]
> seems to help with some problems... I will get back to you with more later.
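
To make the reported difference concrete, here is a small Python model of
the two destructors. It is not the PyCUDA source: the class and method
names are invented, and it assumes, as the message implies, that free()
raises when the handle is no longer valid.

```python
class GuardedAllocation:
    """Python model of a handle with a validity flag (hypothetical)."""

    def __init__(self):
        self.valid = True

    def free(self):
        # Assumption: freeing an already-freed handle is an error.
        if not self.valid:
            raise RuntimeError("handle already freed")
        self.valid = False

    def destroy_guarded(self):
        # Models ~device_allocation(): free only if still valid.
        if self.valid:
            self.free()

    def destroy_unguarded(self):
        # Models ~pooled_allocation() as quoted: free unconditionally.
        self.free()
```

Under that assumption, calling free() explicitly and then letting the
unguarded destructor run raises, while the guarded one is a no-op.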
>
> Also, do you typically build using python setup.py build? It doesn't seem
> to detect C++ file changes for me...
>
> thanks,
> Nicholas


[cc'ing to list]

By the way, I know this is probably my problem, but have you ever
encountered
"Fatal Python error: GC object already tracked"?

For some reason, my multithreaded code isn't printing stack traces where I'd
like it to... any suggestions would be helpful (can pdb handle multithreaded
code well?)
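
On the stack-trace question, one option is to wrap each thread's target so
that an uncaught exception is reported together with the thread name. The
sketch below uses only the standard library; traced() and
enable_crash_dumps() are names invented here. It assumes a current
Python 3, since faulthandler did not exist when this thread was written.
A Python-level wrapper cannot catch an interpreter abort such as the
"Fatal Python error" above; faulthandler is the part aimed at that case,
and whether it helps with this particular error is not something the
thread establishes.

```python
import faulthandler
import sys
import threading
import traceback


def traced(target, sink=None):
    """Wrap a thread target so uncaught exceptions are reported with a traceback.

    Reports go to sys.stderr, or are appended to `sink` if a list is given.
    """
    def wrapper(*args, **kwargs):
        try:
            return target(*args, **kwargs)
        except Exception:
            report = "Exception in %s:\n%s" % (
                threading.current_thread().name, traceback.format_exc())
            if sink is not None:
                sink.append(report)
            else:
                sys.stderr.write(report)
    return wrapper


def enable_crash_dumps():
    """Ask the interpreter to dump all threads' Python stacks on fatal signals."""
    faulthandler.enable(all_threads=True)
```

A thread would then be started as
threading.Thread(target=traced(work), name="exec-1").start().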

Thanks,
Nicholas

_______________________________________________
PyCuda mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net
