Hello,

Your approach seems ok as we need a reliable way to remove dead
thread. Note that we also must be sure that the CG_Thread entry is
removed as well (and I suspect that some debugger entries have to be
also removed). I did some tests when trying to avoid the memory leak
after marking, but it seems that I hit a deadlock somewhere that
prevent the removal of the CG_Thread entry. Maybe you will be luckier
than me...

Monobjc is not changing anything in the TSD. It only swizzles the
dealloc method of the NSObject class. So I guess this is between
Objective-C and Mono...

Regards, Laurent Etiemble.

2009/10/20 Sledge Ham <mahegd...@telenet.be>:
> On Sun, Oct 18, 2009 at 09:53:18AM +0200, Laurent Etiemble wrote:
>> Hello,
>> I think Sledge Ham described correctly the issue. Let's analyse what going
>> on the native and Monobjc worlds.
>>
>> *** The Native World ***
>>
>> Here are the results of my experiments (i have attached a sample native
>> application that exhibit the behavior)
>> 1) The thread exits
>> 2) In order to clean-up, it calls "_pthread_exit" and "_pthread_tsd_cleanup"
>> 3) And after, it posts a NSThreadWillExitNotification notification to inform
>> observers that it has exited.
>>
>> If you register for NSThreadWillExitNotification notifications and if you
>> put a break point in the callback method, you will see the following
>> stacktrace:
>>
>> #0 0x00001d92 in -[Controller threadExitNotification:] at Controller.m:37
>> #1 0x91aae253 in _nsnote_callback
>> #2 0x9833dc29 in __CFXNotificationPost
>> #3 0x9833d65a in _CFXNotificationPostNotification
>> #4 0x91aa3120 in -[NSNotificationCenter
>> postNotificationName:object:userInfo:]
>> #5 0x91aa3dae in __NSFinalizeThreadData
>> #6 0x98677a1d in _pthread_tsd_cleanup
>> #7 0x986775ca in _pthread_exit
>> #8 0x91aae984 in +[NSThread exit]
>> #9 0x91aae92c in __NSThread__main__
>> #10 0x9866ef39 in _pthread_start
>> #11 0x9866edbe in thread_start
>>
>> Sounds familiar ?
>>
>> *** The Monobjc World ***
>>
>> In Monobjc, we swizzle the NSObject dealloc method in order to maintain a
>> consisency between native and managed world. It guarantees you that one and
>> only one managed wrapper exists for a native object. Without that, we will
>> have many wrappers for one native object. I have searched other ways to do
>> that without any success (One day, I wish I could find how to do that
>> without messing around with swizzling).
>>
>> What we learned from the Sledge Ham's stack trace is:
>> 1) The thread exits
>> 2) It calls "_pthread_exit" and "_pthread_tsd_cleanup".
>> 3) The Mono runtime unregister the thread (through a hook installed when the
>> thread has registered)
>> 4) The thread allocated a notification and posts it to the
>> NSNotificationCenter
>> 5) One the notification has been dispatched, the thread deallocs the
>> notification object.
>> 6) When the dealloc message is sent, the Monobjc's native-to-managed
>> callback is called.
>> 7) The Mono runtime see that a new thread wants to be registered (remember
>> that the thread has been previously removed)
>> 8) The Mono runtime has a reference on a dead thread (the thread has exited
>> right after the notification).
>> 9) The Mono runtime crashes when it tries to get the thread's stack trace as
>> it is not valid anymore.
>>
>> *** Conclusion ***
>>
>> The dealloc swizzling is one of the suspects: if you disabled swizzling
>> (comment the ObjectiveCMessage attribute on the NSObject dealloc method),
>> there is no crash anymore. So an short term solution would be to avoid
>> method swizzling. But there is still a situation regarding the use of
>> NSThreadWillExitNotification notifications as they will cross the bridge and
>> trigger the registration of the dead thread in the Mono runtime, if an
>> application registers for them.
>>
>> So, there are two options left:
>> 1) Remove method swizzling by finding another solution, and prohibit the use
>> of thread related notifications.
>> 2) Patch the Mono runtime to check that when a foreign thread is added, it
>> is a valid one. Help will be greatly appreciated to find a robust way to do
>> it (pthread + mach checks).
> Laurent,
>
> What about the following approach?
>
> We replace the deregister_foreign method in libgc/pthread_support.c with 
> something which marks the thread as about to terminate (just add a bit in the 
> flag), but do not terminate it yet.
>
> Then in the darwin_stop_world.c we test if the TERMINATING-flag is set...
> If it is set, we call the mach_port_get_refs to count the number of 
> MACH_PORT_RIGHT_DEAD_NAME on the thread. If it is > 0 then this is the signal 
> to deallocate all the reservations for this thread, because it went dead. So 
> we call the orignal deregister_foreign method, because we are now sure that 
> the thread really exited . After deallocating all this stuff we process the 
> next entry in the GC-threads...
> If it is not set, we just do the same as what is already implemented.
>
>
> This approach describes a way where we only cleanup data when the thread 
> really exited...There is no possibility to have other parts of 
> 'garbage'/thread clean up to occur later on.
>
> btw... is monobjc doing some reservations in the thread specific data or is 
> this cocoa?
>
> cc
>
>
>
>
>>
>> Any comments ?
>>
>> Regards, Laurent Etiemble.
>>
>> 2009/10/14 Sledge Ham <mahegd...@telenet.be>
>>
>> > > >From: Anthony Bowker <anth...@flowol.com>
>> > > >Date: Tue 13 Oct 2009 21:41:20 GMT+02:00
>> > > >To: "users@lists.monobjc.net" <users@lists.monobjc.net>
>> > > >Subject: RE: [us...@lists.monobjc.net] Re: [us...@lists.monobjc.net]
>> > > >Feeback Wanted on Snow Leopard
>> > > >Reply-To: "users@lists.monobjc.net" <users@lists.monobjc.net>
>> > > >
>> > > >I haven't been able to devote much time over the past couple of
>> > > >weeks to
>> > > >building mono with the patch and testing my app.  My apologies for
>> > > >not being
>> > > >able to help out more.
>> > > >
>> > > >I've just noticed Sledge Ham's stack trace in the bug
>> > > >(https://bugzilla.novell.com/show_bug.cgi?id=537764) and since the
>> > > >notification in the stack is dealloc, I've been reading about
>> > > >monobjc's
>> > > >method swizzling on dealloc.
>> > > >
>> > > >How about this hypothesis: In Snow Leopard, there are some foreign
>> > > >threads
>> > > >created by the ObjectiveC runtime which do some stuff with
>> > > >NSObjects, and
>> > > >eventually their NSObject:dealloc method is called, this is swizzled
>> > > >by
>> > > >monobjc which runs the managed NSObjectImposter.dealloc method to
>> > > >maintain
>> > > >the managed wrapper class instances.  This is a managed method, so
>> > > >mono
>> > > >kicks into gear and mono_jit_thread_attach and friends are called to
>> > > >add the
>> > > >thread to the GC_threads-array.  This is a peculiar thread, it goes
>> > > >away
>> > > >later without notification to the mono runtime, and causes the crash.
>> > > >
>> > > >It would be cool if we could swizzle for only the threads we care
>> > > >about.  Or
>> > > >perhaps we should swizzle with native code first...
>> > > >
>> > Anthony,
>> >
>> > What about the following hypothesis:
>> > In snow leopard there are some foreign threads created by/used by the
>> > ObjectiveC runtime which do some stuff with NSObjects.
>> > One of the actions is that this thread is registered with mono and
>> > mono registers a thread specific data with clean up function.
>> >
>> >
>> > Later some object/function does some things with NS objects.. the
>> > runtime now allocates a new NSobject and stores it in some thread
>> > specific data with a clean up handler...
>> >
>> > Now for some reason this thread decides to exit (probably because the
>> > app lost focus or so).. Now the pthread calls the pthread_exit()
>> > function.
>> >
>> > One of the tasks of the pthread_exit function is to clean up all
>> > thread specific data.. The first thing it does is deregister itself
>> > from the GC..
>> > The second thing it does, is proces the other thread specific cleanup
>> > handlers, which is the dealloc-function... This is picked up by mono
>> > which does not find the thread (it just go cleaned up in the previous
>> > step) so it reregisters this thread and does all required stuff
>> > (perform the dealloc on the objects). When this clean up has
>> > completed, the cleanup handler returns and the thread thinks it
>> > cleaned up everything..
>> >
>> > but... wait.. the dealloc cause a reregistration of the thread.. and
>> > the thread exited.. We have an inconsistecy between the threads which
>> > are still alive (accroding to the processor/OS) and which threads do
>> > exists according to mono's gc..
>> >
>> > The problem is that to my knowledge pthreads do not specify an order
>> > in which thread-specific cleanup is implemented... To my feeling we
>> > should find a way to make sure we do not reregister threads when they
>> > are being cleaned up. as pthreads do not guarantee any order we are
>> > kinda screwed...
>> >
>> >
>> > Ideas I have, but which are probably not ok:
>> >
>> > 1) When the thread is deregistered we call ourselves
>> > pthread_tsd_cleanup...
>> > This will make sure all other thread specific data is gone before we
>> > return from the deregister functions... I am not sure if this will
>> > work however, because we would have a nested
>> > pthread_tsd_cleanup().. Chances are very high that the outer
>> > pthread_tsd_cleanup will crash because data structures have changed..
>> >
>> > 2) We try to implement something which makes sure that we do not
>> > reregister a thread which is quiting.. how?
>> > a) we add a new thread
>> > specific data with clean up and we 'hope' that it gets called again
>> > after the other objects are cleaned.. This is also very dependent on
>> > the pthread implementation, so it is probably not a good way
>> >
>> > b) while deregistering we add a new tsd which says that the thread is
>> > quitting... before registering a function we check that this tsd does
>> > not exist.. If it does we do not reregister the thread. problem is
>> > that this new tsd might be cleanedup before the other objects and we
>> > run into the same issue..
>> >
>> >
>> > I think that without a good insight of the pthread_tsd_cleanup() we
>> > can make guesses on what is happening and probably introduce a lot of
>> > new issues ;-(
>> >
>> >
>> > Maybe we should think in a completely other direction... Instead of
>> > deleting the thread when the deregister function is called, we just
>> > mark the thread stale (like laurent proposed) and we register the time
>> > at which the thread went stale... Then at a later time we remove all
>> > threads which were stale for at least x (10?) seconds or
>> > so... Question I have is.. what if after 5 seconds a new thread is
>> > fired with the same threadid.. we will probably again run into an
>> > inconsistent state...
>> >
>> > Another idea .. which is a radical change.. Instead of trying to
>> > register the threads, can't we just ask mach which threads are alive
>> > and suspend all these threads when the GC runs? this way we do not
>> > rely on any pthreads mechanism...
>> >
>> > Other ideas are welcome...
>> >
>> > Sledge
>> >
>> >
>> > > >I realize this is probably the wrong place to be asking, but does
>> > > >anyone
>> > > >know if the other mono-objectiveC bridges have the same problem?
>> > > >
>> > > >I'm very new to understanding swizzling and the mono GC, so if what
>> > > >I've
>> > > >written above sounds like nonsense, it probably is :-)
>> > > >
>> > > >Many thanks for everyone's efforts here.
>> > > >
>> > > >Anthony
>> > >
>> >
>
>
>

Reply via email to