On Sun, Oct 18, 2009 at 09:53:18AM +0200, Laurent Etiemble wrote:
> Hello,
> I think Sledge Ham described the issue correctly. Let's analyse what is going
> on in the native and Monobjc worlds.
> 
> *** The Native World ***
> 
> Here are the results of my experiments (I have attached a sample native
> application that exhibits the behavior):
> 1) The thread exits
> 2) In order to clean-up, it calls "_pthread_exit" and "_pthread_tsd_cleanup"
> 3) During that cleanup, it posts an NSThreadWillExitNotification notification to
> inform observers that it is about to exit.
> 
> If you register for NSThreadWillExitNotification notifications and
> put a breakpoint in the callback method, you will see the following
> stack trace:
> 
> #0 0x00001d92 in -[Controller threadExitNotification:] at Controller.m:37
> #1 0x91aae253 in _nsnote_callback
> #2 0x9833dc29 in __CFXNotificationPost
> #3 0x9833d65a in _CFXNotificationPostNotification
> #4 0x91aa3120 in -[NSNotificationCenter postNotificationName:object:userInfo:]
> #5 0x91aa3dae in __NSFinalizeThreadData
> #6 0x98677a1d in _pthread_tsd_cleanup
> #7 0x986775ca in _pthread_exit
> #8 0x91aae984 in +[NSThread exit]
> #9 0x91aae92c in __NSThread__main__
> #10 0x9866ef39 in _pthread_start
> #11 0x9866edbe in thread_start
> 
> Sounds familiar ?
> 
> *** The Monobjc World ***
> 
> In Monobjc, we swizzle the NSObject dealloc method in order to maintain
> consistency between the native and managed worlds. It guarantees that one and
> only one managed wrapper exists for a native object. Without that, we would
> have many wrappers for one native object. I have searched for other ways to do
> that without any success (one day, I hope to find out how to do that
> without messing around with swizzling).
> 
> What we learned from Sledge Ham's stack trace is:
> 1) The thread exits.
> 2) It calls "_pthread_exit" and "_pthread_tsd_cleanup".
> 3) The Mono runtime unregisters the thread (through a hook installed when the
> thread was registered).
> 4) The thread allocates a notification and posts it to the
> NSNotificationCenter.
> 5) Once the notification has been dispatched, the thread deallocs the
> notification object.
> 6) When the dealloc message is sent, Monobjc's native-to-managed
> callback is called.
> 7) The Mono runtime sees that a new thread wants to be registered (remember
> that the thread has been previously removed).
> 8) The Mono runtime now holds a reference to a dead thread (the thread has
> exited right after the notification).
> 9) The Mono runtime crashes when it tries to get the thread's stack trace, as
> it is not valid anymore.
> 
> *** Conclusion ***
> 
> The dealloc swizzling is one of the suspects: if you disable swizzling
> (comment out the ObjectiveCMessage attribute on the NSObject dealloc method),
> there is no crash anymore. So a short-term solution would be to avoid
> method swizzling. But there is still an issue with the use of
> NSThreadWillExitNotification notifications: if an application registers for
> them, they will cross the bridge and trigger the registration of the dead
> thread in the Mono runtime.
> 
> So, there are two options left:
> 1) Remove method swizzling by finding another solution, and prohibit the use
> of thread related notifications.
> 2) Patch the Mono runtime to check that when a foreign thread is added, it
> is a valid one. Help will be greatly appreciated to find a robust way to do
> it (pthread + mach checks).
Laurent,

What about the following approach?

We replace the deregister_foreign method in libgc/pthread_support.c with 
something which marks the thread as about to terminate (just add a bit in the 
flags), but does not deregister it yet. 

Then in darwin_stop_world.c we test whether the TERMINATING flag is set.
If it is set, we call mach_port_get_refs to count the number of 
MACH_PORT_RIGHT_DEAD_NAME references on the thread. If it is > 0, this is the 
signal to deallocate all the reservations for this thread, because it went dead. 
So we call the original deregister_foreign method, because we are now sure that 
the thread really exited. After deallocating all this, we process the next 
entry in the GC threads.
If it is not set, we just do what is already implemented.
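
To make the idea concrete, here is a rough, portable sketch of the mark-then-sweep logic. Everything named sketch_* is made up for illustration and is not the libgc API; the real thread table, flag names and locking differ. The liveness probe is passed in as a callback so the sketch compiles anywhere; on Darwin that callback is where the mach_port_get_refs check on MACH_PORT_RIGHT_DEAD_NAME would go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the real libgc flag names differ. */
#define SKETCH_TERMINATING 0x1u

/* Hypothetical stand-in for one entry of the GC thread table. */
typedef struct sketch_thread {
    struct sketch_thread *next;
    unsigned flags;
    unsigned port;   /* stand-in for the thread's Mach port name */
} sketch_thread;

/* Liveness probe: returns true when the thread behind `port` is gone.
 * Assumption: on Darwin this would wrap the dead-name check. */
typedef bool (*sketch_is_dead_fn)(unsigned port);

/* Toy probe used only for demonstration: treats port 0 as dead. */
bool sketch_toy_probe(unsigned port) { return port == 0u; }

/* Replacement for immediate deregistration: only mark the entry. */
void sketch_mark_terminating(sketch_thread *t) {
    t->flags |= SKETCH_TERMINATING;
}

/* Called when the world is stopped: unlink entries that are both marked
 * and confirmed dead. Returns the number of entries left in the list,
 * i.e. the threads that would still have to be suspended. */
size_t sketch_sweep(sketch_thread **head, sketch_is_dead_fn is_dead) {
    size_t remaining = 0;
    sketch_thread **link = head;

    assert(head != NULL && is_dead != NULL);
    while (*link != NULL) {
        sketch_thread *t = *link;
        if ((t->flags & SKETCH_TERMINATING) && is_dead(t->port)) {
            *link = t->next;   /* real code would release GC resources here */
            t->next = NULL;
        } else {
            remaining++;
            link = &t->next;
        }
    }
    return remaining;
}
```

Note that an entry which is marked but not yet confirmed dead stays in the list and is treated like any other thread until a later sweep.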


With this approach we only clean up data once the thread has really 
exited. There is no possibility of other parts of the thread's cleanup 
occurring later on.
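
For reference, the ordering problem can be reproduced without Mono or Cocoa. POSIX leaves the order in which thread-specific-data destructors run unspecified, so a destructor that calls back into the runtime may run before or after the one that deregisters the thread. The small program fragment below is only an illustration under that assumption; all demo_* names are invented, and the two keys merely stand in for the GC's key and the one holding the object to dealloc.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct {
    int deregistered;     /* times the "GC" destructor ran */
    int cleanup_before;   /* object cleanups run while still registered */
    int cleanup_after;    /* object cleanups run after deregistration */
} tsd_demo_result;

static pthread_key_t demo_gc_key, demo_obj_key;
static int demo_registered;
static tsd_demo_result demo_result;

/* Stands in for the hook that removes the thread from the GC. */
static void demo_gc_destructor(void *unused) {
    (void)unused;
    demo_registered = 0;
    demo_result.deregistered++;
}

/* Stands in for the cleanup that ends up calling into managed code. */
static void demo_obj_destructor(void *unused) {
    (void)unused;
    if (demo_registered)
        demo_result.cleanup_before++;
    else
        demo_result.cleanup_after++;   /* the problematic case */
}

static void *demo_thread_main(void *unused) {
    static int token;   /* any non-NULL value makes the destructors run */
    (void)unused;
    demo_registered = 1;
    pthread_setspecific(demo_gc_key, &token);
    pthread_setspecific(demo_obj_key, &token);
    return NULL;
}

/* Runs one thread to completion and reports which order was observed. */
tsd_demo_result tsd_demo_run(void) {
    pthread_t tid;
    int rc;

    demo_result = (tsd_demo_result){0, 0, 0};
    rc = pthread_key_create(&demo_gc_key, demo_gc_destructor);
    assert(rc == 0);
    rc = pthread_key_create(&demo_obj_key, demo_obj_destructor);
    assert(rc == 0);
    rc = pthread_create(&tid, NULL, demo_thread_main, NULL);
    assert(rc == 0);
    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    pthread_key_delete(demo_gc_key);
    pthread_key_delete(demo_obj_key);
    return demo_result;
}
```

Whether cleanup_before or cleanup_after ends up at 1 depends on the pthread implementation, which is exactly why deferring the real deregistration until the thread is known to be dead looks safer than relying on destructor order.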

By the way, is Monobjc making some reservations in the thread-specific data, or 
is this Cocoa?

cc




> 
> Any comments ?
> 
> Regards, Laurent Etiemble.
> 
> 2009/10/14 Sledge Ham <[email protected]>
> 
> > > >From: Anthony Bowker <[email protected]>
> > > >Date: Tue 13 Oct 2009 21:41:20 GMT+02:00
> > > >To: "[email protected]" <[email protected]>
> > > >Subject: RE: [[email protected]] Re: [[email protected]]
> > > >Feeback Wanted on Snow Leopard
> > > >Reply-To: "[email protected]" <[email protected]>
> > > >
> > > >I haven't been able to devote much time over the past couple of
> > > >weeks to
> > > >building mono with the patch and testing my app.  My apologies for
> > > >not being
> > > >able to help out more.
> > > >
> > > >I've just noticed Sledge Ham's stack trace in the bug
> > > >(https://bugzilla.novell.com/show_bug.cgi?id=537764) and since the
> > > >notification in the stack is dealloc, I've been reading about
> > > >monobjc's
> > > >method swizzling on dealloc.
> > > >
> > > >How about this hypothesis: In Snow Leopard, there are some foreign
> > > >threads
> > > >created by the ObjectiveC runtime which do some stuff with
> > > >NSObjects, and
> > > >eventually their NSObject:dealloc method is called, this is swizzled
> > > >by
> > > >monobjc which runs the managed NSObjectImposter.dealloc method to
> > > >maintain
> > > >the managed wrapper class instances.  This is a managed method, so
> > > >mono
> > > >kicks into gear and mono_jit_thread_attach and friends are called to
> > > >add the
> > > >thread to the GC_threads-array.  This is a peculiar thread, it goes
> > > >away
> > > >later without notification to the mono runtime, and causes the crash.
> > > >
> > > >It would be cool if we could swizzle for only the threads we care
> > > >about.  Or
> > > >perhaps we should swizzle with native code first...
> > > >
> > Anthony,
> >
> > What about the following hypothesis:
> > In snow leopard there are some foreign threads created by/used by the
> > ObjectiveC runtime which do some stuff with NSObjects.
> > One of the actions is that this thread is registered with mono and
> > mono registers a thread specific data with clean up function.
> >
> >
> > Later some object/function does some things with NS objects.. the
> > runtime now allocates a new NSobject and stores it in some thread
> > specific data with a clean up handler...
> >
> > Now for some reason this thread decides to exit (probably because the
> > app lost focus or so).. Now the pthread calls the pthread_exit()
> > function.
> >
> > One of the tasks of the pthread_exit function is to clean up all
> > thread specific data.. The first thing it does is deregister itself
> > from the GC..
> > The second thing it does is process the other thread specific cleanup
> > handlers, which is the dealloc-function... This is picked up by mono
> > which does not find the thread (it just got cleaned up in the previous
> > step) so it reregisters this thread and does all required stuff
> > (perform the dealloc on the objects). When this clean up has
> > completed, the cleanup handler returns and the thread thinks it
> > cleaned up everything..
> >
> > but... wait.. the dealloc caused a reregistration of the thread.. and
> > the thread exited.. We have an inconsistency between the threads which
> > are still alive (according to the processor/OS) and the threads which
> > exist according to mono's gc..
> >
> > The problem is that to my knowledge pthreads does not specify an order
> > in which thread-specific cleanup is performed... To my feeling we
> > should find a way to make sure we do not reregister threads when they
> > are being cleaned up. As pthreads does not guarantee any order we are
> > kinda screwed...
> >
> >
> > Ideas I have, but which are probably not ok:
> >
> > 1) When the thread is deregistered we call ourselves
> > pthread_tsd_cleanup...
> > This will make sure all other thread specific data is gone before we
> > return from the deregister functions... I am not sure if this will
> > work however, because we would have a nested
> > pthread_tsd_cleanup().. Chances are very high that the outer
> > pthread_tsd_cleanup will crash because data structures have changed..
> >
> > 2) We try to implement something which makes sure that we do not
> > reregister a thread which is quitting.. how?
> > a) we add a new thread specific data with clean up and we 'hope' that
> > it gets called again after the other objects are cleaned.. This is also
> > very dependent on the pthread implementation, so it is probably not a
> > good way
> >
> > b) while deregistering we add a new tsd which says that the thread is
> > quitting... before registering a thread we check that this tsd does
> > not exist.. If it does we do not reregister the thread. problem is
> > that this new tsd might be cleaned up before the other objects and we
> > run into the same issue..
> >
> >
> > I think that without good insight into pthread_tsd_cleanup() we
> > can only make guesses about what is happening and will probably
> > introduce a lot of new issues ;-(
> >
> >
> > Maybe we should think in a completely different direction... Instead of
> > deleting the thread when the deregister function is called, we just
> > mark the thread stale (like Laurent proposed) and we record the time
> > at which the thread went stale... Then at a later time we remove all
> > threads which were stale for at least x (10?) seconds or
> > so... Question I have is.. what if after 5 seconds a new thread is
> > fired with the same threadid.. we will probably again run into an
> > inconsistent state...
> >
> > Another idea .. which is a radical change.. Instead of trying to
> > register the threads, can't we just ask mach which threads are alive
> > and suspend all these threads when the GC runs? this way we do not
> > rely on any pthreads mechanism...
> >
> > Other ideas are welcome...
> >
> > Sledge
> >
> >
> > > >I realize this is probably the wrong place to be asking, but does
> > > >anyone
> > > >know if the other mono-objectiveC bridges have the same problem?
> > > >
> > > >I'm very new to understanding swizzling and the mono GC, so if what
> > > >I've
> > > >written above sounds like nonsense, it probably is :-)
> > > >
> > > >Many thanks for everyone's efforts here.
> > > >
> > > >Anthony
> > >
> >

