On Sun, Oct 18, 2009 at 09:53:18AM +0200, Laurent Etiemble wrote: > Hello, > I think Sledge Ham described correctly the issue. Let's analyse what going > on the native and Monobjc worlds. > > *** The Native World *** > > Here are the results of my experiments (i have attached a sample native > application that exhibit the behavior) > 1) The thread exits > 2) In order to clean-up, it calls "_pthread_exit" and "_pthread_tsd_cleanup" > 3) And after, it posts a NSThreadWillExitNotification notification to inform > observers that it has exited. > > If you register for NSThreadWillExitNotification notifications and if you > put a break point in the callback method, you will see the following > stacktrace: > > #0 0x00001d92 in -[Controller threadExitNotification:] at Controller.m:37 > #1 0x91aae253 in _nsnote_callback > #2 0x9833dc29 in __CFXNotificationPost > #3 0x9833d65a in _CFXNotificationPostNotification > #4 0x91aa3120 in -[NSNotificationCenter > postNotificationName:object:userInfo:] > #5 0x91aa3dae in __NSFinalizeThreadData > #6 0x98677a1d in _pthread_tsd_cleanup > #7 0x986775ca in _pthread_exit > #8 0x91aae984 in +[NSThread exit] > #9 0x91aae92c in __NSThread__main__ > #10 0x9866ef39 in _pthread_start > #11 0x9866edbe in thread_start > > Sounds familiar ? > > *** The Monobjc World *** > > In Monobjc, we swizzle the NSObject dealloc method in order to maintain a > consisency between native and managed world. It guarantees you that one and > only one managed wrapper exists for a native object. Without that, we will > have many wrappers for one native object. I have searched other ways to do > that without any success (One day, I wish I could find how to do that > without messing around with swizzling). > > What we learned from the Sledge Ham's stack trace is: > 1) The thread exits > 2) It calls "_pthread_exit" and "_pthread_tsd_cleanup". > 3) The Mono runtime unregister the thread (through a hook installed when the > thread has registered) > 4) The thread allocated a notification and posts it to the > NSNotificationCenter > 5) One the notification has been dispatched, the thread deallocs the > notification object. > 6) When the dealloc message is sent, the Monobjc's native-to-managed > callback is called. > 7) The Mono runtime see that a new thread wants to be registered (remember > that the thread has been previously removed) > 8) The Mono runtime has a reference on a dead thread (the thread has exited > right after the notification). > 9) The Mono runtime crashes when it tries to get the thread's stack trace as > it is not valid anymore. > > *** Conclusion *** > > The dealloc swizzling is one of the suspects: if you disabled swizzling > (comment the ObjectiveCMessage attribute on the NSObject dealloc method), > there is no crash anymore. So an short term solution would be to avoid > method swizzling. But there is still a situation regarding the use of > NSThreadWillExitNotification notifications as they will cross the bridge and > trigger the registration of the dead thread in the Mono runtime, if an > application registers for them. > > So, there are two options left: > 1) Remove method swizzling by finding another solution, and prohibit the use > of thread related notifications. > 2) Patch the Mono runtime to check that when a foreign thread is added, it > is a valid one. Help will be greatly appreciated to find a robust way to do > it (pthread + mach checks). Laurent,
What about the following approach? We replace the deregister_foreign method in libgc/pthread_support.c with something which marks the thread as about to terminate (just add a bit in the flag), but do not terminate it yet. Then in the darwin_stop_world.c we test if the TERMINATING-flag is set... If it is set, we call the mach_port_get_refs to count the number of MACH_PORT_RIGHT_DEAD_NAME on the thread. If it is > 0 then this is the signal to deallocate all the reservations for this thread, because it went dead. So we call the orignal deregister_foreign method, because we are now sure that the thread really exited . After deallocating all this stuff we process the next entry in the GC-threads... If it is not set, we just do the same as what is already implemented. This approach describes a way where we only cleanup data when the thread really exited...There is no possibility to have other parts of 'garbage'/thread clean up to occur later on. btw... is monobjc doing some reservations in the thread specific data or is this cocoa? cc > > Any comments ? > > Regards, Laurent Etiemble. > > 2009/10/14 Sledge Ham <[email protected]> > > > > >From: Anthony Bowker <[email protected]> > > > >Date: Tue 13 Oct 2009 21:41:20 GMT+02:00 > > > >To: "[email protected]" <[email protected]> > > > >Subject: RE: [[email protected]] Re: [[email protected]] > > > >Feeback Wanted on Snow Leopard > > > >Reply-To: "[email protected]" <[email protected]> > > > > > > > >I haven't been able to devote much time over the past couple of > > > >weeks to > > > >building mono with the patch and testing my app. My apologies for > > > >not being > > > >able to help out more. > > > > > > > >I've just noticed Sledge Ham's stack trace in the bug > > > >(https://bugzilla.novell.com/show_bug.cgi?id=537764) and since the > > > >notification in the stack is dealloc, I've been reading about > > > >monobjc's > > > >method swizzling on dealloc. > > > > > > > >How about this hypothesis: In Snow Leopard, there are some foreign > > > >threads > > > >created by the ObjectiveC runtime which do some stuff with > > > >NSObjects, and > > > >eventually their NSObject:dealloc method is called, this is swizzled > > > >by > > > >monobjc which runs the managed NSObjectImposter.dealloc method to > > > >maintain > > > >the managed wrapper class instances. This is a managed method, so > > > >mono > > > >kicks into gear and mono_jit_thread_attach and friends are called to > > > >add the > > > >thread to the GC_threads-array. This is a peculiar thread, it goes > > > >away > > > >later without notification to the mono runtime, and causes the crash. > > > > > > > >It would be cool if we could swizzle for only the threads we care > > > >about. Or > > > >perhaps we should swizzle with native code first... > > > > > > Anthony, > > > > What about the following hypothesis: > > In snow leopard there are some foreign threads created by/used by the > > ObjectiveC runtime which do some stuff with NSObjects. > > One of the actions is that this thread is registered with mono and > > mono registers a thread specific data with clean up function. > > > > > > Later some object/function does some things with NS objects.. the > > runtime now allocates a new NSobject and stores it in some thread > > specific data with a clean up handler... > > > > Now for some reason this thread decides to exit (probably because the > > app lost focus or so).. Now the pthread calls the pthread_exit() > > function. > > > > One of the tasks of the pthread_exit function is to clean up all > > thread specific data.. The first thing it does is deregister itself > > from the GC.. > > The second thing it does, is proces the other thread specific cleanup > > handlers, which is the dealloc-function... This is picked up by mono > > which does not find the thread (it just go cleaned up in the previous > > step) so it reregisters this thread and does all required stuff > > (perform the dealloc on the objects). When this clean up has > > completed, the cleanup handler returns and the thread thinks it > > cleaned up everything.. > > > > but... wait.. the dealloc cause a reregistration of the thread.. and > > the thread exited.. We have an inconsistecy between the threads which > > are still alive (accroding to the processor/OS) and which threads do > > exists according to mono's gc.. > > > > The problem is that to my knowledge pthreads do not specify an order > > in which thread-specific cleanup is implemented... To my feeling we > > should find a way to make sure we do not reregister threads when they > > are being cleaned up. as pthreads do not guarantee any order we are > > kinda screwed... > > > > > > Ideas I have, but which are probably not ok: > > > > 1) When the thread is deregistered we call ourselves > > pthread_tsd_cleanup... > > This will make sure all other thread specific data is gone before we > > return from the deregister functions... I am not sure if this will > > work however, because we would have a nested > > pthread_tsd_cleanup().. Chances are very high that the outer > > pthread_tsd_cleanup will crash because data structures have changed.. > > > > 2) We try to implement something which makes sure that we do not > > reregister a thread which is quiting.. how? > > a) we add a new thread > > specific data with clean up and we 'hope' that it gets called again > > after the other objects are cleaned.. This is also very dependent on > > the pthread implementation, so it is probably not a good way > > > > b) while deregistering we add a new tsd which says that the thread is > > quitting... before registering a function we check that this tsd does > > not exist.. If it does we do not reregister the thread. problem is > > that this new tsd might be cleanedup before the other objects and we > > run into the same issue.. > > > > > > I think that without a good insight of the pthread_tsd_cleanup() we > > can make guesses on what is happening and probably introduce a lot of > > new issues ;-( > > > > > > Maybe we should think in a completely other direction... Instead of > > deleting the thread when the deregister function is called, we just > > mark the thread stale (like laurent proposed) and we register the time > > at which the thread went stale... Then at a later time we remove all > > threads which were stale for at least x (10?) seconds or > > so... Question I have is.. what if after 5 seconds a new thread is > > fired with the same threadid.. we will probably again run into an > > inconsistent state... > > > > Another idea .. which is a radical change.. Instead of trying to > > register the threads, can't we just ask mach which threads are alive > > and suspend all these threads when the GC runs? this way we do not > > rely on any pthreads mechanism... > > > > Other ideas are welcome... > > > > Sledge > > > > > > > >I realize this is probably the wrong place to be asking, but does > > > >anyone > > > >know if the other mono-objectiveC bridges have the same problem? > > > > > > > >I'm very new to understanding swizzling and the mono GC, so if what > > > >I've > > > >written above sounds like nonsense, it probably is :-) > > > > > > > >Many thanks for everyone's efforts here. > > > > > > > >Anthony > > > > >

