Hello, Your approach seems ok as we need a reliable way to remove dead thread. Note that we also must be sure that the CG_Thread entry is removed as well (and I suspect that some debugger entries have to be also removed). I did some tests when trying to avoid the memory leak after marking, but it seems that I hit a deadlock somewhere that prevent the removal of the CG_Thread entry. Maybe you will be luckier than me...
Monobjc is not changing anything in the TSD. It only swizzles the dealloc method of the NSObject class. So I guess this is between Objective-C and Mono... Regards, Laurent Etiemble. 2009/10/20 Sledge Ham <mahegd...@telenet.be>: > On Sun, Oct 18, 2009 at 09:53:18AM +0200, Laurent Etiemble wrote: >> Hello, >> I think Sledge Ham described correctly the issue. Let's analyse what going >> on the native and Monobjc worlds. >> >> *** The Native World *** >> >> Here are the results of my experiments (i have attached a sample native >> application that exhibit the behavior) >> 1) The thread exits >> 2) In order to clean-up, it calls "_pthread_exit" and "_pthread_tsd_cleanup" >> 3) And after, it posts a NSThreadWillExitNotification notification to inform >> observers that it has exited. >> >> If you register for NSThreadWillExitNotification notifications and if you >> put a break point in the callback method, you will see the following >> stacktrace: >> >> #0 0x00001d92 in -[Controller threadExitNotification:] at Controller.m:37 >> #1 0x91aae253 in _nsnote_callback >> #2 0x9833dc29 in __CFXNotificationPost >> #3 0x9833d65a in _CFXNotificationPostNotification >> #4 0x91aa3120 in -[NSNotificationCenter >> postNotificationName:object:userInfo:] >> #5 0x91aa3dae in __NSFinalizeThreadData >> #6 0x98677a1d in _pthread_tsd_cleanup >> #7 0x986775ca in _pthread_exit >> #8 0x91aae984 in +[NSThread exit] >> #9 0x91aae92c in __NSThread__main__ >> #10 0x9866ef39 in _pthread_start >> #11 0x9866edbe in thread_start >> >> Sounds familiar ? >> >> *** The Monobjc World *** >> >> In Monobjc, we swizzle the NSObject dealloc method in order to maintain a >> consisency between native and managed world. It guarantees you that one and >> only one managed wrapper exists for a native object. Without that, we will >> have many wrappers for one native object. I have searched other ways to do >> that without any success (One day, I wish I could find how to do that >> without messing around with swizzling). >> >> What we learned from the Sledge Ham's stack trace is: >> 1) The thread exits >> 2) It calls "_pthread_exit" and "_pthread_tsd_cleanup". >> 3) The Mono runtime unregister the thread (through a hook installed when the >> thread has registered) >> 4) The thread allocated a notification and posts it to the >> NSNotificationCenter >> 5) One the notification has been dispatched, the thread deallocs the >> notification object. >> 6) When the dealloc message is sent, the Monobjc's native-to-managed >> callback is called. >> 7) The Mono runtime see that a new thread wants to be registered (remember >> that the thread has been previously removed) >> 8) The Mono runtime has a reference on a dead thread (the thread has exited >> right after the notification). >> 9) The Mono runtime crashes when it tries to get the thread's stack trace as >> it is not valid anymore. >> >> *** Conclusion *** >> >> The dealloc swizzling is one of the suspects: if you disabled swizzling >> (comment the ObjectiveCMessage attribute on the NSObject dealloc method), >> there is no crash anymore. So an short term solution would be to avoid >> method swizzling. But there is still a situation regarding the use of >> NSThreadWillExitNotification notifications as they will cross the bridge and >> trigger the registration of the dead thread in the Mono runtime, if an >> application registers for them. >> >> So, there are two options left: >> 1) Remove method swizzling by finding another solution, and prohibit the use >> of thread related notifications. >> 2) Patch the Mono runtime to check that when a foreign thread is added, it >> is a valid one. Help will be greatly appreciated to find a robust way to do >> it (pthread + mach checks). > Laurent, > > What about the following approach? > > We replace the deregister_foreign method in libgc/pthread_support.c with > something which marks the thread as about to terminate (just add a bit in the > flag), but do not terminate it yet. > > Then in the darwin_stop_world.c we test if the TERMINATING-flag is set... > If it is set, we call the mach_port_get_refs to count the number of > MACH_PORT_RIGHT_DEAD_NAME on the thread. If it is > 0 then this is the signal > to deallocate all the reservations for this thread, because it went dead. So > we call the orignal deregister_foreign method, because we are now sure that > the thread really exited . After deallocating all this stuff we process the > next entry in the GC-threads... > If it is not set, we just do the same as what is already implemented. > > > This approach describes a way where we only cleanup data when the thread > really exited...There is no possibility to have other parts of > 'garbage'/thread clean up to occur later on. > > btw... is monobjc doing some reservations in the thread specific data or is > this cocoa? > > cc > > > > >> >> Any comments ? >> >> Regards, Laurent Etiemble. >> >> 2009/10/14 Sledge Ham <mahegd...@telenet.be> >> >> > > >From: Anthony Bowker <anth...@flowol.com> >> > > >Date: Tue 13 Oct 2009 21:41:20 GMT+02:00 >> > > >To: "users@lists.monobjc.net" <users@lists.monobjc.net> >> > > >Subject: RE: [us...@lists.monobjc.net] Re: [us...@lists.monobjc.net] >> > > >Feeback Wanted on Snow Leopard >> > > >Reply-To: "users@lists.monobjc.net" <users@lists.monobjc.net> >> > > > >> > > >I haven't been able to devote much time over the past couple of >> > > >weeks to >> > > >building mono with the patch and testing my app. My apologies for >> > > >not being >> > > >able to help out more. >> > > > >> > > >I've just noticed Sledge Ham's stack trace in the bug >> > > >(https://bugzilla.novell.com/show_bug.cgi?id=537764) and since the >> > > >notification in the stack is dealloc, I've been reading about >> > > >monobjc's >> > > >method swizzling on dealloc. >> > > > >> > > >How about this hypothesis: In Snow Leopard, there are some foreign >> > > >threads >> > > >created by the ObjectiveC runtime which do some stuff with >> > > >NSObjects, and >> > > >eventually their NSObject:dealloc method is called, this is swizzled >> > > >by >> > > >monobjc which runs the managed NSObjectImposter.dealloc method to >> > > >maintain >> > > >the managed wrapper class instances. This is a managed method, so >> > > >mono >> > > >kicks into gear and mono_jit_thread_attach and friends are called to >> > > >add the >> > > >thread to the GC_threads-array. This is a peculiar thread, it goes >> > > >away >> > > >later without notification to the mono runtime, and causes the crash. >> > > > >> > > >It would be cool if we could swizzle for only the threads we care >> > > >about. Or >> > > >perhaps we should swizzle with native code first... >> > > > >> > Anthony, >> > >> > What about the following hypothesis: >> > In snow leopard there are some foreign threads created by/used by the >> > ObjectiveC runtime which do some stuff with NSObjects. >> > One of the actions is that this thread is registered with mono and >> > mono registers a thread specific data with clean up function. >> > >> > >> > Later some object/function does some things with NS objects.. the >> > runtime now allocates a new NSobject and stores it in some thread >> > specific data with a clean up handler... >> > >> > Now for some reason this thread decides to exit (probably because the >> > app lost focus or so).. Now the pthread calls the pthread_exit() >> > function. >> > >> > One of the tasks of the pthread_exit function is to clean up all >> > thread specific data.. The first thing it does is deregister itself >> > from the GC.. >> > The second thing it does, is proces the other thread specific cleanup >> > handlers, which is the dealloc-function... This is picked up by mono >> > which does not find the thread (it just go cleaned up in the previous >> > step) so it reregisters this thread and does all required stuff >> > (perform the dealloc on the objects). When this clean up has >> > completed, the cleanup handler returns and the thread thinks it >> > cleaned up everything.. >> > >> > but... wait.. the dealloc cause a reregistration of the thread.. and >> > the thread exited.. We have an inconsistecy between the threads which >> > are still alive (accroding to the processor/OS) and which threads do >> > exists according to mono's gc.. >> > >> > The problem is that to my knowledge pthreads do not specify an order >> > in which thread-specific cleanup is implemented... To my feeling we >> > should find a way to make sure we do not reregister threads when they >> > are being cleaned up. as pthreads do not guarantee any order we are >> > kinda screwed... >> > >> > >> > Ideas I have, but which are probably not ok: >> > >> > 1) When the thread is deregistered we call ourselves >> > pthread_tsd_cleanup... >> > This will make sure all other thread specific data is gone before we >> > return from the deregister functions... I am not sure if this will >> > work however, because we would have a nested >> > pthread_tsd_cleanup().. Chances are very high that the outer >> > pthread_tsd_cleanup will crash because data structures have changed.. >> > >> > 2) We try to implement something which makes sure that we do not >> > reregister a thread which is quiting.. how? >> > a) we add a new thread >> > specific data with clean up and we 'hope' that it gets called again >> > after the other objects are cleaned.. This is also very dependent on >> > the pthread implementation, so it is probably not a good way >> > >> > b) while deregistering we add a new tsd which says that the thread is >> > quitting... before registering a function we check that this tsd does >> > not exist.. If it does we do not reregister the thread. problem is >> > that this new tsd might be cleanedup before the other objects and we >> > run into the same issue.. >> > >> > >> > I think that without a good insight of the pthread_tsd_cleanup() we >> > can make guesses on what is happening and probably introduce a lot of >> > new issues ;-( >> > >> > >> > Maybe we should think in a completely other direction... Instead of >> > deleting the thread when the deregister function is called, we just >> > mark the thread stale (like laurent proposed) and we register the time >> > at which the thread went stale... Then at a later time we remove all >> > threads which were stale for at least x (10?) seconds or >> > so... Question I have is.. what if after 5 seconds a new thread is >> > fired with the same threadid.. we will probably again run into an >> > inconsistent state... >> > >> > Another idea .. which is a radical change.. Instead of trying to >> > register the threads, can't we just ask mach which threads are alive >> > and suspend all these threads when the GC runs? this way we do not >> > rely on any pthreads mechanism... >> > >> > Other ideas are welcome... >> > >> > Sledge >> > >> > >> > > >I realize this is probably the wrong place to be asking, but does >> > > >anyone >> > > >know if the other mono-objectiveC bridges have the same problem? >> > > > >> > > >I'm very new to understanding swizzling and the mono GC, so if what >> > > >I've >> > > >written above sounds like nonsense, it probably is :-) >> > > > >> > > >Many thanks for everyone's efforts here. >> > > > >> > > >Anthony >> > > >> > > > >