Re: [[email protected]] Re: Snow Leopard: mono hangs + threading.

Sledge Ham Tue, 20 Oct 2009 11:35:05 -0700

On Tue, Oct 20, 2009 at 02:09:11PM +0200, Laurent Etiemble wrote:
> Hello,
> 
> Your approach seems ok as we need a reliable way to remove dead
> thread. Note that we also must be sure that the CG_Thread entry is
> removed as well (and I suspect that some debugger entries have to be
> also removed). I did some tests when trying to avoid the memory leak
> after marking, but it seems that I hit a deadlock somewhere that
> prevent the removal of the CG_Thread entry. Maybe you will be luckier
> than me...
It seems to work for me...You should take into acount that you must not try to 
get a lock in , as you are already in a locked situation :
#0  confirm_delete (me=0x613330) at pthread_support.c:287
#1  0x002164cb in GC_push_all_stacks () at darwin_stop_world.c:281
#2  0x002219d3 in GC_default_push_other_roots () at os_dep.c:2078
#3  0x0021f7b1 in GC_push_roots (all=1, cold_gc_frame=0xbfffea0c "??!") at 
mark_rts.c:646
#4  0x0021cbb5 in GC_mark_some (cold_gc_frame=0xbfffea0c "??!") at mark.c:326
#5  0x002149b4 in GC_stopped_mark (stop_func=0x213d1b <GC_never_stop_func>) at 
alloc.c:543
#6  0x002144c4 in GC_try_to_collect_inner (stop_func=0x213d1b 
<GC_never_stop_func>) at alloc.c:382
#7  0x002150a8 in GC_try_to_collect (stop_func=0x213d1b <GC_never_stop_func>) 
at alloc.c:809
#8  0x00215112 in GC_gcollect () at alloc.c:822
#9  0x000fbd74 in mono_gc_collect (generation=0) at boehm-gc.c:113
#10 0x0011e0a7 in ves_icall_System_GC_InternalCollect (generation=0) at gc.c:380
#11 0x1783d2e7 in ?? ()
#12 0x1783d254 in ?? ()
#13 0x007eb92f in ?? ()
#14 0x007eb88e in ?? ()
#15 0x007e3c39 in ?? ()
#16 0x936d514a in -[NSApplication sendAction:to:from:] ()
#17 0x937b4891 in -[NSControl sendAction:to:] ()
#18 0x937b033e in -[NSCell _sendActionFrom:] ()


And GC_try_to_collect (stop_func=0x213d1b <GC_never_stop_func>) at alloc.c:809
 reads:
804         LOCK();
805         ENTER_GC();
806         if (!GC_is_initialized) GC_init_inner();
807         /* Minimize junk left in my registers */
808           GC_noop(0,0,0,0,0,0);
809         result = (int)GC_try_to_collect_inner(stop_func);
810         EXIT_GC();
811         UNLOCK();
812         ENABLE_SIGNALS();


So no need to get a lock anymore...



changes in libgc/pthread_support.c
void GC_thread_deregister_foreign (void *data)
{
    GC_thread me = (GC_thread)data;
 /*   GC_fprintf1( "\n\n\n\n --- Deregister %x ---\n\n\n\n\n", me->flags ); */
    if (me -> flags & FOREIGN_THREAD) {
#ifdef GC_DARWIN_THREADS
        /* instead of deregistering this thread, just mark it to go dead..
                as of snow leopard other message could be delivered on this 
thread before it is really finalized*/
        LOCK();
        printf("terminbating %p %p %p\n", me->stop_info.mach_thread, me, 
me->id);
        me->flags |= TERMINATING;
        UNLOCK();
        return;
#endif
        LOCK();
 /*     GC_fprintf0( "\n\n\n\n --- FOO ---\n\n\n\n\n" ); */
#if defined(THREAD_LOCAL_ALLOC) && !defined(DBG_HDRS_ALL)
        GC_destroy_thread_local (me);
#endif
        GC_delete_gc_thread(me->id, me);
        UNLOCK();
    } else {
   printf("not releaing...\n");
    }
}

void confirm_delete(GC_thread me)
{
  /* we should be in a LOCK */
        printf( "*** DELETING THREAD ****\n" );
        printf("deleting %p %p %p\n", me->stop_info.mach_thread, me, me->id);
#if defined(THREAD_LOCAL_ALLOC) && !defined(DBG_HDRS_ALL)
        GC_destroy_thread_local (me);
#endif
        GC_delete_gc_thread(me->id, me);

}


The dorwin_stop_world.c reads like:
  for(i=0;i<THREAD_TABLE_SZ;i++) {
    for(p=GC_threads[i];p!=0;p=p->next) {
      if(p -> flags & FINISHED) continue;
      if (p -> flags & TERMINATING) {
         //Verify if the thread is really dead
          mach_port_urefs_t refs=0;
          kern_return_t r;
          r=mach_port_get_refs(mach_task_self(), p->stop_info.mach_thread, 
MACH_PORT_RIGHT_DEAD_NAME, &refs);
          if (r != KERN_SUCCESS) ABORT("Could not get count on TERMINATING 
thread");
          if (refs > 0) {
            confirm_delete(p);
          }
          continue;
      }
...


So the confirm_delete is called when the gc detects a dead_name exists.

Now I do not have a crash on the thread created by the pthread_tsd_cleanup 
handler, but I hit the next problem.
Now I get a crash because start_wqthead/_pthread_wqthread register again a 
thread which is exited without calling the thread_deregister_foreign

#0  GC_thread_register_foreign (base_addr=0xb0184e64) at pthread_support.c:1369
#1  0x000fbe4b in mono_gc_register_thread (baseptr=0xb0184e64) at boehm-gc.c:218
#2  0x001af879 in mono_thread_attach (domain=0x616e58) at threads.c:813
#3  0x0000653a in mono_jit_thread_attach (domain=0x616e58) at mini.c:2128
#4  0x00779295 in ?? ()
#5  0x91da32c7 in -[NSOperation dealloc] ()
#6  0x91daa669 in -[NSBlockOperation dealloc] ()
#7  0x91da31da in __release_operation ()
#8  0x933db97e in _dispatch_worker_thread2 ()
#9  0x933db401 in _pthread_wqthread ()
#10 0x933db246 in start_wqthread ()

Does anyone know what start_wqthread/_pthread_wqthread is used for and if we 
can get a notification if it exits?

cc


> 
> Monobjc is not changing anything in the TSD. It only swizzles the
> dealloc method of the NSObject class. So I guess this is between
> Objective-C and Mono...
> 
> Regards, Laurent Etiemble.
> 
> 2009/10/20 Sledge Ham <[email protected]>:
> > On Sun, Oct 18, 2009 at 09:53:18AM +0200, Laurent Etiemble wrote:
> >> Hello,
> >> I think Sledge Ham described correctly the issue. Let's analyse what going
> >> on the native and Monobjc worlds.
> >>
> >> *** The Native World ***
> >>
> >> Here are the results of my experiments (i have attached a sample native
> >> application that exhibit the behavior)
> >> 1) The thread exits
> >> 2) In order to clean-up, it calls "_pthread_exit" and 
> >> "_pthread_tsd_cleanup"
> >> 3) And after, it posts a NSThreadWillExitNotification notification to 
> >> inform
> >> observers that it has exited.
> >>
> >> If you register for NSThreadWillExitNotification notifications and if you
> >> put a break point in the callback method, you will see the following
> >> stacktrace:
> >>
> >> #0 0x00001d92 in -[Controller threadExitNotification:] at Controller.m:37
> >> #1 0x91aae253 in _nsnote_callback
> >> #2 0x9833dc29 in __CFXNotificationPost
> >> #3 0x9833d65a in _CFXNotificationPostNotification
> >> #4 0x91aa3120 in -[NSNotificationCenter
> >> postNotificationName:object:userInfo:]
> >> #5 0x91aa3dae in __NSFinalizeThreadData
> >> #6 0x98677a1d in _pthread_tsd_cleanup
> >> #7 0x986775ca in _pthread_exit
> >> #8 0x91aae984 in +[NSThread exit]
> >> #9 0x91aae92c in __NSThread__main__
> >> #10 0x9866ef39 in _pthread_start
> >> #11 0x9866edbe in thread_start
> >>
> >> Sounds familiar ?
> >>
> >> *** The Monobjc World ***
> >>
> >> In Monobjc, we swizzle the NSObject dealloc method in order to maintain a
> >> consisency between native and managed world. It guarantees you that one and
> >> only one managed wrapper exists for a native object. Without that, we will
> >> have many wrappers for one native object. I have searched other ways to do
> >> that without any success (One day, I wish I could find how to do that
> >> without messing around with swizzling).
> >>
> >> What we learned from the Sledge Ham's stack trace is:
> >> 1) The thread exits
> >> 2) It calls "_pthread_exit" and "_pthread_tsd_cleanup".
> >> 3) The Mono runtime unregister the thread (through a hook installed when 
> >> the
> >> thread has registered)
> >> 4) The thread allocated a notification and posts it to the
> >> NSNotificationCenter
> >> 5) One the notification has been dispatched, the thread deallocs the
> >> notification object.
> >> 6) When the dealloc message is sent, the Monobjc's native-to-managed
> >> callback is called.
> >> 7) The Mono runtime see that a new thread wants to be registered (remember
> >> that the thread has been previously removed)
> >> 8) The Mono runtime has a reference on a dead thread (the thread has exited
> >> right after the notification).
> >> 9) The Mono runtime crashes when it tries to get the thread's stack trace 
> >> as
> >> it is not valid anymore.
> >>
> >> *** Conclusion ***
> >>
> >> The dealloc swizzling is one of the suspects: if you disabled swizzling
> >> (comment the ObjectiveCMessage attribute on the NSObject dealloc method),
> >> there is no crash anymore. So an short term solution would be to avoid
> >> method swizzling. But there is still a situation regarding the use of
> >> NSThreadWillExitNotification notifications as they will cross the bridge 
> >> and
> >> trigger the registration of the dead thread in the Mono runtime, if an
> >> application registers for them.
> >>
> >> So, there are two options left:
> >> 1) Remove method swizzling by finding another solution, and prohibit the 
> >> use
> >> of thread related notifications.
> >> 2) Patch the Mono runtime to check that when a foreign thread is added, it
> >> is a valid one. Help will be greatly appreciated to find a robust way to do
> >> it (pthread + mach checks).
> > Laurent,
> >
> > What about the following approach?
> >
> > We replace the deregister_foreign method in libgc/pthread_support.c with 
> > something which marks the thread as about to terminate (just add a bit in 
> > the flag), but do not terminate it yet.
> >
> > Then in the darwin_stop_world.c we test if the TERMINATING-flag is set...
> > If it is set, we call the mach_port_get_refs to count the number of 
> > MACH_PORT_RIGHT_DEAD_NAME on the thread. If it is > 0 then this is the 
> > signal to deallocate all the reservations for this thread, because it went 
> > dead. So we call the orignal deregister_foreign method, because we are now 
> > sure that the thread really exited . After deallocating all this stuff we 
> > process the next entry in the GC-threads...
> > If it is not set, we just do the same as what is already implemented.
> >
> >
> > This approach describes a way where we only cleanup data when the thread 
> > really exited...There is no possibility to have other parts of 
> > 'garbage'/thread clean up to occur later on.
> >
> > btw... is monobjc doing some reservations in the thread specific data or is 
> > this cocoa?
> >
> > cc
> >
> >
> >
> >
> >>
> >> Any comments ?
> >>
> >> Regards, Laurent Etiemble.
> >>
> >> 2009/10/14 Sledge Ham <[email protected]>
> >>
> >> > > >From: Anthony Bowker <[email protected]>
> >> > > >Date: Tue 13 Oct 2009 21:41:20 GMT+02:00
> >> > > >To: "[email protected]" <[email protected]>
> >> > > >Subject: RE: [[email protected]] Re: [[email protected]]
> >> > > >Feeback Wanted on Snow Leopard
> >> > > >Reply-To: "[email protected]" <[email protected]>
> >> > > >
> >> > > >I haven't been able to devote much time over the past couple of
> >> > > >weeks to
> >> > > >building mono with the patch and testing my app.  My apologies for
> >> > > >not being
> >> > > >able to help out more.
> >> > > >
> >> > > >I've just noticed Sledge Ham's stack trace in the bug
> >> > > >(https://bugzilla.novell.com/show_bug.cgi?id=537764) and since the
> >> > > >notification in the stack is dealloc, I've been reading about
> >> > > >monobjc's
> >> > > >method swizzling on dealloc.
> >> > > >
> >> > > >How about this hypothesis: In Snow Leopard, there are some foreign
> >> > > >threads
> >> > > >created by the ObjectiveC runtime which do some stuff with
> >> > > >NSObjects, and
> >> > > >eventually their NSObject:dealloc method is called, this is swizzled
> >> > > >by
> >> > > >monobjc which runs the managed NSObjectImposter.dealloc method to
> >> > > >maintain
> >> > > >the managed wrapper class instances.  This is a managed method, so
> >> > > >mono
> >> > > >kicks into gear and mono_jit_thread_attach and friends are called to
> >> > > >add the
> >> > > >thread to the GC_threads-array.  This is a peculiar thread, it goes
> >> > > >away
> >> > > >later without notification to the mono runtime, and causes the crash.
> >> > > >
> >> > > >It would be cool if we could swizzle for only the threads we care
> >> > > >about.  Or
> >> > > >perhaps we should swizzle with native code first...
> >> > > >
> >> > Anthony,
> >> >
> >> > What about the following hypothesis:
> >> > In snow leopard there are some foreign threads created by/used by the
> >> > ObjectiveC runtime which do some stuff with NSObjects.
> >> > One of the actions is that this thread is registered with mono and
> >> > mono registers a thread specific data with clean up function.
> >> >
> >> >
> >> > Later some object/function does some things with NS objects.. the
> >> > runtime now allocates a new NSobject and stores it in some thread
> >> > specific data with a clean up handler...
> >> >
> >> > Now for some reason this thread decides to exit (probably because the
> >> > app lost focus or so).. Now the pthread calls the pthread_exit()
> >> > function.
> >> >
> >> > One of the tasks of the pthread_exit function is to clean up all
> >> > thread specific data.. The first thing it does is deregister itself
> >> > from the GC..
> >> > The second thing it does, is proces the other thread specific cleanup
> >> > handlers, which is the dealloc-function... This is picked up by mono
> >> > which does not find the thread (it just go cleaned up in the previous
> >> > step) so it reregisters this thread and does all required stuff
> >> > (perform the dealloc on the objects). When this clean up has
> >> > completed, the cleanup handler returns and the thread thinks it
> >> > cleaned up everything..
> >> >
> >> > but... wait.. the dealloc cause a reregistration of the thread.. and
> >> > the thread exited.. We have an inconsistecy between the threads which
> >> > are still alive (accroding to the processor/OS) and which threads do
> >> > exists according to mono's gc..
> >> >
> >> > The problem is that to my knowledge pthreads do not specify an order
> >> > in which thread-specific cleanup is implemented... To my feeling we
> >> > should find a way to make sure we do not reregister threads when they
> >> > are being cleaned up. as pthreads do not guarantee any order we are
> >> > kinda screwed...
> >> >
> >> >
> >> > Ideas I have, but which are probably not ok:
> >> >
> >> > 1) When the thread is deregistered we call ourselves
> >> > pthread_tsd_cleanup...
> >> > This will make sure all other thread specific data is gone before we
> >> > return from the deregister functions... I am not sure if this will
> >> > work however, because we would have a nested
> >> > pthread_tsd_cleanup().. Chances are very high that the outer
> >> > pthread_tsd_cleanup will crash because data structures have changed..
> >> >
> >> > 2) We try to implement something which makes sure that we do not
> >> > reregister a thread which is quiting.. how?
> >> > a) we add a new thread
> >> > specific data with clean up and we 'hope' that it gets called again
> >> > after the other objects are cleaned.. This is also very dependent on
> >> > the pthread implementation, so it is probably not a good way
> >> >
> >> > b) while deregistering we add a new tsd which says that the thread is
> >> > quitting... before registering a function we check that this tsd does
> >> > not exist.. If it does we do not reregister the thread. problem is
> >> > that this new tsd might be cleanedup before the other objects and we
> >> > run into the same issue..
> >> >
> >> >
> >> > I think that without a good insight of the pthread_tsd_cleanup() we
> >> > can make guesses on what is happening and probably introduce a lot of
> >> > new issues ;-(
> >> >
> >> >
> >> > Maybe we should think in a completely other direction... Instead of
> >> > deleting the thread when the deregister function is called, we just
> >> > mark the thread stale (like laurent proposed) and we register the time
> >> > at which the thread went stale... Then at a later time we remove all
> >> > threads which were stale for at least x (10?) seconds or
> >> > so... Question I have is.. what if after 5 seconds a new thread is
> >> > fired with the same threadid.. we will probably again run into an
> >> > inconsistent state...
> >> >
> >> > Another idea .. which is a radical change.. Instead of trying to
> >> > register the threads, can't we just ask mach which threads are alive
> >> > and suspend all these threads when the GC runs? this way we do not
> >> > rely on any pthreads mechanism...
> >> >
> >> > Other ideas are welcome...
> >> >
> >> > Sledge
> >> >
> >> >
> >> > > >I realize this is probably the wrong place to be asking, but does
> >> > > >anyone
> >> > > >know if the other mono-objectiveC bridges have the same problem?
> >> > > >
> >> > > >I'm very new to understanding swizzling and the mono GC, so if what
> >> > > >I've
> >> > > >written above sounds like nonsense, it probably is :-)
> >> > > >
> >> > > >Many thanks for everyone's efforts here.
> >> > > >
> >> > > >Anthony
> >> > >
> >> >
> >
> >
> >
>

Re: [[email protected]] Re: Snow Leopard: mono hangs + threading.

Reply via email to