Hi David, thanks for doing this change on all platforms. The fix looks good. Maybe you can just extend the following comment with something like:
// Note that the SR_lock plays no role in this suspend/resume protocol. // It is only used in SR_handler as a thread termination indicator if NULL. Regards, Volker On Wed, Aug 3, 2016 at 3:13 AM, David Holmes <david.hol...@oracle.com> wrote: > webrev: http://cr.openjdk.java.net/~dholmes/8159461/webrev/ > > bug: https://bugs.openjdk.java.net/browse/JDK-8159461 > > The suspend/resume signal (SR_signum) is never sent to a thread once it > has started to terminate. On one platform (SuSE 12) we have seen what > appears to be a "stuck" signal, which is only delivered when the > terminating thread restores its original signal mask (as if pthread_sigmask > makes the system realize there is a pending signal - we already check the > signal was not blocked). At this point in the thread termination we have > freed the osthread, so the the SR_handler would access deallocated memory. > In debug builds we first hit an assertion that the current thread is a > JavaThread or the VMThread - that assertion fails, even though it is a > JavaThread, because we have already executed the ~JavaThread destructor and > inside the ~Thread destructor we are a plain Thread not a JavaThread. > > The fix was to make a small adjustment to the thread termination process > so that we delete the SR_lock before calling os::free_thread(). In the > SR_handler() we can then use a NULL check of SR_lock() to indicate the > thread has terminated and we return. > > While only seen on Linux I took the opportunity to apply the fix on all > platforms and also cleaned up the code where we were using > Thread::current() unsafely in a signal-handling context. > > Testing: regular tier 1 (JPRT) > Kitchensink (in progress) > > As we can't readily reproduce the problem I tested this by having a > terminating thread raise SR_signum directly from within the ~Thread > destructor. > > Thanks, > David >