Thanks Volker!

David

On 8/08/2016 11:35 PM, Volker Simonis wrote:
Hi David,

looks good now.

Thanks,
Volker


On Fri, Aug 5, 2016 at 4:28 AM, David Holmes <david.hol...@oracle.com> wrote:
Hi Volker,

Thanks for looking at this.

On 5/08/2016 1:48 AM, Volker Simonis wrote:

Hi David,

thanks for doing this change on all platforms.
The fix looks good. Maybe you can just extend the following comment with
something like:

 //  Note that the SR_lock plays no role in this suspend/resume protocol.
 //  It is only used in SR_handler as a thread termination indicator if
NULL.


Darn this code is confusing - too many "SR"'s :( I have added

//  Note that the SR_lock plays no role in this suspend/resume protocol,
//  but is checked for NULL in SR_handler as a thread termination indicator.

Updated webrev:

http://cr.openjdk.java.net/~dholmes/8159461/webrev.v2/

This also reminded me to follow up on why the Solaris SR_handler is
different and I found it is not actually installed as a direct signal
handler, but is called from the real signal handler if dealing with a
JavaThread or the VMThread. Consequently the Solaris version of the
SR_handler can not encounter this specific bug and so I have reverted the
changes to os_solaris.cpp

Thanks,
David


Regards,
Volker

On Wed, Aug 3, 2016 at 3:13 AM, David Holmes <david.hol...@oracle.com
<mailto:david.hol...@oracle.com>> wrote:

    webrev: http://cr.openjdk.java.net/~dholmes/8159461/webrev/
    <http://cr.openjdk.java.net/~dholmes/8159461/webrev/>

    bug: https://bugs.openjdk.java.net/browse/JDK-8159461
    <https://bugs.openjdk.java.net/browse/JDK-8159461>

    The suspend/resume signal (SR_signum) is never sent to a thread once
    it has started to terminate. On one platform (SuSE 12) we have seen
    what appears to be a "stuck" signal, which is only delivered when
    the terminating thread restores its original signal mask (as if
    pthread_sigmask makes the system realize there is a pending signal -
    we already check the signal was not blocked). At this point in the
    thread termination we have freed the osthread, so the the SR_handler
    would access deallocated memory. In debug builds we first hit an
    assertion that the current thread is a JavaThread or the VMThread -
    that assertion fails, even though it is a JavaThread, because we
    have already executed the ~JavaThread destructor and inside the
    ~Thread destructor we are a plain Thread not a JavaThread.

    The fix was to make a small adjustment to the thread termination
    process so that we delete the SR_lock before calling
    os::free_thread(). In the SR_handler() we can then use a NULL check
    of SR_lock() to indicate the thread has terminated and we return.

    While only seen on Linux I took the opportunity to apply the fix on
    all platforms and also cleaned up the code where we were using
    Thread::current() unsafely in a signal-handling context.

    Testing: regular tier 1 (JPRT)
             Kitchensink (in progress)

    As we can't readily reproduce the problem I tested this by having a
    terminating thread raise SR_signum directly from within the ~Thread
    destructor.

    Thanks,
    David



Reply via email to