Bug: https://bugs.openjdk.java.net/browse/JDK-8137165
Webrev: http://cr.openjdk.java.net/~dholmes/8137165/webrev/

This isn't a fix per se, but some additional diagnostic code to try to detect the conditions under which the bug might manifest. The basic failure mode was:

# Internal Error (/opt/jprt/T/P1/175841.hseigel/s/hotspot/src/os/linux/vm/os_linux.cpp:3950), pid=27906, tid=13248
# assert(thread->is_VM_thread() || thread->is_Java_thread()) failed: Must be VMThread or JavaThread

with a stack showing in part:

#34 0xf6623ec0 in report_vm_error (
file=0xf71b6140 "/scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/os/linux/vm/os_linux.cpp", line=3901, error_msg=0xf71b62e0 "assert(thread->is_VM_thread() || thread->is_Java_thread()) failed", detail_fmt=0xf71b62c0 "Must be VMThread or JavaThread")
at /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/utilities/debug.cpp:218
#35 0xf6d21b3f in SR_handler (sig=12, siginfo=0xc1b58ccc, context=0xc1b58d4c)
at /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/os/linux/vm/os_linux.cpp:3901
#36 <signal handler called>
#37 0xf776b430 in __kernel_vsyscall ()
#38 0xf773ccef in pthread_sigmask () from /lib/libpthread.so.0
#39 0xf6d23e6c in os::free_thread (osthread=0xc20cf8b8)
at /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/os/linux/vm/os_linux.cpp:879
#40 0xf6f6811d in Thread::~Thread (this=0xc20cd800, __in_chrg=<optimized out>)
at /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/runtime/thread.cpp:367
#41 0xf6f6866f in JavaThread::~JavaThread (this=0xc20cd800,
    __in_chrg=<optimized out>)
at /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/runtime/thread.cpp:1611
#42 0xf6f6877c in JavaThread::~JavaThread (this=0xc20cd800,
    __in_chrg=<optimized out>)
at /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/runtime/thread.cpp:1655
#43 0xf6f74a38 in JavaThread::thread_main_inner (this=0xc20cd800)
at /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/runtime/thread.cpp:1724
#44 0xf6f74e12 in JavaThread::run (this=0xc20cd800)
at /scratch/opt/jprt/T/P1/205457.cphillim/s/hotspot/src/share/vm/runtime/thread.cpp:1698
#45 0xf6d238ec in java_start (thread=0xc20cd800)

What appears to be happening is that the thread has blocked SR_signum (SIGUSR2) at some point (though there is no code that does this), and the signal has become pending on the thread due to the event sampling logic. The thread terminates, executing well into the destructor until it gets to os::free_thread, which restores the original signal mask for the thread. That signal mask has SR_signum unblocked, so the signal is delivered immediately and we enter the SR_handler. For some reason this triggers the assertion failure, though exactly why is unclear, as we have not yet released the thread memory, nor done anything that should invalidate that call. Whatever the reason, the state of the thread causes secondary failures in the error reporting code as well.
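
The delivery-on-unblock behaviour described above can be seen outside the VM. The following is only a minimal standalone sketch, not HotSpot code: demo_handler and pending_signal_delivered_on_unblock are made-up names, and SIGUSR2 simply stands in for SR_signum.

```cpp
#include <csignal>
#include <pthread.h>

// Set by the handler; sig_atomic_t is safe to write from a signal handler.
static volatile sig_atomic_t handler_ran = 0;

static void demo_handler(int) { handler_ran = 1; }

// Hypothetical demo: returns true if SIGUSR2 stayed pending while blocked
// and was delivered as soon as a mask with SIGUSR2 unblocked was restored.
bool pending_signal_delivered_on_unblock() {
  struct sigaction sa = {};
  sigemptyset(&sa.sa_mask);
  sa.sa_handler = demo_handler;
  sigaction(SIGUSR2, &sa, nullptr);
  handler_ran = 0;

  // Block SIGUSR2 and remember the previous mask.
  sigset_t block, saved;
  sigemptyset(&block);
  sigaddset(&block, SIGUSR2);
  pthread_sigmask(SIG_BLOCK, &block, &saved);
  // Make sure the mask we restore later has SIGUSR2 unblocked,
  // mirroring the "original" mask in the failure scenario.
  sigdelset(&saved, SIGUSR2);

  // Send the signal to this thread; it cannot be delivered yet.
  pthread_kill(pthread_self(), SIGUSR2);
  bool deferred = (handler_ran == 0);

  sigset_t pending;
  sigemptyset(&pending);
  sigpending(&pending);
  bool was_pending = (sigismember(&pending, SIGUSR2) == 1);

  // Restoring the mask unblocks the pending signal, so the handler
  // runs before pthread_sigmask returns.
  pthread_sigmask(SIG_SETMASK, &saved, nullptr);
  return deferred && was_pending && handler_ran == 1;
}
```

Calling pending_signal_delivered_on_unblock() from any thread should return true on a POSIX system, which matches the stack above: the handler frame sits directly on top of pthread_sigmask.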

Attempts to reproduce this bug have been unsuccessful (so maybe we had a random memory stomp on the thread state - who knows.)

So what I am doing is simply adding an additional assertion to try and catch, during regular testing, any occurrence of SR_signum being blocked while a thread is terminating.
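
For illustration only, a check of this kind could look roughly like the sketch below. This is not the code in the webrev: is_signal_blocked and check_sr_signal_not_blocked are hypothetical names, and plain assert stands in for the VM's own assertion macros.

```cpp
#include <cassert>
#include <csignal>
#include <pthread.h>

// Hypothetical helper: true if 'sig' is blocked in the calling thread's mask.
bool is_signal_blocked(int sig) {
  sigset_t current;
  sigemptyset(&current);
  // With a null 'set' argument the mask is only queried, not changed.
  int rc = pthread_sigmask(SIG_BLOCK, nullptr, &current);
  assert(rc == 0);
  (void)rc;  // avoid an unused-variable warning when asserts are compiled out
  return sigismember(&current, sig) == 1;
}

// Hypothetical termination-time check standing in for the new assertion.
void check_sr_signal_not_blocked(int sr_signum) {
  assert(!is_signal_blocked(sr_signum) &&
         "SR signal must not be blocked while a thread is terminating");
}
```

A terminating thread would call check_sr_signal_not_blocked(SIGUSR2) before its signal mask is restored, so a blocked SR signal is reported at that point rather than surfacing later inside the handler.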

In addition a couple of minor cleanups in the signal related code:
- strictly speaking SR_handler needs to use Thread::current_or_null_safe() because it needs to use library-based TLS in a signal context.
- sigsets should (POSIX recommendation) be explicitly emptied/filled before being set via pthread_sigmask
- change 0 to NULL in call to pthread_sigmask
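
As a rough illustration of the last two cleanups (again not the webrev code; unblock_signal is a hypothetical helper), a set can be explicitly emptied before use and NULL passed for the unused old-mask argument:

```cpp
#include <cerrno>
#include <csignal>
#include <cstddef>
#include <pthread.h>

// Hypothetical helper: unblock 'sig' for the calling thread.
// Returns 0 on success or an error number.
int unblock_signal(int sig) {
  sigset_t set;
  // Explicitly initialize the set before adding to it or passing it on.
  sigemptyset(&set);
  if (sigaddset(&set, sig) != 0) {
    return errno;  // e.g. EINVAL for an invalid signal number
  }
  // The previous mask is not needed here, so pass NULL rather than 0.
  return pthread_sigmask(SIG_UNBLOCK, &set, NULL);
}
```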

Testing:
- JPRT, original failing testcase

Thanks,
David
