Looks good, thanks David!
/Robbin
On 11/18/19 3:30 AM, David Holmes wrote:
Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/
This was a very difficult bug to track down and I want to publicly acknowledge
and thank the jemalloc folk (users and developers) for continuing to investigate
this issue from their side. Without their persistence this issue would have
languished.
A thread's stack_base() is the first address above the thread's stack, so it is
not itself part of the stack. However, the "in stack" checks performed by
Thread::on_local_stack and Thread::is_in_stack allowed the checked address to be
equal to stack_base(), which is incorrect. Here's how this manifests as the bug:
- Let a JavaThread instance, T2, be allocated at the end of thread T1's stack
i.e. at T1->stack_base()
[This seems to be why this only reproduced with jemalloc.]
- Let T2 lock an inflated monitor
- Let T1 try to lock the same monitor
- T1 would consider the _owner field value (T2) as being in its stack and so
consider the monitor stack-locked by T1
- And so both T1 and T2 would have ownership of the monitor allowing the
monitor state (and application state) to be corrupted. This results in a range
of hangs and crashes depending on the exact interleaving.
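As a minimal sketch of the off-by-one described above (this is not the actual
HotSpot code: StackBounds, make_bounds and both check names are hypothetical,
and it assumes a downward-growing stack occupying the half-open range
[base - size, base)):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a thread's stack bounds; not the HotSpot Thread class.
struct StackBounds {
  std::uintptr_t base;  // first address ABOVE the stack (the stack grows down)
  std::size_t    size;  // stack size in bytes

  // Lowest address that belongs to the stack.
  std::uintptr_t end() const { return base - size; }

  // Inclusive upper bound: wrongly treats `base` itself as being in the stack.
  bool in_stack_inclusive(std::uintptr_t adr) const {
    return adr <= base && adr >= end();
  }

  // Exclusive upper bound: the stack is the half-open range [end, base).
  bool in_stack_exclusive(std::uintptr_t adr) const {
    return adr < base && adr >= end();
  }
};

// Hypothetical helper that rejects bounds whose size would underflow end().
inline StackBounds make_bounds(std::uintptr_t base, std::size_t size) {
  assert(size <= base);
  return StackBounds{base, size};
}
```

With base = 0x8000 and size = 0x1000, the inclusive check accepts 0x8000, which
is how T1 could mistake T2's address for an address on its own stack; the
exclusive check rejects it.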
Interestingly, Thread::is_in_usable_stack does not have this bug.
The bug can be traced all the way back to JDK-6699669, as explained in the bug
report.
That issue also showed that the same bug existed in the SA implementations of
these "on stack" checks.
Testing:
- The reproducer from the bug report, using jemalloc, ran over 5000 times
without failing in any way.
- tiers 1-3 on all Oracle platforms
- serviceability/sa tests
Thanks,
David