Hi Dan,
Thanks for taking a look at this.
On 19/11/2019 3:36 am, Daniel D. Daugherty wrote:
Hi David,
On 11/17/19 9:30 PM, David Holmes wrote:
Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/
src/hotspot/share/runtime/thread.hpp
Nice catch!
src/hotspot/share/runtime/thread.cpp
Nice catch!
Not your issue, but these two lines feel strange/wrong:
L1008: // Allow non Java threads to call this without stack_base
L1009: if (_stack_base == NULL) return true;
When _stack_base is NULL, any 'adr' is in the caller's stack? The
comment is not helping understand why this is so...
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/JavaThread.java
Nice catch!
Again, not your issue, but these four lines are questionable:
L383 Address sp = lastSPDbg();
L384 Address stackBase = getStackBase();
L385 // Be robust
L386 if (sp == null) return false;
I can see why a NULL sp would cause a "false" return since obviously
something is a amiss in the frame. However, the C++ code doesn't make
this check so why does the SA code?
And this code doesn't check stackBase == NULL so it's not matching
the C++ code either.
Thumbs up on the change itself. My queries above and below might warrant
new bugs or RFEs to be filed.
I have filed a bug to examine this and the issue Thomas flagged:
https://bugs.openjdk.java.net/browse/JDK-8234372
"Investigate use of Thread::stack_base() and queries for "in stack""
This was a very difficult bug to track down and I want to publicly
acknowledge and thank the jemalloc folk (users and developers) for
continuing to investigate this issue from their side. Without their
persistence this issue would have languished.
You also deserve thanks for sticking with this bug: Thanks David!!
Thanks, but I had written this off as a jemalloc issue until they
provided the additional data.
The thread stack_base() is the first address above the thread's stack.
However, the "in stack" checks performed by Thread::on_local_stack and
Thread::is_in_stack allowed the checked address to be equal to the
stack_base() - which is not correct. Here's how this manifests as the
bug:
- Let a JavaThread instance, T2, be allocated at the end of thread
T1's stack i.e. at T1->stack_base()
[This seems to be why this only reproduced with jemalloc.]
- Let T2 lock an inflated monitor
- Let T1 try to lock the same monitor
- T1 would consider the _owner field value (T2) as being in its
stack and so consider the monitor stack-locked by T1
- And so both T1 and T2 would have ownership of the monitor allowing
the monitor state (and application state) to be corrupted. This
results in a range of hangs and crashes depending on the exact
interleaving.
Ouch!
So I was wondering how this bug could happen with the thread alignment
logic that we have in place... search for the _real_malloc_address stuff...
And then I noticed that the logic only kicks in when UseBiasedLocking ==
true
(and this bug says it doesn't happen with -XX:-UseBiasedLocking):
Actually that is a false claim. As per my comment on "2019-10-09 14:09"
it does reproduce with biased-locking disabled but much more rarely.
src/hotspot/share/runtime/thread.cpp:
// ======= Thread ========
// Support for forcing alignment of thread objects for biased locking
void* Thread::allocate(size_t size, bool throw_excpt, MEMFLAGS flags) {
if (UseBiasedLocking) {
const size_t alignment = markWord::biased_lock_alignment;
size_t aligned_size = size + (alignment - sizeof(intptr_t));
void* real_malloc_addr = throw_excpt? AllocateHeap(aligned_size,
flags, CURRENT_PC)
: AllocateHeap(aligned_size,
flags, CURRENT_PC,
AllocFailStrategy::RETURN_NULL);
void* aligned_addr = align_up(real_malloc_addr, alignment);
assert(((uintptr_t) aligned_addr + (uintptr_t) size) <=
((uintptr_t) real_malloc_addr + (uintptr_t) aligned_size),
"JavaThread alignment code overflowed allocated storage");
if (aligned_addr != real_malloc_addr) {
log_info(biasedlocking)("Aligned thread " INTPTR_FORMAT " to "
INTPTR_FORMAT,
p2i(real_malloc_addr),
p2i(aligned_addr));
}
((Thread*) aligned_addr)->_real_malloc_address = real_malloc_addr;
return aligned_addr;
} else {
return throw_excpt? AllocateHeap(size, flags, CURRENT_PC)
: AllocateHeap(size, flags, CURRENT_PC,
AllocFailStrategy::RETURN_NULL);
}
}
The logging logic above:
if (aligned_addr != real_malloc_addr) {
log_info(biasedlocking)("Aligned thread " INTPTR_FORMAT " to "
INTPTR_FORMAT,
p2i(real_malloc_addr),
p2i(aligned_addr));
}
allows for real_malloc_addr to be the same as aligned_addr sometimes
(and no log message is issued), but I'm not sure from spelunking in
code whether it's really possible for:
void* aligned_addr = align_up(real_malloc_addr, alignment);
to return aligned_addr == real_malloc_addr. In other words, if
real_malloc_addr is already aligned perfectly, does align_up() still
change that value?
If it is possible for (aligned_addr == real_malloc_addr), then it is
possible for this bug to happen without jemalloc.
I've convinced myself that this is possible because of this line:
size_t aligned_size = size + (alignment - sizeof(intptr_t));
If real_malloc_addr is already aligned perfectly and align_up()
always changed the input address, then the aligned_size would be
too small by sizeof(intptr_t) and we would have seen a buffer
overwrite like that over the many, many years.
So my conclusion is that it should be possible for this bug to
happen without jemalloc, but it would have to be rare.
I'm a little surprised that we specialize this way as I thought the
128/256 byte alignment was necessary regardless of biased-locking.
Further even if running without biased-locking we still have alignment
requirements for the lock-bits, age-bits etc, that do not seem to be
captured by the above code unless AllocateHeap somehow already provides
such alignment by default. (I'm also unclear why this doesn't fail in
debug builds but just assume the allocation patterns are different.)
Anyway, if the allocator already returns a suitably aligned block of
memory then I am assuming the above code doesn't actually need to do
anything.
So theoretically, without having advance knowledge of the details of the
allocator, yes this bug could happen for any allocator.
Thanks,
David
-----
Interestingly Thread::is_in_usable_stack does not have this bug.
So we have Thread::is_in_usable_stack(), Thread::on_local_stack() and
Thread::is_in_stack()? I haven't compared all three side by side, but
there might be some cleanup work that can be done here (in a different
bug).
The bug can be tracked way back to JDK-6699669 as explained in the bug
report. That issue also showed that the same bug existed in the SA
implementations of these "on stack" checks.
Ouch! JDK-6699669 was fixed in jdk7-B56 and looks like it was pushed
to the jdk6u train... so this bug goes back quite a ways...
Outstanding hunt David!
Dan
Testing:
- The reproducer from the bug report, using jemalloc, ran over 5000
times without failing in any way.
- tiers 1-3 on all Oracle platforms
- serviceability/sa tests
Thanks,
David
-----