Hi David, On Mon, Nov 18, 2019 at 2:26 PM David Holmes <[email protected]> wrote:
> Hi Thomas, > > Thanks for taking a look. > > On 18/11/2019 9:58 pm, Thomas Stüfe wrote: > > This is evil :) > > > > There might be more cases like this, e.g. > > > > frame_x86.cpp frame::is_interpreted_frame_valid(): > > > > if (locals > thread->stack_base() || locals < (address) fp()) return > false; > > Yes that might be a case where >= should be in use. I'll file another > bug to check uses of stack_base(). > > Many of them could use Thread::in_usable_stack(), I assume. > > Also, I would have thought the little alloca() dance we do at the start > > of thread_native_entry() would push the first real frame down the stack. > > I know nothing of that code. :) > See os_linux.cpp: ... // Try to randomize the cache line index of hot stack frames. // This helps when threads of the same stack traces evict each other's // cache lines. The threads can be either from the same JVM instance, or // from different JVM instances. The benefit is especially true for // processors with hyperthreading technology. static int counter = 0; int pid = os::current_process_id(); alloca(((pid ^ counter++) & 7) * 128); > > The fix looks good. > > Thanks! > > David > ----- > > Cheers, Thomas > > Cheers, Thomas > > > > > > > > On Mon, Nov 18, 2019 at 3:31 AM David Holmes <[email protected] > > <mailto:[email protected]>> wrote: > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8215355 > > webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/ > > > > This was a very difficult bug to track down and I want to publicly > > acknowledge and thank the jemalloc folk (users and developers) for > > continuing to investigate this issue from their side. Without their > > persistence this issue would have languished. > > > > The thread stack_base() is the first address above the thread's > stack. > > However, the "in stack" checks performed by Thread::on_local_stack > and > > Thread::is_in_stack allowed the checked address to be equal to the > > stack_base() - which is not correct. Here's how this manifests as > > the bug: > > > > - Let a JavaThread instance, T2, be allocated at the end of thread > T1's > > stack i.e. at T1->stack_base() > > [This seems to be why this only reproduced with jemalloc.] > > - Let T2 lock an inflated monitor > > - Let T1 try to lock the same monitor > > - T1 would consider the _owner field value (T2) as being in its > > stack > > and so consider the monitor stack-locked by T1 > > - And so both T1 and T2 would have ownership of the monitor > > allowing > > the monitor state (and application state) to be corrupted. This > results > > in a range of hangs and crashes depending on the exact interleaving. > > > > Interestingly Thread::is_in_usable_stack does not have this bug. > > > > The bug can be tracked way back to JDK-6699669 as explained in the > bug > > report. That issue also showed that the same bug existed in the SA > > implementations of these "on stack" checks. > > > > Testing: > > - The reproducer from the bug report, using jemalloc, ran over > 5000 > > times without failing in any way. > > - tiers 1-3 on all Oracle platforms > > - serviceability/sa tests > > > > Thanks, > > David > > ----- > > >
