Hi Thomas,

Thanks for taking a look.

On 18/11/2019 9:58 pm, Thomas Stüfe wrote:
> This is evil :)
>
> There might be more cases like this, e.g.
>
> frame_x86.cpp  frame::is_interpreted_frame_valid():
>
> if (locals > thread->stack_base() || locals < (address) fp()) return false;

Yes, that might be a case where >= should be in use. I'll file another bug to check uses of stack_base().

> Also, I would have thought the little alloca() dance we do at the start of thread_native_entry() would push the first real frame down the stack.

I know nothing of that code. :)

> The fix looks good.

Thanks!

David
-----

> Cheers, Thomas



On Mon, Nov 18, 2019 at 3:31 AM David Holmes <[email protected]> wrote:

    Bug: https://bugs.openjdk.java.net/browse/JDK-8215355
    webrev: http://cr.openjdk.java.net/~dholmes/8215355/webrev/

    This was a very difficult bug to track down and I want to publicly
    acknowledge and thank the jemalloc folk (users and developers) for
    continuing to investigate this issue from their side. Without their
    persistence this issue would have languished.

    The thread stack_base() is the first address above the thread's stack.
    However, the "in stack" checks performed by Thread::on_local_stack and
    Thread::is_in_stack allowed the checked address to be equal to the
    stack_base() - which is not correct. Here's how this manifests as
    the bug:

    - Let a JavaThread instance, T2, be allocated at the end of thread T1's
      stack i.e. at T1->stack_base()
        [This seems to be why this only reproduced with jemalloc.]
    - Let T2 lock an inflated monitor
    - Let T1 try to lock the same monitor
        - T1 would consider the _owner field value (T2) as being in its
          stack and so consider the monitor stack-locked by T1
        - And so both T1 and T2 would have ownership of the monitor,
          allowing the monitor state (and application state) to be
          corrupted. This results in a range of hangs and crashes
          depending on the exact interleaving.
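
    The off-by-one in the steps above can be sketched roughly as follows.
    This is only an illustration, not the HotSpot source: StackBounds,
    contains_buggy and contains_fixed are hypothetical names, and the
    stack is assumed to grow down from base to base - size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a thread's stack bounds (not the HotSpot Thread class).
struct StackBounds {
    std::uintptr_t base;  // first address ABOVE the stack, i.e. an exclusive upper bound
    std::size_t size;     // stack size in bytes

    // Lowest address that still belongs to the stack.
    std::uintptr_t end() const { return base - size; }

    // Buggy form: accepts addr == base, which lies just outside the stack.
    bool contains_buggy(std::uintptr_t addr) const {
        return addr <= base && addr >= end();
    }

    // Corrected form: base is treated as exclusive.
    bool contains_fixed(std::uintptr_t addr) const {
        return addr < base && addr >= end();
    }
};
```

    With an object allocated exactly at base (T2 in the scenario above),
    the buggy form reports it as being on T1's stack, while the corrected
    form does not.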

    Interestingly Thread::is_in_usable_stack does not have this bug.

    The bug can be tracked way back to JDK-6699669 as explained in the bug
    report. That issue also showed that the same bug existed in the SA
    implementations of these "on stack" checks.

    Testing:
        - The reproducer from the bug report, using jemalloc, ran over 5000
    times without failing in any way.
        - tiers 1-3 on all Oracle platforms
        - serviceability/sa tests

    Thanks,
    David
    -----
