Re: [15] RFR: 8246381: VM crashes with "Current BasicObjectLock* below than low_mark"

Jamsheed C M Thu, 16 Jul 2020 00:03:48 -0700

Hi all,

Could I get another review?


Best regards,

Jamsheed

On 16/07/2020 06:37, David Holmes wrote:

Hi Jamsheed,
tl;dr version: fix looks good. Thanks for working through things with me on this one.
Long version ... for the sake of other reviewers (and myself) I'm going to walk through the problem scenario and how the fix addresses it, because the bug report is long and confusing and touches on a number of different issues with async exception handling.
We are dealing with the code generated for Java method entry, and in particular for a synchronized Java method. We do a lot of things in the entry code before we actually lock the monitor and jump to the Java method. Some of those things include method profiling and the counter overflow check for the JIT. If an exception is thrown at this point, the logic to remove the activation would unlock the monitor - which we haven't actually locked yet! So we have the do_not_unlock_if_synchronized flag which is stored in the current JavaThread. We set that flag true so that if any exceptions result in activation removal, the removal logic won't try to unlock the monitor. Once we're ready to lock the monitor we set the flag back to false (note there is an implicit assumption here that monitor locking can never raise an exception).
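As a rough illustration of the flag protocol just described, here is a minimal toy model. ToyThread, remove_activation and enter_synchronized_method are hypothetical names made up for this sketch; they are not the real HotSpot classes or the generated interpreter code, and the "monitor" is just a counter.

```cpp
#include <cassert>

// Toy stand-in for the per-thread state; not the real JavaThread.
struct ToyThread {
    bool do_not_unlock_if_synchronized = false;
    int  monitor_lock_count = 0;  // how many times the method's monitor is held
    int  bogus_unlocks = 0;       // unlock attempts on a monitor that was never locked
};

// Activation removal: unlock the method's monitor unless the flag says
// that the monitor has not been locked yet.
void remove_activation(ToyThread& t) {
    if (t.do_not_unlock_if_synchronized) {
        return;                   // entry code had not locked the monitor
    }
    if (t.monitor_lock_count == 0) {
        t.bogus_unlocks++;        // this is the situation that crashes the VM
        return;
    }
    t.monitor_lock_count--;
}

// Synchronized method entry. `preamble_throws` models an exception raised by
// the profiling / counter overflow work that runs before the monitor is locked.
// Returns true if the method body was reached.
bool enter_synchronized_method(ToyThread& t, bool preamble_throws) {
    t.do_not_unlock_if_synchronized = true;   // monitor not locked yet
    if (preamble_throws) {
        remove_activation(t);                 // must not try to unlock
        return false;
    }
    t.do_not_unlock_if_synchronized = false;  // ready to lock
    t.monitor_lock_count++;                   // lock the monitor
    return true;
}
```

In this model an exception during the preamble leaves the monitor untouched, while a normal entry followed by activation removal locks and unlocks exactly once.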
The problem arises with async exceptions, or more specifically the async exception that is raised due to an "unsafe access error". This is where a memory-mapped ByteBuffer causes an access violation (SEGV) due to a bad pointer. The signal handler simply sets a flag to indicate we encountered an "unsafe access error", adjusts the BCI to the next instruction and allows execution to proceed at the next instruction. It is then expected that the runtime will "soon" notice this pending unsafe access error and create and throw the InternalError instance that indicates the ByteBuffer operation failed. This requires executing Java code.
One of the places that checks for that pending unsafe access error is in the destructor of the JRT_ENTRY wrapper that is used for the method profiling and counter overflow checking. This occurs whilst the do_not_unlock_if_synchronized flag is true, so the resulting InternalError won't result in an attempt to unlock the not-locked monitor.
The problem is that creating the InternalError executes Java code - it calls constructors, which call methods etc. And some of those methods are synchronized. So the method entry logic for such a call will set do_not_unlock_if_synchronized to true, perform all the preamble related to the call, then set do_not_unlock_if_synchronized to false, lock the monitor and make the call. When construction completes the InternalError is thrown and we remove the activation for the method we had originally started to call. But now the do_not_unlock_if_synchronized flag has been reset to false by the nested Java method call, so we do in fact try to unlock a monitor that was never locked, and things break.
This nesting problem is well known and we have a mechanism for dealing with it - the UnlockFlagSaver. The actual logic executed for profiling methods and doing the counter overflow check contains the requisite UnlockFlagSaver to avoid the problem just outlined. Unfortunately the async exception is processed in the JRT_ENTRY wrapper, which is outside the scope of those UnlockFlagSaver helpers and so they don't help in this case.
So the fix is to "simply" move the UnlockFlagSaver deeper into the call stack to the code that actually does the async exception processing:
void JavaThread::check_and_handle_async_exceptions(bool check_unsafe_error) {
+   // May be we are at method entry and requires to save do not unlock flag.
+   UnlockFlagSaver fs(this);
so now after the InternalError has been created and thrown we will restore the original value of the do_not_unlock_if_synchronized flag (true) and so the InternalError will not cause activation removal to attempt to unlock the not-locked monitor.
The scope of the UnlockFlagSaver could be narrowed to the actual logic for processing the unsafe access error, but it seems fine at method scope.
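To make the nesting problem and the effect of moving the saver concrete, here is a small self-contained sketch. It is a toy model only: ToyThread, ToyUnlockFlagSaver and the functions below are hypothetical names invented for illustration, and the only assumption carried over from the real code is that the saver restores the flag when its scope ends.

```cpp
#include <cassert>

// Toy stand-in for the per-thread flag; not the real JavaThread.
struct ToyThread {
    bool do_not_unlock_if_synchronized = false;
};

// RAII helper in the spirit of UnlockFlagSaver: remember the flag on
// construction and restore it on destruction.
class ToyUnlockFlagSaver {
    ToyThread& thread_;
    bool saved_;
public:
    explicit ToyUnlockFlagSaver(ToyThread& t)
        : thread_(t), saved_(t.do_not_unlock_if_synchronized) {}
    ~ToyUnlockFlagSaver() { thread_.do_not_unlock_if_synchronized = saved_; }
};

// A nested synchronized Java call (for example from an exception constructor):
// its entry code sets the flag to true, then back to false before locking.
void nested_synchronized_call(ToyThread& t) {
    t.do_not_unlock_if_synchronized = true;
    t.do_not_unlock_if_synchronized = false;
}

// Runtime work done at method entry; it already runs under a saver.
void profile_method(ToyThread& t) {
    ToyUnlockFlagSaver fs(t);
    nested_synchronized_call(t);
}

// Async exception processing done by the wrapper after the runtime work.
// with_saver == false models the situation before the fix.
void handle_async_exception(ToyThread& t, bool with_saver) {
    if (with_saver) {
        ToyUnlockFlagSaver fs(t);
        nested_synchronized_call(t);  // constructing the exception runs Java code
    } else {
        nested_synchronized_call(t);  // flag is left clobbered
    }
}

// Returns the flag value that activation removal would see afterwards.
bool flag_seen_by_activation_removal(bool saver_in_async_handler) {
    ToyThread t;
    t.do_not_unlock_if_synchronized = true;  // outer method entry, monitor not locked
    profile_method(t);                       // flag survives this call
    handle_async_exception(t, saver_in_async_handler);
    return t.do_not_unlock_if_synchronized;
}
```

Without the saver in the async handler the function returns false, so activation removal would try to unlock the never-locked monitor; with the saver it returns true.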
A second fix is that the overflow counter check had an assertion that it was not executed with any pending exceptions. But that turned out to be false for reasons I can't fully explain, but it again appears to relate to a pending async exception being installed prior to the method call - and seems related to the two referenced JVM TI functions. The simple solution here is to delete the assertion and to check for pending exceptions on entry to the code and just return immediately. The JRT_ENTRY destructor will see the pending exception and propagate it.
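The shape of that second change, an early return in place of an assertion, can be sketched as follows. This is a toy model with hypothetical names (ToyThread, on_counter_overflow), not the actual patch.

```cpp
#include <cassert>

// Toy stand-in for the thread state relevant to this check.
struct ToyThread {
    bool has_pending_exception = false;
    int  compilations_requested = 0;
};

// Toy counter overflow handler. Instead of asserting that no exception is
// pending, it returns immediately and leaves the exception in place for the
// caller's wrapper to propagate. Returns true if the overflow was processed.
bool on_counter_overflow(ToyThread& t) {
    if (t.has_pending_exception) {
        return false;             // do nothing; the exception stays pending
    }
    t.compilations_requested++;   // normal path: request a compilation
    return true;
}
```

The point of the design is that the handler neither clears nor handles the exception itself; it only declines to do its normal work.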
Cheers,
David

On 16/07/2020 9:50 am, David Holmes wrote:
Hi Jamsheed,

On 16/07/2020 8:16 am, Jamsheed C M wrote:
(Thank you Dean, adding serviceability team as this issue involves the JVMTI PopFrame and EarlyReturn features)
It is not at all obvious how your proposed fix impacts the JVM TI features.
JBS entry: https://bugs.openjdk.java.net/browse/JDK-8246381

(testing: mach5, tier1-5 links in JBS)

Best regards,

Jamsheed

On 15/07/2020 21:25, Jamsheed C M wrote:
Hi,
Async handling at method entry needs to be aware of synchronization (i.e. whether it is doing async handling before or after the lock is acquired).
This is required because the exception handler relies on this info for unlocking. The async handling code never handled this special condition, and it worked most of the time because we were using biased locking, which got disabled by [1].
There was one other issue reported around the same time [2]. This issue got triggered in the test case by [3]: back-to-back extra safepoint after suspend, and TLH for ThreadDeath. So in this setup both the PopFrame request and the Thread.stop request happened together for the test scenario, and it reached Java method entry with pending_exception set.
I have done a partial fix for the issue, mainly to handle production mode crash failures (the "do not unlock" flag related ones).
Fix detail:

1) I save and restore the "do not unlock" flag in async handling.
Sorry but you completely changed the fix compared to what we discussed and what I pre-reviewed! What happened to changing from JRT_ENTRY to JRT_ENTRY_NOASYNC? It is going to take me a lot of time and effort to determine that this save/restore of the "do not unlock flag" is actually correct and valid!
2) Return for floating pending exception for some cases (PopFrame, EarlyReturn related). This is a debug (JVMTI) feature, and a floating exception can get cleaned just like that in the present compiler request and deopt code.
What part of the change addresses this?

Thanks,
David
-----
webrev: http://cr.openjdk.java.net/~jcm/8246381/webrev.02/
There are more problems in these code areas, like we clear all exceptions in the compilation request path (interpreter, c1), as well as the deoptimization path.
All these unhandled cases will be handled separately by https://bugs.openjdk.java.net/browse/JDK-8249451
Request for review.

Best regards,

Jamsheed
[1] https://bugs.openjdk.java.net/browse/JDK-8231264
[2] https://bugs.openjdk.java.net/browse/JDK-8246727

[3] https://bugs.openjdk.java.net/browse/JDK-8221207
