Hi Coleen,
Thank you for the review.
Best regards,
Jamsheed
On 16/07/2020 20:13, [email protected] wrote:
Thanks to David's description of the problem and the fix, this makes
sense to me now.
I don't like it and we should revisit async exceptions for all the
other problems it causes, but this change looks safe and good.
thanks,
Coleen
On 7/16/20 3:00 AM, Jamsheed C M wrote:
Hi all,
could i get another review?
Best regards,
Jamsheed
On 16/07/2020 06:37, David Holmes wrote:
Hi Jamsheed,
tl;dr version: fix looks good. Thanks for working through things
with me on this one.
Long version ... for the sake of other reviewers (and myself) I'm
going to walk through the problem scenario and how the fix addresses
it, because the bug report is long and confusing and touches on a
number of different issues with async exception handling.
We are dealing with the code generated for Java method entry, and in
particular for a synchronized Java method. We do a lot of things in
the entry code before we actually lock the monitor and jump to the
Java method. Some of those things include method profiling and the
counter overflow check for the JIT. If an exception is thrown at
this point, the logic to remove the activation would unlock the
monitor - which we haven't actually locked yet! So we have the
do_not_unlock_if_synchronized flag which is stored in the current
JavaThread. We set that flag true so that if any exceptions result
in activation removal, the removal logic won't try to unlock the
monitor. Once we're ready to lock the monitor we set the flag back
to false (note there is an implicit assumption here that monitor
locking can never raise an exception).
The problem arises with async exceptions, or more specifically the
async exception that is raised due to an "unsafe access error". This
is where a memory-mapped ByteBuffer causes an access violation
(SEGV) due to a bad pointer. The signal handler simply sets a flag
to indicate we encountered an "unsafe access error", adjusts the BCI
to the next instruction and allows execution to proceed at the next
instruction. It is then expected that the runtime will "soon" notice
this pending unsafe access error and create and throw the
InternalError instance that indicates the ByteBuffer operation
failed. This requires executing Java code.
One of the places that checks for that pending unsafe access error
is in the destructor of the JRT_ENTRY wrapper that is used for the
method profiling and counter overflow checking. This occurs whilst
the do_not_unlock_if_synchronized flag is true, so the resulting
InternalError won't result in an attempt to unlock the not-locked
monitor.
The problem is that creating the InternalError executes Java code -
it calls constructors, which call methods etc. And some of those
methods are synchronized. So the method entry logic for such a call
will set do_not_unlock_if_synchronized to true, perform all the
preamble related to the call, then set do_not_unlock_if_synchronized
to false, lock the monitor and make the call. When construction
completes the InternalError is thrown and we remove the activation
for the method we had originally started to call. But now the
do_not_unlock_if_synchronized flag has been reset to false by the
nested Java method call, so we do in fact try to unlock a monitor
that was never locked, and things break.
This nesting problem is well known and we have a mechanism for
dealing with - the UnlockFlagSaver. The actual logic executed for
profiling methods and doing the counter overflow check contains the
requisite UnlockFlagSaver to avoid the problem just outlined.
Unfortunately the async exception is processed in the JRT_ENTRY
wrapper, which is outside the scope of those UnlockFlagSaver helpers
and so they don't help in this case.
So the fix is to "simply" move the UnlockFlagSaver deeper into the
call stack to the code that actually does the async exception
processing:
void JavaThread::check_and_handle_async_exceptions(bool
check_unsafe_error) {
+ // May be we are at method entry and requires to save do not
unlock flag.
+ UnlockFlagSaver fs(this);
so now after the InternalError has been created and thrown we will
restore the original value of the do_not_unlock_if_synchronized flag
(false) and so the InternalError will not cause activation removal
to attempt to unlock the not-locked monitor.
The scope of the UnlockFlagSaver could be narrowed to the actual
logic for processing the unsafe access error, but it seems fine at
method scope.
A second fix is that the overflow counter check had an assertion
that it was not executed with any pending exceptions. But that
turned out to be false for reasons I can't fully explain, but it
again appears to relate to a pending async exception being installed
prior to the method call - and seems related to the two referenced
JVM TI functions. The simple solution here is to delete the
assertion and to check for pending exceptions on entry to the code
and just return immediately. The JRT_ENTRY destructor will see the
pending exception and propagate it.
Cheers,
David
On 16/07/2020 9:50 am, David Holmes wrote:
Hi Jamsheed,
On 16/07/2020 8:16 am, Jamsheed C M wrote:
(Thank you Dean, adding serviceability team as this issue involves
JVMTI features PopFrame, EarlyReturn features)
It is not at all obvious how your proposed fix impacts the JVM TI
features.
JBS entry: https://bugs.openjdk.java.net/browse/JDK-8246381
(testing: mach5, tier1-5 links in JBS)
Best regards,
Jamsheed
On 15/07/2020 21:25, Jamsheed C M wrote:
Hi,
Async handling at method entry requires it to be aware of
synchronization(like whether it is doing async handling before
lock acquire or after)
This is required as exception handler rely on this info for
unlocking. Async handling code never had this special condition
handled and it worked most of the time as we were using biased
locking which got disabled by [1]
There was one other issue reported in similar time[2]. This issue
got triggered in test case by [3], back to back extra safepoint
after suspend and TLH for ThreadDeath. So in this setup both
PopFrame request and Thread.Stop request happened together for
the test scenario and it reached java method entry with
pending_exception set.
I have done a partial fix for the issue, mainly to handle
production mode crash failures(do not unlock flag related ones)
Fix detail:
1) I save restore the "do not unlock" flag in async handling.
Sorry but you completely changed the fix compared to what we
discussed and what I pre-reviewed! What happened to changing from
JRT_ENTRY to JRT_ENTRY_NOASYNC? It is going to take me a lot of
time and effort to determine that this save/restore of the "do not
unlock flag" is actually correct and valid!
2) Return for floating pending exception for some cases(PopFrame,
Early return related). This is debug(JVMTI) feature and floating
exception can get cleaned just like that in present compiler
request and deopt code.
What part of the change addresses this?
Thanks,
David
-----
webrev :http://cr.openjdk.java.net/~jcm/8246381/webrev.02/
There are more problems in these code areas, like we clear all
exceptions in compilation request path(interpreter,c1), as well
as deoptimization path.
All these un-handled cases will be separately handled by
https://bugs.openjdk.java.net/browse/JDK-8249451
Request for review.
Best regards,
Jamsheed
[1]https://bugs.openjdk.java.net/browse/JDK-8231264
<https://bugs.openjdk.java.net/browse/JDK-8231264>
[2] https://bugs.openjdk.java.net/browse/JDK-8246727
[3] https://bugs.openjdk.java.net/browse/JDK-8221207