On Thu, 29 Oct 2020 21:23:12 GMT, Coleen Phillimore <[email protected]> wrote:
>> The imasm::remove_activation() call does not deal with safepoints very well.
>> However, when the MethodExit JVMTI event is being called, we call into the
>> runtime in the middle of remove_activation(). If the value being returned is
>> an object type, then the top-of-stack contains the oop. However, the GC does
>> not traverse said oop in any oop map, because it is simply not expected that
>> we safepoint in the middle of remove_activation().
>>
>> The JvmtiExport::post_method_exit() function we end up calling, reads the
>> top-of-stack oop, and puts it in a handle. Then it calls JVMTI callbacks,
>> that eventually call Java and a bunch of stuff that safepoints. So after the
>> JVMTI callback, we can expect the top-of-stack oop to be broken.
>> Unfortunately, when we continue, we therefore end up returning a broken oop.
>>
>> Notably, the fact that InterpreterRuntime::post_method_exit is a JRT_ENTRY,
>> is wrong, as we can safepoint on the way back to Java, which will break the
>> return oop in a similar way. So this patch makes it a JRT_BLOCK_ENTRY,
>> moving the transition to VM and back, into a block of code that is protected
>> against GC. Before the JRT_BLOCK is called, we stash away the return oop,
>> and after the JRT_BLOCK_END, we restore the top-of-stack oop. In the path
>> when InterpreterRuntime::post_method_exit is called when throwing an
>> exception, we don't have the same problem of retaining an oop result, and
>> hence the JRT_BLOCK/JRT_BLOCK_END section is not performed in this case; the
>> logic is the same as before for this path.
>>
>> This is a JVMTI bug that has probably been around for a long time. It
>> crashes with all GCs, but was discovered recently after concurrent stack
>> processing, as StefanK has been running better GC stressing code in JVMTI,
>> and the bug reproduced more easily with concurrent stack processing, as the
>> timings were a bit different. The following reproducer failed pretty much
>> 100% of the time:
>> while true; do make test JTREG="RETAIN=all"
>> TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/MethodExitEvent/returnValue/returnValue003/returnValue003.java
>> TEST_OPTS_JAVA_OPTIONS="-XX:+UseZGC -Xmx2g -XX:ZCollectionInterval=0.0001
>> -XX:ZFragmentationLimit=0.01 -XX:+VerifyOops -XX:+ZVerifyViews -Xint" ; done
>>
>> With my fix I can run this repeatedly without any more failures. I have also
>> sanity checked the patch by running tier 1-5, so that it does not introduces
>> any new issues on its own. I have also used Stefan's nice external GC
>> stressing with jcmd technique that was used to trigger crashes with other
>> GCs, to make sure said crashes no longer reproduce either.
>
> src/hotspot/share/prims/jvmtiExport.cpp line 1600:
>
>> 1598:
>> 1599: if (exception_exit) {
>> 1600: post_method_exit_inner(thread, mh, state, exception_exit,
>> current_frame, result, value);
>
> I think for exception exit, you also need JRT_BLOCK because you want the
> transition to thread_in_VM for this code, since JRT_BLOCK_ENTRY doesn't do
> the transition. It should be safe for exception exit and retain the old
> behavior.
Thanks for having a look coleen. In fact, not doing the JRT_BLOCK for the
exception entry is intentional, because that entry goes through a different
JRT_ENTRY (not JRT_BLOCK_ENTRY), that already transitions. So if I do the
JRT_BLOCK for the exception path, it asserts saying hey you are already in VM.
-------------
PR: https://git.openjdk.java.net/jdk/pull/930