RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents

Reingruber, Richard Tue, 31 Mar 2020 23:21:40 -0700

  > Thanks for cleaning up thread.hpp!

Thanks for providing the feedback!


I justed noticed that the forward declaration of class 
jvmtiDeferredLocalVariableSet is not required anymore. Will remove it in the 
next webrev. Hope to get some more (partial) reviews.

Thanks, Richard.

-----Original Message-----
From: Robbin Ehn <robbin....@oracle.com> 
Sent: Dienstag, 31. März 2020 16:21
To: Reingruber, Richard <richard.reingru...@sap.com>; Doerr, Martin 
<martin.do...@sap.com>; Lindenmaier, Goetz <goetz.lindenma...@sap.com>; David 
Holmes <david.hol...@oracle.com>; Vladimir Kozlov (vladimir.koz...@oracle.com) 
<vladimir.koz...@oracle.com>; serviceability-dev@openjdk.java.net; 
hotspot-compiler-...@openjdk.java.net; hotspot-runtime-...@openjdk.java.net
Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in 
the Presence of JVMTI Agents

Thanks for cleaning up thread.hpp!

/Robbin

On 2020-03-30 10:31, Reingruber, Richard wrote:
> Hi,
> 
> this is webrev.5 based on Robbin's feedback and Martin's review - thanks! :)
> 
> The change affects jvmti, hotspot and c2. Partial reviews are very welcome 
> too.
> 
> Full:  http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.5/
> Delta: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.5.inc/
> 
> Robbin, Martin, please let me know, if anything shouldn't be quite as you 
> wanted it. Also find my
> comments on your feedback below.
> 
> Robbin, can I count you as Reviewer for the runtime part?
> 
> Thanks, Richard.
> 
> --
> 
>> DeoptimizeObjectsALotThread is only used in compileBroker.cpp.
>> You can move both declaration and definition to that file, no need to clobber
>> thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry)
> 
> Done.
> 
>> Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in it's 
>> own
>> hpp file? It doesn't seem right to add JVM TI classes into thread.hpp.
> 
> I moved JvmtiDeferredUpdates to vframe_hp.hpp where preexisting 
> jvmtiDeferredLocalVariableSet is
> declared.
> 
>> src/hotspot/share/code/compiledMethod.cpp
>> Nice cleanup!
> 
> Thanks :)
> 
>> src/hotspot/share/code/debugInfoRec.cpp
>> src/hotspot/share/code/debugInfoRec.hpp
>> Additional parmeters. (Remark: I think "non_global_escape_in_scope" would 
>> read better than "not_global_escape_in_scope", but your version is 
>> consistent with existing code, so no change request from my side.) Ok.
> 
> I've been thinking about this too and finally stayed with 
> not_global_escape_in_scope. It's supposed
> to mean an object whose escape state is not GlobalEscape is in scope.
> 
>> src/hotspot/share/compiler/compileBroker.cpp
>> src/hotspot/share/compiler/compileBroker.hpp
>> Extra thread for DeoptimizeObjectsALot. (Remark: I would have put it into a 
>> follow up change together with the test in order to make this webrev 
>> smaller, but since it is included, I'm reviewing everything at once. Not a 
>> big deal.) Ok.
> 
> Yes the change would be a little smaller. And if it helps I'll split it off. 
> In general I prefer
> patches that bring along a suitable amount of tests.
> 
>> src/hotspot/share/opto/c2compiler.cpp
>> Make do_escape_analysis independent of JVMCI capabilities. Nice!
> 
> It is the main goal of the enhancement. It is done for C2, but could be done 
> for JVMCI compilers
> with just a small effort as well.
> 
>> src/hotspot/share/opto/escape.cpp
>> Annotation for MachSafePointNodes. Your added functionality looks correct.
>> But I'd prefer to move the bulky code out of the large function.
>> I suggest to factor out something like has_not_global_escape and 
>> has_arg_escape. So the code could look like this:
>>        SafePointNode* sfn = sfn_worklist.at(next);
>>        sfn->set_not_global_escape_in_scope(has_not_global_escape(sfn));
>>        if (sfn->is_CallJava()) {
>>          CallJavaNode* call = sfn->as_CallJava();
>>          call->set_arg_escape(has_arg_escape(call));
>>        }
>> This would also allow us to get rid of the found_..._escape_in_args 
>> variables making the loops better readable.
> 
> Done.
> 
>> It's kind of ugly to use strcmp to recognize uncommon trap, but that seems 
>> to be the way to do it (there are more such places). So it's ok.
> 
> Yeah. I copied the snippet.
> 
>> src/hotspot/share/prims/jvmtiImpl.cpp
>> src/hotspot/share/prims/jvmtiImpl.hpp
>> The sequence is pretty complex:
>> VM_GetOrSetLocal element initialization executes EscapeBarrier code which 
>> suspends the target thread (extra VM Operation).
> 
> Note that the target threads have to be suspended already for 
> VM_GetOrSetLocal*. So it's mainly the
> synchronization effect of EscapeBarrier::sync_and_suspend_one() that is 
> required here. Also no extra
> _handshake_ is executed, since sync_and_suspend_one() will find the target 
> threads already
> suspended.
> 
>> VM_GetOrSetLocal::doit_prologue performs object deoptimization (by VM Thread 
>> to prepare VM Operation with frame deoptimization).
>> VM_GetOrSetLocal destructor implicitly calls EscapeBarrier destructor which 
>> resumes the target thread.
>> But I don't have any improvement proposal. Performance is probably not a 
>> concern, here. So it's ok.
> 
>> VM_GetOrSetLocal::deoptimize_objects deoptimizes the top frame if it has 
>> non-globally escaping objects and other frames if they have arg escaping 
>> ones. Good.
> 
> It's not specifically the top frame, but the frame that is accessed.
> 
>> src/hotspot/share/runtime/deoptimization.cpp
>> Object deoptimization. I have more comments and proposals, here.
>> First of all, handling recursive and waiting locks in relock_objects is 
>> tricky, but looks correct.
>> Comments are sufficient to understand why things are done as they are 
>> implemented.
> 
>> BiasedLocking related parts are complex, but we may get rid of them in the 
>> future (with BiasedLocking removal).
>> Anyway, looks correct, too.
> 
>> Typo in comment: "regularily" => "regularly"
> 
>> Deoptimization::fetch_unroll_info_helper is the only place where 
>> _jvmti_deferred_updates get deallocated (except JavaThread destructor). But 
>> I think we always go through it, so I can't see a memory leak or such kind 
>> of issues.
> 
> That's correct. The compiled frame for which deferred updates are allocated 
> is always deoptimized
> before (see EscapeBarrier::deoptimize_objects()). This is also asserted in
> compiledVFrame::update_deferred_value(). I've added the same assertion to
> Deoptimization::relock_objects(). So we can be sure that 
> _jvmti_deferred_updates are deallocated
> again in fetch_unroll_info_helper().
> 
>> EscapeBarrier::deoptimize_objects: ResourceMark should use calling_thread().
> 
> Sure, well spotted!
> 
>> You can use MutexLocker and MonitorLocker with Thread* to save the 
>> Thread::current() call.
> 
> Right, good hint. This was recently introduced with 8235678. I even had to 
> resolve conflicts. Should
> have done this then.
> 
>> I'd make set_objs_are_deoptimized static and remove it from the 
>> EscapeBarrier interface because I think it shouldn't be used outside of 
>> EscapeBarrier::deoptimize_objects.
> 
> Done.
> 
>> Typo in comment: "we must only deoptimize" => "we only have to deoptimize"
> 
> Replaced with "[...] we deoptimize iff local objects are passed as args"
> 
>> "bool EscapeBarrier::deoptimize_objects(intptr_t* fr_id)" is trivial and 
>> barrier_active() is redundant. Implementation can get moved to hpp file.
> 
> Ok. Done.
> 
>> I'll get back to suspend flags, later.
> 
>> There are weird cases regarding _self_deoptimization_in_progress.
>> Assume we have 3 threads A, B and C. A deopts C, B deopts C, C deopts C. C 
>> can set _self_deoptimization_in_progress while A performs the handshake for 
>> suspending C. I think this doesn't lead to errors, but it's probably not 
>> desired.
>> I think it would be better to use only one "wait" call in 
>> sync_and_suspend_one and sync_and_suspend_all.
> 
> You're right. We've discussed that face-to-face, but couldn't find a real 
> issue. But now, thinking again, a reckon I found one:
> 
> 2808   // Sync with other threads that might be doing deoptimizations
> 2809   {
> 2810     // Need to switch to _thread_blocked for the wait() call
> 2811     ThreadBlockInVM tbivm(_calling_thread);
> 2812     MonitorLocker ml(EscapeBarrier_lock, 
> Mutex::_no_safepoint_check_flag);
> 2813     while (_self_deoptimization_in_progress) {
> 2814       ml.wait();
> 2815     }
> 2816
> 2817     if (self_deopt()) {
> 2818       _self_deoptimization_in_progress = true;
> 2819     }
> 2820
> 2821     while (_deoptee_thread->is_ea_obj_deopt_suspend()) {
> 2822       ml.wait();
> 2823     }
> 2824
> 2825     if (self_deopt()) {
> 2826       return;
> 2827     }
> 2828
> 2829     // set suspend flag for target thread
> 2830     _deoptee_thread->set_ea_obj_deopt_flag();
> 2831   }
> 
> - A waits in 2822
> - C is suspended
> - B notifies all in resume_one()
> - A and C wake up
> - C wins over A and sets _self_deoptimization_in_progress = true in 2818
> - C does the self deoptimization
> - A executes 2830 _deoptee_thread->set_ea_obj_deopt_flag()
> 
> C will self suspend at some undefined point. The resulting state is illegal.
> 
>> I first thought it'd be better to move ThreadBlockInVM before wait() to 
>> reduce thread state transitions, but that seems to be problematic because 
>> ThreadBlockInVM destructor contains a safepoint check which we shouldn't do 
>> while holding EscapeBarrier_lock. So no change request.
> 
> Yes, would be nice to have the state change only if needed, but for the 
> reason you mentioned it is
> not quite as easy as it seems to be. I experimented as well with a second 
> lock, but did not succeed.
> 
>> Change in thred_added:
>> I think the sequence would be more comprehensive if we waited for 
>> deopt_all_threads in Thread::start and all other places where a new thread 
>> can run into Java code (e.g. JVMTI attach).
>> Your version makes new threads come up with suspend flag set. That looks 
>> correct, too. Advantage is that you only have to change one place 
>> (thread_added). It'll be interesting to see how it will look like when we 
>> use async handshakes instead of suspend flags.
>> For now, I'm ok with your version.
> 
> I had a version that did what you are suggesting. The current version also 
> has the advantage, that
> there are fewer places where a thread has to wait for ongoing object 
> deoptimization. This means
> viewer places where you have to worry about correct thread state transitions, 
> possible deadlocks,
> and if all oops are properly Handle'ed.
> 
>> I'd only move MutexLocker ml(EscapeBarrier_lock...) after if 
>> (!jt->is_hidden_from_external_view()).
> 
> Done.
> 
>> Having 4 different deoptimize_objects functions makes it a little hard to 
>> keep an overview of which one is used for what.
>> Maybe adding suffixes would help a little bit, but I can also live with what 
>> you have.
>> Implementation looks correct to me.
> 
> 2 are internal. I added the suffix _internal to them. This leaves 2 to choose 
> from.
> 
>> src/hotspot/share/runtime/deoptimization.hpp
>> Escape barriers and object deoptimization functions.
>> Typo in comment: "helt" => "held"
> 
> Done in place already.
> 
>> src/hotspot/share/runtime/interfaceSupport.cpp
>> InterfaceSupport::deoptimizeAllObjects() is only used for 
>> DeoptimizeObjectsALot = 1.
>> I think DeoptimizeObjectsALot = 2 is more important, but I think it's not 
>> bad to have DeoptimizeObjectsALot = 1 in addition. Ok.
> 
> I never used DeoptimizeObjectsALot = 1 that much. It could be more 
> deterministic in single threaded
> scenarios. I wouldn't object to get rid of it though.
> 
>> src/hotspot/share/runtime/stackValue.hpp
>> Better reinitilization in StackValue. Good.
> 
> StackValue::obj_is_scalar_replaced() should not return true after calling 
> set_obj().
> 
>> src/hotspot/share/runtime/thread.cpp
>> src/hotspot/share/runtime/thread.hpp
>> src/hotspot/share/runtime/thread.inline.hpp
>> wait_for_object_deoptimization, suspend flag, deferred updates and test 
>> feature to deoptimize objects.
> 
>> In the long term, we want to get rid of suspend flags, so it's not so nice 
>> to introduce a new one. But I agree with Götz that it should be acceptable 
>> as temporary solution until async handshakes are available (which takes more 
>> time). So I'm ok with your change.
> 
> I'm keen to build the feature on async handshakes when the arive.
> 
>> You can use MutexLocker with Thread*.
> 
> Done.
> 
>> JVMTIDeferredUpdates: I agree with Robin. It'd be nice to move the class out 
>> of thread.hpp.
> 
> Done.
> 
>> src/hotspot/share/runtime/vframe.cpp
>> Added support for entry frame to new_vframe. Ok.
> 
> 
>> src/hotspot/share/runtime/vframe_hp.cpp
>> src/hotspot/share/runtime/vframe_hp.hpp
> 
>> I think code()->as_nmethod() in not_global_escape_in_scope() and 
>> arg_escape() should better be under #ifdef ASSERT or inside the assert 
>> statement (no need for code cache walking in product build).
> 
> Done.
> 
>> jvmtiDeferredLocalVariableSet::update_monitors:
>> Please add a comment explaining that owner referenced by original info may 
>> be scalar replaced, but it is deoptimized in the vframe.
> 
> Done.
> 
> -----Original Message-----
> From: Doerr, Martin <martin.do...@sap.com>
> Sent: Donnerstag, 12. März 2020 17:28
> To: Reingruber, Richard <richard.reingru...@sap.com>; 'Robbin Ehn' 
> <robbin....@oracle.com>; Lindenmaier, Goetz <goetz.lindenma...@sap.com>; 
> David Holmes <david.hol...@oracle.com>; Vladimir Kozlov 
> (vladimir.koz...@oracle.com) <vladimir.koz...@oracle.com>; 
> serviceability-dev@openjdk.java.net; hotspot-compiler-...@openjdk.java.net; 
> hotspot-runtime-...@openjdk.java.net
> Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in 
> the Presence of JVMTI Agents
> 
> Hi Richard,
> 
> 
> I managed to find time for a (almost) complete review of webrev.4. (I'll 
> review the tests separately.)
> 
> First of all, the change seems to be in pretty good quality for its 
> significant complexity. I couldn't find any real bugs. But I'd like to 
> propose minor improvements.
> I'm convinced that it's mature because we did substantial testing.
> 
> I like the new functionality for object deoptimization. It can possibly be 
> reused for future escape analysis based optimizations. So I appreciate having 
> it available in the code base.
> In addition to that, your change makes the JVMTI implementation better 
> integrated into the VM.
> 
> 
> Now to the details:
> 
> 
> src/hotspot/share/c1/c1_IR.hpp
> describe_scope parameters. Ok.
> 
> 
> src/hotspot/share/ci/ciEnv.cpp
> src/hotspot/share/ci/ciEnv.hpp
> Fix for JvmtiExport::can_walk_any_space() capability. Ok.
> 
> 
> src/hotspot/share/code/compiledMethod.cpp
> Nice cleanup!
> 
> 
> src/hotspot/share/code/debugInfoRec.cpp
> src/hotspot/share/code/debugInfoRec.hpp
> Additional parmeters. (Remark: I think "non_global_escape_in_scope" would 
> read better than "not_global_escape_in_scope", but your version is consistent 
> with existing code, so no change request from my side.) Ok.
> 
> 
> src/hotspot/share/code/nmethod.cpp
> Nice cleanup!
> 
> 
> src/hotspot/share/code/pcDesc.hpp
> Additional parameters. Ok.
> 
> 
> src/hotspot/share/code/scopeDesc.cpp
> src/hotspot/share/code/scopeDesc.hpp
> Improved implementation + additional parameters. Ok.
> 
> 
> src/hotspot/share/compiler/compileBroker.cpp
> src/hotspot/share/compiler/compileBroker.hpp
> Extra thread for DeoptimizeObjectsALot. (Remark: I would have put it into a 
> follow up change together with the test in order to make this webrev smaller, 
> but since it is included, I'm reviewing everything at once. Not a big deal.) 
> Ok.
> 
> 
> src/hotspot/share/jvmci/jvmciCodeInstaller.cpp
> Additional parameters. Ok.
> 
> 
> src/hotspot/share/opto/c2compiler.cpp
> Make do_escape_analysis independent of JVMCI capabilities. Nice!
> 
> 
> src/hotspot/share/opto/callnode.hpp
> Additional fields for MachSafePointNodes. Ok.
> 
> 
> src/hotspot/share/opto/escape.cpp
> Annotation for MachSafePointNodes. Your added functionality looks correct.
> But I'd prefer to move the bulky code out of the large function.
> I suggest to factor out something like has_not_global_escape and 
> has_arg_escape. So the code could look like this:
>        SafePointNode* sfn = sfn_worklist.at(next);
>        sfn->set_not_global_escape_in_scope(has_not_global_escape(sfn));
>        if (sfn->is_CallJava()) {
>          CallJavaNode* call = sfn->as_CallJava();
>          call->set_arg_escape(has_arg_escape(call));
>        }
> This would also allow us to get rid of the found_..._escape_in_args variables 
> making the loops better readable.
> 
> It's kind of ugly to use strcmp to recognize uncommon trap, but that seems to 
> be the way to do it (there are more such places). So it's ok.
> 
> 
> src/hotspot/share/opto/machnode.hpp
> Additional fields for MachSafePointNodes. Ok.
> 
> 
> src/hotspot/share/opto/macro.cpp
> Allow elimination of non-escaping allocations. Ok.
> 
> 
> src/hotspot/share/opto/matcher.cpp
> src/hotspot/share/opto/output.cpp
> Copy attribute / pass parameters. Ok.
> 
> 
> src/hotspot/share/prims/jvmtiCodeBlobEvents.cpp
> Nice cleanup!
> 
> 
> src/hotspot/share/prims/jvmtiEnv.cpp
> src/hotspot/share/prims/jvmtiEnvBase.cpp
> Escape barriers + deoptimize objects for target thread. Good.
> 
> 
> src/hotspot/share/prims/jvmtiImpl.cpp
> src/hotspot/share/prims/jvmtiImpl.hpp
> The sequence is pretty complex:
> VM_GetOrSetLocal element initialization executes EscapeBarrier code which 
> suspends the target thread (extra VM Operation).
> VM_GetOrSetLocal::doit_prologue performs object deoptimization (by VM Thread 
> to prepare VM Operation with frame deoptimization).
> VM_GetOrSetLocal destructor implicitly calls EscapeBarrier destructor which 
> resumes the target thread.
> But I don't have any improvement proposal. Performance is probably not a 
> concern, here. So it's ok.
> 
> VM_GetOrSetLocal::deoptimize_objects deoptimizes the top frame if it has 
> non-globally escaping objects and other frames if they have arg escaping 
> ones. Good.
> 
> 
> src/hotspot/share/prims/jvmtiTagMap.cpp
> Escape barriers + deoptimize objects for all threads. Ok.
> 
> 
> src/hotspot/share/prims/whitebox.cpp
> Added WB_IsFrameDeoptimized to API. Ok.
> 
> 
> src/hotspot/share/runtime/deoptimization.cpp
> Object deoptimization. I have more comments and proposals, here.
> First of all, handling recursive and waiting locks in relock_objects is 
> tricky, but looks correct.
> Comments are sufficient to understand why things are done as they are 
> implemented.
> 
> BiasedLocking related parts are complex, but we may get rid of them in the 
> future (with BiasedLocking removal).
> Anyway, looks correct, too.
> 
> Typo in comment: "regularily" => "regularly"
> 
> Deoptimization::fetch_unroll_info_helper is the only place where 
> _jvmti_deferred_updates get deallocated (except JavaThread destructor). But I 
> think we always go through it, so I can't see a memory leak or such kind of 
> issues.
> 
> EscapeBarrier::deoptimize_objects: ResourceMark should use calling_thread().
> 
> You can use MutexLocker and MonitorLocker with Thread* to save the 
> Thread::current() call.
> 
> I'd make set_objs_are_deoptimized static and remove it from the EscapeBarrier 
> interface because I think it shouldn't be used outside of 
> EscapeBarrier::deoptimize_objects.
> 
> Typo in comment: "we must only deoptimize" => "we only have to deoptimize"
> 
> "bool EscapeBarrier::deoptimize_objects(intptr_t* fr_id)" is trivial and 
> barrier_active() is redundant. Implementation can get moved to hpp file.
> 
> I'll get back to suspend flags, later.
> 
> There are weird cases regarding _self_deoptimization_in_progress.
> Assume we have 3 threads A, B and C. A deopts C, B deopts C, C deopts C. C 
> can set _self_deoptimization_in_progress while A performs the handshake for 
> suspending C. I think this doesn't lead to errors, but it's probably not 
> desired.
> I think it would be better to use only one "wait" call in 
> sync_and_suspend_one and sync_and_suspend_all.
> 
> I first thought it'd be better to move ThreadBlockInVM before wait() to 
> reduce thread state transitions, but that seems to be problematic because 
> ThreadBlockInVM destructor contains a safepoint check which we shouldn't do 
> while holding EscapeBarrier_lock. So no change request.
> 
> Change in thred_added:
> I think the sequence would be more comprehensive if we waited for 
> deopt_all_threads in Thread::start and all other places where a new thread 
> can run into Java code (e.g. JVMTI attach).
> Your version makes new threads come up with suspend flag set. That looks 
> correct, too. Advantage is that you only have to change one place 
> (thread_added). It'll be interesting to see how it will look like when we use 
> async handshakes instead of suspend flags.
> For now, I'm ok with your version.
> 
> I'd only move MutexLocker ml(EscapeBarrier_lock...) after if 
> (!jt->is_hidden_from_external_view()).
> 
> Having 4 different deoptimize_objects functions makes it a little hard to 
> keep an overview of which one is used for what.
> Maybe adding suffixes would help a little bit, but I can also live with what 
> you have.
> Implementation looks correct to me.
> 
> 
> src/hotspot/share/runtime/deoptimization.hpp
> Escape barriers and object deoptimization functions.
> Typo in comment: "helt" => "held"
> 
> 
> src/hotspot/share/runtime/globals.hpp
> Addition of develop flag DeoptimizeObjectsALotInterval. Ok.
> 
> 
> src/hotspot/share/runtime/interfaceSupport.cpp
> InterfaceSupport::deoptimizeAllObjects() is only used for 
> DeoptimizeObjectsALot = 1.
> I think DeoptimizeObjectsALot = 2 is more important, but I think it's not bad 
> to have DeoptimizeObjectsALot = 1 in addition. Ok.
> 
> 
> src/hotspot/share/runtime/interfaceSupport.inline.hpp
> Addition of deoptimizeAllObjects. Ok.
> 
> 
> src/hotspot/share/runtime/mutexLocker.cpp
> src/hotspot/share/runtime/mutexLocker.hpp
> Addition of EscapeBarrier_lock. Ok.
> 
> 
> src/hotspot/share/runtime/objectMonitor.cpp
> Make recursion count relock aware. Ok.
> 
> 
> src/hotspot/share/runtime/stackValue.hpp
> Better reinitilization in StackValue. Good.
> 
> 
> src/hotspot/share/runtime/thread.cpp
> src/hotspot/share/runtime/thread.hpp
> src/hotspot/share/runtime/thread.inline.hpp
> wait_for_object_deoptimization, suspend flag, deferred updates and test 
> feature to deoptimize objects.
> 
> In the long term, we want to get rid of suspend flags, so it's not so nice to 
> introduce a new one. But I agree with Götz that it should be acceptable as 
> temporary solution until async handshakes are available (which takes more 
> time). So I'm ok with your change.
> 
> You can use MutexLocker with Thread*.
> 
> JVMTIDeferredUpdates: I agree with Robin. It'd be nice to move the class out 
> of thread.hpp.
> 
> 
> src/hotspot/share/runtime/vframe.cpp
> Added support for entry frame to new_vframe. Ok.
> 
> 
> src/hotspot/share/runtime/vframe_hp.cpp
> src/hotspot/share/runtime/vframe_hp.hpp
> 
> I think code()->as_nmethod() in not_global_escape_in_scope() and arg_escape() 
> should better be under #ifdef ASSERT or inside the assert statement (no need 
> for code cache walking in product build).
> 
> jvmtiDeferredLocalVariableSet::update_monitors:
> Please add a comment explaining that owner referenced by original info may be 
> scalar replaced, but it is deoptimized in the vframe.
> 
> 
> src/hotspot/share/utilities/macros.hpp
> Addition of NOT_COMPILER2_OR_JVMCI_RETURN macros. Ok.
> 
> 
> test/hotspot/jtreg/serviceability/jvmti/Heap/IterateHeapWithEscapeAnalysisEnabled.java
> test/hotspot/jtreg/serviceability/jvmti/Heap/libIterateHeapWithEscapeAnalysisEnabled.c
> New test. Will review separately.
> 
> 
> test/jdk/TEST.ROOT
> Addition of vm.jvmci as required property. Ok.
> 
> 
> test/jdk/com/sun/jdi/EATests.java
> test/jdk/com/sun/jdi/EATestsJVMCI.java
> New test. Will review separately.
> 
> 
> test/lib/sun/hotspot/WhiteBox.java
> Added isFrameDeoptimized to API. Ok.
> 
> 
> That was it. Best regards,
> Martin
> 
> 
>> -----Original Message-----
>> From: hotspot-compiler-dev <hotspot-compiler-dev-
>> boun...@openjdk.java.net> On Behalf Of Reingruber, Richard
>> Sent: Dienstag, 3. März 2020 21:23
>> To: 'Robbin Ehn' <robbin....@oracle.com>; Lindenmaier, Goetz
>> <goetz.lindenma...@sap.com>; David Holmes <david.hol...@oracle.com>;
>> Vladimir Kozlov (vladimir.koz...@oracle.com)
>> <vladimir.koz...@oracle.com>; serviceability-dev@openjdk.java.net;
>> hotspot-compiler-...@openjdk.java.net; hotspot-runtime-
>> d...@openjdk.java.net
>> Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better
>> Performance in the Presence of JVMTI Agents
>>
>> Hi Robbin,
>>
>>>> I understand that Robbin proposed to replace the usage of
>>>> _suspend_flag with handshakes. Apparently, async handshakes
>>>> are needed to do so. We have been waiting a while for removal
>>>> of the _suspend_flag / introduction of async handshakes [2].
>>>> What is the status here?
>>
>>> I have an old prototype which I would like to continue to work on.
>>> So do not assume asynch handshakes will make 15.
>>> Even if it would, I think there are a lot more investigate work to remove
>>> _suspend_flag.
>>
>> Let us know, if we can be of any help to you and be it only testing.
>>
>>>>> Full:
>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4/
>>
>>> DeoptimizeObjectsALotThread is only used in compileBroker.cpp.
>>> You can move both declaration and definition to that file, no need to
>> clobber
>>> thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry)
>>
>> Will do.
>>
>>> Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in it's
>> own
>>> hpp file? It doesn't seem right to add JVM TI classes into thread.hpp.
>>
>> You are right. It shouldn't be declared in thread.hpp. I will look into that.
>>
>>> Note that we also think we may have a bug in deopt:
>>> https://bugs.openjdk.java.net/browse/JDK-8238237
>>
>>> I think it would be best, if possible, to push after that is resolved.
>>
>> Sure.
>>
>>> Not even nearly a full review :)
>>
>> I know :)
>>
>> Anyways, thanks a lot,
>> Richard.
>>
>>
>> -----Original Message-----
>> From: Robbin Ehn <robbin....@oracle.com>
>> Sent: Monday, March 2, 2020 11:17 AM
>> To: Lindenmaier, Goetz <goetz.lindenma...@sap.com>; Reingruber, Richard
>> <richard.reingru...@sap.com>; David Holmes <david.hol...@oracle.com>;
>> Vladimir Kozlov (vladimir.koz...@oracle.com)
>> <vladimir.koz...@oracle.com>; serviceability-dev@openjdk.java.net;
>> hotspot-compiler-...@openjdk.java.net; hotspot-runtime-
>> d...@openjdk.java.net
>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance
>> in the Presence of JVMTI Agents
>>
>> Hi,
>>
>> On 2/24/20 5:39 PM, Lindenmaier, Goetz wrote:
>>> Hi,
>>>
>>> I had a look at the progress of this change. Nothing
>>> happened since Richard posted his update using more
>>> handshakes [1].
>>> But we (SAP) would appreciate a lot if this change could
>>> be successfully reviewed and pushed.
>>>
>>> I think there is basic understanding that this
>>> change is helpful. It fixes a number of issues with JVMTI,
>>> and will deliver the same performance benefits as EA
>>> does in current production mode for debugging scenarios.
>>>
>>> This is important for us as we run our VMs prepared
>>> for debugging in production mode.
>>>
>>> I understand that Robbin proposed to replace the usage of
>>> _suspend_flag with handshakes. Apparently, async handshakes
>>> are needed to do so. We have been waiting a while for removal
>>> of the _suspend_flag / introduction of async handshakes [2].
>>> What is the status here?
>>
>> I have an old prototype which I would like to continue to work on.
>> So do not assume asynch handshakes will make 15.
>> Even if it would, I think there are a lot more investigate work to remove
>> _suspend_flag.
>>
>>>
>>> I think we should no longer wait, but proceed with
>>> this change. We will look into removing the usage of
>>> suspend_flag introduced here once it is possible to implement
>>> it with handshakes.
>>
>> Yes, sure.
>>
>>>> Full: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4/
>>
>> DeoptimizeObjectsALotThread is only used in compileBroker.cpp.
>> You can move both declaration and definition to that file, no need to clobber
>> thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry)
>>
>> Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in it's
>> own
>> hpp file? It doesn't seem right to add JVM TI classes into thread.hpp.
>>
>> Note that we also think we may have a bug in deopt:
>> https://bugs.openjdk.java.net/browse/JDK-8238237
>>
>> I think it would be best, if possible, to push after that is resolved.
>>
>> Not even nearly a full review :)
>>
>> Thanks, Robbin
>>
>>
>>>> Incremental:
>>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4.inc/
>>>>
>>>> I was not able to eliminate the additional suspend flag now. I'll take care
>> of this
>>>> as soon as the
>>>> existing suspend-resume-mechanism is reworked.
>>>>
>>>> Testing:
>>>>
>>>> Nightly tests @SAP:
>>>>
>>>>     JCK and JTREG, also in Xcomp mode, SPECjvm2008, SPECjbb2015,
>> Renaissance
>>>> Suite, SAP specific tests
>>>>     with fastdebug and release builds on all platforms
>>>>
>>>>     Stress testing with DeoptimizeObjectsALot running SPECjvm2008 40x
>> parallel
>>>> for 24h
>>>>
>>>> Thanks, Richard.
>>>>
>>>>
>>>> More details on the changes:
>>>>
>>>> * Hide DeoptimizeObjectsALotThread from external view.
>>>>
>>>> * Changed EscapeBarrier_lock to be a _safepoint_check_never lock.
>>>>     It used to be _safepoint_check_sometimes, which will be eliminated
>> sooner or
>>>> later.
>>>>     I added explicit thread state changes with ThreadBlockInVM to code
>> paths
>>>> where we can wait()
>>>>     on EscapeBarrier_lock to become safepoint safe.
>>>>
>>>> * Use handshake EscapeBarrierSuspendHandshake to suspend target
>> threads
>>>> instead of vm operation
>>>>     VM_ThreadSuspendAllForObjDeopt.
>>>>
>>>> * Removed uses of Threads_lock. When adding a new thread we suspend
>> it iff
>>>> EA optimizations are
>>>>     being reverted. In the previous version we were waiting on
>> Threads_lock
>>>> while EA optimizations
>>>>     were reverted. See EscapeBarrier::thread_added().
>>>>
>>>> * Made tests require Xmixed compilation mode.
>>>>
>>>> * Made tests agnostic regarding tiered compilation.
>>>>     I.e. tc isn't disabled anymore, and the tests can be run with tc 
>>>> enabled or
>>>> disabled.
>>>>
>>>> * Exercising EATests.java as well with stress test options
>>>> DeoptimizeObjectsALot*
>>>>     Due to the non-deterministic deoptimizations some tests need to be
>> skipped.
>>>>     We do this to prevent bit-rot of the stress test code.
>>>>
>>>> * Executing EATests.java as well with graal if available. Driver for this 
>>>> is
>>>>     EATestsJVMCI.java. Graal cannot pass all tests, because it does not
>> provide all
>>>> the new debug info
>>>>     (namely not_global_escape_in_scope and arg_escape in
>> scopeDesc.hpp).
>>>>     And graal does not yet support the JVMTI operations force early return
>> and
>>>> pop frame.
>>>>
>>>> * Removed tracing from new jdi tests in EATests.java. Too much trace
>> output
>>>> before the debugging
>>>>     connection is established can cause deadlock because output buffers 
>>>> fill
>> up.
>>>>     (See https://bugs.openjdk.java.net/browse/JDK-8173304)
>>>>
>>>> * Many copyright year changes and smaller clean-up changes of testing
>> code
>>>> (trailing white-space and
>>>>     the like).
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes <david.hol...@oracle.com>
>>>> Sent: Donnerstag, 19. Dezember 2019 03:12
>>>> To: Reingruber, Richard <richard.reingru...@sap.com>; serviceability-
>>>> d...@openjdk.java.net; hotspot-compiler-...@openjdk.java.net;
>> hotspot-
>>>> runtime-...@openjdk.java.net; Vladimir Kozlov
>> (vladimir.koz...@oracle.com)
>>>> <vladimir.koz...@oracle.com>
>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
>> Performance in
>>>> the Presence of JVMTI Agents
>>>>
>>>> Hi Richard,
>>>>
>>>> I think my issue is with the way EliminateNestedLocks works so I'm going
>>>> to look into that more deeply.
>>>>
>>>> Thanks for the explanations.
>>>>
>>>> David
>>>>
>>>> On 18/12/2019 12:47 am, Reingruber, Richard wrote:
>>>>> Hi David,
>>>>>
>>>>>      > >    > Some further queries/concerns:
>>>>>      > >    >
>>>>>      > >    > src/hotspot/share/runtime/objectMonitor.cpp
>>>>>      > >    >
>>>>>      > >    > Can you please explain the changes to ObjectMonitor::wait:
>>>>>      > >    >
>>>>>      > >    > !   _recursions = save      // restore the old recursion 
>>>>> count
>>>>>      > >    > !                 + 
>>>>> jt->get_and_reset_relock_count_after_wait(); //
>>>>>      > >    > increased by the deferred relock count
>>>>>      > >    >
>>>>>      > >    > what is the "deferred relock count"? I gather it relates to
>>>>>      > >    >
>>>>>      > >    > "The code was extended to be able to deoptimize objects of a
>>>>>      > > frame that
>>>>>      > >    > is not the top frame and to let another thread than the 
>>>>> owning
>>>>>      > > thread do
>>>>>      > >    > it."
>>>>>      > >
>>>>>      > > Yes, these relate. Currently EA based optimizations are reverted,
>> when a
>>>> compiled frame is
>>>>>      > > replaced with corresponding interpreter frames. Part of this is
>> relocking
>>>> objects with eliminated
>>>>>      > > locking. New with the enhancement is that we do this also just
>> before
>>>> object references are
>>>>>      > > acquired through JVMTI. In this case we deoptimize also the
>> owning
>>>> compiled frame C and we
>>>>>      > > register deoptimized objects as deferred updates. When control
>> returns
>>>> to C it gets deoptimized,
>>>>>      > > we notice that objects are already deoptimized (reallocated and
>>>> relocked), so we don't do it again
>>>>>      > > (relocking twice would be incorrect of course). Deferred updates
>> are
>>>> copied into the new
>>>>>      > > interpreter frames.
>>>>>      > >
>>>>>      > > Problem: relocking is not possible if the target thread T is 
>>>>> waiting
>> on the
>>>> monitor that needs to
>>>>>      > > be relocked. This happens only with non-local objects with
>>>> EliminateNestedLocks. Instead relocking
>>>>>      > > is deferred until T owns the monitor again. This is what the 
>>>>> piece of
>>>> code above does.
>>>>>      >
>>>>>      >  Sorry I need some more detail here. How can you wait() on an
>> object
>>>>>      >  monitor if the object allocation and/or locking was optimised 
>>>>> away?
>> And
>>>>>      >  what is a "non-local object" in this context? Isn't EA restricted 
>>>>> to
>>>>>      >  thread-confined objects?
>>>>>
>>>>> "Non-local object" is an object that escapes its thread. The issue I'm
>>>> addressing with the changes
>>>>> in ObjectMonitor::wait are almost unrelated to EA. They are caused by
>>>> EliminateNestedLocks, where C2
>>>>> eliminates recursive locking of an already owned lock. The lock owning
>> object
>>>> exists on the heap, it
>>>>> is locked and you can call wait() on it.
>>>>>
>>>>> EliminateLocks is the C2 option that controls lock elimination based on
>> EA.
>>>> Both optimizations have
>>>>> in common that objects with eliminated locking need to be relocked
>> when
>>>> deoptimizing a frame,
>>>>> i.e. when replacing a compiled frame with equivalent interpreter
>>>>> frames. Deoptimization::relock_objects does that job for /all/ eliminated
>>>> locks in scope. /All/ can
>>>>> be a mix of eliminated nested locks and locks of not-escaping objects.
>>>>>
>>>>> New with the enhancement: I call relock_objects earlier, just before
>> objects
>>>> pontentially
>>>>> escape. But then later when the owning compiled frame gets
>> deoptimized, I
>>>> must not do it again:
>>>>>
>>>>> See call to EscapeBarrier::objs_are_deoptimized in deoptimization.cpp:
>>>>>
>>>>>     373   if ((jvmci_enabled || ((DoEscapeAnalysis ||
>> EliminateNestedLocks) &&
>>>> EliminateLocks))
>>>>>     374       && !EscapeBarrier::objs_are_deoptimized(thread,
>> deoptee.id())) {
>>>>>     375     bool unused;
>>>>>     376     eliminate_locks(thread, chunk, realloc_failures, deoptee,
>> exec_mode,
>>>> unused);
>>>>>     377   }
>>>>>
>>>>> Now when calling relock_objects early it is quiet possible that I have to
>> relock
>>>> an object the
>>>>> target thread currently waits for. Obviously I cannot relock in this case,
>>>> instead I chose to
>>>>> introduce relock_count_after_wait to JavaThread.
>>>>>
>>>>>      >  Is it just that some of the locking gets optimized away e.g.
>>>>>      >
>>>>>      >  synchronised(obj) {
>>>>>      >     synchronised(obj) {
>>>>>      >       synchronised(obj) {
>>>>>      >         obj.wait();
>>>>>      >       }
>>>>>      >     }
>>>>>      >  }
>>>>>      >
>>>>>      >  If this is reduced to a form as-if it were a single lock of the 
>>>>> monitor
>>>>>      >  (due to EA) and the wait() triggers a JVM TI event which leads to 
>>>>> the
>>>>>      >  escape of "obj" then we need to reconstruct the true lock state, 
>>>>> and
>> so
>>>>>      >  when the wait() internally unblocks and reacquires the monitor it
>> has to
>>>>>      >  set the true recursion count to 3, not the 1 that it appeared to 
>>>>> be
>> when
>>>>>      >  wait() was initially called. Is that the scenario?
>>>>>
>>>>> Kind of... except that the locking is not eliminated due to EA and there 
>>>>> is
>> no
>>>> JVM TI event
>>>>> triggered by wait.
>>>>>
>>>>> Add
>>>>>
>>>>> LocalObject l1 = new LocalObject();
>>>>>
>>>>> in front of the synchrnized blocks and assume a JVM TI agent acquires l1.
>> This
>>>> triggers the code in
>>>>> question.
>>>>>
>>>>> See that relocking/reallocating is transactional. If it is done then for 
>>>>> /all/
>>>> objects in scope and it is
>>>>> done at most once. It wouldn't be quite so easy to split this in relocking
>> of
>>>> nested/EA-based
>>>>> eliminated locks.
>>>>>
>>>>>      >  If so I find this truly awful. Anyone using wait() in a realistic 
>>>>> form
>>>>>      >  requires a notification and so the object cannot be thread 
>>>>> confined.
>> In
>>>>>
>>>>> It is not thread confined.
>>>>>
>>>>>      >  which case I would strongly argue that upon hitting the wait() the
>> deopt
>>>>>      >  should occur unconditionally and so the lock state is correct 
>>>>> before
>> we
>>>>>      >  wait and so we don't need to mess with the recursion count
>> internally
>>>>>      >  when we reacquire the monitor.
>>>>>      >
>>>>>      > >
>>>>>      > >    > which I don't like the sound of at all when it comes to
>> ObjectMonitor
>>>>>      > >    > state. So I'd like to understand in detail exactly what is 
>>>>> going on
>> here
>>>>>      > >    > and why.  This is a very intrusive change that seems to 
>>>>> badly
>> break
>>>>>      > >    > encapsulation and impacts future changes to ObjectMonitor
>> that are
>>>> under
>>>>>      > >    > investigation.
>>>>>      > >
>>>>>      > > I would not regard this as breaking encapsulation. Certainly not
>> badly.
>>>>>      > >
>>>>>      > > I've added a property relock_count_after_wait to JavaThread. The
>>>> property is well
>>>>>      > > encapsulated. Future ObjectMonitor implementations have to deal
>> with
>>>> recursion too. They are free
>>>>>      > > in choosing a way to do that as long as that property is taken 
>>>>> into
>>>> account. This is hardly a
>>>>>      > > limitation.
>>>>>      >
>>>>>      >  I do think this badly breaks encapsulation as you have to add a
>> callout
>>>>>      >  from the guts of the ObjectMonitor code to reach into the thread 
>>>>> to
>> get
>>>>>      >  this lock count adjustment. I understand why you have had to do
>> this but
>>>>>      >  I would much rather see a change to the EA optimisation strategy 
>>>>> so
>> that
>>>>>      >  this is not needed.
>>>>>      >
>>>>>      > > Note also that the property is a straight forward extension of 
>>>>> the
>>>> existing concept of deferred
>>>>>      > > local updates. It is embedded into the structure holding them. So
>> not
>>>> even the footprint of a
>>>>>      > > JavaThread is enlarged if no deferred updates are generated.
>>>>>      >
>>>>>      > [...]
>>>>>      >
>>>>>      > >
>>>>>      > > I'm actually duplicating the existing external suspend mechanism,
>>>> because a thread can be
>>>>>      > > suspended at most once. And hey, and don't like that either! But 
>>>>> it
>>>> seems not unlikely that the
>>>>>      > > duplicate can be removed together with the original and the new
>> type
>>>> of handshakes that will be
>>>>>      > > used for thread suspend can be used for object deoptimization
>> too. See
>>>> today's discussion in
>>>>>      > > JDK-8227745 [2].
>>>>>      >
>>>>>      >  I hope that discussion bears some fruit, at the moment it seems 
>>>>> not
>> to
>>>>>      >  be possible to use handshakes here. :(
>>>>>      >
>>>>>      >  The external suspend mechanism is a royal pain in the proverbial
>> that we
>>>>>      >  have to carefully live with. The idea that we're duplicating that 
>>>>> for
>>>>>      >  use in another fringe area of functionality does not thrill me at 
>>>>> all.
>>>>>      >
>>>>>      >  To be clear, I understand the problem that exists and that you 
>>>>> wish
>> to
>>>>>      >  solve, but for the runtime parts I balk at the complexity cost of
>>>>>      >  solving it.
>>>>>
>>>>> I know it's complex, but by far no rocket science.
>>>>>
>>>>> Also I find it hard to imagine another fix for JDK-8233915 besides
>> changing
>>>> the JVM TI specification.
>>>>>
>>>>> Thanks, Richard.
>>>>>
>>>>> -----Original Message-----
>>>>> From: David Holmes <david.hol...@oracle.com>
>>>>> Sent: Dienstag, 17. Dezember 2019 08:03
>>>>> To: Reingruber, Richard <richard.reingru...@sap.com>; serviceability-
>>>> d...@openjdk.java.net; hotspot-compiler-...@openjdk.java.net;
>> hotspot-
>>>> runtime-...@openjdk.java.net; Vladimir Kozlov
>> (vladimir.koz...@oracle.com)
>>>> <vladimir.koz...@oracle.com>
>>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
>> Performance
>>>> in the Presence of JVMTI Agents
>>>>>
>>>>> <resend as my mailer crashed during last send>
>>>>>
>>>>> David
>>>>>
>>>>> On 17/12/2019 4:57 pm, David Holmes wrote:
>>>>>> Hi Richard,
>>>>>>
>>>>>> On 14/12/2019 5:01 am, Reingruber, Richard wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>>       > Some further queries/concerns:
>>>>>>>       >
>>>>>>>       > src/hotspot/share/runtime/objectMonitor.cpp
>>>>>>>       >
>>>>>>>       > Can you please explain the changes to ObjectMonitor::wait:
>>>>>>>       >
>>>>>>>       > !   _recursions = save      // restore the old recursion count
>>>>>>>       > !                 + 
>>>>>>> jt->get_and_reset_relock_count_after_wait(); //
>>>>>>>       > increased by the deferred relock count
>>>>>>>       >
>>>>>>>       > what is the "deferred relock count"? I gather it relates to
>>>>>>>       >
>>>>>>>       > "The code was extended to be able to deoptimize objects of a
>>>>>>> frame that
>>>>>>>       > is not the top frame and to let another thread than the owning
>>>>>>> thread do
>>>>>>>       > it."
>>>>>>>
>>>>>>> Yes, these relate. Currently EA based optimizations are reverted,
>> when
>>>>>>> a compiled frame is replaced
>>>>>>> with corresponding interpreter frames. Part of this is relocking
>>>>>>> objects with eliminated
>>>>>>> locking. New with the enhancement is that we do this also just before
>>>>>>> object references are acquired
>>>>>>> through JVMTI. In this case we deoptimize also the owning compiled
>>>>>>> frame C and we register
>>>>>>> deoptimized objects as deferred updates. When control returns to C
>> it
>>>>>>> gets deoptimized, we notice
>>>>>>> that objects are already deoptimized (reallocated and relocked), so
>> we
>>>>>>> don't do it again (relocking
>>>>>>> twice would be incorrect of course). Deferred updates are copied into
>>>>>>> the new interpreter frames.
>>>>>>>
>>>>>>> Problem: relocking is not possible if the target thread T is waiting
>>>>>>> on the monitor that needs to be
>>>>>>> relocked. This happens only with non-local objects with
>>>>>>> EliminateNestedLocks. Instead relocking is
>>>>>>> deferred until T owns the monitor again. This is what the piece of
>>>>>>> code above does.
>>>>>>
>>>>>> Sorry I need some more detail here. How can you wait() on an object
>>>>>> monitor if the object allocation and/or locking was optimised away?
>> And
>>>>>> what is a "non-local object" in this context? Isn't EA restricted to
>>>>>> thread-confined objects?
>>>>>>
>>>>>> Is it just that some of the locking gets optimized away e.g.
>>>>>>
>>>>>> synchronised(obj) {
>>>>>>       synchronised(obj) {
>>>>>>         synchronised(obj) {
>>>>>>           obj.wait();
>>>>>>         }
>>>>>>       }
>>>>>> }
>>>>>>
>>>>>> If this is reduced to a form as-if it were a single lock of the monitor
>>>>>> (due to EA) and the wait() triggers a JVM TI event which leads to the
>>>>>> escape of "obj" then we need to reconstruct the true lock state, and so
>>>>>> when the wait() internally unblocks and reacquires the monitor it has to
>>>>>> set the true recursion count to 3, not the 1 that it appeared to be when
>>>>>> wait() was initially called. Is that the scenario?
>>>>>>
>>>>>> If so I find this truly awful. Anyone using wait() in a realistic form
>>>>>> requires a notification and so the object cannot be thread confined. In
>>>>>> which case I would strongly argue that upon hitting the wait() the
>> deopt
>>>>>> should occur unconditionally and so the lock state is correct before we
>>>>>> wait and so we don't need to mess with the recursion count internally
>>>>>> when we reacquire the monitor.
>>>>>>
>>>>>>>
>>>>>>>       > which I don't like the sound of at all when it comes to
>>>>>>> ObjectMonitor
>>>>>>>       > state. So I'd like to understand in detail exactly what is going
>>>>>>> on here
>>>>>>>       > and why.  This is a very intrusive change that seems to badly
>> break
>>>>>>>       > encapsulation and impacts future changes to ObjectMonitor that
>>>>>>> are under
>>>>>>>       > investigation.
>>>>>>>
>>>>>>> I would not regard this as breaking encapsulation. Certainly not badly.
>>>>>>>
>>>>>>> I've added a property relock_count_after_wait to JavaThread. The
>>>>>>> property is well
>>>>>>> encapsulated. Future ObjectMonitor implementations have to deal
>> with
>>>>>>> recursion too. They are free in
>>>>>>> choosing a way to do that as long as that property is taken into
>>>>>>> account. This is hardly a
>>>>>>> limitation.
>>>>>>
>>>>>> I do think this badly breaks encapsulation as you have to add a callout
>>>>>> from the guts of the ObjectMonitor code to reach into the thread to
>> get
>>>>>> this lock count adjustment. I understand why you have had to do this
>> but
>>>>>> I would much rather see a change to the EA optimisation strategy so
>> that
>>>>>> this is not needed.
>>>>>>
>>>>>>> Note also that the property is a straight forward extension of the
>>>>>>> existing concept of deferred
>>>>>>> local updates. It is embedded into the structure holding them. So not
>>>>>>> even the footprint of a
>>>>>>> JavaThread is enlarged if no deferred updates are generated.
>>>>>>>
>>>>>>>       > ---
>>>>>>>       >
>>>>>>>       > src/hotspot/share/runtime/thread.cpp
>>>>>>>       >
>>>>>>>       > Can you please explain why
>>>>>>> JavaThread::wait_for_object_deoptimization
>>>>>>>       > has to be handcrafted in this way rather than using proper
>>>>>>> transitions.
>>>>>>>       >
>>>>>>>
>>>>>>> I wrote wait_for_object_deoptimization taking
>>>>>>> JavaThread::java_suspend_self_with_safepoint_check
>>>>>>> as template. So in short: for the same reasons :)
>>>>>>>
>>>>>>> Threads reach both methods as part of thread state transitions,
>>>>>>> therefore special handling is
>>>>>>> required to change thread state on top of ongoing transitions.
>>>>>>>
>>>>>>>       > We got rid of "deopt suspend" some time ago and it is disturbing
>>>>>>> to see
>>>>>>>       > it being added back (effectively). This seems like it may be
>>>>>>> something
>>>>>>>       > that handshakes could be used for.
>>>>>>>
>>>>>>> Deopt suspend used to be something rather different with a similar
>>>>>>> name[1]. It is not being added back.
>>>>>>
>>>>>> I stand corrected. Despite comments in the code to the contrary
>>>>>> deopt_suspend didn't actually cause a self-suspend. I was doing a lot of
>>>>>> cleanup in this area 13 years ago :)
>>>>>>
>>>>>>>
>>>>>>> I'm actually duplicating the existing external suspend mechanism,
>>>>>>> because a thread can be suspended
>>>>>>> at most once. And hey, and don't like that either! But it seems not
>>>>>>> unlikely that the duplicate can
>>>>>>> be removed together with the original and the new type of
>> handshakes
>>>>>>> that will be used for
>>>>>>> thread suspend can be used for object deoptimization too. See
>> today's
>>>>>>> discussion in JDK-8227745 [2].
>>>>>>
>>>>>> I hope that discussion bears some fruit, at the moment it seems not to
>>>>>> be possible to use handshakes here. :(
>>>>>>
>>>>>> The external suspend mechanism is a royal pain in the proverbial that
>> we
>>>>>> have to carefully live with. The idea that we're duplicating that for
>>>>>> use in another fringe area of functionality does not thrill me at all.
>>>>>>
>>>>>> To be clear, I understand the problem that exists and that you wish to
>>>>>> solve, but for the runtime parts I balk at the complexity cost of
>>>>>> solving it.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> Thanks, Richard.
>>>>>>>
>>>>>>> [1] Deopt suspend was something like an async. handshake for
>>>>>>> architectures with register windows,
>>>>>>>         where patching the return pc for deoptimization of a compiled
>>>>>>> frame was racy if the owner thread
>>>>>>>         was in native code. Instead a "deopt" suspend flag was set on
>>>>>>> which the thread patched its own
>>>>>>>         frame upon return from native. So no thread was suspended. It
>> got
>>>>>>> its name only from the name of
>>>>>>>         the flags.
>>>>>>>
>>>>>>> [2] Discussion about using handshakes to sync. with the target thread:
>>>>>>>
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-
>>>>
>> 8227745?focusedCommentId=14306727&page=com.atlassian.jira.plugin.syst
>> e
>>>> m.issuetabpanels:comment-tabpanel#comment-14306727
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: David Holmes <david.hol...@oracle.com>
>>>>>>> Sent: Freitag, 13. Dezember 2019 00:56
>>>>>>> To: Reingruber, Richard <richard.reingru...@sap.com>;
>>>>>>> serviceability-dev@openjdk.java.net;
>>>>>>> hotspot-compiler-...@openjdk.java.net;
>>>>>>> hotspot-runtime-...@openjdk.java.net
>>>>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
>>>>>>> Performance in the Presence of JVMTI Agents
>>>>>>>
>>>>>>> Hi Richard,
>>>>>>>
>>>>>>> Some further queries/concerns:
>>>>>>>
>>>>>>> src/hotspot/share/runtime/objectMonitor.cpp
>>>>>>>
>>>>>>> Can you please explain the changes to ObjectMonitor::wait:
>>>>>>>
>>>>>>> !   _recursions = save      // restore the old recursion count
>>>>>>> !                 + jt->get_and_reset_relock_count_after_wait(); //
>>>>>>> increased by the deferred relock count
>>>>>>>
>>>>>>> what is the "deferred relock count"? I gather it relates to
>>>>>>>
>>>>>>> "The code was extended to be able to deoptimize objects of a frame
>> that
>>>>>>> is not the top frame and to let another thread than the owning thread
>> do
>>>>>>> it."
>>>>>>>
>>>>>>> which I don't like the sound of at all when it comes to ObjectMonitor
>>>>>>> state. So I'd like to understand in detail exactly what is going on here
>>>>>>> and why.  This is a very intrusive change that seems to badly break
>>>>>>> encapsulation and impacts future changes to ObjectMonitor that are
>> under
>>>>>>> investigation.
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> src/hotspot/share/runtime/thread.cpp
>>>>>>>
>>>>>>> Can you please explain why
>> JavaThread::wait_for_object_deoptimization
>>>>>>> has to be handcrafted in this way rather than using proper transitions.
>>>>>>>
>>>>>>> We got rid of "deopt suspend" some time ago and it is disturbing to
>> see
>>>>>>> it being added back (effectively). This seems like it may be something
>>>>>>> that handshakes could be used for.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>> On 12/12/2019 7:02 am, David Holmes wrote:
>>>>>>>> On 12/12/2019 1:07 am, Reingruber, Richard wrote:
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>>        > Most of the details here are in areas I can comment on in
>> detail,
>>>>>>>>> but I
>>>>>>>>>        > did take an initial general look at things.
>>>>>>>>>
>>>>>>>>> Thanks for taking the time!
>>>>>>>>
>>>>>>>> Apologies the above should read:
>>>>>>>>
>>>>>>>> "Most of the details here are in areas I *can't* comment on in detail
>>>>>>>> ..."
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>>        > The only thing that jumped out at me is that I think the
>>>>>>>>>        > DeoptimizeObjectsALotThread should be a hidden thread.
>>>>>>>>>        >
>>>>>>>>>        > +  bool is_hidden_from_external_view() const { return true; }
>>>>>>>>>
>>>>>>>>> Yes, it should. Will add the method like above.
>>>>>>>>>
>>>>>>>>>        > Also I don't see any testing of the
>> DeoptimizeObjectsALotThread.
>>>>>>>>> Without
>>>>>>>>>        > active testing this will just bit-rot.
>>>>>>>>>
>>>>>>>>> DeoptimizeObjectsALot is meant for stress testing with a larger
>>>>>>>>> workload. I will add a minimal test
>>>>>>>>> to keep it fresh.
>>>>>>>>>
>>>>>>>>>        > Also on the tests I don't understand your @requires clause:
>>>>>>>>>        >
>>>>>>>>>        >   @requires ((vm.compMode != "Xcomp") &
>> vm.compiler2.enabled
>>>> &
>>>>>>>>>        > (vm.opt.TieredCompilation != true))
>>>>>>>>>        >
>>>>>>>>>        > This seems to require that TieredCompilation is disabled, but
>>>>>>>>> tiered is
>>>>>>>>>        > our normal mode of operation. ??
>>>>>>>>>        >
>>>>>>>>>
>>>>>>>>> I removed the clause. I guess I wanted to target the tests towards
>> the
>>>>>>>>> code they are supposed to
>>>>>>>>> test, and it's easier to analyze failures w/o tiered compilation and
>>>>>>>>> with just one compiler thread.
>>>>>>>>>
>>>>>>>>> Additionally I will make use of
>>>>>>>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: David Holmes <david.hol...@oracle.com>
>>>>>>>>> Sent: Mittwoch, 11. Dezember 2019 08:03
>>>>>>>>> To: Reingruber, Richard <richard.reingru...@sap.com>;
>>>>>>>>> serviceability-dev@openjdk.java.net;
>>>>>>>>> hotspot-compiler-...@openjdk.java.net;
>>>>>>>>> hotspot-runtime-...@openjdk.java.net
>>>>>>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
>>>>>>>>> Performance in the Presence of JVMTI Agents
>>>>>>>>>
>>>>>>>>> Hi Richard,
>>>>>>>>>
>>>>>>>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I would like to get reviews please for
>>>>>>>>>>
>>>>>>>>>>
>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/
>>>>>>>>>>
>>>>>>>>>> Corresponding RFE:
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227745
>>>>>>>>>>
>>>>>>>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915
>>>>>>>>>> And potentially https://bugs.openjdk.java.net/browse/JDK-
>> 8214584 [1]
>>>>>>>>>>
>>>>>>>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing
>> without
>>>>>>>>>> issues (thanks!). In addition the
>>>>>>>>>> change is being tested at SAP since I posted the first RFR some
>>>>>>>>>> months ago.
>>>>>>>>>>
>>>>>>>>>> The intention of this enhancement is to benefit performance wise
>> from
>>>>>>>>>> escape analysis even if JVMTI
>>>>>>>>>> agents request capabilities that allow them to access local variable
>>>>>>>>>> values. E.g. if you start-up
>>>>>>>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,
>> then
>>>>>>>>>> escape analysis is disabled right
>>>>>>>>>> from the beginning, well before a debugger attaches -- if ever one
>>>>>>>>>> should do so. With the
>>>>>>>>>> enhancement, escape analysis will remain enabled until and after
>> a
>>>>>>>>>> debugger attaches. EA based
>>>>>>>>>> optimizations are reverted just before an agent acquires the
>>>>>>>>>> reference to an object. In the JBS item
>>>>>>>>>> you'll find more details.
>>>>>>>>>
>>>>>>>>> Most of the details here are in areas I can comment on in detail, but
>> I
>>>>>>>>> did take an initial general look at things.
>>>>>>>>>
>>>>>>>>> The only thing that jumped out at me is that I think the
>>>>>>>>> DeoptimizeObjectsALotThread should be a hidden thread.
>>>>>>>>>
>>>>>>>>> +  bool is_hidden_from_external_view() const { return true; }
>>>>>>>>>
>>>>>>>>> Also I don't see any testing of the DeoptimizeObjectsALotThread.
>>>>>>>>> Without
>>>>>>>>> active testing this will just bit-rot.
>>>>>>>>>
>>>>>>>>> Also on the tests I don't understand your @requires clause:
>>>>>>>>>
>>>>>>>>>        @requires ((vm.compMode != "Xcomp") &
>> vm.compiler2.enabled &
>>>>>>>>> (vm.opt.TieredCompilation != true))
>>>>>>>>>
>>>>>>>>> This seems to require that TieredCompilation is disabled, but tiered
>> is
>>>>>>>>> our normal mode of operation. ??
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745
>>>>>>>>>>
>>>>
>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.pa
>> tc
>>>> h
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>

RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents

Reply via email to