> Thanks for cleaning up thread.hpp! Thanks for providing the feedback!
I justed noticed that the forward declaration of class jvmtiDeferredLocalVariableSet is not required anymore. Will remove it in the next webrev. Hope to get some more (partial) reviews. Thanks, Richard. -----Original Message----- From: Robbin Ehn <robbin....@oracle.com> Sent: Dienstag, 31. März 2020 16:21 To: Reingruber, Richard <richard.reingru...@sap.com>; Doerr, Martin <martin.do...@sap.com>; Lindenmaier, Goetz <goetz.lindenma...@sap.com>; David Holmes <david.hol...@oracle.com>; Vladimir Kozlov (vladimir.koz...@oracle.com) <vladimir.koz...@oracle.com>; serviceability-dev@openjdk.java.net; hotspot-compiler-...@openjdk.java.net; hotspot-runtime-...@openjdk.java.net Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Thanks for cleaning up thread.hpp! /Robbin On 2020-03-30 10:31, Reingruber, Richard wrote: > Hi, > > this is webrev.5 based on Robbin's feedback and Martin's review - thanks! :) > > The change affects jvmti, hotspot and c2. Partial reviews are very welcome > too. > > Full: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.5/ > Delta: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.5.inc/ > > Robbin, Martin, please let me know, if anything shouldn't be quite as you > wanted it. Also find my > comments on your feedback below. > > Robbin, can I count you as Reviewer for the runtime part? > > Thanks, Richard. > > -- > >> DeoptimizeObjectsALotThread is only used in compileBroker.cpp. >> You can move both declaration and definition to that file, no need to clobber >> thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry) > > Done. > >> Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in it's >> own >> hpp file? It doesn't seem right to add JVM TI classes into thread.hpp. > > I moved JvmtiDeferredUpdates to vframe_hp.hpp where preexisting > jvmtiDeferredLocalVariableSet is > declared. > >> src/hotspot/share/code/compiledMethod.cpp >> Nice cleanup! > > Thanks :) > >> src/hotspot/share/code/debugInfoRec.cpp >> src/hotspot/share/code/debugInfoRec.hpp >> Additional parmeters. (Remark: I think "non_global_escape_in_scope" would >> read better than "not_global_escape_in_scope", but your version is >> consistent with existing code, so no change request from my side.) Ok. > > I've been thinking about this too and finally stayed with > not_global_escape_in_scope. It's supposed > to mean an object whose escape state is not GlobalEscape is in scope. > >> src/hotspot/share/compiler/compileBroker.cpp >> src/hotspot/share/compiler/compileBroker.hpp >> Extra thread for DeoptimizeObjectsALot. (Remark: I would have put it into a >> follow up change together with the test in order to make this webrev >> smaller, but since it is included, I'm reviewing everything at once. Not a >> big deal.) Ok. > > Yes the change would be a little smaller. And if it helps I'll split it off. > In general I prefer > patches that bring along a suitable amount of tests. > >> src/hotspot/share/opto/c2compiler.cpp >> Make do_escape_analysis independent of JVMCI capabilities. Nice! > > It is the main goal of the enhancement. It is done for C2, but could be done > for JVMCI compilers > with just a small effort as well. > >> src/hotspot/share/opto/escape.cpp >> Annotation for MachSafePointNodes. Your added functionality looks correct. >> But I'd prefer to move the bulky code out of the large function. >> I suggest to factor out something like has_not_global_escape and >> has_arg_escape. So the code could look like this: >> SafePointNode* sfn = sfn_worklist.at(next); >> sfn->set_not_global_escape_in_scope(has_not_global_escape(sfn)); >> if (sfn->is_CallJava()) { >> CallJavaNode* call = sfn->as_CallJava(); >> call->set_arg_escape(has_arg_escape(call)); >> } >> This would also allow us to get rid of the found_..._escape_in_args >> variables making the loops better readable. > > Done. > >> It's kind of ugly to use strcmp to recognize uncommon trap, but that seems >> to be the way to do it (there are more such places). So it's ok. > > Yeah. I copied the snippet. > >> src/hotspot/share/prims/jvmtiImpl.cpp >> src/hotspot/share/prims/jvmtiImpl.hpp >> The sequence is pretty complex: >> VM_GetOrSetLocal element initialization executes EscapeBarrier code which >> suspends the target thread (extra VM Operation). > > Note that the target threads have to be suspended already for > VM_GetOrSetLocal*. So it's mainly the > synchronization effect of EscapeBarrier::sync_and_suspend_one() that is > required here. Also no extra > _handshake_ is executed, since sync_and_suspend_one() will find the target > threads already > suspended. > >> VM_GetOrSetLocal::doit_prologue performs object deoptimization (by VM Thread >> to prepare VM Operation with frame deoptimization). >> VM_GetOrSetLocal destructor implicitly calls EscapeBarrier destructor which >> resumes the target thread. >> But I don't have any improvement proposal. Performance is probably not a >> concern, here. So it's ok. > >> VM_GetOrSetLocal::deoptimize_objects deoptimizes the top frame if it has >> non-globally escaping objects and other frames if they have arg escaping >> ones. Good. > > It's not specifically the top frame, but the frame that is accessed. > >> src/hotspot/share/runtime/deoptimization.cpp >> Object deoptimization. I have more comments and proposals, here. >> First of all, handling recursive and waiting locks in relock_objects is >> tricky, but looks correct. >> Comments are sufficient to understand why things are done as they are >> implemented. > >> BiasedLocking related parts are complex, but we may get rid of them in the >> future (with BiasedLocking removal). >> Anyway, looks correct, too. > >> Typo in comment: "regularily" => "regularly" > >> Deoptimization::fetch_unroll_info_helper is the only place where >> _jvmti_deferred_updates get deallocated (except JavaThread destructor). But >> I think we always go through it, so I can't see a memory leak or such kind >> of issues. > > That's correct. The compiled frame for which deferred updates are allocated > is always deoptimized > before (see EscapeBarrier::deoptimize_objects()). This is also asserted in > compiledVFrame::update_deferred_value(). I've added the same assertion to > Deoptimization::relock_objects(). So we can be sure that > _jvmti_deferred_updates are deallocated > again in fetch_unroll_info_helper(). > >> EscapeBarrier::deoptimize_objects: ResourceMark should use calling_thread(). > > Sure, well spotted! > >> You can use MutexLocker and MonitorLocker with Thread* to save the >> Thread::current() call. > > Right, good hint. This was recently introduced with 8235678. I even had to > resolve conflicts. Should > have done this then. > >> I'd make set_objs_are_deoptimized static and remove it from the >> EscapeBarrier interface because I think it shouldn't be used outside of >> EscapeBarrier::deoptimize_objects. > > Done. > >> Typo in comment: "we must only deoptimize" => "we only have to deoptimize" > > Replaced with "[...] we deoptimize iff local objects are passed as args" > >> "bool EscapeBarrier::deoptimize_objects(intptr_t* fr_id)" is trivial and >> barrier_active() is redundant. Implementation can get moved to hpp file. > > Ok. Done. > >> I'll get back to suspend flags, later. > >> There are weird cases regarding _self_deoptimization_in_progress. >> Assume we have 3 threads A, B and C. A deopts C, B deopts C, C deopts C. C >> can set _self_deoptimization_in_progress while A performs the handshake for >> suspending C. I think this doesn't lead to errors, but it's probably not >> desired. >> I think it would be better to use only one "wait" call in >> sync_and_suspend_one and sync_and_suspend_all. > > You're right. We've discussed that face-to-face, but couldn't find a real > issue. But now, thinking again, a reckon I found one: > > 2808 // Sync with other threads that might be doing deoptimizations > 2809 { > 2810 // Need to switch to _thread_blocked for the wait() call > 2811 ThreadBlockInVM tbivm(_calling_thread); > 2812 MonitorLocker ml(EscapeBarrier_lock, > Mutex::_no_safepoint_check_flag); > 2813 while (_self_deoptimization_in_progress) { > 2814 ml.wait(); > 2815 } > 2816 > 2817 if (self_deopt()) { > 2818 _self_deoptimization_in_progress = true; > 2819 } > 2820 > 2821 while (_deoptee_thread->is_ea_obj_deopt_suspend()) { > 2822 ml.wait(); > 2823 } > 2824 > 2825 if (self_deopt()) { > 2826 return; > 2827 } > 2828 > 2829 // set suspend flag for target thread > 2830 _deoptee_thread->set_ea_obj_deopt_flag(); > 2831 } > > - A waits in 2822 > - C is suspended > - B notifies all in resume_one() > - A and C wake up > - C wins over A and sets _self_deoptimization_in_progress = true in 2818 > - C does the self deoptimization > - A executes 2830 _deoptee_thread->set_ea_obj_deopt_flag() > > C will self suspend at some undefined point. The resulting state is illegal. > >> I first thought it'd be better to move ThreadBlockInVM before wait() to >> reduce thread state transitions, but that seems to be problematic because >> ThreadBlockInVM destructor contains a safepoint check which we shouldn't do >> while holding EscapeBarrier_lock. So no change request. > > Yes, would be nice to have the state change only if needed, but for the > reason you mentioned it is > not quite as easy as it seems to be. I experimented as well with a second > lock, but did not succeed. > >> Change in thred_added: >> I think the sequence would be more comprehensive if we waited for >> deopt_all_threads in Thread::start and all other places where a new thread >> can run into Java code (e.g. JVMTI attach). >> Your version makes new threads come up with suspend flag set. That looks >> correct, too. Advantage is that you only have to change one place >> (thread_added). It'll be interesting to see how it will look like when we >> use async handshakes instead of suspend flags. >> For now, I'm ok with your version. > > I had a version that did what you are suggesting. The current version also > has the advantage, that > there are fewer places where a thread has to wait for ongoing object > deoptimization. This means > viewer places where you have to worry about correct thread state transitions, > possible deadlocks, > and if all oops are properly Handle'ed. > >> I'd only move MutexLocker ml(EscapeBarrier_lock...) after if >> (!jt->is_hidden_from_external_view()). > > Done. > >> Having 4 different deoptimize_objects functions makes it a little hard to >> keep an overview of which one is used for what. >> Maybe adding suffixes would help a little bit, but I can also live with what >> you have. >> Implementation looks correct to me. > > 2 are internal. I added the suffix _internal to them. This leaves 2 to choose > from. > >> src/hotspot/share/runtime/deoptimization.hpp >> Escape barriers and object deoptimization functions. >> Typo in comment: "helt" => "held" > > Done in place already. > >> src/hotspot/share/runtime/interfaceSupport.cpp >> InterfaceSupport::deoptimizeAllObjects() is only used for >> DeoptimizeObjectsALot = 1. >> I think DeoptimizeObjectsALot = 2 is more important, but I think it's not >> bad to have DeoptimizeObjectsALot = 1 in addition. Ok. > > I never used DeoptimizeObjectsALot = 1 that much. It could be more > deterministic in single threaded > scenarios. I wouldn't object to get rid of it though. > >> src/hotspot/share/runtime/stackValue.hpp >> Better reinitilization in StackValue. Good. > > StackValue::obj_is_scalar_replaced() should not return true after calling > set_obj(). > >> src/hotspot/share/runtime/thread.cpp >> src/hotspot/share/runtime/thread.hpp >> src/hotspot/share/runtime/thread.inline.hpp >> wait_for_object_deoptimization, suspend flag, deferred updates and test >> feature to deoptimize objects. > >> In the long term, we want to get rid of suspend flags, so it's not so nice >> to introduce a new one. But I agree with Götz that it should be acceptable >> as temporary solution until async handshakes are available (which takes more >> time). So I'm ok with your change. > > I'm keen to build the feature on async handshakes when the arive. > >> You can use MutexLocker with Thread*. > > Done. > >> JVMTIDeferredUpdates: I agree with Robin. It'd be nice to move the class out >> of thread.hpp. > > Done. > >> src/hotspot/share/runtime/vframe.cpp >> Added support for entry frame to new_vframe. Ok. > > >> src/hotspot/share/runtime/vframe_hp.cpp >> src/hotspot/share/runtime/vframe_hp.hpp > >> I think code()->as_nmethod() in not_global_escape_in_scope() and >> arg_escape() should better be under #ifdef ASSERT or inside the assert >> statement (no need for code cache walking in product build). > > Done. > >> jvmtiDeferredLocalVariableSet::update_monitors: >> Please add a comment explaining that owner referenced by original info may >> be scalar replaced, but it is deoptimized in the vframe. > > Done. > > -----Original Message----- > From: Doerr, Martin <martin.do...@sap.com> > Sent: Donnerstag, 12. März 2020 17:28 > To: Reingruber, Richard <richard.reingru...@sap.com>; 'Robbin Ehn' > <robbin....@oracle.com>; Lindenmaier, Goetz <goetz.lindenma...@sap.com>; > David Holmes <david.hol...@oracle.com>; Vladimir Kozlov > (vladimir.koz...@oracle.com) <vladimir.koz...@oracle.com>; > serviceability-dev@openjdk.java.net; hotspot-compiler-...@openjdk.java.net; > hotspot-runtime-...@openjdk.java.net > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in > the Presence of JVMTI Agents > > Hi Richard, > > > I managed to find time for a (almost) complete review of webrev.4. (I'll > review the tests separately.) > > First of all, the change seems to be in pretty good quality for its > significant complexity. I couldn't find any real bugs. But I'd like to > propose minor improvements. > I'm convinced that it's mature because we did substantial testing. > > I like the new functionality for object deoptimization. It can possibly be > reused for future escape analysis based optimizations. So I appreciate having > it available in the code base. > In addition to that, your change makes the JVMTI implementation better > integrated into the VM. > > > Now to the details: > > > src/hotspot/share/c1/c1_IR.hpp > describe_scope parameters. Ok. > > > src/hotspot/share/ci/ciEnv.cpp > src/hotspot/share/ci/ciEnv.hpp > Fix for JvmtiExport::can_walk_any_space() capability. Ok. > > > src/hotspot/share/code/compiledMethod.cpp > Nice cleanup! > > > src/hotspot/share/code/debugInfoRec.cpp > src/hotspot/share/code/debugInfoRec.hpp > Additional parmeters. (Remark: I think "non_global_escape_in_scope" would > read better than "not_global_escape_in_scope", but your version is consistent > with existing code, so no change request from my side.) Ok. > > > src/hotspot/share/code/nmethod.cpp > Nice cleanup! > > > src/hotspot/share/code/pcDesc.hpp > Additional parameters. Ok. > > > src/hotspot/share/code/scopeDesc.cpp > src/hotspot/share/code/scopeDesc.hpp > Improved implementation + additional parameters. Ok. > > > src/hotspot/share/compiler/compileBroker.cpp > src/hotspot/share/compiler/compileBroker.hpp > Extra thread for DeoptimizeObjectsALot. (Remark: I would have put it into a > follow up change together with the test in order to make this webrev smaller, > but since it is included, I'm reviewing everything at once. Not a big deal.) > Ok. > > > src/hotspot/share/jvmci/jvmciCodeInstaller.cpp > Additional parameters. Ok. > > > src/hotspot/share/opto/c2compiler.cpp > Make do_escape_analysis independent of JVMCI capabilities. Nice! > > > src/hotspot/share/opto/callnode.hpp > Additional fields for MachSafePointNodes. Ok. > > > src/hotspot/share/opto/escape.cpp > Annotation for MachSafePointNodes. Your added functionality looks correct. > But I'd prefer to move the bulky code out of the large function. > I suggest to factor out something like has_not_global_escape and > has_arg_escape. So the code could look like this: > SafePointNode* sfn = sfn_worklist.at(next); > sfn->set_not_global_escape_in_scope(has_not_global_escape(sfn)); > if (sfn->is_CallJava()) { > CallJavaNode* call = sfn->as_CallJava(); > call->set_arg_escape(has_arg_escape(call)); > } > This would also allow us to get rid of the found_..._escape_in_args variables > making the loops better readable. > > It's kind of ugly to use strcmp to recognize uncommon trap, but that seems to > be the way to do it (there are more such places). So it's ok. > > > src/hotspot/share/opto/machnode.hpp > Additional fields for MachSafePointNodes. Ok. > > > src/hotspot/share/opto/macro.cpp > Allow elimination of non-escaping allocations. Ok. > > > src/hotspot/share/opto/matcher.cpp > src/hotspot/share/opto/output.cpp > Copy attribute / pass parameters. Ok. > > > src/hotspot/share/prims/jvmtiCodeBlobEvents.cpp > Nice cleanup! > > > src/hotspot/share/prims/jvmtiEnv.cpp > src/hotspot/share/prims/jvmtiEnvBase.cpp > Escape barriers + deoptimize objects for target thread. Good. > > > src/hotspot/share/prims/jvmtiImpl.cpp > src/hotspot/share/prims/jvmtiImpl.hpp > The sequence is pretty complex: > VM_GetOrSetLocal element initialization executes EscapeBarrier code which > suspends the target thread (extra VM Operation). > VM_GetOrSetLocal::doit_prologue performs object deoptimization (by VM Thread > to prepare VM Operation with frame deoptimization). > VM_GetOrSetLocal destructor implicitly calls EscapeBarrier destructor which > resumes the target thread. > But I don't have any improvement proposal. Performance is probably not a > concern, here. So it's ok. > > VM_GetOrSetLocal::deoptimize_objects deoptimizes the top frame if it has > non-globally escaping objects and other frames if they have arg escaping > ones. Good. > > > src/hotspot/share/prims/jvmtiTagMap.cpp > Escape barriers + deoptimize objects for all threads. Ok. > > > src/hotspot/share/prims/whitebox.cpp > Added WB_IsFrameDeoptimized to API. Ok. > > > src/hotspot/share/runtime/deoptimization.cpp > Object deoptimization. I have more comments and proposals, here. > First of all, handling recursive and waiting locks in relock_objects is > tricky, but looks correct. > Comments are sufficient to understand why things are done as they are > implemented. > > BiasedLocking related parts are complex, but we may get rid of them in the > future (with BiasedLocking removal). > Anyway, looks correct, too. > > Typo in comment: "regularily" => "regularly" > > Deoptimization::fetch_unroll_info_helper is the only place where > _jvmti_deferred_updates get deallocated (except JavaThread destructor). But I > think we always go through it, so I can't see a memory leak or such kind of > issues. > > EscapeBarrier::deoptimize_objects: ResourceMark should use calling_thread(). > > You can use MutexLocker and MonitorLocker with Thread* to save the > Thread::current() call. > > I'd make set_objs_are_deoptimized static and remove it from the EscapeBarrier > interface because I think it shouldn't be used outside of > EscapeBarrier::deoptimize_objects. > > Typo in comment: "we must only deoptimize" => "we only have to deoptimize" > > "bool EscapeBarrier::deoptimize_objects(intptr_t* fr_id)" is trivial and > barrier_active() is redundant. Implementation can get moved to hpp file. > > I'll get back to suspend flags, later. > > There are weird cases regarding _self_deoptimization_in_progress. > Assume we have 3 threads A, B and C. A deopts C, B deopts C, C deopts C. C > can set _self_deoptimization_in_progress while A performs the handshake for > suspending C. I think this doesn't lead to errors, but it's probably not > desired. > I think it would be better to use only one "wait" call in > sync_and_suspend_one and sync_and_suspend_all. > > I first thought it'd be better to move ThreadBlockInVM before wait() to > reduce thread state transitions, but that seems to be problematic because > ThreadBlockInVM destructor contains a safepoint check which we shouldn't do > while holding EscapeBarrier_lock. So no change request. > > Change in thred_added: > I think the sequence would be more comprehensive if we waited for > deopt_all_threads in Thread::start and all other places where a new thread > can run into Java code (e.g. JVMTI attach). > Your version makes new threads come up with suspend flag set. That looks > correct, too. Advantage is that you only have to change one place > (thread_added). It'll be interesting to see how it will look like when we use > async handshakes instead of suspend flags. > For now, I'm ok with your version. > > I'd only move MutexLocker ml(EscapeBarrier_lock...) after if > (!jt->is_hidden_from_external_view()). > > Having 4 different deoptimize_objects functions makes it a little hard to > keep an overview of which one is used for what. > Maybe adding suffixes would help a little bit, but I can also live with what > you have. > Implementation looks correct to me. > > > src/hotspot/share/runtime/deoptimization.hpp > Escape barriers and object deoptimization functions. > Typo in comment: "helt" => "held" > > > src/hotspot/share/runtime/globals.hpp > Addition of develop flag DeoptimizeObjectsALotInterval. Ok. > > > src/hotspot/share/runtime/interfaceSupport.cpp > InterfaceSupport::deoptimizeAllObjects() is only used for > DeoptimizeObjectsALot = 1. > I think DeoptimizeObjectsALot = 2 is more important, but I think it's not bad > to have DeoptimizeObjectsALot = 1 in addition. Ok. > > > src/hotspot/share/runtime/interfaceSupport.inline.hpp > Addition of deoptimizeAllObjects. Ok. > > > src/hotspot/share/runtime/mutexLocker.cpp > src/hotspot/share/runtime/mutexLocker.hpp > Addition of EscapeBarrier_lock. Ok. > > > src/hotspot/share/runtime/objectMonitor.cpp > Make recursion count relock aware. Ok. > > > src/hotspot/share/runtime/stackValue.hpp > Better reinitilization in StackValue. Good. > > > src/hotspot/share/runtime/thread.cpp > src/hotspot/share/runtime/thread.hpp > src/hotspot/share/runtime/thread.inline.hpp > wait_for_object_deoptimization, suspend flag, deferred updates and test > feature to deoptimize objects. > > In the long term, we want to get rid of suspend flags, so it's not so nice to > introduce a new one. But I agree with Götz that it should be acceptable as > temporary solution until async handshakes are available (which takes more > time). So I'm ok with your change. > > You can use MutexLocker with Thread*. > > JVMTIDeferredUpdates: I agree with Robin. It'd be nice to move the class out > of thread.hpp. > > > src/hotspot/share/runtime/vframe.cpp > Added support for entry frame to new_vframe. Ok. > > > src/hotspot/share/runtime/vframe_hp.cpp > src/hotspot/share/runtime/vframe_hp.hpp > > I think code()->as_nmethod() in not_global_escape_in_scope() and arg_escape() > should better be under #ifdef ASSERT or inside the assert statement (no need > for code cache walking in product build). > > jvmtiDeferredLocalVariableSet::update_monitors: > Please add a comment explaining that owner referenced by original info may be > scalar replaced, but it is deoptimized in the vframe. > > > src/hotspot/share/utilities/macros.hpp > Addition of NOT_COMPILER2_OR_JVMCI_RETURN macros. Ok. > > > test/hotspot/jtreg/serviceability/jvmti/Heap/IterateHeapWithEscapeAnalysisEnabled.java > test/hotspot/jtreg/serviceability/jvmti/Heap/libIterateHeapWithEscapeAnalysisEnabled.c > New test. Will review separately. > > > test/jdk/TEST.ROOT > Addition of vm.jvmci as required property. Ok. > > > test/jdk/com/sun/jdi/EATests.java > test/jdk/com/sun/jdi/EATestsJVMCI.java > New test. Will review separately. > > > test/lib/sun/hotspot/WhiteBox.java > Added isFrameDeoptimized to API. Ok. > > > That was it. Best regards, > Martin > > >> -----Original Message----- >> From: hotspot-compiler-dev <hotspot-compiler-dev- >> boun...@openjdk.java.net> On Behalf Of Reingruber, Richard >> Sent: Dienstag, 3. März 2020 21:23 >> To: 'Robbin Ehn' <robbin....@oracle.com>; Lindenmaier, Goetz >> <goetz.lindenma...@sap.com>; David Holmes <david.hol...@oracle.com>; >> Vladimir Kozlov (vladimir.koz...@oracle.com) >> <vladimir.koz...@oracle.com>; serviceability-dev@openjdk.java.net; >> hotspot-compiler-...@openjdk.java.net; hotspot-runtime- >> d...@openjdk.java.net >> Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better >> Performance in the Presence of JVMTI Agents >> >> Hi Robbin, >> >>>> I understand that Robbin proposed to replace the usage of >>>> _suspend_flag with handshakes. Apparently, async handshakes >>>> are needed to do so. We have been waiting a while for removal >>>> of the _suspend_flag / introduction of async handshakes [2]. >>>> What is the status here? >> >>> I have an old prototype which I would like to continue to work on. >>> So do not assume asynch handshakes will make 15. >>> Even if it would, I think there are a lot more investigate work to remove >>> _suspend_flag. >> >> Let us know, if we can be of any help to you and be it only testing. >> >>>>> Full: >> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4/ >> >>> DeoptimizeObjectsALotThread is only used in compileBroker.cpp. >>> You can move both declaration and definition to that file, no need to >> clobber >>> thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry) >> >> Will do. >> >>> Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in it's >> own >>> hpp file? It doesn't seem right to add JVM TI classes into thread.hpp. >> >> You are right. It shouldn't be declared in thread.hpp. I will look into that. >> >>> Note that we also think we may have a bug in deopt: >>> https://bugs.openjdk.java.net/browse/JDK-8238237 >> >>> I think it would be best, if possible, to push after that is resolved. >> >> Sure. >> >>> Not even nearly a full review :) >> >> I know :) >> >> Anyways, thanks a lot, >> Richard. >> >> >> -----Original Message----- >> From: Robbin Ehn <robbin....@oracle.com> >> Sent: Monday, March 2, 2020 11:17 AM >> To: Lindenmaier, Goetz <goetz.lindenma...@sap.com>; Reingruber, Richard >> <richard.reingru...@sap.com>; David Holmes <david.hol...@oracle.com>; >> Vladimir Kozlov (vladimir.koz...@oracle.com) >> <vladimir.koz...@oracle.com>; serviceability-dev@openjdk.java.net; >> hotspot-compiler-...@openjdk.java.net; hotspot-runtime- >> d...@openjdk.java.net >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance >> in the Presence of JVMTI Agents >> >> Hi, >> >> On 2/24/20 5:39 PM, Lindenmaier, Goetz wrote: >>> Hi, >>> >>> I had a look at the progress of this change. Nothing >>> happened since Richard posted his update using more >>> handshakes [1]. >>> But we (SAP) would appreciate a lot if this change could >>> be successfully reviewed and pushed. >>> >>> I think there is basic understanding that this >>> change is helpful. It fixes a number of issues with JVMTI, >>> and will deliver the same performance benefits as EA >>> does in current production mode for debugging scenarios. >>> >>> This is important for us as we run our VMs prepared >>> for debugging in production mode. >>> >>> I understand that Robbin proposed to replace the usage of >>> _suspend_flag with handshakes. Apparently, async handshakes >>> are needed to do so. We have been waiting a while for removal >>> of the _suspend_flag / introduction of async handshakes [2]. >>> What is the status here? >> >> I have an old prototype which I would like to continue to work on. >> So do not assume asynch handshakes will make 15. >> Even if it would, I think there are a lot more investigate work to remove >> _suspend_flag. >> >>> >>> I think we should no longer wait, but proceed with >>> this change. We will look into removing the usage of >>> suspend_flag introduced here once it is possible to implement >>> it with handshakes. >> >> Yes, sure. >> >>>> Full: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4/ >> >> DeoptimizeObjectsALotThread is only used in compileBroker.cpp. >> You can move both declaration and definition to that file, no need to clobber >> thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry) >> >> Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in it's >> own >> hpp file? It doesn't seem right to add JVM TI classes into thread.hpp. >> >> Note that we also think we may have a bug in deopt: >> https://bugs.openjdk.java.net/browse/JDK-8238237 >> >> I think it would be best, if possible, to push after that is resolved. >> >> Not even nearly a full review :) >> >> Thanks, Robbin >> >> >>>> Incremental: >>>> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4.inc/ >>>> >>>> I was not able to eliminate the additional suspend flag now. I'll take care >> of this >>>> as soon as the >>>> existing suspend-resume-mechanism is reworked. >>>> >>>> Testing: >>>> >>>> Nightly tests @SAP: >>>> >>>> JCK and JTREG, also in Xcomp mode, SPECjvm2008, SPECjbb2015, >> Renaissance >>>> Suite, SAP specific tests >>>> with fastdebug and release builds on all platforms >>>> >>>> Stress testing with DeoptimizeObjectsALot running SPECjvm2008 40x >> parallel >>>> for 24h >>>> >>>> Thanks, Richard. >>>> >>>> >>>> More details on the changes: >>>> >>>> * Hide DeoptimizeObjectsALotThread from external view. >>>> >>>> * Changed EscapeBarrier_lock to be a _safepoint_check_never lock. >>>> It used to be _safepoint_check_sometimes, which will be eliminated >> sooner or >>>> later. >>>> I added explicit thread state changes with ThreadBlockInVM to code >> paths >>>> where we can wait() >>>> on EscapeBarrier_lock to become safepoint safe. >>>> >>>> * Use handshake EscapeBarrierSuspendHandshake to suspend target >> threads >>>> instead of vm operation >>>> VM_ThreadSuspendAllForObjDeopt. >>>> >>>> * Removed uses of Threads_lock. When adding a new thread we suspend >> it iff >>>> EA optimizations are >>>> being reverted. In the previous version we were waiting on >> Threads_lock >>>> while EA optimizations >>>> were reverted. See EscapeBarrier::thread_added(). >>>> >>>> * Made tests require Xmixed compilation mode. >>>> >>>> * Made tests agnostic regarding tiered compilation. >>>> I.e. tc isn't disabled anymore, and the tests can be run with tc >>>> enabled or >>>> disabled. >>>> >>>> * Exercising EATests.java as well with stress test options >>>> DeoptimizeObjectsALot* >>>> Due to the non-deterministic deoptimizations some tests need to be >> skipped. >>>> We do this to prevent bit-rot of the stress test code. >>>> >>>> * Executing EATests.java as well with graal if available. Driver for this >>>> is >>>> EATestsJVMCI.java. Graal cannot pass all tests, because it does not >> provide all >>>> the new debug info >>>> (namely not_global_escape_in_scope and arg_escape in >> scopeDesc.hpp). >>>> And graal does not yet support the JVMTI operations force early return >> and >>>> pop frame. >>>> >>>> * Removed tracing from new jdi tests in EATests.java. Too much trace >> output >>>> before the debugging >>>> connection is established can cause deadlock because output buffers >>>> fill >> up. >>>> (See https://bugs.openjdk.java.net/browse/JDK-8173304) >>>> >>>> * Many copyright year changes and smaller clean-up changes of testing >> code >>>> (trailing white-space and >>>> the like). >>>> >>>> >>>> -----Original Message----- >>>> From: David Holmes <david.hol...@oracle.com> >>>> Sent: Donnerstag, 19. Dezember 2019 03:12 >>>> To: Reingruber, Richard <richard.reingru...@sap.com>; serviceability- >>>> d...@openjdk.java.net; hotspot-compiler-...@openjdk.java.net; >> hotspot- >>>> runtime-...@openjdk.java.net; Vladimir Kozlov >> (vladimir.koz...@oracle.com) >>>> <vladimir.koz...@oracle.com> >>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >> Performance in >>>> the Presence of JVMTI Agents >>>> >>>> Hi Richard, >>>> >>>> I think my issue is with the way EliminateNestedLocks works so I'm going >>>> to look into that more deeply. >>>> >>>> Thanks for the explanations. >>>> >>>> David >>>> >>>> On 18/12/2019 12:47 am, Reingruber, Richard wrote: >>>>> Hi David, >>>>> >>>>> > > > Some further queries/concerns: >>>>> > > > >>>>> > > > src/hotspot/share/runtime/objectMonitor.cpp >>>>> > > > >>>>> > > > Can you please explain the changes to ObjectMonitor::wait: >>>>> > > > >>>>> > > > ! _recursions = save // restore the old recursion >>>>> count >>>>> > > > ! + >>>>> jt->get_and_reset_relock_count_after_wait(); // >>>>> > > > increased by the deferred relock count >>>>> > > > >>>>> > > > what is the "deferred relock count"? I gather it relates to >>>>> > > > >>>>> > > > "The code was extended to be able to deoptimize objects of a >>>>> > > frame that >>>>> > > > is not the top frame and to let another thread than the >>>>> owning >>>>> > > thread do >>>>> > > > it." >>>>> > > >>>>> > > Yes, these relate. Currently EA based optimizations are reverted, >> when a >>>> compiled frame is >>>>> > > replaced with corresponding interpreter frames. Part of this is >> relocking >>>> objects with eliminated >>>>> > > locking. New with the enhancement is that we do this also just >> before >>>> object references are >>>>> > > acquired through JVMTI. In this case we deoptimize also the >> owning >>>> compiled frame C and we >>>>> > > register deoptimized objects as deferred updates. When control >> returns >>>> to C it gets deoptimized, >>>>> > > we notice that objects are already deoptimized (reallocated and >>>> relocked), so we don't do it again >>>>> > > (relocking twice would be incorrect of course). Deferred updates >> are >>>> copied into the new >>>>> > > interpreter frames. >>>>> > > >>>>> > > Problem: relocking is not possible if the target thread T is >>>>> waiting >> on the >>>> monitor that needs to >>>>> > > be relocked. This happens only with non-local objects with >>>> EliminateNestedLocks. Instead relocking >>>>> > > is deferred until T owns the monitor again. This is what the >>>>> piece of >>>> code above does. >>>>> > >>>>> > Sorry I need some more detail here. How can you wait() on an >> object >>>>> > monitor if the object allocation and/or locking was optimised >>>>> away? >> And >>>>> > what is a "non-local object" in this context? Isn't EA restricted >>>>> to >>>>> > thread-confined objects? >>>>> >>>>> "Non-local object" is an object that escapes its thread. The issue I'm >>>> addressing with the changes >>>>> in ObjectMonitor::wait are almost unrelated to EA. They are caused by >>>> EliminateNestedLocks, where C2 >>>>> eliminates recursive locking of an already owned lock. The lock owning >> object >>>> exists on the heap, it >>>>> is locked and you can call wait() on it. >>>>> >>>>> EliminateLocks is the C2 option that controls lock elimination based on >> EA. >>>> Both optimizations have >>>>> in common that objects with eliminated locking need to be relocked >> when >>>> deoptimizing a frame, >>>>> i.e. when replacing a compiled frame with equivalent interpreter >>>>> frames. Deoptimization::relock_objects does that job for /all/ eliminated >>>> locks in scope. /All/ can >>>>> be a mix of eliminated nested locks and locks of not-escaping objects. >>>>> >>>>> New with the enhancement: I call relock_objects earlier, just before >> objects >>>> pontentially >>>>> escape. But then later when the owning compiled frame gets >> deoptimized, I >>>> must not do it again: >>>>> >>>>> See call to EscapeBarrier::objs_are_deoptimized in deoptimization.cpp: >>>>> >>>>> 373 if ((jvmci_enabled || ((DoEscapeAnalysis || >> EliminateNestedLocks) && >>>> EliminateLocks)) >>>>> 374 && !EscapeBarrier::objs_are_deoptimized(thread, >> deoptee.id())) { >>>>> 375 bool unused; >>>>> 376 eliminate_locks(thread, chunk, realloc_failures, deoptee, >> exec_mode, >>>> unused); >>>>> 377 } >>>>> >>>>> Now when calling relock_objects early it is quiet possible that I have to >> relock >>>> an object the >>>>> target thread currently waits for. Obviously I cannot relock in this case, >>>> instead I chose to >>>>> introduce relock_count_after_wait to JavaThread. >>>>> >>>>> > Is it just that some of the locking gets optimized away e.g. >>>>> > >>>>> > synchronised(obj) { >>>>> > synchronised(obj) { >>>>> > synchronised(obj) { >>>>> > obj.wait(); >>>>> > } >>>>> > } >>>>> > } >>>>> > >>>>> > If this is reduced to a form as-if it were a single lock of the >>>>> monitor >>>>> > (due to EA) and the wait() triggers a JVM TI event which leads to >>>>> the >>>>> > escape of "obj" then we need to reconstruct the true lock state, >>>>> and >> so >>>>> > when the wait() internally unblocks and reacquires the monitor it >> has to >>>>> > set the true recursion count to 3, not the 1 that it appeared to >>>>> be >> when >>>>> > wait() was initially called. Is that the scenario? >>>>> >>>>> Kind of... except that the locking is not eliminated due to EA and there >>>>> is >> no >>>> JVM TI event >>>>> triggered by wait. >>>>> >>>>> Add >>>>> >>>>> LocalObject l1 = new LocalObject(); >>>>> >>>>> in front of the synchrnized blocks and assume a JVM TI agent acquires l1. >> This >>>> triggers the code in >>>>> question. >>>>> >>>>> See that relocking/reallocating is transactional. If it is done then for >>>>> /all/ >>>> objects in scope and it is >>>>> done at most once. It wouldn't be quite so easy to split this in relocking >> of >>>> nested/EA-based >>>>> eliminated locks. >>>>> >>>>> > If so I find this truly awful. Anyone using wait() in a realistic >>>>> form >>>>> > requires a notification and so the object cannot be thread >>>>> confined. >> In >>>>> >>>>> It is not thread confined. >>>>> >>>>> > which case I would strongly argue that upon hitting the wait() the >> deopt >>>>> > should occur unconditionally and so the lock state is correct >>>>> before >> we >>>>> > wait and so we don't need to mess with the recursion count >> internally >>>>> > when we reacquire the monitor. >>>>> > >>>>> > > >>>>> > > > which I don't like the sound of at all when it comes to >> ObjectMonitor >>>>> > > > state. So I'd like to understand in detail exactly what is >>>>> going on >> here >>>>> > > > and why. This is a very intrusive change that seems to >>>>> badly >> break >>>>> > > > encapsulation and impacts future changes to ObjectMonitor >> that are >>>> under >>>>> > > > investigation. >>>>> > > >>>>> > > I would not regard this as breaking encapsulation. Certainly not >> badly. >>>>> > > >>>>> > > I've added a property relock_count_after_wait to JavaThread. The >>>> property is well >>>>> > > encapsulated. Future ObjectMonitor implementations have to deal >> with >>>> recursion too. They are free >>>>> > > in choosing a way to do that as long as that property is taken >>>>> into >>>> account. This is hardly a >>>>> > > limitation. >>>>> > >>>>> > I do think this badly breaks encapsulation as you have to add a >> callout >>>>> > from the guts of the ObjectMonitor code to reach into the thread >>>>> to >> get >>>>> > this lock count adjustment. I understand why you have had to do >> this but >>>>> > I would much rather see a change to the EA optimisation strategy >>>>> so >> that >>>>> > this is not needed. >>>>> > >>>>> > > Note also that the property is a straight forward extension of >>>>> the >>>> existing concept of deferred >>>>> > > local updates. It is embedded into the structure holding them. So >> not >>>> even the footprint of a >>>>> > > JavaThread is enlarged if no deferred updates are generated. >>>>> > >>>>> > [...] >>>>> > >>>>> > > >>>>> > > I'm actually duplicating the existing external suspend mechanism, >>>> because a thread can be >>>>> > > suspended at most once. And hey, and don't like that either! But >>>>> it >>>> seems not unlikely that the >>>>> > > duplicate can be removed together with the original and the new >> type >>>> of handshakes that will be >>>>> > > used for thread suspend can be used for object deoptimization >> too. See >>>> today's discussion in >>>>> > > JDK-8227745 [2]. >>>>> > >>>>> > I hope that discussion bears some fruit, at the moment it seems >>>>> not >> to >>>>> > be possible to use handshakes here. :( >>>>> > >>>>> > The external suspend mechanism is a royal pain in the proverbial >> that we >>>>> > have to carefully live with. The idea that we're duplicating that >>>>> for >>>>> > use in another fringe area of functionality does not thrill me at >>>>> all. >>>>> > >>>>> > To be clear, I understand the problem that exists and that you >>>>> wish >> to >>>>> > solve, but for the runtime parts I balk at the complexity cost of >>>>> > solving it. >>>>> >>>>> I know it's complex, but by far no rocket science. >>>>> >>>>> Also I find it hard to imagine another fix for JDK-8233915 besides >> changing >>>> the JVM TI specification. >>>>> >>>>> Thanks, Richard. >>>>> >>>>> -----Original Message----- >>>>> From: David Holmes <david.hol...@oracle.com> >>>>> Sent: Dienstag, 17. Dezember 2019 08:03 >>>>> To: Reingruber, Richard <richard.reingru...@sap.com>; serviceability- >>>> d...@openjdk.java.net; hotspot-compiler-...@openjdk.java.net; >> hotspot- >>>> runtime-...@openjdk.java.net; Vladimir Kozlov >> (vladimir.koz...@oracle.com) >>>> <vladimir.koz...@oracle.com> >>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >> Performance >>>> in the Presence of JVMTI Agents >>>>> >>>>> <resend as my mailer crashed during last send> >>>>> >>>>> David >>>>> >>>>> On 17/12/2019 4:57 pm, David Holmes wrote: >>>>>> Hi Richard, >>>>>> >>>>>> On 14/12/2019 5:01 am, Reingruber, Richard wrote: >>>>>>> Hi David, >>>>>>> >>>>>>> > Some further queries/concerns: >>>>>>> > >>>>>>> > src/hotspot/share/runtime/objectMonitor.cpp >>>>>>> > >>>>>>> > Can you please explain the changes to ObjectMonitor::wait: >>>>>>> > >>>>>>> > ! _recursions = save // restore the old recursion count >>>>>>> > ! + >>>>>>> jt->get_and_reset_relock_count_after_wait(); // >>>>>>> > increased by the deferred relock count >>>>>>> > >>>>>>> > what is the "deferred relock count"? I gather it relates to >>>>>>> > >>>>>>> > "The code was extended to be able to deoptimize objects of a >>>>>>> frame that >>>>>>> > is not the top frame and to let another thread than the owning >>>>>>> thread do >>>>>>> > it." >>>>>>> >>>>>>> Yes, these relate. Currently EA based optimizations are reverted, >> when >>>>>>> a compiled frame is replaced >>>>>>> with corresponding interpreter frames. Part of this is relocking >>>>>>> objects with eliminated >>>>>>> locking. New with the enhancement is that we do this also just before >>>>>>> object references are acquired >>>>>>> through JVMTI. In this case we deoptimize also the owning compiled >>>>>>> frame C and we register >>>>>>> deoptimized objects as deferred updates. When control returns to C >> it >>>>>>> gets deoptimized, we notice >>>>>>> that objects are already deoptimized (reallocated and relocked), so >> we >>>>>>> don't do it again (relocking >>>>>>> twice would be incorrect of course). Deferred updates are copied into >>>>>>> the new interpreter frames. >>>>>>> >>>>>>> Problem: relocking is not possible if the target thread T is waiting >>>>>>> on the monitor that needs to be >>>>>>> relocked. This happens only with non-local objects with >>>>>>> EliminateNestedLocks. Instead relocking is >>>>>>> deferred until T owns the monitor again. This is what the piece of >>>>>>> code above does. >>>>>> >>>>>> Sorry I need some more detail here. How can you wait() on an object >>>>>> monitor if the object allocation and/or locking was optimised away? >> And >>>>>> what is a "non-local object" in this context? Isn't EA restricted to >>>>>> thread-confined objects? >>>>>> >>>>>> Is it just that some of the locking gets optimized away e.g. >>>>>> >>>>>> synchronised(obj) { >>>>>> synchronised(obj) { >>>>>> synchronised(obj) { >>>>>> obj.wait(); >>>>>> } >>>>>> } >>>>>> } >>>>>> >>>>>> If this is reduced to a form as-if it were a single lock of the monitor >>>>>> (due to EA) and the wait() triggers a JVM TI event which leads to the >>>>>> escape of "obj" then we need to reconstruct the true lock state, and so >>>>>> when the wait() internally unblocks and reacquires the monitor it has to >>>>>> set the true recursion count to 3, not the 1 that it appeared to be when >>>>>> wait() was initially called. Is that the scenario? >>>>>> >>>>>> If so I find this truly awful. Anyone using wait() in a realistic form >>>>>> requires a notification and so the object cannot be thread confined. In >>>>>> which case I would strongly argue that upon hitting the wait() the >> deopt >>>>>> should occur unconditionally and so the lock state is correct before we >>>>>> wait and so we don't need to mess with the recursion count internally >>>>>> when we reacquire the monitor. >>>>>> >>>>>>> >>>>>>> > which I don't like the sound of at all when it comes to >>>>>>> ObjectMonitor >>>>>>> > state. So I'd like to understand in detail exactly what is going >>>>>>> on here >>>>>>> > and why. This is a very intrusive change that seems to badly >> break >>>>>>> > encapsulation and impacts future changes to ObjectMonitor that >>>>>>> are under >>>>>>> > investigation. >>>>>>> >>>>>>> I would not regard this as breaking encapsulation. Certainly not badly. >>>>>>> >>>>>>> I've added a property relock_count_after_wait to JavaThread. The >>>>>>> property is well >>>>>>> encapsulated. Future ObjectMonitor implementations have to deal >> with >>>>>>> recursion too. They are free in >>>>>>> choosing a way to do that as long as that property is taken into >>>>>>> account. This is hardly a >>>>>>> limitation. >>>>>> >>>>>> I do think this badly breaks encapsulation as you have to add a callout >>>>>> from the guts of the ObjectMonitor code to reach into the thread to >> get >>>>>> this lock count adjustment. I understand why you have had to do this >> but >>>>>> I would much rather see a change to the EA optimisation strategy so >> that >>>>>> this is not needed. >>>>>> >>>>>>> Note also that the property is a straight forward extension of the >>>>>>> existing concept of deferred >>>>>>> local updates. It is embedded into the structure holding them. So not >>>>>>> even the footprint of a >>>>>>> JavaThread is enlarged if no deferred updates are generated. >>>>>>> >>>>>>> > --- >>>>>>> > >>>>>>> > src/hotspot/share/runtime/thread.cpp >>>>>>> > >>>>>>> > Can you please explain why >>>>>>> JavaThread::wait_for_object_deoptimization >>>>>>> > has to be handcrafted in this way rather than using proper >>>>>>> transitions. >>>>>>> > >>>>>>> >>>>>>> I wrote wait_for_object_deoptimization taking >>>>>>> JavaThread::java_suspend_self_with_safepoint_check >>>>>>> as template. So in short: for the same reasons :) >>>>>>> >>>>>>> Threads reach both methods as part of thread state transitions, >>>>>>> therefore special handling is >>>>>>> required to change thread state on top of ongoing transitions. >>>>>>> >>>>>>> > We got rid of "deopt suspend" some time ago and it is disturbing >>>>>>> to see >>>>>>> > it being added back (effectively). This seems like it may be >>>>>>> something >>>>>>> > that handshakes could be used for. >>>>>>> >>>>>>> Deopt suspend used to be something rather different with a similar >>>>>>> name[1]. It is not being added back. >>>>>> >>>>>> I stand corrected. Despite comments in the code to the contrary >>>>>> deopt_suspend didn't actually cause a self-suspend. I was doing a lot of >>>>>> cleanup in this area 13 years ago :) >>>>>> >>>>>>> >>>>>>> I'm actually duplicating the existing external suspend mechanism, >>>>>>> because a thread can be suspended >>>>>>> at most once. And hey, and don't like that either! But it seems not >>>>>>> unlikely that the duplicate can >>>>>>> be removed together with the original and the new type of >> handshakes >>>>>>> that will be used for >>>>>>> thread suspend can be used for object deoptimization too. See >> today's >>>>>>> discussion in JDK-8227745 [2]. >>>>>> >>>>>> I hope that discussion bears some fruit, at the moment it seems not to >>>>>> be possible to use handshakes here. :( >>>>>> >>>>>> The external suspend mechanism is a royal pain in the proverbial that >> we >>>>>> have to carefully live with. The idea that we're duplicating that for >>>>>> use in another fringe area of functionality does not thrill me at all. >>>>>> >>>>>> To be clear, I understand the problem that exists and that you wish to >>>>>> solve, but for the runtime parts I balk at the complexity cost of >>>>>> solving it. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> ----- >>>>>> >>>>>>> Thanks, Richard. >>>>>>> >>>>>>> [1] Deopt suspend was something like an async. handshake for >>>>>>> architectures with register windows, >>>>>>> where patching the return pc for deoptimization of a compiled >>>>>>> frame was racy if the owner thread >>>>>>> was in native code. Instead a "deopt" suspend flag was set on >>>>>>> which the thread patched its own >>>>>>> frame upon return from native. So no thread was suspended. It >> got >>>>>>> its name only from the name of >>>>>>> the flags. >>>>>>> >>>>>>> [2] Discussion about using handshakes to sync. with the target thread: >>>>>>> >>>>>>> https://bugs.openjdk.java.net/browse/JDK- >>>> >> 8227745?focusedCommentId=14306727&page=com.atlassian.jira.plugin.syst >> e >>>> m.issuetabpanels:comment-tabpanel#comment-14306727 >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: David Holmes <david.hol...@oracle.com> >>>>>>> Sent: Freitag, 13. Dezember 2019 00:56 >>>>>>> To: Reingruber, Richard <richard.reingru...@sap.com>; >>>>>>> serviceability-dev@openjdk.java.net; >>>>>>> hotspot-compiler-...@openjdk.java.net; >>>>>>> hotspot-runtime-...@openjdk.java.net >>>>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>>>>> Performance in the Presence of JVMTI Agents >>>>>>> >>>>>>> Hi Richard, >>>>>>> >>>>>>> Some further queries/concerns: >>>>>>> >>>>>>> src/hotspot/share/runtime/objectMonitor.cpp >>>>>>> >>>>>>> Can you please explain the changes to ObjectMonitor::wait: >>>>>>> >>>>>>> ! _recursions = save // restore the old recursion count >>>>>>> ! + jt->get_and_reset_relock_count_after_wait(); // >>>>>>> increased by the deferred relock count >>>>>>> >>>>>>> what is the "deferred relock count"? I gather it relates to >>>>>>> >>>>>>> "The code was extended to be able to deoptimize objects of a frame >> that >>>>>>> is not the top frame and to let another thread than the owning thread >> do >>>>>>> it." >>>>>>> >>>>>>> which I don't like the sound of at all when it comes to ObjectMonitor >>>>>>> state. So I'd like to understand in detail exactly what is going on here >>>>>>> and why. This is a very intrusive change that seems to badly break >>>>>>> encapsulation and impacts future changes to ObjectMonitor that are >> under >>>>>>> investigation. >>>>>>> >>>>>>> --- >>>>>>> >>>>>>> src/hotspot/share/runtime/thread.cpp >>>>>>> >>>>>>> Can you please explain why >> JavaThread::wait_for_object_deoptimization >>>>>>> has to be handcrafted in this way rather than using proper transitions. >>>>>>> >>>>>>> We got rid of "deopt suspend" some time ago and it is disturbing to >> see >>>>>>> it being added back (effectively). This seems like it may be something >>>>>>> that handshakes could be used for. >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>> On 12/12/2019 7:02 am, David Holmes wrote: >>>>>>>> On 12/12/2019 1:07 am, Reingruber, Richard wrote: >>>>>>>>> Hi David, >>>>>>>>> >>>>>>>>> > Most of the details here are in areas I can comment on in >> detail, >>>>>>>>> but I >>>>>>>>> > did take an initial general look at things. >>>>>>>>> >>>>>>>>> Thanks for taking the time! >>>>>>>> >>>>>>>> Apologies the above should read: >>>>>>>> >>>>>>>> "Most of the details here are in areas I *can't* comment on in detail >>>>>>>> ..." >>>>>>>> >>>>>>>> David >>>>>>>> >>>>>>>>> > The only thing that jumped out at me is that I think the >>>>>>>>> > DeoptimizeObjectsALotThread should be a hidden thread. >>>>>>>>> > >>>>>>>>> > + bool is_hidden_from_external_view() const { return true; } >>>>>>>>> >>>>>>>>> Yes, it should. Will add the method like above. >>>>>>>>> >>>>>>>>> > Also I don't see any testing of the >> DeoptimizeObjectsALotThread. >>>>>>>>> Without >>>>>>>>> > active testing this will just bit-rot. >>>>>>>>> >>>>>>>>> DeoptimizeObjectsALot is meant for stress testing with a larger >>>>>>>>> workload. I will add a minimal test >>>>>>>>> to keep it fresh. >>>>>>>>> >>>>>>>>> > Also on the tests I don't understand your @requires clause: >>>>>>>>> > >>>>>>>>> > @requires ((vm.compMode != "Xcomp") & >> vm.compiler2.enabled >>>> & >>>>>>>>> > (vm.opt.TieredCompilation != true)) >>>>>>>>> > >>>>>>>>> > This seems to require that TieredCompilation is disabled, but >>>>>>>>> tiered is >>>>>>>>> > our normal mode of operation. ?? >>>>>>>>> > >>>>>>>>> >>>>>>>>> I removed the clause. I guess I wanted to target the tests towards >> the >>>>>>>>> code they are supposed to >>>>>>>>> test, and it's easier to analyze failures w/o tiered compilation and >>>>>>>>> with just one compiler thread. >>>>>>>>> >>>>>>>>> Additionally I will make use of >>>>>>>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: David Holmes <david.hol...@oracle.com> >>>>>>>>> Sent: Mittwoch, 11. Dezember 2019 08:03 >>>>>>>>> To: Reingruber, Richard <richard.reingru...@sap.com>; >>>>>>>>> serviceability-dev@openjdk.java.net; >>>>>>>>> hotspot-compiler-...@openjdk.java.net; >>>>>>>>> hotspot-runtime-...@openjdk.java.net >>>>>>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better >>>>>>>>> Performance in the Presence of JVMTI Agents >>>>>>>>> >>>>>>>>> Hi Richard, >>>>>>>>> >>>>>>>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I would like to get reviews please for >>>>>>>>>> >>>>>>>>>> >> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ >>>>>>>>>> >>>>>>>>>> Corresponding RFE: >>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227745 >>>>>>>>>> >>>>>>>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 >>>>>>>>>> And potentially https://bugs.openjdk.java.net/browse/JDK- >> 8214584 [1] >>>>>>>>>> >>>>>>>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing >> without >>>>>>>>>> issues (thanks!). In addition the >>>>>>>>>> change is being tested at SAP since I posted the first RFR some >>>>>>>>>> months ago. >>>>>>>>>> >>>>>>>>>> The intention of this enhancement is to benefit performance wise >> from >>>>>>>>>> escape analysis even if JVMTI >>>>>>>>>> agents request capabilities that allow them to access local variable >>>>>>>>>> values. E.g. if you start-up >>>>>>>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, >> then >>>>>>>>>> escape analysis is disabled right >>>>>>>>>> from the beginning, well before a debugger attaches -- if ever one >>>>>>>>>> should do so. With the >>>>>>>>>> enhancement, escape analysis will remain enabled until and after >> a >>>>>>>>>> debugger attaches. EA based >>>>>>>>>> optimizations are reverted just before an agent acquires the >>>>>>>>>> reference to an object. In the JBS item >>>>>>>>>> you'll find more details. >>>>>>>>> >>>>>>>>> Most of the details here are in areas I can comment on in detail, but >> I >>>>>>>>> did take an initial general look at things. >>>>>>>>> >>>>>>>>> The only thing that jumped out at me is that I think the >>>>>>>>> DeoptimizeObjectsALotThread should be a hidden thread. >>>>>>>>> >>>>>>>>> + bool is_hidden_from_external_view() const { return true; } >>>>>>>>> >>>>>>>>> Also I don't see any testing of the DeoptimizeObjectsALotThread. >>>>>>>>> Without >>>>>>>>> active testing this will just bit-rot. >>>>>>>>> >>>>>>>>> Also on the tests I don't understand your @requires clause: >>>>>>>>> >>>>>>>>> @requires ((vm.compMode != "Xcomp") & >> vm.compiler2.enabled & >>>>>>>>> (vm.opt.TieredCompilation != true)) >>>>>>>>> >>>>>>>>> This seems to require that TieredCompilation is disabled, but tiered >> is >>>>>>>>> our normal mode of operation. ?? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> David >>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 >>>>>>>>>> >>>> >> http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.pa >> tc >>>> h >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>