<resend as my mailer crashed during last send>

David

On 17/12/2019 4:57 pm, David Holmes wrote:
Hi Richard,

On 14/12/2019 5:01 am, Reingruber, Richard wrote:
Hi David,

   > Some further queries/concerns:
   >
   > src/hotspot/share/runtime/objectMonitor.cpp
   >
   > Can you please explain the changes to ObjectMonitor::wait:
   >
   > !   _recursions = save      // restore the old recursion count
   > !                 + jt->get_and_reset_relock_count_after_wait(); //
   > increased by the deferred relock count
   >
   > what is the "deferred relock count"? I gather it relates to
   >
   > "The code was extended to be able to deoptimize objects of a frame that    > is not the top frame and to let another thread than the owning thread do
   > it."

Yes, these relate. Currently EA based optimizations are reverted, when a compiled frame is replaced with corresponding interpreter frames. Part of this is relocking objects with eliminated locking. New with the enhancement is that we do this also just before object references are acquired through JVMTI. In this case we deoptimize also the owning compiled frame C and we register deoptimized objects as deferred updates. When control returns to C it gets deoptimized, we notice that objects are already deoptimized (reallocated and relocked), so we don't do it again (relocking twice would be incorrect of course). Deferred updates are copied into the new interpreter frames.

Problem: relocking is not possible if the target thread T is waiting on the monitor that needs to be relocked. This happens only with non-local objects with EliminateNestedLocks. Instead relocking is deferred until T owns the monitor again. This is what the piece of code above does.

Sorry I need some more detail here. How can you wait() on an object monitor if the object allocation and/or locking was optimised away? And what is a "non-local object" in this context? Isn't EA restricted to thread-confined objects?

Is it just that some of the locking gets optimized away e.g.

synchronised(obj) {
   synchronised(obj) {
     synchronised(obj) {
       obj.wait();
     }
   }
}

If this is reduced to a form as-if it were a single lock of the monitor (due to EA) and the wait() triggers a JVM TI event which leads to the escape of "obj" then we need to reconstruct the true lock state, and so when the wait() internally unblocks and reacquires the monitor it has to set the true recursion count to 3, not the 1 that it appeared to be when wait() was initially called. Is that the scenario?

If so I find this truly awful. Anyone using wait() in a realistic form requires a notification and so the object cannot be thread confined. In which case I would strongly argue that upon hitting the wait() the deopt should occur unconditionally and so the lock state is correct before we wait and so we don't need to mess with the recursion count internally when we reacquire the monitor.


   > which I don't like the sound of at all when it comes to ObjectMonitor    > state. So I'd like to understand in detail exactly what is going on here
   > and why.  This is a very intrusive change that seems to badly break
   > encapsulation and impacts future changes to ObjectMonitor that are under
   > investigation.

I would not regard this as breaking encapsulation. Certainly not badly.

I've added a property relock_count_after_wait to JavaThread. The property is well encapsulated. Future ObjectMonitor implementations have to deal with recursion too. They are free in choosing a way to do that as long as that property is taken into account. This is hardly a
limitation.

I do think this badly breaks encapsulation as you have to add a callout from the guts of the ObjectMonitor code to reach into the thread to get this lock count adjustment. I understand why you have had to do this but I would much rather see a change to the EA optimisation strategy so that this is not needed.

Note also that the property is a straight forward extension of the existing concept of deferred local updates. It is embedded into the structure holding them. So not even the footprint of a
JavaThread is enlarged if no deferred updates are generated.

   > ---
   >
   > src/hotspot/share/runtime/thread.cpp
   >
   > Can you please explain why JavaThread::wait_for_object_deoptimization    > has to be handcrafted in this way rather than using proper transitions.
   >

I wrote wait_for_object_deoptimization taking JavaThread::java_suspend_self_with_safepoint_check
as template. So in short: for the same reasons :)

Threads reach both methods as part of thread state transitions, therefore special handling is
required to change thread state on top of ongoing transitions.

   > We got rid of "deopt suspend" some time ago and it is disturbing to see    > it being added back (effectively). This seems like it may be something
   > that handshakes could be used for.

Deopt suspend used to be something rather different with a similar name[1]. It is not being added back.

I stand corrected. Despite comments in the code to the contrary deopt_suspend didn't actually cause a self-suspend. I was doing a lot of cleanup in this area 13 years ago :)


I'm actually duplicating the existing external suspend mechanism, because a thread can be suspended at most once. And hey, and don't like that either! But it seems not unlikely that the duplicate can be removed together with the original and the new type of handshakes that will be used for thread suspend can be used for object deoptimization too. See today's discussion in JDK-8227745 [2].

I hope that discussion bears some fruit, at the moment it seems not to be possible to use handshakes here. :(

The external suspend mechanism is a royal pain in the proverbial that we have to carefully live with. The idea that we're duplicating that for use in another fringe area of functionality does not thrill me at all.

To be clear, I understand the problem that exists and that you wish to solve, but for the runtime parts I balk at the complexity cost of solving it.

Thanks,
David
-----

Thanks, Richard.

[1] Deopt suspend was something like an async. handshake for architectures with register windows,      where patching the return pc for deoptimization of a compiled frame was racy if the owner thread      was in native code. Instead a "deopt" suspend flag was set on which the thread patched its own      frame upon return from native. So no thread was suspended. It got its name only from the name of
     the flags.

[2] Discussion about using handshakes to sync. with the target thread:
https://bugs.openjdk.java.net/browse/JDK-8227745?focusedCommentId=14306727&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14306727

-----Original Message-----
From: David Holmes <david.hol...@oracle.com>
Sent: Freitag, 13. Dezember 2019 00:56
To: Reingruber, Richard <richard.reingru...@sap.com>; serviceability-dev@openjdk.java.net; hotspot-compiler-...@openjdk.java.net; hotspot-runtime-...@openjdk.java.net Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents

Hi Richard,

Some further queries/concerns:

src/hotspot/share/runtime/objectMonitor.cpp

Can you please explain the changes to ObjectMonitor::wait:

!   _recursions = save      // restore the old recursion count
!                 + jt->get_and_reset_relock_count_after_wait(); //
increased by the deferred relock count

what is the "deferred relock count"? I gather it relates to

"The code was extended to be able to deoptimize objects of a frame that
is not the top frame and to let another thread than the owning thread do
it."

which I don't like the sound of at all when it comes to ObjectMonitor
state. So I'd like to understand in detail exactly what is going on here
and why.  This is a very intrusive change that seems to badly break
encapsulation and impacts future changes to ObjectMonitor that are under
investigation.

---

src/hotspot/share/runtime/thread.cpp

Can you please explain why JavaThread::wait_for_object_deoptimization
has to be handcrafted in this way rather than using proper transitions.

We got rid of "deopt suspend" some time ago and it is disturbing to see
it being added back (effectively). This seems like it may be something
that handshakes could be used for.

Thanks,
David
-----

On 12/12/2019 7:02 am, David Holmes wrote:
On 12/12/2019 1:07 am, Reingruber, Richard wrote:
Hi David,

    > Most of the details here are in areas I can comment on in detail,
but I
    > did take an initial general look at things.

Thanks for taking the time!

Apologies the above should read:

"Most of the details here are in areas I *can't* comment on in detail ..."

David

    > The only thing that jumped out at me is that I think the
    > DeoptimizeObjectsALotThread should be a hidden thread.
    >
    > +  bool is_hidden_from_external_view() const { return true; }

Yes, it should. Will add the method like above.

    > Also I don't see any testing of the DeoptimizeObjectsALotThread.
Without
    > active testing this will just bit-rot.

DeoptimizeObjectsALot is meant for stress testing with a larger
workload. I will add a minimal test
to keep it fresh.

    > Also on the tests I don't understand your @requires clause:
    >
    >   @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled &
    > (vm.opt.TieredCompilation != true))
    >
    > This seems to require that TieredCompilation is disabled, but
tiered is
    > our normal mode of operation. ??
    >

I removed the clause. I guess I wanted to target the tests towards the
code they are supposed to
test, and it's easier to analyze failures w/o tiered compilation and
with just one compiler thread.

Additionally I will make use of
compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the tests.

Thanks,
Richard.

-----Original Message-----
From: David Holmes <david.hol...@oracle.com>
Sent: Mittwoch, 11. Dezember 2019 08:03
To: Reingruber, Richard <richard.reingru...@sap.com>;
serviceability-dev@openjdk.java.net;
hotspot-compiler-...@openjdk.java.net;
hotspot-runtime-...@openjdk.java.net
Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better
Performance in the Presence of JVMTI Agents

Hi Richard,

On 11/12/2019 7:45 am, Reingruber, Richard wrote:
Hi,

I would like to get reviews please for

http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/

Corresponding RFE:
https://bugs.openjdk.java.net/browse/JDK-8227745

Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915
And potentially https://bugs.openjdk.java.net/browse/JDK-8214584 [1]

Vladimir Kozlov kindly put webrev.3 through tier1-8 testing without
issues (thanks!). In addition the
change is being tested at SAP since I posted the first RFR some
months ago.

The intention of this enhancement is to benefit performance wise from
escape analysis even if JVMTI
agents request capabilities that allow them to access local variable
values. E.g. if you start-up
with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, then
escape analysis is disabled right
from the beginning, well before a debugger attaches -- if ever one
should do so. With the
enhancement, escape analysis will remain enabled until and after a
debugger attaches. EA based
optimizations are reverted just before an agent acquires the
reference to an object. In the JBS item
you'll find more details.

Most of the details here are in areas I can comment on in detail, but I
did take an initial general look at things.

The only thing that jumped out at me is that I think the
DeoptimizeObjectsALotThread should be a hidden thread.

+  bool is_hidden_from_external_view() const { return true; }

Also I don't see any testing of the DeoptimizeObjectsALotThread. Without
active testing this will just bit-rot.

Also on the tests I don't understand your @requires clause:

    @requires ((vm.compMode != "Xcomp") & vm.compiler2.enabled &
(vm.opt.TieredCompilation != true))

This seems to require that TieredCompilation is disabled, but tiered is
our normal mode of operation. ??

Thanks,
David

Thanks,
Richard.

[1] Experimental fix for JDK-8214584 based on JDK-8227745
http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.patch


Reply via email to