Re: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode

2016-01-26 Thread John Rose
On Jan 20, 2016, at 4:13 AM, Remi Forax  wrote:
> 
> I understand that having a VM that may always recompile may be seen as a
> bug, but having a VM that bails out and stops recompiling, or more generally
> changes the compilation strategy, is a bug too.

As you can guess from my previous message, I agree with this, except for
"change the compilation strategy".  The JVM earns its way in the world by
routinely changing compilation strategy.  The reason most people don't
notice is that the strategy changes are profile-driven and self-correcting.

Nothing in the 292 world promises a particular strategy, just a best effort
to create and execute great code, assuming stable application behavior.

When an optimization breaks, the JVM's strategy may also fail to adjust
correctly.  One symptom of that is infinite recompilation, usually because
one line of code is being handled badly, which then creates huge bloat
in the code cache for the thousands of lines of code that happen to be
inlined nearby.  We try hard to avoid this.

We also try hard to detect this problem.  That is the true meaning of those
strange cutoffs.  Nobody thinks falling into the interpreter is a good idea,
except that it, on balance, is a better idea than (a) throwing an assertion
error, or (b) filling the CPU with JIT jobs and the code cache with discards.
The third choice, (c) running the offending method in the interpreter, at
least preserves a degree of forward progress, while allowing the outraged
user to report a bug.

The correct fix to the bug, IMO, is never to jump from (c) to (a) or (b).
It is to find and fix the problem with the compilation strategy, and the
profile-driven gating logic for it.

If your car's transmission gets a bug (now that they are computers,
they can), what would you prefer?
(a) stop the car immediately,
(b) run the car in first gear at full speed, or
(c) slow the car to a defined speed limit (25mph).
Detroit prefers, and the JVM implements, option (c).

> The problem here is that there is no way, from the point of view of a dyn lang
> runtime, to know what the behavior of the VM will be for a call site if the VM
> decides to stop recompiling, decides not to inline, decides to inline some part
> of the tree, etc.

Yes.  And it usually doesn't matter; the issue doesn't come up until something
breaks, or we find a performance pothole.  The current problem is (in my mind)
a break, not a performance pothole that needs tuning.  If we fix the break,
people shouldn't need to worry about this stuff, usually.

> Said differently, using invokedynamic allows you to create code shapes that
> will change dynamically; if the VM behavior also changes dynamically, it's
> like building a wall on moving parts: the result is strange dynamic behaviors
> that are hard to diagnose and reproduce.

JVMs have always been like that, because of dynamic class loading, but with indy
it is more so, since it's much easier to "override" some previously fixed 
behavior.

> The recompilation behavior of the VM should be kept simple and predictable:
> basically, the VM should always recompile the CS, with no failsafe switch.

We agree that the failsafe should not trip.  Just like we agree that
the circuit breakers in our building should not trip.  We disagree,
perhaps, about what to do when they trip.  I don't want to duct-tape
them back into the "on" position; do you?

> If dyn lang runtime devs have trouble with that, they can already use an 
> exactInvoker to simulate an indirect mh call and we can even provide new 
> method handle combiners to gracefully handle multi-stable CS.

That's all true.  The new combiners might have some sort
of handshake with the JVM to self-adjust their code shape.
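
For concreteness, here is a minimal sketch of the exactInvoker approach,
assuming a hypothetical runtime-managed holder (the IndirectCall class and
its 'target' field are invented for illustration, not part of any runtime):

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.MethodType;

    final class IndirectCall {
        // current target; a plain or volatile variable the runtime updates
        volatile MethodHandle target;

        IndirectCall(MethodHandle initial) { this.target = initial; }

        // Returns a handle of the target's type that re-reads 'target' on
        // every call and invokes it exactly: a plain indirect MH call that
        // the JIT will not speculate through.
        MethodHandle invoker() throws ReflectiveOperationException {
            MethodType type = target.type();
            MethodHandle getTarget = MethodHandles.lookup()
                .findGetter(IndirectCall.class, "target", MethodHandle.class)
                .bindTo(this);                      // type: ()MethodHandle
            // exactInvoker(type) is (MethodHandle, A...)R; folding the getter
            // in front yields (A...)R.  Stored targets must have exactly 'type'.
            return MethodHandles.foldArguments(
                    MethodHandles.exactInvoker(type), getTarget);
        }
    }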

But I claim the baseline behavior that I have called for is
the most generally useful, since it is able to amortize
recompilation resources over multiple CS misses,
put global limits on total recompilation effort, and
preserve reasonable forward progress executing
good-enough code.

(Having a CS change force a reoptimization is tantamount to
adding a JIT control API, something like Compiler.recompile(cs)
in the spirit of System.gc(), but just for CS-bearing methods.
We are a long way from understanding how to work such an API.)

Idea:  Perhaps CS's should have a callback which says,
"Hey, CS, the JIT has mispredicted you a bunch of times;
would you like to nominate an alternative representation?"
The call would be made asynchronously, outside the JIT.
The default behavior would be to say "nope" with the results
given above, but the CS could also return a MH (perhaps
the CS.dynamicInvoker, or perhaps some more elaborate logic),
which the JVM would slide into place over the top of the CS.
Despite the fact that CS bindings are final, the new binding
would take its place.  And it would be the user's choice
whether that binding pointed to the old CS or a new CS or
some combination of both.
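
A hypothetical sketch of that callback idea (RelinkAdvice and
NotifyingCallSite are invented names; nothing like this exists in
java.lang.invoke today):

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MutableCallSite;

    interface RelinkAdvice {
        // Called asynchronously, outside the JIT, after 'mispredictions'
        // mispredictions at this site.  Return null to keep the current
        // behavior, or a MethodHandle to slide into place over the site.
        MethodHandle onMispredicted(MutableCallSite site, int mispredictions);
    }

    final class NotifyingCallSite extends MutableCallSite {
        private final RelinkAdvice advice;

        NotifyingCallSite(MethodHandle target, RelinkAdvice advice) {
            super(target);
            this.advice = advice;
        }

        // In this sketch a JVM-side watchdog would call this; here it is
        // an ordinary method so the shape of the API stays visible.
        void jitMispredicted(int count) {
            MethodHandle alternative = advice.onMispredicted(this, count);
            if (alternative != null) {
                // e.g. another site's dynamicInvoker, or more elaborate logic
                setTarget(alternative.asType(type()));
            }
        }
    }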

— John
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

Re: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode

2016-01-26 Thread John Rose
What I would like to see is for users to feel free to use CallSites
with any amount of mutability, and have the JVM pick a good
strategy for speculating and optimizing through CS target bindings.

By "good" I mean that, if the CS is not megamutable, you get
the performance comparable to an "invokestatic".  But if the
CS *is* megamutable (unstable), it is not "good" (IMO) to issue
a storm of recompilations, especially if (as is usually the case)
the megamutable CS is one of 1000s of other call sites in the
same code blob, all of which must be recompiled because one
CS had a problem.

Instead, the megamutable CS should be downgraded to an
indirect call through a normal (or volatile) variable.

So does this leave some performance on the floor?  Of course;
perhaps the CS finally settles down long enough for the JVM
to venture a profitable recompilation, and for the cost of recompilation
to be paid off by further stability and efficient execution of the CS.

My main point here is that reoptimization of megamutables is
a misuse of speculation.  I'm not saying that the JIT should have
a tantrum and refuse to compile the call site (which is a bug),
but it should stop speculating that it is stable when in fact it is not.

There are lots of ways to improve the performance of megamutables,
but unconditional recompilation is not one of those ways.  It uses
a wrecking ball to swat a fly.

Handling megamutables is very much like handling megamorphics.
You want to hang on to the hope that there are really just a few
branches (common case) and optimize those, and call out-of-line
for the rest.  If that hope fails, you call out-of-line always.  And
you want to detect if the statistics change, where the entropy of
the CS target goes down to a small number, so you can venture
another recompile with up-to-date speculation.  We should apply
these techniques to both megamorphics and megamutables.
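
A minimal sketch of that shape at the JDK/runtime level, assuming the
runtime supplies the per-case guards and targets plus a generic out-of-line
handle, all of compatible method types (the class name is illustrative):

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.util.List;

    final class PolymorphicChain {
        // tests: (A...)boolean guards; targets and generic: (A...)R handles
        static MethodHandle build(List<MethodHandle> tests,
                                  List<MethodHandle> targets,
                                  MethodHandle generic) {
            MethodHandle chain = generic;   // out-of-line path for "the rest"
            for (int i = tests.size() - 1; i >= 0; i--) {
                // optimize the few expected branches, fall through otherwise
                chain = MethodHandles.guardWithTest(tests.get(i),
                                                    targets.get(i), chain);
            }
            return chain;
        }
    }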

So there's an ambiguity in the contract:  Is CS speculation just
a best-efforts kind of thing, or is the JVM contracted to mechanically
recompile on every CS change?  I think the reasonable reading
of the javadoc (etc.) is the first, not the second.

How would a user communicate that his CS is a special one,
whose invalidation should *always* trigger reoptimization?
I don't know, maybe an integer-valued callback that is triggered
during setTarget calls, and returns the amount of (virtual)
time before the next reoptimization should be attempted.
The callback would be passed the number of previous
reoptimizations (at this site or in the whole method or
both), as a warning of how resource-intensive this CS
is becoming.  Returning constant zero means the
current behavior.  I think you can see lots of problems
with such an API.
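
Shaped as code, that hypothetical callback might look like this (nothing of
the sort exists in java.lang.invoke; the names are invented):

    @FunctionalInterface
    interface ReoptimizationPolicy {
        // Called during setTarget.  'previousReoptimizations' counts earlier
        // reoptimizations (at this site, in the whole method, or both).
        // Returns the amount of (virtual) time to wait before the next
        // reoptimization attempt; returning 0 means the current behavior.
        long nextReoptimizationDelay(int previousReoptimizations);
    }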

And, I think that sort of thing isn't notably better than simple
JVM heuristics.  Here's how I think we should fix the
megamutable problem:

1. Speculate at first that a CS is immutable.

2. If that fails, speculate that it is stable, as:
   if (cs.t == expected) inline expected(); else outline cs.t();
Collect a profile count along the outline path.  (A user-level
sketch of this shape follows the list below.)

3. Every once in a while, if a code blob is accumulating
outline counts, queue it for reoptimization.
Crucially, do this in such a way that the JIT does
not become a foreground consumer of CPU cycles.

4. When recompiling a stable call site, always
inline the current target ("this time fer sure!").
Maybe if this is a *really* bad actor (but how
can you tell?) forget the speculation part.

5. Maybe, speculate on the LF of the target,
not the target itself, to allow some degree of
harmless variation by targets.  (For some
codes that will help, although it interacts
with MH customization in tricky ways.)

6. Maybe fiddle with collecting previous hot targets,
or (better) empower the JDK code to manage that stuff.
PIC logic should be handled at the JDK level,
not in the JIT.
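
Here is the promised user-level sketch of the step-2 shape, written against
the public API rather than JIT internals (a real JIT does this with profiled
machine code, not combinators; the class and helper names are illustrative):

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.MethodType;
    import java.lang.invoke.MutableCallSite;

    final class StableSpeculation {
        // Speculate that 'cs' is stable: fast path on the current target,
        // indirect call through the site's dynamicInvoker otherwise.
        static MethodHandle speculate(MutableCallSite cs)
                throws ReflectiveOperationException {
            MethodHandle expected = cs.getTarget();
            MethodType type = expected.type();

            MethodHandle test = MethodHandles.lookup().findStatic(
                    StableSpeculation.class, "sameTarget",
                    MethodType.methodType(boolean.class,
                            MutableCallSite.class, MethodHandle.class));
            test = MethodHandles.insertArguments(test, 0, cs, expected);
            test = MethodHandles.dropArguments(test, 0, type.parameterList());

            // if (cs.t == expected) inline expected(); else outline cs.t();
            return MethodHandles.guardWithTest(
                    test, expected, cs.dynamicInvoker());
        }

        static boolean sameTarget(MutableCallSite cs, MethodHandle expected) {
            return cs.getTarget() == expected;
        }
    }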

Anyway, if the above gets addressed eventually,
or if the rest of the MLVM crew proves that I don't
know what I'm talking about, I'm OK with this fix.

"Reviewed", assuming future improvements.

— John

On Jan 20, 2016, at 3:54 AM, Vladimir Ivanov wrote:
> 
> John, Chris, thanks for the feedback.
> 
> I don't think it is only about microbenchmarks. Long-running large
> applications with lots of mutable call sites should also benefit from this
> change. Current JVM behavior counts invalidations on the root method, so
> nmethods with multiple mutable call sites (from the root & all inlined
> callees) are more likely to hit the limit, even if there are no mega-mutable
> sites. It just sums up, and PerMethodRecompilationCutoff (= 400, by default)
> doesn't look like a huge number.
> 
> Also, LambdaForm sharing somewhat worsens the situation. When LambdaForms were
> mostly customized, different method handle chains were compiled into a single
> nmethod. Right now, it means that not only the root method is always
> interpreted, but all bound method handle chains are broken into numerous
> per-LF nmethods (see JDK-8069591 for some details).

Re: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode

2016-01-20 Thread MacGregor, Duncan (GE Energy Management)
I was going to say it is unlikely to matter in production cases but might
well hit test code which does extensive meta-programming, but actually,
since it's a question of invalidations across _all_ sites, rather than any
single one, I think it might make a difference. I'll need to take a look at
what our compilation counts eventually come to and experiment with
changing the limits. We did work quite early on to limit the extent of
call site invalidations.

One thing that might affect this is how megamorphic call sites are
handled. At the moment we keep a cache of classes, method handles, and
switch points, and we check the switch point before calling the method
handle. I had considered a change to bind the switch points to the method
handles and thus allow those checks to be optimised out for methods called
extensively from megamorphic call sites; would that also fall foul of the
compilation count being increased?
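
Sketched against the public API, binding a switch point to its method
handle would look roughly like this (the CacheEntry class and the relink
fallback are assumptions of the sketch, not an actual implementation):

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.SwitchPoint;

    final class CacheEntry {
        final Class<?> receiverClass;
        final MethodHandle guardedTarget;

        CacheEntry(Class<?> receiverClass, MethodHandle target,
                   SwitchPoint switchPoint, MethodHandle relinkFallback) {
            this.receiverClass = receiverClass;
            // Bake the switch-point check into the handle itself instead of
            // checking it separately at every call site: a valid switch point
            // costs nothing after JIT compilation, and invalidation falls
            // through to the relink path.  'target' and 'relinkFallback' must
            // have the same method type.
            this.guardedTarget =
                    switchPoint.guardWithTest(target, relinkFallback);
        }
    }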

I think there is definitely room for communicating more about the nature of
a call site to the JIT. Whether this should be around recompilation or
perhaps more focused round inlining and type specialisation to avoid
invalidations and recompilation would be my question. For example, method
invocation sites may go megamorphic, and this currently forms a barrier to
the JIT seeing the types in a way that doesn't really exist with standard
invokeVirtual sites. If there was some feedback loop allowing sites to be
cloned as methods are inlined, and a way to indicate this was allowed or
desired, then that might allow significantly more optimisations to happen
in invokeDynamic based languages. It would also probably be a horror to
implement in the current model, but I'm sure you guys can fix all that. :-)

Duncan.

On 20/01/2016, 11:54, "mlvm-dev on behalf of Vladimir Ivanov"
 wrote:
>MLVM folks, I'd like to hear your opinion about what kind of behavior you
>expect from the JVM w.r.t. mutable call sites.
>
>There are valid use-cases when JVM shouldn't throttle the recompilation
>(e.g., long-running application with indy-based dynamic tracing). Maybe
>there's a place for a new CallSite flavor to clearly communicate
>application expectations to the JVM? Either always recompile (thus
>eventually reaching peak performance) or give up and generate less
>efficient machine code, but save on possible recompilations.
>
>Best regards,
>Vladimir Ivanov
>
>On 1/20/16 2:37 AM, John Rose wrote:
>> On Jan 18, 2016, at 4:54 AM, Vladimir Ivanov wrote:
>>>
>>> The fix is to avoid updating recompilation count when corresponding
>>> nmethod is invalidated due to a call site target change.
>>
>> Although I'm not vetoing it (since it seems it will help customers in
>> the short term), I'm uncomfortable with this fix because it doesn't
>> scale to large dyn. lang. applications with many unstable call sites.
>>   Put another way, it feels like we are duct-taping down a failsafe
>> switch (against infinite recompilation) in order to spam a
>> micro-benchmark:  a small number of mega-mutable call sites for which we
>> are willing to spend (potentially) all of the JIT resources, including
>> those usually allocated to application performance in the steady state.
>>   Put a third way:  I am not comfortable with unthrottled infinite
>> recompilation as a performance strategy.
>>
>> I've commented on the new RFE (JDK-8147550) where to go next, including
>> the following sentiments:
>>
>>> There is a serious design tension here, though: Some users apparently
>>> are willing to endure an infinite series of recompilations as part of
>>> the cost of doing business; JDK-7177745 addresses this need by turning
>>> off the fail-safe against (accidental, buggy) infinite recompilation
>>> for unstable CSs. Other users might find that having a percentage of
>>> machine time devoted to recompilation is a problem. (This has been the
>>> case in the past with non-dynamic languages, at least.) The code shape
>>> proposed in this bug report would cover all simple unstable call
>>> sites (bi-stable, for example, would compile to a bi-morphic call),
>>> but, in pathological cases (infinite sequence of distinct CS targets)
>>> would "settle down" into a code shape that would be sub-optimal for
>>> any single target, but (as an indirect MH call) reasonable for all the
>>> targets together.
>>>
>>> In the absence of clear direction from the user or the profile, the
>>> JVM has to choose infinite recompilation or a good-enough final
>>> compilation. The latter choice is safer. And the
>>> infinite recompilation is less safe because there is no intrinsic
>>> bound on the amount of machine cycles that could be diverted to
>>> recompilation, given a dynamic language application with
>>> enough mega-mutable CSs. Settling down to a network of indirect calls
>>> has a bounded cost.
>>>
>>> Yes, one size-fits-all tactics never please everybody. But the JVM
>>> should not choose tactics with unlimited downsides.
>>
>> — John

Re: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode

2016-01-20 Thread Remi Forax
Hi John, 
I understand that having a VM that may always recompile may be seen as a bug,
but having a VM that bails out and stops recompiling, or more generally changes
the compilation strategy, is a bug too.

The problem here is that there is no way, from the point of view of a dyn lang
runtime, to know what the behavior of the VM will be for a call site if the VM
decides to stop recompiling, decides not to inline, decides to inline some part
of the tree, etc.
Said differently, using invokedynamic allows you to create code shapes that
will change dynamically; if the VM behavior also changes dynamically, it's like
building a wall on moving parts: the result is strange dynamic behaviors that
are hard to diagnose and reproduce.

The recompilation behavior of the VM should be kept simple and predictable:
basically, the VM should always recompile the CS, with no failsafe switch.
If dyn lang runtime devs have trouble with that, they can already use an 
exactInvoker to simulate an indirect mh call and we can even provide new method 
handle combiners to gracefully handle multi-stable CS. 

regards, 
Rémi 

- Mail original -

> De: "John Rose" 
> À: "Vladimir Ivanov" 
> Cc: "hotspot compiler" 
> Envoyé: Mercredi 20 Janvier 2016 00:37:29
> Objet: Re: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause
> target method to always run in interpreter mode

> On Jan 18, 2016, at 4:54 AM, Vladimir Ivanov < vladimir.x.iva...@oracle.com >
> wrote:

> > The fix is to avoid updating recompilation count when corresponding nmethod
> > is invalidated due to a call site target change.
> 

> Although I'm not vetoing it (since it seems it will help customers in the
> short term), I'm uncomfortable with this fix because it doesn't scale to
> large dyn. lang. applications with many unstable call sites. Put another
> way, it feels like we are duct-taping down a failsafe switch (against
> infinite recompilation) in order to spam a micro-benchmark: a small number of
> mega-mutable call sites for which we are willing to spend (potentially) all
> of the JIT resources, including those usually allocated to application
> performance in the steady state. Put a third way: I am not comfortable with
> unthrottled infinite recompilation as a performance strategy.

> I've commented on the new RFE (JDK-8147550) where to go next, including the
> following sentiments:

> > There is a serious design tension here, though: Some users apparently are
> > willing to endure an infinite series of recompilations as part of the cost
> > of doing business; JDK-7177745 addresses this need by turning off the
> > fail-safe against (accidental, buggy) infinite recompilation for unstable
> > CSs. Other users might find that having a percentage of machine time
> > devoted
> > to recompilation is a problem. (This has been the case in the past with
> > non-dynamic languages, at least.) The code shape proposed in this bug
> > report
> > would cover all simple unstable call sites (bi-stable, for example, would
> > compile to a bi-morphic call), but, in pathological cases (infinite
> > sequence
> > of distinct CS targets) would "settle down" into a code shape that would be
> > sub-optimal for any single target, but (as an indirect MH call) reasonable
> > for all the targets together.
> 

> > In the absence of clear direction from the user or the profile, the JVM has
> > to choose infinite recompilation or a good-enough final compilation. The
> > latter choice is safer. And the infinite recompilation is less safe because
> > there is no intrinsic bound on the amount of machine cycles that could be
> > diverted to recompilation, given a dynamic language application with enough
> > mega-mutable CSs. Settling down to a network of indirect calls has a
> > bounded
> > cost.
> 

> > Yes, one size-fits-all tactics never please everybody. But the JVM should
> > not
> > choose tactics with unlimited downsides.
> 

> — John
___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


Re: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode

2016-01-20 Thread Vladimir Ivanov

John, Chris, thanks for the feedback.

I don't think it is only about microbenchmarks. Long-running large
applications with lots of mutable call sites should also benefit from
this change. Current JVM behavior counts invalidations on the root method,
so nmethods with multiple mutable call sites (from the root & all inlined
callees) are more likely to hit the limit, even if there are no
mega-mutable sites. It just sums up, and PerMethodRecompilationCutoff
(= 400, by default) doesn't look like a huge number.


Also, LambdaForm sharing somewhat worsens the situation. When LambdaForms
were mostly customized, different method handle chains were compiled
into a single nmethod. Right now, it means that not only the root method
is always interpreted, but all bound method handle chains are broken
into numerous per-LF nmethods (see JDK-8069591 for some details).


MLVM folks, I'd like to hear your opinion about what kind of behavior
you expect from the JVM w.r.t. mutable call sites.


There are valid use-cases when JVM shouldn't throttle the recompilation 
(e.g., long-running application with indy-based dynamic tracing). Maybe 
there's a place for a new CallSite flavor to clearly communicate 
application expectations to the JVM? Either always recompile (thus 
eventually reaching peak performance) or give up and generate less 
efficient machine code, but save on possible recompilations.
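
One hypothetical shape for such a flavor (nothing like this exists today;
the names are invented, and the JVM would have to be taught to consult the
hint):

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MutableCallSite;

    final class HintedCallSite extends MutableCallSite {
        enum RelinkHint {
            ALWAYS_RECOMPILE,     // keep recompiling, aim for peak performance
            SETTLE_FOR_INDIRECT   // give up early, accept less efficient code
        }

        final RelinkHint hint;

        HintedCallSite(MethodHandle target, RelinkHint hint) {
            super(target);
            this.hint = hint;
        }
    }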


Best regards,
Vladimir Ivanov

On 1/20/16 2:37 AM, John Rose wrote:

On Jan 18, 2016, at 4:54 AM, Vladimir Ivanov wrote:


The fix is to avoid updating recompilation count when corresponding
nmethod is invalidated due to a call site target change.


Although I'm not vetoing it (since it seems it will help customers in
the short term), I'm uncomfortable with this fix because it doesn't
scale to large dyn. lang. applications with many unstable call sites.
  Put another way, it feels like we are duct-taping down a failsafe
switch (against infinite recompilation) in order to spam a
micro-benchmark:  a small number of mega-mutable call sites for which we
are willing to spend (potentially) all of the JIT resources, including
those usually allocated to application performance in the steady state.
  Put a third way:  I am not comfortable with unthrottled infinite
recompilation as a performance strategy.

I've commented on the new RFE (JDK-8147550) where to go next, including
the following sentiments:


There is a serious design tension here, though: Some users apparently
are willing to endure an infinite series of recompilations as part of
the cost of doing business; JDK-7177745 addresses this need by turning
off the fail-safe against (accidental, buggy) infinite recompilation
for unstable CSs. Other users might find that having a percentage of
machine time devoted to recompilation is a problem. (This has been the
case in the past with non-dynamic languages, at least.) The code shape
proposed in this bug report would cover all simple unstable call
sites (bi-stable, for example, would compile to a bi-morphic call),
but, in pathological cases (infinite sequence of distinct CS targets)
would "settle down" into a code shape that would be sub-optimal for
any single target, but (as an indirect MH call) reasonable for all the
targets together.

In the absence of clear direction from the user or the profile, the
JVM has to choose infinite recompilation or a good-enough final
compilation. The latter choice is safer. And the
infinite recompilation is less safe because there is no intrinsic
bound on the amount of machine cycles that could be diverted to
recompilation, given a dynamic language application with
enough mega-mutable CSs. Settling down to a network of indirect calls
has a bounded cost.

Yes, one size-fits-all tactics never please everybody. But the JVM
should not choose tactics with unlimited downsides.


— John

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev