Re: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode
What I would like to see is for users to feel free to use CallSites with any amount of mutability, and have the JVM pick a good strategy for speculating and optimizing through CS target bindings. By "good" I mean that, if the CS is not megamutable, you get performance comparable to an "invokestatic". But if the CS *is* megamutable (unstable), it is not "good" (IMO) to issue a storm of recompilations, especially if (as is usually the case) the megamutable CS is one of 1000s of other call sites in the same code blob, all of which must be recompiled because one CS had a problem. Instead, the megamutable CS should be downgraded to an indirect call through a normal (or volatile) variable.

So does this leave some performance on the floor? Of course; perhaps the CS finally settles down long enough for the JVM to venture a profitable recompilation, and for the cost of recompilation to be paid off by further stability and efficient execution of the CS. My main point here is that reoptimization of megamutables is a misuse of speculation. I'm not saying that the JIT should have a tantrum and refuse to compile the call site (which would be a bug), but it should stop speculating that the site is stable when in fact it is not. There are lots of ways to improve the performance of megamutables, but unconditional recompilation is not one of them. It uses a wrecking ball to swat a fly.

Handling megamutables is very much like handling megamorphics. You want to hang on to the hope that there are really just a few branches (the common case) and optimize those, and call out-of-line for the rest. If that hope fails, you call out-of-line always. And you want to detect if the statistics change, where the entropy of the CS target goes down to a small number, so you can venture another recompile with up-to-date speculation. We should apply these techniques to both megamorphics and megamutables.

So there's an ambiguity in the contract: Is CS speculation just a best-efforts kind of thing, or is the JVM contracted to mechanically recompile on every CS change? I think the reasonable reading of the javadoc (etc.) is the first, not the second.

How would a user communicate that his CS is a special one, whose invalidation should *always* trigger reoptimization? I don't know; maybe an integer-valued callback that is triggered during setTarget calls, and returns the amount of (virtual) time before the next reoptimization should be attempted. The callback would be passed the number of previous reoptimizations (at this site, or in the whole method, or both), as a warning of how resource-intensive this CS is becoming. Returning constant zero would mean the current behavior. I think you can see lots of problems with such an API. And I think that sort of thing isn't notably better than simple JVM heuristics.

Here's how I think we should fix the megamutable problem:

1. Speculate at first that a CS is immutable.

2. If that fails, speculate that it is stable, as: if (cs.t == expected) inline expected(); else outline cs.t(); Collect a profile count along the outline path. (A rough Java-level sketch of this shape follows at the end of this message.)

3. Every once in a while, if a code blob is accumulating outline counts, queue it for reoptimization. Crucially, do this in such a way that the JIT does not become a foreground consumer of CPU cycles.

4. When recompiling a stable call site, always inline the current target ("this time fer sure!"). Maybe if this is a *really* bad actor (but how can you tell?) forget the speculation part.

5. Maybe, speculate on the LF of the target, not the target itself, to allow some degree of harmless variation by targets. (For some codes that will help, although it interacts with MH customization in tricky ways.)

6. Maybe fiddle with collecting previous hot targets, or (better) empower the JDK code to manage that stuff. PIC logic should be handled at the JDK level, not in the JIT.

Anyway, if the above gets addressed eventually, or if the rest of the MLVM crew proves that I don't know what I'm talking about, I'm OK with this fix. "Reviewed", assuming future improvements.

— John

On Jan 20, 2016, at 3:54 AM, Vladimir Ivanov wrote:
>
> John, Chris, thanks for the feedback.
>
> I don't think it is only about microbenchmarks. Long-running large applications with lots of mutable call sites should also benefit from this change. Current JVM behavior counts invalidations on the root method, so nmethods with multiple mutable call sites (from the root & all inlined callees) are more likely to hit the limit, even if there are no mega-mutable sites. It just sums up, and PerMethodRecompilationCutoff (= 400, by default) doesn't look like a huge number.
>
> Also, LambdaForm sharing somewhat worsens the situation. When LambdaForms were mostly customized, different method handle chains were compiled into a single nmethod. Right now, it means that not only the root method is always interpreted, but all bound method handle chains are broken into numerous per-LF nmethods.
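To make step 2 concrete, here is a minimal Java-level sketch of the "stable call site" shape, written as ordinary code rather than the machine code the JIT would actually emit. The class and member names (StableCallSiteShape, expected, outlineMisses) are invented for illustration, and the example assumes a call site of type (int)int:

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MutableCallSite;

    final class StableCallSiteShape {
        private final MutableCallSite cs;
        private final MethodHandle expected;   // target speculated on at compile time
        private long outlineMisses;            // profile count along the outline path (step 3)

        StableCallSiteShape(MutableCallSite cs) {
            this.cs = cs;
            this.expected = cs.getTarget();
        }

        int invoke(int arg) throws Throwable {
            MethodHandle current = cs.getTarget();
            if (current == expected) {
                // fast path: the speculated target, which the JIT would inline
                return (int) expected.invokeExact(arg);
            } else {
                // slow path: out-of-line indirect call; accumulating misses here
                // is what would queue the code blob for lazy reoptimization
                outlineMisses++;
                return (int) current.invokeExact(arg);
            }
        }
    }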
Re: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode
On Jan 20, 2016, at 4:13 AM, Remi Forax wrote:
> I understand that having the VM that may always recompile may be seen as a bug, but having the VM that bails out and stops recompiling, or more generally changes the compilation strategy, is a bug too.

As you can guess from my previous message, I agree with this, except for "changes the compilation strategy". The JVM earns its way in the world by routinely changing compilation strategy. The reason most people don't notice is that the strategy changes are profile-driven and self-correcting. Nothing in the 292 world promises a particular strategy, just a best effort to create and execute great code, assuming stable application behavior.

When an optimization breaks, the JVM's strategy may also fail to adjust correctly. One symptom of that is infinite recompilation, usually because one line of code is being handled badly, but which creates huge bloat in the code cache for thousands of lines of code that happen to be inlined nearby. We try hard to avoid this. We also try hard to detect this problem. That is the true meaning of those strange cutoffs. Nobody thinks falling into the interpreter is a good idea, except that it, on balance, is a better idea than (a) throwing an assertion error, or (b) filling the CPU with JIT jobs and the code cache with discards. The third choice, (c) run the offending method in the interpreter, at least preserves a degree of forward progress, while allowing the outraged user to report a bug.

The correct fix to the bug, IMO, is never to jump from (c) to (a) or (b). It is to find and fix the problem with the compilation strategy, and the profile-driven gating logic for it. If your car's transmission gets a bug (now that they are computers, they can), what would you prefer? (a) stop the car immediately, (b) run the car in first gear at full speed, or (c) slow the car to a defined speed limit (25 mph). Detroit prefers, and the JVM implements, option (c).

> The problem here is that there is no way, from the point of view of a dyn lang runtime, to know what the behavior of the VM will be for a callsite if the VM decides to stop recompiling, decides not to inline, decides to inline some part of the tree, etc.

Yes. And it usually doesn't matter; the issue doesn't come up until something breaks, or we find a performance pothole. The current problem is (in my mind) a break, not a performance pothole that needs tuning. If we fix the break, people shouldn't need to worry about this stuff, usually.

> Said differently, using an invokedynamic allows to create code shapes that will change dynamically; if the VM behavior also changes dynamically, it's like building a wall on moving parts, and the result is strange dynamic behaviors that are hard to diagnose and reproduce.

JVMs have always been like that, because of dynamic class loading, but with indy it is more so, since it's much easier to "override" some previously fixed behavior.

> The recompilation behavior of the VM should be kept simple and predictable: basically, the VM should always recompile the CS, with no failsafe switch.

We agree that the failsafe should not trip. Just like we agree that the circuit breakers in our building should not trip. We disagree, perhaps, about what to do when they trip. I don't want to duct-tape them back into the "on" position; do you?

> If dyn lang runtime devs have trouble with that, they can already use an exactInvoker to simulate an indirect mh call, and we can even provide new method handle combiners to gracefully handle multi-stable CS.
That's all true. The new combiners might have some sort of handshake with the JVM to self-adjust their code shape. But I claim the baseline behavior that I have called for is the most generally useful, since it is able to amortize recompilation resources over multiple CS misses, put global limits on total recompilation effort, and preserve reasonable forward progress executing good-enough code.

(Having a CS change force a reoptimization is tantamount to adding a JIT control API, something like Compiler.recompile(cs), akin to System.gc(), but just for CS-bearing methods. We are a long way from understanding how to work such an API.)

Idea: Perhaps CS's should have a callback which says, "Hey, CS, the JIT has mispredicted you a bunch of times; would you like to nominate an alternative representation?" The call would be made asynchronously, outside the JIT. The default behavior would be to say "nope", with the results given above, but the CS could also return a MH (perhaps the CS.dynamicInvoker, or perhaps some more elaborate logic), which the JVM would slide into place over the top of the CS. Despite the fact that CS bindings are final, the new binding would take its place. And it would be the user's choice whether that binding pointed to the old CS or a new CS or some combination of both.

— John
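To make the idea concrete, a purely hypothetical sketch of such a callback is below; nothing like it exists in java.lang.invoke today, and the method name onRepeatedMisprediction and its threshold are invented for illustration. Only MutableCallSite and CallSite.dynamicInvoker() are real API:

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodType;
    import java.lang.invoke.MutableCallSite;

    public class AdaptiveCallSite extends MutableCallSite {
        public AdaptiveCallSite(MethodType type) { super(type); }

        // Hypothetical hook the JVM would call asynchronously, outside the JIT,
        // after repeated mispredictions at this site.
        protected MethodHandle onRepeatedMisprediction(int mispredictionCount) {
            if (mispredictionCount > 8) {
                // Nominate an indirect call through the site as the new binding.
                return dynamicInvoker();
            }
            return null;   // "nope": keep the current behavior
        }
    }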
Re: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode
I was going to say it is unlikely to matter in production cases but might well hit test code which does extensive meta-programming, but actually, since it's a question of invalidations across _all_ sites, rather than any single one, I think it might make a difference. I'll need to take a look at what our compilation counts eventually come to and experiment with changing the limits. We did work quite early on to limit the extent of call site invalidations.

One thing that might affect this is how megamorphic call sites are handled. At the moment we keep a cache of classes, method handles, and switch points, and we check the switch point before calling the method handle. I had considered a change to bind the switch points to the method handles and thus allow those checks to be optimised out for methods called extensively from megamorphic call sites; would that also fall foul of the compilation count being increased? (A rough sketch of this switch-point cache appears after the quoted text below.)

I think there is definitely room for communicating more about the nature of a callsite to the JIT. Whether this should be around recompilation, or perhaps more focused round inlining and type specialisation to avoid invalidations and recompilation, would be my question. For example, method invocation sites may go megamorphic, and this currently forms a barrier to the JIT seeing the types in a way that doesn't really exist with standard invokeVirtual sites. If there was some feedback loop allowing sites to be cloned as methods are inlined, and a way to indicate this was allowed or desired, then that might allow significantly more optimisations to happen in invokeDynamic based languages. It would also probably be a horror to implement in the current model, but I'm sure you guys can fix all that. :-)

Duncan.

On 20/01/2016, 11:54, "mlvm-dev on behalf of Vladimir Ivanov" wrote:
> MLVM folks, I'd like to hear your opinion about what kind of behavior you expect from the JVM w.r.t. mutable call sites.
>
> There are valid use-cases when the JVM shouldn't throttle the recompilation (e.g., a long-running application with indy-based dynamic tracing). Maybe there's a place for a new CallSite flavor to clearly communicate application expectations to the JVM? Either always recompile (thus eventually reaching peak performance) or give up and generate less efficient machine code, but save on possible recompilations.
>
> Best regards,
> Vladimir Ivanov
>
> On 1/20/16 2:37 AM, John Rose wrote:
>> On Jan 18, 2016, at 4:54 AM, Vladimir Ivanov wrote:
>>>
>>> The fix is to avoid updating the recompilation count when the corresponding nmethod is invalidated due to a call site target change.
>>
>> Although I'm not vetoing it (since it seems it will help customers in the short term), I'm uncomfortable with this fix because it doesn't scale to large dyn. lang. applications with many unstable call sites. Put another way, it feels like we are duct-taping down a failsafe switch (against infinite recompilation) in order to spam a micro-benchmark: a small number of mega-mutable call sites for which we are willing to spend (potentially) all of the JIT resources, including those usually allocated to application performance in the steady state. Put a third way: I am not comfortable with unthrottled infinite recompilation as a performance strategy.
>>
>> I've commented on the new RFE (JDK-8147550) where to go next, including the following sentiments:
>>
>>> There is a serious design tension here, though: Some users apparently are willing to endure an infinite series of recompilations as part of the cost of doing business; JDK-7177745 addresses this need by turning off the fail-safe against (accidental, buggy) infinite recompilation for unstable CSs. Other users might find that having a percentage of machine time devoted to recompilation is a problem. (This has been the case in the past with non-dynamic languages, at least.) The code shape proposed in this bug report would cover all simple unstable call sites (bi-stable, for example, would compile to a bi-morphic call), but, in pathological cases (an infinite sequence of distinct CS targets) would "settle down" into a code shape that would be sub-optimal for any single target, but (as an indirect MH call) reasonable for all the targets together.
>>>
>>> In the absence of clear direction from the user or the profile, the JVM has to choose infinite recompilation or a good-enough final compilation. The latter choice is safer. And the infinite recompilation is less safe because there is no intrinsic bound on the amount of machine cycles that could be diverted to recompilation, given a dynamic language application with enough mega-mutable CSs. Settling down to a network of indirect calls has a bounded cost.
>>>
>>> Yes, one size-fits-all tactics never please everybody. But the JVM should not choose tactics with unlimited downsides.
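For reference, here is roughly how I read the class/method-handle/switch-point cache Duncan describes; this is a hedged sketch, not code from any particular runtime, and the class and method names (MegamorphicCache, handleFor, relink) are invented for illustration. SwitchPoint.guardWithTest and hasBeenInvalidated are real java.lang.invoke API; the cached target and the relink fallback are assumed to share the call site's type:

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.SwitchPoint;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    final class MegamorphicCache {
        static final class Entry {
            final MethodHandle target;
            final SwitchPoint guard;
            Entry(MethodHandle target, SwitchPoint guard) {
                this.target = target;
                this.guard = guard;
            }
        }

        private final Map<Class<?>, Entry> cache = new ConcurrentHashMap<>();

        void put(Class<?> receiverClass, MethodHandle target, SwitchPoint guard) {
            cache.put(receiverClass, new Entry(target, guard));
        }

        // On a hit, wrap the cached handle in its SwitchPoint so the switch-point
        // check happens before the call; on a miss or after invalidation, fall
        // back to the runtime's relinking path.
        MethodHandle handleFor(Class<?> receiverClass, MethodHandle relink) {
            Entry e = cache.get(receiverClass);
            if (e == null || e.guard.hasBeenInvalidated()) {
                return relink;
            }
            return e.guard.guardWithTest(e.target, relink);
        }
    }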
Re: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode
John, Chris, thanks for the feedback.

I don't think it is only about microbenchmarks. Long-running large applications with lots of mutable call sites should also benefit from this change. Current JVM behavior counts invalidations on the root method, so nmethods with multiple mutable call sites (from the root & all inlined callees) are more likely to hit the limit, even if there are no mega-mutable sites. It just sums up, and PerMethodRecompilationCutoff (= 400, by default) doesn't look like a huge number.

Also, LambdaForm sharing somewhat worsens the situation. When LambdaForms were mostly customized, different method handle chains were compiled into a single nmethod. Right now, not only is the root method always interpreted, but all bound method handle chains are broken into numerous per-LF nmethods (see JDK-8069591 for some details).

MLVM folks, I'd like to hear your opinion about what kind of behavior you expect from the JVM w.r.t. mutable call sites.

There are valid use-cases when the JVM shouldn't throttle the recompilation (e.g., a long-running application with indy-based dynamic tracing). Maybe there's a place for a new CallSite flavor to clearly communicate application expectations to the JVM? Either always recompile (thus eventually reaching peak performance) or give up and generate less efficient machine code, but save on possible recompilations.

Best regards,
Vladimir Ivanov

On 1/20/16 2:37 AM, John Rose wrote:
> On Jan 18, 2016, at 4:54 AM, Vladimir Ivanov wrote:
>>
>> The fix is to avoid updating the recompilation count when the corresponding nmethod is invalidated due to a call site target change.
>
> Although I'm not vetoing it (since it seems it will help customers in the short term), I'm uncomfortable with this fix because it doesn't scale to large dyn. lang. applications with many unstable call sites. Put another way, it feels like we are duct-taping down a failsafe switch (against infinite recompilation) in order to spam a micro-benchmark: a small number of mega-mutable call sites for which we are willing to spend (potentially) all of the JIT resources, including those usually allocated to application performance in the steady state. Put a third way: I am not comfortable with unthrottled infinite recompilation as a performance strategy.
>
> I've commented on the new RFE (JDK-8147550) where to go next, including the following sentiments:
>
>> There is a serious design tension here, though: Some users apparently are willing to endure an infinite series of recompilations as part of the cost of doing business; JDK-7177745 addresses this need by turning off the fail-safe against (accidental, buggy) infinite recompilation for unstable CSs. Other users might find that having a percentage of machine time devoted to recompilation is a problem. (This has been the case in the past with non-dynamic languages, at least.) The code shape proposed in this bug report would cover all simple unstable call sites (bi-stable, for example, would compile to a bi-morphic call), but, in pathological cases (an infinite sequence of distinct CS targets) would "settle down" into a code shape that would be sub-optimal for any single target, but (as an indirect MH call) reasonable for all the targets together.
>>
>> In the absence of clear direction from the user or the profile, the JVM has to choose infinite recompilation or a good-enough final compilation. The latter choice is safer. And the infinite recompilation is less safe because there is no intrinsic bound on the amount of machine cycles that could be diverted to recompilation, given a dynamic language application with enough mega-mutable CSs. Settling down to a network of indirect calls has a bounded cost.
>>
>> Yes, one size-fits-all tactics never please everybody. But the JVM should not choose tactics with unlimited downsides.
>
> — John
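As a reminder of the mechanism being counted against PerMethodRecompilationCutoff, here is a minimal, self-contained illustration of a mutable call site being relinked. The class and method names are invented for the example, but MutableCallSite, dynamicInvoker, and setTarget are real java.lang.invoke API; each setTarget invalidates any compiled code that speculated on the previous target, and (before this fix) each such invalidation is charged against the root method's recompilation count:

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.MethodType;
    import java.lang.invoke.MutableCallSite;

    public class RelinkDemo {
        static final MutableCallSite SITE =
                new MutableCallSite(MethodType.methodType(int.class, int.class));
        static final MethodHandle INVOKER = SITE.dynamicInvoker();

        static int twice(int x)  { return 2 * x; }
        static int thrice(int x) { return 3 * x; }

        public static void main(String[] args) throws Throwable {
            MethodType t = MethodType.methodType(int.class, int.class);
            MethodHandle twice  = MethodHandles.lookup().findStatic(RelinkDemo.class, "twice",  t);
            MethodHandle thrice = MethodHandles.lookup().findStatic(RelinkDemo.class, "thrice", t);
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                // Retargeting the site invalidates compiled code that inlined the old target.
                SITE.setTarget((i % 2 == 0) ? twice : thrice);
                sum += (int) INVOKER.invokeExact(i);
            }
            System.out.println(sum);
        }
    }

Raising the PerMethodRecompilationCutoff value mentioned above would only postpone the point at which a site like this pins the root method in the interpreter.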
Re: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode
Hi John,

I understand that having a VM that may always recompile may be seen as a bug, but having a VM that bails out and stops recompiling, or more generally changes the compilation strategy, is a bug too.

The problem here is that there is no way, from the point of view of a dyn lang runtime, to know what the behavior of the VM will be for a callsite if the VM decides to stop recompiling, decides not to inline, decides to inline some part of the tree, etc.

Said differently, using an invokedynamic allows to create code shapes that will change dynamically; if the VM behavior also changes dynamically, it's like building a wall on moving parts, and the result is strange dynamic behaviors that are hard to diagnose and reproduce.

The recompilation behavior of the VM should be kept simple and predictable: basically, the VM should always recompile the CS, with no failsafe switch.

If dyn lang runtime devs have trouble with that, they can already use an exactInvoker to simulate an indirect mh call, and we can even provide new method handle combiners to gracefully handle multi-stable CS. (A sketch of the exactInvoker idiom appears at the end of this message.)

regards,
Rémi

----- Original Message -----
> From: "John Rose" <john.r.r...@oracle.com>
> To: "Vladimir Ivanov" <vladimir.x.iva...@oracle.com>
> Cc: "hotspot compiler" <hotspot-compiler-...@openjdk.java.net>
> Sent: Wednesday, January 20, 2016, 00:37:29
> Subject: Re: [9] RFR (S): 7177745: JSR292: Many Callsite relinkages cause target method to always run in interpreter mode

> On Jan 18, 2016, at 4:54 AM, Vladimir Ivanov <vladimir.x.iva...@oracle.com> wrote:
>> The fix is to avoid updating the recompilation count when the corresponding nmethod is invalidated due to a call site target change.
>
> Although I'm not vetoing it (since it seems it will help customers in the short term), I'm uncomfortable with this fix because it doesn't scale to large dyn. lang. applications with many unstable call sites. Put another way, it feels like we are duct-taping down a failsafe switch (against infinite recompilation) in order to spam a micro-benchmark: a small number of mega-mutable call sites for which we are willing to spend (potentially) all of the JIT resources, including those usually allocated to application performance in the steady state. Put a third way: I am not comfortable with unthrottled infinite recompilation as a performance strategy.
>
> I've commented on the new RFE (JDK-8147550) where to go next, including the following sentiments:
>
>> There is a serious design tension here, though: Some users apparently are willing to endure an infinite series of recompilations as part of the cost of doing business; JDK-7177745 addresses this need by turning off the fail-safe against (accidental, buggy) infinite recompilation for unstable CSs. Other users might find that having a percentage of machine time devoted to recompilation is a problem. (This has been the case in the past with non-dynamic languages, at least.) The code shape proposed in this bug report would cover all simple unstable call sites (bi-stable, for example, would compile to a bi-morphic call), but, in pathological cases (an infinite sequence of distinct CS targets) would "settle down" into a code shape that would be sub-optimal for any single target, but (as an indirect MH call) reasonable for all the targets together.
>>
>> In the absence of clear direction from the user or the profile, the JVM has to choose infinite recompilation or a good-enough final compilation. The latter choice is safer. And the infinite recompilation is less safe because there is no intrinsic bound on the amount of machine cycles that could be diverted to recompilation, given a dynamic language application with enough mega-mutable CSs. Settling down to a network of indirect calls has a bounded cost.
>>
>> Yes, one size-fits-all tactics never please everybody. But the JVM should not choose tactics with unlimited downsides.
>
> — John
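A brief sketch of the exactInvoker idiom Rémi refers to; the class and variable names are invented for illustration, but MethodHandles.exactInvoker, foldArguments, and CallSite.getTarget are real API. The combined handle re-reads the call site's target on every call, so a call through it goes indirectly through whatever the site currently holds rather than being speculatively bound to one target (CallSite.dynamicInvoker packages up essentially the same thing):

    import java.lang.invoke.CallSite;
    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.MethodType;
    import java.lang.invoke.MutableCallSite;

    public class IndirectCallSketch {
        public static void main(String[] args) throws Throwable {
            MethodType type = MethodType.methodType(int.class, int.class);
            MutableCallSite cs = new MutableCallSite(type);
            cs.setTarget(MethodHandles.identity(int.class));   // some initial (int)int target

            // A handle that reads the site's current target at call time...
            MethodHandle getTarget = MethodHandles.lookup()
                    .findVirtual(CallSite.class, "getTarget",
                            MethodType.methodType(MethodHandle.class))
                    .bindTo(cs);
            // ...folded in front of exactInvoker, giving an (int)int handle that
            // always dispatches through the site's current target.
            MethodHandle indirect = MethodHandles.foldArguments(
                    MethodHandles.exactInvoker(type), getTarget);

            System.out.println((int) indirect.invokeExact(21));   // 21, via identity
            cs.setTarget(MethodHandles.dropArguments(
                    MethodHandles.constant(int.class, 42), 0, int.class));
            System.out.println((int) indirect.invokeExact(21));   // 42, via the new target
        }
    }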