Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
Looking very good, thanks. Ship it! Thanks, John! Actually, can you insert a comment why the injected counts are not scaled? (Or perhaps they should be??) Sure! I intentionally don't scale the counts because I don't see any reason to do so. Profiling is done on per-MethodHandle basis, so the counts should be very close (considering racy updates) to the actual behavior. Also, we may need a followup bug for the code with this comment: // Look for the following shape: AndI (ProfileBoolean) (ConI 1)) Since profileBoolean returns a TypeInt::BOOL, the AndI with (ConI 1) should fold up. So there's some work to do in MulNode, which may allow that special pattern match to go away. But I don't want to divert the present bug by a possibly complex dive into fixing AndI::Ideal. Good catch! It's an overlook on my side. The following change for ProfileBooleanNode solves the problem: - virtual const Type *bottom_type() const { return TypeInt::INT; } + virtual const Type *bottom_type() const { return TypeInt::BOOL; } I polished the change a little according to your comments (diff against v03): http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03-04/hotspot Changes: - added short explanation why injected counts aren't scaled - adjusted ProfileBooleanNode type to TypeInt::BOOL and removed excessive pattern matching in has_injected_profile() - added an assert when ProfileBooleanNode is removed to catch the cases when injected profile isn't used: if we decide to generalize the API, I'd be happy to remove it, but current usages assumes that injected counts are always consumed during parsing and missing cases can cause hard-to-diagnose performance problems. Best regards, Vladimir Ivanov (Generally speaking, pattern matching should assume strong normalization of its inputs. Otherwise you end up duplicating pattern match code in many places, inconsistently. Funny one-off idiom checks like this are evidence of incomplete IR normalization. See http://en.wikipedia.org/wiki/Rewriting for some background on terms like normalization and confluence which are relevant to C2.) — John On Jan 27, 2015, at 8:05 AM, Vladimir Ivanov vladimir.x.iva...@oracle.com wrote: Thanks for the feedback, John! Updated webrev: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/jdk http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/hotspot Changes: - renamed MHI::profileBranch to MHI::profileBoolean, and ProfileBranchNode to ProfileBooleanNode; - restructured profile layout ([0] = false_cnt, [1] = true_cnt) - factored out profile injection in a separate function (has_injected_profile() in parse2.cpp) - ProfileBooleanNode stores true/false counts instead of taken/not_taken counts - matching from value counts to taken/not_taken happens in has_injected_profile(); - added BoolTest::ne support - sharpened test for AndI case: now it checks AndI (ProfileBoolean) (ConI 1) shape Best regards, Vladimir Ivanov ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
On Jan 28, 2015, at 1:00 AM, Vladimir Ivanov vladimir.x.iva...@oracle.com wrote: I polished the change a little according to your comments (diff against v03): http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03-04/hotspot http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03-04/hotspot +1 Glad to see the AndI folds up easily; thanks for the cleanup.___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
Looking very good, thanks. Ship it! Actually, can you insert a comment why the injected counts are not scaled? (Or perhaps they should be??) Also, we may need a followup bug for the code with this comment: // Look for the following shape: AndI (ProfileBoolean) (ConI 1)) Since profileBoolean returns a TypeInt::BOOL, the AndI with (ConI 1) should fold up. So there's some work to do in MulNode, which may allow that special pattern match to go away. But I don't want to divert the present bug by a possibly complex dive into fixing AndI::Ideal. (Generally speaking, pattern matching should assume strong normalization of its inputs. Otherwise you end up duplicating pattern match code in many places, inconsistently. Funny one-off idiom checks like this are evidence of incomplete IR normalization. See http://en.wikipedia.org/wiki/Rewriting for some background on terms like normalization and confluence which are relevant to C2.) — John On Jan 27, 2015, at 8:05 AM, Vladimir Ivanov vladimir.x.iva...@oracle.com wrote: Thanks for the feedback, John! Updated webrev: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/jdk http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/hotspot Changes: - renamed MHI::profileBranch to MHI::profileBoolean, and ProfileBranchNode to ProfileBooleanNode; - restructured profile layout ([0] = false_cnt, [1] = true_cnt) - factored out profile injection in a separate function (has_injected_profile() in parse2.cpp) - ProfileBooleanNode stores true/false counts instead of taken/not_taken counts - matching from value counts to taken/not_taken happens in has_injected_profile(); - added BoolTest::ne support - sharpened test for AndI case: now it checks AndI (ProfileBoolean) (ConI 1) shape Best regards, Vladimir Ivanov ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
Thanks for the feedback, John! Updated webrev: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/jdk http://cr.openjdk.java.net/~vlivanov/8063137/webrev.03/hotspot Changes: - renamed MHI::profileBranch to MHI::profileBoolean, and ProfileBranchNode to ProfileBooleanNode; - restructured profile layout ([0] = false_cnt, [1] = true_cnt) - factored out profile injection in a separate function (has_injected_profile() in parse2.cpp) - ProfileBooleanNode stores true/false counts instead of taken/not_taken counts - matching from value counts to taken/not_taken happens in has_injected_profile(); - added BoolTest::ne support - sharpened test for AndI case: now it checks AndI (ProfileBoolean) (ConI 1) shape Best regards, Vladimir Ivanov On 1/27/15 3:04 AM, John Rose wrote: On Jan 26, 2015, at 8:41 AM, Vladimir Ivanov vladimir.x.iva...@oracle.com mailto:vladimir.x.iva...@oracle.com wrote: What do you think about the following version? http://cr.openjdk.java.net/~vlivanov/8063137/webrev.02 As you suggested, I reified MHI::profileBranch on LambdaForm level and removed @LambdaForm.Shared. My main concern about removing @Sharen was that profile pollution can affect the code before profileBranch call (akin to 8068915 [1]) and it seems it's the case: Gbemu (at least) is sensitive to that change (there's a 10% difference in peak performance between @Shared and has_injected_profile()). I can leave @Shared as is for now or remove it and work on the fix to the deoptimization counts pollution. What do you prefer? Generic advice here: It's better to leave it out, if in doubt. If it has a real benefit, and we don't have time to make it clean, put it in and file a tracking bug to clean it up. I re-read the change. It's simpler and more coherent now. I see one more issue which we should fix now, while we can. It's the sort of thing which is hard to clean up later. The two fields of the profileBranch array have obscure and inconsistent labelings. It took me some hard thought and the inspection of three files to decide what taken and not taken mean in the C2 code that injects the profile. The problem is that, when you look at profileBranch, all you see is an integer (boolean) argument and an array, and no clear indication about which array element corresponds to which argument value. It's made worse by the fact that taken and not taken are not mentioned at all in the JDK code, which instead wires together the branches of selectAlternative without much comment. My preferred formulation, for making things clearer: Decouple the idea of branching from the idea of profile injection. Name the intrinsic (yes, one more bikeshed color) profileBoolean (or even injectBooleanProfile), and use the natural indexing of the array: 0 (Java false) is a[0], and 1 (Java true) is a[1]. We might later extend this to work with booleans (more generally, small-integer flags), of more than two possible values, klasses, etc. This line then goes away, and 'result' is used directly as the profile index: +int idx = result ? 0 : 1; The ProfileBooleanNode should have an embedded (or simply indirect) array of ints which is a simple copy of the profile array, so there's no doubt about which count is which. The parsing of the predicate that contains profileBoolean should probably be more robust, at least allowing for 'eq' and 'ne' versions of the test. (C2 freely flips comparison senses, in various places.) The check for Op_AndI must be more precise; make sure n-in(2) is a constant of the expected value (1). The most robust way to handle it (but try this another time, I think) would be to make two temp copies of the predicate, substituting the occurrence of ProfileBoolean with '0' and '1', respectively; if they both fold to '0' and '1' or '1' and '0', then you take the indicated action. I suggest putting the new code in Parse::dynamic_branch_prediction, which pattern-matches for injected profiles, into its own subroutine. Maybe: bool use_mdo = true; if (has_injected_profile(btest, test, taken, not_taken)) { use_mdo = false; } if (use_mdo) { ... old code I see why you used the opposite order in the existing code: It mirrors the order of the second and third arguments to selectAlternative. But the JVM knows nothing about selectAlternative, so it's just confusing when reading the VM code to know which profile array element means what. — John P.S. Long experience with byte-order bugs in HotSpot convinces me that if you are not scrupulously clear in your terms, when working with equal and opposite configuration pairs, you will have a long bug tail, especially if you have to maintain agreement about the configurations through many layers of software. This is one of those cases. The best chance to fix such bugs is not to allow them in the first place. In the case of byte-order, we have first vs. second, MSB vs. LSB, and high vs. low parts of values, for values in memory and
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
John, What do you think about the following version? http://cr.openjdk.java.net/~vlivanov/8063137/webrev.02 As you suggested, I reified MHI::profileBranch on LambdaForm level and removed @LambdaForm.Shared. My main concern about removing @Sharen was that profile pollution can affect the code before profileBranch call (akin to 8068915 [1]) and it seems it's the case: Gbemu (at least) is sensitive to that change (there's a 10% difference in peak performance between @Shared and has_injected_profile()). I can leave @Shared as is for now or remove it and work on the fix to the deoptimization counts pollution. What do you prefer? Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8068915 On 1/23/15 4:31 AM, John Rose wrote: On Jan 20, 2015, at 11:09 AM, Vladimir Ivanov vladimir.x.iva...@oracle.com mailto:vladimir.x.iva...@oracle.com wrote: What I'm mainly poking at here is that 'isGWT' is not informative about the intended use of the flag. I agree. It was an interim solution. Initially, I planned to introduce customization and guide the logic based on that property. But it's not there yet and I needed something for GWT case. Unfortunately, I missed the case when GWT is edited. In that case, isGWT flag is missed and no annotation is set. So, I removed isGWT flag and introduced a check for selectAlternative occurence in LambdaForm shape, as you suggested. Good. I think there is a sweeter spot just a little further on. Make profileBranch be an LF intrinsic and expose it like this: GWT(p,t,f;S) := let(a=new int[3]) in lambda(*: S) { selectAlternative(profileBranch(p.invoke( *), a), t, f).invoke( *); } Then selectAlternative triggers branchy bytecodes in the IBGen, and profileBranch injects profiling in C2. The presence of profileBranch would then trigger the @Shared annotation, if you still need it. After thinking about it some more, I still believe it would be better to detect the use of profileBranch during a C2 compile task, and feed that to the too_many_traps logic. I agree it is much easier to stick the annotation on in the IBGen; the problem is that because of a minor phase ordering problem you are introducing an annotation which flows from the JDK to the VM. Here's one more suggestion at reducing this coupling… Note that C-set_trap_count is called when each Parse phase processes a whole method. This means that information about the contents of the nmethod accumulates during the parse. Likewise, add a flag method C-{has,set}_injected_profile, and set the flag whenever the parser sees a profileBranch intrinsic (with or without a constant profile array; your call). Then consult that flag from too_many_traps. It is true that code which is parsed upstream of the very first profileBranch will potentially issue a non-trapping fallback, but by definition that code would be unrelated to the injected profile, so I don't see a harm in that. If this approach works, then you can remove the annotation altogether, which is clearly preferable. We understand the annotation now, but it has the danger of becoming a maintainer's puzzlement. In 'updateCounters', if the counter overflows, you'll get continuous creation of ArithmeticExceptions. Will that optimize or will it cause a permanent slowdown? Consider a hack like this on the exception path: counters[idx] = Integer.MAX_VALUE / 2; I had an impression that VM optimizes overflows in Math.exact* intrinsics, but it's not the case - it always inserts an uncommon trap. I used the workaround you proposed. Good. On the Name Bikeshed: It looks like @IgnoreProfile (ignore_profile in the VM) promises too much ignorance, since it suppresses branch counts and traps, but allows type profiles to be consulted. Maybe something positive like @ManyTraps or @SharedMegamorphic? (It's just a name, and this is just a suggestion.) What do you think about @LambdaForm.Shared? That's fine. Suggest changing the JVM accessor to is_lambda_form_shared, because the term shared is already overused in the VM. Or, to be much more accurate, s/@Shared/@CollectiveProfile/. Better yet, get rid of it, as suggested above. (I just realized that profile pollution looks logically parallel to the http://en.wikipedia.org/wiki/Tragedy_of_the_commons .) Also, in the comment explaining the annotation: s/mostly useless/probably polluted by conflicting behavior from multiple call sites/ I very much like the fact that profileBranch is the VM intrinsic, not selectAlternative. A VM intrinsic should be nice and narrow like that. In fact, you can delete selectAlternative from vmSymbols while you are at it. (We could do profileInteger and profileClass in a similar way, if that turned out to be useful.) — John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
As you suggested, I reified MHI::profileBranch on LambdaForm level and removed @LambdaForm.Shared. My main concern about removing @Sharen was that profile pollution can affect the code before profileBranch call (akin to 8068915 [1]) and it seems it's the case: Gbemu (at least) is sensitive to that change (there's a 10% difference in peak performance between @Shared and has_injected_profile()). Ignore that. Additional runs don't prove there's a regression on Gbemu. There's some variance on Gbemu and it's present w/ and w/o @Shared. Best regards, Vladimir Ivanov I can leave @Shared as is for now or remove it and work on the fix to the deoptimization counts pollution. What do you prefer? Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8068915 On 1/23/15 4:31 AM, John Rose wrote: On Jan 20, 2015, at 11:09 AM, Vladimir Ivanov vladimir.x.iva...@oracle.com mailto:vladimir.x.iva...@oracle.com wrote: What I'm mainly poking at here is that 'isGWT' is not informative about the intended use of the flag. I agree. It was an interim solution. Initially, I planned to introduce customization and guide the logic based on that property. But it's not there yet and I needed something for GWT case. Unfortunately, I missed the case when GWT is edited. In that case, isGWT flag is missed and no annotation is set. So, I removed isGWT flag and introduced a check for selectAlternative occurence in LambdaForm shape, as you suggested. Good. I think there is a sweeter spot just a little further on. Make profileBranch be an LF intrinsic and expose it like this: GWT(p,t,f;S) := let(a=new int[3]) in lambda(*: S) { selectAlternative(profileBranch(p.invoke( *), a), t, f).invoke( *); } Then selectAlternative triggers branchy bytecodes in the IBGen, and profileBranch injects profiling in C2. The presence of profileBranch would then trigger the @Shared annotation, if you still need it. After thinking about it some more, I still believe it would be better to detect the use of profileBranch during a C2 compile task, and feed that to the too_many_traps logic. I agree it is much easier to stick the annotation on in the IBGen; the problem is that because of a minor phase ordering problem you are introducing an annotation which flows from the JDK to the VM. Here's one more suggestion at reducing this coupling… Note that C-set_trap_count is called when each Parse phase processes a whole method. This means that information about the contents of the nmethod accumulates during the parse. Likewise, add a flag method C-{has,set}_injected_profile, and set the flag whenever the parser sees a profileBranch intrinsic (with or without a constant profile array; your call). Then consult that flag from too_many_traps. It is true that code which is parsed upstream of the very first profileBranch will potentially issue a non-trapping fallback, but by definition that code would be unrelated to the injected profile, so I don't see a harm in that. If this approach works, then you can remove the annotation altogether, which is clearly preferable. We understand the annotation now, but it has the danger of becoming a maintainer's puzzlement. In 'updateCounters', if the counter overflows, you'll get continuous creation of ArithmeticExceptions. Will that optimize or will it cause a permanent slowdown? Consider a hack like this on the exception path: counters[idx] = Integer.MAX_VALUE / 2; I had an impression that VM optimizes overflows in Math.exact* intrinsics, but it's not the case - it always inserts an uncommon trap. I used the workaround you proposed. Good. On the Name Bikeshed: It looks like @IgnoreProfile (ignore_profile in the VM) promises too much ignorance, since it suppresses branch counts and traps, but allows type profiles to be consulted. Maybe something positive like @ManyTraps or @SharedMegamorphic? (It's just a name, and this is just a suggestion.) What do you think about @LambdaForm.Shared? That's fine. Suggest changing the JVM accessor to is_lambda_form_shared, because the term shared is already overused in the VM. Or, to be much more accurate, s/@Shared/@CollectiveProfile/. Better yet, get rid of it, as suggested above. (I just realized that profile pollution looks logically parallel to the http://en.wikipedia.org/wiki/Tragedy_of_the_commons .) Also, in the comment explaining the annotation: s/mostly useless/probably polluted by conflicting behavior from multiple call sites/ I very much like the fact that profileBranch is the VM intrinsic, not selectAlternative. A VM intrinsic should be nice and narrow like that. In fact, you can delete selectAlternative from vmSymbols while you are at it. (We could do profileInteger and profileClass in a similar way, if that turned out to be useful.) — John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
On Jan 26, 2015, at 8:41 AM, Vladimir Ivanov vladimir.x.iva...@oracle.com wrote: What do you think about the following version? http://cr.openjdk.java.net/~vlivanov/8063137/webrev.02 http://cr.openjdk.java.net/~vlivanov/8063137/webrev.02 As you suggested, I reified MHI::profileBranch on LambdaForm level and removed @LambdaForm.Shared. My main concern about removing @Sharen was that profile pollution can affect the code before profileBranch call (akin to 8068915 [1]) and it seems it's the case: Gbemu (at least) is sensitive to that change (there's a 10% difference in peak performance between @Shared and has_injected_profile()). I can leave @Shared as is for now or remove it and work on the fix to the deoptimization counts pollution. What do you prefer? Generic advice here: It's better to leave it out, if in doubt. If it has a real benefit, and we don't have time to make it clean, put it in and file a tracking bug to clean it up. I re-read the change. It's simpler and more coherent now. I see one more issue which we should fix now, while we can. It's the sort of thing which is hard to clean up later. The two fields of the profileBranch array have obscure and inconsistent labelings. It took me some hard thought and the inspection of three files to decide what taken and not taken mean in the C2 code that injects the profile. The problem is that, when you look at profileBranch, all you see is an integer (boolean) argument and an array, and no clear indication about which array element corresponds to which argument value. It's made worse by the fact that taken and not taken are not mentioned at all in the JDK code, which instead wires together the branches of selectAlternative without much comment. My preferred formulation, for making things clearer: Decouple the idea of branching from the idea of profile injection. Name the intrinsic (yes, one more bikeshed color) profileBoolean (or even injectBooleanProfile), and use the natural indexing of the array: 0 (Java false) is a[0], and 1 (Java true) is a[1]. We might later extend this to work with booleans (more generally, small-integer flags), of more than two possible values, klasses, etc. This line then goes away, and 'result' is used directly as the profile index: +int idx = result ? 0 : 1; The ProfileBooleanNode should have an embedded (or simply indirect) array of ints which is a simple copy of the profile array, so there's no doubt about which count is which. The parsing of the predicate that contains profileBoolean should probably be more robust, at least allowing for 'eq' and 'ne' versions of the test. (C2 freely flips comparison senses, in various places.) The check for Op_AndI must be more precise; make sure n-in(2) is a constant of the expected value (1). The most robust way to handle it (but try this another time, I think) would be to make two temp copies of the predicate, substituting the occurrence of ProfileBoolean with '0' and '1', respectively; if they both fold to '0' and '1' or '1' and '0', then you take the indicated action. I suggest putting the new code in Parse::dynamic_branch_prediction, which pattern-matches for injected profiles, into its own subroutine. Maybe: bool use_mdo = true; if (has_injected_profile(btest, test, taken, not_taken)) { use_mdo = false; } if (use_mdo) { ... old code I see why you used the opposite order in the existing code: It mirrors the order of the second and third arguments to selectAlternative. But the JVM knows nothing about selectAlternative, so it's just confusing when reading the VM code to know which profile array element means what. — John P.S. Long experience with byte-order bugs in HotSpot convinces me that if you are not scrupulously clear in your terms, when working with equal and opposite configuration pairs, you will have a long bug tail, especially if you have to maintain agreement about the configurations through many layers of software. This is one of those cases. The best chance to fix such bugs is not to allow them in the first place. In the case of byte-order, we have first vs. second, MSB vs. LSB, and high vs. low parts of values, for values in memory and in registers, and all possible misunderstandings about them and their relation have probably happened and caused bugs.___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
On Jan 20, 2015, at 11:09 AM, Vladimir Ivanov vladimir.x.iva...@oracle.com wrote: What I'm mainly poking at here is that 'isGWT' is not informative about the intended use of the flag. I agree. It was an interim solution. Initially, I planned to introduce customization and guide the logic based on that property. But it's not there yet and I needed something for GWT case. Unfortunately, I missed the case when GWT is edited. In that case, isGWT flag is missed and no annotation is set. So, I removed isGWT flag and introduced a check for selectAlternative occurence in LambdaForm shape, as you suggested. Good. I think there is a sweeter spot just a little further on. Make profileBranch be an LF intrinsic and expose it like this: GWT(p,t,f;S) := let(a=new int[3]) in lambda(*: S) { selectAlternative(profileBranch(p.invoke( *), a), t, f).invoke( *); } Then selectAlternative triggers branchy bytecodes in the IBGen, and profileBranch injects profiling in C2. The presence of profileBranch would then trigger the @Shared annotation, if you still need it. After thinking about it some more, I still believe it would be better to detect the use of profileBranch during a C2 compile task, and feed that to the too_many_traps logic. I agree it is much easier to stick the annotation on in the IBGen; the problem is that because of a minor phase ordering problem you are introducing an annotation which flows from the JDK to the VM. Here's one more suggestion at reducing this coupling… Note that C-set_trap_count is called when each Parse phase processes a whole method. This means that information about the contents of the nmethod accumulates during the parse. Likewise, add a flag method C-{has,set}_injected_profile, and set the flag whenever the parser sees a profileBranch intrinsic (with or without a constant profile array; your call). Then consult that flag from too_many_traps. It is true that code which is parsed upstream of the very first profileBranch will potentially issue a non-trapping fallback, but by definition that code would be unrelated to the injected profile, so I don't see a harm in that. If this approach works, then you can remove the annotation altogether, which is clearly preferable. We understand the annotation now, but it has the danger of becoming a maintainer's puzzlement. In 'updateCounters', if the counter overflows, you'll get continuous creation of ArithmeticExceptions. Will that optimize or will it cause a permanent slowdown? Consider a hack like this on the exception path: counters[idx] = Integer.MAX_VALUE / 2; I had an impression that VM optimizes overflows in Math.exact* intrinsics, but it's not the case - it always inserts an uncommon trap. I used the workaround you proposed. Good. On the Name Bikeshed: It looks like @IgnoreProfile (ignore_profile in the VM) promises too much ignorance, since it suppresses branch counts and traps, but allows type profiles to be consulted. Maybe something positive like @ManyTraps or @SharedMegamorphic? (It's just a name, and this is just a suggestion.) What do you think about @LambdaForm.Shared? That's fine. Suggest changing the JVM accessor to is_lambda_form_shared, because the term shared is already overused in the VM. Or, to be much more accurate, s/@Shared/@CollectiveProfile/. Better yet, get rid of it, as suggested above. (I just realized that profile pollution looks logically parallel to the http://en.wikipedia.org/wiki/Tragedy_of_the_commons http://en.wikipedia.org/wiki/Tragedy_of_the_commons .) Also, in the comment explaining the annotation: s/mostly useless/probably polluted by conflicting behavior from multiple call sites/ I very much like the fact that profileBranch is the VM intrinsic, not selectAlternative. A VM intrinsic should be nice and narrow like that. In fact, you can delete selectAlternative from vmSymbols while you are at it. (We could do profileInteger and profileClass in a similar way, if that turned out to be useful.) — John___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
Duncan, sorry for that. Updated webrev inplace. Best regards, Vladimir Ivanov On 1/21/15 1:39 PM, MacGregor, Duncan (GE Energy Management) wrote: This version seems to have inconsistent removal of ignore profile in the hotspot patch. It’s no longer added to vmSymbols but is still referenced in classFileParser. On 19/01/2015 20:21, MacGregor, Duncan (GE Energy Management) duncan.macgre...@ge.com wrote: Okay, I¹ve done some tests of this with the micro benchmarks for our language runtime which show pretty much no change except for one test which is now almost 3x slower. It uses nested loops to iterate over an array and concatenate the string-like objects it contains, and replaces elements with these new longer string-llike objects. It¹s a bit of a pathological case, and I haven¹t seen the same sort of degradation in the other benchmarks or in real applications, but I haven¹t done serious benchmarking of them with this change. I shall see if the test case can be reduced down to anything simpler while still showing the same performance behaviour, and try add some compilation logging options to narrow down what¹s going on. Duncan. On 16/01/2015 17:16, Vladimir Ivanov vladimir.x.iva...@oracle.com wrote: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ https://bugs.openjdk.java.net/browse/JDK-8063137 After GuardWithTest (GWT) LambdaForms became shared, profile pollution significantly distorted compilation decisions. It affected inlining and hindered some optimizations. It causes significant performance regressions for Nashorn (on Octane benchmarks). Inlining was fixed by 8059877 [1], but it didn't cover the case when a branch is never taken. It can cause missed optimization opportunity, and not just increase in code size. For example, non-pruned branch can break escape analysis. Currently, there are 2 problems: - branch frequencies profile pollution - deoptimization counts pollution Branch frequency pollution hides from JIT the fact that a branch is never taken. Since GWT LambdaForms (and hence their bytecode) are heavily shared, but the behavior is specific to MethodHandle, there's no way for JIT to understand how particular GWT instance behaves. The solution I propose is to do profiling in Java code and feed it to JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where profiling info is stored. Once JIT kicks in, it can retrieve these counts, if corresponding MethodHandle is a compile-time constant (and it is usually the case). To communicate the profile data from Java code to JIT, MethodHandleImpl::profileBranch() is used. If GWT MethodHandle isn't a compile-time constant, profiling should proceed. It happens when corresponding LambdaForm is already shared, for newly created GWT MethodHandles profiling can occur only in native code (dedicated nmethod for a single LambdaForm). So, when compilation of the whole MethodHandle chain is triggered, the profile should be already gathered. Overriding branch frequencies is not enough. Statistics on deoptimization events is also polluted. Even if a branch is never taken, JIT doesn't issue an uncommon trap there unless corresponding bytecode doesn't trap too much and doesn't cause too many recompiles. I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT sees it on some method, Compile::too_many_traps Compile::too_many_recompiles for that method always return false. It allows JIT to prune the branch based on custom profile and recompile the method, if the branch is visited. For now, I wanted to keep the fix very focused. The next thing I plan to do is to experiment with ignoring deoptimization counts for other LambdaForms which are heavily shared. I already saw problems caused by deoptimization counts pollution (see JDK-8068915 [2]). I plan to backport the fix into 8u40, once I finish extensive performance testing. Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, Octane). Thanks! PS: as a summary, my experiments show that fixes for 8063137 8068915 [2] almost completely recovers peak performance after LambdaForm sharing [3]. There's one more problem left (non-inlined MethodHandle invocations are more expensive when LFs are shared), but it's a story for another day. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8059877 8059877: GWT branch frequencies pollution due to LF sharing [2] https://bugs.openjdk.java.net/browse/JDK-8068915 [3] https://bugs.openjdk.java.net/browse/JDK-8046703 JEP 210: LambdaForm Reduction and Caching ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
This version seems to have inconsistent removal of ignore profile in the hotspot patch. It’s no longer added to vmSymbols but is still referenced in classFileParser. On 19/01/2015 20:21, MacGregor, Duncan (GE Energy Management) duncan.macgre...@ge.com wrote: Okay, I¹ve done some tests of this with the micro benchmarks for our language runtime which show pretty much no change except for one test which is now almost 3x slower. It uses nested loops to iterate over an array and concatenate the string-like objects it contains, and replaces elements with these new longer string-llike objects. It¹s a bit of a pathological case, and I haven¹t seen the same sort of degradation in the other benchmarks or in real applications, but I haven¹t done serious benchmarking of them with this change. I shall see if the test case can be reduced down to anything simpler while still showing the same performance behaviour, and try add some compilation logging options to narrow down what¹s going on. Duncan. On 16/01/2015 17:16, Vladimir Ivanov vladimir.x.iva...@oracle.com wrote: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ https://bugs.openjdk.java.net/browse/JDK-8063137 After GuardWithTest (GWT) LambdaForms became shared, profile pollution significantly distorted compilation decisions. It affected inlining and hindered some optimizations. It causes significant performance regressions for Nashorn (on Octane benchmarks). Inlining was fixed by 8059877 [1], but it didn't cover the case when a branch is never taken. It can cause missed optimization opportunity, and not just increase in code size. For example, non-pruned branch can break escape analysis. Currently, there are 2 problems: - branch frequencies profile pollution - deoptimization counts pollution Branch frequency pollution hides from JIT the fact that a branch is never taken. Since GWT LambdaForms (and hence their bytecode) are heavily shared, but the behavior is specific to MethodHandle, there's no way for JIT to understand how particular GWT instance behaves. The solution I propose is to do profiling in Java code and feed it to JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where profiling info is stored. Once JIT kicks in, it can retrieve these counts, if corresponding MethodHandle is a compile-time constant (and it is usually the case). To communicate the profile data from Java code to JIT, MethodHandleImpl::profileBranch() is used. If GWT MethodHandle isn't a compile-time constant, profiling should proceed. It happens when corresponding LambdaForm is already shared, for newly created GWT MethodHandles profiling can occur only in native code (dedicated nmethod for a single LambdaForm). So, when compilation of the whole MethodHandle chain is triggered, the profile should be already gathered. Overriding branch frequencies is not enough. Statistics on deoptimization events is also polluted. Even if a branch is never taken, JIT doesn't issue an uncommon trap there unless corresponding bytecode doesn't trap too much and doesn't cause too many recompiles. I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT sees it on some method, Compile::too_many_traps Compile::too_many_recompiles for that method always return false. It allows JIT to prune the branch based on custom profile and recompile the method, if the branch is visited. For now, I wanted to keep the fix very focused. The next thing I plan to do is to experiment with ignoring deoptimization counts for other LambdaForms which are heavily shared. I already saw problems caused by deoptimization counts pollution (see JDK-8068915 [2]). I plan to backport the fix into 8u40, once I finish extensive performance testing. Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, Octane). Thanks! PS: as a summary, my experiments show that fixes for 8063137 8068915 [2] almost completely recovers peak performance after LambdaForm sharing [3]. There's one more problem left (non-inlined MethodHandle invocations are more expensive when LFs are shared), but it's a story for another day. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8059877 8059877: GWT branch frequencies pollution due to LF sharing [2] https://bugs.openjdk.java.net/browse/JDK-8068915 [3] https://bugs.openjdk.java.net/browse/JDK-8046703 JEP 210: LambdaForm Reduction and Caching ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
Duncan, thanks a lot for giving it a try! If you plan to spend more time on it, please, apply 8068915 as well. I saw huge intermittent performance regressions due to continuous deoptimization storm. You can look into -XX:+LogCompilation output and look for repeated deoptimization events in steady state w/ Action_none. Also, there's deoptimization statistics in the log (at least, in jdk9). It's located right before compilation_log tag. Thanks again for the valuable feedback! Best regards, Vladimir Ivanov [1] http://cr.openjdk.java.net/~vlivanov/8068915/webrev.00 On 1/19/15 11:21 PM, MacGregor, Duncan (GE Energy Management) wrote: Okay, I¹ve done some tests of this with the micro benchmarks for our language runtime which show pretty much no change except for one test which is now almost 3x slower. It uses nested loops to iterate over an array and concatenate the string-like objects it contains, and replaces elements with these new longer string-llike objects. It¹s a bit of a pathological case, and I haven¹t seen the same sort of degradation in the other benchmarks or in real applications, but I haven¹t done serious benchmarking of them with this change. I shall see if the test case can be reduced down to anything simpler while still showing the same performance behaviour, and try add some compilation logging options to narrow down what¹s going on. Duncan. On 16/01/2015 17:16, Vladimir Ivanov vladimir.x.iva...@oracle.com wrote: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ https://bugs.openjdk.java.net/browse/JDK-8063137 After GuardWithTest (GWT) LambdaForms became shared, profile pollution significantly distorted compilation decisions. It affected inlining and hindered some optimizations. It causes significant performance regressions for Nashorn (on Octane benchmarks). Inlining was fixed by 8059877 [1], but it didn't cover the case when a branch is never taken. It can cause missed optimization opportunity, and not just increase in code size. For example, non-pruned branch can break escape analysis. Currently, there are 2 problems: - branch frequencies profile pollution - deoptimization counts pollution Branch frequency pollution hides from JIT the fact that a branch is never taken. Since GWT LambdaForms (and hence their bytecode) are heavily shared, but the behavior is specific to MethodHandle, there's no way for JIT to understand how particular GWT instance behaves. The solution I propose is to do profiling in Java code and feed it to JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where profiling info is stored. Once JIT kicks in, it can retrieve these counts, if corresponding MethodHandle is a compile-time constant (and it is usually the case). To communicate the profile data from Java code to JIT, MethodHandleImpl::profileBranch() is used. If GWT MethodHandle isn't a compile-time constant, profiling should proceed. It happens when corresponding LambdaForm is already shared, for newly created GWT MethodHandles profiling can occur only in native code (dedicated nmethod for a single LambdaForm). So, when compilation of the whole MethodHandle chain is triggered, the profile should be already gathered. Overriding branch frequencies is not enough. Statistics on deoptimization events is also polluted. Even if a branch is never taken, JIT doesn't issue an uncommon trap there unless corresponding bytecode doesn't trap too much and doesn't cause too many recompiles. I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT sees it on some method, Compile::too_many_traps Compile::too_many_recompiles for that method always return false. It allows JIT to prune the branch based on custom profile and recompile the method, if the branch is visited. For now, I wanted to keep the fix very focused. The next thing I plan to do is to experiment with ignoring deoptimization counts for other LambdaForms which are heavily shared. I already saw problems caused by deoptimization counts pollution (see JDK-8068915 [2]). I plan to backport the fix into 8u40, once I finish extensive performance testing. Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, Octane). Thanks! PS: as a summary, my experiments show that fixes for 8063137 8068915 [2] almost completely recovers peak performance after LambdaForm sharing [3]. There's one more problem left (non-inlined MethodHandle invocations are more expensive when LFs are shared), but it's a story for another day. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8059877 8059877: GWT branch frequencies pollution due to LF sharing [2] https://bugs.openjdk.java.net/browse/JDK-8068915 [3] https://bugs.openjdk.java.net/browse/JDK-8046703 JEP 210: LambdaForm Reduction and Caching ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
Hmm, 8068915 hasn’t fixed it, but running fewer benchmarks seems to make the problem go away, so it looks like there’s something going wrong fairly deep in our runtime. Trying the full suite with compilation logging enabled now to see if I can find a smoking gun. On 20/01/2015 12:40, Vladimir Ivanov vladimir.x.iva...@oracle.com wrote: Duncan, thanks a lot for giving it a try! If you plan to spend more time on it, please, apply 8068915 as well. I saw huge intermittent performance regressions due to continuous deoptimization storm. You can look into -XX:+LogCompilation output and look for repeated deoptimization events in steady state w/ Action_none. Also, there's deoptimization statistics in the log (at least, in jdk9). It's located right before compilation_log tag. Thanks again for the valuable feedback! Best regards, Vladimir Ivanov [1] http://cr.openjdk.java.net/~vlivanov/8068915/webrev.00 On 1/19/15 11:21 PM, MacGregor, Duncan (GE Energy Management) wrote: Okay, I¹ve done some tests of this with the micro benchmarks for our language runtime which show pretty much no change except for one test which is now almost 3x slower. It uses nested loops to iterate over an array and concatenate the string-like objects it contains, and replaces elements with these new longer string-llike objects. It¹s a bit of a pathological case, and I haven¹t seen the same sort of degradation in the other benchmarks or in real applications, but I haven¹t done serious benchmarking of them with this change. I shall see if the test case can be reduced down to anything simpler while still showing the same performance behaviour, and try add some compilation logging options to narrow down what¹s going on. Duncan. On 16/01/2015 17:16, Vladimir Ivanov vladimir.x.iva...@oracle.com wrote: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ https://bugs.openjdk.java.net/browse/JDK-8063137 After GuardWithTest (GWT) LambdaForms became shared, profile pollution significantly distorted compilation decisions. It affected inlining and hindered some optimizations. It causes significant performance regressions for Nashorn (on Octane benchmarks). Inlining was fixed by 8059877 [1], but it didn't cover the case when a branch is never taken. It can cause missed optimization opportunity, and not just increase in code size. For example, non-pruned branch can break escape analysis. Currently, there are 2 problems: - branch frequencies profile pollution - deoptimization counts pollution Branch frequency pollution hides from JIT the fact that a branch is never taken. Since GWT LambdaForms (and hence their bytecode) are heavily shared, but the behavior is specific to MethodHandle, there's no way for JIT to understand how particular GWT instance behaves. The solution I propose is to do profiling in Java code and feed it to JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where profiling info is stored. Once JIT kicks in, it can retrieve these counts, if corresponding MethodHandle is a compile-time constant (and it is usually the case). To communicate the profile data from Java code to JIT, MethodHandleImpl::profileBranch() is used. If GWT MethodHandle isn't a compile-time constant, profiling should proceed. It happens when corresponding LambdaForm is already shared, for newly created GWT MethodHandles profiling can occur only in native code (dedicated nmethod for a single LambdaForm). So, when compilation of the whole MethodHandle chain is triggered, the profile should be already gathered. Overriding branch frequencies is not enough. Statistics on deoptimization events is also polluted. Even if a branch is never taken, JIT doesn't issue an uncommon trap there unless corresponding bytecode doesn't trap too much and doesn't cause too many recompiles. I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT sees it on some method, Compile::too_many_traps Compile::too_many_recompiles for that method always return false. It allows JIT to prune the branch based on custom profile and recompile the method, if the branch is visited. For now, I wanted to keep the fix very focused. The next thing I plan to do is to experiment with ignoring deoptimization counts for other LambdaForms which are heavily shared. I already saw problems caused by deoptimization counts pollution (see JDK-8068915 [2]). I plan to backport the fix into 8u40, once I finish extensive performance testing. Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, Octane). Thanks! PS: as a summary, my experiments show that fixes for 8063137 8068915 [2] almost completely recovers peak performance after LambdaForm sharing [3]. There's one more problem left (non-inlined MethodHandle invocations are more expensive when LFs are shared), but it's a story for another day. Best regards, Vladimir Ivanov
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
So, very few deopt events in the logs (exactly 4 in fact, in both the performant and non-performant cases, and for the exact same methods), but in the case where performance has degraded I only see an initial compilation for the problem method and not the later inlining I see in the performant case. I’ll dig through the rest of the logs and try see if there’s any differences leading up to the inlining. On the bright side while going through the logs I did spot one obvious snafu in our code (unnecessary MutableCallSite usage), and have got a 2.5 times speed up on another benchmark, so I’m not too unhappy. :-) On 20/01/2015 17:14, MacGregor, Duncan (GE Energy Management) duncan.macgre...@ge.com wrote: Hmm, 8068915 hasn’t fixed it, but running fewer benchmarks seems to make the problem go away, so it looks like there’s something going wrong fairly deep in our runtime. Trying the full suite with compilation logging enabled now to see if I can find a smoking gun. On 20/01/2015 12:40, Vladimir Ivanov vladimir.x.iva...@oracle.com wrote: Duncan, thanks a lot for giving it a try! If you plan to spend more time on it, please, apply 8068915 as well. I saw huge intermittent performance regressions due to continuous deoptimization storm. You can look into -XX:+LogCompilation output and look for repeated deoptimization events in steady state w/ Action_none. Also, there's deoptimization statistics in the log (at least, in jdk9). It's located right before compilation_log tag. Thanks again for the valuable feedback! Best regards, Vladimir Ivanov [1] http://cr.openjdk.java.net/~vlivanov/8068915/webrev.00 On 1/19/15 11:21 PM, MacGregor, Duncan (GE Energy Management) wrote: Okay, I¹ve done some tests of this with the micro benchmarks for our language runtime which show pretty much no change except for one test which is now almost 3x slower. It uses nested loops to iterate over an array and concatenate the string-like objects it contains, and replaces elements with these new longer string-llike objects. It¹s a bit of a pathological case, and I haven¹t seen the same sort of degradation in the other benchmarks or in real applications, but I haven¹t done serious benchmarking of them with this change. I shall see if the test case can be reduced down to anything simpler while still showing the same performance behaviour, and try add some compilation logging options to narrow down what¹s going on. Duncan. On 16/01/2015 17:16, Vladimir Ivanov vladimir.x.iva...@oracle.com wrote: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ https://bugs.openjdk.java.net/browse/JDK-8063137 After GuardWithTest (GWT) LambdaForms became shared, profile pollution significantly distorted compilation decisions. It affected inlining and hindered some optimizations. It causes significant performance regressions for Nashorn (on Octane benchmarks). Inlining was fixed by 8059877 [1], but it didn't cover the case when a branch is never taken. It can cause missed optimization opportunity, and not just increase in code size. For example, non-pruned branch can break escape analysis. Currently, there are 2 problems: - branch frequencies profile pollution - deoptimization counts pollution Branch frequency pollution hides from JIT the fact that a branch is never taken. Since GWT LambdaForms (and hence their bytecode) are heavily shared, but the behavior is specific to MethodHandle, there's no way for JIT to understand how particular GWT instance behaves. The solution I propose is to do profiling in Java code and feed it to JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where profiling info is stored. Once JIT kicks in, it can retrieve these counts, if corresponding MethodHandle is a compile-time constant (and it is usually the case). To communicate the profile data from Java code to JIT, MethodHandleImpl::profileBranch() is used. If GWT MethodHandle isn't a compile-time constant, profiling should proceed. It happens when corresponding LambdaForm is already shared, for newly created GWT MethodHandles profiling can occur only in native code (dedicated nmethod for a single LambdaForm). So, when compilation of the whole MethodHandle chain is triggered, the profile should be already gathered. Overriding branch frequencies is not enough. Statistics on deoptimization events is also polluted. Even if a branch is never taken, JIT doesn't issue an uncommon trap there unless corresponding bytecode doesn't trap too much and doesn't cause too many recompiles. I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT sees it on some method, Compile::too_many_traps Compile::too_many_recompiles for that method always return false. It allows JIT to prune the branch based on custom profile and recompile the method, if the branch is visited. For now, I wanted to keep the fix very focused. The next
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
John, thanks for the review! Updated webrev: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.01/hotspot http://cr.openjdk.java.net/~vlivanov/8063137/webrev.01/jdk See my answers inline. On 1/17/15 2:13 AM, John Rose wrote: On Jan 16, 2015, at 9:16 AM, Vladimir Ivanov vladimir.x.iva...@oracle.com mailto:vladimir.x.iva...@oracle.com wrote: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ https://bugs.openjdk.java.net/browse/JDK-8063137 ... PS: as a summary, my experiments show that fixes for 8063137 8068915 [2] almost completely recovers peak performance after LambdaForm sharing [3]. There's one more problem left (non-inlined MethodHandle invocations are more expensive when LFs are shared), but it's a story for another day. This performance bump is excellent news. LFs are supposed to express emergently common behaviors, like hidden classes. We are much closer to that goal now. I'm glad to see that the library-assisted profiling turns out to be relatively clean. In effect this restores the pre-LF CountingMethodHandle logic from 2011, which was so beneficial in JDK 7: http://hg.openjdk.java.net/jdk7u/jdk7u/jdk/file/02de5cdbef21/src/share/classes/java/lang/invoke/CountingMethodHandle.java I have some suggestions to make this version a little cleaner; see below. Starting with the JDK changes: In LambdaForm.java, I'm feeling flag pressure from all the little boolean fields and constructor parameters. (Is it time to put in a bit-encoded field private byte LambdaForm.flags, or do we wait for another boolean to come along? But see next questions, which are more important.) What happens when a GWT LF gets inlined into a larger LF? Then there might be two or more selectAlternative calls. Will this confuse anything or will it Just Work? The combined LF will get profiled as usual, and the selectAlternative calls will also collect profile (or not?). This leads to another question: Why have a boolean 'isGWT' at all? Why not just check for one or more occurrence of selectAlternative, and declare that those guys override (some of) the profiling. Something like: -+ if (PROFILE_GWT lambdaForm.isGWT) ++ if (PROFILE_GWT lambdaForm.containsFunction(NF_selectAlternative)) (...where LF.containsFunction(NamedFunction) is a variation of LF.contains(Name).) I suppose the answer may be that you want to inline GWTs (if ever) into customized code where the JVM profiling should get maximum benefit. In that case case you might want to set the boolean to false to distinguish immature GWT combinators from customized ones. If that's the case, perhaps the real boolean flag you want is not 'isGWT' but 'sharedProfile' or 'immature' or some such, or (inverting) 'customized'. (I like the feel of a 'customized' flag.) Then @IgnoreProfile would get attached to a LF that (a ) contains selectAlternative and (b ) is marked as non-customized/immature/shared. You might also want to adjust the call to 'profileBranch' based on whether the containing LF was shared or customized. What I'm mainly poking at here is that 'isGWT' is not informative about the intended use of the flag. I agree. It was an interim solution. Initially, I planned to introduce customization and guide the logic based on that property. But it's not there yet and I needed something for GWT case. Unfortunately, I missed the case when GWT is edited. In that case, isGWT flag is missed and no annotation is set. So, I removed isGWT flag and introduced a check for selectAlternative occurence in LambdaForm shape, as you suggested. In 'updateCounters', if the counter overflows, you'll get continuous creation of ArithmeticExceptions. Will that optimize or will it cause a permanent slowdown? Consider a hack like this on the exception path: counters[idx] = Integer.MAX_VALUE / 2; I had an impression that VM optimizes overflows in Math.exact* intrinsics, but it's not the case - it always inserts an uncommon trap. I used the workaround you proposed. On the Name Bikeshed: It looks like @IgnoreProfile (ignore_profile in the VM) promises too much ignorance, since it suppresses branch counts and traps, but allows type profiles to be consulted. Maybe something positive like @ManyTraps or @SharedMegamorphic? (It's just a name, and this is just a suggestion.) What do you think about @LambdaForm.Shared? Going to the JVM: In library_call.cpp, I think you should change the assert to a guard: -+ assert(aobj-length() == 2, ); ++ aobj-length() == 2) { Done. In Parse::dynamic_branch_prediction, the mere presence of the Opaque4 node is enough to trigger replacement of profiling. I think there should *not* be a test of method()-ignore_profile(). That should provide better integration between the two sources of profile data to JVM profiling? Done. Also, I think the name 'Opaque4Node' is way too… opaque. Suggest 'ProfileBranchNode', since that's
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
Thanks, Vladimir! I would suggest to add more detailed comment (instead of simple Stop profiling) to inline_profileBranch() intrinsic explaining what it is doing because it is not strictly intrinsic - it does not implement profileBranch() java code when counts is constant. Sure, will do. You forgot to mark Opaque4Node as macro node. I would suggest to base it on Opaque2Node then you will get some methods from it. Do I really need to do so? I expect it to go away during IGVN pass right after parsing is over. That's why I register the node for igvn in LibraryCallKit::inline_profileBranch(). Changes in macro.cpp compile.cpp are leftovers from the version when Opaque4 was macro node. I plan to remove them. Best regards, Vladimir Ivanov On 1/16/15 9:16 AM, Vladimir Ivanov wrote: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ https://bugs.openjdk.java.net/browse/JDK-8063137 After GuardWithTest (GWT) LambdaForms became shared, profile pollution significantly distorted compilation decisions. It affected inlining and hindered some optimizations. It causes significant performance regressions for Nashorn (on Octane benchmarks). Inlining was fixed by 8059877 [1], but it didn't cover the case when a branch is never taken. It can cause missed optimization opportunity, and not just increase in code size. For example, non-pruned branch can break escape analysis. Currently, there are 2 problems: - branch frequencies profile pollution - deoptimization counts pollution Branch frequency pollution hides from JIT the fact that a branch is never taken. Since GWT LambdaForms (and hence their bytecode) are heavily shared, but the behavior is specific to MethodHandle, there's no way for JIT to understand how particular GWT instance behaves. The solution I propose is to do profiling in Java code and feed it to JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where profiling info is stored. Once JIT kicks in, it can retrieve these counts, if corresponding MethodHandle is a compile-time constant (and it is usually the case). To communicate the profile data from Java code to JIT, MethodHandleImpl::profileBranch() is used. If GWT MethodHandle isn't a compile-time constant, profiling should proceed. It happens when corresponding LambdaForm is already shared, for newly created GWT MethodHandles profiling can occur only in native code (dedicated nmethod for a single LambdaForm). So, when compilation of the whole MethodHandle chain is triggered, the profile should be already gathered. Overriding branch frequencies is not enough. Statistics on deoptimization events is also polluted. Even if a branch is never taken, JIT doesn't issue an uncommon trap there unless corresponding bytecode doesn't trap too much and doesn't cause too many recompiles. I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT sees it on some method, Compile::too_many_traps Compile::too_many_recompiles for that method always return false. It allows JIT to prune the branch based on custom profile and recompile the method, if the branch is visited. For now, I wanted to keep the fix very focused. The next thing I plan to do is to experiment with ignoring deoptimization counts for other LambdaForms which are heavily shared. I already saw problems caused by deoptimization counts pollution (see JDK-8068915 [2]). I plan to backport the fix into 8u40, once I finish extensive performance testing. Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, Octane). Thanks! PS: as a summary, my experiments show that fixes for 8063137 8068915 [2] almost completely recovers peak performance after LambdaForm sharing [3]. There's one more problem left (non-inlined MethodHandle invocations are more expensive when LFs are shared), but it's a story for another day. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8059877 8059877: GWT branch frequencies pollution due to LF sharing [2] https://bugs.openjdk.java.net/browse/JDK-8068915 [3] https://bugs.openjdk.java.net/browse/JDK-8046703 JEP 210: LambdaForm Reduction and Caching ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
Okay, I¹ve done some tests of this with the micro benchmarks for our language runtime which show pretty much no change except for one test which is now almost 3x slower. It uses nested loops to iterate over an array and concatenate the string-like objects it contains, and replaces elements with these new longer string-llike objects. It¹s a bit of a pathological case, and I haven¹t seen the same sort of degradation in the other benchmarks or in real applications, but I haven¹t done serious benchmarking of them with this change. I shall see if the test case can be reduced down to anything simpler while still showing the same performance behaviour, and try add some compilation logging options to narrow down what¹s going on. Duncan. On 16/01/2015 17:16, Vladimir Ivanov vladimir.x.iva...@oracle.com wrote: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ https://bugs.openjdk.java.net/browse/JDK-8063137 After GuardWithTest (GWT) LambdaForms became shared, profile pollution significantly distorted compilation decisions. It affected inlining and hindered some optimizations. It causes significant performance regressions for Nashorn (on Octane benchmarks). Inlining was fixed by 8059877 [1], but it didn't cover the case when a branch is never taken. It can cause missed optimization opportunity, and not just increase in code size. For example, non-pruned branch can break escape analysis. Currently, there are 2 problems: - branch frequencies profile pollution - deoptimization counts pollution Branch frequency pollution hides from JIT the fact that a branch is never taken. Since GWT LambdaForms (and hence their bytecode) are heavily shared, but the behavior is specific to MethodHandle, there's no way for JIT to understand how particular GWT instance behaves. The solution I propose is to do profiling in Java code and feed it to JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where profiling info is stored. Once JIT kicks in, it can retrieve these counts, if corresponding MethodHandle is a compile-time constant (and it is usually the case). To communicate the profile data from Java code to JIT, MethodHandleImpl::profileBranch() is used. If GWT MethodHandle isn't a compile-time constant, profiling should proceed. It happens when corresponding LambdaForm is already shared, for newly created GWT MethodHandles profiling can occur only in native code (dedicated nmethod for a single LambdaForm). So, when compilation of the whole MethodHandle chain is triggered, the profile should be already gathered. Overriding branch frequencies is not enough. Statistics on deoptimization events is also polluted. Even if a branch is never taken, JIT doesn't issue an uncommon trap there unless corresponding bytecode doesn't trap too much and doesn't cause too many recompiles. I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT sees it on some method, Compile::too_many_traps Compile::too_many_recompiles for that method always return false. It allows JIT to prune the branch based on custom profile and recompile the method, if the branch is visited. For now, I wanted to keep the fix very focused. The next thing I plan to do is to experiment with ignoring deoptimization counts for other LambdaForms which are heavily shared. I already saw problems caused by deoptimization counts pollution (see JDK-8068915 [2]). I plan to backport the fix into 8u40, once I finish extensive performance testing. Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, Octane). Thanks! PS: as a summary, my experiments show that fixes for 8063137 8068915 [2] almost completely recovers peak performance after LambdaForm sharing [3]. There's one more problem left (non-inlined MethodHandle invocations are more expensive when LFs are shared), but it's a story for another day. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8059877 8059877: GWT branch frequencies pollution due to LF sharing [2] https://bugs.openjdk.java.net/browse/JDK-8068915 [3] https://bugs.openjdk.java.net/browse/JDK-8046703 JEP 210: LambdaForm Reduction and Caching ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
Nice! At least Hotspot part since I don't understand jdk part :) I would suggest to add more detailed comment (instead of simple Stop profiling) to inline_profileBranch() intrinsic explaining what it is doing because it is not strictly intrinsic - it does not implement profileBranch() java code when counts is constant. You forgot to mark Opaque4Node as macro node. I would suggest to base it on Opaque2Node then you will get some methods from it. Thanks, Vladimir On 1/16/15 9:16 AM, Vladimir Ivanov wrote: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ https://bugs.openjdk.java.net/browse/JDK-8063137 After GuardWithTest (GWT) LambdaForms became shared, profile pollution significantly distorted compilation decisions. It affected inlining and hindered some optimizations. It causes significant performance regressions for Nashorn (on Octane benchmarks). Inlining was fixed by 8059877 [1], but it didn't cover the case when a branch is never taken. It can cause missed optimization opportunity, and not just increase in code size. For example, non-pruned branch can break escape analysis. Currently, there are 2 problems: - branch frequencies profile pollution - deoptimization counts pollution Branch frequency pollution hides from JIT the fact that a branch is never taken. Since GWT LambdaForms (and hence their bytecode) are heavily shared, but the behavior is specific to MethodHandle, there's no way for JIT to understand how particular GWT instance behaves. The solution I propose is to do profiling in Java code and feed it to JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where profiling info is stored. Once JIT kicks in, it can retrieve these counts, if corresponding MethodHandle is a compile-time constant (and it is usually the case). To communicate the profile data from Java code to JIT, MethodHandleImpl::profileBranch() is used. If GWT MethodHandle isn't a compile-time constant, profiling should proceed. It happens when corresponding LambdaForm is already shared, for newly created GWT MethodHandles profiling can occur only in native code (dedicated nmethod for a single LambdaForm). So, when compilation of the whole MethodHandle chain is triggered, the profile should be already gathered. Overriding branch frequencies is not enough. Statistics on deoptimization events is also polluted. Even if a branch is never taken, JIT doesn't issue an uncommon trap there unless corresponding bytecode doesn't trap too much and doesn't cause too many recompiles. I added @IgnoreProfile and place it only on GWT LambdaForms. When JIT sees it on some method, Compile::too_many_traps Compile::too_many_recompiles for that method always return false. It allows JIT to prune the branch based on custom profile and recompile the method, if the branch is visited. For now, I wanted to keep the fix very focused. The next thing I plan to do is to experiment with ignoring deoptimization counts for other LambdaForms which are heavily shared. I already saw problems caused by deoptimization counts pollution (see JDK-8068915 [2]). I plan to backport the fix into 8u40, once I finish extensive performance testing. Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, Octane). Thanks! PS: as a summary, my experiments show that fixes for 8063137 8068915 [2] almost completely recovers peak performance after LambdaForm sharing [3]. There's one more problem left (non-inlined MethodHandle invocations are more expensive when LFs are shared), but it's a story for another day. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8059877 8059877: GWT branch frequencies pollution due to LF sharing [2] https://bugs.openjdk.java.net/browse/JDK-8068915 [3] https://bugs.openjdk.java.net/browse/JDK-8046703 JEP 210: LambdaForm Reduction and Caching ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Re: [9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
On Jan 16, 2015, at 9:16 AM, Vladimir Ivanov vladimir.x.iva...@oracle.com wrote: http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/ http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/ https://bugs.openjdk.java.net/browse/JDK-8063137 https://bugs.openjdk.java.net/browse/JDK-8063137 ... PS: as a summary, my experiments show that fixes for 8063137 8068915 [2] almost completely recovers peak performance after LambdaForm sharing [3]. There's one more problem left (non-inlined MethodHandle invocations are more expensive when LFs are shared), but it's a story for another day. This performance bump is excellent news. LFs are supposed to express emergently common behaviors, like hidden classes. We are much closer to that goal now. I'm glad to see that the library-assisted profiling turns out to be relatively clean. In effect this restores the pre-LF CountingMethodHandle logic from 2011, which was so beneficial in JDK 7: http://hg.openjdk.java.net/jdk7u/jdk7u/jdk/file/02de5cdbef21/src/share/classes/java/lang/invoke/CountingMethodHandle.java I have some suggestions to make this version a little cleaner; see below. Starting with the JDK changes: In LambdaForm.java, I'm feeling flag pressure from all the little boolean fields and constructor parameters. (Is it time to put in a bit-encoded field private byte LambdaForm.flags, or do we wait for another boolean to come along? But see next questions, which are more important.) What happens when a GWT LF gets inlined into a larger LF? Then there might be two or more selectAlternative calls. Will this confuse anything or will it Just Work? The combined LF will get profiled as usual, and the selectAlternative calls will also collect profile (or not?). This leads to another question: Why have a boolean 'isGWT' at all? Why not just check for one or more occurrence of selectAlternative, and declare that those guys override (some of) the profiling. Something like: -+ if (PROFILE_GWT lambdaForm.isGWT) ++ if (PROFILE_GWT lambdaForm.containsFunction(NF_selectAlternative)) (...where LF.containsFunction(NamedFunction) is a variation of LF.contains(Name).) I suppose the answer may be that you want to inline GWTs (if ever) into customized code where the JVM profiling should get maximum benefit. In that case case you might want to set the boolean to false to distinguish immature GWT combinators from customized ones. If that's the case, perhaps the real boolean flag you want is not 'isGWT' but 'sharedProfile' or 'immature' or some such, or (inverting) 'customized'. (I like the feel of a 'customized' flag.) Then @IgnoreProfile would get attached to a LF that (a ) contains selectAlternative and (b ) is marked as non-customized/immature/shared. You might also want to adjust the call to 'profileBranch' based on whether the containing LF was shared or customized. What I'm mainly poking at here is that 'isGWT' is not informative about the intended use of the flag. In 'updateCounters', if the counter overflows, you'll get continuous creation of ArithmeticExceptions. Will that optimize or will it cause a permanent slowdown? Consider a hack like this on the exception path: counters[idx] = Integer.MAX_VALUE / 2; On the Name Bikeshed: It looks like @IgnoreProfile (ignore_profile in the VM) promises too much ignorance, since it suppresses branch counts and traps, but allows type profiles to be consulted. Maybe something positive like @ManyTraps or @SharedMegamorphic? (It's just a name, and this is just a suggestion.) Going to the JVM: In library_call.cpp, I think you should change the assert to a guard: -+ assert(aobj-length() == 2, ); ++ aobj-length() == 2) { In Parse::dynamic_branch_prediction, the mere presence of the Opaque4 node is enough to trigger replacement of profiling. I think there should *not* be a test of method()-ignore_profile(). That should provide better integration between the two sources of profile data to JVM profiling? Also, I think the name 'Opaque4Node' is way too… opaque. Suggest 'ProfileBranchNode', since that's exactly what it does. Suggest changing the log element profile_branch to observe source='profileBranch', to make a better hint as to the source of the info. — John ___ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev