RFR: 8288425: Footprint regression due MH creation when initializing StringConcatFactory
Avoid MH creation when initializing `StringConcatFactory` by lazily initializing the handles into a `@Stable` field instead. - Commit messages: - 8288425: Footprint regression due MH creation when initializing StringConcatFactory Changes: https://git.openjdk.org/jdk/pull/9154/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=9154&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8288425 Stats: 11 lines in 1 file changed: 7 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/9154.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9154/head:pull/9154 PR: https://git.openjdk.org/jdk/pull/9154
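The lazy-initialization pattern this change describes can be sketched as follows. This is an illustrative sketch, not the JDK's actual code: the real field uses the JDK-internal `@jdk.internal.vm.annotation.Stable` annotation (which lets the JIT constant-fold the field once it is written), and the `LazyStringifier`/`stringify` names here are invented for the example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Sketch of the pattern: instead of building a MethodHandle eagerly during
// class initialization (<clinit>), cache it in a field on first use. In the
// JDK the field would be @Stable; a plain static field shows the shape.
final class LazyStringifier {
    private static MethodHandle OBJECT_STRINGIFIER; // @Stable in the real JDK

    static MethodHandle objectStringifier() {
        MethodHandle mh = OBJECT_STRINGIFIER;
        if (mh == null) {
            try {
                // String.valueOf(Object) is what an object stringifier boils down to
                mh = MethodHandles.lookup().findStatic(String.class, "valueOf",
                        MethodType.methodType(String.class, Object.class));
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
            OBJECT_STRINGIFIER = mh; // benign race: every thread computes the same value
        }
        return mh;
    }

    // Convenience wrapper for demonstration
    static String stringify(Object o) {
        try {
            return (String) objectStringifier().invokeExact(o);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }
}
```

Classes that never concatenate the relevant shapes then never pay the MH-creation cost, which is the footprint win the issue title refers to.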
Re: RFR: 8288011: StringConcatFactory: Split application of stringifiers [v2]
> To take optimal advantage of the pre-existing optimization for repeated > filters we could split the application of different types of stringifiers. > > The resulting difference in order of evaluation is not observable by > conventional means since all reference types share the same object > stringifier, and the others are filtering primitives (floats and doubles) > which have been passed by value already. > > This change is neutral on many concatenation expression shapes, but for any > complex expressions with interleaving float/double and reference parameters > it brings a straightforward reduction in rebinds and underlying LFs > generated. For example on the > [MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248) > test there's a modest 2% reduction in total classes loaded with this change > (from 16209 to 15872) Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Improve comments based on review feedback - Changes: - all: https://git.openjdk.org/jdk/pull/9082/files - new: https://git.openjdk.org/jdk/pull/9082/files/51c841e8..4b7c696e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=9082&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=9082&range=00-01 Stats: 11 lines in 1 file changed: 7 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9082.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9082/head:pull/9082 PR: https://git.openjdk.org/jdk/pull/9082
Integrated: 8287903: Reduce runtime of java.math microbenchmarks
On Tue, 7 Jun 2022 12:34:25 GMT, Claes Redestad wrote: > - Reduce runtime by running fewer forks, fewer iterations, less warmup. All > micros tested in this group appear to stabilize very quickly. > - Refactor BigIntegers to avoid re-running some (most) micros over and over > with parameter values that don't affect them. > > Expected runtime down from 14 hours to 15 minutes. This pull request has now been integrated. Changeset: 7e948f7c Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/7e948f7ccbb4b9be04f5ecb65cc8dd72e3b495f4 Stats: 104 lines in 4 files changed: 66 ins; 33 del; 5 mod 8287903: Reduce runtime of java.math microbenchmarks Reviewed-by: ecaspole, aph - PR: https://git.openjdk.java.net/jdk/pull/9062
Re: RFR: 8287903: Reduce runtime of java.math microbenchmarks
On Tue, 7 Jun 2022 12:34:25 GMT, Claes Redestad wrote: > - Reduce runtime by running fewer forks, fewer iterations, less warmup. All > micros tested in this group appear to stabilize very quickly. > - Refactor BigIntegers to avoid re-running some (most) micros over and over > with parameter values that don't affect them. > > Expected runtime down from 14 hours to 15 minutes. Thanks for reviews! - PR: https://git.openjdk.java.net/jdk/pull/9062
Re: RFR: 8288011: StringConcatFactory: Split application of stringifiers
On Wed, 8 Jun 2022 14:44:33 GMT, Jim Laskey wrote: >> To take optimal advantage of the pre-existing optimization for repeated >> filters we could split the application of different types of stringifiers. >> >> The resulting difference in order of evaluation is not observable by >> conventional means since all reference types share the same object >> stringifier, and the others are filtering primitives (floats and doubles) >> which have been passed by value already. >> >> This change is neutral on many concatenation expression shapes, but for any >> complex expressions with interleaving float/double and reference parameters >> it brings a straightforward reduction in rebinds and underlying LFs >> generated. For example on the >> [MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248) >> test there's a modest 2% reduction in total classes loaded with this change >> (from 16209 to 15872) > > I make heavy use of the `mh = filterArgument(baseMH, 0, arrayOfFilters);` > pattern. I assumed it generated fewer LFs than making individual calls. Am I > wrong? @JimLaskey no, you're generally right to assume that bunching them up reduces the number of LFs created. There's a flaw in the `makeRepeatedFilterForm` transform that means the theoretical total amount of forms can surpass the baseline if you do _both_ one-by-one and all-at-once transforms from the same root MH. That's a very big if, but it's something I've felt slight unease about since I did that optimization. - PR: https://git.openjdk.java.net/jdk/pull/9082
Re: RFR: 8288011: StringConcatFactory: Split application of stringifiers
On Wed, 8 Jun 2022 13:42:11 GMT, Jorn Vernee wrote: > Looking at the code in MethodHandles::filterArguments, and this optimization. > It seems that for multiple filters we could potentially always generate just > a single new lambda form if we wanted, not just for repeated filters. > > It would be similar to `LambdaFormEditor::makeRepeatedFilterForm`. Just need > to extend the species data with one new `L_TYPE` field for each unique > filter, generate a new getCombiner `Name` for each unique combiner, pass in > an array of combiner types instead of just a single one, and pass in an > additional array of ints that marks the combiner used by each argument > position. Then do similar wiring up as we do now, but calling the > corresponding combiner for each argument pos. Right? > > (If I'm right about that, maybe that's even a more interesting direction to > explore, since it would benefit everyone, and the code in StringConcatFactory > could stay the same) You're right, but I think such a transform would add quite a bit of complexity. Back when I added the `makeRepeatedFilterForm` optimization (for the benefit of `StringConcatFactory`) I sought to strike a balance between the simple repeatedly-create-`filterArgumentForm`s transform we would do before and more complex forms that would remove more intermediates. I was also a bit concerned that such "shortcutting" transforms could mean we eliminate sharing, and the more complex we make things the harder it gets to reason about. Example:

    ...
    mh1 = filterArgument(baseMH, 0, filter);
    mh1 = filterArgument(mh1, 1, filter);
    mh2 = filterArgument(baseMH, 0, filter, filter);
    mh1.form() == mh2.form(); // false

The two different pathways end up with two different LFs, since `makeRepeatedFilterForm` shortcuts. In practice this might be a non-issue, but I figured it'd be nice if we could have a reasonable pathway to resolve mh1.form() and mh2.form() to the same thing in the future. - PR: https://git.openjdk.java.net/jdk/pull/9082
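The one-by-one versus all-at-once pathways can be exercised through the public API. A minimal sketch (the `concat`/`up` helpers are invented for the example; `LambdaForm` sharing itself is JDK-internal and not observable from outside, so this only demonstrates that the two handles behave identically):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class FilterSharing {
    static String concat(String a, String b) { return a + b; }
    static String up(String s) { return s.toUpperCase(); }

    static String runBoth() {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            MethodHandle base = l.findStatic(FilterSharing.class, "concat",
                    MethodType.methodType(String.class, String.class, String.class));
            MethodHandle filter = l.findStatic(FilterSharing.class, "up",
                    MethodType.methodType(String.class, String.class));

            // one-by-one: filter position 0, then position 1
            MethodHandle oneByOne = MethodHandles.filterArguments(base, 0, filter);
            oneByOne = MethodHandles.filterArguments(oneByOne, 1, filter);
            // all-at-once: filters applied to consecutive positions starting at 0
            MethodHandle allAtOnce = MethodHandles.filterArguments(base, 0, filter, filter);

            // Behaviorally identical; whether they share underlying LambdaForms
            // is an internal detail.
            return (String) oneByOne.invokeExact("a", "b") + ","
                    + (String) allAtOnce.invokeExact("a", "b");
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(runBoth());
    }
}
```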
Re: RFR: 8288011: StringConcatFactory: Split application of stringifiers
On Wed, 8 Jun 2022 12:11:30 GMT, Sergey Tsypanov wrote: >> To take optimal advantage of the pre-existing optimization for repeated >> filters we could split the application of different types of stringifiers. >> >> The resulting difference in order of evaluation is not observable by >> conventional means since all reference types share the same object >> stringifier, and the others are filtering primitives (floats and doubles) >> which have been passed by value already. >> >> This change is neutral on many concatenation expression shapes, but for any >> complex expressions with interleaving float/double and reference parameters >> it brings a straightforward reduction in rebinds and underlying LFs >> generated. For example on the >> [MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248) >> test there's a modest 2% reduction in total classes loaded with this change >> (from 16209 to 15872) > > src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line > 509: > >> 507: doubleFilters = new MethodHandle[ptypes.length]; >> 508: } >> 509: doubleFilters[i] = doubleStringifier(); > > Why do we have `ptypes[i] = int.class` for `int` and `short`, `ptypes[i] = > String.class` for `float`, `double` and `Object` and none for `boolean` and > `char`? `Object`, `float` and `double` arguments are all eagerly transformed to `String` (this is what the "stringifiers" do). Since we apply these as filters as a last step (which means that's the first thing that will run) we then must change the internal type to `String`. Ultimately this means we have fewer types to worry about in the combinator. We also widen integral subword types (`short` and `byte`) to `int`. This doesn't need to be adapted by filtering but can lean on implicit conversion (subword primitives are widened to int anyhow; what we care about is matching the correct logic). 
This mainly matters insofar as the same direct method handle will be used for these three types, as it can (and will) be adapted without modification by the `viewAsType` call that ensures the `MH` returned from the BSM has the exact type asked for. `boolean` and `char` types can't use the `int` mixers/prependers and instead have specialized methods that will be adapted into the concat MH tree. This means the parameter type stays `boolean` and `char` throughout, respectively. - PR: https://git.openjdk.java.net/jdk/pull/9082
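The subword widening described above can be seen with plain `MethodHandle.asType`: a hypothetical sketch (class and method names invented) showing that an `(int)String` handle adapts to `(short)String` with no filter handle at all, just an implicit widening conversion:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class Widening {
    static String shortStr(short s) {
        try {
            // A handle taking int...
            MethodHandle intToString = MethodHandles.lookup().findStatic(
                    Integer.class, "toString",
                    MethodType.methodType(String.class, int.class));
            // ...viewed as taking short: a pure widening conversion, no extra
            // filtering MethodHandle is created for the argument.
            MethodHandle shortToString = intToString.asType(
                    MethodType.methodType(String.class, short.class));
            return (String) shortToString.invokeExact(s);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(shortStr((short) 7));
    }
}
```

This is why `short`/`byte` don't need stringifier filters: the adaptation falls out of the method-handle type system for free.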
RFR: 8288011: StringConcatFactory: Split application of stringifiers
To take optimal advantage of the pre-existing optimization for repeated filters we could split the application of different types of stringifiers. The resulting difference in order of evaluation is not observable by conventional means since all reference types share the same object stringifier, and the others are filtering primitives (floats and doubles) which have been passed by value already. This change is neutral on many concatenation expression shapes, but for any complex expressions with interleaving float/double and reference parameters it brings a straightforward reduction in rebinds and underlying LFs generated. For example on the [MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248) test there's a modest 2% reduction in total classes loaded with this change (from 16209 to 15872) - Commit messages: - StringConcatFactory: Split application of stringifiers Changes: https://git.openjdk.java.net/jdk/pull/9082/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=9082&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8288011 Stats: 27 lines in 1 file changed: 16 ins; 3 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/9082.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/9082/head:pull/9082 PR: https://git.openjdk.java.net/jdk/pull/9082
Integrated: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles
On Fri, 27 May 2022 14:18:19 GMT, Claes Redestad wrote: > In preparation of #8855 this PR refactors the conversions from `List` to > array and array to `List`, reducing the number of conversions when calling > `MethodHandles.dropArguments` in particular. This removes about 5% of > allocations on the `StringConcatFactoryBootstraps` microbenchmark. This pull request has now been integrated. Changeset: ecf00785 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/ecf00785f21125d88f5cc18311f586a7bb6ddc56 Stats: 49 lines in 3 files changed: 9 ins; 6 del; 34 mod 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles Reviewed-by: jvernee - PR: https://git.openjdk.java.net/jdk/pull/8923
Integrated: 8287522: StringConcatFactory: Add in prependers and mixers in batches
On Mon, 23 May 2022 20:03:05 GMT, Claes Redestad wrote: > When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build | # classes | Runtime | > | --- | --- | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt 5 > 2170.039 ± 117.441 ns/op > SCFB.makeConcatWithConstants:·gc.alloc.rate.norm avgt 5 > 3538.020 ± 4.558 B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt 5 > 2144.807 ± 162.685 ns/op > SCFB.makeConcatWithConstants:·gc.alloc.rate.norm avgt 5 > 3184.943 ± 4.495 B/op This pull request has now been integrated. Changeset: 5c39a366 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/5c39a3664186b91512c6a6cfcd8aa0e9860614ea Stats: 468 lines in 4 files changed: 268 ins; 114 del; 86 mod 8287522: StringConcatFactory: Add in prependers and mixers in batches Reviewed-by: jlaskey, mchung - PR: https://git.openjdk.java.net/jdk/pull/8855
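The bootstrap method that these prepender/mixer batching changes optimize can also be invoked directly, which is handy for experimenting with concat shapes. A small sketch (class and method names invented; the `\u0001` characters in the recipe mark dynamic arguments, mirroring what javac emits for an `invokedynamic`-based concatenation):

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

class ScfDemo {
    static String run() {
        try {
            // Same bootstrap javac wires up for: someString + " = " + someInt
            CallSite cs = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(), "concat",
                    MethodType.methodType(String.class, String.class, int.class),
                    "\u0001 = \u0001"); // recipe: arg, literal " = ", arg
            return (String) cs.getTarget().invokeExact("x", 42);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Each distinct argument-type shape can force new classes (LambdaForms, Species classes) to be generated behind `cs.getTarget()`, which is what the class-count numbers in the table above measure.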
RFR: 8287903: Reduce runtime of java.math microbenchmarks
- Reduce runtime by running fewer forks, fewer iterations, less warmup. All micros tested in this group appear to stabilize very quickly. - Refactor BigIntegers to avoid re-running some (most) micros over and over with parameter values that don't affect them. Expected runtime down from 14 hours to 15 minutes. - Commit messages: - Cleanup SmallShifts, reduce param space - Annotations not cascading to static nested class - Warmup/Measurement not cascading to static nested class - Reduce runtime of java.math micros - Reduce runtime of java.math micros Changes: https://git.openjdk.java.net/jdk/pull/9062/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=9062&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8287903 Stats: 104 lines in 4 files changed: 66 ins; 33 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/9062.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/9062/head:pull/9062 PR: https://git.openjdk.java.net/jdk/pull/9062
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v8]
> When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build | # classes | Runtime | > | --- | --- | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt 5 > 2170.039 ± 117.441 ns/op > SCFB.makeConcatWithConstants:·gc.alloc.rate.norm avgt 5 > 3538.020 ± 4.558 B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt 5 > 2144.807 ± 162.685 ns/op > SCFB.makeConcatWithConstants:·gc.alloc.rate.norm avgt 5 > 3184.943 ± 4.495 B/op Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: - Merge branch 'master' into scf_chunked - We now don't need big Species classes for shorter concats, so on some tests the improvements meant more Species class generation. Adjusting HelloClasslist - Clarify that consts are always reduced to null, even if calling with constants. 
Also clarify that the number of constants is paramCount + 1 by refactoring to an array - Mandy review comment #1: Cleanup LF.basicType a bit more. - Cleanup too generic String.valueOf lookups - Address review comments from @ExE-Boss - Improve bootstrap microbenchmark to include more shapes, reduce runtime - Minor stylistic cleanup - Revert change to HelloClasslist (doesn't affect generation) - Reduce allocation adding in mixers by extracting constant arg lists and caching double arg mixers - ... and 7 more: https://git.openjdk.java.net/jdk/compare/98404245...afe61394 - Changes: - all: https://git.openjdk.java.net/jdk/pull/8855/files - new: https://git.openjdk.java.net/jdk/pull/8855/files/aaa442d5..afe61394 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=06-07 Stats: 59296 lines in 765 files changed: 33676 ins; 19450 del; 6170 mod Patch: https://git.openjdk.java.net/jdk/pull/8855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8855/head:pull/8855 PR: https://git.openjdk.java.net/jdk/pull/8855
Integrated: 8287785: Reduce runtime of java.lang.invoke microbenchmarks
On Fri, 3 Jun 2022 11:16:29 GMT, Claes Redestad wrote: > - Add explicit run configurations to java.lang.invoke micros, aiming to > reduce runtime while maintaining a decently high confidence that there's > enough warmup to produce good enough data. > > - Remove several trivial baseline micros, mainly those that only return a > static object: It's reasonable to have baseline microbenchmarks when the > baseline op is complex and you're mostly interested in checking the overhead > of doing the same thing via some MH API, but blackhole operations are now > shortcutting very quickly and timings don't differ from one type of object > to another, so we don't need a multitude of such baseline tests. > > Estimated runtime of `make test TEST=micro:java.lang.micro` (excluding build) > drops from just above 28 to just above 3 hours. This pull request has now been integrated. Changeset: 42261d75 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/42261d752a140325496ffdd40d9ad62b189d1b3e Stats: 439 lines in 47 files changed: 276 ins; 153 del; 10 mod 8287785: Reduce runtime of java.lang.invoke microbenchmarks Reviewed-by: mchung - PR: https://git.openjdk.java.net/jdk/pull/9012
Integrated: 8287810: Reduce runtime of java.lang microbenchmarks
On Fri, 3 Jun 2022 15:05:33 GMT, Claes Redestad wrote: > - Reduce runtime by providing explicit configurations that try to run the > micros as fast as possible while giving enough time for warmup. > - Remove some excessive parameters > - Remove some benchmarks that test experimental features > (ObjectHashCode.mode_) or that were simply dubious at large > (StringHttp) > > This reduces runtime of running `java.lang.[A-Z]` (only java.lang, not > java.lang.reflect etc.) from over 56 hours to 6:45. This pull request has now been integrated. Changeset: 778ed1a7 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/778ed1a760d8f673811914b75e5d14e465881c91 Stats: 293 lines in 39 files changed: 192 ins; 87 del; 14 mod 8287810: Reduce runtime of java.lang microbenchmarks Reviewed-by: mchung - PR: https://git.openjdk.java.net/jdk/pull/9015
Integrated: 8287798: Reduce runtime of java.lang.reflect/runtime microbenchmarks
On Fri, 3 Jun 2022 12:46:31 GMT, Claes Redestad wrote: > Add explicit run configurations to java.lang.reflect and .runtime micros, > aiming to reduce runtime while maintaining a decently high confidence that > there's enough warmup to produce good enough data. > > Roughly halves runtime of running `java.lang.reflect|java.lang.runtime` from > 159 to 78 minutes. This pull request has now been integrated. Changeset: aa6c568a Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/aa6c568a06fa92263d4b74ff979eb521ae953bc8 Stats: 27 lines in 4 files changed: 25 ins; 2 del; 0 mod 8287798: Reduce runtime of java.lang.reflect/runtime microbenchmarks Reviewed-by: jvernee, mchung - PR: https://git.openjdk.java.net/jdk/pull/9013
Re: RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles [v3]
> In preparation of #8855 this PR refactors the conversions from `List` to > array and array to `List`, reducing the number of conversions when calling > `MethodHandles.dropArguments` in particular. This removes about 5% of > allocations on the `StringConcatFactoryBootstraps` microbenchmark. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Jorn review: Drop boolean, rename internal variant dropArgumentsTrusted - Changes: - all: https://git.openjdk.java.net/jdk/pull/8923/files - new: https://git.openjdk.java.net/jdk/pull/8923/files/15ff2125..c590f0dd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8923&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8923&range=01-02 Stats: 13 lines in 3 files changed: 0 ins; 3 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/8923.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8923/head:pull/8923 PR: https://git.openjdk.java.net/jdk/pull/8923
Re: RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles [v2]
On Fri, 3 Jun 2022 16:48:11 GMT, Jorn Vernee wrote: >> I'm not so sure. >> >> First of all we're no worse than before with the defensive copying here. >> Second, an optimizing compiler might theoretically be able to see that the >> array we get from the toArray is always fresh and not escaping anywhere in >> any well-behaved collection, so the clone could be elided. But if not then >> both toArray and clone are intrinsified operations that are heavily >> optimized and pretty fast even when interpreting, so emulating it with an >> external loop might end up taking more time even at peak, while likely >> taking longer to reach that peak. Using dumb, shared and common code >> (especially things that get JITted early anyhow) is nice in areas that see >> most action during startup/warmup. > > Ok, please keep it the way it is in your current patch then. Correction: `toArray` isn't intrinsified, but common `List` implementations are likely to implement it with a `System.arraycopy`, which is. - PR: https://git.openjdk.java.net/jdk/pull/8923
Re: RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles [v2]
On Fri, 3 Jun 2022 15:06:48 GMT, Jorn Vernee wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Review comments, eagerly convert sooner in tryFinally > > src/java.base/share/classes/java/lang/invoke/MethodHandles.java line 5269: > >> 5267: } >> 5268: >> 5269: static MethodHandle dropArguments(MethodHandle target, int pos, >> Class<?>[] valueTypes, boolean trusted) { > > Having a boolean that flips the behaviour makes it harder to know what that > `true` or `false` literal at the call site does. I'd suggest splitting this > into `dropArgumentsTrusted`, which doesn't clone the array, and > `dropArguments`, which always makes a copy (and the latter calls the former). WDYT? Yeah, that sounds better. - PR: https://git.openjdk.java.net/jdk/pull/8923
Re: RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles [v2]
On Fri, 3 Jun 2022 15:18:52 GMT, Jorn Vernee wrote: >> src/java.base/share/classes/java/lang/invoke/MethodHandles.java line 5266: >> >>> 5264: */ >>> 5265: public static MethodHandle dropArguments(MethodHandle target, int >>> pos, List<Class<?>> valueTypes) { >>> 5266: return dropArguments(target, pos, valueTypes.toArray(new >>> Class<?>[0]), false); >> >> A bit unfortunate that we can't trust this `toArray` to do a copy. I was >> going to suggest Stream, but it has the same issue (someone might have a >> custom stream implementation). >> >> I do think a manual copy of the array is possible by having a loop though: >> Suggestion: >> >> Class<?>[] ptypes = new Class<?>[valueTypes.size()]; >> for (int i = 0; i < ptypes.length; i++) { >> ptypes[i] = valueTypes.get(i); >> } >> return dropArguments(target, pos, ptypes, false); >> >> (or, maybe extract such a loop to a helper method for clarity). > > The same could be done for the public `dropArgumentsToMatch` I think. I'm not so sure. First of all we're no worse than before with the defensive copying here. Second, an optimizing compiler might theoretically be able to see that the array we get from the toArray is always fresh and not escaping anywhere in any well-behaved collection, so the clone could be elided. But if not then both toArray and clone are intrinsified operations that are heavily optimized and pretty fast even when interpreting, so emulating it with an external loop might end up taking more time even at peak, while likely taking longer to reach that peak. Using dumb, shared and common code (especially things that get JITted early anyhow) is nice in areas that see most action during startup/warmup. - PR: https://git.openjdk.java.net/jdk/pull/8923
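The defensive-copy concern being discussed can be illustrated with a deliberately misbehaving `List` whose `toArray` hands out a shared array. Everything below is an invented sketch, not JDK code; it shows why an API cannot trust `toArray` on an arbitrary `List` and why a follow-up `clone()` (or manual copy) severs the caller's access:

```java
import java.util.ArrayList;
import java.util.List;

class DefensiveCopy {
    // A hostile/broken List: toArray returns the same array on every call,
    // so a caller could mutate it after the API has "converted" the list.
    static class SharingList extends ArrayList<Class<?>> {
        final Class<?>[] shared = { String.class };
        @Override public <T> T[] toArray(T[] a) {
            @SuppressWarnings("unchecked")
            T[] t = (T[]) shared; // never copies
            return t;
        }
    }

    static Class<?>[] snapshot(List<Class<?>> valueTypes) {
        // toArray may or may not copy for arbitrary implementations,
        // so clone before trusting the contents.
        return valueTypes.toArray(new Class<?>[0]).clone();
    }

    static Class<?> demo() {
        SharingList l = new SharingList();
        Class<?>[] snap = snapshot(l);
        l.shared[0] = Integer.class; // mutation after the snapshot...
        return snap[0];              // ...does not leak into the copy
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

For well-behaved lists (`ArrayList` etc.) the clone is a redundant copy, which is exactly the trade-off weighed in the thread above.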
RFR: 8287810: Reduce runtime of java.lang microbenchmarks
- Reduce runtime by providing explicit configurations that try to run the micros as fast as possible while giving enough time for warmup. - Remove some excessive parameters - Remove some benchmarks that test experimental features (ObjectHashCode.mode_) or that were simply dubious at large (StringHttp) This reduces runtime of running `java.lang.[A-Z]` (only java.lang, not java.lang.reflect etc.) from over 56 hours to 6:45. - Commit messages: - Reduce runtime of java.lang microbenchmarks Changes: https://git.openjdk.java.net/jdk/pull/9015/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=9015&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8287810 Stats: 293 lines in 39 files changed: 192 ins; 87 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/9015.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/9015/head:pull/9015 PR: https://git.openjdk.java.net/jdk/pull/9015
RFR: 8287798: Reduce runtime in java.lang.reflect/runtime microbenchmarks
Add explicit run configurations to java.lang.reflect and .runtime micros, aiming to reduce runtime while maintaining a decently high confidence that there's enough warmup to produce good enough data. Roughly halves runtime of running `java.lang.reflect|java.lang.runtime` from 159 to 78 minutes. - Commit messages: - Reduce runtime in java.lang.reflect/runtime microbenchmarks Changes: https://git.openjdk.java.net/jdk/pull/9013/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=9013&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8287798 Stats: 27 lines in 4 files changed: 25 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/9013.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/9013/head:pull/9013 PR: https://git.openjdk.java.net/jdk/pull/9013
Re: RFR: 8287785: Reduce runtime of java.lang.invoke microbenchmarks [v2]
On Fri, 3 Jun 2022 12:37:16 GMT, liach wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Remove unused cached field in LookupAcquire > > test/micro/org/openjdk/bench/java/lang/invoke/LookupAcquire.java line 59: > >> 57: @Setup >> 58: public void setup() { >> 59: cached = MethodHandles.lookup(); > > Since the benchmark using the cached object is removed, should this field be > removed as well? Fixed! - PR: https://git.openjdk.java.net/jdk/pull/9012
Re: RFR: 8287785: Reduce runtime of java.lang.invoke microbenchmarks [v2]
> - Add explicit run configurations to java.lang.invoke micros, aiming to > reduce runtime while maintaining a decently high confidence that there's > enough warmup to produce good enough data. > > - Remove several trivial baseline micros, mainly those that only return a > static object: It's reasonable to have baseline microbenchmarks when the > baseline op is complex and you're mostly interested in checking the overhead > of doing the same thing via some MH API, but blackhole operations are now > shortcutting very quickly and timings don't differ from one type of object > to another, so we don't need a multitude of such baseline tests. > > Estimated runtime of `make test TEST=micro:java.lang.micro` (excluding build) > drops from just above 28 to just above 3 hours. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Remove unused cached field in LookupAcquire - Changes: - all: https://git.openjdk.java.net/jdk/pull/9012/files - new: https://git.openjdk.java.net/jdk/pull/9012/files/f351ac70..0a1f7ec7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=9012&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=9012&range=00-01 Stats: 11 lines in 1 file changed: 0 ins; 11 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/9012.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/9012/head:pull/9012 PR: https://git.openjdk.java.net/jdk/pull/9012
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v7]
On Fri, 3 Jun 2022 11:19:04 GMT, ExE Boss wrote: >> You would think that, but javac doesn't do anything fancy once you store to >> a local >> >> javap output for lines 108 through 111: >> >> 449: ldc #148 // float 0.1f >> 451: fstore 31 >> 453: fload 31 >> 455: invokedynamic #149, 0 // InvokeDynamic #4:makeConcatWithConstants:(F)Ljava/lang/String; >> 460: astore 32 >> 462: fload 31 >> 464: aload 4 >> 466: invokedynamic #152, 0 // InvokeDynamic #7:makeConcatWithConstants:(FLjava/lang/String;)Ljava/lang/String; >> 471: astore 33 >> 473: aload 4 >> 475: fload 31 >> 477: invokedynamic #155, 0 // InvokeDynamic #10:makeConcatWithConstants:(Ljava/lang/String;F)Ljava/lang/String; > I guess it only happens for `final` locals. Yes, looks like javac does constant-fold when the local is explicitly final:

    final float f = 0.1f;
    System.out.println("const folding? " + f);

javap:

    0: getstatic     #2 // Field java/lang/System.out:Ljava/io/PrintStream;
    3: ldc           #3 // String const folding? 0.1
    5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V

Maybe this constant folding is something javac could be enhanced to do on effectively final locals, though I think we can defer adjusting `HelloClasslist` to break that optimization when that time comes. - PR: https://git.openjdk.java.net/jdk/pull/8855
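The folding behavior discussed here is easy to check at the source level: a `final` local initialized from a constant expression is a constant variable under the JLS, so the concatenation folds to a single string constant at compile time, while a merely effectively-final local goes through `makeConcatWithConstants`. Both produce the same string at runtime (class name invented for the sketch):

```java
class ConstFold {
    static String folded() {
        final float f = 0.1f;         // constant variable: final + constant initializer
        return "const folding? " + f; // javac folds this to one ldc of the full string
    }

    static String notFolded() {
        float f = 0.1f;               // effectively final, but NOT a constant variable
        return "const folding? " + f; // emitted as invokedynamic makeConcatWithConstants
    }

    public static void main(String[] args) {
        // Same text either way; only the generated bytecode differs.
        System.out.println(folded());
        System.out.println(notFolded());
    }
}
```

This is why `HelloClasslist` stores values into plain locals first: it keeps the concat expressions from being folded away, so the bootstrap paths being exercised actually run.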
RFR: 8287785: Reduce runtime of java.lang.invoke microbenchmarks
- Add explicit run configurations to java.lang.invoke micros, aiming to reduce runtime while maintaining a decently high confidence that there's enough warmup to produce good enough data. - Remove several trivial baseline micros, mainly those that only return a static object: It's reasonable to have baseline microbenchmarks when the baseline op is complex and you're mostly interested in checking the overhead of doing the same thing via some MH API, but blackhole operations are now shortcutting very quickly and timings don't differ from one type of object to another, so we don't need a multitude of such baseline tests. Estimated runtime of `make test TEST=micro:java.lang.micro` (excluding build) drops from just above 28 to just above 3 hours. - Commit messages: - 8287785: Apply explicit run configuration to java.lang.invoke microbenchmarks Changes: https://git.openjdk.java.net/jdk/pull/9012/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=9012&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8287785 Stats: 429 lines in 47 files changed: 276 ins; 142 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/9012.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/9012/head:pull/9012 PR: https://git.openjdk.java.net/jdk/pull/9012
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v7]
On Thu, 2 Jun 2022 18:49:13 GMT, ExE Boss wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> We now don't need big Species classes for shorter concats, so on some >> tests the improvements meant more Species class generation. Adjusting >> HelloClasslist > make/jdk/src/classes/build/tools/classlist/HelloClasslist.java line 116: > >> 114: String CC = "string" + c; >> 115: String CCS = "string" + c + s; >> 116: String CSCC = "string" + s + "string" + c; > I'm pretty sure that `f` and `c` will be inlined as compile-time constants in the String concat recipes. You would think that, but javac doesn't do anything fancy once you store to a local javap output for lines 108 through 111: 449: ldc #148 // float 0.1f 451: fstore 31 453: fload 31 455: invokedynamic #149, 0 // InvokeDynamic #4:makeConcatWithConstants:(F)Ljava/lang/String; 460: astore 32 462: fload 31 464: aload 4 466: invokedynamic #152, 0 // InvokeDynamic #7:makeConcatWithConstants:(FLjava/lang/String;)Ljava/lang/String; 471: astore 33 473: aload 4 475: fload 31 477: invokedynamic #155, 0 // InvokeDynamic #10:makeConcatWithConstants:(Ljava/lang/String;F)Ljava/lang/String; - PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v7]
> When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build |# classes| Runtime | > | --- | - | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2170.039 ? 117.441 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3538.020 ?4.558B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2144.807 ? 162.685 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3184.943 ?4.495B/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: We now don't need big Species classes for shorter concats, so on some tests the improvements meant more Species class generation. 
Adjusting HelloClasslist - Changes: - all: https://git.openjdk.java.net/jdk/pull/8855/files - new: https://git.openjdk.java.net/jdk/pull/8855/files/444cdd39..aaa442d5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=05-06 Stats: 18 lines in 1 file changed: 17 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/8855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8855/head:pull/8855 PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v6]
> When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build |# classes| Runtime | > | --- | - | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2170.039 ? 117.441 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3538.020 ?4.558B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2144.807 ? 162.685 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3184.943 ?4.495B/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Clarify that consts are always reduced to null, even if calling with constants. 
Also clarify that the number of constants is paramCount + 1 by refactoring to an array - Changes: - all: https://git.openjdk.java.net/jdk/pull/8855/files - new: https://git.openjdk.java.net/jdk/pull/8855/files/56bdc4c4..444cdd39 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=04-05 Stats: 55 lines in 1 file changed: 15 ins; 13 del; 27 mod Patch: https://git.openjdk.java.net/jdk/pull/8855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8855/head:pull/8855 PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v2]
On Tue, 31 May 2022 23:15:26 GMT, Mandy Chung wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Minor stylistic cleanup > > src/java.base/share/classes/java/lang/invoke/LambdaForm.java line 204: > >> 202: static BasicType basicType(Class type) { >> 203: if (!type.isPrimitive()) return L_TYPE; >> 204: return >> basicType(Wrapper.forPrimitiveType(type).basicTypeChar()); > > Suggestion: > > return basicType(Wrapper.basicTypeChar(type)); I'll go one step further then and also remove the line before it too. - PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v5]
> When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build |# classes| Runtime | > | --- | - | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2170.039 ? 117.441 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3538.020 ?4.558B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2144.807 ? 162.685 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3184.943 ?4.495B/op Claes Redestad has updated the pull request incrementally with two additional commits since the last revision: - Mandy review comment #1: Cleanup LF.basicType a bit more. 
- Cleanup too generic String.valueOf lookups - Changes: - all: https://git.openjdk.java.net/jdk/pull/8855/files - new: https://git.openjdk.java.net/jdk/pull/8855/files/f1742a84..56bdc4c4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=03-04 Stats: 18 lines in 2 files changed: 1 ins; 8 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/8855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8855/head:pull/8855 PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v4]
On Wed, 1 Jun 2022 20:51:50 GMT, Mandy Chung wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Address review comments from @ExE-Boss > src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line > 459: > >> 457: String prefix = constants.get(0); >> 458: if (prefix == null) { >> 459: if (suffix == null) { > > Is it possible for prefix or suffix to be an empty string (non-null)? The > original code checks `if (constant.isEmpty())` No: empty string constants would lead to adding a null to the list of constants in the prologue. - PR: https://git.openjdk.java.net/jdk/pull/8855
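The answer relies on a prologue invariant: empty constant strings are normalized to null before the prefix/suffix checks run, so downstream code only ever tests for null. A tiny hypothetical sketch of that normalization (not the actual JDK code):

```java
import java.util.*;

public class ConstantsPrologue {
    // Hypothetical helper mirroring the invariant: "" is stored as null up
    // front, so later code can use a single null check instead of isEmpty().
    static List<String> normalize(List<String> constants) {
        List<String> out = new ArrayList<>(constants.size());
        for (String c : constants) {
            out.add(c == null || c.isEmpty() ? null : c);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(normalize(Arrays.asList("", "str", null)));
    }
}
```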
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v4]
> When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build |# classes| Runtime | > | --- | - | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2170.039 ? 117.441 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3538.020 ?4.558B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2144.807 ? 162.685 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3184.943 ?4.495B/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Address review comments from @ExE-Boss - Changes: - all: https://git.openjdk.java.net/jdk/pull/8855/files - new: https://git.openjdk.java.net/jdk/pull/8855/files/4221b348..f1742a84 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=02-03 Stats: 17 lines in 1 file changed: 0 ins; 5 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/8855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8855/head:pull/8855 PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v3]
> When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build |# classes| Runtime | > | --- | - | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2170.039 ? 117.441 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3538.020 ?4.558B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2144.807 ? 
162.685 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3184.943 ?4.495B/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Improve bootstrap microbenchmark to include more shapes, reduce runtime - Changes: - all: https://git.openjdk.java.net/jdk/pull/8855/files - new: https://git.openjdk.java.net/jdk/pull/8855/files/263db625..4221b348 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=01-02 Stats: 25 lines in 1 file changed: 19 ins; 1 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/8855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8855/head:pull/8855 PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8282662: Use List.of() factory method to reduce memory consumption [v7]
On Tue, 31 May 2022 07:40:56 GMT, Сергей Цыпанов wrote: >> `List.of()` along with `Set.of()` create unmodifiable `List/Set` but with >> smaller footprint compared to `Arrays.asList()` / `new HashSet()` when >> called with varargs of size 0, 1, 2. >> >> In general, replacement of `Arrays.asList()` with `List.of()` is dubious as >> the latter is null-hostile, however in some cases we are sure that arguments >> are non-null. Into this PR I've included the following cases (in addition to >> those where the argument is proved to be non-null at compile-time): >> - `MethodHandles.longestParameterList()` never returns null >> - parameter types are never null >> - interfaces used for proxy construction and returned from >> `Class.getInterfaces()` are never null >> - exception types of method signatures are never null > > Сергей Цыпанов has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains six commits: > > - 8282662: Revert wrong copyright year change > - 8282662: Revert ProxyGenerator > - 8282662: Revert ProxyGenerator > - 8282662: Revert dubious changes in MethodType > - 8282662: Revert dubious changes > - 8282662: Use List/Set.of() factory methods to save memory Marked as reviewed by redestad (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/7729
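The null-hostility that makes a blanket `Arrays.asList()` → `List.of()` replacement dubious is easy to demonstrate with a stand-alone snippet (no JDK-internal code involved):

```java
import java.util.*;

public class ListOfDemo {
    // List.of throws NullPointerException eagerly on any null element;
    // Arrays.asList tolerates nulls.
    static boolean rejectsNull() {
        try {
            List.of("a", (String) null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> compact = List.of("a", "b");        // unmodifiable, compact
        List<String> legacy = Arrays.asList("a", null);  // nulls are fine here
        System.out.println(compact + " / " + legacy + " / rejectsNull=" + rejectsNull());
    }
}
```

This is why the PR limits the replacement to call sites where the arguments are provably non-null.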
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v2]
> When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build |# classes| Runtime | > | --- | - | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2170.039 ? 117.441 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3538.020 ?4.558B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2144.807 ? 162.685 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3184.943 ?4.495B/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Minor stylistic cleanup - Changes: - all: https://git.openjdk.java.net/jdk/pull/8855/files - new: https://git.openjdk.java.net/jdk/pull/8855/files/114d3fd8..263db625 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=00-01 Stats: 26 lines in 1 file changed: 6 ins; 10 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/8855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8855/head:pull/8855 PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches
On Mon, 23 May 2022 20:03:05 GMT, Claes Redestad wrote: > When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build |# classes| Runtime | > | --- | - | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2170.039 ? 117.441 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3538.020 ?4.558B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2144.807 ? 162.685 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3184.943 ?4.495B/op Before moving forward with this I've been looking at slightly more expensive bootstrapping of already known shapes. On one such bootstrap microbenchmark, `StringConcatFactoryBootstraps` this patch would cause a ~25% slowdown (and a 33% increase in allocations). Not too concerning, but since the purpose of this patch is to reduce String concat bootstrapping overhead it would be nice to not trade that for a localized regression. First improvement was to pack `TransformKey`s better: #8881 - which helps both variants, but doesn't really narrow the gap. 
We can improve further outside of `StringConcatFactory` to prepare for this RFE by reducing the number of List-to-array and array-to-List conversions in `MethodHandles.dropArguments` (-280b/op). Another candidate is to turn `LambdaFormEditor` either into a static utility or fold it into `LambdaForm` proper (TBD if this is worthwhile; they're tightly coupled, and all state maintained by `LambdaFormEditor` lives in `LambdaForm`). - PR: https://git.openjdk.java.net/jdk/pull/8855
RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches
When generating `MethodHandle`-based concatenation expressions in `StringConcatFactory` we can reduce the number of classes generated at runtime by creating small batches of prependers and mixers before binding them into the root expression tree. Improvements on one-off tests are modest, while the improvement on bootstrapping stress tests can be substantial ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): | Build | # classes | Runtime | | --- | - | --- | | Baseline | 31119 | 2.942s | | Patch | 16208 | 1.958s | An earlier version of this patch saw a regression in the `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along with the optimizations in #8881 and #8923 that is no longer the case, and allocation pressure is down slightly compared to the baseline on such a repeat-the-same-bootstrap stress test: Baseline: Benchmark Mode Cnt Score Error Units SCFB.makeConcatWithConstants avgt 5 2170.039 ± 117.441 ns/op SCFB.makeConcatWithConstants:·gc.alloc.rate.norm avgt 5 3538.020 ± 4.558 B/op This patch: Benchmark Mode Cnt Score Error Units SCFB.makeConcatWithConstants avgt 5 2144.807 ± 162.685 ns/op SCFB.makeConcatWithConstants:·gc.alloc.rate.norm avgt 5 3184.943 ± 4.495 B/op - Commit messages: - Revert change to HelloClasslist (doesn't affect generation) - Reduce allocation adding in mixers by extracting constant arg lists and caching double arg mixers - De-CHM the prepender and mixer caches - Reduce allocations: Extract constant argument lists, cache base form for two-arg prependers - Refactor, reduce allocations - Merge branch 'master' into scf_chunked - Improve chunking further - Fix mis-merged change - Experiment: Chunk prependers and mixers to reduce number of rebinds and generated LF classes Changes: https://git.openjdk.java.net/jdk/pull/8855/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8287522 Stats: 390 lines in 2 files changed: 232 ins; 99 del; 59 mod Patch: https://git.openjdk.java.net/jdk/pull/8855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8855/head:pull/8855 PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles [v2]
> In preparation of #8855 this PR refactors the conversions from `List` to > array and array to `List`, reducing the number of conversions when calling > `MethodHandles.dropArguments` in particular. This remove about ~5% of > allocations on the `StringConcatFactoryBootstraps` microbenchmark. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Review comments, eagerly convert sooner in tryFinally - Changes: - all: https://git.openjdk.java.net/jdk/pull/8923/files - new: https://git.openjdk.java.net/jdk/pull/8923/files/019e2ef8..15ff2125 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8923&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8923&range=00-01 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/8923.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8923/head:pull/8923 PR: https://git.openjdk.java.net/jdk/pull/8923
Re: RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles
On Fri, 27 May 2022 19:27:44 GMT, ExE Boss wrote: > If `parameterList` is too slow because `List.of` copies the backing parameter > types array, wouldn't calling > `JavaUtilCollectionAccess::listFromTrustedArray` be a better alternative, as it > only allocates one wrapper and has better performance on `subList` than > `Arrays.copyOfRange`? Good observation that `MethodType::parameterList()` is a plausible candidate for `listFromTrustedArray`. That could be a good standalone and orthogonal RFE. I recall @rose00 musing about how he'd prefer it if `MethodType` and friends had been implemented with `List.of`-style immutable lists rather than arrays. While I agree with him in principle, we now have a mix of both worlds where we're converting between either representation haphazardly, making the implementation more complex in the process. It seems better for now to streamline the implementation with an early conversion to what we use internally (arrays) and stick with that. - PR: https://git.openjdk.java.net/jdk/pull/8923
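The defensive copy that `listFromTrustedArray` (a JDK-internal accessor) would skip can be observed from the public API: `List.of(array)` must copy, because the caller keeps a reference to the array. A small stand-alone illustration:

```java
import java.util.*;

public class TrustedArrayCopy {
    // List.of copies its varargs array defensively, so later mutation of the
    // source array cannot leak into the "immutable" list. A trusted-array
    // wrapper could skip this copy - which is exactly the proposed saving.
    static String firstAfterMutation() {
        String[] params = {"int", "long"};
        List<String> view = List.of(params);
        params[0] = "mutated";
        return view.get(0); // still "int"
    }

    public static void main(String[] args) {
        System.out.println(firstAfterMutation());
    }
}
```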
Re: RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles
On Fri, 27 May 2022 14:18:19 GMT, Claes Redestad wrote: > In preparation of #8855 this PR refactors the conversions from `List` to > array and array to `List`, reducing the number of conversions when calling > `MethodHandles.dropArguments` in particular. This removes about 5% of > allocations on the `StringConcatFactoryBootstraps` microbenchmark. It might be a bit too paranoid in this instance (since we don't keep the array around for long), but not cloning the result of calling `toArray` on an arbitrary and possibly adversarial `List` could open up TOCTOU race bugs / attacks. The existing code was being paranoid and copying, and I don't want to weaken something that could have security implications without double- and triple-checking that it's safe to do so. - PR: https://git.openjdk.java.net/jdk/pull/8923
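A contrived stand-alone illustration of the TOCTOU concern; the adversarial list here is hypothetical, not taken from the JDK:

```java
import java.util.*;

public class ToctouDemo {
    static final Object[] SHARED = {"validated"};

    // A hypothetical adversarial List: toArray() leaks its internal array,
    // so the attacker can mutate elements after the consumer's checks ran.
    static final List<Object> evil = new AbstractList<>() {
        public Object get(int i) { return SHARED[i]; }
        public int size() { return SHARED.length; }
        public Object[] toArray() { return SHARED; }
    };

    // Cloning the toArray() result takes a snapshot, closing the TOCTOU window.
    static Object[] snapshot(List<Object> list) {
        return list.toArray().clone();
    }

    public static void main(String[] args) {
        Object[] snap = snapshot(evil);
        SHARED[0] = "swapped-after-check"; // attacker races after validation
        System.out.println(snap[0]);       // still "validated"
    }
}
```

Without the `clone()`, the consumer would observe whatever the attacker wrote after validation.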
Re: RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles
On Fri, 27 May 2022 14:18:19 GMT, Claes Redestad wrote: > In preparation of #8855 this PR refactors the conversions from `List` to > array and array to `List`, reducing the number of conversions when calling > `MethodHandles.dropArguments` in particular. This removes about 5% of > allocations on the `StringConcatFactoryBootstraps` microbenchmark. Another datapoint is the `MethodHandlesDropArguments.interCreate` microbenchmark which explicitly tests `MethodHandles.dropArguments` (with a minimal argument list): Before: Benchmark Mode Cnt Score Error Units MethodHandlesDropArguments.interCreate avgt 5 136.778 ± 1.265 ns/op MethodHandlesDropArguments.interCreate:·gc.alloc.rate.norm avgt 5 264.020 ± 0.021 B/op Patch: Benchmark Mode Cnt Score Error Units MethodHandlesDropArguments.interCreate avgt 5 120.536 ± 3.133 ns/op MethodHandlesDropArguments.interCreate:·gc.alloc.rate.norm avgt 5 220.818 ± 41.309 B/op - PR: https://git.openjdk.java.net/jdk/pull/8923
RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles
In preparation of #8855 this PR refactors the conversions from `List` to array and array to `List`, reducing the number of conversions when calling `MethodHandles.dropArguments` in particular. This removes about 5% of allocations on the `StringConcatFactoryBootstraps` microbenchmark. - Commit messages: - Move all conversions closer to the edge - Merge master - Streamline MHs.dropArguments parameter handling - Missed correctly taking b1 into account in of(byte, int, int...) (java/lang/String/concat/ImplicitStringConcatShapes.java test failure) - Address review comments, fix unsafe shift, rework and remove ofBothArrays - Revert unrelated and unverified hashCode change - Improve TransformKey to pack more kinds of transforms efficiently Changes: https://git.openjdk.java.net/jdk/pull/8923/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8923&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8287442 Stats: 46 lines in 3 files changed: 9 ins; 3 del; 34 mod Patch: https://git.openjdk.java.net/jdk/pull/8923.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8923/head:pull/8923 PR: https://git.openjdk.java.net/jdk/pull/8923
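`MethodHandles.dropArguments` itself is public API, so the call shape whose conversions this PR streamlines can be shown in isolation:

```java
import java.lang.invoke.*;

public class DropArgsDemo {
    // Insert an ignored leading int parameter:
    // (String,String)String  ->  (int,String,String)String
    static String call() throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle concat = lookup.findVirtual(String.class, "concat",
                MethodType.methodType(String.class, String.class));
        MethodHandle dropped = MethodHandles.dropArguments(concat, 0, int.class);
        return (String) dropped.invokeExact(42, "foo", "bar"); // 42 is ignored
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(call()); // foobar
    }
}
```

Each such `dropArguments` call internally juggles parameter lists and arrays, which is why trimming the List/array conversions shows up on bootstrap-heavy benchmarks.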
Integrated: 8287292: Improve TransformKey to pack more kinds of transforms efficiently
On Wed, 25 May 2022 09:38:08 GMT, Claes Redestad wrote: > The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` > allows keys to be compacted when all byte values of the key fit in 4 bits, > otherwise a byte array is allocated and used. This means that all transforms > with a kind value above 15 will be forced to allocate and use array > comparisons. > > Removing unused and folding some transforms to ensure all existing kinds can > fit snugly within the 0-15 value range realizes a minor improvement to > footprint, speed and allocation pressure of affected transforms, e.g. > ~300 bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt 15 > 2048.475 ± 69.887 ns/op > SCFB.makeConcatWithConstants:·gc.alloc.rate.norm avgt 15 > 3487.311 ± 80.385 B/op > > > Patched: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt 15 > 1961.985 ± 101.519 ns/op > SCFB.makeConcatWithConstants:·gc.alloc.rate.norm avgt 15 > 3156.478 ± 183.600 B/op This pull request has now been integrated. Changeset: be933185 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/be93318576896e8f5f9733ae1f7e3e74d63f5594 Stats: 124 lines in 2 files changed: 70 ins; 21 del; 33 mod 8287292: Improve TransformKey to pack more kinds of transforms efficiently Reviewed-by: jlaskey, jvernee, mchung - PR: https://git.openjdk.java.net/jdk/pull/8881
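The compaction idea behind this change can be sketched in isolation. This is a simplified, hypothetical model (`tryPack` is not the actual `TransformKey` code): when every value fits in 4 bits the whole key packs into one primitive `long`, and any value above 15 forces the byte-array fallback with its slower array comparisons.

```java
public class PackedKeyDemo {
    // Hypothetical: pack up to 15 nibble-sized values into one long,
    // or return -1 to signal the byte-array fallback path.
    static long tryPack(byte... values) {
        if (values.length > 15) return -1;
        long packed = 1; // leading 1 bit pattern marks how many nibbles follow
        for (byte v : values) {
            if ((v & 0xFFFFFFF0) != 0) return -1; // value needs more than 4 bits
            packed = (packed << 4) | v;
        }
        return packed;
    }

    public static void main(String[] args) {
        System.out.println(tryPack((byte) 3, (byte) 7)); // packs into a long
        System.out.println(tryPack((byte) 16));          // -1: array fallback
    }
}
```

Packed keys compare with a single `long` comparison and allocate nothing, which is why squeezing every kind into the 0-15 range pays off.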
Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently [v3]
On Wed, 25 May 2022 14:13:52 GMT, Claes Redestad wrote: >> The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` >> allows keys to be compacted when all byte values of the key fit in 4 bits, >> otherwise a byte array is allocated and used. This means that all transforms >> with a kind value above 15 will be forced to allocate and use array >> comparisons. >> >> Removing unused and folding some transforms to ensure all existing kinds can >> fit snugly within the 0-15 value range realize a minor improvement to >> footprint, speed and allocation pressure of affected transforms, e.g. >> ~300bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark: >> >> Baseline: >> >> Benchmark Mode Cnt >> Score Error Units >> SCFB.makeConcatWithConstants avgt 15 >> 2048.475 ? 69.887 ns/op >> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt 15 >> 3487.311 ? 80.385B/op >> >> >> Patched: >> >> Benchmark Mode Cnt >> Score Error Units >> SCFB.makeConcatWithConstants avgt 15 >> 1961.985 ? 101.519 ns/op >> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt 15 >> 3156.478 ? 183.600B/op > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Missed correctly taking b1 into account in of(byte, int, int...) > (java/lang/String/concat/ImplicitStringConcatShapes.java test failure) Thanks! - PR: https://git.openjdk.java.net/jdk/pull/8881
Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently [v4]
> The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` > allows keys to be compacted when all byte values of the key fit in 4 bits, > otherwise a byte array is allocated and used. This means that all transforms > with a kind value above 15 will be forced to allocate and use array > comparisons. > > Removing unused and folding some transforms to ensure all existing kinds can > fit snugly within the 0-15 value range realize a minor improvement to > footprint, speed and allocation pressure of affected transforms, e.g. > ~300bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt 15 > 2048.475 ? 69.887 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt 15 > 3487.311 ? 80.385B/op > > > Patched: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt 15 > 1961.985 ? 101.519 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt 15 > 3156.478 ? 183.600B/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Address Mandy review comments - Changes: - all: https://git.openjdk.java.net/jdk/pull/8881/files - new: https://git.openjdk.java.net/jdk/pull/8881/files/612b4ece..bcd79aad Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8881&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8881&range=02-03 Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/8881.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8881/head:pull/8881 PR: https://git.openjdk.java.net/jdk/pull/8881
Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently [v3]
> The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` allows keys to be compacted when all byte values of the key fit in 4 bits; otherwise a byte array is allocated and used. This means that all transforms with a kind value above 15 are forced to allocate and use array comparisons.
>
> Removing unused transforms and folding others so that all existing kinds fit snugly within the 0-15 value range realizes a minor improvement to the footprint, speed and allocation pressure of affected transforms, e.g. a ~300 bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark:
>
> Baseline:
>
> Benchmark                                         Mode  Cnt     Score     Error  Units
> SCFB.makeConcatWithConstants                      avgt   15  2048.475 ±  69.887  ns/op
> SCFB.makeConcatWithConstants:·gc.alloc.rate.norm  avgt   15  3487.311 ±  80.385   B/op
>
> Patched:
>
> Benchmark                                         Mode  Cnt     Score     Error  Units
> SCFB.makeConcatWithConstants                      avgt   15  1961.985 ± 101.519  ns/op
> SCFB.makeConcatWithConstants:·gc.alloc.rate.norm  avgt   15  3156.478 ± 183.600   B/op

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  Missed correctly taking b1 into account in of(byte, int, int...)
  (java/lang/String/concat/ImplicitStringConcatShapes.java test failure)

- Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8881/files
  - new: https://git.openjdk.java.net/jdk/pull/8881/files/2be3b25c..612b4ece

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8881&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8881&range=01-02

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/8881.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8881/head:pull/8881

PR: https://git.openjdk.java.net/jdk/pull/8881
Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently [v2]
> The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` allows keys to be compacted when all byte values of the key fit in 4 bits; otherwise a byte array is allocated and used. This means that all transforms with a kind value above 15 are forced to allocate and use array comparisons.
>
> Removing unused transforms and folding others so that all existing kinds fit snugly within the 0-15 value range realizes a minor improvement to the footprint, speed and allocation pressure of affected transforms, e.g. a ~300 bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark:
>
> Baseline:
>
> Benchmark                                         Mode  Cnt     Score     Error  Units
> SCFB.makeConcatWithConstants                      avgt   15  2048.475 ±  69.887  ns/op
> SCFB.makeConcatWithConstants:·gc.alloc.rate.norm  avgt   15  3487.311 ±  80.385   B/op
>
> Patched:
>
> Benchmark                                         Mode  Cnt     Score     Error  Units
> SCFB.makeConcatWithConstants                      avgt   15  1961.985 ± 101.519  ns/op
> SCFB.makeConcatWithConstants:·gc.alloc.rate.norm  avgt   15  3156.478 ± 183.600   B/op

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  Address review comments, fix unsafe shift, rework and remove ofBothArrays

- Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8881/files
  - new: https://git.openjdk.java.net/jdk/pull/8881/files/001e9d16..2be3b25c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8881&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8881&range=00-01

Stats: 86 lines in 2 files changed: 50 ins; 12 del; 24 mod
Patch: https://git.openjdk.java.net/jdk/pull/8881.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8881/head:pull/8881

PR: https://git.openjdk.java.net/jdk/pull/8881
Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently
On Wed, 25 May 2022 13:30:23 GMT, Jorn Vernee wrote: >> Maybe not... argument positions should fit in a byte as well. But, maybe >> there are other problematic cases? Or are the ints guaranteed to fit in a >> byte? > > Maybe an `assert b == b23456[i]` would be nice here. All usage implies the int is an argument position, which by spec is constrained to be in the 0-255 range. But surely checking or asserting shouldn't hurt. - PR: https://git.openjdk.java.net/jdk/pull/8881
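The narrowing check discussed above can be sketched as follows. This is a hedged illustration, not the actual `LambdaFormEditor` code: the class and method names are hypothetical, but it shows why an int argument position in the 0-255 range survives a cast to `byte` and how an assert can guard the narrowing.

```java
// Hypothetical sketch (not JDK code): narrowing an argument position
// to a byte, with an assert guarding the spec-guaranteed 0-255 range.
public class BytePackDemo {
    static byte toByteChecked(int argPos) {
        byte b = (byte) argPos;
        // A value in 0-255 round-trips through (b & 0xFF);
        // anything else indicates a caller bug.
        assert (b & 0xFF) == argPos : "argument position out of range: " + argPos;
        return b;
    }
}
```

The assert is free in production runs (it only fires with `-ea`), which matches the "checking or asserting shouldn't hurt" observation.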
RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently
The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` allows keys to be compacted when all byte values of the key fit in 4 bits; otherwise a byte array is allocated and used. This means that all transforms with a kind value above 15 are forced to allocate and use array comparisons.

Removing unused transforms and folding others so that all existing kinds fit snugly within the 0-15 value range realizes a minor improvement to the footprint, speed and allocation pressure of affected transforms, e.g. a ~300 bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark:

Baseline:

Benchmark                                         Mode  Cnt     Score     Error  Units
SCFB.makeConcatWithConstants                      avgt   15  2048.475 ±  69.887  ns/op
SCFB.makeConcatWithConstants:·gc.alloc.rate.norm  avgt   15  3487.311 ±  80.385   B/op

Patched:

Benchmark                                         Mode  Cnt     Score     Error  Units
SCFB.makeConcatWithConstants                      avgt   15  1961.985 ± 101.519  ns/op
SCFB.makeConcatWithConstants:·gc.alloc.rate.norm  avgt   15  3156.478 ± 183.600   B/op

- Commit messages:
 - Revert unrelated and unverified hashCode change
 - Improve TransformKey to pack more kinds of transforms efficiently

Changes: https://git.openjdk.java.net/jdk/pull/8881/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8881&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8287292
Stats: 44 lines in 1 file changed: 21 ins; 10 del; 13 mod
Patch: https://git.openjdk.java.net/jdk/pull/8881.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8881/head:pull/8881

PR: https://git.openjdk.java.net/jdk/pull/8881
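The 4-bit compaction idea can be illustrated with a small sketch. This is hypothetical code, not the actual `LambdaFormEditor.TransformKey` implementation: it only shows why values above 15 force the array fallback, by packing nibbles into a single long.

```java
// Hypothetical illustration of a compact-key scheme (not JDK code):
// up to 16 values that each fit in 4 bits pack into one long key.
public class TransformKeyDemo {
    // A value fits the compact form if it only uses its low 4 bits.
    static boolean fitsIn4Bits(int b) {
        return (b & 0xF) == b;
    }

    // Pack values as nibbles, lowest nibble first. Returns -1 when a
    // value needs more than 4 bits, modeling the point where the real
    // scheme must fall back to allocating a byte[] key.
    static long packNibbles(int... values) {
        if (values.length > 16) return -1;
        long packed = 0;
        for (int i = 0; i < values.length; i++) {
            if (!fitsIn4Bits(values[i])) return -1;
            packed |= ((long) values[i]) << (4 * i);
        }
        return packed;
    }
}
```

With this shape, comparing two compact keys is a single long comparison, while a kind value of 16 or more would force the array path — which is exactly what the patch avoids by keeping all kinds in 0-15.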
Integrated: 8287013: StringConcatFactory: remove short and byte mixers/prependers
On Thu, 19 May 2022 10:47:23 GMT, Claes Redestad wrote:

> The current short and byte mixers and prependers simply delegate to the int variants. At the LambdaForm level the values have already been implicitly cast to int, so removing this indirection and directly adapting to call the int-based variants is possible.
>
> This is a cleanup with minimal effect on bootstrap overhead and peak performance, since all the LF logic only deals with basic types (where byte, short and other subword primitives have been widened to ints anyhow).

This pull request has now been integrated.

Changeset: d5d19f52
Author: Claes Redestad
URL: https://git.openjdk.java.net/jdk/commit/d5d19f52ceb1430104b12a42c78489f42477a9b0
Stats: 87 lines in 2 files changed: 12 ins; 74 del; 1 mod

8287013: StringConcatFactory: remove short and byte mixers/prependers

Reviewed-by: jlaskey

- PR: https://git.openjdk.java.net/jdk/pull/8786
Re: RFR: 8287053: Avoid redundant HashMap.containsKey calls in ZoneInfoFile.getZoneInfo0
On Sat, 30 Apr 2022 17:00:03 GMT, Andrey Turbanov wrote: > Instead of pair `HashMap.containsKey`/`HashMap.get` method calls, we can use > single call `HashMap.getOrDefault`. > It's faster and clearer. Marked as reviewed by redestad (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/8487
Re: Question: Is it possible to introduce ASCII coder for compact strings to optimize performance for ASCII compatible encoding?
Yes, I think 20% is close to an upper bound, with less gain on smaller strings due to improved cache-locality. And as you say, such a small speed-up might not be very noticeable in practice when I/O is involved.

/Claes

On 2022-05-19 14:17, Glavo wrote:

Thank you very much for your work, it seems that the decoding speed of the new version of JDK has been greatly improved. However, `hasNegatives` still has a cost on my device. I tried removing the call to hasNegatives in `String.encodeUTF8` based on JDK 17.0.3, then I tested its performance through JMH. I ran the JMH benchmark on two of my x86 machines (the CPU of one is a Ryzen 7 3700X and the other a Xeon W-2175) and got about a 20% performance boost on both machines.

Of course, the decoding performance in the new JDK is better than I thought. For an ASCII string of length 1, hasNegatives takes about 150 nanoseconds, and the duration increases linearly with the length. Considering that these operations usually accompany IO, the overhead is really not as noticeable as I thought.

Thank you again for your contribution to OpenJDK and your patience in replying.

On Wed, May 18, 2022 at 10:47 PM Claes Redestad <claes.redes...@oracle.com> wrote:

Hi, so your suggestion is that String::coder would be ASCII if all codepoints are 0-127, then LATIN1 if it contains at least one char in the 128-255 range, otherwise UTF16? As you say this would mean we could skip StringCoding::hasNegatives scans and similar overheads when converting known-to-be-ASCII Strings to UTF-8 and other ASCII-compatible encodings. But it might also complicate logic in various places, so we need a clear win to motivate such work. So the first question to answer might be how much the hasNegatives calls cost us. Depending on your hardware (and JDK version) the answer might be "almost nothing"! I did some work last year to speed up encoding and decoding ASCII-only data to/from various encodings. 
When I was done there was no longer much of a difference when encoding from String to an ASCII-compatible charset [1]. Maybe a scaling ~10% advantage for latin-1 in some microbenchmark when decoding really large strings [2]. But with the suggested change the decoding advantage would likely disappear: we maintain the invariant that equal Strings have equal coders, so now we'd have to scan latin-1 encoded streams for non-ASCII bytes to disambiguate between ASCII and LATIN1. Maybe there are some other cases that would be helped, but with some likely negatives and only a modest potential win my gut feeling is that this wouldn't pay off.

Best regards
Claes

[1] https://cl4es.github.io/2021/10/17/Faster-Charset-Encoding.html
[2] https://cl4es.github.io/2021/02/23/Faster-Charset-Decoding.html

(All the improvements discussed in the blog entries should be available since 17.0.2.)

On 2022-05-18 14:07, Glavo wrote:

> After the introduction of compact strings in Java 9, the current String may store byte arrays encoded as Latin1 or UTF-16.
> Here's a troublesome thing: Latin1 is not compatible with UTF-8. Not all Latin1 byte strings are legal UTF-8 byte strings:
> they are only compatible within the ASCII range. When there are code points greater than 127, Latin1 uses a one-byte
> representation, while UTF-8 requires two bytes. 
> > As an example, every time `JavaLangAccess::getBytesNoRepl` is called to > convert a string to a UTF-8 array, > the internal implementation needs to call `StringCoding.hasNegatives` to > scan the content byte array to determine that > the string can be encoded in ASCII, and thus eliminate array copies. > Similar steps are performed when calling `str.getBytes(UTF_8)`. > > So, is it possible to introduce a third possible value for `String::coder`: > ASCII? > This looks like an attractive option, and if we do this, we can introduce > fast paths for many methods. > > Of course, I know that this change is not completely free, and some methods > may bring slight performance > degradation due to the need to judge the coder, in particular, there may be > an impact on the performance of the > StringBuil
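The Latin1/UTF-8 incompatibility quoted above is easy to demonstrate: code points in the 128-255 range take one byte in Latin-1 but two in UTF-8, so only pure-ASCII content can be copied between the two encodings verbatim.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1Utf8Demo {
    // Encoded byte length of a string under a given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // 'é': in the Latin-1 range, but not ASCII
        System.out.println(encodedLength(s, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));      // 2
    }
}
```

This is exactly why `JavaLangAccess::getBytesNoRepl` must scan a LATIN1-coded String before deciding it can share or copy the bytes directly for UTF-8 output.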
RFR: 8287013: StringConcatFactory: remove short and byte mixers/prependers
The current short and byte mixers and prependers simply delegate to the int variants. At the LambdaForm level the values have already been implicitly cast to int, so removing this indirection and directly adapting to call the int-based variants is possible.

This is a cleanup with minimal effect on bootstrap overhead and peak performance, since all the LF logic only deals with basic types (where byte, short and other subword primitives have been widened to ints anyhow).

- Commit messages:
 - 8287013: StringConcatFactory: remove short and byte mixers/prependers

Changes: https://git.openjdk.java.net/jdk/pull/8786/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8786&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8287013
Stats: 87 lines in 2 files changed: 12 ins; 74 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/8786.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8786/head:pull/8786

PR: https://git.openjdk.java.net/jdk/pull/8786
Re: Question: Is it possible to introduce ASCII coder for compact strings to optimize performance for ASCII compatible encoding?
Hi,

so your suggestion is that String::coder would be ASCII if all codepoints are 0-127, then LATIN1 if it contains at least one char in the 128-255 range, otherwise UTF16? As you say this would mean we could skip StringCoding::hasNegatives scans and similar overheads when converting known-to-be-ASCII Strings to UTF-8 and other ASCII-compatible encodings. But it might also complicate logic in various places, so we need a clear win to motivate such work.

So the first question to answer might be how much the hasNegatives calls cost us. Depending on your hardware (and JDK version) the answer might be "almost nothing"! I did some work last year to speed up encoding and decoding ASCII-only data to/from various encodings. When I was done there was no longer much of a difference when encoding from String to an ASCII-compatible charset [1]. Maybe a scaling ~10% advantage for latin-1 in some microbenchmark when decoding really large strings [2]. But with the suggested change the decoding advantage would likely disappear: we maintain the invariant that equal Strings have equal coders, so now we'd have to scan latin-1 encoded streams for non-ASCII bytes to disambiguate between ASCII and LATIN1.

Maybe there are some other cases that would be helped, but with some likely negatives and only a modest potential win my gut feeling is that this wouldn't pay off.

Best regards
Claes

[1] https://cl4es.github.io/2021/10/17/Faster-Charset-Encoding.html
[2] https://cl4es.github.io/2021/02/23/Faster-Charset-Decoding.html

(All the improvements discussed in the blog entries should be available since 17.0.2.)

On 2022-05-18 14:07, Glavo wrote:

After the introduction of compact strings in Java 9, the current String may store byte arrays encoded as Latin1 or UTF-16. Here's a troublesome thing: Latin1 is not compatible with UTF-8. 
Not all Latin1 byte strings are legal UTF-8 byte strings: they are only compatible within the ASCII range. When there are code points greater than 127, Latin1 uses a one-byte representation, while UTF-8 requires two bytes.

As an example, every time `JavaLangAccess::getBytesNoRepl` is called to convert a string to a UTF-8 array, the internal implementation needs to call `StringCoding.hasNegatives` to scan the content byte array to determine that the string can be encoded in ASCII, and thus eliminate array copies. Similar steps are performed when calling `str.getBytes(UTF_8)`.

So, is it possible to introduce a third possible value for `String::coder`: ASCII? This looks like an attractive option, and if we do this, we can introduce fast paths for many methods.

Of course, I know that this change is not completely free, and some methods may see slight performance degradation due to the need to judge the coder; in particular, there may be an impact on the performance of StringBuilder/StringBuffer. However, given that UTF-8 is by far the most commonly used file encoding, the performance benefits of fast paths are likely to cover more scenarios. In addition to this, other ASCII-compatible encodings may also benefit, such as GB18030 (GBK), or ISO 8859 variants. And if frozen arrays were introduced into the JDK, there would be more scenarios to enjoy performance improvements.

So I would like to ask, is it possible for JDK to improve String storage in a similar way in the future? Has anyone explored this issue before? Sorry to bother you all, but I'm very much looking forward to the answer to this question.
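The scan being discussed has simple semantics: a LATIN1 byte array is pure ASCII iff no byte has its high bit set. The real `StringCoding.hasNegatives` is a HotSpot intrinsic that vectorizes this check; the scalar sketch below only illustrates what it computes, not how the JDK implements it.

```java
public class AsciiScanDemo {
    // Scalar equivalent of the hasNegatives check: returns true if any
    // byte in [off, off+len) is negative, i.e. has bit 7 set and is
    // therefore outside the ASCII range. The intrinsified JDK version
    // does the same test with SIMD instructions.
    static boolean hasNegatives(byte[] ba, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (ba[i] < 0) {
                return true;
            }
        }
        return false;
    }
}
```

The proposed ASCII coder would make this scan unnecessary for known-ASCII Strings — the cost being weighed in the thread above.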
Re: RFR: 8286441: Remove mode parameter from jdk.internal.perf.Perf.attach()
On Tue, 10 May 2022 04:00:29 GMT, Ioi Lam wrote: > The `mode` parameter for ` jdk.internal.perf.Perf.attach()` has never > supported the value `"rw"` since the source code was imported to the openjdk > repo more than 15 years ago. In fact HotSpot throws > `IllegalArgumentException` when such a mode is specified. > > It's unlikely such a mode will be required for future enhancements. Support > for `"rw"` is removed. The "mode" parameter is also removed, since now `"r"` > is the only supported mode. > > I also cleaned up related code in the JDK and HotSpot. > > Testing: > - Passed tiers 1 and 2 > - Tiers 3, 4, 5 are in progress Nice cleanup! (Some stylistic suggestions inline, feel free to ignore) src/hotspot/os/windows/perfMemory_windows.cpp line 1781: > 1779: // address space. > 1780: // > 1781: void PerfMemory::attach(const char* user, int vmid, One line? src/hotspot/share/prims/perf.cpp line 84: > 82: > 83: // attach to the PerfData memory region for the specified VM > 84: PerfMemory::attach(user_utf, vmid, One line? src/hotspot/share/runtime/perfMemory.hpp line 146: > 144: // methods for attaching to and detaching from the PerfData > 145: // memory segment of another JVM process on the same system. > 146: static void attach(const char* user, int vmid, One line? src/jdk.jstatd/share/classes/sun/tools/jstatd/RemoteHostImpl.java line 74: > 72: Integer v = lvmid; > 73: RemoteVm stub = null; > 74: StringBuilder sb = new StringBuilder(); Suggestion: String vmidStr = "local://" + lvmid + "@localhost"; - Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/8622
Integrated: 8286298: Remove unused methods in sun.invoke.util.VerifyType
On Fri, 6 May 2022 11:32:25 GMT, Claes Redestad wrote: > A few untested and unused methods in `VerifyType` which can be removed. > (Possibly used by native JSR 292 implementations in JDK 7). This pull request has now been integrated. Changeset: 3fa1c404 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/3fa1c4043919943baf0a2cdfaf040ffdd844750c Stats: 91 lines in 2 files changed: 2 ins; 85 del; 4 mod 8286298: Remove unused methods in sun.invoke.util.VerifyType Reviewed-by: bpb, alanb, mchung - PR: https://git.openjdk.java.net/jdk/pull/8570
Re: RFR: 8282701: Use Class.getInterfaces(false) where possible to reduce allocation pressure [v2]
On Wed, 4 May 2022 09:46:00 GMT, Сергей Цыпанов wrote:

>> `Class.getInterfaces(false)` does not clone the underlying array and can be used in cases when the returned array is only read from.
>
> Сергей Цыпанов has updated the pull request incrementally with one additional commit since the last revision:
>
>   8282701: Revert some irrelevant changes

Marked as reviewed by redestad (Reviewer).

- PR: https://git.openjdk.java.net/jdk/pull/7714
Re: RFR: 8286298: Remove unused methods in sun.invoke.util.VerifyType [v2]
> A few untested and unused methods in `VerifyType` which can be removed. > (Possibly used by native JSR 292 implementations in JDK 7). Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Inline isNullReferenceConversion at sole call site and remove it as well - Changes: - all: https://git.openjdk.java.net/jdk/pull/8570/files - new: https://git.openjdk.java.net/jdk/pull/8570/files/a456de4f..b2d1af4b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8570&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8570&range=00-01 Stats: 19 lines in 2 files changed: 2 ins; 14 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/8570.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8570/head:pull/8570 PR: https://git.openjdk.java.net/jdk/pull/8570
Re: RFR: 8286163: micro-optimize Instant.plusSeconds
On Fri, 6 May 2022 22:02:58 GMT, Stephen Colebourne wrote:

> Seems reasonable to me. plus(long, long) already has this optimisation.

If it already had this optimization, why change anything? If you're referring to the check for `0` that returns `this`, then that is something `plusSeconds` will need to retain for parity. The optimization here appears to be avoiding the added arithmetic for dealing with nanoseconds. And though it bothers me that the JIT doesn't optimize this better (given that the nanosecond parameter to `plus(long, long)` is a constant zero), the patch does seem reasonable.

- PR: https://git.openjdk.java.net/jdk/pull/8542
Re: RFR: 8286163: micro-optimize Instant.plusSeconds
On Wed, 4 May 2022 20:27:04 GMT, lennartfricke wrote: > Provide micro-benchmark for comparison Marked as reviewed by redestad (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/8542
Re: RFR: 8286163: micro-optimize Instant.plusSeconds
On Wed, 4 May 2022 20:27:04 GMT, lennartfricke wrote:

> Provide micro-benchmark for comparison

Hi, thanks for the contribution! How big a speed-up are you observing?

Keeping the optimization in `plusSeconds` rather than moving it to `plus(long, long)` means expressions like `instant.plusMillis(1000)` won't be helped, but such expressions might be rare anyway, so what you have might be better overall since it doesn't add a branch to the common code.

- PR: https://git.openjdk.java.net/jdk/pull/8542
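The shape of the optimization under discussion can be sketched in a simplified model (hypothetical helper, not the actual java.time source): an instant is a (epochSecond, nano) pair, and a seconds-only addition can leave the nano field untouched, skipping the split/carry arithmetic the general `plus(secondsToAdd, nanosToAdd)` path performs.

```java
public class PlusSecondsDemo {
    // Hedged sketch of a seconds-only fast path. Returns the result as
    // a {epochSecond, nano} pair rather than a real Instant, purely to
    // keep the example self-contained.
    static long[] plusSecondsFast(long epochSecond, int nano, long secondsToAdd) {
        if (secondsToAdd == 0) {
            return new long[] { epochSecond, nano }; // identity shortcut
        }
        // No nanosecond carry arithmetic is needed on this path.
        return new long[] { Math.addExact(epochSecond, secondsToAdd), nano };
    }
}
```

The real `Instant.plusSeconds(7)` applied to `Instant.ofEpochSecond(100, 5)` yields `Instant.ofEpochSecond(107, 5)` — the nano field is carried through unchanged, which is what makes the fast path valid.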
RFR: 8286298: Remove unused methods in sun.invoke.util.VerifyType
A few untested and unused methods in `VerifyType` which can be removed. (Possibly used by native JSR 292 implementations in JDK 7). - Commit messages: - Remove unused methods in sun.invoke.util.VerifyType Changes: https://git.openjdk.java.net/jdk/pull/8570/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8570&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8286298 Stats: 72 lines in 1 file changed: 0 ins; 71 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/8570.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8570/head:pull/8570 PR: https://git.openjdk.java.net/jdk/pull/8570
Re: RFR: 8282701: Use Class.getInterfaces(false) where possible to reduce allocation pressure
On Mon, 2 May 2022 16:19:07 GMT, Mandy Chung wrote: >> `Class.getInterfaces(false)` does not clone underlying array and can be used >> in cases when the returned array is only read from. > > For the `checkPackageAccess` case, I don't think it worths fixing; not only > that security manager is deprecated for removal but also when security > manager is enabled, there are lots of allocations for other security checks. > > So the `getInterface(boolean)` method can be kept private. I agree with @mlchung that the call site in `ClassLoader` is not particularly interesting and doesn't motivate a non-cloning, trusted method. There are a few other places in java.base where it's used and a trusted method could help, but not sure any of those are performance critical. - PR: https://git.openjdk.java.net/jdk/pull/7714
Re: RFR: 8282701: Use Class.getInterfaces(false) where possible to reduce allocation pressure
On Sat, 5 Mar 2022 13:07:56 GMT, Сергей Цыпанов wrote:

> `Class.getInterfaces(false)` does not clone the underlying array and can be used in cases when the returned array is only read from.

I think this is ok in general, but for consistency and to better call out that we're dealing with a trusted shared array I would prefer if `getInterfaces(boolean)` remains private and instead we add a `getInterfacesShared()` (which simply calls `getInterfaces(false)`).

- Changes requested by redestad (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7714
Integrated: 8285633: Take better advantage of generic MethodType cache
On Tue, 26 Apr 2022 10:57:04 GMT, Claes Redestad wrote:

> The `MethodType.genericMethodType` methods provide convenience methods for certain common method types and also provide a `@Stable` cache that allows for constant folding. This patch enhances the more generic `methodType` methods to take advantage of this cache, when possible. This allows calls like `MethodType.methodType(Object.class, Object.class, ...)` to constant fold, making them 50x+ faster and allocation-free.
>
> Baseline:
>
> Benchmark                                    Mode  Cnt   Score   Error  Units
> MethodTypeAcquire.baselineRaw                avgt   15   0.673 ± 0.017  ns/op
> MethodTypeAcquire.testGenericObject          avgt   15   0.579 ± 0.125  ns/op
> MethodTypeAcquire.testMultiPType             avgt   15  46.671 ± 1.134  ns/op
> MethodTypeAcquire.testMultiPType_Arg         avgt   15  27.019 ± 0.437  ns/op
> MethodTypeAcquire.testMultiPType_ObjectAndA  avgt   15  57.217 ± 0.505  ns/op
> MethodTypeAcquire.testMultiPType_ObjectOnly  avgt   15  57.114 ± 1.214  ns/op
> MethodTypeAcquire.testObjectObject           avgt   15  33.380 ± 1.810  ns/op
> MethodTypeAcquire.testReturnInt              avgt   15  28.043 ± 0.187  ns/op
> MethodTypeAcquire.testReturnObject           avgt   15  24.313 ± 0.074  ns/op
> MethodTypeAcquire.testReturnVoid             avgt   15  24.360 ± 0.155  ns/op
> MethodTypeAcquire.testSinglePType            avgt   15  33.572 ± 1.101  ns/op
>
> Patched:
>
> Benchmark                                    Mode  Cnt   Score   Error  Units
> MethodTypeAcquire.baselineRaw                avgt   15   0.683 ± 0.024  ns/op
> MethodTypeAcquire.testGenericObject          avgt   15   0.547 ± 0.047  ns/op
> MethodTypeAcquire.testMultiPType             avgt   15  48.532 ± 0.982  ns/op
> MethodTypeAcquire.testMultiPType_Arg         avgt   15  31.390 ± 5.269  ns/op
> MethodTypeAcquire.testMultiPType_ObjectAndA  avgt   15  60.165 ± 2.756  ns/op
> MethodTypeAcquire.testMultiPType_ObjectOnly  avgt   15   0.599 ± 0.166  ns/op
> MethodTypeAcquire.testObjectObject           avgt   15   0.541 ± 0.045  ns/op
> MethodTypeAcquire.testReturnInt              avgt   15  28.174 ± 0.257  ns/op
> MethodTypeAcquire.testReturnObject           avgt   15   0.542 ± 0.045  ns/op
> MethodTypeAcquire.testReturnVoid             avgt   15  24.621 ± 0.202  ns/op
> MethodTypeAcquire.testSinglePType            avgt   15  33.614 ± 1.109  ns/op
>
> The cost on method types that don't match is in the noise (0-2 ns/op). Even on a test I constructed to act as a worst case (`testMultiPType_ObjectAndA`) there's barely any appreciable cost (~3 ns/op, with overlapping error).

This pull request has now been integrated.

Changeset: 6c79671e
Author: Claes Redestad
URL: https://git.openjdk.java.net/jdk/commit/6c79671e50d572f3da3a286d34a98dcb83b8d906
Stats: 119 lines in 8 files changed: 96 ins; 1 del; 22 mod

8285633: Take better advantage of generic MethodType cache

Reviewed-by: jvernee

- PR: https://git.openjdk.java.net/jdk/pull/8398
Re: RFR: 8285633: Take better advantage of generic MethodType cache [v2]
On Tue, 26 Apr 2022 19:30:31 GMT, Jorn Vernee wrote:

>> Right.. I did a quick experiment and there's a large speed-up in the trivial `methodType(obj, obj)` case:
>>
>> Benchmark                                      Mode  Cnt   Score   Error  Units
>> MethodTypeAcquire.testObjectObjectNonConstant  avgt    5  30.052 ± 3.440  ns/op  # baseline
>> MethodTypeAcquire.testObjectObjectNonConstant  avgt    5   1.171 ± 0.001  ns/op  # patch
>>
>> I'll add a non-constant variant for the multi-arg Object variants, too. It should scale linearly and see a gain in the same ballpark. I think we need to keep the number of microbenchmarks here somewhat under control, though.
>
> That sounds good, thanks

Added 3 micros I think carry their own weight, as they assess some different major pathways through the caching fast-path.

Baseline:

Benchmark                                             Mode  Cnt   Score   Error  Units
MethodTypeAcquire.testMultiPType_ObjectAndA_NonConst  avgt   15  63.523 ± 2.452  ns/op
MethodTypeAcquire.testMultiPType_ObjectOnly_NonConst  avgt   15  57.770 ± 0.501  ns/op
MethodTypeAcquire.testObjectObject_NonConst           avgt   15  30.090 ± 0.251  ns/op

Patched:

Benchmark                                             Mode  Cnt   Score   Error  Units
MethodTypeAcquire.testMultiPType_ObjectAndA_NonConst  avgt   15  64.570 ± 0.410  ns/op
MethodTypeAcquire.testMultiPType_ObjectOnly_NonConst  avgt   15   4.735 ± 0.063  ns/op
MethodTypeAcquire.testObjectObject_NonConst           avgt   15   1.171 ± 0.001  ns/op

The relative speed-up diminishes a bit for the loopy 6-arg `methodType` calls, but is still a healthy 12x. Overhead on the negative test might be there, but it is very much in the noise in my tests (I tried doubling warmup time to no avail).

- PR: https://git.openjdk.java.net/jdk/pull/8398
Re: RFR: 8285633: Take better advantage of generic MethodType cache [v2]
> The `MethodType.genericMethodType` methods provide convenience methods for certain common method types and also provide a `@Stable` cache that allows for constant folding. This patch enhances the more generic `methodType` methods to take advantage of this cache, when possible. This allows calls like `MethodType.methodType(Object.class, Object.class, ...)` to constant fold, making them 50x+ faster and allocation-free.
>
> Baseline:
>
> Benchmark                                    Mode  Cnt   Score   Error  Units
> MethodTypeAcquire.baselineRaw                avgt   15   0.673 ± 0.017  ns/op
> MethodTypeAcquire.testGenericObject          avgt   15   0.579 ± 0.125  ns/op
> MethodTypeAcquire.testMultiPType             avgt   15  46.671 ± 1.134  ns/op
> MethodTypeAcquire.testMultiPType_Arg         avgt   15  27.019 ± 0.437  ns/op
> MethodTypeAcquire.testMultiPType_ObjectAndA  avgt   15  57.217 ± 0.505  ns/op
> MethodTypeAcquire.testMultiPType_ObjectOnly  avgt   15  57.114 ± 1.214  ns/op
> MethodTypeAcquire.testObjectObject           avgt   15  33.380 ± 1.810  ns/op
> MethodTypeAcquire.testReturnInt              avgt   15  28.043 ± 0.187  ns/op
> MethodTypeAcquire.testReturnObject           avgt   15  24.313 ± 0.074  ns/op
> MethodTypeAcquire.testReturnVoid             avgt   15  24.360 ± 0.155  ns/op
> MethodTypeAcquire.testSinglePType            avgt   15  33.572 ± 1.101  ns/op
>
> Patched:
>
> Benchmark                                    Mode  Cnt   Score   Error  Units
> MethodTypeAcquire.baselineRaw                avgt   15   0.683 ± 0.024  ns/op
> MethodTypeAcquire.testGenericObject          avgt   15   0.547 ± 0.047  ns/op
> MethodTypeAcquire.testMultiPType             avgt   15  48.532 ± 0.982  ns/op
> MethodTypeAcquire.testMultiPType_Arg         avgt   15  31.390 ± 5.269  ns/op
> MethodTypeAcquire.testMultiPType_ObjectAndA  avgt   15  60.165 ± 2.756  ns/op
> MethodTypeAcquire.testMultiPType_ObjectOnly  avgt   15   0.599 ± 0.166  ns/op
> MethodTypeAcquire.testObjectObject           avgt   15   0.541 ± 0.045  ns/op
> MethodTypeAcquire.testReturnInt              avgt   15  28.174 ± 0.257  ns/op
> MethodTypeAcquire.testReturnObject           avgt   15   0.542 ± 0.045  ns/op
> MethodTypeAcquire.testReturnVoid             avgt   15  24.621 ± 0.202  ns/op
> MethodTypeAcquire.testSinglePType            avgt   15  33.614 ± 1.109  ns/op
>
> The cost on method types that don't match is in the noise (0-2 ns/op). Even on a test I constructed to act as a worst case (`testMultiPType_ObjectAndA`) there's barely any appreciable cost (~3 ns/op, with overlapping error).

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  Add micros using non-constant arguments to assess performance in absence of constant folding

- Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8398/files
  - new: https://git.openjdk.java.net/jdk/pull/8398/files/6d2edc59..fcc020e6

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8398&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8398&range=00-01

Stats: 18 lines in 1 file changed: 18 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/8398.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8398/head:pull/8398

PR: https://git.openjdk.java.net/jdk/pull/8398
Re: RFR: 8285633: Take better advantage of generic MethodType cache
On Tue, 26 Apr 2022 15:30:41 GMT, Jorn Vernee wrote:

>> The `MethodType.genericMethodType` methods provide convenience factories for certain common method types and also provide a `@Stable` cache that allows for constant folding. This patch enhances the more generic `methodType` methods to take advantage of this cache, when possible. This allows calls like `MethodType.methodType(Object.class, Object.class, ...)` to constant fold, making them 50x+ faster and allocation-free.
>>
>> Baseline:
>>
>> Benchmark                                    Mode  Cnt   Score   Error  Units
>> MethodTypeAcquire.baselineRaw                avgt   15   0.673 ± 0.017  ns/op
>> MethodTypeAcquire.testGenericObject          avgt   15   0.579 ± 0.125  ns/op
>> MethodTypeAcquire.testMultiPType             avgt   15  46.671 ± 1.134  ns/op
>> MethodTypeAcquire.testMultiPType_Arg         avgt   15  27.019 ± 0.437  ns/op
>> MethodTypeAcquire.testMultiPType_ObjectAndA  avgt   15  57.217 ± 0.505  ns/op
>> MethodTypeAcquire.testMultiPType_ObjectOnly  avgt   15  57.114 ± 1.214  ns/op
>> MethodTypeAcquire.testObjectObject           avgt   15  33.380 ± 1.810  ns/op
>> MethodTypeAcquire.testReturnInt              avgt   15  28.043 ± 0.187  ns/op
>> MethodTypeAcquire.testReturnObject           avgt   15  24.313 ± 0.074  ns/op
>> MethodTypeAcquire.testReturnVoid             avgt   15  24.360 ± 0.155  ns/op
>> MethodTypeAcquire.testSinglePType            avgt   15  33.572 ± 1.101  ns/op
>>
>> Patched:
>>
>> Benchmark                                    Mode  Cnt   Score   Error  Units
>> MethodTypeAcquire.baselineRaw                avgt   15   0.683 ± 0.024  ns/op
>> MethodTypeAcquire.testGenericObject          avgt   15   0.547 ± 0.047  ns/op
>> MethodTypeAcquire.testMultiPType             avgt   15  48.532 ± 0.982  ns/op
>> MethodTypeAcquire.testMultiPType_Arg         avgt   15  31.390 ± 5.269  ns/op
>> MethodTypeAcquire.testMultiPType_ObjectAndA  avgt   15  60.165 ± 2.756  ns/op
>> MethodTypeAcquire.testMultiPType_ObjectOnly  avgt   15   0.599 ± 0.166  ns/op
>> MethodTypeAcquire.testObjectObject           avgt   15   0.541 ± 0.045  ns/op
>> MethodTypeAcquire.testReturnInt              avgt   15  28.174 ± 0.257  ns/op
>> MethodTypeAcquire.testReturnObject           avgt   15   0.542 ± 0.045  ns/op
>> MethodTypeAcquire.testReturnVoid             avgt   15  24.621 ± 0.202  ns/op
>> MethodTypeAcquire.testSinglePType            avgt   15  33.614 ± 1.109  ns/op
>>
>> The cost on method types that don't match is in the noise (0-2 ns/op). Even on a test I constructed to act as a worst case (`testMultiPType_ObjectAndA`) there's barely any appreciable cost (~3 ns/op, with overlapping error).
>
> test/micro/org/openjdk/bench/java/lang/invoke/MethodTypeAcquire.java line 104:
>
>> 102: @Benchmark
>> 103: public MethodType testMultiPType_ObjectOnly() {
>> 104:     return MethodType.methodType(Object.class, Object.class, Object.class, Object.class, Object.class, Object.class, Object.class);
>
> It might be interesting to add a benchmark where all types are `Object`, but non-constants, to see if that case improves as well.

Right.. I did a quick experiment and there's a large speed-up in the trivial `methodType(obj, obj)` case:

Benchmark                                      Mode  Cnt   Score   Error  Units
MethodTypeAcquire.testObjectObjectNonConstant  avgt    5  30.052 ± 3.440  ns/op  # baseline
MethodTypeAcquire.testObjectObjectNonConstant  avgt    5   1.171 ± 0.001  ns/op  # patch

I'll add a non-constant variant for the multi-arg Object variants, too. It should scale linearly and see a gain in the same ballpark. I think we need to keep the number of microbenchmarks here somewhat under control, though.

- PR: https://git.openjdk.java.net/jdk/pull/8398
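As background for readers, the reason the `genericMethodType` cache can serve the general `methodType` factory at all is that `MethodType` instances are interned: requesting the same all-`Object` shape through either factory yields the same object. A minimal sketch (the demo class name is mine, not from the patch):

```java
import java.lang.invoke.MethodType;

public class GenericMethodTypeDemo {
    public static void main(String[] args) {
        // (Object, Object)Object requested through the generic-shape factory...
        MethodType cached = MethodType.genericMethodType(2);
        // ...and through the general-purpose factory.
        MethodType general = MethodType.methodType(Object.class, Object.class, Object.class);
        // MethodType instances are interned, so both calls return the same object;
        // routing all-Object shapes through the @Stable cache lets them constant fold.
        System.out.println(cached == general); // true
        System.out.println(cached.descriptorString());
    }
}
```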
RFR: 8285633: Take better advantage of generic MethodType cache
The `MethodType.genericMethodType` methods provide convenience factories for certain common method types and also provide a `@Stable` cache that allows for constant folding. This patch enhances the more generic `methodType` methods to take advantage of this cache, when possible. This allows calls like `MethodType.methodType(Object.class, Object.class, ...)` to constant fold, making them 50x+ faster and allocation-free.

Baseline:

Benchmark                                    Mode  Cnt   Score   Error  Units
MethodTypeAcquire.baselineRaw                avgt    5   0.673 ± 0.036  ns/op
MethodTypeAcquire.testGenericObject          avgt    5   0.520 ± 0.162  ns/op
MethodTypeAcquire.testMultiPType             avgt    5  47.922 ± 1.778  ns/op
MethodTypeAcquire.testMultiPType_Arg         avgt    5  28.992 ± 0.949  ns/op
MethodTypeAcquire.testMultiPType_ObjectAndA  avgt    5  56.698 ± 0.879  ns/op
MethodTypeAcquire.testMultiPType_ObjectOnly  avgt    5  55.705 ± 0.854  ns/op
MethodTypeAcquire.testObjectObject           avgt    5  31.729 ± 2.204  ns/op
MethodTypeAcquire.testReturnInt              avgt    5  26.171 ± 0.141  ns/op
MethodTypeAcquire.testReturnObject           avgt    5  24.340 ± 0.868  ns/op
MethodTypeAcquire.testReturnVoid             avgt    5  24.793 ± 2.816  ns/op
MethodTypeAcquire.testSinglePType            avgt    5  32.653 ± 1.082  ns/op

Patched:

Benchmark                                    Mode  Cnt   Score   Error  Units
MethodTypeAcquire.baselineRaw                avgt    5   0.727 ± 0.382  ns/op
MethodTypeAcquire.testGenericObject          avgt    5   0.658 ± 0.655  ns/op
MethodTypeAcquire.testMultiPType             avgt    5  48.066 ± 1.187  ns/op
MethodTypeAcquire.testMultiPType_Arg         avgt    5  27.097 ± 0.745  ns/op
MethodTypeAcquire.testMultiPType_ObjectAndA  avgt    5  58.074 ± 6.299  ns/op
MethodTypeAcquire.testMultiPType_ObjectOnly  avgt    5   0.538 ± 0.195  ns/op
MethodTypeAcquire.testObjectObject           avgt    5   0.520 ± 0.162  ns/op
MethodTypeAcquire.testReturnInt              avgt    5  28.093 ± 0.235  ns/op
MethodTypeAcquire.testReturnObject           avgt    5   0.540 ± 0.204  ns/op
MethodTypeAcquire.testReturnVoid             avgt    5  24.746 ± 0.706  ns/op
MethodTypeAcquire.testSinglePType            avgt    5  32.662 ± 1.924  ns/op

The cost on method types that don't match is in the noise (0-2 ns/op). Even on a test I constructed to act as a worst case (`testMultiPType_ObjectAndA`) there's barely any appreciable cost.

- Commit messages:
 - Copyrights
 - 8285633: Take better advantage of generic MethodType cache

Changes: https://git.openjdk.java.net/jdk/pull/8398/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8398&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8285633
Stats: 101 lines in 8 files changed: 78 ins; 1 del; 22 mod
Patch: https://git.openjdk.java.net/jdk/pull/8398.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8398/head:pull/8398

PR: https://git.openjdk.java.net/jdk/pull/8398
Re: RFR: 8285255: refine StringLatin1.regionMatchesCI_UTF16 [v3]
On Wed, 20 Apr 2022 21:08:19 GMT, XenoAmess wrote:

>> some thoughts after watching 8285001: Simplify StringLatin1.regionMatches https://github.com/openjdk/jdk/pull/8292/
>>
>> if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
>>     continue;
>> }
>>
>> should be changed to
>>
>> if (((u1 == c1) ? CharacterDataLatin1.instance.toLowerCase(c1) : c1) == Character.toLowerCase(u2)) {
>>     continue;
>> }
>>
>> as:
>>
>> 1. c1 is LATIN1, so CharacterDataLatin1.instance.toLowerCase seems faster.
>> 2. because c1 is LATIN1, if u1 != c1, then c1 is already lowercase, and doesn't need a lowercase calculation.
>
> XenoAmess has updated the pull request incrementally with one additional commit since the last revision:
>
>   remove = check

Unfortunately this leads to an error for case-insensitive `regionMatches` between a latin-1 string that contains either of `\u00b5` or `\u00ff` (these two code points have upper case code points outside of the latin-1 range) and a UTF-16 string:

jshell> "\u00b5".regionMatches(true, 0, "\u0100", 0, 1)
|  Exception java.lang.ArrayIndexOutOfBoundsException: Index 924 out of bounds for length 256
|        at CharacterDataLatin1.getProperties (CharacterDataLatin1.java:74)
|        at CharacterDataLatin1.toLowerCase (CharacterDataLatin1.java:140)
|        at StringLatin1.regionMatchesCI_UTF16 (StringLatin1.java:420)
|        at String.regionMatches (String.java:2238)
|        at (#4:1)

- PR: https://git.openjdk.java.net/jdk/pull/8308
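For reference, these are the only two latin-1 code points whose upper-case mappings escape the latin-1 range, which is exactly why the proposed `CharacterDataLatin1` shortcut indexes out of bounds; a quick standalone check (the demo class name is mine):

```java
public class Latin1CaseEdgeCases {
    public static void main(String[] args) {
        // U+00B5 MICRO SIGN upper-cases to U+039C GREEK CAPITAL LETTER MU (924),
        // the very index reported by the AIOOBE in the message above.
        System.out.println((int) Character.toUpperCase('\u00b5')); // 924
        // U+00FF upper-cases to U+0178 LATIN CAPITAL LETTER Y WITH DIAERESIS (376).
        System.out.println((int) Character.toUpperCase('\u00ff')); // 376
        // A correct implementation must answer false here instead of throwing:
        System.out.println("\u00b5".regionMatches(true, 0, "\u0100", 0, 1)); // false
    }
}
```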
Integrated: 8284880: Re-examine sun.invoke.util.Wrapper hash tables
On Thu, 14 Apr 2022 11:19:04 GMT, Claes Redestad wrote:

> This patch examines and optimizes `Wrapper` lookups.
>
> First I wrote a few simple microbenchmarks to verify there are actual speedups from using perfect hash tables in `sun.invoke.util.Wrapper` compared to simpler lookup mechanisms (such as if-else or switch). Turns out there _is_ a speed-up for the case of `char` -> `Wrapper`, but not when mapping from `Class` -> `Wrapper`, so let's drop those. The `forPrimitiveType` case didn't use the `FROM_CHAR` table for some reason, which is remedied.
>
> Micros show benefits across the board for the warmed-up case:
>
> Baseline, OOTB
> Benchmark                  Mode  Cnt   Score   Error  Units
> Wrappers.forBasicType      avgt    5  14.387 ± 0.127  ns/op
> Wrappers.forPrimitive      avgt    5  38.818 ± 0.592  ns/op
> Wrappers.forPrimitiveType  avgt    5  26.085 ± 2.291  ns/op
> Wrappers.forWrapper        avgt    5  44.459 ± 1.635  ns/op
>
> Patch, OOTB
> Benchmark                  Mode  Cnt   Score   Error  Units
> Wrappers.forBasicType      avgt    5  14.357 ± 0.133  ns/op
> Wrappers.forPrimitive      avgt    5  23.930 ± 0.071  ns/op
> Wrappers.forPrimitiveType  avgt    5  14.343 ± 0.017  ns/op
> Wrappers.forWrapper        avgt    5  27.622 ± 0.022  ns/op
>
> For the `-Xint` case (`Wrapper` use is prominent during warmup, e.g., when spinning up MHs) there are decent or even great wins in all cases but `forPrimitiveType` - which was changed from a simple switch to use the hash lookup. Since the interpreter penalty is small in absolute terms and the win on JITed code is significant, this seems like a reasonable trade-off:
>
> Baseline, -Xint
> Benchmark                  Mode  Cnt     Score     Error  Units
> Wrappers.forBasicType      avgt    5  1246.144 ± 149.933  ns/op
> Wrappers.forPrimitive      avgt    5  4955.297 ± 329.869  ns/op
> Wrappers.forPrimitiveType  avgt    5   716.840 ±  62.568  ns/op
> Wrappers.forWrapper        avgt    5  5774.700 ± 367.627  ns/op
>
> Patch, -Xint
> Benchmark                  Mode  Cnt     Score     Error  Units
> Wrappers.forBasicType      avgt    5  1068.096 ± 101.728  ns/op
> Wrappers.forPrimitive      avgt    5  1146.670 ±  59.142  ns/op
> Wrappers.forPrimitiveType  avgt    5   998.037 ± 118.144  ns/op
> Wrappers.forWrapper        avgt    5  3581.226 ±  20.167  ns/op

This pull request has now been integrated.

Changeset: 5df8bd6b
Author: Claes Redestad
URL: https://git.openjdk.java.net/jdk/commit/5df8bd6b4e15686aa7d72b3f5a977eb51b0befc3
Stats: 259 lines in 3 files changed: 158 ins; 59 del; 42 mod

8284880: Re-examine sun.invoke.util.Wrapper hash tables

Reviewed-by: erikj, mchung

- PR: https://git.openjdk.java.net/jdk/pull/8242
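For readers unfamiliar with the trick, a perfect (collision-free) char-keyed table of the kind `Wrapper.FROM_CHAR` uses can be sketched as below. The enum, hash function and table size here are illustrative stand-ins of my own, not the actual `sun.invoke.util.Wrapper` code; the mask-instead-of-modulo indexing mirrors one of the cleanups listed in the commit messages:

```java
// Illustrative perfect-hash lookup from a basic-type char to an enum constant.
enum BasicType {
    INT('I'), LONG('J'), DOUBLE('D'), OBJECT('L');

    final char typeChar;
    BasicType(char c) { this.typeChar = c; }

    // 16-slot table; the hash below happens to be collision-free for I, J, D, L.
    private static final BasicType[] FROM_CHAR = new BasicType[16];
    static {
        for (BasicType b : values()) {
            FROM_CHAR[hash(b.typeChar)] = b;
        }
    }

    private static int hash(char c) {
        return (c + (c >> 1)) & 15; // mask instead of modulo keeps this branch-free
    }

    static BasicType forChar(char c) {
        BasicType b = FROM_CHAR[hash(c)];
        // A slot may hold null or a colliding key, so verify before returning.
        if (b == null || b.typeChar != c) {
            throw new IllegalArgumentException("not a basic type: " + c);
        }
        return b;
    }

    public static void main(String[] args) {
        System.out.println(forChar('J')); // LONG
    }
}
```

The single array load plus one comparison is what makes this faster than an if-else chain once JIT-compiled, while the extra arithmetic is what costs a little in the interpreter.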
Re: RFR: 8284880: Re-examine sun.invoke.util.Wrapper hash tables [v3]
> This patch examines and optimizes `Wrapper` lookups. > > First wrote a few simple microbenchmarks to verify there are actual speedups > from using perfect hash tables in `sun.invoke.util.Wrapper` compared to > simpler lookup mechanisms (such as if-else or switch). Turns out there _is_ a > speed-up for the case of `char` -> `Wrapper`, but not when mapping from > `Class` -> `Wrapper`, so let's drop those. The `forPrimitiveType` case > didn't use the `FROM_CHAR` table for some reason, which is remedied. > > Micros show benefits across the board for warmed up case: > > > Baseline, OOTB > Benchmark Mode Cnt Score Error Units > Wrappers.forBasicType avgt5 14.387 ? 0.127 ns/op > Wrappers.forPrimitive avgt5 38.818 ? 0.592 ns/op > Wrappers.forPrimitiveType avgt5 26.085 ? 2.291 ns/op > Wrappers.forWrapperavgt5 44.459 ? 1.635 ns/op > > Patch, OOTB > Benchmark Mode Cnt Score Error Units > Wrappers.forBasicType avgt5 14.357 ? 0.133 ns/op > Wrappers.forPrimitive avgt5 23.930 ? 0.071 ns/op > Wrappers.forPrimitiveType avgt5 14.343 ? 0.017 ns/op > Wrappers.forWrapperavgt5 27.622 ? 0.022 ns/op > > > For `-Xint` case (`Wrapper` use is prominent during warmup, e.g., when > spinning up of MHs) there are decent or even great wins in all cases but > `forPrimitiveType` - which was changed from a simple switch to use the hash > lookup. Since the interpreter penalty is small in absolute terms and the win > on JITed code is significant this seems like a reasonable trade-off: > > > Baseline, -Xint > Benchmark Mode Cnt Score Error Units > Wrappers.forBasicType avgt5 1246.144 ? 149.933 ns/op > Wrappers.forPrimitive avgt5 4955.297 ? 329.869 ns/op > Wrappers.forPrimitiveType avgt5 716.840 ? 62.568 ns/op > Wrappers.forWrapperavgt5 5774.700 ? 367.627 ns/op > > Patch, -Xint > Benchmark Mode Cnt Score Error Units > Wrappers.forBasicType avgt5 1068.096 ? 101.728 ns/op > Wrappers.forPrimitive avgt5 1146.670 ? 59.142 ns/op > Wrappers.forPrimitiveType avgt5 998.037 ? 
118.144 ns/op > Wrappers.forWrapperavgt5 3581.226 ? 20.167 ns/op Claes Redestad has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'wrappers' of https://github.com/cl4es/jdk into wrappers - Copyrights - Changes: - all: https://git.openjdk.java.net/jdk/pull/8242/files - new: https://git.openjdk.java.net/jdk/pull/8242/files/97eec9d3..792c37e2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8242&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8242&range=01-02 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/8242.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8242/head:pull/8242 PR: https://git.openjdk.java.net/jdk/pull/8242
Re: RFR: 8284880: Re-examine sun.invoke.util.Wrapper hash tables [v2]
On Thu, 14 Apr 2022 13:59:57 GMT, Claes Redestad wrote: >> This patch examines and optimizes `Wrapper` lookups. >> >> First wrote a few simple microbenchmarks to verify there are actual speedups >> from using perfect hash tables in `sun.invoke.util.Wrapper` compared to >> simpler lookup mechanisms (such as if-else or switch). Turns out there _is_ >> a speed-up for the case of `char` -> `Wrapper`, but not when mapping from >> `Class` -> `Wrapper`, so let's drop those. The `forPrimitiveType` case >> didn't use the `FROM_CHAR` table for some reason, which is remedied. >> >> Micros show benefits across the board for warmed up case: >> >> >> Baseline, OOTB >> Benchmark Mode Cnt Score Error Units >> Wrappers.forBasicType avgt5 14.387 ? 0.127 ns/op >> Wrappers.forPrimitive avgt5 38.818 ? 0.592 ns/op >> Wrappers.forPrimitiveType avgt5 26.085 ? 2.291 ns/op >> Wrappers.forWrapperavgt5 44.459 ? 1.635 ns/op >> >> Patch, OOTB >> Benchmark Mode Cnt Score Error Units >> Wrappers.forBasicType avgt5 14.357 ? 0.133 ns/op >> Wrappers.forPrimitive avgt5 23.930 ? 0.071 ns/op >> Wrappers.forPrimitiveType avgt5 14.343 ? 0.017 ns/op >> Wrappers.forWrapperavgt5 27.622 ? 0.022 ns/op >> >> >> For `-Xint` case (`Wrapper` use is prominent during warmup, e.g., when >> spinning up of MHs) there are decent or even great wins in all cases but >> `forPrimitiveType` - which was changed from a simple switch to use the hash >> lookup. Since the interpreter penalty is small in absolute terms and the win >> on JITed code is significant this seems like a reasonable trade-off: >> >> >> Baseline, -Xint >> Benchmark Mode Cnt Score Error Units >> Wrappers.forBasicType avgt5 1246.144 ? 149.933 ns/op >> Wrappers.forPrimitive avgt5 4955.297 ? 329.869 ns/op >> Wrappers.forPrimitiveType avgt5 716.840 ? 62.568 ns/op >> Wrappers.forWrapperavgt5 5774.700 ? 367.627 ns/op >> >> Patch, -Xint >> Benchmark Mode Cnt Score Error Units >> Wrappers.forBasicType avgt5 1068.096 ? 
101.728 ns/op >> Wrappers.forPrimitive avgt5 1146.670 ? 59.142 ns/op >> Wrappers.forPrimitiveType avgt5 998.037 ? 118.144 ns/op >> Wrappers.forWrapperavgt5 3581.226 ? 20.167 ns/op > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Add line break in make/test/BuildMicrobenchmark.gmk > > Co-authored-by: Erik Joelsson <37597443+erik...@users.noreply.github.com> Thanks for reviews! - PR: https://git.openjdk.java.net/jdk/pull/8242
Integrated: 8285001: Simplify StringLatin1.regionMatches
On Tue, 19 Apr 2022 08:50:09 GMT, Claes Redestad wrote: > There is no pair of character values in the latin1 range (0-255) that will > make the condition at line 401 in `StringLatin1.java` true, so this test can > be removed. > > Added a test and a microbenchmark (which as expected sees a few ns/op > improvement from this change). Took the opportunity to tune the default > settings for the micro to reduce runtime from 30+ minutes to 3 with no > discernible degradation of quality. This pull request has now been integrated. Changeset: e307bc86 Author:Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/e307bc8694462568807021191f9653ee80a93ed1 Stats: 95 lines in 3 files changed: 88 ins; 3 del; 4 mod 8285001: Simplify StringLatin1.regionMatches Reviewed-by: rriggs, naoto - PR: https://git.openjdk.java.net/jdk/pull/8292
Re: RFR: 8285001: Simplify StringLatin1.regionMatches
On Tue, 19 Apr 2022 08:50:09 GMT, Claes Redestad wrote: > There is no pair of character values in the latin1 range (0-255) that will > make the condition at line 401 in `StringLatin1.java` true, so this test can > be removed. > > Added a test and a microbenchmark (which as expected sees a few ns/op > improvement from this change). Took the opportunity to tune the default > settings for the micro to reduce runtime from 30+ minutes to 3 with no > discernible degradation of quality. Thanks for reviews! - PR: https://git.openjdk.java.net/jdk/pull/8292
Re: RFR: 8285001: Simplify StringLatin1.regionMatches [v2]
> There is no pair of character values in the latin1 range (0-255) that will > make the condition at line 401 in `StringLatin1.java` true, so this test can > be removed. > > Added a test and a microbenchmark (which as expected sees a few ns/op > improvement from this change). Took the opportunity to tune the default > settings for the micro to reduce runtime from 30+ minutes to 3 with no > discernible degradation of quality. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Copyright - Changes: - all: https://git.openjdk.java.net/jdk/pull/8292/files - new: https://git.openjdk.java.net/jdk/pull/8292/files/a8894f79..434a0fd2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8292&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8292&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/8292.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8292/head:pull/8292 PR: https://git.openjdk.java.net/jdk/pull/8292
Integrated: 8285007: Use correct lookup mode for MethodHandleStatics.UNSAFE
On Tue, 19 Apr 2022 13:19:31 GMT, Claes Redestad wrote: > Trivially fix the resolution of the `NF_UNSAFE` named function to use the > appropriate lookup mode. > > In [JDK-8187826](https://bugs.openjdk.java.net/browse/JDK-8187826) we changed > from regular reflection to use a `NamedFunction` for this field, but it > appears the lookup mode came out wrong. This mismatch appears to be benign > and ignored by HotSpot, but a user implementing an experimental JVM ran into > some issues (and additionally noticed the resolve throws the wrong > exception). > > This patch is a point fix to this particular issue, but I think we should > consider following up with a stronger assertion or test for proper staticness > of fields somewhere when resolving fields via > `MemberName.getFactory().resolveOrNull(..)`. This pull request has now been integrated. Changeset: 5d1ec54d Author:Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/5d1ec54d6c20dfe67a459c9d102cdfa0394bcc1e Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8285007: Use correct lookup mode for MethodHandleStatics.UNSAFE Reviewed-by: psandoz, mchung - PR: https://git.openjdk.java.net/jdk/pull/8297
Re: RFR: 8285007: Use correct lookup mode for MethodHandleStatics.UNSAFE [v2]
On Tue, 19 Apr 2022 17:33:04 GMT, Claes Redestad wrote: >> Trivially fix the resolution of the `NF_UNSAFE` named function to use the >> appropriate lookup mode. >> >> In [JDK-8187826](https://bugs.openjdk.java.net/browse/JDK-8187826) we >> changed from regular reflection to use a `NamedFunction` for this field, but >> it appears the lookup mode came out wrong. This mismatch appears to be >> benign and ignored by HotSpot, but a user implementing an experimental JVM >> ran into some issues (and additionally noticed the resolve throws the wrong >> exception). >> >> This patch is a point fix to this particular issue, but I think we should >> consider following up with a stronger assertion or test for proper >> staticness of fields somewhere when resolving fields via >> `MemberName.getFactory().resolveOrNull(..)`. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Missed a spot! Thanks for reviewing! - PR: https://git.openjdk.java.net/jdk/pull/8297
Re: RFR: 8285007: Use correct lookup mode for MethodHandleStatics.UNSAFE [v2]
On Tue, 19 Apr 2022 17:01:38 GMT, Mandy Chung wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Missed a spot! > > src/java.base/share/classes/java/lang/invoke/DirectMethodHandle.java line 901: > >> 899: MemberName member = new >> MemberName(MethodHandleStatics.class, "UNSAFE", Unsafe.class, REF_getStatic); >> 900: return new NamedFunction( >> 901: >> MemberName.getFactory().resolveOrFail(REF_getField, member, > > `REF_getField` passed to `resolveOrFail` should also be corrected? Yes, fixed. - PR: https://git.openjdk.java.net/jdk/pull/8297
Re: RFR: 8285007: Use correct lookup mode for MethodHandleStatics.UNSAFE [v2]
> Trivially fix the resolution of the `NF_UNSAFE` named function to use the > appropriate lookup mode. > > In [JDK-8187826](https://bugs.openjdk.java.net/browse/JDK-8187826) we changed > from regular reflection to use a `NamedFunction` for this field, but it > appears the lookup mode came out wrong. This mismatch appears to be benign > and ignored by HotSpot, but a user implementing an experimental JVM ran into > some issues (and additionally noticed the resolve throws the wrong > exception). > > This patch is a point fix to this particular issue, but I think we should > consider following up with a stronger assertion or test for proper staticness > of fields somewhere when resolving fields via > `MemberName.getFactory().resolveOrNull(..)`. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Missed a spot! - Changes: - all: https://git.openjdk.java.net/jdk/pull/8297/files - new: https://git.openjdk.java.net/jdk/pull/8297/files/fa10026a..e4f37c49 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8297&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8297&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/8297.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8297/head:pull/8297 PR: https://git.openjdk.java.net/jdk/pull/8297
RFR: 8285007: Use correct lookup mode for MethodHandleStatics.UNSAFE
Trivially fix the resolution of the `NF_UNSAFE` named function to use the appropriate lookup mode. In [JDK-8187826](https://bugs.openjdk.java.net/browse/JDK-8187826) we changed from regular reflection to use a `NamedFunction` for this field, but it appears the lookup mode came out wrong. This mismatch appears to be benign and ignored by HotSpot, but a user implementing an experimental JVM ran into some issues (and additionally noticed the resolve throws the wrong exception). This patch is a point fix to this particular issue, but I think we should consider following up with a stronger assertion or test for proper staticness of fields somewhere when resolving fields via `MemberName.getFactory().resolveOrNull(..)`. - Commit messages: - Use correct lookup mode for MethodHandleStatics.UNSAFE Changes: https://git.openjdk.java.net/jdk/pull/8297/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8297&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8285007 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/8297.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8297/head:pull/8297 PR: https://git.openjdk.java.net/jdk/pull/8297
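The distinction being fixed here - `REF_getStatic` vs `REF_getField` - mirrors what the public `Lookup` API enforces: a static field must be resolved with the static access mode. A minimal illustration (the demo class and field are hypothetical, not from the patch):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

public class LookupModeDemo {
    static final int ANSWER = 42;

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();

        // Correct mode: a static field is resolved with a static getter (REF_getStatic).
        MethodHandle getter = lookup.findStaticGetter(LookupModeDemo.class, "ANSWER", int.class);
        System.out.println((int) getter.invokeExact()); // 42

        // Wrong mode: asking for an instance getter (REF_getField) on a static
        // field is rejected at resolution time with IllegalAccessException.
        try {
            lookup.findGetter(LookupModeDemo.class, "ANSWER", int.class);
        } catch (IllegalAccessException expected) {
            System.out.println("rejected: " + expected);
        }
    }
}
```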
RFR: 8285001: Simplify StringLatin1.regionMatches
There is no pair of character values in the latin1 range (0-255) that will make the condition at line 401 in `StringLatin1.java` true, so this test can be removed. Added a test and a microbenchmark (which as expected sees a few ns/op improvement from this change). Took the opportunity to tune the default settings for the micro to reduce runtime from 30+ minutes to 3 with no discernible degradation of quality. - Commit messages: - Add tests and micro - Simplify latin1 to latin1 locale-independent comparisons in StringLatin1.regionMatchesCI Changes: https://git.openjdk.java.net/jdk/pull/8292/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8292&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8285001 Stats: 94 lines in 3 files changed: 88 ins; 3 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/8292.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8292/head:pull/8292 PR: https://git.openjdk.java.net/jdk/pull/8292
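The claim about the condition at line 401 is easy to check exhaustively over all 256×256 latin-1 pairs. The brute-force verifier below is my own reconstruction (modelling the removed fallback as comparing the lower-case forms of the upper-case images), not the patch's actual test:

```java
public class RegionMatchesCIExhaustiveCheck {
    public static void main(String[] args) {
        int reachable = 0;
        for (int a = 0; a < 256; a++) {
            for (int b = 0; b < 256; b++) {
                char u1 = Character.toUpperCase((char) a);
                char u2 = Character.toUpperCase((char) b);
                // Count pairs where the removed fallback would have been the
                // deciding test: upper-case forms differ, lower-case forms agree.
                if (u1 != u2 && Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                    reachable++;
                }
            }
        }
        // 0 expected: the removed branch can never fire for latin-1 inputs.
        System.out.println(reachable);
    }
}
```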
Re: RFR: 8284880: Re-examine sun.invoke.util.Wrapper hash tables [v2]
> This patch examines and optimizes `Wrapper` lookups. > > First wrote a few simple microbenchmarks to verify there are actual speedups > from using perfect hash tables in `sun.invoke.util.Wrapper` compared to > simpler lookup mechanisms (such as if-else or switch). Turns out there _is_ a > speed-up for the case of `char` -> `Wrapper`, but not when mapping from > `Class` -> `Wrapper`, so let's drop those. The `forPrimitiveType` case > didn't use the `FROM_CHAR` table for some reason, which is remedied. > > Micros show benefits across the board for warmed up case: > > > Baseline, OOTB > Benchmark Mode Cnt Score Error Units > Wrappers.forBasicType avgt5 14.387 ? 0.127 ns/op > Wrappers.forPrimitive avgt5 38.818 ? 0.592 ns/op > Wrappers.forPrimitiveType avgt5 26.085 ? 2.291 ns/op > Wrappers.forWrapperavgt5 44.459 ? 1.635 ns/op > > Patch, OOTB > Benchmark Mode Cnt Score Error Units > Wrappers.forBasicType avgt5 14.357 ? 0.133 ns/op > Wrappers.forPrimitive avgt5 23.930 ? 0.071 ns/op > Wrappers.forPrimitiveType avgt5 14.343 ? 0.017 ns/op > Wrappers.forWrapperavgt5 27.622 ? 0.022 ns/op > > > For `-Xint` case (`Wrapper` use is prominent during warmup, e.g., when > spinning up of MHs) there are decent or even great wins in all cases but > `forPrimitiveType` - which was changed from a simple switch to use the hash > lookup. Since the interpreter penalty is small in absolute terms and the win > on JITed code is significant this seems like a reasonable trade-off: > > > Baseline, -Xint > Benchmark Mode Cnt Score Error Units > Wrappers.forBasicType avgt5 1246.144 ? 149.933 ns/op > Wrappers.forPrimitive avgt5 4955.297 ? 329.869 ns/op > Wrappers.forPrimitiveType avgt5 716.840 ? 62.568 ns/op > Wrappers.forWrapperavgt5 5774.700 ? 367.627 ns/op > > Patch, -Xint > Benchmark Mode Cnt Score Error Units > Wrappers.forBasicType avgt5 1068.096 ? 101.728 ns/op > Wrappers.forPrimitive avgt5 1146.670 ? 59.142 ns/op > Wrappers.forPrimitiveType avgt5 998.037 ? 
118.144 ns/op > Wrappers.forWrapperavgt5 3581.226 ? 20.167 ns/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Add line break in make/test/BuildMicrobenchmark.gmk Co-authored-by: Erik Joelsson <37597443+erik...@users.noreply.github.com> - Changes: - all: https://git.openjdk.java.net/jdk/pull/8242/files - new: https://git.openjdk.java.net/jdk/pull/8242/files/d21330bb..97eec9d3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8242&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8242&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/8242.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8242/head:pull/8242 PR: https://git.openjdk.java.net/jdk/pull/8242
RFR: 8284880: Re-examine sun.invoke.util.Wrapper hash tables
This patch examines and optimizes `Wrapper` lookups.

First I wrote a few simple microbenchmarks to verify there are actual speedups from using perfect hash tables in `sun.invoke.util.Wrapper` compared to simpler lookup mechanisms (such as if-else or switch). Turns out there _is_ a speed-up for the case of `char` -> `Wrapper`, but not when mapping from `Class` -> `Wrapper`, so let's drop those. The `forPrimitiveType` case didn't use the `FROM_CHAR` table for some reason, which is remedied.

Micros show benefits across the board for the warmed-up case:

Baseline, OOTB
Benchmark                  Mode  Cnt   Score   Error  Units
Wrappers.forBasicType      avgt    5  14.387 ± 0.127  ns/op
Wrappers.forPrimitive      avgt    5  38.818 ± 0.592  ns/op
Wrappers.forPrimitiveType  avgt    5  26.085 ± 2.291  ns/op
Wrappers.forWrapper        avgt    5  44.459 ± 1.635  ns/op

Patch, OOTB
Benchmark                  Mode  Cnt   Score   Error  Units
Wrappers.forBasicType      avgt    5  14.357 ± 0.133  ns/op
Wrappers.forPrimitive      avgt    5  23.930 ± 0.071  ns/op
Wrappers.forPrimitiveType  avgt    5  14.343 ± 0.017  ns/op
Wrappers.forWrapper        avgt    5  27.622 ± 0.022  ns/op

For the `-Xint` case (`Wrapper` use is prominent during warmup, e.g., when spinning up MHs) there are decent or even great wins in all cases but `forPrimitiveType` - which was changed from a simple switch to use the hash lookup. Since the interpreter penalty is small in absolute terms and the win on JITed code is significant, this seems like a reasonable trade-off:

Baseline, -Xint
Benchmark                  Mode  Cnt     Score     Error  Units
Wrappers.forBasicType      avgt    5  1246.144 ± 149.933  ns/op
Wrappers.forPrimitive      avgt    5  4955.297 ± 329.869  ns/op
Wrappers.forPrimitiveType  avgt    5   716.840 ±  62.568  ns/op
Wrappers.forWrapper        avgt    5  5774.700 ± 367.627  ns/op

Patch, -Xint
Benchmark                  Mode  Cnt     Score     Error  Units
Wrappers.forBasicType      avgt    5  1068.096 ± 101.728  ns/op
Wrappers.forPrimitive      avgt    5  1146.670 ±  59.142  ns/op
Wrappers.forPrimitiveType  avgt    5   998.037 ± 118.144  ns/op
Wrappers.forWrapper        avgt    5  3581.226 ±  20.167  ns/op

- Commit messages:
 - Revert accidental changes to MethodType
 - Fix typo
 - Missed some changes refactoring, import DontInline
 - Merge branch 'master' into wrappers
 - Cleanup, use mask instead of modulo
 - Cleanup, add forPrimitiveType micro, reinstantiate char hashing
 - Add required add-exports to BuildMicrobenchmark.gmk
 - Fix imports
 - Tune sun.invoke.util.Wrapper accessors

Changes: https://git.openjdk.java.net/jdk/pull/8242/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8242&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8284880
Stats: 256 lines in 3 files changed: 157 ins; 59 del; 40 mod
Patch: https://git.openjdk.java.net/jdk/pull/8242.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8242/head:pull/8242

PR: https://git.openjdk.java.net/jdk/pull/8242
Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v9]
On Thu, 7 Apr 2022 07:01:23 GMT, Ludovic Henry wrote:

>> Despite the hash value being cached for Strings, computing the hash still represents significant CPU usage for applications handling lots of text.
>>
>> Even though it would be generally better to do it through an enhancement to the autovectorizer, the complexity of doing it by hand is trivial and the gain is sizable (2x speedup) even without the Vector API. The algorithm has been proposed by Richard Startin and Paul Sandoz [1].
>>
>> Speedups are as follows on an `Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz`:
>>
>> Benchmark                                        (size)  Mode  Cnt     Score     Error  Units
>> StringHashCode.Algorithm.scalarLatin1                 0  avgt   25     2.111 ±   0.210  ns/op
>> StringHashCode.Algorithm.scalarLatin1                 1  avgt   25     3.500 ±   0.127  ns/op
>> StringHashCode.Algorithm.scalarLatin1                10  avgt   25     7.001 ±   0.099  ns/op
>> StringHashCode.Algorithm.scalarLatin1               100  avgt   25    61.285 ±   0.444  ns/op
>> StringHashCode.Algorithm.scalarLatin1              1000  avgt   25   628.995 ±   0.846  ns/op
>> StringHashCode.Algorithm.scalarLatin1             10000  avgt   25  6307.990 ±   4.071  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16       0  avgt   25     2.358 ±   0.092  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16       1  avgt   25     3.631 ±   0.159  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16      10  avgt   25     7.049 ±   0.019  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16     100  avgt   25    33.626 ±   1.218  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16    1000  avgt   25   317.811 ±   1.225  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16   10000  avgt   25  3212.333 ±  14.621  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8        0  avgt   25     2.356 ±   0.097  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8        1  avgt   25     3.630 ±   0.158  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8       10  avgt   25     8.724 ±   0.065  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8      100  avgt   25    32.402 ±   0.019  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8     1000  avgt   25   321.949 ±   0.251  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8    10000  avgt   25  3202.083 ±   1.667  ns/op
>> StringHashCode.Algorithm.scalarUTF16                  0  avgt   25     2.135 ±   0.191  ns/op
>> StringHashCode.Algorithm.scalarUTF16                  1  avgt   25     5.202 ±   0.362  ns/op
>> StringHashCode.Algorithm.scalarUTF16                 10  avgt   25    11.105 ±   0.112  ns/op
>> StringHashCode.Algorithm.scalarUTF16                100  avgt   25    75.974 ±   0.702  ns/op
>> StringHashCode.Algorithm.scalarUTF16               1000  avgt   25   716.429 ±   3.290  ns/op
>> StringHashCode.Algorithm.scalarUTF16              10000  avgt   25  7095.459 ±  43.847  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16        0  avgt   25     2.381 ±   0.038  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16        1  avgt   25     5.268 ±   0.422  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16       10  avgt   25    11.248 ±   0.178  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16      100  avgt   25    52.966 ±   0.089  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16     1000  avgt   25   450.912 ±   1.834  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16    10000  avgt   25  4403.988 ±   2.927  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8         0  avgt   25     2.401 ±   0.032  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8         1  avgt   25     5.091 ±   0.396  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8        10  avgt   25    12.801 ±   0.189  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8       100  avgt   25    52.068 ±   0.032  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8      1000  avgt   25   453.270 ±   0.340  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8     10000  avgt   25  4433.112 ±   2.699  ns/op
>>
>> At Datadog, we handle a great amount of text (through logs management for example), and hashing Strings represents a large part of our CPU usage. It's very unlikely that we are the only ones, as `String.hashCode` is such a core feature of JVM-based languages, with its use in `HashMap` for example. Having even only a 2x speedup would allow us to save thousands of CPU cores per month and correspondingly improve the energy/carbon impact.
>>
>> [1] https://static.rainfocus.com/oracle/oow18/sess/1525822677955001tLqU/PF/codeone18-vector-API-DEV5081_1540354883936001Q3Sv.pdf
>
> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision.
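The hand-unrolled polynomial hash described above can be illustrated with a standalone sketch (my reconstruction of the technique, not the patch itself): each iteration folds eight Latin-1 bytes into the hash using precomputed powers of 31, which shortens the loop-carried `h * 31 + c` dependency chain so the eight products can be computed in parallel.

```java
class Latin1Hash {
    // Precomputed powers of 31; combining 8 bytes per step replaces eight
    // serial multiply-adds with one multiply plus independent products.
    private static final int P2 = 31 * 31;
    private static final int P3 = P2 * 31;
    private static final int P4 = P3 * 31;
    private static final int P5 = P4 * 31;
    private static final int P6 = P5 * 31;
    private static final int P7 = P6 * 31;
    private static final int P8 = P7 * 31;

    static int hashCode(byte[] bytes) {
        int h = 0;
        int i = 0;
        for (; i + 8 <= bytes.length; i += 8) {
            h = h * P8
                + (bytes[i]     & 0xff) * P7
                + (bytes[i + 1] & 0xff) * P6
                + (bytes[i + 2] & 0xff) * P5
                + (bytes[i + 3] & 0xff) * P4
                + (bytes[i + 4] & 0xff) * P3
                + (bytes[i + 5] & 0xff) * P2
                + (bytes[i + 6] & 0xff) * 31
                + (bytes[i + 7] & 0xff);
        }
        // Tail: the classic one-byte-at-a-time recurrence.
        for (; i < bytes.length; i++) {
            h = h * 31 + (bytes[i] & 0xff);
        }
        return h;
    }
}
```

Because this is only an algebraic regrouping of the same polynomial (int overflow is modular, so grouping doesn't change the result), it returns exactly the same value as `String.hashCode()` for Latin-1 content.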
Integrated: 8284579: Improve VarHandle checks for interpreter
On Fri, 8 Apr 2022 11:48:10 GMT, Claes Redestad wrote:

> A few additional enhancements aiming to improve VH performance in the interpreter:
>
> - Flatten `TypeAndInvokers`: adds a pointer to `VarHandle` (a small increase, 40->48) but removes an object and an indirection on any instance actually used - and might avoid allocating the `MethodHandle[]` unnecessarily on some instances
> - Have `checkExactAccessMode` return the directness of the `VarHandle` so that we can avoid some `isDirect` method calls.
>
> Baseline, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  478.324 ± 5.762  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  392.114 ± 1.644  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  822.484 ± 1.865  ns/op
>
> Patched, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  437.704 ± 5.320  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  374.512 ± 3.154  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  757.054 ± 1.237  ns/op
>
> No significant performance difference in normal mode.

This pull request has now been integrated.

Changeset: 280aa428
Author: Claes Redestad
URL: https://git.openjdk.java.net/jdk/commit/280aa428800043f314b92ae88076d596cb4c2fe0
Stats: 244 lines in 4 files changed: 26 ins; 28 del; 190 mod

8284579: Improve VarHandle checks for interpreter

Reviewed-by: mcimadamore, mchung

- PR: https://git.openjdk.java.net/jdk/pull/8160
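The second enhancement (returning directness from the check) can be sketched with a hypothetical, much-simplified handle class; all names below are illustrative and not the JDK's internals:

```java
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

// Toy stand-in for a VarHandle-like object: the check both validates the
// invocation and reports directness, so an interpreter needs one method call
// where it previously needed two (check, then isDirect).
class ToyHandle {
    final boolean exact;
    final boolean direct;
    final MethodType exactType;

    ToyHandle(boolean exact, boolean direct, MethodType exactType) {
        this.exact = exact;
        this.direct = direct;
        this.exactType = exactType;
    }

    // Before: callers did checkExactAccessMode(ad); if (handle.isDirect()) ...
    // After: the check returns the directness, saving the second call.
    boolean checkAccessModeThenIsDirect(MethodType symbolicType) {
        if (exact && !exactType.equals(symbolicType)) {
            throw new WrongMethodTypeException(
                    "expected " + exactType + " but found " + symbolicType);
        }
        return direct;
    }
}
```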
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v5]
> A few additional enhancements aiming to improve VH performance in the interpreter:
>
> - Flatten `TypeAndInvokers`: adds a pointer to `VarHandle` (a small increase, 40->48) but removes an object and an indirection on any instance actually used - and might avoid allocating the `MethodHandle[]` unnecessarily on some instances
> - Have `checkExactAccessMode` return the directness of the `VarHandle` so that we can avoid some `isDirect` method calls.
>
> Baseline, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  478.324 ± 5.762  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  392.114 ± 1.644  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  822.484 ± 1.865  ns/op
>
> Patched, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  437.704 ± 5.320  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  374.512 ± 3.154  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  757.054 ± 1.237  ns/op
>
> No significant performance difference in normal mode.

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  typo

Changes:
- all: https://git.openjdk.java.net/jdk/pull/8160/files
- new: https://git.openjdk.java.net/jdk/pull/8160/files/df1d652b..9998e538

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=04
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=03-04

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/8160.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8160/head:pull/8160

PR: https://git.openjdk.java.net/jdk/pull/8160
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v4]
> A few additional enhancements aiming to improve VH performance in the interpreter:
>
> - Flatten `TypeAndInvokers`: adds a pointer to `VarHandle` (a small increase, 40->48) but removes an object and an indirection on any instance actually used - and might avoid allocating the `MethodHandle[]` unnecessarily on some instances
> - Have `checkExactAccessMode` return the directness of the `VarHandle` so that we can avoid some `isDirect` method calls.
>
> Baseline, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  478.324 ± 5.762  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  392.114 ± 1.644  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  822.484 ± 1.865  ns/op
>
> Patched, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  437.704 ± 5.320  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  374.512 ± 3.154  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  757.054 ± 1.237  ns/op
>
> No significant performance difference in normal mode.

Claes Redestad has updated the pull request incrementally with two additional commits since the last revision:

  - Improve javadoc for merged method
  - Add javadoc for merged method

Changes:
- all: https://git.openjdk.java.net/jdk/pull/8160/files
- new: https://git.openjdk.java.net/jdk/pull/8160/files/2a4fbd6d..df1d652b

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=03
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=02-03

Stats: 10 lines in 1 file changed: 10 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/8160.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8160/head:pull/8160

PR: https://git.openjdk.java.net/jdk/pull/8160
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v3]
On Tue, 12 Apr 2022 01:15:33 GMT, David Holmes wrote:

> > checkExactAccessMode -> checkAccessModeThenIsDirect
>
> Don't you still want "Exact" in there? That "access" check seems odd anyway as it only checks for one form of mismatch - should it not also check for `!exact && accessModeType(ad.type) == ad.symbolicMethodTypeExact`?

What's checked is that the access mode is nominally OK. It just so happens that the only rule validated up front is that iff the VH is exact, then the types must be an exact match. For a non-exact VH, sufficient checks will be done later, e.g. by the `asType` call (if the type of the VH isn't adaptable to `ad.symbolicMethodTypeInvoker`, this will throw an exception). With that in mind, the name `checkExactAccessMode` even appears misleading, as it can be read as requiring that the VH _must_ be `exact`. Which is not the case. Dropping the `Exact` part makes it less ambiguous.

- PR: https://git.openjdk.java.net/jdk/pull/8160
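The exact/non-exact distinction being discussed is observable through the public API. A small demo (the class and field names here are hypothetical, but `withInvokeExactBehavior()` is the real method, available since JDK 16): an exact VarHandle rejects up front any invocation whose symbolic type descriptor is not an exact match for the access mode type.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.invoke.WrongMethodTypeException;

class ExactVhDemo {
    static int counter = 42;
    static final VarHandle VH;
    static {
        try {
            VH = MethodHandles.lookup()
                    .findStaticVarHandle(ExactVhDemo.class, "counter", int.class)
                    .withInvokeExactBehavior();
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int readExact() {
        return (int) VH.get();   // symbolic type ()int matches exactly: OK
    }

    static boolean mismatchThrows() {
        try {
            // The cast to Integer makes the symbolic type ()Integer, which
            // is not an exact match, so the up-front check fails.
            Integer bad = (Integer) VH.get();
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        }
    }
}
```

A generic (non-exact) VarHandle would instead adapt the `()Integer` call via `asType` and succeed, which is exactly the later, "sufficient" check referred to above.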
Re: RFR: 8284637: Improve String.join performance
On Fri, 8 Apr 2022 19:33:26 GMT, XenoAmess wrote:

> 8284637: Improve String.join performance

I think you can use or extend the existing `org.openjdk.bench.java.util.StringJoinerBench` JMH benchmark for this. Refer to [doc/testing.md](../tree/master/doc/testing.md) for setup help; you should be able to run a specific JMH benchmark through the build system with something like `make test TEST=micro:StringJoinerBench.join`.

- PR: https://git.openjdk.java.net/jdk/pull/8169
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v3]
> A few additional enhancements aiming to improve VH performance in the interpreter:
>
> - Flatten `TypeAndInvokers`: adds a pointer to `VarHandle` (a small increase, 40->48) but removes an object and an indirection on any instance actually used - and might avoid allocating the `MethodHandle[]` unnecessarily on some instances
> - Have `checkExactAccessMode` return the directness of the `VarHandle` so that we can avoid some `isDirect` method calls.
>
> Baseline, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  478.324 ± 5.762  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  392.114 ± 1.644  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  822.484 ± 1.865  ns/op
>
> Patched, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  437.704 ± 5.320  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  374.512 ± 3.154  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  757.054 ± 1.237  ns/op
>
> No significant performance difference in normal mode.

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  checkExactAccessMode -> checkAccessModeThenIsDirect, privatize throw method, cleanup code generator

Changes:
- all: https://git.openjdk.java.net/jdk/pull/8160/files
- new: https://git.openjdk.java.net/jdk/pull/8160/files/68f414a1..2a4fbd6d

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=02
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=01-02

Stats: 90 lines in 4 files changed: 0 ins; 5 del; 85 mod
Patch: https://git.openjdk.java.net/jdk/pull/8160.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8160/head:pull/8160

PR: https://git.openjdk.java.net/jdk/pull/8160
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v2]
On Mon, 11 Apr 2022 09:57:42 GMT, Claes Redestad wrote:

>> src/java.base/share/classes/java/lang/invoke/VarHandles.java line 719:
>>
>>> 717:                 //        MethodHandle.linkToStatic();
>>> 718:                 //    } else {
>>> 719:                 //        MethodHandle mh = handle.getMethodHandle(ad.mode);
>>
>> The `direct`-ness check can be hoisted into an enclosing `if` statement:
>> Suggestion:
>>
>>     // if (direct) {
>>     //     if (handle.vform.methodType_table[ad.type] == ad.symbolicMethodTypeErased) {
>>     //         MethodHandle.linkToStatic();
>>     //         return;
>>     //     } else if (handle.vform.getMethodType_V(ad.type) == ad.symbolicMethodTypeErased) {
>>     //         MethodHandle.linkToStatic();
>>     //         return;
>>     //     }
>>     // }
>>     // MethodHandle mh = handle.getMethodHandle(ad.mode);
>>
>> Also, any reason `GUARD_METHOD_TEMPLATE_V` uses `LINK_TO_STATIC_ARGS` instead of `LINK_TO_STATIC_ARGS_V`?
>
>> How would the performance change if the `isDirect` and `checkExactAccessMode` merger was reverted?
>
> Adds around 15-20 ns/op for these micros.

Restructuring so that we only check `direct` once sounds reasonable at face value, but it would be a lot of churn for little gain (even in the interpreter, testing a local boolean field is fast, and JITs will optimize this well).

The `LINK_TO_STATIC_ARGS_V` in the code generator seems like a remnant from an earlier iteration of this code. The `vform.getMemberName_V` method the code generator would emit a call to doesn't even exist. This should probably be cleaned up, separately. @PaulSandoz, WDYT?

- PR: https://git.openjdk.java.net/jdk/pull/8160
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v2]
On Mon, 11 Apr 2022 09:34:17 GMT, ExE Boss wrote:

>> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Simplified as suggested by @ExE-Boss
>
> src/java.base/share/classes/java/lang/invoke/VarHandles.java line 719:
>
>> 717:                 //        MethodHandle.linkToStatic();
>> 718:                 //    } else {
>> 719:                 //        MethodHandle mh = handle.getMethodHandle(ad.mode);
>
> The `direct`-ness check can be hoisted into an enclosing `if` statement:
> Suggestion:
>
>     // if (direct) {
>     //     if (handle.vform.methodType_table[ad.type] == ad.symbolicMethodTypeErased) {
>     //         MethodHandle.linkToStatic();
>     //         return;
>     //     } else if (handle.vform.getMethodType_V(ad.type) == ad.symbolicMethodTypeErased) {
>     //         MethodHandle.linkToStatic();
>     //         return;
>     //     }
>     // }
>     // MethodHandle mh = handle.getMethodHandle(ad.mode);
>
> Also, any reason `GUARD_METHOD_TEMPLATE_V` uses `LINK_TO_STATIC_ARGS` instead of `LINK_TO_STATIC_ARGS_V`?

> How would the performance change if the `isDirect` and `checkExactAccessMode` merger was reverted?

Adds around 15-20 ns/op for these micros.

- PR: https://git.openjdk.java.net/jdk/pull/8160
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v2]
On Mon, 11 Apr 2022 09:26:12 GMT, ExE Boss wrote:

>> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Simplified as suggested by @ExE-Boss
>
> src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2075:
>
>> 2073:
>> 2074:     @DontInline
>> 2075:     final void throwWrongMethodTypeException(VarHandle.AccessDescriptor ad) {
>
> This can actually be `private` and `static`:
> Suggestion:
>
>     private static final void throwWrongMethodTypeException(VarHandle.AccessDescriptor ad) {

`private`, yes. `static` would require passing the `MethodType`, and since this is an exceptional slow path it makes sense to keep the complexity contained inside the `throw...` method.

- PR: https://git.openjdk.java.net/jdk/pull/8160
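The pattern at issue, a tiny hot-path check that hands exception construction to a separate cold method, can be sketched generically (names are hypothetical; the real code additionally marks the slow path `@DontInline`):

```java
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

class CheckedAccessor {
    private final MethodType exactType = MethodType.methodType(int.class);

    // Hot path: just a compare and a branch. Keeping the message-building
    // code out of this method keeps it small and inline-friendly.
    void check(MethodType symbolic) {
        if (!exactType.equals(symbolic)) {
            throwWrongMethodTypeException(symbolic);
        }
    }

    // Cold path: an instance method because it reads this.exactType for the
    // message; making it static would force the caller to pass that along,
    // moving slow-path complexity back into the hot path.
    private void throwWrongMethodTypeException(MethodType symbolic) {
        throw new WrongMethodTypeException(
                "expected " + exactType + " but found " + symbolic);
    }
}
```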
Re: RFR: 8283892: Compress and expand bits [v6]
On Fri, 8 Apr 2022 19:13:35 GMT, Paul Sandoz wrote:

>> Add support to compress bits and expand bits of `int` and `long` values; see Hacker's Delight (2nd edition), section 7.4.
>>
>> Compressing or expanding bits of an `int` or `long` value can be composed to enable general permutations, and the "sheep and goats" operation (SAG); see Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a stable binary radix sort.
>>
>> The compress and expand functionality maps efficiently to hardware instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the implementations can be very efficient on supporting hardware. Intrinsification will occur in a separate PR.
>>
>> This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the beneficial performance impact of the `PDEP` instruction, and by extension the `expand` method, when applied to the implementation of a bit-vector select operation in succinct data structures (for example, `select(r)` returns the position of the `r`th 1).
>>
>> Testing-wise, the approach taken is threefold:
>> 1. Tests compare against simple implementations that are easy to read and verify against the JDK implementations (which later will also be made intrinsic). To compensate, all tests are also run flipping the test methods and the methods under test.
>> 2. Tests composed of compress and expand and vice versa.
>> 3. Tests with known mask patterns, whose expected values are easily derived from the inputs.
>
> Paul Sandoz has updated the pull request incrementally with one additional commit since the last revision:
>
>   Refer to hexadecimal digit

Marked as reviewed by redestad (Reviewer).

- PR: https://git.openjdk.java.net/jdk/pull/8115
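For reference, the semantics under review can be pinned down with a simple bit-by-bit model (my own loop sketch in the spirit of the "easy to read and verify" test implementations mentioned above, not the JDK's code): `compress` gathers the bits of `i` selected by `mask` into the low-order bits, and `expand` scatters low-order bits of `i` back out to the positions where `mask` is set.

```java
class BitOps {
    // Move each bit of i at a set position of mask down into consecutive
    // low-order bits, preserving their order (cf. Hacker's Delight 7.4, PEXT).
    static int compress(int i, int mask) {
        int result = 0, pos = 0;
        for (int bit = 0; bit < 32; bit++) {
            if ((mask & (1 << bit)) != 0) {
                if ((i & (1 << bit)) != 0) {
                    result |= 1 << pos;
                }
                pos++;
            }
        }
        return result;
    }

    // Inverse direction: take consecutive low-order bits of i and deposit
    // them at the set positions of mask (cf. PDEP).
    static int expand(int i, int mask) {
        int result = 0, pos = 0;
        for (int bit = 0; bit < 32; bit++) {
            if ((mask & (1 << bit)) != 0) {
                if ((i & (1 << pos)) != 0) {
                    result |= 1 << bit;
                }
                pos++;
            }
        }
        return result;
    }
}
```

The composition property the tests exploit follows directly: `compress(expand(v, m), m) == v` whenever `v` fits in `Integer.bitCount(m)` bits.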
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v2]
On Fri, 8 Apr 2022 12:20:32 GMT, Claes Redestad wrote:

>> A few additional enhancements aiming to improve VH performance in the interpreter:
>>
>> - Flatten `TypeAndInvokers`: adds a pointer to `VarHandle` (a small increase, 40->48) but removes an object and an indirection on any instance actually used - and might avoid allocating the `MethodHandle[]` unnecessarily on some instances
>> - Have `checkExactAccessMode` return the directness of the `VarHandle` so that we can avoid some `isDirect` method calls.
>>
>> Baseline, `-Xint`
>>
>> Benchmark                                 Mode  Cnt    Score   Error  Units
>> VarHandleExact.exact_exactInvocation      avgt   30  478.324 ± 5.762  ns/op
>> VarHandleExact.generic_exactInvocation    avgt   30  392.114 ± 1.644  ns/op
>> VarHandleExact.generic_genericInvocation  avgt   30  822.484 ± 1.865  ns/op
>>
>> Patched, `-Xint`
>>
>> Benchmark                                 Mode  Cnt    Score   Error  Units
>> VarHandleExact.exact_exactInvocation      avgt   30  437.704 ± 5.320  ns/op
>> VarHandleExact.generic_exactInvocation    avgt   30  374.512 ± 3.154  ns/op
>> VarHandleExact.generic_genericInvocation  avgt   30  757.054 ± 1.237  ns/op
>>
>> No significant performance difference in normal mode.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
>   Simplified as suggested by @ExE-Boss

I couldn't come up with a better name, either, but I'll see if something comes up over the weekend.

- PR: https://git.openjdk.java.net/jdk/pull/8160
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v2]
On Fri, 8 Apr 2022 15:05:54 GMT, Maurizio Cimadamore wrote:

> TestAdaptVarHandles

Thanks for reviewing, Maurizio. I ran the micros you suggested and detect no change whatsoever at peak performance. With `-Xint` there are some small improvements, in line with the results above.

- PR: https://git.openjdk.java.net/jdk/pull/8160
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v2]
On Fri, 8 Apr 2022 12:12:06 GMT, ExE Boss wrote:

>> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Simplified as suggested by @ExE-Boss
>
> src/java.base/share/classes/java/lang/invoke/IndirectVarHandle.java line 104:
>
>> 102:         if (exact && accessModeType(ad.type) != ad.symbolicMethodTypeExact) {
>> 103:             throwWrongMethodTypeException(ad);
>> 104:         }
>
> This could also be:
> Suggestion:
>
>     super.checkExactAccessMode(ad);

Good suggestions, fixed.

- PR: https://git.openjdk.java.net/jdk/pull/8160
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v2]
> A few additional enhancements aiming to improve VH performance in the interpreter:
>
> - Flatten `TypeAndInvokers`: adds a pointer to `VarHandle` (a small increase, 40->48) but removes an object and an indirection on any instance actually used - and might avoid allocating the `MethodHandle[]` unnecessarily on some instances
> - Have `checkExactAccessMode` return the directness of the `VarHandle` so that we can avoid some `isDirect` method calls.
>
> Baseline, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  478.324 ± 5.762  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  392.114 ± 1.644  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  822.484 ± 1.865  ns/op
>
> Patched, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  437.704 ± 5.320  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  374.512 ± 3.154  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  757.054 ± 1.237  ns/op
>
> No significant performance difference in normal mode.

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  Simplified as suggested by @ExE-Boss

Changes:
- all: https://git.openjdk.java.net/jdk/pull/8160/files
- new: https://git.openjdk.java.net/jdk/pull/8160/files/49c8fdf8..68f414a1

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=01
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=00-01

Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/8160.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8160/head:pull/8160

PR: https://git.openjdk.java.net/jdk/pull/8160
RFR: 8284579: Improve VarHandle checks for interpreter
A few additional enhancements aiming to improve VH performance in the interpreter:

- Flatten `TypeAndInvokers`: adds a pointer to `VarHandle` (a small increase, 40->48) but removes an object and an indirection on any instance actually used - and might avoid allocating the `MethodHandle[]` unnecessarily on some instances
- Have `checkExactAccessMode` return the directness of the `VarHandle` so that we can avoid some `isDirect` method calls.

Baseline

Benchmark                                 Mode  Cnt    Score   Error  Units
VarHandleExact.exact_exactInvocation      avgt   30  478.324 ± 5.762  ns/op
VarHandleExact.generic_exactInvocation    avgt   30  392.114 ± 1.644  ns/op
VarHandleExact.generic_genericInvocation  avgt   30  822.484 ± 1.865  ns/op

Patched

Benchmark                                 Mode  Cnt    Score   Error  Units
VarHandleExact.exact_exactInvocation      avgt   30  437.704 ± 5.320  ns/op
VarHandleExact.generic_exactInvocation    avgt   30  374.512 ± 3.154  ns/op
VarHandleExact.generic_genericInvocation  avgt   30  757.054 ± 1.237  ns/op

Commit messages:
- Various improvements to VarHandle startup

Changes: https://git.openjdk.java.net/jdk/pull/8160/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8284579
Stats: 232 lines in 4 files changed: 18 ins; 23 del; 191 mod
Patch: https://git.openjdk.java.net/jdk/pull/8160.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8160/head:pull/8160

PR: https://git.openjdk.java.net/jdk/pull/8160