RFR: 8288425: Footprint regression due MH creation when initializing StringConcatFactory
Avoid MH creation when initializing `StringConcatFactory` by lazily initializing the handles into a `@Stable` field instead. - Commit messages: - 8288425: Footprint regression due MH creation when initializing StringConcatFactory Changes: https://git.openjdk.org/jdk/pull/9154/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=9154&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8288425 Stats: 11 lines in 1 file changed: 7 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/9154.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9154/head:pull/9154 PR: https://git.openjdk.org/jdk/pull/9154
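The lazy-initialization pattern this change describes can be sketched as follows. This is an illustrative sketch, not the JDK's actual code: the real field uses the JDK-internal `@jdk.internal.vm.annotation.Stable` annotation (which lets the JIT constant-fold the field once it is written), and the `LazyStringifier`/`stringify` names here are invented for the example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Sketch of the pattern: instead of building a MethodHandle eagerly during
// class initialization (<clinit>), cache it in a field on first use. In the
// JDK the field would be @Stable; a plain static field shows the shape.
final class LazyStringifier {
    private static MethodHandle OBJECT_STRINGIFIER; // @Stable in the real JDK

    static MethodHandle objectStringifier() {
        MethodHandle mh = OBJECT_STRINGIFIER;
        if (mh == null) {
            try {
                // String.valueOf(Object) is what an object stringifier boils down to
                mh = MethodHandles.lookup().findStatic(String.class, "valueOf",
                        MethodType.methodType(String.class, Object.class));
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
            OBJECT_STRINGIFIER = mh; // benign race: every thread computes the same value
        }
        return mh;
    }

    // Convenience wrapper for demonstration
    static String stringify(Object o) {
        try {
            return (String) objectStringifier().invokeExact(o);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }
}
```

Classes that never concatenate the relevant shapes then never pay the MH-creation cost, which is the footprint win the issue title refers to.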
Re: RFR: 8288011: StringConcatFactory: Split application of stringifiers [v2]
> To take optimal advantage of the pre-existing optimization for repeated > filters we could split the application of different types of stringifiers. > > The resulting difference in order of evaluation is not observable by > conventional means since all reference types share the same object > stringifier, and the others are filtering primitives (floats and doubles) > which have been passed by value already. > > This change is neutral on many concatenation expression shapes, but for any > complex expressions with interleaving float/double and reference parameters > it brings a straightforward reduction in rebinds and underlying LFs > generated. For example on the > [MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248) > test there's a modest 2% reduction in total classes loaded with this change > (from 16209 to 15872) Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Improve comments based on review feedback - Changes: - all: https://git.openjdk.org/jdk/pull/9082/files - new: https://git.openjdk.org/jdk/pull/9082/files/51c841e8..4b7c696e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=9082&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=9082&range=00-01 Stats: 11 lines in 1 file changed: 7 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9082.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9082/head:pull/9082 PR: https://git.openjdk.org/jdk/pull/9082
Integrated: 8287903: Reduce runtime of java.math microbenchmarks
On Tue, 7 Jun 2022 12:34:25 GMT, Claes Redestad wrote: > - Reduce runtime by running fewer forks, fewer iterations, less warmup. All > micros tested in this group appear to stabilize very quickly. > - Refactor BigIntegers to avoid re-running some (most) micros over and over > with parameter values that don't affect them. > > Expected runtime down from 14 hours to 15 minutes. This pull request has now been integrated. Changeset: 7e948f7c Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/7e948f7ccbb4b9be04f5ecb65cc8dd72e3b495f4 Stats: 104 lines in 4 files changed: 66 ins; 33 del; 5 mod 8287903: Reduce runtime of java.math microbenchmarks Reviewed-by: ecaspole, aph - PR: https://git.openjdk.java.net/jdk/pull/9062
Re: RFR: 8287903: Reduce runtime of java.math microbenchmarks
On Tue, 7 Jun 2022 12:34:25 GMT, Claes Redestad wrote: > - Reduce runtime by running fewer forks, fewer iterations, less warmup. All > micros tested in this group appear to stabilize very quickly. > - Refactor BigIntegers to avoid re-running some (most) micros over and over > with parameter values that don't affect them. > > Expected runtime down from 14 hours to 15 minutes. Thanks for reviews! - PR: https://git.openjdk.java.net/jdk/pull/9062
Re: RFR: 8288011: StringConcatFactory: Split application of stringifiers
On Wed, 8 Jun 2022 14:44:33 GMT, Jim Laskey wrote: >> To take optimal advantage of the pre-existing optimization for repeated >> filters we could split the application of different types of stringifiers. >> >> The resulting difference in order of evaluation is not observable by >> conventional means since all reference types share the same object >> stringifier, and the others are filtering primitives (floats and doubles) >> which have been passed by value already. >> >> This change is neutral on many concatenation expression shapes, but for any >> complex expressions with interleaving float/double and reference parameters >> it brings a straightforward reduction in rebinds and underlying LFs >> generated. For example on the >> [MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248) >> test there's a modest 2% reduction in total classes loaded with this change >> (from 16209 to 15872) > > I make heavy use of the `mh = filterArgument(baseMH, 0, arrayOfFilters);` > pattern. I assumed it generated fewer LFs than making individual calls. Am I > wrong? @JimLaskey no, you're generally right to assume that bunching them up reduces the number of LFs created. There's a flaw in the `makeRepeatedFilterForm` transform that means the theoretical total amount of forms can surpass the baseline if you do _both_ one-by-one and all-at-once transforms from the same root MH. That's a very big if, but it's something I've felt slight unease about since I did that optimization. - PR: https://git.openjdk.java.net/jdk/pull/9082
Re: RFR: 8288011: StringConcatFactory: Split application of stringifiers
On Wed, 8 Jun 2022 13:42:11 GMT, Jorn Vernee wrote: > Looking at the code in MethodHandles::filterArguments, and this optimization. > It seems that for multiple filters we could potentially always generate just > a single new lambda form if we wanted, not just for repeated filters. > > It would be similar to `LambdaFormEditor::makeRepeatedFilterForm`. Just need > to extend the species data with one new `L_TYPE` field for each unique > filter, generate a new getCombiner `Name` for each unique combiner, pass in > an array of combiner types instead of just a single one, and pass in an > additional array of ints that marks the combiner used by each argument > position. Then do similar wiring up as we do now, but calling the > corresponding combiner for each argument pos. Right? > > (If I'm right about that, maybe that's even a more interesting direction to > explore, since it would benefit everyone, and the code in StringConcatFactory > could stay the same) You're right, but I think such a transform would add quite a bit of complexity. Back when I added the `makeRepeatedFilterForm` optimization (for the benefit of `StringConcatFactory`) I sought to strike a balance between the simple repeatedly-create-`filterArgumentForm`s transform we would do before and more complex forms that would remove more intermediates. I was also a bit concerned that such "shortcutting" transforms could mean we eliminate sharing, and the more complex we make things the harder it gets to reason about. Example:

    ...
    mh1 = filterArgument(baseMH, 0, filter);
    mh1 = filterArgument(mh1, 1, filter);
    mh2 = filterArgument(baseMH, 0, filter, filter);
    mh1.form() == mh2.form(); // false

The two different pathways end up with two different LFs, since `makeRepeatedFilterForm` shortcuts. In practice this might be a non-issue, but I figured it'd be nice if we could have a reasonable pathway to resolve mh1.form() and mh2.form() to the same thing in the future. - PR: https://git.openjdk.java.net/jdk/pull/9082
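The one-by-one versus all-at-once pathways can be exercised through the public API. A minimal sketch (the `concat`/`up` helpers are invented for the example; `LambdaForm` sharing itself is JDK-internal and not observable from outside, so this only demonstrates that the two handles behave identically):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class FilterSharing {
    static String concat(String a, String b) { return a + b; }
    static String up(String s) { return s.toUpperCase(); }

    static String runBoth() {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            MethodHandle base = l.findStatic(FilterSharing.class, "concat",
                    MethodType.methodType(String.class, String.class, String.class));
            MethodHandle filter = l.findStatic(FilterSharing.class, "up",
                    MethodType.methodType(String.class, String.class));

            // one-by-one: filter position 0, then position 1
            MethodHandle oneByOne = MethodHandles.filterArguments(base, 0, filter);
            oneByOne = MethodHandles.filterArguments(oneByOne, 1, filter);
            // all-at-once: filters applied to consecutive positions starting at 0
            MethodHandle allAtOnce = MethodHandles.filterArguments(base, 0, filter, filter);

            // Behaviorally identical; whether they share underlying LambdaForms
            // is an internal detail.
            return (String) oneByOne.invokeExact("a", "b") + ","
                    + (String) allAtOnce.invokeExact("a", "b");
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(runBoth());
    }
}
```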
Re: RFR: 8288011: StringConcatFactory: Split application of stringifiers
On Wed, 8 Jun 2022 12:11:30 GMT, Sergey Tsypanov wrote: >> To take optimal advantage of the pre-existing optimization for repeated >> filters we could split the application of different types of stringifiers. >> >> The resulting difference in order of evaluation is not observable by >> conventional means since all reference types share the same object >> stringifier, and the others are filtering primitives (floats and doubles) >> which have been passed by value already. >> >> This change is neutral on many concatenation expression shapes, but for any >> complex expressions with interleaving float/double and reference parameters >> it brings a straightforward reduction in rebinds and underlying LFs >> generated. For example on the >> [MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248) >> test there's a modest 2% reduction in total classes loaded with this change >> (from 16209 to 15872) > > src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line > 509: > >> 507: doubleFilters = new MethodHandle[ptypes.length]; >> 508: } >> 509: doubleFilters[i] = doubleStringifier(); > > Why do we have `ptypes[i] = int.class` for `int` and `short`, `ptypes[i] = > String.class` for `float`, `double` and `Object` and none for `boolean` and > `char`? `Object`, `float` and `double` arguments are all eagerly transformed to `String` (this is what the "stringifiers" do). Since we apply these as filters as a last step (which means that's the first thing that will run) we then must change the internal type to `String`. Ultimately this means we have fewer types to worry about in the combinator. We also widen integral subword types (`short` and `byte`) to `int`. This doesn't need to be adapted by filtering but can lean on implicit conversion (subword primitives are widened to int anyhow; what we care about is matching the correct logic). 
This mainly matters insofar as the same direct method handle will be used for these three types, as it can (and will) be adapted without modification by the `viewAsType` call that ensures the `MH` returned from the BSM has the exact type asked for. `boolean` and `char` types can't use the `int` mixers/prependers and instead have specialized methods that will be adapted into the concat MH tree. This means the parameter type stays `boolean` and `char` throughout, respectively. - PR: https://git.openjdk.java.net/jdk/pull/9082
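The subword widening described above can be seen with plain `MethodHandle.asType`: a hypothetical sketch (class and method names invented) showing that an `(int)String` handle adapts to `(short)String` with no filter handle at all, just an implicit widening conversion:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class Widening {
    static String shortStr(short s) {
        try {
            // A handle taking int...
            MethodHandle intToString = MethodHandles.lookup().findStatic(
                    Integer.class, "toString",
                    MethodType.methodType(String.class, int.class));
            // ...viewed as taking short: a pure widening conversion, no extra
            // filtering MethodHandle is created for the argument.
            MethodHandle shortToString = intToString.asType(
                    MethodType.methodType(String.class, short.class));
            return (String) shortToString.invokeExact(s);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(shortStr((short) 7));
    }
}
```

This is why `short`/`byte` don't need stringifier filters: the adaptation falls out of the method-handle type system for free.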
RFR: 8288011: StringConcatFactory: Split application of stringifiers
To take optimal advantage of the pre-existing optimization for repeated filters we could split the application of different types of stringifiers. The resulting difference in order of evaluation is not observable by conventional means since all reference types share the same object stringifier, and the others are filtering primitives (floats and doubles) which have been passed by value already. This change is neutral on many concatenation expression shapes, but for any complex expressions with interleaving float/double and reference parameters it brings a straightforward reduction in rebinds and underlying LFs generated. For example on the [MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248) test there's a modest 2% reduction in total classes loaded with this change (from 16209 to 15872) - Commit messages: - StringConcatFactory: Split application of stringifiers Changes: https://git.openjdk.java.net/jdk/pull/9082/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=9082&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8288011 Stats: 27 lines in 1 file changed: 16 ins; 3 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/9082.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/9082/head:pull/9082 PR: https://git.openjdk.java.net/jdk/pull/9082
Integrated: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles
On Fri, 27 May 2022 14:18:19 GMT, Claes Redestad wrote: > In preparation of #8855 this PR refactors the conversions from `List` to > array and array to `List`, reducing the number of conversions when calling > `MethodHandles.dropArguments` in particular. This removes about 5% of > allocations on the `StringConcatFactoryBootstraps` microbenchmark. This pull request has now been integrated. Changeset: ecf00785 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/ecf00785f21125d88f5cc18311f586a7bb6ddc56 Stats: 49 lines in 3 files changed: 9 ins; 6 del; 34 mod 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles Reviewed-by: jvernee - PR: https://git.openjdk.java.net/jdk/pull/8923
Integrated: 8287522: StringConcatFactory: Add in prependers and mixers in batches
On Mon, 23 May 2022 20:03:05 GMT, Claes Redestad wrote: > When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build | # classes | Runtime | > | --- | --- | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt 5 > 2170.039 ± 117.441 ns/op > SCFB.makeConcatWithConstants:·gc.alloc.rate.norm avgt 5 > 3538.020 ± 4.558 B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt 5 > 2144.807 ± 162.685 ns/op > SCFB.makeConcatWithConstants:·gc.alloc.rate.norm avgt 5 > 3184.943 ± 4.495 B/op This pull request has now been integrated. Changeset: 5c39a366 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/5c39a3664186b91512c6a6cfcd8aa0e9860614ea Stats: 468 lines in 4 files changed: 268 ins; 114 del; 86 mod 8287522: StringConcatFactory: Add in prependers and mixers in batches Reviewed-by: jlaskey, mchung - PR: https://git.openjdk.java.net/jdk/pull/8855
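The bootstrap method that these prepender/mixer batching changes optimize can also be invoked directly, which is handy for experimenting with concat shapes. A small sketch (class and method names invented; the `\u0001` characters in the recipe mark dynamic arguments, mirroring what javac emits for an `invokedynamic`-based concatenation):

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

class ScfDemo {
    static String run() {
        try {
            // Same bootstrap javac wires up for: someString + " = " + someInt
            CallSite cs = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(), "concat",
                    MethodType.methodType(String.class, String.class, int.class),
                    "\u0001 = \u0001"); // recipe: arg, literal " = ", arg
            return (String) cs.getTarget().invokeExact("x", 42);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Each distinct argument-type shape can force new classes (LambdaForms, Species classes) to be generated behind `cs.getTarget()`, which is what the class-count numbers in the table above measure.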
RFR: 8287903: Reduce runtime of java.math microbenchmarks
- Reduce runtime by running fewer forks, fewer iterations, less warmup. All micros tested in this group appear to stabilize very quickly. - Refactor BigIntegers to avoid re-running some (most) micros over and over with parameter values that don't affect them. Expected runtime down from 14 hours to 15 minutes. - Commit messages: - Cleanup SmallShifts, reduce param space - Annotations not cascading to static nested class - Warmup/Measurement not cascading to static nested class - Reduce runtime of java.math micros - Reduce runtime of java.math micros Changes: https://git.openjdk.java.net/jdk/pull/9062/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=9062&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8287903 Stats: 104 lines in 4 files changed: 66 ins; 33 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/9062.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/9062/head:pull/9062 PR: https://git.openjdk.java.net/jdk/pull/9062
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v8]
> When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build | # classes | Runtime | > | --- | --- | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt 5 > 2170.039 ± 117.441 ns/op > SCFB.makeConcatWithConstants:·gc.alloc.rate.norm avgt 5 > 3538.020 ± 4.558 B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt 5 > 2144.807 ± 162.685 ns/op > SCFB.makeConcatWithConstants:·gc.alloc.rate.norm avgt 5 > 3184.943 ± 4.495 B/op Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: - Merge branch 'master' into scf_chunked - We now don't need big Species classes for shorter concats, so on some tests the improvements meant more Species class generation. Adjusting HelloClasslist - Clarify that consts are always reduced to null, even if calling with constants. 
Also clarify that the number of constants is paramCount + 1 by refactoring to an array - Mandy review comment #1: Cleanup LF.basicType a bit more. - Cleanup too generic String.valueOf lookups - Address review comments from @ExE-Boss - Improve bootstrap microbenchmark to include more shapes, reduce runtime - Minor stylistic cleanup - Revert change to HelloClasslist (doesn't affect generation) - Reduce allocation adding in mixers by extracting constant arg lists and caching double arg mixers - ... and 7 more: https://git.openjdk.java.net/jdk/compare/98404245...afe61394 - Changes: - all: https://git.openjdk.java.net/jdk/pull/8855/files - new: https://git.openjdk.java.net/jdk/pull/8855/files/aaa442d5..afe61394 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=06-07 Stats: 59296 lines in 765 files changed: 33676 ins; 19450 del; 6170 mod Patch: https://git.openjdk.java.net/jdk/pull/8855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8855/head:pull/8855 PR: https://git.openjdk.java.net/jdk/pull/8855
Integrated: 8287785: Reduce runtime of java.lang.invoke microbenchmarks
On Fri, 3 Jun 2022 11:16:29 GMT, Claes Redestad wrote: > - Add explicit run configurations to java.lang.invoke micros, aiming to > reduce runtime while maintaining a decently high confidence that there's > enough warmup to produce good enough data. > > - Remove several trivial baseline micros, mainly those that only return a > static object: It's reasonable to have baseline microbenchmarks when the > baseline op is complex and you're mostly interested in checking the overhead > of doing the same thing via some MH API, but blackhole operations are now > shortcutting very quickly and timings don't differ from one type of object > to another, so we don't need a multitude of such baseline tests. > > Estimated runtime of `make test TEST=micro:java.lang.micro` (excluding build) > drops from just above 28 to just above 3 hours. This pull request has now been integrated. Changeset: 42261d75 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/42261d752a140325496ffdd40d9ad62b189d1b3e Stats: 439 lines in 47 files changed: 276 ins; 153 del; 10 mod 8287785: Reduce runtime of java.lang.invoke microbenchmarks Reviewed-by: mchung - PR: https://git.openjdk.java.net/jdk/pull/9012
Integrated: 8287810: Reduce runtime of java.lang microbenchmarks
On Fri, 3 Jun 2022 15:05:33 GMT, Claes Redestad wrote: > - Reduce runtime by providing explicit configurations that try to run the > micros as fast as possible while giving enough time for warmup. > - Remove some excessive parameters > - Remove some benchmarks that test experimental features > (ObjectHashCode.mode_) or that were simply dubious at large > (StringHttp) > > This reduces runtime of running `java.lang.[A-Z]` (only java.lang, not > java.lang.reflect etc.) from over 56 hours to 6:45. This pull request has now been integrated. Changeset: 778ed1a7 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/778ed1a760d8f673811914b75e5d14e465881c91 Stats: 293 lines in 39 files changed: 192 ins; 87 del; 14 mod 8287810: Reduce runtime of java.lang microbenchmarks Reviewed-by: mchung - PR: https://git.openjdk.java.net/jdk/pull/9015
Integrated: 8287798: Reduce runtime of java.lang.reflect/runtime microbenchmarks
On Fri, 3 Jun 2022 12:46:31 GMT, Claes Redestad wrote: > Add explicit run configurations to java.lang.reflect and .runtime micros, > aiming to reduce runtime while maintaining a decently high confidence that > there's enough warmup to produce good enough data. > > Roughly halves runtime of running `java.lang.reflect|java.lang.runtime` from > 159 to 78 minutes. This pull request has now been integrated. Changeset: aa6c568a Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/aa6c568a06fa92263d4b74ff979eb521ae953bc8 Stats: 27 lines in 4 files changed: 25 ins; 2 del; 0 mod 8287798: Reduce runtime of java.lang.reflect/runtime microbenchmarks Reviewed-by: jvernee, mchung - PR: https://git.openjdk.java.net/jdk/pull/9013
Re: RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles [v3]
> In preparation of #8855 this PR refactors the conversions from `List` to > array and array to `List`, reducing the number of conversions when calling > `MethodHandles.dropArguments` in particular. This removes about 5% of > allocations on the `StringConcatFactoryBootstraps` microbenchmark. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Jorn review: Drop boolean, rename internal variant dropArgumentsTrusted - Changes: - all: https://git.openjdk.java.net/jdk/pull/8923/files - new: https://git.openjdk.java.net/jdk/pull/8923/files/15ff2125..c590f0dd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8923&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8923&range=01-02 Stats: 13 lines in 3 files changed: 0 ins; 3 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/8923.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8923/head:pull/8923 PR: https://git.openjdk.java.net/jdk/pull/8923
Re: RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles [v2]
On Fri, 3 Jun 2022 16:48:11 GMT, Jorn Vernee wrote: >> I'm not so sure. >> >> First of all we're no worse than before with the defensive copying here. >> Second, an optimizing compiler might theoretically be able to see that the >> array we get from the toArray is always fresh and not escaping anywhere in >> any well-behaved collection, so the clone could be elided. But if not then >> both toArray and clone are intrinsified operations that are heavily >> optimized and pretty fast even when interpreting, so emulating it with an >> external loop might end up taking more time even at peak, while likely >> taking longer to reach that peak. Using dumb, shared and common code >> (especially things that get JITted early anyhow) is nice in areas that see >> most action during startup/warmup. > > Ok, please keep it the way it is in your current patch then. Correction: `toArray` isn't intrinsified, but common `List` implementations are likely to implement it with a `System.arraycopy`, which is. - PR: https://git.openjdk.java.net/jdk/pull/8923
Re: RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles [v2]
On Fri, 3 Jun 2022 15:06:48 GMT, Jorn Vernee wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Review comments, eagerly convert sooner in tryFinally > > src/java.base/share/classes/java/lang/invoke/MethodHandles.java line 5269: > >> 5267: } >> 5268: >> 5269: static MethodHandle dropArguments(MethodHandle target, int pos, >> Class<?>[] valueTypes, boolean trusted) { > > Having a boolean that flips the behaviour makes it harder to know what that > `true` or `false` literal at the call site does. I'd suggest splitting this > into `dropArgumentsTrusted`, which doesn't clone the array, and > `dropArguments`, which always makes a copy (and the latter calls the former). WDYT? Yeah, that sounds better. - PR: https://git.openjdk.java.net/jdk/pull/8923
Re: RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles [v2]
On Fri, 3 Jun 2022 15:18:52 GMT, Jorn Vernee wrote: >> src/java.base/share/classes/java/lang/invoke/MethodHandles.java line 5266: >> >>> 5264: */ >>> 5265: public static MethodHandle dropArguments(MethodHandle target, int >>> pos, List<Class<?>> valueTypes) { >>> 5266: return dropArguments(target, pos, valueTypes.toArray(new >>> Class<?>[0]), false); >> >> A bit unfortunate that we can't trust this `toArray` to do a copy. I was >> going to suggest Stream, but it has the same issue (someone might have a >> custom stream implementation). >> >> I do think a manual copy of the array is possible by having a loop though: >> Suggestion: >> >> Class<?>[] ptypes = new Class<?>[valueTypes.size()]; >> for (int i = 0; i < ptypes.length; i++) { >> ptypes[i] = valueTypes.get(i); >> } >> return dropArguments(target, pos, ptypes, false); >> >> (or, maybe extract such a loop to a helper method for clarity). > > The same could be done for the public `dropArgumentsToMatch` I think. I'm not so sure. First of all we're no worse than before with the defensive copying here. Second, an optimizing compiler might theoretically be able to see that the array we get from the toArray is always fresh and not escaping anywhere in any well-behaved collection, so the clone could be elided. But if not then both toArray and clone are intrinsified operations that are heavily optimized and pretty fast even when interpreting, so emulating it with an external loop might end up taking more time even at peak, while likely taking longer to reach that peak. Using dumb, shared and common code (especially things that get JITted early anyhow) is nice in areas that see most action during startup/warmup. - PR: https://git.openjdk.java.net/jdk/pull/8923
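The defensive-copy concern being discussed can be illustrated with a deliberately misbehaving `List` whose `toArray` hands out a shared array. Everything below is an invented sketch, not JDK code; it shows why an API cannot trust `toArray` on an arbitrary `List` and why a follow-up `clone()` (or manual copy) severs the caller's access:

```java
import java.util.ArrayList;
import java.util.List;

class DefensiveCopy {
    // A hostile/broken List: toArray returns the same array on every call,
    // so a caller could mutate it after the API has "converted" the list.
    static class SharingList extends ArrayList<Class<?>> {
        final Class<?>[] shared = { String.class };
        @Override public <T> T[] toArray(T[] a) {
            @SuppressWarnings("unchecked")
            T[] t = (T[]) shared; // never copies
            return t;
        }
    }

    static Class<?>[] snapshot(List<Class<?>> valueTypes) {
        // toArray may or may not copy for arbitrary implementations,
        // so clone before trusting the contents.
        return valueTypes.toArray(new Class<?>[0]).clone();
    }

    static Class<?> demo() {
        SharingList l = new SharingList();
        Class<?>[] snap = snapshot(l);
        l.shared[0] = Integer.class; // mutation after the snapshot...
        return snap[0];              // ...does not leak into the copy
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

For well-behaved lists (`ArrayList` etc.) the clone is a redundant copy, which is exactly the trade-off weighed in the thread above.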
RFR: 8287810: Reduce runtime of java.lang microbenchmarks
- Reduce runtime by providing explicit configurations that try to run the micros as fast as possible while giving enough time for warmup. - Remove some excessive parameters - Remove some benchmarks that test experimental features (ObjectHashCode.mode_) or that were simply dubious at large (StringHttp) This reduces runtime of running `java.lang.[A-Z]` (only java.lang, not java.lang.reflect etc.) from over 56 hours to 6:45. - Commit messages: - Reduce runtime of java.lang microbenchmarks Changes: https://git.openjdk.java.net/jdk/pull/9015/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=9015&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8287810 Stats: 293 lines in 39 files changed: 192 ins; 87 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/9015.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/9015/head:pull/9015 PR: https://git.openjdk.java.net/jdk/pull/9015
RFR: 8287798: Reduce runtime in java.lang.reflect/runtime microbenchmarks
Add explicit run configurations to java.lang.reflect and .runtime micros, aiming to reduce runtime while maintaining a decently high confidence that there's enough warmup to produce good enough data. Roughly halves runtime of running `java.lang.reflect|java.lang.runtime` from 159 to 78 minutes. - Commit messages: - Reduce runtime in java.lang.reflect/runtime microbenchmarks Changes: https://git.openjdk.java.net/jdk/pull/9013/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=9013&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8287798 Stats: 27 lines in 4 files changed: 25 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/9013.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/9013/head:pull/9013 PR: https://git.openjdk.java.net/jdk/pull/9013
Re: RFR: 8287785: Reduce runtime of java.lang.invoke microbenchmarks [v2]
On Fri, 3 Jun 2022 12:37:16 GMT, liach wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Remove unused cached field in LookupAcquire > > test/micro/org/openjdk/bench/java/lang/invoke/LookupAcquire.java line 59: > >> 57: @Setup >> 58: public void setup() { >> 59: cached = MethodHandles.lookup(); > > Since the benchmark using the cached object is removed, should this field be > removed as well? Fixed! - PR: https://git.openjdk.java.net/jdk/pull/9012
Re: RFR: 8287785: Reduce runtime of java.lang.invoke microbenchmarks [v2]
> - Add explicit run configurations to java.lang.invoke micros, aiming to > reduce runtime while maintaining a decently high confidence that there's > enough warmup to produce good enough data. > > - Remove several trivial baseline micros, mainly those that only return a > static object: It's reasonable to have baseline microbenchmarks when the > baseline op is complex and you're mostly interested in checking the overhead > of doing the same thing via some MH API, but blackhole operations are now > shortcutting very quickly and timings don't differ from one type of object > to another, so we don't need a multitude of such baseline tests. > > Estimated runtime of `make test TEST=micro:java.lang.micro` (excluding build) > drops from just above 28 to just above 3 hours. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Remove unused cached field in LookupAcquire - Changes: - all: https://git.openjdk.java.net/jdk/pull/9012/files - new: https://git.openjdk.java.net/jdk/pull/9012/files/f351ac70..0a1f7ec7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=9012&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=9012&range=00-01 Stats: 11 lines in 1 file changed: 0 ins; 11 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/9012.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/9012/head:pull/9012 PR: https://git.openjdk.java.net/jdk/pull/9012
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v7]
On Fri, 3 Jun 2022 11:19:04 GMT, ExE Boss wrote: >> You would think that, but javac doesn't do anything fancy once you store to >> a local >> >> javap output for lines 108 through 111: >> >> 449: ldc #148 // float 0.1f >> 451: fstore 31 >> 453: fload 31 >> 455: invokedynamic #149, 0 // InvokeDynamic #4:makeConcatWithConstants:(F)Ljava/lang/String; >> 460: astore 32 >> 462: fload 31 >> 464: aload 4 >> 466: invokedynamic #152, 0 // InvokeDynamic #7:makeConcatWithConstants:(FLjava/lang/String;)Ljava/lang/String; >> 471: astore 33 >> 473: aload 4 >> 475: fload 31 >> 477: invokedynamic #155, 0 // InvokeDynamic #10:makeConcatWithConstants:(Ljava/lang/String;F)Ljava/lang/String; > I guess it only happens for `final` locals. Yes, looks like javac does constant-fold when the local is explicitly final:

    final float f = 0.1f;
    System.out.println("const folding? " + f);

javap:

    0: getstatic     #2 // Field java/lang/System.out:Ljava/io/PrintStream;
    3: ldc           #3 // String const folding? 0.1
    5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V

Maybe this constant folding is something javac could be enhanced to do on effectively final locals, though I think we can defer adjusting `HelloClasslist` to break that optimization when that time comes. - PR: https://git.openjdk.java.net/jdk/pull/8855
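The folding behavior discussed here is easy to check at the source level: a `final` local initialized from a constant expression is a constant variable under the JLS, so the concatenation folds to a single string constant at compile time, while a merely effectively-final local goes through `makeConcatWithConstants`. Both produce the same string at runtime (class name invented for the sketch):

```java
class ConstFold {
    static String folded() {
        final float f = 0.1f;         // constant variable: final + constant initializer
        return "const folding? " + f; // javac folds this to one ldc of the full string
    }

    static String notFolded() {
        float f = 0.1f;               // effectively final, but NOT a constant variable
        return "const folding? " + f; // emitted as invokedynamic makeConcatWithConstants
    }

    public static void main(String[] args) {
        // Same text either way; only the generated bytecode differs.
        System.out.println(folded());
        System.out.println(notFolded());
    }
}
```

This is why `HelloClasslist` stores values into plain locals first: it keeps the concat expressions from being folded away, so the bootstrap paths being exercised actually run.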
RFR: 8287785: Reduce runtime of java.lang.invoke microbenchmarks
- Add explicit run configurations to java.lang.invoke micros, aiming to reduce runtime while maintaining a decently high confidence that there's enough warmup to produce good enough data. - Remove several trivial baseline micros, mainly those that only return a static object: It's reasonable to have baseline microbenchmarks when the baseline op is complex and you're mostly interested in checking the overhead of doing the same thing via some MH API, but blackhole operations are now shortcutting very quickly and timings don't differ from one type of object to another, so we don't need a multitude of such baseline tests. Estimated runtime of `make test TEST=micro:java.lang.micro` (excluding build) drops from just above 28 to just above 3 hours. - Commit messages: - 8287785: Apply explicit run configuration to java.lang.invoke microbenchmarks Changes: https://git.openjdk.java.net/jdk/pull/9012/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=9012&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8287785 Stats: 429 lines in 47 files changed: 276 ins; 142 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/9012.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/9012/head:pull/9012 PR: https://git.openjdk.java.net/jdk/pull/9012
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v7]
On Thu, 2 Jun 2022 18:49:13 GMT, ExE Boss wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> We now don't need big Species classes for shorter concats, so on some >> tests the improvements meant more Species class generation. Adjusting >> HelloClasslist > make/jdk/src/classes/build/tools/classlist/HelloClasslist.java line 116: > >> 114: String CC = "string" + c; >> 115: String CCS = "string" + c + s; >> 116: String CSCC = "string" + s + "string" + c; > I'm pretty sure that `f` and `c` will be inlined as compile-time constants in the String concat recipes. You would think that, but javac doesn't do anything fancy once you store to a local javap output for lines 108 through 111: 449: ldc #148 // float 0.1f 451: fstore 31 453: fload 31 455: invokedynamic #149, 0 // InvokeDynamic #4:makeConcatWithConstants:(F)Ljava/lang/String; 460: astore 32 462: fload 31 464: aload 4 466: invokedynamic #152, 0 // InvokeDynamic #7:makeConcatWithConstants:(FLjava/lang/String;)Ljava/lang/String; 471: astore 33 473: aload 4 475: fload 31 477: invokedynamic #155, 0 // InvokeDynamic #10:makeConcatWithConstants:(Ljava/lang/String;F)Ljava/lang/String; - PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v7]
> When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build |# classes| Runtime | > | --- | - | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2170.039 ? 117.441 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3538.020 ?4.558B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2144.807 ? 162.685 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3184.943 ?4.495B/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: We now don't need big Species classes for shorter concats, so on some tests the improvements meant more Species class generation. 
Adjusting HelloClasslist - Changes: - all: https://git.openjdk.java.net/jdk/pull/8855/files - new: https://git.openjdk.java.net/jdk/pull/8855/files/444cdd39..aaa442d5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=05-06 Stats: 18 lines in 1 file changed: 17 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/8855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8855/head:pull/8855 PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v6]
> When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build |# classes| Runtime | > | --- | - | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2170.039 ? 117.441 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3538.020 ?4.558B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2144.807 ? 162.685 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3184.943 ?4.495B/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Clarify that consts are always reduced to null, even if calling with constants. 
Also clarify that the number of constants is paramCount + 1 by refactoring to an array - Changes: - all: https://git.openjdk.java.net/jdk/pull/8855/files - new: https://git.openjdk.java.net/jdk/pull/8855/files/56bdc4c4..444cdd39 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=04-05 Stats: 55 lines in 1 file changed: 15 ins; 13 del; 27 mod Patch: https://git.openjdk.java.net/jdk/pull/8855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8855/head:pull/8855 PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v2]
On Tue, 31 May 2022 23:15:26 GMT, Mandy Chung wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Minor stylistic cleanup > > src/java.base/share/classes/java/lang/invoke/LambdaForm.java line 204: > >> 202: static BasicType basicType(Class type) { >> 203: if (!type.isPrimitive()) return L_TYPE; >> 204: return >> basicType(Wrapper.forPrimitiveType(type).basicTypeChar()); > > Suggestion: > > return basicType(Wrapper.basicTypeChar(type)); I'll go one step further then and also remove the line before it too. - PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v5]
> When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build |# classes| Runtime | > | --- | - | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2170.039 ? 117.441 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3538.020 ?4.558B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2144.807 ? 162.685 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3184.943 ?4.495B/op Claes Redestad has updated the pull request incrementally with two additional commits since the last revision: - Mandy review comment #1: Cleanup LF.basicType a bit more. 
- Cleanup too generic String.valueOf lookups - Changes: - all: https://git.openjdk.java.net/jdk/pull/8855/files - new: https://git.openjdk.java.net/jdk/pull/8855/files/f1742a84..56bdc4c4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=03-04 Stats: 18 lines in 2 files changed: 1 ins; 8 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/8855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8855/head:pull/8855 PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v4]
On Wed, 1 Jun 2022 20:51:50 GMT, Mandy Chung wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Address review comments from @ExE-Boss > src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line > 459: > >> 457: String prefix = constants.get(0); >> 458: if (prefix == null) { >> 459: if (suffix == null) { > > Is it possible for prefix or suffix to be an empty string (non-null)? The > original code checks `if (constant.isEmpty())` No: empty string constants would lead to adding a null to the list of constants in the prologue. - PR: https://git.openjdk.java.net/jdk/pull/8855
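The answer relies on a prologue invariant: empty constant strings are normalized to null before the prefix/suffix checks run, so downstream code only ever tests for null. A tiny hypothetical sketch of that normalization (not the actual JDK code):

```java
import java.util.*;

public class ConstantsPrologue {
    // Hypothetical helper mirroring the invariant: "" is stored as null up
    // front, so later code can use a single null check instead of isEmpty().
    static List<String> normalize(List<String> constants) {
        List<String> out = new ArrayList<>(constants.size());
        for (String c : constants) {
            out.add(c == null || c.isEmpty() ? null : c);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(normalize(Arrays.asList("", "str", null)));
    }
}
```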
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v4]
> When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build |# classes| Runtime | > | --- | - | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2170.039 ? 117.441 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3538.020 ?4.558B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2144.807 ? 162.685 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3184.943 ?4.495B/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Address review comments from @ExE-Boss - Changes: - all: https://git.openjdk.java.net/jdk/pull/8855/files - new: https://git.openjdk.java.net/jdk/pull/8855/files/4221b348..f1742a84 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=02-03 Stats: 17 lines in 1 file changed: 0 ins; 5 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/8855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8855/head:pull/8855 PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v3]
> When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build |# classes| Runtime | > | --- | - | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2170.039 ? 117.441 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3538.020 ?4.558B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2144.807 ? 
162.685 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3184.943 ?4.495B/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Improve bootstrap microbenchmark to include more shapes, reduce runtime - Changes: - all: https://git.openjdk.java.net/jdk/pull/8855/files - new: https://git.openjdk.java.net/jdk/pull/8855/files/263db625..4221b348 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=01-02 Stats: 25 lines in 1 file changed: 19 ins; 1 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/8855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8855/head:pull/8855 PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8282662: Use List.of() factory method to reduce memory consumption [v7]
On Tue, 31 May 2022 07:40:56 GMT, Сергей Цыпанов wrote: >> `List.of()` along with `Set.of()` create unmodifiable `List/Set` but with >> smaller footprint compared to `Arrays.asList()` / `new HashSet()` when >> called with varargs of size 0, 1, 2. >> >> In general, replacement of `Arrays.asList()` with `List.of()` is dubious as >> the latter is null-hostile, however in some cases we are sure that arguments >> are non-null. Into this PR I've included the following cases (in addition to >> those where the argument is proved to be non-null at compile-time): >> - `MethodHandles.longestParameterList()` never returns null >> - parameter types are never null >> - interfaces used for proxy construction and returned from >> `Class.getInterfaces()` are never null >> - exception types of method signatures are never null > > Сергей Цыпанов has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains six commits: > > - 8282662: Revert wrong copyright year change > - 8282662: Revert ProxyGenerator > - 8282662: Revert ProxyGenerator > - 8282662: Revert dubious changes in MethodType > - 8282662: Revert dubious changes > - 8282662: Use List/Set.of() factory methods to save memory Marked as reviewed by redestad (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/7729
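The null-hostility that makes a blanket `Arrays.asList()` → `List.of()` replacement dubious is easy to demonstrate with a stand-alone snippet (no JDK-internal code involved):

```java
import java.util.*;

public class ListOfDemo {
    // List.of throws NullPointerException eagerly on any null element;
    // Arrays.asList tolerates nulls.
    static boolean rejectsNull() {
        try {
            List.of("a", (String) null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> compact = List.of("a", "b");        // unmodifiable, compact
        List<String> legacy = Arrays.asList("a", null);  // nulls are fine here
        System.out.println(compact + " / " + legacy + " / rejectsNull=" + rejectsNull());
    }
}
```

This is why the PR limits the replacement to call sites where the arguments are provably non-null.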
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches [v2]
> When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build |# classes| Runtime | > | --- | - | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2170.039 ? 117.441 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3538.020 ?4.558B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2144.807 ? 162.685 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3184.943 ?4.495B/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Minor stylistic cleanup - Changes: - all: https://git.openjdk.java.net/jdk/pull/8855/files - new: https://git.openjdk.java.net/jdk/pull/8855/files/114d3fd8..263db625 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=00-01 Stats: 26 lines in 1 file changed: 6 ins; 10 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/8855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8855/head:pull/8855 PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches
On Mon, 23 May 2022 20:03:05 GMT, Claes Redestad wrote: > When generating `MethodHandle`-based concatenation expressions in > `StringConcatFactory` we can reduce the number of classes generated at > runtime by creating small batches of prependers and mixers before binding > them into the root expression tree. > > Improvements on one-off tests are modest, while the improvement on > bootstrapping stress tests can be substantial > ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): > > | Build |# classes| Runtime | > | --- | - | --- | > | Baseline | 31119 | 2.942s | > | Patch | 16208 | 1.958s | > > An earlier version of this patch saw a regression in the > `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along > with the optimizations in #8881 and #8923 that is no longer the case, and > allocation pressure is down slightly compared to the baseline on such a > repeat-the-same-bootstrap stress test: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2170.039 ? 117.441 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3538.020 ?4.558B/op > > This patch: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt5 > 2144.807 ? 162.685 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt5 > 3184.943 ?4.495B/op Before moving forward with this I've been looking at slightly more expensive bootstrapping of already known shapes. On one such bootstrap microbenchmark, `StringConcatFactoryBootstraps` this patch would cause a ~25% slowdown (and a 33% increase in allocations). Not too concerning, but since the purpose of this patch is to reduce String concat bootstrapping overhead it would be nice to not trade that for a localized regression. First improvement was to pack `TransformKey`s better: #8881 - which helps both variants, but doesn't really narrow the gap. 
We can improve further outside of `StringConcatFactory` to prepare for this RFE by reducing the number of List-to-array and array-to-List conversions in `MethodHandles.dropArguments` (-280b/op). Another candidate is to turn `LambdaFormEditor` either into a static utility or fold it into `LambdaForm` proper (TBD if this is worthwhile; they're tightly coupled, and all state maintained by `LambdaFormEditor` lives in `LambdaForm`). - PR: https://git.openjdk.java.net/jdk/pull/8855
RFR: 8287522: StringConcatFactory: Add in prependers and mixers in batches
When generating `MethodHandle`-based concatenation expressions in `StringConcatFactory` we can reduce the number of classes generated at runtime by creating small batches of prependers and mixers before binding them into the root expression tree. Improvements on one-off tests are modest, while the improvement on bootstrapping stress tests can be substantial ([MixedStringCombinations.java](https://gist.github.com/cl4es/08fb581dece3a73e89bfa52337bc4248)): | Build | # classes | Runtime | | --- | - | --- | | Baseline | 31119 | 2.942s | | Patch | 16208 | 1.958s | An earlier version of this patch saw a regression in the `StringConcatFactoryBootstraps` microbenchmark. After some refactoring along with the optimizations in #8881 and #8923 that is no longer the case, and allocation pressure is down slightly compared to the baseline on such a repeat-the-same-bootstrap stress test: Baseline: Benchmark Mode Cnt Score Error Units SCFB.makeConcatWithConstants avgt 5 2170.039 ± 117.441 ns/op SCFB.makeConcatWithConstants:·gc.alloc.rate.norm avgt 5 3538.020 ± 4.558 B/op This patch: Benchmark Mode Cnt Score Error Units SCFB.makeConcatWithConstants avgt 5 2144.807 ± 162.685 ns/op SCFB.makeConcatWithConstants:·gc.alloc.rate.norm avgt 5 3184.943 ± 4.495 B/op - Commit messages: - Revert change to HelloClasslist (doesn't affect generation) - Reduce allocation adding in mixers by extracting constant arg lists and caching double arg mixers - De-CHM the prepender and mixer caches - Reduce allocations: Extract constant argument lists, cache base form for two-arg prependers - Refactor, reduce allocations - Merge branch 'master' into scf_chunked - Improve chunking further - Fix mis-merged change - Experiment: Chunk prependers and mixers to reduce number of rebinds and generated LF classes Changes: https://git.openjdk.java.net/jdk/pull/8855/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8855&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8287522 Stats: 390 lines in 2 files changed: 232 ins; 99 del; 59 mod Patch: https://git.openjdk.java.net/jdk/pull/8855.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8855/head:pull/8855 PR: https://git.openjdk.java.net/jdk/pull/8855
Re: RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles [v2]
> In preparation of #8855 this PR refactors the conversions from `List` to > array and array to `List`, reducing the number of conversions when calling > `MethodHandles.dropArguments` in particular. This remove about ~5% of > allocations on the `StringConcatFactoryBootstraps` microbenchmark. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Review comments, eagerly convert sooner in tryFinally - Changes: - all: https://git.openjdk.java.net/jdk/pull/8923/files - new: https://git.openjdk.java.net/jdk/pull/8923/files/019e2ef8..15ff2125 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8923&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8923&range=00-01 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/8923.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8923/head:pull/8923 PR: https://git.openjdk.java.net/jdk/pull/8923
Re: RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles
On Fri, 27 May 2022 19:27:44 GMT, ExE Boss wrote: > If `parameterList` is too slow because `List.of` copies the backing parameter > types array, wouldn't calling > `JavaUtilCollectionAccess::listFromTrustedArray` be a better alternative, as it > only allocates one wrapper and has better performance on `subList` than > `Arrays.copyOfRange`? Good observation that `MethodType::parameterList()` is a plausible candidate for `listFromTrustedArray`. That could be a good standalone and orthogonal RFE. I recall @rose00 musing about how he'd prefer it if `MethodType` and friends had been implemented with `List.of`-style immutable lists rather than arrays. While I agree with him in principle, we now have a mix of both worlds where we're converting between either representation haphazardly, making the implementation more complex in the process. It seems better for now to streamline the implementation with an early conversion to what we use internally (arrays) and stick with that. - PR: https://git.openjdk.java.net/jdk/pull/8923
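The defensive copy that `listFromTrustedArray` (a JDK-internal accessor) would skip can be observed from the public API: `List.of(array)` must copy, because the caller keeps a reference to the array. A small stand-alone illustration:

```java
import java.util.*;

public class TrustedArrayCopy {
    // List.of copies its varargs array defensively, so later mutation of the
    // source array cannot leak into the "immutable" list. A trusted-array
    // wrapper could skip this copy - which is exactly the proposed saving.
    static String firstAfterMutation() {
        String[] params = {"int", "long"};
        List<String> view = List.of(params);
        params[0] = "mutated";
        return view.get(0); // still "int"
    }

    public static void main(String[] args) {
        System.out.println(firstAfterMutation());
    }
}
```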
Re: RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles
On Fri, 27 May 2022 14:18:19 GMT, Claes Redestad wrote: > In preparation of #8855 this PR refactors the conversions from `List` to > array and array to `List`, reducing the number of conversions when calling > `MethodHandles.dropArguments` in particular. This removes about 5% of > allocations on the `StringConcatFactoryBootstraps` microbenchmark. It might be a bit too paranoid in this instance (since we don't keep the array around for long), but not cloning the result of calling `toArray` on an arbitrary and possibly adversarial `List` could open up TOCTOU race bugs / attacks. The existing code was being paranoid and copying, and I don't want to weaken something that could have security implications without double- and triple-checking that it's safe to do so. - PR: https://git.openjdk.java.net/jdk/pull/8923
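A contrived stand-alone illustration of the TOCTOU concern; the adversarial list here is hypothetical, not taken from the JDK:

```java
import java.util.*;

public class ToctouDemo {
    static final Object[] SHARED = {"validated"};

    // A hypothetical adversarial List: toArray() leaks its internal array,
    // so the attacker can mutate elements after the consumer's checks ran.
    static final List<Object> evil = new AbstractList<>() {
        public Object get(int i) { return SHARED[i]; }
        public int size() { return SHARED.length; }
        public Object[] toArray() { return SHARED; }
    };

    // Cloning the toArray() result takes a snapshot, closing the TOCTOU window.
    static Object[] snapshot(List<Object> list) {
        return list.toArray().clone();
    }

    public static void main(String[] args) {
        Object[] snap = snapshot(evil);
        SHARED[0] = "swapped-after-check"; // attacker races after validation
        System.out.println(snap[0]);       // still "validated"
    }
}
```

Without the `clone()`, the consumer would observe whatever the attacker wrote after validation.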
Re: RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles
On Fri, 27 May 2022 14:18:19 GMT, Claes Redestad wrote: > In preparation of #8855 this PR refactors the conversions from `List` to > array and array to `List`, reducing the number of conversions when calling > `MethodHandles.dropArguments` in particular. This removes about 5% of > allocations on the `StringConcatFactoryBootstraps` microbenchmark. Another datapoint is the `MethodHandlesDropArguments.interCreate` microbenchmark which explicitly tests `MethodHandles.dropArguments` (with a minimal argument list): Before: Benchmark Mode Cnt Score Error Units MethodHandlesDropArguments.interCreate avgt 5 136.778 ± 1.265 ns/op MethodHandlesDropArguments.interCreate:·gc.alloc.rate.norm avgt 5 264.020 ± 0.021 B/op Patch: Benchmark Mode Cnt Score Error Units MethodHandlesDropArguments.interCreate avgt 5 120.536 ± 3.133 ns/op MethodHandlesDropArguments.interCreate:·gc.alloc.rate.norm avgt 5 220.818 ± 41.309 B/op - PR: https://git.openjdk.java.net/jdk/pull/8923
RFR: 8287442: Reduce list to array conversions in java.lang.invoke.MethodHandles
In preparation of #8855 this PR refactors the conversions from `List` to array and array to `List`, reducing the number of conversions when calling `MethodHandles.dropArguments` in particular. This removes about 5% of allocations on the `StringConcatFactoryBootstraps` microbenchmark. - Commit messages: - Move all conversions closer to the edge - Merge master - Streamline MHs.dropArguments parameter handling - Missed correctly taking b1 into account in of(byte, int, int...) (java/lang/String/concat/ImplicitStringConcatShapes.java test failure) - Address review comments, fix unsafe shift, rework and remove ofBothArrays - Revert unrelated and unverified hashCode change - Improve TransformKey to pack more kinds of transforms efficiently Changes: https://git.openjdk.java.net/jdk/pull/8923/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8923&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8287442 Stats: 46 lines in 3 files changed: 9 ins; 3 del; 34 mod Patch: https://git.openjdk.java.net/jdk/pull/8923.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8923/head:pull/8923 PR: https://git.openjdk.java.net/jdk/pull/8923
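`MethodHandles.dropArguments` itself is public API, so the call shape whose conversions this PR streamlines can be shown in isolation:

```java
import java.lang.invoke.*;

public class DropArgsDemo {
    // Insert an ignored leading int parameter:
    // (String,String)String  ->  (int,String,String)String
    static String call() throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle concat = lookup.findVirtual(String.class, "concat",
                MethodType.methodType(String.class, String.class));
        MethodHandle dropped = MethodHandles.dropArguments(concat, 0, int.class);
        return (String) dropped.invokeExact(42, "foo", "bar"); // 42 is ignored
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(call()); // foobar
    }
}
```

Each such `dropArguments` call internally juggles parameter lists and arrays, which is why trimming the List/array conversions shows up on bootstrap-heavy benchmarks.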
Integrated: 8287292: Improve TransformKey to pack more kinds of transforms efficiently
On Wed, 25 May 2022 09:38:08 GMT, Claes Redestad wrote: > The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` > allows keys to be compacted when all byte values of the key fit in 4 bits, > otherwise a byte array is allocated and used. This means that all transforms > with a kind value above 15 will be forced to allocate and use array > comparisons. > > Removing unused and folding some transforms to ensure all existing kinds can > fit snugly within the 0-15 value range realizes a minor improvement to > footprint, speed and allocation pressure of affected transforms, e.g. > ~300 bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt 15 > 2048.475 ± 69.887 ns/op > SCFB.makeConcatWithConstants:·gc.alloc.rate.norm avgt 15 > 3487.311 ± 80.385 B/op > > > Patched: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt 15 > 1961.985 ± 101.519 ns/op > SCFB.makeConcatWithConstants:·gc.alloc.rate.norm avgt 15 > 3156.478 ± 183.600 B/op This pull request has now been integrated. Changeset: be933185 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/be93318576896e8f5f9733ae1f7e3e74d63f5594 Stats: 124 lines in 2 files changed: 70 ins; 21 del; 33 mod 8287292: Improve TransformKey to pack more kinds of transforms efficiently Reviewed-by: jlaskey, jvernee, mchung - PR: https://git.openjdk.java.net/jdk/pull/8881
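The compaction idea behind this change can be sketched in isolation. This is a simplified, hypothetical model (`tryPack` is not the actual `TransformKey` code): when every value fits in 4 bits the whole key packs into one primitive `long`, and any value above 15 forces the byte-array fallback with its slower array comparisons.

```java
public class PackedKeyDemo {
    // Hypothetical: pack up to 15 nibble-sized values into one long,
    // or return -1 to signal the byte-array fallback path.
    static long tryPack(byte... values) {
        if (values.length > 15) return -1;
        long packed = 1; // leading 1 bit pattern marks how many nibbles follow
        for (byte v : values) {
            if ((v & 0xFFFFFFF0) != 0) return -1; // value needs more than 4 bits
            packed = (packed << 4) | v;
        }
        return packed;
    }

    public static void main(String[] args) {
        System.out.println(tryPack((byte) 3, (byte) 7)); // packs into a long
        System.out.println(tryPack((byte) 16));          // -1: array fallback
    }
}
```

Packed keys compare with a single `long` comparison and allocate nothing, which is why squeezing every kind into the 0-15 range pays off.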
Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently [v3]
On Wed, 25 May 2022 14:13:52 GMT, Claes Redestad wrote: >> The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` >> allows keys to be compacted when all byte values of the key fit in 4 bits, >> otherwise a byte array is allocated and used. This means that all transforms >> with a kind value above 15 will be forced to allocate and use array >> comparisons. >> >> Removing unused and folding some transforms to ensure all existing kinds can >> fit snugly within the 0-15 value range realize a minor improvement to >> footprint, speed and allocation pressure of affected transforms, e.g. >> ~300bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark: >> >> Baseline: >> >> Benchmark Mode Cnt >> Score Error Units >> SCFB.makeConcatWithConstants avgt 15 >> 2048.475 ? 69.887 ns/op >> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt 15 >> 3487.311 ? 80.385B/op >> >> >> Patched: >> >> Benchmark Mode Cnt >> Score Error Units >> SCFB.makeConcatWithConstants avgt 15 >> 1961.985 ? 101.519 ns/op >> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt 15 >> 3156.478 ? 183.600B/op > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Missed correctly taking b1 into account in of(byte, int, int...) > (java/lang/String/concat/ImplicitStringConcatShapes.java test failure) Thanks! - PR: https://git.openjdk.java.net/jdk/pull/8881
Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently [v4]
> The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` > allows keys to be compacted when all byte values of the key fit in 4 bits, > otherwise a byte array is allocated and used. This means that all transforms > with a kind value above 15 will be forced to allocate and use array > comparisons. > > Removing unused and folding some transforms to ensure all existing kinds can > fit snugly within the 0-15 value range realize a minor improvement to > footprint, speed and allocation pressure of affected transforms, e.g. > ~300bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark: > > Baseline: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt 15 > 2048.475 ? 69.887 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt 15 > 3487.311 ? 80.385B/op > > > Patched: > > Benchmark Mode Cnt > Score Error Units > SCFB.makeConcatWithConstants avgt 15 > 1961.985 ? 101.519 ns/op > SCFB.makeConcatWithConstants:?gc.alloc.rate.norm avgt 15 > 3156.478 ? 183.600B/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Address Mandy review comments - Changes: - all: https://git.openjdk.java.net/jdk/pull/8881/files - new: https://git.openjdk.java.net/jdk/pull/8881/files/612b4ece..bcd79aad Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8881&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8881&range=02-03 Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/8881.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8881/head:pull/8881 PR: https://git.openjdk.java.net/jdk/pull/8881
Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently [v3]
> The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` allows keys to be compacted when all byte values of the key fit in 4 bits; otherwise a byte array is allocated and used. This means that all transforms with a kind value above 15 are forced to allocate and use array comparisons.
>
> Removing unused transforms and folding others so that all existing kinds fit snugly within the 0-15 value range realizes a minor improvement to the footprint, speed and allocation pressure of affected transforms, e.g. a ~300 bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark:
>
> Baseline:
>
> Benchmark                                         Mode  Cnt     Score     Error  Units
> SCFB.makeConcatWithConstants                      avgt   15  2048.475 ±  69.887  ns/op
> SCFB.makeConcatWithConstants:·gc.alloc.rate.norm  avgt   15  3487.311 ±  80.385   B/op
>
> Patched:
>
> Benchmark                                         Mode  Cnt     Score     Error  Units
> SCFB.makeConcatWithConstants                      avgt   15  1961.985 ± 101.519  ns/op
> SCFB.makeConcatWithConstants:·gc.alloc.rate.norm  avgt   15  3156.478 ± 183.600   B/op

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  Missed correctly taking b1 into account in of(byte, int, int...)
  (java/lang/String/concat/ImplicitStringConcatShapes.java test failure)

- Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8881/files
  - new: https://git.openjdk.java.net/jdk/pull/8881/files/2be3b25c..612b4ece

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8881&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8881&range=01-02

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/8881.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8881/head:pull/8881

PR: https://git.openjdk.java.net/jdk/pull/8881
Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently [v2]
> The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` allows keys to be compacted when all byte values of the key fit in 4 bits; otherwise a byte array is allocated and used. This means that all transforms with a kind value above 15 are forced to allocate and use array comparisons.
>
> Removing unused transforms and folding others so that all existing kinds fit snugly within the 0-15 value range realizes a minor improvement to the footprint, speed and allocation pressure of affected transforms, e.g. a ~300 bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark:
>
> Baseline:
>
> Benchmark                                         Mode  Cnt     Score     Error  Units
> SCFB.makeConcatWithConstants                      avgt   15  2048.475 ±  69.887  ns/op
> SCFB.makeConcatWithConstants:·gc.alloc.rate.norm  avgt   15  3487.311 ±  80.385   B/op
>
> Patched:
>
> Benchmark                                         Mode  Cnt     Score     Error  Units
> SCFB.makeConcatWithConstants                      avgt   15  1961.985 ± 101.519  ns/op
> SCFB.makeConcatWithConstants:·gc.alloc.rate.norm  avgt   15  3156.478 ± 183.600   B/op

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  Address review comments, fix unsafe shift, rework and remove ofBothArrays

- Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8881/files
  - new: https://git.openjdk.java.net/jdk/pull/8881/files/001e9d16..2be3b25c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8881&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8881&range=00-01

Stats: 86 lines in 2 files changed: 50 ins; 12 del; 24 mod
Patch: https://git.openjdk.java.net/jdk/pull/8881.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8881/head:pull/8881

PR: https://git.openjdk.java.net/jdk/pull/8881
Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently
On Wed, 25 May 2022 13:30:23 GMT, Jorn Vernee wrote: >> Maybe not... argument positions should fit in a byte as well. But, maybe >> there are other problematic cases? Or are the ints guaranteed to fit in a >> byte? > > Maybe an `assert b == b23456[i]` would be nice here. All usage implies the int is an argument position, which by spec is constrained to be in the 0-255 range. But surely checking or asserting shouldn't hurt. - PR: https://git.openjdk.java.net/jdk/pull/8881
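The narrowing check discussed above can be sketched as follows. This is a hedged illustration, not the actual `LambdaFormEditor` code: the class and method names are hypothetical, but it shows why an int argument position in the 0-255 range survives a cast to `byte` and how an assert can guard the narrowing.

```java
// Hypothetical sketch (not JDK code): narrowing an argument position
// to a byte, with an assert guarding the spec-guaranteed 0-255 range.
public class BytePackDemo {
    static byte toByteChecked(int argPos) {
        byte b = (byte) argPos;
        // A value in 0-255 round-trips through (b & 0xFF);
        // anything else indicates a caller bug.
        assert (b & 0xFF) == argPos : "argument position out of range: " + argPos;
        return b;
    }
}
```

The assert is free in production runs (it only fires with `-ea`), which matches the "checking or asserting shouldn't hurt" observation.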
RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently
The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` allows keys to be compacted when all byte values of the key fit in 4 bits; otherwise a byte array is allocated and used. This means that all transforms with a kind value above 15 are forced to allocate and use array comparisons.

Removing unused transforms and folding others so that all existing kinds fit snugly within the 0-15 value range realizes a minor improvement to the footprint, speed and allocation pressure of affected transforms, e.g. a ~300 bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark:

Baseline:

Benchmark                                         Mode  Cnt     Score     Error  Units
SCFB.makeConcatWithConstants                      avgt   15  2048.475 ±  69.887  ns/op
SCFB.makeConcatWithConstants:·gc.alloc.rate.norm  avgt   15  3487.311 ±  80.385   B/op

Patched:

Benchmark                                         Mode  Cnt     Score     Error  Units
SCFB.makeConcatWithConstants                      avgt   15  1961.985 ± 101.519  ns/op
SCFB.makeConcatWithConstants:·gc.alloc.rate.norm  avgt   15  3156.478 ± 183.600   B/op

- Commit messages:
 - Revert unrelated and unverified hashCode change
 - Improve TransformKey to pack more kinds of transforms efficiently

Changes: https://git.openjdk.java.net/jdk/pull/8881/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8881&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8287292
Stats: 44 lines in 1 file changed: 21 ins; 10 del; 13 mod
Patch: https://git.openjdk.java.net/jdk/pull/8881.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8881/head:pull/8881

PR: https://git.openjdk.java.net/jdk/pull/8881
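The 4-bit compaction idea can be illustrated with a small sketch. This is hypothetical code, not the actual `LambdaFormEditor.TransformKey` implementation: it only shows why values above 15 force the array fallback, by packing nibbles into a single long.

```java
// Hypothetical illustration of a compact-key scheme (not JDK code):
// up to 16 values that each fit in 4 bits pack into one long key.
public class TransformKeyDemo {
    // A value fits the compact form if it only uses its low 4 bits.
    static boolean fitsIn4Bits(int b) {
        return (b & 0xF) == b;
    }

    // Pack values as nibbles, lowest nibble first. Returns -1 when a
    // value needs more than 4 bits, modeling the point where the real
    // scheme must fall back to allocating a byte[] key.
    static long packNibbles(int... values) {
        if (values.length > 16) return -1;
        long packed = 0;
        for (int i = 0; i < values.length; i++) {
            if (!fitsIn4Bits(values[i])) return -1;
            packed |= ((long) values[i]) << (4 * i);
        }
        return packed;
    }
}
```

With this shape, comparing two compact keys is a single long comparison, while a kind value of 16 or more would force the array path — which is exactly what the patch avoids by keeping all kinds in 0-15.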
Integrated: 8287013: StringConcatFactory: remove short and byte mixers/prependers
On Thu, 19 May 2022 10:47:23 GMT, Claes Redestad wrote:

> The current short and byte mixers and prependers simply delegate to the int variants. At the LambdaForm level the values have already been implicitly cast to int, so removing this indirection and directly adapting to call the int-based variants is possible.
>
> This is a cleanup with minimal effect on bootstrap overhead and peak performance, since all the LF logic only deals with basic types (where byte, short and other subword primitives have been widened to ints anyhow).

This pull request has now been integrated.

Changeset: d5d19f52
Author: Claes Redestad
URL: https://git.openjdk.java.net/jdk/commit/d5d19f52ceb1430104b12a42c78489f42477a9b0
Stats: 87 lines in 2 files changed: 12 ins; 74 del; 1 mod

8287013: StringConcatFactory: remove short and byte mixers/prependers

Reviewed-by: jlaskey

- PR: https://git.openjdk.java.net/jdk/pull/8786
Re: RFR: 8287053: Avoid redundant HashMap.containsKey calls in ZoneInfoFile.getZoneInfo0
On Sat, 30 Apr 2022 17:00:03 GMT, Andrey Turbanov wrote: > Instead of pair `HashMap.containsKey`/`HashMap.get` method calls, we can use > single call `HashMap.getOrDefault`. > It's faster and clearer. Marked as reviewed by redestad (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/8487
Re: Question: Is it possible to introduce ASCII coder for compact strings to optimize performance for ASCII compatible encoding?
Yes, I think 20% is close to an upper bound, with less gain on smaller strings due to improved cache-locality. And as you say, such a small speed-up might not be very noticeable in practice when I/O is involved.

/Claes

On 2022-05-19 14:17, Glavo wrote:

Thank you very much for your work, it seems that the decoding speed of the new version of JDK has been greatly improved. However, `hasNegatives` still has a cost on my device. I tried removing the call to hasNegatives in `String.encodeUTF8` based on JDK 17.0.3, then I tested its performance through JMH. I ran the JMH benchmark on two of my x86 machines (the CPU of one is a Ryzen 7 3700X and the other a Xeon W-2175) and got about a 20% performance boost on both machines.

Of course, the decoding performance in the new JDK is better than I thought. For an ASCII string of length 1, hasNegatives takes about 150 nanoseconds, and the duration increases linearly with the length. Considering that these operations usually accompany IO, the overhead is really not as noticeable as I thought.

Thank you again for your contribution to OpenJDK and your patience in replying.

On Wed, May 18, 2022 at 10:47 PM Claes Redestad <claes.redes...@oracle.com> wrote:

Hi, so your suggestion is that String::coder would be ASCII if all codepoints are 0-127, then LATIN1 if it contains at least one char in the 128-255 range, otherwise UTF16? As you say this would mean we could skip StringCoding::hasNegatives scans and similar overheads when converting known-to-be-ASCII Strings to UTF-8 and other ASCII-compatible encodings. But it might also complicate logic in various places, so we need a clear win to motivate such work. So the first question to answer might be how much the hasNegatives calls cost us. Depending on your hardware (and JDK version) the answer might be "almost nothing"! I did some work last year to speed up encoding and decoding ASCII-only data to/from various encodings. 
When I was done there was no longer much of a difference when encoding from String to an ASCII-compatible charset [1]. Maybe a scaling ~10% advantage for latin-1 in some microbenchmark when decoding really large strings [2]. But with the suggested change the decoding advantage would likely disappear: we maintain the invariant that equal Strings have equal coders, so now we'd have to scan latin-1 encoded streams for non-ASCII bytes to disambiguate between ASCII and LATIN1. Maybe there are some other cases that would be helped, but with some likely negatives and only a modest potential win my gut feeling is that this wouldn't pay off.

Best regards
Claes

[1] https://cl4es.github.io/2021/10/17/Faster-Charset-Encoding.html
[2] https://cl4es.github.io/2021/02/23/Faster-Charset-Decoding.html

(All the improvements discussed in the blog entries should be available since 17.0.2.)

On 2022-05-18 14:07, Glavo wrote:

> After the introduction of compact strings in Java 9, the current String may store byte arrays encoded as Latin1 or UTF-16.
> Here's a troublesome thing: Latin1 is not compatible with UTF-8. Not all Latin1 byte strings are legal UTF-8 byte strings:
> they are only compatible within the ASCII range. When there are code points greater than 127, Latin1 uses a one-byte
> representation, while UTF-8 requires two bytes. 
> > As an example, every time `JavaLangAccess::getBytesNoRepl` is called to > convert a string to a UTF-8 array, > the internal implementation needs to call `StringCoding.hasNegatives` to > scan the content byte array to determine that > the string can be encoded in ASCII, and thus eliminate array copies. > Similar steps are performed when calling `str.getBytes(UTF_8)`. > > So, is it possible to introduce a third possible value for `String::coder`: > ASCII? > This looks like an attractive option, and if we do this, we can introduce > fast paths for many methods. > > Of course, I know that this change is not completely free, and some methods > may bring slight performance > degradation due to the need to judge the coder, in particular, there may be > an impact on the performance of the > StringBuil
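The Latin1/UTF-8 incompatibility quoted above is easy to demonstrate: code points in the 128-255 range take one byte in Latin-1 but two in UTF-8, so only pure-ASCII content can be copied between the two encodings verbatim.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1Utf8Demo {
    // Encoded byte length of a string under a given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // 'é': in the Latin-1 range, but not ASCII
        System.out.println(encodedLength(s, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));      // 2
    }
}
```

This is exactly why `JavaLangAccess::getBytesNoRepl` must scan a LATIN1-coded String before deciding it can share or copy the bytes directly for UTF-8 output.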
RFR: 8287013: StringConcatFactory: remove short and byte mixers/prependers
The current short and byte mixers and prependers simply delegate to the int variants. At the LambdaForm level the values have already been implicitly cast to int, so removing this indirection and directly adapting to call the int-based variants is possible.

This is a cleanup with minimal effect on bootstrap overhead and peak performance, since all the LF logic only deals with basic types (where byte, short and other subword primitives have been widened to ints anyhow).

- Commit messages:
 - 8287013: StringConcatFactory: remove short and byte mixers/prependers

Changes: https://git.openjdk.java.net/jdk/pull/8786/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8786&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8287013
Stats: 87 lines in 2 files changed: 12 ins; 74 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/8786.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8786/head:pull/8786

PR: https://git.openjdk.java.net/jdk/pull/8786
Re: Question: Is it possible to introduce ASCII coder for compact strings to optimize performance for ASCII compatible encoding?
Hi,

so your suggestion is that String::coder would be ASCII if all codepoints are 0-127, then LATIN1 if it contains at least one char in the 128-255 range, otherwise UTF16? As you say this would mean we could skip StringCoding::hasNegatives scans and similar overheads when converting known-to-be-ASCII Strings to UTF-8 and other ASCII-compatible encodings. But it might also complicate logic in various places, so we need a clear win to motivate such work.

So the first question to answer might be how much the hasNegatives calls cost us. Depending on your hardware (and JDK version) the answer might be "almost nothing"! I did some work last year to speed up encoding and decoding ASCII-only data to/from various encodings. When I was done there was no longer much of a difference when encoding from String to an ASCII-compatible charset [1]. Maybe a scaling ~10% advantage for latin-1 in some microbenchmark when decoding really large strings [2]. But with the suggested change the decoding advantage would likely disappear: we maintain the invariant that equal Strings have equal coders, so now we'd have to scan latin-1 encoded streams for non-ASCII bytes to disambiguate between ASCII and LATIN1.

Maybe there are some other cases that would be helped, but with some likely negatives and only a modest potential win my gut feeling is that this wouldn't pay off.

Best regards
Claes

[1] https://cl4es.github.io/2021/10/17/Faster-Charset-Encoding.html
[2] https://cl4es.github.io/2021/02/23/Faster-Charset-Decoding.html

(All the improvements discussed in the blog entries should be available since 17.0.2.)

On 2022-05-18 14:07, Glavo wrote:

After the introduction of compact strings in Java 9, the current String may store byte arrays encoded as Latin1 or UTF-16. Here's a troublesome thing: Latin1 is not compatible with UTF-8. 
Not all Latin1 byte strings are legal UTF-8 byte strings: they are only compatible within the ASCII range. When there are code points greater than 127, Latin1 uses a one-byte representation, while UTF-8 requires two bytes.

As an example, every time `JavaLangAccess::getBytesNoRepl` is called to convert a string to a UTF-8 array, the internal implementation needs to call `StringCoding.hasNegatives` to scan the content byte array to determine that the string can be encoded in ASCII, and thus eliminate array copies. Similar steps are performed when calling `str.getBytes(UTF_8)`.

So, is it possible to introduce a third possible value for `String::coder`: ASCII? This looks like an attractive option, and if we do this, we can introduce fast paths for many methods.

Of course, I know that this change is not completely free, and some methods may see slight performance degradation due to the need to judge the coder; in particular, there may be an impact on the performance of StringBuilder/StringBuffer. However, given that UTF-8 is by far the most commonly used file encoding, the performance benefits of fast paths are likely to cover more scenarios. In addition to this, other ASCII-compatible encodings may also benefit, such as GB18030 (GBK), or ISO 8859 variants. And if frozen arrays were introduced into the JDK, there would be more scenarios to enjoy performance improvements.

So I would like to ask, is it possible for JDK to improve String storage in a similar way in the future? Has anyone explored this issue before? Sorry to bother you all, but I'm very much looking forward to the answer to this question.
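The scan being discussed has simple semantics: a LATIN1 byte array is pure ASCII iff no byte has its high bit set. The real `StringCoding.hasNegatives` is a HotSpot intrinsic that vectorizes this check; the scalar sketch below only illustrates what it computes, not how the JDK implements it.

```java
public class AsciiScanDemo {
    // Scalar equivalent of the hasNegatives check: returns true if any
    // byte in [off, off+len) is negative, i.e. has bit 7 set and is
    // therefore outside the ASCII range. The intrinsified JDK version
    // does the same test with SIMD instructions.
    static boolean hasNegatives(byte[] ba, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (ba[i] < 0) {
                return true;
            }
        }
        return false;
    }
}
```

The proposed ASCII coder would make this scan unnecessary for known-ASCII Strings — the cost being weighed in the thread above.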
Re: RFR: 8286441: Remove mode parameter from jdk.internal.perf.Perf.attach()
On Tue, 10 May 2022 04:00:29 GMT, Ioi Lam wrote: > The `mode` parameter for ` jdk.internal.perf.Perf.attach()` has never > supported the value `"rw"` since the source code was imported to the openjdk > repo more than 15 years ago. In fact HotSpot throws > `IllegalArgumentException` when such a mode is specified. > > It's unlikely such a mode will be required for future enhancements. Support > for `"rw"` is removed. The "mode" parameter is also removed, since now `"r"` > is the only supported mode. > > I also cleaned up related code in the JDK and HotSpot. > > Testing: > - Passed tiers 1 and 2 > - Tiers 3, 4, 5 are in progress Nice cleanup! (Some stylistic suggestions inline, feel free to ignore) src/hotspot/os/windows/perfMemory_windows.cpp line 1781: > 1779: // address space. > 1780: // > 1781: void PerfMemory::attach(const char* user, int vmid, One line? src/hotspot/share/prims/perf.cpp line 84: > 82: > 83: // attach to the PerfData memory region for the specified VM > 84: PerfMemory::attach(user_utf, vmid, One line? src/hotspot/share/runtime/perfMemory.hpp line 146: > 144: // methods for attaching to and detaching from the PerfData > 145: // memory segment of another JVM process on the same system. > 146: static void attach(const char* user, int vmid, One line? src/jdk.jstatd/share/classes/sun/tools/jstatd/RemoteHostImpl.java line 74: > 72: Integer v = lvmid; > 73: RemoteVm stub = null; > 74: StringBuilder sb = new StringBuilder(); Suggestion: String vmidStr = "local://" + lvmid + "@localhost"; - Marked as reviewed by redestad (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/8622
Integrated: 8286298: Remove unused methods in sun.invoke.util.VerifyType
On Fri, 6 May 2022 11:32:25 GMT, Claes Redestad wrote: > A few untested and unused methods in `VerifyType` which can be removed. > (Possibly used by native JSR 292 implementations in JDK 7). This pull request has now been integrated. Changeset: 3fa1c404 Author: Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/3fa1c4043919943baf0a2cdfaf040ffdd844750c Stats: 91 lines in 2 files changed: 2 ins; 85 del; 4 mod 8286298: Remove unused methods in sun.invoke.util.VerifyType Reviewed-by: bpb, alanb, mchung - PR: https://git.openjdk.java.net/jdk/pull/8570
Re: RFR: 8282701: Use Class.getInterfaces(false) where possible to reduce allocation pressure [v2]
On Wed, 4 May 2022 09:46:00 GMT, Сергей Цыпанов wrote:

>> `Class.getInterfaces(false)` does not clone the underlying array and can be used in cases when the returned array is only read from.
>
> Сергей Цыпанов has updated the pull request incrementally with one additional commit since the last revision:
>
>   8282701: Revert some irrelevant changes

Marked as reviewed by redestad (Reviewer).

- PR: https://git.openjdk.java.net/jdk/pull/7714
Re: RFR: 8286298: Remove unused methods in sun.invoke.util.VerifyType [v2]
> A few untested and unused methods in `VerifyType` which can be removed. > (Possibly used by native JSR 292 implementations in JDK 7). Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Inline isNullReferenceConversion at sole call site and remove it as well - Changes: - all: https://git.openjdk.java.net/jdk/pull/8570/files - new: https://git.openjdk.java.net/jdk/pull/8570/files/a456de4f..b2d1af4b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8570&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8570&range=00-01 Stats: 19 lines in 2 files changed: 2 ins; 14 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/8570.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8570/head:pull/8570 PR: https://git.openjdk.java.net/jdk/pull/8570
Re: RFR: 8286163: micro-optimize Instant.plusSeconds
On Fri, 6 May 2022 22:02:58 GMT, Stephen Colebourne wrote:

> Seems reasonable to me. plus(long, long) already has this optimisation.

If it already had this optimization, why change anything? If you're referring to the check for `0` that returns `this`, then that is something `plusSeconds` will need to retain for parity. The optimization here appears to be avoiding the added arithmetic for dealing with nanoseconds. And though it bothers me that the JIT doesn't optimize this better (given that the nanosecond parameter to `plus(long, long)` is a constant zero), the patch does seem reasonable.

- PR: https://git.openjdk.java.net/jdk/pull/8542
Re: RFR: 8286163: micro-optimize Instant.plusSeconds
On Wed, 4 May 2022 20:27:04 GMT, lennartfricke wrote: > Provide micro-benchmark for comparison Marked as reviewed by redestad (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/8542
Re: RFR: 8286163: micro-optimize Instant.plusSeconds
On Wed, 4 May 2022 20:27:04 GMT, lennartfricke wrote:

> Provide micro-benchmark for comparison

Hi, thanks for the contribution! How big a speed-up are you observing?

Keeping the optimization in `plusSeconds` rather than moving it to `plus(long, long)` means expressions like `instant.plusMillis(1000)` won't be helped, but such expressions might be rare anyway, so what you have might be better overall since it doesn't add a branch to the common code.

- PR: https://git.openjdk.java.net/jdk/pull/8542
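The shape of the optimization under discussion can be sketched in a simplified model (hypothetical helper, not the actual java.time source): an instant is a (epochSecond, nano) pair, and a seconds-only addition can leave the nano field untouched, skipping the split/carry arithmetic the general `plus(secondsToAdd, nanosToAdd)` path performs.

```java
public class PlusSecondsDemo {
    // Hedged sketch of a seconds-only fast path. Returns the result as
    // a {epochSecond, nano} pair rather than a real Instant, purely to
    // keep the example self-contained.
    static long[] plusSecondsFast(long epochSecond, int nano, long secondsToAdd) {
        if (secondsToAdd == 0) {
            return new long[] { epochSecond, nano }; // identity shortcut
        }
        // No nanosecond carry arithmetic is needed on this path.
        return new long[] { Math.addExact(epochSecond, secondsToAdd), nano };
    }
}
```

The real `Instant.plusSeconds(7)` applied to `Instant.ofEpochSecond(100, 5)` yields `Instant.ofEpochSecond(107, 5)` — the nano field is carried through unchanged, which is what makes the fast path valid.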
RFR: 8286298: Remove unused methods in sun.invoke.util.VerifyType
A few untested and unused methods in `VerifyType` which can be removed. (Possibly used by native JSR 292 implementations in JDK 7). - Commit messages: - Remove unused methods in sun.invoke.util.VerifyType Changes: https://git.openjdk.java.net/jdk/pull/8570/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8570&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8286298 Stats: 72 lines in 1 file changed: 0 ins; 71 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/8570.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8570/head:pull/8570 PR: https://git.openjdk.java.net/jdk/pull/8570
Re: RFR: 8282701: Use Class.getInterfaces(false) where possible to reduce allocation pressure
On Mon, 2 May 2022 16:19:07 GMT, Mandy Chung wrote: >> `Class.getInterfaces(false)` does not clone underlying array and can be used >> in cases when the returned array is only read from. > > For the `checkPackageAccess` case, I don't think it worths fixing; not only > that security manager is deprecated for removal but also when security > manager is enabled, there are lots of allocations for other security checks. > > So the `getInterface(boolean)` method can be kept private. I agree with @mlchung that the call site in `ClassLoader` is not particularly interesting and doesn't motivate a non-cloning, trusted method. There are a few other places in java.base where it's used and a trusted method could help, but not sure any of those are performance critical. - PR: https://git.openjdk.java.net/jdk/pull/7714
Re: RFR: 8282701: Use Class.getInterfaces(false) where possible to reduce allocation pressure
On Sat, 5 Mar 2022 13:07:56 GMT, Сергей Цыпанов wrote:

> `Class.getInterfaces(false)` does not clone the underlying array and can be used in cases when the returned array is only read from.

I think this is ok in general, but for consistency and to better call out that we're dealing with a trusted shared array I would prefer if `getInterfaces(boolean)` remains private and instead we add a `getInterfacesShared()` (which simply calls `getInterfaces(false)`).

- Changes requested by redestad (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7714
Integrated: 8285633: Take better advantage of generic MethodType cache
On Tue, 26 Apr 2022 10:57:04 GMT, Claes Redestad wrote:

> The `MethodType.genericMethodType` methods provide convenience methods for certain common method types and also provide a `@Stable` cache that allows for constant folding. This patch enhances the more generic `methodType` methods to take advantage of this cache, when possible. This allows calls like `MethodType.methodType(Object.class, Object.class, ...)` to constant fold, making them 50x+ faster and allocation-free.
>
> Baseline:
>
> Benchmark                                    Mode  Cnt   Score   Error  Units
> MethodTypeAcquire.baselineRaw                avgt   15   0.673 ± 0.017  ns/op
> MethodTypeAcquire.testGenericObject          avgt   15   0.579 ± 0.125  ns/op
> MethodTypeAcquire.testMultiPType             avgt   15  46.671 ± 1.134  ns/op
> MethodTypeAcquire.testMultiPType_Arg         avgt   15  27.019 ± 0.437  ns/op
> MethodTypeAcquire.testMultiPType_ObjectAndA  avgt   15  57.217 ± 0.505  ns/op
> MethodTypeAcquire.testMultiPType_ObjectOnly  avgt   15  57.114 ± 1.214  ns/op
> MethodTypeAcquire.testObjectObject           avgt   15  33.380 ± 1.810  ns/op
> MethodTypeAcquire.testReturnInt              avgt   15  28.043 ± 0.187  ns/op
> MethodTypeAcquire.testReturnObject           avgt   15  24.313 ± 0.074  ns/op
> MethodTypeAcquire.testReturnVoid             avgt   15  24.360 ± 0.155  ns/op
> MethodTypeAcquire.testSinglePType            avgt   15  33.572 ± 1.101  ns/op
>
> Patched:
>
> Benchmark                                    Mode  Cnt   Score   Error  Units
> MethodTypeAcquire.baselineRaw                avgt   15   0.683 ± 0.024  ns/op
> MethodTypeAcquire.testGenericObject          avgt   15   0.547 ± 0.047  ns/op
> MethodTypeAcquire.testMultiPType             avgt   15  48.532 ± 0.982  ns/op
> MethodTypeAcquire.testMultiPType_Arg         avgt   15  31.390 ± 5.269  ns/op
> MethodTypeAcquire.testMultiPType_ObjectAndA  avgt   15  60.165 ± 2.756  ns/op
> MethodTypeAcquire.testMultiPType_ObjectOnly  avgt   15   0.599 ± 0.166  ns/op
> MethodTypeAcquire.testObjectObject           avgt   15   0.541 ± 0.045  ns/op
> MethodTypeAcquire.testReturnInt              avgt   15  28.174 ± 0.257  ns/op
> MethodTypeAcquire.testReturnObject           avgt   15   0.542 ± 0.045  ns/op
> MethodTypeAcquire.testReturnVoid             avgt   15  24.621 ± 0.202  ns/op
> MethodTypeAcquire.testSinglePType            avgt   15  33.614 ± 1.109  ns/op
>
> The cost on method types that don't match is in the noise (0-2 ns/op). Even on a test I constructed to act as a worst case (`testMultiPType_ObjectAndA`) there's barely any appreciable cost (~3 ns/op, with overlapping error).

This pull request has now been integrated.

Changeset: 6c79671e
Author: Claes Redestad
URL: https://git.openjdk.java.net/jdk/commit/6c79671e50d572f3da3a286d34a98dcb83b8d906
Stats: 119 lines in 8 files changed: 96 ins; 1 del; 22 mod

8285633: Take better advantage of generic MethodType cache

Reviewed-by: jvernee

- PR: https://git.openjdk.java.net/jdk/pull/8398
Re: RFR: 8285633: Take better advantage of generic MethodType cache [v2]
On Tue, 26 Apr 2022 19:30:31 GMT, Jorn Vernee wrote:

>> Right.. I did a quick experiment and there's a large speed-up in the trivial `methodType(obj, obj)` case:
>>
>> Benchmark                                      Mode  Cnt   Score   Error  Units
>> MethodTypeAcquire.testObjectObjectNonConstant  avgt    5  30.052 ± 3.440  ns/op  # baseline
>> MethodTypeAcquire.testObjectObjectNonConstant  avgt    5   1.171 ± 0.001  ns/op  # patch
>>
>> I'll add a non-constant variant for the multi-arg Object variants, too. It should scale linearly and see a gain in the same ballpark. I think we need to keep the number of microbenchmarks here somewhat under control, though.
>
> That sounds good, thanks

Added 3 micros I think carry their own weight, as they assess some different major pathways through the caching fast-path.

Baseline:

Benchmark                                             Mode  Cnt   Score   Error  Units
MethodTypeAcquire.testMultiPType_ObjectAndA_NonConst  avgt   15  63.523 ± 2.452  ns/op
MethodTypeAcquire.testMultiPType_ObjectOnly_NonConst  avgt   15  57.770 ± 0.501  ns/op
MethodTypeAcquire.testObjectObject_NonConst           avgt   15  30.090 ± 0.251  ns/op

Patched:

Benchmark                                             Mode  Cnt   Score   Error  Units
MethodTypeAcquire.testMultiPType_ObjectAndA_NonConst  avgt   15  64.570 ± 0.410  ns/op
MethodTypeAcquire.testMultiPType_ObjectOnly_NonConst  avgt   15   4.735 ± 0.063  ns/op
MethodTypeAcquire.testObjectObject_NonConst           avgt   15   1.171 ± 0.001  ns/op

The relative speed-up diminishes a bit for the loopy 6-arg `methodType` calls, but is still a healthy 12x. Overhead on the negative test might be there, but it is very much in the noise in my tests (I tried doubling warmup time to no avail).

- PR: https://git.openjdk.java.net/jdk/pull/8398
Re: RFR: 8285633: Take better advantage of generic MethodType cache [v2]
> The `MethodType.genericMethodType` methods provide convenience methods for certain common method types and also provide a `@Stable` cache that allows for constant folding. This patch enhances the more generic `methodType` methods to take advantage of this cache, when possible. This allows calls like `MethodType.methodType(Object.class, Object.class, ...)` to constant fold, making them 50x+ faster and allocation-free.
>
> Baseline:
>
> Benchmark                                    Mode  Cnt   Score   Error  Units
> MethodTypeAcquire.baselineRaw                avgt   15   0.673 ± 0.017  ns/op
> MethodTypeAcquire.testGenericObject          avgt   15   0.579 ± 0.125  ns/op
> MethodTypeAcquire.testMultiPType             avgt   15  46.671 ± 1.134  ns/op
> MethodTypeAcquire.testMultiPType_Arg         avgt   15  27.019 ± 0.437  ns/op
> MethodTypeAcquire.testMultiPType_ObjectAndA  avgt   15  57.217 ± 0.505  ns/op
> MethodTypeAcquire.testMultiPType_ObjectOnly  avgt   15  57.114 ± 1.214  ns/op
> MethodTypeAcquire.testObjectObject           avgt   15  33.380 ± 1.810  ns/op
> MethodTypeAcquire.testReturnInt              avgt   15  28.043 ± 0.187  ns/op
> MethodTypeAcquire.testReturnObject           avgt   15  24.313 ± 0.074  ns/op
> MethodTypeAcquire.testReturnVoid             avgt   15  24.360 ± 0.155  ns/op
> MethodTypeAcquire.testSinglePType            avgt   15  33.572 ± 1.101  ns/op
>
> Patched:
>
> Benchmark                                    Mode  Cnt   Score   Error  Units
> MethodTypeAcquire.baselineRaw                avgt   15   0.683 ± 0.024  ns/op
> MethodTypeAcquire.testGenericObject          avgt   15   0.547 ± 0.047  ns/op
> MethodTypeAcquire.testMultiPType             avgt   15  48.532 ± 0.982  ns/op
> MethodTypeAcquire.testMultiPType_Arg         avgt   15  31.390 ± 5.269  ns/op
> MethodTypeAcquire.testMultiPType_ObjectAndA  avgt   15  60.165 ± 2.756  ns/op
> MethodTypeAcquire.testMultiPType_ObjectOnly  avgt   15   0.599 ± 0.166  ns/op
> MethodTypeAcquire.testObjectObject           avgt   15   0.541 ± 0.045  ns/op
> MethodTypeAcquire.testReturnInt              avgt   15  28.174 ± 0.257  ns/op
> MethodTypeAcquire.testReturnObject           avgt   15   0.542 ± 0.045  ns/op
> MethodTypeAcquire.testReturnVoid             avgt   15  24.621 ± 0.202  ns/op
> MethodTypeAcquire.testSinglePType            avgt   15  33.614 ± 1.109  ns/op
>
> The cost on method types that don't match is in the noise (0-2 ns/op). Even on a test I constructed to act as a worst case (`testMultiPType_ObjectAndA`) there's barely any appreciable cost (~3 ns/op, with overlapping error).

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  Add micros using non-constant arguments to assess performance in absence of constant folding

- Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8398/files
  - new: https://git.openjdk.java.net/jdk/pull/8398/files/6d2edc59..fcc020e6

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8398&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8398&range=00-01

Stats: 18 lines in 1 file changed: 18 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/8398.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8398/head:pull/8398

PR: https://git.openjdk.java.net/jdk/pull/8398
Re: RFR: 8285633: Take better advantage of generic MethodType cache
On Tue, 26 Apr 2022 15:30:41 GMT, Jorn Vernee wrote:

>> The `MethodType.genericMethodType` methods provide convenience factories for certain common method types and also provide a `@Stable` cache that allows for constant folding. This patch enhances the more generic `methodType` methods to take advantage of this cache, when possible. This allows calls like `MethodType.methodType(Object.class, Object.class, ...)` to constant fold, making them 50x+ faster and allocation-free.
>>
>> Baseline:
>>
>> Benchmark                                    Mode  Cnt   Score   Error  Units
>> MethodTypeAcquire.baselineRaw                avgt   15   0.673 ± 0.017  ns/op
>> MethodTypeAcquire.testGenericObject          avgt   15   0.579 ± 0.125  ns/op
>> MethodTypeAcquire.testMultiPType             avgt   15  46.671 ± 1.134  ns/op
>> MethodTypeAcquire.testMultiPType_Arg         avgt   15  27.019 ± 0.437  ns/op
>> MethodTypeAcquire.testMultiPType_ObjectAndA  avgt   15  57.217 ± 0.505  ns/op
>> MethodTypeAcquire.testMultiPType_ObjectOnly  avgt   15  57.114 ± 1.214  ns/op
>> MethodTypeAcquire.testObjectObject           avgt   15  33.380 ± 1.810  ns/op
>> MethodTypeAcquire.testReturnInt              avgt   15  28.043 ± 0.187  ns/op
>> MethodTypeAcquire.testReturnObject           avgt   15  24.313 ± 0.074  ns/op
>> MethodTypeAcquire.testReturnVoid             avgt   15  24.360 ± 0.155  ns/op
>> MethodTypeAcquire.testSinglePType            avgt   15  33.572 ± 1.101  ns/op
>>
>> Patched:
>>
>> Benchmark                                    Mode  Cnt   Score   Error  Units
>> MethodTypeAcquire.baselineRaw                avgt   15   0.683 ± 0.024  ns/op
>> MethodTypeAcquire.testGenericObject          avgt   15   0.547 ± 0.047  ns/op
>> MethodTypeAcquire.testMultiPType             avgt   15  48.532 ± 0.982  ns/op
>> MethodTypeAcquire.testMultiPType_Arg         avgt   15  31.390 ± 5.269  ns/op
>> MethodTypeAcquire.testMultiPType_ObjectAndA  avgt   15  60.165 ± 2.756  ns/op
>> MethodTypeAcquire.testMultiPType_ObjectOnly  avgt   15   0.599 ± 0.166  ns/op
>> MethodTypeAcquire.testObjectObject           avgt   15   0.541 ± 0.045  ns/op
>> MethodTypeAcquire.testReturnInt              avgt   15  28.174 ± 0.257  ns/op
>> MethodTypeAcquire.testReturnObject           avgt   15   0.542 ± 0.045  ns/op
>> MethodTypeAcquire.testReturnVoid             avgt   15  24.621 ± 0.202  ns/op
>> MethodTypeAcquire.testSinglePType            avgt   15  33.614 ± 1.109  ns/op
>>
>> The cost on method types that don't match is in the noise (0-2 ns/op). Even on a test I constructed to act as a worst case (`testMultiPType_ObjectAndA`) there's barely any appreciable cost (~3 ns/op, with overlapping error).
>
> test/micro/org/openjdk/bench/java/lang/invoke/MethodTypeAcquire.java line 104:
>
>> 102: @Benchmark
>> 103: public MethodType testMultiPType_ObjectOnly() {
>> 104:     return MethodType.methodType(Object.class, Object.class, Object.class, Object.class, Object.class, Object.class, Object.class);
>
> It might be interesting to add a benchmark where all types are `Object`, but non-constants, to see if that case improves as well.

Right.. I did a quick experiment and there's a large speed-up in the trivial `methodType(obj, obj)` case:

Benchmark                                      Mode  Cnt   Score   Error  Units
MethodTypeAcquire.testObjectObjectNonConstant  avgt    5  30.052 ± 3.440  ns/op  # baseline
MethodTypeAcquire.testObjectObjectNonConstant  avgt    5   1.171 ± 0.001  ns/op  # patch

I'll add a non-constant variant for the multi-arg Object variants, too. It should scale linearly and see a gain in the same ballpark. I think we need to keep the number of microbenchmarks here somewhat under control, though.

- PR: https://git.openjdk.java.net/jdk/pull/8398
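As background for readers, the reason the `genericMethodType` cache can serve the general `methodType` factory at all is that `MethodType` instances are interned: requesting the same all-`Object` shape through either factory yields the same object. A minimal sketch (the demo class name is mine, not from the patch):

```java
import java.lang.invoke.MethodType;

public class GenericMethodTypeDemo {
    public static void main(String[] args) {
        // (Object, Object)Object requested through the generic-shape factory...
        MethodType cached = MethodType.genericMethodType(2);
        // ...and through the general-purpose factory.
        MethodType general = MethodType.methodType(Object.class, Object.class, Object.class);
        // MethodType instances are interned, so both calls return the same object;
        // routing all-Object shapes through the @Stable cache lets them constant fold.
        System.out.println(cached == general); // true
        System.out.println(cached.descriptorString());
    }
}
```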
RFR: 8285633: Take better advantage of generic MethodType cache
The `MethodType.genericMethodType` methods provide convenience factories for certain common method types and also provide a `@Stable` cache that allows for constant folding. This patch enhances the more generic `methodType` methods to take advantage of this cache, when possible. This allows calls like `MethodType.methodType(Object.class, Object.class, ...)` to constant fold, making them 50x+ faster and allocation-free.

Baseline:

Benchmark                                    Mode  Cnt   Score   Error  Units
MethodTypeAcquire.baselineRaw                avgt    5   0.673 ± 0.036  ns/op
MethodTypeAcquire.testGenericObject          avgt    5   0.520 ± 0.162  ns/op
MethodTypeAcquire.testMultiPType             avgt    5  47.922 ± 1.778  ns/op
MethodTypeAcquire.testMultiPType_Arg         avgt    5  28.992 ± 0.949  ns/op
MethodTypeAcquire.testMultiPType_ObjectAndA  avgt    5  56.698 ± 0.879  ns/op
MethodTypeAcquire.testMultiPType_ObjectOnly  avgt    5  55.705 ± 0.854  ns/op
MethodTypeAcquire.testObjectObject           avgt    5  31.729 ± 2.204  ns/op
MethodTypeAcquire.testReturnInt              avgt    5  26.171 ± 0.141  ns/op
MethodTypeAcquire.testReturnObject           avgt    5  24.340 ± 0.868  ns/op
MethodTypeAcquire.testReturnVoid             avgt    5  24.793 ± 2.816  ns/op
MethodTypeAcquire.testSinglePType            avgt    5  32.653 ± 1.082  ns/op

Patched:

Benchmark                                    Mode  Cnt   Score   Error  Units
MethodTypeAcquire.baselineRaw                avgt    5   0.727 ± 0.382  ns/op
MethodTypeAcquire.testGenericObject          avgt    5   0.658 ± 0.655  ns/op
MethodTypeAcquire.testMultiPType             avgt    5  48.066 ± 1.187  ns/op
MethodTypeAcquire.testMultiPType_Arg         avgt    5  27.097 ± 0.745  ns/op
MethodTypeAcquire.testMultiPType_ObjectAndA  avgt    5  58.074 ± 6.299  ns/op
MethodTypeAcquire.testMultiPType_ObjectOnly  avgt    5   0.538 ± 0.195  ns/op
MethodTypeAcquire.testObjectObject           avgt    5   0.520 ± 0.162  ns/op
MethodTypeAcquire.testReturnInt              avgt    5  28.093 ± 0.235  ns/op
MethodTypeAcquire.testReturnObject           avgt    5   0.540 ± 0.204  ns/op
MethodTypeAcquire.testReturnVoid             avgt    5  24.746 ± 0.706  ns/op
MethodTypeAcquire.testSinglePType            avgt    5  32.662 ± 1.924  ns/op

The cost on method types that don't match is in the noise (0-2 ns/op). Even on a test I constructed to act as a worst case (`testMultiPType_ObjectAndA`) there's barely any appreciable cost.

- Commit messages:
 - Copyrights
 - 8285633: Take better advantage of generic MethodType cache

Changes: https://git.openjdk.java.net/jdk/pull/8398/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8398&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8285633
Stats: 101 lines in 8 files changed: 78 ins; 1 del; 22 mod
Patch: https://git.openjdk.java.net/jdk/pull/8398.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8398/head:pull/8398

PR: https://git.openjdk.java.net/jdk/pull/8398
Re: RFR: 8285255: refine StringLatin1.regionMatchesCI_UTF16 [v3]
On Wed, 20 Apr 2022 21:08:19 GMT, XenoAmess wrote:

>> some thoughts after watching 8285001: Simplify StringLatin1.regionMatches https://github.com/openjdk/jdk/pull/8292/
>>
>> if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
>>     continue;
>> }
>>
>> should be changed to
>>
>> if (((u1 == c1) ? CharacterDataLatin1.instance.toLowerCase(c1) : c1) == Character.toLowerCase(u2)) {
>>     continue;
>> }
>>
>> as:
>>
>> 1. c1 is LATIN1, so CharacterDataLatin1.instance.toLowerCase seems faster.
>> 2. because c1 is LATIN1, if u1 != c1, then c1 is already lowercase, and doesn't need a lowercase calculation.
>
> XenoAmess has updated the pull request incrementally with one additional commit since the last revision:
>
>   remove = check

Unfortunately this leads to an error for case-insensitive `regionMatches` between a latin-1 string that contains either of `\u00b5` or `\u00ff` (these two code points have upper case code points outside of the latin-1 range) and a UTF-16 string:

jshell> "\u00b5".regionMatches(true, 0, "\u0100", 0, 1)
|  Exception java.lang.ArrayIndexOutOfBoundsException: Index 924 out of bounds for length 256
|        at CharacterDataLatin1.getProperties (CharacterDataLatin1.java:74)
|        at CharacterDataLatin1.toLowerCase (CharacterDataLatin1.java:140)
|        at StringLatin1.regionMatchesCI_UTF16 (StringLatin1.java:420)
|        at String.regionMatches (String.java:2238)
|        at (#4:1)

- PR: https://git.openjdk.java.net/jdk/pull/8308
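For reference, these are the only two latin-1 code points whose upper-case mappings escape the latin-1 range, which is exactly why the proposed `CharacterDataLatin1` shortcut indexes out of bounds; a quick standalone check (the demo class name is mine):

```java
public class Latin1CaseEdgeCases {
    public static void main(String[] args) {
        // U+00B5 MICRO SIGN upper-cases to U+039C GREEK CAPITAL LETTER MU (924),
        // the very index reported by the AIOOBE in the message above.
        System.out.println((int) Character.toUpperCase('\u00b5')); // 924
        // U+00FF upper-cases to U+0178 LATIN CAPITAL LETTER Y WITH DIAERESIS (376).
        System.out.println((int) Character.toUpperCase('\u00ff')); // 376
        // A correct implementation must answer false here instead of throwing:
        System.out.println("\u00b5".regionMatches(true, 0, "\u0100", 0, 1)); // false
    }
}
```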
Integrated: 8284880: Re-examine sun.invoke.util.Wrapper hash tables
On Thu, 14 Apr 2022 11:19:04 GMT, Claes Redestad wrote:

> This patch examines and optimizes `Wrapper` lookups.
>
> First I wrote a few simple microbenchmarks to verify there are actual speedups from using perfect hash tables in `sun.invoke.util.Wrapper` compared to simpler lookup mechanisms (such as if-else or switch). Turns out there _is_ a speed-up for the case of `char` -> `Wrapper`, but not when mapping from `Class` -> `Wrapper`, so let's drop those. The `forPrimitiveType` case didn't use the `FROM_CHAR` table for some reason, which is remedied.
>
> Micros show benefits across the board for the warmed-up case:
>
> Baseline, OOTB
> Benchmark                  Mode  Cnt   Score   Error  Units
> Wrappers.forBasicType      avgt    5  14.387 ± 0.127  ns/op
> Wrappers.forPrimitive      avgt    5  38.818 ± 0.592  ns/op
> Wrappers.forPrimitiveType  avgt    5  26.085 ± 2.291  ns/op
> Wrappers.forWrapper        avgt    5  44.459 ± 1.635  ns/op
>
> Patch, OOTB
> Benchmark                  Mode  Cnt   Score   Error  Units
> Wrappers.forBasicType      avgt    5  14.357 ± 0.133  ns/op
> Wrappers.forPrimitive      avgt    5  23.930 ± 0.071  ns/op
> Wrappers.forPrimitiveType  avgt    5  14.343 ± 0.017  ns/op
> Wrappers.forWrapper        avgt    5  27.622 ± 0.022  ns/op
>
> For the `-Xint` case (`Wrapper` use is prominent during warmup, e.g., when spinning up MHs) there are decent or even great wins in all cases but `forPrimitiveType` - which was changed from a simple switch to use the hash lookup. Since the interpreter penalty is small in absolute terms and the win on JITed code is significant, this seems like a reasonable trade-off:
>
> Baseline, -Xint
> Benchmark                  Mode  Cnt     Score     Error  Units
> Wrappers.forBasicType      avgt    5  1246.144 ± 149.933  ns/op
> Wrappers.forPrimitive      avgt    5  4955.297 ± 329.869  ns/op
> Wrappers.forPrimitiveType  avgt    5   716.840 ±  62.568  ns/op
> Wrappers.forWrapper        avgt    5  5774.700 ± 367.627  ns/op
>
> Patch, -Xint
> Benchmark                  Mode  Cnt     Score     Error  Units
> Wrappers.forBasicType      avgt    5  1068.096 ± 101.728  ns/op
> Wrappers.forPrimitive      avgt    5  1146.670 ±  59.142  ns/op
> Wrappers.forPrimitiveType  avgt    5   998.037 ± 118.144  ns/op
> Wrappers.forWrapper        avgt    5  3581.226 ±  20.167  ns/op

This pull request has now been integrated.

Changeset: 5df8bd6b
Author: Claes Redestad
URL: https://git.openjdk.java.net/jdk/commit/5df8bd6b4e15686aa7d72b3f5a977eb51b0befc3
Stats: 259 lines in 3 files changed: 158 ins; 59 del; 42 mod

8284880: Re-examine sun.invoke.util.Wrapper hash tables

Reviewed-by: erikj, mchung

- PR: https://git.openjdk.java.net/jdk/pull/8242
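For readers unfamiliar with the trick, a perfect (collision-free) char-keyed table of the kind `Wrapper.FROM_CHAR` uses can be sketched as below. The enum, hash function and table size here are illustrative stand-ins of my own, not the actual `sun.invoke.util.Wrapper` code; the mask-instead-of-modulo indexing mirrors one of the cleanups listed in the commit messages:

```java
// Illustrative perfect-hash lookup from a basic-type char to an enum constant.
enum BasicType {
    INT('I'), LONG('J'), DOUBLE('D'), OBJECT('L');

    final char typeChar;
    BasicType(char c) { this.typeChar = c; }

    // 16-slot table; the hash below happens to be collision-free for I, J, D, L.
    private static final BasicType[] FROM_CHAR = new BasicType[16];
    static {
        for (BasicType b : values()) {
            FROM_CHAR[hash(b.typeChar)] = b;
        }
    }

    private static int hash(char c) {
        return (c + (c >> 1)) & 15; // mask instead of modulo keeps this branch-free
    }

    static BasicType forChar(char c) {
        BasicType b = FROM_CHAR[hash(c)];
        // A slot may hold null or a colliding key, so verify before returning.
        if (b == null || b.typeChar != c) {
            throw new IllegalArgumentException("not a basic type: " + c);
        }
        return b;
    }

    public static void main(String[] args) {
        System.out.println(forChar('J')); // LONG
    }
}
```

The single array load plus one comparison is what makes this faster than an if-else chain once JIT-compiled, while the extra arithmetic is what costs a little in the interpreter.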
Re: RFR: 8284880: Re-examine sun.invoke.util.Wrapper hash tables [v3]
> This patch examines and optimizes `Wrapper` lookups. > > First wrote a few simple microbenchmarks to verify there are actual speedups > from using perfect hash tables in `sun.invoke.util.Wrapper` compared to > simpler lookup mechanisms (such as if-else or switch). Turns out there _is_ a > speed-up for the case of `char` -> `Wrapper`, but not when mapping from > `Class` -> `Wrapper`, so let's drop those. The `forPrimitiveType` case > didn't use the `FROM_CHAR` table for some reason, which is remedied. > > Micros show benefits across the board for warmed up case: > > > Baseline, OOTB > Benchmark Mode Cnt Score Error Units > Wrappers.forBasicType avgt5 14.387 ? 0.127 ns/op > Wrappers.forPrimitive avgt5 38.818 ? 0.592 ns/op > Wrappers.forPrimitiveType avgt5 26.085 ? 2.291 ns/op > Wrappers.forWrapperavgt5 44.459 ? 1.635 ns/op > > Patch, OOTB > Benchmark Mode Cnt Score Error Units > Wrappers.forBasicType avgt5 14.357 ? 0.133 ns/op > Wrappers.forPrimitive avgt5 23.930 ? 0.071 ns/op > Wrappers.forPrimitiveType avgt5 14.343 ? 0.017 ns/op > Wrappers.forWrapperavgt5 27.622 ? 0.022 ns/op > > > For `-Xint` case (`Wrapper` use is prominent during warmup, e.g., when > spinning up of MHs) there are decent or even great wins in all cases but > `forPrimitiveType` - which was changed from a simple switch to use the hash > lookup. Since the interpreter penalty is small in absolute terms and the win > on JITed code is significant this seems like a reasonable trade-off: > > > Baseline, -Xint > Benchmark Mode Cnt Score Error Units > Wrappers.forBasicType avgt5 1246.144 ? 149.933 ns/op > Wrappers.forPrimitive avgt5 4955.297 ? 329.869 ns/op > Wrappers.forPrimitiveType avgt5 716.840 ? 62.568 ns/op > Wrappers.forWrapperavgt5 5774.700 ? 367.627 ns/op > > Patch, -Xint > Benchmark Mode Cnt Score Error Units > Wrappers.forBasicType avgt5 1068.096 ? 101.728 ns/op > Wrappers.forPrimitive avgt5 1146.670 ? 59.142 ns/op > Wrappers.forPrimitiveType avgt5 998.037 ? 
118.144 ns/op > Wrappers.forWrapperavgt5 3581.226 ? 20.167 ns/op Claes Redestad has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'wrappers' of https://github.com/cl4es/jdk into wrappers - Copyrights - Changes: - all: https://git.openjdk.java.net/jdk/pull/8242/files - new: https://git.openjdk.java.net/jdk/pull/8242/files/97eec9d3..792c37e2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8242&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8242&range=01-02 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/8242.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8242/head:pull/8242 PR: https://git.openjdk.java.net/jdk/pull/8242
Re: RFR: 8284880: Re-examine sun.invoke.util.Wrapper hash tables [v2]
On Thu, 14 Apr 2022 13:59:57 GMT, Claes Redestad wrote: >> This patch examines and optimizes `Wrapper` lookups. >> >> First wrote a few simple microbenchmarks to verify there are actual speedups >> from using perfect hash tables in `sun.invoke.util.Wrapper` compared to >> simpler lookup mechanisms (such as if-else or switch). Turns out there _is_ >> a speed-up for the case of `char` -> `Wrapper`, but not when mapping from >> `Class` -> `Wrapper`, so let's drop those. The `forPrimitiveType` case >> didn't use the `FROM_CHAR` table for some reason, which is remedied. >> >> Micros show benefits across the board for warmed up case: >> >> >> Baseline, OOTB >> Benchmark Mode Cnt Score Error Units >> Wrappers.forBasicType avgt5 14.387 ? 0.127 ns/op >> Wrappers.forPrimitive avgt5 38.818 ? 0.592 ns/op >> Wrappers.forPrimitiveType avgt5 26.085 ? 2.291 ns/op >> Wrappers.forWrapperavgt5 44.459 ? 1.635 ns/op >> >> Patch, OOTB >> Benchmark Mode Cnt Score Error Units >> Wrappers.forBasicType avgt5 14.357 ? 0.133 ns/op >> Wrappers.forPrimitive avgt5 23.930 ? 0.071 ns/op >> Wrappers.forPrimitiveType avgt5 14.343 ? 0.017 ns/op >> Wrappers.forWrapperavgt5 27.622 ? 0.022 ns/op >> >> >> For `-Xint` case (`Wrapper` use is prominent during warmup, e.g., when >> spinning up of MHs) there are decent or even great wins in all cases but >> `forPrimitiveType` - which was changed from a simple switch to use the hash >> lookup. Since the interpreter penalty is small in absolute terms and the win >> on JITed code is significant this seems like a reasonable trade-off: >> >> >> Baseline, -Xint >> Benchmark Mode Cnt Score Error Units >> Wrappers.forBasicType avgt5 1246.144 ? 149.933 ns/op >> Wrappers.forPrimitive avgt5 4955.297 ? 329.869 ns/op >> Wrappers.forPrimitiveType avgt5 716.840 ? 62.568 ns/op >> Wrappers.forWrapperavgt5 5774.700 ? 367.627 ns/op >> >> Patch, -Xint >> Benchmark Mode Cnt Score Error Units >> Wrappers.forBasicType avgt5 1068.096 ? 
101.728 ns/op >> Wrappers.forPrimitive avgt5 1146.670 ? 59.142 ns/op >> Wrappers.forPrimitiveType avgt5 998.037 ? 118.144 ns/op >> Wrappers.forWrapperavgt5 3581.226 ? 20.167 ns/op > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Add line break in make/test/BuildMicrobenchmark.gmk > > Co-authored-by: Erik Joelsson <37597443+erik...@users.noreply.github.com> Thanks for reviews! - PR: https://git.openjdk.java.net/jdk/pull/8242
Integrated: 8285001: Simplify StringLatin1.regionMatches
On Tue, 19 Apr 2022 08:50:09 GMT, Claes Redestad wrote: > There is no pair of character values in the latin1 range (0-255) that will > make the condition at line 401 in `StringLatin1.java` true, so this test can > be removed. > > Added a test and a microbenchmark (which as expected sees a few ns/op > improvement from this change). Took the opportunity to tune the default > settings for the micro to reduce runtime from 30+ minutes to 3 with no > discernible degradation of quality. This pull request has now been integrated. Changeset: e307bc86 Author:Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/e307bc8694462568807021191f9653ee80a93ed1 Stats: 95 lines in 3 files changed: 88 ins; 3 del; 4 mod 8285001: Simplify StringLatin1.regionMatches Reviewed-by: rriggs, naoto - PR: https://git.openjdk.java.net/jdk/pull/8292
Re: RFR: 8285001: Simplify StringLatin1.regionMatches
On Tue, 19 Apr 2022 08:50:09 GMT, Claes Redestad wrote: > There is no pair of character values in the latin1 range (0-255) that will > make the condition at line 401 in `StringLatin1.java` true, so this test can > be removed. > > Added a test and a microbenchmark (which as expected sees a few ns/op > improvement from this change). Took the opportunity to tune the default > settings for the micro to reduce runtime from 30+ minutes to 3 with no > discernible degradation of quality. Thanks for reviews! - PR: https://git.openjdk.java.net/jdk/pull/8292
Re: RFR: 8285001: Simplify StringLatin1.regionMatches [v2]
> There is no pair of character values in the latin1 range (0-255) that will > make the condition at line 401 in `StringLatin1.java` true, so this test can > be removed. > > Added a test and a microbenchmark (which as expected sees a few ns/op > improvement from this change). Took the opportunity to tune the default > settings for the micro to reduce runtime from 30+ minutes to 3 with no > discernible degradation of quality. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Copyright - Changes: - all: https://git.openjdk.java.net/jdk/pull/8292/files - new: https://git.openjdk.java.net/jdk/pull/8292/files/a8894f79..434a0fd2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8292&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8292&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/8292.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8292/head:pull/8292 PR: https://git.openjdk.java.net/jdk/pull/8292
Integrated: 8285007: Use correct lookup mode for MethodHandleStatics.UNSAFE
On Tue, 19 Apr 2022 13:19:31 GMT, Claes Redestad wrote: > Trivially fix the resolution of the `NF_UNSAFE` named function to use the > appropriate lookup mode. > > In [JDK-8187826](https://bugs.openjdk.java.net/browse/JDK-8187826) we changed > from regular reflection to use a `NamedFunction` for this field, but it > appears the lookup mode came out wrong. This mismatch appears to be benign > and ignored by HotSpot, but a user implementing an experimental JVM ran into > some issues (and additionally noticed the resolve throws the wrong > exception). > > This patch is a point fix to this particular issue, but I think we should > consider following up with a stronger assertion or test for proper staticness > of fields somewhere when resolving fields via > `MemberName.getFactory().resolveOrNull(..)`. This pull request has now been integrated. Changeset: 5d1ec54d Author:Claes Redestad URL: https://git.openjdk.java.net/jdk/commit/5d1ec54d6c20dfe67a459c9d102cdfa0394bcc1e Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8285007: Use correct lookup mode for MethodHandleStatics.UNSAFE Reviewed-by: psandoz, mchung - PR: https://git.openjdk.java.net/jdk/pull/8297
Re: RFR: 8285007: Use correct lookup mode for MethodHandleStatics.UNSAFE [v2]
On Tue, 19 Apr 2022 17:33:04 GMT, Claes Redestad wrote: >> Trivially fix the resolution of the `NF_UNSAFE` named function to use the >> appropriate lookup mode. >> >> In [JDK-8187826](https://bugs.openjdk.java.net/browse/JDK-8187826) we >> changed from regular reflection to use a `NamedFunction` for this field, but >> it appears the lookup mode came out wrong. This mismatch appears to be >> benign and ignored by HotSpot, but a user implementing an experimental JVM >> ran into some issues (and additionally noticed the resolve throws the wrong >> exception). >> >> This patch is a point fix to this particular issue, but I think we should >> consider following up with a stronger assertion or test for proper >> staticness of fields somewhere when resolving fields via >> `MemberName.getFactory().resolveOrNull(..)`. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Missed a spot! Thanks for reviewing! - PR: https://git.openjdk.java.net/jdk/pull/8297
Re: RFR: 8285007: Use correct lookup mode for MethodHandleStatics.UNSAFE [v2]
On Tue, 19 Apr 2022 17:01:38 GMT, Mandy Chung wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Missed a spot! > > src/java.base/share/classes/java/lang/invoke/DirectMethodHandle.java line 901: > >> 899: MemberName member = new >> MemberName(MethodHandleStatics.class, "UNSAFE", Unsafe.class, REF_getStatic); >> 900: return new NamedFunction( >> 901: >> MemberName.getFactory().resolveOrFail(REF_getField, member, > > `REF_getField` passed to `resolveOrFail` should also be corrected? Yes, fixed. - PR: https://git.openjdk.java.net/jdk/pull/8297
Re: RFR: 8285007: Use correct lookup mode for MethodHandleStatics.UNSAFE [v2]
> Trivially fix the resolution of the `NF_UNSAFE` named function to use the > appropriate lookup mode. > > In [JDK-8187826](https://bugs.openjdk.java.net/browse/JDK-8187826) we changed > from regular reflection to use a `NamedFunction` for this field, but it > appears the lookup mode came out wrong. This mismatch appears to be benign > and ignored by HotSpot, but a user implementing an experimental JVM ran into > some issues (and additionally noticed the resolve throws the wrong > exception). > > This patch is a point fix to this particular issue, but I think we should > consider following up with a stronger assertion or test for proper staticness > of fields somewhere when resolving fields via > `MemberName.getFactory().resolveOrNull(..)`. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Missed a spot! - Changes: - all: https://git.openjdk.java.net/jdk/pull/8297/files - new: https://git.openjdk.java.net/jdk/pull/8297/files/fa10026a..e4f37c49 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8297&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8297&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/8297.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8297/head:pull/8297 PR: https://git.openjdk.java.net/jdk/pull/8297
RFR: 8285007: Use correct lookup mode for MethodHandleStatics.UNSAFE
Trivially fix the resolution of the `NF_UNSAFE` named function to use the appropriate lookup mode. In [JDK-8187826](https://bugs.openjdk.java.net/browse/JDK-8187826) we changed from regular reflection to use a `NamedFunction` for this field, but it appears the lookup mode came out wrong. This mismatch appears to be benign and ignored by HotSpot, but a user implementing an experimental JVM ran into some issues (and additionally noticed the resolve throws the wrong exception). This patch is a point fix to this particular issue, but I think we should consider following up with a stronger assertion or test for proper staticness of fields somewhere when resolving fields via `MemberName.getFactory().resolveOrNull(..)`. - Commit messages: - Use correct lookup mode for MethodHandleStatics.UNSAFE Changes: https://git.openjdk.java.net/jdk/pull/8297/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8297&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8285007 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/8297.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8297/head:pull/8297 PR: https://git.openjdk.java.net/jdk/pull/8297
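The distinction being fixed here - `REF_getStatic` vs `REF_getField` - mirrors what the public `Lookup` API enforces: a static field must be resolved with the static access mode. A minimal illustration (the demo class and field are hypothetical, not from the patch):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

public class LookupModeDemo {
    static final int ANSWER = 42;

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();

        // Correct mode: a static field is resolved with a static getter (REF_getStatic).
        MethodHandle getter = lookup.findStaticGetter(LookupModeDemo.class, "ANSWER", int.class);
        System.out.println((int) getter.invokeExact()); // 42

        // Wrong mode: asking for an instance getter (REF_getField) on a static
        // field is rejected at resolution time with IllegalAccessException.
        try {
            lookup.findGetter(LookupModeDemo.class, "ANSWER", int.class);
        } catch (IllegalAccessException expected) {
            System.out.println("rejected: " + expected);
        }
    }
}
```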
RFR: 8285001: Simplify StringLatin1.regionMatches
There is no pair of character values in the latin1 range (0-255) that will make the condition at line 401 in `StringLatin1.java` true, so this test can be removed. Added a test and a microbenchmark (which as expected sees a few ns/op improvement from this change). Took the opportunity to tune the default settings for the micro to reduce runtime from 30+ minutes to 3 with no discernible degradation of quality. - Commit messages: - Add tests and micro - Simplify latin1 to latin1 locale-independent comparisons in StringLatin1.regionMatchesCI Changes: https://git.openjdk.java.net/jdk/pull/8292/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8292&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8285001 Stats: 94 lines in 3 files changed: 88 ins; 3 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/8292.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8292/head:pull/8292 PR: https://git.openjdk.java.net/jdk/pull/8292
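The claim about the condition at line 401 is easy to check exhaustively over all 256×256 latin-1 pairs. The brute-force verifier below is my own reconstruction (modelling the removed fallback as comparing the lower-case forms of the upper-case images), not the patch's actual test:

```java
public class RegionMatchesCIExhaustiveCheck {
    public static void main(String[] args) {
        int reachable = 0;
        for (int a = 0; a < 256; a++) {
            for (int b = 0; b < 256; b++) {
                char u1 = Character.toUpperCase((char) a);
                char u2 = Character.toUpperCase((char) b);
                // Count pairs where the removed fallback would have been the
                // deciding test: upper-case forms differ, lower-case forms agree.
                if (u1 != u2 && Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                    reachable++;
                }
            }
        }
        // 0 expected: the removed branch can never fire for latin-1 inputs.
        System.out.println(reachable);
    }
}
```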
Re: RFR: 8284880: Re-examine sun.invoke.util.Wrapper hash tables [v2]
> This patch examines and optimizes `Wrapper` lookups. > > First wrote a few simple microbenchmarks to verify there are actual speedups > from using perfect hash tables in `sun.invoke.util.Wrapper` compared to > simpler lookup mechanisms (such as if-else or switch). Turns out there _is_ a > speed-up for the case of `char` -> `Wrapper`, but not when mapping from > `Class` -> `Wrapper`, so let's drop those. The `forPrimitiveType` case > didn't use the `FROM_CHAR` table for some reason, which is remedied. > > Micros show benefits across the board for warmed up case: > > > Baseline, OOTB > Benchmark Mode Cnt Score Error Units > Wrappers.forBasicType avgt5 14.387 ? 0.127 ns/op > Wrappers.forPrimitive avgt5 38.818 ? 0.592 ns/op > Wrappers.forPrimitiveType avgt5 26.085 ? 2.291 ns/op > Wrappers.forWrapperavgt5 44.459 ? 1.635 ns/op > > Patch, OOTB > Benchmark Mode Cnt Score Error Units > Wrappers.forBasicType avgt5 14.357 ? 0.133 ns/op > Wrappers.forPrimitive avgt5 23.930 ? 0.071 ns/op > Wrappers.forPrimitiveType avgt5 14.343 ? 0.017 ns/op > Wrappers.forWrapperavgt5 27.622 ? 0.022 ns/op > > > For `-Xint` case (`Wrapper` use is prominent during warmup, e.g., when > spinning up of MHs) there are decent or even great wins in all cases but > `forPrimitiveType` - which was changed from a simple switch to use the hash > lookup. Since the interpreter penalty is small in absolute terms and the win > on JITed code is significant this seems like a reasonable trade-off: > > > Baseline, -Xint > Benchmark Mode Cnt Score Error Units > Wrappers.forBasicType avgt5 1246.144 ? 149.933 ns/op > Wrappers.forPrimitive avgt5 4955.297 ? 329.869 ns/op > Wrappers.forPrimitiveType avgt5 716.840 ? 62.568 ns/op > Wrappers.forWrapperavgt5 5774.700 ? 367.627 ns/op > > Patch, -Xint > Benchmark Mode Cnt Score Error Units > Wrappers.forBasicType avgt5 1068.096 ? 101.728 ns/op > Wrappers.forPrimitive avgt5 1146.670 ? 59.142 ns/op > Wrappers.forPrimitiveType avgt5 998.037 ? 
118.144 ns/op > Wrappers.forWrapperavgt5 3581.226 ? 20.167 ns/op Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Add line break in make/test/BuildMicrobenchmark.gmk Co-authored-by: Erik Joelsson <37597443+erik...@users.noreply.github.com> - Changes: - all: https://git.openjdk.java.net/jdk/pull/8242/files - new: https://git.openjdk.java.net/jdk/pull/8242/files/d21330bb..97eec9d3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8242&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8242&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/8242.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8242/head:pull/8242 PR: https://git.openjdk.java.net/jdk/pull/8242
RFR: 8284880: Re-examine sun.invoke.util.Wrapper hash tables
This patch examines and optimizes `Wrapper` lookups.

First I wrote a few simple microbenchmarks to verify there are actual speedups from using perfect hash tables in `sun.invoke.util.Wrapper` compared to simpler lookup mechanisms (such as if-else or switch). Turns out there _is_ a speed-up for the case of `char` -> `Wrapper`, but not when mapping from `Class` -> `Wrapper`, so let's drop those. The `forPrimitiveType` case didn't use the `FROM_CHAR` table for some reason, which is remedied.

Micros show benefits across the board for the warmed-up case:

Baseline, OOTB
Benchmark                  Mode  Cnt   Score   Error  Units
Wrappers.forBasicType      avgt    5  14.387 ± 0.127  ns/op
Wrappers.forPrimitive      avgt    5  38.818 ± 0.592  ns/op
Wrappers.forPrimitiveType  avgt    5  26.085 ± 2.291  ns/op
Wrappers.forWrapper        avgt    5  44.459 ± 1.635  ns/op

Patch, OOTB
Benchmark                  Mode  Cnt   Score   Error  Units
Wrappers.forBasicType      avgt    5  14.357 ± 0.133  ns/op
Wrappers.forPrimitive      avgt    5  23.930 ± 0.071  ns/op
Wrappers.forPrimitiveType  avgt    5  14.343 ± 0.017  ns/op
Wrappers.forWrapper        avgt    5  27.622 ± 0.022  ns/op

For the `-Xint` case (`Wrapper` use is prominent during warmup, e.g., when spinning up MHs) there are decent or even great wins in all cases but `forPrimitiveType` - which was changed from a simple switch to use the hash lookup. Since the interpreter penalty is small in absolute terms and the win on JITed code is significant, this seems like a reasonable trade-off:

Baseline, -Xint
Benchmark                  Mode  Cnt     Score     Error  Units
Wrappers.forBasicType      avgt    5  1246.144 ± 149.933  ns/op
Wrappers.forPrimitive      avgt    5  4955.297 ± 329.869  ns/op
Wrappers.forPrimitiveType  avgt    5   716.840 ±  62.568  ns/op
Wrappers.forWrapper        avgt    5  5774.700 ± 367.627  ns/op

Patch, -Xint
Benchmark                  Mode  Cnt     Score     Error  Units
Wrappers.forBasicType      avgt    5  1068.096 ± 101.728  ns/op
Wrappers.forPrimitive      avgt    5  1146.670 ±  59.142  ns/op
Wrappers.forPrimitiveType  avgt    5   998.037 ± 118.144  ns/op
Wrappers.forWrapper        avgt    5  3581.226 ±  20.167  ns/op

- Commit messages:
 - Revert accidental changes to MethodType
 - Fix typo
 - Missed some changes refactoring, import DontInline
 - Merge branch 'master' into wrappers
 - Cleanup, use mask instead of modulo
 - Cleanup, add forPrimitiveType micro, reinstantiate char hashing
 - Add required add-exports to BuildMicrobenchmark.gmk
 - Fix imports
 - Tune sun.invoke.util.Wrapper accessors

Changes: https://git.openjdk.java.net/jdk/pull/8242/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8242&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8284880
Stats: 256 lines in 3 files changed: 157 ins; 59 del; 40 mod
Patch: https://git.openjdk.java.net/jdk/pull/8242.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8242/head:pull/8242

PR: https://git.openjdk.java.net/jdk/pull/8242
Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v9]
On Thu, 7 Apr 2022 07:01:23 GMT, Ludovic Henry wrote:

>> Despite the hash value being cached for Strings, computing the hash still represents significant CPU usage for applications handling lots of text.
>>
>> Even though it would be generally better to do it through an enhancement to the autovectorizer, the complexity of doing it by hand is trivial and the gain is sizable (2x speedup) even without the Vector API. The algorithm has been proposed by Richard Startin and Paul Sandoz [1].
>>
>> Speedups are as follows on an `Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz`:
>>
>> Benchmark                                        (size)  Mode  Cnt     Score     Error  Units
>> StringHashCode.Algorithm.scalarLatin1                 0  avgt   25     2.111 ±   0.210  ns/op
>> StringHashCode.Algorithm.scalarLatin1                 1  avgt   25     3.500 ±   0.127  ns/op
>> StringHashCode.Algorithm.scalarLatin1                10  avgt   25     7.001 ±   0.099  ns/op
>> StringHashCode.Algorithm.scalarLatin1               100  avgt   25    61.285 ±   0.444  ns/op
>> StringHashCode.Algorithm.scalarLatin1              1000  avgt   25   628.995 ±   0.846  ns/op
>> StringHashCode.Algorithm.scalarLatin1             10000  avgt   25  6307.990 ±   4.071  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16       0  avgt   25     2.358 ±   0.092  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16       1  avgt   25     3.631 ±   0.159  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16      10  avgt   25     7.049 ±   0.019  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16     100  avgt   25    33.626 ±   1.218  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16    1000  avgt   25   317.811 ±   1.225  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16   10000  avgt   25  3212.333 ±  14.621  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8        0  avgt   25     2.356 ±   0.097  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8        1  avgt   25     3.630 ±   0.158  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8       10  avgt   25     8.724 ±   0.065  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8      100  avgt   25    32.402 ±   0.019  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8     1000  avgt   25   321.949 ±   0.251  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8    10000  avgt   25  3202.083 ±   1.667  ns/op
>> StringHashCode.Algorithm.scalarUTF16                  0  avgt   25     2.135 ±   0.191  ns/op
>> StringHashCode.Algorithm.scalarUTF16                  1  avgt   25     5.202 ±   0.362  ns/op
>> StringHashCode.Algorithm.scalarUTF16                 10  avgt   25    11.105 ±   0.112  ns/op
>> StringHashCode.Algorithm.scalarUTF16                100  avgt   25    75.974 ±   0.702  ns/op
>> StringHashCode.Algorithm.scalarUTF16               1000  avgt   25   716.429 ±   3.290  ns/op
>> StringHashCode.Algorithm.scalarUTF16              10000  avgt   25  7095.459 ±  43.847  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16        0  avgt   25     2.381 ±   0.038  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16        1  avgt   25     5.268 ±   0.422  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16       10  avgt   25    11.248 ±   0.178  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16      100  avgt   25    52.966 ±   0.089  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16     1000  avgt   25   450.912 ±   1.834  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16    10000  avgt   25  4403.988 ±   2.927  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8         0  avgt   25     2.401 ±   0.032  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8         1  avgt   25     5.091 ±   0.396  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8        10  avgt   25    12.801 ±   0.189  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8       100  avgt   25    52.068 ±   0.032  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8      1000  avgt   25   453.270 ±   0.340  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8     10000  avgt   25  4433.112 ±   2.699  ns/op
>>
>> At Datadog, we handle a great amount of text (through logs management for example), and hashing Strings represents a large part of our CPU usage. It's very unlikely that we are the only ones, as `String.hashCode` is such a core feature of JVM-based languages, with its use in `HashMap` for example. Having even only a 2x speedup would allow us to save thousands of CPU cores per month and correspondingly improve the energy/carbon impact.
>>
>> [1] https://static.rainfocus.com/oracle/oow18/sess/1525822677955001tLqU/PF/codeone18-vector-API-DEV5081_1540354883936001Q3Sv.pdf
>
> Ludovic Henry has updated the pull request incrementally with one additional commit since the last revision.
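The hand-unrolled polynomial hash described above can be illustrated with a standalone sketch (my reconstruction of the technique, not the patch itself): each iteration folds eight Latin-1 bytes into the hash using precomputed powers of 31, which shortens the loop-carried `h * 31 + c` dependency chain so the eight products can be computed in parallel.

```java
class Latin1Hash {
    // Precomputed powers of 31; combining 8 bytes per step replaces eight
    // serial multiply-adds with one multiply plus independent products.
    private static final int P2 = 31 * 31;
    private static final int P3 = P2 * 31;
    private static final int P4 = P3 * 31;
    private static final int P5 = P4 * 31;
    private static final int P6 = P5 * 31;
    private static final int P7 = P6 * 31;
    private static final int P8 = P7 * 31;

    static int hashCode(byte[] bytes) {
        int h = 0;
        int i = 0;
        for (; i + 8 <= bytes.length; i += 8) {
            h = h * P8
                + (bytes[i]     & 0xff) * P7
                + (bytes[i + 1] & 0xff) * P6
                + (bytes[i + 2] & 0xff) * P5
                + (bytes[i + 3] & 0xff) * P4
                + (bytes[i + 4] & 0xff) * P3
                + (bytes[i + 5] & 0xff) * P2
                + (bytes[i + 6] & 0xff) * 31
                + (bytes[i + 7] & 0xff);
        }
        // Tail: the classic one-byte-at-a-time recurrence.
        for (; i < bytes.length; i++) {
            h = h * 31 + (bytes[i] & 0xff);
        }
        return h;
    }
}
```

Because this is only an algebraic regrouping of the same polynomial (int overflow is modular, so grouping doesn't change the result), it returns exactly the same value as `String.hashCode()` for Latin-1 content.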
Integrated: 8284579: Improve VarHandle checks for interpreter
On Fri, 8 Apr 2022 11:48:10 GMT, Claes Redestad wrote:

> A few additional enhancements aiming to improve VH performance in the interpreter:
>
> - Flatten `TypeAndInvokers`: adds a pointer to `VarHandle` (a small increase, 40->48) but removes an object and an indirection on any instance actually used - and might avoid allocating the `MethodHandle[]` unnecessarily on some instances
> - Have `checkExactAccessMode` return the directness of the `VarHandle` so that we can avoid some `isDirect` method calls.
>
> Baseline, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  478.324 ± 5.762  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  392.114 ± 1.644  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  822.484 ± 1.865  ns/op
>
> Patched, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  437.704 ± 5.320  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  374.512 ± 3.154  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  757.054 ± 1.237  ns/op
>
> No significant performance difference in normal mode.

This pull request has now been integrated.

Changeset: 280aa428
Author: Claes Redestad
URL: https://git.openjdk.java.net/jdk/commit/280aa428800043f314b92ae88076d596cb4c2fe0
Stats: 244 lines in 4 files changed: 26 ins; 28 del; 190 mod

8284579: Improve VarHandle checks for interpreter

Reviewed-by: mcimadamore, mchung

- PR: https://git.openjdk.java.net/jdk/pull/8160
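The second enhancement (returning directness from the check) can be sketched with a hypothetical, much-simplified handle class; all names below are illustrative and not the JDK's internals:

```java
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

// Toy stand-in for a VarHandle-like object: the check both validates the
// invocation and reports directness, so an interpreter needs one method call
// where it previously needed two (check, then isDirect).
class ToyHandle {
    final boolean exact;
    final boolean direct;
    final MethodType exactType;

    ToyHandle(boolean exact, boolean direct, MethodType exactType) {
        this.exact = exact;
        this.direct = direct;
        this.exactType = exactType;
    }

    // Before: callers did checkExactAccessMode(ad); if (handle.isDirect()) ...
    // After: the check returns the directness, saving the second call.
    boolean checkAccessModeThenIsDirect(MethodType symbolicType) {
        if (exact && !exactType.equals(symbolicType)) {
            throw new WrongMethodTypeException(
                    "expected " + exactType + " but found " + symbolicType);
        }
        return direct;
    }
}
```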
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v5]
> A few additional enhancements aiming to improve VH performance in the interpreter:
>
> - Flatten `TypeAndInvokers`: adds a pointer to `VarHandle` (a small increase, 40->48) but removes an object and an indirection on any instance actually used - and might avoid allocating the `MethodHandle[]` unnecessarily on some instances
> - Have `checkExactAccessMode` return the directness of the `VarHandle` so that we can avoid some `isDirect` method calls.
>
> Baseline, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  478.324 ± 5.762  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  392.114 ± 1.644  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  822.484 ± 1.865  ns/op
>
> Patched, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  437.704 ± 5.320  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  374.512 ± 3.154  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  757.054 ± 1.237  ns/op
>
> No significant performance difference in normal mode.

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  typo

Changes:
- all: https://git.openjdk.java.net/jdk/pull/8160/files
- new: https://git.openjdk.java.net/jdk/pull/8160/files/df1d652b..9998e538

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=04
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=03-04

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/8160.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8160/head:pull/8160

PR: https://git.openjdk.java.net/jdk/pull/8160
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v4]
> A few additional enhancements aiming to improve VH performance in the interpreter:
>
> - Flatten `TypeAndInvokers`: adds a pointer to `VarHandle` (a small increase, 40->48) but removes an object and an indirection on any instance actually used - and might avoid allocating the `MethodHandle[]` unnecessarily on some instances
> - Have `checkExactAccessMode` return the directness of the `VarHandle` so that we can avoid some `isDirect` method calls.
>
> Baseline, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  478.324 ± 5.762  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  392.114 ± 1.644  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  822.484 ± 1.865  ns/op
>
> Patched, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  437.704 ± 5.320  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  374.512 ± 3.154  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  757.054 ± 1.237  ns/op
>
> No significant performance difference in normal mode.

Claes Redestad has updated the pull request incrementally with two additional commits since the last revision:

  - Improve javadoc for merged method
  - Add javadoc for merged method

Changes:
- all: https://git.openjdk.java.net/jdk/pull/8160/files
- new: https://git.openjdk.java.net/jdk/pull/8160/files/2a4fbd6d..df1d652b

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=03
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=02-03

Stats: 10 lines in 1 file changed: 10 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/8160.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8160/head:pull/8160

PR: https://git.openjdk.java.net/jdk/pull/8160
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v3]
On Tue, 12 Apr 2022 01:15:33 GMT, David Holmes wrote:

> > checkExactAccessMode -> checkAccessModeThenIsDirect
>
> Don't you still want "Exact" in there? That "access" check seems odd anyway as it only checks for one form of mismatch - should it not also check for `!exact && accessModeType(ad.type) == ad.symbolicMethodTypeExact`?

What's checked is that the access mode is nominally OK. It just so happens that the only rule validated up front is that iff the VH is exact, then the types must be an exact match. For a non-exact VH, sufficient checks will be done later, e.g. by the `asType` call (if the type of the VH isn't adaptable to `ad.symbolicMethodTypeInvoker`, this will throw an exception). With that in mind, the name `checkExactAccessMode` even appears misleading, as it can be read as requiring that the VH _must_ be `exact`. Which is not the case. Dropping the `Exact` part makes it less ambiguous.

- PR: https://git.openjdk.java.net/jdk/pull/8160
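The exact/non-exact distinction being discussed is observable through the public API. A small demo (the class and field names here are hypothetical, but `withInvokeExactBehavior()` is the real method, available since JDK 16): an exact VarHandle rejects up front any invocation whose symbolic type descriptor is not an exact match for the access mode type.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.invoke.WrongMethodTypeException;

class ExactVhDemo {
    static int counter = 42;
    static final VarHandle VH;
    static {
        try {
            VH = MethodHandles.lookup()
                    .findStaticVarHandle(ExactVhDemo.class, "counter", int.class)
                    .withInvokeExactBehavior();
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int readExact() {
        return (int) VH.get();   // symbolic type ()int matches exactly: OK
    }

    static boolean mismatchThrows() {
        try {
            // The cast to Integer makes the symbolic type ()Integer, which
            // is not an exact match, so the up-front check fails.
            Integer bad = (Integer) VH.get();
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        }
    }
}
```

A generic (non-exact) VarHandle would instead adapt the `()Integer` call via `asType` and succeed, which is exactly the later, "sufficient" check referred to above.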
Re: RFR: 8284637: Improve String.join performance
On Fri, 8 Apr 2022 19:33:26 GMT, XenoAmess wrote:

> 8284637: Improve String.join performance

I think you can use or extend the existing `org.openjdk.bench.java.util.StringJoinerBench` JMH benchmark for this. Refer to [doc/testing.md](../tree/master/doc/testing.md) for setup help; you should be able to run a specific JMH benchmark through the build system with something like `make test TEST=micro:StringJoinerBench.join`.

- PR: https://git.openjdk.java.net/jdk/pull/8169
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v3]
> A few additional enhancements aiming to improve VH performance in the interpreter:
>
> - Flatten `TypeAndInvokers`: adds a pointer to `VarHandle` (a small increase, 40->48) but removes an object and an indirection on any instance actually used - and might avoid allocating the `MethodHandle[]` unnecessarily on some instances
> - Have `checkExactAccessMode` return the directness of the `VarHandle` so that we can avoid some `isDirect` method calls.
>
> Baseline, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  478.324 ± 5.762  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  392.114 ± 1.644  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  822.484 ± 1.865  ns/op
>
> Patched, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  437.704 ± 5.320  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  374.512 ± 3.154  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  757.054 ± 1.237  ns/op
>
> No significant performance difference in normal mode.

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  checkExactAccessMode -> checkAccessModeThenIsDirect, privatize throw method, cleanup code generator

Changes:
- all: https://git.openjdk.java.net/jdk/pull/8160/files
- new: https://git.openjdk.java.net/jdk/pull/8160/files/68f414a1..2a4fbd6d

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=02
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=01-02

Stats: 90 lines in 4 files changed: 0 ins; 5 del; 85 mod
Patch: https://git.openjdk.java.net/jdk/pull/8160.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8160/head:pull/8160

PR: https://git.openjdk.java.net/jdk/pull/8160
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v2]
On Mon, 11 Apr 2022 09:57:42 GMT, Claes Redestad wrote:

>> src/java.base/share/classes/java/lang/invoke/VarHandles.java line 719:
>>
>>> 717:                 //        MethodHandle.linkToStatic();
>>> 718:                 //    } else {
>>> 719:                 //        MethodHandle mh = handle.getMethodHandle(ad.mode);
>>
>> The `direct`-ness check can be hoisted into an enclosing `if` statement:
>> Suggestion:
>>
>>     // if (direct) {
>>     //     if (handle.vform.methodType_table[ad.type] == ad.symbolicMethodTypeErased) {
>>     //         MethodHandle.linkToStatic();
>>     //         return;
>>     //     } else if (handle.vform.getMethodType_V(ad.type) == ad.symbolicMethodTypeErased) {
>>     //         MethodHandle.linkToStatic();
>>     //         return;
>>     //     }
>>     // }
>>     // MethodHandle mh = handle.getMethodHandle(ad.mode);
>>
>> Also, any reason `GUARD_METHOD_TEMPLATE_V` uses `LINK_TO_STATIC_ARGS` instead of `LINK_TO_STATIC_ARGS_V`?
>
>> How would the performance change if the `isDirect` and `checkExactAccessMode` merger was reverted?
>
> Adds around 15-20 ns/op for these micros.

Restructuring so that we only check `direct` once sounds reasonable at face value, but it would be a lot of churn for little gain (even in the interpreter, testing a local boolean field is fast, and JITs will optimize this well).

The `LINK_TO_STATIC_ARGS_V` in the code generator seems like a remnant from an earlier iteration of this code. The `vform.getMemberName_V` method the code generator would emit a call to doesn't even exist. This should probably be cleaned up, separately. @PaulSandoz, WDYT?

- PR: https://git.openjdk.java.net/jdk/pull/8160
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v2]
On Mon, 11 Apr 2022 09:34:17 GMT, ExE Boss wrote:

>> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Simplified as suggested by @ExE-Boss
>
> src/java.base/share/classes/java/lang/invoke/VarHandles.java line 719:
>
>> 717:                 //        MethodHandle.linkToStatic();
>> 718:                 //    } else {
>> 719:                 //        MethodHandle mh = handle.getMethodHandle(ad.mode);
>
> The `direct`-ness check can be hoisted into an enclosing `if` statement:
> Suggestion:
>
>     // if (direct) {
>     //     if (handle.vform.methodType_table[ad.type] == ad.symbolicMethodTypeErased) {
>     //         MethodHandle.linkToStatic();
>     //         return;
>     //     } else if (handle.vform.getMethodType_V(ad.type) == ad.symbolicMethodTypeErased) {
>     //         MethodHandle.linkToStatic();
>     //         return;
>     //     }
>     // }
>     // MethodHandle mh = handle.getMethodHandle(ad.mode);
>
> Also, any reason `GUARD_METHOD_TEMPLATE_V` uses `LINK_TO_STATIC_ARGS` instead of `LINK_TO_STATIC_ARGS_V`?

> How would the performance change if the `isDirect` and `checkExactAccessMode` merger was reverted?

Adds around 15-20 ns/op for these micros.

- PR: https://git.openjdk.java.net/jdk/pull/8160
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v2]
On Mon, 11 Apr 2022 09:26:12 GMT, ExE Boss wrote:

>> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Simplified as suggested by @ExE-Boss
>
> src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2075:
>
>> 2073:
>> 2074:     @DontInline
>> 2075:     final void throwWrongMethodTypeException(VarHandle.AccessDescriptor ad) {
>
> This can actually be `private` and `static`:
> Suggestion:
>
>     private static final void throwWrongMethodTypeException(VarHandle.AccessDescriptor ad) {

`private`, yes. `static` would require passing the `MethodType`, and since this is an exceptional slow path it makes sense to keep the complexity contained inside the `throw...` method.

- PR: https://git.openjdk.java.net/jdk/pull/8160
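The pattern at issue, a tiny hot-path check that hands exception construction to a separate cold method, can be sketched generically (names are hypothetical; the real code additionally marks the slow path `@DontInline`):

```java
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

class CheckedAccessor {
    private final MethodType exactType = MethodType.methodType(int.class);

    // Hot path: just a compare and a branch. Keeping the message-building
    // code out of this method keeps it small and inline-friendly.
    void check(MethodType symbolic) {
        if (!exactType.equals(symbolic)) {
            throwWrongMethodTypeException(symbolic);
        }
    }

    // Cold path: an instance method because it reads this.exactType for the
    // message; making it static would force the caller to pass that along,
    // moving slow-path complexity back into the hot path.
    private void throwWrongMethodTypeException(MethodType symbolic) {
        throw new WrongMethodTypeException(
                "expected " + exactType + " but found " + symbolic);
    }
}
```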
Re: RFR: 8283892: Compress and expand bits [v6]
On Fri, 8 Apr 2022 19:13:35 GMT, Paul Sandoz wrote:

>> Add support to compress bits and expand bits of `int` and `long` values; see Hacker's Delight (2nd edition), section 7.4.
>>
>> Compressing or expanding bits of an `int` or `long` value can be composed to enable general permutations, and the "sheep and goats" operation (SAG); see Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a stable binary radix sort.
>>
>> The compress and expand functionality maps efficiently to hardware instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the implementations can be very efficient on supporting hardware. Intrinsification will occur in a separate PR.
>>
>> This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the beneficial performance impact of the `PDEP` instruction, and by extension the `expand` method, when applied to the implementation of a bit-vector select operation in succinct data structures (for example, `select(r)` returns the position of the `r`th 1).
>>
>> Testing-wise, the approach taken is threefold:
>> 1. Tests compare against simple implementations that are easy to read and verify against the JDK implementations (which later will also be made intrinsic). To compensate, all tests are also run flipping the test methods and the methods under test.
>> 2. Tests composed of compress and expand and vice versa.
>> 3. Tests with known mask patterns, whose expected values are easily derived from the inputs.
>
> Paul Sandoz has updated the pull request incrementally with one additional commit since the last revision:
>
>   Refer to hexadecimal digit

Marked as reviewed by redestad (Reviewer).

- PR: https://git.openjdk.java.net/jdk/pull/8115
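For reference, the semantics under review can be pinned down with a simple bit-by-bit model (my own loop sketch in the spirit of the "easy to read and verify" test implementations mentioned above, not the JDK's code): `compress` gathers the bits of `i` selected by `mask` into the low-order bits, and `expand` scatters low-order bits of `i` back out to the positions where `mask` is set.

```java
class BitOps {
    // Move each bit of i at a set position of mask down into consecutive
    // low-order bits, preserving their order (cf. Hacker's Delight 7.4, PEXT).
    static int compress(int i, int mask) {
        int result = 0, pos = 0;
        for (int bit = 0; bit < 32; bit++) {
            if ((mask & (1 << bit)) != 0) {
                if ((i & (1 << bit)) != 0) {
                    result |= 1 << pos;
                }
                pos++;
            }
        }
        return result;
    }

    // Inverse direction: take consecutive low-order bits of i and deposit
    // them at the set positions of mask (cf. PDEP).
    static int expand(int i, int mask) {
        int result = 0, pos = 0;
        for (int bit = 0; bit < 32; bit++) {
            if ((mask & (1 << bit)) != 0) {
                if ((i & (1 << pos)) != 0) {
                    result |= 1 << bit;
                }
                pos++;
            }
        }
        return result;
    }
}
```

The composition property the tests exploit follows directly: `compress(expand(v, m), m) == v` whenever `v` fits in `Integer.bitCount(m)` bits.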
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v2]
On Fri, 8 Apr 2022 12:20:32 GMT, Claes Redestad wrote:

>> A few additional enhancements aiming to improve VH performance in the interpreter:
>>
>> - Flatten `TypeAndInvokers`: adds a pointer to `VarHandle` (a small increase, 40->48) but removes an object and an indirection on any instance actually used - and might avoid allocating the `MethodHandle[]` unnecessarily on some instances
>> - Have `checkExactAccessMode` return the directness of the `VarHandle` so that we can avoid some `isDirect` method calls.
>>
>> Baseline, `-Xint`
>>
>> Benchmark                                 Mode  Cnt    Score   Error  Units
>> VarHandleExact.exact_exactInvocation      avgt   30  478.324 ± 5.762  ns/op
>> VarHandleExact.generic_exactInvocation    avgt   30  392.114 ± 1.644  ns/op
>> VarHandleExact.generic_genericInvocation  avgt   30  822.484 ± 1.865  ns/op
>>
>> Patched, `-Xint`
>>
>> Benchmark                                 Mode  Cnt    Score   Error  Units
>> VarHandleExact.exact_exactInvocation      avgt   30  437.704 ± 5.320  ns/op
>> VarHandleExact.generic_exactInvocation    avgt   30  374.512 ± 3.154  ns/op
>> VarHandleExact.generic_genericInvocation  avgt   30  757.054 ± 1.237  ns/op
>>
>> No significant performance difference in normal mode.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>
>   Simplified as suggested by @ExE-Boss

I couldn't come up with a better name, either, but I'll see if something comes up over the weekend.

- PR: https://git.openjdk.java.net/jdk/pull/8160
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v2]
On Fri, 8 Apr 2022 15:05:54 GMT, Maurizio Cimadamore wrote:

> TestAdaptVarHandles

Thanks for reviewing, Maurizio. I ran the micros you suggested and detect no change whatsoever at peak performance. With `-Xint` there are some small improvements, in line with the results above.

- PR: https://git.openjdk.java.net/jdk/pull/8160
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v2]
On Fri, 8 Apr 2022 12:12:06 GMT, ExE Boss wrote:

>> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Simplified as suggested by @ExE-Boss
>
> src/java.base/share/classes/java/lang/invoke/IndirectVarHandle.java line 104:
>
>> 102:         if (exact && accessModeType(ad.type) != ad.symbolicMethodTypeExact) {
>> 103:             throwWrongMethodTypeException(ad);
>> 104:         }
>
> This could also be:
> Suggestion:
>
>     super.checkExactAccessMode(ad);

Good suggestions, fixed.

- PR: https://git.openjdk.java.net/jdk/pull/8160
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v2]
> A few additional enhancements aiming to improve VH performance in the interpreter:
>
> - Flatten `TypeAndInvokers`: adds a pointer to `VarHandle` (a small increase, 40->48) but removes an object and an indirection on any instance actually used - and might avoid allocating the `MethodHandle[]` unnecessarily on some instances
> - Have `checkExactAccessMode` return the directness of the `VarHandle` so that we can avoid some `isDirect` method calls.
>
> Baseline, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  478.324 ± 5.762  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  392.114 ± 1.644  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  822.484 ± 1.865  ns/op
>
> Patched, `-Xint`
>
> Benchmark                                 Mode  Cnt    Score   Error  Units
> VarHandleExact.exact_exactInvocation      avgt   30  437.704 ± 5.320  ns/op
> VarHandleExact.generic_exactInvocation    avgt   30  374.512 ± 3.154  ns/op
> VarHandleExact.generic_genericInvocation  avgt   30  757.054 ± 1.237  ns/op
>
> No significant performance difference in normal mode.

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

  Simplified as suggested by @ExE-Boss

Changes:
- all: https://git.openjdk.java.net/jdk/pull/8160/files
- new: https://git.openjdk.java.net/jdk/pull/8160/files/49c8fdf8..68f414a1

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=01
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=00-01

Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/8160.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8160/head:pull/8160

PR: https://git.openjdk.java.net/jdk/pull/8160
RFR: 8284579: Improve VarHandle checks for interpreter
A few additional enhancements aiming to improve VH performance in the interpreter:

- Flatten `TypeAndInvokers`: adds a pointer to `VarHandle` (a small increase, 40->48) but removes an object and an indirection on any instance actually used - and might avoid allocating the `MethodHandle[]` unnecessarily on some instances
- Have `checkExactAccessMode` return the directness of the `VarHandle` so that we can avoid some `isDirect` method calls.

Baseline

Benchmark                                 Mode  Cnt    Score   Error  Units
VarHandleExact.exact_exactInvocation      avgt   30  478.324 ± 5.762  ns/op
VarHandleExact.generic_exactInvocation    avgt   30  392.114 ± 1.644  ns/op
VarHandleExact.generic_genericInvocation  avgt   30  822.484 ± 1.865  ns/op

Patched

Benchmark                                 Mode  Cnt    Score   Error  Units
VarHandleExact.exact_exactInvocation      avgt   30  437.704 ± 5.320  ns/op
VarHandleExact.generic_exactInvocation    avgt   30  374.512 ± 3.154  ns/op
VarHandleExact.generic_genericInvocation  avgt   30  757.054 ± 1.237  ns/op

Commit messages:
- Various improvements to VarHandle startup

Changes: https://git.openjdk.java.net/jdk/pull/8160/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=8160&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8284579
Stats: 232 lines in 4 files changed: 18 ins; 23 del; 191 mod
Patch: https://git.openjdk.java.net/jdk/pull/8160.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/8160/head:pull/8160

PR: https://git.openjdk.java.net/jdk/pull/8160