Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently [v3]

2022-05-27 Thread Claes Redestad
On Wed, 25 May 2022 14:13:52 GMT, Claes Redestad  wrote:

>> The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` 
>> allows keys to be compacted when all byte values of the key fit in 4 bits, 
>> otherwise a byte array is allocated and used. This means that all transforms 
>> with a kind value above 15 will be forced to allocate and use array 
>> comparisons.
>> 
>> Removing unused and folding some transforms to ensure all existing kinds can 
>> fit snugly within the 0-15 value range realize a minor improvement to 
>> footprint, speed and allocation pressure of affected transforms, e.g. 
>> ~300bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark:
>> 
>> Baseline:
>> 
>> Benchmark  Mode  Cnt 
>> Score Error   Units
>> SCFB.makeConcatWithConstants   avgt   15  
>> 2048.475 ?  69.887   ns/op
>> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
>> 3487.311 ?  80.385B/op
>> 
>> 
>> Patched:
>> 
>> Benchmark  Mode  Cnt 
>> Score Error   Units
>> SCFB.makeConcatWithConstants   avgt   15  
>> 1961.985 ? 101.519   ns/op
>> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
>> 3156.478 ? 183.600B/op
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Missed correctly taking b1 into account in of(byte, int, int...) 
> (java/lang/String/concat/ImplicitStringConcatShapes.java test failure)

Thanks!

-

PR: https://git.openjdk.java.net/jdk/pull/8881


Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently [v4]

2022-05-27 Thread Claes Redestad
> The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` 
> allows keys to be compacted when all byte values of the key fit in 4 bits, 
> otherwise a byte array is allocated and used. This means that all transforms 
> with a kind value above 15 will be forced to allocate and use array 
> comparisons.
> 
> Removing unused and folding some transforms to ensure all existing kinds can 
> fit snugly within the 0-15 value range realize a minor improvement to 
> footprint, speed and allocation pressure of affected transforms, e.g. 
> ~300bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark:
> 
> Baseline:
> 
> Benchmark  Mode  Cnt 
> Score Error   Units
> SCFB.makeConcatWithConstants   avgt   15  
> 2048.475 ?  69.887   ns/op
> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
> 3487.311 ?  80.385B/op
> 
> 
> Patched:
> 
> Benchmark  Mode  Cnt 
> Score Error   Units
> SCFB.makeConcatWithConstants   avgt   15  
> 1961.985 ? 101.519   ns/op
> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
> 3156.478 ? 183.600B/op

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Address Mandy review comments

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8881/files
  - new: https://git.openjdk.java.net/jdk/pull/8881/files/612b4ece..bcd79aad

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=8881=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8881=02-03

  Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8881.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8881/head:pull/8881

PR: https://git.openjdk.java.net/jdk/pull/8881


Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently [v3]

2022-05-26 Thread Mandy Chung
On Wed, 25 May 2022 14:13:52 GMT, Claes Redestad  wrote:

>> The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` 
>> allows keys to be compacted when all byte values of the key fit in 4 bits, 
>> otherwise a byte array is allocated and used. This means that all transforms 
>> with a kind value above 15 will be forced to allocate and use array 
>> comparisons.
>> 
>> Removing unused and folding some transforms to ensure all existing kinds can 
>> fit snugly within the 0-15 value range realize a minor improvement to 
>> footprint, speed and allocation pressure of affected transforms, e.g. 
>> ~300bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark:
>> 
>> Baseline:
>> 
>> Benchmark  Mode  Cnt 
>> Score Error   Units
>> SCFB.makeConcatWithConstants   avgt   15  
>> 2048.475 ?  69.887   ns/op
>> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
>> 3487.311 ?  80.385B/op
>> 
>> 
>> Patched:
>> 
>> Benchmark  Mode  Cnt 
>> Score Error   Units
>> SCFB.makeConcatWithConstants   avgt   15  
>> 1961.985 ? 101.519   ns/op
>> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
>> 3156.478 ? 183.600B/op
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Missed correctly taking b1 into account in of(byte, int, int...) 
> (java/lang/String/concat/ImplicitStringConcatShapes.java test failure)

LGTM with a couple of suggestions.

src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java line 219:

> 217: return new TransformKey(fullBytes);
> 218: }
> 219: static TransformKey of(byte kind, int b1, int b2, int[] b3456) {

Nit: I can't figure out if `b3456` intends to be a different convention than 
the others `b123` and `b234`.  I would expect something like this:

Suggestion:

static TransformKey of(byte kind, int b1, int b2, int... b345) {

src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java line 252:

> 250: if (!inRange(bitset))
> 251: return 0;
> 252: pb = pb | b2 << (2 * PACKED_BYTE_SIZE) | b1 << 
> PACKED_BYTE_SIZE | b0;

Suggestion:

pb = pb | packagedBytes(b0, b1, b2);

src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java line 267:

> 265: if (!inRange(bitset))
> 266: return 0;
> 267: pb = pb | b1 << PACKED_BYTE_SIZE | b0;

Suggestion:

pb = pb | packagedBytes(b0, b1);

-

Marked as reviewed by mchung (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/8881


Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently [v2]

2022-05-25 Thread Jorn Vernee
On Wed, 25 May 2022 14:03:41 GMT, Claes Redestad  wrote:

>> The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` 
>> allows keys to be compacted when all byte values of the key fit in 4 bits, 
>> otherwise a byte array is allocated and used. This means that all transforms 
>> with a kind value above 15 will be forced to allocate and use array 
>> comparisons.
>> 
>> Removing unused and folding some transforms to ensure all existing kinds can 
>> fit snugly within the 0-15 value range realize a minor improvement to 
>> footprint, speed and allocation pressure of affected transforms, e.g. 
>> ~300bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark:
>> 
>> Baseline:
>> 
>> Benchmark  Mode  Cnt 
>> Score Error   Units
>> SCFB.makeConcatWithConstants   avgt   15  
>> 2048.475 ?  69.887   ns/op
>> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
>> 3487.311 ?  80.385B/op
>> 
>> 
>> Patched:
>> 
>> Benchmark  Mode  Cnt 
>> Score Error   Units
>> SCFB.makeConcatWithConstants   avgt   15  
>> 1961.985 ? 101.519   ns/op
>> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
>> 3156.478 ? 183.600B/op
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Address review comments, fix unsafe shift, rework and remove ofBothArrays

Marked as reviewed by jvernee (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/8881


Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently [v3]

2022-05-25 Thread Claes Redestad
> The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` 
> allows keys to be compacted when all byte values of the key fit in 4 bits, 
> otherwise a byte array is allocated and used. This means that all transforms 
> with a kind value above 15 will be forced to allocate and use array 
> comparisons.
> 
> Removing unused and folding some transforms to ensure all existing kinds can 
> fit snugly within the 0-15 value range realize a minor improvement to 
> footprint, speed and allocation pressure of affected transforms, e.g. 
> ~300bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark:
> 
> Baseline:
> 
> Benchmark  Mode  Cnt 
> Score Error   Units
> SCFB.makeConcatWithConstants   avgt   15  
> 2048.475 ?  69.887   ns/op
> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
> 3487.311 ?  80.385B/op
> 
> 
> Patched:
> 
> Benchmark  Mode  Cnt 
> Score Error   Units
> SCFB.makeConcatWithConstants   avgt   15  
> 1961.985 ? 101.519   ns/op
> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
> 3156.478 ? 183.600B/op

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Missed correctly taking b1 into account in of(byte, int, int...) 
(java/lang/String/concat/ImplicitStringConcatShapes.java test failure)

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8881/files
  - new: https://git.openjdk.java.net/jdk/pull/8881/files/2be3b25c..612b4ece

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=8881=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8881=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8881.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8881/head:pull/8881

PR: https://git.openjdk.java.net/jdk/pull/8881


Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently [v2]

2022-05-25 Thread Claes Redestad
> The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` 
> allows keys to be compacted when all byte values of the key fit in 4 bits, 
> otherwise a byte array is allocated and used. This means that all transforms 
> with a kind value above 15 will be forced to allocate and use array 
> comparisons.
> 
> Removing unused and folding some transforms to ensure all existing kinds can 
> fit snugly within the 0-15 value range realize a minor improvement to 
> footprint, speed and allocation pressure of affected transforms, e.g. 
> ~300bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark:
> 
> Baseline:
> 
> Benchmark  Mode  Cnt 
> Score Error   Units
> SCFB.makeConcatWithConstants   avgt   15  
> 2048.475 ?  69.887   ns/op
> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
> 3487.311 ?  80.385B/op
> 
> 
> Patched:
> 
> Benchmark  Mode  Cnt 
> Score Error   Units
> SCFB.makeConcatWithConstants   avgt   15  
> 1961.985 ? 101.519   ns/op
> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
> 3156.478 ? 183.600B/op

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Address review comments, fix unsafe shift, rework and remove ofBothArrays

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8881/files
  - new: https://git.openjdk.java.net/jdk/pull/8881/files/001e9d16..2be3b25c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=8881=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8881=00-01

  Stats: 86 lines in 2 files changed: 50 ins; 12 del; 24 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8881.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8881/head:pull/8881

PR: https://git.openjdk.java.net/jdk/pull/8881


Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently

2022-05-25 Thread Jorn Vernee
On Wed, 25 May 2022 13:37:14 GMT, Claes Redestad  wrote:

>> Maybe an `assert b == b23456[i]` would be nice here.
>
> All usage implies the int is an argument position, which by spec is 
> constrained to be in the 0-255 range. But surely checking or asserting 
> shouldn't hurt.

Yes, please. IMHO it makes it clear that that is the assumption of this code.

-

PR: https://git.openjdk.java.net/jdk/pull/8881


Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently

2022-05-25 Thread Claes Redestad
On Wed, 25 May 2022 13:30:23 GMT, Jorn Vernee  wrote:

>> Maybe not... argument positions should fit in a byte as well. But, maybe 
>> there are other problematic cases? Or are the ints guaranteed to fit in a 
>> byte?
>
> Maybe an `assert b == b23456[i]` would be nice here.

All usage implies the int is an argument position, which by spec is constrained 
to be in the 0-255 range. But surely checking or asserting shouldn't hurt.

-

PR: https://git.openjdk.java.net/jdk/pull/8881


Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently

2022-05-25 Thread Jorn Vernee
On Wed, 25 May 2022 13:27:17 GMT, Jorn Vernee  wrote:

>> The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` 
>> allows keys to be compacted when all byte values of the key fit in 4 bits, 
>> otherwise a byte array is allocated and used. This means that all transforms 
>> with a kind value above 15 will be forced to allocate and use array 
>> comparisons.
>> 
>> Removing unused and folding some transforms to ensure all existing kinds can 
>> fit snugly within the 0-15 value range realize a minor improvement to 
>> footprint, speed and allocation pressure of affected transforms, e.g. 
>> ~300bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark:
>> 
>> Baseline:
>> 
>> Benchmark  Mode  Cnt 
>> Score Error   Units
>> SCFB.makeConcatWithConstants   avgt   15  
>> 2048.475 ?  69.887   ns/op
>> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
>> 3487.311 ?  80.385B/op
>> 
>> 
>> Patched:
>> 
>> Benchmark  Mode  Cnt 
>> Score Error   Units
>> SCFB.makeConcatWithConstants   avgt   15  
>> 1961.985 ? 101.519   ns/op
>> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
>> 3156.478 ? 183.600B/op
>
> src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java line 239:
> 
>> 237: for (int i = 0; i < b23456.length; i++) {
>> 238: int b = b23456[i] & 0xFF;
>> 239: bitset |= b;
> 
> Looks like `b` is always truncated. I wonder what happens if the ints in this 
> array are larger than a byte (which seems to be possible in e.g. the case of 
> argument positions). Some higher order bits might be dropped, but the 
> resulting `b` might only have the least significant 4 bits set.
> 
> I think the untruncated value should be used to compute the bitset? `butset 
> |= b23456[i]`? Then the `inRange` check should reject that case.

Maybe not... argument positions should fit in a byte as well. But, maybe there 
are other problematic cases? Or are the ints guaranteed to fit in a byte?

-

PR: https://git.openjdk.java.net/jdk/pull/8881


Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently

2022-05-25 Thread Jorn Vernee
On Wed, 25 May 2022 13:28:52 GMT, Jorn Vernee  wrote:

>> src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java line 239:
>> 
>>> 237: for (int i = 0; i < b23456.length; i++) {
>>> 238: int b = b23456[i] & 0xFF;
>>> 239: bitset |= b;
>> 
>> Looks like `b` is always truncated. I wonder what happens if the ints in 
>> this array are larger than a byte (which seems to be possible in e.g. the 
>> case of argument positions). Some higher order bits might be dropped, but 
>> the resulting `b` might only have the least significant 4 bits set.
>> 
>> I think the untruncated value should be used to compute the bitset? `butset 
>> |= b23456[i]`? Then the `inRange` check should reject that case.
>
> Maybe not... argument positions should fit in a byte as well. But, maybe 
> there are other problematic cases? Or are the ints guaranteed to fit in a 
> byte?

Maybe an `assert b == b23456[i]` would be nice here.

-

PR: https://git.openjdk.java.net/jdk/pull/8881


Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently

2022-05-25 Thread Jorn Vernee
On Wed, 25 May 2022 09:38:08 GMT, Claes Redestad  wrote:

> The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` 
> allows keys to be compacted when all byte values of the key fit in 4 bits, 
> otherwise a byte array is allocated and used. This means that all transforms 
> with a kind value above 15 will be forced to allocate and use array 
> comparisons.
> 
> Removing unused and folding some transforms to ensure all existing kinds can 
> fit snugly within the 0-15 value range realize a minor improvement to 
> footprint, speed and allocation pressure of affected transforms, e.g. 
> ~300bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark:
> 
> Baseline:
> 
> Benchmark  Mode  Cnt 
> Score Error   Units
> SCFB.makeConcatWithConstants   avgt   15  
> 2048.475 ?  69.887   ns/op
> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
> 3487.311 ?  80.385B/op
> 
> 
> Patched:
> 
> Benchmark  Mode  Cnt 
> Score Error   Units
> SCFB.makeConcatWithConstants   avgt   15  
> 1961.985 ? 101.519   ns/op
> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
> 3156.478 ? 183.600B/op

src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java line 239:

> 237: for (int i = 0; i < b23456.length; i++) {
> 238: int b = b23456[i] & 0xFF;
> 239: bitset |= b;

Looks like `b` is always truncated. I wonder what happens if the ints in this 
array are larger than a byte (which seems to be possible in e.g. the case of 
argument positions). Some higher order bits might be dropped, but the resulting 
`b` might only have the least significant 4 bits set.

I think the untruncated value should be used to compute the bitset? `butset |= 
b23456[i]`? Then the `inRange` check should reject that case.

-

PR: https://git.openjdk.java.net/jdk/pull/8881


Re: RFR: 8287292: Improve TransformKey to pack more kinds of transforms efficiently

2022-05-25 Thread Jim Laskey
On Wed, 25 May 2022 09:38:08 GMT, Claes Redestad  wrote:

> The bespoke caching scheme in `jl.invoke.LambdaFormEditor.TransformKey` 
> allows keys to be compacted when all byte values of the key fit in 4 bits, 
> otherwise a byte array is allocated and used. This means that all transforms 
> with a kind value above 15 will be forced to allocate and use array 
> comparisons.
> 
> Removing unused and folding some transforms to ensure all existing kinds can 
> fit snugly within the 0-15 value range realize a minor improvement to 
> footprint, speed and allocation pressure of affected transforms, e.g. 
> ~300bytes/op reduction in the `StringConcatFactoryBootstraps` microbenchmark:
> 
> Baseline:
> 
> Benchmark  Mode  Cnt 
> Score Error   Units
> SCFB.makeConcatWithConstants   avgt   15  
> 2048.475 ?  69.887   ns/op
> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
> 3487.311 ?  80.385B/op
> 
> 
> Patched:
> 
> Benchmark  Mode  Cnt 
> Score Error   Units
> SCFB.makeConcatWithConstants   avgt   15  
> 1961.985 ? 101.519   ns/op
> SCFB.makeConcatWithConstants:?gc.alloc.rate.norm   avgt   15  
> 3156.478 ? 183.600B/op

Marked as reviewed by jlaskey (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/8881