Re: RFR: 8287186: JDK modules participating in preview [v2]

2022-06-14 Thread Paul Sandoz
On Wed, 8 Jun 2022 22:07:38 GMT, liach  wrote:

>> Paul Sandoz has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Let java.management participate in preview features.
>
> Just curious, is it still up to incubator modules' discretion to avoid 
> accidental user access to preview content via the modules without enabling 
> preview, like the `PreviewFeatures.ensureEnabled()` in `StructuredTaskScope`?

@liach it depends on the API and its scope. A constructor of 
`StructuredTaskScope` specifies that it implicitly uses a virtual thread 
factory, so it performs a preview runtime check. 

The same check is also performed by `Thread.ofVirtual`, ensuring developers 
cannot reflectively work around 
`--enable-preview` when creating virtual threads. This approach is feasible 
since the API surface is so small.

-

PR: https://git.openjdk.org/jdk/pull/9087


RFR: 8288414: Long::compress/expand samples are not correct

2022-06-14 Thread Paul Sandoz
Update the code examples in the api notes of Long::compress/expand. Some 
constants need to be explicitly long values.

-

Commit messages:
 - Correct Long.compress/expand api notes.

Changes: https://git.openjdk.org/jdk19/pull/14/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk19=14=00
  Issue: https://bugs.openjdk.org/browse/JDK-8288414
  Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod
  Patch: https://git.openjdk.org/jdk19/pull/14.diff
  Fetch: git fetch https://git.openjdk.org/jdk19 pull/14/head:pull/14

PR: https://git.openjdk.org/jdk19/pull/14


Integrated: 8287186: JDK modules participating in preview

2022-06-14 Thread Paul Sandoz
On Wed, 8 Jun 2022 15:46:24 GMT, Paul Sandoz  wrote:

> Allow JDK modules that use preview features (preview language features or 
> preview API features from dependent modules) to participate without the need 
> to compile with `--enable-preview`.
> 
> It's difficult to enable participation using an annotation due to the nature 
> in which symbols are encountered when processing source as there is no 
> guaranteed order to the processing of certain symbols.
> 
> Instead a JDK module participates if the `java.base` package 
> `jdk.internal.javac` is exported to that module (@lahodaj clever idea!). An 
> internal annotation `jdk.internal.javac.ParticipatesInPreview` can be 
> declared on the module. Such a declaration cannot be enforced but does by its 
> use require the `jdk.internal.javac`'s export list to be updated.
> 
> The modules `jdk.incubator.vector` and `jdk.incubator.concurrent` have been 
> updated accordingly, both of which participate in preview APIs (APIs in 
> `java.lang.foreign` and `Thread.ofVirtual`, respectively).

This pull request has now been integrated.

Changeset: fb297705
Author:Paul Sandoz 
URL:   
https://git.openjdk.org/jdk/commit/fb297705f6dc668bea0257efb9c46b90b5eab2e9
Stats: 140 lines in 12 files changed: 65 ins; 57 del; 18 mod

8287186: JDK modules participating in preview

Reviewed-by: alanb, jlahoda

-

PR: https://git.openjdk.org/jdk/pull/9087


Re: RFR: 8287186: JDK modules participating in preview [v3]

2022-06-13 Thread Paul Sandoz
> Allow JDK modules that use preview features (preview language features or 
> preview API features from dependent modules) to participate without the need 
> to compile with `--enable-preview`.
> 
> It's difficult to enable participation using an annotation due to the nature 
> in which symbols are encountered when processing source as there is no 
> guaranteed order to the processing of certain symbols.
> 
> Instead a JDK module participates if the `java.base` package 
> `jdk.internal.javac` is exported to that module (@lahodaj clever idea!). An 
> internal annotation `jdk.internal.javac.ParticipatesInPreview` can be 
> declared on the module. Such a declaration cannot be enforced but does by its 
> use require the `jdk.internal.javac`'s export list to be updated.
> 
> The modules `jdk.incubator.vector` and `jdk.incubator.concurrent` have been 
> updated accordingly, both of which participate in preview APIs (APIs in 
> `java.lang.foreign` and `Thread.ofVirtual`, respectively).

Paul Sandoz has updated the pull request with a new target base due to a merge 
or a rebase. The incremental webrev excludes the unrelated changes brought in 
by the merge/rebase. The pull request contains four additional commits since 
the last revision:

 - Merge remote-tracking branch 'upstream/master' into 
JDK-8287186-preview-participating
 - Let java.management participate in preview features.
 - Unused import.
 - Generalize the pariticipating in preview APIs.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/9087/files
  - new: https://git.openjdk.org/jdk/pull/9087/files/9defdf23..abd1fbf6

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=9087=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=9087=01-02

  Stats: 17219 lines in 554 files changed: 13408 ins; 1651 del; 2160 mod
  Patch: https://git.openjdk.org/jdk/pull/9087.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9087/head:pull/9087

PR: https://git.openjdk.org/jdk/pull/9087


Re: RFR: 8287186: JDK modules participating in preview

2022-06-08 Thread Paul Sandoz
On Wed, 8 Jun 2022 17:21:08 GMT, Alan Bateman  wrote:

> Can java.management participate too? It would allow 
> sun.management.Util.isVirtual(Thread) to go away (lots of methods in 
> sun.management.ThreadImpl need to test if a thread is virtual).

Pushed update.

-

PR: https://git.openjdk.java.net/jdk/pull/9087


Re: RFR: 8287186: JDK modules participating in preview [v2]

2022-06-08 Thread Paul Sandoz
> Allow JDK modules that use preview features (preview language features or 
> preview API features from dependent modules) to participate without the need 
> to compile with `--enable-preview`.
> 
> It's difficult to enable participation using an annotation due to the nature 
> in which symbols are encountered when processing source as there is no 
> guaranteed order to the processing of certain symbols.
> 
> Instead a JDK module participates if the `java.base` package 
> `jdk.internal.javac` is exported to that module (@lahodaj clever idea!). An 
> internal annotation `jdk.internal.javac.ParticipatesInPreview` can be 
> declared on the module. Such a declaration cannot be enforced but does by its 
> use require the `jdk.internal.javac`'s export list to be updated.
> 
> The modules `jdk.incubator.vector` and `jdk.incubator.concurrent` have been 
> updated accordingly, both of which participate in preview APIs (APIs in 
> `java.lang.foreign` and `Thread.ofVirtual`, respectively).

Paul Sandoz has updated the pull request incrementally with one additional 
commit since the last revision:

  Let java.management participate in preview features.

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/9087/files
  - new: https://git.openjdk.java.net/jdk/pull/9087/files/5e7ca855..9defdf23

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=9087=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=9087=00-01

  Stats: 44 lines in 5 files changed: 4 ins; 29 del; 11 mod
  Patch: https://git.openjdk.java.net/jdk/pull/9087.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/9087/head:pull/9087

PR: https://git.openjdk.java.net/jdk/pull/9087


RFR: 8287186: JDK modules participating in preview

2022-06-08 Thread Paul Sandoz
Allow JDK modules that use preview features (preview language features or 
preview API features from dependent modules) to participate without the need to 
compile with `--enable-preview`.

It's difficult to enable participation using an annotation due to the nature in 
which symbols are encountered when processing source as there is no guaranteed 
order to the processing of certain symbols.

Instead a JDK module participates if the `java.base` package 
`jdk.internal.javac` is exported to that module (@lahodaj clever idea!). An 
internal annotation `jdk.internal.javac.ParticipatesInPreview` can be declared 
on the module. Such a declaration cannot be enforced but does by its use 
require the `jdk.internal.javac`'s export list to be updated.

The modules `jdk.incubator.vector` and `jdk.incubator.concurrent` have been 
updated accordingly, both of which participate in preview APIs (APIs in 
`java.lang.foreign` and `Thread.ofVirtual`, respectively).

-

Commit messages:
 - Unused import.
 - Generalize the pariticipating in preview APIs.

Changes: https://git.openjdk.java.net/jdk/pull/9087/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=9087=00
  Issue: https://bugs.openjdk.org/browse/JDK-8287186
  Stats: 98 lines in 7 files changed: 62 ins; 28 del; 8 mod
  Patch: https://git.openjdk.java.net/jdk/pull/9087.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/9087/head:pull/9087

PR: https://git.openjdk.java.net/jdk/pull/9087


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v3]

2022-06-06 Thread Paul Sandoz
On Thu, 2 Jun 2022 01:29:54 GMT, Xiaohong Gong  wrote:

>> Checking whether the indexes of masked lanes are inside of the valid memory 
>> boundary is necessary for masked vector memory access. However, this could 
>> be saved if the given offset is inside of the vector range that could make 
>> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs 
>> have saved this kind of check for common cases. And this patch did the 
>> similar optimization for the masked vector store.
>> 
>> The performance for the new added store masked benchmarks improves about 
>> `1.83x ~ 2.62x` on a x86 system:
>> 
>> Benchmark   BeforeAfter Gain Units
>> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
>> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
>> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
>> StoreMaskedBenchmark.intStoreArrayMask 4113.772  8220.206  1.998 ops/ms
>> StoreMaskedBenchmark.longStoreArrayMask1993.986  4874.148  2.444 ops/ms
>> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
>> 
>> Similar performane gain can also be observed on ARM hardware.
>
> Xiaohong Gong has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains three commits:
> 
>  - Merge branch 'jdk:master' to JDK-8286279
>  - Wrap the offset check into a static method
>  - 8286279: [vectorapi] Only check index of masked lanes if offset is out of 
> array boundary for masked store

Marked as reviewed by psandoz (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v5]

2022-06-06 Thread Paul Sandoz
On Thu, 2 Jun 2022 03:27:59 GMT, Xiaohong Gong  wrote:

>> Currently the vector load with mask when the given index happens out of the 
>> array boundary is implemented with pure java scalar code to avoid the IOOBE 
>> (IndexOutOfBoundaryException). This is necessary for architectures that do 
>> not support the predicate feature. Because the masked load is implemented 
>> with a full vector load and a vector blend applied on it. And a full vector 
>> load will definitely cause the IOOBE which is not valid. However, for 
>> architectures that support the predicate feature like SVE/AVX-512/RVV, it 
>> can be vectorized with the predicated load instruction as long as the 
>> indexes of the masked lanes are within the bounds of the array. For these 
>> architectures, loading with unmasked lanes does not raise exception.
>> 
>> This patch adds the vectorization support for the masked load with IOOBE 
>> part. Please see the original java implementation (FIXME: optimize):
>> 
>> 
>>   @ForceInline
>>   public static
>>   ByteVector fromArray(VectorSpecies species,
>>byte[] a, int offset,
>>VectorMask m) {
>>   ByteSpecies vsp = (ByteSpecies) species;
>>   if (offset >= 0 && offset <= (a.length - species.length())) {
>>   return vsp.dummyVector().fromArray0(a, offset, m);
>>   }
>> 
>>   // FIXME: optimize
>>   checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
>>   return vsp.vOp(m, i -> a[offset + i]);
>>   }
>> 
>> Since it can only be vectorized with the predicate load, the hotspot must 
>> check whether the current backend supports it and falls back to the java 
>> scalar version if not. This is different from the normal masked vector load 
>> that the compiler will generate a full vector load and a vector blend if the 
>> predicate load is not supported. So to let the compiler make the expected 
>> action, an additional flag (i.e. `usePred`) is added to the existing 
>> "loadMasked" intrinsic, with the value "true" for the IOOBE part while 
>> "false" for the normal load. And the compiler will fail to intrinsify if the 
>> flag is "true" and the predicate load is not supported by the backend, which 
>> means that normal java path will be executed.
>> 
>> Also adds the same vectorization support for masked:
>>  - fromByteArray/fromByteBuffer
>>  - fromBooleanArray
>>  - fromCharArray
>> 
>> The performance for the new added benchmarks improve about `1.88x ~ 30.26x` 
>> on the x86 AVX-512 system:
>> 
>> Benchmark  before   After  Units
>> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE   737.542 1387.069 ops/ms
>> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE 118.366  330.776 ops/ms
>> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE  233.832 6125.026 ops/ms
>> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE233.816 7075.923 ops/ms
>> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE   119.771  330.587 ops/ms
>> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE  431.961  939.301 ops/ms
>> 
>> Similar performance gain can also be observed on 512-bit SVE system.
>
> Xiaohong Gong has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains five commits:
> 
>  - Merge branch 'jdk:master' into JDK-8283667
>  - Use integer constant for offsetInRange all the way through
>  - Rename "use_predicate" to "needs_predicate"
>  - Rename the "usePred" to "offsetInRange"
>  - 8283667: [vectorapi] Vectorization for masked load with IOOBE with 
> predicate feature

Looks good. As a follow on PR I think it would be useful to add constants 
`OFFSET_IN_RANGE` and `OFFSET_OUT_OF_RANGE`, then it becomes much clearer in 
source and you can drop the `/* offsetInRange */` comment on the argument.

-

Marked as reviewed by psandoz (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-31 Thread Paul Sandoz
On Mon, 30 May 2022 01:17:00 GMT, Xiaohong Gong  wrote:

>> Xiaohong Gong has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Wrap the offset check into a static method
>
> @PaulSandoz, could you please help to check whether the current version is ok 
> for you? Thanks so much!

@XiaohongGong looks good, now the Vector API JEP has been integrated you will 
get a merge conflict, but it should be easier to resolve.

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8287244: Add bound check in indexed memory access var handle [v3]

2022-05-24 Thread Paul Sandoz
On Tue, 24 May 2022 18:15:45 GMT, Maurizio Cimadamore  
wrote:

>> Constructing indexed var handles using the `MemoryLayout` API produces 
>> `VarHandle` which do not check the input indices for out-of-bounds 
>> conditions.
>> While this can never result in a VM crash (after all the memory segment will 
>> protect against "true" OOB access), it is still possible for an access 
>> expression to refer to parts of a segment that are logically unrelated.
>> 
>> This patch adds a "logical" bound check to all indexed var handles generated 
>> using the layout API.
>> Benchmarks are not affected by the check. Users are still able to create 
>> custom "unchecked" var handles, using the combinator API in `MethodHandles`.
>
> Maurizio Cimadamore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Address review comments

Marked as reviewed by psandoz (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/8868


Re: RFR: 8287244: Add bound check in indexed memory access var handle [v2]

2022-05-24 Thread Paul Sandoz
On Tue, 24 May 2022 17:55:13 GMT, Paul Sandoz  wrote:

>> Here's a concrete example:
>> 
>> Consider a sequence layout with 6 elements. Then:
>> 
>> element count = 6
>> valid indices 0, 1, 2, 3, 4, 5
>> 
>> Now consider a var handle that is obtained by calling the path element 
>> method, passing the following parameters
>> 
>> start = 1
>> step = 2
>> 
>> This sets up the following mapping between logical an physical indices:
>> 
>> 0 -> 1
>> 1 -> 3
>> 2 -> 5
>> 
>> Where on the LHS we have the logical index (the one passed to the VH) and on 
>> the RHS we have the actual index it is translated to.
>> 
>> Note that the index map has three elements. So the upper bound (exclusive) 
>> of the index map is 3 - that is, we can pass indices 0, 1, 2.
>> 
>> According to the formula shown in the javadoc:
>> 
>> B = ceilDiv((elementCount - start) / step);
>> 
>> so:
>> 
>> B = ceilDiv((6 - 1) / 2)
>> = ceilDiv(5 / 2)
>> 
>> Note how, w/o ceilDiv we'd get 2 (the last valid index), and not 3 (the 
>> first invalid index).
>
> The terms `x_1, x_2, ... x__n` are defined, but `x_0` is not. 
> 
> I think you can refer to the first index out of bounds as the exclusive upper 
> bound of the range?

Sorry i misread the text, we are talking about the same thing. I think it would 
be clearer to refer `x_i` being in the range of `0` (inclusive) and `b_i` 
(exclusive), otherwise an  is thrown. That way in subsequent doc on other 
methods in matches with `B`, which is exclusive.

-

PR: https://git.openjdk.java.net/jdk/pull/8868


Re: RFR: 8287244: Add bound check in indexed memory access var handle [v2]

2022-05-24 Thread Paul Sandoz
On Tue, 24 May 2022 17:53:38 GMT, Maurizio Cimadamore  
wrote:

>> Indices start at zero. The ceilDiv operation is needed so that the operation 
>> returns the first index outisde the range (it's a bit subtle, sorry, but I 
>> don't know how else to express).
>
> Here's a concrete example:
> 
> Consider a sequence layout with 6 elements. Then:
> 
> element count = 6
> valid indices 0, 1, 2, 3, 4, 5
> 
> Now consider a var handle that is obtained by calling the path element 
> method, passing the following parameters
> 
> start = 1
> step = 2
> 
> This sets up the following mapping between logical an physical indices:
> 
> 0 -> 1
> 1 -> 3
> 2 -> 5
> 
> Where on the LHS we have the logical index (the one passed to the VH) and on 
> the RHS we have the actual index it is translated to.
> 
> Note that the index map has three elements. So the upper bound (exclusive) of 
> the index map is 3 - that is, we can pass indices 0, 1, 2.
> 
> According to the formula shown in the javadoc:
> 
> B = ceilDiv((elementCount - start) / step);
> 
> so:
> 
> B = ceilDiv((6 - 1) / 2)
> = ceilDiv(5 / 2)
> 
> Note how, w/o ceilDiv we'd get 2 (the last valid index), and not 3 (the first 
> invalid index).

The terms `x_1, x_2, ... x__n` are defined, but `x_0` is not. 

I think you can refer to the first index out of bounds as the exclusive upper 
bound of the range?

-

PR: https://git.openjdk.java.net/jdk/pull/8868


Re: RFR: 8287244: Add bound check in indexed memory access var handle [v2]

2022-05-24 Thread Paul Sandoz
On Tue, 24 May 2022 15:28:27 GMT, Maurizio Cimadamore  
wrote:

>> Constructing indexed var handles using the `MemoryLayout` API produces 
>> `VarHandle` which do not check the input indices for out-of-bounds 
>> conditions.
>> While this can never result in a VM crash (after all the memory segment will 
>> protect against "true" OOB access), it is still possible for an access 
>> expression to refer to parts of a segment that are logically unrelated.
>> 
>> This patch adds a "logical" bound check to all indexed var handles generated 
>> using the layout API.
>> Benchmarks are not affected by the check. Users are still able to create 
>> custom "unchecked" var handles, using the combinator API in `MethodHandles`.
>
> Maurizio Cimadamore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Tweak javadoc for ValueLayout::arrayElementVarHandle

src/java.base/share/classes/jdk/internal/foreign/LayoutPath.java line 256:

> 254: private long[] addBound(long maxIndex) {
> 255: long[] newBounds = new long[bounds.length + 1];
> 256: System.arraycopy(bounds, 0, newBounds, 0, bounds.length);

Suggestion:

long[] newBounds = Arrays.copyOf(bounds, bounds.length + 1);

-

PR: https://git.openjdk.java.net/jdk/pull/8868


Re: RFR: 8287244: Add bound check in indexed memory access var handle [v2]

2022-05-24 Thread Paul Sandoz
On Tue, 24 May 2022 15:28:27 GMT, Maurizio Cimadamore  
wrote:

>> Constructing indexed var handles using the `MemoryLayout` API produces 
>> `VarHandle` which do not check the input indices for out-of-bounds 
>> conditions.
>> While this can never result in a VM crash (after all the memory segment will 
>> protect against "true" OOB access), it is still possible for an access 
>> expression to refer to parts of a segment that are logically unrelated.
>> 
>> This patch adds a "logical" bound check to all indexed var handles generated 
>> using the layout API.
>> Benchmarks are not affected by the check. Users are still able to create 
>> custom "unchecked" var handles, using the combinator API in `MethodHandles`.
>
> Maurizio Cimadamore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Tweak javadoc for ValueLayout::arrayElementVarHandle

src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 374:

> 372:  *
> 373:  * Additionally, the provided dynamic values must conform to some 
> bound which is derived from the layout path, that is,
> 374:  * {@code 0 <= x_i <= b_i}, where {@code 0 <= i <= n}, or {@link 
> IndexOutOfBoundsException} is thrown.

Suggestion:

 * {@code 0 <= x_i < b_i}, where {@code 1 <= i <= n}, or {@link 
IndexOutOfBoundsException} is thrown.

We refer later to `B` being an exclusive upper bound (computed using `ceilDiv`).

-

PR: https://git.openjdk.java.net/jdk/pull/8868


Re: RFR: 8284199: Implementation of Structured Concurrency (Incubator)

2022-05-23 Thread Paul Sandoz
On Sat, 21 May 2022 16:25:57 GMT, Alan Bateman  wrote:

>> src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/StructuredTaskScope.java
>>  line 1172:
>> 
>>> 1170: }
>>> 1171: };
>>> 1172: return AccessController.doPrivileged(pa);
>> 
>> It might be better to use `MethodHandle`s obtained using 
>> [jdk.internal.access.SharedSecrets].getJavaLangInvokeAccess()
>>  and [JavaLangInvokeAccess].findStatic(…) and 
>> [JavaLangInvokeAccess].findVirtual(…) for this, which 
>> would avoid going through `AccessController.doPrivilaged(…)`.
>> 
>> [jdk.internal.access.SharedSecrets]: 
>> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/access/SharedSecrets.java
>> [JavaLangInvokeAccess]: 
>> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/access/JavaLangInvokeAccess.java
>
> I'd prefer not export jdk.internal.access to this module, if possible.  In 
> general, it's a lot easier to reason about the security and integrity of the 
> core platform when java.base doesn't export any of its internal packages.
> 
> In any case, I expect this issue will resolve itself once there is a way for 
> code in "core" modules to use preview APIs without needing to be compiled 
> with --enable-preview. The java.management and jdk.incubator.vector modules 
> have the same issue.

Yes, we will add a more general internal for source that participate in preview 
APIs, and then the use of reflection can be removed. (The Vector API PR has a 
focused fix to the compiler, since it cannot use the reflection trick).

-

PR: https://git.openjdk.java.net/jdk/pull/8787


Re: RFR: 8284199: Implementation of Structured Concurrency (Incubator)

2022-05-23 Thread Paul Sandoz
On Thu, 19 May 2022 13:05:54 GMT, Alan Bateman  wrote:

> This is the implementation of JEP 428: Structured Concurrency (Incubator).
> 
> This is a non-final API that provides a gentle on-ramp to structure a task as 
> a family of concurrent subtasks, and to coordinate the subtasks as a unit.

Previously reviewed while in the loom repo.

src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/StructureViolationException.java
 line 40:

> 38: 
> 39: /**
> 40:  * Constructs an {@code StructureViolationException} with no detail 
> message.

Suggestion:

 * Constructs a {@code StructureViolationException} with no detail message.

src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/StructureViolationException.java
 line 47:

> 45: 
> 46: /**
> 47:  * Constructs an {@code StructureViolationException} with the 
> specified

Suggestion:

 * Constructs a {@code StructureViolationException} with the specified

-

Marked as reviewed by psandoz (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/8787


Re: RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator) [v7]

2022-05-19 Thread Paul Sandoz
On Thu, 19 May 2022 21:11:41 GMT, Jatin Bhateja  wrote:

>> Hi All,
>> 
>> Patch adds the planned support for new vector operations and APIs targeted 
>> for [JEP 426: Vector API (Fourth 
>> Incubator).](https://bugs.openjdk.java.net/browse/JDK-8280173)
>> 
>> Following is the brief summary of changes:-
>> 
>> 1)  Extends the scope of existing lanewise API for following new vector 
>> operations.
>>-  VectorOperations.BIT_COUNT: counts the number of one-bits
>>- VectorOperations.LEADING_ZEROS_COUNT: counts the number of leading zero 
>> bits
>>- VectorOperations.TRAILING_ZEROS_COUNT: counts the number of trailing 
>> zero bits
>>- VectorOperations.REVERSE: reversing the order of bits
>>- VectorOperations.REVERSE_BYTES: reversing the order of bytes
>>- compress and expand bits: Semantics are based on Hacker's Delight 
>> section 7-4 Compress, or Generalized Extract.
>> 
>> 2)  Adds following new APIs to perform cross lane vector compress and 
>> expansion operations under the influence of a mask.
>>- Vector.compress
>>- Vector.expand 
>>- VectorMask.compress
>> 
>> 3) Adds predicated and non-predicated versions of following new APIs to load 
>> and store the contents of vector from foreign MemorySegments. 
>>   - Vector.fromMemorySegment
>>   - Vector.intoMemorySegment
>> 
>> 4) C2 Compiler IR enhancements and optimized X86 and AARCH64 backend support 
>> for each newly added operation.
>> 
>> 
>>  Patch has been regressed over AARCH64 and X86 targets different AVX levels. 
>> 
>>  Kindly review and share your feedback.
>> 
>>  Best Regards,
>>  Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains 16 commits:
> 
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - 8284960: Changes to enable jdk.incubator.vector to be treated as preview 
> participant. Code re-organization related to Reverse/ReverseByte IR 
> transforms.
>  - 8284960: Adding --enable-preview in vectorAPI benchmarks.
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - 8284960: Review comments resolution.
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - 8284960: Correcting a typo.
>  - 8284960: Integrating changes from panama-vector (Add @since 19 tags).
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - ... and 6 more: 
> https://git.openjdk.java.net/jdk/compare/9f562ef7...311f3233

src/jdk.compiler/share/classes/com/sun/tools/javac/code/Preview.java line 132:

> 130:  * @return true if {@code s} is participating in the preview of 
> {@code previewSymbol}
> 131:  */
> 132: public boolean isPreviewParticipating(Symbol s, Symbol 
> previewSymbol) {

Some feedback from a colleague:
Suggestion:

/**
 * Returns true if {@code s} is deemed to participate in the preview of 
{@code previewSymbol}, and
 * therefore no warnings or errors will be produced.
 *
 * @param s the symbol depending on the preview symbol
 * @param previewSymbol the preview symbol marked with @Preview
 * @return true if {@code s} is participating in the preview of {@code 
previewSymbol}
 */
public boolean participatesInPreview(Symbol s, Symbol previewSymbol) {

-

PR: https://git.openjdk.java.net/jdk/pull/8425


Re: RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator) [v7]

2022-05-19 Thread Paul Sandoz
On Thu, 19 May 2022 21:11:41 GMT, Jatin Bhateja  wrote:

>> Hi All,
>> 
>> Patch adds the planned support for new vector operations and APIs targeted 
>> for [JEP 426: Vector API (Fourth 
>> Incubator).](https://bugs.openjdk.java.net/browse/JDK-8280173)
>> 
>> Following is the brief summary of changes:-
>> 
>> 1)  Extends the scope of existing lanewise API for following new vector 
>> operations.
>>-  VectorOperations.BIT_COUNT: counts the number of one-bits
>>- VectorOperations.LEADING_ZEROS_COUNT: counts the number of leading zero 
>> bits
>>- VectorOperations.TRAILING_ZEROS_COUNT: counts the number of trailing 
>> zero bits
>>- VectorOperations.REVERSE: reversing the order of bits
>>- VectorOperations.REVERSE_BYTES: reversing the order of bytes
>>- compress and expand bits: Semantics are based on Hacker's Delight 
>> section 7-4 Compress, or Generalized Extract.
>> 
>> 2)  Adds following new APIs to perform cross lane vector compress and 
>> expansion operations under the influence of a mask.
>>- Vector.compress
>>- Vector.expand 
>>- VectorMask.compress
>> 
>> 3) Adds predicated and non-predicated versions of following new APIs to load 
>> and store the contents of vector from foreign MemorySegments. 
>>   - Vector.fromMemorySegment
>>   - Vector.intoMemorySegment
>> 
>> 4) C2 Compiler IR enhancements and optimized X86 and AARCH64 backend support 
>> for each newly added operation.
>> 
>> 
>>  Patch has been regressed over AARCH64 and X86 targets different AVX levels. 
>> 
>>  Kindly review and share your feedback.
>> 
>>  Best Regards,
>>  Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains 16 commits:
> 
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - 8284960: Changes to enable jdk.incubator.vector to be treated as preview 
> participant. Code re-organization related to Reverse/ReverseByte IR 
> transforms.
>  - 8284960: Adding --enable-preview in vectorAPI benchmarks.
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - 8284960: Review comments resolution.
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - 8284960: Correcting a typo.
>  - 8284960: Integrating changes from panama-vector (Add @since 19 tags).
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - ... and 6 more: 
> https://git.openjdk.java.net/jdk/compare/9f562ef7...311f3233

src/jdk.compiler/share/classes/com/sun/tools/javac/code/Preview.java line 50:

> 48: import java.util.Set;
> 49: 
> 50: import static com.sun.tools.javac.code.Flags.PREVIEW_API;

Suggestion:


Redundant import (sorry i should have checked before i sent you updates to this 
area)

-

PR: https://git.openjdk.java.net/jdk/pull/8425


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-05-12 Thread Paul Sandoz
On Thu, 5 May 2022 08:56:07 GMT, Xiaohong Gong  wrote:

>> Currently the vector load with mask when the given index happens out of the 
>> array boundary is implemented with pure java scalar code to avoid the IOOBE 
>> (IndexOutOfBoundaryException). This is necessary for architectures that do 
>> not support the predicate feature. Because the masked load is implemented 
>> with a full vector load and a vector blend applied on it. And a full vector 
>> load will definitely cause the IOOBE which is not valid. However, for 
>> architectures that support the predicate feature like SVE/AVX-512/RVV, it 
>> can be vectorized with the predicated load instruction as long as the 
>> indexes of the masked lanes are within the bounds of the array. For these 
>> architectures, loading with unmasked lanes does not raise exception.
>> 
>> This patch adds the vectorization support for the masked load with IOOBE 
>> part. Please see the original java implementation (FIXME: optimize):
>> 
>> 
>>   @ForceInline
>>   public static
>>   ByteVector fromArray(VectorSpecies species,
>>byte[] a, int offset,
>>VectorMask m) {
>>   ByteSpecies vsp = (ByteSpecies) species;
>>   if (offset >= 0 && offset <= (a.length - species.length())) {
>>   return vsp.dummyVector().fromArray0(a, offset, m);
>>   }
>> 
>>   // FIXME: optimize
>>   checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
>>   return vsp.vOp(m, i -> a[offset + i]);
>>   }
>> 
>> Since it can only be vectorized with the predicate load, the hotspot must 
>> check whether the current backend supports it and falls back to the java 
>> scalar version if not. This is different from the normal masked vector load 
>> that the compiler will generate a full vector load and a vector blend if the 
>> predicate load is not supported. So to let the compiler make the expected 
>> action, an additional flag (i.e. `usePred`) is added to the existing 
>> "loadMasked" intrinsic, with the value "true" for the IOOBE part while 
>> "false" for the normal load. And the compiler will fail to intrinsify if the 
>> flag is "true" and the predicate load is not supported by the backend, which 
>> means that normal java path will be executed.
>> 
>> Also adds the same vectorization support for masked:
>>  - fromByteArray/fromByteBuffer
>>  - fromBooleanArray
>>  - fromCharArray
>> 
>> The performance for the new added benchmarks improve about `1.88x ~ 30.26x` 
>> on the x86 AVX-512 system:
>> 
>> Benchmark  before   After  Units
>> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE   737.542 1387.069 ops/ms
>> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE 118.366  330.776 ops/ms
>> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE  233.832 6125.026 ops/ms
>> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE233.816 7075.923 ops/ms
>> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE   119.771  330.587 ops/ms
>> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE  431.961  939.301 ops/ms
>> 
>> Similar performance gain can also be observed on 512-bit SVE system.
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Rename "use_predicate" to "needs_predicate"

Yes, the tests were run in debug mode. The reporting of the missing constant 
occurs for the compiled method that is called from the method where the 
constants are declared e.g.:

719  240bjdk.incubator.vector.Int256Vector::fromArray0 (15 
bytes)
  ** Rejected vector op (LoadVectorMasked,int,8) because architecture does not 
support it
  ** missing constant: offsetInRange=Parm
@ 11   
jdk.incubator.vector.IntVector::fromArray0Template (22 bytes)   force inline by 
annotation


So it appears to be working as expected. A similar pattern occurs at a 
lower-level for the passing of the mask class. `Int256Vector::fromArray0` 
passes a constant class to `IntVector::fromArray0Template` (the compilation of 
which bails out before checking that the `offsetInRange` is constant).

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8282664: Unroll by hand StringUTF16, StringLatin1, and Arrays polynomial hash loops [v12]

2022-05-12 Thread Paul Sandoz
On Thu, 12 May 2022 11:08:10 GMT, Ludovic Henry  wrote:

>> Looks like you are making great progress.
>> 
>> Have you thought about ways the intrinsic implementation might be simplified 
>> if some code is retained in Java and passed as constant arguments? e.g. 
>> table of constants, scalar loop, bounds checks etc, such that the intrinsic 
>> primarily focuses on the vectorized code. To some extent that's related to 
>> John's point on generalization, and through simplification there may be some 
>> generalization.
>> 
>> For example if there was a general intrinsic that returned a long value 
>> (e.g. first 32 bits are the offset in the array to continue processing, the 
>> second 32 bits are the current hashcode value) then we could call that from 
>> the Java implementations that then proceed with the scalar loop up to the 
>> array length. The Java implementation intrinsic would return `(0L | 1L << 
>> 32)`.
>> 
>> Separately it would be nice to consider computing the hash code from the 
>> contents of a memory segment, similar to how we added `mismatch` support, 
>> but the trick of returning a value that merges the offset and hash code 
>> would not work, unless we return the remaining elements to process and that 
>> guaranteed to be less than a certain value (or alternatively a relative 
>> offset value given an upper bound argument, and performing the intrinsic 
>> call in a loop until no progress can be made, which works better for 
>> safepoints).
>> 
>> The `long[]` hashcode is annoying given `(element ^ (element >>> 32))`, but 
>> if we simplify the intrinsic maybe we can add back that complexity?
>
> @PaulSandoz yes, keeping the "short" string part in pure Java and switching 
> to an intrinsified/vectorized version for "long" strings is on the next 
> avenue of exploration. I would also put the intrinsic as a runtime stub to 
> avoid unnecessarily increase the size of the calling method unnecessarily. 
> The overhead of the call would be amortised because it would only be called 
> for longer strings anyway.
> 
> I haven't given much thoughts to how we could split up the different elements 
> of the algorithm to generalise the approach just yet. I'll give it a try, see 
> how far I can get with it, and keep you updated on my findings.

@luhenry ok, we took a similar approach to the mismatch intrinsic, carefully 
analyzing the threshold by which the intrinsic would be called.

My suggestion would be to follow that approach further and head towards an 
internal intrinsic perhaps with this signature:

@IntrinsicCandidate
static long hashCode(Class eType, Object base, long offset, int length /* in 
bytes * /) {
  return 0 | (1L << 32); // or perhaps just return 0
}  
``` 

Then on a further iteration try and pass the polynomial constant and table of 
powers (stable array) as arguments.

-

PR: https://git.openjdk.java.net/jdk/pull/7700


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store

2022-05-11 Thread Paul Sandoz
On Thu, 12 May 2022 01:56:25 GMT, Xiaohong Gong  wrote:

>> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template
>>  line 4086:
>> 
>>> 4084: } else {
>>> 4085: $Type$Species vsp = vspecies();
>>> 4086: if (offset < 0 || offset > (a.length - vsp.length())) {
>> 
>> Can we use `VectorIntrinsics.checkFromIndexSize`? e.g. 
>> 
>> if (!VectorIntrinsics.checkFromIndexSize(offset, vsp.length(), a.length)) { 
>> ...
>
> Thanks for the review @PaulSandoz ! For the 
> `VectorIntrinsics.checkFromIndexSize`, I'm afraid it's not suitable to be 
> used here because the `outOfBounds` exception will be thrown if the offset is 
> not inside of the valid array boundary. And  for the masked operations, this 
> is not needed since we only need to check the masked lanes. Please correct me 
> if I didn't understand correctly. Thanks!

Silly me! i commented too quickly, `checkFromIndexSize` cannot be used. My 
intent was that we could use a method rather than repeating the expression in 
numerous places (including masked loads IIRC).

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-05-11 Thread Paul Sandoz
On Thu, 5 May 2022 08:56:07 GMT, Xiaohong Gong  wrote:

>> Currently the vector load with mask when the given index happens out of the 
>> array boundary is implemented with pure java scalar code to avoid the IOOBE 
>> (IndexOutOfBoundaryException). This is necessary for architectures that do 
>> not support the predicate feature. Because the masked load is implemented 
>> with a full vector load and a vector blend applied on it. And a full vector 
>> load will definitely cause the IOOBE which is not valid. However, for 
>> architectures that support the predicate feature like SVE/AVX-512/RVV, it 
>> can be vectorized with the predicated load instruction as long as the 
>> indexes of the masked lanes are within the bounds of the array. For these 
>> architectures, loading with unmasked lanes does not raise exception.
>> 
>> This patch adds the vectorization support for the masked load with IOOBE 
>> part. Please see the original java implementation (FIXME: optimize):
>> 
>> 
>>   @ForceInline
>>   public static
>>   ByteVector fromArray(VectorSpecies species,
>>byte[] a, int offset,
>>VectorMask m) {
>>   ByteSpecies vsp = (ByteSpecies) species;
>>   if (offset >= 0 && offset <= (a.length - species.length())) {
>>   return vsp.dummyVector().fromArray0(a, offset, m);
>>   }
>> 
>>   // FIXME: optimize
>>   checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
>>   return vsp.vOp(m, i -> a[offset + i]);
>>   }
>> 
>> Since it can only be vectorized with the predicate load, the hotspot must 
>> check whether the current backend supports it and falls back to the java 
>> scalar version if not. This is different from the normal masked vector load 
>> that the compiler will generate a full vector load and a vector blend if the 
>> predicate load is not supported. So to let the compiler make the expected 
>> action, an additional flag (i.e. `usePred`) is added to the existing 
>> "loadMasked" intrinsic, with the value "true" for the IOOBE part while 
>> "false" for the normal load. And the compiler will fail to intrinsify if the 
>> flag is "true" and the predicate load is not supported by the backend, which 
>> means that normal java path will be executed.
>> 
>> Also adds the same vectorization support for masked:
>>  - fromByteArray/fromByteBuffer
>>  - fromBooleanArray
>>  - fromCharArray
>> 
>> The performance for the new added benchmarks improve about `1.88x ~ 30.26x` 
>> on the x86 AVX-512 system:
>> 
>> Benchmark  before   After  Units
>> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE   737.542 1387.069 ops/ms
>> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE 118.366  330.776 ops/ms
>> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE  233.832 6125.026 ops/ms
>> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE233.816 7075.923 ops/ms
>> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE   119.771  330.587 ops/ms
>> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE  431.961  939.301 ops/ms
>> 
>> Similar performance gain can also be observed on 512-bit SVE system.
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Rename "use_predicate" to "needs_predicate"

I tried your test code with the patch and logged compilation 
(`-XX:-TieredCompilation -XX:+PrintCompilation -XX:+PrintInlining 
-XX:+PrintIntrinsics -Xbatch`)

For `func` the first call to `VectorSupport::loadMasked` is intrinsic and 
inlined:

@ 45   jdk.internal.vm.vector.VectorSupport::loadMasked (40 bytes)   
(intrinsic)

But the second call (for the last loop iteration) fails to inline: 

@ 45   jdk.internal.vm.vector.VectorSupport::loadMasked (40 bytes)   failed 
to inline (intrinsic)


Since i am running on an mac book this looks right and aligns with the 
`-XX:+PrintIntrinsics` output: 

  ** Rejected vector op (LoadVectorMasked,int,8) because architecture does not 
support it
  ** Rejected vector op (LoadVectorMasked,int,8) because architecture does not 
support it
  ** not supported: op=loadMasked vlen=8 etype=int using_byte_array=0

?

I have not looked at the code gen nor measured performance comparing the case 
when never out of bounds and only out of bounds for the last loop iteration.

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8282664: Unroll by hand StringUTF16, StringLatin1, and Arrays polynomial hash loops [v12]

2022-05-11 Thread Paul Sandoz
On Tue, 10 May 2022 14:46:56 GMT, Ludovic Henry  wrote:

>> Despite the hash value being cached for Strings, computing the hash still 
>> represents a significant CPU usage for applications handling lots of text.
>> 
>> Even though it would be generally better to do it through an enhancement to 
>> the autovectorizer, the complexity of doing it by hand is trivial and the 
>> gain is sizable (2x speedup) even without the Vector API. The algorithm has 
>> been proposed by Richard Startin and Paul Sandoz [1].
>> 
>> Speedup are as follows on a `Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz`
>> 
>> 
>> Benchmark(size)  Mode  Cnt Score 
>>Error  Units
>> StringHashCode.Algorithm.scalarLatin1 0  avgt   25 2.111 
>> ±  0.210  ns/op
>> StringHashCode.Algorithm.scalarLatin1 1  avgt   25 3.500 
>> ±  0.127  ns/op
>> StringHashCode.Algorithm.scalarLatin110  avgt   25 7.001 
>> ±  0.099  ns/op
>> StringHashCode.Algorithm.scalarLatin1   100  avgt   2561.285 
>> ±  0.444  ns/op
>> StringHashCode.Algorithm.scalarLatin1  1000  avgt   25   628.995 
>> ±  0.846  ns/op
>> StringHashCode.Algorithm.scalarLatin1 1  avgt   25  6307.990 
>> ±  4.071  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16   0  avgt   25 2.358 
>> ±  0.092  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16   1  avgt   25 3.631 
>> ±  0.159  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16  10  avgt   25 7.049 
>> ±  0.019  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16 100  avgt   2533.626 
>> ±  1.218  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled161000  avgt   25   317.811 
>> ±  1.225  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16   1  avgt   25  3212.333 
>> ± 14.621  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled80  avgt   25 2.356 
>> ±  0.097  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled81  avgt   25 3.630 
>> ±  0.158  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8   10  avgt   25 8.724 
>> ±  0.065  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8  100  avgt   2532.402 
>> ±  0.019  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8 1000  avgt   25   321.949 
>> ±  0.251  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled81  avgt   25  3202.083 
>> ±  1.667  ns/op
>> StringHashCode.Algorithm.scalarUTF16  0  avgt   25 2.135 
>> ±  0.191  ns/op
>> StringHashCode.Algorithm.scalarUTF16  1  avgt   25 5.202 
>> ±  0.362  ns/op
>> StringHashCode.Algorithm.scalarUTF16 10  avgt   2511.105 
>> ±  0.112  ns/op
>> StringHashCode.Algorithm.scalarUTF16100  avgt   2575.974 
>> ±  0.702  ns/op
>> StringHashCode.Algorithm.scalarUTF16   1000  avgt   25   716.429 
>> ±  3.290  ns/op
>> StringHashCode.Algorithm.scalarUTF16  1  avgt   25  7095.459 
>> ± 43.847  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled160  avgt   25 2.381 
>> ±  0.038  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled161  avgt   25 5.268 
>> ±  0.422  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16   10  avgt   2511.248 
>> ±  0.178  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16  100  avgt   2552.966 
>> ±  0.089  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16 1000  avgt   25   450.912 
>> ±  1.834  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled161  avgt   25  4403.988 
>> ±  2.927  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8 0  avgt   25 2.401 
>> ±  0.032  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8 1  avgt   25 5.091 
>> ±  0.396  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled810  avgt   2512.801 
>> ±  0.189  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8   100  avgt   2552.068 
>> ±  0.032  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8  1000  avgt   25   453.270 
>> ±  0.340  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8 1  avgt   25  4433.112 
>> ±  2.699  ns/op
>> 
>> 
>> At Datadog, we handle a great amount of text (through logs management for 
>> example), and hashing String represents a large part of our CPU usage.

Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store

2022-05-11 Thread Paul Sandoz
On Tue, 10 May 2022 01:23:55 GMT, Xiaohong Gong  wrote:

> Checking whether the indexes of masked lanes are inside of the valid memory 
> boundary is necessary for masked vector memory access. However, this could be 
> saved if the given offset is inside of the vector range that could make sure 
> no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs have 
> saved this kind of check for common cases. And this patch did the similar 
> optimization for the masked vector store.
> 
> The performance for the new added store masked benchmarks improves about 
> `1.83x ~ 2.62x` on a x86 system:
> 
> Benchmark   BeforeAfter Gain Units
> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
> StoreMaskedBenchmark.intStoreArrayMask 4113.772  8220.206  1.998 ops/ms
> StoreMaskedBenchmark.longStoreArrayMask1993.986  4874.148  2.444 ops/ms
> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
> 
> Similar performane gain can also be observed on ARM hardware.

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template
 line 4086:

> 4084: } else {
> 4085: $Type$Species vsp = vspecies();
> 4086: if (offset < 0 || offset > (a.length - vsp.length())) {

Can we use `VectorIntrinsics.checkFromIndexSize`? e.g. 

if (!VectorIntrinsics.checkFromIndexSize(offset, vsp.length(), a.length)) { ...

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-05-11 Thread Paul Sandoz
On Wed, 11 May 2022 03:23:13 GMT, Xiaohong Gong  wrote:

>> I modified the code of this PR to avoid the conversion of `boolean` to 
>> `int`, so a constant integer value is passed all the way through, and the 
>> masked load is made intrinsic from the method at which the constants are 
>> passed as arguments i.e. the public `fromArray` mask accepting method.
>
> Hi @PaulSandoz , thanks for the patch for the constant int parameter. I think 
> the main change is:
> 
> -ByteVector fromArray0Template(Class maskClass, C base, long offset, 
> int index, M m, boolean offsetInRange,
> +ByteVector fromArray0Template(Class maskClass, C base, long offset, 
> int index, M m, int offsetInRange,
>  VectorSupport.LoadVectorMaskedOperation ByteVector, ByteSpecies, M> defaultImpl) {
>  m.check(species());
>  ByteSpecies vsp = vspecies();
> -if (offsetInRange) {
> -return VectorSupport.loadMasked(
> -vsp.vectorType(), maskClass, vsp.elementType(), 
> vsp.laneCount(),
> -base, offset, m, /* offsetInRange */ 1,
> -base, index, vsp, defaultImpl);
> -} else {
> -return VectorSupport.loadMasked(
> -vsp.vectorType(), maskClass, vsp.elementType(), 
> vsp.laneCount(),
> -base, offset, m, /* offsetInRange */ 0,
> -base, index, vsp, defaultImpl);
> -}
> +return VectorSupport.loadMasked(
> +vsp.vectorType(), maskClass, vsp.elementType(), vsp.laneCount(),
> +base, offset, m, offsetInRange == 1 ? 1 : 0,
> +base, index, vsp, defaultImpl);
>  }
> 
> which uses `offsetInRange == 1 ? 1 : 0`. Unfortunately this could not always 
> make sure the `offsetInRange` a constant a the compiler time. Again, this 
> change could also make the assertion fail randomly:
> 
> --- a/src/hotspot/share/opto/vectorIntrinsics.cpp
> +++ b/src/hotspot/share/opto/vectorIntrinsics.cpp
> @@ -1236,6 +1236,7 @@ bool 
> LibraryCallKit::inline_vector_mem_masked_operation(bool is_store) {
>  } else {
>// Masked vector load with IOOBE always uses the predicated load.
>const TypeInt* offset_in_range = gvn().type(argument(8))->isa_int();
> +  assert(offset_in_range->is_con(), "must be a constant");
>if (!offset_in_range->is_con()) {
>  if (C->print_intrinsics()) {
>tty->print_cr("  ** missing constant: offsetInRange=%s",
> 
> Sometimes, the compiler can parse it a constant. I think this depends on the 
> compiler OSR and speculative optimization. Did you try an example with IOOBE 
> on a non predicated hardware?
> 
> Here is the main code of my unittest to reproduce the issue:
> 
> static final VectorSpecies I_SPECIES = IntVector.SPECIES_128;
> static final int LENGTH = 1026;
> public static int[] ia;
> public static int[] ib;
> 
> private static void init() {
> for (int i = 0; i < LENGTH; i++) {
> ia[i] = i;
> ib[i] = 0;
> }
> 
> for (int i = 0; i < 2; i++) {
> m[i] = i % 2 == 0;
> }
> }
> 
>  private static void func() {
> VectorMask mask = VectorMask.fromArray(I_SPECIES, m, 0);
> for (int i = 0; i < LENGTH; i += vl) {
> IntVector av = IntVector.fromArray(I_SPECIES, ia, i, mask);
> av.lanewise(VectorOperators.ABS).intoArray(ic, i, mask);
> }
>  }
> 
>  public static void main(String[] args) {
> init();
> for (int i = 0; i < 1; i++) {
> func();
> }
>   }

@XiaohongGong Doh! The ternary was an experiment, and I forgot to re-run the 
code gen script before sending your the patch. See `IntVector`, which does not 
have that. I presume when the offset is not in range and the other code path is 
taken then it might be problematic unless all code paths are inlined. I will 
experiment further with tests.

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-05-09 Thread Paul Sandoz
On Thu, 5 May 2022 08:56:07 GMT, Xiaohong Gong  wrote:

>> Currently the vector load with mask when the given index happens out of the 
>> array boundary is implemented with pure java scalar code to avoid the IOOBE 
>> (IndexOutOfBoundaryException). This is necessary for architectures that do 
>> not support the predicate feature. Because the masked load is implemented 
>> with a full vector load and a vector blend applied on it. And a full vector 
>> load will definitely cause the IOOBE which is not valid. However, for 
>> architectures that support the predicate feature like SVE/AVX-512/RVV, it 
>> can be vectorized with the predicated load instruction as long as the 
>> indexes of the masked lanes are within the bounds of the array. For these 
>> architectures, loading with unmasked lanes does not raise exception.
>> 
>> This patch adds the vectorization support for the masked load with IOOBE 
>> part. Please see the original java implementation (FIXME: optimize):
>> 
>> 
>>   @ForceInline
>>   public static
>>   ByteVector fromArray(VectorSpecies species,
>>byte[] a, int offset,
>>VectorMask m) {
>>   ByteSpecies vsp = (ByteSpecies) species;
>>   if (offset >= 0 && offset <= (a.length - species.length())) {
>>   return vsp.dummyVector().fromArray0(a, offset, m);
>>   }
>> 
>>   // FIXME: optimize
>>   checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
>>   return vsp.vOp(m, i -> a[offset + i]);
>>   }
>> 
>> Since it can only be vectorized with the predicate load, the hotspot must 
>> check whether the current backend supports it and falls back to the java 
>> scalar version if not. This is different from the normal masked vector load 
>> that the compiler will generate a full vector load and a vector blend if the 
>> predicate load is not supported. So to let the compiler make the expected 
>> action, an additional flag (i.e. `usePred`) is added to the existing 
>> "loadMasked" intrinsic, with the value "true" for the IOOBE part while 
>> "false" for the normal load. And the compiler will fail to intrinsify if the 
>> flag is "true" and the predicate load is not supported by the backend, which 
>> means that normal java path will be executed.
>> 
>> Also adds the same vectorization support for masked:
>>  - fromByteArray/fromByteBuffer
>>  - fromBooleanArray
>>  - fromCharArray
>> 
>> The performance for the new added benchmarks improve about `1.88x ~ 30.26x` 
>> on the x86 AVX-512 system:
>> 
>> Benchmark  before   After  Units
>> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE   737.542 1387.069 ops/ms
>> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE 118.366  330.776 ops/ms
>> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE  233.832 6125.026 ops/ms
>> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE233.816 7075.923 ops/ms
>> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE   119.771  330.587 ops/ms
>> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE  431.961  939.301 ops/ms
>> 
>> Similar performance gain can also be observed on 512-bit SVE system.
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Rename "use_predicate" to "needs_predicate"

I did modified the code of this PR to avoid the conversion of `boolean` to 
`int` and the masked load is made intrinsic from the method at which the 
constants are passed as arguments i.e. the public `fromArray` mask accepting 
method.

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8286294 : ForkJoinPool.commonPool().close() spins

2022-05-06 Thread Paul Sandoz
On Fri, 6 May 2022 15:05:57 GMT, Doug Lea  wrote:

> Changes ForkJoinPool.close spec and code to trap close as a no-op if called 
> on common pool

Changes look good, it will need a CSR.

-

PR: https://git.openjdk.java.net/jdk/pull/8577


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-05-06 Thread Paul Sandoz
On Fri, 6 May 2022 04:49:39 GMT, Xiaohong Gong  wrote:

>> offset is long so uses two argument slots (5 and 6). 
>> mask is argument (7).
>> offsetInRange is argument(8).
>
> Make sense! Thanks for the explanation!

Doh! of course. This is not the first and will not be the last time i get 
caught out by the 2-slot requirement.
It may be useful to do this:

Node* mask_arg = is_store ? argument(8) : argument(7);

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-05-05 Thread Paul Sandoz
On Thu, 5 May 2022 08:56:07 GMT, Xiaohong Gong  wrote:

>> Currently the vector load with mask when the given index happens out of the 
>> array boundary is implemented with pure java scalar code to avoid the IOOBE 
>> (IndexOutOfBoundaryException). This is necessary for architectures that do 
>> not support the predicate feature. Because the masked load is implemented 
>> with a full vector load and a vector blend applied on it. And a full vector 
>> load will definitely cause the IOOBE which is not valid. However, for 
>> architectures that support the predicate feature like SVE/AVX-512/RVV, it 
>> can be vectorized with the predicated load instruction as long as the 
>> indexes of the masked lanes are within the bounds of the array. For these 
>> architectures, loading with unmasked lanes does not raise exception.
>> 
>> This patch adds the vectorization support for the masked load with IOOBE 
>> part. Please see the original java implementation (FIXME: optimize):
>> 
>> 
>>   @ForceInline
>>   public static
>>   ByteVector fromArray(VectorSpecies species,
>>byte[] a, int offset,
>>VectorMask m) {
>>   ByteSpecies vsp = (ByteSpecies) species;
>>   if (offset >= 0 && offset <= (a.length - species.length())) {
>>   return vsp.dummyVector().fromArray0(a, offset, m);
>>   }
>> 
>>   // FIXME: optimize
>>   checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
>>   return vsp.vOp(m, i -> a[offset + i]);
>>   }
>> 
>> Since it can only be vectorized with the predicate load, the hotspot must 
>> check whether the current backend supports it and falls back to the java 
>> scalar version if not. This is different from the normal masked vector load 
>> that the compiler will generate a full vector load and a vector blend if the 
>> predicate load is not supported. So to let the compiler make the expected 
>> action, an additional flag (i.e. `usePred`) is added to the existing 
>> "loadMasked" intrinsic, with the value "true" for the IOOBE part while 
>> "false" for the normal load. And the compiler will fail to intrinsify if the 
>> flag is "true" and the predicate load is not supported by the backend, which 
>> means that normal java path will be executed.
>> 
>> Also adds the same vectorization support for masked:
>>  - fromByteArray/fromByteBuffer
>>  - fromBooleanArray
>>  - fromCharArray
>> 
>> The performance for the new added benchmarks improve about `1.88x ~ 30.26x` 
>> on the x86 AVX-512 system:
>> 
>> Benchmark  before   After  Units
>> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE   737.542 1387.069 ops/ms
>> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE 118.366  330.776 ops/ms
>> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE  233.832 6125.026 ops/ms
>> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE233.816 7075.923 ops/ms
>> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE   119.771  330.587 ops/ms
>> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE  431.961  939.301 ops/ms
>> 
>> Similar performance gain can also be observed on 512-bit SVE system.
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Rename "use_predicate" to "needs_predicate"

src/hotspot/share/opto/vectorIntrinsics.cpp line 1238:

> 1236: } else {
> 1237:   // Masked vector load with IOOBE always uses the predicated load.
> 1238:   const TypeInt* offset_in_range = 
> gvn().type(argument(8))->isa_int();

Should it be `argument(7)`? (and adjustments later to access the container).

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-05-04 Thread Paul Sandoz
On Thu, 5 May 2022 01:13:23 GMT, Xiaohong Gong  wrote:

> Yeah, I agree that it's not good by adding a branch checking for 
> `offsetInRange`. But actually I met the constant issue that passing the 
> values all the way cannot guarantee the argument a constant in compiler at 
> the compile time. Do you have any better idea to fixing this?

That's odd, `boolean` constants are passed that are then converted to `int` 
constants.
Did you try passing integer constants all the way through?

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne

2022-05-04 Thread Paul Sandoz
On Wed, 4 May 2022 01:59:17 GMT, Quan Anh Mai  wrote:

> Hi,
> 
> This patch optimises the matching rules for floating-point comparison with 
> respects to eq/ne on x86-64
> 
> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` 
> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which 
> improves the sequence of `If (CmpF x x) (Bool ne)` from
> 
> ucomiss xmm0, xmm0
> jp  label
> jne label
> 
> into
> 
> ucomiss xmm0, xmm0
> jp  label
> 
> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as `x 
> == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of 
> fixing the flags, such as
> 
> xorlecx, ecx
> ucomiss xmm0, xmm1
> jnp done
> pushf
> andq[rsp], 0xff2b
> popf
> done:
> movleax, 1
> cmovel  eax, ecx
> 
> The patch changes this sequence into
> 
> xorlecx, ecx
> ucomiss xmm0, xmm1
> movleax, 1
> cmovpl  eax, ecx
> cmovnel eax, ecx
> 
> 3, The patch also changes the pattern of `isInfinite` to be more optimised by 
> using `Math.abs` to reduce 1 comparison and compares the result with 
> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types.
> 
> The benchmark results are as follow:
> 
> Before:
> Benchmark  Mode  Cnt Score Error  Units
> FPComparison.equalDouble   avgt5  2876.242 ±  58.875  ns/op
> FPComparison.equalFloatavgt5  3062.430 ±  31.371  ns/op
> FPComparison.isFiniteDoubleavgt5   475.749 ±  19.027  ns/op
> FPComparison.isFiniteFloat avgt5   506.525 ±  14.417  ns/op
> FPComparison.isInfiniteDouble  avgt5  1232.800 ±  31.677  ns/op
> FPComparison.isInfiniteFloat   avgt5  1234.708 ±  70.239  ns/op
> FPComparison.isNanDouble   avgt5  2255.847 ±   7.238  ns/op
> FPComparison.isNanFloatavgt5  2567.044 ±  36.078  ns/op
> 
> After:
> Benchmark  Mode  Cnt Score Error  Units
> FPComparison.equalDouble   avgt5   594.636 ±   8.922  ns/op
> FPComparison.equalFloatavgt5   663.849 ±   3.656  ns/op
> FPComparison.isFiniteDoubleavgt5   518.309 ± 107.352  ns/op
> FPComparison.isFiniteFloat avgt5   515.576 ±  14.669  ns/op
> FPComparison.isInfiniteDouble  avgt5   621.185 ±  11.935  ns/op
> FPComparison.isInfiniteFloat   avgt5   623.566 ±  15.206  ns/op
> FPComparison.isNanDouble   avgt5   400.124 ±   0.762  ns/op
> FPComparison.isNanFloatavgt5   546.486 ±   1.509  ns/op
> 
> Thank you very much.

The changes to `Float` and `Double` look good. I don't think we need additional 
tests, see test/jdk/java/lang/Math/IeeeRecommendedTests.java.

At first i thought we no longer need PR #8459 but it seems both PRs are 
complimentary, albeit PR #8459 has more modest performance gains for the 
intrinsics.

-

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator)

2022-05-04 Thread Paul Sandoz
On Wed, 27 Apr 2022 11:03:48 GMT, Jatin Bhateja  wrote:

> Hi All,
> 
> Patch adds the planned support for new vector operations and APIs targeted 
> for [JEP 426: Vector API (Fourth 
> Incubator).](https://bugs.openjdk.java.net/browse/JDK-8280173)
> 
> Following is the brief summary of changes:-
> 
> 1)  Extends the scope of existing lanewise API for following new vector 
> operations.
>-  VectorOperations.BIT_COUNT: counts the number of one-bits
>- VectorOperations.LEADING_ZEROS_COUNT: counts the number of leading zero 
> bits
>- VectorOperations.TRAILING_ZEROS_COUNT: counts the number of trailing 
> zero bits
>- VectorOperations.REVERSE: reversing the order of bits
>- VectorOperations.REVERSE_BYTES: reversing the order of bytes
>- compress and expand bits: Semantics are based on Hacker's Delight 
> section 7-4 Compress, or Generalized Extract.
> 
> 2)  Adds following new APIs to perform cross lane vector compress and 
> expansion operations under the influence of a mask.
>- Vector.compress
>- Vector.expand 
>- VectorMask.compress
> 
> 3) Adds predicated and non-predicated versions of following new APIs to load 
> and store the contents of vector from foreign MemorySegments. 
>   - Vector.fromMemorySegment
>   - Vector.intoMemorySegment
> 
> 4) C2 Compiler IR enhancements and optimized X86 and AARCH64 backend support 
> for each newly added operation.
> 
> 
>  Patch has been regressed over AARCH64 and X86 targets different AVX levels. 
> 
>  Kindly review and share your feedback.
> 
>  Best Regards,
>  Jatin

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 1340:

> 1338:   assert_different_registers(dst, src, vtmp1, vtmp2, vtmp3, vtmp4);
> 1339:   assert_different_registers(mask, ptmp, pgtmp);
> 1340:   // Example input:   src   = 88 77 66 45 44 33 22 11

Suggestion:

  // Example input:   src   = 88 77 66 55 44 33 22 11

-

PR: https://git.openjdk.java.net/jdk/pull/8425


Re: RFR: 8277090 : jsr166 refresh for jdk19

2022-05-04 Thread Paul Sandoz
On Wed, 4 May 2022 12:30:53 GMT, Doug Lea  wrote:

> This is the revised jsr166 refresh for jdk19. See 
> https://bugs.openjdk.java.net/browse/JDK-8285450 and 
> https://bugs.openjdk.java.net/browse/JDK-8277090. Out of caution, this PR 
> removes the unrelated commits from original version.

Marked as reviewed by psandoz (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/8536


Re: RFR: 8277090 : jsr166 refresh for jdk19 [v2]

2022-05-03 Thread Paul Sandoz
On Tue, 3 May 2022 15:32:27 GMT, Martin Buchholz  wrote:

>> Doug Lea has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Address review comments
>
> src/java.base/share/classes/java/util/concurrent/CompletableFuture.java line 
> 2153:
> 
>> 2151: 
>> 2152: @Override
>> 2153: public Throwable exceptionNow() {
> 
> The unwrapping of CompletionExceptions should be documented

`exceptionNow` is specified to call `get`, which does not throw 
`CompletionException`:

 * @implSpec
 * The default implementation invokes {@code isDone()} to test if the task
 * has completed. If done and not cancelled, it invokes {@code get()} and
 * catches the {@code ExecutionException} to obtain the exception.

-

PR: https://git.openjdk.java.net/jdk/pull/8490


Re: RFR: 8277090 : jsr166 refresh for jdk19 [v2]

2022-05-03 Thread Paul Sandoz
On Mon, 2 May 2022 13:33:33 GMT, Doug Lea  wrote:

>> This is the jsr166 refresh for jdk19. See 
>> https://bugs.openjdk.java.net/browse/JDK-8285450 and 
>> https://bugs.openjdk.java.net/browse/JDK-8277090
>
> Doug Lea has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Address review comments

Looks good, although I cannot say i reviewed the fork join pool code as 
thoroughly, since it is particularly difficult, so i was most looking for 
consistency in the pattern of code.

src/java.base/share/classes/java/util/concurrent/ForkJoinWorkerThread.java line 
76:

> 74: ForkJoinWorkerThread(ThreadGroup group, ForkJoinPool pool,
> 75:  boolean useSystemClassLoader,
> 76:  boolean clearThreadLocals) {

It's tempting to toggle the sense of last boolean argument to be 
`preserveThreadLocals` for consistency (given the multiple toggles as the value 
is propagated from the newly added protected constructor), but which means 
toggling the value at the use sites that you may not wish to change.

-

Marked as reviewed by psandoz (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/8490


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-04-29 Thread Paul Sandoz
On Fri, 22 Apr 2022 07:08:24 GMT, Xiaohong Gong  wrote:

>> Currently the vector load with mask when the given index happens out of the 
>> array boundary is implemented with pure java scalar code to avoid the IOOBE 
>> (IndexOutOfBoundaryException). This is necessary for architectures that do 
>> not support the predicate feature. Because the masked load is implemented 
>> with a full vector load and a vector blend applied on it. And a full vector 
>> load will definitely cause the IOOBE which is not valid. However, for 
>> architectures that support the predicate feature like SVE/AVX-512/RVV, it 
>> can be vectorized with the predicated load instruction as long as the 
>> indexes of the masked lanes are within the bounds of the array. For these 
>> architectures, loading with unmasked lanes does not raise exception.
>> 
>> This patch adds the vectorization support for the masked load with IOOBE 
>> part. Please see the original java implementation (FIXME: optimize):
>> 
>> 
>>   @ForceInline
>>   public static
>>   ByteVector fromArray(VectorSpecies species,
>>byte[] a, int offset,
>>VectorMask m) {
>>   ByteSpecies vsp = (ByteSpecies) species;
>>   if (offset >= 0 && offset <= (a.length - species.length())) {
>>   return vsp.dummyVector().fromArray0(a, offset, m);
>>   }
>> 
>>   // FIXME: optimize
>>   checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
>>   return vsp.vOp(m, i -> a[offset + i]);
>>   }
>> 
>> Since it can only be vectorized with the predicate load, the hotspot must 
>> check whether the current backend supports it and falls back to the java 
>> scalar version if not. This is different from the normal masked vector load 
>> that the compiler will generate a full vector load and a vector blend if the 
>> predicate load is not supported. So to let the compiler make the expected 
>> action, an additional flag (i.e. `usePred`) is added to the existing 
>> "loadMasked" intrinsic, with the value "true" for the IOOBE part while 
>> "false" for the normal load. And the compiler will fail to intrinsify if the 
>> flag is "true" and the predicate load is not supported by the backend, which 
>> means that normal java path will be executed.
>> 
>> Also adds the same vectorization support for masked:
>>  - fromByteArray/fromByteBuffer
>>  - fromBooleanArray
>>  - fromCharArray
>> 
>> The performance for the new added benchmarks improve about `1.88x ~ 30.26x` 
>> on the x86 AVX-512 system:
>> 
>> Benchmark  before   After  Units
>> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE   737.542 1387.069 ops/ms
>> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE 118.366  330.776 ops/ms
>> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE  233.832 6125.026 ops/ms
>> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE233.816 7075.923 ops/ms
>> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE   119.771  330.587 ops/ms
>> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE  431.961  939.301 ops/ms
>> 
>> Similar performance gain can also be observed on 512-bit SVE system.
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Rename the "usePred" to "offsetInRange"

IIUC when the hardware does not support predicated loads then any false 
`offsetIntRange` value causes the load intrinsic to fail resulting in the 
fallback, so it would not be materially any different to the current behavior, 
just more uniformly implemented.

Why can't the intrinsic support the passing a boolean directly? Is it something 
to do with constants? If that is not possible I recommend creating named 
constant values and pass those all the way through rather than converting a 
boolean to an integer value. Then there is no need for a branch checking 
`offsetInRange`.

Might be better to hold off until the JEP is integrated and then update, since 
this will conflict (`byte[]` and `ByteBuffer` load methods are removed and 
`MemorySegment` load methods are added). You could prepare for that now by 
branching off `vectorIntrinsics`.

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8284992: Fix misleading Vector API doc for LSHR operator [v4]

2022-04-29 Thread Paul Sandoz
On Fri, 29 Apr 2022 06:35:44 GMT, Jie Fu  wrote:

>> Hi all,
>> 
>> The Current Vector API doc for `LSHR` is
>> 
>> Produce a>>>(n&(ESIZE*8-1)). Integral only.
>> 
>> 
>> This is misleading which may lead to bugs for Java developers.
>> This is because for negative byte/short elements, the results computed by 
>> `LSHR` will be different from that of `>>>`.
>> For more details, please see 
>> https://github.com/openjdk/jdk/pull/8276#issue-1206391831 .
>> 
>> After the patch, the doc for `LSHR` is
>> 
>> Produce zero-extended right shift of a by (n&(ESIZE*8-1)) bits. Integral 
>> only.
>> 
>> 
>> Thanks.
>> Best regards,
>> Jie
>
> Jie Fu has updated the pull request with a new target base due to a merge or 
> a rebase. The incremental webrev excludes the unrelated changes brought in by 
> the merge/rebase. The pull request contains eight additional commits since 
> the last revision:
> 
>  - Address CSR review comments
>  - Merge branch 'master' into JDK-8284992
>  - Address review comments
>  - Merge branch 'master' into JDK-8284992
>  - Merge branch 'master' into JDK-8284992
>  - Address review comments
>  - Merge branch 'master' into JDK-8284992
>  - 8284992: Fix misleading Vector API doc for LSHR operator

Marked as reviewed by psandoz (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/8291


Re: RFR: 8284992: Fix misleading Vector API doc for LSHR operator [v3]

2022-04-28 Thread Paul Sandoz
On Wed, 27 Apr 2022 09:06:12 GMT, Jie Fu  wrote:

>> Hi all,
>> 
>> The Current Vector API doc for `LSHR` is
>> 
>> Produce a>>>(n&(ESIZE*8-1)). Integral only.
>> 
>> 
>> This is misleading which may lead to bugs for Java developers.
>> This is because for negative byte/short elements, the results computed by 
>> `LSHR` will be different from that of `>>>`.
>> For more details, please see 
>> https://github.com/openjdk/jdk/pull/8276#issue-1206391831 .
>> 
>> After the patch, the doc for `LSHR` is
>> 
>> Produce zero-extended right shift of a by (n&(ESIZE*8-1)) bits. Integral 
>> only.
>> 
>> 
>> Thanks.
>> Best regards,
>> Jie
>
> Jie Fu has updated the pull request with a new target base due to a merge or 
> a rebase. The incremental webrev excludes the unrelated changes brought in by 
> the merge/rebase. The pull request contains six additional commits since the 
> last revision:
> 
>  - Address review comments
>  - Merge branch 'master' into JDK-8284992
>  - Merge branch 'master' into JDK-8284992
>  - Address review comments
>  - Merge branch 'master' into JDK-8284992
>  - 8284992: Fix misleading Vector API doc for LSHR operator

It should be possible for you finalize now.

-

PR: https://git.openjdk.java.net/jdk/pull/8291


Re: RFR: 8284992: Fix misleading Vector API doc for LSHR operator [v3]

2022-04-27 Thread Paul Sandoz
On Wed, 27 Apr 2022 09:06:12 GMT, Jie Fu  wrote:

>> Hi all,
>> 
>> The Current Vector API doc for `LSHR` is
>> 
>> Produce a>>>(n&(ESIZE*8-1)). Integral only.
>> 
>> 
>> This is misleading which may lead to bugs for Java developers.
>> This is because for negative byte/short elements, the results computed by 
>> `LSHR` will be different from that of `>>>`.
>> For more details, please see 
>> https://github.com/openjdk/jdk/pull/8276#issue-1206391831 .
>> 
>> After the patch, the doc for `LSHR` is
>> 
>> Produce zero-extended right shift of a by (n&(ESIZE*8-1)) bits. Integral 
>> only.
>> 
>> 
>> Thanks.
>> Best regards,
>> Jie
>
> Jie Fu has updated the pull request with a new target base due to a merge or 
> a rebase. The incremental webrev excludes the unrelated changes brought in by 
> the merge/rebase. The pull request contains six additional commits since the 
> last revision:
> 
>  - Address review comments
>  - Merge branch 'master' into JDK-8284992
>  - Merge branch 'master' into JDK-8284992
>  - Address review comments
>  - Merge branch 'master' into JDK-8284992
>  - 8284992: Fix misleading Vector API doc for LSHR operator

I created one, filled it in, and assigned it to you (for other examples you can 
search in the issue tracker, this one quite is simple so i thought it was 
quicker to do myself to show you). For any specification change we need to 
review and track that change (independent of any implementation changes, if 
any).

If you are ok with it I can add myself as reviewer, then you can "Finalize" it 
(see button on same line as "Edit"), triggering a review request, from which we 
may receive comments to address, and once addressed and final, it will unblock 
the PR for integration.

-

PR: https://git.openjdk.java.net/jdk/pull/8291


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v8]

2022-04-27 Thread Paul Sandoz
On Wed, 27 Apr 2022 14:24:20 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refresh 7965cc6b168e567ac2596f2fbc3b00a7d99b7e1e

It's great to see end game in play for this multi-year effort, impressive work 
by all involved.

I looked through code in the base module. Generally looks well structured and 
documented.

The fork join code is naturally hard to review. I did try! (it takes the prize 
for the highest Java code density in the JDK). I mostly looked for regular and 
repeated structure rather than trying to fully understand/review the intricate 
concurrency details.

IMO there is some post integration work to do around the architecture of thread 
containers, i presume the structured concurrency implementation (use of thread 
flock) will help with that.

-

Marked as reviewed by psandoz (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v8]

2022-04-27 Thread Paul Sandoz
On Wed, 27 Apr 2022 14:24:20 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refresh 7965cc6b168e567ac2596f2fbc3b00a7d99b7e1e

src/java.base/share/classes/java/lang/Thread.java line 116:

> 114:  * carrier threads. Locking and I/O operations are examples of 
> operations
> 115:  * where a carrier thread may be re-scheduled from one virtual thread to 
> another.
> 116:  * Code executing in a virtual thread is not aware of underlying carrier 
> thread.

Suggestion:

 * Code executing in a virtual thread is not aware of the underlying carrier 
thread.

-

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v8]

2022-04-27 Thread Paul Sandoz
On Wed, 27 Apr 2022 14:24:20 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refresh 7965cc6b168e567ac2596f2fbc3b00a7d99b7e1e

src/java.base/share/classes/java/util/concurrent/ThreadPerTaskExecutor.java 
line 45:

> 43:  * threads is unbounded.
> 44:  */
> 45: class ThreadPerTaskExecutor implements ExecutorService {

This class manages the set of per-task threads which arguably should be the job 
of the thread container, and it awkwardly overrides the container's set of 
threads by setting a field on `SharedThreadContainer.threadsSupplier`.

`SharedThreadContainer` supports a number of different shared container 
policies that could be made clearer.

Architecturally i think this could be better layered but it can be iterated on 
later.

-

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v8]

2022-04-27 Thread Paul Sandoz
On Wed, 27 Apr 2022 17:27:56 GMT, Mandy Chung  wrote:

>> Alan Bateman has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Refresh 7965cc6b168e567ac2596f2fbc3b00a7d99b7e1e
>
> src/java.base/share/classes/java/lang/PinnedThreadPrinter.java line 58:
> 
>> 56: 
>> 57: // maps a class to the hashes of stack traces pinned by that code in 
>> that class
>> 58: private static final Map, Hashes> classToHashes = new 
>> WeakHashMap<>();
> 
> Can you use `ClassValue` in this case?

I was wondering that too, but held off commenting since it's not super 
performance critical given `classToHashes` is synchronized on. However, it does 
make the code a little clearer.

-

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8284992: Fix misleading Vector API doc for LSHR operator [v3]

2022-04-27 Thread Paul Sandoz
On Wed, 27 Apr 2022 09:06:12 GMT, Jie Fu  wrote:

>> Hi all,
>> 
>> The Current Vector API doc for `LSHR` is
>> 
>> Produce a>>>(n&(ESIZE*8-1)). Integral only.
>> 
>> 
>> This is misleading which may lead to bugs for Java developers.
>> This is because for negative byte/short elements, the results computed by 
>> `LSHR` will be different from that of `>>>`.
>> For more details, please see 
>> https://github.com/openjdk/jdk/pull/8276#issue-1206391831 .
>> 
>> After the patch, the doc for `LSHR` is
>> 
>> Produce zero-extended right shift of a by (n&(ESIZE*8-1)) bits. Integral 
>> only.
>> 
>> 
>> Thanks.
>> Best regards,
>> Jie
>
> Jie Fu has updated the pull request with a new target base due to a merge or 
> a rebase. The incremental webrev excludes the unrelated changes brought in by 
> the merge/rebase. The pull request contains six additional commits since the 
> last revision:
> 
>  - Address review comments
>  - Merge branch 'master' into JDK-8284992
>  - Merge branch 'master' into JDK-8284992
>  - Address review comments
>  - Merge branch 'master' into JDK-8284992
>  - 8284992: Fix misleading Vector API doc for LSHR operator

Thanks, looks good, we will need to create a CSR. Have you done that before?

-

PR: https://git.openjdk.java.net/jdk/pull/8291


Re: RFR: 8284992: Fix misleading Vector API doc for LSHR operator [v2]

2022-04-26 Thread Paul Sandoz
On Thu, 21 Apr 2022 04:23:22 GMT, Jie Fu  wrote:

>> Hi all,
>> 
>> The Current Vector API doc for `LSHR` is
>> 
>> Produce a>>>(n&(ESIZE*8-1)). Integral only.
>> 
>> 
>> This is misleading which may lead to bugs for Java developers.
>> This is because for negative byte/short elements, the results computed by 
>> `LSHR` will be different from that of `>>>`.
>> For more details, please see 
>> https://github.com/openjdk/jdk/pull/8276#issue-1206391831 .
>> 
>> After the patch, the doc for `LSHR` is
>> 
>> Produce zero-extended right shift of a by (n&(ESIZE*8-1)) bits. Integral 
>> only.
>> 
>> 
>> Thanks.
>> Best regards,
>> Jie
>
> Jie Fu has updated the pull request with a new target base due to a merge or 
> a rebase. The incremental webrev excludes the unrelated changes brought in by 
> the merge/rebase. The pull request contains three additional commits since 
> the last revision:
> 
>  - Address review comments
>  - Merge branch 'master' into JDK-8284992
>  - 8284992: Fix misleading Vector API doc for LSHR operator

After talking with John here's what we think is a better approach than what I 
originally had in mind:
1. In the class doc of `VectorOperators` add a definition for `EMASK` occurring 
after the definition for `ESIZE`:

 * {@code EMASK}  the bit mask of the operand type, where {@code 
EMASK=(1<>>(n&(ESIZE*8-1))}.  Integral only. */

That more clearly gets across operating in the correct domain for sub-word 
operand types, which was the original intention (e.g. the right shift value).

-

PR: https://git.openjdk.java.net/jdk/pull/8291


Re: RFR: 8284992: Fix misleading Vector API doc for LSHR operator

2022-04-20 Thread Paul Sandoz
On Tue, 19 Apr 2022 08:41:50 GMT, Jie Fu  wrote:

> Hi all,
> 
> The Current Vector API doc for `LSHR` is
> 
> Produce a>>>(n&(ESIZE*8-1)). Integral only.
> 
> 
> This is misleading which may lead to bugs for Java developers.
> This is because for negative byte/short elements, the results computed by 
> `LSHR` will be different from that of `>>>`.
> For more details, please see 
> https://github.com/openjdk/jdk/pull/8276#issue-1206391831 .
> 
> After the patch, the doc for `LSHR` is
> 
> Produce zero-extended right shift of a by (n&(ESIZE*8-1)) bits. Integral only.
> 
> 
> Thanks.
> Best regards,
> Jie

We can raise attention to that:

/** Produce {@code a>>>(n&(ESIZE*8-1))} 
  * (The operand and result are converted if the operand type is {@code byte} 
or {@code short}, see below).  Integral only.
  * ...
  */

-

PR: https://git.openjdk.java.net/jdk/pull/8291


Re: RFR: 8284992: Fix misleading Vector API doc for LSHR operator

2022-04-19 Thread Paul Sandoz
On Tue, 19 Apr 2022 08:41:50 GMT, Jie Fu  wrote:

> Hi all,
> 
> The Current Vector API doc for `LSHR` is
> 
> Produce a>>>(n&(ESIZE*8-1)). Integral only.
> 
> 
> This is misleading which may lead to bugs for Java developers.
> This is because for negative byte/short elements, the results computed by 
> `LSHR` will be different from that of `>>>`.
> For more details, please see 
> https://github.com/openjdk/jdk/pull/8276#issue-1206391831 .
> 
> After the patch, the doc for `LSHR` is
> 
> Produce zero-extended right shift of a by (n&(ESIZE*8-1)) bits. Integral only.
> 
> 
> Thanks.
> Best regards,
> Jie

The intended pattern for the operator tokens is to present a short symbolic 
description using Java operators and common methods. It would be good to try 
and keep with this pattern, and clarify for the extra cases. Here's what i had 
in mind:

/** Produce {@code a>>>(n&(ESIZE*8-1))}.  Integral only. 
  * 
  * For operand types {@code byte} and {@code short} the operation behaves 
as if the operand is first implicitly widened  
  * to an {@code int} value with {@code (a & ((1 << ESIZE) - 1))} the 
result of which is then applied as the operand to this 
  * operation, the result of the operation is then narrowed from {@code 
int} to the operand type using an explicit cast.
  */
public static final /*bitwise*/ Binary LSHR;

-

PR: https://git.openjdk.java.net/jdk/pull/8291


Re: RFR: 8284932: [Vector API] Incorrect implementation of LSHR operator for negative byte/short elements

2022-04-19 Thread Paul Sandoz
On Sun, 17 Apr 2022 14:35:14 GMT, Jie Fu  wrote:

> Hi all,
> 
> According to the Vector API doc, the `LSHR` operator computes 
> `a>>>(n&(ESIZE*8-1))`.
> However, current implementation is incorrect for negative bytes/shorts.
> 
> The background is that one of our customers try to vectorize `urshift` with 
> `urshiftVector` like the following.
> 
>  13 public static void urshift(byte[] src, byte[] dst) {
>  14 for (int i = 0; i < src.length; i++) {
>  15 dst[i] = (byte)(src[i] >>> 3);
>  16 }
>  17 }
>  18 
>  19 public static void urshiftVector(byte[] src, byte[] dst) {
>  20 int i = 0;
>  21 for (; i < spec.loopBound(src.length); i +=spec.length()) {
>  22 var va = ByteVector.fromArray(spec, src, i);
>  23 var vb = va.lanewise(VectorOperators.LSHR, 3);
>  24 vb.intoArray(dst, i);
>  25 }
>  26 
>  27 for (; i < src.length; i++) {
>  28 dst[i] = (byte)(src[i] >>> 3);
>  29 }
>  30 }
> 
> 
> Unfortunately and to our surprise, code@line28 computes different results 
> with code@line23.
> It took quite a long time to figure out this bug.
> 
> The root cause is that current implemenation of Vector API can't compute the  
> unsigned right shift results as what is done for scalar `>>>` for negative 
> byte/short elements.
> Actually, current implementation will do `(a & 0xFF) >>> (n & 7)` [1] for all 
> bytes, which is unable to compute the vectorized `>>>` for negative bytes.
> So this seems unreasonable and unfriendly to Java developers.
> It would be better to fix it.
> 
> The key idea to support unsigned right shift of negative bytes/shorts is just 
> to replace the unsigned right shift operation with the signed right shift 
> operation.
> This logic is:
> - For byte elements, unsigned right shift is equal to signed right shift if 
> the shift_cnt <= 24.
> - For short elements, unsigned right shift is equal to signed right shift if 
> the shift_cnt <= 16.
> - For Vector API, the shift_cnt will be masked to shift_cnt <= 7 for bytes 
> and shift_cnt <= 15 for shorts.
> 
> I just learned this idea from https://github.com/openjdk/jdk/pull/7979 .
> And many thanks to @fg1417 .
> 
> 
> Thanks.
> Best regards,
> Jie
> 
> [1] 
> https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L935

I am yet to be convinced we a 3rd vector right shift operator. We are talking 
about narrow cases of correct use which i believe can be supported with the 
existing operators. The user needs to think very careful when deviating from 
common right shift patterns on sub-words, which when deviated from can often 
imply misunderstanding and incorrect use, or an obtuse use. I would prefer to 
stick the existing operators and clarify their use.

-

PR: https://git.openjdk.java.net/jdk/pull/8276


Re: RFR: 8284932: [Vector API] Incorrect implementation of LSHR operator for negative byte/short elements

2022-04-19 Thread Paul Sandoz
On Sun, 17 Apr 2022 14:35:14 GMT, Jie Fu  wrote:

> Hi all,
> 
> According to the Vector API doc, the `LSHR` operator computes 
> `a>>>(n&(ESIZE*8-1))`.
> However, current implementation is incorrect for negative bytes/shorts.
> 
> The background is that one of our customers try to vectorize `urshift` with 
> `urshiftVector` like the following.
> 
>  13 public static void urshift(byte[] src, byte[] dst) {
>  14 for (int i = 0; i < src.length; i++) {
>  15 dst[i] = (byte)(src[i] >>> 3);
>  16 }
>  17 }
>  18 
>  19 public static void urshiftVector(byte[] src, byte[] dst) {
>  20 int i = 0;
>  21 for (; i < spec.loopBound(src.length); i +=spec.length()) {
>  22 var va = ByteVector.fromArray(spec, src, i);
>  23 var vb = va.lanewise(VectorOperators.LSHR, 3);
>  24 vb.intoArray(dst, i);
>  25 }
>  26 
>  27 for (; i < src.length; i++) {
>  28 dst[i] = (byte)(src[i] >>> 3);
>  29 }
>  30 }
> 
> 
> Unfortunately and to our surprise, code@line28 computes different results 
> with code@line23.
> It took quite a long time to figure out this bug.
> 
> The root cause is that current implemenation of Vector API can't compute the  
> unsigned right shift results as what is done for scalar `>>>` for negative 
> byte/short elements.
> Actually, current implementation will do `(a & 0xFF) >>> (n & 7)` [1] for all 
> bytes, which is unable to compute the vectorized `>>>` for negative bytes.
> So this seems unreasonable and unfriendly to Java developers.
> It would be better to fix it.
> 
> The key idea to support unsigned right shift of negative bytes/shorts is just 
> to replace the unsigned right shift operation with the signed right shift 
> operation.
> This logic is:
> - For byte elements, unsigned right shift is equal to signed right shift if 
> the shift_cnt <= 24.
> - For short elements, unsigned right shift is equal to signed right shift if 
> the shift_cnt <= 16.
> - For Vector API, the shift_cnt will be masked to shift_cnt <= 7 for bytes 
> and shift_cnt <= 15 for shorts.
> 
> I just learned this idea from https://github.com/openjdk/jdk/pull/7979 .
> And many thanks to @fg1417 .
> 
> 
> Thanks.
> Best regards,
> Jie
> 
> [1] 
> https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L935

Not yet talked with John, but i investigated further. The implementation of the 
`LSHR` operation is behaving as intended, but is under specified with regards 
to `byte` and `short` as you noted in #8291.

This is a subtle area, but i am wondering if the user really means to use 
arithmetic shift in this case? Since is not the following true for all values 
of `e` and `c`, where `e` is a `byte` and `c` is the right shift count ranging 
from 0 to 7:


(byte) (e >>> c) == (byte) (e >> c)

?

Then the user can use `VectorOperators.ASHR`.

-

PR: https://git.openjdk.java.net/jdk/pull/8276


Re: RFR: 8285007: Use correct lookup mode for MethodHandleStatics.UNSAFE

2022-04-19 Thread Paul Sandoz
On Tue, 19 Apr 2022 13:19:31 GMT, Claes Redestad  wrote:

> Trivially fix the resolution of the `NF_UNSAFE` named function to use the 
> appropriate lookup mode.
> 
> In [JDK-8187826](https://bugs.openjdk.java.net/browse/JDK-8187826) we changed 
> from regular reflection to use a `NamedFunction` for this field, but it 
> appears the lookup mode came out wrong. This mismatch appears to be benign 
> and ignored by HotSpot, but a user implementing an experimental JVM ran into 
> some issues (and additionally noticed the resolve throws the wrong 
> exception). 
> 
> This patch is a point fix to this particular issue, but I think we should 
> consider following up with a stronger assertion or test for proper staticness 
> of fields somewhere when resolving fields via 
> `MemberName.getFactory().resolveOrNull(..)`.

Marked as reviewed by psandoz (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/8297


Re: RFR: 8284992: Fix misleading Vector API doc for LSHR operator

2022-04-19 Thread Paul Sandoz
On Tue, 19 Apr 2022 08:41:50 GMT, Jie Fu  wrote:

> Hi all,
> 
> The Current Vector API doc for `LSHR` is
> 
> Produce a>>>(n&(ESIZE*8-1)). Integral only.
> 
> 
> This is misleading which may lead to bugs for Java developers.
> This is because for negative byte/short elements, the results computed by 
> `LSHR` will be different from that of `>>>`.
> For more details, please see 
> https://github.com/openjdk/jdk/pull/8276#issue-1206391831 .
> 
> After the patch, the doc for `LSHR` is
> 
> Produce zero-extended right shift of a by (n&(ESIZE*8-1)) bits. Integral only.
> 
> 
> Thanks.
> Best regards,
> Jie

I need to think a little more about this. The specification is not accurate and 
likely requires a CSR.

My initial thoughts are I would prefer the operation to retain reference to the 
succinct definition using the logical right shift operator but we add 
additional specification explaining the cases for `byte` and `short`, namely 
that the result is widened to an `int` as if by `(a & ((1 << ESIZE) - 1))` and 
the result narrowed. That's commonly (at least for the widening part) how `>>>` 
is used with `byte` and `short`, and i think would be clearer to be explicit in 
that regard.

-

PR: https://git.openjdk.java.net/jdk/pull/8291


Re: RFR: 8284932: [Vector API] Incorrect implementation of LSHR operator for negative byte/short elements

2022-04-19 Thread Paul Sandoz
On Sun, 17 Apr 2022 14:35:14 GMT, Jie Fu  wrote:

> Hi all,
> 
> According to the Vector API doc, the `LSHR` operator computes 
> `a>>>(n&(ESIZE*8-1))`.
> However, current implementation is incorrect for negative bytes/shorts.
> 
> The background is that one of our customers try to vectorize `urshift` with 
> `urshiftVector` like the following.
> 
>  13 public static void urshift(byte[] src, byte[] dst) {
>  14 for (int i = 0; i < src.length; i++) {
>  15 dst[i] = (byte)(src[i] >>> 3);
>  16 }
>  17 }
>  18 
>  19 public static void urshiftVector(byte[] src, byte[] dst) {
>  20 int i = 0;
>  21 for (; i < spec.loopBound(src.length); i +=spec.length()) {
>  22 var va = ByteVector.fromArray(spec, src, i);
>  23 var vb = va.lanewise(VectorOperators.LSHR, 3);
>  24 vb.intoArray(dst, i);
>  25 }
>  26 
>  27 for (; i < src.length; i++) {
>  28 dst[i] = (byte)(src[i] >>> 3);
>  29 }
>  30 }
> 
> 
> Unfortunately and to our surprise, code@line28 computes different results 
> with code@line23.
> It took quite a long time to figure out this bug.
> 
> The root cause is that current implemenation of Vector API can't compute the  
> unsigned right shift results as what is done for scalar `>>>` for negative 
> byte/short elements.
> Actually, current implementation will do `(a & 0xFF) >>> (n & 7)` [1] for all 
> bytes, which is unable to compute the vectorized `>>>` for negative bytes.
> So this seems unreasonable and unfriendly to Java developers.
> It would be better to fix it.
> 
> The key idea to support unsigned right shift of negative bytes/shorts is just 
> to replace the unsigned right shift operation with the signed right shift 
> operation.
> This logic is:
> - For byte elements, unsigned right shift is equal to signed right shift if 
> the shift_cnt <= 24.
> - For short elements, unsigned right shift is equal to signed right shift if 
> the shift_cnt <= 16.
> - For Vector API, the shift_cnt will be masked to shift_cnt <= 7 for bytes 
> and shift_cnt <= 15 for shorts.
> 
> I just learned this idea from https://github.com/openjdk/jdk/pull/7979 .
> And many thanks to @fg1417 .
> 
> 
> Thanks.
> Best regards,
> Jie
> 
> [1] 
> https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L935

I need to discuss with @rose00 on the history behind this before deciding what 
to do.

-

PR: https://git.openjdk.java.net/jdk/pull/8276


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]

2022-04-15 Thread Paul Sandoz
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refresh

src/java.base/share/classes/jdk/internal/vm/annotation/ChangesCurrentThread.java
 line 35:

> 33:  * disables inlining for the method to which it is applied unless the
> 34:  * method being unlined into is also annotated ChangesCurrentThread.
> 35: 

Suggestion:

 * method being inlined into is also annotated ChangesCurrentThread.
 *

-

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]

2022-04-15 Thread Paul Sandoz
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refresh

src/java.base/share/classes/jdk/internal/vm/ThreadContainers.java line 184:

> 182:  * with the Thread API.
> 183:  */
> 184: private static class RootContainer extends ThreadContainer {

This implementation could be clearer if split out into two, and the value 
assigned to `INSTANCE` is selected based  on the system property. Something to 
consider later perhaps.

-

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]

2022-04-15 Thread Paul Sandoz
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refresh

src/java.base/share/classes/jdk/internal/vm/Continuation.java line 474:

> 472: private void processNmethods(int before, int after) {
> 473: 
> 474: }

Unused field and method?

-

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]

2022-04-15 Thread Paul Sandoz
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refresh

src/java.base/share/classes/jdk/internal/vm/Continuation.java line 264:

> 262: } finally {
> 263: fence();
> 264: StackChunk c = tail;

`c` is not used

src/java.base/share/classes/jdk/internal/vm/Continuation.java line 310:

> 308: 
> 309: @IntrinsicCandidate
> 310: private static int doYield() { throw new Error("Intrinsic not 
> installed"); };

Suggestion:

private static int doYield() { throw new Error("Intrinsic not installed"); }

-

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]

2022-04-15 Thread Paul Sandoz
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refresh

src/java.base/share/classes/jdk/internal/vm/Continuation.java line 115:

> 113: }
> 114: 
> 115: private Runnable target;

Can be final? (Does not appear to be updated by the VM, or via unsafe by some 
other class)

-

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]

2022-04-15 Thread Paul Sandoz
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refresh

src/java.base/share/classes/jdk/internal/vm/Continuation.java line 94:

> 92: default:
> 93: throw new AssertionError("Unknown pinned reason: " + 
> reason);
> 94: }

Suggestion:

return switch (reason) {
case 2 -> Pinned.CRITICAL_SECTION;
case 3 -> Pinned.NATIVE;
case 4 -> Pinned.MONITOR;
default -> throw new AssertionError("Unknown pinned reason: " + 
reason);
};

-

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]

2022-04-15 Thread Paul Sandoz
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refresh

src/java.base/share/classes/jdk/internal/misc/ThreadTracker.java line 59:

> 57: && this.thread == other.thread;
> 58: }
> 59: }

Suggestion:

private record ThreadRef(Thread thread) {
@Override
public int hashCode() {
return Long.hashCode(thread.threadId());
}

@Override
public boolean equals(Object obj) {
return (obj instanceof ThreadRef other)
&& this.thread == other.thread;
}
}

-

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]

2022-04-15 Thread Paul Sandoz
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refresh

src/java.base/share/classes/jdk/internal/misc/ThreadFlock.java line 262:

> 260:  * @return the thread, started
> 261:  * @throws IllegalStateException if this flock is shutdown or closed
> 262:  * @throws jdk.incubator.concurrent.StructureViolationException if 
> the current

Suggestion:

 * @throws WrongThreadException if the current thread is not the owner or a 
thread
 * contained in the flock
 * @throws jdk.incubator.concurrent.StructureViolationException if the 
current

-

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]

2022-04-15 Thread Paul Sandoz
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refresh

src/java.base/share/classes/jdk/internal/misc/InternalLock.java line 46:

> 44: } else {
> 45: CAN_USE_INTERNAL_LOCK = true;
> 46: }

Suggestion:

CAN_USE_INTERNAL_LOCK = Boolean.getBoolean("jdk.io.useMonitors");

-

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]

2022-04-15 Thread Paul Sandoz
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refresh

src/java.base/share/classes/jdk/internal/misc/Blocker.java line 118:

> 116: static long beginCompensatedBlock(ForkJoinPool pool) {
> 117: try {
> 118: return (long) beginCompensatedBlock.invoke(pool);

Suggestion:

return (long) beginCompensatedBlock.invokeExact(pool);

src/java.base/share/classes/jdk/internal/misc/Blocker.java line 126:

> 124: static void endCompensatedBlock(ForkJoinPool pool, long post) {
> 125: try {
> 126: endCompensatedBlock.invoke(pool, post);

Suggestion:

endCompensatedBlock.invokeExact(pool, post);

-

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]

2022-04-14 Thread Paul Sandoz
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refresh

src/java.base/share/classes/java/lang/VirtualThread.java line 65:

> 63:  * system.
> 64:  */
> 65: class VirtualThread extends Thread {

Suggestion:

final class VirtualThread extends Thread {

src/java.base/share/classes/java/lang/VirtualThread.java line 94:

> 92:  *  RUNNING -> PARKING // Thread attempts to park
> 93:  *  PARKING -> PARKED  // yield successful, thread is parked
> 94:  *  PARKING -> PINNED  // yield failed, thread is pinned

Suggestion:

 *  PARKING -> PARKED  // parking successful, thread is parked
 *  PARKING -> PINNED  // parking failed, thread is pinned

-

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]

2022-04-14 Thread Paul Sandoz
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refresh

src/java.base/share/classes/java/lang/Thread.java line 3028:

> 3026: 
> 3027: /** The thread container that this thread is in */
> 3028: @Stable private volatile ThreadContainer container;

The `volatile` modifier with `@Stable` is unusual. IIUC the field is set once 
on start (behaves as if a final field). Once C2 gets hold of it and can treat 
the value as a runtime constant the `volatile` has no effect. This is a tricky 
area but i think a store fence will suffice in `setThreadContainer` since the 
value goes from null to non-null and therefore cannot be speculated upon.

-

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]

2022-04-14 Thread Paul Sandoz
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refresh

src/java.base/share/classes/java/lang/Thread.java line 213:

> 211: 
> 212: // Additional fields for platform threads
> 213: private static class FieldHolder {

All but the task field are explicitly known by the VM (and stillborn is not 
accessed by Java code. Suggest grouping accordingly.

-

PR: https://git.openjdk.java.net/jdk/pull/8166


Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]

2022-04-14 Thread Paul Sandoz
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman  wrote:

>> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which 
>> JDK version to target.
>> 
>> We will refresh this PR periodically to pick up changes and fixes from the 
>> loom repo.
>> 
>> Most of the new mechanisms in the HotSpot VM are disabled by default and 
>> require running with `--enable-preview` to enable.
>> 
>> The patch has support for x64 and aarch64 on the usual operating systems 
>> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for 
>> zero and some of the other ports. Additional ports can be contributed via 
>> PRs against the fibers branch in the loom repo.
>> 
>> There are changes in many areas. To reduce notifications/mails, the labels 
>> have been trimmed down for now to hotspot, serviceability and core-libs. 
>> We'll add the complete set of labels when the PR is further along.
>> 
>> The changes include a refresh of java.util.concurrent and ForkJoinPool from 
>> Doug Lea's CVS. These changes will probably be proposed and integrated in 
>> advance of this PR.
>> 
>> The changes include some non-exposed and low-level infrastructure to support 
>> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to 
>> make life a bit easier and avoid having to separate VM changes and juggle 
>> branches at this time.
>
> Alan Bateman has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Refresh

src/java.base/share/classes/java/lang/Thread.java line 862:

> 860:  * @param start the starting value of the counter
> 861:  * @return this builder
> 862:  * @throws IllegalArgumentException if count is negative

Suggestion:

 * @throws IllegalArgumentException if start is negative

-

PR: https://git.openjdk.java.net/jdk/pull/8166


Integrated: 8283892: Compress and expand bits

2022-04-14 Thread Paul Sandoz
On Tue, 5 Apr 2022 22:05:19 GMT, Paul Sandoz  wrote:

> Add support to compress bits and expand bits of `int` and `long` values, see 
> Hacker's Delight (2nd edition), section 7.4.
> 
> Compressing or expanding bits of an `int` or `long` value can be composed to 
> enable general permutations, and the "sheep and goats" operation (SAG) see 
> Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a 
> stable binary radix sort.
> 
> The compress and expand functionality maps efficiently to hardware 
> instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the 
> implementations can be very efficient on supporting hardware. 
> Intrinsification will occur in a separate PR.
> 
> This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the 
> beneficial performance impact of the `PDEP` instruction, and by extension the 
> `expand` method, when applied to the implementation of a bit-vector select 
> operation in succinct data structures (for example `select(r)` returns the 
> position of the `r`th 1).
> 
> Testing-wise the approach take is three fold:
> 1. Tests compared against simple implementations that are easy to read and 
> verify against the JDK implementations (which later will also be made 
> intrinsic). To compensate all tests are also run flipping the test methods 
> and the methods under test.
> 2. Tests composed of compress and expand and vice versa.
> 3. Tests with known mask patterns, whose expected values are easily derived 
> from the inputs.

This pull request has now been integrated.

Changeset: fbb09160
Author:Paul Sandoz 
URL:   
https://git.openjdk.java.net/jdk/commit/fbb09160906b4d9b0a29c8e99465f12ad16d4c88
Stats: 1031 lines in 7 files changed: 1024 ins; 5 del; 2 mod

8283892: Compress and expand bits

Reviewed-by: alanb, redestad

-

PR: https://git.openjdk.java.net/jdk/pull/8115


Re: RFR: 8280915: Better parallelization for AbstractSpliterator and IteratorSpliterator when size is unknown [v5]

2022-04-14 Thread Paul Sandoz
On Thu, 10 Feb 2022 04:30:34 GMT, Tagir F. Valeev  wrote:

>> See the bug description for details.
>> 
>> I propose a simple solution. Let's allow ArraySpliterator to be non-SIZED 
>> and report artificial estimatedSize(), much bigger than the real one. This 
>> will allow AbstractSpliterator and IteratorSpliterator to produce prefix 
>> whose size is comparable to Long.MAX_VALUE (say, starting with 
>> Long.MAX_VALUE/2), and this will enable further splitting of the prefix. 
>> This change will drastically improve parallel streaming for affected streams 
>> of size <= 1024 and significantly improve for streams of size 1025..2. 
>> The cost is higher-grained splitting for huge streams of unknown size. This 
>> might add a minor overhead for such scenarios which, I believe, is 
>> completely tolerable.
>> 
>> No public API changes are necessary, sequential processing should not be 
>> affected, except an extra field in ArraySpliterator which increases a 
>> footprint by 8 bytes.
>> 
>> I added a simple test using an artificial collector to ensure that at least 
>> two non-empty parts are created when parallelizing Stream.iterate source. 
>> More testing ideas are welcome.
>
> Tagir F. Valeev has updated the pull request incrementally with two 
> additional commits since the last revision:
> 
>  - Update copyright year
>  - Cosmetic fixes

Getting back to this after much delay! Approving. But, i would like to try and 
document this design decision in comments, and maybe in implementation notes. 
We can do that as a follow on PR.

-

Marked as reviewed by psandoz (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7279


Re: RFR: 8284579: Improve VarHandle checks for interpreter [v2]

2022-04-11 Thread Paul Sandoz
On Mon, 11 Apr 2022 10:03:58 GMT, Claes Redestad  wrote:

>>> How would the performance change if the `isDirect` and 
>>> `checkExactAccessMode` merger was reverted?
>> 
>> Add around 15-20ns/op for these micros.
>
> Restructuring so that we only check `direct` once sounds reasonable at face 
> value but would be a lot of churn for little gain (even in the interpreter 
> testing a local boolean field is fast, and JITs will optimize this well).
> 
> The `LINK_TO_STATIC_ARGS_V` in the code generator seems like a remnant from 
> an earlier iteration of this code. The `vform.getMemberName_V` method the 
> code generator would emit a call to doesn't even exist. This should probably 
> be cleaned up, separately. @PaulSandoz, WDYT?

I think it was some left-over from prototyping and was not cleaned up. AFAICT 
`getMemberName_V` never existed in an committed code. It's OK to remove the 
`LINK_TO_STATIC_ARGS_V` as part of this commit of you wish.

-

PR: https://git.openjdk.java.net/jdk/pull/8160


Re: RFR: 8283892: Compress and expand bits [v7]

2022-04-11 Thread Paul Sandoz
> Add support to compress bits and expand bits of `int` and `long` values, see 
> Hacker's Delight (2nd edition), section 7.4.
> 
> Compressing or expanding bits of an `int` or `long` value can be composed to 
> enable general permutations, and the "sheep and goats" operation (SAG) see 
> Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a 
> stable binary radix sort.
> 
> The compress and expand functionality maps efficiently to hardware 
> instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the 
> implementations can be very efficient on supporting hardware. 
> Intrinsification will occur in a separate PR.
> 
> This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the 
> beneficial performance impact of the `PDEP` instruction, and by extension the 
> `expand` method, when applied to the implementation of a bit-vector select 
> operation in succinct data structures (for example `select(r)` returns the 
> position of the `r`th 1).
> 
> Testing-wise the approach take is three fold:
> 1. Tests compared against simple implementations that are easy to read and 
> verify against the JDK implementations (which later will also be made 
> intrinsic). To compensate all tests are also run flipping the test methods 
> and the methods under test.
> 2. Tests composed of compress and expand and vice versa.
> 3. Tests with known mask patterns, whose expected values are easily derived 
> from the inputs.

Paul Sandoz has updated the pull request incrementally with one additional 
commit since the last revision:

  JavaDoc typo

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8115/files
  - new: https://git.openjdk.java.net/jdk/pull/8115/files/f9b7b2bc..8a52eabe

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=8115=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8115=05-06

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8115.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8115/head:pull/8115

PR: https://git.openjdk.java.net/jdk/pull/8115


Re: RFR: 8283892: Compress and expand bits [v6]

2022-04-08 Thread Paul Sandoz
> Add support to compress bits and expand bits of `int` and `long` values, see 
> Hacker's Delight (2nd edition), section 7.4.
> 
> Compressing or expanding bits of an `int` or `long` value can be composed to 
> enable general permutations, and the "sheep and goats" operation (SAG) see 
> Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a 
> stable binary radix sort.
> 
> The compress and expand functionality maps efficiently to hardware 
> instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the 
> implementations can be very efficient on supporting hardware. 
> Intrinsification will occur in a separate PR.
> 
> This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the 
> beneficial performance impact of the `PDEP` instruction, and by extension the 
> `expand` method, when applied to the implementation of a bit-vector select 
> operation in succinct data structures (for example `select(r)` returns the 
> position of the `r`th 1).
> 
> Testing-wise the approach take is three fold:
> 1. Tests compared against simple implementations that are easy to read and 
> verify against the JDK implementations (which later will also be made 
> intrinsic). To compensate all tests are also run flipping the test methods 
> and the methods under test.
> 2. Tests composed of compress and expand and vice versa.
> 3. Tests with known mask patterns, whose expected values are easily derived 
> from the inputs.

Paul Sandoz has updated the pull request incrementally with one additional 
commit since the last revision:

  Refer to hexadecimal digit

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8115/files
  - new: https://git.openjdk.java.net/jdk/pull/8115/files/ee062537..f9b7b2bc

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=8115=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8115=04-05

  Stats: 18 lines in 2 files changed: 0 ins; 0 del; 18 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8115.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8115/head:pull/8115

PR: https://git.openjdk.java.net/jdk/pull/8115


Re: RFR: 8283892: Compress and expand bits [v5]

2022-04-08 Thread Paul Sandoz
> Add support to compress bits and expand bits of `int` and `long` values, see 
> Hacker's Delight (2nd edition), section 7.4.
> 
> Compressing or expanding bits of an `int` or `long` value can be composed to 
> enable general permutations, and the "sheep and goats" operation (SAG) see 
> Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a 
> stable binary radix sort.
> 
> The compress and expand functionality maps efficiently to hardware 
> instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the 
> implementations can be very efficient on supporting hardware. 
> Intrinsification will occur in a separate PR.
> 
> This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the 
> beneficial performance impact of the `PDEP` instruction, and by extension the 
> `expand` method, when applied to the implementation of a bit-vector select 
> operation in succinct data structures (for example `select(r)` returns the 
> position of the `r`th 1).
> 
> Testing-wise the approach take is three fold:
> 1. Tests compared against simple implementations that are easy to read and 
> verify against the JDK implementations (which later will also be made 
> intrinsic). To compensate all tests are also run flipping the test methods 
> and the methods under test.
> 2. Tests composed of compress and expand and vice versa.
> 3. Tests with known mask patterns, whose expected values are easily derived 
> from the inputs.

Paul Sandoz has updated the pull request incrementally with one additional 
commit since the last revision:

  Updates to examples

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8115/files
  - new: https://git.openjdk.java.net/jdk/pull/8115/files/96d90e1a..ee062537

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=8115=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8115=03-04

  Stats: 40 lines in 2 files changed: 10 ins; 2 del; 28 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8115.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8115/head:pull/8115

PR: https://git.openjdk.java.net/jdk/pull/8115


Re: RFR: 8283892: Compress and expand bits [v4]

2022-04-07 Thread Paul Sandoz
On Thu, 7 Apr 2022 20:02:51 GMT, Alan Bateman  wrote:

>> Examples added in latest commit.
>
> I see you've added a comment on each example too, I think this really helpers 
> readers to see how they can be used.
> 
> The class description of both Integer and Long have an implementation note 
> (they pre-date @implNote) referencing Hacker's Delight. We could potentially 
> expand the list of methods mentioned but it's not strictly necessary are they 
> are just examples.

I also saw that on the class doc and considered the new methods fit neatly 
under the category of "bit twiddling" methods.

-

PR: https://git.openjdk.java.net/jdk/pull/8115


Re: RFR: 8283892: Compress and expand bits [v4]

2022-04-07 Thread Paul Sandoz
On Wed, 6 Apr 2022 18:22:45 GMT, John R Rose  wrote:

>> Paul Sandoz has updated the pull request incrementally with two additional 
>> commits since the last revision:
>> 
>>  - Fix typo.
>>  - Provide examples.
>
> src/java.base/share/classes/java/lang/Integer.java line 1781:
> 
>> 1779:  * All the upper remaining bits of the compressed value are set
>> 1780:  * to zero.
>> 1781:  *
> 
> Following Alan's comment, I suggest the following examples:
> 
> 
>   compress(0x9ABCDEF, 0x0F0F0FF) == 0xACEF
>   compress(x, 1<>n & 1)
>   compress(x, -1<>> n
>   compress(m, m) == (m==-1||m==0)? m : (1<   compress(x, m) == compress(x & m, m)
>   compress(expand(x, m), m) == x & compress(m, m)
>   (compress(x, m) >>> n) & 1 == /*the bit of x corresponding to the nth set 
> bit in m*/
> 
> 
> …In two places.  Similarly, examples for expand:
> 
> 
>   expand(0x9ABCDEF, 0x0F0F0FF) == 0xC0D0EF
>   expand(x, 1<   expand(x, -1<   expand(-1, m) == m
>   expand(x, m) == expand(x, m) & m
>   expand(compress(x, m), m) == x & m
>   expand(1< highest/lowestOneBit*/
> 
> 
> (Please double check these examples!)

Examples added in latest commit.

-

PR: https://git.openjdk.java.net/jdk/pull/8115


Re: RFR: 8283892: Compress and expand bits [v4]

2022-04-07 Thread Paul Sandoz
> Add support to compress bits and expand bits of `int` and `long` values, see 
> Hacker's Delight (2nd edition), section 7.4.
> 
> Compressing or expanding bits of an `int` or `long` value can be composed to 
> enable general permutations, and the "sheep and goats" operation (SAG) see 
> Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a 
> stable binary radix sort.
> 
> The compress and expand functionality maps efficiently to hardware 
> instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the 
> implementations can be very efficient on supporting hardware. 
> Intrinsification will occur in a separate PR.
> 
> This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the 
> beneficial performance impact of the `PDEP` instruction, and by extension the 
> `expand` method, when applied to the implementation of a bit-vector select 
> operation in succinct data structures (for example `select(r)` returns the 
> position of the `r`th 1).
> 
> Testing-wise the approach take is three fold:
> 1. Tests compared against simple implementations that are easy to read and 
> verify against the JDK implementations (which later will also be made 
> intrinsic). To compensate all tests are also run flipping the test methods 
> and the methods under test.
> 2. Tests composed of compress and expand and vice versa.
> 3. Tests with known mask patterns, whose expected values are easily derived 
> from the inputs.

Paul Sandoz has updated the pull request incrementally with two additional 
commits since the last revision:

 - Fix typo.
 - Provide examples.

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8115/files
  - new: https://git.openjdk.java.net/jdk/pull/8115/files/da49874b..96d90e1a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=8115=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8115=02-03

  Stats: 186 lines in 2 files changed: 186 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8115.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8115/head:pull/8115

PR: https://git.openjdk.java.net/jdk/pull/8115


Re: RFR: 8283892: Compress and expand bits [v3]

2022-04-07 Thread Paul Sandoz
> Add support to compress bits and expand bits of `int` and `long` values, see 
> Hacker's Delight (2nd edition), section 7.4.
> 
> Compressing or expanding bits of an `int` or `long` value can be composed to 
> enable general permutations, and the "sheep and goats" operation (SAG) see 
> Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a 
> stable binary radix sort.
> 
> The compress and expand functionality maps efficiently to hardware 
> instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the 
> implementations can be very efficient on supporting hardware. 
> Intrinsification will occur in a separate PR.
> 
> This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the 
> beneficial performance impact of the `PDEP` instruction, and by extension the 
> `expand` method, when applied to the implementation of a bit-vector select 
> operation in succinct data structures (for example `select(r)` returns the 
> position of the `r`th 1).
> 
> Testing-wise the approach take is three fold:
> 1. Tests compared against simple implementations that are easy to read and 
> verify against the JDK implementations (which later will also be made 
> intrinsic). To compensate all tests are also run flipping the test methods 
> and the methods under test.
> 2. Tests composed of compress and expand and vice versa.
> 3. Tests with known mask patterns, whose expected values are easily derived 
> from the inputs.

Paul Sandoz has updated the pull request incrementally with five additional 
commits since the last revision:

 - Merge remote-tracking branch 'cl4es/expand_a_bit_more' into 
compress-expand-bits
 - Rename partx -> maskPrefixx, add ForceInline
 - Fix copy-paste error in Long.expand
 - Add microbenchmarks and manually unroll expand implementations
 - Consistent bounds.

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8115/files
  - new: https://git.openjdk.java.net/jdk/pull/8115/files/8c536de6..da49874b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=8115=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8115=01-02

  Stats: 163 lines in 5 files changed: 107 ins; 11 del; 45 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8115.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8115/head:pull/8115

PR: https://git.openjdk.java.net/jdk/pull/8115


Re: RFR: 8283892: Compress and expand bits [v2]

2022-04-07 Thread Paul Sandoz
On Thu, 7 Apr 2022 14:01:53 GMT, Claes Redestad  wrote:

>> Paul Sandoz has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Doc and test updates.
>
> I experimented with this and drafted a few microbenchmarks along and a 
> micro-optimization to the expand methods (see #8146). Up to you, but I think 
> it makes sense to consider such optimizations to the plain java 
> implementation given that 1) not all platforms have specialized instructions 
> where an intrinsic will make sense and 2) it appears to be a boost to warmup.

@cl4es thanks, your changes to `expand` looks reasonable, given the array 
allocation cost.

-

PR: https://git.openjdk.java.net/jdk/pull/8115


Re: RFR: 8283892: Compress and expand bits [v2]

2022-04-07 Thread Paul Sandoz
On Wed, 6 Apr 2022 17:39:48 GMT, John R Rose  wrote:

>> Paul Sandoz has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Doc and test updates.
>
> test/jdk/java/lang/AbstractCompressExpandTest.java line 145:
> 
>> 143: 
>> 144: int i = 0;
>> 145: for (int len = 0; len < 32; len++) {
> 
> Lengths should be `len = 1; len <= 32; len++`, generating fewer redundant 
> zero-length masks and a full-length -1 mask.  The pos should be `pos = 0; pos 
> <= 32 - len; pos++`, generating fully left-justified masks of the form 
> `-1< 
> Also, you should generate a zero-mask, just once.  I think initializing `int 
> i = 1` will do that trick.
> 
> (Same comment in two places.)

Done.

-

PR: https://git.openjdk.java.net/jdk/pull/8115


Re: RFR: 8283892: Compress and expand bits [v2]

2022-04-06 Thread Paul Sandoz
On Wed, 6 Apr 2022 17:43:34 GMT, John R Rose  wrote:

>> Paul Sandoz has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Doc and test updates.
>
> test/jdk/java/lang/AbstractCompressExpandTest.java line 104:
> 
>> 102: abstract long expectedExpand(long i, long mask);
>> 103: 
>> 104: static int SIZE = 1024;
> 
> It feels like `SIZE` should be a constructor parameter, to make it easier to 
> run ad hoc stress tests.

That's tricky because the test is TestNG based, launched via jtreg, it would 
need to be a system property.

-

PR: https://git.openjdk.java.net/jdk/pull/8115


Re: RFR: 8283892: Compress and expand bits [v2]

2022-04-06 Thread Paul Sandoz
> Add support to compress bits and expand bits of `int` and `long` values, see 
> Hacker's Delight (2nd edition), section 7.4.
> 
> Compressing or expanding bits of an `int` or `long` value can be composed to 
> enable general permutations, and the "sheep and goats" operation (SAG) see 
> Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a 
> stable binary radix sort.
> 
> The compress and expand functionality maps efficiently to hardware 
> instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the 
> implementations can be very efficient on supporting hardware. 
> Intrinsification will occur in a separate PR.
> 
> This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the 
> beneficial performance impact of the `PDEP` instruction, and by extension the 
> `expand` method, when applied to the implementation of a bit-vector select 
> operation in succinct data structures (for example `select(r)` returns the 
> position of the `r`th 1).
> 
> Testing-wise the approach take is three fold:
> 1. Tests compared against simple implementations that are easy to read and 
> verify against the JDK implementations (which later will also be made 
> intrinsic). To compensate all tests are also run flipping the test methods 
> and the methods under test.
> 2. Tests composed of compress and expand and vice versa.
> 3. Tests with known mask patterns, whose expected values are easily derived 
> from the inputs.

Paul Sandoz has updated the pull request incrementally with one additional 
commit since the last revision:

  Doc and test updates.

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8115/files
  - new: https://git.openjdk.java.net/jdk/pull/8115/files/9516ecca..8c536de6

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=8115=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8115=00-01

  Stats: 33 lines in 3 files changed: 17 ins; 2 del; 14 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8115.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8115/head:pull/8115

PR: https://git.openjdk.java.net/jdk/pull/8115


Re: RFR: 8283892: Compress and expand bits

2022-04-06 Thread Paul Sandoz
On Wed, 6 Apr 2022 14:58:37 GMT, Alan Bateman  wrote:

>> Add support to compress bits and expand bits of `int` and `long` values, see 
>> Hacker's Delight (2nd edition), section 7.4.
>> 
>> Compressing or expanding bits of an `int` or `long` value can be composed to 
>> enable general permutations, and the "sheep and goats" operation (SAG) see 
>> Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a 
>> stable binary radix sort.
>> 
>> The compress and expand functionality maps efficiently to hardware 
>> instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the 
>> implementations can be very efficient on supporting hardware. 
>> Intrinsification will occur in a separate PR.
>> 
>> This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the 
>> beneficial performance impact of the `PDEP` instruction, and by extension 
>> the `expand` method, when applied to the implementation of a bit-vector 
>> select operation in succinct data structures (for example `select(r)` 
>> returns the position of the `r`th 1).
>> 
>> Testing-wise the approach take is three fold:
>> 1. Tests compared against simple implementations that are easy to read and 
>> verify against the JDK implementations (which later will also be made 
>> intrinsic). To compensate all tests are also run flipping the test methods 
>> and the methods under test.
>> 2. Tests composed of compress and expand and vice versa.
>> 3. Tests with known mask patterns, whose expected values are easily derived 
>> from the inputs.
>
> src/java.base/share/classes/java/lang/Integer.java line 1775:
> 
>> 1773:  * the specified bit mask.
>> 1774:  * 
>> 1775:  * For each one-bit value of the mask, {@code mb} say, from least
> 
> A minor comments is that "For each one-bit value of the mask mb " might 
> be a bit better, otherwise I think these methods and their javadoc looks 
> good. If it comes up then these methods could include an example in the 
> javadoc as they aren't hard once you see an example.

I can change to "For each one-bit value {@code mb} of the mask ..."

-

PR: https://git.openjdk.java.net/jdk/pull/8115


RFR: 8283892: Compress and expand bits

2022-04-05 Thread Paul Sandoz
Add support to compress bits and expand bits of `int` and `long` values, see 
Hacker's Delight (2nd edition), section 7.4.

Compressing or expanding bits of an `int` or `long` value can be composed to 
enable general permutations, and the "sheep and goats" operation (SAG) see 
Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a 
stable binary radix sort.

The compress and expand functionality maps efficiently to hardware 
instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the 
implementations can be very efficient on supporting hardware. Intrinsification 
will occur in a separate PR.

This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the beneficial 
performance impact of the `PDEP` instruction, and by extension the `expand` 
method, when applied to the implementation of a bit-vector select operation in 
succinct data structures (for example `select(r)` returns the position of the 
`r`th 1).

Testing-wise the approach take is three fold:
1. Tests compared against simple implementations that are easy to read and 
verify against the JDK implementations (which later will also be made 
intrinsic). To compensate all tests are also run flipping the test methods and 
the methods under test.
2. Tests composed of compress and expand and vice versa.
3. Tests with known mask patterns, whose expected values are easily derived 
from the inputs.

-

Commit messages:
 - Format.
 - License update.
 - Update test.
 - Update tests.
 - JavaDoc.
 - Remove byte/short impls. Update tests.
 - Compositional test.
 - Fix array size.
 - Basic test.
 - Initial commit

Changes: https://git.openjdk.java.net/jdk/pull/8115/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=8115=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8283892
  Stats: 726 lines in 5 files changed: 719 ins; 5 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8115.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8115/head:pull/8115

PR: https://git.openjdk.java.net/jdk/pull/8115


Re: RFR: 8282191: Implementation of Foreign Function & Memory API (Preview) [v16]

2022-03-30 Thread Paul Sandoz
On Wed, 30 Mar 2022 21:51:16 GMT, Maurizio Cimadamore  
wrote:

>> This PR contains the API and implementation changes for JEP-424 [1]. A more 
>> detailed description of such changes, to avoid repetitions during the review 
>> process, is included as a separate comment.
>> 
>> [1] - https://openjdk.java.net/jeps/424
>
> Maurizio Cimadamore has updated the pull request with a new target base due 
> to a merge or a rebase. The pull request now contains 31 commits:
> 
>  - Merge branch 'master' into foreign-preview
>  - Tweak FunctionDescriptor::argumentLayouts to return an immutable list
>  - Fix bad usage of `@link` with primitive array types
>  - Switch to daemon threads for async upcalls
>  - Use thread local storage to optimize attach of async threads
>  - Drop support for Constable from MemoryLayout/FunctionDescriptor
>  - Merge branch 'master' into foreign-preview
>  - Revert changes to RunTests.gmk
>  - Add --enable-preview to micro benchmark java options
>  - Address more review comments
>  - ... and 21 more: 
> https://git.openjdk.java.net/jdk/compare/ce27d9dd...247e5eb5

Java code looks good (i did not go through the tests). As is common no 
comments, since code was reviewed in smaller steps in the panama-foreign 
respository.

-

Marked as reviewed by psandoz (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7888


Re: RFR: 8282162: [vector] Optimize integral vector negation API [v3]

2022-03-29 Thread Paul Sandoz
On Mon, 28 Mar 2022 09:56:22 GMT, Xiaohong Gong  wrote:

>> The current vector `"NEG"` is implemented with substraction a vector by zero 
>> in case the architecture does not support the negation instruction. And to 
>> fit the predicate feature for architectures that support it, the masked 
>> vector `"NEG" ` is implemented with pattern `"v.not(m).add(1, m)"`. They 
>> both can be optimized to a single negation instruction for ARM SVE.
>> And so does the non-masked "NEG" for NEON. Besides, implementing the masked 
>> "NEG" with substraction for architectures that support neither negation 
>> instruction nor predicate feature can also save several instructions than 
>> the current pattern.
>> 
>> To optimize the VectorAPI negation, this patch moves the implementation from 
>> Java side to hotspot. The compiler will generate different nodes according 
>> to the architecture:
>>   - Generate the (predicated) negation node if architecture supports it, 
>> otherwise, generate "`zero.sub(v)`" pattern for non-masked operation.
>>   - Generate `"zero.sub(v, m)"` for masked operation if the architecture 
>> does not have predicate feature, otherwise generate the original pattern 
>> `"v.xor(-1, m).add(1, m)"`.
>> 
>> So with this patch, the following transformations are applied:
>> 
>> For non-masked negation with NEON:
>> 
>>   moviv16.4s, #0x0
>>   sub v17.4s, v16.4s, v17.4s   ==> neg v17.4s, v17.4s
>> 
>> and with SVE:
>> 
>>   mov z16.s, #0
>>   sub z18.s, z16.s, z17.s  ==> neg z16.s, p7/m, z16.s
>> 
>> For masked negation with NEON:
>> 
>>   moviv17.4s, #0x1
>>   mvn v19.16b, v18.16b
>>   mov v20.16b, v16.16b ==>  neg v18.4s, v17.4s
>>   bsl v20.16b, v19.16b, v18.16b bsl v19.16b, v18.16b, v17.16b
>>   add v19.4s, v20.4s, v17.4s
>>   mov v18.16b, v16.16b
>>   bsl v18.16b, v19.16b, v20.16b
>> 
>> and with SVE:
>> 
>>   mov z16.s, #-1
>>   mov z17.s, #1==> neg z16.s, p0/m, z16.s
>>   eor z18.s, p0/m, z18.s, z16.s
>>   add z18.s, p0/m, z18.s, z17.s
>> 
>> Here are the performance gains for benchmarks (see [1][2]) on ARM and x86 
>> machines(note that the non-masked negation benchmarks do not have any 
>> improvement on X86 since no instructions are changed):
>> 
>> NEON:
>> BenchmarkGain
>> Byte128Vector.NEG1.029
>> Byte128Vector.NEGMasked  1.757
>> Short128Vector.NEG   1.041
>> Short128Vector.NEGMasked 1.659
>> Int128Vector.NEG 1.005
>> Int128Vector.NEGMasked   1.513
>> Long128Vector.NEG1.003
>> Long128Vector.NEGMasked  1.878
>> 
>> SVE with 512-bits:
>> BenchmarkGain
>> ByteMaxVector.NEG1.10
>> ByteMaxVector.NEGMasked  1.165
>> ShortMaxVector.NEG   1.056
>> ShortMaxVector.NEGMasked 1.195
>> IntMaxVector.NEG 1.002
>> IntMaxVector.NEGMasked   1.239
>> LongMaxVector.NEG1.031
>> LongMaxVector.NEGMasked  1.191
>> 
>> X86 (non AVX-512):
>> BenchmarkGain
>> ByteMaxVector.NEGMasked  1.254
>> ShortMaxVector.NEGMasked 1.359
>> IntMaxVector.NEGMasked   1.431
>> LongMaxVector.NEGMasked  1.989
>> 
>> [1] 
>> https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Byte128Vector.java#L1881
>> [2] 
>> https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Byte128Vector.java#L1896
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Make "degenerate_vector_integral_negate" to be "NegVI" private

Java changes are good.

-

Marked as reviewed by psandoz (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7782


Re: RFR: JDK-8282776: Bad NullPointerException message when invoking an interface MethodHandle on a null receiver [v4]

2022-03-29 Thread Paul Sandoz
On Wed, 23 Mar 2022 23:22:31 GMT, Mandy Chung  wrote:

>> A simple patch to call `Objects.requireNonNull(recv)` for an explicit null 
>> receiver check rather than NPE thrown by `Object::getClass`.  The message of 
>> NPE generated by JEP 358 (Helpful NullPointerExceptions) is supposed to be 
>> helpful but not in this case.
>
> Mandy Chung has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   per feedback

Moving the check under the `!isInstance(recv)` check is better.

-

Marked as reviewed by psandoz (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7766


Re: RFR: 8283470: Update java.lang.invoke.VarHandle to use sealed classes

2022-03-29 Thread Paul Sandoz
On Wed, 23 Mar 2022 16:27:29 GMT, Mandy Chung  wrote:

> This patch changes VarHandle and its implementation hierarchy to use sealed 
> classes.  All VarHandle permitted classes are package-private and either 
> final or sealed abstract classes.
> 
> Please also review the CSR: https://bugs.openjdk.java.net/browse/JDK-8283540

Marked as reviewed by psandoz (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/7926


Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2]

2022-03-08 Thread Paul Sandoz
On Tue, 8 Mar 2022 15:13:46 GMT, Claes Redestad  wrote:

>> Ludovic Henry has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Add UTF-16 benchmarks
>
> An awkward effect of this implementation is that it perturbs results on small 
> Strings a bit. Adding a branch in the trivial case, but also regressing on 
> certain lengths (try size=7). The added complexity seem to be no issue for 
> JITs in these microbenchmarks, but I worry that the increased code size might 
> play tricks with inlining heuristics in real applications.
> 
> After chatting a bit with @richardstartin regarding the observation that 
> preventing a strength reduction on the constant 31 value being part of the 
> improvement I devised an experiment which simply makes the 31 non-constant as 
> to disable the strength reduction:
> 
> private static int base = 31;
> @Benchmark
> public int scalarLatin1_NoStrengthReduction() {
> int h = 0;
> int i = 0, len = latin1.length;
> for (; i < len; i++) {
> h = base * h + (latin1[i] & 0xff);
> }
> return h;
> }
> 
> Interestingly results of that get planted in the middle of the baseline on 
> large inputs, while avoiding most of the irregularities on small inputs 
> compared to manually unrolled versions:
> 
> Benchmark  (size)  Mode  Cnt  
>  Score   Error  Units
> StringHashCode.Algorithm.scalarLatin1   1  avgt3  
>  2.910 ? 0.068  ns/op
> StringHashCode.Algorithm.scalarLatin1   7  avgt3  
>  6.530 ? 0.065  ns/op
> StringHashCode.Algorithm.scalarLatin1   8  avgt3  
>  7.472 ? 0.034  ns/op
> StringHashCode.Algorithm.scalarLatin1  15  avgt3  
> 12.750 ? 0.028  ns/op
> StringHashCode.Algorithm.scalarLatin1 100  avgt3  
> 99.190 ? 0.618  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled16 1  avgt3  
>  3.119 ? 0.015  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled16 7  avgt3  
> 11.556 ? 4.690  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled16 8  avgt3  
>  7.740 ? 0.005  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled1615  avgt3  
> 13.030 ? 0.124  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled16   100  avgt3  
> 46.470 ? 0.496  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled8  1  avgt3  
>  3.123 ? 0.057  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled8  7  avgt3  
> 11.380 ? 0.085  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled8  8  avgt3  
>  5.849 ? 0.583  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled8 15  avgt3  
> 12.312 ? 0.025  ns/op
> StringHashCode.Algorithm.scalarLatin1Unrolled8100  avgt3  
> 45.751 ? 0.146  ns/op
> StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction   1  avgt3  
>  3.173 ? 0.015  ns/op
> StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction   7  avgt3  
>  5.229 ? 0.455  ns/op
> StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction   8  avgt3  
>  5.679 ? 0.012  ns/op
> StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction  15  avgt3  
>  8.731 ? 0.103  ns/op
> StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction 100  avgt3  
> 70.954 ? 3.386  ns/op
> 
> I wonder if this might be a safer play while we investigate intrinsification 
> and other possible enhancements?

@cl4es Yes, we would need to carefully measure the impact for small array sizes 
(similar to what we had to do when the array mismatch intrinsic was implemented 
and applied to array equals). My sense is to focus on the intrinsic and also 
look for potential opportunities like @merykitty points out, as that is where 
the larger impact is, although it is more work!

-

PR: https://git.openjdk.java.net/jdk/pull/7700


Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2]

2022-03-07 Thread Paul Sandoz
On Fri, 4 Mar 2022 17:44:44 GMT, Ludovic Henry  wrote:

>> Despite the hash value being cached for Strings, computing the hash still 
>> represents a significant CPU usage for applications handling lots of text.
>> 
>> Even though it would be generally better to do it through an enhancement to 
>> the autovectorizer, the complexity of doing it by hand is trivial and the 
>> gain is sizable (2x speedup) even without the Vector API. The algorithm has 
>> been proposed by Richard Startin and Paul Sandoz [1].
>> 
>> Speedup are as follows on a `Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz`
>> 
>> 
>> Benchmark(size)  Mode  Cnt Score 
>>Error  Units
>> StringHashCode.Algorithm.scalarLatin1 0  avgt   25 2.111 
>> ±  0.210  ns/op
>> StringHashCode.Algorithm.scalarLatin1 1  avgt   25 3.500 
>> ±  0.127  ns/op
>> StringHashCode.Algorithm.scalarLatin110  avgt   25 7.001 
>> ±  0.099  ns/op
>> StringHashCode.Algorithm.scalarLatin1   100  avgt   2561.285 
>> ±  0.444  ns/op
>> StringHashCode.Algorithm.scalarLatin1  1000  avgt   25   628.995 
>> ±  0.846  ns/op
>> StringHashCode.Algorithm.scalarLatin1 1  avgt   25  6307.990 
>> ±  4.071  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16   0  avgt   25 2.358 
>> ±  0.092  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16   1  avgt   25 3.631 
>> ±  0.159  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16  10  avgt   25 7.049 
>> ±  0.019  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16 100  avgt   2533.626 
>> ±  1.218  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled161000  avgt   25   317.811 
>> ±  1.225  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled16   1  avgt   25  3212.333 
>> ± 14.621  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled80  avgt   25 2.356 
>> ±  0.097  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled81  avgt   25 3.630 
>> ±  0.158  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8   10  avgt   25 8.724 
>> ±  0.065  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8  100  avgt   2532.402 
>> ±  0.019  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled8 1000  avgt   25   321.949 
>> ±  0.251  ns/op
>> StringHashCode.Algorithm.scalarLatin1Unrolled81  avgt   25  3202.083 
>> ±  1.667  ns/op
>> StringHashCode.Algorithm.scalarUTF16  0  avgt   25 2.135 
>> ±  0.191  ns/op
>> StringHashCode.Algorithm.scalarUTF16  1  avgt   25 5.202 
>> ±  0.362  ns/op
>> StringHashCode.Algorithm.scalarUTF16 10  avgt   2511.105 
>> ±  0.112  ns/op
>> StringHashCode.Algorithm.scalarUTF16100  avgt   2575.974 
>> ±  0.702  ns/op
>> StringHashCode.Algorithm.scalarUTF16   1000  avgt   25   716.429 
>> ±  3.290  ns/op
>> StringHashCode.Algorithm.scalarUTF16  1  avgt   25  7095.459 
>> ± 43.847  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled160  avgt   25 2.381 
>> ±  0.038  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled161  avgt   25 5.268 
>> ±  0.422  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16   10  avgt   2511.248 
>> ±  0.178  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16  100  avgt   2552.966 
>> ±  0.089  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled16 1000  avgt   25   450.912 
>> ±  1.834  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled161  avgt   25  4403.988 
>> ±  2.927  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8 0  avgt   25 2.401 
>> ±  0.032  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8 1  avgt   25 5.091 
>> ±  0.396  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled810  avgt   2512.801 
>> ±  0.189  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8   100  avgt   2552.068 
>> ±  0.032  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8  1000  avgt   25   453.270 
>> ±  0.340  ns/op
>> StringHashCode.Algorithm.scalarUTF16Unrolled8 1  avgt   25  4433.112 
>> ±  2.699  ns/op
>> 
>> 
>> At Datadog, we handle a great amount of text (through logs management for 
>> example), and hashing String represents a large part of our CPU usage. It's 
&g

Re: RFR: 8280915: Better parallelization for AbstractSpliterator and IteratorSpliterator when size is unknown [v5]

2022-03-03 Thread Paul Sandoz
On Thu, 10 Feb 2022 04:30:34 GMT, Tagir F. Valeev  wrote:

>> See the bug description for details.
>> 
>> I propose a simple solution. Let's allow ArraySpliterator to be non-SIZED 
>> and report artificial estimatedSize(), much bigger than the real one. This 
>> will allow AbstractSpliterator and IteratorSpliterator to produce prefix 
>> whose size is comparable to Long.MAX_VALUE (say, starting with 
>> Long.MAX_VALUE/2), and this will enable further splitting of the prefix. 
>> This change will drastically improve parallel streaming for affected streams 
>> of size <= 1024 and significantly improve for streams of size 1025..2. 
>> The cost is higher-grained splitting for huge streams of unknown size. This 
>> might add a minor overhead for such scenarios which, I believe, is 
>> completely tolerable.
>> 
>> No public API changes are necessary, sequential processing should not be 
>> affected, except an extra field in ArraySpliterator which increases a 
>> footprint by 8 bytes.
>> 
>> I added a simple test using an artificial collector to ensure that at least 
>> two non-empty parts are created when parallelizing Stream.iterate source. 
>> More testing ideas are welcome.
>
> Tagir F. Valeev has updated the pull request incrementally with two 
> additional commits since the last revision:
> 
>  - Update copyright year
>  - Cosmetic fixes

For unknown-sized iterators we have to bias towards some strategy (recognizing 
that the source is a poor choice for parallelism).

Currently the "peeled" split containing an array of elements (copied from the 
iterator) is never split, since the array length will never in practice be 
greater than the size threshold. The parallelism is derived from the splits of 
the iterator, and this is biased towards a large number of elements.

In your approach each  "peeled" split effectively gets to use half of the total 
parallelism. Over subsequent splits of the source that's gonna over provision 
tasks, specifically after two splits (or 2 * 2^10 + 2^10 elements), and this is 
biased towards a smaller number of elements.

In the issue you point out a number of relevant stream sources. In practice 
they are unlikely to have millions of elements, and probably more likely 
fitting into the first few steps of the arithmetic splitting sequence. If the 
cost-per-element is reasonably high that would overcome any over provisioning.

Overall it looks reasonable. I ran it through tier 1/2 testing and there were 
no failures. If you don't mind I would like to let this percolate a little in 
my mind (i.e. sleep on it for a day or two).

-

PR: https://git.openjdk.java.net/jdk/pull/7279


Re: RFR: 8280915: Better parallelization for AbstractSpliterator and IteratorSpliterator when size is unknown [v4]

2022-03-03 Thread Paul Sandoz
On Thu, 10 Feb 2022 04:22:34 GMT, Tagir F. Valeev  wrote:

>> Tagir F. Valeev has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Benchmark to demonstrate the patch usefulness
>
> I added a microbenchmark to demonstrate the speed improvement achievable by 
> this patch. For demonstration, I used `Pattern.splitAsStream` source (though 
> as listed in JDK-8280915, many other standard sources are also affected). For 
> downstream operation I selected `BigInteger.pow(1000)` which is moderately 
> CPU-intensive (takes ~10us on my machine for input numbers within 
> 1000..2000). The benchmarking results are as follows (my machine has 8 
> cores). Here's non-patched version:
> 
> 
> Benchmark  (size)  Mode  Cnt  Score  Error  Units
> sumOf1000thPowers  10  avgt   30100.367 ±0.793  us/op
> sumOf1000thPowers 100  avgt   30963.857 ±6.252  us/op
> sumOf1000thPowers1000  avgt   30  10012.923 ±   81.091  us/op
> sumOf1000thPowers   1  avgt   30  99546.370 ±  625.105  us/op
> sumOf1000thPowersParallel  10  avgt   30104.561 ±1.118  us/op
> sumOf1000thPowersParallel 100  avgt   30995.400 ±   26.116  us/op
> sumOf1000thPowersParallel1000  avgt   30   9969.296 ±   85.166  us/op
> sumOf1000thPowersParallel   1  avgt   30  55856.516 ± 2315.530  us/op
> 
> 
> We observe that the results depend on the input size linearly for sequential 
> run, which is quite expected. Parallel doesn't help at all for size = 10, 
> 100, and 1000, which validates my claim that these spliterators cannot split 
> at all for sizes <= 1024. For size = 1, parallel version starts 
> performing somewhat better (1.78x), though it's not nearly close to the 
> available machine parallelism.
> 
> Here's patched version:
> 
> 
> Benchmark  (size)  Mode  Cnt  Score  Error  Units
> sumOf1000thPowers  10  avgt   30101.380 ±0.961  us/op
> sumOf1000thPowers 100  avgt   30970.843 ±8.360  us/op
> sumOf1000thPowers1000  avgt   30   9837.397 ±   93.656  us/op
> sumOf1000thPowers   1  avgt   30  97741.823 ± 1276.116  us/op
> sumOf1000thPowersParallel  10  avgt   30 41.597 ±0.910  us/op
> sumOf1000thPowersParallel 100  avgt   30189.735 ±1.911  us/op
> sumOf1000thPowersParallel1000  avgt   30   1776.337 ±   31.034  us/op
> sumOf1000thPowersParallel   1  avgt   30  17685.723 ±  127.846  us/op
> 
> 
> We observe no regression for sequential run and drastic improvement for 
> parallel. Even for size = 10, we see 2.46x speedup and 41 us total time. For 
> bigger sizes, we see 5.11x..5.54x speedup.
> 
> Please review my patch. I'll be happy to discuss any concerns about this 
> optimization you may have. Thanks in advance!

> @amaembo It's on my queue to review, i may get to it tomorrow, otherwise next 
> week.

I took last week off at short notice, since it was the school holidays. Looking 
today.

-

PR: https://git.openjdk.java.net/jdk/pull/7279


Re: RFR: 8282432: Optimize masked "test" Vector API with predicate feature [v2]

2022-03-03 Thread Paul Sandoz
On Thu, 3 Mar 2022 08:12:46 GMT, Xiaohong Gong  wrote:

>> The vector `"test"` api is implemented with vector `"compare"`. And the 
>> masked `"test" `is implemented with `"test(op).and(m)"` which means 
>> `"compare().and(m)"` finally. Since the masked vector `"compare"` has been 
>> optimized with predicated instruction for archituctures that support the 
>> feature, the masked `"test"` can also be optimized by implementing it with 
>> the predicated compare. This could save the additional ` "and"` for 
>> architectures that support the predicate feature.
>> 
>> This patch optimized the masked `"test"` by implementing it with masked 
>> `"compare"`. With this patch, the following codes for the` "IS_NEGATIVE"` op 
>> for a DoubleVector with SVE:
>> 
>>   mov z19.d, #0
>>   cmpgt   p1.d, p7/z, z19.d, z17.d
>>   and p0.b, p7/z, p1.b, p0.b
>> 
>> are optimized to:
>> 
>>   mov z19.d, #0
>>   cmpgt   p0.d, p0/z, z19.d, z17.d
>> 
>> Also update the jtreg tests for masked` "test" ` to make sure they are hot 
>> enough to be compiled by c2.
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Simply the testTemplate

I guess the following:
`mask.cast(IntVector.species(shape())`
is more efficient than:
`m.cast(vspecies().asIntegral()))`
?

-

Marked as reviewed by psandoz (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7654


Re: RFR: 8282432: Optimize masked "test" Vector API with predicate feature

2022-03-02 Thread Paul Sandoz
On Wed, 2 Mar 2022 02:50:27 GMT, Xiaohong Gong  wrote:

> The vector `"test"` api is implemented with vector `"compare"`. And the 
> masked `"test" `is implemented with `"test(op).and(m)"` which means 
> `"compare().and(m)"` finally. Since the masked vector `"compare"` has been 
> optimized with predicated instruction for archituctures that support the 
> feature, the masked `"test"` can also be optimized by implementing it with 
> the predicated compare. This could save the additional ` "and"` for 
> architectures that support the predicate feature.
> 
> This patch optimized the masked `"test"` by implementing it with masked 
> `"compare"`. With this patch, the following codes for the` "IS_NEGATIVE"` op 
> for a DoubleVector with SVE:
> 
>   mov z19.d, #0
>   cmpgt   p1.d, p7/z, z19.d, z17.d
>   and p0.b, p7/z, p1.b, p0.b
> 
> are optimized to:
> 
>   mov z19.d, #0
>   cmpgt   p0.d, p0/z, z19.d, z17.d
> 
> Also update the jtreg tests for masked` "test" ` to make sure they are hot 
> enough to be compiled by c2.

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/DoubleVector.java 
line 1737:

> 1735: ,
> 1736:  M2 extends VectorMask>
> 1737: M1 testTemplate(Class maskType, Test op, M2 mask) {

Can we simplify by making some code FP specific?
- the mask `cast` can be applied in this method rather than in the caller, 
simplifying the signature
- for clarify, vector `viewAsIntegralLanes` is only needed for FP (update would 
be required for the non-mask template).

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/DoubleVector.java 
line 1766:

> 1764: throw new AssertionError(op);
> 1765: }
> 1766: return maskType.cast(m.cast(this.vspecies()));

Suggestion:

return maskType.cast(m.cast(vsp));

Same for non-mask template too.

test/jdk/jdk/incubator/vector/templates/Unit-Test.template line 29:

> 27: VectorMask<$Wideboxtype$> vmask = VectorMask.fromArray(SPECIES, 
> mask, 0);
> 28: 
> 29: for (int ic = 0; ic < INVOC_COUNT; ic++) {

Can you remove `SmokeTest` from the method name.

-

PR: https://git.openjdk.java.net/jdk/pull/7654


Re: RFR: 8282143: Objects.requireNonNull should be ForceInline [v2]

2022-03-01 Thread Paul Sandoz
On Wed, 2 Mar 2022 00:32:46 GMT, Quan Anh Mai  wrote:

>> Quan Anh Mai has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   the other
>
> Thanks a lot for your reviews, do I need a second review for this?
> @ExE-Boss IMO generally we only decorate `ForceInline` on functions of which 
> inlining is crucial since inlining decision is a hard problem and often not 
> solvable from the local point of view, especially from the callee's one.

@merykitty it's good.

-

PR: https://git.openjdk.java.net/jdk/pull/7543


Re: RFR: 8282143: Objects.requireNonNull should be ForceInline [v2]

2022-03-01 Thread Paul Sandoz
On Tue, 1 Mar 2022 02:22:49 GMT, Quan Anh Mai  wrote:

>> Hi,
>> 
>> `Objects.requireNonNull` may fail to be inlined. The call is expensive and 
>> may lead to objects escaping to the heap while the null check is cheap and 
>> is often elided. I have observed this when using the vector API when a call 
>> to `Objects.requireNonNull` leads to vectors being materialised in a hot 
>> loop.
>> 
>> Should the other `requireNonNull` be `ForceInline` as well?
>> 
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   the other

Marked as reviewed by psandoz (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/7543


Re: RFR: 8282143: Objects.requireNonNull should be ForceInline

2022-02-28 Thread Paul Sandoz
On Sat, 19 Feb 2022 05:51:52 GMT, Quan Anh Mai  wrote:

> Hi,
> 
> `Objects.requireNonNull` may fail to be inlined. The call is expensive and 
> may lead to objects escaping to the heap while the null check is cheap and is 
> often elided. I have observed this when using the vector API when a call to 
> `Objects.requireNonNull` leads to vectors being materialised in a hot loop.
> 
> Should the other `requireNonNull` be `ForceInline` as well?
> 
> Thank you very much.

`Objects.requireNonNull` is also used on the critical path of VarHandles, and 
other places in `j.l.invoke` so I think this is generally beneficial beyond use 
by the Vector API.

-

PR: https://git.openjdk.java.net/jdk/pull/7543


Re: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v2]

2022-02-14 Thread Paul Sandoz
On Sun, 13 Feb 2022 05:14:47 GMT, Quan Anh Mai  wrote:

>> Observing the following failures on CPUs with 
>> "Intel_R__Xeon_R__Gold_6354_CPU___3.00GHz" with HotSpot flags:
>> 
>> -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 
>> -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation
>> 
>> 
>> TestVectorCastAVX512.java:
>> 
>> Failed IR Rules (1)
>> --
>> - Method "public static void 
>> compiler.vectorapi.reshape.tests.TestVectorCast.testUI256toL512(int[],long[])":
>>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, 
>> applyIfAnd={}, applyIfOr={}, 
>> counts={"(d+(s){2}(VectorUCastI2X.*)+(s){2}===.*)", "1"}, 
>> applyIfNot={})"
>> - counts: Graph contains wrong number of nodes:
>> Regex 1: (\\d+(\\s){2}(VectorUCastI2X.*)+(\\s){2}===.*)
>> Expected 1 but found 0 nodes.
>> 
>> 
>> TestVectorCastAVX1.java:
>> 
>> - Method "public static void 
>> compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toS64(byte[],short[])":
>>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, 
>> applyIfAnd={}, applyIfOr={}, 
>> counts={"(d+(s){2}(VectorUCastB2X.*)+(s){2}===.*)", "1"}, 
>> applyIfNot={})"
>> - counts: Graph contains wrong number of nodes:
>> Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*)
>> Expected 1 but found 0 nodes.
>> 
>> - Method "public static void 
>> compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toI128(byte[],int[])":
>>   * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, 
>> applyIfAnd={}, applyIfOr={}, 
>> counts={"(d+(s){2}(VectorUCastB2X.*)+(s){2}===.*)", "1"}, 
>> applyIfNot={})"
>> - counts: Graph contains wrong number of nodes:
>> Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*)
>> Expected 1 but found 0 nodes.
>
> @PaulSandoz Thanks a lot for your testing, the reason seems to be due to 
> `LaneType::asIntegral` missing `ForceInline` annotation. I have run the 
> reshape test 10 times without getting any failure while with previous patch 
> there is often 1 or 2.
> Thanks.

@merykitty testing now passes. Java bits look good. Needs HotSpot reviewer.

-

PR: https://git.openjdk.java.net/jdk/pull/7358


Re: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v3]

2022-02-14 Thread Paul Sandoz
On Sun, 13 Feb 2022 05:18:34 GMT, Quan Anh Mai  wrote:

>> Hi,
>> 
>> This patch implements the unsigned upcast intrinsics in x86, which are used 
>> in vector lane-wise reinterpreting operations.
>> 
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   missing ForceInline

Marked as reviewed by psandoz (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/7358


Re: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v2]

2022-02-10 Thread Paul Sandoz
On Thu, 10 Feb 2022 15:14:44 GMT, Quan Anh Mai  wrote:

>> Hi,
>> 
>> This patch implements the unsigned upcast intrinsics in x86, which are used 
>> in vector lane-wise reinterpreting operations.
>> 
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - minor rename
>  - address reviews

Observing the following failures on CPUs with 
"Intel_R__Xeon_R__Gold_6354_CPU___3.00GHz" with HotSpot flags:

-XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 
-XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation


TestVectorCastAVX512.java:

Failed IR Rules (1)
--
- Method "public static void 
compiler.vectorapi.reshape.tests.TestVectorCast.testUI256toL512(int[],long[])":
  * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, 
applyIfAnd={}, applyIfOr={}, 
counts={"(d+(s){2}(VectorUCastI2X.*)+(s){2}===.*)", "1"}, 
applyIfNot={})"
- counts: Graph contains wrong number of nodes:
Regex 1: (\\d+(\\s){2}(VectorUCastI2X.*)+(\\s){2}===.*)
Expected 1 but found 0 nodes.


TestVectorCastAVX1.java:

- Method "public static void 
compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toS64(byte[],short[])":
  * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, 
applyIfAnd={}, applyIfOr={}, 
counts={"(d+(s){2}(VectorUCastB2X.*)+(s){2}===.*)", "1"}, 
applyIfNot={})"
- counts: Graph contains wrong number of nodes:
Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*)
Expected 1 but found 0 nodes.

- Method "public static void 
compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toI128(byte[],int[])":
  * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, 
applyIfAnd={}, applyIfOr={}, 
counts={"(d+(s){2}(VectorUCastB2X.*)+(s){2}===.*)", "1"}, 
applyIfNot={})"
- counts: Graph contains wrong number of nodes:
Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*)
Expected 1 but found 0 nodes.

-

PR: https://git.openjdk.java.net/jdk/pull/7358


Integrated: 8281294: [vectorapi] FIRST_NONZERO reduction operation throws IllegalArgumentExcept on zero vectors

2022-02-10 Thread Paul Sandoz
On Wed, 9 Feb 2022 21:36:01 GMT, Paul Sandoz  wrote:

> …ArgumentExcept on zero vectors
> 
> This PR fixes issues for reduction using FIRST_NONZERO and adds tests. The 
> testing infrastructure is generalized to reduction functions and tests for 
> min/max reductions are updated to use that.

This pull request has now been integrated.

Changeset: 83b6e4bc
Author:Paul Sandoz 
URL:   
https://git.openjdk.java.net/jdk/commit/83b6e4bc04db89a846a1b6c2d0666efe139f8f61
Stats: 3725 lines in 54 files changed: 2742 ins; 407 del; 576 mod

8281294: [vectorapi] FIRST_NONZERO reduction operation throws 
IllegalArgumentExcept on zero vectors

Reviewed-by: jrose

-

PR: https://git.openjdk.java.net/jdk/pull/7410


  1   2   3   4   5   6   7   8   9   10   >