Re: RFR: 8287186: JDK modules participating in preview [v2]
On Wed, 8 Jun 2022 22:07:38 GMT, liach wrote: >> Paul Sandoz has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Let java.management participate in preview features. > > Just curious, is it still up to incubator modules' discretion to avoid > accidental user access to preview content via the modules without enabling > preview, like the `PreviewFeatures.ensureEnabled()` in `StructuredTaskScope`? @liach it depends on the API and its scope. A constructor of `StructuredTaskScope` specifies that it implicitly uses a virtual thread factory, so it performs a preview runtime check. The same check is also performed by `Thread.ofVirtual`, ensuring developers cannot reflectively work around `--enable-preview` when creating virtual threads. This approach is feasible since the API surface is so small. - PR: https://git.openjdk.org/jdk/pull/9087
RFR: 8288414: Long::compress/expand samples are not correct
Update the code examples in the api notes of Long::compress/expand. Some constants need to be explicitly long values. - Commit messages: - Correct Long.compress/expand api notes. Changes: https://git.openjdk.org/jdk19/pull/14/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk19=14=00 Issue: https://bugs.openjdk.org/browse/JDK-8288414 Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk19/pull/14.diff Fetch: git fetch https://git.openjdk.org/jdk19 pull/14/head:pull/14 PR: https://git.openjdk.org/jdk19/pull/14
Integrated: 8287186: JDK modules participating in preview
On Wed, 8 Jun 2022 15:46:24 GMT, Paul Sandoz wrote: > Allow JDK modules that use preview features (preview language features or > preview API features from dependent modules) to participate without the need > to compile with `--enable-preview`. > > It's difficult to enable participation using an annotation due to the nature > in which symbols are encountered when processing source as there is no > guaranteed order to the processing of certain symbols. > > Instead a JDK module participates if the `java.base` package > `jdk.internal.javac` is exported to that module (@lahodaj clever idea!). An > internal annotation `jdk.internal.javac.ParticipatesInPreview` can be > declared on the module. Such a declaration cannot be enforced but does by its > use require the `jdk.internal.javac`'s export list to be updated. > > The modules `jdk.incubator.vector` and `jdk.incubator.concurrent` have been > updated accordingly, both of which participate in preview APIs (APIs in > `java.lang.foreign` and `Thread.ofVirtual`, respectively). This pull request has now been integrated. Changeset: fb297705 Author:Paul Sandoz URL: https://git.openjdk.org/jdk/commit/fb297705f6dc668bea0257efb9c46b90b5eab2e9 Stats: 140 lines in 12 files changed: 65 ins; 57 del; 18 mod 8287186: JDK modules participating in preview Reviewed-by: alanb, jlahoda - PR: https://git.openjdk.org/jdk/pull/9087
Re: RFR: 8287186: JDK modules participating in preview [v3]
> Allow JDK modules that use preview features (preview language features or > preview API features from dependent modules) to participate without the need > to compile with `--enable-preview`. > > It's difficult to enable participation using an annotation due to the nature > in which symbols are encountered when processing source as there is no > guaranteed order to the processing of certain symbols. > > Instead a JDK module participates if the `java.base` package > `jdk.internal.javac` is exported to that module (@lahodaj clever idea!). An > internal annotation `jdk.internal.javac.ParticipatesInPreview` can be > declared on the module. Such a declaration cannot be enforced but does by its > use require the `jdk.internal.javac`'s export list to be updated. > > The modules `jdk.incubator.vector` and `jdk.incubator.concurrent` have been > updated accordingly, both of which participate in preview APIs (APIs in > `java.lang.foreign` and `Thread.ofVirtual`, respectively). Paul Sandoz has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into JDK-8287186-preview-participating - Let java.management participate in preview features. - Unused import. - Generalize the pariticipating in preview APIs. - Changes: - all: https://git.openjdk.org/jdk/pull/9087/files - new: https://git.openjdk.org/jdk/pull/9087/files/9defdf23..abd1fbf6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=9087=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk=9087=01-02 Stats: 17219 lines in 554 files changed: 13408 ins; 1651 del; 2160 mod Patch: https://git.openjdk.org/jdk/pull/9087.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9087/head:pull/9087 PR: https://git.openjdk.org/jdk/pull/9087
Re: RFR: 8287186: JDK modules participating in preview
On Wed, 8 Jun 2022 17:21:08 GMT, Alan Bateman wrote: > Can java.management participate too? It would allow > sun.management.Util.isVirtual(Thread) to go away (lots of methods in > sun.management.ThreadImpl need to test if a thread is virtual). Pushed update. - PR: https://git.openjdk.java.net/jdk/pull/9087
Re: RFR: 8287186: JDK modules participating in preview [v2]
> Allow JDK modules that use preview features (preview language features or > preview API features from dependent modules) to participate without the need > to compile with `--enable-preview`. > > It's difficult to enable participation using an annotation due to the nature > in which symbols are encountered when processing source as there is no > guaranteed order to the processing of certain symbols. > > Instead a JDK module participates if the `java.base` package > `jdk.internal.javac` is exported to that module (@lahodaj clever idea!). An > internal annotation `jdk.internal.javac.ParticipatesInPreview` can be > declared on the module. Such a declaration cannot be enforced but does by its > use require the `jdk.internal.javac`'s export list to be updated. > > The modules `jdk.incubator.vector` and `jdk.incubator.concurrent` have been > updated accordingly, both of which participate in preview APIs (APIs in > `java.lang.foreign` and `Thread.ofVirtual`, respectively). Paul Sandoz has updated the pull request incrementally with one additional commit since the last revision: Let java.management participate in preview features. - Changes: - all: https://git.openjdk.java.net/jdk/pull/9087/files - new: https://git.openjdk.java.net/jdk/pull/9087/files/5e7ca855..9defdf23 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=9087=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk=9087=00-01 Stats: 44 lines in 5 files changed: 4 ins; 29 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/9087.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/9087/head:pull/9087 PR: https://git.openjdk.java.net/jdk/pull/9087
RFR: 8287186: JDK modules participating in preview
Allow JDK modules that use preview features (preview language features or preview API features from dependent modules) to participate without the need to compile with `--enable-preview`. It's difficult to enable participation using an annotation due to the nature in which symbols are encountered when processing source as there is no guaranteed order to the processing of certain symbols. Instead a JDK module participates if the `java.base` package `jdk.internal.javac` is exported to that module (@lahodaj clever idea!). An internal annotation `jdk.internal.javac.ParticipatesInPreview` can be declared on the module. Such a declaration cannot be enforced but does by its use require the `jdk.internal.javac`'s export list to be updated. The modules `jdk.incubator.vector` and `jdk.incubator.concurrent` have been updated accordingly, both of which participate in preview APIs (APIs in `java.lang.foreign` and `Thread.ofVirtual`, respectively). - Commit messages: - Unused import. - Generalize the pariticipating in preview APIs. Changes: https://git.openjdk.java.net/jdk/pull/9087/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk=9087=00 Issue: https://bugs.openjdk.org/browse/JDK-8287186 Stats: 98 lines in 7 files changed: 62 ins; 28 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/9087.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/9087/head:pull/9087 PR: https://git.openjdk.java.net/jdk/pull/9087
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v3]
On Thu, 2 Jun 2022 01:29:54 GMT, Xiaohong Gong wrote: >> Checking whether the indexes of masked lanes are inside of the valid memory >> boundary is necessary for masked vector memory access. However, this could >> be saved if the given offset is inside of the vector range that could make >> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs >> have saved this kind of check for common cases. And this patch did the >> similar optimization for the masked vector store. >> >> The performance for the new added store masked benchmarks improves about >> `1.83x ~ 2.62x` on a x86 system: >> >> Benchmark BeforeAfter Gain Units >> StoreMaskedBenchmark.byteStoreArrayMask 12757.936 23291.118 1.826 ops/ms >> StoreMaskedBenchmark.doubleStoreArrayMask 1520.932 3921.616 2.578 ops/ms >> StoreMaskedBenchmark.floatStoreArrayMask 2713.031 7122.535 2.625 ops/ms >> StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms >> StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms >> StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms >> >> Similar performane gain can also be observed on ARM hardware. > > Xiaohong Gong has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains three commits: > > - Merge branch 'jdk:master' to JDK-8286279 > - Wrap the offset check into a static method > - 8286279: [vectorapi] Only check index of masked lanes if offset is out of > array boundary for masked store Marked as reviewed by psandoz (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v5]
On Thu, 2 Jun 2022 03:27:59 GMT, Xiaohong Gong wrote: >> Currently the vector load with mask when the given index happens out of the >> array boundary is implemented with pure java scalar code to avoid the IOOBE >> (IndexOutOfBoundaryException). This is necessary for architectures that do >> not support the predicate feature. Because the masked load is implemented >> with a full vector load and a vector blend applied on it. And a full vector >> load will definitely cause the IOOBE which is not valid. However, for >> architectures that support the predicate feature like SVE/AVX-512/RVV, it >> can be vectorized with the predicated load instruction as long as the >> indexes of the masked lanes are within the bounds of the array. For these >> architectures, loading with unmasked lanes does not raise exception. >> >> This patch adds the vectorization support for the masked load with IOOBE >> part. Please see the original java implementation (FIXME: optimize): >> >> >> @ForceInline >> public static >> ByteVector fromArray(VectorSpecies species, >>byte[] a, int offset, >>VectorMask m) { >> ByteSpecies vsp = (ByteSpecies) species; >> if (offset >= 0 && offset <= (a.length - species.length())) { >> return vsp.dummyVector().fromArray0(a, offset, m); >> } >> >> // FIXME: optimize >> checkMaskFromIndexSize(offset, vsp, m, 1, a.length); >> return vsp.vOp(m, i -> a[offset + i]); >> } >> >> Since it can only be vectorized with the predicate load, the hotspot must >> check whether the current backend supports it and falls back to the java >> scalar version if not. This is different from the normal masked vector load >> that the compiler will generate a full vector load and a vector blend if the >> predicate load is not supported. So to let the compiler make the expected >> action, an additional flag (i.e. `usePred`) is added to the existing >> "loadMasked" intrinsic, with the value "true" for the IOOBE part while >> "false" for the normal load. And the compiler will fail to intrinsify if the >> flag is "true" and the predicate load is not supported by the backend, which >> means that normal java path will be executed. >> >> Also adds the same vectorization support for masked: >> - fromByteArray/fromByteBuffer >> - fromBooleanArray >> - fromCharArray >> >> The performance for the new added benchmarks improve about `1.88x ~ 30.26x` >> on the x86 AVX-512 system: >> >> Benchmark before After Units >> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE 737.542 1387.069 ops/ms >> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE 118.366 330.776 ops/ms >> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE 233.832 6125.026 ops/ms >> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE233.816 7075.923 ops/ms >> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE 119.771 330.587 ops/ms >> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE 431.961 939.301 ops/ms >> >> Similar performance gain can also be observed on 512-bit SVE system. > > Xiaohong Gong has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains five commits: > > - Merge branch 'jdk:master' into JDK-8283667 > - Use integer constant for offsetInRange all the way through > - Rename "use_predicate" to "needs_predicate" > - Rename the "usePred" to "offsetInRange" > - 8283667: [vectorapi] Vectorization for masked load with IOOBE with > predicate feature Looks good. As a follow on PR I think it would be useful to add constants `OFFSET_IN_RANGE` and `OFFSET_OUT_OF_RANGE`, then it becomes much clearer in source and you can drop the `/* offsetInRange */` comment on the argument. - Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/8035
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
On Mon, 30 May 2022 01:17:00 GMT, Xiaohong Gong wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Wrap the offset check into a static method > > @PaulSandoz, could you please help to check whether the current version is ok > for you? Thanks so much! @XiaohongGong looks good, now the Vector API JEP has been integrated you will get a merge conflict, but it should be easier to resolve. - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8287244: Add bound check in indexed memory access var handle [v3]
On Tue, 24 May 2022 18:15:45 GMT, Maurizio Cimadamore wrote: >> Constructing indexed var handles using the `MemoryLayout` API produces >> `VarHandle` which do not check the input indices for out-of-bounds >> conditions. >> While this can never result in a VM crash (after all the memory segment will >> protect against "true" OOB access), it is still possible for an access >> expression to refer to parts of a segment that are logically unrelated. >> >> This patch adds a "logical" bound check to all indexed var handles generated >> using the layout API. >> Benchmarks are not affected by the check. Users are still able to create >> custom "unchecked" var handles, using the combinator API in `MethodHandles`. > > Maurizio Cimadamore has updated the pull request incrementally with one > additional commit since the last revision: > > Address review comments Marked as reviewed by psandoz (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/8868
Re: RFR: 8287244: Add bound check in indexed memory access var handle [v2]
On Tue, 24 May 2022 17:55:13 GMT, Paul Sandoz wrote: >> Here's a concrete example: >> >> Consider a sequence layout with 6 elements. Then: >> >> element count = 6 >> valid indices 0, 1, 2, 3, 4, 5 >> >> Now consider a var handle that is obtained by calling the path element >> method, passing the following parameters >> >> start = 1 >> step = 2 >> >> This sets up the following mapping between logical an physical indices: >> >> 0 -> 1 >> 1 -> 3 >> 2 -> 5 >> >> Where on the LHS we have the logical index (the one passed to the VH) and on >> the RHS we have the actual index it is translated to. >> >> Note that the index map has three elements. So the upper bound (exclusive) >> of the index map is 3 - that is, we can pass indices 0, 1, 2. >> >> According to the formula shown in the javadoc: >> >> B = ceilDiv((elementCount - start) / step); >> >> so: >> >> B = ceilDiv((6 - 1) / 2) >> = ceilDiv(5 / 2) >> >> Note how, w/o ceilDiv we'd get 2 (the last valid index), and not 3 (the >> first invalid index). > > The terms `x_1, x_2, ... x__n` are defined, but `x_0` is not. > > I think you can refer to the first index out of bounds as the exclusive upper > bound of the range? Sorry i misread the text, we are talking about the same thing. I think it would be clearer to refer `x_i` being in the range of `0` (inclusive) and `b_i` (exclusive), otherwise an is thrown. That way in subsequent doc on other methods in matches with `B`, which is exclusive. - PR: https://git.openjdk.java.net/jdk/pull/8868
Re: RFR: 8287244: Add bound check in indexed memory access var handle [v2]
On Tue, 24 May 2022 17:53:38 GMT, Maurizio Cimadamore wrote: >> Indices start at zero. The ceilDiv operation is needed so that the operation >> returns the first index outisde the range (it's a bit subtle, sorry, but I >> don't know how else to express). > > Here's a concrete example: > > Consider a sequence layout with 6 elements. Then: > > element count = 6 > valid indices 0, 1, 2, 3, 4, 5 > > Now consider a var handle that is obtained by calling the path element > method, passing the following parameters > > start = 1 > step = 2 > > This sets up the following mapping between logical an physical indices: > > 0 -> 1 > 1 -> 3 > 2 -> 5 > > Where on the LHS we have the logical index (the one passed to the VH) and on > the RHS we have the actual index it is translated to. > > Note that the index map has three elements. So the upper bound (exclusive) of > the index map is 3 - that is, we can pass indices 0, 1, 2. > > According to the formula shown in the javadoc: > > B = ceilDiv((elementCount - start) / step); > > so: > > B = ceilDiv((6 - 1) / 2) > = ceilDiv(5 / 2) > > Note how, w/o ceilDiv we'd get 2 (the last valid index), and not 3 (the first > invalid index). The terms `x_1, x_2, ... x__n` are defined, but `x_0` is not. I think you can refer to the first index out of bounds as the exclusive upper bound of the range? - PR: https://git.openjdk.java.net/jdk/pull/8868
Re: RFR: 8287244: Add bound check in indexed memory access var handle [v2]
On Tue, 24 May 2022 15:28:27 GMT, Maurizio Cimadamore wrote: >> Constructing indexed var handles using the `MemoryLayout` API produces >> `VarHandle` which do not check the input indices for out-of-bounds >> conditions. >> While this can never result in a VM crash (after all the memory segment will >> protect against "true" OOB access), it is still possible for an access >> expression to refer to parts of a segment that are logically unrelated. >> >> This patch adds a "logical" bound check to all indexed var handles generated >> using the layout API. >> Benchmarks are not affected by the check. Users are still able to create >> custom "unchecked" var handles, using the combinator API in `MethodHandles`. > > Maurizio Cimadamore has updated the pull request incrementally with one > additional commit since the last revision: > > Tweak javadoc for ValueLayout::arrayElementVarHandle src/java.base/share/classes/jdk/internal/foreign/LayoutPath.java line 256: > 254: private long[] addBound(long maxIndex) { > 255: long[] newBounds = new long[bounds.length + 1]; > 256: System.arraycopy(bounds, 0, newBounds, 0, bounds.length); Suggestion: long[] newBounds = Arrays.copyOf(bounds, bounds.length + 1); - PR: https://git.openjdk.java.net/jdk/pull/8868
Re: RFR: 8287244: Add bound check in indexed memory access var handle [v2]
On Tue, 24 May 2022 15:28:27 GMT, Maurizio Cimadamore wrote: >> Constructing indexed var handles using the `MemoryLayout` API produces >> `VarHandle` which do not check the input indices for out-of-bounds >> conditions. >> While this can never result in a VM crash (after all the memory segment will >> protect against "true" OOB access), it is still possible for an access >> expression to refer to parts of a segment that are logically unrelated. >> >> This patch adds a "logical" bound check to all indexed var handles generated >> using the layout API. >> Benchmarks are not affected by the check. Users are still able to create >> custom "unchecked" var handles, using the combinator API in `MethodHandles`. > > Maurizio Cimadamore has updated the pull request incrementally with one > additional commit since the last revision: > > Tweak javadoc for ValueLayout::arrayElementVarHandle src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 374: > 372: * > 373: * Additionally, the provided dynamic values must conform to some > bound which is derived from the layout path, that is, > 374: * {@code 0 <= x_i <= b_i}, where {@code 0 <= i <= n}, or {@link > IndexOutOfBoundsException} is thrown. Suggestion: * {@code 0 <= x_i < b_i}, where {@code 1 <= i <= n}, or {@link IndexOutOfBoundsException} is thrown. We refer later to `B` being an exclusive upper bound (computed using `ceilDiv`). - PR: https://git.openjdk.java.net/jdk/pull/8868
Re: RFR: 8284199: Implementation of Structured Concurrency (Incubator)
On Sat, 21 May 2022 16:25:57 GMT, Alan Bateman wrote: >> src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/StructuredTaskScope.java >> line 1172: >> >>> 1170: } >>> 1171: }; >>> 1172: return AccessController.doPrivileged(pa); >> >> It might be better to use `MethodHandle`s obtained using >> [jdk.internal.access.SharedSecrets].getJavaLangInvokeAccess() >> and [JavaLangInvokeAccess].findStatic(…) and >> [JavaLangInvokeAccess].findVirtual(…) for this, which >> would avoid going through `AccessController.doPrivilaged(…)`. >> >> [jdk.internal.access.SharedSecrets]: >> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/access/SharedSecrets.java >> [JavaLangInvokeAccess]: >> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/access/JavaLangInvokeAccess.java > > I'd prefer not export jdk.internal.access to this module, if possible. In > general, it's a lot easier to reason about the security and integrity of the > core platform when java.base doesn't export any of its internal packages. > > In any case, I expect this issue will resolve itself once there is a way for > code in "core" modules to use preview APIs without needing to be compiled > with --enable-preview. The java.management and jdk.incubator.vector modules > have the same issue. Yes, we will add a more general internal for source that participate in preview APIs, and then the use of reflection can be removed. (The Vector API PR has a focused fix to the compiler, since it cannot use the reflection trick). - PR: https://git.openjdk.java.net/jdk/pull/8787
Re: RFR: 8284199: Implementation of Structured Concurrency (Incubator)
On Thu, 19 May 2022 13:05:54 GMT, Alan Bateman wrote: > This is the implementation of JEP 428: Structured Concurrency (Incubator). > > This is a non-final API that provides a gentle on-ramp to structure a task as > a family of concurrent subtasks, and to coordinate the subtasks as a unit. Previously reviewed while in the loom repo. src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/StructureViolationException.java line 40: > 38: > 39: /** > 40: * Constructs an {@code StructureViolationException} with no detail > message. Suggestion: * Constructs a {@code StructureViolationException} with no detail message. src/jdk.incubator.concurrent/share/classes/jdk/incubator/concurrent/StructureViolationException.java line 47: > 45: > 46: /** > 47: * Constructs an {@code StructureViolationException} with the > specified Suggestion: * Constructs a {@code StructureViolationException} with the specified - Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/8787
Re: RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator) [v7]
On Thu, 19 May 2022 21:11:41 GMT, Jatin Bhateja wrote: >> Hi All, >> >> Patch adds the planned support for new vector operations and APIs targeted >> for [JEP 426: Vector API (Fourth >> Incubator).](https://bugs.openjdk.java.net/browse/JDK-8280173) >> >> Following is the brief summary of changes:- >> >> 1) Extends the scope of existing lanewise API for following new vector >> operations. >>- VectorOperations.BIT_COUNT: counts the number of one-bits >>- VectorOperations.LEADING_ZEROS_COUNT: counts the number of leading zero >> bits >>- VectorOperations.TRAILING_ZEROS_COUNT: counts the number of trailing >> zero bits >>- VectorOperations.REVERSE: reversing the order of bits >>- VectorOperations.REVERSE_BYTES: reversing the order of bytes >>- compress and expand bits: Semantics are based on Hacker's Delight >> section 7-4 Compress, or Generalized Extract. >> >> 2) Adds following new APIs to perform cross lane vector compress and >> expansion operations under the influence of a mask. >>- Vector.compress >>- Vector.expand >>- VectorMask.compress >> >> 3) Adds predicated and non-predicated versions of following new APIs to load >> and store the contents of vector from foreign MemorySegments. >> - Vector.fromMemorySegment >> - Vector.intoMemorySegment >> >> 4) C2 Compiler IR enhancements and optimized X86 and AARCH64 backend support >> for each newly added operation. >> >> >> Patch has been regressed over AARCH64 and X86 targets different AVX levels. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains 16 commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960 > - 8284960: Changes to enable jdk.incubator.vector to be treated as preview > participant. Code re-organization related to Reverse/ReverseByte IR > transforms. > - 8284960: Adding --enable-preview in vectorAPI benchmarks. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960 > - 8284960: Review comments resolution. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960 > - 8284960: Correcting a typo. > - 8284960: Integrating changes from panama-vector (Add @since 19 tags). > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960 > - ... and 6 more: > https://git.openjdk.java.net/jdk/compare/9f562ef7...311f3233 src/jdk.compiler/share/classes/com/sun/tools/javac/code/Preview.java line 132: > 130: * @return true if {@code s} is participating in the preview of > {@code previewSymbol} > 131: */ > 132: public boolean isPreviewParticipating(Symbol s, Symbol > previewSymbol) { Some feedback from a colleague: Suggestion: /** * Returns true if {@code s} is deemed to participate in the preview of {@code previewSymbol}, and * therefore no warnings or errors will be produced. * * @param s the symbol depending on the preview symbol * @param previewSymbol the preview symbol marked with @Preview * @return true if {@code s} is participating in the preview of {@code previewSymbol} */ public boolean participatesInPreview(Symbol s, Symbol previewSymbol) { - PR: https://git.openjdk.java.net/jdk/pull/8425
Re: RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator) [v7]
On Thu, 19 May 2022 21:11:41 GMT, Jatin Bhateja wrote: >> Hi All, >> >> Patch adds the planned support for new vector operations and APIs targeted >> for [JEP 426: Vector API (Fourth >> Incubator).](https://bugs.openjdk.java.net/browse/JDK-8280173) >> >> Following is the brief summary of changes:- >> >> 1) Extends the scope of existing lanewise API for following new vector >> operations. >>- VectorOperations.BIT_COUNT: counts the number of one-bits >>- VectorOperations.LEADING_ZEROS_COUNT: counts the number of leading zero >> bits >>- VectorOperations.TRAILING_ZEROS_COUNT: counts the number of trailing >> zero bits >>- VectorOperations.REVERSE: reversing the order of bits >>- VectorOperations.REVERSE_BYTES: reversing the order of bytes >>- compress and expand bits: Semantics are based on Hacker's Delight >> section 7-4 Compress, or Generalized Extract. >> >> 2) Adds following new APIs to perform cross lane vector compress and >> expansion operations under the influence of a mask. >>- Vector.compress >>- Vector.expand >>- VectorMask.compress >> >> 3) Adds predicated and non-predicated versions of following new APIs to load >> and store the contents of vector from foreign MemorySegments. >> - Vector.fromMemorySegment >> - Vector.intoMemorySegment >> >> 4) C2 Compiler IR enhancements and optimized X86 and AARCH64 backend support >> for each newly added operation. >> >> >> Patch has been regressed over AARCH64 and X86 targets different AVX levels. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains 16 commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960 > - 8284960: Changes to enable jdk.incubator.vector to be treated as preview > participant. Code re-organization related to Reverse/ReverseByte IR > transforms. > - 8284960: Adding --enable-preview in vectorAPI benchmarks. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960 > - 8284960: Review comments resolution. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960 > - 8284960: Correcting a typo. > - 8284960: Integrating changes from panama-vector (Add @since 19 tags). > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960 > - ... and 6 more: > https://git.openjdk.java.net/jdk/compare/9f562ef7...311f3233 src/jdk.compiler/share/classes/com/sun/tools/javac/code/Preview.java line 50: > 48: import java.util.Set; > 49: > 50: import static com.sun.tools.javac.code.Flags.PREVIEW_API; Suggestion: Redundant import (sorry i should have checked before i sent you updates to this area) - PR: https://git.openjdk.java.net/jdk/pull/8425
Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]
On Thu, 5 May 2022 08:56:07 GMT, Xiaohong Gong wrote: >> Currently the vector load with mask when the given index happens out of the >> array boundary is implemented with pure java scalar code to avoid the IOOBE >> (IndexOutOfBoundaryException). This is necessary for architectures that do >> not support the predicate feature. Because the masked load is implemented >> with a full vector load and a vector blend applied on it. And a full vector >> load will definitely cause the IOOBE which is not valid. However, for >> architectures that support the predicate feature like SVE/AVX-512/RVV, it >> can be vectorized with the predicated load instruction as long as the >> indexes of the masked lanes are within the bounds of the array. For these >> architectures, loading with unmasked lanes does not raise exception. >> >> This patch adds the vectorization support for the masked load with IOOBE >> part. Please see the original java implementation (FIXME: optimize): >> >> >> @ForceInline >> public static >> ByteVector fromArray(VectorSpecies species, >>byte[] a, int offset, >>VectorMask m) { >> ByteSpecies vsp = (ByteSpecies) species; >> if (offset >= 0 && offset <= (a.length - species.length())) { >> return vsp.dummyVector().fromArray0(a, offset, m); >> } >> >> // FIXME: optimize >> checkMaskFromIndexSize(offset, vsp, m, 1, a.length); >> return vsp.vOp(m, i -> a[offset + i]); >> } >> >> Since it can only be vectorized with the predicate load, the hotspot must >> check whether the current backend supports it and falls back to the java >> scalar version if not. This is different from the normal masked vector load >> that the compiler will generate a full vector load and a vector blend if the >> predicate load is not supported. So to let the compiler make the expected >> action, an additional flag (i.e. `usePred`) is added to the existing >> "loadMasked" intrinsic, with the value "true" for the IOOBE part while >> "false" for the normal load. And the compiler will fail to intrinsify if the >> flag is "true" and the predicate load is not supported by the backend, which >> means that normal java path will be executed. >> >> Also adds the same vectorization support for masked: >> - fromByteArray/fromByteBuffer >> - fromBooleanArray >> - fromCharArray >> >> The performance for the new added benchmarks improve about `1.88x ~ 30.26x` >> on the x86 AVX-512 system: >> >> Benchmark before After Units >> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE 737.542 1387.069 ops/ms >> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE 118.366 330.776 ops/ms >> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE 233.832 6125.026 ops/ms >> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE233.816 7075.923 ops/ms >> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE 119.771 330.587 ops/ms >> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE 431.961 939.301 ops/ms >> >> Similar performance gain can also be observed on 512-bit SVE system. > > Xiaohong Gong has updated the pull request incrementally with one additional > commit since the last revision: > > Rename "use_predicate" to "needs_predicate" Yes, the tests were run in debug mode. The reporting of the missing constant occurs for the compiled method that is called from the method where the constants are declared e.g.: 719 240bjdk.incubator.vector.Int256Vector::fromArray0 (15 bytes) ** Rejected vector op (LoadVectorMasked,int,8) because architecture does not support it ** missing constant: offsetInRange=Parm @ 11 jdk.incubator.vector.IntVector::fromArray0Template (22 bytes) force inline by annotation So it appears to be working as expected. A similar pattern occurs at a lower-level for the passing of the mask class. `Int256Vector::fromArray0` passes a constant class to `IntVector::fromArray0Template` (the compilation of which bails out before checking that the `offsetInRange` is constant). - PR: https://git.openjdk.java.net/jdk/pull/8035
Re: RFR: 8282664: Unroll by hand StringUTF16, StringLatin1, and Arrays polynomial hash loops [v12]
On Thu, 12 May 2022 11:08:10 GMT, Ludovic Henry wrote: >> Looks like you are making great progress. >> >> Have you thought about ways the intrinsic implementation might be simplified >> if some code is retained in Java and passed as constant arguments? e.g. >> table of constants, scalar loop, bounds checks etc, such that the intrinsic >> primarily focuses on the vectorized code. To some extent that's related to >> John's point on generalization, and through simplification there may be some >> generalization. >> >> For example if there was a general intrinsic that returned a long value >> (e.g. first 32 bits are the offset in the array to continue processing, the >> second 32 bits are the current hashcode value) then we could call that from >> the Java implementations that then proceed with the scalar loop up to the >> array length. The Java implementation intrinsic would return `(0L | 1L << >> 32)`. >> >> Separately it would be nice to consider computing the hash code from the >> contents of a memory segment, similar to how we added `mismatch` support, >> but the trick of returning a value that merges the offset and hash code >> would not work, unless we return the remaining elements to process and that >> guaranteed to be less than a certain value (or alternatively a relative >> offset value given an upper bound argument, and performing the intrinsic >> call in a loop until no progress can be made, which works better for >> safepoints). >> >> The `long[]` hashcode is annoying given `(element ^ (element >>> 32))`, but >> if we simplify the intrinsic maybe we can add back that complexity? > > @PaulSandoz yes, keeping the "short" string part in pure Java and switching > to an intrinsified/vectorized version for "long" strings is on the next > avenue of exploration. I would also put the intrinsic as a runtime stub to > avoid unnecessarily increase the size of the calling method unnecessarily. > The overhead of the call would be amortised because it would only be called > for longer strings anyway. > > I haven't given much thoughts to how we could split up the different elements > of the algorithm to generalise the approach just yet. I'll give it a try, see > how far I can get with it, and keep you updated on my findings. @luhenry ok, we took a similar approach to the mismatch intrinsic, carefully analyzing the threshold by which the intrinsic would be called. My suggestion would be to follow that approach further and head towards an internal intrinsic perhaps with this signature: @IntrinsicCandidate static long hashCode(Class eType, Object base, long offset, int length /* in bytes * /) { return 0 | (1L << 32); // or perhaps just return 0 } ``` Then on a further iteration try and pass the polynomial constant and table of powers (stable array) as arguments. - PR: https://git.openjdk.java.net/jdk/pull/7700
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store
On Thu, 12 May 2022 01:56:25 GMT, Xiaohong Gong wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template >> line 4086: >> >>> 4084: } else { >>> 4085: $Type$Species vsp = vspecies(); >>> 4086: if (offset < 0 || offset > (a.length - vsp.length())) { >> >> Can we use `VectorIntrinsics.checkFromIndexSize`? e.g. >> >> if (!VectorIntrinsics.checkFromIndexSize(offset, vsp.length(), a.length)) { >> ... > > Thanks for the review @PaulSandoz ! For the > `VectorIntrinsics.checkFromIndexSize`, I'm afraid it's not suitable to be > used here because the `outOfBounds` exception will be thrown if the offset is > not inside of the valid array boundary. And for the masked operations, this > is not needed since we only need to check the masked lanes. Please correct me > if I didn't understand correctly. Thanks! Silly me! i commented too quickly, `checkFromIndexSize` cannot be used. My intent was that we could use a method rather than repeating the expression in numerous places (including masked loads IIRC). - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]
On Thu, 5 May 2022 08:56:07 GMT, Xiaohong Gong wrote: >> Currently the vector load with mask when the given index happens out of the >> array boundary is implemented with pure java scalar code to avoid the IOOBE >> (IndexOutOfBoundaryException). This is necessary for architectures that do >> not support the predicate feature. Because the masked load is implemented >> with a full vector load and a vector blend applied on it. And a full vector >> load will definitely cause the IOOBE which is not valid. However, for >> architectures that support the predicate feature like SVE/AVX-512/RVV, it >> can be vectorized with the predicated load instruction as long as the >> indexes of the masked lanes are within the bounds of the array. For these >> architectures, loading with unmasked lanes does not raise exception. >> >> This patch adds the vectorization support for the masked load with IOOBE >> part. Please see the original java implementation (FIXME: optimize): >> >> >> @ForceInline >> public static >> ByteVector fromArray(VectorSpecies species, >>byte[] a, int offset, >>VectorMask m) { >> ByteSpecies vsp = (ByteSpecies) species; >> if (offset >= 0 && offset <= (a.length - species.length())) { >> return vsp.dummyVector().fromArray0(a, offset, m); >> } >> >> // FIXME: optimize >> checkMaskFromIndexSize(offset, vsp, m, 1, a.length); >> return vsp.vOp(m, i -> a[offset + i]); >> } >> >> Since it can only be vectorized with the predicate load, the hotspot must >> check whether the current backend supports it and falls back to the java >> scalar version if not. This is different from the normal masked vector load >> that the compiler will generate a full vector load and a vector blend if the >> predicate load is not supported. So to let the compiler make the expected >> action, an additional flag (i.e. `usePred`) is added to the existing >> "loadMasked" intrinsic, with the value "true" for the IOOBE part while >> "false" for the normal load. And the compiler will fail to intrinsify if the >> flag is "true" and the predicate load is not supported by the backend, which >> means that normal java path will be executed. >> >> Also adds the same vectorization support for masked: >> - fromByteArray/fromByteBuffer >> - fromBooleanArray >> - fromCharArray >> >> The performance for the new added benchmarks improve about `1.88x ~ 30.26x` >> on the x86 AVX-512 system: >> >> Benchmark before After Units >> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE 737.542 1387.069 ops/ms >> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE 118.366 330.776 ops/ms >> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE 233.832 6125.026 ops/ms >> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE233.816 7075.923 ops/ms >> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE 119.771 330.587 ops/ms >> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE 431.961 939.301 ops/ms >> >> Similar performance gain can also be observed on 512-bit SVE system. > > Xiaohong Gong has updated the pull request incrementally with one additional > commit since the last revision: > > Rename "use_predicate" to "needs_predicate" I tried your test code with the patch and logged compilation (`-XX:-TieredCompilation -XX:+PrintCompilation -XX:+PrintInlining -XX:+PrintIntrinsics -Xbatch`) For `func` the first call to `VectorSupport::loadMasked` is intrinsic and inlined: @ 45 jdk.internal.vm.vector.VectorSupport::loadMasked (40 bytes) (intrinsic) But the second call (for the last loop iteration) fails to inline: @ 45 jdk.internal.vm.vector.VectorSupport::loadMasked (40 bytes) failed to inline (intrinsic) Since i am running on an mac book this looks right and aligns with the `-XX:+PrintIntrinsics` output: ** Rejected vector op (LoadVectorMasked,int,8) because architecture does not support it ** Rejected vector op (LoadVectorMasked,int,8) because architecture does not support it ** not supported: op=loadMasked vlen=8 etype=int using_byte_array=0 ? I have not looked at the code gen nor measured performance comparing the case when never out of bounds and only out of bounds for the last loop iteration. - PR: https://git.openjdk.java.net/jdk/pull/8035
Re: RFR: 8282664: Unroll by hand StringUTF16, StringLatin1, and Arrays polynomial hash loops [v12]
On Tue, 10 May 2022 14:46:56 GMT, Ludovic Henry wrote: >> Despite the hash value being cached for Strings, computing the hash still >> represents a significant CPU usage for applications handling lots of text. >> >> Even though it would be generally better to do it through an enhancement to >> the autovectorizer, the complexity of doing it by hand is trivial and the >> gain is sizable (2x speedup) even without the Vector API. The algorithm has >> been proposed by Richard Startin and Paul Sandoz [1]. >> >> Speedup are as follows on a `Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz` >> >> >> Benchmark(size) Mode Cnt Score >>Error Units >> StringHashCode.Algorithm.scalarLatin1 0 avgt 25 2.111 >> ± 0.210 ns/op >> StringHashCode.Algorithm.scalarLatin1 1 avgt 25 3.500 >> ± 0.127 ns/op >> StringHashCode.Algorithm.scalarLatin110 avgt 25 7.001 >> ± 0.099 ns/op >> StringHashCode.Algorithm.scalarLatin1 100 avgt 2561.285 >> ± 0.444 ns/op >> StringHashCode.Algorithm.scalarLatin1 1000 avgt 25 628.995 >> ± 0.846 ns/op >> StringHashCode.Algorithm.scalarLatin1 1 avgt 25 6307.990 >> ± 4.071 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 0 avgt 25 2.358 >> ± 0.092 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 1 avgt 25 3.631 >> ± 0.159 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 10 avgt 25 7.049 >> ± 0.019 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 100 avgt 2533.626 >> ± 1.218 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled161000 avgt 25 317.811 >> ± 1.225 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 1 avgt 25 3212.333 >> ± 14.621 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled80 avgt 25 2.356 >> ± 0.097 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled81 avgt 25 3.630 >> ± 0.158 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled8 10 avgt 25 8.724 >> ± 0.065 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled8 100 avgt 2532.402 >> ± 0.019 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled8 1000 avgt 25 321.949 >> ± 0.251 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled81 avgt 25 3202.083 >> ± 1.667 ns/op >> StringHashCode.Algorithm.scalarUTF16 0 avgt 25 2.135 >> ± 0.191 ns/op >> StringHashCode.Algorithm.scalarUTF16 1 avgt 25 5.202 >> ± 0.362 ns/op >> StringHashCode.Algorithm.scalarUTF16 10 avgt 2511.105 >> ± 0.112 ns/op >> StringHashCode.Algorithm.scalarUTF16100 avgt 2575.974 >> ± 0.702 ns/op >> StringHashCode.Algorithm.scalarUTF16 1000 avgt 25 716.429 >> ± 3.290 ns/op >> StringHashCode.Algorithm.scalarUTF16 1 avgt 25 7095.459 >> ± 43.847 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled160 avgt 25 2.381 >> ± 0.038 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled161 avgt 25 5.268 >> ± 0.422 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled16 10 avgt 2511.248 >> ± 0.178 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled16 100 avgt 2552.966 >> ± 0.089 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled16 1000 avgt 25 450.912 >> ± 1.834 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled161 avgt 25 4403.988 >> ± 2.927 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 0 avgt 25 2.401 >> ± 0.032 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 1 avgt 25 5.091 >> ± 0.396 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled810 avgt 2512.801 >> ± 0.189 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 100 avgt 2552.068 >> ± 0.032 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 1000 avgt 25 453.270 >> ± 0.340 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 1 avgt 25 4433.112 >> ± 2.699 ns/op >> >> >> At Datadog, we handle a great amount of text (through logs management for >> example), and hashing String represents a large part of our CPU usage.
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store
On Tue, 10 May 2022 01:23:55 GMT, Xiaohong Gong wrote: > Checking whether the indexes of masked lanes are inside of the valid memory > boundary is necessary for masked vector memory access. However, this could be > saved if the given offset is inside of the vector range that could make sure > no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs have > saved this kind of check for common cases. And this patch did the similar > optimization for the masked vector store. > > The performance for the new added store masked benchmarks improves about > `1.83x ~ 2.62x` on a x86 system: > > Benchmark BeforeAfter Gain Units > StoreMaskedBenchmark.byteStoreArrayMask 12757.936 23291.118 1.826 ops/ms > StoreMaskedBenchmark.doubleStoreArrayMask 1520.932 3921.616 2.578 ops/ms > StoreMaskedBenchmark.floatStoreArrayMask 2713.031 7122.535 2.625 ops/ms > StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms > StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms > StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms > > Similar performane gain can also be observed on ARM hardware. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 4086: > 4084: } else { > 4085: $Type$Species vsp = vspecies(); > 4086: if (offset < 0 || offset > (a.length - vsp.length())) { Can we use `VectorIntrinsics.checkFromIndexSize`? e.g. if (!VectorIntrinsics.checkFromIndexSize(offset, vsp.length(), a.length)) { ... - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]
On Wed, 11 May 2022 03:23:13 GMT, Xiaohong Gong wrote: >> I modified the code of this PR to avoid the conversion of `boolean` to >> `int`, so a constant integer value is passed all the way through, and the >> masked load is made intrinsic from the method at which the constants are >> passed as arguments i.e. the public `fromArray` mask accepting method. > > Hi @PaulSandoz , thanks for the patch for the constant int parameter. I think > the main change is: > > -ByteVector fromArray0Template(Class maskClass, C base, long offset, > int index, M m, boolean offsetInRange, > +ByteVector fromArray0Template(Class maskClass, C base, long offset, > int index, M m, int offsetInRange, > VectorSupport.LoadVectorMaskedOperation ByteVector, ByteSpecies, M> defaultImpl) { > m.check(species()); > ByteSpecies vsp = vspecies(); > -if (offsetInRange) { > -return VectorSupport.loadMasked( > -vsp.vectorType(), maskClass, vsp.elementType(), > vsp.laneCount(), > -base, offset, m, /* offsetInRange */ 1, > -base, index, vsp, defaultImpl); > -} else { > -return VectorSupport.loadMasked( > -vsp.vectorType(), maskClass, vsp.elementType(), > vsp.laneCount(), > -base, offset, m, /* offsetInRange */ 0, > -base, index, vsp, defaultImpl); > -} > +return VectorSupport.loadMasked( > +vsp.vectorType(), maskClass, vsp.elementType(), vsp.laneCount(), > +base, offset, m, offsetInRange == 1 ? 1 : 0, > +base, index, vsp, defaultImpl); > } > > which uses `offsetInRange == 1 ? 1 : 0`. Unfortunately this could not always > make sure the `offsetInRange` a constant a the compiler time. Again, this > change could also make the assertion fail randomly: > > --- a/src/hotspot/share/opto/vectorIntrinsics.cpp > +++ b/src/hotspot/share/opto/vectorIntrinsics.cpp > @@ -1236,6 +1236,7 @@ bool > LibraryCallKit::inline_vector_mem_masked_operation(bool is_store) { > } else { >// Masked vector load with IOOBE always uses the predicated load. >const TypeInt* offset_in_range = gvn().type(argument(8))->isa_int(); > + assert(offset_in_range->is_con(), "must be a constant"); >if (!offset_in_range->is_con()) { > if (C->print_intrinsics()) { >tty->print_cr(" ** missing constant: offsetInRange=%s", > > Sometimes, the compiler can parse it a constant. I think this depends on the > compiler OSR and speculative optimization. Did you try an example with IOOBE > on a non predicated hardware? > > Here is the main code of my unittest to reproduce the issue: > > static final VectorSpecies I_SPECIES = IntVector.SPECIES_128; > static final int LENGTH = 1026; > public static int[] ia; > public static int[] ib; > > private static void init() { > for (int i = 0; i < LENGTH; i++) { > ia[i] = i; > ib[i] = 0; > } > > for (int i = 0; i < 2; i++) { > m[i] = i % 2 == 0; > } > } > > private static void func() { > VectorMask mask = VectorMask.fromArray(I_SPECIES, m, 0); > for (int i = 0; i < LENGTH; i += vl) { > IntVector av = IntVector.fromArray(I_SPECIES, ia, i, mask); > av.lanewise(VectorOperators.ABS).intoArray(ic, i, mask); > } > } > > public static void main(String[] args) { > init(); > for (int i = 0; i < 1; i++) { > func(); > } > } @XiaohongGong Doh! The ternary was an experiment, and I forgot to re-run the code gen script before sending your the patch. See `IntVector`, which does not have that. I presume when the offset is not in range and the other code path is taken then it might be problematic unless all code paths are inlined. I will experiment further with tests. - PR: https://git.openjdk.java.net/jdk/pull/8035
Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]
On Thu, 5 May 2022 08:56:07 GMT, Xiaohong Gong wrote: >> Currently the vector load with mask when the given index happens out of the >> array boundary is implemented with pure java scalar code to avoid the IOOBE >> (IndexOutOfBoundaryException). This is necessary for architectures that do >> not support the predicate feature. Because the masked load is implemented >> with a full vector load and a vector blend applied on it. And a full vector >> load will definitely cause the IOOBE which is not valid. However, for >> architectures that support the predicate feature like SVE/AVX-512/RVV, it >> can be vectorized with the predicated load instruction as long as the >> indexes of the masked lanes are within the bounds of the array. For these >> architectures, loading with unmasked lanes does not raise exception. >> >> This patch adds the vectorization support for the masked load with IOOBE >> part. Please see the original java implementation (FIXME: optimize): >> >> >> @ForceInline >> public static >> ByteVector fromArray(VectorSpecies species, >>byte[] a, int offset, >>VectorMask m) { >> ByteSpecies vsp = (ByteSpecies) species; >> if (offset >= 0 && offset <= (a.length - species.length())) { >> return vsp.dummyVector().fromArray0(a, offset, m); >> } >> >> // FIXME: optimize >> checkMaskFromIndexSize(offset, vsp, m, 1, a.length); >> return vsp.vOp(m, i -> a[offset + i]); >> } >> >> Since it can only be vectorized with the predicate load, the hotspot must >> check whether the current backend supports it and falls back to the java >> scalar version if not. This is different from the normal masked vector load >> that the compiler will generate a full vector load and a vector blend if the >> predicate load is not supported. So to let the compiler make the expected >> action, an additional flag (i.e. `usePred`) is added to the existing >> "loadMasked" intrinsic, with the value "true" for the IOOBE part while >> "false" for the normal load. And the compiler will fail to intrinsify if the >> flag is "true" and the predicate load is not supported by the backend, which >> means that normal java path will be executed. >> >> Also adds the same vectorization support for masked: >> - fromByteArray/fromByteBuffer >> - fromBooleanArray >> - fromCharArray >> >> The performance for the new added benchmarks improve about `1.88x ~ 30.26x` >> on the x86 AVX-512 system: >> >> Benchmark before After Units >> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE 737.542 1387.069 ops/ms >> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE 118.366 330.776 ops/ms >> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE 233.832 6125.026 ops/ms >> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE233.816 7075.923 ops/ms >> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE 119.771 330.587 ops/ms >> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE 431.961 939.301 ops/ms >> >> Similar performance gain can also be observed on 512-bit SVE system. > > Xiaohong Gong has updated the pull request incrementally with one additional > commit since the last revision: > > Rename "use_predicate" to "needs_predicate" I did modified the code of this PR to avoid the conversion of `boolean` to `int` and the masked load is made intrinsic from the method at which the constants are passed as arguments i.e. the public `fromArray` mask accepting method. - PR: https://git.openjdk.java.net/jdk/pull/8035
Re: RFR: 8286294 : ForkJoinPool.commonPool().close() spins
On Fri, 6 May 2022 15:05:57 GMT, Doug Lea wrote: > Changes ForkJoinPool.close spec and code to trap close as a no-op if called > on common pool Changes look good, it will need a CSR. - PR: https://git.openjdk.java.net/jdk/pull/8577
Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]
On Fri, 6 May 2022 04:49:39 GMT, Xiaohong Gong wrote: >> offset is long so uses two argument slots (5 and 6). >> mask is argument (7). >> offsetInRange is argument(8). > > Make sense! Thanks for the explanation! Doh! of course. This is not the first and will not be the last time i get caught out by the 2-slot requirement. It may be useful to do this: Node* mask_arg = is_store ? argument(8) : argument(7); - PR: https://git.openjdk.java.net/jdk/pull/8035
Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]
On Thu, 5 May 2022 08:56:07 GMT, Xiaohong Gong wrote: >> Currently the vector load with mask when the given index happens out of the >> array boundary is implemented with pure java scalar code to avoid the IOOBE >> (IndexOutOfBoundaryException). This is necessary for architectures that do >> not support the predicate feature. Because the masked load is implemented >> with a full vector load and a vector blend applied on it. And a full vector >> load will definitely cause the IOOBE which is not valid. However, for >> architectures that support the predicate feature like SVE/AVX-512/RVV, it >> can be vectorized with the predicated load instruction as long as the >> indexes of the masked lanes are within the bounds of the array. For these >> architectures, loading with unmasked lanes does not raise exception. >> >> This patch adds the vectorization support for the masked load with IOOBE >> part. Please see the original java implementation (FIXME: optimize): >> >> >> @ForceInline >> public static >> ByteVector fromArray(VectorSpecies species, >>byte[] a, int offset, >>VectorMask m) { >> ByteSpecies vsp = (ByteSpecies) species; >> if (offset >= 0 && offset <= (a.length - species.length())) { >> return vsp.dummyVector().fromArray0(a, offset, m); >> } >> >> // FIXME: optimize >> checkMaskFromIndexSize(offset, vsp, m, 1, a.length); >> return vsp.vOp(m, i -> a[offset + i]); >> } >> >> Since it can only be vectorized with the predicate load, the hotspot must >> check whether the current backend supports it and falls back to the java >> scalar version if not. This is different from the normal masked vector load >> that the compiler will generate a full vector load and a vector blend if the >> predicate load is not supported. So to let the compiler make the expected >> action, an additional flag (i.e. `usePred`) is added to the existing >> "loadMasked" intrinsic, with the value "true" for the IOOBE part while >> "false" for the normal load. And the compiler will fail to intrinsify if the >> flag is "true" and the predicate load is not supported by the backend, which >> means that normal java path will be executed. >> >> Also adds the same vectorization support for masked: >> - fromByteArray/fromByteBuffer >> - fromBooleanArray >> - fromCharArray >> >> The performance for the new added benchmarks improve about `1.88x ~ 30.26x` >> on the x86 AVX-512 system: >> >> Benchmark before After Units >> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE 737.542 1387.069 ops/ms >> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE 118.366 330.776 ops/ms >> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE 233.832 6125.026 ops/ms >> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE233.816 7075.923 ops/ms >> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE 119.771 330.587 ops/ms >> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE 431.961 939.301 ops/ms >> >> Similar performance gain can also be observed on 512-bit SVE system. > > Xiaohong Gong has updated the pull request incrementally with one additional > commit since the last revision: > > Rename "use_predicate" to "needs_predicate" src/hotspot/share/opto/vectorIntrinsics.cpp line 1238: > 1236: } else { > 1237: // Masked vector load with IOOBE always uses the predicated load. > 1238: const TypeInt* offset_in_range = > gvn().type(argument(8))->isa_int(); Should it be `argument(7)`? (and adjustments later to access the container). - PR: https://git.openjdk.java.net/jdk/pull/8035
Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]
On Thu, 5 May 2022 01:13:23 GMT, Xiaohong Gong wrote: > Yeah, I agree that it's not good by adding a branch checking for > `offsetInRange`. But actually I met the constant issue that passing the > values all the way cannot guarantee the argument a constant in compiler at > the compile time. Do you have any better idea to fixing this? That's odd, `boolean` constants are passed that are then converted to `int` constants. Did you try passing integer constants all the way through? - PR: https://git.openjdk.java.net/jdk/pull/8035
Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne
On Wed, 4 May 2022 01:59:17 GMT, Quan Anh Mai wrote: > Hi, > > This patch optimises the matching rules for floating-point comparison with > respects to eq/ne on x86-64 > > 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` > is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which > improves the sequence of `If (CmpF x x) (Bool ne)` from > > ucomiss xmm0, xmm0 > jp label > jne label > > into > > ucomiss xmm0, xmm0 > jp label > > 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as `x > == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of > fixing the flags, such as > > xorlecx, ecx > ucomiss xmm0, xmm1 > jnp done > pushf > andq[rsp], 0xff2b > popf > done: > movleax, 1 > cmovel eax, ecx > > The patch changes this sequence into > > xorlecx, ecx > ucomiss xmm0, xmm1 > movleax, 1 > cmovpl eax, ecx > cmovnel eax, ecx > > 3, The patch also changes the pattern of `isInfinite` to be more optimised by > using `Math.abs` to reduce 1 comparison and compares the result with > `MAX_VALUE` since `>` is more optimised than `==` for floating-point types. > > The benchmark results are as follow: > > Before: > Benchmark Mode Cnt Score Error Units > FPComparison.equalDouble avgt5 2876.242 ± 58.875 ns/op > FPComparison.equalFloatavgt5 3062.430 ± 31.371 ns/op > FPComparison.isFiniteDoubleavgt5 475.749 ± 19.027 ns/op > FPComparison.isFiniteFloat avgt5 506.525 ± 14.417 ns/op > FPComparison.isInfiniteDouble avgt5 1232.800 ± 31.677 ns/op > FPComparison.isInfiniteFloat avgt5 1234.708 ± 70.239 ns/op > FPComparison.isNanDouble avgt5 2255.847 ± 7.238 ns/op > FPComparison.isNanFloatavgt5 2567.044 ± 36.078 ns/op > > After: > Benchmark Mode Cnt Score Error Units > FPComparison.equalDouble avgt5 594.636 ± 8.922 ns/op > FPComparison.equalFloatavgt5 663.849 ± 3.656 ns/op > FPComparison.isFiniteDoubleavgt5 518.309 ± 107.352 ns/op > FPComparison.isFiniteFloat avgt5 515.576 ± 14.669 ns/op > FPComparison.isInfiniteDouble avgt5 621.185 ± 11.935 ns/op > FPComparison.isInfiniteFloat avgt5 623.566 ± 15.206 ns/op > FPComparison.isNanDouble avgt5 400.124 ± 0.762 ns/op > FPComparison.isNanFloatavgt5 546.486 ± 1.509 ns/op > > Thank you very much. The changes to `Float` and `Double` look good. I don't think we need additional tests, see test/jdk/java/lang/Math/IeeeRecommendedTests.java. At first i thought we no longer need PR #8459 but it seems both PRs are complimentary, albeit PR #8459 has more modest performance gains for the intrinsics. - PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator)
On Wed, 27 Apr 2022 11:03:48 GMT, Jatin Bhateja wrote: > Hi All, > > Patch adds the planned support for new vector operations and APIs targeted > for [JEP 426: Vector API (Fourth > Incubator).](https://bugs.openjdk.java.net/browse/JDK-8280173) > > Following is the brief summary of changes:- > > 1) Extends the scope of existing lanewise API for following new vector > operations. >- VectorOperations.BIT_COUNT: counts the number of one-bits >- VectorOperations.LEADING_ZEROS_COUNT: counts the number of leading zero > bits >- VectorOperations.TRAILING_ZEROS_COUNT: counts the number of trailing > zero bits >- VectorOperations.REVERSE: reversing the order of bits >- VectorOperations.REVERSE_BYTES: reversing the order of bytes >- compress and expand bits: Semantics are based on Hacker's Delight > section 7-4 Compress, or Generalized Extract. > > 2) Adds following new APIs to perform cross lane vector compress and > expansion operations under the influence of a mask. >- Vector.compress >- Vector.expand >- VectorMask.compress > > 3) Adds predicated and non-predicated versions of following new APIs to load > and store the contents of vector from foreign MemorySegments. > - Vector.fromMemorySegment > - Vector.intoMemorySegment > > 4) C2 Compiler IR enhancements and optimized X86 and AARCH64 backend support > for each newly added operation. > > > Patch has been regressed over AARCH64 and X86 targets different AVX levels. > > Kindly review and share your feedback. > > Best Regards, > Jatin src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 1340: > 1338: assert_different_registers(dst, src, vtmp1, vtmp2, vtmp3, vtmp4); > 1339: assert_different_registers(mask, ptmp, pgtmp); > 1340: // Example input: src = 88 77 66 45 44 33 22 11 Suggestion: // Example input: src = 88 77 66 55 44 33 22 11 - PR: https://git.openjdk.java.net/jdk/pull/8425
Re: RFR: 8277090 : jsr166 refresh for jdk19
On Wed, 4 May 2022 12:30:53 GMT, Doug Lea wrote: > This is the revised jsr166 refresh for jdk19. See > https://bugs.openjdk.java.net/browse/JDK-8285450 and > https://bugs.openjdk.java.net/browse/JDK-8277090. Out of caution, this PR > removes the unrelated commits from original version. Marked as reviewed by psandoz (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/8536
Re: RFR: 8277090 : jsr166 refresh for jdk19 [v2]
On Tue, 3 May 2022 15:32:27 GMT, Martin Buchholz wrote: >> Doug Lea has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Address review comments > > src/java.base/share/classes/java/util/concurrent/CompletableFuture.java line > 2153: > >> 2151: >> 2152: @Override >> 2153: public Throwable exceptionNow() { > > The unwrapping of CompletionExceptions should be documented `exceptionNow` is specified to call `get`, which does not throw `CompletionException`: * @implSpec * The default implementation invokes {@code isDone()} to test if the task * has completed. If done and not cancelled, it invokes {@code get()} and * catches the {@code ExecutionException} to obtain the exception. - PR: https://git.openjdk.java.net/jdk/pull/8490
Re: RFR: 8277090 : jsr166 refresh for jdk19 [v2]
On Mon, 2 May 2022 13:33:33 GMT, Doug Lea wrote: >> This is the jsr166 refresh for jdk19. See >> https://bugs.openjdk.java.net/browse/JDK-8285450 and >> https://bugs.openjdk.java.net/browse/JDK-8277090 > > Doug Lea has updated the pull request incrementally with one additional > commit since the last revision: > > Address review comments Looks good, although I cannot say i reviewed the fork join pool code as thoroughly, since it is particularly difficult, so i was most looking for consistency in the pattern of code. src/java.base/share/classes/java/util/concurrent/ForkJoinWorkerThread.java line 76: > 74: ForkJoinWorkerThread(ThreadGroup group, ForkJoinPool pool, > 75: boolean useSystemClassLoader, > 76: boolean clearThreadLocals) { It's tempting to toggle the sense of last boolean argument to be `preserveThreadLocals` for consistency (given the multiple toggles as the value is propagated from the newly added protected constructor), but which means toggling the value at the use sites that you may not wish to change. - Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/8490
Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]
On Fri, 22 Apr 2022 07:08:24 GMT, Xiaohong Gong wrote: >> Currently the vector load with mask when the given index happens out of the >> array boundary is implemented with pure java scalar code to avoid the IOOBE >> (IndexOutOfBoundaryException). This is necessary for architectures that do >> not support the predicate feature. Because the masked load is implemented >> with a full vector load and a vector blend applied on it. And a full vector >> load will definitely cause the IOOBE which is not valid. However, for >> architectures that support the predicate feature like SVE/AVX-512/RVV, it >> can be vectorized with the predicated load instruction as long as the >> indexes of the masked lanes are within the bounds of the array. For these >> architectures, loading with unmasked lanes does not raise exception. >> >> This patch adds the vectorization support for the masked load with IOOBE >> part. Please see the original java implementation (FIXME: optimize): >> >> >> @ForceInline >> public static >> ByteVector fromArray(VectorSpecies species, >>byte[] a, int offset, >>VectorMask m) { >> ByteSpecies vsp = (ByteSpecies) species; >> if (offset >= 0 && offset <= (a.length - species.length())) { >> return vsp.dummyVector().fromArray0(a, offset, m); >> } >> >> // FIXME: optimize >> checkMaskFromIndexSize(offset, vsp, m, 1, a.length); >> return vsp.vOp(m, i -> a[offset + i]); >> } >> >> Since it can only be vectorized with the predicate load, the hotspot must >> check whether the current backend supports it and falls back to the java >> scalar version if not. This is different from the normal masked vector load >> that the compiler will generate a full vector load and a vector blend if the >> predicate load is not supported. So to let the compiler make the expected >> action, an additional flag (i.e. `usePred`) is added to the existing >> "loadMasked" intrinsic, with the value "true" for the IOOBE part while >> "false" for the normal load. And the compiler will fail to intrinsify if the >> flag is "true" and the predicate load is not supported by the backend, which >> means that normal java path will be executed. >> >> Also adds the same vectorization support for masked: >> - fromByteArray/fromByteBuffer >> - fromBooleanArray >> - fromCharArray >> >> The performance for the new added benchmarks improve about `1.88x ~ 30.26x` >> on the x86 AVX-512 system: >> >> Benchmark before After Units >> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE 737.542 1387.069 ops/ms >> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE 118.366 330.776 ops/ms >> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE 233.832 6125.026 ops/ms >> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE233.816 7075.923 ops/ms >> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE 119.771 330.587 ops/ms >> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE 431.961 939.301 ops/ms >> >> Similar performance gain can also be observed on 512-bit SVE system. > > Xiaohong Gong has updated the pull request incrementally with one additional > commit since the last revision: > > Rename the "usePred" to "offsetInRange" IIUC when the hardware does not support predicated loads then any false `offsetIntRange` value causes the load intrinsic to fail resulting in the fallback, so it would not be materially any different to the current behavior, just more uniformly implemented. Why can't the intrinsic support the passing a boolean directly? Is it something to do with constants? If that is not possible I recommend creating named constant values and pass those all the way through rather than converting a boolean to an integer value. Then there is no need for a branch checking `offsetInRange`. Might be better to hold off until the JEP is integrated and then update, since this will conflict (`byte[]` and `ByteBuffer` load methods are removed and `MemorySegment` load methods are added). You could prepare for that now by branching off `vectorIntrinsics`. - PR: https://git.openjdk.java.net/jdk/pull/8035
Re: RFR: 8284992: Fix misleading Vector API doc for LSHR operator [v4]
On Fri, 29 Apr 2022 06:35:44 GMT, Jie Fu wrote: >> Hi all, >> >> The Current Vector API doc for `LSHR` is >> >> Produce a>>>(n&(ESIZE*8-1)). Integral only. >> >> >> This is misleading which may lead to bugs for Java developers. >> This is because for negative byte/short elements, the results computed by >> `LSHR` will be different from that of `>>>`. >> For more details, please see >> https://github.com/openjdk/jdk/pull/8276#issue-1206391831 . >> >> After the patch, the doc for `LSHR` is >> >> Produce zero-extended right shift of a by (n&(ESIZE*8-1)) bits. Integral >> only. >> >> >> Thanks. >> Best regards, >> Jie > > Jie Fu has updated the pull request with a new target base due to a merge or > a rebase. The incremental webrev excludes the unrelated changes brought in by > the merge/rebase. The pull request contains eight additional commits since > the last revision: > > - Address CSR review comments > - Merge branch 'master' into JDK-8284992 > - Address review comments > - Merge branch 'master' into JDK-8284992 > - Merge branch 'master' into JDK-8284992 > - Address review comments > - Merge branch 'master' into JDK-8284992 > - 8284992: Fix misleading Vector API doc for LSHR operator Marked as reviewed by psandoz (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/8291
Re: RFR: 8284992: Fix misleading Vector API doc for LSHR operator [v3]
On Wed, 27 Apr 2022 09:06:12 GMT, Jie Fu wrote: >> Hi all, >> >> The Current Vector API doc for `LSHR` is >> >> Produce a>>>(n&(ESIZE*8-1)). Integral only. >> >> >> This is misleading which may lead to bugs for Java developers. >> This is because for negative byte/short elements, the results computed by >> `LSHR` will be different from that of `>>>`. >> For more details, please see >> https://github.com/openjdk/jdk/pull/8276#issue-1206391831 . >> >> After the patch, the doc for `LSHR` is >> >> Produce zero-extended right shift of a by (n&(ESIZE*8-1)) bits. Integral >> only. >> >> >> Thanks. >> Best regards, >> Jie > > Jie Fu has updated the pull request with a new target base due to a merge or > a rebase. The incremental webrev excludes the unrelated changes brought in by > the merge/rebase. The pull request contains six additional commits since the > last revision: > > - Address review comments > - Merge branch 'master' into JDK-8284992 > - Merge branch 'master' into JDK-8284992 > - Address review comments > - Merge branch 'master' into JDK-8284992 > - 8284992: Fix misleading Vector API doc for LSHR operator It should be possible for you finalize now. - PR: https://git.openjdk.java.net/jdk/pull/8291
Re: RFR: 8284992: Fix misleading Vector API doc for LSHR operator [v3]
On Wed, 27 Apr 2022 09:06:12 GMT, Jie Fu wrote: >> Hi all, >> >> The Current Vector API doc for `LSHR` is >> >> Produce a>>>(n&(ESIZE*8-1)). Integral only. >> >> >> This is misleading which may lead to bugs for Java developers. >> This is because for negative byte/short elements, the results computed by >> `LSHR` will be different from that of `>>>`. >> For more details, please see >> https://github.com/openjdk/jdk/pull/8276#issue-1206391831 . >> >> After the patch, the doc for `LSHR` is >> >> Produce zero-extended right shift of a by (n&(ESIZE*8-1)) bits. Integral >> only. >> >> >> Thanks. >> Best regards, >> Jie > > Jie Fu has updated the pull request with a new target base due to a merge or > a rebase. The incremental webrev excludes the unrelated changes brought in by > the merge/rebase. The pull request contains six additional commits since the > last revision: > > - Address review comments > - Merge branch 'master' into JDK-8284992 > - Merge branch 'master' into JDK-8284992 > - Address review comments > - Merge branch 'master' into JDK-8284992 > - 8284992: Fix misleading Vector API doc for LSHR operator I created one, filled it in, and assigned it to you (for other examples you can search in the issue tracker, this one quite is simple so i thought it was quicker to do myself to show you). For any specification change we need to review and track that change (independent of any implementation changes, if any). If you are ok with it I can add myself as reviewer, then you can "Finalize" it (see button on same line as "Edit"), triggering a review request, from which we may receive comments to address, and once addressed and final, it will unblock the PR for integration. - PR: https://git.openjdk.java.net/jdk/pull/8291
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v8]
On Wed, 27 Apr 2022 14:24:20 GMT, Alan Bateman wrote: >> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which >> JDK version to target. >> >> We will refresh this PR periodically to pick up changes and fixes from the >> loom repo. >> >> Most of the new mechanisms in the HotSpot VM are disabled by default and >> require running with `--enable-preview` to enable. >> >> The patch has support for x64 and aarch64 on the usual operating systems >> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for >> zero and some of the other ports. Additional ports can be contributed via >> PRs against the fibers branch in the loom repo. >> >> There are changes in many areas. To reduce notifications/mails, the labels >> have been trimmed down for now to hotspot, serviceability and core-libs. >> We'll add the complete set of labels when the PR is further along. >> >> The changes include a refresh of java.util.concurrent and ForkJoinPool from >> Doug Lea's CVS. These changes will probably be proposed and integrated in >> advance of this PR. >> >> The changes include some non-exposed and low-level infrastructure to support >> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to >> make life a bit easier and avoid having to separate VM changes and juggle >> branches at this time. > > Alan Bateman has updated the pull request incrementally with one additional > commit since the last revision: > > Refresh 7965cc6b168e567ac2596f2fbc3b00a7d99b7e1e It's great to see end game in play for this multi-year effort, impressive work by all involved. I looked through code in the base module. Generally looks well structured and documented. The fork join code is naturally hard to review. I did try! (it takes the prize for the highest Java code density in the JDK). I mostly looked for regular and repeated structure rather than trying to fully understand/review the intricate concurrency details. IMO there is some post integration work to do around the architecture of thread containers, i presume the structured concurrency implementation (use of thread flock) will help with that. - Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/8166
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v8]
On Wed, 27 Apr 2022 14:24:20 GMT, Alan Bateman wrote: >> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which >> JDK version to target. >> >> We will refresh this PR periodically to pick up changes and fixes from the >> loom repo. >> >> Most of the new mechanisms in the HotSpot VM are disabled by default and >> require running with `--enable-preview` to enable. >> >> The patch has support for x64 and aarch64 on the usual operating systems >> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for >> zero and some of the other ports. Additional ports can be contributed via >> PRs against the fibers branch in the loom repo. >> >> There are changes in many areas. To reduce notifications/mails, the labels >> have been trimmed down for now to hotspot, serviceability and core-libs. >> We'll add the complete set of labels when the PR is further along. >> >> The changes include a refresh of java.util.concurrent and ForkJoinPool from >> Doug Lea's CVS. These changes will probably be proposed and integrated in >> advance of this PR. >> >> The changes include some non-exposed and low-level infrastructure to support >> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to >> make life a bit easier and avoid having to separate VM changes and juggle >> branches at this time. > > Alan Bateman has updated the pull request incrementally with one additional > commit since the last revision: > > Refresh 7965cc6b168e567ac2596f2fbc3b00a7d99b7e1e src/java.base/share/classes/java/lang/Thread.java line 116: > 114: * carrier threads. Locking and I/O operations are examples of > operations > 115: * where a carrier thread may be re-scheduled from one virtual thread to > another. > 116: * Code executing in a virtual thread is not aware of underlying carrier > thread. Suggestion: * Code executing in a virtual thread is not aware of the underlying carrier thread. - PR: https://git.openjdk.java.net/jdk/pull/8166
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v8]
On Wed, 27 Apr 2022 14:24:20 GMT, Alan Bateman wrote: >> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which >> JDK version to target. >> >> We will refresh this PR periodically to pick up changes and fixes from the >> loom repo. >> >> Most of the new mechanisms in the HotSpot VM are disabled by default and >> require running with `--enable-preview` to enable. >> >> The patch has support for x64 and aarch64 on the usual operating systems >> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for >> zero and some of the other ports. Additional ports can be contributed via >> PRs against the fibers branch in the loom repo. >> >> There are changes in many areas. To reduce notifications/mails, the labels >> have been trimmed down for now to hotspot, serviceability and core-libs. >> We'll add the complete set of labels when the PR is further along. >> >> The changes include a refresh of java.util.concurrent and ForkJoinPool from >> Doug Lea's CVS. These changes will probably be proposed and integrated in >> advance of this PR. >> >> The changes include some non-exposed and low-level infrastructure to support >> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to >> make life a bit easier and avoid having to separate VM changes and juggle >> branches at this time. > > Alan Bateman has updated the pull request incrementally with one additional > commit since the last revision: > > Refresh 7965cc6b168e567ac2596f2fbc3b00a7d99b7e1e src/java.base/share/classes/java/util/concurrent/ThreadPerTaskExecutor.java line 45: > 43: * threads is unbounded. > 44: */ > 45: class ThreadPerTaskExecutor implements ExecutorService { This class manages the set of per-task threads which arguably should be the job of the thread container, and it awkwardly overrides the container's set of threads by setting a field on `SharedThreadContainer.threadsSupplier`. `SharedThreadContainer` supports a number of different shared container policies that could be made clearer. Architecturally i think this could be better layered but it can be iterated on later. - PR: https://git.openjdk.java.net/jdk/pull/8166
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v8]
On Wed, 27 Apr 2022 17:27:56 GMT, Mandy Chung wrote: >> Alan Bateman has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Refresh 7965cc6b168e567ac2596f2fbc3b00a7d99b7e1e > > src/java.base/share/classes/java/lang/PinnedThreadPrinter.java line 58: > >> 56: >> 57: // maps a class to the hashes of stack traces pinned by that code in >> that class >> 58: private static final Map, Hashes> classToHashes = new >> WeakHashMap<>(); > > Can you use `ClassValue` in this case? I was wondering that too, but held off commenting since it's not super performance critical given `classToHashes` is synchronized on. However, it does make the code a little clearer. - PR: https://git.openjdk.java.net/jdk/pull/8166
Re: RFR: 8284992: Fix misleading Vector API doc for LSHR operator [v3]
On Wed, 27 Apr 2022 09:06:12 GMT, Jie Fu wrote: >> Hi all, >> >> The Current Vector API doc for `LSHR` is >> >> Produce a>>>(n&(ESIZE*8-1)). Integral only. >> >> >> This is misleading which may lead to bugs for Java developers. >> This is because for negative byte/short elements, the results computed by >> `LSHR` will be different from that of `>>>`. >> For more details, please see >> https://github.com/openjdk/jdk/pull/8276#issue-1206391831 . >> >> After the patch, the doc for `LSHR` is >> >> Produce zero-extended right shift of a by (n&(ESIZE*8-1)) bits. Integral >> only. >> >> >> Thanks. >> Best regards, >> Jie > > Jie Fu has updated the pull request with a new target base due to a merge or > a rebase. The incremental webrev excludes the unrelated changes brought in by > the merge/rebase. The pull request contains six additional commits since the > last revision: > > - Address review comments > - Merge branch 'master' into JDK-8284992 > - Merge branch 'master' into JDK-8284992 > - Address review comments > - Merge branch 'master' into JDK-8284992 > - 8284992: Fix misleading Vector API doc for LSHR operator Thanks, looks good, we will need to create a CSR. Have you done that before? - PR: https://git.openjdk.java.net/jdk/pull/8291
Re: RFR: 8284992: Fix misleading Vector API doc for LSHR operator [v2]
On Thu, 21 Apr 2022 04:23:22 GMT, Jie Fu wrote: >> Hi all, >> >> The Current Vector API doc for `LSHR` is >> >> Produce a>>>(n&(ESIZE*8-1)). Integral only. >> >> >> This is misleading which may lead to bugs for Java developers. >> This is because for negative byte/short elements, the results computed by >> `LSHR` will be different from that of `>>>`. >> For more details, please see >> https://github.com/openjdk/jdk/pull/8276#issue-1206391831 . >> >> After the patch, the doc for `LSHR` is >> >> Produce zero-extended right shift of a by (n&(ESIZE*8-1)) bits. Integral >> only. >> >> >> Thanks. >> Best regards, >> Jie > > Jie Fu has updated the pull request with a new target base due to a merge or > a rebase. The incremental webrev excludes the unrelated changes brought in by > the merge/rebase. The pull request contains three additional commits since > the last revision: > > - Address review comments > - Merge branch 'master' into JDK-8284992 > - 8284992: Fix misleading Vector API doc for LSHR operator After talking with John here's what we think is a better approach than what I originally had in mind: 1. In the class doc of `VectorOperators` add a definition for `EMASK` occurring after the definition for `ESIZE`: * {@code EMASK} the bit mask of the operand type, where {@code EMASK=(1<>>(n&(ESIZE*8-1))}. Integral only. */ That more clearly gets across operating in the correct domain for sub-word operand types, which was the original intention (e.g. the right shift value). - PR: https://git.openjdk.java.net/jdk/pull/8291
Re: RFR: 8284992: Fix misleading Vector API doc for LSHR operator
On Tue, 19 Apr 2022 08:41:50 GMT, Jie Fu wrote: > Hi all, > > The Current Vector API doc for `LSHR` is > > Produce a>>>(n&(ESIZE*8-1)). Integral only. > > > This is misleading which may lead to bugs for Java developers. > This is because for negative byte/short elements, the results computed by > `LSHR` will be different from that of `>>>`. > For more details, please see > https://github.com/openjdk/jdk/pull/8276#issue-1206391831 . > > After the patch, the doc for `LSHR` is > > Produce zero-extended right shift of a by (n&(ESIZE*8-1)) bits. Integral only. > > > Thanks. > Best regards, > Jie We can raise attention to that: /** Produce {@code a>>>(n&(ESIZE*8-1))} * (The operand and result are converted if the operand type is {@code byte} or {@code short}, see below). Integral only. * ... */ - PR: https://git.openjdk.java.net/jdk/pull/8291
Re: RFR: 8284992: Fix misleading Vector API doc for LSHR operator
On Tue, 19 Apr 2022 08:41:50 GMT, Jie Fu wrote: > Hi all, > > The Current Vector API doc for `LSHR` is > > Produce a>>>(n&(ESIZE*8-1)). Integral only. > > > This is misleading which may lead to bugs for Java developers. > This is because for negative byte/short elements, the results computed by > `LSHR` will be different from that of `>>>`. > For more details, please see > https://github.com/openjdk/jdk/pull/8276#issue-1206391831 . > > After the patch, the doc for `LSHR` is > > Produce zero-extended right shift of a by (n&(ESIZE*8-1)) bits. Integral only. > > > Thanks. > Best regards, > Jie The intended pattern for the operator tokens is to present a short symbolic description using Java operators and common methods. It would be good to try and keep with this pattern, and clarify for the extra cases. Here's what i had in mind: /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. * * For operand types {@code byte} and {@code short} the operation behaves as if the operand is first implicitly widened * to an {@code int} value with {@code (a & ((1 << ESIZE) - 1))} the result of which is then applied as the operand to this * operation, the result of the operation is then narrowed from {@code int} to the operand type using an explicit cast. */ public static final /*bitwise*/ Binary LSHR; - PR: https://git.openjdk.java.net/jdk/pull/8291
Re: RFR: 8284932: [Vector API] Incorrect implementation of LSHR operator for negative byte/short elements
On Sun, 17 Apr 2022 14:35:14 GMT, Jie Fu wrote: > Hi all, > > According to the Vector API doc, the `LSHR` operator computes > `a>>>(n&(ESIZE*8-1))`. > However, current implementation is incorrect for negative bytes/shorts. > > The background is that one of our customers try to vectorize `urshift` with > `urshiftVector` like the following. > > 13 public static void urshift(byte[] src, byte[] dst) { > 14 for (int i = 0; i < src.length; i++) { > 15 dst[i] = (byte)(src[i] >>> 3); > 16 } > 17 } > 18 > 19 public static void urshiftVector(byte[] src, byte[] dst) { > 20 int i = 0; > 21 for (; i < spec.loopBound(src.length); i +=spec.length()) { > 22 var va = ByteVector.fromArray(spec, src, i); > 23 var vb = va.lanewise(VectorOperators.LSHR, 3); > 24 vb.intoArray(dst, i); > 25 } > 26 > 27 for (; i < src.length; i++) { > 28 dst[i] = (byte)(src[i] >>> 3); > 29 } > 30 } > > > Unfortunately and to our surprise, code@line28 computes different results > with code@line23. > It took quite a long time to figure out this bug. > > The root cause is that current implemenation of Vector API can't compute the > unsigned right shift results as what is done for scalar `>>>` for negative > byte/short elements. > Actually, current implementation will do `(a & 0xFF) >>> (n & 7)` [1] for all > bytes, which is unable to compute the vectorized `>>>` for negative bytes. > So this seems unreasonable and unfriendly to Java developers. > It would be better to fix it. > > The key idea to support unsigned right shift of negative bytes/shorts is just > to replace the unsigned right shift operation with the signed right shift > operation. > This logic is: > - For byte elements, unsigned right shift is equal to signed right shift if > the shift_cnt <= 24. > - For short elements, unsigned right shift is equal to signed right shift if > the shift_cnt <= 16. > - For Vector API, the shift_cnt will be masked to shift_cnt <= 7 for bytes > and shift_cnt <= 15 for shorts. > > I just learned this idea from https://github.com/openjdk/jdk/pull/7979 . > And many thanks to @fg1417 . > > > Thanks. > Best regards, > Jie > > [1] > https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L935 I am yet to be convinced we a 3rd vector right shift operator. We are talking about narrow cases of correct use which i believe can be supported with the existing operators. The user needs to think very careful when deviating from common right shift patterns on sub-words, which when deviated from can often imply misunderstanding and incorrect use, or an obtuse use. I would prefer to stick the existing operators and clarify their use. - PR: https://git.openjdk.java.net/jdk/pull/8276
Re: RFR: 8284932: [Vector API] Incorrect implementation of LSHR operator for negative byte/short elements
On Sun, 17 Apr 2022 14:35:14 GMT, Jie Fu wrote: > Hi all, > > According to the Vector API doc, the `LSHR` operator computes > `a>>>(n&(ESIZE*8-1))`. > However, current implementation is incorrect for negative bytes/shorts. > > The background is that one of our customers try to vectorize `urshift` with > `urshiftVector` like the following. > > 13 public static void urshift(byte[] src, byte[] dst) { > 14 for (int i = 0; i < src.length; i++) { > 15 dst[i] = (byte)(src[i] >>> 3); > 16 } > 17 } > 18 > 19 public static void urshiftVector(byte[] src, byte[] dst) { > 20 int i = 0; > 21 for (; i < spec.loopBound(src.length); i +=spec.length()) { > 22 var va = ByteVector.fromArray(spec, src, i); > 23 var vb = va.lanewise(VectorOperators.LSHR, 3); > 24 vb.intoArray(dst, i); > 25 } > 26 > 27 for (; i < src.length; i++) { > 28 dst[i] = (byte)(src[i] >>> 3); > 29 } > 30 } > > > Unfortunately and to our surprise, code@line28 computes different results > with code@line23. > It took quite a long time to figure out this bug. > > The root cause is that current implemenation of Vector API can't compute the > unsigned right shift results as what is done for scalar `>>>` for negative > byte/short elements. > Actually, current implementation will do `(a & 0xFF) >>> (n & 7)` [1] for all > bytes, which is unable to compute the vectorized `>>>` for negative bytes. > So this seems unreasonable and unfriendly to Java developers. > It would be better to fix it. > > The key idea to support unsigned right shift of negative bytes/shorts is just > to replace the unsigned right shift operation with the signed right shift > operation. > This logic is: > - For byte elements, unsigned right shift is equal to signed right shift if > the shift_cnt <= 24. > - For short elements, unsigned right shift is equal to signed right shift if > the shift_cnt <= 16. > - For Vector API, the shift_cnt will be masked to shift_cnt <= 7 for bytes > and shift_cnt <= 15 for shorts. > > I just learned this idea from https://github.com/openjdk/jdk/pull/7979 . > And many thanks to @fg1417 . > > > Thanks. > Best regards, > Jie > > [1] > https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L935 Not yet talked with John, but i investigated further. The implementation of the `LSHR` operation is behaving as intended, but is under specified with regards to `byte` and `short` as you noted in #8291. This is a subtle area, but i am wondering if the user really means to use arithmetic shift in this case? Since is not the following true for all values of `e` and `c`, where `e` is a `byte` and `c` is the right shift count ranging from 0 to 7: (byte) (e >>> c) == (byte) (e >> c) ? Then the user can use `VectorOperators.ASHR`. - PR: https://git.openjdk.java.net/jdk/pull/8276
Re: RFR: 8285007: Use correct lookup mode for MethodHandleStatics.UNSAFE
On Tue, 19 Apr 2022 13:19:31 GMT, Claes Redestad wrote: > Trivially fix the resolution of the `NF_UNSAFE` named function to use the > appropriate lookup mode. > > In [JDK-8187826](https://bugs.openjdk.java.net/browse/JDK-8187826) we changed > from regular reflection to use a `NamedFunction` for this field, but it > appears the lookup mode came out wrong. This mismatch appears to be benign > and ignored by HotSpot, but a user implementing an experimental JVM ran into > some issues (and additionally noticed the resolve throws the wrong > exception). > > This patch is a point fix to this particular issue, but I think we should > consider following up with a stronger assertion or test for proper staticness > of fields somewhere when resolving fields via > `MemberName.getFactory().resolveOrNull(..)`. Marked as reviewed by psandoz (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/8297
Re: RFR: 8284992: Fix misleading Vector API doc for LSHR operator
On Tue, 19 Apr 2022 08:41:50 GMT, Jie Fu wrote: > Hi all, > > The Current Vector API doc for `LSHR` is > > Produce a>>>(n&(ESIZE*8-1)). Integral only. > > > This is misleading which may lead to bugs for Java developers. > This is because for negative byte/short elements, the results computed by > `LSHR` will be different from that of `>>>`. > For more details, please see > https://github.com/openjdk/jdk/pull/8276#issue-1206391831 . > > After the patch, the doc for `LSHR` is > > Produce zero-extended right shift of a by (n&(ESIZE*8-1)) bits. Integral only. > > > Thanks. > Best regards, > Jie I need to think a little more about this. The specification is not accurate and likely requires a CSR. My initial thoughts are I would prefer the operation to retain reference to the succinct definition using the logical right shift operator but we add additional specification explaining the cases for `byte` and `short`, namely that the result is widened to an `int` as if by `(a & ((1 << ESIZE) - 1))` and the result narrowed. That's commonly (at least for the widening part) how `>>>` is used with `byte` and `short`, and i think would be clearer to be explicit in that regard. - PR: https://git.openjdk.java.net/jdk/pull/8291
Re: RFR: 8284932: [Vector API] Incorrect implementation of LSHR operator for negative byte/short elements
On Sun, 17 Apr 2022 14:35:14 GMT, Jie Fu wrote: > Hi all, > > According to the Vector API doc, the `LSHR` operator computes > `a>>>(n&(ESIZE*8-1))`. > However, current implementation is incorrect for negative bytes/shorts. > > The background is that one of our customers try to vectorize `urshift` with > `urshiftVector` like the following. > > 13 public static void urshift(byte[] src, byte[] dst) { > 14 for (int i = 0; i < src.length; i++) { > 15 dst[i] = (byte)(src[i] >>> 3); > 16 } > 17 } > 18 > 19 public static void urshiftVector(byte[] src, byte[] dst) { > 20 int i = 0; > 21 for (; i < spec.loopBound(src.length); i +=spec.length()) { > 22 var va = ByteVector.fromArray(spec, src, i); > 23 var vb = va.lanewise(VectorOperators.LSHR, 3); > 24 vb.intoArray(dst, i); > 25 } > 26 > 27 for (; i < src.length; i++) { > 28 dst[i] = (byte)(src[i] >>> 3); > 29 } > 30 } > > > Unfortunately and to our surprise, code@line28 computes different results > with code@line23. > It took quite a long time to figure out this bug. > > The root cause is that current implemenation of Vector API can't compute the > unsigned right shift results as what is done for scalar `>>>` for negative > byte/short elements. > Actually, current implementation will do `(a & 0xFF) >>> (n & 7)` [1] for all > bytes, which is unable to compute the vectorized `>>>` for negative bytes. > So this seems unreasonable and unfriendly to Java developers. > It would be better to fix it. > > The key idea to support unsigned right shift of negative bytes/shorts is just > to replace the unsigned right shift operation with the signed right shift > operation. > This logic is: > - For byte elements, unsigned right shift is equal to signed right shift if > the shift_cnt <= 24. > - For short elements, unsigned right shift is equal to signed right shift if > the shift_cnt <= 16. > - For Vector API, the shift_cnt will be masked to shift_cnt <= 7 for bytes > and shift_cnt <= 15 for shorts. > > I just learned this idea from https://github.com/openjdk/jdk/pull/7979 . > And many thanks to @fg1417 . > > > Thanks. > Best regards, > Jie > > [1] > https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java#L935 I need to discuss with @rose00 on the history behind this before deciding what to do. - PR: https://git.openjdk.java.net/jdk/pull/8276
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman wrote: >> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which >> JDK version to target. >> >> We will refresh this PR periodically to pick up changes and fixes from the >> loom repo. >> >> Most of the new mechanisms in the HotSpot VM are disabled by default and >> require running with `--enable-preview` to enable. >> >> The patch has support for x64 and aarch64 on the usual operating systems >> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for >> zero and some of the other ports. Additional ports can be contributed via >> PRs against the fibers branch in the loom repo. >> >> There are changes in many areas. To reduce notifications/mails, the labels >> have been trimmed down for now to hotspot, serviceability and core-libs. >> We'll add the complete set of labels when the PR is further along. >> >> The changes include a refresh of java.util.concurrent and ForkJoinPool from >> Doug Lea's CVS. These changes will probably be proposed and integrated in >> advance of this PR. >> >> The changes include some non-exposed and low-level infrastructure to support >> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to >> make life a bit easier and avoid having to separate VM changes and juggle >> branches at this time. > > Alan Bateman has updated the pull request incrementally with one additional > commit since the last revision: > > Refresh src/java.base/share/classes/jdk/internal/vm/annotation/ChangesCurrentThread.java line 35: > 33: * disables inlining for the method to which it is applied unless the > 34: * method being unlined into is also annotated ChangesCurrentThread. > 35: Suggestion: * method being inlined into is also annotated ChangesCurrentThread. * - PR: https://git.openjdk.java.net/jdk/pull/8166
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman wrote: >> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which >> JDK version to target. >> >> We will refresh this PR periodically to pick up changes and fixes from the >> loom repo. >> >> Most of the new mechanisms in the HotSpot VM are disabled by default and >> require running with `--enable-preview` to enable. >> >> The patch has support for x64 and aarch64 on the usual operating systems >> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for >> zero and some of the other ports. Additional ports can be contributed via >> PRs against the fibers branch in the loom repo. >> >> There are changes in many areas. To reduce notifications/mails, the labels >> have been trimmed down for now to hotspot, serviceability and core-libs. >> We'll add the complete set of labels when the PR is further along. >> >> The changes include a refresh of java.util.concurrent and ForkJoinPool from >> Doug Lea's CVS. These changes will probably be proposed and integrated in >> advance of this PR. >> >> The changes include some non-exposed and low-level infrastructure to support >> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to >> make life a bit easier and avoid having to separate VM changes and juggle >> branches at this time. > > Alan Bateman has updated the pull request incrementally with one additional > commit since the last revision: > > Refresh src/java.base/share/classes/jdk/internal/vm/ThreadContainers.java line 184: > 182: * with the Thread API. > 183: */ > 184: private static class RootContainer extends ThreadContainer { This implementation could be clearer if split out into two, and the value assigned to `INSTANCE` is selected based on the system property. Something to consider later perhaps. - PR: https://git.openjdk.java.net/jdk/pull/8166
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman wrote: >> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which >> JDK version to target. >> >> We will refresh this PR periodically to pick up changes and fixes from the >> loom repo. >> >> Most of the new mechanisms in the HotSpot VM are disabled by default and >> require running with `--enable-preview` to enable. >> >> The patch has support for x64 and aarch64 on the usual operating systems >> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for >> zero and some of the other ports. Additional ports can be contributed via >> PRs against the fibers branch in the loom repo. >> >> There are changes in many areas. To reduce notifications/mails, the labels >> have been trimmed down for now to hotspot, serviceability and core-libs. >> We'll add the complete set of labels when the PR is further along. >> >> The changes include a refresh of java.util.concurrent and ForkJoinPool from >> Doug Lea's CVS. These changes will probably be proposed and integrated in >> advance of this PR. >> >> The changes include some non-exposed and low-level infrastructure to support >> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to >> make life a bit easier and avoid having to separate VM changes and juggle >> branches at this time. > > Alan Bateman has updated the pull request incrementally with one additional > commit since the last revision: > > Refresh src/java.base/share/classes/jdk/internal/vm/Continuation.java line 474: > 472: private void processNmethods(int before, int after) { > 473: > 474: } Unused field and method? - PR: https://git.openjdk.java.net/jdk/pull/8166
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman wrote: >> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which >> JDK version to target. >> >> We will refresh this PR periodically to pick up changes and fixes from the >> loom repo. >> >> Most of the new mechanisms in the HotSpot VM are disabled by default and >> require running with `--enable-preview` to enable. >> >> The patch has support for x64 and aarch64 on the usual operating systems >> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for >> zero and some of the other ports. Additional ports can be contributed via >> PRs against the fibers branch in the loom repo. >> >> There are changes in many areas. To reduce notifications/mails, the labels >> have been trimmed down for now to hotspot, serviceability and core-libs. >> We'll add the complete set of labels when the PR is further along. >> >> The changes include a refresh of java.util.concurrent and ForkJoinPool from >> Doug Lea's CVS. These changes will probably be proposed and integrated in >> advance of this PR. >> >> The changes include some non-exposed and low-level infrastructure to support >> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to >> make life a bit easier and avoid having to separate VM changes and juggle >> branches at this time. > > Alan Bateman has updated the pull request incrementally with one additional > commit since the last revision: > > Refresh src/java.base/share/classes/jdk/internal/vm/Continuation.java line 264: > 262: } finally { > 263: fence(); > 264: StackChunk c = tail; `c` is not used src/java.base/share/classes/jdk/internal/vm/Continuation.java line 310: > 308: > 309: @IntrinsicCandidate > 310: private static int doYield() { throw new Error("Intrinsic not > installed"); }; Suggestion: private static int doYield() { throw new Error("Intrinsic not installed"); } - PR: https://git.openjdk.java.net/jdk/pull/8166
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman wrote: >> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which >> JDK version to target. >> >> We will refresh this PR periodically to pick up changes and fixes from the >> loom repo. >> >> Most of the new mechanisms in the HotSpot VM are disabled by default and >> require running with `--enable-preview` to enable. >> >> The patch has support for x64 and aarch64 on the usual operating systems >> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for >> zero and some of the other ports. Additional ports can be contributed via >> PRs against the fibers branch in the loom repo. >> >> There are changes in many areas. To reduce notifications/mails, the labels >> have been trimmed down for now to hotspot, serviceability and core-libs. >> We'll add the complete set of labels when the PR is further along. >> >> The changes include a refresh of java.util.concurrent and ForkJoinPool from >> Doug Lea's CVS. These changes will probably be proposed and integrated in >> advance of this PR. >> >> The changes include some non-exposed and low-level infrastructure to support >> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to >> make life a bit easier and avoid having to separate VM changes and juggle >> branches at this time. > > Alan Bateman has updated the pull request incrementally with one additional > commit since the last revision: > > Refresh src/java.base/share/classes/jdk/internal/vm/Continuation.java line 115: > 113: } > 114: > 115: private Runnable target; Can be final? (Does not appear to be updated by the VM, or via unsafe by some other class) - PR: https://git.openjdk.java.net/jdk/pull/8166
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman wrote: >> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which >> JDK version to target. >> >> We will refresh this PR periodically to pick up changes and fixes from the >> loom repo. >> >> Most of the new mechanisms in the HotSpot VM are disabled by default and >> require running with `--enable-preview` to enable. >> >> The patch has support for x64 and aarch64 on the usual operating systems >> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for >> zero and some of the other ports. Additional ports can be contributed via >> PRs against the fibers branch in the loom repo. >> >> There are changes in many areas. To reduce notifications/mails, the labels >> have been trimmed down for now to hotspot, serviceability and core-libs. >> We'll add the complete set of labels when the PR is further along. >> >> The changes include a refresh of java.util.concurrent and ForkJoinPool from >> Doug Lea's CVS. These changes will probably be proposed and integrated in >> advance of this PR. >> >> The changes include some non-exposed and low-level infrastructure to support >> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to >> make life a bit easier and avoid having to separate VM changes and juggle >> branches at this time. > > Alan Bateman has updated the pull request incrementally with one additional > commit since the last revision: > > Refresh src/java.base/share/classes/jdk/internal/vm/Continuation.java line 94: > 92: default: > 93: throw new AssertionError("Unknown pinned reason: " + > reason); > 94: } Suggestion: return switch (reason) { case 2 -> Pinned.CRITICAL_SECTION; case 3 -> Pinned.NATIVE; case 4 -> Pinned.MONITOR; default -> throw new AssertionError("Unknown pinned reason: " + reason); }; - PR: https://git.openjdk.java.net/jdk/pull/8166
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman wrote: >> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which >> JDK version to target. >> >> We will refresh this PR periodically to pick up changes and fixes from the >> loom repo. >> >> Most of the new mechanisms in the HotSpot VM are disabled by default and >> require running with `--enable-preview` to enable. >> >> The patch has support for x64 and aarch64 on the usual operating systems >> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for >> zero and some of the other ports. Additional ports can be contributed via >> PRs against the fibers branch in the loom repo. >> >> There are changes in many areas. To reduce notifications/mails, the labels >> have been trimmed down for now to hotspot, serviceability and core-libs. >> We'll add the complete set of labels when the PR is further along. >> >> The changes include a refresh of java.util.concurrent and ForkJoinPool from >> Doug Lea's CVS. These changes will probably be proposed and integrated in >> advance of this PR. >> >> The changes include some non-exposed and low-level infrastructure to support >> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to >> make life a bit easier and avoid having to separate VM changes and juggle >> branches at this time. > > Alan Bateman has updated the pull request incrementally with one additional > commit since the last revision: > > Refresh src/java.base/share/classes/jdk/internal/misc/ThreadTracker.java line 59: > 57: && this.thread == other.thread; > 58: } > 59: } Suggestion: private record ThreadRef(Thread thread) { @Override public int hashCode() { return Long.hashCode(thread.threadId()); } @Override public boolean equals(Object obj) { return (obj instanceof ThreadRef other) && this.thread == other.thread; } } - PR: https://git.openjdk.java.net/jdk/pull/8166
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman wrote: >> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which >> JDK version to target. >> >> We will refresh this PR periodically to pick up changes and fixes from the >> loom repo. >> >> Most of the new mechanisms in the HotSpot VM are disabled by default and >> require running with `--enable-preview` to enable. >> >> The patch has support for x64 and aarch64 on the usual operating systems >> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for >> zero and some of the other ports. Additional ports can be contributed via >> PRs against the fibers branch in the loom repo. >> >> There are changes in many areas. To reduce notifications/mails, the labels >> have been trimmed down for now to hotspot, serviceability and core-libs. >> We'll add the complete set of labels when the PR is further along. >> >> The changes include a refresh of java.util.concurrent and ForkJoinPool from >> Doug Lea's CVS. These changes will probably be proposed and integrated in >> advance of this PR. >> >> The changes include some non-exposed and low-level infrastructure to support >> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to >> make life a bit easier and avoid having to separate VM changes and juggle >> branches at this time. > > Alan Bateman has updated the pull request incrementally with one additional > commit since the last revision: > > Refresh src/java.base/share/classes/jdk/internal/misc/ThreadFlock.java line 262: > 260: * @return the thread, started > 261: * @throws IllegalStateException if this flock is shutdown or closed > 262: * @throws jdk.incubator.concurrent.StructureViolationException if > the current Suggestion: * @throws WrongThreadException if the current thread is not the owner or a thread * contained in the flock * @throws jdk.incubator.concurrent.StructureViolationException if the current - PR: https://git.openjdk.java.net/jdk/pull/8166
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman wrote: >> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which >> JDK version to target. >> >> We will refresh this PR periodically to pick up changes and fixes from the >> loom repo. >> >> Most of the new mechanisms in the HotSpot VM are disabled by default and >> require running with `--enable-preview` to enable. >> >> The patch has support for x64 and aarch64 on the usual operating systems >> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for >> zero and some of the other ports. Additional ports can be contributed via >> PRs against the fibers branch in the loom repo. >> >> There are changes in many areas. To reduce notifications/mails, the labels >> have been trimmed down for now to hotspot, serviceability and core-libs. >> We'll add the complete set of labels when the PR is further along. >> >> The changes include a refresh of java.util.concurrent and ForkJoinPool from >> Doug Lea's CVS. These changes will probably be proposed and integrated in >> advance of this PR. >> >> The changes include some non-exposed and low-level infrastructure to support >> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to >> make life a bit easier and avoid having to separate VM changes and juggle >> branches at this time. > > Alan Bateman has updated the pull request incrementally with one additional > commit since the last revision: > > Refresh src/java.base/share/classes/jdk/internal/misc/InternalLock.java line 46: > 44: } else { > 45: CAN_USE_INTERNAL_LOCK = true; > 46: } Suggestion: CAN_USE_INTERNAL_LOCK = Boolean.getBoolean("jdk.io.useMonitors"); - PR: https://git.openjdk.java.net/jdk/pull/8166
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman wrote: >> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which >> JDK version to target. >> >> We will refresh this PR periodically to pick up changes and fixes from the >> loom repo. >> >> Most of the new mechanisms in the HotSpot VM are disabled by default and >> require running with `--enable-preview` to enable. >> >> The patch has support for x64 and aarch64 on the usual operating systems >> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for >> zero and some of the other ports. Additional ports can be contributed via >> PRs against the fibers branch in the loom repo. >> >> There are changes in many areas. To reduce notifications/mails, the labels >> have been trimmed down for now to hotspot, serviceability and core-libs. >> We'll add the complete set of labels when the PR is further along. >> >> The changes include a refresh of java.util.concurrent and ForkJoinPool from >> Doug Lea's CVS. These changes will probably be proposed and integrated in >> advance of this PR. >> >> The changes include some non-exposed and low-level infrastructure to support >> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to >> make life a bit easier and avoid having to separate VM changes and juggle >> branches at this time. > > Alan Bateman has updated the pull request incrementally with one additional > commit since the last revision: > > Refresh src/java.base/share/classes/jdk/internal/misc/Blocker.java line 118: > 116: static long beginCompensatedBlock(ForkJoinPool pool) { > 117: try { > 118: return (long) beginCompensatedBlock.invoke(pool); Suggestion: return (long) beginCompensatedBlock.invokeExact(pool); src/java.base/share/classes/jdk/internal/misc/Blocker.java line 126: > 124: static void endCompensatedBlock(ForkJoinPool pool, long post) { > 125: try { > 126: endCompensatedBlock.invoke(pool, post); Suggestion: endCompensatedBlock.invokeExact(pool, post); - PR: https://git.openjdk.java.net/jdk/pull/8166
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman wrote: >> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which >> JDK version to target. >> >> We will refresh this PR periodically to pick up changes and fixes from the >> loom repo. >> >> Most of the new mechanisms in the HotSpot VM are disabled by default and >> require running with `--enable-preview` to enable. >> >> The patch has support for x64 and aarch64 on the usual operating systems >> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for >> zero and some of the other ports. Additional ports can be contributed via >> PRs against the fibers branch in the loom repo. >> >> There are changes in many areas. To reduce notifications/mails, the labels >> have been trimmed down for now to hotspot, serviceability and core-libs. >> We'll add the complete set of labels when the PR is further along. >> >> The changes include a refresh of java.util.concurrent and ForkJoinPool from >> Doug Lea's CVS. These changes will probably be proposed and integrated in >> advance of this PR. >> >> The changes include some non-exposed and low-level infrastructure to support >> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to >> make life a bit easier and avoid having to separate VM changes and juggle >> branches at this time. > > Alan Bateman has updated the pull request incrementally with one additional > commit since the last revision: > > Refresh src/java.base/share/classes/java/lang/VirtualThread.java line 65: > 63: * system. > 64: */ > 65: class VirtualThread extends Thread { Suggestion: final class VirtualThread extends Thread { src/java.base/share/classes/java/lang/VirtualThread.java line 94: > 92: * RUNNING -> PARKING // Thread attempts to park > 93: * PARKING -> PARKED // yield successful, thread is parked > 94: * PARKING -> PINNED // yield failed, thread is pinned Suggestion: * PARKING -> PARKED // parking successful, thread is parked * PARKING -> PINNED // parking failed, thread is pinned - PR: https://git.openjdk.java.net/jdk/pull/8166
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman wrote: >> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which >> JDK version to target. >> >> We will refresh this PR periodically to pick up changes and fixes from the >> loom repo. >> >> Most of the new mechanisms in the HotSpot VM are disabled by default and >> require running with `--enable-preview` to enable. >> >> The patch has support for x64 and aarch64 on the usual operating systems >> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for >> zero and some of the other ports. Additional ports can be contributed via >> PRs against the fibers branch in the loom repo. >> >> There are changes in many areas. To reduce notifications/mails, the labels >> have been trimmed down for now to hotspot, serviceability and core-libs. >> We'll add the complete set of labels when the PR is further along. >> >> The changes include a refresh of java.util.concurrent and ForkJoinPool from >> Doug Lea's CVS. These changes will probably be proposed and integrated in >> advance of this PR. >> >> The changes include some non-exposed and low-level infrastructure to support >> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to >> make life a bit easier and avoid having to separate VM changes and juggle >> branches at this time. > > Alan Bateman has updated the pull request incrementally with one additional > commit since the last revision: > > Refresh src/java.base/share/classes/java/lang/Thread.java line 3028: > 3026: > 3027: /** The thread container that this thread is in */ > 3028: @Stable private volatile ThreadContainer container; The `volatile` modifier with `@Stable` is unusual. IIUC the field is set once on start (behaves as if a final field). Once C2 gets hold of it and can treat the value as a runtime constant the `volatile` has no effect. This is a tricky area but i think a store fence will suffice in `setThreadContainer` since the value goes from null to non-null and therefore cannot be speculated upon. - PR: https://git.openjdk.java.net/jdk/pull/8166
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman wrote: >> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which >> JDK version to target. >> >> We will refresh this PR periodically to pick up changes and fixes from the >> loom repo. >> >> Most of the new mechanisms in the HotSpot VM are disabled by default and >> require running with `--enable-preview` to enable. >> >> The patch has support for x64 and aarch64 on the usual operating systems >> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for >> zero and some of the other ports. Additional ports can be contributed via >> PRs against the fibers branch in the loom repo. >> >> There are changes in many areas. To reduce notifications/mails, the labels >> have been trimmed down for now to hotspot, serviceability and core-libs. >> We'll add the complete set of labels when the PR is further along. >> >> The changes include a refresh of java.util.concurrent and ForkJoinPool from >> Doug Lea's CVS. These changes will probably be proposed and integrated in >> advance of this PR. >> >> The changes include some non-exposed and low-level infrastructure to support >> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to >> make life a bit easier and avoid having to separate VM changes and juggle >> branches at this time. > > Alan Bateman has updated the pull request incrementally with one additional > commit since the last revision: > > Refresh src/java.base/share/classes/java/lang/Thread.java line 213: > 211: > 212: // Additional fields for platform threads > 213: private static class FieldHolder { All but the task field are explicitly known by the VM (and stillborn is not accessed by Java code. Suggest grouping accordingly. - PR: https://git.openjdk.java.net/jdk/pull/8166
Re: RFR: 8284161: Implementation of Virtual Threads (Preview) [v2]
On Wed, 13 Apr 2022 14:21:20 GMT, Alan Bateman wrote: >> This is the implementation of JEP 425: Virtual Threads (Preview); TBD which >> JDK version to target. >> >> We will refresh this PR periodically to pick up changes and fixes from the >> loom repo. >> >> Most of the new mechanisms in the HotSpot VM are disabled by default and >> require running with `--enable-preview` to enable. >> >> The patch has support for x64 and aarch64 on the usual operating systems >> (Linux, macOS, and Windows). There are stubs (calling Unimplemented) for >> zero and some of the other ports. Additional ports can be contributed via >> PRs against the fibers branch in the loom repo. >> >> There are changes in many areas. To reduce notifications/mails, the labels >> have been trimmed down for now to hotspot, serviceability and core-libs. >> We'll add the complete set of labels when the PR is further along. >> >> The changes include a refresh of java.util.concurrent and ForkJoinPool from >> Doug Lea's CVS. These changes will probably be proposed and integrated in >> advance of this PR. >> >> The changes include some non-exposed and low-level infrastructure to support >> the (in draft) JEPs for Structured Concurrency and Scope Locals. This is to >> make life a bit easier and avoid having to separate VM changes and juggle >> branches at this time. > > Alan Bateman has updated the pull request incrementally with one additional > commit since the last revision: > > Refresh src/java.base/share/classes/java/lang/Thread.java line 862: > 860: * @param start the starting value of the counter > 861: * @return this builder > 862: * @throws IllegalArgumentException if count is negative Suggestion: * @throws IllegalArgumentException if start is negative - PR: https://git.openjdk.java.net/jdk/pull/8166
Integrated: 8283892: Compress and expand bits
On Tue, 5 Apr 2022 22:05:19 GMT, Paul Sandoz wrote: > Add support to compress bits and expand bits of `int` and `long` values, see > Hacker's Delight (2nd edition), section 7.4. > > Compressing or expanding bits of an `int` or `long` value can be composed to > enable general permutations, and the "sheep and goats" operation (SAG) see > Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a > stable binary radix sort. > > The compress and expand functionality maps efficiently to hardware > instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the > implementations can be very efficient on supporting hardware. > Intrinsification will occur in a separate PR. > > This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the > beneficial performance impact of the `PDEP` instruction, and by extension the > `expand` method, when applied to the implementation of a bit-vector select > operation in succinct data structures (for example `select(r)` returns the > position of the `r`th 1). > > Testing-wise the approach take is three fold: > 1. Tests compared against simple implementations that are easy to read and > verify against the JDK implementations (which later will also be made > intrinsic). To compensate all tests are also run flipping the test methods > and the methods under test. > 2. Tests composed of compress and expand and vice versa. > 3. Tests with known mask patterns, whose expected values are easily derived > from the inputs. This pull request has now been integrated. Changeset: fbb09160 Author:Paul Sandoz URL: https://git.openjdk.java.net/jdk/commit/fbb09160906b4d9b0a29c8e99465f12ad16d4c88 Stats: 1031 lines in 7 files changed: 1024 ins; 5 del; 2 mod 8283892: Compress and expand bits Reviewed-by: alanb, redestad - PR: https://git.openjdk.java.net/jdk/pull/8115
Re: RFR: 8280915: Better parallelization for AbstractSpliterator and IteratorSpliterator when size is unknown [v5]
On Thu, 10 Feb 2022 04:30:34 GMT, Tagir F. Valeev wrote: >> See the bug description for details. >> >> I propose a simple solution. Let's allow ArraySpliterator to be non-SIZED >> and report artificial estimatedSize(), much bigger than the real one. This >> will allow AbstractSpliterator and IteratorSpliterator to produce prefix >> whose size is comparable to Long.MAX_VALUE (say, starting with >> Long.MAX_VALUE/2), and this will enable further splitting of the prefix. >> This change will drastically improve parallel streaming for affected streams >> of size <= 1024 and significantly improve for streams of size 1025..2. >> The cost is higher-grained splitting for huge streams of unknown size. This >> might add a minor overhead for such scenarios which, I believe, is >> completely tolerable. >> >> No public API changes are necessary, sequential processing should not be >> affected, except an extra field in ArraySpliterator which increases a >> footprint by 8 bytes. >> >> I added a simple test using an artificial collector to ensure that at least >> two non-empty parts are created when parallelizing Stream.iterate source. >> More testing ideas are welcome. > > Tagir F. Valeev has updated the pull request incrementally with two > additional commits since the last revision: > > - Update copyright year > - Cosmetic fixes Getting back to this after much delay! Approving. But, i would like to try and document this design decision in comments, and maybe in implementation notes. We can do that as a follow on PR. - Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7279
Re: RFR: 8284579: Improve VarHandle checks for interpreter [v2]
On Mon, 11 Apr 2022 10:03:58 GMT, Claes Redestad wrote: >>> How would the performance change if the `isDirect` and >>> `checkExactAccessMode` merger was reverted? >> >> Add around 15-20ns/op for these micros. > > Restructuring so that we only check `direct` once sounds reasonable at face > value but would be a lot of churn for little gain (even in the interpreter > testing a local boolean field is fast, and JITs will optimize this well). > > The `LINK_TO_STATIC_ARGS_V` in the code generator seems like a remnant from > an earlier iteration of this code. The `vform.getMemberName_V` method the > code generator would emit a call to doesn't even exist. This should probably > be cleaned up, separately. @PaulSandoz, WDYT? I think it was some left-over from prototyping and was not cleaned up. AFAICT `getMemberName_V` never existed in an committed code. It's OK to remove the `LINK_TO_STATIC_ARGS_V` as part of this commit of you wish. - PR: https://git.openjdk.java.net/jdk/pull/8160
Re: RFR: 8283892: Compress and expand bits [v7]
> Add support to compress bits and expand bits of `int` and `long` values, see > Hacker's Delight (2nd edition), section 7.4. > > Compressing or expanding bits of an `int` or `long` value can be composed to > enable general permutations, and the "sheep and goats" operation (SAG) see > Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a > stable binary radix sort. > > The compress and expand functionality maps efficiently to hardware > instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the > implementations can be very efficient on supporting hardware. > Intrinsification will occur in a separate PR. > > This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the > beneficial performance impact of the `PDEP` instruction, and by extension the > `expand` method, when applied to the implementation of a bit-vector select > operation in succinct data structures (for example `select(r)` returns the > position of the `r`th 1). > > Testing-wise the approach take is three fold: > 1. Tests compared against simple implementations that are easy to read and > verify against the JDK implementations (which later will also be made > intrinsic). To compensate all tests are also run flipping the test methods > and the methods under test. > 2. Tests composed of compress and expand and vice versa. > 3. Tests with known mask patterns, whose expected values are easily derived > from the inputs. Paul Sandoz has updated the pull request incrementally with one additional commit since the last revision: JavaDoc typo - Changes: - all: https://git.openjdk.java.net/jdk/pull/8115/files - new: https://git.openjdk.java.net/jdk/pull/8115/files/f9b7b2bc..8a52eabe Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=8115=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8115=05-06 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/8115.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8115/head:pull/8115 PR: https://git.openjdk.java.net/jdk/pull/8115
Re: RFR: 8283892: Compress and expand bits [v6]
> Add support to compress bits and expand bits of `int` and `long` values, see > Hacker's Delight (2nd edition), section 7.4. > > Compressing or expanding bits of an `int` or `long` value can be composed to > enable general permutations, and the "sheep and goats" operation (SAG) see > Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a > stable binary radix sort. > > The compress and expand functionality maps efficiently to hardware > instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the > implementations can be very efficient on supporting hardware. > Intrinsification will occur in a separate PR. > > This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the > beneficial performance impact of the `PDEP` instruction, and by extension the > `expand` method, when applied to the implementation of a bit-vector select > operation in succinct data structures (for example `select(r)` returns the > position of the `r`th 1). > > Testing-wise the approach take is three fold: > 1. Tests compared against simple implementations that are easy to read and > verify against the JDK implementations (which later will also be made > intrinsic). To compensate all tests are also run flipping the test methods > and the methods under test. > 2. Tests composed of compress and expand and vice versa. > 3. Tests with known mask patterns, whose expected values are easily derived > from the inputs. Paul Sandoz has updated the pull request incrementally with one additional commit since the last revision: Refer to hexadecimal digit - Changes: - all: https://git.openjdk.java.net/jdk/pull/8115/files - new: https://git.openjdk.java.net/jdk/pull/8115/files/ee062537..f9b7b2bc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=8115=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8115=04-05 Stats: 18 lines in 2 files changed: 0 ins; 0 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/8115.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8115/head:pull/8115 PR: https://git.openjdk.java.net/jdk/pull/8115
Re: RFR: 8283892: Compress and expand bits [v5]
> Add support to compress bits and expand bits of `int` and `long` values, see > Hacker's Delight (2nd edition), section 7.4. > > Compressing or expanding bits of an `int` or `long` value can be composed to > enable general permutations, and the "sheep and goats" operation (SAG) see > Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a > stable binary radix sort. > > The compress and expand functionality maps efficiently to hardware > instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the > implementations can be very efficient on supporting hardware. > Intrinsification will occur in a separate PR. > > This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the > beneficial performance impact of the `PDEP` instruction, and by extension the > `expand` method, when applied to the implementation of a bit-vector select > operation in succinct data structures (for example `select(r)` returns the > position of the `r`th 1). > > Testing-wise the approach take is three fold: > 1. Tests compared against simple implementations that are easy to read and > verify against the JDK implementations (which later will also be made > intrinsic). To compensate all tests are also run flipping the test methods > and the methods under test. > 2. Tests composed of compress and expand and vice versa. > 3. Tests with known mask patterns, whose expected values are easily derived > from the inputs. Paul Sandoz has updated the pull request incrementally with one additional commit since the last revision: Updates to examples - Changes: - all: https://git.openjdk.java.net/jdk/pull/8115/files - new: https://git.openjdk.java.net/jdk/pull/8115/files/96d90e1a..ee062537 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=8115=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8115=03-04 Stats: 40 lines in 2 files changed: 10 ins; 2 del; 28 mod Patch: https://git.openjdk.java.net/jdk/pull/8115.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8115/head:pull/8115 PR: https://git.openjdk.java.net/jdk/pull/8115
Re: RFR: 8283892: Compress and expand bits [v4]
On Thu, 7 Apr 2022 20:02:51 GMT, Alan Bateman wrote: >> Examples added in latest commit. > > I see you've added a comment on each example too, I think this really helpers > readers to see how they can be used. > > The class description of both Integer and Long have an implementation note > (they pre-date @implNote) referencing Hacker's Delight. We could potentially > expand the list of methods mentioned but it's not strictly necessary are they > are just examples. I also saw that on the class doc and considered the new methods fit neatly under the category of "bit twiddling" methods. - PR: https://git.openjdk.java.net/jdk/pull/8115
Re: RFR: 8283892: Compress and expand bits [v4]
On Wed, 6 Apr 2022 18:22:45 GMT, John R Rose wrote: >> Paul Sandoz has updated the pull request incrementally with two additional >> commits since the last revision: >> >> - Fix typo. >> - Provide examples. > > src/java.base/share/classes/java/lang/Integer.java line 1781: > >> 1779: * All the upper remaining bits of the compressed value are set >> 1780: * to zero. >> 1781: * > > Following Alan's comment, I suggest the following examples: > > > compress(0x9ABCDEF, 0x0F0F0FF) == 0xACEF > compress(x, 1<>n & 1) > compress(x, -1<>> n > compress(m, m) == (m==-1||m==0)? m : (1< compress(x, m) == compress(x & m, m) > compress(expand(x, m), m) == x & compress(m, m) > (compress(x, m) >>> n) & 1 == /*the bit of x corresponding to the nth set > bit in m*/ > > > …In two places. Similarly, examples for expand: > > > expand(0x9ABCDEF, 0x0F0F0FF) == 0xC0D0EF > expand(x, 1< expand(x, -1< expand(-1, m) == m > expand(x, m) == expand(x, m) & m > expand(compress(x, m), m) == x & m > expand(1< highest/lowestOneBit*/ > > > (Please double check these examples!) Examples added in latest commit. - PR: https://git.openjdk.java.net/jdk/pull/8115
Re: RFR: 8283892: Compress and expand bits [v4]
> Add support to compress bits and expand bits of `int` and `long` values, see > Hacker's Delight (2nd edition), section 7.4. > > Compressing or expanding bits of an `int` or `long` value can be composed to > enable general permutations, and the "sheep and goats" operation (SAG) see > Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a > stable binary radix sort. > > The compress and expand functionality maps efficiently to hardware > instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the > implementations can be very efficient on supporting hardware. > Intrinsification will occur in a separate PR. > > This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the > beneficial performance impact of the `PDEP` instruction, and by extension the > `expand` method, when applied to the implementation of a bit-vector select > operation in succinct data structures (for example `select(r)` returns the > position of the `r`th 1). > > Testing-wise the approach take is three fold: > 1. Tests compared against simple implementations that are easy to read and > verify against the JDK implementations (which later will also be made > intrinsic). To compensate all tests are also run flipping the test methods > and the methods under test. > 2. Tests composed of compress and expand and vice versa. > 3. Tests with known mask patterns, whose expected values are easily derived > from the inputs. Paul Sandoz has updated the pull request incrementally with two additional commits since the last revision: - Fix typo. - Provide examples. - Changes: - all: https://git.openjdk.java.net/jdk/pull/8115/files - new: https://git.openjdk.java.net/jdk/pull/8115/files/da49874b..96d90e1a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=8115=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8115=02-03 Stats: 186 lines in 2 files changed: 186 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/8115.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8115/head:pull/8115 PR: https://git.openjdk.java.net/jdk/pull/8115
Re: RFR: 8283892: Compress and expand bits [v3]
> Add support to compress bits and expand bits of `int` and `long` values, see > Hacker's Delight (2nd edition), section 7.4. > > Compressing or expanding bits of an `int` or `long` value can be composed to > enable general permutations, and the "sheep and goats" operation (SAG) see > Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a > stable binary radix sort. > > The compress and expand functionality maps efficiently to hardware > instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the > implementations can be very efficient on supporting hardware. > Intrinsification will occur in a separate PR. > > This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the > beneficial performance impact of the `PDEP` instruction, and by extension the > `expand` method, when applied to the implementation of a bit-vector select > operation in succinct data structures (for example `select(r)` returns the > position of the `r`th 1). > > Testing-wise the approach take is three fold: > 1. Tests compared against simple implementations that are easy to read and > verify against the JDK implementations (which later will also be made > intrinsic). To compensate all tests are also run flipping the test methods > and the methods under test. > 2. Tests composed of compress and expand and vice versa. > 3. Tests with known mask patterns, whose expected values are easily derived > from the inputs. Paul Sandoz has updated the pull request incrementally with five additional commits since the last revision: - Merge remote-tracking branch 'cl4es/expand_a_bit_more' into compress-expand-bits - Rename partx -> maskPrefixx, add ForceInline - Fix copy-paste error in Long.expand - Add microbenchmarks and manually unroll expand implementations - Consistent bounds. - Changes: - all: https://git.openjdk.java.net/jdk/pull/8115/files - new: https://git.openjdk.java.net/jdk/pull/8115/files/8c536de6..da49874b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=8115=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8115=01-02 Stats: 163 lines in 5 files changed: 107 ins; 11 del; 45 mod Patch: https://git.openjdk.java.net/jdk/pull/8115.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8115/head:pull/8115 PR: https://git.openjdk.java.net/jdk/pull/8115
Re: RFR: 8283892: Compress and expand bits [v2]
On Thu, 7 Apr 2022 14:01:53 GMT, Claes Redestad wrote: >> Paul Sandoz has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Doc and test updates. > > I experimented with this and drafted a few microbenchmarks along and a > micro-optimization to the expand methods (see #8146). Up to you, but I think > it makes sense to consider such optimizations to the plain java > implementation given that 1) not all platforms have specialized instructions > where an intrinsic will make sense and 2) it appears to be a boost to warmup. @cl4es thanks, your changes to `expand` looks reasonable, given the array allocation cost. - PR: https://git.openjdk.java.net/jdk/pull/8115
Re: RFR: 8283892: Compress and expand bits [v2]
On Wed, 6 Apr 2022 17:39:48 GMT, John R Rose wrote: >> Paul Sandoz has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Doc and test updates. > > test/jdk/java/lang/AbstractCompressExpandTest.java line 145: > >> 143: >> 144: int i = 0; >> 145: for (int len = 0; len < 32; len++) { > > Lengths should be `len = 1; len <= 32; len++`, generating fewer redundant > zero-length masks and a full-length -1 mask. The pos should be `pos = 0; pos > <= 32 - len; pos++`, generating fully left-justified masks of the form > `-1< > Also, you should generate a zero-mask, just once. I think initializing `int > i = 1` will do that trick. > > (Same comment in two places.) Done. - PR: https://git.openjdk.java.net/jdk/pull/8115
Re: RFR: 8283892: Compress and expand bits [v2]
On Wed, 6 Apr 2022 17:43:34 GMT, John R Rose wrote: >> Paul Sandoz has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Doc and test updates. > > test/jdk/java/lang/AbstractCompressExpandTest.java line 104: > >> 102: abstract long expectedExpand(long i, long mask); >> 103: >> 104: static int SIZE = 1024; > > It feels like `SIZE` should be a constructor parameter, to make it easier to > run ad hoc stress tests. That's tricky because the test is TestNG based, launched via jtreg, it would need to be a system property. - PR: https://git.openjdk.java.net/jdk/pull/8115
Re: RFR: 8283892: Compress and expand bits [v2]
> Add support to compress bits and expand bits of `int` and `long` values, see > Hacker's Delight (2nd edition), section 7.4. > > Compressing or expanding bits of an `int` or `long` value can be composed to > enable general permutations, and the "sheep and goats" operation (SAG) see > Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a > stable binary radix sort. > > The compress and expand functionality maps efficiently to hardware > instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the > implementations can be very efficient on supporting hardware. > Intrinsification will occur in a separate PR. > > This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the > beneficial performance impact of the `PDEP` instruction, and by extension the > `expand` method, when applied to the implementation of a bit-vector select > operation in succinct data structures (for example `select(r)` returns the > position of the `r`th 1). > > Testing-wise the approach take is three fold: > 1. Tests compared against simple implementations that are easy to read and > verify against the JDK implementations (which later will also be made > intrinsic). To compensate all tests are also run flipping the test methods > and the methods under test. > 2. Tests composed of compress and expand and vice versa. > 3. Tests with known mask patterns, whose expected values are easily derived > from the inputs. Paul Sandoz has updated the pull request incrementally with one additional commit since the last revision: Doc and test updates. - Changes: - all: https://git.openjdk.java.net/jdk/pull/8115/files - new: https://git.openjdk.java.net/jdk/pull/8115/files/9516ecca..8c536de6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=8115=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8115=00-01 Stats: 33 lines in 3 files changed: 17 ins; 2 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/8115.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8115/head:pull/8115 PR: https://git.openjdk.java.net/jdk/pull/8115
Re: RFR: 8283892: Compress and expand bits
On Wed, 6 Apr 2022 14:58:37 GMT, Alan Bateman wrote: >> Add support to compress bits and expand bits of `int` and `long` values, see >> Hacker's Delight (2nd edition), section 7.4. >> >> Compressing or expanding bits of an `int` or `long` value can be composed to >> enable general permutations, and the "sheep and goats" operation (SAG) see >> Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a >> stable binary radix sort. >> >> The compress and expand functionality maps efficiently to hardware >> instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the >> implementations can be very efficient on supporting hardware. >> Intrinsification will occur in a separate PR. >> >> This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the >> beneficial performance impact of the `PDEP` instruction, and by extension >> the `expand` method, when applied to the implementation of a bit-vector >> select operation in succinct data structures (for example `select(r)` >> returns the position of the `r`th 1). >> >> Testing-wise the approach take is three fold: >> 1. Tests compared against simple implementations that are easy to read and >> verify against the JDK implementations (which later will also be made >> intrinsic). To compensate all tests are also run flipping the test methods >> and the methods under test. >> 2. Tests composed of compress and expand and vice versa. >> 3. Tests with known mask patterns, whose expected values are easily derived >> from the inputs. > > src/java.base/share/classes/java/lang/Integer.java line 1775: > >> 1773: * the specified bit mask. >> 1774: * >> 1775: * For each one-bit value of the mask, {@code mb} say, from least > > A minor comments is that "For each one-bit value of the mask mb " might > be a bit better, otherwise I think these methods and their javadoc looks > good. If it comes up then these methods could include an example in the > javadoc as they aren't hard once you see an example. I can change to "For each one-bit value {@code mb} of the mask ..." - PR: https://git.openjdk.java.net/jdk/pull/8115
RFR: 8283892: Compress and expand bits
Add support to compress bits and expand bits of `int` and `long` values, see Hacker's Delight (2nd edition), section 7.4. Compressing or expanding bits of an `int` or `long` value can be composed to enable general permutations, and the "sheep and goats" operation (SAG) see Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a stable binary radix sort. The compress and expand functionality maps efficiently to hardware instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the implementations can be very efficient on supporting hardware. Intrinsification will occur in a separate PR. This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the beneficial performance impact of the `PDEP` instruction, and by extension the `expand` method, when applied to the implementation of a bit-vector select operation in succinct data structures (for example `select(r)` returns the position of the `r`th 1). Testing-wise the approach take is three fold: 1. Tests compared against simple implementations that are easy to read and verify against the JDK implementations (which later will also be made intrinsic). To compensate all tests are also run flipping the test methods and the methods under test. 2. Tests composed of compress and expand and vice versa. 3. Tests with known mask patterns, whose expected values are easily derived from the inputs. - Commit messages: - Format. - License update. - Update test. - Update tests. - JavaDoc. - Remove byte/short impls. Update tests. - Compositional test. - Fix array size. - Basic test. - Initial commit Changes: https://git.openjdk.java.net/jdk/pull/8115/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk=8115=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8283892 Stats: 726 lines in 5 files changed: 719 ins; 5 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/8115.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8115/head:pull/8115 PR: https://git.openjdk.java.net/jdk/pull/8115
Re: RFR: 8282191: Implementation of Foreign Function & Memory API (Preview) [v16]
On Wed, 30 Mar 2022 21:51:16 GMT, Maurizio Cimadamore wrote: >> This PR contains the API and implementation changes for JEP-424 [1]. A more >> detailed description of such changes, to avoid repetitions during the review >> process, is included as a separate comment. >> >> [1] - https://openjdk.java.net/jeps/424 > > Maurizio Cimadamore has updated the pull request with a new target base due > to a merge or a rebase. The pull request now contains 31 commits: > > - Merge branch 'master' into foreign-preview > - Tweak FunctionDescriptor::argumentLayouts to return an immutable list > - Fix bad usage of `@link` with primitive array types > - Switch to daemon threads for async upcalls > - Use thread local storage to optimize attach of async threads > - Drop support for Constable from MemoryLayout/FunctionDescriptor > - Merge branch 'master' into foreign-preview > - Revert changes to RunTests.gmk > - Add --enable-preview to micro benchmark java options > - Address more review comments > - ... and 21 more: > https://git.openjdk.java.net/jdk/compare/ce27d9dd...247e5eb5 Java code looks good (i did not go through the tests). As is common no comments, since code was reviewed in smaller steps in the panama-foreign respository. - Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7888
Re: RFR: 8282162: [vector] Optimize integral vector negation API [v3]
On Mon, 28 Mar 2022 09:56:22 GMT, Xiaohong Gong wrote: >> The current vector `"NEG"` is implemented with substraction a vector by zero >> in case the architecture does not support the negation instruction. And to >> fit the predicate feature for architectures that support it, the masked >> vector `"NEG" ` is implemented with pattern `"v.not(m).add(1, m)"`. They >> both can be optimized to a single negation instruction for ARM SVE. >> And so does the non-masked "NEG" for NEON. Besides, implementing the masked >> "NEG" with substraction for architectures that support neither negation >> instruction nor predicate feature can also save several instructions than >> the current pattern. >> >> To optimize the VectorAPI negation, this patch moves the implementation from >> Java side to hotspot. The compiler will generate different nodes according >> to the architecture: >> - Generate the (predicated) negation node if architecture supports it, >> otherwise, generate "`zero.sub(v)`" pattern for non-masked operation. >> - Generate `"zero.sub(v, m)"` for masked operation if the architecture >> does not have predicate feature, otherwise generate the original pattern >> `"v.xor(-1, m).add(1, m)"`. >> >> So with this patch, the following transformations are applied: >> >> For non-masked negation with NEON: >> >> moviv16.4s, #0x0 >> sub v17.4s, v16.4s, v17.4s ==> neg v17.4s, v17.4s >> >> and with SVE: >> >> mov z16.s, #0 >> sub z18.s, z16.s, z17.s ==> neg z16.s, p7/m, z16.s >> >> For masked negation with NEON: >> >> moviv17.4s, #0x1 >> mvn v19.16b, v18.16b >> mov v20.16b, v16.16b ==> neg v18.4s, v17.4s >> bsl v20.16b, v19.16b, v18.16b bsl v19.16b, v18.16b, v17.16b >> add v19.4s, v20.4s, v17.4s >> mov v18.16b, v16.16b >> bsl v18.16b, v19.16b, v20.16b >> >> and with SVE: >> >> mov z16.s, #-1 >> mov z17.s, #1==> neg z16.s, p0/m, z16.s >> eor z18.s, p0/m, z18.s, z16.s >> add z18.s, p0/m, z18.s, z17.s >> >> Here are the performance gains for benchmarks (see [1][2]) on ARM and x86 >> machines(note that the non-masked negation benchmarks do not have any >> improvement on X86 since no instructions are changed): >> >> NEON: >> BenchmarkGain >> Byte128Vector.NEG1.029 >> Byte128Vector.NEGMasked 1.757 >> Short128Vector.NEG 1.041 >> Short128Vector.NEGMasked 1.659 >> Int128Vector.NEG 1.005 >> Int128Vector.NEGMasked 1.513 >> Long128Vector.NEG1.003 >> Long128Vector.NEGMasked 1.878 >> >> SVE with 512-bits: >> BenchmarkGain >> ByteMaxVector.NEG1.10 >> ByteMaxVector.NEGMasked 1.165 >> ShortMaxVector.NEG 1.056 >> ShortMaxVector.NEGMasked 1.195 >> IntMaxVector.NEG 1.002 >> IntMaxVector.NEGMasked 1.239 >> LongMaxVector.NEG1.031 >> LongMaxVector.NEGMasked 1.191 >> >> X86 (non AVX-512): >> BenchmarkGain >> ByteMaxVector.NEGMasked 1.254 >> ShortMaxVector.NEGMasked 1.359 >> IntMaxVector.NEGMasked 1.431 >> LongMaxVector.NEGMasked 1.989 >> >> [1] >> https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Byte128Vector.java#L1881 >> [2] >> https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Byte128Vector.java#L1896 > > Xiaohong Gong has updated the pull request incrementally with one additional > commit since the last revision: > > Make "degenerate_vector_integral_negate" to be "NegVI" private Java changes are good. - Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7782
Re: RFR: JDK-8282776: Bad NullPointerException message when invoking an interface MethodHandle on a null receiver [v4]
On Wed, 23 Mar 2022 23:22:31 GMT, Mandy Chung wrote: >> A simple patch to call `Objects.requireNonNull(recv)` for an explicit null >> receiver check rather than NPE thrown by `Object::getClass`. The message of >> NPE generated by JEP 358 (Helpful NullPointerExceptions) is supposed to be >> helpful but not in this case. > > Mandy Chung has updated the pull request incrementally with one additional > commit since the last revision: > > per feedback Moving the check under the `!isInstance(recv)` check is better. - Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7766
Re: RFR: 8283470: Update java.lang.invoke.VarHandle to use sealed classes
On Wed, 23 Mar 2022 16:27:29 GMT, Mandy Chung wrote: > This patch changes VarHandle and its implementation hierarchy to use sealed > classes. All VarHandle permitted classes are package-private and either > final or sealed abstract classes. > > Please also review the CSR: https://bugs.openjdk.java.net/browse/JDK-8283540 Marked as reviewed by psandoz (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/7926
Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2]
On Tue, 8 Mar 2022 15:13:46 GMT, Claes Redestad wrote: >> Ludovic Henry has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Add UTF-16 benchmarks > > An awkward effect of this implementation is that it perturbs results on small > Strings a bit. Adding a branch in the trivial case, but also regressing on > certain lengths (try size=7). The added complexity seem to be no issue for > JITs in these microbenchmarks, but I worry that the increased code size might > play tricks with inlining heuristics in real applications. > > After chatting a bit with @richardstartin regarding the observation that > preventing a strength reduction on the constant 31 value being part of the > improvement I devised an experiment which simply makes the 31 non-constant as > to disable the strength reduction: > > private static int base = 31; > @Benchmark > public int scalarLatin1_NoStrengthReduction() { > int h = 0; > int i = 0, len = latin1.length; > for (; i < len; i++) { > h = base * h + (latin1[i] & 0xff); > } > return h; > } > > Interestingly results of that get planted in the middle of the baseline on > large inputs, while avoiding most of the irregularities on small inputs > compared to manually unrolled versions: > > Benchmark (size) Mode Cnt > Score Error Units > StringHashCode.Algorithm.scalarLatin1 1 avgt3 > 2.910 ? 0.068 ns/op > StringHashCode.Algorithm.scalarLatin1 7 avgt3 > 6.530 ? 0.065 ns/op > StringHashCode.Algorithm.scalarLatin1 8 avgt3 > 7.472 ? 0.034 ns/op > StringHashCode.Algorithm.scalarLatin1 15 avgt3 > 12.750 ? 0.028 ns/op > StringHashCode.Algorithm.scalarLatin1 100 avgt3 > 99.190 ? 0.618 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled16 1 avgt3 > 3.119 ? 0.015 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled16 7 avgt3 > 11.556 ? 4.690 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled16 8 avgt3 > 7.740 ? 0.005 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled1615 avgt3 > 13.030 ? 0.124 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled16 100 avgt3 > 46.470 ? 0.496 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled8 1 avgt3 > 3.123 ? 0.057 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled8 7 avgt3 > 11.380 ? 0.085 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled8 8 avgt3 > 5.849 ? 0.583 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled8 15 avgt3 > 12.312 ? 0.025 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled8100 avgt3 > 45.751 ? 0.146 ns/op > StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction 1 avgt3 > 3.173 ? 0.015 ns/op > StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction 7 avgt3 > 5.229 ? 0.455 ns/op > StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction 8 avgt3 > 5.679 ? 0.012 ns/op > StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction 15 avgt3 > 8.731 ? 0.103 ns/op > StringHashCode.Algorithm.scalarLatin1_NoStrengthReduction 100 avgt3 > 70.954 ? 3.386 ns/op > > I wonder if this might be a safer play while we investigate intrinsification > and other possible enhancements? @cl4es Yes, we would need to carefully measure the impact for small array sizes (similar to what we had to do when the array mismatch intrinsic was implemented and applied to array equals). My sense is to focus on the intrinsic and also look for potential opportunities like @merykitty points out, as that is where the larger impact is, although it is more work! - PR: https://git.openjdk.java.net/jdk/pull/7700
Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2]
On Fri, 4 Mar 2022 17:44:44 GMT, Ludovic Henry wrote: >> Despite the hash value being cached for Strings, computing the hash still >> represents a significant CPU usage for applications handling lots of text. >> >> Even though it would be generally better to do it through an enhancement to >> the autovectorizer, the complexity of doing it by hand is trivial and the >> gain is sizable (2x speedup) even without the Vector API. The algorithm has >> been proposed by Richard Startin and Paul Sandoz [1]. >> >> Speedup are as follows on a `Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz` >> >> >> Benchmark(size) Mode Cnt Score >>Error Units >> StringHashCode.Algorithm.scalarLatin1 0 avgt 25 2.111 >> ± 0.210 ns/op >> StringHashCode.Algorithm.scalarLatin1 1 avgt 25 3.500 >> ± 0.127 ns/op >> StringHashCode.Algorithm.scalarLatin110 avgt 25 7.001 >> ± 0.099 ns/op >> StringHashCode.Algorithm.scalarLatin1 100 avgt 2561.285 >> ± 0.444 ns/op >> StringHashCode.Algorithm.scalarLatin1 1000 avgt 25 628.995 >> ± 0.846 ns/op >> StringHashCode.Algorithm.scalarLatin1 1 avgt 25 6307.990 >> ± 4.071 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 0 avgt 25 2.358 >> ± 0.092 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 1 avgt 25 3.631 >> ± 0.159 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 10 avgt 25 7.049 >> ± 0.019 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 100 avgt 2533.626 >> ± 1.218 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled161000 avgt 25 317.811 >> ± 1.225 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 1 avgt 25 3212.333 >> ± 14.621 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled80 avgt 25 2.356 >> ± 0.097 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled81 avgt 25 3.630 >> ± 0.158 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled8 10 avgt 25 8.724 >> ± 0.065 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled8 100 avgt 2532.402 >> ± 0.019 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled8 1000 avgt 25 321.949 >> ± 0.251 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled81 avgt 25 3202.083 >> ± 1.667 ns/op >> StringHashCode.Algorithm.scalarUTF16 0 avgt 25 2.135 >> ± 0.191 ns/op >> StringHashCode.Algorithm.scalarUTF16 1 avgt 25 5.202 >> ± 0.362 ns/op >> StringHashCode.Algorithm.scalarUTF16 10 avgt 2511.105 >> ± 0.112 ns/op >> StringHashCode.Algorithm.scalarUTF16100 avgt 2575.974 >> ± 0.702 ns/op >> StringHashCode.Algorithm.scalarUTF16 1000 avgt 25 716.429 >> ± 3.290 ns/op >> StringHashCode.Algorithm.scalarUTF16 1 avgt 25 7095.459 >> ± 43.847 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled160 avgt 25 2.381 >> ± 0.038 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled161 avgt 25 5.268 >> ± 0.422 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled16 10 avgt 2511.248 >> ± 0.178 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled16 100 avgt 2552.966 >> ± 0.089 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled16 1000 avgt 25 450.912 >> ± 1.834 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled161 avgt 25 4403.988 >> ± 2.927 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 0 avgt 25 2.401 >> ± 0.032 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 1 avgt 25 5.091 >> ± 0.396 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled810 avgt 2512.801 >> ± 0.189 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 100 avgt 2552.068 >> ± 0.032 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 1000 avgt 25 453.270 >> ± 0.340 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 1 avgt 25 4433.112 >> ± 2.699 ns/op >> >> >> At Datadog, we handle a great amount of text (through logs management for >> example), and hashing String represents a large part of our CPU usage. It's &g
Re: RFR: 8280915: Better parallelization for AbstractSpliterator and IteratorSpliterator when size is unknown [v5]
On Thu, 10 Feb 2022 04:30:34 GMT, Tagir F. Valeev wrote: >> See the bug description for details. >> >> I propose a simple solution. Let's allow ArraySpliterator to be non-SIZED >> and report artificial estimatedSize(), much bigger than the real one. This >> will allow AbstractSpliterator and IteratorSpliterator to produce prefix >> whose size is comparable to Long.MAX_VALUE (say, starting with >> Long.MAX_VALUE/2), and this will enable further splitting of the prefix. >> This change will drastically improve parallel streaming for affected streams >> of size <= 1024 and significantly improve for streams of size 1025..2. >> The cost is higher-grained splitting for huge streams of unknown size. This >> might add a minor overhead for such scenarios which, I believe, is >> completely tolerable. >> >> No public API changes are necessary, sequential processing should not be >> affected, except an extra field in ArraySpliterator which increases a >> footprint by 8 bytes. >> >> I added a simple test using an artificial collector to ensure that at least >> two non-empty parts are created when parallelizing Stream.iterate source. >> More testing ideas are welcome. > > Tagir F. Valeev has updated the pull request incrementally with two > additional commits since the last revision: > > - Update copyright year > - Cosmetic fixes For unknown-sized iterators we have to bias towards some strategy (recognizing that the source is a poor choice for parallelism). Currently the "peeled" split containing an array of elements (copied from the iterator) is never split, since the array length will never in practice be greater than the size threshold. The parallelism is derived from the splits of the iterator, and this is biased towards a large number of elements. In your approach each "peeled" split effectively gets to use half of the total parallelism. Over subsequent splits of the source that's gonna over provision tasks, specifically after two splits (or 2 * 2^10 + 2^10 elements), and this is biased towards a smaller number of elements. In the issue you point out a number of relevant stream sources. In practice they are unlikely to have millions of elements, and probably more likely fitting into the first few steps of the arithmetic splitting sequence. If the cost-per-element is reasonably high that would overcome any over provisioning. Overall it looks reasonable. I ran it through tier 1/2 testing and there were no failures. If you don't mind I would like to let this percolate a little in my mind (i.e. sleep on it for a day or two). - PR: https://git.openjdk.java.net/jdk/pull/7279
Re: RFR: 8280915: Better parallelization for AbstractSpliterator and IteratorSpliterator when size is unknown [v4]
On Thu, 10 Feb 2022 04:22:34 GMT, Tagir F. Valeev wrote: >> Tagir F. Valeev has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Benchmark to demonstrate the patch usefulness > > I added a microbenchmark to demonstrate the speed improvement achievable by > this patch. For demonstration, I used `Pattern.splitAsStream` source (though > as listed in JDK-8280915, many other standard sources are also affected). For > downstream operation I selected `BigInteger.pow(1000)` which is moderately > CPU-intensive (takes ~10us on my machine for input numbers within > 1000..2000). The benchmarking results are as follows (my machine has 8 > cores). Here's non-patched version: > > > Benchmark (size) Mode Cnt Score Error Units > sumOf1000thPowers 10 avgt 30100.367 ±0.793 us/op > sumOf1000thPowers 100 avgt 30963.857 ±6.252 us/op > sumOf1000thPowers1000 avgt 30 10012.923 ± 81.091 us/op > sumOf1000thPowers 1 avgt 30 99546.370 ± 625.105 us/op > sumOf1000thPowersParallel 10 avgt 30104.561 ±1.118 us/op > sumOf1000thPowersParallel 100 avgt 30995.400 ± 26.116 us/op > sumOf1000thPowersParallel1000 avgt 30 9969.296 ± 85.166 us/op > sumOf1000thPowersParallel 1 avgt 30 55856.516 ± 2315.530 us/op > > > We observe that the results depend on the input size linearly for sequential > run, which is quite expected. Parallel doesn't help at all for size = 10, > 100, and 1000, which validates my claim that these spliterators cannot split > at all for sizes <= 1024. For size = 1, parallel version starts > performing somewhat better (1.78x), though it's not nearly close to the > available machine parallelism. > > Here's patched version: > > > Benchmark (size) Mode Cnt Score Error Units > sumOf1000thPowers 10 avgt 30101.380 ±0.961 us/op > sumOf1000thPowers 100 avgt 30970.843 ±8.360 us/op > sumOf1000thPowers1000 avgt 30 9837.397 ± 93.656 us/op > sumOf1000thPowers 1 avgt 30 97741.823 ± 1276.116 us/op > sumOf1000thPowersParallel 10 avgt 30 41.597 ±0.910 us/op > sumOf1000thPowersParallel 100 avgt 30189.735 ±1.911 us/op > sumOf1000thPowersParallel1000 avgt 30 1776.337 ± 31.034 us/op > sumOf1000thPowersParallel 1 avgt 30 17685.723 ± 127.846 us/op > > > We observe no regression for sequential run and drastic improvement for > parallel. Even for size = 10, we see 2.46x speedup and 41 us total time. For > bigger sizes, we see 5.11x..5.54x speedup. > > Please review my patch. I'll be happy to discuss any concerns about this > optimization you may have. Thanks in advance! > @amaembo It's on my queue to review, i may get to it tomorrow, otherwise next > week. I took last week off at short notice, since it was the school holidays. Looking today. - PR: https://git.openjdk.java.net/jdk/pull/7279
Re: RFR: 8282432: Optimize masked "test" Vector API with predicate feature [v2]
On Thu, 3 Mar 2022 08:12:46 GMT, Xiaohong Gong wrote: >> The vector `"test"` api is implemented with vector `"compare"`. And the >> masked `"test" `is implemented with `"test(op).and(m)"` which means >> `"compare().and(m)"` finally. Since the masked vector `"compare"` has been >> optimized with predicated instruction for archituctures that support the >> feature, the masked `"test"` can also be optimized by implementing it with >> the predicated compare. This could save the additional ` "and"` for >> architectures that support the predicate feature. >> >> This patch optimized the masked `"test"` by implementing it with masked >> `"compare"`. With this patch, the following codes for the` "IS_NEGATIVE"` op >> for a DoubleVector with SVE: >> >> mov z19.d, #0 >> cmpgt p1.d, p7/z, z19.d, z17.d >> and p0.b, p7/z, p1.b, p0.b >> >> are optimized to: >> >> mov z19.d, #0 >> cmpgt p0.d, p0/z, z19.d, z17.d >> >> Also update the jtreg tests for masked` "test" ` to make sure they are hot >> enough to be compiled by c2. > > Xiaohong Gong has updated the pull request incrementally with one additional > commit since the last revision: > > Simply the testTemplate I guess the following: `mask.cast(IntVector.species(shape())` is more efficient than: `m.cast(vspecies().asIntegral()))` ? - Marked as reviewed by psandoz (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/7654
Re: RFR: 8282432: Optimize masked "test" Vector API with predicate feature
On Wed, 2 Mar 2022 02:50:27 GMT, Xiaohong Gong wrote: > The vector `"test"` api is implemented with vector `"compare"`. And the > masked `"test" `is implemented with `"test(op).and(m)"` which means > `"compare().and(m)"` finally. Since the masked vector `"compare"` has been > optimized with predicated instruction for archituctures that support the > feature, the masked `"test"` can also be optimized by implementing it with > the predicated compare. This could save the additional ` "and"` for > architectures that support the predicate feature. > > This patch optimized the masked `"test"` by implementing it with masked > `"compare"`. With this patch, the following codes for the` "IS_NEGATIVE"` op > for a DoubleVector with SVE: > > mov z19.d, #0 > cmpgt p1.d, p7/z, z19.d, z17.d > and p0.b, p7/z, p1.b, p0.b > > are optimized to: > > mov z19.d, #0 > cmpgt p0.d, p0/z, z19.d, z17.d > > Also update the jtreg tests for masked` "test" ` to make sure they are hot > enough to be compiled by c2. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/DoubleVector.java line 1737: > 1735: , > 1736: M2 extends VectorMask> > 1737: M1 testTemplate(Class maskType, Test op, M2 mask) { Can we simplify by making some code FP specific? - the mask `cast` can be applied in this method rather than in the caller, simplifying the signature - for clarify, vector `viewAsIntegralLanes` is only needed for FP (update would be required for the non-mask template). src/jdk.incubator.vector/share/classes/jdk/incubator/vector/DoubleVector.java line 1766: > 1764: throw new AssertionError(op); > 1765: } > 1766: return maskType.cast(m.cast(this.vspecies())); Suggestion: return maskType.cast(m.cast(vsp)); Same for non-mask template too. test/jdk/jdk/incubator/vector/templates/Unit-Test.template line 29: > 27: VectorMask<$Wideboxtype$> vmask = VectorMask.fromArray(SPECIES, > mask, 0); > 28: > 29: for (int ic = 0; ic < INVOC_COUNT; ic++) { Can you remove `SmokeTest` from the method name. - PR: https://git.openjdk.java.net/jdk/pull/7654
Re: RFR: 8282143: Objects.requireNonNull should be ForceInline [v2]
On Wed, 2 Mar 2022 00:32:46 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional >> commit since the last revision: >> >> the other > > Thanks a lot for your reviews, do I need a second review for this? > @ExE-Boss IMO generally we only decorate `ForceInline` on functions of which > inlining is crucial since inlining decision is a hard problem and often not > solvable from the local point of view, especially from the callee's one. @merykitty it's good. - PR: https://git.openjdk.java.net/jdk/pull/7543
Re: RFR: 8282143: Objects.requireNonNull should be ForceInline [v2]
On Tue, 1 Mar 2022 02:22:49 GMT, Quan Anh Mai wrote: >> Hi, >> >> `Objects.requireNonNull` may fail to be inlined. The call is expensive and >> may lead to objects escaping to the heap while the null check is cheap and >> is often elided. I have observed this when using the vector API when a call >> to `Objects.requireNonNull` leads to vectors being materialised in a hot >> loop. >> >> Should the other `requireNonNull` be `ForceInline` as well? >> >> Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional > commit since the last revision: > > the other Marked as reviewed by psandoz (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/7543
Re: RFR: 8282143: Objects.requireNonNull should be ForceInline
On Sat, 19 Feb 2022 05:51:52 GMT, Quan Anh Mai wrote: > Hi, > > `Objects.requireNonNull` may fail to be inlined. The call is expensive and > may lead to objects escaping to the heap while the null check is cheap and is > often elided. I have observed this when using the vector API when a call to > `Objects.requireNonNull` leads to vectors being materialised in a hot loop. > > Should the other `requireNonNull` be `ForceInline` as well? > > Thank you very much. `Objects.requireNonNull` is also used on the critical path of VarHandles, and other places in `j.l.invoke` so I think this is generally beneficial beyond use by the Vector API. - PR: https://git.openjdk.java.net/jdk/pull/7543
Re: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v2]
On Sun, 13 Feb 2022 05:14:47 GMT, Quan Anh Mai wrote: >> Observing the following failures on CPUs with >> "Intel_R__Xeon_R__Gold_6354_CPU___3.00GHz" with HotSpot flags: >> >> -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 >> -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation >> >> >> TestVectorCastAVX512.java: >> >> Failed IR Rules (1) >> -- >> - Method "public static void >> compiler.vectorapi.reshape.tests.TestVectorCast.testUI256toL512(int[],long[])": >> * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, >> applyIfAnd={}, applyIfOr={}, >> counts={"(d+(s){2}(VectorUCastI2X.*)+(s){2}===.*)", "1"}, >> applyIfNot={})" >> - counts: Graph contains wrong number of nodes: >> Regex 1: (\\d+(\\s){2}(VectorUCastI2X.*)+(\\s){2}===.*) >> Expected 1 but found 0 nodes. >> >> >> TestVectorCastAVX1.java: >> >> - Method "public static void >> compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toS64(byte[],short[])": >> * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, >> applyIfAnd={}, applyIfOr={}, >> counts={"(d+(s){2}(VectorUCastB2X.*)+(s){2}===.*)", "1"}, >> applyIfNot={})" >> - counts: Graph contains wrong number of nodes: >> Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*) >> Expected 1 but found 0 nodes. >> >> - Method "public static void >> compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toI128(byte[],int[])": >> * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, >> applyIfAnd={}, applyIfOr={}, >> counts={"(d+(s){2}(VectorUCastB2X.*)+(s){2}===.*)", "1"}, >> applyIfNot={})" >> - counts: Graph contains wrong number of nodes: >> Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*) >> Expected 1 but found 0 nodes. > > @PaulSandoz Thanks a lot for your testing, the reason seems to be due to > `LaneType::asIntegral` missing `ForceInline` annotation. I have run the > reshape test 10 times without getting any failure while with previous patch > there is often 1 or 2. > Thanks. @merykitty testing now passes. Java bits look good. Needs HotSpot reviewer. - PR: https://git.openjdk.java.net/jdk/pull/7358
Re: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v3]
On Sun, 13 Feb 2022 05:18:34 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch implements the unsigned upcast intrinsics in x86, which are used >> in vector lane-wise reinterpreting operations. >> >> Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional > commit since the last revision: > > missing ForceInline Marked as reviewed by psandoz (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/7358
Re: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts [v2]
On Thu, 10 Feb 2022 15:14:44 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch implements the unsigned upcast intrinsics in x86, which are used >> in vector lane-wise reinterpreting operations. >> >> Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with two additional > commits since the last revision: > > - minor rename > - address reviews Observing the following failures on CPUs with "Intel_R__Xeon_R__Gold_6354_CPU___3.00GHz" with HotSpot flags: -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation TestVectorCastAVX512.java: Failed IR Rules (1) -- - Method "public static void compiler.vectorapi.reshape.tests.TestVectorCast.testUI256toL512(int[],long[])": * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"(d+(s){2}(VectorUCastI2X.*)+(s){2}===.*)", "1"}, applyIfNot={})" - counts: Graph contains wrong number of nodes: Regex 1: (\\d+(\\s){2}(VectorUCastI2X.*)+(\\s){2}===.*) Expected 1 but found 0 nodes. TestVectorCastAVX1.java: - Method "public static void compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toS64(byte[],short[])": * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"(d+(s){2}(VectorUCastB2X.*)+(s){2}===.*)", "1"}, applyIfNot={})" - counts: Graph contains wrong number of nodes: Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*) Expected 1 but found 0 nodes. - Method "public static void compiler.vectorapi.reshape.tests.TestVectorCast.testUB64toI128(byte[],int[])": * @IR rule 1: "@compiler.lib.ir_framework.IR(failOn={}, applyIf={}, applyIfAnd={}, applyIfOr={}, counts={"(d+(s){2}(VectorUCastB2X.*)+(s){2}===.*)", "1"}, applyIfNot={})" - counts: Graph contains wrong number of nodes: Regex 1: (\\d+(\\s){2}(VectorUCastB2X.*)+(\\s){2}===.*) Expected 1 but found 0 nodes. - PR: https://git.openjdk.java.net/jdk/pull/7358
Integrated: 8281294: [vectorapi] FIRST_NONZERO reduction operation throws IllegalArgumentExcept on zero vectors
On Wed, 9 Feb 2022 21:36:01 GMT, Paul Sandoz wrote: > …ArgumentExcept on zero vectors > > This PR fixes issues for reduction using FIRST_NONZERO and adds tests. The > testing infrastructure is generalized to reduction functions and tests for > min/max reductions are updated to use that. This pull request has now been integrated. Changeset: 83b6e4bc Author:Paul Sandoz URL: https://git.openjdk.java.net/jdk/commit/83b6e4bc04db89a846a1b6c2d0666efe139f8f61 Stats: 3725 lines in 54 files changed: 2742 ins; 407 del; 576 mod 8281294: [vectorapi] FIRST_NONZERO reduction operation throws IllegalArgumentExcept on zero vectors Reviewed-by: jrose - PR: https://git.openjdk.java.net/jdk/pull/7410