Re: RFR: 8267969: Add vectorized implementation for VectorMask.eq()

2021-06-01 Thread Xiaohong Gong
On Tue, 1 Jun 2021 16:29:58 GMT, Paul Sandoz wrote: > Looks. Later we may want to consider pushing this down as an intrinsic, > perhaps reusing `VectorSupport.compare`. Thanks for your review @PaulSandoz ! Yes, reusing `VectorSupport.compare` is an alternative way to do the vectorization.

Re: RFR: 8267969: Add vectorized implementation for VectorMask.eq() [v2]

2021-06-01 Thread Xiaohong Gong
> // accumulate results, so JIT can't eliminate relevant > computations > m = m.and(av.eq(bv)); > } > } > > return m; > } Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The increme

Integrated: 8267969: Add vectorized implementation for VectorMask.eq()

2021-06-02 Thread Xiaohong Gong
On Mon, 31 May 2021 10:25:26 GMT, Xiaohong Gong wrote: > Currently `"VectorMask.eq()" `is not vectorized: > > public VectorMask eq(VectorMask m) { > // FIXME: Generate good code here. > return bOp(m, (i, a, b) -> a == b); > } > > Thi

Re: RFR: 8267969: Add vectorized implementation for VectorMask.eq() [v2]

2021-06-02 Thread Xiaohong Gong
On Wed, 2 Jun 2021 07:48:47 GMT, Nils Eliasson wrote: > Please wait until you have two reviewers before integrating. Sure! Thanks so much for looking at this PR! - PR: https://git.openjdk.java.net/jdk/pull/4272

Re: RFR: 8268151: Vector API toShuffle optimization

2021-06-02 Thread Xiaohong Gong
On Thu, 3 Jun 2021 00:29:00 GMT, Sandhya Viswanathan wrote: > The Vector API toShuffle method can be optimized using existing vector > conversion intrinsic. > > The following changes are made: > 1) vector.toShuffle java implementation is changed to call > VectorSupport.convert. > 2) The

Re: RFR: 8268151: Vector API toShuffle optimization

2021-06-02 Thread Xiaohong Gong
On Thu, 3 Jun 2021 00:29:00 GMT, Sandhya Viswanathan wrote: > The Vector API toShuffle method can be optimized using existing vector > conversion intrinsic. > > The following changes are made: > 1) vector.toShuffle java implementation is changed to call > VectorSupport.convert. > 2) The

RFR: 8267969: Add vectorized implementation for VectorMask.eq()

2021-05-31 Thread Xiaohong Gong
Currently `"VectorMask.eq()" `is not vectorized: public VectorMask eq(VectorMask m) { // FIXME: Generate good code here. return bOp(m, (i, a, b) -> a == b); } This can be implemented by calling `"xor(m.not())"` directly. The performance improved about 1.4x ~ 1.9x for the

Re: RFR: 8268151: Vector API toShuffle optimization [v2]

2021-06-03 Thread Xiaohong Gong
On Thu, 3 Jun 2021 18:40:09 GMT, Sandhya Viswanathan wrote: >> src/hotspot/share/opto/vectornode.cpp line 1246: >> >>> 1244: return new VectorLoadMaskNode(value, out_vt); >>> 1245: } else if (is_vector_shuffle) { >>> 1246: if (!is_shuffle_to_vector()) { >> >> Hi

Re: RFR: 8268151: Vector API toShuffle optimization [v2]

2021-06-03 Thread Xiaohong Gong
On Thu, 3 Jun 2021 21:43:19 GMT, Sandhya Viswanathan wrote: >> The Vector API toShuffle method can be optimized using existing vector >> conversion intrinsic. >> >> The following changes are made: >> 1) vector.toShuffle java implementation is changed to call >> VectorSupport.convert. >> 2)

Re: RFR: 8264109: Add vectorized implementation for VectorMask.andNot() [v2]

2021-04-01 Thread Xiaohong Gong
On Thu, 1 Apr 2021 15:13:31 GMT, Paul Sandoz wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Move the changing to AbstractMask.andNot and revert changes in VectorM

Integrated: 8264109: Add vectorized implementation for VectorMask.andNot()

2021-04-02 Thread Xiaohong Gong
On Fri, 26 Mar 2021 01:50:33 GMT, Xiaohong Gong wrote: > Currently "VectorMask.andNot()" is not vectorized: > public VectorMask andNot(VectorMask m) { > // FIXME: Generate good code here. > return bOp(m, (i, a, b) -> a && !b);

Re: RFR: 8264109: Add vectorized implementation for VectorMask.andNot() [v2]

2021-04-02 Thread Xiaohong Gong
On Fri, 2 Apr 2021 09:59:31 GMT, Ningsheng Jian wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Move the changing to AbstractMask.andNot and revert changes in VectorM

Re: RFR: 8264109: Add vectorized implementation for VectorMask.andNot() [v2]

2021-03-31 Thread Xiaohong Gong
.length()) { > VectorMask vmask = VectorMask.fromArray(SPECIES, mask, i); > rm = rm.andNot(vmask); > } > } > return rm; > } Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Move the changing to Abst

Re: RFR: 8264109: Add vectorized implementation for VectorMask.andNot() [v2]

2021-03-31 Thread Xiaohong Gong
On Wed, 31 Mar 2021 16:42:09 GMT, Paul Sandoz wrote: > Would you mind updating the existing `AbstractMask.andNot` implementation? > rather than changing `VectorMask.andNot`. That fits in with the current > implementation pattern. Hi @PaulSandoz , thanks for looking at this PR. I'v updated the

Re: RFR: 8264109: Add vectorized implementation for VectorMask.andNot()

2021-03-30 Thread Xiaohong Gong
On Fri, 26 Mar 2021 01:50:33 GMT, Xiaohong Gong wrote: > Currently "VectorMask.andNot()" is not vectorized: > public VectorMask andNot(VectorMask m) { > // FIXME: Generate good code here. > return bOp(m, (i, a, b) -> a && !b);

Re: RFR: 8282162: [vector] Optimize vector negation API

2022-03-14 Thread Xiaohong Gong
On Tue, 15 Mar 2022 02:39:42 GMT, Joe Darcy wrote: > Note that in terms of Java semantics, negation of floating point values needs > to be implemented as subtraction from negative zero rather than positive zero: > > double negate(double arg) {return -0.0 - arg; } > > This is to handle signed

Re: RFR: 8282162: [vector] Optimize vector negation API

2022-03-14 Thread Xiaohong Gong
On Fri, 11 Mar 2022 06:29:22 GMT, Xiaohong Gong wrote: > The current vector `"NEG"` is implemented with substraction a vector by zero > in case the architecture does not support the negation instruction. And to > fit the predicate feature for architectures that support it, t

Integrated: 8282432: Optimize masked "test" Vector API with predicate feature

2022-03-08 Thread Xiaohong Gong
On Wed, 2 Mar 2022 02:50:27 GMT, Xiaohong Gong wrote: > The vector `"test"` api is implemented with vector `"compare"`. And the > masked `"test" `is implemented with `"test(op).and(m)"` which means > `"compare().and(m)"` finally

Re: RFR: 8282432: Optimize masked "test" Vector API with predicate feature [v2]

2022-03-06 Thread Xiaohong Gong
On Thu, 3 Mar 2022 17:40:13 GMT, Paul Sandoz wrote: > I guess the following: `mask.cast(IntVector.species(shape())` is more > efficient than: `m.cast(vspecies().asIntegral()))` ? Yeah, that's one point. Another main reason is `m.cast(vspecies().asIntegral()))` cannot be handled well in the

RFR: 8282432: Optimize masked "test" Vector API with predicate feature

2022-03-01 Thread Xiaohong Gong
The vector `"test"` api is implemented with vector `"compare"`. And the masked `"test" `is implemented with `"test(op).and(m)"` which means `"compare().and(m)"` finally. Since the masked vector `"compare"` has been optimized with predicated instruction for archituctures that support the

Re: RFR: 8282432: Optimize masked "test" Vector API with predicate feature

2022-03-03 Thread Xiaohong Gong
On Wed, 2 Mar 2022 02:50:27 GMT, Xiaohong Gong wrote: > The vector `"test"` api is implemented with vector `"compare"`. And the > masked `"test" `is implemented with `"test(op).and(m)"` which means > `"compare().and(m)"` finally

Re: RFR: 8282432: Optimize masked "test" Vector API with predicate feature [v2]

2022-03-03 Thread Xiaohong Gong
, z17.d > and p0.b, p7/z, p1.b, p0.b > > are optimized to: > > mov z19.d, #0 > cmpgt p0.d, p0/z, z19.d, z17.d > > Also update the jtreg tests for masked` "test" ` to make sure they are hot > enough to be compiled by c2. Xiaohong Gong has updated the

Re: RFR: 8282162: [vector] Optimize vector negation API [v2]

2022-03-28 Thread Xiaohong Gong
On Mon, 28 Mar 2022 07:43:29 GMT, Jie Fu wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Add a superclass for vector negation > > src/hotspot/share/opto/vectornode.cpp line 1592: >

Re: RFR: 8282162: [vector] Optimize vector negation API [v2]

2022-03-28 Thread Xiaohong Gong
On Mon, 28 Mar 2022 07:40:48 GMT, Jie Fu wrote: >> The compiler can get the real type info from `Op_NegVI` that can also handle >> the `BYTE ` and `SHORT ` basic type. I just don't want to add more new IRs >> which also need more match rules in the ad files. >> >>> Is there any performance

RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature

2022-03-30 Thread Xiaohong Gong
Currently the vector load with mask when the given index happens out of the array boundary is implemented with pure java scalar code to avoid the IOOBE (IndexOutOfBoundaryException). This is necessary for architectures that do not support the predicate feature. Because the masked load is

Re: RFR: 8282162: [vector] Optimize integral vector negation API [v3]

2022-03-28 Thread Xiaohong Gong
sked 1.239 > LongMaxVector.NEG1.031 > LongMaxVector.NEGMasked 1.191 > > X86 (non AVX-512): > BenchmarkGain > ByteMaxVector.NEGMasked 1.254 > ShortMaxVector.NEGMasked 1.359 > IntMaxVector.NEGMasked 1.431 > LongMaxVector.NEGMasked 1.989 >

Re: RFR: 8282162: [vector] Optimize integral vector negation API [v3]

2022-03-29 Thread Xiaohong Gong
On Tue, 29 Mar 2022 18:05:56 GMT, Paul Sandoz wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Make "degenerate_vector_integral_negate" to be "NegVI" private >

Integrated: 8282162: [vector] Optimize integral vector negation API

2022-03-29 Thread Xiaohong Gong
On Fri, 11 Mar 2022 06:29:22 GMT, Xiaohong Gong wrote: > The current vector `"NEG"` is implemented with substraction a vector by zero > in case the architecture does not support the negation instruction. And to > fit the predicate feature for architectures that support it, t

Re: RFR: 8282162: [vector] Optimize vector negation API

2022-03-20 Thread Xiaohong Gong
On Sat, 19 Mar 2022 02:34:55 GMT, Jie Fu wrote: >> The current vector `"NEG"` is implemented with substraction a vector by zero >> in case the architecture does not support the negation instruction. And to >> fit the predicate feature for architectures that support it, the masked >> vector

Re: RFR: 8282162: [vector] Optimize vector negation API

2022-03-22 Thread Xiaohong Gong
On Sat, 19 Mar 2022 03:11:12 GMT, Jie Fu wrote: >>> Note that in terms of Java semantics, negation of floating point values >>> needs to be implemented as subtraction from negative zero rather than >>> positive zero: >>> >>> double negate(double arg) {return -0.0 - arg; } >>> >>> This is to

Re: RFR: 8282162: [vector] Optimize vector negation API [v2]

2022-03-22 Thread Xiaohong Gong
sked 1.239 > LongMaxVector.NEG1.031 > LongMaxVector.NEGMasked 1.191 > > X86 (non AVX-512): > BenchmarkGain > ByteMaxVector.NEGMasked 1.254 > ShortMaxVector.NEGMasked 1.359 > IntMaxVector.NEGMasked 1.431 > LongMaxVector.NEGMasked 1.989 >

Re: RFR: 8282162: [vector] Optimize integral vector negation API [v3]

2022-03-29 Thread Xiaohong Gong
On Tue, 29 Mar 2022 05:05:43 GMT, Jie Fu wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Make "degenerate_vector_integral_negate" to be "NegVI" private > > Not

Re: RFR: 8282162: [vector] Optimize vector negation API [v2]

2022-03-23 Thread Xiaohong Gong
On Tue, 22 Mar 2022 09:58:23 GMT, Xiaohong Gong wrote: >> The current vector `"NEG"` is implemented with substraction a vector by zero >> in case the architecture does not support the negation instruction. And to >> fit the predicate feature for architectures

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature

2022-04-05 Thread Xiaohong Gong
On Wed, 30 Mar 2022 10:31:59 GMT, Xiaohong Gong wrote: > Currently the vector load with mask when the given index happens out of the > array boundary is implemented with pure java scalar code to avoid the IOOBE > (IndexOutOfBoundaryException). This is necessary for architecture

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-04-22 Thread Xiaohong Gong
7075.923 ops/ms > LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE 119.771 330.587 ops/ms > LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE 431.961 939.301 ops/ms > > Similar performance gain can also be observed on 512-bit SVE system. Xiaohong Gong has updated the pull request inc

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-04-22 Thread Xiaohong Gong
On Wed, 20 Apr 2022 02:46:09 GMT, Xiaohong Gong wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java >> line 2861: >> >>> 2859: ByteSpecies vsp = (ByteSpecies) species; >>> 2860: if

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature

2022-04-19 Thread Xiaohong Gong
On Sat, 9 Apr 2022 00:10:40 GMT, Sandhya Viswanathan wrote: >> Currently the vector load with mask when the given index happens out of the >> array boundary is implemented with pure java scalar code to avoid the IOOBE >> (IndexOutOfBoundaryException). This is necessary for architectures that

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature

2022-04-19 Thread Xiaohong Gong
On Mon, 11 Apr 2022 09:04:36 GMT, Jatin Bhateja wrote: >> The optimization for masked store is recorded to: >> https://bugs.openjdk.java.net/browse/JDK-8284050 > >> The blend should be with the intended-to-store vector, so that masked lanes >> contain the need-to-store elements and unmasked

Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-12 Thread Xiaohong Gong
rayMask 2713.031 7122.535 2.625 ops/ms > StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms > StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms > StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms > > Similar performane gain can als

Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-12 Thread Xiaohong Gong
On Thu, 12 May 2022 03:36:31 GMT, Paul Sandoz wrote: >> Thanks for the review @PaulSandoz ! For the >> `VectorIntrinsics.checkFromIndexSize`, I'm afraid it's not suitable to be >> used here because the `outOfBounds` exception will be thrown if the offset >> is not inside of the valid array

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-05-12 Thread Xiaohong Gong
On Thu, 12 May 2022 16:07:54 GMT, Paul Sandoz wrote: > Yes, the tests were run in debug mode. The reporting of the missing constant > occurs for the compiled method that is called from the method where the > constants are declared e.g.: > > ``` > 719 240b

Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-13 Thread Xiaohong Gong
On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong wrote: >> Checking whether the indexes of masked lanes are inside of the valid memory >> boundary is necessary for masked vector memory access. However, this could >> be saved if the given offset is inside of the vector ran

Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store

2022-05-12 Thread Xiaohong Gong
On Thu, 12 May 2022 09:49:17 GMT, Quan Anh Mai wrote: > Maybe we could use `a.length - vsp.length() > 0 && offset u< a.length - > vsp.length()` which would hoist the first check outside of the loop. Thanks. Thanks for the review @merykitty ! We need the check `offset >= 0` which I think is

Re: RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator) [v6]

2022-05-19 Thread Xiaohong Gong
On Thu, 19 May 2022 08:53:31 GMT, Ningsheng Jian wrote: >>> LUT should be generated only if UsePopCountInsturction is false >> >> Should there be `!UsePopCountInsturction` check then? >> >>> restrict the scope of flag to only scalar popcount operation >> >> Interesting. But AArch64 code does

Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v3]

2022-06-01 Thread Xiaohong Gong
rayMask 2713.031 7122.535 2.625 ops/ms > StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms > StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms > StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms > > Similar performane gain can als

Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-06-01 Thread Xiaohong Gong
On Tue, 31 May 2022 16:48:27 GMT, Paul Sandoz wrote: >> @PaulSandoz, could you please help to check whether the current version is >> ok for you? Thanks so much! > > @XiaohongGong looks good, now the Vector API JEP has been integrated you will > get a merge conflict, but it should be easier to

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v5]

2022-06-01 Thread Xiaohong Gong
7075.923 ops/ms > LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE 119.771 330.587 ops/ms > LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE 431.961 939.301 ops/ms > > Similar performance gain can also be observed on 512-bit SVE system. Xiaohong Gong has updated the pull reques

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-06-01 Thread Xiaohong Gong
On Thu, 2 Jun 2022 01:49:10 GMT, Xiaohong Gong wrote: > > @XiaohongGong Could you please rebase the branch and resolve conflicts? > > Sure, I'm working on this now. The patch will be updated soon. Thanks. Resolved the conflicts. Thanks! - PR: https://git.openjdk.java.

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-06-01 Thread Xiaohong Gong
On Fri, 13 May 2022 08:58:12 GMT, Xiaohong Gong wrote: >> Yes, the tests were run in debug mode. The reporting of the missing constant >> occurs for the compiled method that is called from the method where the >> constants are declared e.g.: >&g

Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-31 Thread Xiaohong Gong
On Mon, 30 May 2022 01:17:00 GMT, Xiaohong Gong wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Wrap the offset check into a static method > > @PaulSandoz, could you please hel

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-05-13 Thread Xiaohong Gong
On Thu, 12 May 2022 16:07:54 GMT, Paul Sandoz wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Rename "use_predicate" to "needs_predicate" > > Yes, the tests we

Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store

2022-05-11 Thread Xiaohong Gong
On Thu, 12 May 2022 03:36:31 GMT, Paul Sandoz wrote: >> Thanks for the review @PaulSandoz ! For the >> `VectorIntrinsics.checkFromIndexSize`, I'm afraid it's not suitable to be >> used here because the `outOfBounds` exception will be thrown if the offset >> is not inside of the valid array

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-05-11 Thread Xiaohong Gong
On Wed, 11 May 2022 19:45:55 GMT, Paul Sandoz wrote: > I tried your test code with the patch and logged compilation > (`-XX:-TieredCompilation -XX:+PrintCompilation -XX:+PrintInlining > -XX:+PrintIntrinsics -Xbatch`) > > For `func` the first call to `VectorSupport::loadMasked` is intrinsic

Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store

2022-05-11 Thread Xiaohong Gong
On Wed, 11 May 2022 15:10:55 GMT, Paul Sandoz wrote: >> Checking whether the indexes of masked lanes are inside of the valid memory >> boundary is necessary for masked vector memory access. However, this could >> be saved if the given offset is inside of the vector range that could make >>

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v4]

2022-05-13 Thread Xiaohong Gong
7075.923 ops/ms > LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE 119.771 330.587 ops/ms > LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE 431.961 939.301 ops/ms > > Similar performance gain can also be observed on 512-bit SVE system. Xiaohong Gong has updated the pull req

Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-29 Thread Xiaohong Gong
On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong wrote: >> Checking whether the indexes of masked lanes are inside of the valid memory >> boundary is necessary for masked vector memory access. However, this could >> be saved if the given offset is inside of the vector ran

Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-06-06 Thread Xiaohong Gong
On Tue, 31 May 2022 16:48:27 GMT, Paul Sandoz wrote: >> @PaulSandoz, could you please help to check whether the current version is >> ok for you? Thanks so much! > > @XiaohongGong looks good, now the Vector API JEP has been integrated you will > get a merge conflict, but it should be easier to

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v6]

2022-06-06 Thread Xiaohong Gong
7075.923 ops/ms > LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE 119.771 330.587 ops/ms > LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE 431.961 939.301 ops/ms > > Similar performance gain can also be observed on 512-bit SVE system. Xiaohong Gong has updated the pull requ

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v5]

2022-06-06 Thread Xiaohong Gong
On Mon, 6 Jun 2022 10:40:45 GMT, Jatin Bhateja wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a >> merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'jdk:master' into JDK-8283667 >> - Use inte

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v5]

2022-06-06 Thread Xiaohong Gong
On Mon, 6 Jun 2022 15:41:06 GMT, Paul Sandoz wrote: > Looks good. As a follow on PR I think it would be useful to add constants > `OFFSET_IN_RANGE` and `OFFSET_OUT_OF_RANGE`, then it becomes much clearer in > source and you can drop the `/* offsetInRange */` comment on the argument. Hi

Integrated: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store

2022-06-06 Thread Xiaohong Gong
On Tue, 10 May 2022 01:23:55 GMT, Xiaohong Gong wrote: > Checking whether the indexes of masked lanes are inside of the valid memory > boundary is necessary for masked vector memory access. However, this could be > saved if the given offset is inside of the vector range that could

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v5]

2022-06-07 Thread Xiaohong Gong
On Tue, 7 Jun 2022 06:41:36 GMT, Jatin Bhateja wrote: >> Yeah, thanks and it's really a good suggestion to limit this benchmark only >> for the IOOBE cases. I locally modified the tests to make sure only the >> IOOBE case happens and the results show good as well. But do you think it's >>

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-06-07 Thread Xiaohong Gong
On Thu, 12 May 2022 16:07:54 GMT, Paul Sandoz wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Rename "use_predicate" to "needs_predicate" > > Yes, the tests we

Integrated: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature

2022-06-07 Thread Xiaohong Gong
On Wed, 30 Mar 2022 10:31:59 GMT, Xiaohong Gong wrote: > Currently the vector load with mask when the given index happens out of the > array boundary is implemented with pure java scalar code to avoid the IOOBE > (IndexOutOfBoundaryException). This is necessary for architecture

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-06-05 Thread Xiaohong Gong
On Thu, 2 Jun 2022 03:24:07 GMT, Xiaohong Gong wrote: >>> @XiaohongGong Could you please rebase the branch and resolve conflicts? >> >> Sure, I'm working on this now. The patch will be updated soon. Thanks. > >> > @XiaohongGong Could you please rebase

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-06-05 Thread Xiaohong Gong
On Thu, 12 May 2022 16:07:54 GMT, Paul Sandoz wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Rename "use_predicate" to "needs_predicate" > > Yes, the tests we

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-05-06 Thread Xiaohong Gong
On Fri, 6 May 2022 14:59:26 GMT, Paul Sandoz wrote: >> Make sense! Thanks for the explanation! > > Doh! of course. This is not the first and will not be the last time i get > caught out by the 2-slot requirement. > It may be useful to do this: > > Node* mask_arg = is_store ? argument(8) :

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-05-10 Thread Xiaohong Gong
On Mon, 9 May 2022 21:55:27 GMT, Paul Sandoz wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Rename "use_predicate" to "needs_predicate" > > I modified the c

Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store

2022-05-11 Thread Xiaohong Gong
On Tue, 10 May 2022 01:23:55 GMT, Xiaohong Gong wrote: > Checking whether the indexes of masked lanes are inside of the valid memory > boundary is necessary for masked vector memory access. However, this could be > saved if the given offset is inside of the vector range that could

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-05-09 Thread Xiaohong Gong
On Mon, 9 May 2022 21:55:27 GMT, Paul Sandoz wrote: > I modified the code of this PR to avoid the conversion of `boolean` to `int`, > so a constant integer value is passed all the way through, and the masked > load is made intrinsic from the method at which the constants are passed as >

RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store

2022-05-09 Thread Xiaohong Gong
Checking whether the indexes of masked lanes are inside of the valid memory boundary is necessary for masked vector memory access. However, this could be saved if the given offset is inside of the vector range that could make sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-05-04 Thread Xiaohong Gong
On Thu, 5 May 2022 01:21:40 GMT, Paul Sandoz wrote: > > Yeah, I agree that it's not good by adding a branch checking for > > `offsetInRange`. But actually I met the constant issue that passing the > > values all the way cannot guarantee the argument a constant in compiler at > > the compile

Re: RFR: 8284050: [vectorapi] Optimize masked store for non-predicated architectures [v2]

2022-05-04 Thread Xiaohong Gong
oreMaskedBenchmark.longStoreArrayMask2025.031 4604.504 ops/ms > StoreMaskedBenchmark.shortStoreArrayMask 8339.389 17817.128 ops/ms > > Similar performance gain can also be observed on ARM SVE system. Xiaohong Gong has updated the pull request with a new target base due to a merge or a

Re: RFR: 8284050: [vectorapi] Optimize masked store for non-predicated architectures [v2]

2022-05-04 Thread Xiaohong Gong
On Thu, 5 May 2022 02:27:03 GMT, John R Rose wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a >> merge or a rebase. The pull request now contains one commit: >> >> 8284050: [vectorapi] Optimize masked store for non-predicated archi

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-05-04 Thread Xiaohong Gong
On Thu, 28 Apr 2022 00:13:49 GMT, Sandhya Viswanathan wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Rename the "usePred" to "offsetInRange" > > src/hotspot/sha

RFR: 8284050: [vectorapi] Optimize masked store for non-predicated architectures

2022-05-04 Thread Xiaohong Gong
Currently the vectorization of masked vector store is implemented by the masked store instruction only on architectures that support the predicate feature. The compiler will fall back to the java scalar code for non-predicate supported architectures like ARM NEON. However, for these systems,

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-05-04 Thread Xiaohong Gong
On Fri, 29 Apr 2022 21:34:13 GMT, Paul Sandoz wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Rename the "usePred" to "offsetInRange" > > IIUC when the hardwar

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-05-04 Thread Xiaohong Gong
On Thu, 5 May 2022 01:42:48 GMT, Xiaohong Gong wrote: > > > Yeah, I agree that it's not good by adding a branch checking for > > > `offsetInRange`. But actually I met the constant issue that passing the > > > values all the way cannot guarantee the argu

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-05-05 Thread Xiaohong Gong
On Thu, 5 May 2022 02:14:08 GMT, Xiaohong Gong wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 1232: >> >>> 1230: // out when current case uses the predicate feature. >>> 1231: if (!supports_predicate) { >>> 1232: bool use

Re: RFR: 8284050: [vectorapi] Optimize masked store for non-predicated architectures [v2]

2022-05-05 Thread Xiaohong Gong
On Thu, 5 May 2022 02:09:39 GMT, Xiaohong Gong wrote: >> Currently the vectorization of masked vector store is implemented by the >> masked store instruction only on architectures that support the predicate >> feature. The compiler will fall back to the java scalar code for

Withdrawn: 8284050: [vectorapi] Optimize masked store for non-predicated architectures

2022-05-05 Thread Xiaohong Gong
On Thu, 5 May 2022 02:00:04 GMT, Xiaohong Gong wrote: > Currently the vectorization of masked vector store is implemented by the > masked store instruction only on architectures that support the predicate > feature. The compiler will fall back to the java scalar code for > n

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-05-05 Thread Xiaohong Gong
7075.923 ops/ms > LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE 119.771 330.587 ops/ms > LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE 431.961 939.301 ops/ms > > Similar performance gain can also be observed on 512-bit SVE system. Xiaohong Gong has updated the pull request inc

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-05-04 Thread Xiaohong Gong
On Thu, 31 Mar 2022 02:15:26 GMT, Quan Anh Mai wrote: >> I'm afraid not. "Load + Blend" makes the elements of unmasked lanes to be >> `0`. Then a full store may change the values in the unmasked memory to be 0, >> which is different with the mask store API definition. > > The blend should be

Re: RFR: 8284050: [vectorapi] Optimize masked store for non-predicated architectures [v2]

2022-05-04 Thread Xiaohong Gong
On Thu, 5 May 2022 02:09:39 GMT, Xiaohong Gong wrote: >> Currently the vectorization of masked vector store is implemented by the >> masked store instruction only on architectures that support the predicate >> feature. The compiler will fall back to the java scalar code for

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-05-05 Thread Xiaohong Gong
On Thu, 5 May 2022 19:27:47 GMT, Paul Sandoz wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Rename "use_predicate" to "needs_predicate" > > src/hotspot/sha

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v3]

2022-05-05 Thread Xiaohong Gong
On Fri, 6 May 2022 04:22:30 GMT, Sandhya Viswanathan wrote: >> I'm afraid it's `argument(8)` for the load operation since the `argument(7)` >> is the mask input. It seems the argument number is not right begin from the >> mask input which is expected to be `6`. But the it's not. Actually I