Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
On Tue, 31 May 2022 16:48:27 GMT, Paul Sandoz wrote: >> @PaulSandoz, could you please help to check whether the current version is >> ok for you? Thanks so much! > > @XiaohongGong looks good, now the Vector API JEP has been integrated you will > get a merge conflict, but it should be easier to resolve. Thanks for the review @PaulSandoz @merykitty ! - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v3]
On Thu, 2 Jun 2022 01:29:54 GMT, Xiaohong Gong wrote: >> Checking whether the indexes of masked lanes are inside of the valid memory >> boundary is necessary for masked vector memory access. However, this could >> be saved if the given offset is inside of the vector range that could make >> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs >> have saved this kind of check for common cases. And this patch did the >> similar optimization for the masked vector store. >> >> The performance for the new added store masked benchmarks improves about >> `1.83x ~ 2.62x` on a x86 system: >> >> Benchmark BeforeAfter Gain Units >> StoreMaskedBenchmark.byteStoreArrayMask 12757.936 23291.118 1.826 ops/ms >> StoreMaskedBenchmark.doubleStoreArrayMask 1520.932 3921.616 2.578 ops/ms >> StoreMaskedBenchmark.floatStoreArrayMask 2713.031 7122.535 2.625 ops/ms >> StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms >> StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms >> StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms >> >> Similar performane gain can also be observed on ARM hardware. > > Xiaohong Gong has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains three commits: > > - Merge branch 'jdk:master' to JDK-8286279 > - Wrap the offset check into a static method > - 8286279: [vectorapi] Only check index of masked lanes if offset is out of > array boundary for masked store Marked as reviewed by psandoz (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
On Tue, 31 May 2022 16:48:27 GMT, Paul Sandoz wrote: >> @PaulSandoz, could you please help to check whether the current version is >> ok for you? Thanks so much! > > @XiaohongGong looks good, now the Vector API JEP has been integrated you will > get a merge conflict, but it should be easier to resolve. Hi @PaulSandoz , I updated the patch and resolved the conflicts. Could you please take a look at it again? Thanks so much! - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v3]
> Checking whether the indexes of masked lanes are inside of the valid memory > boundary is necessary for masked vector memory access. However, this could be > saved if the given offset is inside of the vector range that could make sure > no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs have > saved this kind of check for common cases. And this patch did the similar > optimization for the masked vector store. > > The performance for the new added store masked benchmarks improves about > `1.83x ~ 2.62x` on a x86 system: > > Benchmark BeforeAfter Gain Units > StoreMaskedBenchmark.byteStoreArrayMask 12757.936 23291.118 1.826 ops/ms > StoreMaskedBenchmark.doubleStoreArrayMask 1520.932 3921.616 2.578 ops/ms > StoreMaskedBenchmark.floatStoreArrayMask 2713.031 7122.535 2.625 ops/ms > StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms > StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms > StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms > > Similar performane gain can also be observed on ARM hardware. Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merge branch 'jdk:master' to JDK-8286279 - Wrap the offset check into a static method - 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store - Changes: https://git.openjdk.java.net/jdk/pull/8620/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk=8620=02 Stats: 216 lines in 9 files changed: 179 ins; 0 del; 37 mod Patch: https://git.openjdk.java.net/jdk/pull/8620.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8620/head:pull/8620 PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
On Mon, 30 May 2022 01:17:00 GMT, Xiaohong Gong wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Wrap the offset check into a static method > > @PaulSandoz, could you please help to check whether the current version is ok > for you? Thanks so much! > @XiaohongGong looks good, now the Vector API JEP has been integrated you will > get a merge conflict, but it should be easier to resolve. Great! I will resolve the conflict and update the patch, thanks so much! - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
On Mon, 30 May 2022 01:17:00 GMT, Xiaohong Gong wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Wrap the offset check into a static method > > @PaulSandoz, could you please help to check whether the current version is ok > for you? Thanks so much! @XiaohongGong looks good, now the Vector API JEP has been integrated you will get a merge conflict, but it should be easier to resolve. - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong wrote: >> Checking whether the indexes of masked lanes are inside of the valid memory >> boundary is necessary for masked vector memory access. However, this could >> be saved if the given offset is inside of the vector range that could make >> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs >> have saved this kind of check for common cases. And this patch did the >> similar optimization for the masked vector store. >> >> The performance for the new added store masked benchmarks improves about >> `1.83x ~ 2.62x` on a x86 system: >> >> Benchmark BeforeAfter Gain Units >> StoreMaskedBenchmark.byteStoreArrayMask 12757.936 23291.118 1.826 ops/ms >> StoreMaskedBenchmark.doubleStoreArrayMask 1520.932 3921.616 2.578 ops/ms >> StoreMaskedBenchmark.floatStoreArrayMask 2713.031 7122.535 2.625 ops/ms >> StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms >> StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms >> StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms >> >> Similar performane gain can also be observed on ARM hardware. > > Xiaohong Gong has updated the pull request incrementally with one additional > commit since the last revision: > > Wrap the offset check into a static method @PaulSandoz, could you please help to check whether the current version is ok for you? Thanks so much! - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong wrote: >> Checking whether the indexes of masked lanes are inside of the valid memory >> boundary is necessary for masked vector memory access. However, this could >> be saved if the given offset is inside of the vector range that could make >> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs >> have saved this kind of check for common cases. And this patch did the >> similar optimization for the masked vector store. >> >> The performance for the new added store masked benchmarks improves about >> `1.83x ~ 2.62x` on a x86 system: >> >> Benchmark BeforeAfter Gain Units >> StoreMaskedBenchmark.byteStoreArrayMask 12757.936 23291.118 1.826 ops/ms >> StoreMaskedBenchmark.doubleStoreArrayMask 1520.932 3921.616 2.578 ops/ms >> StoreMaskedBenchmark.floatStoreArrayMask 2713.031 7122.535 2.625 ops/ms >> StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms >> StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms >> StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms >> >> Similar performane gain can also be observed on ARM hardware. > > Xiaohong Gong has updated the pull request incrementally with one additional > commit since the last revision: > > Wrap the offset check into a static method Thanks for the explanation! Yeah, the main problem is Java doesn't have the direct unsigned comparison. We need the function call. From the two ways you provided, I think the second `Integer.lessThanUnsigned` looks better. But I'm not sure whether this could improve the performance a lot, although the first check `a.length - vsp.length() > 0` can be hosited out side of the loop. And this might make the codes more complex for me. Maybe we can do a pre research to find a better implementation to the unsigned comparison first. Do you think so? - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong wrote: >> Checking whether the indexes of masked lanes are inside of the valid memory >> boundary is necessary for masked vector memory access. However, this could >> be saved if the given offset is inside of the vector range that could make >> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs >> have saved this kind of check for common cases. And this patch did the >> similar optimization for the masked vector store. >> >> The performance for the new added store masked benchmarks improves about >> `1.83x ~ 2.62x` on a x86 system: >> >> Benchmark BeforeAfter Gain Units >> StoreMaskedBenchmark.byteStoreArrayMask 12757.936 23291.118 1.826 ops/ms >> StoreMaskedBenchmark.doubleStoreArrayMask 1520.932 3921.616 2.578 ops/ms >> StoreMaskedBenchmark.floatStoreArrayMask 2713.031 7122.535 2.625 ops/ms >> StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms >> StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms >> StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms >> >> Similar performane gain can also be observed on ARM hardware. > > Xiaohong Gong has updated the pull request incrementally with one additional > commit since the last revision: > > Wrap the offset check into a static method However, we seem to lack the ability to do an unsigned comparison reliably. C2 can transform `x + MIN_VALUE <=> y + MIN_VALUE` into `x u<=> y` but it will fail if `x` or `y` is an addition with constant in such cases the constants will be merged together. As a result, I think we need an intrinsic for this. `Integer.compareUnsigned` may fit but it manifests the result into an integer register which may lead to suboptimal materialisation of flags, another approach would be to have a separate method `Integer.lessThanUnsigned` which only returns `boolean` and C2 can have better time splitting the boolean comparison through `IfNode`, which will prevent the materialisation of `boolean` values. What do you two think? I.e, after splitting if through merge point, the shape of `if (Integer.lessThanUnsigned(a, b))` would be transformed from ab \ / CmpU | Bool | If / \ IfTrueIfFalse \ / Region10 \ | / Phi 0 \ / CmpI into ab \ / CmpU Thanks. - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store
On Fri, 13 May 2022 01:27:18 GMT, Xiaohong Gong wrote: >> Maybe we could use `a.length - vsp.length() > 0 && offset u< a.length - >> vsp.length()` which would hoist the first check outside of the loop. >> Thanks. > >> Maybe we could use `a.length - vsp.length() > 0 && offset u< a.length - >> vsp.length()` which would hoist the first check outside of the loop. Thanks. > > Thanks for the review @merykitty ! We need the check `offset >= 0` which I > think is different from `a.length - vsp.length()`. @XiaohongGong `a >= 0 && a < b` is the same as `b >= 0 && a u< b`, it is how we are doing range check today. Thanks. - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
> Checking whether the indexes of masked lanes are inside of the valid memory > boundary is necessary for masked vector memory access. However, this could be > saved if the given offset is inside of the vector range that could make sure > no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs have > saved this kind of check for common cases. And this patch did the similar > optimization for the masked vector store. > > The performance for the new added store masked benchmarks improves about > `1.83x ~ 2.62x` on a x86 system: > > Benchmark BeforeAfter Gain Units > StoreMaskedBenchmark.byteStoreArrayMask 12757.936 23291.118 1.826 ops/ms > StoreMaskedBenchmark.doubleStoreArrayMask 1520.932 3921.616 2.578 ops/ms > StoreMaskedBenchmark.floatStoreArrayMask 2713.031 7122.535 2.625 ops/ms > StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms > StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms > StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms > > Similar performane gain can also be observed on ARM hardware. Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Wrap the offset check into a static method - Changes: - all: https://git.openjdk.java.net/jdk/pull/8620/files - new: https://git.openjdk.java.net/jdk/pull/8620/files/eb67f681..2229dd24 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=8620=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8620=00-01 Stats: 56 lines in 8 files changed: 5 ins; 0 del; 51 mod Patch: https://git.openjdk.java.net/jdk/pull/8620.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8620/head:pull/8620 PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
On Thu, 12 May 2022 03:36:31 GMT, Paul Sandoz wrote: >> Thanks for the review @PaulSandoz ! For the >> `VectorIntrinsics.checkFromIndexSize`, I'm afraid it's not suitable to be >> used here because the `outOfBounds` exception will be thrown if the offset >> is not inside of the valid array boundary. And for the masked operations, >> this is not needed since we only need to check the masked lanes. Please >> correct me if I didn't understand correctly. Thanks! > > Silly me! i commented too quickly, `checkFromIndexSize` cannot be used. My > intent was that we could use a method rather than repeating the expression in > numerous places (including masked loads IIRC). Hi @PaulSandoz , I'v updated the offset check for masked load/store. Could you please help to check whether it is ok? Thanks so much! - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store
On Thu, 12 May 2022 09:49:17 GMT, Quan Anh Mai wrote: > Maybe we could use `a.length - vsp.length() > 0 && offset u< a.length - > vsp.length()` which would hoist the first check outside of the loop. Thanks. Thanks for the review @merykitty ! We need the check `offset >= 0` which I think is different from `a.length - vsp.length()`. - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store
On Tue, 10 May 2022 01:23:55 GMT, Xiaohong Gong wrote: > Checking whether the indexes of masked lanes are inside of the valid memory > boundary is necessary for masked vector memory access. However, this could be > saved if the given offset is inside of the vector range that could make sure > no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs have > saved this kind of check for common cases. And this patch did the similar > optimization for the masked vector store. > > The performance for the new added store masked benchmarks improves about > `1.83x ~ 2.62x` on a x86 system: > > Benchmark BeforeAfter Gain Units > StoreMaskedBenchmark.byteStoreArrayMask 12757.936 23291.118 1.826 ops/ms > StoreMaskedBenchmark.doubleStoreArrayMask 1520.932 3921.616 2.578 ops/ms > StoreMaskedBenchmark.floatStoreArrayMask 2713.031 7122.535 2.625 ops/ms > StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms > StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms > StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms > > Similar performane gain can also be observed on ARM hardware. Maybe we could use `a.length - vsp.length() > 0 && offset u< a.length - vsp.length()` which would hoist the first check outside of the loop. Thanks. - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store
On Thu, 12 May 2022 03:36:31 GMT, Paul Sandoz wrote: >> Thanks for the review @PaulSandoz ! For the >> `VectorIntrinsics.checkFromIndexSize`, I'm afraid it's not suitable to be >> used here because the `outOfBounds` exception will be thrown if the offset >> is not inside of the valid array boundary. And for the masked operations, >> this is not needed since we only need to check the masked lanes. Please >> correct me if I didn't understand correctly. Thanks! > > Silly me! i commented too quickly, `checkFromIndexSize` cannot be used. My > intent was that we could use a method rather than repeating the expression in > numerous places (including masked loads IIRC). Good idea! I will try. Thanks! - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store
On Thu, 12 May 2022 01:56:25 GMT, Xiaohong Gong wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template >> line 4086: >> >>> 4084: } else { >>> 4085: $Type$Species vsp = vspecies(); >>> 4086: if (offset < 0 || offset > (a.length - vsp.length())) { >> >> Can we use `VectorIntrinsics.checkFromIndexSize`? e.g. >> >> if (!VectorIntrinsics.checkFromIndexSize(offset, vsp.length(), a.length)) { >> ... > > Thanks for the review @PaulSandoz ! For the > `VectorIntrinsics.checkFromIndexSize`, I'm afraid it's not suitable to be > used here because the `outOfBounds` exception will be thrown if the offset is > not inside of the valid array boundary. And for the masked operations, this > is not needed since we only need to check the masked lanes. Please correct me > if I didn't understand correctly. Thanks! Silly me! i commented too quickly, `checkFromIndexSize` cannot be used. My intent was that we could use a method rather than repeating the expression in numerous places (including masked loads IIRC). - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store
On Wed, 11 May 2022 15:10:55 GMT, Paul Sandoz wrote: >> Checking whether the indexes of masked lanes are inside of the valid memory >> boundary is necessary for masked vector memory access. However, this could >> be saved if the given offset is inside of the vector range that could make >> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs >> have saved this kind of check for common cases. And this patch did the >> similar optimization for the masked vector store. >> >> The performance for the new added store masked benchmarks improves about >> `1.83x ~ 2.62x` on a x86 system: >> >> Benchmark BeforeAfter Gain Units >> StoreMaskedBenchmark.byteStoreArrayMask 12757.936 23291.118 1.826 ops/ms >> StoreMaskedBenchmark.doubleStoreArrayMask 1520.932 3921.616 2.578 ops/ms >> StoreMaskedBenchmark.floatStoreArrayMask 2713.031 7122.535 2.625 ops/ms >> StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms >> StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms >> StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms >> >> Similar performane gain can also be observed on ARM hardware. > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template > line 4086: > >> 4084: } else { >> 4085: $Type$Species vsp = vspecies(); >> 4086: if (offset < 0 || offset > (a.length - vsp.length())) { > > Can we use `VectorIntrinsics.checkFromIndexSize`? e.g. > > if (!VectorIntrinsics.checkFromIndexSize(offset, vsp.length(), a.length)) { > ... Thanks for the review @PaulSandoz ! For the `VectorIntrinsics.checkFromIndexSize`, I'm afraid it's not suitable to be used here because the `outOfBounds` exception will be thrown if the offset is not inside of the valid array boundary. And for the masked operations, this is not needed since we only need to check the masked lanes. Please correct me if I didn't understand correctly. Thanks! - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store
On Tue, 10 May 2022 01:23:55 GMT, Xiaohong Gong wrote: > Checking whether the indexes of masked lanes are inside of the valid memory > boundary is necessary for masked vector memory access. However, this could be > saved if the given offset is inside of the vector range that could make sure > no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs have > saved this kind of check for common cases. And this patch did the similar > optimization for the masked vector store. > > The performance for the new added store masked benchmarks improves about > `1.83x ~ 2.62x` on a x86 system: > > Benchmark BeforeAfter Gain Units > StoreMaskedBenchmark.byteStoreArrayMask 12757.936 23291.118 1.826 ops/ms > StoreMaskedBenchmark.doubleStoreArrayMask 1520.932 3921.616 2.578 ops/ms > StoreMaskedBenchmark.floatStoreArrayMask 2713.031 7122.535 2.625 ops/ms > StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms > StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms > StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms > > Similar performane gain can also be observed on ARM hardware. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 4086: > 4084: } else { > 4085: $Type$Species vsp = vspecies(); > 4086: if (offset < 0 || offset > (a.length - vsp.length())) { Can we use `VectorIntrinsics.checkFromIndexSize`? e.g. if (!VectorIntrinsics.checkFromIndexSize(offset, vsp.length(), a.length)) { ... - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store
On Tue, 10 May 2022 01:23:55 GMT, Xiaohong Gong wrote: > Checking whether the indexes of masked lanes are inside of the valid memory > boundary is necessary for masked vector memory access. However, this could be > saved if the given offset is inside of the vector range that could make sure > no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs have > saved this kind of check for common cases. And this patch did the similar > optimization for the masked vector store. > > The performance for the new added store masked benchmarks improves about > `1.83x ~ 2.62x` on a x86 system: > > Benchmark BeforeAfter Gain Units > StoreMaskedBenchmark.byteStoreArrayMask 12757.936 23291.118 1.826 ops/ms > StoreMaskedBenchmark.doubleStoreArrayMask 1520.932 3921.616 2.578 ops/ms > StoreMaskedBenchmark.floatStoreArrayMask 2713.031 7122.535 2.625 ops/ms > StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms > StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms > StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms > > Similar performane gain can also be observed on ARM hardware. Hi @PaulSandoz @rose00, could you please help to take a look at this simple PR? Thanks so much! - PR: https://git.openjdk.java.net/jdk/pull/8620