Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-06-06 Thread Xiaohong Gong
On Tue, 31 May 2022 16:48:27 GMT, Paul Sandoz  wrote:

>> @PaulSandoz, could you please help to check whether the current version is 
>> ok for you? Thanks so much!
>
> @XiaohongGong looks good, now the Vector API JEP has been integrated you will 
> get a merge conflict, but it should be easier to resolve.

Thanks for the review @PaulSandoz @merykitty !

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v3]

2022-06-06 Thread Paul Sandoz
On Thu, 2 Jun 2022 01:29:54 GMT, Xiaohong Gong  wrote:

>> Checking whether the indexes of masked lanes are inside of the valid memory 
>> boundary is necessary for masked vector memory access. However, this could 
>> be saved if the given offset is inside of the vector range that could make 
>> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs 
>> have saved this kind of check for common cases. And this patch did the 
>> similar optimization for the masked vector store.
>> 
>> The performance for the new added store masked benchmarks improves about 
>> `1.83x ~ 2.62x` on a x86 system:
>> 
>> Benchmark   BeforeAfter Gain Units
>> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
>> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
>> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
>> StoreMaskedBenchmark.intStoreArrayMask 4113.772  8220.206  1.998 ops/ms
>> StoreMaskedBenchmark.longStoreArrayMask1993.986  4874.148  2.444 ops/ms
>> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
>> 
>> Similar performane gain can also be observed on ARM hardware.
>
> Xiaohong Gong has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains three commits:
> 
>  - Merge branch 'jdk:master' to JDK-8286279
>  - Wrap the offset check into a static method
>  - 8286279: [vectorapi] Only check index of masked lanes if offset is out of 
> array boundary for masked store

Marked as reviewed by psandoz (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-06-01 Thread Xiaohong Gong
On Tue, 31 May 2022 16:48:27 GMT, Paul Sandoz  wrote:

>> @PaulSandoz, could you please help to check whether the current version is 
>> ok for you? Thanks so much!
>
> @XiaohongGong looks good, now the Vector API JEP has been integrated you will 
> get a merge conflict, but it should be easier to resolve.

Hi @PaulSandoz , I updated the patch and resolved the conflicts. Could you 
please take a look at it again? Thanks so much!

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v3]

2022-06-01 Thread Xiaohong Gong
> Checking whether the indexes of masked lanes are inside of the valid memory 
> boundary is necessary for masked vector memory access. However, this could be 
> saved if the given offset is inside of the vector range that could make sure 
> no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs have 
> saved this kind of check for common cases. And this patch did the similar 
> optimization for the masked vector store.
> 
> The performance for the new added store masked benchmarks improves about 
> `1.83x ~ 2.62x` on a x86 system:
> 
> Benchmark   BeforeAfter Gain Units
> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
> StoreMaskedBenchmark.intStoreArrayMask 4113.772  8220.206  1.998 ops/ms
> StoreMaskedBenchmark.longStoreArrayMask1993.986  4874.148  2.444 ops/ms
> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
> 
> Similar performane gain can also be observed on ARM hardware.

Xiaohong Gong has updated the pull request with a new target base due to a 
merge or a rebase. The pull request now contains three commits:

 - Merge branch 'jdk:master' to JDK-8286279
 - Wrap the offset check into a static method
 - 8286279: [vectorapi] Only check index of masked lanes if offset is out of 
array boundary for masked store

-

Changes: https://git.openjdk.java.net/jdk/pull/8620/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=8620=02
  Stats: 216 lines in 9 files changed: 179 ins; 0 del; 37 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8620.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8620/head:pull/8620

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-31 Thread Xiaohong Gong
On Mon, 30 May 2022 01:17:00 GMT, Xiaohong Gong  wrote:

>> Xiaohong Gong has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Wrap the offset check into a static method
>
> @PaulSandoz, could you please help to check whether the current version is ok 
> for you? Thanks so much!

> @XiaohongGong looks good, now the Vector API JEP has been integrated you will 
> get a merge conflict, but it should be easier to resolve.

Great! I will resolve the conflict and update the patch, thanks so much!

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-31 Thread Paul Sandoz
On Mon, 30 May 2022 01:17:00 GMT, Xiaohong Gong  wrote:

>> Xiaohong Gong has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Wrap the offset check into a static method
>
> @PaulSandoz, could you please help to check whether the current version is ok 
> for you? Thanks so much!

@XiaohongGong looks good, now the Vector API JEP has been integrated you will 
get a merge conflict, but it should be easier to resolve.

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-29 Thread Xiaohong Gong
On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong  wrote:

>> Checking whether the indexes of masked lanes are inside of the valid memory 
>> boundary is necessary for masked vector memory access. However, this could 
>> be saved if the given offset is inside of the vector range that could make 
>> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs 
>> have saved this kind of check for common cases. And this patch did the 
>> similar optimization for the masked vector store.
>> 
>> The performance for the new added store masked benchmarks improves about 
>> `1.83x ~ 2.62x` on a x86 system:
>> 
>> Benchmark   BeforeAfter Gain Units
>> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
>> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
>> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
>> StoreMaskedBenchmark.intStoreArrayMask 4113.772  8220.206  1.998 ops/ms
>> StoreMaskedBenchmark.longStoreArrayMask1993.986  4874.148  2.444 ops/ms
>> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
>> 
>> Similar performane gain can also be observed on ARM hardware.
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Wrap the offset check into a static method

@PaulSandoz, could you please help to check whether the current version is ok 
for you? Thanks so much!

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-13 Thread Xiaohong Gong
On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong  wrote:

>> Checking whether the indexes of masked lanes are inside of the valid memory 
>> boundary is necessary for masked vector memory access. However, this could 
>> be saved if the given offset is inside of the vector range that could make 
>> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs 
>> have saved this kind of check for common cases. And this patch did the 
>> similar optimization for the masked vector store.
>> 
>> The performance for the new added store masked benchmarks improves about 
>> `1.83x ~ 2.62x` on a x86 system:
>> 
>> Benchmark   BeforeAfter Gain Units
>> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
>> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
>> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
>> StoreMaskedBenchmark.intStoreArrayMask 4113.772  8220.206  1.998 ops/ms
>> StoreMaskedBenchmark.longStoreArrayMask1993.986  4874.148  2.444 ops/ms
>> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
>> 
>> Similar performane gain can also be observed on ARM hardware.
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Wrap the offset check into a static method

Thanks for the explanation! Yeah, the main problem is Java doesn't have the 
direct unsigned comparison. We need the function call. From the two ways you 
provided, I think the second `Integer.lessThanUnsigned` looks better. But I'm 
not sure whether this could improve the performance a lot, although the first 
check `a.length - vsp.length() > 0` can be hosited out side of the loop. And 
this might make the codes more complex for me. Maybe we can do a pre research 
to find a better implementation to the unsigned comparison first. Do you think 
so?

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-12 Thread Quan Anh Mai
On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong  wrote:

>> Checking whether the indexes of masked lanes are inside of the valid memory 
>> boundary is necessary for masked vector memory access. However, this could 
>> be saved if the given offset is inside of the vector range that could make 
>> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs 
>> have saved this kind of check for common cases. And this patch did the 
>> similar optimization for the masked vector store.
>> 
>> The performance for the new added store masked benchmarks improves about 
>> `1.83x ~ 2.62x` on a x86 system:
>> 
>> Benchmark   BeforeAfter Gain Units
>> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
>> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
>> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
>> StoreMaskedBenchmark.intStoreArrayMask 4113.772  8220.206  1.998 ops/ms
>> StoreMaskedBenchmark.longStoreArrayMask1993.986  4874.148  2.444 ops/ms
>> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
>> 
>> Similar performane gain can also be observed on ARM hardware.
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Wrap the offset check into a static method

However, we seem to lack the ability to do an unsigned comparison reliably. C2 
can transform `x + MIN_VALUE <=> y + MIN_VALUE` into `x u<=> y` but it will 
fail if `x` or `y` is an addition with constant in such cases the constants 
will be merged together. As a result, I think we need an intrinsic for this. 
`Integer.compareUnsigned` may fit but it manifests the result into an integer 
register which may lead to suboptimal materialisation of flags, another 
approach would be to have a separate method `Integer.lessThanUnsigned` which 
only returns `boolean` and C2 can have better time splitting the boolean 
comparison through `IfNode`, which will prevent the materialisation of 
`boolean` values. What do you two think?

I.e, after splitting if through merge point, the shape of `if 
(Integer.lessThanUnsigned(a, b))` would be transformed from

 ab
  \  /
CmpU
 |
Bool
 |
If
  / \
IfTrueIfFalse
  \ /
Region10
\ |   /
 Phi 0
  \ /
  CmpI

into

 ab
  \  /
CmpU

Thanks.

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store

2022-05-12 Thread Quan Anh Mai
On Fri, 13 May 2022 01:27:18 GMT, Xiaohong Gong  wrote:

>> Maybe we could use `a.length - vsp.length() > 0 && offset u< a.length - 
>> vsp.length()` which would hoist the first check outside of the loop.
>> Thanks.
>
>> Maybe we could use `a.length - vsp.length() > 0 && offset u< a.length - 
>> vsp.length()` which would hoist the first check outside of the loop. Thanks.
> 
> Thanks for the review @merykitty ! We need the check `offset >= 0` which I 
> think is different from `a.length - vsp.length()`.

@XiaohongGong  `a >= 0 && a < b` is the same as `b >= 0 && a u< b`, it is how 
we are doing range check today. Thanks.

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-12 Thread Xiaohong Gong
> Checking whether the indexes of masked lanes are inside of the valid memory 
> boundary is necessary for masked vector memory access. However, this could be 
> saved if the given offset is inside of the vector range that could make sure 
> no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs have 
> saved this kind of check for common cases. And this patch did the similar 
> optimization for the masked vector store.
> 
> The performance for the new added store masked benchmarks improves about 
> `1.83x ~ 2.62x` on a x86 system:
> 
> Benchmark   BeforeAfter Gain Units
> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
> StoreMaskedBenchmark.intStoreArrayMask 4113.772  8220.206  1.998 ops/ms
> StoreMaskedBenchmark.longStoreArrayMask1993.986  4874.148  2.444 ops/ms
> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
> 
> Similar performane gain can also be observed on ARM hardware.

Xiaohong Gong has updated the pull request incrementally with one additional 
commit since the last revision:

  Wrap the offset check into a static method

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8620/files
  - new: https://git.openjdk.java.net/jdk/pull/8620/files/eb67f681..2229dd24

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=8620=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8620=00-01

  Stats: 56 lines in 8 files changed: 5 ins; 0 del; 51 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8620.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8620/head:pull/8620

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-12 Thread Xiaohong Gong
On Thu, 12 May 2022 03:36:31 GMT, Paul Sandoz  wrote:

>> Thanks for the review @PaulSandoz ! For the 
>> `VectorIntrinsics.checkFromIndexSize`, I'm afraid it's not suitable to be 
>> used here because the `outOfBounds` exception will be thrown if the offset 
>> is not inside of the valid array boundary. And  for the masked operations, 
>> this is not needed since we only need to check the masked lanes. Please 
>> correct me if I didn't understand correctly. Thanks!
>
> Silly me! i commented too quickly, `checkFromIndexSize` cannot be used. My 
> intent was that we could use a method rather than repeating the expression in 
> numerous places (including masked loads IIRC).

Hi @PaulSandoz , I'v updated the offset check for masked load/store. Could you 
please help to check whether it is ok? Thanks so much!

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store

2022-05-12 Thread Xiaohong Gong
On Thu, 12 May 2022 09:49:17 GMT, Quan Anh Mai  wrote:

> Maybe we could use `a.length - vsp.length() > 0 && offset u< a.length - 
> vsp.length()` which would hoist the first check outside of the loop. Thanks.

Thanks for the review @merykitty ! We need the check `offset >= 0` which I 
think is different from `a.length - vsp.length()`.

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store

2022-05-12 Thread Quan Anh Mai
On Tue, 10 May 2022 01:23:55 GMT, Xiaohong Gong  wrote:

> Checking whether the indexes of masked lanes are inside of the valid memory 
> boundary is necessary for masked vector memory access. However, this could be 
> saved if the given offset is inside of the vector range that could make sure 
> no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs have 
> saved this kind of check for common cases. And this patch did the similar 
> optimization for the masked vector store.
> 
> The performance for the new added store masked benchmarks improves about 
> `1.83x ~ 2.62x` on a x86 system:
> 
> Benchmark   BeforeAfter Gain Units
> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
> StoreMaskedBenchmark.intStoreArrayMask 4113.772  8220.206  1.998 ops/ms
> StoreMaskedBenchmark.longStoreArrayMask1993.986  4874.148  2.444 ops/ms
> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
> 
> Similar performane gain can also be observed on ARM hardware.

Maybe we could use `a.length - vsp.length() > 0 && offset u< a.length - 
vsp.length()` which would hoist the first check outside of the loop.
Thanks.

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store

2022-05-11 Thread Xiaohong Gong
On Thu, 12 May 2022 03:36:31 GMT, Paul Sandoz  wrote:

>> Thanks for the review @PaulSandoz ! For the 
>> `VectorIntrinsics.checkFromIndexSize`, I'm afraid it's not suitable to be 
>> used here because the `outOfBounds` exception will be thrown if the offset 
>> is not inside of the valid array boundary. And  for the masked operations, 
>> this is not needed since we only need to check the masked lanes. Please 
>> correct me if I didn't understand correctly. Thanks!
>
> Silly me! i commented too quickly, `checkFromIndexSize` cannot be used. My 
> intent was that we could use a method rather than repeating the expression in 
> numerous places (including masked loads IIRC).

Good idea! I will try. Thanks!

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store

2022-05-11 Thread Paul Sandoz
On Thu, 12 May 2022 01:56:25 GMT, Xiaohong Gong  wrote:

>> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template
>>  line 4086:
>> 
>>> 4084: } else {
>>> 4085: $Type$Species vsp = vspecies();
>>> 4086: if (offset < 0 || offset > (a.length - vsp.length())) {
>> 
>> Can we use `VectorIntrinsics.checkFromIndexSize`? e.g. 
>> 
>> if (!VectorIntrinsics.checkFromIndexSize(offset, vsp.length(), a.length)) { 
>> ...
>
> Thanks for the review @PaulSandoz ! For the 
> `VectorIntrinsics.checkFromIndexSize`, I'm afraid it's not suitable to be 
> used here because the `outOfBounds` exception will be thrown if the offset is 
> not inside of the valid array boundary. And  for the masked operations, this 
> is not needed since we only need to check the masked lanes. Please correct me 
> if I didn't understand correctly. Thanks!

Silly me! i commented too quickly, `checkFromIndexSize` cannot be used. My 
intent was that we could use a method rather than repeating the expression in 
numerous places (including masked loads IIRC).

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store

2022-05-11 Thread Xiaohong Gong
On Wed, 11 May 2022 15:10:55 GMT, Paul Sandoz  wrote:

>> Checking whether the indexes of masked lanes are inside of the valid memory 
>> boundary is necessary for masked vector memory access. However, this could 
>> be saved if the given offset is inside of the vector range that could make 
>> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs 
>> have saved this kind of check for common cases. And this patch did the 
>> similar optimization for the masked vector store.
>> 
>> The performance for the new added store masked benchmarks improves about 
>> `1.83x ~ 2.62x` on a x86 system:
>> 
>> Benchmark   BeforeAfter Gain Units
>> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
>> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
>> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
>> StoreMaskedBenchmark.intStoreArrayMask 4113.772  8220.206  1.998 ops/ms
>> StoreMaskedBenchmark.longStoreArrayMask1993.986  4874.148  2.444 ops/ms
>> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
>> 
>> Similar performane gain can also be observed on ARM hardware.
>
> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template
>  line 4086:
> 
>> 4084: } else {
>> 4085: $Type$Species vsp = vspecies();
>> 4086: if (offset < 0 || offset > (a.length - vsp.length())) {
> 
> Can we use `VectorIntrinsics.checkFromIndexSize`? e.g. 
> 
> if (!VectorIntrinsics.checkFromIndexSize(offset, vsp.length(), a.length)) { 
> ...

Thanks for the review @PaulSandoz ! For the 
`VectorIntrinsics.checkFromIndexSize`, I'm afraid it's not suitable to be used 
here because the `outOfBounds` exception will be thrown if the offset is not 
inside of the valid array boundary. And  for the masked operations, this is not 
needed since we only need to check the masked lanes. Please correct me if I 
didn't understand correctly. Thanks!

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store

2022-05-11 Thread Paul Sandoz
On Tue, 10 May 2022 01:23:55 GMT, Xiaohong Gong  wrote:

> Checking whether the indexes of masked lanes are inside of the valid memory 
> boundary is necessary for masked vector memory access. However, this could be 
> saved if the given offset is inside of the vector range that could make sure 
> no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs have 
> saved this kind of check for common cases. And this patch did the similar 
> optimization for the masked vector store.
> 
> The performance for the new added store masked benchmarks improves about 
> `1.83x ~ 2.62x` on a x86 system:
> 
> Benchmark   BeforeAfter Gain Units
> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
> StoreMaskedBenchmark.intStoreArrayMask 4113.772  8220.206  1.998 ops/ms
> StoreMaskedBenchmark.longStoreArrayMask1993.986  4874.148  2.444 ops/ms
> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
> 
> Similar performane gain can also be observed on ARM hardware.

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template
 line 4086:

> 4084: } else {
> 4085: $Type$Species vsp = vspecies();
> 4086: if (offset < 0 || offset > (a.length - vsp.length())) {

Can we use `VectorIntrinsics.checkFromIndexSize`? e.g. 

if (!VectorIntrinsics.checkFromIndexSize(offset, vsp.length(), a.length)) { ...

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store

2022-05-11 Thread Xiaohong Gong
On Tue, 10 May 2022 01:23:55 GMT, Xiaohong Gong  wrote:

> Checking whether the indexes of masked lanes are inside of the valid memory 
> boundary is necessary for masked vector memory access. However, this could be 
> saved if the given offset is inside of the vector range that could make sure 
> no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs have 
> saved this kind of check for common cases. And this patch did the similar 
> optimization for the masked vector store.
> 
> The performance for the new added store masked benchmarks improves about 
> `1.83x ~ 2.62x` on a x86 system:
> 
> Benchmark   BeforeAfter Gain Units
> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
> StoreMaskedBenchmark.intStoreArrayMask 4113.772  8220.206  1.998 ops/ms
> StoreMaskedBenchmark.longStoreArrayMask1993.986  4874.148  2.444 ops/ms
> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
> 
> Similar performane gain can also be observed on ARM hardware.

Hi @PaulSandoz @rose00, could you please help to take a look at this simple PR? 
Thanks so much!

-

PR: https://git.openjdk.java.net/jdk/pull/8620