Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-05-05 Thread Xiaohong Gong
On Thu, 5 May 2022 02:14:08 GMT, Xiaohong Gong  wrote:

>> src/hotspot/share/opto/vectorIntrinsics.cpp line 1232:
>> 
>>> 1230:   // out when current case uses the predicate feature.
>>> 1231:   if (!supports_predicate) {
>>> 1232: bool use_predicate = false;
>> 
>> If we rename this to needs_predicate it will be easier to understand.
>
> Thanks for the comment! This local variable will be removed once the similar 
> intrinsification for masked stores is added. Please take a look at the PR 
> https://github.com/openjdk/jdk/pull/8544. Thanks so much!

Renamed to "needs_predicate". Thanks!

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-05-04 Thread Xiaohong Gong
On Thu, 31 Mar 2022 02:15:26 GMT, Quan Anh Mai  wrote:

>> I'm afraid not. "Load + Blend" sets the elements of the unmasked lanes to 
>> `0`. A full store would then overwrite the unmasked memory locations with 0, 
>> which differs from the definition of the masked store API.
>
> The blend should be with the vector that is intended to be stored, so that 
> masked lanes contain the elements to be stored and unmasked lanes contain the 
> loaded elements, which are then stored back unchanged.

Hi @merykitty @jatin-bhateja, could you please help review the similar masked 
store PR https://github.com/openjdk/jdk/pull/8544? Any feedback is welcome! 
Thanks so much!
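The difference between the two blends discussed above can be sketched with plain Java arrays standing in for vector lanes. This is only an illustration under stated assumptions: `int[]` models a vector, `boolean[]` models a `VectorMask`, and `BlendStoreSketch` with its helpers is a hypothetical name, not Vector API code. It only touches in-bounds memory, so it says nothing about the page-boundary concern raised elsewhere in the thread.

```java
import java.util.Arrays;

// Illustration only: int[] stands in for a vector register and boolean[]
// for a VectorMask. None of these helpers are Vector API methods.
final class BlendStoreSketch {

    // Lane-wise select: ifSet[i] where mask[i] is true, otherwise ifClear[i].
    static int[] blend(int[] ifClear, int[] ifSet, boolean[] mask) {
        int[] r = new int[mask.length];
        for (int i = 0; i < mask.length; i++) {
            r[i] = mask[i] ? ifSet[i] : ifClear[i];
        }
        return r;
    }

    // Full-width store of every lane into memory[offset .. offset + v.length).
    static void fullStore(int[] memory, int offset, int[] v) {
        System.arraycopy(v, 0, memory, offset, v.length);
    }

    // Incorrect emulation: unmasked lanes are blended with zero, so the
    // full store overwrites memory that the mask says must stay untouched.
    static void maskedStoreBlendWithZero(int[] memory, int offset, int[] v, boolean[] m) {
        int[] zero = new int[v.length];
        fullStore(memory, offset, blend(zero, v, m));
    }

    // Emulation suggested in the thread: load the current contents, blend the
    // to-be-stored vector into them, then store everything back. Unmasked
    // lanes are written back with their old values.
    static void maskedStoreLoadBlend(int[] memory, int offset, int[] v, boolean[] m) {
        int[] loaded = Arrays.copyOfRange(memory, offset, offset + v.length);
        fullStore(memory, offset, blend(loaded, v, m));
    }
}
```

With memory `{1, 2, 3, 4}`, vector `{10, 20, 30, 40}` and mask `{true, false, true, false}`, the zero blend leaves `{10, 0, 30, 0}`, while the load-and-blend variant leaves `{10, 2, 30, 4}`: the unmasked lanes keep their old values.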

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-05-04 Thread Xiaohong Gong
On Thu, 28 Apr 2022 00:13:49 GMT, Sandhya Viswanathan 
 wrote:

>> Xiaohong Gong has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Rename the "usePred" to "offsetInRange"
>
> src/hotspot/share/opto/vectorIntrinsics.cpp line 1232:
> 
>> 1230:   // out when current case uses the predicate feature.
>> 1231:   if (!supports_predicate) {
>> 1232: bool use_predicate = false;
> 
> If we rename this to needs_predicate it will be easier to understand.

Thanks for the comment! This local variable will be removed once the similar 
intrinsification for masked stores is added. Please take a look at the PR 
https://github.com/openjdk/jdk/pull/8544. Thanks so much!

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-05-04 Thread Xiaohong Gong
On Thu, 5 May 2022 01:42:48 GMT, Xiaohong Gong  wrote:

> > > Yeah, I agree that adding a branch that checks `offsetInRange` is not 
> > > good. But I ran into a constant issue: passing the values all the way 
> > > through does not guarantee that the argument is a constant in the 
> > > compiler at compile time. Do you have a better idea for fixing this?
> > 
> > 
> > That's odd, `boolean` constants are passed that are then converted to `int` 
> > constants. Did you try passing integer constants all the way through?
> 
> I will try again. As I remember, the main cause is the call to `fromArray0` 
> from `fromArray`: it is not annotated with `ForceInline`. The arguments might 
> not be compiled to a constant in cases where the offset is not within the 
> array range, such as a tail loop.

I tried passing the integer constant all the way through, and unfortunately 
`offsetInRange` is still not compiled to a constant. The following assertion in 
`vectorIntrinsics.cpp` fails:

--- a/src/hotspot/share/opto/vectorIntrinsics.cpp
+++ b/src/hotspot/share/opto/vectorIntrinsics.cpp
@@ -1236,6 +1236,7 @@ bool LibraryCallKit::inline_vector_mem_masked_operation(bool is_store) {
   } else {
     // Masked vector load with IOOBE always uses the predicated load.
     const TypeInt* offset_in_range = gvn().type(argument(8))->isa_int();
+    assert(offset_in_range->is_con(), "not a constant");
     if (!offset_in_range->is_con()) {
       if (C->print_intrinsics()) {
         tty->print_cr("  ** missing constant: offsetInRange=%s",
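For context, the choice the masked-load intrinsic makes around this check can be modeled in a few lines of plain Java. This is a hedged sketch of the behavior described in this thread, not HotSpot code: `MaskedLoadIntrinsicModel`, `decide` and `Outcome` are hypothetical names, and a `null` flag stands for "not a compile-time constant", which is the situation the failing assertion exposes.

```java
// Hypothetical model of the masked-load intrinsic's decision; not HotSpot code.
final class MaskedLoadIntrinsicModel {

    enum Outcome { FALL_BACK_TO_JAVA, PREDICATED_LOAD, FULL_LOAD_PLUS_BLEND }

    /**
     * @param offsetInRange      null models "argument is not a compile-time constant"
     * @param predicateSupported whether the backend has predicated (masked) loads
     */
    static Outcome decide(Boolean offsetInRange, boolean predicateSupported) {
        if (offsetInRange == null) {
            // Mirrors the "missing constant: offsetInRange" bail-out in the diff.
            return Outcome.FALL_BACK_TO_JAVA;
        }
        boolean needsPredicate = !offsetInRange;
        if (predicateSupported) {
            return Outcome.PREDICATED_LOAD;
        }
        // Without predicate support, an out-of-range offset cannot be served by
        // a full-width load, so the Java scalar path must run instead.
        return needsPredicate ? Outcome.FALL_BACK_TO_JAVA : Outcome.FULL_LOAD_PLUS_BLEND;
    }
}
```

The first branch shows why the constant matters: if `offsetInRange` is not a constant, the intrinsic cannot tell which strategy is legal and has to give up on intrinsification.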

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-05-04 Thread Xiaohong Gong
On Thu, 5 May 2022 01:21:40 GMT, Paul Sandoz  wrote:

> > Yeah, I agree that adding a branch that checks `offsetInRange` is not good. 
> > But I ran into a constant issue: passing the values all the way through does 
> > not guarantee that the argument is a constant in the compiler at compile 
> > time. Do you have a better idea for fixing this?
> 
> That's odd, `boolean` constants are passed that are then converted to `int` 
> constants. Did you try passing integer constants all the way through?

I will try again. As I remember, the main cause is the call to `fromArray0` from 
`fromArray`: it is not annotated with `ForceInline`. The arguments might not be 
compiled to a constant in cases where the offset is not within the array range, 
such as a tail loop.

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-05-04 Thread Paul Sandoz
On Thu, 5 May 2022 01:13:23 GMT, Xiaohong Gong  wrote:

> Yeah, I agree that adding a branch that checks `offsetInRange` is not good. 
> But I ran into a constant issue: passing the values all the way through does 
> not guarantee that the argument is a constant in the compiler at compile 
> time. Do you have a better idea for fixing this?

That's odd, `boolean` constants are passed that are then converted to `int` 
constants.
Did you try passing integer constants all the way through?

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-05-04 Thread Xiaohong Gong
On Fri, 29 Apr 2022 21:34:13 GMT, Paul Sandoz  wrote:

>> Xiaohong Gong has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Rename the "usePred" to "offsetInRange"
>
> IIUC when the hardware does not support predicated loads then any false 
> `offsetInRange` value causes the load intrinsic to fail, resulting in the 
> fallback, so it would not be materially any different to the current 
> behavior, just more uniformly implemented.
> 
> Why can't the intrinsic support passing a boolean directly? Is it 
> something to do with constants? If that is not possible I recommend creating 
> named constant values and passing those all the way through rather than 
> converting a boolean to an integer value. Then there is no need for a branch 
> checking `offsetInRange`.
> 
> Might be better to hold off until the JEP is integrated and then update, 
> since this will conflict (`byte[]` and `ByteBuffer` load methods are removed 
> and `MemorySegment` load methods are added). You could prepare for that now 
> by branching off `vectorIntrinsics`.

Thanks for your comments, @PaulSandoz!

> IIUC when the hardware does not support predicated loads then any false 
> `offsetInRange` value causes the load intrinsic to fail, resulting in the 
> fallback, so it would not be materially any different to the current 
> behavior, just more uniformly implemented.

Yes, this patch makes no difference on hardware that does not support predicated 
loads. It only benefits systems that support the predicate feature, such as ARM 
SVE and x86 AVX-512.

> Why can't the intrinsic support passing a boolean directly? Is it 
> something to do with constants? If that is not possible I recommend creating 
> named constant values and passing those all the way through rather than 
> converting a boolean to an integer value. Then there is no need for a branch 
> checking offsetInRange.

Yeah, I agree that adding a branch that checks `offsetInRange` is not good. But I 
ran into a constant issue: passing the values all the way through does not 
guarantee that the argument is a constant in the compiler at compile time. Do you 
have a better idea for fixing this?

> Might be better to hold off until the JEP is integrated and then update, 
> since this will conflict (byte[] and ByteBuffer load methods are removed and 
> MemorySegment load methods are added). You could prepare for that now by 
> branching off vectorIntrinsics.

Agree. Thanks!

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-04-29 Thread Paul Sandoz
On Fri, 22 Apr 2022 07:08:24 GMT, Xiaohong Gong  wrote:

>> Currently, a masked vector load whose index range falls partly outside the 
>> array boundary is implemented in pure Java scalar code to avoid the IOOBE 
>> (IndexOutOfBoundsException). This is necessary for architectures that do 
>> not support the predicate feature, because on them the masked load is 
>> implemented as a full vector load followed by a vector blend, and a full 
>> vector load would access elements outside the array, which is not valid. 
>> However, on architectures that support the predicate feature, such as 
>> SVE/AVX-512/RVV, it can be vectorized with the predicated load instruction 
>> as long as the indexes of the masked lanes are within the bounds of the 
>> array. On these architectures, accesses in unmasked lanes do not raise an 
>> exception.
>> 
>> This patch adds vectorization support for the IOOBE part of the masked 
>> load. Here is the original Java implementation (note the "FIXME: optimize"):
>> 
>> 
>>   @ForceInline
>>   public static
>>   ByteVector fromArray(VectorSpecies<Byte> species,
>>                        byte[] a, int offset,
>>                        VectorMask<Byte> m) {
>>   ByteSpecies vsp = (ByteSpecies) species;
>>   if (offset >= 0 && offset <= (a.length - species.length())) {
>>   return vsp.dummyVector().fromArray0(a, offset, m);
>>   }
>> 
>>   // FIXME: optimize
>>   checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
>>   return vsp.vOp(m, i -> a[offset + i]);
>>   }
>> 
>> Since this case can only be vectorized with the predicated load, HotSpot 
>> must check whether the current backend supports it and fall back to the 
>> Java scalar version if not. This differs from the normal masked vector 
>> load, for which the compiler generates a full vector load and a vector 
>> blend if the predicated load is not supported. So, to let the compiler take 
>> the expected action, an additional flag (i.e. `usePred`) is added to the 
>> existing "loadMasked" intrinsic, with the value "true" for the IOOBE part 
>> and "false" for the normal load. The compiler fails to intrinsify if the 
>> flag is "true" and the predicated load is not supported by the backend, 
>> which means that the normal Java path is executed.
>> 
>> The same vectorization support is also added for the masked:
>>  - fromByteArray/fromByteBuffer
>>  - fromBooleanArray
>>  - fromCharArray
>> 
>> Performance of the newly added benchmarks improves by about `1.88x ~ 30.26x` 
>> on an x86 AVX-512 system:
>> 
>> Benchmark                                          Before     After  Units
>> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE   737.542  1387.069  ops/ms
>> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE 118.366   330.776  ops/ms
>> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE  233.832  6125.026  ops/ms
>> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE    233.816  7075.923  ops/ms
>> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE   119.771   330.587  ops/ms
>> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE  431.961   939.301  ops/ms
>> 
>> Similar performance gains can also be observed on a 512-bit SVE system.
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Rename the "usePred" to "offsetInRange"

IIUC when the hardware does not support predicated loads then any false 
`offsetInRange` value causes the load intrinsic to fail, resulting in the 
fallback, so it would not be materially any different to the current behavior, 
just more uniformly implemented.

Why can't the intrinsic support passing a boolean directly? Is it something 
to do with constants? If that is not possible I recommend creating named 
constant values and passing those all the way through rather than converting a 
boolean to an integer value. Then there is no need for a branch checking 
`offsetInRange`.

Might be better to hold off until the JEP is integrated and then update, since 
this will conflict (`byte[]` and `ByteBuffer` load methods are removed and 
`MemorySegment` load methods are added). You could prepare for that now by 
branching off `vectorIntrinsics`.
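A minimal sketch of the named-constant suggestion, assuming the hypothetical names `OFFSET_IN_RANGE`, `OFFSET_OUT_OF_RANGE` and `loadMasked0` (the last standing in for the intrinsic entry point). Each branch passes a literal constant, so nothing converts a boolean on the way down. Whether C2 actually sees the value as a constant still depends on every method on the path being inlined, as discussed elsewhere in this thread.

```java
// Hypothetical sketch of the "named constants" suggestion; not JDK code.
// Each branch passes a literal constant, so no boolean-to-int conversion
// (and no branch on the flag) is needed on the way to the intrinsic.
final class OffsetInRangeConstants {
    static final int OFFSET_OUT_OF_RANGE = 0;
    static final int OFFSET_IN_RANGE = 1;

    // Stand-in for the public masked load: the range check happens only here.
    static int fromArray(int arrayLength, int offset, int lanes) {
        if (offset >= 0 && offset <= arrayLength - lanes) {
            return loadMasked0(OFFSET_IN_RANGE);
        }
        // The real method would validate the set mask lanes here first.
        return loadMasked0(OFFSET_OUT_OF_RANGE);
    }

    // Stand-in for the intrinsic entry point. It only returns the flag it
    // received, so the forwarding can be observed.
    static int loadMasked0(int offsetInRange) {
        return offsetInRange;
    }
}
```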

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-04-28 Thread Sandhya Viswanathan
On Fri, 22 Apr 2022 07:08:24 GMT, Xiaohong Gong  wrote:

> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Rename the "usePred" to "offsetInRange"

@PaulSandoz Could you please take a look at the Java changes when you find 
time? This PR from @XiaohongGong is a very good step toward a long-standing 
Vector API wish-list item: better tail loop handling.
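To show why this matters for tail loops, here is a scalar sketch of the common main-loop plus masked-tail pattern. It mimics `VectorSpecies.loopBound` and `VectorSpecies.indexInRange` from the Vector API using plain arrays and a hypothetical fixed lane count of 8; it is not Vector API code. The tail iteration performs a masked load at an offset where a full-width load would run past the end of the array, which is the case this PR vectorizes on predicated hardware.

```java
import java.util.Objects;

// Scalar sketch of the main-loop + masked-tail pattern (not Vector API code).
final class TailLoopSketch {
    static final int LANES = 8; // hypothetical vector length

    // Like VectorSpecies.loopBound(length): largest multiple of LANES <= length.
    static int loopBound(int length) {
        return length - (length % LANES);
    }

    // Like VectorSpecies.indexInRange(offset, limit): lane i is set
    // iff offset + i lies in [0, limit).
    static boolean[] indexInRange(int offset, int limit) {
        boolean[] m = new boolean[LANES];
        for (int i = 0; i < LANES; i++) {
            m[i] = offset + i >= 0 && offset + i < limit;
        }
        return m;
    }

    // Masked load: only set lanes are read (and must be in bounds); others are 0.
    static int[] maskedLoad(int[] a, int offset, boolean[] m) {
        int[] r = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            if (m[i]) r[i] = a[Objects.checkIndex(offset + i, a.length)];
        }
        return r;
    }

    static long sum(int[] a) {
        long total = 0;
        int i = 0;
        int bound = loopBound(a.length);
        for (; i < bound; i += LANES) {          // full vectors, always in range
            for (int lane = 0; lane < LANES; lane++) total += a[i + lane];
        }
        if (i < a.length) {                      // tail: i + LANES > a.length
            int[] v = maskedLoad(a, i, indexInRange(i, a.length));
            for (int x : v) total += x;
        }
        return total;
    }
}
```

For a 19-element array the main loop covers indexes 0 to 15, and the tail issues one masked load at offset 16 with only lanes 0 to 2 set.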

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-04-28 Thread Jatin Bhateja
On Wed, 20 Apr 2022 02:44:39 GMT, Xiaohong Gong  wrote:

>>> The blend should be with the vector that is intended to be stored, so that 
>>> masked lanes contain the elements to be stored and unmasked lanes contain 
>>> the loaded elements, which are then stored back unchanged.
>> 
>> It may not work if the memory lies beyond the legally accessible address 
>> space of the process; a page boundary could be such a corner case. Thus, 
>> recomposing an intermediate vector that only partially contains actual 
>> updates, and then effectively performing a full vector write to the 
>> destination address, may not work in all scenarios.
>
> Thanks for the comment! So how about adding a check for the valid array 
> range, as in the masked vector load?
> The code would look like:
> 
> public final
> void intoArray(byte[] a, int offset,
>                VectorMask<Byte> m) {
>     if (m.allTrue()) {
>         intoArray(a, offset);
>     } else {
>         ByteSpecies vsp = vspecies();
>         if (offset >= 0 && offset <= (a.length - vsp.length())) {  // a full range check
>             intoArray0(a, offset, m, /* usePred */ false);  // can be vectorized by load+blend_store
>         } else {
>             checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
>             intoArray0(a, offset, m, /* usePred */ true);   // can only be vectorized by the predicated store
>         }
>     }
> }

Thanks, this looks OK, since the out-of-range case will not be intrinsified if 
the target does not support predicated vector stores.
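The dispatch proposed above can be simulated in scalar Java to make the three cases concrete. This is a sketch under assumptions: arrays stand in for vectors and masks, `MaskedStoreSketch` and its `Path` values are hypothetical, and the partially out-of-range branch writes only the set lanes, which is what a predicated store would do.

```java
import java.util.Arrays;
import java.util.Objects;

// Scalar simulation of the masked-store dispatch; names are hypothetical.
final class MaskedStoreSketch {

    enum Path { FULL_STORE, LOAD_BLEND_STORE, PREDICATED_STORE }

    static Path intoArray(int[] a, int offset, int[] v, boolean[] m) {
        int lanes = v.length;
        boolean allTrue = true;
        for (boolean b : m) allTrue &= b;
        if (allTrue) {
            // Unmasked store: every lane must be in bounds.
            Objects.checkFromIndexSize(offset, lanes, a.length);
            System.arraycopy(v, 0, a, offset, lanes);
            return Path.FULL_STORE;
        }
        if (offset >= 0 && offset <= a.length - lanes) {
            // Whole vector in bounds: load, blend, store back is safe.
            int[] merged = Arrays.copyOfRange(a, offset, offset + lanes);
            for (int i = 0; i < lanes; i++) {
                if (m[i]) merged[i] = v[i];
            }
            System.arraycopy(merged, 0, a, offset, lanes);
            return Path.LOAD_BLEND_STORE;
        }
        // Partially out of bounds: validate the set lanes first, then write
        // only those lanes (the effect of a predicated store).
        for (int i = 0; i < lanes; i++) {
            if (m[i]) Objects.checkIndex(offset + i, a.length);
        }
        for (int i = 0; i < lanes; i++) {
            if (m[i]) a[offset + i] = v[i];
        }
        return Path.PREDICATED_STORE;
    }
}
```

Storing `{1, 2, 3, 4}` at offset 4 of a 6-element array succeeds when only lanes 0 and 1 are set, but throws `IndexOutOfBoundsException` (without writing anything) if lane 2 is also set.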

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-04-27 Thread Sandhya Viswanathan
On Fri, 22 Apr 2022 07:08:24 GMT, Xiaohong Gong  wrote:

> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Rename the "usePred" to "offsetInRange"

Rest of the patch looks good to me.

src/hotspot/share/opto/vectorIntrinsics.cpp line 1232:

> 1230:   // out when current case uses the predicate feature.
> 1231:   if (!supports_predicate) {
> 1232: bool use_predicate = false;

If we rename this to needs_predicate it will be easier to understand.

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-04-22 Thread Xiaohong Gong
On Wed, 20 Apr 2022 02:46:09 GMT, Xiaohong Gong  wrote:

>> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2861:
>> 
>>> 2859: ByteSpecies vsp = (ByteSpecies) species;
>>> 2860: if (offset >= 0 && offset <= (a.length - species.vectorByteSize())) {
>>> 2861: return vsp.dummyVector().fromByteArray0(a, offset, m, /* usePred */ false).maybeSwap(bo);
>> 
>> Instead of `usePred`, a term like `inRange`, `offsetInRange` or 
>> `offsetInVectorRange` would be easier to follow.
>
> Thanks for the review. I will change it later.

The name is updated to `offsetInRange`. Thanks!

-

PR: https://git.openjdk.java.net/jdk/pull/8035


Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature [v2]

2022-04-22 Thread Xiaohong Gong
> Currently, a masked vector load whose index range falls partly outside the 
> array boundary is implemented in pure Java scalar code to avoid the IOOBE 
> (IndexOutOfBoundsException). This is necessary for architectures that do 
> not support the predicate feature, because on them the masked load is 
> implemented as a full vector load followed by a vector blend, and a full 
> vector load would access elements outside the array, which is not valid. 
> However, on architectures that support the predicate feature, such as 
> SVE/AVX-512/RVV, it can be vectorized with the predicated load instruction as 
> long as the indexes of the masked lanes are within the bounds of the array. 
> On these architectures, accesses in unmasked lanes do not raise an exception.
> 
> This patch adds vectorization support for the IOOBE part of the masked 
> load. Here is the original Java implementation (note the "FIXME: optimize"):
> 
> 
>   @ForceInline
>   public static
>   ByteVector fromArray(VectorSpecies<Byte> species,
>                        byte[] a, int offset,
>                        VectorMask<Byte> m) {
>   ByteSpecies vsp = (ByteSpecies) species;
>   if (offset >= 0 && offset <= (a.length - species.length())) {
>   return vsp.dummyVector().fromArray0(a, offset, m);
>   }
> 
>   // FIXME: optimize
>   checkMaskFromIndexSize(offset, vsp, m, 1, a.length);
>   return vsp.vOp(m, i -> a[offset + i]);
>   }
> 
> Since this case can only be vectorized with the predicated load, HotSpot 
> must check whether the current backend supports it and fall back to the 
> Java scalar version if not. This differs from the normal masked vector 
> load, for which the compiler generates a full vector load and a vector 
> blend if the predicated load is not supported. So, to let the compiler take 
> the expected action, an additional flag (i.e. `usePred`) is added to the 
> existing "loadMasked" intrinsic, with the value "true" for the IOOBE part 
> and "false" for the normal load. The compiler fails to intrinsify if the 
> flag is "true" and the predicated load is not supported by the backend, 
> which means that the normal Java path is executed.
> 
> The same vectorization support is also added for the masked:
>  - fromByteArray/fromByteBuffer
>  - fromBooleanArray
>  - fromCharArray
> 
> Performance of the newly added benchmarks improves by about `1.88x ~ 30.26x` 
> on an x86 AVX-512 system:
> 
> Benchmark                                          Before     After  Units
> LoadMaskedIOOBEBenchmark.byteLoadArrayMaskIOOBE   737.542  1387.069  ops/ms
> LoadMaskedIOOBEBenchmark.doubleLoadArrayMaskIOOBE 118.366   330.776  ops/ms
> LoadMaskedIOOBEBenchmark.floatLoadArrayMaskIOOBE  233.832  6125.026  ops/ms
> LoadMaskedIOOBEBenchmark.intLoadArrayMaskIOOBE    233.816  7075.923  ops/ms
> LoadMaskedIOOBEBenchmark.longLoadArrayMaskIOOBE   119.771   330.587  ops/ms
> LoadMaskedIOOBEBenchmark.shortLoadArrayMaskIOOBE  431.961   939.301  ops/ms
> 
> Similar performance gains can also be observed on a 512-bit SVE system.
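The quoted `1.88x ~ 30.26x` range can be checked against the benchmark table: it is the After/Before ratio of the byte benchmark (lowest) and the int benchmark (highest). A small Java sketch that recomputes the ratios from the published numbers (the class name `SpeedupTable` is hypothetical):

```java
// Recomputes After/Before ratios from the LoadMaskedIOOBEBenchmark table.
final class SpeedupTable {
    static final String[] NAMES = {"byte", "double", "float", "int", "long", "short"};
    static final double[] BEFORE = {737.542, 118.366, 233.832, 233.816, 119.771, 431.961};
    static final double[] AFTER = {1387.069, 330.776, 6125.026, 7075.923, 330.587, 939.301};

    // Speedup of benchmark i: ops/ms after divided by ops/ms before.
    static double speedup(int i) {
        return AFTER[i] / BEFORE[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            System.out.printf("%-6s %.2fx%n", NAMES[i], speedup(i));
        }
    }
}
```

Running it gives roughly 1.88x (byte), 2.79x (double), 26.19x (float), 30.26x (int), 2.76x (long) and 2.17x (short).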

Xiaohong Gong has updated the pull request incrementally with one additional 
commit since the last revision:

  Rename the "usePred" to "offsetInRange"
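The semantics the predicated load has to reproduce can be written as a scalar reference. This sketch is modeled on the quoted fallback (`checkMaskFromIndexSize` followed by a per-lane load) but uses `int[]` and `boolean[]` in place of vectors and masks; `MaskedLoadReference` and its methods are hypothetical names. As in the description, only lanes whose mask bit is set must be in bounds.

```java
import java.util.Arrays;
import java.util.Objects;

// Scalar reference for a masked load whose offset may be partly out of range.
final class MaskedLoadReference {

    // Throws IndexOutOfBoundsException if any set lane is out of bounds;
    // unset lanes may point anywhere.
    static void checkSetLanesInBounds(int offset, boolean[] m, int length) {
        for (int i = 0; i < m.length; i++) {
            if (m[i]) Objects.checkIndex(offset + i, length);
        }
    }

    // Set lanes read a[offset + i]; unset lanes are zero.
    static int[] fromArray(int[] a, int offset, boolean[] m) {
        int lanes = m.length;
        if (offset >= 0 && offset <= a.length - lanes) {
            // Whole vector in range: a full load followed by a blend is safe.
            int[] full = Arrays.copyOfRange(a, offset, offset + lanes);
            for (int i = 0; i < lanes; i++) {
                if (!m[i]) full[i] = 0;
            }
            return full;
        }
        // Partly out of range: only the set lanes may be touched.
        checkSetLanesInBounds(offset, m, a.length);
        int[] r = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            if (m[i]) r[i] = a[offset + i];
        }
        return r;
    }
}
```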

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8035/files
  - new: https://git.openjdk.java.net/jdk/pull/8035/files/8f9e8a3c..9b2d2f19

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8035&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8035&range=00-01

  Stats: 393 lines in 41 files changed: 0 ins; 0 del; 393 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8035.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8035/head:pull/8035

PR: https://git.openjdk.java.net/jdk/pull/8035