Re: RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator) [v3]

2022-05-23 Thread Jatin Bhateja
On Thu, 12 May 2022 23:56:49 GMT, Vladimir Ivanov  wrote:

>> Jatin Bhateja has updated the pull request with a new target base due to a 
>> merge or a rebase. The pull request now contains 11 commits:
>> 
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>>  - 8284960: Correcting a typo.
>>  - 8284960: Integrating changes from panama-vector (Add @since 19 tags).
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>>  - 8284960: AARCH64 backend changes.
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>>  - ... and 1 more: 
>> https://git.openjdk.java.net/jdk/compare/3fa1c404...b021e082
>
> Overall, looks good.
> 
> Some minor questions/suggestions follow.

Hi @iwanowww , your comments have been addressed. kindly let me know if you 
have other comments on x86 side changes.

-

PR: https://git.openjdk.java.net/jdk/pull/8425


Re: RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator) [v3]

2022-05-19 Thread Jatin Bhateja
On Thu, 19 May 2022 15:33:49 GMT, Jatin Bhateja  wrote:

>> Do you mean it's important to apply the transformation at the right node 
>> (pick the right node as the root) and it is hard to make a decision during 
>> GVN?
>
> Yes, that what I meant, but with recently added 
> Node::Flag_is_predicated_using_blend it could be possible to move this 
> transformation ahead into idealization routines of reverse/reverse bytes IR 
> nodes.

Addressed this after internally discussing with Sandhya. Moved the transforms 
from final graph re-shaping back to vector intrinsic routines.

-

PR: https://git.openjdk.java.net/jdk/pull/8425


Re: RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator) [v3]

2022-05-19 Thread Jatin Bhateja
On Wed, 18 May 2022 23:35:54 GMT, Vladimir Ivanov  wrote:

>> It was an attempt to facilitate in-lining of these APIs over targets which 
>> do not intrinsify them. I agree its not a generic fix since three APIs are 
>> piggybacking on same entry point and without the knowledge of opcode it will 
>> be inappropriate to take any call at this place, lazy intrinsification gives 
>> opportunity for some of the predications to concertize as compilation 
>> happens under closed world assumptions.
>
> Still not clear why the code is shaped the way it is.
> 
> `Matcher::match_rule_supported_vector()` already checks that there are 
> relevant matching rules.
> 
> The checks require both `CompressM` and `CompressV` to be present to enable 
> the intrinsic. Is it important?
> 
> Also, it doesn't take `EnableVectorSupport` into account while all other 
> vector intrinsics respect it.

Yes, the code was modified to accommodate your comments. 
https://github.com/openjdk/jdk/pull/8425/files#diff-a9dd7e411772c1ee37b54c5ab868a01fe82af905758350f0ba1c370f422c3fe6R718

-

PR: https://git.openjdk.java.net/jdk/pull/8425


Re: RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator) [v3]

2022-05-19 Thread Jatin Bhateja
On Wed, 18 May 2022 23:28:22 GMT, Vladimir Ivanov  wrote:

>> Its more of a chicken-egg problem here, for masked reverse operation, 
>> Reverse IR node is followed by a Blend Node, thus in such a case doing an 
>> eager Identity transform in Reverse::Identity will not work, also deferring 
>> this to blend may also not work since it could be a non-masked reverse 
>> operation, we could have handled it as a special case in 
>> inline_vector_nary_operation, but handling such special case in final graph 
>> reshaping looked more appropriate.
>> 
>> https://github.com/openjdk/panama-vector/pull/182#discussion_r845678080
>
> Do you mean it's important to apply the transformation at the right node 
> (pick the right node as the root) and it is hard to make a decision during 
> GVN?

Yes, that what I meant.

-

PR: https://git.openjdk.java.net/jdk/pull/8425


Re: RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator) [v3]

2022-05-18 Thread Vladimir Ivanov
On Fri, 13 May 2022 08:24:24 GMT, Jatin Bhateja  wrote:

>> src/hotspot/share/opto/c2compiler.cpp line 521:
>> 
>>> 519: if (!Matcher::match_rule_supported(Op_SignumF)) return false;
>>> 520: break;
>>> 521:   case vmIntrinsics::_VectorComExp:
>> 
>> Why `_VectorComExp` intrinsic is special? Other vector intrinsics are 
>> handled later and in a different manner.
>> 
>> What about `ExpandV` case?
>
> It was an attempt to facilitate in-lining of these APIs over targets which do 
> not intrinsify them. I agree its not a generic fix since three APIs are 
> piggybacking on same entry point and without the knowledge of opcode it will 
> be inappropriate to take any call at this place, lazy intrinsification gives 
> opportunity for some of the predications to concertize as compilation happens 
> under closed world assumptions.

Still not clear why the code is shaped the way it is.

`Matcher::match_rule_supported_vector()` already checks that there are relevant 
matching rules.

The checks require both `CompressM` and `CompressV` to be present to enable the 
intrinsic. Is it important?

Also, it doesn't take `EnableVectorSupport` into account while all other vector 
intrinsics respect it.

>> src/hotspot/share/opto/compile.cpp line 3416:
>> 
>>> 3414: 
>>> 3415:   case Op_ReverseBytesV:
>>> 3416:   case Op_ReverseV: {
>> 
>> Can you elaborate, please, why it is performed so late in the optimization 
>> phase (at the very end during graph reshaping) and not during GVN?
>
> Its more of a chicken-egg problem here, for masked reverse operation, Reverse 
> IR node is followed by a Blend Node, thus in such a case doing an eager 
> Identity transform in Reverse::Identity will not work, also deferring this to 
> blend may also not work since it could be a non-masked reverse operation, we 
> could have handled it as a special case in inline_vector_nary_operation, but 
> handling such special case in final graph reshaping looked more appropriate.
> 
> https://github.com/openjdk/panama-vector/pull/182#discussion_r845678080

Do you mean it's important to apply the transformation at the right node (pick 
the right node as the root) and it is hard to make a decision during GVN?

-

PR: https://git.openjdk.java.net/jdk/pull/8425


Re: RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator) [v3]

2022-05-13 Thread Jatin Bhateja
On Thu, 12 May 2022 22:40:50 GMT, Vladimir Ivanov  wrote:

>> Jatin Bhateja has updated the pull request with a new target base due to a 
>> merge or a rebase. The pull request now contains 11 commits:
>> 
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>>  - 8284960: Correcting a typo.
>>  - 8284960: Integrating changes from panama-vector (Add @since 19 tags).
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>>  - 8284960: AARCH64 backend changes.
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>>  - ... and 1 more: 
>> https://git.openjdk.java.net/jdk/compare/3fa1c404...b021e082
>
> src/hotspot/cpu/x86/matcher_x86.hpp line 195:
> 
>> 193:   case Op_PopCountVI:
>> 194: return ((ety == T_INT && 
>> VM_Version::supports_avx512_vpopcntdq()) ||
>> 195:(is_subword_type(ety) && 
>> VM_Version::supports_avx512_bitalg())) ? 0 : 50;
> 
> Should be easier to read when the condition is split. E.g.:
> 
> if (is_subword_type(ety)) {
>   return VM_Version::supports_avx512_bitalg())) ? 0 : 50;
> } else {
>   assert(ety == T_INT, "sanity"); // for documentation purposes
>   return VM_Version::supports_avx512_vpopcntdq() ? 0 : 50;
> }

DONE

> src/hotspot/cpu/x86/vm_version_x86.hpp line 375:
> 
>> 373: decl(RDTSCP,"rdtscp",48) /* RDTSCP 
>> instruction */ \
>> 374: decl(RDPID, "rdpid", 49) /* RDPID 
>> instruction */ \
>> 375: decl(FSRM,  "fsrm",  50) /* Fast Short REP 
>> MOV */ \
> 
> `test/lib-test/jdk/test/whitebox/CPUInfoTest.java` should be adjusted as 
> well, shouldn't it?

Yes, test updated appropriately.

> src/hotspot/share/classfile/vmIntrinsics.hpp line 1152:
> 
>> 1150:   
>> "Ljdk/internal/vm/vector/VectorSupport$ComExpOperation;)"
>> \
>> 1151:   
>> "Ljdk/internal/vm/vector/VectorSupport$VectorPayload;")  
>> \
>> 1152:do_name(vector_comexp_op_name, "comExpOp")  
>> \
> 
> I don't see much value in trying to shorten the name by abbreviating it. I 
> find it easier to read in an expanded form:
> ` compressExpandOp`, `vector_compress_expand_op_name`, 
> `_VectorCompressExpand`, etc.

DONE

> src/hotspot/share/opto/c2compiler.cpp line 521:
> 
>> 519: if (!Matcher::match_rule_supported(Op_SignumF)) return false;
>> 520: break;
>> 521:   case vmIntrinsics::_VectorComExp:
> 
> Why `_VectorComExp` intrinsic is special? Other vector intrinsics are handled 
> later and in a different manner.
> 
> What about `ExpandV` case?

It was an attempt to facilitate in-lining of these APIs over targets which do 
not intrinsify them. I agree its not a generic fix since three APIs are 
piggybacking on same entry point and without the knowledge of opcode it will be 
inappropriate to take any call at this place, lazy intrinsification gives 
opportunity for some of the predications to concertize as compilation happens 
under closed world assumptions.

> src/hotspot/share/opto/compile.cpp line 3416:
> 
>> 3414: 
>> 3415:   case Op_ReverseBytesV:
>> 3416:   case Op_ReverseV: {
> 
> Can you elaborate, please, why it is performed so late in the optimization 
> phase (at the very end during graph reshaping) and not during GVN?

Its more of a chicken-egg problem here, for masked reverse operation, Reverse 
IR node is followed by a Blend Node, thus in such a case doing an eager 
Identity transform in Reverse::Identity will not work, also deferring this to 
blend may also not work since it could be a non-masked reverse operation, we 
could have handled it as a special case in inline_vector_nary_operation, but 
handling such special case in final graph reshaping looked more appropriate.

https://github.com/openjdk/panama-vector/pull/182#discussion_r845678080

-

PR: https://git.openjdk.java.net/jdk/pull/8425


Re: RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator) [v3]

2022-05-12 Thread Vladimir Ivanov
On Tue, 10 May 2022 12:48:25 GMT, Jatin Bhateja  wrote:

>> Hi All,
>> 
>> Patch adds the planned support for new vector operations and APIs targeted 
>> for [JEP 426: Vector API (Fourth 
>> Incubator).](https://bugs.openjdk.java.net/browse/JDK-8280173)
>> 
>> Following is the brief summary of changes:-
>> 
>> 1)  Extends the scope of existing lanewise API for following new vector 
>> operations.
>>-  VectorOperations.BIT_COUNT: counts the number of one-bits
>>- VectorOperations.LEADING_ZEROS_COUNT: counts the number of leading zero 
>> bits
>>- VectorOperations.TRAILING_ZEROS_COUNT: counts the number of trailing 
>> zero bits
>>- VectorOperations.REVERSE: reversing the order of bits
>>- VectorOperations.REVERSE_BYTES: reversing the order of bytes
>>- compress and expand bits: Semantics are based on Hacker's Delight 
>> section 7-4 Compress, or Generalized Extract.
>> 
>> 2)  Adds following new APIs to perform cross lane vector compress and 
>> expansion operations under the influence of a mask.
>>- Vector.compress
>>- Vector.expand 
>>- VectorMask.compress
>> 
>> 3) Adds predicated and non-predicated versions of following new APIs to load 
>> and store the contents of vector from foreign MemorySegments. 
>>   - Vector.fromMemorySegment
>>   - Vector.intoMemorySegment
>> 
>> 4) C2 Compiler IR enhancements and optimized X86 and AARCH64 backend support 
>> for each newly added operation.
>> 
>> 
>>  Patch has been regressed over AARCH64 and X86 targets different AVX levels. 
>> 
>>  Kindly review and share your feedback.
>> 
>>  Best Regards,
>>  Jatin
>
> Jatin Bhateja has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains 11 commits:
> 
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - 8284960: Correcting a typo.
>  - 8284960: Integrating changes from panama-vector (Add @since 19 tags).
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - 8284960: AARCH64 backend changes.
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
>  - ... and 1 more: 
> https://git.openjdk.java.net/jdk/compare/3fa1c404...b021e082

Overall, looks good.

Some minor questions/suggestions follow.

src/hotspot/cpu/aarch64/aarch64_neon.ad line 5700:

> 5698: as_FloatRegister($dst$$reg));
> 5699: }
> 5700: if (bt == T_INT) {

I find it hard to reason about the code in its current form.

Maybe make the second `if` (`bt == T_INT`) nested and move it under  `if (bt == 
T_SHORT || bt == T_INT)`?

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 2587:

> 2585: 
> 2586: void MacroAssembler::vmovdqu(XMMRegister dst, AddressLiteral src, 
> Register scratch_reg, int vector_len) {
> 2587:   assert(vector_len <= AVX_512bit, "unexpected vector length");

The assert becomes redundant.

src/hotspot/cpu/x86/matcher_x86.hpp line 195:

> 193:   case Op_PopCountVI:
> 194: return ((ety == T_INT && 
> VM_Version::supports_avx512_vpopcntdq()) ||
> 195:(is_subword_type(ety) && 
> VM_Version::supports_avx512_bitalg())) ? 0 : 50;

Should be easier to read when the condition is split. E.g.:

if (is_subword_type(ety)) {
  return VM_Version::supports_avx512_bitalg())) ? 0 : 50;
} else {
  assert(ety == T_INT, "sanity"); // for documentation purposes
  return VM_Version::supports_avx512_vpopcntdq() ? 0 : 50;
}

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 7953:

> 7951: StubRoutines::x86::_vector_iota_indices = 
> generate_iota_indices("iota_indices");
> 7952: 
> 7953: if (UsePopCountInstruction && VM_Version::supports_avx2() && 
> !VM_Version::supports_avx512_vpopcntdq()) {

Why is the LUT unconditionally generated? `UsePopCountInstruction` still guides 
the usages.

src/hotspot/cpu/x86/vm_version_x86.hpp line 375:

> 373: decl(RDTSCP,"rdtscp",48) /* RDTSCP 
> instruction */ \
> 374: decl(RDPID, "rdpid", 49) /* RDPID 
> instruction */ \
> 375: decl(FSRM,  "fsrm",  50) /* Fast Short REP 
> MOV */ \

`test/lib-test/jdk/test/whitebox/CPUInfoTest.java` should be adjusted as well, 
shouldn't it?

src/hotspot/cpu/x86/x86.ad line 2113:

> 2111: 
> 2112: case Op_CountLeadingZerosV:
> 2113:   if ((bt == T_INT || bt == T_LONG) && 
> VM_Version::supports_avx512cd()) {

Newly introduced `is_non_subword_integral_type(bt)` can be used here instead of 
`bt == T_INT || bt == T_LONG`.

src/hotspot/share/classfile/vmIntrinsics.hpp line 1152:

> 1150:   
> "Ljdk/internal/vm/vector/VectorSupport$ComExpOperation;)"  

Re: RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator) [v3]

2022-05-10 Thread Jatin Bhateja
> Hi All,
> 
> Patch adds the planned support for new vector operations and APIs targeted 
> for [JEP 426: Vector API (Fourth 
> Incubator).](https://bugs.openjdk.java.net/browse/JDK-8280173)
> 
> Following is the brief summary of changes:-
> 
> 1)  Extends the scope of existing lanewise API for following new vector 
> operations.
>-  VectorOperations.BIT_COUNT: counts the number of one-bits
>- VectorOperations.LEADING_ZEROS_COUNT: counts the number of leading zero 
> bits
>- VectorOperations.TRAILING_ZEROS_COUNT: counts the number of trailing 
> zero bits
>- VectorOperations.REVERSE: reversing the order of bits
>- VectorOperations.REVERSE_BYTES: reversing the order of bytes
>- compress and expand bits: Semantics are based on Hacker's Delight 
> section 7-4 Compress, or Generalized Extract.
> 
> 2)  Adds following new APIs to perform cross lane vector compress and 
> expansion operations under the influence of a mask.
>- Vector.compress
>- Vector.expand 
>- VectorMask.compress
> 
> 3) Adds predicated and non-predicated versions of following new APIs to load 
> and store the contents of vector from foreign MemorySegments. 
>   - Vector.fromMemorySegment
>   - Vector.intoMemorySegment
> 
> 4) C2 Compiler IR enhancements and optimized X86 and AARCH64 backend support 
> for each newly added operation.
> 
> 
>  Patch has been regressed over AARCH64 and X86 targets different AVX levels. 
> 
>  Kindly review and share your feedback.
> 
>  Best Regards,
>  Jatin

Jatin Bhateja has updated the pull request with a new target base due to a 
merge or a rebase. The pull request now contains 11 commits:

 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
 - 8284960: Correcting a typo.
 - 8284960: Integrating changes from panama-vector (Add @since 19 tags).
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
 - 8284960: AARCH64 backend changes.
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8284960
 - ... and 1 more: https://git.openjdk.java.net/jdk/compare/3fa1c404...b021e082

-

Changes: https://git.openjdk.java.net/jdk/pull/8425/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=8425=02
  Stats: 37901 lines in 214 files changed: 16527 ins; 16924 del; 4450 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8425.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8425/head:pull/8425

PR: https://git.openjdk.java.net/jdk/pull/8425