Re: RFR: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions [v6]

2020-09-28 Thread Jatin Bhateja
: > http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8252847 : Review comments resolution - Changes: - all: https://git.openjdk.java.net/jdk/pul

Re: RFR: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions [v3]

2020-09-25 Thread Jatin Bhateja
On Tue, 22 Sep 2020 16:39:15 GMT, Jatin Bhateja wrote: > @jatin-bhateja Can you put summary of performance improvement into JBS? Hi @vnkozlov, @neliasso, kindly let me know your feedback. If there are no more comments, is it OK to integrate this patch? - PR: ht

Re: RFR: 8252847: New AVX512 optimized stubs for both conjoint and disjoint arraycopy [v4]

2020-09-22 Thread Jatin Bhateja
: > http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8252847: Review comments resolution; code reorganized to cover arraycopy for reference types. -

Re: RFR: 8252847: New AVX512 optimized stubs for both conjoint and disjoint arraycopy [v5]

2020-09-22 Thread Jatin Bhateja
: > http://cr.openjdk.java.net/~jbhateja/8252847/JMH_results/ArrayCopy_AVX3_Stubs_WithOpts.txt Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8252847 : Modifying file permission to resolve jcheck failure. - Changes: -

Re: RFR: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions [v3]

2020-09-22 Thread Jatin Bhateja
On Tue, 22 Sep 2020 03:34:37 GMT, Vladimir Kozlov wrote: > @jatin-bhateja Can you put summary of performance improvement into JBS? yes, I have added the summary in JBS. - PR: https://git.openjdk.java.net/jdk/pull/61

Re: RFR: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions [v3]

2020-09-22 Thread Jatin Bhateja
On Tue, 22 Sep 2020 03:34:37 GMT, Vladimir Kozlov wrote: > @jatin-bhateja Can you put summary of performance improvement into JBS? Yes, I have added the summary to JBS - PR: https://git.openjdk.java.net/jdk/pull/61

Re: RFR: 8268276: Base64 Decoding optimization for x86 using AVX-512 [v3]

2021-06-07 Thread Jatin Bhateja
On Tue, 8 Jun 2021 00:30:38 GMT, Scott Gibbons wrote: >> Add the Base64 Decode intrinsic for x86 to utilize AVX-512 for acceleration. >> Also allows for performance improvement for non-AVX-512 enabled platforms. >> Due to the nature of MIME-encoded inputs, modify the intrinsic signature to

Re: RFR: 8266054: VectorAPI rotate operation optimization [v8]

2021-06-10 Thread Jatin Bhateja
On Tue, 8 Jun 2021 10:29:44 GMT, Jatin Bhateja wrote: >> Current VectorAPI Java side implementation expresses rotateLeft and >> rotateRight operation using following operations:- >> >> vec1 = lanewise(VectorOperators.LSHL, n) >> vec2 = lanewise(Vecto

Re: RFR: 8266054: VectorAPI rotate operation optimization [v8]

2021-06-08 Thread Jatin Bhateja
66.01 | > -1.33 | 21140.67 | 21970.03 | 3.92 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 11676.46 | 11358.64 | > -2.72 | 11204.90 | 11213.48 | 0.08 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 5728.20 | 5772.49 | 0.77 > | 5594.33 | 5544.25 | -0.90 > Rotate

Re: RFR: 8266054: VectorAPI rotate operation optimization [v8]

2021-06-18 Thread Jatin Bhateja
On Tue, 8 Jun 2021 10:29:44 GMT, Jatin Bhateja wrote: >> Current VectorAPI Java side implementation expresses rotateLeft and >> rotateRight operation using following operations:- >> >> vec1 = lanewise(VectorOperators.LSHL, n) >> vec2 = lanewise(Vecto

Re: RFR: 8268276: Base64 Decoding optimization for x86 using AVX-512 [v3]

2021-06-08 Thread Jatin Bhateja
On Tue, 8 Jun 2021 13:25:00 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6239: >> >>> 6237: >>> 6238: __ align(32); >>> 6239: __ BIND(L_bruteForce); >> >> Is this alignment needed ? Given that brute force loop is already aligned. > > I must be

Re: RFR: 8266054: VectorAPI rotate operation optimization [v5]

2021-05-10 Thread Jatin Bhateja
On Sat, 8 May 2021 15:40:53 GMT, Paul Sandoz wrote: > Looks good. Someone from the HotSpot side needs to review related changes. > > The way i read the perf numbers is that on non AVX512 systems the numbers are > in the noise (no worse, no better), with significant improvement on AVX512. Hi

Integrated: 8267357: build breaks with -Werror option on micro benchmark added for JDK-8256973

2021-05-19 Thread Jatin Bhateja
On Wed, 19 May 2021 08:20:13 GMT, Jatin Bhateja wrote: > Relevant declarations modified and tested with -Werror, no longer see > unchecked conversion warnings. > > Kindly review and approve. This pull request has now been integrated. Changeset: 88b11423 Author:Jatin

RFR: 8267357: build breaks with -Werror option on micro benchmark added for JDK-8256973

2021-05-19 Thread Jatin Bhateja
Relevant declarations modified and tested with -Werror, no longer see unchecked conversion warnings. Kindly review and approve. - Commit messages: - 8267357: build breaks with -Werror option on micro benchmark added for JDK-8256973 Changes:

Re: RFR: 8266054: VectorAPI rotate operation optimization [v5]

2021-05-07 Thread Jatin Bhateja
66.01 | > -1.33 | 21140.67 | 21970.03 | 3.92 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 11676.46 | 11358.64 | > -2.72 | 11204.90 | 11213.48 | 0.08 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 5728.20 | 5772.49 | 0.77 > | 5594.33 | 5544.25 | -0.90 > RotateBenc

Re: RFR: 8266054: VectorAPI rotate operation optimization [v4]

2021-05-07 Thread Jatin Bhateja
On Fri, 7 May 2021 18:31:15 GMT, Jatin Bhateja wrote: >> Current VectorAPI Java side implementation expresses rotateLeft and >> rotateRight operation using following operations:- >> >> vec1 = lanewise(VectorOperators.LSHL, n) >> vec2 = lanewise(Vecto

Re: RFR: 8266054: VectorAPI rotate operation optimization [v4]

2021-05-07 Thread Jatin Bhateja
2.78 > RotateBenchmark.testRotateRightL | 256 | 11 | 8183.789 | 8193.087 | 0.11 > RotateBenchmark.testRotateRightL | 64 | 21 | 4092.686 | 4193.712 | 2.47 > RotateBenchmark.testRotateRightL | 128 | 21 | 2036.854 | 2038.927 | 0.10 > RotateBenchmark.testRotateRightL | 256 | 21 | 8155.01

Re: RFR: 8266317: Vector API enhancements

2021-05-06 Thread Jatin Bhateja
On Thu, 29 Apr 2021 21:13:38 GMT, Paul Sandoz wrote: > This PR contains API and implementation changes for [JEP-414 Vector API > (Second Incubator)](https://openjdk.java.net/jeps/414), in preparation for > when targeted. > > Enhancements are made to the API for the support of operations on

Re: RFR: 8266054: VectorAPI rotate operation optimization [v6]

2021-05-17 Thread Jatin Bhateja
On Mon, 17 May 2021 12:06:33 GMT, Jatin Bhateja wrote: >> Current VectorAPI Java side implementation expresses rotateLeft and >> rotateRight operation using following operations:- >> >> vec1 = lanewise(VectorOperators.LSHL, n) >> vec2 = lanewise(Vecto

Re: RFR: 8266054: VectorAPI rotate operation optimization [v6]

2021-05-17 Thread Jatin Bhateja
66.01 | > -1.33 | 21140.67 | 21970.03 | 3.92 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 11676.46 | 11358.64 | > -2.72 | 11204.90 | 11213.48 | 0.08 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 5728.20 | 5772.49 | 0.77 > | 5594.33 | 5544.25 | -0.90 > RotateBenchm

Re: RFR: 8266054: VectorAPI rotate operation optimization [v7]

2021-05-23 Thread Jatin Bhateja
66.01 | > -1.33 | 21140.67 | 21970.03 | 3.92 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 11676.46 | 11358.64 | > -2.72 | 11204.90 | 11213.48 | 0.08 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 5728.20 | 5772.49 | 0.77 > | 5594.33 | 5544.25 | -0.90 > Rotate

Re: RFR: 8266054: VectorAPI rotate operation optimization [v7]

2021-06-01 Thread Jatin Bhateja
On Mon, 24 May 2021 05:50:44 GMT, Jatin Bhateja wrote: >> Current VectorAPI Java side implementation expresses rotateLeft and >> rotateRight operation using following operations:- >> >> vec1 = lanewise(VectorOperators.LSHL, n) >> vec2 = lanewise(Vecto

Re: RFR: 8266054: VectorAPI rotate operation optimization [v9]

2021-06-30 Thread Jatin Bhateja
66.01 | > -1.33 | 21140.67 | 21970.03 | 3.92 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 11676.46 | 11358.64 | > -2.72 | 11204.90 | 11213.48 | 0.08 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 5728.20 | 5772.49 | 0.77 > | 5594.33 | 5544.25 | -0.90 > Rotate

RFR: 8266054: VectorAPI rotate operation optimization

2021-04-27 Thread Jatin Bhateja
The current VectorAPI Java-side implementation expresses the rotateLeft and rotateRight operations using the following operations: vec1 = lanewise(VectorOperators.LSHL, n) vec2 = lanewise(VectorOperators.LSHR, n) res = lanewise(VectorOperators.OR, vec1 , vec2) This patch moves above handling
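As a scalar illustration of the shift-and-OR composition described in this message, a minimal sketch follows. It uses plain `int` arithmetic instead of the incubating Vector API so that it runs without extra module flags. The class name `RotateSketch` and the helper `rotateLeftViaShifts` are hypothetical and are not taken from the patch; the complementary shift count `32 - s` is an assumption about how the two shifts combine into a rotate.

```java
final class RotateSketch {
    // Rotate left built from two shifts and an OR, the scalar analogue of
    // the lanewise LSHL / LSHR / OR sequence (hypothetical helper).
    public static int rotateLeftViaShifts(int x, int n) {
        int s = n & 31;                  // keep the count inside the 32-bit lane
        int left = x << s;               // bits moved towards the high end
        int right = x >>> ((32 - s) & 31); // bits that wrap around to the low end
        return left | right;
    }

    public static void main(String[] args) {
        int x = 0x12345678;
        // Compare the composed form against the library rotate.
        System.out.println(rotateLeftViaShifts(x, 8) == Integer.rotateLeft(x, 8));
    }
}
```

Replacing three lanewise operations with one rotate node is what lets a backend with a native rotate instruction emit it directly; the sketch only shows that the two forms agree on values.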

Re: RFR: 8266054: VectorAPI rotate operation optimization [v3]

2021-05-03 Thread Jatin Bhateja
2.78 > RotateBenchmark.testRotateRightL | 256 | 11 | 8183.789 | 8193.087 | 0.11 > RotateBenchmark.testRotateRightL | 64 | 21 | 4092.686 | 4193.712 | 2.47 > RotateBenchmark.testRotateRightL | 128 | 21 | 2036.854 | 2038.927 | 0.10 > RotateBenchmark.testRotateRightL | 256 | 21 | 8155.01

Re: RFR: 8266054: VectorAPI rotate operation optimization [v3]

2021-05-03 Thread Jatin Bhateja
On Tue, 27 Apr 2021 18:43:11 GMT, Paul Sandoz wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> 8266054: Review comments resolution. > > I noticed the tests are only updated for int

Re: RFR: 8266054: VectorAPI rotate operation optimization [v2]

2021-05-03 Thread Jatin Bhateja
On Fri, 30 Apr 2021 15:44:41 GMT, Paul Sandoz wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a >> merge or a rebase. The incremental webrev excludes the unrelated changes >> brought in by the merge/rebase. The pull request contai

Re: RFR: 8266054: VectorAPI rotate operation optimization [v10]

2021-07-15 Thread Jatin Bhateja
66.01 | > -1.33 | 21140.67 | 21970.03 | 3.92 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 11676.46 | 11358.64 | > -2.72 | 11204.90 | 11213.48 | 0.08 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 5728.20 | 5772.49 | 0.77 > | 5594.33 | 5544.25 | -0.90 > RotateBen

Re: RFR: 8273322: Enhance macro logic optimization for masked logic operations. [v2]

2022-01-03 Thread Jatin Bhateja
cOpts.partiallyMaskedLogicOperationsLong256 > | 1024 | 996.906 | 1013.649 | 1.016794964 > o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 > | 256 | 2045.594 | 2048.966 | 1.001648421 > o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOper

Re: RFR: 8273322: Enhance macro logic optimization for masked logic operations. [v3]

2022-01-05 Thread Jatin Bhateja
On Tue, 4 Jan 2022 15:11:47 GMT, Jatin Bhateja wrote: >> Patch extends existing macrologic inferencing algorithm to handle masked >> logic operations. >> >> Existing algorithm: >> >> 1. Identify logic cone roots. >> 2. Packs parent and logic child

Re: RFR: 8273322: Enhance macro logic optimization for masked logic operations. [v2]

2022-01-05 Thread Jatin Bhateja
On Tue, 4 Jan 2022 02:25:36 GMT, Vladimir Kozlov wrote: > I think whole "Bitwise operation packing optimization" code should be moved > out from `compile.cpp`. Maybe to `vectornode.cpp` where `MacroLogicVNode` > code is located. > Hi @vnkozlov , Yes we can also extended

Re: RFR: 8273322: Enhance macro logic optimization for masked logic operations. [v4]

2022-01-05 Thread Jatin Bhateja
cOpts.partiallyMaskedLogicOperationsLong256 > | 1024 | 996.906 | 1013.649 | 1.016794964 > o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 > | 256 | 2045.594 | 2048.966 | 1.001648421 > o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMasked

RFR: 8273322: Enhance macro logic optimization for masked logic operations.

2021-12-20 Thread Jatin Bhateja
Patch extends the existing macrologic inferencing algorithm to handle masked logic operations. Existing algorithm: 1. Identify logic cone roots. 2. Pack parent and logic child nodes into a MacroLogic node in bottom-up traversal if input constraints are met, i.e. maximum number of inputs which a

Re: RFR: 8266054: VectorAPI rotate operation optimization [v12]

2021-07-18 Thread Jatin Bhateja
66.01 | > -1.33 | 21140.67 | 21970.03 | 3.92 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 11676.46 | 11358.64 | > -2.72 | 11204.90 | 11213.48 | 0.08 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 5728.20 | 5772.49 | 0.77 > | 5594.33 | 5544.25 | -0.90 > Ro

Re: RFR: 8266054: VectorAPI rotate operation optimization [v11]

2021-07-18 Thread Jatin Bhateja
On Sun, 18 Jul 2021 20:28:34 GMT, Jatin Bhateja wrote: >> Current VectorAPI Java side implementation expresses rotateLeft and >> rotateRight operation using following operations:- >> >> vec1 = lanewise(VectorOperators.LSHL, n) >> vec2 = lanewise(Vecto

Re: RFR: 8266054: VectorAPI rotate operation optimization [v11]

2021-07-18 Thread Jatin Bhateja
66.01 | > -1.33 | 21140.67 | 21970.03 | 3.92 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 11676.46 | 11358.64 | > -2.72 | 11204.90 | 11213.48 | 0.08 > RotateBenchmark.testRotateRightS | 256.00 | 31.00 | 5728.20 | 5772.49 | 0.77 > | 5594.33 | 5544.25 | -0.90 > RotateBenchm

Re: RFR: 8266054: VectorAPI rotate operation optimization [v10]

2021-07-18 Thread Jatin Bhateja
On Fri, 16 Jul 2021 00:52:21 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a >> merge or a rebase. The pull request now contains 15 commits: >> >> - 8266054: Incorporating styling changes based on rev

Re: RFR: 8266054: VectorAPI rotate operation optimization [v10]

2021-07-27 Thread Jatin Bhateja
On Tue, 27 Jul 2021 02:52:13 GMT, Eric Liu wrote: >> @sviswa7, SLP flow will either have a constant 8-bit shift value or a >> variable shift present in a vector; this also includes a broadcast non-constant >> shift value or a shift value beyond 8 bits. > > It would be better to comment here, since the

Re: RFR: 8266054: VectorAPI rotate operation optimization [v13]

2021-07-27 Thread Jatin Bhateja
On Tue, 27 Jul 2021 01:54:01 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a >> merge or a rebase. The pull request now contains 19 commits: >> >> - 8266054: Re-designing benchmark to remove noise. >

Integrated: 8266054: VectorAPI rotate operation optimization

2021-07-27 Thread Jatin Bhateja
On Tue, 27 Apr 2021 17:56:04 GMT, Jatin Bhateja wrote: > Current VectorAPI Java side implementation expresses rotateLeft and > rotateRight operation using following operations:- > > vec1 = lanewise(VectorOperators.LSHL, n) > vec2 = lanewise(VectorOperators.LSHR, n) >

Re: RFR: 8266054: VectorAPI rotate operation optimization [v13]

2021-07-27 Thread Jatin Bhateja
On Tue, 27 Jul 2021 00:24:52 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a >> merge or a rebase. The pull request now contains 19 commits: >> >> - 8266054: Re-designing benchmark to remove noise. >

Re: RFR: 8271368: [BACKOUT] JDK-8266054 VectorAPI rotate operation optimization

2021-07-28 Thread Jatin Bhateja
On Wed, 28 Jul 2021 05:35:59 GMT, Vladimir Kozlov wrote: > Backout the following changes due to vector tests failures in tier 2 and > later: > [JDK-8266054](https://bugs.openjdk.java.net/browse/JDK-8266054) VectorAPI > rotate operation optimization > > Changes also caused copyright header

Re: RFR: 8266054: VectorAPI rotate operation optimization [v10]

2021-07-26 Thread Jatin Bhateja
On Mon, 26 Jul 2021 17:19:07 GMT, Sandhya Viswanathan wrote: >> And'ing with shift_mask is already done on Java API side implementation >> before making a call to intrinsic routine. > > @jatin-bhateja This question is still pending. Other than VectorAPI, SLP also inf

Re: RFR: 8266054: VectorAPI rotate operation optimization [v10]

2021-07-26 Thread Jatin Bhateja
On Mon, 26 Jul 2021 17:19:07 GMT, Sandhya Viswanathan wrote: >> And'ing with shift_mask is already done on Java API side implementation >> before making a call to intrinsic routine. > > @jatin-bhateja This question is still pending. @sviswa7, SLP flow will either have a c

Re: RFR: 8273322: Enhance macro logic optimization for masked logic operations. [v2]

2022-01-04 Thread Jatin Bhateja
On Tue, 4 Jan 2022 02:21:35 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a >> merge or a rebase. The incremental webrev excludes the unrelated changes >> brought in by the merge/rebase. The pull request conta

Re: RFR: 8273322: Enhance macro logic optimization for masked logic operations. [v3]

2022-01-04 Thread Jatin Bhateja
cOpts.partiallyMaskedLogicOperationsLong256 > | 1024 | 996.906 | 1013.649 | 1.016794964 > o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 > | 256 | 2045.594 | 2048.966 | 1.001648421 > o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaske

Re: RFR: 8279508: Auto-vectorize Math.round API

2022-01-15 Thread Jatin Bhateja
On Sun, 16 Jan 2022 02:23:15 GMT, Quan Anh Mai wrote: > Hi, did we have tests for the scalar intrinsification already? Thanks. Verification is done against scalar rounding operation.

Re: RFR: 8279508: Auto-vectorize Math.round API [v3]

2022-02-13 Thread Jatin Bhateja
On Sun, 13 Feb 2022 10:58:19 GMT, Andrew Haley wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a >> merge or a rebase. The incremental webrev excludes the unrelated changes >> brought in by the merge/rebase. The pull request contai

Re: RFR: 8279508: Auto-vectorize Math.round API [v3]

2022-02-13 Thread Jatin Bhateja
On Sun, 13 Feb 2022 13:08:41 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4066: >> >>> 4064: } >>> 4065: >>> 4066: void >>> C2_MacroAssembler::vector_cast_double_special_cases_evex(XMMRegister dst, >>

Re: RFR: 8278173: [vectorapi] Add x64 intrinsics for unsigned (zero extended) casts

2022-02-10 Thread Jatin Bhateja
On Sat, 5 Feb 2022 15:34:08 GMT, Quan Anh Mai wrote: > Hi, > > This patch implements the unsigned upcast intrinsics in x86, which are used > in vector lane-wise reinterpreting operations. > > Thank you very much. src/hotspot/cpu/x86/x86.ad line 7288: > 7286: break; > 7287:

Re: RFR: 8279508: Auto-vectorize Math.round API [v9]

2022-03-04 Thread Jatin Bhateja
On Fri, 4 Mar 2022 06:06:52 GMT, Joe Darcy wrote: >> test/jdk/java/lang/Math/RoundTests.java line 32: >> >>> 30: public static void main(String... args) { >>> 31: int failures = 0; >>> 32: for (int i = 0; i < 10; i++) { >> >> Is there an idiom to trigger the

Re: RFR: 8279508: Auto-vectorize Math.round API [v11]

2022-03-06 Thread Jatin Bhateja
On Sun, 6 Mar 2022 09:31:27 GMT, Andrew Haley wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> 8279508: Removing +LogCompilation flag. > > src/hotspot/cpu/x86/c2_MacroAssembler

Re: RFR: 8279508: Auto-vectorize Math.round API [v14]

2022-03-11 Thread Jatin Bhateja
und_float | 1024.00 | 825.99 | 4754.66 | 5.76 | > 751.83 | 2274.13 | 3.02 > FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | > 388.52 | 1334.18 | 3.43 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja h

Re: RFR: 8279508: Auto-vectorize Math.round API [v9]

2022-03-11 Thread Jatin Bhateja
On Thu, 10 Mar 2022 14:29:36 GMT, Joe Darcy wrote: >> Hi @jddarcy , >> >> Test has been modified on the same lines using generic options which >> manipulate compilation thresholds and agnostic to target platforms. >> >> * @run main/othervm -XX:Tier3CompileThreshold=100 >>

Re: RFR: 8279508: Auto-vectorize Math.round API [v15]

2022-03-14 Thread Jatin Bhateja
On Mon, 14 Mar 2022 09:29:28 GMT, Andrew Haley wrote: >> Good suggestion, but as of now we are not using vector calling conventions >> for stubs. > > I don't understand this comment. If the stub is only to be used by you, then > you can determine your own calling convention. We are passing

Re: RFR: 8279508: Auto-vectorize Math.round API [v12]

2022-03-09 Thread Jatin Bhateja
und_float | 1024.00 | 825.99 | 4754.66 | 5.76 | > 751.83 | 2274.13 | 3.02 > FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | > 388.52 | 1334.18 | 3.43 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has u

Re: RFR: 8279508: Auto-vectorize Math.round API [v17]

2022-03-18 Thread Jatin Bhateja
On Mon, 14 Mar 2022 10:35:58 GMT, Tobias Hartmann wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> 8279508: Windows build failure fix. > > `compiler/c2/cr6340864/TestFloatVect.java`

Re: RFR: 8279508: Auto-vectorize Math.round API [v18]

2022-03-18 Thread Jatin Bhateja
und_float | 1024.00 | 825.99 | 4754.66 | 5.76 | > 751.83 | 2274.13 | 3.02 > FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | > 388.52 | 1334.18 | 3.43 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has u

Re: RFR: 8279508: Auto-vectorize Math.round API [v15]

2022-03-12 Thread Jatin Bhateja
On Sat, 12 Mar 2022 23:20:58 GMT, Quan Anh Mai wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> 8279508: Creating separate test for round double under feature check. > > src/hotspot

Re: RFR: 8279508: Auto-vectorize Math.round API [v15]

2022-03-12 Thread Jatin Bhateja
On Sun, 13 Mar 2022 00:06:07 GMT, Quan Anh Mai wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4161: >> >>> 4159: movl(scratch, 1056964608); >>> 4160: movq(xtmp1, scratch); >>> 4161: vbroadcastss(xtmp1, xtmp1, vec_enc); >> >> An `evpbroadcastd` would reduce this by one

Re: RFR: 8279508: Auto-vectorize Math.round API [v16]

2022-03-12 Thread Jatin Bhateja
und_float | 1024.00 | 825.99 | 4754.66 | 5.76 | > 751.83 | 2274.13 | 3.02 > FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | > 388.52 | 1334.18 | 3.43 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja h

Re: RFR: 8279508: Auto-vectorize Math.round API [v13]

2022-03-10 Thread Jatin Bhateja
und_float | 1024.00 | 825.99 | 4754.66 | 5.76 | > 751.83 | 2274.13 | 3.02 > FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | > 388.52 | 1334.18 | 3.43 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja h

Re: RFR: 8279508: Auto-vectorize Math.round API [v15]

2022-03-12 Thread Jatin Bhateja
und_float | 1024.00 | 825.99 | 4754.66 | 5.76 | > 751.83 | 2274.13 | 3.02 > FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | > 388.52 | 1334.18 | 3.43 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja h

Re: RFR: 8279508: Auto-vectorize Math.round API [v17]

2022-03-12 Thread Jatin Bhateja
und_float | 1024.00 | 825.99 | 4754.66 | 5.76 | > 751.83 | 2274.13 | 3.02 > FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | > 388.52 | 1334.18 | 3.43 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja h

Re: RFR: 8279508: Auto-vectorize Math.round API [v4]

2022-02-16 Thread Jatin Bhateja
und_float | 1024.00 | 825.69 | 3592.54 | 4.35 | > 825.32 | 1836.42 | 2.23 > FpRoundingBenchmark.test_round_float | 2048.00 | 388.55 | 1895.77 | 4.88 | > 412.31 | 945.82 | 2.29 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja h

Re: RFR: 8279508: Auto-vectorize Math.round API [v2]

2022-02-12 Thread Jatin Bhateja
On Fri, 21 Jan 2022 00:49:04 GMT, Sandhya Viswanathan wrote: > The JVM currently initializes the x86 mxcsr to round to nearest even, see > below in stubGenerator_x86_64.cpp: // Round to nearest (even), 64-bit mode, > exceptions masked StubRoutines::x86::_mxcsr_std = 0x1F80; The above works
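To make the quoted constant easier to read, the following sketch decodes `0x1F80` field by field. The bit positions used here (exception masks in bits 7 to 12, rounding control in bits 13 and 14) follow the commonly documented MXCSR layout and are stated as an assumption, not taken from the patch; the class `MxcsrSketch` and its helpers are hypothetical.

```java
final class MxcsrSketch {
    // Rounding-control field, assumed to occupy bits 13..14 of MXCSR
    // (0 = round to nearest even).
    public static int roundingControl(int mxcsr) {
        return (mxcsr >>> 13) & 0x3;
    }

    // Six exception-mask bits, assumed to occupy bits 7..12 of MXCSR.
    public static int exceptionMasks(int mxcsr) {
        return (mxcsr >>> 7) & 0x3F;
    }

    public static void main(String[] args) {
        int mxcsrStd = 0x1F80; // value quoted in the message above
        System.out.println("RC    = " + roundingControl(mxcsrStd));
        System.out.println("masks = 0x" + Integer.toHexString(exceptionMasks(mxcsrStd)));
    }
}
```

Under that layout the value has all six exception masks set and a rounding-control field of zero, which matches the "round to nearest (even), exceptions masked" comment quoted in the message.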

Re: RFR: 8279508: Auto-vectorize Math.round API [v3]

2022-02-12 Thread Jatin Bhateja
und_float | 1024.00 | 825.69 | 3592.54 | 4.35 | > 825.32 | 1836.42 | 2.23 > FpRoundingBenchmark.test_round_float | 2048.00 | 388.55 | 1895.77 | 4.88 | > 412.31 | 945.82 | 2.29 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bha

Re: RFR: 8279508: Auto-vectorize Math.round API [v6]

2022-02-17 Thread Jatin Bhateja
> Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8279508: Fixing for windows failure. - Changes: - all: https://git.openjdk.java.net/jdk/p

Re: RFR: 8279508: Auto-vectorize Math.round API [v7]

2022-02-24 Thread Jatin Bhateja
On Thu, 24 Feb 2022 00:43:27 GMT, Sandhya Viswanathan wrote: > Also curious, how does the performance look with all these changes. Updated new perf numbers. - PR: https://git.openjdk.java.net/jdk/pull/7094

Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v4]

2022-02-24 Thread Jatin Bhateja
On Thu, 24 Feb 2022 02:43:46 GMT, Vamsi Parasa wrote: >> Optimizes the divideUnsigned() and remainderUnsigned() methods in >> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This >> change shows 3x improvement for Integer methods and up to 25% improvement for >> Long. This

Re: RFR: 8279508: Auto-vectorize Math.round API [v8]

2022-02-24 Thread Jatin Bhateja
> Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8279508: Review comments resolved. - Changes: - all: https://git.openjdk.java.net/jdk/p

Re: RFR: 8279508: Auto-vectorize Math.round API [v7]

2022-02-24 Thread Jatin Bhateja
On Thu, 24 Feb 2022 01:43:27 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> 8279508: Review comments resolved. > > src/hotspot/cpu/x86/macroAssembler

Re: RFR: 8279508: Auto-vectorize Math.round API [v9]

2022-02-24 Thread Jatin Bhateja
und_float | 1024.00 | 825.99 | 4754.66 | 5.76 | > 751.83 | 2274.13 | 3.02 > FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | > 388.52 | 1334.18 | 3.43 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja h

Re: RFR: 8279508: Auto-vectorize Math.round API [v3]

2022-02-16 Thread Jatin Bhateja
On Mon, 14 Feb 2022 17:14:10 GMT, Jatin Bhateja wrote: >> That pseudocode would make a very useful comment too. This whole patch is >> very thinly commented. > >> > Hi, IIRC for evex encoding you can embed the RC control bit directly in >> > the evex prefix, re

Re: RFR: 8279508: Auto-vectorize Math.round API [v3]

2022-02-16 Thread Jatin Bhateja
On Wed, 16 Feb 2022 12:26:45 GMT, Jatin Bhateja wrote: >>> > Hi, IIRC for evex encoding you can embed the RC control bit directly in >>> > the evex prefix, removing the need to rely on global MXCSR register. >>> > Thanks. >>> >>>

Re: RFR: 8279508: Auto-vectorize Math.round API [v5]

2022-02-16 Thread Jatin Bhateja
und_float | 1024.00 | 825.69 | 3592.54 | 4.35 | > 825.32 | 1836.42 | 2.23 > FpRoundingBenchmark.test_round_float | 2048.00 | 388.55 | 1895.77 | 4.88 | > 412.31 | 945.82 | 2.29 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has u

Re: RFR: 8279508: Auto-vectorize Math.round API [v5]

2022-02-16 Thread Jatin Bhateja
On Wed, 16 Feb 2022 12:30:27 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar >> IR nodes for above intrinsics. >> - Test

Re: RFR: 8279508: Auto-vectorize Math.round API [v9]

2022-02-25 Thread Jatin Bhateja
On Fri, 25 Feb 2022 06:22:42 GMT, Jatin Bhateja wrote: >> Summary of changes: >> - Intrinsify Math.round(float) and Math.round(double) APIs. >> - Extend auto-vectorizer to infer vector operations on encountering scalar >> IR nodes for above intrinsics. >> - Test

Re: RFR: 8279508: Auto-vectorize Math.round API [v7]

2022-02-23 Thread Jatin Bhateja
> Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8279508: Review comments resolved. - Changes: - all: https://git.openjdk.java.net/jdk/p

Re: RFR: 8279508: Auto-vectorize Math.round API [v6]

2022-02-23 Thread Jatin Bhateja
On Wed, 23 Feb 2022 01:31:24 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> 8279508: Fixing for windows failure. > > src/hotspot/cpu/x86/c2_MacroAssembler

Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long

2022-02-22 Thread Jatin Bhateja
On Tue, 22 Feb 2022 09:24:47 GMT, Vamsi Parasa wrote: > Optimizes the divideUnsigned() and remainderUnsigned() methods in > java.lang.Integer and java.lang.Long classes using x86 intrinsics. This > change shows 3x improvement for Integer methods and up to 25% improvement for > Long. This
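For readers unfamiliar with the two methods being intrinsified, here is a minimal sketch of their semantics. It reproduces the `int` results by widening to `long` and compares them with `Integer.divideUnsigned` and `Integer.remainderUnsigned`. The class `UnsignedDivSketch` and the `...ViaLong` helpers are hypothetical and say nothing about how the JDK or the intrinsic computes the result.

```java
final class UnsignedDivSketch {
    // Unsigned 32-bit quotient computed by zero-extending both operands to long.
    public static int divideViaLong(int dividend, int divisor) {
        return (int) ((dividend & 0xFFFFFFFFL) / (divisor & 0xFFFFFFFFL));
    }

    // Unsigned 32-bit remainder computed the same way.
    public static int remainderViaLong(int dividend, int divisor) {
        return (int) ((dividend & 0xFFFFFFFFL) % (divisor & 0xFFFFFFFFL));
    }

    public static void main(String[] args) {
        int a = -1; // bit pattern 0xFFFFFFFF, i.e. 4294967295 when read as unsigned
        System.out.println(divideViaLong(a, 2) == Integer.divideUnsigned(a, 2));
        System.out.println(remainderViaLong(a, 10) == Integer.remainderUnsigned(a, 10));
        System.out.println(a / 2); // signed division gives a different answer
    }
}
```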

Re: RFR: 8279508: Auto-vectorize Math.round API [v10]

2022-03-01 Thread Jatin Bhateja
und_float | 1024.00 | 825.99 | 4754.66 | 5.76 | > 751.83 | 2274.13 | 3.02 > FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | > 388.52 | 1334.18 | 3.43 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja h

Re: RFR: 8279508: Auto-vectorize Math.round API [v11]

2022-03-01 Thread Jatin Bhateja
und_float | 1024.00 | 825.99 | 4754.66 | 5.76 | > 751.83 | 2274.13 | 3.02 > FpRoundingBenchmark.test_round_float | 2048.00 | 412.22 | 2490.09 | 6.04 | > 388.52 | 1334.18 | 3.43 > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja h

Re: RFR: 8279508: Auto-vectorize Math.round API [v2]

2022-03-02 Thread Jatin Bhateja
On Wed, 19 Jan 2022 22:09:26 GMT, Joe Darcy wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> 8279508: Adding a test for scalar intrinsification. > > The testing for this PR doesn't lo

Re: RFR: 8279508: Auto-vectorize Math.round API [v3]

2022-02-14 Thread Jatin Bhateja
On Mon, 14 Feb 2022 09:12:54 GMT, Andrew Haley wrote: >>> What does this do? Comment, even pseudo code, would be nice. >> >> Thanks @theRealAph , I shall append the comments over the routine. >> BTW, entire rounding algorithm can also be implemented using Vector API >> which can perform

Re: RFR: 8279508: Auto-vectorize Math.round API [v15]

2022-03-21 Thread Jatin Bhateja
On Mon, 21 Mar 2022 17:56:22 GMT, Quan Anh Mai wrote: >> Constant and register-to-register moves are never issued to execution ports, >> so rematerializing the value rather than reading it from memory will give better >> performance. > > I have come across this a little bit. While `movl r, i` may not

Re: RFR: 8279508: Auto-vectorize Math.round API [v18]

2022-03-24 Thread Jatin Bhateja
On Wed, 23 Mar 2022 06:55:50 GMT, Tobias Hartmann wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a >> merge or a rebase. The pull request now contains 22 commits: >> >> - 8279508: Using an explicit scratch register since rscra

Re: RFR: 8279508: Auto-vectorize Math.round API [v15]

2022-03-21 Thread Jatin Bhateja
On Tue, 22 Mar 2022 01:55:38 GMT, Quan Anh Mai wrote: >> A read from constant table will incur minimum of L1I access penalty to >> access code blob or at worst even more if data is not present in first level >> cache. Change was done for replace vpbroadcastd with vbroadcastss because of >>

Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long

2022-03-28 Thread Jatin Bhateja
On Sun, 27 Mar 2022 06:15:34 GMT, Vamsi Parasa wrote: > Implements x86 intrinsics for compare() method in java.lang.Integer and > java.lang.Long. src/hotspot/cpu/x86/x86_64.ad line 12107: > 12105: instruct compareSignedI_rReg(rRegI dst, rRegI op1, rRegI op2, rRegI > tmp, rFlagsReg cr) >

RFR: 8279508: Auto-vectorize Math.round API

2022-01-14 Thread Jatin Bhateja
Summary of changes: - Intrinsify Math.round(float) and Math.round(double) APIs. - Extend auto-vectorizer to infer vector operations on encountering scalar IR nodes for above intrinsics. - Test creation using new IR testing framework. Following are the performance numbers of a JMH micro included
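For readers unfamiliar with the loop shape this change targets, here is a minimal sketch in plain Java. The class and method names (`RoundLoopSketch`, `roundAll`) are illustrative assumptions and do not come from the patch or its JMH benchmark; the sketch only shows a scalar `Math.round(float)` loop of the kind an auto-vectorizer could plausibly map to vector operations, along with the documented tie-breaking behavior.

```java
import java.util.Arrays;

// Hypothetical illustration; not code from the pull request.
public class RoundLoopSketch {
    // A simple counted loop calling Math.round(float) on each element.
    // Whether a given JIT actually vectorizes it depends on the JVM build and CPU.
    static int[] roundAll(float[] in) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) {
            // Math.round(float) returns the closest int, with ties rounding
            // toward positive infinity.
            out[i] = Math.round(in[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] r = roundAll(new float[] {2.5f, -2.5f, 1.4f, -1.6f});
        System.out.println(Arrays.toString(r)); // prints [3, -2, 1, -2]
    }
}
```

Note the asymmetric tie handling: 2.5f rounds to 3 while -2.5f rounds to -2, which any intrinsic or vector implementation has to reproduce.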

Re: RFR: 8273322: Enhance macro logic optimization for masked logic operations. [v4]

2022-01-06 Thread Jatin Bhateja
On Thu, 6 Jan 2022 17:39:20 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> 8273322: Review comments resolution. > > test/hotspot/jtreg/compiler/vectorapi/T

Re: RFR: 8273322: Enhance macro logic optimization for masked logic operations. [v5]

2022-01-06 Thread Jatin Bhateja
cOpts.partiallyMaskedLogicOperationsLong256 > | 1024 | 996.906 | 1013.649 | 1.016794964 > o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLogicOperationsLong512 > | 256 | 2045.594 | 2048.966 | 1.001648421 > o.o.b.jdk.incubator.vector.MaskedLogicOpts.partiallyMaskedLog

Integrated: 8273322: Enhance macro logic optimization for masked logic operations.

2022-01-06 Thread Jatin Bhateja
On Mon, 20 Dec 2021 13:33:01 GMT, Jatin Bhateja wrote: > Patch extends existing macrologic inferencing algorithm to handle masked > logic operations. > > Existing algorithm: > > 1. Identify logic cone roots. > 2. Packs parent and logic child nodes into a MacroLo

Re: RFR: 8279508: Auto-vectorize Math.round API [v2]

2022-01-19 Thread Jatin Bhateja
3 | 12.48279907 > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8279508: Adding a test for scalar intrinsification.

Re: RFR: 8283667: [vectorapi] Vectorization for masked load with IOOBE with predicate feature

2022-04-11 Thread Jatin Bhateja
On Thu, 31 Mar 2022 03:53:15 GMT, Xiaohong Gong wrote: >> Yeah, maybe I misunderstood what you mean. So maybe the masked store >> `(store(src, m))` could be implemented with: >> >> 1) v1 = load >> 2) v2 = blend(load, src, m) >> 3) store(v2) >> >> Let's record this in JBS and fix it with a
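The three quoted steps (load, blend, store) can be sketched with plain arrays standing in for vectors. This is a scalar emulation under stated assumptions, not the Vector API and not the HotSpot implementation: the names `MaskedStoreSketch` and `maskedStore` are hypothetical, and the sketch assumes the full lane range `[offset, offset + src.length)` is in bounds, since step 1 reads every lane regardless of the mask.

```java
import java.util.Arrays;

// Hypothetical scalar emulation of a masked store built from load + blend + store.
public class MaskedStoreSketch {
    static void maskedStore(int[] dst, int offset, int[] src, boolean[] m) {
        int lanes = src.length;

        // 1) v1 = load: read the current memory contents for all lanes.
        int[] v1 = new int[lanes];
        System.arraycopy(dst, offset, v1, 0, lanes);

        // 2) v2 = blend(v1, src, m): take src where the mask is set, else keep v1.
        int[] v2 = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            v2[i] = m[i] ? src[i] : v1[i];
        }

        // 3) store(v2): write all lanes back unconditionally.
        System.arraycopy(v2, 0, dst, offset, lanes);
    }

    public static void main(String[] args) {
        int[] dst = {1, 2, 3, 4};
        maskedStore(dst, 0, new int[] {9, 9, 9, 9}, new boolean[] {true, false, true, false});
        System.out.println(Arrays.toString(dst)); // prints [9, 2, 9, 4]
    }
}
```

One design consequence worth keeping in mind: because unselected lanes are read and written back, this scheme touches memory that a true masked store would leave alone, so it is only a candidate where the whole range is accessible.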

Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v9]

2022-04-06 Thread Jatin Bhateja
On Wed, 6 Apr 2022 06:02:07 GMT, Vamsi Parasa wrote: >> Optimizes the divideUnsigned() and remainderUnsigned() methods in >> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This >> change shows 3x improvement for Integer methods and up to 25% improvement for >> Long. This
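As a reminder of what these library methods compute, here is a small usage sketch of the public `java.lang.Integer` and `java.lang.Long` APIs. The class name `UnsignedDivSketch` and its helper are illustrative assumptions; the sketch shows only the Java-level semantics and says nothing about how the intrinsics are generated.

```java
// Hypothetical illustration of the unsigned division APIs named in the thread.
public class UnsignedDivSketch {
    // Returns {quotient, remainder} treating both operands as unsigned 32-bit values.
    static int[] divRemUnsigned(int dividend, int divisor) {
        return new int[] {
            Integer.divideUnsigned(dividend, divisor),
            Integer.remainderUnsigned(dividend, divisor)
        };
    }

    public static void main(String[] args) {
        int a = -1; // bit pattern 0xFFFFFFFF, i.e. 4294967295 when read as unsigned

        System.out.println(a / 2);                         // signed division: 0
        int[] qr = divRemUnsigned(a, 2);
        System.out.println(qr[0]);                         // 2147483647
        System.out.println(qr[1]);                         // 1
        System.out.println(Long.divideUnsigned(-1L, 2L));  // 9223372036854775807
    }
}
```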

Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v4]

2022-04-06 Thread Jatin Bhateja
On Mon, 4 Apr 2022 07:24:12 GMT, Vamsi Parasa wrote: >> Also need a jtreg test for this. > >> Also need a jtreg test for this. > > Thanks Sandhya for the review. Made the suggested changes and added jtreg > tests as well. Hi @vamsi-parasa , thanks for addressing my comments, looks good to me

Re: RFR: 8284932: [Vector API] Incorrect implementation of LSHR operator for negative byte/short elements

2022-04-26 Thread Jatin Bhateja
On Sun, 17 Apr 2022 14:35:14 GMT, Jie Fu wrote: >> According to the Vector API doc, the LSHR operator computes >> a>>>(n&(ESIZE*8-1)) The documentation is correct if viewed strictly in the context of a subword vector lane; the JVM internally promotes (sign-extends) subword-type scalar variables into int
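The difference between the two readings can be seen with a negative byte in plain Java. This is a minimal sketch, assuming one interpretation of the quoted formula in which the lane is treated as an unsigned 8-bit value (so ESIZE*8-1 is 7 for byte); the names `LshrSketch`, `scalarShift`, and `laneLshr` are hypothetical and do not come from the Vector API.

```java
// Hypothetical illustration of scalar promotion versus a lane-width logical shift.
public class LshrSketch {
    // Ordinary Java scalar code: the byte is sign-extended to int before >>>,
    // so the bits shifted in at the top of the low byte are ones for negatives.
    static byte scalarShift(byte a, int n) {
        return (byte) (a >>> n);
    }

    // Lane-width reading of a >>> (n & (ESIZE*8 - 1)) for an 8-bit element:
    // zero-extend the lane first, mask the shift count to 0..7, then shift.
    static byte laneLshr(byte a, int n) {
        return (byte) ((a & 0xFF) >>> (n & 7));
    }

    public static void main(String[] args) {
        byte a = -8; // bit pattern 0xF8
        System.out.println(scalarShift(a, 1)); // -4  (0xFC: behaves like an arithmetic shift)
        System.out.println(laneLshr(a, 1));    // 124 (0x7C: zero shifted into the lane's top bit)
    }
}
```

For non-negative inputs the two methods agree; they diverge only when the sign bit of the subword element is set, which is the case the bug title calls out.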

Re: RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator) [v5]

2022-05-17 Thread Jatin Bhateja
over AARCH64 and X86 targets different AVX levels. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Merge branch

Re: RFR: 8284960: Integration of JEP 426: Vector API (Fourth Incubator) [v6]

2022-05-17 Thread Jatin Bhateja
over AARCH64 and X86 targets different AVX levels. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8284960: Adding --enable-preview
