Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v2]

2022-05-19 Thread Srinivas Vamsi Parasa
On Wed, 18 May 2022 14:59:49 GMT, Quan Anh Mai  wrote:

>> Hi,
>> 
>> This patch optimises the matching rules for floating-point comparison with 
>> respects to eq/ne on x86-64
>> 
>> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` 
>> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which 
>> improves the sequence of `If (CmpF x x) (Bool ne)` from
>> 
>> ucomiss xmm0, xmm0
>> jp  label
>> jne label
>> 
>> into
>> 
>> ucomiss xmm0, xmm0
>> jp  label
>> 
>> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as 
>> `x == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of 
>> fixing the flags, such as
>> 
>> xorlecx, ecx
>> ucomiss xmm0, xmm1
>> jnp done
>> pushf
>> andq[rsp], 0xff2b
>> popf
>> done:
>> movleax, 1
>> cmovel  eax, ecx
>> 
>> The patch changes this sequence into
>> 
>> xorlecx, ecx
>> ucomiss xmm0, xmm1
>> movleax, 1
>> cmovpl  eax, ecx
>> cmovnel eax, ecx
>> 
>> 3, The patch also changes the pattern of `isInfinite` to be more optimised 
>> by using `Math.abs` to reduce 1 comparison and compares the result with 
>> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types.
>> 
>> The benchmark results are as follow:
>> 
>> Before:
>> Benchmark  Mode  Cnt Score Error  Units
>> FPComparison.equalDouble   avgt5  2876.242 ±  58.875  ns/op
>> FPComparison.equalFloatavgt5  3062.430 ±  31.371  ns/op
>> FPComparison.isFiniteDoubleavgt5   475.749 ±  19.027  ns/op
>> FPComparison.isFiniteFloat avgt5   506.525 ±  14.417  ns/op
>> FPComparison.isInfiniteDouble  avgt5  1232.800 ±  31.677  ns/op
>> FPComparison.isInfiniteFloat   avgt5  1234.708 ±  70.239  ns/op
>> FPComparison.isNanDouble   avgt5  2255.847 ±   7.238  ns/op
>> FPComparison.isNanFloatavgt5  2567.044 ±  36.078  ns/op
>> 
>> After:
>> Benchmark  Mode  Cnt Score Error  Units
>> FPComparison.equalDouble   avgt5   594.636 ±   8.922  ns/op
>> FPComparison.equalFloatavgt5   663.849 ±   3.656  ns/op
>> FPComparison.isFiniteDoubleavgt5   518.309 ± 107.352  ns/op
>> FPComparison.isFiniteFloat avgt5   515.576 ±  14.669  ns/op
>> FPComparison.isInfiniteDouble  avgt5   621.185 ±  11.935  ns/op
>> FPComparison.isInfiniteFloat   avgt5   623.566 ±  15.206  ns/op
>> FPComparison.isNanDouble   avgt5   400.124 ±   0.762  ns/op
>> FPComparison.isNanFloatavgt5   546.486 ±   1.509  ns/op
>> 
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains eight additional 
> commits since the last revision:
> 
>  - incidental ws
>  - add tests
>  - Merge branch 'master' into fpcompare
>  - fix tests
>  - test
>  - improve infinity
>  - remove expensive rules
>  - improve fp comparison

Could you pls show the performance delta with the baseline and after the patch? 
Otherwise, people reviewing this PR have to manually compute how much 
improvement is obtained. For example, `FPComparison.isNanFloat` is showing 
`4.7x` improvement. Kindly fill the delta column for rest of the data points. 

Instead of two separate tables, the suggested table format for each row would 
be:
` ,  , , `

-

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v13]

2022-04-10 Thread Srinivas Vamsi Parasa
On Mon, 11 Apr 2022 02:01:17 GMT, David Holmes  wrote:

> This change appears to be causing crashes in tier4 - possibly Xcomp related:
> 
> # assert(ctrl == kit.control()) failed: Control flow was added although the 
> intrinsic bailed out
> I will file a new bug.

Thank you for informing this issue! In order to reproduce the issue, I just 
kickstarted a tier4 run on a Xeon server. Will  keep you updated.

-

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v4]

2022-04-10 Thread Srinivas Vamsi Parasa
On Wed, 6 Apr 2022 06:23:47 GMT, Jatin Bhateja  wrote:

>>> Also need a jtreg test for this.
>> 
>> Thanks Sandhya for the review. Made the suggested changes and added jtreg 
>> tests as well.
>
> Hi @vamsi-parasa , thanks for addressing my comments, looks good to me 
> otherwise apart from the outstanding comments.

@jatin-bhateja Thank you Jatin!

-

PR: https://git.openjdk.java.net/jdk/pull/7572


Integrated: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long

2022-04-09 Thread Srinivas Vamsi Parasa
On Tue, 22 Feb 2022 09:24:47 GMT, Srinivas Vamsi Parasa  
wrote:

> Optimizes the divideUnsigned() and remainderUnsigned() methods in 
> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This 
> change shows 3x improvement for Integer methods and upto 25% improvement for 
> Long. This change also implements the DivMod optimization which fuses 
> division and modulus operations if needed. The DivMod optimization shows 3x 
> improvement for Integer and ~65% improvement for Long.

This pull request has now been integrated.

Changeset: 37e28aea
Author:vamsi-parasa 
Committer: Jatin Bhateja 
URL:   
https://git.openjdk.java.net/jdk/commit/37e28aea27c8d8336ddecde777e63b51a939d281
Stats: 1156 lines in 20 files changed: 1154 ins; 1 del; 1 mod

8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in 
java.lang.Integer and java.lang.Long

Reviewed-by: sviswanathan, kvn, jbhateja

-

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v13]

2022-04-09 Thread Srinivas Vamsi Parasa
On Sat, 9 Apr 2022 18:25:54 GMT, Vladimir Kozlov  wrote:

> Testing passed.

Thank you Vladimir!

-

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v8]

2022-04-08 Thread Srinivas Vamsi Parasa
On Fri, 8 Apr 2022 00:55:50 GMT, Srinivas Vamsi Parasa  
wrote:

>> I have few comments.
>
> Hi Vladimir (@vnkozlov),
> 
> Incorporated all the suggestions you made in the previous review and pushed a 
> new commit.
> Please let me know if anything else is needed.
> 
> Thanks,
> Vamsi

> @vamsi-parasa I got failures in new tests when run with `-XX:UseAVX=3 
> -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting ` flags:
> 
> ```
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGFPE (0x8) at pc=0x7f2fa8c674ea, pid=3334, tid=3335
> #
> # JRE version: Java(TM) SE Runtime Environment (19.0) (fastdebug build 
> 19-internal-2022-04-08-0157190.vladimir.kozlov.jdkgit)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 
> 19-internal-2022-04-08-0157190.vladimir.kozlov.jdkgit, mixed mode, sharing, 
> tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
> # Problematic frame:
> # J 504% c2 compiler.intrinsics.TestLongUnsignedDivMod.testDivideUnsigned()V 
> (48 bytes) @ 0x7f2fa8c674ea [0x7f2fa8c672a0+0x024a]
> #
> ```
> 
> ```
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGFPE (0x8) at pc=0x7fb8c0c4fb18, pid=3309, tid=3310
> #
> # JRE version: Java(TM) SE Runtime Environment (19.0) (fastdebug build 
> 19-internal-2022-04-08-0157190.vladimir.kozlov.jdkgit)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 
> 19-internal-2022-04-08-0157190.vladimir.kozlov.jdkgit, mixed mode, sharing, 
> tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
> # Problematic frame:
> # J 445 c2 compiler.intrinsics.TestIntegerUnsignedDivMod.divmod(III)V (23 
> bytes) @ 0x7fb8c0c4fb18 [0x7fb8c0c4fae0+0x0038]
> #
> ```

Hi Vladimir (@vnkozlov), fixed it in the new commit, could you pls check? 
This is being caused by lack of control() node in udiv/umod related nodes. 
After adding the control() node, the tests are passing for me. Thanks for 
pointing this out!

-

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v13]

2022-04-08 Thread Srinivas Vamsi Parasa
> Optimizes the divideUnsigned() and remainderUnsigned() methods in 
> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This 
> change shows 3x improvement for Integer methods and upto 25% improvement for 
> Long. This change also implements the DivMod optimization which fuses 
> division and modulus operations if needed. The DivMod optimization shows 3x 
> improvement for Integer and ~65% improvement for Long.

Srinivas Vamsi Parasa has updated the pull request incrementally with one 
additional commit since the last revision:

  Fix the divmod crash due to lack of control node

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7572/files
  - new: https://git.openjdk.java.net/jdk/pull/7572/files/3e3fc977..a71ea238

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=7572=12
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7572=11-12

  Stats: 8 lines in 2 files changed: 0 ins; 4 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7572.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v12]

2022-04-07 Thread Srinivas Vamsi Parasa
> Optimizes the divideUnsigned() and remainderUnsigned() methods in 
> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This 
> change shows 3x improvement for Integer methods and upto 25% improvement for 
> Long. This change also implements the DivMod optimization which fuses 
> division and modulus operations if needed. The DivMod optimization shows 3x 
> improvement for Integer and ~65% improvement for Long.

Srinivas Vamsi Parasa has updated the pull request incrementally with one 
additional commit since the last revision:

  uncomment zero in integer div, mod test

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7572/files
  - new: https://git.openjdk.java.net/jdk/pull/7572/files/bfb6c02e..3e3fc977

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=7572=11
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7572=10-11

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7572.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v4]

2022-04-07 Thread Srinivas Vamsi Parasa
On Wed, 6 Apr 2022 00:45:37 GMT, Vladimir Kozlov  wrote:

>> Thanks for suggesting the enhancement. This enhancement will be implemented 
>> as a part of https://bugs.openjdk.java.net/browse/JDK-8282365
>
> You do need `Ideal()` methods at least to check for dead code.

Added the Ideal() methods for checking dead code. Pls see the new commit.

-

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v8]

2022-04-07 Thread Srinivas Vamsi Parasa
On Wed, 6 Apr 2022 00:46:01 GMT, Vladimir Kozlov  wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   add error msg for jtreg test
>
> I have few comments.

Hi Vladimir (@vnkozlov),

Incorporated all the suggestions you made in the previous review and pushed a 
new commit.
Please let me know if anything else is needed.

Thanks,
Vamsi

> src/hotspot/cpu/x86/assembler_x86.cpp line 12375:
> 
>> 12373: }
>> 12374: #endif
>> 12375: 
> 
> Please, place it near `idivq()` so you would not need `#ifdef`.

Made the change as per your suggestion.

> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4568:
> 
>> 4566:   subl(rdx, divisor);
>> 4567:   if (VM_Version::supports_bmi1()) andnl(rax, rdx, rax);
>> 4568:   else {
> 
> Please, follow our coding stile here and in following methods:
> 
> if (VM_Version::supports_bmi1()) {
>   andnl(rax, rdx, rax);
> } else {

Pls see the new commit which fixed the coding style.

> src/hotspot/cpu/x86/x86_64.ad line 8701:
> 
>> 8699: %}
>> 8700: 
>> 8701: instruct udivI_rReg(rax_RegI rax, no_rax_rdx_RegI div, rFlagsReg cr, 
>> rdx_RegI rdx)
> 
> I suggest to follow the pattern in other `div/mod` instructions: `(rax_RegI 
> rax, rdx_RegI rdx, no_rax_rdx_RegI div, rFlagsReg cr)`
> 
> Similar in following new instructions.

Pls see the new commit which fixed the pattern.

> test/hotspot/jtreg/compiler/intrinsics/TestIntegerDivMod.java line 55:
> 
>> 53: dividends[i] = rng.nextInt();
>> 54: divisors[i] = rng.nextInt();
>> 55: }
> 
> I don't trust RND to generate corner cases.
> Please, add cases here and in TestLongDivMod.java for MAX, MIN, 0.

You are right. Using an updated corner cases test revealed divide by zero crash 
which was fixed. Please see the updated jtreg tests inspired by unsigned 
divide/remainder tests in test/jdk/java/lang/Long/Unsigned.java and 
test/jdk/java/lang/Integer/Unsigned.java.

-

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v11]

2022-04-07 Thread Srinivas Vamsi Parasa
> Optimizes the divideUnsigned() and remainderUnsigned() methods in 
> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This 
> change shows 3x improvement for Integer methods and upto 25% improvement for 
> Long. This change also implements the DivMod optimization which fuses 
> division and modulus operations if needed. The DivMod optimization shows 3x 
> improvement for Integer and ~65% improvement for Long.

Srinivas Vamsi Parasa has updated the pull request incrementally with one 
additional commit since the last revision:

  Idead Ideal for udiv, umod nodes and update jtreg tests to use corner cases

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7572/files
  - new: https://git.openjdk.java.net/jdk/pull/7572/files/9949047c..bfb6c02e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=7572=10
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7572=09-10

  Stats: 701 lines in 7 files changed: 423 ins; 274 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7572.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v10]

2022-04-06 Thread Srinivas Vamsi Parasa
> Optimizes the divideUnsigned() and remainderUnsigned() methods in 
> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This 
> change shows 3x improvement for Integer methods and upto 25% improvement for 
> Long. This change also implements the DivMod optimization which fuses 
> division and modulus operations if needed. The DivMod optimization shows 3x 
> improvement for Integer and ~65% improvement for Long.

Srinivas Vamsi Parasa has updated the pull request with a new target base due 
to a merge or a rebase. The pull request now contains 15 commits:

 - use appropriate style changes
 - Merge branch 'master' of https://git.openjdk.java.net/jdk into udivmod
 - Merge branch 'openjdk:master' into udivmod
 - add error msg for jtreg test
 - update jtreg test to run on x86_64
 - add bmi1 support check and jtreg tests
 - Merge branch 'master' of https://git.openjdk.java.net/jdk into udivmod
 - fix 32bit build issues
 - Fix line at end of file
 - Move intrinsic code to macro assembly routines; remove unused 
transformations for div and mod nodes
 - ... and 5 more: https://git.openjdk.java.net/jdk/compare/4451257b...9949047c

-

Changes: https://git.openjdk.java.net/jdk/pull/7572/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=7572=09
  Stats: 1011 lines in 20 files changed: 1009 ins; 1 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7572.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v8]

2022-04-06 Thread Vamsi Parasa
On Wed, 6 Apr 2022 00:46:01 GMT, Vladimir Kozlov  wrote:

> I have few comments.

Thank you Vladimir (@vnkozlov) for suggesting the changes! Will incorporate the 
suggestions and push an update in few hours.

-

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v9]

2022-04-06 Thread Vamsi Parasa
> Optimizes the divideUnsigned() and remainderUnsigned() methods in 
> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This 
> change shows 3x improvement for Integer methods and upto 25% improvement for 
> Long. This change also implements the DivMod optimization which fuses 
> division and modulus operations if needed. The DivMod optimization shows 3x 
> improvement for Integer and ~65% improvement for Long.

Vamsi Parasa has updated the pull request with a new target base due to a merge 
or a rebase. The pull request now contains 13 commits:

 - Merge branch 'openjdk:master' into udivmod
 - add error msg for jtreg test
 - update jtreg test to run on x86_64
 - add bmi1 support check and jtreg tests
 - Merge branch 'master' of https://git.openjdk.java.net/jdk into udivmod
 - fix 32bit build issues
 - Fix line at end of file
 - Move intrinsic code to macro assembly routines; remove unused 
transformations for div and mod nodes
 - fix trailing white space errors
 - fix whitespaces
 - ... and 3 more: https://git.openjdk.java.net/jdk/compare/741be461...acba7c19

-

Changes: https://git.openjdk.java.net/jdk/pull/7572/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=7572=08
  Stats: 1007 lines in 20 files changed: 1005 ins; 1 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7572.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v6]

2022-04-05 Thread Vamsi Parasa
On Tue, 5 Apr 2022 17:06:44 GMT, Sandhya Viswanathan  
wrote:

>> Vamsi Parasa has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   add bmi1 support check and jtreg tests
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.hpp line 362:
> 
>> 360:   void vector_popcount_long(XMMRegister dst, XMMRegister src, 
>> XMMRegister xtmp1,
>> 361: XMMRegister xtmp2, XMMRegister xtmp3, 
>> Register rtmp,
>> 362: int vec_enc);
> 
> This doesn't seem to be related to this patch.

This is coming due to a merge with the latest upstream (jdk)

> test/hotspot/jtreg/compiler/intrinsics/TestIntegerDivMod.java line 107:
> 
>> 105: }
>> 106: if (mismatch) {
>> 107: throw new RuntimeException("Test failed");
> 
> It would be good to print dividend, divisor, operation, actual result and 
> expected result here.

Please see the updated error message in the recent commit.

> test/hotspot/jtreg/compiler/intrinsics/TestLongDivMod.java line 104:
> 
>> 102: }
>> 103: if (mismatch) {
>> 104: throw new RuntimeException("Test failed");
> 
> It would be good to print dividend, divisor, operation, actual result and 
> expected result here.

Please see the updated error message in the recent commit.

-

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v8]

2022-04-05 Thread Vamsi Parasa
> Optimizes the divideUnsigned() and remainderUnsigned() methods in 
> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This 
> change shows 3x improvement for Integer methods and upto 25% improvement for 
> Long. This change also implements the DivMod optimization which fuses 
> division and modulus operations if needed. The DivMod optimization shows 3x 
> improvement for Integer and ~65% improvement for Long.

Vamsi Parasa has updated the pull request incrementally with one additional 
commit since the last revision:

  add error msg for jtreg test

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7572/files
  - new: https://git.openjdk.java.net/jdk/pull/7572/files/e97c6fbc..8047767c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=7572=07
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7572=06-07

  Stats: 41 lines in 2 files changed: 37 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7572.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long [v2]

2022-03-30 Thread Vamsi Parasa
> Implements x86 intrinsics for compare() method in java.lang.Integer and 
> java.lang.Long.

Vamsi Parasa has updated the pull request with a new target base due to a merge 
or a rebase. The incremental webrev excludes the unrelated changes brought in 
by the merge/rebase. The pull request contains four additional commits since 
the last revision:

 - refactored x86_64.ad code to macro assembly routines
 - Merge branch 'master' of https://git.openjdk.java.net/jdk into cmp
 - add JMH benchmarks
 - 8283726: x86 intrinsics for compare method in Integer and Long

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7975/files
  - new: https://git.openjdk.java.net/jdk/pull/7975/files/b0c3314d..79e4aa50

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=7975=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7975=00-01

  Stats: 5452 lines in 218 files changed: 3925 ins; 856 del; 671 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7975.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7975/head:pull/7975

PR: https://git.openjdk.java.net/jdk/pull/7975


Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long

2022-03-29 Thread Vamsi Parasa
On Tue, 29 Mar 2022 02:24:21 GMT, Jatin Bhateja  wrote:

>> Implements x86 intrinsics for compare() method in java.lang.Integer and 
>> java.lang.Long.
>
> src/hotspot/cpu/x86/x86_64.ad line 12107:
> 
>> 12105: instruct compareSignedI_rReg(rRegI dst, rRegI op1, rRegI op2, rRegI 
>> tmp, rFlagsReg cr)
>> 12106: %{
>> 12107:   match(Set dst (CompareSignedI op1 op2));
> 
> Please also include these patterns in x86_32.ad

Will update x86_32.ad as well.

> src/hotspot/cpu/x86/x86_64.ad line 12125:
> 
>> 12123: __ movl(tmp, 0);
>> 12124: __ bind(done);
>> 12125: __ movl(dest, tmp);
> 
> Please move this in macro-assembly routine.

Sure, will refactor it into a macro-assembly

> src/hotspot/cpu/x86/x86_64.ad line 12178:
> 
>> 12176: __ movl(tmp, 0);
>> 12177: __ bind(done);
>> 12178: __ movl(dest, tmp);
> 
> Please move this into a macro-assembly routine.

Sure, will do that and update it soon.

> src/hotspot/cpu/x86/x86_64.ad line 12204:
> 
>> 12202: __ movl(tmp, 0);
>> 12203: __ bind(done);
>> 12204: __ movl(dest, tmp);
> 
> Please move this into macro-assembly routine.

Sure, will do that and update it soon.

> src/hotspot/share/opto/comparenode.hpp line 67:
> 
>> 65: CompareUnsignedLNode(Node* in1, Node* in2) : CompareNode(in1, in2) {}
>> 66: virtual int Opcode() const;
>> 67: };
> 
> Intent here seems to be to enable further auto-vectorization of newly create 
> IR nodes.

Yes, that is the intention.

> test/micro/org/openjdk/bench/java/lang/CompareInteger.java line 78:
> 
>> 76: input2[i] = tmp;
>> 77: }
>> 78: }
> 
> Logic re-organization suggestion:- 
>  
> 
>  for (int i = 0 ; i < BUFFER_SIZE; i++) {
> input1[i] = rng.nextLong();
>  }
> 
>  if (mode.equals("equals") {
> GRADIANT = 0;
>  } else if (mode.equals("greaterThanEquals")) {
> GRADIANT = 1;
>  } else {
> assert mode.equals("lessThanEqual");
> GRADIANT = -1;
>  }
> 
>  for(int i = 0 ; i < BUFFER_SIZE; i++) {
> input2[i] = input1[i] + i*GRADIANT;
>  }

The suggested refactoring is definitely elegant but one rare possibility is 
overflow due to the addition/subtraction. The swap logic doesn't have that 
problem.

> test/micro/org/openjdk/bench/java/lang/CompareLong.java line 5:
> 
>> 3:  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>> 4:  *
>> 5:  * This code is free software; you can redistribute it and/or modify it
> 
> We can unify this benchmark along with integer compare micro.

Sure, will do the unification.

-

PR: https://git.openjdk.java.net/jdk/pull/7975


Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long

2022-03-29 Thread Vamsi Parasa
On Sun, 27 Mar 2022 06:57:58 GMT, Quan Anh Mai  wrote:

> This is both complicated and inefficient, I would suggest building the 
> intrinsic in the IR graph so that the compiler can simplify 
> `Integer.compareUnsigned(x, y) < 0` into `x u< y`. Thanks.

Thank you for the suggestion! This intrinsic uses 1 cmp instruction instead of 
two and shows ~10% improvement due to better branch prediction. Even without 
the intrinsic, the compiler is currently able to reduce it to x u< y but is 
still generating two cmp (unsigned) instructions as Integer.compareUnsigned(x, 
y) is implemented as x u< y? -1 : (x ==y ? 0 : 1).

-

PR: https://git.openjdk.java.net/jdk/pull/7975


RFR: 8283726: x86 intrinsics for compare method in Integer and Long

2022-03-27 Thread Vamsi Parasa
Implements x86 intrinsics for compare() method in java.lang.Integer and 
java.lang.Long.

-

Commit messages:
 - add JMH benchmarks
 - 8283726: x86 intrinsics for compare method in Integer and Long

Changes: https://git.openjdk.java.net/jdk/pull/7975/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=7975=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8283726
  Stats: 430 lines in 13 files changed: 428 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7975.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7975/head:pull/7975

PR: https://git.openjdk.java.net/jdk/pull/7975


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v4]

2022-02-24 Thread Vamsi Parasa
On Thu, 24 Feb 2022 14:13:47 GMT, Jatin Bhateja  wrote:

>> Vamsi Parasa has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   fix 32bit build issues
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4408:
> 
>> 4406:   jmp(done);
>> 4407:   bind(neg_divisor_fastpath);
>> 4408:   // Fastpath for divisor < 0:
> 
> How about checking if divisor is +ve or -ve constant and non-constant 
> dividend in identity routine and setting a flag in IR node, which can be used 
> to either emit fast / slow path in a new instruction selection pattern. It 
> will save emitting redundant instructions.

Thanks for suggesting the enhancement. This enhancement will be implemented as 
a part of https://bugs.openjdk.java.net/browse/JDK-8282365

> src/hotspot/share/opto/divnode.cpp line 881:
> 
>> 879:   return (phase->type( in(2) )->higher_equal(TypeLong::ONE)) ? in(1) : 
>> this;
>> 880: }
>> 881: 
>> //--Value--
> 
> Ideal transform to replace unsigned divide by cheaper logical right shift 
> instruction if divisor is POW will be useful.

Thanks for suggesting the enhancement. This enhancement will be implemented as 
a part of https://bugs.openjdk.java.net/browse/JDK-8282365

> src/hotspot/share/opto/divnode.cpp line 897:
> 
>> 895: 
>> 896:   // Either input is BOTTOM ==> the result is the local BOTTOM
>> 897:   const Type *bot = bottom_type();
> 
> Can we add constant folding handling when both dividend and divisor are 
> constants.

Thanks for suggesting the enhancement. This enhancement will be implemented as 
a part of https://bugs.openjdk.java.net/browse/JDK-8282365

-

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v4]

2022-02-23 Thread Vamsi Parasa
> Optimizes the divideUnsigned() and remainderUnsigned() methods in 
> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This 
> change shows 3x improvement for Integer methods and upto 25% improvement for 
> Long. This change also implements the DivMod optimization which fuses 
> division and modulus operations if needed. The DivMod optimization shows 3x 
> improvement for Integer and ~65% improvement for Long.

Vamsi Parasa has updated the pull request incrementally with one additional 
commit since the last revision:

  fix 32bit build issues

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7572/files
  - new: https://git.openjdk.java.net/jdk/pull/7572/files/13549290..2915b2e7

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=7572=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7572=02-03

  Stats: 91 lines in 2 files changed: 49 ins; 42 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7572.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v3]

2022-02-23 Thread Vamsi Parasa
> Optimizes the divideUnsigned() and remainderUnsigned() methods in 
> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This 
> change shows 3x improvement for Integer methods and upto 25% improvement for 
> Long. This change also implements the DivMod optimization which fuses 
> division and modulus operations if needed. The DivMod optimization shows 3x 
> improvement for Integer and ~65% improvement for Long.

Vamsi Parasa has updated the pull request incrementally with one additional 
commit since the last revision:

  Fix line at end of file

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7572/files
  - new: https://git.openjdk.java.net/jdk/pull/7572/files/7fc18af3..13549290

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=7572=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7572=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7572.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v2]

2022-02-23 Thread Vamsi Parasa
> Optimizes the divideUnsigned() and remainderUnsigned() methods in 
> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This 
> change shows 3x improvement for Integer methods and upto 25% improvement for 
> Long. This change also implements the DivMod optimization which fuses 
> division and modulus operations if needed. The DivMod optimization shows 3x 
> improvement for Integer and ~65% improvement for Long.

Vamsi Parasa has updated the pull request incrementally with one additional 
commit since the last revision:

  Move intrinsic code to macro assembly routines; remove unused transformations 
for div and mod nodes

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7572/files
  - new: https://git.openjdk.java.net/jdk/pull/7572/files/fa57175a..7fc18af3

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=7572=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7572=00-01

  Stats: 326 lines in 7 files changed: 137 ins; 176 del; 13 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7572.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long

2022-02-23 Thread Vamsi Parasa
On Wed, 23 Feb 2022 05:52:00 GMT, Jatin Bhateja  wrote:

>> Optimizes the divideUnsigned() and remainderUnsigned() methods in 
>> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This 
>> change shows 3x improvement for Integer methods and upto 25% improvement for 
>> Long. This change also implements the DivMod optimization which fuses 
>> division and modulus operations if needed. The DivMod optimization shows 3x 
>> improvement for Integer and ~65% improvement for Long.
>
> test/micro/org/openjdk/bench/java/lang/IntegerDivMod.java line 76:
> 
>> 74: return quotients;
>> 75: }
>> 76: 
> 
> Return seems redundant here.

Will remove it.

> test/micro/org/openjdk/bench/java/lang/IntegerDivMod.java line 83:
> 
>> 81: }
>> 82: return remainders;
>> 83: }
> 
> Return seems redundant here.

Will remove it.

-

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long

2022-02-23 Thread Vamsi Parasa
On Wed, 23 Feb 2022 05:46:45 GMT, Jatin Bhateja  wrote:

>> Optimizes the divideUnsigned() and remainderUnsigned() methods in 
>> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This 
>> change shows 3x improvement for Integer methods and upto 25% improvement for 
>> Long. This change also implements the DivMod optimization which fuses 
>> division and modulus operations if needed. The DivMod optimization shows 3x 
>> improvement for Integer and ~65% improvement for Long.
>
> src/hotspot/share/opto/divnode.cpp line 1350:
> 
>> 1348: return NULL;
>> 1349:   }
>> 1350: 
> 
> Please remove Value and Ideal routines if no explicit transforms are being 
> done.

Will remove the unused transformations.

> src/hotspot/share/opto/divnode.cpp line 1362:
> 
>> 1360:   }
>> 1361: 
>> 1362: 
>> //=
> 
> You can remove Ideal routine is not transformation is being done.

Will remove the unused transformation.

-

PR: https://git.openjdk.java.net/jdk/pull/7572


Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long

2022-02-23 Thread Vamsi Parasa
On Wed, 23 Feb 2022 05:43:10 GMT, Jatin Bhateja  wrote:

>> Optimizes the divideUnsigned() and remainderUnsigned() methods in 
>> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This 
>> change shows 3x improvement for Integer methods and upto 25% improvement for 
>> Long. This change also implements the DivMod optimization which fuses 
>> division and modulus operations if needed. The DivMod optimization shows 3x 
>> improvement for Integer and ~65% improvement for Long.
>
> src/hotspot/cpu/x86/x86_64.ad line 8602:
> 
>> 8600: __ jmp(done);
>> 8601: __ bind(neg_divisor_fastpath); 
>> 8602: // Fastpath for divisor < 0: 
> 
> Move in macro assembly routine.

Sure, will move it to a macro assembly routine

> src/hotspot/cpu/x86/x86_64.ad line 8633:
> 
>> 8631: __ jmp(done);
>> 8632: __ bind(neg_divisor_fastpath);
>> 8633: // Fastpath for divisor < 0: 
> 
> Move in macro assembly rountine.

Sure, will move it to a macro assembly routine

> src/hotspot/cpu/x86/x86_64.ad line 8902:
> 
>> 8900: __ subl(tmp_rax, divisor);
>> 8901: __ andnl(tmp_rax, tmp_rax, rdx);
>> 8902: __ sarl(tmp_rax, 31);
> 
> Please move this into a macro assembly routine.

Sure, will move it to a macro assembly routine

> src/hotspot/cpu/x86/x86_64.ad line 8932:
> 
>> 8930: // Fastpath when divisor < 0: 
>> 8931: // remainder = dividend - (((dividend & ~(dividend - divisor)) >> 
>> (Long.SIZE - 1)) & divisor)
>> 8932: // See Hacker's Delight (2nd ed), section 9.3 which is implemented 
>> in java.lang.Long.remainderUnsigned()
> 
> Please move it into a macro assembly routine.

Sure, will move it to a macro assembly routine

> src/hotspot/share/opto/compile.cpp line 3499:
> 
>> 3497:   Node* d = n->find_similar(Op_UDivI);
>> 3498:   if (d) {
>> 3499: // Replace them with a fused unsigned divmod if supported
> 
> Can you explain a bit here, why can't this transformation be handled earlier ?

This is following the existing approach being used for signed DivMod

> test/micro/org/openjdk/bench/java/lang/LongDivMod.java line 75:
> 
>> 73: }
>> 74: return quotients;
>> 75: }
> 
> Do we need to return quotients, since it's a field  being explicitly modified.

Will remove it.

> test/micro/org/openjdk/bench/java/lang/LongDivMod.java line 82:
> 
>> 80: remainders[i] = Long.remainderUnsigned(dividends[i], 
>> divisors[i]);
>> 81: }
>> 82: return remainders;
> 
> Same as above

Will remove it.

-

PR: https://git.openjdk.java.net/jdk/pull/7572


RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long

2022-02-22 Thread Vamsi Parasa
Optimizes the divideUnsigned() and remainderUnsigned() methods in 
java.lang.Integer and java.lang.Long classes using x86 intrinsics. This change 
shows 3x improvement for Integer methods and upto 25% improvement for Long. 
This change also implements the DivMod optimization which fuses division and 
modulus operations if needed. The DivMod optimization shows 3x improvement for 
Integer and ~65% improvement for Long.

-

Commit messages:
 - fix trailing white space errors
 - fix whitespaces
 - revert comment to original for divmodI
 - Update rax and rdx register usage in x86_64.ad
 - 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in 
java.lang.Integer and java.lang.Long

Changes: https://git.openjdk.java.net/jdk/pull/7572/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=7572=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8282221
  Stats: 741 lines in 16 files changed: 738 ins; 1 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7572.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572

PR: https://git.openjdk.java.net/jdk/pull/7572


Integrated: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh()

2021-12-03 Thread Vamsi Parasa
On Tue, 2 Nov 2021 03:02:42 GMT, Vamsi Parasa  wrote:

> This change optimizes random number generators using 
> Math.unsignedMultiplyHigh()

This pull request has now been integrated.

Changeset: 38f525e9
Author:    vamsi-parasa 
Committer: Sandhya Viswanathan 
URL:   
https://git.openjdk.java.net/jdk/commit/38f525e96e767597d16698d4b243b681782acc9f
Stats: 103 lines in 4 files changed: 72 ins; 28 del; 3 mod

8275821: Optimize random number generators developed in JDK-8248862 using 
Math.unsignedMultiplyHigh()

Reviewed-by: psandoz, jlaskey

-

PR: https://git.openjdk.java.net/jdk/pull/6206


Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v4]

2021-12-03 Thread Vamsi Parasa
On Fri, 3 Dec 2021 17:19:17 GMT, Paul Sandoz  wrote:

>> @JimLaskey Could you please review this PR which optimizes three of the 
>> Pseudo-Random Number Generators you implemented in 
>> https://bugs.openjdk.java.net/browse/JDK-8248862 ?
>
> @vamsi-parasa you can now integrate this and then I or someone else will 
> sponsor it

Thank you @PaulSandoz  and @JimLaskey for reviewing the PR!

-

PR: https://git.openjdk.java.net/jdk/pull/6206


Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v5]

2021-12-02 Thread Vamsi Parasa
On Fri, 3 Dec 2021 00:27:13 GMT, Paul Sandoz  wrote:

>> Vamsi Parasa has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Update the JMH micro to take RNG parameters for elegant implementation
>
> test/micro/org/openjdk/bench/java/util/RandomGeneratorNext.java line 47:
> 
>> 45: public class RandomGeneratorNext {
>> 46: 
>> 47: public RandomGenerator randomGenerator;
> 
> Suggestion:
> 
> RandomGenerator randomGenerator;

Incorporated your suggestion in the recent commit...

> test/micro/org/openjdk/bench/java/util/RandomGeneratorNext.java line 52:
> 
>> 50: String randomGeneratorName;
>> 51: 
>> 52: public static long[] buffer;
> 
> Suggestion:
> 
> long[] buffer;

Incorporated your suggestion in the recent commit...

> test/micro/org/openjdk/bench/java/util/RandomGeneratorNext.java line 69:
> 
>> 67: 
>> 68: @Benchmark
>> 69: @Fork(1)
> 
> Why is `@Fork` need here?

Removed the @Fork annotation...

> test/micro/org/openjdk/bench/java/util/RandomGeneratorNext.java line 70:
> 
>> 68: @Benchmark
>> 69: @Fork(1)
>> 70: public void testFillBufferWithNextLong() {
> 
> Return `buffer` after the loop completes, just in case the JIT decides it is 
> otherwise dead code

Added the code to return buffer at the end of the loop...

-

PR: https://git.openjdk.java.net/jdk/pull/6206


Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v6]

2021-12-02 Thread Vamsi Parasa
> This change optimizes random number generators using 
> Math.unsignedMultiplyHigh()

Vamsi Parasa has updated the pull request incrementally with one additional 
commit since the last revision:

  minor changes to the benchmark

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6206/files
  - new: https://git.openjdk.java.net/jdk/pull/6206/files/e9c71477..cf68fe00

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=6206=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6206=04-05

  Stats: 7 lines in 1 file changed: 1 ins; 3 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6206.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6206/head:pull/6206

PR: https://git.openjdk.java.net/jdk/pull/6206


Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v4]

2021-12-02 Thread Vamsi Parasa
On Thu, 2 Dec 2021 23:02:57 GMT, Paul Sandoz  wrote:

>> Vamsi Parasa has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   add seeds for the random generators to eliminate run-to-run variance
>
> test/micro/org/openjdk/bench/java/util/RandomGeneratorNext.java line 57:
> 
>> 55: rngL128X128MixRandom = 
>> RandomGeneratorFactory.of("L128X128MixRandom").create(42);
>> 56: rngL128X256MixRandom = 
>> RandomGeneratorFactory.of("L128X256MixRandom").create(174);
>> 57: rngL128X1024MixRandom = 
>> RandomGeneratorFactory.of("L128X1024MixRandom").create(308);
> 
> You can declare parameters:
> 
> 
> @Param({"L128X128MixRandom", "L128X256MixRandom", "L128X1024MixRandom"})
> String randomGeneratorName;
> 
> @Param("1024")
> int size;
> 
> long[] buffer;
> RandomGenerator randomGenerator;
> 
> 
> @Setup
> public void setup() {
> buffer = new long[size];
> randomGenerator = RandomGeneratorFactory.of(randomGeneratorName)
> .create(randomGeneratorName.hashCode());
> }
> 
> 
> Then you can simplify to just two benchmark methods. Further, the benchmark 
> can be used for other PRNGs.

@PaulSandoz  Implemented the changes you suggested for the benchmark. Tested it 
on my side and ensured that it's working as expected...

-

PR: https://git.openjdk.java.net/jdk/pull/6206


Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v5]

2021-12-02 Thread Vamsi Parasa
> This change optimizes random number generators using 
> Math.unsignedMultiplyHigh()

Vamsi Parasa has updated the pull request incrementally with one additional 
commit since the last revision:

  Update the JMH micro to take RNG parameters for elegant implementation

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6206/files
  - new: https://git.openjdk.java.net/jdk/pull/6206/files/78a45142..e9c71477

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=6206=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6206=03-04

  Stats: 35 lines in 1 file changed: 1 ins; 20 del; 14 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6206.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6206/head:pull/6206

PR: https://git.openjdk.java.net/jdk/pull/6206


Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v4]

2021-12-02 Thread Vamsi Parasa
On Thu, 2 Dec 2021 23:02:57 GMT, Paul Sandoz  wrote:

>> Vamsi Parasa has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   add seeds for the random generators to eliminate run-to-run variance
>
> test/micro/org/openjdk/bench/java/util/RandomGeneratorNext.java line 57:
> 
>> 55: rngL128X128MixRandom = 
>> RandomGeneratorFactory.of("L128X128MixRandom").create(42);
>> 56: rngL128X256MixRandom = 
>> RandomGeneratorFactory.of("L128X256MixRandom").create(174);
>> 57: rngL128X1024MixRandom = 
>> RandomGeneratorFactory.of("L128X1024MixRandom").create(308);
> 
> You can declare parameters:
> 
> 
> @Param({"L128X128MixRandom", "L128X256MixRandom", "L128X1024MixRandom"})
> String randomGeneratorName;
> 
> @Param("1024")
> int size;
> 
> long[] buffer;
> RandomGenerator randomGenerator;
> 
> 
> @Setup
> public void setup() {
> buffer = new long[size];
> randomGenerator = RandomGeneratorFactory.of(randomGeneratorName)
> .create(randomGeneratorName.hashCode());
> }
> 
> 
> Then you can simplify to just two benchmark methods. Further, the benchmark 
> can be used for other PRNGs.

Thank you Paul! Will make the necessary changes and update the benchmark...

-

PR: https://git.openjdk.java.net/jdk/pull/6206


Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v4]

2021-12-02 Thread Vamsi Parasa
On Thu, 2 Dec 2021 20:43:56 GMT, Vamsi Parasa  wrote:

>> This change optimizes random number generators using 
>> Math.unsignedMultiplyHigh()
>
> Vamsi Parasa has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   add seeds for the random generators to eliminate run-to-run variance

The following speedups were observed for the JMH benchmarks. L128X128MixRandom 
shows 8% improvement, L128x256MixRandom shows 6% improvement and 
L128X1024MixRandom shows 14% improvement. (Another micro workload shows 10% 
improvement for both L128X128MixRandom and L128x256MixRandom while 
L128X1024MixRandom shows 25% improvement in performance.)

@JimLaskey Could you please review this PR which optimizes three of the 
Pseudo-Random Number Generators you implemented in 
https://bugs.openjdk.java.net/browse/JDK-8248862 ?

-

PR: https://git.openjdk.java.net/jdk/pull/6206


Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v4]

2021-12-02 Thread Vamsi Parasa
> This change optimizes random number generators using 
> Math.unsignedMultiplyHigh()

Vamsi Parasa has updated the pull request incrementally with one additional 
commit since the last revision:

  add seeds for the random generators to eliminate run-to-run variance

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6206/files
  - new: https://git.openjdk.java.net/jdk/pull/6206/files/8bde4731..78a45142

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=6206=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6206=02-03

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6206.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6206/head:pull/6206

PR: https://git.openjdk.java.net/jdk/pull/6206


Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v3]

2021-11-04 Thread Vamsi Parasa
> This change optimizes random number generators using 
> Math.unsignedMultiplyHigh()

Vamsi Parasa has updated the pull request incrementally with one additional 
commit since the last revision:

  fix trailing whitespace

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6206/files
  - new: https://git.openjdk.java.net/jdk/pull/6206/files/e46c2d92..8bde4731

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=6206=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6206=01-02

  Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6206.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6206/head:pull/6206

PR: https://git.openjdk.java.net/jdk/pull/6206


Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v2]

2021-11-04 Thread Vamsi Parasa
> This change optimizes random number generators using 
> Math.unsignedMultiplyHigh()

Vamsi Parasa has updated the pull request incrementally with two additional 
commits since the last revision:

 - Fix year in copyright
 - micro benchmarks to test the performance

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6206/files
  - new: https://git.openjdk.java.net/jdk/pull/6206/files/c6ec6487..e46c2d92

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=6206=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6206=00-01

  Stats: 94 lines in 1 file changed: 94 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6206.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6206/head:pull/6206

PR: https://git.openjdk.java.net/jdk/pull/6206


RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh()

2021-11-02 Thread Vamsi Parasa
This change optimizes random number generators using Math.unsignedMultiplyHigh()

-

Commit messages:
 - Update Math.java
 - 8275821: Optimize random number generators developed in JDK-8248862 using 
Math.unsignedMultiplyHigh()

Changes: https://git.openjdk.java.net/jdk/pull/6206/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=6206=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8275821
  Stats: 31 lines in 3 files changed: 0 ins; 28 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6206.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6206/head:pull/6206

PR: https://git.openjdk.java.net/jdk/pull/6206


Re: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2]

2021-11-01 Thread Vamsi Parasa
On Tue, 19 Oct 2021 20:34:55 GMT, Vamsi Parasa  wrote:

>> Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. 
>> This change show 1.87X improvement on a micro benchmark.
>
> Vamsi Parasa has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   refactoring to remove code duplication by using a common routine for 
> UMulHiLNode and MulHiLNode

Thank you for spotting the stale comment. It will removed in another related 
commit that will be pushed soon...

-

PR: https://git.openjdk.java.net/jdk/pull/5933


Integrated: 8275167: x86 intrinsic for unsignedMultiplyHigh

2021-10-20 Thread Vamsi Parasa
On Wed, 13 Oct 2021 18:55:10 GMT, Vamsi Parasa  wrote:

> Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. 
> This change show 1.87X improvement on a micro benchmark.

This pull request has now been integrated.

Changeset: af7c56b8
Author:    vamsi-parasa 
Committer: Sandhya Viswanathan 
URL:   
https://git.openjdk.java.net/jdk/commit/af7c56b85bb2828a9d68f9e1c753a4adfa7ebb4f
Stats: 63 lines in 11 files changed: 61 ins; 2 del; 0 mod

8275167: x86 intrinsic for unsignedMultiplyHigh

Reviewed-by: kvn, sviswanathan

-

PR: https://git.openjdk.java.net/jdk/pull/5933


Re: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2]

2021-10-20 Thread Vamsi Parasa
On Wed, 20 Oct 2021 22:14:33 GMT, Vladimir Kozlov  wrote:

> Tests passed. You can integrate changes.

Thanks Vladimir! What are the next steps to integrate the change?

-

PR: https://git.openjdk.java.net/jdk/pull/5933


Re: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2]

2021-10-19 Thread Vamsi Parasa
On Fri, 15 Oct 2021 19:31:24 GMT, Vamsi Parasa  wrote:

>> src/hotspot/share/opto/mulnode.cpp line 468:
>> 
>>> 466: }
>>> 467: 
>>> 468: 
>>> //=
>> 
>> MulHiLNode::Value() and UMulHiLNode::Value() seem to be identical. Perhaps 
>> some refactoring would be in order, maybe make a common shared routine.
>
> Sure, will do the refactoring to use a shared routine.

Pushed the refactored code to use a common routine for MulHiLNode::Value() and 
UMulHiLNode::Value(). Please review...

-

PR: https://git.openjdk.java.net/jdk/pull/5933


Re: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2]

2021-10-19 Thread Vamsi Parasa
> Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. 
> This change show 1.87X improvement on a micro benchmark.

Vamsi Parasa has updated the pull request incrementally with one additional 
commit since the last revision:

  refactoring to remove code duplication by using a common routine for 
UMulHiLNode and MulHiLNode

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5933/files
  - new: https://git.openjdk.java.net/jdk/pull/5933/files/cb30b268..a10a9fbe

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=5933=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=5933=00-01

  Stats: 25 lines in 2 files changed: 12 ins; 12 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5933.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5933/head:pull/5933

PR: https://git.openjdk.java.net/jdk/pull/5933


Re: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2]

2021-10-19 Thread Vamsi Parasa
On Fri, 15 Oct 2021 21:04:12 GMT, Vamsi Parasa  wrote:

> > > How you verified correctness of results? I suggest to extend 
> > > `test/jdk//java/lang/Math/MultiplicationTests.java` test to cover 
> > > unsigned method.
> > 
> > 
> > Tests for unsignedMultiplyHigh were already added in 
> > test/jdk//java/lang/Math/MultiplicationTests.java (in July 2021 by Brian 
> > Burkhalter). Used that test to verify the correctness of the results.
> 
> Good. It seems I have old version of the test. Did you run it with -Xcomp? 
> How you verified that intrinsic is used?

I have verified that the intrinsic is being used by looking at the x86 assembly 
code generated by using perfasm profiler.

-

PR: https://git.openjdk.java.net/jdk/pull/5933


Withdrawn: 8275167: x86 intrinsic for unsignedMultiplyHigh

2021-10-19 Thread Vamsi Parasa
On Wed, 13 Oct 2021 18:55:10 GMT, Vamsi Parasa  wrote:

> Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. 
> This change show 1.87X improvement on a micro benchmark.

This pull request has been closed without being integrated.

-

PR: https://git.openjdk.java.net/jdk/pull/5933


Re: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh

2021-10-15 Thread Vamsi Parasa
On Fri, 15 Oct 2021 20:19:31 GMT, Vladimir Kozlov  wrote:

> > > How you verified correctness of results? I suggest to extend 
> > > `test/jdk//java/lang/Math/MultiplicationTests.java` test to cover 
> > > unsigned method.
> > 
> > 
> > Tests for unsignedMultiplyHigh were already added in 
> > test/jdk//java/lang/Math/MultiplicationTests.java (in July 2021 by Brian 
> > Burkhalter). Used that test to verify the correctness of the results.
> 
> Good. It seems I have old version of the test. Did you run it with -Xcomp? 
> How you verified that intrinsic is used?

The tests were not run with -Xcomp. Looked at the generated x86 code using the 
disassembler (hsdis) and verified that that correct instruction (mul) instead 
of (imul, without the intrinsic) was being generated.

-

PR: https://git.openjdk.java.net/jdk/pull/5933


Re: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh

2021-10-15 Thread Vamsi Parasa
On Fri, 15 Oct 2021 16:14:25 GMT, Andrew Haley  wrote:

>> Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. 
>> This change show 1.87X improvement on a micro benchmark.
>
> src/hotspot/share/opto/mulnode.cpp line 468:
> 
>> 466: }
>> 467: 
>> 468: 
>> //=
> 
> MulHiLNode::Value() and UMulHiLNode::Value() seem to be identical. Perhaps 
> some refactoring would be in order, maybe make a common shared routine.

Sure, will do the refactoring to use a shared routine.

> test/micro/org/openjdk/bench/java/lang/MathBench.java line 547:
> 
>> 545: return  Math.unsignedMultiplyHigh(long747, long13);
>> 546: }
>> 547: 
> 
> As far as I can tell, the JMH overhead dominates when trying to measure the 
> latency of events in the nanosecond range. `unsignedMultiplyHigh` should have 
> a latency of maybe 1.5-2ns. Is that what you saw?

Yes, the JMH overhead was dominating the measurement of latency. The latency 
observed for `unsignedMultiplyHigh` was 2.3ns with the intrinsic enabled.

-

PR: https://git.openjdk.java.net/jdk/pull/5933


Re: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh

2021-10-15 Thread Vamsi Parasa
On Fri, 15 Oct 2021 18:45:41 GMT, Vladimir Kozlov  wrote:

> How you verified correctness of results? I suggest to extend 
> `test/jdk//java/lang/Math/MultiplicationTests.java` test to cover unsigned 
> method.

Tests for unsignedMultiplyHigh were already added in 
test/jdk//java/lang/Math/MultiplicationTests.java (in July 2021 by Brian 
Burkhalter). Used that test to verify the correctness of the results.

-

PR: https://git.openjdk.java.net/jdk/pull/5933


RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh

2021-10-15 Thread Vamsi Parasa
Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. This 
change show 1.87X improvement on a micro benchmark.

-

Commit messages:
 - fix typo in function name
 - 8275167: x86 intrinsic for unsignedMultiplyHigh

Changes: https://git.openjdk.java.net/jdk/pull/5933/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=5933=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8275167
  Stats: 59 lines in 11 files changed: 59 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5933.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5933/head:pull/5933

PR: https://git.openjdk.java.net/jdk/pull/5933