Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v2]
On Wed, 18 May 2022 14:59:49 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch optimises the matching rules for floating-point comparison with >> respects to eq/ne on x86-64 >> >> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` >> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which >> improves the sequence of `If (CmpF x x) (Bool ne)` from >> >> ucomiss xmm0, xmm0 >> jp label >> jne label >> >> into >> >> ucomiss xmm0, xmm0 >> jp label >> >> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as >> `x == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of >> fixing the flags, such as >> >> xorlecx, ecx >> ucomiss xmm0, xmm1 >> jnp done >> pushf >> andq[rsp], 0xff2b >> popf >> done: >> movleax, 1 >> cmovel eax, ecx >> >> The patch changes this sequence into >> >> xorlecx, ecx >> ucomiss xmm0, xmm1 >> movleax, 1 >> cmovpl eax, ecx >> cmovnel eax, ecx >> >> 3, The patch also changes the pattern of `isInfinite` to be more optimised >> by using `Math.abs` to reduce 1 comparison and compares the result with >> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types. >> >> The benchmark results are as follow: >> >> Before: >> Benchmark Mode Cnt Score Error Units >> FPComparison.equalDouble avgt5 2876.242 ± 58.875 ns/op >> FPComparison.equalFloatavgt5 3062.430 ± 31.371 ns/op >> FPComparison.isFiniteDoubleavgt5 475.749 ± 19.027 ns/op >> FPComparison.isFiniteFloat avgt5 506.525 ± 14.417 ns/op >> FPComparison.isInfiniteDouble avgt5 1232.800 ± 31.677 ns/op >> FPComparison.isInfiniteFloat avgt5 1234.708 ± 70.239 ns/op >> FPComparison.isNanDouble avgt5 2255.847 ± 7.238 ns/op >> FPComparison.isNanFloatavgt5 2567.044 ± 36.078 ns/op >> >> After: >> Benchmark Mode Cnt Score Error Units >> FPComparison.equalDouble avgt5 594.636 ± 8.922 ns/op >> FPComparison.equalFloatavgt5 663.849 ± 3.656 ns/op >> FPComparison.isFiniteDoubleavgt5 518.309 ± 107.352 ns/op >> FPComparison.isFiniteFloat avgt5 515.576 ± 14.669 ns/op >> FPComparison.isInfiniteDouble avgt5 621.185 ± 11.935 ns/op >> FPComparison.isInfiniteFloat avgt5 623.566 ± 15.206 ns/op >> FPComparison.isNanDouble avgt5 400.124 ± 0.762 ns/op >> FPComparison.isNanFloatavgt5 546.486 ± 1.509 ns/op >> >> Thank you very much. > > Quan Anh Mai has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains eight additional > commits since the last revision: > > - incidental ws > - add tests > - Merge branch 'master' into fpcompare > - fix tests > - test > - improve infinity > - remove expensive rules > - improve fp comparison Could you pls show the performance delta with the baseline and after the patch? Otherwise, people reviewing this PR have to manually compute how much improvement is obtained. For example, `FPComparison.isNanFloat` is showing `4.7x` improvement. Kindly fill the delta column for rest of the data points. Instead of two separate tables, the suggested table format for each row would be: ` , , , ` - PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v13]
On Mon, 11 Apr 2022 02:01:17 GMT, David Holmes wrote: > This change appears to be causing crashes in tier4 - possibly Xcomp related: > > # assert(ctrl == kit.control()) failed: Control flow was added although the > intrinsic bailed out > I will file a new bug. Thank you for informing this issue! In order to reproduce the issue, I just kickstarted a tier4 run on a Xeon server. Will keep you updated. - PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v4]
On Wed, 6 Apr 2022 06:23:47 GMT, Jatin Bhateja wrote: >>> Also need a jtreg test for this. >> >> Thanks Sandhya for the review. Made the suggested changes and added jtreg >> tests as well. > > Hi @vamsi-parasa , thanks for addressing my comments, looks good to me > otherwise apart from the outstanding comments. @jatin-bhateja Thank you Jatin! - PR: https://git.openjdk.java.net/jdk/pull/7572
Integrated: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long
On Tue, 22 Feb 2022 09:24:47 GMT, Srinivas Vamsi Parasa wrote: > Optimizes the divideUnsigned() and remainderUnsigned() methods in > java.lang.Integer and java.lang.Long classes using x86 intrinsics. This > change shows 3x improvement for Integer methods and upto 25% improvement for > Long. This change also implements the DivMod optimization which fuses > division and modulus operations if needed. The DivMod optimization shows 3x > improvement for Integer and ~65% improvement for Long. This pull request has now been integrated. Changeset: 37e28aea Author:vamsi-parasa Committer: Jatin Bhateja URL: https://git.openjdk.java.net/jdk/commit/37e28aea27c8d8336ddecde777e63b51a939d281 Stats: 1156 lines in 20 files changed: 1154 ins; 1 del; 1 mod 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long Reviewed-by: sviswanathan, kvn, jbhateja - PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v13]
On Sat, 9 Apr 2022 18:25:54 GMT, Vladimir Kozlov wrote: > Testing passed. Thank you Vladimir! - PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v8]
On Fri, 8 Apr 2022 00:55:50 GMT, Srinivas Vamsi Parasa wrote: >> I have few comments. > > Hi Vladimir (@vnkozlov), > > Incorporated all the suggestions you made in the previous review and pushed a > new commit. > Please let me know if anything else is needed. > > Thanks, > Vamsi > @vamsi-parasa I got failures in new tests when run with `-XX:UseAVX=3 > -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting ` flags: > > ``` > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGFPE (0x8) at pc=0x7f2fa8c674ea, pid=3334, tid=3335 > # > # JRE version: Java(TM) SE Runtime Environment (19.0) (fastdebug build > 19-internal-2022-04-08-0157190.vladimir.kozlov.jdkgit) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug > 19-internal-2022-04-08-0157190.vladimir.kozlov.jdkgit, mixed mode, sharing, > tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # J 504% c2 compiler.intrinsics.TestLongUnsignedDivMod.testDivideUnsigned()V > (48 bytes) @ 0x7f2fa8c674ea [0x7f2fa8c672a0+0x024a] > # > ``` > > ``` > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGFPE (0x8) at pc=0x7fb8c0c4fb18, pid=3309, tid=3310 > # > # JRE version: Java(TM) SE Runtime Environment (19.0) (fastdebug build > 19-internal-2022-04-08-0157190.vladimir.kozlov.jdkgit) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug > 19-internal-2022-04-08-0157190.vladimir.kozlov.jdkgit, mixed mode, sharing, > tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # J 445 c2 compiler.intrinsics.TestIntegerUnsignedDivMod.divmod(III)V (23 > bytes) @ 0x7fb8c0c4fb18 [0x7fb8c0c4fae0+0x0038] > # > ``` Hi Vladimir (@vnkozlov), fixed it in the new commit, could you pls check? This is being caused by lack of control() node in udiv/umod related nodes. After adding the control() node, the tests are passing for me. Thanks for pointing this out! - PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v13]
> Optimizes the divideUnsigned() and remainderUnsigned() methods in > java.lang.Integer and java.lang.Long classes using x86 intrinsics. This > change shows 3x improvement for Integer methods and upto 25% improvement for > Long. This change also implements the DivMod optimization which fuses > division and modulus operations if needed. The DivMod optimization shows 3x > improvement for Integer and ~65% improvement for Long. Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Fix the divmod crash due to lack of control node - Changes: - all: https://git.openjdk.java.net/jdk/pull/7572/files - new: https://git.openjdk.java.net/jdk/pull/7572/files/3e3fc977..a71ea238 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=7572=12 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7572=11-12 Stats: 8 lines in 2 files changed: 0 ins; 4 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/7572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572 PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v12]
> Optimizes the divideUnsigned() and remainderUnsigned() methods in > java.lang.Integer and java.lang.Long classes using x86 intrinsics. This > change shows 3x improvement for Integer methods and upto 25% improvement for > Long. This change also implements the DivMod optimization which fuses > division and modulus operations if needed. The DivMod optimization shows 3x > improvement for Integer and ~65% improvement for Long. Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: uncomment zero in integer div, mod test - Changes: - all: https://git.openjdk.java.net/jdk/pull/7572/files - new: https://git.openjdk.java.net/jdk/pull/7572/files/bfb6c02e..3e3fc977 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=7572=11 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7572=10-11 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572 PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v4]
On Wed, 6 Apr 2022 00:45:37 GMT, Vladimir Kozlov wrote: >> Thanks for suggesting the enhancement. This enhancement will be implemented >> as a part of https://bugs.openjdk.java.net/browse/JDK-8282365 > > You do need `Ideal()` methods at least to check for dead code. Added the Ideal() methods for checking dead code. Pls see the new commit. - PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v8]
On Wed, 6 Apr 2022 00:46:01 GMT, Vladimir Kozlov wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one >> additional commit since the last revision: >> >> add error msg for jtreg test > > I have few comments. Hi Vladimir (@vnkozlov), Incorporated all the suggestions you made in the previous review and pushed a new commit. Please let me know if anything else is needed. Thanks, Vamsi > src/hotspot/cpu/x86/assembler_x86.cpp line 12375: > >> 12373: } >> 12374: #endif >> 12375: > > Please, place it near `idivq()` so you would not need `#ifdef`. Made the change as per your suggestion. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4568: > >> 4566: subl(rdx, divisor); >> 4567: if (VM_Version::supports_bmi1()) andnl(rax, rdx, rax); >> 4568: else { > > Please, follow our coding stile here and in following methods: > > if (VM_Version::supports_bmi1()) { > andnl(rax, rdx, rax); > } else { Pls see the new commit which fixed the coding style. > src/hotspot/cpu/x86/x86_64.ad line 8701: > >> 8699: %} >> 8700: >> 8701: instruct udivI_rReg(rax_RegI rax, no_rax_rdx_RegI div, rFlagsReg cr, >> rdx_RegI rdx) > > I suggest to follow the pattern in other `div/mod` instructions: `(rax_RegI > rax, rdx_RegI rdx, no_rax_rdx_RegI div, rFlagsReg cr)` > > Similar in following new instructions. Pls see the new commit which fixed the pattern. > test/hotspot/jtreg/compiler/intrinsics/TestIntegerDivMod.java line 55: > >> 53: dividends[i] = rng.nextInt(); >> 54: divisors[i] = rng.nextInt(); >> 55: } > > I don't trust RND to generate corner cases. > Please, add cases here and in TestLongDivMod.java for MAX, MIN, 0. You are right. Using an updated corner cases test revealed divide by zero crash which was fixed. Please see the updated jtreg tests inspired by unsigned divide/remainder tests in test/jdk/java/lang/Long/Unsigned.java and test/jdk/java/lang/Integer/Unsigned.java. - PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v11]
> Optimizes the divideUnsigned() and remainderUnsigned() methods in > java.lang.Integer and java.lang.Long classes using x86 intrinsics. This > change shows 3x improvement for Integer methods and upto 25% improvement for > Long. This change also implements the DivMod optimization which fuses > division and modulus operations if needed. The DivMod optimization shows 3x > improvement for Integer and ~65% improvement for Long. Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Idead Ideal for udiv, umod nodes and update jtreg tests to use corner cases - Changes: - all: https://git.openjdk.java.net/jdk/pull/7572/files - new: https://git.openjdk.java.net/jdk/pull/7572/files/9949047c..bfb6c02e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=7572=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7572=09-10 Stats: 701 lines in 7 files changed: 423 ins; 274 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/7572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572 PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v10]
> Optimizes the divideUnsigned() and remainderUnsigned() methods in > java.lang.Integer and java.lang.Long classes using x86 intrinsics. This > change shows 3x improvement for Integer methods and upto 25% improvement for > Long. This change also implements the DivMod optimization which fuses > division and modulus operations if needed. The DivMod optimization shows 3x > improvement for Integer and ~65% improvement for Long. Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - use appropriate style changes - Merge branch 'master' of https://git.openjdk.java.net/jdk into udivmod - Merge branch 'openjdk:master' into udivmod - add error msg for jtreg test - update jtreg test to run on x86_64 - add bmi1 support check and jtreg tests - Merge branch 'master' of https://git.openjdk.java.net/jdk into udivmod - fix 32bit build issues - Fix line at end of file - Move intrinsic code to macro assembly routines; remove unused transformations for div and mod nodes - ... and 5 more: https://git.openjdk.java.net/jdk/compare/4451257b...9949047c - Changes: https://git.openjdk.java.net/jdk/pull/7572/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk=7572=09 Stats: 1011 lines in 20 files changed: 1009 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572 PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v8]
On Wed, 6 Apr 2022 00:46:01 GMT, Vladimir Kozlov wrote: > I have few comments. Thank you Vladimir (@vnkozlov) for suggesting the changes! Will incorporate the suggestions and push an update in few hours. - PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v9]
> Optimizes the divideUnsigned() and remainderUnsigned() methods in > java.lang.Integer and java.lang.Long classes using x86 intrinsics. This > change shows 3x improvement for Integer methods and upto 25% improvement for > Long. This change also implements the DivMod optimization which fuses > division and modulus operations if needed. The DivMod optimization shows 3x > improvement for Integer and ~65% improvement for Long. Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Merge branch 'openjdk:master' into udivmod - add error msg for jtreg test - update jtreg test to run on x86_64 - add bmi1 support check and jtreg tests - Merge branch 'master' of https://git.openjdk.java.net/jdk into udivmod - fix 32bit build issues - Fix line at end of file - Move intrinsic code to macro assembly routines; remove unused transformations for div and mod nodes - fix trailing white space errors - fix whitespaces - ... and 3 more: https://git.openjdk.java.net/jdk/compare/741be461...acba7c19 - Changes: https://git.openjdk.java.net/jdk/pull/7572/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk=7572=08 Stats: 1007 lines in 20 files changed: 1005 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572 PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v6]
On Tue, 5 Apr 2022 17:06:44 GMT, Sandhya Viswanathan wrote: >> Vamsi Parasa has updated the pull request incrementally with one additional >> commit since the last revision: >> >> add bmi1 support check and jtreg tests > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.hpp line 362: > >> 360: void vector_popcount_long(XMMRegister dst, XMMRegister src, >> XMMRegister xtmp1, >> 361: XMMRegister xtmp2, XMMRegister xtmp3, >> Register rtmp, >> 362: int vec_enc); > > This doesn't seem to be related to this patch. This is coming due to a merge with the latest upstream (jdk) > test/hotspot/jtreg/compiler/intrinsics/TestIntegerDivMod.java line 107: > >> 105: } >> 106: if (mismatch) { >> 107: throw new RuntimeException("Test failed"); > > It would be good to print dividend, divisor, operation, actual result and > expected result here. Please see the updated error message in the recent commit. > test/hotspot/jtreg/compiler/intrinsics/TestLongDivMod.java line 104: > >> 102: } >> 103: if (mismatch) { >> 104: throw new RuntimeException("Test failed"); > > It would be good to print dividend, divisor, operation, actual result and > expected result here. Please see the updated error message in the recent commit. - PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v8]
> Optimizes the divideUnsigned() and remainderUnsigned() methods in > java.lang.Integer and java.lang.Long classes using x86 intrinsics. This > change shows 3x improvement for Integer methods and upto 25% improvement for > Long. This change also implements the DivMod optimization which fuses > division and modulus operations if needed. The DivMod optimization shows 3x > improvement for Integer and ~65% improvement for Long. Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: add error msg for jtreg test - Changes: - all: https://git.openjdk.java.net/jdk/pull/7572/files - new: https://git.openjdk.java.net/jdk/pull/7572/files/e97c6fbc..8047767c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=7572=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7572=06-07 Stats: 41 lines in 2 files changed: 37 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/7572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572 PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long [v2]
> Implements x86 intrinsics for compare() method in java.lang.Integer and > java.lang.Long. Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - refactored x86_64.ad code to macro assembly routines - Merge branch 'master' of https://git.openjdk.java.net/jdk into cmp - add JMH benchmarks - 8283726: x86 intrinsics for compare method in Integer and Long - Changes: - all: https://git.openjdk.java.net/jdk/pull/7975/files - new: https://git.openjdk.java.net/jdk/pull/7975/files/b0c3314d..79e4aa50 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=7975=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7975=00-01 Stats: 5452 lines in 218 files changed: 3925 ins; 856 del; 671 mod Patch: https://git.openjdk.java.net/jdk/pull/7975.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7975/head:pull/7975 PR: https://git.openjdk.java.net/jdk/pull/7975
Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long
On Tue, 29 Mar 2022 02:24:21 GMT, Jatin Bhateja wrote: >> Implements x86 intrinsics for compare() method in java.lang.Integer and >> java.lang.Long. > > src/hotspot/cpu/x86/x86_64.ad line 12107: > >> 12105: instruct compareSignedI_rReg(rRegI dst, rRegI op1, rRegI op2, rRegI >> tmp, rFlagsReg cr) >> 12106: %{ >> 12107: match(Set dst (CompareSignedI op1 op2)); > > Please also include these patterns in x86_32.ad Will update x86_32.ad as well. > src/hotspot/cpu/x86/x86_64.ad line 12125: > >> 12123: __ movl(tmp, 0); >> 12124: __ bind(done); >> 12125: __ movl(dest, tmp); > > Please move this in macro-assembly routine. Sure, will refactor it into a macro-assembly > src/hotspot/cpu/x86/x86_64.ad line 12178: > >> 12176: __ movl(tmp, 0); >> 12177: __ bind(done); >> 12178: __ movl(dest, tmp); > > Please move this into a macro-assembly routine. Sure, will do that and update it soon. > src/hotspot/cpu/x86/x86_64.ad line 12204: > >> 12202: __ movl(tmp, 0); >> 12203: __ bind(done); >> 12204: __ movl(dest, tmp); > > Please move this into macro-assembly routine. Sure, will do that and update it soon. > src/hotspot/share/opto/comparenode.hpp line 67: > >> 65: CompareUnsignedLNode(Node* in1, Node* in2) : CompareNode(in1, in2) {} >> 66: virtual int Opcode() const; >> 67: }; > > Intent here seems to be to enable further auto-vectorization of newly create > IR nodes. Yes, that is the intention. > test/micro/org/openjdk/bench/java/lang/CompareInteger.java line 78: > >> 76: input2[i] = tmp; >> 77: } >> 78: } > > Logic re-organization suggestion:- > > > for (int i = 0 ; i < BUFFER_SIZE; i++) { > input1[i] = rng.nextLong(); > } > > if (mode.equals("equals") { > GRADIANT = 0; > } else if (mode.equals("greaterThanEquals")) { > GRADIANT = 1; > } else { > assert mode.equals("lessThanEqual"); > GRADIANT = -1; > } > > for(int i = 0 ; i < BUFFER_SIZE; i++) { > input2[i] = input1[i] + i*GRADIANT; > } The suggested refactoring is definitely elegant but one rare possibility is overflow due to the addition/subtraction. The swap logic doesn't have that problem. > test/micro/org/openjdk/bench/java/lang/CompareLong.java line 5: > >> 3: * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. >> 4: * >> 5: * This code is free software; you can redistribute it and/or modify it > > We can unify this benchmark along with integer compare micro. Sure, will do the unification. - PR: https://git.openjdk.java.net/jdk/pull/7975
Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long
On Sun, 27 Mar 2022 06:57:58 GMT, Quan Anh Mai wrote: > This is both complicated and inefficient, I would suggest building the > intrinsic in the IR graph so that the compiler can simplify > `Integer.compareUnsigned(x, y) < 0` into `x u< y`. Thanks. Thank you for the suggestion! This intrinsic uses 1 cmp instruction instead of two and shows ~10% improvement due to better branch prediction. Even without the intrinsic, the compiler is currently able to reduce it to x u< y but is still generating two cmp (unsigned) instructions as Integer.compareUnsigned(x, y) is implemented as x u< y? -1 : (x ==y ? 0 : 1). - PR: https://git.openjdk.java.net/jdk/pull/7975
RFR: 8283726: x86 intrinsics for compare method in Integer and Long
Implements x86 intrinsics for compare() method in java.lang.Integer and java.lang.Long. - Commit messages: - add JMH benchmarks - 8283726: x86 intrinsics for compare method in Integer and Long Changes: https://git.openjdk.java.net/jdk/pull/7975/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk=7975=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8283726 Stats: 430 lines in 13 files changed: 428 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/7975.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7975/head:pull/7975 PR: https://git.openjdk.java.net/jdk/pull/7975
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v4]
On Thu, 24 Feb 2022 14:13:47 GMT, Jatin Bhateja wrote: >> Vamsi Parasa has updated the pull request incrementally with one additional >> commit since the last revision: >> >> fix 32bit build issues > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4408: > >> 4406: jmp(done); >> 4407: bind(neg_divisor_fastpath); >> 4408: // Fastpath for divisor < 0: > > How about checking if divisor is +ve or -ve constant and non-constant > dividend in identity routine and setting a flag in IR node, which can be used > to either emit fast / slow path in a new instruction selection pattern. It > will save emitting redundant instructions. Thanks for suggesting the enhancement. This enhancement will be implemented as a part of https://bugs.openjdk.java.net/browse/JDK-8282365 > src/hotspot/share/opto/divnode.cpp line 881: > >> 879: return (phase->type( in(2) )->higher_equal(TypeLong::ONE)) ? in(1) : >> this; >> 880: } >> 881: >> //--Value-- > > Ideal transform to replace unsigned divide by cheaper logical right shift > instruction if divisor is POW will be useful. Thanks for suggesting the enhancement. This enhancement will be implemented as a part of https://bugs.openjdk.java.net/browse/JDK-8282365 > src/hotspot/share/opto/divnode.cpp line 897: > >> 895: >> 896: // Either input is BOTTOM ==> the result is the local BOTTOM >> 897: const Type *bot = bottom_type(); > > Can we add constant folding handling when both dividend and divisor are > constants. Thanks for suggesting the enhancement. This enhancement will be implemented as a part of https://bugs.openjdk.java.net/browse/JDK-8282365 - PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v4]
> Optimizes the divideUnsigned() and remainderUnsigned() methods in > java.lang.Integer and java.lang.Long classes using x86 intrinsics. This > change shows 3x improvement for Integer methods and upto 25% improvement for > Long. This change also implements the DivMod optimization which fuses > division and modulus operations if needed. The DivMod optimization shows 3x > improvement for Integer and ~65% improvement for Long. Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: fix 32bit build issues - Changes: - all: https://git.openjdk.java.net/jdk/pull/7572/files - new: https://git.openjdk.java.net/jdk/pull/7572/files/13549290..2915b2e7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=7572=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7572=02-03 Stats: 91 lines in 2 files changed: 49 ins; 42 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/7572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572 PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v3]
> Optimizes the divideUnsigned() and remainderUnsigned() methods in > java.lang.Integer and java.lang.Long classes using x86 intrinsics. This > change shows 3x improvement for Integer methods and upto 25% improvement for > Long. This change also implements the DivMod optimization which fuses > division and modulus operations if needed. The DivMod optimization shows 3x > improvement for Integer and ~65% improvement for Long. Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Fix line at end of file - Changes: - all: https://git.openjdk.java.net/jdk/pull/7572/files - new: https://git.openjdk.java.net/jdk/pull/7572/files/7fc18af3..13549290 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=7572=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7572=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/7572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572 PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v2]
> Optimizes the divideUnsigned() and remainderUnsigned() methods in > java.lang.Integer and java.lang.Long classes using x86 intrinsics. This > change shows 3x improvement for Integer methods and upto 25% improvement for > Long. This change also implements the DivMod optimization which fuses > division and modulus operations if needed. The DivMod optimization shows 3x > improvement for Integer and ~65% improvement for Long. Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Move intrinsic code to macro assembly routines; remove unused transformations for div and mod nodes - Changes: - all: https://git.openjdk.java.net/jdk/pull/7572/files - new: https://git.openjdk.java.net/jdk/pull/7572/files/fa57175a..7fc18af3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=7572=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7572=00-01 Stats: 326 lines in 7 files changed: 137 ins; 176 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/7572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572 PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long
On Wed, 23 Feb 2022 05:52:00 GMT, Jatin Bhateja wrote: >> Optimizes the divideUnsigned() and remainderUnsigned() methods in >> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This >> change shows 3x improvement for Integer methods and upto 25% improvement for >> Long. This change also implements the DivMod optimization which fuses >> division and modulus operations if needed. The DivMod optimization shows 3x >> improvement for Integer and ~65% improvement for Long. > > test/micro/org/openjdk/bench/java/lang/IntegerDivMod.java line 76: > >> 74: return quotients; >> 75: } >> 76: > > Return seems redundant here. Will remove it. > test/micro/org/openjdk/bench/java/lang/IntegerDivMod.java line 83: > >> 81: } >> 82: return remainders; >> 83: } > > Return seems redundant here. Will remove it. - PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long
On Wed, 23 Feb 2022 05:46:45 GMT, Jatin Bhateja wrote: >> Optimizes the divideUnsigned() and remainderUnsigned() methods in >> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This >> change shows 3x improvement for Integer methods and upto 25% improvement for >> Long. This change also implements the DivMod optimization which fuses >> division and modulus operations if needed. The DivMod optimization shows 3x >> improvement for Integer and ~65% improvement for Long. > > src/hotspot/share/opto/divnode.cpp line 1350: > >> 1348: return NULL; >> 1349: } >> 1350: > > Please remove Value and Ideal routines if no explicit transforms are being > done. Will remove the unused transformations. > src/hotspot/share/opto/divnode.cpp line 1362: > >> 1360: } >> 1361: >> 1362: >> //= > > You can remove Ideal routine is not transformation is being done. Will remove the unused transformation. - PR: https://git.openjdk.java.net/jdk/pull/7572
Re: RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long
On Wed, 23 Feb 2022 05:43:10 GMT, Jatin Bhateja wrote: >> Optimizes the divideUnsigned() and remainderUnsigned() methods in >> java.lang.Integer and java.lang.Long classes using x86 intrinsics. This >> change shows 3x improvement for Integer methods and upto 25% improvement for >> Long. This change also implements the DivMod optimization which fuses >> division and modulus operations if needed. The DivMod optimization shows 3x >> improvement for Integer and ~65% improvement for Long. > > src/hotspot/cpu/x86/x86_64.ad line 8602: > >> 8600: __ jmp(done); >> 8601: __ bind(neg_divisor_fastpath); >> 8602: // Fastpath for divisor < 0: > > Move in macro assembly routine. Sure, will move it to a macro assembly routine > src/hotspot/cpu/x86/x86_64.ad line 8633: > >> 8631: __ jmp(done); >> 8632: __ bind(neg_divisor_fastpath); >> 8633: // Fastpath for divisor < 0: > > Move in macro assembly rountine. Sure, will move it to a macro assembly routine > src/hotspot/cpu/x86/x86_64.ad line 8902: > >> 8900: __ subl(tmp_rax, divisor); >> 8901: __ andnl(tmp_rax, tmp_rax, rdx); >> 8902: __ sarl(tmp_rax, 31); > > Please move this into a macro assembly routine. Sure, will move it to a macro assembly routine > src/hotspot/cpu/x86/x86_64.ad line 8932: > >> 8930: // Fastpath when divisor < 0: >> 8931: // remainder = dividend - (((dividend & ~(dividend - divisor)) >> >> (Long.SIZE - 1)) & divisor) >> 8932: // See Hacker's Delight (2nd ed), section 9.3 which is implemented >> in java.lang.Long.remainderUnsigned() > > Please move it into a macro assembly routine. Sure, will move it to a macro assembly routine > src/hotspot/share/opto/compile.cpp line 3499: > >> 3497: Node* d = n->find_similar(Op_UDivI); >> 3498: if (d) { >> 3499: // Replace them with a fused unsigned divmod if supported > > Can you explain a bit here, why can't this transformation be handled earlier ? This is following the existing approach being used for signed DivMod > test/micro/org/openjdk/bench/java/lang/LongDivMod.java line 75: > >> 73: } >> 74: return quotients; >> 75: } > > Do we need to return quotients, since it's a field being explicitly modified. Will remove it. > test/micro/org/openjdk/bench/java/lang/LongDivMod.java line 82: > >> 80: remainders[i] = Long.remainderUnsigned(dividends[i], >> divisors[i]); >> 81: } >> 82: return remainders; > > Same as above Will remove it. - PR: https://git.openjdk.java.net/jdk/pull/7572
RFR: 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long
Optimizes the divideUnsigned() and remainderUnsigned() methods in java.lang.Integer and java.lang.Long classes using x86 intrinsics. This change shows 3x improvement for Integer methods and upto 25% improvement for Long. This change also implements the DivMod optimization which fuses division and modulus operations if needed. The DivMod optimization shows 3x improvement for Integer and ~65% improvement for Long. - Commit messages: - fix trailing white space errors - fix whitespaces - revert comment to original for divmodI - Update rax and rdx register usage in x86_64.ad - 8282221: x86 intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long Changes: https://git.openjdk.java.net/jdk/pull/7572/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk=7572=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8282221 Stats: 741 lines in 16 files changed: 738 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/7572.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7572/head:pull/7572 PR: https://git.openjdk.java.net/jdk/pull/7572
Integrated: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh()
On Tue, 2 Nov 2021 03:02:42 GMT, Vamsi Parasa wrote: > This change optimizes random number generators using > Math.unsignedMultiplyHigh() This pull request has now been integrated. Changeset: 38f525e9 Author: vamsi-parasa Committer: Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk/commit/38f525e96e767597d16698d4b243b681782acc9f Stats: 103 lines in 4 files changed: 72 ins; 28 del; 3 mod 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() Reviewed-by: psandoz, jlaskey - PR: https://git.openjdk.java.net/jdk/pull/6206
Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v4]
On Fri, 3 Dec 2021 17:19:17 GMT, Paul Sandoz wrote: >> @JimLaskey Could you please review this PR which optimizes three of the >> Pseudo-Random Number Generators you implemented in >> https://bugs.openjdk.java.net/browse/JDK-8248862 ? > > @vamsi-parasa you can now integrate this and then I or someone else will > sponsor it Thank you @PaulSandoz and @JimLaskey for reviewing the PR! - PR: https://git.openjdk.java.net/jdk/pull/6206
Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v5]
On Fri, 3 Dec 2021 00:27:13 GMT, Paul Sandoz wrote: >> Vamsi Parasa has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Update the JMH micro to take RNG parameters for elegant implementation > > test/micro/org/openjdk/bench/java/util/RandomGeneratorNext.java line 47: > >> 45: public class RandomGeneratorNext { >> 46: >> 47: public RandomGenerator randomGenerator; > > Suggestion: > > RandomGenerator randomGenerator; Incorporated your suggestion in the recent commit... > test/micro/org/openjdk/bench/java/util/RandomGeneratorNext.java line 52: > >> 50: String randomGeneratorName; >> 51: >> 52: public static long[] buffer; > > Suggestion: > > long[] buffer; Incorporated your suggestion in the recent commit... > test/micro/org/openjdk/bench/java/util/RandomGeneratorNext.java line 69: > >> 67: >> 68: @Benchmark >> 69: @Fork(1) > > Why is `@Fork` need here? Removed the @Fork annotation... > test/micro/org/openjdk/bench/java/util/RandomGeneratorNext.java line 70: > >> 68: @Benchmark >> 69: @Fork(1) >> 70: public void testFillBufferWithNextLong() { > > Return `buffer` after the loop completes, just in case the JIT decides it is > otherwise dead code Added the code to return buffer at the end of the loop... - PR: https://git.openjdk.java.net/jdk/pull/6206
Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v6]
> This change optimizes random number generators using > Math.unsignedMultiplyHigh() Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: minor changes to the benchmark - Changes: - all: https://git.openjdk.java.net/jdk/pull/6206/files - new: https://git.openjdk.java.net/jdk/pull/6206/files/e9c71477..cf68fe00 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=6206=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6206=04-05 Stats: 7 lines in 1 file changed: 1 ins; 3 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6206.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6206/head:pull/6206 PR: https://git.openjdk.java.net/jdk/pull/6206
Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v4]
On Thu, 2 Dec 2021 23:02:57 GMT, Paul Sandoz wrote: >> Vamsi Parasa has updated the pull request incrementally with one additional >> commit since the last revision: >> >> add seeds for the random generators to eliminate run-to-run variance > > test/micro/org/openjdk/bench/java/util/RandomGeneratorNext.java line 57: > >> 55: rngL128X128MixRandom = >> RandomGeneratorFactory.of("L128X128MixRandom").create(42); >> 56: rngL128X256MixRandom = >> RandomGeneratorFactory.of("L128X256MixRandom").create(174); >> 57: rngL128X1024MixRandom = >> RandomGeneratorFactory.of("L128X1024MixRandom").create(308); > > You can declare parameters: > > > @Param({"L128X128MixRandom", "L128X256MixRandom", "L128X1024MixRandom"}) > String randomGeneratorName; > > @Param("1024") > int size; > > long[] buffer; > RandomGenerator randomGenerator; > > > @Setup > public void setup() { > buffer = new long[size]; > randomGenerator = RandomGeneratorFactory.of(randomGeneratorName) > .create(randomGeneratorName.hashCode()); > } > > > Then you can simplify to just two benchmark methods. Further, the benchmark > can be used for other PRNGs. @PaulSandoz Implemented the changes you suggested for the benchmark. Tested it on my side and ensured that it's working as expected... - PR: https://git.openjdk.java.net/jdk/pull/6206
Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v5]
> This change optimizes random number generators using > Math.unsignedMultiplyHigh() Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Update the JMH micro to take RNG parameters for elegant implementation - Changes: - all: https://git.openjdk.java.net/jdk/pull/6206/files - new: https://git.openjdk.java.net/jdk/pull/6206/files/78a45142..e9c71477 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=6206=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6206=03-04 Stats: 35 lines in 1 file changed: 1 ins; 20 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/6206.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6206/head:pull/6206 PR: https://git.openjdk.java.net/jdk/pull/6206
Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v4]
On Thu, 2 Dec 2021 23:02:57 GMT, Paul Sandoz wrote: >> Vamsi Parasa has updated the pull request incrementally with one additional >> commit since the last revision: >> >> add seeds for the random generators to eliminate run-to-run variance > > test/micro/org/openjdk/bench/java/util/RandomGeneratorNext.java line 57: > >> 55: rngL128X128MixRandom = >> RandomGeneratorFactory.of("L128X128MixRandom").create(42); >> 56: rngL128X256MixRandom = >> RandomGeneratorFactory.of("L128X256MixRandom").create(174); >> 57: rngL128X1024MixRandom = >> RandomGeneratorFactory.of("L128X1024MixRandom").create(308); > > You can declare parameters: > > > @Param({"L128X128MixRandom", "L128X256MixRandom", "L128X1024MixRandom"}) > String randomGeneratorName; > > @Param("1024") > int size; > > long[] buffer; > RandomGenerator randomGenerator; > > > @Setup > public void setup() { > buffer = new long[size]; > randomGenerator = RandomGeneratorFactory.of(randomGeneratorName) > .create(randomGeneratorName.hashCode()); > } > > > Then you can simplify to just two benchmark methods. Further, the benchmark > can be used for other PRNGs. Thank you Paul! Will make the necessary changes and update the benchmark... - PR: https://git.openjdk.java.net/jdk/pull/6206
Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v4]
On Thu, 2 Dec 2021 20:43:56 GMT, Vamsi Parasa wrote: >> This change optimizes random number generators using >> Math.unsignedMultiplyHigh() > > Vamsi Parasa has updated the pull request incrementally with one additional > commit since the last revision: > > add seeds for the random generators to eliminate run-to-run variance The following speedups were observed for the JMH benchmarks. L128X128MixRandom shows 8% improvement, L128x256MixRandom shows 6% improvement and L128X1024MixRandom shows 14% improvement. (Another micro workload shows 10% improvement for both L128X128MixRandom and L128x256MixRandom while L128X1024MixRandom shows 25% improvement in performance.) @JimLaskey Could you please review this PR which optimizes three of the Pseudo-Random Number Generators you implemented in https://bugs.openjdk.java.net/browse/JDK-8248862 ? - PR: https://git.openjdk.java.net/jdk/pull/6206
Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v4]
> This change optimizes random number generators using > Math.unsignedMultiplyHigh() Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: add seeds for the random generators to eliminate run-to-run variance - Changes: - all: https://git.openjdk.java.net/jdk/pull/6206/files - new: https://git.openjdk.java.net/jdk/pull/6206/files/8bde4731..78a45142 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=6206=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6206=02-03 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6206.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6206/head:pull/6206 PR: https://git.openjdk.java.net/jdk/pull/6206
Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v3]
> This change optimizes random number generators using > Math.unsignedMultiplyHigh() Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: fix trailing whitespace - Changes: - all: https://git.openjdk.java.net/jdk/pull/6206/files - new: https://git.openjdk.java.net/jdk/pull/6206/files/e46c2d92..8bde4731 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=6206=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6206=01-02 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6206.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6206/head:pull/6206 PR: https://git.openjdk.java.net/jdk/pull/6206
Re: RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() [v2]
> This change optimizes random number generators using > Math.unsignedMultiplyHigh() Vamsi Parasa has updated the pull request incrementally with two additional commits since the last revision: - Fix year in copyright - micro benchmarks to test the performance - Changes: - all: https://git.openjdk.java.net/jdk/pull/6206/files - new: https://git.openjdk.java.net/jdk/pull/6206/files/c6ec6487..e46c2d92 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=6206=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6206=00-01 Stats: 94 lines in 1 file changed: 94 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6206.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6206/head:pull/6206 PR: https://git.openjdk.java.net/jdk/pull/6206
RFR: 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh()
This change optimizes random number generators using Math.unsignedMultiplyHigh() - Commit messages: - Update Math.java - 8275821: Optimize random number generators developed in JDK-8248862 using Math.unsignedMultiplyHigh() Changes: https://git.openjdk.java.net/jdk/pull/6206/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk=6206=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275821 Stats: 31 lines in 3 files changed: 0 ins; 28 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6206.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6206/head:pull/6206 PR: https://git.openjdk.java.net/jdk/pull/6206
Re: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2]
On Tue, 19 Oct 2021 20:34:55 GMT, Vamsi Parasa wrote: >> Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. >> This change show 1.87X improvement on a micro benchmark. > > Vamsi Parasa has updated the pull request incrementally with one additional > commit since the last revision: > > refactoring to remove code duplication by using a common routine for > UMulHiLNode and MulHiLNode Thank you for spotting the stale comment. It will removed in another related commit that will be pushed soon... - PR: https://git.openjdk.java.net/jdk/pull/5933
Integrated: 8275167: x86 intrinsic for unsignedMultiplyHigh
On Wed, 13 Oct 2021 18:55:10 GMT, Vamsi Parasa wrote: > Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. > This change show 1.87X improvement on a micro benchmark. This pull request has now been integrated. Changeset: af7c56b8 Author: vamsi-parasa Committer: Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk/commit/af7c56b85bb2828a9d68f9e1c753a4adfa7ebb4f Stats: 63 lines in 11 files changed: 61 ins; 2 del; 0 mod 8275167: x86 intrinsic for unsignedMultiplyHigh Reviewed-by: kvn, sviswanathan - PR: https://git.openjdk.java.net/jdk/pull/5933
Re: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2]
On Wed, 20 Oct 2021 22:14:33 GMT, Vladimir Kozlov wrote: > Tests passed. You can integrate changes. Thanks Vladimir! What are the next steps to integrate the change? - PR: https://git.openjdk.java.net/jdk/pull/5933
Re: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2]
On Fri, 15 Oct 2021 19:31:24 GMT, Vamsi Parasa wrote: >> src/hotspot/share/opto/mulnode.cpp line 468: >> >>> 466: } >>> 467: >>> 468: >>> //= >> >> MulHiLNode::Value() and UMulHiLNode::Value() seem to be identical. Perhaps >> some refactoring would be in order, maybe make a common shared routine. > > Sure, will do the refactoring to use a shared routine. Pushed the refactored code to use a common routine for MulHiLNode::Value() and UMulHiLNode::Value(). Please review... - PR: https://git.openjdk.java.net/jdk/pull/5933
Re: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2]
> Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. > This change show 1.87X improvement on a micro benchmark. Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: refactoring to remove code duplication by using a common routine for UMulHiLNode and MulHiLNode - Changes: - all: https://git.openjdk.java.net/jdk/pull/5933/files - new: https://git.openjdk.java.net/jdk/pull/5933/files/cb30b268..a10a9fbe Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=5933=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk=5933=00-01 Stats: 25 lines in 2 files changed: 12 ins; 12 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5933.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5933/head:pull/5933 PR: https://git.openjdk.java.net/jdk/pull/5933
Re: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh [v2]
On Fri, 15 Oct 2021 21:04:12 GMT, Vamsi Parasa wrote: > > > How you verified correctness of results? I suggest to extend > > > `test/jdk//java/lang/Math/MultiplicationTests.java` test to cover > > > unsigned method. > > > > > > Tests for unsignedMultiplyHigh were already added in > > test/jdk//java/lang/Math/MultiplicationTests.java (in July 2021 by Brian > > Burkhalter). Used that test to verify the correctness of the results. > > Good. It seems I have old version of the test. Did you run it with -Xcomp? > How you verified that intrinsic is used? I have verified that the intrinsic is being used by looking at the x86 assembly code generated by using perfasm profiler. - PR: https://git.openjdk.java.net/jdk/pull/5933
Withdrawn: 8275167: x86 intrinsic for unsignedMultiplyHigh
On Wed, 13 Oct 2021 18:55:10 GMT, Vamsi Parasa wrote: > Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. > This change show 1.87X improvement on a micro benchmark. This pull request has been closed without being integrated. - PR: https://git.openjdk.java.net/jdk/pull/5933
Re: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh
On Fri, 15 Oct 2021 20:19:31 GMT, Vladimir Kozlov wrote: > > > How you verified correctness of results? I suggest to extend > > > `test/jdk//java/lang/Math/MultiplicationTests.java` test to cover > > > unsigned method. > > > > > > Tests for unsignedMultiplyHigh were already added in > > test/jdk//java/lang/Math/MultiplicationTests.java (in July 2021 by Brian > > Burkhalter). Used that test to verify the correctness of the results. > > Good. It seems I have old version of the test. Did you run it with -Xcomp? > How you verified that intrinsic is used? The tests were not run with -Xcomp. Looked at the generated x86 code using the disassembler (hsdis) and verified that that correct instruction (mul) instead of (imul, without the intrinsic) was being generated. - PR: https://git.openjdk.java.net/jdk/pull/5933
Re: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh
On Fri, 15 Oct 2021 16:14:25 GMT, Andrew Haley wrote: >> Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. >> This change show 1.87X improvement on a micro benchmark. > > src/hotspot/share/opto/mulnode.cpp line 468: > >> 466: } >> 467: >> 468: >> //= > > MulHiLNode::Value() and UMulHiLNode::Value() seem to be identical. Perhaps > some refactoring would be in order, maybe make a common shared routine. Sure, will do the refactoring to use a shared routine. > test/micro/org/openjdk/bench/java/lang/MathBench.java line 547: > >> 545: return Math.unsignedMultiplyHigh(long747, long13); >> 546: } >> 547: > > As far as I can tell, the JMH overhead dominates when trying to measure the > latency of events in the nanosecond range. `unsignedMultiplyHigh` should have > a latency of maybe 1.5-2ns. Is that what you saw? Yes, the JMH overhead was dominating the measurement of latency. The latency observed for `unsignedMultiplyHigh` was 2.3ns with the intrinsic enabled. - PR: https://git.openjdk.java.net/jdk/pull/5933
Re: RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh
On Fri, 15 Oct 2021 18:45:41 GMT, Vladimir Kozlov wrote: > How you verified correctness of results? I suggest to extend > `test/jdk//java/lang/Math/MultiplicationTests.java` test to cover unsigned > method. Tests for unsignedMultiplyHigh were already added in test/jdk//java/lang/Math/MultiplicationTests.java (in July 2021 by Brian Burkhalter). Used that test to verify the correctness of the results. - PR: https://git.openjdk.java.net/jdk/pull/5933
RFR: 8275167: x86 intrinsic for unsignedMultiplyHigh
Optimize the new Math.unsignedMultiplyHigh using the x86 mul instruction. This change show 1.87X improvement on a micro benchmark. - Commit messages: - fix typo in function name - 8275167: x86 intrinsic for unsignedMultiplyHigh Changes: https://git.openjdk.java.net/jdk/pull/5933/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk=5933=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8275167 Stats: 59 lines in 11 files changed: 59 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5933.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5933/head:pull/5933 PR: https://git.openjdk.java.net/jdk/pull/5933