Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v3]
On Sat, 21 May 2022 10:31:25 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch optimises the matching rules for floating-point comparison with >> respects to eq/ne on x86-64 >> >> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` >> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which >> improves the sequence of `If (CmpF x x) (Bool ne)` from >> >> ucomiss xmm0, xmm0 >> jp label >> jne label >> >> into >> >> ucomiss xmm0, xmm0 >> jp label >> >> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as >> `x == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of >> fixing the flags, such as >> >> xorlecx, ecx >> ucomiss xmm0, xmm1 >> jnp done >> pushf >> andq[rsp], 0xff2b >> popf >> done: >> movleax, 1 >> cmovel eax, ecx >> >> The patch changes this sequence into >> >> xorlecx, ecx >> ucomiss xmm0, xmm1 >> movleax, 1 >> cmovpl eax, ecx >> cmovnel eax, ecx >> >> 3, The patch also changes the pattern of `isInfinite` to be more optimised >> by using `Math.abs` to reduce 1 comparison and compares the result with >> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types. >> >> The benchmark results are as follow: >> >> Before >> After >> Benchmark Mode Cnt Score ErrorScore >> Error Unit Ratio >> FPComparison.equalDouble avgt5 2876.242 ± 58.875 594.636 ± >> 8.922 ns/op4.84 >> FPComparison.equalFloatavgt5 3062.430 ± 31.371 663.849 ± >> 3.656 ns/op4.61 >> FPComparison.isFiniteDoubleavgt5 475.749 ± 19.027 518.309 ± >> 107.352 ns/op0.92 >> FPComparison.isFiniteFloat avgt5 506.525 ± 14.417 515.576 ± >> 14.669 ns/op0.98 >> FPComparison.isInfiniteDouble avgt5 1232.800 ± 31.677 621.185 ± >> 11.935 ns/op1.98 >> FPComparison.isInfiniteFloat avgt5 1234.708 ± 70.239 623.566 ± >> 15.206 ns/op1.98 >> FPComparison.isNanDouble avgt5 2255.847 ± 7.238 400.124 ± >> 0.762 ns/op5.64 >> FPComparison.isNanFloatavgt5 2567.044 ± 36.078 546.486 ± >> 1.509 ns/op4.70 >> >> Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional > commit since the last revision: > > comments Thank you very much for your reviews and testing. - PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v3]
On Sat, 21 May 2022 10:31:25 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch optimises the matching rules for floating-point comparison with >> respects to eq/ne on x86-64 >> >> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` >> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which >> improves the sequence of `If (CmpF x x) (Bool ne)` from >> >> ucomiss xmm0, xmm0 >> jp label >> jne label >> >> into >> >> ucomiss xmm0, xmm0 >> jp label >> >> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as >> `x == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of >> fixing the flags, such as >> >> xorlecx, ecx >> ucomiss xmm0, xmm1 >> jnp done >> pushf >> andq[rsp], 0xff2b >> popf >> done: >> movleax, 1 >> cmovel eax, ecx >> >> The patch changes this sequence into >> >> xorlecx, ecx >> ucomiss xmm0, xmm1 >> movleax, 1 >> cmovpl eax, ecx >> cmovnel eax, ecx >> >> 3, The patch also changes the pattern of `isInfinite` to be more optimised >> by using `Math.abs` to reduce 1 comparison and compares the result with >> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types. >> >> The benchmark results are as follow: >> >> Before >> After >> Benchmark Mode Cnt Score ErrorScore >> Error Unit Ratio >> FPComparison.equalDouble avgt5 2876.242 ± 58.875 594.636 ± >> 8.922 ns/op4.84 >> FPComparison.equalFloatavgt5 3062.430 ± 31.371 663.849 ± >> 3.656 ns/op4.61 >> FPComparison.isFiniteDoubleavgt5 475.749 ± 19.027 518.309 ± >> 107.352 ns/op0.92 >> FPComparison.isFiniteFloat avgt5 506.525 ± 14.417 515.576 ± >> 14.669 ns/op0.98 >> FPComparison.isInfiniteDouble avgt5 1232.800 ± 31.677 621.185 ± >> 11.935 ns/op1.98 >> FPComparison.isInfiniteFloat avgt5 1234.708 ± 70.239 623.566 ± >> 15.206 ns/op1.98 >> FPComparison.isNanDouble avgt5 2255.847 ± 7.238 400.124 ± >> 0.762 ns/op5.64 >> FPComparison.isNanFloatavgt5 2567.044 ± 36.078 546.486 ± >> 1.509 ns/op4.70 >> >> Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional > commit since the last revision: > > comments Marked as reviewed by sviswanathan (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v3]
On Sat, 21 May 2022 15:30:29 GMT, Vladimir Kozlov wrote: > I started our testing. Please, wait results. Testing passed clean. - PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v3]
On Sat, 21 May 2022 10:31:25 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch optimises the matching rules for floating-point comparison with >> respects to eq/ne on x86-64 >> >> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` >> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which >> improves the sequence of `If (CmpF x x) (Bool ne)` from >> >> ucomiss xmm0, xmm0 >> jp label >> jne label >> >> into >> >> ucomiss xmm0, xmm0 >> jp label >> >> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as >> `x == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of >> fixing the flags, such as >> >> xorlecx, ecx >> ucomiss xmm0, xmm1 >> jnp done >> pushf >> andq[rsp], 0xff2b >> popf >> done: >> movleax, 1 >> cmovel eax, ecx >> >> The patch changes this sequence into >> >> xorlecx, ecx >> ucomiss xmm0, xmm1 >> movleax, 1 >> cmovpl eax, ecx >> cmovnel eax, ecx >> >> 3, The patch also changes the pattern of `isInfinite` to be more optimised >> by using `Math.abs` to reduce 1 comparison and compares the result with >> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types. >> >> The benchmark results are as follow: >> >> Before >> After >> Benchmark Mode Cnt Score ErrorScore >> Error Unit Ratio >> FPComparison.equalDouble avgt5 2876.242 ± 58.875 594.636 ± >> 8.922 ns/op4.84 >> FPComparison.equalFloatavgt5 3062.430 ± 31.371 663.849 ± >> 3.656 ns/op4.61 >> FPComparison.isFiniteDoubleavgt5 475.749 ± 19.027 518.309 ± >> 107.352 ns/op0.92 >> FPComparison.isFiniteFloat avgt5 506.525 ± 14.417 515.576 ± >> 14.669 ns/op0.98 >> FPComparison.isInfiniteDouble avgt5 1232.800 ± 31.677 621.185 ± >> 11.935 ns/op1.98 >> FPComparison.isInfiniteFloat avgt5 1234.708 ± 70.239 623.566 ± >> 15.206 ns/op1.98 >> FPComparison.isNanDouble avgt5 2255.847 ± 7.238 400.124 ± >> 0.762 ns/op5.64 >> FPComparison.isNanFloatavgt5 2567.044 ± 36.078 546.486 ± >> 1.509 ns/op4.70 >> >> Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional > commit since the last revision: > > comments Thank you for trying my suggestion and new comments. Approved. I started our testing. Please, wait results. - Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v3]
On Wed, 4 May 2022 23:27:45 GMT, Vladimir Kozlov wrote: >> The changes to `Float` and `Double` look good. I don't think we need >> additional tests, see test/jdk/java/lang/Math/IeeeRecommendedTests.java. >> >> At first i thought we no longer need PR #8459 but it seems both PRs are >> complimentary, albeit PR #8459 has more modest performance gains for the >> intrinsics. > >> The changes to `Float` and `Double` look good. I don't think we need >> additional tests, see test/jdk/java/lang/Math/IeeeRecommendedTests.java. > > Thank you, Paul for pointing the test. It means we need to run tier4 (which > runs these tests with -Xcomp) to make sure methods are compiled by C2. @vnkozlov I have added comments to describe the changes in `cmpOpUCF` and the reasons behind the `cmov_regUCF2_eq` match rules. Using `expand` broke the build with `Syntax Error: :For expand in cmovI_regUCF2_eq to work, parameter declaration order in cmovI_regUCF2_ne must follow matchrule`. @sviswa7 (x != y) ? a : b can be calculated using pseudocode as follow: res = (!ZF || PF) ? a : b = !ZF ? a : (PF ? a : b) which can be calculated using cmovp rb, ra // rb1 = PF ? ra : rb cmovne rb, ra // rb2 = !ZF ? ra : rb1 = !ZF ? ra : (PF ? ra : rb) Furthermore, since `(x == y) == !(x != y)`, we have `((x == y) ? a : b) == ((x != y) ? b : a)`, which explains the implementation of `cmov_regUCF2_eq`. @vamsi-parasa Thanks a lot for your suggestion, I have modified the PR description as you say. - PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v3]
> Hi, > > This patch optimises the matching rules for floating-point comparison with > respects to eq/ne on x86-64 > > 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` > is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which > improves the sequence of `If (CmpF x x) (Bool ne)` from > > ucomiss xmm0, xmm0 > jp label > jne label > > into > > ucomiss xmm0, xmm0 > jp label > > 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as `x > == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of > fixing the flags, such as > > xorlecx, ecx > ucomiss xmm0, xmm1 > jnp done > pushf > andq[rsp], 0xff2b > popf > done: > movleax, 1 > cmovel eax, ecx > > The patch changes this sequence into > > xorlecx, ecx > ucomiss xmm0, xmm1 > movleax, 1 > cmovpl eax, ecx > cmovnel eax, ecx > > 3, The patch also changes the pattern of `isInfinite` to be more optimised by > using `Math.abs` to reduce 1 comparison and compares the result with > `MAX_VALUE` since `>` is more optimised than `==` for floating-point types. > > The benchmark results are as follow: > > Before After > Benchmark Mode Cnt Score ErrorScore > Error Unit Ratio > FPComparison.equalDouble avgt5 2876.242 ± 58.875 594.636 ± > 8.922 ns/op4.84 > FPComparison.equalFloatavgt5 3062.430 ± 31.371 663.849 ± > 3.656 ns/op4.61 > FPComparison.isFiniteDoubleavgt5 475.749 ± 19.027 518.309 ± > 107.352 ns/op0.92 > FPComparison.isFiniteFloat avgt5 506.525 ± 14.417 515.576 ± > 14.669 ns/op0.98 > FPComparison.isInfiniteDouble avgt5 1232.800 ± 31.677 621.185 ± > 11.935 ns/op1.98 > FPComparison.isInfiniteFloat avgt5 1234.708 ± 70.239 623.566 ± > 15.206 ns/op1.98 > FPComparison.isNanDouble avgt5 2255.847 ± 7.238 400.124 ± > 0.762 ns/op5.64 > FPComparison.isNanFloatavgt5 2567.044 ± 36.078 546.486 ± > 1.509 ns/op4.70 > > Thank you very much. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: comments - Changes: - all: https://git.openjdk.java.net/jdk/pull/8525/files - new: https://git.openjdk.java.net/jdk/pull/8525/files/ba93dcf2..7fcfe4a3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8525&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8525&range=01-02 Stats: 11 lines in 1 file changed: 10 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/8525.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8525/head:pull/8525 PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v2]
On Fri, 20 May 2022 22:22:43 GMT, Sandhya Viswanathan wrote: >> I see that you swapped `src, dst` in `match()` but `format` is sill >> incorrect and the code is confusing. > > I agree with @vnkozlov that this needs explanation. Could you please add > comments here with IR and example code generated for both the eq case and ne > case? You have some explanation in the PR description but not in the code. > The description needs to be in the code as well for maintenance. Right, I missed that. Then you can use `expand %{` to avoid duplication (keep format). - PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v2]
On Wed, 4 May 2022 23:16:41 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/x86_64.ad line 6998: >> >>> 6996: ins_encode %{ >>> 6997: __ cmovl(Assembler::parity, $dst$$Register, $src$$Register); >>> 6998: __ cmovl(Assembler::notEqual, $dst$$Register, $src$$Register); >> >> Should this be `equal`? > > I see that you swapped `src, dst` in `match()` but `format` is sill incorrect > and the code is confusing. I agree with @vnkozlov that this needs explanation. Could you please add comments here with IR and example code generated for both the eq case and ne case? You have some explanation in the PR description but not in the code. The description needs to be in the code as well for maintenance. - PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne
On Wed, 18 May 2022 14:59:33 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch optimises the matching rules for floating-point comparison with >> respects to eq/ne on x86-64 >> >> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` >> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which >> improves the sequence of `If (CmpF x x) (Bool ne)` from >> >> ucomiss xmm0, xmm0 >> jp label >> jne label >> >> into >> >> ucomiss xmm0, xmm0 >> jp label >> >> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as >> `x == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of >> fixing the flags, such as >> >> xorlecx, ecx >> ucomiss xmm0, xmm1 >> jnp done >> pushf >> andq[rsp], 0xff2b >> popf >> done: >> movleax, 1 >> cmovel eax, ecx >> >> The patch changes this sequence into >> >> xorlecx, ecx >> ucomiss xmm0, xmm1 >> movleax, 1 >> cmovpl eax, ecx >> cmovnel eax, ecx >> >> 3, The patch also changes the pattern of `isInfinite` to be more optimised >> by using `Math.abs` to reduce 1 comparison and compares the result with >> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types. >> >> The benchmark results are as follow: >> >> Before: >> Benchmark Mode Cnt Score Error Units >> FPComparison.equalDouble avgt5 2876.242 ± 58.875 ns/op >> FPComparison.equalFloatavgt5 3062.430 ± 31.371 ns/op >> FPComparison.isFiniteDoubleavgt5 475.749 ± 19.027 ns/op >> FPComparison.isFiniteFloat avgt5 506.525 ± 14.417 ns/op >> FPComparison.isInfiniteDouble avgt5 1232.800 ± 31.677 ns/op >> FPComparison.isInfiniteFloat avgt5 1234.708 ± 70.239 ns/op >> FPComparison.isNanDouble avgt5 2255.847 ± 7.238 ns/op >> FPComparison.isNanFloatavgt5 2567.044 ± 36.078 ns/op >> >> After: >> Benchmark Mode Cnt Score Error Units >> FPComparison.equalDouble avgt5 594.636 ± 8.922 ns/op >> FPComparison.equalFloatavgt5 663.849 ± 3.656 ns/op >> FPComparison.isFiniteDoubleavgt5 518.309 ± 107.352 ns/op >> FPComparison.isFiniteFloat avgt5 515.576 ± 14.669 ns/op >> FPComparison.isInfiniteDouble avgt5 621.185 ± 11.935 ns/op >> FPComparison.isInfiniteFloat avgt5 623.566 ± 15.206 ns/op >> FPComparison.isNanDouble avgt5 400.124 ± 0.762 ns/op >> FPComparison.isNanFloatavgt5 546.486 ± 1.509 ns/op >> >> Thank you very much. > > I have reverted the changes to `java.lang.Float` and `java.lang.Double` to > not interfere with the intrinsic PR. More tests are added to cover all cases > regarding floating-point comparison of compiled code. > > The rules for fp comparison that output the result to `rFlagRegsU` are > expensive and should be avoided. As a result, I removed the shortcut rules > with memory or constant operands to reduce the number of match rules. Only > the basic rules are kept. > > Thanks. @merykitty Very nice work! The patch looks good to me. @merykitty Very nice work! The patch looks good to me. - PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v2]
On Wed, 18 May 2022 14:59:49 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch optimises the matching rules for floating-point comparison with >> respects to eq/ne on x86-64 >> >> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` >> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which >> improves the sequence of `If (CmpF x x) (Bool ne)` from >> >> ucomiss xmm0, xmm0 >> jp label >> jne label >> >> into >> >> ucomiss xmm0, xmm0 >> jp label >> >> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as >> `x == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of >> fixing the flags, such as >> >> xorlecx, ecx >> ucomiss xmm0, xmm1 >> jnp done >> pushf >> andq[rsp], 0xff2b >> popf >> done: >> movleax, 1 >> cmovel eax, ecx >> >> The patch changes this sequence into >> >> xorlecx, ecx >> ucomiss xmm0, xmm1 >> movleax, 1 >> cmovpl eax, ecx >> cmovnel eax, ecx >> >> 3, The patch also changes the pattern of `isInfinite` to be more optimised >> by using `Math.abs` to reduce 1 comparison and compares the result with >> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types. >> >> The benchmark results are as follow: >> >> Before: >> Benchmark Mode Cnt Score Error Units >> FPComparison.equalDouble avgt5 2876.242 ± 58.875 ns/op >> FPComparison.equalFloatavgt5 3062.430 ± 31.371 ns/op >> FPComparison.isFiniteDoubleavgt5 475.749 ± 19.027 ns/op >> FPComparison.isFiniteFloat avgt5 506.525 ± 14.417 ns/op >> FPComparison.isInfiniteDouble avgt5 1232.800 ± 31.677 ns/op >> FPComparison.isInfiniteFloat avgt5 1234.708 ± 70.239 ns/op >> FPComparison.isNanDouble avgt5 2255.847 ± 7.238 ns/op >> FPComparison.isNanFloatavgt5 2567.044 ± 36.078 ns/op >> >> After: >> Benchmark Mode Cnt Score Error Units >> FPComparison.equalDouble avgt5 594.636 ± 8.922 ns/op >> FPComparison.equalFloatavgt5 663.849 ± 3.656 ns/op >> FPComparison.isFiniteDoubleavgt5 518.309 ± 107.352 ns/op >> FPComparison.isFiniteFloat avgt5 515.576 ± 14.669 ns/op >> FPComparison.isInfiniteDouble avgt5 621.185 ± 11.935 ns/op >> FPComparison.isInfiniteFloat avgt5 623.566 ± 15.206 ns/op >> FPComparison.isNanDouble avgt5 400.124 ± 0.762 ns/op >> FPComparison.isNanFloatavgt5 546.486 ± 1.509 ns/op >> >> Thank you very much. > > Quan Anh Mai has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains eight additional > commits since the last revision: > > - incidental ws > - add tests > - Merge branch 'master' into fpcompare > - fix tests > - test > - improve infinity > - remove expensive rules > - improve fp comparison Could you pls show the performance delta with the baseline and after the patch? Otherwise, people reviewing this PR have to manually compute how much improvement is obtained. For example, `FPComparison.isNanFloat` is showing `4.7x` improvement. Kindly fill the delta column for rest of the data points. Instead of two separate tables, the suggested table format for each row would be: ` , , , ` - PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v2]
On Wed, 4 May 2022 23:16:41 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/x86_64.ad line 6998: >> >>> 6996: ins_encode %{ >>> 6997: __ cmovl(Assembler::parity, $dst$$Register, $src$$Register); >>> 6998: __ cmovl(Assembler::notEqual, $dst$$Register, $src$$Register); >> >> Should this be `equal`? > > I see that you swapped `src, dst` in `match()` but `format` is sill incorrect > and the code is confusing. This is a flipping the sense of the test by flipping the input of the `CMove`, so this is essentially the same as in the above rule. Thanks. - PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne
On Wed, 4 May 2022 01:59:17 GMT, Quan Anh Mai wrote: > Hi, > > This patch optimises the matching rules for floating-point comparison with > respects to eq/ne on x86-64 > > 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` > is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which > improves the sequence of `If (CmpF x x) (Bool ne)` from > > ucomiss xmm0, xmm0 > jp label > jne label > > into > > ucomiss xmm0, xmm0 > jp label > > 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as `x > == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of > fixing the flags, such as > > xorlecx, ecx > ucomiss xmm0, xmm1 > jnp done > pushf > andq[rsp], 0xff2b > popf > done: > movleax, 1 > cmovel eax, ecx > > The patch changes this sequence into > > xorlecx, ecx > ucomiss xmm0, xmm1 > movleax, 1 > cmovpl eax, ecx > cmovnel eax, ecx > > 3, The patch also changes the pattern of `isInfinite` to be more optimised by > using `Math.abs` to reduce 1 comparison and compares the result with > `MAX_VALUE` since `>` is more optimised than `==` for floating-point types. > > The benchmark results are as follow: > > Before: > Benchmark Mode Cnt Score Error Units > FPComparison.equalDouble avgt5 2876.242 ± 58.875 ns/op > FPComparison.equalFloatavgt5 3062.430 ± 31.371 ns/op > FPComparison.isFiniteDoubleavgt5 475.749 ± 19.027 ns/op > FPComparison.isFiniteFloat avgt5 506.525 ± 14.417 ns/op > FPComparison.isInfiniteDouble avgt5 1232.800 ± 31.677 ns/op > FPComparison.isInfiniteFloat avgt5 1234.708 ± 70.239 ns/op > FPComparison.isNanDouble avgt5 2255.847 ± 7.238 ns/op > FPComparison.isNanFloatavgt5 2567.044 ± 36.078 ns/op > > After: > Benchmark Mode Cnt Score Error Units > FPComparison.equalDouble avgt5 594.636 ± 8.922 ns/op > FPComparison.equalFloatavgt5 663.849 ± 3.656 ns/op > FPComparison.isFiniteDoubleavgt5 518.309 ± 107.352 ns/op > FPComparison.isFiniteFloat avgt5 515.576 ± 14.669 ns/op > FPComparison.isInfiniteDouble avgt5 621.185 ± 11.935 ns/op > FPComparison.isInfiniteFloat avgt5 623.566 ± 15.206 ns/op > FPComparison.isNanDouble avgt5 400.124 ± 0.762 ns/op > FPComparison.isNanFloatavgt5 546.486 ± 1.509 ns/op > > Thank you very much. I have reverted the changes to `java.lang.Float` and `java.lang.Double` to not interfere with the intrinsic PR. More tests are added to cover all cases regarding floating-point comparison of compiled code. The rules for fp comparison that output the result to `rFlagRegsU` are expensive and should be avoided. As a result, I removed the shortcut rules with memory or constant operands to reduce the number of match rules. Only the basic rules are kept. Thanks. - PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v2]
> Hi, > > This patch optimises the matching rules for floating-point comparison with > respects to eq/ne on x86-64 > > 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` > is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which > improves the sequence of `If (CmpF x x) (Bool ne)` from > > ucomiss xmm0, xmm0 > jp label > jne label > > into > > ucomiss xmm0, xmm0 > jp label > > 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as `x > == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of > fixing the flags, such as > > xorlecx, ecx > ucomiss xmm0, xmm1 > jnp done > pushf > andq[rsp], 0xff2b > popf > done: > movleax, 1 > cmovel eax, ecx > > The patch changes this sequence into > > xorlecx, ecx > ucomiss xmm0, xmm1 > movleax, 1 > cmovpl eax, ecx > cmovnel eax, ecx > > 3, The patch also changes the pattern of `isInfinite` to be more optimised by > using `Math.abs` to reduce 1 comparison and compares the result with > `MAX_VALUE` since `>` is more optimised than `==` for floating-point types. > > The benchmark results are as follow: > > Before: > Benchmark Mode Cnt Score Error Units > FPComparison.equalDouble avgt5 2876.242 ± 58.875 ns/op > FPComparison.equalFloatavgt5 3062.430 ± 31.371 ns/op > FPComparison.isFiniteDoubleavgt5 475.749 ± 19.027 ns/op > FPComparison.isFiniteFloat avgt5 506.525 ± 14.417 ns/op > FPComparison.isInfiniteDouble avgt5 1232.800 ± 31.677 ns/op > FPComparison.isInfiniteFloat avgt5 1234.708 ± 70.239 ns/op > FPComparison.isNanDouble avgt5 2255.847 ± 7.238 ns/op > FPComparison.isNanFloatavgt5 2567.044 ± 36.078 ns/op > > After: > Benchmark Mode Cnt Score Error Units > FPComparison.equalDouble avgt5 594.636 ± 8.922 ns/op > FPComparison.equalFloatavgt5 663.849 ± 3.656 ns/op > FPComparison.isFiniteDoubleavgt5 518.309 ± 107.352 ns/op > FPComparison.isFiniteFloat avgt5 515.576 ± 14.669 ns/op > FPComparison.isInfiniteDouble avgt5 621.185 ± 11.935 ns/op > FPComparison.isInfiniteFloat avgt5 623.566 ± 15.206 ns/op > FPComparison.isNanDouble avgt5 400.124 ± 0.762 ns/op > FPComparison.isNanFloatavgt5 546.486 ± 1.509 ns/op > > Thank you very much. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - incidental ws - add tests - Merge branch 'master' into fpcompare - fix tests - test - improve infinity - remove expensive rules - improve fp comparison - Changes: - all: https://git.openjdk.java.net/jdk/pull/8525/files - new: https://git.openjdk.java.net/jdk/pull/8525/files/b64e04b5..ba93dcf2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8525&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8525&range=00-01 Stats: 210103 lines in 2627 files changed: 159508 ins; 36691 del; 13904 mod Patch: https://git.openjdk.java.net/jdk/pull/8525.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8525/head:pull/8525 PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne
On Wed, 4 May 2022 22:27:48 GMT, Paul Sandoz wrote: > The changes to `Float` and `Double` look good. I don't think we need > additional tests, see test/jdk/java/lang/Math/IeeeRecommendedTests.java. Thank you, Paul for pointing the test. It means we need to run tier4 (which runs these tests with -Xcomp) to make sure methods are compiled by C2. - PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne
On Wed, 4 May 2022 19:32:41 GMT, Vladimir Kozlov wrote: >> Hi, >> >> This patch optimises the matching rules for floating-point comparison with >> respects to eq/ne on x86-64 >> >> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` >> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which >> improves the sequence of `If (CmpF x x) (Bool ne)` from >> >> ucomiss xmm0, xmm0 >> jp label >> jne label >> >> into >> >> ucomiss xmm0, xmm0 >> jp label >> >> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as >> `x == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of >> fixing the flags, such as >> >> xorlecx, ecx >> ucomiss xmm0, xmm1 >> jnp done >> pushf >> andq[rsp], 0xff2b >> popf >> done: >> movleax, 1 >> cmovel eax, ecx >> >> The patch changes this sequence into >> >> xorlecx, ecx >> ucomiss xmm0, xmm1 >> movleax, 1 >> cmovpl eax, ecx >> cmovnel eax, ecx >> >> 3, The patch also changes the pattern of `isInfinite` to be more optimised >> by using `Math.abs` to reduce 1 comparison and compares the result with >> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types. >> >> The benchmark results are as follow: >> >> Before: >> Benchmark Mode Cnt Score Error Units >> FPComparison.equalDouble avgt5 2876.242 ± 58.875 ns/op >> FPComparison.equalFloatavgt5 3062.430 ± 31.371 ns/op >> FPComparison.isFiniteDoubleavgt5 475.749 ± 19.027 ns/op >> FPComparison.isFiniteFloat avgt5 506.525 ± 14.417 ns/op >> FPComparison.isInfiniteDouble avgt5 1232.800 ± 31.677 ns/op >> FPComparison.isInfiniteFloat avgt5 1234.708 ± 70.239 ns/op >> FPComparison.isNanDouble avgt5 2255.847 ± 7.238 ns/op >> FPComparison.isNanFloatavgt5 2567.044 ± 36.078 ns/op >> >> After: >> Benchmark Mode Cnt Score Error Units >> FPComparison.equalDouble avgt5 594.636 ± 8.922 ns/op >> FPComparison.equalFloatavgt5 663.849 ± 3.656 ns/op >> FPComparison.isFiniteDoubleavgt5 518.309 ± 107.352 ns/op >> FPComparison.isFiniteFloat avgt5 515.576 ± 14.669 ns/op >> FPComparison.isInfiniteDouble avgt5 621.185 ± 11.935 ns/op >> FPComparison.isInfiniteFloat avgt5 623.566 ± 15.206 ns/op >> FPComparison.isNanDouble avgt5 400.124 ± 0.762 ns/op >> FPComparison.isNanFloatavgt5 546.486 ± 1.509 ns/op >> >> Thank you very much. > > src/hotspot/cpu/x86/x86_64.ad line 6998: > >> 6996: ins_encode %{ >> 6997: __ cmovl(Assembler::parity, $dst$$Register, $src$$Register); >> 6998: __ cmovl(Assembler::notEqual, $dst$$Register, $src$$Register); > > Should this be `equal`? I see that you swapped `src, dst` in `match()` but `format` is sill incorrect and the code is confusing. - PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne
On Wed, 4 May 2022 01:59:17 GMT, Quan Anh Mai wrote: > Hi, > > This patch optimises the matching rules for floating-point comparison with > respects to eq/ne on x86-64 > > 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` > is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which > improves the sequence of `If (CmpF x x) (Bool ne)` from > > ucomiss xmm0, xmm0 > jp label > jne label > > into > > ucomiss xmm0, xmm0 > jp label > > 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as `x > == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of > fixing the flags, such as > > xorlecx, ecx > ucomiss xmm0, xmm1 > jnp done > pushf > andq[rsp], 0xff2b > popf > done: > movleax, 1 > cmovel eax, ecx > > The patch changes this sequence into > > xorlecx, ecx > ucomiss xmm0, xmm1 > movleax, 1 > cmovpl eax, ecx > cmovnel eax, ecx > > 3, The patch also changes the pattern of `isInfinite` to be more optimised by > using `Math.abs` to reduce 1 comparison and compares the result with > `MAX_VALUE` since `>` is more optimised than `==` for floating-point types. > > The benchmark results are as follow: > > Before: > Benchmark Mode Cnt Score Error Units > FPComparison.equalDouble avgt5 2876.242 ± 58.875 ns/op > FPComparison.equalFloatavgt5 3062.430 ± 31.371 ns/op > FPComparison.isFiniteDoubleavgt5 475.749 ± 19.027 ns/op > FPComparison.isFiniteFloat avgt5 506.525 ± 14.417 ns/op > FPComparison.isInfiniteDouble avgt5 1232.800 ± 31.677 ns/op > FPComparison.isInfiniteFloat avgt5 1234.708 ± 70.239 ns/op > FPComparison.isNanDouble avgt5 2255.847 ± 7.238 ns/op > FPComparison.isNanFloatavgt5 2567.044 ± 36.078 ns/op > > After: > Benchmark Mode Cnt Score Error Units > FPComparison.equalDouble avgt5 594.636 ± 8.922 ns/op > FPComparison.equalFloatavgt5 663.849 ± 3.656 ns/op > FPComparison.isFiniteDoubleavgt5 518.309 ± 107.352 ns/op > FPComparison.isFiniteFloat avgt5 515.576 ± 14.669 ns/op > FPComparison.isInfiniteDouble avgt5 621.185 ± 11.935 ns/op > FPComparison.isInfiniteFloat avgt5 623.566 ± 15.206 ns/op > FPComparison.isNanDouble avgt5 400.124 ± 0.762 ns/op > FPComparison.isNanFloatavgt5 546.486 ± 1.509 ns/op > > Thank you very much. The changes to `Float` and `Double` look good. I don't think we need additional tests, see test/jdk/java/lang/Math/IeeeRecommendedTests.java. At first i thought we no longer need PR #8459 but it seems both PRs are complimentary, albeit PR #8459 has more modest performance gains for the intrinsics. - PR: https://git.openjdk.java.net/jdk/pull/8525
Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne
On Wed, 4 May 2022 01:59:17 GMT, Quan Anh Mai wrote: > Hi, > > This patch optimises the matching rules for floating-point comparison with > respects to eq/ne on x86-64 > > 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` > is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which > improves the sequence of `If (CmpF x x) (Bool ne)` from > > ucomiss xmm0, xmm0 > jp label > jne label > > into > > ucomiss xmm0, xmm0 > jp label > > 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as `x > == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of > fixing the flags, such as > > xorlecx, ecx > ucomiss xmm0, xmm1 > jnp done > pushf > andq[rsp], 0xff2b > popf > done: > movleax, 1 > cmovel eax, ecx > > The patch changes this sequence into > > xorlecx, ecx > ucomiss xmm0, xmm1 > movleax, 1 > cmovpl eax, ecx > cmovnel eax, ecx > > 3, The patch also changes the pattern of `isInfinite` to be more optimised by > using `Math.abs` to reduce 1 comparison and compares the result with > `MAX_VALUE` since `>` is more optimised than `==` for floating-point types. > > The benchmark results are as follow: > > Before: > Benchmark Mode Cnt Score Error Units > FPComparison.equalDouble avgt5 2876.242 ± 58.875 ns/op > FPComparison.equalFloatavgt5 3062.430 ± 31.371 ns/op > FPComparison.isFiniteDoubleavgt5 475.749 ± 19.027 ns/op > FPComparison.isFiniteFloat avgt5 506.525 ± 14.417 ns/op > FPComparison.isInfiniteDouble avgt5 1232.800 ± 31.677 ns/op > FPComparison.isInfiniteFloat avgt5 1234.708 ± 70.239 ns/op > FPComparison.isNanDouble avgt5 2255.847 ± 7.238 ns/op > FPComparison.isNanFloatavgt5 2567.044 ± 36.078 ns/op > > After: > Benchmark Mode Cnt Score Error Units > FPComparison.equalDouble avgt5 594.636 ± 8.922 ns/op > FPComparison.equalFloatavgt5 663.849 ± 3.656 ns/op > FPComparison.isFiniteDoubleavgt5 518.309 ± 107.352 ns/op > FPComparison.isFiniteFloat avgt5 515.576 ± 14.669 ns/op > FPComparison.isInfiniteDouble avgt5 621.185 ± 11.935 ns/op > FPComparison.isInfiniteFloat avgt5 623.566 ± 15.206 ns/op > FPComparison.isNanDouble avgt5 400.124 ± 0.762 ns/op > FPComparison.isNanFloatavgt5 546.486 ± 1.509 ns/op > > Thank you very much. Thank you for including tests. But you need additional other float compare (not just `ea` `ne`) tests since you removed some of `Cmp` instructions. You need approval from core libs for Java methods changes (it affects all platforms). And we will get intrinsics for them soon (I think): #8459. Please add comments to `cmpOpU*` operands explaining changes in them. You did not explain removal of some float compare instructions. src/hotspot/cpu/x86/x86_64.ad line 6998: > 6996: ins_encode %{ > 6997: __ cmovl(Assembler::parity, $dst$$Register, $src$$Register); > 6998: __ cmovl(Assembler::notEqual, $dst$$Register, $src$$Register); Should this be `equal`? - Changes requested by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/8525