Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v3]

2022-05-23 Thread Quan Anh Mai
On Sat, 21 May 2022 10:31:25 GMT, Quan Anh Mai  wrote:

>> Hi,
>> 
>> This patch optimises the matching rules for floating-point comparison with 
>> respects to eq/ne on x86-64
>> 
>> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` 
>> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which 
>> improves the sequence of `If (CmpF x x) (Bool ne)` from
>> 
>> ucomiss xmm0, xmm0
>> jp  label
>> jne label
>> 
>> into
>> 
>> ucomiss xmm0, xmm0
>> jp  label
>> 
>> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as 
>> `x == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of 
>> fixing the flags, such as
>> 
>> xorlecx, ecx
>> ucomiss xmm0, xmm1
>> jnp done
>> pushf
>> andq[rsp], 0xff2b
>> popf
>> done:
>> movleax, 1
>> cmovel  eax, ecx
>> 
>> The patch changes this sequence into
>> 
>> xorlecx, ecx
>> ucomiss xmm0, xmm1
>> movleax, 1
>> cmovpl  eax, ecx
>> cmovnel eax, ecx
>> 
>> 3, The patch also changes the pattern of `isInfinite` to be more optimised 
>> by using `Math.abs` to reduce 1 comparison and compares the result with 
>> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types.
>> 
>> The benchmark results are as follow:
>> 
>>  Before  
>> After
>> Benchmark  Mode  Cnt Score ErrorScore
>>  Error   Unit   Ratio
>> FPComparison.equalDouble   avgt5  2876.242 ±  58.875  594.636 ±  
>>  8.922  ns/op4.84
>> FPComparison.equalFloatavgt5  3062.430 ±  31.371  663.849 ±  
>>  3.656  ns/op4.61
>> FPComparison.isFiniteDoubleavgt5   475.749 ±  19.027  518.309 ± 
>> 107.352  ns/op0.92
>> FPComparison.isFiniteFloat avgt5   506.525 ±  14.417  515.576 ±  
>> 14.669  ns/op0.98
>> FPComparison.isInfiniteDouble  avgt5  1232.800 ±  31.677  621.185 ±  
>> 11.935  ns/op1.98
>> FPComparison.isInfiniteFloat   avgt5  1234.708 ±  70.239  623.566 ±  
>> 15.206  ns/op1.98
>> FPComparison.isNanDouble   avgt5  2255.847 ±   7.238  400.124 ±  
>>  0.762  ns/op5.64
>> FPComparison.isNanFloatavgt5  2567.044 ±  36.078  546.486 ±  
>>  1.509  ns/op4.70
>> 
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   comments

Thank you very much for your reviews and testing.

-

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v3]

2022-05-23 Thread Sandhya Viswanathan
On Sat, 21 May 2022 10:31:25 GMT, Quan Anh Mai  wrote:

>> Hi,
>> 
>> This patch optimises the matching rules for floating-point comparison with 
>> respects to eq/ne on x86-64
>> 
>> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` 
>> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which 
>> improves the sequence of `If (CmpF x x) (Bool ne)` from
>> 
>> ucomiss xmm0, xmm0
>> jp  label
>> jne label
>> 
>> into
>> 
>> ucomiss xmm0, xmm0
>> jp  label
>> 
>> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as 
>> `x == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of 
>> fixing the flags, such as
>> 
>> xorlecx, ecx
>> ucomiss xmm0, xmm1
>> jnp done
>> pushf
>> andq[rsp], 0xff2b
>> popf
>> done:
>> movleax, 1
>> cmovel  eax, ecx
>> 
>> The patch changes this sequence into
>> 
>> xorlecx, ecx
>> ucomiss xmm0, xmm1
>> movleax, 1
>> cmovpl  eax, ecx
>> cmovnel eax, ecx
>> 
>> 3, The patch also changes the pattern of `isInfinite` to be more optimised 
>> by using `Math.abs` to reduce 1 comparison and compares the result with 
>> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types.
>> 
>> The benchmark results are as follow:
>> 
>>  Before  
>> After
>> Benchmark  Mode  Cnt Score ErrorScore
>>  Error   Unit   Ratio
>> FPComparison.equalDouble   avgt5  2876.242 ±  58.875  594.636 ±  
>>  8.922  ns/op4.84
>> FPComparison.equalFloatavgt5  3062.430 ±  31.371  663.849 ±  
>>  3.656  ns/op4.61
>> FPComparison.isFiniteDoubleavgt5   475.749 ±  19.027  518.309 ± 
>> 107.352  ns/op0.92
>> FPComparison.isFiniteFloat avgt5   506.525 ±  14.417  515.576 ±  
>> 14.669  ns/op0.98
>> FPComparison.isInfiniteDouble  avgt5  1232.800 ±  31.677  621.185 ±  
>> 11.935  ns/op1.98
>> FPComparison.isInfiniteFloat   avgt5  1234.708 ±  70.239  623.566 ±  
>> 15.206  ns/op1.98
>> FPComparison.isNanDouble   avgt5  2255.847 ±   7.238  400.124 ±  
>>  0.762  ns/op5.64
>> FPComparison.isNanFloatavgt5  2567.044 ±  36.078  546.486 ±  
>>  1.509  ns/op4.70
>> 
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   comments

Marked as reviewed by sviswanathan (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v3]

2022-05-21 Thread Vladimir Kozlov
On Sat, 21 May 2022 15:30:29 GMT, Vladimir Kozlov  wrote:

> I started our testing. Please, wait results.

Testing passed clean.

-

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v3]

2022-05-21 Thread Vladimir Kozlov
On Sat, 21 May 2022 10:31:25 GMT, Quan Anh Mai  wrote:

>> Hi,
>> 
>> This patch optimises the matching rules for floating-point comparison with 
>> respects to eq/ne on x86-64
>> 
>> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` 
>> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which 
>> improves the sequence of `If (CmpF x x) (Bool ne)` from
>> 
>> ucomiss xmm0, xmm0
>> jp  label
>> jne label
>> 
>> into
>> 
>> ucomiss xmm0, xmm0
>> jp  label
>> 
>> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as 
>> `x == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of 
>> fixing the flags, such as
>> 
>> xorlecx, ecx
>> ucomiss xmm0, xmm1
>> jnp done
>> pushf
>> andq[rsp], 0xff2b
>> popf
>> done:
>> movleax, 1
>> cmovel  eax, ecx
>> 
>> The patch changes this sequence into
>> 
>> xorlecx, ecx
>> ucomiss xmm0, xmm1
>> movleax, 1
>> cmovpl  eax, ecx
>> cmovnel eax, ecx
>> 
>> 3, The patch also changes the pattern of `isInfinite` to be more optimised 
>> by using `Math.abs` to reduce 1 comparison and compares the result with 
>> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types.
>> 
>> The benchmark results are as follow:
>> 
>>  Before  
>> After
>> Benchmark  Mode  Cnt Score ErrorScore
>>  Error   Unit   Ratio
>> FPComparison.equalDouble   avgt5  2876.242 ±  58.875  594.636 ±  
>>  8.922  ns/op4.84
>> FPComparison.equalFloatavgt5  3062.430 ±  31.371  663.849 ±  
>>  3.656  ns/op4.61
>> FPComparison.isFiniteDoubleavgt5   475.749 ±  19.027  518.309 ± 
>> 107.352  ns/op0.92
>> FPComparison.isFiniteFloat avgt5   506.525 ±  14.417  515.576 ±  
>> 14.669  ns/op0.98
>> FPComparison.isInfiniteDouble  avgt5  1232.800 ±  31.677  621.185 ±  
>> 11.935  ns/op1.98
>> FPComparison.isInfiniteFloat   avgt5  1234.708 ±  70.239  623.566 ±  
>> 15.206  ns/op1.98
>> FPComparison.isNanDouble   avgt5  2255.847 ±   7.238  400.124 ±  
>>  0.762  ns/op5.64
>> FPComparison.isNanFloatavgt5  2567.044 ±  36.078  546.486 ±  
>>  1.509  ns/op4.70
>> 
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   comments

Thank you for trying my suggestion and new comments. Approved.

I started our testing. Please, wait results.

-

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v3]

2022-05-21 Thread Quan Anh Mai
On Wed, 4 May 2022 23:27:45 GMT, Vladimir Kozlov  wrote:

>> The changes to `Float` and `Double` look good. I don't think we need 
>> additional tests, see test/jdk/java/lang/Math/IeeeRecommendedTests.java.
>> 
>> At first i thought we no longer need PR #8459 but it seems both PRs are 
>> complimentary, albeit PR #8459 has more modest performance gains for the 
>> intrinsics.
>
>> The changes to `Float` and `Double` look good. I don't think we need 
>> additional tests, see test/jdk/java/lang/Math/IeeeRecommendedTests.java.
> 
> Thank you, Paul for pointing the test. It means we need to run tier4 (which 
> runs these tests with -Xcomp) to make sure methods are compiled by C2.

@vnkozlov  I have added comments to describe the changes in `cmpOpUCF` and the 
reasons behind the `cmov_regUCF2_eq` match rules. Using `expand` broke the 
build with `Syntax Error: :For expand in cmovI_regUCF2_eq to work, parameter 
declaration order in cmovI_regUCF2_ne must follow matchrule`.

@sviswa7  (x != y) ? a : b can be calculated using pseudocode as follow:

res = (!ZF || PF) ? a : b
= !ZF ? a : (PF ? a : b)

which can be calculated using

cmovp  rb, ra // rb1 = PF ? ra : rb
cmovne rb, ra // rb2 = !ZF ? ra : rb1 = !ZF ? ra : (PF ? ra : rb)

Furthermore, since `(x == y) == !(x != y)`, we have `((x == y) ? a : b) == ((x 
!= y) ? b : a)`, which explains the implementation of `cmov_regUCF2_eq`.

@vamsi-parasa Thanks a lot for your suggestion, I have modified the PR 
description as you say.

-

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v3]

2022-05-21 Thread Quan Anh Mai
> Hi,
> 
> This patch optimises the matching rules for floating-point comparison with 
> respects to eq/ne on x86-64
> 
> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` 
> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which 
> improves the sequence of `If (CmpF x x) (Bool ne)` from
> 
> ucomiss xmm0, xmm0
> jp  label
> jne label
> 
> into
> 
> ucomiss xmm0, xmm0
> jp  label
> 
> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as `x 
> == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of 
> fixing the flags, such as
> 
> xorlecx, ecx
> ucomiss xmm0, xmm1
> jnp done
> pushf
> andq[rsp], 0xff2b
> popf
> done:
> movleax, 1
> cmovel  eax, ecx
> 
> The patch changes this sequence into
> 
> xorlecx, ecx
> ucomiss xmm0, xmm1
> movleax, 1
> cmovpl  eax, ecx
> cmovnel eax, ecx
> 
> 3, The patch also changes the pattern of `isInfinite` to be more optimised by 
> using `Math.abs` to reduce 1 comparison and compares the result with 
> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types.
> 
> The benchmark results are as follow:
> 
>  Before  After
> Benchmark  Mode  Cnt Score ErrorScore 
> Error   Unit   Ratio
> FPComparison.equalDouble   avgt5  2876.242 ±  58.875  594.636 ±   
> 8.922  ns/op4.84
> FPComparison.equalFloatavgt5  3062.430 ±  31.371  663.849 ±   
> 3.656  ns/op4.61
> FPComparison.isFiniteDoubleavgt5   475.749 ±  19.027  518.309 ± 
> 107.352  ns/op0.92
> FPComparison.isFiniteFloat avgt5   506.525 ±  14.417  515.576 ±  
> 14.669  ns/op0.98
> FPComparison.isInfiniteDouble  avgt5  1232.800 ±  31.677  621.185 ±  
> 11.935  ns/op1.98
> FPComparison.isInfiniteFloat   avgt5  1234.708 ±  70.239  623.566 ±  
> 15.206  ns/op1.98
> FPComparison.isNanDouble   avgt5  2255.847 ±   7.238  400.124 ±   
> 0.762  ns/op5.64
> FPComparison.isNanFloatavgt5  2567.044 ±  36.078  546.486 ±   
> 1.509  ns/op4.70
> 
> Thank you very much.

Quan Anh Mai has updated the pull request incrementally with one additional 
commit since the last revision:

  comments

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8525/files
  - new: https://git.openjdk.java.net/jdk/pull/8525/files/ba93dcf2..7fcfe4a3

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8525&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8525&range=01-02

  Stats: 11 lines in 1 file changed: 10 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8525.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8525/head:pull/8525

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v2]

2022-05-20 Thread Vladimir Kozlov
On Fri, 20 May 2022 22:22:43 GMT, Sandhya Viswanathan 
 wrote:

>> I see that you swapped `src, dst` in `match()` but `format` is sill 
>> incorrect and the code is confusing.
>
> I agree with @vnkozlov that this needs explanation. Could you please add 
> comments here with IR and example code generated for both the eq case and ne 
> case? You have some explanation in the PR description but not in the code. 
> The description needs to be in the code as well for maintenance.

Right, I missed that. Then you can use `expand %{` to avoid duplication (keep 
format).

-

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v2]

2022-05-20 Thread Sandhya Viswanathan
On Wed, 4 May 2022 23:16:41 GMT, Vladimir Kozlov  wrote:

>> src/hotspot/cpu/x86/x86_64.ad line 6998:
>> 
>>> 6996:   ins_encode %{
>>> 6997: __ cmovl(Assembler::parity, $dst$$Register, $src$$Register);
>>> 6998: __ cmovl(Assembler::notEqual, $dst$$Register, $src$$Register);
>> 
>> Should this be `equal`?
>
> I see that you swapped `src, dst` in `match()` but `format` is sill incorrect 
> and the code is confusing.

I agree with @vnkozlov that this needs explanation. Could you please add 
comments here with IR and example code generated for both the eq case and ne 
case? You have some explanation in the PR description but not in the code. The 
description needs to be in the code as well for maintenance.

-

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne

2022-05-20 Thread Sandhya Viswanathan
On Wed, 18 May 2022 14:59:33 GMT, Quan Anh Mai  wrote:

>> Hi,
>> 
>> This patch optimises the matching rules for floating-point comparison with 
>> respects to eq/ne on x86-64
>> 
>> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` 
>> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which 
>> improves the sequence of `If (CmpF x x) (Bool ne)` from
>> 
>> ucomiss xmm0, xmm0
>> jp  label
>> jne label
>> 
>> into
>> 
>> ucomiss xmm0, xmm0
>> jp  label
>> 
>> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as 
>> `x == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of 
>> fixing the flags, such as
>> 
>> xorlecx, ecx
>> ucomiss xmm0, xmm1
>> jnp done
>> pushf
>> andq[rsp], 0xff2b
>> popf
>> done:
>> movleax, 1
>> cmovel  eax, ecx
>> 
>> The patch changes this sequence into
>> 
>> xorlecx, ecx
>> ucomiss xmm0, xmm1
>> movleax, 1
>> cmovpl  eax, ecx
>> cmovnel eax, ecx
>> 
>> 3, The patch also changes the pattern of `isInfinite` to be more optimised 
>> by using `Math.abs` to reduce 1 comparison and compares the result with 
>> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types.
>> 
>> The benchmark results are as follow:
>> 
>> Before:
>> Benchmark  Mode  Cnt Score Error  Units
>> FPComparison.equalDouble   avgt5  2876.242 ±  58.875  ns/op
>> FPComparison.equalFloatavgt5  3062.430 ±  31.371  ns/op
>> FPComparison.isFiniteDoubleavgt5   475.749 ±  19.027  ns/op
>> FPComparison.isFiniteFloat avgt5   506.525 ±  14.417  ns/op
>> FPComparison.isInfiniteDouble  avgt5  1232.800 ±  31.677  ns/op
>> FPComparison.isInfiniteFloat   avgt5  1234.708 ±  70.239  ns/op
>> FPComparison.isNanDouble   avgt5  2255.847 ±   7.238  ns/op
>> FPComparison.isNanFloatavgt5  2567.044 ±  36.078  ns/op
>> 
>> After:
>> Benchmark  Mode  Cnt Score Error  Units
>> FPComparison.equalDouble   avgt5   594.636 ±   8.922  ns/op
>> FPComparison.equalFloatavgt5   663.849 ±   3.656  ns/op
>> FPComparison.isFiniteDoubleavgt5   518.309 ± 107.352  ns/op
>> FPComparison.isFiniteFloat avgt5   515.576 ±  14.669  ns/op
>> FPComparison.isInfiniteDouble  avgt5   621.185 ±  11.935  ns/op
>> FPComparison.isInfiniteFloat   avgt5   623.566 ±  15.206  ns/op
>> FPComparison.isNanDouble   avgt5   400.124 ±   0.762  ns/op
>> FPComparison.isNanFloatavgt5   546.486 ±   1.509  ns/op
>> 
>> Thank you very much.
>
> I have reverted the changes to `java.lang.Float` and `java.lang.Double` to 
> not interfere with the intrinsic PR. More tests are added to cover all cases 
> regarding floating-point comparison of compiled code.
> 
> The rules for fp comparison that output the result to `rFlagRegsU` are 
> expensive and should be avoided. As a result, I removed the shortcut rules 
> with memory or constant operands to reduce the number of match rules. Only 
> the basic rules are kept.
> 
> Thanks.

@merykitty Very nice work! The patch looks good to me.

@merykitty Very nice work! The patch looks good to me.

-

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v2]

2022-05-19 Thread Srinivas Vamsi Parasa
On Wed, 18 May 2022 14:59:49 GMT, Quan Anh Mai  wrote:

>> Hi,
>> 
>> This patch optimises the matching rules for floating-point comparison with 
>> respects to eq/ne on x86-64
>> 
>> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` 
>> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which 
>> improves the sequence of `If (CmpF x x) (Bool ne)` from
>> 
>> ucomiss xmm0, xmm0
>> jp  label
>> jne label
>> 
>> into
>> 
>> ucomiss xmm0, xmm0
>> jp  label
>> 
>> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as 
>> `x == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of 
>> fixing the flags, such as
>> 
>> xorlecx, ecx
>> ucomiss xmm0, xmm1
>> jnp done
>> pushf
>> andq[rsp], 0xff2b
>> popf
>> done:
>> movleax, 1
>> cmovel  eax, ecx
>> 
>> The patch changes this sequence into
>> 
>> xorlecx, ecx
>> ucomiss xmm0, xmm1
>> movleax, 1
>> cmovpl  eax, ecx
>> cmovnel eax, ecx
>> 
>> 3, The patch also changes the pattern of `isInfinite` to be more optimised 
>> by using `Math.abs` to reduce 1 comparison and compares the result with 
>> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types.
>> 
>> The benchmark results are as follow:
>> 
>> Before:
>> Benchmark  Mode  Cnt Score Error  Units
>> FPComparison.equalDouble   avgt5  2876.242 ±  58.875  ns/op
>> FPComparison.equalFloatavgt5  3062.430 ±  31.371  ns/op
>> FPComparison.isFiniteDoubleavgt5   475.749 ±  19.027  ns/op
>> FPComparison.isFiniteFloat avgt5   506.525 ±  14.417  ns/op
>> FPComparison.isInfiniteDouble  avgt5  1232.800 ±  31.677  ns/op
>> FPComparison.isInfiniteFloat   avgt5  1234.708 ±  70.239  ns/op
>> FPComparison.isNanDouble   avgt5  2255.847 ±   7.238  ns/op
>> FPComparison.isNanFloatavgt5  2567.044 ±  36.078  ns/op
>> 
>> After:
>> Benchmark  Mode  Cnt Score Error  Units
>> FPComparison.equalDouble   avgt5   594.636 ±   8.922  ns/op
>> FPComparison.equalFloatavgt5   663.849 ±   3.656  ns/op
>> FPComparison.isFiniteDoubleavgt5   518.309 ± 107.352  ns/op
>> FPComparison.isFiniteFloat avgt5   515.576 ±  14.669  ns/op
>> FPComparison.isInfiniteDouble  avgt5   621.185 ±  11.935  ns/op
>> FPComparison.isInfiniteFloat   avgt5   623.566 ±  15.206  ns/op
>> FPComparison.isNanDouble   avgt5   400.124 ±   0.762  ns/op
>> FPComparison.isNanFloatavgt5   546.486 ±   1.509  ns/op
>> 
>> Thank you very much.
>
> Quan Anh Mai has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains eight additional 
> commits since the last revision:
> 
>  - incidental ws
>  - add tests
>  - Merge branch 'master' into fpcompare
>  - fix tests
>  - test
>  - improve infinity
>  - remove expensive rules
>  - improve fp comparison

Could you pls show the performance delta with the baseline and after the patch? 
Otherwise, people reviewing this PR have to manually compute how much 
improvement is obtained. For example, `FPComparison.isNanFloat` is showing 
`4.7x` improvement. Kindly fill the delta column for rest of the data points. 

Instead of two separate tables, the suggested table format for each row would 
be:
` ,  , , `

-

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v2]

2022-05-18 Thread Quan Anh Mai
On Wed, 4 May 2022 23:16:41 GMT, Vladimir Kozlov  wrote:

>> src/hotspot/cpu/x86/x86_64.ad line 6998:
>> 
>>> 6996:   ins_encode %{
>>> 6997: __ cmovl(Assembler::parity, $dst$$Register, $src$$Register);
>>> 6998: __ cmovl(Assembler::notEqual, $dst$$Register, $src$$Register);
>> 
>> Should this be `equal`?
>
> I see that you swapped `src, dst` in `match()` but `format` is sill incorrect 
> and the code is confusing.

This is a flipping the sense of the test by flipping the input of the `CMove`, 
so this is essentially the same as in the above rule. Thanks.

-

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne

2022-05-18 Thread Quan Anh Mai
On Wed, 4 May 2022 01:59:17 GMT, Quan Anh Mai  wrote:

> Hi,
> 
> This patch optimises the matching rules for floating-point comparison with 
> respects to eq/ne on x86-64
> 
> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` 
> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which 
> improves the sequence of `If (CmpF x x) (Bool ne)` from
> 
> ucomiss xmm0, xmm0
> jp  label
> jne label
> 
> into
> 
> ucomiss xmm0, xmm0
> jp  label
> 
> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as `x 
> == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of 
> fixing the flags, such as
> 
> xorlecx, ecx
> ucomiss xmm0, xmm1
> jnp done
> pushf
> andq[rsp], 0xff2b
> popf
> done:
> movleax, 1
> cmovel  eax, ecx
> 
> The patch changes this sequence into
> 
> xorlecx, ecx
> ucomiss xmm0, xmm1
> movleax, 1
> cmovpl  eax, ecx
> cmovnel eax, ecx
> 
> 3, The patch also changes the pattern of `isInfinite` to be more optimised by 
> using `Math.abs` to reduce 1 comparison and compares the result with 
> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types.
> 
> The benchmark results are as follow:
> 
> Before:
> Benchmark  Mode  Cnt Score Error  Units
> FPComparison.equalDouble   avgt5  2876.242 ±  58.875  ns/op
> FPComparison.equalFloatavgt5  3062.430 ±  31.371  ns/op
> FPComparison.isFiniteDoubleavgt5   475.749 ±  19.027  ns/op
> FPComparison.isFiniteFloat avgt5   506.525 ±  14.417  ns/op
> FPComparison.isInfiniteDouble  avgt5  1232.800 ±  31.677  ns/op
> FPComparison.isInfiniteFloat   avgt5  1234.708 ±  70.239  ns/op
> FPComparison.isNanDouble   avgt5  2255.847 ±   7.238  ns/op
> FPComparison.isNanFloatavgt5  2567.044 ±  36.078  ns/op
> 
> After:
> Benchmark  Mode  Cnt Score Error  Units
> FPComparison.equalDouble   avgt5   594.636 ±   8.922  ns/op
> FPComparison.equalFloatavgt5   663.849 ±   3.656  ns/op
> FPComparison.isFiniteDoubleavgt5   518.309 ± 107.352  ns/op
> FPComparison.isFiniteFloat avgt5   515.576 ±  14.669  ns/op
> FPComparison.isInfiniteDouble  avgt5   621.185 ±  11.935  ns/op
> FPComparison.isInfiniteFloat   avgt5   623.566 ±  15.206  ns/op
> FPComparison.isNanDouble   avgt5   400.124 ±   0.762  ns/op
> FPComparison.isNanFloatavgt5   546.486 ±   1.509  ns/op
> 
> Thank you very much.

I have reverted the changes to `java.lang.Float` and `java.lang.Double` to not 
interfere with the intrinsic PR. More tests are added to cover all cases 
regarding floating-point comparison of compiled code.

The rules for fp comparison that output the result to `rFlagRegsU` are 
expensive and should be avoided. As a result, I removed the shortcut rules with 
memory or constant operands to reduce the number of match rules. Only the basic 
rules are kept.

Thanks.

-

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne [v2]

2022-05-18 Thread Quan Anh Mai
> Hi,
> 
> This patch optimises the matching rules for floating-point comparison with 
> respects to eq/ne on x86-64
> 
> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` 
> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which 
> improves the sequence of `If (CmpF x x) (Bool ne)` from
> 
> ucomiss xmm0, xmm0
> jp  label
> jne label
> 
> into
> 
> ucomiss xmm0, xmm0
> jp  label
> 
> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as `x 
> == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of 
> fixing the flags, such as
> 
> xorlecx, ecx
> ucomiss xmm0, xmm1
> jnp done
> pushf
> andq[rsp], 0xff2b
> popf
> done:
> movleax, 1
> cmovel  eax, ecx
> 
> The patch changes this sequence into
> 
> xorlecx, ecx
> ucomiss xmm0, xmm1
> movleax, 1
> cmovpl  eax, ecx
> cmovnel eax, ecx
> 
> 3, The patch also changes the pattern of `isInfinite` to be more optimised by 
> using `Math.abs` to reduce 1 comparison and compares the result with 
> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types.
> 
> The benchmark results are as follow:
> 
> Before:
> Benchmark  Mode  Cnt Score Error  Units
> FPComparison.equalDouble   avgt5  2876.242 ±  58.875  ns/op
> FPComparison.equalFloatavgt5  3062.430 ±  31.371  ns/op
> FPComparison.isFiniteDoubleavgt5   475.749 ±  19.027  ns/op
> FPComparison.isFiniteFloat avgt5   506.525 ±  14.417  ns/op
> FPComparison.isInfiniteDouble  avgt5  1232.800 ±  31.677  ns/op
> FPComparison.isInfiniteFloat   avgt5  1234.708 ±  70.239  ns/op
> FPComparison.isNanDouble   avgt5  2255.847 ±   7.238  ns/op
> FPComparison.isNanFloatavgt5  2567.044 ±  36.078  ns/op
> 
> After:
> Benchmark  Mode  Cnt Score Error  Units
> FPComparison.equalDouble   avgt5   594.636 ±   8.922  ns/op
> FPComparison.equalFloatavgt5   663.849 ±   3.656  ns/op
> FPComparison.isFiniteDoubleavgt5   518.309 ± 107.352  ns/op
> FPComparison.isFiniteFloat avgt5   515.576 ±  14.669  ns/op
> FPComparison.isInfiniteDouble  avgt5   621.185 ±  11.935  ns/op
> FPComparison.isInfiniteFloat   avgt5   623.566 ±  15.206  ns/op
> FPComparison.isNanDouble   avgt5   400.124 ±   0.762  ns/op
> FPComparison.isNanFloatavgt5   546.486 ±   1.509  ns/op
> 
> Thank you very much.

Quan Anh Mai has updated the pull request with a new target base due to a merge 
or a rebase. The incremental webrev excludes the unrelated changes brought in 
by the merge/rebase. The pull request contains eight additional commits since 
the last revision:

 - incidental ws
 - add tests
 - Merge branch 'master' into fpcompare
 - fix tests
 - test
 - improve infinity
 - remove expensive rules
 - improve fp comparison

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8525/files
  - new: https://git.openjdk.java.net/jdk/pull/8525/files/b64e04b5..ba93dcf2

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8525&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8525&range=00-01

  Stats: 210103 lines in 2627 files changed: 159508 ins; 36691 del; 13904 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8525.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8525/head:pull/8525

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne

2022-05-04 Thread Vladimir Kozlov
On Wed, 4 May 2022 22:27:48 GMT, Paul Sandoz  wrote:

> The changes to `Float` and `Double` look good. I don't think we need 
> additional tests, see test/jdk/java/lang/Math/IeeeRecommendedTests.java.

Thank you, Paul for pointing the test. It means we need to run tier4 (which 
runs these tests with -Xcomp) to make sure methods are compiled by C2.

-

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne

2022-05-04 Thread Vladimir Kozlov
On Wed, 4 May 2022 19:32:41 GMT, Vladimir Kozlov  wrote:

>> Hi,
>> 
>> This patch optimises the matching rules for floating-point comparison with 
>> respects to eq/ne on x86-64
>> 
>> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` 
>> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which 
>> improves the sequence of `If (CmpF x x) (Bool ne)` from
>> 
>> ucomiss xmm0, xmm0
>> jp  label
>> jne label
>> 
>> into
>> 
>> ucomiss xmm0, xmm0
>> jp  label
>> 
>> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as 
>> `x == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of 
>> fixing the flags, such as
>> 
>> xorlecx, ecx
>> ucomiss xmm0, xmm1
>> jnp done
>> pushf
>> andq[rsp], 0xff2b
>> popf
>> done:
>> movleax, 1
>> cmovel  eax, ecx
>> 
>> The patch changes this sequence into
>> 
>> xorlecx, ecx
>> ucomiss xmm0, xmm1
>> movleax, 1
>> cmovpl  eax, ecx
>> cmovnel eax, ecx
>> 
>> 3, The patch also changes the pattern of `isInfinite` to be more optimised 
>> by using `Math.abs` to reduce 1 comparison and compares the result with 
>> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types.
>> 
>> The benchmark results are as follow:
>> 
>> Before:
>> Benchmark  Mode  Cnt Score Error  Units
>> FPComparison.equalDouble   avgt5  2876.242 ±  58.875  ns/op
>> FPComparison.equalFloatavgt5  3062.430 ±  31.371  ns/op
>> FPComparison.isFiniteDoubleavgt5   475.749 ±  19.027  ns/op
>> FPComparison.isFiniteFloat avgt5   506.525 ±  14.417  ns/op
>> FPComparison.isInfiniteDouble  avgt5  1232.800 ±  31.677  ns/op
>> FPComparison.isInfiniteFloat   avgt5  1234.708 ±  70.239  ns/op
>> FPComparison.isNanDouble   avgt5  2255.847 ±   7.238  ns/op
>> FPComparison.isNanFloatavgt5  2567.044 ±  36.078  ns/op
>> 
>> After:
>> Benchmark  Mode  Cnt Score Error  Units
>> FPComparison.equalDouble   avgt5   594.636 ±   8.922  ns/op
>> FPComparison.equalFloatavgt5   663.849 ±   3.656  ns/op
>> FPComparison.isFiniteDoubleavgt5   518.309 ± 107.352  ns/op
>> FPComparison.isFiniteFloat avgt5   515.576 ±  14.669  ns/op
>> FPComparison.isInfiniteDouble  avgt5   621.185 ±  11.935  ns/op
>> FPComparison.isInfiniteFloat   avgt5   623.566 ±  15.206  ns/op
>> FPComparison.isNanDouble   avgt5   400.124 ±   0.762  ns/op
>> FPComparison.isNanFloatavgt5   546.486 ±   1.509  ns/op
>> 
>> Thank you very much.
>
> src/hotspot/cpu/x86/x86_64.ad line 6998:
> 
>> 6996:   ins_encode %{
>> 6997: __ cmovl(Assembler::parity, $dst$$Register, $src$$Register);
>> 6998: __ cmovl(Assembler::notEqual, $dst$$Register, $src$$Register);
> 
> Should this be `equal`?

I see that you swapped `src, dst` in `match()` but `format` is sill incorrect 
and the code is confusing.

-

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne

2022-05-04 Thread Paul Sandoz
On Wed, 4 May 2022 01:59:17 GMT, Quan Anh Mai  wrote:

> Hi,
> 
> This patch optimises the matching rules for floating-point comparison with 
> respects to eq/ne on x86-64
> 
> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` 
> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which 
> improves the sequence of `If (CmpF x x) (Bool ne)` from
> 
> ucomiss xmm0, xmm0
> jp  label
> jne label
> 
> into
> 
> ucomiss xmm0, xmm0
> jp  label
> 
> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as `x 
> == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of 
> fixing the flags, such as
> 
> xorlecx, ecx
> ucomiss xmm0, xmm1
> jnp done
> pushf
> andq[rsp], 0xff2b
> popf
> done:
> movleax, 1
> cmovel  eax, ecx
> 
> The patch changes this sequence into
> 
> xorlecx, ecx
> ucomiss xmm0, xmm1
> movleax, 1
> cmovpl  eax, ecx
> cmovnel eax, ecx
> 
> 3, The patch also changes the pattern of `isInfinite` to be more optimised by 
> using `Math.abs` to reduce 1 comparison and compares the result with 
> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types.
> 
> The benchmark results are as follow:
> 
> Before:
> Benchmark  Mode  Cnt Score Error  Units
> FPComparison.equalDouble   avgt5  2876.242 ±  58.875  ns/op
> FPComparison.equalFloatavgt5  3062.430 ±  31.371  ns/op
> FPComparison.isFiniteDoubleavgt5   475.749 ±  19.027  ns/op
> FPComparison.isFiniteFloat avgt5   506.525 ±  14.417  ns/op
> FPComparison.isInfiniteDouble  avgt5  1232.800 ±  31.677  ns/op
> FPComparison.isInfiniteFloat   avgt5  1234.708 ±  70.239  ns/op
> FPComparison.isNanDouble   avgt5  2255.847 ±   7.238  ns/op
> FPComparison.isNanFloatavgt5  2567.044 ±  36.078  ns/op
> 
> After:
> Benchmark  Mode  Cnt Score Error  Units
> FPComparison.equalDouble   avgt5   594.636 ±   8.922  ns/op
> FPComparison.equalFloatavgt5   663.849 ±   3.656  ns/op
> FPComparison.isFiniteDoubleavgt5   518.309 ± 107.352  ns/op
> FPComparison.isFiniteFloat avgt5   515.576 ±  14.669  ns/op
> FPComparison.isInfiniteDouble  avgt5   621.185 ±  11.935  ns/op
> FPComparison.isInfiniteFloat   avgt5   623.566 ±  15.206  ns/op
> FPComparison.isNanDouble   avgt5   400.124 ±   0.762  ns/op
> FPComparison.isNanFloatavgt5   546.486 ±   1.509  ns/op
> 
> Thank you very much.

The changes to `Float` and `Double` look good. I don't think we need additional 
tests, see test/jdk/java/lang/Math/IeeeRecommendedTests.java.

At first i thought we no longer need PR #8459 but it seems both PRs are 
complimentary, albeit PR #8459 has more modest performance gains for the 
intrinsics.

-

PR: https://git.openjdk.java.net/jdk/pull/8525


Re: RFR: 8285973: x86_64: Improve fp comparison and cmove for eq/ne

2022-05-04 Thread Vladimir Kozlov
On Wed, 4 May 2022 01:59:17 GMT, Quan Anh Mai  wrote:

> Hi,
> 
> This patch optimises the matching rules for floating-point comparison with 
> respects to eq/ne on x86-64
> 
> 1, When the inputs of a comparison is the same (i.e `isNaN` patterns), `ZF` 
> is always set, so we don't need `cmpOpUCF2` for the eq/ne cases, which 
> improves the sequence of `If (CmpF x x) (Bool ne)` from
> 
> ucomiss xmm0, xmm0
> jp  label
> jne label
> 
> into
> 
> ucomiss xmm0, xmm0
> jp  label
> 
> 2, The move rules for `cmpOpUCF2` is missing, which makes patterns such as `x 
> == y ? 1 : 0` to fall back to `cmpOpU`, which have a really high cost of 
> fixing the flags, such as
> 
> xorlecx, ecx
> ucomiss xmm0, xmm1
> jnp done
> pushf
> andq[rsp], 0xff2b
> popf
> done:
> movleax, 1
> cmovel  eax, ecx
> 
> The patch changes this sequence into
> 
> xorlecx, ecx
> ucomiss xmm0, xmm1
> movleax, 1
> cmovpl  eax, ecx
> cmovnel eax, ecx
> 
> 3, The patch also changes the pattern of `isInfinite` to be more optimised by 
> using `Math.abs` to reduce 1 comparison and compares the result with 
> `MAX_VALUE` since `>` is more optimised than `==` for floating-point types.
> 
> The benchmark results are as follow:
> 
> Before:
> Benchmark  Mode  Cnt Score Error  Units
> FPComparison.equalDouble   avgt5  2876.242 ±  58.875  ns/op
> FPComparison.equalFloatavgt5  3062.430 ±  31.371  ns/op
> FPComparison.isFiniteDoubleavgt5   475.749 ±  19.027  ns/op
> FPComparison.isFiniteFloat avgt5   506.525 ±  14.417  ns/op
> FPComparison.isInfiniteDouble  avgt5  1232.800 ±  31.677  ns/op
> FPComparison.isInfiniteFloat   avgt5  1234.708 ±  70.239  ns/op
> FPComparison.isNanDouble   avgt5  2255.847 ±   7.238  ns/op
> FPComparison.isNanFloatavgt5  2567.044 ±  36.078  ns/op
> 
> After:
> Benchmark  Mode  Cnt Score Error  Units
> FPComparison.equalDouble   avgt5   594.636 ±   8.922  ns/op
> FPComparison.equalFloatavgt5   663.849 ±   3.656  ns/op
> FPComparison.isFiniteDoubleavgt5   518.309 ± 107.352  ns/op
> FPComparison.isFiniteFloat avgt5   515.576 ±  14.669  ns/op
> FPComparison.isInfiniteDouble  avgt5   621.185 ±  11.935  ns/op
> FPComparison.isInfiniteFloat   avgt5   623.566 ±  15.206  ns/op
> FPComparison.isNanDouble   avgt5   400.124 ±   0.762  ns/op
> FPComparison.isNanFloatavgt5   546.486 ±   1.509  ns/op
> 
> Thank you very much.

Thank you for including tests. But you need additional other float compare (not 
just `ea` `ne`) tests since you removed some of `Cmp` instructions.

You need approval from core libs for Java methods changes (it affects all 
platforms). And we will get intrinsics for them soon (I think): #8459.

Please add comments to `cmpOpU*` operands explaining changes in them.

You did not explain removal of some float compare instructions.

src/hotspot/cpu/x86/x86_64.ad line 6998:

> 6996:   ins_encode %{
> 6997: __ cmovl(Assembler::parity, $dst$$Register, $src$$Register);
> 6998: __ cmovl(Assembler::notEqual, $dst$$Register, $src$$Register);

Should this be `equal`?

-

Changes requested by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/8525