Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long [v2]

2022-03-30 Thread Vamsi Parasa
> Implements x86 intrinsics for compare() method in java.lang.Integer and 
> java.lang.Long.

Vamsi Parasa has updated the pull request with a new target base due to a merge 
or a rebase. The incremental webrev excludes the unrelated changes brought in 
by the merge/rebase. The pull request contains four additional commits since 
the last revision:

 - refactored x86_64.ad code to macro assembly routines
 - Merge branch 'master' of https://git.openjdk.java.net/jdk into cmp
 - add JMH benchmarks
 - 8283726: x86 intrinsics for compare method in Integer and Long

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/7975/files
  - new: https://git.openjdk.java.net/jdk/pull/7975/files/b0c3314d..79e4aa50

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=7975=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7975=00-01

  Stats: 5452 lines in 218 files changed: 3925 ins; 856 del; 671 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7975.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7975/head:pull/7975

PR: https://git.openjdk.java.net/jdk/pull/7975


Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long

2022-03-29 Thread Quan Anh Mai
On Tue, 29 Mar 2022 21:56:18 GMT, Vamsi Parasa  wrote:

>> This is both complicated and inefficient, I would suggest building the 
>> intrinsic in the IR graph so that the compiler can simplify 
>> `Integer.compareUnsigned(x, y) < 0` into `x u< y`. Thanks.
>
>> This is both complicated and inefficient, I would suggest building the 
>> intrinsic in the IR graph so that the compiler can simplify 
>> `Integer.compareUnsigned(x, y) < 0` into `x u< y`. Thanks.
> 
> Thank you for the suggestion! This intrinsic uses 1 cmp instruction instead 
> of two and shows ~10% improvement due to better branch prediction. Even 
> without the intrinsic, the compiler is currently able to reduce it to x u< y 
> but is still generating two cmp (unsigned) instructions as 
> Integer.compareUnsigned(x, y) is implemented as x u< y? -1 : (x ==y ? 0 : 1).

@vamsi-parasa  But normally the result of the `compare` methods is not used as 
a raw integer (it is useless to do so since the methods do not have any promise 
regarding the value of the result, only its sign). The idiom is to compare the 
result with 0, such as `Integer.compare(x, y) > 0`, the compiler can reduce 
this to `x > y` (last time I checked it does not do so but in principle this is 
possible). Your intrinsic prevents the compiler to do such transformations, 
hurting the performance of real programs.

> Even without the intrinsic, the compiler is currently able to reduce it to x 
> u< y

It is because the compiler can recognise the pattern `x + MIN_VALUE < y + 
MIN_VALUE` and transforms it into `x u< y`. This transformation is fragile 
however if one of the arguments is in the form `x + con`, in such cases 
constant propagation may lead to slight deviations from recognised patterns, 
defeat the transformations. As a result, it may be justifiable to have a 
dedicated intrinsic for that since unsigned comparisons are pretty basic 
operations that are needed for optimal range check performance.

Thanks.

-

PR: https://git.openjdk.java.net/jdk/pull/7975


Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long

2022-03-29 Thread Vamsi Parasa
On Tue, 29 Mar 2022 02:24:21 GMT, Jatin Bhateja  wrote:

>> Implements x86 intrinsics for compare() method in java.lang.Integer and 
>> java.lang.Long.
>
> src/hotspot/cpu/x86/x86_64.ad line 12107:
> 
>> 12105: instruct compareSignedI_rReg(rRegI dst, rRegI op1, rRegI op2, rRegI 
>> tmp, rFlagsReg cr)
>> 12106: %{
>> 12107:   match(Set dst (CompareSignedI op1 op2));
> 
> Please also include these patterns in x86_32.ad

Will update x86_32.ad as well.

> src/hotspot/cpu/x86/x86_64.ad line 12125:
> 
>> 12123: __ movl(tmp, 0);
>> 12124: __ bind(done);
>> 12125: __ movl(dest, tmp);
> 
> Please move this in macro-assembly routine.

Sure, will refactor it into a macro-assembly

> src/hotspot/cpu/x86/x86_64.ad line 12178:
> 
>> 12176: __ movl(tmp, 0);
>> 12177: __ bind(done);
>> 12178: __ movl(dest, tmp);
> 
> Please move this into a macro-assembly routine.

Sure, will do that and update it soon.

> src/hotspot/cpu/x86/x86_64.ad line 12204:
> 
>> 12202: __ movl(tmp, 0);
>> 12203: __ bind(done);
>> 12204: __ movl(dest, tmp);
> 
> Please move this into macro-assembly routine.

Sure, will do that and update it soon.

> src/hotspot/share/opto/comparenode.hpp line 67:
> 
>> 65: CompareUnsignedLNode(Node* in1, Node* in2) : CompareNode(in1, in2) {}
>> 66: virtual int Opcode() const;
>> 67: };
> 
> Intent here seems to be to enable further auto-vectorization of newly create 
> IR nodes.

Yes, that is the intention.

> test/micro/org/openjdk/bench/java/lang/CompareInteger.java line 78:
> 
>> 76: input2[i] = tmp;
>> 77: }
>> 78: }
> 
> Logic re-organization suggestion:- 
>  
> 
>  for (int i = 0 ; i < BUFFER_SIZE; i++) {
> input1[i] = rng.nextLong();
>  }
> 
>  if (mode.equals("equals") {
> GRADIANT = 0;
>  } else if (mode.equals("greaterThanEquals")) {
> GRADIANT = 1;
>  } else {
> assert mode.equals("lessThanEqual");
> GRADIANT = -1;
>  }
> 
>  for(int i = 0 ; i < BUFFER_SIZE; i++) {
> input2[i] = input1[i] + i*GRADIANT;
>  }

The suggested refactoring is definitely elegant but one rare possibility is 
overflow due to the addition/subtraction. The swap logic doesn't have that 
problem.

> test/micro/org/openjdk/bench/java/lang/CompareLong.java line 5:
> 
>> 3:  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>> 4:  *
>> 5:  * This code is free software; you can redistribute it and/or modify it
> 
> We can unify this benchmark along with integer compare micro.

Sure, will do the unification.

-

PR: https://git.openjdk.java.net/jdk/pull/7975


Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long

2022-03-29 Thread Vamsi Parasa
On Sun, 27 Mar 2022 06:57:58 GMT, Quan Anh Mai  wrote:

> This is both complicated and inefficient, I would suggest building the 
> intrinsic in the IR graph so that the compiler can simplify 
> `Integer.compareUnsigned(x, y) < 0` into `x u< y`. Thanks.

Thank you for the suggestion! This intrinsic uses 1 cmp instruction instead of 
two and shows ~10% improvement due to better branch prediction. Even without 
the intrinsic, the compiler is currently able to reduce it to x u< y but is 
still generating two cmp (unsigned) instructions as Integer.compareUnsigned(x, 
y) is implemented as x u< y? -1 : (x ==y ? 0 : 1).

-

PR: https://git.openjdk.java.net/jdk/pull/7975


Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long

2022-03-28 Thread Jatin Bhateja
On Sun, 27 Mar 2022 06:15:34 GMT, Vamsi Parasa  wrote:

> Implements x86 intrinsics for compare() method in java.lang.Integer and 
> java.lang.Long.

src/hotspot/cpu/x86/x86_64.ad line 12107:

> 12105: instruct compareSignedI_rReg(rRegI dst, rRegI op1, rRegI op2, rRegI 
> tmp, rFlagsReg cr)
> 12106: %{
> 12107:   match(Set dst (CompareSignedI op1 op2));

Please also include these patterns in x86_32.ad

src/hotspot/cpu/x86/x86_64.ad line 12125:

> 12123:  __ movl(tmp, 0);
> 12124: __ bind(done);
> 12125: __ movl(dest, tmp);

Please move this in macro-assembly routine.

src/hotspot/cpu/x86/x86_64.ad line 12178:

> 12176:  __ movl(tmp, 0);
> 12177: __ bind(done);
> 12178: __ movl(dest, tmp);

Please move this into a macro-assembly routine.

src/hotspot/cpu/x86/x86_64.ad line 12204:

> 12202:  __ movl(tmp, 0);
> 12203: __ bind(done);
> 12204: __ movl(dest, tmp);

Please move this into macro-assembly routine.

src/hotspot/share/classfile/vmIntrinsics.hpp line 239:

> 237:   do_intrinsic(_compareUnsigned_i,java_lang_Integer,  
> compare_unsigned_name,   int2_int_signature,   F_S)   \
> 238:do_name( compare_unsigned_name, 
> "compareUnsigned")   \
> 239:   do_intrinsic(_compareUnsigned_l,java_lang_Long, 
> compare_unsigned_name,   long2_int_signature,   F_S)  \

Creating these methods as intrinsic will create a box around the underneath 
comparison logic, this shall prevent any regular constant folding which could 
have optimized out certain control paths, I would suggest to to handle constant 
folding for newly added nodes in associated Value routines.

src/hotspot/share/opto/comparenode.hpp line 67:

> 65: CompareUnsignedLNode(Node* in1, Node* in2) : CompareNode(in1, in2) {}
> 66: virtual int Opcode() const;
> 67: };

Intent here seems to be to enable further auto-vectorization of newly create IR 
nodes.

test/micro/org/openjdk/bench/java/lang/CompareInteger.java line 78:

> 76: input2[i] = tmp;
> 77: }
> 78: }

Logic re-organization suggestion:- 
 

 for (int i = 0 ; i < BUFFER_SIZE; i++) {
input1[i] = rng.nextLong();
 }

 if (mode.equals("equals") {
GRADIANT = 0;
 } else if (mode.equals("greaterThanEquals")) {
GRADIANT = 1;
 } else {
assert mode.equals("lessThanEqual");
GRADIANT = -1;
 }

 for(int i = 0 ; i < BUFFER_SIZE; i++) {
input2[i] = input1[i] + i*GRADIANT;
 }

test/micro/org/openjdk/bench/java/lang/CompareLong.java line 5:

> 3:  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
> 4:  *
> 5:  * This code is free software; you can redistribute it and/or modify it

We can unify this benchmark along with integer compare micro.

-

PR: https://git.openjdk.java.net/jdk/pull/7975


Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long

2022-03-27 Thread Quan Anh Mai
On Sun, 27 Mar 2022 06:15:34 GMT, Vamsi Parasa  wrote:

> Implements x86 intrinsics for compare() method in java.lang.Integer and 
> java.lang.Long.

This is both complicated and inefficient, I would suggest building the 
intrinsic in the IR graph so that the compiler can simplify 
`Integer.compareUnsigned(x, y) < 0` into `x u< y`. Thanks.

-

PR: https://git.openjdk.java.net/jdk/pull/7975


RFR: 8283726: x86 intrinsics for compare method in Integer and Long

2022-03-27 Thread Vamsi Parasa
Implements x86 intrinsics for compare() method in java.lang.Integer and 
java.lang.Long.

-

Commit messages:
 - add JMH benchmarks
 - 8283726: x86 intrinsics for compare method in Integer and Long

Changes: https://git.openjdk.java.net/jdk/pull/7975/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=7975=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8283726
  Stats: 430 lines in 13 files changed: 428 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7975.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7975/head:pull/7975

PR: https://git.openjdk.java.net/jdk/pull/7975