Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long [v2]
> Implements x86 intrinsics for compare() method in java.lang.Integer and > java.lang.Long. Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - refactored x86_64.ad code to macro assembly routines - Merge branch 'master' of https://git.openjdk.java.net/jdk into cmp - add JMH benchmarks - 8283726: x86 intrinsics for compare method in Integer and Long - Changes: - all: https://git.openjdk.java.net/jdk/pull/7975/files - new: https://git.openjdk.java.net/jdk/pull/7975/files/b0c3314d..79e4aa50 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=7975=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk=7975=00-01 Stats: 5452 lines in 218 files changed: 3925 ins; 856 del; 671 mod Patch: https://git.openjdk.java.net/jdk/pull/7975.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7975/head:pull/7975 PR: https://git.openjdk.java.net/jdk/pull/7975
Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long
On Tue, 29 Mar 2022 21:56:18 GMT, Vamsi Parasa wrote: >> This is both complicated and inefficient, I would suggest building the >> intrinsic in the IR graph so that the compiler can simplify >> `Integer.compareUnsigned(x, y) < 0` into `x u< y`. Thanks. > >> This is both complicated and inefficient, I would suggest building the >> intrinsic in the IR graph so that the compiler can simplify >> `Integer.compareUnsigned(x, y) < 0` into `x u< y`. Thanks. > > Thank you for the suggestion! This intrinsic uses 1 cmp instruction instead > of two and shows ~10% improvement due to better branch prediction. Even > without the intrinsic, the compiler is currently able to reduce it to x u< y > but is still generating two cmp (unsigned) instructions as > Integer.compareUnsigned(x, y) is implemented as x u< y? -1 : (x ==y ? 0 : 1). @vamsi-parasa But normally the result of the `compare` methods is not used as a raw integer (it is useless to do so since the methods do not have any promise regarding the value of the result, only its sign). The idiom is to compare the result with 0, such as `Integer.compare(x, y) > 0`, the compiler can reduce this to `x > y` (last time I checked it does not do so but in principle this is possible). Your intrinsic prevents the compiler to do such transformations, hurting the performance of real programs. > Even without the intrinsic, the compiler is currently able to reduce it to x > u< y It is because the compiler can recognise the pattern `x + MIN_VALUE < y + MIN_VALUE` and transforms it into `x u< y`. This transformation is fragile however if one of the arguments is in the form `x + con`, in such cases constant propagation may lead to slight deviations from recognised patterns, defeat the transformations. As a result, it may be justifiable to have a dedicated intrinsic for that since unsigned comparisons are pretty basic operations that are needed for optimal range check performance. Thanks. - PR: https://git.openjdk.java.net/jdk/pull/7975
Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long
On Tue, 29 Mar 2022 02:24:21 GMT, Jatin Bhateja wrote: >> Implements x86 intrinsics for compare() method in java.lang.Integer and >> java.lang.Long. > > src/hotspot/cpu/x86/x86_64.ad line 12107: > >> 12105: instruct compareSignedI_rReg(rRegI dst, rRegI op1, rRegI op2, rRegI >> tmp, rFlagsReg cr) >> 12106: %{ >> 12107: match(Set dst (CompareSignedI op1 op2)); > > Please also include these patterns in x86_32.ad Will update x86_32.ad as well. > src/hotspot/cpu/x86/x86_64.ad line 12125: > >> 12123: __ movl(tmp, 0); >> 12124: __ bind(done); >> 12125: __ movl(dest, tmp); > > Please move this in macro-assembly routine. Sure, will refactor it into a macro-assembly > src/hotspot/cpu/x86/x86_64.ad line 12178: > >> 12176: __ movl(tmp, 0); >> 12177: __ bind(done); >> 12178: __ movl(dest, tmp); > > Please move this into a macro-assembly routine. Sure, will do that and update it soon. > src/hotspot/cpu/x86/x86_64.ad line 12204: > >> 12202: __ movl(tmp, 0); >> 12203: __ bind(done); >> 12204: __ movl(dest, tmp); > > Please move this into macro-assembly routine. Sure, will do that and update it soon. > src/hotspot/share/opto/comparenode.hpp line 67: > >> 65: CompareUnsignedLNode(Node* in1, Node* in2) : CompareNode(in1, in2) {} >> 66: virtual int Opcode() const; >> 67: }; > > Intent here seems to be to enable further auto-vectorization of newly create > IR nodes. Yes, that is the intention. > test/micro/org/openjdk/bench/java/lang/CompareInteger.java line 78: > >> 76: input2[i] = tmp; >> 77: } >> 78: } > > Logic re-organization suggestion:- > > > for (int i = 0 ; i < BUFFER_SIZE; i++) { > input1[i] = rng.nextLong(); > } > > if (mode.equals("equals") { > GRADIANT = 0; > } else if (mode.equals("greaterThanEquals")) { > GRADIANT = 1; > } else { > assert mode.equals("lessThanEqual"); > GRADIANT = -1; > } > > for(int i = 0 ; i < BUFFER_SIZE; i++) { > input2[i] = input1[i] + i*GRADIANT; > } The suggested refactoring is definitely elegant but one rare possibility is overflow due to the addition/subtraction. The swap logic doesn't have that problem. > test/micro/org/openjdk/bench/java/lang/CompareLong.java line 5: > >> 3: * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. >> 4: * >> 5: * This code is free software; you can redistribute it and/or modify it > > We can unify this benchmark along with integer compare micro. Sure, will do the unification. - PR: https://git.openjdk.java.net/jdk/pull/7975
Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long
On Sun, 27 Mar 2022 06:57:58 GMT, Quan Anh Mai wrote: > This is both complicated and inefficient, I would suggest building the > intrinsic in the IR graph so that the compiler can simplify > `Integer.compareUnsigned(x, y) < 0` into `x u< y`. Thanks. Thank you for the suggestion! This intrinsic uses 1 cmp instruction instead of two and shows ~10% improvement due to better branch prediction. Even without the intrinsic, the compiler is currently able to reduce it to x u< y but is still generating two cmp (unsigned) instructions as Integer.compareUnsigned(x, y) is implemented as x u< y? -1 : (x ==y ? 0 : 1). - PR: https://git.openjdk.java.net/jdk/pull/7975
Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long
On Sun, 27 Mar 2022 06:15:34 GMT, Vamsi Parasa wrote: > Implements x86 intrinsics for compare() method in java.lang.Integer and > java.lang.Long. src/hotspot/cpu/x86/x86_64.ad line 12107: > 12105: instruct compareSignedI_rReg(rRegI dst, rRegI op1, rRegI op2, rRegI > tmp, rFlagsReg cr) > 12106: %{ > 12107: match(Set dst (CompareSignedI op1 op2)); Please also include these patterns in x86_32.ad src/hotspot/cpu/x86/x86_64.ad line 12125: > 12123: __ movl(tmp, 0); > 12124: __ bind(done); > 12125: __ movl(dest, tmp); Please move this in macro-assembly routine. src/hotspot/cpu/x86/x86_64.ad line 12178: > 12176: __ movl(tmp, 0); > 12177: __ bind(done); > 12178: __ movl(dest, tmp); Please move this into a macro-assembly routine. src/hotspot/cpu/x86/x86_64.ad line 12204: > 12202: __ movl(tmp, 0); > 12203: __ bind(done); > 12204: __ movl(dest, tmp); Please move this into macro-assembly routine. src/hotspot/share/classfile/vmIntrinsics.hpp line 239: > 237: do_intrinsic(_compareUnsigned_i,java_lang_Integer, > compare_unsigned_name, int2_int_signature, F_S) \ > 238:do_name( compare_unsigned_name, > "compareUnsigned") \ > 239: do_intrinsic(_compareUnsigned_l,java_lang_Long, > compare_unsigned_name, long2_int_signature, F_S) \ Creating these methods as intrinsic will create a box around the underneath comparison logic, this shall prevent any regular constant folding which could have optimized out certain control paths, I would suggest to to handle constant folding for newly added nodes in associated Value routines. src/hotspot/share/opto/comparenode.hpp line 67: > 65: CompareUnsignedLNode(Node* in1, Node* in2) : CompareNode(in1, in2) {} > 66: virtual int Opcode() const; > 67: }; Intent here seems to be to enable further auto-vectorization of newly create IR nodes. test/micro/org/openjdk/bench/java/lang/CompareInteger.java line 78: > 76: input2[i] = tmp; > 77: } > 78: } Logic re-organization suggestion:- for (int i = 0 ; i < BUFFER_SIZE; i++) { input1[i] = rng.nextLong(); } if (mode.equals("equals") { GRADIANT = 0; } else if (mode.equals("greaterThanEquals")) { GRADIANT = 1; } else { assert mode.equals("lessThanEqual"); GRADIANT = -1; } for(int i = 0 ; i < BUFFER_SIZE; i++) { input2[i] = input1[i] + i*GRADIANT; } test/micro/org/openjdk/bench/java/lang/CompareLong.java line 5: > 3: * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. > 4: * > 5: * This code is free software; you can redistribute it and/or modify it We can unify this benchmark along with integer compare micro. - PR: https://git.openjdk.java.net/jdk/pull/7975
Re: RFR: 8283726: x86 intrinsics for compare method in Integer and Long
On Sun, 27 Mar 2022 06:15:34 GMT, Vamsi Parasa wrote: > Implements x86 intrinsics for compare() method in java.lang.Integer and > java.lang.Long. This is both complicated and inefficient, I would suggest building the intrinsic in the IR graph so that the compiler can simplify `Integer.compareUnsigned(x, y) < 0` into `x u< y`. Thanks. - PR: https://git.openjdk.java.net/jdk/pull/7975
RFR: 8283726: x86 intrinsics for compare method in Integer and Long
Implements x86 intrinsics for compare() method in java.lang.Integer and java.lang.Long. - Commit messages: - add JMH benchmarks - 8283726: x86 intrinsics for compare method in Integer and Long Changes: https://git.openjdk.java.net/jdk/pull/7975/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk=7975=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8283726 Stats: 430 lines in 13 files changed: 428 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/7975.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7975/head:pull/7975 PR: https://git.openjdk.java.net/jdk/pull/7975