[Bug tree-optimization/80574] GCC fail to optimize nested ternary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80574 --- Comment #12 from Andrew Pinski --- Note the original example in comment #0 is now optimized for GCC 14 but only at the RTL level rather than the gimple level.
[Bug tree-optimization/80574] GCC fail to optimize nested ternary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80574 --- Comment #11 from CVS Commits ---
The trunk branch has been updated by Andrew Pinski:

https://gcc.gnu.org/g:6d449531a60b56ed0f4aeb640aa9e46e4ec35208

commit r14-2698-g6d449531a60b56ed0f4aeb640aa9e46e4ec35208
Author: Andrew Pinski
Date:   Thu Jul 20 17:36:29 2023 -0700

    MATCH: Add Max<Max<a,b>,a> -> Max<a,b> simplification

    This adds a simple match pattern to simplify
    `max<max<a,b>,a>` to `max<a,b>`. Reassociation handles this
    already (r0-77700-ge969dbde29bfd396259357), but it seems like we
    should be able to handle this even before reassociation.

    This fixes part of PR tree-optimization/80574, but more work is
    needed to fix it the rest of the way. The original testcase there
    is fixed, but it is the RTL level that fixes it the rest of the way.

    OK? Bootstrapped and tested on x86_64-linux-gnu.

    gcc/ChangeLog:

            * match.pd (minmax<minmax<a,b>,a>->minmax<a,b>): New
            transformation.

    gcc/testsuite/ChangeLog:

            * gcc.dg/tree-ssa/reassoc-12.c: Disable all of the passes
            that enable match-and-simplify.
            * gcc.dg/tree-ssa/minmax-23.c: New test.
[Bug tree-optimization/80574] GCC fail to optimize nested ternary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80574 --- Comment #10 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #9)
> One thing I noticed is that:
>   _2 = MAX_EXPR <_6, a3_7(D)>;
>   _3 = MAX_EXPR <_2, a3_7(D)>;
>
> Is not optimized at all.
>
> (for minmax (min max)
>  (simplify
>   (minmax:c (minmax:c@2 @0 @1) @0)
>   @2))

Submitted the patch for that as
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625135.html .
Note that after that patch we get decent code for the original testcases, but it is not fully optimized at the gimple level.
[Bug tree-optimization/80574] GCC fail to optimize nested ternary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80574 --- Comment #9 from Andrew Pinski ---
One thing I noticed is that:
  _2 = MAX_EXPR <_6, a3_7(D)>;
  _3 = MAX_EXPR <_2, a3_7(D)>;

Is not optimized at all.

(for minmax (min max)
 (simplify
  (minmax:c (minmax:c@2 @0 @1) @0)
  @2))
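The pattern above relies on the identity minmax(minmax(a, b), a) == minmax(a, b). A minimal C sketch that checks it exhaustively over an 8-bit range follows; the helper names `max_i`, `min_i`, and `minmax_identity_holds` are illustrative only and are not part of GCC.

```c
#include <assert.h>

/* Illustrative helpers (not GCC code). */
static inline int max_i(int a, int b) { return a > b ? a : b; }
static inline int min_i(int a, int b) { return a < b ? a : b; }

/* Returns 1 if minmax(minmax(a, b), a) == minmax(a, b) for every pair
   in [-128, 127], including the commuted operand orders that the `:c`
   markers in the match.pd pattern are meant to cover. */
int minmax_identity_holds(void)
{
    for (int a = -128; a <= 127; a++) {
        for (int b = -128; b <= 127; b++) {
            int mx = max_i(a, b);
            int mn = min_i(a, b);

            if (max_i(mx, a) != mx || max_i(a, mx) != mx)
                return 0;
            if (max_i(max_i(b, a), a) != mx)
                return 0;
            if (min_i(mn, a) != mn || min_i(a, mn) != mn)
                return 0;
            if (min_i(min_i(b, a), a) != mn)
                return 0;
        }
    }
    return 1;
}
```

Because the inner result is already at least (or at most) `a`, the outer operation has nothing left to do, which is why the pattern can simply return the captured inner expression `@2`.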
[Bug tree-optimization/80574] GCC fail to optimize nested ternary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80574
Andrew Pinski changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|8.0                         |---

--- Comment #8 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #7)
> The original testcase in comment #0 is fixed in GCC 8, I don't know what
> caused the improvement though.

Well, actually, if you use the C++ front-end, it still fails.
For f2_signed, we start out as:
  _1 = MAX_EXPR ;
  if (_1 >= a1_6(D))
    goto ; [INV]
  else
    goto ; [INV]

  :
  if (a3_4(D) < a2_5(D))
    goto ; [INV]
  else
    goto ; [INV]

  :

  :
  # iftmp.5_2 = PHI
  return iftmp.5_2;

phiopt1 transforms it to:
  _1 = MAX_EXPR ;
  if (_1 >= a1_6(D))
    goto ; [INV]
  else
    goto ; [INV]

  :
  _3 = MAX_EXPR ;

  :
  # iftmp.12_2 = PHI <_3(3), a1_6(D)(2)>

Which is perfect. But then we don't detect that _1 and _3 are the same, though we do at least try to simplify it on the trunk:
phiopt match-simplify trying:
        _1 >= a1_6(D) ? _3 : a1_6(D)
phiopt match-simplify trying:
        _1 < a1_6(D) ? a1_6(D) : _3

What happens afterwards is that fre (or is it pre) figures out that _1 and _3 are the same, and we get:
  if (_1 >= a1_6(D))
    goto ; [INV]
  else
    goto ; [INV]

  :

  :
  # iftmp.12_2 = PHI <_1(3), a1_6(D)(2)>

Which phiopt2 is then able to simplify. So if we iterate phiopt and fre we should be able to handle all of these, but that is NOT a reasonable solution. I have to think of a good way of solving these.
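To make the control flow in the dumps easier to follow, here is a rough source-level sketch of it in C. This is an assumption reconstructed from the surviving conditions in the dump, not the testcase from comment #0; `f2_nested`, `f2_flat`, and `f2_forms_agree` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical source-level shape of the dump: the outer condition
   compares max(a2, a3) against a1, and the taken arm recomputes
   max(a2, a3) with a second comparison. */
int f2_nested(int a1, int a2, int a3)
{
    int m = a2 > a3 ? a2 : a3;        /* plays the role of _1 */

    if (m >= a1)
        return a3 < a2 ? a2 : a3;     /* becomes a second max (_3) after phiopt1 */
    return a1;
}

/* What the fully simplified form would compute: a three-way max. */
int f2_flat(int a1, int a2, int a3)
{
    int m = a2 > a3 ? a2 : a3;
    return m > a1 ? m : a1;
}

/* Returns 1 if both forms agree on a small exhaustive range. */
int f2_forms_agree(void)
{
    for (int a1 = -20; a1 <= 20; a1++)
        for (int a2 = -20; a2 <= 20; a2++)
            for (int a3 = -20; a3 <= 20; a3++)
                if (f2_nested(a1, a2, a3) != f2_flat(a1, a2, a3))
                    return 0;
    return 1;
}
```

The two functions are equivalent only once the compiler knows the two computations of max(a2, a3) are the same value, which is the step the comment attributes to fre running between phiopt1 and phiopt2.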
[Bug tree-optimization/80574] GCC fail to optimize nested ternary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80574
Andrew Pinski changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
[Bug tree-optimization/80574] GCC fail to optimize nested ternary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80574
Andrew Pinski changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |8.0

--- Comment #7 from Andrew Pinski ---
The original testcase in comment #0 is fixed in GCC 8, I don't know what caused the improvement though.
[Bug tree-optimization/80574] GCC fail to optimize nested ternary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80574
SztfG at yandex dot ru changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |SztfG at yandex dot ru

--- Comment #6 from SztfG at yandex dot ru ---
Created attachment 41316
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41316&action=edit
some benchmark with macro stuff and std::max

Well, maybe this is also not related to this issue, but here is a benchmark in which std::max is slower than the macro.
[Bug tree-optimization/80574] GCC fail to optimize nested ternary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80574
Richard Biener changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-05-02
     Ever confirmed|0                           |1
[Bug tree-optimization/80574] GCC fail to optimize nested ternary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80574 --- Comment #5 from SztfG at yandex dot ru ---
> He did not claim it was always better...

Ahh, so I need to do some research to figure out in which cases a static inline function is better and in which a macro is better. That's bad.

> Please don't mix unrelated issues

OK, I will file this in another bug report.
[Bug tree-optimization/80574] GCC fail to optimize nested ternary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80574 --- Comment #4 from Marc Glisse ---
(In reply to SztfG from comment #3)
> Georg-Johann Lay, GCC does not always do better if you use a static inline
> function instead of a macro.

He did not claim it was always better...

> For example, this code:

Please don't mix unrelated issues; you can file a different bug if you want that one addressed. In that case, it is because we have an old optimization in fold_unary (like other "do ... if ... simplifies" transformations, it is not straightforward to move it to match.pd), which thus only applies when everything is part of the same expression in the source code.
  /* Convert ~(X ^ Y) to ~X ^ Y or X ^ ~Y if ~X or ~Y simplify.  */

If you use even fewer macros and more lines, things optimize a bit better:
m_xnor: TYPE tmp=!!a^!!b; return !tmp;
but we still have some discrepancy where we optimize (a^b)==0 but not ~(a^b) (for _Bool type). This will be much more convenient to analyze in a different bug report.
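The fold_unary comment quoted above is based on a bitwise identity that can be checked directly. The sketch below is only an illustration of that identity and of the boolean forms mentioned in the reply; `xnor_identities_hold` is a hypothetical name and none of this is GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if, for all 8-bit x and y:
     ~(x ^ y) == (~x ^ y) == (x ^ ~y)
   and, on values normalized to 0/1 with !!:
     !(p ^ q) == ((p ^ q) == 0) == (p == q). */
int xnor_identities_hold(void)
{
    for (unsigned x = 0; x < 256; x++) {
        for (unsigned y = 0; y < 256; y++) {
            uint8_t lhs = (uint8_t)~(x ^ y);
            uint8_t r1  = (uint8_t)(~x ^ y);
            uint8_t r2  = (uint8_t)(x ^ ~y);

            if (lhs != r1 || lhs != r2)
                return 0;

            int p = !!x, q = !!y;
            if (!(p ^ q) != ((p ^ q) == 0))
                return 0;
            if (!(p ^ q) != (p == q))
                return 0;
        }
    }
    return 1;
}
```

Since the negation can be pushed onto either operand, the transformation pays off only when one of `~X` or `~Y` itself simplifies, which matches the "if ~X or ~Y simplify" condition in the quoted comment.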
[Bug tree-optimization/80574] GCC fail to optimize nested ternary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80574 --- Comment #3 from SztfG at yandex dot ru ---
Georg-Johann Lay, GCC does not always do better if you use a static inline function instead of a macro. For example, this code:

#include <stdint.h>

#define TYPE uint8_t

#define M_XOR(a,b) ((!!a)^(!!b))
#define M_NXOR(a,b) (!((!!a)^(!!b)))

__attribute__((__always_inline__, const))
static inline TYPE m_xor (const TYPE a, const TYPE b)
{
    return M_XOR(a,b);
}

__attribute__((__always_inline__, const))
static inline TYPE m_xnor (const TYPE a, const TYPE b)
{
    return M_NXOR(a,b);
}

// bad assembly output
int test1b(const TYPE a, const TYPE b)
{
    return m_xor(a,b) == !m_xnor(a,b);
}

int test2b(const TYPE a, const TYPE b)
{
    return !m_xor(a,b) == m_xnor(a,b);
}

int test3b(const TYPE a, const TYPE b)
{
    return M_XOR(a,b) == !m_xnor(a,b);
}

// good assembly output
int test1g(const TYPE a, const TYPE b)
{
    return m_xor(a,b) == M_XOR(a,b);
}

int test2g(const TYPE a, const TYPE b)
{
    return M_XOR(a,b) == !M_NXOR(a,b);
}

int test3g(const TYPE a, const TYPE b)
{
    return M_XOR(a,b) != !M_NXOR(a,b);
}
[Bug tree-optimization/80574] GCC fail to optimize nested ternary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80574
Georg-Johann Lay changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gjl at gcc dot gnu.org

--- Comment #2 from Georg-Johann Lay ---
GCC performs poorly on code expanded from macros that (recursively) duplicate macro arguments. Some time ago I dug into it, and the reason was that it failed to recognize MIN_EXPR / MAX_EXPR because some optimizations factor out common subexpressions. Yet another problem is that the expressions inside the conditions get promoted to int resp. unsigned, whereas the target values remain types smaller than int. (It was actually more complex code that implemented saturation by nested MIN / MAX expressions.)

As a workaround, you can try to use inline functions so that GCC will recognize MAX_EXPR and MIN_EXPR as expected. The drawback is that you need a series of macros for each input type, like int8_t, uint8_t, int16_t, ... (unsigned and signed should be enough, though).

Sample workaround for unsigned char:

#define MAX_1(VAR, ...) \
    (VAR)

#define MAX_2(VAR, ...) \
    (((VAR)>MAX_1(__VA_ARGS__))?(VAR):MAX_1(__VA_ARGS__))

__attribute__((__always_inline__))
static inline unsigned char max2 (unsigned char a, unsigned char b)
{
    return MAX_2 (a, b);
}

#undef MAX_2
#define MAX_2(a, b) max2 (a, b)

#define MAX_3(VAR, ...) \
    (MAX_2 ((VAR), MAX_2(__VA_ARGS__)))

#define MAX_4(VAR, ...) \
    (MAX_2 ((VAR), MAX_3(__VA_ARGS__)))

#define MAX_5(VAR, ...) \
    (MAX_2 ((VAR), MAX_4(__VA_ARGS__)))

#define MAX_6(VAR, ...) \
    (MAX_2 ((VAR), MAX_5(__VA_ARGS__)))

The .original dump as generated with -fdump-tree-original now reads:

;; Function max2 (null)
{
  return MAX_EXPR ;
}

;; Function f1_unsigned (null)
{
  return max2 (a1, max2 (a2, max2 (a3, max2 (a1, max2 (a2, a3)))));
}

;; Function f2_unsigned (null)
{
  return max2 (a1, max2 (a2, a3));
}

and after inline expansion everything is nice with MAX_EXPR, whereas your original code leads to:

;; Function f1_unsigned (null)
{
  return (unsigned char) MAX_EXPR <MAX_EXPR <MAX_EXPR <MAX_EXPR <(int) a2 > (int) a3 ? (int) a2 : (int) a3, (int) a1>, (int) a3>, (int) a2>, (int) a1>;
}

;; Function f2_unsigned (null)
{
  return (unsigned char) MAX_EXPR <(int) a2 > (int) a3 ? (int) a2 : (int) a3, (int) a1>;
}

Not all of the expressions are recognized as MAX_EXPR.
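The per-type drawback mentioned above can be softened by generating the inline functions from a single helper macro. The following is only a sketch under that assumption: `DEFINE_MAX2` and the `max2_*` / `max3_*` names are hypothetical, and `__attribute__((__always_inline__))` is the GCC-style attribute already used in the workaround, so this is not portable to compilers that lack it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: stamps out one always-inline two-argument max
   for a given element type. Each argument is evaluated exactly once,
   unlike a macro that duplicates its arguments textually. */
#define DEFINE_MAX2(NAME, TYPE)                       \
    __attribute__((__always_inline__))                \
    static inline TYPE NAME (TYPE a, TYPE b)          \
    {                                                 \
        return a > b ? a : b;                         \
    }

DEFINE_MAX2(max2_u8, uint8_t)
DEFINE_MAX2(max2_i8, int8_t)
DEFINE_MAX2(max2_i16, int16_t)

/* Three-way maxima built by nesting the typed helpers. */
uint8_t max3_u8(uint8_t a1, uint8_t a2, uint8_t a3)
{
    return max2_u8(a1, max2_u8(a2, a3));
}

int8_t max3_i8(int8_t a1, int8_t a2, int8_t a3)
{
    return max2_i8(a1, max2_i8(a2, a3));
}

int16_t max3_i16(int16_t a1, int16_t a2, int16_t a3)
{
    return max2_i16(a1, max2_i16(a2, a3));
}
```

Whether GCC then folds each helper to a MAX_EXPR for a given type and version is something to confirm with -fdump-tree-original rather than assume.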
[Bug tree-optimization/80574] GCC fail to optimize nested ternary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80574 --- Comment #1 from Marc Glisse ---
With -fdump-tree-original, the signed case looks perfect:
  return MAX_EXPR , a1>, a3>, a2>, a1>;
(which reassoc eventually simplifies), while in the unsigned case we fail to recognize the innermost max:
  return (unsigned char) MAX_EXPR <MAX_EXPR <MAX_EXPR <MAX_EXPR <(int) a2 > (int) a3 ? (int) a2 : (int) a3, (int) a1>, (int) a3>, (int) a2>, (int) a1>;
and we also fail during gimple, probably because of the conversions.
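The `(int)` casts in the unsigned dump come from C's integer promotions: operands of `>` and `?:` that have type unsigned char are promoted to int before the operation. A small sketch showing this, with the hypothetical names `ternary_is_int_sized` and `max_u8`:

```c
#include <assert.h>

/* The conditional expression on two unsigned char operands has type
   int after the usual arithmetic conversions, so its size is that of
   int rather than that of unsigned char. sizeof does not evaluate its
   operand here. */
int ternary_is_int_sized(void)
{
    unsigned char a2 = 200, a3 = 100;
    return sizeof(a2 > a3 ? a2 : a3) == sizeof(int);
}

/* A source-level max on unsigned char: the comparison and selection
   happen in int, and the result is converted back on return, i.e.
   (unsigned char)((int)a2 > (int)a3 ? (int)a2 : (int)a3). */
unsigned char max_u8(unsigned char a2, unsigned char a3)
{
    return a2 > a3 ? a2 : a3;
}
```

These implicit conversions are what the comment suspects of getting in the way of recognizing the max during gimple.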