[Bug rtl-optimization/114902] [14/15 Regression] wrong code at -O3 with "-fno-tree-vrp -fno-expensive-optimizations -fno-tree-dominator-opts" on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114902

--- Comment #13 from GCC Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:0b93a0ae153ef70a82ff63e67926a01fdab9956b

commit r15-520-g0b93a0ae153ef70a82ff63e67926a01fdab9956b
Author: Jakub Jelinek
Date:   Wed May 15 18:37:17 2024 +0200

    combine: Fix up simplify_compare_const [PR115092]

    The following testcases are miscompiled (with tons of GIMPLE
    optimization disabled) because combine sees GE comparison of
    1-bit sign_extract (i.e. something with [-1, 0] value range)
    with (const_int -1) (which is always true) and optimizes it into
    NE comparison of 1-bit zero_extract ([0, 1] value range) against
    (const_int 0).
    The reason is that simplify_compare_const first (correctly)
    simplifies the comparison to
    GE (ashift:SI something (const_int 31)) (const_int -2147483648)
    and then an optimization for when the second operand is power of 2
    triggers.  That optimization is fine for power of 2s which aren't
    the signed minimum of the mode, or if it is NE, EQ, GEU or LTU
    against the signed minimum of the mode, but for GE or LT optimizing
    it into NE (or EQ) against const0_rtx is wrong, those cases
    are always true or always false (but the function doesn't have
    a standardized way to tell callers the comparison is now unconditional).

    The following patch just disables the optimization in that case.

    2024-05-15  Jakub Jelinek

            PR rtl-optimization/114902
            PR rtl-optimization/115092
            * combine.cc (simplify_compare_const): Don't optimize
            GE op0 SIGNED_MIN or LT op0 SIGNED_MIN into NE op0 const0_rtx or
            EQ op0 const0_rtx.

            * gcc.dg/pr114902.c: New test.
            * gcc.dg/pr115092.c: New test.
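As a rough illustration of the reasoning in the commit message, the following C sketch checks the power-of-2 rewrite on a small model. It is not GCC's code: the helper names (`only_sign_bit`, `only_bit4`, `ge_rewrite_is_valid` and friends) are hypothetical, and a 32-bit mode is assumed. An operand whose only possibly-nonzero bit is the power of 2 being compared against can take just two values, so each rewrite can be checked exhaustively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not taken from combine.cc.  */
typedef int32_t (*operand_fn)(uint32_t);

/* Operand whose only possibly-nonzero bit is the sign bit, similar to
   (ashift:SI something (const_int 31)): the value is 0 or INT32_MIN.  */
int32_t only_sign_bit(uint32_t x)
{
    return (x & 1u) ? INT32_MIN : 0;
}

/* Operand whose only possibly-nonzero bit is bit 4: the value is 0 or 16.  */
int32_t only_bit4(uint32_t x)
{
    return (x & 1u) ? 16 : 0;
}

/* "GE op0 C" may become "NE op0 0" only if both agree on every value.  */
bool ge_rewrite_is_valid(operand_fn op0, int32_t c)
{
    for (uint32_t x = 0; x < 2; x++) {
        int32_t v = op0(x);
        if ((v >= c) != (v != 0))
            return false;
    }
    return true;
}

/* "LT op0 C" may become "EQ op0 0" only if both agree on every value.  */
bool lt_rewrite_is_valid(operand_fn op0, int32_t c)
{
    for (uint32_t x = 0; x < 2; x++) {
        int32_t v = op0(x);
        if ((v < c) != (v == 0))
            return false;
    }
    return true;
}

/* Unsigned variant: "GEU op0 C" versus "NE op0 0".  */
bool geu_rewrite_is_valid(operand_fn op0, uint32_t c)
{
    for (uint32_t x = 0; x < 2; x++) {
        int32_t v = op0(x);
        if (((uint32_t)v >= c) != (v != 0))
            return false;
    }
    return true;
}

void check_rewrites(void)
{
    /* A power of 2 that is not the signed minimum: rewrite is fine.  */
    assert(ge_rewrite_is_valid(only_bit4, 16));
    assert(lt_rewrite_is_valid(only_bit4, 16));
    /* Unsigned comparison against the sign bit: still fine.  */
    assert(geu_rewrite_is_valid(only_sign_bit, 0x80000000u));
    /* Signed GE/LT against the signed minimum: always true/false,
       so rewriting to NE/EQ against zero changes the result.  */
    assert(!ge_rewrite_is_valid(only_sign_bit, INT32_MIN));
    assert(!lt_rewrite_is_valid(only_sign_bit, INT32_MIN));
}
```

Calling `check_rewrites()` from a test driver exercises all five cases. The two failing rewrites are the GE and LT cases named in the ChangeLog entry.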
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114902

--- Comment #12 from Andrew Pinski ---
*** Bug 115092 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114902

--- Comment #11 from Segher Boessenkool ---
So, is there a simplified testcase that *actually* shows any *actual* problem?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114902 --- Comment #10 from Segher Boessenkool --- (_extract, btw.)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114902

--- Comment #9 from Segher Boessenkool ---
(In reply to Andrew Pinski from comment #2)
> We go from CCGC with a sign_extend to a zero_extend with CCZ. that can't be
> right.

Why not?  We prefer zero_extend whenever it has the same result.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114902

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|14.0                        |14.2

--- Comment #8 from Richard Biener ---
GCC 14.1 is being released, retargeting bugs to GCC 14.2.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114902

--- Comment #7 from Andrew Pinski ---
(In reply to Segher Boessenkool from comment #6)
> (In reply to Andrew Pinski from comment #2)
> > Looks like the issue is during combine.
> > 
> > We go from CCGC with a sign_extend to a zero_extend with CCZ. that can't be
> > right.
> 
> Why is that not correct?  zero_extend is preferred over sign_extend, and both
> are equivalent when only checking for zero.

For Equality they are equivalent yes. But when doing `a >=s 0` a sign
extend/extract will cause different results from a zero extend/extract.

> Is there something wrong in target code here, perhaps?

For arm, x86 and mips?

For testcase in comment #4 on x86_64:
Before combine we start with:
```
(insn 16 15 17 2 (parallel [
            (set (reg:SI 106 [ t_4 ])
                (and:SI (reg:SI 105 [ tt1_3 ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) "/app/example.cpp":6:9 617 {*andsi_1}
     (expr_list:REG_DEAD (reg:SI 105 [ tt1_3 ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 17 16 20 2 (parallel [
            (set (reg:SI 107 [ e_5 ])
                (neg:SI (reg:SI 106 [ t_4 ])))
            (clobber (reg:CC 17 flags))
        ]) "/app/example.cpp":7:9 804 {*negsi_1}
     (expr_list:REG_DEAD (reg:SI 106 [ t_4 ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 20 17 21 2 (set (reg:CCGC 17 flags)
        (compare:CCGC (reg:SI 107 [ e_5 ])
            (const_int -1 [0x]))) "/app/example.cpp":8:16 11 {*cmpsi_1}
     (expr_list:REG_DEAD (reg:SI 107 [ e_5 ])
        (nil)))
(insn 21 20 22 2 (set (reg:QI 109)
        (ge:QI (reg:CCGC 17 flags)
            (const_int 0 [0]))) "/app/example.cpp":8:16 1125 {*setcc_qi}
     (expr_list:REG_DEAD (reg:CCGC 17 flags)
        (nil)))
(insn 22 21 23 2 (set (reg:SI 108 [ _1 ])
        (zero_extend:SI (reg:QI 109))) "/app/example.cpp":8:16 169 {*zero_extendqisi2}
     (expr_list:REG_DEAD (reg:QI 109)
        (nil)))
(insn 23 22 24 2 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 108 [ _1 ])
            (const_int 0 [0]))) "/app/example.cpp":9:8 7 {*cmpsi_ccno_1}
     (expr_list:REG_DEAD (reg:SI 108 [ _1 ])
        (nil)))
(jump_insn 24 23 30 2 (set (pc)
        (if_then_else (eq (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (label_ref 30)
            (pc))) "/app/example.cpp":9:8 1130 {*jcc}
     (expr_list:REG_DEAD (reg:CCZ 17 flags)
        (int_list:REG_BR_PROB 7 (nil)))
 -> 30)
```

We first combine 16->17 into:
```
(parallel [
        (set (reg:SI 107 [ e_5 ])
            (sign_extract:SI (reg:SI 105 [ tt1_3 ])
                (const_int 1 [0x1])
                (const_int 0 [0])))
        (clobber (reg:CC 17 flags))
    ])
```
which is correct and good.

And then when combining 17 -> 20 combine does:

Trying 17 -> 20:
   17: {r107:SI=sign_extract(r105:SI,0x1,0);clobber flags:CC;}
      REG_DEAD r105:SI
      REG_UNUSED flags:CC
   20: flags:CCGC=cmp(r107:SI,0x)
      REG_DEAD r107:SI
Successfully matched this instruction:
(set (reg:CCZ 17 flags)
    (compare:CCZ (zero_extract:SI (reg:SI 105 [ tt1_3 ])
            (const_int 1 [0x1])
            (const_int 0 [0]))
        (const_int 0 [0])))
Successfully matched this instruction:
(set (reg:QI 109)
    (ne:QI (reg:CCZ 17 flags)
        (const_int 0 [0])))

Which is also replacing insn 21 incorrectly.

We go from `-(a&1) >= -1` (which is always true) to `(a&1) != 0`.

Maybe we go to `(a&1) <= 1` (still always true) and we mess up somehow to
`(a & 1) != 0`.
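The difference between the two extracts can be sketched in plain C. This is only a model of the RTL under the assumption of a 32-bit two's-complement `int32_t`; the function names are hypothetical and do not correspond to anything in GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of a 1-bit sign_extract at position 0: result is 0 or -1.  */
int32_t sign_extract_1bit(int32_t a)
{
    return -(a & 1);
}

/* Model of a 1-bit zero_extract at position 0: result is 0 or 1.  */
int32_t zero_extract_1bit(int32_t a)
{
    return a & 1;
}

/* The comparison as it stands before combine (insns 20 and 21):
   a signed GE of the sign_extract against -1.  */
bool before_combine(int32_t a)
{
    return sign_extract_1bit(a) >= -1;
}

/* The comparison after the 17 -> 20 combination:
   an NE of the zero_extract against 0.  */
bool after_combine(int32_t a)
{
    return zero_extract_1bit(a) != 0;
}

void check_extract_model(void)
{
    for (int32_t a = -4; a <= 4; a++) {
        /* -(a&1) is 0 or -1, so ">= -1" holds for every input.  */
        assert(before_combine(a));
        /* For a pure zero test the two extracts are interchangeable.  */
        assert((sign_extract_1bit(a) == 0) == (zero_extract_1bit(a) == 0));
    }
    /* For an even input the rewritten comparison is false, so the two
       forms disagree.  */
    assert(!after_combine(0));
    assert(after_combine(1));
}
```

In this model `before_combine` is true for every input while `after_combine` is false for even inputs, which matches the point that the extracts agree for equality tests but not for a signed `>=`.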
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114902

--- Comment #6 from Segher Boessenkool ---
(In reply to Andrew Pinski from comment #2)
> Looks like the issue is during combine.
> 
> We go from CCGC with a sign_extend to a zero_extend with CCZ. that can't be
> right.

Why is that not correct?  zero_extend is preferred over sign_extend, and both
are equivalent when only checking for zero.

Is there something wrong in target code here, perhaps?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114902

--- Comment #5 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #4)
> here is a reduced testcase:
> Note ` -O1 -fno-tree-fre -fno-tree-forwprop -fno-tree-ccp
> -fno-tree-dominator-opts`

This testcase is broken in GCC 13 for mips64-linux-gnu with the added option
-march=octeon. And it has been broken since at least 4.9.4.

        andi    $4,$4,0x1
        xori    $4,$4,0x1
        teq     $4,0
        j       $31
        move    $2,$0

That is:
$4 = $4 & 0x1
$4 = $4 ^ 1
trapif $4 == 0

That is the earliest compiler version I could test where I know that
sign_extract shows up in RTL.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114902

--- Comment #4 from Andrew Pinski ---
Here is a reduced testcase:
```
[[gnu::noipa]]
int f(int b)
{
  int tt1 = ~b;
  int t = 1 & tt1;
  int e = -t;
  int tt = e >= -1;
  if (tt)
    return 0;
  __builtin_trap();
}
int main()
{
  for(int i = -1;i < 2; i++)
    f(i);
}
```
Note ` -O1 -fno-tree-fre -fno-tree-forwprop -fno-tree-ccp
-fno-tree-dominator-opts` is needed to reproduce it with this one.

The generated gimple is the same between GCC 13 and 14 here.

But the first difference is in combine:
```
Trying 7 -> 8:
    7: {r106:SI=r105:SI&0x1;clobber flags:CC;}
      REG_DEAD r105:SI
      REG_UNUSED flags:CC
    8: {r107:SI=-r106:SI;clobber flags:CC;}
      REG_DEAD r106:SI
      REG_UNUSED flags:CC
Successfully matched this instruction:
(parallel [
        (set (reg:SI 107 [ e_5 ])
            (sign_extract:SI (reg:SI 105 [ tt1_3 ])
                (const_int 1 [0x1])
                (const_int 0 [0])))
        (clobber (reg:CC 17 flags))
    ])
allowing combination of insns 7 and 8
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 7.
modifying insn i3     8: {r107:SI=sign_extract(r105:SI,0x1,0);clobber flags:CC;}
      REG_DEAD r105:SI
```

This is correct, but it goes downhill afterwards, as I mentioned in comment #2.

So it does look like a latent bug after all.

If someone does a bisect of this testcase, I am 99% sure you will find that
r14-4810-ge28869670c9879 is where the failure was introduced. For the original
testcase and the one in comment #1, a bisect might find a different commit
because the gimple level is different.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114902 --- Comment #3 from Andrew Pinski --- Note this is almost definitely a latent bug exposed by some change. Might be interesting to see what change exposed it but not so much really.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114902

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |rtl-optimization

--- Comment #2 from Andrew Pinski ---
Looks like the issue is during combine.

After combine we have:
```
   12: r113:SI=[`b']
   13: r112:SI=~r113:SI
      REG_DEAD r113:SI
      REG_EQUAL ~[`b']
   14: NOTE_INSN_DELETED
   15: {r109:SI=sign_extract(r112:SI,0x1,0);clobber flags:CC;}
      REG_UNUSED flags:CC
   18: NOTE_INSN_DELETED
   19: NOTE_INSN_DELETED
   22: r117:SI=0x1
   21: flags:CCZ=cmp(zero_extract(r112:SI,0x1,0),0)
      REG_DEAD r112:SI
   23: r106:SI={(flags:CCZ==0)?r109:SI:r117:SI}
      REG_DEAD r117:SI
      REG_DEAD r109:SI
      REG_DEAD flags:CCZ
      REG_EQUAL {(flags:CCZ==0)?r109:SI:0x1}
```

insn 21 is wrong.

```
Trying 15 -> 18:
   15: {r109:SI=sign_extract(r112:SI,0x1,0);clobber flags:CC;}
      REG_DEAD r112:SI
      REG_UNUSED flags:CC
   18: flags:CCGC=cmp(r109:SI,0x)
Failed to match this instruction:
(parallel [
        (set (reg:CCZ 17 flags)
            (compare:CCZ (zero_extract:SI (reg:SI 112 [ _2 ])
                    (const_int 1 [0x1])
                    (const_int 0 [0]))
                (const_int 0 [0])))
        (set (reg/v:SI 109 [ eD.2798 ])
            (sign_extract:SI (reg:SI 112 [ _2 ])
                (const_int 1 [0x1])
                (const_int 0 [0])))
    ])
Failed to match this instruction:
(parallel [
        (set (reg:CCZ 17 flags)
            (compare:CCZ (zero_extract:SI (reg:SI 112 [ _2 ])
                    (const_int 1 [0x1])
                    (const_int 0 [0]))
                (const_int 0 [0])))
        (set (reg/v:SI 109 [ eD.2798 ])
            (sign_extract:SI (reg:SI 112 [ _2 ])
                (const_int 1 [0x1])
                (const_int 0 [0])))
    ])
Failed to match this instruction:
(parallel [
        (set (reg:CCZ 17 flags)
            (compare:CCZ (and:SI (reg:SI 112 [ _2 ])
                    (const_int 1 [0x1]))
                (const_int 0 [0])))
        (set (reg/v:SI 109 [ eD.2798 ])
            (sign_extract:SI (reg:SI 112 [ _2 ])
                (const_int 1 [0x1])
                (const_int 0 [0])))
    ])
Failed to match this instruction:
(parallel [
        (set (reg:CCZ 17 flags)
            (compare:CCZ (and:SI (reg:SI 112 [ _2 ])
                    (const_int 1 [0x1]))
                (const_int 0 [0])))
        (set (reg/v:SI 109 [ eD.2798 ])
            (sign_extract:SI (reg:SI 112 [ _2 ])
                (const_int 1 [0x1])
                (const_int 0 [0])))
    ])
Successfully matched this instruction:
(set (reg/v:SI 109 [ eD.2798 ])
    (sign_extract:SI (reg:SI 112 [ _2 ])
        (const_int 1 [0x1])
        (const_int 0 [0])))
Successfully matched this instruction:
(set (reg:CCZ 17 flags)
    (compare:CCZ (zero_extract:SI (reg:SI 112 [ _2 ])
            (const_int 1 [0x1])
            (const_int 0 [0]))
        (const_int 0 [0])))
Successfully matched this instruction:
(set (reg:QI 115 [ _10 ])
    (ne:QI (reg:CCZ 17 flags)
        (const_int 0 [0])))
allowing combination of insns 15 and 18
original costs 4 + 4 = 12
replacement costs 4 + 4 = 12
modifying other_insn    19: r115:QI=flags:CCZ!=0
      REG_DEAD flags:CCGC
deferring rescan insn with uid = 19.
modifying insn i2    15: {r109:SI=sign_extract(r112:SI,0x1,0);clobber flags:CC;}
      REG_UNUSED flags:CC
deferring rescan insn with uid = 15.
modifying insn i3    18: flags:CCZ=cmp(zero_extract(r112:SI,0x1,0),0)
      REG_DEAD r112:SI
deferring rescan insn with uid = 18.
```

We go from CCGC with a sign_extend to a zero_extend with CCZ. that can't be
right.