[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 --- Comment #18 from GCC Commits --- The releases/gcc-13 branch has been updated by Richard Biener : https://gcc.gnu.org/g:14a16787d99831a28b0c9690e80c420d765ba26f commit r13-8702-g14a16787d99831a28b0c9690e80c420d765ba26f Author: Richard Biener Date: Wed Feb 28 12:37:07 2024 +0100 tree-optimization/114121 - wrong VN with context sensitive range info When VN ends up exploiting range-info specifying the ao_ref offset and max_size we have to make sure to reflect this in the hashtable entry for the recorded expression. The PR113831 fix handled the case where we can encode this in the operands themselves but this bug shows the issue is more widespread. So instead of altering the operands the following instead records this extra info that's possibly used, only throwing it away when the value-numbering didn't come up with a non-VARYING value which is an important detail to preserve CSE as opposed to constant folding which is where all cases currently known popped up. With this the original PR113831 fix can be reverted. PR tree-optimization/113831 PR tree-optimization/114121 * tree-ssa-sccvn.h (vn_reference_s::offset, vn_reference_s::max_size): New fields. (vn_reference_insert_pieces): Adjust prototype. * tree-ssa-pre.cc (phi_translate_1): Preserve offset/max_size. * tree-ssa-sccvn.cc (vn_reference_eq): Compare offset and size, allow using "don't know" state. (vn_walk_cb_data::finish): Pass along offset/max_size. (vn_reference_lookup_or_insert_for_pieces): Take offset and max_size as argument and use it. (vn_reference_lookup_3): Properly adjust offset and max_size according to the adjusted ao_ref. (vn_reference_lookup_pieces): Initialize offset and max_size. (vn_reference_lookup): Likewise. (vn_reference_lookup_call): Likewise. (vn_reference_insert): Likewise. (visit_reference_op_call): Likewise. (vn_reference_insert_pieces): Take offset and max_size as argument and use it. * gcc.dg/torture/pr113831.c: New testcase. (cherry picked from commit c841144a94363ff26e40ab3f26b14702c32987a8)
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 --- Comment #17 from GCC Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:73dac51b32575f980289c073969c6d825963d076 commit r14-9440-g73dac51b32575f980289c073969c6d825963d076 Author: Richard Biener Date: Tue Mar 12 14:00:05 2024 +0100 tree-optimization/114121 - chrec_fold_{plus,multiply} and recursion The following addresses endless recursion in the chrec_fold_{plus,multiply} functions when handling sign-conversions. We only need to apply tricks when we'd fail (there's a chrec in the converted operand) and we need to make sure to not turn the other operand into something worse (for the chrec-vs-chrec case). PR tree-optimization/114121 * tree-chrec.cc (chrec_fold_plus_1): Guard recursion with converted operand properly. (chrec_fold_multiply): Likewise. Handle missed recursion. * gcc.dg/torture/pr114312.c: New testcase.
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |14.0
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 Richard Biener changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #16 from Richard Biener --- Fixed.
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 --- Comment #15 from GCC Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:c841144a94363ff26e40ab3f26b14702c32987a8 commit r14-9215-gc841144a94363ff26e40ab3f26b14702c32987a8 Author: Richard Biener Date: Wed Feb 28 12:37:07 2024 +0100 tree-optimization/114121 - wrong VN with context sensitive range info When VN ends up exploiting range-info specifying the ao_ref offset and max_size we have to make sure to reflect this in the hashtable entry for the recorded expression. The PR113831 fix handled the case where we can encode this in the operands themselves but this bug shows the issue is more widespread. So instead of altering the operands the following instead records this extra info that's possibly used, only throwing it away when the value-numbering didn't come up with a non-VARYING value which is an important detail to preserve CSE as opposed to constant folding which is where all cases currently known popped up. With this the original PR113831 fix can be reverted. PR tree-optimization/114121 * tree-ssa-sccvn.h (vn_reference_s::offset, vn_reference_s::max_size): New fields. (vn_reference_insert_pieces): Adjust prototype. * tree-ssa-pre.cc (phi_translate_1): Preserve offset/max_size. * tree-ssa-sccvn.cc (vn_reference_eq): Compare offset and size, allow using "don't know" state. (vn_walk_cb_data::finish): Pass along offset/max_size. (vn_reference_lookup_or_insert_for_pieces): Take offset and max_size as argument and use it. (vn_reference_lookup_3): Properly adjust offset and max_size according to the adjusted ao_ref. (vn_reference_lookup_pieces): Initialize offset and max_size. (vn_reference_lookup): Likewise. (vn_reference_lookup_call): Likewise. (vn_reference_insert): Likewise. (visit_reference_op_call): Likewise. (vn_reference_insert_pieces): Take offset and max_size as argument and use it. * gcc.dg/torture/pr114121.c: New testcase.
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 --- Comment #14 from Jakub Jelinek --- Tried __attribute__((noipa)) unsigned long foo (unsigned long x) { unsigned long y[128], z = 0, w = 0; y[127] = x; __builtin_memset (, 0, 127 * sizeof (long)); for (unsigned long i = 0; i < 128; i += 2) { unsigned long a = y[i], b, c, d; b = __builtin_subcl (0, a, z, ); z = c; if (i >= 64) { if (i == 64) w = c != 0; else w = (c != 0) | w; } d = i + 1; a = y[d]; b = __builtin_subcl (0, a, z, ); z = c; if (d > 64) w = (c != 0) | w; } return w; } but that doesn't reproduce it unfortunately.
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 --- Comment #13 from Richard Biener --- (In reply to Jakub Jelinek from comment #10) > Could we for lookups if range isn't a subset of the found range pretend > there was not a match, try to see through definitions again and only if it > yields an equivalent result value range it the same? Perhaps even remember > the range used in it and in case we find non-subset lookup having the same > result union the remembered range? So pretend we record the first match with using a range that improves the result in the hashtable. Then, when looking up the second ref we hit the hashtable entry, see it has an incompatible range so we can't use the recorded value. We can then easily only ignore the entry (the prototype patch does this). As we can't easily tell whether we used any (or even which) range without doing multiple lookups for each ref and comparing the result "re-doing" things wouldn't work. But for determining two refs are equivalent it might be enough to avoid recording any kind of range for when the value was "varying". The value of such hashtable entry would be usable even by lookups with narrower range (but also not yielding any better "constant" value). I'm trying to improve things this way.
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 --- Comment #12 from Richard Biener --- (In reply to Jakub Jelinek from comment #11) > Shall I try to construct a non-bitint testcase for this? That would be nice, more coverage is always good.
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 --- Comment #11 from Jakub Jelinek --- Shall I try to construct a non-bitint testcase for this?
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 --- Comment #10 from Jakub Jelinek --- Could we for lookups if range isn't a subset of the found range pretend there was not a match, try to see through definitions again and only if it yields an equivalent result value range it the same? Perhaps even remember the range used in it and in case we find non-subset lookup having the same result union the remembered range?
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 --- Comment #9 from Richard Biener --- Which of course would regress something like int a[16]; int foo (int i) { if (i > 7) return a[i]; else return a[i]; } where we'd no longer hoist as we no longer would value-number the refs the same (that extends to any ref using SSA names with eventually differing ranges). Reverting r9-398 likely isn't the best answer either, it would of course also regress the two valid replacements with zero (the prototype patch preserves those). There's the PR100923 fix (r12-1295-g7a56d3d3e99cc7) which targeted a similar (but even more odd) case with "contextual" PTA info (though there's really no such thing). But it didn't really fix the contextual thing but added re-valueization which in case of vn_reference_lookup_pieces works on value-numbered refs where failure mode is keeping the value-number. The prototype (after fixing it a bit) passes bootstrap but regresses quite some number of testcases (maybe due to ???s present). FAIL: g++.dg/ipa/devirt-20.C -std=gnu++98 scan-tree-dump-not release_ssa "abor t" FAIL: g++.dg/ipa/devirt-20.C -std=gnu++14 scan-tree-dump-not release_ssa "abor t" FAIL: g++.dg/ipa/devirt-20.C -std=gnu++17 scan-tree-dump-not release_ssa "abor t" FAIL: g++.dg/ipa/devirt-20.C -std=gnu++20 scan-tree-dump-not release_ssa "abort" FAIL: g++.dg/opt/pr110879.C -std=gnu++14 scan-tree-dump-not optimized "=s*S*res_(?!S*_M_end_of_storage;)" FAIL: g++.dg/opt/pr110879.C -std=gnu++17 scan-tree-dump-not optimized "=s*S*res_(?!S*_M_end_of_storage;)" FAIL: g++.dg/opt/pr110879.C -std=gnu++20 scan-tree-dump-not optimized "=s*S*res_(?!S*_M_end_of_storage;)" FAIL: g++.dg/pr99966.C -std=gnu++17 scan-tree-dump-not vrp1 "throw" FAIL: g++.dg/pr99966.C -std=gnu++20 scan-tree-dump-not vrp1 "throw" FAIL: g++.dg/vect/pr112961.cc -std=c++98 scan-tree-dump vect "LOOP VECTORIZED" FAIL: g++.dg/vect/pr112961.cc -std=c++14 scan-tree-dump vect "LOOP VECTORIZED" FAIL: g++.dg/vect/pr112961.cc -std=c++17 scan-tree-dump vect "LOOP VECTORIZED" FAIL: g++.dg/vect/pr112961.cc -std=c++20 scan-tree-dump vect "LOOP VECTORIZED" FAIL: g++.dg/vect/pr89653.cc -std=c++98 scan-tree-dump vect "vectorized 1 loops" FAIL: g++.dg/vect/pr89653.cc -std=c++14 scan-tree-dump vect "vectorized 1 loops" FAIL: g++.dg/vect/pr89653.cc -std=c++17 scan-tree-dump vect "vectorized 1 loops" FAIL: g++.dg/vect/pr89653.cc -std=c++20 scan-tree-dump vect "vectorized 1 loops" FAIL: g++.dg/vect/simd-10.cc -std=c++98 scan-tree-dump-times vect "vectorized [1-3] loops" 2 FAIL: g++.dg/vect/simd-10.cc -std=c++14 scan-tree-dump-times vect "vectorized [1-3] loops" 2 FAIL: g++.dg/vect/simd-10.cc -std=c++17 scan-tree-dump-times vect "vectorized [1-3] loops" 2 FAIL: g++.dg/vect/simd-10.cc -std=c++20 scan-tree-dump-times vect "vectorized [1-3] loops" 2 FAIL: gcc.dg/ira-loop-pressure.c scan-rtl-dump loop2_invariant "Decided to move invariant" FAIL: gcc.dg/pr41783.c scan-tree-dump pre "pretmp[^n]* = a_global_var;" FAIL: gcc.dg/pr78138.c (test for warnings, line 23) FAIL: gcc.dg/tree-ssa/ifc-pr69489-1.c scan-tree-dump-times ifcvt "Applying if-conversion" 1 FAIL: gcc.dg/tree-ssa/ifc-pr69489-1.c scan-tree-dump-times ifcvt "Invalid sum of outgoing probabilities 200.0" 1 FAIL: gcc.dg/tree-ssa/ifc-pr69489-1.c scan-tree-dump-times ifcvt "Invalid sum of incoming counts" 1 FAIL: gcc.dg/tree-ssa/ifc-pr69489-2.c scan-tree-dump-times ifcvt "Applying if-conversion" 1 FAIL: gcc.dg/tree-ssa/ifc-pr69489-2.c scan-tree-dump-times ifcvt "Invalid sum of outgoing probabilities 200.0" 1 FAIL: gcc.dg/tree-ssa/ifc-pr69489-2.c scan-tree-dump-times ifcvt "Invalid sum of incoming counts" 1 FAIL: gcc.dg/tree-ssa/loadpre1.c scan-tree-dump-times pre "Eliminated: 1" 1 FAIL: gcc.dg/tree-ssa/loadpre10.c scan-tree-dump-times pre "Eliminated: 1" 1 FAIL: gcc.dg/tree-ssa/loadpre11.c scan-tree-dump-times pre "Eliminated: 1" 1 FAIL: gcc.dg/tree-ssa/loadpre12.c scan-tree-dump-times pre "Eliminated: 1" 1 FAIL: gcc.dg/tree-ssa/loadpre13.c scan-tree-dump-times pre "Eliminated: 1" 1 FAIL: gcc.dg/tree-ssa/loadpre14.c scan-tree-dump-times pre "Eliminated: 2" 1 FAIL: gcc.dg/tree-ssa/loadpre16.c scan-tree-dump-times pre "Eliminated: 1" 1 FAIL: gcc.dg/tree-ssa/loadpre2.c scan-tree-dump-times pre "Eliminated: 1" 1 FAIL: gcc.dg/tree-ssa/loadpre21.c scan-tree-dump-times pre "Eliminated: 1" 1 FAIL: gcc.dg/tree-ssa/loadpre23.c scan-tree-dump-times pre "Eliminated: 1" 1 FAIL: gcc.dg/tree-ssa/loadpre24.c scan-tree-dump-times pre "Eliminated: 1" 1 FAIL: gcc.dg/tree-ssa/loadpre25.c scan-tree-dump-times pre "Eliminated: 1" 1 FAIL: gcc.dg/tree-ssa/loadpre3.c scan-tree-dump-times pre "Eliminated: 2" 1 FAIL: gcc.dg/tree-ssa/loadpre4.c scan-tree-dump-times pre "Eliminated: 1" 1 FAIL: gcc.dg/tree-ssa/loadpre6.c scan-tree-dump-times pre "Eliminated: 1" 1 FAIL: gcc.dg/tree-ssa/loadpre6.c scan-tree-dump-times pre "Insertions: 1" 1 FAIL: gcc.dg/tree-ssa/pr21417.c
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 --- Comment #8 from Richard Biener --- Created attachment 57549 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57549=edit prototype fix This is very similar to PR113831. We again have two refs looking seemingly the same: _80 = _109 + 1; _79 = VIEW_CONVERT_EXPR(y)[_80]; _77 = .USUBC (0, _79, _103); and _43 = _109 + 1; _42 = VIEW_CONVERT_EXPR(y)[_43]; _39 = .USUBC (0, _42, _103); so they are structurally entered in the same way into the expression hash table. But since _80 and _43 have different ranges what get_ref_base_and_extent will compute differs - in the case of _109 <= 3 it will make the stmt walking hit the __builtin_memset and record a value number of zero for the expresssion. As we only after that (by bad luck) visit the other reference we successfully look up the existing value from the hashtable during the walk. In the PR113831 the accesses degenerated to a single array element which allowed the fix to work (adjust the expression we put into the hash). But this shows (and I feared that ...) this doesn't work. We either have to make all ranges part of the expression (even if they make a difference in the end) or avoid using ranges alltogether when computing a value for an expression during the walk, most definitely when we walk to different context (but that's hard to specify). Maybe a middle-ground would be to make the get_ref_base_and_extent computed info part of the expression. Like the attached. Lot's of ??? to address though ...
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 --- Comment #7 from Richard Biener --- I will have a look.
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 Jakub Jelinek changed: What|Removed |Added CC||rguenth at gcc dot gnu.org --- Comment #6 from Jakub Jelinek --- pass_pre_slp_scalar_cleanup invokes another copy of FRE and I think this goes wrong in there. The .USUBC calls emitted by bitintlower1 are: _50 = .USUBC (0, _47, _48); _61 = .USUBC (0, _60, _51); _65 = .USUBC (0, 0, _49); _36 = .USUBC (_32, _33, _34); _40 = .USUBC (0, 0, _37); where the first two process even and odd limbs of y from the first .SUB_OVERFLOW, the two second operands are initialized with _47 = VIEW_CONVERT_EXPR(y)[_45]; and _59 = _45 + 1; _60 = VIEW_CONVERT_EXPR(y)[_59]; where _45 is an IV going from 0 to 6 in steps of 2 and y has just y[7] non-zero, all lower limbs zero. The third .USUBC is the final processing of the first .SUB_OVERFLOW and the remaining two are from the second .SUB_OVERFLOW, we can ignore that now. Now, in cunroll we can see some jump threading from earlier passes: _50 = .USUBC (0, _47, _48); _16 = .USUBC (0, _17, _51); _74 = .USUBC (0, _73, _51); _61 = .USUBC (0, _60, _51); _65 = .USUBC (0, 0, _125); but the _48 vs. _51 last operands clearly identify where it is coming from. Now the pre-slp fre4 seems to have replaced the second arguments of the 3 calls with 0s: _50 = .USUBC (0, _47, _48); _16 = .USUBC (0, 0, _51); _74 = .USUBC (0, 0, _51); _61 = .USUBC (0, 0, _51); _65 = .USUBC (0, 0, _125); While that it would be correct to replace _47 with 0, because _45 iterates over the 0, 2, 4 and 6 indexes into the array and the array is known to be 0 there due to __builtin_memset (, 0, 56); that is not the case for VIEW_CONVERT_EXPR(y)[7]. _16 = .USUBC (0, _17, _51); is guarded on _45 <= 3 (aka 0 or 2) and so _17 -> 0 replacement is ok. _74 = .USUBC (0, _73, _51); is guarded on _45 == 4 and VIEW_CONVERT_EXPR(y)[5] is also known to be 0, so _73 -> 0 is ok as well. But in _59 = _45 + 1; _60 = VIEW_CONVERT_EXPR(y)[_59]; _61 = .USUBC (0, _60, _51); either we don't know anything, in that case we need to load, or we know that _45 is 6 and _59 is 7 and VIEW_CONVERT_EXPR(y)[7] is _84, not 0.
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 --- Comment #5 from Jakub Jelinek --- And it works even with -O2 -ftree-loop-vectorize -fno-tree-slp-vectorize and doesn't work with -O2 -fno-tree-loop-vectorize -ftree-slp-vectorize so will need to look at what SLP vectorization does here.
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 --- Comment #4 from Jakub Jelinek --- (In reply to Andrew Pinski from comment #2) > As an aside we at -O3 has: > _87 = .USUBC (_30, 3, 0); > _93 = IMAGPART_EXPR <_87>; > _88 = .USUBC (0, 0, _93); > _29 = IMAGPART_EXPR <_88>; > _187 = .USUBC (0, 0, _29); > _217 = IMAGPART_EXPR <_187>; > _218 = .USUBC (0, 0, _217); > _214 = IMAGPART_EXPR <_218>; > _213 = .USUBC (0, 0, _214); > _212 = IMAGPART_EXPR <_213>; > _200 = .USUBC (0, 0, _212); > _34 = IMAGPART_EXPR <_200>; > _35 = .USUBC (0, 0, _34); > _36 = IMAGPART_EXPR <_35>; > _39 = .USUBC (0, 0, _36); > _40 = REALPART_EXPR <_39>; > _2 = (signed long) _40; > _77 = _2 < 0; > > isn't `.USUBC (0, 0, _93);` just Complex<-_93, _93!=0> ? Or did I > misunderstand USUBC ? It is, but would that result in better generated code in case there is .USUBC (_400, _401, _29); later on? It would be nice to optimize such long series of .USUBC (0, 0, prev) into just copying of the result of the first one such call. For similar long series of .UADDC (0, 0, prev) that would be just the first one and all the rest is 0+0i because it never carries over anymore.
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 --- Comment #3 from Jakub Jelinek --- Indeed. And -O2 -fno-tree-vectorize works. I've changed it to unsigned a, b, c, d, e; unsigned _BitInt(256) f; __attribute__((noipa)) unsigned short bswap16 (int t) { return __builtin_bswap16 (t); } void foo (unsigned z, unsigned _BitInt(512) y, unsigned *r) { unsigned t = __builtin_sub_overflow_p (0, y << 509, f); z *= bswap16 (t); d = __builtin_sub_overflow_p (c, 3, (unsigned _BitInt(512)) 0); unsigned q = z + c + b; unsigned short n = q >> (8 + a); *r = b + e + n; } int main () { unsigned x; foo (8, 2, ); if (x != 8) __builtin_abort (); } and bswap16 is called with 1 with -O2 -fno-tree-vectorize and 0 with -O2, so the problem is either during the computation of y << 509 (but that is fairly simple thing out of bitintlower, set highest limb to the lowest limb << 61 and clear all others), or during the sub overflow.
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 --- Comment #2 from Andrew Pinski --- As an aside we at -O3 has: _87 = .USUBC (_30, 3, 0); _93 = IMAGPART_EXPR <_87>; _88 = .USUBC (0, 0, _93); _29 = IMAGPART_EXPR <_88>; _187 = .USUBC (0, 0, _29); _217 = IMAGPART_EXPR <_187>; _218 = .USUBC (0, 0, _217); _214 = IMAGPART_EXPR <_218>; _213 = .USUBC (0, 0, _214); _212 = IMAGPART_EXPR <_213>; _200 = .USUBC (0, 0, _212); _34 = IMAGPART_EXPR <_200>; _35 = .USUBC (0, 0, _34); _36 = IMAGPART_EXPR <_35>; _39 = .USUBC (0, 0, _36); _40 = REALPART_EXPR <_39>; _2 = (signed long) _40; _77 = _2 < 0; isn't `.USUBC (0, 0, _93);` just Complex<-_93, _93!=0> ? Or did I misunderstand USUBC ?
[Bug tree-optimization/114121] wrong code with _BitInt() arithmetics at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114121 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2024-02-26 --- Comment #1 from Andrew Pinski --- Confirmed. Though I wonder if this could be an issue even without using _BitInt since bitintlower produces the same IR for -O2 and -O3.