[Bug middle-end/90796] [8/9/10 Regression] GCC: O2 vs O3 output differs on simple test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796

--- Comment #12 from Michael Matz ---
Author: matz
Date: Tue Oct 22 12:25:03 2019
New Revision: 277287

URL: https://gcc.gnu.org/viewcvs?rev=277287&root=gcc&view=rev
Log:
Fix PR middle-end/90796

        PR middle-end/90796
        * gimple-loop-jam.c (any_access_function_variant_p): New function.
        (adjust_unroll_factor): Use it to constrain safety, new parameter.
        (tree_loop_unroll_and_jam): Adjust call and profitable unroll factor.

testsuite/
        * gcc.dg/unroll-and-jam.c: Add three invalid and one valid case.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/gimple-loop-jam.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/unroll-and-jam.c
[Bug middle-end/90796] [8/9/10 Regression] GCC: O2 vs O3 output differs on simple test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796

--- Comment #11 from Michael Matz ---
(In reply to rguent...@suse.de from comment #10)
> >It's the only affine functions that don't progress with each iteration.
> >I think, at least :)
>
> Hm. At least we analyze wrapping ones, but I guess 0, 1, 0, 1 would be
> caught in another way..

Yes, we analyze them, but for nothing. They aren't affine either, and hence
result in unknown dependences.
[Bug middle-end/90796] [8/9/10 Regression] GCC: O2 vs O3 output differs on simple test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796

--- Comment #10 from rguenther at suse dot de ---
On August 6, 2019 5:36:49 PM GMT+02:00, "matz at gcc dot gnu.org" wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796
>
>--- Comment #9 from Michael Matz ---
>(In reply to rguent...@suse.de from comment #8)
>> >The fun thing is, there's a difference between these two loop nests:
>> >
>> >  for (i) for (j) a[i][0] = f(a[i+1][0]);
>> >  for (i) for (j) b[i][j] = f(a[i+1][j]);
>>
>> What about
>>
>> B[i][j/2]...
>>
>> ?
>
>That would be a problem as well, but luckily that's not an affine function
>of j, and hence has no analyzable access function, and so isn't fused for
>different reasons.
>
>> It's really surprising that only invariants are special here.
>
>It's the only affine functions that don't progress with each iteration.
>I think, at least :)

Hm. At least we analyze wrapping ones, but I guess 0, 1, 0, 1 would be
caught in another way..
[Bug middle-end/90796] [8/9/10 Regression] GCC: O2 vs O3 output differs on simple test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796

--- Comment #9 from Michael Matz ---
(In reply to rguent...@suse.de from comment #8)
> >The fun thing is, there's a difference between these two loop nests:
> >
> >  for (i) for (j) a[i][0] = f(a[i+1][0]);
> >  for (i) for (j) b[i][j] = f(a[i+1][j]);
>
> What about
>
> B[i][j/2]...
>
> ?

That would be a problem as well, but luckily that's not an affine function
of j, and hence has no analyzable access function, and so isn't fused for
different reasons.

> It's really surprising that only invariants are special here.

It's the only affine functions that don't progress with each iteration.
I think, at least :)
[Bug middle-end/90796] [8/9/10 Regression] GCC: O2 vs O3 output differs on simple test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796

--- Comment #8 from rguenther at suse dot de ---
On August 5, 2019 9:53:48 PM GMT+02:00, "matz at gcc dot gnu.org" wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796
>
>--- Comment #7 from Michael Matz ---
>Created attachment 46675
>  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46675&action=edit
>potential patch
>
>Actually I was barking up the wrong tree. It's not as easy as the CFG
>manipulation for loop fusion going wrong (like missing some last iterations
>or so). It's really a problem in the dependence analysis. See the extensive
>comment in the patch.
>
>The fun thing is, there's a difference between these two loop nests:
>
>  for (i) for (j) a[i][0] = f(a[i+1][0]);
>  for (i) for (j) b[i][j] = f(a[i+1][j]);

What about

B[i][j/2]...

?

It's really surprising that only invariants are special here.

>Even though the distance vector for the read/write in the single statement
>is (-1,0) for both loops, unroll-and-jam is valid for the second but not
>for the first.
[Bug middle-end/90796] [8/9/10 Regression] GCC: O2 vs O3 output differs on simple test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796

--- Comment #7 from Michael Matz ---
Created attachment 46675
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46675&action=edit
potential patch

Actually I was barking up the wrong tree. It's not as easy as the CFG
manipulation for loop fusion going wrong (like missing some last iterations
or so). It's really a problem in the dependence analysis. See the extensive
comment in the patch.

The fun thing is, there's a difference between these two loop nests:

  for (i) for (j) a[i][0] = f(a[i+1][0]);
  for (i) for (j) b[i][j] = f(a[i+1][j]);

Even though the distance vector for the read/write in the single statement
is (-1,0) for both loops, unroll-and-jam is valid for the second but not
for the first.
[Bug middle-end/90796] [8/9/10 Regression] GCC: O2 vs O3 output differs on simple test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
[Bug middle-end/90796] [8/9/10 Regression] GCC: O2 vs O3 output differs on simple test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796

Michael Matz changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |matz at gcc dot gnu.org

--- Comment #6 from Michael Matz ---
I think I know what's going on. Mine.
[Bug middle-end/90796] [8/9/10 Regression] GCC: O2 vs O3 output differs on simple test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796

--- Comment #5 from Michael Matz ---
FWIW, the reduced testcase from comment #3 is wrong. Even with -O0 or with
gcc 4.3 or 6 I get:

b:48
Aborted (core dumped)

But I can reproduce the problem with the original testcase.
[Bug middle-end/90796] [8/9/10 Regression] GCC: O2 vs O3 output differs on simple test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796

--- Comment #4 from Richard Biener ---
Hmm, the CFG looks like unroll-and-jam attempts to do versioning/peeling but
forgets the tail loop is executed at least once?
[Bug middle-end/90796] [8/9/10 Regression] GCC: O2 vs O3 output differs on simple test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org,
                   |                            |matz at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
      Known to work|                            |7.4.0
   Target Milestone|---                         |8.4
      Known to fail|                            |10.0, 8.3.0, 9.1.0

--- Comment #3 from Martin Liška ---
Yes, it started with r255467. There's a simplified test-case:

$ cat pr90796.c
unsigned b[11];
unsigned c;
int d, e, f;
char en;

int main()
{
  char b[100];
  for (; e < 6; e += 3)
    {
      __builtin_sprintf(b, "%u", b[0]);
      for (; c < 9; c++)
        for (d = 2; d < 11; d++)
          {
            f = b[c + 2] ^ 9;
            b[c] = f;
          }
    }
  __builtin_printf("b:%s\n", b);
  if (__builtin_strcmp (b, "9") != 0)
    __builtin_abort ();
  return 0;
}

$ gcc pr90796.c -O3 && ./a.out
b:0
Aborted (core dumped)
[Bug middle-end/90796] [8/9/10 Regression] GCC: O2 vs O3 output differs on simple test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796

--- Comment #2 from Marc Glisse ---
-fdisable-tree-unrolljam helps, which may (or may not) point at a potential
culprit.
[Bug middle-end/90796] [8/9/10 Regression] GCC: O2 vs O3 output differs on simple test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796

Marc Glisse changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-06-09
            Summary|GCC: O2 vs O3 output        |[8/9/10 Regression] GCC: O2
                   |differs on simple test      |vs O3 output differs on
                   |                            |simple test
     Ever confirmed|0                           |1

--- Comment #1 from Marc Glisse ---
Confirmed. I also notice a missed optimization:

  _16 = ivtmp.21_234 + 2;
  _2 = b[_16];
  _41 = _2 ^ 9;
  b[ivtmp.21_234] = _41;
  _52 = b[_16];
  _51 = _52 ^ 9;
  b[ivtmp.21_234] = _51;
  _65 = b[_16];
etc.

it seems clear from the first line that b[_16] and b[ivtmp.21_234] cannot
alias, so this should simplify. And it isn't just because cunroll is late, we
don't simplify it either if I write directly this pattern in C.