[Bug middle-end/90796] [8/9/10 Regression] GCC: O2 vs O3 output differs on simple test

2019-10-22 matz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90796

--- Comment #12 from Michael Matz  ---
Author: matz
Date: Tue Oct 22 12:25:03 2019
New Revision: 277287

URL: https://gcc.gnu.org/viewcvs?rev=277287&root=gcc&view=rev
Log:
Fix PR middle-end/90796

PR middle-end/90796
* gimple-loop-jam.c (any_access_function_variant_p): New function.
(adjust_unroll_factor): Use it to constrain safety, new parameter.
(tree_loop_unroll_and_jam): Adjust call and profitable unroll factor.

testsuite/
* gcc.dg/unroll-and-jam.c: Add three invalid and one valid case.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/gimple-loop-jam.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/unroll-and-jam.c

2019-08-07 matz at gcc dot gnu.org

--- Comment #11 from Michael Matz  ---
(In reply to rguent...@suse.de from comment #10)
> >It's the only affine functions that don't progress with each iteration.
> >I think, at least :)
> 
> Hm. At least we analyze wrapping ones, but I guess 0, 1, 0, 1 would be
> caught in another way.

Yes, we analyze them, but to no avail: they aren't affine either, and hence
result in unknown dependences.

2019-08-06 rguenther at suse dot de

--- Comment #10 from rguenther at suse dot de  ---
On August 6, 2019 5:36:49 PM GMT+02:00, "matz at gcc dot gnu.org" wrote:
>> It's really surprising that only invariants are special here.
>
>It's the only affine functions that don't progress with each iteration.
>I think, at least :)

Hm. At least we analyze wrapping ones, but I guess 0, 1, 0, 1 would be caught
in another way.

2019-08-06 matz at gcc dot gnu.org

--- Comment #9 from Michael Matz  ---
(In reply to rguent...@suse.de from comment #8)
> >The fun thing is, there's a difference between these two loop nests:
> >
> >   for (i) for (j) a[i][0] = f(a[i+1][0]);
> >   for (i) for (j) b[i][j] = f(a[i+1][j]);
> 
> What about
> 
>   B[i][j/2]...
> 
> ?

That would be a problem as well, but luckily that's not an affine function of
j, so it has no analyzable access function and isn't fused, for different
reasons.

> It's really surprising that only invariants are special here.

It's the only affine functions that don't progress with each iteration.
I think, at least :)
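The claim about invariants can be checked with a short computation on a generic affine access function of the inner index j. The coefficients c_0 and c_1 are symbols introduced here, not taken from the thread.

```latex
\begin{aligned}
f(j) &= c_0 + c_1 j, \qquad f(j+1) - f(j) = c_1 \\
f(j+1) = f(j) \ \text{for every } j &\iff c_1 = 0 \quad \text{(an inner-invariant access)} \\
\lfloor j/2 \rfloor &= 0, 0, 1, 1, 2, 2, \dots \quad \text{(steps } 0, 1, 0, 1, \dots \text{, not constant, hence not affine)}
\end{aligned}
```

An affine access with nonzero c_1 therefore touches a new element on every inner iteration, while c_1 = 0 touches the same element every time. That second case is the one where a reported inner distance of 0 does not describe the real dependence.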

2019-08-06 rguenther at suse dot de

--- Comment #8 from rguenther at suse dot de  ---
On August 5, 2019 9:53:48 PM GMT+02:00, "matz at gcc dot gnu.org" wrote:
>The fun thing is, there's a difference between these two loop nests:
>
>   for (i) for (j) a[i][0] = f(a[i+1][0]);
>   for (i) for (j) b[i][j] = f(a[i+1][j]);

What about

  B[i][j/2]...

? It's really surprising that only invariants are special here.


2019-08-05 matz at gcc dot gnu.org

--- Comment #7 from Michael Matz  ---
Created attachment 46675
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46675&action=edit
potential patch

Actually I was barking up the wrong tree.  It's not as easy as the CFG
manipulation for loop fusion going wrong (like missing some last iterations
or so).  It's really a problem in the dependence analysis.  See the extensive
comment in the patch.

The fun thing is, there's a difference between these two loop nests:

   for (i) for (j) a[i][0] = f(a[i+1][0]);
   for (i) for (j) b[i][j] = f(a[i+1][j]);

Even though the distance vector for the read/write in the single statement
is (-1,0) for both loops, unroll-and-jam is valid for the second but not
for the first.

2019-07-24 rguenth at gcc dot gnu.org

Richard Biener  changed:

              What|Removed                       |Added
----------------------------------------------------------------------------
          Priority|P3                            |P2

2019-06-12 matz at gcc dot gnu.org

Michael Matz  changed:

              What|Removed                       |Added
----------------------------------------------------------------------------
            Status|NEW                           |ASSIGNED
          Assignee|unassigned at gcc dot gnu.org |matz at gcc dot gnu.org

--- Comment #6 from Michael Matz  ---
I think I know what's going on.  Mine.

2019-06-11 matz at gcc dot gnu.org

--- Comment #5 from Michael Matz  ---
FWIW, the reduced testcase from comment #3 is wrong.  Even with -O0 or with gcc
4.3 or 6 I get:

b:48
Aborted (core dumped)


But I can reproduce the problem with the original testcase.

2019-06-11 rguenth at gcc dot gnu.org

--- Comment #4 from Richard Biener  ---
Hmm, the CFG looks like unroll-and-jam attempts to do versioning/peeling
but forgets the tail loop is executed at least once?

2019-06-10 marxin at gcc dot gnu.org

Martin Liška  changed:

              What|Removed                       |Added
----------------------------------------------------------------------------
                CC|                              |marxin at gcc dot gnu.org,
                  |                              |matz at gcc dot gnu.org,
                  |                              |rguenth at gcc dot gnu.org
     Known to work|                              |7.4.0
  Target Milestone|---                           |8.4
     Known to fail|                              |10.0, 8.3.0, 9.1.0

--- Comment #3 from Martin Liška  ---
Yes, it started with r255467.
There's a simplified test-case:

$ cat pr90796.c
unsigned b[11];
unsigned c;
int d, e, f;
char en;

int main() {
  char b[100];
  for (; e < 6; e += 3) {
    __builtin_sprintf(b, "%u", b[0]);
    for (; c < 9; c++)
      for (d = 2; d < 11; d++) {
        f = b[c + 2] ^ 9;
        b[c] = f;
      }
  }
  __builtin_printf("b:%s\n", b);
  if (__builtin_strcmp (b, "9") != 0)
    __builtin_abort ();
  return 0;
}

$ gcc pr90796.c -O3 && ./a.out
b:0
Aborted (core dumped)

2019-06-09 glisse at gcc dot gnu.org

--- Comment #2 from Marc Glisse  ---
-fdisable-tree-unrolljam helps, which may (or may not) point at a potential
culprit.

2019-06-09 glisse at gcc dot gnu.org

Marc Glisse  changed:

              What|Removed                       |Added
----------------------------------------------------------------------------
          Keywords|                              |wrong-code
            Status|UNCONFIRMED                   |NEW
  Last reconfirmed|                              |2019-06-09
           Summary|GCC: O2 vs O3 output          |[8/9/10 Regression] GCC: O2
                  |differs on simple test        |vs O3 output differs on
                  |                              |simple test
    Ever confirmed|0                             |1

--- Comment #1 from Marc Glisse  ---
Confirmed.

I also notice a missed optimization:

  _16 = ivtmp.21_234 + 2;
  _2 = b[_16];
  _41 = _2 ^ 9;
  b[ivtmp.21_234] = _41;
  _52 = b[_16];
  _51 = _52 ^ 9;
  b[ivtmp.21_234] = _51;
  _65 = b[_16];
etc.

It seems clear from the first line that b[_16] and b[ivtmp.21_234] cannot
alias, so this should simplify. And it isn't just because cunroll runs late:
we don't simplify it either if I write this pattern directly in C.