[Bug tree-optimization/112736] [14 Regression] vectorizer is introducing out of bounds memory access

2023-12-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112736

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Richard Biener  ---
Fixed.

[Bug tree-optimization/112736] [14 Regression] vectorizer is introducing out of bounds memory access

2023-12-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112736

--- Comment #4 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:6d0b0806eb638447c3184c59d996c2f178553d45

commit r14-6459-g6d0b0806eb638447c3184c59d996c2f178553d45
Author: Richard Biener 
Date:   Mon Dec 11 14:39:48 2023 +0100

tree-optimization/112736 - avoid overread with non-grouped SLP load

The following avoids over/under-read of storage when vectorizing
a non-grouped load with SLP.  Instead of forcing peeling for gaps
use a smaller load for the last vector which might access excess
elements.  This builds upon the existing optimization avoiding
peeling for gaps, generalizing it to all gap widths leaving a
power-of-two remaining number of elements (but it doesn't replace
or improve that particular case at this point).

I wonder if the poly relational compares I set up are good enough
to guarantee /* remain should now be > 0 and < nunits.  */.

There is existing test coverage that runs into /* DR will be unused.  */
always when the gap is wider than nunits.  Compared to the
existing gap == nunits/2 case this only adjusts the load that will
cause the overrun at the end, not every load.  Apart from the
poly relational compares it should reliably cover these cases but
I'll leave it for stage1 to remove.

PR tree-optimization/112736
* tree-vect-stmts.cc (vectorizable_load): Extend optimization
to avoid peeling for gaps to handle single-element non-groups
we now allow with SLP.

* gcc.dg/torture/pr112736.c: New testcase.
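
The "smaller load for the last vector" idea from the commit message can be illustrated outside the compiler. This is only a minimal sketch in plain C, not the code added to vectorizable_load: NUNITS and load_partial are hypothetical names, and zero-filling the unused lanes is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUNITS 4  /* lanes per vector; assumed for illustration */

/* Fill one NUNITS-wide vector from SRC while reading only REMAIN
   elements, so nothing past src[remain - 1] is touched.  The lanes
   that are not loaded are zero-filled.  */
static void
load_partial (int *vec, const int *src, size_t remain)
{
  /* Mirrors the "remain should now be > 0 and < nunits" comment.  */
  assert (remain > 0 && remain < NUNITS);
  memset (vec, 0, NUNITS * sizeof (int));
  memcpy (vec, src, remain * sizeof (int));
}
```

With remain == 1 only a single int is read from src, where a full-width load would read NUNITS ints and could run off the end of the object.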

[Bug tree-optimization/112736] [14 Regression] vectorizer is introducing out of bounds memory access

2023-12-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112736

--- Comment #3 from Richard Biener  ---
Runtime testcase:

#include <sys/mman.h>
#include <unistd.h>

int a, c[3][5];

void __attribute__((noipa))
fn1 (int * __restrict b)
{
  int e;
  for (a = 2; a >= 0; a--)
    for (e = 0; e < 4; e++)
      c[a][e] = b[a];
}

int main()
{
  long pgsz = sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, pgsz * 2, PROT_READ|PROT_WRITE,
                  MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
  if (p == MAP_FAILED)
    return 0;
  /* Make the first page inaccessible so any read before b faults.  */
  mprotect (p, pgsz, PROT_NONE);
  /* b starts exactly at the beginning of the second, accessible page.  */
  fn1 (p + pgsz);
  return 0;
}

[Bug tree-optimization/112736] [14 Regression] vectorizer is introducing out of bounds memory access

2023-11-28 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112736

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener  ---
The vectorizer sees

   [local count: 214748368]:
  # a.3_5 = PHI <_2(5), 2(2)>
  # ivtmp_9 = PHI 
  _14 = b[a.3_5];
  c[a.3_5][0] = _14;
  c[a.3_5][1] = _14;
  c[a.3_5][2] = _14;
  c[a.3_5][3] = _14;
  _2 = a.3_5 + -1;
  ivtmp_3 = ivtmp_9 - 1;
  if (ivtmp_3 != 0)
    goto ; [89.00%]
  else
    goto ; [11.00%]

   [local count: 191126048]:
  goto ; [100.00%]

and uses SLP; this is likely caused by my patch to allow non-grouped loads
there.

t.c:7:17: note:   node 0x4637048 (max_nunits=4, refcnt=1) vector(4) int
t.c:7:17: note:   op template: _14 = b[a.3_5];
t.c:7:17: note: stmt 0 _14 = b[a.3_5];
t.c:7:17: note: stmt 1 _14 = b[a.3_5];
t.c:7:17: note: stmt 2 _14 = b[a.3_5];
t.c:7:17: note: stmt 3 _14 = b[a.3_5];
t.c:7:17: note: load permutation { 0 0 0 0 }

I think we need to force strided-SLP for them.
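
To spell out what the load permutation { 0 0 0 0 } in the dump above means, here is a minimal sketch in plain C. It is not vectorizer code, and apply_load_permutation is a hypothetical helper: each output lane i takes its value from lane perm[i] of the loaded vector.

```c
#include <assert.h>
#include <stddef.h>

/* out[i] = loaded[perm[i]] for each of the N lanes.  */
static void
apply_load_permutation (int *out, const int *loaded,
                        const unsigned *perm, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      assert (perm[i] < n);
      out[i] = loaded[perm[i]];
    }
}
```

With perm = { 0, 0, 0, 0 } every output lane is a copy of lane 0, so the other three lanes of a full-width load are read from memory but never used. Those unused lanes are the ones that can fall outside the object being read.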

[Bug tree-optimization/112736] [14 Regression] vectorizer is introducing out of bounds memory access

2023-11-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112736

Andrew Pinski  changed:

   What|Removed |Added

Summary|vectorizer is introducing   |[14 Regression] vectorizer
   |out of bounds memory access |is introducing out of
   ||bounds memory access
  Known to work||13.1.0
  Known to fail||14.0
   Last reconfirmed||2023-11-27
   Keywords||needs-bisection, wrong-code
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Target Milestone|--- |14.0

--- Comment #1 from Andrew Pinski  ---
  vect__14.12_2 = MEM  [(int *) + -4B];
  vect__14.14_16 = VEC_PERM_EXPR ;


This might be OK, unless the address before b is unaligned and whatever lies before b is unmapped.

  # vectp_b.10_23 = PHI  [(void *) + -4B](2)>
  vect__14.12_1 = MEM  [(int *)vectp_b.10_23];
  vect__14.13_10 = VEC_PERM_EXPR ;
  vectp_b.10_11 = vectp_b.10_23 + 12;
  vect__14.14_12 = VEC_PERM_EXPR ;


Note GCC 13 was ok:
  _1 = b[2];
  _2 = {_1, _1, _1, _1};
  MEM  [(int *) + 40B] = _2;
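
For comparison, the GCC 13 form above (one scalar load of b[2], a four-lane constructor, one store at + 40B) can be written out in plain C. This is a sketch under two assumptions: int is 4 bytes, and the base of that store, which is missing from the dump, is c. In that case 40B is the offset of c[2][0], since 2 rows * 5 ints * 4 bytes = 40. The names row_offset and splat_store_row are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

int c[3][5];

/* Byte offset of c[row][0] from the start of c.  */
static size_t
row_offset (size_t row)
{
  assert (row < 3);
  return (size_t) ((char *) &c[row][0] - (char *) &c[0][0]);
}

/* One scalar load of b[row], copied into the first four lanes of
   c[row].  Only b[row] itself is read.  */
static void
splat_store_row (size_t row, const int *b)
{
  int v = b[row];
  for (int e = 0; e < 4; e++)
    c[row][e] = v;
}
```

Because only b[row] is read, this form cannot touch memory before or after b, unlike the full-width vector load at + -4B shown earlier.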