[Bug tree-optimization/83518] [8 Regression] Missing optimization: useless instructions should be dropped

2018-01-15 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
   Target Milestone|8.0                         |9.0

--- Comment #6 from Richard Biener  ---
Re-targeting to GCC 9.

[Bug tree-optimization/83518] [8 Regression] Missing optimization: useless instructions should be dropped

2018-01-08 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518

--- Comment #5 from rguenther at suse dot de  ---
On Mon, 8 Jan 2018, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518
> 
> --- Comment #4 from Jakub Jelinek  ---
> Before store-merging we have:
>   MEM[(int *)] = { 5, 4, 3, 2 };
>   t_2 = arr[0];
>   _65 = arr[1];
>   _69 = arr[2];
>   _73 = arr[3];
>   arr[0] = _69;
>   arr[1] = _73;
>   arr[2] = 1;
>   arr[3] = t_2;
>   vect__2.5_38 = MEM[(int *)];
> and all store-merging can do with this is what it does:
>   MEM[(int *)] = { 5, 4, 3, 2 };
>   t_2 = arr[0];
>   _65 = arr[1];
>   _69 = arr[2];
>   _73 = arr[3];
>   _46 = MEM[(int *) + 8B];
>   MEM[(int *)] = _46;
>   arr[2] = 1;
>   arr[3] = t_2;
>   vect__2.5_38 = MEM[(int *)];
> where the _69 and _73 sets can be DCEd later.  store-merging has no framework
> like FRE to analyze what a memory location contains at which point.
> So we'd need another late FRE pass to handle this?

Without enhancing FRE that wouldn't help.  I think the best thing is
to try sanitizing the IL produced by unrolling so SLP vectorization
can do its job here.

Nothing for GCC 8 though.

[Bug tree-optimization/83518] [8 Regression] Missing optimization: useless instructions should be dropped

2018-01-08 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518

--- Comment #4 from Jakub Jelinek  ---
Before store-merging we have:
  MEM[(int *)] = { 5, 4, 3, 2 };
  t_2 = arr[0];
  _65 = arr[1];
  _69 = arr[2];
  _73 = arr[3];
  arr[0] = _69;
  arr[1] = _73;
  arr[2] = 1;
  arr[3] = t_2;
  vect__2.5_38 = MEM[(int *)];
and all store-merging can do with this is what it does:
  MEM[(int *)] = { 5, 4, 3, 2 };
  t_2 = arr[0];
  _65 = arr[1];
  _69 = arr[2];
  _73 = arr[3];
  _46 = MEM[(int *) + 8B];
  MEM[(int *)] = _46;
  arr[2] = 1;
  arr[3] = t_2;
  vect__2.5_38 = MEM[(int *)];
where the _69 and _73 sets can be DCEd later.  store-merging has no framework
like FRE to analyze what a memory location contains at which point.
So we'd need another late FRE pass to handle this?
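
To make the missing analysis concrete, the statements before store-merging
can be transcribed into plain C++ and tracked by hand.  This is only a
sketch: the names final_contents and check_final_contents are hypothetical,
the vector store is assumed to cover arr[0..3], and the fifth array element
is left out because this part of the dump does not show it.

```cpp
#include <array>
#include <cassert>

// Hypothetical transcription of the statements before store-merging.
// Assumption: the vector store covers arr[0..3]; the fifth element of the
// array does not appear in this part of the dump and is left out.
std::array<int, 4> final_contents() {
    std::array<int, 4> arr = {5, 4, 3, 2};  // MEM[...] = { 5, 4, 3, 2 }
    int t_2 = arr[0];  // 5
    int v65 = arr[1];  // 4 (named _65 in the dump)
    int v69 = arr[2];  // 3 (named _69 in the dump)
    int v73 = arr[3];  // 2 (named _73 in the dump)
    (void)v65;         // not used by the stores shown here
    arr[0] = v69;
    arr[1] = v73;
    arr[2] = 1;
    arr[3] = t_2;
    return arr;        // what the final vector load reads back
}

// Self-check of the hand-tracked values.
void check_final_contents() {
    const std::array<int, 4> expected = {3, 2, 1, 5};
    assert(final_contents() == expected);
}
```

Every value here is known at compile time, so a pass that tracked memory
contents across these stores could in principle replace the final load with
the constant { 3, 2, 1, 5 }.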

[Bug tree-optimization/83518] [8 Regression] Missing optimization: useless instructions should be dropped

2018-01-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518

--- Comment #3 from Richard Biener  ---
Yes, it's not really something new but a known issue with late value-numbering.
Note that FRE wouldn't know how to simplify this either; we'd need store-merging
to effectively vectorize the earlier sets.  BB vectorization doesn't do this
because after unrolling we see

  vect_cst__46 = { 5, 4, 3, 2 };
  MEM[(int *)] = vect_cst__46;
  arr[4] = 1;
  t_2 = arr[0];
  arr[0] = 5;
  arr[0] = t_2;
  t_32 = arr[0];
  _65 = arr[1];
  arr[0] = _65;
  arr[1] = t_32;
  t_68 = arr[0];
  _69 = arr[2];
  arr[0] = _69;
  arr[2] = t_68;
  t_72 = arr[0];
  _73 = arr[3];
  arr[0] = _73;
  arr[3] = t_72;
  t_76 = arr[0];
  _77 = arr[4];
  arr[0] = _77;
  arr[4] = t_76;
  i_80 = 1;
  ivtmp_81 = 4;
  pretmp_82 = arr[0];
  t_87 = arr[i_80];
  arr[i_80] = pretmp_82;
...

and BB vectorization is confused by the dead stores (and DSE would be
confused by the missed constant propagations).
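
As a rough illustration of the dead stores, the straight-line part of the
dump (up to arr[4] = t_76) can be transcribed into C++.  This is a sketch,
not the original test case: the function names are hypothetical, and the
vector store is assumed to initialize arr[0..3].

```cpp
#include <array>
#include <cassert>

// Hypothetical transcription of the unrolled statements up to arr[4] = t_76.
// Assumption: the vector store initializes arr[0..3].
std::array<int, 5> after_unrolled_prefix() {
    std::array<int, 5> arr = {5, 4, 3, 2, 0};  // MEM[...] = vect_cst__46
    arr[4] = 1;

    int t_2 = arr[0];
    arr[0] = 5;      // dead store: overwritten by the next statement
    arr[0] = t_2;    // stores back the value that was just loaded

    int t_32 = arr[0];
    int v65 = arr[1];  // _65 in the dump
    arr[0] = v65;
    arr[1] = t_32;     // arr is now {4, 5, 3, 2, 1}

    int t_68 = arr[0];
    int v69 = arr[2];  // _69 in the dump
    arr[0] = v69;
    arr[2] = t_68;     // arr is now {3, 5, 4, 2, 1}

    int t_72 = arr[0];
    int v73 = arr[3];  // _73 in the dump
    arr[0] = v73;
    arr[3] = t_72;     // arr is now {2, 5, 4, 3, 1}

    int t_76 = arr[0];
    int v77 = arr[4];  // _77 in the dump
    arr[0] = v77;
    arr[4] = t_76;     // arr is now {1, 5, 4, 3, 2}
    return arr;
}

// Self-check of the hand-tracked values.
void check_unrolled_prefix() {
    const std::array<int, 5> expected = {1, 5, 4, 3, 2};
    assert(after_unrolled_prefix() == expected);
}
```

Only the last store to each element survives; every earlier store to arr[0]
is overwritten before the end of the sequence.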

[Bug tree-optimization/83518] [8 Regression] Missing optimization: useless instructions should be dropped

2018-01-05 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
Since r253975 we vectorize this and thus do not unroll it.
With -O3 -fno-vect-cost-model we've generated what GCC 8 emits or something
similar for many years, including 4.4+ (r140264 already shows this behavior).

[Bug tree-optimization/83518] [8 Regression] Missing optimization: useless instructions should be dropped

2018-01-05 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
      Known to work|                            |7.2.1
            Version|tree-ssa                    |8.0
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2018-01-05
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
            Summary|Missing optimization:       |[8 Regression] Missing
                   |useless instructions should |optimization: useless
                   |be dropped                  |instructions should be
                   |                            |dropped
   Target Milestone|---                         |8.0

--- Comment #1 from Richard Biener  ---
Works fine with GCC 7.  I suppose the unroller limits are hit and/or we're just
lucky that GCC 7 doesn't vectorize the reduction loop ...

Trunk has

t.C:16:20: note: loop vectorized
t.C:21:10: note: basic block vectorized

resulting in

   [local count: 178992762]:
  MEM[(int *)] = { 5, 4, 3, 2 };
  t_2 = arr[0];
  _65 = arr[1];
  _46 = MEM[(int *) + 8B];
  MEM[(int *)] = _46;
  arr[2] = 1;
  arr[3] = t_2;
  vect__2.5_38 = MEM[(int *)];
  vect_sum_21.8_30 = VEC_PERM_EXPR ;
  vect_sum_21.8_15 = vect_sum_21.8_30 + vect__2.5_38;
  vect_sum_21.8_59 = VEC_PERM_EXPR ;
  vect_sum_21.8_60 = vect_sum_21.8_15 + vect_sum_21.8_59;
  stmp_sum_21.7_61 = BIT_FIELD_REF ;
  sum_27 = stmp_sum_21.7_61 + _65;
  _23 = (unsigned int) sum_27;
  arr ={v} {CLOBBER};
  return _23;

while GCC 7 simply unrolls the loop.  DOM is not able to simplify the
vector load from MEM[(int *)], but it can simplify the scalar loads from
the unrolled variant.
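
The two VEC_PERM_EXPR/add pairs followed by BIT_FIELD_REF look like a
shift-and-add reduction of the four vector lanes.  The operands are missing
from the dump above, so the shift amounts and the extracted lane in this
sketch are assumptions, as are all the names (V4, shift_down, add,
reduce_sketch).  The inputs { 3, 2, 1, 5 } and 4 come from following the
stores in the dump by hand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V4 = std::array<int, 4>;

// Assumed permute: move lanes down by n positions, filling with zeros.
V4 shift_down(const V4& v, std::size_t n) {
    V4 r = {0, 0, 0, 0};
    for (std::size_t i = 0; i + n < v.size(); ++i) r[i] = v[i + n];
    return r;
}

// Lane-wise addition of two vectors.
V4 add(const V4& a, const V4& b) {
    V4 r = {0, 0, 0, 0};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] + b[i];
    return r;
}

// Sketch of the reduction: two permute+add steps, take lane 0,
// then add the scalar that was kept outside the vector (_65 in the dump).
int reduce_sketch(const V4& v, int scalar_tail) {
    V4 s = add(shift_down(v, 2), v);  // assumed shift by two lanes
    s = add(s, shift_down(s, 1));     // assumed shift by one lane
    return s[0] + scalar_tail;        // assumed: lane 0 holds the total
}

// Self-check with the hand-tracked values.
void check_reduce_sketch() {
    const V4 v = {3, 2, 1, 5};
    assert(reduce_sketch(v, 4) == 15);
}
```

With those inputs the whole function would fold to return 15, which is what
the scalar, fully unrolled variant allows the compiler to see.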