[Bug tree-optimization/112457] Possible better vectorization of different reduction min/max reduction

2024-01-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112457

--- Comment #5 from Richard Biener  ---
You want to find the duplicate bug report for the min/max + index reductions.
IIRC the issue is that we fail the reduction detection because of multi-use,
and we should really have two conditional reductions, one on the value and
one on the index, without trying to be too clever by combining them into a
single one.

That is, don't try to invent something completely new based on what LLVM does,
but understand what's missing in GCC's handling of conditional reductions
(it can do conditional value and conditional index reductions just fine,
just not both at the same time, IIRC).
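
As a rough illustration of the three loop shapes discussed here, the sketch
below shows a conditional value reduction, a conditional index reduction, and
the combined min + index form. The function names are hypothetical, and
whether each loop is actually vectorized depends on the GCC version, target
and flags; the code only shows the source patterns, not what the vectorizer
emits.

```c
#include <stddef.h>

/* Conditional value reduction: only the running minimum is carried.  */
int cond_min (const int *a, size_t n)
{
  int best = a[0];
  for (size_t i = 1; i < n; i++)
    if (a[i] < best)
      best = a[i];
  return best;
}

/* Conditional index reduction: only the index is carried; the condition
   compares against a loop-invariant threshold.  Returns -1 if no element
   is below the threshold.  */
int last_index_below (const int *a, size_t n, int threshold)
{
  int idx = -1;
  for (size_t i = 0; i < n; i++)
    if (a[i] < threshold)
      idx = (int) i;
  return idx;
}

/* Combined form: `best` feeds both the compare and the value result, and
   the index is updated under the same condition (the multi-use case).  */
void min_with_index (const int *a, size_t n, int *min_out, size_t *idx_out)
{
  int best = a[0];
  size_t best_idx = 0;
  for (size_t i = 1; i < n; i++)
    if (a[i] < best)
      {
        best = a[i];
        best_idx = i;
      }
  *min_out = best;
  *idx_out = best_idx;
}
```

Compiling each function separately with -O3 -fopt-info-vec-missed is one way
to see which of the three shapes the vectorizer accepts on a given target.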

2024-01-02 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112457

Xi Ruoyao  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xry111 at gcc dot gnu.org

--- Comment #4 from Xi Ruoyao  ---
There is also:

double
test (double *p)
{
  double ret = p[0];
  for (int i = 1; i < 4; i++)
    ret = __builtin_fmin (ret, p[i]);
  return ret;
}

This is not vectorized.

And

double
test (double *p)
{
  double ret = __builtin_inf(); /* or __builtin_nan("") */
  for (int i = 0; i < 4; i++)
    ret = __builtin_fmin (ret, p[i]);
  return ret;
}

is compiled to:

  _16 = .REDUC_FMIN (vect__4.7_17);
  _22 = .REDUC_FMIN ({  Inf,  Inf,  Inf,  Inf }); 
  _20 = .FMIN (_16, _22); [tail call]
  return _20;

So there is a redundant .FMIN operation.
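
To make the redundancy concrete, here is a scalar sketch of the GIMPLE above.
The helpers fmin_like and reduc_fmin4 are hypothetical stand-ins for .FMIN
and .REDUC_FMIN (fmin_like follows the C fmin rule that a NaN argument is
treated as missing data); they are not GCC internals.

```c
#include <math.h>

/* Hypothetical stand-in for .FMIN with C fmin semantics: if exactly one
   argument is NaN, the other one is returned.  */
static double fmin_like (double a, double b)
{
  if (a != a)
    return b;
  if (b != b)
    return a;
  return a < b ? a : b;
}

/* Hypothetical stand-in for .REDUC_FMIN over a 4-lane vector.  */
static double reduc_fmin4 (const double v[4])
{
  return fmin_like (fmin_like (v[0], v[1]), fmin_like (v[2], v[3]));
}

/* Mirrors the emitted code: reduce the data, reduce a splat of +Inf,
   then combine the two partial results.  */
double test_as_emitted (const double *p)
{
  const double inf4[4] = { INFINITY, INFINITY, INFINITY, INFINITY };
  double r16 = reduc_fmin4 (p);
  double r22 = reduc_fmin4 (inf4);  /* always +Inf */
  return fmin_like (r16, r22);
}

/* The same reduction without the trailing combine.  */
double test_simplified (const double *p)
{
  return reduc_fmin4 (p);
}
```

The two functions agree whenever at least one element of p is not NaN. With
the +Inf initializer, the all-NaN input is the one case where the final
combine changes the result (NaN becomes +Inf), so any folding would have to
take fmin's NaN handling into account.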

2024-01-02 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112457

--- Comment #3 from JuzheZhong  ---
Created attachment 56973
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56973&action=edit
min/max reduction approach with index

Hi, Richi.

I have watched all the slides and videos from the 2023 LLVM Developers'
Meeting.

It turns out they already have a feasible approach for supporting min/max
reduction with index.

Is it OK if I implement it by following the LLVM approach?

The attachment is the slide deck from the LLVM Developers' Meeting that
covers min/max reduction with index.

Thanks.

2023-11-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112457

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |tree-optimization
             Blocks|                            |53947

--- Comment #2 from Richard Biener  ---
Well, this is because MAX_EXPR detection fails when store motion inserts flags
(the max = max is elided) to avoid store-data races.  With -Ofast we avoid
this, but the next phiopt pass then comes too late to discover MAX after
store motion is applied.

The more practical example is

int foo2 (int max, int n, int * __restrict a)
{
  for (int i = 0; i < n; ++i)
    if (max < a[i]) {
      max = a[i];
    }
  return max;
}

and that's handled OK.  For your second example, the index reduction, there
are already bug reports.
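
For readers unfamiliar with the store-motion effect mentioned at the start of
this comment, the following is a rough source-level picture of it. The global
max_g, the function names and the flag variable are all hypothetical, and the
second function is a hand-written approximation of the transformation, not
actual GCC output.

```c
/* Hypothetical global that the loop conditionally stores to.  */
int max_g;

/* Source form: the store to max_g happens only when the condition holds.  */
void update_max_source (int n, int * __restrict a)
{
  for (int i = 0; i < n; ++i)
    if (max_g < a[i])
      max_g = a[i];
}

/* Approximate shape after store motion: the value lives in a local, and a
   flag records whether the original loop would have stored.  The final
   store is guarded by the flag so that no store is introduced on paths
   where the source had none (avoiding a store-data race).  */
void update_max_after_sm (int n, int * __restrict a)
{
  int tmp = max_g;
  int stored = 0;
  for (int i = 0; i < n; ++i)
    if (tmp < a[i])
      {
        tmp = a[i];
        stored = 1;  /* extra side effect inside the conditional block */
      }
  if (stored)
    max_g = tmp;
}
```

Both functions leave the same value in max_g; the point is only that the
flag assignment sits next to the conditional update, which is the kind of
extra statement that gets in the way of recognizing the update as a MAX_EXPR.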


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations