Re: [PATCH] Implement constant-folding simplifications of reductions.

2022-02-21 Thread Jeff Law via Gcc-patches




On 2/21/2022 3:55 AM, Richard Biener via Gcc-patches wrote:

On Mon, Feb 21, 2022 at 9:31 AM Roger Sayle  wrote:


Hi Marc,
I'm assuming that the use (semantics) of a REDUC_PLUS expr allow the
reduction to be done in any order, for example the testcase requires
-ffast-math to allow the REDUC_PLUS to be introduced in the first place.
This also applies explains why the patch doesn't need to distinguish
negative zeros from positive zeros in ctor_single_nonzero_element
(but it's perhaps something to beware of in uses of VECTOR_CST's
single_nonzero_element).

.IFN_REDUC_PLUS directly maps to the reduc_plus_scal optab
which does not specify an operation order.  There's also
fold_left_plus which does (left-to-right).
fold-left-plus is meant to be a strictly in-order reduction, which 
implies (at least to me) that REDUC_PLUS allows reassociation.

An argument could be made that constant folding should do what
the CPU will end up doing but we're already not doing that for
double arithmetic and flush-to-zero enabled with -ffast-math and
denormals I think.
Right.  We can also lose inexact state when folding, but I think that's 
largely considered OK as well.


jeff



Re: [PATCH] Implement constant-folding simplifications of reductions.

2022-02-21 Thread Richard Biener via Gcc-patches
On Mon, Feb 21, 2022 at 9:31 AM Roger Sayle  wrote:
>
>
> Hi Marc,
> I'm assuming that the use (semantics) of a REDUC_PLUS expr allow the
> reduction to be done in any order, for example the testcase requires
> -ffast-math to allow the REDUC_PLUS to be introduced in the first place.
> This also applies explains why the patch doesn't need to distinguish
> negative zeros from positive zeros in ctor_single_nonzero_element
> (but it's perhaps something to beware of in uses of VECTOR_CST's
> single_nonzero_element).

.IFN_REDUC_PLUS directly maps to the reduc_plus_scal optab
which does not specify an operation order.  There's also
fold_left_plus which does (left-to-right).

An argument could be made that constant folding should do what
the CPU will end up doing but we're already not doing that for
double arithmetic and flush-to-zero enabled with -ffast-math and
denormals I think.

+/* Return the single non-zero element of a CONSTRUCTOR or NULL_TREE.  */
+tree
+ctor_single_nonzero_element (const_tree t)
+{
+  unsigned HOST_WIDE_INT idx;
+  constructor_elt *ce;
+  tree elt = NULL_TREE;
+
+  if (TREE_CODE (t) == SSA_NAME)
+{
+  gassign *def = dyn_cast  (SSA_NAME_DEF_STMT (t));

I think this function should expect a CONSTRUCTOR 'T' and the use in
match.pd be adjusted similar to other CONSTRUCTOR users like

(simplify
 (BIT_FIELD_REF CONSTRUCTOR@0 @1 @2)
...
  (with
   {
 tree ctor = (TREE_CODE (@0) == SSA_NAME
  ? gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0)) : @0);


+/* Fold reduction of a single nonzero element constructor.  */
+(for reduc (IFN_REDUC_PLUS IFN_REDUC_IOR IFN_REDUC_XOR)
+  (simplify (reduc (CONSTRUCTOR@0))
+(with { tree elt = ctor_single_nonzero_element (@0); }
+  (if (elt)
+(non_lvalue { elt; })

is (non_value... ) really necessary?  I highly doubt so.

I wonder if there's a case like { _1, 0., 0., -0. } where independent on
the order of the operations this transform relies on !HONOR_SIGNED_ZEROS?
It likely also should demote _1 from sNaN to qNaN and thus relies on
-fno-signalling-nans?

Otherwise OK.

Thanks,
Richard.

> Cheers,
> Roger
> --
>
> > -Original Message-
> > From: Marc Glisse 
> > Sent: 21 February 2022 08:21
> > To: Roger Sayle 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] Implement constant-folding simplifications of
> reductions.
> >
> > On Mon, 21 Feb 2022, Roger Sayle wrote:
> >
> > > +/* Fold REDUC (@0 op VECTOR_CST) as REDUC (@0) op REDUC
> > (VECTOR_CST).
> > > +*/ (for reduc (IFN_REDUC_PLUS IFN_REDUC_MAX IFN_REDUC_MIN
> > IFN_REDUC_FMAX
> > > +IFN_REDUC_FMIN IFN_REDUC_AND IFN_REDUC_IOR
> > IFN_REDUC_XOR)
> > > + op (plus max min IFN_FMAX IFN_FMIN bit_and bit_ior bit_xor)
> > > +  (simplify (reduc (op @0 VECTOR_CST@1))
> > > +(op (reduc:type @0) (reduc:type @1
> >
> > I wonder if we need to test flag_associative_math for the 'plus' case, or
> if the
> > presence of IFN_REDUC_PLUS is enough to justify the possible loss of
> precision.
> >
> > --
> > Marc Glisse
>


RE: [PATCH] Implement constant-folding simplifications of reductions.

2022-02-21 Thread Roger Sayle


Hi Marc,
I'm assuming that the use (semantics) of a REDUC_PLUS expr allow the
reduction to be done in any order, for example the testcase requires 
-ffast-math to allow the REDUC_PLUS to be introduced in the first place.
This also applies explains why the patch doesn't need to distinguish
negative zeros from positive zeros in ctor_single_nonzero_element
(but it's perhaps something to beware of in uses of VECTOR_CST's 
single_nonzero_element).

Cheers,
Roger
--

> -Original Message-
> From: Marc Glisse 
> Sent: 21 February 2022 08:21
> To: Roger Sayle 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Implement constant-folding simplifications of
reductions.
> 
> On Mon, 21 Feb 2022, Roger Sayle wrote:
> 
> > +/* Fold REDUC (@0 op VECTOR_CST) as REDUC (@0) op REDUC
> (VECTOR_CST).
> > +*/ (for reduc (IFN_REDUC_PLUS IFN_REDUC_MAX IFN_REDUC_MIN
> IFN_REDUC_FMAX
> > +IFN_REDUC_FMIN IFN_REDUC_AND IFN_REDUC_IOR
> IFN_REDUC_XOR)
> > + op (plus max min IFN_FMAX IFN_FMIN bit_and bit_ior bit_xor)
> > +  (simplify (reduc (op @0 VECTOR_CST@1))
> > +(op (reduc:type @0) (reduc:type @1
> 
> I wonder if we need to test flag_associative_math for the 'plus' case, or
if the
> presence of IFN_REDUC_PLUS is enough to justify the possible loss of
precision.
> 
> --
> Marc Glisse



Re: [PATCH] Implement constant-folding simplifications of reductions.

2022-02-21 Thread Marc Glisse

On Mon, 21 Feb 2022, Roger Sayle wrote:


+/* Fold REDUC (@0 op VECTOR_CST) as REDUC (@0) op REDUC (VECTOR_CST).  */
+(for reduc (IFN_REDUC_PLUS IFN_REDUC_MAX IFN_REDUC_MIN IFN_REDUC_FMAX
+IFN_REDUC_FMIN IFN_REDUC_AND IFN_REDUC_IOR IFN_REDUC_XOR)
+ op (plus max min IFN_FMAX IFN_FMIN bit_and bit_ior bit_xor)
+  (simplify (reduc (op @0 VECTOR_CST@1))
+(op (reduc:type @0) (reduc:type @1


I wonder if we need to test flag_associative_math for the 'plus' case,
or if the presence of IFN_REDUC_PLUS is enough to justify the possible
loss of precision.

--
Marc Glisse


[PATCH] Implement constant-folding simplifications of reductions.

2022-02-20 Thread Roger Sayle

This patch addresses a code quality regression in GCC 12 by implementing
some constant folding/simplification transformations for REDUC_PLUS_EXPR
in match.pd.  The motivating example is gcc.dg/vect/pr89440.c which with
-O2 -ffast-math (with vectorization now enabled) gets optimized to:

float f (float x)
{
  vector(4) float vect_x_14.11;
  vector(4) float _2;
  float _32;

  _2 = {x_9(D), 0.0, 0.0, 0.0};
  vect_x_14.11_29 = _2 + { 1.0e+1, 2.6e+1, 4.2e+1, 5.8e+1 };
  _32 = .REDUC_PLUS (vect_x_14.11_29); [tail call]
  return _32;
}

With these proposed new transformations, we can simplify the
above code even further.

float f (float x)
{
  float _32;
  _32 = x_9(D) + 1.36e+2;
  return _32;
}

[which happens to match what we'd produce with -fno-tree-vectorize,
and with GCC 11].

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2022-02-21  Roger Sayle  

gcc/ChangeLog
* fold-const.cc (ctor_single_nonzero_element): New function to
return the single non-zero element of a (vector) constructor.
* fold-const.h (ctor_single_nonzero_element): Prototype here.
* match.pd (reduc (constructor@0)): Simplify reductions of a
constructor containing a single non-zero element.
(reduc (@0 op VECTOR_CST) ->  (reduc @0) op CONST): Simplify
reductions of vector operations of the same operator with
constant vector operands.

gcc/testsuite/ChangeLog
* gcc.dg/fold-reduc-1.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 386d573..4283308 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -16792,6 +16792,33 @@ address_compare (tree_code code, tree type, tree op0, 
tree op1,
   return equal;
 }
 
+/* Return the single non-zero element of a CONSTRUCTOR or NULL_TREE.  */
+tree
+ctor_single_nonzero_element (const_tree t)
+{
+  unsigned HOST_WIDE_INT idx;
+  constructor_elt *ce;
+  tree elt = NULL_TREE;
+
+  if (TREE_CODE (t) == SSA_NAME)
+{
+  gassign *def = dyn_cast  (SSA_NAME_DEF_STMT (t));
+  if (gimple_assign_rhs_code (def) == CONSTRUCTOR)
+t = gimple_assign_rhs1 (def);
+}
+
+  if (TREE_CODE (t) != CONSTRUCTOR)
+return NULL_TREE;
+  for (idx = 0; vec_safe_iterate (CONSTRUCTOR_ELTS (t), idx, ); idx++)
+if (!integer_zerop (ce->value) && !real_zerop (ce->value))
+  {
+   if (elt)
+ return NULL_TREE;
+   elt = ce->value;
+  }
+  return elt;
+}
+
 #if CHECKING_P
 
 namespace selftest {
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index f217598..b2f0a2f 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -224,6 +224,7 @@ extern const char *c_getstr (tree);
 extern wide_int tree_nonzero_bits (const_tree);
 extern int address_compare (tree_code, tree, tree, tree, tree &, tree &,
poly_int64 &, poly_int64 &, bool);
+extern tree ctor_single_nonzero_element (const_tree);
 
 /* Return OFF converted to a pointer offset type suitable as offset for
POINTER_PLUS_EXPR.  Use location LOC for this conversion.  */
diff --git a/gcc/match.pd b/gcc/match.pd
index d9d8359..047fb50 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7528,6 +7528,20 @@ and,
(BIT_FIELD_REF:elt_type @0 { size; } { pos; })
{ elt; })))
 
+/* Fold reduction of a single nonzero element constructor.  */
+(for reduc (IFN_REDUC_PLUS IFN_REDUC_IOR IFN_REDUC_XOR)
+  (simplify (reduc (CONSTRUCTOR@0))
+(with { tree elt = ctor_single_nonzero_element (@0); }
+  (if (elt)
+(non_lvalue { elt; })
+
+/* Fold REDUC (@0 op VECTOR_CST) as REDUC (@0) op REDUC (VECTOR_CST).  */
+(for reduc (IFN_REDUC_PLUS IFN_REDUC_MAX IFN_REDUC_MIN IFN_REDUC_FMAX
+IFN_REDUC_FMIN IFN_REDUC_AND IFN_REDUC_IOR IFN_REDUC_XOR)
+ op (plus max min IFN_FMAX IFN_FMIN bit_and bit_ior bit_xor)
+  (simplify (reduc (op @0 VECTOR_CST@1))
+(op (reduc:type @0) (reduc:type @1
+
 (simplify
  (vec_perm @0 @1 VECTOR_CST@2)
  (with
diff --git a/gcc/testsuite/gcc.dg/fold-reduc-1.c 
b/gcc/testsuite/gcc.dg/fold-reduc-1.c
new file mode 100644
index 000..c8360b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-reduc-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-optimized" } */
+float foo (float x)
+{
+ int i;
+ float j;
+ float a = 0;
+ for (i = 0; i < 4; ++i)
+   {
+ for (j = 0; j < 4; ++j)
+   {
+ a += 1;
+ x += a;
+   }
+   }
+ return x;
+}
+
+/* { dg-final { scan-tree-dump-not "REDUC_PLUS" "optimized"} } */