Re: [PATCH] Defer pow (C, x) folding until after vectorization always (PR middle-end/82004)

2018-02-20 Thread Richard Biener
On February 19, 2018 11:02:50 PM GMT+01:00, Jakub Jelinek wrote:
>Hi!
>
>While I've over-simplified the testcase and so this patch doesn't help
>the 628.pop2_s miscompare, I still believe it is beneficial to defer this
>folding until late for these reasons:
>1) if we propagate a constant into the second pow argument too, the result
>   will likely be more precise than going through the exp (cst * x) way
>2) except when C is M_E, pow is fewer operations and thus smaller IL
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK. 

Richard. 

>2018-02-19  Jakub Jelinek  
>
>   PR middle-end/82004
>   * match.pd (pow(C,x) -> exp(log(C)*x)): Delay all folding until
>   after vectorization.
>
>   * gfortran.dg/pr82004.f90: New test.
>
>--- gcc/match.pd.jj	2018-02-15 12:15:51.655780636 +0100
>+++ gcc/match.pd	2018-02-19 17:38:06.390763194 +0100
>@@ -4006,7 +4006,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (simplify
>(pows REAL_CST@0 @1)
>(if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
>-  && real_isfinite (TREE_REAL_CST_PTR (@0)))
>+  && real_isfinite (TREE_REAL_CST_PTR (@0))
>+  /* As libmvec doesn't have a vectorized exp2, defer optimizing
>+ the use_exp2 case until after vectorization.  It seems actually
>+ beneficial for all constants to postpone this until later,
>+ because exp(log(C)*x), while faster, will have worse precision
>+ and if x folds into a constant too, that is unnecessary
>+ pessimization.  */
>+  && canonicalize_math_after_vectorization_p ())
> (with {
>const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0);
>bool use_exp2 = false;
>@@ -4021,10 +4028,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  }
>  (if (!use_exp2)
>   (exps (mult (logs @0) @1))
>-  /* As libmvec doesn't have a vectorized exp2, defer optimizing
>-   this until after vectorization.  */
>-  (if (canonicalize_math_after_vectorization_p ())
>-  (exp2s (mult (log2s @0) @1
>+  (exp2s (mult (log2s @0) @1)))
> 
>  (for sqrts (SQRT)
>   cbrts (CBRT)
>--- gcc/testsuite/gfortran.dg/pr82004.f90.jj	2018-02-19 17:58:57.435682156 +0100
>+++ gcc/testsuite/gfortran.dg/pr82004.f90	2018-02-19 17:58:34.127684892 +0100
>@@ -0,0 +1,18 @@
>+! PR middle-end/82004
>+! { dg-do run }
>+! { dg-options "-Ofast" }
>+
>+  integer, parameter :: r8 = selected_real_kind(13), i4 = kind(1)
>+  integer (i4), parameter :: a = 400, b = 2
>+  real (r8), parameter, dimension(b) :: c = (/ .001_r8, 10.00_r8 /)
>+  real (r8) :: d, e, f, g, h
>+  real (r8), parameter :: j &
>+= 10**(log10(c(1))-(log10(c(b))-log10(c(1)))/real(a))
>+
>+  d = c(1)
>+  e = c(b)
>+  f = (log10(e)-log10(d))/real(a)
>+  g = log10(d) - f
>+  h = 10**(g)
>+  if (h.ne.j) stop 1
>+end
>
>   Jakub



[PATCH] Defer pow (C, x) folding until after vectorization always (PR middle-end/82004)

2018-02-19 Thread Jakub Jelinek
Hi!

While I've over-simplified the testcase and so this patch doesn't help
the 628.pop2_s miscompare, I still believe it is beneficial to defer this
folding until late for these reasons:
1) if we propagate a constant into the second pow argument too, the result
   will likely be more precise than going through the exp (cst * x) way
2) except when C is M_E, pow is fewer operations and thus smaller IL
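
To illustrate reason 1, here is a minimal Python sketch (not part of the
patch) that evaluates pow (C, x) directly and through the exp (log (C) * x)
form, then compares both against a known exact result. The helper names
pow_direct, pow_via_exp and rel_err are made up for this example, and the
actual errors depend on the platform's libm.

```python
import math


def pow_direct(c, x):
    """Evaluate pow(C, x) with a single libm call."""
    return math.pow(c, x)


def pow_via_exp(c, x):
    """Evaluate the rewritten form exp(log(C) * x).

    log(C) and the product are each rounded before exp is applied,
    so rounding errors can accumulate.
    """
    return math.exp(math.log(c) * x)


def rel_err(approx, exact):
    """Relative error of approx with respect to a nonzero exact value."""
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    # Integer exponents are used only because the exact result is known.
    for x in (3.0, 10.0, 20.0):
        exact = float(10 ** int(x))
        print(x,
              rel_err(pow_direct(10.0, x), exact),
              rel_err(pow_via_exp(10.0, x), exact))
```

The exp/log form goes through two extra roundings (log (C) and the multiply)
before exp is applied, which is where any additional error would come from.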

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-02-19  Jakub Jelinek  

PR middle-end/82004
* match.pd (pow(C,x) -> exp(log(C)*x)): Delay all folding until
after vectorization.

* gfortran.dg/pr82004.f90: New test.

--- gcc/match.pd.jj	2018-02-15 12:15:51.655780636 +0100
+++ gcc/match.pd	2018-02-19 17:38:06.390763194 +0100
@@ -4006,7 +4006,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (simplify
(pows REAL_CST@0 @1)
(if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
-   && real_isfinite (TREE_REAL_CST_PTR (@0)))
+   && real_isfinite (TREE_REAL_CST_PTR (@0))
+   /* As libmvec doesn't have a vectorized exp2, defer optimizing
+  the use_exp2 case until after vectorization.  It seems actually
+  beneficial for all constants to postpone this until later,
+  because exp(log(C)*x), while faster, will have worse precision
+  and if x folds into a constant too, that is unnecessary
+  pessimization.  */
+   && canonicalize_math_after_vectorization_p ())
 (with {
const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0);
bool use_exp2 = false;
@@ -4021,10 +4028,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  }
  (if (!use_exp2)
   (exps (mult (logs @0) @1))
-  /* As libmvec doesn't have a vectorized exp2, defer optimizing
-this until after vectorization.  */
-  (if (canonicalize_math_after_vectorization_p ())
-   (exp2s (mult (log2s @0) @1
+  (exp2s (mult (log2s @0) @1)))
 
  (for sqrts (SQRT)
   cbrts (CBRT)
--- gcc/testsuite/gfortran.dg/pr82004.f90.jj	2018-02-19 17:58:57.435682156 +0100
+++ gcc/testsuite/gfortran.dg/pr82004.f90	2018-02-19 17:58:34.127684892 +0100
@@ -0,0 +1,18 @@
+! PR middle-end/82004
+! { dg-do run }
+! { dg-options "-Ofast" }
+
+  integer, parameter :: r8 = selected_real_kind(13), i4 = kind(1)
+  integer (i4), parameter :: a = 400, b = 2
+  real (r8), parameter, dimension(b) :: c = (/ .001_r8, 10.00_r8 /)
+  real (r8) :: d, e, f, g, h
+  real (r8), parameter :: j &
+= 10**(log10(c(1))-(log10(c(b))-log10(c(1)))/real(a))
+
+  d = c(1)
+  e = c(b)
+  f = (log10(e)-log10(d))/real(a)
+  g = log10(d) - f
+  h = 10**(g)
+  if (h.ne.j) stop 1
+end

Jakub
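
For reference, a rough Python sketch of the arithmetic the new test
exercises, assuming r8 corresponds to double precision. The exponent is
computed the same way as g in pr82004.f90, and the two functions show the
two ways 10**g can end up being evaluated. The function names are
hypothetical, and whether the two results differ in the last bits depends
on libm.

```python
import math


def exponent():
    """Mirror the computation of g in the Fortran test."""
    c = (0.001, 10.0)
    a = 400
    d, e = c[0], c[1]
    f = (math.log10(e) - math.log10(d)) / float(a)
    return math.log10(d) - f


def ten_pow_direct(g):
    """10**g as a single pow call."""
    return math.pow(10.0, g)


def ten_pow_via_exp(g):
    """10**g after the pow(C,x) -> exp(log(C)*x) rewrite."""
    return math.exp(math.log(10.0) * g)


if __name__ == "__main__":
    g = exponent()
    h_pow = ten_pow_direct(g)
    h_exp = ten_pow_via_exp(g)
    print("g       =", repr(g))
    print("pow     =", repr(h_pow))
    print("exp/log =", repr(h_exp))
    print("equal   =", h_pow == h_exp)
```

The test compares with .ne., so even a last-bit difference between the
folded constant j and the run-time value h makes it fail.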