Re: [PATCH] Improve pow (C, x) -> exp (log (C) * x) optimization (PR middle-end/84309, take 2)

2018-02-13 Thread Richard Biener
On Mon, 12 Feb 2018, Jakub Jelinek wrote:

> On Sat, Feb 10, 2018 at 03:26:46PM +0100, Jakub Jelinek wrote:
> > If use_exp2 is true and (cfun->curr_properties & PROP_gimple_lvec) == 0,
> > don't fold it?  Then I guess if we vectorize or slp vectorize the pow
> > as vector pow, we'd need to match.pd it into the exp (log (vec_cst) * x).
> 
> Here is an updated patch, that defers it for pow (0x2.0pN, x) until after
> vectorization and adds tree-vect-patterns.c matcher that will handle it
> during vectorization (that one using exp, because we don't have exp2
> vectorized).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2018-02-12  Jakub Jelinek  
> 
>   PR middle-end/84309
>   * match.pd (pow(C,x) -> exp(log(C)*x)): Optimize instead into
>   exp2(log2(C)*x) if C is a power of 2 and c99 runtime is available.
>   * generic-match-head.c (canonicalize_math_after_vectorization_p): New
>   inline function.
>   * gimple-match-head.c (canonicalize_math_after_vectorization_p): New
>   inline function.
>   * omp-simd-clone.h: New file.
>   * omp-simd-clone.c: Include omp-simd-clone.h.
>   (expand_simd_clones): No longer static.
>   * tree-vect-patterns.c: Include fold-const-call.h, attribs.h,
>   cgraph.h and omp-simd-clone.h.
>   (vect_recog_pow_pattern): Optimize pow(C,x) to exp(log(C)*x).
>   (vect_recog_widen_shift_pattern): Formatting fix.
>   (vect_pattern_recog_1): Don't check optab for calls.
> 
>   * gcc.dg/pr84309.c: New test.
>   * gcc.target/i386/pr84309.c: New test.
> 
> --- gcc/match.pd.jj   2018-02-09 19:11:26.910070491 +0100
> +++ gcc/match.pd  2018-02-12 14:15:05.653779352 +0100
> @@ -3992,15 +3992,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>     (logs (pows @0 @1))
>     (mult @1 (logs @0))))
>  
> - /* pow(C,x) -> exp(log(C)*x) if C > 0.  */
> + /* pow(C,x) -> exp(log(C)*x) if C > 0,
> +    or if C is a positive power of 2,
> +    pow(C,x) -> exp2(log2(C)*x).  */
>   (for pows (POW)
>        exps (EXP)
>        logs (LOG)
> +      exp2s (EXP2)
> +      log2s (LOG2)
>    (simplify
>     (pows REAL_CST@0 @1)
> -    (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
> -         && real_isfinite (TREE_REAL_CST_PTR (@0)))
> -     (exps (mult (logs @0) @1)))))
> +   (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
> +        && real_isfinite (TREE_REAL_CST_PTR (@0)))
> +    (with {
> +       const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0);
> +       bool use_exp2 = false;
> +       if (targetm.libc_has_function (function_c99_misc)
> +           && value->cl == rvc_normal)
> +         {
> +           REAL_VALUE_TYPE frac_rvt = *value;
> +           SET_REAL_EXP (&frac_rvt, 1);
> +           if (real_equal (&frac_rvt, &dconst1))
> +             use_exp2 = true;
> +         }
> +     }
> +     (if (!use_exp2)
> +      (exps (mult (logs @0) @1))
> +      /* As libmvec doesn't have a vectorized exp2, defer optimizing
> +         this until after vectorization.  */
> +      (if (canonicalize_math_after_vectorization_p ())
> +        (exps (mult (logs @0) @1))))))))
>  
>   (for sqrts (SQRT)
>cbrts (CBRT)
> --- gcc/generic-match-head.c.jj   2018-01-03 10:19:55.454534005 +0100
> +++ gcc/generic-match-head.c  2018-02-12 14:13:27.088784495 +0100
> @@ -68,3 +68,12 @@ canonicalize_math_p ()
>  {
>    return true;
>  }
> +
> +/* Return true if math operations that are beneficial only after
> +   vectorization should be canonicalized.  */
> +
> +static inline bool
> +canonicalize_math_after_vectorization_p ()
> +{
> +  return false;
> +}
> --- gcc/gimple-match-head.c.jj 2018-01-03 10:19:55.931534081 +0100
> +++ gcc/gimple-match-head.c   2018-02-12 14:14:17.352781873 +0100
> @@ -831,3 +831,12 @@ canonicalize_math_p ()
>  {
>    return !cfun || (cfun->curr_properties & PROP_gimple_opt_math) == 0;
>  }
> +
> +/* Return true if math operations that are beneficial only after
> +   vectorization should be canonicalized.  */
> +
> +static inline bool
> +canonicalize_math_after_vectorization_p ()
> +{
> +  return !cfun || (cfun->curr_properties & PROP_gimple_lvec) != 0;
> +}
> --- gcc/omp-simd-clone.h.jj   2018-02-12 18:11:01.843931808 +0100
> +++ gcc/omp-simd-clone.h  2018-02-12 18:12:13.901948041 +0100
> @@ -0,0 +1,26 @@
> +/* OMP constructs' SIMD clone supporting code.
> +
> +   Copyright (C) 2005-2018 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public 

Re: [PATCH] Improve pow (C, x) -> exp (log (C) * x) optimization (PR middle-end/84309)

2018-02-12 Thread Joseph Myers
On Sat, 10 Feb 2018, Wilco Dijkstra wrote:

> For floats exp2f is ~10% faster than expf, powf is 2.2 times slower, and 
> exp10f is 3.2 times slower (slower than powf due to using double pow).

I expect it would be reasonably straightforward to adapt Szabolcs's 
optimized expf to produce an optimized exp10f (and likewise for log10f).

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Improve pow (C, x) -> exp (log (C) * x) optimization (PR middle-end/84309, take 2)

2018-02-12 Thread Jakub Jelinek
On Sat, Feb 10, 2018 at 03:26:46PM +0100, Jakub Jelinek wrote:
> If use_exp2 is true and (cfun->curr_properties & PROP_gimple_lvec) == 0,
> don't fold it?  Then I guess if we vectorize or slp vectorize the pow
> as vector pow, we'd need to match.pd it into the exp (log (vec_cst) * x).

Here is an updated patch that defers the folding of pow (0x2.0pN, x) until
after vectorization and adds a tree-vect-patterns.c matcher that handles it
during vectorization (that one uses exp, because we don't have a vectorized
exp2).
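
For illustration only, the deferral can be modelled in plain C rather than
match.pd syntax. This is a minimal sketch: the enum values and the
choose_rewrite helper are hypothetical and are not GCC code. In the patch the
check is canonicalize_math_after_vectorization_p (), which tests
PROP_gimple_lvec in cfun->curr_properties. Following the ChangeLog entry, the
sketch assumes the power-of-two case ends up as exp2 once vectorization is
over.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for pass-property bits; the values are
   illustrative and are not GCC's.  */
enum { PROP_OPT_MATH = 1 << 0, PROP_LVEC = 1 << 1 };

/* Possible outcomes for pow (C, x) with a finite constant C > 0.  */
enum rewrite { KEEP_POW, USE_EXP, USE_EXP2 };

/* Decide how to rewrite pow (C, x).  C_IS_POW2 says whether C is a
   power of two; CURR_PROPERTIES is the set of properties established
   by the passes that have already run.  */
enum rewrite
choose_rewrite (bool c_is_pow2, unsigned curr_properties)
{
  /* Not a power of two: exp (log (C) * x) can be used right away.  */
  if (!c_is_pow2)
    return USE_EXP;
  /* Power of two: only switch to exp2 once vectorization is done,
     because no vectorized exp2 is assumed to be available.  */
  if ((curr_properties & PROP_LVEC) != 0)
    return USE_EXP2;
  /* Otherwise leave pow alone so the vectorizer can still handle it.  */
  return KEEP_POW;
}
```

Before vectorization the call stays a pow, so the vectorizer's pattern
recognizer still gets a chance to turn it into the exp form.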

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-02-12  Jakub Jelinek  

PR middle-end/84309
* match.pd (pow(C,x) -> exp(log(C)*x)): Optimize instead into
exp2(log2(C)*x) if C is a power of 2 and c99 runtime is available.
* generic-match-head.c (canonicalize_math_after_vectorization_p): New
inline function.
* gimple-match-head.c (canonicalize_math_after_vectorization_p): New
inline function.
* omp-simd-clone.h: New file.
* omp-simd-clone.c: Include omp-simd-clone.h.
(expand_simd_clones): No longer static.
* tree-vect-patterns.c: Include fold-const-call.h, attribs.h,
cgraph.h and omp-simd-clone.h.
(vect_recog_pow_pattern): Optimize pow(C,x) to exp(log(C)*x).
(vect_recog_widen_shift_pattern): Formatting fix.
(vect_pattern_recog_1): Don't check optab for calls.

* gcc.dg/pr84309.c: New test.
* gcc.target/i386/pr84309.c: New test.

--- gcc/match.pd.jj 2018-02-09 19:11:26.910070491 +0100
+++ gcc/match.pd 2018-02-12 14:15:05.653779352 +0100
@@ -3992,15 +3992,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (logs (pows @0 @1))
    (mult @1 (logs @0))))
 
- /* pow(C,x) -> exp(log(C)*x) if C > 0.  */
+ /* pow(C,x) -> exp(log(C)*x) if C > 0,
+    or if C is a positive power of 2,
+    pow(C,x) -> exp2(log2(C)*x).  */
  (for pows (POW)
       exps (EXP)
       logs (LOG)
+      exp2s (EXP2)
+      log2s (LOG2)
   (simplify
    (pows REAL_CST@0 @1)
-    (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
-         && real_isfinite (TREE_REAL_CST_PTR (@0)))
-     (exps (mult (logs @0) @1)))))
+   (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
+        && real_isfinite (TREE_REAL_CST_PTR (@0)))
+    (with {
+       const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0);
+       bool use_exp2 = false;
+       if (targetm.libc_has_function (function_c99_misc)
+           && value->cl == rvc_normal)
+         {
+           REAL_VALUE_TYPE frac_rvt = *value;
+           SET_REAL_EXP (&frac_rvt, 1);
+           if (real_equal (&frac_rvt, &dconst1))
+             use_exp2 = true;
+         }
+     }
+     (if (!use_exp2)
+      (exps (mult (logs @0) @1))
+      /* As libmvec doesn't have a vectorized exp2, defer optimizing
+         this until after vectorization.  */
+      (if (canonicalize_math_after_vectorization_p ())
+        (exps (mult (logs @0) @1))))))))
 
  (for sqrts (SQRT)
   cbrts (CBRT)
--- gcc/generic-match-head.c.jj 2018-01-03 10:19:55.454534005 +0100
+++ gcc/generic-match-head.c 2018-02-12 14:13:27.088784495 +0100
@@ -68,3 +68,12 @@ canonicalize_math_p ()
 {
   return true;
 }
+
+/* Return true if math operations that are beneficial only after
+   vectorization should be canonicalized.  */
+
+static inline bool
+canonicalize_math_after_vectorization_p ()
+{
+  return false;
+}
--- gcc/gimple-match-head.c.jj  2018-01-03 10:19:55.931534081 +0100
+++ gcc/gimple-match-head.c 2018-02-12 14:14:17.352781873 +0100
@@ -831,3 +831,12 @@ canonicalize_math_p ()
 {
   return !cfun || (cfun->curr_properties & PROP_gimple_opt_math) == 0;
 }
+
+/* Return true if math operations that are beneficial only after
+   vectorization should be canonicalized.  */
+
+static inline bool
+canonicalize_math_after_vectorization_p ()
+{
+  return !cfun || (cfun->curr_properties & PROP_gimple_lvec) != 0;
+}
--- gcc/omp-simd-clone.h.jj 2018-02-12 18:11:01.843931808 +0100
+++ gcc/omp-simd-clone.h 2018-02-12 18:12:13.901948041 +0100
@@ -0,0 +1,26 @@
+/* OMP constructs' SIMD clone supporting code.
+
+   Copyright (C) 2005-2018 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_OMP_SIMD_CLONE_H
+#define GCC_OMP_SIMD_CLONE_H
+
+extern void expand_simd_clones (struct cgraph_node *);
+
+#endif /* GCC_OMP_SIMD_CLONE_H */
--- 

Re: [PATCH] Improve pow (C, x) -> exp (log (C) * x) optimization (PR middle-end/84309)

2018-02-10 Thread Jakub Jelinek
On Sat, Feb 10, 2018 at 12:29:42PM +0100, Richard Biener wrote:
> On February 10, 2018 10:44:37 AM GMT+01:00, Jakub Jelinek wrote:
> >On Sat, Feb 10, 2018 at 08:00:04AM +0100, Richard Biener wrote:
> >> On February 10, 2018 12:37:38 AM GMT+01:00, Jakub Jelinek wrote:
> >> >Hi!
> >> >
> >> >Apparently the new pow(C,x) -> exp(log(C)*x) if C > 0 optimization
> >> >breaks some package (Marek should know which), as it has 7ulp error.
> >> >Generally one should be prepared for some errors with -ffast-math.
> >> >
> >> >Though, in this case, if the target has c99 runtime and C is
> >> >a positive 0x1pNN it seems much better to use exp2 over exp, for
> >> >C being 2 pow (2, x) is optimized into exp2 (x) and even for other
> >> >values log2(C) will still be some positive or negative integer, so
> >> >in many cases there won't be any rounding errors in the multiplication.
> >> >
> >> >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >> 
> >> OK. I wonder whether there are vectorized variants in libmvec? 
> >
> >Unfortunately libmvec only provides pow and exp, not exp2 nor exp10.
> 
> So maybe delay this folding then, there's already two phases we do for
> math functions.  Not sure if they conveniently align with vectorization...

What would that delay look like?
If use_exp2 is true and (cfun->curr_properties & PROP_gimple_lvec) == 0,
don't fold it?  Then I guess if we vectorize or slp vectorize the pow
as vector pow, we'd need to match.pd it into the exp (log (vec_cst) * x).

Jakub


Re: [PATCH] Improve pow (C, x) -> exp (log (C) * x) optimization (PR middle-end/84309)

2018-02-10 Thread Richard Biener
On February 10, 2018 3:26:46 PM GMT+01:00, Jakub Jelinek wrote:
>On Sat, Feb 10, 2018 at 12:29:42PM +0100, Richard Biener wrote:
>> On February 10, 2018 10:44:37 AM GMT+01:00, Jakub Jelinek wrote:
>> >On Sat, Feb 10, 2018 at 08:00:04AM +0100, Richard Biener wrote:
>> >> On February 10, 2018 12:37:38 AM GMT+01:00, Jakub Jelinek wrote:
>> >> >Hi!
>> >> >
>> >> >Apparently the new pow(C,x) -> exp(log(C)*x) if C > 0 optimization
>> >> >breaks some package (Marek should know which), as it has 7ulp error.
>> >> >Generally one should be prepared for some errors with -ffast-math.
>> >> >
>> >> >Though, in this case, if the target has c99 runtime and C is
>> >> >a positive 0x1pNN it seems much better to use exp2 over exp, for
>> >> >C being 2 pow (2, x) is optimized into exp2 (x) and even for other
>> >> >values log2(C) will still be some positive or negative integer, so
>> >> >in many cases there won't be any rounding errors in the multiplication.
>> >> >
>> >> >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>> >> 
>> >> OK. I wonder whether there are vectorized variants in libmvec? 
>> >
>> >Unfortunately libmvec only provides pow and exp, not exp2 nor exp10.
>> 
>> So maybe delay this folding then, there's already two phases we do for
>> math functions.  Not sure if they conveniently align with vectorization...
>
>What would that delay look like?
>If use_exp2 is true and (cfun->curr_properties & PROP_gimple_lvec) == 0,
>don't fold it?

I think we have a canonicalize_math phase and an optimization one. But I'm not
sure this transform matches either case.

>Then I guess if we vectorize or slp vectorize the pow
>as vector pow, we'd need to match.pd it into the exp (log (vec_cst) * x).

Yes.  Of course extending libmvec would be much preferred... 

Richard. 

>
>   Jakub



Re: [PATCH] Improve pow (C, x) -> exp (log (C) * x) optimization (PR middle-end/84309)

2018-02-10 Thread Marek Polacek
On Sat, Feb 10, 2018 at 12:37:38AM +0100, Jakub Jelinek wrote:
> Hi!
> 
> Apparently the new pow(C,x) -> exp(log(C)*x) if C > 0 optimization
> breaks some package (Marek should know which), as it has 7ulp error.
> Generally one should be prepared for some errors with -ffast-math.

I reduced it from "test-cachunker" in package casync.

Marek


Re: [PATCH] Improve pow (C, x) -> exp (log (C) * x) optimization (PR middle-end/84309)

2018-02-10 Thread Richard Biener
On February 10, 2018 10:44:37 AM GMT+01:00, Jakub Jelinek wrote:
>On Sat, Feb 10, 2018 at 08:00:04AM +0100, Richard Biener wrote:
>> On February 10, 2018 12:37:38 AM GMT+01:00, Jakub Jelinek wrote:
>> >Hi!
>> >
>> >Apparently the new pow(C,x) -> exp(log(C)*x) if C > 0 optimization
>> >breaks some package (Marek should know which), as it has 7ulp error.
>> >Generally one should be prepared for some errors with -ffast-math.
>> >
>> >Though, in this case, if the target has c99 runtime and C is
>> >a positive 0x1pNN it seems much better to use exp2 over exp, for
>> >C being 2 pow (2, x) is optimized into exp2 (x) and even for other
>> >values log2(C) will still be some positive or negative integer, so
>> >in many cases there won't be any rounding errors in the multiplication.
>> >
>> >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>> 
>> OK. I wonder whether there are vectorized variants in libmvec? 
>
>Unfortunately libmvec only provides pow and exp, not exp2 nor exp10.

So maybe delay this folding then, there's already two phases we do for math 
functions. Not sure if they conveniently align with vectorization... 

>Wonder how much work it would be to provide also that.
>
>Joseph, is exp2 in glibc .5ulp accurate like exp for double, or not?
>Anything known about their relative performance?
>
>> >Perhaps we should do something similar if C is a power of 10 (use exp10
>> >and log10).
>> >
>> >2018-02-10  Jakub Jelinek  
>> >
>> >PR middle-end/84309
>> >* match.pd (pow(C,x) -> exp(log(C)*x)): Optimize instead into
>> >exp2(log2(C)*x) if C is a power of 2 and c99 runtime is available.
>> >
>> >* gcc.dg/pr84309.c: New test.
>> > 
>> >--- gcc/match.pd.jj 2018-01-26 12:43:23.208922420 +0100
>> >+++ gcc/match.pd 2018-02-09 18:48:26.412021408 +0100
>> >@@ -3992,15 +3992,33 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> >    (logs (pows @0 @1))
>> >    (mult @1 (logs @0))))
>> > 
>> >- /* pow(C,x) -> exp(log(C)*x) if C > 0.  */
>> >+ /* pow(C,x) -> exp(log(C)*x) if C > 0,
>> >+    or if C is a positive power of 2,
>> >+    pow(C,x) -> exp2(log2(C)*x).  */
>> >  (for pows (POW)
>> >       exps (EXP)
>> >       logs (LOG)
>> >+      exp2s (EXP2)
>> >+      log2s (LOG2)
>> >   (simplify
>> >    (pows REAL_CST@0 @1)
>> >-    (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
>> >-         && real_isfinite (TREE_REAL_CST_PTR (@0)))
>> >-     (exps (mult (logs @0) @1)))))
>> >+   (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
>> >+        && real_isfinite (TREE_REAL_CST_PTR (@0)))
>> >+    (with {
>> >+       const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0);
>> >+       bool use_exp2 = false;
>> >+       if (targetm.libc_has_function (function_c99_misc)
>> >+           && value->cl == rvc_normal)
>> >+         {
>> >+           REAL_VALUE_TYPE frac_rvt = *value;
>> >+           SET_REAL_EXP (&frac_rvt, 1);
>> >+           if (real_equal (&frac_rvt, &dconst1))
>> >+             use_exp2 = true;
>> >+         }
>> >+     }
>> >+     (if (use_exp2)
>> >+       (exp2s (mult (log2s @0) @1))
>> >+       (exps (mult (logs @0) @1)))))))
>> > 
>> >  (for sqrts (SQRT)
>> >       cbrts (CBRT)
>> >--- gcc/testsuite/gcc.dg/pr84309.c.jj 2018-02-09 18:54:52.254787678 +0100
>> >+++ gcc/testsuite/gcc.dg/pr84309.c 2018-02-09 18:59:02.343636178 +0100
>> >@@ -0,0 +1,14 @@
>> >+/* PR middle-end/84309 */
>> >+/* { dg-do run { target c99_runtime } } */
>> >+/* { dg-options "-O2 -ffast-math" } */
>> >+
>> >+int
>> >+main ()
>> >+{
>> >+  unsigned long a = 1024;
>> >+  unsigned long b = 16 * 1024;
>> >+  unsigned long c = __builtin_pow (2, (__builtin_log2 (a) + __builtin_log2 (b)) / 2);
>> >+  if (c != 4096)
>> >+    __builtin_abort ();
>> >+  return 0;
>> >+}
>
>   Jakub



Re: [PATCH] Improve pow (C, x) -> exp (log (C) * x) optimization (PR middle-end/84309)

2018-02-10 Thread Jakub Jelinek
On Sat, Feb 10, 2018 at 08:00:04AM +0100, Richard Biener wrote:
> On February 10, 2018 12:37:38 AM GMT+01:00, Jakub Jelinek wrote:
> >Hi!
> >
> >Apparently the new pow(C,x) -> exp(log(C)*x) if C > 0 optimization
> >breaks some package (Marek should know which), as it has 7ulp error.
> >Generally one should be prepared for some errors with -ffast-math.
> >
> >Though, in this case, if the target has c99 runtime and C is
> >a positive 0x1pNN it seems much better to use exp2 over exp, for
> >C being 2 pow (2, x) is optimized into exp2 (x) and even for other
> >values log2(C) will still be some positive or negative integer, so
> >in many cases there won't be any rounding errors in the multiplication.
> >
> >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> OK. I wonder whether there are vectorized variants in libmvec? 

Unfortunately libmvec only provides pow and exp, not exp2 nor exp10.
Wonder how much work it would be to provide also that.

Joseph, is exp2 in glibc .5ulp accurate like exp for double, or not?
Anything known about their relative performance?

> >Perhaps we should do something similar if C is a power of 10 (use exp10
> >and log10).
> >
> >2018-02-10  Jakub Jelinek  
> >
> > PR middle-end/84309
> > * match.pd (pow(C,x) -> exp(log(C)*x)): Optimize instead into
> > exp2(log2(C)*x) if C is a power of 2 and c99 runtime is available.
> >
> > * gcc.dg/pr84309.c: New test.
> > 
> >--- gcc/match.pd.jj  2018-01-26 12:43:23.208922420 +0100
> >+++ gcc/match.pd 2018-02-09 18:48:26.412021408 +0100
> >@@ -3992,15 +3992,33 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >    (logs (pows @0 @1))
> >    (mult @1 (logs @0))))
> > 
> >- /* pow(C,x) -> exp(log(C)*x) if C > 0.  */
> >+ /* pow(C,x) -> exp(log(C)*x) if C > 0,
> >+    or if C is a positive power of 2,
> >+    pow(C,x) -> exp2(log2(C)*x).  */
> >  (for pows (POW)
> >       exps (EXP)
> >       logs (LOG)
> >+      exp2s (EXP2)
> >+      log2s (LOG2)
> >   (simplify
> >    (pows REAL_CST@0 @1)
> >-    (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
> >-         && real_isfinite (TREE_REAL_CST_PTR (@0)))
> >-     (exps (mult (logs @0) @1)))))
> >+   (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
> >+        && real_isfinite (TREE_REAL_CST_PTR (@0)))
> >+    (with {
> >+       const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0);
> >+       bool use_exp2 = false;
> >+       if (targetm.libc_has_function (function_c99_misc)
> >+           && value->cl == rvc_normal)
> >+         {
> >+           REAL_VALUE_TYPE frac_rvt = *value;
> >+           SET_REAL_EXP (&frac_rvt, 1);
> >+           if (real_equal (&frac_rvt, &dconst1))
> >+             use_exp2 = true;
> >+         }
> >+     }
> >+     (if (use_exp2)
> >+       (exp2s (mult (log2s @0) @1))
> >+       (exps (mult (logs @0) @1)))))))
> > 
> >  (for sqrts (SQRT)
> >       cbrts (CBRT)
> >--- gcc/testsuite/gcc.dg/pr84309.c.jj 2018-02-09 18:54:52.254787678 +0100
> >+++ gcc/testsuite/gcc.dg/pr84309.c 2018-02-09 18:59:02.343636178 +0100
> >@@ -0,0 +1,14 @@
> >+/* PR middle-end/84309 */
> >+/* { dg-do run { target c99_runtime } } */
> >+/* { dg-options "-O2 -ffast-math" } */
> >+
> >+int
> >+main ()
> >+{
> >+  unsigned long a = 1024;
> >+  unsigned long b = 16 * 1024;
> >+  unsigned long c = __builtin_pow (2, (__builtin_log2 (a) + __builtin_log2 (b)) / 2);
> >+  if (c != 4096)
> >+    __builtin_abort ();
> >+  return 0;
> >+}

Jakub


Re: [PATCH] Improve pow (C, x) -> exp (log (C) * x) optimization (PR middle-end/84309)

2018-02-09 Thread Richard Biener
On February 10, 2018 12:37:38 AM GMT+01:00, Jakub Jelinek wrote:
>Hi!
>
>Apparently the new pow(C,x) -> exp(log(C)*x) if C > 0 optimization
>breaks some package (Marek should know which), as it has 7ulp error.
>Generally one should be prepared for some errors with -ffast-math.
>
>Though, in this case, if the target has c99 runtime and C is
>a positive 0x1pNN it seems much better to use exp2 over exp, for
>C being 2 pow (2, x) is optimized into exp2 (x) and even for other
>values log2(C) will still be some positive or negative integer, so
>in many cases there won't be any rounding errors in the multiplication.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK. I wonder whether there are vectorized variants in libmvec? 

Richard. 

>Perhaps we should do something similar if C is a power of 10 (use exp10
>and log10).
>
>2018-02-10  Jakub Jelinek  
>
>   PR middle-end/84309
>   * match.pd (pow(C,x) -> exp(log(C)*x)): Optimize instead into
>   exp2(log2(C)*x) if C is a power of 2 and c99 runtime is available.
>
>   * gcc.dg/pr84309.c: New test.
> 
>--- gcc/match.pd.jj 2018-01-26 12:43:23.208922420 +0100
>+++ gcc/match.pd 2018-02-09 18:48:26.412021408 +0100
>@@ -3992,15 +3992,33 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (logs (pows @0 @1))
>    (mult @1 (logs @0))))
> 
>- /* pow(C,x) -> exp(log(C)*x) if C > 0.  */
>+ /* pow(C,x) -> exp(log(C)*x) if C > 0,
>+    or if C is a positive power of 2,
>+    pow(C,x) -> exp2(log2(C)*x).  */
>  (for pows (POW)
>       exps (EXP)
>       logs (LOG)
>+      exp2s (EXP2)
>+      log2s (LOG2)
>   (simplify
>    (pows REAL_CST@0 @1)
>-    (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
>-         && real_isfinite (TREE_REAL_CST_PTR (@0)))
>-     (exps (mult (logs @0) @1)))))
>+   (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
>+        && real_isfinite (TREE_REAL_CST_PTR (@0)))
>+    (with {
>+       const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0);
>+       bool use_exp2 = false;
>+       if (targetm.libc_has_function (function_c99_misc)
>+           && value->cl == rvc_normal)
>+         {
>+           REAL_VALUE_TYPE frac_rvt = *value;
>+           SET_REAL_EXP (&frac_rvt, 1);
>+           if (real_equal (&frac_rvt, &dconst1))
>+             use_exp2 = true;
>+         }
>+     }
>+     (if (use_exp2)
>+       (exp2s (mult (log2s @0) @1))
>+       (exps (mult (logs @0) @1)))))))
> 
>  (for sqrts (SQRT)
>       cbrts (CBRT)
>--- gcc/testsuite/gcc.dg/pr84309.c.jj 2018-02-09 18:54:52.254787678 +0100
>+++ gcc/testsuite/gcc.dg/pr84309.c 2018-02-09 18:59:02.343636178 +0100
>@@ -0,0 +1,14 @@
>+/* PR middle-end/84309 */
>+/* { dg-do run { target c99_runtime } } */
>+/* { dg-options "-O2 -ffast-math" } */
>+
>+int
>+main ()
>+{
>+  unsigned long a = 1024;
>+  unsigned long b = 16 * 1024;
>+  unsigned long c = __builtin_pow (2, (__builtin_log2 (a) + __builtin_log2 (b)) / 2);
>+  if (c != 4096)
>+    __builtin_abort ();
>+  return 0;
>+}
>
>   Jakub



[PATCH] Improve pow (C, x) -> exp (log (C) * x) optimization (PR middle-end/84309)

2018-02-09 Thread Jakub Jelinek
Hi!

Apparently the new pow(C,x) -> exp(log(C)*x) if C > 0 optimization
breaks some package (Marek should know which), as it has 7ulp error.
Generally one should be prepared for some errors with -ffast-math.

Though, in this case, if the target has a C99 runtime and C is a positive
power of two (0x1pNN), it seems much better to use exp2 rather than exp:
for C == 2, pow (2, x) is optimized into exp2 (x), and for the other values
log2 (C) is still a positive or negative integer, so in many cases the
multiplication won't introduce any rounding error.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Perhaps we should do something similar if C is a power of 10 (use exp10
and log10).

2018-02-10  Jakub Jelinek  

PR middle-end/84309
* match.pd (pow(C,x) -> exp(log(C)*x)): Optimize instead into
exp2(log2(C)*x) if C is a power of 2 and c99 runtime is available.

* gcc.dg/pr84309.c: New test.
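
For readers unfamiliar with the REAL_VALUE_TYPE helpers, the power-of-two
test in the match.pd hunk copies the constant, forces its exponent to 1 and
compares the result with 1.0. A rough analogue using only the standard C
library, not GCC's internal API, is sketched here; is_positive_pow2 is a
hypothetical name.

```c
#include <math.h>
#include <stdbool.h>

/* Return true if C is a finite positive power of two.
   frexp splits C into m * 2^e with 0.5 <= m < 1, so C is a power of
   two exactly when the mantissa part m equals 0.5.  */
bool
is_positive_pow2 (double c)
{
  int e;
  /* Reject zero, negative values, infinities and NaN.  */
  if (!(c > 0.0) || !isfinite (c))
    return false;
  return frexp (c, &e) == 0.5;
}
```

Only constants that pass this kind of test are candidates for the
exp2 (log2 (C) * x) form; all other positive finite constants keep using
exp and log.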
 
--- gcc/match.pd.jj 2018-01-26 12:43:23.208922420 +0100
+++ gcc/match.pd 2018-02-09 18:48:26.412021408 +0100
@@ -3992,15 +3992,33 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (logs (pows @0 @1))
    (mult @1 (logs @0))))
 
- /* pow(C,x) -> exp(log(C)*x) if C > 0.  */
+ /* pow(C,x) -> exp(log(C)*x) if C > 0,
+    or if C is a positive power of 2,
+    pow(C,x) -> exp2(log2(C)*x).  */
  (for pows (POW)
       exps (EXP)
       logs (LOG)
+      exp2s (EXP2)
+      log2s (LOG2)
   (simplify
    (pows REAL_CST@0 @1)
-    (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
-         && real_isfinite (TREE_REAL_CST_PTR (@0)))
-     (exps (mult (logs @0) @1)))))
+   (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
+        && real_isfinite (TREE_REAL_CST_PTR (@0)))
+    (with {
+       const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0);
+       bool use_exp2 = false;
+       if (targetm.libc_has_function (function_c99_misc)
+           && value->cl == rvc_normal)
+         {
+           REAL_VALUE_TYPE frac_rvt = *value;
+           SET_REAL_EXP (&frac_rvt, 1);
+           if (real_equal (&frac_rvt, &dconst1))
+             use_exp2 = true;
+         }
+     }
+     (if (use_exp2)
+       (exp2s (mult (log2s @0) @1))
+       (exps (mult (logs @0) @1)))))))
 
  (for sqrts (SQRT)
       cbrts (CBRT)
--- gcc/testsuite/gcc.dg/pr84309.c.jj   2018-02-09 18:54:52.254787678 +0100
+++ gcc/testsuite/gcc.dg/pr84309.c  2018-02-09 18:59:02.343636178 +0100
@@ -0,0 +1,14 @@
+/* PR middle-end/84309 */
+/* { dg-do run { target c99_runtime } } */
+/* { dg-options "-O2 -ffast-math" } */
+
+int
+main ()
+{
+  unsigned long a = 1024;
+  unsigned long b = 16 * 1024;
+  unsigned long c = __builtin_pow (2, (__builtin_log2 (a) + __builtin_log2 (b)) / 2);
+  if (c != 4096)
+    __builtin_abort ();
+  return 0;
+}

Jakub