Re: [PATCH] Simplify pow with constant

2017-08-04 Thread Joseph Myers
On Fri, 4 Aug 2017, Richard Biener wrote:

> >> Do this only for fast-math as accuracy is reduced.  This is much faster
> >> since pow is more complex than exp - with a current GLIBC the speedup
> >> is more than 7 times for this transformation.
> >
> > Is it bound to be so on future glibc revisions and non-glibc platforms?
> 
> And how is accuracy affected?  I think the transform is only reasonable

For pow to be accurate when the result has large (positive or negative) 
exponent, it needs to compute the log and intermediate multiplication to a 
precision around (number of mantissa bits + number of exponent bits).  
This is inevitably slower than when you omit the extra intermediate 
precision (and if you omit that precision, the error can be around MAX_EXP 
ulps).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Simplify pow with constant

2017-08-04 Thread Wilco Dijkstra
Richard Biener wrote:
> On Fri, Aug 4, 2017 at 2:26 PM, Alexander Monakov  wrote:
> > On Fri, 4 Aug 2017, Wilco Dijkstra wrote:
> >> This patch simplifies pow (C, x) into exp (x * C1), where C1 = log (C).
> >
> > I don't think you can do that for non-positive C.

True, that can be easily disallowed.

> Hmm, the question is also how this interacts with other folders like
> sqrt (pow (x, y)) -> pow (|x|, y * 0,5)?  Also we seem to miss

We fold sqrt (pow (C, x)) into pow (C, x * 0.5) first, then fold that to exp.

> pow (2, x) -> exp2 (x) and pow (10, x) -> pow10/exp10, those may
> be a better fit than exp (log (2/10) * x)?  OTOH for fast-math
> canonicalization getting rid of exp2/10 and pow10 might be beneficial.

exp10 is non-standard and doesn't have a first-class implementation in GLIBC.
Although pow (10, x) is frequently used in Fortran, I can't get exp10 emitted
by match.pd...

>> Do this only for fast-math as accuracy is reduced.  This is much faster
>> since pow is more complex than exp - with a current GLIBC the speedup
>> is more than 7 times for this transformation.
>
> Is it bound to be so on future glibc revisions and non-glibc platforms?

Yes, pow is basically log followed by exp, so exp will always be cheaper than
pow. How much will obviously vary depending on the implementation.
Szabolc's highly optimized expf has 3x throughput of the optimized powf.

> And how is accuracy affected?  I think the transform is only reasonable
> for log (C) being close to e, 2 or 10 (using exp, exp2 or exp10).  Can you
> provide an idea on whether there's a systematic error (with glibc) and
> how that behaves over the parameter space?

Accuracy depends again on the library implementation. If log (C) is accurate
(ie. far less than 0.5 ULP), and exp (x) accurate to 0.5 ULP then you get 
perfect answers over the full range.

The exp function has the largest steps close to inf - a 1 ULP change in input
changes the output by 1024 ULP (128 ULP for expf). So a 0.5ULP input error
would give ~512 ULP error if the final result is close to inf. In practice the 
output
doesn't get anywhere near inf, so ULP errors are far smaller.

> Oh, and what value of C does the benchmark that triggered this have?

10 appears quite common. I extracted a runtime log of all powf calls in SPEC
(see https://sourceware.org/ml/libc-alpha/2017-06/msg00718.html) and noticed
a lot of repetition in some inputs. Further investigation showed many uses of
pow have a constant first operand, so an obvious target for optimization.

Wilco

Re: [PATCH] Simplify pow with constant

2017-08-04 Thread Richard Biener
On Fri, Aug 4, 2017 at 2:26 PM, Alexander Monakov  wrote:
> On Fri, 4 Aug 2017, Wilco Dijkstra wrote:
>> This patch simplifies pow (C, x) into exp (x * C1), where C1 = log (C).
>
> I don't think you can do that for non-positive C.

Hmm, the question is also how this interacts with other folders like
sqrt (pow (x, y)) -> pow (|x|, y * 0,5)?  Also we seem to miss
pow (2, x) -> exp2 (x) and pow (10, x) -> pow10/exp10, those may
be a better fit than exp (log (2/10) * x)?  OTOH for fast-math
canonicalization getting rid of exp2/10 and pow10 might be beneficial.

>> Do this only for fast-math as accuracy is reduced.  This is much faster
>> since pow is more complex than exp - with a current GLIBC the speedup
>> is more than 7 times for this transformation.
>
> Is it bound to be so on future glibc revisions and non-glibc platforms?

And how is accuracy affected?  I think the transform is only reasonable
for log (C) being close to e, 2 or 10 (using exp, exp2 or exp10).  Can you
provide an idea on whether there's a systematic error (with glibc) and
how that behaves over the parameter space?

Oh, and what value of C does the benchmark that triggered this have?

Richard.

> Alexander


Re: [PATCH] Simplify pow with constant

2017-08-04 Thread Alexander Monakov
On Fri, 4 Aug 2017, Wilco Dijkstra wrote:
> This patch simplifies pow (C, x) into exp (x * C1), where C1 = log (C).

I don't think you can do that for non-positive C.

> Do this only for fast-math as accuracy is reduced.  This is much faster
> since pow is more complex than exp - with a current GLIBC the speedup
> is more than 7 times for this transformation.

Is it bound to be so on future glibc revisions and non-glibc platforms?

Alexander


[PATCH] Simplify pow with constant

2017-08-04 Thread Wilco Dijkstra
This patch simplifies pow (C, x) into exp (x * C1), where C1 = log (C).
Do this only for fast-math as accuracy is reduced.  This is much faster
since pow is more complex than exp - with a current GLIBC the speedup
is more than 7 times for this transformation.

ChangeLog:
2017-08-04  Wilco Dijkstra  

* match.pd: Add pow (C, x) simplification.

--

diff --git a/gcc/match.pd b/gcc/match.pd
index 
e98db52af84946cf579c6434e06d450713a47162..96486aa1f512fe32d85a1de95c46523263ea1b6d
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3548,6 +3548,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(logs (pows @0 @1))
(mult @1 (logs @0
 
+ /* pow(C,x) -> exp(log(C)*x).  */
+ (for pows (POW)
+  exps (EXP)
+  logs (LOG)
+  (simplify
+   (pows REAL_CST@0 @1)
+   (exps (mult (logs @0) @1
+
  (for sqrts (SQRT)
   cbrts (CBRT)
   pows (POW)