On Wed, Nov 05, 2014 at 10:02:01PM +0000, Alex Gaynor wrote:
> Hey Toni,
> 
> If this optimization is valid for any float, we should definitely do it,
> and this is a missed optimization. If it's not valid for all floats, I'm
> not sure how we should handle it, if at all.

I don't believe that it is valid for floats apart from exact powers of 
two. Toni says:

> > only powers of two have an exact reciprocal floating point
> > representation, but there might be a benefit in trading the least
> > significant digit for a more significant speedup.

Please don't make that decision for the user. If I want to trade off 
accuracy for speed, I can write:

    r = 1/x
    y*r

but if I write y/x, I expect y/x to the full accuracy available.
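To make the accuracy point concrete, here is a small sketch that compares `y/x` with `y*(1/x)` on random inputs. The function name `count_mismatches`, the sample size and the seed are illustrative choices, not something from this thread. With a power-of-two divisor such as 0.5 the two expressions always agree. With 10.0 they already disagree for y = 3.0: `3.0/10.0` is 0.3, but `3.0*(1.0/10.0)` is 0.30000000000000004.

```python
import random


def count_mismatches(x, n=100_000, seed=0):
    """Count how often y/x and y*(1/x) differ for n random y in [0, 1)."""
    rng = random.Random(seed)
    r = 1.0 / x  # reciprocal, rounded once to the nearest double
    mismatches = 0
    for _ in range(n):
        y = rng.random()
        # y / x is rounded once; y * r carries the rounding error of r as well
        if y / x != y * r:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    print(3.0 / 10.0, 3.0 * (1.0 / 10.0))  # 0.3 0.30000000000000004
    print("x = 0.5: ", count_mismatches(0.5))   # power of two: always 0
    print("x = 10.0:", count_mismatches(10.0))  # not a power of two: nonzero
```

The exact mismatch count for 10.0 depends on the sample, so the script only shows that it is nonzero, while for 0.5 it is always zero.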


Thanks,

Steve

> 
> Alex
> 
> On Wed Nov 05 2014 at 10:16:36 AM Toni Mattis <
> toni.mat...@student.hpi.uni-potsdam.de> wrote:
> 
> > Hello,
> >
> > I discovered that PyPy's JIT generates "DIVSD" instructions on xmm
> > registers when dividing a float by a constant C. This consumes an order
> > of magnitude more CPU cycles than the corresponding "MULSD" instruction
> > with a precomputed 1/C.
> >
> > I know that only powers of two have an exact reciprocal floating point
> > representation, but there might be a benefit in trading the least
> > significant digit for a more significant speedup.
> >
> > So, is this a missed optimization (at least for reasonably accurate
> > cases), a present or possibly future option (like -ffast-math in gcc) or
> > are there more reasons against it?
> >
> >
> > Thanks,
> >
> > Toni
> >
> >
> > --- PS: Small Example ---
> >
> > This function takes on average 0.41 seconds to compute on an
> > array.array('d') with 10**8 elements between 0 and 1:
> >
> >     def spikes_div(data, threshold=1.99):
> >         count = 0
> >         for i in data:
> >             if i / 0.5 > threshold:
> >                 count += 1
> >         return count
> >
> > Rewritten with a multiplication, it takes about 0.29 seconds on average,
> > a speedup of about 1.4x:
> >
> >         ...
> >             if i * 2.0 > threshold:
> >         ...
> >
> >
> > The traces contain the same instructions (except for the MULSD/DIVSD)
> > and run the same number of times. I'm working with a fresh translation
> > of the current PyPy default on Ubuntu 14.04 x64 with a 2nd generation
> > Core i7 CPU.
> >
> >
> > _______________________________________________
> > pypy-dev mailing list
> > pypy-dev@python.org
> > https://mail.python.org/mailman/listinfo/pypy-dev
> >
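Toni's benchmark divides by 0.5, which is exactly the case where the rewrite is harmless: 1/0.5 = 2.0 is exact, so `i / 0.5` and `i * 2.0` are the same real number and round to the same double. The sketch below assumes a hypothetical helper, `reciprocal_is_exact`, that a JIT (or a programmer) could use to decide when replacing division by a constant `c` with multiplication by `1/c` cannot change any result. It accepts only powers of two whose reciprocal does not overflow. It then runs both versions of the loop on the same data; the `spikes_mul` name and the smaller array size are illustrative, not from Toni's message.

```python
import math
import random
from array import array


def reciprocal_is_exact(c):
    """Hypothetical check: True if multiplying by 1.0 / c always gives the
    same result as dividing by c.

    This holds when 1/c is exactly representable, i.e. c is a power of two
    and 1/c does not overflow to infinity.
    """
    if c == 0.0 or not math.isfinite(c):
        return False
    mantissa, _ = math.frexp(c)  # c == mantissa * 2**exp, 0.5 <= |mantissa| < 1
    return abs(mantissa) == 0.5 and math.isfinite(1.0 / c)


def spikes_div(data, threshold=1.99):
    count = 0
    for i in data:
        if i / 0.5 > threshold:
            count += 1
    return count


def spikes_mul(data, threshold=1.99):
    # Same loop, with the division by 0.5 replaced by a multiplication by 2.0.
    count = 0
    for i in data:
        if i * 2.0 > threshold:
            count += 1
    return count


if __name__ == "__main__":
    rng = random.Random(42)
    data = array("d", (rng.random() for _ in range(10**6)))
    print(reciprocal_is_exact(0.5), reciprocal_is_exact(10.0))  # True False
    print(spikes_div(data) == spikes_mul(data))  # True
```

Because the check only accepts divisors whose reciprocal is exact, a rewrite guarded by it keeps y/x at full accuracy; for other divisors, such as 10.0, it would not. Timing is left out on purpose: the 0.41 s and 0.29 s figures above are Toni's PyPy measurements and may differ on another interpreter or machine.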
