One problem with making such decisions (even assuming we can afford to lose a bit of precision, which I'm not convinced of) is that you would get different results depending on whether the code is jitted or not. I think that is not acceptable.
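A small plain-Python illustration of the concern follows. The divisor 10.0 is an arbitrary non-power-of-two value chosen for the example, and the helper names are hypothetical. A true division rounds the exact quotient once, while multiplying by a precomputed reciprocal rounds twice (once for 1/c, once for the product), so the two can disagree in the last bit. If only jitted code used the rewrite, the same expression could produce either value depending on whether its loop had been compiled yet.

```python
def div(y, c):
    # A true IEEE-754 division: the exact quotient y/c is rounded once.
    return y / c


def mul_by_reciprocal(y, c):
    # The proposed rewrite: 1/c is rounded once, then y * (1/c) is rounded again.
    return y * (1.0 / c)


def count_mismatches(c, ys):
    # How many inputs give a different float under the rewrite.
    return sum(1 for y in ys if div(y, c) != mul_by_reciprocal(y, c))


if __name__ == "__main__":
    print(div(3.0, 10.0))                # 0.3
    print(mul_by_reciprocal(3.0, 10.0))  # 0.30000000000000004
    ys = [float(n) for n in range(1, 1001)]
    print(count_mismatches(10.0, ys))    # nonzero: 3.0 alone already differs
    print(count_mismatches(0.5, ys))     # 0: 2.0 is the exact reciprocal of 0.5
```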
On Thu, Nov 6, 2014 at 3:48 AM, Steven D'Aprano <st...@pearwood.info> wrote:
> On Wed, Nov 05, 2014 at 10:02:01PM +0000, Alex Gaynor wrote:
>> Hey Toni,
>>
>> If this optimization is valid for any float, we should definitely do it,
>> and this is a missed optimization. If it's not valid for all floats, I'm
>> not sure how we should handle it, if at all.
>
> I don't believe that it is valid for floats apart from exact powers of
> two. Toni says:
>
>> > only powers of two have an exact reciprocal floating point
>> > representation, but there might be a benefit in trading the least
>> > significant digit for a more significant speedup.
>
> Please don't make that decision for the user. If I want to trade off
> accuracy for speed, I can write:
>
>     r = 1/x
>     y*r
>
> but if I write y/x, I expect y/x to the full accuracy available.
>
> Thanks,
>
> Steve
>
>>
>> Alex
>>
>> On Wed Nov 05 2014 at 10:16:36 AM Toni Mattis <
>> toni.mat...@student.hpi.uni-potsdam.de> wrote:
>>
>> > Hello,
>> >
>> > I discovered that PyPy's JIT generates "DIVSD" instructions on xmm
>> > registers when dividing a float by a constant C. This consumes an order
>> > of magnitude more CPU cycles than the corresponding "MULSD" instruction
>> > with a precomputed 1/C.
>> >
>> > I know that only powers of two have an exact reciprocal floating point
>> > representation, but there might be a benefit in trading the least
>> > significant digit for a more significant speedup.
>> >
>> > So, is this a missed optimization (at least for reasonably accurate
>> > cases), a present or possibly future option (like -ffast-math in gcc) or
>> > are there more reasons against it?
>> >
>> > Thanks,
>> >
>> > Toni
>> >
>> >
>> > --- PS: Small Example ---
>> >
>> > This function takes on average 0.41 seconds to compute on an
>> > array.array('d') with 10**8 elements between 0 and 1:
>> >
>> >     def spikes_div(data, threshold=1.99):
>> >         count = 0
>> >         for i in data:
>> >             if i / 0.5 > threshold:
>> >                 count += 1
>> >         return count
>> >
>> > Rewritten with a multiplication, it takes about 0.29 seconds on average,
>> > speeding it up by a factor of 1.4:
>> >
>> >         ...
>> >             if i * 2.0 > threshold:
>> >         ...
>> >
>> > The traces contain the same instructions (except for the MULSD/DIVSD)
>> > and run the same number of times. I'm working with a fresh translation
>> > of the current PyPy default on Ubuntu 14.04 x64 with a 2nd generation
>> > Core i7 CPU.
>> >
>> > _______________________________________________
>> > pypy-dev mailing list
>> > pypy-dev@python.org
>> > https://mail.python.org/mailman/listinfo/pypy-dev
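Steven's claim and Toni's spikes_div example can be checked directly. Below is a minimal sketch in plain Python (not PyPy's actual JIT optimizer, whose internals are not shown in this thread) of the condition under which replacing y / c with y * (1/c) is safe: 1/c must be exactly representable, which for a finite nonzero double means c is a power of two whose reciprocal does not overflow. In that case both expressions round the same exact real number once, so they agree for every y. The names has_exact_reciprocal, rewrite_division and spikes_mul are hypothetical, and the 10**5-element random array is a smaller stand-in for Toni's 10**8-element one.

```python
import math
import random
from array import array


def has_exact_reciprocal(c):
    """Hypothetical helper: True if 1/c is an exactly representable double.

    When this holds, y / c and y * (1/c) give the same value for every y,
    because both round the same exact real number once.
    """
    if c == 0.0 or not math.isfinite(c):
        return False
    mantissa, exp = math.frexp(c)  # c == mantissa * 2**exp, 0.5 <= |mantissa| < 1
    if abs(mantissa) != 0.5:
        return False  # not a power of two, so 1/c has no finite binary expansion
    # c == +/-2**(exp - 1), so 1/c == +/-2**(1 - exp); 2**1024 would overflow to inf.
    return 1 - exp <= 1023


def rewrite_division(c):
    """Hypothetical rewrite rule: the multiplier to use instead of `/ c`, or None."""
    return 1.0 / c if has_exact_reciprocal(c) else None


def spikes_div(data, threshold=1.99):
    # Toni's original loop: divides by the constant 0.5.
    count = 0
    for i in data:
        if i / 0.5 > threshold:
            count += 1
    return count


def spikes_mul(data, threshold=1.99):
    # The rewritten loop: 2.0 is the exact reciprocal of 0.5.
    count = 0
    for i in data:
        if i * 2.0 > threshold:
            count += 1
    return count


if __name__ == "__main__":
    print(has_exact_reciprocal(0.5), has_exact_reciprocal(10.0))  # True False
    rng = random.Random(2014)
    data = array("d", (rng.random() for _ in range(10**5)))  # values in [0, 1)
    print(spikes_div(data) == spikes_mul(data))  # True
```

The check deliberately rejects every non-power-of-two constant rather than estimating how much precision would be lost, which matches the position in this thread that the division should keep its full accuracy unless the rewrite is exact.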