Hey Toni,

If this transformation is valid for every float, then it is a missed optimization and we should definitely do it. If it's not valid for all floats, I'm not sure how we should handle it, if at all.

Alex

On Wed Nov 05 2014 at 10:16:36 AM Toni Mattis <toni.mat...@student.hpi.uni-potsdam.de> wrote:

> Hello,
>
> I discovered that PyPy's JIT generates "DIVSD" instructions on xmm
> registers when dividing a float by a constant C. This consumes an order
> of magnitude more CPU cycles than the corresponding "MULSD" instruction
> with a precomputed 1/C.
>
> I know that only powers of two have an exact reciprocal floating point
> representation, but there might be a benefit in trading the least
> significant digit for a more significant speedup.
>
> So, is this a missed optimization (at least for reasonably accurate
> cases), a present or possibly future option (like -ffast-math in gcc) or
> are there more reasons against it?
>
>
> Thanks,
>
> Toni
>
>
> --- PS: Small Example ---
>
> This function takes on average 0.41 seconds to compute on an
> array.array('d') with 10**8 elements between 0 and 1:
>
> def spikes_div(data, threshold=1.99):
>     count = 0
>     for i in data:
>         if i / 0.5 > threshold:
>             count += 1
>     return count
>
> Rewritten with a multiplication it takes about 0.29 seconds on average,
> speeding it up by factor 1.4:
>
>     ...
>         if i * 2.0 > threshold:
>     ...
>
>
> The traces contain the same instructions (except for the MULSD/DIVSD)
> and run the same number of times. I'm working with a fresh translation
> of the current PyPy default on Ubuntu 14.04 x64 with a 2nd generation
> Core i7 CPU.
>
>
> _______________________________________________
> pypy-dev mailing list
> pypy-dev@python.org
> https://mail.python.org/mailman/listinfo/pypy-dev
>
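
To make the "valid for any float" question concrete: x / C and x * (1/C) give bit-identical results whenever 1/C is exactly representable, because both operations then round the same real number. That is the case when C is a power of two whose reciprocal neither overflows nor underflows to zero. The following is only a rough sketch in plain Python, not PyPy JIT code, and the names has_exact_reciprocal and count_mismatches are made up for this illustration.

```python
import math
import random


def has_exact_reciprocal(c):
    """Return True if 1.0 / c is exactly representable as a float.

    Hypothetical helper: checks that both c and its computed reciprocal
    are powers of two (mantissa 0.5 in frexp form), finite and nonzero.
    """
    if c == 0.0 or math.isinf(c) or math.isnan(c):
        return False
    r = 1.0 / c
    if r == 0.0 or math.isinf(r):
        # The reciprocal underflowed to zero or overflowed to infinity.
        return False
    mantissa_c, _ = math.frexp(abs(c))
    mantissa_r, _ = math.frexp(abs(r))
    return mantissa_c == 0.5 and mantissa_r == 0.5


def count_mismatches(c, n=10000, seed=0):
    """Count random x in [0, 1) for which x / c differs from x * (1.0 / c)."""
    rng = random.Random(seed)
    r = 1.0 / c
    mismatches = 0
    for _ in range(n):
        x = rng.random()
        if x / c != x * r:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    for c in (0.5, 2.0, 3.0, 0.1):
        print(c, has_exact_reciprocal(c), count_mismatches(c))
```

In the example from the original message the divisor is 0.5, which passes this check, so rewriting i / 0.5 as i * 2.0 does not change any result. For constants that fail the check, the rewrite can change the last bit of some results, which is the trade-off the question is about.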