Hello, I noticed that PyPy's JIT emits a "DIVSD" instruction on the xmm registers when a float is divided by a constant C. DIVSD consumes roughly an order of magnitude more CPU cycles than the corresponding "MULSD" instruction with a precomputed 1/C.
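To make the transformation concrete, here is a minimal sketch in plain Python (nothing PyPy-specific; the helper count_mismatches is just mine for illustration) of the rewrite x / C -> x * (1.0 / C) and of why it is only guaranteed bit-exact for power-of-two constants:

    import random

    def count_mismatches(c, n=10**5):
        recip = 1.0 / c              # precomputed reciprocal, rounded once
        mismatches = 0
        for _ in range(n):
            x = random.random()
            # compare the plain division against the reciprocal rewrite
            if x / c != x * recip:
                mismatches += 1
        return mismatches

    print(count_mismatches(0.5))   # 1/0.5 is exact, so this should be 0
    print(count_mismatches(3.0))   # 1/3.0 is rounded, so some last-bit mismatches are expected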
I know that only powers of two have an exact reciprocal representation in floating point, but there might be a benefit in trading the least significant digit for a substantial speedup. So, is this a missed optimization (at least for the reasonably accurate cases), a present or possible future option (like -ffast-math in gcc), or are there further reasons against it?

Thanks,
Toni

--- PS: Small Example ---

This function takes on average 0.41 seconds on an array.array('d') with 10**8 elements between 0 and 1:

    def spikes_div(data, threshold=1.99):
        count = 0
        for i in data:
            if i / 0.5 > threshold:
                count += 1
        return count

Rewritten with a multiplication it takes about 0.29 seconds on average, a speedup of factor 1.4:

    ...
        if i * 2.0 > threshold:
    ...

The traces contain the same instructions (apart from MULSD vs. DIVSD) and run the same number of times. I'm working with a fresh translation of the current PyPy default on Ubuntu 14.04 x64 with a 2nd generation Core i7 CPU.
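In case it helps to reproduce the effect, a self-contained harness along the following lines should show the same gap; this is only a sketch (uniform random input and plain wall-clock timing are assumptions, not the exact script behind the numbers above):

    import array, random, time

    def spikes_div(data, threshold=1.99):
        count = 0
        for i in data:
            if i / 0.5 > threshold:
                count += 1
        return count

    def spikes_mul(data, threshold=1.99):
        count = 0
        for i in data:
            if i * 2.0 > threshold:
                count += 1
        return count

    data = array.array('d', (random.random() for _ in range(10**8)))
    # run each variant twice so the second timing is taken with a warm JIT
    for f in (spikes_div, spikes_mul, spikes_div, spikes_mul):
        t0 = time.time()
        f(data)
        print(f.__name__, time.time() - t0)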