One bad thing about making such a decision (assuming we can lose a bit
of precision, which I'm not convinced about) is that you would get
different results when the code is jitted versus when it is not.
I think that is not acceptable.
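As a quick sketch of the problem (constants chosen arbitrarily for
illustration): dividing by C and multiplying by a precomputed 1/C round
differently for many non-power-of-two C, so the two code paths can
already disagree in the last bit:

```python
# Division by C rounds once; multiplying by the precomputed
# reciprocal rounds twice (once for 1/C, once for the product),
# so the results can differ in the last bit.
C = 10.0                      # not a power of two, so 1/C is inexact
y = 3.0

quotient = y / C              # single rounding of the exact quotient
via_reciprocal = y * (1.0 / C)  # two roundings

print(quotient == via_reciprocal)  # prints False
```

If the JIT rewrote the division into the multiplication, the same
Python expression would return `quotient` interpreted and
`via_reciprocal` jitted.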

On Thu, Nov 6, 2014 at 3:48 AM, Steven D'Aprano <st...@pearwood.info> wrote:
> On Wed, Nov 05, 2014 at 10:02:01PM +0000, Alex Gaynor wrote:
>> Hey Toni,
>>
>> If this optimization is valid for any float, we should definitely do it,
>> and this is a missed optimization. If it's not valid for all floats, I'm
>> not sure how we should handle it, if at all.
>
> I don't believe that it is valid for floats apart from exact powers of
> two. Toni says:
>
>> > only powers of two have an exact reciprocal floating point
>> > representation, but there might be a benefit in trading the least
>> > significant digit for a more significant speedup.
>
> Please don't make that decision for the user. If I want to trade off
> accuracy for speed, I can write:
>
> r = 1/x
> y*r
>
> but if I write y/x, I expect y/x to the full accuracy available.
>
>
> Thanks,
>
>
> Steve
>>
>> Alex
>>
>> On Wed Nov 05 2014 at 10:16:36 AM Toni Mattis <
>> toni.mat...@student.hpi.uni-potsdam.de> wrote:
>>
>> > Hello,
>> >
>> > I discovered that PyPy's JIT generates "DIVSD" instructions on xmm
>> > registers when dividing a float by a constant C. This consumes an order
>> > of magnitude more CPU cycles than the corresponding "MULSD" instruction
>> > with a precomputed 1/C.
>> >
>> > I know that only powers of two have an exact reciprocal floating point
>> > representation, but there might be a benefit in trading the least
>> > significant digit for a more significant speedup.
>> >
>> > So, is this a missed optimization (at least for reasonably accurate
>> > cases), a present or possibly future option (like -ffast-math in gcc) or
>> > are there more reasons against it?
>> >
>> >
>> > Thanks,
>> >
>> > Toni
>> >
>> >
>> > --- PS: Small Example ---
>> >
>> > This function takes on average 0.41 seconds to compute on an
>> > array.array('d') with 10**8 elements between 0 and 1:
>> >
>> >     def spikes_div(data, threshold=1.99):
>> >         count = 0
>> >         for i in data:
>> >             if i / 0.5 > threshold:
>> >                 count += 1
>> >         return count
>> >
>> > Rewritten with a multiplication it takes about 0.29 seconds on average,
>> > a speedup of roughly 1.4x:
>> >
>> >         ...
>> >             if i * 2.0 > threshold:
>> >         ...
>> >
>> >
>> > The traces contain the same instructions (except for the MULSD/DIVSD)
>> > and run the same number of times. I'm working with a fresh translation
>> > of the current PyPy default on Ubuntu 14.04 x64 with a 2nd generation
>> > Core i7 CPU.
>> >
>> >
>> > _______________________________________________
>> > pypy-dev mailing list
>> > pypy-dev@python.org
>> > https://mail.python.org/mailman/listinfo/pypy-dev
>> >