Hello,

On Tue, Feb 28, 2012 at 12:36 PM, Florian Schneider <[email protected]
> wrote:

> [+vegorov]
>
> Sorry for the delay. As I see, there are two optimizations here:
>

> 1. Optimizing operations where the result is truncated to int32 (used in a
> bitwise-operation)
>
> (1) is a little tricky to preserve the correct result in case of
> overflowing the precise integer range and get the corner cases
> right. Vyacheslav (cc'ed) and I also did some experiments but did not yet
> come up with a satisfying fast solution. I think your solution is missing
> the propagation through Phi-instructions which is needed in some of the
> crypto-like benchmarks.
>

I agree this is quite tricky. I think there are two sub-issues here.
- The first is to track what operations can be optimized. The range
tracking is lacking information to do that optimally. I found that
propagation of the optimizable status to earlier operations is very limited
by information loss.
- The second is to correctly round and clear the bottom bits within
optimized operations. This is hard in the multiplication case, but feasible.
I found there was a little imprecision here.
ECMA-262 specifies

11.5.1 Applying the * Operator
[...]
In the remaining cases, where neither an infinity or NaN is involved, the
product is computed
and* rounded to the nearest representable value using IEEE 754
round-to-nearest mode*. If
the magnitude is too large to represent, the result is then an infinity of
appropriate sign. If

But in section 4.3 of IEEE-754, two sub-modes are defined for rounding to
nearest: roundTiesToEven and roundTiesToAway.
The standard only requires the implementation of roundTiesToEven, so this
is the one I implemented when rounding multiplication.
This means that the result may differ from what a  FPU is configured
to roundTiesToAway would yield.
However it seems this still satisfies ECMA-262.



> 2. Optimizing Math.floor(x/y)
>
> I think (2) is definitely a good idea. It is a very common pattern used by
> programmers.
>
> I would appreciate if you could separate the Math.floor optimization into
> a separate CL for review.
>

I will do that.
I thought about implementing a similar optimization for (x / y) | 0 (rather
than using 1.) in the same CL, but I think it requires too many additional
changes. I believe it would be interesting as a separate CL to extend the
well known '| 0' trick to the division operation.


>
> Thanks,
> Florian
>

Thanks,

Alexandre


>
> Den 28. feb. 2012 12.10 skrev Alexandre Rames <[email protected]>:
>
>  Any comments or ideas about this?
>>
>> Alexandre
>>
>>
>> On Fri, Feb 3, 2012 at 1:08 PM, Alexandre Rames <
>> [email protected]> wrote:
>>
>>> Hello,
>>>
>>> Sorry for the late answer.
>>>
>>> Although the patch locally improves a few code segments, there are no
>>> observable performance gain on the usual benchmarks.
>>> The main reason seems to be that benchmarks are designed to avoid (or
>>> just don't) going through these bad cases.
>>>
>>> However this extends the trick (var | 0) well known from JS
>>> programmers (example 
>>> here<http://code.google.com/p/v8/issues/detail?id=1840>).
>>> Once it lands it will allow to generate more optimized code. So I think
>>> there is a good potential use for this.
>>>
>>> On the very following very dummy benchmark below, I get the following
>>> timings on my Tegra2:
>>>
>>> $ time ./d8.10579 test/local/bitwise.js
>>>
>>> real    0m4.852s
>>> user    0m4.780s
>>> sys     0m0.030s
>>>
>>> $ time ./d8.bitwise test/local/bitwise.js
>>>
>>> real    0m2.092s
>>> user    0m2.060s
>>> sys     0m0.030s
>>>
>>>
>>> // bitwise.js -----------------------------------------------------------
>>> function main() {
>>>   for(var i = 0; i < 30000000; i++) {
>>>     test(i, i*3);
>>>   }
>>> }
>>> function test(a, b) {
>>>   var v1 = a - 5;
>>>   var v2 = b >> 1;
>>>   return ((a / b) | 0) + ((a * (b + 1)) >> 5) + Math.floor(b / 5);
>>> }
>>> main();
>>>
>>>
>>> And that it is without a SDIV instruction...
>>> I admit here we benefit greatly from the optimization, especially for
>>> the divisions.
>>>
>>> I think it would be interesting to rewrite the crypto benchmarks using
>>> this optimization...
>>>
>>> Alexandre
>>>
>>>
>>>
>>>
>>> On Tue, Jan 31, 2012 at 3:51 PM, Daniel Clifford <[email protected]>wrote:
>>>
>>>> I'll add Florian and Kevin to your CL, they are probably the best
>>>> qualified to review what you've done. Can you please remind me which of the
>>>> benchmarks you found this to help on?
>>>>
>>>> Thanks,
>>>>  Danno
>>>>
>>>>
>>>> On Tue, Jan 31, 2012 at 4:42 PM, Alexandre Rames <
>>>> [email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> A few details to help the reviewers.
>>>>>
>>>>> - The hydrogen level optimization to optimize HBitwise instructions
>>>>> away when canonicalizing has been moved down to Lithium level.
>>>>>   Removing the HBitwise instruction loses information on how the
>>>>> inputs are used and prevents from recognizing patterns like (expr | 0) 
>>>>> when
>>>>> inserting representation changes (see below).
>>>>>   The problem with moving this down to Lithium is that the code is
>>>>> duplicated for all architectures.
>>>>> - The first step, triggered during the range inference, visits the
>>>>> Hydrogen graph to mark operations only used by bitwise operations. This
>>>>> allows to discard for example overflow checks or bailout on minus zero.
>>>>>   We may want to move this somewhere else.
>>>>> - The second steps occurs when representation changes are inserted
>>>>> before a bitwise operation. We try to recognize patterns like ((a / b) | 
>>>>> 0)
>>>>> or Math.floor(a / b) and optimize the generated code.
>>>>> - Basic support has been added for other architectures than ARM, but
>>>>> they still lack the optimized code part (implementation should be easy on
>>>>> ia32 and x64).
>>>>>
>>>>> The optimization can be extended to support more cases. I think this
>>>>> handles enough for a first step.
>>>>>
>>>>> Please don't hesitate to ask me for details (email or chat).
>>>>>
>>>>> Alexandre
>>>>>
>>>>> On Tue, Jan 31, 2012 at 3:26 PM, <[email protected]> wrote:
>>>>>
>>>>>> Reviewers: danno,
>>>>>>
>>>>>> Description:
>>>>>> Optimise code for bitwise usage
>>>>>>
>>>>>> BUG=none
>>>>>> TEST=Added bitwise-use.js, math-floor-div.js
>>>>>>
>>>>>> Please review this at 
>>>>>> http://codereview.chromium.**org/9301041/<http://codereview.chromium.org/9301041/>
>>>>>>
>>>>>> SVN Base: 
>>>>>> http://v8.googlecode.com/svn/**branches/bleeding_edge/<http://v8.googlecode.com/svn/branches/bleeding_edge/>
>>>>>>
>>>>>> Affected files:
>>>>>>  M     src/arm/lithium-arm.h
>>>>>>  M     src/arm/lithium-arm.cc
>>>>>>  M     src/arm/lithium-codegen-arm.h
>>>>>>  M     src/arm/lithium-codegen-arm.cc
>>>>>>  M     src/arm/macro-assembler-arm.h
>>>>>>  M     src/arm/macro-assembler-arm.cc
>>>>>>  M     src/hydrogen-instructions.h
>>>>>>  M     src/hydrogen-instructions.cc
>>>>>>  M     src/hydrogen.cc
>>>>>>  M     src/ia32/lithium-codegen-ia32.**cc
>>>>>>  M     src/ia32/lithium-ia32.h
>>>>>>  M     src/ia32/lithium-ia32.cc
>>>>>>  M     src/mips/lithium-codegen-mips.**cc
>>>>>>  M     src/mips/lithium-mips.h
>>>>>>  M     src/mips/lithium-mips.cc
>>>>>>  M     src/utils.h
>>>>>>  M     src/utils.cc
>>>>>>  M     src/x64/lithium-codegen-x64.cc
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> v8-dev mailing list
>>>>>> [email protected]
>>>>>> http://groups.google.com/**group/v8-dev<http://groups.google.com/group/v8-dev>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>  --
>> v8-dev mailing list
>> [email protected]
>> http://groups.google.com/group/v8-dev
>>
>
>  --
> v8-dev mailing list
> [email protected]
> http://groups.google.com/group/v8-dev
>

-- 
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev

Reply via email to