Re: [v8-dev] Optimise code for bitwise usage... (issue 9301041)

Florian Schneider Tue, 28 Feb 2012 04:37:06 -0800

[+vegorov]

Sorry for the delay. As I see, there are two optimizations here:


1. Optimizing operations where the result is truncated to int32 (used in a
bitwise-operation)

(1) is a little tricky to preserve the correct result in case of
overflowing the precise integer range and get the corner cases
right. Vyacheslav (cc'ed) and I also did some experiments but did not yet
come up with a satisfying fast solution. I think your solution is missing
the propagation through Phi-instructions which is needed in some of the
crypto-like benchmarks.

2. Optimizing Math.floor(x/y)

I think (2) is definitely a good idea. It is a very common pattern used by
programmers.

I would appreciate if you could separate the Math.floor optimization into a
separate CL for review.

Thanks,
Florian

Den 28. feb. 2012 12.10 skrev Alexandre Rames <[email protected]>:

> Any comments or ideas about this?
>
> Alexandre
>
>
> On Fri, Feb 3, 2012 at 1:08 PM, Alexandre Rames <[email protected]
> > wrote:
>
>> Hello,
>>
>> Sorry for the late answer.
>>
>> Although the patch locally improves a few code segments, there are no
>> observable performance gain on the usual benchmarks.
>> The main reason seems to be that benchmarks are designed to avoid (or
>> just don't) going through these bad cases.
>>
>> However this extends the trick (var | 0) well known from JS
>> programmers (example 
>> here<http://code.google.com/p/v8/issues/detail?id=1840>).
>> Once it lands it will allow to generate more optimized code. So I think
>> there is a good potential use for this.
>>
>> On the very following very dummy benchmark below, I get the following
>> timings on my Tegra2:
>>
>> $ time ./d8.10579 test/local/bitwise.js
>>
>> real    0m4.852s
>> user    0m4.780s
>> sys     0m0.030s
>>
>> $ time ./d8.bitwise test/local/bitwise.js
>>
>> real    0m2.092s
>> user    0m2.060s
>> sys     0m0.030s
>>
>>
>> // bitwise.js -----------------------------------------------------------
>> function main() {
>>   for(var i = 0; i < 30000000; i++) {
>>     test(i, i*3);
>>   }
>> }
>> function test(a, b) {
>>   var v1 = a - 5;
>>   var v2 = b >> 1;
>>   return ((a / b) | 0) + ((a * (b + 1)) >> 5) + Math.floor(b / 5);
>> }
>> main();
>>
>>
>> And that it is without a SDIV instruction...
>> I admit here we benefit greatly from the optimization, especially for the
>> divisions.
>>
>> I think it would be interesting to rewrite the crypto benchmarks using
>> this optimization...
>>
>> Alexandre
>>
>>
>>
>>
>> On Tue, Jan 31, 2012 at 3:51 PM, Daniel Clifford <[email protected]>wrote:
>>
>>> I'll add Florian and Kevin to your CL, they are probably the best
>>> qualified to review what you've done. Can you please remind me which of the
>>> benchmarks you found this to help on?
>>>
>>> Thanks,
>>>  Danno
>>>
>>>
>>> On Tue, Jan 31, 2012 at 4:42 PM, Alexandre Rames <
>>> [email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> A few details to help the reviewers.
>>>>
>>>> - The hydrogen level optimization to optimize HBitwise instructions
>>>> away when canonicalizing has been moved down to Lithium level.
>>>>   Removing the HBitwise instruction loses information on how the inputs
>>>> are used and prevents from recognizing patterns like (expr | 0) when
>>>> inserting representation changes (see below).
>>>>   The problem with moving this down to Lithium is that the code is
>>>> duplicated for all architectures.
>>>> - The first step, triggered during the range inference, visits the
>>>> Hydrogen graph to mark operations only used by bitwise operations. This
>>>> allows to discard for example overflow checks or bailout on minus zero.
>>>>   We may want to move this somewhere else.
>>>> - The second steps occurs when representation changes are inserted
>>>> before a bitwise operation. We try to recognize patterns like ((a / b) | 0)
>>>> or Math.floor(a / b) and optimize the generated code.
>>>> - Basic support has been added for other architectures than ARM, but
>>>> they still lack the optimized code part (implementation should be easy on
>>>> ia32 and x64).
>>>>
>>>> The optimization can be extended to support more cases. I think this
>>>> handles enough for a first step.
>>>>
>>>> Please don't hesitate to ask me for details (email or chat).
>>>>
>>>> Alexandre
>>>>
>>>> On Tue, Jan 31, 2012 at 3:26 PM, <[email protected]> wrote:
>>>>
>>>>> Reviewers: danno,
>>>>>
>>>>> Description:
>>>>> Optimise code for bitwise usage
>>>>>
>>>>> BUG=none
>>>>> TEST=Added bitwise-use.js, math-floor-div.js
>>>>>
>>>>> Please review this at 
>>>>> http://codereview.chromium.**org/9301041/<http://codereview.chromium.org/9301041/>
>>>>>
>>>>> SVN Base: 
>>>>> http://v8.googlecode.com/svn/**branches/bleeding_edge/<http://v8.googlecode.com/svn/branches/bleeding_edge/>
>>>>>
>>>>> Affected files:
>>>>>  M     src/arm/lithium-arm.h
>>>>>  M     src/arm/lithium-arm.cc
>>>>>  M     src/arm/lithium-codegen-arm.h
>>>>>  M     src/arm/lithium-codegen-arm.cc
>>>>>  M     src/arm/macro-assembler-arm.h
>>>>>  M     src/arm/macro-assembler-arm.cc
>>>>>  M     src/hydrogen-instructions.h
>>>>>  M     src/hydrogen-instructions.cc
>>>>>  M     src/hydrogen.cc
>>>>>  M     src/ia32/lithium-codegen-ia32.**cc
>>>>>  M     src/ia32/lithium-ia32.h
>>>>>  M     src/ia32/lithium-ia32.cc
>>>>>  M     src/mips/lithium-codegen-mips.**cc
>>>>>  M     src/mips/lithium-mips.h
>>>>>  M     src/mips/lithium-mips.cc
>>>>>  M     src/utils.h
>>>>>  M     src/utils.cc
>>>>>  M     src/x64/lithium-codegen-x64.cc
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> v8-dev mailing list
>>>>> [email protected]
>>>>> http://groups.google.com/**group/v8-dev<http://groups.google.com/group/v8-dev>
>>>>
>>>>
>>>>
>>>
>>
>  --
> v8-dev mailing list
> [email protected]
> http://groups.google.com/group/v8-dev
>

-- 
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev

Re: [v8-dev] Optimise code for bitwise usage... (issue 9301041)

Reply via email to