Hello, On Tue, Feb 28, 2012 at 12:36 PM, Florian Schneider <[email protected] > wrote:
> [+vegorov] > > Sorry for the delay. As I see, there are two optimizations here: > > 1. Optimizing operations where the result is truncated to int32 (used in a > bitwise-operation) > > (1) is a little tricky to preserve the correct result in case of > overflowing the precise integer range and get the corner cases > right. Vyacheslav (cc'ed) and I also did some experiments but did not yet > come up with a satisfying fast solution. I think your solution is missing > the propagation through Phi-instructions which is needed in some of the > crypto-like benchmarks. > I agree this is quite tricky. I think there are two sub-issues here. - The first is to track what operations can be optimized. The range tracking is lacking information to do that optimally. I found that propagation of the optimizable status to earlier operations is very limited by information loss. - The second is to correctly round and clear the bottom bits within optimized operations. This is hard in the multiplication case, but feasible. I found there was a little imprecision here. ECMA-262 specifies 11.5.1 Applying the * Operator [...] In the remaining cases, where neither an infinity or NaN is involved, the product is computed and* rounded to the nearest representable value using IEEE 754 round-to-nearest mode*. If the magnitude is too large to represent, the result is then an infinity of appropriate sign. If But in section 4.3 of IEEE-754, two sub-modes are defined for rounding to nearest: roundTiesToEven and roundTiesToAway. The standard only requires the implementation of roundTiesToEven, so this is the one I implemented when rounding multiplication. This means that the result may differ from what a FPU is configured to roundTiesToAway would yield. However it seems this still satisfies ECMA-262. > 2. Optimizing Math.floor(x/y) > > I think (2) is definitely a good idea. It is a very common pattern used by > programmers. > > I would appreciate if you could separate the Math.floor optimization into > a separate CL for review. > I will do that. I thought about implementing a similar optimization for (x / y) | 0 (rather than using 1.) in the same CL, but I think it requires too many additional changes. I believe it would be interesting as a separate CL to extend the well known '| 0' trick to the division operation. > > Thanks, > Florian > Thanks, Alexandre > > Den 28. feb. 2012 12.10 skrev Alexandre Rames <[email protected]>: > > Any comments or ideas about this? >> >> Alexandre >> >> >> On Fri, Feb 3, 2012 at 1:08 PM, Alexandre Rames < >> [email protected]> wrote: >> >>> Hello, >>> >>> Sorry for the late answer. >>> >>> Although the patch locally improves a few code segments, there are no >>> observable performance gain on the usual benchmarks. >>> The main reason seems to be that benchmarks are designed to avoid (or >>> just don't) going through these bad cases. >>> >>> However this extends the trick (var | 0) well known from JS >>> programmers (example >>> here<http://code.google.com/p/v8/issues/detail?id=1840>). >>> Once it lands it will allow to generate more optimized code. So I think >>> there is a good potential use for this. >>> >>> On the very following very dummy benchmark below, I get the following >>> timings on my Tegra2: >>> >>> $ time ./d8.10579 test/local/bitwise.js >>> >>> real 0m4.852s >>> user 0m4.780s >>> sys 0m0.030s >>> >>> $ time ./d8.bitwise test/local/bitwise.js >>> >>> real 0m2.092s >>> user 0m2.060s >>> sys 0m0.030s >>> >>> >>> // bitwise.js ----------------------------------------------------------- >>> function main() { >>> for(var i = 0; i < 30000000; i++) { >>> test(i, i*3); >>> } >>> } >>> function test(a, b) { >>> var v1 = a - 5; >>> var v2 = b >> 1; >>> return ((a / b) | 0) + ((a * (b + 1)) >> 5) + Math.floor(b / 5); >>> } >>> main(); >>> >>> >>> And that it is without a SDIV instruction... >>> I admit here we benefit greatly from the optimization, especially for >>> the divisions. >>> >>> I think it would be interesting to rewrite the crypto benchmarks using >>> this optimization... >>> >>> Alexandre >>> >>> >>> >>> >>> On Tue, Jan 31, 2012 at 3:51 PM, Daniel Clifford <[email protected]>wrote: >>> >>>> I'll add Florian and Kevin to your CL, they are probably the best >>>> qualified to review what you've done. Can you please remind me which of the >>>> benchmarks you found this to help on? >>>> >>>> Thanks, >>>> Danno >>>> >>>> >>>> On Tue, Jan 31, 2012 at 4:42 PM, Alexandre Rames < >>>> [email protected]> wrote: >>>> >>>>> Hello, >>>>> >>>>> A few details to help the reviewers. >>>>> >>>>> - The hydrogen level optimization to optimize HBitwise instructions >>>>> away when canonicalizing has been moved down to Lithium level. >>>>> Removing the HBitwise instruction loses information on how the >>>>> inputs are used and prevents from recognizing patterns like (expr | 0) >>>>> when >>>>> inserting representation changes (see below). >>>>> The problem with moving this down to Lithium is that the code is >>>>> duplicated for all architectures. >>>>> - The first step, triggered during the range inference, visits the >>>>> Hydrogen graph to mark operations only used by bitwise operations. This >>>>> allows to discard for example overflow checks or bailout on minus zero. >>>>> We may want to move this somewhere else. >>>>> - The second steps occurs when representation changes are inserted >>>>> before a bitwise operation. We try to recognize patterns like ((a / b) | >>>>> 0) >>>>> or Math.floor(a / b) and optimize the generated code. >>>>> - Basic support has been added for other architectures than ARM, but >>>>> they still lack the optimized code part (implementation should be easy on >>>>> ia32 and x64). >>>>> >>>>> The optimization can be extended to support more cases. I think this >>>>> handles enough for a first step. >>>>> >>>>> Please don't hesitate to ask me for details (email or chat). >>>>> >>>>> Alexandre >>>>> >>>>> On Tue, Jan 31, 2012 at 3:26 PM, <[email protected]> wrote: >>>>> >>>>>> Reviewers: danno, >>>>>> >>>>>> Description: >>>>>> Optimise code for bitwise usage >>>>>> >>>>>> BUG=none >>>>>> TEST=Added bitwise-use.js, math-floor-div.js >>>>>> >>>>>> Please review this at >>>>>> http://codereview.chromium.**org/9301041/<http://codereview.chromium.org/9301041/> >>>>>> >>>>>> SVN Base: >>>>>> http://v8.googlecode.com/svn/**branches/bleeding_edge/<http://v8.googlecode.com/svn/branches/bleeding_edge/> >>>>>> >>>>>> Affected files: >>>>>> M src/arm/lithium-arm.h >>>>>> M src/arm/lithium-arm.cc >>>>>> M src/arm/lithium-codegen-arm.h >>>>>> M src/arm/lithium-codegen-arm.cc >>>>>> M src/arm/macro-assembler-arm.h >>>>>> M src/arm/macro-assembler-arm.cc >>>>>> M src/hydrogen-instructions.h >>>>>> M src/hydrogen-instructions.cc >>>>>> M src/hydrogen.cc >>>>>> M src/ia32/lithium-codegen-ia32.**cc >>>>>> M src/ia32/lithium-ia32.h >>>>>> M src/ia32/lithium-ia32.cc >>>>>> M src/mips/lithium-codegen-mips.**cc >>>>>> M src/mips/lithium-mips.h >>>>>> M src/mips/lithium-mips.cc >>>>>> M src/utils.h >>>>>> M src/utils.cc >>>>>> M src/x64/lithium-codegen-x64.cc >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> v8-dev mailing list >>>>>> [email protected] >>>>>> http://groups.google.com/**group/v8-dev<http://groups.google.com/group/v8-dev> >>>>> >>>>> >>>>> >>>> >>> >> -- >> v8-dev mailing list >> [email protected] >> http://groups.google.com/group/v8-dev >> > > -- > v8-dev mailing list > [email protected] > http://groups.google.com/group/v8-dev > -- v8-dev mailing list [email protected] http://groups.google.com/group/v8-dev
