Status: New
Owner: ----
New issue 2096 by [email protected]: Fixing Navier-Stokes performance
on ARM.
http://code.google.com/p/v8/issues/detail?id=2096
ARM currently under-performs in the Navier-Stokes benchmark.
I had a look today and found something.
On my Tegra2 a one line hack (patch ns.diff attached) changes the score
from ~1000 to ~1800.
Any idea how to cleanly fix this?
Let me know if you need more details/traces or have questions (available on
chat).
More details:
1 ---- issue with representation inference
In navier-stokes.js, starting from line 204:
function advect(...)
{
var Wp5 = width + 0.5; // width = 128
var Hp5 = height + 0.5; // height = 128
for() { // OSR-ed loop
...
if (x < 0.5)
x = 0.5;
else if (x > Wp5)
x = Wp5;
...
}
...
Wp5 and Hp5 have double values. But it seems something goes wrong in the
representation inference mechanism, and they end up converted to int32 -
the conversion (non-truncating) fails triggering deop.
The attached c1visualizer output attached shows in block B1 that a 'wrong'
decision has been taken.
2 ---- Deoptimization
From my observation the actual deoptimization occurs when trying to set up
the value when entering the OSR-ed loop. I guess the first attempt to
conversion is done in when entering the OSR-ed loop.
In attached advect.opt_code: (--print_opt_code --code_comments)
Jumping from
316 ;;; @140: tagged-to-i.
317 0x4b06508c 652 e1b010c1 movs r1, r1, asr #1
318 0x4b065090 656 2a000203 bcs +2068 -> 2724 (0x4b0658a4)
to the deferred-tagged-to-i, which fails with an inexact conversion (128.5
to int32)
1162 0x4b0658e4 2788 e312001f tst r2, #31
1163 ;; Deoptimization check. id = 47
1164 0x4b0658e8 2792 1a000142 bne +1296 -> 4088 (0x4b065df8)
3 ---- Too optimistic representation change in TryChange() ?
I think the decision to prefer Integer32 to Double in
HInferRepresentation::TryChange(HValue* value) (hydrogen.cc line 1904)
is too optimistic.
It seems to me
IsConvertibleToInteger()
does not always guarantee that the value can be converted to an int32.
As the patch shows, preferring double over int32 fixes the performance gap.
Note that I am not sure how the 3 points exactly relate together. I didn't
investigate further after finding a hack.
(It seems to me 1 and 2 are two different results caused by 3.)
It is unlikely that the transition (in the c1visualizer trace)
0 1 i351 Change d23 d to i -0? range[-2147483648,2147483647,m0=0]
is generated by (hydrogen.cc line 1860)
HInferRepresentation::InferBasedOnUses()
since we should have returned line 1862
if(r.isSpecialization() || value->HasNoUses()) return;
Attachments:
ns.diff 642 bytes
c1-visu.png 159 KB
advect.opt_code 102 KB
--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev