Hi Conrad,

I'll make an attempt at answering, though I'm not an expert on OSR, so 
others like Jakob or Mythri may have more precise answers.

1. Why would this property access be polymorphic?

If you're talking about polymorphic access, you're probably familiar with 
hidden classes (also called "object shapes" or "Maps but not *those* Maps 
<https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map>").
 
Regardless, I'll include a link: the most complete and accurate description 
I've found of how hidden classes work in V8 is Fast properties in V8 
<https://v8.dev/blog/fast-properties>.

In this example, the object {a: 0} has a hidden class which says it has one 
property named "a". The object {a: 0, b: 1} has a different hidden class 
which says it has two properties, named "a" and "b". After the function 
getA has performed a lookup to get property "a" from {a: 0}, it keeps a 
pointer to that object's hidden class and another value indicating where 
the property can be found (in this case, the first in-object property 
slot). When running getA({a: 0, b: 1}), V8 checks whether the object 
{a: 0, b: 1} has the same hidden class as {a: 0}, which it does not. So V8 
remembers a second pair of values: the hidden class for {a: 0, b: 1} and 
where its "a" property can be found (also the first in-object property 
slot). Now the feedback state for that load operation is polymorphic. The 
"mono" in monomorphic refers to a single hidden class, not to a single 
result of where the property can be found.
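To make those feedback transitions concrete, here is a toy model of an 
inline cache written in plain JavaScript. This is purely an illustration 
(the names shapeOf, loadA, and feedback are invented; real V8 ICs live in 
generated code and compare actual hidden-class pointers, not strings):

```javascript
// Toy model of IC feedback for one property-load site (illustration
// only, not V8's real mechanism). We approximate a "shape" by the
// ordered list of property names.
function shapeOf(o) {
  return Object.keys(o).join(",");
}

// The feedback for one load site: a list of (shape, location) pairs,
// like the pairs of values described above.
const feedback = [];

function loadA(o) {
  const shape = shapeOf(o);
  let entry = feedback.find((e) => e.shape === shape);
  if (!entry) {
    // "IC miss": remember this shape and where "a" lives within it.
    entry = { shape, keyIndex: Object.keys(o).indexOf("a") };
    feedback.push(entry);
  }
  return Object.values(o)[entry.keyIndex];
}

loadA({ a: 0 });       // feedback holds 1 shape -> monomorphic
loadA({ a: 0, b: 1 }); // feedback holds 2 shapes -> polymorphic
console.log(feedback.length); // → 2, even though "a" is the first
                              // property in both shapes
```

Note that both entries record the same location for "a"; the site is 
polymorphic purely because two distinct shapes have been seen.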

This behavior tends to be particularly problematic in codebases with a lot 
of class inheritance, because loading a field defined by a base class is 
often a megamorphic operation, even if that base class's constructor always 
sets the same properties in the same order.
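A sketch of how inheritance fans out hidden classes (the class names here 
are invented for illustration):

```javascript
// Sketch (invented class names): each subclass ends up with its own
// hidden class, even though Base always sets "id" first.
class Base {
  constructor() { this.id = 0; }
}
class Circle extends Base {
  constructor() { super(); this.radius = 1; }
}
class Square extends Base {
  constructor() { super(); this.side = 1; }
}
class Line extends Base {
  constructor() { super(); this.length = 1; }
}

function getId(o) {
  // This one load site sees three different shapes, so its feedback
  // becomes polymorphic, and with enough subclasses it would go
  // megamorphic, even though "id" is defined by Base in every case.
  return o.id;
}

for (const o of [new Circle(), new Square(), new Line()]) {
  getId(o);
}
```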

2. Why would polymorphic code optimized by TurboFan be a full 3x slower 
than unoptimized bytecode?

It seems that you may be misunderstanding the somewhat cryptic output from 
--trace-opt. In particular, OSR means on-stack replacement 
<https://v8.dev/blog/v8-release-79#osr-caching>. Copying some text from 
that link:

*When V8 identifies that certain functions are hot it marks them for 
optimization on the next call. When the function executes again, V8 
compiles the function using the optimizing compiler and starts using the 
optimized code from the subsequent call. However, for functions with long 
running loops this is not sufficient. V8 uses a technique called on-stack 
replacement (OSR) to install optimized code for the currently executing 
function. This allows us to start using the optimized code during the first 
execution of the function, while it is stuck in a hot loop.*
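Concretely, a function of roughly this shape is hot enough to be OSR'd 
partway through its very first call (a sketch, not the gist's exact code; 
the array here is smaller than the 100-million-element original so it runs 
quickly):

```javascript
// Sketch of a loop hot enough to trigger OSR: the first call spends so
// long in this loop that V8 swaps in optimized code mid-execution
// rather than waiting for the next call.
function sum(array) {
  let total = 0;
  for (let i = 0; i < array.length; i++) {
    total += array[i].a;
  }
  return total;
}

const array1 = Array.from({ length: 1_000_000 }, () => ({ a: 1 }));
console.time("array1");
sum(array1);
console.timeEnd("array1");
```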

Iterating through an array of 100 million items certainly counts as a "hot 
loop", so the vast majority of the time in *all* of your measurements is 
spent in optimized code produced by TurboFan, not in the interpreter. You 
can try running unoptimized code by passing the command-line flag --no-opt, 
which I expect will be much slower than anything you've measured so far. 
I've added some possibly more human-readable annotations to the output you 
provided:

# This call started in the interpreter, but was replaced by optimized code 
while running (this process is referred to as OSR).
[compiling method 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> 
(target TURBOFAN) using TurboFan OSR]
[optimizing 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target 
TURBOFAN) - took 0.000, 0.541, 0.000 ms]
array1: 115.701ms

# This call reused the OSR code from the first call.
[found optimized code for 0x3b21df5b9ad9 <JSFunction sum (sfi = 
0x10c4d4712831)> (target TURBOFAN) at OSR bytecode offset 35]
array1: 113.721ms

# At this point, the function got compiled normally (not using OSR), so 
future calls will use this optimized code.
[compiling method 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> 
(target TURBOFAN) using TurboFan]
[optimizing 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target 
TURBOFAN) - took 0.000, 0.500, 0.041 ms]

# These three calls used fully optimized code.
array1: 80.069ms
array1: 79.72ms
array1: 79.245ms

# This call mostly used optimized code, until it bailed out to the 
interpreter for the last four items.
array2: 78.906ms

# This call reused the OSR code from the very first call. This is 
surprising to me; I didn't realize that the OSR code was still available at 
this point, after the non-OSR version of the function has bailed out. 
However, it seems to work nicely in this case. Once again, it bailed out to 
the interpreter for the last four items.
[found optimized code for 0x3b21df5b9ad9 <JSFunction sum (sfi = 
0x10c4d4712831)> (target TURBOFAN) at OSR bytecode offset 35]
array2: 112.758ms

# At this point, the function got compiled normally again, so future calls 
will use this optimized code.
[compiling method 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> 
(target TURBOFAN) using TurboFan]
[optimizing 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target 
TURBOFAN) - took 0.000, 0.500, 0.042 ms]

# These three calls used that newly compiled version of the code, which 
uses a megamorphic load.
array2: 350.273ms
array2: 351.822ms
array2: 357.311ms
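The shape mix behind that megamorphic load can be sketched like this 
(a reconstruction guessed from the trace annotations above, not the gist's 
exact code; the array is shrunk to 1 million elements, and the trailing 
property names are invented):

```javascript
// Sketch: a long run of one shape, then a few differently-shaped
// objects at the end, which is what makes the load of "a" in the loop
// polymorphic/megamorphic and bails the tail out to the interpreter.
function sum(array) {
  let total = 0;
  for (let i = 0; i < array.length; i++) {
    total += array[i].a;
  }
  return total;
}

const array2 = Array.from({ length: 1_000_000 }, () => ({ a: 1 }));
// The last four items each have a different hidden class.
array2.push({ a: 1, b: 2 }, { a: 1, c: 3 }, { a: 1, d: 4 }, { a: 1, e: 5 });
console.log(sum(array2)); // → 1000004
```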

In closing, I'll just echo Ryan: "JS perf is extremely hard to reason 
about".

Best,
Seth
On Friday, May 20, 2022 at 10:11:37 AM UTC-7 [email protected] wrote:

> I'm looking at a perf example shared by Ryan Cavanaugh of Typescript, and 
> I'm very much failing to understand what is happening and why. The 
> particular contradictions upset my entire mental model of how to write 
> performant javascript. What's going on internally?
>
> Here is the example: 
> https://gist.github.com/conartist6/642dcfbd6fa444da92f211bcb405692b
>
> The two specific things I don't understand are:
>
> 1. If I have this code:
>
> ```js
> function getA(o) {
>   // Why would this property access be polymorphic?
>   // Isn't the offset for the `a` property always the same?
>   return o.a;
> }
> getA({ a: 0 })
> getA({ a: 0, b: 1 })
> ```
>
> 2. Why would polymorphic code optimized by turbofan be a full 3x slower 
> than unoptimized bytecode?
>
>

-- 
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev