That's an excellent answer, Seth.

On Fri, May 20, 2022 at 10:07 PM '[email protected]' via v8-dev <[email protected]> wrote:
> Hi Conrad,
>
> I'll make an attempt at answering, though I'm not an expert on OSR, so
> others like Jakob or Mythri may have more precise answers.
>
> 1. Why would this property access be polymorphic?
>
> If you're talking about polymorphic access, you're probably familiar with
> hidden classes (also called "object shapes" or "Maps but not *those* Maps
> <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map>").
> Regardless, I'll include a link: the most complete and accurate description
> I've found of how hidden classes work in V8 is Fast properties in V8
> <https://v8.dev/blog/fast-properties>.
>
> In this example, the object {a: 0} has a hidden class which says it has
> one property named "a". The object {a: 0, b:1} has a different hidden class
> which says it has two properties, named "a" and "b". After the function
> getA has performed a lookup to get property "a" from {a: 0}, it keeps a
> pointer to that object's hidden class and another value indicating where
> the property can be found (for this case, the first in-object property
> slot). When running getA({a:0, b:1}), V8 checks whether the object {a:0,
> b:1} has the same hidden class as {a: 0}, which it does not. So V8
> remembers a second pair of values: the hidden class for {a:0, b:1} and
> where its "a" property can be found (also the first in-object property
> slot). Now the feedback state for that load operation is polymorphic. The
> "mono" in monomorphic refers to a single hidden class, not to a single
> result of where the property can be found.
>
> This behavior tends to be particularly problematic in codebases with a lot
> of class inheritance, because loading a field defined by a base class is
> often a megamorphic operation, even if that base class's constructor always
> sets the same properties in the same order.
>
> 2. Why would polymorphic code optimized by turbofan be a full 3x slower
> than unoptimized bytecode?
>
> It seems that you may be misunderstanding the somewhat cryptic output from
> --trace-opt. In particular, OSR means on-stack replacement
> <https://v8.dev/blog/v8-release-79#osr-caching>. Copying some text from
> that link:
>
> *When V8 identifies that certain functions are hot it marks them for
> optimization on the next call. When the function executes again, V8
> compiles the function using the optimizing compiler and starts using the
> optimized code from the subsequent call. However, for functions with long
> running loops this is not sufficient. V8 uses a technique called on-stack
> replacement (OSR) to install optimized code for the currently executing
> function. This allows us to start using the optimized code during the first
> execution of the function, while it is stuck in a hot loop.*
>
> Iterating through an array of 100 million items certainly counts as a "hot
> loop", so the vast majority of the time in *all* of your measurements is
> spent in optimized code produced by Turbofan, not in the interpreter. You
> can try running unoptimized code by passing the command-line flag --no-opt,
> which I expect will go much more slowly than what you've measured thus far.
> I've added some possibly more human-readable annotations to the output you
> provided:
>
> # This call started in the interpreter, but was replaced by optimized code
> while running (this process is referred to as OSR).
> [compiling method 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) using TurboFan OSR]
> [optimizing 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) - took 0.000, 0.541, 0.000 ms]
> array1: 115.701ms
>
> # This call reused the OSR code from the first call.
> [found optimized code for 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) at OSR bytecode offset 35]
> array1: 113.721ms
>
> # At this point, the function got compiled normally (not using OSR), so
> future calls will use this optimized code.
> [compiling method 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) using TurboFan]
> [optimizing 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) - took 0.000, 0.500, 0.041 ms]
>
> # These three calls used fully optimized code.
> array1: 80.069ms
> array1: 79.72ms
> array1: 79.245ms
>
> # This call mostly used optimized code, until it bailed out to the
> interpreter for the last four items.
> array2: 78.906ms
>
> # This call reused the OSR code from the very first call. This is
> surprising to me; I didn't realize that the OSR code was still available at
> this point, after the non-OSR version of the function has bailed out.
> However, it seems to work nicely in this case. Once again, it bailed out to
> the interpreter for the last four items.
> [found optimized code for 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) at OSR bytecode offset 35]
> array2: 112.758ms
>
> # At this point, the function got compiled normally again, so future calls
> will use this optimized code.
> [compiling method 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) using TurboFan]
> [optimizing 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) - took 0.000, 0.500, 0.042 ms]
>
> # These three calls used that newly compiled version of the code, which
> uses a megamorphic load.
> array2: 350.273ms
> array2: 351.822ms
> array2: 357.311ms
>
> In closing, I'll just echo Ryan: "JS perf is extremely hard to reason
> about".
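For anyone who wants to produce a trace like the annotated one above: the gist's code is not reproduced in this thread, so the following is only a rough sketch of a benchmark in the same spirit, not Ryan's actual code. The names `sum`, `array1` and `array2` are taken from the trace; the helper `time`, the array size, and the choice of four differently shaped trailing items (suggested by the "last four items" remark) are my assumptions.

```js
// Hypothetical reconstruction of the benchmark; the real gist may differ.
const N = 1_000_000; // assumption: far smaller than the 100 million items mentioned above

function sum(items) {
  let total = 0;
  for (let i = 0; i < items.length; i++) {
    total += items[i].a; // the property load whose feedback is discussed above
  }
  return total;
}

// Every element has the same shape: { a }.
const array1 = Array.from({ length: N }, () => ({ a: 1 }));

// Same as array1, except that the last four elements each carry a different
// extra property, so each of them gets its own hidden class.
const array2 = Array.from({ length: N }, () => ({ a: 1 }));
const extras = ["b", "c", "d", "e"];
extras.forEach((key, i) => {
  array2[N - extras.length + i] = { a: 1, [key]: 2 };
});

// Hypothetical helper: runs sum once and prints the elapsed time.
function time(label, items) {
  const start = performance.now();
  const result = sum(items);
  const elapsed = performance.now() - start;
  console.log(`${label}: ${elapsed.toFixed(3)}ms`);
  return result;
}

for (let i = 0; i < 5; i++) time("array1", array1);
for (let i = 0; i < 5; i++) time("array2", array2);
```

Saved as, say, bench.js (a made-up file name), this could be run with `node --trace-opt bench.js`, and with `--no-opt` for the interpreter-only comparison Seth suggests. The flag names are the ones quoted in this thread and may differ between V8 versions, and the timings will not necessarily show the same pattern as above.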
>
> Best,
> Seth
> On Friday, May 20, 2022 at 10:11:37 AM UTC-7 [email protected] wrote:
>
>> I'm looking at a perf example shared by Ryan Cavanaugh of TypeScript, and
>> I'm very much failing to understand what is happening and why. The
>> particular contradictions upset my entire mental model of how to write
>> performant JavaScript. What's going on internally?
>>
>> Here is the example:
>> https://gist.github.com/conartist6/642dcfbd6fa444da92f211bcb405692b
>>
>> The two specific things I don't understand are:
>>
>> 1. If I have this code:
>>
>> ```js
>> function getA(o) {
>>   // Why would this property access be polymorphic?
>>   // Isn't the offset for the `a` property always the same?
>>   return o.a;
>> }
>> getA({ a: 0 })
>> getA({ a: 0, b: 1 })
>> ```
>>
>> 2. Why would polymorphic code optimized by turbofan be a full 3x slower
>> than unoptimized bytecode?
>>
>> --
> --
> v8-dev mailing list
> [email protected]
> http://groups.google.com/group/v8-dev
> ---
> You received this message because you are subscribed to the Google Groups
> "v8-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/v8-dev/6857a76a-8c97-445d-9f92-413cc9f99c67n%40googlegroups.com
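On question 1, Seth's description of the feedback (a list of pairs: hidden class, plus where the property lives) can be turned into a small toy model. This is a sketch in plain JavaScript, not how V8 is implemented: `shapeOf`, `makeLoadSite` and `MAX_POLYMORPHIC` are names I made up, the "hidden class" is approximated by the ordered list of property names, and the cutoff of 4 shapes before going megamorphic is an assumption for the sketch.

```js
// Toy model only: real V8 hidden classes depend on more than the ordered
// property names (prototype, property attributes, and so on).
const MAX_POLYMORPHIC = 4; // assumed cutoff for this sketch

// Stand-in for a hidden class: the ordered list of own property names.
function shapeOf(obj) {
  return Object.keys(obj).join(",");
}

// Models the feedback kept for a single `o[name]` load site.
function makeLoadSite(name) {
  const entries = new Map(); // shape -> slot index of `name` within that shape
  let megamorphic = false;

  function load(obj) {
    const shape = shapeOf(obj);
    if (!megamorphic && !entries.has(shape)) {
      if (entries.size === MAX_POLYMORPHIC) {
        megamorphic = true; // too many shapes seen: give up on per-shape entries
        entries.clear();
      } else {
        entries.set(shape, Object.keys(obj).indexOf(name));
      }
    }
    return obj[name];
  }

  function state() {
    if (megamorphic) return "megamorphic";
    if (entries.size === 0) return "uninitialized";
    return entries.size === 1 ? "monomorphic" : "polymorphic";
  }

  return { load, state, entries };
}

const getA = makeLoadSite("a");
getA.load({ a: 0 });
console.log(getA.state()); // monomorphic
getA.load({ a: 0, b: 1 });
console.log(getA.state()); // polymorphic
console.log([...getA.entries.values()]); // [ 0, 0 ]
```

Both entries record slot 0 for `a`, yet the site is polymorphic because two different shapes were seen, which is the point that "mono" counts hidden classes rather than property locations.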
