"extends" is not an anti-pattern. Of course monomorphic code is fastest. Polymorphic/megamorphic loads and stores have to do more work (specifically, dynamic dispatch), which is going to take a bit more time. Class hierarchies are one way how developers *can* create polymorphic code; what you have here is an example of that. But that doesn't mean that inheritance itself is a problem, or that polymorphism is a problem.
The reason this is different from C++ vtables is because JavaScript isn't C++. JavaScript is a dynamic language, and engines must strike a balance between achieving fast execution when the object hierarchies are not modified much, and still having acceptable performance when developers make use of JavaScript's dynamic capabilities and do modify the object hierarchy at runtime. So engines should not have internal mechanics that are too rigid, and would cause huge performance drops when code does something valid but "unexpected". It's easy to point at one pattern and say "clearly this should be handled better", but when you look at a variety of large codebases, they use so many different patterns that it becomes very unclear what internal object representation model would work best on average. FWIW, the original example is essentially equivalent to simple "traditional" polymorphic patterns like the following, which uses no inheritance and no "extends": function DoIt(obj) { obj.done++; } function Cat() {} function Dog() {} // etc. var c = new Cat(); DoIt(c); // Only one type so far -> monomorphic var d = new Dog(); DoIt(d); // Two types -> polymorphic // etc. On Sun, Sep 24, 2017 at 3:51 PM, Caitlin Potter <ca...@igalia.com> wrote: > So yes, the load/store for `this.done++` can't be reduced to simple > machine ops when `doIt` is megamorphic, as far as I can tell. (5+ receiver > maps should make the load/stores megamorphic, or at least that's what > https://github.com/v8/v8/blob/9a0d5d9700ee269cbe7abee6d62733 > 163dadac14/src/ic/ic.h#L33-L35 seems to indicate). > > That doesn't necessarily mean LoadIC is going to be slow, I guess it > depends on how often you the receiver map is missing from the cache. Slower > than the inlined monomorphic in-object field load/store, but probably not > "anti-pattern" slow. > > I could be wrong about this, but I think the CallIC for individual calls > to `doIt` can be inlined with different type feedback information for the > load/store, so if that shows up in hot code which is monomorphic or > polymorphic, it might produce better code than your example there. > > That said, I'm half guessing about all of this, so I'll leave the rest of > this thread for more knowledgeable people. > > On Sunday, September 24, 2017 at 12:07:58 PM UTC-4, Bogdan Orlov wrote: >> >> After removing all intermediate %OptimizeFunctionOnNextCall() and >> putting at the end (after usage of all Animal subclasses for enough >> feedback) like this >> >> class Animal { >> constructor(){ >> this.done = 0 >> } >> doIt(){ >> this.done++ >> } >> } >> class Cat extends Animal {} >> class Dog extends Animal {} >> class Dog1 extends Animal {} >> class Dog2 extends Animal {} >> class Dog3 extends Animal {} >> >> function test(){ >> var cat = new Cat(); >> cat.doIt(); >> cat.doIt(); >> cat.doIt(); >> var dog = new Dog(); >> dog.doIt(); >> dog.doIt(); >> dog.doIt(); >> var dog1 = new Dog1(); >> dog1.doIt(); >> dog1.doIt(); >> dog1.doIt(); >> var dog2 = new Dog2(); >> dog2.doIt(); >> dog2.doIt(); >> dog2.doIt(); >> var dog3 = new Dog3(); >> dog3.doIt(); >> dog3.doIt(); >> dog3.doIt(); >> %OptimizeFunctionOnNextCall(dog3.doIt) >> dog3.doIt() >> } >> test() >> >> I got this output: >> >> 0x2df967b05180 0 55 push rbp >> 0x2df967b05181 1 4889e5 REX.W movq rbp,rsp >> 0x2df967b05184 4 56 push rsi >> 0x2df967b05185 5 57 push rdi >> 0x2df967b05186 6 493ba5480c0000 REX.W cmpq rsp,[r13+0xc48] >> 0x2df967b0518d d 0f8669000000 jna 0x2df967b051fc <+0x7c> >> 0x2df967b05193 13 488b75f8 REX.W movq rsi,[rbp-0x8] >> 0x2df967b05197 17 48b80000000003000000 REX.W movq rax,0x300000000 >> 0x2df967b051a1 21 488b5510 REX.W movq rdx,[rbp+0x10] >> 0x2df967b051a5 25 498b8dd0050000 REX.W movq rcx,[r13+0x5d0] >> 0x2df967b051ac 2c 488bd9 REX.W movq rbx,rcx >> 0x2df967b051af 2f e8ccd7f4ff call 0x2df967a52980 >> (LoadICTrampoline) ;; code: LOAD_IC >> 0x2df967b051b4 34 a801 test al,0x1 >> 0x2df967b051b6 36 0f8557000000 jnz 0x2df967b05213 <+0x93> >> 0x2df967b051bc 3c 488bd8 REX.W movq rbx,rax >> 0x2df967b051bf 3f 48c1eb20 REX.W shrq rbx, 32 >> 0x2df967b051c3 43 83ebff subl rbx,0xff >> 0x2df967b051c6 46 0f804c000000 jo 0x2df967b05218 <+0x98> >> 0x2df967b051cc 4c 48c1e320 REX.W shlq rbx, 32 >> 0x2df967b051d0 50 48bf0000000005000000 REX.W movq rdi,0x500000000 >> 0x2df967b051da 5a 488b5510 REX.W movq rdx,[rbp+0x10] >> 0x2df967b051de 5e 498b8dd0050000 REX.W movq rcx,[r13+0x5d0] >> 0x2df967b051e5 65 488bc3 REX.W movq rax,rbx >> 0x2df967b051e8 68 488b75f8 REX.W movq rsi,[rbp-0x8] >> 0x2df967b051ec 6c e8cf1ff5ff call 0x2df967a571c0 >> (StoreICStrictTrampoline) ;; code: STORE_IC >> 0x2df967b051f1 71 498b45a0 REX.W movq rax,[r13-0x60] >> 0x2df967b051f5 75 488be5 REX.W movq rsp,rbp >> 0x2df967b051f8 78 5d pop rbp >> 0x2df967b051f9 79 c20800 ret 0x8 >> 0x2df967b051fc 7c 48bbe07c890001000000 REX.W movq rbx,0x100897ce0 >> 0x2df967b05206 86 33c0 xorl rax,rax >> 0x2df967b05208 88 488b75f8 REX.W movq rsi,[rbp-0x8] >> 0x2df967b0520c 8c e88ff4e7ff call 0x2df9679846a0 ;; code: >> STUB, CEntryStub, minor: 8 >> 0x2df967b05211 91 eb80 jmp 0x2df967b05193 <+0x13> >> 0x2df967b05213 93 e8fcedcfff call 0x2df967804014 ;; debug: >> deopt position, script offset '130' >> ;; debug: >> deopt position, inlining id '-1' >> ;; debug: >> deopt reason 'not a Smi' >> ;; debug: >> deopt index 2 >> ;; >> deoptimization bailout 2 >> 0x2df967b05218 98 e801eecfff call 0x2df96780401e ;; debug: >> deopt position, script offset '130' >> ;; debug: >> deopt position, inlining id '-1' >> ;; debug: >> deopt reason 'overflow' >> ;; debug: >> deopt index 3 >> ;; >> deoptimization bailout 3 >> 0x2df967b0521d 9d 90 nop >> 0x2df967b0521e 9e 90 nop >> 0x2df967b0521f 9f 90 nop >> 0x2df967b05220 a0 90 nop >> 0x2df967b05221 a1 90 nop >> 0x2df967b05222 a2 90 nop >> 0x2df967b05223 a3 90 nop >> 0x2df967b05224 a4 90 nop >> 0x2df967b05225 a5 90 nop >> 0x2df967b05226 a6 90 nop >> 0x2df967b05227 a7 90 nop >> 0x2df967b05228 a8 90 nop >> 0x2df967b05229 a9 90 nop >> 0x2df967b0522a aa 6690 nop >> >> So the result is the same (but without intermediate deopts) - v8 compiles >> 'this.done++' to slow polymorphic runtime calls LoadICTrampoline and >> StoreICStrictTrampoline. And if this is not a bug and v8 just works this >> way, we have slow polymorphic accessing to 'this' object in all base class >> methods in commonly used inheritance and polymorphism pattern and thus >> 'extends' keyword come out as performance antipattern >> >> >> On Sunday, September 24, 2017 at 4:31:44 PM UTC+3, Caitlin Potter wrote: >>> >>> I think %OptimizeFunctionOnNextCall() is particularly bad for >>> benchmarks, because you train the function to think the method load is >>> monomorphic. >>> >>> With more type feedback, this should do a bit better, I think we can >>> inline a finite amount of polymorphic loads. >>> >>> At least, I _think_ so. >>> >>> On Sep 24, 2017, at 3:12 AM, B. Orlov <bgno...@gmail.com> wrote: >>> >>> Ok, there is more details. This is simplified example without loops >>> >>> class Animal { >>> constructor(){ >>> this.done = 0 >>> } >>> doIt(){ >>> this.done++ >>> } >>> } >>> >>> class Cat extends Animal {} >>> >>> class Dog extends Animal {} >>> class Dog1 extends Animal {} >>> class Dog2 extends Animal {} >>> class Dog3 extends Animal {} >>> >>> class AnimalCat { >>> constructor(){ >>> this.done = 0 >>> } >>> doIt(){ >>> this.done++ >>> } >>> } >>> >>> function test(){ >>> var cat = new Cat(); >>> cat.doIt(); >>> cat.doIt(); //warm >>> %OptimizeFunctionOnNextCall(cat.doIt) >>> cat.doIt(); //fast REX.W asm instruction for this.done read and write >>> >>> var dog = new Dog(); >>> dog.doIt(); //deopt - wrong map >>> %OptimizeFunctionOnNextCall(dog.doIt) >>> dog.doIt(); //the same but now added check for new hidden map - Dog >>> >>> var dog1 = new Dog1(); >>> dog1.doIt(); //deopt - wrong map >>> %OptimizeFunctionOnNextCall(dog1.doIt) >>> dog1.doIt() //the same but now added one more check for new hidden map >>> - Dog1 >>> >>> var dog2 = new Dog2(); >>> dog2.doIt(); //deopt - wrong map >>> %OptimizeFunctionOnNextCall(dog2.doIt) >>> dog2.doIt() //the same but now added one more check for new hidden map >>> - Dog2 >>> >>> var dog3 = new Dog3(); >>> dog3.doIt(); //deopt - wrong map >>> %OptimizeFunctionOnNextCall(dog3.doIt) >>> dog3.doIt() //v8 gives up and compiles this.done to slow calls c++ >>> runtime functions - LoadICTrampoline and StoreICStrictTrampoline >>> } >>> >>> test() >>> >>> Running with latest node with arguments "node --trace-deopt >>> --print-opt-code --allow-natives-syntax test.js" there will be output of >>> asm instructions for each optimization of doIt() method. After first >>> optimizations (with only Cat class invoking) I got >>> >>> 0x3e9869385180 0 55 push rbp >>> 0x3e9869385181 1 4889e5 REX.W movq rbp,rsp >>> 0x3e9869385184 4 56 push rsi >>> 0x3e9869385185 5 57 push rdi >>> 0x3e9869385186 6 493ba5480c0000 REX.W cmpq rsp,[r13+0xc48] >>> 0x3e986938518d d 0f863f000000 jna 0x3e98693851d2 <+0x52> >>> 0x3e9869385193 13 488b4510 REX.W movq rax,[rbp+0x10] >>> 0x3e9869385197 17 a801 test al,0x1 >>> 0x3e9869385199 19 0f844a000000 jz 0x3e98693851e9 <+0x69> >>> 0x3e986938519f 1f 48bb9112642f9b320000 REX.W movq >>> rbx,0x329b2f641291 ;; object: 0x329b2f641291 <Map(FAST_HOLEY_ELEMENTS)> >>> 0x3e98693851a9 29 483958ff REX.W cmpq [rax-0x1],rbx >>> 0x3e98693851ad 2d 0f853b000000 jnz 0x3e98693851ee <+0x6e> >>> 0x3e98693851b3 33 8b581b movl rbx,[rax+0x1b] >>> 0x3e98693851b6 36 83ebff subl rbx,0xff >>> 0x3e98693851b9 39 0f8034000000 jo 0x3e98693851f3 <+0x73> >>> 0x3e98693851bf 3f 48c1e320 REX.W shlq rbx, 32 >>> 0x3e98693851c3 43 48895817 REX.W movq [rax+0x17],rbx >>> 0x3e98693851c7 47 498b45a0 REX.W movq rax,[r13-0x60] >>> 0x3e98693851cb 4b 488be5 REX.W movq rsp,rbp >>> 0x3e98693851ce 4e 5d pop rbp >>> 0x3e98693851cf 4f c20800 ret 0x8 >>> >>> than after few deoptimizations for Dog2 class V8 compiles doIt() method >>> to this >>> >>> 0x3e9869385540 0 55 push rbp >>> 0x3e9869385541 1 4889e5 REX.W movq rbp,rsp >>> 0x3e9869385544 4 56 push rsi >>> 0x3e9869385545 5 57 push rdi >>> 0x3e9869385546 6 493ba5480c0000 REX.W cmpq rsp,[r13+0xc48] >>> 0x3e986938554d d 0f867b000000 jna 0x3e98693855ce <+0x8e> >>> 0x3e9869385553 13 488b4510 REX.W movq rax,[rbp+0x10] >>> 0x3e9869385557 17 a801 test al,0x1 >>> 0x3e9869385559 19 0f8489000000 jz 0x3e98693855e8 <+0xa8> >>> 0x3e986938555f 1f 488b58ff REX.W movq rbx,[rax-0x1] >>> 0x3e9869385563 23 48ba9112642f9b320000 REX.W movq >>> rdx,0x329b2f641291 ;; object: 0x329b2f641291 <Map(FAST_HOLEY_ELEMENTS)> >>> 0x3e986938556d 2d 483bd3 REX.W cmpq rdx,rbx >>> 0x3e9869385570 30 0f8439000000 jz 0x3e98693855af <+0x6f> >>> 0x3e9869385576 36 48ba4914642f9b320000 REX.W movq >>> rdx,0x329b2f641449 ;; object: 0x329b2f641449 <Map(FAST_HOLEY_ELEMENTS)> >>> 0x3e9869385580 40 483bd3 REX.W cmpq rdx,rbx >>> 0x3e9869385583 43 0f8426000000 jz 0x3e98693855af <+0x6f> >>> 0x3e9869385589 49 48ba5115642f9b320000 REX.W movq >>> rdx,0x329b2f641551 ;; object: 0x329b2f641551 <Map(FAST_HOLEY_ELEMENTS)> >>> 0x3e9869385593 53 483bd3 REX.W cmpq rdx,rbx >>> 0x3e9869385596 56 0f8413000000 jz 0x3e98693855af <+0x6f> >>> 0x3e986938559c 5c 48ba5916642f9b320000 REX.W movq >>> rdx,0x329b2f641659 ;; object: 0x329b2f641659 <Map(FAST_HOLEY_ELEMENTS)> >>> 0x3e98693855a6 66 483bd3 REX.W cmpq rdx,rbx >>> 0x3e98693855a9 69 0f853e000000 jnz 0x3e98693855ed <+0xad> >>> 0x3e98693855af 6f 8b581b movl rbx,[rax+0x1b] >>> 0x3e98693855b2 72 83ebff subl rbx,0xff >>> 0x3e98693855b5 75 0f8037000000 jo 0x3e98693855f2 <+0xb2> >>> 0x3e98693855bb 7b 48c1e320 REX.W shlq rbx, 32 >>> 0x3e98693855bf 7f 48895817 REX.W movq [rax+0x17],rbx >>> 0x3e98693855c3 83 498b45a0 REX.W movq rax,[r13-0x60] >>> 0x3e98693855c7 87 488be5 REX.W movq rsp,rbp >>> 0x3e98693855ca 8a 5d pop rbp >>> 0x3e98693855cb 8b c20800 ret 0x8 >>> >>> and on Dog3 class v8 gives up and compiles to this (with slow runtime >>> calls LoadICTrampoline and StoreICStrictTrampoline) >>> >>> 0x3e98693856a0 0 55 push rbp >>> 0x3e98693856a1 1 4889e5 REX.W movq rbp,rsp >>> 0x3e98693856a4 4 56 push rsi >>> 0x3e98693856a5 5 57 push rdi >>> 0x3e98693856a6 6 493ba5480c0000 REX.W cmpq rsp,[r13+0xc48] >>> 0x3e98693856ad d 0f8669000000 jna 0x3e986938571c <+0x7c> >>> 0x3e98693856b3 13 488b75f8 REX.W movq rsi,[rbp-0x8] >>> 0x3e98693856b7 17 48b80000000003000000 REX.W movq rax,0x300000000 >>> 0x3e98693856c1 21 488b5510 REX.W movq rdx,[rbp+0x10] >>> 0x3e98693856c5 25 498b8dd0050000 REX.W movq rcx,[r13+0x5d0] >>> 0x3e98693856cc 2c 488bd9 REX.W movq rbx,rcx >>> 0x3e98693856cf 2f e8acd2f4ff call 0x3e98692d2980 >>> (LoadICTrampoline) ;; code: LOAD_IC >>> 0x3e98693856d4 34 a801 test al,0x1 >>> 0x3e98693856d6 36 0f8557000000 jnz 0x3e9869385733 <+0x93> >>> 0x3e98693856dc 3c 488bd8 REX.W movq rbx,rax >>> 0x3e98693856df 3f 48c1eb20 REX.W shrq rbx, 32 >>> 0x3e98693856e3 43 83ebff subl rbx,0xff >>> 0x3e98693856e6 46 0f804c000000 jo 0x3e9869385738 <+0x98> >>> 0x3e98693856ec 4c 48c1e320 REX.W shlq rbx, 32 >>> 0x3e98693856f0 50 48bf0000000005000000 REX.W movq rdi,0x500000000 >>> 0x3e98693856fa 5a 488b5510 REX.W movq rdx,[rbp+0x10] >>> 0x3e98693856fe 5e 498b8dd0050000 REX.W movq rcx,[r13+0x5d0] >>> 0x3e9869385705 65 488bc3 REX.W movq rax,rbx >>> 0x3e9869385708 68 488b75f8 REX.W movq rsi,[rbp-0x8] >>> 0x3e986938570c 6c e8af1af5ff call 0x3e98692d71c0 >>> (StoreICStrictTrampoline) ;; code: STORE_IC >>> 0x3e9869385711 71 498b45a0 REX.W movq rax,[r13-0x60] >>> 0x3e9869385715 75 488be5 REX.W movq rsp,rbp >>> 0x3e9869385718 78 5d pop rbp >>> 0x3e9869385719 79 c20800 ret 0x8 >>> >>> So I suspect either there is a bug in v8 or v8 generally can't handle >>> inheritance and subclassing and always will be compile to slow polymorphic >>> access to this object in ancestors methods, thus "extends" keyword turns >>> out as performance antipattern >>> >>> On Sunday, September 24, 2017 at 7:33:35 AM UTC+3, Zac Hansen wrote: >>>> >>>> Microbenchmarks are infamously difficult to get right as often you're >>>> not testing what you think you're testing. >>>> >>>> Are you sure the optimizer isn't just throwing away code in some cases, >>>> since you're not actually doing any work with the `done` property? >>>> There's no reason that your code even has to run unless I'm reading it >>>> wrong. And it's not like C++ where you can look at the generated >>>> instructions to see what the optimizer is doing.. >>>> >>>> On Friday, September 22, 2017 at 9:24:00 PM UTC-7, B. Orlov wrote: >>>>> >>>>> Take a look at commonly used in oop inheritance pattern for extending >>>>> base class: >>>>> >>>>> class Animal { >>>>> constructor(){ >>>>> this.done = 0 >>>>> } >>>>> doIt(){ >>>>> this.done++ >>>>> } >>>>> } >>>>> >>>>> class Cat extends Animal {} >>>>> class Dog extends Animal {} >>>>> class Dog1 extends Animal {} >>>>> class Dog2 extends Animal {} >>>>> class Dog3 extends Animal {} >>>>> >>>>> function testAnimal(animal){ >>>>> for(var i = 0; i < 100000; i++){ >>>>> animal.doIt(); >>>>> } >>>>> } >>>>> >>>>> >>>>> function test(){ >>>>> var cat = new Cat(); >>>>> testAnimal(cat) >>>>> var dog = new Dog(); >>>>> testAnimal(dog) >>>>> var dog1 = new Dog1(); >>>>> testAnimal(dog1) >>>>> var dog2 = new Dog2(); >>>>> testAnimal(dog2) >>>>> var dog3 = new Dog3(); >>>>> testAnimal(dog3) >>>>> } >>>>> >>>>> test() >>>>> >>>>> Running in latest node (6.0.287.53 v8 version) I get the following >>>>> results: >>>>> Invoking testAnimal function with first descendant class Cat, V8 does >>>>> a great job by compiling to doIt() method and "this.done++" incrementation >>>>> to REX.W asm instruction without any calls to slow-runtime c++ function >>>>> for >>>>> generic field access.Than, by invoking doIt() on second descendant class >>>>> Dog, V8 fall down to doIt() method deoptimization (and all outer functions >>>>> which also were inlined) and add few ams rew.x and one jz instruction to >>>>> check hidden map for second Dog class. Than on invoking doIt() on each new >>>>> subclass v8 again fall down to deoptimization and add new checks for new >>>>> hidden map and finally on fourth descendant class Dog3 v8 give up and for >>>>> "this.done++" goes to call LoadICTrampoline StoreICStrictTrampoline c++ >>>>> runtime function for generic access. I hope its only bug and v8 can >>>>> efficient deal with inheritance and accessing field without slow runtime >>>>> generic filed access otherwise the big question came in - is "extend" >>>>> keyword a performance antipattern and why v8 can't implement something >>>>> like >>>>> c++ v-table mechanism ? >>>>> >>>> -- >>> -- >>> v8-users mailing list >>> v8-u...@googlegroups.com >>> http://groups.google.com/group/v8-users >>> --- >>> You received this message because you are subscribed to the Google >>> Groups "v8-users" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to v8-users+u...@googlegroups.com. >>> For more options, visit https://groups.google.com/d/optout. >>> >>> -- > -- > v8-users mailing list > v8-users@googlegroups.com > http://groups.google.com/group/v8-users > --- > You received this message because you are subscribed to the Google Groups > "v8-users" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to v8-users+unsubscr...@googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -- -- v8-users mailing list v8-users@googlegroups.com http://groups.google.com/group/v8-users --- You received this message because you are subscribed to the Google Groups "v8-users" group. To unsubscribe from this group and stop receiving emails from it, send an email to v8-users+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.