Hi Caitlin, Jakob and I just spent some time digging into this, comments inline (though we don't have answers to everything).
On Thu, Aug 31, 2017 at 9:01 AM, <[email protected]> wrote: > Hi, > > Recently I've been trying out some things to get more out of Promises and > async functions. > > Different people in/around the Node.js project have been writing various > benchmarks which show cases where `await` > seems to slow things down significantly. One simple example is > https://github.com/tc39/proposal-async-iteration/issues/112#issuecomment- > 324885954. While it's invalid to compare > the simple synchronous loop to the one with `await`, it does highlight > that in situations like that, the v8 implementation > can seem to be very slow, when really it should be more similar to the > sync. loop (~20 times slower seems like a steeper > price to pay than is necessary). > > I drafted an informal document to come up with some ideas for speeding up > Await in v8. In general, the solutions were > split into 2 categories: > > 1) reduce heap use and GC overhead (allocate fewer objects for Await). > 2) avoid JS->C and C->JS transitions where possible (mainly accomplished > by translating > Isolate::RunMicrotasksInternal() and Isolate::EnqueueMicrotask() into > code stubs). This generally makes JS-defined > microtasks (for Promises and Await) much faster, but may cause > DOM-defined microtasks to slow down a bit (unclear > at this time). I expect Promises and Await to be used more frequently > in tight loops, and certainly DOM microtasks don't > affect Node.js at all, so this may be something worth going after. > > The first approach did not make much of a dent in any benchmarks. More > useful profiles of actual applications did not > show `await` to be a bottleneck at all. Reducing overall memory use seems > like a good thing in general, however. > > The second approach yielded a significant improvement (~60% over 10 runs) > for the simple benchmark (in a very > simple prototype implementation with some limitations discussed below). > > So there are some constraints WRT implementing RunMicrotasks in JIT'd > code. Particularly, it needs to be possible to > perform RunMicrotasks() when no context has been entered. I've tried a few > things to work around this: > > Initially, I had wrote the stub with JS linkage, and used the typical > JSEntryStub to invoke it. This is partly > wasteful, and partly problematic. There need not be a separate JSFunction > for RunMicrotasks in each > context. More importantly, the function ought not to be associated with a > context at all, given the > constraint that it must be possible to invoke it without a context having > been entered. > >From looking at the JSEntryStub codepath for JSFunctions, it appears to us that for a function marked as native, strict, or both (which seems appropriate in this case) there shouldn't be any need for a Context. So it seems like you could unblock your prototype by creating a single JSFunction (as a strong root on the Heap) which wraps the builtin, and call through that from the C++ API. If you already tried something like this and ran into trouble it'd be interesting to hear what went wrong. > A second approach involved creating new TF operators to initialize the > roots register (the main > manifestation of problems when not using the JSEntryStub was that the > roots register was not initialized, > leading to access violations when using heap constants). I didn't spend > much time with this, because I > felt that it was more important to make sure callee-saved registers were > restored properly, even though > there wasn't much going on in the sole caller of the function. I thought > it might be interesting to produce > more general operators which would handle entry and exit for stubs which > need to be invoked from C, > but it seemed like a lot of work and I haven't gotten around to doing this > yet. > > Finally, I tried adding a new variant to JSEntryStub, which call the > RunMicrotasks stub rather than the various entry > trampolines. At this moment, it's mostly in working order, but it's > possible there are still problems with > StackFrameIteration and exception handling. > These approaches seem too involved just for this one case, I'd prefer the JSFunction approach above if it works. > Another limitation is, previously SaveContexts (which seem to matter to > the debugger and API in some way, though I > haven't really looked at why yet) were not set up when calling API-defined > microtask callbacks. In my prototype, I > always set up the SaveContext before entering the RunMicrotasks stub. It's > yet unclear if this breaks anything, or if it > would be possible (or even a good idea) to mimic the old behaviour in the > stub rather than always pushing the SaveContext. > This is a subtle difference, but as noted it could have some bad effects. > Still digging into this. It appears I may have inadvertently removed the regression test for this code in https://codereview.chromium.org/1909433003 when I removed support for Object.observe, but the regression test should be able to be adapted for Promises (see https://codereview.chromium.org/332923003 for the test). I'm going to try restoring the test and playing around with this code in the current C++ version to see if I can get a better handle on it. But from an initial reading, it really shouldn't make a difference for the C++ callback case anyway. > Finally, a somewhat strange behaviour of the stub is that it enters > contexts by itself when it needs to, inlining > HandleScopeImplementer::EnterMicrotaskContext and > LeaveMicrotaskContext(), and overwriting Isolate::context(). > I believe this is done in a valid way in the prototype, but it's not > something that comes up in other stubs, so there isn't really > any other code to model it on. > This seems fine to me, it's rather special behavior even in its C++ form. > --- > > I was wondering if anyone thought reducing the C++->JS->C++ overhead in > RunMicrotasks for that 60% boost in certain > very simple and unrepresentative-of-real-code benchmarks might be worth > doing properly and upstreaming? While it's > unclear what the impact would be on real-world code, it seems like a > reasonable expectation that you'd see some kind of > significant benefit (though perhaps not on the order of 60% as in the very > simple benchmark mentioned above). > > If (in the opinion of the v8 team) it might be worth my time to try to > upstream this, I'd love some feedback on the approaches > taken to address the problems listed above, and get an idea of what sort > of approach you'd all be happiest with. > If we can pin down the answers to all the stuff above to our satisfaction, then yes, my inclination is that this is a worthwhile thing to do: the code may be a bit verbose (what with having to deal with the different sorts of things stored in the queue), but it's at least relatively straightforward. - Adam -- -- v8-dev mailing list [email protected] http://groups.google.com/group/v8-dev --- You received this message because you are subscribed to the Google Groups "v8-dev" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
