On Thu, Aug 31, 2017 at 5:52 PM, Adam Klein <[email protected]> wrote:
> Hi Caitlin,
>
> Jakob and I just spent some time digging into this, comments inline
> (though we don't have answers to everything).
>
> On Thu, Aug 31, 2017 at 9:01 AM, <[email protected]> wrote:
>
>> Hi,
>>
>> Recently I've been trying out some things to get more out of Promises
>> and async functions.
>>
>> Different people in/around the Node.js project have been writing
>> various benchmarks which show cases where `await` seems to slow things
>> down significantly. One simple example is
>> https://github.com/tc39/proposal-async-iteration/issues/112#issuecomment-324885954.
>> While it's invalid to compare the simple synchronous loop to the one
>> with `await`, it does highlight that in situations like that, the v8
>> implementation can seem very slow, when really it should behave more
>> like the sync loop (~20 times slower is a steeper price than
>> necessary).
>>
>> I drafted an informal document with some ideas for speeding up Await
>> in v8. In general, the solutions fell into 2 categories:
>>
>> 1) Reduce heap use and GC overhead (allocate fewer objects for Await).
>> 2) Avoid JS->C and C->JS transitions where possible (mainly by
>>    translating Isolate::RunMicrotasksInternal() and
>>    Isolate::EnqueueMicrotask() into code stubs). This generally makes
>>    JS-defined microtasks (for Promises and Await) much faster, but may
>>    cause DOM-defined microtasks to slow down a bit (unclear at this
>>    time). I expect Promises and Await to be used more frequently in
>>    tight loops, and DOM microtasks don't affect Node.js at all, so
>>    this may be worth going after.
>>
>> The first approach did not make much of a dent in any benchmarks; more
>> usefully, profiles of actual applications did not show `await` to be a
>> bottleneck at all. Reducing overall memory use seems like a good thing
>> in general, however.
>>
>> The second approach yielded a significant improvement (~60% over 10
>> runs) on the simple benchmark (in a very simple prototype
>> implementation with some limitations discussed below).
>>
>> There are some constraints WRT implementing RunMicrotasks in JIT'd
>> code. In particular, it must be possible to perform RunMicrotasks()
>> when no context has been entered. I've tried a few things to work
>> around this:
>>
>> Initially, I wrote the stub with JS linkage and used the typical
>> JSEntryStub to invoke it. This is partly wasteful, and partly
>> problematic. There need not be a separate JSFunction for RunMicrotasks
>> in each context. More importantly, the function ought not to be
>> associated with a context at all, given the constraint that it must be
>> possible to invoke it without a context having been entered.
>
> From looking at the JSEntryStub codepath for JSFunctions, it appears to
> us that for a function marked as native, strict, or both (which seems
> appropriate in this case) there shouldn't be any need for a Context. So
> it seems like you could unblock your prototype by creating a single
> JSFunction (as a strong root on the Heap) which wraps the builtin, and
> call through that from the C++ API. If you already tried something like
> this and ran into trouble it'd be interesting to hear what went wrong.
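To make that concrete, the shape we have in mind is roughly the
following (a sketch only: the RunMicrotasks builtin is exactly what
your prototype adds, the strong-root slot and its accessor are
hypothetical, and the factory signatures are approximate):

    // One-time setup, e.g. during heap initialization: wrap the
    // (prototype) RunMicrotasks builtin in a single JSFunction marked
    // native + strict, so the JSEntryStub path shouldn't need a
    // Context, and keep it alive via a new strong root slot.
    Handle<Code> code(
        isolate->builtins()->builtin(Builtins::kRunMicrotasks), isolate);
    Handle<SharedFunctionInfo> shared =
        isolate->factory()->NewSharedFunctionInfo(
            isolate->factory()->empty_string(), code,
            false /* is_constructor */);
    shared->set_native(true);
    shared->set_language_mode(STRICT);
    // ... create a JSFunction from |shared| without tying it to a
    // particular native context, and store it in the (hypothetical)
    // new root slot ...

    // Later, from the C++ API, with no context having been entered:
    Execution::Call(isolate, heap->run_microtasks_function(),
                    isolate->factory()->undefined_value(), 0, nullptr);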
>
>> A second approach involved creating new TF operators to initialize
>> the roots register (the main manifestation of problems when not using
>> the JSEntryStub was that the roots register was not initialized,
>> leading to access violations when using heap constants). I didn't
>> spend much time with this, because I felt it was more important to
>> make sure callee-saved registers were restored properly, even though
>> there wasn't much going on in the sole caller of the function. I
>> thought it might be interesting to produce more general operators
>> which would handle entry and exit for stubs that need to be invoked
>> from C, but that seemed like a lot of work and I haven't gotten
>> around to it yet.
>>
>> Finally, I tried adding a new variant of JSEntryStub which calls the
>> RunMicrotasks stub rather than the various entry trampolines. At the
>> moment it's mostly in working order, but it's possible there are
>> still problems with StackFrameIteration and exception handling.
>
> These approaches seem too involved just for this one case; I'd prefer
> the JSFunction approach above if it works.
>
>> Another limitation: previously, SaveContexts (which seem to matter to
>> the debugger and API in some way, though I haven't really looked at
>> why yet) were not set up when calling API-defined microtask
>> callbacks. In my prototype, I always set up the SaveContext before
>> entering the RunMicrotasks stub. It's still unclear whether this
>> breaks anything, or whether it would be possible (or even a good
>> idea) to mimic the old behaviour in the stub rather than always
>> pushing the SaveContext. This is a subtle difference, but as noted it
>> could have some bad effects.
>
> Still digging into this. It appears I may have inadvertently removed
> the regression test for this code in
> https://codereview.chromium.org/1909433003 when I removed support for
> Object.observe, but it should be possible to adapt the regression test
> for Promises (see https://codereview.chromium.org/332923003 for the
> test). I'm going to try restoring the test and playing around with
> this code in the current C++ version to see if I can get a better
> handle on it. But from an initial reading, it really shouldn't make a
> difference for the C++ callback case anyway.

Looking more deeply at this (and at
https://bugs.chromium.org/p/chromium/issues/detail?id=385349, the bug
for that regression test), I'm not convinced this is actually necessary
anymore. +yangguo in case he has a less fuzzy memory than me (this was
all >3 years ago).

>> Finally, a somewhat strange behaviour of the stub is that it enters
>> contexts by itself when it needs to, inlining
>> HandleScopeImplementer::EnterMicrotaskContext and
>> LeaveMicrotaskContext(), and overwriting Isolate::context(). I
>> believe this is done in a valid way in the prototype, but it's not
>> something that comes up in other stubs, so there isn't really any
>> other code to model it on.
>
> This seems fine to me; it's rather special behavior even in its C++
> form.
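For concreteness, the C++ shape the stub is inlining for JS-defined
microtasks is roughly the following (simplified from today's
Isolate::RunMicrotasksInternal(); not a verbatim excerpt):

    // Per JS-defined microtask, the current C++ loop does roughly:
    SaveContext save(this);  // restores the previous context on exit
    Handle<JSFunction> fn = Handle<JSFunction>::cast(microtask);
    Handle<Context> context(fn->context(), this);
    set_context(context->native_context());
    handle_scope_implementer()->EnterMicrotaskContext(context);
    MaybeHandle<Object> maybe_exception;
    MaybeHandle<Object> result =
        Execution::TryCall(this, fn, factory()->undefined_value(), 0,
                           nullptr, &maybe_exception);
    handle_scope_implementer()->LeaveMicrotaskContext();
    // API-defined (CallHandlerInfo) callbacks, by contrast, are
    // invoked directly, without the SaveContext -- which is the subtle
    // difference the prototype changes by always pushing one.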
>
>> ---
>>
>> I was wondering if anyone thought reducing the C++->JS->C++ overhead
>> in RunMicrotasks for that 60% boost in certain very simple and
>> unrepresentative-of-real-code benchmarks might be worth doing
>> properly and upstreaming? While it's unclear what the impact would be
>> on real-world code, it seems like a reasonable expectation that you'd
>> see some kind of significant benefit (though perhaps not on the order
>> of 60% as in the very simple benchmark mentioned above).
>>
>> If (in the opinion of the v8 team) it might be worth my time to try
>> to upstream this, I'd love some feedback on the approaches taken to
>> address the problems listed above, and to get an idea of what sort of
>> approach you'd all be happiest with.
>
> If we can pin down the answers to all the stuff above to our
> satisfaction, then yes, my inclination is that this is a worthwhile
> thing to do: the code may be a bit verbose (what with having to deal
> with the different sorts of things stored in the queue), but it's at
> least relatively straightforward.
>
> - Adam
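One more note on the verbosity point: the queue mixes several kinds of
entries, so the stub ends up with a per-entry type dispatch. Roughly
(pseudocode for the shape of the stub's loop, not the actual
prototype):

    // Shape of the dispatch the stub needs for each queue entry
    // (instance-type checks and calling conventions elided):
    for (int i = 0; i < num_tasks; i++) {
      Handle<Object> microtask(queue->get(i), isolate);
      if (microtask->IsJSFunction()) {
        // Enter the function's context, call it, leave the context.
      } else if (microtask->IsCallHandlerInfo()) {
        // API-defined callback: bounce out to C++ to invoke it.
      } else {
        // Promise job info objects (reaction and resolve-thenable
        // jobs): enter the recorded context and run the job.
      }
    }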
