Hi,

Recently I've been trying out some things to get more out of Promises and async functions.
Different people in and around the Node.js project have been writing various benchmarks that show cases where `await` seems to slow things down significantly. One simple example is https://github.com/tc39/proposal-async-iteration/issues/112#issuecomment-324885954. While it's not valid to compare the simple synchronous loop directly with the one using `await`, it does highlight that in situations like that the V8 implementation can seem very slow, when really it should be closer to the synchronous loop (~20 times slower seems like a steeper price to pay than is necessary).

I drafted an informal document to come up with some ideas for speeding up Await in V8. In general, the solutions split into 2 categories:

1) Reduce heap use and GC overhead (allocate fewer objects for Await).

2) Avoid JS->C and C->JS transitions where possible (mainly accomplished by translating Isolate::RunMicrotasksInternal() and Isolate::EnqueueMicrotask() into code stubs). This generally makes JS-defined microtasks (for Promises and Await) much faster, but may cause DOM-defined microtasks to slow down a bit (unclear at this time). I expect Promises and Await to be used more frequently in tight loops, and DOM microtasks don't affect Node.js at all, so this may be something worth going after.

The first approach did not make much of a dent in any benchmarks. More useful profiles of actual applications did not show `await` to be a bottleneck at all. Reducing overall memory use seems like a good thing in general, however.

The second approach yielded a significant improvement (~60% over 10 runs) for the simple benchmark (in a very simple prototype implementation with some limitations discussed below).

There are some constraints on implementing RunMicrotasks in JIT'd code. In particular, it needs to be possible to perform RunMicrotasks() when no context has been entered.
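For anyone who wants to reproduce the kind of sync-versus-`await` loop comparison mentioned earlier, here is a minimal sketch for Node.js. It is not the code from the linked comment; the function names (syncLoop, awaitLoop, main) and the iteration count are my own placeholders, and the timings it prints will vary by machine and V8 version.

```javascript
// Hypothetical micro-benchmark; names and ITERATIONS are illustrative only.
const ITERATIONS = 1e6;

// Baseline: a plain synchronous loop.
function syncLoop(n) {
  let sum = 0;
  for (let i = 0; i < n; i++) {
    sum += i;
  }
  return sum;
}

// Same arithmetic, but every iteration awaits a plain number.
// Awaiting a non-promise value still suspends the function and resumes it
// from the microtask queue, so each iteration pays that cost.
async function awaitLoop(n) {
  let sum = 0;
  for (let i = 0; i < n; i++) {
    sum += await i;
  }
  return sum;
}

async function main() {
  let start = process.hrtime.bigint();
  const syncResult = syncLoop(ITERATIONS);
  const syncMs = Number(process.hrtime.bigint() - start) / 1e6;

  start = process.hrtime.bigint();
  const awaitResult = await awaitLoop(ITERATIONS);
  const awaitMs = Number(process.hrtime.bigint() - start) / 1e6;

  console.log(`sync loop:  ${syncMs.toFixed(2)} ms`);
  console.log(`await loop: ${awaitMs.toFixed(2)} ms`);
  return { syncResult, awaitResult };
}

main();
```

As noted above, the two loops are not directly comparable, so this is only useful for tracking how the `await` number moves between builds, not as a statement about real applications.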
I've tried a few things to work around the requirement that RunMicrotasks() be callable when no context has been entered:

Initially, I wrote the stub with JS linkage and used the typical JSEntryStub to invoke it. This is partly wasteful and partly problematic. There need not be a separate JSFunction for RunMicrotasks in each context. More importantly, the function ought not to be associated with a context at all, given the constraint that it must be possible to invoke it without a context having been entered.

A second approach involved creating new TurboFan (TF) operators to initialize the roots register (the main manifestation of problems when not using the JSEntryStub was that the roots register was not initialized, leading to access violations when using heap constants). I didn't spend much time on this, because I felt it was more important to make sure callee-saved registers were restored properly, even though there wasn't much going on in the sole caller of the function. I thought it might be interesting to produce more general operators that would handle entry and exit for stubs which need to be invoked from C, but it seemed like a lot of work and I haven't gotten around to doing this yet.

Finally, I tried adding a new variant of JSEntryStub, which calls the RunMicrotasks stub rather than the various entry trampolines. At the moment it's mostly in working order, but it's possible there are still problems with StackFrameIteration and exception handling.

Another limitation: previously, SaveContexts (which seem to matter to the debugger and API in some way, though I haven't really looked at why yet) were not set up when calling API-defined microtask callbacks. In my prototype, I always set up the SaveContext before entering the RunMicrotasks stub. It's not yet clear whether this breaks anything, or whether it would be possible (or even a good idea) to mimic the old behaviour in the stub rather than always pushing the SaveContext. This is a subtle difference, but as noted it could have some bad effects.
Finally, a somewhat strange behaviour of the stub is that it enters contexts by itself when it needs to, inlining HandleScopeImplementer::EnterMicrotaskContext() and LeaveMicrotaskContext(), and overwriting Isolate::context(). I believe this is done in a valid way in the prototype, but it's not something that comes up in other stubs, so there isn't really any other code to model it on.

---

I was wondering whether anyone thinks reducing the C++->JS->C++ overhead in RunMicrotasks, for that 60% boost in certain very simple benchmarks that are unrepresentative of real code, might be worth doing properly and upstreaming. While it's unclear what the impact would be on real-world code, it seems reasonable to expect some kind of significant benefit (though perhaps not on the order of 60% as in the very simple benchmark mentioned above).

If (in the opinion of the V8 team) it might be worth my time to try to upstream this, I'd love some feedback on the approaches taken to address the problems listed above, and to get an idea of what sort of approach you'd all be happiest with.

--
v8-dev mailing list
http://groups.google.com/group/v8-dev
