[v8-dev] A potential speedup for Promises and Async Functions

caitp Thu, 31 Aug 2017 09:03:02 -0700

Hi,

Recently I've been trying out some things to get more out of Promises and async 
functions.


Different people in/around the Node.js project have been writing various 
benchmarks which show cases where `await`
seems to slow things down significantly. One simple example is
<https://github.com/tc39/proposal-async-iteration/issues/112#issuecomment-324885954>.
While it's invalid to compare
the simple synchronous loop to the one with `await`, it does highlight that in 
situations like that, the v8 implementation
can seem to be very slow, when really it should be more similar to the sync. 
loop (~20 times slower seems like a steeper
price to pay than is necessary).
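
For readers who don't want to follow the link, here is a minimal sketch of the
kind of comparison being discussed. It is not the benchmark from the linked
comment: the function names (sumSync, sumAwait, compare) and the timing harness
are assumptions made up for illustration, and it assumes a Node.js version that
provides process.hrtime.bigint().

```javascript
// Hypothetical micro-benchmark: names and structure are illustrative only.

// Plain synchronous loop.
function sumSync(n) {
  let total = 0;
  for (let i = 0; i < n; i++) total += i;
  return total;
}

// Same loop, but every iteration goes through `await`.
async function sumAwait(n) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    // Awaiting a non-promise value still suspends the function and
    // resumes it from a microtask, once per iteration.
    total += await i;
  }
  return total;
}

// Time both variants and return the results with elapsed nanoseconds.
async function compare(n) {
  let start = process.hrtime.bigint();
  const syncResult = sumSync(n);
  const syncNs = process.hrtime.bigint() - start;

  start = process.hrtime.bigint();
  const awaitResult = await sumAwait(n);
  const awaitNs = process.hrtime.bigint() - start;

  return { syncResult, awaitResult, syncNs, awaitNs };
}
```

Something like compare(1e6).then(r => console.log(Number(r.awaitNs) /
Number(r.syncNs))) prints the slowdown factor. The number varies a lot with the
engine version and warm-up, so a single run shouldn't be read as a measurement.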

I drafted an informal document to come up with some ideas for speeding up Await 
in v8. In general, the solutions were
split into 2 categories:

1) reduce heap use and GC overhead (allocate fewer objects for Await).
2) avoid JS->C++ and C++->JS transitions where possible (mainly accomplished by 
translating
    Isolate::RunMicrotasksInternal() and Isolate::EnqueueMicrotask() into code 
stubs). This generally makes JS-defined
    microtasks (for Promises and Await) much faster, but may cause DOM-defined 
microtasks to slow down a bit (unclear
    at this time). I expect Promises and Await to be used more frequently in 
tight loops, and certainly DOM microtasks don't
    affect Node.js at all, so this may be something worth going after.
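
To make the second category concrete, here is a toy model of what enqueueing
and running microtasks amounts to. This is only a sketch in plain JavaScript:
the MicrotaskQueue class and its method names are hypothetical and are not
V8's data structures or API. The point is just that the drain loop and the
callbacks it invokes live on the same side of the language boundary, so no
transition is paid per callback.

```javascript
// Toy model of a microtask queue. Hypothetical names; not V8's implementation.
class MicrotaskQueue {
  constructor() {
    this.pending = [];
  }

  // Stands in for the role of Isolate::EnqueueMicrotask().
  enqueue(callback) {
    this.pending.push(callback);
  }

  // Stands in for the role of Isolate::RunMicrotasksInternal().
  // Returns the number of callbacks that were run.
  run() {
    let ran = 0;
    // Keep looping until the queue is empty: callbacks enqueued while
    // draining are run as part of the same drain.
    while (this.pending.length > 0) {
      const batch = this.pending;
      this.pending = [];
      for (const callback of batch) {
        callback();
        ran++;
      }
    }
    return ran;
  }
}
```

In this model an `await` in a tight loop means one enqueue() and one callback
invocation per iteration, which is why the per-callback cost of the drain loop
shows up so directly in the simple benchmark.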

The first approach did not make much of a dent in any benchmarks. More useful 
profiles of actual applications did not
show `await` to be a bottleneck at all. Reducing overall memory use seems like 
a good thing in general, however.

The second approach yielded a significant improvement (~60% over 10 runs) for 
the simple benchmark (in a very
simple prototype implementation with some limitations discussed below).

So there are some constraints WRT implementing RunMicrotasks in JIT'd code. 
Particularly, it needs to be possible to
perform RunMicrotasks() when no context has been entered. I've tried a few 
things to work around this:

Initially, I wrote the stub with JS linkage and used the typical 
JSEntryStub to invoke it. This is partly
wasteful, and partly problematic. There need not be a separate JSFunction for 
RunMicrotasks in each
context. More importantly, the function ought not to be associated with a 
context at all, given the
constraint that it must be possible to invoke it without a context having been 
entered.

A second approach involved creating new TF operators to initialize the roots 
register (the main
manifestation of problems when not using the JSEntryStub was that the roots 
register was not initialized,
leading to access violations when using heap constants). I didn't spend much 
time with this, because I
felt that it was more important to make sure callee-saved registers were 
restored properly, even though
there wasn't much going on in the sole caller of the function.  I thought it 
might be interesting to produce
more general operators which would handle entry and exit for stubs which need 
to be invoked from C,
but it seemed like a lot of work and I haven't gotten around to doing this yet.

Finally, I tried adding a new variant to JSEntryStub, which calls the 
RunMicrotasks stub rather than the various entry
trampolines. At this moment, it's mostly in working order, but it's possible 
there are still problems with
StackFrameIteration and exception handling.

Another limitation: previously, SaveContexts (which seem to matter to the 
debugger and API in some way, though I
haven't really looked at why yet) were not set up when calling API-defined 
microtask callbacks. In my prototype, I
always set up the SaveContext before entering the RunMicrotasks stub. It's not 
yet clear whether this breaks anything, or whether it
would be possible (or even a good idea) to mimic the old behaviour in the stub 
rather than always pushing the SaveContext.
This is a subtle difference, but as noted it could have some bad effects.

Finally, a somewhat strange behaviour of the stub is that it enters contexts by 
itself when it needs to, inlining
HandleScopeImplementer::EnterMicrotaskContext and LeaveMicrotaskContext(), and 
overwriting Isolate::context().
I believe this is done in a valid way in the prototype, but it's not something 
that comes up in other stubs, so there isn't really
any other code to model it on.
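
As a rough illustration of the behaviour described in the last two paragraphs,
here is a toy model of a drain loop that saves the current context once up
front and then enters each microtask's context itself. Everything here is an
assumption for illustration: the `isolate` object, its `context` field, and the
{ context, callback } queue entries are hypothetical stand-ins, not V8
internals, and the real stub has to deal with much more than this.

```javascript
// Toy model only. `isolate`, `context`, and the queue entry shape are
// hypothetical stand-ins for illustration, not V8's actual structures.
const isolate = { context: null };

function runMicrotasks(queue) {
  // Remember whatever context was current on entry (plays the role of
  // always setting up a SaveContext before entering the stub).
  const saved = isolate.context;
  try {
    while (queue.length > 0) {
      const { context, callback } = queue.shift();
      // "Enter" the microtask's context by overwriting the current one.
      isolate.context = context;
      callback();
    }
  } finally {
    // Restore the saved context even if a callback throws.
    isolate.context = saved;
  }
}
```

The model always saves and restores, which matches the prototype's behaviour
rather than the old one, where no SaveContext was set up for API-defined
microtask callbacks.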

---

I was wondering if anyone thought reducing the C++->JS->C++ overhead in 
RunMicrotasks for that 60% boost in certain
very simple and unrepresentative-of-real-code benchmarks might be worth doing 
properly and upstreaming? While it's
unclear what the impact would be on real-world code, it seems like a reasonable 
expectation that you'd see some kind of
significant benefit (though perhaps not on the order of 60% as in the very 
simple benchmark mentioned above).

If (in the opinion of the v8 team) it might be worth my time to try to upstream 
this, I'd love some feedback on the approaches
taken to address the problems listed above, and get an idea of what sort of 
approach you'd all be happiest with.
