Hi Caitlin,

Jakob and I just spent some time digging into this; comments inline (though
we don't have answers to everything).

On Thu, Aug 31, 2017 at 9:01 AM, <[email protected]> wrote:

> Hi,
>
> Recently I've been trying out some things to get more out of Promises and
> async functions.
>
> Different people in/around the Node.js project have been writing various
> benchmarks that show cases where `await` seems to slow things down
> significantly. One simple example is
> https://github.com/tc39/proposal-async-iteration/issues/112#issuecomment-324885954.
> While it's invalid to compare the simple synchronous loop to the one with
> `await`, it does highlight that in situations like that, the V8
> implementation can be very slow when it really should perform much closer
> to the synchronous loop (~20 times slower seems like a steeper price to
> pay than is necessary).
>
> I drafted an informal document to come up with some ideas for speeding up
> Await in V8. In general, the solutions fell into two categories:
>
> 1) reduce heap use and GC overhead (allocate fewer objects for Await).
> 2) avoid JS->C++ and C++->JS transitions where possible (mainly
>    accomplished by translating Isolate::RunMicrotasksInternal() and
>    Isolate::EnqueueMicrotask() into code stubs). This generally makes
>    JS-defined microtasks (for Promises and Await) much faster, but may
>    cause DOM-defined microtasks to slow down a bit (unclear at this
>    time). I expect Promises and Await to be used more frequently in
>    tight loops, and DOM microtasks certainly don't affect Node.js at
>    all, so this may be something worth going after.
>
> The first approach did not make much of a dent in any benchmarks. More
> useful profiles of actual applications did not
> show `await` to be a bottleneck at all. Reducing overall memory use seems
> like a good thing in general, however.
>
> The second approach yielded a significant improvement (~60% over 10 runs)
> for the simple benchmark (in a very
> simple prototype implementation with some limitations discussed below).
>
> So there are some constraints WRT implementing RunMicrotasks in JIT'd
> code. In particular, it needs to be possible to perform RunMicrotasks()
> when no context has been entered. I've tried a few things to work around
> this:
>
> Initially, I wrote the stub with JS linkage and used the typical
> JSEntryStub to invoke it. This is partly wasteful and partly
> problematic. There need not be a separate JSFunction for RunMicrotasks
> in each context. More importantly, the function ought not to be
> associated with a context at all, given the constraint that it must be
> possible to invoke it without a context having been entered.
>

From looking at the JSEntryStub codepath for JSFunctions, it appears to us
that for a function marked as native, strict, or both (which seems
appropriate in this case) there shouldn't be any need for a Context. So it
seems like you could unblock your prototype by creating a single JSFunction
(as a strong root on the Heap) which wraps the builtin, and call through
that from the C++ API. If you already tried something like this and ran
into trouble it'd be interesting to hear what went wrong.
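
To make that concrete, the shape we have in mind is roughly the following
(illustrative sketch only: the root name and accessor are made up for the
purpose of this mail, and exception handling is glossed over):

    // heap.h: reserve a single per-isolate strong root for the wrapper,
    // so it exists exactly once and is independent of any native context:
    //   V(JSFunction, run_microtasks_function, RunMicrotasksFunction)

    // isolate.cc: call through the wrapper instead of
    // RunMicrotasksInternal(). Hypothetical sketch:
    void Isolate::RunMicrotasks() {
      HandleScope scope(this);
      // Hypothetical root accessor generated by the strong-root macro.
      Handle<JSFunction> fun(heap()->run_microtasks_function(), this);
      // The wrapper is marked native + strict, so the JSEntryStub path
      // shouldn't require a context to have been entered here.
      Execution::Call(this, fun, factory()->undefined_value(), 0, nullptr)
          .ToHandleChecked();
    }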


> A second approach involved creating new TF operators to initialize the
> roots register (the main
> manifestation of problems when not using the JSEntryStub was that the
> roots register was not initialized,
> leading to access violations when using heap constants). I didn't spend
> much time with this, because I
> felt that it was more important to make sure callee-saved registers were
> restored properly, even though
> there wasn't much going on in the sole caller of the function. I thought
> it might be interesting to produce more general operators which would
> handle entry and exit for stubs that need to be invoked from C++, but it
> seemed like a lot of work and I haven't gotten around to doing this yet.
>
> Finally, I tried adding a new variant of JSEntryStub which calls the
> RunMicrotasks stub rather than the various entry trampolines. At the
> moment it's mostly in working order, but it's possible there are still
> problems with stack frame iteration and exception handling.
>

These approaches seem too involved for just this one case; I'd prefer the
JSFunction approach above if it works.


> Another limitation: previously, SaveContexts (which seem to matter to the
> debugger and API in some way, though I haven't really looked at why yet)
> were not set up when calling API-defined microtask callbacks. In my
> prototype, I always set up the SaveContext before entering the
> RunMicrotasks stub. It's as yet unclear whether this breaks anything, or
> whether it would be possible (or even a good idea) to mimic the old
> behaviour in the stub rather than always pushing the SaveContext. This is
> a subtle difference, but as noted it could have some bad effects.
>

Still digging into this. It appears I may have inadvertently removed the
regression test for this code in https://codereview.chromium.org/1909433003
when I removed support for Object.observe, but the test should be
adaptable for Promises (see
https://codereview.chromium.org/332923003 for the test). I'm going to try
restoring the test and playing around with this code in the current C++
version to see if I can get a better handle on it. But from an initial
reading, it really shouldn't make a difference for the C++ callback case
anyway.
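
For anyone following along, the asymmetry in question, paraphrased from
the current C++ loop (simplified, not the verbatim code):

    if (microtask->IsCallHandlerInfo()) {
      // API-defined callback: today this runs with *no* SaveContext.
      Handle<CallHandlerInfo> callback_info =
          Handle<CallHandlerInfo>::cast(microtask);
      v8::MicrotaskCallback callback =
          v8::ToCData<v8::MicrotaskCallback>(callback_info->callback());
      callback(v8::ToCData<void*>(callback_info->data()));
    } else {
      // JS-defined microtask: the context is saved and restored around
      // the call.
      SaveContext save(this);
      // ... enter the function's context and invoke it ...
    }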


> Finally, a somewhat strange behaviour of the stub is that it enters
> contexts by itself when it needs to, inlining
> HandleScopeImplementer::EnterMicrotaskContext() and
> LeaveMicrotaskContext(), and overwriting Isolate::context().
> I believe this is done in a valid way in the prototype, but it's not
> something that comes up in other stubs, so there isn't really
> any other code to model it on.
>

This seems fine to me; it's rather special behavior even in its C++ form.
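For reference, the per-task sequence you're inlining looks roughly like
this on the C++ side (a simplified sketch; the exact context handed to
EnterMicrotaskContext differs in the real code):

    SaveContext save(this);  // restores Isolate::context() on scope exit
    Handle<Context> context(function->context()->native_context(), this);
    handle_scope_implementer()->EnterMicrotaskContext(context);
    set_context(*context);   // the "overwriting Isolate::context()" part
    // ... invoke the microtask function ...
    handle_scope_implementer()->LeaveMicrotaskContext();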


> ---
>
> I was wondering whether anyone thinks that reducing the C++->JS->C++
> overhead in RunMicrotasks, for that ~60% boost on certain very simple,
> unrepresentative-of-real-code benchmarks, might be worth doing properly
> and upstreaming. While it's unclear what the impact would be on
> real-world code, it seems like a reasonable expectation that you'd see
> some kind of significant benefit (though perhaps not on the order of 60%
> as in the very simple benchmark mentioned above).
>
> If, in the opinion of the V8 team, it might be worth my time to try to
> upstream this, I'd love some feedback on the approaches taken to address
> the problems listed above, and to get an idea of what sort of approach
> you'd all be happiest with.
>

If we can pin down the answers to all the stuff above to our satisfaction,
then yes, my inclination is that this is a worthwhile thing to do: the code
may be a bit verbose (what with having to deal with the different sorts of
things stored in the queue), but it's at least relatively straightforward.
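
To spell out "different sorts of things": the queue mixes API-defined
callbacks (CallHandlerInfo, i.e. a C++ function pointer plus a data
pointer) with several JS-flavoured task kinds (plain JSFunctions, promise
job infos, and so on), so the generated code needs a type dispatch in its
drain loop. A standalone toy model of that shape, purely conceptual and
nothing V8-specific:

    #include <functional>
    #include <queue>
    #include <utility>
    #include <variant>

    // Toy model: V8's real queue has more entry kinds (JSFunction,
    // CallHandlerInfo, promise job infos); two variants suffice to show
    // the dispatch the drain loop has to do.
    using CCallback = std::pair<void (*)(void*), void*>;  // callback + data
    using JsTask = std::function<void()>;  // stand-in for a JSFunction
    using Microtask = std::variant<CCallback, JsTask>;

    static std::queue<Microtask> microtask_queue;

    void EnqueueMicrotask(Microtask task) {
      microtask_queue.push(std::move(task));
    }

    void RunMicrotasks() {
      // Drain to completion: tasks enqueued while draining also run,
      // matching microtask semantics.
      while (!microtask_queue.empty()) {
        Microtask task = std::move(microtask_queue.front());
        microtask_queue.pop();
        if (auto* c = std::get_if<CCallback>(&task)) {
          c->first(c->second);       // API callback: plain indirect call
        } else {
          std::get<JsTask>(task)();  // JS task: would enter a context
        }
      }
    }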

- Adam
