On 2014/06/23 22:04:39, danno wrote:
On 2014/06/23 21:00:04, arv wrote:
> On 2014/06/23 at 20:47:27, danno wrote:
> > On 2014/06/23 19:51:52, Michael Starzinger wrote:
> > > Handing off to either Andreas or Danno.
> > >
> > >
https://codereview.chromium.org/329253004/diff/90001/src/collection.js
> > > File src/collection.js (right):
> > >
> > >
>
https://codereview.chromium.org/329253004/diff/90001/src/collection.js#newcode87
> > > src/collection.js:87: %SetIteratorMoveNext(iterator);
> > > I am _very_ surprised that this is faster. How are three runtime
calls
> faster
> > > than one? Don't get me wrong, I trust your measurements, but is
this a
good
> > > optimization in the long run?
> >
> > NOT LGTM
> >
> > Replacing one runtime function with three might provide a win in some
> situations (how did you measure it?),
>
> Using the tests that we added to golem
>
> > but I am a concerned that this is not the way to go if you are
wanting to
> optimize this for the long run. If you really want a fast
implementation,
you
> long-term aim should be to remove the runtime call altogether (which is
> possible, e.g. with a Hydrogen stub, but obviously more work).
>
> We have explored this path. There was a concern that we should not write
> hydrogen for too specialized code. And I also wrote a self hosted
version of
Map
> and Set where the only runtime call was to get the hash code. This,
however
> turned out to be a lot slower so it was decided not to pursue this path
for
now.
I am not sure if the concern is with hydrogen stubs in general for
optimization
here, I would rather frame it this way: code generation is not a very
subtle
tool and should be used sparingly and as the final step in the
optimization
process. Before you consider Hydrogen stub generation, the other
bottlenecks
should be reduced and massaged so that when the last little bit needs to
be
squeezed out with code generation, it can be done minimally intrusively.
Given
we just started with the optimization process, let's focus on the
low-hanging
fruits first (as you have conceptually done), but not design out the path
to
the
high-hanging ones like Hydrogen stubs eventually. The means incrementally
teasing out more and more functionality of the native runtime call in
JavaScript
(which get's turned into highly optimized code via Crankshaft "for free")
so
that the code that remains in C++ is easy and concise to implement in
Hydrogen.
>
> > Your proposed solution relies too much on C++ machinery in order to
make
> forEach faster and moves further away from the ideal solution, so it
doesn't
> really match current v8 practice.
> >
> > What exactly is causing the existing function to be slow? Did you
measure
it?
> Is it the allocation for every iteration? Or something else?
>
> > If it's the allocation, I can think of another, simpler ways to
improve
this.
> For example, this particular use case doesn't expose the entry object
> externally, so a _much_ faster approach is to turn %XXXIteratorNext into
> %XXXIteratorNextObject (or whatever appropriate name) that either
returns
the
> next object, or a special sentinel for done. You can call that new
version
> directly from the forEach implementation, removing all allocation and
doing
the
> job with only a single runtime function.
> >
> > If you do it right, I think you might be able get rid of the nastiness
with
> the special casing of the entry object and its map in the runtime,
turning
the
> creating of the externally-visible entry into pure JS something like
this:
> >
> > function SetIteratorNext() {
> > var next = %SetIteratorNextObject();
> > var is_done = next == __some_sentinel;
> > return { value: (is_done ? undefined : next), done: is_done }
> > }
>
> Thanks for the feedback Danno. Much appreciated.
>
> The sentinel solution works for Set.prototype.forEach but not for
> Map.prototype.forEach which needs both key and value.
Yes, but consider this variant of that approach:
function MapIteratorNext() {
var entry = { key: undefined, value: undefined, done: false };
%MapIteratorNextWithEntry(iterator, entry);
}
In the forEach case, you can allocate a single entry and pass it into
multiple
calls of the %MapIteratorNextWithEntry, reducing the overhead (one
allocation
vs. n)
Interesting, but that doesn't settle it, there's deeper digging that needs
to be
done. The next thing to do is to determine exactly where the time is going
in
the single runtime call. Without the allocation, the raw computation needed
to
calculate the result for the single-call and three-call versions should be
the
same, I suspect the discrepancy is coming from something non-obvious.
Can you please run the benchmark with the perf tool on linux?
tools/run-llprof.sh out/ia32.release/d8 your_benchmark.js
then use
tools/ll_prof.py
to get a list the hotspots
https://codereview.chromium.org/329253004/
--
--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups "v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.