On Jul 7, 2009, at 4:19 PM, Peter Kasting wrote:

For example, the framework could compute both sums _and_ geomeans, if people thought both were valuable.

That's a plausible thing to do, but I think there's a downside: if you make a change that moves the two scores in opposite directions, the benchmark doesn't help you decide whether the change is good or not. Avoiding paralysis in the face of tradeoffs is part of the reason we look primarily at the total score, not the individual subtest scores. The whole point of a meta-benchmark like this is to force ourselves to simple-mindedly look at only one number.
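
To make that concrete, here is a toy illustration (the subtest times are invented, not taken from any real benchmark run):

  // Hypothetical times in ms for two subtests, before and after a change.
  var before = [100, 400];
  var after  = [ 50, 460];

  function sum(xs)     { var s = 0; for (var i = 0; i < xs.length; i++) s += xs[i]; return s; }
  function geomean(xs) { var p = 1; for (var i = 0; i < xs.length; i++) p *= xs[i]; return Math.pow(p, 1 / xs.length); }

  sum(before);     // 500
  sum(after);      // 510    (the sum calls this change a regression)
  geomean(before); // 200
  geomean(after);  // ~151.7 (the geomean calls it a speedup)

Reporting both numbers would leave a change like this with no verdict at all.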

We could agree on a way of benchmarking a representative sample of current sites to get an idea of how widespread certain operations currently are. We could talk with the maintainers of jQuery, Dojo, etc. to see what sorts of operations they think it would be helpful to make faster for future apps. We could instrument browsers to do some sort of (opt-in) sampling of real-world workloads, etc. Surely together we can come up with ways to make SunSpider even better, while keeping its current strengths in mind.

I think these are all good ideas. There's one way in which sampling the Web is not quite right, though: to some extent, what matters is not the average density of an operation but its peak density. An operation that's used a *lot* by a few sites and hardly at all by most sites may deserve a weighting above its average proportion of Web use. I would like to hear input on what is inadequately covered. I tend to think there should be more coverage of the following (a rough sketch of the kind of code I mean follows the list):

- property access, involving at least some polymorphic access patterns
- method calls
- object-oriented programming patterns
- GC load
- programming in a style that makes significant use of closures
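
To be clearer about what I mean, here is a rough sketch (hypothetical code, not proposed SunSpider content) of a single loop that exercises all five:

  // Two constructors with different shapes but the same method name.
  function Point2(x, y) { this.x = x; this.y = y; }
  Point2.prototype.norm = function () {
    return Math.sqrt(this.x * this.x + this.y * this.y);
  };

  function Point3(x, y, z) { this.x = x; this.y = y; this.z = z; }
  Point3.prototype.norm = function () {
    return Math.sqrt(this.x * this.x + this.y * this.y + this.z * this.z);
  };

  // Closure-heavy style: the hot loop calls through a closure that
  // captures "total".
  function makeAccumulator() {
    var total = 0;
    return function (p) { total += p.norm(); return total; };
  }

  var acc = makeAccumulator();
  for (var i = 0; i < 100000; i++) {
    // Alternating receivers make the p.norm() call site and the x/y
    // property accesses polymorphic; allocating a fresh object every
    // iteration generates steady GC load.
    var p = (i % 2) ? new Point2(i, i + 1) : new Point3(i, i + 1, i + 2);
    acc(p);
  }

Nothing here is exotic; it's roughly the object-oriented, closure-heavy style that current libraries encourage, which is part of why I think it's underrepresented.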

I think the V8 benchmark does a much better job of covering the first four of these things. I also think it overweights them, to the exclusion of most other considerations (*). As I mentioned before, I'd like to include some of V8's tests in a future SunSpider 2.0 content set.

It would be good to know what other things should be tested that are not sufficiently covered.

Regards,
Maciej

* - For example, Mozilla's TraceMonkey effort showed relatively little improvement on the V8 benchmark, even though it showed significant improvement on SunSpider and other benchmarks. I think the TraceMonkey speedups are real and significant, so this tends to undermine my confidence in the V8 benchmark's coverage. Note: I don't mean to start a side thread about whether the V8 benchmark is good or not; I just wanted to justify my remarks above.