Hi! First, I want to restate the obvious before pointing out what I think is a mistake: your work on this website is great and very useful!
On Fri, Jun 25, 2010 at 13:08, Miquel Torres <tob...@googlemail.com> wrote:
> - stacked bars

Here you are summing up normalized times, which is more or less like taking their arithmetic average. And that doesn't work at all: in many cases you can "show" completely different results simply by normalizing relative to a different item. Even the simple question "who is faster?" can be answered in different ways.

So you should use the geometric mean instead, even though it is not so widely known. Or rather, it is known to benchmarking experts, but it's difficult to become one. Please have a look at the short paper "How not to lie with statistics: the correct way to summarize benchmark results":
http://scholar.google.com/scholar?cluster=1051144955483053492&hl=en&as_sdt=2000
I downloaded it from the ACM library; please tell me if you can't find it. (There is also a small sketch after my signature illustrating the problem.)

> horizontal (http://speed.pypy.org/comparison/?hor=true&bas=2%2B35&chart=stacked+bars):
> This is not meant to "demonstrate" that overall the jit is over two times
> faster than cpython. It is just another way for a developer to picture how
> long a programme would take to complete if it were composed of 21 such
> tasks.

You are not summing up absolute times, so your claim is incorrect, and the error is significant, given the above paper. A sum of absolute times would provide what you claim.

> You can see that cpython's (the normalization chosen) benchmarks all
> take 1 "relative" second.

Here, for instance, I see that CPython and pypy-c take more or less the same time, which surprises me (since the PyPy interpreter was known to be slower than CPython). But given that the result is invalid, it may well be an artifact of your statistics.

> pypy-c needs more or less the same time, some "tasks" being slower and some
> faster. Psyco shows an interesting picture: from meteor-contest downwards
> (fortuitously), all benchmarks are extremely "compressed", which means they
> are sped up by psyco quite a lot. But any further speed-up wouldn't make
> overall time much shorter because the first group of benchmarks now takes
> most of the time to complete. pypy-c-jit is a more extreme case of this: if
> the jit accelerated all "fast" benchmarks to 0 seconds (infinitely fast), it
> would only get about twice as fast as now because ai, slowspitfire,
> spambayes and twisted_tcp now need half the entire execution time. A good
> demonstration of "you are only as fast as your slowest part". Of course the
> aggregate of all benchmarks is not a real app, but it is still fun.

This may still be true, at least in part, but you have to do this reasoning on absolute times (see the second sketch below).

Best regards, and keep up the good work!
--
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/
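
P.S. Here is a small Python sketch of the problem, with made-up numbers (nothing taken from speed.pypy.org): averaging normalized times can "prove" opposite conclusions depending on which system you normalize to, while the geometric mean gives the same answer under any baseline.

# Hypothetical absolute times, in seconds, for two systems on two benchmarks.
times = {
    "A": [2.0, 10.0],
    "B": [10.0, 4.0],
}

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    product = 1.0
    for x in xs:
        product *= x
    return product ** (1.0 / len(xs))

def normalize(system, baseline):
    return [t / b for t, b in zip(times[system], times[baseline])]

for baseline in ("A", "B"):
    for system in ("A", "B"):
        ratios = normalize(system, baseline)
        print("baseline=%s system=%s  arith=%.2f  geom=%.2f"
              % (baseline, system, arithmetic_mean(ratios), geometric_mean(ratios)))

# Output:
# baseline=A system=A  arith=1.00  geom=1.00
# baseline=A system=B  arith=2.70  geom=1.41   <- "B is much slower than A"
# baseline=B system=A  arith=1.35  geom=0.71   <- "A is slower than B"?!
# baseline=B system=B  arith=1.00  geom=1.00

The arithmetic mean declares each system slower than the other, depending on the baseline; the geometric mean says the same thing under both baselines (B is about 1.41x slower than A). That is exactly the argument of the paper above.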
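
P.P.S. And a sketch of what I mean by doing the "only as fast as your slowest part" reasoning on absolute times (again, the numbers are made up, not measurements from speed.pypy.org): if the benchmarks you cannot speed up already account for a fraction f of the total wall-clock time, then no improvement of the rest can make the whole suite more than 1/f times faster.

# Made-up absolute times in seconds; the point is the calculation, not the data.
slow_part = {"ai": 30.0, "slowspitfire": 25.0, "spambayes": 20.0, "twisted_tcp": 25.0}
fast_part = {"bench_%d" % i: 2.0 for i in range(10)}  # hypothetical, already fast under the jit

slow_total = sum(slow_part.values())
total = slow_total + sum(fast_part.values())
f = slow_total / total

print("total: %.1f s, fraction spent in the slow part: f = %.2f" % (total, f))
print("upper bound on overall speedup if the rest dropped to 0 s: %.2fx" % (1.0 / f))
# total: 120.0 s, fraction spent in the slow part: f = 0.83
# upper bound on overall speedup if the rest dropped to 0 s: 1.20x

Whether the real bound for pypy-c-jit is about 2x, as the stacked-bars page suggests, can only be checked against the absolute timings, not against times normalized to CPython.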