Carl Friedrich Bolz, 26.02.2010 11:25:
> http://buytaert.net/files/oopsla07-georges.pdf
It's sad that the paper doesn't try to understand *why* others benchmark differently. They even admit at the end that their statistical approach is only really interesting when the differences are small, without mentioning that the system under test must also be complex enough, as the Sun JVM is. However, if the differences are small and the benchmarked system is complex, it's better to question the benchmark in the first place than the statistics applied to its results.

Anyway, I agree that, given the complexity of at least some of the benchmarks in the suite, and given the requirement to do continuous benchmarking to catch both small and large differences, doing a statistically relevant number of runs makes sense.

Stefan
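P.S. To make concrete what I mean by a statistically relevant number of runs, here is a minimal sketch (in Python, with made-up timings) of the kind of confidence-interval comparison the paper advocates. The function names, the z-value, and the numbers are my own illustration, not taken from the paper:

    import math
    import statistics

    def confidence_interval(samples, z=1.96):
        # Mean and half-width of a ~95% confidence interval over run times.
        # The normal approximation is reasonable for ~30+ runs; for fewer,
        # the paper recommends Student's t distribution instead.
        mean = statistics.mean(samples)
        half = z * statistics.stdev(samples) / math.sqrt(len(samples))
        return mean, half

    def significantly_different(runs_a, runs_b):
        # Non-overlapping confidence intervals suggest the measured
        # difference is unlikely to be noise.
        mean_a, h_a = confidence_interval(runs_a)
        mean_b, h_b = confidence_interval(runs_b)
        return abs(mean_a - mean_b) > h_a + h_b

    # Made-up run times (seconds) from repeated VM invocations:
    baseline = [2.31, 2.35, 2.29, 2.40, 2.33]
    patched  = [2.28, 2.30, 2.27, 2.31, 2.26]
    print(significantly_different(baseline, patched))

The point being: with only a handful of runs the intervals stay wide, so small differences only become detectable once you collect enough samples.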
