Miquel Torres, 26.02.2010 11:05:
> You may also consider that a benchmark that varies greatly between
> runs may be a flawed benchmark.
> 
> I think it should be considered, but only on the running side, acting
> accordingly (too high a deviation: discard the run, reconsider the
> benchmark, reconsider the environment, or whatever).

Right, there might even have been a cron job running at the same time.
There are various reasons why benchmark numbers can vary.

Especially in a JIT environment, you'd normally expect the benchmark
numbers to decrease over time, or to stay at a constant high level for a
while, show a peak when the compiler kicks in, and then continue at a
lower level (e.g. with the Sun JVM's HotSpot JIT, or incremental JIT
compilers in general). I assume that the benchmarking machinery handles
this, but it's yet another reason why widely differing timings can occur
within a single run, and why it's only the best run that really matters.

You could even go one step further: hold deviating results back from the
history graph and only present them once they are still reproducible an
hour later (preferably with the same source revision).
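
Sketching that idea (the names and the 10% threshold are made up, not
anything the speed center does today): a new result that deviates too
much from the recent median would be held back until a later run
confirms it:

    from statistics import median

    def show_in_history(result, history, threshold=0.10):
        # history: recent timings for the same benchmark.
        # A result deviating more than `threshold` from the median of
        # the last ten entries is held back; it only enters the graph
        # once a later run (ideally of the same revision) reproduces it.
        if not history:
            return True
        m = median(history[-10:])
        return abs(result - m) / m <= threshold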

Stefan
