Miquel Torres, 26.02.2010 11:05:
> You may also consider that a benchmark that varies greatly between
> runs may be a flawed benchmark.
>
> I think it should be considered, but only on the running side, and act
> accordingly (too high a deviation: discard run, reconsider benchmark,
> reconsider environment or whatever).
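A minimal sketch of that "too high a deviation: discard run" check could
look like the following; the function name and the coefficient-of-variation
threshold are illustrative assumptions, not anything specified in the thread:

    import statistics

    def run_is_acceptable(timings, max_cv=0.05):
        # Hypothetical check: flag a run whose per-iteration timings
        # vary too much.  The 5% coefficient-of-variation threshold
        # (stddev / mean) is an assumed value, not one from the thread.
        cv = statistics.stdev(timings) / statistics.mean(timings)
        return cv <= max_cv

A runner that gets False back here could discard the run and flag the
benchmark or the environment for review, as suggested above.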
Right, there might even have been a cron job running at the same time.
There are various reasons why benchmark numbers can vary.

Especially in a JIT environment, you'd normally expect the benchmark
numbers to decrease over time, or to stay constantly high for a while,
then show a peak when the compiler kicks in, and then continue at a lower
level (e.g. with the Sun JVM's HotSpot JIT, or incremental JIT compilers
in general). I assume that the benchmarking machinery handles this, but
it's yet another reason why widely differing timings can occur within a
single run, and why it's only the best run that really matters.

You could even go one step further: ignore deviating results in the
history graph and only present them once they are still reproducible
(preferably with the same source revision) an hour later.

Stefan
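As a minimal sketch of that best-run idea (time.perf_counter and the run
count are assumptions for illustration, not part of any existing
benchmarking machinery):

    import time

    def best_run(fn, runs=10):
        # Time fn several times and keep the fastest run.  Early runs
        # may include JIT warm-up or a compilation peak, so the minimum
        # is the most stable estimate of steady-state performance.
        best = float("inf")
        for _ in range(runs):
            start = time.perf_counter()
            fn()
            best = min(best, time.perf_counter() - start)
        return best

The minimum per revision would then be the number that goes into the
history graph.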
