On Wed, 15 Mar 2017 08:41:47 +0200
Serhiy Storchaka <storch...@gmail.com>
wrote:
> 
> "half of the samples are 1% below the median and half of the samples are 
> 50% above" -- this is unrealistic example.

I was inventing an extreme example for the sake of clarity.
You can easily derive more "realistic" examples from the same
principle and reach similar conclusions: non-negligible variations
can go entirely unrepresented in the "median +- MAD" aggregate.
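
To make that concrete, here is a small sketch using the stdlib
statistics module (the sample values are invented for illustration):

    import statistics

    # Invented sample: half the timings ~1% below the median,
    # half ~50% above it (values in seconds).
    samples = [0.99] * 50 + [1.0] + [1.5] * 50

    median = statistics.median(samples)
    # MAD: the median of the absolute deviations from the median.
    mad = statistics.median(abs(x - median) for x in samples)

    print(median, mad)
    # -> 1.0 0.01: "median +- MAD" reads as [0.99, 1.01], entirely
    # hiding the 50 runs clustered around 1.5 (a 50% slowdown).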

> In real examples, samples are distributed around some point, with
> skew and outliers.

If you're assuming the benchmark itself is stable and that variations
are due to outside system noise, then you should really take the
minimum, which has the best chance of excluding system noise.
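
For example, with timeit this is just a matter of repeating the
measurement and keeping the minimum (a sketch; the timed statement is
an arbitrary placeholder):

    import timeit

    # Repeat the measurement several times; under the assumption
    # that the benchmark is stable, the minimum is the run least
    # disturbed by system noise.
    timings = timeit.repeat("sorted(data)",
                            setup="data = list(range(1000))[::-1]",
                            repeat=7, number=1000)
    print(min(timings))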

If you're mainly worried about outliers, you can insert a data
preparation (or cleanup) phase before computing the mean.  But you
have to decide up front whether an outlier is due to system noise or
to actual benchmark instability (which can stem from non-determinism
in the runtime, e.g. hash randomization).  For that, you may want to
collect additional system data while running the benchmark: for
example, if total CPU occupation during the benchmark is much higher
than the benchmark's own CPU time, you might decide the system wasn't
idle enough and classify the result as an outlier.
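
As a rough sketch of that idea (the slack threshold and the use of
psutil to sample system-wide CPU times are my own assumptions, not a
tested recipe):

    import time
    import psutil  # third-party, assumed available

    def run_and_classify(bench, slack=1.5):
        # Run one benchmark sample and flag it as an outlier if the
        # whole system burned much more CPU time than the benchmark
        # process itself did.  Note that psutil.cpu_times()
        # aggregates over all cores, so `slack` would need tuning.
        sys_before = psutil.cpu_times()
        cpu_before = time.process_time()
        start = time.perf_counter()
        bench()
        wall = time.perf_counter() - start
        proc_cpu = time.process_time() - cpu_before
        sys_after = psutil.cpu_times()
        sys_busy = ((sys_after.user - sys_before.user)
                    + (sys_after.system - sys_before.system))
        return wall, sys_busy > slack * max(proc_cpu, 1e-9)

    # Keep only the samples taken while the system was idle enough.
    results = [run_and_classify(lambda: sum(range(10**6)))
               for _ in range(20)]
    clean = [t for t, noisy in results if not noisy]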

Regards

Antoine.

