On Wed, 15 Mar 2017 08:41:47 +0200
Serhiy Storchaka <storch...@gmail.com> wrote:
> "half of the samples are 1% below the median and half of the samples are
> 50% above" -- this is unrealistic example.
I was inventing an extreme example for the sake of clarity.  You can
easily derive more "realistic" examples from the same principle and get
similar results in the end: non-negligible variations being totally
unrepresented in the "median +- MAD" aggregate.

> In real examples samples are
> distributed around some point, with the skew and outliers.

If you're assuming the benchmark itself is stable and that variations are
due to outside system noise, then you should really take the minimum,
which has the best chance of ignoring system noise.

If you're mainly worried about outliers, you can first insert a data
preparation (or cleanup) phase before computing the mean.  But you have
to decide up front whether an outlier is due to system noise or to actual
benchmark instability (which can be due to non-determinism in the
runtime, e.g. hash randomization).  For that, you may want to collect
additional system data while running the benchmark (for example, if total
CPU occupation during the benchmark is much higher than the benchmark's
own CPU time, you might decide the system wasn't idle enough and classify
the result as an outlier).

Regards

Antoine.
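As a concrete illustration of both points, here is a minimal, untested
Python sketch (the sample values, the cleaned_mean name and the 3-MAD
cutoff are purely illustrative choices, not taken from any existing tool):

    import statistics

    # The extreme example quoted above: half the samples ~1% below the
    # median, half 50% above it.
    samples = [0.99] * 10 + [1.0] + [1.50] * 10
    median = statistics.median(samples)                        # 1.0
    mad = statistics.median(abs(s - median) for s in samples)  # 0.01
    # "median +- MAD" reports 1.00 +- 0.01 and completely hides the
    # slow half of the samples; the mean is about 1.23.

    def cleaned_mean(samples, cutoff=3.0):
        # One possible cleanup phase before averaging: drop samples more
        # than `cutoff` MADs away from the median, then average the rest.
        median = statistics.median(samples)
        mad = statistics.median(abs(s - median) for s in samples)
        if mad == 0:
            return statistics.mean(samples)
        kept = [s for s in samples if abs(s - median) <= cutoff * mad]
        return statistics.mean(kept)

    # Note that on the data above a 3-MAD cutoff would also discard the
    # genuinely slow half, which is exactly why the noise-vs-instability
    # decision has to be made up front.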