While I like the "automatic removal of outliers" feature of median and MAD ("robust" statistics), I'm not comfortable with these numbers. They are new to me and uncommon in other benchmark tools.
It's not easy to compare MAD to the standard deviation. It seems like MAD can even be misleading when reading the "1 ms" part of "10 ms +- 1 ms".

The perf module already has a function to emit a warning if a benchmark is considered "unstable": a warning is emitted if stdev/mean is greater than 0.10. I chose this threshold arbitrarily. Maybe we need another check that emits a warning when mean and median, or stdev and MAD, are too different? Maybe we need a new --median command line option to display median/MAD instead of the mean/stdev displayed by default?

Regarding reproducibility, I should experiment with mean vs median. Currently, perf uses neither MAD nor stdev to compare two benchmark results.

Victor
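P.S. To make the comparison concrete, here is a quick sketch using only the statistics module (this is not the actual perf code; the mad() helper and the 0.05 threshold of the proposed new check are made up, and just as arbitrary as the existing 0.10 one):

import statistics

def mad(samples):
    # Median absolute deviation: median of |x - median(x)|.
    med = statistics.median(samples)
    return statistics.median(abs(x - med) for x in samples)

def summarize(samples):
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    median = statistics.median(samples)
    sample_mad = mad(samples)

    print("mean   +- stdev: %.2f ms +- %.2f ms" % (mean, stdev))
    print("median +- MAD:   %.2f ms +- %.2f ms" % (median, sample_mad))

    # For normally distributed data, stdev ~= 1.4826 * MAD, so the two
    # "+-" numbers are not directly comparable without rescaling.
    print("MAD rescaled to the stdev scale: %.2f ms" % (1.4826 * sample_mad))

    # Existing perf check: warn if stdev/mean > 0.10.
    if stdev / mean > 0.10:
        print("WARNING: benchmark seems unstable (stdev/mean > 0.10)")

    # Proposed extra check: warn if mean and median are too different,
    # which hints at outliers or skew hidden by median/MAD.
    if abs(mean - median) / median > 0.05:
        print("WARNING: mean and median are too different")

# Five stable runs plus one slow outlier:
summarize([10.0, 10.1, 9.9, 10.2, 10.0, 14.0])

On this sample, the outlier drags mean/stdev to about 10.70 ms +- 1.62 ms, while median/MAD stays at 10.05 ms +- 0.10 ms: exactly the case where reading the "1 ms" part of "10 ms +- 1 ms" is misleading, since the "+- 0.10 ms" hides the 14 ms run entirely.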