While I like the "automatic removal of outliers" feature of median and
MAD ("robust" statistics), I'm not comfortable with these numbers.
They are new to me and uncommon in other benchmark tools.

It's not easy to compare MAD to standard deviation. It seems that MAD
can even be misleading when reading the "1 ms" part of "10 ms +- 1
ms": the same "+-" notation hides two very different measures of
dispersion.

The perf module already has a function to emit warnings if a
benchmark is considered "unstable". A warning is emitted if
stdev/mean is greater than 0.10; I chose this threshold arbitrarily.
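
The check is roughly this (a sketch, not the actual perf code):

    def is_unstable(mean, stdev, threshold=0.10):
        # Warn when the relative standard deviation (stdev/mean)
        # exceeds the threshold; 0.10 is the arbitrary value above.
        return stdev > threshold * mean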

Maybe we need another check to emit a warning when mean and median,
or stdev and MAD, are too different?
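
Something like this, with thresholds as arbitrary as the 0.10 above
(just a proposal, not existing perf code):

    def divergence_warnings(mean, median, stdev, mad, threshold=0.10):
        # Warn when robust and non-robust statistics disagree,
        # which hints at outliers or a skewed distribution.
        warnings = []
        if abs(mean - median) > threshold * median:
            warnings.append("mean and median are too different")
        # For a normal distribution, stdev ~= 1.4826 * MAD, so
        # compare stdev to the scaled MAD, not to the raw MAD.
        if abs(stdev - 1.4826 * mad) > threshold * stdev:
            warnings.append("stdev and scaled MAD are too different")
        return warnings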

Maybe we need a new --median command line option to display
median/MAD instead of the mean/stdev displayed by default?
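
It could look like this (hypothetical sketch with fake data, not
perf's actual CLI):

    import argparse
    import statistics

    parser = argparse.ArgumentParser()
    parser.add_argument("--median", action="store_true",
                        help="display median/MAD instead of mean/stdev")
    args = parser.parse_args()

    timings = [9.8, 9.9, 10.0, 10.1, 10.2, 15.0]  # placeholder samples
    if args.median:
        center = statistics.median(timings)
        mad = statistics.median(abs(x - center) for x in timings)
        print("%.2f ms +- %.2f ms (median +- MAD)" % (center, mad))
    else:
        print("%.2f ms +- %.2f ms (mean +- stdev)"
              % (statistics.mean(timings), statistics.stdev(timings)))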

About reproducibility, I should experiment with mean vs median.
Currently, perf uses neither MAD nor stdev to compare two benchmark
results.

Victor