I started to work on visualisation. IMHO it helps to understand the problem.
Let's create a large dataset: 500 samples (100 processes x 5 samples):
---
$ python3 telco.py --json-file=telco.json -p 100 -n 5
---

The attached plot.py script creates a histogram:
---
avg: 26.7 ms +- 0.2 ms; min = 26.2 ms

26.1 ms:   1 #
26.2 ms:  12 #####
26.3 ms:  34 ############
26.4 ms:  44 ################
26.5 ms: 109 ######################################
26.6 ms: 117 ########################################
26.7 ms:  86 ##############################
26.8 ms:  50 ##################
26.9 ms:  32 ###########
27.0 ms:  10 ####
27.1 ms:   3 ##
27.2 ms:   1 #
27.3 ms:   1 #

minimum 26.1 ms: 0.2% (1) of 500 samples
---

Replace "if 1" with "if 0" to produce a graphical view, or just view the attached distribution.png, the numpy+scipy histogram.

The distribution looks like a Gaussian curve:
https://en.wikipedia.org/wiki/Gaussian_function

The interesting thing is that only 1 sample out of 500 is in the minimum bucket (26.1 ms). If you say that the performance is 26.1 ms, only 0.2% of your users will be able to reproduce this timing.

The average and std dev are 26.7 ms +- 0.2 ms, which covers the range 26.5 ms .. 26.9 ms: we got 109+117+86+50+32 samples in this range, which gives us 394/500 = 79%.

IMHO saying "26.7 ms +- 0.2 ms" (79% of samples) is less of a lie than 26.1 ms (0.2%).

Victor
# plot.py: text or graphical histogram of the samples in a perf JSON file
import sys

import perf

filename = sys.argv[1]
with open(filename) as fp:
    bench = perf.Benchmark.json_load(fp.read())
# Convert the samples from seconds to milliseconds
h = [sample * 1e3 for sample in bench.get_samples()]

if 1:
    # Text histogram with buckets of 0.1 ms
    import math
    import collections

    print("avg: %.1f ms +- %.1f ms; min = %.1f ms"
          % (perf.mean(h), perf.stdev(h), min(h)))
    print("")

    # Bucket key: the value in tenths of a millisecond, truncated
    c = collections.Counter([int(value * 10) for value in h])
    # Scale the bars so that the largest bucket is 40 characters wide
    k = 40.0 / max(c.values())
    for ms in range(min(c), max(c)+1):
        value = c.get(ms, 0)
        linelen = int(math.ceil(value * k))
        print("%.1f ms: % 3s %s" % (float(ms) / 10, value, '#' * linelen))
    print("")

    cmin = min(c)
    value = c.get(cmin)
    print("minimum %.1f ms: %.1f%% (%s) of %s samples"
          % (float(cmin) / 10, value * 100.0 / len(h), value, len(h)))
else:
    # Graphical histogram with a normal curve drawn on top
    import numpy as np
    import scipy.stats as stats
    import pylab as pl

    h.sort()
    # Normal PDF using the mean and std dev of the samples
    fit = stats.norm.pdf(h, np.mean(h), np.std(h))
    pl.plot(h, fit, '-o')
    # Normalized histogram of the data
    # (newer matplotlib releases spell this option density=True)
    pl.hist(h, normed=True)
    # May be needed to display the window
    pl.show()
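
The percentages quoted in the message can be rechecked from the bucket counts alone, without telco.json. The following is only a minimal sketch: the counts are copied by hand from the text histogram above, and the names `buckets` and `share` are made up for this illustration (they are not part of perf or plot.py).

```python
# Bucket counts copied from the text histogram in the message.
# Keys are tenths of a millisecond (261 means the 26.1 ms bucket).
buckets = {
    261: 1, 262: 12, 263: 34, 264: 44, 265: 109, 266: 117, 267: 86,
    268: 50, 269: 32, 270: 10, 271: 3, 272: 1, 273: 1,
}


def share(buckets, low, high):
    """Return (count, percent) of samples whose bucket key is in [low, high]."""
    total = sum(buckets.values())
    count = sum(n for key, n in buckets.items() if low <= key <= high)
    return count, 100.0 * count / total


total = sum(buckets.values())
# Buckets 26.5 ms .. 26.9 ms, the range quoted for 26.7 ms +- 0.2 ms
in_range, in_range_pct = share(buckets, 265, 269)
# The minimum bucket alone
at_min, at_min_pct = share(buckets, min(buckets), min(buckets))

print("total samples: %d" % total)
print("26.5 ms .. 26.9 ms: %d samples (%.1f%%)" % (in_range, in_range_pct))
print("minimum bucket %.1f ms: %d sample(s) (%.1f%%)"
      % (min(buckets) / 10.0, at_min, at_min_pct))
```

Because plot.py truncates each value to a tenth of a millisecond, the 26.5 ms .. 26.9 ms buckets cover samples from 26.5 ms up to, but not including, 27.0 ms.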