I started working on visualisation. IMHO it helps us understand the problem.

Let's create a large dataset: 500 samples (100 processes x 5 samples):
---
$ python3 telco.py --json-file=telco.json -p 100 -n 5
---

The attached plot.py script creates a histogram:
---
avg: 26.7 ms +- 0.2 ms; min = 26.2 ms

26.1 ms:   1 #
26.2 ms:  12 #####
26.3 ms:  34 ############
26.4 ms:  44 ################
26.5 ms: 109 ######################################
26.6 ms: 117 ########################################
26.7 ms:  86 ##############################
26.8 ms:  50 ##################
26.9 ms:  32 ###########
27.0 ms:  10 ####
27.1 ms:   3 ##
27.2 ms:   1 #
27.3 ms:   1 #

minimum 26.1 ms: 0.2% (1) of 500 samples
---

Replace "if 1" with "if 0" in plot.py to produce a graphical view, or just
look at the attached distribution.png, the numpy+scipy histogram.

The distribution looks like a Gaussian curve:
https://en.wikipedia.org/wiki/Gaussian_function
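The Gaussian function linked above can be written as the normal probability density and evaluated directly, to see which bar heights a perfect bell curve would give. A minimal sketch using only the standard library; `gaussian_pdf()` and `expected_count()` are hypothetical helpers (not part of perf), and it assumes the reported mean (26.7 ms) and standard deviation (0.2 ms) are the curve's parameters:

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def expected_count(bucket_start, n, mu, sigma, width=0.1):
    """Approximate number of samples expected in [bucket_start, bucket_start + width).

    Uses the density at the bucket centre (midpoint rule), so it is only an
    approximation of the exact probability mass of the bucket.
    """
    return n * width * gaussian_pdf(bucket_start + width / 2, mu, sigma)


# Assumption: mu and sigma come from the "avg: 26.7 ms +- 0.2 ms" line,
# n = 500 samples, 0.1 ms buckets starting at 26.1 ms like plot.py.
for tenths in range(261, 274):
    start = tenths / 10
    print("%.1f ms: %5.1f" % (start, expected_count(start, 500, 26.7, 0.2)))
```

Printing these expected counts next to the text histogram is one rough way to judge the fit.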

The interesting thing is that only 1 sample out of 500 is in the minimum
bucket (26.1 ms). If you say that the performance is 26.1 ms, only
0.2% of your users will be able to reproduce this timing.
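Put differently, quoting the minimum means that only the runs at least as fast as the minimum "reproduce" it, which by construction is a single run unless there are ties. A minimal sketch of that share (the empirical cumulative distribution at a given timing); `share_at_or_below()` is a hypothetical helper and the sample list is made up for illustration, not taken from telco.json:

```python
import bisect


def share_at_or_below(samples_ms, timing_ms):
    """Percentage of samples that are at least as fast as timing_ms."""
    ordered = sorted(samples_ms)
    # bisect_right counts how many sorted samples are <= timing_ms
    return 100.0 * bisect.bisect_right(ordered, timing_ms) / len(ordered)


# Made-up samples in milliseconds (hypothetical, not the real telco.json data)
samples = [26.18, 26.25, 26.27, 26.55, 26.61, 26.64, 26.72, 26.80]
print("%.1f%%" % share_at_or_below(samples, min(samples)))  # prints 12.5%
```

With 500 samples and a single fastest run, the same computation gives 1/500 = 0.2%.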

The average and std dev are 26.7 ms +- 0.2 ms, so one std dev around the
mean covers 26.5 ms .. 26.9 ms: 109+117+86+50+32 = 394 samples fall in
this range, which gives us 394/500 = 79%.
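The 79% can be recomputed from the bucket counts of the text histogram. A small sketch; the dictionary simply copies the printed counts, keyed by tenths of a millisecond to avoid floating-point bucket boundaries:

```python
# Bucket counts copied from the text histogram, keyed by tenths of a ms
# (261 means the 26.1 ms bucket; plot.py truncates with int(), so this
# bucket holds samples in [26.1 ms, 26.2 ms)).
counts = {
    261: 1, 262: 12, 263: 34, 264: 44, 265: 109, 266: 117, 267: 86,
    268: 50, 269: 32, 270: 10, 271: 3, 272: 1, 273: 1,
}

total = sum(counts.values())
# Buckets 26.5 ms .. 26.9 ms, i.e. mean +- one standard deviation
in_range = sum(counts.get(tenths, 0) for tenths in range(265, 270))
print("%s/%s = %.0f%%" % (in_range, total, 100.0 * in_range / total))  # prints 394/500 = 79%
```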

IMHO saying "26.7 ms +- 0.2 ms" (79% of samples) is less of a lie than
saying "26.1 ms" (0.2%).

Victor

Attachment: telco.json
Description: application/json

plot.py:

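import sys

import perf

filename = sys.argv[1]

# Load the benchmark and convert samples from seconds to milliseconds
with open(filename) as fp:
    bench = perf.Benchmark.json_load(fp.read())
h = [sample * 1e3 for sample in bench.get_samples()]

if 1:
    # Text histogram; change to "if 0" for the graphical version below
    import math
    import collections

    print("avg: %.1f ms +- %.1f ms; min = %.1f ms"
          % (perf.mean(h), perf.stdev(h), min(h)))
    print("")

    # Bucket samples by 0.1 ms: int() truncates, so bucket 261 holds
    # samples in [26.1 ms, 26.2 ms)
    c = collections.Counter([int(value * 10) for value in h])

    # Scale bars so that the largest bucket is 40 characters wide
    k = 40.0 / max(c.values())
    for ms in range(min(c), max(c) + 1):
        value = c.get(ms, 0)
        linelen = int(math.ceil(value * k))
        print("%.1f ms: %3d %s" % (float(ms) / 10, value, '#' * linelen))

    print("")
    cmin = min(c)
    value = c[cmin]
    print("minimum %.1f ms: %.1f%% (%s) of %s samples"
          % (float(cmin) / 10, value * 100.0 / len(h), value, len(h)))
else:
    # Graphical histogram with a fitted normal distribution
    import numpy as np
    import scipy.stats as stats
    import matplotlib.pyplot as plt

    h.sort()
    # Normal PDF fitted to the samples' mean and standard deviation
    fit = stats.norm.pdf(h, np.mean(h), np.std(h))
    plt.plot(h, fit, '-o')
    # density=True normalizes the histogram so it is comparable to the PDF
    # (the old "normed" parameter was removed in matplotlib 3.1)
    plt.hist(h, density=True)
    plt.show()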
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev