2016-07-04 16:17 GMT+02:00 Victor Stinner <victor.stin...@gmail.com>:
> I modified the CPython benchmark suite to use my perf module:
> https://hg.python.org/sandbox/benchmarks_perf
Hum, you need the development version of perf to test it:

git clone https://github.com/haypo/perf.git

> Changes:
>
> * replace explicit warmups with perf automatic warmup
> (...)
> * avoid nested loops, prefer a single level of loop: perf is
> responsible to call the sample function enough times to collect enough
> samples

Concrete example with performance/bm_go.py. Before:
-------------------------
def main(n, timer):
    times = []
    for i in range(5):
        versus_cpu()  # warmup
    for i in range(n):
        t1 = timer()
        versus_cpu()
        t2 = timer()
        times.append(t2 - t1)
    return times
-------------------------

After:
-------------------------
def main(loops):
    t0 = perf.perf_counter()
    for _ in xrange(loops):
        versus_cpu()
    return perf.perf_counter() - t0
-------------------------

Example of go benchmark output:
---
$ python3 benchmarks_perf/performance/bm_go.py -v
calibration: 1 loop: 599 ms
calibration: use 1 loop
Run 1/25: warmup (1): 601 ms; raw samples (3): 593 ms, 593 ms, 593 ms
Run 2/25: warmup (1): 609 ms; raw samples (3): 609 ms, 610 ms, 608 ms
Run 3/25: warmup (1): 599 ms; raw samples (3): 598 ms, 606 ms, 598 ms
(...)
Run 25/25: warmup (1): 606 ms; raw samples (3): 591 ms, 590 ms, 591 ms
Median +- std dev: 598 ms +- 8 ms
---

The warmup samples ("warmup (1): ... ms") are not used to compute
median or std dev.

Another example to show fancy features of perf:
---
$ python3 benchmarks_perf/performance/bm_telco.py -v --hist --stats --metadata -n5 -p50
calibration: 1 loop: 34.6 ms
calibration: 2 loops: 57.8 ms
calibration: 4 loops: 105 ms
calibration: use 4 loops
Run 1/50: warmup (1): 116 ms; raw samples (5): 106 ms, 106 ms, 105 ms, 106 ms, 106 ms
Run 2/50: warmup (1): 107 ms; raw samples (5): 107 ms, 107 ms, 106 ms, 106 ms, 106 ms
Run 3/50: warmup (1): 107 ms; raw samples (5): 106 ms, 106 ms, 106 ms, 106 ms, 106 ms
(...)
Run 50/50: warmup (1): 106 ms; raw samples (5): 104 ms, 105 ms, 105 ms, 106 ms, 105 ms

Metadata:
- aslr: enabled
- cpu_count: 4
- cpu_model_name: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
- date: 2016-07-04T17:00:33
- description: Test the performance of the Telco decimal benchmark
- duration: 35.6 sec
- hostname: smithers
- name: telco
- perf_version: 0.6
- platform: Linux-4.5.7-300.fc24.x86_64-x86_64-with-fedora-24-Twenty_Four
- python_executable: /usr/bin/python3
- python_implementation: cpython
- python_version: 3.5.1 (64bit)
- timer: clock_gettime(CLOCK_MONOTONIC), resolution: 1.00 ns

25.8 ms:  1 ##
25.9 ms:  2 #####
26.0 ms:  4 ##########
26.0 ms: 13 ###############################
26.1 ms: 27 #################################################################
26.2 ms: 28 ###################################################################
26.3 ms: 21 ##################################################
26.3 ms: 25 ############################################################
26.4 ms: 32 #############################################################################
26.5 ms: 33 ###############################################################################
26.6 ms: 18 ###########################################
26.6 ms: 13 ###############################
26.7 ms:  8 ###################
26.8 ms:  8 ###################
26.8 ms:  7 #################
26.9 ms:  4 ##########
27.0 ms:  4 ##########
27.1 ms:  1 ##
27.1 ms:  0 |
27.2 ms:  0 |
27.3 ms:  1 ##

Number of samples: 250 (50 runs x 5 samples; 1 warmup)
Standard deviation / median: 1%
Shortest raw sample: 103 ms (4 loops)

Minimum: 25.9 ms (-2.1%)
Median +- std dev: 26.4 ms +- 0.2 ms
Maximum: 27.3 ms (+3.4%)

Median +- std dev: 26.4 ms +- 0.2 ms
---

I used " -n5 -p50" to compute 5 samples per process and use 50 processes.
These options help to get a nicer histogram :-) (to have a more
uniform distribution)

For the histogram, I like using telco because it generates a regular
gaussian curve :-)

Victor
_______________________________________________
Speed mailing list
Speed@python.org
https://mail.python.org/mailman/listinfo/speed