The discussion on benchmarking is no longer related to compact dict, so I'm starting a new thread.
2016-09-15 13:27 GMT+02:00 Paul Moore <p.f.mo...@gmail.com>:
> Just as a side point, perf provided essentially identical results but
> took 2 minutes as opposed to 8 seconds for timeit to do so. I
> understand why perf is better, and I appreciate all the work Victor
> did to create it, and analyze the results, but for getting a quick
> impression of how a microbenchmark performs, I don't see timeit as
> being *quite* as bad as Victor is claiming.

Heh, I expected such a complaint. I already wrote a section in the doc
explaining "why perf is so slow":
http://perf.readthedocs.io/en/latest/perf.html#why-is-perf-so-slow

So you say that timeit just works and is faster? OK. Let's look at a
small session:

$ python3 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 46.7 msec per loop
$ python3 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 46.9 msec per loop
$ python3 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 46.9 msec per loop
$ python3 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 47 msec per loop
$ python2 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 36.3 msec per loop
$ python2 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 36.1 msec per loop
$ python2 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 36.5 msec per loop
$ python3 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 48.3 msec per loop
$ python3 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 48.4 msec per loop
$ python3 -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 48.8 msec per loop

I ran timeit 7 times on Python 3 and 3 times on Python 2. Please ignore
the Python 2 results; those runs are just quick commands meant to
interfere with the Python 3 tests.

Now the question is: what is the "correct" result for Python 3? Let's
take the minimum of the minimums: 46.7 ms.

Now imagine that you only had the first 4 runs. What is the "good"
result now? The minimum is still 46.7 ms. And what if you only had the
last 3 runs? The minimum becomes 48.3 ms. On such a microbenchmark, the
difference between 46.7 ms and 48.3 ms is large :-(

How do you know that you ran timeit enough times to be sure that the
result is the right one? For me, the timeit tool is broken because you
*must* run it many times to work around its limits.

In short, I wrote the perf module to answer these questions:

* perf uses multiple processes to test multiple memory layouts and
  multiple randomized hash functions
* perf ignores the first run, which is used to "warm up" the benchmark
  (see the --warmups command line option)
* perf provides many tools to analyze the distribution of results:
  minimum, maximum, standard deviation, histogram, number of samples,
  median, etc.
* perf displays the median +- standard deviation: the median is more
  reproducible, and the standard deviation gives an idea of the
  stability of the benchmark (see the sketch after this list)
* etc.
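To make the last point concrete, here is a small sketch (plain Python
using the stdlib statistics module; it is not part of perf) that
computes the median +- standard deviation of the seven Python 3 timings
from the session above:

import statistics

# The seven Python 3 timings (in ms) from the timeit session above.
timings = [46.7, 46.9, 46.9, 47.0, 48.3, 48.4, 48.8]

print("min: %.1f ms" % min(timings))                  # 46.7 ms
print("median: %.1f ms" % statistics.median(timings)) # 47.0 ms
print("std dev: %.2f ms" % statistics.stdev(timings)) # 0.89 ms

The minimum (46.7 ms) hides the slower 48.x runs completely, whereas
the median +- standard deviation (47.0 ms +- 0.89 ms) shows both the
typical value and how unstable the runs were.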
> I will tend to use perf now that I have it installed, and now that I
> know how to run a published timeit invocation using perf. It's a
> really cool tool. But I certainly won't object to seeing people
> publish timeit results (any more than I'd object to *any*
> microbenchmark).

I consider that timeit results are not reliable at all. There is no
standard deviation, and it's hard to guess how many times the user ran
timeit or how he/she computed the "good" result.

perf takes ~60 seconds by default. If you don't care about accuracy,
use --fast and it only takes 20 seconds ;-)
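For reference, the perf equivalent of the timeit commands above is the
following; the output line is only illustrative (perf reports the
median +- standard deviation, as described above), not a real
measurement:

$ python3 -m perf timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
Median +- std dev: 47.0 ms +- 0.9 ms

Adding --fast to that command line gives the quicker, less accurate run
mentioned above.

Victor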