Re: [Speed] Measure of Python performance for general-purpose code
On 23 April 2018 at 05:00, Matthew Woodcraft <matt...@woodcraft.me.uk> wrote:
> To get comprehensible results, I think I really need to summarise the
> speed of a particular build+hardware combination as a single number,
> representing Python's performance for "general purpose code".
>
> So does anyone have any recommendations on what the best figure to
> extract from pyperformance results would be?

There's no such number in the general case, since the way different aspects should be weighted differs significantly based on your use case (e.g. a long running server or GUI application may care very little about startup time, while it's critical for command line application responsiveness). That's why we have a benchmark suite, rather than just a single benchmark.

https://hackernoon.com/which-is-the-fastest-version-of-python-2ae7c61a6b2b is an example of going through and calling out specific benchmarks based on the kind of code they best represent.

So I don't think you're going to be able to get away from coming up with your own custom scheme that emphasises a particular usage profile. While the simplest approach is the one the linked article took (i.e. weight one benchmark at a time at 100%, ignore the others), searching for "combining multiple benchmark results into an aggregate score" returned https://pubsonline.informs.org/doi/pdf/10.1287/ited.2013.0124 as the first link for me, and based on skimming the abstract and introduction, I think it's likely to be quite relevant to your question.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia

___
Speed mailing list
Speed@python.org
https://mail.python.org/mailman/listinfo/speed
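[Editorial note] One common way to roll several benchmark results into a single use-case-specific score is a weighted geometric mean of per-benchmark speedup ratios, with the weights encoding the usage profile the thread is discussing. The sketch below is purely illustrative: the benchmark names, timings, and weights are made up, not taken from pyperformance.

```python
import math

# Hypothetical timings (seconds): baseline build vs candidate build.
# Benchmark names and numbers are illustrative, not real pyperformance data.
baseline = {"startup": 0.050, "templates": 1.20, "json": 0.80}
candidate = {"startup": 0.040, "templates": 1.00, "json": 0.85}

# Weights encode the usage profile (e.g. CLI tools care a lot about startup).
weights = {"startup": 0.5, "templates": 0.3, "json": 0.2}

def aggregate_speedup(baseline, candidate, weights):
    """Weighted geometric mean of per-benchmark speedups (>1 means faster)."""
    total = sum(weights.values())
    log_sum = sum(
        weights[name] * math.log(baseline[name] / candidate[name])
        for name in baseline
    )
    return math.exp(log_sum / total)

score = aggregate_speedup(baseline, candidate, weights)
print(f"aggregate speedup: {score:.3f}")
```

Changing the weights to match a different profile (say, a long-running server that ignores startup) yields a different, equally defensible single number, which is precisely the point made above.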
Re: [Speed] steps to get pypy benchmarks running
On 14 February 2018 at 07:52, Mark Shannon <m...@hotpy.org> wrote:
> Hi,
>
> On 13/02/18 14:27, Matti Picus wrote:
>>
>> I have begun to dive into the performance/perf code. My goal is to get
>> pypy benchmarks running on http://speed.python.org. Since PyPy has a JIT,
>> the benchmark runs must have a warmup stage.
>
> Why?
> The other interpreters don't get an arbitrary chunk of time for free, so
> neither should PyPy. Warmup is an inherent cost of dynamic optimisers. The
> benefits should outweigh the costs, but the costs shouldn't be ignored.

For speed.python.org purposes, that would likely be most usefully reported as separate "PyPy (cold)" and "PyPy (warm)" results (where the former runs under the same conditions as CPython, while the latter is given the benefit of warming up the JIT first).

Only reporting the former would miss the point of PyPy's main use case (i.e. long lived processes), while only reporting the latter would miss one of the main answers to "Why hasn't everyone already switched to PyPy for all their Python needs?" (i.e. when the app doesn't run long enough to pay back the increased start-up overhead).

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
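[Editorial note] The cold/warm distinction can be illustrated with a small timing harness. This is a simplified sketch of the idea, not perf's actual implementation: the "cold" figure is the very first run (which includes any one-time setup or JIT compilation cost), while the "warm" figure is taken only after discarding the initial runs.

```python
import statistics
import time

def bench(func, runs=20, warmups=5):
    """Return (cold, warm): first-run time vs median of post-warmup runs."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    cold = timings[0]                            # includes one-time warmup cost
    warm = statistics.median(timings[warmups:])  # steady-state estimate
    return cold, warm

cold, warm = bench(lambda: sum(i * i for i in range(10_000)))
print(f"cold: {cold:.6f}s  warm: {warm:.6f}s")
```

On CPython the two figures are usually close; under a JIT the gap between them is exactly the warmup cost the thread is arguing about how to report.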
Re: [Speed] Update Django from 1.11 to 2.0? What about Python 2.7 and PyPy2?
On 19 January 2018 at 20:39, Victor Stinner <victor.stin...@gmail.com> wrote:
> 2018-01-19 11:28 GMT+01:00 Stefan Behnel <stefan...@behnel.de>:
>> That suggests adding Django 2 as a new Py3-only benchmark.
>
> Again, the practical issue is to install Django 2 and Django 1.11 in
> the same virtual environment. I'm not sure that it's doable.
>
> I would prefer to not have to create a different virtualenv for
> Python3-only dependencies.
>
> I needed to release quickly a bugfix release, fix --track-memory,
> feature asked by Xiang Zhang, so I released performance 0.6.1 which
> only updated Django from 1.11.3 to 1.11.9.
>
> Or we need to redesign how performance install dependencies, but
> that's a larger project :-)

It may be worth looking at using pew to set up a separate virtual environment for each benchmark, but then use `pew add` to share common components (like perf itself) between them. That way you won't have conflicting dependencies between benchmarks (since they'll be in separate venvs), without having to duplicate *all* the common components.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
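[Editorial note] The sharing idea suggested here can be sketched directly with the stdlib: create a per-benchmark venv, then drop a `.pth` file into its site-packages listing a shared directory, which is roughly the mechanism `pew add` automates. Everything below is illustrative (temp paths, a fake `common_helper` module) and assumes a POSIX venv layout (`bin/python`, `lib/pythonX.Y/site-packages`).

```python
import subprocess
import tempfile
import venv
from pathlib import Path

# A directory standing in for shared components (e.g. perf itself).
base = Path(tempfile.mkdtemp())
shared = base / "shared"
shared.mkdir()
(shared / "common_helper.py").write_text("VALUE = 42\n")

# A per-benchmark venv (with_pip=False keeps this fast and offline).
env_dir = base / "bench_env"
venv.EnvBuilder(with_pip=False).create(env_dir)

# A .pth file in the venv's site-packages makes the shared directory
# importable from inside the venv without copying anything into it.
site_pkgs = next(env_dir.glob("lib/python*/site-packages"))
(site_pkgs / "shared.pth").write_text(str(shared) + "\n")

# The venv's interpreter can now import the shared module.
out = subprocess.run(
    [str(env_dir / "bin" / "python"),
     "-c", "import common_helper; print(common_helper.VALUE)"],
    capture_output=True, text=True, check=True,
)
print(out.stdout.strip())
```

Each benchmark venv keeps its own (possibly conflicting) Django version, while the `.pth` entry shares the common tooling.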
[Speed] Impact of Meltdown/Spectre OS patches on benchmark results?
Hi folks,

Reading https://medium.com/implodinggradients/meltdown-c24a9d5e254e prompts me to ask: are speed.python.org benchmark results produced now actually going to be comparable with those executed last year? Or will the old results need to be backfilled again with the new baseline OS performance?

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Speed] Python Performance Benchmark Suite revision request
On 14 September 2017 at 08:48, Victor Stinner <victor.stin...@gmail.com> wrote:
> There are likely tools to automate these steps.

wagon, for example: https://github.com/cloudify-cosmo/wagon#create-packages

(Although then you have to bootstrap wagon in the destination environment to handle the install process)

>> We used The Grand Unified Python Benchmark Suite
>> https://hg.python.org/benchmarks in the past, and found that one was very
>> easy to use, with far less dependency, and can be simply zipped and deployed
>> easily.
>
> Yeah, you are right. But it was more complex to add new dependencies
> and update dependencies. As a consequence, we tested softwares which
> were 5 years old... Not really revelant.

It would be useful to provide instructions in the README on how to:

1. Use "pip download --no-binary :all: -r requirements.txt -d archive_dir" to get the dependencies on an internet-connected machine
2. Use "pip install --no-index --find-links archive_dir -r requirements.txt" to install from the unpacked archive instead of the internet

Potentially, performance could gain a subcommand to do the initial download, and a "performance venv create" option to specify a local directory to use instead of the index server when installing dependencies.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Speed] Testing a wide Unicode build of Python 2 on speed.python.org?
On 13 April 2017 at 02:15, Victor Stinner <victor.stin...@gmail.com> wrote:
> 2017-04-12 10:52 GMT+02:00 Victor Stinner <victor.stin...@gmail.com>:
>> I'm running benchmarks with this option. Once results will be ready, I
>> will remove the old 2.7 result to replace it with the new one.
>
> Done. speed.python.org now uses UCS-4 on Python 2.7. Is it better now?

Thanks!

> Previous JSON file:
> https://github.com/haypo/performance_results/raw/master/2017-03-31-cpython/2017-04-03_16-11-2.7-23d6eb656ec2.json.gz
>
> New JSON file:
> https://github.com/haypo/performance_results/raw/master/2017-04-12-cpython/2017-04-10_17-27-2.7-e0cba5b45a5c.json.gz
>
> I see small performance differences, but they don't seem to be related
> to UTF-16 => UCS-4, but more random noise.

Given that lack of divergence and the known Unicode correctness problems in narrow builds, I guess it doesn't make much sense to invest time in benchmarking both of them.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
[Speed] Testing a wide Unicode build of Python 2 on speed.python.org?
speed.python.org has been updated to split out per-branch results for easier cross version comparisons, but looking at the performance repo suggests that the only 2.7 results currently reported are for the default UCS2 builds.

That isn't the way Linux distros typically ship Python: we/they specify the "--enable-unicode=ucs4" option when calling configure in order to get correct Unicode handling. Not that long ago, `pyenv` also switched to using wide builds for `manylinux1` wheel compatibility, and conda has similarly used wide builds from the start for ABI compatibility with system Python runtimes.

That means the current Python 2 benchmark results may be unrepresentative for anyone using a typical Linux build of CPython: the pay-off in reduced memory use and reduced data copying from Python 3's dynamic string representation is higher relative to Python 2 wide builds than it is relative to narrow builds, and we'd expect that to affect at least the benchmarks that manipulate text data.

Perhaps it would make sense to benchmark two different variants of the Python 2.7 branch, one with a wide build, and one with a narrow one?

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
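[Editorial note] Which flavour a given interpreter is can be checked from `sys.maxunicode`, which is what makes a narrow/wide benchmark split easy to label. A quick sketch (on Python 3.3+ the flexible string representation always reports the wide value, so the distinction only matters for Python 2 builds configured with `--enable-unicode=ucs4` vs the default):

```python
import sys

# Python 2: 0xFFFF on narrow (UTF-16) builds, 0x10FFFF on wide (UCS-4) builds.
# Python 3.3+: always 0x10FFFF thanks to the flexible string representation.
if sys.maxunicode == 0x10FFFF:
    print("wide build (or Python 3.3+)")
else:
    print("narrow build: astral code points are stored as surrogate pairs")
```

A benchmark runner could record this value alongside the version string so that narrow and wide 2.7 results never get mixed in the same series.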
Re: [Speed] Median +- MAD or Mean +- std dev?
On 14 March 2017 at 17:14, Serhiy Storchaka <storch...@gmail.com> wrote:
> On 13.03.17 22:38, Antoine Pitrou wrote:
>> Additionally, while mean and std dev are generally quite well
>> understood, the properties of the median absolute deviation are
>> generally little known.
>
> Std dev is well understood for the distribution close to normal. But when
> the distribution is too skewed or multimodal (as in your quick example)
> common assumptions (that 2/3 of samples are in the range of the std dev,
> 95% of samples are in the range of two std devs, 99% of samples are in the
> range of three std devs) are no longer valid.

That would suggest that the implicit assumption of a measure-of-centrality with a measure-of-symmetric-deviation may need to be challenged, as at least some meaningful performance problems are going to show up as non-normal distributions in the benchmark results.

Network services typically get around the "inherent variance" problem by looking at a few key percentiles like 50%, 90% and 95%. Perhaps that would be appropriate here as well?

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
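[Editorial note] The percentile approach can be sketched with the stdlib statistics module (`statistics.quantiles` is Python 3.8+; the sample data below is made up to show the kind of long-tailed distribution where mean +- stddev misleads):

```python
import statistics

# Illustrative skewed timing sample (seconds): mostly fast runs plus a
# long tail, the shape where the "2/3 within one stddev" intuition fails.
samples = [1.00, 1.01, 1.02, 1.00, 1.03, 1.01, 1.02, 1.04, 1.60, 2.50]

mean = statistics.mean(samples)
stdev = statistics.stdev(samples)

# quantiles(n=100) returns the 99 cut points between 100 equal groups,
# so indices 49, 89 and 94 are P50, P90 and P95.
q = statistics.quantiles(samples, n=100)
p50, p90, p95 = q[49], q[89], q[94]

print(f"mean={mean:.3f} stdev={stdev:.3f}")
print(f"p50={p50:.3f} p90={p90:.3f} p95={p95:.3f}")
```

Here the mean is dragged well above the typical run by the two slow outliers, while P50 stays at the typical value and P90/P95 expose the tail explicitly, which is the information the summary statistics hide.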
Re: [Speed] Ubuntu 16.04 speed issues
On 11 November 2016 at 03:01, Paul Graydon <p...@paulgraydon.co.uk> wrote:
> I've a niggling feeling there was discussion about some performance drops
> on 16.04 not all that long ago, but I'm completely failing to find it in
> my emails.

You may be thinking of the PGO-related issue that Victor found on *14*.04:
https://mail.python.org/pipermail/speed/2016-November/000471.html

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Speed] Latest enhancements of perf 0.8.1 and performance 0.3.1
On 3 November 2016 at 02:03, Armin Rigo <armin.r...@gmail.com> wrote:
> Hi Victor,
>
> On 2 November 2016 at 16:53, Victor Stinner <victor.stin...@gmail.com> wrote:
>> 2016-11-02 15:20 GMT+01:00 Armin Rigo <armin.r...@gmail.com>:
>>> Is that really the kind of examples you want to put forward?
>>
>> I am not a big fan of timeit, but we must use it sometimes to
>> micro-optimizations in CPython to check if an optimize really makes
>> CPython faster or not. I am only trying to enhance timeit.
>> Understanding results require to understand how the statements are
>> executed.
>
> Don't get me wrong, I understand the point of the following usage of timeit:
>
>     python2 -m perf timeit '[1,2]*1000' --duplicate=1000
>
> What I'm criticizing here is this instead:
>
>     python2 -m perf timeit '[1,2]*1000' --duplicate=1000 --compare-to=pypy
>
> because you're very unlikely to get any relevant information from such
> a comparison. I stand by my original remark: I would say it should be
> an error or at least a big fat warning to use --duplicate and PyPy in
> the same invocation. This is as opposed to silently ignoring
> --duplicate for PyPy, which is just adding more confusion imho.

Since the use case for --duplicate is to reduce the relative overhead of the outer loop when testing a micro-optimisation within a *given* interpreter, perhaps the error should be for combining --duplicate and --compare-to at all? And then it would just be up to developers of a *particular* implementation to know if "--duplicate" is relevant to them.

Regards,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
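[Editorial note] The loop-overhead point behind --duplicate can be demonstrated with the stdlib timeit module: for a very cheap statement, the timing loop itself is a measurable fraction of the result, and repeating the statement body inside each iteration amortises that overhead. A rough sketch (iteration counts chosen so both calls execute the statement the same total number of times):

```python
import timeit

stmt = "[1, 2] * 1000"

# 20_000 loop iterations, one statement each: full per-iteration overhead.
plain = timeit.timeit(stmt, number=20_000)

# 2_000 loop iterations, statement duplicated 10x per iteration:
# same total work, roughly 1/10th of the loop overhead.
duplicated = timeit.timeit(";".join([stmt] * 10), number=2_000)

print(f"plain: {plain:.4f}s  duplicated: {duplicated:.4f}s")
```

On CPython the two totals are close because the statement itself dominates; the technique matters most for statements cheaper than the loop bookkeeping. Under a tracing JIT like PyPy's, duplication changes what gets compiled, which is exactly why Armin objects to combining it with cross-interpreter comparison.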
Re: [Speed] New benchmark suite for Python
On 20 August 2016 at 02:50, Maciej Fijalkowski <fij...@gmail.com> wrote:
> Very likely just pyc import time

As one of the import system maintainers, that's a number I consider quite interesting and worth benchmarking :)

It's also one of the key numbers for Linux distro Python usage, since it impacts how responsive the system shell feels to developers and administrators - an end user can't readily tell the difference between "this shell is slow" and "this particular command I am running is using a language interpreter with a long startup time", but an interpreter benchmark suite can.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
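[Editorial note] Interpreter startup (including implicit imports and .pyc loading) can be measured crudely by timing `python -c 'pass'` in a subprocess. The dedicated startup benchmarks do this far more carefully (many processes, system-noise handling), but a minimal sketch of the idea looks like:

```python
import subprocess
import sys
import time

def startup_time(runs=10):
    """Median wall-clock time to start the interpreter and exit immediately."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        timings.append(time.perf_counter() - start)
    timings.sort()
    return timings[len(timings) // 2]

print(f"startup: {startup_time() * 1000:.1f} ms")
```

The same harness run with `-S` (skip the site module) or `-I` (isolated mode) shows how much of the total is implicit import work rather than bare interpreter bring-up.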
Re: [Speed] New benchmark suite for Python
On 19 August 2016 at 01:55, Victor Stinner <victor.stin...@gmail.com> wrote:
> 2016-08-18 8:48 GMT+02:00 Armin Rigo <ar...@tunes.org>:
>> Indeed, bzr cannot be installed on PyPy because it uses Cython in a
>> strange way: it declares and directly pokes inside PyListObjects from
>> a .pyx file. But note that bzr (seems to) have systematically a pure
>> Python version of all its .pyx files. (...)
>
> bazar is only used for a "startup" benchmark. I don't think that such
> benchmark is very interesting... I would prefer to see a benchmark on
> a less dummy operation on the repository than displaying the help...

Simple commands like displaying help messages are where interpreter startup time dominates the end user experience for applications written in Python, though.

For example, improvements to import system performance tend to mostly show up there - for longer running benchmarks, changes in startup time tend to get swamped by the actual runtime speed, while the baseline "python -c 'pass'" mainly varies based on how many modules we're implicitly importing at startup rather than how well the import system is performing.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
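[Editorial note] The "how many modules we're implicitly importing" figure is easy to inspect directly, which makes a useful companion to raw startup timings. A quick sketch (the exact count varies by Python version and build):

```python
import subprocess
import sys

# Ask a fresh interpreter how many modules it had imported before running
# any user code; adding -S to the command would additionally skip `site`.
code = "import sys; print(len(sys.modules))"
out = subprocess.run(
    [sys.executable, "-c", code],
    capture_output=True, text=True, check=True,
)
print(f"modules imported at startup: {out.stdout.strip()}")
```

Diffing the actual `sys.modules` keys between two interpreter versions shows exactly which implicit imports were added or removed, separating that effect from import system speed itself.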
Re: [Speed] New CPython benchmark suite based on perf
On 5 July 2016 at 20:08, Antoine Pitrou <solip...@pitrou.net> wrote:
> On Tue, 5 Jul 2016 11:35:30 +0200
> Victor Stinner <victor.stin...@gmail.com> wrote:
>> In practice, it almost never occurs to have all samples with the same
>> value. There is always a statistic distribution, usually as a gaussian
>> curse.
>
> If it's a gaussian curve (not a curse, probably :-)), then you can
> summarize it with two values: the mean and the stddev. But it's
> probably not a gaussian, because of system noise and other factors, so
> your assumption is wrong :-)

If you haven't already, I highly recommend reading the discussion in https://github.com/haypo/perf/issues/1 that led to Victor adopting the current median + stddev approach.

As Mahmoud noted there, in terms of really understanding the benchmark results, there's no substitute for actually looking at the histograms with the result distributions. The numeric results are never going to be able to do more than provide a "flavour" for those results, since the distributions aren't Gaussian, but trying to characterise and describe them properly would inevitably confuse folks that aren't already expert statisticians.

The median + stddev approach helps convey a "typical" result better than the minimum or mean do, while also providing an indication when the variation in results is too high for the median to really be meaningful.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
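[Editorial note] The "look at the histogram" advice is easy to follow even without a benchmark tool's built-in histogram view. A minimal ASCII sketch over made-up bimodal timings, the kind of distribution where a single centrality number hides what is actually going on:

```python
import statistics

# Made-up bimodal sample (seconds): two clusters of results, e.g. a
# benchmark that sometimes hits a slow path.
samples = [1.0, 1.1, 1.0, 1.1, 1.05, 2.0, 2.1, 2.0, 2.05, 2.1]

print(f"median={statistics.median(samples):.2f} "
      f"stdev={statistics.stdev(samples):.2f}")

# Crude fixed-width histogram: bin the samples and draw one '#' per hit.
lo, hi, nbins = min(samples), max(samples), 5
width = (hi - lo) / nbins
for i in range(nbins):
    left = lo + i * width
    count = sum(1 for s in samples
                if left <= s < left + width or (i == nbins - 1 and s == hi))
    print(f"{left:.2f}-{left + width:.2f} | {'#' * count}")
```

The median lands between the two clusters, a value no run ever produced, while the large stddev is the hint (as described above) that the summary number shouldn't be trusted without looking at the shape.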
Re: [Speed] Should we change what benchmarks we have?
On 12 February 2016 at 15:17, Nick Coghlan <ncogh...@gmail.com> wrote:
> It's probably best to consider telco as a microbenchmark of decimal
> module performance rather than as a general macrobenchmark, though -
> that's why the integration of cdecimal improved it so dramatically.

Ah, I had misread the rest of the thread - if telco in its current form isn't useful as a decimal microbenchmark, then yes, updating it to improve its stability is more important than preserving it as is.

Its original use case was to optimise the decimal implementation itself by figuring out where the hotspots were and optimising those, rather than as a general benchmark for other changes to the interpreter implementation.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Speed] Should we change what benchmarks we have?
On 12 February 2016 at 08:50, Yury Selivanov <yselivanov...@gmail.com> wrote:
> On 2016-02-11 5:37 PM, Victor Stinner wrote:
>> 2016-02-11 19:36 GMT+01:00 Brett Cannon <br...@python.org>:
>>> Are we happy with the current benchmarks?
>>
>> bm_regex8 looks unstable, but I don't know if it's an issue of the
>> benchmark itself or perf.py (see the other thread "[Speed] Any changes
>> we want to make to perf.py?").
>
> It's super unstable. As well as telco -- I don't trust those benchmarks.

telco covers a fairly important use case in the form of "Do things that billing applications need to do". Spending a few months running and re-running that to help optimise the original Python implementation of decimal was one of my first contributions to CPython (including figuring out the "int("".join(map(str, digits)))" hack that proved to be the fastest way in CPython to convert a tuple of digits into a Python integer, much to the annoyance of the PyPy folks trying to accelerate that code later).

It's probably best to consider telco as a microbenchmark of decimal module performance rather than as a general macrobenchmark, though - that's why the integration of cdecimal improved it so dramatically.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
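[Editorial note] The conversion hack mentioned above, side by side with the obvious pure Python loop it beat. The digit tuple here is illustrative; the point of the string round-trip is that `int()` does the accumulation in C instead of one Python-level multiply-add per digit:

```python
# A tuple of decimal digits, as the pure Python decimal module stores them.
digits = (3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3)

def digits_to_int_str(digits):
    """The "string" hack: build a digit string and let int() do the work in C."""
    return int("".join(map(str, digits)))

def digits_to_int_loop(digits):
    """The obvious pure Python accumulator, for comparison."""
    n = 0
    for d in digits:
        n = n * 10 + d
    return n

assert digits_to_int_str(digits) == digits_to_int_loop(digits) == 3141592653589793
```

As noted above, the trick is CPython-specific: on PyPy the JIT can compile the plain loop, so the string detour loses its advantage.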
Re: [Speed] merging PyPy and Python benchmark suite
On Wed, Jul 25, 2012 at 8:54 PM, Maciej Fijalkowski <fij...@gmail.com> wrote:
> Done. PyPy benchmarks are MIT

Thanks for clearing that up.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia