Re: [Speed] Measure of Python performance for general-purpose code

2018-04-24 Thread Nick Coghlan
On 23 April 2018 at 05:00, Matthew Woodcraft <matt...@woodcraft.me.uk> wrote:
> To get comprehensible results, I think I really need to summarise the
> speed of a particular build+hardware combination as a single number,
> representing Python's performance for "general purpose code".
>
> So does anyone have any recommendations on what the best figure to
> extract from pyperformance results would be?

There's no such number in the general case, since the way different
aspects should be weighted differs significantly based on your use
case (e.g. a long-running server or GUI application may care very
little about startup time, while it's critical for command-line
application responsiveness). That's why we have a benchmark suite,
rather than just a single benchmark.

https://hackernoon.com/which-is-the-fastest-version-of-python-2ae7c61a6b2b
is an example of going through and calling out specific benchmarks
based on the kind of code they best represent.

So I don't think you're going to be able to get away from coming up
with your own custom scheme that emphasises a particular usage
profile. While the simplest approach is the one the linked article
took (i.e. weight one benchmark at a time at 100%, ignore the others),
searching for "combining multiple benchmark results into an aggregate
score" returned
https://pubsonline.informs.org/doi/pdf/10.1287/ited.2013.0124 as the
first link for me, and based on skimming the abstract and
introduction, I think it's likely to be quite relevant to your
question.
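
If it helps make that concrete, here's a rough sketch of one common
approach (a weighted geometric mean of per-benchmark timings). The
benchmark names and weights below are invented purely for illustration,
and you'd normally normalise against a reference build first so you're
averaging ratios rather than absolute times:

    import math

    # Invented per-benchmark timings in seconds (lower is better)
    results = {
        "python_startup": 0.015,
        "django_template": 0.180,
        "pickle": 0.045,
    }

    # Invented weights expressing one particular usage profile
    weights = {
        "python_startup": 0.5,
        "django_template": 0.3,
        "pickle": 0.2,
    }

    def weighted_geometric_mean(timings, weights):
        # exp of the weighted average of log-times: less sensitive to a
        # single outlier benchmark than an arithmetic mean would be
        total = sum(weights.values())
        log_sum = sum(weights[name] * math.log(t) for name, t in timings.items())
        return math.exp(log_sum / total)

    print(weighted_geometric_mean(results, weights))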

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Speed] steps to get pypy benchmarks running

2018-02-14 Thread Nick Coghlan
On 14 February 2018 at 07:52, Mark Shannon <m...@hotpy.org> wrote:
> Hi,
>
> On 13/02/18 14:27, Matti Picus wrote:
>>
>> I have begun to dive into the performance/perf code. My goal is to get
>> pypy benchmarks running on http://speed.python.org. Since PyPy has a JIT,
>> the benchmark runs must have a warmup stage.
>
>
> Why?
> The other interpreters don't get an arbitrary chunk of time for free, so
> neither should PyPy. Warmup is an inherent cost of dynamic optimisers. The
> benefits should outweigh the costs, but the costs shouldn't be ignored.

For speed.python.org purposes, that would likely be most usefully
reported as separate "PyPy (cold)" and "PyPy (warm)" results (where
the former runs under the same conditions as CPython, while the latter
is given the benefit of warming up the JIT first).

Only reporting the former would miss the point of PyPy's main use case
(i.e. long lived processes), while only reporting the latter would
miss one of the main answers to "Why hasn't everyone already switched
to PyPy for all their Python needs?" (i.e. when the app doesn't run
long enough to pay back the increased start-up overhead).
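
To make the cold/warm distinction concrete, here's a deliberately crude
sketch (the workload and repeat counts are invented; the real benchmark
runner does this far more carefully): the first call pays any JIT
compilation cost, while the steady-state loop measures the warmed-up speed.

    import time

    def workload():
        # Stand-in benchmark body: any pure-Python hot loop will do
        return sum(i * i for i in range(100000))

    # "Cold": the first call includes interpreter start-up effects and,
    # on PyPy, the time spent tracing/compiling the hot loop
    start = time.perf_counter()
    workload()
    cold = time.perf_counter() - start

    # Give the JIT a chance to finish warming up...
    for _ in range(50):
        workload()

    # ..."warm": measure steady-state performance only
    start = time.perf_counter()
    for _ in range(50):
        workload()
    warm = (time.perf_counter() - start) / 50

    print("cold: %.2f ms, warm: %.2f ms" % (cold * 1e3, warm * 1e3))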

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Speed] Update Django from 1.11 to 2.0? What about Python 2.7 and PyPy2?

2018-01-27 Thread Nick Coghlan
On 19 January 2018 at 20:39, Victor Stinner <victor.stin...@gmail.com>
wrote:

> 2018-01-19 11:28 GMT+01:00 Stefan Behnel <stefan...@behnel.de>:
> > That suggests adding Django 2 as a new Py3-only benchmark.
>
> Again, the practical issue is to install Django 2 and Django 1.11 in
> the same virtual environment. I'm not sure that it's doable.
>
> I would prefer to not have to create a different virtualenv for
> Python3-only dependencies.
>
> I needed to quickly push out a bugfix release to fix --track-memory, a
> feature requested by Xiang Zhang, so I released performance 0.6.1, which
> only updated Django from 1.11.3 to 1.11.9.
>
> Or we need to redesign how performance installs its dependencies, but
> that's a larger project :-)
>

It may be worth looking at using pew to set up a separate virtual
environment for each benchmark, but then use `pew add` to share common
components (like perf itself) between them. That way you won't have
conflicting dependencies between benchmarks (since they'll be in separate
venvs), without having to duplicate *all* the common components.
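
(The mechanism pew relies on for that is just a .pth file in the venv's
site-packages, so if pew itself turns out to be an awkward extra
dependency, the same effect can be had directly. A rough sketch, with
invented paths and a Unix-only layout assumption:)

    import subprocess
    import sys
    from pathlib import Path

    def make_benchmark_venv(venv_dir, shared_dir):
        # Create the isolated venv for this particular benchmark
        subprocess.run([sys.executable, "-m", "venv", str(venv_dir)], check=True)

        # Ask the venv's own interpreter where its site-packages lives
        venv_python = Path(venv_dir) / "bin" / "python"
        site_packages = subprocess.run(
            [str(venv_python), "-c",
             "import sysconfig; print(sysconfig.get_paths()['purelib'])"],
            check=True, capture_output=True, text=True,
        ).stdout.strip()

        # A .pth file appends the shared directory (e.g. where perf lives) to
        # that venv's sys.path, so common components aren't duplicated
        Path(site_packages, "shared_components.pth").write_text(str(shared_dir) + "\n")

    make_benchmark_venv("venvs/bm_django", "/opt/speed/shared-site-packages")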

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


[Speed] Impact of Meltdown/Spectre OS patches on benchmark results?

2018-01-10 Thread Nick Coghlan
Hi folks,

Reading https://medium.com/implodinggradients/meltdown-c24a9d5e254e
prompts me to ask: are speed.python.org benchmark results produced now
actually going to be comparable with those executed last year?

Or will the old results need to be backfilled again with the new
baseline OS performance?

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Speed] Python Performance Benchmark Suite revision request

2017-09-13 Thread Nick Coghlan
On 14 September 2017 at 08:48, Victor Stinner <victor.stin...@gmail.com> wrote:
> There are likely tools to automate these steps.

wagon, for example: https://github.com/cloudify-cosmo/wagon#create-packages

(Although then you have to bootstrap wagon in the destination
environment to handle the install process)

>> We used The Grand Unified Python Benchmark Suite
>> https://hg.python.org/benchmarks in the past, and found that one was very
>> easy to use, with far fewer dependencies, and can simply be zipped and deployed
>> easily.
>
> Yeah, you are right. But it was more complex to add new dependencies
> and update existing ones. As a consequence, we were testing software that
> was 5 years old... Not really relevant.

It would be useful to provide instructions in the README on how to:

1. Use "pip download --no-binary :all: -r requirements.txt
archive_dir" to get the dependencies on an internet connected machine
2. Use "pip install --no-index --find-links archive_dir -r
requirements.txt" to install from the unpacked archive instead of the
internet
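
For concreteness, a minimal sketch of that round trip driving pip via
subprocess (the directory and file names are just placeholders):

    import subprocess
    import sys

    ARCHIVE_DIR = "pip_archive"  # placeholder: any local directory will do

    def download_deps(requirements="requirements.txt"):
        # Step 1, on the internet-connected machine: fetch sdists for every dependency
        subprocess.run([sys.executable, "-m", "pip", "download",
                        "--no-binary", ":all:", "-r", requirements,
                        "-d", ARCHIVE_DIR], check=True)

    def install_offline(requirements="requirements.txt"):
        # Step 2, on the offline machine: install only from the copied directory
        subprocess.run([sys.executable, "-m", "pip", "install",
                        "--no-index", "--find-links", ARCHIVE_DIR,
                        "-r", requirements], check=True)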

Potentially, performance could gain a subcommand to do the initial
download, and a "performance venv create" option to specify a local
directory to use instead of the index server when installing
dependencies.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Speed] Testing a wide Unicode build of Python 2 on speed.python.org?

2017-04-16 Thread Nick Coghlan
On 13 April 2017 at 02:15, Victor Stinner <victor.stin...@gmail.com> wrote:
> 2017-04-12 10:52 GMT+02:00 Victor Stinner <victor.stin...@gmail.com>:
>> I'm running benchmarks with this option. Once the results are ready, I
>> will remove the old 2.7 results and replace them with the new ones.
>
> Done. speed.python.org now uses UCS-4 on Python 2.7. Is it better now?

Thanks!

> Previous JSON file:
> https://github.com/haypo/performance_results/raw/master/2017-03-31-cpython/2017-04-03_16-11-2.7-23d6eb656ec2.json.gz
>
> New JSON file:
> https://github.com/haypo/performance_results/raw/master/2017-04-12-cpython/2017-04-10_17-27-2.7-e0cba5b45a5c.json.gz
>
> I see small performance differences, but they seem to be random noise
> rather than anything related to the UTF-16 => UCS-4 change.

Given that lack of divergence and the known Unicode correctness
problems in narrow builds, I guess it doesn't make much sense to
invest time in benchmarking both of them.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


[Speed] Testing a wide Unicode build of Python 2 on speed.python.org?

2017-04-11 Thread Nick Coghlan
speed.python.org has been updated to split out per-branch results for
easier cross-version comparisons, but looking at the performance repo
suggests that the only 2.7 results currently reported are for the
default UCS2 builds.

That isn't the way Linux distros typically ship Python: we/they
specify the "--enable-unicode=ucs4" option when calling configure in
order to get correct Unicode handling.

Not that long ago, `pyenv` also switched to using wide builds for
`manylinux1` wheel compatibility, and conda has similarly used wide
builds from the start for ABI compatibility with system Python
runtimes.

That means the current Python 2 benchmark results may be
unrepresentative for anyone using a typical Linux build of CPython:
the pay-off in reduced memory use and reduced data copying from Python
3's dynamic string representation is higher relative to Python 2 wide
builds than it is relative to narrow builds, and we'd expect that to
affect at least the benchmarks that manipulate text data.

Perhaps it would make sense to benchmark two different variants of the
Python 2.7 branch, one with a wide build, and one with a narrow one?
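
(If we do run both, the benchmark metadata should record which variant it
is; on Python 2 the distinction is visible from sys.maxunicode. A tiny
sketch, purely illustrative:)

    import sys

    def unicode_build():
        # Python 3.3+ uses the PEP 393 flexible representation, so the
        # narrow/wide split only applies to older interpreters
        if sys.version_info[:2] >= (3, 3):
            return "flexible (PEP 393)"
        if sys.maxunicode > 0xFFFF:
            return "wide (UCS-4)"
        return "narrow (UCS-2/UTF-16)"

    print(unicode_build())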

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Speed] Median +- MAD or Mean +- std dev?

2017-03-14 Thread Nick Coghlan
On 14 March 2017 at 17:14, Serhiy Storchaka <storch...@gmail.com> wrote:

> On 13.03.17 22:38, Antoine Pitrou wrote:
>
>> Additionally, while mean and std dev are generally quite well
>> understood, the properties of the median absolute deviation are
>> generally little known.
>>
>
> Std dev is well understood for distributions close to normal. But when
> the distribution is too skewed or multimodal (as in your quick example),
> the common assumptions (that 2/3 of samples lie within one std dev of the
> mean, 95% within two std devs, and 99% within three std devs) are no
> longer valid.


That would suggest that the implicit assumption of a measure-of-centrality
with a measure-of-symmetric-deviation may need to be challenged, as at
least some meaningful performance problems are going to show up as
non-normal distributions in the benchmark results.

Network services typically get around the "inherent variance" problem by
looking at a few key percentiles like 50%, 90% and 95%. Perhaps that would
be appropriate here as well?
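
(For example, given a list of per-run timings, the interesting percentiles
are cheap to pull out; the sample data here is invented, and
statistics.quantiles needs Python 3.8+:)

    import statistics

    # Invented per-run timings in seconds, with a skewed tail
    timings = [0.101, 0.102, 0.100, 0.103, 0.101, 0.150,
               0.102, 0.101, 0.240, 0.104]

    # n=100 gives 99 cut points; cut point p-1 is the p'th percentile
    cut_points = statistics.quantiles(timings, n=100)
    for p in (50, 90, 95):
        print("p%d: %.3fs" % (p, cut_points[p - 1]))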

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Speed] Ubuntu 16.04 speed issues

2016-11-14 Thread Nick Coghlan
On 11 November 2016 at 03:01, Paul Graydon <p...@paulgraydon.co.uk> wrote:
> I've a niggling feeling there was discussion about some performance drops
> on 16.04 not all that long ago, but I'm completely failing to find it in
> my emails.

You may be thinking of the PGO-related issue that Victor found on
*14*.04: https://mail.python.org/pipermail/speed/2016-November/000471.html

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Speed] Latest enhancements of perf 0.8.1 and performance 0.3.1

2016-11-05 Thread Nick Coghlan
On 3 November 2016 at 02:03, Armin Rigo <armin.r...@gmail.com> wrote:
> Hi Victor,
>
> On 2 November 2016 at 16:53, Victor Stinner <victor.stin...@gmail.com> wrote:
>> 2016-11-02 15:20 GMT+01:00 Armin Rigo <armin.r...@gmail.com>:
>>> Is that really the kind of examples you want to put forward?
>>
>> I am not a big fan of timeit, but we must sometimes use it for
>> micro-optimizations in CPython, to check whether an optimization really
>> makes CPython faster or not. I am only trying to enhance timeit.
>> Understanding the results requires understanding how the statements are
>> executed.
>
> Don't get me wrong, I understand the point of the following usage of timeit:
>
> python2 -m perf timeit '[1,2]*1000' --duplicate=1000
>
> What I'm criticizing here is this instead:
>
> python2 -m perf timeit '[1,2]*1000' --duplicate=1000 --compare-to=pypy
>
> because you're very unlikely to get any relevant information from such
> a comparison.  I stand by my original remark: I would say it should be
> an error or at least a big fat warning to use --duplicate and PyPy in
> the same invocation.  This is as opposed to silently ignoring
> --duplicate for PyPy, which is just adding more confusion imho.

Since the use case for --duplicate is to reduce the relative overhead
of the outer loop when testing a micro-optimisation within a *given*
interpreter, perhaps the error should be for combining --duplicate and
--compare-to at all? And then it would just be up to developers of a
*particular* implementation to know if "--duplicate" is relevant to
them.

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Speed] New benchmark suite for Python

2016-08-20 Thread Nick Coghlan
On 20 August 2016 at 02:50, Maciej Fijalkowski <fij...@gmail.com> wrote:
> Very likely just pyc import time

As one of the import system maintainers, that's a number I consider
quite interesting and worth benchmarking :)

It's also one of the key numbers for Linux distro Python usage, since
it impacts how responsive the system shell feels to developers and
administrators - an end user can't readily tell the difference between
"this shell is slow" and "this particular command I am running is
using a language interpreter with a long startup time", but an
interpreter benchmark suite can.
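
(For anyone wanting to dig into where that time goes, here's a quick
sketch of capturing a per-module import time breakdown for an arbitrary
target module; it assumes a CPython that supports the -X importtime
option:)

    import subprocess
    import sys

    # -X importtime reports, per imported module, its share of the start-up
    # cost; the report is written to stderr, so capture that stream
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", "import argparse"],
        capture_output=True, text=True,
    )
    print(result.stderr)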

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Speed] New benchmark suite for Python

2016-08-19 Thread Nick Coghlan
On 19 August 2016 at 01:55, Victor Stinner <victor.stin...@gmail.com> wrote:
> 2016-08-18 8:48 GMT+02:00 Armin Rigo <ar...@tunes.org>:
>> Indeed, bzr cannot be installed on PyPy because it uses Cython in a
>> strange way: it declares and directly pokes inside PyListObjects from
>> a .pyx file.  But note that bzr seems to systematically provide a pure
>> Python version of all its .pyx files. (...)
>
> Bazaar is only used for a "startup" benchmark. I don't think that such a
> benchmark is very interesting... I would prefer to see a benchmark of a
> less trivial repository operation than displaying the help...

Simple commands like displaying help messages are where interpreter
startup time dominates the end user experience for applications
written in Python, though. For example, improvements to import system
performance tend to mostly show up there - for longer running
benchmarks, changes in startup time tend to get swamped by the actual
runtime speed, while the baseline "python -c 'pass'" mainly varies
based on how many modules we're implicitly importing at startup rather
than how well the import system is performing.
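
A crude way to see that baseline directly (a sketch only; the repeat count
and the extra imports are arbitrary choices) is to time full interpreter
launches as subprocesses and compare the bare startup against a command
that pulls in a bigger chunk of the stdlib:

    import subprocess
    import sys
    import time

    def startup_time(args, repeats=20):
        # Average wall-clock time to launch the interpreter, run args, and exit
        start = time.perf_counter()
        for _ in range(repeats):
            subprocess.run([sys.executable] + args, check=True)
        return (time.perf_counter() - start) / repeats

    baseline = startup_time(["-c", "pass"])
    with_imports = startup_time(["-c", "import json, argparse, urllib.request"])
    print("bare: %.1f ms, with extra imports: %.1f ms"
          % (baseline * 1e3, with_imports * 1e3))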

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Speed] New CPython benchmark suite based on perf

2016-07-05 Thread Nick Coghlan
On 5 July 2016 at 20:08, Antoine Pitrou <solip...@pitrou.net> wrote:
> On Tue, 5 Jul 2016 11:35:30 +0200
> Victor Stinner <victor.stin...@gmail.com>
> wrote:
>> In practice, it almost never occurs to have all samples with the same
>> value. There is always a statistic distribution, usually as a gaussian
>> curse.
>
> If it's a gaussian curve (not a curse, probably :-)), then you can
> summarize it with two values: the mean and the stddev.  But it's
> probably not a gaussian, because of system noise and other factors, so
> your assumption is wrong :-)

If you haven't already, I highly recommend reading the discussion in
https://github.com/haypo/perf/issues/1 that led to Victor adopting the
current median + stddev approach.

As Mahmoud noted there, in terms of really understanding the benchmark
results, there's no substitute for actually looking at the histograms
with the result distributions. The numeric results are never going to
be able to do more than provide a "flavour" for those results, since
the distributions aren't Guassian, but trying to characterise and
describe them properly would inevitably confuse folks that aren't
already expert statisticians.

The median + stddev approach helps convey a "typical" result better
than the minimum or mean do, while also providing an indication when
the variation in results is too high for the median to really be
meaningful.
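
(For concreteness, a toy sketch of both views over an invented, slightly
bimodal set of timings - the summary numbers plus a crude text histogram:)

    import statistics
    from collections import Counter

    # Invented timings in seconds, with a second mode around 0.15s
    timings = [0.100, 0.101, 0.102, 0.099, 0.101,
               0.103, 0.148, 0.151, 0.100, 0.102]

    print("median: %.3fs +- %.3fs (std dev)"
          % (statistics.median(timings), statistics.stdev(timings)))

    # Crude text histogram: bucket to 10 ms and draw one '#' per sample
    buckets = Counter(round(t, 2) for t in timings)
    for value in sorted(buckets):
        print("%.2fs | %s" % (value, "#" * buckets[value]))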

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Speed] Should we change what benchmarks we have?

2016-02-11 Thread Nick Coghlan
On 12 February 2016 at 15:17, Nick Coghlan <ncogh...@gmail.com> wrote:
> It's probably best to consider telco as a microbenchmark of decimal
> module performance rather than as a general macrobenchmark, though -
> that's why the integration of cdecimal improved it so dramatically.

Ah, I had misread the rest of the thread - if telco in its current
form isn't useful as a decimal microbenchmark, then yes, updating it
to improve its stability is more important than preserving it as is.
Its original use case was to optimise the decimal implementation
itself by figuring out where the hotspots were and optimising those,
rather than as a general benchmark for other changes to the
interpreter implementation.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Speed] Should we change what benchmarks we have?

2016-02-11 Thread Nick Coghlan
On 12 February 2016 at 08:50, Yury Selivanov <yselivanov...@gmail.com> wrote:
> On 2016-02-11 5:37 PM, Victor Stinner wrote:
>> 2016-02-11 19:36 GMT+01:00 Brett Cannon <br...@python.org>:
>>> Are we happy with the current benchmarks?
>>
>> bm_regex8 looks unstable, but I don't know if it's an issue of the
>> benchmark itself or perf.py (see the other thread "[Speed] Any changes
>> we want to make to perf.py?").
>
> It's super unstable.  As well as telco -- I don't trust those benchmarks.

telco covers a fairly important use case in the form of "Do things
that billing applications need to do". Spending a few months running
and re-running that to help optimise the original Python
implementation of decimal was one of my first contributions to CPython
(including figuring out the "int("".join(map(str, digits)))" hack that
proved to be the fastest way in CPython to convert a tuple of digits
into a Python integer, much to the annoyance of the PyPy folks trying
to accelerate that code later).
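
(For anyone curious, the hack in question alongside the obvious arithmetic
alternative; the digit tuple is invented and the relative timings will of
course vary between interpreters:)

    from timeit import timeit

    digits = (3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3)

    def via_str(digits=digits):
        # String round trip: fast in CPython because int() parsing happens in C
        return int("".join(map(str, digits)))

    def via_arithmetic(digits=digits):
        # The "obvious" version: one multiply-and-add per digit
        n = 0
        for d in digits:
            n = n * 10 + d
        return n

    assert via_str() == via_arithmetic()
    print("str round trip:", timeit(via_str, number=100000))
    print("arithmetic:    ", timeit(via_arithmetic, number=100000))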

It's probably best to consider telco as a microbenchmark of decimal
module performance rather than as a general macrobenchmark, though -
that's why the integration of cdecimal improved it so dramatically.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Speed] merging PyPy and Python benchmark suite

2012-07-25 Thread Nick Coghlan
On Wed, Jul 25, 2012 at 8:54 PM, Maciej Fijalkowski <fij...@gmail.com> wrote:
> Done. PyPy benchmarks are MIT

Thanks for clearing that up.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia