Author: Maciej Fijalkowski <fij...@gmail.com>
Branch: extradoc
Changeset: r5449:e015701f8bee
Date: 2014-10-29 15:07 +0100
http://bitbucket.org/pypy/extradoc/changeset/e015701f8bee/
Log: draft

diff --git a/blog/draft/io-improvements.rst b/blog/draft/io-improvements.rst
new file mode 100644
--- /dev/null
+++ b/blog/draft/io-improvements.rst
@@ -0,0 +1,45 @@
+
+Hello everyone!
+
+We're about to wrap up the Warsaw sprint, so I would like to describe some
+branches we merged before or during the sprint. This blog post covers two of
+them: one with IO improvements and one with GC improvements.
+
+The first branch, about not zeroing the nursery, was started by Wenzhu Man
+during the Google Summer of Code and finished by Maciej Fijalkowski and
+Armin Rigo. The PyPy GC allocates new objects in the young object area
+(the nursery) using bump-pointer allocation. To keep things simple, we used
+to zero the whole nursery beforehand, since GC references must never point
+to random memory. This hurts the cache, because a large chunk of memory is
+zeroed at once, and it does unnecessary work for objects that don't need
+zeroing, such as large strings. We partly mitigated the cache problem with
+incremental nursery zeroing, but this branch removes the upfront zeroing
+completely, which improves string handling and recursive code (jitframes
+don't require zeroed memory either). I measured the effect on three
+examples: one `doing IO`_ in a loop, one running the famous `fibonacci`_
+recursively (which, for once, I would argue is a good fit), and one running
+`gcbench`_. The results for fibonacci and gcbench are below (normalized to
+CPython 2.7). Each benchmark was run 50 times:
+
+XXXX
+
+The second branch was done by Gregor Wegberg for his master's thesis and
+finished by Maciej Fijalkowski and Armin Rigo. Since objects can move in
+memory in PyPy, we cannot pass a pointer into an object directly to a
+system call; PyPy 2.4 works around this by copying the buffer before
+calling read or write, which is obviously inefficient. The branch "pins"
+the objects for a short period of time, making sure they can't move. This
+adds a little complexity to the garbage collector, whose bump-pointer
+allocator now needs to "jump over" pinned objects, but it improves IO quite
+drastically. In this benchmark we either write a number of bytes from a
+freshly allocated string into /dev/null or read a number of bytes from
+/dev/full. I'm showing the results for PyPy 2.4, PyPy with non-zero-nursery,
+and PyPy with non-zero-nursery and object pinning. These are wall times for
+the ``os.read/os.write`` and ``file.read/file.write`` cases, normalized
+against CPython 2.7.
+
+Benchmarks were done using PyPy 2.4 and revisions ``85646d1d07fb`` for
+non-zero-nursery and ``3d8fe96dc4d9`` for non-zero-nursery and pinning.
+The benchmarks were run once, since the standard deviation was small.
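+
+The core of the benchmark loops looks roughly like this (a simplified
+sketch rather than the exact harness; the function names, sizes and
+iteration counts here are only illustrative)::
+
+    import os
+
+    def bench_os_write(num_bytes, iterations):
+        # write a freshly allocated string into /dev/null on every iteration
+        fd = os.open("/dev/null", os.O_WRONLY)
+        for i in xrange(iterations):
+            s = "x" * num_bytes          # freshly allocated each time
+            os.write(fd, s)
+        os.close(fd)
+
+    def bench_file_read(num_bytes, iterations):
+        # read num_bytes from /dev/full, which hands out zero bytes
+        f = open("/dev/full", "rb")
+        for i in xrange(iterations):
+            f.read(num_bytes)
+        f.close()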
+
+XXXX
+
+XXX summary
diff --git a/talk/pyconpl-2014/benchmarks/fib.py b/talk/pyconpl-2014/benchmarks/fib.py
--- a/talk/pyconpl-2014/benchmarks/fib.py
+++ b/talk/pyconpl-2014/benchmarks/fib.py
@@ -1,7 +1,11 @@
 import time
 import numpy
-from matplotlib import pylab
+try:
+    from matplotlib import pylab
+except ImportError:
+    from embed.emb import import_mod
+    pylab = import_mod('matplotlib.pylab')
 
 def fib(n):
     if n == 0 or n == 1:
@@ -21,7 +25,7 @@
 hist, bins = numpy.histogram(times, 20)
 
 #pylab.plot(bins[:-1], hist)
-pylab.ylim(ymin=0, ymax=max(times) * 1.2)
-pylab.plot(times)
+pylab.ylim(0, max(times) * 1.2)
+pylab.plot(numpy.array(times))
 #pylab.hist(hist, bins, histtype='bar')
 pylab.show()
diff --git a/talk/pyconpl-2014/benchmarks/talk.rst b/talk/pyconpl-2014/benchmarks/talk.rst
--- a/talk/pyconpl-2014/benchmarks/talk.rst
+++ b/talk/pyconpl-2014/benchmarks/talk.rst
@@ -1,3 +1,5 @@
+.. include:: ../beamerdefs.txt
+
 ---------------------
 How to benchmark code
 ---------------------
@@ -5,7 +7,11 @@
 Who are we?
 ------------
 
-xxx
+* Maciej Fijalkowski, Armin Rigo
+
+* working on PyPy
+
+* interested in performance
 
 What is this talk about?
 ---------------------------
@@ -76,3 +82,82 @@
 |pause|
 
 * not ideal at all
+
+Writing benchmarks - typical approach
+-------------------------------------
+
+* write a set of small programs that each exercise one particular thing
+
+  * recursive fibonacci
+
+  * pybench
+
+PyBench
+-------
+
+* used to be a tool to compare Python implementations
+
+* only uses microbenchmarks
+
+* assumes operation times are additive
+
+Problems
+--------
+
+* a lot of effects are not additive
+
+* optimizations often collapse consecutive operations
+
+* large-scale effects only show up in large programs
+
+An example
+----------
+
+* Python 2.6 vs Python 2.7 had minimal performance changes
+
+* somewhere in the changelog, a GC change is mentioned
+
+* it made the PyPy translation toolchain drop from 3h to 1h
+
+* it's "impossible" to write a microbenchmark for this
+
+More problems
+-------------
+
+* half of the blog posts comparing VM performance use recursive fibonacci
+
+* most of the others use the Computer Language Shootout
+
+PyPy benchmark suite
+--------------------
+
+* programs from small to medium and large
+
+* 50 LOC to 100k LOC
+
+* try to exercise various parts of the language (but e.g. lack IO)
+
+Solutions
+---------
+
+* measure what you are really interested in
+
+* derive microbenchmarks from your bottlenecks
+
+* be skeptical
+
+* understand what you're measuring
+
+Q&A
+---
+
+- http://pypy.org/
+
+- http://morepypy.blogspot.com/
+
+- http://baroquesoftware.com/
+
+- ``#pypy`` at freenode.net
+
+- Any questions?

_______________________________________________
pypy-commit mailing list
pypy-commit@python.org
https://mail.python.org/mailman/listinfo/pypy-commit