Author: Hakan Ardo <ha...@debian.org> Branch: extradoc Changeset: r5465:206c058dcb38 Date: 2014-11-21 09:50 +0100 http://bitbucket.org/pypy/extradoc/changeset/206c058dcb38/
Log: merge

diff --git a/blog/draft/io-improvements.rst b/blog/draft/io-improvements.rst
new file mode 100644
--- /dev/null
+++ b/blog/draft/io-improvements.rst
@@ -0,0 +1,88 @@
+
+Hello everyone!
+
+We've wrapped up the Warsaw sprint, so I would like to describe some
+branches which have been recently merged and which improved the I/O and the
+GC: `gc_no_cleanup_nursery`_ and `gc-incminimark-pinning`_.
+
+.. _`gc_no_cleanup_nursery`: https://bitbucket.org/pypy/pypy/commits/9e2f7a37c1e2
+.. _`gc-incminimark-pinning`: https://bitbucket.org/pypy/pypy/commits/64017d818038
+
+The first branch was started by Wenzhu Man for her Google Summer of Code
+and finished by Maciej Fijałkowski and Armin Rigo.
+The PyPy GC works by allocating new objects in the young object
+area (the nursery), simply by incrementing a pointer. After each minor
+collection, the nursery has to be cleaned up. For simplicity, the GC used
+to do it by zeroing the whole nursery.
+
+This approach has bad effects on the cache, since you zero a large piece of
+memory at once and do unnecessary work for things that don't require zeroing,
+like large strings. We mitigated the first problem somewhat with incremental
+nursery zeroing, but this branch removes the zeroing completely, thus
+improving the string handling and recursive code (since jitframes don't
+require zeroed memory either). I measured the effect on two examples:
+a recursive implementation of `fibonacci`_, and `gcbench`_, which measures
+GC performance.
+
+.. _`fibonacci`: https://bitbucket.org/pypy/benchmarks/src/69152c2aee7766051aab15735b0b82a46b82b802/own/fib.py?at=default
+.. _`gcbench`: https://bitbucket.org/pypy/benchmarks/src/69152c2aee7766051aab15735b0b82a46b82b802/own/gcbench.py?at=default
+
+The results for fibonacci and gcbench are below (normalized to CPython
+2.7). Benchmarks were run 50 times each (note that the big standard
+deviation comes mostly from the warmup at the beginning; the true figures
+are smaller):
+
++----------------+------------------+-------------------------+--------------------+
+| benchmark      | CPython          | PyPy 2.4                | PyPy non-zero      |
++----------------+------------------+-------------------------+--------------------+
+| fibonacci      | 4.8+-0.15 (1.0x) | 0.59+-0.07 (8.1x)       | 0.45+-0.07 (10.6x) |
++----------------+------------------+-------------------------+--------------------+
+| gcbench        | 22+-0.36 (1.0x)  | 1.34+-0.28 (16.4x)      | 1.02+-0.15 (21.6x) |
++----------------+------------------+-------------------------+--------------------+
+
+The second branch was done by Gregor Wegberg for his master thesis and finished
+by Maciej Fijałkowski and Armin Rigo. Because of the way it works, the PyPy GC
+from time to time moves objects in memory, meaning that their address can
+change. Therefore, if you want to pass pointers to some external C function
+(for example, write(2) or read(2)), you need to ensure that the objects they
+are pointing to will not be moved by the GC (e.g. while a different thread is
+running). PyPy up to 2.4 solves the problem by copying the data into or from a
+non-movable buffer, which is obviously inefficient.
+The branch introduces the concept of "pinning", which allows us to inform the
+GC that it is not allowed to move a certain object for a short period of time.
+This introduces a bit of extra complexity
+in the garbage collector, but improves the I/O performance quite drastically,
+because we no longer need the extra copy to and from the non-movable buffers.
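+
+To give a rough idea of what this looks like, here is a minimal sketch at
+the RPython level. This is illustrative only: ``c_write``,
+``get_raw_address`` and the fallback helper are made-up stand-ins for the
+real rffi glue; only the ``rgc.pin``/``rgc.unpin`` pair reflects the API
+added by the branch::
+
+    from rpython.rlib import rgc
+
+    def write_str(fd, s):
+        if rgc.pin(s):
+            # the GC has promised not to move ``s`` until unpin()
+            try:
+                c_write(fd, get_raw_address(s), len(s))   # no copy
+            finally:
+                rgc.unpin(s)
+        else:
+            # pinning may be refused; fall back to the old behaviour of
+            # copying into a non-movable buffer first
+            write_via_nonmovable_copy(fd, s)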
+
+In `this benchmark`_, which does I/O in a loop,
+we either write a number of bytes from a freshly allocated string into
+/dev/null or read a number of bytes from /dev/full. I'm showing the results
+for PyPy 2.4, PyPy with non-zero-nursery, and PyPy with non-zero-nursery and
+object pinning. Those are wall times for cases using ``os.read/os.write``
+and ``file.read/file.write``, normalized against CPython 2.7.
+
+Benchmarks were done using PyPy 2.4 and revisions ``85646d1d07fb`` for
+non-zero-nursery and ``3d8fe96dc4d9`` for non-zero-nursery and pinning.
+The benchmarks were run once, since the standard deviation was small.
+
+XXXX
+
+What we can see is that ``os.read`` and ``os.write`` both improved greatly
+and now outperform CPython for each combination. ``file`` operations are
+a little more tricky, and while those branches improved the situation a bit,
+the improvement is not as drastic as in the ``os`` versions. This really
+should not be the case, and it showcases how our ``file`` buffering is
+inferior to CPython's. We plan on removing our own buffering and using
+``FILE*`` in C in the near future, so we should outperform CPython on those
+too (since our allocations are cheaper). If you look carefully at the
+benchmark, the write function is copied three times. This hack is intended
+to avoid the JIT overspecializing the assembler code, which happens because
+the buffering code was written way before the JIT was done. In fact, our
+buffering is hilariously bad, but if the stars align correctly it can be
+JIT-compiled to something that's not half bad. Try removing the hack and see
+how the performance of the last benchmark drops :-) Again, this hack should
+be absolutely unnecessary once we remove our own buffering; stay tuned for
+more.
+
+Cheers,
+fijal
+
+.. _`this benchmark`: https://bitbucket.org/pypy/benchmarks/src/69152c2aee7766051aab15735b0b82a46b82b802/io/iobasic.py?at=default
diff --git a/blog/draft/iobase.png b/blog/draft/iobase.png
new file mode 100644
index 0000000000000000000000000000000000000000..0b8bedc12421ce39c24931d68f8993da74bb2808
GIT binary patch
[cut]
diff --git a/blog/draft/tornado-stm.rst b/blog/draft/tornado-stm.rst
new file mode 100644
--- /dev/null
+++ b/blog/draft/tornado-stm.rst
@@ -0,0 +1,252 @@
+Tornado without a GIL on PyPy STM
+=================================
+
+Python has a GIL, right? Not quite - PyPy STM is a Python implementation
+without a GIL, so it can scale CPU-bound work to several cores.
+PyPy STM is developed by Armin Rigo and Remi Meier,
+and supported by community `donations <http://pypy.org/tmdonate2.html>`_.
+You can read more about it in the
+`docs <http://pypy.readthedocs.org/en/latest/stm.html>`_.
+
+Although PyPy STM is still a work in progress, in many cases it can already
+run CPU-bound code faster than regular PyPy, when using multiple cores.
+Here we will see how to slightly modify the Tornado IO loop to use the
+`transaction <https://bitbucket.org/pypy/pypy/raw/stmgc-c7/lib_pypy/transaction.py>`_
+module.
+This module is `described <http://pypy.readthedocs.org/en/latest/stm.html#atomic-sections-transactions-etc-a-better-way-to-write-parallel-programs>`_
+in the docs and is really simple to use - please see an example there.
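+
+As a self-contained teaser, a minimal use of the module might look like
+this sketch, which assumes only the documented ``transaction.add`` /
+``transaction.run`` API (``work`` is a placeholder function)::
+
+    import transaction
+
+    def work(i):
+        pass   # any CPU-bound piece of work
+
+    for i in range(10):
+        transaction.add(work, i)   # queue a call
+    transaction.run()   # run all queued calls, possibly in parallel,
+                        # but with effects as in some serial order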
+
+An event loop of Tornado, or any other asynchronous
+web server, looks like this (with some simplifications)::
+
+    while True:
+        for callback in list(self._callbacks):
+            self._run_callback(callback)
+        event_pairs = self._impl.poll()
+        self._events.update(event_pairs)
+        while self._events:
+            fd, events = self._events.popitem()
+            handler = self._handlers[fd]
+            self._handle_event(fd, handler, events)
+
+We get IO events and run handlers for all of them; these handlers can
+also register new callbacks, which we run too. When using such a framework,
+it is very nice to have a guarantee that all handlers run serially,
+so you do not have to use any locks. This is an ideal case for the
+transaction module - it gives us the guarantee that things appear
+to run serially, so in user code we do not need any locks. We just
+need to change the code above to something like::
+
+    while True:
+        for callback in list(self._callbacks):
+            transaction.add(
+                self._run_callback, callback)             # added
+        transaction.run()                                 # added
+        event_pairs = self._impl.poll()
+        self._events.update(event_pairs)
+        while self._events:
+            fd, events = self._events.popitem()
+            handler = self._handlers[fd]
+            transaction.add(                              # added
+                self._handle_event, fd, handler, events)
+        transaction.run()                                 # added
+
+The actual commit is
+`here <https://github.com/lopuhin/tornado/commit/246c5e71ce8792b20c56049cf2e3eff192a01b20>`_;
+we had to extract a little function to run the callback.
+
+Part 1: a simple benchmark: primes
+----------------------------------
+
+Now we need a simple benchmark; let's start with
+`this <https://bitbucket.org/kostialopuhin/tornado-stm-bench/src/a038bf99de718ae97449607f944cecab1a5ae104/primes.py?at=default>`_:
+just calculate a list of primes up to the given number, and return it
+as JSON::
+
+    def is_prime(n):
+        for i in xrange(2, n):
+            if n % i == 0:
+                return False
+        return True
+
+    class MainHandler(tornado.web.RequestHandler):
+        def get(self, num):
+            num = int(num)
+            primes = [n for n in xrange(2, num + 1) if is_prime(n)]
+            self.write({'primes': primes})
+
+
+We can benchmark it with ``siege``::
+
+    siege -c 50 -t 20s http://localhost:8888/10000
+
+But this does not scale: the CPU load stays at 101-104%, and we handle 30%
+fewer requests per second. The reason for the slowdown is the STM overhead,
+which needs to keep track of all writes and reads in order to detect
+conflicts. And the reason for using only one core is, obviously, conflicts!
+Fortunately, we can see what these conflicts are if we run the code like
+this (here 4 is the number of cores to use)::
+
+    PYPYSTM=stm.log ./primes.py 4
+
+Then we can use `print_stm_log.py <https://bitbucket.org/pypy/pypy/raw/stmgc-c7/pypy/stm/print_stm_log.py>`_
+to analyse this log. It lists the most expensive conflicts::
+
+    14.793s lost in aborts, 0.000s paused (1258x STM_CONTENTION_INEVITABLE)
+    File "/home/ubuntu/tornado-stm/tornado/tornado/httpserver.py", line 455, in __init__
+        self._start_time = time.time()
+    File "/home/ubuntu/tornado-stm/tornado/tornado/httpserver.py", line 455, in __init__
+        self._start_time = time.time()
+    ...
+
+There are only three kinds of conflicts; they are described in the
+`stm source <https://bitbucket.org/pypy/pypy/src/6355617bf9a2a0fa8b74ae17906e4a591b38e2b5/rpython/translator/stm/src_stm/stm/contention.c?at=stmgc-c7>`_.
+Here we see that two threads call into an external function to get the
+current time, and we cannot roll back either of them, so one of them must
+wait till the other transaction finishes.
+
+For now we can hack around this by disabling the timing - it is only
+needed for internal profiling in tornado.
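+
+A sketch of what such a hack might look like (hypothetical - the real
+change is in the repository linked above; the idea is simply to stop
+calling an external C function inside the transaction)::
+
+    # tornado/httpserver.py, in HTTPConnection.__init__ (sketch)
+    self._start_time = 0.0   # was: time.time(); only used for profiling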
+
+If we do it, we get the following results (but see caveats below):
+
+============ =========
+Impl.        req/s
+============ =========
+PyPy 2.4     14.4
+------------ ---------
+CPython 2.7  3.2
+------------ ---------
+PyPy-STM 1   9.3
+------------ ---------
+PyPy-STM 2   16.4
+------------ ---------
+PyPy-STM 3   20.4
+------------ ---------
+PyPy-STM 4   24.2
+============ =========
+
+.. image:: results-1.png
+
+As we can see, in this benchmark PyPy STM using just two cores
+can beat regular PyPy!
+This is not linear scaling, there are still conflicts left, and this
+is a very simple example, but still, it works!
+
+But it's not that simple yet :)
+
+First, these are best-case numbers after a long warmup (much longer than
+for regular PyPy). Second, it can sometimes crash (although removing old
+pyc files fixes it). Third, the benchmark meta-parameters are also tuned.
+
+We get relatively good results only when there are a lot of concurrent
+clients: as a result, a lot of requests pile up, the server cannot keep up
+with the load, and the transaction module is busy running these piled-up
+requests. If we decrease the number of concurrent clients, results get
+slightly worse. Another thing we can tune is how heavy each request is:
+again, if we ask for primes up to a lower number, less time is spent doing
+calculations, more time is spent in tornado, and results get much worse.
+
+Besides the ``time.time()`` conflict described above, there are a lot of
+others. The bulk of the time is lost in these two conflicts::
+
+    14.153s lost in aborts, 0.000s paused (270x STM_CONTENTION_INEVITABLE)
+    File "/home/ubuntu/tornado-stm/tornado/tornado/web.py", line 1082, in compute_etag
+        hasher = hashlib.sha1()
+    File "/home/ubuntu/tornado-stm/tornado/tornado/web.py", line 1082, in compute_etag
+        hasher = hashlib.sha1()
+
+    13.484s lost in aborts, 0.000s paused (130x STM_CONTENTION_WRITE_READ)
+    File "/home/ubuntu/pypy/lib_pypy/transaction.py", line 164, in _run_thread
+        got_exception)
+
+The first one presumably comes from calling into a C function from the
+stdlib; we get the same kind of conflict as for ``time.time()`` above, but
+it can be fixed on the PyPy side, as we can be sure that computing sha1 is
+pure.
+
+It is easy to hack around this one too by just removing etag support, but
+if we do, performance is much worse (only slightly faster than regular
+PyPy), with the top conflict being::
+
+    83.066s lost in aborts, 0.000s paused (459x STM_CONTENTION_WRITE_WRITE)
+    File "/home/arigo/hg/pypy/stmgc-c7/lib-python/2.7/_weakrefset.py", line 70, in __contains__
+    File "/home/arigo/hg/pypy/stmgc-c7/lib-python/2.7/_weakrefset.py", line 70, in __contains__
+
+**FIXME** why does it happen?
+
+The second conflict (without the etag tweaks) originates
+in the transaction module, from this piece of code::
+
+    while True:
+        self._do_it(self._grab_next_thing_to_do(tloc_pending),
+                    got_exception)
+        counter[0] += 1
+
+**FIXME** why does it happen?
+
+The Tornado modification used in this blog post is based on 3.2.dev2.
+As of now, the latest version is 4.0.2, and if we
+`apply <https://github.com/lopuhin/tornado/commit/04cd7407f8690fd1dc55b686eb78e3795f4363e6>`_
+the same changes to this version, then we no longer get any scaling on this
+benchmark, and there are no conflicts that take any substantial time.
+
+
+Part 2: a more interesting benchmark: A-star
+--------------------------------------------
+
+Although we have seen that PyPy STM is not all moonlight and roses,
+it is interesting to see how it works on a more realistic application.
+
+`astar.py <https://bitbucket.org/kostialopuhin/tornado-stm-bench/src/a038bf99de718ae97449607f944cecab1a5ae104/astar.py>`_
+is a simple game where several players move on a map
+(represented as a list of lists of integers),
+build and destroy walls, and ask the server to give them the
+shortest path between two points
+using A-star search, adapted from an `ActiveState recipe <http://code.activestate.com/recipes/577519-a-star-shortest-path-algorithm/>`_.
+
+The benchmark `bench_astar.py <https://bitbucket.org/kostialopuhin/tornado-stm-bench/src/a038bf99de718ae97449607f944cecab1a5ae104/bench_astar.py>`_
+simulates players and tries to put the main load on the A-star search,
+but it also does some wall building and destruction. There are no locks
+around map modifications, as normal tornado executes all callbacks
+serially, and we can keep this guarantee with the atomic blocks of PyPy STM.
+This is also an example of a program that is not trivial
+to scale to multiple cores with separate processes (assuming
+more interesting shared state and logic).
+
+This benchmark is very noisy due to the randomness of client interactions
+(it could also be non-linear), so only lower and upper bounds for the
+number of requests are reported:
+
+============ ==========
+Impl.        req/s
+============ ==========
+PyPy 2.4     5 .. 7
+------------ ----------
+CPython 2.7  0.5 .. 0.9
+------------ ----------
+PyPy-STM 1   2 .. 4
+------------ ----------
+PyPy-STM 4   2 .. 6
+============ ==========
+
+Clearly this is a very noisy benchmark, but we can still see that scaling
+is worse and the STM overhead is sometimes higher.
+The bulk of the conflicts come from the transaction module (we have seen
+it above)::
+
+    91.655s lost in aborts, 0.000s paused (249x STM_CONTENTION_WRITE_READ)
+    File "/home/ubuntu/pypy/lib_pypy/transaction.py", line 164, in _run_thread
+        got_exception)
+
+
+Although it is definitely not ready for production use, you can already try
+to run things, report bugs, and see what is missing in user-facing tools
+and libraries.
+
+
+Benchmark setup:
+
+* Amazon c3.xlarge (4 cores) running Ubuntu 14.04
+* pypy-c-r74011-stm-jit for the primes benchmark (but it has more bugs
+  than more recent versions), and
+  `pypy-c-r74378-74379-stm-jit <http://cobra.cs.uni-duesseldorf.de/~buildmaster/misc/pypy-c-r74378-74379-stm-jit.xz>`_
+  for the astar benchmark (put it inside a pypy source checkout at 38c9afbd253c)
+* http://bitbucket.org/kostialopuhin/tornado-stm-bench at 65144cda7a1f
diff --git a/sprintinfo/warsaw-2014/announcement.txt b/sprintinfo/warsaw-2014/announcement.txt
--- a/sprintinfo/warsaw-2014/announcement.txt
+++ b/sprintinfo/warsaw-2014/announcement.txt
@@ -38,8 +38,8 @@
 ------------
 
 The sprint will happen within a room of Warsaw University. The
-address is Pasteura 5 (which is a form of "Pasteur street"), room 550.
-The person of contact is Maciej Fijalkowski.
+address is Pasteura 5 (which is a form of "Pasteur street"), dept. of
+Physics, room 450. The person of contact is Maciej Fijalkowski.
 --------------
 
diff --git a/sprintinfo/warsaw-2014/people.txt b/sprintinfo/warsaw-2014/people.txt
--- a/sprintinfo/warsaw-2014/people.txt
+++ b/sprintinfo/warsaw-2014/people.txt
@@ -9,5 +9,12 @@
 ==================== ============== =======================
 Name                 Arrive/Depart  Accommodation
 ==================== ============== =======================
-Armin Rigo           20/10-2X/10    with fijal?
+Armin Rigo           20/10-28/10    with fijal
+Maciej Fijalkowski   20/10-30/10    private
+Romain Guillebert    19/10-26/10    ibis Reduta with mjacob
+Manuel Jacob         20/10-26/10    ibis Reduta with rguillebert
+Kostia Lopuhin
+Antonio Cuni         20/10-26/10    ibis Reduta http://www.ibis.com/gb/hotel-7125-ibis-warszawa-reduta/index.shtml
+Matti Picus          20/10-20/10    just a long layover between flights
+Ronan Lamy           19/10-26/10    ibis Reduta
 ==================== ============== =======================
diff --git a/sprintinfo/warsaw-2014/planning.txt b/sprintinfo/warsaw-2014/planning.txt
new file mode 100644
--- /dev/null
+++ b/sprintinfo/warsaw-2014/planning.txt
@@ -0,0 +1,41 @@
+Topics
+======
+
+* cffi.verify dlopen flag - TO BE MERGED
+
+* PyPy/CPython Bridge (Romain, kostia) - MORE PROGRESS
+
+* Profiler (Antonio, Armin) - IN PROGRESS
+
+* Merge improve-docs (Manuel, Ronan) - IN PROGRESS
+
+* Merge kill-multimethod remove-remaining-smm (Manuel, Antonio, fijal) -
+  MERGED remove-remaining-smm, kill-multimethod WAITING FOR REVIEW
+
+* STM presentation (Everybody) - DONE
+
+* Refactor annotator/rtyper (Ronan?) - LOOKING FOR PAIRING
+
+* Python 3.3 - IN PROGRESS
+
+* look into merging gc pinning (fijal, arigo) - ALMOST READY, more debugging needed
+
+* investigate -fPIC slowdown (fijal, arigo) - IN PROGRESS, complete mess
+
+* NumPyPy discussion (everybody) - DONE
+
+* Trying stuff on PyPy-STM (Antonio, Kostia)
+
+* convincing anto why resume refactoring is a good idea
+
+* switchify chains of guard_value (Armin, Romain...)
+
+People
+======
+
+Antonio
+Armin
+Kostia
+Ronan
+Romain
+Manuel
+Maciej
diff --git a/talk/img/baroquesoftware.png b/talk/img/baroquesoftware.png
new file mode 100644
index 0000000000000000000000000000000000000000..038e80b25722d7917a1fbecb581ce25b54707ab2
GIT binary patch
[cut]
diff --git a/talk/pyconie2014/Makefile b/talk/pyconie2014/Makefile
new file mode 100644
--- /dev/null
+++ b/talk/pyconie2014/Makefile
@@ -0,0 +1,18 @@
+# you can find rst2beamer.py here:
+# https://bitbucket.org/antocuni/env/raw/default/bin/rst2beamer.py
+
+# WARNING: to work, it needs this patch for docutils
+# https://sourceforge.net/tracker/?func=detail&atid=422032&aid=1459707&group_id=38414
+
+talk.pdf: talk.rst author.latex stylesheet.latex
+	python `which rst2beamer.py` --stylesheet=stylesheet.latex --documentoptions=14pt talk.rst talk.latex || exit
+	#/home/antocuni/.virtualenvs/rst2beamer/bin/python `which rst2beamer.py` --stylesheet=stylesheet.latex --documentoptions=14pt talk.rst talk.latex || exit
+	sed 's/\\date{}/\\input{author.latex}/' -i talk.latex || exit
+	#sed 's/\\maketitle/\\input{title.latex}/' -i talk.latex || exit
+	pdflatex talk.latex || exit
+
+view: talk.pdf
+	evince talk.pdf &
+
+xpdf: talk.pdf
+	xpdf talk.pdf &
diff --git a/talk/pyconie2014/author.latex b/talk/pyconie2014/author.latex
new file mode 100644
--- /dev/null
+++ b/talk/pyconie2014/author.latex
@@ -0,0 +1,9 @@
+\definecolor{rrblitbackground}{rgb}{0.0, 0.0, 0.0}
+
+\title[PyPy: A fast Python Virtual Machine]{PyPy: A fast Python Virtual Machine}
+\author[rguillebert]
+{Romain Guillebert\\
+\includegraphics[width=80px]{../img/py-web-new.png}}
+
+\institute{PyCon IE}
+\date{October 12th, 2014}
diff --git a/talk/pyconie2014/beamerdefs.txt b/talk/pyconie2014/beamerdefs.txt
new file mode 100644
--- /dev/null
+++ b/talk/pyconie2014/beamerdefs.txt
@@ -0,0 +1,108 @@
+.. colors
+.. ===========================
+
+.. role:: green
+.. role:: red
+
+
+.. general useful commands
+.. ===========================
+
+.. |pause| raw:: latex
+
+  \pause
+
+.. |small| raw:: latex
+
+  {\small
+
+.. |end_small| raw:: latex
+
+  }
+
+.. |scriptsize| raw:: latex
+
+  {\scriptsize
+
+.. |end_scriptsize| raw:: latex
+
+  }
+
+.. |strike<| raw:: latex
+
+  \sout{
+
+.. closed bracket
+.. ===========================
+
+.. |>| raw:: latex
+
+  }
+
+
+.. example block
+.. ===========================
+
+.. |example<| raw:: latex
+
+  \begin{exampleblock}{
+
+
+.. |end_example| raw:: latex
+
+  \end{exampleblock}
+
+
+
+.. alert block
+.. ===========================
+
+.. |alert<| raw:: latex
+
+  \begin{alertblock}{
+
+
+.. |end_alert| raw:: latex
+
+  \end{alertblock}
+
+
+
+.. columns
+.. ===========================
+
+.. |column1| raw:: latex
+
+  \begin{columns}
+  \begin{column}{0.45\textwidth}
+
+.. |column2| raw:: latex
+
+  \end{column}
+  \begin{column}{0.45\textwidth}
+
+
+.. |end_columns| raw:: latex
+
+  \end{column}
+  \end{columns}
+
+
+
+.. |snake| image:: ../../img/py-web-new.png
+  :scale: 15%
+
+
+
+.. nested blocks
+.. ===========================
+
+.. |nested| raw:: latex
+
+  \begin{columns}
+  \begin{column}{0.85\textwidth}
+
+.. |end_nested| raw:: latex
+
+  \end{column}
+  \end{columns}
diff --git a/talk/pyconie2014/speed.png b/talk/pyconie2014/speed.png
new file mode 100644
index 0000000000000000000000000000000000000000..4640c76f8a665af1c414dc4c4ca22be3bd8ff360
GIT binary patch
[cut]
diff --git a/talk/pyconie2014/stylesheet.latex b/talk/pyconie2014/stylesheet.latex
new file mode 100644
--- /dev/null
+++ b/talk/pyconie2014/stylesheet.latex
@@ -0,0 +1,9 @@
+\setbeamercovered{transparent}
+\setbeamertemplate{navigation symbols}{}
+
+\definecolor{darkgreen}{rgb}{0, 0.5, 0.0}
+\newcommand{\docutilsrolegreen}[1]{\color{darkgreen}#1\normalcolor}
+\newcommand{\docutilsrolered}[1]{\color{red}#1\normalcolor}
+
+\newcommand{\green}[1]{\color{darkgreen}#1\normalcolor}
+\newcommand{\red}[1]{\color{red}#1\normalcolor}
diff --git a/talk/pyconie2014/talk.pdf b/talk/pyconie2014/talk.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..748494720b5c056a3734955b2bbb4ad2f93692e4
GIT binary patch
[cut]
diff --git a/talk/pyconie2014/talk.rst b/talk/pyconie2014/talk.rst
new file mode 100644
--- /dev/null
+++ b/talk/pyconie2014/talk.rst
@@ -0,0 +1,170 @@
+.. include:: beamerdefs.txt
+
+PyPy: A fast Python Virtual Machine
+====================================
+
+Me
+--
+
+- rguillebert on twitter and irc
+
+- PyPy contributor since 2011
+
+- NumPyPy contributor
+
+- Software consultant (hire me!)
+
+Introduction
+------------
+
+- "PyPy is a fast, compliant alternative implementation of the Python language"
+
+- Aims to reach the best performance possible without changing the syntax or semantics
+
+- Supports x86, x86_64, ARM
+
+- Production ready
+
+- MIT Licensed
+
+Speed
+-----
+
+.. image:: speed.png
+  :scale: 37%
+
+Speed
+-----
+
+- Automatically generated tracing just-in-time compiler
+
+- Generates linear traces from loops
+
+- Generates efficient machine code based on runtime observations
+
+- Removes overhead when unnecessary
+
+- But Python features which require overhead remain available (frame introspection, pdb)
+
+Performance?
+-------------
+
+- Things get done faster
+
+- Serve more requests per second
+
+- Lower latency
+
+- Fewer servers for the same performance
+
+Demo
+----
+
+- Real-time edge detection
+
+
+Compatibility
+-------------
+
+- Fully compatible with CPython 2.7 & 3.2 (minus implementation details)
+
+- Partial and slow support of the C API
+
+- Alternatives might exist
+
+Ecosystem
+---------
+
+- We should (slowly, incrementally) move away from the C extension API
+
+  * Makes assumptions on refcounting, object layout, the GIL
+
+  * The future of Python is bound to the future of CPython (a more than 20-year-old interpreter)
+
+  * It's hard for a new Python VM without C extension support to get traction (not only PyPy)
+
+- This doesn't mean we should lose Python's ability to interface with C easily
+
+- CFFI is the PyPy team's attempt at solving this
+
+CFFI (1/2)
+----------
+
+- Where do we go from here?
+
+- CFFI is a fairly new, implementation-independent way of interacting with C
+
+- Very fast on PyPy
+
+- Decently fast on CPython
+
+- The Jython project is working on support
+
+CFFI (2/2)
+----------
+
+- More convenient, safer, faster than ctypes
+
+- Can call C functions easily, in API and ABI mode
+
+- Python functions can be exposed to C
+
+- Already used by pyopenssl, psycopg2cffi, pygame_cffi, lxml_cffi
+
+- Other tools could be built on top of it (a Cython cffi backend?)
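+
+CFFI example
+------------
+
+A minimal ABI-mode sketch (not from the original deck; assumes a Unix libm)::
+
+    from cffi import FFI
+
+    ffi = FFI()
+    ffi.cdef("double sqrt(double x);")   # declare the C function we need
+    lib = ffi.dlopen("m")                # load libm at runtime (ABI mode)
+    assert lib.sqrt(4.0) == 2.0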
+
+Success stories
+---------------
+
+  Magnetic is the leader in online search retargeting, with a large, high volume, performance-critical platform written in Python. [...]
+
+  The Magnetic bidders were ported from CPython to PyPy, yielding an overall 30% performance gain.
+
+- Julian Berman
+
+  magnetic.com
+
+Success stories
+---------------
+
+  Currently we have improvements in raw performance (read: response times) that span from 8% to a pretty interesting 40%, but we have a peak of an astonishing 100-120% and even more.
+
+  Take into the account that most of our apps are simple "blocking-on-db" ones, so a 2x increase is literally money.
+
+- Roberto De Ioris
+
+  Unbit
+
+Success stories
+---------------
+
+  In addition to this our main (almost secret) objective was reducing resource usage of the application servers, which directly translates to being able to host more customers on the same server.
+
+- Roberto De Ioris
+
+  Unbit
+
+Success stories
+---------------
+
+  PyPy is an excellent choice for every pure Python project that depends on speed of execution of readable and maintainable large source code.
+  [...]
+  We had roughly a 2x speedup with PyPy over CPython.
+
+- Marko Tasic (Web and Data processing)
+
+Future
+------
+
+- Python 3.3
+
+- NumPyPy
+
+- STM
+
+- You can donate to help the progress of these features: pypy.org
+
+Questions
+---------
+
+- Questions?
diff --git a/talk/pyconpl-2014/Makefile b/talk/pyconpl-2014/Makefile
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/Makefile
@@ -0,0 +1,18 @@
+# you can find rst2beamer.py here:
+# https://bitbucket.org/antocuni/env/raw/default/bin/rst2beamer.py
+
+# WARNING: to work, it needs this patch for docutils
+# https://sourceforge.net/tracker/?func=detail&atid=422032&aid=1459707&group_id=38414
+
+talk.pdf: talk.rst author.latex stylesheet.latex
+	rst2beamer --stylesheet=stylesheet.latex --documentoptions=14pt --output-encoding=utf8 --overlaybullets=False talk.rst talk.latex || exit
+	#/home/antocuni/.virtualenvs/rst2beamer/bin/python `which rst2beamer.py` --stylesheet=stylesheet.latex --documentoptions=14pt talk.rst talk.latex || exit
+	sed 's/\\date{}/\\input{author.latex}/' -i talk.latex || exit
+	#sed 's/\\maketitle/\\input{title.latex}/' -i talk.latex || exit
+	pdflatex talk.latex || exit
+
+view: talk.pdf
+	evince talk.pdf &
+
+xpdf: talk.pdf
+	xpdf talk.pdf &
diff --git a/talk/pyconpl-2014/author.latex b/talk/pyconpl-2014/author.latex
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/author.latex
@@ -0,0 +1,10 @@
+\definecolor{rrblitbackground}{rgb}{0.0, 0.0, 0.0}
+
+\title[PyPy]{PyPy}
+\author[arigo, fijal]
+{Armin Rigo, Maciej Fijałkowski\\
+\includegraphics[width=80px]{../img/py-web-new.png}
+\hspace{1em}
+\includegraphics[width=80px]{../img/baroquesoftware.png}}
+\institute{PyCon PL}
+\date{October 2014}
diff --git a/talk/pyconpl-2014/beamerdefs.txt b/talk/pyconpl-2014/beamerdefs.txt
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/beamerdefs.txt
@@ -0,0 +1,108 @@
+.. colors
+.. ===========================
+
+.. role:: green
+.. role:: red
+
+
+.. general useful commands
+.. ===========================
+
+.. |pause| raw:: latex
+
+  \pause
+
+.. |small| raw:: latex
+
+  {\small
+
+.. |end_small| raw:: latex
+
+  }
+
+.. |scriptsize| raw:: latex
+
+  {\scriptsize
+
+.. |end_scriptsize| raw:: latex
+
+  }
+
+.. |strike<| raw:: latex
+
+  \sout{
+
+.. closed bracket
+.. ===========================
+
+.. |>| raw:: latex
+
+  }
+
+
+.. example block
+.. ===========================
+
+.. |example<| raw:: latex
+
+  \begin{exampleblock}{
+
+
+.. |end_example| raw:: latex
+
+  \end{exampleblock}
+
+
+
+.. alert block
+.. ===========================
+
+.. |alert<| raw:: latex
+
+  \begin{alertblock}{
+
+
+.. |end_alert| raw:: latex
+
+  \end{alertblock}
+
+
+
+.. columns
+.. ===========================
+
+.. |column1| raw:: latex
+
+  \begin{columns}
+  \begin{column}{0.45\textwidth}
+
+.. |column2| raw:: latex
+
+  \end{column}
+  \begin{column}{0.45\textwidth}
+
+
+.. |end_columns| raw:: latex
+
+  \end{column}
+  \end{columns}
+
+
+
+.. |snake| image:: ../../img/py-web-new.png
+  :scale: 15%
+
+
+
+.. nested blocks
+.. ===========================
+
+.. |nested| raw:: latex
+
+  \begin{columns}
+  \begin{column}{0.85\textwidth}
+
+.. |end_nested| raw:: latex
+
+  \end{column}
+  \end{columns}
diff --git a/talk/pyconpl-2014/benchmarks/abstract.rst b/talk/pyconpl-2014/benchmarks/abstract.rst
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/benchmarks/abstract.rst
@@ -0,0 +1,7 @@
+How to benchmark code
+---------------------
+
+In this talk, we would like to present the basics of how Python virtual
+machines like CPython or PyPy work, and how to use that knowledge to write
+meaningful benchmarks for your programs. We'll show what's wrong with
+microbenchmarks and how to improve the situation.
diff --git a/talk/pyconpl-2014/benchmarks/f1.py b/talk/pyconpl-2014/benchmarks/f1.py
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/benchmarks/f1.py
@@ -0,0 +1,8 @@
+
+def f():
+    i = 0
+    while i < 100000000:
+        i += 1
+    return i
+
+f()
diff --git a/talk/pyconpl-2014/benchmarks/f2.py b/talk/pyconpl-2014/benchmarks/f2.py
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/benchmarks/f2.py
@@ -0,0 +1,10 @@
+
+def f():
+    i = 0
+    s = 0
+    while i < 100000000:
+        s += len(str(i))
+        i += 1
+    return s
+
+print f()
diff --git a/talk/pyconpl-2014/benchmarks/fib.py b/talk/pyconpl-2014/benchmarks/fib.py
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/benchmarks/fib.py
@@ -0,0 +1,31 @@
+
+import time
+import numpy
+try:
+    from matplotlib import pylab
+except ImportError:
+    from embed.emb import import_mod
+    pylab = import_mod('matplotlib.pylab')
+
+def fib(n):
+    if n == 0 or n == 1:
+        return 1
+    return fib(n - 1) + fib(n - 2)
+
+def f():
+    for i in range(10000):
+        "".join(list(str(i)))
+
+times = []
+for i in xrange(1000):
+    t0 = time.time()
+    #f()
+    fib(17)
+    times.append(time.time() - t0)
+
+hist, bins = numpy.histogram(times, 20)
+#pylab.plot(bins[:-1], hist)
+pylab.ylim(0, max(times) * 1.2)
+pylab.plot(numpy.array(times))
+#pylab.hist(hist, bins, histtype='bar')
+pylab.show()
diff --git a/talk/pyconpl-2014/benchmarks/talk.rst b/talk/pyconpl-2014/benchmarks/talk.rst
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/benchmarks/talk.rst
@@ -0,0 +1,163 @@
+.. include:: ../beamerdefs.txt
+
+---------------------
+How to benchmark code
+---------------------
+
+Who are we?
+------------
+
+* Maciej Fijalkowski, Armin Rigo
+
+* working on PyPy
+
+* interested in performance
+
+What is this talk about?
+---------------------------
+
+* basics of how CPython and PyPy run programs
+
+* a bit of theory about measuring performance
+
+* microbenchmarks
+
+* the complicated picture of the "real world"
+
+CPython
+-------
+
+* a "simple" virtual machine
+
+* compiles Python code to bytecode
+
+* runs the bytecode
+
+* usually invokes tons of runtime functions written in C
+
+CPython (demo)
+--------------
+
+PyPy
+----
+
+* a not-so-simple virtual machine
+
+* all of the above
+
+* ...
+  and then, if the loop/function gets called often enough,
+  it's compiled down to optimized assembler by the JIT
+
+PyPy (demo)
+-----------
+
+Measurements 101
+----------------
+
+* run your benchmark multiple times
+
+* the distribution should be Gaussian
+
+* take the average and the variation
+
+* if the variation is too large, increase the number of iterations
+  (a minimal harness sketch is at the end of this talk)
+
+Let's do it (demo)
+------------------
+
+Problems
+--------
+
+* the whole previous slide is a bunch of nonsense
+
+* ...
+
+"Solution"
+----------
+
+* you try your best and do the average anyway
+
+* presumably cutting off the warmup time
+
+|pause|
+
+* not ideal at all
+
+Writing benchmarks - typical approach
+-------------------------------------
+
+* write a set of small programs that each exercise one particular thing
+
+  * recursive fibonacci
+
+  * pybench
+
+PyBench
+-------
+
+* used to be a tool to compare Python implementations
+
+* only uses microbenchmarks
+
+* assumes operation times are concatenative
+
+Problems
+--------
+
+* a lot of effects are not concatenative
+
+* optimizations often collapse consecutive operations
+
+* large-scale effects only show up on large programs
+
+An example
+----------
+
+* python 2.6 vs python 2.7 had minimal performance changes
+
+* somewhere in the changelog, there is a gc change mentioned
+
+* it made the pypy translation toolchain jump from 3h to 1h
+
+* it's "impossible" to write a microbenchmark for this
+
+More problems
+-------------
+
+* half of the blog posts comparing VM performance use recursive fibonacci
+
+* most of the others use the Computer Language Shootout
+
+PyPy benchmark suite
+--------------------
+
+* programs from small to medium and large
+
+* 50 LOC to 100k LOC
+
+* try to exercise various parts of the language (but they e.g. lack IO)
+
+Solutions
+---------
+
+* measure what you are really interested in
+
+* derive microbenchmarks from your bottlenecks
+
+* be skeptical
+
+* understand what you're measuring
+
+Q&A
+---
+
+- http://pypy.org/
+
+- http://morepypy.blogspot.com/
+
+- http://baroquesoftware.com/
+
+- ``#pypy`` at freenode.net
+
+- Any question?
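+
+Appendix: a minimal harness (sketch)
+------------------------------------
+
+Illustrative only (not part of the original deck): the "run it many times,
+look at the average and the variation" recipe from earlier::
+
+    import time
+
+    def bench(f, n=50):
+        times = []
+        for _ in range(n):
+            t0 = time.time()
+            f()
+            times.append(time.time() - t0)
+        mean = sum(times) / len(times)
+        std = (sum((t - mean) ** 2 for t in times) / len(times)) ** 0.5
+        return mean, std   # with a JIT, expect warmup outliers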
diff --git a/talk/pyconpl-2014/speed.png b/talk/pyconpl-2014/speed.png
new file mode 100644
index 0000000000000000000000000000000000000000..33fe20ac9d81ddbd3ced48f52f9717693dc15518
GIT binary patch
[cut]
diff --git a/talk/pyconpl-2014/standards.png b/talk/pyconpl-2014/standards.png
new file mode 100644
index 0000000000000000000000000000000000000000..5d38303773dd4f1b798a91bec62d05e0423a6a0d
GIT binary patch
[cut]
diff --git a/talk/pyconpl-2014/stylesheet.latex b/talk/pyconpl-2014/stylesheet.latex
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/stylesheet.latex
@@ -0,0 +1,9 @@
+\setbeamercovered{transparent}
+\setbeamertemplate{navigation symbols}{}
+
+\definecolor{darkgreen}{rgb}{0, 0.5, 0.0}
+\newcommand{\docutilsrolegreen}[1]{\color{darkgreen}#1\normalcolor}
+\newcommand{\docutilsrolered}[1]{\color{red}#1\normalcolor}
+
+\newcommand{\green}[1]{\color{darkgreen}#1\normalcolor}
+\newcommand{\red}[1]{\color{red}#1\normalcolor}
diff --git a/talk/pyconpl-2014/talk.pdf b/talk/pyconpl-2014/talk.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..96cfd649da81c326dec0dbe7d92087b9864229d7
GIT binary patch
[cut]
diff --git a/talk/pyconpl-2014/talk.rst b/talk/pyconpl-2014/talk.rst
new file mode 100644
--- /dev/null
+++ b/talk/pyconpl-2014/talk.rst
@@ -0,0 +1,297 @@
+.. include:: beamerdefs.txt
+
+================================
+PyPy
+================================
+
+Who We Are
+----------
+
+* Maciej Fijałkowski
+
+* Armin Rigo
+
+* PyPy developers for a long time
+
+* baroquesoftware
+
+What is PyPy?
+--------------
+
+* A Python interpreter, an alternative to CPython
+
+* Supports Python 2.7 and (beta) Python 3.2/3.3
+
+* Compatible and generally much faster (JIT)
+
+Benchmarks
+--------------------------------
+
+.. image:: speed.png
+  :scale: 44%
+  :align: center
+
+Demo
+--------------------------------
+
+
+Recent developments
+--------------------------------
+
+Between PyPy 2.0 (May 2013) and PyPy 2.4 (now):
+
+.
+
+* All kinds of speed improvements for all kinds of programs
+
+  - JIT improvements, incremental GC (garbage collector),
+    specific Python corners improved, ...
+
+* Support for ARM in addition to x86
+
+  - Thanks to the Raspberry Pi Foundation
+
+* Python 3 support
+
+  - py3k, in addition to Python 2.7
+
+* Numpy more complete (but still not done)
+
+Status
+-----------------------------
+
+- Python code "just works"
+
+  * generally much faster than with CPython
+
+- C code: improving support
+
+  * cpyext: tries to load CPython C extension modules, slowly
+
+  * CFFI: the future
+
+  * cppyy for C++
+
+  * A very small native PyPy C API for embedding, WIP
+
+- Lots of CFFI modules around:
+
+  * pyopenssl, pygame_cffi, psycopg2cffi, lxml...
+
+Fundraising Campaign
+---------------------
+
+- py3k: 55'000 $ of 105'000 $ (52%)
+
+- numpy: 48'000 $ of 60'000 $ (80%)
+
+- STM, 1st call: 38'000 $
+
+- STM, 2nd call: 17'000 $ of 80'000 $ (22%)
+
+- Thanks to all donors!
+
+Commercial support
+------------------
+
+- We offer commercial support for PyPy
+
+- Consultancy and training
+
+- Performance issues for open- or closed-source programs, porting,
+  improving support in parts of the Python or non-Python interpreters,
+  etc.
+
+- http://baroquesoftware.com
+
+Recent developments (2)
+--------------------------------
+
+* CFFI
+
+  - C Foreign Function Interface
+
+* STM
+
+  - Software Transactional Memory
+
+CFFI
+-----
+
+- Python <-> C interfacing done right
+
+  * existing shared libraries
+
+  * custom C code
+
+- An alternative to the CPython Extension API, ctypes, Cython, etc.
+
+- Fast-ish on CPython, super-fast on PyPy, Jython support in the future
+
+- Simple, does not try to be magic
+
+CFFI
+----
+
+.. image:: standards.png
+  :scale: 50%
+  :align: center
+
+CFFI demo
+---------
+
+CFFI idea
+---------
+
+* C and Python are enough, we don't need an extra language
+
+* C is well defined, let's avoid magic
+
+* all the logic (and magic!) can be done in Python
+
+* API vs ABI
+
+* Inspired by LuaJIT's FFI
+
+Work in Progress: STM
+---------------------
+
+- Software Transactional Memory
+
+- Solving the GIL problem
+
+  * GIL = Global Interpreter Lock
+
+- Without bringing in the threads-and-locks mess
+
+- Preliminary versions of pypy-jit-stm available
+
+STM (2)
+-------
+
+- STM = Free Threading done right
+
+  * with some overhead: 30-40% so far
+
+- Done at the level of RPython
+
+- The interpreter author doesn't have to worry
+  about adding tons of locks
+
+  - that's us
+
+- The user *can* if he likes, but doesn't have to either
+
+  - that's you ``:-)``
+
+STM (3)
+-------
+
+- Works "like a GIL" but runs optimistically in parallel
+
+- A few bytecodes from thread A run on core 1
+
+- A few bytecodes from thread B run on core 2
+
+- If there is no conflict, we're happy
+
+- If there is a conflict, one of the two aborts and retries
+
+- Same effect as transactions in databases
+
+STM (4)
+-------
+
+- Threading made simpler for the user
+
+- It is generally efficient with *very coarse locks*
+
+  * no fine-grained locking needed
+
+- Easy to convert a number of existing single-threaded programs
+  (see the sketch at the end of the talk):
+
+  * start multiple threads, run blocks of code in each
+
+  * use a single lock around everything
+
+  * normally, you win absolutely nothing
+
+  * but STM can (try to) *execute the blocks in parallel* anyway
+
+STM (Demo)
+----------
+
+PyPy and RPython
+---------------------------
+
+* PyPy is an interpreter/JIT compiler for Python
+
+* PyPy is written in RPython
+
+* RPython is a language for writing interpreters:
+  it provides GC-for-free, JIT-for-free, etc.
+
+* Ideal for writing VMs for dynamic languages
+
+More PyPy-Powered Languages
+----------------------------
+
+- Topaz: implementing Ruby
+
+  * most of the language implemented
+
+  * "definitely faster than MRI"
+
+  * https://github.com/topazproject/topaz
+
+- HippyVM: implementing PHP
+
+  * ~7x faster than standard PHP
+
+  * comparable speed to HHVM
+
+  * http://hippyvm.com/
+
+- And more
+
+Future
+------
+
+* the future is hard to predict
+
+* continue working on general improvements
+
+* improved IO performance in the pipeline
+
+* warmup improvements
+
+* numpy
+
+Warmup improvements
+-------------------
+
+* biggest complaint: slow to warm up, memory hog
+
+* we have ideas on how to improve the situation
+
+* still looking for funding
+
+Numpy
+-----
+
+* numpy is mostly complete
+
+* performance can be improved, especially for the vectorized versions
+
+* scipy, matplotlib, the entire ecosystem: we have a hackish idea
+
+Contacts, Q&A
+--------------
+
+- http://pypy.org
+
+- http://morepypy.blogspot.com/
+
+- ``#pypy`` at freenode.net
+
+- Any question?
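+
+Appendix: atomic blocks (sketch)
+--------------------------------
+
+Illustrative only (not from the original deck), assuming pypy-stm's
+documented ``__pypy__.thread.atomic`` context manager::
+
+    import threading
+    from __pypy__.thread import atomic   # pypy-stm only
+
+    counter = [0]
+
+    def worker():
+        for _ in xrange(100000):
+            with atomic:            # acts like one global lock, but STM
+                counter[0] += 1     # may still run blocks in parallel
+
+    threads = [threading.Thread(target=worker) for _ in range(4)]
+    for t in threads:
+        t.start()
+    for t in threads:
+        t.join()
+    assert counter[0] == 400000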