Author: Maciej Fijalkowski <fij...@gmail.com>
Branch: extradoc
Changeset: r5073:8ae7445f165e
Date: 2013-10-14 17:57 +0200
http://bitbucket.org/pypy/extradoc/changeset/8ae7445f165e/
Log: merge diff --git a/blog/draft/stm-oct2013.rst b/blog/draft/stm-oct2013.rst new file mode 100644 --- /dev/null +++ b/blog/draft/stm-oct2013.rst @@ -0,0 +1,78 @@ +Update on STM +============= + +Hi all, + +the sprint in London was a lot of fun and very fruitful. In the last +update on STM, Armin was working on improving and specializing the +automatic barrier placement. +There is still a lot to do in that area, but that work was merged and +lowered the overhead of STM over non-STM to around **XXX**. The same +improvement has still to be done in the JIT. + +But that is not all. Right after the sprint, we were able to squeeze +the last obvious bugs in the STM-JIT combination. However, the performance +was nowhere near to what we want. So until now, we fixed some of the most +obvious issues. Many come from RPython erring on the side of caution +and e.g. making a transaction inevitable even if that is not strictly +necessary, thereby limiting parallelism. +**XXX any interesting details? transaction breaks maybe? guard counters?** +There are still many performance issues of various complexity left +to tackle. So stay tuned or contribute :) + +Now, since the JIT is all about performance, we want to at least +show you some numbers that are indicative of things to come. +Our set of STM benchmarks is very small unfortunately +(something you can help us out with), so this is +not representative of real-world performance. We tried to +minimize the effect of JIT warm-up in the benchmark results. 
+ + +**Raytracer** from `stm-benchmarks <https://bitbucket.org/Raemi/stm-benchmarks/src>`_: +Render times in seconds for a 1024x1024 image: + ++-------------+----------------------+-------------------+ +| Interpreter | Base time: 1 thread | 8 threads | ++=============+======================+===================+ +| PyPy-2.1 | 2.47 | 2.56 | ++-------------+----------------------+-------------------+ +| CPython | 81.1 | 73.4 | ++-------------+----------------------+-------------------+ +| PyPy-STM | 50.2 | 10.8 | ++-------------+----------------------+-------------------+ + +For comparison, disabling the JIT gives 148ms on PyPy-2.1 and 87ms on +PyPy-STM (with 8 threads). + +**Richards** from `PyPy repository on the stmgc-c4 +branch <https://bitbucket.org/pypy/pypy/commits/branch/stmgc-c4>`_: +Average time per iteration in milliseconds using 8 threads: + ++-------------+----------------------+-------------------+ +| Interpreter | Base time: 1 thread | 8 threads | ++=============+======================+===================+ +| PyPy-2.1 | 15.6 | 15.4 | ++-------------+----------------------+-------------------+ +| CPython | 239 | 237 | ++-------------+----------------------+-------------------+ +| PyPy-STM | 371 | 116 | ++-------------+----------------------+-------------------+ + +For comparison, disabling the JIT gives 492ms on PyPy-2.1 and 538ms on +PyPy-STM. + +All this can be found in the `PyPy repository on the stmgc-c4 +branch <https://bitbucket.org/pypy/pypy/commits/branch/stmgc-c4>`_. +Try it for yourself, but keep in mind that this is still experimental +with a lot of things yet to come. + +You can also download a prebuilt binary from here: **XXX** + +As a summary, what the numbers tell us is that PyPy-STM is, as expected, +the only of the three interpreters where multithreading gives a large +improvement in speed. 
What they also tell us is that, obviously, the +result is not good enough *yet:* it still takes longer on a 8-threaded +PyPy-STM than on a regular single-threaded PyPy-2.1. As you should know +by now, we are good at promizing speed and delivering it years later. +It has been two years already since PyPy-STM started, so we're in the +fast-progressing step right now :-) diff --git a/blog/draft/stm-sept2013.rst b/blog/draft/stm-sept2013.rst deleted file mode 100644 --- a/blog/draft/stm-sept2013.rst +++ /dev/null @@ -1,52 +0,0 @@ -Update on STM -============= - -Hi all, - -the sprint in London was a lot of fun and very fruitful. In the last -update on STM, Armin was working on improving and specializing the -automatic barrier placement. -There is still a lot to do in that area, but that work was merged and -lowered the overhead of STM over non-STM to around **XXX**. The same -improvement has still to be done in the JIT. - -But that is not all. Right after the sprint, we were able to squeeze -the last obvious bugs in the STM-JIT combination. However, the performance -was nowhere near what we want. So until now, we fixed some of the most -obvious issues. Many come from RPython erring on the side of caution -and e.g. making a transaction inevitable even if that is not strictly -necessary, thereby limiting parallelism. -**XXX any interesting details?** -There are still many performance issues of various complexity left -to tackle. So stay tuned or contribute :) - -Now, since the JIT is all about performance, we want to at least -show you some numbers that are indicative of things to come. -Our set of STM benchmarks is very small unfortunately -(something you can help us out with), so this is -not representative of real-world performance. 
- -**Raytracer** from `stm-benchmarks <https://bitbucket.org/Raemi/stm-benchmarks/src>`_: -Render times for a 1024x1024 image using 6 threads - -+-------------+----------------------+ -| Interpeter | Time (no-JIT / JIT) | -+=============+======================+ -| PyPy-2.1 | ... / ... | -+-------------+----------------------+ -| CPython | ... / - | -+-------------+----------------------+ -| PyPy-STM | ... / ... | -+-------------+----------------------+ - -**XXX same for Richards** - - -All this can be found in the `PyPy repository on the stmgc-c4 -branch <https://bitbucket.org/pypy/pypy/commits/branch/stmgc-c4>`_. -Try it for yourself, but keep in mind that this is still experimental -with a lot of things yet to come. - -You can also download a prebuilt binary frome here: **XXX** - - diff --git a/planning/jit.txt b/planning/jit.txt --- a/planning/jit.txt +++ b/planning/jit.txt @@ -45,9 +45,6 @@ (SETINTERIORFIELD, GETINTERIORFIELD). This is needed for the previous item to fully work. -- {}.update({}) is not fully unrolled and constant folded because HeapCache - loses track of values in virtual-to-virtual ARRAY_COPY calls. - - ovfcheck(a << b) will do ``result >> b`` and check that the result is equal to ``a``, instead of looking at the x86 flags. 
diff --git a/talk/pyconza2013/Makefile b/talk/pyconza2013/Makefile --- a/talk/pyconza2013/Makefile +++ b/talk/pyconza2013/Makefile @@ -1,13 +1,13 @@ view: talk.pdf - xpdf talk.pdf + evince talk.pdf talk.pdf: talk.tex 64bit pdflatex talk.tex -talk.tex: talk1.tex fix.py - python fix.py < talk1.tex > talk.tex +talk.tex: talk.rst + rst2beamer --stylesheet=stylesheet.latex --documentoptions=14pt --input-encoding=utf8 --output-encoding=utf8 --overlaybullets=false $< > talk.tex -talk1.tex: talk.rst - rst2beamer $< > talk1.tex +clean: + rm -f talk.tex talk.pdf diff --git a/talk/pyconza2013/stylesheet.latex b/talk/pyconza2013/stylesheet.latex new file mode 100644 --- /dev/null +++ b/talk/pyconza2013/stylesheet.latex @@ -0,0 +1,10 @@ +\usetheme{Warsaw} +\usecolortheme{whale} +\setbeamercovered{transparent} +\definecolor{darkgreen}{rgb}{0, 0.5, 0.0} +\newcommand{\docutilsrolegreen}[1]{\color{darkgreen}#1\normalcolor} +\newcommand{\docutilsrolered}[1]{\color{red}#1\normalcolor} +\addtobeamertemplate{block begin}{}{\setlength{\parskip}{35pt plus 1pt minus 1pt}} + +\newcommand{\green}[1]{\color{darkgreen}#1\normalcolor} +\newcommand{\red}[1]{\color{red}#1\normalcolor} diff --git a/talk/pyconza2013/talk.pdf b/talk/pyconza2013/talk.pdf index 6fed83a5c845e1d71cd4c32a98eb6a6b93d07bcf..fec69aacfbd0fc9af5c9c60eb65501eed188fc5a GIT binary patch [cut] diff --git a/talk/pyconza2013/talk.rst b/talk/pyconza2013/talk.rst --- a/talk/pyconza2013/talk.rst +++ b/talk/pyconza2013/talk.rst @@ -1,25 +1,25 @@ .. include:: beamerdefs.txt -======================================= -Software Transactional Memory with PyPy -======================================= +.. raw:: latex + \title{Software Transactional Memory with PyPy} + \author[arigo]{Armin Rigo} -Software Transactional Memory with PyPy ---------------------------------------- + \institute{PyCon ZA 2013} + \date{4th October 2013} -* PyCon ZA 2013 - -* talk by Armin Rigo - -* sponsored by crowdfunding (thanks!) 
+ \maketitle Introduction ------------ +* me: Armin Rigo + * what is PyPy: an alternative implementation of Python +* very compatible + * main focus is on speed @@ -27,13 +27,21 @@ ------------ .. image:: speed.png - :scale: 65% + :scale: 67% :align: center SQL by example -------------- +.. raw:: latex + + %empty + + +SQL by example +-------------- + :: BEGIN TRANSACTION; @@ -58,6 +66,27 @@ :: + ... + obj.value += 1 + ... + + +Python by example +----------------- + +:: + + ... + x = obj.value + obj.value = x + 1 + ... + + +Python by example +----------------- + +:: + begin_transaction() x = obj.value obj.value = x + 1 @@ -100,10 +129,10 @@ :: - BEGIN TRANSACTION; BEGIN TRANSACTION; BEGIN.. - SELECT * FROM ...; SELECT * FROM ...; SELEC.. - UPDATE ...; UPDATE ...; UPDAT.. - COMMIT; COMMIT; COMMI.. + BEGIN TRANSACTION; BEGIN TRANSACTION; BEGIN.. + SELECT * FROM ...; SELECT * FROM ...; SELEC.. + UPDATE ...; UPDATE ...; UPDAT.. + COMMIT; COMMIT; COMMI.. Locks != Transactions @@ -111,9 +140,9 @@ :: - with the_lock: with the_lock: with .. - x = obj.val x = obj.val x =.. - obj.val = x + 1 obj.val = x + 1 obj.. + with the_lock: with the_lock: with .. + x = obj.val x = obj.val x =.. + obj.val = x + 1 obj.val = x + 1 obj.. Locks != Transactions @@ -121,9 +150,9 @@ :: - with atomic: with atomic: with .. - x = obj.val x = obj.val x =.. - obj.val = x + 1 obj.val = x + 1 obj.. + with atomic: with atomic: with .. + x = obj.val x = obj.val x =.. + obj.val = x + 1 obj.val = x + 1 obj.. 
STM @@ -134,14 +163,46 @@ * advanced but not magic (same as databases) -STM versus HTM --------------- +By the way +---------- -* Software versus Hardware +* STM replaces the GIL (Global Interpreter Lock) -* CPU hardware specially to avoid the high overhead +* any existing multithreaded program runs on multiple cores -* too limited for now + +By the way +---------- + +* the GIL is necessary and very hard to avoid, + but if you look at it like a lock around every single + subexpression, then it can be replaced with `with atomic` too + + +So... +----- + +* yes, any existing multithreaded program runs on multiple cores + +* yes, we solved the GIL + +* great + + +So... +----- + +* no, it would be quite hard to implement it in standard CPython + +* too bad for now, only in PyPy + +* but it would not be completely impossible + + +But... +------ + +* but only half of the story in my opinion `:-)` Example 1 @@ -149,11 +210,13 @@ :: - def apply_interest_rate(self): + def apply_interest(self): self.balance *= 1.05 + for account in all_accounts: - account.apply_interest_rate() + account.apply_interest() + . Example 1 @@ -161,12 +224,27 @@ :: - def apply_interest_rate(self): + def apply_interest(self): self.balance *= 1.05 + for account in all_accounts: - add_task(account.apply_interest_rate) - run_tasks() + account.apply_interest() + ^^^ run this loop multithreaded + + +Example 1 +--------- + +:: + + def apply_interest(self): + #with atomic: --- automatic + self.balance *= 1.05 + + for account in all_accounts: + add_task(account.apply_interest) + run_all_tasks() Internally @@ -178,6 +256,8 @@ * uses threads, but internally only +* very simple, pure Python + Example 2 --------- @@ -187,7 +267,7 @@ def next_iteration(all_trains): for train in all_trains: start_time = ... - for othertrain in train.dependencies: + for othertrain in train.deps: if ...: start_time = ... 
train.start_time = start_time @@ -215,37 +295,29 @@ * but with `objects` instead of `records` -* the transaction aborts and automatically retries +* the transaction aborts and retries automatically Inevitable ---------- -* means "unavoidable" +* "inevitable" (means "unavoidable") * handles I/O in a `with atomic` * cannot abort the transaction any more -By the way ----------- - -* STM replaces the GIL - -* any existing multithreaded program runs on multiple cores - - Current status -------------- * basics work, JIT compiler integration almost done -* different executable called `pypy-stm` +* different executable (`pypy-stm` instead of `pypy`) * slow-down: around 3x (in bad cases up to 10x) -* speed-ups measured with 4 cores +* real time speed-ups measured with 4 or 8 cores * Linux 64-bit only @@ -258,9 +330,11 @@ :: Detected conflict: + File "foo.py", line 58, in wtree + walk(root) File "foo.py", line 17, in walk if node.left not in seen: - Transaction aborted, 0.000047 seconds lost + Transaction aborted, 0.047 sec lost User feedback @@ -273,11 +347,11 @@ Forced inevitable: File "foo.py", line 19, in walk print >> log, logentry - Transaction blocked others for 0.xx seconds + Transaction blocked others for XX s -Async libraries ---------------- +Asynchronous libraries +---------------------- * future work @@ -287,11 +361,11 @@ * existing Twisted apps still work, but we need to look at conflicts/inevitables -* similar with Tornado, gevent, and so on +* similar with Tornado, eventlib, and so on -Async libraries ---------------- +Asynchronous libraries +---------------------- :: @@ -318,6 +392,16 @@ * reduce slow-down, port to other OS'es +STM versus HTM +-------------- + +* Software versus Hardware + +* CPU hardware specially to avoid the high overhead (Intel Haswell processor) + +* too limited for now + + Under the cover --------------- @@ -329,8 +413,8 @@ * the most recent version can belong to one thread -* synchronization only when a thread "steals" another 
thread's most - recent version, to make it shared +* synchronization only at the point where one thread "steals" + another thread's most recent version, to make it shared * integrated with a generational garbage collector, with one nursery per thread @@ -345,4 +429,8 @@ * a small change for Python users +* (and the GIL is gone) + +* this work is sponsored by crownfunding (thanks!) + * `Q & A`