Author: Konstantin Lopuhin <[email protected]> Branch: extradoc Changeset: r5457:02eb68ef992d Date: 2014-11-02 01:23 +0400 http://bitbucket.org/pypy/extradoc/changeset/02eb68ef992d/
Log: initial version of tornado stm blog post diff --git a/blog/draft/tornado-stm.rst b/blog/draft/tornado-stm.rst new file mode 100644 --- /dev/null +++ b/blog/draft/tornado-stm.rst @@ -0,0 +1,144 @@ +Tornado without a GIL on PyPy STM +================================= + +Python has a GIL, right? Not quite - PyPy STM is a Python implementation +without a GIL, so it can scale CPU-bound work to several cores. +More than that, it proposes an easier to reason about model +that is not based on threads (although you can use threads too). +PyPy STM is developed by Armin Rigo and Remi Meier, +and supported by community `donations <http://pypy.org/tmdonate2.html>`_. +You can read more about it in the +`docs <http://pypy.readthedocs.org/en/latest/stm.html>`_. + +Although PyPy STM is still a work in progress, in many cases it can already +run CPU-bound code faster than regular PyPy when using multiple cores. +Here we will see how to slightly modify Tornado IO loop to use +`transaction <https://bitbucket.org/pypy/pypy/raw/stmgc-c7/lib_pypy/transaction.py>`_ +module. +This module is `descibed <http://pypy.readthedocs.org/en/latest/stm.html#atomic-sections-transactions-etc-a-better-way-to-write-parallel-programs>`_ +in the docs and is really simple to use if you have several things to do where +you do not care about the order they are run, as long as they +are run separately. So an event loop of Tornado, or any other asynchronous +framework, looks a bit like this (with some simplifications):: + + while True: + for callback in list(self._callbacks): + self._run_callback(callback) + event_pairs = self._impl.poll() + self._events.update(event_pairs) + while self._events: + fd, events = self._events.popitem() + handler = self._handlers[fd] + self._handle_event(fd, handler, events) + +We get IO events, and run handlers for all of them, these handlers can +also register new callbacks, which we run too. Then using such a framework, +it is very nice to have a garanty that all things are run serially, +so you do not have to put any locks. So this is an ideal case for the +transaction module - it gives us garanties that things appear +to be run serially, so in user code we do not need any locks. We just +need to change the code above to something like:: + + while True: + for callback in list(self._callbacks): + transaction.add( + self._run_callback, callback) # added + transaction.run() # added + event_pairs = self._impl.poll() + self._events.update(event_pairs) + while self._events: + fd, events = self._events.popitem() + handler = self._handlers[fd] + transaction.add( # added + self._handle_event, fd, handler, events) + transaction.run() # added + +The actual commit is +`here <https://github.com/lopuhin/tornado/commit/246c5e71ce8792b20c56049cf2e3eff192a01b20>`_, +- we had to extract a little function to run the callback. + +Now we need a simple benchmark, lets start with +`this <https://bitbucket.org/kostialopuhin/tornado-stm-bench/src/a038bf99de718ae97449607f944cecab1a5ae104/primes.py?at=default>`_ +- just calculate a list of primes up to the given number, and return it +as JSON:: + + def is_prime(n): + for i in xrange(2, n): + if n % i == 0: + return False + return True + + class MainHandler(tornado.web.RequestHandler): + def get(self, num): + num = int(num) + primes = [n for n in xrange(2, num + 1) if is_prime(n)] + self.write(json.dumps({'primes': primes})) + + +We can benchmark it with ``siege``:: + + siege -c 50 -t 20s http://localhost:8888/10000 + +But this does not scale. The CPU load is at 101-104 %, and we handle 30 % +less request per second. The reason for the slowdown is STM overhead, +which needs to keep track of all writes and reads in order to detect conflicts. +And the reason for using only one core is, obviously, conflicts! +Fortunately, we can see what this conflicts are, if we run code like this +(here 4 is the number of cores to use):: + + PYPYSTM=stm.log ./primes.py 4 + +Than we can use `print_stm_log.py <https://bitbucket.org/pypy/pypy/raw/stmgc-c7/pypy/stm/print_stm_log.py>`_ +to analyse this log. It lists the most expensive conflicts:: + + 14.793s lost in aborts, 0.000s paused (1258x STM_CONTENTION_INEVITABLE) + File "/home/ubuntu/tornado-stm/tornado/tornado/httpserver.py", line 455, in __init__ + self._start_time = time.time() + File "/home/ubuntu/tornado-stm/tornado/tornado/httpserver.py", line 455, in __init__ + self._start_time = time.time() + + ... + +There are only three kinds of conflicts, they are described in +`stm source <https://bitbucket.org/pypy/pypy/src/6355617bf9a2a0fa8b74ae17906e4a591b38e2b5/rpython/translator/stm/src_stm/stm/contention.c?at=stmgc-c7>`_, +Here we see that two theads call into external function (get current time), +and we can not rollback any of them, so one of them must wait till the other +transaction finishes. +For now we can hack around this by disabling this timing - this is only +needed for internal profiling in tornado. + +If we do it, we get the following results: + +============ ========= +Impl. req/s +============ ========= +PyPy 2.4 14.4 +------------ --------- +CPython 2.7 3.2 +------------ --------- +PyPy-STM 1 9.3 +------------ --------- +PyPy-STM 2 16.4 +------------ --------- +PyPy-STM 3 20.4 +------------ --------- +PyPy STM 4 24.2 +============ ========= + +As we can see, in this benchmark PyPy STM using just two cores +can beat regular PyPy! +This is not linear scaling, there are still conflicts left, and this +is a very simple example but still, it works! And it was easy! + +Although it is defintily not ready for production use, you can alreay try +to run things, report bugs, and see what is missing in user-facing tools +and libraries. + +Benchmark setup: + +* Amazon c3.xlarge (4 cores) running Ubuntu 14.04 +* pypy-c-r74011-stm-jit +* http://bitbucket.org/kostialopuhin/tornado-stm-bench at a038bf9 +* for PyPy-STM in this test the variation is rather high (around 20%), + best results from ``./bench_primes.sh`` were reported + _______________________________________________ pypy-commit mailing list [email protected] https://mail.python.org/mailman/listinfo/pypy-commit
