Author: Armin Rigo <ar...@tunes.org>
Branch: stm-thread
Changeset: r54984:1b4bb89d133e
Date: 2012-05-09 11:56 +0200
http://bitbucket.org/pypy/pypy/changeset/1b4bb89d133e/
Log:	Starting to write some documentation.

diff --git a/pypy/doc/stm.rst b/pypy/doc/stm.rst
new file mode 100644
--- /dev/null
+++ b/pypy/doc/stm.rst
@@ -0,0 +1,167 @@

=============================
Software Transactional Memory
=============================


Introduction
============

PyPy can be translated in a special mode based on Software Transactional
Memory (STM). This mode is not compatible with the JIT so far, and
moreover adds a constant run-time overhead in the range of 2x to 5x. The
benefit is that the resulting ``pypy-stm`` can execute multiple threads
of Python code in parallel.


High-level interface
====================

At the lowest level, the Global Interpreter Lock (GIL) was just
replaced with STM techniques. This gives a ``pypy-stm`` that should
behave identically to a regular GIL-enabled PyPy, but run multithreaded
programs in a way that scales with the number of cores. The details of
the implementation are explained below.

However, what we are pushing for is *not writing multithreaded programs*
at all. It is possible to use higher-level interfaces. The basic one
is found in the ``transaction`` module (XXX name to change). Minimal
example of usage::

    for i in range(10):
        transaction.add(do_stuff, i)
    transaction.run()

This schedules and runs all ten ``do_stuff(i)``. Each one appears to
run serially, but in random order. It is also possible to ``add()``
more transactions within each transaction, to schedule additional pieces
of work. The call to ``run()`` returns when all transactions have
completed.

The module is written in pure Python (XXX not written yet, add url).
See the source code to see how it is based on the `low-level interface`_.


Low-level interface
===================

``pypy-stm`` offers one additional low-level API: ``thread.atomic``.
This is a context manager to use in a ``with`` statement.
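To make the connection between the two interfaces concrete, here is a
hypothetical pure-Python sketch of the ``transaction`` module described
above (the real module is not written yet). On a regular Python, a
single global lock stands in for ``thread.atomic``: it gives the same
serialized behavior, just without any actual parallelism. The
``num_threads`` argument is an invented detail, not part of the real
interface::

    import threading

    # A plain lock emulates ``thread.atomic`` outside pypy-stm:
    # transactions run one at a time instead of in parallel.
    _atomic = threading.Lock()
    _pending = []

    def add(f, *args):
        """Schedule f(*args) to run later as one transaction."""
        _pending.append((f, args))

    def run(num_threads=4):
        """Run all scheduled transactions; return when all are done."""
        def worker():
            while True:
                with _atomic:
                    if not _pending:
                        return          # no work left for this worker
                    f, args = _pending.pop()
                    f(*args)            # may call add() to schedule more
        threads = [threading.Thread(target=worker)
                   for _ in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

With this sketch, the earlier example works unchanged: each scheduled
``do_stuff(i)`` runs serialized with respect to the others, in an
arbitrary order, and ``run()`` returns once the queue is empty.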
Any code
running in the ``with thread.atomic`` block is guaranteed to be fully
serialized with respect to any code run by other threads (so-called
*strong isolation*).

Note that this is a guarantee of observed behavior: under the conditions
described below, multiple ``thread.atomic`` blocks can actually run in
parallel.

Classical minimal example: in a thread, you want to pop an item from
``list1`` and append it to ``list2``, knowing that both lists can be
mutated concurrently by other threads. Using ``thread.atomic`` this
can be done without careful usage of locks::

    with thread.atomic:
        x = list1.pop()
        list2.append(x)

Note that, unlike this minimal example, the expected usage is that
``thread.atomic`` blocks are potentially very complex and long-running.
This is what you typically get with the `high-level interface`_.


Interaction with locks
======================

Existing multithreaded programs usually rely on locks, either directly
from ``thread.allocate_lock()`` or by using variants from the
``threading`` module. Moreover, some operations in the interpreter
itself acquire locks internally; most notably, any file access does.

These locks work fine in ``pypy-stm``, both outside and inside
``thread.atomic`` blocks. However, due to hard technical issues, it is
not really possible for them to work correctly if a ``thread.atomic``
block tries to acquire a lock that is already acquired. In that
situation (only), acquiring the lock raises ``thread.error``.

A common way for this issue to show up is using ``print`` statements,
because of the internal lock on ``stdout``. You are free to use
``print`` either outside ``thread.atomic`` blocks or inside them, but
not both concurrently. An easy fix is to put all ``print`` statements
inside ``thread.atomic`` blocks.
Writing this kind of code::

    with thread.atomic:
        print "hello, the value is:"
        print "\t", value

also helps ensure that the whole line or lines are printed atomically,
instead of being broken up with interleaved output from other threads.

It is thus generally a good idea to protect ``print`` statements with
``thread.atomic``. But not all file operations benefit: if you have a
read from a file that may block for some time, putting it in a
``thread.atomic`` block would have the negative effect of suspending
all other threads while we wait for data to arrive, as described
next__.

.. __: Parallelization_


Parallelization
===============

How much actual parallelization a multithreaded program can see is a bit
subtle. The exact rules may vary over time, too. Here is an overview.

Each thread actually runs as a sequence of "transactions", which are
separated by "transaction breaks". The execution of the whole
multithreaded program works as if all transactions were serialized,
while actually executing the transactions in parallel.

This works as long as two principles are respected. The first one is
that the transactions must not *conflict* with each other. The most
obvious sources of conflicts are threads that all increment a global
shared counter, or that all store the results of their computations
into the same shared list. (It is expected that some STM-aware library
will eventually be designed to help with such sharing problems, like an
STM-aware list or queue.)

The other principle is that of avoiding long-running "inevitable"
transactions. This is more subtle to understand. The first thing we
need to know is where transaction breaks occur. There are two modes of
execution: either we are in a ``with thread.atomic`` block (or possibly
several nested ones), or not.
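Coming back to the first principle for a moment: the shared-counter
conflict mentioned above can usually be engineered away by letting each
thread do its real work on private data and touch shared state only
once, at the end. The following is a hypothetical sketch, written with
plain threads and a lock so that it runs on any Python; under
``pypy-stm`` the only potential conflict would be the short final
``append``::

    import threading

    NUM_THREADS = 4
    N = 1000

    results = []                  # each thread appends exactly one number
    results_lock = threading.Lock()

    def count_privately(n):
        # All the real work happens on private data: under pypy-stm,
        # transactions doing this would not conflict with each other.
        local = 0
        for _ in range(n):
            local += 1
        # Only this short final step touches shared state.
        with results_lock:
            results.append(local)

    threads = [threading.Thread(target=count_privately, args=(N,))
               for _ in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    total = sum(results)          # 4 threads * 1000 each

Compared with four threads incrementing one shared counter, every
transaction here works on objects no other thread reads or writes, so
they can actually run in parallel.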

If we are not in ``thread.atomic`` mode, then transaction breaks occur
at the following places:

* from time to time between the execution of two bytecodes;
* across an external system call.

Transaction breaks *never* occur in ``thread.atomic`` mode.

Every transaction can further be in one of two modes: either "normal" or
"inevitable". To simplify, a transaction starts in "normal" mode, but
switches to "inevitable" as soon as it performs input/output. If we
have an inevitable transaction, all other transactions are paused; this
effect is similar to the GIL.

In the absence of ``thread.atomic``, inevitable transactions only have a
small effect. Indeed, as soon as the current bytecode finishes, the
interpreter notices that the transaction is inevitable and immediately
introduces a transaction break in order to switch back to a normal-mode
transaction. This means that inevitable transactions only run for a
short fraction of the time.

With ``thread.atomic`` however you have to be a bit careful, because the
next transaction break will only occur after the end of the outermost
``with thread.atomic``. Basically, you should organize your code in
such a way that for any ``thread.atomic`` block that runs for a
noticeable time, any I/O is done near the end of it, not when there is
still a lot of CPU time ahead.


Implementation
==============

XXX

diff --git a/pypy/module/thread/stm.py b/pypy/module/thread/stm.py
--- a/pypy/module/thread/stm.py
+++ b/pypy/module/thread/stm.py
@@ -36,7 +36,7 @@
         if flag and not acquired:
             raise wrap_thread_error(self.space,
                 "deadlock: an atomic transaction tries to acquire "
-                "a lock that is already acquired. See http://XXX.")
+                "a lock that is already acquired. See pypy/doc/stm.rst.")
         else:
             acquired = ll_thread.Lock.acquire(self, flag)
         return acquired

_______________________________________________
pypy-commit mailing list
pypy-commit@python.org
http://mail.python.org/mailman/listinfo/pypy-commit