This is still rather rough, but I figured it's easier to let everybody fill in the remaining gaps by arguments than it is for me to pick a position I like and try to convince everybody else that it's right. :) Your feedback is requested and welcome.
PEP: XXX
Title: Task-local Variables
Author: Phillip J. Eby <[EMAIL PROTECTED]>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Oct-2005
Python-Version: 2.5
Post-History: 19-Oct-2005


Abstract
========

Many Python modules provide some kind of global or thread-local state,
which is relatively easy to implement. With the acceptance of PEP 342,
however, co-routines will become more common, and it will be desirable
in many cases to treat each as its own logical thread of execution. So,
many kinds of state that might now be kept as a thread-specific variable
(such as the "current transaction" in ZODB or the "current database
connection" in SQLObject) will not work with coroutines.

This PEP proposes a simple mechanism akin to thread-local variables, but
which will make it easy and efficient for co-routine schedulers to
switch state between tasks. The mechanism is proposed for the standard
library because its usefulness is dependent on its adoption by standard
library modules, such as the ``decimal`` module.

The proposed features can be implemented as pure Python code, and as
such are suitable for use by other Python implementations (including
older versions of Python, if desired).


Motivation
==========

PEP 343's new "with" statement makes it very attractive to temporarily
alter some aspect of system state, and then restore it, using a context
manager. Many of PEP 343's examples are of this nature, whether they are
temporarily redirecting ``sys.stdout``, or temporarily altering decimal
precision.

But when this attractive feature is combined with PEP 342-style
co-routines, a new challenge emerges. Consider this code, which may
misbehave if run as a co-routine::

    with opening(filename, "w") as f:
        with redirecting_stdout(f):
            print "Hello world"
            yield pause(5)
            print "Goodbye world"

Problems can arise from this code in two ways. First, the redirection
of output "leaks out" to other coroutines during the pause.
Second, when this coroutine is finished, it resets stdout to whatever it
was at the beginning of the coroutine, regardless of what another
co-routine might have been using.

Similar issues can be demonstrated using the decimal context,
transactions, database connections, etc., which are all likely to be
popular contexts for the "with" statement. However, if these new context
managers are written to use global or thread-local state, coroutines
will be locked out of the market, so to speak.

Therefore, this PEP proposes to provide and promote a standard way of
managing per-execution-context state, such that coroutine schedulers can
keep each coroutine's state distinct. If this mechanism is then used by
library modules (such as ``decimal``) to maintain their current state,
then they will be transparently compatible with co-routines as well as
threaded and threadless code.

(Note that for Python 2.x versions, backward compatibility requires that
we continue to allow direct reassignment to e.g. ``sys.stdout``. So, it
will still of course be possible to write code that will interoperate
poorly with co-routines. But for Python 3.x it seems worth considering
making some of the ``sys`` module's contents into task-local variables
rather than assignment targets.)


Specification
=============

This PEP proposes to offer a standard library module called ``context``,
with the following core contents:

Variable
    A class that allows creation of a context variable (see below).

snapshot()
    Returns a snapshot of the current execution context.

swap(ctx)
    Set the current context to `ctx`, returning a snapshot of the
    context that was current before the swap.

The basic idea here is that a co-routine scheduler can switch between
tasks by doing something like::

    last_coroutine.state = context.swap(next_coroutine.state)

Or perhaps more like::

    # ... execute coroutine iteration
    last_coroutine.state = context.snapshot()
    # ... figure out what routine to run next
    context.swap(next_coroutine.state)

Each ``context.Variable`` stores and retrieves its state using the
current execution context, which is thread-specific. (Thus, each thread
may execute any number of concurrent tasks, although most practical
systems today have only one thread that executes coroutines, the other
threads being reserved for operations that would otherwise block
co-routine execution. Nonetheless, such other threads will often still
require context variables of their own.)


Context Variable Objects
------------------------

A context variable object provides the following methods:

get(default=None)
    Return the value of the variable in the current execution context,
    or `default` if not set.

set(value)
    Set the value of the variable for the current execution context.

unset()
    Delete the value of the variable for the current execution context.

__call__(*value)
    If called with an argument, return a context manager that sets the
    variable to the specified value, then restores the old value upon
    ``__exit__``. If called without an argument, return the value of the
    variable for the current execution context, or raise an error if no
    value is set. Thus::

        with some_variable(value):
            foo()

    would be roughly equivalent to::

        old = some_variable()
        some_variable.set(value)
        try:
            foo()
        finally:
            some_variable.set(old)


Implementation Details
----------------------

The simplest possible implementation is for ``Variable`` objects to use
themselves as unique keys into an execution context dictionary. The
context dictionary would be stored in another dictionary, keyed by
``thread.get_ident()``. This approach would work with almost any version
or implementation of Python.

For efficiency's sake, however, CPython could simply store the execution
context dictionary in its "thread state" structure, creating an empty
dictionary at thread initialization time.
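
As a rough illustration of the simplest implementation described above,
the following sketch keeps one context dictionary per thread and uses
each ``Variable`` as its own key. It is only a sketch under stated
assumptions: it is written for Python 3, the helper names ``_current``,
``_Setting`` and ``_MISSING`` are hypothetical, and so is the choice of
``LookupError`` for an unset variable, none of which this PEP specifies.
Unlike the rough ``with`` equivalent given earlier, leaving the block
here unsets a variable that had no prior value. ``snapshot()`` and
``swap()`` simply copy the dictionary, with no optimization. The last
few lines play the part of a scheduler switching between two tasks.

```python
import threading

# Hypothetical pure-Python sketch of the proposed ``context`` module.
# One context dictionary per thread, keyed by the thread's ident.
_contexts = {}
_MISSING = object()  # sentinel: distinguishes "unset" from a stored None


def _current():
    """Return the calling thread's context dictionary, creating it if needed."""
    return _contexts.setdefault(threading.get_ident(), {})


class Variable(object):
    """A context variable; the instance itself is the dictionary key."""

    def get(self, default=None):
        return _current().get(self, default)

    def set(self, value):
        _current()[self] = value

    def unset(self):
        _current().pop(self, None)

    def __call__(self, *value):
        if value:
            return _Setting(self, value[0])
        try:
            return _current()[self]
        except KeyError:
            # Assumption: the PEP only says "raise an error" here.
            raise LookupError("context variable has no value")


class _Setting(object):
    """Context manager that sets a variable, then restores its prior state."""

    def __init__(self, var, value):
        self.var, self.value = var, value

    def __enter__(self):
        self.old = self.var.get(_MISSING)
        self.var.set(self.value)
        return self.value

    def __exit__(self, *exc_info):
        if self.old is _MISSING:
            self.var.unset()
        else:
            self.var.set(self.old)
        return False  # never swallow exceptions


def snapshot():
    """Return a copy of the current thread's context."""
    return dict(_current())


def swap(ctx):
    """Install a copy of ``ctx`` and return the context it replaced."""
    ident = threading.get_ident()
    old = _contexts.get(ident, {})
    _contexts[ident] = dict(ctx)
    return old


# Scheduler-style usage with two hypothetical tasks.
precision = Variable()
task_a, task_b = {}, {}      # saved contexts, initially empty

main = swap(task_a)          # switch to task A
precision.set(10)
task_a = swap(task_b)        # switch to task B, saving A's state
seen_in_b = precision.get()  # A's setting is not visible in B
precision.set(50)
task_b = swap(task_a)        # back to task A, saving B's state
seen_in_a = precision()      # A still sees its own value
swap(main)                   # restore the original context
```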
Storing the context in the thread state would also make it somewhat
easier to offer a C API for access to context variables, especially
where efficiency of access is desirable. But the proposal does not
depend on this.

In the PEP author's experiments, a simple copy-on-write optimization to
the ``set()`` and ``unset()`` methods allows for high performance task
switching. By placing a "frozen" flag in the context dictionary when a
snapshot is taken, and then checking for the flag before making changes,
a single snapshot can be shared by multiple callers, and thus a
``swap()`` operation is little more than two dictionary writes and a
read. This leads to higher performance in the typical case, because
context variables are more likely to be set in outer loops, but task
switches are more likely to occur in inner loops. A copy-on-write
approach thus prevents copying from occurring during most task switches.


Possible Enhancements
---------------------

The core of this proposal is extremely minimalist, as it should be
possible to do almost anything desired using combinations of
``Variable`` objects or by simply using variables whose values are
mutable objects. There are, however, a variety of options for
enhancement:

``manager`` decorator
    The ``context`` module could perhaps be the home of the PEP 343
    ``contextmanager`` decorator, effectively renamed to
    ``context.manager``. This could be a natural fit, in that it would
    remind the creators of new context managers that they should
    consider tracking any associated state in a ``context.Variable``.

Proxy class
    Sometimes it's useful to have an object that looks like a module
    global (e.g. ``sys.stdout``) but which actually delegates its
    behavior to a context-specific instance. Thus, you could have one
    ``sys.stdout``, but its actual output would be directed based on the
    current execution context.

    The simplest form of such a proxy class might look something like::

        class Proxy(object):
            def __init__(self, initial_value):
                # Use a local name: reading self.var back would go
                # through __getattribute__ and hit the proxied value.
                var = context.Variable()
                var.set(initial_value)
                self.var = var

            def __call__(self, *value):
                return object.__getattribute__(self, 'var')(*value)

            def __getattribute__(self, attr):
                var = object.__getattribute__(self, 'var')
                # Delegate to the value in the current execution context
                return getattr(var(), attr)

        sys.stdout = Proxy(sys.stdout)  # make sys.stdout selectable

        with sys.stdout(somefile):  # temporary redirect in current context
            print "hey!"

    The main open issues in implementing this sort of proxy are in the
    precise set of special methods (e.g. ``__getitem__``,
    ``__setattr__``, etc.) that should be supported, and what API should
    be supplied for changing the value, setting a default value for new
    threads, etc.

Low-level API
    Currently, this PEP does not specify an API for accessing and
    modifying the current execution context, nor a C API for such
    access. It currently assumes that ``snapshot()``, ``swap()`` and
    ``Variable`` are the only public means of accessing context
    information. It may be desirable to offer finer-grained APIs for
    more advanced uses (such as creating an API for management of
    proxies). And it may be desirable to have a C API for use by Python
    extensions that want convenient access to context variables.


Rationale
=========

Different libraries have different uses for maintaining a "current"
state, be it global or local to a specific thread or task. There is
currently no way for task-management code to find and switch all of
these "current" states. And even if it could, task switching performance
would degrade linearly as new libraries were added.

One possible alternative approach to this proposal would be for explicit
task objects to exist, and to provide a way to give them identities, so
that libraries could instead store their own state as a property of the
task, rather than storing their state in a task-specific mapping.
This offers similar potential performance to a copy-on-write strategy,
but would use more memory than this proposal when only one task is
involved. (Because each variable would have a dictionary mapping from
task to the variable's value, but in this proposal there is simply a
single dictionary for the task.)

Some languages offer "dynamically scoped" variables that are somewhat
similar in behavior to the context variables proposed by this PEP. The
principal differences are that:

1. Context variables are objects used to obtain or save a value, rather
   than being a syntactic construct of the language.

2. PEP 343 allows for *controlled* manipulation of context variables,
   helping to prevent "duelling libraries" from changing state on each
   other. Also, a library can potentially ``snapshot()`` a desired state
   at startup, and use ``swap()`` to restore that state on re-entry.
   (And could even define a simple decorator to wrap its entry points to
   ensure this.)

3. The PEP author is not aware of any language that explicitly offers
   coroutine-scoped variables, but presumes that they can be modelled
   with monads or continuations in functional languages like Haskell.
   (And I only mention this to forestall the otherwise-inevitable
   response from fans of such techniques, pointing out that it's
   possible.)


Reference Implementation
========================

The author has prototyped an implementation with somewhat fancier
features than shown here, but prefers not to publish it until the basic
features and choices of optional functionality have been discussed on
Python-Dev.


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: