Author: Armin Rigo <ar...@tunes.org>
Branch: extradoc
Changeset: r5170:bf6679f66da3
Date: 2014-04-04 11:26 +0200
http://bitbucket.org/pypy/extradoc/changeset/bf6679f66da3/

Log: finish the draft

diff --git a/planning/tmdonate2.txt b/planning/tmdonate2.txt
--- a/planning/tmdonate2.txt
+++ b/planning/tmdonate2.txt
@@ -253,7 +253,7 @@
 Goal 1
 ------

-The PyPy-STM that we have in the end of March 2014 is good enough in
+The PyPy-TM that we have in the end of March 2014 is good enough in
 some cases to run existing multithreaded code without a GIL, but not in
 all of them. There are a number of caveats for the user and missing
 optimizations. The goal #1 is to improve this case and address
@@ -279,7 +279,7 @@
 * Forking the process is slow.

 Fixing all these issues is required before we can confidently say that
-PyPy-STM is an out-of-the-box replacement of a regular PyPy which gives
+PyPy-TM is an out-of-the-box replacement of a regular PyPy which gives
 speed-ups over the regular PyPy independently of the Python program it
 runs, as long as it is using at least two threads.

@@ -292,46 +292,111 @@
 and libraries accessible from Python programs that want to make use of
 this benefit.

-XXX improve from here
+This goal requires good support for very-long-running transactions,
+started with the ``with atomic`` construct documented here__. This
+approach hides the notion of threads from the end programmer, including
+all the hard multithreading-related issues. This is not the first
+alternative approach to explicit threads; for example, OpenMP_ is one.
+However, it is one of the first ones which does not require the code to
+be organized in a particular fashion. Instead, it works on any Python
+program which has got latent, imperfect parallelism. Ideally, it only
+requires that the end programmer identifies where this parallelism is
+likely to be found, and communicates it to the system, using some
+lightweight library on top of ``with atomic``.

-The goal is to improve the existing atomic sections, but the most
-visible missing thing is that you don't get reports about the
-"conflicts" you get. This would be the first thing that you need in
-order to start using atomic sections more extensively. Also, for now:
-for better results, try to explicitly force a transaction break just
-before (and possibly after) each large atomic section, with
-``time.sleep(0)``.
+This introduces new issues. At the very least, we need a way to get
+feedback about what conflicts we get in these long-running transactions,
+and where they are produced. A first step will be to implement getting
+"tracebacks" that point to the places where the most time is lost. This
+could be later integrated into some "debugger"-like variant where we can
+navigate the conflicts, either in a live program or based on data logs.

-This approach hides the notion of threads from the end programmer,
-including all the hard multithreading-related issues. This is not the
-first alternative approach to explicit threads; for example, OpenMP_ is
-one. However, it is one of the first ones which does not require the
-code to be organized in a particular fashion. Instead, it works on any
-Python program which has got latent, imperfect parallelism. Ideally, it
-only requires that the end programmer identifies where this parallelism
-is likely to be found, and communicates it to the system, using for
-example the ``transaction.add()`` scheme.
+Some of these conflicts can be solved by improving PyPy-TM directly.
+The system works on the granularity of objects and doesn't generate
+false conflicts, but some conflicts may be regarded as "false" anyway:
+these involve most importantly the built-in dictionary type, for which
+we would like accesses and writes using independent keys to be truly
+independent. Other built-in data structures with a similar issue are
+lists: ideally, writes to different indexes should not cause conflicts;
+but more generally, we would need a mechanism, possibly under the
+control of the application, to do things like append an item to a list
+in a "delayed" manner, to avoid conflicts.

-XXX Talk also about dict- or list-specific conflict avoidance;
-delaying some updates or I/O; etc. etc.
+.. __: https://pypy.readthedocs.org/en/latest/stm.html
+
+Similarly, we might need a way to delay some I/O: doing it only at the
+end of the transaction rather than immediately, in order to prevent the
+whole transaction from turning inevitable.
+
+The goal 2 is thus the development of tools to inspect and fix the
+causes of conflicts, as well as fixing the ones that are apparent inside
+PyPy-TM directly.


 Goal 3
 ------

-XXX
+The third goal is to look at some existing event-based frameworks (for
+example Twisted, Tornado, Stackless, gevent, ...) and attempt to make
+them use threads and atomic sections internally. We would appreciate
+help and feedback from people more involved in these frameworks, of
+course.

+The idea is to apply the techniques described in the `goal 2`_ until we
+get a version of framework X which can transparently parallelize the
+dispatching of multiple events. This might require some slight
+reorganization of the core in order to split the I/O and the actual
+logic into separate transactions.


 ----------

-XXX fix
-Total: 5 months for the initial version; at least 8 additional months
-for the fast version. We will go with a total estimate of 15 months,
-corresponding to USD$151200. The amount sought by this fundraising
-campaign, considering the 2 volunteer hours per paid hour is thus USD$50400.
+Funding
+-------
+
+We forecast that goal 1 and a good chunk of goal 2 should be reached in
+around 4 months of work. The remaining parts of goal 2 as well as goal
+3 are likely to be more open-ended jobs. We will go with a total
+estimate of 8 months, corresponding to roughly the second half of the
+`original call for proposal`_ which was not covered so far. This
+corresponds to USD$80640. The amount sought by this fundraising
+campaign, considering the 2 volunteer hours per paid hour is thus
+USD$26880.


 Benefits of This Work to the Python Community and the General Public
 ====================================================================

-XXX
+Python has become one of the most popular dynamic programming languages in
+the world. Web developers, educators, and scientific programmers alike
+all value Python because Python code is often more readable and because
+Python often increases programmer productivity.
+
+Traditionally, languages like Python ran more slowly than static, compiled
+languages; Python developers chose to sacrifice execution speed for ease
+of programming. The PyPy project created a substantially improved Python
+language implementation, including a fast Just-in-time (JIT) compiler.
+The increased execution speed that PyPy provides has attracted many users,
+who now find their Python code runs up to four times faster under PyPy
+than under the reference implementation written in C.
+
+However, in the presence of today's machines with multiple processors,
+Python progress lags behind. The issue has been described in the
+introduction: developers that really need to use multiple CPUs are
+constrained to select and use one of the multi-process solutions that
+are all in some way or another hacks requiring extra knowledge and
+efforts to use. The focus of the work described in this proposal is to
+offer an alternative in the core of the Python language --- an
+alternative that can naturally integrate with the rest of the program.
+This alternative is implemented in PyPy.
+
+PyPy's developers make all PyPy software available to the public without
+charge, under PyPy's Open Source copyright license, the permissive MIT
+License. PyPy's license assures that PyPy is equally available to
+everyone freely on terms that allow both non-commercial and commercial
+activity. This license allows for academics, for-profit software
+developers, volunteers and enthusiasts alike to collaborate together to
+make a better Python implementation for everyone.
+
+PyPy-TM is and continues to be available under the same license. Being
+licensed freely to the general public means that opportunities to use,
+improve and learn about how Transactional Memory works itself will be
+generally available to everyone.
_______________________________________________
pypy-commit mailing list
pypy-commit@python.org
https://mail.python.org/mailman/listinfo/pypy-commit
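
The funding figures quoted in the diff can be cross-checked with a short sketch. It is not part of the changeset. It assumes that "2 volunteer hours per paid hour" means one hour in three is paid, so the amount sought is one third of the total. That reading is inferred from the numbers and is not stated in the text. The per-month rate is likewise derived from the totals, and the function names are hypothetical.

```python
from fractions import Fraction

# Figures quoted in the diff (USD): (total estimate, months, amount sought).
OLD_ESTIMATE = (151200, 15, 50400)   # removed text
NEW_ESTIMATE = (80640, 8, 26880)     # added "Funding" section


def monthly_rate(total, months):
    """Implied cost per month (derived here, not stated in the proposal)."""
    return Fraction(total, months)


def amount_sought(total, volunteer_hours_per_paid_hour=2):
    """Paid share of the total.

    Assumption: with v volunteer hours per paid hour, only 1 / (1 + v)
    of the work is paid for by the campaign.
    """
    return Fraction(total, 1 + volunteer_hours_per_paid_hour)


if __name__ == "__main__":
    for label, (total, months, sought) in (("old", OLD_ESTIMATE),
                                           ("new", NEW_ESTIMATE)):
        print(label,
              "rate/month:", monthly_rate(total, months),
              "sought:", amount_sought(total),
              "matches quoted:", amount_sought(total) == sought)
```

Under these assumptions both the old and the new estimate imply the same monthly rate, and both quoted campaign amounts equal one third of their totals.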