Author: Antonio Cuni <anto.c...@gmail.com> Branch: extradoc Changeset: r4289:e99feb284e66 Date: 2012-07-10 11:16 +0200 http://bitbucket.org/pypy/extradoc/changeset/e99feb284e66/
Log: merge heads
diff --git a/.hgignore b/.hgignore
--- a/.hgignore
+++ b/.hgignore
@@ -1,3 +1,11 @@
 syntax: glob
 *.py[co]
 *~
+talk/ep2012/stackless/slp-talk.aux
+talk/ep2012/stackless/slp-talk.latex
+talk/ep2012/stackless/slp-talk.log
+talk/ep2012/stackless/slp-talk.nav
+talk/ep2012/stackless/slp-talk.out
+talk/ep2012/stackless/slp-talk.snm
+talk/ep2012/stackless/slp-talk.toc
+talk/ep2012/stackless/slp-talk.vrb
\ No newline at end of file
diff --git a/blog/draft/plans-for-2-years.rst b/blog/draft/plans-for-2-years.rst
new file mode 100644
--- /dev/null
+++ b/blog/draft/plans-for-2-years.rst
@@ -0,0 +1,73 @@
+What we'll be busy with for the foreseeable future
+==================================================
+
+Hello.
+
+The PyPy development process has been called too opaque. In this blog post
+we try to highlight a few projects that are being worked on or planned for
+the near future. As it usually goes with such lists, don't expect any
+deadlines - it's more "a lot of work that will keep us busy". It also
+answers the question of whether PyPy has already reached its maximum
+possible performance.
+
+Here is the list of areas, mostly with open branches. Note that the list is
+not exhaustive - in fact it does not contain all the areas that are covered
+by funding, notably numpy, STM and py3k.
+
+Iterating in RPython
+====================
+
+Right now, RPython code that contains a loop can be surprised by receiving
+an iterable it does not expect. This ends up doing an unnecessary copy
+(or two or three in corner cases), essentially forcing an iterator.
+An example of such code::
+
+    import itertools
+    ''.join(itertools.repeat('ss', 10000000))
+
+This takes 4s on PyPy and 0.4s on CPython. That's absolutely unacceptable :-)
+
+More optimized frames and generators
+====================================
+
+Right now generator expressions and generators have to have full frames,
+instead of the optimized ones used for plain Python functions. This leads
+to inefficiencies.
There is a plan to improve the situation on the
+``continulet-jit-2`` branch. ``-2`` in branch names means it's hard and
+has already been tried unsuccessfully :-)
+
+Somewhat by chance, this would also make stackless work with the JIT.
+Historically though, the idea was to make stackless work with the JIT,
+and only later was it figured out that the same approach could also be
+used for generators. Who would have thought :)
+
+This work should allow us to improve the situation of uninlined functions
+as well.
+
+Dynamic specialized tuples and instances
+========================================
+
+PyPy already uses maps. Read our `blog`_ `posts`_ for details. However,
+it's possible to go even further, by storing unboxed integers/floats
+directly in the instance storage instead of having pointers to Python
+objects. This should improve memory efficiency and speed for the cases
+where your instances have integer or float fields.
+
+Tracing speed
+=============
+
+PyPy is probably one of the slowest compilers when it comes to warmup times.
+There is no open branch, but we're definitely thinking about the problem :-)
+
+Bridge optimizations
+====================
+
+Another "area of interest" is bridge generation. Right now, generating a
+bridge from a compiled loop "forgets" some of the optimization information
+from the loop.
+
+GC pinning and I/O performance
+==============================
+
+The ``minimark-gc-pinning`` branch tries to improve the performance of I/O.
+
+32bit on 64bit
+==============
diff --git a/talk/dls2012/licm.pdf b/talk/dls2012/licm.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..dd7d2286dbdb2201e2f9e266c9279ce9a9ba2a0d
GIT binary patch
[cut]
diff --git a/talk/dls2012/paper.tex b/talk/dls2012/paper.tex
--- a/talk/dls2012/paper.tex
+++ b/talk/dls2012/paper.tex
@@ -124,6 +124,8 @@
 One of the nice properties of a tracing JIT is that many of its
 optimizations are simple, requiring only one forward pass.
This is not true for loop-invariant code motion,
 which is a very important optimization for code with tight kernels.
+This is especially true for dynamic languages, which typically perform
+quite a lot of loop-invariant type checking, boxed value unwrapping and
+virtual method lookups.
 In this paper we present a scheme for making simple optimizations loop-aware
 by using a simple pre-processing step on the trace and not changing the
 optimizations themselves. The scheme can give performance improvements of a
@@ -141,13 +143,15 @@
 \section{Introduction}

-A dynamically typed language needs to do a lot of type
-checking and unwrapping. For tight computationally intensive loops a
+A dynamic language typically needs to do quite a lot of type
+checking, wrapping/unwrapping of boxed values, and virtual method dispatching.
+For tight computationally intensive loops a
 significant amount of the execution time might be spent on such tasks
-instead of the actual calculations. Moreover, the type checking and
-unwrapping is often loop invariant and performance could be increased
-by moving those operations out of the loop. We propose to design a
-loop-aware tracing JIT to perform such optimization at run time.
+instead of the actual computations. Moreover, the type checking,
+unwrapping and method lookups are often loop invariant and performance
+could be increased by moving those operations out of the loop. We propose
+a simple scheme to make a tracing JIT loop-aware by allowing its existing
+optimizations to perform loop-invariant code motion.

 One of the advantages that tracing JIT compilers have over
 traditional method-based
@@ -533,7 +537,7 @@
 Each operation in the trace is copied in order.
 To copy an operation $v=\text{op}\left(A_1, A_2, \cdots, A_{|A|}\right)$
-a new variable, $\hat v$ is introduced. The copied operation will
+a new variable, $\hat v$, is introduced.
The copied operation will
 return $\hat v$ using
 \begin{equation}
   \hat v = \text{op}\left(m\left(A_1\right), m\left(A_2\right),
@@ -696,12 +700,12 @@
 By constructing a vector, $H$, of such variables, the input and jump
 arguments can be updated using
 \begin{equation}
-  \hat J = \left(J_1, J_2, \cdots, J_{|J|}, H_1, H_2, \cdots, H_{|H}\right)
+  \hat J = \left(J_1, J_2, \cdots, J_{|J|}, H_1, H_2, \cdots, H_{|H|}\right)
   \label{eq:heap-inputargs}
 \end{equation}
 and
 \begin{equation}
-  \hat K = \left(K_1, K_2, \cdots, K_{|J|}, m(H_1), m(H_2), \cdots, m(H_{|H})\right)
+  \hat K = \left(K_1, K_2, \cdots, K_{|J|}, m(H_1), m(H_2), \cdots, m(H_{|H|})\right)
   .
   \label{eq:heap-jumpargs}
 \end{equation}
@@ -772,7 +776,7 @@
 .
 \end{equation}
 The arguments of the \lstinline{jump} operation of the peeled loop,
-$K$, is constructed by inlining $\hat J$,
+$K$, are constructed from $\hat J$ using the map $m$,
 \begin{equation}
   \hat K = \left(m\left(\hat J_1\right), m\left(\hat J_2\right),
   \cdots, m\left(\hat J_{|\hat J|}\right)\right)
diff --git a/talk/ep2012/stackless/Makefile b/talk/ep2012/stackless/Makefile
new file mode 100644
--- /dev/null
+++ b/talk/ep2012/stackless/Makefile
@@ -0,0 +1,15 @@
+# you can find rst2beamer.py here:
+# http://codespeak.net/svn/user/antocuni/bin/rst2beamer.py
+
+slp-talk.pdf: slp-talk.rst author.latex title.latex stylesheet.latex
+	rst2beamer.py --stylesheet=stylesheet.latex --documentoptions=14pt slp-talk.rst slp-talk.latex || exit
+	sed 's/\\date{}/\\input{author.latex}/' -i slp-talk.latex || exit
+	sed 's/\\maketitle/\\input{title.latex}/' -i slp-talk.latex || exit
+	sed 's/\\usepackage\[latin1\]{inputenc}/\\usepackage[utf8]{inputenc}/' -i slp-talk.latex || exit
+	pdflatex slp-talk.latex || exit
+
+view: slp-talk.pdf
+	evince slp-talk.pdf &
+
+xpdf: slp-talk.pdf
+	xpdf slp-talk.pdf &
diff --git a/talk/ep2012/stackless/author.latex b/talk/ep2012/stackless/author.latex
new file mode 100644
--- /dev/null
+++ b/talk/ep2012/stackless/author.latex
@@ -0,0 +1,8 @@
+\definecolor{rrblitbackground}{rgb}{0.0, 0.0, 0.0} + +\title[The Story of Stackless Python]{The Story of Stackless Python} +\author[tismer, nagare] +{Christian Tismer, Hervé Coatanhay} + +\institute{EuroPython 2012} +\date{July 4 2012} diff --git a/talk/ep2012/stackless/beamerdefs.txt b/talk/ep2012/stackless/beamerdefs.txt new file mode 100644 --- /dev/null +++ b/talk/ep2012/stackless/beamerdefs.txt @@ -0,0 +1,108 @@ +.. colors +.. =========================== + +.. role:: green +.. role:: red + + +.. general useful commands +.. =========================== + +.. |pause| raw:: latex + + \pause + +.. |small| raw:: latex + + {\small + +.. |end_small| raw:: latex + + } + +.. |scriptsize| raw:: latex + + {\scriptsize + +.. |end_scriptsize| raw:: latex + + } + +.. |strike<| raw:: latex + + \sout{ + +.. closed bracket +.. =========================== + +.. |>| raw:: latex + + } + + +.. example block +.. =========================== + +.. |example<| raw:: latex + + \begin{exampleblock}{ + + +.. |end_example| raw:: latex + + \end{exampleblock} + + + +.. alert block +.. =========================== + +.. |alert<| raw:: latex + + \begin{alertblock}{ + + +.. |end_alert| raw:: latex + + \end{alertblock} + + + +.. columns +.. =========================== + +.. |column1| raw:: latex + + \begin{columns} + \begin{column}{0.45\textwidth} + +.. |column2| raw:: latex + + \end{column} + \begin{column}{0.45\textwidth} + + +.. |end_columns| raw:: latex + + \end{column} + \end{columns} + + + +.. |snake| image:: ../../img/py-web-new.png + :scale: 15% + + + +.. nested blocks +.. =========================== + +.. |nested| raw:: latex + + \begin{columns} + \begin{column}{0.85\textwidth} + +.. 
|end_nested| raw:: latex + + \end{column} + \end{columns} diff --git a/talk/ep2012/stackless/demo/pickledtasklet.py b/talk/ep2012/stackless/demo/pickledtasklet.py new file mode 100644 --- /dev/null +++ b/talk/ep2012/stackless/demo/pickledtasklet.py @@ -0,0 +1,25 @@ +import pickle, sys +import stackless + +ch = stackless.channel() + +def recurs(depth, level=1): + print 'enter level %s%d' % (level*' ', level) + if level >= depth: + ch.send('hi') + if level < depth: + recurs(depth, level+1) + print 'leave level %s%d' % (level*' ', level) + +def demo(depth): + t = stackless.tasklet(recurs)(depth) + print ch.receive() + pickle.dump(t, file('tasklet.pickle', 'wb')) + +if __name__ == '__main__': + if len(sys.argv) > 1: + t = pickle.load(file(sys.argv[1], 'rb')) + t.insert() + else: + t = stackless.tasklet(demo)(9) + stackless.run() diff --git a/talk/ep2012/stackless/eurpython-2012.pptx b/talk/ep2012/stackless/eurpython-2012.pptx new file mode 100644 index 0000000000000000000000000000000000000000..9b34bb66e92cbe27ce5dc5c3928fe9413abf2cef GIT binary patch [cut] diff --git a/talk/ep2012/stackless/logo_small.png b/talk/ep2012/stackless/logo_small.png new file mode 100644 index 0000000000000000000000000000000000000000..acfe083b78f557c394633ca542688a2bfca6a5e8 GIT binary patch [cut] diff --git a/talk/ep2012/stackless/slp-talk.pdf b/talk/ep2012/stackless/slp-talk.pdf new file mode 100644 index 0000000000000000000000000000000000000000..afcb8c00b73bb83d114dc4e0d9c8ec1157800ef3 GIT binary patch [cut] diff --git a/talk/ep2012/stackless/slp-talk.rst b/talk/ep2012/stackless/slp-talk.rst new file mode 100644 --- /dev/null +++ b/talk/ep2012/stackless/slp-talk.rst @@ -0,0 +1,675 @@ +.. 
include:: beamerdefs.txt
+
+============================================
+The Story of Stackless Python
+============================================
+
+
+About This Talk
+----------------
+
+* first talk after a long break
+
+  - *rst2beamer* for the first time
+
+guest speaker:
+
+* Herve Coatanhay about Nagare
+
+  - PowerPoint (Mac)
+
+|pause|
+
+Meanwhile I used
+
+* Powerpoint (PC)
+
+* Keynote (Mac)
+
+* Google Docs
+
+|pause|
+
+poll: What is your favorite slide tool?
+
+What is Stackless?
+-------------------
+
+* *Stackless is a Python version that does not use the C stack*
+
+  |pause|
+
+  - really? naah
+
+|pause|
+
+* Stackless is a Python version that does not keep state on the C stack
+
+  - the stack *is* used, but
+
+  - cleared between function calls
+
+|pause|
+
+* Remark:
+
+  - theoretically. In practice...
+
+  - ... it is reasonable 90% of the time
+
+  - we come back to this!
+
+
+What is Stackless about?
+-------------------------
+
+* it is like CPython
+
+|pause|
+
+* it can do a little bit more
+
+|pause|
+
+* adds a single builtin module
+
+|pause|
+
+|scriptsize|
+|example<| |>|
+
+  .. sourcecode:: python
+
+    import stackless
+
+|end_example|
+|end_scriptsize|
+
+|pause|
+
+* is like an extension
+
+  - but, sadly, not really
+
+  - stackless **must** be builtin
+
+  - **but:** there is a solution...
+
+
+Now, what is it really about?
+------------------------------
+
+* have tiny little "main" programs
+
+  - ``tasklet``
+
+|pause|
+
+* tasklets communicate via messages
+
+  - ``channel``
+
+|pause|
+
+* tasklets are often called ``microthreads``
+
+  - but there are no threads at all
+
+  - only one tasklet runs at any time
+
+|pause|
+
+* *but see the PyPy STM approach*
+
+  - this will apply to tasklets as well
+
+
+Cooperative Multitasking ...
+-------------------------------
+
+|scriptsize|
+|example<| |>|
+
+  .. sourcecode:: pycon
+
+    >>> import stackless
+    >>>
+    >>> channel = stackless.channel()
+
+|pause|
+
+  ..
sourcecode:: pycon
+
+    >>> def receiving_tasklet():
+    ...     print "Receiving tasklet started"
+    ...     print channel.receive()
+    ...     print "Receiving tasklet finished"
+
+|pause|
+
+  .. sourcecode:: pycon
+
+    >>> def sending_tasklet():
+    ...     print "Sending tasklet started"
+    ...     channel.send("send from sending_tasklet")
+    ...     print "sending tasklet finished"
+
+|end_example|
+|end_scriptsize|
+
+
+... Cooperative Multitasking ...
+---------------------------------
+
+|scriptsize|
+|example<| |>|
+
+  .. sourcecode:: pycon
+
+    >>> def another_tasklet():
+    ...     print "Just another tasklet in the scheduler"
+
+|pause|
+
+  .. sourcecode:: pycon
+
+    >>> stackless.tasklet(receiving_tasklet)()
+    <stackless.tasklet object at 0x00A45B30>
+    >>> stackless.tasklet(sending_tasklet)()
+    <stackless.tasklet object at 0x00A45B70>
+    >>> stackless.tasklet(another_tasklet)()
+    <stackless.tasklet object at 0x00A45BF0>
+
+|end_example|
+|end_scriptsize|
+
+
+... Cooperative Multitasking
+-------------------------------
+
+|scriptsize|
+|example<| |>|
+
+  .. sourcecode:: pycon
+
+    <stackless.tasklet object at 0x00A45B70>
+    >>> stackless.tasklet(another_tasklet)()
+    <stackless.tasklet object at 0x00A45BF0>
+    >>>
+    >>> stackless.run()
+    Receiving tasklet started
+    Sending tasklet started
+    send from sending_tasklet
+    Receiving tasklet finished
+    Just another tasklet in the scheduler
+    sending tasklet finished
+
+|end_example|
+|end_scriptsize|
+
+
+Why not just the *greenlet* ?
+-------------------------------
+
+* greenlets are a subset of stackless
+
+  - can partially emulate stackless
+
+  - there is no builtin scheduler
+
+  - technology quite close to Stackless 2.0
+
+|pause|
+
+* greenlets are about 10x slower to switch context because they
+  use only hard-switching
+
+  - but that's ok in most cases
+
+|pause|
+
+* greenlets are kind-of perfect
+
+  - near-zero maintenance
+
+  - minimal interface
+
+|pause|
+
+* but the main difference is ...
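The tasklet/channel round trip shown in the slides can be sketched without a Stackless build, using plain generators and a toy round-robin scheduler. This is an illustration only: all names here (`Scheduler`, `Channel`, the tuple-based yield protocol) are made up, the syntax is Python 3, and unlike a real Stackless channel this one buffers sends and may interleave tasklets in a slightly different order than the builtin scheduler.

```python
from collections import deque

class Channel:
    """Toy stand-in for stackless.channel (buffered, unlike the real one)."""
    def __init__(self):
        self.values = deque()     # sent values with no receiver yet
        self.receivers = deque()  # generators blocked in receive

class Scheduler:
    """Toy round-robin stand-in for the builtin Stackless scheduler."""
    def __init__(self):
        self.ready = deque()      # (generator, value to send in on resume)

    def tasklet(self, gen):
        self.ready.append((gen, None))

    def run(self):
        while self.ready:
            gen, value = self.ready.popleft()
            try:
                op = gen.send(value)          # resume the tasklet
            except StopIteration:
                continue                      # tasklet finished
            if op[0] == 'send':
                _, ch, sent = op
                if ch.receivers:              # wake a blocked receiver
                    self.ready.append((ch.receivers.popleft(), sent))
                else:
                    ch.values.append(sent)
                self.ready.append((gen, None))
            elif op[0] == 'recv':
                _, ch = op
                if ch.values:
                    self.ready.append((gen, ch.values.popleft()))
                else:
                    ch.receivers.append(gen)  # block until someone sends
            else:
                self.ready.append((gen, None))

# The slides' example, with prints collected into a list:
log = []
sched = Scheduler()
channel = Channel()

def receiving_tasklet():
    log.append("Receiving tasklet started")
    log.append((yield ('recv', channel)))     # like channel.receive()
    log.append("Receiving tasklet finished")

def sending_tasklet():
    log.append("Sending tasklet started")
    yield ('send', channel, "send from sending_tasklet")
    log.append("sending tasklet finished")

def another_tasklet():
    log.append("Just another tasklet in the scheduler")
    yield ('noop',)

sched.tasklet(receiving_tasklet())
sched.tasklet(sending_tasklet())
sched.tasklet(another_tasklet())
sched.run()
```

The receiver blocks first, the sender wakes it, and the third tasklet just takes its turn in the round-robin; only one tasklet ever runs at a time, which is the whole point of the cooperative model.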
+ + +Excurs: Hard-Switching +----------------------- + +Sorry ;-) + +Switching program state "the hard way": + +Without notice of the interpreter + +* the machine stack gets hijacked + + - Brute-Force: replace the stack with another one + + - like threads + +* stackless, greenlets + + - stack slicing + + - semantically same effect + +* switching works fine + +* pickling does not work, opaque data on the stack + + - this is more sophisticated in PyPy, another story... + + +Excurs: Soft-Switching +----------------------- + +Switching program state "the soft way": + +With knowledge of the interpreter + +* most efficient implementation in Stackless 3.1 + +* demands the most effort of the developers + +* no opaque data on the stack, pickling does work + + - again, this is more sophisticated in PyPy + +|pause| + +* now we are at the main difference, as you guessed ... + + +Pickling Program State +----------------------- + +|scriptsize| +|example<| Persistence (p. 1 of 2) |>| + + .. sourcecode:: python + + import pickle, sys + import stackless + + ch = stackless.channel() + + def recurs(depth, level=1): + print 'enter level %s%d' % (level*' ', level) + if level >= depth: + ch.send('hi') + if level < depth: + recurs(depth, level+1) + print 'leave level %s%d' % (level*' ', level) + +|end_example| + +# *remember to show it interactively* + +|end_scriptsize| + + +Pickling Program State +----------------------- + +|scriptsize| + +|example<| Persistence (p. 2 of 2) |>| + + .. sourcecode:: python + + + def demo(depth): + t = stackless.tasklet(recurs)(depth) + print ch.receive() + pickle.dump(t, file('tasklet.pickle', 'wb')) + + if __name__ == '__main__': + if len(sys.argv) > 1: + t = pickle.load(file(sys.argv[1], 'rb')) + t.insert() + else: + t = stackless.tasklet(demo)(9) + stackless.run() + + +|end_example| + +# *remember to show it interactively* + +|end_scriptsize| + + +Script Output 1 +----------------- + +|example<| |>| +|scriptsize| + + .. 
sourcecode:: pycon + + $ ~/src/stackless/python.exe demo/pickledtasklet.py + enter level 1 + enter level 2 + enter level 3 + enter level 4 + enter level 5 + enter level 6 + enter level 7 + enter level 8 + enter level 9 + hi + leave level 9 + leave level 8 + leave level 7 + leave level 6 + leave level 5 + leave level 4 + leave level 3 + leave level 2 + leave level 1 + +|end_scriptsize| +|end_example| + + +Script Output 2 +----------------- + +|example<| |>| +|scriptsize| + + .. sourcecode:: pycon + + $ ~/src/stackless/python.exe demo/pickledtasklet.py tasklet.pickle + leave level 9 + leave level 8 + leave level 7 + leave level 6 + leave level 5 + leave level 4 + leave level 3 + leave level 2 + leave level 1 + +|end_scriptsize| +|end_example| + + +Greenlet vs. Stackless +----------------------- + +* Greenlet is a pure extension module + + - but performance is good enough + +|pause| + +* Stackless can pickle program state + + - but stays a replacement of Python + +|pause| + +* Greenlet never can, as an extension + +|pause| + +* *easy installation* lets people select greenlet over stackless + + - see for example the *eventlet* project + + - *but there is a simple work-around, we'll come to it* + +|pause| + +* *they both have their application domains + and they will persist.* + + +Why Stackless makes a Difference +--------------------------------- + +* Microthreads ? + + - the feature where I put most effort into + + |pause| + + - can be emulated: (in decreasing speed order) + + - generators (incomplete, "half-sided") + + - greenlet + + - threads (even ;-) + +|pause| + +* Pickling program state ! == + +|pause| + +* **persistence** + + +Persistence, Cloud Computing +----------------------------- + +* freeze your running program + +* let it continue anywhere else + + - on a different computer + + - on a different operating system (!) 
+ + - in a cloud + +* migrate your running program + +* save snapshots, have checkpoints + + - without doing any extra-work + + +Software archeology +------------------- + +* Around since 1998 + + - version 1 + + - using only soft-switching + + - continuation-based + + - *please let me skip old design errors :-)* + +|pause| + +* Complete redesign in 2002 + + - version 2 + + - using only hard-switching + + - birth of tasklets and channels + +|pause| + +* Concept merge in 2004 + + - version 3 + + - **80-20** rule: + + - soft-switching whenever possible + + - hard-switching if foreign code is on the stack + + - these 80 % can be *pickled* (90?) + +* This stayed as version 3.1 + +Status of Stackless Python +--------------------------- + +* mature + +* Python 2 and Python 3, all versions + +* maintained by + + - Richard Tew + - Kristjan Valur Jonsson + - me (a bit) + + +The New Direction for Stackless +------------------------------- + +* ``pip install stackless-python`` + + - will install ``slpython`` + - or even ``python`` (opinions?) + +|pause| + +* drop-in replacement of CPython + *(psssst)* + +|pause| + +* ``pip uninstall stackless-python`` + + - Stackless is a bit cheating, as it replaces the python binary + + - but the user perception will be perfect + +* *trying stackless made easy!* + + +New Direction (cont'd) +----------------------- + +* first prototype yesterday from + + Anselm Kruis *(applause)* + + - works on Windows + + |pause| + + - OS X + + - I'll do that one + + |pause| + + - Linux + + - soon as well + +|pause| + +* being very careful to stay compatible + + - python 2.7.3 installs stackless for 2.7.3 + - python 3.2.3 installs stackless for 3.2.3 + + - python 2.7.2 : *please upgrade* + - or maybe have an over-ride option? + +Consequences of the Pseudo-Package +----------------------------------- + +The technical effect is almost nothing. 
The psychological impact is probably huge:
+
+|pause|
+
+* stackless is easy to install and uninstall
+
+|pause|
+
+* people can simply try whether it fits their needs
+
+|pause|
+
+* the never-ending discussion
+
+  - "Why is Stackless not included in the Python core?"
+
+|pause|
+
+* **has ended**
+
+  - "Why should we, after all?"
+
+  |pause|
+
+  - hey Guido :-)
+
+  - what a relief, for you and me
+
+
+Status of Stackless PyPy
+---------------------------
+
+* was completely implemented before the JIT
+
+  - together with
+    greenlets + coroutines
+
+  - not JIT-compatible
+
+* was "too complete", with a 30% performance hit
+
+* new approach is almost ready
+
+  - with full JIT support
+  - but needs some fixing
+  - this *will* be efficient
+
+Applications using Stackless Python
+------------------------------------
+
+* The Eve Online MMORPG
+
+  http://www.eveonline.com/
+
+  - has based its games on Stackless since 1998
+
+* science + computing ag, Anselm Kruis
+
+  https://ep2012.europython.eu/conference/p/anselm-kruis
+
+* The Nagare Web Framework
+
+  http://www.nagare.org/
+
+  - works because of Stackless pickling
+
+* today's majority: persistence
+
+
+Thank you
+---------
+
+* the new Stackless website
+  http://www.stackless.com/
+
+  - a **great** donation from Alain Pourier, *Nagare*
+
+* You can hire me as a consultant
+
+* Questions?
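A quick way to see why the pickling story in the slides is special to Stackless: in plain CPython, a paused generator (the closest stock analogue of a tasklet) cannot be pickled at all, because its live frame is opaque to `pickle`. A small check in Python 3 syntax; `countdown` is just an illustrative function:

```python
import pickle

def countdown(n):
    # a paused generator holds a live Python frame, much like a tasklet does
    while n:
        yield n
        n -= 1

gen = countdown(3)
next(gen)                  # advance: the frame is now mid-execution
try:
    pickle.dumps(gen)      # CPython cannot serialize a live frame
    picklable = True
except TypeError:
    picklable = False
```

This is exactly the capability the `pickledtasklet.py` demo relies on: Stackless keeps the program state off the C stack in a form it can serialize, so a tasklet blocked mid-recursion survives a round trip through `pickle` while a CPython generator does not.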
diff --git a/talk/ep2012/stackless/stylesheet.latex b/talk/ep2012/stackless/stylesheet.latex new file mode 100644 --- /dev/null +++ b/talk/ep2012/stackless/stylesheet.latex @@ -0,0 +1,11 @@ +\usetheme{Boadilla} +\usecolortheme{whale} +\setbeamercovered{transparent} +\setbeamertemplate{navigation symbols}{} + +\definecolor{darkgreen}{rgb}{0, 0.5, 0.0} +\newcommand{\docutilsrolegreen}[1]{\color{darkgreen}#1\normalcolor} +\newcommand{\docutilsrolered}[1]{\color{red}#1\normalcolor} + +\newcommand{\green}[1]{\color{darkgreen}#1\normalcolor} +\newcommand{\red}[1]{\color{red}#1\normalcolor} diff --git a/talk/ep2012/stackless/title.latex b/talk/ep2012/stackless/title.latex new file mode 100644 --- /dev/null +++ b/talk/ep2012/stackless/title.latex @@ -0,0 +1,5 @@ +\begin{titlepage} +\begin{figure}[h] +\includegraphics[width=60px]{logo_small.png} +\end{figure} +\end{titlepage} diff --git a/talk/ep2012/stm/stmdemo2.py b/talk/ep2012/stm/stmdemo2.py --- a/talk/ep2012/stm/stmdemo2.py +++ b/talk/ep2012/stm/stmdemo2.py @@ -1,33 +1,37 @@ - def specialize_more_blocks(self): - while True: - # look for blocks not specialized yet - pending = [block for block in self.annotator.annotated - if block not in self.already_seen] - if not pending: - break +def specialize_more_blocks(self): + while True: + # look for blocks not specialized yet + pending = [block for block in self.annotator.annotated + if block not in self.already_seen] + if not pending: + break - # specialize all blocks in the 'pending' list - for block in pending: - self.specialize_block(block) - self.already_seen.add(block) + # specialize all blocks in the 'pending' list + for block in pending: + self.specialize_block(block) + self.already_seen.add(block) - def specialize_more_blocks(self): - while True: - # look for blocks not specialized yet - pending = [block for block in self.annotator.annotated - if block not in self.already_seen] - if not pending: - break - # specialize all blocks in the 'pending' list - # *using 
transactions* - for block in pending: - transaction.add(self.specialize_block, block) - transaction.run() - self.already_seen.update(pending) + + +def specialize_more_blocks(self): + while True: + # look for blocks not specialized yet + pending = [block for block in self.annotator.annotated + if block not in self.already_seen] + if not pending: + break + + # specialize all blocks in the 'pending' list + # *using transactions* + for block in pending: + transaction.add(self.specialize_block, block) + transaction.run() + + self.already_seen.update(pending) diff --git a/talk/ep2012/stm/talk.pdf b/talk/ep2012/stm/talk.pdf index 19067d178980accc5a060fa819059611fcf1acdc..59ba6454817cd0a87accdf48e505190fe99b4924 GIT binary patch [cut] diff --git a/talk/ep2012/stm/talk.rst b/talk/ep2012/stm/talk.rst --- a/talk/ep2012/stm/talk.rst +++ b/talk/ep2012/stm/talk.rst @@ -484,6 +484,8 @@ * http://pypy.org/ -* You can hire Antonio +* You can hire Antonio (http://antocuni.eu) * Questions? + +* PyPy help desk on Thursday morning \ No newline at end of file diff --git a/talk/ep2012/tools/demo.py b/talk/ep2012/tools/demo.py new file mode 100644 --- /dev/null +++ b/talk/ep2012/tools/demo.py @@ -0,0 +1,208 @@ + +def simple(): + for i in range(100000): + pass + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +def bridge(): + s = 0 + for i in range(100000): + if i % 2: + s += 1 + else: + s += 2 + + + + + + + + + + + + + + + + + + + +def bridge_overflow(): + s = 2 + for i in range(100000): + s += i*i*i*i + return s + + + + + + + + + + + + + + + + + + + + + +def nested_loops(): + s = 0 + for i in range(10000): + for j in range(100000): + s += 1 + + + + + + + + + + + + + + + +def inner1(): + return 1 + +def inlined_call(): + s = 0 + for i in range(10000): + s += inner1() + + + + + + + + + + + + + + + + + + + +def inner2(a): + for i in range(3): + a += 1 + return a + +def inlined_call_loop(): + s = 0 + for i in range(100000): + s += inner2(i) + + + + + + + + + + + + + + + +class 
A(object): + def __init__(self, x): + if x % 2: + self.y = 3 + self.x = x + +def object_maps(): + l = [A(i) for i in range(100)] + s = 0 + for i in range(1000000): + s += l[i % 100].x + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +if __name__ == '__main__': + simple() + bridge() + bridge_overflow() + nested_loops() + inlined_call() + inlined_call_loop() + object_maps() diff --git a/talk/ep2012/tools/talk.html b/talk/ep2012/tools/talk.html new file mode 100644 --- /dev/null +++ b/talk/ep2012/tools/talk.html @@ -0,0 +1,120 @@ +<html> +<head> + <meta name="viewport" content="width=1024, user-scalable=no"> + <link rel="stylesheet" href="/home/fijal/src/deckjs/core/deck.core.css"> + <link rel="stylesheet" href="web-2.0.css"> + <link rel="stylesheet" href="/home/fijal/src/deckjs/themes/transition/horizontal-slide.css"> + <script src="/home/fijal/src/deckjs/modernizr.custom.js"></script> + <script src="/home/fijal/src/deckjs/jquery-1.7.min.js"></script> + <script src="/home/fijal/src/deckjs/core/deck.core.js"></script> + <script> + $(function() { + $.deck('.slide'); + }); + </script> + +</head> +<body class="deck-container"> + <section class="slide" id="title-slide"> + <h1>Performance analysis tools for JITted VMs</h1> + </section> + <section class="slide"> + <h2>Who am I?</h2> + <ul> + <li>worked on PyPy for 5+ years</li> + <li>often presented with a task "my program runs slow"</li> + <li>never completely satisfied with present solutions</li> + <li class="slide">I'm not antisocial, just shy</li> + </ul> + </section> + <section class="slide"> + <h2>The talk</h2> + <ul> + <li>apologies for a lack of advanced warning - this is a rant</li> + <div class="slide"> + <li>I'll talk about tools</li> + <li>primarily profiling tools</li> + </div> + <div class="slide"> + <li>lots of questions</li> + <li>not that many answers</li> + </div> + </ul> + </section> + <section class="slide"> + <h2>Why ranting?</h2> + <ul> + <li>the topic at hand is hard</li> + <li>the 
mindset about tools is very much rooted in the static land</li> + </ul> + </section> + <section class="slide"> + <h2>Profiling theory</h2> + <ul> + <li>you spend 90% of your time in 10% of the functions</li> + <li>hence you can start profiling after you're done developing</li> + <li>by optimizing few functions</li> + <div class="slide"> + <li>problem - 10% of 600k lines is still 60k lines</li> + <li>that might be even 1000s of functions</li> + </div> + </ul> + </section> + <section class="slide"> + <h2>Let's talk about profiling</h2> + <ul> + <li>I'll try profiling!</li> + </ul> + </section> + <section class="slide"> + <h2>JITted landscape</h2> + <ul> + <li>you have to account for warmup times</li> + <li>time spent in functions is very context dependent</li> + </ul> + </section> + <section class="slide"> + <h2>Let's try!</h2> + </section> + <section class="slide"> + <h2>High level languages</h2> + <ul> + <li>in C relation C <-> assembler is "trivial"</li> + <li>in PyPy, V8 (JS) or luajit (lua), the mapping is far from trivial</li> + <div class="slide"> + <li>multiple versions of the same code</li> + <li>bridges even if there is no branch in user code</li> + </div> + <li class="slide">sometimes I have absolutely no clue</li> + </ul> + </section> + <section class="slide"> + <h2>The problem</h2> + <ul> + <li>what I've shown is pretty much the state of the art</li> + </ul> + </section> + <section class="slide"> + <h2>Another problem</h2> + <ul> + <li>often when presented with profiling, it's already too late</li> + </ul> + </section> + <section class="slide"> + <h2>Better tools</h2> + <ul> + <li>good vm-level instrumentation</li> + <li>better visualizations, more code oriented</li> + <li>hints at the editor level about your code</li> + <li>hints about coverage, tests</li> + </ul> + </section> + <section class="slide"> + <h2></rant></h2> + <ul> + <li>good part - there are people working on it</li> + <li>questions, suggestions?</li> + </ul> + </section> +</body> +</html> 
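The warmup point above ("you have to account for warmup times") suggests timing each run separately rather than reporting one aggregate number: on a JITted VM the first entries of such a list include tracing and compilation, and only the tail reflects steady-state speed. A minimal measurement sketch, assuming nothing beyond the standard library (`per_run_times` is a hypothetical helper name, not a tool mentioned in the talk):

```python
import time

def per_run_times(fn, repeats=5):
    """Time each call separately; on a JITted VM the first entries
    include compilation (warmup), which an aggregate total hides."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return times

# On CPython the runs are roughly flat; on PyPy the first one or two
# runs would typically stand out until the loop is compiled.
runs = per_run_times(lambda: sum(i * i for i in range(10000)))
```

Comparing `runs[0]` against the median of the tail is a crude but honest way to separate warmup cost from steady-state cost when profiling a JITted VM.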
diff --git a/talk/ep2012/tools/web-2.0.css b/talk/ep2012/tools/web-2.0.css new file mode 100644 --- /dev/null +++ b/talk/ep2012/tools/web-2.0.css @@ -0,0 +1,215 @@ +@charset "UTF-8"; +.deck-container { + font-family: "Gill Sans", "Gill Sans MT", Calibri, sans-serif; + font-size: 2.75em; + background: #f4fafe; + /* Old browsers */ + background: -moz-linear-gradient(top, #f4fafe 0%, #ccf0f0 100%); + /* FF3.6+ */ + background: -webkit-gradient(linear, left top, left bottom, color-stop(0%, #f4fafe), color-stop(100%, #ccf0f0)); + /* Chrome,Safari4+ */ + background: -webkit-linear-gradient(top, #f4fafe 0%, #ccf0f0 100%); + /* Chrome10+,Safari5.1+ */ + background: -o-linear-gradient(top, #f4fafe 0%, #ccf0f0 100%); + /* Opera11.10+ */ + background: -ms-linear-gradient(top, #f4fafe 0%, #ccf0f0 100%); + /* IE10+ */ + background: linear-gradient(top, #f4fafe 0%, #ccf0f0 100%); + /* W3C */ + background-attachment: fixed; +} +.deck-container > .slide { + text-shadow: 1px 1px 1px rgba(255, 255, 255, 0.5); +} +.deck-container > .slide .deck-before, .deck-container > .slide .deck-previous { + opacity: 0.4; +} +.deck-container > .slide .deck-before:not(.deck-child-current) .deck-before, .deck-container > .slide .deck-before:not(.deck-child-current) .deck-previous, .deck-container > .slide .deck-previous:not(.deck-child-current) .deck-before, .deck-container > .slide .deck-previous:not(.deck-child-current) .deck-previous { + opacity: 1; +} +.deck-container > .slide .deck-child-current { + opacity: 1; +} +.deck-container .slide h1, .deck-container .slide h2, .deck-container .slide h3, .deck-container .slide h4, .deck-container .slide h5, .deck-container .slide h6 { + font-family: "Hoefler Text", Constantia, Palatino, "Palatino Linotype", "Book Antiqua", Georgia, serif; + font-size: 1.75em; +} +.deck-container .slide h1 { + color: #08455f; +} +.deck-container .slide h2 { + color: #0b7495; + border-bottom: 0; +} +.cssreflections .deck-container .slide h2 { + line-height: 1; + 
-webkit-box-reflect: below -0.556em -webkit-gradient(linear, left top, left bottom, from(transparent), color-stop(0.3, transparent), color-stop(0.7, rgba(255, 255, 255, 0.1)), to(transparent)); + -moz-box-reflect: below -0.556em -moz-linear-gradient(top, transparent 0%, transparent 30%, rgba(255, 255, 255, 0.3) 100%); +} +.deck-container .slide h3 { + color: #000; +} +.deck-container .slide pre { + border-color: #cde; + background: #fff; + position: relative; + z-index: auto; + /* http://nicolasgallagher.com/css-drop-shadows-without-images/ */ +} +.borderradius .deck-container .slide pre { + -webkit-border-radius: 5px; + -moz-border-radius: 5px; + border-radius: 5px; +} +.csstransforms.boxshadow .deck-container .slide pre > :first-child:before { + content: ""; + position: absolute; + z-index: -1; + background: #fff; + top: 0; + bottom: 0; + left: 0; + right: 0; +} +.csstransforms.boxshadow .deck-container .slide pre:before, .csstransforms.boxshadow .deck-container .slide pre:after { + content: ""; + position: absolute; + z-index: -2; + bottom: 15px; + width: 50%; + height: 20%; + max-width: 300px; + -webkit-box-shadow: 0 15px 10px rgba(0, 0, 0, 0.7); + -moz-box-shadow: 0 15px 10px rgba(0, 0, 0, 0.7); + box-shadow: 0 15px 10px rgba(0, 0, 0, 0.7); +} +.csstransforms.boxshadow .deck-container .slide pre:before { + left: 10px; + -webkit-transform: rotate(-3deg); + -moz-transform: rotate(-3deg); + -ms-transform: rotate(-3deg); + -o-transform: rotate(-3deg); + transform: rotate(-3deg); +} +.csstransforms.boxshadow .deck-container .slide pre:after { + right: 10px; + -webkit-transform: rotate(3deg); + -moz-transform: rotate(3deg); + -ms-transform: rotate(3deg); + -o-transform: rotate(3deg); + transform: rotate(3deg); +} +.deck-container .slide code { + color: #789; +} +.deck-container .slide blockquote { + font-family: "Hoefler Text", Constantia, Palatino, "Palatino Linotype", "Book Antiqua", Georgia, serif; + font-size: 2em; + padding: 1em 2em .5em 2em; + color: #000; + 
background: #fff; + position: relative; + border: 1px solid #cde; + z-index: auto; +} +.borderradius .deck-container .slide blockquote { + -webkit-border-radius: 5px; + -moz-border-radius: 5px; + border-radius: 5px; +} +.boxshadow .deck-container .slide blockquote > :first-child:before { + content: ""; + position: absolute; + z-index: -1; + background: #fff; + top: 0; + bottom: 0; + left: 0; + right: 0; +} +.boxshadow .deck-container .slide blockquote:after { + content: ""; + position: absolute; + z-index: -2; + top: 10px; + bottom: 10px; + left: 0; + right: 50%; + -moz-border-radius: 10px/100px; + border-radius: 10px/100px; + -webkit-box-shadow: 0 0 15px rgba(0, 0, 0, 0.6); + -moz-box-shadow: 0 0 15px rgba(0, 0, 0, 0.6); + box-shadow: 0 0 15px rgba(0, 0, 0, 0.6); +} +.deck-container .slide blockquote p { + margin: 0; +} +.deck-container .slide blockquote cite { + font-size: .5em; + font-style: normal; + font-weight: bold; + color: #888; +} +.deck-container .slide blockquote:before { + content: "“"; + position: absolute; + top: 0; + left: 0; + font-size: 5em; + line-height: 1; + color: #ccf0f0; + z-index: 1; +} +.deck-container .slide ::-moz-selection { + background: #08455f; + color: #fff; +} +.deck-container .slide ::selection { + background: #08455f; + color: #fff; +} +.deck-container .slide a, .deck-container .slide a:hover, .deck-container .slide a:focus, .deck-container .slide a:active, .deck-container .slide a:visited { + color: #599; + text-decoration: none; +} +.deck-container .slide a:hover, .deck-container .slide a:focus { + text-decoration: underline; +} +.deck-container .deck-prev-link, .deck-container .deck-next-link { + background: #fff; + opacity: 0.5; +} +.deck-container .deck-prev-link, .deck-container .deck-prev-link:hover, .deck-container .deck-prev-link:focus, .deck-container .deck-prev-link:active, .deck-container .deck-prev-link:visited, .deck-container .deck-next-link, .deck-container .deck-next-link:hover, .deck-container 
.deck-next-link:focus, .deck-container .deck-next-link:active, .deck-container .deck-next-link:visited { + color: #599; +} +.deck-container .deck-prev-link:hover, .deck-container .deck-prev-link:focus, .deck-container .deck-next-link:hover, .deck-container .deck-next-link:focus { + opacity: 1; + text-decoration: none; +} +.deck-container .deck-status { + font-size: 0.6666em; +} +.deck-container.deck-menu .slide { + background: transparent; + -webkit-border-radius: 5px; + -moz-border-radius: 5px; + border-radius: 5px; +} +.rgba .deck-container.deck-menu .slide { + background: rgba(0, 0, 0, 0.1); +} +.deck-container.deck-menu .slide.deck-current, .rgba .deck-container.deck-menu .slide.deck-current, .no-touch .deck-container.deck-menu .slide:hover { + background: #fff; +} +.deck-container .goto-form { + background: #fff; + border: 1px solid #cde; + -webkit-border-radius: 5px; + -moz-border-radius: 5px; + border-radius: 5px; +} +.boxshadow .deck-container .goto-form { + -webkit-box-shadow: 0 15px 10px -10px rgba(0, 0, 0, 0.5), 0 1px 4px rgba(0, 0, 0, 0.3), 0 0 40px rgba(0, 0, 0, 0.1) inset; + -moz-box-shadow: 0 15px 10px -10px rgba(0, 0, 0, 0.5), 0 1px 4px rgba(0, 0, 0, 0.3), 0 0 40px rgba(0, 0, 0, 0.1) inset; + box-shadow: 0 15px 10px -10px rgba(0, 0, 0, 0.5), 0 1px 4px rgba(0, 0, 0, 0.3), 0 0 40px rgba(0, 0, 0, 0.1) inset; +} diff --git a/talk/vmil2012/Makefile b/talk/vmil2012/Makefile --- a/talk/vmil2012/Makefile +++ b/talk/vmil2012/Makefile @@ -6,8 +6,14 @@ pdflatex paper mv paper.pdf jit-guards.pdf +UNAME := $(shell "uname") view: jit-guards.pdf +ifeq ($(UNAME), Linux) evince jit-guards.pdf & +endif +ifeq ($(UNAME), Darwin) + open jit-guards.pdf & +endif %.tex: %.py pygmentize -l python -o $@ $< diff --git a/talk/vmil2012/difflogs.py b/talk/vmil2012/difflogs.py new file mode 100755 --- /dev/null +++ b/talk/vmil2012/difflogs.py @@ -0,0 +1,180 @@ +#!/usr/bin/env python +""" +Parse and summarize the traces produced by pypy-c-jit when PYPYLOG is set. 
+only works for logs when unrolling is disabled +""" + +import py +import os +import sys +import csv +import optparse +from pprint import pprint +from pypy.tool import logparser +from pypy.jit.tool.oparser import parse +from pypy.jit.metainterp.history import ConstInt +from pypy.rpython.lltypesystem import llmemory, lltype + +categories = { + 'setfield_gc': 'set', + 'setarrayitem_gc': 'set', + 'strsetitem': 'set', + 'getfield_gc': 'get', + 'getfield_gc_pure': 'get', + 'getarrayitem_gc': 'get', + 'getarrayitem_gc_pure': 'get', + 'strgetitem': 'get', + 'new': 'new', + 'new_array': 'new', + 'newstr': 'new', + 'new_with_vtable': 'new', + 'guard_class': 'guard', + 'guard_nonnull_class': 'guard', +} + +all_categories = 'new get set guard numeric rest'.split() + +def extract_opnames(loop): + loop = loop.splitlines() + for line in loop: + if line.startswith('#') or line.startswith("[") or "end of the loop" in line: + continue + frontpart, paren, _ = line.partition("(") + assert paren + if " = " in frontpart: + yield frontpart.split(" = ", 1)[1] + elif ": " in frontpart: + yield frontpart.split(": ", 1)[1] + else: + yield frontpart + +def summarize(loop, adding_insns={}): # for debugging + insns = adding_insns.copy() + seen_label = True + if "label" in loop: + seen_label = False + for opname in extract_opnames(loop): + if not seen_label: + if opname == 'label': + seen_label = True + else: + assert categories.get(opname, "rest") == "get" + continue + if opname.startswith("int_") or opname.startswith("float_"): + opname = "numeric" + else: + opname = categories.get(opname, 'rest') + insns[opname] = insns.get(opname, 0) + 1 + assert seen_label + return insns + +def compute_summary_diff(loopfile, options): + print loopfile + log = logparser.parse_log_file(loopfile) + loops, summary = consider_category(log, options, "jit-log-opt-") + + # non-optimized loops and summary + nloops, nsummary = consider_category(log, options, "jit-log-noopt-") + diff = {} + keys = 
set(summary.keys()).union(set(nsummary)) + for key in keys: + before = nsummary[key] + after = summary[key] + diff[key] = (before-after, before, after) + return len(loops), summary, diff + +def main(loopfile, options): + _, summary, diff = compute_summary_diff(loopfile, options) + + print + print 'Summary:' + print_summary(summary) + + if options.diff: + print_diff(diff) + +def consider_category(log, options, category): + loops = logparser.extract_category(log, category) + if options.loopnum is None: + input_loops = loops + else: + input_loops = [loops[options.loopnum]] + summary = dict.fromkeys(all_categories, 0) + for loop in input_loops: + summary = summarize(loop, summary) + return loops, summary + + +def print_summary(summary): + ops = [(summary[key], key) for key in summary] + ops.sort(reverse=True) + for n, key in ops: + print '%5d' % n, key + +def print_diff(diff): + ops = [(key, before, after, d) for key, (d, before, after) in diff.iteritems()] + ops.sort(reverse=True) + tot_before = 0 + tot_after = 0 + print ",", + for key, before, after, d in ops: + print key, ", ,", + print "total" + print args[0], ",", + for key, before, after, d in ops: + tot_before += before + tot_after += after + print before, ",", after, ",", + print tot_before, ",", tot_after + +def mainall(options): + logs = os.listdir("logs") + all = [] + for log in logs: + parts = log.split(".") + if len(parts) != 3: + continue + l, exe, bench = parts + if l != "logbench": + continue + all.append((exe, bench, log)) + all.sort() + with file("logs/summary.csv", "w") as f: + csv_writer = csv.writer(f) + row = ["exe", "bench", "number of loops"] + for cat in all_categories: + row.append(cat + " before") + row.append(cat + " after") + csv_writer.writerow(row) + print row + for exe, bench, log in all: + num_loops, summary, diff = compute_summary_diff("logs/" + log, options) + print diff + print exe, bench, summary + row = [exe, bench, num_loops] + for cat in all_categories: + difference, before, after =
diff[cat] + row.append(before) + row.append(after) + csv_writer.writerow(row) + print row + +if __name__ == '__main__': + parser = optparse.OptionParser(usage="%prog loopfile [options]") + parser.add_option('-n', '--loopnum', dest='loopnum', default=None, metavar='N', type=int, + help='show only the loop number N [default: all]') + parser.add_option('-a', '--all', dest='loopnum', action='store_const', const=None, + help='show all loops in the file') + parser.add_option('-d', '--diff', dest='diff', action='store_true', default=False, + help='print the difference between non-optimized and optimized operations in the loop(s)') + parser.add_option('--diffall', dest='diffall', action='store_true', default=False, + help='diff all the log files around') + + options, args = parser.parse_args() + if options.diffall: + mainall(options) + elif len(args) != 1: + parser.print_help() + sys.exit(2) + else: + main(args[0], options) diff --git a/talk/vmil2012/paper.tex b/talk/vmil2012/paper.tex --- a/talk/vmil2012/paper.tex +++ b/talk/vmil2012/paper.tex @@ -104,10 +104,10 @@ The contributions of this paper are: \begin{itemize} - \item + \item \end{itemize} -The paper is structured as follows: +The paper is structured as follows: \section{Background} \label{sec:Background} @@ -116,6 +116,34 @@ \label{sub:pypy} +The RPython language and the PyPy project were started in 2002 with the goal of +creating a Python interpreter written in a high-level language, allowing easy +language experimentation and extension. PyPy is now a fully compatible +alternative implementation of the Python language, xxx mention speed. The +implementation takes advantage of the language features provided by RPython, +such as the tracing just-in-time compiler described below. + +RPython, the language and the toolset originally developed to implement the +Python interpreter, have developed into a general environment for experimenting +with and developing fast and maintainable dynamic language implementations.
xxx Mention +the different language impls. + +RPython consists of two components: the language and the translation toolchain +used to transform RPython programs into executable units. The RPython language +is a statically typed, object-oriented, high-level language. It provides +several features such as automatic memory management (i.e. garbage collection) +and just-in-time compilation. When writing an interpreter using RPython, the +programmer only has to write the interpreter for the language she is +implementing. The second RPython component, the translation toolchain, is used +to transform the program into a low-level representation suited to being +compiled and run on one of the supported target platforms/architectures, such +as C, .NET and Java. During the transformation process, +low-level aspects suited to the target environment, such as (if needed) a +garbage collector and, with some hints provided by the author, a just-in-time +compiler, are automatically added to the program. + + + \subsection{PyPy's Meta-Tracing JIT Compilers} \label{sub:tracing} @@ -134,7 +162,7 @@ * High level handling of resumedata * trade-off fast tracing v/s memory usage - * creation in the frontend + * creation in the frontend * optimization * compression * interaction with optimization _______________________________________________ pypy-commit mailing list pypy-commit@python.org http://mail.python.org/mailman/listinfo/pypy-commit