Author: Maciej Fijalkowski <fij...@gmail.com> Branch: extradoc Changeset: r4129:5cf115078d22 Date: 2012-03-07 08:02 -0800 http://bitbucket.org/pypy/extradoc/changeset/5cf115078d22/
Log: merge diff --git a/talk/pycon2012/jit/instancemap.png b/talk/pycon2012/jit/instancemap.png new file mode 100644 index 0000000000000000000000000000000000000000..48ef9eb5291c59f5717cd1d13c391e7371dc19bb GIT binary patch [cut] diff --git a/talk/pycon2012/jit/slides.tex b/talk/pycon2012/jit/slides.tex new file mode 100644 --- /dev/null +++ b/talk/pycon2012/jit/slides.tex @@ -0,0 +1,385 @@ +\documentclass[14pt]{beamer} + +\mode<presentation> +{ + \usetheme{default} + % or ... + + \setbeamercovered{transparent} + % or whatever (possibly just delete it) +} + + +\usepackage[english]{babel} +\usepackage[utf8]{inputenc} +\usepackage{listings} + +% \usepackage{times} +% \usepackage[T1]{fontenc} +% Or whatever. Note that the encoding and the font should match. If T1 +% does not look nice, try deleting the line with the fontenc. + + +\title{How the PyPy JIT works} + +\author{Benjamin Peterson \\ benja...@python.org} +% - Use the \inst{?} command only if the authors have different +% affiliation. + +\date{PyCon 2012} + + +% If you have a file called "university-logo-filename.xxx", where xxx +% is a graphic format that can be processed by latex or pdflatex, +% resp., then you can add a logo as follows: + +% \pgfdeclareimage[height=0.5cm]{university-logo}{university-logo-filename} +% \logo{\pgfuseimage{university-logo}} + + +% If you wish to uncover everything in a step-wise fashion, uncomment +% the following command: + +%\beamerdefaultoverlayspecification{<+->} + +\lstset{basicstyle=\scriptsize\ttfamily} + + +\begin{document} + +\begin{frame} + \titlepage +\end{frame} + +% Since this a solution template for a generic talk, very little can +% be said about how it should be structured. However, the talk length +% of between 15min and 45min and the theme suggest that you stick to +% the following rules: + +% - Exactly two or three sections (other than the summary). +% - At *most* three subsections per section. +% - Talk about 30s to 2min per frame. So there should be between about +% 15 and 30 frames, all told. + +\begin{frame} + % - A title should summarize the slide in an understandable fashion + % for anyone how does not follow everything on the slide itself. + + ``If the implementation is hard to explain, it's a bad idea.'' + \vskip0pt plus.5fill + \uncover<2->{Except PyPy!} +\end{frame} + +\begin{frame}{Design Principles} +\begin{itemize} +\item interpreter agnostic +\item but allow language implementation to help JIT +\item compile hot code: loops or functions +\item use runtime information to remove dynamic overhead +\item optimize common and idiomatic code +\end{itemize} +\end{frame} + +\begin{frame}{Overview} +\begin{itemize} +\item translation +\item runtime +\begin{itemize} +\item tracing +\item optimization +\item assembly generation +\end{itemize} +\end{itemize} +\end{frame} + +\begin{frame}[fragile]{Adding JIT to an interpreter} +\begin{itemize} +\item JIT args +\begin{itemize} +\item ``greens'' = constants +\item ``reds'' = variables +\end{itemize} +\item hints +\begin{itemize} +\item \verb+can_enter_jit+ +\item \verb+jit_merge_point+ +\end{itemize} +\end{itemize} +\end{frame} + +\begin{frame}[fragile]{JIT on the Python interpreter} +\begin{itemize} +\item greens: code object and instruction pointer +\item reds: frame object and execution context +\end{itemize} +\begin{lstlisting}[language=Python] +while True: + jit_merge_point() + next_instr = self.handle_bytecode(code, next_instr) +\end{lstlisting} +\begin{lstlisting}[language=Python] +def JUMP_ABSOLUTE(jump_to): + can_enter_jit() + return jump_to +\end{lstlisting} +\end{frame} + +\begin{frame}{Generating the JIT} +\begin{itemize} +\item low-level flowgraphs converted to jitcodes +\item interpreter decides what functions to ``look into'' +\item jitcodes stored in binary +\end{itemize} +\end{frame} + +\begin{frame}[fragile]{Main example: Primality Test} +\begin{lstlisting}[language=Python] +def is_prime(n): + limit = int(n**.5) + i = 2 + while i <= limit: + if n % i == 0: + return False + i += 1 + return True +\end{lstlisting} +\end{frame} + +\begin{frame}[fragile]{Primality Test - Bytecode} +\tiny{ +\begin{verbatim} +# while i <= limit: +22 SETUP_LOOP 46 (to 71) +25 LOAD_FAST 2 (i) +28 LOAD_FAST 1 (limit) +31 COMPARE_OP 1 (<=) +34 POP_JUMP_IF_FALSE 70 +# if i % 1 === 0: +37 LOAD_FAST 0 (n) +40 LOAD_FAST 2 (i) +43 BINARY_MODULO +44 LOAD_CONST 3 (0) +47 COMPARE_OP 2 (==) +50 POP_JUMP_IF_FALSE 57 +# return False +53 LOAD_GLOBAL 1 (False) +56 RETURN_VALUE +# i += 1 +57 LOAD_FAST 2 (i) +60 LOAD_CONST 4 (1) +63 INPLACE_ADD +64 STORE_FAST 2 (i) +67 JUMP_ABSOLUTE 25 +\end{verbatim} +} +\end{frame} + +\begin{frame}[fragile]{Activating the JIT} +\begin{itemize} +\item every loop has a counter incremented when \verb+can_enter_jit+ is reached +\item tracing entered when threshold reached +\item slight fudging to avoid compiling everything suddenly +\end{itemize} +\end{frame} + +\begin{frame}{Tracing} +\begin{itemize} +\item metainterpreter executes saved jitcodes +\item records operations in JIT IR +\item traces through function calls, flattening loop +\item program values stored in boxes (pointers, integers, floats) +\item tracing ends when loop is closed +\end{itemize} +\end{frame} + +\begin{frame}{Guards} +\begin{itemize} +\item as the jit traces, it makes assumptions based on runtime information +\item these must hold otherwise trace is invalid +\item guards protect trace from different runtime conditions +\item examples: conditional branches, overflow, exceptions, other code accessing JIT data +\end{itemize} +\end{frame} + +\begin{frame}[fragile]{JIT IR example - Head of loop} +\footnotesize{ +\begin{verbatim} +# arguments to loop +[p0, p1, p2, p3, i4, p5, i6, i7, p8, p9, p10, p11, p12, p13] +#25 LOAD_FAST +guard_value(i6, 3) +guard_nonnull(p10) +guard_value(i4, 0) +#28 LOAD_FAST +guard_nonnull(p9) +#31 COMPARE_OP +guard_class(p10, W_IntObject) +guard_class(p9, W_IntObject) +i18 = getfield_gc_pure(p10, <W_IntObject.inst_intval>) +i19 = getfield_gc_pure(p9, <W_IntObject.inst_intval>) +i20 = int_le(i18, i19) +guard_true(i20) +#34 POP_JUMP_IF_FALSE +\end{verbatim} +} +\end{frame} + +\begin{frame}[fragile]{JIT IR example - computing modulo} +\footnotesize{ +\begin{verbatim} +#37 LOAD_FAST +guard_nonnull(p8) +#40 LOAD_FAST +#43 BINARY_MODULO +guard_class(p8, W_IntObject) +i22 = getfield_gc_pure(p8, <W_IntObject.inst_intval>) +i23 = int_is_zero(i18) +guard_false(i23) +i25 = int_eq(i22, -9223372036854775808) +guard_false(i25) +i26 = int_mod(i22, i18) +i28 = int_lt(i18, 0) +guard_false(i28) +i30 = int_rshift(i26, 63) +i31 = int_and(i18, i30) +i32 = int_add(i26, i31) +p34 = new_with_vtable(W_IntObject) +setfield_gc(p34, i32, <W_IntObject.inst_intval>) +\end{verbatim} +} +\end{frame} + +\begin{frame}[fragile]{JIT IR example - check modulo} +\footnotesize{ +\begin{verbatim} +#44 LOAD_CONST +guard_value(p3, ConstPtr(ptr35)) +#47 COMPARE_OP') +i37 = int_eq(i32, 0) +guard_false(i37) +#50 POP_JUMP_IF_FALSE +\end{verbatim} +} +\end{frame} + +\begin{frame}[fragile]{JIT IR example - finish loop} +\footnotesize{ +\begin{verbatim} +#57 LOAD_FAST +#60 LOAD_FAST +#63 INPLACE_ADD +i38 = getfield_gc_pure(p10, <W_IntObject.inst_intval>) +i40 = int_add_ovf(i38, 1) +guard_no_overflow() +p42 = new_with_vtable(W_IntObject) +setfield_gc(p42, i40, <W_IntObject.inst_intval>) +#64 STORE_FAST +#67 JUMP_ABSOLUTE +\end{verbatim} +} +\end{frame} + +\begin{frame}{Optimizations} +\begin{itemize} +\item limited by speed +\item classical compiler optimizations +\begin{itemize} +\item constant folding +\item strength reduction +\item integer bounds +\end{itemize} +\item most important is removing indirection: virtuals and virtualizables +\end{itemize} +\end{frame} + +\begin{frame}{Virtuals} +\begin{itemize} +\item if objects allocated in the loop don't escape it, we don't have to + allocate them +\item removes boxing +\item all field access can be killed +\item in our example, removes integer allocations +\end{itemize} +\end{frame} + +\begin{frame}{Virtualizables} +\begin{itemize} +\item like virtuals but allowed to escape trace +\item handled at metainterp level +\item Python interpreter: the frame object +\item possibility of escape creates bookkeeping problems +\item advantage of JIT generator approach +\end{itemize} +\end{frame} + +\begin{frame}{Unrolling} +\begin{itemize} +\item need some way to compute loop invariants before reaching optimized loop +\item solution: two optimization passes +\item first iteration computes loop invariants +\item second iteration is tight loop +\end{itemize} +\end{frame} + +\begin{frame}{JIT viewer} +Our fancy toy for visualizing JIT +\end{frame} + +\begin{frame}{Creating assembly} +\begin{itemize} +\item surprisingly straightforward +\item linear register allocator +\item gc has to be informed of dynamic allocations +\item assembly now ready to be run +\end{itemize} +\end{frame} + +\begin{frame}{Bailing from the JIT} +\begin{itemize} +\item guards fail while assembly is running +\item assembler has written a compact description of hower to recover state +\item state reconstructed +\item blackhole interpreter +\item if gaurd fails repeatedly, compile a bridge +\end{itemize} +\end{frame} + +\begin{frame}[fragile]{How the Python interpreter helps the JIT} +\begin{itemize} +\item constant promotion +\item decorators: \verb+unroll_safe+, \verb+elidable+ +\item data structures optimized for the JIT +\begin{itemize} +\item map dicts +\item cell dicts +\item list and dict strategies +\item specialized tuples +\end{itemize} +\end{itemize} +\end{frame} + +\begin{frame}{Map dicts} +\begin{itemize} +\item shares object attributes +\item instance data stored in array +\item memory efficient as well as good for the JIT +\end{itemize} +\includegraphics[scale=.45]{instancemap.png} +\end{frame} + +\begin{frame}[fragile]{Thank you} +Questions? + +Resources +\begin{itemize} +\item \#pypy on Freenode +\item http://morepypy.blogspot.com +\item pypy-...@python.org +\item The Architecture of Open Source Applications, Volume 2 +\end{itemize} + + +My email: benja...@python.org +\end{frame} + +\end{document} diff --git a/talk/pycon2012/tutorial/Makefile b/talk/pycon2012/tutorial/Makefile --- a/talk/pycon2012/tutorial/Makefile +++ b/talk/pycon2012/tutorial/Makefile @@ -8,3 +8,6 @@ view: slides.pdf evince slides.pdf & + +clean: + rm -f slides.pdf slides.latex diff --git a/talk/pycon2012/tutorial/slides.rst b/talk/pycon2012/tutorial/slides.rst --- a/talk/pycon2012/tutorial/slides.rst +++ b/talk/pycon2012/tutorial/slides.rst @@ -87,7 +87,7 @@ Performance sweetspots ====================== -* every VM has it's sweetspot +* every VM has its sweetspot * we try hard to make it wider and wider @@ -130,14 +130,14 @@ Tracing JIT =========== -* once the loop gets hot, it's starting tracing (1039 runs, or 1619 function +* once the loop gets hot, it starts tracing (1039 runs, or 1619 function calls) -* generating operations following how the interpreter executes them +* generates operations following how the interpreter executes them -* optimizing them +* optimizes chunks of operations -* compiling to assembler (x86, ppc or arm) +* compiles to assembler (x86, ppc or arm) PyPy's specific features ======================== @@ -148,6 +148,36 @@ * decent tools for inspecting the generated code +The PyPy cake +============= + +.. raw:: latex + + \begin{columns}[c] + \column{0.6\textwidth} + +* Your Python code +* PyPy interpreter (RPython) +* High-level flow graphs +* Low-level flow graphs + + * JIT + + * knows about exceptions and GC + + * C representation + + * reduces flow graphs to remove + exceptions and GC + +.. raw:: latex + + \column{0.4\textwidth} + \includegraphics[width=\textwidth]{images/cake} + \end{columns} + +.. This comment is here to make the LaTeX work. + Performance characteristics - runtime ===================================== @@ -199,13 +229,13 @@ ================= * lists are specialized for type - ``int``, ``float``, ``str``, ``unicode`` and - ``range``. + ``range()``. * appending a new type to an existing list makes you iterate over the entire list and rewrite everything. -Itertools abuse -=============== +Simpler is Faster +================= * some examples @@ -216,8 +246,10 @@ * if we've never seen a use of some piece of stdlib, chances are it'll be suboptimal on pypy -Obscure stuff -============= +* no really, simple is good + +Things we could improve +======================= * frame access is slow @@ -225,12 +257,12 @@ * profiling & tracing hooks -* a bit in the state of flux +* all works but could be optimized more JitViewer ========= -* http://bitbucket.org/pypy/jitviewer +* ``bitbucket.org/pypy/jitviewer`` * ``mkvirtualenv -p <path to pypy>`` @@ -251,33 +283,3 @@ * functions are inlined * sometimes completely twisted flow - -The PyPy Cake -============= - -.. raw:: latex - - \begin{columns}[c] - \column{0.6\textwidth} - -* Your Python code -* PyPy interpreter (RPython) -* High-level flow graphs -* Low-level flow graphs - - * JIT - - * knows about exceptions and GC - - * C representation - - * reduces flow graphs to remove - exceptions and GC - -.. raw:: latex - - \column{0.4\textwidth} - \includegraphics[width=\textwidth]{images/cake} - \end{columns} - -.. This comment is here to make the LaTeX work. _______________________________________________ pypy-commit mailing list pypy-commit@python.org http://mail.python.org/mailman/listinfo/pypy-commit