Author: Laura Creighton <[email protected]>
Branch: extradoc
Changeset: r3799:c3845bc5089b
Date: 2011-06-28 11:50 +0200
http://bitbucket.org/pypy/extradoc/changeset/c3845bc5089b/
Log: merge heads
diff --git a/talk/icooolps2011/bolz-hints-final.pdf
b/talk/icooolps2011/bolz-hints-final.pdf
new file mode 100644
index
0000000000000000000000000000000000000000..197cad8cabb11ab154bd9109fe0416bb689bbb98
GIT binary patch
[cut]
diff --git a/talk/icooolps2011/jit-hints.pdf b/talk/icooolps2011/jit-hints.pdf
index
c78b3b84550a3db53382fb1fb1a9a97c0596a4ef..afbc33e62d31d10aa178450e5a83b3e086fee9b8
GIT binary patch
[cut]
diff --git a/talk/icooolps2011/paper.tex b/talk/icooolps2011/paper.tex
--- a/talk/icooolps2011/paper.tex
+++ b/talk/icooolps2011/paper.tex
@@ -1020,7 +1020,7 @@
The authors would like to thank Peng Wu, David Edelsohn and Laura Creighton for
encouragement, fruitful discussions and feedback during the writing of this
paper.
This research was partially supported by the BMBF funded project PyJIT (nr.
01QE0913B;
-Eureka Eurostars).
+Eureka Eurostars). We also want to thank the anonymous reviewers for their
feedback.
\bibliographystyle{abbrv}
\bibliography{paper}
diff --git a/talk/iwtc11/paper.bib b/talk/iwtc11/paper.bib
--- a/talk/iwtc11/paper.bib
+++ b/talk/iwtc11/paper.bib
@@ -109,6 +109,16 @@
year = {2009}
},
+@inproceedings{bolz_runtime_2011,
+ address = {Lancaster, {UK}},
+ title = {Runtime Feedback in a {Meta-Tracing} {JIT} for Efficient
Dynamic Languages},
+ abstract = {Meta-tracing {JIT} compilers can be applied to a variety of
different languages without explicitly encoding language semantics into the
compiler. So far, they lacked a way to give the language implementor control
over runtime feedback. This restricted their performance. In this paper we
describe the mechanisms in {PyPy’s} meta-tracing {JIT} that can be used
to control runtime feedback in language-specific ways. These mechanisms are
flexible enough to express classical {VM} techniques such as maps and runtime
type feedback.},
+ booktitle = {{ICOOOLPS}},
+ publisher = {{ACM}},
+ author = {Bolz, Carl Friedrich and Cuni, Antonio and Fijałkowski,
Maciej and Leuschel, Michael and Rigo, Armin and Pedroni, Samuele},
+ year = {2011}
+},
+
@inproceedings{chang_tracing_2009,
address = {Washington, {DC}},
title = {Tracing for Web 3.0: Trace Compilation for the Next Generation
Web Applications},
diff --git a/talk/iwtc11/paper.tex b/talk/iwtc11/paper.tex
--- a/talk/iwtc11/paper.tex
+++ b/talk/iwtc11/paper.tex
@@ -45,7 +45,6 @@
\usepackage{listings}
\usepackage{beramono}
-
\definecolor{gray}{rgb}{0.3,0.3,0.3}
\lstset{
@@ -116,7 +115,7 @@
{Heinrich-Heine-Universität Düsseldorf}
{[email protected]}
\authorinfo{Maciej Fijałkowski}
- {Unaffiliated}
+ {}
{[email protected]}
\maketitle
@@ -233,8 +232,11 @@
Because $i_0$ is loop-invariant, the addition could be moved out of the loop.
However, we want to get this effect using our existing optimization passes
-without changing them too much. To achieve this, we peel one iteration off the
-loop before running the optimizations. This peeling gives the following trace:
+without changing them too much. Simple optimizations with a single forward
+pass cannot directly achieve this effect: they only look at the trace,
+without taking into account that the trace executes many times in a row.
+Therefore, to achieve loop-invariant code motion, we peel one iteration off
+the loop before running
+the optimizations. This peeling gives the following trace:
\begin{lstlisting}[mathescape,numbers =
right,basicstyle=\setstretch{1.05}\ttfamily\scriptsize]
$L_0$($i_{0}$):
@@ -292,9 +294,14 @@
changing them at all. All that is needed is to peel off one iteration, then
apply simple one-pass optimizations and make sure that the necessary extra
arguments are inserted into the label of the loop itself and the jumps
-afterwards. Giving the optimizations two iterations together
-gives the optimization enough context to remove operations from the peeled
loop,
-because it detects that the operation was performed in the preamble already.
+afterwards.
+
+This is the key insight of the proposed implementation scheme: giving an
+optimization two iterations at once provides enough context to remove
+operations from the peeled loop, because the optimization detects that the
+operation was already performed in the preamble. Thus at runtime these moved
+operations are executed only once, when entering the loop, and their results
+are reused in further iterations.
% section Motivation (end)
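The peeling scheme described in this hunk can be sketched in ordinary Python (a hypothetical illustration of the idea, not PyPy's trace IR; `original` and `peeled` are invented names):

```python
# Hypothetical sketch of loop peeling for loop-invariant code motion;
# plain Python standing in for the optimized trace, not PyPy's actual IR.

def original(i0, n):
    # The addition i0 + 1 is loop-invariant but recomputed every iteration.
    total = 0
    for _ in range(n):
        step = i0 + 1
        total += step
    return total

def peeled(i0, n):
    # One iteration is peeled off into a "preamble": the invariant
    # addition runs once, and the peeled loop reuses its result.
    total = 0
    if n > 0:
        step = i0 + 1      # computed once, on loop entry
        total += step
        for _ in range(n - 1):
            total += step  # result reused in further iterations
    return total
```

Both functions compute the same result; the peeled version simply performs the invariant addition once instead of `n` times.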
@@ -436,9 +443,6 @@
\section{Making Trace Optimizations Loop Aware}
-XXX make clear that the preamble is not necessarily the \emph{first} iteration
-of a loop
-
Before the trace is passed to a backend compiling it into machine code
it needs to be optimized to achieve better performance.
The focus of this paper
@@ -478,6 +482,8 @@
However, the peeled loop can then be optimized using the assumption that a
previous iteration has happened.
+XXX (samuele): the point about the first iteration is hard to understand
+
When applying optimizations to this two-iteration trace
some care has to be taken as to how the arguments of the two
\lstinline{jump} operations and the input arguments of the peeled loop are
@@ -713,6 +719,8 @@
K$.
\subsection{Allocation Removals}
+\label{sub:allocation}
+
PyPy's allocation removal optimization \cite{bolz_allocation_2011} makes it
possible to identify objects that are allocated within the loop but never
escape it. Those objects have to be allocated in the loop, but no outside
@@ -744,8 +752,8 @@
In the general case, each allocation-removed object in the jump arguments is
exploded into a
vector of variables containing the values of all registered
-fields\footnote{This is sometimes called \emph{scalar replacement}. XXX check
-whether that's true}. If some of the fields are themselves references to
+fields\footnote{This is sometimes called \emph{scalar replacement}.}.
+If some of the fields are themselves references to
allocation-removed objects they are recursively exploded
to make the vector contain only concrete variables. Some care has
to be taken to always place the fields in the same order when
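The exploding of allocation-removed objects into a vector of field values can be sketched as follows (a hypothetical illustration, not PyPy's API; a nested dict stands in for a nested allocation-removed object):

```python
# Hypothetical sketch: "explode" an allocation-removed object into a flat
# vector of its field values, recursing into nested removed objects.
# A dict stands in for an allocation-removed object here.

def explode(fields):
    vector = []
    for name in sorted(fields):   # fields must be kept in a fixed order
        value = fields[name]
        if isinstance(value, dict):
            # Field references another allocation-removed object:
            # explode it recursively so the vector stays flat.
            vector.extend(explode(value))
        else:
            vector.append(value)
    return vector
```

For example, `explode({"x": 1, "y": {"z": 2}})` yields `[1, 2]`: the nested object disappears and only concrete field values remain.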
@@ -937,9 +945,10 @@
We can observe that PyPy (even without loop peeling) is orders of magnitude
faster than either CPython or Psyco. This is due to the JIT compilation
-advantages and optimizations we discussed in XXX [ref to other paper]. Loop
-peeling gives an additional XXX on average, which makes benchmark times
-comparable with native-compiled C code. Missing performance we attribute to
+advantages and optimizations we discussed in previous work
+\cite{bolz_allocation_2011, bolz_runtime_2011}. The geometric mean of the
+speedup of loop peeling is 70\%, which makes benchmark times
+comparable with native-compiled C code. We attribute the remaining
performance gap to
the relative immaturity of PyPy's JIT assembler backend as well as missing
optimizations, like instruction scheduling.
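The geometric mean used for the reported speedup is the n-th root of the product of the per-benchmark speedup factors; a minimal sketch (the factors below are invented, not the paper's measurements):

```python
import math

def geometric_mean(factors):
    # n-th root of the product, computed via logarithms for stability
    return math.exp(sum(math.log(f) for f in factors) / len(factors))

# Invented per-benchmark speedup factors, not the paper's data:
speedups = [1.2, 1.9, 2.1, 1.4]
mean_speedup = geometric_mean(speedups)
```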
@@ -952,20 +961,23 @@
\section{Related Work}
\label{sec:related}
-All the optimizations presented here are completely standard
-\cite{muchnick_advanced_1997}. XXX
+The effect of combining a one-pass optimization with loop peeling is
+completely standard loop-invariant code motion
+\cite{muchnick_advanced_1997}. We do not claim any novelty in the effect, but
+think that our implementation scheme is a very simple one.
Mike Pall, the author of LuaJIT\footnote{\texttt{http://luajit.org/}} seems to
have developed the described technique independently. There are no papers
about
-LuaJIT but the author of it writes on a mailing list: "The LOOP pass does
+LuaJIT but the author of it writes on a mailing list: ``The LOOP pass does
synthetic unrolling of the recorded IR, combining copy-substitution with
redundancy elimination to achieve code hoisting. The unrolled and
copy-substituted instructions are simply fed back into the compiler pipeline,
which allows reuse of all optimizations for redundancy elimination. Loop
-recurrences are detected on-the-fly and a minimized set of PHIs is generated."
+recurrences are detected on-the-fly and a minimized set of PHIs is generated.''
\cite{pall_luajit_2009}
-SPUR \cite{bebenita_spur:_2010} implements loop-invariant code motion
+Both the Hotpath VM \cite{gal_hotpathvm:_2006} and SPUR
+\cite{bebenita_spur:_2010} implement loop-invariant code motion
directly, by explicitly marking as loop-invariant all variables that stay the
same along all looping paths and then moving all pure computation that depends
only on these variables out of the loop. SPUR can also hoist loads out of the
@@ -973,8 +985,12 @@
move allocations out of the loop, but does not replace the object by its
fields.
This saves only the allocation, not the access to the object fields.
+The type specialization described by Gal \etal \cite{gal_trace-based_2009} can
+be seen as a similar optimization (again implemented manually) to
+the one described in Section~\ref{sub:allocation}: the effect of both is
+that type checks are fully done before a loop is even entered.
-XXX
+
% section Related Work (end)
\section{Conclusions}
@@ -1002,9 +1018,8 @@
%This is the text of the appendix, if you need one.
-\acks
-
-Acknowledgments, if needed.
+%\acks
+%Acknowledgments, if needed.
% We recommend abbrvnat bibliography style.
_______________________________________________
pypy-commit mailing list
[email protected]
http://mail.python.org/mailman/listinfo/pypy-commit