Author: Laura Creighton <[email protected]>
Branch: extradoc
Changeset: r3799:c3845bc5089b
Date: 2011-06-28 11:50 +0200
http://bitbucket.org/pypy/extradoc/changeset/c3845bc5089b/
Log: merge heads
diff --git a/talk/icooolps2011/bolz-hints-final.pdf
b/talk/icooolps2011/bolz-hints-final.pdf
new file mode 100644
index
0000000000000000000000000000000000000000..197cad8cabb11ab154bd9109fe0416bb689bbb98
GIT binary patch
[cut]
diff --git a/talk/icooolps2011/jit-hints.pdf b/talk/icooolps2011/jit-hints.pdf
index
c78b3b84550a3db53382fb1fb1a9a97c0596a4ef..afbc33e62d31d10aa178450e5a83b3e086fee9b8
GIT binary patch
[cut]
diff --git a/talk/icooolps2011/paper.tex b/talk/icooolps2011/paper.tex
--- a/talk/icooolps2011/paper.tex
+++ b/talk/icooolps2011/paper.tex
@@ -1020,7 +1020,7 @@
The authors would like to thank Peng Wu, David Edelsohn and Laura Creighton for
encouragement, fruitful discussions and feedback during the writing of this
paper.
This research was partially supported by the BMBF funded project PyJIT (nr.
01QE0913B;
-Eureka Eurostars).
+Eureka Eurostars). We also want to thank the anonymous reviewers for their
feedback.
\bibliographystyle{abbrv}
\bibliography{paper}
diff --git a/talk/iwtc11/paper.bib b/talk/iwtc11/paper.bib
--- a/talk/iwtc11/paper.bib
+++ b/talk/iwtc11/paper.bib
@@ -109,6 +109,16 @@
year = {2009}
},
+@inproceedings{bolz_runtime_2011,
+ address = {Lancaster, {UK}},
+ title = {Runtime Feedback in a {Meta-Tracing} {JIT} for Efficient
Dynamic Languages},
+ abstract = {Meta-tracing {JIT} compilers can be applied to a variety of
different languages without explicitly encoding language semantics into the
compiler. So far, they lacked a way to give the language implementor control
over runtime feedback. This restricted their performance. In this paper we
describe the mechanisms in {PyPy’s} meta-tracing {JIT} that can be used
to control runtime feedback in language-specific ways. These mechanisms are
flexible enough to express classical {VM} techniques such as maps and runtime
type feedback.},
+ booktitle = {{ICOOOLPS}},
+ publisher = {{ACM}},
+ author = {Bolz, Carl Friedrich and Cuni, Antonio and Fijałkowski,
Maciej and Leuschel, Michael and Rigo, Armin and Pedroni, Samuele},
+ year = {2011}
+},
+
@inproceedings{chang_tracing_2009,
address = {Washington, {DC}},
title = {Tracing for Web 3.0: Trace Compilation for the Next Generation
Web Applications},
diff --git a/talk/iwtc11/paper.tex b/talk/iwtc11/paper.tex
--- a/talk/iwtc11/paper.tex
+++ b/talk/iwtc11/paper.tex
@@ -45,7 +45,6 @@
\usepackage{listings}
\usepackage{beramono}
-
\definecolor{gray}{rgb}{0.3,0.3,0.3}
\lstset{
@@ -116,7 +115,7 @@
{Heinrich-Heine-Universität Düsseldorf}
{[email protected]}
\authorinfo{Maciej Fijałkowski}
- {Unaffiliated}
+ {}
{[email protected]}
\maketitle
@@ -233,8 +232,11 @@
Because $i_0$ is loop-invariant, the addition could be moved out of the loop.
However, we want to get this effect using our existing optimization passes
-without changing them too much. To achieve this, we peel one iteration off the
-loop before running the optimizations. This peeling gives the following trace:
+without changing them too much. Simple optimizations with a single forward
+pass cannot directly achieve this effect: they only look at the trace,
+without taking into account that the trace executes many times in a row.
+Therefore, to achieve loop-invariant code motion, we peel one iteration off
+the loop before running
+the optimizations. This peeling gives the following trace:
\begin{lstlisting}[mathescape,numbers =
right,basicstyle=\setstretch{1.05}\ttfamily\scriptsize]
$L_0$($i_{0}$):
@@ -292,9 +294,14 @@
changing them at all. All that is needed is to peel off one iteration, then
apply simple one-pass optimizations and make sure that the necessary extra
arguments are inserted into the label of the loop itself and the jumps
-afterwards. Giving the optimizations two iterations together
-gives the optimization enough context to remove operations from the peeled
loop,
-because it detects that the operation was performed in the preamble already.
+afterwards.
+
+This is the key insight of the proposed implementation scheme: giving an
+optimization two iterations at once provides enough context to remove
+operations from the peeled loop, because the optimization detects that the
+operation was already performed in the preamble. Thus at runtime these moved
+operations are executed only once, when entering the loop, and their results
+are reused in further iterations.
% section Motivation (end)
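The peeling scheme described in this hunk can be sketched in ordinary Python (a hypothetical illustration of the idea, not PyPy's trace IR; `original` and `peeled` are invented names):

```python
# Hypothetical sketch of loop peeling for loop-invariant code motion;
# plain Python standing in for the optimized trace, not PyPy's actual IR.

def original(i0, n):
    # The addition i0 + 1 is loop-invariant but recomputed every iteration.
    total = 0
    for _ in range(n):
        step = i0 + 1
        total += step
    return total

def peeled(i0, n):
    # One iteration is peeled off into a "preamble": the invariant
    # addition runs once, and the peeled loop reuses its result.
    total = 0
    if n > 0:
        step = i0 + 1      # computed once, on loop entry
        total += step
        for _ in range(n - 1):
            total += step  # result reused in further iterations
    return total
```

Both functions compute the same result; the peeled version simply performs the invariant addition once instead of `n` times.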
@@ -436,9 +443,6 @@
\section{Making Trace Optimizations Loop Aware}
-XXX make clear that the preamble is not necessarily the \emph{first} iteration
-of a loop
-
Before the trace is passed to a backend compiling it into machine code
it needs to be optimized to achieve better performance.
The focus of this paper
@@ -478,6 +482,8 @@
However, the peeled loop can then be optimized using the assumption that a
previous iteration has happened.
+XXX (samuele): the point about the first iteration is hard to understand
+
When applying optimizations to this two-iteration trace
some care has to be taken as to how the arguments of the two
\lstinline{jump} operations and the input arguments of the peeled loop are
@@ -713,6 +719,8 @@
K$.
\subsection{Allocation Removals}
+\label{sub:allocation}
+
PyPy's allocation removal optimization \cite{bolz_allocation_2011} makes it
possible to identify objects that are allocated within the loop but never
escape it. Those objects have to be allocated in the loop, but no outside
@@ -744,8 +752,8 @@
In the general case, each allocation-removed object in the jump arguments is
exploded into a
vector of variables containing the values of all registered
-fields\footnote{This is sometimes called \emph{scalar replacement}. XXX check
-whether that's true}. If some of the fields are themselves references to
+fields\footnote{This is sometimes called \emph{scalar replacement}.}.
+If some of the fields are themselves references to
allocation-removed objects they are recursively exploded
to make the vector contain only concrete variables. Some care has
to be taken to always place the fields in the same order when
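The exploding of allocation-removed objects into a vector of field values can be sketched as follows (a hypothetical illustration, not PyPy's API; a nested dict stands in for a nested allocation-removed object):

```python
# Hypothetical sketch: "explode" an allocation-removed object into a flat
# vector of its field values, recursing into nested removed objects.
# A dict stands in for an allocation-removed object here.

def explode(fields):
    vector = []
    for name in sorted(fields):   # fields must be kept in a fixed order
        value = fields[name]
        if isinstance(value, dict):
            # Field references another allocation-removed object:
            # explode it recursively so the vector stays flat.
            vector.extend(explode(value))
        else:
            vector.append(value)
    return vector
```

For example, `explode({"x": 1, "y": {"z": 2}})` yields `[1, 2]`: the nested object disappears and only concrete field values remain.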
@@ -937,9 +945,10 @@
We can observe that PyPy (even without loop peeling) is orders of magnitude
faster than either CPython or Psyco. This is due to the JIT compilation
-advantages and optimizations we discussed in XXX [ref to other paper]. Loop
-peeling gives an additional XXX on average, which makes benchmark times
-comparable with native-compiled C code. Missing performance we attribute to
+advantages and optimizations we discussed in previous work
+\cite{bolz_allocation_2011, bolz_runtime_2011}. The geometric mean of the
+speedup of loop peeling is 70\%, which makes benchmark times
+comparable with native-compiled C code. We attribute the remaining
performance gap to
the relative immaturity of PyPy's JIT assembler backend as well as missing
optimizations, like instruction scheduling.
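The geometric mean used for the reported speedup is the n-th root of the product of the per-benchmark speedup factors; a minimal sketch (the factors below are invented, not the paper's measurements):

```python
import math

def geometric_mean(factors):
    # n-th root of the product, computed via logarithms for stability
    return math.exp(sum(math.log(f) for f in factors) / len(factors))

# Invented per-benchmark speedup factors, not the paper's data:
speedups = [1.2, 1.9, 2.1, 1.4]
mean_speedup = geometric_mean(speedups)
```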
@@ -952,20 +961,23 @@
\section{Related Work}
\label{sec:related}
-All the optimizations presented here are completely standard
-\cite{muchnick_advanced_1997}. XXX
+The effect of combining a one-pass optimization with loop peeling is
+completely standard loop-invariant code motion
+\cite{muchnick_advanced_1997}. We do not claim any novelty in the effect, but
+think that our implementation scheme is a very simple one.
Mike Pall, the author of LuaJIT\footnote{\texttt{http://luajit.org/}} seems to
have developed the described technique independently. There are no papers
about
-LuaJIT but the author of it writes on a mailing list: "The LOOP pass does
+LuaJIT but the author of it writes on a mailing list: ``The LOOP pass does
synthetic unrolling of the recorded IR, combining copy-substitution with
redundancy elimination to achieve code hoisting. The unrolled and
copy-substituted instructions are simply fed back into the compiler pipeline,
which allows reuse of all optimizations for redundancy elimination. Loop
-recurrences are detected on-the-fly and a minimized set of PHIs is generated."
+recurrences are detected on-the-fly and a minimized set of PHIs is generated.''
\cite{pall_luajit_2009}
-SPUR \cite{bebenita_spur:_2010} implements loop-invariant code motion
+Both the Hotpath VM \cite{gal_hotpathvm:_2006} and SPUR
+\cite{bebenita_spur:_2010} implement loop-invariant code motion
directly, by explicitly marking as loop-invariant all variables that stay the
same along all looping paths and then moving all pure computation that depends
only on these variables out of the loop. SPUR can also hoist loads out of the
@@ -973,8 +985,12 @@
move allocations out of the loop, but does not replace the object by its
fields.
This saves only the allocation, not the access to the object fields.
+The type specialization described by Gal \etal \cite{gal_trace-based_2009} can
+be seen as a similar optimization (again implemented manually) to
+the one described in Section~\ref{sub:allocation}: the effect of both is
+that type checks are fully done before a loop is even entered.
-XXX
+
% section Related Work (end)
\section{Conclusions}
@@ -1002,9 +1018,8 @@
%This is the text of the appendix, if you need one.
-\acks
-
-Acknowledgments, if needed.
+%\acks
+%Acknowledgments, if needed.
% We recommend abbrvnat bibliography style.
_______________________________________________
pypy-commit mailing list
[email protected]
http://mail.python.org/mailman/listinfo/pypy-commit