Author: Hakan Ardo <[email protected]>
Branch: extradoc
Changeset: r3800:81ebb4085592
Date: 2011-06-29 13:16 +0200
http://bitbucket.org/pypy/extradoc/changeset/81ebb4085592/
Log: cleanups
diff --git a/talk/iwtc11/paper.tex b/talk/iwtc11/paper.tex
--- a/talk/iwtc11/paper.tex
+++ b/talk/iwtc11/paper.tex
@@ -123,7 +123,7 @@
\begin{abstract}
By introducing loop peeling into the optimization step of a tracing
jit the effect of optimizations already in place will be increased
-greatly. Not only will they become able to move loop invariant code
+greatly. Not only does it enable them to move loop invariant code
out of the loop. They will also become able to reuse results from the
previous iteration. Also, the implementation of existing optimizations
can be left almost intact, as they will not have to interact much with
@@ -444,17 +444,16 @@
\section{Making Trace Optimizations Loop Aware}
Before the trace is passed to a backend compiling it into machine code
-it needs to be optimized to achieve better performance.
-The focus of this paper
-is loop invariant code motion. The goal of that is to move as many
-operations as possible out of the loop making them executed at most once
+it is optimized to achieve better performance.
+One goal of this is to move
+operations out of the loop so that they are executed only once
and not in every iteration. We propose to achieve this by loop peeling. It
leaves the loop body intact, but prefixes it with one iteration of the
loop. This operation by itself will not achieve anything. But if it is
combined with other optimizations it can increase the effectiveness of
those optimizations. For many optimizations of interest, some care has
to be taken when they are combined with loop peeling. This is
-described below by first explaining the loop peeling optimization
+described below by explaining the loop peeling optimization
followed by a set of other optimizations and how they interact with
loop peeling.
@@ -472,8 +471,8 @@
Loop peeling is achieved by appending a copy of the traced iteration at
the end of itself. See Figure~\ref{fig:overview} for an illustration.
-The first part (called \emph{preamble}) finishes with the jump the the second part
-(called the \emph{peeled loop}). The second part end with the jump to itself. This way
+The first part (called the \emph{preamble}) finishes with a jump to the second part
+(called the \emph{peeled loop}). The second part finishes with a jump to itself. This way
the preamble will be executed only once while the peeled loop will
be used for every further iteration. New variable names have to be
introduced in the entire copied trace in order to maintain the SSA-property.
@@ -501,17 +500,13 @@
$J=\left(J_1, J_2, \cdots, J_{|J|}\right)$, that are passed as the input
variables of the target loop. After
loop peeling there will be a second copy of this trace with input
variables equal to the jump arguments of the preamble, $J$, and jump
-arguments $K$. Looking at the peeled version of our example in Figure~\ref{fig:peeled-trace} we have
-\begin{equation}
- %\left\{
- \begin{array}{lcl}
- I &=& \left( p_0, p_1 \right) \\
- J &=& \left( p_0, p_5 \right) \\
- K &=& \left( p_0, p_9 \right) \\
- \end{array}
- %\right.
- .
-\end{equation}
+arguments $K$.
+Figure~\ref{fig:overview} illustrates the general case. The running
+example in Figure~\ref{fig:unopt-trace} has $I = \left( p_0, p_1
+\right)$ and $J = \left( p_0, p_5 \right)$. The result of applying
+loop peeling to it is shown in Figure~\ref{fig:peeled-trace} with
+$K = \left( p_0, p_9 \right)$.
+
To construct the second copy of the trace (the peeled loop) from the
first (the preamble) we need a
function $m$, mapping the variables of the preamble onto the
@@ -545,12 +540,10 @@
\end{equation}
Before the
next operation is copied, $m$ is extended by assigning $m\left(v\right) = \hat
-v$. For the example above, after all the operations have been copied we have
+v$. For the example above, that will extend $m$ with
\begin{equation}
%\left\{
\begin{array}{lcl}
- m\left(p_0\right) &=& p_0 \\
- m\left(p_1\right) &=& p_5 \\
m\left(i_2\right) &=& i_6 \\
m\left(i_3\right) &=& i_7 \\
m\left(i_4\right) &=& i_8 \\
@@ -560,10 +553,6 @@
.
\end{equation}
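The copying-with-renaming scheme described above can be sketched as follows. This is a minimal illustration using a hypothetical \lstinline{(result, opcode, args)} trace format, not PyPy's actual optimizer classes:

```python
from itertools import count

# Hypothetical, simplified trace format: each operation is a tuple
# (result, opcode, args).  Not PyPy's actual data structures.
_fresh = count(100)

def _rename(var, m):
    # Variables not produced inside the trace (loop invariants, constants)
    # are absent from m and map to themselves.
    return m.get(var, var)

def peel(ops, inputs, jump_args):
    """Copy the traced iteration, renaming variables to keep SSA form.

    `inputs` is I and `jump_args` is J in the paper's notation.  Returns
    the peeled loop's operations, its jump arguments K, and the map m.
    """
    # The peeled loop's inputs equal the preamble's jump arguments, so the
    # renaming map m starts out with m(I_i) = J_i.
    m = dict(zip(inputs, jump_args))
    peeled = []
    for res, opcode, args in ops:
        new_res = f"v{next(_fresh)}"                  # fresh SSA name
        peeled.append((new_res, opcode,
                       tuple(_rename(a, m) for a in args)))
        m[res] = new_res                              # extend m, as above
    return peeled, tuple(_rename(a, m) for a in jump_args), m
```

Run on a trace shaped like the running example (inputs $I = (p_0, p_1)$, jump arguments $J = (p_0, p_5)$), this produces a copy whose jump arguments play the role of $K$, with $m(p_1) = p_5$ and a fresh name for each variable the trace defines.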
-The trace from Figure~\ref{fig:unopt-trace} would after this operation become
-the trace in Figure~\ref{fig:peeled-trace}. Line 1-13 shows the
-preamble while line 15-27 shows the peeled loop.
-
\begin{figure}
\begin{lstlisting}[mathescape,numbers=right,basicstyle=\setstretch{1.05}\ttfamily\scriptsize]
$L_0$($p_{0}$, $p_{1}$):
@@ -688,9 +677,9 @@
jump($L_1$, $p_{0}$, $p_{9}$, $i_3$)
\end{lstlisting}
-In general, after loop peeling and redundant operation removal the peeled loop
-will no longer be in SSA form as it operates on variables that are the result
-of pure operations in the preamble. The solution is to extend the input
+After loop peeling and redundant operation removal, the peeled loop
+will typically no longer be in SSA form but operate on variables that are the result
+of operations in the preamble. The solution is to extend the input
arguments, $J$, with those variables. This will also extend the
jump arguments of the preamble, which is also $J$.
Implicitly that also extends the jump arguments of the peeled loop, $K$,
@@ -702,9 +691,9 @@
optimization as it has removed the variable $i_7$.
In general what is needed is to keep track of
-which variables from the preamble it reuses in the peeled loop.
-It has to construct a vector, $H$, of such variables which
-can be used to update the input and jump arguments using
+which variables from the preamble are reused in the peeled loop.
+By constructing a vector, $H$, of such variables, the input and jump
+arguments can be updated using
\begin{equation}
\hat J = \left(J_1, J_2, \cdots, J_{|J|}, H_1, H_2, \cdots, H_{|H|}\right)
\label{eq:heap-inputargs}
@@ -723,9 +712,8 @@
PyPy's allocation removal optimization \cite{bolz_allocation_2011} makes it
possible to identify objects that are allocated within the loop but never
-escape it. Those objects have to be allocated in the loop, but no outside
-object ever gets a reference short lived objects with no references outside the
-loop. This
+escape it. That is, no outside
+object ever gets a reference to them. This
is performed by processing the operations in order and
optimistically removing every \lstinline{new} operation. Later on if
it is discovered that a reference to the object escapes the loop, the
@@ -745,18 +733,18 @@
When the optimizer reaches line 13 it needs to construct the
arguments of the \lstinline{jump} operation, which contain the
reference to the allocation-removed object in $p_5$. This can be achieved by
-exploding $p_5$ into the fields of the allocation-removed object.
-In this case there is only one such field and its value is
+exploding $p_5$ into the attributes of the allocation-removed object.
+In this case there is only one such attribute and its value is
$i_4$, which means that $p_5$ is replaced with $i_4$ in the jump
arguments.
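This explosion step can be sketched with a minimal, hypothetical helper; the \lstinline{removed} mapping is an assumed stand-in for the optimizer's bookkeeping of allocation-removed objects and their attribute values, kept in a fixed order:

```python
# Hypothetical sketch of exploding allocation-removed objects in the jump
# arguments -- a simplified stand-in for PyPy's actual optimizer state.

def explode(arg, removed):
    """Replace an allocation-removed object by its attribute values.

    `removed` maps each variable holding an allocation-removed object to
    the ordered list of its attribute values.  Attributes that are
    themselves removed objects are exploded recursively; a concrete
    variable becomes a one-element vector containing itself.
    """
    if arg not in removed:
        return [arg]
    out = []
    for attr_value in removed[arg]:
        out.extend(explode(attr_value, removed))
    return out

def explode_jump_args(jump_args, removed):
    # Flatten every jump argument into one vector of concrete variables.
    flat = []
    for a in jump_args:
        flat.extend(explode(a, removed))
    return flat
```

For the example, with $p_5$ allocation-removed and its single attribute holding $i_4$, the jump arguments $(p_0, p_5)$ become $(p_0, i_4)$.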
In the general case, each allocation-removed object in the jump arguments is
exploded into a
vector of variables containing the values of all registered
-fields\footnote{This is sometimes called \emph{scalar replacement}.}.
-If some of the fields are themselves references to
+attributes\footnote{This is sometimes called \emph{scalar replacement}.}.
+If some of the attributes are themselves references to
allocation-removed objects they are recursively exploded
to make the vector contain only concrete variables. Some care has
-to be taken to always place the fields in the same order when
+to be taken to always place the attributes in the same order when
performing this explosion. Notation becomes somewhat simpler if every
concrete variable of the jump arguments is also exploded into a vector
containing itself. For
@@ -829,7 +817,7 @@
interpreters implemented within PyPy can now take advantage of
it. Benchmarks have been executed for a few different interpreters and
we see improvements in several cases. The ideal loop for this optimization
-would be short numerical calculations with no failing guards and no
+is short and contains numerical calculations with no failing guards and no
external calls. Larger loops involving many operations on complex objects
typically benefit less from it. Loop peeling never makes runtime performance
worse; in the worst case the peeled loop is exactly the same as the preamble. Therefore we
@@ -961,7 +949,7 @@
\section{Related Work}
\label{sec:related}
-The effect of combining a one ass optimization with loop peeling gives
+Combining a one-pass optimization with loop peeling yields
completely standard loop invariant code motion optimizations
\cite{muchnick_advanced_1997}. We do not claim any novelty in the effect, but
think that our implementation scheme is a very simple one.
@@ -982,8 +970,8 @@
same along all looping paths and then moving all pure computation that depends
only on these variables out of the loop. SPUR can also hoist loads out of the
loop if nothing in the loop can ever write to the memory location. It can also
-move allocations out of the loop, but does not replace the object by its fields.
-This saves only the allocation, not the access to the object fields.
+move allocations out of the loop, but does not replace the object by its attributes.
+This saves only the allocation, not the access to the object attributes.
The type specialization described by Gal \etal \cite{gal_trace-based_2009} can
be seen as doing a similar optimization (again by manually implementing it)
_______________________________________________
pypy-commit mailing list
[email protected]
http://mail.python.org/mailman/listinfo/pypy-commit