Author: Hakan Ardo <[email protected]>
Branch: extradoc
Changeset: r4585:4887f7fc2e99
Date: 2012-08-15 19:43 +0200
http://bitbucket.org/pypy/extradoc/changeset/4887f7fc2e99/
Log: describe the scimark benchmarks
diff --git a/talk/dls2012/licm.pdf b/talk/dls2012/licm.pdf
index d0e3ca21bc58e605bbf333d46f6acdc18de2a29d..d44c6adbc9741258517f595c681611569b3e9240
GIT binary patch
[cut]
diff --git a/talk/dls2012/paper.tex b/talk/dls2012/paper.tex
--- a/talk/dls2012/paper.tex
+++ b/talk/dls2012/paper.tex
@@ -63,7 +63,7 @@
\newboolean{showcomments}
-\setboolean{showcomments}{true}
+\setboolean{showcomments}{false}
\ifthenelse{\boolean{showcomments}}
{\newcommand{\nb}[2]{
\fbox{\bfseries\sffamily\scriptsize#1}
@@ -931,8 +931,9 @@
we see improvements in several cases. The ideal loop for this optimization
is short and contains numerical calculations with no failing guards and no
external calls. Larger loops involving many operations on complex objects
-typically benefit less from it. Loop peeling never makes runtime performance worse, in
-the worst case the peeled loop is exactly the same as the preamble. Therefore we
+typically benefit less from it. Loop peeling never makes the generated code worse; in
+the worst case the peeled loop is exactly the same as the preamble.
+Therefore we
chose to present benchmarks of small numeric kernels where loop peeling can show
its use.
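As a rough illustration of the transformation discussed above (a minimal
Python-level sketch; PyPy performs the peeling on traces, and all names
here are ours, not the paper's):

    def expensive_invariant(a):
        # stands in for loop-invariant work the optimizer can hoist
        return sum(a)

    def original(a, n):
        s = 0
        for i in range(n):
            t = expensive_invariant(a)  # recomputed every iteration
            s += t * i
        return s

    def peeled(a, n):
        s = 0
        if n > 0:
            # preamble: the first iteration, peeled off the loop
            t = expensive_invariant(a)  # now computed only once
            s += t * 0
        for i in range(1, n):
            # peeled loop: reuses the invariant value t; in the worst
            # case this loop is identical to the preamble
            s += t * i
        return s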
@@ -983,30 +984,30 @@
\subsection{Python}
The Python interpreter of the RPython framework is a complete Python
version 2.7 compatible interpreter. A set of numerical
-calculations were implemented in both Python and in C and their
+calculations were implemented in Python, C and Lua and their
runtimes are compared in Figure~\ref{fig:benchmarks}. The benchmarks are
\begin{itemize}
-\item {\bf sqrt}: approximates the square root of $y$. The approximation is
+\item {\bf sqrt}$\left(T\right)$: approximates the square root of $y$. The approximation is
initialized to $x_0=y/2$ and the benchmark consists of a single loop updating this
approximation using $x_i = \left( x_{i-1} + y/x_{i-1} \right) / 2$ for $1\leq i < 10^8$.
Only the latest calculated value $x_i$ is kept alive as a local variable within the loop.
There are three different versions of this benchmark where $x_i$
- is represented with different type of objects: int's, float's and
+ is represented by different types of objects, $T$: ints, floats and
Fix16s. The latter, Fix16, is a custom class that implements
fixpoint arithmetic with 16 bits of precision. In Python there is only
a single implementation of the benchmark that gets specialized
depending on the class of its input argument, $y$, while in C,
there are three different implementations.
-\item {\bf conv3}: one-dimensional convolution with fixed kernel-size $3$. A single loop
+\item {\bf conv3}$\left(n\right)$: one-dimensional convolution with fixed kernel-size $3$. A single loop
is used to calculate a vector ${\bf b} = \left(b_1, \cdots, b_n\right)$ from a vector
${\bf a} = \left(a_1, \cdots, a_n\right)$ and a kernel ${\bf k} = \left(k_1, k_2, k_3\right)$ using
$b_i = k_3 a_i + k_2 a_{i+1} + k_1 a_{i+2}$ for $1 \leq i \leq n$. Both the output vector, $\bf b$,
and the input vectors, $\bf a$ and $\bf k$, are allocated prior to running the benchmark. It is executed
with $n=10^5$ and $n=10^6$.
-\item {\bf conv5}: one-dimensional convolution with fixed kernel-size $5$. Similar to conv3, but with
+\item {\bf conv5}$\left(n\right)$: one-dimensional convolution with fixed kernel-size $5$. Similar to conv3, but with
${\bf k} = \left(k_1, k_2, k_3, k_4, k_5\right)$. The enumeration of the elements in $\bf k$ is still
hardcoded into the implementation, making the benchmark consist of a single loop too.
-\item {\bf conv3x3}: two-dimensional convolution with kernel of fixed
+\item {\bf conv3x3}$\left(n\right)$: two-dimensional convolution with kernel of fixed
size $3 \times 3$ using a custom class to represent two-dimensional
arrays. It is implemented as two nested loops that iterate over the elements of the
$n\times n$ output matrix ${\bf B} = \left(b_{i,j}\right)$ and calculate each element from the input matrix
@@ -1021,12 +1022,12 @@
\end{equation}
for $1 \leq i \leq n$ and $1 \leq j \leq n$.
The memory for storing the matrices is again allocated outside the benchmark and $n=1000$ was used.
-\item {\bf dilate3x3}: two-dimensional dilation with kernel of fixed
+\item {\bf dilate3x3}$\left(n\right)$: two-dimensional dilation with kernel of fixed
size $3 \times 3$. This is similar to convolution but instead of
summing over the terms in Equation~\ref{eq:convsum}, the maximum over those terms is taken. That places an
external call to a max function within the loop that prevents some
of the optimizations.
-\item {\bf sobel}: a low-level video processing algorithm used to
+\item {\bf sobel}$\left(n\right)$: a low-level video processing algorithm used to
locate edges in an image. It calculates the gradient magnitude
using Sobel derivatives. A Sobel x-derivative, $D_x$, of an $n \times n$ image, ${I}$, is formed
by convolving ${I}$ with
@@ -1050,11 +1051,31 @@
on top of a custom two-dimensional array class.
It is
a straightforward implementation providing two-dimensional
-indexing with out of bounds checks. For the C implementations it is
+indexing with out-of-bounds checks and
+data stored in row-major order.
+For the C implementations it is
implemented as a C++ class. The other benchmarks are implemented in
plain C. All the benchmarks except sqrt operate on C double-precision floating
point numbers, both in the Python and the C code.
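For concreteness, a minimal Python sketch of the float version of the sqrt
kernel described above (the function name is ours; the real benchmark runs
$10^8$ iterations and also has int and Fix16 variants):

    def sqrt_approx(y, iterations=10**8):
        x = y / 2.0                # x_0 = y / 2
        for _ in range(1, iterations):
            x = (x + y / x) / 2.0  # x_i = (x_{i-1} + y/x_{i-1}) / 2
        return x

    # A single Python implementation is specialized on the type of y,
    # e.g. sqrt_approx(1234.0) exercises the float version.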
+In addition we ported the
+SciMark\footnote{\texttt{http://math.nist.gov/scimark2/}} benchmarks to
+Python, and compared their runtimes with the already existing Lua and C implementations.
+This port was performed after the release of the PyPy version used to run the benchmarks, which means that
+these benchmarks have not influenced the PyPy implementation.
+SciMark consists of:
+
+\begin{itemize}
+\item {\bf SOR}$\left(n, c\right)$: Jacobi successive over-relaxation on an $n\times n$ grid repeated $c$ times.
+The same custom two-dimensional array class as described above is used to represent
+the grid (a minimal sketch of this kernel follows the list).
+\item {\bf SparseMatMult}$\left(n, z, c\right)$: matrix multiplication between an $n\times n$ sparse matrix,
+stored in compressed-row format, and a full storage vector, stored in a normal array. The matrix has $z$ non-zero elements and the calculation is repeated $c$ times.
+\item {\bf MonteCarlo}$\left(n\right)$: Monte Carlo integration by generating $n$ points uniformly distributed over the unit square and computing the ratio of those within the unit circle.
+\item {\bf LU}$\left(n, c\right)$: computes the LU factorization of an $n \times n$ matrix. The rows of the matrix are shuffled, which makes the previously used two-dimensional array class unsuitable. Instead a list of arrays is used to represent the matrix. The calculation is repeated $c$ times.
+\item {\bf FFT}$\left(n, c\right)$: Fast Fourier Transform of a vector with $n$ elements, represented as an array, repeated $c$ times.
+\end{itemize}
+
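As referenced in the SOR item above, a minimal Python sketch of that kernel
(using plain nested lists instead of the paper's array class; the update
formula follows the standard SciMark SOR inner loop, which is our assumption,
not spelled out in the text):

    def sor(G, omega, c):
        # over-relaxation sweeps over the interior points of the n x n
        # grid G (a list of lists of floats), repeated c times
        n = len(G)
        for _ in range(c):
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    G[i][j] = (0.25 * omega * (G[i - 1][j] + G[i + 1][j] +
                                               G[i][j - 1] + G[i][j + 1]) +
                               (1.0 - omega) * G[i][j])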
Benchmarks were run on an Intel i7 M620 @2.67GHz with 4M cache and 8G of RAM
using Ubuntu Linux 11.04 in 32bit mode.
The machine was otherwise unoccupied. We use the following software