Author: Remi Meier <remi.me...@gmail.com>
Branch: extradoc
Changeset: r5954:925d7c1b0666
Date: 2019-07-24 16:40 +0200
http://bitbucket.org/pypy/extradoc/changeset/925d7c1b0666/

Log: trying to make the blog post a bit more appealing :)

diff --git a/blog/draft/2019-07-arm64-relative.png b/blog/draft/2019-07-arm64-relative.png
new file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..e9caaefc53af5bc4d6fcdb21d9b839b8dbb52d4b
GIT binary patch

[cut]

diff --git a/blog/draft/2019-07-arm64-speedups.png b/blog/draft/2019-07-arm64-speedups.png
new file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..60de3896d627f6e54e2ac80d0163a74501c3f20a
GIT binary patch

[cut]

diff --git a/blog/draft/2019-07-arm64.rst b/blog/draft/2019-07-arm64.rst
--- a/blog/draft/2019-07-arm64.rst
+++ b/blog/draft/2019-07-arm64.rst
@@ -1,142 +1,60 @@
 Hello everyone.

-We are pleased to announce that we have successfully ported PyPy
-to the AArch64 platform (also known as 64-bit ARM), thanks to funding
-provided by ARM Holdings Ltd. and Crossbar.io.
+We are pleased to announce the availability of the new PyPy for AArch64. This
+port brings PyPy's high-performance just-in-time compiler to the AArch64
+platform, also known as 64-bit ARM. This work was funded by ARM Holdings Ltd.
+and Crossbar.io.

-We are presenting here the benchmark run done on a Graviton A1 machine
-from AWS. There is a very serious word of warning: Graviton A1's are
+To show how well the new PyPy port performs, we compare the performance of PyPy
+against CPython on a set of benchmarks. As a point of comparison, we include the
+results of PyPy on x86_64. Note, however, that the results presented here were
+measured on a Graviton A1 machine from AWS, which comes with a very serious
+word of warning: Graviton A1's are
 virtual machines and as such, are not suitable for benchmarking. If someone
 has access to a beefy enough (16G) ARM64 server and is willing to give
 us access to it, we are happy to redo the benchmarks on a real machine.

-Our main concern here is that while a vCPU is 1-to-1 with a real CPU, it's
+Our main concern here is that while a virtual CPU is 1-to-1 with a real CPU, it's
 not clear to us how caches are shared, and how they cross CPU boundaries.

-We are not here interested in comparing machines, so what we are showing is
-the relative speedup of PyPy (hg id 2417f925ce94) compared to CPython
-(2.7.15). This is the "AArch64" column. In the "x86_64" column we do the
-same on a Linux laptop running x86_64, comparing CPython 2.7.16 with the
-most recent release, PyPy 7.1.1.
+The following graph shows the speedup of PyPy (hg id 2417f925ce94) compared to
+CPython (2.7.15) on AArch64, as well as the speedups on a x86_64 Linux laptop,
+comparing the most recent release, PyPy 7.1.1, to CPython 2.7.16.

-In the last column is a relative comparison between the ARM
-architectures: how much the speedup is on arm64 vs. the same benchmark
-on x86_64. One important thing to note is that by no means is this
-suite a representative enough benchmark set for us to average together
-results. Read the numbers individually per-benchmark.
+.. image:: 2019-07-arm64-speedups.png

-+------------------------------+----------+----------+----------+
-|*Benchmark name*              |x86_64    |Aarch64   |relative  |
-+------------------------------+----------+----------+----------+
-|ai                            |5.66      |5.34      |0.94      |
-+------------------------------+----------+----------+----------+
-|bm_chameleon                  |2.85      |6.57      |2.30      |
-+------------------------------+----------+----------+----------+
-|bm_dulwich_log                |1.98      |1.34      |0.68      |
-+------------------------------+----------+----------+----------+
-|bm_krakatau                   |1.20      |0.69      |0.58      |
-+------------------------------+----------+----------+----------+
-|bm_mako                       |4.88      |6.38      |1.31      |
-+------------------------------+----------+----------+----------+
-|bm_mdp                        |0.82      |0.74      |0.90      |
-+------------------------------+----------+----------+----------+
-|chaos                         |25.40     |25.52     |1.00      |
-+------------------------------+----------+----------+----------+
-|crypto_pyaes                  |32.35     |31.92     |0.99      |
-+------------------------------+----------+----------+----------+
-|deltablue                     |1.60      |1.48      |0.93      |
-+------------------------------+----------+----------+----------+
-|django                        |14.15     |13.71     |0.97      |
-+------------------------------+----------+----------+----------+
-|eparse                        |1.43      |1.12      |0.78      |
-+------------------------------+----------+----------+----------+
-|fannkuch                      |4.83      |6.53      |1.35      |
-+------------------------------+----------+----------+----------+
-|float                         |8.43      |8.16      |0.97      |
-+------------------------------+----------+----------+----------+
-|genshi_text                   |3.70      |3.61      |0.98      |
-+------------------------------+----------+----------+----------+
-|genshi_xml                    |2.97      |1.64      |0.55      |
-+------------------------------+----------+----------+----------+
-|go                            |2.77      |2.47      |0.89      |
-+------------------------------+----------+----------+----------+
-|hexiom2                       |9.35      |8.03      |0.86      |
-+------------------------------+----------+----------+----------+
-|html5lib                      |2.88      |1.93      |0.67      |
-+------------------------------+----------+----------+----------+
-|json_bench                    |2.85      |2.81      |0.99      |
-+------------------------------+----------+----------+----------+
-|meteor-contest                |2.21      |2.27      |1.03      |
-+------------------------------+----------+----------+----------+
-|nbody_modified                |9.86      |8.59      |0.87      |
-+------------------------------+----------+----------+----------+
-|nqueens                       |1.12      |1.02      |0.91      |
-+------------------------------+----------+----------+----------+
-|pidigits                      |0.99      |0.62      |0.63      |
-+------------------------------+----------+----------+----------+
-|pyflate-fast                  |3.86      |4.62      |1.20      |
-+------------------------------+----------+----------+----------+
-|pypy_interp                   |2.12      |2.03      |0.95      |
-+------------------------------+----------+----------+----------+
-|pyxl_bench                    |1.72      |1.37      |0.80      |
-+------------------------------+----------+----------+----------+
-|raytrace-simple               |58.86     |44.21     |0.75      |
-+------------------------------+----------+----------+----------+
-|richards                      |52.68     |44.90     |0.85      |
-+------------------------------+----------+----------+----------+
-|rietveld                      |1.52      |1.28      |0.84      |
-+------------------------------+----------+----------+----------+
-|spambayes                     |1.87      |1.58      |0.85      |
-+------------------------------+----------+----------+----------+
-|spectral-norm                 |21.38     |20.28     |0.95      |
-+------------------------------+----------+----------+----------+
-|spitfire                      |1.28      |2.77      |2.17      |
-+------------------------------+----------+----------+----------+
-|spitfire_cstringio            |7.84      |7.42      |0.95      |
-+------------------------------+----------+----------+----------+
-|sqlalchemy_declarative        |1.76      |1.05      |0.60      |
-+------------------------------+----------+----------+----------+
-|sqlalchemy_imperative         |0.63      |0.60      |0.95      |
-+------------------------------+----------+----------+----------+
-|sqlitesynth                   |1.17      |1.00      |0.86      |
-+------------------------------+----------+----------+----------+
-|sympy_expand                  |1.32      |1.25      |0.95      |
-+------------------------------+----------+----------+----------+
-|sympy_integrate               |1.10      |1.01      |0.91      |
-+------------------------------+----------+----------+----------+
-|sympy_str                     |0.65      |0.62      |0.95      |
-+------------------------------+----------+----------+----------+
-|sympy_sum                     |1.87      |1.79      |0.96      |
-+------------------------------+----------+----------+----------+
-|telco                         |30.38     |19.09     |0.63      |
-+------------------------------+----------+----------+----------+
-|twisted_iteration             |13.24     |8.95      |0.68      |
-+------------------------------+----------+----------+----------+
-|twisted_names                 |5.27      |3.31      |0.63      |
-+------------------------------+----------+----------+----------+
-|twisted_pb                    |5.85      |2.90      |0.50      |
-+------------------------------+----------+----------+----------+
-|twisted_tcp                   |3.03      |2.08      |0.68      |
-+------------------------------+----------+----------+----------+
+In the majority of benchmarks, the speedups achieved on AArch64 match those
+achieved on the x86_64 laptop. Over CPython, PyPy on AArch64 achieves speedups
+between 0.6x to 44.9x. These speedups are comparable to x86_64, where they are
+between 0.6x and 58.9x.
+
+The next graph compares between the speedups achieved on AArch64 to the speedups
+achieved on x86_64, i.e., how much the speedup is on AArch64 vs. the same
+benchmark on x86_64. Note that by no means is this benchmark suite
+representative enough to average the results. Read the numbers individually per
+benchmark.
+
+.. image:: 2019-07-arm64-relative.png

 Note also that we see a wide variance. There are generally three groups of
 benchmarks - those that run at more or less the same speed, those that
-run at 2x the speedup and those that run at 0.5x the speedup of x86_64.
+run at 2x the speed and those that run at 0.5x the speed of x86_64.

-The variance and disparity are likely related to a variety of issues,
-mostly due to differences in architecture. What *is* however
-interesting is that compared to older ARM boards, the branch predictor
-has gotten a lot better, which means the speedups will be smaller:
-"sophisticated" branchy code like CPython itself just runs a lot faster.
+The variance and disparity are likely related to a variety of issues, mostly due
+to differences in architecture. What *is* however interesting is that compared
+to measurements performed on older ARM boards, the branch predictor on the
+Graviton A1 machine appears to have improved. As a result, the speedups achieved
+by PyPy over CPython are smaller: "sophisticated" branchy code, like CPython
+itself, simply runs a lot faster.

 One takeaway here is that there is a lot of improvement left to be done
 in PyPy. This is true for both of the above platforms, but probably more so
 for AArch64, which comes with a large number of registers. The PyPy backend
 was written with x86 (the 32-bit variant) in mind, which is very register
 poor. We think we can improve somewhat in the area of emitting
-more modern machine code, which should be more impactful for AArch64
-than x86_64. There are also still a few missing features in the AArch64
-backend, which are implemented as calls instead of inlined instructions,
-which we hope to improve.
+more modern machine code, which should have more impact for AArch64
+than for x86_64. There are also still a few missing features in the AArch64
+backend, which are implemented as calls instead of inlined instructions;
+something we hope to improve.

 Best,
 Maciej Fijalkowski, Armin Rigo and the PyPy team
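
A note on the "relative" values in the table removed by this changeset: each one appears to be the AArch64 speedup divided by the x86_64 speedup for the same benchmark. The following is a minimal Python sketch of that arithmetic and is not part of the changeset. The names ``relative_speedup`` and ``ROWS`` are hypothetical, and the three rows are copied from the removed table. Since the table prints rounded values, the recomputed ratios can only be expected to agree to roughly two decimal places.

```python
# Speedups over CPython, copied from three rows of the removed table.
# Tuple layout: (x86_64 speedup, AArch64 speedup, "relative" value as printed).
ROWS = {
    "bm_chameleon": (2.85, 6.57, 2.30),
    "chaos": (25.40, 25.52, 1.00),
    "twisted_pb": (5.85, 2.90, 0.50),
}


def relative_speedup(x86_64, aarch64):
    """AArch64 speedup divided by x86_64 speedup.

    This is the assumed definition of the "relative" column, inferred
    from the printed numbers rather than stated in the changeset.
    """
    return aarch64 / x86_64


for name, (x86, arm, printed) in ROWS.items():
    computed = relative_speedup(x86, arm)
    # Show the recomputed ratio next to the value printed in the table.
    print(f"{name:14s} computed={computed:.3f} printed={printed:.2f}")
```

A value above 1 means PyPy's advantage over CPython is larger on AArch64 than on x86_64 for that benchmark, and a value below 1 means it is smaller. It says nothing about which machine is faster in absolute terms.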