Author: fijal
Branch: extradoc
Changeset: r5951:0dd699c2378b
Date: 2019-07-24 13:54 +0200
http://bitbucket.org/pypy/extradoc/changeset/0dd699c2378b/

Log:	add a draft

diff --git a/blog/draft/2019-07-arm64.rst b/blog/draft/2019-07-arm64.rst
new file mode 100644
--- /dev/null
+++ b/blog/draft/2019-07-arm64.rst
@@ -0,0 +1,136 @@
+Hello everyone
+
+We are pleased to announce that we have successfully ported PyPy
+to the Aarch64 (also known as 64-bit ARM) platform, thanks to funding
+provided by ARM Holdings Ltd. and Crossbar.io.
+
+Here we present a benchmark run done on a Graviton A1 machine
+from AWS. A very serious word of warning: Graviton A1 instances are
+virtual machines and, as such, are not suitable for benchmarking. If someone
+has access to a beefy enough (16G) ARM64 server and is willing to give
+us access to it, we are happy to redo the benchmarks on a real machine.
+My main concern here is that while a vCPU maps 1-1 to a real CPU, it's not
+clear to me how caches are shared and how they cross CPU boundaries.
+
+We are not interested in comparing machines, so what we are showing is
+the speedup of PyPy (hg id 2417f925ce94) relative to CPython (2.7.15).
+
+The last column is a comparison: how much we speed up on arm64 vs.
+how much we speed up on x86_64. One important thing to note is that
+this is by no means a representative enough benchmark set to average
+anything. Read the numbers per benchmark.
+
++------------------------------+----------+----------+----------+
+|*Benchmark name*              |x86_64    |Aarch64   |relative  |
++------------------------------+----------+----------+----------+
+|ai                            |5.66      |5.34      |0.94      |
++------------------------------+----------+----------+----------+
+|bm_chameleon                  |2.85      |6.57      |2.30      |
++------------------------------+----------+----------+----------+
+|bm_dulwich_log                |1.98      |1.34      |0.68      |
++------------------------------+----------+----------+----------+
+|bm_krakatau                   |1.20      |0.69      |0.58      |
++------------------------------+----------+----------+----------+
+|bm_mako                       |4.88      |6.38      |1.31      |
++------------------------------+----------+----------+----------+
+|bm_mdp                        |0.82      |0.74      |0.90      |
++------------------------------+----------+----------+----------+
+|chaos                         |25.40     |25.52     |1.00      |
++------------------------------+----------+----------+----------+
+|crypto_pyaes                  |32.35     |31.92     |0.99      |
++------------------------------+----------+----------+----------+
+|deltablue                     |1.60      |1.48      |0.93      |
++------------------------------+----------+----------+----------+
+|django                        |14.15     |13.71     |0.97      |
++------------------------------+----------+----------+----------+
+|eparse                        |1.43      |1.12      |0.78      |
++------------------------------+----------+----------+----------+
+|fannkuch                      |4.83      |6.53      |1.35      |
++------------------------------+----------+----------+----------+
+|float                         |8.43      |8.16      |0.97      |
++------------------------------+----------+----------+----------+
+|genshi_text                   |3.70      |3.61      |0.98      |
++------------------------------+----------+----------+----------+
+|genshi_xml                    |2.97      |1.64      |0.55      |
++------------------------------+----------+----------+----------+
+|go                            |2.77      |2.47      |0.89      |
++------------------------------+----------+----------+----------+
+|hexiom2                       |9.35      |8.03      |0.86      |
++------------------------------+----------+----------+----------+
+|html5lib                      |2.88      |1.93      |0.67      |
++------------------------------+----------+----------+----------+
+|json_bench                    |2.85      |2.81      |0.99      |
++------------------------------+----------+----------+----------+
+|meteor-contest                |2.21      |2.27      |1.03      |
++------------------------------+----------+----------+----------+
+|nbody_modified                |9.86      |8.59      |0.87      |
++------------------------------+----------+----------+----------+
+|nqueens                       |1.12      |1.02      |0.91      |
++------------------------------+----------+----------+----------+
+|pidigits                      |0.99      |0.62      |0.63      |
++------------------------------+----------+----------+----------+
+|pyflate-fast                  |3.86      |4.62      |1.20      |
++------------------------------+----------+----------+----------+
+|pypy_interp                   |2.12      |2.03      |0.95      |
++------------------------------+----------+----------+----------+
+|pyxl_bench                    |1.72      |1.37      |0.80      |
++------------------------------+----------+----------+----------+
+|raytrace-simple               |58.86     |44.21     |0.75      |
++------------------------------+----------+----------+----------+
+|richards                      |52.68     |44.90     |0.85      |
++------------------------------+----------+----------+----------+
+|rietveld                      |1.52      |1.28      |0.84      |
++------------------------------+----------+----------+----------+
+|spambayes                     |1.87      |1.58      |0.85      |
++------------------------------+----------+----------+----------+
+|spectral-norm                 |21.38     |20.28     |0.95      |
++------------------------------+----------+----------+----------+
+|spitfire                      |1.28      |2.77      |2.17      |
++------------------------------+----------+----------+----------+
+|spitfire_cstringio            |7.84      |7.42      |0.95      |
++------------------------------+----------+----------+----------+
+|sqlalchemy_declarative        |1.76      |1.05      |0.60      |
++------------------------------+----------+----------+----------+
+|sqlalchemy_imperative         |0.63      |0.60      |0.95      |
++------------------------------+----------+----------+----------+
+|sqlitesynth                   |1.17      |1.00      |0.86      |
++------------------------------+----------+----------+----------+
+|sympy_expand                  |1.32      |1.25      |0.95      |
++------------------------------+----------+----------+----------+
+|sympy_integrate               |1.10      |1.01      |0.91      |
++------------------------------+----------+----------+----------+
+|sympy_str                     |0.65      |0.62      |0.95      |
++------------------------------+----------+----------+----------+
+|sympy_sum                     |1.87      |1.79      |0.96      |
++------------------------------+----------+----------+----------+
+|telco                         |30.38     |19.09     |0.63      |
++------------------------------+----------+----------+----------+
+|twisted_iteration             |13.24     |8.95      |0.68      |
++------------------------------+----------+----------+----------+
+|twisted_names                 |5.27      |3.31      |0.63      |
++------------------------------+----------+----------+----------+
+|twisted_pb                    |5.85      |2.90      |0.50      |
++------------------------------+----------+----------+----------+
+|twisted_tcp                   |3.03      |2.08      |0.68      |
++------------------------------+----------+----------+----------+
+
+Note that we see a wide variance. There are roughly three groups of
+benchmarks: those that run at more or less the same speed, those that
+show about 2x the speedup, and those that show about 0.5x the speedup of x86_64.
+
+This can be related to a variety of issues, mostly differences
+in architecture. What *is* interesting, however, is that compared to older
+ARM boards, the branch predictor got a lot better, which means the speedups
+will be smaller, since "sophisticated" branchy code, like the source code of
+CPython, just runs a lot faster.
+
+One takeaway here is that there is a lot of room for improvement in PyPy.
+This is true for both of the above platforms, but probably more so for Aarch64,
+which comes with a lot of registers. The PyPy backend was written with
+x86 (the 32-bit variant) in mind, which is very register-poor. We think we can
+improve somewhat in the area of emitting more modern code, and it will probably
+make somewhat more difference on Aarch64 than on x86_64, where running old
+crappy code efficiently has been a massive focus.
+
+Best,
+Maciej Fijalkowski, Armin Rigo and the PyPy team
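
A note on the "relative" column of the table: the draft does not spell out the formula, so the following is only a minimal sketch, assuming the column is the Aarch64 speedup divided by the x86_64 speedup. The helper name `relative` is made up for illustration; the three rows are copied from the table.

```python
# Speedups over CPython copied from three rows of the table,
# stored as (x86_64, Aarch64). Assumption: relative = Aarch64 / x86_64.
SPEEDUPS = {
    "ai": (5.66, 5.34),
    "chaos": (25.40, 25.52),
    "twisted_pb": (5.85, 2.90),
}


def relative(x86_64, aarch64):
    """Ratio of the Aarch64 speedup to the x86_64 speedup (hypothetical helper)."""
    return aarch64 / x86_64


for name, (x86, arm) in SPEEDUPS.items():
    # Two decimals, matching the precision used in the table.
    print(f"{name:12s} {relative(x86, arm):.2f}")
```

Recomputing from the rounded table entries can differ from the published column in the last digit for some rows, presumably because the column was derived from unrounded measurements.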
