[pypy-dev] Differences performance Julia / PyPy on very similar codes

PIERRE AUGIER Fri, 18 Dec 2020 11:55:44 -0800

Hi, 

I post on this list a message written in PyPy issue tracker 
(https://foss.heptapod.net/pypy/pypy/-/issues/3349#note_150255). It is about 
some experiments I did on writing efficient implementations of the NBody 
problem https://github.com/paugier/nbabel to potentially answer to this article 
https://arxiv.org/pdf/2009.11295.pdf.


I get from a PR an [interesting optimized implementation in 
Julia](https://github.com/paugier/nbabel/blob/master/julia/nbabel4_serial.jl). 
It is very fast (even slightly faster than in Pythran). One idea is to store 
the 3 floats of a 3d physical vector, (x, y, z), in a struct `Point4D` 
containing 4 floats to better use simd instructions.

I added a pure Python implementation inspired by this new Julia implementation 
(but with a simple `Point3D` with 3 floats because with PyPy, the `Point4D` 
does not make the code faster) and good news it is with PyPy a bit faster than 
our previous PyPy implementations (only 3 times slower than the old C++ 
implementation).

However, it is much slower than with Julia (while the code is very similar). I 
coded a simplified version in Julia with nearly nothing else that what can be 
written in pure Python (in particular, no `@inbounds` and `@simd` macros). It 
seems to me that the comparison of these 2 versions could be interesting. So I 
again simplified these 2 versions to keep only what is important for 
performance, which gives 

- https://github.com/paugier/nbabel/blob/master/py/microbench_pypy4.py
- https://github.com/paugier/nbabel/blob/master/py/microbench_ju4.jl

The results are summarized in 
https://github.com/paugier/nbabel/blob/master/py/microbench.md

An important point is that with `Point3D` (a mutable class in Python and an 
immutable struct in Julia), Julia is 3.6 times faster than PyPy. Same code and 
nothing really fancy in Julia so I guess that PyPy might be missing some 
optimization opportunities. At least it would be interesting to understand what 
is slower in PyPy (and why). I have to admit that I don't know how to get 
interesting information on timing and what is happening with PyPy JIT in a 
particular case. I only used cProfile and it's of course clearly not enough. I 
can run vmprof but I'm not able to visualize the data because the website 
http://vmprof.com/ is down. I don't know if I can trust values given by IPython 
`%timeit` for particular instructions since I don't know if PyPy JIT does the 
same thing in `%timeit` and in the function `compute_accelerations`.

I also feel that I really miss in pure Python an efficient fixed size 
homogeneous mutable sequence (a "Vector" in Julia words) that can contain basic 
numerical types (as Python `array.array`) but also instances of user-defined 
classes and instances of Vectors. The Python code uses a [pure Python 
implementation using a 
list](https://github.com/paugier/nbabel/blob/master/py/vector.py). I think it 
would be reasonable to have a good implementation highly compatible with PyPy 
(and potentially other Python implementations) in a package on PyPI. It would 
really help to write PyPy compatible numerical codes. What would be the good 
tool to implement such package? HPy? I wonder whether we can get some speedup 
compared to the pure Python version with lists. For very simple classes like 
`Point3d` and `Point4d`, I wonder if the data could be saved continuously in 
memory and if some operations could be done without boxing/unboxing.

However, I really don't know what is slower in PyPy / faster in Julia.

I would be very interested to get the points of view of people knowing well 
PyPy.

Pierre
_______________________________________________
pypy-dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/pypy-dev

[pypy-dev] Differences performance Julia / PyPy on very similar codes

Reply via email to