On 29/5/23 07:37, hzy15610046011 via pypy-dev wrote:
Thanks for your answer!
I read the link [0] and ran the benchmarks, and found that allocate_tuple is
still slow. On my CPU, it took 0.65s on CPython 3.8 and 7.6s on PyPy 3.9. I
wonder whether this could be causing the poor performance I see with pandas.
As for my code pattern, I am trying to add PyPy support to my simulation
library, and the workflow of my project is as follows:
1. Read data from Excel files.
2. Perform CPU-intensive computations with an object-oriented program involving
a number of objects.
3. Write simulation data to a SQLite database with pandas, just using `pd.to_sql`.
When running on PyPy, step 2 was more than 8 times faster than on the CPython
interpreter. However, step 3 was 3~5 times slower.
I have found that the sqlite3 library inside PyPy is not the cause, because in
a pure-Python sqlite3 program PyPy was 1.2x~2x faster than CPython. So the
problem might be due to C-API overhead when calling into pandas.
As far as I can tell, there are two options. (1) The easiest way is to write a
pure-Python table I/O library to replace pandas, because my project uses only a
few pandas functions. (2) But if the performance of pandas on PyPy improves one
day (to about 0.5x~0.8x of that on CPython), it would be better to keep using
pandas, because most Python programmers know it.
Could you please give me some suggestions on how to solve this problem? Should
I choose (1) and implement a pure-Python table library, or would it be better
to wait for (2)? I am also interested in the PyPy project itself, and wonder
whether improving the performance of `Py_BuildValue` is feasible. Thanks!
Hou
I would think a properly written pure-Python solution (1) could outperform
any C extension on PyPy, but I don't think such a project exists.
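As a rough illustration of what (1) might look like for the write step alone,
here is a minimal sketch that stores rows in SQLite using only the
standard-library sqlite3 module. The helper name `write_table`, the table name
and the column names are made up for this example, and it does none of the
dtype handling that pandas provides.

```python
import sqlite3


def write_table(conn, table, columns, rows):
    """Create `table` if needed and bulk-insert `rows` (an iterable of tuples).

    Hypothetical helper, not part of any library. Identifiers are quoted
    naively, so only pass trusted table and column names.
    """
    cols = ", ".join('"{}"'.format(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, cols))
    # One transaction for the whole batch: commit on success, roll back on error.
    with conn:
        conn.executemany(
            'INSERT INTO "{}" ({}) VALUES ({})'.format(table, cols, placeholders),
            rows,
        )


# Example data is invented purely for illustration.
conn = sqlite3.connect(":memory:")
write_table(conn, "results", ["step", "agent_id", "value"],
            [(0, 1, 0.5), (0, 2, 1.5), (1, 1, 0.75)])
row_count = conn.execute('SELECT COUNT(*) FROM "results"').fetchone()[0]
```

Passing the rows as plain tuples to `executemany` keeps the whole path in
Python plus the sqlite3 module, which is the part reported above as already
faster on PyPy.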
For (2), in the long term there is HPy [0]. In the short term, there are
many possible optimizations we could do for cpyext. Are you sure
Py_BuildValue is at the top of the list, i.e. did you profile `pd.to_sql`
and did it come out as a very common and slower-than-CPython function?
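One way to start answering that question is to run the write step under the
standard-library cProfile module on both interpreters and compare the reports.
The sketch below is only an outline: `profile_call` and `write_step` are
hypothetical names, and `write_step` is a stdlib stand-in that should be
replaced by the real `to_sql` call.

```python
import cProfile
import io
import pstats
import sqlite3


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Show the 15 entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(15)
    return result, buf.getvalue()


def write_step(n):
    # Stand-in for the real step 3; swap in the actual to_sql call here.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b REAL)")
    with conn:
        conn.executemany("INSERT INTO t VALUES (?, ?)",
                         ((i, i * 0.5) for i in range(n)))
    return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]


count, report = profile_call(write_step, 10000)
print(report)
```

Note that cProfile only attributes time to Python-level functions and built-in
calls, so it can show which pandas functions dominate but will not list
Py_BuildValue itself. Confirming that would need a native profiler such as
perf on the C side.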
Matti
[0] https://hpyproject.org/