[pypy-dev] Re: Question: Is there any faster way to run benchmarks on pypy/module/cpyext module?

Matti Picus Sun, 28 May 2023 23:12:57 -0700

On 29/5/23 07:37, hzy15610046011 via pypy-dev wrote:

Thanks for your answer!


I read the link [0] and ran those benchmarks, and found that allocate_tuple 
is still slow. On my CPU it took 0.65s on CPython 3.8 and 7.6s on PyPy 3.9. 
I am wondering whether this could explain the poor performance I see with 
pandas.

As for my code pattern, I am trying to add PyPy support to my simulation 
library, and the workflow of my project is as follows:

1. Read data from Excel files.
2. Perform CPU-intensive computations with an object-oriented program involving 
a number of objects.
3. Write simulation data to a SQLite database with pandas, using just `pd.to_sql`.

When running on PyPy, step 2 was more than 8 times faster than on CPython. 
However, step 3 was 3 to 5 times slower.

I have found that the problem is not caused by the sqlite3 library inside 
PyPy, because on a pure-Python sqlite3 program PyPy was 1.2x to 2x faster than 
CPython. So the slowdown might be due to C-API overhead when calling into 
pandas.

As far as I know, (1) the easiest way is to write a pure-Python table IO 
library to replace pandas, because my project imports only a few pandas 
functions. (2) But if one day the performance of pandas on PyPy improves 
(to about 0.5x~0.8x of that on CPython), the better idea would be to keep 
using pandas, because most Python programmers know it.

Could you please give me some suggestions on how to solve this problem? 
Should I choose way (1) and implement a pure-Python table library, or would 
it be better to wait for (2)? I am also interested in the PyPy project itself, 
and wonder whether improving the performance of `Py_BuildValue` is feasible. 
Thanks!

Hou

I would think a properly written pure-Python solution (1) could outperform any C extension on PyPy, but I don't think such a project exists.
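
To illustrate what option (1) could look like for the "write simulation data to SQLite" step, here is a minimal sketch that uses only the standard-library `sqlite3` module. The `write_table` helper, the `results` table, and its column names are hypothetical, and the sketch has not been benchmarked against `to_sql` on PyPy.

```python
import sqlite3


def _quote(identifier):
    # Quote an SQL identifier by doubling any embedded double quotes.
    return '"' + identifier.replace('"', '""') + '"'


def write_table(conn, table, columns, rows):
    """Create `table` if it is missing and bulk-insert `rows` (tuples)."""
    cols = ", ".join(_quote(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {_quote(table)} ({cols})")
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            f"INSERT INTO {_quote(table)} ({cols}) VALUES ({placeholders})",
            rows,
        )


# Hypothetical simulation output: (step, value) pairs.
conn = sqlite3.connect(":memory:")
write_table(conn, "results", ["step", "value"],
            [(i, i * 0.5) for i in range(1000)])
count = conn.execute('SELECT COUNT(*) FROM "results"').fetchone()[0]
print(count)
```

Because everything here is plain Python plus the `sqlite3` module, no cpyext call is involved in the write path; whether that is actually faster for a given workload would need to be measured.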

For (2), in the long term there is HPy [0]. In the short term, there are many possible optimizations we could do for cpyext. Are you sure Py_BuildValue is the top of the list, i.e. did you profile `pd.to_sql` and that came out as a very common and slower-than-cpython function?
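
As a starting point for that kind of profiling, a minimal sketch with the standard-library `cProfile` and `pstats` modules could look like the following. The `profile_call` helper and the `build_tuples` stand-in workload are hypothetical names used only for illustration.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Show the 15 entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(15)
    return result, buf.getvalue()


def build_tuples(n):
    # Stand-in workload; replace with the real database write.
    return [(i, str(i)) for i in range(n)]


result, report = profile_call(build_tuples, 10000)
print(report)
```

Assuming `df` is a pandas DataFrame and `conn` is an open `sqlite3` connection, the real write could be profiled with `profile_call(df.to_sql, "results", conn, if_exists="replace")`. Note that `cProfile` only attributes time to Python-level and built-in calls, so it would not show `Py_BuildValue` itself; it can only narrow down which pandas calls dominate, and a native-level profiler would be needed to go further.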



Matti


[0] https://hpyproject.org/


[pypy-dev] Re: Question: Is there any faster way to run benchmarks on pypy/module/cpyext module?
