On Wed, 6 Jan 2021, PIERRE AUGIER wrote:
> A big issue IMHO with Cython is that Cython code is not compatible with Python and can't be interpreted, so we lose the advantage of an interpreted language in terms of development. One small change in this big extension and one needs to recompile everything.
That's a valid point, to a certain extent. However, in my experience I was always somehow able to extract individual small functions into mini-modules, and then I wrote some Makefile / setuptools glue to automate chained recompilation of whatever changed every time I ran the unit tests or the command line interface, so recompilation only kept annoying me until I got the magic to work :-)
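For the curious, the glue boils down to something like the sketch below; the package name and layout are made up, but cythonize() really does compare modification times and only recompiles the .pyx files that changed:

    # setup.py -- minimal sketch; "mypkg" and its layout are hypothetical
    from setuptools import setup
    from Cython.Build import cythonize

    setup(
        name="mypkg",
        # cythonize() tracks modification times, so only the .pyx
        # files that actually changed get regenerated and rebuilt
        ext_modules=cythonize("mypkg/*.pyx", language_level=3),
    )

Then you just put "python setup.py build_ext --inplace" in front of the test runner in your Makefile, and the rebuilds happen automatically and incrementally.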
> For me, debugging is really harder (maybe because I'm not good at debugging native code). Moreover, one actually needs to know (a bit of) C to write efficient Cython code, so it's difficult for some contributors to understand/develop Cython extensions.
I must admit that I never needed to debug anything because I was doing TDD in the first place, but you are probably right - debugging the generated monster code must be quite scary compared to pure Python code with full IDE support like PyCharm.
Anyways, call me a chauvinist, but I'd say it's just a sad fact of life that you need to know a thing or two to write correct, performance-oriented low-level numeric code.
I assume you know it anyway, and I'm sure that your worked-up summation example below was just to make a completely different point, but as a matter of fact, in your code the worst-case error grows proportionally to the number of elements in the vector (N), and the RMS error grows proportionally to the square root of N for random inputs, so in the general case the results of your computations are going to be accordingly pretty random ;-)
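To make that concrete, here is a toy demonstration (not your code, just the generic naive accumulation pattern); math.fsum() returns the correctly rounded sum, so the difference is exactly the accumulated error:

    # toy illustration of naive float accumulation error
    import math, random

    xs = [random.uniform(-1.0, 1.0) for _ in range(10_000_000)]

    naive = 0.0
    for x in xs:
        naive += x          # rounding errors accumulate, ~O(N*eps) worst case

    exact = math.fsum(xs)   # correctly rounded sum of the same values
    print("accumulated error: %.3e" % abs(naive - exact))

Kahan (compensated) summation makes the error essentially independent of N, and pairwise summation, which as far as I remember NumPy uses internally for sum(), brings the growth down to O(log N).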
Where I'm going with this is that the people who do this kind of stuff are somehow not bothered by Cython's problems, while the people who don't are rightfully bothered by valid issues - but even if they get help with those issues, will it actually help their cause :-) ? Who knows...
On top of that, again, there is the whole MPI story. I used to write Python stuff that scaled to hundreds of thousands of cores. I still did SIMD inside OpenMP threads on the local nodes on top of that, just for kicks, but actually I could have achieved a 4x speedup simply by scheduling my jobs overnight with 4x the cores instead, and saved myself the trouble. But I wanted trouble, because it was fun :-)
Cython and mpi4py make MPI almost criminally easy in Python, so once you get this far, the question becomes: does 2x or 4x on the local node actually matter at all?
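Just to show what I mean by criminally easy, a complete data-parallel sum fits in a dozen lines (the script name and sizes are made up; run with something like "mpiexec -n 4 python sum_mpi.py"):

    # each rank sums its own chunk; one reduction combines the partials
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    local = np.random.rand(10_000_000 // size)   # this rank's chunk
    partial = np.array([local.sum()])

    total = np.zeros(1)
    comm.Reduce(partial, total, op=MPI.SUM, root=0)

    if rank == 0:
        print("global sum:", total[0])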
> So my questions are: Is it technically possible to extend Python and PyPy to develop such extensions and make them very efficient? Which tools should be used? How should it be written?
It is absolutely technically possible, and it is a good idea as far as I'm concerned, but I think that the challenge lies in developing conventions for the semantics and getting people to accept them. The zoo of various accelerators / compilers / boosters for Python only proves the point that this must be the hard part.
As for a backing buffer access mechanism, cffi is definitely the right tool - PyPy can already "see through" it, as you've proven with your small example.
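In case anybody wants to play with it, the allocation pattern is as simple as this (just the generic pattern, not your example); on PyPy, the JIT turns the indexed accesses into direct memory reads and writes:

    # allocate a raw C double buffer and poke at it directly
    import cffi

    ffi = cffi.FFI()
    n = 1000
    buf = ffi.new("double[]", n)   # zero-initialized, garbage-collected

    for i in range(n):
        buf[i] = i * 0.5

    mem = ffi.buffer(buf)          # the same memory as a Python buffer,
    print(len(mem), buf[10])       # e.g. for numpy.frombuffer()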
--
Sincerely yours,
Yury V. Zaytsev