----- Original Message -----
> From: "Ralf Gommers via NumPy-Discussion" <numpy-discuss...@python.org>
> To: "Discussion of Numerical Python" <numpy-discuss...@python.org>
> Cc: "hpy-dev" <hpy-...@python.org>, "pypy-dev" <pypy-dev@python.org>, "ralf gommers" <ralf.gomm...@gmail.com>
> Sent: Wednesday 30 April 2025 07:32:44
> Subject: [Numpy-discussion] Re: Better compatibility of the Python scientific/data stack with fast Python interpreters
> On Tue, Apr 29, 2025 at 11:24 AM PIERRE AUGIER <pierre.aug...@univ-grenoble-alpes.fr> wrote:
>
> > Dear Numpy community members and Numpy developers,
> >
> > This email is to get the points of view of the Numpy community members and developers about a subject that I find very important. I'm going to introduce it in just a few lines, so I write a few statements without backing them with proper arguments and without giving links or word definitions. I assume that most people subscribed to the numpy-discussion list are at least familiar with the subject.
>
> Thanks for thinking about this Pierre.

Thanks Ralf for your answer. Even if it is not very positive, it is VERY interesting!

> > I think getting proper compatibility of the Python scientific/data stack with fast Python interpreters is very important for the long-term future of Python for science and data.
>
> I'm not sure it is, it wouldn't rank high on my wish list. PyPy is nearing its end-of-life, and GraalPy is really only for Java devs it seems - I've never seen it in the real world.

- PyPy and GraalPy are not used a lot because the Python ecosystem is largely incompatible with them, since it is based on the CPython C API, which in particular assumes reference counting. This is really a chicken-and-egg problem. With a better ecosystem, we can anticipate a strong increase in PyPy and GraalPy usage.
- I'm not aware that PyPy is nearing its end-of-life. It would really be a shame for Python to lose this project.
- GraalPy is not at all only for Java devs. It can be used by any Python user. It is just a young project, and its main usability issue is the lack of specific wheels. Numpy implemented with HPy would fix that. That said, Java devs can also be Python and Numpy users, and getting Numpy fully compatible with GraalPy would be very good for applications mixing Java and Python.

> With CPython performance being worked hard on, the performance gap is shrinking every year.

I don't think the word "shrinking" is very appropriate :-) ! Even on pyperformance (a set of benchmarks designed for CPython development), PyPy and GraalPy are still typically 4 to 5 times faster than CPython. For pure-Python, CPU-bound cases, it is more of the order of 20 times faster, which is typically comparable with what one gets with NodeJS. Unfortunately, the Faster CPython project is progressing slowly and has a very hard time getting significant performance improvements. One can have a look at https://github.com/faster-cpython/benchmarking-public. It would be a great achievement if CPython 3.14 with its JIT were 1.5 times faster than CPython 3.10 (measured with pyperformance, again to be compared with the typical 4 to 5 times faster of PyPy and GraalPy).
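To make the "pure-Python, CPU-bound" case concrete, here is a small toy script (my own illustration, not taken from pyperformance; the factors above are orders of magnitude, not results of this particular script). It can be run unchanged with CPython, PyPy or GraalPy to compare timings:

# Toy CPU-bound benchmark: pure Python, lots of arithmetic and function calls,
# no extension modules. Run the same file with `python` and with `pypy`
# (or `graalpy`) and compare the elapsed times.
from time import perf_counter


def mandelbrot_point(cr, ci, max_iter=200):
    """Number of iterations before the orbit of c = cr + i*ci escapes."""
    zr = zi = 0.0
    for n in range(max_iter):
        zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
        if zr * zr + zi * zi > 4.0:
            return n
    return max_iter


def run(size=400):
    total = 0
    for i in range(size):
        for j in range(size):
            total += mandelbrot_point(-2.0 + 3.0 * j / size, -1.5 + 3.0 * i / size)
    return total


if __name__ == "__main__":
    t0 = perf_counter()
    checksum = run()
    print(f"checksum={checksum}  elapsed={perf_counter() - t0:.2f} s")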
The constraints on CPython (in particular those related to the CPython C API) are so strong that in most cases the Faster CPython project won't be able to reach what can be obtained by fast Python implementations using advanced methods and algorithms. As long as the CPython C API assumes reference counting (and other implementation details that are bad for performance), CPython performance cannot reach what can be obtained with PyPy, GraalPy or NodeJS. If CPython changes its internals too much, it would need an emulation layer like PyPy's cpyext, and then extensions would become slower, which is of course not acceptable.

> More importantly, none of these efforts (including the "faster CPython" project), seem critical to numerical/scientific users. We're still talking about pure Python code that gets maybe up to 5x faster, while the gains from doing things in compiled languages are a lot higher.

For numerical kernels, specialized static languages are clearly needed, but for global organisation and orchestration, PyPy and GraalPy are really very efficient (I guess efficient enough in most cases). For pure Python code, and compared to CPython, the factor is not 5x but more like 10x or 20x.

> So the benefits are more important for small packages if it moves the threshold at which it becomes necessary for them to write zero extension modules. For core libraries like NumPy, pure Python performance isn't super critical.

For libraries like NumPy, it is not. But for users of these libraries, it is. Or at least, the possibility to use a fast Python (typically as fast as JavaScript) would change what we can do with Python and what can be coded in Python. For example, it is currently not doable to use a lot of small Python objects in performance-critical code, so nice object-oriented Python cannot be used there. With alternative fast Python implementations made more usable by an ecosystem based on HPy and universal wheels, it would become possible.

I remember, during a Python training, a participant (a good C++ developer) trying to write a ray tracer with OOP and Numpy. It was super slow with CPython, so I told him to try pure Python with PyPy (it was just a learning exercise, so why not). He soon got a prototype and was quite impressed by the speed. Then there was something else in the code that was convenient to do with Numpy and then, pff, all the good performance disappeared, because the Numpy calls were super slow (going through cpyext) and the rest of the code ended up manipulating Numpy numbers. This was very frustrating.
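To give an idea of this pattern, here is a hypothetical sketch (not the actual code from the training): a hot loop over small pure-Python objects, and the same logic with Numpy calls inside the loop. Roughly, PyPy's JIT handles the first version very well, while in the second one every iteration crosses the C API boundary through cpyext and hands Numpy scalars back to the Python code.

# Hypothetical sketch of the pattern described above (not the actual code).
import numpy as np


class Vec3:
    """A small pure-Python value object, created in large numbers."""

    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def __add__(self, other):
        return Vec3(self.x + other.x, self.y + other.y, self.z + other.z)

    def __mul__(self, s):
        return Vec3(self.x * s, self.y * s, self.z * s)

    def dot(self, other):
        return self.x * other.x + self.y * other.y + self.z * other.z


def march_pure_python(n=1_000_000):
    """Only small pure-Python objects: PyPy's JIT handles this well."""
    d = Vec3(0.0, 0.0, 0.0)
    step = Vec3(0.001, 0.002, 0.003)
    for _ in range(n):
        d = d + step * 0.5
    return d.dot(d)


def march_mixed(n=1_000_000):
    """Same logic, but Numpy operations on tiny arrays inside the hot loop:
    on PyPy each operation goes through cpyext and the results are Numpy
    scalars, which then slow down the surrounding pure-Python code too."""
    d = np.zeros(3)
    step = np.array([0.001, 0.002, 0.003])
    for _ in range(n):
        d = d + step * 0.5
    return float(d @ d)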
We also see projects using a lot of Python (with lots of small objects) that are very slow compared to comparable alternatives (for example Pelican compared to Hugo in Go, or myst-parser compared to mystmd in JavaScript; I also think of Sphinx and its very long builds, ...). It would change the life of Python users if it were easy to use fast implementations with performance similar to NodeJS.

Moreover, if boxing/unboxing can be avoided between Python JITs and native code (which should be doable with HPy), Numpy calls would be significantly faster. With GraalVM it could be even more spectacular, because the same JIT could compile Python and native code together, without barriers across the language boundaries.

> When thinking about overall performance improvements, I'd say that the two most promising large developments are: (1) making it easier to use accelerator libraries (PyTorch, CuPy et al.), and (2) free-threaded CPython for enabling Python-level threading.

Good and usable sub-interpreters would also be great, and I think this is highly coupled with Numpy-HPy.

> [...]
>
> > Moreover, unfortunately, HPy does not currently receive as much care as it should.
> >
> > It seems to me that the project of fixing the roots of the Python ecosystem has to be relaunched. I think that the dynamics has to come from the Python scientific/data community and in particular Numpy. It is unfortunately outside of the C API working group's scope (see https://discuss.python.org/t/c-api-working-group-and-plan-to-get-a-python-c-api-compatible-with-alternative-python-implementations/89477).
> >
> > It seems to me that it is necessary (and possible) to get some funding for such an impactful project so that we can get people working on it.
> >
> > I'd like to write a long and serious text on this subject, and I first try to get the points of view of the different people and projects involved.
> >
> > I guess I should write explicit questions:
> >
> > - What do you think about the project of fixing the Python scientific/data stack so that it becomes natively compatible (hence fast and convenient, with easy installations) with alternative and fast Python interpreters?
> > - Do you have points of view on how this should be done, technically (HPy? something else?) and on other aspects (community, NEP?, funding, ...)?
> > - Anything else interesting on this subject?
>
> Having HPy or something like it will be very nice and lead to long-term benefits. The problem with HPy seems to be more a social one at this point: if CPython core devs don't want to adopt it but do their own "make the C API more opaque" strategy, then more effort on HPy isn't going to help. If you're going to dig into this more, I suggest trying to get a very good sense of what the CPython core dev team, and in particular its C API Workgroup, is thinking/planning. That will inform whether the right strategy is to help their efforts along, or work on HPy.

It is very interesting that you write that! This is exactly what I tried to understand with my questions in https://discuss.python.org/t/c-api-working-group-and-plan-to-get-a-python-c-api-compatible-with-alternative-python-implementations/89477

I think that the discussion was very informative. It is in particular interesting to see that the C API working group is only focused (i) on the "historical" CPython C API (HPy is considered an external project) and (ii) on a slow and gradual evolution of this C API (going at the pace of the slowest projects using the CPython C API). Some CPython core devs work in the direction of "making the C API more opaque", but important "revolutionary" changes (like providing an API that does not assume reference counting) have only been discussed (https://github.com/capi-workgroup/api-revolution/issues) and it seems that there are no practical plans for them yet (for example, https://github.com/markshannon/New-C-API-for-Python/ has stopped). Moreover, there is no time schedule, and there cannot be one, since the pace depends on all CPython C API users.

In practice, to get the scientific/data Python stack natively compatible, within a reasonable time (not 12 years), with the advanced methods used in dynamic-language interpreters, one needs (i) to invest in HPy and (ii) to port Numpy to something other than the CPython C API (it can be HPy, but it could also be Cython or maybe other things).

Also, it seems that the C API working group does not care about the principle of HPy universal wheels (compatible across Python versions and implementations). To have something like that, we need HPy.

It seems to me that a strategy based on HPy would give practical benefits to users in a much shorter time (typically a few years) than just waiting for the CPython C API to evolve.
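To make the universal-wheels point concrete, here is a small illustration (using the third-party packaging library, nothing HPy-specific) of why binary wheels currently have to be rebuilt per interpreter and per Python version:

# Print the wheel tags the running interpreter accepts (requires the
# third-party "packaging" package). CPython reports cpXY-style tags, while
# PyPy and GraalPy report their own interpreter and ABI tags, so a classic
# extension wheel built for CPython cannot be installed on them. The point
# of HPy universal wheels is that one binary could serve all of them.
import sys

from packaging.tags import sys_tags

print(sys.implementation.name, *sys.version_info[:2])
for tag in list(sys_tags())[:8]:  # the first (most specific) tags
    print("  ", tag)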
Let's recall that Numpy 1 has already been ported to HPy (like Matplotlib, by the way).

I thus agree with:

> The problem with HPy seems to be more a social one at this point

Cheers,
Pierre