wlavrij...@lbl.gov, 13.04.2012 22:19:
>> It's not necessarily slow because a) the intermediate function can do more
>> than just passing through data (especially in the case of Cython or Numba)
>> and b) the exception case is usually just that, an exceptional case.
>
> interesting: under a), what other useful work can be done by the intermediate
> function?
Cython is a programming language, so you can stick anything you like into the wrapper. Note that a lot of code is not (re-)written specifically for one platform (CPython/PyPy/...), or at least shouldn't be, so when writing a wrapper as a library, you may want to put some (and sometimes a lot of) functionality into the wrapper itself, be it to make a C-ish interface more comfortable to use or to provide additional functionality on top of a bare C/C++ library. Also, Cython lets you parallelise code quite easily using OpenMP, another thing that is often done in wrappers for computational code.

This discussion actually arose from the intention to interface Cython code efficiently with Numba, which uses LLVM to generate code at runtime. For that, both sides need to be able to see the C-level signatures of what they call in order to bypass the Python-level call overhead.

> (Yes for b), but the slowness is in having an extra layered C++ call in
> between, the one that contains the try/catch. That's at least an extra 25%
> overhead over the naked function pointer at current levels. Of course, only
> in a micro benchmark. In real life, it's irrelevant.)

IIRC, exceptions can be surprisingly expensive in C++, so I agree that it matters for very small functions. But you'd want to inline those anyway and avoid exceptions in them if at all possible.

>> Ok, I just took a look at it and it seems like the right thing to use for
>> this. Then all that's left is an efficient runtime mapping from the
>> exported signature to a libffi call specification.
>
> It need not even be an efficient mapping: since the mapping is static for
> each function pointer, the JIT takes care of removing it (that is, it puts
> the results of the mapping inline, so the lookup code itself disappears).

We're currently discussing ways to do this in Cython as well. The mapping code wouldn't get removed there, but it can at least be moved out of the way so that the CPU's branch prediction can do the right thing. That gives you about the same performance in practice.

> Same goes for C++ overloads (with a little care): each overload that fails
> should result in a (python) exception during mapping of the arguments. The
> JIT then removes those branches from the trace, leaving only the call that
> succeeded in the optimized trace. Thus, any time spent making the selection
> of the overload efficient is mostly wasted, as that code gets completely
> removed.

A static compiler would handle that similarly.

Stefan
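
P.S. To make point a) above a bit more concrete, here is a rough, untested sketch of the kind of wrapper I mean. The header "mylib.h" and the function c_process() are made-up names; the point is just that error handling and OpenMP parallelisation live in the wrapper, on top of the bare C call:

    # distutils: extra_compile_args = -fopenmp
    # distutils: extra_link_args = -fopenmp
    from cython.parallel cimport prange

    cdef extern from "mylib.h":
        # hypothetical C routine: processes a buffer, returns 0 on success
        int c_process(double* data, Py_ssize_t n) nogil

    def process(double[::1] data):
        """Wrapper that adds error translation and OpenMP parallelism
        on top of the bare C function."""
        cdef Py_ssize_t i, n = data.shape[0], step = 1024, count
        cdef int status = 0
        if n == 0:
            return
        # run the C routine chunk-wise in parallel, without the GIL
        for i in prange(0, n, step, nogil=True):
            count = n - i
            if count > step:
                count = step
            # accumulate non-zero error codes (prange turns this into a reduction)
            status += c_process(&data[i], count)
        if status != 0:
            # translate the accumulated C error status into a Python exception
            raise RuntimeError("c_process() failed (accumulated status %d)" % status)

None of this is specific to CPython or PyPy, which is the point: the added functionality lives in the wrapper library itself, not in platform-specific glue.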