Hey. First, sorry for the late response; we're kind of busy with other things right now (i.e. working on the 2.5-compatible release). That doesn't mean we don't appreciate input on our problems.
On Fri, Oct 17, 2008 at 5:50 AM, Geoffrey Irving <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I posted a response to your blog post on C++ library bindings, and
> wanted to continue the discussion further via email if anyone's
> interested. I just signed up for the mailing list, so apologies if I
> missed a lot of previous discussion. I'll say up front that it's
> unlikely that I'll be able to devote any actual coding effort to this,
> so feel free to tell me to get lost if you have plenty of ideas and
> not enough manpower. :)

That's fine. We don't have enough manpower to work on this now, but knowing what people do in this area will be very valuable once we get to it.

> I started out writing C++ bindings using Boost.Python, and was very
> happy with it for a long time. Its strongest point is the ability to
> wrap libraries that were never designed with Python in mind,
> specifically code with poor and inflexible ownership semantics.
> Internally, this means that C++ objects are exposed indirectly through
> a holder object containing either an inline copy of the C++ object or
> any type of pointer holding the object. Every access to the object
> has to go through runtime dispatch in order to work with any possible
> holder type. The holder also contains the logic for ownership and
> finalization. For example, Boost.Python can return a reference to a
> field inside another object, in which case the holder will keep a
> reference to the parent object to keep it alive as long as the field
> reference lives.
>
> The problem with this generality is that it produces a huge amount of
> object code (wrapping a single function in Boost.Python can add 10k to
> the object file), and adds a lot of runtime indirection.
>
> Assuming that one is writing C++ bindings because of speed issues,
> it'd be nice if this extra layer of memory indirection and runtime
> dispatch was exposed to the (eventual) JIT.
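To make the holder idea above concrete, here is a rough pure-Python sketch of the pattern being described. All names here (Holder, internal_reference, and so on) are invented for illustration; this is the shape of the mechanism, not Boost.Python's actual internals:

```python
class Holder:
    """Sketch of a Boost.Python-style holder (names are invented).

    Wraps access to a C++ object, whether the holder stores an inline
    copy, a raw pointer, a shared_ptr, etc.  `keep_alive` carries
    references to other Python objects (e.g. a parent object when this
    holder wraps an internal field) so they outlive this reference.
    """

    def __init__(self, get_pointer, keep_alive=()):
        self._get_pointer = get_pointer      # runtime-dispatched access
        self._keep_alive = list(keep_alive)  # ownership information

    def pointer(self):
        # Every access goes through this indirection, so the same
        # wrapper code works for any storage strategy.
        return self._get_pointer()


def internal_reference(parent_holder, field_offset):
    """A "reference to a field inside another object": the new holder
    keeps the parent alive for as long as the field reference lives."""
    return Holder(lambda: parent_holder.pointer() + field_offset,
                  keep_alive=[parent_holder])
```

For example, `internal_reference(parent, 8)` yields a holder whose `pointer()` is the parent's address plus 8, and which holds a reference to the parent so the parent cannot be collected first. It is exactly this extra layer of indirection that one would want the JIT to see through.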
> In order to do that, pypy would have to be capable of handling
> pointers to raw memory containing non-python objects (is this already
> true due to the ctypes stuff?)

That's true. PyPy is able to handle pointers to any C location.

> .. with separate information about type and ownership.

We don't provide this, since C has no notion of that at all.

> For example, if you have bindings for a C++ vector class and a C++
> array containing the vectors, a "reference" to an individual vector in
> the array is really three different pieces:
>
> 1. The actual pointer to the vector.
> 2. A type structure containing functions to be called with the pointer
>    (1) as an argument.
> 3. A list of references to other objects that need to stay alive while
>    this reference lives.
>
> If pypy and the JIT end up able to treat these pieces separately,
> it'd be a significant performance win over libraries wrapped with
> CPython.
>
> The other main source of slowness and complexity in Boost.Python is
> overloading support, but I think that part is fairly straightforward
> to handle at the Python level. All Boost.Python does internally is
> loop over the set of functions registered for a given name, and for
> each one loop over the arguments, calling into its converter registry
> to see if the Python object can be converted to the C++ type.
>
> As I mentioned in the blog comment, a lot of these issues come up in
> contexts outside C++, like numpy. Internally numpy represents
> operations like addition as a big list of optimized routines to call
> depending on the stored data type. Functions in these tables are
> called on raw pointers to memory, which is fundamental since numpy
> arrays can refer to memory inside objects from C++, Fortran, mmap,
> etc. It'd be really awesome if the type dispatch step could be
> written in Python but still call into optimized C code for the final
> arithmetic.

That's the goal.
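The overload-resolution loop described in the quoted paragraph can be sketched in a few lines of plain Python. The converter registry, type names, and functions below are all invented for illustration; only the shape of the algorithm follows the description above:

```python
# Sketch of the Boost.Python-style overload loop (invented names):
# for each registered signature, try to convert every Python argument
# via the converter registry; the first signature whose conversions
# all succeed wins.

def to_int(obj):
    return obj if isinstance(obj, int) else None

def to_double(obj):
    return float(obj) if isinstance(obj, (int, float)) else None

# "Converter registry": C++ type name -> converter returning None
# when the Python object cannot be converted to that type.
CONVERTERS = {'int': to_int, 'double': to_double}

def resolve_overload(overloads, args):
    """overloads: list of (arg_type_names, function) pairs."""
    for types, func in overloads:
        if len(types) != len(args):
            continue
        converted = [CONVERTERS[t](a) for t, a in zip(types, args)]
        if all(c is not None for c in converted):
            return func(*converted)   # call the "C++" implementation
    raise TypeError('no matching overload')

overloads = [(('int', 'int'), lambda a, b: ('int add', a + b)),
             (('double', 'double'), lambda a, b: ('double add', a + b))]
```

With these toy converters, `resolve_overload(overloads, (1, 2))` picks the int version and `resolve_overload(overloads, (1.5, 2.0))` falls through to the double one. Numpy's per-dtype table of inner loops follows the same shape: a cheap dispatch step in front of optimized routines that operate on raw memory.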
Well, not exactly: the point is that you write this code in Python/RPython and the JIT is able to generate efficient assembler out of it. That's a very far-reaching goal though, since it needs nice integration between a yet-nonexistent JIT and PyPy's yet-nonexistent numpy :-)

> The other major issue is safety: if a lot of overloading and dispatch
> code is going to be written in Python, it'd be nice to shield that
> code from segfaults. I think you can get a long way there just by
> having a consistent scheme for boxing the three components above
> (pointer, type, and reference info), a way to label C function
> pointers with type information, and a small RPython layer that does
> simple type-checked calls (with no support for overloading or type
> conversion). I just wrote a C++ analogue to this last part as a
> minimal replacement for Boost.Python, so I could try to formulate what
> I mean in pseudocode if there's interest. There'd be some amount of
> duplicate type checking if higher-level layers such as overload
> resolution were written in application-level Python, but that
> duplication should be amenable to elimination by the JIT.

I think for now we're happy with the extra overhead. We would like to have *any* working C++ bindings first, and only then think about speeding them up.

> That's enough for now. I'll look forward to the discussion. Most of
> my uses of Python revolve heavily around C++ bindings, so it's
> exciting to see that you're starting to think about it, even if it's a
> long way off.

Thank you :)

Cheers,
fijal

_______________________________________________
[email protected]
http://codespeak.net/mailman/listinfo/pypy-dev
