Hey.

First, sorry for the late response; we're kind of busy with other things
right now (i.e. working on the 2.5-compatible release). That doesn't mean
we don't appreciate input on these problems.

On Fri, Oct 17, 2008 at 5:50 AM, Geoffrey Irving <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I posted a response to your blog post on C++ library bindings, and
> wanted to continue the discussion further via email if anyone's
> interested.  I just signed up for the mailing list, so apologies if I
> missed a lot of previous discussion.  I'll say up front that it's
> unlikely that I'll be able to devote any actual coding effort to this,
> so feel free to tell me to get lost if you have plenty of ideas and
> not enough manpower. :)

That's fine. We don't have enough manpower to work on this now, but
knowing what people are doing in this area will be very valuable once we
get to it.

>
> I started out writing C++ bindings using Boost.Python, and was very
> happy with it for a long time.  Its strongest point is the ability to
> wrap libraries that were never designed with python in mind,
> specifically code with poor and inflexible ownership semantics.
> Internally, this means that C++ objects are exposed indirectly through
> a holder object containing either an inline copy of the C++ object or
> any type of pointer holding the object.  Every access to the object
> has to go through runtime dispatch in order to work with any possible
> holder type.  The holder also contains the logic for ownership and
> finalization.  For example, Boost.Python can return a reference to a
> field inside another object, in which case the holder will keep a
> reference to the parent object to keep it alive as long as the field
> reference lives.
>
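To make the holder idea concrete, here is a rough sketch in Python (illustrative names only; the real Boost.Python holder is a C++ template, and this only mimics its shape):

```python
# Rough shape of the Boost.Python "holder" idea (illustrative names,
# not the real API): every wrapped object goes through a holder that
# knows both how the C++ value is stored and what must stay alive.
class Holder:
    def __init__(self, get_ptr, owner=None):
        self._get_ptr = get_ptr  # inline copy, raw pointer, smart pointer...
        self._owner = owner      # parent object pinned by field references

    def ptr(self):
        # Every access to the wrapped object goes through this
        # runtime dispatch on the holder kind.
        return self._get_ptr()

# A "field reference" whose holder keeps the parent alive:
parent = {"field": 42}
parent_holder = Holder(lambda: parent)
field_holder = Holder(lambda: parent_holder.ptr()["field"],
                      owner=parent_holder)
print(field_holder.ptr())  # 42
```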
> The problem with this generality is that it produces a huge amount of
> object code (wrapping a single function in Boost.Python can add 10k to
> the object file), and adds a lot of runtime indirection.
>
> Assuming that one is writing C++ bindings because of speed issues,
> it'd be nice if this extra layer of memory indirection and runtime
> dispatch was exposed to the (eventual) JIT.  In order to do that, pypy
> would have to be capable of handling pointers to raw memory containing
> non-Python objects (is this already true, thanks to the ctypes work?)

That's true. PyPy is able to handle pointers into arbitrary C memory.
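For instance, at the ctypes level (which is the relevant model here) a bare typed pointer into raw memory looks like:

```python
import ctypes

# A raw C array, allocated outside Python's object model.
buf = (ctypes.c_double * 3)(1.0, 2.0, 3.0)

# A bare typed pointer to that memory; ctypes neither knows nor
# cares what owns it or how long it lives.
p = ctypes.cast(buf, ctypes.POINTER(ctypes.c_double))
print(p[1])  # 2.0
```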

> .. with
> separate information about type and ownership.

We don't provide that, since C itself has no notion of type or ownership
attached to a pointer.

> For example, if you
> have bindings for a C++ vector class and a C++ array containing the
> vectors, a "reference" to an individual vector in the array is really
> three different pieces:
>
> 1. The actual pointer to the vector.
> 2. A type structure containing functions to be called with the pointer
> (1) as an argument.
> 3. A list of references to other objects that need to stay alive while
> this reference lives.
>
> If pypy and the JIT ends up able to treat these pieces separately,
> it'd be a significant performance win over libraries wrapped with
> CPython.
>
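A toy sketch of those three pieces, with hypothetical names and ctypes standing in for the raw memory:

```python
import ctypes

class CppType:
    """Piece (2): a type structure of functions taking the raw pointer."""
    def __init__(self, name, methods):
        self.name = name
        self.methods = methods  # method name -> callable(raw_ptr, *args)

class CppRef:
    """A 'reference' decomposed into the three pieces."""
    def __init__(self, ptr, cpptype, keepalive=()):
        self.ptr = ptr                     # (1) raw pointer to the object
        self.type = cpptype                # (2) type/dispatch structure
        self.keepalive = list(keepalive)   # (3) objects pinned alive

    def call(self, name, *args):
        return self.type.methods[name](self.ptr, *args)

# Hypothetical setup: an array of two 3-vectors in one raw buffer.
array_buf = (ctypes.c_double * 6)(*range(6))
vec3 = CppType("vec3", {
    "norm2": lambda p: sum(p[i] * p[i] for i in range(3)),
})

# Reference to the second vector: an interior pointer, plus a
# keepalive entry so the buffer outlives the reference.
addr = ctypes.addressof(array_buf) + 3 * ctypes.sizeof(ctypes.c_double)
second = CppRef(ctypes.cast(addr, ctypes.POINTER(ctypes.c_double)),
                vec3, keepalive=[array_buf])
print(second.call("norm2"))  # 3*3 + 4*4 + 5*5 = 50.0
```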
> The other main source of slowness and complexity in Boost.Python is
> overloading support, but I think that part is fairly straightforward
> to handle in the python level.  All Boost.Python does internally is
> loop over the set of functions registered for a given name, and for
> each one loop over the arguments calling into its converter registry
> to see if the python object can be converted to the C++ type.
>
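That loop is indeed easy to sketch at the Python level (hypothetical converter registry and names, only the shape of what Boost.Python does internally):

```python
class NoMatch(Exception):
    pass

def to_int(x):
    if not isinstance(x, int):
        raise NoMatch
    return x

def to_double(x):
    if not isinstance(x, float):
        raise NoMatch
    return x

# Hypothetical converter registry: C++ type name -> converter.
converters = {"int": to_int, "double": to_double}

def resolve_overload(overloads, args):
    """Try each registered signature in order; for each, ask the
    converter registry whether every argument is convertible."""
    for signature, func in overloads:
        if len(signature) != len(args):
            continue
        try:
            converted = [converters[t](a) for t, a in zip(signature, args)]
        except NoMatch:
            continue
        return func(*converted)
    raise TypeError("no matching overload")

overloads = [(("int",), lambda x: ("int version", x)),
             (("double",), lambda x: ("double version", x))]

print(resolve_overload(overloads, (3.5,)))  # ('double version', 3.5)
print(resolve_overload(overloads, (2,)))    # ('int version', 2)
```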
> As I mentioned in the blog comment, a lot of these issues come up in
> contexts outside C++, like numpy.  Internally numpy represents
> operations like addition as a big list of optimized routines to call
> depending on the stored data type.  Functions in these tables are
> called on raw pointers to memory, which is fundamental since numpy
> arrays can refer to memory inside objects from C++, Fortran, mmap,
> etc.  It'd be really awesome if the type dispatch step could be
> written in python but still call into optimized C code for the final
> arithmetic.

That's the goal. Well, not exactly: the point is that you write this
code in Python/RPython and the JIT is able to generate efficient
assembler out of it. Having nice integration between a yet-nonexistent
JIT and PyPy's yet-nonexistent numpy is a very far-reaching goal,
though :-)
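As a shape of what that could look like: type dispatch written in Python, with the per-dtype loops below standing in for numpy's compiled routines (all names hypothetical):

```python
import ctypes

# Stand-in for one of numpy's optimized per-dtype loops.  It works on
# raw typed pointers, as the real compiled loops do; only the body is
# plain Python here for the sake of a runnable sketch.
def add_double(pa, pb, pout, n):
    for i in range(n):
        pout[i] = pa[i] + pb[i]

# The dispatch table: dtype name -> (element ctype, loop function).
ADD_LOOPS = {"float64": (ctypes.c_double, add_double)}

def add(dtype, a, b):
    """Type dispatch in Python; final arithmetic on raw pointers."""
    ctype, loop = ADD_LOOPS[dtype]
    n = len(a)
    buf_a = (ctype * n)(*a)
    buf_b = (ctype * n)(*b)
    buf_out = (ctype * n)()
    loop(ctypes.cast(buf_a, ctypes.POINTER(ctype)),
         ctypes.cast(buf_b, ctypes.POINTER(ctype)),
         ctypes.cast(buf_out, ctypes.POINTER(ctype)), n)
    return list(buf_out)

print(add("float64", [1.0, 2.0], [3.0, 4.0]))  # [4.0, 6.0]
```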

>
> The other major issue is safety: if a lot of overloading and dispatch
> code is going to be written in python, it'd be nice to shield that
> code from segfaults.  I think you can get a long way there just by
> having a consistent scheme for boxing the three components above
> (pointer, type, and reference info), a way to label C function
> pointers with type information, and a small RPython layer that does
> simple type-checked calls (with no support for overloading or type
> conversion).  I just wrote a C++ analogue to this last part as a
> minimal replacement for Boost.Python, so I could try to formulate what
> I mean in pseudocode if there's interest.  There'd be some amount of
> duplicate type checking if higher level layers such as overload
> resolution were written in application level python, but that
> duplication should be amenable to elimination by the JIT.

I think for now we're happy with some extra overhead. We would like to
have *any* working C++ bindings first, and only then think about
speeding them up.
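For reference, the minimal type-checked call layer described above might look like this sketch (pure app-level Python with made-up names; the real thing would live at RPython level, and a real C function pointer would come from a shared library rather than a callback):

```python
import ctypes

# Stand-in for a C function pointer; a ctypes callback keeps the
# sketch portable and self-contained.
CFUNC = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)
raw_negate = CFUNC(lambda x: -x)

class TypedFunc:
    """A C function pointer labelled with argument types, doing only
    simple type-checked calls: no overloading, no conversion."""
    def __init__(self, cfunc, argtypes):
        self.cfunc = cfunc
        self.argtypes = argtypes

    def __call__(self, *args):
        if len(args) != len(self.argtypes):
            raise TypeError("wrong number of arguments")
        for a, t in zip(args, self.argtypes):
            # Reject anything not already boxed as the declared type.
            if not isinstance(a, t):
                raise TypeError("expected %s" % t.__name__)
        return self.cfunc(*args)

negate = TypedFunc(raw_negate, [ctypes.c_int])
print(negate(ctypes.c_int(7)))  # -7
```

Overload resolution and implicit conversion would then be ordinary Python layered on top of this, which is where the JIT could later remove the duplicated checks.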

>
> That's enough for now.  I'll look forward to the discussion.  Most of
> my uses of python revolve heavily around C++ bindings, so it's
> exciting to see that you're starting to think about it even if it's a
> long way off.

Thank you :)

Cheers,
fijal
_______________________________________________
[email protected]
http://codespeak.net/mailman/listinfo/pypy-dev
