I agree with the motivations given by Stefan - two interesting possibilities would be to:

a) first, test the compatibility layer with Cython-generated code;
b) possibly, allow users to use the Python API while replacing refcounting with another, more meaningful, PyPy-specific API* for a garbage-collected heap. However, such an API is radically different, and I'm not sure how well it would actually mesh with the CPython API. If Cython could support such an API, that would be great, but I'm unsure whether it is worth it, for Cython and more generally for other modules (one could easily and elegantly support both CPython and PyPy with preprocessor tricks). See further below about why call overhead is not the biggest performance problem when you can't inline.

* I thought the Java Native Interface (JNI) design of local and global references (http://download.oracle.com/javase/6/docs/technotes/guides/jni/spec/design.html#wp16785) would work here, with some adaptation. However, if your moving GCs support pinning of objects, as I expect to be necessary to interact with CPython code, I would make one important change to that API: instead of having object references be pointers to (GC-movable) pointers to objects, as in JNI, PyPy should use plain pinned pointers. The pinning would not be apparent in the type, but I guess that should be fine; a rough sketch of such an API follows below. Problems arise when PyPy-aware code calls code which still uses the refcounting API: it is mostly safe to ignore the refcounting (even the decreases) for local references, but I'm unsure about persistent references - even so, ignoring it is probably still the best solution, so that the PyPy-aware code handles the lifecycle by itself.

On Thu, Aug 12, 2010 at 11:25, Stefan Behnel <[email protected]> wrote:
> Maciej Fijalkowski, 12.08.2010 10:05:
>> On Thu, Aug 12, 2010 at 8:49 AM, Stefan Behnel wrote:
>
> If you only use it to call into non-trivial Cython code (e.g. some heavy
> calculations on NumPy tables), the call overhead should be mostly
> negligible, maybe even close to that in CPython. You could even provide
> some kind of fast-path to 'cpdef' functions (i.e. functions that are
> callable from both C and Python) and 'api' functions (which are currently
> exported at the module API level using the PyCapsule mechanism). That would
> reduce the call overhead to that of a C call.
>
>> but it's also unjitable. This means that to JIT, cpython
>> extension is like a black box which should not be touched.
>
> Well, unless both sides learn about each other, that is. It won't
> necessarily impact the JIT, but then again, a JIT usually won't have a
> noticeable impact on the performance of Cython code anyway.

Call overhead is not the biggest problem, I guess (well, if it's bigger than in C, it might be); IMHO it is the minor problem when you can't inline. Inlining is important because it enables more optimizations on the combined code. This might or might not apply to your typical use cases (present and future), but you should keep the issue in mind, too. Whenever you say "if you only use it to call into non-trivial Cython code", you imply that a certain kind of functional abstraction - the one where you write short functions, such as accessors - is not efficiently supported. For instance, if you call two functions, each containing a parallel for loop, fusing the loops requires inlining the functions to expose them (see the sketch further below). Inlining accessors (getters and setters) lets the compiler recognize that they often don't need to be called over and over, i.e., it enables common subexpression elimination, which you can't do on a normal (impure) function.
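To make the footnote above concrete, here is a minimal sketch of the shape such a pinned-pointer reference API could take. Every name in it (pypy_object, pypy_local_ref, pypy_global_ref, ...) is hypothetical - this illustrates the design, it is not a proposal for concrete spellings:

    /* Hypothetical sketch: a reference API based on plain pinned pointers,
     * instead of JNI's pointer-to-pointer handles. None of these names
     * exist in PyPy. */

    typedef struct pypy_object pypy_object;  /* opaque; pinned while referenced */

    /* Local reference: keeps obj pinned until the current call into the
     * extension returns, after which the GC may unpin and move it. */
    pypy_object *pypy_local_ref(pypy_object *obj);

    /* Global (persistent) reference: keeps obj pinned until explicitly
     * deleted; the extension code manages the lifecycle by itself. */
    pypy_object *pypy_global_ref(pypy_object *obj);
    void pypy_delete_global_ref(pypy_object *obj);

    /* Usage example: keeping an object alive across extension calls. */
    static pypy_object *cached_callback;

    void set_callback(pypy_object *cb)  /* cb arrives as a local reference */
    {
        if (cached_callback)
            pypy_delete_global_ref(cached_callback);
        cached_callback = pypy_global_ref(cb);  /* promote to a global ref */
    }

Note that, unlike in JNI, the returned values are directly dereferenceable object pointers; the pinning happens behind the scenes, which is exactly why it doesn't show up in the types.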
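And here is the loop fusion example spelled out in plain C (a hypothetical illustration; the same reasoning applies to the C code Cython generates):

    #include <stddef.h>

    /* Two short functions, each containing a parallel loop over the array. */
    static void scale(double *a, size_t n, double k)
    {
        for (size_t i = 0; i < n; i++)
            a[i] *= k;
    }

    static void shift(double *a, size_t n, double d)
    {
        for (size_t i = 0; i < n; i++)
            a[i] += d;
    }

    void transform(double *a, size_t n)
    {
        /* Seen as two opaque calls, this traverses the array twice. */
        scale(a, n, 2.0);
        shift(a, n, 1.0);
        /* Once both calls are inlined, the compiler can fuse the loops
         * into a single traversal, equivalent to:
         *     for (size_t i = 0; i < n; i++)
         *         a[i] = a[i] * 2.0 + 1.0;
         */
    }

Across a call boundary that the compiler (or JIT) must treat as a black box, neither call can be inlined, so this fusion - and the accessor CSE mentioned above - is simply unavailable.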
To make a particularly dramatic example (since it comes from C) of a quadratic-to-linear optimization: a loop like

    for (i = 0; i < strlen(s); i++) {
        /* do something on s without modifying it */
    }

takes quadratic time, because strlen takes linear time and is called at each iteration. Can the optimizer fix this? The simplest way is to inline everything; then it can notice that computing strlen only once is safe. In C with GCC extensions, one could annotate strlen as pure and use functions which take s as a const parameter (but I'm unsure whether that actually works). In Python (and even in Java), something like this should work without annotations. Of course, one can't rely on this quadratic-to-linear optimization unless it's guaranteed to work (like tail call elimination), so I wouldn't rely on it in this case; this relates to the wider issue of unreliable optimizations and "sufficiently smart compilers", better discussed at http://prog21.dadgum.com/40.html (not mine).

--
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/
