Hi Paolo,

Thanks for the introduction e-mail.  I'd like to answer it more
directly, but maybe I should first give a quick overview of what I think
are possible misunderstandings.

The main issue is that unlike Google V8, in PyPy there is no machine
code generation at all (ignoring the JIT compiler generator for the
first part of this e-mail).  That means that the notion of an inline
cache simply doesn't make sense.  Let me explain this in more detail.

What we implemented long ago in PyPy is very similar to the "maps" of
the Self language.  It is a dictionary implementation that stores the
keys in a separate, shared "map" object, and the values in a flat
list.  In PyPy this implementation still presents the interface of a
full-featured dictionary, so it supports adding and removing arbitrary
elements, which is done by switching the dictionary's current map.
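
To make this concrete, here is a minimal sketch of the idea (not
PyPy's actual code, and removal of keys is omitted; all the names
are made up for illustration):

    class Map(object):
        # A shared, immutable description of which keys live at
        # which positions in the flat storage list.
        def __init__(self, indexes=None):
            self.indexes = indexes or {}   # key -> position in storage
            self.transitions = {}          # key -> next Map

        def with_key(self, name):
            # Adding a new key switches the dictionary to another
            # map; dictionaries that grew the same keys in the same
            # order end up sharing the same Map object.
            if name not in self.transitions:
                new_indexes = dict(self.indexes)
                new_indexes[name] = len(self.indexes)
                self.transitions[name] = Map(new_indexes)
            return self.transitions[name]

    EMPTY_MAP = Map()

    class MapDict(object):
        # Presents a plain dictionary interface: the keys live in
        # the shared map, the values in a flat list.
        def __init__(self):
            self.map = EMPTY_MAP
            self.storage = []

        def __setitem__(self, name, value):
            index = self.map.indexes.get(name)
            if index is not None:
                self.storage[index] = value
            else:
                self.map = self.map.with_key(name)
                self.storage.append(value)

        def __getitem__(self, name):
            return self.storage[self.map.indexes[name]]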

This saves memory, as you know.  But I don't see how it can provide
much of a speed-up in a pure interpreter (as opposed to in generated
machine code with inline caches).  It seems to me that whatever you
do, you need to perform a lookup *somewhere* in order to implement the
"getattr" operation.  The lookup can be done directly in the object's
dictionary, or it can be done in the shadow "map" object, but the
point is that in both cases it needs to be done at runtime for every
attribute access.
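
In terms of the sketch above (these helpers are made up, and assume
objects that carry a "map" and a flat "storage" like MapDict does),
the two lookup paths look like this -- either way a dictionary lookup
happens on every access:

    # Plain dictionary: one lookup in the object's own dict.
    def getattr_plain(obj, name):
        return obj.__dict__[name]

    # With maps: the lookup moves into the shared map, but it is
    # still a dictionary lookup done at runtime; only the value
    # read afterwards is a cheap list indexing.
    def getattr_with_map(obj, name):
        index = obj.map.indexes[name]   # lookup in the shared map
        return obj.storage[index]       # flat read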

There are probably more advanced ways to cache recent lookups, which
we have not tried so far, like attaching to each GETATTR opcode in the
Python bytecode a tiny structure that caches the result of the most
recent lookup performed at that source code position.  I don't know
how much of a speed-up this would give; in fact it is rather unclear
to me why people think that dictionary lookups are seriously bad.  Of
course they are more expensive than just reading an item out of a
list, but I don't think it's worse than roughly 3 times slower; this
number looks big in the context of machine code, but insignificant in
the context of a complicated bytecode interpreter.  I'm ready to be
proven wrong, of course.
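
For concreteness, such a per-opcode cache could look roughly like
this (again a sketch with made-up names, building on the MapDict
above; maps make this kind of cache cheap to check, because a map
identity comparison replaces the key lookup):

    class LookupCache(object):
        # One of these would hang off each GETATTR opcode.
        def __init__(self):
            self.cached_map = None
            self.cached_index = -1

    def getattr_cached(obj, name, cache):
        # Fast path: the object has the same map as the last one
        # seen at this bytecode position, so the index is known.
        if obj.map is cache.cached_map:
            return obj.storage[cache.cached_index]
        # Slow path: do the real lookup and remember the result.
        index = obj.map.indexes[name]
        cache.cached_map = obj.map
        cache.cached_index = index
        return obj.storage[index]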

However, this is only half of the story.  Another very important
point is that the mid-term goal for PyPy is to have a JIT compiler
generator again (we had one in the 1.0 release, but it was far from
providing consistent speed-ups, which is why we are re-experimenting
at the moment).  In the context of our JIT, both the inline cache
technique you describe and the polymorphic inline caches of Java VMs
are special cases of a generic construct called "promotion" in [1] or
"unlifting" in Psyco [2].  Using this construct, the JIT compiler
generator takes as input our "dictionary-with-maps"-enabled
interpreter and produces as output a JIT compiler that itself emits
inline caches similar to Google V8's.  Another way to put it is that
in PyPy the real purpose of our "maps", besides saving memory, is to
be JIT-generator-friendly, enabling good speed-ups in the machine code
generated by the JIT (and not in the pure interpreter).  Thus, what we
implemented in the interpreter is already enough; the main missing
piece is a (completely generic) JIT generator producing a consistent
JIT.  Note that this is not just an abstract wish: such inline caches
are what Psyco already emits, and the automatically generated JIT of
PyPy 1.0 definitely emitted them too.
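
To give a rough idea of what promotion buys in the generated code,
here is a hand-written illustration (not what the JIT generator
literally emits): the map observed at runtime is promoted to a
compile-time constant, so the dictionary lookup happens once, at
compile time, and the emitted code reduces to an identity guard plus
a fixed list read:

    def specialize_getattr(seen_map, name):
        # "Compile time": the JIT has observed one concrete map, so
        # the dictionary lookup is done once, here.
        index = seen_map.indexes[name]

        def fast_getattr(obj):
            # The emitted code: an identity check on the map (the
            # inline cache guard) followed by a constant-index read.
            if obj.map is seen_map:
                return obj.storage[index]
            # Guard failure: fall back to the generic lookup (a
            # real JIT would recompile or chain another cache entry).
            return obj.storage[obj.map.indexes[name]]
        return fast_getattr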

To repeat a point that we have tried to make many times but
apparently keep failing to get across: in the context of the PyPy JIT
there is no point in thinking deeply about the various semantics of
Python specifically, like reflection, operator overloading, the
handlers for sends to unsupported methods (that's __getattr__ in
Python), and so on.  These semantics are already encoded in the
interpreter, where the JIT compiler generator can see them and use
them to produce a JIT that emits reasonably efficient machine code
for a given user program.  The relevant question in this context is
not how to encode some complicated Python semantics in the machine
code; it is to identify the cases where the generated JIT emits
definitely inefficient code, and to figure out how to tweak the
interpreter to be more JIT-friendly (e.g. by adding maps to the
interpreter).

In an attempt to make it clearer that thinking about advanced Python
semantics is not useful in this context: the "Python" in the previous
paragraph can be replaced by any language, preferably a dynamic one,
and we still get a JIT for it (e.g. we did it for Prolog, and we have
partial or in-progress interpreters for JavaScript, Smalltalk and
other languages).

Does the above make any sense?  There are three relatively distinct
areas to work on -- tweaking the interpreter to maximize its speed,
tweaking it to maximize its JIT-friendliness (better done after the
JIT generator is ready), or helping develop the JIT generator itself;
I hope the above helped clarify this aspect of PyPy a bit.  In any
case, we would definitely be interested in working with you in these
areas, or in the other areas that you mentioned in your e-mail
(GC...).  Feel free to discuss matters further, here or on irc (#pypy
on irc.freenode.net).


    [1] http://codespeak.net/pypy/dist/pypy/doc/jit/
    [2] http://psyco.sourceforge.net/doc.html [PEPM'04 paper]


A bientot,

Armin.