> ARM is specifically claiming that these instructions can be used to
> accelerate Python interpretation.
>
> > Wow, really? One of the links below mention that?

I'm skeptical, though, that you can really produce speedups for CPython;
ISTM that they added Python only as a front-end language for Parrot, and
added Parrot only because it looks similar to JVM and .NET (i.e. without
actually testing that you can gain performance).

From reading the paper, ISTM that you *can* expect speedups for your
JIT-generated code. In ThumbEE, you have the following additional
features:

- fast null pointer checks: any register-indirect addressing in ThumbEE
  mode checks whether the base register is NULL; if it is, a callback is
  invoked (which could then throw NullPointerException). This is
  irrelevant in Python, because we don't use NULL as the value for
  "no object".
- fast array bounds check: there is an instruction that checks whether
  0 <= Rm <= Rn, and invokes a callback if it's not; this would then
  throw ArrayIndexOutOfBoundsException. This instruction would be
  emitted by the JIT just before any array access. In Python, you cannot
  easily JIT array access into a direct machine instruction (as you need
  to go through tp_as_sequence->sq_item); the cost of the array bounds
  check would likely be lost in the noise.
- fast switch instruction: there is an efficient way to switch on 256
  different byte code operations, with an optional immediate parameter.
  It will call/jump to one of 256 byte code handlers. This allows for a
  straightforward JIT compiler which essentially compiles all byte codes
  into such switch instructions. That would work for Python as well, but
  would require that ceval be rewritten entirely.
- fast locals: efficient access to a local-variables array, for JIT
  generation of ldloc.i4 (in .NET; not sure what the Java byte code for
  local variables is). Would work as well for Python, assuming there is
  a JIT compiler in the first place. R9 holds the fastlocals pointer
  (which is a good use of the register, since you cannot access it in
  Thumb mode anyway).
- fast instance variables: likewise, with R10 holding the this pointer.
  Not applicable to Python, since there is no byte code for instance
  variable access.
- efficient array indexing: they give shift-and-index back to Thumb
  mode, for a shift by 2, allowing arrays with 4-byte elements to be
  indexed in a single instruction (rather than requiring a separate
  multiply-by-four). Again useful for JIT of array access instructions,
  not applicable to Python, although it would be nice if the C compiler
  knew how to emit that.

Regards,
Martin
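
To illustrate the "fast switch" item above: a minimal sketch, in Python,
of table-driven dispatch over 256 byte code handlers, which is roughly
the structure the switch instruction is described as accelerating. The
opcode numbers, the handler names, and the (opcode, immediate) encoding
are all made up for illustration; this is neither how ceval is organized
nor a model of actual ThumbEE semantics.

```python
# Hypothetical opcodes; NOT CPython's real byte code numbering.
PUSH, ADD, MUL, HALT = 0, 1, 2, 3


def op_push(stack, arg):
    # Push the immediate parameter onto the evaluation stack.
    stack.append(arg)


def op_add(stack, arg):
    right = stack.pop()
    left = stack.pop()
    stack.append(left + right)


def op_mul(stack, arg):
    right = stack.pop()
    left = stack.pop()
    stack.append(left * right)


def op_invalid(stack, arg):
    raise ValueError("unknown opcode")


# One handler slot per possible opcode byte (256 in total); unused
# slots fall through to op_invalid.
HANDLERS = [op_invalid] * 256
HANDLERS[PUSH] = op_push
HANDLERS[ADD] = op_add
HANDLERS[MUL] = op_mul


def run(code):
    """Interpret a bytes object made of (opcode, immediate) pairs."""
    stack = []
    pc = 0
    while pc < len(code):
        opcode, arg = code[pc], code[pc + 1]
        pc += 2
        if opcode == HALT:
            break
        # The dispatch step: index the handler table by the opcode byte.
        HANDLERS[opcode](stack, arg)
    return stack


# Computes (2 + 3) * 4 on the stack.
program = bytes([PUSH, 2, PUSH, 3, ADD, 0, PUSH, 4, MUL, 0, HALT, 0])
result = run(program)
```

A JIT along the lines described would presumably emit one switch
instruction per byte code instead of looping over a table in software;
the sketch only shows the shape of the handler table and the dispatch
step.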