On 13/04/17 18:50, MRAB wrote:
> On 2017-04-13 09:08, Steven D'Aprano wrote:
>> On Wed, 12 Apr 2017 16:30:38 -0700, bart4858 wrote:
>>> Is it possible to skip the STORE_NAME op-code?
>>
>> If you knew *for sure* that the target (x) was a mutable object which
>> implemented += using an in-place mutation, then you could, but the
>> only built-in where that applies is list, so even if you could
>> guarantee x was a list, it hardly seems worth the bother.
>
> If the reference to be stored by STORE_NAME is the same as the
> reference returned by LOAD_NAME, then STORE_NAME could be omitted.
>
> That would just mean remembering that address.
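
For reference, this is the plain-name sequence being discussed. The output below is from a 3.5-era interpreter (the same bytecode layout as the trace further down); exact offsets and constant indices may differ on other versions:

>>> import dis
>>> dis.dis(compile("x += 1", "", "single"))
  1           0 LOAD_NAME                0 (x)
              3 LOAD_CONST               0 (1)
              6 INPLACE_ADD
              7 STORE_NAME               0 (x)
             10 LOAD_CONST               1 (None)
             13 RETURN_VALUE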

When considering special-casing this opcode sequence, remember that in-place operations can be performed on anonymous objects (i.e., those referenced by a collection and not bound directly to a namespace):

>>> import dis
>>> dis.dis(compile("x = [0, 1, 2]; x[1] += 1;", "", "single"))
  1           0 LOAD_CONST               0 (0)
              3 LOAD_CONST               1 (1)
              6 LOAD_CONST               2 (2)
              9 BUILD_LIST               3
             12 STORE_NAME               0 (x)
             15 LOAD_NAME                0 (x)
             18 LOAD_CONST               1 (1)
             21 DUP_TOP_TWO
             22 BINARY_SUBSCR
             23 LOAD_CONST               1 (1)
             26 INPLACE_ADD
             27 ROT_THREE
             28 STORE_SUBSCR
             29 LOAD_CONST               3 (None)
             32 RETURN_VALUE

So in this case, the STORE_SUBSCR does the re-binding, but it is separated from the INPLACE_ADD by another opcode (the ROT_THREE).

I'm not saying it's impossible to fold the re-binding into a (set of) special new opcode(s), but I am saying it's more complex than it first appears.
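
To make Steven's point above concrete: even in the plain LOAD_NAME / STORE_NAME case the re-binding can't simply be dropped unless you know the target's type, because += only happens to mutate in place for types like list - anything without __iadd__ produces a new object that the store has to bind. A quick check at the REPL:

>>> x = [1, 2]
>>> saved = x          # keep a second reference to the original list
>>> x += [3]           # list.__iadd__ mutates in place and returns self
>>> x is saved
True
>>> t = (1, 2)
>>> saved = t
>>> t += (3,)          # tuple has no __iadd__, so a new object is built
>>> t is saved
False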



FWIW, I spent some time about a year ago looking at things like this: small improvements to the peephole optimizer that allowed certain very common sequences to be folded into a (new) opcode, which in turn allowed other optimizations to avoid branching. The changes worked, but didn't actually improve performance significantly in my tests (which is why I ended up not bothering to propose anything).
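
As a very rough illustration of the kind of pattern such a folding pass has to recognise, here is a sketch using only the public dis module (this is not the actual patch, and the INPLACE_* test assumes a pre-3.11 CPython, where those opcodes still exist):

import dis

def count_foldable_pairs(code):
    """Count INPLACE_* opcodes immediately followed by STORE_NAME."""
    ins = list(dis.get_instructions(code))
    return sum(1 for a, b in zip(ins, ins[1:])
               if a.opname.startswith("INPLACE_")
               and b.opname == "STORE_NAME")

# 'x += 1' contains an adjacent INPLACE_ADD/STORE_NAME pair, but the
# subscript case above does not - ROT_THREE/STORE_SUBSCR follow instead.
print(count_foldable_pairs(compile("x += 1", "", "single")))      # -> 1
print(count_foldable_pairs(compile("x[1] += 1", "", "single")))   # -> 0

A real folding pass would then also have to rewrite the bytecode itself (offsets, jump targets, stack effects), which is part of what makes it more complex than it first appears.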

I remember back in the day (circa 1.5.2?) that trips-around-the-interpreter-loop were significant and avoiding them could give wins. However, in the current CPython interpreter, the improvements over the original huge switch() used to dispatch the bytecodes to the correct handler (such as computed-goto dispatch) appear to have made this type of optimization less effective. That was my conclusion at the time, anyway - I only had about a week to experiment with it.

E.
--
https://mail.python.org/mailman/listinfo/python-list
