On 13/04/17 18:50, MRAB wrote:
> On 2017-04-13 09:08, Steven D'Aprano wrote:
>> On Wed, 12 Apr 2017 16:30:38 -0700, bart4858 wrote:
>>> Is it possible to skip the STORE_NAME op-code?
>>
>> If you knew *for sure* that the target (x) was a mutable object which
>> implemented += using an in-place mutation, then you could, but the
>> only built-in where that applies is list, so even if you could
>> guarantee x was a list, it hardly seems worth the bother.
>
> If the reference to be stored by STORE_NAME is the same as the
> reference returned by LOAD_NAME, then STORE_NAME could be omitted.
>
> That would just mean remembering that address.
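
For reference, this is the plain-name sequence being discussed. The output below is from a 3.5-era interpreter (the same bytecode layout as the trace further down); exact offsets and constant indices may differ on other versions:

>>> import dis
>>> dis.dis(compile("x += 1", "", "single"))
  1           0 LOAD_NAME                0 (x)
              3 LOAD_CONST               0 (1)
              6 INPLACE_ADD
              7 STORE_NAME               0 (x)
             10 LOAD_CONST               1 (None)
             13 RETURN_VALUE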

When considering special-casing this opcode sequence, remember that in-place operations can be performed on anonymous objects (i.e., those referenced by a collection and not bound directly to a namespace):

>>> import dis
>>> dis.dis(compile("x = [0, 1, 2]; x[1] += 1;", "", "single"))
  1           0 LOAD_CONST               0 (0)
              3 LOAD_CONST               1 (1)
              6 LOAD_CONST               2 (2)
              9 BUILD_LIST               3
             12 STORE_NAME               0 (x)
             15 LOAD_NAME                0 (x)
             18 LOAD_CONST               1 (1)
             21 DUP_TOP_TWO
             22 BINARY_SUBSCR
             23 LOAD_CONST               1 (1)
             26 INPLACE_ADD
             27 ROT_THREE
             28 STORE_SUBSCR
             29 LOAD_CONST               3 (None)
             32 RETURN_VALUE

So in this case, the STORE_SUBSCR does the re-binding, but it is separated from the INPLACE_ADD by another opcode (the ROT_THREE).

I'm not saying it's impossible to fold the re-binding into a (set of) special new opcode(s), but I am saying it's more complex than it first appears.
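
To make Steven's point above concrete: even in the plain LOAD_NAME / STORE_NAME case the re-binding can't simply be dropped unless you know the target's type, because += only happens to mutate in place for types like list - anything without __iadd__ produces a new object that the store has to bind. A quick check at the REPL:

>>> x = [1, 2]
>>> saved = x          # keep a second reference to the original list
>>> x += [3]           # list.__iadd__ mutates in place and returns self
>>> x is saved
True
>>> t = (1, 2)
>>> saved = t
>>> t += (3,)          # tuple has no __iadd__, so a new object is built
>>> t is saved
False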



FWIW, I spent some time about a year ago looking at things like this: small improvements to the peephole optimizer that allowed certain very common sequences to be folded into a (new) opcode, which in turn allowed other optimizations to avoid branching. The changes worked, but didn't actually improve performance significantly in my tests (which is why I ended up not bothering to propose anything).
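
As a very rough illustration of the kind of pattern such a folding pass has to recognise, here is a sketch using only the public dis module (this is not the actual patch, and the INPLACE_* test assumes a pre-3.11 CPython, where those opcodes still exist):

import dis

def count_foldable_pairs(code):
    """Count INPLACE_* opcodes immediately followed by STORE_NAME."""
    ins = list(dis.get_instructions(code))
    return sum(1 for a, b in zip(ins, ins[1:])
               if a.opname.startswith("INPLACE_")
               and b.opname == "STORE_NAME")

# 'x += 1' contains an adjacent INPLACE_ADD/STORE_NAME pair, but the
# subscript case above does not - ROT_THREE/STORE_SUBSCR follow instead.
print(count_foldable_pairs(compile("x += 1", "", "single")))      # -> 1
print(count_foldable_pairs(compile("x[1] += 1", "", "single")))   # -> 0

A real folding pass would then also have to rewrite the bytecode itself (offsets, jump targets, stack effects), which is part of what makes it more complex than it first appears.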

I remember back in the day (circa 1.5.2?) that trips-around-the-interpreter-loop were significant and avoiding them could give wins. However, in the current CPython interpreter, the improvements over the original huge switch() used to dispatch the bytecodes to the correct handler (such as computed-goto dispatch) appear to have made this type of optimization less effective. That was my conclusion at the time, anyway - I only had about a week to experiment with it.

E.
--
https://mail.python.org/mailman/listinfo/python-list
