New submission from zcpara <zhangchaospec...@gmail.com>:

It is a common scenario to call functions in loop, such as
def foo():
  for i in range(10):
    len("abc")
Currently, we generate bytecode like this;
2           0 SETUP_LOOP              24 (to 26)
              2 LOAD_GLOBAL              0 (range)
              4 LOAD_CONST               1 (10)
              6 CALL_FUNCTION            1
              8 GET_ITER
        >>   10 FOR_ITER                12 (to 24)
             12 STORE_FAST               0 (i)

  3          14 LOAD_GLOBAL              1 (len)
             16 LOAD_CONST               2 ('abc')
             18 CALL_FUNCTION            1
             20 POP_TOP
             22 JUMP_ABSOLUTE           10
        >>   24 POP_BLOCK
        >>   26 LOAD_CONST               0 (None)
             28 RETURN_VALUE
If we can make sure len is a loop invariant, we can generate code as follows. 
Only one LOAD_GLOBAL len is executed before the loop. LOAD_GLOBAL len is 
replaced with LOAD_FAST __local__len in the loop.
2           0 LOAD_GLOBAL              0 (range)
              2 LOAD_CONST               1 (10)
              4 CALL_FUNCTION            1
              6 GET_ITER
              8 LOAD_GLOBAL              1 (len)
             10 STORE_FAST               1 (__local__len)
        >>   12 FOR_ITER                 6 (to 26)
             14 STORE_FAST               0 (i)

  3          16 LOAD_FAST                1 (__local__len)
             18 LOAD_CONST               2 ('abc')
             20 CALL_FUNCTION            1
             22 POP_TOP
             24 JUMP_ABSOLUTE            6 (to 12)

  2     >>   26 LOAD_CONST               0 (None)
             28 RETURN_VALU

I prototyped this optimization and gain 4.6% performance improvement on 
regex_v8 of PyPerformance. Currently, only the global invariants are moved out 
of the loop. There are some other loop invariants can be moved. Other 
optimizations such as function inline can provide more opportunities for LICM.
I meet a problem with the following case;
1 def no_pop_blocks():
2    y = 1
3    while not y:
4        bla
5    x = 1

bla is not defined anywhere. With LICM, LOAD_GLOBAL bla will be move out of the 
loop. Then it executes with an error: bla is not defined.

----------
components: Interpreter Core
messages: 396496
nosy: zhangchaospecial
priority: normal
severity: normal
status: open
title: loop invariant code motion
type: performance
versions: Python 3.11

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue44505>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to