[issue10399] AST Optimization: inlining of function calls

Dave Malcolm Fri, 12 Nov 2010 12:17:03 -0800

New submission from Dave Malcolm <[email protected]>:

In msg#120541 of issue#1346238 Raymond suggested to "aim high", so here goes...


I'm opening this as a separate bug as it's a very different approach to the 
patches in that bug; adding those on the nosy list for that bug.  Sorry in 
advance about the epic length of this comment.

I've been experimenting with AST optimizations, and have a (very) primitive 
implementation of function-call inlining.  As is, it's deeply flawed, but in a 
spirit of "Release Early, Release Often" I though I'd post what I have so far, 
and try to enumerate the rest of the work that would need doing to get this 
into, say Python 3.3

The attached patch adds an AST optimization pass to Python/compiler.c.  It does 
this by adding an __optimizer__.py module: the compiler converts the AST to a 
Python representation using ast2mod, and calls out to 
__optimizer__.optimize_ast().  This can then (potentially) apply a series of 
manipulations to the tree.  The result is then converted back from python to 
ast objects, and compilation proceeds as normal on the modified AST tree.

I initially was experimenting with a .c implementation, but moving to .py makes 
things _much_ easier to develop and debug.  In particular, I'm using graphviz's 
"dot" tool to generate before/after visualizations of the AST.

The optimizer doesn't try to optimize itself (or anything that it imports), to 
avoid infinite recursion when we have a cold .pyo cache.

Currently I'm doing the AST optimization before symbol table generation.  This 
means that the inlining is deeply flawed, as it has no knowledge of the scope 
of names.  A robust implementation would compare the scopes of the callsite and 
that of the function body, and remap locals accordingly.  The current 
implementation renames all name references in the function body, which is 
clearly wrong for e.g. references to globals.  See below for notes on that.

Here's my test code::
    def function_to_be_inlined(x, y, z):
        return (2 * x * y) + z
    print(function_to_be_inlined(3, 4, 5))

Here's what it compiles to after going through the inliner (clearly, line 
numbering needs some work).  Note the removal of the CALL_FUNCTION of our 
target call site; the remaining CALL_FUNCTION is to "print":
  2           0 LOAD_CONST               0 (<code object function_to_be_inlined 
at 0x1e9b998, file "<dis>", line 2>) 
              3 MAKE_FUNCTION            0 
              6 STORE_NAME               0 (function_to_be_inlined) 

  4           9 LOAD_CONST               1 (3) 
             12 STORE_NAME               1 (__inline1f22840__x) 
             15 LOAD_CONST               2 (4) 
             18 STORE_NAME               2 (__inline1f22840__y) 
             21 LOAD_CONST               3 (5) 
             24 STORE_NAME               3 (__inline1f22840__z) 

259          27 LOAD_CONST               4 (2) 
             30 LOAD_NAME                1 (__inline1f22840__x) 
             33 BINARY_MULTIPLY      
             34 LOAD_NAME                2 (__inline1f22840__y) 
             37 BINARY_MULTIPLY      
             38 LOAD_NAME                3 (__inline1f22840__z) 
             41 BINARY_ADD           
             42 STORE_NAME               4 (__inline1f22840____returnval__) 

260          45 LOAD_NAME                5 (print) 
             48 LOAD_NAME                4 (__inline1f22840____returnval__) 
             51 CALL_FUNCTION            1 
             54 POP_TOP              
             55 LOAD_CONST               5 (None) 
             58 RETURN_VALUE         

The idea is that a further optimization pass would go through and ellide the 
unnecessary store-to-local/load-from-local instructions, followed by const 
folding, getting this down to:
   LOAD_CONST (<code object>)
   MAKE_FUNCTION
   STORE_NAME (function_to_be_inlined)
; inlined callsite:
   LOAD_NAME  (print)
   LOAD_CONST (29)
   CALL_FUNCTION 1


The biggest issue here is dealing with runtime differences between which 
function was inlined, and which function is being called, either via 
monkeypatching, or in method calls - we can inline intra-method calls within 
one class, but we have to cope with overriding of those methods in subclasses.

Thinking aloud of a way of solving this (this isn't implemented yet):
   add to AST:
     Specialize(expr name,
                stmt* specialized,
                stmt* generalized)

where you have some interpretation of name (e.g. "self.do_something"), and 
carry two different implementations.

  so that e.g.
  class Something:
     def foo(self, x, y, z):
           ... # some code
           self.bar(x, y, z)
           ... # more code
  the:
     Call(Attribute(Name(id='self'), id='bar'), args=[etc])
  becomes
     Specialize(name=Attribute(Name(id='self'), id='bar'),
                specialized=[inlined implementation of Something.bar for 
"self"],
                generalized=[Call(Attribute(Name(id='self'), id='bar'), 
args=[etc])])

Similarly, would need a new bytecode, say "JUMP_IF_SPECIALIZABLE"

      LOAD_NAME (self)
      GET_ATTR  ('bar')
  ; method self.bar is now on top of stack
      LOAD_CONST (<code object for the regular implementation of "bar">)
      JUMP_IF_SPECIALIZABLE -> label_inline_body
  ; Start of non-inlined call
      eval and push args
      CALL_FUNCTION
      JUMP_ABSOLUTE -> label_after
  ; ...end of non-inlined call; return value is top-of-stack

label_inline_body:
      eval args (potentially optimizing with knowledge of what's to come)
      inlined call
      push "return value"
  ; ...end of inlined call; "return value" is top-of-stack

label_after:     

   ; etc


JUMP_IF_SPECIALIZABLE
Inputs:
TOS    : A: code object we inlined for (via a LOAD_CONST, injected by the 
inliner)
TOS -1 : B: function object via runtime lookup (LOAD_GLOBAL, or e.g. LOAD_LOCAL 
"self"; GET_ATTR "do_something")

Action:
  if B's __code__ is A, we can specialize:
     PC += oparg
     POP
  else:
     PC += 1
     POP
     POP

I'm not sure if this covers all cases (e.g. specializing a CFunction such as 
the builtins), or if this a sane approach that actually works; need to have a 
standard interpretation of what a name lookup means, and to inject LOAD_CONST 
of that value into the bytecode, for use by the JUMP_IF_SPECIALIZABLE operation.

As I understand it, functions are "name-resolved" before the arguments are 
evaluated, so if argument evaluation leads to the name being bound to a 
different function (as a side effect), the status quo in CPython currently is 
that the old function is used for that call; JUMP_IF_SPECIALIZABLE would 
preserve that behavior.

It could also slightly slow down overridden method calls, as it would add a 
test for "am I overridden" that would generally take the "overridden" branch; 
perhaps we could detect classes with subclasses in the same AST and suppress 
inlining for this case.  Similarly, it could spot the
    def override_me(self):
         raise NotImplementedError
pattern and not try to inline.


Other issues:
  - Somehow need to tie together symbol tables with ast2mod.  One approach: do 
the symtable generation first, then adjust ast2mod so that it adds 
PySymtableEntry references to the generated python tree as appopriate.  This 
would expose the name scope structure.  But after the tree changes, do we need 
to regenerate the symbol table?  Or we could annotate the python ast.Name 
objects with scope information, and the ast optimizer would be responsible for 
keeping this correct.  The compiler could then rerun the symtable analysis on 
the returned AST, or could trust the optimizer's info.

  - What to do with the "locals()" builtin?  Inlined functions could save a 
copy of locals with duplicate names, keeping their existing names for the 
locals in the inlined copy.  This would lead to "locals()" having additional 
items when called from an inlined function.

  - Many optimizations would have to assume that locals don't change value.  
Can we assume that in an optimized build that it's acceptable that code that 
use the inspect module and manipulates the f_locals of a frame may have 
surprising behavior in the face of optimization?  (e.g. I'd like to be able to 
optimize away locals if we can prove that absence of side-effects).

  - PyAST_obj2mod enforces checking for a particular top-level element.  To 
cope with this, I needed to pass this around in a few places, so I introduced 
"enum PyCompilationMode" to avoid having the magic 0, 1, or 2 to designate the 
grammar variant.  This better distinguishes these values from other places 
which use the ID of the start token: semantically equivalent, but with 
different constants.  I haven't yet fixed this up in Modules/parsermodule.c

  - Plenty of FIXMEs in the code, many of which are in themselves major issues
 
  - I'm not yet checking the optimization flag; that should probably be 
required to be on for this to be called.

  - Is "__optimizer__" a sane name?  perhaps just "optimizer"

  - Too unstable to benchmark as yet.

  - There are probably other issues I haven't thought of yet.

Notes:
  - the patch also contains a fix for issue 10391 (error handling is important 
when manipulating the AST)
  - potentially could add an optimize_cfg() hook, and expose the CFG struct 
basicblock and struct instr as Python objects in a similar way, as a later 
optimization pass
  
Thanks to Red Hat for giving me time to experiment on this

----------
components: Interpreter Core
files: py3k-ast-optimization-inlining-2010-11-12-001.patch
keywords: patch
messages: 121068
nosy: alex, belopolsky, benjamin.peterson, dmalcolm, jhylton, nnorwitz, 
rhettinger, sdahlbac, thomas.lee, titanstar
priority: normal
severity: normal
status: open
title: AST Optimization: inlining of function calls
type: performance
versions: Python 3.3
Added file: 
http://bugs.python.org/file19583/py3k-ast-optimization-inlining-2010-11-12-001.patch

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue10399>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue10399] AST Optimization: inlining of function calls

Reply via email to