New submission from Batuhan Taskaya <isidenti...@gmail.com>:

It is a common scenario to make calls with only constant arguments (e.g. to 
datetime.datetime/os.path.join/re.match.group/nox.session.run etc.), and the 
bytecode that we currently generate looks like this:
f(1,2,3,4,5,6)
  1           0 LOAD_NAME                0 (f)
              2 LOAD_CONST               0 (1)
              4 LOAD_CONST               1 (2)
              6 LOAD_CONST               2 (3)
              8 LOAD_CONST               3 (4)
             10 LOAD_CONST               4 (5)
             12 LOAD_CONST               5 (6)
             14 CALL_FUNCTION            6
             16 POP_TOP
             18 LOAD_CONST               6 (None)
             20 RETURN_VALUE
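
A listing like the one above can be reproduced with the standard dis module. 
This is only a minimal sketch: the helper name opnames is made up for 
illustration, and the exact opcode names and offsets depend on the CPython 
version in use, so the output may not match the listing above.

```python
import dis


def opnames(source):
    """Return the opcode names CPython emits for a module-level snippet.

    Hypothetical helper for illustration; the names returned differ
    between CPython versions.
    """
    code = compile(source, "<demo>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]


if __name__ == "__main__":
    # Print the full disassembly of the call discussed in this issue.
    dis.dis(compile("f(1,2,3,4,5,6)", "<demo>", "exec"))
```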

But if we are sure that all arguments to a function are positional and constant 
(it is also possible to support keyword arguments to some extent, but that 
needs more research and is out of scope for this particular optimization), then 
we could simply pack everything together and use CALL_FUNCTION_EX. We also need 
to set some limits on the number of arguments N, since too small a value might 
prevent constant caching and too large a value might create giant tuples in the 
code object; perhaps 75 > N > 4. The result would look like this:

  1           0 LOAD_NAME                0 (f)
              2 LOAD_CONST               0 ((1, 2, 3, 4, 5, 6))
              4 CALL_FUNCTION_EX         0
              6 POP_TOP
              8 LOAD_CONST               1 (None)
             10 RETURN_VALUE

The implementation is also very simple, and doesn't touch anything besides the 
AST optimizer itself. It is possible to do this in the compiler, but that might 
complicate the logic, so I'd say it is best to keep it as isolated as it can 
be.
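
To illustrate the idea at the Python level, here is a rough sketch using 
ast.NodeTransformer. It is not the proposed patch (which would live in 
CPython's own AST optimizer); the names PackConstantArgs, pack_constant_calls, 
MIN_ARGS and MAX_ARGS are assumptions made for this example, and the bounds 
simply mirror the "75 > N > 4" guess above. It only recognises ast.Constant 
arguments, so tuple literals such as (1,2,3) are not handled here.

```python
import ast

# Hypothetical bounds, mirroring the "75 > N > 4" suggestion in this issue.
MIN_ARGS = 5
MAX_ARGS = 74


class PackConstantArgs(ast.NodeTransformer):
    """Rewrite f(c1, ..., cN) into f(*(c1, ..., cN)) for constant arguments."""

    def visit_Call(self, node):
        self.generic_visit(node)  # handle nested calls first
        args = node.args
        if (
            not node.keywords  # positional arguments only
            and MIN_ARGS <= len(args) <= MAX_ARGS
            and all(isinstance(arg, ast.Constant) for arg in args)
        ):
            packed = ast.Tuple(elts=args, ctx=ast.Load())
            node.args = [ast.Starred(value=packed, ctx=ast.Load())]
        return node


def pack_constant_calls(source):
    """Parse source and return a module AST with qualifying calls packed."""
    tree = PackConstantArgs().visit(ast.parse(source))
    ast.fix_missing_locations(tree)  # new nodes need line/column info
    return tree
```

Compiling the returned tree with compile(tree, "<demo>", "exec") should behave 
the same as writing f(*(1, 2, 3, 4, 5, 6)) by hand; calls with too few 
arguments, keywords, or non-constant arguments are left untouched.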

(debug builds)

-s 'foo = lambda *args: None' 'foo("yyyyy", 123, 123321321312, (1,2,3), "yyyyy", 1.0, (1,2,3), "yyyyy", "yyyyy", (1,2,3), 5, 6, 7)'
Mean +- std dev: [master_artificial] 251 ns +- 2 ns -> [optimized_artificial] 185 ns +- 1 ns: 1.36x faster

-s 'from datetime import datetime' 'datetime(1997, 7, 27, 12, 10, 0, 0)'
Mean +- std dev: [master_datetime] 461 ns +- 1 ns -> [optimized_datetime] 386 ns +- 2 ns: 1.19x faster

An alternative to this optimization is doing something similar in the CFG 
optimizer: folding all contiguous LOAD_CONSTs (within some sort of limit, of 
course) into a single tuple load and then adding an UNPACK_SEQUENCE (which 
would replicate the effect). This is a weaker form, and I was only able to 
observe speedups of 1.13x / 1.03x respectively on the benchmarks above. The 
good thing about that optimization was that, first, it was able to work with 
mixed parameters (so if you have some other types of expressions besides 
constants, but all constants follow each other, then it was able to optimize 
that case as well), and second, it applied not only to calls but to all 
compiler cases where LOAD_CONST blocks were generated.
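
As a way to see where such a pass would find candidates, the sketch below 
scans a code object for runs of back-to-back LOAD_CONST instructions. It only 
locates the runs and does not rewrite any bytecode; const_runs and min_len are 
names invented for this example, and since opcode names vary between CPython 
versions, some constants may be loaded by other instructions and not be 
counted.

```python
import dis


def const_runs(code, min_len=2):
    """Collect runs of back-to-back LOAD_CONST instructions in a code object.

    Hypothetical helper: returns a list of runs, each run being the list of
    constant values loaded by consecutive LOAD_CONST instructions.
    """
    runs, current = [], []
    for ins in dis.get_instructions(code):
        if ins.opname == "LOAD_CONST":
            current.append(ins.argval)
        else:
            # Any other instruction ends the current run.
            if len(current) >= min_len:
                runs.append(current)
            current = []
    if len(current) >= min_len:
        runs.append(current)
    return runs
```

For a mixed call such as f("a", "b", "c", "d", "e", x), the five string 
constants form one run even though the last argument is not a constant, which 
is the case the CFG-level approach could cover and the AST-level one cannot.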

----------
assignee: BTaskaya
components: Interpreter Core
messages: 396437
nosy: BTaskaya, Mark.Shannon, pablogsal, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Packing constant call arguments
type: performance
versions: Python 3.11

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue44501>
_______________________________________