New submission from Batuhan Taskaya <[email protected]>:
It is a common scenario to make calls with only constant arguments (e.g. to
datetime.datetime/os.path.join/re.match.group/nox.session.run etc.), and the
bytecode that we currently generate looks like this:

    f(1,2,3,4,5,6)

      1           0 LOAD_NAME                0 (f)
                  2 LOAD_CONST               0 (1)
                  4 LOAD_CONST               1 (2)
                  6 LOAD_CONST               2 (3)
                  8 LOAD_CONST               3 (4)
                 10 LOAD_CONST               4 (5)
                 12 LOAD_CONST               5 (6)
                 14 CALL_FUNCTION            6
                 16 POP_TOP
                 18 LOAD_CONST               6 (None)
                 20 RETURN_VALUE

But if we are sure that all arguments to a function are positional (it is also
possible to support keyword arguments to some extent; that needs more research
and is out of scope for this particular optimization) and constant, then we
could simply pack everything together and use CALL_FUNCTION_EX. We would also
need to set some limits, since packing too few arguments might prevent constant
caching and packing too many might create giant tuples in the code object
(perhaps 75 > N > 4):

      1           0 LOAD_NAME                0 (f)
                  2 LOAD_CONST               0 ((1, 2, 3, 4, 5, 6))
                  4 CALL_FUNCTION_EX         0
                  6 POP_TOP
                  8 LOAD_CONST               1 (None)
                 10 RETURN_VALUE

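
For anyone who wants to look at the two shapes locally, here is a minimal
sketch using the standard dis module. It compares an ordinary call with a
hand-written starred call, which is only an approximation of what the proposed
optimization would emit. The helper name opnames is made up for this example,
and the exact opcode names and offsets differ between CPython versions, so the
output will not match the listings above byte for byte.

```python
import dis


def opnames(source):
    """Return the opcode names CPython emits for a module-level snippet.

    Hypothetical helper for illustration; not part of the proposed patch.
    """
    code = compile(source, "<example>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]


# Ordinary call: one load per constant argument.
plain = opnames("f(1, 2, 3, 4, 5, 6)")

# Hand-written starred call, standing in for the packed form.
packed = opnames("f(*(1, 2, 3, 4, 5, 6))")

print(plain)
print(packed)
```
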
The implementation is also very simple, and doesn't touch anything besides
the AST optimizer itself. It is possible to do this in the compiler, but that
might complicate the logic, so I'd say it is best to keep it as isolated as
possible.
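
To make the idea concrete, here is a rough sketch of the transformation written
against the Python-level ast module. This is not the actual patch (the real AST
optimizer lives in C); the class name PackConstantArgs, the helper pack_calls,
and the MIN_ARGS / MAX_ARGS bounds are all assumptions made for illustration,
with the bounds loosely mirroring the "75 > N > 4" suggestion. Since it runs
before any constant folding, a literal tuple argument such as (1,2,3) is an
ast.Tuple here and would block the rewrite in this sketch.

```python
import ast

# Hypothetical bounds, loosely mirroring "75 > N > 4" from the proposal.
MIN_ARGS = 5
MAX_ARGS = 74


class PackConstantArgs(ast.NodeTransformer):
    """Rewrite f(c1, ..., cN) into f(*(c1, ..., cN)) for constant arguments."""

    def visit_Call(self, node):
        # Handle nested calls first.
        self.generic_visit(node)
        # Keyword arguments are out of scope for this sketch.
        if node.keywords:
            return node
        if not (MIN_ARGS <= len(node.args) <= MAX_ARGS):
            return node
        # Every argument must be a plain constant (this also rejects *args).
        if not all(isinstance(arg, ast.Constant) for arg in node.args):
            return node
        packed = ast.Constant(value=tuple(arg.value for arg in node.args))
        node.args = [ast.Starred(value=packed, ctx=ast.Load())]
        return node


def pack_calls(source):
    """Parse source, apply the rewrite, and return the modified tree."""
    tree = PackConstantArgs().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return tree
```

The returned tree can be passed to compile(tree, "<example>", "exec") and run
as usual; calls with too few arguments, keyword arguments, or non-constant
arguments are left untouched.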

(debug builds)

-s 'foo = lambda *args: None' 'foo("yyyyy", 123, 123321321312, (1,2,3), "yyyyy", 1.0, (1,2,3), "yyyyy", "yyyyy", (1,2,3), 5, 6, 7)'
Mean +- std dev: [master_artificial] 251 ns +- 2 ns -> [optimized_artificial] 185 ns +- 1 ns: 1.36x faster

-s 'from datetime import datetime' 'datetime(1997, 7, 27, 12, 10, 0, 0)'
Mean +- std dev: [master_datetime] 461 ns +- 1 ns -> [optimized_datetime] 386 ns +- 2 ns: 1.19x faster

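
A rough way to get a feel for the difference on an unpatched interpreter is to
time an ordinary call against a hand-written starred call with the standard
timeit module. This is only a sketch and not the harness used for the numbers
above: the bench helper and its number / repeat defaults are assumptions, the
starred form merely stands in for what the optimizer would produce, and the
results depend heavily on the build and CPython version.

```python
import timeit


def foo(*args):
    return None


def bench(stmt, number=200_000, repeat=5):
    """Return the best observed time per call in seconds (hypothetical helper)."""
    timings = timeit.repeat(stmt, globals={"foo": foo},
                            number=number, repeat=repeat)
    return min(timings) / number


# Ordinary call with constant positional arguments.
plain = bench("foo(1, 2, 3, 4, 5, 6, 7)")

# Hand-written starred call, approximating the packed form.
packed = bench("foo(*(1, 2, 3, 4, 5, 6, 7))")

print(f"plain:  {plain * 1e9:.1f} ns per call")
print(f"packed: {packed * 1e9:.1f} ns per call")
```
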
Another potential approach is doing something similar in the CFG optimizer:
folding all contiguous LOAD_CONSTs (within some sort of limit, of course) into
a single tuple load and then adding an UNPACK_SEQUENCE (which would replicate
the effect). This is a weaker form, and I was only able to observe speedups of
1.13x / 1.03x respectively on the benchmarks. The good thing about that
optimization was that, first, it was able to work with mixed parameters (so if
you have some other types of expressions besides constants, but all the
constants follow each other, then it was able to optimize that case as well),
and second, it wasn't only for calls but rather for all cases where the
compiler generates blocks of LOAD_CONSTs.
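
The folding idea can be sketched on a toy instruction list. Everything here is
hypothetical: instructions are plain (opname, arg) pairs rather than real CFG
nodes, fold_const_runs and simulate are made-up helpers, MIN_RUN is an
arbitrary threshold, and a real pass would also have to respect basic-block
boundaries and jump targets. One detail worth noting is that UNPACK_SEQUENCE
leaves the first item of the sequence on top of the stack, so the folded tuple
has to be stored in reverse order to reproduce the original stack layout.

```python
MIN_RUN = 5  # hypothetical threshold for when folding is worthwhile


def fold_const_runs(instructions, min_run=MIN_RUN):
    """Fold runs of LOAD_CONST into one tuple load plus UNPACK_SEQUENCE."""
    out = []
    run = []

    def flush():
        if len(run) >= min_run:
            # Reverse the run: UNPACK_SEQUENCE leaves item 0 on top of the stack.
            out.append(("LOAD_CONST", tuple(reversed(run))))
            out.append(("UNPACK_SEQUENCE", len(run)))
        else:
            out.extend(("LOAD_CONST", value) for value in run)
        run.clear()

    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            run.append(arg)
        else:
            flush()
            out.append((opname, arg))
    flush()
    return out


def simulate(instructions):
    """Tiny stack machine used only to check that both forms agree."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            stack.append(arg)
        elif opname == "LOAD_NAME":
            stack.append(("name", arg))
        elif opname == "UNPACK_SEQUENCE":
            seq = stack.pop()
            assert len(seq) == arg
            stack.extend(reversed(seq))  # item 0 ends up on top
        else:
            raise ValueError(f"unsupported opcode in this sketch: {opname}")
    return stack


program = [("LOAD_NAME", "f")] + [("LOAD_CONST", i) for i in range(1, 7)]
folded = fold_const_runs(program)
print(folded)
```

Running simulate on both program and folded yields the same stack, which is the
property such a pass would need to preserve.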
----------
assignee: BTaskaya
components: Interpreter Core
messages: 396437
nosy: BTaskaya, Mark.Shannon, pablogsal, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Packing constant call arguments
type: performance
versions: Python 3.11
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue44501>
_______________________________________