I noticed something this morning: there's another way in which Inada
Naoki's benchmark here is--possibly?--unrealistic.
As mentioned, his benchmark generates a thousand functions, each of
which takes exactly three parameters, and each of those parameters
randomly chooses one of three annotations. In current trunk (not in my
branch, I'm behind), there's an optimization for stringized annotations
that compiles the annotations into a tuple; when you pull out
__annotations__ on the object at runtime, the tuple is converted into a
dict on demand.
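To illustrate the idea, here's a minimal sketch (not the actual CPython
implementation, and the helper names are mine), assuming the tuple
stores alternating key/value strings:

    # The compiler stores the annotations as a flat tuple of
    # alternating key/value strings; the dict is only built when
    # __annotations__ is first accessed.
    ann_tuple = ("a", "int", "b", "str", "return", "None")

    def ann_dict(tup):
        # Pair up the alternating keys and values.
        return {tup[i]: tup[i + 1] for i in range(0, len(tup), 2)}

    print(ann_dict(ann_tuple))
    # {'a': 'int', 'b': 'str', 'return': 'None'}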
This means that even though there are a thousand functions, between
them they only ever generate nine distinct annotation tuples.
And here's the thing: our lovely marshal module is smart enough to
notice that these tuples /are/ duplicates, and it'll throw away the
duplicates and replace them with references to the original.
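Here's a small demonstration of the effect. (Strictly speaking, I
believe marshal dedupes by object identity, and it's the compiler that
has already merged equal constants into a single object by the time
marshal sees them; the tuples below are built at runtime so the
compiler can't pre-merge them.)

    import marshal

    def make():
        # Built at runtime, so each call returns a distinct object.
        return tuple(["a", "int", "b", "str"])

    shared = make()
    with_sharing = (shared, shared)        # one tuple referenced twice
    without_sharing = (make(), make())     # two equal, distinct tuples

    # The shared form serializes smaller, because marshal writes the
    # second element as a back-reference to the first:
    print(len(marshal.dumps(with_sharing)))
    print(len(marshal.dumps(without_sharing)))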
Something analogous /could/ happen in the PEP 649 branch but currently
doesn't. When running Inada Naoki's benchmark, there are a total of nine
possible annotations code objects. Except, each function generated by
the benchmark has a unique name, and I incorporate that name into the
name given to the code object (f"{function_name}.__co_annotations__").
Since each function name is different, each code object name is
different, so each code object /hash/ is different, and since they
aren't /exact/ duplicates they are never consolidated.
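The rule is easy to demonstrate; here's a quick sketch using
code.replace() (available since Python 3.8):

    # Two otherwise-identical code objects that differ only in co_name
    # compare unequal, so they can't be folded into one.
    base = compile("x", "<demo>", "eval")

    a = base.replace(co_name="f.__co_annotations__")
    b = base.replace(co_name="g.__co_annotations__")
    print(a == b)    # False: the names differ

    c = base.replace(co_name="__co_annotations__")
    d = base.replace(co_name="__co_annotations__")
    print(c == d)    # True: identical in every field, so a dedup pass
                     # could replace one with a reference to the other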
Inada Naoki has suggested changing this, so that all the annotations
code objects have the same name ("__co_annotations__"). If we made that
change, I'm pretty sure the code size delta in this synthetic benchmark
would drop. I haven't done it because the current name of the code
object might be helpful in debugging, and I'm not convinced this would
have an effect in real-world code.
But... would it? Someone (again, I think it was Inada Naoki) suggested
that in real-world applications, there are often many, many functions in
a single module that have identical signatures. The annotation-tuples
optimization naturally takes advantage of that. PEP 649 doesn't.
Should it? Would this really be beneficial to real-world code bases?
Cheers,
//arry/
On 4/16/21 12:26 PM, Larry Hastings wrote:
Please don't confuse Inada Naoki's benchmark results with the effect
PEP 649 would have on a real-world codebase. His artificial benchmark
constructs a thousand empty functions that take three parameters with
randomly-chosen annotations--the results provide some insight but are
not directly applicable to reality.
PEP 649's effects on code size / memory / import time are contingent
on the number of annotations and the number of objects annotated, not
the overall code size of the module. Expressing those effects as a
percentage of overall module size, and suggesting that Python users
would see the same results with real-world code, is highly misleading.
I too would be interested to know what effect PEP 649 would have on a
real-world codebase currently using PEP 563, but AFAIK nobody has
reported such results.
//arry/
On 4/16/21 11:05 AM, Jukka Lehtosalo wrote:
On Fri, Apr 16, 2021 at 5:28 PM Łukasz Langa <luk...@langa.pl> wrote:
[snip] I say "compromise" because as Inada Naoki measured,
there's still a non-zero performance cost of PEP 649 versus PEP 563:
- code size: +63%
- memory: +62%
- import time: +60%
Will this hurt some current users of typing? Yes, I can name you
multiple past employers of mine where this will be the case. Is
it worth it for Pydantic? I tend to think that yes, it is, since
it is a significant community, and the operations on type
annotations it performs are in the sensible set for which
`typing.get_type_hints()` was proposed.
Just to give some more context: in my experience, both import time
and memory use tend to be real issues in large Python codebases (code
size less so), and I think that the relative efficiency of PEP 563 is
an important feature. If PEP 649 can't be made more efficient, this
could be a major regression for some users. Python server
applications need to run multiple processes because of the GIL, and
since code objects generally aren't shared between processes (GC and
reference counting make it tricky, I understand), code size
increases tend to be amplified on large servers. Even having a lot of
RAM doesn't necessarily help, since a lot of RAM typically implies
many CPU cores, and thus many processes are needed as well.
I can see how both PEP 563 and PEP 649 bring significant benefits,
but typically for different user populations. I wonder if there's a
way of combining the benefits of both approaches. I don't like the
idea of having toggles for different performance tradeoffs
indefinitely, but I can see how this might be a necessary compromise
if we don't want to make things worse for any user groups.
Jukka
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/QEZDCPR3CCDBPEEA36FEM6RA3I7IS2UR/
Code of Conduct: http://python.org/psf/codeofconduct/