Oops: where I said nine, I should have said twenty-seven--3 cubed.
Should have had my coffee /before/ posting. Carry on!
//arry/
On 4/19/21 10:51 AM, Larry Hastings wrote:
I noticed something this morning: there's another way in which Inada
Naoki's benchmark here is--possibly?--unrealistic.
As mentioned, his benchmark generates a thousand functions, each of
which takes exactly three parameters, and each of those parameters
randomly chooses one of three annotations. In current trunk (not in
my branch, I'm behind), there's an optimization for stringized
annotations: it compiles the annotations into a tuple, and then, when
you pull out __annotations__ on the object at runtime, it converts that
tuple into a dict on demand.
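A minimal pure-Python sketch of that lazy conversion, just to make the
idea concrete (the real optimization is done in C inside the function
object; the names here are mine, not CPython's):

    class StubFunction:
        def __init__(self, annotation_tuple):
            # The compiler stores a flat tuple of alternating keys and
            # values, e.g. ("x", "int", "return", "str").
            self._ann = annotation_tuple

        @property
        def __annotations__(self):
            if isinstance(self._ann, tuple):   # not yet converted
                it = iter(self._ann)
                self._ann = dict(zip(it, it))  # pair up keys and values
            return self._ann

    f = StubFunction(("x", "int", "return", "str"))
    print(f.__annotations__)  # {'x': 'int', 'return': 'str'}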
This means that even though there are a thousand functions, they only
ever generate one of nine possible annotation tuples. And here's the
thing: our lovely marshal module is smart enough to notice that these
tuples /are/ duplicates, and it'll throw away the duplicates and
replace them with references to the original.
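You can watch that sharing happen from Python. Here's a sketch (my
code, not the benchmark's) that compiles a module full of identical
signatures in PEP 563 mode and marshals it; on a CPython new enough to
have the tuple optimization, the marshalled size should grow much more
slowly than the function count, because the duplicate tuples are
written once and back-referenced thereafter:

    import __future__
    import marshal

    def module_size(n_funcs):
        # n_funcs functions, all sharing one annotated signature
        src = "\n".join(
            f"def f{i}(a: int, b: str, c: float) -> None: ..."
            for i in range(n_funcs)
        )
        code = compile(src, "<demo>", "exec",
                       flags=__future__.annotations.compiler_flag)
        return len(marshal.dumps(code))

    print(module_size(1), module_size(1000))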
Something analogous /could/ happen in the PEP 649 branch but currently
doesn't. When running Inada Naoki's benchmark, there are a total of
nine possible annotations code objects. Except, each function
generated by the benchmark has a unique name, and I incorporate that
name into the name given to the code object
(f"{function_name}.__co_annotations__"). Since each function name is
different, each code object name is different, so each code object
/hash/ is different, and since they aren't /exact/ duplicates they are
never consolidated.
Inada Naoki has suggested changing this, so that all the annotations
code objects have the same name ("__co_annotations__"). If we made
that change, I'm pretty sure the code size delta in this synthetic
benchmark would drop. I haven't done it because the current name of
the code object might be helpful in debugging, and I'm not convinced
this would have an effect in real-world code.
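The unequal-names problem is easy to demonstrate from Python, though;
here's a hedged illustration (not the PEP 649 implementation itself):

    # Two code objects that differ only in co_name are unequal, so no
    # constant merging or marshal sharing ever applies to them.
    c1 = compile("x", "<m>", "eval").replace(co_name="f.__co_annotations__")
    c2 = compile("x", "<m>", "eval").replace(co_name="g.__co_annotations__")
    print(c1 == c2)  # False: the name participates in equality and hashing

    # Renaming both to one generic name, as Inada Naoki suggested,
    # makes them equal again, so they could be consolidated.
    print(c1.replace(co_name="__co_annotations__") ==
          c2.replace(co_name="__co_annotations__"))  # True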
But... would it? Someone--and again I think it's Inada Naoki--suggested
that in real-world applications, there are often many, many
functions in a single module that have identical signatures. The
annotation-tuples optimization naturally takes advantage of that. PEP
649 doesn't. Should it? Would this really be beneficial to
real-world code bases?
Cheers,
//arry/
On 4/16/21 12:26 PM, Larry Hastings wrote:
Please don't confuse Inada Naoki's benchmark results with the effect
PEP 649 would have on a real-world codebase. His artificial benchmark
constructs a thousand empty functions that take three parameters with
randomly-chosen annotations--the results provide some insights but
are not directly applicable to reality.
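For concreteness, here's roughly the shape of the module the benchmark
generates, as I understand it (a reconstruction of mine, not Inada
Naoki's actual script):

    import random

    POOL = ["int", "str", "float"]  # each parameter draws one of these

    def make_benchmark_module(n=1000):
        lines = []
        for i in range(n):
            a, b, c = (random.choice(POOL) for _ in range(3))
            lines.append(f"def f{i}(x: {a}, y: {b}, z: {c}) -> None: ...")
        return "\n".join(lines)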
PEP 649's effects on code size / memory / import time are contingent
on the number of annotations and the number of objects annotated, not
the overall code size of the module. Expressing it that way, and
suggesting that Python users would see the same results with
real-world code, is highly misleading.
I too would be interested to know the effect PEP 649 would have on a
real-world codebase currently using PEP 563, but AFAIK nobody has
reported such results.
//arry/
On 4/16/21 11:05 AM, Jukka Lehtosalo wrote:
On Fri, Apr 16, 2021 at 5:28 PM Łukasz Langa <luk...@langa.pl> wrote:
[snip] I say "compromise" because as Inada Naoki measured,
there's still a non-zero performance cost of PEP 649 versus PEP 563:
- code size: +63%
- memory: +62%
- import time: +60%
Will this hurt some current users of typing? Yes, I can name you
multiple past employers of mine where this will be the case. Is
it worth it for Pydantic? I tend to think that yes, it is, since
it is a significant community, and the operations on type
annotations it performs are in the sensible set for which
`typing.get_type_hints()` was proposed.
Just to give some more context: in my experience, both import time
and memory use tend to be real issues in large Python codebases
(code size less so), and I think that the relative efficiency of PEP
563 is an important feature. If PEP 649 can't be made more
efficient, this could be a major regression for some users. Python
server applications need to run multiple processes because of the
GIL, and since code objects generally aren't shared between
processes (GC and reference counting make it tricky, I understand),
code size increases tend to be amplified on large servers. Even
having a lot of RAM doesn't necessarily help, since a lot of RAM
typically implies many CPU cores, and thus many processes are needed
as well.
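(If you want to gauge the impact on your own codebase, one rough way
to measure both costs at once--this is a sketch of mine using only
stdlib calls, not an established benchmark harness:)

    import importlib
    import time
    import tracemalloc

    def import_cost(module_name):
        # Measure wall time and peak Python-level allocations for a
        # cold import of module_name (run this in a fresh interpreter,
        # so the module isn't already cached in sys.modules).
        tracemalloc.start()
        t0 = time.perf_counter()
        importlib.import_module(module_name)
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        return elapsed, peak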
I can see how both PEP 563 and PEP 649 bring significant benefits,
but typically for different user populations. I wonder if there's a
way of combining the benefits of both approaches. I don't like the
idea of having toggles for different performance tradeoffs
indefinitely, but I can see how this might be a necessary compromise
if we don't want to make things worse for any user groups.
Jukka
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/IFK6WLFYKRJ3WLFOORBBNFFZK3JZQRGE/
Code of Conduct: http://python.org/psf/codeofconduct/