Oops: where I said nine, I should have said twenty-seven--3 cubed.
Should have had my coffee /before/ posting. Carry on!
//arry/
On 4/19/21 10:51 AM, Larry Hastings wrote:
I noticed something this morning: there's another way in which Inada
Naoki's benchmark here is--possibly?--unrealistic.
As mentioned, his benchmark generates a thousand functions, each of
which takes exactly three parameters, and each of those parameters
randomly chooses one of three annotations. In current trunk (not in
my branch, I'm behind), there's an optimization for stringized
annotations: it compiles the annotations into a tuple, and then, when
you pull out __annotations__ on the object at runtime, it converts that
tuple into a dict on demand.
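A minimal pure-Python sketch of that lazy conversion, just to make the
idea concrete (the real optimization is done in C inside the function
object; the names here are mine, not CPython's):

    class StubFunction:
        def __init__(self, annotation_tuple):
            # The compiler stores a flat tuple of alternating keys and
            # values, e.g. ("x", "int", "return", "str").
            self._ann = annotation_tuple

        @property
        def __annotations__(self):
            if isinstance(self._ann, tuple):   # not yet converted
                it = iter(self._ann)
                self._ann = dict(zip(it, it))  # pair up keys and values
            return self._ann

    f = StubFunction(("x", "int", "return", "str"))
    print(f.__annotations__)  # {'x': 'int', 'return': 'str'}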
This means that even though there are a thousand functions, they only
ever generate one of nine possible annotation tuples. And here's the
thing: our lovely marshal module is smart enough to notice that these
tuples /are/ duplicates, and it'll throw away the duplicates and
replace them with references to the original.
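You can watch that sharing happen from Python. Here's a sketch (my
code, not the benchmark's) that compiles a module full of identical
signatures in PEP 563 mode and marshals it; on a CPython new enough to
have the tuple optimization, the marshalled size should grow much more
slowly than the function count, because the duplicate tuples are
written once and back-referenced thereafter:

    import __future__
    import marshal

    def module_size(n_funcs):
        # n_funcs functions, all sharing one annotated signature
        src = "\n".join(
            f"def f{i}(a: int, b: str, c: float) -> None: ..."
            for i in range(n_funcs)
        )
        code = compile(src, "<demo>", "exec",
                       flags=__future__.annotations.compiler_flag)
        return len(marshal.dumps(code))

    print(module_size(1), module_size(1000))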
Something analogous /could/ happen in the PEP 649 branch but currently
doesn't. When running Inada Naoki's benchmark, there are a total of
nine possible annotations code objects. Except, each function
generated by the benchmark has a unique name, and I incorporate that
name into the name given to the code object
(f"{function_name}.__co_annotations__"). Since each function name is
different, each code object name is different, so each code object
/hash/ is different, and since they aren't /exact/ duplicates they are
never consolidated.
Inada Naoki has suggested changing this, so that all the annotations
code objects have the same name ("__co_annotations__"). If we made
that change, I'm pretty sure the code size delta in this synthetic
benchmark would drop. I haven't done it because the current name of
the code object might be helpful in debugging, and I'm not convinced
this would have an effect in real-world code.
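The unequal-names problem is easy to demonstrate from Python, though;
here's a hedged illustration (not the PEP 649 implementation itself):

    # Two code objects that differ only in co_name are unequal, so no
    # constant merging or marshal sharing ever applies to them.
    c1 = compile("x", "<m>", "eval").replace(co_name="f.__co_annotations__")
    c2 = compile("x", "<m>", "eval").replace(co_name="g.__co_annotations__")
    print(c1 == c2)  # False: the name participates in equality and hashing

    # Renaming both to one generic name, as Inada Naoki suggested,
    # makes them equal again, so they could be consolidated.
    print(c1.replace(co_name="__co_annotations__") ==
          c2.replace(co_name="__co_annotations__"))  # True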
But... would it? Someone--and again I think it's Inada Naoki--suggested
that in real-world applications, there are often many, many
functions in a single module that have identical signatures. The
annotation-tuples optimization naturally takes advantage of that. PEP
649 doesn't. Should it? Would this really be beneficial to
real-world code bases?
Cheers,
//arry/
On 4/16/21 12:26 PM, Larry Hastings wrote:
Please don't confuse Inada Naoki's benchmark results with the effect
PEP 649 would have on a real-world codebase. His artificial benchmark
constructs a thousand empty functions that take three parameters with
randomly-chosen annotations--the results provide some insights but
are not directly applicable to reality.
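For concreteness, here's roughly the shape of the module the benchmark
generates, as I understand it (a reconstruction of mine, not Inada
Naoki's actual script):

    import random

    POOL = ["int", "str", "float"]  # each parameter draws one of these

    def make_benchmark_module(n=1000):
        lines = []
        for i in range(n):
            a, b, c = (random.choice(POOL) for _ in range(3))
            lines.append(f"def f{i}(x: {a}, y: {b}, z: {c}) -> None: ...")
        return "\n".join(lines)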
PEP 649's effects on code size / memory / import time are contingent
on the number of annotations and the number of objects annotated, not
the overall code size of the module. Expressing it that way, and
suggesting that Python users would see the same results with
real-world code, is highly misleading.
I too would be interested to know the effect PEP 649 would have on a
real-world codebase currently using PEP 563, but AFAIK nobody has
reported such results.
//arry/
On 4/16/21 11:05 AM, Jukka Lehtosalo wrote:
On Fri, Apr 16, 2021 at 5:28 PM Łukasz Langa <luk...@langa.pl> wrote:
[snip] I say "compromise" because as Inada Naoki measured,
there's still a non-zero performance cost of PEP 649 versus PEP 563:
- code size: +63%
- memory: +62%
- import time: +60%
Will this hurt some current users of typing? Yes, I can name you
multiple past employers of mine where this will be the case. Is
it worth it for Pydantic? I tend to think that yes, it is, since
it is a significant community, and the operations on type
annotations it performs are in the sensible set for which
`typing.get_type_hints()` was proposed.
Just to give some more context: in my experience, both import time
and memory use tend to be real issues in large Python codebases
(code size less so), and I think that the relative efficiency of PEP
563 is an important feature. If PEP 649 can't be made more
efficient, this could be a major regression for some users. Python
server applications need to run multiple processes because of the
GIL, and since code objects generally aren't shared between
processes (GC and reference counting make it tricky, I understand),
code size increases tend to be amplified on large servers. Even
having a lot of RAM doesn't necessarily help, since a lot of RAM
typically implies many CPU cores, and thus many processes are needed
as well.
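(If you want to gauge the impact on your own codebase, one rough way
to measure both costs at once--this is a sketch of mine using only
stdlib calls, not an established benchmark harness:)

    import importlib
    import time
    import tracemalloc

    def import_cost(module_name):
        # Measure wall time and peak Python-level allocations for a
        # cold import of module_name (run this in a fresh interpreter,
        # so the module isn't already cached in sys.modules).
        tracemalloc.start()
        t0 = time.perf_counter()
        importlib.import_module(module_name)
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        return elapsed, peak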
I can see how both PEP 563 and PEP 649 bring significant benefits,
but typically for different user populations. I wonder if there's a
way of combining the benefits of both approaches. I don't like the
idea of having toggles for different performance tradeoffs
indefinitely, but I can see how this might be a necessary compromise
if we don't want to make things worse for any user groups.
Jukka
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/IFK6WLFYKRJ3WLFOORBBNFFZK3JZQRGE/
Code of Conduct: http://python.org/psf/codeofconduct/