I noticed something this morning: there's another way in which Inada
Naoki's benchmark here is--possibly?--unrealistic.
As mentioned, his benchmark generates a thousand functions, each of
which takes exactly three parameters, and each of those parameters
randomly chooses one of three annotations. In current trunk (not in my
branch, I'm behind), there's an optimization for stringized annotations
that compiles the annotations into a tuple; when you pull out
__annotations__ on the object at runtime, the tuple is converted into a
dict on demand.
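To illustrate the idea, here's a minimal sketch (not the actual CPython
implementation, and the helper names are mine), assuming the tuple
stores alternating key/value strings:

    # The compiler stores the annotations as a flat tuple of
    # alternating key/value strings; the dict is only built when
    # __annotations__ is first accessed.
    ann_tuple = ("a", "int", "b", "str", "return", "None")

    def ann_dict(tup):
        # Pair up the alternating keys and values.
        return {tup[i]: tup[i + 1] for i in range(0, len(tup), 2)}

    print(ann_dict(ann_tuple))
    # {'a': 'int', 'b': 'str', 'return': 'None'}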
This means that even though there are a thousand functions, between
them they only ever generate nine distinct annotation tuples.
And here's the thing: our lovely marshal module is smart enough to
notice that these tuples /are/ duplicates, and it'll throw away the
duplicates and replace them with references to the original.
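Here's a small demonstration of the effect. (Strictly speaking, I
believe marshal dedupes by object identity, and it's the compiler that
has already merged equal constants into a single object by the time
marshal sees them; the tuples below are built at runtime so the
compiler can't pre-merge them.)

    import marshal

    def make():
        # Built at runtime, so each call returns a distinct object.
        return tuple(["a", "int", "b", "str"])

    shared = make()
    with_sharing = (shared, shared)        # one tuple referenced twice
    without_sharing = (make(), make())     # two equal, distinct tuples

    # The shared form serializes smaller, because marshal writes the
    # second element as a back-reference to the first:
    print(len(marshal.dumps(with_sharing)))
    print(len(marshal.dumps(without_sharing)))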
Something analogous /could/ happen in the PEP 649 branch but currently
doesn't. When running Inada Naoki's benchmark, there are a total of nine
possible annotations code objects. Except, each function generated by
the benchmark has a unique name, and I incorporate that name into the
name given to the code object (f"{function_name}.__co_annotations__").
Since each function name is different, each code object name is
different, so each code object /hash/ is different, and since they
aren't /exact/ duplicates they are never consolidated.
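The rule is easy to demonstrate; here's a quick sketch using
code.replace() (available since Python 3.8):

    # Two otherwise-identical code objects that differ only in co_name
    # compare unequal, so they can't be folded into one.
    base = compile("x", "<demo>", "eval")

    a = base.replace(co_name="f.__co_annotations__")
    b = base.replace(co_name="g.__co_annotations__")
    print(a == b)    # False: the names differ

    c = base.replace(co_name="__co_annotations__")
    d = base.replace(co_name="__co_annotations__")
    print(c == d)    # True: identical in every field, so a dedup pass
                     # could replace one with a reference to the other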
Inada Naoki has suggested changing this, so that all the annotations
code objects have the same name ("__co_annotations__"). If we made that
change, I'm pretty sure the code size delta in this synthetic benchmark
would drop. I haven't done it because the current name of the code
object might be helpful in debugging, and I'm not convinced this would
have an effect in real-world code.
But... would it? Someone (again, I think it was Inada Naoki) suggested
that in real-world applications, there are often many, many functions in
a single module that have identical signatures. The annotation-tuples
optimization naturally takes advantage of that. PEP 649 doesn't.
Should it? Would this really be beneficial to real-world code bases?
Cheers,
//arry/
On 4/16/21 12:26 PM, Larry Hastings wrote:
Please don't confuse Inada Naoki's benchmark results with the effect
PEP 649 would have on a real-world codebase. His artificial benchmark
constructs a thousand empty functions that take three parameters with
randomly-chosen annotations--the results provide some insight but are
not directly applicable to reality.
PEP 649's effects on code size / memory / import time are contingent
on the number of annotations and the number of objects annotated, not
the overall code size of the module. Expressing those effects as a
percentage of overall module size, and suggesting that Python users
would see the same results with real-world code, is highly misleading.
I too would be interested to know what effect PEP 649 would have on a
real-world codebase currently using PEP 563, but AFAIK nobody has
reported such results.
//arry/
On 4/16/21 11:05 AM, Jukka Lehtosalo wrote:
On Fri, Apr 16, 2021 at 5:28 PM Łukasz Langa <luk...@langa.pl> wrote:
[snip] I say "compromise" because as Inada Naoki measured,
there's still a non-zero performance cost of PEP 649 versus PEP 563:
- code size: +63%
- memory: +62%
- import time: +60%
Will this hurt some current users of typing? Yes, I can name you
multiple past employers of mine where this will be the case. Is
it worth it for Pydantic? I tend to think that yes, it is, since
it is a significant community, and the operations on type
annotations it performs are in the sensible set for which
`typing.get_type_hints()` was proposed.
Just to give some more context: in my experience, both import time
and memory use tend to be real issues in large Python codebases (code
size less so), and I think that the relative efficiency of PEP 563 is
an important feature. If PEP 649 can't be made more efficient, this
could be a major regression for some users. Python server
applications need to run multiple processes because of the GIL, and
since code objects generally aren't shared between processes (GC and
reference counting make it tricky, I understand), code size
increases tend to be amplified on large servers. Even having a lot of
RAM doesn't necessarily help, since a lot of RAM typically implies
many CPU cores, and thus many processes are needed as well.
I can see how both PEP 563 and PEP 649 bring significant benefits,
but typically for different user populations. I wonder if there's a
way of combining the benefits of both approaches. I don't like the
idea of having toggles for different performance tradeoffs
indefinitely, but I can see how this might be a necessary compromise
if we don't want to make things worse for any user groups.
Jukka
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/QEZDCPR3CCDBPEEA36FEM6RA3I7IS2UR/
Code of Conduct: http://python.org/psf/codeofconduct/