> We can't piggy back on -OO as the only way to disable this, it needs to have an option of its own. -OO is unusable as code that relies on "doc"strings as application data such as http://www.dabeaz.com/ply/ply.html exists.
-OO is the only sensible way to disable the data. There are two things to
disable:

* The data in pyc files
* Printing the exception highlighting

Printing the exception highlighting can be disabled via a combination of an
environment variable and a -X option, but collecting the data can only be
disabled by -OO. The reason is that this data ends up in pyc files, so when
the data is not there a different kind of pyc file needs to be produced, and
I really don't want to have another pyc file extension just to deactivate
this. Notice also that a configure-time variable won't work, because it would
cause crashes when reading pyc files produced by an interpreter compiled
without the flag.

On Sat, 8 May 2021 at 21:13, Gregory P. Smith <g...@krypto.org> wrote:

> On Sat, May 8, 2021 at 11:58 AM Pablo Galindo Salgado <pablog...@gmail.com>
> wrote:
>
>> Hi Brett,
>>
>>> Just to be clear, .pyo files have not existed for a while:
>>> https://www.python.org/dev/peps/pep-0488/.
>>
>> Whoops, my bad, I wanted to refer to the pyc files that are generated
>> with -OO, which have the "opt-2" prefix.
>>
>>> This only kicks in at the -OO level.
>>
>> I will correct the PEP so it reflects this more exactly.
>>
>>> I personally prefer the idea of dropping the data with -OO since if
>>> you're stripping out docstrings you're already hurting introspection
>>> capabilities in the name of memory. Or one could go as far as to introduce
>>> -Os to do -OO plus dropping this extra data.
>>
>> This is indeed the plan, sorry for the confusion. The opt-out mechanism
>> is using -OO, precisely as we are already dropping other data.
>
> We can't piggy back on -OO as the only way to disable this, it needs to
> have an option of its own. -OO is unusable as code that relies on
> "doc"strings as application data such as
> http://www.dabeaz.com/ply/ply.html exists.
>
> -gps
>
>> Thanks for the clarifications!
>>
>> On Sat, 8 May 2021 at 19:41, Brett Cannon <br...@python.org> wrote:
>>
>>> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <
>>> pablog...@gmail.com> wrote:
>>>
>>>> Although we were originally not sympathetic with it, we may need to
>>>> offer an opt-out mechanism for those users that care about the impact of
>>>> the overhead of the new data in pyc files
>>>> and in in-memory code objects, as was suggested by some folks (Thomas,
>>>> Yury, and others). For this, we could propose that the functionality will
>>>> be deactivated along with the extra
>>>> information when Python is executed in optimized mode (``python -O``)
>>>> and therefore pyo files will not have the overhead associated with the
>>>> extra required data.
>>>
>>> Just to be clear, .pyo files have not existed for a while:
>>> https://www.python.org/dev/peps/pep-0488/.
>>>
>>>> Notice that Python
>>>> already strips docstrings in this mode so it would be "aligned" with
>>>> the current mechanism of optimized mode.
>>>
>>> This only kicks in at the -OO level.
>>>
>>>> Although this complicates the implementation, it certainly is still
>>>> much easier than dealing with compression (and more useful for those that
>>>> don't want the feature). Notice that we also
>>>> expect pessimistic results from compression as offsets would be quite
>>>> random (although predominantly in the range 10 - 120).
>>>
>>> I personally prefer the idea of dropping the data with -OO since if
>>> you're stripping out docstrings you're already hurting introspection
>>> capabilities in the name of memory. Or one could go as far as to introduce
>>> -Os to do -OO plus dropping this extra data.
>>>
>>> As for .pyc file size, I personally wouldn't worry about it. If someone
>>> is that space-constrained they either aren't using .pyc files or are only
>>> shipping a single set of .pyc files under -OO and skipping source code.
>>> And .pyc files are an implementation detail of CPython so there shouldn't be
>>> too much of a concern for other interpreters.
>>>
>>> -Brett
>>>
>>>> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado <pablog...@gmail.com>
>>>> wrote:
>>>>
>>>>> One last note for clarity: that's the increase of size in the stdlib,
>>>>> the increase of size
>>>>> for pyc files goes from 28.471296MB to 34.750464MB, which is an
>>>>> increase of 22%.
>>>>>
>>>>> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado <
>>>>> pablog...@gmail.com> wrote:
>>>>>
>>>>>> Some update on the numbers. We have made some draft implementation to
>>>>>> corroborate the
>>>>>> numbers with some more realistic tests and seems that our original
>>>>>> calculations were wrong.
>>>>>> The actual increase in size is quite bigger than previously
>>>>>> advertised:
>>>>>>
>>>>>> Using bytes object to encode the final object and marshalling that to
>>>>>> disk (so using uint8_t) as the underlying
>>>>>> type:
>>>>>>
>>>>>> BEFORE:
>>>>>>
>>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>>> ❯ du -h Lib -c --max-depth=0
>>>>>> 70M Lib
>>>>>> 70M total
>>>>>>
>>>>>> AFTER:
>>>>>>
>>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>>> ❯ du -h Lib -c --max-depth=0
>>>>>> 76M Lib
>>>>>> 76M total
>>>>>>
>>>>>> So that's an increase of 8.56 % over the original value. This is
>>>>>> storing the start offset and end offset with no compression
>>>>>> whatsoever.
>>>>>>
>>>>>> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado <
>>>>>> pablog...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi there,
>>>>>>>
>>>>>>> We are preparing a PEP and we would like to start some early
>>>>>>> discussion about one of the main aspects of the PEP.
>>>>>>>
>>>>>>> The work we are preparing is to allow the interpreter to produce
>>>>>>> more fine-grained error messages, pointing to
>>>>>>> the source associated to the instructions that are failing.
>>>>>>> For example:
>>>>>>>
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "test.py", line 14, in <module>
>>>>>>>     lel3(x)
>>>>>>>     ^^^^^^^
>>>>>>>   File "test.py", line 12, in lel3
>>>>>>>     return lel2(x) / 23
>>>>>>>            ^^^^^^^
>>>>>>>   File "test.py", line 9, in lel2
>>>>>>>     return 25 + lel(x) + lel(x)
>>>>>>>                 ^^^^^^
>>>>>>>   File "test.py", line 6, in lel
>>>>>>>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>>>>>>                          ^^^^^^^^^^^^^^^^^^^^^
>>>>>>> TypeError: 'NoneType' object is not subscriptable
>>>>>>>
>>>>>>> The cost of this is having the start column number and end
>>>>>>> column number information for every bytecode instruction
>>>>>>> and this is what we want to discuss (there is also some stack cost
>>>>>>> to re-raise exceptions but that's not a big problem in
>>>>>>> any case). Given that column numbers are not very big compared with
>>>>>>> line numbers, we plan to store these as unsigned chars
>>>>>>> or unsigned shorts. We ran some experiments over the standard
>>>>>>> library and we found that the overhead of all pyc files is:
>>>>>>>
>>>>>>> * If we use shorts, the total overhead is ~3% (total size 28 MB and
>>>>>>> the extra size is 0.88 MB).
>>>>>>> * If we use chars, the total overhead is ~1.5% (total size 28 MB and
>>>>>>> the extra size is 0.44 MB).
>>>>>>>
>>>>>>> One of the disadvantages of using chars is that we can only report
>>>>>>> columns from 1 to 255 so if an error happens in a column
>>>>>>> bigger than that then we would have to exclude it (and not show the
>>>>>>> highlighting) for that frame. Unsigned short will allow
>>>>>>> the values to go from 0 to 65535.
>>>>>>>
>>>>>>> Unfortunately these numbers are not easily compressible, as every
>>>>>>> instruction would have very different offsets.
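(Inline note: a rough sketch of the char-versus-short trade-off quoted above. This is not the draft implementation. The function name `encode_columns`, the sample column pairs, and the use of (0, 0) as a "no column information" marker are assumptions made only for this illustration.)

```python
from array import array


def encode_columns(pairs, typecode="B"):
    """Pack (start, end) column pairs into a flat unsigned array.

    typecode "B" is an unsigned char, "H" is an unsigned short.
    Pairs that do not fit in the chosen type are stored as (0, 0),
    a hypothetical marker meaning "do not highlight this instruction".
    """
    out = array(typecode)
    limit = 2 ** (8 * out.itemsize) - 1  # 255 for "B", 65535 for 2-byte "H"
    for start, end in pairs:
        if 0 < start <= end <= limit:
            out.extend((start, end))
        else:
            out.extend((0, 0))  # assumption: column info dropped for this entry
    return out


# Made-up offsets for three instructions; the last one ends past column 255.
pairs = [(12, 19), (16, 22), (250, 300)]
as_chars = encode_columns(pairs, "B")
as_shorts = encode_columns(pairs, "H")

print(list(as_chars))   # [12, 19, 16, 22, 0, 0]
print(list(as_shorts))  # [12, 19, 16, 22, 250, 300]
print(len(as_chars.tobytes()), len(as_shorts.tobytes()))
```

With one-byte entries the third pair loses its highlighting, while two-byte entries keep it at twice the storage, which is the same ratio as the ~1.5% versus ~3% figures above.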
>>>>>>>
>>>>>>> There is also the possibility of not doing this based on some build
>>>>>>> flag or when using -O to allow users to opt out, but given the fact
>>>>>>> that these numbers can be quite useful to other tools like coverage
>>>>>>> measuring tools, tracers, profilers and the such, adding conditional
>>>>>>> logic to many places would complicate the implementation
>>>>>>> considerably and will potentially reduce the usability of those tools,
>>>>>>> so we prefer not to have the conditional logic. We believe this extra
>>>>>>> cost is very much worth the better error reporting but we understand
>>>>>>> and respect other points of view.
>>>>>>>
>>>>>>> Does anyone see a better way to encode this information **without
>>>>>>> complicating the implementation a lot**? What are people's thoughts on
>>>>>>> the feature?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> Regards from cloudy London,
>>>>>>> Pablo Galindo Salgado
>>>>>>>
>>>> _______________________________________________
>>>> Python-Dev mailing list -- python-dev@python.org
>>>> To unsubscribe send an email to python-dev-le...@python.org
>>>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>>>> Message archived at
>>>> https://mail.python.org/archives/list/python-dev@python.org/message/JUXUC7TYPAMB4EKW6HJL77ORDYQRJEFG/
>>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>>
>> _______________________________________________
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-le...@python.org
>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/PDWYJ55Z4XH6OHUQ7IDEG23GWIP6GJOT/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
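To make the point at the top about "a different kind of pyc file" concrete, here is a small sketch of how the optimization level already selects a distinct cache file name (the PEP 488 scheme). It only computes names and compiles nothing. The helper `pyc_names` and the path `ply/yacc.py` are illustrative assumptions, and the exact cache tag depends on the interpreter running it.

```python
import importlib.util


def pyc_names(source_path):
    """Return the cache file name used at each optimization level.

    "" is the default (no -O), 1 corresponds to -O and 2 to -OO.
    Hypothetical helper for illustration; the path need not exist.
    """
    return {
        level: importlib.util.cache_from_source(source_path, optimization=level)
        for level in ("", 1, 2)
    }


names = pyc_names("ply/yacc.py")
for level, name in names.items():
    print(repr(level), name)
```

The -OO entry ends in `.opt-2.pyc`, so files written without docstrings (and, under this proposal, without the column data) never share a name with the default ones.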
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/WIGSM7C3GD7AOCTFJYGTX6ACUSSRMBSU/ Code of Conduct: http://python.org/psf/codeofconduct/