> I don't think the optional existence of column number information needs a different kind of pyc file. Just a flag in a pyc file's header at most. It isn't a new type of file.
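A minimal sketch of the header-flag idea quoted above, under stated assumptions: the flag bit `HAS_COLUMN_INFO` (0x4) and both helper functions are hypothetical and are not defined by CPython. The 16-byte header layout (magic number, a 32-bit flags field, then two more 32-bit fields) follows PEP 552, where bits 0x1 and 0x2 of the flags field are already in use for hash-based pycs.

```python
import importlib.util
import struct

# Hypothetical flag bit, NOT part of CPython. Bits 0x1 and 0x2 of the
# flags field are already used by PEP 552 (hash-based pycs).
HAS_COLUMN_INFO = 0x4


def build_header(flags: int, mtime: int = 0, source_size: int = 0) -> bytes:
    """Build a 16-byte timestamp-style pyc header: magic, flags, mtime, size."""
    return importlib.util.MAGIC_NUMBER + struct.pack(
        "<III", flags, mtime, source_size
    )


def has_column_info(header: bytes) -> bool:
    """Return True if the hypothetical column-info bit is set in the flags."""
    if len(header) < 16:
        raise ValueError("pyc header must be at least 16 bytes")
    if header[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("magic number does not match this interpreter")
    (flags,) = struct.unpack("<I", header[4:8])
    return bool(flags & HAS_COLUMN_INFO)


with_columns = build_header(HAS_COLUMN_INFO)
without_columns = build_header(0)
print(has_column_info(with_columns), has_column_info(without_columns))
```

With a scheme like this, a reader would check one bit before deciding whether to expect column data, so the file extension would not need to change.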
That could work, but in my personal opinion I would prefer not to do that, as it complicates things and I think it is overkill.

On Sat, 8 May 2021 at 21:45, Gregory P. Smith <g...@krypto.org> wrote:

>
> On Sat, May 8, 2021 at 1:32 PM Pablo Galindo Salgado <pablog...@gmail.com>
> wrote:
>
>> > We can't piggy back on -OO as the only way to disable this, it needs
>> to have an option of its own. -OO is unusable as code that relies on
>> "doc"strings as application data such as
>> http://www.dabeaz.com/ply/ply.html exists.
>>
>> -OO is the only sensible way to disable the data. There are two things to
>> disable:
>>
>
> nit: I wouldn't choose the word "sensible" given that -OO is already
> fundamentally unusable without knowing if any code in your entire
> transitive dependencies might depend on the presence of docstrings...
>
>>
>> * The data in pyc files
>> * Printing the exception highlighting
>>
>> Printing the exception highlighting can be disabled via a combination of
>> an environment variable and an -X option, but collecting the data can
>> only be disabled by -OO. The reason is that this data will end up in pyc
>> files, so when the data is not there, a different kind of pyc file needs
>> to be produced, and I really don't want to have another set of pyc file
>> extensions just to deactivate this. Note also that a configure-time
>> variable won't work because it will cause crashes when reading pyc
>> files produced by the interpreter compiled without the flag.
>>
>
> I don't think the optional existence of column number information needs a
> different kind of pyc file. Just a flag in a pyc file's header at most.
> It isn't a new type of file.
>
>> On Sat, 8 May 2021 at 21:13, Gregory P. Smith <g...@krypto.org> wrote:
>>
>>>
>>> On Sat, May 8, 2021 at 11:58 AM Pablo Galindo Salgado <
>>> pablog...@gmail.com> wrote:
>>>
>>>> Hi Brett,
>>>>
>>>> Just to be clear, .pyo files have not existed for a while:
>>>>> https://www.python.org/dev/peps/pep-0488/.
>>>>
>>>> Whoops, my bad, I wanted to refer to the pyc files that are generated
>>>> with -OO, which have the "opt-2" prefix.
>>>>
>>>> This only kicks in at the -OO level.
>>>>
>>>> I will correct the PEP so it reflects this more exactly.
>>>>
>>>> I personally prefer the idea of dropping the data with -OO since if
>>>>> you're stripping out docstrings you're already hurting introspection
>>>>> capabilities in the name of memory. Or one could go as far as to introduce
>>>>> -Os to do -OO plus dropping this extra data.
>>>>
>>>> This is indeed the plan, sorry for the confusion. The opt-out mechanism
>>>> is using -OO, precisely as we are already dropping other data.
>>>>
>>>
>>> We can't piggy back on -OO as the only way to disable this, it needs to
>>> have an option of its own. -OO is unusable as code that relies on
>>> "doc"strings as application data such as
>>> http://www.dabeaz.com/ply/ply.html exists.
>>>
>>> -gps
>>>
>>>>
>>>> Thanks for the clarifications!
>>>>
>>>> On Sat, 8 May 2021 at 19:41, Brett Cannon <br...@python.org> wrote:
>>>>
>>>>>
>>>>> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <
>>>>> pablog...@gmail.com> wrote:
>>>>>
>>>>>> Although we were originally not sympathetic with it, we may need to
>>>>>> offer an opt-out mechanism for those users that care about the impact of
>>>>>> the overhead of the new data in pyc files
>>>>>> and in in-memory code objects, as was suggested by some folks (Thomas,
>>>>>> Yury, and others). For this, we could propose that the functionality will
>>>>>> be deactivated along with the extra
>>>>>> information when Python is executed in optimized mode (``python -O``)
>>>>>> and therefore pyo files will not have the overhead associated with the
>>>>>> extra required data.
>>>>>>
>>>>>
>>>>> Just to be clear, .pyo files have not existed for a while:
>>>>> https://www.python.org/dev/peps/pep-0488/.
>>>>>
>>>>>> Notice that Python
>>>>>> already strips docstrings in this mode so it would be "aligned" with
>>>>>> the current mechanism of optimized mode.
>>>>>>
>>>>>
>>>>> This only kicks in at the -OO level.
>>>>>
>>>>>>
>>>>>> Although this complicates the implementation, it certainly is still
>>>>>> much easier than dealing with compression (and more useful for those that
>>>>>> don't want the feature). Notice that we also
>>>>>> expect pessimistic results from compression as offsets would be quite
>>>>>> random (although predominantly in the range 10 - 120).
>>>>>>
>>>>>
>>>>> I personally prefer the idea of dropping the data with -OO since if
>>>>> you're stripping out docstrings you're already hurting introspection
>>>>> capabilities in the name of memory. Or one could go as far as to introduce
>>>>> -Os to do -OO plus dropping this extra data.
>>>>>
>>>>> As for .pyc file size, I personally wouldn't worry about it. If
>>>>> someone is that space-constrained they either aren't using .pyc files or
>>>>> are only shipping a single set of .pyc files under -OO and skipping source
>>>>> code. And .pyc files are an implementation detail of CPython so there
>>>>> shouldn't be too much of a concern for other interpreters.
>>>>>
>>>>> -Brett
>>>>>
>>>>>>
>>>>>> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado <
>>>>>> pablog...@gmail.com> wrote:
>>>>>>
>>>>>>> One last note for clarity: that's the increase of size in the
>>>>>>> stdlib, the increase of size
>>>>>>> for pyc files goes from 28.471296MB to 34.750464MB, which is an
>>>>>>> increase of 22%.
>>>>>>>
>>>>>>> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado <
>>>>>>> pablog...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Some update on the numbers. We have made some draft implementation
>>>>>>>> to corroborate the
>>>>>>>> numbers with some more realistic tests and it seems that our original
>>>>>>>> calculations were wrong.
>>>>>>>> The actual increase in size is considerably bigger than previously
>>>>>>>> advertised:
>>>>>>>>
>>>>>>>> Using a bytes object to encode the final object and marshalling that
>>>>>>>> to disk (so using uint8_t) as the underlying
>>>>>>>> type:
>>>>>>>>
>>>>>>>> BEFORE:
>>>>>>>>
>>>>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>>>>> ❯ du -h Lib -c --max-depth=0
>>>>>>>> 70M Lib
>>>>>>>> 70M total
>>>>>>>>
>>>>>>>> AFTER:
>>>>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>>>>> ❯ du -h Lib -c --max-depth=0
>>>>>>>> 76M Lib
>>>>>>>> 76M total
>>>>>>>>
>>>>>>>> So that's an increase of 8.56% over the original value. This is
>>>>>>>> storing the start offset and end offset with no compression
>>>>>>>> whatsoever.
>>>>>>>>
>>>>>>>> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado <
>>>>>>>> pablog...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi there,
>>>>>>>>>
>>>>>>>>> We are preparing a PEP and we would like to start some early
>>>>>>>>> discussion about one of the main aspects of the PEP.
>>>>>>>>>
>>>>>>>>> The work we are preparing is to allow the interpreter to produce
>>>>>>>>> more fine-grained error messages, pointing to
>>>>>>>>> the source associated to the instructions that are failing.
>>>>>>>>> For example:
>>>>>>>>>
>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>   File "test.py", line 14, in <module>
>>>>>>>>>     lel3(x)
>>>>>>>>>     ^^^^^^^
>>>>>>>>>   File "test.py", line 12, in lel3
>>>>>>>>>     return lel2(x) / 23
>>>>>>>>>            ^^^^^^^
>>>>>>>>>   File "test.py", line 9, in lel2
>>>>>>>>>     return 25 + lel(x) + lel(x)
>>>>>>>>>                 ^^^^^^
>>>>>>>>>   File "test.py", line 6, in lel
>>>>>>>>>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>>>>>>>>                          ^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>> TypeError: 'NoneType' object is not subscriptable
>>>>>>>>>
>>>>>>>>> The cost of this is having the start column number and end
>>>>>>>>> column number information for every bytecode instruction
>>>>>>>>> and this is what we want to discuss (there is also some stack cost
>>>>>>>>> to re-raise exceptions but that's not a big problem in
>>>>>>>>> any case). Given that column numbers are not very big compared
>>>>>>>>> with line numbers, we plan to store these as unsigned chars
>>>>>>>>> or unsigned shorts. We ran some experiments over the standard
>>>>>>>>> library and we found that the overhead of all pyc files is:
>>>>>>>>>
>>>>>>>>> * If we use shorts, the total overhead is ~3% (total size 28 MB and
>>>>>>>>> the extra size is 0.88 MB).
>>>>>>>>> * If we use chars, the total overhead is ~1.5% (total size 28 MB
>>>>>>>>> and the extra size is 0.44 MB).
>>>>>>>>>
>>>>>>>>> One of the disadvantages of using chars is that we can only report
>>>>>>>>> columns from 1 to 255 so if an error happens in a column
>>>>>>>>> bigger than that then we would have to exclude it (and not show
>>>>>>>>> the highlighting) for that frame. Unsigned short will allow
>>>>>>>>> the values to go from 0 to 65535.
>>>>>>>>>
>>>>>>>>> Unfortunately these numbers are not easily compressible, as every
>>>>>>>>> instruction would have very different offsets.
>>>>>>>>>
>>>>>>>>> There is also the possibility of not doing this based on some
>>>>>>>>> build flag or when using -O to allow users to opt out, but given
>>>>>>>>> the fact that these numbers can be quite useful to other tools
>>>>>>>>> like coverage measuring tools, tracers, profilers and the like,
>>>>>>>>> adding conditional logic to many places would complicate the
>>>>>>>>> implementation considerably and would potentially reduce the
>>>>>>>>> usability of those tools, so we prefer not to have the
>>>>>>>>> conditional logic. We believe this extra cost is very much worth
>>>>>>>>> the better error reporting but we understand and respect other
>>>>>>>>> points of view.
>>>>>>>>>
>>>>>>>>> Does anyone see a better way to encode this information **without
>>>>>>>>> complicating the implementation a lot**? What are people's
>>>>>>>>> thoughts on the feature?
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> Regards from cloudy London,
>>>>>>>>> Pablo Galindo Salgado
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>> Python-Dev mailing list -- python-dev@python.org
>>>>>> To unsubscribe send an email to python-dev-le...@python.org
>>>>>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>>>>>> Message archived at
>>>>>> https://mail.python.org/archives/list/python-dev@python.org/message/JUXUC7TYPAMB4EKW6HJL77ORDYQRJEFG/
>>>>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>>>>
>>>>> _______________________________________________
>>>> Python-Dev mailing list -- python-dev@python.org
>>>> To unsubscribe send an email to python-dev-le...@python.org
>>>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>>>> Message archived at
>>>> https://mail.python.org/archives/list/python-dev@python.org/message/PDWYJ55Z4XH6OHUQ7IDEG23GWIP6GJOT/
>>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>>
>>>
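To make the one-byte option from the quoted proposal concrete, here is a rough sketch of storing a start and end column per instruction as unsigned chars. It is not the draft implementation: the function names are hypothetical, and reserving 0 to mean "no column information" is an assumption made here to model the thread's remark that columns above 255 would have to be excluded from highlighting.

```python
# Assumption: 0 is reserved to mean "no column information available".
NO_COLUMN = 0
MAX_UINT8 = 255


def encode_columns(spans):
    """Pack (start, end) column pairs, one pair per instruction, as bytes.

    A pair with any column outside 1..255 is stored as (0, 0), so that
    instruction simply gets no highlighting.
    """
    packed = bytearray()
    for start, end in spans:
        if 1 <= start <= MAX_UINT8 and 1 <= end <= MAX_UINT8:
            packed += bytes((start, end))
        else:
            packed += bytes((NO_COLUMN, NO_COLUMN))
    return bytes(packed)


def decode_columns(data):
    """Return one (start, end) pair per instruction, or None if excluded."""
    result = []
    for i in range(0, len(data), 2):
        start, end = data[i], data[i + 1]
        result.append(None if start == NO_COLUMN else (start, end))
    return result


table = encode_columns([(12, 19), (8, 300)])
print(len(table), decode_columns(table))
```

Each instruction costs two bytes in this layout; switching to unsigned shorts would double that to four bytes per instruction while lifting the limit to 65535.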
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/EB7J4QEEARHXG4V5C62ULYQNFMNYTSXM/
Code of Conduct: http://python.org/psf/codeofconduct/