> I don't think the optional existence of column number information needs a
> different kind of pyc file.  Just a flag in a pyc file's header at most.
> It isn't a new type of file.

That could work, but personally I would prefer not to do that: it
complicates things and I think it is overkill.
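
To make the header-flag idea concrete, here is a minimal sketch. It assumes
the 16-byte pyc header layout described in PEP 552 (magic number, a 32-bit
flags word, then two more 32-bit fields). The HAS_COLUMN_INFO bit and both
helper functions are hypothetical names made up for this illustration; they
do not exist in CPython.

```python
import importlib.util
import struct

# Bits 0 and 1 of the flags word are the ones PEP 552 defines.
HASH_BASED = 0x1
CHECK_SOURCE = 0x2
# Hypothetical bit invented for this sketch, not a real CPython flag.
HAS_COLUMN_INFO = 0x4


def make_header(flags, mtime=0, source_size=0):
    """Build a 16-byte timestamp-style pyc header for this interpreter."""
    return importlib.util.MAGIC_NUMBER + struct.pack(
        "<III", flags, mtime, source_size
    )


def has_column_info(header):
    """Return True if the hypothetical column-info bit is set in the header."""
    if header[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("not a pyc header for this interpreter")
    (flags,) = struct.unpack("<I", header[4:8])
    return bool(flags & HAS_COLUMN_INFO)


with_columns = make_header(HAS_COLUMN_INFO)
without_columns = make_header(0)
print(has_column_info(with_columns), has_column_info(without_columns))
```

As far as I know the import machinery currently rejects flag bits it does not
recognise, so a real change along these lines would also have to touch the
loader, which is part of why I consider it extra complexity.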

On Sat, 8 May 2021 at 21:45, Gregory P. Smith <g...@krypto.org> wrote:

>
> On Sat, May 8, 2021 at 1:32 PM Pablo Galindo Salgado <pablog...@gmail.com>
> wrote:
>
>> > We can't piggy back on -OO as the only way to disable this, it needs
>> > to have an option of its own.  -OO is unusable as code that relies on
>> > "doc"strings as application data such as
>> > http://www.dabeaz.com/ply/ply.html exists.
>>
>> -OO is the only sensible way to disable the data. There are two things to
>> disable:
>>
>
> nit: I wouldn't choose the word "sensible" given that -OO is already
> fundamentally unusable without knowing if any code in your entire
> transitive dependencies might depend on the presence of docstrings...
>
>
>>
>> * The data in pyc files
>> * Printing the exception highlighting
>>
>> Printing the exception highlighting can be disabled via a combination of
>> an environment variable and an -X option, but collecting the data can only
>> be disabled by -OO. The reason is that this data will end up in pyc files,
>> so when the data is not there, a different kind of pyc file needs to be
>> produced, and I really don't want to have another set of pyc file
>> extensions just to deactivate this. Note also that a configure-time
>> variable won't work because it will cause crashes when reading pyc
>> files produced by the interpreter compiled without the flag.
>>
>
> I don't think the optional existence of column number information needs a
> different kind of pyc file.  Just a flag in a pyc file's header at most.
> It isn't a new type of file.
>
>
>> On Sat, 8 May 2021 at 21:13, Gregory P. Smith <g...@krypto.org> wrote:
>>
>>>
>>>
>>> On Sat, May 8, 2021 at 11:58 AM Pablo Galindo Salgado <
>>> pablog...@gmail.com> wrote:
>>>
>>>> Hi Brett,
>>>>
>>>> > Just to be clear, .pyo files have not existed for a while:
>>>> > https://www.python.org/dev/peps/pep-0488/.
>>>>
>>>>
>>>> Whoops, my bad, I wanted to refer to the pyc files that are generated
>>>> with -OO, which have the "opt-2" tag in their file names.
>>>>
>>>> > This only kicks in at the -OO level.
>>>>
>>>>
>>>> I will correct the PEP so it reflects this more exactly.
>>>>
>>>> > I personally prefer the idea of dropping the data with -OO since if
>>>> > you're stripping out docstrings you're already hurting introspection
>>>> > capabilities in the name of memory. Or one could go as far as to introduce
>>>> > -Os to do -OO plus dropping this extra data.
>>>>
>>>>
>>>> This is indeed the plan, sorry for the confusion. The opt-out mechanism
>>>> is using -OO, precisely as we are already dropping other data.
>>>>
>>>
>>> We can't piggy back on -OO as the only way to disable this, it needs to
>>> have an option of its own.  -OO is unusable as code that relies on
>>> "doc"strings as application data such as
>>> http://www.dabeaz.com/ply/ply.html exists.
>>>
>>> -gps
>>>
>>>
>>>>
>>>> Thanks for the clarifications!
>>>>
>>>>
>>>>
>>>> On Sat, 8 May 2021 at 19:41, Brett Cannon <br...@python.org> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <
>>>>> pablog...@gmail.com> wrote:
>>>>>
>>>>>> Although we were originally not sympathetic to it, we may need to
>>>>>> offer an opt-out mechanism for those users that care about the impact of
>>>>>> the overhead of the new data in pyc files
>>>>>> and in in-memory code objects, as was suggested by some folks (Thomas,
>>>>>> Yury, and others). For this, we could propose that the functionality will
>>>>>> be deactivated along with the extra
>>>>>> information when Python is executed in optimized mode (``python -O``)
>>>>>> and therefore pyo files will not have the overhead associated with the
>>>>>> extra required data.
>>>>>>
>>>>>
>>>>> Just to be clear, .pyo files have not existed for a while:
>>>>> https://www.python.org/dev/peps/pep-0488/.
>>>>>
>>>>>
>>>>>> Notice that Python
>>>>>> already strips docstrings in this mode so it would be "aligned" with
>>>>>> the current mechanism of optimized mode.
>>>>>>
>>>>>
>>>>> This only kicks in at the -OO level.
>>>>>
>>>>>
>>>>>>
>>>>>> Although this complicates the implementation, it certainly is still
>>>>>> much easier than dealing with compression (and more useful for those that
>>>>>> don't want the feature). Notice that we also
>>>>>> expect pessimistic results from compression as offsets would be quite
>>>>>> random (although predominantly in the range 10 - 120).
>>>>>>
>>>>>
>>>>> I personally prefer the idea of dropping the data with -OO since if
>>>>> you're stripping out docstrings you're already hurting introspection
>>>>> capabilities in the name of memory. Or one could go as far as to introduce
>>>>> -Os to do -OO plus dropping this extra data.
>>>>>
>>>>> As for .pyc file size, I personally wouldn't worry about it. If
>>>>> someone is that space-constrained they either aren't using .pyc files or
>>>>> are only shipping a single set of .pyc files under -OO and skipping source
>>>>> code. And .pyc files are an implementation detail of CPython so there
>>>>> shouldn't be too much of a concern for other interpreters.
>>>>>
>>>>> -Brett
>>>>>
>>>>>
>>>>>>
>>>>>> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado <
>>>>>> pablog...@gmail.com> wrote:
>>>>>>
>>>>>>> One last note for clarity: that's the increase in size of the
>>>>>>> stdlib (the Lib directory). The size of the pyc files alone
>>>>>>> goes from 28.471296MB to 34.750464MB, which is an
>>>>>>> increase of 22%.
>>>>>>>
>>>>>>> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado <
>>>>>>> pablog...@gmail.com> wrote:
>>>>>>>
>>>>>>>> An update on the numbers. We have made a draft implementation
>>>>>>>> to corroborate the
>>>>>>>> numbers with some more realistic tests, and it seems that our original
>>>>>>>> calculations were wrong.
>>>>>>>> The actual increase in size is considerably bigger than previously
>>>>>>>> advertised:
>>>>>>>>
>>>>>>>> Using a bytes object to encode the final object and marshalling that
>>>>>>>> to disk (so using uint8_t as the underlying
>>>>>>>> type):
>>>>>>>>
>>>>>>>> BEFORE:
>>>>>>>>
>>>>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>>>>> ❯ du -h Lib -c --max-depth=0
>>>>>>>> 70M     Lib
>>>>>>>> 70M     total
>>>>>>>>
>>>>>>>> AFTER:
>>>>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>>>>> ❯ du -h Lib -c --max-depth=0
>>>>>>>> 76M     Lib
>>>>>>>> 76M     total
>>>>>>>>
>>>>>>>> So that's an increase of 8.56 % over the original value. This is
>>>>>>>> storing the start offset and end offset with no compression
>>>>>>>> whatsoever.
>>>>>>>>
>>>>>>>> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado <
>>>>>>>> pablog...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi there,
>>>>>>>>>
>>>>>>>>> We are preparing a PEP and we would like to start some early
>>>>>>>>> discussion about one of the main aspects of the PEP.
>>>>>>>>>
>>>>>>>>> The work we are preparing is to allow the interpreter to produce
>>>>>>>>> more fine-grained error messages, pointing to
>>>>>>>>> the source associated with the instructions that are failing. For
>>>>>>>>> example:
>>>>>>>>>
>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>   File "test.py", line 14, in <module>
>>>>>>>>>     lel3(x)
>>>>>>>>>     ^^^^^^^
>>>>>>>>>   File "test.py", line 12, in lel3
>>>>>>>>>     return lel2(x) / 23
>>>>>>>>>            ^^^^^^^
>>>>>>>>>   File "test.py", line 9, in lel2
>>>>>>>>>     return 25 + lel(x) + lel(x)
>>>>>>>>>                 ^^^^^^
>>>>>>>>>   File "test.py", line 6, in lel
>>>>>>>>>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>>>>>>>>                          ^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>> TypeError: 'NoneType' object is not subscriptable
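
As a rough illustration of how a start and end column turn into the caret
line in a traceback like the one quoted above, here is a minimal sketch. The
underline() helper is a hypothetical name, and it assumes 0-based offsets with
an exclusive end, which is only one possible convention and not necessarily
what the implementation uses.

```python
def underline(source_line, col_start, col_end):
    """Return a caret line marking columns [col_start, col_end) of source_line.

    Offsets are assumed to be 0-based with an exclusive end. An empty string
    is returned when the offsets are unusable, meaning "do not highlight".
    """
    if not 0 <= col_start < col_end <= len(source_line):
        return ""
    return " " * col_start + "^" * (col_end - col_start)


line = "return lel2(x) / 23"
print(line)
print(underline(line, 7, 14))
```

With these offsets the carets land under the lel2(x) call, as in the second
frame of the quoted traceback. Out-of-range offsets simply produce no
highlighting for that frame.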
>>>>>>>>>
>>>>>>>>> The cost of this is having the start column number and end
>>>>>>>>> column number information for every bytecode instruction
>>>>>>>>> and this is what we want to discuss (there is also some stack cost
>>>>>>>>> to re-raise exceptions but that's not a big problem in
>>>>>>>>> any case). Given that column numbers are not very big compared
>>>>>>>>> with line numbers, we plan to store these as unsigned chars
>>>>>>>>> or unsigned shorts. We ran some experiments over the standard
>>>>>>>>> library and we found that the overhead of all pyc files is:
>>>>>>>>>
>>>>>>>>> * If we use shorts, the total overhead is ~3% (total size 28 MB and
>>>>>>>>> the extra size is 0.88 MB).
>>>>>>>>> * If we use chars, the total overhead is ~1.5% (total size 28 MB
>>>>>>>>> and the extra size is 0.44 MB).
>>>>>>>>>
>>>>>>>>> One of the disadvantages of using chars is that we can only report
>>>>>>>>> columns from 1 to 255 so if an error happens in a column
>>>>>>>>> bigger than that then we would have to exclude it (and not show
>>>>>>>>> the highlighting) for that frame. Unsigned short will allow
>>>>>>>>> the values to go from 0 to 65535.
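
The chars-versus-shorts trade-off quoted above can be sketched with the
struct module. This is only an illustration under stated assumptions:
encode_columns() is a hypothetical helper, and using 0 as a "no column
information" sentinel is an assumption made for the sketch, not something the
proposal specifies.

```python
import struct

NO_COLUMN = 0  # assumed sentinel meaning "no column information"


def encode_columns(columns, fmt="B"):
    """Pack column numbers as unsigned chars ("B") or unsigned shorts ("H").

    Columns that do not fit in the chosen type are replaced by NO_COLUMN,
    so the affected frame would simply not be highlighted.
    """
    limit = 255 if fmt == "B" else 65535
    stored = [c if 1 <= c <= limit else NO_COLUMN for c in columns]
    return struct.pack(f"<{len(stored)}{fmt}", *stored)


cols = [5, 12, 120, 300]
as_chars = encode_columns(cols, "B")
as_shorts = encode_columns(cols, "H")
print(len(as_chars), len(as_shorts))
```

The char encoding takes half the space but loses the column 300 entry, while
the short encoding keeps every value at twice the size.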
>>>>>>>>>
>>>>>>>>> Unfortunately these numbers are not easily compressible, as every
>>>>>>>>> instruction would have very different offsets.
>>>>>>>>>
>>>>>>>>> There is also the possibility of not doing this based on some
>>>>>>>>> build flag or when using -O to allow users to opt out, but given the
>>>>>>>>> fact that these numbers can be quite useful to other tools like
>>>>>>>>> coverage measuring tools, tracers, profilers and the like, adding
>>>>>>>>> conditional logic to many places would complicate the implementation
>>>>>>>>> considerably and would potentially reduce the usability of those tools,
>>>>>>>>> so we prefer not to have the conditional logic. We believe this extra
>>>>>>>>> cost is very much worth the better error reporting, but we understand
>>>>>>>>> and respect other points of view.
>>>>>>>>>
>>>>>>>>> Does anyone see a better way to encode this information **without
>>>>>>>>> complicating the implementation a lot**? What are people's thoughts on
>>>>>>>>> the feature?
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> Regards from cloudy London,
>>>>>>>>> Pablo Galindo Salgado
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>> Python-Dev mailing list -- python-dev@python.org
>>>>>> To unsubscribe send an email to python-dev-le...@python.org
>>>>>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>>>>>> Message archived at
>>>>>> https://mail.python.org/archives/list/python-dev@python.org/message/JUXUC7TYPAMB4EKW6HJL77ORDYQRJEFG/
>>>>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>>>>
>>>>
>>>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/EB7J4QEEARHXG4V5C62ULYQNFMNYTSXM/
Code of Conduct: http://python.org/psf/codeofconduct/
