> We can't piggy back on -OO as the only way to disable this, it needs to
> have an option of its own.  -OO is unusable as code that relies on
> "doc"strings as application data such as http://www.dabeaz.com/ply/ply.html
> exists.

-OO is the only sensible way to disable the data. There are two things to
disable:

* The data in pyc files
* Printing the exception highlighting

Printing the exception highlighting can be disabled via a combination of
environment variable / -X option, but collecting the data can only be
disabled by -OO. The reason is that the data ends up in the pyc files, so
when the data is not there, a different kind of pyc file needs to be
produced, and I really don't want to have another set of pyc file extensions
just to deactivate this. Notice also that a configure-time variable won't
work, because it would cause crashes when reading pyc files produced by an
interpreter compiled without the flag.
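
To illustrate the point about pyc files, here is a minimal sketch (not part
of the proposal; `example.py` and `rule` are made-up names). It uses the
existing `importlib.util.cache_from_source` and the `optimize` argument of
`compile` to show that -OO already has its own tagged pyc files, and that
the same level is what strips docstrings:

```python
import importlib.util

source_path = "example.py"  # hypothetical module path, the file is never opened

# Bytecode cache path without an optimization tag versus the -OO ("opt-2") one.
default_pyc = importlib.util.cache_from_source(source_path, optimization="")
opt2_pyc = importlib.util.cache_from_source(source_path, optimization=2)
print(default_pyc)
print(opt2_pyc)

# A PLY-style function whose docstring is used as application data (made up).
module_source = 'def rule():\n    "expr : expr PLUS term"\n    return None\n'


def docstring_at(level):
    """Compile module_source at the given optimization level, return rule.__doc__."""
    namespace = {}
    exec(compile(module_source, "<example>", "exec", optimize=level), namespace)
    return namespace["rule"].__doc__


print(docstring_at(0))  # docstring is kept
print(docstring_at(2))  # docstring is dropped, as with -OO
```

The exact file names depend on the interpreter's cache tag, so the printed
paths will differ between versions.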

On Sat, 8 May 2021 at 21:13, Gregory P. Smith <g...@krypto.org> wrote:

>
>
> On Sat, May 8, 2021 at 11:58 AM Pablo Galindo Salgado <pablog...@gmail.com>
> wrote:
>
>> Hi Brett,
>>
>> Just to be clear, .pyo files have not existed for a while:
>>> https://www.python.org/dev/peps/pep-0488/.
>>
>>
>> Whoops, my bad, I wanted to refer to the pyc files that are generated
>> with -OO, which have the "opt-2" prefix.
>>
>> This only kicks in at the -OO level.
>>
>>
>> I will correct the PEP so it reflects this more exactly.
>>
>> I personally prefer the idea of dropping the data with -OO since if
>>> you're stripping out docstrings you're already hurting introspection
>>> capabilities in the name of memory. Or one could go as far as to introduce
>>> -Os to do -OO plus dropping this extra data.
>>
>>
>> This is indeed the plan, sorry for the confusion. The opt-out mechanism
>> is using -OO, precisely as we are already dropping other data.
>>
>
> We can't piggy back on -OO as the only way to disable this, it needs to
> have an option of its own.  -OO is unusable as code that relies on
> "doc"strings as application data such as
> http://www.dabeaz.com/ply/ply.html exists.
>
> -gps
>
>
>>
>> Thanks for the clarifications!
>>
>>
>>
>> On Sat, 8 May 2021 at 19:41, Brett Cannon <br...@python.org> wrote:
>>
>>>
>>>
>>> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <
>>> pablog...@gmail.com> wrote:
>>>
>>>> Although we were originally not sympathetic with it, we may need to
>>>> offer an opt-out mechanism for those users that care about the impact of
>>>> the overhead of the new data in pyc files
>>>> and in in-memory code objects, as was suggested by some folks (Thomas,
>>>> Yury, and others). For this, we could propose that the functionality will
>>>> be deactivated along with the extra
>>>> information when Python is executed in optimized mode (``python -O``)
>>>> and therefore pyo files will not have the overhead associated with the
>>>> extra required data.
>>>>
>>>
>>> Just to be clear, .pyo files have not existed for a while:
>>> https://www.python.org/dev/peps/pep-0488/.
>>>
>>>
>>>> Notice that Python
>>>> already strips docstrings in this mode so it would be "aligned" with
>>>> the current mechanism of optimized mode.
>>>>
>>>
>>> This only kicks in at the -OO level.
>>>
>>>
>>>>
>>>> Although this complicates the implementation, it certainly is still
>>>> much easier than dealing with compression (and more useful for those that
>>>> don't want the feature). Notice that we also
>>>> expect pessimistic results from compression as offsets would be quite
>>>> random (although predominantly in the range 10 - 120).
>>>>
>>>
>>> I personally prefer the idea of dropping the data with -OO since if
>>> you're stripping out docstrings you're already hurting introspection
>>> capabilities in the name of memory. Or one could go as far as to introduce
>>> -Os to do -OO plus dropping this extra data.
>>>
>>> As for .pyc file size, I personally wouldn't worry about it. If someone
>>> is that space-constrained they either aren't using .pyc files or are only
>>> shipping a single set of .pyc files under -OO and skipping source code. And
>>> .pyc files are an implementation detail of CPython so there  shouldn't be
>>> too much of a concern for other interpreters.
>>>
>>> -Brett
>>>
>>>
>>>>
>>>> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado <pablog...@gmail.com>
>>>> wrote:
>>>>
>>>>> One last note for clarity: that's the increase of size in the stdlib,
>>>>> the increase of size
>>>>> for pyc files goes from 28.471296MB to 34.750464MB, which is an
>>>>> increase of 22%.
>>>>>
>>>>> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado <
>>>>> pablog...@gmail.com> wrote:
>>>>>
>>>>>> An update on the numbers. We have made a draft implementation to
>>>>>> corroborate the
>>>>>> numbers with some more realistic tests, and it seems that our original
>>>>>> calculations were wrong.
>>>>>> The actual increase in size is quite a bit bigger than previously
>>>>>> advertised:
>>>>>>
>>>>>> Using bytes object to encode the final object and marshalling that to
>>>>>> disk (so using uint8_t) as the underlying
>>>>>> type:
>>>>>>
>>>>>> BEFORE:
>>>>>>
>>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>>> ❯ du -h Lib -c --max-depth=0
>>>>>> 70M     Lib
>>>>>> 70M     total
>>>>>>
>>>>>> AFTER:
>>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>>> ❯ du -h Lib -c --max-depth=0
>>>>>> 76M     Lib
>>>>>> 76M     total
>>>>>>
>>>>>> So that's an increase of 8.56 % over the original value. This is
>>>>>> storing the start offset and end offset with no compression
>>>>>> whatsoever.
>>>>>>
>>>>>> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado <
>>>>>> pablog...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi there,
>>>>>>>
>>>>>>> We are preparing a PEP and we would like to start some early
>>>>>>> discussion about one of the main aspects of the PEP.
>>>>>>>
>>>>>>> The work we are preparing is to allow the interpreter to produce
>>>>>>> more fine-grained error messages, pointing to
>>>>>>> the source associated to the instructions that are failing. For
>>>>>>> example:
>>>>>>>
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "test.py", line 14, in <module>
>>>>>>>     lel3(x)
>>>>>>>     ^^^^^^^
>>>>>>>   File "test.py", line 12, in lel3
>>>>>>>     return lel2(x) / 23
>>>>>>>            ^^^^^^^
>>>>>>>   File "test.py", line 9, in lel2
>>>>>>>     return 25 + lel(x) + lel(x)
>>>>>>>                 ^^^^^^
>>>>>>>   File "test.py", line 6, in lel
>>>>>>>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>>>>>>                          ^^^^^^^^^^^^^^^^^^^^^
>>>>>>> TypeError: 'NoneType' object is not subscriptable
>>>>>>>
>>>>>>> The cost of this is having the start column number and end
>>>>>>> column number information for every bytecode instruction
>>>>>>> and this is what we want to discuss (there is also some stack cost
>>>>>>> to re-raise exceptions but that's not a big problem in
>>>>>>> any case). Given that column numbers are not very big compared with
>>>>>>> line numbers, we plan to store these as unsigned chars
>>>>>>> or unsigned shorts. We ran some experiments over the standard
>>>>>>> library and we found that the overhead of all pyc files is:
>>>>>>>
>>>>>>> * If we use shorts, the total overhead is ~3% (total size 28 MB and
>>>>>>> the extra size is 0.88 MB).
>>>>>>> * If we use chars, the total overhead is ~1.5% (total size 28 MB and
>>>>>>> the extra size is 0.44 MB).
>>>>>>>
>>>>>>> One of the disadvantages of using chars is that we can only report
>>>>>>> columns from 1 to 255 so if an error happens in a column
>>>>>>> bigger than that then we would have to exclude it (and not show the
>>>>>>> highlighting) for that frame. Unsigned short will allow
>>>>>>> the values to go from 0 to 65535.
>>>>>>>
>>>>>>> Unfortunately these numbers are not easily compressible, as every
>>>>>>> instruction would have very different offsets.
>>>>>>>
>>>>>>> There is also the possibility of not doing this based on some build
>>>>>>> flag or when using -O to allow users to opt out, but given that
>>>>>>> these numbers can be quite useful to other tools like coverage
>>>>>>> measuring tools, tracers, profilers and the like, adding conditional
>>>>>>> logic in many places would complicate the implementation
>>>>>>> considerably and would potentially reduce the usability of those
>>>>>>> tools, so we prefer not to have the conditional logic. We believe
>>>>>>> this extra cost is very much worth the better error reporting, but we
>>>>>>> understand and respect other points of view.
>>>>>>>
>>>>>>> Does anyone see a better way to encode this information **without
>>>>>>> complicating the implementation a lot**? What are people's thoughts
>>>>>>> on the feature?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> Regards from cloudy London,
>>>>>>> Pablo Galindo Salgado
>>>>>>>
>>>>>>> _______________________________________________
>>>> Python-Dev mailing list -- python-dev@python.org
>>>> To unsubscribe send an email to python-dev-le...@python.org
>>>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>>>> Message archived at
>>>> https://mail.python.org/archives/list/python-dev@python.org/message/JUXUC7TYPAMB4EKW6HJL77ORDYQRJEFG/
>>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>>
>>
>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WIGSM7C3GD7AOCTFJYGTX6ACUSSRMBSU/
Code of Conduct: http://python.org/psf/codeofconduct/
