[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

Gregory P. Smith Sat, 08 May 2021 13:46:46 -0700

On Sat, May 8, 2021 at 1:32 PM Pablo Galindo Salgado <[email protected]>
wrote:


> > We can't piggy back on -OO as the only way to disable this, it needs to
> have an option of its own.  -OO is unusable as code that relies on
> "doc"strings as application data such as
> http://www.dabeaz.com/ply/ply.html exists.
>
> -OO is the only sensible way to disable the data. There are two things to
> disable:
>

nit: I wouldn't choose the word "sensible" given that -OO is already
fundamentally unusable without knowing if any code in your entire
transitive dependencies might depend on the presence of docstrings...
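
To make the docstring concern concrete, here is a minimal sketch of a PLY-style token rule whose regular expression lives in the function's docstring. The rule name `t_NUMBER` and the helper `docstring_at` are illustrative only, and `compile(..., optimize=N)` is used here as a stand-in for running the interpreter with -O / -OO.

```python
# A PLY-style token rule: the docstring is application data (the regex).
SOURCE = '''
def t_NUMBER(t):
    r"\\d+"
    return t
'''


def docstring_at(optimize):
    """Compile SOURCE at the given optimization level; return t_NUMBER.__doc__."""
    namespace = {}
    code = compile(SOURCE, "<ply_rule>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["t_NUMBER"].__doc__


doc_default = docstring_at(0)  # no optimization: docstring kept
doc_O = docstring_at(1)        # like -O: asserts dropped, docstring kept
doc_OO = docstring_at(2)       # like -OO: docstring stripped

print(repr(doc_default), repr(doc_O), repr(doc_OO))
```

At level 2 the regex is simply gone, which is why a library that reads `__doc__` at runtime cannot work under -OO.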


>
> * The data in pyc files
> * Printing the exception highlighting
>
> Printing the exception highlighting can be disabled via combo of
> environment variable / -X option but collecting the data can only be
> disabled by -OO. The reason is that this will end in pyc files
> so when the data is not there, a different kind of pyc files need to be
> produced and I really don't want to have another set of pyc file extension
> just to deactivate this. Notice that also a configure
> time variable won't work because it will cause crashes when reading pyc
> files produced by the interpreter compiled without the flag.
>

I don't think the optional existence of column number information needs a
different kind of pyc file.  Just a flag in a pyc file's header at most.
It isn't a new type of file.
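
As a rough sketch of what I mean, assuming the current 16-byte pyc header layout (4-byte magic, 4-byte little-endian flags word, then 8 more bytes): the bit `HYPOTHETICAL_NO_COLUMNS` below is made up for illustration and is not an existing or proposed flag, and `build_header` / `parse_flags` are just helper names for this example.

```python
import importlib.util
import struct

# Assumption for illustration only: a spare bit in the flags word that would
# mean "this pyc carries no column information". No such flag exists today.
HYPOTHETICAL_NO_COLUMNS = 0b100


def build_header(flags, mtime=0, source_size=0):
    """Build a 16-byte timestamp-style pyc header with the given flags word."""
    return importlib.util.MAGIC_NUMBER + struct.pack(
        "<III", flags, mtime, source_size
    )


def parse_flags(header):
    """Return the 32-bit flags word from a pyc header."""
    if header[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("not a pyc header for this interpreter")
    (flags,) = struct.unpack("<I", header[4:8])
    return flags


header = build_header(HYPOTHETICAL_NO_COLUMNS)
has_columns = not (parse_flags(header) & HYPOTHETICAL_NO_COLUMNS)
print(len(header), has_columns)
```

A reader would check the bit once when loading the file and then either expect or skip the column tables, with the same file extension either way.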


> On Sat, 8 May 2021 at 21:13, Gregory P. Smith <[email protected]> wrote:
>
>>
>>
>> On Sat, May 8, 2021 at 11:58 AM Pablo Galindo Salgado <
>> [email protected]> wrote:
>>
>>> Hi Brett,
>>>
>>> Just to be clear, .pyo files have not existed for a while:
>>>> https://www.python.org/dev/peps/pep-0488/.
>>>
>>>
>>> Whoops, my bad, I wanted to refer to the pyc files that are generated
>>> with -OO, which have the "opt-2" prefix.
>>>
>>> This only kicks in at the -OO level.
>>>
>>>
>>> I will correct the PEP so it reflects this more exactly.
>>>
>>> I personally prefer the idea of dropping the data with -OO since if
>>>> you're stripping out docstrings you're already hurting introspection
>>>> capabilities in the name of memory. Or one could go as far as to introduce
>>>> -Os to do -OO plus dropping this extra data.
>>>
>>>
>>> This is indeed the plan, sorry for the confusion. The opt-out mechanism
>>> is using -OO, precisely as we are already dropping other data.
>>>
>>
>> We can't piggy back on -OO as the only way to disable this, it needs to
>> have an option of its own.  -OO is unusable as code that relies on
>> "doc"strings as application data such as
>> http://www.dabeaz.com/ply/ply.html exists.
>>
>> -gps
>>
>>
>>>
>>> Thanks for the clarifications!
>>>
>>>
>>>
>>> On Sat, 8 May 2021 at 19:41, Brett Cannon <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <
>>>> [email protected]> wrote:
>>>>
>>>>> Although we were originally not sympathetic with it, we may need to
>>>>> offer an opt-out mechanism for those users that care about the impact of
>>>>> the overhead of the new data in pyc files
>>>>> and in in-memory code objects, as was suggested by some folks (Thomas,
>>>>> Yury, and others). For this, we could propose that the functionality will
>>>>> be deactivated along with the extra
>>>>> information when Python is executed in optimized mode (``python -O``)
>>>>> and therefore pyo files will not have the overhead associated with the
>>>>> extra required data.
>>>>>
>>>>
>>>> Just to be clear, .pyo files have not existed for a while:
>>>> https://www.python.org/dev/peps/pep-0488/.
>>>>
>>>>
>>>>> Notice that Python
>>>>> already strips docstrings in this mode so it would be "aligned" with
>>>>> the current mechanism of optimized mode.
>>>>>
>>>>
>>>> This only kicks in at the -OO level.
>>>>
>>>>
>>>>>
>>>>> Although this complicates the implementation, it certainly is still
>>>>> much easier than dealing with compression (and more useful for those that
>>>>> don't want the feature). Notice that we also
>>>>> expect pessimistic results from compression as offsets would be quite
>>>>> random (although predominantly in the range 10 - 120).
>>>>>
>>>>
>>>> I personally prefer the idea of dropping the data with -OO since if
>>>> you're stripping out docstrings you're already hurting introspection
>>>> capabilities in the name of memory. Or one could go as far as to introduce
>>>> -Os to do -OO plus dropping this extra data.
>>>>
>>>> As for .pyc file size, I personally wouldn't worry about it. If someone
>>>> is that space-constrained they either aren't using .pyc files or are only
>>>> shipping a single set of .pyc files under -OO and skipping source code. And
>>>> .pyc files are an implementation detail of CPython so there  shouldn't be
>>>> too much of a concern for other interpreters.
>>>>
>>>> -Brett
>>>>
>>>>
>>>>>
>>>>> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> One last note for clarity: that's the increase of size in the stdlib,
>>>>>> the increase of size
>>>>>> for pyc files goes from 28.471296MB to 34.750464MB, which is an
>>>>>> increase of 22%.
>>>>>>
>>>>>> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> An update on the numbers. We have written a draft implementation
>>>>>>> to corroborate the
>>>>>>> numbers with some more realistic tests, and it seems that our original
>>>>>>> calculations were wrong.
>>>>>>> The actual increase in size is considerably larger than previously
>>>>>>> advertised:
>>>>>>>
>>>>>>> Using bytes object to encode the final object and marshalling that
>>>>>>> to disk (so using uint8_t) as the underlying
>>>>>>> type:
>>>>>>>
>>>>>>> BEFORE:
>>>>>>>
>>>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>>>> ❯ du -h Lib -c --max-depth=0
>>>>>>> 70M     Lib
>>>>>>> 70M     total
>>>>>>>
>>>>>>> AFTER:
>>>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>>>> ❯ du -h Lib -c --max-depth=0
>>>>>>> 76M     Lib
>>>>>>> 76M     total
>>>>>>>
>>>>>>> So that's an increase of 8.56 % over the original value. This is
>>>>>>> storing the start offset and end offset with no compression
>>>>>>> whatsoever.
>>>>>>>
>>>>>>> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi there,
>>>>>>>>
>>>>>>>> We are preparing a PEP and we would like to start some early
>>>>>>>> discussion about one of the main aspects of the PEP.
>>>>>>>>
>>>>>>>> The work we are preparing is to allow the interpreter to produce
>>>>>>>> more fine-grained error messages, pointing to
>>>>>>>> the source associated to the instructions that are failing. For
>>>>>>>> example:
>>>>>>>>
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "test.py", line 14, in <module>
>>>>>>>>     lel3(x)
>>>>>>>>     ^^^^^^^
>>>>>>>>   File "test.py", line 12, in lel3
>>>>>>>>     return lel2(x) / 23
>>>>>>>>            ^^^^^^^
>>>>>>>>   File "test.py", line 9, in lel2
>>>>>>>>     return 25 + lel(x) + lel(x)
>>>>>>>>                 ^^^^^^
>>>>>>>>   File "test.py", line 6, in lel
>>>>>>>>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>>>>>>>                          ^^^^^^^^^^^^^^^^^^^^^
>>>>>>>> TypeError: 'NoneType' object is not subscriptable
>>>>>>>>
>>>>>>>> The cost of this is having the start column number and end
>>>>>>>> column number information for every bytecode instruction
>>>>>>>> and this is what we want to discuss (there is also some stack cost
>>>>>>>> to re-raise exceptions but that's not a big problem in
>>>>>>>> any case). Given that column numbers are not very big compared with
>>>>>>>> line numbers, we plan to store these as unsigned chars
>>>>>>>> or unsigned shorts. We ran some experiments over the standard
>>>>>>>> library and we found that the overhead of all pyc files is:
>>>>>>>>
>>>>>>>> * If we use shorts, the total overhead is ~3% (total size 28MB and
>>>>>>>> the extra size is 0.88 MB).
>>>>>>>> * If we use chars, the total overhead is ~1.5% (total size 28 MB
>>>>>>>> and the extra size is 0.44MB).
>>>>>>>>
>>>>>>>> One of the disadvantages of using chars is that we can only report
>>>>>>>> columns from 1 to 255 so if an error happens in a column
>>>>>>>> bigger than that then we would have to exclude it (and not show the
>>>>>>>> highlighting) for that frame. Unsigned short will allow
>>>>>>>> the values to go from 0 to 65535.
>>>>>>>>
>>>>>>>> Unfortunately these numbers are not easily compressible, as every
>>>>>>>> instruction would have very different offsets.
>>>>>>>>
>>>>>>>> There is also the possibility of not doing this based on some build
>>>>>>>> flag or when using -O to allow users to opt out. However, these
>>>>>>>> numbers can be quite useful to other tools such as coverage
>>>>>>>> measuring tools, tracers and profilers, so adding conditional logic
>>>>>>>> in many places would complicate the implementation considerably and
>>>>>>>> would potentially reduce the usability of those tools. We therefore
>>>>>>>> prefer not to have the conditional logic. We believe this extra cost
>>>>>>>> is very much worth the better error reporting, but we understand and
>>>>>>>> respect other points of view.
>>>>>>>>
>>>>>>>> Does anyone see a better way to encode this information **without
>>>>>>>> complicating the implementation a lot**? What are people's thoughts
>>>>>>>> on the feature?
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>>
>>>>>>>> Regards from cloudy London,
>>>>>>>> Pablo Galindo Salgado
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>> Python-Dev mailing list -- [email protected]
>>>>> To unsubscribe send an email to [email protected]
>>>>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>>>>> Message archived at
>>>>> https://mail.python.org/archives/list/[email protected]/message/JUXUC7TYPAMB4EKW6HJL77ORDYQRJEFG/
>>>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>>>
>>>
>>
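
For the storage trade-off quoted above (unsigned chars versus unsigned shorts for the start and end columns), a minimal sketch of the idea follows. The function `pack_columns`, the sample offsets, and the choice of 0 as an "unknown column" sentinel are assumptions made for illustration; the draft implementation may encode things differently.

```python
import struct


def pack_columns(pairs, typecode):
    """Pack (start, end) column pairs; 'B' = unsigned char, 'H' = unsigned short."""
    limit = 255 if typecode == "B" else 65535
    out = bytearray()
    for start, end in pairs:
        if start > limit or end > limit:
            # Assumed sentinel: drop the highlighting for this instruction.
            start = end = 0
        out += struct.pack("<" + typecode * 2, start, end)
    return bytes(out)


# Made-up offsets, one pair per bytecode instruction; the last is past column 255.
pairs = [(4, 11), (11, 18), (16, 22), (25, 300)]

as_chars = pack_columns(pairs, "B")
as_shorts = pack_columns(pairs, "H")
print(len(as_chars), len(as_shorts))
```

Chars cost two bytes per instruction and lose any column past 255; shorts cost four bytes per instruction and keep it, which matches the roughly 2x difference in the overhead figures quoted above.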

_______________________________________________
Python-Dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/NUEPEZYYGBS653ECF2HYCUPC4VOWC5TC/
Code of Conduct: http://python.org/psf/codeofconduct/
