[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

Gregory P. Smith Fri, 07 May 2021 15:24:51 -0700

On Fri, May 7, 2021 at 2:50 PM Pablo Galindo Salgado <[email protected]>
wrote:


> Hi there,
>
> We are preparing a PEP and we would like to start some early discussion
> about one of the main aspects of the PEP.
>
> The work we are preparing will allow the interpreter to produce more
> fine-grained error messages, pointing to
> the source associated with the instructions that are failing. For example:
>
> Traceback (most recent call last):
>   File "test.py", line 14, in <module>
>     lel3(x)
>     ^^^^^^^
>   File "test.py", line 12, in lel3
>     return lel2(x) / 23
>            ^^^^^^^
>   File "test.py", line 9, in lel2
>     return 25 + lel(x) + lel(x)
>                 ^^^^^^
>   File "test.py", line 6, in lel
>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>                          ^^^^^^^^^^^^^^^^^^^^^
> TypeError: 'NoneType' object is not subscriptable
>
>
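To make the proposed output concrete: once the start and end column of the failing instruction are known, rendering the marker line is just padding plus carets. A minimal sketch, assuming 0-based, end-exclusive character offsets (the email does not fix the convention, and the real implementation may store byte offsets instead); `caret_line` is a hypothetical helper, not a CPython API:

```python
def caret_line(source_line: str, start_col: int, end_col: int) -> str:
    """Return a marker line underlining source_line[start_col:end_col].

    Assumes 0-based, end-exclusive column offsets (an assumption of this
    sketch, not something the proposal specifies).
    """
    if not (0 <= start_col < end_col <= len(source_line)):
        raise ValueError("column range outside the source line")
    # Pad up to the start column, then one caret per highlighted character.
    return " " * start_col + "^" * (end_col - start_col)


line = "    return lel2(x) / 23"
start = line.index("lel2(x)")
end = start + len("lel2(x)")
print(line)
print(caret_line(line, start, end))  # carets sit under "lel2(x)"
```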
An additional cost of this is that tools which parse text tracebacks may not
know how to handle the new lines, and anything that logs tracebacks will
generate additional output.  As part of this we should provide a way for
people to disable the feature for a process while they address tooling and
logging issues (via the usual set: a command-line flag, an environment
variable, and a runtime API).
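As an illustration of the tooling cost, a log processor that only understands the old two-lines-per-frame layout could drop the new marker lines before parsing. This is a hypothetical sketch (`strip_error_markers` is not an existing API), and treating any caret-only line as a marker is an assumed heuristic that real tooling may need to tighten:

```python
import re

# A line made only of whitespace and one or more '^' characters.
# Hypothetical heuristic: bare '^' lines are not valid Python source.
_MARKER_LINE = re.compile(r"^\s*\^+\s*$")


def strip_error_markers(traceback_text: str) -> str:
    """Remove caret-only marker lines, keeping all other lines verbatim."""
    kept = [
        line
        for line in traceback_text.splitlines(keepends=True)
        if not _MARKER_LINE.match(line)
    ]
    return "".join(kept)


sample = (
    "Traceback (most recent call last):\n"
    '  File "test.py", line 12, in lel3\n'
    "    return lel2(x) / 23\n"
    "           ^^^^^^^\n"
    "TypeError: 'NoneType' object is not subscriptable\n"
)
print(strip_error_markers(sample))
```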

> The cost of this is having the start column number and end column number
> information for every bytecode instruction,
> and this is what we want to discuss (there is also some stack cost to
> re-raise exceptions, but that's not a big problem in
> any case). Given that column numbers are not very big compared with line
> numbers, we plan to store these as unsigned chars
> or unsigned shorts. We ran some experiments over the standard library and
> found that the overhead across all pyc files is:
>
> * If we use shorts, the total overhead is ~3% (total size 28 MB, extra
> size 0.88 MB).
> * If we use chars, the total overhead is ~1.5% (total size 28 MB, extra
> size 0.44 MB).
>
> One disadvantage of using chars is that we can only report columns
> from 1 to 255, so if an error happens in a column
> beyond that we would have to exclude it (and not show the
> highlighting) for that frame. Unsigned shorts allow
> values from 0 to 65535.
>
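The two measurements above are consistent with each other: assuming exactly two column values (start and end) per instruction and counting MB as 10**6 bytes, both imply the same number of instructions, so switching from shorts to chars halves the overhead. A back-of-the-envelope check (`implied_instructions` is a hypothetical helper):

```python
# Figures quoted above, with "MB" taken as 10**6 bytes (an assumption).
TOTAL_PYC_BYTES = 28_000_000
EXTRA_WITH_SHORTS = 880_000
EXTRA_WITH_CHARS = 440_000


def implied_instructions(extra_bytes: int, bytes_per_column: int) -> int:
    """Instructions implied by the overhead, assuming two columns each."""
    return extra_bytes // (2 * bytes_per_column)


print(implied_instructions(EXTRA_WITH_SHORTS, 2))     # 220000
print(implied_instructions(EXTRA_WITH_CHARS, 1))      # 220000
print(f"{EXTRA_WITH_SHORTS / TOTAL_PYC_BYTES:.2%}")   # 3.14%
print(f"{EXTRA_WITH_CHARS / TOTAL_PYC_BYTES:.2%}")    # 1.57%
```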

Neither of those is large.  I'd lean towards uint8_t instead of uint16_t:
not even humans can understand a 255-character line, so why bother being
pretty about such a thing?  Just document the caveat and move on with the
lower value.  A future pyc format could change it if a compelling argument
were ever found.
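A sketch of the uint8_t option with its documented caveat: pack the start and end columns as two unsigned bytes and fall back to "no highlighting" when a column does not fit. The function names are hypothetical, and this is not the layout CPython actually writes to pyc files:

```python
import struct
from typing import Optional, Tuple

UINT8_MAX = 255


def pack_columns_u8(start: int, end: int) -> Optional[bytes]:
    """Pack (start, end) as two unsigned bytes.

    Returns None when either column does not fit in a byte; the caller
    would then print that frame without the caret line.
    """
    if 0 <= start <= end <= UINT8_MAX:
        return struct.pack("BB", start, end)
    return None


def unpack_columns_u8(data: bytes) -> Tuple[int, int]:
    start, end = struct.unpack("BB", data)
    return start, end
```

The caveat is visible in the `None` branch: a 300-column line simply loses its highlighting rather than growing every entry to two bytes per column.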


> Unfortunately these numbers are not easily compressible, as every
> instruction would have very different offsets.
>
> There is also the possibility of not doing this based on some build flag
> or when using -O, to allow users to opt out, but given that
> these numbers can be quite useful to other tools like coverage
> measuring tools, tracers, profilers and the like, adding conditional
> logic to many places would complicate the implementation considerably and
> would potentially reduce the usability of those tools, so we prefer
> not to have the conditional logic. We believe this extra cost is very
> much worth the better error reporting, but we understand and respect
> other points of view.
>
> Does anyone see a better way to encode this information **without
> complicating the implementation a lot**? What are people's thoughts on the
> feature?
>

A compromise if you want to handle longer lines: a single uint16_t.
Represent the start column in 9 bits and the width in the other 7 bits (or
any variation thereof); it's all a matter of what tradeoff you want to make
for space reasons.  Encoding as start + width instead of start + end is
likely better anyway if you care about compression, as the width will
usually be small and thus friendlier to compression.  I'd personally ignore
compression entirely.
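The 9/7-bit split can be sketched as follows. This assumes the start column goes in the high 9 bits and the width in the low 7 bits (one of the possible "variations"); the helper names are hypothetical:

```python
from typing import Optional, Tuple

START_BITS = 9
WIDTH_BITS = 7
MAX_START = (1 << START_BITS) - 1  # 511
MAX_WIDTH = (1 << WIDTH_BITS) - 1  # 127


def pack_start_width(start: int, width: int) -> Optional[int]:
    """Pack a start column and a width into one 16-bit value.

    Returns None when either field is out of range (no highlighting).
    """
    if not (0 <= start <= MAX_START and 0 <= width <= MAX_WIDTH):
        return None
    return (start << WIDTH_BITS) | width


def unpack_start_width(packed: int) -> Tuple[int, int]:
    """Inverse of pack_start_width; end column is start + width."""
    return packed >> WIDTH_BITS, packed & MAX_WIDTH
```

With this layout starts up to 511 and widths up to 127 fit, and the largest packed value, (511 << 7) | 127, is exactly 0xFFFF, so it always fits a uint16_t. Anything out of range would lose its highlighting, just as with the uint8_t option.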

Overall doing this is going to be a big win for developer productivity!

-Greg

_______________________________________________
Python-Dev mailing list -- [email protected]
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/ULNDFY5CWVDELNPE6S4HY5SDAODOT7DC/
