Hi,
Apologies for my tardiness in doing this, but no one explicitly said it
was too late to critique PEP 657...
Critique of PEP 657 (Include Fine Grained Error Locations in Tracebacks)
------------------------------------------------------------------------
First of all I want to say that I support the goal of improving error
messages. IMO the PEP should be "accepted in principle". I think all of
the issues below can be fixed while still supporting the general aims of
the PEP.
The change from points to ranges as locations
---------------------------------------------
Because Python is a procedural language, there is an expectation that
code executes in a certain order. PEP 626 (Precise line numbers for
debugging and other tools) seeks to guarantee that that expectation is met.
PEP 657 describes how locations for exceptions are to be handled, but is
vague on the treatment of locations for tracing, profiling and debugging.
PEP 657 proposes that locations for exceptions be treated as ranges,
whereas tracing, profiling and debugging currently treat locations as
points.
Either this will end in contradictions and confusion should those
locations disagree, or the locations for tracing, profiling and
debugging must change.
Using the start of a range as the point location for tracing may be
misleading when the operation that causes an exception is on a different
line within that range.
Consider this example:
https://github.com/python/cpython/blob/main/Lib/test/test_compile.py#L861
This might seem like a contrived case, but it is based on a real bug
report https://bugs.python.org/issue39316
1. def load_method():
2. return (
3. o.
4. m(
5. 0
6. )
Currently the call is traced on line 4.
PEP 657 would change the location of the call from line 4 to the range
3-6, which would mean that the line of call is no longer traced
separately (or traced several times). PEP 657 makes no mention of this
change.
The PEP claims that these changes are improvements. Maybe they are, but
they are quite impactful changes which the PEP glosses over. The impact
on tools like coverage.py and debuggers should be made clearer. For
example, how would one set a breakpoint on line 4 above?
There are other languages (e.g. jinja templates) that compile to Python
AST and bytecode. These *might* produce locations that overlap, but are
not nested. The behavior of tracing and debuggers needs to be described
for those locations.
Backwards Compatibility
-----------------------
PEP 657 claims it is fully backwards compatible, but it cannot be both
backwards compatible and consistent.
There are fundamental differences between using ranges and points as
locations.
Impact on startup time
----------------------
The PEP 657 suggests the impact on startup would be negligible. That is
not quite true. The impact on startup is probably acceptable, but a
proper analysis needs to be made.
The increase in size of pyc files ~20% puts an upper bound on the
increase of startup time, but I would expect it to be much less than
that as loading files from disk is only a fraction of startup.
Currently, startup is dominated by inefficiencies in interpreter
creation, unmarshalling and module loading.
We plan to reduce these a lot for 3.11, so that the impact of PEP 657 on
startup will be larger (as a ratio) than experiments with 3.10 suggest.
The API
-------
The C API adds three new functions, one each for the end line, start
column and end column.
This is either slow, as any compressed table needs to be parsed four
times, or space inefficient using an uncompressed table.
Opt-out
-------
Allowing opt-out prevents consistent compression of location data,
resulting in larger pyc files for those that do not opt-out.
The exact semantics, in terms of error formatting, tracing, etc is not
described should the user opt-out.
Summary
-------
Overall, there is nothing that blocks acceptance of the PEP in
principle, but there are quite a few issues that need resolving.
Suggestions
-----------
1. Clarify, in detail, the impact on line-based tools like profilers,
coverage.py and debuggers. This should include help on how to use the
new APIs and where using the old APIs might result in behavioral changes.
2. Change the C API to a single function:
int PyCode_Addr2Location(PyCodeObject *co, int addr, int *startline, int
*startcolumn, int *endline, int *endcolumn)
3. Drop the opt-out option.
If the extra information is optional, then the compression scheme must
allow for that; making the code more complex and potentially less
efficient. Does opting out use the start of the range, or the old line,
as the location?
4. Drop the limitation on column offsets.
The data needs to be compressed anyway, so allowing arbitrary column
offsets is effectively free.
6. Store all location information in a single table (this applies more
to the implementation than the PEP)
Using four separate objects to hold the location info adds a lot of
overhead for most functions.
Cheers,
Mark.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/XNSFU7NTF3EWFQJEGTLCIFNX23BCF7QR/
Code of Conduct: http://python.org/psf/codeofconduct/