Hi,

Apologies for my tardiness in doing this, but no one explicitly said it was too late to critique PEP 657...


Critique of PEP 657 (Include Fine Grained Error Locations in Tracebacks)
------------------------------------------------------------------------

First of all I want to say that I support the goal of improving error messages. IMO the PEP should be "accepted in principle". I think all of the issues below can be fixed while still supporting the general aims of the PEP.


The change from points to ranges as locations
---------------------------------------------

Because Python is a procedural language, there is an expectation that code executes in a certain order. PEP 626 (Precise line numbers for debugging and other tools) seeks to guarantee that that expectation is met.

PEP 657 describes how locations for exceptions are to be handled, but is vague on the treatment of locations for tracing, profiling and debugging.

PEP 657 proposes that locations for exceptions be treated as ranges, whereas tracing, profiling and debugging currently treat locations as points.

Either this will end in contradictions and confusion should those locations disagree, or the locations for tracing, profiling and debugging must change.

Using the start of a range as the point location for tracing may be misleading when the operation that causes an exception is on a different line within that range.

Consider this example:
https://github.com/python/cpython/blob/main/Lib/test/test_compile.py#L861

This might seem like a contrived case, but it is based on a real bug report https://bugs.python.org/issue39316

1.  def load_method():
2.      return (
3.          o.
4.          m(
5.              0
6.          )

Currently the call is traced on line 4.

PEP 657 would change the location of the call from line 4 to the range 3-6, which would mean that the line of call is no longer traced separately (or traced several times). PEP 657 makes no mention of this change.

The PEP claims that these changes are improvements. Maybe they are, but they are quite impactful changes which the PEP glosses over. The impact on tools like coverage.py and debuggers should be made clearer. For example, how would one set a breakpoint on line 4 above?

There are other languages (e.g. jinja templates) that compile to Python AST and bytecode. These *might* produce locations that overlap, but are not nested. The behavior of tracing and debuggers needs to be described for those locations.

Backwards Compatibility
-----------------------

PEP 657 claims it is fully backwards compatible, but it cannot be both backwards compatible and consistent. There are fundamental differences between using ranges and points as locations.

Impact on startup time
----------------------

The PEP 657 suggests the impact on startup would be negligible. That is not quite true. The impact on startup is probably acceptable, but a proper analysis needs to be made.

The increase in size of pyc files ~20% puts an upper bound on the increase of startup time, but I would expect it to be much less than that as loading files from disk is only a fraction of startup.

Currently, startup is dominated by inefficiencies in interpreter creation, unmarshalling and module loading. We plan to reduce these a lot for 3.11, so that the impact of PEP 657 on startup will be larger (as a ratio) than experiments with 3.10 suggest.

The API
-------

The C API adds three new functions, one each for the end line, start column and end column. This is either slow, as any compressed table needs to be parsed four times, or space inefficient using an uncompressed table.

Opt-out
-------

Allowing opt-out prevents consistent compression of location data, resulting in larger pyc files for those that do not opt-out. The exact semantics, in terms of error formatting, tracing, etc is not described should the user opt-out.

Summary
-------

Overall, there is nothing that blocks acceptance of the PEP in principle, but there are quite a few issues that need resolving.


Suggestions
-----------

1. Clarify, in detail, the impact on line-based tools like profilers, coverage.py and debuggers. This should include help on how to use the new APIs and where using the old APIs might result in behavioral changes.

2. Change the C API to a single function:
int PyCode_Addr2Location(PyCodeObject *co, int addr, int *startline, int *startcolumn, int *endline, int *endcolumn)

3. Drop the opt-out option.
If the extra information is optional, then the compression scheme must allow for that; making the code more complex and potentially less efficient. Does opting out use the start of the range, or the old line, as the location?

4. Drop the limitation on column offsets.
The data needs to be compressed anyway, so allowing arbitrary column offsets is effectively free.

6. Store all location information in a single table (this applies more to the implementation than the PEP) Using four separate objects to hold the location info adds a lot of overhead for most functions.


Cheers,
Mark.

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XNSFU7NTF3EWFQJEGTLCIFNX23BCF7QR/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to