On Sun, May 9, 2021 at 9:13 AM Antoine Pitrou <anto...@python.org> wrote:

> On Sun, 09 May 2021 02:16:02 -0000
> "Jim J. Jewett" <jimjjew...@gmail.com> wrote:
> > Antoine Pitrou wrote:
> > > On Sat, 8 May 2021 02:58:40 +0000
> > > Neil Schemenauer nas-pyt...@arctrix.com wrote:
> >
> > > > It would be cool if we could mmap the pyc files and have the VM run
> > > > code without an unmarshal step.
> > > What happens if another process mutates or truncates the file while the
> > > CPython VM is executing code from the mapped file?  Crash?
> >
> > Why would this be any different than whatever happens now?
>
> What happens now is that the pyc file is transferred at once to memory
> using regular IO.  So the chance is really slim that you read invalid
> data due to concurrent mutation.
>
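
For contrast with mmap, here is a minimal sketch of the read-everything-up-front approach, using only stdlib calls. It assumes the 16-byte pyc header used since Python 3.7 (PEP 552); `load_code_from_pyc` and `demo_roundtrip` are hypothetical helpers, not CPython's actual import path (which lives in `importlib._bootstrap_external`).

```python
import importlib.util
import marshal
import pathlib
import py_compile
import tempfile

# Since Python 3.7 (PEP 552) a pyc starts with a 16-byte header:
# magic number, flags, then either (mtime, source size) or a source hash.
PYC_HEADER_SIZE = 16


def load_code_from_pyc(pyc_path):
    """Read the whole pyc with one regular read, then unmarshal the code object.

    Once read_bytes() returns, the bytes we execute live in our own memory,
    so later writes to (or truncation of) the file on disk cannot affect them.
    With an mmap, the same bytes would still be backed by the file itself.
    """
    data = pathlib.Path(pyc_path).read_bytes()
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ImportError(f"bad magic number in {pyc_path}")
    return marshal.loads(data[PYC_HEADER_SIZE:])


def demo_roundtrip():
    """Compile a tiny module to a pyc, load it back and run it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp, "mod.py")
        src.write_text("ANSWER = 6 * 7\n")
        pyc = py_compile.compile(
            str(src), cfile=str(pathlib.Path(tmp, "mod.pyc")), doraise=True
        )
        namespace = {}
        exec(load_code_from_pyc(pyc), namespace)
        return namespace["ANSWER"]


print(demo_roundtrip())  # 42
```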

Concurrent mutation isn't even what I was talking about.  We don't protect
against that today, as it isn't a concern.  But on the bulk of systems
where this would ever matter, POSIX semantics mean software updates are done
by moving new files into place, because that is an atomic inode change.  So
the existing open file already in the process of being read is not changed,
but as soon as you do a new open call on the pathname, you get a different
file than the last time that path was opened.
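
The rename-over behaviour can be observed with the standard library alone. A minimal sketch, assuming a POSIX system (on Windows, `os.replace` onto a file that is still open typically fails instead); the file names and `demo_replace_semantics` are made up for the demo.

```python
import os
import tempfile


def demo_replace_semantics():
    """Return (bytes read via an already-open handle, bytes read via a fresh open)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "module.pyc")
        with open(path, "wb") as f:
            f.write(b"old contents")

        # The running process opened the file before the update happened.
        old_handle = open(path, "rb")
        try:
            # An updater writes the new version next to the old one and then
            # renames it over the old path in a single atomic rename(2).
            staging = path + ".new"
            with open(staging, "wb") as f:
                f.write(b"new contents")
            os.replace(staging, path)

            # The old handle still refers to the original inode.
            via_old_handle = old_handle.read()
        finally:
            old_handle.close()

        # Opening the same pathname again yields the replacement file.
        with open(path, "rb") as f:
            via_fresh_open = f.read()

    return via_old_handle, via_fresh_open


print(demo_replace_semantics())  # (b'old contents', b'new contents') on POSIX
```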

This is not theoretical.  I've seen production problems result from code
(zipimport - https://bugs.python.org/issue19081) making the incorrect
assumption that it can reopen a file it has read once at a later point in
time.  So if we do open files later, we must code defensively and assume
they might not contain what we thought.
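
One way to code defensively, sketched under the assumption that remembering the file's identity (device, inode, size, mtime) at first read is enough to notice a swapped file. `FileIdentity`, `identity_of`, `reopen_checked`, `FileChangedError` and `demo_stale_reopen` are hypothetical names for this illustration, not part of zipimport or any stdlib module.

```python
import os
import tempfile
from typing import NamedTuple


class FileChangedError(OSError):
    """Raised when a path no longer refers to the file we first read."""


class FileIdentity(NamedTuple):
    st_dev: int
    st_ino: int
    st_size: int
    st_mtime_ns: int


def identity_of(st: os.stat_result) -> FileIdentity:
    return FileIdentity(st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)


def reopen_checked(path, expected: FileIdentity):
    """Reopen `path` for reading, refusing to hand back a different file."""
    f = open(path, "rb")
    try:
        # fstat the handle we actually opened, not the path, so a rename
        # between "check" and "open" cannot slip past us.
        actual = identity_of(os.fstat(f.fileno()))
        if actual != expected:
            raise FileChangedError(f"{path} changed since first read: {expected} -> {actual}")
    except BaseException:
        f.close()
        raise
    return f


def demo_stale_reopen():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bundle.zip")
        with open(path, "wb") as f:
            f.write(b"version 1")

        # First read: remember which file we saw.
        with open(path, "rb") as f:
            f.read()
            seen = identity_of(os.fstat(f.fileno()))

        # A software update renames a new file (same size!) over the old path.
        staging = path + ".new"
        with open(staging, "wb") as f:
            f.write(b"version 2")
        os.replace(staging, path)

        try:
            with reopen_checked(path, seen) as f:
                return f.read()
        except FileChangedError:
            return None  # caller must fall back instead of trusting stale offsets


print(demo_stale_reopen())  # None: the reopen was refused
```

Checking the inode catches the update above even though both versions have the same size.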

We already have this problem with source code lines displayed in tracebacks
today, as those are read on demand.  But as that is debugging information
only, the wrong source lines being shown next to the filename + line number
in a traceback is something people just learn to ignore in these
situations.  We have the data to prevent this; we just never have.  I filed
https://bugs.python.org/issue44091 to track that.
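
The pyc header already records what the bytecode was built from, which is the kind of data such a check could use. A minimal sketch, assuming the PEP 552 header layout (Python 3.7+); `pyc_matches_source` is a hypothetical helper, and this is not how the traceback module behaves today (it reads lines through `linecache` without consulting the pyc).

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def pyc_matches_source(pyc_path, source_path):
    """Does `source_path` still match the source `pyc_path` was compiled from?

    PEP 552 header (16 bytes):
      0-3    magic number
      4-7    flags, little-endian uint32 (bit 0 set: hash-based pyc)
      8-15   timestamp pyc: source mtime and size, little-endian uint32 each
             hash-based pyc: 8-byte hash of the source bytes
    """
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        return False
    (flags,) = struct.unpack("<I", header[4:8])
    if flags & 0b1:
        with open(source_path, "rb") as f:
            return importlib.util.source_hash(f.read()) == header[8:16]
    mtime, size = struct.unpack("<II", header[8:16])
    st = os.stat(source_path)
    # The header stores both values truncated to 32 bits.
    return (int(st.st_mtime) & 0xFFFFFFFF, st.st_size & 0xFFFFFFFF) == (mtime, size)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "m.py")
    with open(src, "w") as f:
        f.write("x = 1\n")
    pyc = py_compile.compile(
        src, cfile=os.path.join(tmp, "m.pyc"), doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    print(pyc_matches_source(pyc, src))  # True
    with open(src, "w") as f:  # the "update": different contents and size
        f.write("x = 1234567\n")
    print(pyc_matches_source(pyc, src))  # False
```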

Given this context, M.-A. Lemburg's alternative idea could have some merit,
as it would synchronize our source skew behavior with our additional
debugging information behavior.  My initial reaction, though, is that it
falls into the trap of bundling too much into one place.

Quoting M.-A. Lemburg:
> Create a new file format which supports enhanced debugging. This
> would include the source code in an indexed format, the AST and
> mappings between byte code, AST node, lines and columns.
>
> Python would then only use and load this file when it needs
> to print a traceback - much like it does today with the source
> code.
>
> The advantage is that you can add even more useful information
> for debugging while not making the default code distribution
> format take more memory (both disk and RAM).

Realistically, this is going to take more disk space in the common case,
because in addition to the py, pyc, pyc.opt-1 and pyc.opt-2 files that some
distros apparently all include today, there'd be a new pyc.debuginfo to go
alongside them.  The only benefit is that it isn't resident in RAM.  And
someone *could* choose to filter these out of their distro or container or
whatever-the-heck-their-package-format-is.  But I really doubt that'll be
the default.

Not having debugging information when you're trying to hunt down and
reproduce a problem that only happens once in a blue moon is extraordinarily
frustrating.  That is why people who value engineering time deploy with
debugging info.

There are environments where people intentionally do not deploy source
code, but do want to get debugging data from tracebacks that they can later
correlate with their sources for analysis (they track exactly which versions
of pycs, built from which versions of sources, were deployed).  It'd be a
shame to exclude column information for this scenario.
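
For the no-source deployment case, a minimal sketch of what can still be logged: the filename, line and, on Python 3.11+ (where PEP 657 added column positions to code objects), the column span all come from the code object rather than from re-reading the .py file. The deployed module is simulated by compiling a string under a filename that does not exist on disk; `deployed/calc.py`, `frame_locations` and `run_without_source` are hypothetical.

```python
import sys
import traceback

# Stand-in for a module deployed without its source: compiled from a string
# under a filename that does not exist on disk, so no .py can be re-read.
SOURCE = "def divide(a, b):\n    return a / b\n"


def frame_locations(exc):
    """(filename, lineno, colno, end_colno) for every frame in exc's traceback.

    FrameSummary only has colno/end_colno on Python 3.11+; on older versions
    getattr falls back to None.
    """
    return [
        (fs.filename, fs.lineno, getattr(fs, "colno", None), getattr(fs, "end_colno", None))
        for fs in traceback.extract_tb(exc.__traceback__)
    ]


def run_without_source():
    namespace = {}
    exec(compile(SOURCE, "deployed/calc.py", "exec"), namespace)
    try:
        namespace["divide"](1, 0)
    except ZeroDivisionError as exc:
        return frame_locations(exc)
    raise AssertionError("expected ZeroDivisionError")


if sys.version_info < (3, 11):
    print("note: column numbers require Python 3.11+")
for location in run_without_source():
    print(location)
# On 3.11+ the last line is ('deployed/calc.py', 2, 11, 16), which points at
# "a / b" once matched against the exact source version that was deployed.
```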

-gps
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6E7UZ5SFUAADUJUQ6DKPJIGO6CCGCNFU/
Code of Conduct: http://python.org/psf/codeofconduct/
