On Sun, May 9, 2021 at 9:13 AM Antoine Pitrou <anto...@python.org> wrote:
> On Sun, 09 May 2021 02:16:02 -0000
> "Jim J. Jewett" <jimjjew...@gmail.com> wrote:
> > Antoine Pitrou wrote:
> > > On Sat, 8 May 2021 02:58:40 +0000
> > > Neil Schemenauer nas-pyt...@arctrix.com wrote:
> > > > It would be cool if we could mmap the pyc files and have the VM run
> > > > code without an unmarshal step.
> > > What happens if another process mutates or truncates the file while
> > > the CPython VM is executing code from the mapped file? Crash?
> >
> > Why would this be any different than whatever happens now?
>
> What happens now is that the pyc file is transferred at once to memory
> using regular IO. So the chance is really slim that you read invalid
> data due to concurrent mutation.

Concurrent mutation isn't even what I was talking about. We don't
protect against that today, as it isn't a concern. But on the bulk of
systems where this would ever matter, software updates are done the
POSIX way, by moving new files into place, because that is an atomic
inode change. So the existing open file that is already in the process
of being read is not changed. But as soon as you do a new open call on
the pathname, you get a different file than the last time that path was
opened.

This is not theoretical. I've seen production problems as a result of
code (zipimport, https://bugs.python.org/issue19081) making the
incorrect assumption that it can reopen, at a later point in time, a
file that it has already read once.

So if we do open files later, we must code defensively and assume they
might not contain what we thought.

We already have this problem with source code lines displayed in
tracebacks today, as those are read on demand. But as that is debugging
information only, the wrong source lines being shown next to the
filename + line number in a traceback is something people just learn to
ignore in these situations. We have the data to prevent this; we just
never have. Filed https://bugs.python.org/issue44091 to track that.

Given this context, M.-A.
Lemburg's alternative idea could have some merit, as it would
synchronize our source skew behavior with our additional debugging
information behavior. My initial reaction is that it's falling into the
trap of bundling too much into one place, though.

Quoting M.-A. Lemburg:

> Create a new file format which supports enhanced debugging. This
> would include the source code in an indexed format, the AST and
> mappings between byte code, AST node, lines and columns.
>
> Python would then only use and load this file when it needs
> to print a traceback - much like it does today with the source
> code.
>
> The advantage is that you can add even more useful information
> for debugging while not making the default code distribution
> format take more memory (both disk and RAM).

Realistically: this is going to take more disk space in the common
case, because in addition to the py, pyc, pyc.opt-1 and pyc.opt-2 files
that some distros apparently include all of today, there'd be a new
pyc.debuginfo to go alongside them. The only benefit is that it isn't
resident in RAM. And someone *could* choose to filter these out of
their distro or container or whatever-the-heck-their-package-format-is.
But I really doubt that'll be the default.

Not having debugging information for a problem that you're trying to
hunt down and reproduce, but that only happens once in a blue moon, is
extraordinarily frustrating. Which is why people who value engineering
time deploy with debugging info.

There are environments where people intentionally do not deploy source
code, but do want to get debugging data from tracebacks that they can
then correlate to their sources later for analysis (they're tracking
exactly which versions of pycs from which versions of sources were
deployed). It'd be a shame to exclude column information for this
scenario.

-gps
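
For anyone who wants to see the rename-into-place behavior described
earlier in this message, here is a minimal sketch. It assumes a POSIX
system (on Windows, replacing a file that is still open typically
fails), and the file names and the demo_rename_into_place() helper are
made up for illustration; none of this is CPython's import code.

```python
import os
import tempfile


def demo_rename_into_place():
    """Return (text seen via the old handle, text seen via a fresh open)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "module.txt")        # hypothetical installed file
        staged = os.path.join(tmp, "module.txt.new")  # hypothetical staged update

        with open(path, "w") as f:
            f.write("version 1\n")

        # A long-running process opens the file but has not read it yet.
        old_handle = open(path)
        try:
            # A software update writes the new version next to the old one,
            # then renames it over the original path in a single step.
            with open(staged, "w") as f:
                f.write("version 2\n")
            os.replace(staged, path)

            # The open handle still refers to the original inode.
            seen_by_old_handle = old_handle.read()
        finally:
            old_handle.close()

        # A fresh open of the same pathname resolves to the new inode.
        with open(path) as f:
            seen_by_new_open = f.read()

    return seen_by_old_handle, seen_by_new_open


if __name__ == "__main__":
    old, new = demo_rename_into_place()
    print("already-open handle:", old.strip())
    print("fresh open of path: ", new.strip())
```

The handle that was opened before the update keeps reading the old
contents, while reopening the path returns the updated file. That gap
is why anything that reopens a path later has to assume the contents
may no longer match what was loaded originally.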
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/6E7UZ5SFUAADUJUQ6DKPJIGO6CCGCNFU/
Code of Conduct: http://python.org/psf/codeofconduct/