Hello Developers, I am trying to debug and fix a particularly vexing problem in Python that manifests on Ubuntu in several different ways. I have a hypothesis about the problem, but there are still some mysteries and I don't know how to reproduce it. I think I can fix it, but I'm sending this message (and soon, another one to python-dev with more technical details) in the hopes that you might have other ideas about how it can happen, or have a reliable way to reproduce the bug.
The problem can show up in any package, but Brian has started to collect a number of bugs that all seem to be related (and I think Steve is going to open a megabug to dupe them all to). The common way this manifests is a traceback on an import statement. The actual error can be a "ValueError: bad marshal data (unknown type code)" such as in LP: #1010077, or an "EOFError: EOF read where not expected" as in LP: #1060842. We have many more instances of both of these.

Both of these exceptions come from Python's marshal code (marshal.c). marshal is the low-level serialization protocol used to cache Python byte code into .pyc files, so both exceptions imply corrupt .pyc files, and in fact the workaround is always essentially to blow away the .pyc file and re-create it. (Various techniques can be used, but they all boil down to the same thing.)

Another commonality is that this bug -- so far -- has not been observed in any Python 3.3 code, only 3.2 and earlier, including 2.7 and 2.6. If this holds up, it's a crucial clue, because the import machinery was flipped over to the pure-Python importlib in 3.3, and this includes an atomic rename of the .pyc file during write. All earlier versions of Python used a C implementation of import, which opens the .pyc file exclusively (O_EXCL|O_CREAT) but does *not* do an atomic rename.

This leads me to hypothesize that the bug is due to an as yet unidentified race condition during installation of Python source code, which is normally when we automatically byte compile the source to .pyc files. This can happen at package installation/upgrade time, or during a ubiquity run during a fresh install. In each of these cases there *should* be only one process attempting to write the .pyc, but my guess is that for some reason multiple processes are trying to do this, resulting in truncated or otherwise bogus .pyc files.
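For illustration, the write-then-rename pattern mentioned above can be sketched roughly as follows. This is only a minimal sketch of the general technique, not the actual importlib code; the function name write_atomic is made up for this example, and it assumes Python 3.3 or later for os.replace (on earlier versions, os.rename plays the same role on POSIX).

```python
import os
import tempfile


def write_atomic(path, data):
    """Write bytes to path so that readers never observe a partial file.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live in the same directory (same file
    # system) as the target, otherwise the rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        # Atomically swap the finished file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

With this pattern, a concurrent reader sees either the old complete file or the new complete file, never a truncated one, even if two writers race; the last rename simply wins.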
Even in Python < 3.3, it should not be possible to corrupt a .pyc when only a single process is involved, due to the import lock and/or GIL. The exclusive open of the .pyc file is clearly not enough of a protection in a multiprocess situation.

I think the list of errors we've seen is too extensive to chalk up to a hardware bug, and I think the systems involved are modern enough to not be subject to file system data loss. There could be a missing fsync somewhere though that might be involved. I think it's doubtful that buggy remote file systems (e.g. NFSv2) are involved. I could be wrong about any of that.

I have not succeeded in writing a standalone reproducer using Python 2.7. So, the mystery is: what process on Ubuntu is exploiting holes in the exclusive open and causing this problem?

Even without identifying the actual culprit(s), this upstream bug is probably the root cause:

http://bugs.python.org/issue13146

The bug is closed because the fix was applied to Python 3.3 (see above), but it was not backported to earlier versions. I think it would not be that difficult to backport it, and will talk to my fellow Python core devs to determine whether and where it should get backported. It probably makes sense to get it into 2.7, and maybe 3.2, but nothing else.

In either case, it almost certainly makes sense to get the fix into Ubuntu's Python 2.7, at least SRU'ing it to Raring. I'm not sure whether it makes sense to try to get such a fix into earlier Ubuntu releases or Python versions on Ubuntu. The thing is: while the problem is mysterious and annoying, the workaround is fairly simple, and I would love not to have to care about Python 2.6 or 3.2. :)

Thoughts are welcome, though remember that I'm going to engage python-dev on the same topic (not cross-posted).

-Barry
-- ubuntu-devel mailing list [email protected] Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel
