I am looking into a particularly vexing Python problem on Ubuntu that
manifests in several different ways.  I think the problem is the same one
described in http://bugs.python.org/issue13146 and I sent a message on the
subject to the ubuntu-devel list:
https://lists.ubuntu.com/archives/ubuntu-devel/2013-May/037129.html

I don't know what's causing the problem and have no way to reproduce it, but
all the clues point to corrupt pyc files in Pythons < 3.3.

The common way this manifests is a traceback on an import statement.  The
actual error can be a "ValueError: bad marshal data (unknown type code)" such
as in http://pad.lv/1010077 or an "EOFError: EOF read where not expected" as
in http://pad.lv/1060842.  We have many more instances of both of these.

Since both error messages come from marshal.c when trying to read the pyc for
a module being imported, I suspect that something is causing the pyc files to
get partially overwritten or corrupted.  The workaround is always to
essentially blow away the .pyc file and re-create it.  (Various different
techniques can be used, but they all boil down to the same thing.)
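
For illustration, here is a minimal sketch of that workaround: find .pyc files
whose marshalled payload cannot be read, and delete them so the next import
re-creates them.  The function names and the header-size table are assumptions
made for this example, not part of any Ubuntu or CPython tool, and the check
only catches payloads that fail to unmarshal.

```python
import marshal
import os
import sys


def pyc_header_size():
    # Assumption: the pyc header is 16 bytes on Python 3.7+, 12 bytes on
    # 3.3-3.6, and 8 bytes (magic + mtime) on older versions.
    if sys.version_info >= (3, 7):
        return 16
    if sys.version_info >= (3, 3):
        return 12
    return 8


def is_pyc_readable(path):
    """Return True if the code object stored in the pyc can be unmarshalled."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        marshal.loads(data[pyc_header_size():])
    except (EOFError, ValueError, TypeError):
        # Truncated or garbled payloads fail here, like the import errors above.
        return False
    return True


def remove_unreadable_pycs(root):
    """Delete unreadable .pyc files under root and return their paths."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pyc"):
                path = os.path.join(dirpath, name)
                if not is_pyc_readable(path):
                    os.remove(path)
                    removed.append(path)
    return removed
```

This has to run under the same Python version that wrote the .pyc files, since
the marshal format differs between versions.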

Another commonality is that this bug -- so far -- has not been observed in any
Python 3.3 code, only 3.2 and earlier, including 2.7 and 2.6.  This
strengthens my hypothesis, since importlib in Python 3.3 writes the .pyc to a
temporary file and atomically renames it into place, whereas older Pythons
only do an exclusive open on the pyc files, but do *not* do an atomic rename
AFAICT.
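
The write-then-rename pattern can be sketched roughly as below.  This is an
illustration of the general technique, not importlib's actual code; the
function name write_atomic and the ".tmp-" prefix are assumptions made for the
example.  os.replace() only exists in Python 3.3 and later; on older Pythons,
os.rename() gives the same atomic behaviour on POSIX systems.

```python
import os
import tempfile


def write_atomic(path, data):
    """Write data to path so readers see the old file or the new one, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live in the same directory (same file system)
    # as the target, otherwise the rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Atomically swap the finished file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

With this pattern, concurrent writers may still race, but the loser's complete
file simply replaces the winner's complete file.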

This leads me to hypothesize that the bug is due to an as yet unidentified
race condition during installation of Python source code on Ubuntu, which is
normally when we automatically byte compile the source to .pyc files.  This
can happen at package installation/upgrade time, or during a fresh install.
In each of these cases there *should* be only one process attempting to write
the .pyc, but my guess is that for some reason, multiple processes are trying
to do this, leaving .pyc files truncated or with other bogus content.
Even in Python < 3.3, it should not be possible to corrupt a .pyc when only a
single process is involved, due to the import lock and/or GIL.  The exclusive
open of the .pyc file is clearly not enough of a protection in a multiprocess
situation, since the bug has already been identified in Python on buildbots
during test_multiprocessing.  See http://bugs.python.org/issue13146
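
To illustrate why an exclusive open alone does not help other processes, the
sketch below creates a file with O_EXCL, writes only half of a marshalled code
object, and then reads the file back the way a second process could.  It is a
simplified single-process simulation built on assumptions (the helper names,
the unlink-then-O_EXCL sequence, no pyc header or timestamp check), so it is
not a reproducer of the bug itself.

```python
import marshal
import os
import tempfile


def exclusive_open(path):
    # Assumed approximation of an "exclusive open": drop any old file,
    # then create the new one with O_EXCL so the creation cannot collide.
    try:
        os.unlink(path)
    except OSError:
        pass
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)


def reader_sees_partial_file():
    """Return the name of the error a reader gets from a half-written file."""
    payload = marshal.dumps(compile("x = 1", "<demo>", "exec"))
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "demo.bin")
    fd = exclusive_open(path)
    try:
        # The writer has only got halfway through its data.
        os.write(fd, payload[: len(payload) // 2])
        # O_EXCL only guards creation; the file is already visible to readers.
        with open(path, "rb") as f:
            partial = f.read()
        try:
            marshal.loads(partial)
        except (EOFError, ValueError, TypeError) as exc:
            return type(exc).__name__
        return None
    finally:
        os.close(fd)
        os.unlink(path)
        os.rmdir(directory)
```

Calling reader_sees_partial_file() returns the name of the exception raised
while unmarshalling the partial data.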

I think the list of errors we've seen is too extensive to chalk up to a
hardware bug, and I think the systems involved are modern enough to not be
subject to file system data loss.  There could be a missing fsync somewhere
that might be involved, though.  I think it's doubtful that buggy remote file
systems (e.g. NFSv2) are involved.  I could be wrong about any of that.

I have not succeeded in writing a standalone reproducer using Python 2.7.

So, the mystery is: what process on Ubuntu is exploiting holes in the
exclusive open and causing this problem?

Issue 13146 is closed because the fix was applied to Python 3.3 (see above),
but it was not backported to earlier versions.  I think it would not be that
difficult to backport it, and I would be willing to do so for Python 2.7 and
3.2.  We might include 2.6 in that, but only in Ubuntu: upstream 2.6 takes
only security fixes, and I can't see how this bug could be exploited as a
security vulnerability.

Thoughts?

-Barry
