Yesterday, for the second time in two months, an interactive
comprehensive test failed with a "ValueError: bad marshal data
(unknown type code)" error for bindings/python/Plframe.py.  This is a
rather common error with python and typically means the associated
*.pyc file that python generates has been corrupted.  I moved the
corresponding *.pyc file out of the way, and the comprehensive test
(with the *.pyc regenerated by python) sailed through afterward without
issues.  For the record, this issue occurred on my Debian Jessie
platform with the following python version:

irwin@raven> python --version
Python 2.7.9

There are lots of potential reasons for such *.pyc corruption issues,
such as a change in python version or hardware problems, but these
errors are so common that in 2013 the python developers list became
concerned that python itself might be subject to race conditions when
generating these files and thus be the cause of at least some of
these corruptions (see the discussion thread at
<https://mail.python.org/pipermail/python-dev/2013-May/126241.html>
with the subject line "[Python-Dev] Mysterious Python pyc file
corruption problems").
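
For anyone who wants to check a suspect *.pyc directly rather than
wait for an import to fail, here is a minimal sketch.  It is written
for Python 3 (an assumption; on Python 2.7 the header is 8 bytes and
the magic number comes from imp.get_magic() instead), and check_pyc is
just a hypothetical helper name, not part of PLplot or python.

```python
import importlib.util
import marshal
import sys


def pyc_header_size():
    """Bytes that precede the marshalled code object (Python 3 only)."""
    # 3.7+ : magic, flags, mtime (or hash), size = 16 bytes
    # 3.3-3.6: magic, mtime, size = 12 bytes
    return 16 if sys.version_info >= (3, 7) else 12


def check_pyc(path):
    """Return (ok, message) saying whether the .pyc at path still loads."""
    with open(path, "rb") as f:
        data = f.read()
    # A version change shows up here, as a different magic number.
    if data[:4] != importlib.util.MAGIC_NUMBER:
        return False, "magic number does not match this interpreter"
    try:
        code = marshal.loads(data[pyc_header_size():])
    except (ValueError, EOFError, TypeError) as exc:
        # Damaged payloads typically fail here, e.g. "bad marshal data".
        return False, "cannot unmarshal payload: %s" % exc
    if not hasattr(code, "co_code"):
        return False, "payload is not a code object"
    return True, "ok"
```

Calling check_pyc() on the *.pyc path returns (True, "ok") for a
healthy file and (False, reason) otherwise.  Only run it on files you
trust, since the marshal module is not hardened against arbitrary
input.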

I did an octal dump of the corrupted file versus the uncorrupted
regenerated one, and as far as I can tell the only difference is
a missing byte in the corrupted file.  (If anyone is interested
I can send those files to you for inspection.)
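
The same comparison can be automated instead of eyeballing two octal
dumps.  The following is only a sketch; find_missing_byte is a
hypothetical helper name, and it simply tests whether deleting one
byte from the good file reproduces the corrupted one.

```python
def find_missing_byte(good, bad):
    """Return an offset in good whose removal yields bad, else None.

    good and bad are bytes objects (file contents read in "rb" mode).
    None means the two do not differ by exactly one deleted byte.
    """
    if len(good) != len(bad) + 1:
        return None
    # Walk forward until the first position where the contents diverge.
    i = 0
    while i < len(bad) and good[i] == bad[i]:
        i += 1
    # Dropping good[i] must reproduce bad exactly for a one-byte deletion.
    if good[:i] + good[i + 1:] == bad:
        return i
    return None
```

To use it, read both *.pyc files with open(name, "rb").read() and pass
the regenerated contents first and the corrupted contents second.
When the deleted byte sits in a run of identical bytes, the offset
returned is the last position in that run, since any of them would
give the same result.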

Yesterday I also ran some obvious tests (memtest, fsck, and git
fsck) of my PC hardware (which is 9 [!] years old, but still going
strong), and all was well.  Furthermore, the above octal dumps showed
no sign of an I/O issue with the corrupted file, and the problem has
(so far) always occurred with just this particular file.  These rare
errors also only started when I enabled testing of
examples/python/pytkdemo (our only file that imports Plframe, which
generates the *.pyc as a byproduct of that import) with the
test_pytkdemo target.  So I am pretty sure this evidence largely rules
out any hardware issue.  I have not been fiddling with my python
versions either, and in any case such a change should just alter a
version stamp (at least two bytes) in the file and not simply remove
one byte.

So by a process of elimination, I think this is likely one more
candidate for the mysterious python pyc corruption issue.  However, if
the source of this corruption is a race condition in the python
generation of these files, I believe that would only be an issue if
there are simultaneous attempts to generate this file.  The tests I
run do use parallel builds but the test_pytkdemo target is implemented
with a CMake custom target where there should be no build race
conditions (attempting to build that target twice) unless there is a
bug in either CMake or make.  But if that were the case, we would be
seeing similar errors for our other python test targets, and we don't.
However, if you look at examples/python/pytkdemo, it is interesting
that it imports Plframe in two ways, i.e.,

import Plframe
from Plframe import *

This is a fairly common (but sloppy) python idiom for importing both a
namespaced and an unnamespaced version of Plframe (because some of our
code uses the namespaced version and some does not).
However, the only way you get a race out of that is if python looks
ahead and starts doing the second import (which would also attempt to
generate a Plframe.pyc file) before the first import has finished, and
I have no idea whether that is a possibility or not.
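
One way to probe that question for a single interpreter is the sketch
below.  It writes a throwaway module (demo_plframe is a made-up
stand-in, not the real Plframe), imports it both ways, and counts how
many times the module body actually ran.  Import statements are
executed one after the other, and the second one should find the
module already cached in sys.modules, so the expected count is one.

```python
import os
import sys
import tempfile


def count_module_executions():
    """Import a throwaway module both ways; return (runs, value)."""
    tmpdir = tempfile.mkdtemp()
    log_path = os.path.join(tmpdir, "runs.log")
    # The module body appends one line to the log each time it executes.
    source = (
        "with open(%r, 'a') as log:\n"
        "    log.write('run\\n')\n"
        "value = 42\n"
    ) % log_path
    with open(os.path.join(tmpdir, "demo_plframe.py"), "w") as f:
        f.write(source)

    sys.path.insert(0, tmpdir)
    namespace = {}
    try:
        # Same pair of statements as in pytkdemo, with the stand-in name.
        exec("import demo_plframe\nfrom demo_plframe import *\n", namespace)
    finally:
        sys.path.remove(tmpdir)
        sys.modules.pop("demo_plframe", None)

    with open(log_path) as f:
        runs = len(f.readlines())
    return runs, namespace["value"]


print(count_module_executions())  # (1, 42): the body ran only once
```

If that holds for the real module as well, the double import alone
cannot compile Plframe.py twice within one process, and any race would
need two separate python processes importing Plframe at about the same
time.  This sketch says nothing about that multi-process case.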

Anyhow, in the near future I plan to track down all our references to
the version of Plframe that is not namespaced and convert them to the
namespaced version so that the second import can be eliminated.  It
will be interesting to see if that makes this corruption issue
disappear.

Meanwhile, if anyone else can replicate this issue, that would be even
stronger evidence it is not due to my hardware.  So if you want to
help out with that, you should run the test_pytkdemo target, but only
after touching examples/python/Plframe.py (which forces python to
regenerate the *.pyc file when the test_pytkdemo target is run).  You
should repeat this test from time to time under a variety of load
conditions, because the error is rare and generating it even once may
be difficult.
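
To take some of the tedium out of repeating that, here is a rough
sketch of a loop that touches the source file and then runs a command,
recording any failures.  The function name repeat_after_touch and its
arguments are hypothetical; the command you pass would be whatever
runs the test_pytkdemo target in your build tree.

```python
import os
import subprocess


def repeat_after_touch(source_path, command, repeats=10):
    """Touch source_path, run command, and collect non-zero exit codes.

    Returns a list of (attempt, status) pairs for the runs that failed.
    """
    failures = []
    for attempt in range(repeats):
        # Equivalent of "touch": updates the modification time so that
        # python sees the existing *.pyc as stale.  The stamp stored in
        # the *.pyc has one-second resolution, so runs shorter than a
        # second may not trigger a regeneration.
        os.utime(source_path, None)
        status = subprocess.call(command)
        if status != 0:
            failures.append((attempt, status))
    return failures
```

For example, repeat_after_touch(path_to_Plframe_py, ["make",
"test_pytkdemo"]) run from the top of the build tree, with
path_to_Plframe_py adjusted to wherever Plframe.py lives on your
system.  A non-zero status only says the target failed, so you would
still need to look at the output to confirm it was the bad marshal
data error.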

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state
implementation for stellar interiors (freeeos.sf.net); the Time
Ephemerides project (timeephem.sf.net); PLplot scientific plotting
software package (plplot.sf.net); the libLASi project
(unifont.org/lasi); the Loads of Linux Links project (loll.sf.net);
and the Linux Brochure Project (lbproject.sf.net).
__________________________

Linux-powered Science
__________________________

_______________________________________________
Plplot-devel mailing list
Plplot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/plplot-devel
