I was looking at the new module importer used by mod_python.publisher in
3.2.6 to see whether it reloaded a module if file was replaced with an
older file and have come across some code that worries me a bit. Can
someone else (not just Nicolas) check this code and how it is used in
the context of the module importer and reassure me that the code isn't
going to cause a problem?
The code in question is in the FileCache class of mod_python.cache. The
ModuleCache class used by mod_python.publisher derives from FileCache.
The actual code is:
def check(self, key, name, entry):
opened = file(key, self.mode)
timestamp = fstat(opened.fileno())[-2]
if entry._value is NOT_INITIALIZED:
entry._timestamp = timestamp
return opened
else:
if entry._timestamp != timestamp:
entry._timestamp = timestamp
return opened
else:
opened.close()
return None
This code I am assuming is what is determining if a file has changed
on disk and thus would be the trigger for a module being reloaded.
First off, it worries me that it actually opens the file on disk and then
does a fstat() on it. If the file hasn't changed, it would close the file
just opened and use whatever is cached in the process instead.
Given that 99% of the time, the file will not have changed on disk, this
seems to be an inefficient way of doing this. I would have thought that
a stat() call would have less overhead and only if file has deemed to
have changed then would then the actual file on disk be opened.
Anyway, this issue is not what really worries me. What I see as a much
bigger problem is that the whole result of fstat() is what is cached as
the timestamp and what is used for comparing later to determine if
the file has changed and should be reloaded.
Now the result of calling os.fstat() is something like:
(33188, 3157895L, 234881033L, 1, 501, 501, 32367L, 1138847501, 1138846987,
1138847194)
Python documentation describes it as:
The return value is an object whose attributes correspond to the
members of the stat structure, namely: st_mode (protection bits), st_ino
(inode number), st_dev (device), st_nlink (number of hard links), st_uid
(user ID of owner), st_gid (group ID of owner), st_size (size of file,
in bytes), st_atime (time of most recent access), st_mtime (time of most
recent content modification), st_ctime (platform dependent; time of most
recent metadata change on Unix, or the time of creation on Windows).
Problem is that this contains lots of stuff besides the modification time
of the file. This means that a module reload can occur in circumstances
where the content of the file has not even changed. For example, the
owner or group of the file is changed, or the number of hard links to
the file has changed.
The worst of all though is that the time of most recent access is in that
tuple of information. This is the st_atime field which is documented as:
st_atime Time when file data last accessed. Changed by the mknod(2),
utimes(2) and read(2) system calls.
Thus simply reading from the file or other operations on the file, will
cause a module reload. This may not happen immediately, as it seems
some operating systems don't necessarily flush out changes to the access
time to disk immediately when no file changes are made, instead waiting
until in memory file system cache information needs to be discarded.
It will happen though at some point.
Can someone check over my reasoning here and see if I am correct?
Note that up until now I hadn't even looked over how this new module
importer was implemented. I knew it wasn't going to solve various of the
existing module importer problems and I knew it was actually going to
introduce some new issues that would have to be worked around, but now
that I have started to document these new issues for inclusion in my
module importer issues list and when I see other possible problems like
the above, I am really starting to wander if it is really a good idea
letting this interim solution to module importing problems be released.
Comments?
Graham