I was looking at the new module importer used by mod_python.publisher in 3.2.6 to see whether it reloaded a module if file was replaced with an older file and have come across some code that worries me a bit. Can someone else (not just Nicolas) check this code and how it is used in the context of the module importer and reassure me that the code isn't going to cause a problem?
The code in question is in the FileCache class of mod_python.cache. The ModuleCache class used by mod_python.publisher derives from FileCache. The actual code is: def check(self, key, name, entry): opened = file(key, self.mode) timestamp = fstat(opened.fileno())[-2] if entry._value is NOT_INITIALIZED: entry._timestamp = timestamp return opened else: if entry._timestamp != timestamp: entry._timestamp = timestamp return opened else: opened.close() return None This code I am assuming is what is determining if a file has changed on disk and thus would be the trigger for a module being reloaded. First off, it worries me that it actually opens the file on disk and then does a fstat() on it. If the file hasn't changed, it would close the file just opened and use whatever is cached in the process instead. Given that 99% of the time, the file will not have changed on disk, this seems to be an inefficient way of doing this. I would have thought that a stat() call would have less overhead and only if file has deemed to have changed then would then the actual file on disk be opened. Anyway, this issue is not what really worries me. What I see as a much bigger problem is that the whole result of fstat() is what is cached as the timestamp and what is used for comparing later to determine if the file has changed and should be reloaded. Now the result of calling os.fstat() is something like: (33188, 3157895L, 234881033L, 1, 501, 501, 32367L, 1138847501, 1138846987, 1138847194) Python documentation describes it as: The return value is an object whose attributes correspond to the members of the stat structure, namely: st_mode (protection bits), st_ino (inode number), st_dev (device), st_nlink (number of hard links), st_uid (user ID of owner), st_gid (group ID of owner), st_size (size of file, in bytes), st_atime (time of most recent access), st_mtime (time of most recent content modification), st_ctime (platform dependent; time of most recent metadata change on Unix, or the time of creation on Windows). Problem is that this contains lots of stuff besides the modification time of the file. This means that a module reload can occur in circumstances where the content of the file has not even changed. For example, the owner or group of the file is changed, or the number of hard links to the file has changed. The worst of all though is that the time of most recent access is in that tuple of information. This is the st_atime field which is documented as: st_atime Time when file data last accessed. Changed by the mknod(2), utimes(2) and read(2) system calls. Thus simply reading from the file or other operations on the file, will cause a module reload. This may not happen immediately, as it seems some operating systems don't necessarily flush out changes to the access time to disk immediately when no file changes are made, instead waiting until in memory file system cache information needs to be discarded. It will happen though at some point. Can someone check over my reasoning here and see if I am correct? Note that up until now I hadn't even looked over how this new module importer was implemented. I knew it wasn't going to solve various of the existing module importer problems and I knew it was actually going to introduce some new issues that would have to be worked around, but now that I have started to document these new issues for inclusion in my module importer issues list and when I see other possible problems like the above, I am really starting to wander if it is really a good idea letting this interim solution to module importing problems be released. Comments? Graham