Okay, false alarm (I think). Have got myself worked up over nothing. I missed something very important:
timestamp = fstat(opened.fileno())[-2] That is the '[-2]' in the above. I feel like a goose now. I still though question why file/fstat is done and not stat/file though. Ie., why open the file to stat it? Graham Graham Dumpleton wrote .. > I was looking at the new module importer used by mod_python.publisher in > 3.2.6 to see whether it reloaded a module if file was replaced with an > older file and have come across some code that worries me a bit. Can > someone else (not just Nicolas) check this code and how it is used in > the context of the module importer and reassure me that the code isn't > going to cause a problem? > > The code in question is in the FileCache class of mod_python.cache. The > ModuleCache class used by mod_python.publisher derives from FileCache. > The actual code is: > > def check(self, key, name, entry): > opened = file(key, self.mode) > > timestamp = fstat(opened.fileno())[-2] > > if entry._value is NOT_INITIALIZED: > entry._timestamp = timestamp > return opened > else: > if entry._timestamp != timestamp: > entry._timestamp = timestamp > return opened > else: > opened.close() > return None > > This code I am assuming is what is determining if a file has changed > on disk and thus would be the trigger for a module being reloaded. > > First off, it worries me that it actually opens the file on disk and then > does a fstat() on it. If the file hasn't changed, it would close the file > just opened and use whatever is cached in the process instead. > > Given that 99% of the time, the file will not have changed on disk, this > seems to be an inefficient way of doing this. I would have thought that > a stat() call would have less overhead and only if file has deemed to > have changed then would then the actual file on disk be opened. > > Anyway, this issue is not what really worries me. What I see as a much > bigger problem is that the whole result of fstat() is what is cached as > the timestamp and what is used for comparing later to determine if > the file has changed and should be reloaded. > > Now the result of calling os.fstat() is something like: > > (33188, 3157895L, 234881033L, 1, 501, 501, 32367L, 1138847501, 1138846987, > 1138847194) > > Python documentation describes it as: > > The return value is an object whose attributes correspond to the > members of the stat structure, namely: st_mode (protection bits), st_ino > (inode number), st_dev (device), st_nlink (number of hard links), st_uid > (user ID of owner), st_gid (group ID of owner), st_size (size of file, > in bytes), st_atime (time of most recent access), st_mtime (time of most > recent content modification), st_ctime (platform dependent; time of most > recent metadata change on Unix, or the time of creation on Windows). > > Problem is that this contains lots of stuff besides the modification time > of the file. This means that a module reload can occur in circumstances > where the content of the file has not even changed. For example, the > owner or group of the file is changed, or the number of hard links to > the file has changed. > > The worst of all though is that the time of most recent access is in that > tuple of information. This is the st_atime field which is documented as: > > st_atime Time when file data last accessed. Changed by the mknod(2), > utimes(2) and read(2) system calls. > > Thus simply reading from the file or other operations on the file, will > cause a module reload. This may not happen immediately, as it seems > some operating systems don't necessarily flush out changes to the access > time to disk immediately when no file changes are made, instead waiting > until in memory file system cache information needs to be discarded. > It will happen though at some point. > > Can someone check over my reasoning here and see if I am correct? > > Note that up until now I hadn't even looked over how this new module > importer was implemented. I knew it wasn't going to solve various of the > existing module importer problems and I knew it was actually going to > introduce some new issues that would have to be worked around, but now > that I have started to document these new issues for inclusion in my > module importer issues list and when I see other possible problems like > the above, I am really starting to wander if it is really a good idea > letting this interim solution to module importing problems be released. > > Comments? > > Graham