Re: Worrying code in mod_python.publisher module importer.

Graham Dumpleton Wed, 01 Feb 2006 19:07:53 -0800

Okay, false alarm (I think). Have got myself worked up over nothing.
I missed something very important:


  timestamp = fstat(opened.fileno())[-2]

That is the '[-2]' in the above.

I feel like a goose now.

I still though question why file/fstat is done and not stat/file though.
Ie., why open the file to stat it?

Graham

Graham Dumpleton wrote ..
> I was looking at the new module importer used by mod_python.publisher in
> 3.2.6 to see whether it reloaded a module if file was replaced with an
> older file and have come across some code that worries me a bit. Can
> someone else (not just Nicolas) check this code and how it is used in
> the context of the module importer and reassure me that the code isn't
> going to cause a problem?
> 
> The code in question is in the FileCache class of mod_python.cache. The
> ModuleCache class used by mod_python.publisher derives from FileCache.
> The actual code is:
> 
>     def check(self, key, name, entry):
>         opened = file(key, self.mode)
> 
>         timestamp = fstat(opened.fileno())[-2]
> 
>         if entry._value is NOT_INITIALIZED:
>             entry._timestamp = timestamp
>             return opened
>         else:
>             if entry._timestamp != timestamp:
>                 entry._timestamp = timestamp
>                 return opened
>             else:
>                 opened.close()
>                 return None
> 
> This code I am assuming is what is determining if a file has changed
> on disk and thus would be the trigger for a module being reloaded.
> 
> First off, it worries me that it actually opens the file on disk and then
> does a fstat() on it. If the file hasn't changed, it would close the file
> just opened and use whatever is cached in the process instead.
> 
> Given that 99% of the time, the file will not have changed on disk, this
> seems to be an inefficient way of doing this. I would have thought that
> a stat() call would have less overhead and only if file has deemed to
> have changed then would then the actual file on disk be opened.
> 
> Anyway, this issue is not what really worries me. What I see as a much
> bigger problem is that the whole result of fstat() is what is cached as
> the timestamp and what is used for comparing later to determine if
> the file has changed and should be reloaded.
> 
> Now the result of calling os.fstat() is something like:
> 
>   (33188, 3157895L, 234881033L, 1, 501, 501, 32367L, 1138847501, 1138846987,
> 1138847194)
> 
> Python documentation describes it as:
> 
>   The return value is an object whose attributes correspond to the
>   members of the stat structure, namely: st_mode (protection bits), st_ino
>   (inode number), st_dev (device), st_nlink (number of hard links), st_uid
>   (user ID of owner), st_gid (group ID of owner), st_size (size of file,
>   in bytes), st_atime (time of most recent access), st_mtime (time of most
>   recent content modification), st_ctime (platform dependent; time of most
>   recent metadata change on Unix, or the time of creation on Windows).
> 
> Problem is that this contains lots of stuff besides the modification time
> of the file. This means that a module reload can occur in circumstances
> where the content of the file has not even changed. For example, the
> owner or group of the file is changed, or the number of hard links to
> the file has changed.
> 
> The worst of all though is that the time of most recent access is in that
> tuple of information. This is the st_atime field which is documented as:
> 
>      st_atime     Time when file data last accessed.  Changed by the mknod(2),
>                   utimes(2) and read(2) system calls.
> 
> Thus simply reading from the file or other operations on the file, will
> cause a module reload. This may not happen immediately, as it seems
> some operating systems don't necessarily flush out changes to the access
> time to disk immediately when no file changes are made, instead waiting
> until in memory file system cache information needs to be discarded.
> It will happen though at some point.
> 
> Can someone check over my reasoning here and see if I am correct?
> 
> Note that up until now I hadn't even looked over how this new module
> importer was implemented. I knew it wasn't going to solve various of the
> existing module importer problems and I knew it was actually going to
> introduce some new issues that would have to be worked around, but now
> that I have started to document these new issues for inclusion in my
> module importer issues list and when I see other possible problems like
> the above, I am really starting to wander if it is really a good idea
> letting this interim solution to module importing problems be released.
> 
> Comments?
> 
> Graham

Re: Worrying code in mod_python.publisher module importer.

Reply via email to