On Friday 31 December 2004 23:49, David Masover wrote:
...
> Seems like all of those are really problems of caching/metadata, or more
> accurately, "things which Make would understand". How about some more
> general way of caching or cache invalidation?

An entry in a metadata cache must become invalid if the corresponding
file changes. That's exactly what my question was about. I don't want
the filesystem to manage the metadata, just an efficient way to find
files with outdated metadata. From an application's point of view,
recursive modification timestamps look like the most intuitive solution
to me.

> Here's how I would do it. I'd make a standard for object dependencies
> within the filesystem, some way like "make". This is the same thing I
> ranted about as a way for accessing the contents of zipfiles as part of
> the filesystem, without a performance hit. (cat foo.zip/bar.txt)

I don't want to see that much in the file system itself. I wouldn't even
care if these timestamps had to be retrieved with the help of a
userspace daemon and a library. But without some help from the
filesystem itself, you always have to traverse the whole directory tree
to find modified files.

> For instance, your search engine needs an index, which depends on (is
> built from) all the files in the filesystem except itself. Thus you
> might have an index for each folder (starting with /). Each index
> depends on the indices of its subdirectories. When a search is run,
> everything has to be rebuilt, in "make"-like fashion, but it gives you
> one global place to add the "many things that could be done" to improve
> performance for all systems that do this kind of thing -- search engines,
> locate, build systems, fsview, and backup tools.

How would the filesystem help in that scenario? It could invalidate or
delete the (sub)index or metadata cache if one of the files it depends
on changes, ok. But can't you do that just as efficiently in userspace
if the filesystem just provides the recursive timestamps?

...

> Seems like people use things like FAM nowadays. But you're
> right, there needs to be a better way.
> For instance, your desktop
> search engine should only rebuild even the stat data when a user enters
> a query, but it should be able to do it quickly (without searching the
> whole tree).

Yes, this is the problem. And recursively propagating modification
timestamps looks like a good solution to me. I am not saying that the
file system should do that itself. Timestamps with these modified
semantics would just exist as an interface to the applications. But the
filesystem must help to keep these timestamps up to date.

The file system itself could help, for instance, by providing a new
"change-monitor" flag for a file. This flag would be set only from
userspace and reset when the file is modified. If the flag is still set
when the file is being modified, the filesystem would then create a
symlink or something like that for the file in a special directory. The
contents of this changed-files directory would then be collected and
removed by a daemon, which manages the recursive-mtime database (no
matter if the values are stored as extended attributes or in a
Berkeley DB or whatever).

Now each application which has to manage a metadata cache could ask that
daemon for the rec-mtime of / first and descend deeper if the rec-mtime
is more recent than a stored timestamp, etc. Actually the "flag" would
have to be something like a list of path names, since a file can be hard
linked, but that doesn't change much (I hope).

With this approach, most of the work can be delayed until an application
actually asks for rec-mtimes. The overhead while writing to a file (when
the stat data is updated) would be to check whether the change-monitor
flag is set and, only if it is, to clear it and put one (or sometimes a
few) symlinks to the changed file into the special folder. Up to this
point no changes are propagated up to "/". That would all be done by a
userspace daemon at a later time.
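
To make the userspace half of this a bit more concrete, here is a rough
Python sketch. It is only an illustration under assumptions, not a
proposal for an actual interface: the names RecMtimeDaemon, collect and
stale_paths are made up, the changed-files directory is modelled as a
list of (path, mtime) pairs, and the directory tree is a plain dict
instead of real directory reads.

```python
import posixpath


class RecMtimeDaemon:
    """Hypothetical userspace daemon that keeps the rec-mtime table."""

    def __init__(self):
        # path -> newest mtime seen at or below that path
        self.rec_mtime = {}

    def collect(self, changed):
        """Drain (path, mtime) pairs and propagate them up towards "/"."""
        for path, mtime in changed:
            node = posixpath.normpath(path)
            while True:
                if self.rec_mtime.get(node, 0) >= mtime:
                    break  # this node and all ancestors are already as recent
                self.rec_mtime[node] = mtime
                parent = posixpath.dirname(node)
                if parent == node:  # dirname("/") is "/", so we are at the root
                    break
                node = parent


def stale_paths(daemon, children, root, last_scan):
    """Yield files changed after last_scan, skipping unchanged subtrees.

    children is an assumed stand-in for the directory tree: it maps each
    directory to the list of its entries; anything not in it is a file.
    """
    if daemon.rec_mtime.get(root, 0) <= last_scan:
        return  # nothing below root is newer than the cached metadata
    entries = children.get(root)
    if entries is None:
        yield root
        return
    for entry in entries:
        yield from stale_paths(daemon, children, entry, last_scan)
```

An application would remember the time of its last scan, call
stale_paths starting at "/", and refresh its metadata only for the files
that come back. A subtree whose rec-mtime is not newer than the stored
timestamp is never entered, which is the whole point of the scheme.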

If just the test for the existence of the change-monitor flag could be
made efficient enough, then the overhead during regular operation would
be negligible.

I hope that this outline was clear enough to let you tell me if this is
possible or why it isn't :)

bye and a happy new year to one half of the world!
Fred

-- 
Fred Schaettgen
[EMAIL PROTECTED]
