On Tue, Jul 06, 2010 at 05:29:54PM +0200, Arne Jansen wrote:
> Daniel Carosone wrote:
> > Something similar would be useful, and much more readily achievable,
> > from ZFS for such an application, and many others.  Rather than a way
> > to compare reliably between two files for identity, I'd like a way to
> > compare identity of a single file between two points in time.  If my
> > application can tell quickly that the file content is unaltered since
> > last time I saw the file, I can avoid rehashing the content and use a
> > stored value.  If I can achieve this result for a whole directory
> > tree, even better.
>
> This would be great for any kind of archiving software. Aren't zfs checksums
> already ready to solve this? If a file changes, its dnode's checksum changes,
> the checksum of the directory it is in and so forth all the way up to the
> uberblock.

Not quite.  The Merkle tree of file and metadata blocks gets to the
root via a path that is largely separate from the tree of directory
objects that name them.  Changing the inode-equivalent metadata
doesn't change the contents of any of the directories that have
entries pointing to that file.

Put another way, we don't have named file versions - a directory name
refers to an object and all future versions of that object.  Future
versions will be found at different disk addresses, but have the same
object id.  There's no need to rewrite the directory object unless the
name changes or the link is removed.

We have named filesystem versions (snapshots) that name the entire
tree of data and metadata objects, once they do finally converge.

> There may be ways a checksum changes without a real change in the file's
> content, but the other way round should hold. If the checksum didn't
> change, the file didn't change.

That's true, for individual files.  (Checksum changes can happen when
the checksum algorithm is changed and the same data is rewritten, or if
zero-filled blocks are replaced/rewritten with holes or vice-versa,
and in some other cases.)

> So the only missing link is a way to determine zfs's checksum for a
> file/directory/dataset.

That is missing, yes.  Perhaps in part because no one could see a valid
application-level use for this implementation-specific metadata.  The
original purpose of my post above was to illustrate a use case for it.

The throwaway line at the end, about a directory tree, was more in the
vein of wishful thinking.. :)

> Am I missing something here? Of course atime update
> should be turned off, otherwise the checksum will get changed by the
> archiving agent.

Separate checksums exist that cover the difference between content and
metadata changes, so at least in theory an interface that exposes
enough detail could avoid this restriction.

--
Dan.
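
The point about the two largely separate trees can be illustrated with
a toy model.  This is only a sketch, not ZFS code: ToyPool, make_dir
and the object ids are hypothetical names invented for the example, and
the real structure (dnodes, block pointers, indirect blocks) is
flattened here into a single level.  The one property it keeps is that
a directory entry stores an object id, not a checksum or disk address.

```python
import hashlib


def h(data: bytes) -> str:
    """Checksum stand-in (SHA-256 here; the choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class ToyPool:
    """Hypothetical model: object id -> contents, plus one root checksum."""

    def __init__(self):
        self.objects = {}  # object id -> bytes (file data or a directory)

    def write(self, obj_id: int, data: bytes) -> None:
        # A new version of an object keeps the same object id.
        self.objects[obj_id] = data

    def checksum(self, obj_id: int) -> str:
        return h(self.objects[obj_id])

    def root_checksum(self) -> str:
        # Stand-in for the uberblock: covers the checksum of every object.
        parts = [f"{i}:{self.checksum(i)}" for i in sorted(self.objects)]
        return h("\n".join(parts).encode())


def make_dir(entries: dict) -> bytes:
    # A directory maps names to object ids only.
    lines = [f"{name}={oid}" for name, oid in sorted(entries.items())]
    return "\n".join(lines).encode()


DIR_ID, FILE_ID = 1, 2
pool = ToyPool()
pool.write(FILE_ID, b"version 1")
pool.write(DIR_ID, make_dir({"report.txt": FILE_ID}))

before = (pool.checksum(FILE_ID), pool.checksum(DIR_ID), pool.root_checksum())
pool.write(FILE_ID, b"version 2")  # rewrite the file; the name is untouched
after = (pool.checksum(FILE_ID), pool.checksum(DIR_ID), pool.root_checksum())

file_changed = before[0] != after[0]
dir_changed = before[1] != after[1]
root_changed = before[2] != after[2]
print(file_changed, dir_changed, root_changed)  # True False True
```

The file's checksum and the root both change, but the directory object
is byte-for-byte identical, so a checksum of the directory alone says
nothing about whether the files it names were modified.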
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss