On Tue, Jul 06, 2010 at 05:29:54PM +0200, Arne Jansen wrote:
> Daniel Carosone wrote:
> > Something similar would be useful, and much more readily achievable,
> > from ZFS for such an application, and many others.  Rather than a way
> > to compare reliably between two files for identity, I'd like a way to
> > compare identity of a single file between two points in time.  If my
> > application can tell quickly that the file content is unaltered since
> > last time I saw the file, I can avoid rehashing the content and use a
> > stored value. If I can achieve this result for a whole directory
> > tree, even better.
> 
> This would be great for any kind of archiving software. Aren't zfs checksums
> already ready to solve this? If a file changes, its dnode's checksum changes,
> the checksum of the directory it is in and so forth all the way up to the
> uberblock.

Not quite.  The merkle tree of file and metadata blocks gets to the root
via a path that is largely separate from the tree of directory objects
that name them.  Changing the inode (or equivalent) metadata doesn't
change the contents of any of the directories that have entries
pointing to that file.

Put another way, we don't have named file versions - a directory name
refers to an object and all future versions of that object.  Future
versions will be found at different disk addresses, but are the same
object id.  There's no need to rewrite the directory object unless the
name changes or the link is removed.   We have named filesystem
versions (snapshots), that name the entire tree of data and metadata
objects, once they do finally converge.
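
To make the distinction concrete, here is a toy Python model. It is only
a sketch and not ZFS's on-disk format: the ToyFS class, its flat object
table, and the way root_checksum() combines per-object checksums are all
assumptions made for illustration. The one property it tries to capture
is that a directory stores name -> object id, not a checksum of the
target.

```python
import hashlib


def h(*parts: bytes) -> str:
    """Hash a sequence of length-prefixed byte strings."""
    m = hashlib.sha256()
    for p in parts:
        m.update(len(p).to_bytes(8, "big"))
        m.update(p)
    return m.hexdigest()


class ToyFS:
    """Toy model: objects are addressed by id; directories map names to ids."""

    def __init__(self):
        self.objects = {}  # object id -> bytes (file data or serialized directory)

    def write_file(self, obj_id: int, data: bytes) -> None:
        self.objects[obj_id] = data

    def write_dir(self, obj_id: int, entries: dict) -> None:
        # A directory entry records only the target's object id.
        body = "\n".join(f"{name}={oid}" for name, oid in sorted(entries.items()))
        self.objects[obj_id] = body.encode()

    def object_checksum(self, obj_id: int) -> str:
        return h(self.objects[obj_id])

    def root_checksum(self) -> str:
        # Stand-in for the checksum tree converging at the root:
        # a hash over every object's checksum, ordered by object id.
        return h(*(f"{oid}:{self.object_checksum(oid)}".encode()
                   for oid in sorted(self.objects)))


fs = ToyFS()
fs.write_dir(1, {"notes.txt": 2})
fs.write_file(2, b"version 1")
dir_before = fs.object_checksum(1)
file_before = fs.object_checksum(2)
root_before = fs.root_checksum()

fs.write_file(2, b"version 2")  # same object id, new content

dir_unchanged = fs.object_checksum(1) == dir_before
file_changed = fs.object_checksum(2) != file_before
root_changed = fs.root_checksum() != root_before
```

In this model the file's checksum and the root checksum both change,
while the directory object's checksum does not, because the directory
never needed rewriting.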

> There may be ways a checksum changes without a real change in the file's
> content, but the other way round should hold. If the checksum didn't change,
> the file didn't change.

That's true, for individual files. (Checksum changes can happen when
the checksum algorithm is changed and the same data is rewritten, or if
zero-filled blocks are replaced/rewritten with holes or vice versa,
and in some other cases.)
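
A small Python sketch of those two cases follows. Everything in it is
illustrative rather than ZFS's actual layout or algorithms: the
8-byte block size, the use of None to mark a hole, the "adler32"
option as a stand-in for a second checksum algorithm, and the
stored_checksum() and logical_content() helpers are all assumptions.

```python
import hashlib
import zlib

BLOCK = 8  # tiny block size, for illustration only


def stored_checksum(blocks, algorithm="sha256"):
    """Checksum over the stored representation; None marks a hole."""
    m = hashlib.sha256()
    for b in blocks:
        if b is None:
            m.update(b"hole")
        elif algorithm == "sha256":
            m.update(b"data" + hashlib.sha256(b).digest())
        else:
            # Stand-in for any other per-block checksum algorithm.
            m.update(b"data" + zlib.adler32(b).to_bytes(4, "big"))
    return m.hexdigest()


def logical_content(blocks):
    """What a reader sees: holes read back as zeros."""
    return b"".join(bytes(BLOCK) if b is None else b for b in blocks)


with_zeros = [b"abcdefgh", bytes(BLOCK)]  # second block stored as zeros
with_hole = [b"abcdefgh", None]           # second block stored as a hole

same_content = logical_content(with_zeros) == logical_content(with_hole)
hole_changes_checksum = stored_checksum(with_zeros) != stored_checksum(with_hole)
algo_changes_checksum = (stored_checksum(with_zeros, "sha256")
                         != stored_checksum(with_zeros, "adler32"))
```

Here the content a reader sees is identical in both layouts, yet the
stored checksum differs, so a changed checksum is only a hint that the
content may have changed.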

> So the only missing link is a way to determine zfs's checksum for a
> file/directory/dataset. 

That is missing, yes.  Perhaps in part because no one could see a valid
application-level use for this implementation-specific metadata.  The
original purpose of my post above was to illustrate a use case for it.
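
As a rough sketch of that use case, in Python: the application keeps
its own expensive hash per file and recomputes it only when a cheap
"change token" differs from the one it saw last time. Since ZFS has no
interface to return its checksum, change_token() below is a
hypothetical stand-in built from os.stat() fields (inode, size, mtime),
which is weaker than a real content checksum. HashCache is likewise an
illustrative name, not an existing API.

```python
import hashlib
import os


def change_token(path):
    """Hypothetical stand-in for a filesystem-provided content checksum."""
    st = os.stat(path)
    return (st.st_ino, st.st_size, st.st_mtime_ns)


class HashCache:
    """Remember each file's digest and rehash only when its token changes."""

    def __init__(self):
        self._cache = {}   # path -> (token, digest)
        self.rehashes = 0  # how many times content was actually read

    def digest(self, path):
        token = change_token(path)
        cached = self._cache.get(path)
        if cached is not None and cached[0] == token:
            return cached[1]  # unchanged since last seen: reuse stored value
        with open(path, "rb") as f:
            value = hashlib.sha256(f.read()).hexdigest()
        self.rehashes += 1
        self._cache[path] = (token, value)
        return value
```

With a real checksum from the filesystem, the token comparison would
be a statement about the content itself rather than about timestamps
that an application can reset.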

The throwaway line at the end, about a directory tree, was more in the
vein of wishful thinking.. :)

> Am I missing something here? Of course atime update
> should be turned off, otherwise the checksum will get changed by the archiving
> agent.

Content and metadata changes are covered by separate checksums, so at
least in theory an interface that exposes enough detail could avoid
this restriction.

--
Dan.


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
