Re: content hash of a tree

Thomas Mueller Tue, 15 Oct 2013 00:14:26 -0700

Hi,

>I thought this was one of the fundamental principles that
>allow for quick diff/merge and help identify a commit within the MVCC
>timeline.


I'm not sure where you heard or read that, but a content hash is not
needed for this. A revision number that changes whenever this node, or
any direct or indirect child node, is modified is enough. For the MongoMK,
the revision is a combination of the timestamp, the cluster node id, and a
counter; this is very similar to the MongoDB object id:
http://docs.mongodb.org/manual/reference/object-id/
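
To illustrate the revision described above, here is a minimal sketch of an
id built from a timestamp, a cluster node id, and a counter. The class
names, the field order, and the string layout are assumptions made for the
example; this is not the actual MongoMK code.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Revision:
    """Hypothetical revision id; field order defines the sort order."""
    timestamp_ms: int
    counter: int
    cluster_id: int

    def __str__(self):
        # Assumed layout for illustration only: r<time>-<counter>-<cluster>
        return f"r{self.timestamp_ms:x}-{self.counter:x}-{self.cluster_id:x}"


class RevisionFactory:
    """Hands out increasing revisions for one cluster node."""

    def __init__(self, cluster_id, clock=lambda: int(time.time() * 1000)):
        self.cluster_id = cluster_id
        self.clock = clock
        self.last_ms = 0
        self.counter = 0

    def next(self):
        # Never let the timestamp go backwards if the clock does.
        now = max(self.clock(), self.last_ms)
        if now == self.last_ms:
            # Same millisecond as the previous revision: bump the counter.
            self.counter += 1
        else:
            self.last_ms = now
            self.counter = 0
        return Revision(now, self.counter, self.cluster_id)
```

No node content has to be read to create such an id, and ids from one
cluster node sort in creation order.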

>what was the reason to abandon this idea?

As for MongoDB: performance and scalability. A node lookup by content
hash would be bad for performance, as it would require an index on
randomly distributed data. See also
http://fr.slideshare.net/daumdna/mongodb-scaling-write-performance, page
9 (the red line is with an index on the content hash, the green line
without). But even without such an index, maintaining the content hash
would be prohibitively expensive and would prevent scalable writes.
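
The point about an index on randomly distributed data can be sketched as
follows. This is a rough illustration, not a MongoDB measurement: a sorted
Python list stands in for the index, and the key formats are made up for
the example. It counts how many inserts land at the very end of the index.

```python
import bisect
import hashlib


def count_tail_inserts(keys):
    """Insert keys into a sorted list; count inserts that land at the end."""
    index = []
    tail = 0
    for key in keys:
        pos = bisect.bisect_right(index, key)
        if pos == len(index):
            tail += 1
        index.insert(pos, key)
    return tail


n = 1000
# Increasing keys, as a timestamp/counter style id would produce.
sequential = [f"{i:08x}" for i in range(n)]
# Content-hash style keys: effectively random order.
hashed = [hashlib.sha256(f"node-{i}".encode()).hexdigest() for i in range(n)]

if __name__ == "__main__":
    print("sequential keys appended at end:", count_tail_inserts(sequential))
    print("hashed keys appended at end:", count_tail_inserts(hashed))
```

With increasing keys every insert touches only the end of the index; with
hash keys almost every insert lands somewhere in the middle, so the whole
index is touched at random.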

Regards,
Thomas
