Hi,

>I thought this was one of the fundamental principles that
>allow for quick diff/merge and help identify a commit within the MVCC
>timeline.

I'm not sure where you heard or read that, but a content hash is not needed for this; a revision number that changes whenever there is a modification is enough (that is, for each change in this node or in any direct or indirect child node). For the MongoMK, the revision is a combination of the timestamp, the cluster node id, and a counter; very similar to the MongoDB object id:

http://docs.mongodb.org/manual/reference/object-id/

>what was the reason to abandon this idea?

As for MongoDB: performance and scalability. A node lookup by content hash would be bad for performance, as it would require an index on randomly distributed data. See also http://fr.slideshare.net/daumdna/mongodb-scaling-write-performance - page 9 (the red line is with an index on the content hash, the green line without). But even without such an index, maintaining the content hash would be prohibitively expensive and would prevent scalable writes.

Regards,
Thomas
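
To illustrate the revision scheme described above, here is a minimal sketch in Python. It is not the MongoMK implementation: the names Revision, RevisionFactory, Node, touch and changed_since are hypothetical, and the field order used for comparison is an assumption. It only shows that a (timestamp, counter, cluster node id) revision, propagated to all ancestors on each change, is enough to skip unchanged subtrees in a diff without any content hash.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Revision:
    # Assumed layout: compared as (timestamp, counter, cluster id).
    timestamp_ms: int
    counter: int
    cluster_id: int


class RevisionFactory:
    """Hands out increasing revisions for one cluster node (hypothetical)."""

    def __init__(self, cluster_id, clock=None):
        self.cluster_id = cluster_id
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._last_ms = 0
        self._counter = 0

    def next(self):
        now = max(self._clock(), self._last_ms)  # never move backwards
        if now == self._last_ms:
            self._counter += 1  # same millisecond: bump the counter
        else:
            self._last_ms, self._counter = now, 0
        return Revision(now, self._counter, self.cluster_id)


class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.revision = None
        self.children = {}

    def add_child(self, name):
        child = Node(name, parent=self)
        self.children[name] = child
        return child


def touch(node, factory):
    """Record a modification: stamp the node and every ancestor."""
    rev = factory.next()
    current = node
    while current is not None:
        current.revision = rev
        current = current.parent
    return rev


def changed_since(node, base, path="/"):
    """List paths changed after `base`, skipping unchanged subtrees."""
    if node.revision is None or node.revision <= base:
        return []  # nothing below this node changed after `base`
    result = [path]
    for name in sorted(node.children):
        child_path = path.rstrip("/") + "/" + name
        result.extend(changed_since(node.children[name], base, child_path))
    return result
```

Because these revisions mostly increase over time, an index on them grows at one end, unlike an index on content hashes, whose values are spread randomly.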
