Ingenious, Torben, certainly adds efficiency. John On Mon, Jul 1, 2013 at 4:38 AM, Torben Weis <torben.w...@gmail.com> wrote:
> 2013/6/25 Joseph Gentle <jose...@gmail.com> > > > > > >> When peers connect, they send each other missing ops. Figuring out > > >> which ops are missing can be surprisingly tricky - but we'll figure > > >> that out later. New ops must be ingested in order, so we always ingest > > >> an operation after ingesting all of its parents. > > > > Just use a Merkle Tree that is at the same time a prefix tree with > respect > to the hashes of the ops (explanation below). > The bandwidth usage is O(1) if both clients are in sync and O(log n) if > they have one or few different ops and O(n) in the worst case, where n in > the number of ops. > > Constructing the tree is simple. > Let the hash function output 20 bytes and let's encode this in hex. This > results in a hash-string of 40 hex-characters for each operation. > Each node hashes over the hashes of its children. Leaf-nodes correspond to > operations and thus use the hash value of their respective operation. > The tree-invariant is that all siblings on level x share the same prefix of > x hex-characters. > The tree is not sent over the network. Instead, clients start comparing the > hashes at the root. > > Two clients compare their root hash. If it is equal, the entire tree is > equal and therefore they are in sync. > If not, they download all direct children and repeat the procedure for each > sub-tree rooted by one of these children. > For example, if child number 3 has a different hash, but all others share > the same hash, then we have learned that there are one or more ops with a > hash of 3xxxx... that are different and need syncing. > > Typically we can limit the depth of the tree to few levels. 8 levels > already yield a tree that could store 16^8 possible ops. So in the worst > case two clients need to wait for 8 round-trips to determine a missing op. > > In addition, each client sends a time stamp. So when syncing we report the > last time stamp received from this client and ask for all ops this client > received later. If these are few, then simply get them (even if we know > some of the ops already, because we got them from another client). If there > are too many ops, fall back to the merkle tree. With a good approximation > of RTT and bandwidth, it is easy to calculate which algorithm is the best > to sync two clients. > > Greetings > Torben >