Yep, probably right. Sounds good.
On Mon, Jun 24, 2013 at 5:53 PM, Michael MacFadden
<michael.macfad...@gmail.com> wrote:
> I will respond to this, section by section, in a bit.
>
> However, my general comment is that there is a lot to digest here and
> I think we need to start putting this stuff in a wiki page(s). I think
> it will get buried here. I think we might need individual pages on
> each protocol idea, and then a pro/con page where we compare them
> based on discussions on the list.
>
> ~Michael
>
> On Jun 24, 2013, at 1:06 PM, Joseph Gentle <jose...@gmail.com> wrote:
>
>> I want to start the discussion around what OT algorithms to use. This
>> is going to get technical. This is not a bug. Please ask simple but
>> tangential questions (like how does wave's current crypto system
>> work?) in a new thread instead of this one.
>>
>> Requirements:
>> - Arbitrary peers can synchronize data.
>> - We can collaboratively edit tree structures, key-value pairs and
>> rich text. (I'd like arbitrary JSON editing.)
>> - Fast enough to work on mobile devices. The protocol needs to be low
>> bandwidth and require a low number of network round-trips.
>> - Ideally, a low storage overhead.
>>
>> Nice-to-have:
>> - End-to-end encryption of content. (In the wake of PRISM, the less
>> information my wave server needs to know about me, the better.)
>> - Some wave version of OTR[1]
>> - Presence & cursors
>> - Simple. At least, as simple as we can manage. Let's at least try :/
>>
>> Is there anything else I'm missing? Some of this stuff is only
>> tangentially related to OT (like cursors and encryption), but it's
>> still important that we bring it up.
>>
>> Given that we want to be able to connect arbitrary peers, I think it
>> makes sense to use a message encryption & signing system. Currently,
>> wave servers sign their users' messages. To support users
>> collaborating over a LAN without servers, we should allow users to
>> sign (& encrypt) their own messages. Unfortunately, the signature
>> will become invalid if the operation is transformed.
>>
>> Consider peers A and B (communicating over a LAN), and disconnected
>> peer C. A and B each generate simultaneous operations and sign their
>> operations. A sends its operation to B, who transforms it. B then
>> connects to C and sends both operations. For C to verify A's
>> operation, C must see the original (untransformed) version of the
>> operation. I think the only way we can make this work securely is by
>> only ever sending original operations, never operations that have
>> already been transformed.
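>>
>> To sketch what I mean (javascript, with invented names; node's
>> crypto module as a stand-in, and real code would need a canonical
>> byte encoding rather than JSON.stringify):
>>
>> const crypto = require('crypto');
>>
>> // Sign the *original* op bytes. Transformed copies are derived
>> // locally and never go on the wire, so signatures stay verifiable.
>> function makeSignedOp(op, parents, privateKey) {
>>   const originalBytes = Buffer.from(JSON.stringify({ op, parents }));
>>   return {
>>     originalBytes, // the exact bytes every peer hashes & verifies
>>     id: crypto.createHash('sha256').update(originalBytes).digest('hex'),
>>     sig: crypto.sign(null, originalBytes, privateKey),
>>   };
>> }
>>
>> // On receipt, verify against the original bytes first; only then
>> // transform a private copy against local concurrent ops.
>> function verifyOp(signedOp, publicKey) {
>>   return crypto.verify(null, signedOp.originalBytes, publicKey,
>>                        signedOp.sig);
>> }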
>>
>> As far as I know, there are three broad directions we could take
>> algorithmically:
>> 1. CRDTs (like Logoot[2])
>> 2. 'Classical' OT using vector clocks for versioning.
>> 3. The OT system that Torben Weis was working on. (Some might
>> remember him from the wave summit a few years ago.) I have been
>> chatting on and off with him about this since the summit. He hasn't
>> published anything on it yet though. It's similar to classical OT,
>> except it uses git-style version hashes.
>>
>> CRDTs are probably the most elegant solution. They avoid the entire
>> complicated mess of transformation. However:
>> - They add significant overhead to the document size (we must store a
>> unique name between every two characters in the wave).
>> - As far as I know, nobody has implemented arbitrary JSON editing
>> using CRDTs. Implementing a set is very difficult.
>> - Even without transform, we'll still have to store operations anyway
>> so peers can verify signatures.
>>
>> I'd like to know what we should expect the actual document size
>> overhead to be. I expect it to be at least a 5x size increase, but I
>> don't know for sure. The papers I've read cleverly minimize the
>> overhead by only working on a line-by-line basis, but that's
>> unacceptable for wave.
>>
>> As I understand it, a vector clock grows with the number of unique
>> peers which have ever connected to the network (it contains an
>> integer version number per peer on the wave), and a vector clock
>> must be stored in each operation - which always seemed unpalatable
>> enough to me that I never explored them further. Note that this is
>> per *peer*, not per user. If we want full p2p, that's one entry per
>> device which has ever edited the wave. (There's a rough sketch of
>> the bookkeeping after the list below.)
>>
>> There are probably some clever ways we could minimize that size. Off
>> the top of my head:
>> - Most operations are generated on the same site as their single
>> parent. These operations only increment the local version number. We
>> could special-case this. I don't know how much better this would
>> make the system - I suspect a lot.
>> - If we want, we could build a hybrid system where most clients use
>> a client-server protocol like the current system. Servers could then
>> use vector clocks or whatever to confer amongst themselves. This way
>> we only need one element in the clock per server. However, we'd lose
>> most of the benefits of having a p2p OT system in the first place.
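>>
>> For concreteness, here's roughly the bookkeeping the clock scheme
>> above implies (a javascript sketch, names invented). The pain point
>> is in stampLocalOp(): every op carries a full copy of the clock, one
>> entry per device that has ever touched the wave:
>>
>> // Local clock: peerId -> latest version number seen from that peer.
>> const clock = new Map();
>>
>> // Stamp a locally generated op with a snapshot of the whole clock.
>> function stampLocalOp(myPeerId) {
>>   clock.set(myPeerId, (clock.get(myPeerId) || 0) + 1);
>>   return new Map(clock);
>> }
>>
>> // a happened before b iff a <= b componentwise, with at least one <.
>> // If neither direction holds, the ops are concurrent and need to be
>> // transformed against each other.
>> function happenedBefore(a, b) {
>>   let strictlyLess = false;
>>   for (const p of new Set([...a.keys(), ...b.keys()])) {
>>     const va = a.get(p) || 0, vb = b.get(p) || 0;
>>     if (va > vb) return false;
>>     if (va < vb) strictlyLess = true;
>>   }
>>   return strictlyLess;
>> }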
>>
>> Torben's system is the one I know the most about. I'm biased to
>> think it's the right answer because I understand it better than the
>> others. I just spent the last couple of hours writing up how it
>> works, but I'm going to run it by Torben before posting it to this
>> list.
>>
>> It works like this:
>>
>> First, we need a transform function that supports TP2. The simplest
>> way to do this is to add tombstones to the data type. (We'll need to
>> do this for the vector clock based system as well, by the way.) I
>> have a plaintext implementation which does this in javascript[3],
>> although it's quite slow (~150k transforms per second). I expect an
>> optimized version to manage around 10-20 million transforms/second
>> in C[4] for simple ops, and probably a few million per second in
>> java.
>>
>> Second, make an inverse transform function (prune), such that
>> P(T(a, b), b) = a.
>>
>> In its current incarnation, the algorithm works as follows. Consider
>> a DAG of all the operations. The closest metaphor is a network of
>> commits in git. Every operation has a unique hash identifier and one
>> or more parents (except the root, which has no parents). It'll look
>> something like this:
>> https://dl.dropboxusercontent.com/u/2494815/ops.png
>> ... except it'll usually be waaaay less complicated than that.
>>
>> The cool thing about this structure is that if you apply all of the
>> operations in that graph in *any order*, you'll end up in the same
>> document state. (So long as each node comes after all its parents
>> and you use the transform function correctly.)
>>
>> Each peer stores a history list of operations. Each operation is
>> stored twice. One copy is its original form (the exact bytes which
>> were used for hashing and signing); when sending the operation to a
>> remote peer, this is the copy we send. The second copy of the op is
>> transformed. At all times, the transformed ops in the history list
>> are written such that you could directly apply every operation in
>> the history list (in order) to a new document and you would reach
>> the current document state. The history lists on different nodes
>> don't have to be in the same order.
>>
>> We also store a copy of the current document snapshot, because it's
>> really useful for the UI.
>>
>> Consider the DAG again. Every operation in the history list is a
>> node in that DAG, with its parents as other nodes. The DAG will have
>> at least one node with no children (= the bottom nodes in that
>> diagram I linked above). These childless nodes form the *frontier*.
>> When a new operation is created, all ops in the frontier become the
>> new node's parents. The new node also stores the number of ancestors
>> it has (= the history list length). I'll get to why that's useful in
>> a moment.
>>
>> The history list has a special power because of the prune function:
>> operations which don't depend on one another can be swapped in
>> place:
>>
>> swap = (a, b) ->
>>   b' = prune(b, a)
>>   a' = transform(a, b')
>>
>> When peers connect, they send each other missing ops. Figuring out
>> which ops are missing can be surprisingly tricky - but we'll figure
>> that out later. New ops must be ingested in order, so we always
>> ingest an operation after ingesting all of its parents.
>>
>> So you're given an op from another peer to ingest. We need to
>> transform that operation by all the operations we have locally that
>> aren't in the op's ancestry. You can't just arbitrarily transform
>> two ops by one another - they both have to be applicable to the same
>> document. So what we do is rewrite our local history so that all of
>> the operation's ancestors are at the start of our history list, and
>> all of the other operations are at the end. Once we've done that,
>> the operation can be transformed by all the ops in the second half
>> of our history list, applied, and appended to the end of the history
>> list.
>>
>> Remember, we know how many ancestors the new operation has - we
>> saved that when the op was created. We also have all of that
>> operation's parents in our history list. So we find the position of
>> the op's last parent in the history list. If that position (counting
>> from 1) matches the new op's ancestor count, we're golden.
>>
>> Imagine your history list contains:
>> A B X Y
>>
>> and the peer you're connecting to has:
>> A B C
>>
>> (C has B as a parent.) C has 2 ancestors (A and B). When you ingest
>> C, you find B (its last parent) at position 2 in the history list.
>> So you know that everything after B in the history list isn't one of
>> C's ancestors, and you can transform C by X, Y and then apply it.
>>
>> If our history list instead looks like this:
>>
>> X A B Y
>>
>> ... then you'll find B too late in your history list. You need to
>> rewrite your history list to move the X after B. You use the swap
>> function to do that, swapping X and A, then X and B. Your history
>> then looks like A B X Y like we wanted, so you can proceed as above.
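>>
>> Here's a rough code sketch of swap + ingest (javascript-ish;
>> transform() and prune() come from a TP2 type like text-tp2[3], and
>> isAncestorOf() is a hypothetical DAG-reachability helper I'm not
>> showing - a simpler stand-in for the last-parent-position trick
>> above, which I keep only as a sanity check):
>>
>> // A history entry: { id, parents, ancestorCount, originalOp,
>> // transformedOp }. Swap the adjacent entries at i and i+1, where
>> // history[i+1] does not depend on history[i].
>> function swapAdjacent(history, i) {
>>   const a = history[i], b = history[i + 1];
>>   const bFirst = prune(b.transformedOp, a.transformedOp); // b as if a never happened
>>   const aAfter = transform(a.transformedOp, bFirst);      // a applied after b
>>   history[i]     = { ...b, transformedOp: bFirst };
>>   history[i + 1] = { ...a, transformedOp: aAfter };
>> }
>>
>> // Ingest a remote op. Assumes all of its parents were ingested
>> // already (ops arrive in causal order).
>> function ingest(history, newOp) {
>>   // Bubble every ancestor of newOp to the front of the history,
>>   // preserving relative order: afterward the list reads
>>   // [ancestors..., everything else...].
>>   let front = 0;
>>   for (let i = 0; i < history.length; i++) {
>>     if (isAncestorOf(history[i], newOp)) {
>>       for (let j = i; j > front; j--) swapAdjacent(history, j - 1);
>>       front++;
>>     }
>>   }
>>   // The ancestor count stored at creation time doubles as a check.
>>   if (front !== newOp.ancestorCount) throw new Error('missing parents');
>>
>>   // Transform by everything that isn't an ancestor, then append.
>>   let op = newOp.originalOp;
>>   for (let i = front; i < history.length; i++)
>>     op = transform(op, history[i].transformedOp);
>>   history.push({ ...newOp, transformedOp: op });
>>   return op; // apply this to the local document snapshot
>> }
>>
>> In the X A B Y example, the inner loop performs exactly the
>> "swap X and A, then X and B" sequence described above, after which C
>> only gets transformed by X and Y.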
>>
>> This algorithm requires a lot of transforms, but I think most of
>> them are unavoidable because of the signing / trust issue. It also
>> requires that every operation (ie, every keypress) has a unique ID.
>> That's probably going to be a 16 byte hash or something, which seems
>> like a lot of extra bytes for typing one character.
>>
>> I implemented it here:
>> https://github.com/josephg/tp2stuff/blob/master/node2.coffee
>>
>> ... though with the randomizer setup I have (a handful of peers, all
>> randomly generating operations and periodically (randomly)
>> connecting to each other to sync up), the transform counts get big:
>>
>> 5 clients generating operations, with two random clients syncing
>> about every 5 operations, results in about 100x as many transforms
>> as operations (10k ops -> 11M transforms). Doing the same thing with
>> 3 clients 'only' needs 2M transforms. Performance stays steady over
>> time (it doesn't get worse as the document grows like CRDTs will) -
>> this scatterplot shows the number of seconds per 30 iterations on
>> one of my tests:
>> https://dl.dropboxusercontent.com/u/2494815/Screen%20Shot%202013-06-24%20at%201.01.42%20PM.png
>>
>> I think this huge number of transforms is one of the costs of
>> encrypting & signing the ops. We might be able to lower the number
>> of transforms required if we trust our servers with our keys - but
>> I'm not sure I want to do that. Real-world performance should be
>> much better than that anyway, but yeah - it makes me nervous. I'm
>> probably going to run more tests to see what kind of performance we
>> can expect in practice.
>>
>> Blah. That was a lot of words. Thoughts?
>> -J
>>
>> [1] http://www.cypherpunks.ca/otr/
>> [2] http://hal.archives-ouvertes.fr/docs/00/34/59/11/PDF/main.pdf
>> [3] https://github.com/share/ottypes/blob/master/lib/text-tp2.js
>> [4] based on my extremely fast non-tombstone equivalent data
>> structure here: https://github.com/share/libot/blob/master/text.h