I will respond to this, section by section, in a bit. My general comment, though, is that there is a lot to digest here and I think we need to start putting this stuff into wiki pages - it will get buried in this thread. We might need an individual page for each protocol idea, and then a pro/con page where we compare them based on discussions on the list.
~Michael

On Jun 24, 2013, at 1:06 PM, Joseph Gentle <jose...@gmail.com> wrote:

> I want to start the discussion around what OT algorithms to use. This is going to get technical. This is not a bug. Please ask simple but tangential questions (like how does wave's current crypto system work?) in a new thread instead of this one.
>
> Requirements:
> - Arbitrary peers can synchronize data
> - We can collaboratively edit tree structures, key-value pairs and rich text. (I'd like arbitrary JSON editing.)
> - Fast enough to work on mobile devices. The protocol needs to be low bandwidth and require a low number of network round-trips.
> - Ideally, a low storage overhead
>
> Nice-to-have:
> - End-to-end encryption of content. (In the wake of PRISM, the less information my wave server needs to know about me, the better.)
> - Some wave version of OTR[1]
> - Presence & cursors
> - Simple. At least, as simple as we can manage. Let's at least try :/
>
> Is there anything else I'm missing? Some of this stuff is only tangentially related to OT (like cursors and encryption), but it's still important that we bring it up.
>
> Given that we want to be able to connect arbitrary peers, I think it makes sense to use a message encryption & signing system. Currently, wave servers sign their users' messages. To support users collaborating over a LAN without servers, we should allow users to sign (& encrypt) their own messages. Unfortunately, a signature becomes invalid if the operation is transformed.
>
> Consider peers A and B (communicating over a LAN), and disconnected peer C. A and B each generate simultaneous operations and sign them. A sends its operation to B, who transforms it. B then connects to C and sends both operations. For C to verify A's operation, C must see the original (untransformed) version of the operation. I think the only way we can make this work securely is by only ever sending original operations, never operations after they've been transformed.
>
> As far as I know, there are three broad directions we could take algorithmically:
> 1. CRDTs (like Logoot[2])
> 2. 'Classical' OT using vector clocks for versioning.
> 3. The OT system that Torben Weis was working on. (Some might remember him from the wave summit a few years ago.) I have been chatting on and off with him about this since the summit. He hasn't published anything on it yet, though. It's similar to classical OT, except it uses git-style version hashes.
>
> CRDTs are probably the most elegant solution. They avoid the entire complicated mess of transformation. However:
> - They add a significant overhead to the document size (we must store a unique name between every two characters in the wave)
> - As far as I know, nobody has implemented arbitrary JSON editing using CRDTs. Implementing a set is very difficult.
> - Even without transform, we'll still have to store operations anyway so peers can verify signatures.
>
> I'd like to know what we should expect the actual document size overhead to be. I expect at least a 5x size increase, but I don't know for sure. The papers I've read cleverly minimize the overhead by only working on a line-by-line basis, but that's unacceptable for wave.
>
> As I understand it, vector clocks are the size of the number of unique peers which have ever connected to the network. (They contain an integer version number per peer on the wave.)
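> To make that concrete, here's a minimal hypothetical sketch of a vector clock as a plain map from peer id to counter. The names (tick, merge, dominates) are just illustrative, not from any existing wave codebase:
>
> # Hypothetical sketch: a vector clock as {peerId: count}.
> tick = (clock, peerId) ->
>   next = {}
>   next[p] = v for own p, v of clock
>   next[peerId] = (next[peerId] ? 0) + 1
>   next
>
> # Element-wise maximum, for when we see a remote op.
> merge = (a, b) ->
>   out = {}
>   out[p] = v for own p, v of a
>   out[p] = Math.max(out[p] ? 0, v) for own p, v of b
>   out
>
> # a causally dominates b iff a has seen everything b has seen.
> dominates = (a, b) ->
>   for own p, v of b
>     return false if (a[p] ? 0) < v
>   true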
> As I understand them, a vector clock must be stored in each operation - which always seemed unpalatable enough to me that I didn't explore them further. Note that this is per *peer*, not per user. If we want full p2p, this is one entry per device which has ever edited a wave.
>
> There are probably some clever ways we could minimize that size. Off the top of my head:
> - Most operations are generated on the same site as their single parent. These operations only increment the local version number. We could special-case this. I don't know how much better this would make the system - I suspect a lot.
> - If we want, we could build a hybrid system where most clients use a client-server protocol like the current system. Servers could then use vector clocks or whatever to confer amongst themselves. This way we only need one element in the clock per server. However, we'd lose most of the benefits of having a p2p OT system to begin with.
>
> Torben's system is the one I know the most about. I'm biased to think it's the right answer because I understand it better than the others. I just spent the last couple of hours writing up how it works, but I'm going to run it by Torben before posting it to this list.
>
> It works like this:
>
> First, we need a transform function that satisfies TP2. The simplest way to do this is to add tombstones to the data type. (We'll need to do this for the vector clock based system as well, by the way.) I have a plaintext implementation which does this in javascript[3], although it's quite slow (~150k transforms per second). I expect that an optimized version would run at around 10-20 million transforms per second in C[4] for simple ops, and probably a few million per second in java.
>
> Second, we need an inverse transform function (prune): P(T(a, b), b) = a.
>
> In its current incarnation, the algorithm works as follows: consider a DAG of all the operations. The closest metaphor is the network of commits in git. Every operation has a unique hash identifier and one or more parents (except the root, which has no parents). It'll look something like this:
> https://dl.dropboxusercontent.com/u/2494815/ops.png
> ... except it'll usually be waaaay less complicated than that.
>
> The cool thing about this structure is that if you apply all of the operations in that graph in *any order*, you'll end up in the same document state. (So long as each node comes after all its parents and you use the transform function correctly.)
>
> Each peer stores a history list of operations. Each operation is stored twice. One copy is its original form (the exact bytes which were used for hashing and signing); when sending the operation to a remote peer, this is the copy we send. The second copy of the op is transformed. At all times, the transformed ops in the history list are written such that you could directly apply every operation in the history list (in order) to a new document and you would reach the current document state. The history lists on different nodes don't have to be in the same order.
>
> We also store a copy of the current document snapshot, because it's really useful for UI.
>
> Consider the DAG again. Every operation in the history list is a node in that DAG, with its parents as other nodes. The DAG will have at least one node with no children (= the bottom nodes in the diagram I linked above). These childless nodes form the *frontier*.
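> In hypothetical code (field names are mine, nothing here is from a real implementation), computing the frontier and creating a new op might look like:
>
> # Sketch: `ops` maps hash -> {parents: [hash, ...], ...}. The frontier
> # is every op that no other op lists as a parent.
> frontier = (ops) ->
>   hasChild = {}
>   for own hash, op of ops
>     hasChild[p] = true for p in op.parents
>   (hash for own hash of ops when not hasChild[hash])
>
> # A new op takes the whole frontier as its parents and records its
> # ancestor count (= current history list length). Its unique id would
> # be a hash of the signed original bytes - elided here.
> createOp = (ops, body) ->
>   parents: frontier(ops)
>   ancestors: Object.keys(ops).length
>   body: body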
> When a new operation is created, all ops in the frontier become the new node's parents. The new node also stores the number of ancestors it has (= the history list length). I'll get to why that's useful in a moment.
>
> The history list has a special power because of the prune function: operations which don't depend on one another can be swapped in place.
>
> swap = (a, b) ->
>   b' = prune(b, a)
>   a' = transform(a, b')
>
> When peers connect, they send each other missing ops. Figuring out which ops are missing can be surprisingly tricky - but we'll figure that out later. New ops must be ingested in order, so we always ingest an operation after ingesting all of its parents.
>
> So you're given an op from another peer to ingest. We need to transform that operation by all operations we have locally that aren't in the op's ancestry. You can't just arbitrarily transform two ops by one another - they both have to be applicable to the same document. So what we do is rewrite our local history so that all of the operation's ancestors are at the start of our history list, and all of the other operations are at the end. Once we've done that, the operation can be transformed by all the ops in the second half of our history list, applied, and appended to the end of the history list.
>
> Remember, we know how many ancestors the new operation has - we saved that when the op was created. We also have all of that operation's parents in our history list. So we find the position of the op's last parent in the history list. If that position matches the new op's ancestor count, we're golden.
>
> Imagine your history list contains:
> A B X Y
>
> and the peer you're connecting to has:
> A B C
>
> (C has B as a parent.) C has 2 ancestors (A and B). When you ingest C, you find B (its last parent), which is at position 2 in the history list. So you know that everything after that in the history list isn't one of C's ancestors, and you can transform C by X, Y and then apply it.
>
> If your history list instead looks like this:
>
> X A B Y
>
> ... then you'll find B is too late in your history list. You need to rewrite your history list to move the X after B. You use the swap function to do that, swapping X and A, then X and B. Your history then looks like A B X Y as we wanted, so you can proceed as above. (There's a sketch of this reordering step after the numbers below.)
>
> This algorithm requires a lot of transforms, but I think most of them are unavoidable because of the signing / trust issue. It also requires that every operation (ie, every keypress) has a unique ID. That's probably going to be a 16 byte hash or something, which seems like a lot of extra bytes for typing one character.
>
> I implemented it here:
> https://github.com/josephg/tp2stuff/blob/master/node2.coffee
>
> ... though so far only with the randomizer setup I have (a handful of peers, all randomly generating operations and periodically (randomly) connecting to each other to sync up).
>
> 5 clients generating operations, with two random clients syncing about every 5 operations, results in over 1000x as many transforms as operations (10k ops -> 11M transforms). Doing the same thing with 3 clients 'only' needs 2M transforms.
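> Here's the reordering sketch I promised above. It's hypothetical - it assumes a swapAt(history, i) helper that exchanges history[i] and history[i+1] via prune/transform as in the swap function, and that we keep each op's id and parent ids around:
>
> # Collect the full ancestor set of `op` by walking parent links.
> # `byId` maps op id -> stored (original) op, with .parents.
> ancestorSet = (op, byId) ->
>   seen = {}
>   stack = op.parents.slice()
>   while stack.length
>     id = stack.pop()
>     continue if seen[id]
>     seen[id] = true
>     stack.push p for p in byId[id].parents
>   seen
>
> # Rewrite `history` in place so op's ancestors fill the first
> # op.ancestors slots (bubbling each ancestor left past non-ancestors
> # with swapAt), then transform op by the remainder and append it.
> # Returns the transformed op, ready to apply to the local snapshot.
> ingest = (history, op, byId) ->
>   anc = ancestorSet(op, byId)
>   done = 0
>   for i in [0...history.length]
>     if anc[history[i].id]
>       swapAt(history, j - 1) for j in [i...done] by -1
>       done++
>       break if done is op.ancestors
>   transformed = op
>   transformed = transform(transformed, other) for other in history[op.ancestors..]
>   history.push transformed
>   byId[op.id] = op   # keep the original signed bytes too
>   transformed
>
> I suspect most of the transform count above comes from exactly this bubbling.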
> Performance stays steady over time (it doesn't get worse as the document grows, like it will with CRDTs) - this scatterplot shows the number of seconds per 30 iterations on one of my tests:
> https://dl.dropboxusercontent.com/u/2494815/Screen%20Shot%202013-06-24%20at%201.01.42%20PM.png
>
> I think this huge number of transforms is one of the costs of doing encryption & signing of the ops. We might be able to lower the number of transforms required if we trust our servers with our keys - but I'm not sure I want to do that. Real-world performance should be much better than these randomized tests anyway, but yeah - it makes me nervous. I'm probably going to run more tests to see what kind of performance we can expect in practice.
>
> Blah. That was a lot of words. Thoughts?
> -J
>
> [1] http://www.cypherpunks.ca/otr/
> [2] http://hal.archives-ouvertes.fr/docs/00/34/59/11/PDF/main.pdf
> [3] https://github.com/share/ottypes/blob/master/lib/text-tp2.js
> [4] Based on my extremely fast non-tombstone equivalent data structure here: https://github.com/share/libot/blob/master/text.h