Sounds like we need a Hangout with an online whiteboard or such to work this out...? John
On Tue, Jun 25, 2013 at 4:53 PM, Sam Nelson <so...@orcon.net.nz> wrote: > +1 for structured wiki pages. > > I re-read the original email a few times now and I still get a little bit > lost around the bit of ancestor count - compare history lists A B C and D B > X Y, when trying to apply X the ancestor count will match both, how does > that work? X will have a parent of B like C has a parent of B. Both C and > X have an ancestor count of 2. Or have I mistaken this, in that it should > be impossible for two peers to have operation B, without having identical > history lists up to that point? > > I'll keep going through your tp2stuff as I have time and eventually, I > might be able to contribute intelligently in some way (: > > -Sam > > > > On 25/06/2013 14:20, Joseph Gentle wrote: > >> Yep, probably right. - sounds good. >> >> On Mon, Jun 24, 2013 at 5:53 PM, Michael MacFadden >> <michael.macfad...@gmail.com> wrote: >> >>> I will respond to this section by section in a bit. >>> >>> However my general comment is that there is a lot to digest here and I >>> think we need to start putting this stuff in a wiki page(s). I think it >>> will get buried here. I think we might need individual pages on each >>> protocol idea. And the a pro/con page where we compare them based on >>> discussions on the list. >>> >>> ~Michael >>> >>> On Jun 24, 2013, at 1:06 PM, Joseph Gentle <jose...@gmail.com> wrote: >>> >>> I want to start the discussion around what OT algorithms to use. This >>>> is going to get technical. This is not a bug. Please ask simple but >>>> tangential questions (like how does wave's current crypto system >>>> work?) in a new thread instead of this one. >>>> >>>> Requirements: >>>> - Arbitrary peers can syncronize data >>>> - We can collaboratively edit tree structures, key-value pairs and >>>> rich text. (I'd like arbitrary JSON editing) >>>> - Fast enough to work on mobile devices. Protocol needs to be low >>>> bandwidth and require a low number of network round-trips. >>>> - Ideally a low storage overhead >>>> >>>> Nice-to-have: >>>> - End-to-end encryption of content. (In the wake of PRISM, the less >>>> information my wave server needs to know about me, the better.) >>>> - Some wave version of OTR[1] >>>> - Presence & cursors >>>> - Simple. At least, as simple as we can manage. Lets at least try :/ >>>> >>>> Is there anything else I'm missing? Some of this stuff is only >>>> tangentially related to OT (like cursors and encryption), but its >>>> still important we bring it up. >>>> >>>> >>>> Given that we want to be able to connect arbitrary peers, I think it >>>> makes sense to use a message encryption & signing system. Currently, >>>> wave servers sign their user's messages. To support users >>>> collaborating over a LAN without servers, we should allow users to >>>> sign (& encrypt) their own messages. Unfortunately, the signature will >>>> become invalid if the operation is transformed. >>>> >>>> Consider peers A and B (communicating over a LAN), and disconnected >>>> peer C. A and B each generate simultaneous operations and sign their >>>> operations. A sends its operation to B, who transforms it. B then >>>> connects to C and sends both operations. For C to verify A's >>>> operation, C must see the original (untransformed) version of the >>>> operation. I think the only way we can make this work securely is by >>>> only sending original operations instead of sending operations after >>>> they've been transformed. >>>> >>>> As far as I know, there's three broad directions we could take >>>> algorithmically: >>>> 1. CRDTs (like Logoot[1]) >>>> 2. 'Classical' OT using vector clocks for versioning. >>>> 3. The OT system that Torben Weis was working on. (Some might remember >>>> him from the wave summit a few years ago). I have been chatting on and >>>> off with him about this since the summit. He hasn't published anything >>>> on it yet though. Its similar to classical OT, except using git style >>>> version hashes. >>>> >>>> >>>> CRDTs are probably the most elegant solution. They avoid the entire >>>> complicated mess of transformation. However: >>>> - They add a significant overhead to the document size (we must store >>>> a unique name between every two characters in the wave) >>>> - As far as I know, nobody has implemented arbitrary JSON editing >>>> using CRDTs. Implementing a set is very difficult. >>>> - Even without transform, we'll still have to store operations anyway >>>> so peers can verify signatures. >>>> >>>> I'd like to know what we should expect the actual document size >>>> overhead to be. I expect it to be at least a 5x size increase, but I >>>> don't know for sure. The papers I've read cleverly minimize the >>>> overhead by only doing OT on a line-by-line basis, but thats >>>> unacceptable for wave. >>>> >>>> >>>> As I understand it, vector clocks are the size of the number of unique >>>> peers which have ever connected to the network. (They contain an >>>> integer version number per peer on the wave). As I understand them, a >>>> vector clock must be stored in each operation - which to me always >>>> seemed unpalatable enough I didn't explore them further. Note that >>>> this is per *peer*, not per user. If we want full p2p, this is one per >>>> device which has ever edited a wave. >>>> >>>> There's probably some clever ways we could minimize that size. Off the >>>> top of my head: >>>> - Most operations are generated on the same site as their single >>>> parent. These operations only increment the local version number. We >>>> could special case this. I don't know how much better this would make >>>> the system - I suspect a lot. >>>> - If we want, we could build a hybrid system where most clients use a >>>> client-server protocol like the current system. Servers could then use >>>> vector clocks or whatever to confer amongst themselves. This way we >>>> only need one element in the clock per server. However, we lose most >>>> of the benefits of having a p2p OT system to begin with. >>>> >>>> >>>> Torben's system is the one I know the most about. I'm biased to think >>>> thats the right answer because I understand it better than the others. >>>> I just spent the last couple hours writing up how it works, but I'm >>>> going to run it by Torben before posting to this list. >>>> >>>> It works like this: >>>> >>>> First, we need a transform function that supports TP2. The simplest >>>> way to do this is to add tombstones to the data type. (We'll need to >>>> do this for the vector clock based system as well, by the way). I have >>>> a plaintext implementation which does this in javascript[3], although >>>> its quite slow (~150k transforms per second). I expect that an >>>> optimized version of this will be around 10-20 million transforms / >>>> second in C[4] for simple ops, and probably a few million per second >>>> in java. >>>> >>>> Second, make an inverse transform function (prune). P(T(a, b), b) = a. >>>> >>>> In its current incantation, the algorithm works as follows: >>>> Consider a DAG of all the operations. The closest metaphor is a >>>> network of operations in git. Every operation has a unique hash >>>> identifier and one or more parents (except the root, which has no >>>> parents). It'll look something like this: >>>> https://dl.dropboxusercontent.**com/u/2494815/ops.png<https://dl.dropboxusercontent.com/u/2494815/ops.png> >>>> ... except it'll usually be waaaay less complicated than that. >>>> >>>> The cool thing about this structure is that if you apply all of the >>>> operations in that graph in *any order*, you'll end up in the same >>>> document state. (So long as each node comes after all its parents and >>>> you use the transform function correctly). >>>> >>>> Each peer stores a history list of operations. Each operation is >>>> stored twice. One copy is its original form (the exact bytes which >>>> were used for hashing and signing). When sending the operation to a >>>> remote peer, this is the copy we send. The second copy of the op is >>>> transformed. At all times, the transformed ops in the history list are >>>> written such that you could directly apply every operation in the >>>> history list (in order) to a new document and you would reach the >>>> current document state. The history lists on different nodes don't >>>> have to be in the same order. >>>> >>>> We also store a copy of the current document snapshot, because its >>>> really useful for UI. >>>> >>>> Consider the DAG again. Every operation in the history list is a node >>>> in that DAG, with its parents as other nodes. The DAG will have at >>>> least one node with no children (= the bottom nodes in that diagram I >>>> linked above). These childless nodes form the *frontier*. When a new >>>> operation is created, all ops in the frontier become the new node's >>>> parents. The new node also stores the number of ancestors it has (= >>>> history list length). I'll get to why thats useful that in a moment. >>>> >>>> The history list has a special power because of the prune function: >>>> operations which don't depend on one another can be swapped in place. >>>> >>>> swap = (a, b) -> >>>> b' = prune(b, a) >>>> a' = transform(a, b) >>>> >>>> >>>> When peers connect, they send each other missing ops. Figuring out >>>> which ops are missing can be surprisingly tricky - but we'll figure >>>> that out later. New ops must be ingested in order, so we always ingest >>>> an operation after ingesting all of its parents. >>>> >>>> So you're given an op from another peer to ingest. We need to >>>> transform that operation by all operations we have locally that aren't >>>> in the op's ancestors. You can't just arbitrarily transform two ops by >>>> one another - they have to be both be applyable to the same document. >>>> So what we do is rewrite our local history so that all of the >>>> operation's ancestors are at the start of our history list, and all of >>>> the other operations are at the end of the history. Once we've done >>>> that, the operation can be transformed by all the ops in the second >>>> half of our history list and applied (and appended at the end of the >>>> history list). >>>> >>>> Remember, we know how many ancestors the new operation has. We saved >>>> that earlier when the op was created. We also have all of that >>>> operation's parents in our history list. So we find the position of >>>> the last parent the op has in the history list. If that operation is >>>> at the same position as the new op's ancestor length, we're golden. >>>> >>>> Imagine your history list contains: >>>> A B X Y >>>> >>>> and the peer you're connecting to has: >>>> A B C >>>> >>>> (C has B as a parent). C has 2 ancestors (A and B). When you ingest C, >>>> you find B (its last parent), which is at position 2 in the history >>>> list. So you know that everything after that in the history list isn't >>>> one of C's parents, and you can transform C by X, Y and then apply it. >>>> >>>> If our history list instead looks like this: >>>> >>>> X A B Y >>>> ... then you'll find B is too late in your history list. You need to >>>> rewrite your history list to move the X after B. You use the swap >>>> function to do that, swapping X and A, then X and B. Your history then >>>> looks like A B X Y like we wanted, so you can proceed as above. >>>> >>>> This algorithm requires a lot of transforms, but I think most of them >>>> are unavoidable because of the signing / trust issue. It also requires >>>> that every operation (ie, every key press) has a unique ID. Thats >>>> probably going to be a 16 byte hash or something; which seems like a >>>> lot of extra bytes for typing one character. >>>> >>>> I implemented it here: >>>> https://github.com/josephg/**tp2stuff/blob/master/node2.**coffee<https://github.com/josephg/tp2stuff/blob/master/node2.coffee> >>>> >>>> ... though with the randomizer setup I have (a handful of peers, all >>>> randomly generating operations and periodically (randomly) connecting >>>> to each other to sync up. >>>> >>>> 5 clients generating operations, with two random clients syncing about >>>> every 5 operations results in about 100x as many transforms as >>>> operations. (10k ops -> 11M transforms). Doing the same thing with 3 >>>> clients 'only' needs 2M transforms. Performance stays steady over time >>>> (it doesn't get worse as the document grows like CRDTs will) - this >>>> scatterplot shows the number of seconds per 30 iterations on one of my >>>> tests: >>>> https://dl.dropboxusercontent.**com/u/2494815/Screen%20Shot%** >>>> 202013-06-24%20at%201.01.42%**20PM.png<https://dl.dropboxusercontent.com/u/2494815/Screen%20Shot%202013-06-24%20at%201.01.42%20PM.png> >>>> >>>> I think this huge number of transforms required is one of the costs of >>>> doing encryption & signing of the ops. We might be able to lower the >>>> number of transforms required if we trust our servers with our keys - >>>> but I'm not sure I want to do that. Real-world performance should be >>>> much better than that anyway, but yeah - it makes me nervous. I'm >>>> probably going to run more tests to see what kind of performance we >>>> can expect in practice. >>>> >>>> >>>> Blah. That was a lot of words. Thoughts? >>>> -J >>>> >>>> >>>> [1] http://www.cypherpunks.ca/otr/ >>>> [2] >>>> http://hal.archives-ouvertes.**fr/docs/00/34/59/11/PDF/main.**pdf<http://hal.archives-ouvertes.fr/docs/00/34/59/11/PDF/main.pdf> >>>> [3] >>>> https://github.com/share/**ottypes/blob/master/lib/text-**tp2.js<https://github.com/share/ottypes/blob/master/lib/text-tp2.js> >>>> [4] based on my extremely fast non-tombstone equivalent data structure >>>> here: >>>> https://github.com/share/**libot/blob/master/text.h<https://github.com/share/libot/blob/master/text.h> >>>> >>> >