Re: [Xournal-devel] collaborative xournal

Denis Auroux Tue, 19 Apr 2011 17:43:40 -0700

Sorry for writing so many messages; I just realized I made an 
unwarranted assumption in my previous two messages, and I'm beginning to 
understand better what you have been saying.

For people who don't want to read through all my verbose stuff, the 
point of this message is to explore more carefully
(1) how to simplify editing conflicts by only thinking in terms of 
creations/deletions rather than other operations; and
(2) how exactly the real-time nature of local editing can be reconciled 
with the non-real-time nature of network communication in this context.

This is a completely different viewpoint than the one I was previously 
advocating, and it leads to issues that are completely different but 
probably a lot simpler. (It's much less permissive, since conflicting 
edits end up being rejected rather than merged.)

You might not want to actually go this way and do something very 
different. My main point is that, before embarking onto actual coding, 
one should think about what actually happens in various scenarios of 
editing conflicts such as the ones I discuss below, and come up with 
strategies for all possible cases. Failing to think carefully in advance 
means one might end up wasting a lot of effort...

Denis

(1) FORMULATING ALL OPERATIONS AS CREATIONS/DELETIONS:

The types of items that currently go into the undo/redo stack are fairly 
close to "user operations" conceptually, but in fact one could try to 
view them all in terms of adding and removing items (strokes or text 
boxes) in the journal. Currently, moving a selected set of strokes is 
encoded as an ITEM_MOVESEL operation, which basically says which strokes 
are moved and by how much (the strokes don't have IDs, but if they did, 
they would keep the same IDs along the way); instead one could try to 
encode it as "delete all these strokes, and insert all these new strokes".

The two main conceptual differences I can see are:

1. Once everything is expressed in terms purely of creating and deleting 
items, there are far fewer things to worry about concerning the order in 
which operations are performed. So expressing everything as 
creations/deletions should simplify a lot the handling of editing 
conflicts. This is a compelling reason to move away from thinking in 
terms of actual user actions such as "move strokes", and expressing 
everything in terms of creations and deletions.

2. On the other hand, this completely breaks the "identity" of the 
strokes being manipulated: if I'm deleting my strokes and creating new 
ones elsewhere, they're no longer the same strokes.

To take an example: let's say we start with stroke #1 in green, user A 
changes it to red, and user B moves it on the page: if they are both 
doing this at the same time, A will try to say "delete stroke #1, create 
a new red stroke -- which will become say #2", while B will try to say 
"delete stroke #1, create a new green stroke elsewhere -- which becomes 
say stroke #3". The server gets to decide which operation happened 
first. Let's say it gets the message from A first, so that stroke #1 
gets deleted and replaced with stroke #2. Then what happens when B's 
operation is received?

The question is: do we end up with the two strokes #2 and #3 on the 
page? (one moved and the other repainted). Or does the second operation 
processed by the server become entirely invalid (because stroke #1 can't 
be deleted anymore, stroke #3 should not be created either)?
The second outcome is of course much better than the first one, and it 
is satisfactory in my opinion.  (It is different from ending up with a 
stroke that's been both moved and recolored, which would be the outcome 
in the "other" approach. That outcome is just impossible in this model; 
but I really don't think it's a serious problem).

Perhaps this is what Andreas meant by "error management" (if so, sorry, 
I was slow in understanding). I think the conclusion here is that 
messages sent to the server should consist of "minimally combined 
deletion and creation operations", i.e. within a same message one sends 
the set of creation/deletion pairs that correspond to a same conceptual 
operation, with the understanding that if one part of the operation is 
invalid because it involves deleting an already-deleted stroke, then the 
whole operation should become invalid and ignored.

In practice this means: if I try to modify a stroke that's already been 
nearly simultaneously modified by someone else, my operation will be 
rejected since the stroke I'm trying to delete doesn't exist anymore for 
the server.

This is a very reasonable way to handle conflicts (less permissive than 
what I was initially proposing but much more realistic to implement).
In effect, rather than handling out-of-order insertion of operations 
into the timeline of a stroke (where the stroke keeps existing in 
successive states over time, and we permit all sorts of conflicting 
operations on the stroke to be performed in any order), we end up with 
"ephemeral" strokes (that disappear as soon as they get modified), and 
we have to reject conflicting edits because the stroke no longer exists 
after its first modification. Of course this simplifies vastly the 
question of reordering operations, but it brings up the issue of 
cancelling rejected operations.

(2) REAL-TIME LOCAL EDITING VS. NETWORK COMMUNICATION DELAYS:

There's now the question of what the local instance should be doing 
while waiting for validation of a pending action from the server. The 
point is that the user needs to be able to keep working even if 
communication is very slow and delayed. In my initial proposal, the idea 
was that different operations on the same stroke can be carried out in 
different orders. In this proposal, since the first operation performed 
on the stroke actually destroys it and replaces it by a different 
stroke, we'll never perform a second operation on it.

Thinking of instance B in the above scenario, here's what might happen 
in succession:

1. locally I perform a move operation, replacing stroke #1 by a new 
stroke, and send a message to the server to let everyone know.
[I think the operation has to be performed locally -- waiting for 
confirmation is not realistic since it might take a while to arrive.]

2. I might have time to perform more operations involving my new stroke, 
still without hearing anything from the server (perhaps there's a 
network lag right now). My stroke needs to have an ID in order for me to 
be able to tell the server what I'm doing locally -- so indeed IDs have 
to be generated by the client, not by the server. (Sorry!) Let's say I 
give the new stroke of step 1 the ID number 3. I would then send the 
server more messages telling it that I'm further manipulating stroke #3 
(i.e., destroying it and replacing it by yet new strokes).

3. I might learn from the server about other pending operations which 
don't obviously conflict with what I attempted in steps 1 and 2. Those 
should be processed, and if ordering matters, these operations happen 
*before* the ones I attempted in step 1 and 2. (But I think ordering 
might not matter.)

4. I might learn from the server (*before* hearing back about my 
attempted move operation from step 1, since the message might have 
already been in transit over the network before step 1 took place) that 
someone has erased stroke #1 and replaced it by stroke #2. Since stroke 
#1 is already deleted in my local copy, this is the first indication of 
trouble to come. What to do about it is not 100% clear-cut. I can find 
out easily in my local data structures that the operation which caused 
stroke #1 to be deleted is the operation from step 1 -- and I know it 
hasn't been validated by the server yet.

This tells me right away that the server will likely reject my request 
of step 1. Since stroke #3 will never get created, the subsequent 
operations involving it will also get rejected. I'm heading into 
dangerous territory. So perhaps the only reasonable thing to do would be 
to just undo ALL the operations that I've tried to perform locally since 
the last one that was confirmed by the server -- or I can try to undo 
them all, then internally redo those that do not depend on stroke #1 or 
stroke #3 or other subsequently endangered strokes. At this point, I 
might even want to freeze my local user out of action until I know a bit 
better what's happening. This might even be the right time to bring up a 
"please wait while resolving editing conflict" dialog box (with a 
"cancel" button that would disconnect from the server and just continue 
in a local session, in case things break down too badly).
Either that, or I can let the user continue working, but it might be 
really confusing if the user continues to work and sees more and more 
things get cancelled for no apparent reason.

In any case, I need to perform the operation replacing stroke #1 by 
stroke #2, and perform it chronologically before any of the unresolved 
conflicts -- so I think that forces undoing the local operations of 
steps 1 and 2.

5. I learn subsequently from the server that the requests I made at step 
1 is denied. Not surprising, and perfectly ok since I already undid it. 
Or: I might also learn that my request was denied without my having 
already figured it out at step 4. [Not sure! Can it actually happen? If 
the server handles requests in a single sequential order, then it 
shouldn't reject the request unless it already processed information 
that invalidated stroke #1. But one might want to handle this 
possibility in case it happens, for robustness.] In that case I again 
need to undo the operation of step 1, as well as that of step 2 since 
stroke #3 is no longer valid either upon undoing step 1.

Or, more surprising: I might learn from the server that my request from 
step 1 is approved. [Can it actually happen? Most likely, YES because of 
undo. Point: the replacement of stroke #1 by stroke #2 could have been 
cancelled by now, due to an undo from user A. In this case stroke #1 
exists again by the time the server processes my request from step 1, 
and has no reason to deny it.]  What do I do then? Do I request a 
cancellation of the operation since I've already cancelled it locally 
(and let the server propagate the cancellation to everyone?) Or do I 
redo it locally, since the user wanted it to happen and the server is 
letting it happen? I think the correct behavior is to process the action 
again (i.e. replace stroke #1 by stroke #3, as the server is telling me to).

*** One point here about undo: one could try to take the 
"deletion/creation" approach to everything including undo, so that 
undoing the operation (delete stroke #1, create stroke #2) is
performing the operation (delete stroke #2, create stroke #4 = identical 
to #1 but with a new ID number). However this is not good, because next 
I might try to undo the creation of stroke #1, which refers to it by its 
ID number (1) that has now become invalid. So the undo should really 
re-create the stroke with the same ID it used to have, rather than a 
similar stroke with a different ID number. This can still be sent to the 
server as a (delete stroke #2, create stroke #1) message -- and the 
server could choose to reject it if either it thinks stroke #2 doesn't 
exist, or it thinks stroke #1 actually exists. In any case, the point is 
that, in the above sequence of steps, stroke #1 might disappear at step 
4 but might still reappear before step 5, leading to the surprising 
situation of the previous paragraph where a local operation I was sure 
would be rejected is actually getting accepted.

Sorry this is getting messy. Perhaps it's not the easiest thing either.
This kind of sequential model with approval/rejection of conflicting 
edits feels a lot more "standard", and less strange, than my other 
proposal. I'm guessing that such a model might even be already 
implemented in some existing frameworks for collaborative software, 
i.e. one might not even need to implement everything from scratch.

Ok, I'll stop now. Sorry for the long-winded messages, and 
congratulations if you read all the way to here.

Denis

On 04/19/2011 04:17 PM, Denis Auroux wrote:
> There are two main types of operations in there:
>
> - some that create items (ITEM_STROKE, ITEM_PASTE, etc.); these are
> *almost* commutative, the only difference if you switch them being which
> stroke ends up on top of which -- important if you're detail-oriented
> since the strokes are opaque, but very minor for most people.
>
> - operations which change the attributes of existing items, such as for
> example ITEM_MOVESEL which moves a set of objects on the page or to a
> different page, ITEM_REPAINTSEL which modifies the color/thickness of an
> item, ITEM_ERASURE which replaces an existing stroke by the bits and
> pieces that remain after erasing the relevant portions of it. These are
> more dangerous to switch around. (You can certainly perform a MOVESEL
> and a REPAINTSEL on the same items in any order, but two REPAINTSELs or
> two MOVESELs of the same items can't be switched.)

-- 
Denis Auroux
MIT Department of Mathematics           aur...@math.mit.edu
    (on leave) and
University of California, Berkeley      aur...@math.berkeley.edu
Department of Mathematics               Tel: 510-642-4367
817 Evans Hall # 3840                   Fax: 510-642-8204
Berkeley, CA 94720-3840

------------------------------------------------------------------------------
Benefiting from Server Virtualization: Beyond Initial Workload 
Consolidation -- Increasing the use of server virtualization is a top
priority.Virtualization can reduce costs, simplify management, and improve 
application availability and disaster protection. Learn more about boosting 
the value of server virtualization. http://p.sf.net/sfu/vmware-sfdev2dev
_______________________________________________
Xournal-devel mailing list
Xournal-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xournal-devel

Re: [Xournal-devel] collaborative xournal

Reply via email to