On Thu, Nov 20, 2014 at 10:59 AM, James Forrester <[email protected]>
wrote:
>
> > A paragraph-level diff means that you only get an edit conflict if two
> > people change the same paragraph. A character-level diff would mean, then,
> > that you only get a conflict if they change the same character? That sounds
> > a bit excessive. (Stupid example: if I change "sixty-three" to "sixty-five"
> > and someone else changes it to "seventy-three", that should probably be a
> > conflict, but a character-level diff would happily merge them into
> > "seventy-five".)
>
>
> Sure, but wikitext "paragraphs" are significantly more extensive and
> diverse than the NLP concept; to give an example:
>
> Original wikitext:
>
> There are six [[alpaca]] shearers on [[Sunningdale Acers|the farm]].
>
>
> My changes:
>
> There are six [[*Alpaca fiber|*alpaca]] shearers on [[Sunningdale Acr*e*s|the farm]].
>
>
> Their changes:
>
> There are six [[alpaca]] shearers on [[Sunningdale Acers|the farm*stead*]].
>
>
>
> Merging these two changes requires character-level merging (or something
> that natively understands wikitext at a subtle level). The first change would
> go through as a word-level diff (but not at sentence-level); the second
> wouldn't go through even then. Of course, we could prompt people to review
> the diff after saving if we're auto-merging, but that might be something we
> should be doing even now?


I don't think this is unique to wikitext, but sure, a
character-level (or even word-level) diff would often give better results
than the current algorithm. My point is that paragraph-based (and maybe
even sentence-based) diffing makes unwanted merges rare enough that it can
be applied without any oversight from the user, while the same definitely
would not be true of the finer-grained algorithms. Those could be applied
with some sort of user review or a 3-way merge interface, which would be
cool features in general, but more complex than just tweaking the diff
algorithm, I would think.
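
To make the granularity trade-off concrete, here is a rough sketch of a
3-way merge where the token size is a parameter. It is not how MediaWiki
merges edits; it uses Python's difflib, and the names tokenize, edits and
merge3 are made up for this illustration. The conflict rule (overlapping or
touching spans conflict) is also just an assumption picked for simplicity.

```python
import difflib
import re


def tokenize(text, level):
    """Split text into tokens that join back into the original string."""
    if level == "char":
        return list(text)
    if level == "word":
        # Runs of non-whitespace and runs of whitespace alternate.
        return re.findall(r"\S+|\s+", text)
    if level == "paragraph":
        # The capturing group keeps blank-line separators as tokens.
        return [t for t in re.split(r"(\n{2,})", text) if t]
    raise ValueError(f"unknown level: {level}")


def edits(base, other):
    """Return (start, end, replacement) spans of base that other changed."""
    matcher = difflib.SequenceMatcher(None, base, other, autojunk=False)
    return [(i1, i2, tuple(other[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def merge3(base, mine, theirs, level):
    """Merge two edits of base; return None where a conflict would be raised."""
    b, m, t = (tokenize(s, level) for s in (base, mine, theirs))
    mine_edits, their_edits = edits(b, m), edits(b, t)
    for e in mine_edits:
        for f in their_edits:
            # Assumed rule: overlapping or touching spans conflict,
            # unless both sides made the identical change.
            if e != f and e[0] <= f[1] and f[0] <= e[1]:
                return None
    out, pos = [], 0
    for start, end, repl in sorted(set(mine_edits) | set(their_edits)):
        out.extend(b[pos:start])
        out.extend(repl)
        pos = end
    out.extend(b[pos:])
    return "".join(out)


if __name__ == "__main__":
    for level in ("char", "word", "paragraph"):
        print(level, merge3("sixty-three", "sixty-five", "seventy-three", level))
```

With this sketch, the "sixty-three" example merges silently at the "char"
level and is reported as a conflict at the "word" and "paragraph" levels.
James's alpaca example (with the change-marking asterisks removed) merges
at the "char" level and conflicts at the "paragraph" level. What happens at
the "word" level depends entirely on how words are tokenized, which is part
of why I would not want that applied without review.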

...which made me wonder: are we logging enough information about edit
conflicts that we could just replay them with an alternative algorithm and
see how well it performs? None of the EventLogging schemas which look
relevant (Edit [1], EditConflict [2], EditDebugging [3]) seem to store the
text which could not be saved, and while EditDebugging saves the ids of
both old revisions for a successful automatic merge, I'm not sure if those
can be connected with the id of the new revision.
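
If we did have that data, the replay itself would be simple. A minimal
sketch, assuming a hypothetical ConflictRecord with the three texts plus
whatever was eventually saved (again, none of the schemas below store
this), and a pluggable merger that returns None on conflict. The
trivial_merge function is only a placeholder to keep the sketch runnable.

```python
from collections import Counter
from typing import Callable, NamedTuple, Optional


class ConflictRecord(NamedTuple):
    # Hypothetical log entry; the field names are assumptions.
    base: str      # revision both editors started from
    mine: str      # text that could not be saved
    theirs: str    # revision that was saved first
    resolved: str  # text eventually saved after manual resolution


Merger = Callable[[str, str, str], Optional[str]]


def trivial_merge(base: str, mine: str, theirs: str) -> Optional[str]:
    """Placeholder merger: succeeds only when at most one side changed."""
    if mine == theirs or theirs == base:
        return mine
    if mine == base:
        return theirs
    return None


def replay(records, merger: Merger) -> Counter:
    """Tally how a merger would have handled each logged conflict."""
    tally = Counter()
    for r in records:
        merged = merger(r.base, r.mine, r.theirs)
        if merged is None:
            tally["still_conflict"] += 1
        elif merged == r.resolved:
            tally["matches_human"] += 1
        else:
            tally["differs_from_human"] += 1
    return tally


if __name__ == "__main__":
    sample = [
        ConflictRecord("a", "b", "a", "b"),
        ConflictRecord("a", "b", "c", "bc"),
    ]
    print(replay(sample, trivial_merge))
```

The "differs_from_human" bucket is the interesting one: those are the
cases where an algorithm would have merged silently but a person chose
something else.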


[1] https://meta.wikimedia.org/wiki/Schema:Edit
[2] https://meta.wikimedia.org/wiki/Schema:EditConflict
[3] https://meta.wikimedia.org/wiki/Schema:EditDebugging
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l