[Bug 54328] Make it possible for edit diff to be provided as a raw text

2014-08-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=54328

--- Comment #3 from Aaron Halfaker aaron.halfa...@gmail.com ---
1. Character vs. line offset
I'd much rather represent diffs based on a character offset. I'm afraid of
representing position with something like lineno, since linebreaks are
defined differently between systems.  Character offsets would also allow us to
make changes to our diff detection strategy without changing the output.

2. Machine readable vs. human readable diffs
Machine readable diff opcode formats tend to represent the full set of
operations used to recreate a revision -- not just the context.  A common
format that I'm familiar with would look something like this:

a = "These are wrd."
b = "These are words."
{
  "diff": [
    {
      "op": "equal",
      "a_start": 0,
      "a_end": 10,
      "b_start": 0,
      "b_end": 10
    },
    {
      "op": "remove",
      "a_start": 10,
      "a_end": 13,
      "b_start": 10,
      "b_end": 10,
      "content": "wrd"
    },
    {
      "op": "insert",
      "a_start": 13,
      "a_end": 13,
      "b_start": 10,
      "b_end": 15,
      "content": "words"
    },
    {
      "op": "equal",
      "a_start": 13,
      "a_end": 14,
      "b_start": 15,
      "b_end": 16
    }
  ]
}
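A minimal Python sketch of how this format could be produced and consumed, assuming Python's difflib.SequenceMatcher as a stand-in diff engine (not MediaWiki's own diff code) and a hypothetical tokenizer that splits text into runs of word and non-word characters. word_diff and apply_diff are illustrative names, not an existing API. Diffing whole words while reporting character offsets reproduces the a/b example above exactly, and applying the operations to a rebuilds b, which is the "full set of operations" property described in point 2.

```python
import difflib
import re
from itertools import accumulate
from typing import Any

Op = dict[str, Any]


def word_diff(a: str, b: str) -> list[Op]:
    """Diff at word granularity but report character offsets (hypothetical helper)."""
    # Runs of word characters or of non-word characters; "".join(tokens) == text.
    ta = re.findall(r"\w+|\W+", a)
    tb = re.findall(r"\w+|\W+", b)
    # oa[k] / ob[k] = character offset where token k starts (last entry = text length).
    oa = [0, *accumulate(map(len, ta))]
    ob = [0, *accumulate(map(len, tb))]

    ops: list[Op] = []
    matcher = difflib.SequenceMatcher(None, ta, tb, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        a_start, a_end, b_start, b_end = oa[i1], oa[i2], ob[j1], ob[j2]
        if tag == "equal":
            ops.append({"op": "equal", "a_start": a_start, "a_end": a_end,
                        "b_start": b_start, "b_end": b_end})
            continue
        # difflib's "replace" becomes a remove followed by an insert.
        if tag in ("replace", "delete"):
            ops.append({"op": "remove", "a_start": a_start, "a_end": a_end,
                        "b_start": b_start, "b_end": b_start,
                        "content": a[a_start:a_end]})
        if tag in ("replace", "insert"):
            ops.append({"op": "insert", "a_start": a_end, "a_end": a_end,
                        "b_start": b_start, "b_end": b_end,
                        "content": b[b_start:b_end]})
    return ops


def apply_diff(a: str, ops: list[Op]) -> str:
    """Rebuild b from a: copy "equal" spans, add "insert" content, skip "remove"."""
    out = []
    for op in ops:
        if op["op"] == "equal":
            out.append(a[op["a_start"]:op["a_end"]])
        elif op["op"] == "insert":
            out.append(op["content"])
    return "".join(out)


a = "These are wrd."
b = "These are words."
ops = word_diff(a, b)
assert apply_diff(a, ops) == b
```

Swapping the tokenizer (for example, diffing character by character) changes which operations come out but not the shape of the output, which is the flexibility point 1 argues for.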

3. Compressed format
I don't see the value in compressing the format, given that the API doesn't
really let you query for more than one diff at a time and diffs tend to be
represented in few operations.  However, we could simply represent each
operation as a tuple with an agreed-upon field order:


{
  "op": "insert",
  "a_start": 13,
  "a_end": 13,
  "b_start": 15,
  "b_end": 18,
  "content": "foo"
}

could be 

[
  "insert",
  13,
  13,
  15,
  18,
  "foo"
]

Or, if we really want a tight format (since the rest of the fields are
derivable from the sequence of operations):

[
  "insert",
  15,
  "foo"
]
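A sketch of the two compact encodings, under stated assumptions: the tuple form uses the field order op, a_start, a_end, b_start, b_end, content (with content absent for "equal"), and the tight form keeps only [op, b_start, content] for insert/remove operations, treating every unlisted gap as an "equal" run. Re-deriving the remaining fields also requires knowing the length of the old text a. FIELDS, to_tuple, to_tight and from_tight are hypothetical names.

```python
from typing import Any

Op = dict[str, Any]

# Hypothetical agreed-upon field order for the tuple form.
FIELDS = ("op", "a_start", "a_end", "b_start", "b_end", "content")


def to_tuple(op: Op) -> list[Any]:
    """Full object -> positional list in FIELDS order ("equal" ops have no content)."""
    return [op[f] for f in FIELDS if f in op]


def to_tight(ops: list[Op]) -> list[list[Any]]:
    """Keep only [op, b_start, content] for inserts/removes; drop "equal" runs."""
    return [[op["op"], op["b_start"], op["content"]]
            for op in ops if op["op"] != "equal"]


def from_tight(tight: list[list[Any]], a_len: int) -> list[Op]:
    """Re-derive the full form, assuming every unlisted gap is an "equal" run."""
    ops: list[Op] = []
    i = j = 0  # character cursors into a (old text) and b (new text)
    for name, b_start, content in tight:
        gap = b_start - j  # unchanged characters before this operation
        if gap > 0:
            ops.append({"op": "equal", "a_start": i, "a_end": i + gap,
                        "b_start": j, "b_end": b_start})
            i, j = i + gap, b_start
        n = len(content)
        if name == "insert":
            ops.append({"op": "insert", "a_start": i, "a_end": i,
                        "b_start": j, "b_end": j + n, "content": content})
            j += n
        else:  # "remove"
            ops.append({"op": "remove", "a_start": i, "a_end": i + n,
                        "b_start": j, "b_end": j, "content": content})
            i += n
    if i < a_len:  # trailing unchanged text
        ops.append({"op": "equal", "a_start": i, "a_end": a_len,
                    "b_start": j, "b_end": j + (a_len - i)})
    return ops
```

For comment #3's a/b example, to_tight yields [["remove", 10, "wrd"], ["insert", 10, "words"]], and from_tight(..., 14) rebuilds the same four full operations. The price of the tight form is that a reader can no longer jump to an arbitrary operation without replaying the ones before it.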

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


2014-08-29 Thread bugzilla-daemon

--- Comment #4 from Brad Jorsch bjor...@wikimedia.org ---
(In reply to Aaron Halfaker from comment #3)
> 1. Character vs. line offset
> I'd much rather represent diffs based on a character offset. I'm afraid of
> representing position with something like lineno, since linebreaks are
> defined differently between systems.

Isn't that an argument for line-based rather than character-based offsets?

> Character offsets would also allow us
> to make changes to our diff detection strategy without changing the output.
>
> 2. Machine readable vs. human readable diffs
> Machine readable diff opcode formats tend to represent the full set of
> operations used to recreate a revision -- not just the context.

OTOH, what is the usual use of querying the diffs? I suspect it's more often
that the client is wanting to display a human-readable diff to the end user
than because the client is wanting to do the equivalent of the 'patch' utility
on an already-downloaded local copy of the article.
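For what it's worth, a client that only wants to show a human-readable diff could still render one locally from machine-readable operations. A hypothetical sketch, using wdiff-style [-removed-] and {+inserted+} markers and the character-offset operations from comment #3's a/b example; render_inline is an illustrative name:

```python
from typing import Any

# Operations for comment #3's example: a = "These are wrd.", b = "These are words."
OPS: list[dict[str, Any]] = [
    {"op": "equal", "a_start": 0, "a_end": 10, "b_start": 0, "b_end": 10},
    {"op": "remove", "a_start": 10, "a_end": 13, "b_start": 10, "b_end": 10,
     "content": "wrd"},
    {"op": "insert", "a_start": 13, "a_end": 13, "b_start": 10, "b_end": 15,
     "content": "words"},
    {"op": "equal", "a_start": 13, "a_end": 14, "b_start": 15, "b_end": 16},
]


def render_inline(a: str, ops: list[dict[str, Any]]) -> str:
    """Render operations as inline markup: [-removed-] and {+inserted+}."""
    parts = []
    for op in ops:
        if op["op"] == "equal":
            parts.append(a[op["a_start"]:op["a_end"]])
        elif op["op"] == "remove":
            parts.append("[-" + op["content"] + "-]")
        elif op["op"] == "insert":
            parts.append("{+" + op["content"] + "+}")
    return "".join(parts)


print(render_inline("These are wrd.", OPS))  # These are [-wrd-]{+words+}.
```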

> and diffs tend to be represented in few operations.

On talk pages, maybe. But someone heavily copyediting an article is likely to
generate a huge number of operations. With the way the diff algorithm works,
even some simple edits will generate many operations as it tries to match up
individual letters in the old vs new paragraphs.
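A small illustration of the letter-matching effect, using Python's difflib.SequenceMatcher as a stand-in (MediaWiki's diff engine is different, so real operation counts will differ). Diffing comment #3's example character by character yields five operations that keep the letters "w", "r" and "d" and insert "o" and "s" separately, while a word-level pass yields a single replacement. opcode_tags is a hypothetical helper name.

```python
import difflib
import re


def opcode_tags(a_tokens, b_tokens) -> list[str]:
    """Return difflib's opcode tags for two token sequences."""
    matcher = difflib.SequenceMatcher(None, a_tokens, b_tokens, autojunk=False)
    return [tag for tag, *_ in matcher.get_opcodes()]


a = "These are wrd."
b = "These are words."

# Character granularity: a str is already a sequence of characters.
char_tags = opcode_tags(a, b)
# Word granularity: runs of word characters and runs of non-word characters.
word_tags = opcode_tags(re.findall(r"\w+|\W+", a), re.findall(r"\w+|\W+", b))

print(char_tags)  # ['equal', 'insert', 'equal', 'insert', 'equal']
print(word_tags)  # ['equal', 'replace', 'equal']
```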


2013-10-17 Thread bugzilla-daemon

Peter Bena benap...@gmail.com changed:

           What|Removed         |Added
-------------------------------------------------
         Blocks|                |55793

--- Comment #2 from Peter Bena benap...@gmail.com ---
I think that, for a start, splitting the new text from the old text would be
enough. Right now it's hard to find out what was added by the user and what
was there before they edited the page.
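A minimal sketch of that split, assuming difflib.SequenceMatcher over a hypothetical word tokenizer (each token is a word plus its trailing whitespace, or a run of leading whitespace); added_and_removed is an illustrative name, not an existing API function:

```python
import difflib
import re


def added_and_removed(old: str, new: str) -> tuple[list[str], list[str]]:
    """Split an edit into the text the editor added and the text they removed."""
    a = re.findall(r"\S+\s*|\s+", old)
    b = re.findall(r"\S+\s*|\s+", new)
    added: list[str] = []
    removed: list[str] = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" contributes to both lists.
        if tag in ("replace", "delete"):
            removed.append("".join(a[i1:i2]))
        if tag in ("replace", "insert"):
            added.append("".join(b[j1:j2]))
    return added, removed


print(added_and_removed("Paris is the capital of France.",
                        "Paris is the capital and largest city of France."))
# (['and largest city '], [])
```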


2013-09-19 Thread bugzilla-daemon

Peter Bena benap...@gmail.com changed:

           What|Removed         |Added
-------------------------------------------------
       Priority|Unprioritized   |Normal
       Severity|normal          |enhancement


2013-09-19 Thread bugzilla-daemon

--- Comment #1 from Brad Jorsch bjor...@wikimedia.org ---
The data structure would have to be rather more complicated than that. At first
guess, something along the lines of (in JSON):

 {
   "diff": [
     { "line": 1, "type": "context", "content": "Line" },
     { "line": 2, "type": "removed", "old": "Line" },
     { "line": 2, "type": "added", "new": "Line" },
     { "line": 3, "type": "context", "content": "Line" },
     { "line": 47, "type": "context", "content": "Line" },
     { "line": 48, "type": "changed", "old": "Line", "new": "Line" },
     { "line": 49, "type": "context", "content": "Line" }
   ]
 }
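One way such entries might be generated, sketched with difflib.SequenceMatcher.get_grouped_opcodes, which trims unchanged runs down to a few lines of context (the same effect as the jump from line 3 to line 47 above). The numbering rule is an assumption, since the comment does not pin it down: context, removed and changed entries use old-text line numbers, added entries use new-text line numbers, and a same-size replacement becomes "changed" while any other replacement becomes removed plus added. line_diff is a hypothetical name.

```python
import difflib
from typing import Any


def line_diff(old: str, new: str, context: int = 1) -> list[dict[str, Any]]:
    """Build line-based diff entries with a few lines of context around changes."""
    old_lines = old.splitlines()
    new_lines = new.splitlines()
    entries: list[dict[str, Any]] = []
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for group in matcher.get_grouped_opcodes(context):
        for tag, i1, i2, j1, j2 in group:
            if tag == "equal":
                for k in range(i1, i2):
                    entries.append({"line": k + 1, "type": "context",
                                    "content": old_lines[k]})
            elif tag == "replace" and i2 - i1 == j2 - j1:
                # Same number of lines on both sides: pair them up as "changed".
                for k in range(i2 - i1):
                    entries.append({"line": i1 + k + 1, "type": "changed",
                                    "old": old_lines[i1 + k],
                                    "new": new_lines[j1 + k]})
            else:
                # "delete", "insert", or an uneven "replace".
                for k in range(i1, i2):
                    entries.append({"line": k + 1, "type": "removed",
                                    "old": old_lines[k]})
                for k in range(j1, j2):
                    entries.append({"line": k + 1, "type": "added",
                                    "new": new_lines[k]})
    return entries
```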

If you want an indication within the line of what changed for "changed"
types, that's another complication. Instead of just "Line", it would have to
be an array of fragments. One simple way might be that even array indexes are
unchanged and odd ones are changed:

   "old": [
     "foo bar ",
     "",
     "quux ",
     "poop"
   ],
   "new": [
     "foo bar ",
     "baz ",
     "quux ",
     "etc."
   ]

That might indicate that "baz " was inserted into the list and "poop" at the
end was replaced with "etc.". Or maybe it would be better to combine "old" and
"new" into one data structure somehow.
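The even/odd fragment scheme can be sketched as follows, assuming difflib.SequenceMatcher over word tokens (a word plus its trailing whitespace); the helper name fragments is hypothetical. Empty strings pad both sides whenever the next slot has the wrong parity, so unchanged text always sits at even indexes and changed text at odd ones. Run on "foo bar quux poop" and "foo bar baz quux etc.", it reproduces the old/new arrays above.

```python
import difflib
import re


def fragments(old: str, new: str) -> tuple[list[str], list[str]]:
    """Even indexes hold unchanged text, odd indexes hold changed text."""
    # Hypothetical tokenizer: a word plus trailing whitespace, or leading whitespace.
    a = re.findall(r"\S+\s*|\s+", old)
    b = re.findall(r"\S+\s*|\s+", new)
    old_frags: list[str] = []
    new_frags: list[str] = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        changed = tag != "equal"
        # If the next free slot has the wrong parity, pad both sides with "".
        if (len(old_frags) % 2 == 1) != changed:
            old_frags.append("")
            new_frags.append("")
        old_frags.append("".join(a[i1:i2]))
        new_frags.append("".join(b[j1:j2]))
    return old_frags, new_frags


print(fragments("foo bar quux poop", "foo bar baz quux etc."))
# (['foo bar ', '', 'quux ', 'poop'], ['foo bar ', 'baz ', 'quux ', 'etc.'])
```

Joining either array gives back the full old or new line, so a client can display the line either plainly or with the odd fragments highlighted.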

Also, keep in mind that lots of little objects can use a surprising amount of
memory (see bug 53663).
