Anthony wrote:
> Using skip-deltas I think you could make a system fast enough to run live.
> At the very least it could be used as part of an incremental dump system.
> Using *smart* skip-deltas, you'd resolve the inefficiencies due to
> page-blanking vandalism.
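For reference, the plain skip-delta rule picks the base for revision n by clearing the lowest set bit of n. As far as I know, this is the rule Subversion has used. Rebuilding any revision then takes at most about log2(n) delta applications instead of n. Below is a minimal Python sketch; the function names are mine. It does not attempt the "smart" variant, which isn't spelled out here.

```python
def skip_delta_base(n):
    """Revision that revision n (n >= 1) is stored as a delta against.

    Revision 0 is assumed to be stored in full.
    """
    return n & (n - 1)  # clear the lowest set bit


def reconstruction_chain(n):
    """Revisions visited when rebuilding revision n, ending at the full copy."""
    chain = [n]
    while n:
        n = skip_delta_base(n)
        chain.append(n)
    return chain


# Revision 54 (binary 110110) is rebuilt from 0 -> 32 -> 48 -> 52 -> 54.
print(reconstruction_chain(54))
```

The chain length is the number of 1 bits in n, so it stays short even for pages with long histories.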

One more possibility is to compute an MD5 hash of every revision, then diff
only between revisions whose hashes are unique, so identical texts (such as
a revert to an earlier version) are stored once.
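Here is a rough sketch of that idea. It assumes revisions arrive as a list of strings, oldest first. The function names and the copy/add delta format are made up for illustration, and Python's difflib stands in for a real diff engine.

A revert to an already-seen text becomes a pointer. The next edit is diffed against the restored text rather than against the blanked page. The sketch trusts the MD5 alone, which assumes there are no collisions; a careful version would also compare the texts whenever the hashes match.

```python
import difflib
import hashlib


def make_delta(old_lines, new_lines):
    """Compact line delta: copy ranges from the old text plus literal new lines."""
    sm = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("add", new_lines[j1:j2]))
        # "delete": the old lines are simply not copied
    return delta


def apply_delta(old_lines, delta):
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


def plan_storage(revisions):
    """Hash every revision; store new texts as deltas, repeated texts as pointers."""
    first_seen = {}  # md5 hex digest -> index of first revision with that text
    canonical = []   # canonical[i] = index of the stored revision equal to revision i
    plan = []
    for i, text in enumerate(revisions):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest in first_seen:
            # Same text seen before (e.g. a revert): no diff needed.
            canonical.append(first_seen[digest])
            plan.append(("same_as", first_seen[digest]))
            continue
        first_seen[digest] = i
        canonical.append(i)
        if i == 0:
            plan.append(("full", text))
        else:
            # Diff against the previous revision's text, recorded via its
            # canonical (actually stored) index rather than a pointer.
            base = canonical[i - 1]
            delta = make_delta(revisions[base].split("\n"), text.split("\n"))
            plan.append(("diff", base, delta))
    return plan


def reconstruct(plan, i):
    entry = plan[i]
    if entry[0] == "full":
        return entry[1]
    if entry[0] == "same_as":
        return reconstruct(plan, entry[1])
    _, base, delta = entry
    return "\n".join(apply_delta(reconstruct(plan, base).split("\n"), delta))
```

With a history like ["text", "", "text", "text + edit"], the blanking costs one small delta. The revert is free, and the final edit is a small delta against the original text. Reconstruction here walks the chain linearly; skip-deltas would bound that.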

> One improvement over the diff format used by RCS would be to use smarter
> breakpoints, since wikitext tends to have a lot of really long lines with no
> line breaks.  Using some simple heuristics to guess at sentence breaks would
> probably be useful there.  It wouldn't have to be perfect, since

I suggest looking into wdiff ( http://www.gnu.org/software/wdiff/ ).
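The point of wdiff is to diff on words rather than lines, which sidesteps the long-line problem without needing sentence heuristics. Below is a small Python approximation; it is not wdiff itself, and the function names are hypothetical. It splits text into words and whitespace, diffs the token lists with difflib, and marks changes in wdiff's [-deleted-] {+inserted+} style.

```python
import difflib
import re


def tokens(text):
    # Split into words and the whitespace between them, keeping both, so that
    # joining the tokens reproduces the original text exactly.
    return [t for t in re.split(r"(\s+)", text) if t]


def word_diff(old, new):
    a, b = tokens(old), tokens(new)
    sm = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            out.append("".join(a[i1:i2]))
            continue
        if tag in ("replace", "delete"):
            out.append("[-" + "".join(a[i1:i2]) + "-]")
        if tag in ("replace", "insert"):
            out.append("{+" + "".join(b[j1:j2]) + "+}")
    return "".join(out)


print(word_diff("The quick brown fox", "The slow brown fox"))
```

A one-word change in a 2,000-character paragraph then shows up as one small marked span instead of a whole replaced "line".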

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
