Anthony wrote:
> Using skip-deltas I think you could make a system fast enough to run live.
> At the very least it could be used as part of an incremental dump system.
> Using *smart* skip-deltas, you'd resolve the inefficiencies due to
> page-blanking vandalism.

Another possibility is to compute an MD5 hash of every revision, then diff
only between revisions whose hashes are unique. An exact revert reproduces a
hash that has already been seen, so it needs no new diff.

> One improvement over the diff format used by RCS would be to use smarter
> breakpoints, since wikitext tends to have a lot of really long lines with no
> line breaks. Using some simple heuristics to guess at sentence breaks would
> probably be useful there. It wouldn't have to be perfect, since

I suggest looking into wdiff (http://www.gnu.org/software/wdiff/).
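
To make the two ideas above concrete, here is a rough Python sketch, not a
tested design. It hashes each revision with MD5 so that exact repeats (for
example, a revert after a page blanking) are recorded as references instead
of being diffed again, and it splits text at guessed sentence breaks before
diffing. All function names are made up for illustration, the sentence
heuristic is deliberately crude, and the diff uses Python's difflib rather
than wdiff itself.

```python
import difflib
import hashlib
import re


def dedupe_revisions(revisions):
    """Return one entry per revision: ("new", digest) for text not seen
    before, or ("same_as", index) pointing at the first identical revision.

    MD5 is used only to detect identical text, not for security.
    """
    seen = {}  # md5 hex digest -> index of first revision with that text
    plan = []
    for i, text in enumerate(revisions):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest in seen:
            plan.append(("same_as", seen[digest]))
        else:
            seen[digest] = i
            plan.append(("new", digest))
    return plan


def split_sentences(text):
    """Crude heuristic: break after '.', '!' or '?' followed by whitespace.

    It will mis-split abbreviations and the like; it does not need to be
    perfect, it only has to give the diff shorter units than whole lines.
    """
    return re.split(r"(?<=[.!?])\s+", text)


def sentence_diff(old, new):
    """Unified diff of two texts, using guessed sentences as the 'lines'."""
    return list(
        difflib.unified_diff(
            split_sentences(old), split_sentences(new), lineterm="", n=0
        )
    )


if __name__ == "__main__":
    history = [
        "Hello world. Second sentence.",
        "",  # page blanked
        "Hello world. Second sentence.",  # reverted
        "Hello world. Second sentence changed.",
    ]
    plan = dedupe_revisions(history)
    print([kind for kind, _ in plan])
    print("\n".join(sentence_diff(history[2], history[3])))
```

With this, the reverted revision is stored as a pointer to revision 0, and
only revisions marked "new" would ever be fed to the diff step. Hashing
catches exact repeats only; a revert that differs by one character still
gets a normal diff.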
