Roan Kattouw wrote:
> 2009/4/22 Kent Wang <[email protected]>:
>> I'm building an application that uses DifferenceEngine.php to generate
>> word level unified diffs. I've figured out how to do this but now need
>> to generate patches given the diff.
>
> It's not in MediaWiki, and I don't know if it's in PHP, but there's a
> very widespread command line program installed on virtually every
> UNIX/Linux system that can do this. Unsurprisingly, it's called
> "patch".

The problem is that diff and patch do line-level diffs, and he wants to 
do it on the word level.

Of course, a possible workaround would be to reversibly transform the 
files such that every word (or other token) ends up on a separate line. 
Since the transformed version doesn't really have to be readable, you 
could, say, URL-encode every token.  Then you'd just have to figure out 
how to correspondingly transform your diff so that it can be applied to 
the transformed files by patch.
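
As a rough sketch of that transformation (in Python rather than PHP, and not anything that exists in MediaWiki; the function names and the choice to treat runs of whitespace as tokens of their own are just my assumptions for illustration):

```python
import re
from urllib.parse import quote, unquote

def to_token_lines(text):
    """Put every token on its own line, URL-encoded so that no token
    can itself contain a newline. Whitespace runs are kept as tokens,
    which is what makes the transformation reversible."""
    tokens = re.findall(r"\s+|\S+", text)
    return "".join(quote(tok, safe="") + "\n" for tok in tokens)

def from_token_lines(encoded):
    """Inverse of to_token_lines(): decode each line and concatenate."""
    return "".join(unquote(line) for line in encoded.splitlines())
```

Both the old and the new version would be run through to_token_lines() before being handed to diff and patch, and the patched result through from_token_lines() afterwards.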

Of course, it's not that hard to apply a patch by hand either: a diff is 
essentially just a list of straightforward instructions of the form 
"delete these lines/tokens, insert these in their place".  In general, 
you just first tokenize the file you're patching, and then loop over the 
diff applying the changes to the list of tokens.
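
A minimal sketch of that loop, again in Python. The edit format here, a list of (start, old_tokens, new_tokens) tuples with positions counted in the original token list, is a made-up representation for illustration, not the output format of DifferenceEngine.php:

```python
import re

def tokenize(text):
    # Words and whitespace runs are both tokens, so "".join() restores the text.
    return re.findall(r"\s+|\S+", text)

def apply_edits(tokens, edits):
    """Apply edits exactly. `edits` must be sorted by start position and
    must not overlap; each one replaces old_tokens at `start` with new_tokens."""
    result = []
    pos = 0  # next original token not yet copied to the result
    for start, old, new in edits:
        if start < pos:
            raise ValueError("edits overlap or are not sorted")
        if tokens[start:start + len(old)] != old:
            raise ValueError("patch does not apply at token %d" % start)
        result.extend(tokens[pos:start])  # unchanged tokens before the edit
        result.extend(new)                # the inserted tokens
        pos = start + len(old)            # skip the deleted tokens
    result.extend(tokens[pos:])           # unchanged tail
    return result
```

For example, with tokens = tokenize("the quick brown fox"), applying [(2, ["quick"], ["slow"])] and joining the result gives back the sentence with "slow" in place of "quick". A mismatch between old_tokens and the file raises an error instead of silently producing garbage.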

This works just fine as long as the patch applies exactly.  Much of the 
complexity in the patch utility is involved in "fuzzy matching", which 
allows it to apply patches even if the target file isn't quite identical 
to the one the diff was generated against, by using the context 
information in the diff to adjust the offsets.  For some purposes, this 
feature isn't particularly important or useful; for others, it's vital.
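
To give an idea of the simplest form of this, here is a sketch that only adjusts the offset: it looks for the expected tokens at the position the diff claims, then at increasing distances on either side. The function name and the max_shift limit are my own inventions, and the real patch utility does considerably more than this (it can, for instance, ignore some of the context when no exact match is found).

```python
def locate(tokens, expected, hint, max_shift=50):
    """Return the position nearest to `hint` where `expected` occurs in
    `tokens`, searching at most `max_shift` tokens in either direction,
    or None if it is not found within that range."""
    n = len(expected)
    for shift in range(max_shift + 1):
        candidates = (hint,) if shift == 0 else (hint - shift, hint + shift)
        for start in candidates:
            if 0 <= start <= len(tokens) - n and tokens[start:start + n] == expected:
                return start
    return None
```

The position it returns could then be used in place of the one recorded in the diff when applying the change.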

-- 
Ilmari Karonen

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
