Grant Edwards <inva...@invalid> wrote:
> Apparently that "filtering out" characters doesn't mean that
> they're ignored when doing the comparison. (A bit of a "WTF?"
> if you ask me). After some more googling, it appears that I'm
> far from the first person who interpreted "filtered out" as
> "ignored when comparing lines". I'd submit a fix for the doc
> page, but you apparently have to be a lot smarter than me to
> figure out what "filters out" means in this context.

So far as I can see from looking at the code:

Once you have identified one block of lines as having been replaced by another, the matcher can give you additional information by marking up the changes within each line. However, it only makes sense to do that if the lines are still somewhat similar. 'charjunk' is used to discount junk characters when scanning the lines within a replacement block, and the most similar pair of lines (if they are sufficiently similar) is then chosen for this extra step of comparing the character changes within the line.

Here's an example. If I do this:

>>> from difflib import Differ
>>> print(''.join(Differ().compare('one\ntwo\nthree\n'.splitlines(True),
...                                'one\nwot\ntoo\nthree\n'.splitlines(True))))
  one
- two
? -
+ wot
?   +
+ too
  three

The comparison detected that "two" was replaced by two lines, "wot" and "too". It decided the first of these was the best match for the original line, so it shows the character-level differences between the original and the first replacement line.

>>> print(''.join(Differ(charjunk=lambda c: c == 'w')
...               .compare('one\ntwo\nthree\n'.splitlines(True),
...                        'one\nwot\ntoo\nthree\n'.splitlines(True))))
  one
+ wot
- two
?  ^
+ too
?  ^
  three

This time we told the system that we don't care about 'w' in either the original or the replacement text. That means instead of seeing which of "wot" and "too" is closest to "two", it looks to see which of "ot" and "too" is closest to "to". "ot" has two changes but "too" only has one, so this time it does the detailed comparison between the original line and the second replacement line.

N.B. The junk function is only used to decide which lines to use for the detailed comparison: the original lines, 'w' included, are what you see in the output.
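
If you want to see why the choice flips, you can compute the similarity scores yourself. This is only a sketch: it assumes Differ scores each old/new pair of lines with SequenceMatcher(charjunk, ...).ratio() and needs roughly 0.75 before it bothers with the character-level markup, which is how I read the difflib source. The names is_w and similarity are made up for the example.

```python
from difflib import SequenceMatcher


def is_w(ch):
    """Hypothetical junk test, equivalent to the lambda in the example."""
    return ch == 'w'


def similarity(old_line, new_line, charjunk=None):
    # SequenceMatcher applies its junk test to the second sequence,
    # so the candidate replacement line is passed second here.
    return SequenceMatcher(charjunk, old_line, new_line).ratio()


old_line = 'two\n'
candidates = ['wot\n', 'too\n']

scores = {}
for label, junk in (('no junk', None), ("'w' is junk", is_w)):
    for cand in candidates:
        scores[(label, cand)] = similarity(old_line, cand, junk)
        print('%-12s %r vs %r: %.2f'
              % (label, old_line, cand, scores[(label, cand)]))
```

With no junk function both candidates score 0.75, which would explain why "wot", the first one tried, gets the markup. With 'w' treated as junk, "wot" drops to 0.50 while "too" stays at 0.75, so "too" is the only candidate left that is similar enough.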