On Wed, Jul 7, 2010 at 9:18 AM, Terry Reedy <tjre...@udel.edu> wrote:
> In the commit message for revision 26661, which added the heuristic, Tim
> Peters wrote "While I like what I've seen of the effects so far, I still
> consider this experimental. Please give it a try!" Several people who have
> tried it discovered the problem with small alphabets and posted to the
> tracker. Issues #1528074, #1678339, #1678345, and #4622 are now-closed
> duplicates of #2986. The heuristic needs revision.

Python 2.3 you say... Hmm, I've been using difflib.SequenceMatcher for years in a serial bit error rate tester (with typical message sizes ranging from tens of bytes to tens of thousands of bytes), and it occasionally gives unexpected results. I'd been blaming hardware glitches (and, to be fair, all of the odd results I can recall off the top of my head were definitively traced to problems in the hardware under test), but I should probably check that I'm not running afoul of this bug.

And Tim, the algorithm may not be optimal as a general purpose binary diff algorithm, but it's still a hell of a lot faster than the hardware I use it to test. Compared to the equipment configuration times, the data comparison time is trivial.

There's another possibility here: perhaps the heuristic should be off by default in SequenceMatcher, with a TextMatcher subclass that enables it (and Differ and HtmlDiff then inheriting from the latter)?

There's currently barely anything in the SequenceMatcher documentation to indicate that it is designed primarily for comparing text rather than arbitrary sequences. The closest it gets is the reference to Ratcliff/Obershelp gestalt pattern matching and then the link to the Ratcliff/Metzener Dr Dobb's article, and until this thread I'd never followed the article link.

Rather than reverting to Tim's undocumented vision, perhaps we should better articulate it by separating the general purpose matcher from an optimised text matcher.

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
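
P.S. For anyone wanting to check whether they are affected, here is a minimal sketch of the small-alphabet problem and of the general/text split suggested above. It assumes a SequenceMatcher whose constructor accepts an `autojunk` flag for switching the heuristic off (releases from 2.7.1 and 3.2 onwards have one). The names `GeneralMatcher`, `TextMatcher` and `use_heuristic`, along with the test data, are purely illustrative and not an existing or agreed API.

```python
import difflib


class GeneralMatcher(difflib.SequenceMatcher):
    """Hypothetical general purpose matcher: popularity heuristic off."""

    # Illustrative class attribute; subclasses flip it to opt in.
    use_heuristic = False

    def __init__(self, isjunk=None, a="", b=""):
        # Assumption: the constructor accepts autojunk (2.7.1 / 3.2+).
        super().__init__(isjunk, a, b, autojunk=self.use_heuristic)


class TextMatcher(GeneralMatcher):
    """Hypothetical text-oriented matcher: popularity heuristic on."""

    use_heuristic = True


# A two-symbol payload of 300 bytes. The second sequence is at least
# 200 items long, so both symbols are frequent enough to be treated
# as "popular" when the heuristic is enabled.
received = bytes([0x55, 0xAA]) * 150
# The same payload with one spurious byte in front.
sent = b"\x00" + received

text_ratio = TextMatcher(None, sent, received).ratio()
general_ratio = GeneralMatcher(None, sent, received).ratio()

print("heuristic on: ", text_ratio)
print("heuristic off:", general_ratio)
```

On input like this, the ratio with the heuristic on should collapse towards zero even though the two sequences differ by a single byte, while the ratio with it off should stay close to 1.0. Making the text matcher the subclass keeps the plain sequence behaviour as the default, which is the direction argued for above.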