Tim Peters <t...@python.org> added the comment:

SequenceMatcher looks for the longest _contiguous_ match. "UNIQUESTRING" isn't 
the longest by far when autojunk is False, but is the longest when autojunk is 
True. All those bpopular characters then effectively prevent finding a longer 
match than 'QUESTR' (capital 'I' is also in bpopular) directly.

The effects of autojunk can be surprising, and it would have been better if it 
were False by default. But I don't see anything unexpected here. Learn from 
experience and force it to False yourself ;-) BTW, it was introduced as a way 
to greatly speed comparing files of code, viewing them as sequences of lines. 
In that context, autojunk is rarely surprising and usually helpful. But it more 
often backfires when comparing strings (viewed as sequences of characters) :-(
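
A minimal sketch of the effect, in case it helps. The strings `a` and `b` below are made up for illustration and are not the ones from the original report. They are built so that `b` is at least 200 characters long and 'a' and 'b' each make up far more than 1% of it, which is what I'd expect to trigger the autojunk heuristic:

```python
from difflib import SequenceMatcher

# Hypothetical inputs: a long run of two very common characters,
# plus a short token ("XYZ") whose characters each occur only once.
a = "XYZ" + "#" + "ab" * 100
b = "ab" * 100 + "$" + "XYZ"

def longest(autojunk):
    """Return the matcher and its longest contiguous match over all of a and b."""
    sm = SequenceMatcher(None, a, b, autojunk=autojunk)
    return sm, sm.find_longest_match(0, len(a), 0, len(b))

sm_on, m_on = longest(True)     # the default behavior
sm_off, m_off = longest(False)  # heuristic disabled

# With autojunk on, 'a' and 'b' are treated as popular, so no match can
# start on them; only the short unique token is found.
print(sorted(sm_on.bpopular))             # ['a', 'b']
print(a[m_on.a:m_on.a + m_on.size])       # XYZ

# With autojunk off, the long run of "ab" pairs is found instead.
print(sorted(sm_off.bpopular))            # []
print(m_off.size)                         # 200
```

Passing `autojunk=False` to the constructor is all that "force it to False yourself" amounts to.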

----------
nosy: +tim.peters

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue46667>
_______________________________________