New submission from brian.gallagher <oss.brn...@gmail.com>:

Currently, difflib's get_close_matches() does not treat words that differ only 
in letter case as close matches.

Example:
user@host:~$ python3
Python 3.6.9 (default, Nov  7 2019, 10:44:02) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import difflib
>>> difflib.get_close_matches("apple", ["APPLE"])
[]
>>> difflib.get_close_matches("apple", ["APpLe"])
[]
>>>

These seem like they should be considered close matches for each other, given 
that the SequenceMatcher used in difflib.py attempts to produce a 
"human-friendly diff" of two sequences in order to yield "intuitive difference 
reports".
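
For context, here is a small sketch of why both calls come back empty. 
get_close_matches() keeps only candidates whose SequenceMatcher ratio is at 
least the cutoff (0.6 by default), and the comparison is case-sensitive. The 
helper name ratio() is only for illustration and is not part of difflib.

```python
import difflib

def ratio(a, b):
    # Illustrative helper (not part of difflib): similarity score in [0, 1].
    return difflib.SequenceMatcher(None, a, b).ratio()

# No characters in common when case is taken into account.
r_upper = ratio("apple", "APPLE")

# Only "p" and "e" line up, which stays below the default cutoff of 0.6.
r_mixed = ratio("apple", "APpLe")

# After normalizing case, the two strings are identical.
r_lower = ratio("apple", "APPLE".lower())

print(r_upper, r_mixed, r_lower)
```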

One workaround is for the caller to normalize the data first, for example by 
converting all strings to lower case. However, a user who is not aware of this 
limitation is likely to be surprised by it. In my view, it would be preferable 
for the function to handle this by default.
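
As a rough sketch of the workaround described above (not a proposed patch), 
the caller could fold case before matching and then map the results back to 
the original spellings. The name get_close_matches_icase() is hypothetical, 
and the use of str.casefold() rather than str.lower() is an assumption about 
what "ignoring case" should mean.

```python
import difflib

def get_close_matches_icase(word, possibilities, n=3, cutoff=0.6):
    """Hypothetical helper: case-insensitive wrapper around get_close_matches()."""
    # Map each case-folded form back to the original spellings that produce it.
    originals = {}
    for candidate in possibilities:
        originals.setdefault(candidate.casefold(), []).append(candidate)

    # Match on the folded forms only.
    folded_matches = difflib.get_close_matches(
        word.casefold(), list(originals), n=n, cutoff=cutoff
    )

    # Expand each folded match back to its original spellings, keeping at most n.
    results = []
    for folded in folded_matches:
        results.extend(originals[folded])
    return results[:n]

matches = get_close_matches_icase("apple", ["APPLE", "APpLe", "banana"])
print(matches)
```

Doing this inside the wrapper leaves the existing case-sensitive behaviour of 
get_close_matches() untouched for callers that rely on it.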

If this is an issue the relevant maintainer(s) consider worth pursuing, I'd 
love to try my hand at preparing a patch for this.

----------
messages: 363618
nosy: brian.gallagher
priority: normal
severity: normal
status: open
title: [difflib] Improve get_close_matches() to better match when casing of 
words are different
versions: Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue39891>
_______________________________________