Dennis Sweeney <sweeney.dennis...@gmail.com> added the comment:

Some research of other projects:


LLVM [1][2]
-----------

- Compute Levenshtein
    - Using O(n) memory rather than O(n^2)
- Uses UpperBound = (len(typo) + 2) // 3


GCC [3]
-------

- Uses Damerau-Levenshtein distance
    - Counts transpositions like "abcd" <-> "bacd" as one move
- Swapping Case as in "a" <-> "A" counts as half a move
- cutoff = (longer + 2) // 3 if longer - shorter >= 2 else max(longer // 3, 1)


Rust [4]
--------

- "maximum allowable edit distance defaults to one-third of the given word."
- First checks for exact case-insensitive match, then check for Levenshtein 
distance small enough, then check if sorted(a.split("_")) == 
sorted(b.split("_"))


Ruby [5]
--------
- Quickly filter out words with bad Jaro–Winkler distance
    - threshold = input.length > 3 ? 0.834 : 0.77
- Only compute Levenshtein for words that remain
    - threshold = (input.length * 0.25).ceil
    - Output all good enough words
- If no word was good enough then output the closest match.


I think there are some good ideas here.


[1] 
https://github.com/llvm/llvm-project/blob/d480f968ad8b56d3ee4a6b6df5532d485b0ad01e/llvm/include/llvm/ADT/edit_distance.h#L42
[2] 
https://github.com/llvm/llvm-project/blob/e2b3b89bf1ce74bf889897e0353a3e3fa93e4452/clang/lib/Sema/SemaLookup.cpp#L4263
[3] 
https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/spellcheck.c
[4] 
https://github.com/rust-lang/rust/blob/673d0db5e393e9c64897005b470bfeb6d5aec61b/compiler/rustc_span/src/lev_distance.rs#L44
[5] 
https://github.com/ruby/ruby/blob/48b94b791997881929c739c64f95ac30f3fd0bb9/lib/did_you_mean/spell_checker.rb

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue38530>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to