Dennis Sweeney <sweeney.dennis...@gmail.com> added the comment: Some research of other projects:
LLVM [1][2] ----------- - Compute Levenshtein - Using O(n) memory rather than O(n^2) - Uses UpperBound = (len(typo) + 2) // 3 GCC [3] ------- - Uses Damerau-Levenshtein distance - Counts transpositions like "abcd" <-> "bacd" as one move - Swapping Case as in "a" <-> "A" counts as half a move - cutoff = (longer + 2) // 3 if longer - shorter >= 2 else max(longer // 3, 1) Rust [4] -------- - "maximum allowable edit distance defaults to one-third of the given word." - First checks for exact case-insensitive match, then check for Levenshtein distance small enough, then check if sorted(a.split("_")) == sorted(b.split("_")) Ruby [5] -------- - Quickly filter out words with bad Jaro–Winkler distance - threshold = input.length > 3 ? 0.834 : 0.77 - Only compute Levenshtein for words that remain - threshold = (input.length * 0.25).ceil - Output all good enough words - If no word was good enough then output the closest match. I think there are some good ideas here. [1] https://github.com/llvm/llvm-project/blob/d480f968ad8b56d3ee4a6b6df5532d485b0ad01e/llvm/include/llvm/ADT/edit_distance.h#L42 [2] https://github.com/llvm/llvm-project/blob/e2b3b89bf1ce74bf889897e0353a3e3fa93e4452/clang/lib/Sema/SemaLookup.cpp#L4263 [3] https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/spellcheck.c [4] https://github.com/rust-lang/rust/blob/673d0db5e393e9c64897005b470bfeb6d5aec61b/compiler/rustc_span/src/lev_distance.rs#L44 [5] https://github.com/ruby/ruby/blob/48b94b791997881929c739c64f95ac30f3fd0bb9/lib/did_you_mean/spell_checker.rb ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue38530> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com