On Feb 13, 5:42 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
> You may replace the last steps (sort + slice top 5) by heapq.nlargest - at
> least you won't waste time sorting 49995 irrelevant words...
> Anyway you should measure the time taken by the first part (Levenshtein),
> it may be the most demanding. I think there is a C extension for this,
> should be much faster than pure Python calculations.
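As a rough sketch of the heapq suggestion (the word list, query, and
scoring function below are made-up placeholders for whatever the original
code actually computes):

    import heapq

    # Toy stand-ins: in the real code, `words` is the ~50000-entry list
    # and `score` is the Levenshtein-based similarity Gabriel refers to.
    words = ["apple", "apply", "ample", "maple", "banana"]
    query = "appel"

    def score(query, word):
        # placeholder metric: count of position-wise matching characters
        return sum(1 for a, b in zip(query, word) if a == b)

    # sorted(..., reverse=True)[:5] sorts all entries; heapq.nlargest
    # keeps only a 5-element heap while scanning the list once.
    top5 = heapq.nlargest(5, words, key=lambda w: score(query, w))
    print(top5)

The payoff is roughly O(n log 5) heap maintenance instead of O(n log n)
for a full sort of the list.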
subdist: http://pypi.python.org/pypi/subdist/0.2.1

It uses a modified "fuzzy" version of the Levenshtein algorithm, which I
found more useful than the strict version. The only quirk to it is that it
accepts nothing but unicode. Other than that, it's a keeper. It is
extremely fast.

Cheers,
-Basilisk96
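If you want to measure how much the C extension buys you, the strict
textbook algorithm (Wagner-Fischer, two-row variant) in pure Python makes
a handy timing baseline. Note this is only a reference point: subdist's
fuzzy variant computes something different.

    def levenshtein(a, b):
        """Strict Levenshtein distance in pure Python (Wagner-Fischer,
        two-row variant). Useful as a baseline when timing a C
        extension against it."""
        if len(a) < len(b):
            a, b = b, a
        previous = range(len(b) + 1)
        for i, ca in enumerate(a, 1):
            current = [i]
            for j, cb in enumerate(b, 1):
                cost = 0 if ca == cb else 1
                current.append(min(previous[j] + 1,          # deletion
                                   current[j - 1] + 1,       # insertion
                                   previous[j - 1] + cost))  # substitution
            previous = current
        return previous[-1]

    print(levenshtein(u"kitten", u"sitting"))  # -> 3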