On Feb 13, 5:42 am, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
> You may replace the last steps (sort + slice top 5) by heapq.nlargest - at
> least you won't waste time sorting 49995 irrelevant words...
> Anyway you should measure the time taken by the first part (Levenshtein),
> it may be the most demanding. I think there is a C extension for this,
> should be much faster than pure Python calculations.
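As a rough sketch of the heapq suggestion (the word list, query, and
scoring function below are made-up placeholders for whatever the original
code actually computes):

    import heapq

    # Toy stand-ins: in the real code, `words` is the ~50000-entry list
    # and `score` is the Levenshtein-based similarity Gabriel refers to.
    words = ["apple", "apply", "ample", "maple", "banana"]
    query = "appel"

    def score(query, word):
        # placeholder metric: count of position-wise matching characters
        return sum(1 for a, b in zip(query, word) if a == b)

    # sorted(..., reverse=True)[:5] sorts all entries; heapq.nlargest
    # keeps only a 5-element heap while scanning the list once.
    top5 = heapq.nlargest(5, words, key=lambda w: score(query, w))
    print(top5)

The payoff is roughly O(n log 5) heap maintenance instead of O(n log n)
for a full sort of the list.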
subdist: http://pypi.python.org/pypi/subdist/0.2.1

It uses a modified "fuzzy" version of the Levenshtein algorithm, which I
found more useful than the strict version. The only quirk to it is that it
accepts nothing but unicode. Other than that, it's a keeper. It is
extremely fast.

Cheers,
-Basilisk96
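If you want to measure how much the C extension buys you, the strict
textbook algorithm (Wagner-Fischer, two-row variant) in pure Python makes
a handy timing baseline. Note this is only a reference point: subdist's
fuzzy variant computes something different.

    def levenshtein(a, b):
        """Strict Levenshtein distance in pure Python (Wagner-Fischer,
        two-row variant). Useful as a baseline when timing a C
        extension against it."""
        if len(a) < len(b):
            a, b = b, a
        previous = range(len(b) + 1)
        for i, ca in enumerate(a, 1):
            current = [i]
            for j, cb in enumerate(b, 1):
                cost = 0 if ca == cb else 1
                current.append(min(previous[j] + 1,          # deletion
                                   current[j - 1] + 1,       # insertion
                                   previous[j - 1] + cost))  # substitution
            previous = current
        return previous[-1]

    print(levenshtein(u"kitten", u"sitting"))  # -> 3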