On Wed, 06 Feb 2008 17:32:53 -0600, Robert Kern wrote: > Jeff Schwab wrote: ... >> If the strings happen to be the same length, the Levenshtein distance >> is equivalent to the Hamming distance. ... > I'm afraid that it isn't. Using Magnus Lie Hetland's implementation: ... > In [4]: hamdist('abcdef', 'fabcde') > Out[4]: 6 > > In [5]: levenshtein('abcdef', 'fabcde') > Out[5]: 2
I can confirm Robert's results, using a different implementation of the Levenshtein edit distance. It isn't true that Levenshtein simplifies to Hamming when the strings are equal length. For what it's worth, here's my implementation, which I wrote some years ago. I doubt it's optimal. def levenshtein(s1, s2): """Return the Levenshtein edit distance between two strings. s1 and s2 should be two arbitrary length strings. Returns an integer count of the edit distance between the strings. """ m, n = len(s1), len(s2) if m == 0: return n elif n == 0: return m # Create an array with rows 0..m and columns 0..n. array = [None]*(m+1) for i in range(m+1): array[i] = [None]*(n+1) # Initialise the first row to 0..n. for i in range(n+1): array[0][i] = i # Initialize the first column to 0..m. for i in range(m+1): array[i][0] = i # Measure the differences. for i in range(1, m+1): c1 = s1[i-1] for j in range(1, n+1): c2 = s2[j-1] cost = int(c1 != c2) x = array[i][j-1] + 1 # Cell immediately above plus one. y = array[i-1][j] + 1 # Cell immediately left plus one. z = array[i-1][j-1] + cost # Cell diagonally above and # to the left, plus the cost. array[i][j] = min(x, y, z) # When done, the bottom-right cell contains the Levenshtein distance. return array[m][n] -- Steven -- http://mail.python.org/mailman/listinfo/python-list