Fredrik Lundh wrote:
> Diez B. Roggisch wrote:
>
>> The advantage becomes apparent when you try to e.g. compare
>>
>> "Angelina Jolie"
>>
>> with
>>
>> "AngelinaJolei"
>>
>> and
>>
>> "Bob"
>>
>> Both have a l-dist of 3
>
> >>> distance("Angelina Jolie", "AngelinaJolei")
> 3
> >>> distance("Angelina Jolie", "Bob")
> 13
>
> what did I miss ?

Hmm. I missed something: the "1" before the "3" in 13 when I looked at my terminal after running the example. And according to

http://www.reference.com/browse/wiki/Levenshtein_distance

it has the property

"""It is always at least the difference of the sizes of the two strings."""

I got my implementation from there (or rather from Magnus Lie Hetland, whose Python version is referenced there).

So you are right, my example is crap.

But I ran into cases where my normalizing made sense, otherwise I wouldn't have done it :)

I guess it is more along the lines of (coughed up example)

"abcdef"

compared to

"abcefd"
"abcd"

I can only say that I used it to fuzzy-compare people's and hotel names, and applying the normalization made my results far better.

Sorry for causing confusion.

Diez
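
For reference, the numbers from the quoted session can be reproduced with a minimal sketch like the one below. It is a plain dynamic-programming Levenshtein distance, not necessarily the Hetland version mentioned above. The post does not say how the normalization was done, so normalized_distance() and its choice of dividing by the length of the longer string are assumptions made purely for illustration.

```python
def distance(a, b):
    """Levenshtein distance between a and b, using two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep b the shorter string so each row stays small
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Hypothetical normalization (an assumption, not taken from the post):
    divide the raw distance by the length of the longer string, giving a
    value between 0.0 (identical) and 1.0 (nothing in common)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return distance(a, b) / longest


if __name__ == "__main__":
    for other in ("AngelinaJolei", "Bob"):
        print(other,
              distance("Angelina Jolie", other),
              round(normalized_distance("Angelina Jolie", other), 3))
```

With this sketch the raw distances come out as 3 and 13, matching the quoted session, and the second one respects the lower bound of 14 - 3 = 11 given by the difference in string lengths.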