On Wed, 18 May 2005 15:06:53 -0500, Ed Morton <[EMAIL PROTECTED]> wrote:
>William Park wrote:
>
>> How do you compare 2 strings, and determine how much they are "close" to
>> each other?  Eg.
>>     aqwerty
>>     qwertyb
>> are similar to each other, except for first/last char.  But, how do I
>> quantify that?
>>
>> I guess you can say for the above 2 strings that
>>     - at max, 6 chars out of 7 are same sequence --> 85% max
>>
>> But, for
>>     qawerty
>>     qwerbty
>> max correlation is
>>     - 3 chars out of 7 are the same sequence --> 42% max
>>
>> (Crossposted to 3 of my favourite newsgroup.)
>>
>
>"However you like" is probably the right answer, but one way might be to
>compare their soundex encoding
>(http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?soundex) and figure out
>percentage difference based on comparing the numeric part.
>

Fantastic suggestion. Here's a tiny piece of real-life test data:
compare the surnames "Mousaferiadis" and "McPherson".
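
For anyone who wants to try both ideas from this thread, here is a rough
sketch, not a definitive implementation. The function names
(longest_common_substring, similarity, soundex) are made up for
illustration, and the Soundex variant is assumed to be the common
"American Soundex" (h and w are skipped, vowels separate repeated codes,
result padded or cut to four characters). Other variants exist and may
give different codes.

```python
# Assumption: "same sequence" in the original post means the longest
# contiguous run of characters the two strings share.
def longest_common_substring(a, b):
    """Return the length of the longest contiguous run shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                # Extend the run that ended at the previous pair of characters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def similarity(a, b):
    """Shared run length divided by the length of the longer string."""
    longest = max(len(a), len(b))
    return longest_common_substring(a, b) / longest if longest else 1.0


# Assumption: American Soundex letter groups.
CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), 1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name):
    """Return a four-character Soundex code, or '' if name has no letters."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    last = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != last:
            result += code
        last = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]


if __name__ == "__main__":
    print(longest_common_substring("aqwerty", "qwertyb"))  # 6 of 7
    print(longest_common_substring("qawerty", "qwerbty"))  # 3 of 7
    print(soundex("Mousaferiadis"), soundex("McPherson"))  # M216 M216
```

Under these assumptions the two surnames get the same code, M216, so a
comparison based only on the Soundex code would call them identical,
while the shared-run measure rates them as far apart.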