Irmen de Jong wrote:
> Wouldn't it be easier if it is normalized to a float between 0.0 and 1.0?

Maybe. You could do that by ignoring "negative" values and by dividing by min(len(s1), len(s2)) + 1. For my application this is irrelevant: I only need a scale to compare a single word against many different words and pick the one that stands out. I can relate the (unnormalized) score to the number of typos in the word.

> Furthermore, I can also produce wrong(?) results:
>
> $ python comparestrings.py
> s1: test
> s2: x
> Score: -0.4
>
> Minus 0.4... Is this a bug?

Not in the procedure; it is a bug in my description. It was late at night when I wrote it. What is happening here is that the two words are aligned like this:

    test
    --x-

The t opens a gap (score -0.3), e elongates the gap (-0.2), s and x are adjacent on the keyboard (score 0.4), and t opens another gap (-0.3). The total score is -0.4.

If you compare "test" with "l" you even get a score of -0.7, and you can make arbitrarily low numbers by comparing long strings of "a" characters with a single "l". What this is telling you is that not only are the letters all different (score 0.0), but the strings are not even equally long (hence < 0.0).

The gap-open and gap-elongation penalties can be adjusted in the module. The values of -0.3 and -0.2 were set after some experimentation to make the relative scores of different matches "feel" natural.

-- 
Rob W.W. Hooft || [EMAIL PROTECTED] || http://www.hooft.net/people/rob/
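
For what it's worth, here is a minimal sketch of how those penalties add up for the "test"/"x" example, and how the suggested normalization could look. This is not the actual comparestrings module: the constants, the tiny keyboard-adjacency table, the MATCH value and the function names are assumptions made only for illustration, and it scores a ready-made alignment rather than computing one.

GAP_OPEN = -0.3      # penalty for opening a gap
GAP_EXTEND = -0.2    # penalty for elongating an existing gap
ADJACENT = 0.4       # score for characters that are adjacent on the keyboard
MATCH = 1.0          # assumed score for identical characters (not stated above)
MISMATCH = 0.0       # completely different characters contribute nothing

# Deliberately tiny adjacency table, just enough for this example.
KEYBOARD_NEIGHBOURS = {('s', 'x'), ('x', 's')}

def alignment_score(a, b):
    """Score two pre-aligned strings of equal length; '-' marks a gap."""
    score = 0.0
    in_gap = False
    for ca, cb in zip(a, b):
        if ca == '-' or cb == '-':
            score += GAP_EXTEND if in_gap else GAP_OPEN
            in_gap = True
        else:
            in_gap = False
            if ca == cb:
                score += MATCH
            elif (ca, cb) in KEYBOARD_NEIGHBOURS:
                score += ADJACENT
            else:
                score += MISMATCH
    return score

def normalized(score, s1, s2):
    """Clip negative scores to 0.0 and divide by min(len(s1), len(s2)) + 1."""
    return max(score, 0.0) / (min(len(s1), len(s2)) + 1)

# "test" aligned against "--x-": gap open, gap elongation, s/x adjacency, gap open.
raw = alignment_score("test", "--x-")
print(round(raw, 2))                  # -0.3 - 0.2 + 0.4 - 0.3 = -0.4
print(normalized(raw, "test", "x"))   # 0.0 once the negative value is clipped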
Maybe. You could do that by ignoring "negative" values, and by dividing by min(len(s1),len(s2))+1. For my application this is irrelevant, I only need a scale to compare a single word to many different words, and pick the one that stands out. I can relate the (unnormalized) score to the number of typos in the word. > Furthermore, I can also produce wrong(?) results: > > $ python comparestrings.py > s1: test > s2: x > Score: -0.4 > > Minus 0.4... Is this a bug? Not in the procedure, it is a bug in my description. It was late at night when I wrote it. What you are doing here is aligning the two words like this: test --x- the t opens a gap (score -0.3), e elongates the gap (-0.2), s and x are adjacent on the keyboard (score 0.4) and t opens a gap (-0.3). Total score is -0.4. If you compare "test" with "l" you even get a score of -0.7. You can make arbitrarily low numbers by comparing long strings of "a" characters with a single "l". What this is telling you is that not only are the letters all different (score 0.0), but the strings are not even equally long (hence <0.0). The gap-open and gap-elongation penalties can be adjusted in the module. The -0.3 and -0.2 values were set after some experimentation to make the relative scores of different matches "feel" natural. -- Rob W.W. Hooft || [EMAIL PROTECTED] || http://www.hooft.net/people/rob/ -- http://mail.python.org/mailman/listinfo/python-list