Re: String comparision

2009-01-25 Thread S.Selvam Siva
Thank you, Gabriel,

On Sun, Jan 25, 2009 at 7:12 AM, Gabriel Genellina
gagsl-...@yahoo.com.ar wrote:

 On Sat, 24 Jan 2009 15:08:08 -0200, S.Selvam Siva s.selvams...@gmail.com
 wrote:


  I am developing a spell checker for my local language (Tamil) using Python.
 I need to generate a list of alternative words for a misspelled word from the
 dictionary of words. The alternatives must be as close as possible to the
 misspelled word. As we know, ordinary string comparison won't work here.
 Any suggestion for this problem is welcome.


 I think it would be better to add Tamil support to some existing library like
 GNU aspell: http://aspell.net/



That was my plan earlier, but I am not sure how aspell integrates with other
editors. I had better ask on the aspell mailing list.


 You are looking for fuzzy matching:
 http://en.wikipedia.org/wiki/Fuzzy_string_searching
 In particular, the Levenshtein distance is widely used; I think there is a
 Python extension providing those calculations.

 --
 Gabriel Genellina

The following code served my purpose (thanks to some unknown contributors):
import sys

def distance(a, b):
    # c[i, j] holds the edit distance between a[:i] and b[:j]
    c = {}
    n = len(a); m = len(b)

    for i in range(0, n + 1):
        c[i, 0] = i
    for j in range(0, m + 1):
        c[0, j] = j

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x = c[i-1, j] + 1          # deletion
            y = c[i, j-1] + 1          # insertion
            if a[i-1] == b[j-1]:
                z = c[i-1, j-1]        # characters match
            else:
                z = c[i-1, j-1] + 1    # substitution
            c[i, j] = min(x, y, z)
    return c[n, m]

a = sys.argv[1]
b = sys.argv[2]
d = distance(a, b)
print("d=%d" % d)
longer = float(max(len(a), len(b)))
shorter = float(min(len(a), len(b)))
r = ((longer - d) / longer) * (shorter / longer)
# r ranges between 0 and 1
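
As a rough sketch of how a distance function like the one above might be used to build the list of alternatives: the snippet below ranks the words of a small, made-up dictionary by edit distance to the misspelled word and keeps the closest few. The names suggest and max_results and the sample word list are illustrative assumptions, not part of the code above, and the distance function is repeated (in a row-by-row form) only so that the snippet runs on its own.

```python
def distance(a, b):
    """Levenshtein distance between sequences a and b, keeping two rows."""
    previous = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        current = [i]
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[len(b)]


def suggest(word, dictionary, max_results=3):
    """Return up to max_results dictionary words closest to word."""
    # Sort by (distance, word) so that ties are broken alphabetically.
    ranked = sorted(dictionary, key=lambda w: (distance(word, w), w))
    return ranked[:max_results]


# Hypothetical sample dictionary, for illustration only.
words = ["spelling", "spell", "speller", "smelling", "python"]
print(suggest("speling", words))
```

Scanning the whole dictionary like this is linear in the number of words, which may be too slow for a large word list. With Python unicode strings the comparison is made per code point, so a Tamil letter written as a consonant plus a vowel sign counts as two positions; whether that is the right unit for a spell checker is a separate design decision.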




-- 
Yours,
S.Selvam
--
http://mail.python.org/mailman/listinfo/python-list


String comparision

2009-01-24 Thread S.Selvam Siva
Hi all,

I am developing a spell checker for my local language (Tamil) using Python.
I need to generate a list of alternative words for a misspelled word from the
dictionary of words. The alternatives must be as close as possible to the
misspelled word. As we know, ordinary string comparison won't work here.
Any suggestion for this problem is welcome.

-- 
Yours,
S.Selvam


Re: String comparision

2009-01-24 Thread Gabriel Genellina
On Sat, 24 Jan 2009 15:08:08 -0200, S.Selvam Siva s.selvams...@gmail.com
wrote:



I am developing a spell checker for my local language (Tamil) using Python.
I need to generate a list of alternative words for a misspelled word from the
dictionary of words. The alternatives must be as close as possible to the
misspelled word. As we know, ordinary string comparison won't work here.
Any suggestion for this problem is welcome.


I think it would be better to add Tamil support to some existing library like
GNU aspell: http://aspell.net/
You are looking for fuzzy matching:  
http://en.wikipedia.org/wiki/Fuzzy_string_searching
In particular, the Levenshtein distance is widely used; I think there is a  
Python extension providing those calculations.
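
For a quick experiment that needs no extension module, the standard library's difflib.get_close_matches gives a flavour of fuzzy matching. Note that it ranks candidates by difflib.SequenceMatcher's similarity ratio, which is not the Levenshtein distance, so this is only a sketch of the general idea; the word list is made up for illustration.

```python
import difflib

# Hypothetical word list, for illustration only.
words = ["spelling", "spell", "speller", "smelling", "python"]

# Return at most 3 candidates whose similarity ratio is at least 0.6,
# best match first.
matches = difflib.get_close_matches("speling", words, n=3, cutoff=0.6)
print(matches)
```

Raising cutoff makes the matching stricter, and n caps how many alternatives are returned.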


--
Gabriel Genellina
