On 11/15/06 22:02 +0100, Laurent Aguerreche wrote:

>I have begun to search for algorithms and found:
>
>* N-grams
>  http://en.wikipedia.org/wiki/N-gram
>* levenshtein
>  http://www.php.net/manual/en/function.levenshtein.php
>* similar text
>  http://www.php.net/manual/en/function.similar-text.php
>* soundex
>  http://www.php.net/manual/en/function.soundex.php
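For comparison, here is a minimal Python sketch of two of the measures listed above: Levenshtein edit distance, and a character-trigram similarity (Dice coefficient over padded trigrams). It is written from scratch for illustration and is not PHP's levenshtein() implementation; the function names, the trigram size and the "_" padding are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic dynamic programming)."""
    # prev[j] holds the distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def ngrams(word: str, n: int = 3) -> set[str]:
    """Character n-grams; assumed '_' padding lets word edges count too."""
    pad = "_" * (n - 1)
    padded = f"{pad}{word.lower()}{pad}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over the two n-gram sets: 1.0 means identical sets."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


print(levenshtein("kitten", "sitting"))  # 3
```

The similarity score lies between 0.0 and 1.0, so an index could treat anything above some threshold as a near match.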

Soundex lets you find terms that *sound* similar to an indexed term, so it
might actually solve the French/Swedish/Danish transliteration problem.
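A minimal sketch of the classic American Soundex code in Python, for illustration only (it is not PHP's soundex() source). As an added assumption, accents are stripped with Unicode NFKD decomposition first; letters with no decomposition, such as ø or æ, are simply dropped, so this alone would not cover every Scandinavian spelling.

```python
import unicodedata

# Standard Soundex digit classes; vowels and h, w, y get no digit.
CODES = {ch: digit
         for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                ("l", "4"), ("mn", "5"), ("r", "6"))
         for ch in letters}


def soundex(word: str) -> str:
    # Assumption: strip accents via NFKD (e.g. "ü" -> "u" + combining mark)
    # and keep only ASCII letters; ø, æ etc. are dropped entirely.
    decomposed = unicodedata.normalize("NFKD", word.lower())
    letters = [c for c in decomposed if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = [letters[0].upper()]
    last = CODES.get(letters[0], "")  # first letter's code still blocks repeats
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != last:
            result.append(code)
        if ch not in "hw":            # h and w do not separate equal codes
            last = code               # vowels reset, so a repeat after them counts
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")  # pad short codes with zeros
```

With this sketch, soundex("Robert") and soundex("Rupert") both give R163, and soundex("Müller") equals soundex("Muller") (M460).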

I'll ask a computational linguist colleague tomorrow; maybe he has some
ideas.

I do see one problem: in one context (source code) people seem to prefer
exact matches, without stemming or similarity matching, while in other
contexts (words in text, file names) people do want stemming and some form
of similarity search on the orthography (spelling). There is probably no
single solution that fits both uses, although a similarity-based search
might work fine for source code too.
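One way to try both behaviours would be to choose the matching mode per context. This is a minimal sketch assuming a hypothetical Context flag and search() helper (neither is part of Tracker), with Python's difflib.SequenceMatcher ratio standing in for whichever similarity measure is eventually picked:

```python
import difflib
from enum import Enum


class Context(Enum):
    SOURCE_CODE = "source_code"  # identifiers: case and spelling matter
    TEXT = "text"                # prose, file names: tolerate variants


def search(query: str, terms: list[str], context: Context,
           cutoff: float = 0.8) -> list[str]:
    """Hypothetical helper: exact matches for code, fuzzy matches otherwise."""
    if context is Context.SOURCE_CODE:
        # Exact, case-sensitive: getUserName must not match getusername.
        return [t for t in terms if t == query]
    q = query.lower()
    # SequenceMatcher.ratio() is 2*M/T, where M = matched characters.
    scored = [(difflib.SequenceMatcher(None, q, t.lower()).ratio(), t)
              for t in terms]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return [t for score, t in scored if score >= cutoff]
```

With terms ["colour", "color", "colonel"], a TEXT search for "color" returns ["color", "colour"], while a SOURCE_CODE search returns only ["color"]. Removing the SOURCE_CODE branch would be a quick way to test whether similarity search is also good enough for code.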

 -eyal
_______________________________________________
tracker-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/tracker-list
