On 11/15/06/11/06 22:02 +0100, Laurent Aguerreche wrote:
>I have begun to search algorithms and I found:
>
>* N-grams
>  http://en.wikipedia.org/wiki/N-gram
>* levenshtein
>  http://www.php.net/manual/en/function.levenshtein.php
>* similar text
>  http://www.php.net/manual/en/function.similar-text.php
>* soundex
>  http://www.php.net/manual/en/function.soundex.php

soundex allows you to find terms that *sound* similar to an indexed term, so that might actually solve the French/Swedish/Danish transliteration problem.
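To make the comparison concrete, here is a rough Python sketch of two of the algorithms listed above: American Soundex and Levenshtein edit distance. The function names and the `ascii_fold` preprocessing step are my own assumptions for illustration, not anything Tracker or PHP does. Note that Soundex was designed for English names, and that the folding step simply drops letters such as "ø" that have no Unicode decomposition, so treat this as a starting point rather than a solution.

```python
import unicodedata

# Soundex digit for each consonant group; vowels, H, W and Y have no digit.
_CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def ascii_fold(word):
    """Hypothetical helper: strip accents and keep only ASCII letters, upper-cased.

    Letters without a Unicode decomposition (for example "ø") are dropped.
    """
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(c for c in decomposed if c.isascii() and c.isalpha()).upper()


def soundex(word):
    """Return the 4-character American Soundex code, or "" for empty input."""
    letters = ascii_fold(word)
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same digit
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated digit is kept
    return (code + "000")[:4]


def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]
```

With this sketch, "Robert" and "Rupert" both encode to R163, and "Müller" gets the same code as "Muller" because the umlaut is folded away first. `levenshtein("kitten", "sitting")` returns 3. The two measures answer different questions: Soundex groups words by rough pronunciation, while Levenshtein counts spelling edits.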
I'll ask a computational linguist colleague tomorrow; maybe he has some ideas.

I do see one problem: in one context (programming code) people seem to prefer exact matches, without stemming or similarity matching, while in other contexts (words in text, file names) people do want stemming and some form of similarity search on the orthography (spelling). There is probably no single solution that fits both uses, although a similarity-based search would probably also be fine for source code.

-eyal
