Re: Multilanguage

Till Kinstler Tue, 17 Feb 2009 01:58:54 -0800

Paul Libbrecht wrote:

Clearly, then, something that matches words in a dictionary and decides on the language based on the language of the majority could do a decent job to decide the analyzer.
Does such a tool exist?

I once played around with http://ngramj.sourceforge.net/ for language guessing. It did a good job. It doesn't use dictionaries for language identification but a statistical approach using n-grams.

I don't have any precise numbers, but out of about 10000 documents in different languages (most in English, German and French, a few in other European languages like Polish), only about 10 were not identified correctly.
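
To give a rough idea of what n-gram based guessing looks like, here is a minimal Python sketch. It is not ngramj's code or API. It assumes the common rank-order ("out-of-place") comparison of character trigram profiles, and the function names and the tiny one-sentence training samples are made up for illustration. A real setup would build the profiles from far more text per language.

```python
from collections import Counter

def ngram_profile(text, n=3, top=300):
    """Return the `top` most frequent character n-grams, most frequent first."""
    # Normalise case and whitespace, and pad so word boundaries become n-grams too.
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    return [gram for gram, _ in counts.most_common(top)]

def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between two profiles (lower means more similar)."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)  # charged for n-grams the language profile lacks
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )

def guess_language(text, profiles):
    """Pick the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))

# Hypothetical, far too small training samples; illustration only.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then runs through the forest",
    "de": "der schnelle braune fuchs springt über den faulen hund und läuft dann durch den wald",
    "fr": "le renard brun rapide saute par-dessus le chien paresseux et court ensuite dans la forêt",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}

if __name__ == "__main__":
    print(guess_language("der hund läuft durch den wald", PROFILES))
```

The guessed language could then be used to pick the analyzer for a document. No dictionary is involved at any point, only the relative frequencies of short character sequences.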


Till

--
Till Kinstler
Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG)
Platz der Göttinger Sieben 1, D 37073 Göttingen
kinst...@gbv.de, +49 (0) 551 39-13431, http://www.gbv.de
