Paul Libbrecht wrote:

> Clearly, then, something that matches words against a dictionary and picks the language most of the words belong to could do a decent job of choosing the analyzer.
>
> Does such a tool exist?
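A minimal sketch of the dictionary-majority idea from the quoted question, in Python. The word lists in `DICTIONARIES` and the function name `guess_language_by_dictionary` are hypothetical and only illustrative; a usable tool would need far larger word lists per language.

```python
import re
from collections import Counter

# Hypothetical, tiny per-language word lists (illustration only).
DICTIONARIES = {
    "en": {"the", "and", "of", "is", "to", "in", "a", "that"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
    "fr": {"le", "la", "les", "et", "est", "un", "une", "des"},
}

def guess_language_by_dictionary(text, dictionaries=DICTIONARIES):
    """Return the language whose dictionary matches the most words, or None."""
    words = re.findall(r"\w+", text.lower())
    votes = Counter()
    for word in words:
        for lang, vocab in dictionaries.items():
            if word in vocab:
                votes[lang] += 1  # one vote per matching word
    if not votes:
        return None  # no word was found in any dictionary
    return votes.most_common(1)[0][0]

print(guess_language_by_dictionary("Der Hund ist nicht im Haus"))  # → de
```

Ties go to whichever language received a vote first, and a text with no dictionary hits yields `None`, so a caller would still need a fallback analyzer for that case.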

I once played around with http://ngramj.sourceforge.net/ for language guessing, and it did a good job. It doesn't use dictionaries for language identification; instead it takes a statistical approach based on n-grams. I don't have precise numbers, but of roughly 10000 documents in different languages (most in English, German and French, a few in other European languages such as Polish), only about 10 were not identified correctly, an error rate of roughly 0.1%.
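For comparison, a rough sketch of n-gram based language guessing in Python. This is not NGramJ's API (NGramJ is a Java library) and not necessarily the method it implements; it illustrates one common statistical approach, the rank-order "out-of-place" comparison of character n-gram profiles described by Cavnar and Trenkle. All function names, the toy training sentences, and the `top_k`/`max_penalty` values are assumptions chosen for illustration.

```python
import re
from collections import Counter

def ngram_profile(text, n_max=3, top_k=300):
    """Return the top_k most frequent character n-grams (n = 1..n_max), most frequent first."""
    counts = Counter()
    for word in re.findall(r"[^\W\d_]+", text.lower()):  # letters only
        padded = f"_{word}_"  # '_' marks the word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return [gram for gram, _ in counts.most_common(top_k)]

def out_of_place(doc_profile, lang_profile, max_penalty=300):
    """Sum of rank differences; n-grams missing from lang_profile cost max_penalty."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    return sum(abs(r - rank[gram]) if gram in rank else max_penalty
               for r, gram in enumerate(doc_profile))

def guess_language_by_ngrams(text, lang_profiles):
    """Pick the language whose profile is closest to the text's profile."""
    doc_profile = ngram_profile(text)
    return min(lang_profiles,
               key=lambda lang: out_of_place(doc_profile, lang_profiles[lang]))

# Hypothetical toy training samples; real profiles come from much larger corpora.
TRAINING_SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog while the cat is sleeping in the house",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze ist im haus",
    "fr": "le renard brun saute par dessus le chien paresseux et le chat est dans la maison",
}
PROFILES = {lang: ngram_profile(sample) for lang, sample in TRAINING_SAMPLES.items()}

print(guess_language_by_ngrams("der hund ist im haus", PROFILES))  # → de
```

With training samples this small the profiles are only toys; how well the approach works in practice depends heavily on building the profiles from large amounts of text per language.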

Till

--
Till Kinstler
Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG)
Platz der Göttinger Sieben 1, D 37073 Göttingen
kinst...@gbv.de, +49 (0) 551 39-13431, http://www.gbv.de
