Language recognition

Tommaso Teofili Mon, 08 Dec 2008 01:23:47 -0800

Hello,
I am writing an AE pipeline and I need to detect the language in which the
input document is written.
My idea is to use the Whitespace Tokenizer and the HMM Tagger together to
analyze the extracted tokens, calculate the percentage of known tokens for
each language (against a dictionary), and then select the language with the
highest percentage.
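
To make the idea concrete, here is a minimal sketch of the dictionary-scoring step in Python, outside of UIMA. It is only an illustration under assumptions: the tiny word lists, the names DICTIONARIES, known_token_ratio and guess_language are all hypothetical, tokenization is a plain whitespace split, and the HMM Tagger is not involved at all.

```python
# Hypothetical toy dictionaries; a real pipeline would load full word lists.
DICTIONARIES = {
    "en": {"the", "is", "and", "in", "which", "document", "written"},
    "it": {"il", "la", "è", "e", "in", "quale", "documento", "scritto"},
}


def known_token_ratio(tokens, dictionary):
    """Return the fraction of tokens found in the dictionary (0.0 if empty)."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in dictionary)
    return hits / len(tokens)


def guess_language(text, dictionaries=DICTIONARIES):
    """Score the text against every dictionary and pick the best language.

    Returns (best_language, scores); best_language is None when the text
    contains no tokens.
    """
    # Whitespace tokenization, then strip surrounding punctuation and lowercase.
    tokens = [raw.strip(".,;:!?()\"'").lower() for raw in text.split()]
    tokens = [token for token in tokens if token]

    scores = {
        lang: known_token_ratio(tokens, words)
        for lang, words in dictionaries.items()
    }
    best = max(scores, key=scores.get) if tokens else None
    return best, scores


if __name__ == "__main__":
    print(guess_language("The document is written in English"))
    print(guess_language("Il documento è scritto in italiano"))
```

With such small dictionaries, short texts and words shared between languages (like "in" above) can easily skew the result, so the word lists would need to be much larger in practice.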
Do you know of other (better) language recognition methods?
Thanks.
Tommaso
