Hello,
I am writing an AE pipeline and I need to detect the language in which
the input document is written.
My idea is to use the Whitespace Tokenizer and the HMM Tagger together to
analyze the extracted tokens, calculate the percentage of known tokens for
each language (against a per-language dictionary), and then select the
language with the highest percentage.
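To make the idea concrete, here is a rough sketch of the dictionary-ratio
scoring in Python. It is not UIMA code: the function names and the tiny
word lists are made up for illustration, it only does whitespace splitting
plus punctuation stripping, and it leaves out the HMM Tagger step entirely.

```python
import string


def known_token_ratio(tokens, dictionary):
    """Fraction of tokens found in the given dictionary (0.0 if no tokens)."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in dictionary)
    return known / len(tokens)


def guess_language(text, dictionaries):
    """Return (best_language, scores) for the text.

    dictionaries maps a language code to a set of lowercase known words.
    best_language is None when no token matches any dictionary.
    """
    # Whitespace tokenization; stripping surrounding punctuation is an
    # extra assumption so that "document." still matches "document".
    tokens = [t.strip(string.punctuation) for t in text.lower().split()]
    tokens = [t for t in tokens if t]

    scores = {
        lang: known_token_ratio(tokens, words)
        for lang, words in dictionaries.items()
    }
    if not scores or max(scores.values()) == 0.0:
        return None, scores
    best = max(scores, key=scores.get)
    return best, scores


# Toy dictionaries, hypothetical and far too small for real use.
DICTIONARIES = {
    "en": {"the", "is", "a", "and", "of", "this", "document"},
    "it": {"il", "è", "un", "e", "di", "questo", "documento"},
}

if __name__ == "__main__":
    language, scores = guess_language("Questo è un documento.", DICTIONARIES)
    print(language, scores)
```

With real dictionaries the scores for related languages can be close, so
ties and very short documents would need some extra handling.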
Do you know of any other (better) language recognition methods?
Thanks.
Tommaso
