Hello,
I am writing an AE pipeline and I need to recognize which language the source document is written in. My idea is to use the Whitespace Tokenizer and the HMM Tagger together to analyze the extracted tokens, calculate the percentage of known tokens for each language (checked against a dictionary), and then select the language with the highest percentage.
Do you know of other (better) language recognition methods?
Thanks.
Tommaso
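To make the idea above concrete, here is a minimal sketch of the dictionary-lookup scoring in plain Python. It is not UIMA code and does not use the Whitespace Tokenizer or HMM Tagger annotators. It only imitates whitespace tokenization with `str.split()`. The names `DICTIONARIES`, `known_token_ratio` and `guess_language` are hypothetical, and the tiny word lists are placeholders for real per-language dictionaries.

```python
# Hypothetical toy dictionaries; a real pipeline would load full word lists.
DICTIONARIES = {
    "en": {"the", "is", "and", "of", "a", "in", "document", "this"},
    "it": {"il", "è", "e", "di", "un", "in", "documento", "questo"},
}


def known_token_ratio(tokens, dictionary):
    """Return the fraction of tokens that appear in the given dictionary."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in dictionary)
    return known / len(tokens)


def guess_language(text, dictionaries=DICTIONARIES):
    """Score each language by its known-token ratio and pick the highest.

    Returns (language, scores); language is None when no token is known
    in any dictionary.
    """
    # Whitespace tokenization only: punctuation stays attached to tokens,
    # so a real pipeline would need to strip or split it first.
    tokens = text.lower().split()
    scores = {
        lang: known_token_ratio(tokens, words)
        for lang, words in dictionaries.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0.0:
        return None, scores
    return best, scores
```

Calling `guess_language("questo è un documento")` scores every token against each word list and returns the language with the highest ratio together with the per-language scores. Short texts and words shared between languages (such as "in" above) make this kind of score unreliable, which is one reason to ask about alternatives.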
- Language recognition Tommaso Teofili
- RE: Language recognition Torsten Zesch
- Re: Language recognition Niels Ott
- Re: Language recognition Hannes Carl Meyer
- RE: Language recognition D.J. McCloskey
- Re: Language recognition Tommaso Teofili
- Re: Language recognition Tommaso Teofili
- Re: Language recognition Hannes Carl Meyer
