Re: [Nutch-general] Re: LanguageIdentifierPlugin and CJK

Jérôme Charron Wed, 11 Jan 2006 14:35:02 -0800

> So all that's needed is a Japanese, Chinese, and Korean text corpus to
> "train" the identifier?  Can the LanguageIdentifier properly handle
> multi-byte character sets?


In theory, yes, but I have not yet tested it.
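
To illustrate why it should work in theory: once the bytes are decoded,
a Java String holds characters rather than raw bytes, so character n-grams
can be counted over CJK text the same way as over Latin text. The sketch
below is only an illustration under that assumption. The class NGramSketch
and the method profile are hypothetical names, not the plugin's actual
code, and characters outside the Basic Multilingual Plane would need
code-point handling that is left out here.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not the LanguageIdentifier implementation.
public class NGramSketch {

    // Counts every character n-gram of length n in the decoded text.
    // Works on UTF-16 code units, so it assumes BMP characters only.
    static Map<String, Integer> profile(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= text.length(); i++) {
            counts.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A short Japanese sample, written with Unicode escapes so the
        // source file compiles regardless of its own encoding.
        String sample = "\u65E5\u672C\u8A9E\u306E\u65E5\u672C\u8A9E";

        // Round-trip through UTF-8 to mimic reading multi-byte input.
        byte[] raw = sample.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(raw, StandardCharsets.UTF_8);

        // 21 bytes on the wire, but 7 characters after decoding.
        System.out.println(raw.length + " bytes, " + decoded.length() + " chars");

        // The first two characters occur twice as a bigram.
        Map<String, Integer> bigrams = profile(decoded, 2);
        System.out.println(bigrams.get("\u65E5\u672C")); // prints 2
    }
}
```

The important step is decoding with the correct charset before building
the profile. If the input were decoded with the wrong charset, the n-grams
would be counted over garbage characters.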

Jérôme


--
http://motrech.free.fr/
http://www.frutch.org/
