> So all that's needed is a Japanese, Chinese, and Korean text corpus to
> "train" the identifier? Can the LanguageIdentifier properly handle
> multi-byte character sets?
In theory, yes, but I have not yet tested it.

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
