Hello,

There was a similar issue with Lucene's StandardTokenizer.jj:
http://issues.apache.org/jira/browse/LUCENE-444 and
http://issues.apache.org/jira/browse/LUCENE-461

I have almost no experience with Nutch, but you can handle it the same way as the issues above.

On 3/4/06, Teruhiko Kurosaka <[EMAIL PROTECTED]> wrote:
> I was browsing NutchAnalysis.jj and found that
> Hangul Syllables (U+AC00 ... U+D7AF; U+xxxx means
> a Unicode character of the hex value xxxx) are not
> part of LETTER or CJK class. It seems to me that
> Nutch cannot handle Korean documents at all.
>
> Is anybody successfully using Nutch for Korean?
>
> -kuro
>

--
Cheolgoo
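
P.S. To make the range in the quoted message concrete, here is a rough Java sketch of the check a tokenizer would need. It is only an illustration, not code from Nutch or Lucene: the class name HangulCheck and both method names are hypothetical, and the bounds U+AC00 and U+D7AF are taken from the message above.

```java
public class HangulCheck {
    // Bounds of the Hangul Syllables block as quoted in the message (assumed here).
    static final int HANGUL_SYLLABLES_START = 0xAC00;
    static final int HANGUL_SYLLABLES_END = 0xD7AF;

    // Plain range test, equivalent to a character-range entry in a grammar.
    static boolean isHangulSyllable(int codePoint) {
        return codePoint >= HANGUL_SYLLABLES_START && codePoint <= HANGUL_SYLLABLES_END;
    }

    // Cross-check against the JDK's own Unicode block table.
    static boolean inHangulSyllablesBlock(int codePoint) {
        return Character.UnicodeBlock.of(codePoint) == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    public static void main(String[] args) {
        // Two Hangul syllables (U+D55C, U+AE00), a space, and three ASCII letters.
        String sample = "\uD55C\uAE00 abc";
        for (int i = 0; i < sample.length(); ) {
            int cp = sample.codePointAt(i);
            System.out.printf("U+%04X hangul=%b%n", cp, isHangulSyllable(cp));
            i += Character.charCount(cp);
        }
    }
}
```

A character for which this check returns true would have to be covered by one of the grammar's character classes before Korean text could be tokenized.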
