RE: Nutch doesn't support Korean?
Thank you. I filed a new bug, NUTCH-224:
http://issues.apache.org/jira/browse/NUTCH-224

> -Original Message-
> From: Cheolgoo Kang [mailto:[EMAIL PROTECTED]
> Sent: 2006-3-03 20:49
> To: nutch-user@lucene.apache.org
> Subject: Re: Nutch doesn't support Korean?
>
> Hello,
>
> There was a similar issue with Lucene's StandardTokenizer.jj:
>
> http://issues.apache.org/jira/browse/LUCENE-444
> and
> http://issues.apache.org/jira/browse/LUCENE-461
>
> I have almost no experience with Nutch, but you could handle it
> the same way as the issues above.
Re: Nutch doesn't support Korean?
Hello,

There was a similar issue with Lucene's StandardTokenizer.jj:

http://issues.apache.org/jira/browse/LUCENE-444
and
http://issues.apache.org/jira/browse/LUCENE-461

I have almost no experience with Nutch, but you could handle it the same way as the issues above.

On 3/4/06, Teruhiko Kurosaka <[EMAIL PROTECTED]> wrote:
> I was browsing NutchAnalysis.jj and found that
> Hangul Syllables (U+AC00 ... U+D7AF; U+ followed by hex digits
> denotes the Unicode character with that hex value) are not
> part of the LETTER or CJK class. It seems to me that
> Nutch cannot handle Korean documents at all.
>
> Is anybody successfully using Nutch for Korean?
>
> -kuro

--
Cheolgoo
Nutch doesn't support Korean?
I was browsing NutchAnalysis.jj and found that Hangul Syllables (U+AC00 ... U+D7AF; U+ followed by hex digits denotes the Unicode character with that hex value) are not part of the LETTER or CJK class. It seems to me that Nutch cannot handle Korean documents at all.

Is anybody successfully using Nutch for Korean?

-kuro
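
For anyone who wants to check whether their own documents are affected, here is a rough sketch of a range test for the Hangul Syllables block mentioned above. It is plain Java and does not touch NutchAnalysis.jj; the class and method names (HangulCheck, inHangulSyllablesRange, containsHangulSyllable) are made up for illustration and are not part of Nutch or Lucene.

```java
// Illustrative sketch only. HangulCheck and its methods are hypothetical
// names, not part of Nutch or Lucene.
final class HangulCheck {
    // Hangul Syllables block boundaries as cited in the message above.
    static final int HANGUL_SYLLABLES_START = 0xAC00;
    static final int HANGUL_SYLLABLES_END = 0xD7AF;

    // True if the code point lies inside U+AC00 ... U+D7AF (inclusive).
    static boolean inHangulSyllablesRange(int codePoint) {
        return codePoint >= HANGUL_SYLLABLES_START
            && codePoint <= HANGUL_SYLLABLES_END;
    }

    // True if any code point of the text falls in the Hangul Syllables block.
    static boolean containsHangulSyllable(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (inHangulSyllablesRange(cp)) {
                return true;
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return false;
    }

    public static void main(String[] args) {
        // "\uD55C\uAE00" is the word "Hangul" written in Hangul.
        System.out.println(containsHangulSyllable("\uD55C\uAE00"));
        System.out.println(containsHangulSyllable("Nutch"));
    }
}
```

Running something like containsHangulSyllable over a sample of page text would show whether a crawl contains characters in this range, which are the ones the LETTER and CJK classes reportedly miss.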