Re: Nutch doesn't support Korean?

Cheolgoo Kang Fri, 03 Mar 2006 20:48:50 -0800

Hello,

There was a similar issue with Lucene's StandardTokenizer.jj.


http://issues.apache.org/jira/browse/LUCENE-444

and

http://issues.apache.org/jira/browse/LUCENE-461

I have almost no experience with Nutch, but you can probably handle it
the same way as the issues above.
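
To check whether a given character falls in the range mentioned in the
quoted message (U+AC00 ... U+D7AF), something like the following might
help. It is only a rough sketch: HangulCheck and its methods are names I
made up for illustration, and nothing here comes from the Nutch or
Lucene sources.

```java
// Hypothetical helper for illustration only; not part of Nutch or Lucene.
final class HangulCheck {
    // Range boundaries as quoted in this thread: U+AC00 ... U+D7AF.
    static final char HANGUL_FIRST = '\uAC00';
    static final char HANGUL_LAST = '\uD7AF';

    // True if c lies inside the Hangul Syllables range above.
    static boolean isHangulSyllable(char c) {
        return c >= HANGUL_FIRST && c <= HANGUL_LAST;
    }

    // Counts how many chars of the text fall inside that range.
    static int countHangul(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (isHangulSyllable(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Two Hangul syllables (U+D55C, U+AE00) after an ASCII word.
        String sample = "Nutch \uD55C\uAE00";
        System.out.println(countHangul(sample));
    }
}
```

If the grammar's character classes do not cover this range, the
tokenizer would need an extra range added, similar to what the two
Lucene issues did for StandardTokenizer.jj.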


On 3/4/06, Teruhiko Kurosaka <[EMAIL PROTECTED]> wrote:
> I was browsing NutchAnalysis.jj and found that
> Hangul Syllables (U+AC00 ... U+D7AF; U+xxxx means
> a Unicode character of the hex value xxxx) are not
> part of the LETTER or CJK class.  It seems to me that
> Nutch cannot handle Korean documents at all.
>
> Is anybody successfully using Nutch for Korean?
>
> -kuro
>


--
Cheolgoo
