RE: Nutch doesn't support Korean?

2006-03-06 Thread Teruhiko Kurosaka
Thank you.  I filed a new bug, NUTCH-224.
http://issues.apache.org/jira/browse/NUTCH-224

> -----Original Message-----
> From: Cheolgoo Kang [mailto:[EMAIL PROTECTED] 
> Sent: 2006-3-03 20:49
> To: nutch-user@lucene.apache.org
> Subject: Re: Nutch doesn't support Korean?
> 
> Hello,
> 
> There was a similar issue with Lucene's StandardTokenizer.jj.
> 
> http://issues.apache.org/jira/browse/LUCENE-444
> 
> and
> 
> http://issues.apache.org/jira/browse/LUCENE-461
> 
> I have almost no experience with Nutch, but you can probably handle it
> the same way as the issues above.


Re: Nutch doesn't support Korean?

2006-03-03 Thread Cheolgoo Kang
Hello,

There was a similar issue with Lucene's StandardTokenizer.jj.

http://issues.apache.org/jira/browse/LUCENE-444

and

http://issues.apache.org/jira/browse/LUCENE-461

I have almost no experience with Nutch, but you can probably handle it
the same way as the issues above.


On 3/4/06, Teruhiko Kurosaka <[EMAIL PROTECTED]> wrote:
> I was browsing NutchAnalysis.jj and found that
> Hangul Syllables (U+AC00 ... U+D7AF; U+ denotes
> a Unicode character with that hex value) are not
> part of the LETTER or CJK class.  It seems to me that
> Nutch cannot handle Korean documents at all.
>
> Is anybody successfully using Nutch for Korean?
>
> -kuro
>


--
Cheolgoo


Nutch doesn't support Korean?

2006-03-03 Thread Teruhiko Kurosaka
I was browsing NutchAnalysis.jj and found that
Hangul Syllables (U+AC00 ... U+D7AF; U+ denotes
a Unicode character with that hex value) are not
part of the LETTER or CJK class.  It seems to me that
Nutch cannot handle Korean documents at all.
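
For anyone who wants to check whether their own text falls in this range, here is a minimal Java sketch. It only tests code points against the U+AC00 ... U+D7AF range quoted above; it does not read or exercise NutchAnalysis.jj. The class name, method names, and the sample string are made up for illustration.

```java
public class HangulRangeCheck {
    // Bounds of the Hangul Syllables range as quoted in this message.
    static final int HANGUL_FIRST = 0xAC00;
    static final int HANGUL_LAST = 0xD7AF;

    // True if the code point lies inside the quoted range.
    static boolean inHangulSyllables(int codePoint) {
        return codePoint >= HANGUL_FIRST && codePoint <= HANGUL_LAST;
    }

    // Counts how many code points of the text lie inside the range.
    static int countHangul(String text) {
        return (int) text.codePoints()
                .filter(HangulRangeCheck::inHangulSyllables)
                .count();
    }

    public static void main(String[] args) {
        // Hypothetical sample: three Hangul syllables followed by ASCII text.
        String sample = "\uD55C\uAD6D\uC5B4 Nutch";
        int total = sample.codePointCount(0, sample.length());
        System.out.println(countHangul(sample) + " of " + total
                + " code points are in U+AC00..U+D7AF");
    }
}
```

If a document consists mostly of characters in this range and the tokenizer's character classes do not cover it, those characters would presumably not be matched as word tokens, which is the concern raised here.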

Is anybody successfully using Nutch for Korean?

-kuro