Hi Jack:

May I know what kind of segmentation you use for your CJK project? Did
you add in your own bi-gram segmentation for CJK?

I noticed the project Doug did for creativecommons.org using Nutch. I
tested that site's search function and found that even Chinese searches
return quite decent results.

Can Doug share with us whether any special handling is included for
those CJK-related results? Or do you just use the default
NutchDocumentTokenizer for creativecommons.org as well?

Thanks to all for reading.

Guoqiao

-----Original Message-----
From: Jack Tang [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 02, 2005 2:18 PM
To: [email protected]; Transbuerg Tian
Subject: Re: Can I build CJK application based on Nutch?


Sorry, I was wrong. It is still broken in svn. I tried to merge bi-gram
segmentation into NutchAnalysis.jj. It seems hard and will take a lot of
time. Can someone working on CJK give me some advice?
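
To make clear what "bi-gram segmentation" means here, below is a minimal
sketch in plain Java. It is not the NutchAnalysis.jj grammar and not
Lucene's analyzer; the class name BigramSegmenter and its methods are
hypothetical, and it assumes a modern JDK (Java 8 or later). Each run of
CJK characters is emitted as overlapping two-character tokens, and
non-CJK letters and digits are grouped into lower-cased words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative only: a hypothetical helper, not part of Nutch or Lucene. */
final class BigramSegmenter {

    /** True for code points in the Han, Hiragana, Katakana, or Hangul scripts. */
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
            || script == Character.UnicodeScript.HIRAGANA
            || script == Character.UnicodeScript.KATAKANA
            || script == Character.UnicodeScript.HANGUL;
    }

    /** Splits text into overlapping CJK bi-grams and lower-cased non-CJK words. */
    static List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            if (isCjk(cps[i])) {
                // Collect one run of consecutive CJK characters.
                int start = i;
                while (i < cps.length && isCjk(cps[i])) {
                    i++;
                }
                if (i - start == 1) {
                    // A lone CJK character has no neighbour; keep it as a token.
                    tokens.add(new String(cps, start, 1));
                } else {
                    // Emit every adjacent pair in the run: AB, BC, CD, ...
                    for (int j = start; j + 1 < i; j++) {
                        tokens.add(new String(cps, j, 2));
                    }
                }
            } else if (Character.isLetterOrDigit(cps[i])) {
                // Collect one run of non-CJK letters and digits as a word.
                int start = i;
                while (i < cps.length && !isCjk(cps[i])
                        && Character.isLetterOrDigit(cps[i])) {
                    i++;
                }
                tokens.add(new String(cps, start, i - start).toLowerCase(Locale.ROOT));
            } else {
                // Skip whitespace and punctuation.
                i++;
            }
        }
        return tokens;
    }
}
```

With this sketch, BigramSegmenter.segment("中文搜索 Nutch") returns the
tokens 中文, 文搜, 搜索, nutch. The same segmentation has to be applied to
both documents and queries for the bi-grams to match at search time.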


/Jack

On 6/2/05, Jack Tang <[EMAIL PROTECTED]> wrote:
> Hi Tian&Wu
> 
> I suppose Nutch now supports CJK bi-gram segmentation.
> 
> /Jack
> 
> On 5/25/05, Transbuerg Tian <[EMAIL PROTECTED]> wrote:
> > Hi wufuheng,
> >
> > First:
> > If you are using Lucene or Nutch for indexing Chinese content, I
> > recommend WebLucene; you can get more info at
> > http://www.chedong.com .
> > Second:
> > CJK sentence splitting is quite different. For Chinese, the
> > best-known tool is ICTCLAS; you can search for it on Google.
> >
> > I have also written a Chinese sentence splitter, in both Java and C#.
> >
> > You can get it at: http://www.domolo.com/tec/index.htm
> > or write to: [EMAIL PROTECTED]
> >
> > Hope this helps.
> >
> > Transbuerg Tian
> > Beijing, China
> > http://www.domolo.com
> >
> >
> >
> >
> > 2005/5/24, wu fuheng <[EMAIL PROTECTED]>:
> > >
> > > Dear all,
> > > I think Nutch is a good wrapper for Lucene, with a good
> > > crawler. If I want to build a Chinese/Japanese/Korean-language
> > > search application, should I start from Lucene or Nutch? How does
> > > Nutch support CJK applications?
> > > Sincerely yours,
> > > Simon
> > >
> >
> >
>
