Sorry, I was wrong. It is still broken in svn. I tried to merge bi-gram
segmentation into NutchAnalysis.jj, but it seems hard and will take a lot
of time. Can someone working on the CJK thread give me some advice?
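
For anyone not familiar with the term, here is a minimal sketch of what
bi-gram segmentation produces. It is a standalone illustration only, not
the NutchAnalysis.jj grammar and not Lucene's CJK tokenizer. The class
name BigramSketch, the method names, and the choice of which Unicode
scripts count as "CJK" are all assumptions made for the example.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Nutch or Lucene.
public class BigramSketch {

    // Assumption: these four scripts are treated as "CJK" in this sketch.
    public static boolean isCjk(int cp) {
        UnicodeScript s = UnicodeScript.of(cp);
        return s == UnicodeScript.HAN || s == UnicodeScript.HIRAGANA
            || s == UnicodeScript.KATAKANA || s == UnicodeScript.HANGUL;
    }

    // Emits overlapping two-character tokens for each run of CJK characters,
    // and whole lowercased words for runs of other letters or digits.
    public static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            if (isCjk(cps[i])) {
                int start = i;
                while (i < cps.length && isCjk(cps[i])) {
                    i++;
                }
                if (i - start == 1) {
                    // A lone CJK character is kept as a single-character token.
                    tokens.add(new String(cps, start, 1));
                } else {
                    // Slide a window of width 2 over the run.
                    for (int j = start; j + 1 < i; j++) {
                        tokens.add(new String(cps, j, 2));
                    }
                }
            } else if (Character.isLetterOrDigit(cps[i])) {
                int start = i;
                while (i < cps.length && !isCjk(cps[i])
                        && Character.isLetterOrDigit(cps[i])) {
                    i++;
                }
                tokens.add(new String(cps, start, i - start).toLowerCase(Locale.ROOT));
            } else {
                i++; // whitespace and punctuation only separate tokens
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [nutch, 中文, 文搜, 搜索]
        System.out.println(bigrams("Nutch 中文搜索"));
    }
}
```

The overlapping windows mean a two-character query term matches wherever
those two characters appear next to each other, without needing a
dictionary. The hard part I am stuck on is expressing this inside the
JavaCC grammar, which the sketch does not attempt.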


/Jack

On 6/2/05, Jack Tang <[EMAIL PROTECTED]> wrote:
> Hi Tian & Wu,
> 
> I believe Nutch now supports CJK bi-gram segmentation.
> 
> /Jack
> 
> On 5/25/05, Transbuerg Tian <[EMAIL PROTECTED]> wrote:
> > Hi wufuheng,
> >
> > First: if you are using Lucene or Nutch to index Chinese content,
> > I recommend WebLucene; you can find more information at
> > http://www.chedong.com .
> >
> > Second: CJK sentence splitting is quite different. For Chinese, the
> > best-known tool is ICTCLAS, which you can find by searching Google.
> >
> > I have also written a Chinese sentence splitter, in both Java and C#.
> > You can get it at http://www.domolo.com/tec/index.htm
> > or write to [EMAIL PROTECTED]
> >
> > I hope this helps.
> >
> > transbuerg tian
> > beijing,china
> > http://www.domolo.com
> >
> >
> >
> >
> > 2005/5/24, wu fuheng <[EMAIL PROTECTED]>:
> > >
> > > Dear all,
> > > I think Nutch is a good wrapper for Lucene, with a good crawler.
> > > Now I want to build a Chinese/Japanese/Korean language search
> > > application. Should I start from Lucene or Nutch? How does Nutch
> > > support CJK applications?
> > > Sincerely yours,
> > > Simon
> > >
> >
> >
>
