Sorry, I was wrong. It is still broken in svn. I tried to merge bi-gram segmentation into NutchAnalysis.jj, but it seems hard and will take a lot of time. Can someone working on the CJK thread give me some advice?
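
For anyone not following the CJK thread, here is a minimal sketch of what bi-gram segmentation means: each run of CJK characters is emitted as overlapping two-character tokens, and other text is split into lowercased words. This is only an illustration under my own assumptions, not the code in NutchAnalysis.jj or in Lucene; the class name CjkBigramSketch, the helpers isCjk and bigrams, and the choice of Unicode scripts treated as CJK are all hypothetical. The sample input is written with Unicode escapes and stands for "Nutch" followed by the four Han characters 中文搜索.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CjkBigramSketch {

    // Assumption: Han, Hiragana, Katakana and Hangul code points count as "CJK".
    static boolean isCjk(int cp) {
        UnicodeScript s = UnicodeScript.of(cp);
        return s == UnicodeScript.HAN || s == UnicodeScript.HIRAGANA
            || s == UnicodeScript.KATAKANA || s == UnicodeScript.HANGUL;
    }

    // Split text into overlapping CJK bi-grams plus lowercased non-CJK words.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            if (isCjk(cps[i])) {
                // Consume one whole run of CJK characters.
                int start = i;
                while (i < cps.length && isCjk(cps[i])) i++;
                if (i - start == 1) {
                    // A lone CJK character becomes a single-character token.
                    tokens.add(new String(cps, start, 1));
                } else {
                    // Emit every adjacent pair inside the run.
                    for (int j = start; j + 1 < i; j++) {
                        tokens.add(new String(cps, j, 2));
                    }
                }
            } else if (Character.isLetterOrDigit(cps[i])) {
                // Consume one non-CJK word made of letters and digits.
                int start = i;
                while (i < cps.length && !isCjk(cps[i])
                        && Character.isLetterOrDigit(cps[i])) i++;
                tokens.add(new String(cps, start, i - start).toLowerCase(Locale.ROOT));
            } else {
                // Skip whitespace and punctuation.
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "Nutch" followed by four Han characters, written as escapes.
        System.out.println(bigrams("Nutch \u4e2d\u6587\u641c\u7d22"));
    }
}
```

With that input the sketch yields four tokens: nutch, 中文, 文搜, 搜索. The overlap means a two-character query term can match wherever it falls in the text without needing a dictionary, at the cost of a larger index.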

/Jack

On 6/2/05, Jack Tang <[EMAIL PROTECTED]> wrote:
> Hi Tian&Wu
>
> I suppose nutch now supports CJK bi-gram segmentation now.
>
> /Jack
>
> On 5/25/05, Transbuerg Tian <[EMAIL PROTECTED]> wrote:
> > hi, wufuheng,
> >
> > first:
> > if you are using lucene or nutch for indexing chinese content,
> > I recommend weblucene for you , you could get more info at :
> > http://www.chedong.com .
> > second:
> > cjk sentence split is quite different , for chinese , the very famous is use
> > ICTCLAS , you could search it at google,
> >
> > and I write a chinese sentence spliter , by java, c sharp ,both.
> >
> > you can get that at: http://www.domolo.com/tec/index.htm
> > or write a letter to : [EMAIL PROTECTED]
> >
> > hope this will help you.
> >
> > transbuerg tian
> > beijing,china
> > http://www.domolo.com
> >
> > 2005/5/24, wu fuheng <[EMAIL PROTECTED]>:
> > >
> > > Dear all,
> > > I think Nutch is a good wrapper for Lucene and with a good crawler.
> > > Now if I want to build some Chinese/Japan/Korean Language search
> > > application. Should I start from Lucene or Nutch? How Nutch does
> > > support CJK application?
> > > Sincerely your,
> > > Simon
