Hi Tian & Wu,

I suppose Nutch supports CJK bi-gram segmentation now.
/Jack

On 5/25/05, Transbuerg Tian <[EMAIL PROTECTED]> wrote:
> Hi wufuheng,
>
> First: if you are using Lucene or Nutch to index Chinese content,
> I recommend WebLucene; you can find more information at
> http://www.chedong.com .
>
> Second: CJK sentence splitting is quite different. For Chinese, the
> best-known tool is ICTCLAS, which you can find by searching Google.
>
> I have also written a Chinese sentence splitter, in both Java and C#.
> You can get it at http://www.domolo.com/tec/index.htm
> or write to [EMAIL PROTECTED]
>
> Hope this helps.
>
> Transbuerg Tian
> Beijing, China
> http://www.domolo.com
>
> 2005/5/24, wu fuheng <[EMAIL PROTECTED]>:
> > Dear all,
> >
> > I think Nutch is a good wrapper for Lucene, with a good crawler.
> > Now I want to build a Chinese/Japanese/Korean language search
> > application. Should I start from Lucene or Nutch? How does Nutch
> > support CJK applications?
> >
> > Sincerely yours,
> > Simon
