Re: getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Wayne Xin

RE: getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Uwe Schindler

Re: getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Wayne Xin
Thanks Michael. That works well. Not sure why SmartChineseAnalyzer is final; otherwise we could simply override createComponents().

New output: 女 单 方面 王 适 娴 second seed 和 头号 种子 卫冕 冠军 西班牙 选手 马 林 first seed 同 处 1 4 区 3 号 种子 李 雪 芮 和 韩国 选手 korean player 成 池 铉 处在 2 4 区 不过 成 池 铉 先 要 过 日本 小将 japanese player

Re: getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Michael Mastroianni
The easiest thing to do is to create your own analyzer, copy and paste the code from org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer into it, and get rid of the line in createComponents(String fieldName, Reader reader) that says

    result = new PorterStemFilter(result);
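To illustrate why removing that one line helps, here is a small plain-Java model of an analyzer's filter chain. It does not use Lucene at all: FilterChainSketch, toyStem and analyze are hypothetical names made up for this sketch, and toyStem is only a crude stand-in that imitates the kind of truncation a stemmer applies to English tokens. It is not the Porter algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical toy model of a token filter chain; NOT Lucene's API.
final class FilterChainSketch {

    // Crude stand-in for a stemming stage (assumption, not Porter):
    // strips a trailing "e" from lowercase ASCII words longer than 4 chars.
    static String toyStem(String token) {
        if (token.length() > 4 && token.matches("[a-z]+") && token.endsWith("e")) {
            return token.substring(0, token.length() - 1);
        }
        return token; // Chinese tokens and short words pass through unchanged
    }

    // Runs every token through the chain. Leaving out the stemming stage
    // corresponds to deleting the PorterStemFilter line in the copied analyzer.
    static List<String> analyze(List<String> tokens, boolean withStemming) {
        List<UnaryOperator<String>> chain = new ArrayList<>();
        if (withStemming) {
            chain.add(FilterChainSketch::toyStem);
        }
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String current = token;
            for (UnaryOperator<String> filter : chain) {
                current = filter.apply(current);
            }
            out.add(current);
        }
        return out;
    }
}
```

Calling analyze with withStemming set to false returns the English tokens intact, while true shortens a token such as "japanese"; the Chinese tokens are untouched either way. In the real analyzer the stages are Lucene TokenFilter objects wrapped around the tokenizer, so treat this only as a picture of the idea.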