Re: How to use case-insensitive search

2015-08-14 Thread Jack Krupansky
I was assuming this was a Lucene question... The StandardAnalyzer already includes the lower case filter, so the default should be case-insensitive query. See: https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html If the question was

How to use case-insensitive search

2015-08-14 Thread vardhaman narasagoudar
Dear Team, I am trying to build a search engine for fetching person info based on name or email ID. For this I use a StandardAnalyzer with a wildcard query. If I enter a case-sensitive query I get the result, but how do I make it case-insensitive? I mean, searching for rohan or Rohan should give the same,

Re: How to use case-insensitive search

2015-08-14 Thread Erick Erickson
Add LowerCaseFilterFactory to your analysis chain for the fieldType, at both query and index time. You'll need to re-index. The admin UI/analysis page will help you understand the effects of each analysis step defined in your fieldTypes. Best, Erick On Fri, Aug 14, 2015 at 3:44 AM, vardhaman
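In schema.xml this looks roughly like the following (the fieldType name here is illustrative; Solr's stock text_general type is already wired up this way):

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a single <analyzer> element (no type attribute) the same chain is used at both index and query time, which is what you want for case folding.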

Re: How to use case-insensitive search

2015-08-14 Thread Uwe Schindler
Hi, Wildcard queries don't use the Analyzer, so they are case sensitive. Most of Lucene's query parsers can lowercase terms even when they contain a wildcard, but you have to enable this. In most cases it is recommended to use a plain simple analyzer for fields using wildcards. If you also have
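With the classic query parser this is the lowercaseExpandedTerms flag; a minimal sketch, assuming Lucene 5.x (the field name is illustrative):

```java
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public class WildcardLowercaseDemo {
    public static void main(String[] args) throws Exception {
        // Wildcard terms bypass the analyzer, so "Roh*" would otherwise
        // become a prefix query on the literal capitalized term "Roh".
        QueryParser parser = new QueryParser("name", new SimpleAnalyzer());
        // Lowercase wildcard/prefix/range terms before building the query
        // (this is the default in the classic parser, shown here explicitly).
        parser.setLowercaseExpandedTerms(true);
        Query q = parser.parse("Roh*");
        System.out.println(q); // name:roh*
    }
}
```

SimpleAnalyzer here follows Uwe's "plain simple analyzer" advice: it just splits on non-letters and lowercases, so analyzed terms and lowercased wildcard terms line up.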

getting full English words from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Wayne Xin
Hi, I am new to Lucene analyzers. I would like to get the full English tokens from SmartChineseAnalyzer, but I'm only getting stems. The following code has predefined the sentence in testStr: String testStr = 女单方面,王适娴second seed和头号种子卫冕冠军西班牙选手马林first seed同处1/4区,3号种子李雪芮和韩国选手Korean

Re: getting full English words from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Michael Mastroianni
The easiest thing to do is to create your own analyzer, cut and paste the code from org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer into it, and get rid of the line in createComponents(String fieldName, Reader reader) that says result = new PorterStemFilter(result); On Fri, Aug
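A sketch of that, assuming Lucene 5.x (where createComponents takes only the field name): keep the stock HMMChineseTokenizer and StopFilter from SmartChineseAnalyzer's chain and leave the PorterStemFilter out:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.core.StopFilter;

// SmartChineseAnalyzer's stock chain is
// HMMChineseTokenizer -> PorterStemFilter -> StopFilter;
// this variant simply drops the stemmer so English words stay whole.
public class NoStemSmartChineseAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer tokenizer = new HMMChineseTokenizer();
        TokenStream result = new StopFilter(tokenizer,
                SmartChineseAnalyzer.getDefaultStopSet());
        return new TokenStreamComponents(tokenizer, result);
    }
}
```

With this analyzer, English tokens such as "seeds" are emitted as-is instead of being Porter-stemmed to "seed".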

Re: getting full English words from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Wayne Xin
Thanks Michael. That works well. Not sure why SmartChineseAnalyzer is final, otherwise we could override createComponents(). New output: 女 单 方面 王 适 娴 second seed 和 头号 种子 卫冕 冠军 西班牙 选手 马 林 first seed 同 处 1 4 区 3 号 种子 李 雪 芮 和 韩国 选手 korean player 成 池 铉 处在 2 4 区 不过 成 池 铉 先 要 过 日本 小将 japanese player

Re: getting full English words from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Wayne Xin
Thanks Uwe. This seems to be a handy tool. My problem is that I need a better example (a tutorial, maybe) showing which filters a SmartChineseAnalyzer or JapaneseAnalyzer needs by default. In this case, I guess I need an HMMChineseTokenizer and a stop filter but not a Porter stem filter. I

RE: getting full English words from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Uwe Schindler
Hi, it's much easier to create your own analyzers since Lucene 5.0 (without defining your own classes): https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/custom/CustomAnalyzer.html Using the builder you can create your own analyzer with just a few lines of code. The
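A sketch of the builder approach; "hmmChinese" is assumed here to be the SPI factory name for HMMChineseTokenizer (factory names are matched case-insensitively), and "stop" is the generic StopFilterFactory with its default English stop set, not SmartChineseAnalyzer's Chinese one:

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

public class CustomAnalyzerDemo {
    public static void main(String[] args) throws IOException {
        // Build "SmartChineseAnalyzer minus the stemmer" from factory
        // names registered via SPI, with no custom Analyzer subclass.
        Analyzer analyzer = CustomAnalyzer.builder()
                .withTokenizer("hmmChinese")
                .addTokenFilter("stop")
                .build();
        System.out.println(analyzer.getClass().getSimpleName());
        analyzer.close();
    }
}
```

Factories also accept parameters (e.g. a words file for the stop filter) as extra key/value arguments to addTokenFilter, so the chain stays configurable without code changes.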