Filtered docs and positions enum

2015-08-14 Thread Jamie Johnson
First, sorry for posting both here and to the Solr list; I'm not sure where this is most appropriately asked, but since there was no response there I figured I'd try here. I have what I believe to be a fairly unique use case (as I have not seen it mentioned before) that I'm looking for some thoughts on. I…

How to use case-insensitive search

2015-08-14 Thread vardhaman narasagoudar
Dear Team, I am trying to build a search engine for fetching person info based on name or email ID. For this I have StandardAnalyzer & wildcard queries. If I enter a case-sensitive query I get the result, but how do I go about making it case-insensitive? I mean, searching for rohan or Rohan should give the same results. Curren…

Re: How to use case-insensitive search

2015-08-14 Thread Erick Erickson
Add LowerCaseFilterFactory to your analysis chain for the fieldType, both at query and index time. You'll need to re-index. The admin UI's analysis page will help you understand the effects of each analysis step defined in your fieldTypes. Best, Erick On Fri, Aug 14, 2015 at 3:44 AM, vardhaman nara…
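Erick's suggestion corresponds to a Solr fieldType definition along these lines (a sketch only; the type name `text_ci` is invented for illustration, and the tokenizer choice is an assumption — the thread doesn't show the poster's schema):

```xml
<!-- Illustrative schema.xml fragment: a case-insensitive text type.
     The name "text_ci" is made up for this example. -->
<fieldType name="text_ci" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because lowercasing happens at both index and query time, `rohan` and `Rohan` normalize to the same term; as Erick notes, existing documents must be re-indexed for the change to take effect.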

Re: How to use case-insensitive search

2015-08-14 Thread Jack Krupansky
I was assuming this was a Lucene question... The StandardAnalyzer already includes the lower-case filter, so the default should be case-insensitive queries. See: https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/standard/StandardAnalyzer.html If the question was real…

Re: How to use case-insensitive search

2015-08-14 Thread Uwe Schindler
Hi, Wildcard queries don't use the Analyzer, so they are case-sensitive. Most of Lucene's query parsers allow lowercasing of terms even when there is a wildcard, but you have to enable this. In most cases it is recommended to use a plain, simple analyzer for fields used with wildcards. If you also have ste…
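In the classic query parser of the Lucene 5.x line, the switch Uwe is referring to is `setLowercaseExpandedTerms`; a minimal sketch (requires the lucene-core and lucene-queryparsers jars — the field name and query text are invented for illustration):

```java
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public class WildcardLowercaseDemo {
    public static void main(String[] args) throws Exception {
        // Wildcard/prefix terms bypass the analyzer entirely, so "Roh*"
        // would otherwise keep its capital R and never match terms that
        // were indexed lower-cased as "rohan".
        QueryParser parser = new QueryParser("name", new KeywordAnalyzer());
        parser.setLowercaseExpandedTerms(true); // lowercase wildcard/prefix/range terms
        Query q = parser.parse("Roh*");
        System.out.println(q); // the prefix term is now lower-cased
    }
}
```

This only lowercases the term text with the parser's locale rules; it does not run the full analysis chain, which is why Uwe recommends keeping the analysis for wildcard fields simple.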

getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Wayne Xin
Hi, I am new to Lucene analyzers. I would like to get the full English tokens from SmartChineseAnalyzer, but I'm only getting stems. The following code has the sentence predefined in "testStr": String testStr = "女单方面,王适娴second seed和头号种子卫冕冠军西班牙选手马 林first seed同处1/4区,3号种子李雪芮和韩国选手Korean player成池铉处在…

Re: getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Michael Mastroianni
The easiest thing to do is to create your own analyzer: cut and paste the code from org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer into it, and remove the line in createComponents(String fieldName, Reader reader) that says result = new PorterStemFilter(result); On Fri, Aug 14,…
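Following Michael's suggestion, a stripped-down analyzer might look roughly like this (a sketch against the Lucene 5.x smartcn module, not verbatim from SmartChineseAnalyzer's source; the class name is invented, and SmartChineseAnalyzer's bundled stopword handling is omitted for brevity):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizer;

public class NoStemSmartChineseAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Same HMM-based tokenizer SmartChineseAnalyzer uses...
        Tokenizer tokenizer = new HMMChineseTokenizer();
        // ...but with no PorterStemFilter appended, so English tokens
        // keep their full surface form ("player", not "plai").
        TokenStream stream = tokenizer; // add a StopFilter here if stopwords are needed
        return new TokenStreamComponents(tokenizer, stream);
    }
}
```

Note that in Lucene 5.x `createComponents` takes only the field name; the `Reader` parameter Michael quotes belongs to the older 4.x signature.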

Re: getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Wayne Xin
Thanks Michael. That works well. Not sure why SmartChineseAnalyzer is final; otherwise we could override createComponents(). New output: 女 单 方面 王 适 娴 second seed 和 头号 种子 卫冕 冠军 西班牙 选手 马 林 first seed 同 处 1 4 区 3 号 种子 李 雪 芮 和 韩国 选手 korean player 成 池 铉 处在 2 4 区 不过 成 池 铉 先 要 过 日本 小将 japanese player…

RE: getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Uwe Schindler
Hi, it's much easier to create your own analyzers since Lucene 5.0 (without defining your own classes): https://lucene.apache.org/core/5_2_1/analyzers-common/org/apache/lucene/analysis/custom/CustomAnalyzer.html Using the builder you can create your own analyzer with just a few lines of code. The nam…
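With the CustomAnalyzer Uwe links to, the chain is assembled by factory name rather than by subclassing; a hedged sketch (factory names are resolved case-insensitively via SPI; this particular standard/lowercase/stop chain is just an example, not the chain from the thread):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

public class CustomAnalyzerDemo {
    public static void main(String[] args) throws Exception {
        // Builder-style analyzer: no analyzer subclass needed.
        Analyzer analyzer = CustomAnalyzer.builder()
            .withTokenizer("standard")    // StandardTokenizerFactory
            .addTokenFilter("lowercase")  // LowerCaseFilterFactory
            .addTokenFilter("stop")       // StopFilterFactory (default stopword set)
            .build();
        System.out.println(analyzer);
    }
}
```

Each name maps to a `TokenizerFactory`/`TokenFilterFactory` registered on the classpath, so any analysis module you add as a dependency contributes its factories automatically.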

Re: getting full english word from tokenizing with SmartChineseAnalyzer

2015-08-14 Thread Wayne Xin
Thanks Uwe. This seems to be a handy tool. My problem is I need a better example (tutorial maybe) to show me what are necessary/default filters a SmartChineseAnalyzer or JapaneseAnalyzer needs. In this case, I guess I need a HMMChineseTokenzier and a stop filter but not a porter stem filter. I coul