Re: How to avoid underscore sign indexing problem?

Floyd Wu Wed, 21 Aug 2013 23:00:52 -0700

After trying some search cases and different parameter combinations of
WordDelimiterFilter, I wonder what the best strategy is to index the string
"2DA012_ISO MARK 2" so that it can be found by the term "2DA012"?


What if I just want _ to be removed at both query and index time? What should
I configure, and how?

Floyd
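
A rough sketch of the behaviour asked about above, assuming the underscore is
mapped to a space before tokenization and that the same analysis runs at both
index and query time. In Solr this could be done with a char filter such as
solr.PatternReplaceCharFilterFactory placed ahead of the tokenizer. The Python
below is only a simulation of that idea, not Solr code, and the analyze()
helper is a hypothetical name used for illustration.

```python
import re


def analyze(text):
    """Hypothetical stand-in for an analyzer chain (not Solr itself):
    map '_' to a space, split into alphanumeric runs, then lowercase."""
    cleaned = text.replace("_", " ")
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", cleaned)]


# Index-time analysis of the stored string.
indexed_tokens = analyze("2DA012_ISO MARK 2")

# Query-time analysis of the search term, using the same chain.
query_tokens = analyze("2DA012")

# The query matches if every query token is among the indexed tokens.
matches = all(tok in indexed_tokens for tok in query_tokens)
print(indexed_tokens, query_tokens, matches)
```

Because the same function is applied on both sides, a query that itself
contains an underscore (for example "2DA012_ISO") would be split the same way
as the indexed text.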



2013/8/22 Floyd Wu <floyd...@gmail.com>

> Thank you all.
> By the way, Jack, I'm going to buy your book. Where can I buy it?
> Floyd
>
>
> 2013/8/22 Jack Krupansky <j...@basetechnology.com>
>
>> "I thought that the StandardTokenizer always split on punctuation, "
>>
>> Proving that you haven't read my book! The section on the standard
>> tokenizer details the rules that the tokenizer uses (in addition to
>> extensive examples.) That's what I mean by "deep dive."
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Shawn Heisey
>> Sent: Wednesday, August 21, 2013 10:41 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to avoid underscore sign indexing problem?
>>
>>
>> On 8/21/2013 7:54 PM, Floyd Wu wrote:
>>
>>> When I use StandardAnalyzer to tokenize the string "Pacific_Rim", I get:
>>>
>>> ST
>>> text         raw_bytes                           start  end  type        position
>>> pacific_rim  [70 61 63 69 66 69 63 5f 72 69 6d]  0      11   <ALPHANUM>  1
>>>
>>> How can I get this string tokenized into the two tokens "Pacific" and
>>> "Rim"?
>>> Should I set _ as a stopword?
>>> Please kindly help on this.
>>> Many thanks.
>>>
>>
>> Interesting.  I thought that the StandardTokenizer always split on
>> punctuation, but apparently that's not the case for the underscore
>> character.
>>
>> You can always use the WordDelimiterFilter after the StandardTokenizer.
>>
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
>>
>> Thanks,
>> Shawn
>>
>
>
