Re: No documents found for some queries with special chars like m&m

Utkarsh Sengar Tue, 27 Aug 2013 14:02:09 -0700

Yup, the query "o'reilly" worked after adding WDF to the index analyser.

However, "m&m" and "m\&m" still don't work.
Field analysis for "m&m" says:
ST m, m
WDF m, m

ST m, m
WDF m, m

So essentially & is dropped at both index and query time. My guess is that the
standard tokenizer is the problem. As the documentation says:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StandardTokenizerFactory
Example: "I.B.M. 8.5 can't!!!" ==> ALPHANUM: "I.B.M.", NUM:"8.5",
ALPHANUM:"can't"

I guess the "&" character is discarded in the same way as the "!!!" above.
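
One thing worth ruling out separately from the analysis chain is the URL
itself: an unescaped & ends the q parameter, and a backslash is Lucene query
escaping, not URL escaping, so "m\&m" is cut at the same place. A minimal
sketch using Python's standard library; Python's query-string parser only
stands in here for what the servlet container does, and the parameter values
are illustrative:

```python
from urllib.parse import parse_qs, urlencode

# An unescaped "&" ends the q parameter, so only q=m is left.
raw = parse_qs("q=m&m&wt=json")

# A backslash is not URL escaping: the value is still cut at the "&".
backslash = parse_qs("q=m\\&m&wt=json")

# Percent-encoding ("&" becomes %26) keeps the whole term inside q.
encoded = urlencode({"q": "m&m", "wt": "json"})
round_trip = parse_qs(encoded)

print(raw)
print(backslash)
print(encoded)
print(round_trip)
```

With the term sent as q=m%26m, whatever still fails is down to analysis
rather than the URL.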

*So, my question is:*
Is there a way I can make "m&m" index as one string AND also keep
StandardTokenizerFactory, since I need it for other searches?
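
One possible direction, sketched under assumptions rather than tested against
this schema: rewrite the & before the tokenizer sees it, which in Solr is the
job of a char filter such as solr.PatternReplaceCharFilterFactory or
solr.MappingCharFilterFactory. The Python below only simulates the idea;
map_ampersand and rough_tokenize are hypothetical helpers, the regex tokenizer
is a crude stand-in for StandardTokenizerFactory, and the replacement text
"and" is an arbitrary choice.

```python
import re

# Hypothetical stand-in for a char filter: match an "&" that sits between
# two word characters, with optional whitespace around it.
AMP_BETWEEN_WORDS = re.compile(r"(?<=\w)\s*&\s*(?=\w)")


def map_ampersand(text):
    """Rewrite 'm&m' or 'M & M' so no '&' is left for the tokenizer to drop."""
    return AMP_BETWEEN_WORDS.sub("and", text)


def rough_tokenize(text):
    """Crude approximation, NOT StandardTokenizer: lowercase the text and
    keep runs of word characters, allowing apostrophes inside a word."""
    return re.findall(r"\w+(?:'\w+)*", text.lower())


# Without the rewrite the "&" vanishes and two separate tokens remain.
print(rough_tokenize("m&m"))

# With the rewrite the term survives as a single token.
print(rough_tokenize(map_ampersand("m&m")))
print(rough_tokenize(map_ampersand("M & M's Milk")))
```

The same rewrite would have to run in both the index and query analyzers,
followed by a reindex, and it glues every word&word pair together, not only
m&m.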

Thanks,
-Utkarsh


On Tue, Aug 27, 2013 at 11:44 AM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:

> Thanks for the info.
>
> 1.
> http://SERVER/solr/prodinfo/select?q=o%27reilly&wt=json&indent=true&debugQuery=true
> returns:
>
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":16,
>     "params":{
>       "debugQuery":"true",
>       "indent":"true",
>       "q":"o'reilly",
>       "wt":"json"}},
>   "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
>   },
>   "debug":{
>     "rawquerystring":"o'reilly",
>     "querystring":"o'reilly",
>     "parsedquery":"MultiPhraseQuery(allText:\"o'reilly (reilly oreilly)\")",
>     "parsedquery_toString":"allText:\"o'reilly (reilly oreilly)\"",
>     "QParser":"LuceneQParser",
>     "explain":{}
>    }
> }
>
>
>
> 2. Analysis gives this: http://i.imgur.com/IPEiiEQ.png (I assume this
> means the tokens are the same for "o'reilly")
> 3. I tried escaping the ', but it doesn't help:
> http://SERVER/solr/prodinfo/select?q=o\%27reilly&wt=json&indent=true
>
> I will add WordDelimiterFilterFactory for index and see if it fixes the
> problem.
>
> Thanks,
> -Utkarsh
>
>
>
> On Mon, Aug 26, 2013 at 3:15 PM, Erick Erickson
> <erickerick...@gmail.com> wrote:
>
>> First thing to do is attach &debug=query to your queries and look at the
>> parsed output.
>>
>> Second thing to do is look at the admin/analysis page and see what happens
>> at index and query time to things like o'reilly. You have
>> WordDelimiterFilterFactory configured in your query but not your index
>> analysis chain. My bet is that you're getting different tokens at query
>> and index time...
>>
>> Third thing is that you need to URL-escape the & character (as %26). It's
>> probably being interpreted as a parameter delimiter on the URL, and Solr
>> ignores params it doesn't understand.
>>
>> Best
>> Erick
>>
>>
>> On Mon, Aug 26, 2013 at 5:08 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
>> wrote:
>>
>> > Some of the queries (not all) with special chars return no documents.
>> >
>> > Example: queries returning no documents
>> > q=m&m (this can be explained, when I search for "m m", no documents are
>> > returned)
>> > q=o'reilly (when I search for "o reilly", I get documents back)
>> >
>> >
>> > Queries returning documents:
>> > q=hello&world (document matched is "Hello World: A Life in Ham Radio")
>> >
>> >
>> > My questions are:
>> > 1. What's wrong with "o'reilly"? What changes do I need in my field
>> > type?
>> > 2. How can I make the query "m&m" work?
>> > My index has a bunch of M&M's docs like: "M & M's Milk Chocolate Candy
>> > Coated Peanuts  19.2 oz" and "M and Ms Chocolate Candies - Peanut - 1
>> > Bag (42 oz)"
>> >
>> >
>> > Field type:
>> >     <fieldType name="text_general" class="solr.TextField"
>> >                positionIncrementGap="100">
>> >         <analyzer type="index">
>> >             <tokenizer class="solr.StandardTokenizerFactory"/>
>> >             <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >                     words="stopwords.txt" enablePositionIncrements="true"/>
>> >             <filter class="solr.LowerCaseFilterFactory"/>
>> >             <filter class="solr.EnglishMinimalStemFilterFactory"/>
>> >             <filter class="solr.ASCIIFoldingFilterFactory"/>
>> >             <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> >         </analyzer>
>> >         <analyzer type="query">
>> >             <filter class="solr.WordDelimiterFilterFactory"
>> >                     generateWordParts="1" generateNumberParts="1"
>> >                     catenateWords="1" catenateNumbers="1"
>> >                     catenateAll="0" preserveOriginal="1"/>
>> >             <tokenizer class="solr.StandardTokenizerFactory"/>
>> >             <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >                     words="stopwords.txt" enablePositionIncrements="true"/>
>> >             <filter class="solr.LowerCaseFilterFactory"/>
>> >             <filter class="solr.EnglishMinimalStemFilterFactory"/>
>> >             <filter class="solr.ASCIIFoldingFilterFactory"/>
>> >             <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> >         </analyzer>
>> >     </fieldType>
>> >
>> >
>> > --
>> > Thanks,
>> > -Utkarsh
>> >
>>
>
>
>
> --
> Thanks,
> -Utkarsh
>



-- 
Thanks,
-Utkarsh
