Yup, the query "o'reilly" worked after adding WDF to the index analyser.
However, "m&m" and "m\&m" still don't work. Field analysis for "m&m" says:

ST   m, m
WDF  m, m

ST   m, m
WDF  m, m

So essentially "&" is dropped at index time and at query time. My guess is that the standard tokenizer is the problem. As the documentation says:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StandardTokenizerFactory

Example: "I.B.M. 8.5 can't!!!" ==> ALPHANUM: "I.B.M.", NUM:"8.5", ALPHANUM:"can't"

The char "&" will be ignored, I guess.

*So, my question is:* Is there a way I can make "m&m" index as one string AND also keep StandardTokenizerFactory, since I need it for other searches?

Thanks,
-Utkarsh


On Tue, Aug 27, 2013 at 11:44 AM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:

> Thanks for the info.
>
> 1. http://SERVER/solr/prodinfo/select?q=o%27reilly&wt=json&indent=true&debugQuery=true
> returns:
>
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":16,
>     "params":{
>       "debugQuery":"true",
>       "indent":"true",
>       "q":"o'reilly",
>       "wt":"json"}},
>   "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
>   },
>   "debug":{
>     "rawquerystring":"o'reilly",
>     "querystring":"o'reilly",
>     "parsedquery":"MultiPhraseQuery(allText:\"o'reilly (reilly oreilly)\")",
>     "parsedquery_toString":"allText:\"o'reilly (reilly oreilly)\"",
>     "QParser":"LuceneQParser",
>     "explain":{}
>   }
> }
>
>
>
> 2. Analysis gives this: http://i.imgur.com/IPEiiEQ.png I assume this
> means the tokens are the same for "o'reilly"
> 3. I tried escaping ', it doesn’t help:
> http://SERVER/solr/prodinfo/select?q=o\%27reilly&wt=json&indent=true
>
> I will add WordDelimiterFilterFactory for index and see if it fixes the
> problem.
>
> Thanks,
> -Utkarsh
>
>
>
> On Mon, Aug 26, 2013 at 3:15 PM, Erick Erickson
> <erickerick...@gmail.com> wrote:
>
>> First thing to do is attach &debugQuery=true to your queries and look at the
>> parsed output.
>>
>> Second thing to do is look at the admin/analysis page and see what happens
>> at index and query time to things like o'reilly. You have
>> WordDelimiterFilterFactory configured in your query but not index analysis
>> chain. My bet on that is that you're getting different tokens at query and
>> index time...
>>
>> Third thing is that you need to escape the & character. It's probably being
>> interpreted as a delimiter on the URL and Solr ignores params it doesn't
>> understand.
>>
>> Best
>> Erick
>>
>>
>> On Mon, Aug 26, 2013 at 5:08 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
>>
>> > Some of the queries (not all) with special chars return no documents.
>> >
>> > Example: queries returning no documents
>> > q=m&m (this can be explained, when I search for "m m", no documents are
>> > returned)
>> > q=o'reilly (when I search for "o reilly", I get documents back)
>> >
>> >
>> > Queries returning documents:
>> > q=hello&world (document matched is "Hello World: A Life in Ham Radio")
>> >
>> >
>> > My questions are:
>> > 1. What's wrong with "o'reilly"? What changes do I need in my field type?
>> > 2. How can I make the query "m&m" work?
>> > My index has a bunch of M&M's docs like: "M & M's Milk Chocolate Candy
>> > Coated Peanuts 19.2 oz" and "M and Ms Chocolate Candies - Peanut - 1 Bag
>> > (42 oz)"
>> >
>> >
>> > Field type:
>> > <fieldType name="text_general" class="solr.TextField"
>> >            positionIncrementGap="100">
>> >   <analyzer type="index">
>> >     <tokenizer class="solr.StandardTokenizerFactory"/>
>> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >             words="stopwords.txt" enablePositionIncrements="true" />
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.EnglishMinimalStemFilterFactory"/>
>> >     <filter class="solr.ASCIIFoldingFilterFactory"/>
>> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> >   </analyzer>
>> >   <analyzer type="query">
>> >     <filter class="solr.WordDelimiterFilterFactory"
>> >             generateWordParts="1" generateNumberParts="1"
>> >             catenateWords="1"
>> >             catenateNumbers="1"
>> >             catenateAll="0"
>> >             preserveOriginal="1"/>
>> >     <tokenizer class="solr.StandardTokenizerFactory"/>
>> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >             words="stopwords.txt" enablePositionIncrements="true" />
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.EnglishMinimalStemFilterFactory"/>
>> >     <filter class="solr.ASCIIFoldingFilterFactory"/>
>> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> >   </analyzer>
>> > </fieldType>
>> >
>> >
>> > --
>> > Thanks,
>> > -Utkarsh
>> >
>> >
>
>
> --
> Thanks,
> -Utkarsh
>

--
Thanks,
-Utkarsh
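
P.S. On Erick's third point in the thread (an unescaped & being read as a parameter delimiter on the URL), here is a minimal Python sketch of what the server ends up seeing with and without percent-encoding. It only uses the standard library; the base URL is the placeholder from the thread, and the helper names build_select_url and params_seen_by_server are made up for illustration, not part of Solr or any client library.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL taken from the thread; SERVER is a placeholder, not a real host.
BASE_URL = "http://SERVER/solr/prodinfo/select"


def build_select_url(query, **extra_params):
    """Hypothetical helper: build a select URL with every value percent-encoded."""
    params = {"q": query, "wt": "json", "indent": "true"}
    params.update(extra_params)
    return BASE_URL + "?" + urlencode(params)


def params_seen_by_server(url):
    """Split the query string into parameters, roughly as a server would."""
    return parse_qs(urlsplit(url).query)


# Unescaped: "&" ends the q parameter, so q is just "m" and the
# trailing "m" is a stray parameter with no value.
raw_url = BASE_URL + "?q=m&m&wt=json"
raw_params = params_seen_by_server(raw_url)

# Escaped: "&" travels as %26 and stays inside q.
safe_url = build_select_url("m&m")
safe_params = params_seen_by_server(safe_url)

print(raw_params["q"])   # ['m']
print(safe_url)          # ...select?q=m%26m&wt=json&indent=true
print(safe_params["q"])  # ['m&m']
```

Note that this only gets "m&m" to Solr intact. As the field analysis at the top of this mail shows, the analysis chain still turns it into the two tokens m, m, so encoding alone does not answer the tokenizer question.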