> Use a different tokenizer, possibly one of the regex ones.
> fake it with phrase queries.
> Take a really good look at the various filter combinations. It's possible
> that WhitespaceTokenizer and WordDelimiterFilterFactory might be able to
> do good things.

Will try to play with these two options.
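For reference, a minimal sketch of what that combination might look like. The field type name text_ws_wdf is hypothetical, the filter attributes are copied from the existing query analyzer, and this has not been tested against the index:

```xml
<!-- Hypothetical field type: whitespace tokenization followed by WordDelimiterFilter. -->
<!-- A single <analyzer> with no type attribute is used at both index and query time. -->
<fieldType name="text_ws_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits on whitespace only, so "m&m" reaches the filters as one token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- preserveOriginal="1" keeps the unsplit token next to the generated parts -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1"
            catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because WhitespaceTokenizerFactory only splits on whitespace, "m&m" should arrive at WordDelimiterFilterFactory intact, and preserveOriginal="1" should keep it as a token alongside the split parts. Using the same chain at index and query time avoids the token mismatch discussed earlier in the thread.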
> Clearly define whether this is capability that you really need.

Yes, this is a needed feature. Some of our queries are at&t, h&m, m&m. Returning an empty response is not a good experience.

I also tried:

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1" types="wdfftypes.txt"/>

With wdfftypes.txt:

& => ALPHA
\u0026 => ALPHA
$ => DIGIT
% => DIGIT
. => DIGIT
\u002C => DIGIT

But it didn't work.

Thanks,
-Utkarsh

On Tue, Aug 27, 2013 at 3:07 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> bq: Is there a way I can make "m&m" index as one string AND also keep
> StandardTokenizerFactory since I need it for other searches.
>
> In a word, no. You get one and only one tokenizer per field. But there
> are lots of options:
>
> Use a different tokenizer, possibly one of the regex ones.
>
> fake it with phrase queries.
>
> Take a really good look at the various filter combinations. It's
> possible that WhitespaceTokenizer and WordDelimiterFilterFactory
> might be able to do good things.
>
> Clearly define whether this is capability that you really need.
>
> This last is my recurring plea to ensure that the effort is of real benefit
> to the user and not just something someone noticed that's actually
> only useful 0.001% of the time.
>
> Best
> Erick
>
>
> On Tue, Aug 27, 2013 at 5:00 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
>
> > Yup, the query "o'reilly" worked after adding WDF to the index analyser.
> >
> > Although "m&m" or "m\&m" doesn't work.
> > Field analysis for "m&m" says:
> > ST m, m
> > WDF m, m
> >
> > ST m, m
> > WDF m, m
> >
> > So essentially & is ignored during the index or the query. My guess is, the
> > standard tokenizer is the problem. As the documentation says:
> >
> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StandardTokenizerFactory
> > Example: "I.B.M. 8.5 can't!!!"
> > ==> ALPHANUM: "I.B.M.", NUM:"8.5", ALPHANUM:"can't"
> >
> > The char "&" will be ignored I guess.
> >
> > *So, my question is:*
> > Is there a way I can make "m&m" index as one string AND also keep
> > StandardTokenizerFactory since I need it for other searches.
> >
> > Thanks,
> > -Utkarsh
> >
> >
> > On Tue, Aug 27, 2013 at 11:44 AM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
> >
> > > Thanks for the info.
> > >
> > > 1. http://SERVER/solr/prodinfo/select?q=o%27reilly&wt=json&indent=true&debugQuery=true returns:
> > >
> > > {
> > >   "responseHeader":{
> > >     "status":0,
> > >     "QTime":16,
> > >     "params":{
> > >       "debugQuery":"true",
> > >       "indent":"true",
> > >       "q":"o'reilly",
> > >       "wt":"json"}},
> > >   "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
> > >   },
> > >   "debug":{
> > >     "rawquerystring":"o'reilly",
> > >     "querystring":"o'reilly",
> > >     "parsedquery":"MultiPhraseQuery(allText:\"o'reilly (reilly oreilly)\")",
> > >     "parsedquery_toString":"allText:\"o'reilly (reilly oreilly)\"",
> > >     "QParser":"LuceneQParser",
> > >     "explain":{}
> > >   }
> > > }
> > >
> > > 2. Analysis gives this: http://i.imgur.com/IPEiiEQ.png I assume this
> > > means tokens are same for "o'reilly"
> > > 3. I tried escaping ', it doesn’t help:
> > > http://SERVER/solr/prodinfo/select?q=o\%27reilly&wt=json&indent=true
> > > <http://SERVER/solr/prodinfo/select?q=o%5C%27reilly&wt=json&indent=true>
> > >
> > > I will add WordDelimiterFilterFactory for index and see if it fixes the
> > > problem.
> > >
> > > Thanks,
> > > -Utkarsh
> > >
> > >
> > > On Mon, Aug 26, 2013 at 3:15 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >
> > >> First thing to do is attach &debug=query to your queries and look at the
> > >> parsed output.
> > >>
> > >> Second thing to do is look at the admin/analysis page and see what happens
> > >> at index and query time to things like o'reilly.
> > >> You have WordDelimiterFilterFactory
> > >> configured in your query but not index analysis chain. My bet on that is
> > >> that you're getting different tokens at query and index time...
> > >>
> > >> Third thing is that you need to escape the & character. It's probably being
> > >> interpreted as a delimiter on the URL and Solr ignores params it doesn't
> > >> understand.
> > >>
> > >> Best
> > >> Erick
> > >>
> > >>
> > >> On Mon, Aug 26, 2013 at 5:08 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
> > >>
> > >> > Some of the queries (not all) with special chars return no documents.
> > >> >
> > >> > Example: queries returning no documents
> > >> > q=m&m (this can be explained, when I search for "m m", no documents are
> > >> > returned)
> > >> > q=o'reilly (when I search for "o reilly", I get documents back)
> > >> >
> > >> >
> > >> > Queries returning documents:
> > >> > q=hello&world (document matched is "Hello World: A Life in Ham Radio")
> > >> >
> > >> >
> > >> > My questions are:
> > >> > 1. What's wrong with "o'reilly"? What changes do I need in my field type?
> > >> > 2. How can I make the query "m&m" work?
> > >> > My index has a bunch of M&M's docs like: "M & M's Milk Chocolate Candy
> > >> > Coated Peanuts 19.2 oz" and "M and Ms Chocolate Candies - Peanut - 1 Bag
> > >> > (42 oz)"
> > >> >
> > >> >
> > >> > Field type:
> > >> > <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
> > >> >   <analyzer type="index">
> > >> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > >> >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
> > >> >     <filter class="solr.LowerCaseFilterFactory"/>
> > >> >     <filter class="solr.EnglishMinimalStemFilterFactory"/>
> > >> >     <filter class="solr.ASCIIFoldingFilterFactory"/>
> > >> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> > >> >   </analyzer>
> > >> >   <analyzer type="query">
> > >> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
> > >> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > >> >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
> > >> >     <filter class="solr.LowerCaseFilterFactory"/>
> > >> >     <filter class="solr.EnglishMinimalStemFilterFactory"/>
> > >> >     <filter class="solr.ASCIIFoldingFilterFactory"/>
> > >> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> > >> >   </analyzer>
> > >> > </fieldType>
> > >> >
> > >> >
> > >> > --
> > >> > Thanks,
> > >> > -Utkarsh