What happens if you append &debug=query to your query? IOW, what does the _parsed_ query look like?
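In case it helps, here is a minimal sketch of building such a request. The host, core name (`collection1`) and field name (`text`) are assumptions on my part, not taken from your setup, so substitute your own. It only constructs the URL; fetch it with a browser or curl and look for the `parsedquery` entry in the `debug` section of the response.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, query, field="text"):
    """Return a Solr select URL that asks for query debugging output.

    base_url and field are placeholders (assumptions); use your own core
    URL and the field whose analysis chain you are investigating.
    """
    params = {
        "q": f"{field}:{query}",  # the query under investigation
        "debug": "query",         # ask Solr to report how the query was parsed
        "rows": 0,                # only the debug section matters here
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local core; adjust to your installation.
url = build_debug_url("http://localhost:8983/solr/collection1", "MacBook")
print(url)
```

Comparing the parsed query for "MacBook" against the one for "mac book" should show which terms the query analyzer actually emits.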
Also note that the defaults for WDFF are _not_ identical. catenateWords and
catenateNumbers are 1 in the index portion and 0 in the query section. Still,
this shouldn't be a problem all other things being equal.

Best,
Erick

On Tue, Sep 2, 2014 at 12:43 PM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
> On 9/2/14 1:51 PM, Erick Erickson wrote:
>
>> bq: In my actual index, query "MacBook" is matching ONLY "mac book", and
>> not "macbook"
>>
>> I suspect your query parameters for WordDelimiterFilterFactory doesn't
>> have catenate words set.
>>
>> What do you see when you enter these in both the index and query portions
>> of the admin/analysis page?
>>
>
> Thanks Erick!
>
> Our WordDelimiterFilterFactory does have catenate words set, in both index
> and query phases (is that right?):
>
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
>
> It's hard to cut and paste the results of the analysis page into email (or
> anywhere!), I'll give you screenshots, sorry -- and I'll give them for our
> whole real world app complex field definition. I'll also paste in our
> entire field definition below. But I realize my next step is probably
> creating a simpler isolation/reproduction case (unless you have a magic
> answer from this!).
>
> Again, the problem is that "MacBook" seems to be only matching on indexed
> "macbook" and not indexed "mac book".
>
> "MacBook" query analysis:
> https://www.dropbox.com/s/b8y11usjdlc88un/mixedcasequery.png
>
> "MacBook" index analysis:
> https://www.dropbox.com/s/fwae3nz4tdtjhjv/mixedcaseindex.png
>
> "mac book" index analysis:
> https://www.dropbox.com/s/mihd58f6zs3rfu8/twowordindex.png
>
>
> Our entire actual field definition:
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
>            autoGeneratePhraseQueries="true">
>   <analyzer>
>     <!-- the rulefiles thing is to keep ICUTokenizerFactory from
>          stripping punctuation,
>          so our synonym filter involving C++ etc can still work.
>          From: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201305.mbox/%3C51965E70.6070...@elyograg.org%3E
>          the rbbi file is in our local ./conf, copied from lucene
>          source tree -->
>     <tokenizer class="solr.ICUTokenizerFactory"
>                rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
>
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="punctuation-whitelist.txt"
>             ignoreCase="true"/>
>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>
>     <!-- folding needs to be after WordDelimiter, so WordDelimiter
>          can do its thing with full cases and such -->
>     <filter class="solr.ICUFoldingFilterFactory" />
>
>     <!-- ICUFolding already includes lowercasing, no
>          need for separate lowercasing step
>     <filter class="solr.LowerCaseFilterFactory"/>
>     -->
>
>     <filter class="solr.SnowballPorterFilterFactory"
>             language="English" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>