My understanding was that lower-casing and the other analysis steps happen on a per-field basis, as a step after the dismax formula is applied. In this case, however, they seem to be happening before it:

DisjunctionMaxQuery((((wdText:abc123xyz wdText:abc) wdText:123 wdText:xyz)
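To make the order the debug output suggests concrete, here is a toy Python sketch. It is not Solr code: the names `word_delimiter`, `ANALYZERS` and `parse` are made up for illustration, and the word-delimiter stand-in only splits on letter/digit boundaries, which is far less than the real WordDelimiterFilterFactory does. It assumes the parser first splits the query on whitespace and then runs each chunk through the analyzer of every field listed in qf.

```python
import re


def whitespace_tokenize(text):
    # Stand-in for WhitespaceTokenizerFactory.
    return text.split()


def word_delimiter(tokens, preserve_original=True):
    # Crude stand-in for WordDelimiterFilterFactory (assumption: it only
    # splits on letter/digit boundaries, unlike the real filter).
    out = []
    for tok in tokens:
        parts = re.findall(r"[A-Za-z]+|[0-9]+", tok)
        # Keep the original token only when splitting actually changed it.
        if preserve_original and parts != [tok]:
            out.append(tok)
        out.extend(parts)
    return out


def lowercase(tokens):
    # Stand-in for LowerCaseFilterFactory.
    return [t.lower() for t in tokens]


# Hypothetical per-field analysis chains mirroring the two field types.
ANALYZERS = {
    "wdText": lambda text: lowercase(word_delimiter(whitespace_tokenize(text))),
    "wsText": whitespace_tokenize,
}


def parse(query, qf):
    # Step 1: the parser's own whitespace split.
    # Step 2: each chunk goes through every qf field's analyzer.
    clauses = []
    for chunk in query.split():
        clauses.append({field: ANALYZERS[field](chunk) for field in qf})
    return clauses


if __name__ == "__main__":
    for clause in parse("hello big world abc123XYZ", ["wdText", "wsText"]):
        print(clause)
```

For the last chunk this yields the split, lower-cased tokens for wdText next to the untouched wsText token, which lines up with the parsedquery in the debug output. Whether eDisMax really works in this order is exactly what is being asked.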
Hence the question to someone who actually understands those guts: for eDisMax, what is the correct/expected call sequence between the query parser and the field-type parser? Or maybe just a slightly more in-depth explanation of Michael's statement.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Sat, May 17, 2014 at 8:28 PM, Michael Sokolov <msoko...@safaribooksonline.com> wrote:
> Alex - the query parsers generally accept an analyzer, which they must apply
> after they perform their own tokenization. Consider: how would a
> capitalized query term match lower-cased terms in the index without query
> analysis?
>
> -Mike
>
>
> On 5/17/2014 4:05 AM, Alexandre Rafalovitch wrote:
>>
>> Hello,
>>
>> I am getting weird results that seem to come from eDisMax using
>> analyzer chain to break the input text. I have
>> WordDelimiterFilterFactory in my chain, which does a lot of
>> interesting things I did not expect query parser to be involved in.
>>
>> Specifically, the string "abc123XYZ" gets split into 3 components on
>> digits and gets lowercased as well. I thought all that was happening
>> later, inside individual fields.
>>
>> All documentation talks about query parsers splitting on space, so I
>> don't know where this "full chain" business is coming from. Or maybe I
>> am misunderstanding which phase debug output is from.
>>
>> Here is the field definition:
>> <fieldType name="wdText" class="solr.TextField" >
>>   <analyzer>
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             preserveOriginal="1" />
>>     <filter class="solr.LowerCaseFilterFactory" />
>>   </analyzer>
>> </fieldType>
>> <fieldType name="wsText" class="solr.TextField"
>>            positionIncrementGap="100">
>>   <analyzer>
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> <field name="wdText" type="wdText" indexed="true" stored="true"
>> />
>> <field name="wsText" type="wsText" indexed="true" stored="true"
>> />
>>
>> And here is the debug output:
>>
>> http://localhost:9000/solr/collection1/select?q=hello+big+world+abc123XYZ&wt=json&indent=true&debugQuery=true&defType=edismax&qf=wdText+wsText&stopwords=true&lowercaseOperators=true
>>
>> "rawquerystring":"hello big world abc123XYZ",
>> "querystring":"hello big world abc123XYZ",
>> "parsedquery":"(+(DisjunctionMaxQuery((wdText:hello |
>> wsText:hello)) DisjunctionMaxQuery((wdText:big | wsText:big))
>> DisjunctionMaxQuery((wdText:world | wsText:world))
>> DisjunctionMaxQuery((((wdText:abc123xyz wdText:abc) wdText:123
>> wdText:xyz) | wsText:abc123XYZ))))/no_coord",
>> "parsedquery_toString":"+((wdText:hello | wsText:hello)
>> (wdText:big | wsText:big) (wdText:world | wsText:world)
>> (((wdText:abc123xyz wdText:abc) wdText:123 wdText:xyz) |
>> wsText:abc123XYZ))",
>>
>> Or, and enabling phrase search on the field type, gets even more
>> weird. But one problem at a time.
>>
>> Regards,
>>    Alex.
>>
>> Personal website: http://www.outerthoughts.com/
>> Current project: http://www.solr-start.com/ - Accelerating your Solr
>> proficiency