Re: Solr 4.8: Does eDisMax parser call analyzer chain to tokenize?

Alexandre Rafalovitch Sat, 17 May 2014 10:45:49 -0700

My understanding was that lowercasing and other transformations happen on a
per-field basis, as a step after the dismax formula is applied. In
this case, however, they seem to be happening before:
DisjunctionMaxQuery((((wdText:abc123xyz wdText:abc) wdText:123 wdText:xyz)


Hence the question to someone who actually understands those internals: for
eDisMax, what is the correct/expected call sequence between the query
parser and the field-type analyzer? Or maybe just a slightly more in-depth
explanation of Michael's statement.
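
A minimal Python sketch of the order of operations Michael describes, under simplifying assumptions (plain terms only: no operators, phrases, boosts, pf or mm handling). The parser first splits the query on whitespace, then hands each chunk to every qf field's own analyzer, and combines the per-field results into one disjunction-max clause per chunk. The function `edismax_sketch` and the field name `lcText` are hypothetical illustrations, not Solr APIs.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a field type's query-time analyzer chain.
Analyzer = Callable[[str], List[str]]


def edismax_sketch(q: str, qf: Dict[str, Analyzer]) -> str:
    """Render a toString()-style query for a plain multi-term query string."""
    clauses = []
    # Step 1: the query parser does its own coarse split on whitespace.
    for chunk in q.split():
        alternatives = []
        # Step 2: each chunk is analyzed separately by every qf field's analyzer.
        for field, analyze in qf.items():
            terms = analyze(chunk)
            if not terms:
                continue  # e.g. a stopword this field's chain removed
            if len(terms) == 1:
                alternatives.append(f"{field}:{terms[0]}")
            else:
                # Several tokens from one chunk: an OR group within this field.
                alternatives.append("(" + " ".join(f"{field}:{t}" for t in terms) + ")")
        # Step 3: the per-field alternatives become one disjunction-max clause.
        if alternatives:
            clauses.append("(" + " | ".join(alternatives) + ")")
    return "+(" + " ".join(clauses) + ")"


# Assumed analyzers: one lowercases, the other passes tokens through unchanged.
qf = {
    "lcText": lambda s: [s.lower()],
    "wsText": lambda s: [s],
}
print(edismax_sketch("Hello World", qf))
# +((lcText:hello | wsText:Hello) (lcText:world | wsText:World))
```

In this model the analysis is still per field, but it runs at parse time, once for each field listed in qf, which is why a capitalized query term can end up lowercased for one field and untouched for another.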

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Sat, May 17, 2014 at 8:28 PM, Michael Sokolov
<msoko...@safaribooksonline.com> wrote:
> Alex - the query parsers generally accept an analyzer, which they must apply
> after they perform their own tokenization.  Consider: how would a
> capitalized query term match lower-cased terms in the index without query
> analysis?
>
> -Mike
>
>
> On 5/17/2014 4:05 AM, Alexandre Rafalovitch wrote:
>>
>> Hello,
>>
>> I am getting weird results that seem to come from eDisMax using the
>> analyzer chain to break up the input text. I have
>> WordDelimiterFilterFactory in my chain, which does a lot of
>> interesting things I did not expect the query parser to be involved in.
>>
>> Specifically, the string "abc123XYZ" gets split into 3 components at
>> the letter/digit boundaries and gets lowercased as well. I thought all
>> that was happening later, inside individual fields.
>>
>> All the documentation talks about query parsers splitting on whitespace, so I
>> don't know where this "full chain" business is coming from. Or maybe I
>> am misunderstanding which phase the debug output is from.
>>
>> Here is the field definition:
>>      <fieldType name="wdText" class="solr.TextField">
>>        <analyzer>
>>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>          <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"/>
>>          <filter class="solr.LowerCaseFilterFactory"/>
>>        </analyzer>
>>      </fieldType>
>>      <fieldType name="wsText" class="solr.TextField" positionIncrementGap="100">
>>        <analyzer>
>>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>        </analyzer>
>>      </fieldType>
>>
>>      <field name="wdText" type="wdText" indexed="true" stored="true"/>
>>      <field name="wsText" type="wsText" indexed="true" stored="true"/>
>>
>> And here is the debug output:
>>
>> http://localhost:9000/solr/collection1/select?q=hello+big+world+abc123XYZ&wt=json&indent=true&debugQuery=true&defType=edismax&qf=wdText+wsText&stopwords=true&lowercaseOperators=true
>>
>>      "rawquerystring":"hello big world abc123XYZ",
>>      "querystring":"hello big world abc123XYZ",
>>      "parsedquery":"(+(DisjunctionMaxQuery((wdText:hello | wsText:hello)) DisjunctionMaxQuery((wdText:big | wsText:big)) DisjunctionMaxQuery((wdText:world | wsText:world)) DisjunctionMaxQuery((((wdText:abc123xyz wdText:abc) wdText:123 wdText:xyz) | wsText:abc123XYZ))))/no_coord",
>>      "parsedquery_toString":"+((wdText:hello | wsText:hello) (wdText:big | wsText:big) (wdText:world | wsText:world) (((wdText:abc123xyz wdText:abc) wdText:123 wdText:xyz) | wsText:abc123XYZ))",
>>
>> Oh, and enabling phrase search on the field type makes it even
>> weirder. But one problem at a time.
>>
>> Regards,
>>     Alex.
>>
>
>
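
The nested grouping in the parsed query above can be reproduced with a rough Python approximation of the two field types' analyzer chains. It assumes, as the debug output suggests, that WordDelimiterFilterFactory with preserveOriginal="1" emits the original token at the same position as its first sub-part, and that tokens sharing a position are ORed together within the field's clause. All function names here are hypothetical; this is an illustration, not Solr or Lucene code.

```python
import re
from typing import Dict, List, Tuple

Token = Tuple[str, int]  # (term text, position)


def whitespace_tokenizer(text: str) -> List[Token]:
    # Like WhitespaceTokenizerFactory: one token per whitespace-separated chunk.
    return [(t, i) for i, t in enumerate(text.split())]


def word_delimiter(tokens: List[Token], preserve_original: bool = True) -> List[Token]:
    # Rough approximation of WordDelimiterFilterFactory: only splits at
    # letter/digit transitions and non-alphanumerics (no case-change
    # splitting, no catenate* options, no stemEnglishPossessive).
    out: List[Token] = []
    pos = 0
    for text, _ in tokens:
        parts = re.findall(r"[A-Za-z]+|[0-9]+", text)
        if parts == [text]:
            out.append((text, pos))  # nothing to split
            pos += 1
            continue
        if preserve_original:
            out.append((text, pos))  # original shares the first part's position
        for i, part in enumerate(parts):
            out.append((part, pos + i))
        pos += max(len(parts), 1)
    return out


def lowercase(tokens: List[Token]) -> List[Token]:
    # Like LowerCaseFilterFactory.
    return [(t.lower(), p) for t, p in tokens]


def field_query(field: str, tokens: List[Token]) -> str:
    # Tokens at the same position become a nested OR group; different
    # positions become sibling optional clauses (no phrase query).
    by_pos: Dict[int, List[str]] = {}
    for text, pos in tokens:
        by_pos.setdefault(pos, []).append(f"{field}:{text}")
    groups = [ts[0] if len(ts) == 1 else "(" + " ".join(ts) + ")" for ts in by_pos.values()]
    return groups[0] if len(groups) == 1 else "(" + " ".join(groups) + ")"


chunk = "abc123XYZ"
wd = field_query("wdText", lowercase(word_delimiter(whitespace_tokenizer(chunk))))
ws = field_query("wsText", whitespace_tokenizer(chunk))
print("(" + wd + " | " + ws + ")")
# (((wdText:abc123xyz wdText:abc) wdText:123 wdText:xyz) | wsText:abc123XYZ)
```

Because the wsText chain has no LowerCaseFilterFactory, its clause keeps abc123XYZ exactly as typed, which matches the per-field difference in the debug output.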
