Hello,

I am getting weird results that seem to come from eDisMax using
analyzer chain to break the input text. I have
WordDelimiterFilterFactory in my chain, which does a lot of
interesting things I did not expect query parser to be involved in.

Specifically, the string "abc123XYZ" gets split into 3 components on
digits and gets lowercased as well. I thought all that was happening
later, inside individual fields.

All documentation talks about query parsers splitting on space, so I
don't know where this "full chain" business is coming from. Or maybe I
am misunderstanding which phase debug output is from.

Here is the field definition:
    <fieldType name="wdText" class="solr.TextField" >
        <analyzer>
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.WordDelimiterFilterFactory"
preserveOriginal="1" />
            <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
    </fieldType>
    <fieldType name="wsText" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <field name="wdText"      type="wdText" indexed="true" stored="true" />
    <field name="wsText"      type="wsText" indexed="true" stored="true" />

And here is the debug output:
http://localhost:9000/solr/collection1/select?q=hello+big+world+abc123XYZ&wt=json&indent=true&debugQuery=true&defType=edismax&qf=wdText+wsText&stopwords=true&lowercaseOperators=true

   "rawquerystring":"hello big world abc123XYZ",
    "querystring":"hello big world abc123XYZ",
    "parsedquery":"(+(DisjunctionMaxQuery((wdText:hello |
wsText:hello)) DisjunctionMaxQuery((wdText:big | wsText:big))
DisjunctionMaxQuery((wdText:world | wsText:world))
DisjunctionMaxQuery((((wdText:abc123xyz wdText:abc) wdText:123
wdText:xyz) | wsText:abc123XYZ))))/no_coord",
    "parsedquery_toString":"+((wdText:hello | wsText:hello)
(wdText:big | wsText:big) (wdText:world | wsText:world)
(((wdText:abc123xyz wdText:abc) wdText:123 wdText:xyz) |
wsText:abc123XYZ))",

Or, and enabling phrase search on the field type, gets even more
weird. But one problem at a time.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

Reply via email to