Edismax mysteries?

Dwane Hall Tue, 17 Aug 2021 00:33:54 -0700

Hi all,

A quick question about query analysis, if someone is feeling brave and knows
a bit about the edismax parser's behaviour.


It's probably best explained with an example:

I have three fields using two field types (defined below):
ST_Field1 - Field type of search_text
ST_Field2 - Field type of search_text
LC_Field1 - Field type of lowercase
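
As a rough sketch, the three fields could be declared in the schema along
these lines. The indexed and stored attributes are assumptions for
illustration and are not taken from the original schema.

```xml
<!-- Sketch only: indexed/stored values are assumed, not from the original schema -->
<field name="ST_Field1" type="search_text" indexed="true" stored="true"/>
<field name="ST_Field2" type="search_text" indexed="true" stored="true"/>
<field name="LC_Field1" type="lowercase" indexed="true" stored="true"/>
```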

<!-- English-only text searching -->
<fieldType name="search_text" class="solr.TextField" positionIncrementGap="100"
           uninvertible="false">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterGraphFilterFactory"
              generateWordParts="1" generateNumberParts="1" catenateWords="1"
              catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
              preserveOriginal="1"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterGraphFilterFactory"
              generateWordParts="1" generateNumberParts="1" catenateWords="0"
              catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>


<!-- Code value searching, e.g. a flight number QF123 -->
<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100"
           uninvertible="false">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

Now if I query these fields with a two-term query "34567 something" (not a
phrase query, q.op=AND) and modify only the qf fields, the query parser's
behaviour changes significantly.

Query 1 using qf=ST_Field1 ST_Field2
When I don't use a "lowercase" fieldType in qf - The query generated consists 
of a MUST DisjunctionMaxQuery for each term (2 total) with each qf field a 
SHOULD clause - This is the behaviour I'm expecting
"querystring":"34567 something",
"parsedquery":"+(+DisjunctionMaxQuery((ST_Field1:34567 | ST_Field1:34567)) 
+DisjunctionMaxQuery((ST_Field2:something | ST_Field2:something )))"

Query 2 using qf=ST_Field1 LC_Field1
When I use a "lowercase" fieldType in qf together with a "search_text"
fieldType - The query generated consists of a single MUST DisjunctionMaxQuery
with one clause per qf field: the "search_text" field gets both terms as MUST
clauses, and the "lowercase" field gets the whole query string as a single term
"querystring":"34567 something",
"parsedquery":"+(+DisjunctionMaxQuery(((+ST_Field1:34567 +ST_Field1:something) 
| LC_Field1:34567 something )))"
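
The two requests differ only in qf. As a sketch, the same parameters could be
expressed as request handler defaults in solrconfig.xml. The handler names
/query1 and /query2 are hypothetical, and debugQuery is included only to
surface the parsedquery output quoted above.

```xml
<!-- Sketch only: handler names are hypothetical, not from the original config.
     Both fragments would sit inside the <config> element of solrconfig.xml. -->
<requestHandler name="/query1" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="q.op">AND</str>
    <!-- Query 1: both fields use the search_text type -->
    <str name="qf">ST_Field1 ST_Field2</str>
    <str name="debugQuery">true</str>
  </lst>
</requestHandler>

<requestHandler name="/query2" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="q.op">AND</str>
    <!-- Query 2: mixes a search_text field with a lowercase field -->
    <str name="qf">ST_Field1 LC_Field1</str>
    <str name="debugQuery">true</str>
  </lst>
</requestHandler>
```

Sending q=34567 something to each handler should then show the two parsed
query shapes side by side.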

Does anyone know, at a high level, the rules that dictate these changes in query
behaviour? If so, is there a particular analysis chain to avoid to limit the
chances of it happening (i.e. force Query 1 behaviour, not Query 2 behaviour
above)? The Open Source Connections guys (John Berryman) have a great post on
edismax
(https://opensourceconnections.com/blog/2013/03/07/the-anatomy-of-a-dismax-query/)
and it was either there or on this forum that I read that edismax behaviour
will change if the query gets "too complex", but it'd be useful to understand
some of the specifics of what forces this behaviour change so we can predict
when to expect it!

Cheers,

Dwane

Solr 8.8.2
