Trey Grainger created SOLR-14434:
------------------------------------

             Summary: Multiterm Analyzer Not Persisted in Managed Schema
                 Key: SOLR-14434
                 URL: https://issues.apache.org/jira/browse/SOLR-14434
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: Schema and Analysis
    Affects Versions: 8.5.1, 8.4.1, 8.5, 8.3.1, 8.4, 8.3, 8.1.1, 8.2, 8.1, 8.0
            Reporter: Trey Grainger


In addition to "{{index}}" and "{{query}}" analyzers, Solr supports adding an 
explicit "{{multiterm}}" analyzer to schema {{fieldType}} definitions. This 
allows for specific control over analysis for things like wildcard terms, 
prefix queries, range queries, etc. For example, the following would cause the 
wildcard query "{{hats*}}" to be stemmed to "{{hat*}}" rather than left as 
"{{hats*}}", and thus to match the indexed term "{{hat}}".
{code:xml}
  <fieldType class="solr.TextField" multiValued="true" name="multiterm_test"
             positionIncrementGap="100" termOffsets="true" termVectors="true">
    <analyzer type="index">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true"
              ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
    <analyzer type="multiterm">
      <tokenizer class="solr.ClassicTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
  </fieldType>
{code}
This works fine if you use a non-managed schema (i.e. a {{schema.xml}} file) OR 
if you use a managed schema (i.e. a {{managed-schema}} file) and push your 
schema directly to ZooKeeper. However, starting with Solr 8.0, if you use the 
Schema API to add a {{fieldType}}, the {{multiterm}} analyzers are not 
persisted (only the {{index}} and {{query}} analyzers are).
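
For reference, a minimal sketch of the Schema API request that would exercise this path, written in Python. It builds an {{add-field-type}} command equivalent to the XML definition above. The JSON keys {{indexAnalyzer}}, {{queryAnalyzer}} and {{multiTermAnalyzer}} follow the Schema API documentation as I understand it; the {{schema_url}} argument and the helper names ({{analyzer}}, {{build_add_field_type_command}}, {{post_command}}) are illustrative assumptions, not part of Solr.

```python
import json
import urllib.request

# Filter definitions mirroring the XML fieldType above.
LOWERCASE = {"class": "solr.LowerCaseFilterFactory"}
STEM = {"class": "solr.EnglishMinimalStemFilterFactory"}
SYNONYMS = {
    "class": "solr.SynonymGraphFilterFactory",
    "expand": "true",
    "ignoreCase": "true",
    "synonyms": "synonyms.txt",
}


def analyzer(*filters, tokenizer="solr.ClassicTokenizerFactory"):
    """Build one analyzer definition in Schema API JSON form (hypothetical helper)."""
    return {
        "tokenizer": {"class": tokenizer},
        "filters": [dict(f) for f in filters],
    }


def build_add_field_type_command():
    """Return an add-field-type command with index, query and multiterm analyzers."""
    return {
        "add-field-type": {
            "name": "multiterm_test",
            "class": "solr.TextField",
            "multiValued": True,
            "positionIncrementGap": "100",
            "termOffsets": True,
            "termVectors": True,
            "indexAnalyzer": analyzer(LOWERCASE, STEM),
            "queryAnalyzer": analyzer(SYNONYMS, LOWERCASE, STEM),
            # This is the analyzer that is reported as not being persisted.
            "multiTermAnalyzer": analyzer(LOWERCASE, STEM),
        }
    }


def post_command(schema_url, command):
    """POST a command to the Schema API. schema_url is an assumption about
    your deployment, e.g. http://localhost:8983/solr/<collection>/schema."""
    request = urllib.request.Request(
        schema_url,
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling {{post_command(schema_url, build_add_field_type_command())}} against an affected version should be enough to reproduce the behavior described here; inspecting the resulting {{managed-schema}} would then show whether the {{multiterm}} analyzer survived.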

This bug seems to have originated from LUCENE-8497, which refactored this code 
area substantially. The managed schema can READ the {{multiterm}} analyzers 
from the schema file, but it cannot write them back out. Since pushing the 
schema directly to ZooKeeper only requires Solr to read them in, this bug would 
not have been obvious in initial testing. The Schema API, however, reads in the 
schema file, writes an updated schema out to ZooKeeper (where the bug occurs), 
and then reads the file back in, so all of the {{multiterm}} analyzers get 
stripped out.
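
One way to detect the symptom of this read/write/read cycle is to compare the field type that was sent with the one Solr reports afterwards. The Python sketch below does only the comparison. It assumes that the stored definition (for example, the {{fieldType}} object returned by a GET on {{.../schema/fieldtypes/multiterm_test}}) uses the same analyzer keys as the request; the function name {{dropped_analyzers}} is hypothetical.

```python
# Analyzer keys a Schema API field type definition may carry (assumed to be
# the same in requests and responses).
ANALYZER_KEYS = ("analyzer", "indexAnalyzer", "queryAnalyzer", "multiTermAnalyzer")


def dropped_analyzers(sent, stored):
    """Return the analyzer keys present in `sent` but missing from `stored`.

    `sent` is the field type definition submitted via add-field-type;
    `stored` is the definition read back after the schema was persisted.
    """
    return [key for key in ANALYZER_KEYS if key in sent and key not in stored]
```

On an affected version, the expectation from the description above is that this returns a list containing {{multiTermAnalyzer}}, and an empty list once the write path is fixed.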

I've identified the problematic code and am looking into an appropriate fix.



