Trey Grainger created SOLR-14434:
------------------------------------

             Summary: Multiterm Analyzer Not Persisted in Managed Schema
                 Key: SOLR-14434
                 URL: https://issues.apache.org/jira/browse/SOLR-14434
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: Schema and Analysis
    Affects Versions: 8.5.1, 8.4.1, 8.5, 8.3.1, 8.4, 8.3, 8.1.1, 8.2, 8.1, 8.0
            Reporter: Trey Grainger

In addition to the "{{index}}" and "{{query}}" analyzers, Solr supports adding an explicit "{{multiterm}}" analyzer to schema {{fieldType}} definitions. This allows specific control over analysis for things like wildcard terms, prefix queries, range queries, etc. For example, the following definition causes the wildcard query "{{hats*}}" to be stemmed to "{{hat*}}" instead of staying "{{hats*}}", so that it matches the indexed term "{{hat}}".

{code:xml}
<fieldType class="solr.TextField" multiValued="true" name="multiterm_test" positionIncrementGap="100" termOffsets="true" termVectors="true">
  <analyzer type="index">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="multiterm">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
{code}

This works fine if you use a non-managed schema (i.e., a {{schema.xml}} file), or if you use a managed schema (i.e., a {{managed-schema}} file) and push your schema directly to ZooKeeper. However, starting with Solr 8.0, if you use the Schema API to add a {{fieldType}}, the {{multiterm}} analyzers are not persisted (only the {{index}} and {{query}} analyzers are). This bug seems to have originated from LUCENE-8497, which refactored this code area substantially.

The bug is caused by the managed schema being able to read the {{multiterm}} analyzers in from the schema file, but then being unable to write them out.
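
To reproduce the problem through the Schema API, the field type above has to be expressed as a JSON command. The sketch below only builds that payload and makes no network call. It assumes the {{add-field-type}} command with the {{indexAnalyzer}}, {{queryAnalyzer}} and {{multiTermAnalyzer}} keys as I understand the Schema API to document them, and the helper names ({{analyzer}}, {{build_add_field_type_command}}) are my own illustration, not part of Solr.

```python
import json

# Filter chain shared by the index and multiterm analyzers in the example above.
STEM_CHAIN = [
    {"class": "solr.LowerCaseFilterFactory"},
    {"class": "solr.EnglishMinimalStemFilterFactory"},
]

# The query analyzer additionally expands synonyms before lowercasing and stemming.
SYNONYM_FILTER = {
    "class": "solr.SynonymGraphFilterFactory",
    "expand": "true",
    "ignoreCase": "true",
    "synonyms": "synonyms.txt",
}


def analyzer(filters):
    """Build one analyzer entry (hypothetical helper, not a Solr API)."""
    return {
        "tokenizer": {"class": "solr.ClassicTokenizerFactory"},
        "filters": [dict(f) for f in filters],  # copy so entries stay independent
    }


def build_add_field_type_command():
    """Return the JSON command body mirroring the multiterm_test fieldType."""
    return {
        "add-field-type": {
            "name": "multiterm_test",
            "class": "solr.TextField",
            "multiValued": True,
            "positionIncrementGap": "100",
            "termOffsets": True,
            "termVectors": True,
            "indexAnalyzer": analyzer(STEM_CHAIN),
            "queryAnalyzer": analyzer([SYNONYM_FILTER] + STEM_CHAIN),
            # Assumed key name for <analyzer type="multiterm">.
            "multiTermAnalyzer": analyzer(STEM_CHAIN),
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_add_field_type_command(), indent=2))
```

The printed JSON would be POSTed to the collection's schema endpoint (typically {{/solr/<collection>/schema}}; the collection name is not given in this report). On an affected version, the {{multiterm}} analyzer should then be missing from the persisted {{managed-schema}}.
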
Since pushing the schema directly to ZooKeeper only requires Solr to read the analyzers in, this bug would not have been obvious in initial testing. However, since the Schema API reads in the schema file, writes an updated schema out to ZooKeeper (where the bug occurs), and then reads the file back in, all of the {{multiterm}} analyzers get stripped out.

I've identified the problematic code and am looking into an appropriate fix.
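
One way to confirm the stripping is to compare the schema file before and after a Schema API call. The following is a minimal sketch, assuming both versions of {{managed-schema}} are available as XML strings and that field types use the {{fieldType}} element name as in the example. The function names are hypothetical and this is not the fix itself.

```python
import xml.etree.ElementTree as ET


def analyzer_types(schema_xml):
    """Map each fieldType name to the set of analyzer types it declares."""
    root = ET.fromstring(schema_xml)
    result = {}
    for field_type in root.iter("fieldType"):
        # An <analyzer> without a type attribute serves both index and query;
        # it is recorded here under the placeholder label "default".
        types = {a.get("type") or "default" for a in field_type.findall("analyzer")}
        result[field_type.get("name")] = types
    return result


def lost_multiterm(before_xml, after_xml):
    """Names of field types that declared a multiterm analyzer before but not after."""
    before = analyzer_types(before_xml)
    after = analyzer_types(after_xml)
    return sorted(
        name
        for name, types in before.items()
        if "multiterm" in types and "multiterm" not in after.get(name, set())
    )


if __name__ == "__main__":
    tokenizer = '<tokenizer class="solr.ClassicTokenizerFactory"/>'
    before = (
        '<schema name="demo" version="1.6">'
        '<fieldType name="multiterm_test" class="solr.TextField">'
        f'<analyzer type="index">{tokenizer}</analyzer>'
        f'<analyzer type="query">{tokenizer}</analyzer>'
        f'<analyzer type="multiterm">{tokenizer}</analyzer>'
        "</fieldType></schema>"
    )
    # Simulated result of the write/re-read round trip described above.
    after = before.replace(f'<analyzer type="multiterm">{tokenizer}</analyzer>', "")
    print(lost_multiterm(before, after))
```

Running the check against the schema downloaded before and after an {{add-field-type}} call lists every field type whose {{multiterm}} analyzer did not survive the round trip.
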