Hi all,

We are experimenting with the sample techproducts schema <https://github.com/apache/solr/blob/1fffc52103e77563a30fd307df1eb0b7a79a3377/solr/server/solr/configsets/sample_techproducts_configs/conf/managed-schema#L459> from the Apache Solr master repo.
We realized that having the stemming filter (PorterStemFilterFactory) after the stopword filter (StopFilterFactory) seems to create issues. For example, we added “what” to the stopword list and noticed that for the input “what’s in the box” we end up with “what box” after stemming. What we want at the end of this process is only the word “box”. As far as we can tell, we only get that result when the stopword filter is placed after stemming. Additionally, placing the stopword filter after both lowercasing and stemming seems to make the stop filter work better.

In the end, we settled on the following order in our configuration:

1. LowerCaseFilterFactory
2. PorterStemFilterFactory
3. StopFilterFactory

Since we are new to Apache Solr and are using what seems to be a “default” configuration, we fear that we might be missing some important context here.

Is there a justification for the default ordering, which I assume most people will use as-is, that we might be missing?

Do you see any issues with placing the stopword filter after stemming?

Do you see any issues with placing lowercasing before the stopword filter and stemming?

Regards,
Guven
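
P.S. In case it helps the discussion, here is a minimal sketch of what the ordering above looks like as a field type. The field type name, the StandardTokenizerFactory tokenizer, and the stopwords.txt file name are assumptions for illustration only and are not copied from our actual configuration.

```xml
<!-- Hypothetical field type name; only the filter order reflects the list above -->
<fieldType name="text_en_stop_last" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Assumed tokenizer, not stated in the message -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- 1. Lowercase first so later filters see normalized tokens -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- 2. Stem before removing stopwords -->
    <filter class="solr.PorterStemFilterFactory"/>
    <!-- 3. Remove stopwords last. The file name is an assumption. Because this
         filter now sees stemmed tokens, entries in the list are compared
         against the stemmed, lowercased forms. -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

With this order the stop filter is applied to whatever the stemmer emits, so the stopword list would presumably need to be checked against stemmed forms rather than the original words.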