Hello Güven,

You should consider not using stopwords at all. The filter is useless or problematic in almost all cases. If you want to avoid trouble, drop the filter, because:
* Due to modern compression rates, the memory/disk space the filter frees up is negligible.
* The scoring, tf*idf, already gives low scores to high-frequency terms.
* At some point, a product's name or specification/type/brand will contain one or more stopwords. This is inevitable!

Regards,
Markus

On Mon, 8 Nov 2021 at 16:31, H. Güven Candoğan <guv...@gmail.com> wrote:

> Hi all,
>
> We are experimenting with the sample techproducts schema
> <https://github.com/apache/solr/blob/1fffc52103e77563a30fd307df1eb0b7a79a3377/solr/server/solr/configsets/sample_techproducts_configs/conf/managed-schema#L459>
> from the Apache Solr master repo.
>
> We realized that having the stemming filter (PorterStemFilterFactory) after
> the stopword filter (StopFilterFactory) seems to create issues.
>
> For example, we added “what” to the stopword list and we noticed that for
> the input “what’s in the box”, we end up with “what box” after stemming.
> However, we would want to have only the word “box” at the end of this
> process. This desired result “box” can only be achieved when the stopwords
> filter is placed after the stemming. Additionally, having the stopwords
> filter after lowercasing and stemming seems to create better stopfilter
> performance. In the end, we settled on the following order in our
> configuration:
>
> 1. LowerCaseFilterFactory
> 2. PorterStemFilterFactory
> 3. StopFilterFactory
>
> Since we are new to Apache Solr and we are using what seems to be a
> “default” configuration, we fear that we might be missing some important
> context here. Is there a justification for the default ordering, which I
> assume most people will use as-is, and that we might be missing? Do you see
> any issues placing the stopwords filter after stemming? Do you see any
> issues placing the lowercasing before stopwords filter and stemming?
>
> Regards,
> Guven
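
For reference, the ordering effect described in the quoted message can be reproduced with a toy pipeline. This is only a rough sketch in plain Python, not Solr code: `strip_possessive` is a hypothetical stand-in for the stemming step (it only removes a trailing "'s"), the whitespace split stands in for the tokenizer, and `STOPWORDS` is an assumed list containing "what".

```python
# Toy analysis chain; NOT Solr code. strip_possessive is a hypothetical
# stand-in for the stemming step, and STOPWORDS is an assumed stopword list.
STOPWORDS = {"what", "in", "the"}


def lowercase(tokens):
    return [t.lower() for t in tokens]


def strip_possessive(tokens):
    # Remove a trailing 's (straight or curly apostrophe) from each token.
    out = []
    for t in tokens:
        for suffix in ("'s", "\u2019s"):
            if t.endswith(suffix):
                t = t[: -len(suffix)]
                break
        out.append(t)
    return out


def remove_stopwords(tokens):
    # Exact-match removal, so "what's" is not caught by the entry "what".
    return [t for t in tokens if t not in STOPWORDS]


def analyze(text, filters):
    # Whitespace split as a simplistic tokenizer, then apply filters in order.
    tokens = text.split()
    for f in filters:
        tokens = f(tokens)
    return tokens


text = "What\u2019s in the box"
stop_then_stem = analyze(text, [lowercase, remove_stopwords, strip_possessive])
stem_then_stop = analyze(text, [lowercase, strip_possessive, remove_stopwords])

print(stop_then_stem)  # ['what', 'box']
print(stem_then_stop)  # ['box']
```

With the stop filter first, "what’s" does not match the stopword "what" and survives, then gets reduced to "what" afterwards. With the stop filter last, the reduced token "what" is removed. A real Porter stemmer rewrites many more tokens than this toy step, so stopword entries would have to match the stemmed forms when the filter runs after stemming.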