Elaborating on top of the already good answers: "Out of the box, the scoring will already take care of it." Are we sure? I mean, it will "mostly" take care of it.
When using multi-field search, you can approach scoring in different ways. For example, with edismax and the tie factor you can move from a pure disjunction (max) query at tie=0 to a pure boolean (sum) query at tie=1, with anything in between, to calculate the score.

A query "term1" on the fields qf=text text_stemmed produces:

Query Term = term1
Stemmed Query Term = term

*Pure Disjunction*
text:term1 | text_stemmed:term
The score is that of the max scoring clause. For a document that contains the exact term "term1", the winning clause could be either of the two.

*term1* in the field text has term frequency TF1 and document frequency DF1
*term* in the field text_stemmed has term frequency TF and document frequency DF

TF >= TF1 (= if only term1 was originally present in the field, > if term1, term2, term were present and stemmed to 'term')
IDF <= IDF1 (= if only term1 was originally present in the field across the corpus, < if term1, term2, term were present and stemmed to 'term')

Documents containing different terms may have matches with higher or lower TF, while DF is always going to be >= DF1.

BM25 saturates the impact of Term Frequency on the score; still, the winning clause may come from text_stemmed:term because of term frequency. So I think we can say that the exact match is likely to win because of the Inverse Document Frequency factor, but it's not guaranteed in a pure disjunction.

e.g.
*Doc1*
text: "*term1* bla bla bla bla"
TF(stemmed)=1, TF1(un-stemmed)=1
DF1=100, DF=101

*Doc2*
text: "*term2* *term3* *term4* *term5* bla bla *term6* bla bla"
TF(stemmed)=5, DF=101
TF1(un-stemmed)=0 - no match

*Pure Boolean*
text:term1 | text_stemmed:term
The score is the sum of the scoring clauses. But the observation is similar: depending on the Term Frequency, we are likely to see a better score for documents matching the exact term in the field 'text' (because the exact term in the field 'text' has a higher inverse document frequency and we sum the stemmed counterpart).
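
To make the Doc1/Doc2 example concrete, here is a rough Python sketch, not Solr itself. It uses a simplified BM25 formula and the dismax combination "max clause + tie * (other clauses)". The corpus size (1000), the average field length (7 tokens) and the k1/b values are assumptions picked only for illustration; TF and DF come from the example above, and the field lengths are simply the token counts of the two example texts.

```python
import math

K1, B = 1.2, 0.75      # common BM25 defaults (assumed here)
N_DOCS = 1000          # hypothetical corpus size
AVG_LEN = 7.0          # hypothetical average field length, in tokens


def bm25(tf, df, doc_len):
    """Simplified BM25 score of one term in one field (0.0 when tf == 0)."""
    if tf == 0:
        return 0.0
    idf = math.log(1 + (N_DOCS - df + 0.5) / (df + 0.5))
    norm = K1 * (1 - B + B * doc_len / AVG_LEN)
    return idf * tf / (tf + norm)


def dismax(clause_scores, tie):
    """Best clause plus tie * (sum of the others): tie=0 is max, tie=1 is sum."""
    best = max(clause_scores)
    return best + tie * (sum(clause_scores) - best)


DF1, DF = 100, 101     # document frequencies from the example

# Clause scores per document: [text:term1, text_stemmed:term]
doc1 = [bm25(1, DF1, 5), bm25(1, DF, 5)]   # exact match, 5 tokens
doc2 = [bm25(0, DF1, 9), bm25(5, DF, 9)]   # stemmed matches only, 9 tokens

for tie in (0.0, 1.0):
    print(f"tie={tie}: Doc1={dismax(doc1, tie):.3f} Doc2={dismax(doc2, tie):.3f}")
```

With these assumed numbers, Doc2 (no exact match) outscores Doc1 at tie=0, while Doc1 comes out ahead at tie=1. Other lengths or frequencies can change either ordering, so treat this as an illustration of the mechanism rather than a prediction of what Solr will return.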
But the un-stemmed match does not always win, because the Inverse Document Frequency may not compensate enough. I know many other factors affect the score, but without boosting to a certain extent (what extent is not easy to say), I don't think we can guarantee the un-stemmed match wins.

Cheers
--------------------------
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io

On Fri, 23 Apr 2021 at 12:35, Markus Jelsma <markus.jel...@openindex.io> wrote:

> Hallo,
>
> I would use both at the same time. You do not always want to find all
> stemmed forms of a term, but the unstemmed form instead, or at least have
> the latter being scored higher. Out of the box, the scoring will already
> take care of it.
>
> Although i actually prefer both in one field, using the KeywordRepeat
> filter. But that leads to other headaches that require even more work to
> fix it. Use both fields and keep it simple.
>
> Regards,
> Markus
>
> Op vr 23 apr. 2021 om 11:50 schreef The Maverick <maveric...@posteo.de>:
>
> > Hello
> >
> > I have a schema with two fields
> > One is stemmed and one isn't.
> > When I would use the stemmed field in my search. ( or when I shouldn't do
> > it )
> >
> > Regards
> > S
> >
>