Thanks Walter.  Much appreciated.

To the Solr dev team, it would be of great help if Walter's IDF
summary were made part of the stop-filter documentation:
https://lucene.apache.org/solr/guide/8_5/filter-descriptions.html#stop-filter
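
To make the contrast in Walter's summary (quoted below) concrete, here is a
minimal sketch in Python. It is not Solr code. The toy documents and the
stopword list are made up for illustration, and the helper names (idf,
stopword_weight) are hypothetical. The IDF formula is the BM25-style one,
ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents and n
is the number of documents containing the term.

```python
import math

# Toy corpus, made up for illustration only.
DOCS = [
    "solr and lucene are so cool",
    "cats and dogs are pets",
    "apples and pears are fruit",
    "lucene is a search library",
]

# Hypothetical stopword list: words assumed in advance to be useless.
STOPWORDS = {"and", "are", "so"}


def stopword_weight(term):
    """Binary decision: a listed word counts for nothing, any other word fully."""
    return 0.0 if term in STOPWORDS else 1.0


def idf(term, docs):
    """Proportional measure: BM25-style IDF, ln(1 + (N - n + 0.5) / (n + 0.5))."""
    n_docs = len(docs)
    doc_freq = sum(1 for doc in docs if term in doc.split())
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


if __name__ == "__main__":
    # Compare the two weightings for each term of the example query.
    for term in "solr and lucene are so cool".split():
        print(f"{term:8s} stopword={stopword_weight(term):.1f} idf={idf(term, DOCS):.3f}")
```

In this toy corpus "and" and "are" get a small but non-zero IDF because they
appear in most documents, while "so" gets a high IDF because it happens to be
rare here. A fixed list cannot notice that; the measured frequency can.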

Steve

On Fri, Apr 24, 2020 at 8:49 PM Walter Underwood <wun...@wunderwood.org>
wrote:

> IDF and stopword removal are different approaches to the same thing.
>
> Removing stopwords is a binary decision on how important common words
> are for search. It says some words are completely useless.
>
> IDF is a proportional measure on how important common words are for search.
>
> Instead of removing a list of words that are assumed to be common and less
> useful, let the engine actually measure how common the words are and factor
> that into the relevance.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Apr 24, 2020, at 5:39 PM, Steven White <swhite4...@gmail.com> wrote:
> >
> > Hi everyone,
> >
> > I get why and when not indexing stopwords is a bad idea and can
> > give you 0 or incomplete results.  But what about the quality of search
> > results when stopwords are indexed vs. not indexed?
> >
> > 1) Stopwords are removed and I do word search, not phrase for "solr and
> > lucene are so cool".
> > 2) Stopwords are not removed and I do word search, not phrase for "solr
> > and lucene are so cool".
> >
> > Now if "and", "are" and "or" are stopwords, will the search quality and
> > ranking for #1 be better than #2?  What about if I turn the above into a
> > phrase search?
> >
> > Thanks
> >
> > Steve
> >
> >
> > On Fri, Apr 24, 2020 at 10:53 AM Walter Underwood <wun...@wunderwood.org>
> > wrote:
> >
> >> I’m astonished that the default still has that. It was a bad idea in
> >> Solr 1.3, when it bit my ass.
> >>
> >> We help people with this about once a month and the advice is always the
> >> same.
> >> Imagine all the poor people who never ask about it and run with that
> >> default!
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>> On Apr 24, 2020, at 7:34 AM, Erick Erickson <erickerick...@gmail.com>
> >>> wrote:
> >>>
> >>> +1 to removing stopword filters.
> >>>
> >>>> On Apr 24, 2020, at 10:28 AM, Jan Høydahl <jan....@cominvent.com>
> >>>> wrote:
> >>>>
> >>>> I tend to agree. Should we simply remove the stopword filters from the
> >>>> default configsets shipping with Solr?
> >>>>
> >>>> Jan
> >>>>
> >>>>> On 24 Apr 2020, at 14:44, David Hastings
> >>>>> <hastings.recurs...@gmail.com> wrote:
> >>>>>
> >>>>> You should never use the stopword filter unless you have a very
> >>>>> specific purpose.
> >>>>>
> >>>>> On Fri, Apr 24, 2020 at 8:33 AM Steven White <swhite4...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi everyone,
> >>>>>>
> >>>>>> What is the impact, if any, of stopwords on my search ranking
> >>>>>> quality?  Will my ranking improve if I do not index stopwords?
> >>>>>>
> >>>>>> I'm trying to figure out if I should use the stopword filter or not.
> >>>>>>
> >>>>>> Thanks in advance.
> >>>>>>
> >>>>>> Steve
> >>>>>>
> >>>>
> >>>
> >>
> >>
>
>
