Martin Porter keeps a list of common English stop words that is frequently used to improve search results:
http://snowball.tartarus.org/algorithms/english/stop.txt

On Tue, Dec 27, 2016 at 6:31 AM, Erik Gustafson <erik.d.gustaf...@gmail.com> wrote:

> Hi Alex,
>
> As we see, it indexes only words which have a length of 4 characters or
> more.
>
> The reason is to decrease the total index size (which may in fact not be
> critical) and to avoid noise like "a", "the" and "and". This function
> could be made more intelligent.
>
>
> Ahh, that makes sense. I may or may not play around with some changes on a
> local copy. Might save a newcomer or two some time down the road; not
> exactly mission critical though.
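
For what it's worth, here is a rough sketch of how a stop word check could sit next to the length cutoff discussed in the thread. This is not the project's actual indexing function: `index_terms`, `STOP_WORDS` and `MIN_LENGTH` are names made up for illustration, and the stop word set is a small hand-picked subset, not the full Snowball list.

```python
import re

# Hypothetical, hand-picked subset of English stop words for illustration only.
# A real setup would load a complete list such as the Snowball one.
STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "that", "this", "with", "from", "have", "which"}
)

# The thread describes indexing only words of 4 characters or more.
MIN_LENGTH = 4


def index_terms(text, min_length=MIN_LENGTH, stop_words=STOP_WORDS):
    """Return the words of `text` that would be kept for indexing.

    A word is kept when it is at least `min_length` characters long
    and is not in `stop_words`. Matching is case-insensitive.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if len(w) >= min_length and w not in stop_words]


if __name__ == "__main__":
    sample = "The index skips a word and that noise"
    # Length cutoff alone still lets a 4-letter stop word like "that" through.
    print(index_terms(sample, stop_words=frozenset()))
    # Adding the stop word check drops it as well.
    print(index_terms(sample))
```

The length cutoff already removes "a", "the" and "and", but longer stop words such as "that" or "which" still get indexed, which is where a stop list would help.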