Martin Porter maintains a list of common English stop words that is frequently
used to improve search results:

On Tue, Dec 27, 2016 at 6:31 AM, Erik Gustafson <>

> Hi Alex,
>> As you can see, it only indexes words that are 4 or more characters long.
>> The reason is to decrease the total index size (which may in fact not be
>> critical) and to avoid noise like "a", "the" and "and". This function
>> could be made more intelligent.
> Ahh, that makes sense. I may or may not play around with some changes on a
> local copy. Might save a newcomer or two some time down the road; not
> exactly mission critical though.
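For comparison, here is a minimal sketch of the two filtering approaches discussed in this thread: the current length cutoff versus a stop-word list. The function names and the tiny `STOP_WORDS` set are hypothetical illustrations. The set is NOT Martin Porter's actual list, only a stand-in for it, and the tokenizer is a simple assumption rather than the project's real one.

```python
import re

# Hypothetical excerpt for illustration only; not Martin Porter's actual list.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "for", "to", "in", "is", "it"})


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def index_by_length(text: str, min_len: int = 4) -> set[str]:
    """Behaviour described above: keep only words of min_len characters or more."""
    return {w for w in tokenize(text) if len(w) >= min_len}


def index_by_stop_words(text: str, stop_words: frozenset = STOP_WORDS) -> set[str]:
    """Alternative: drop listed stop words but keep short, meaningful words."""
    return {w for w in tokenize(text) if w not in stop_words}


sample = "The API for the C SDK and the web app"
print(sorted(index_by_length(sample)))      # every word is 3 letters or fewer, so nothing is indexed
print(sorted(index_by_stop_words(sample)))  # short terms like "api" and "sdk" survive
```

With the length cutoff, a query like "api" or "sdk" can never match, while the stop-word approach still removes "the" and "and". The trade-off is a somewhat larger index.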

