I also said, "Stopword removal is a reasonable default because it works fairly well for a general text corpus." Ultraseek keeps stopwords but most engines don't. I think it is fine as a default. I also think you have to understand stopwords at some point.
wunder

On 11/5/07 9:59 PM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:

>
> : This isn't a problem in Lucene or Solr. It is a result of the analyzers
> : you have chosen to use. If you choose to remove stopwords, you will not
> : be able to match stopwords.
>
> I believe paul's point was that this use of stopwords is in the "text"
> fieldtype in the example schema.xml ... which many people use as is.
>
> I'm personally of the mindset that it's fine like it is. While people who
> understand that "an" is a stop word might ask "why does 'rating:PG AND
> name:an' match 40K movies, it should match 0?" there is another (probably
> larger) group of people who won't know how the search is implemented, or
> that "an" is a stop word, and they will look at the same results and ask
> "why am i getting 40K results? most of these don't have 'an' in the title?
> i should only be getting X results."
>
> That second group of people aren't going to be any happier if you
> give them 0 results instead -- at least this way people get some results
> to work with.
>
> -Hoss
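For anyone trying to picture why 'rating:PG AND name:an' returns every PG movie, here is a rough sketch of the effect. It is a toy model in Python, not Lucene or Solr code: the stopword list, the analyze() and search() helpers, and the movie titles are all made up for illustration. The assumption it encodes is that a clause whose terms are all removed by the analyzer ends up placing no constraint on the query.

```python
# Toy model only: not Lucene or Solr code. Every name and title here is hypothetical.
STOPWORDS = {"a", "an", "the", "of"}


def analyze(text, remove_stopwords=True):
    """Lowercase, split on whitespace, and optionally drop stopwords."""
    tokens = text.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def search(movies, rating, name_query, remove_stopwords=True):
    """Return names of movies matching rating AND every analyzed name term.

    If analysis strips every term from name_query, the name clause is
    empty and constrains nothing, so only the rating clause applies.
    """
    query_terms = analyze(name_query, remove_stopwords)
    results = []
    for movie in movies:
        if movie["rating"] != rating:
            continue
        indexed = set(analyze(movie["name"], remove_stopwords))
        if all(term in indexed for term in query_terms):
            results.append(movie["name"])
    return results


# Made-up sample data, not a real catalog.
MOVIES = [
    {"name": "An Evening Out", "rating": "PG"},
    {"name": "Space Picnic", "rating": "PG"},
    {"name": "An Old Friend", "rating": "R"},
]

# Stopwords removed: "an" vanishes from the query, so every PG movie matches.
with_removal = search(MOVIES, "PG", "an", remove_stopwords=True)

# Stopwords kept: "an" is a real term, so only titles containing it match.
without_removal = search(MOVIES, "PG", "an", remove_stopwords=False)

print(with_removal)
print(without_removal)
```

With removal turned on, the name clause contributes nothing and both PG titles come back, which is the "40K results" case described above. With removal turned off, only the PG title that actually contains "an" comes back.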
