Re: default text type and stop words

Walter Underwood Mon, 05 Nov 2007 22:36:57 -0800

I also said, "Stopword removal is a reasonable default because it works
fairly well for a general text corpus." Ultraseek keeps stopwords but
most engines don't. I think it is fine as a default. I also think you
have to understand stopwords at some point.


wunder

On 11/5/07 9:59 PM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:

> 
> : This isn't a problem in Lucene or Solr. It is a result of the analyzers
> : you have chosen to use. If you choose to remove stopwords, you will not
> : be able to match stopwords.
> 
> I believe Paul's point was that this use of stopwords is in the "text"
> fieldtype in the example schema.xml ... which many people use as is.
> 
> I'm personally of the mindset that it's fine like it is.  While people who
> understand that "an" is a stop word might ask "why does 'rating:PG AND
> name:an' match 40K movies? It should match 0," there is another (probably
> larger) group of people who won't know how the search is implemented, or
> that "an" is a stop word, and they will look at the same results and ask
> "why am I getting 40K results? Most of these don't have 'an' in the title.
> I should only be getting X results."
> 
> That second group of people aren't going to be any happier if you
> give them 0 results instead -- at least this way people get some results
> to work with.
> 
> -Hoss
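
The behavior Hoss describes can be sketched with a toy model. This is not
Lucene or Solr code: the stop list, the analyze() and search() helpers, and
the three sample movies are all made up for illustration. The assumption,
taken from the thread, is that a clause whose text analyzes to zero tokens
places no constraint on the results, so 'rating:PG AND name:an' behaves
like 'rating:PG' alone.

```python
# Hypothetical stop list; a real schema would load its own.
STOPWORDS = {"a", "an", "and", "the", "of"}

# Made-up sample documents.
MOVIES = [
    {"name": "An Affair to Remember", "rating": "PG"},
    {"name": "Toy Story", "rating": "PG"},
    {"name": "Alien", "rating": "R"},
]


def analyze(text, remove_stopwords=True):
    """Lowercase, split on whitespace, optionally drop stopwords."""
    tokens = text.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def search(movies, rating, name_query, remove_stopwords=True):
    """Return names of movies matching rating AND every analyzed name term."""
    query_terms = analyze(name_query, remove_stopwords)
    hits = []
    for movie in movies:
        if movie["rating"] != rating:
            continue
        indexed = set(analyze(movie["name"], remove_stopwords))
        # all() over an empty term list is True, so a clause that was
        # analyzed away filters nothing.
        if all(term in indexed for term in query_terms):
            hits.append(movie["name"])
    return hits


# Stopwords removed: "an" disappears from the query, every PG movie matches.
print(search(MOVIES, "PG", "an"))
# Stopwords kept: only titles that actually contain "an" match.
print(search(MOVIES, "PG", "an", remove_stopwords=False))
```

With stopword removal on, the first call returns both PG titles; with it
off, only the title that really contains "an" comes back. Neither call
returns zero results, which matches the trade-off discussed above.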
