Another alternative is to use stopwords selectively, for example in phrases or other places where they have meaning. In the past, stopword removal was mostly done to save disk space and some computation, but disk is cheap, and as for computation, well, stopwords can help you get better results if used right, so the computation cost may be worth it. If they truly were meaningless, why would they be in the language to begin with? :-)

-Grant

On Nov 6, 2007, at 1:36 AM, Walter Underwood wrote:

I also said, "Stopword removal is a reasonable default because it works
fairly well for a general text corpus." Ultraseek keeps stopwords, but
most engines don't. I think it is fine as a default. I also think you
have to understand stopwords at some point.

wunder

On 11/5/07 9:59 PM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:


: This isn't a problem in Lucene or Solr. It is a result of the analyzers
: you have chosen to use. If you choose to remove stopwords, you will not
: be able to match stopwords.

I believe Paul's point was that this use of stopwords is in the "text"
fieldtype in the example schema.xml, which many people use as is.

I'm personally of the mindset that it's fine like it is. While people
who understand that "an" is a stop word might ask "why does
'rating:PG AND name:an' match 40K movies? It should match 0," there is
another (probably larger) group of people who won't know how the search
is implemented, or that "an" is a stop word, and they will look at the
same results and ask "why am I getting 40K results? Most of these don't
have 'an' in the title; I should only be getting X results."

That second group of people isn't going to be any happier if you
give them 0 results instead; at least this way people get some results
to work with.

-Hoss
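To make the behaviour Hoss describes concrete, here is a toy sketch in Java. It does not use the Lucene or Solr API; the class and method names (ToyStopwordAnalyzer, analyze, parseAnd) and the tiny stopword list are hypothetical. It assumes, as the 40K-match result in the thread suggests, that a clause whose text analyzes to no terms is simply dropped from the AND query, so 'rating:PG AND name:an' behaves like 'rating:PG'. It also sketches Grant's selective alternative by keeping stopwords inside quoted phrases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Toy illustration only (not the Lucene/Solr API): a lowercasing,
 * whitespace-splitting analyzer that drops stopwords, plus a tiny
 * AND-query builder that discards clauses left with no terms.
 */
public class ToyStopwordAnalyzer {

    // Hypothetical, deliberately tiny stopword list.
    private static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of"));

    /** Lowercases and splits on whitespace; drops stopwords unless keepStopwords is true. */
    public static List<String> analyze(String text, boolean keepStopwords) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (keepStopwords || !STOPWORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    /**
     * Turns "field:value AND field:value ..." into a list of analyzed clauses.
     * A quoted value is treated as a phrase and keeps its stopwords (the
     * selective approach); an unquoted value has them removed, and if no
     * terms survive, the clause is dropped entirely.
     */
    public static List<String> parseAnd(String query) {
        List<String> clauses = new ArrayList<>();
        for (String part : query.split("\\s+AND\\s+")) {
            int colon = part.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("expected field:value, got " + part);
            }
            String field = part.substring(0, colon);
            String value = part.substring(colon + 1);
            boolean phrase = value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"");
            if (phrase) {
                value = value.substring(1, value.length() - 1);
            }
            List<String> terms = analyze(value, phrase);
            if (terms.isEmpty()) {
                continue; // e.g. name:an analyzes to nothing, so the clause disappears
            }
            String joined = String.join(" ", terms);
            clauses.add(field + ":" + (phrase ? "\"" + joined + "\"" : joined));
        }
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(parseAnd("rating:PG AND name:an"));
        System.out.println(parseAnd("rating:PG AND name:\"an\""));
    }
}
```

Running main prints [rating:pg] for the first query, because the name clause vanishes and only the rating restriction is left, and [rating:pg, name:"an"] for the second, where the quoted phrase keeps its stopword. Returning 0 results instead would require treating an emptied clause as "match nothing", which is the choice Hoss argues against.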



--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Boot Camp Training:
ApacheCon Atlanta, Nov. 12, 2007.  Sign up now!  http://www.apachecon.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

