Another alternative is to keep stopwords selectively, for example in
phrases or other places where they carry meaning. In the past, stopword
removal was mostly done to save disk space and some computation, but
disk is cheap, and as for computation, keeping stopwords can give you
better results if done right, so the extra cost may be worth it. If
they truly were meaningless, why would they be in the language to
begin with? :-)
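
To make the "selective" idea concrete, here is a minimal sketch in plain Python, not Lucene or Solr code. It assumes a toy whitespace tokenizer, a made-up stopword list, and double quotes as the phrase marker; the names `analyze` and `analyze_query` are hypothetical. Stopwords are dropped from loose terms but kept inside quoted phrases. For this to pay off, the index side would also have to keep stopwords, which the sketch does not model.

```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "to", "be", "or", "not", "of"}

def analyze(text, keep_stopwords=False):
    """Lowercase and split on whitespace; optionally drop stopwords."""
    tokens = text.lower().split()
    if keep_stopwords:
        return tokens
    return [t for t in tokens if t not in STOPWORDS]

def analyze_query(query):
    """Keep stopwords inside double-quoted phrases, drop them elsewhere.

    Returns a list where loose terms are strings and phrases are tuples.
    """
    out = []
    # Splitting on '"' puts quoted segments at the odd positions.
    for i, part in enumerate(query.split('"')):
        inside_phrase = (i % 2 == 1)
        tokens = analyze(part, keep_stopwords=inside_phrase)
        if inside_phrase and tokens:
            out.append(tuple(tokens))
        else:
            out.extend(tokens)
    return out

print(analyze_query('the movie "to be or not to be"'))
```

The loose "the" is dropped, while the phrase, which consists entirely of stopwords, survives intact.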
-Grant
On Nov 6, 2007, at 1:36 AM, Walter Underwood wrote:
I also said, "Stopword removal is a reasonable default because it works
fairly well for a general text corpus." Ultraseek keeps stopwords but
most engines don't. I think it is fine as a default. I also think you
have to understand stopwords at some point.
wunder
On 11/5/07 9:59 PM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:
: This isn't a problem in Lucene or Solr. It is a result of the analyzers
: you have chosen to use. If you choose to remove stopwords, you will not
: be able to match stopwords.
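
The quoted point can be illustrated with a toy inverted index in plain Python. This is a sketch under simplifying assumptions, not how Lucene stores terms: the stopword list, the whitespace tokenizer, and the names `analyze` and `build_index` are all made up for illustration. Once the analyzer drops a stopword at index time, that term has no postings, so nothing can match it.

```python
from collections import defaultdict

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the"}

def analyze(text):
    """Toy analyzer: lowercase, split on whitespace, drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]

def build_index(docs):
    """Map each surviving token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index

docs = {1: "An Officer and a Gentleman", 2: "The Matrix"}
index = build_index(docs)

print("an" in index)     # the stopword was never indexed
print(index["officer"])  # ordinary terms are still searchable
```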
I believe Paul's point was that this use of stopwords is in the "text"
fieldtype in the example schema.xml ... which many people use as is.
I'm personally of the mindset that it's fine like it is. While people who
understand that "an" is a stop word might ask "why does 'rating:PG AND
name:an' match 40K movies, it should match 0?" there is another (probably
larger) group of people who won't know how the search is implemented, or
that "an" is a stop word, and they will look at the same results and ask
"why am I getting 40K results? most of these don't have 'an' in the title?
I should only be getting X results."
That second group of people aren't going to be any happier if you
give them 0 results instead -- at least this way people get some results
to work with.
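
The two behaviours being compared can be sketched in plain Python. This is a toy model, not Solr's query parser: the documents, the stopword list, and the `search` function with its `drop_empty` flag are hypothetical. With `drop_empty=True`, a clause that analyzes to no tokens is silently dropped, so 'rating:PG AND name:an' behaves like 'rating:PG'. With `drop_empty=False`, the same query returns nothing.

```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the"}

def analyze(text):
    """Toy analyzer: lowercase, split on whitespace, drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]

def search(docs, clauses, drop_empty=True):
    """AND together field:value clauses over a dict of documents.

    drop_empty=True  -> clauses that analyze to no tokens are ignored.
    drop_empty=False -> any such clause makes the whole query match nothing.
    """
    analyzed = {field: analyze(value) for field, value in clauses.items()}
    if not drop_empty and any(not toks for toks in analyzed.values()):
        return []
    active = {f: toks for f, toks in analyzed.items() if toks}
    hits = []
    for doc_id, doc in docs.items():
        # A document matches if every active clause's tokens occur in its field.
        if all(set(toks) <= set(analyze(doc[f])) for f, toks in active.items()):
            hits.append(doc_id)
    return hits

docs = {
    1: {"rating": "PG", "name": "An American Tail"},
    2: {"rating": "PG", "name": "Shrek"},
    3: {"rating": "R", "name": "Alien"},
}

query = {"rating": "PG", "name": "an"}
print(search(docs, query))                    # every PG movie
print(search(docs, query, drop_empty=False))  # nothing
```

Neither result is the "X results" the second group expects; getting that would require keeping "an" in the index for the name field.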
-Hoss
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Boot Camp Training:
ApacheCon Atlanta, Nov. 12, 2007. Sign up now! http://www.apachecon.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ