There is a list of stop words in the NutchAnalysis class (org.apache.nutch.analysis). I guess that's where the common terms are removed during analysis.
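
To illustrate the kind of filtering I mean, here is a minimal sketch. It assumes the analyzer simply drops any term found in a hard-coded list, whatever field the query targets. The class name, method name, and word list are all made up for illustration; this is not the actual NutchAnalysis code. If the real behavior is along these lines, editing common-terms.utf8 would not change it, since the list lives in the class itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not taken from the Nutch source.
final class StopWordSketch {
    // Assumed stop list, hard-coded in the class rather than read from a file.
    private static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Splits the query on whitespace, lowercases each term, and keeps
    // only the terms that are not in the stop list.
    static List<String> stripStopWords(String query) {
        List<String> kept = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            String lower = term.toLowerCase(Locale.ROOT);
            if (!lower.isEmpty() && !STOP_WORDS.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "The", "of" and "a" are dropped before any per-field query is built.
        System.out.println(stripStopWords("The history of a search engine"));
    }
}
```

With a filter like this sitting in front of the query builder, "the" would never reach the url, content, or anchor clauses.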

--Rajesh Munavalli
Blog: http://mathsearch.blogspot.com

On 3/30/06, Vanderdray, Jacob <[EMAIL PROTECTED]> wrote:
>
> I've added some code to query-basic to log the query after it
> has run both addTerms and addPhrases. This helps me to better
> understand what's going on. I've noticed that when my search contains
> words like "the" or "a", those don't appear in the actual query.
>
> It looks to me like the common-terms.utf8 file is supposed to be
> used to strip common words like "the" out of queries for specific
> fields, but that doesn't seem to be what's happening. The term "the"
> ends up getting stripped out of the query for all fields (url, content,
> anchor, etc.). I even tried removing "the" from the common-terms.utf8
> file, but didn't see any change in behavior.
>
> Does this file only get used when indexing? If so what
> determines which words get stripped out of searches?
>
> Thanks,
> Jake.
