By default, Nutch uses the index-basic plugin (see the plugin.includes property in nutch-default.xml). This plugin (org.apache.nutch.indexer.basic.BasicIndexingFilter) indexes a document using the following fields:

host, site, url, content, anchor, title, tstamp (and cache if allowed)

The fields digest, segment and boost are added by org.apache.nutch.indexer.Indexer for each document by default because Nutch needs them regardless of the indexing filter used.
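
To find out which plugins are active in your own setup, keep in mind that plugin.includes is a regular expression matched against plugin ids. Below is a rough sketch of that check in Python. It is not Nutch code: the PLUGIN_INCLUDES value and the candidate ids are only illustrative, and the helper enabled_plugins is a made-up name, so substitute the value from your own nutch-default.xml (or the override in nutch-site.xml).

```python
import re

# Illustrative value only (an assumption, not the shipped default):
# copy the real plugin.includes value from your own configuration.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(text|html)"
    "|index-basic|query-(basic|site|url)"
)


def enabled_plugins(includes, plugin_ids):
    """Return the plugin ids that the includes regex matches in full."""
    pattern = re.compile(includes)
    return [pid for pid in plugin_ids if pattern.fullmatch(pid)]


# Hypothetical list of plugin ids to test against the expression.
candidates = ["index-basic", "index-more", "parse-html", "parse-pdf", "query-site"]

if __name__ == "__main__":
    for pid in enabled_plugins(PLUGIN_INCLUDES, candidates):
        print(pid)
```

With the sample value above, index-basic is matched and index-more is not, so only the basic indexing filter would contribute fields.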

Mathijs

Daniel Clark wrote:
Which indexFilter plugin does Nutch use out-of-the-box?  Or how do I find
out?  I'm trying to figure out how the following fields are being indexed.

anchor
boost
content
digest
host
segment
site
title
tstamp
url

