By default, Nutch uses the index-basic plugin (see the plugin.includes
property in nutch-default.xml).
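To answer the "how do I find out?" part programmatically, here is a minimal sketch. It reads a property from a file laid out like nutch-default.xml (<property><name/><value/></property>) and checks whether a plugin id is enabled. Assumptions: the sample XML and its plugin.includes value are made up for illustration, the class and method names are hypothetical, and the check assumes a plugin is enabled when its id fully matches the plugin.includes regular expression. Also keep in mind that a plugin.includes set in nutch-site.xml overrides the one in nutch-default.xml, so check both files.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PluginIncludesCheck {

    // Hypothetical excerpt in the nutch-default.xml property layout;
    // the value below is illustrative only, not copied from a real release.
    static final String SAMPLE_XML =
        "<configuration><property>"
        + "<name>plugin.includes</name>"
        + "<value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>"
        + "</property></configuration>";

    // Returns the trimmed <value> of the named property, or null if absent.
    static String readProperty(String xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String n = p.getElementsByTagName("name").item(0).getTextContent().trim();
            if (n.equals(name)) {
                return p.getElementsByTagName("value").item(0).getTextContent().trim();
            }
        }
        return null;
    }

    // Assumption: a plugin counts as enabled when its id matches the
    // whole plugin.includes pattern.
    static boolean isIncluded(String pluginId, String includes) {
        return Pattern.compile(includes).matcher(pluginId).matches();
    }

    public static void main(String[] args) throws Exception {
        String includes = readProperty(SAMPLE_XML, "plugin.includes");
        System.out.println("plugin.includes = " + includes);
        System.out.println("index-basic enabled: " + isIncluded("index-basic", includes));
        System.out.println("index-more enabled: " + isIncluded("index-more", includes));
    }
}
```

To try it against your own install, replace SAMPLE_XML with the contents of your conf/nutch-default.xml or conf/nutch-site.xml.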
This plugin (org.apache.nutch.indexer.basic.BasicIndexingFilter) indexes
a document using the following fields:
host, site, url, content, anchor, title and tstamp (plus cache, if allowed).
The fields digest, segment and boost are added to every document by
org.apache.nutch.indexer.Indexer itself, because Nutch needs them
regardless of which indexing filter is used.
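Putting the two groups together, each field in your list comes from one of two places. The sketch below simply encodes that mapping as stated above so you can check any field name; it does not inspect a real index, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldSources {
    // Fields added by index-basic, as described above (cache only when allowed).
    static final List<String> BASIC_FILTER_FIELDS =
        List.of("host", "site", "url", "content", "anchor", "title", "tstamp", "cache");

    // Fields added by org.apache.nutch.indexer.Indexer regardless of filters.
    static final List<String> INDEXER_FIELDS = List.of("digest", "segment", "boost");

    // Returns where a field originates, per the mapping above.
    static String sourceOf(String field) {
        if (INDEXER_FIELDS.contains(field)) return "org.apache.nutch.indexer.Indexer";
        if (BASIC_FILTER_FIELDS.contains(field)) return "index-basic (BasicIndexingFilter)";
        return "unknown (possibly another indexing filter)";
    }

    public static void main(String[] args) {
        String[] asked = {"anchor", "boost", "content", "digest", "host",
                          "segment", "site", "title", "tstamp", "url"};
        Map<String, String> result = new LinkedHashMap<>();
        for (String f : asked) {
            result.put(f, sourceOf(f));
        }
        result.forEach((f, s) -> System.out.println(f + " -> " + s));
    }
}
```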
Mathijs
Daniel Clark wrote:
Which indexFilter plugin does Nutch use out-of-the-box? Or how do I find
out? I'm trying to figure out how the following fields are being indexed.
anchor
boost
content
digest
host
segment
site
title
tstamp
url