I have worked through the WritingPluginExample
(http://wiki.apache.org/nutch/WritingPluginExample) example.
Now I am wondering whether the following makes sense. I would like
to store the date (yyyymmdd) on which a Page was first added to the Index. I
thought I could write a plugin that adds a date_indexed field. My
hesitation is about what happens after the fetch interval, when the Page is
refetched.

What happens in each of these cases?

- The Page Content has changed. Is the Page updated (i.e. deleted and
  re-added) in the index, and is date_indexed recalculated? (That would be
  OK.)
- The Page hasn't changed. Is the Page updated anyway? (That would break the
  meaning of the date_indexed field, so not OK.)

Or does this depend on how I organize my generate/fetch/update/index cycle,
i.e. whether I merge my indexes or recreate them from scratch?
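To make the intended rule concrete, here is a minimal sketch of the logic I have in mind, written as plain Java rather than against the Nutch plugin API. The class and method names are hypothetical, and the sketch assumes the previously stored date is somehow available when the Page is reindexed. Whether it actually is available is exactly what I am unsure about.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, NOT part of the Nutch API: decides which value the
// date_indexed field should get when a Page is (re)indexed.
public class DateIndexed {
    // yyyyMMdd in Java pattern syntax ("mm" would mean minutes).
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMdd");

    // Keeps the first-indexed date if one was stored before (assumption:
    // the caller can retrieve it); otherwise stamps today's date.
    public static String dateIndexed(String previous, LocalDate today) {
        if (previous != null && !previous.isEmpty()) {
            return previous; // refetch: do not overwrite the original date
        }
        return today.format(FORMAT); // first time the Page is indexed
    }

    public static void main(String[] args) {
        System.out.println(dateIndexed(null, LocalDate.of(2006, 3, 14)));
        System.out.println(dateIndexed("20060314", LocalDate.of(2006, 4, 13)));
    }
}
```

If the Page is simply deleted and re-added on every refetch, `previous` would always be null and the field would be reset each time. That is the case I am trying to rule out.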

Rgrds, Thomas
