I have worked through the WritingPluginExample <http://wiki.apache.org/nutch/WritingPluginExample> example. Now I am wondering whether the following makes any sense. I would like to store the date (yyyymmdd) on which a Page was first added to the index. I thought I could create a plugin that adds a date_indexed field. My hesitation is about what happens after the fetch interval, when the Page is refetched.
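To make the intended semantics concrete, here is a minimal sketch of "set the date once, keep it on every later pass". It is deliberately not tied to the Nutch plugin API: the class name, the dateIndexedFor method and the in-memory map are hypothetical stand-ins for wherever the first-seen date would actually have to be persisted between crawl cycles.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Nutch.
final class DateIndexedSketch {
    // Java pattern for yyyymmdd: upper-case MM is the month (lower-case mm would be minutes).
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Assumption: some store that survives refetches, keyed by URL.
    private final Map<String, String> firstIndexed = new HashMap<>();

    // Returns the date the URL was first seen; later calls never overwrite it.
    String dateIndexedFor(String url, LocalDate today) {
        return firstIndexed.computeIfAbsent(url, u -> today.format(FORMAT));
    }

    public static void main(String[] args) {
        DateIndexedSketch sketch = new DateIndexedSketch();
        // First time the page is indexed: the date is recorded.
        System.out.println(sketch.dateIndexedFor("http://example.com/", LocalDate.of(2008, 1, 15)));
        // Refetch a month later: the original date is returned unchanged.
        System.out.println(sketch.dateIndexedFor("http://example.com/", LocalDate.of(2008, 2, 14)));
    }
}
```

The point of the sketch is that the value must come from something that outlives a single index build. If the field were simply set to "today" at indexing time, every re-index of the Page would overwrite it.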
What happens on a refetch
- if the Page content has changed? Is the Page updated (i.e. deleted and re-added) in the index, and would date_indexed be recalculated? (That would be OK.)
- if the Page hasn't changed? Is the Page also updated? (That would break the meaning of the date_indexed field, so not OK.)

Or does this depend on how I organize my generate/fetch/update/index cycle, i.e. whether I merge my indexes or recreate them from scratch?

Rgrds,
Thomas
