Hey,
Maybe the freshly added CrawlDatum.setMetaData can help you store such information. However, you will need to hack the Nutch code somehow, since this data is not stored today and there is no extension point for such a task yet.
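
To illustrate the "store it once, never overwrite it" logic, here is a minimal sketch in plain Java. It does not use any Nutch classes: the Map only stands in for per-page metadata, and the key name, class name and method name are assumptions made up for this example. How the value would actually be carried through the Nutch update and indexing steps is exactly the part that would still need hacking.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

public class FirstIndexedDate {
    // Hypothetical metadata key; this name is not defined by Nutch.
    static final String KEY = "date_indexed";

    // Returns the first-indexed date for a page, recording `today`
    // only when no date has been stored yet.
    static String dateIndexed(Map<String, String> metadata, LocalDate today) {
        String stamp = today.format(DateTimeFormatter.BASIC_ISO_DATE); // yyyyMMdd
        metadata.putIfAbsent(KEY, stamp); // keeps an existing value untouched
        return metadata.get(KEY);
    }

    public static void main(String[] args) {
        // Stand-in for the metadata kept with one page across crawl cycles.
        Map<String, String> metadata = new HashMap<>();

        // First time the page is indexed: the date is recorded.
        System.out.println(dateIndexed(metadata, LocalDate.of(2006, 2, 13))); // prints 20060213

        // Refetch after the fetch interval: the original date survives.
        System.out.println(dateIndexed(metadata, LocalDate.of(2006, 3, 15))); // prints 20060213
    }
}
```

With this approach the date stays the same whether or not the page content changed on refetch, as long as the metadata itself survives the update step.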

HTH
Stefan

On 13.02.2006 at 17:36, Thomas Delnoij wrote:

I have worked through the WritingPluginExample
<http://wiki.apache.org/nutch/WritingPluginExample> example.
Now I am wondering if the following makes any sense. I would like to
store the date (yyyymmdd) on which a Page was first added to the index.
I thought I could create a plugin that adds a date_indexed field. My
hesitation is about what happens after the fetch interval, when the
Page is refetched.

What happens

- if the Page content has changed? Is the Page updated (i.e. deleted and
added) in the index, and would date_indexed be recalculated? (That would
be OK.)
- if the Page hasn't changed? Is the Page also updated? (That would break
the meaning of the date_indexed field, so not OK.)

Or does this depend on how I organize my generate/fetch/update/index cycle,
i.e. if I merge my indexes or recreate them from scratch?

Rgrds, Thomas

---------------------------------------------
George Orwell was an Optimist
blog: http://www.find23.org
company: http://www.media-style.com

