Hey,
Maybe the freshly added CrawlDatum.setMetaData can help you store such information. However, you will need to hack the Nutch code somehow, since this is not stored so far and there is no extension point for such a task yet.
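
A minimal sketch of the "set it once, keep it on refetch" logic, assuming a plain in-memory map as a stand-in for per-URL metadata. It does not use the Nutch API: FirstIndexedStore and markIndexed are hypothetical names, and only the date_indexed key comes from the question below.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for per-URL metadata; this is NOT the Nutch API.
final class FirstIndexedStore {
    // Field name taken from the question; the storage itself is an assumption.
    static final String DATE_INDEXED = "date_indexed";

    private final Map<String, Map<String, String>> metaByUrl = new HashMap<>();

    // Records the date (yyyymmdd) only if the URL has none yet,
    // so a later refetch keeps the original value.
    String markIndexed(String url, LocalDate today) {
        Map<String, String> meta = metaByUrl.computeIfAbsent(url, k -> new HashMap<>());
        meta.putIfAbsent(DATE_INDEXED, today.format(DateTimeFormatter.BASIC_ISO_DATE));
        return meta.get(DATE_INDEXED);
    }

    public static void main(String[] args) {
        FirstIndexedStore store = new FirstIndexedStore();
        // First time the page is seen: the date is stored.
        System.out.println(store.markIndexed("http://example.org/", LocalDate.of(2006, 2, 13)));
        // Refetch a month later: the stored date is returned unchanged.
        System.out.println(store.markIndexed("http://example.org/", LocalDate.of(2006, 3, 15)));
    }
}
```

Whether the value survives in practice depends on the metadata being carried along through every update cycle, which is exactly the part that would need the code change.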

HTH
Stefan

On 13.02.2006 at 17:36, Thomas Delnoij wrote:

I have worked through the WritingPluginExample
<http://wiki.apache.org/nutch/WritingPluginExample> example.
Now I am wondering if the following makes any sense. I would like to
store the date (yyyymmdd) on which a Page was first added to the Index.
I thought I could create a plugin that would add a date_indexed field. My
hesitation is about what happens after the fetch interval, when the Page is
refetched.

What happens

- if the Page Content has changed? Is the Page updated (i.e. deleted and
  added) in the index, and would date_indexed be recalculated (would be ok)?
- if the Page hasn't changed? Is the Page also updated (would break the
  meaning of the date_indexed field, not ok)?

Or does this depend on how I organize my generate/fetch/update/index cycle,
i.e. if I merge my indexes or recreate them from scratch?

Rgrds, Thomas

---------------------------------------------
George Orwell was an Optimist
blog: http://www.find23.org
company: http://www.media-style.com




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
