I am still using 0.7.1, and I think CrawlDatum.setMetaData is only in trunk.
Is it not possible to just "hack" the MoreIndexingFilter and calculate the date_indexed field there (similar to how the lastModified field is calculated), and add a DateIndexedQueryFilter to the org.apache.nutch.searcher.more package?

Rgrds, Thomas

On 2/17/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
>
> Hey,
> Maybe the freshly added CrawlDatum.setMetaData can help you to store
> such information.
> However, you would need to hack the Nutch code somehow, since this is
> not stored today and there is no extension point for such a task.
>
> HTH
> Stefan
>
> On 13.02.2006 at 17:36, Thomas Delnoij wrote:
>
> > I have worked through the WritingPluginExample
> > <http://wiki.apache.org/nutch/WritingPluginExample> example.
> > Now I am wondering if the following makes any sense. I would like
> > to store the date (yyyymmdd) on which a Page was first added to the
> > Index. I thought I could create a plugin that would add a
> > date_indexed field. My hesitation is about what happens after the
> > fetch interval, when the Page is refetched.
> >
> > What happens
> >
> > - if the Page Content has changed? Is the Page updated (i.e. deleted
> >   and added) in the index, and would the date_indexed be recalculated
> >   (would be ok)?
> > - if the Page hasn't changed? Is the Page also updated (would break
> >   the meaning of the date_indexed field, not ok)?
> >
> > Or does this depend on how I organize my generate/fetch/update/index
> > cycle, i.e. whether I merge my indexes or recreate them from scratch?
> >
> > Rgrds, Thomas
>
> ---------------------------------------------
> George Orwel was an Optimist
> blog: http://www.find23.org
> company: http://www.media-style.com
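
To make the date_indexed idea from this thread concrete, here is a minimal Java sketch of just the date logic: keep the stored value if the Page already has one, otherwise stamp today's date as yyyyMMdd. It deliberately uses no Nutch classes. The class name DateIndexedSketch, the method resolveDateIndexed and the previousValue parameter are all hypothetical, and where a previously stored value would come from on a refetch is exactly the open question discussed above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Nutch: decides which date_indexed
// value a document should carry when it is (re)indexed.
final class DateIndexedSketch {

    // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd, e.g. 20060217.
    private static final DateTimeFormatter YYYYMMDD = DateTimeFormatter.BASIC_ISO_DATE;

    // previousValue: the date_indexed already stored for this Page, or null
    // if the Page has never been indexed (assumption: the caller can look
    // this up somewhere, which stock Nutch does not provide).
    static String resolveDateIndexed(String previousValue, LocalDate today) {
        if (previousValue != null && !previousValue.isEmpty()) {
            // Refetch of a known Page: keep the original first-indexed date.
            return previousValue;
        }
        // First time this Page is indexed: stamp today's date.
        return today.format(YYYYMMDD);
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2006, 2, 17);
        System.out.println(resolveDateIndexed(null, today));       // 20060217
        System.out.println(resolveDateIndexed("20060213", today)); // 20060213
    }
}
```

If a recalculated date is acceptable whenever the Page content has changed, the caller could simply pass null for previousValue in that case. Note that in Java date patterns the month is written MM; a literal yyyymmdd pattern would print minutes instead of the month.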
