I am still using 0.7.1, and I think CrawlDatum.setMetaData is only
available in the trunk.

Is it not possible to just "hack" the MoreIndexingFilter and calculate the
date_indexed field there (similar to how the lastModified field is
calculated), and add a DateIndexedQueryFilter to the
org.apache.nutch.searcher.more package?
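A rough illustration of the date_indexed idea, as a minimal Java sketch that uses no Nutch API at all. It assumes the first-indexed date is kept in some store that survives a refetch (for example, the trunk-only CrawlDatum metadata mentioned below). A plain Map stands in for that store. The class name DateIndexed and the key "date_indexed" are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper, not part of Nutch: remembers the date a page was
 * first indexed in a metadata map that is assumed to survive refetches.
 */
public final class DateIndexed {

    /** Metadata key for the first-indexed date (assumed name). */
    public static final String KEY = "date_indexed";

    // "MM" is month-of-year; lowercase "mm" would mean minutes here.
    private static final DateTimeFormatter YYYYMMDD =
            DateTimeFormatter.ofPattern("yyyyMMdd");

    private DateIndexed() {}

    /**
     * Returns the stored first-indexed date if one exists; otherwise records
     * {@code today} (formatted as yyyyMMdd) and returns it.
     */
    public static String resolve(Map<String, String> metadata, LocalDate today) {
        String existing = metadata.get(KEY);
        if (existing != null) {
            return existing; // indexed before: keep the original date
        }
        String formatted = today.format(YYYYMMDD);
        metadata.put(KEY, formatted);
        return formatted;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        System.out.println(resolve(metadata, LocalDate.of(2006, 2, 13))); // 20060213
        System.out.println(resolve(metadata, LocalDate.of(2006, 3, 15))); // 20060213
    }
}
```

The second call returns the original date because the value is looked up before anything is computed. Inside an indexing filter, the same lookup would only work if the filter can actually read such persisted metadata, which the sketch assumes rather than shows.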

Rgrds, Thomas

On 2/17/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
>
> Hey,
> Maybe the freshly added CrawlDatum.setMetaData can help you store
> such information.
> However, you will need to hack the Nutch code somehow: this is not
> stored today, and there is no extension point for such a task yet.
>
> HTH
> Stefan
>
> On 13.02.2006, at 17:36, Thomas Delnoij wrote:
>
> > I have worked through the WritingPluginExample
> > <http://wiki.apache.org/nutch/WritingPluginExample> example.
> > Now I am wondering if the following makes any sense. I would like
> > to store the date (yyyymmdd) on which a Page was first added to the
> > Index. I thought I could create a plugin that would add a
> > date_indexed field. My hesitation is about what happens after the
> > fetch interval, when the Page is refetched.
> >
> > What happens
> >
> > - if the Page Content has changed? Is the Page updated (i.e. deleted
> >   and added) in the index, and would the date_indexed be recalculated
> >   (which would be OK)?
> > - if the Page hasn't changed? Is the Page also updated (which would
> >   break the meaning of the date_indexed field, not OK)?
> >
> > Or does this depend on how I organize my generate/fetch/update/index
> > cycle, i.e. whether I merge my indexes or recreate them from scratch?
> >
> > Rgrds, Thomas
>
> ---------------------------------------------
> George Orwell was an optimist
> blog: http://www.find23.org
> company: http://www.media-style.com
>
>
>
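The two refetch cases in the quoted question (content changed vs. unchanged) could be handled by one possible policy. Keep date_indexed while the content signature stays the same, and recompute it when the content changes, which matches the behaviour described as "OK" above. Below is a minimal Java sketch of that policy. It assumes an MD5 digest of the raw content as the change signature, plus a hypothetical PageState type for what is remembered between fetches. It is not a description of how Nutch itself decides whether to re-index a page.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/**
 * Hypothetical policy, not Nutch code: on refetch, keep date_indexed if the
 * content is unchanged, and reset it if the content has changed.
 */
public final class RefetchPolicy {

    private static final DateTimeFormatter YYYYMMDD =
            DateTimeFormatter.ofPattern("yyyyMMdd");

    /** What is remembered about a page between fetches (assumed shape). */
    public static final class PageState {
        public final String dateIndexed;
        public final String signature;

        public PageState(String dateIndexed, String signature) {
            this.dateIndexed = dateIndexed;
            this.signature = signature;
        }
    }

    private RefetchPolicy() {}

    /** Hex-encoded MD5 digest of the content, used as a change signature. */
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    /**
     * @param previous state from the last fetch, or null for a new page
     * @param content  raw content of the page just fetched
     * @param today    date of this fetch
     */
    public static PageState onFetch(PageState previous, byte[] content, LocalDate today) {
        String signature = md5Hex(content);
        if (previous != null && previous.signature.equals(signature)) {
            return previous; // unchanged: keep the original date_indexed
        }
        // New page, or content changed: (re)compute date_indexed.
        return new PageState(today.format(YYYYMMDD), signature);
    }
}
```

Whether this state survives between cycles is exactly the question raised above. If the indexes are recreated from scratch, the PageState must live outside the index (for example, in per-page metadata), or the first date is lost.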
