Ah, after reading up on the new metadata facility in JIRA, I think I
understand better what you mean - the metadata are added to the WebDB and
persisted across refetches. This way it is possible for the complete index
to be recreated from scratch while maintaining the first indexed date, which
otherwise would be lost, right?
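
A minimal sketch of that idea, assuming the persisted metadata behaves like
a plain string map. The class name, the "date_indexed" key and the Map are
stand-ins for illustration, not the actual CrawlDatum API:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

public class FirstIndexedDate {
    // Hypothetical metadata key; the name is taken from the field discussed in this thread.
    public static final String DATE_INDEXED = "date_indexed";

    // yyyymmdd, as asked for in the original question.
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Stores today's date only if no date has been recorded yet,
    // and returns whichever value ends up in the metadata.
    public static String recordFirstIndexed(Map<String, String> metadata, LocalDate today) {
        return metadata.computeIfAbsent(DATE_INDEXED, key -> today.format(FMT));
    }

    public static void main(String[] args) {
        // Stand-in for metadata that survives refetches.
        Map<String, String> metadata = new HashMap<>();

        // First time the page is indexed: the date is recorded.
        System.out.println(recordFirstIndexed(metadata, LocalDate.of(2006, 2, 13)));

        // Later refetch or full reindex: the original date is kept.
        System.out.println(recordFirstIndexed(metadata, LocalDate.of(2006, 3, 15)));
    }
}
```

As long as the map is persisted along with the page, rebuilding the index
from scratch would read the stored value back instead of recalculating it.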

Rgrds, Thomas

On 2/17/06, TDLN <[EMAIL PROTECTED]> wrote:
>
> I am still using 0.7.1 - I think the CrawlDatum.setMetaData is only part
> of the trunk.
>
> Is it not possible to just "hack" the MoreIndexingFilter and calculate the
> date_indexed field there (similar to how the lastModified field is
> calculated), and add a DateIndexedQueryFilter to the
> org.apache.nutch.searcher.more package?
>
> Rgrds, Thomas
>
> On 2/17/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
> >
> > Hey,
> > Maybe the freshly added CrawlDatum.setMetaData can help you store
> > such information.
> > However, you will need to hack the Nutch code somehow, since this is not
> > stored today and there is no extension point for such a task yet.
> >
> > HTH
> > Stefan
> >
> > On 13.02.2006 at 17:36, Thomas Delnoij wrote:
> >
> > > I have worked through the WritingPluginExample
> > > <http://wiki.apache.org/nutch/WritingPluginExample> example.
> > > Now I am wondering if the following makes any sense. I would like
> > > to store the date (yyyymmdd) the first time a Page was added to the
> > > Index. I thought I could create a plugin that would add a
> > > date_indexed field. My hesitation is what happens after the fetch
> > > interval, when the Page is refetched.
> > >
> > > What happens
> > >
> > > - if the Page Content has changed? Is the Page updated (i.e. deleted
> > >   and added) in the index and would the date_indexed be recalculated
> > >   (would be ok.)
> > > - if the Page hasn't changed? Is the Page also updated (would break
> > >   the meaning of the date_indexed field, not ok).
> > >
> > > Or does this depend on how I organize my generate/fetch/update/index
> > > cycle, i.e. if I merge my indexes or recreate them from scratch?
> > >
> > > Rgrds, Thomas
> >
> > ---------------------------------------------
> > George Orwel was an Optimist
> > blog: http://www.find23.org
> > company: http://www.media-style.com
> >
> >
> >
>
