What you mentioned is correct. It is stored in the db.
Rgds
Prabhu

On 2/17/06, TDLN <[EMAIL PROTECTED]> wrote:
>
> Ah, after reading up on the new metadata facility in JIRA, I think I
> understand better what you mean - the metadata are added to the WebDB and
> persisted across refetches. This way it is possible for the complete index
> to be recreated from scratch while maintaining the first indexed date,
> which otherwise would be lost, right?
>
> Rgrds, Thomas
>
> On 2/17/06, TDLN <[EMAIL PROTECTED]> wrote:
> >
> > I am still using 0.7.1 - I think CrawlDatum.setMetaData is only part
> > of the trunk.
> >
> > Is it not possible to just "hack" the MoreIndexingFilter and calculate
> > the date_indexed field there (similar to how the lastModified field is
> > calculated), and add a DateIndexedQueryFilter to the
> > org.apache.nutch.searcher.more package?
> >
> > Rgrds, Thomas
> >
> > On 2/17/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
> > >
> > > Hey,
> > > Maybe the freshly added CrawlDatum.setMetaData can help you store
> > > such information.
> > > However, you will need to hack the Nutch code somehow, since this is
> > > not stored as of today and there is no extension point for such a
> > > task yet.
> > >
> > > HTH
> > > Stefan
> > >
> > > On 13.02.2006 at 17:36, Thomas Delnoij wrote:
> > >
> > > > I have worked through the WritingPluginExample
> > > > <http://wiki.apache.org/nutch/WritingPluginExample> example.
> > > > Now I am wondering if the following makes any sense. I would like
> > > > to store the date (yyyymmdd) the first time a Page was added to
> > > > the Index. I thought I could create a plugin that would add a
> > > > date_indexed field. My hesitation is what happens after the fetch
> > > > interval, when the Page is refetched.
> > > >
> > > > What happens
> > > >
> > > > - if the Page Content has changed? Is the Page updated (i.e.
> > > >   deleted and added) in the index and would the date_indexed be
> > > >   recalculated (would be ok).
> > > > - if the Page hasn't changed? Is the Page also updated (would
> > > >   break the meaning of the date_indexed field, not ok).
> > > >
> > > > Or does this depend on how I organize my
> > > > generate/fetch/update/index cycle, i.e. if I merge my indexes or
> > > > recreate them from scratch?
> > > >
> > > > Rgrds, Thomas
> > >
> > > ---------------------------------------------
> > > George Orwell was an Optimist
> > > blog: http://www.find23.org
> > > company: http://www.media-style.com
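For illustration only, the "set once, keep across refetches" behaviour
discussed in the thread could be sketched roughly as below. This is not
Nutch code: a plain java.util.Map stands in for the per-page metadata kept
in the db, and the class name, the method name and the "date_indexed" key
are all hypothetical. A real version would have to go through whatever
metadata access the trunk actually offers.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for the per-page metadata
// that is persisted in the db. None of these names are Nutch API.
class FirstIndexedDateSketch {

    // Assumed metadata key, named after the field proposed in the thread.
    static final String KEY = "date_indexed";

    // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd.
    static final DateTimeFormatter FORMAT = DateTimeFormatter.BASIC_ISO_DATE;

    // Returns the date already stored for the page. Only when no date is
    // stored yet does it record the given date, so a refetch or a full
    // reindex never overwrites the first value.
    static String dateIndexed(Map<String, String> pageMetaData, LocalDate today) {
        return pageMetaData.computeIfAbsent(KEY, k -> today.format(FORMAT));
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        // First time the page is indexed: the date is recorded.
        System.out.println(dateIndexed(meta, LocalDate.of(2006, 2, 13)));
        // Later refetch: the stored date is returned unchanged.
        System.out.println(dateIndexed(meta, LocalDate.of(2006, 3, 15)));
    }
}
```

Because the value lives with the page metadata rather than in the index,
it would not matter whether the indexes are merged or recreated from
scratch, which is the point made in the quoted reply.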
