Peter Swoboda wrote:
Hi,
what exactly does the property
<property>
<name>db.default.fetch.interval</name>
<value>30</value>
<description>The default number of days between re-fetches of a page.
</description>
</property>
do?
URLs in the CrawlDb are set to be re-fetched after a given interval; the
default is 30 days. This property sets that interval.
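For example, to re-fetch pages weekly instead of monthly, you could
override the property in conf/nutch-site.xml (a minimal sketch; the
value of 7 is only an illustration):

<!-- overrides the default of 30 days set in nutch-default.xml -->
<property>
<name>db.default.fetch.interval</name>
<value>7</value>
<description>Re-fetch pages every 7 days instead of the default 30.
</description>
</property>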
Does it mean that any changes to an injected url will be noticed?
Who or what re-fetches the page?
The Fetcher will, once the interval has expired. This does not happen
automatically; a fetch job has to be run.
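A minimal sketch of such a manual fetch cycle, assuming the crawl lives
under crawl/ (the paths are placeholders, and the exact syntax may vary
between Nutch versions):

# generate a fetch list of all urls whose interval has expired
bin/nutch generate crawl/crawldb crawl/segments
# the new segment is named by timestamp; grab the latest one
segment=`ls -d crawl/segments/* | tail -1`
# fetch the pages in that segment
bin/nutch fetch $segment
# write the fetch results back into the CrawlDb
bin/nutch updatedb crawl/crawldb $segment

Note that the re-fetched content still has to be indexed before the
changes show up in search results.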
What do I have to do if I want Nutch to reflect (in the search results)
that an injected url has changed?
Do I have to do a complete recrawl (like in the script)?
If you know which specific urls have changed, you can create a fetch list
of only those urls (through a separate crawldb and segments on a separate
inject, generate, fetch process; don't use the same path). Then you can
merge those results using mergedb for the CrawlDb and mergesegs for the
segments. You should only have to do a full recrawl if you don't know
which pages have changed.
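A rough sketch of that workflow, assuming the main crawl is under crawl/,
the changed urls are listed in a file under changed-urls/, and the new
data goes under fresh/ (all names are hypothetical; check each command's
usage output for your Nutch version):

# inject only the changed urls into a separate crawldb
bin/nutch inject fresh/crawldb changed-urls
# generate and fetch into a separate segments directory
bin/nutch generate fresh/crawldb fresh/segments
segment=`ls -d fresh/segments/* | tail -1`
bin/nutch fetch $segment
bin/nutch updatedb fresh/crawldb $segment
# merge the fresh results back into the main crawl
bin/nutch mergedb merged/crawldb crawl/crawldb fresh/crawldb
bin/nutch mergesegs merged/segments -dir crawl/segments -dir fresh/segments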
Dennis Kubes
Thanks
Peter