Peter Swoboda wrote:
Hi,
what does the property

<property>
  <name>db.default.fetch.interval</name>
  <value>30</value>
  <description>The default number of days between re-fetches of a page.
  </description>
</property>

exactly do?

Urls in the CrawlDb are set to be re-fetched after a given interval. The default is 30 days; this property sets that interval.
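If you want a different interval, the property can be overridden in conf/nutch-site.xml, following the same format as above (a sketch; the 7-day value is just an example):

```xml
<!-- conf/nutch-site.xml: override the 30-day default (example value) -->
<property>
  <name>db.default.fetch.interval</name>
  <value>7</value>
  <description>Re-fetch pages every 7 days instead of the 30-day default.
  </description>
</property>
```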

Does it mean that any changes to an injected url will be picked up?
Who/what re-fetches the page?

The Fetcher will, once the interval has expired. This does not happen automatically; a fetch job has to be run.
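A fetch cycle can be run by hand with the standard Nutch commands; a minimal sketch, assuming a crawl directory at crawl/ (the paths are examples):

```shell
# Generate a fetch list of the urls whose fetch interval has expired
bin/nutch generate crawl/crawldb crawl/segments

# Pick up the segment the generate step just created
segment=$(ls -d crawl/segments/* | tail -1)

# Fetch the pages in that segment
bin/nutch fetch "$segment"

# Fold the fetch results back into the CrawlDb
bin/nutch updatedb crawl/crawldb "$segment"
```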

What do I have to do if I want Nutch to reflect (in the search results) that
an injected url has changed?
Do I have to do a complete recrawl (like in the script)?

If you know that specific urls have changed, you can create a fetch list of only those urls (through a separate CrawlDb and segments in a separate inject, generate, fetch process; don't use the same path). Then you can merge those results back, using mergedb for the CrawlDb and mergesegs for the segments. You should not have to do a full recrawl unless you don't know which pages were changed.
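The selective-refetch path described above could look roughly like this, assuming a seed directory containing only the changed urls and a separate tmp/ path for the side crawl (all names here are examples, not fixed conventions):

```shell
# Inject only the changed urls into a separate, temporary CrawlDb
bin/nutch inject tmp/crawldb changed_urls/

# Generate and fetch against the temporary db and segments
bin/nutch generate tmp/crawldb tmp/segments
segment=$(ls -d tmp/segments/* | tail -1)
bin/nutch fetch "$segment"
bin/nutch updatedb tmp/crawldb "$segment"

# Merge the temporary results back into the main crawl
bin/nutch mergedb crawl/crawldb_merged crawl/crawldb tmp/crawldb
bin/nutch mergesegs crawl/segments_merged -dir crawl/segments -dir tmp/segments
```

The merged outputs land in new paths (crawldb_merged, segments_merged), so the originals stay intact until you swap them in.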

Dennis Kubes

Thanks
Peter


