I searched the mail archive and found this:
http://www.mail-archive.com/[email protected]/msg01308.html
- Is there a way in the current version of Nutch to update the crawl
without fetching every document again?
- Is the Nutch team planning to add an update function?
Håvard W. Kongsgård wrote:
So how do I update a crawl? The updating section of the FAQ is empty :-(
http://wiki.apache.org/nutch/FAQ#head-c721b23b43b15885f5ea7d8da62c1c40a37878e6
Doug Cutting wrote:
Håvard W. Kongsgård wrote:
- I want to index about 50 to 100 sites with lots of documents. Is
it best to use the Intranet Crawling or the Whole-web Crawling method?
The "intranet" style is simpler and hence a good place to start. If
it doesn't work well for you then you might try the "whole-web" style.
- Is the crawl auto-updated in Nutch, or must I run a cron task?
It is not auto-updated.
Doug