I searched the mail archive and found this: http://www.mail-archive.com/[email protected]/msg01308.html
- Is there a way in the current version of Nutch to update the crawl without fetching every document again?
- Is the Nutch team planning an update function?


Håvard W. Kongsgård wrote:

So how do I update a crawl? The updating section of the FAQ is empty :-( http://wiki.apache.org/nutch/FAQ#head-c721b23b43b15885f5ea7d8da62c1c40a37878e6


Doug Cutting wrote:

Håvard W. Kongsgård wrote:

- I want to index about 50-100 sites with lots of documents. Is it best to use the Intranet Crawling or the Whole-web Crawling method?




The "intranet" style is simpler and hence a good place to start. If it doesn't work well for you then you might try the "whole-web" style.

- Is the crawl automatically updated in Nutch, or must I run a cron task?




It is not auto-updated.

Doug





