Re: Intranet crawl and re-fetch - newbie question

Piotr Kosiorowski Mon, 06 Jun 2005 06:23:28 -0700

As far as I know, the crawl command (called Intranet crawling in the tutorial) assumes you refetch everything from scratch every time you run it. Whole-web crawling gives you more detailed control over what you crawl and recrawl, but some parameters might not work as I would expect (e.g. -refetchonly). Support for checking whether a page was modified since the last fetch time is currently missing (although, as I understand it, there is some work going on in this direction: http://issues.apache.org/jira/browse/NUTCH-61).
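To illustrate what "checking if a page was modified since the last fetch time" means at the HTTP level, here is a minimal sketch of a conditional GET using the JDK's HttpURLConnection. It is not Nutch code and not the NUTCH-61 work; the class and method names (ConditionalFetch, isModifiedSince, isModifiedStatus) are hypothetical. The idea is that the fetcher sends an If-Modified-Since header carrying the previous fetch time, and a server that supports it answers 304 Not Modified when the page has not changed, so the page body need not be downloaded again.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper illustrating a conditional GET; not part of Nutch.
class ConditionalFetch {

    // Asks the server whether pageUrl changed since lastFetchMillis
    // (milliseconds since the epoch) by sending an If-Modified-Since header.
    static boolean isModifiedSince(String pageUrl, long lastFetchMillis) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
        conn.setIfModifiedSince(lastFetchMillis); // adds the If-Modified-Since request header
        try {
            return isModifiedStatus(conn.getResponseCode());
        } finally {
            conn.disconnect();
        }
    }

    // A 304 Not Modified response means the copy from the last fetch is still current;
    // any other status is treated as "refetch needed".
    static boolean isModifiedStatus(int status) {
        return status != HttpURLConnection.HTTP_NOT_MODIFIED;
    }
}
```

Servers that ignore If-Modified-Since simply return 200 with the full page, so in the worst case this degrades to an ordinary refetch.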

Regards
Piotr



[EMAIL PROTECTED] wrote:

Hello,

I have a newbie question:

I have run an intranet crawl to completion (bin/nutch crawl mySite 
myDB).
Since I would like to recrawl in a few days, I changed the Nutch default 
fetch interval parameter to 3 days (instead of 30).
How do I perform the recrawl? Do I just launch a new intranet crawl with the same parameters? If I do, will the fetcher download only new or modified pages, or will it download everything again?
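As a rough illustration of what shortening the default interval from 30 to 3 days is meant to affect, here is a minimal sketch of an interval-based "is this page due for refetch?" check. It is a simplified model, not Nutch's actual scheduling code; the class and method names (RefetchSchedule, nextFetchTime, isDue) are hypothetical. Note that, as the reply points out, the crawl command refetches everything regardless, so a check like this only matters for the whole-web style of crawling.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model of interval-based refetch scheduling; not Nutch's implementation.
class RefetchSchedule {

    // The next time a page becomes eligible for refetch: last fetch plus the interval.
    static long nextFetchTime(long lastFetchMillis, int intervalDays) {
        return lastFetchMillis + TimeUnit.DAYS.toMillis(intervalDays);
    }

    // A page is due once the current time reaches its next fetch time.
    static boolean isDue(long lastFetchMillis, int intervalDays, long nowMillis) {
        return nowMillis >= nextFetchTime(lastFetchMillis, intervalDays);
    }
}
```

Under this model, a page fetched three days ago is due with a 3-day interval but not with the 30-day default.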
Thanks for any help

Isabelle

[EMAIL PROTECTED]
Ph: 651 687 3424
