Hello Folks,

I know there is a fetcher.max.crawl.delay parameter which, when set to a
value in seconds, causes a page to be skipped during fetching if the
Crawl-delay in robots.txt for that host is greater than this value.
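For reference, a minimal sketch of how that property might be set in conf/nutch-site.xml (the value of 30 here is just an illustrative threshold, not a recommendation):

```xml
<!-- Skip fetching pages from any host whose robots.txt
     Crawl-delay exceeds this many seconds. -->
<property>
  <name>fetcher.max.crawl.delay</name>
  <value>30</value>
</property>
```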

I am confused because the description of this parameter says that a
particular page will not be fetched, whereas the Crawl-delay in robots.txt
applies to the whole website. Does it mean that all pages from that host
will subsequently not be fetched by Nutch?

For example, suppose I have crawled page 1, and page 1 has 100 outlinks.

Now, if the Crawl-delay is something like two days, does it mean that none
of the 100 outlinks from that single crawled page will be crawled at all?
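For concreteness, a robots.txt that advertises a two-day delay would look something like this (172800 seconds = 2 days), assuming the site applies it to all crawlers:

```
User-agent: *
Crawl-delay: 172800
```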

Thanks in advance!
