Hello Folks,

I know there is a fetcher.max.crawl.delay parameter which, when set to a
value in seconds, causes a page to be skipped during fetching if the
Crawl-delay in robots.txt for that host is greater than this value.
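For reference, a minimal sketch of how that property might be set in conf/nutch-site.xml (the value of 30 here is just an illustrative threshold, not a recommendation):

```xml
<!-- Skip fetching pages from any host whose robots.txt
     Crawl-delay exceeds this many seconds. -->
<property>
  <name>fetcher.max.crawl.delay</name>
  <value>30</value>
</property>
```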

I am confused because the description of this parameter says that a
particular page will not be fetched, whereas the Crawl-delay in robots.txt
applies to the whole website. Does it mean that all pages from that host
will subsequently not be fetched by Nutch?

For example, suppose I have crawled page 1, and page 1 has 100 outlinks.

Now, if the Crawl-delay is something like two days, does it mean that none
of the 100 outlinks from that single crawled page will be crawled at all?
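For concreteness, a robots.txt that advertises a two-day delay would look something like this (172800 seconds = 2 days), assuming the site applies it to all crawlers:

```
User-agent: *
Crawl-delay: 172800
```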

Thanks in advance!
