Take a look at the Crawl-delay setting in the robots.txt file of the
website you are attempting to fetch. It may be what is slowing you down.
There is a setting, fetcher.max.crawl.delay, in your nutch-*.xml file that
controls this behavior. The default is 30 seconds, meaning Nutch will
skip pages whose robots.txt crawl delay is over 30 seconds. In robots.txt
the delay is given in seconds (internally Nutch works in milliseconds, so
30 seconds corresponds to 30000 ms). If that website sets a crawl delay of,
say, 20 seconds, Nutch will wait 20 seconds between each page request. If
this is the case and the site has, say, 10,000 pages, the fetch would take
around 2.3 days.
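
Roughly, the relevant pieces look like this (the Crawl-delay value and page
count are just the example numbers from above; the property override would
go in your nutch-site.xml):

  # robots.txt on the remote site -- Crawl-delay is given in seconds
  User-agent: *
  Crawl-delay: 20

  <!-- nutch-site.xml: skip pages whose robots.txt delay exceeds this (seconds) -->
  <property>
    <name>fetcher.max.crawl.delay</name>
    <value>30</value>
  </property>

  # rough fetch-time estimate for the example above:
  # 10,000 pages x 20 s/page = 200,000 s, or about 2.3 days
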
Dennis Kubes
cesar voulgaris wrote:
OK, thanks
On 2/13/07, cesar voulgaris <[EMAIL PROTECTED]> wrote:
hi, maybe someone who has had the same problem can help me:
I started a crawl, and at a certain depth the fetcher logs the urls
apparently correctly, but it has been running for two days!! It seems to
be fetching the same site (a big one, but not that big). What disturbs me is
that the segment directory is always the same size
(du -hs segmentdir); it only has crawl_generate as a subdir. Does nutch
have
a temporary dir where it stores the fetches until it
writes the other subdirs?... maybe it is hung up? It happened two times in
different crawls (I did several crawls, not too common)