Re: Error at end of MapReduce run with indexing

Ken Krugler Tue, 17 Jan 2006 15:51:33 -0800

Hi Florent,

[snip]

> > 1. Any ideas what might have caused it to time out just now, when it
> > had successfully run many jobs up to that point?
> >
> > 2. What cruft might I need to get rid of because it died? For example,
> > I see a reference to /home/crawler/tmp/local/jobTracker/job_18cunz.xml
> > now when I try to execute some Nutch commands.
>
> I've had the same problem during the invertlinks step when dealing w/ a
> large number of urls. Increasing the ipc.client.timeout value from
> 60000 to 100000 (cf nutch-default.xml) did the trick.


Thanks for the idea - we'll give it a try now.
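
For reference, the change described above might look roughly like the
fragment below. This is only a sketch: the property name and the 100000
value come from the quoted message, while putting the override in
conf/nutch-site.xml (rather than editing nutch-default.xml directly) is
an assumption based on the usual Nutch convention. The value should be in
milliseconds, but check the description in your own nutch-default.xml.

```xml
<!-- Assumed location: conf/nutch-site.xml, inside the file's existing
     root element. Overrides the 60000 default from nutch-default.xml. -->
<property>
  <name>ipc.client.timeout</name>
  <value>100000</value>
</property>
```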

[snip]

> > 4. Any idea whether 4 hours is a reasonable amount of time for this
> > test? It seemed long to me, given that I was starting with a single
> > URL as the seed.
>
> How many crawl passes did you do?


Three deep, as in: bin/nutch crawl seeds -depth 3

This was the same as Doug described in his post here:

http://mail-archives.apache.org/mod_mbox/lucene-nutch-user/200509.mbox/[EMAIL PROTECTED]

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-470-9200
