Hi Florent,

[snip]

 > 1. Any ideas what might have caused it to time out just now, when it
 > had successfully run many jobs up to that point?
 >
 > 2. What cruft might I need to get rid of because it died? For example,
 > I see a reference to /home/crawler/tmp/local/jobTracker/job_18cunz.xml
 > now when I try to execute some Nutch commands.

I've had the same problem during the invertlinks step when dealing with a
large number of URLs. Increasing the ipc.client.timeout value from
60000 to 100000 (cf. nutch-default.xml) did the trick.
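
For reference, a minimal sketch of that change written as a site-level
override rather than an edit to nutch-default.xml itself. This assumes the
usual convention that properties set in conf/nutch-site.xml take precedence
over the defaults; the file location and the description text are
assumptions, and the value is presumably in milliseconds:

```xml
<!-- Assumed to go inside the existing root element of conf/nutch-site.xml -->
<property>
  <name>ipc.client.timeout</name>
  <value>100000</value>
  <!-- Illustrative description; the default quoted in this thread was 60000 -->
  <description>Timeout for IPC calls, raised for large invertlinks jobs.</description>
</property>
```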

Thanks for the idea - we'll give it a try now.

[snip]

 > 4. Any idea whether 4 hours is a reasonable amount of time for this
 > test? It seemed long to me, given that I was starting with a single
 > URL as the seed.

How many crawl passes did you do?

Three deep, as in: bin/nutch crawl seeds -depth 3

This was the same as Doug described in his post here:

http://mail-archives.apache.org/mod_mbox/lucene-nutch-user/200509.mbox/[EMAIL PROTECTED]

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-470-9200
