Hi Florent,
[snip]
>> 1. Any ideas what might have caused it to time out just now, when it
>> had successfully run many jobs up to that point?
>> 2. What cruft might I need to get rid of because it died? For example,
>> I see a reference to /home/crawler/tmp/local/jobTracker/job_18cunz.xml
>> now when I try to execute some Nutch commands.
>
> I've had the same problem during the invertlinks step when dealing w/ a
> large number of URLs. Increasing the ipc.client.timeout value from
> 60000 to 100000 (cf. nutch-default.xml) did the trick.
Thanks for the idea - we'll give it a try now.
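For reference, the usual way to apply a change like Florent's is to override the property in a site config file (e.g. nutch-site.xml) rather than editing nutch-default.xml directly. The sketch below just builds such an override. Assumptions: the value is in milliseconds (so 60000 is 60 s and 100000 is 100 s), and the root element name (`configuration` here) should be checked against the one your release's nutch-default.xml actually uses. The function name `timeout_override` is hypothetical.

```python
import xml.etree.ElementTree as ET


def timeout_override(timeout_ms: int, root_tag: str = "configuration") -> str:
    """Return an XML config document with a single ipc.client.timeout override.

    root_tag is an assumption: match the root element of your
    release's nutch-default.xml.
    """
    root = ET.Element(root_tag)
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = "ipc.client.timeout"
    # Stored as a string; assumed to be milliseconds, as in nutch-default.xml.
    ET.SubElement(prop, "value").text = str(timeout_ms)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Raise the timeout from the 60000 default to 100000, as suggested above.
    print(timeout_override(100000))
```

Paste the resulting `<property>` element into your site config file and restart the daemons so the new timeout is picked up.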
[snip]
>> 4. Any idea whether 4 hours is a reasonable amount of time for this
>> test? It seemed long to me, given that I was starting with a single
>> URL as the seed.
>
> How many crawl passes did you do?
Three deep, as in: bin/nutch crawl seeds -depth 3
This is the same setup Doug described in his post here:
http://mail-archives.apache.org/mod_mbox/lucene-nutch-user/200509.mbox/[EMAIL PROTECTED]
-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-470-9200