[Nutch-general] Re: Error at end of MapReduce run with indexing

Ken Krugler Tue, 17 Jan 2006 15:52:19 -0800

Hi Florent,

[snip]

>> 1. Any ideas what might have caused it to time out just now, when it
>> had successfully run many jobs up to that point?
>>
>> 2. What cruft might I need to get rid of because it died? For example,
>> I see a reference to /home/crawler/tmp/local/jobTracker/job_18cunz.xml
>> now when I try to execute some Nutch commands.
>
> I've had the same problem during the invertlinks step when dealing w/ a
> large number of URLs. Increasing the ipc.client.timeout value from
> 60000 to 100000 (cf. nutch-default.xml) did the trick.


Thanks for the idea - we'll give it a try now.
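Florent's fix can be applied without editing nutch-default.xml itself. The fragment below is a minimal sketch. It assumes the usual Nutch convention that properties set in conf/nutch-site.xml override the shipped defaults. The `<property>` element goes inside that file's root element, so copy the root element name from your own nutch-default.xml, because it has varied between versions. ipc.client.timeout is given in milliseconds, so this raises the limit from 60 to 100 seconds. The description text is an illustrative placeholder and is not the wording used in nutch-default.xml.

```xml
<!-- Assumed location: conf/nutch-site.xml, inside the file's root element. -->
<property>
  <name>ipc.client.timeout</name>
  <!-- Milliseconds: raised from the 60000 default to 100000 (100 s). -->
  <value>100000</value>
  <!-- Placeholder description, not copied from nutch-default.xml. -->
  <description>Timeout for IPC calls, in milliseconds.</description>
</property>
```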

[snip]

>> 4. Any idea whether 4 hours is a reasonable amount of time for this
>> test? It seemed long to me, given that I was starting with a single
>> URL as the seed.
>
> How many crawl passes did you do?


Three deep, as in: bin/nutch crawl seeds -depth 3

This was the same setup Doug described in his post here:

http://mail-archives.apache.org/mod_mbox/lucene-nutch-user/200509.mbox/[EMAIL PROTECTED]

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-470-9200


