Yes, thank you. Somehow my tmp directory was 100% full from a default crawl on only 5 URLs, more than likely because of the -1 in the max URL field, so I dumped the files and restarted the crawl with a URL limit of 500. Everything is running (I say that very loosely), and I now know what to look for in the future. Thank you very much.
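For reference, a minimal sketch of re-running the crawl with a URL cap, assuming the stock bin/nutch crawl front end and that the cap is applied through -topN; the seed directory, crawl directory, and depth below are placeholders, not the exact values used here:

  # re-crawl, with each fetch round capped at 500 URLs via -topN
  bin/nutch crawl urls -dir /lib/nutch/crawl -depth 3 -topN 500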
-----Original Message-----
From: Markus Jelsma [mailto:[email protected]]
Sent: Thursday, November 04, 2010 11:55 AM
To: Eric Martin
Cc: [email protected]
Subject: Re: False Start

Hmm, I'm not sure; I don't use this kind of crawling, but I can imagine the input dir segments/* does not exist. Try removing the asterisk? If that doesn't work, how much free disk space do you have in your tmp directory? Then try setting hadoop.tmp.dir to a disk with plenty of room.

> 2010-11-04 13:48:00,555 WARN segment.SegmentMerger - Input dir
> /lib/nutch/crawl/segments/* doesn't exist, skipping.
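For anyone hitting the same warning, a quick way to check both of Markus's suspects, assuming the default /tmp location and the crawl path shown in the log above (a sketch, not commands taken from this thread):

  # how full is the temp area Hadoop spills into by default?
  df -h /tmp
  # does the segments directory actually exist and contain segments?
  ls -l /lib/nutch/crawl/segments

If /tmp turns out to be nearly full, the usual fix is the one Markus suggests: override the hadoop.tmp.dir property in conf/nutch-site.xml so it points at a partition with more room, then re-run the crawl.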

