Yes, thank you. Somehow my tmp directory was 100% full from a default crawl on only 5 URLs, more than likely because of the -1 in the max URL field, so I dumped the files and restarted the crawl with a URL limit of 500. Everything is running (I say that very loosely), and I now know what to look for in the future. Thank you very much.
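For reference, a minimal sketch of re-running the crawl with a URL cap, assuming the stock bin/nutch crawl front end and that the cap is applied through -topN; the seed directory, crawl directory, and depth below are placeholders, not the exact values used here:

  # re-crawl, with each fetch round capped at 500 URLs via -topN
  bin/nutch crawl urls -dir /lib/nutch/crawl -depth 3 -topN 500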
-----Original Message-----
From: Markus Jelsma [mailto:[email protected]]
Sent: Thursday, November 04, 2010 11:55 AM
To: Eric Martin
Cc: [email protected]
Subject: Re: False Start

Hmm, I'm not sure; I don't use this kind of crawling, but I can imagine the input dir segments/* does not exist. Try removing the asterisk? If that doesn't work, how much free disk space do you have in your tmp directory? Then try setting hadoop.tmp.dir to a disk with plenty of room.

> 2010-11-04 13:48:00,555 WARN segment.SegmentMerger - Input dir
> /lib/nutch/crawl/segments/* doesn't exist, skipping.
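For anyone hitting the same warning, a quick way to check both of Markus's suspects, assuming the default /tmp location and the crawl path shown in the log above (a sketch, not commands taken from this thread):

  # how full is the temp area Hadoop spills into by default?
  df -h /tmp
  # does the segments directory actually exist and contain segments?
  ls -l /lib/nutch/crawl/segments

If /tmp turns out to be nearly full, the usual fix is the one Markus suggests: override the hadoop.tmp.dir property in conf/nutch-site.xml so it points at a partition with more room, then re-run the crawl.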

