I really think this should be in the FAQ: http://wiki.apache.org/nutch/FAQ

On Fri, Oct 26, 2012 at 2:10 PM, Markus Jelsma <[email protected]> wrote:
> Hi,
>
> You cannot recover the mapper output as far as i know. But anyway, one should
> never have a fetcher running for three days. It's far better to generate a
> large amount of smaller segments and fetch them sequentially. If an error
> occurs, only a small portion is affected. We never run fetchers for more than
> one hour, instead we run many in a row and sometimes concurrently.
>
> Cheers,
>
>
> -----Original message-----
>> From:Mohammad wrk <[email protected]>
>> Sent: Fri 26-Oct-2012 00:47
>> To: [email protected]
>> Subject: How to recover data from /tmp/hadoop-myuser
>>
>> Hi,
>>
>> My fetch cycle (nutch fetch ./segments/20121021205343/ -threads 25) failed,
>> after 3 days, with the error below. Under the segment folder
>> (./segments/20121021205343/) there is only generated fetch list
>> (crawl_generate) and no content. However /tmp/hadoop-myuser/ has 96G of
>> data. I was wondering if there is a way to recover this data and parse the
>> segment?
>>
>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any
>> valid local directory for output/file.out
>>
>>     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
>>     at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>> 2012-10-24 14:43:29,671 ERROR fetcher.Fetcher - Fetcher:
>> java.io.IOException: Job failed!
>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
>>     at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1318)
>>     at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1354)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1327)
>>
>>
>> Thanks,
>> Mohammad

--
Lewis
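
For an FAQ entry, the "many small segments, fetched one after another" advice could be sketched roughly as below. This is only an illustration, not anyone's production setup: it assumes the Nutch 1.x command line (generate, fetch, parse, updatedb), and the paths, the -topN value, the number of rounds and the helper names (newest_segment, crawl) are all made up for the example.

```python
import os
import subprocess


def newest_segment(segments_dir):
    """Return the most recent segment directory.

    Segment names are timestamps (e.g. 20121021205343), so sorting the
    names lexicographically puts the newest one last.
    """
    names = sorted(
        d for d in os.listdir(segments_dir)
        if os.path.isdir(os.path.join(segments_dir, d))
    )
    if not names:
        raise RuntimeError("no segments found in " + segments_dir)
    return os.path.join(segments_dir, names[-1])


def crawl(nutch, crawldb, segments_dir, rounds, top_n, threads,
          run=subprocess.check_call):
    """Run several short generate/fetch/parse/updatedb rounds in a row.

    Each round works on one small segment, so a failure only costs that
    segment. `run` defaults to subprocess.check_call, which raises on a
    non-zero exit code and therefore stops the loop at the first error.
    """
    for _ in range(rounds):
        # Limit the segment size with -topN (value is an assumption; tune
        # it so that one fetch finishes in roughly an hour or less).
        run([nutch, "generate", crawldb, segments_dir, "-topN", str(top_n)])
        segment = newest_segment(segments_dir)
        run([nutch, "fetch", segment, "-threads", str(threads)])
        run([nutch, "parse", segment])
        run([nutch, "updatedb", crawldb, segment])


if __name__ == "__main__":
    # Hypothetical layout: ./crawldb and ./segments next to bin/nutch.
    crawl("bin/nutch", "./crawldb", "./segments",
          rounds=10, top_n=50000, threads=25)
```

If the fetcher is already configured to parse while fetching, the separate parse step would be dropped. The "Could not find any valid local directory" error in the trace is commonly seen when the Hadoop temporary directory has run out of space, so it may also be worth checking the free space under /tmp/hadoop-myuser before each round; that is a guess from the trace, not something confirmed in this thread.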

