Hi - there's a similar entry already; however, the fetcher.done part doesn't seem to be correct. I can see no reason why that would ever work, as Hadoop temp files are simply not copied to the segment if the job fails. There's also no notion of a fetcher.done file in trunk.
http://wiki.apache.org/nutch/FAQ#How_can_I_recover_an_aborted_fetch_process.3F

-----Original message-----
> From: Lewis John Mcgibbney <[email protected]>
> Sent: Fri 26-Oct-2012 15:15
> To: [email protected]
> Subject: Re: How to recover data from /tmp/hadoop-myuser
>
> I really think this should be in the FAQ's?
>
> http://wiki.apache.org/nutch/FAQ
>
> On Fri, Oct 26, 2012 at 2:10 PM, Markus Jelsma
> <[email protected]> wrote:
> > Hi,
> >
> > You cannot recover the mapper output as far as I know. But anyway, one
> > should never have a fetcher running for three days. It's far better to
> > generate a large number of smaller segments and fetch them sequentially. If
> > an error occurs, only a small portion is affected. We never run fetchers
> > for more than one hour; instead we run many in a row and sometimes
> > concurrently.
> >
> > Cheers,
> >
> > -----Original message-----
> >> From: Mohammad wrk <[email protected]>
> >> Sent: Fri 26-Oct-2012 00:47
> >> To: [email protected]
> >> Subject: How to recover data from /tmp/hadoop-myuser
> >>
> >> Hi,
> >>
> >> My fetch cycle (nutch fetch ./segments/20121021205343/ -threads 25)
> >> failed, after 3 days, with the error below. Under the segment folder
> >> (./segments/20121021205343/) there is only the generated fetch list
> >> (crawl_generate) and no content. However, /tmp/hadoop-myuser/ has 96G of
> >> data. I was wondering if there is a way to recover this data and parse the
> >> segment?
> >>
> >> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
> >>     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
> >>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
> >>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
> >>     at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
> >>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
> >>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
> >>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> >>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> >> 2012-10-24 14:43:29,671 ERROR fetcher.Fetcher - Fetcher: java.io.IOException: Job failed!
> >>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
> >>     at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1318)
> >>     at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1354)
> >>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >>     at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1327)
> >>
> >> Thanks,
> >> Mohammad
> >
>
> --
> Lewis
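For reference, the "many small segments, fetched one after another" approach Markus describes looks roughly like the loop below. This is a minimal sketch, not a tested script: the crawl/crawldb and crawl/segments paths, the -topN value and the number of rounds are all illustrative and need to be adapted to your own crawl layout.

  # Fetch in several short cycles so that a failure only costs one small
  # segment instead of days of work.
  for i in 1 2 3 4 5; do
    bin/nutch generate crawl/crawldb crawl/segments -topN 50000
    # generate names each segment after the current timestamp; pick the newest
    SEGMENT=$(ls -d crawl/segments/2* | tail -1)
    bin/nutch fetch "$SEGMENT" -threads 25
    bin/nutch parse "$SEGMENT"
    bin/nutch updatedb crawl/crawldb "$SEGMENT"
  done

To keep individual rounds short even when slow hosts remain in the queue, the fetcher.timelimit.mins property can also be set (for example to 60); once that many minutes have passed, the fetcher skips the remaining entries and empties its queues, and the segment can be parsed and updated as usual.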

