I really think this should be in the FAQ.

http://wiki.apache.org/nutch/FAQ
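
For what it's worth, an FAQ entry could illustrate the advice quoted below (many small segments, fetched one after another) with a rough sketch along these lines. Only the fetch command and -threads 25 come from this thread. The crawldb and segments paths, the TOPN and ROUNDS values, the NUTCH override variable, and the crawl_rounds function name are placeholders of mine, and the sketch assumes that generate exits non-zero when it produces no new segment.

```shell
#!/bin/sh
# Sketch: run several short generate/fetch rounds instead of one multi-day fetch.
# Assumptions (not from the thread): paths, TOPN, ROUNDS, and the NUTCH override.
NUTCH="${NUTCH:-bin/nutch}"
CRAWLDB="${CRAWLDB:-./crawldb}"
SEGMENTS="${SEGMENTS:-./segments}"
TOPN="${TOPN:-50000}"      # URLs per segment; tune so one fetch stays short
ROUNDS="${ROUNDS:-10}"     # number of small segments to process in a row
THREADS="${THREADS:-25}"

crawl_rounds() {
  i=1
  while [ "$i" -le "$ROUNDS" ]; do
    # Generate a small fetch list; stop if generate reports a failure
    # (assumed to include "nothing left to fetch").
    $NUTCH generate "$CRAWLDB" "$SEGMENTS" -topN "$TOPN" || break
    # Segment names are timestamps, so the last one in sort order is the newest.
    segment="$SEGMENTS/$(ls "$SEGMENTS" | sort | tail -n 1)"
    # If any step fails, only this small segment is affected.
    $NUTCH fetch "$segment" -threads "$THREADS" || return 1
    $NUTCH parse "$segment" || return 1
    $NUTCH updatedb "$CRAWLDB" "$segment" || return 1
    i=$((i + 1))
  done
}

# Only start crawling when explicitly asked to, e.g. "sh thisfile.sh run".
if [ "${1:-}" = "run" ]; then
  crawl_rounds
fi
```

Setting NUTCH=echo gives a partial dry run that prints the generate command and then stops at the segment lookup if the segments directory does not exist yet. A failed round then costs at most one small segment rather than days of fetching.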

On Fri, Oct 26, 2012 at 2:10 PM, Markus Jelsma
<[email protected]> wrote:
> Hi,
>
> You cannot recover the mapper output as far as I know. But anyway, one should 
> never have a fetcher running for three days. It's far better to generate a 
> large amount of smaller segments and fetch them sequentially. If an error 
> occurs, only a small portion is affected. We never run fetchers for more than 
> one hour, instead we run many in a row and sometimes concurrently.
>
> Cheers,
>
>
> -----Original message-----
>> From:Mohammad wrk <[email protected]>
>> Sent: Fri 26-Oct-2012 00:47
>> To: [email protected]
>> Subject: How to recover data from /tmp/hadoop-myuser
>>
>> Hi,
>>
>>
>>
>> My fetch cycle (nutch fetch ./segments/20121021205343/ -threads 25) failed, 
>> after 3 days, with the error below. Under the segment folder 
>> (./segments/20121021205343/) there is only the generated fetch list 
>> (crawl_generate) and no content. However /tmp/hadoop-myuser/ has 96G of 
>> data. I was wondering if there is a way to recover this data and parse the 
>> segment?
>>
>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
>>
>>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
>>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
>>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
>>         at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>> 2012-10-24 14:43:29,671 ERROR fetcher.Fetcher - Fetcher: java.io.IOException: Job failed!
>>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
>>         at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1318)
>>         at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1354)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1327)
>>
>>
>> Thanks,
>> Mohammad



-- 
Lewis
