Hi Andrzej,
Thanks for the tool!
I found one 'map_xxxxxx' directory which matches the date my segment was
created.
It contains a 'part-0.out' file with a timestamp that matches the time
of the last entries in my log file (just before the process stopped).
I followed the preparation steps and ran the tool. However, I got the
following error:
2007-02-27 11:27:00,416 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(120)) - job_sygdrx
java.io.IOException: wrong value class: is not class org.apache.nutch.fetcher.FetcherOutput
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:346)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:58)
        at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:119)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:129)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:91)
2007-02-27 11:27:01,321 WARN util.ToolBase (ToolBase.java:doMain(185)) - Job failed!
Looking at the Hadoop sources, I noticed that the FetcherOutput class
mentioned in this error message is determined by the SequenceFile reader,
which reads the expected value class from the header of the sequence file
itself. That, I think, indicates that the part-00000 file I use as input
for the tool does indeed contain FetcherOutput object(s).
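To double-check that, I put together a small sketch that prints the
key/value class names recorded in the file header. It is just my own
throwaway check, not part of your tool; the class name DumpSeqFileHeader
is made up and I'm assuming the Hadoop 0.x SequenceFile.Reader API here:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

// Prints the key/value class names that a SequenceFile header claims to contain.
public class DumpSeqFileHeader {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.getLocal(conf);      // we run on a local FS
    Path part = new Path(args[0]);                  // e.g. the part-00000 file
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
    System.out.println("key class:   " + reader.getKeyClassName());
    System.out.println("value class: " + reader.getValueClassName());
    reader.close();
  }
}

That should confirm (or refute) my assumption about what the file
actually contains.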
I get the same error when I remove the following line from
LocalFetcherRecover.java:
job.setOutputValueClass(FetcherOutput.class);
Any clues?
Btw, we use Nutch 0.8.1.
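In case it helps, this is how I read the Text -> UTF8 note in your mail
below for our 0.8.1 setup. It is only a sketch of the job configuration
as I understand it, not a literal excerpt from LocalFetcherRecover.java;
the class name RecoverSketch and the paths are placeholders:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.UTF8;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.nutch.fetcher.FetcherOutput;
import org.apache.nutch.util.NutchConfiguration;

// Copies the partial map output (identity map/reduce by default) into a new
// SequenceFile, using UTF8 keys as I believe is appropriate for Nutch 0.8.x.
public class RecoverSketch {
  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(NutchConfiguration.create());
    job.setJobName("recover-fetcher-output");
    job.setInputPath(new Path(args[0]));             // e.g. .../mapred/local/map_xxxxxx
    job.setInputFormat(SequenceFileInputFormat.class);
    job.setOutputPath(new Path(args[1]));            // where the recovered data should go
    job.setOutputFormat(SequenceFileOutputFormat.class);
    job.setOutputKeyClass(UTF8.class);               // Text on trunk, UTF8 on 0.8.x
    job.setOutputValueClass(FetcherOutput.class);    // the line I tried removing
    JobClient.runJob(job);
  }
}

Is that roughly what the tool does, or am I missing a step?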
Thanks,
Mathijs
Andrzej Bialecki wrote:
> Mathijs Homminga wrote:
>> Hi Andrzej,
>>
>> The job stopped because there was no space left on the disk:
>>
>> FATAL fetcher.Fetcher - org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
>> FATAL fetcher.Fetcher - at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileOutputStream.write(LocalFileSystem.java:150)
>> FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:112)
>>
>> We use a local FS. Temporary data is stored in /tmp/hadoop/mapred/
>
>
> Ok, in your case this partial data may be recoverable, but with some
> manual work involved ...
>
> At this stage, I'm assuming that even if you started the reduce phase
> its output won't be usable at all. So, we need to start from the data
> contained in partial map outputs. Map outputs are a set of
> SequenceFiles containing pairs of <Text, FetcherOutput> data. Umm, I
> forgot to ask you - are you running trunk/ or Nutch 0.8? If trunk,
> then use the Text class; if 0.8, replace all occurrences of Text with
> UTF8.
>
> This is such a common problem that I created a special tool to address
> it - please see http://issues.apache.org/jira/browse/NUTCH-451.
>
> Let me repeat what the javadoc says, so that there's no
> misunderstanding: if you use DFS and your fetch job is aborted, there
> is no way in the world to recover the data - it's permanently lost. If
> you run with a local FS, you can try this tool and hope for the best.
>