Hi Andrzej,

Thanks for the tool!

I found one 'map_xxxxxx' directory which matches the date my segment was created. It contains a 'part-0.out' file with a timestamp that matches the time of the last entries in my log file (just before the process stopped).

I followed the preparation steps and ran the tool. However, I got the following error:

2007-02-27 11:27:00,416 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(120)) - job_sygdrx
java.io.IOException: wrong value class: is not class org.apache.nutch.fetcher.FetcherOutput
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:346)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:58)
        at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:119)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:129)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:91)
2007-02-27 11:27:01,321 WARN util.ToolBase (ToolBase.java:doMain(185)) - Job failed!

By looking at the Hadoop sources I noticed that the value class mentioned in this error message is determined by the SequenceFile reader and obtained from the sequence file itself, which, I think, indicates that the part-00000 file I use as input for the tool does indeed contain FetcherOutput object(s). I get the same error when I remove the following line from LocalFetcherRecover.java:
job.setOutputValueClass(FetcherOutput.class);
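For what it's worth, the classes recorded in the file header can be checked directly with a small standalone program. This is only a sketch: it assumes the Hadoop 0.8-era jars are on the classpath and takes the path to the part file as its (hypothetical) first argument.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

// Prints the key/value classes recorded in a SequenceFile header,
// so they can be compared against what the job config expects.
public class DumpSeqFileClasses {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);
        SequenceFile.Reader reader =
            new SequenceFile.Reader(fs, new Path(args[0]), conf);
        try {
            System.out.println("key class:   " + reader.getKeyClass().getName());
            System.out.println("value class: " + reader.getValueClass().getName());
        } finally {
            reader.close();
        }
    }
}
```

If the value class printed here is not FetcherOutput, that would explain the mismatch regardless of what the job config declares.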

Any clues?
Btw, we use Nutch 0.8.1.

Thanks,
Mathijs





Andrzej Bialecki wrote:
Mathijs Homminga wrote:
Hi Andrzej,

The job stopped because there was no space left on the disk:

FATAL fetcher.Fetcher - org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
FATAL fetcher.Fetcher -         at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileOutputStream.write(LocalFileSystem.java:150)
FATAL fetcher.Fetcher -         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:112)

We use a local FS. Temporary data is stored in /tmp/hadoop/mapred/


Ok, in your case this partial data may be recoverable, but with some manual work involved ...

At this stage, I'm assuming that even if you started the reduce phase, its output won't be usable at all. So we need to start from the data contained in the partial map outputs. Map outputs are a set of SequenceFiles containing pairs of <Text, FetcherOutput> data. Umm, forgot to ask you - are you running trunk/ or Nutch 0.8? If trunk, use the Text class; if 0.8, replace all occurrences of Text with UTF8.
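To get a rough idea of how much of a partial map output is still readable, a salvage loop like the following might help. Again, this is only a sketch, not part of the recovery tool itself: it assumes Nutch 0.8 on the classpath (hence UTF8 keys), and the path argument is hypothetical. It reads pairs until the file runs out or the reader hits the truncated tail left by the aborted job.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.UTF8;
import org.apache.nutch.fetcher.FetcherOutput;

// Counts how many <UTF8, FetcherOutput> records can still be read
// from a partial map output before the truncated tail is reached.
public class CountRecoverableRecords {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);
        SequenceFile.Reader reader =
            new SequenceFile.Reader(fs, new Path(args[0]), conf);
        UTF8 key = new UTF8();
        FetcherOutput value = new FetcherOutput();
        long count = 0;
        try {
            while (reader.next(key, value)) {
                count++;
            }
        } catch (java.io.IOException e) {
            // Expected at the point where the aborted job stopped writing.
            System.out.println("hit unreadable tail after " + count + " records: " + e);
        } finally {
            reader.close();
        }
        System.out.println("recoverable records: " + count);
    }
}
```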

This is such a common problem that I created a special tool to address this - please see http://issues.apache.org/jira/browse/NUTCH-451 .

Let me repeat what the javadoc says, so that there's no misunderstanding: if you use DFS and your fetch job is aborted, there is no way in the world to recover the data - it's permanently lost. If you run with a local FS, you can try this tool and hope for the best.

