Hi Andrzej,
Thanks for the tool!
I found one 'map_xxxxxx' directory which matches the date my segment was
created.
It contains a 'part-0.out' file with a timestamp that matches the time
of the last entries in my log file (just before the process stopped).
I followed the preparation steps and ran the tool. However, I got the
following error:
2007-02-27 11:27:00,416 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(120)) - job_sygdrx
java.io.IOException: wrong value class: is not class org.apache.nutch.fetcher.FetcherOutput
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:346)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:58)
        at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:119)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:129)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:91)
2007-02-27 11:27:01,321 WARN util.ToolBase (ToolBase.java:doMain(185)) - Job failed!
By looking at the Hadoop sources, I noticed that the FetcherOutput class
mentioned in this error message is determined by the SequenceFile class
and obtained from the sequence file itself. This, I think, indicates
that the part-00000 file I use as input for the tool does indeed contain
FetcherOutput object(s).
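If it matters, I can double-check what the file header says with
something like this (a rough, untested sketch against the old Hadoop
SequenceFile.Reader API; the class name and path are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class CheckPartFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);   // we run on the local FS
        // placeholder path - substitute the real map_xxxxxx directory
        Path part = new Path("/tmp/hadoop/mapred/map_xxxxxx/part-0.out");
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
        System.out.println("key class:   " + reader.getKeyClassName());
        System.out.println("value class: " + reader.getValueClassName());
        reader.close();
    }
}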
I get the same error when I remove the following line from
LocalFetcherRecover.java:
job.setOutputValueClass(FetcherOutput.class);
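For reference, this is roughly how I would expect the input and output
classes to be wired up in such a job (only a sketch against the old
JobConf API, not the actual LocalFetcherRecover code, so the exact calls
are my guesses):

// Sketch only - not the actual LocalFetcherRecover configuration.
JobConf job = new JobConf(NutchConfiguration.create());
job.setInputFormat(SequenceFileInputFormat.class);  // the partial map output is a SequenceFile
job.setInputValueClass(FetcherOutput.class);        // value class as stored in the file
job.setOutputValueClass(FetcherOutput.class);       // the line I removed in my test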
Any clues?
Btw, we use Nutch 0.8.1.
Thanks,
Mathijs
Andrzej Bialecki wrote:
Mathijs Homminga wrote:
Hi Andrzej,
The job stopped because there was no space left on the disk:
FATAL fetcher.Fetcher - org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
FATAL fetcher.Fetcher - at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileOutputStream.write(LocalFileSystem.java:150)
FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:112)
We use a local FS. Temporary data is stored in /tmp/hadoop/mapred/
Ok, in your case this partial data may be recoverable, but with some
manual work involved ...
At this stage, I'm assuming that even if you started the reduce phase,
its output won't be usable at all. So we need to start from the data
contained in the partial map outputs. Map outputs are a set of
SequenceFiles containing pairs of <Text, FetcherOutput> data. Umm,
forgot to ask you - are you running trunk/ or Nutch 0.8? If trunk, use
the Text class; if 0.8, replace all occurrences of Text with UTF8.
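Something along these lines should let you walk over the pairs in one
part file and see how many complete records it holds (an untested
sketch; the path is just an example, and remember the Text/UTF8
substitution mentioned above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.nutch.fetcher.FetcherOutput;

public class DumpPartialMapOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);       // local FS only!
        // example path - point this at one of the map_xxxxxx part files
        Path part = new Path("/tmp/hadoop/mapred/map_xxxxxx/part-0.out");
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
        Text key = new Text();                           // UTF8 on 0.8
        FetcherOutput value = new FetcherOutput();
        int count = 0;
        try {
            while (reader.next(key, value)) {            // read until the data runs out
                count++;
            }
        } catch (Exception e) {
            // a truncated last record is expected after a crash - stop there
        } finally {
            reader.close();
        }
        System.out.println("found " + count + " complete records");
    }
}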
This is such a common problem that I created a special tool to address
this - please see http://issues.apache.org/jira/browse/NUTCH-451 .
Let me repeat what the javadoc says, so that there's no
misunderstanding: if you use DFS and your fetch job is aborted, there
is no way in the world to recover the data - it's permanently lost. If
you run with a local FS, you can try this tool and hope for the best.