Hi Andrzej,
Thanks for the tool!
I found one 'map_xxxxxx' directory which matches the date my segment was
created.
It contains a 'part-0.out' file with a timestamp that matches the time
of the last entries in my log file (just before the process stopped).
I followed the preparation steps and ran the tool. However, I got the
following error:
2007-02-27 11:27:00,416 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(120)) - job_sygdrx
java.io.IOException: wrong value class: is not class org.apache.nutch.fetcher.FetcherOutput
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:346)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:58)
        at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:119)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:129)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:91)
2007-02-27 11:27:01,321 WARN util.ToolBase (ToolBase.java:doMain(185)) - Job failed!
Looking at the Hadoop sources, I noticed that the FetcherOutput class
mentioned in this error message is determined by the SequenceFile reader,
which reads the expected value class from the header of the sequence file
itself. That, I think, indicates that the part-00000 file I use as input
for the tool does indeed contain FetcherOutput object(s).
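To double-check that, I put together a small sketch that prints the
key/value class names recorded in the file header. It is just my own
throwaway check, not part of your tool; the class name DumpSeqFileHeader
is made up and I'm assuming the Hadoop 0.x SequenceFile.Reader API here:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

// Prints the key/value class names that a SequenceFile header claims to contain.
public class DumpSeqFileHeader {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.getLocal(conf);      // we run on a local FS
    Path part = new Path(args[0]);                  // e.g. the part-00000 file
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
    System.out.println("key class:   " + reader.getKeyClassName());
    System.out.println("value class: " + reader.getValueClassName());
    reader.close();
  }
}

That should confirm (or refute) my assumption about what the file
actually contains.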
I get the same error when I remove the following line from
LocalFetcherRecover.java:
job.setOutputValueClass(FetcherOutput.class);
Any clues?
Btw, we use Nutch 0.8.1.
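In case it helps, this is how I read the Text -> UTF8 note in your mail
below for our 0.8.1 setup. It is only a sketch of the job configuration
as I understand it, not a literal excerpt from LocalFetcherRecover.java;
the class name RecoverSketch and the paths are placeholders:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.UTF8;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.nutch.fetcher.FetcherOutput;
import org.apache.nutch.util.NutchConfiguration;

// Copies the partial map output (identity map/reduce by default) into a new
// SequenceFile, using UTF8 keys as I believe is appropriate for Nutch 0.8.x.
public class RecoverSketch {
  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(NutchConfiguration.create());
    job.setJobName("recover-fetcher-output");
    job.setInputPath(new Path(args[0]));             // e.g. .../mapred/local/map_xxxxxx
    job.setInputFormat(SequenceFileInputFormat.class);
    job.setOutputPath(new Path(args[1]));            // where the recovered data should go
    job.setOutputFormat(SequenceFileOutputFormat.class);
    job.setOutputKeyClass(UTF8.class);               // Text on trunk, UTF8 on 0.8.x
    job.setOutputValueClass(FetcherOutput.class);    // the line I tried removing
    JobClient.runJob(job);
  }
}

Is that roughly what the tool does, or am I missing a step?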
Thanks,
Mathijs
Andrzej Bialecki wrote:
> Mathijs Homminga wrote:
>> Hi Andrzej,
>>
>> The job stopped because there was no space left on the disk:
>>
>> FATAL fetcher.Fetcher - org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
>> FATAL fetcher.Fetcher - at org.apache.hadoop.fs.LocalFileSystem$LocalFSFileOutputStream.write(LocalFileSystem.java:150)
>> FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:112)
>>
>> We use a local FS. Temporary data is stored in /tmp/hadoop/mapred/
>
>
> Ok, in your case this partial data may be recoverable, but with some
> manual work involved ...
>
> At this stage, I'm assuming that even if you started the reduce phase
> its output won't be usable at all. So, we need to start from the data
> contained in partial map outputs. Map outputs are a set of
> SequenceFiles containing pairs of <Text, FetcherOutput> data. Umm, I
> forgot to ask you - are you running trunk/ or Nutch 0.8? If trunk,
> then use the Text class; if 0.8, replace all occurrences of Text with
> UTF8.
>
> This is such a common problem that I created a special tool to address
> it - please see http://issues.apache.org/jira/browse/NUTCH-451.
>
> Let me repeat what the javadoc says, so that there's no
> misunderstanding: if you use DFS and your fetch job is aborted, there
> is no way in the world to recover the data - it's permanently lost. If
> you run with a local FS, you can try this tool and hope for the best.
>