On 9/18/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Tim Gautier wrote:
>> I ran a fetch on a fetch list of around 3 million URLs and it has
>> failed on a single reduce task. Is there any way to recover the data
>> that's been pulled down already? It's my understanding that the pages
>> have all been pulled down to disk at this point, and since it takes 3
>> days to pull them down, I'd really like to avoid doing it again.
> Did you use DFS, or did you run this on a single machine?

Tim Gautier wrote:
> I used DFS.
Then your data is probably lost, sorry. The only possible recovery path
requires shutting down the jobtracker and tasktrackers immediately and
then moving the map outputs to a different location. If you kept the
jobtracker and tasktrackers running, though, by now they have already
deleted the map outputs.
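For reference, if you did catch it in time, preserving the intermediate
data would look roughly like this, run on every tasktracker node. The
local directory depends on your hadoop.tmp.dir / mapred.local.dir
settings and on the Hadoop version, so the paths below are only
illustrative:

  # stop the MapReduce daemons before they clean up the job directories
  bin/stop-mapred.sh

  # copy the node's local map output area somewhere safe (example path;
  # the default is ${hadoop.tmp.dir}/mapred/local)
  cp -r /tmp/hadoop-${USER}/mapred/local /some/backup/location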
If you did have the map outputs, you would then have to process them by
applying just the reduce part of the fetcher to them, which requires
writing some custom code.
All in all, I think it's easier to re-run the fetch job. By the way, are
you running the fetcher in parsing mode? That is by far the most common
cause of failed reduce tasks. I strongly recommend running the fetcher
with the -noParsing flag and running a parse job separately.
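For example (the segment path and thread count below are only
placeholders; check the usage output of bin/nutch fetch for your
version):

  # fetch without parsing
  bin/nutch fetch crawl/segments/20070918123456 -threads 10 -noParsing

  # then parse the fetched content in a separate job
  bin/nutch parse crawl/segments/20070918123456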
--
Best regards,
Andrzej Bialecki <><
  ___. ___ ___ ___ _ _   __________________________________
 [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
 ___|||__||  \|  || |   Embedded Unix, System Integration
 http://www.sigram.com  Contact: info at sigram dot com