When a machine dies in a Hadoop cluster, the data stored under mapreduce.job.local.dir is 
presumably lost. This includes mapper output files as well as reducer 
input files. In addition, any data held in the Shuffle Fetcher's in-memory buffers is lost.
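For concreteness, here is a minimal sketch (assuming the standard Hadoop MapReduce Java API; the class name and pass-through map logic are illustrative only) of how a task can look up that job-local directory, which lives on the worker node's local disk rather than on HDFS:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class LocalDirLoggingMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {

        @Override
        protected void setup(Context context) {
            // The framework sets this per task attempt; it is a path on the
            // local filesystem of the node running the task, not on HDFS.
            String localDir =
                context.getConfiguration().get("mapreduce.job.local.dir");
            System.out.println("Job-local dir for this task attempt: " + localDir);
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Pass-through map; the point of the sketch is the setup() lookup above.
            context.write(value, new LongWritable(1));
        }
    }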

How do we ensure that this lost data is re-processed when a machine dies?

Thanks,

Arwin
