Adding to what Jungi Jeong said, if you can get your hands on the book *Hadoop: The Definitive Guide* by Tom White, that would help as well, as it explains this in significant detail.
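
For what it is worth, here is a rough Java sketch of the flow Jungi describes in the quoted message: a re-launched reduce attempt has nothing on its local disk, so it asks for every completed map's output again over HTTP. This is not Hadoop code. The class and method names are hypothetical, and the `/mapOutput` path, the query parameter names, and port 50060 are my assumptions for illustration; look at MapOutputServlet in TaskTracker.java for the real details.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: none of these names are real Hadoop classes.
class ShuffleRefetchSketch {

    // One finished map attempt and the host whose local disk holds its output.
    // In the description above, the JobTracker is the source of this information.
    static final class CompletedMap {
        final String mapAttemptId;
        final String host;

        CompletedMap(String mapAttemptId, String host) {
            this.mapAttemptId = mapAttemptId;
            this.host = host;
        }
    }

    // Builds the URL a reducer would request from the TaskTracker on the map host.
    // The path and parameter names are assumed, not taken from the Hadoop source.
    static String fetchUrl(String host, int httpPort, String jobId,
                           String mapAttemptId, int partition) {
        return "http://" + host + ":" + httpPort
                + "/mapOutput?job=" + jobId
                + "&map=" + mapAttemptId
                + "&reduce=" + partition;
    }

    // A re-launched reduce attempt starts from scratch: it plans one fetch
    // per completed map, regardless of what the failed attempt had copied.
    static List<String> planFetches(String jobId, int partition, int httpPort,
                                    List<CompletedMap> completed) {
        List<String> urls = new ArrayList<String>();
        for (CompletedMap m : completed) {
            urls.add(fetchUrl(m.host, httpPort, jobId, m.mapAttemptId, partition));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up job, attempt, and host names for the example.
        List<CompletedMap> completed = Arrays.asList(
                new CompletedMap("attempt_201407030001_0001_m_000000_0", "node1"),
                new CompletedMap("attempt_201407030001_0001_m_000001_0", "node2"));

        // Reduce partition 3 was rescheduled on a new node; list what it must pull.
        for (String url : planFetches("job_201407030001_0001", 3, 50060, completed)) {
            System.out.println(url);
        }
    }
}
```

The point of the sketch is only that the new node needs nothing from the failed reducer: the job ID, the map attempt IDs, and its own partition number are enough to request each piece again.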

Regards,
Shahab

On Thu, Jul 3, 2014 at 6:29 AM, Jungi Jeong <[email protected]> wrote:

> As far as I know, map outputs are stored on the local disks of the nodes
> where the map tasks ran, and the paths to the map outputs can be
> constructed from username / jobId / taskId (even after the map tasks
> have terminated).
> The information about map outputs (which maps are done and where they
> are located) is available from the JobTracker, so the TaskTracker
> fetches it from the JobTracker and keeps it until the job finishes.
> The newly launched reduce task asks the TaskTracker, which can construct
> the path to the map outputs from jobId and taskId, to transfer the
> corresponding map outputs, and the data is transferred over an HTTP
> connection (for details, look for the class MapOutputServlet in
> TaskTracker.java).
>
> I hope this answers your question.
> - Jungi
>
>
> On 3 July 2014 18:59, James Teng <[email protected]> wrote:
>
>> Hi,
>> thanks for your quick reply.
>> Could you please explain in a bit more detail? For example, how does
>> the new reducer node find out which map nodes have to transfer data to
>> it, and how does it communicate with them to transfer the data?
>> Or by what mechanism is the data copied?
>>
>> James.
>> ------------------------------
>> Date: Thu, 3 Jul 2014 16:52:57 +0800
>> Subject: Re: How to recover reducer task data on a different data node?
>> From: [email protected]
>> To: [email protected]
>>
>>
>> It will start from scratch and copy all map outputs from all mapper
>> nodes.
>>
>> Regards,
>> *Stanley Shi,*
>>
>>
>>
>> On Thu, Jul 3, 2014 at 2:28 PM, James Teng <[email protected]>
>> wrote:
>>
>> First, I would like to say that although I am not new to Hadoop, I am
>> not an expert on it either.
>> I would like to ask about one issue in the MapReduce framework. Below
>> is a description of the scenario.
>>
>> When a reduce task fails on one datanode, the job tracker will try to
>> schedule it on another node and continue running. My question is how
>> the assigned data gets back onto the new node.
>> When the map phase is done, the output data is copied to the
>> respective partitioned reducer. Now if the reduce task is created on a
>> new node, what actions does the new node take to get all the
>> map-allocated data back?
>>
>>
>> Thanks in advance.
>>
>> James.
>>
>>
>>
>
>
> --
> Jungi Jeong
> M.S Candidate, Computer Architecture Lab.
> Div. of Computer Science, KAIST
>
