OK, got it. Thanks, Shahab and Jungi, for your helpful replies. :)

Date: Thu, 3 Jul 2014 07:40:20 -0400
Subject: Re: How to recover reducer task data on a different data node?
From: [email protected]
To: [email protected]

Adding to what Jungi Jeong said, if you can get your hands on the book Hadoop: 
The Definitive Guide by Tom White, that would help as well, as it 
explains this in significant detail.

Regards,
Shahab

On Thu, Jul 3, 2014 at 6:29 AM, Jungi Jeong <[email protected]> wrote:

As far as I know, map outputs are stored on the local disks of the nodes where 
the map tasks executed, and the paths to the map outputs can be constructed 
from the username, jobId, and taskId (even after the map tasks have terminated).

The information about map outputs (which maps are done and where their outputs 
are located) is available from the JobTracker, so the TaskTracker fetches it 
from the JobTracker and keeps it until the job finishes.

The newly launched reduce task asks the TaskTracker, which can build the path 
to the map outputs from the jobId and taskId, to transfer the corresponding map 
outputs. The data is transferred over an HTTP connection (for details, look for 
the class MapOutputServlet in TaskTracker.java).
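
To make the steps above concrete, here is a minimal sketch in Java. It is not 
Hadoop source code: the class and method names (MapOutputFetchSketch, 
MapLocation, fetchUrl, fetchPlan) are hypothetical, and the URL layout 
(/mapOutput?job=...&map=...&reduce=...) and port 50060 are my assumptions about 
Hadoop 1.x that should be checked against TaskTracker.java. It only shows the 
idea that a re-launched reducer holds no local copies and therefore builds one 
fetch request per completed map task.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and URL layout are assumptions, not Hadoop's API.
class MapOutputFetchSketch {

    // Hypothetical record of one completed map task: the HTTP address of the
    // TaskTracker that ran it, and the ID of the map attempt.
    static final class MapLocation {
        final String trackerHttp;   // e.g. "http://node1:50060" (assumed port)
        final String mapAttemptId;

        MapLocation(String trackerHttp, String mapAttemptId) {
            this.trackerHttp = trackerHttp;
            this.mapAttemptId = mapAttemptId;
        }
    }

    // Build the request for one map output. The serving TaskTracker is expected
    // to resolve job + map IDs to a local file and return only the requested
    // reduce partition.
    static String fetchUrl(MapLocation loc, String jobId, int partition) {
        return loc.trackerHttp + "/mapOutput?job=" + jobId
                + "&map=" + loc.mapAttemptId
                + "&reduce=" + partition;
    }

    // A reducer restarted on a new node starts from scratch: one request per
    // completed map, regardless of what the failed attempt had already copied.
    static List<String> fetchPlan(List<MapLocation> completedMaps,
                                  String jobId, int partition) {
        List<String> plan = new ArrayList<>();
        for (MapLocation loc : completedMaps) {
            plan.add(fetchUrl(loc, jobId, partition));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<MapLocation> maps = new ArrayList<>();
        maps.add(new MapLocation("http://node1:50060",
                "attempt_201407030001_0001_m_000000_0"));
        maps.add(new MapLocation("http://node2:50060",
                "attempt_201407030001_0001_m_000001_0"));

        // Partition 2 is the reduce task that was re-launched on a new node.
        for (String url : fetchPlan(maps, "job_201407030001_0001", 2)) {
            System.out.println(url);
        }
    }
}
```

In the real framework the list of completed maps comes from the JobTracker (via 
the TaskTracker), as described above; here it is simply hard-coded.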


I hope this answers your question.
- Jungi

On 3 July 2014 18:59, James Teng <[email protected]> wrote:





Hi, thanks for your quick reply. Could you please explain in a bit more detail? 
For example, how does the new reducer node find out which map nodes have to 
transfer data to it, and how does it communicate with them to transfer the data? 
In other words, by what mechanism is the data copied?

James.
Date: Thu, 3 Jul 2014 16:52:57 +0800
Subject: Re: How to recover reducer task data on a different data node?
From: [email protected]


To: [email protected]

It will start from scratch and copy all map outputs from all mapper nodes.

Regards,
Stanley Shi




On Thu, Jul 3, 2014 at 2:28 PM, James Teng <[email protected]> wrote:






First, I would like to say that although I am not new to Hadoop, I am not an 
expert on it either. I would like to ask about one issue in the MapReduce 
framework. The scenario is described below.



When a reduce task fails on one datanode, the JobTracker will try to schedule 
the reduce task on another node and continue running. My question is: how does 
the new node get the assigned data back? When the map phase is done, the output 
data is copied to the respective partitioned reducer. If the reduce task is now 
created on a new node, what actions does the new node take to get all of its 
map-allocated data back?




Thanks in advance.
James.

                                          


-- 
Jungi Jeong


M.S Candidate, Computer Architecture Lab.
Div. of Computer Science, KAIST


                                          
