Re: How to recover reducer task data on a different data node?

Shahab Yunus Thu, 03 Jul 2014 04:41:23 -0700

Adding to what Jungi Jeong said: if you can get your hands on the book
*Hadoop: The Definitive Guide* by Tom White, that would help as well, as
it explains this in significant detail.
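
To make Jungi's description (quoted below) a bit more concrete, here is a rough
sketch in Python. It is not Hadoop code. The node names, IDs, and port 50060
are made up, and both the directory layout and the
/mapOutput?job=...&map=...&reduce=... query format are my recollection of
Hadoop 1.x, so treat them as assumptions and check TaskTracker.java for the
real thing.

```python
from urllib.parse import urlencode

# Hypothetical view of what the JobTracker reports: finished map attempt ->
# HTTP address of the TaskTracker that ran it (port 50060 is assumed here).
COMPLETED_MAPS = {
    "attempt_201407030001_0001_m_000000_0": "node1:50060",
    "attempt_201407030001_0001_m_000001_0": "node2:50060",
}


def local_output_dir(local_dir, user, job_id, map_attempt):
    """Directory on the map node's local disk, built from user / jobId / taskId.

    The layout below is an assumption for illustration, not a guaranteed one.
    """
    parts = [local_dir, "taskTracker", user, "jobcache", job_id, map_attempt, "output"]
    return "/".join(parts)


def map_output_urls(job_id, reduce_partition, completed_maps):
    """One fetch URL per finished map, for the given reduce partition."""
    urls = []
    for map_attempt, tracker in sorted(completed_maps.items()):
        query = urlencode({"job": job_id, "map": map_attempt, "reduce": reduce_partition})
        urls.append("http://" + tracker + "/mapOutput?" + query)
    return urls


if __name__ == "__main__":
    # A reduce attempt restarted on another node asks for the same partition,
    # so it gets the same list and copies every map output again.
    for url in map_output_urls("job_201407030001_0001", 3, COMPLETED_MAPS):
        print(url)
```

Note that the URL list depends only on the job, the finished maps, and the
reduce partition number, not on which node the reducer runs on. That is why a
rescheduled reduce attempt can simply fetch everything again from the map
nodes.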


Regards,
Shahab


On Thu, Jul 3, 2014 at 6:29 AM, Jungi Jeong <[email protected]>
wrote:

> As far as I know, map outputs are stored on the local disks of the nodes
> where the map tasks ran,
> and the paths to the map outputs can be constructed from username / jobId /
> taskId (even after the map tasks have terminated).
> Information about the map outputs (which maps are done and where they are
> located) is available from the JobTracker, so the TaskTracker fetches it
> from the JobTracker and keeps it until the job finishes.
> The newly launched reduce task asks the TaskTracker, which can construct the
> path to the map outputs from jobId and taskId, to transfer the corresponding
> map outputs, and the data is transferred over an HTTP connection (for
> details, look for the class MapOutputServlet in TaskTracker.java).
>
> I hope this can answer your question.
> - Jungi
>
>
> On 3 July 2014 18:59, James Teng <[email protected]> wrote:
>
>> Hi,
>> Thanks for your quick reply.
>> Could you please explain in a bit more detail? For example, how does the
>> new reducer node find out which map nodes have to transfer data to it, and
>> how does it communicate with them to transfer the data?
>> In other words, by what mechanism is the data copied?
>>
>> James.
>> ------------------------------
>> Date: Thu, 3 Jul 2014 16:52:57 +0800
>> Subject: Re: How to recover reducer task data on a different data node?
>> From: [email protected]
>> To: [email protected]
>>
>>
>> It will start from scratch and copy all map outputs from all mapper nodes.
>>
>> Regards,
>> *Stanley Shi,*
>>
>>
>>
>> On Thu, Jul 3, 2014 at 2:28 PM, James Teng <[email protected]>
>> wrote:
>>
>> First, I should say that although I am not new to Hadoop, I am not an
>> expert on it either.
>> I would like to ask about one aspect of the MapReduce framework. The
>> scenario is described below.
>>
>> When a reduce task fails on one datanode, the JobTracker will
>> schedule the reduce task on another node and continue
>> running. My question is: how does the new node get the assigned data back?
>> When the map phase is done, the output data is copied to the
>> respective reducer for each partition. If the reduce task is now created
>> on a new node, what actions does the new node take to get back all the
>> data the maps allocated to it?
>>
>>
>> thanks in advance.
>>
>> James.
>>
>>
>>
>
>
> --
> Jungi Jeong
> M.S Candidate, Computer Architecture Lab.
> Div. of Computer Science, KAIST
>
