I forgot to mention: to observe the behavior of a single task, I ran just one
map task on a 1 GB input split (I set the block size to 1 GB).
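Here is a quick sketch of why a 1 GB block size gives exactly one map task for a 1 GB file. It is a simplified, standalone Java model, not Hadoop code. It assumes the split-size rule `max(minSize, min(maxSize, blockSize))` and the 10% "slop" that Hadoop's `FileInputFormat` applies when it cuts a file into splits. The class `SplitSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Same rule as FileInputFormat.computeSplitSize:
    // the split size is the block size, clamped to [minSize, maxSize].
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Hypothetical helper: cut a file into {offset, length} splits.
    // SPLIT_SLOP lets the last split be up to 10% larger instead of
    // producing a tiny trailing split.
    static List<long[]> splits(long fileLength, long splitSize) {
        final double SPLIT_SLOP = 1.1;
        List<long[]> result = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            result.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        if (remaining != 0) {
            result.add(new long[] {fileLength - remaining, remaining});
        }
        return result;
    }

    public static void main(String[] args) {
        long oneGb = 1L << 30;
        long splitSize = computeSplitSize(oneGb, 1, Long.MAX_VALUE);
        // 1 GB file with 1 GB blocks: one split, so one map task.
        System.out.println(splits(oneGb, splitSize).size());
        // The same file with 128 MB blocks would give eight splits.
        System.out.println(splits(oneGb, computeSplitSize(128L << 20, 1, Long.MAX_VALUE)).size());
    }
}
```

The split list is computed once in the client. Each map task then only needs its own `{offset, length}` pair, which is why the map process can simply seek to that offset and start reading.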


2013/5/9 Robert Evans <[email protected]>

> Deciding on the input split happens in the client.  Each map process just
> opens the input file and seeks to the appropriate offset in the file.
> At that point it reads each entry one at a time and passes it to the map
> function.  The output of the map function is placed in a buffer.  When the
> buffer gets close to full, the data is sorted and spilled out to disk in
> parallel, while the map task keeps running.  It is hard to break down CPU
> time across the different parts because they all happen in parallel.  If
> you do have enough RAM to hold the entire output in memory, and you have
> configured your sort buffer to hold it all, then you will probably only
> sort/spill once.
>
> --Bobby
>
> On 5/8/13 10:25 AM, "牛兆捷" <[email protected]> wrote:
>
> >I looked at the application container logs to trace the MapReduce
> >application.
> >
> >For a map task, I find there are mainly three phases: split input, sort,
> >and spill out.
> >I set enough memory to make sure the input can stay in memory.
> >
> >Initially, I thought the highest CPU utilization would appear in the sort
> >phase, because the other two phases are IO-bound. However, it doesn't
> >behave as I expected. On the contrary, CPU utilization during the other
> >phases is higher.
> >
> >Does anyone know the reason?
> >
> >--
> >*Sincerely,*
> >*Zhaojie*
>
>
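Bobby's description of the map-side buffer can be shown with a toy model. This sketch simplifies heavily. Capacity is counted in records rather than bytes. The sort and spill run inline, whereas Hadoop hands spills triggered during map() to a separate spill thread. The class `SpillSketch` is hypothetical. In Hadoop 2.x the corresponding settings are `mapreduce.task.io.sort.mb` (buffer size) and `mapreduce.map.sort.spill.percent` (the fill fraction that triggers a spill).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpillSketch {
    // Toy model of the map output buffer: records are appended until the
    // buffer reaches spillPercent of its capacity, then sorted and "spilled".
    // Simplification: capacity is in records, not bytes, and the spill runs
    // on the calling thread instead of a background spill thread.
    private final int capacity;
    private final double spillPercent;
    private final List<Integer> buffer = new ArrayList<>();
    private final List<int[]> spills = new ArrayList<>();

    SpillSketch(int capacity, double spillPercent) {
        this.capacity = capacity;
        this.spillPercent = spillPercent;
    }

    // Called once per map output record.
    void collect(int key) {
        buffer.add(key);
        if (buffer.size() >= capacity * spillPercent) {
            sortAndSpill();
        }
    }

    // Sort the buffered keys and keep them as one sorted run ("spill file").
    private void sortAndSpill() {
        if (buffer.isEmpty()) {
            return;
        }
        int[] run = buffer.stream().mapToInt(Integer::intValue).toArray();
        Arrays.sort(run);
        spills.add(run);
        buffer.clear();
    }

    // Called when the map finishes: whatever is left gets one final spill.
    // Returns the total number of spills.
    int flush() {
        sortAndSpill();
        return spills.size();
    }

    public static void main(String[] args) {
        SpillSketch small = new SpillSketch(100, 0.8);  // spills every 80 records
        SpillSketch large = new SpillSketch(2000, 0.8); // threshold 1600 records
        for (int i = 1000; i > 0; i--) {
            small.collect(i);
            large.collect(i);
        }
        System.out.println("small buffer spills: " + small.flush()); // 13
        System.out.println("large buffer spills: " + large.flush()); // 1
    }
}
```

With 1000 records, the small buffer spills 12 times during the run and once more on flush. The large buffer never reaches its threshold, so only the final flush sorts and writes. That is the "sort/spill once" case described above.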


-- 
*Sincerely,*
*Zhaojie*
