I have enough memory, so there will be only one sort and spill. Why would
they still happen in parallel?


2013/5/9 Robert Evans <[email protected]>

> Yes, it all happens in parallel, even on a single task.
>
> On 5/8/13 11:17 AM, "牛兆捷" <[email protected]> wrote:
>
> >I forgot to say: to see the behavior of a single task, I ran just one map
> >task on a 1 GB input split (I set the block size to 1 GB).
> >
> >
> >2013/5/9 Robert Evans <[email protected]>
> >
> >> Deciding on the input split happens in the client.  Each map process just
> >> opens up the input file and seeks to the appropriate offset in the file.
> >> At that point it reads each entry one at a time and sends it to the map
> >> task.  The output of the map task is placed in a buffer.  When the buffer
> >> gets close to full, the data is sorted and spilled out to disk in parallel
> >> with the map task still running.  It is hard to get CPU time for the
> >> different parts because they are all happening in parallel.  If you do have
> >> enough RAM to store the entire output in memory and you have configured
> >> your sort buffer to be able to hold it all, then you will probably only
> >> sort/spill once.
> >>
> >> --Bobby
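
A minimal sketch of the buffer/sort/spill overlap described above. It is a
toy model only and assumes nothing about Hadoop's real internals: the class
SpillSketch, its methods, and the record values are hypothetical, and the
buffer is a plain list swapped out at the threshold rather than Hadoop's
actual in-memory buffer. In Hadoop 2 the corresponding knobs are, as far as I
know, mapreduce.task.io.sort.mb (buffer size) and
mapreduce.map.sort.spill.percent (fill level that triggers a spill); the
sketch only imitates them with its bufferCapacity and spillPercent arguments.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SpillSketch {

    /** Runs a toy "map task" and returns the sorted runs ("spills") it produced. */
    static List<List<Integer>> run(int records, int bufferCapacity, double spillPercent) {
        // Hypothetical stand-in for the spill threshold: spill once the buffer
        // is this full, not when it is completely full.
        int threshold = (int) (bufferCapacity * spillPercent);
        ExecutorService spillThread = Executors.newSingleThreadExecutor();
        List<Future<List<Integer>>> pending = new ArrayList<>();
        List<Integer> buffer = new ArrayList<>();
        Random input = new Random(42); // fake input records

        try {
            for (int i = 0; i < records; i++) {
                buffer.add(input.nextInt(1_000_000)); // the "map" emits one record
                if (buffer.size() >= threshold) {
                    // Hand the full buffer to the background thread and keep
                    // mapping into a fresh one: sort/spill overlaps with map.
                    final List<Integer> full = buffer;
                    buffer = new ArrayList<>();
                    Callable<List<Integer>> job = () -> sortRun(full);
                    pending.add(spillThread.submit(job));
                }
            }
            if (!buffer.isEmpty()) {
                // Final flush: whatever is left is sorted and spilled once.
                final List<Integer> rest = buffer;
                Callable<List<Integer>> job = () -> sortRun(rest);
                pending.add(spillThread.submit(job));
            }
            List<List<Integer>> spills = new ArrayList<>();
            for (Future<List<Integer>> f : pending) {
                spills.add(f.get()); // wait for each background sort to finish
            }
            return spills;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            spillThread.shutdown();
        }
    }

    /** Sorts one buffer in place; a real spill would also write it to disk. */
    static List<Integer> sortRun(List<Integer> run) {
        Collections.sort(run);
        return run;
    }

    public static void main(String[] args) {
        System.out.println("small buffer spills: " + run(1000, 100, 0.8).size());
        System.out.println("large buffer spills: " + run(1000, 2000, 0.8).size());
    }
}
```

With 1000 records and a threshold of 80, the map loop hands off 12 full
buffers while it keeps running, plus one final flush. With a buffer large
enough to hold everything, the only sort/spill is the final flush, so in this
toy model nothing overlaps with the map loop in that case.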
> >>
> >> On 5/8/13 10:25 AM, "牛兆捷" <[email protected]> wrote:
> >>
> >> >I looked at the application container log to trace the map-reduce
> >> >application.
> >> >
> >> >For a map task, I find there are mainly 3 phases: split input, sort, and
> >> >spill out.
> >> >I set enough memory to make sure the input can stay in memory.
> >> >
> >> >Initially, I thought the highest CPU utilization would appear in the sort
> >> >phase, because the other two phases focus on IO. However, it doesn't
> >> >behave as I expected. On the contrary, the CPU utilization during the
> >> >other phases is higher.
> >> >
> >> >Does anyone know the reason?
> >> >
> >> >--
> >> >*Sincerely,*
> >> >*Zhaojie*
> >>
> >>
> >
> >
> >--
> >*Sincerely,*
> >*Zhaojie*
>
>


-- 
*Sincerely,*
*Zhaojie*