Re: question about cpu utilization

牛兆捷 Wed, 08 May 2013 18:54:58 -0700

Thanks~


2013/5/9 Robert Evans <[email protected]>

> Then I am really not sure what is happening.  Try profiling your task.
>
> --Bobby
>
> On 5/8/13 11:48 AM, "牛兆捷" <[email protected]> wrote:
>
> >To keep things simple, I ran only one map task on about 256 MB of input
> >and set io.sort.mb to more than 512 MB so that all of the input can stay
> >in memory. I also checked the log to make sure that only one spill
> >happens, for the final flush.
> >
> >So I think the different parts run one after another, but the CPU
> >utilization is not what I expected.
> >
> >
> >2013/5/9 牛兆捷 <[email protected]>
> >
> >> I have enough memory, so there will be only one sort and spill. Why
> >> would they happen in parallel?
> >>
> >>
> >> 2013/5/9 Robert Evans <[email protected]>
> >>
> >>> Yes, it all happens in parallel, even within a single task.
> >>>
> >>> On 5/8/13 11:17 AM, "牛兆捷" <[email protected]> wrote:
> >>>
> >>> >I forgot to say: to see the behavior of a single task, I just ran one
> >>> >map task on a 1 GB input split (I set the block size to 1 GB).
> >>> >
> >>> >
> >>> >2013/5/9 Robert Evans <[email protected]>
> >>> >
> >>> >> Deciding on the input split happens in the client.  Each map
> >>> >> process just opens up the input file and seeks to the appropriate
> >>> >> offset in the file.  At that point it reads each entry one at a
> >>> >> time and sends it to the map task.  The output of the map task is
> >>> >> placed in a buffer.  When the buffer gets close to full the data is
> >>> >> sorted and spilled out to disk in parallel with the map task still
> >>> >> running.  It is hard to get CPU time for the different parts
> >>> >> because they are all happening in parallel.  If you do have enough
> >>> >> RAM to store the entire output in memory and you have configured
> >>> >> your sort buffer to be able to hold it all then you will probably
> >>> >> only sort/spill once.
> >>> >>
> >>> >> --Bobby
> >>> >>
> >>> >> On 5/8/13 10:25 AM, "牛兆捷" <[email protected]> wrote:
> >>> >>
> >>> >> >I looked at the application container log to trace the MapReduce
> >>> >> >application.
> >>> >> >
> >>> >> >For the map task, I find there are mainly three phases: splitting
> >>> >> >the input, sorting, and spilling out.
> >>> >> >I set enough memory to make sure the input can stay in memory.
> >>> >> >
> >>> >> >Initially, I thought the highest CPU utilization would appear in
> >>> >> >the sort phase, because the other two phases focus on I/O.
> >>> >> >However, it does not behave as I expected. On the contrary, the
> >>> >> >CPU utilization during the other two phases is higher.
> >>> >> >
> >>> >> >Does anyone know the reason?
> >>> >> >
> >>> >> >--
> >>> >> >*Sincerely,*
> >>> >> >*Zhaojie*
> >>> >> >*
> >>> >> >*
> >>> >>
> >>> >>
> >>> >
> >>> >
> >>> >--
> >>> >*Sincerely,*
> >>> >*Zhaojie*
> >>> >*
> >>> >*
> >>>
> >>>
> >>
> >>
> >> --
> >> *Sincerely,*
> >> *Zhaojie*
> >> *
> >> *
> >>
> >
> >
> >
> >--
> >*Sincerely,*
> >*Zhaojie*
> >*
> >*
>
>
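
A rough way to check the "only sort/spill once" expectation from the thread
is to estimate the spill count from the output size, the sort buffer size,
and the spill threshold. The sketch below is a simplified model, not Hadoop's
actual accounting: the class and method names are made up for illustration,
the 0.80 threshold is assumed to mirror the io.sort.spill.percent default,
and the buffer space reserved for record metadata is ignored.

```java
/** Hypothetical helper: a simplified model of map-side spilling. */
final class SpillEstimate {
    private SpillEstimate() {}

    /**
     * Estimates how many spills a single map task performs.
     * Assumes a spill starts each time the buffered output reaches
     * bufferBytes * spillPercent, and that any leftover data is written
     * by one final flush. Record metadata overhead is ignored.
     */
    static long estimateSpills(long outputBytes, long bufferBytes, double spillPercent) {
        if (outputBytes <= 0) {
            return 0; // nothing was emitted, so nothing is spilled
        }
        long threshold = (long) (bufferBytes * spillPercent);
        if (threshold <= 0) {
            throw new IllegalArgumentException("buffer or spill percent too small");
        }
        long fullSpills = outputBytes / threshold;   // spills triggered by the threshold
        long remainder = outputBytes % threshold;    // data left for the final flush
        return fullSpills + (remainder > 0 ? 1 : 0);
    }
}
```

Assuming the map output is about as large as the input, 256 MB of output with
a 512 MB buffer gives estimateSpills(256L << 20, 512L << 20, 0.80) == 1 under
this model, which matches the single spill seen in the log. The same call with
1 GB of output and a 100 MB buffer gives 13.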


-- 
*Sincerely,*
*Zhaojie*
*
*
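
If the map function and the spill run on different threads of the task JVM,
as the description of parallel spilling in the thread suggests, then
process-level CPU utilization mixes the two together. One way to separate
them is to read per-thread CPU time. The sketch below uses only the JDK's
java.lang.management API. The class and method names are hypothetical, and
the sort workload is a stand-in, not code taken from Hadoop.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch: attribute CPU time to one thread at a time. */
final class ThreadCpuSketch {
    private ThreadCpuSketch() {}

    /**
     * Runs the task on a fresh thread and returns the CPU time in
     * nanoseconds that thread used, or -1 if the JVM cannot measure it.
     */
    static long cpuNanosOf(Runnable task) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isCurrentThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return -1;
        }
        long[] used = new long[1];
        Thread worker = new Thread(() -> {
            long start = mx.getCurrentThreadCpuTime();
            task.run();
            used[0] = mx.getCurrentThreadCpuTime() - start;
        }, "measured-worker");
        worker.start();
        try {
            worker.join(); // join() makes the worker's write to used[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return used[0];
    }

    /** CPU-bound stand-in for a sort phase: sorts n pseudo-random ints. */
    static Runnable sortWork(int n) {
        return () -> {
            int[] data = new Random(42).ints(n).toArray();
            Arrays.sort(data);
        };
    }
}
```

Comparing cpuNanosOf(sortWork(1_000_000)) with the same measurement around an
I/O-heavy Runnable gives a per-thread view that overall CPU utilization does
not. For a real map task, a profiler attached to the task JVM would be the
more practical route to the same per-thread breakdown.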
