CPU scheduling is still somewhat fuzzy.  Your request is expressed in
virtual cores, which do not necessarily correspond to actual physical
cores.  In some cases Linux cgroups may be used to guarantee that you
will get at least a certain share of CPU time, but nothing I am aware of
right now will actually bind the process to a given core.
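
For illustration, a minimal sketch of what a request expressed in
virtual cores can look like, assuming the Hadoop 2.x AMRMClient API (the
memory and vcore values below are only placeholders):

    // Sketch of a YARN container request; a vcore is a scheduling unit,
    // not a binding to a specific physical core.
    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

    public class VcoreRequestSketch {
      public static ContainerRequest buildRequest() {
        // 1024 MB of memory and 1 virtual core (placeholder values).
        Resource capability = Resource.newInstance(1024, 1);
        Priority priority = Priority.newInstance(0);
        // null node/rack lists mean "anywhere in the cluster".
        return new ContainerRequest(capability, null, null, priority);
      }
    }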

--Bobby

On 5/8/13 11:55 PM, "牛兆捷" <[email protected]> wrote:

>By the way, if I set the container CPU to less than 1, what will happen?
>Can many containers share one core?
>
>
>2013/5/9 Robert Evans <[email protected]>
>
>> Then I am really not sure what is happening.  Try profiling your task.
>>
>> --Bobby
>>
>> On 5/8/13 11:48 AM, "牛兆捷" <[email protected]> wrote:
>>
>> >Just for simplicity, I run only one map task on, say, 256MB of input,
>> >and I set my io.sort.memory to more than 512MB to make sure all of the
>> >input can stay in memory. I also checked the log to make sure there is
>> >just one spill for flushing.
>> >
>> >So I think the different parts run one by one, but the CPU utilization
>> >is not what I expected.
>> >
>> >
>> >2013/5/9 牛兆捷 <[email protected]>
>> >
>> >> I have enough memory, so there will be only one sort and spill. Why
>> >> do they happen in parallel?
>> >>
>> >>
>> >> 2013/5/9 Robert Evans <[email protected]>
>> >>
>> >>> Yes, it all happens in parallel, even within a single task
>> >>>
>> >>> On 5/8/13 11:17 AM, "牛兆捷" <[email protected]> wrote:
>> >>>
>> >>> >I forgot to say: to see the behavior of a single task, I just run
>> >>> >one map task on a 1GB input split (I set the block size to 1GB)
>> >>> >
>> >>> >
>> >>> >2013/5/9 Robert Evans <[email protected]>
>> >>> >
>> >>> >> Deciding on the input split happens in the client.  Each map
>> >>> >> process just opens up the input file and seeks to the appropriate
>> >>> >> offset in the file.  At that point it reads each entry one at a
>> >>> >> time and sends it to the map task.  The output of the map task is
>> >>> >> placed in a buffer.  When the buffer gets close to full the data
>> >>> >> is sorted and spilled out to disk in parallel with the map task
>> >>> >> still running.  It is hard to get CPU time for the different parts
>> >>> >> because they are all happening in parallel.  If you do have enough
>> >>> >> RAM to store the entire output in memory and you have configured
>> >>> >> your sort buffer to be able to hold it all then you will probably
>> >>> >> only sort/spill once.
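
As a rough illustration of the sort-buffer tuning described above, here
is a minimal driver sketch that sizes the map-side buffer so the output
is sorted and spilled only once; it assumes the Hadoop 2.x property
names (older releases use io.sort.mb), and the class and job names are
placeholders:

    // Sketch only: sizes the map-side sort buffer so the whole map output
    // fits and is spilled once.  Property names assume Hadoop 2.x.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SortBufferSketch {
      public static Job configureJob() throws Exception {
        Configuration conf = new Configuration();
        // Map-side sort buffer, in MB; large enough for the full map output.
        conf.setInt("mapreduce.task.io.sort.mb", 512);
        // Fraction of the buffer that triggers a background spill.
        conf.setFloat("mapreduce.map.sort.spill.percent", 0.95f);
        return Job.getInstance(conf, "single-spill sketch");
      }
    }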
>> >>> >>
>> >>> >> --Bobby
>> >>> >>
>> >>> >> On 5/8/13 10:25 AM, "牛兆捷" <[email protected]> wrote:
>> >>> >>
>> >>> >> >I looked at the application container log to trace the MapReduce
>> >>> >> >application.
>> >>> >> >
>> >>> >> >For the map task, I find there are mainly 3 phases: split input,
>> >>> >> >sort, and spill out.
>> >>> >> >I set enough memory to make sure the input can stay in memory.
>> >>> >> >
>> >>> >> >Initially, I thought the highest CPU utilization would appear in
>> >>> >> >the sort phase, because the other two phases focus on IO.
>> >>> >> >However, it doesn't behave as I thought; on the contrary, the CPU
>> >>> >> >utilization during the other phases is higher.
>> >>> >> >
>> >>> >> >Anyone know the reason?
>> >>> >> >
>> >>> >> >--
>> >>> >> >Sincerely,
>> >>> >> >Zhaojie
>> >>> >>
>> >>> >>
>> >>> >
>> >>> >
>> >>> >--
>> >>> >Sincerely,
>> >>> >Zhaojie
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> Sincerely,
>> >> Zhaojie
>> >>
>> >
>> >
>> >
>> >--
>> >Sincerely,
>> >Zhaojie
>>
>>
>
>
>--
>Sincerely,
>Zhaojie
