Re: question about cpu utilization

Robert Evans Wed, 08 May 2013 09:06:11 -0700

The input splits are computed in the client.  Each map process just
opens the input file and seeks to the appropriate offset in the file.
From there it reads the entries one at a time and passes each one to the
map task.  The output of the map task is placed in a buffer.  When the
buffer gets close to full, the data is sorted and spilled to disk in
parallel with the map task, which keeps running.  It is hard to measure
CPU time for the different parts because they all happen in parallel.  If
you have enough RAM to hold the entire map output in memory, and you have
configured your sort buffer to be large enough to hold it all, then you
will probably only sort/spill once.
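
To make the buffer behavior concrete, here is a rough toy model in Python
of how the number of spills depends on the buffer size.  It is only a
sketch and not Hadoop's actual code: the function name and parameters are
made up for illustration, the sort/spill is modeled as instantaneous
rather than running in a background thread, and the final flush at the
end of the map is counted as a spill.  In Hadoop 2 the corresponding
settings are, as far as I know, mapreduce.task.io.sort.mb and
mapreduce.map.sort.spill.percent.

```python
def count_spills(record_sizes, buffer_bytes, spill_percent=80):
    """Toy model: count sort/spill passes for a stream of map output records.

    record_sizes  -- iterable of serialized record sizes in bytes
    buffer_bytes  -- hypothetical sort buffer capacity in bytes
    spill_percent -- fill level (in percent) at which a spill is triggered
    """
    threshold = buffer_bytes * spill_percent // 100
    used = 0
    spills = 0
    for size in record_sizes:
        used += size
        if used >= threshold:
            # The buffered data would be sorted and written to disk here.
            # In this model the buffer is simply emptied.
            spills += 1
            used = 0
    if used > 0:
        # Whatever is left when the map finishes is sorted and flushed once.
        spills += 1
    return spills


if __name__ == "__main__":
    records = [10] * 100  # 1000 bytes of map output in total
    print("small buffer:", count_spills(records, buffer_bytes=500))
    print("large buffer:", count_spills(records, buffer_bytes=2000))
```

With a buffer large enough that the threshold is never reached, the model
sorts and spills exactly once, at the end of the map.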


--Bobby

On 5/8/13 10:25 AM, "牛兆捷" <[email protected]> wrote:

>I looked at the application container log to trace the map-reduce
>application.
>
>For a map task, I find there are mainly 3 phases: split input, sort, and
>spill out.
>I set enough memory to make sure the input can stay in memory.
>
>Initially, I thought the highest CPU utilization would appear in the sort
>phase because the other two phases focus on IO. However, it doesn't behave
>as I expected. On the contrary, the CPU utilization during the other
>phases is higher.
>
>Does anyone know the reason?
>
>-- 
>*Sincerely,*
>*Zhaojie*
