I forgot to mention: to observe the behavior of a single task, I ran just one map task on a 1 GB input split (I set the block size to 1 GB).

2013/5/9 Robert Evans <[email protected]>

> Deciding on the input split happens in the client. Each map process just
> opens up the input file and seeks to the appropriate offset in the file.
> At that point it reads each entry one at a time and sends it to the map
> task. The output of the map task is placed in a buffer. When the buffer
> gets close to full, the data is sorted and spilled out to disk in parallel
> with the map task still running. It is hard to get CPU time for the
> different parts because they are all happening in parallel. If you do have
> enough RAM to store the entire output in memory, and you have configured
> your sort buffer to be able to hold it all, then you will probably only
> sort/spill once.
>
> --Bobby
>
> On 5/8/13 10:25 AM, "牛兆捷" <[email protected]> wrote:
>
> >I looked at the application container log to trace the map-reduce
> >application.
> >
> >For a map task, I find there are mainly 3 phases: split input, sort, and
> >spill out.
> >I set enough memory to make sure the input can stay in memory.
> >
> >Initially, I thought the highest CPU utilization would appear in the sort
> >phase because the other two phases focus on IO; however, it doesn't behave
> >as I expected. On the contrary, the CPU utilization during the other
> >phases is higher.
> >
> >Does anyone know the reason?
> >
> >--
> >*Sincerely,*
> >*Zhaojie*

--
*Sincerely,*
*Zhaojie*
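
To make the "sort/spill once" point above concrete, here is a rough back-of-the-envelope sketch in Python. It is not Hadoop code. It assumes the buffer size and spill threshold come from `mapreduce.task.io.sort.mb` and `mapreduce.map.sort.spill.percent`, that a spill is triggered each time the threshold is reached, and it ignores the per-record metadata Hadoop also keeps in the buffer. The function name `estimate_spills` and the numbers are only illustrative.

```python
import math

MIB = 1024 * 1024


def estimate_spills(map_output_bytes, sort_mb=100, spill_percent=0.80):
    """Roughly estimate how many sort/spill passes a map task performs.

    Simplified model (an assumption, not Hadoop's exact accounting):
    the buffer spills each time it holds sort_mb * spill_percent of
    map output, and whatever is left is flushed once at the end.
    """
    if map_output_bytes <= 0:
        return 0
    threshold_bytes = sort_mb * MIB * spill_percent
    return max(1, math.ceil(map_output_bytes / threshold_bytes))


if __name__ == "__main__":
    one_gib = 1024 * MIB
    # 100 MB buffer with an 80% threshold: many spills for 1 GiB of output.
    print(estimate_spills(one_gib, sort_mb=100))
    # A buffer large enough to hold all the output: a single sort/spill.
    print(estimate_spills(one_gib, sort_mb=1536))
```

Under this model, a map task that emits about 1 GiB spills many times with a small buffer and only once when the buffer can hold everything. In the single-spill case the sort is concentrated at the end of the task, which may be why it is hard to attribute CPU time to it separately.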
