I looked at the application container logs to trace a MapReduce application. For a map task, I found there are mainly three phases: splitting the input, sorting, and spilling the output to disk. I configured enough memory to make sure the input can stay in memory.
Initially, I expected the highest CPU utilization to appear in the sort phase, because the other two phases are mostly I/O. However, that is not what I observed: CPU utilization is actually higher during the other two phases. Does anyone know the reason?

-- *Sincerely,* *Zhaojie*
