Hi experts, I ran into a strange issue that I cannot explain. Can you tell me whether this is a bug or whether I configured something wrong? Below is my situation.

I'm running the Hadoop 2.2.0 release and all my jobs are uberized; each node can only run a single job at a time. I use the CapacityScheduler with two queues (default and small), and I give only 5% capacity (10 nodes) to the small queue. What I found is that the throughput of the small queue is very poor when it is under heavy load (inflow rate > processing speed). I checked the job logs and found that each job takes an extra 1-2 minutes in the job commit phase.
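For reference, the queue setup boils down to something like the following in capacity-scheduler.xml (a simplified sketch rather than my actual file; the 95 for default is just the assumed remainder after giving 5 to small):

    <property>
      <name>yarn.scheduler.capacity.root.queues</name>
      <value>default,small</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.default.capacity</name>
      <value>95</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.small.capacity</name>
      <value>5</value>
    </property>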
Here is the log of one such job:

2014-09-10 14:01:13,665 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1410336300553_9902_m_000000_0
2014-09-10 14:01:13,665 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1410336300553_9902_m_000000_0 is : 1.0
2014-09-10 14:01:13,670 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1410336300553_9902_m_000000_0
2014-09-10 14:01:13,670 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.Task: Task 'attempt_1410336300553_9902_m_000000_0' done.
2014-09-10 14:01:13,671 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1410336300553_9902_m_000000_0 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP
2014-09-10 14:01:13,671 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1410336300553_9902_01_000001 taskAttempt attempt_1410336300553_9902_m_000000_0
2014-09-10 14:01:13,675 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1410336300553_9902_m_000000_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
2014-09-10 14:01:13,685 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1410336300553_9902_m_000000_0
2014-09-10 14:01:13,687 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1410336300553_9902_m_000000 Task Transitioned from RUNNING to SUCCEEDED
2014-09-10 14:01:13,693 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
2014-09-10 14:01:13,694 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.TIEMRAppMetrics: task is completed on
2014-09-10 14:01:13,697 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1410336300553_9902Job Transitioned from RUNNING to COMMITTING
2014-09-10 14:01:13,697 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT
2014-09-10 14:02:30,121 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Calling handler for JobFinishedEvent
2014-09-10 14:02:30,122 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1410336300553_9902Job Transitioned from COMMITTING to SUCCEEDED

As you can see, the job commit started at 14:01:13 and did not end until 14:02:30. I also captured a thread dump of the job's AppMaster; the interesting part is here:

"CommitterEvent Processor #1" id=91 idx=0x16c tid=29593 prio=5 alive, waiting, native_blocked
    -- Waiting for notification on: org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor0x906b46d0[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor.waitForValidCommitWindow(CommitterEventHandler.java:313)
    ^-- Lock released while waiting: org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor0x906b46d0[fat lock]
    at org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:252)
    at org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:216)
    at java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java/lang/Thread.run(Thread.java:662)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

I checked the code: the committer thread is blocked waiting for the next heartbeat to the RM. I also checked org.apache.hadoop.mapreduce.v2.app.local.LocalContainerAllocator.heartbeat(), and it seems to send another resource allocate request to the RM. My simplified reading of both places is below (paraphrased from the 2.2.0 source, so the details may be slightly off).
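In CommitterEventHandler, handleJobCommit() first calls waitForValidCommitWindow(), which only lets the commit proceed if the AM has heartbeated to the RM within the last commitWindowMs (yarn.app.mapreduce.am.job.committer.commit-window, 10000 ms by default, if I read MRJobConfig correctly). The wait() here is exactly where my thread dump shows the thread parked:

    // Simplified paraphrase of CommitterEventHandler$EventProcessor.waitForValidCommitWindow()
    private synchronized void waitForValidCommitWindow() throws InterruptedException {
      long lastHeartbeatTime = rmHeartbeatHandler.getLastHeartbeatTime();
      long now = context.getClock().getTime();
      while (now - lastHeartbeatTime > commitWindowMs) {
        // ask to be woken up after the next successful RM heartbeat
        rmHeartbeatHandler.runOnNextHeartbeat(new Runnable() {
          @Override
          public void run() {
            synchronized (EventProcessor.this) {
              EventProcessor.this.notify();
            }
          }
        });
        wait(); // <-- the Object.wait() in my thread dump
        lastHeartbeatTime = rmHeartbeatHandler.getLastHeartbeatTime();
        now = context.getClock().getTime();
      }
    }

And LocalContainerAllocator (which an uberized AM uses in place of the normal container allocator) implements that heartbeat as a full allocate() call to the RM scheduler, just with empty ask/release lists, since allocate() doubles as the AM liveness signal:

    // Simplified paraphrase of LocalContainerAllocator.heartbeat()
    @Override
    protected synchronized void heartbeat() throws Exception {
      AllocateRequest allocateRequest = AllocateRequest.newInstance(
          this.lastResponseID,
          super.getApplicationProgress(),
          new ArrayList<ResourceRequest>(), // ask: nothing requested
          new ArrayList<ContainerId>(),     // release: nothing to give back
          null);                            // no blacklist changes
      AllocateResponse allocateResponse = scheduler.allocate(allocateRequest);
      this.lastResponseID = allocateResponse.getResponseId();
      // (the real method also handles retries and resync/shutdown commands)
    }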
So my understanding (correct me if I'm wrong) is: if the queue is short of resources (no available resources in the queue), the heartbeat gets blocked for a while, and the ongoing jobs are effectively suspended in their commit phase. To test this guess, I reduced the inflow rate so that one node is always kept idle (inflow rate < processing speed), and the overall throughput improved significantly.

My questions are:
1. Is this a bug, or is my configuration incorrect?
2. Why does the heartbeat send another resource allocate request to the RM? Under load, the scheduler is supposed to lean toward the ongoing jobs, making them finish as soon as possible while blocking new jobs.

Thanks

--
--Anfernee