This might be due to a performance issue in FileOutputCommitter, which is resolved in Hadoop 2.7: https://issues.apache.org/jira/browse/MAPREDUCE-4815

Best Regards,
Jeff Zhang

From: Ashish Kumar Singh <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, July 20, 2015 at 4:06 AM
To: "[email protected]" <[email protected]>
Subject: Re: Application Master waits a long time after Mapper/Reducers finish

Hi Rohith,

Thanks for replying. No, I do not see any connection retry attempts to HDFS in the logs. Also, the NameNode and HDFS look healthy in our cluster.
Please find attached the latest AM logs for the job.

Regards,
Ashish

On Mon, Jul 20, 2015 at 3:29 PM, Rohith Sharma K S <[email protected]> wrote:

Hi,

From the thread dump, it seems to be waiting on an HDFS operation. Can you attach the AM logs, and do you see any client retries when connecting to HDFS?

"CommitterEvent Processor #4" prio=10 tid=0x000000000199a800 nid=0x18df in Object.wait() [0x00007f4f12aa4000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        ............................
        at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1864)
        at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:575)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:345)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
        at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:274)
        at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)

Maybe you can check whether HDFS is healthy?

Thanks & Regards
Rohith Sharma K S

From: Ashish Kumar Singh [mailto:[email protected]]
Sent: 20 July 2015 14:16
To: [email protected]
Subject: Application Master waits a long time after Mapper/Reducers finish

Hello Users,

I am facing a problem running MapReduce jobs on Hadoop 2.6.
I am observing that the Application Master waits for a long time after all the Mappers and Reducers have completed before the job is marked complete. This wait sometimes exceeds 20-25 minutes, which is very strange because our mappers and reducers complete in less than 10 minutes for the job.

Below are some observations:

a) Job completion status stands at 95% when the wait begins.

b) JOB_COMMIT is initiated just before this wait time
( logs: 2015-07-14 01:54:46,636 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1436854849540_0123Job Transitioned from RUNNING to COMMITTING )

c) Job success happens after 20-25 minutes
( logs: 2015-07-14 02:15:06,634 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1436854849540_0123Job Transitioned from COMMITTING to SUCCEEDED )

I would appreciate any help on this.
A thread dump taken while the Application Master hangs is attached.

Regards,
Ashish
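
As a rough check of the wait described in observations (b) and (c), the two JobImpl timestamps quoted above can be subtracted to see how long the job stayed in the COMMITTING state. This is only a minimal sketch in Python, assuming the `YYYY-MM-DD HH:MM:SS,mmm` timestamp layout seen in those two log lines; the helper name `commit_duration` is hypothetical and not part of Hadoop.

```python
from datetime import datetime, timedelta

# Timestamp layout assumed from the two log lines quoted in the message.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def commit_duration(committing_at: str, succeeded_at: str) -> timedelta:
    """Return the time between the COMMITTING and SUCCEEDED transitions."""
    start = datetime.strptime(committing_at, LOG_TIME_FORMAT)
    end = datetime.strptime(succeeded_at, LOG_TIME_FORMAT)
    return end - start


if __name__ == "__main__":
    # Timestamps copied from observations (b) and (c).
    wait = commit_duration("2015-07-14 01:54:46,636", "2015-07-14 02:15:06,634")
    print(f"Time spent in job commit: {wait}")
```

For these two lines the difference is a little over 20 minutes, which matches the reported wait and suggests the delay falls between the start and the end of job commit.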
