Hi
1. For failed jobs, check the MRAppMaster logs directly; they give the reason for the failure.

2. For a stuck job, you need to do some ground work to identify what is going wrong. It can be either a YARN issue or a MapReduce issue.

2.1 Recently I have seen jobs get stuck many times when the headroom calculation goes wrong. The headroom is sent by the RM to the ApplicationMaster, and the AM uses it as a deciding factor (https://issues.apache.org/jira/browse/YARN-1680). The corresponding parent JIRA is https://issues.apache.org/jira/browse/YARN-1198.

2.2 When the job is stuck:
YARN - try to get the cluster memory used, cluster memory reserved, total memory, the number of NodeManagers, and the headroom sent to the AM.
MapReduce - check whether any NMs are blacklisted and whether the reducer tasks are using all of the cluster memory. By default, reducers start before all mappers have completed. If mappers fail because of an unstable node, the reducers can take over the cluster. In that case the reducers are expected to be preempted, so you need to identify whether reducer preemption is actually happening. The MRAppMaster log helps to some extent in analyzing the issue.

Thanks & Regards
Rohith Sharma K S

From: Krish Donald [mailto:[email protected]]
Sent: 02 March 2015 11:09
To: [email protected]
Subject: Re: How to troubleshoot failed or stuck jobs

Thanks for the link, Ted. However, I wanted to understand the approach that should be taken when troubleshooting failed or stuck jobs.

On Sun, Mar 1, 2015 at 8:52 PM, Ted Yu <[email protected]> wrote:

Here are some related discussions and JIRAs:
http://search-hadoop.com/m/LgpTk2gxrGx
http://search-hadoop.com/m/LgpTk2YLArE
https://issues.apache.org/jira/browse/MAPREDUCE-6190

Cheers

On Sun, Mar 1, 2015 at 8:41 PM, Krish Donald <[email protected]> wrote:

Hi,

Wanted to understand: how to troubleshoot failed or stuck jobs?

Thanks
Krish
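To make step 1 above concrete, here is a minimal sketch of mining an MRAppMaster log for failure diagnostics and blacklisted nodes. The log excerpt, attempt id, and hostname below are made up for illustration; on a real cluster you would first fetch the aggregated logs with `yarn logs -applicationId <appId>` and then grep those.

```shell
# Hypothetical excerpt of an MRAppMaster log (the exact message format
# varies by Hadoop version; this is an illustrative assumption).
cat > /tmp/mrappmaster_sample.log <<'EOF'
2015-03-02 11:09:01 INFO TaskAttempt attempt_1425270000000_0001_m_000003_0 Transitioned from RUNNING to FAILED
2015-03-02 11:09:01 INFO Diagnostics report from attempt_1425270000000_0001_m_000003_0: Container killed on request. Exit code is 143
2015-03-02 11:09:02 INFO Blacklisted host: node07.example.com
EOF

# Pull out the failure reason and any blacklisted NodeManagers.
grep -iE 'diagnostics|blacklist' /tmp/mrappmaster_sample.log
```

The same grep pattern applied to real aggregated logs surfaces both the per-attempt failure reason (step 1) and the NM blacklisting question from step 2.2.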
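For the YARN-side checks in 2.2 (cluster memory used, reserved, total, and NodeManager count), the ResourceManager REST API exposes these figures at `/ws/v1/cluster/metrics`. Below is a sketch that extracts the relevant fields from a sample response; all numbers are made up, and in practice you would fetch the JSON with something like `curl http://<rm-host>:8088/ws/v1/cluster/metrics`.

```shell
# Sample ClusterMetricsInfo response (values are invented for illustration).
cat > /tmp/cluster_metrics.json <<'EOF'
{"clusterMetrics":{"allocatedMB":61440,"reservedMB":8192,"availableMB":12288,"totalMB":81920,"activeNodes":10,"unhealthyNodes":1}}
EOF

# Extract the memory and node figures relevant to a stuck job.
for field in allocatedMB reservedMB availableMB totalMB activeNodes unhealthyNodes; do
  value=$(grep -o "\"$field\":[0-9]*" /tmp/cluster_metrics.json | cut -d: -f2)
  echo "$field = $value"
done
```

Comparing `availableMB` against the headroom the AM reports in its log is one quick way to spot the headroom-miscalculation problem described in 2.1.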
