Thanks Rohith ... What other issues have you seen with failed or stuck jobs?

On Sun, Mar 1, 2015 at 10:06 PM, Rohith Sharma K S <[email protected]> wrote:

> Hi
>
> 1. For failed jobs, you can check the MRAppMaster logs directly.
> There you will find the reason for the failure.
>
> 2. For a stuck job, you need to do some groundwork to identify what
> is going wrong. It can be either a YARN issue or a MapReduce issue.
>
> 2.1 Recently, I have seen jobs get stuck many times when the headroom
> calculation goes wrong. The headroom is sent by the RM to the
> ApplicationMaster, and the AM uses it as a deciding factor
> ( https://issues.apache.org/jira/i#browse/YARN-1680 ). The
> corresponding parent JIRA is
> https://issues.apache.org/jira/i#browse/YARN-1198
>
> 2.2 When the job is stuck:
>
> YARN – try to get ClusterMemory Used, ClusterMemory Reserved, Total
> Memory, the number of NodeManagers, and the headroom sent to the AM.
>
> MapReduce – Are any NMs blacklisted? Are the reducer tasks using all
> of the ClusterMemory? By default, reducers start before the mappers
> complete. If a mapper fails because of an unstable node, the reducers
> can take over the cluster. In that case the reducers are expected to
> be pre-empted, so you need to identify whether they are actually
> getting pre-empted.
>
> The MRAppMaster log helps to some extent in analyzing the issue.
>
> Thanks & Regards
> Rohith Sharma K S
>
> *From:* Krish Donald [mailto:[email protected]]
> *Sent:* 02 March 2015 11:09
> *To:* [email protected]
> *Subject:* Re: How to troubleshoot failed or stuck jobs
>
> Thanks for the links, Ted.
>
> However, I wanted to understand the approach that should be taken
> when troubleshooting failed or stuck jobs.
>
> On Sun, Mar 1, 2015 at 8:52 PM, Ted Yu <[email protected]> wrote:
>
> Here are some related discussions and JIRA:
>
> http://search-hadoop.com/m/LgpTk2gxrGx
> http://search-hadoop.com/m/LgpTk2YLArE
>
> https://issues.apache.org/jira/browse/MAPREDUCE-6190
>
> Cheers
>
> On Sun, Mar 1, 2015 at 8:41 PM, Krish Donald <[email protected]> wrote:
>
> Hi,
>
> I wanted to understand how to troubleshoot failed or stuck jobs.
>
> Thanks
> Krish
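
A minimal sketch of the YARN-side check from point 2.2 in the quoted mail (memory used, reserved and total, plus the NodeManager count), assuming the ResourceManager REST endpoint /ws/v1/cluster/metrics is reachable. The host name, the function names and the 20% reservation threshold are assumptions for illustration only and do not come from the thread. The headroom sent to the AM is not part of this endpoint's output, so it is not covered here.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url):
    """Return the 'clusterMetrics' object from the ResourceManager REST API.

    rm_url is an assumption about your setup, e.g. "http://rm-host:8088".
    """
    url = rm_url.rstrip("/") + "/ws/v1/cluster/metrics"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def summarize_cluster_metrics(metrics, reserved_warn_fraction=0.2):
    """Condense cluster metrics into the numbers worth checking for a stuck job.

    reserved_warn_fraction is an arbitrary illustrative threshold,
    not a YARN setting.
    """
    total = metrics["totalMB"]
    used = metrics["allocatedMB"]
    reserved = metrics["reservedMB"]
    used_fraction = used / total if total else 0.0

    hints = []
    if total and used >= total:
        hints.append("all cluster memory is allocated")
    if total and reserved > reserved_warn_fraction * total:
        hints.append("a large share of cluster memory is reserved")
    if metrics["unhealthyNodes"] > 0:
        hints.append("some NodeManagers are unhealthy")

    return {
        "total_mb": total,
        "used_mb": used,
        "reserved_mb": reserved,
        "used_fraction": used_fraction,
        "active_nodes": metrics["activeNodes"],
        "unhealthy_nodes": metrics["unhealthyNodes"],
        "hints": hints,
    }
```

Against a live cluster this would be called as summarize_cluster_metrics(fetch_cluster_metrics("http://rm-host:8088")), where rm-host is a placeholder. An empty "hints" list only means these coarse checks found nothing; the MRAppMaster log is still the place to look next.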
