I am using Hadoop 2.4.0. The number of applications running is small on average, about 40-60. The metrics in Ganglia show around 10-30 apps killed every 5 minutes, which is very high relative to the number of apps running at any given time (40-60). The RM audit logs, though, show 0 failed apps during that hour. The RM UI also doesn't show any apps in the Applications -> Failed tab. The logs are rolled over at a slower rate, every 1-2 hours. I am searching for "Application Finished - Failed" to find the failed apps. Please let me know if I am missing something here.
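The log search described above can be sketched as a small script that counts matching audit entries across all rolled-over log files. This is only a minimal sketch under stated assumptions: the function names are hypothetical, the log files are passed on the command line because their location is not given in this thread, and the "Application Finished - Killed" string is an assumption (only the "Failed" string is quoted above). It is counted here because the Ganglia metric in question is about killed apps.

```python
import re
import sys
from collections import Counter

# "Failed" is the string quoted in this thread; "Killed" is an assumed
# counterpart and should be checked against the actual RM audit log.
PATTERN = re.compile(r"Application Finished - (Failed|Killed)")


def count_finished(lines):
    """Count audit-log lines per final state ("Failed" or "Killed")."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_in_files(paths):
    """Sum the counts over several (possibly rolled-over) log files."""
    total = Counter()
    for path in paths:
        # errors="replace" keeps the scan going if a file has odd bytes.
        with open(path, errors="replace") as handle:
            total.update(count_finished(handle))
    return total


if __name__ == "__main__":
    # Pass every RM log file covering the period of interest as arguments.
    totals = count_in_files(sys.argv[1:])
    print("Failed:", totals["Failed"], "Killed:", totals["Killed"])
```

Running it over every log file that covers the same window as the Ganglia graph gives numbers that can be compared directly with the metric, and shows separately whether killed or failed apps dominate.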
Thanks
Suma

On Wed, Feb 4, 2015 at 10:03 AM, Rohith Sharma K S <rohithsharm...@huawei.com> wrote:

> Hi
>
> Could you give more information, which version of hadoop are you using?
>
> >> QueueMetrics.AppsKilled/Failed metrics shows much higher nos i.e ~100.
> >> However RMAuditLogger shows 1 or 2 Apps as Killed/Failed in the logs.
>
> May be I suspect that Logs might be rolled out. Does more applications are
> running?
>
> All the applications history will be displayed on RM web UI (provided RM
> is not restarted or RM recovery enabled). May be you can check these
> applications lists.
>
> For finding reasons for application killed/failed, one way is you can
> check in NodeManager logs also. Here you need to check using container_id
> for corresponding application.
>
> Thanks & Regards
> Rohith Sharma K S
>
> From: Suma Shivaprasad [mailto:sumasai.shivapra...@gmail.com]
> Sent: 03 February 2015 21:35
> To: u...@hadoop.apache.org; yarn-dev@hadoop.apache.org
> Subject: QueueMetrics.AppsKilled/Failed metrics and failure reasons
>
> Hello,
>
> Was trying to debug reasons for Killed/Failed apps and was checking for
> the applications that were killed/failed in RM logs - from RMAuditLogger.
>
> QueueMetrics.AppsKilled/Failed metrics shows much higher nos i.e ~100.
> However RMAuditLogger shows 1 or 2 Apps as Killed/Failed in the logs. Is it
> possible that some logs are missed by AuditLogger or is it the other way
> round and metrics are being reported higher ?
>
> Thanks
> Suma