Yeah, Oozie sounds like the best approach. I think "timeout" in Oozie refers to something different (stopping a coordinator if it hasn't started within X minutes), but the SLA mechanism should do what's asked for.
-Marcin

From: Ted Dunning [mailto:[email protected]]
Sent: Saturday, December 22, 2012 5:12 PM
To: [email protected]
Subject: Re: Alerting

Also, I think that Oozie allows for timeouts in job submission. That might answer your need.

On Sat, Dec 22, 2012 at 2:08 PM, Ted Dunning <[email protected]> wrote:

You can write a script to parse the Hadoop job list and send an alert. The trick of putting a retry into your workflow system is a nice one. If your program won't allow multiple copies to run at the same time and you re-invoke it every, say, hour, then 5 retries imply that the previous invocation has been running for 5 hours.

On Sat, Dec 22, 2012 at 12:49 PM, Mohit Anchlia <[email protected]> wrote:

Need alerting

On Sat, Dec 22, 2012 at 12:44 PM, Mohammad Tariq <[email protected]> wrote:

MR web UI? Although we can't trigger anything from it, it provides all the info related to the jobs. I mean it would be easier to just go there and have a look at everything rather than opening the shell and typing the command. I'm a bit lazy ;)

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/

On Sun, Dec 23, 2012 at 2:09 AM, Mohit Anchlia <[email protected]> wrote:

Best I can find is hadoop job -list so far

On Sat, Dec 22, 2012 at 12:30 PM, Mohit Anchlia <[email protected]> wrote:

What's the best way to trigger an alert when jobs run for too long or have many failures? Is there a hadoop command that can be used to perform this activity?
