Yeah, Oozie sounds like the best approach. I think "timeout" in Oozie refers to 
something different (stopping a coordinator if it hasn't started within X 
minutes), but the SLA mechanism should do what's asked for.

-Marcin

From: Ted Dunning [mailto:[email protected]]
Sent: Saturday, December 22, 2012 5:12 PM
To: [email protected]
Subject: Re: Alerting

Also, I think that Oozie allows for timeouts in job submission.  That might 
answer your need.



On Sat, Dec 22, 2012 at 2:08 PM, Ted Dunning 
<[email protected]<mailto:[email protected]>> wrote:
You can write a script to parse the Hadoop job list and send an alert.
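
A minimal sketch of such a script, assuming the tab-separated layout that 
"hadoop job -list" prints on Hadoop 1.x (JobId, State, StartTime in epoch 
milliseconds, ...). The function names and the threshold are illustrative, 
and the column layout should be checked against your own cluster's output:

```python
import subprocess
import time


def parse_job_list(text):
    """Return (job_id, start_time_ms) pairs from job-list output.

    Assumes each job row starts with an ID like job_201212221200_0001
    and carries the start time (epoch milliseconds) in the third column.
    """
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the summary and header lines; keep only real job rows.
        if len(fields) >= 3 and fields[0].startswith("job_") and fields[2].isdigit():
            jobs.append((fields[0], int(fields[2])))
    return jobs


def long_running(jobs, now_ms, max_minutes):
    """Return the IDs of jobs that started more than max_minutes ago."""
    limit_ms = max_minutes * 60 * 1000
    return [job_id for job_id, start_ms in jobs if now_ms - start_ms > limit_ms]


def check_cluster(max_minutes=300):
    """Run the job list command and return the IDs of overdue jobs."""
    out = subprocess.run(
        ["hadoop", "job", "-list"],
        capture_output=True, text=True, check=True,
    ).stdout
    return long_running(parse_job_list(out), int(time.time() * 1000), max_minutes)
```

Calling check_cluster() from cron and mailing any non-empty result would give 
a basic alert. Note that it only sees jobs that are still running, so it does 
not cover the "many failures" half of the question.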

The trick of putting a retry into your workflow system is a nice one.  If your 
program won't allow multiple copies to run at the same time, then if you 
re-invoke the program every, say, hour, then 5 retries implies that the 
previous invocation has been running for 5 hours.

On Sat, Dec 22, 2012 at 12:49 PM, Mohit Anchlia 
<[email protected]<mailto:[email protected]>> wrote:
Need alerting

On Sat, Dec 22, 2012 at 12:44 PM, Mohammad Tariq 
<[email protected]<mailto:[email protected]>> wrote:
MR web UI? Although we can't trigger anything from it, it provides all the info 
related to the jobs. I mean it would be easier to just go there and have a look 
at everything rather than opening the shell and typing the command.

I'm a bit lazy ;)

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/

On Sun, Dec 23, 2012 at 2:09 AM, Mohit Anchlia 
<[email protected]<mailto:[email protected]>> wrote:
Best I can find so far is "hadoop job -list"

On Sat, Dec 22, 2012 at 12:30 PM, Mohit Anchlia 
<[email protected]<mailto:[email protected]>> wrote:
What's the best way to trigger an alert when jobs run for too long or have many 
failures? Is there a hadoop command that can be used to do this?




