Polling the metrics endpoint every 60s is probably the best option: alert on no data for more than N minutes, or on any non-200 HTTP response.
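A rough sketch of that polling check, in Python, might look like the following. It is only an illustration under assumptions: the host (localhost), the 5-minute value for "N", and the function names fetch_metrics, evaluate_poll and run are all hypothetical; only port 41414 and the /metrics path come from this thread. The alert sender is left as a plain callable so it can be swapped for an email or pager hook.

```python
import json
import time
import urllib.error
import urllib.request

# Assumed values: the host and the "N minutes" threshold are placeholders.
METRICS_URL = "http://localhost:41414/metrics"
NODATA_LIMIT_SECONDS = 5 * 60


def fetch_metrics(url=METRICS_URL, timeout=10):
    """Return (http_status, parsed_json); status is None if the agent is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return err.code, None
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return None, None


def evaluate_poll(status, metrics, now, last_good):
    """Decide which alerts to raise for one poll; return (alerts, new_last_good)."""
    alerts = []
    if status is not None and status != 200:
        alerts.append(f"non-200 response: HTTP {status}")
    if status == 200 and metrics:
        last_good = now  # fresh data arrived, reset the no-data clock
    elif now - last_good > NODATA_LIMIT_SECONDS:
        alerts.append(f"no data for {int(now - last_good)}s")
    return alerts, last_good


def run(send_alert=print, interval=60):
    """Poll forever, once per interval, passing each alert to send_alert."""
    last_good = time.time()
    while True:
        status, metrics = fetch_metrics()
        alerts, last_good = evaluate_poll(status, metrics, time.time(), last_good)
        for message in alerts:
            send_alert(message)
        time.sleep(interval)
```

Calling run() starts the loop; passing a different send_alert (for example a function that sends mail) changes where the alerts go. Keeping the decision logic in evaluate_poll separate from the HTTP call makes it easy to test without a running agent.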
Alternatively, you could use something like monit <https://mmonit.com/monit/> to check that the process is running, but this won't catch a flume agent that has hit an OOM. For that case you'd need to add -XX:OnOutOfMemoryError="kill -9 %p" to the JVM options, to make sure the process being monitored dies when the JVM encounters an OOM.

With metrics polling you get the added benefit of being able to detect pressure or problems before they bubble up into larger ones (e.g. channel size increasing over N minutes while the successful sink count is not changing). I don't remember the exact names of the metrics, it's been a while.

The metric keys seemed to explain themselves well enough when I was using this in the past. Are there any specific keys in the response from /metrics you don't understand?

--
Iain Wright

On Sun, Feb 26, 2017 at 7:37 PM, Suresh V wrote:

> Thank you.
>
> Additionally, where can I find details about each metric in the json
> output on port 41414? I could not find detailed description of each metric
> and what it means, from the user guide.
>
> Thank you
> Suresh.
>
> On Sun, Feb 26, 2017 at 9:33 PM, Sharninder Khera wrote:
>
>> Set up scripts to send alerts sooner ? There isn't a built in way in
>> flume so you will have to setup monitoring separately
>>
>> On Mon, Feb 27, 2017 at 8:57 AM +0530, "Suresh V" wrote:
>>
>>> Hello,
>>>
>>> Is there a way to set up an alert mechanism by email immediately when a
>>> flume agent fails due to any reason?
>>>
>>> At the moment, we have scripts sending the port 41414 JSON metrics by
>>> email every hour, but it would be good to know as soon as an agent fails.
>>>
>>> Appreciate any help.
>>>
>>> Thank you
>>> Suresh.
