This alert fires when there are alerts which haven’t reported in within 2x their interval value. The most common reason that this alert would misfire is that the scheduler on the agents isn’t able to run the scheduled alert jobs.
We’ve been seeing a problem where the scheduler may occasionally miss the window and the alert won’t run. This is being fixed for Ambari 2.1.3. In the meantime, you can changed these lines of code<https://github.com/apache/ambari/blob/branch-2.1.2/ambari-agent/src/main/python/ambari_agent/AlertSchedulerHandler.py#L46-L50> on the agents where the problem is occurring to: APS_CONFIG = { 'apscheduler.threadpool.core_threads': 3, 'apscheduler.coalesce': True, 'apscheduler.standalone': False, 'apscheduler.misfire_grace_time': 5 } After restarting the agents, the misfire grace time should be higher and allow for the alert job to run, even if it misses its window. On Oct 26, 2015, at 6:31 AM, Vijaya Narayana Reddy Bhoomi Reddy <[email protected]<mailto:[email protected]>> wrote: Hi, In my cluster, I often see “Ambari Server Alerts” with the message “There are 7 stale alerts from 1 host(s). Resource ManagerRPC Latency, Zookeeper etc” Can anyone please throw light on the root cause of this alert? I am not able to trace the correct cause for this. It occurs occasionally and then disappears after a while. However, it comes with a CRITICAL flag. Hence wanted to understand the root cause behind it and as well as the severity. Thanks Vijay -- The contents of this e-mail are confidential and for the exclusive use of the intended recipient. If you receive this e-mail in error please delete it from your system immediately and notify us either by e-mail or telephone. You should not copy, forward or otherwise disclose the content of the e-mail. The views expressed in this communication may not necessarily be the view held by WHISHWORKS.
