You're right that the AGGREGATE alert doesn't give you the name of the affected host. You can query the alerts endpoint directly to find it:

GET /api/v1/clusters/<clusterName>/alerts?Alert/state=CRITICAL&Alert/definition_name=hbase_regionserver_process
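A minimal Python sketch of the alerts query above, using only the standard library. It assumes the response looks like {"items": [{"Alert": {"host_name": ...}}]}, that the server accepts HTTP basic auth, and that the base URL, cluster name and credentials you pass in are your own (none of them come from this thread).

```python
import base64
import json
import urllib.request

def critical_alerts_url(ambari_base: str, cluster: str,
                        definition: str = "hbase_regionserver_process") -> str:
    # Build the query from the reply; the "/" inside the predicate keys is left unencoded.
    return (f"{ambari_base}/api/v1/clusters/{cluster}/alerts"
            f"?Alert/state=CRITICAL&Alert/definition_name={definition}")

def affected_hosts(payload: dict) -> list:
    # Assumed response shape: {"items": [{"Alert": {"host_name": "...", ...}}, ...]}
    hosts = set()
    for item in payload.get("items", []):
        name = item.get("Alert", {}).get("host_name")
        if name:
            hosts.add(name)
    return sorted(hosts)

def fetch_json(url: str, user: str, password: str) -> dict:
    # Plain GET with HTTP basic auth; user and password are placeholders.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would be along the lines of affected_hosts(fetch_json(critical_alerts_url("http://ambari.example.com:8080", "mycluster"), user, password)), where the host name and cluster name are hypothetical.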
On Mar 24, 2017, at 4:05 PM, Ganesh Viswanathan wrote:

This API call worked to get the state of all regionservers:

/api/v1/clusters/cluster_name/services/HBASE/components/HBASE_REGIONSERVER?fields=host_components/HostRoles/state

I can pick out the entries in the INSTALLED state from this list to find the stopped one. Thanks!

On Fri, Mar 24, 2017 at 12:34 PM, Ganesh Viswanathan wrote:

Thanks, that explains the behavior when I shut down the regionserver process and see the CRITICAL alert. What I am trying to do is set up a WARNING alert for the case when a single "HBase Regionserver Process" is down, and a CRITICAL alert when two or more regionservers are down. I am also trying to get the hostname where the regionserver is down in the WARNING case. Only the "HBase Regionserver Process" alert gives the name of the impacted host (I don't get it from "RegionServers Health Summary" or "Percent RegionServers Available"), so I am trying to modify this alert for my use case.

Is there a better way to get the impacted regionserver host from the Ambari API when RegionServers Health Summary fires at WARNING level?

On Fri, Mar 24, 2017 at 12:27 PM, Jonathan Hurley wrote:

I'm not sure what you mean when you say "turn down" the process. If you are shutting down the process, then the port is released and the alert will not be able to make a socket connection, so you will get a CRITICAL right away.

The values in the alert are a round-trip time coupled with a socket read time. For the WARNING threshold, the alert attempts a socket connection; if it succeeds and releases in under 1.5 seconds, there is no warning. Although you set the CRITICAL value to 3600 s, you stopped the process, so the alert does not wait 3600 seconds: it can detect much faster that the port is not open for a socket connection.
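Ganesh's approach from the 4:05 PM message, combined with the WARNING/CRITICAL split he describes at 12:34 PM, can be sketched in Python as follows. This assumes each entry in the response looks like {"HostRoles": {"host_name": ..., "state": ...}} and treats only the INSTALLED state as "stopped"; the severity function is a hypothetical client-side policy, not an Ambari alert definition.

```python
def regionserver_states_url(ambari_base: str, cluster: str) -> str:
    # The component query reported working in the thread, with a caller-supplied base URL.
    return (f"{ambari_base}/api/v1/clusters/{cluster}/services/HBASE/components/"
            "HBASE_REGIONSERVER?fields=host_components/HostRoles/state")

def stopped_regionservers(payload: dict) -> list:
    # Assumed shape: {"host_components": [{"HostRoles": {"host_name": ..., "state": ...}}]}
    # INSTALLED means installed but not running, i.e. a stopped RegionServer.
    return sorted(
        hc["HostRoles"]["host_name"]
        for hc in payload.get("host_components", [])
        if hc.get("HostRoles", {}).get("state") == "INSTALLED"
    )

def severity(stopped: list) -> str:
    # Hypothetical policy from the thread: one host down -> WARNING, two or more -> CRITICAL.
    if len(stopped) >= 2:
        return "CRITICAL"
    if len(stopped) == 1:
        return "WARNING"
    return "OK"
```

Because the stopped host names come back alongside the count, the WARNING case also answers the question of which regionserver is down.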
On Mar 24, 2017, at 2:40 PM, Ganesh Viswanathan wrote:

I am using Ambari's "HBase Regionserver Process" alert with 1.5 s as the WARNING threshold and 3600 s as the CRITICAL threshold. However, when I test this by turning down the regionserver process, the alert fires as CRITICAL immediately. Is this a bug?

I am using HDP 2.4 with Ambari 2.2.1.0: https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.0/bk_Ambari_Users_Guide/content/_hbase_service_alerts.html

Thanks,
Ganesh
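The answer to this original question is the port check Jonathan describes: a failed connection is CRITICAL regardless of the 3600 s threshold. The Python sketch below is a simplified model of that behavior under stated assumptions (it times only the TCP connect, ignores the socket read, and uses >= at each threshold); it is not Ambari's actual alert script.

```python
import socket
import time
from typing import Optional

def classify(elapsed: Optional[float], warning_s: float = 1.5,
             critical_s: float = 3600.0) -> str:
    # elapsed is None when no connection could be made (e.g. the process is stopped),
    # which is CRITICAL immediately, without waiting for critical_s.
    if elapsed is None or elapsed >= critical_s:
        return "CRITICAL"
    if elapsed >= warning_s:
        return "WARNING"
    return "OK"

def check_port(host: str, port: int, warning_s: float = 1.5,
               critical_s: float = 3600.0) -> str:
    start = time.monotonic()
    try:
        # Give up after the CRITICAL threshold; a refused connection fails at once.
        with socket.create_connection((host, port), timeout=critical_s):
            pass
    except OSError:
        return classify(None, warning_s, critical_s)
    return classify(time.monotonic() - start, warning_s, critical_s)
```

With the process stopped, the connection is refused right away, so check_port returns "CRITICAL" long before 3600 s would elapse, which matches what Ganesh observed.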
