You're right that the AGGREGATE alert doesn't give you the host name of the 
affected host. You can query the alerts endpoint directly to discover the name 
of the host:
GET api/v1/clusters/<clusterName>/alerts?Alert/state=CRITICAL&Alert/definition_name=hbase_regionserver_process
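A minimal Python sketch of that query, using only the standard library. The server URL, cluster name, and credentials are placeholders. The response shape (items[].Alert.host_name) is an assumption about the Ambari alerts payload. The extra fields=Alert/host_name parameter is also an assumption, added in case the host name is not returned by default.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_alerts_url(base_url, cluster, definition_name, state="CRITICAL"):
    """Build the alerts query from the thread, filtered by state and definition name."""
    params = {
        "Alert/state": state,
        "Alert/definition_name": definition_name,
        # Assumption: request the host name explicitly in case it is not returned by default.
        "fields": "Alert/host_name",
    }
    # Keep "/" unescaped so the predicate keys read exactly as in the documented query.
    query = urllib.parse.urlencode(params, safe="/")
    base = base_url.rstrip("/")
    return f"{base}/api/v1/clusters/{urllib.parse.quote(cluster)}/alerts?{query}"


def fetch_json(url, user, password):
    """GET a URL with HTTP basic auth and decode the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def hosts_from_alerts(payload):
    """Collect host names from an alerts response (assumed shape: items[].Alert.host_name)."""
    return sorted(
        item["Alert"]["host_name"]
        for item in payload.get("items", [])
        if item.get("Alert", {}).get("host_name")
    )
```

For example, hosts_from_alerts(fetch_json(build_alerts_url("http://ambari.example.com:8080", "mycluster", "hbase_regionserver_process"), "admin", "admin")) would list the hosts whose regionserver process alert is CRITICAL. The host and credentials in that call are hypothetical.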

On Mar 24, 2017, at 4:05 PM, Ganesh Viswanathan wrote:

This API call worked to get the state for all regionservers:

/api/v1/clusters/cluster_name/services/HBASE/components/HBASE_REGIONSERVER?fields=host_components/HostRoles/state

I can filter this list for entries in the INSTALLED state (how Ambari reports a stopped component) to find the stopped one.
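A sketch of that filter in Python. It assumes the response has a host_components list whose HostRoles entries carry host_name and state. If host_name is missing, add host_components/HostRoles/host_name to the fields parameter. Only INSTALLED is treated as stopped here. Other states, such as UNKNOWN for a host that has lost its heartbeat, are ignored.

```python
def stopped_regionservers(payload):
    """Return hosts whose HBASE_REGIONSERVER is in the INSTALLED (stopped) state."""
    stopped = []
    for host_component in payload.get("host_components", []):
        roles = host_component.get("HostRoles", {})
        # Ambari reports an installed-but-not-running component as INSTALLED.
        if roles.get("state") == "INSTALLED" and roles.get("host_name"):
            stopped.append(roles["host_name"])
    return sorted(stopped)
```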

Thanks!


On Fri, Mar 24, 2017 at 12:34 PM, Ganesh Viswanathan wrote:
Thanks, that explains the behavior when I shut down the regionserver process 
and see the CRITICAL alert.

What I am trying to do is set up a WARNING alert for the case when a single 
"HBase Regionserver Process" is down and a CRITICAL alert when two or more 
regionservers are down. In the WARNING case, I also want to get the hostname 
of the down regionserver.
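One way to express that policy, as a hypothetical helper outside Ambari's built-in alerts: count the stopped regionservers (for example, the INSTALLED entries returned by the component API call above) and map the count to a level, reporting the host names along with it. The function names are illustrative only.

```python
def severity_for_down_count(down):
    """Map the number of stopped regionservers to a level: 0 OK, 1 WARNING, 2+ CRITICAL."""
    if down < 0:
        raise ValueError("down count cannot be negative")
    if down == 0:
        return "OK"
    if down == 1:
        return "WARNING"
    return "CRITICAL"


def summarize(stopped_hosts):
    """Build a one-line status that names the affected hosts when anything is down."""
    level = severity_for_down_count(len(stopped_hosts))
    if level == "OK":
        return level
    return f"{level}: RegionServer down on {', '.join(stopped_hosts)}"
```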

Only the "HBase Regionserver Process" alert gives the name of the affected host 
(I don't get host names from "RegionServers Health Summary" or "Percent 
RegionServers Available"), so I am trying to modify this alert for my use case. 
Is there a better way to get the affected regionserver host from the Ambari API 
when RegionServers Health Summary fires at the WARNING level?




On Fri, Mar 24, 2017 at 12:27 PM, Jonathan Hurley wrote:
I'm not sure what you mean when you say "turn down" the process. If you are 
shutting down the process, then the port is released and the alert will not be 
able to make a socket connection. You will get a CRITICAL right away. The 
values in the alert are a round-trip-time coupled with a socket read time. For 
the warning, it will attempt to make a socket connection and if it succeeds and 
releases in under 1.5 seconds, then there's no warning. Although you set the 
CRITICAL value to 3600s, you stopped the process, so the alert does not wait 3600 
seconds; it can detect much faster that the port is not open for a socket 
connection.
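To make the timing argument concrete, here is a simplified Python model of a port check. It follows the behavior described above loosely and is not Ambari's actual alert code. It times a TCP connect and uses the CRITICAL threshold as the socket timeout, which is an assumption. It then grades a successful connect against the WARNING threshold. A refused connection fails immediately, so the 3600s threshold never comes into play.

```python
import socket
import time


def check_port(host, port, warning_s=1.5, critical_s=3600.0):
    """Simplified port check: time a TCP connect and grade it against thresholds.

    A stopped process releases its port, so the connect is refused at once and the
    result is CRITICAL without ever waiting for the critical_s timeout.
    """
    start = time.monotonic()
    try:
        # Assumption: the CRITICAL threshold acts as the connect timeout.
        with socket.create_connection((host, port), timeout=critical_s):
            elapsed = time.monotonic() - start
    except OSError as exc:
        # Covers both "connection refused" (nothing listening) and a timeout.
        return "CRITICAL", f"connection failed: {exc}"
    if elapsed >= warning_s:
        return "WARNING", f"{elapsed:.3f}s response time"
    return "OK", f"{elapsed:.3f}s response time"
```

Pointing check_port at a port with no listener returns a CRITICAL result almost instantly, which matches what Ganesh saw after stopping the regionserver.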

On Mar 24, 2017, at 2:40 PM, Ganesh Viswanathan wrote:

I am using Ambari's "HBase Regionserver Process" alert with 1.5s as the WARNING 
threshold and 3600s as the CRITICAL threshold. However, when I test this by turning 
down the regionserver process, the alert fires as CRITICAL immediately. Is 
this a bug?

I am using HDP 2.4 with Ambari 2.2.1.0:
https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.0/bk_Ambari_Users_Guide/content/_hbase_service_alerts.html


Thanks,
Ganesh



