Ritesh created AMBARI-21697:
-------------------------------

             Summary: Spark thrift service was alerting for connectivity while using http mode
                 Key: AMBARI-21697
                 URL: https://issues.apache.org/jira/browse/AMBARI-21697
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.5.1
            Reporter: Ritesh

Newly installed clusters keep showing the Ambari thrift server down alert while using http mode. An alert for the Spark thrift service is seen every time a new cluster is created.

The script used by the alert is
/var/lib/ambari-agent/cache/common-services/SPARK2/2.0.0/package/scripts/alerts/alert_spark2_thrift_port.py

Error stack
=======
Connection failed on host hn0-salqa0.lv5aupozrfhezhozcxr3xjcwqe.dx.internal.cloudapp.net:10016 (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/SPARK2/2.0.0/package/scripts/alerts/alert_spark2_thrift_port.py", line 144, in execute
    Execute(cmd, user=hiveruser, path=[beeline_cmd], timeout=CHECK_COMMAND_TIMEOUT_DEFAULT)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
*ExecutionFailed: Execution of '! beeline -u 'jdbc:hive2://hn0-salqa0.lv5aupozrfhezhozcxr3xjcwqe.dx.internal.cloudapp.net:10016/default' transportMode=http -e '' 2>&1| awk ' {print} '|grep -i -e 'Connection refused' -e 'Invalid URL'' returned 1. Error: Could not open client transport with JDBC Uri: jdbc:hive2://hn0-salqa0.lv5aupozrfhezhozcxr3xjcwqe.dx.internal.cloudapp.net:10016/default: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)*
Error: Could not open client transport with JDBC Uri: jdbc:hive2://hn0-salqa0.lv5aupozrfhezhozcxr3xjcwqe.dx.internal.cloudapp.net:10016/default: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)

It seems that the alert is checking the wrong port (10016 instead of 10002) when configured in http mode (transportMode=http).

Reason
=====
From the logic in the script, it seems that if the transport mode is binary it will use HIVE_SERVER_THRIFT_PORT, which is the same as THRIFT_PORT_DEFAULT. In any other mode the condition below is not met, so the port stays at THRIFT_PORT_DEFAULT. Hence it will always go for port 10016.

============
THRIFT_PORT_DEFAULT = 10016
HIVE_SERVER_TRANSPORT_MODE_DEFAULT = 'binary'

port = THRIFT_PORT_DEFAULT
if transport_mode.lower() == 'binary' and HIVE_SERVER_THRIFT_PORT_KEY in configurations:
  port = int(configurations[HIVE_SERVER_THRIFT_PORT_KEY])
========

Resolution
==========
We should change the default port to 10002 in the alert script.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
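
For illustration, the port selection described under Reason could be made aware of the transport mode roughly as follows. This is only a sketch, not the actual Ambari fix: the function name select_port, the constants THRIFT_HTTP_PORT_DEFAULT and HIVE_SERVER_THRIFT_HTTP_PORT_KEY, and both key strings are hypothetical placeholders. Only the values 10016, 10002 and 'binary' come from the report.

```python
# Sketch only. Names marked "assumed" are hypothetical and not taken from
# the Ambari alert script; the key strings are placeholders.
THRIFT_PORT_DEFAULT = 10016                 # binary-mode default (from the report)
THRIFT_HTTP_PORT_DEFAULT = 10002            # assumed: http-mode port named in the report
HIVE_SERVER_TRANSPORT_MODE_DEFAULT = 'binary'

HIVE_SERVER_THRIFT_PORT_KEY = 'thrift.port'            # placeholder key string
HIVE_SERVER_THRIFT_HTTP_PORT_KEY = 'thrift.http.port'  # assumed key, placeholder string


def select_port(transport_mode, configurations):
    """Return the port to probe for the given transport mode."""
    mode = (transport_mode or HIVE_SERVER_TRANSPORT_MODE_DEFAULT).lower()
    if mode == 'http':
        key, default = HIVE_SERVER_THRIFT_HTTP_PORT_KEY, THRIFT_HTTP_PORT_DEFAULT
    else:
        key, default = HIVE_SERVER_THRIFT_PORT_KEY, THRIFT_PORT_DEFAULT
    # Prefer the configured value when present, otherwise fall back to the
    # default that matches the transport mode.
    if key in configurations:
        return int(configurations[key])
    return default
```

With this shape, an http-mode cluster without an explicit port setting would be probed on 10002 rather than 10016, while binary-mode behaviour is unchanged.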