Ritesh created AMBARI-21697:
-------------------------------

             Summary: Spark thrift service was alerting for connectivity while 
using http mode
                 Key: AMBARI-21697
                 URL: https://issues.apache.org/jira/browse/AMBARI-21697
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.5.1
            Reporter: Ritesh


Newly installed clusters keep showing the Spark Thrift Server down alert in 
Ambari when the server runs in http mode.
The alert for the Spark Thrift service appears every time a new cluster is 
created. The script used by the alert is 
/var/lib/ambari-agent/cache/common-services/SPARK2/2.0.0/package/scripts/alerts/alert_spark2_thrift_port.py

Error stack 
=======
Connection failed on host hn0-salqa0.lv5aupozrfhezhozcxr3xjcwqe.dx.internal.cloudapp.net:10016 (Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/SPARK2/2.0.0/package/scripts/alerts/alert_spark2_thrift_port.py", line 144, in execute
    Execute(cmd, user=hiveruser, path=[beeline_cmd], timeout=CHECK_COMMAND_TIMEOUT_DEFAULT)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
ExecutionFailed: Execution of '! beeline -u 'jdbc:hive2://hn0-salqa0.lv5aupozrfhezhozcxr3xjcwqe.dx.internal.cloudapp.net:10016/default' transportMode=http -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL'' returned 1. Error: Could not open client transport with JDBC Uri: jdbc:hive2://hn0-salqa0.lv5aupozrfhezhozcxr3xjcwqe.dx.internal.cloudapp.net:10016/default: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
Error: Could not open client transport with JDBC Uri: jdbc:hive2://hn0-salqa0.lv5aupozrfhezhozcxr3xjcwqe.dx.internal.cloudapp.net:10016/default: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)

It seems that the alert checks the wrong port (10016 instead of 10002) when 
the service is configured in http mode (transportMode=http).
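
To confirm which port the Thrift server actually listens on, a plain TCP probe 
is usually enough. The following is a minimal sketch, not part of the alert 
script: the helper name port_is_open is made up for illustration, and the host 
name in the usage comment is a placeholder.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Connection refused, unreachable host, or timeout.
        return False
    sock.close()
    return True


# Example usage (placeholder host name, run on a cluster node):
#   for port in (10016, 10002):
#       print("%d open: %s" % (port, port_is_open("thrift-host.example", port)))
```

If only 10002 answers while the alert reports a failure on 10016, the alert is 
probing a port the server does not use in http mode.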
Reason
=====
From the logic in the script, the configured port (HIVE_SERVER_THRIFT_PORT_KEY, 
normally the same value as THRIFT_PORT_DEFAULT) is read only when the transport 
mode is binary. In http mode nothing overrides the port, so it stays at 
THRIFT_PORT_DEFAULT. Hence the alert always goes for port 10016. 
============
THRIFT_PORT_DEFAULT = 10016
HIVE_SERVER_TRANSPORT_MODE_DEFAULT = 'binary'

port = THRIFT_PORT_DEFAULT
# Only binary mode reads the configured port; http mode keeps 10016.
if transport_mode.lower() == 'binary' and HIVE_SERVER_THRIFT_PORT_KEY in configurations:
    port = int(configurations[HIVE_SERVER_THRIFT_PORT_KEY])
========
Resolution
=====
We should change the default port to 10002 in the alert script. 
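
One possible shape for such a change is sketched below. It is an illustration 
under assumptions, not the actual patch: it keeps 10016 as the binary default 
and adds a separate http default of 10002. The function name 
resolve_thrift_port, the constant THRIFT_HTTP_PORT_DEFAULT, and the use of 
plain Hive property names as dictionary keys are all hypothetical; the real 
alert script may use different key names.

```python
# Sketch only: names below are illustrative, not copied from the alert script.
THRIFT_PORT_DEFAULT = 10016        # binary-mode default, as in the report
THRIFT_HTTP_PORT_DEFAULT = 10002   # http-mode default proposed in the report
HIVE_SERVER_TRANSPORT_MODE_DEFAULT = 'binary'

# Assumed configuration keys (standard Hive property names).
TRANSPORT_MODE_KEY = 'hive.server2.transport.mode'
THRIFT_PORT_KEY = 'hive.server2.thrift.port'
THRIFT_HTTP_PORT_KEY = 'hive.server2.thrift.http.port'


def resolve_thrift_port(configurations):
    """Return (transport_mode, port) to use for the connectivity check."""
    transport_mode = configurations.get(
        TRANSPORT_MODE_KEY, HIVE_SERVER_TRANSPORT_MODE_DEFAULT).lower()
    if transport_mode == 'http':
        # http mode: use the http port, falling back to the http default.
        port = int(configurations.get(THRIFT_HTTP_PORT_KEY,
                                      THRIFT_HTTP_PORT_DEFAULT))
    else:
        # binary mode: unchanged behaviour from the original logic.
        port = int(configurations.get(THRIFT_PORT_KEY, THRIFT_PORT_DEFAULT))
    return transport_mode, port
```

Choosing the default per transport mode, rather than changing 
THRIFT_PORT_DEFAULT globally, would leave binary-mode clusters unaffected.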






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
