Robert Levas created AMBARI-20349:
-------------------------------------

             Summary: When SPNEGO authentication is enabled for Hadoop in a cluster with NN HA, PXF Process alert fails
                 Key: AMBARI-20349
                 URL: https://issues.apache.org/jira/browse/AMBARI-20349
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.2.2
            Reporter: Robert Levas
            Assignee: Robert Levas
             Fix For: 2.5.0


When SPNEGO authentication is enabled for Hadoop in a cluster where NN HA is 
enabled, the PXF Process alert fails with the following errors in the 
ambari-agent.log file:

{noformat}
ERROR 2017-03-07 18:03:58,417 jmx.py:44 - Getting jmx metrics from NN failed. URL: http://c6401.ambari.apache.org:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 41, in get_value_from_jmx
    data_dict = json.loads(data)
  File "/usr/lib/python2.6/site-packages/ambari_simplejson/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 335, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 353, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
INFO 2017-03-07 18:04:02,769 logger.py:71 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://c6402.ambari.apache.org:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmphTXg76 2>/tmp/tmp5bm2nM''] {'quiet': False}
INFO 2017-03-07 18:04:02,797 logger.py:71 - call returned (0, '')
ERROR 2017-03-07 18:04:02,798 jmx.py:44 - Getting jmx metrics from NN failed. URL: http://c6402.ambari.apache.org:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 41, in get_value_from_jmx
    data_dict = json.loads(data)
  File "/usr/lib/python2.6/site-packages/ambari_simplejson/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 335, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.6/site-packages/ambari_simplejson/decoder.py", line 353, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
{noformat}

*Cause*
During the test for the {{PXF Process}} alert, the Active NN is found using a 
JMX call.  This call requires SPNEGO authentication since SPNEGO authentication 
is turned on for the Hadoop web interfaces. However, a valid Kerberos ticket is 
not found in the configured user's Kerberos ticket cache. In this case, the 
configured user is the HDFS user, which technically is not necessary. 
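The traceback above comes from {{json.loads}} being handed a response body that is not JSON. A minimal sketch of that failure mode follows; it is not the Ambari code. The helper name {{parse_jmx_response}} is hypothetical, it uses the standard {{json}} module rather than {{ambari_simplejson}}, and the exact body returned by an unauthenticated request (empty string, HTML error page) is an assumption.

```python
import json


def parse_jmx_response(data):
    """Return the parsed JMX payload, or None if the body is not JSON.

    Hypothetical helper for illustration only. An unauthenticated SPNEGO
    request does not return a JMX document, so json.loads raises
    ValueError (json.JSONDecodeError is a subclass of ValueError).
    """
    try:
        return json.loads(data)
    except ValueError:
        return None
```

Returning {{None}} here only makes the failure explicit to the caller; it does not address the missing Kerberos ticket, which is the actual cause.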

This occurs in 
{code:title=common-services/PXF/3.0.0/package/alerts/api_status.py:137}
    if CLUSTER_ENV_SECURITY in configurations and configurations[CLUSTER_ENV_SECURITY].lower() == "true":
      if 'dfs.nameservices' in configurations[HDFS_SITE]:
        namenode_address = get_active_namenode(ConfigDictionary(configurations[HDFS_SITE]), configurations[CLUSTER_ENV_SECURITY], configurations[HADOOP_ENV_HDFS_USER])[1]
      else:
        namenode_address = configurations[HDFS_SITE]['dfs.namenode.http-address']

      token = _get_delegation_token(namenode_address,
                                     configurations[HADOOP_ENV_HDFS_USER],
                                     configurations[HADOOP_ENV_HDFS_USER_KEYTAB],
                                     configurations[HADOOP_ENV_HDFS_PRINCIPAL_NAME],
                                     None)
      commonPXFHeaders.update({"X-GP-TOKEN": token})
{code}

Specifically, the failure happens inside this call:

{code}
namenode_address = get_active_namenode(ConfigDictionary(configurations[HDFS_SITE]), configurations[CLUSTER_ENV_SECURITY], configurations[HADOOP_ENV_HDFS_USER])[1]
{code}

*Solution*
Ensure the configured user's Kerberos ticket cache contains a valid ticket 
before querying for the active NN. Possibly change the acting user to the one 
executing the PXF component. 
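One way the first part of this could look is sketched below. This is a sketch under assumptions, not the actual fix. The function name {{ensure_kerberos_ticket}} and the injectable {{run}} parameter are hypothetical. It relies on the MIT Kerberos commands {{klist -s}} (exits 0 only when the cache holds unexpired credentials) and {{kinit -kt <keytab> <principal>}}, and it assumes the code already runs as the user whose ticket cache the later JMX call will use.

```python
import subprocess


def ensure_kerberos_ticket(principal, keytab, run=subprocess.call):
    """Return True if the current user's ticket cache holds a valid ticket.

    Hypothetical helper: `run` takes an argv list and returns the exit
    code, so it can be swapped out for testing.
    """
    # `klist -s` prints nothing and reports cache validity via exit status.
    if run(["klist", "-s"]) == 0:
        return True
    # No valid ticket: try to obtain one from the keytab.
    return run(["kinit", "-kt", keytab, principal]) == 0
```

A caller would invoke this with the configured principal and keytab before looking up the active NN, and skip the JMX query if it returns False.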





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
