Please forgive me if I am sending this twice:
I am having a problem with Ambari not recognizing nodes on a network.
The cluster is using CentOS 6. I am trying to install HDP 2.1. I
have the following values in my hosts file:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.200.144 datanode10.localdomain.com
192.168.200.143 namenode.localdomain.com
192.168.200.107 datanode01.localdomain.com
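To rule out a mismatch between the hosts files on the different nodes, the entries can be parsed and compared with what the resolver on each node returns. This is only a minimal sketch and not part of Ambari: `parse_hosts` and `check_resolution` are hypothetical helper names, and it assumes the usual hosts-file layout of an address followed by one or more names.

```python
import socket

def parse_hosts(text):
    """Map every host name in hosts-file text to its address (first entry wins)."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, address)
    return mapping

def check_resolution(name, expected):
    """Return (ok, detail): does the local resolver give the expected address?"""
    try:
        actual = socket.gethostbyname(name)
    except (socket.error, UnicodeError) as err:  # unknown or malformed name
        return False, str(err)
    return actual == expected, actual

# The hosts file quoted above.
HOSTS = """\
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.200.144 datanode10.localdomain.com
192.168.200.143 namenode.localdomain.com
192.168.200.107 datanode01.localdomain.com
"""
```

Calling `check_resolution(name, address)` for each pair in `parse_hosts(HOSTS)` on both the server and the agent node would show whether the two machines agree on the addresses.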
When I try to connect from namenode.localdomain.com to
datanode10.localdomain.com, I get this error in the registration log:
==========================
Running setup agent script...
DJN...expected_host not defined here
DJN:bootstrap.py ...expected_host is: datanode10.localdomain.com
==========================
....
Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log
("WARNING 2014-12-17 16:22:50,380 NetUtil.py:92 - Server at
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440
is not reachable, sleeping for 10 seconds...
INFO 2014-12-17 16:23:00,390 NetUtil.py:48 - Connecting to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
WARNING 2014-12-17 16:23:00,391 NetUtil.py:71 - Failed to connect to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
due to [Errno -2] Name or service not known
...
Connection to datanode10.localdomain.com closed.
SSH command execution finished
host=datanode10.localdomain.com, exitcode=0
Command end time 2014-12-17 16:23:26 datanode10.localdomain.com
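The striking part of this log is the server name with `.namenode` appended many times. A rough sketch, assuming the agent log keeps the `https://HOST:PORT` form shown above, can pull out the names the agent tried and flag this pattern; `server_hosts` and `strip_repeated_suffix` are hypothetical names, not Ambari functions.

```python
import re

# Host and port of every https URL that appears in the log text.
URL_HOST = re.compile(r"https://([^:/\s]+):(\d+)")

def server_hosts(log_text):
    """Return the distinct server host names in the log, in order of appearance."""
    seen = []
    for host, _port in URL_HOST.findall(log_text):
        if host not in seen:
            seen.append(host)
    return seen

def strip_repeated_suffix(host):
    """Drop a trailing run of one repeated label.

    Returns (cleaned_host, number_of_labels_removed). A label that was
    appended only once cannot be told apart from a real name, so it is kept.
    """
    labels = host.split(".")
    removed = 0
    while len(labels) > 2 and labels[-1] == labels[-2]:
        labels.pop()
        removed += 1
    if removed:
        labels.pop()  # the loop leaves the first copy of the repeated label
        removed += 1
    return ".".join(labels), removed
```

Run against the agent log, a non-zero count from `strip_repeated_suffix` marks a server name that has been mangled, and the cleaned name shows what it was presumably meant to be.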
What follows is more detail.
I also made some changes to the
/usr/lib/python2.6/site-packages/ambari_server/bootstrap.py file:
def run(self):
    sshcommand = ["ssh",
                  "-o", "ConnectTimeOut=60",
                  "-o", "StrictHostKeyChecking=no",
                  "-o", "BatchMode=yes",
                  "-tt",  # Should prevent "tput: No value for $TERM and no -T specified" warning
                  "-i", self.sshkey_file,
                  self.user + "@" + self.host, self.command]
    if DEBUG:
        self.host_log.write("Running ssh command " + ' '.join(sshcommand))
    self.host_log.write("==========================")
    self.host_log.write("\nCommand start time " +
                        datetime.now().strftime('%Y-%m-%d %H:%M:%S') + " " + self.host + " "
                        + self.user + " " + self.sshkey_file + " " + self.command)
    #self.host_log.write("djn:BOOTSTRAP the value is:" + self.host)
    sshstat = subprocess.Popen(sshcommand, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    log = sshstat.communicate()
    errorMsg = log[1]
    if self.errorMessage and sshstat.returncode != 0:
        errorMsg = self.errorMessage + "\n" + errorMsg
    log = log[0] + "\n" + errorMsg
    self.host_log.write(log)
    self.host_log.write("SSH command execution finished")
    self.host_log.write("host=" + self.host + ", exitcode=" +
                        str(sshstat.returncode))
    self.host_log.write("Command end time " +
                        datetime.now().strftime('%Y-%m-%d %H:%M:%S') + " " + self.host)
    return {"exitstatus": sshstat.returncode, "log": log, "errormsg": errorMsg}
I added some information to the host_log output. The information
includes self.host, self.user, self.sshkey_file, and self.command.
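For reproducing the call outside Ambari, the command construction can be pulled out of run() into a standalone function. The following is a sketch under that assumption only; `build_ssh_command` and `run_ssh` are hypothetical names and do not exist in bootstrap.py, although the options are the ones shown in the method above.

```python
import subprocess

def build_ssh_command(user, host, sshkey_file, command):
    """Assemble the same argument list that run() passes to subprocess.Popen."""
    return ["ssh",
            "-o", "ConnectTimeOut=60",
            "-o", "StrictHostKeyChecking=no",
            "-o", "BatchMode=yes",
            "-tt",
            "-i", sshkey_file,
            user + "@" + host, command]

def run_ssh(user, host, sshkey_file, command):
    """Run the command and return exit status, stdout and stderr separately."""
    proc = subprocess.Popen(build_ssh_command(user, host, sshkey_file, command),
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return {"exitstatus": proc.returncode, "log": out, "errormsg": err}
```

Printing `' '.join(build_ssh_command(...))` gives a line that can be pasted into a shell, after quoting the final remote-command argument, to repeat exactly what the bootstrap step does.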
When I run the web front end I get two different results. First I
will detail the connection to namenode.localdomain.com. Second, I
will detail the connection to datanode10.localdomain.com.
The connection to namenode.localdomain.com is successful. Here is
the important part of the registration log:
==========================
Running setup agent script...
DJN...expected_host not defined here
DJN:bootstrap.py ...expected_host is: namenode.localdomain.com
==========================
Command start time 2014-12-17 16:23:17 namenode.localdomain.com root
/var/run/ambari-server/bootstrap/25/sshKey sudo python
/var/lib/ambari-agent/data/tmp/setupAgent1418854996.py
namenode.localdomain.com DEV namenode.localdomain.com 1.7.0 8080
Verifying Python version compatibility...
Using python /usr/bin/python2.6
Found ambari-agent PID: 32172
Stopping ambari-agent
Removing PID file at /var/run/ambari-agent/ambari-agent.pid
ambari-agent successfully stopped
Restarting ambari-agent
Verifying Python version compatibility...
Using python /usr/bin/python2.6
ambari-agent is not running. No PID found at
/var/run/ambari-agent/ambari-agent.pid
Verifying Python version compatibility...
Using python /usr/bin/python2.6
Checking for previously running Ambari Agent...
Starting ambari-agent
Verifying ambari-agent process status...
Ambari Agent successfully started
Agent PID at: /var/run/ambari-agent/ambari-agent.pid
Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log
('INFO 2014-12-17 16:22:56,352 Heartbeat.py:78 - Building Heartbeat:
{responseId = 17, timestamp = 1418854976352, commandsInProgress =
False, componentsMapped = False}
INFO 2014-12-17 16:22:56,407 Controller.py:214 - Heartbeat response
received (id = 18)
INFO 2014-12-17 16:22:56,408 Controller.py:249 - No commands sent from
namenode.localdomain.com
INFO 2014-12-17 16:23:06,409 Heartbeat.py:78 - Building Heartbeat:
{responseId = 18, timestamp = 1418854986409, commandsInProgress =
False, componentsMapped = False}
INFO 2014-12-17 16:23:13,422 HostCheckReportFileHandler.py:43 - Host
check report at /var/lib/ambari-agent/data/hostcheck.result
INFO 2014-12-17 16:23:13,423 HostCheckReportFileHandler.py:104 -
Removing old host check file at
/var/lib/ambari-agent/data/hostcheck.result
INFO 2014-12-17 16:23:13,423 HostCheckReportFileHandler.py:109 -
Creating host check file at
/var/lib/ambari-agent/data/hostcheck.result
INFO 2014-12-17 16:23:13,491 Controller.py:214 - Heartbeat response
received (id = 19)
INFO 2014-12-17 16:23:13,492 Controller.py:249 - No commands sent from
namenode.localdomain.com
INFO 2014-12-17 16:23:21,942 main.py:83 - loglevel=logging.INFO
INFO 2014-12-17 16:23:23,493 Heartbeat.py:78 - Building Heartbeat:
{responseId = 19, timestamp = 1418855003493, commandsInProgress =
False, componentsMapped = False}
INFO 2014-12-17 16:23:23,544 Controller.py:214 - Heartbeat response
received (id = 20)
INFO 2014-12-17 16:23:23,544 Controller.py:249 - No commands sent from
namenode.localdomain.com
INFO 2014-12-17 16:23:28,845 main.py:83 - loglevel=logging.INFO
INFO 2014-12-17 16:23:28,846 DataCleaner.py:36 - Data cleanup thread started
INFO 2014-12-17 16:23:28,847 DataCleaner.py:117 - Data cleanup started
INFO 2014-12-17 16:23:28,857 DataCleaner.py:119 - Data cleanup finished
INFO 2014-12-17 16:23:28,967 PingPortListener.py:51 - Ping port
listener started on port: 8670
INFO 2014-12-17 16:23:28,968 main.py:233 - Connecting to Ambari server
at https://namenode.localdomain.com:8440 (192.168.200.143)
INFO 2014-12-17 16:23:28,969 NetUtil.py:48 - Connecting to
https://namenode.localdomain.com:8440/ca
', None)
Connection to namenode.localdomain.com closed.
SSH command execution finished
host=namenode.localdomain.com, exitcode=0
Command end time 2014-12-17 16:23:31 namenode.localdomain.com
The connection to datanode10.localdomain.com does not work. Here
is the registration log for that attempt:
==========================
Running setup agent script...
DJN...expected_host not defined here
DJN:bootstrap.py ...expected_host is: datanode10.localdomain.com
==========================
Command start time 2014-12-17 16:23:16 datanode10.localdomain.com
root /var/run/ambari-server/bootstrap/25/sshKey sudo python
/var/lib/ambari-agent/data/tmp/setupAgent1418854996.py
datanode10.localdomain.com DEV namenode.localdomain.com 1.7.0 8080
Verifying Python version compatibility...
Using python /usr/bin/python2.6
Found ambari-agent PID: 7325
Stopping ambari-agent
Removing PID file at /var/run/ambari-agent/ambari-agent.pid
ambari-agent successfully stopped
Restarting ambari-agent
Verifying Python version compatibility...
Using python /usr/bin/python2.6
ambari-agent is not running. No PID found at
/var/run/ambari-agent/ambari-agent.pid
Verifying Python version compatibility...
Using python /usr/bin/python2.6
Checking for previously running Ambari Agent...
Starting ambari-agent
Verifying ambari-agent process status...
Ambari Agent successfully started
Agent PID at: /var/run/ambari-agent/ambari-agent.pid
Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log
("WARNING 2014-12-17 16:22:50,380 NetUtil.py:92 - Server at
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440
is not reachable, sleeping for 10 seconds...
INFO 2014-12-17 16:23:00,390 NetUtil.py:48 - Connecting to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
WARNING 2014-12-17 16:23:00,391 NetUtil.py:71 - Failed to connect to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
due to [Errno -2] Name or service not known
WARNING 2014-12-17 16:23:00,391 NetUtil.py:92 - Server at
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440
is not reachable, sleeping for 10 seconds...
INFO 2014-12-17 16:23:10,402 NetUtil.py:48 - Connecting to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
WARNING 2014-12-17 16:23:10,402 NetUtil.py:71 - Failed to connect to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
due to [Errno -2] Name or service not known
WARNING 2014-12-17 16:23:10,402 NetUtil.py:92 - Server at
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440
is not reachable, sleeping for 10 seconds...
INFO 2014-12-17 16:23:17,959 main.py:83 - loglevel=logging.INFO
INFO 2014-12-17 16:23:17,959 main.py:55 - signal received, exiting.
INFO 2014-12-17 16:23:17,960 ProcessHelper.py:39 - Removing pid file
INFO 2014-12-17 16:23:17,960 ProcessHelper.py:46 - Removing temp files
INFO 2014-12-17 16:23:23,639 main.py:83 - loglevel=logging.INFO
INFO 2014-12-17 16:23:23,639 DataCleaner.py:36 - Data cleanup thread started
INFO 2014-12-17 16:23:23,641 DataCleaner.py:117 - Data cleanup started
INFO 2014-12-17 16:23:23,642 DataCleaner.py:119 - Data cleanup finished
INFO 2014-12-17 16:23:23,678 PingPortListener.py:51 - Ping port
listener started on port: 8670
WARNING 2014-12-17 16:23:23,678 main.py:235 - Unable to determine the
IP address of the Ambari server
'namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode'
INFO 2014-12-17 16:23:23,678 NetUtil.py:48 - Connecting to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
WARNING 2014-12-17 16:23:23,679 NetUtil.py:71 - Failed to connect to
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440/ca
due to [Errno -2] Name or service not known
WARNING 2014-12-17 16:23:23,679 NetUtil.py:92 - Server at
https://namenode.localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440
is not reachable, sleeping for 10 seconds...
", None)
Connection to datanode10.localdomain.com closed.
SSH command execution finished
host=datanode10.localdomain.com, exitcode=0
Command end time 2014-12-17 16:23:26 datanode10.localdomain.com
Registering with the server...
Registration with the server failed.
===============================
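Note that the SSH step reports exitcode=0 even though registration fails, so the exit code alone says little. A small sketch, assuming the log format shown above (including my own "expected_host is:" debug line), can summarize the parts that matter; `summarize` is a hypothetical helper name.

```python
import re

EXPECTED = re.compile(r"expected_host is: (\S+)")
EXIT = re.compile(r"host=(\S+), exitcode=(\d+)")

def summarize(log_text):
    """Collect the fields of a registration log that indicate success or failure."""
    expected = EXPECTED.search(log_text)
    exit_line = EXIT.search(log_text)
    return {
        "expected_host": expected.group(1) if expected else None,
        "ssh_exitcode": int(exit_line.group(2)) if exit_line else None,
        # Each retry of the agent logs one "is not reachable" warning.
        "unreachable_warnings": log_text.count("is not reachable"),
        "registration_failed": "Registration with the server failed" in log_text,
    }
```

For the datanode10 log this would report an SSH exit code of 0 together with several unreachable warnings and a failed registration, which points at the agent-to-server connection and not at the SSH bootstrap itself.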
To double-check, I ran the following command by hand, built from the
sshcommand in the bootstrap.py script:
[root@namenode ~]# ssh -v -o ConnectTimeOut=60 -o
StrictHostKeyChecking=no -o BatchMode=yes -tt -i /root/Desktop/id_rsa
root@datanode10.localdomain.com "[ -d /var/lib/ambari-agent/data/tmp ]
|| sudo mkdir -p /var/lib/ambari-agent/data/tmp ; sudo chown root
/var/lib/ambari-agent/data/tmp"
The command worked and exited with a code of 0. More detail follows.
I added the -v option. The path to the id_rsa key file is the same
one that I entered on the first page of the wizard. The result is
as follows:
[root@namenode ~]# ssh -v -o ConnectTimeOut=60 -o
StrictHostKeyChecking=no -o BatchMode=yes -tt -i /root/Desktop/id_rsa
root@datanode10.localdomain.com "[ -d /var/lib/ambari-agent/data/tmp ]
|| sudo mkdir -p /var/lib/ambari-agent/data/tmp ; sudo chown root
/var/lib/ambari-agent/data/tmp"
OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to datanode10.localdomain.com [192.168.200.144] port 22.
debug1: fd 3 clearing O_NONBLOCK
debug1: Connection established.
debug1: permanently_set_uid: 0/0
debug1: identity file /root/Desktop/id_rsa type 1
debug1: identity file /root/Desktop/id_rsa-cert type -1
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.3
debug1: match: OpenSSH_5.3 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.3
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-md5 none
debug1: kex: client->server aes128-ctr hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
debug1: Host 'datanode10.localdomain.com' is known and matches the RSA host key.
debug1: Found key in /root/.ssh/known_hosts:13
debug1: ssh_rsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue:
publickey,gssapi-keyex,gssapi-with-mic,password
debug1: Next authentication method: gssapi-keyex
debug1: No valid Key exchange context
debug1: Next authentication method: gssapi-with-mic
debug1: Unspecified GSS failure. Minor code may provide more information
Credentials cache file '/tmp/krb5cc_0' not found
debug1: Unspecified GSS failure. Minor code may provide more information
Credentials cache file '/tmp/krb5cc_0' not found
debug1: Unspecified GSS failure. Minor code may provide more information
debug1: Unspecified GSS failure. Minor code may provide more information
Credentials cache file '/tmp/krb5cc_0' not found
debug1: Next authentication method: publickey
debug1: Offering public key: /root/Desktop/id_rsa
debug1: Server accepts key: pkalg ssh-rsa blen 277
debug1: Authentication succeeded (publickey).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: Sending environment.
debug1: Sending env XMODIFIERS = @im=none
debug1: Sending env LANG = en_US.UTF-8
debug1: Sending command: [ -d /var/lib/ambari-agent/data/tmp ] || sudo
mkdir -p /var/lib/ambari-agent/data/tmp ; sudo chown root
/var/lib/ambari-agent/data/tmp
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: client_input_channel_req: channel 0 rtype eow@openssh.com reply 0
debug1: channel 0: free: client-session, nchannels 1
Connection to datanode10.localdomain.com closed.
Transferred: sent 2952, received 2352 bytes, in 0.0 seconds
Bytes per second: sent 106095.7, received 84531.6
debug1: Exit status 0
David Novogrodsky
[email protected]
http://www.linkedin.com/in/davidnovogrodsky
On Wed, Dec 17, 2014 at 6:14 PM, Jeff Sposetti <[email protected]> wrote:
> Hi David, Try sending in plain/text, not HTML.
>
>
> On Wed, Dec 17, 2014 at 7:10 PM, David Novogrodsky
> <[email protected]> wrote:
>>
>> I am having problems adding more
>> information to this post:
>> Delivery to the following recipient failed permanently:
>>
>> [email protected]
>>
>> Technical details of permanent failure:
>> Google tried to deliver your message, but it was rejected by the server
>> for the recipient domain ambari.apache.org by
>> mx1.eu.apache.org.[192.87.106.230].
>>
>> The error that the other server returned was:
>> 552 spam score (6.3) exceeded threshold
>> (HTML_MESSAGE,LONGWORDS,RCVD_IN_DNSWL_LOW,SPF_PASS,SPOOF_COM2OTH,WEIRD_PORT
>>
>>
>> On Wed, Dec 17, 2014 at 1:12 PM, David Novogrodsky
>> <[email protected]> wrote:
>>>
>>> The error from the registration log is as follows:
>>> ==========================
>>> Running setup agent script...
>>> ==========================
>>> Agent log at: /var/log/ambari-agent/ambari-agent.log
>>> ("WARNING 2014-12-17 10:43:08,349 NetUtil.py:92 - Server at
>>> https://namenode .
>>> localdomain.com.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode.namenode:8440
>>> is not reachable, sleeping for 10 seconds...
>>>
>>>