Try running "yum clean all" and then "yum install mysql-connector-java" from
the command line on the hosts with any HIVE or OOZIE components.
Then retry from the UI.
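To apply this workaround on several hosts at once, the two steps can be scripted. Below is a minimal Python sketch. It assumes passwordless ssh as root to each host. The host names are hypothetical placeholders, and `-y` (non-interactive yum) is an addition not in the advice above. By default the script only prints the commands.

```python
import shlex
import subprocess

# Hypothetical placeholders: list the hosts that run HIVE or OOZIE components.
HIVE_OOZIE_HOSTS = ["master1.example.com", "slave2.example.com"]

# The suggested cleanup and install, chained so the install only runs if the
# cache cleanup succeeds. "-y" (assumption) answers yum's prompt automatically.
REMOTE_STEPS = "yum clean all && yum -y install mysql-connector-java"


def build_commands(hosts):
    """Return one ssh argv list per host (assumes passwordless ssh as root)."""
    return [["ssh", f"root@{host}", REMOTE_STEPS] for host in hosts]


def run(hosts, dry_run=True):
    """Print the commands (dry run) or execute them host by host."""
    for argv in build_commands(hosts):
        if dry_run:
            print(shlex.join(argv))
        else:
            # check=True raises CalledProcessError at the first failing host.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(HIVE_OOZIE_HOSTS)  # dry run: prints one ssh command per host
```

Pass `dry_run=False` to actually run the commands, then retry from the UI.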
-Sid
On Sun, Jul 13, 2014 at 12:36 PM, Suraj Nayak M <[email protected]> wrote:
Hi Sumit,
By "I restarted the process" I meant that I restarted the deployment from
the UI (using the Retry button in the browser).
You were right. Task 10 was stuck at the *mysql-connector-java*
installation :)
2014-07-13 20:05:32,755 - Repository['HDP-2.1'] {'action':
['create'], 'mirror_list': None, 'base_url':
'http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.1.3.0',
'components': ['HDP', 'main'], 'repo_file_name': 'HDP'}
2014-07-13 20:05:32,761 - File['/etc/yum.repos.d/HDP.repo']
{'content': InlineTemplate(...)}
2014-07-13 20:05:32,762 - Package['hive'] {}
2014-07-13 20:05:32,780 - Installing package hive ('/usr/bin/yum
-d 0 -e 0 -y install hive')
2014-07-13 20:08:32,772 - Package['mysql-connector-java'] {}
2014-07-13 20:08:32,802 - Installing package mysql-connector-java
('/usr/bin/yum -d 0 -e 0 -y install mysql-connector-java')
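The timestamps in this excerpt show where the time went. The `hive` package install started at 20:05:32, and the next resource did not start until 20:08:32, about three minutes later. Below is a small Python sketch that computes those gaps for any agent output file. It assumes each entry begins with a `YYYY-MM-DD HH:MM:SS,mmm - ` prefix, as above, and skips wrapped continuation lines.

```python
from datetime import datetime

LOG_EXCERPT = """\
2014-07-13 20:05:32,762 - Package['hive'] {}
2014-07-13 20:05:32,780 - Installing package hive ('/usr/bin/yum -d 0 -e 0 -y install hive')
2014-07-13 20:08:32,772 - Package['mysql-connector-java'] {}
2014-07-13 20:08:32,802 - Installing package mysql-connector-java ('/usr/bin/yum -d 0 -e 0 -y install mysql-connector-java')
"""


def parse_entries(lines):
    """Yield (timestamp, message) for lines that begin a new log entry."""
    for line in lines:
        stamp, sep, message = line.partition(" - ")
        if not sep:
            continue  # wrapped continuation line
        try:
            ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        except ValueError:
            continue  # text that merely contains " - "
        yield ts, message


def gaps(lines):
    """Return (seconds until the next entry, message) for every entry but the last.

    The last entry has no successor, so its duration is unknown; in a hung
    task that is exactly the step that never finished.
    """
    entries = list(parse_entries(lines))
    return [((nxt_ts - ts).total_seconds(), msg)
            for (ts, msg), (nxt_ts, _) in zip(entries, entries[1:])]


if __name__ == "__main__":
    for seconds, message in gaps(LOG_EXCERPT.splitlines()):
        print(f"{seconds:9.3f}s  {message}")
```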
I have also noticed that if the network is slow, the install succeeds
for some components and fails for others. On retry (from the UI), the
install continues from the failure point and the previously
failed component succeeds. The cycle repeats until all
the components are installed. Is there any way I can increase the
timeout of the Python script? Or can we have a fix in Ambari for the
following condition:
"If the error is due to a Python script timeout, restart the
process"?
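The requested behavior is to retry a step only when it was killed for exceeding its timeout. The Python sketch below illustrates that rule generically. It is not how Ambari implements or configures its script timeout, and the function name and parameters are hypothetical. It relies only on `subprocess.run`, which kills the child process when `timeout` expires.

```python
import subprocess
import sys


def run_with_timeout_retry(argv, timeout_s, max_attempts):
    """Run argv, restarting it only when it is killed for exceeding timeout_s.

    Returns (returncode, attempts_used). Failures other than a timeout are
    returned immediately, not retried. If every attempt times out, the last
    subprocess.TimeoutExpired is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            # On timeout, subprocess.run kills the child before raising.
            result = subprocess.run(argv, timeout=timeout_s)
            return result.returncode, attempt
        except subprocess.TimeoutExpired:
            if attempt == max_attempts:
                raise
            print(f"attempt {attempt} timed out after {timeout_s}s; retrying",
                  file=sys.stderr)


if __name__ == "__main__":
    # Stand-in for a slow install step such as a yum command.
    step = [sys.executable, "-c", "print('step finished')"]
    print(run_with_timeout_retry(step, timeout_s=30, max_attempts=3))
```

Retrying only on timeout keeps genuine failures, such as a package that does not exist, visible instead of looping on them.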
The network was slow for some reason. The installation failed
and the error below was displayed (screenshot attached).
*Details of error:*
*ERROR:* Python script has been killed due to timeout.
File */var/lib/ambari-agent/data/errors-181.txt* doesn't contain any
data.
Content of */var/lib/ambari-agent/data/output-181.txt*:
2014-07-14 00:07:01,673 - Package['unzip'] {}
2014-07-14 00:07:01,770 - Skipping installing existent package unzip
2014-07-14 00:07:01,772 - Package['curl'] {}
2014-07-14 00:07:01,872 - Skipping installing existent package curl
2014-07-14 00:07:01,874 - Package['net-snmp-utils'] {}
2014-07-14 00:07:01,966 - Skipping installing existent package
net-snmp-utils
2014-07-14 00:07:01,967 - Package['net-snmp'] {}
2014-07-14 00:07:02,060 - Skipping installing existent package
net-snmp
2014-07-14 00:07:02,064 - Group['hadoop'] {}
2014-07-14 00:07:02,069 - Modifying group hadoop
2014-07-14 00:07:02,141 - Group['users'] {}
2014-07-14 00:07:02,142 - Modifying group users
2014-07-14 00:07:02,222 - Group['users'] {}
2014-07-14 00:07:02,224 - Modifying group users
2014-07-14 00:07:02,306 - User['ambari-qa'] {'gid': 'hadoop',
'groups': [u'users']}
2014-07-14 00:07:02,307 - Modifying user ambari-qa
2014-07-14 00:07:02,380 - File['/tmp/changeUid.sh'] {'content':
StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2014-07-14 00:07:02,385 - Execute['/tmp/changeUid.sh ambari-qa
/tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa
2>/dev/null'] {'not_if': 'test $(id -u ambari-qa) -gt 1000'}
2014-07-14 00:07:02,454 - Skipping Execute['/tmp/changeUid.sh
ambari-qa
/tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa
2>/dev/null'] due to not_if
2014-07-14 00:07:02,456 - User['hbase'] {'gid': 'hadoop',
'groups': [u'hadoop']}
2014-07-14 00:07:02,456 - Modifying user hbase
2014-07-14 00:07:02,528 - File['/tmp/changeUid.sh'] {'content':
StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2014-07-14 00:07:02,531 - Execute['/tmp/changeUid.sh hbase
/home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/hadoop/hbase
2>/dev/null'] {'not_if': 'test $(id -u hbase) -gt 1000'}
2014-07-14 00:07:02,600 - Skipping Execute['/tmp/changeUid.sh
hbase
/home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/hadoop/hbase
2>/dev/null'] due to not_if
2014-07-14 00:07:02,602 - Group['nagios'] {}
2014-07-14 00:07:02,602 - Modifying group nagios
2014-07-14 00:07:02,687 - User['nagios'] {'gid': 'nagios'}
2014-07-14 00:07:02,689 - Modifying user nagios
2014-07-14 00:07:02,757 - User['oozie'] {'gid': 'hadoop'}
2014-07-14 00:07:02,758 - Modifying user oozie
2014-07-14 00:07:02,826 - User['hcat'] {'gid': 'hadoop'}
2014-07-14 00:07:02,828 - Modifying user hcat
2014-07-14 00:07:02,897 - User['hcat'] {'gid': 'hadoop'}
2014-07-14 00:07:02,898 - Modifying user hcat
2014-07-14 00:07:02,964 - User['hive'] {'gid': 'hadoop'}
2014-07-14 00:07:02,965 - Modifying user hive
2014-07-14 00:07:03,032 - User['yarn'] {'gid': 'hadoop'}
2014-07-14 00:07:03,034 - Modifying user yarn
2014-07-14 00:07:03,099 - Group['nobody'] {}
2014-07-14 00:07:03,100 - Modifying group nobody
2014-07-14 00:07:03,178 - Group['nobody'] {}
2014-07-14 00:07:03,179 - Modifying group nobody
2014-07-14 00:07:03,260 - User['nobody'] {'gid': 'hadoop',
'groups': [u'nobody']}
2014-07-14 00:07:03,261 - Modifying user nobody
2014-07-14 00:07:03,330 - User['nobody'] {'gid': 'hadoop',
'groups': [u'nobody']}
2014-07-14 00:07:03,332 - Modifying user nobody
2014-07-14 00:07:03,401 - User['hdfs'] {'gid': 'hadoop', 'groups':
[u'hadoop']}
2014-07-14 00:07:03,403 - Modifying user hdfs
2014-07-14 00:07:03,471 - User['mapred'] {'gid': 'hadoop',
'groups': [u'hadoop']}
2014-07-14 00:07:03,473 - Modifying user mapred
2014-07-14 00:07:03,544 - User['zookeeper'] {'gid': 'hadoop'}
2014-07-14 00:07:03,545 - Modifying user zookeeper
2014-07-14 00:07:03,616 - User['storm'] {'gid': 'hadoop',
'groups': [u'hadoop']}
2014-07-14 00:07:03,618 - Modifying user storm
2014-07-14 00:07:03,688 - User['falcon'] {'gid': 'hadoop',
'groups': [u'hadoop']}
2014-07-14 00:07:03,689 - Modifying user falcon
2014-07-14 00:07:03,758 - User['tez'] {'gid': 'hadoop', 'groups':
[u'users']}
2014-07-14 00:07:03,760 - Modifying user tez
2014-07-14 00:07:04,073 - Repository['HDP-2.1'] {'action':
['create'], 'mirror_list': None, 'base_url':
'http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.1.3.0',
'components': ['HDP', 'main'], 'repo_file_name': 'HDP'}
2014-07-14 00:07:04,084 - File['/etc/yum.repos.d/HDP.repo']
{'content': InlineTemplate(...)}
2014-07-14 00:07:04,086 - Package['oozie'] {}
2014-07-14 00:07:04,177 - Installing package oozie ('/usr/bin/yum
-d 0 -e 0 -y install oozie')
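Because the script was killed mid-step, the last entry in the output file identifies what was still running. Here it was the `oozie` package install, just as task 10 stopped at `mysql-connector-java`. Below is a Python sketch that extracts that package name. It assumes the unwrapped, one-entry-per-line format of the agent output files (the email has wrapped some lines).

```python
import re

# Each agent log entry starts with a "YYYY-MM-DD HH:MM:SS,mmm - " prefix.
ENTRY_START = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - ")
# Matches "Installing package X" but not "Skipping installing existent package X".
INSTALLING = re.compile(r" - Installing package (\S+)")


def package_in_progress(log_text):
    """Return the package on the last entry if that entry is an install, else None."""
    entries = [line for line in log_text.splitlines() if ENTRY_START.match(line)]
    if not entries:
        return None
    match = INSTALLING.search(entries[-1])
    return match.group(1) if match else None


SAMPLE = """\
2014-07-14 00:07:04,084 - File['/etc/yum.repos.d/HDP.repo'] {'content': InlineTemplate(...)}
2014-07-14 00:07:04,086 - Package['oozie'] {}
2014-07-14 00:07:04,177 - Installing package oozie ('/usr/bin/yum -d 0 -e 0 -y install oozie')
"""

if __name__ == "__main__":
    print(package_in_progress(SAMPLE))  # prints: oozie
```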
--
Suraj Nayak
On Sunday 13 July 2014 09:13 PM, Sumit Mohanty wrote:
By "I restarted the process", do you mean that you restarted the
installation?
Can you share the command logs for tasks (e.g. 10, 42, 58, etc.)?
These would help debug why the tasks are still active.
If you open the past requests in the Ambari UI (top left), the
task-specific view shows the hosts and the local file names on each
host. For task ID 10, the files are named
/var/lib/ambari-agent/data/output-10.txt and
/var/lib/ambari-agent/data/errors-10.txt.
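Given a task ID from the UI, the two file names follow directly. Below is a Python sketch to run on the host the UI lists for the task. It prints the tail of the output file and reports whether the errors file has any data (it may be empty, as with task 181 in this thread). The helper names and the default of five tail lines are illustrative choices.

```python
from pathlib import Path

# Directory named in the thread; the agent writes one pair of files per task ID.
AGENT_DATA_DIR = Path("/var/lib/ambari-agent/data")


def task_log_paths(task_id, data_dir=AGENT_DATA_DIR):
    """Return (output_file, errors_file) for an Ambari task ID."""
    return (data_dir / f"output-{task_id}.txt",
            data_dir / f"errors-{task_id}.txt")


def summarize(task_id, data_dir=AGENT_DATA_DIR, tail_lines=5):
    """Print the last lines of each file, or note that it is missing or empty."""
    for path in task_log_paths(task_id, data_dir):
        if not path.exists():
            print(f"{path}: not found on this host")
            continue
        lines = path.read_text(errors="replace").splitlines()
        if not lines:
            print(f"{path}: empty")
            continue
        print(f"{path}: last {min(tail_lines, len(lines))} line(s)")
        for line in lines[-tail_lines:]:
            print("   ", line)


if __name__ == "__main__":
    for task_id in (10, 42, 58):  # task IDs mentioned in this thread
        summarize(task_id)
```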
From the above, I surmise that the agents are still stuck
executing the older tasks, so they cannot execute the new commands
sent by Ambari Server when you retried the installation. I suggest
looking at the command logs to see why they are stuck. Restarting
Ambari Server may not help: if the agents are stuck executing the
tasks, you may need to restart the agents.
-Sumit
On Sun, Jul 13, 2014 at 8:00 AM, Suraj Nayak M <[email protected]> wrote:
Hi,
I am trying to install HDP 2.1 using Ambari on 4 nodes: 2 NameNode
hosts and 2 slaves. The install failed due to a Python script
timeout. I restarted the process. For the past 2 hours there has been no
progress in the installation. Is it safe to kill the Ambari
server and restart the process? How can I terminate the
ongoing process in Ambari gracefully?
Below is the tail of the Ambari Server log.
20:12:08,530 WARN [qtp527311109-183] HeartBeatHandler:369 -
Operation failed - may be retried. Service component host:
HIVE_CLIENT, host: slave2.hdp.somedomain.com Action id1-1
20:12:08,530 INFO [qtp527311109-183] HeartBeatHandler:375 -
Received report for a command that is no longer active.
CommandReport{role='HIVE_CLIENT', actionId='1-1',
status='FAILED', exitCode=999, clusterName='HDP2_CLUSTER1',
serviceName='HIVE', taskId=57, roleCommand=INSTALL,
configurationTags=null, customCommand=null}
20:12:08,530 WARN [qtp527311109-183] ActionManager:143 - The
task 57 is not in progress, ignoring update
20:12:08,966 WARN [qtp527311109-183] ActionManager:143 - The
task 26 is not in progress, ignoring update
20:12:12,319 WARN [qtp527311109-183] ActionManager:143 - The
task 58 is not in progress, ignoring update
20:12:12,605 WARN [qtp527311109-183] ActionManager:143 - The
task 42 is not in progress, ignoring update
20:12:14,872 WARN [qtp527311109-183] ActionManager:143 - The
task 10 is not in progress, ignoring update
20:12:19,039 WARN [qtp527311109-184] ActionManager:143 - The
task 26 is not in progress, ignoring update
20:12:22,382 WARN [qtp527311109-183] ActionManager:143 - The
task 58 is not in progress, ignoring update
20:12:22,655 WARN [qtp527311109-183] ActionManager:143 - The
task 42 is not in progress, ignoring update
20:12:24,919 WARN [qtp527311109-184] ActionManager:143 - The
task 10 is not in progress, ignoring update
20:12:29,086 WARN [qtp527311109-184] ActionManager:143 - The
task 26 is not in progress, ignoring update
20:12:32,576 WARN [qtp527311109-183] ActionManager:143 - The
task 58 is not in progress, ignoring update
20:12:32,704 WARN [qtp527311109-183] ActionManager:143 - The
task 42 is not in progress, ignoring update
20:12:34,955 WARN [qtp527311109-183] ActionManager:143 - The
task 10 is not in progress, ignoring update
20:12:39,132 WARN [qtp527311109-183] ActionManager:143 - The
task 26 is not in progress, ignoring update
20:12:42,629 WARN [qtp527311109-184] ActionManager:143 - The
task 58 is not in progress, ignoring update
20:12:42,754 WARN [qtp527311109-184] ActionManager:143 - The
task 42 is not in progress, ignoring update
20:12:45,137 WARN [qtp527311109-183] ActionManager:143 - The
task 10 is not in progress, ignoring update
20:12:49,320 WARN [qtp527311109-183] ActionManager:143 - The
task 26 is not in progress, ignoring update
20:12:52,962 WARN [qtp527311109-184] ActionManager:143 - The
task 58 is not in progress, ignoring update
20:12:53,093 WARN [qtp527311109-184] ActionManager:143 - The
task 42 is not in progress, ignoring update
20:12:55,184 WARN [qtp527311109-184] ActionManager:143 - The
task 10 is not in progress, ignoring update
20:12:59,366 WARN [qtp527311109-184] ActionManager:143 - The
task 26 is not in progress, ignoring update
20:13:03,013 WARN [qtp527311109-184] ActionManager:143 - The
task 58 is not in progress, ignoring update
20:13:03,257 WARN [qtp527311109-184] ActionManager:143 - The
task 42 is not in progress, ignoring update
20:13:05,231 WARN [qtp527311109-184] ActionManager:143 - The
task 10 is not in progress, ignoring update
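The repeating warnings fit Sumit's diagnosis earlier in the thread. Updates for tasks 10, 26, 42 and 58 keep arriving about every ten seconds, while the server no longer treats those tasks as in progress. Below is a Python sketch that tallies these reports from the server log. The regular expression is written against the message text shown above and tolerates the line break the email inserted after "The".

```python
import re
from collections import Counter

# ActionManager warning from the server log; \s+ also matches a line break.
STALE_UPDATE = re.compile(r"The\s+task\s+(\d+)\s+is not in progress, ignoring update")


def stale_task_counts(log_text):
    """Count the ignored updates per task ID."""
    return Counter(int(task_id) for task_id in STALE_UPDATE.findall(log_text))


SAMPLE = """\
20:12:08,530 WARN [qtp527311109-183] ActionManager:143 - The
task 57 is not in progress, ignoring update
20:12:08,966 WARN [qtp527311109-183] ActionManager:143 - The
task 26 is not in progress, ignoring update
20:12:19,039 WARN [qtp527311109-184] ActionManager:143 - The
task 26 is not in progress, ignoring update
"""

if __name__ == "__main__":
    print(stale_task_counts(SAMPLE).most_common())  # prints: [(26, 2), (57, 1)]
```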
--
Thanks
Suraj Nayak
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or
entity to which it is addressed and may contain information that
is confidential, privileged and exempt from disclosure under
applicable law. If the reader of this message is not the intended
recipient, you are hereby notified that any printing, copying,
dissemination, distribution, disclosure or forwarding of this
communication is strictly prohibited. If you have received this
communication in error, please contact the sender immediately and
delete it from your system. Thank You.