Sid,

Thanks for your suggestion.

*mysql-connector-java* was the initial error. It resolved itself after a long wait. (I will try your suggestion on my next install :-) )

Below are my attempts leading up to the successful install:

*Try-1*: Started the cluster install. A few components failed (the mysql-connector-java install was still running via the agents).
*Try-2*: Used the Retry option from the UI. All processes were waiting; after a long time (once the mysql-connector-java install finished), the waiting processes started. A few components installed successfully, and a few failed due to the Python script timeout error.
*Try-3*: Used the Retry option from the UI. The previously failed component installs succeeded, but the Python script timed out again during the Oozie client install (screenshot attached in the previous mail).
*Try-4*: Success. (There were some warnings due to JAVA_HOME, which I am solving now.)

Can I increase the timeout period of the Python script that kept failing during the install?
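If it is configurable, I am guessing something along these lines would do it (just a sketch; I am assuming the server-side agent.task.timeout property, in seconds, in /etc/ambari-server/conf/ambari.properties is the right knob - please correct me if not):

```shell
# Sketch: raise the agent task timeout to 30 minutes (assumed property name).
PROPS=/etc/ambari-server/conf/ambari.properties
if grep -q '^agent.task.timeout=' "$PROPS"; then
  # Property already present: rewrite its value in place.
  sed -i 's/^agent.task.timeout=.*/agent.task.timeout=1800/' "$PROPS"
else
  # Property absent: append it.
  echo 'agent.task.timeout=1800' >> "$PROPS"
fi
ambari-server restart   # restart so the new value takes effect
```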

--
Suraj Nayak

On Monday 14 July 2014 01:29 AM, Siddharth Wagle wrote:
Try a "yum clean all" and a "yum install mysql-connector-java" from the command line on the hosts with any HIVE or OOZIE components.

Then retry from UI.

-Sid


On Sun, Jul 13, 2014 at 12:36 PM, Suraj Nayak M <[email protected]> wrote:

    Hi Sumit,

    "I restarted the process" meant that I restarted the deployment
    from the UI (using the Retry button in the browser).

    You were right. Task 10 was stuck at the *mysql-connector-java*
    installation :)

    2014-07-13 20:05:32,755 - Repository['HDP-2.1'] {'action':
    ['create'], 'mirror_list': None, 'base_url':
    'http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.1.3.0',
    'components': ['HDP', 'main'], 'repo_file_name': 'HDP'}
    2014-07-13 20:05:32,761 - File['/etc/yum.repos.d/HDP.repo']
    {'content': InlineTemplate(...)}
    2014-07-13 20:05:32,762 - Package['hive'] {}
    2014-07-13 20:05:32,780 - Installing package hive ('/usr/bin/yum
    -d 0 -e 0 -y install hive')
    2014-07-13 20:08:32,772 - Package['mysql-connector-java'] {}
    2014-07-13 20:08:32,802 - Installing package mysql-connector-java
    ('/usr/bin/yum -d 0 -e 0 -y install mysql-connector-java')

    I have also noticed that if the network is slow, the install
    succeeds for a few components and fails for others. On retry (from
    the UI), the install continues from the point of failure and the
    previously failed components succeed. The cycle repeats until all
    the components are installed. Is there any way I can increase the
    timeout of the Python script? Or can we have a fix in Ambari for
    the following condition:

     "If the error is due to a Python script timeout, retry the
    task" ?

    The network was slow for some reason. The installation failed and
    the error below was displayed (screenshot attached).

    *Details of the error:*

    *ERROR:* Python script has been killed due to timeout.

    The file */var/lib/ambari-agent/data/errors-181.txt* does not
    contain any data.

    Content of */var/lib/ambari-agent/data/output-181.txt*:

    2014-07-14 00:07:01,673 - Package['unzip'] {}
    2014-07-14 00:07:01,770 - Skipping installing existent package unzip
    2014-07-14 00:07:01,772 - Package['curl'] {}
    2014-07-14 00:07:01,872 - Skipping installing existent package curl
    2014-07-14 00:07:01,874 - Package['net-snmp-utils'] {}
    2014-07-14 00:07:01,966 - Skipping installing existent package
    net-snmp-utils
    2014-07-14 00:07:01,967 - Package['net-snmp'] {}
    2014-07-14 00:07:02,060 - Skipping installing existent package
    net-snmp
    2014-07-14 00:07:02,064 - Group['hadoop'] {}
    2014-07-14 00:07:02,069 - Modifying group hadoop
    2014-07-14 00:07:02,141 - Group['users'] {}
    2014-07-14 00:07:02,142 - Modifying group users
    2014-07-14 00:07:02,222 - Group['users'] {}
    2014-07-14 00:07:02,224 - Modifying group users
    2014-07-14 00:07:02,306 - User['ambari-qa'] {'gid': 'hadoop',
    'groups': [u'users']}
    2014-07-14 00:07:02,307 - Modifying user ambari-qa
    2014-07-14 00:07:02,380 - File['/tmp/changeUid.sh'] {'content':
    StaticFile('changeToSecureUid.sh'), 'mode': 0555}
    2014-07-14 00:07:02,385 - Execute['/tmp/changeUid.sh ambari-qa
    
/tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa
    2>/dev/null'] {'not_if': 'test $(id -u ambari-qa) -gt 1000'}
    2014-07-14 00:07:02,454 - Skipping Execute['/tmp/changeUid.sh
    ambari-qa
    
/tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa
    2>/dev/null'] due to not_if
    2014-07-14 00:07:02,456 - User['hbase'] {'gid': 'hadoop',
    'groups': [u'hadoop']}
    2014-07-14 00:07:02,456 - Modifying user hbase
    2014-07-14 00:07:02,528 - File['/tmp/changeUid.sh'] {'content':
    StaticFile('changeToSecureUid.sh'), 'mode': 0555}
    2014-07-14 00:07:02,531 - Execute['/tmp/changeUid.sh hbase
    /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/hadoop/hbase
    2>/dev/null'] {'not_if': 'test $(id -u hbase) -gt 1000'}
    2014-07-14 00:07:02,600 - Skipping Execute['/tmp/changeUid.sh
    hbase
    /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/hadoop/hbase
    2>/dev/null'] due to not_if
    2014-07-14 00:07:02,602 - Group['nagios'] {}
    2014-07-14 00:07:02,602 - Modifying group nagios
    2014-07-14 00:07:02,687 - User['nagios'] {'gid': 'nagios'}
    2014-07-14 00:07:02,689 - Modifying user nagios
    2014-07-14 00:07:02,757 - User['oozie'] {'gid': 'hadoop'}
    2014-07-14 00:07:02,758 - Modifying user oozie
    2014-07-14 00:07:02,826 - User['hcat'] {'gid': 'hadoop'}
    2014-07-14 00:07:02,828 - Modifying user hcat
    2014-07-14 00:07:02,897 - User['hcat'] {'gid': 'hadoop'}
    2014-07-14 00:07:02,898 - Modifying user hcat
    2014-07-14 00:07:02,964 - User['hive'] {'gid': 'hadoop'}
    2014-07-14 00:07:02,965 - Modifying user hive
    2014-07-14 00:07:03,032 - User['yarn'] {'gid': 'hadoop'}
    2014-07-14 00:07:03,034 - Modifying user yarn
    2014-07-14 00:07:03,099 - Group['nobody'] {}
    2014-07-14 00:07:03,100 - Modifying group nobody
    2014-07-14 00:07:03,178 - Group['nobody'] {}
    2014-07-14 00:07:03,179 - Modifying group nobody
    2014-07-14 00:07:03,260 - User['nobody'] {'gid': 'hadoop',
    'groups': [u'nobody']}
    2014-07-14 00:07:03,261 - Modifying user nobody
    2014-07-14 00:07:03,330 - User['nobody'] {'gid': 'hadoop',
    'groups': [u'nobody']}
    2014-07-14 00:07:03,332 - Modifying user nobody
    2014-07-14 00:07:03,401 - User['hdfs'] {'gid': 'hadoop', 'groups':
    [u'hadoop']}
    2014-07-14 00:07:03,403 - Modifying user hdfs
    2014-07-14 00:07:03,471 - User['mapred'] {'gid': 'hadoop',
    'groups': [u'hadoop']}
    2014-07-14 00:07:03,473 - Modifying user mapred
    2014-07-14 00:07:03,544 - User['zookeeper'] {'gid': 'hadoop'}
    2014-07-14 00:07:03,545 - Modifying user zookeeper
    2014-07-14 00:07:03,616 - User['storm'] {'gid': 'hadoop',
    'groups': [u'hadoop']}
    2014-07-14 00:07:03,618 - Modifying user storm
    2014-07-14 00:07:03,688 - User['falcon'] {'gid': 'hadoop',
    'groups': [u'hadoop']}
    2014-07-14 00:07:03,689 - Modifying user falcon
    2014-07-14 00:07:03,758 - User['tez'] {'gid': 'hadoop', 'groups':
    [u'users']}
    2014-07-14 00:07:03,760 - Modifying user tez
    2014-07-14 00:07:04,073 - Repository['HDP-2.1'] {'action':
    ['create'], 'mirror_list': None, 'base_url':
    'http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.1.3.0',
    'components': ['HDP', 'main'], 'repo_file_name': 'HDP'}
    2014-07-14 00:07:04,084 - File['/etc/yum.repos.d/HDP.repo']
    {'content': InlineTemplate(...)}
    2014-07-14 00:07:04,086 - Package['oozie'] {}
    2014-07-14 00:07:04,177 - Installing package oozie ('/usr/bin/yum
    -d 0 -e 0 -y install oozie')

    --
    Suraj Nayak


    On Sunday 13 July 2014 09:13 PM, Sumit Mohanty wrote:
    By "I restarted the process", do you mean that you restarted the
    installation?

    Can you share the command logs for tasks (e.g. 10, 42, 58, etc.)?
    These would help debug why the tasks are still active.

    If you open the Ambari UI and look at the past requests (top
    left), the task-specific UI will show you the hosts and the local
    file names on each host. The files are named
    /var/lib/ambari-agent/data/output-10.txt and
    /var/lib/ambari-agent/data/errors-10.txt for task id 10.
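    For example, a small helper to dump the tail of both command log
    files for a given task id (generic shell, nothing Ambari-specific):

```shell
# Print the agent-side command log paths for a task id, and tail each if present.
task_logs() {
  tid=$1
  for f in "/var/lib/ambari-agent/data/output-${tid}.txt" \
           "/var/lib/ambari-agent/data/errors-${tid}.txt"; do
    echo "== $f =="
    if [ -f "$f" ]; then
      tail -n 20 "$f"
    else
      echo "(missing)"
    fi
  done
}

task_logs 10
```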

    What I can surmise from the above is that the agents are still
    stuck executing the older tasks, so they cannot execute the new
    commands sent by the Ambari server when you retried the
    installation. I suggest looking at the command logs to see why
    they are stuck. Restarting the Ambari server may not help; you may
    need to restart the agents if they are stuck executing tasks.
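    In case it helps, the agent-restart step sketched as a dry run
    (the host list below is a placeholder; drop the leading echo to
    actually run the commands over ssh, assuming the standard
    ambari-agent service command):

```shell
# Dry run: print the restart command for each stuck agent host.
HOSTS="slave1.hdp.somedomain.com slave2.hdp.somedomain.com"  # placeholder hosts
for h in $HOSTS; do
  echo ssh "root@${h}" "ambari-agent restart"
done
```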

    -Sumit


    On Sun, Jul 13, 2014 at 8:00 AM, Suraj Nayak M <[email protected]> wrote:

        Hi,

        I am trying to install HDP 2.1 using Ambari on 4 nodes: 2
        NameNodes and 2 slaves. The install failed due to a Python
        script timeout. I restarted the process, and for the past 2
        hours there has been no progress in the installation. Is it
        safe to kill the Ambari server and restart the process? How
        can I terminate the ongoing process in Ambari gracefully?

        Below is the tail of the Ambari server logs.

        20:12:08,530  WARN [qtp527311109-183] HeartBeatHandler:369 -
        Operation failed - may be retried. Service component host:
        HIVE_CLIENT, host: slave2.hdp.somedomain.com Action id1-1
        20:12:08,530  INFO [qtp527311109-183] HeartBeatHandler:375 -
        Received report for a command that is no longer active.
        CommandReport{role='HIVE_CLIENT', actionId='1-1',
        status='FAILED', exitCode=999, clusterName='HDP2_CLUSTER1',
        serviceName='HIVE', taskId=57, roleCommand=INSTALL,
        configurationTags=null, customCommand=null}
        20:12:08,530  WARN [qtp527311109-183] ActionManager:143 - The
        task 57 is not in progress, ignoring update
        20:12:08,966  WARN [qtp527311109-183] ActionManager:143 - The
        task 26 is not in progress, ignoring update
        20:12:12,319  WARN [qtp527311109-183] ActionManager:143 - The
        task 58 is not in progress, ignoring update
        20:12:12,605  WARN [qtp527311109-183] ActionManager:143 - The
        task 42 is not in progress, ignoring update
        20:12:14,872  WARN [qtp527311109-183] ActionManager:143 - The
        task 10 is not in progress, ignoring update
        20:12:19,039  WARN [qtp527311109-184] ActionManager:143 - The
        task 26 is not in progress, ignoring update
        20:12:22,382  WARN [qtp527311109-183] ActionManager:143 - The
        task 58 is not in progress, ignoring update
        20:12:22,655  WARN [qtp527311109-183] ActionManager:143 - The
        task 42 is not in progress, ignoring update
        20:12:24,919  WARN [qtp527311109-184] ActionManager:143 - The
        task 10 is not in progress, ignoring update
        20:12:29,086  WARN [qtp527311109-184] ActionManager:143 - The
        task 26 is not in progress, ignoring update
        20:12:32,576  WARN [qtp527311109-183] ActionManager:143 - The
        task 58 is not in progress, ignoring update
        20:12:32,704  WARN [qtp527311109-183] ActionManager:143 - The
        task 42 is not in progress, ignoring update
        20:12:34,955  WARN [qtp527311109-183] ActionManager:143 - The
        task 10 is not in progress, ignoring update
        20:12:39,132  WARN [qtp527311109-183] ActionManager:143 - The
        task 26 is not in progress, ignoring update
        20:12:42,629  WARN [qtp527311109-184] ActionManager:143 - The
        task 58 is not in progress, ignoring update
        20:12:42,754  WARN [qtp527311109-184] ActionManager:143 - The
        task 42 is not in progress, ignoring update
        20:12:45,137  WARN [qtp527311109-183] ActionManager:143 - The
        task 10 is not in progress, ignoring update
        20:12:49,320  WARN [qtp527311109-183] ActionManager:143 - The
        task 26 is not in progress, ignoring update
        20:12:52,962  WARN [qtp527311109-184] ActionManager:143 - The
        task 58 is not in progress, ignoring update
        20:12:53,093  WARN [qtp527311109-184] ActionManager:143 - The
        task 42 is not in progress, ignoring update
        20:12:55,184  WARN [qtp527311109-184] ActionManager:143 - The
        task 10 is not in progress, ignoring update
        20:12:59,366  WARN [qtp527311109-184] ActionManager:143 - The
        task 26 is not in progress, ignoring update
        20:13:03,013  WARN [qtp527311109-184] ActionManager:143 - The
        task 58 is not in progress, ignoring update
        20:13:03,257  WARN [qtp527311109-184] ActionManager:143 - The
        task 42 is not in progress, ignoring update
        20:13:05,231  WARN [qtp527311109-184] ActionManager:143 - The
        task 10 is not in progress, ignoring update


        --
        Thanks
        Suraj Nayak



    CONFIDENTIALITY NOTICE
    NOTICE: This message is intended for the use of the individual or
    entity to which it is addressed and may contain information that
    is confidential, privileged and exempt from disclosure under
    applicable law. If the reader of this message is not the intended
    recipient, you are hereby notified that any printing, copying,
    dissemination, distribution, disclosure or forwarding of this
    communication is strictly prohibited. If you have received this
    communication in error, please contact the sender immediately and
    delete it from your system. Thank You.



