Hi Gervais,

Okay, I see two problems: some leftover directories are causing issues, and for some reason VDSM seems to be trying to bind to a port that something else is already listening on (probably an older version of VDSM). Try removing the duplicate directories with rmdir: /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286 and /rhev/data-center/mnt. If they aren't empty, don't rm -rf them, because they might be mounted from your production servers. Just mv -i them to /root or somewhere.
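
In case it helps, here is roughly what I mean by removing them safely, as a small Python sketch. It is only an illustration: set_aside is a name I made up, the return strings are arbitrary, and it only looks one level deep for mount points, so still check the output of mount by hand first. It removes a directory only if it is empty, leaves anything that is (or directly contains) a mount point alone, and otherwise renames it into another directory without overwriting anything. It uses os.rename on purpose, so a move across filesystems fails instead of turning into a copy-and-delete.

```python
import os


def set_aside(path, dest_dir):
    """Remove path if it is an empty directory, else rename it into dest_dir.

    Hypothetical helper for illustration only. Returns a short string
    describing what happened. It never deletes directory contents.
    """
    if not os.path.isdir(path):
        return "missing"
    entries = [os.path.join(path, name) for name in os.listdir(path)]
    # Leave the directory alone if it, or anything directly inside it,
    # is a mount point (it could be mounted from a production server).
    if os.path.ismount(path) or any(os.path.ismount(e) for e in entries):
        return "mounted"
    if not entries:
        os.rmdir(path)  # same as rmdir: only succeeds on an empty directory
        return "removed"
    target = os.path.join(dest_dir, os.path.basename(path.rstrip("/")))
    if os.path.exists(target):
        # Same spirit as mv -i: never overwrite silently.
        return "exists"
    os.rename(path, target)  # raises OSError if dest_dir is on another filesystem
    return "moved"
```

For example, set_aside("/rhev/data-center/mnt", "/root") would correspond to the mv -i suggestion above.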

Next, shut down the VDSM service with "service vdsmd stop" (on CentOS 7 the service is called vdsmd, and "systemctl stop vdsmd" does the same thing) and kill any vdsm processes that are still running (ps ax | grep vdsm). The error that I saw was:

MainThread::ERROR::2016-05-13 08:58:38,262::clientIF::128::vds::(__init__) failed to init clientIF, shutting down storage dispatcher
MainThread::ERROR::2016-05-13 08:58:38,289::vdsm::171::vds::(run) Exception raised
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm", line 169, in run
    serve_clients(log)
  File "/usr/share/vdsm/vdsm", line 102, in serve_clients
    cif = clientIF.getInstance(irs, log, scheduler)
  File "/usr/share/vdsm/clientIF.py", line 193, in getInstance
    cls._instance = clientIF(irs, log, scheduler)
  File "/usr/share/vdsm/clientIF.py", line 123, in __init__
    self._createAcceptor(host, port)
  File "/usr/share/vdsm/clientIF.py", line 201, in _createAcceptor
    port, sslctx)
  File "/usr/share/vdsm/protocoldetector.py", line 170, in __init__
    sock = _create_socket(host, port)
  File "/usr/share/vdsm/protocoldetector.py", line 40, in _create_socket
    server_socket.bind(addr[0][4])
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 98] Address already in use

If you get the same error, run netstat -lnp and compare the output with the same command on a working box to see whether something else is already listening on the VDSM port.
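
If netstat isn't handy, another rough way to check is to try binding the port yourself. This is only a sketch under a couple of assumptions: port 54321 is taken from the "Listening at 0.0.0.0:54321" line in your output below, and port_is_free is a name I made up, not anything that ships with VDSM. It does not set SO_REUSEADDR, so a port with lingering connections may also be reported as busy.

```python
import errno
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if host:port can be bound, False if it is already in use.

    Hypothetical helper for illustration; not part of VDSM.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except socket.error as e:
        if e.errno == errno.EADDRINUSE:  # the "[Errno 98]" from the traceback
            return False
        raise
    finally:
        sock.close()


if __name__ == "__main__":
    # 54321 is assumed from the "Listening at 0.0.0.0:54321" log line.
    print("port 54321 free: %s" % port_is_free(54321))
```

If this prints False while the vdsm service is stopped, something else is still holding the port, and netstat -lnp run as root should show which PID it is.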


On 2016-05-13 09:37 AM, Gervais de Montbrun wrote:
Hi Charles,

I think the problem I am having is due to the setup failing, not something in the vdsm configs: I have never gotten this server to start up properly, and the bridge Ethernet interface and oVirt routes are not set up.

I put the logs here: https://www.dropbox.com/sh/5ugyykqh1lgru9l/AACXxRYWr3tgd0WbBVFW5twHa?dl=0

hosted-engine--deploy-logs.zip  # Logs from when I tried to deploy and it failed
vdsm.tar.gz                     # /var/log/vdsm

Output from running vdsm from the command line:

    [root@cultivar2 log]# su -s /bin/bash vdsm
    [vdsm@cultivar2 log]$ python /usr/share/vdsm/vdsm
    (PID: 6521) I am the actual vdsm 4.17.26-1.el7 cultivar2.grove.silverorange.com (3.10.0-327.el7.x86_64)
    VDSM will run with cpu affinity: frozenset([1])
    /usr/bin/taskset --all-tasks --pid --cpu-list 1 6521 (cwd None)
    SUCCESS: <err> = ''; <rc> = 0
    Starting scheduler vdsm.Scheduler
    started
    Run and protect: registerDomainStateChangeCallback(callbackFunc=<functools.partial object at 0x381b158>)
    Run and protect: registerDomainStateChangeCallback, Return response: None
    Trying to connect to Super Vdsm
    Preparing MOM interface
    Using named unix socket /var/run/vdsm/mom-vdsm.sock
    Unregistering all secrests
    trying to connect libvirt
    recovery: started
    Setting channels' timeout to 30 seconds.
    Starting VM channels listener thread.
    Listening at 0.0.0.0:54321
    Adding detector <rpc.bindingxmlrpc.XmlDetector instance at 0x3b4ecb0>
    recovery: completed in 0s
    Adding detector <yajsonrpc.stompreactor.StompDetector instance at 0x382e5a8>
    Starting executor
    Starting worker jsonrpc.Executor/0
    Worker started
    Starting worker jsonrpc.Executor/1
    Worker started
    Starting worker jsonrpc.Executor/2
    Worker started
    Starting worker jsonrpc.Executor/3
    Worker started
    Starting worker jsonrpc.Executor/4
    Worker started
    Starting worker jsonrpc.Executor/5
    Worker started
    Starting worker jsonrpc.Executor/6
    Worker started
    Starting worker jsonrpc.Executor/7
    Worker started
    XMLRPC server running
    Starting executor
    Starting worker periodic/0
    Worker started
    Starting worker periodic/1
    Worker started
    Starting worker periodic/2
    Worker started
    Starting worker periodic/3
    Worker started
    trying to connect libvirt
    Panic: Connect to supervdsm service failed: [Errno 2] No such file or directory
    Traceback (most recent call last):
      File "/usr/share/vdsm/supervdsm.py", line 78, in _connect
        utils.retry(self._manager.connect, Exception, timeout=60, tries=3)
      File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 959, in retry
        return func()
      File "/usr/lib64/python2.7/multiprocessing/managers.py", line 500, in connect
        conn = Client(self._address, authkey=self._authkey)
      File "/usr/lib64/python2.7/multiprocessing/connection.py", line 173, in Client
        c = SocketClient(address)
      File "/usr/lib64/python2.7/multiprocessing/connection.py", line 308, in SocketClient
        s.connect(address)
      File "/usr/lib64/python2.7/socket.py", line 224, in meth
        return getattr(self._sock,name)(*args)
    error: [Errno 2] No such file or directory
    Killed
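
For what it's worth, the [Errno 2] at the end of that traceback is what connect() gives on a unix socket path that does not exist, which would suggest that the process that should be listening (here, supervdsm) is not running. A rough Python sketch for telling the possible cases apart follows. probe_unix_socket is a hypothetical helper, and the supervdsm socket path is not shown in the output above, so it is left as a parameter.

```python
import errno
import os
import socket
import stat


def probe_unix_socket(path):
    """Report the state of a unix socket path.

    Hypothetical helper for illustration. Returns one of:
    "missing"      - nothing exists at path (connect would give Errno 2)
    "not-a-socket" - something exists there, but it is not a socket
    "refused"      - the socket file exists but nobody is listening
    "listening"    - a connect succeeded
    """
    try:
        mode = os.stat(path).st_mode
    except OSError as e:
        if e.errno == errno.ENOENT:
            return "missing"
        raise
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return "listening"
    except socket.error as e:
        if e.errno == errno.ECONNREFUSED:
            return "refused"
        raise
    finally:
        sock.close()
```

As a usage example, probe_unix_socket("/var/run/vdsm/mom-vdsm.sock") checks the MOM socket path that does appear in the output; pass the supervdsm socket path from your own installation to check that one.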


Thanks for the help. It's really appreciated.

Cheers,
Gervais

On Fri, May 13, 2016 at 12:55 AM, Charles Tassell <ctass...@gmail.com> wrote:

    Hi Gervais,

      Hmm, can you tar up the logfiles (/var/log/vdsm/* on the host
    you are installing on) and put them somewhere to look at?  Also, I
    found that starting VDSM from the command line is useful as it
    sometimes spits out error messages that don't show up in the
    logs.  I think the command I used was:
    su -s /bin/bash vdsm
    python /usr/share/vdsm/vdsm

    My problem was that I customized the logging settings in
    /etc/vdsm/*conf to try and tone down the debugging stuff and had a
    syntax error.


    On 16-05-12 10:24 PM, Gervais de Montbrun wrote:
    Hi Charles,

    Thanks for the suggestion.

    I cleaned up again using the bash script from the
    recoving-from-failed-install link below, then reinstalled (yum
    install ovirt-hosted-engine-setup).

    I enabled NetworkManager and firewalld as you suggested. The
    install stops very early on with an error:
    [ ERROR ] Failed to execute stage 'Programs detection':
    hosted-engine cannot be deployed while NetworkManager is running,
    please stop and disable it before proceeding

    I disabled and stopped NetworkManager and tried again. Same
    result. :(

    Any more guesses?

    Cheers,
    Gervais



    On May 12, 2016, at 9:08 PM, Charles Tassell <ctass...@gmail.com> wrote:

    Hey Gervais,

    Try enabling NetworkManager and firewalld before doing the
    hosted-engine --deploy.  I have run into problems with oVirt
    trying to perform tasks on hosts where firewalld is disabled, so
    maybe you are running into a similar problem.  Also, I think the
    setup script will disable NetworkManager if it needs to.  I know
    I didn't manually disable it on any of the boxes I installed on.

    On 16-05-12 04:49 PM, users-requ...@ovirt.org wrote:
    Message: 1
    Date: Thu, 12 May 2016 14:22:12 -0300
    From: Gervais de Montbrun <gerv...@demontbrun.com>
    To: Wee Sritippho <we...@forest.go.th>
    Cc: users <users@ovirt.org>
    Subject: Re: [ovirt-users] Adding another host to my cluster

    Hi Wee
    (and others)

    Thanks for the reply. I tried what you suggested, but I am in
    the exact same state. :-(

    I don't want to completely remove my hosted engine setup as it
    is working on the two other hosts in my cluster. I did not run
    the rm -rf steps listed here
    (https://www.ovirt.org/documentation/how-to/hosted-engine/#recoving-from-failed-install)
    that would wipe my hosted_engine nfs mount. If you know that
    this is 100% necessary, please let me know.

    I did:
    hosted-engine --clean-metadata --force-cleanup --host-id=3
    ran the bash script to remove all of the ovirt packages and
    config files
    reinstalled ovirt-hosted-engine-setup
    ran "hosted-engine --deploy"

    I'm back exactly where I started. Is there a way to run just
    the network configuration part of the deploy?

    Since the last attempt, I did upgrade my hosted engine and my
    cluster is now running oVirt 3.6.5.

    Cheers,
    Gervais



    On May 12, 2016, at 11:50 AM, Wee Sritippho <we...@forest.go.th> wrote:

    Hi,

    I used to have a similar problem where one of my hosts couldn't
    be deployed due to the absence of the ovirtmgmt bridge. Simone
    said it's a bug (https://bugzilla.redhat.com/1323465) which
    would be fixed in 3.6.6.

    This is what I've done to solve it:

    1. In the web UI, set the failed host to maintenance.
    2. Remove it.
    3. On that host, run the script from
       https://www.ovirt.org/documentation/how-to/hosted-engine/#recoving-from-failed-install
    4. Install ovirt-hosted-engine-setup again.
    5. Redeploy again.

    Hope that helps

    On 11 May 2016 at 22:48:58 GMT+07:00, Gervais de Montbrun <gerv...@demontbrun.com> wrote:
    Hi Folks,

    I hate to reply to my own message, but I'm really hoping
    someone can help me with my issue:
    http://lists.ovirt.org/pipermail/users/2016-May/039690.html

    Does anyone have a suggestion for me? If there is any more
    information that I can provide that would help you to help me,
    please advise.

    Cheers,
    Gervais



    On May 9, 2016, at 1:42 PM, Gervais de Montbrun <gerv...@demontbrun.com> wrote:

    Hi All,

    I'm trying to add a third host to my oVirt cluster. I have
    hosted engine set up on the first two. The hosted-engine --deploy
    fails to finish on this third host. I wiped the server, did a
    CentOS 7 minimal install, and ran it again to have a clean
    machine.

    My setup:
    CentOS 7 clean install
    yum install -y http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
    yum install -y ovirt-hosted-engine-setup
    yum upgrade -y && reboot
    systemctl disable NetworkManager ; systemctl stop NetworkManager ; systemctl disable firewalld ; systemctl stop firewalld
    hosted-engine --deploy

    hosted-engine --deploy always throws an error:
    [ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
    [ ERROR ] Unable to add Cultivar2 to the manager
    and then prints:
    [ INFO  ] Waiting for VDSM hardware info
    ...
    [ ERROR ] Failed to execute stage 'Closing up': VDSM did not start within 120 seconds
    [ INFO  ] Stage: Clean up
    [ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20160509131103.conf'
    [ INFO  ] Stage: Pre-termination
    [ INFO  ] Stage: Termination
    [ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue, fix and redeploy
        Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160509130658-qb8ev0.log

    The full output of hosted-engine --deploy is included in the
    attached zip file.
    I've also included vdsm.log (it contains more than one
    attempt's worth of entries).
    You'll also find the
    ovirt-hosted-engine-setup-20160509130658-qb8ev0.log listed above.

    This is my "test" setup. Cultivar0 is my first host and my
    nfs server for storage. I have two hosts in the setup already
    and everything is working fine. The host does show up in the
    oVirt admin, but shows "Installed Failed"
    <PastedGraphic-1.png>

    Trying to reinstall from within the interface just fails again.

    The ovirt bridge interface is not configured and there are no
    config files in /etc/sysconfig/network-scripts related to ovirt.

    OS:
    [root@cultivar2 ovirt-hosted-engine-setup]# cat /etc/redhat-release
    CentOS Linux release 7.2.1511 (Core)

    [root@cultivar2 ovirt-hosted-engine-setup]# uname -a
    Linux cultivar2.grove.silverorange.com 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

    Versions:
    [root@cultivar2 ovirt-hosted-engine-setup]# rpm -qa | grep -i ovirt
    libgovirt-0.3.3-1.el7_2.1.x86_64
    ovirt-hosted-engine-setup-1.3.5.0-1.1.el7.noarch
    ovirt-host-deploy-1.4.1-1.el7.centos.noarch
    ovirt-vmconsole-1.0.0-1.el7.centos.noarch
    ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
    ovirt-release36-007-1.noarch
    ovirt-engine-sdk-python-3.6.5.0-1.el7.centos.noarch
    ovirt-setup-lib-1.0.1-1.el7.centos.noarch
    ovirt-hosted-engine-ha-1.3.5.3-1.1.el7.noarch
    [root@cultivar2 ovirt-hosted-engine-setup]# rpm -qa | grep -i virt
    libvirt-daemon-driver-secret-1.2.17-13.el7_2.4.x86_64
    virt-viewer-2.0-6.el7.x86_64
    libgovirt-0.3.3-1.el7_2.1.x86_64
    libvirt-daemon-kvm-1.2.17-13.el7_2.4.x86_64
    ovirt-hosted-engine-setup-1.3.5.0-1.1.el7.noarch
    fence-virt-0.3.2-2.el7.x86_64
    virt-what-1.13-6.el7.x86_64
    libvirt-python-1.2.17-2.el7.x86_64
    libvirt-daemon-1.2.17-13.el7_2.4.x86_64
    libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.4.x86_64
    libvirt-lock-sanlock-1.2.17-13.el7_2.4.x86_64
    libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.4.x86_64
    libvirt-daemon-driver-network-1.2.17-13.el7_2.4.x86_64
    libvirt-daemon-driver-storage-1.2.17-13.el7_2.4.x86_64
    ovirt-host-deploy-1.4.1-1.el7.centos.noarch
    virt-v2v-1.28.1-1.55.el7.centos.2.x86_64
    ovirt-vmconsole-1.0.0-1.el7.centos.noarch
    ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
    libvirt-client-1.2.17-13.el7_2.4.x86_64
    libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.4.x86_64
    ovirt-release36-007-1.noarch
    libvirt-daemon-driver-interface-1.2.17-13.el7_2.4.x86_64
    libvirt-daemon-driver-qemu-1.2.17-13.el7_2.4.x86_64
    ovirt-engine-sdk-python-3.6.5.0-1.el7.centos.noarch
    ovirt-setup-lib-1.0.1-1.el7.centos.noarch
    ovirt-hosted-engine-ha-1.3.5.3-1.1.el7.noarch

    I also have a series of stuck tasks that I can't clear
    related to the host that can't be added... This is a
    secondary issue and I don't want to get off track, but they
    look like this:
    <PastedGraphic-2.png>

    I'd appreciate any help that can be offered.

    Cheers,
    Gervais


    Gervais de Montbrun
    Systems Administrator  / silverorange Inc.

    Phone +1 902 367 4532 ext. 104
    Mobile +1 902 978 0009

    <hosted-engine--deploy-logs.zip>


    Users mailing list
    Users@ovirt.org
    http://lists.ovirt.org/mailman/listinfo/users

-- Wee





