Okay, I see two problems: there are some leftover directories causing issues, and for some reason VDSM seems to be trying to bind to a port that something is already running on (probably an older version of VDSM). Try removing the duplicate dirs (rmdir /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286 and /rhev/data-center/mnt). If they aren't empty, don't rm -rf them, because they might be mounted from your production servers; just mv -i them to /root or somewhere.

Next, shut down the vdsm service with "service vdsm stop" (I think; it might be "service stop vdsm", I don't use CentOS much) and kill any running vdsm processes (ps ax | grep vdsm). The error that I saw was:

MainThread::ERROR::2016-05-13 08:58:38,262::clientIF::128::vds::(__init__) failed to init clientIF, shutting down storage dispatcher
MainThread::ERROR::2016-05-13 08:58:38,289::vdsm::171::vds::(run) Exception raised
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm", line 169, in run
  File "/usr/share/vdsm/vdsm", line 102, in serve_clients
    cif = clientIF.getInstance(irs, log, scheduler)
  File "/usr/share/vdsm/clientIF.py", line 193, in getInstance
    cls._instance = clientIF(irs, log, scheduler)
  File "/usr/share/vdsm/clientIF.py", line 123, in __init__
    self._createAcceptor(host, port)
  File "/usr/share/vdsm/clientIF.py", line 201, in _createAcceptor
    port, sslctx)
  File "/usr/share/vdsm/protocoldetector.py", line 170, in __init__
    sock = _create_socket(host, port)
  File "/usr/share/vdsm/protocoldetector.py", line 40, in _create_socket
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 98] Address already in use

If you get the same error, do a netstat -lnp and compare it to the same from a working box to see if something else is running on the VDSM port.
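As a rough complement to netstat, here is a minimal Python 3 sketch that reproduces the same bind attempt and reports whether it fails with "Address already in use". The function name port_in_use is made up for illustration, and 54321 is an assumption (the usual default VDSM port); use whatever port your vdsm.conf actually specifies.

```python
import errno
import socket

VDSM_PORT = 54321  # assumption: default VDSM management port


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding host:port fails with EADDRINUSE (Errno 98 on Linux)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except socket.error as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # some other problem (e.g. permissions); do not hide it
    finally:
        sock.close()
    return False
```

Calling port_in_use(VDSM_PORT) while VDSM is stopped should tell you whether something else is still holding the port. It does not tell you which process owns it, so netstat -lnp is still needed for that.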

I think the problem I am having is due to the setup failing and not something in the vdsm configs, as I have never gotten this server to start up properly, and the bridge ethernet interface and oVirt routes are not set up.

I put the logs here: https://www.dropbox.com/sh/5ugyykqh1lgru9l/AACXxRYWr3tgd0WbBVFW5twHa?dl=0

hosted-engine--deploy-logs.zip  # Logs from when I tried to deploy and it failed
vdsm.tar.gz  # /var/log/vdsm

Output from running vdsm from the command line:

    [root@cultivar2 log]# su -s /bin/bash vdsm
    [vdsm@cultivar2 log]$ python /usr/share/vdsm/vdsm
    (PID: 6521) I am the actual vdsm 4.17.26-1.el7 cultivar2.grove.silverorange.com (3.10.0-327.el7.x86_64)
    VDSM will run with cpu affinity: frozenset([1])
    /usr/bin/taskset --all-tasks --pid --cpu-list 1 6521 (cwd None)
    SUCCESS: <err> = ''; <rc> = 0
    Starting scheduler vdsm.Scheduler
    Run and protect:
    object at 0x381b158>)
    Run and protect: registerDomainStateChangeCallback, Return
    response: None
    Trying to connect to Super Vdsm
    Preparing MOM interface
    Using named unix socket /var/run/vdsm/mom-vdsm.sock
    Unregistering all secrests
    trying to connect libvirt
    recovery: started
    Setting channels' timeout to 30 seconds.
    Starting VM channels listener thread.
    Listening at <>
    Adding detector <rpc.bindingxmlrpc.XmlDetector instance at 0x3b4ecb0>
    recovery: completed in 0s
    Adding detector <yajsonrpc.stompreactor.StompDetector instance at
    Starting executor
    Starting worker jsonrpc.Executor/0
    Worker started
    Starting worker jsonrpc.Executor/1
    Worker started
    Starting worker jsonrpc.Executor/2
    Worker started
    Starting worker jsonrpc.Executor/3
    Worker started
    Starting worker jsonrpc.Executor/4
    Worker started
    Starting worker jsonrpc.Executor/5
    Worker started
    Starting worker jsonrpc.Executor/6
    Worker started
    Starting worker jsonrpc.Executor/7
    Worker started
    XMLRPC server running
    Starting executor
    Starting worker periodic/0
    Worker started
    Starting worker periodic/1
    Worker started
    Starting worker periodic/2
    Worker started
    Starting worker periodic/3
    Worker started
    trying to connect libvirt
    Panic: Connect to supervdsm service failed: [Errno 2] No such file
    or directory
    Traceback (most recent call last):
      File "/usr/share/vdsm/supervdsm.py", line 78, in _connect
        utils.retry(self._manager.connect, Exception, timeout=60, tries=3)
      File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 959,
    in retry
        return func()
      File "/usr/lib64/python2.7/multiprocessing/managers.py", line
    500, in connect
        conn = Client(self._address, authkey=self._authkey)
      File "/usr/lib64/python2.7/multiprocessing/connection.py", line
    173, in Client
        c = SocketClient(address)
      File "/usr/lib64/python2.7/multiprocessing/connection.py", line
    308, in SocketClient
      File "/usr/lib64/python2.7/socket.py", line 224, in meth
        return getattr(self._sock,name)(*args)
    error: [Errno 2] No such file or directory
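The [Errno 2] in the panic above comes from connecting to a unix socket whose file does not exist, which usually suggests the supervdsm side is not running. A minimal Python 3 sketch for checking this directly follows. The helper name probe_unix_socket is hypothetical, and the path /var/run/vdsm/svdsm.sock is an assumption about where supervdsm puts its socket; adjust it if your install differs.

```python
import errno
import os
import socket
import stat

SVDSM_SOCK = "/var/run/vdsm/svdsm.sock"  # assumption: supervdsm socket path


def probe_unix_socket(path):
    """Classify a unix socket path as missing, not a socket, refused or listening."""
    try:
        mode = os.stat(path).st_mode
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            return "missing"  # same condition as the Errno 2 in the traceback
        raise
    if not stat.S_ISSOCK(mode):
        return "not a socket"
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
    except socket.error as exc:
        if exc.errno == errno.ECONNREFUSED:
            return "refused"  # stale socket file, nothing listening behind it
        raise
    finally:
        client.close()
    return "listening"
```

If probe_unix_socket(SVDSM_SOCK) returns "missing" or "refused", the next thing to look at is whether the supervdsm service started at all.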

      Hmm, can you tar up the logfiles (/var/log/vdsm/* on the host
    you are installing on) and put them somewhere to look at?  Also, I
    found that starting VDSM from the command line is useful as it
    sometimes spits out error messages that don't show up in the
    logs.  I think the command I used was:
    su -s /bin/bash vdsm
    python /usr/share/vdsm/vdsm

    My problem was that I customized the logging settings in
    /etc/vdsm/*conf to try and tone down the debugging stuff and had a
    syntax error.
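To rule out the same kind of syntax error, here is a minimal Python 3 sketch that tries to parse every file matching a glob as an INI file and reports the parser error, if any. It assumes the files under /etc/vdsm/ are INI-style, and check_conf_files is a made-up helper name, not part of VDSM.

```python
import configparser
import glob


def check_conf_files(pattern):
    """Map each file matching pattern to None (parsed fine) or the parser error text."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        # interpolation=None so '%(asctime)s'-style logging formats are left alone
        parser = configparser.ConfigParser(interpolation=None)
        try:
            with open(path) as handle:
                parser.read_file(handle)
            results[path] = None
        except configparser.Error as exc:
            results[path] = str(exc)
    return results
```

Something like check_conf_files("/etc/vdsm/*conf") would then list any file the parser rejects. This only catches structural mistakes (missing section headers, duplicate sections, malformed lines), not a wrong handler or level name inside an otherwise valid file.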

    Thanks for the suggestion.

    I cleaned up again using the bash script from the
    recoving-from-failed-install link below, then reinstalled (yum
    install ovirt-hosted-engine-setup).

    I enabled NetworkManager and firewalld as you suggested. The
    install stops very early on with an error:
    [ ERROR ] Failed to execute stage 'Programs detection':
    hosted-engine cannot be deployed while NetworkManager is running,
    please stop and disable it before proceeding

    I disabled and stopped NetworkManager and tried again. Same
    result. :(

    Try enabling NetworkManager and firewalld before doing the
    hosted-engine --deploy.  I have run into problems with oVirt
    trying to perform tasks on hosts where firewalld is disabled, so
    maybe you are running into a similar problem.  Also, I think the
    setup script will disable NetworkManager if it needs to.  I know
    I didn't manually disable it on any of the boxes I installed on.

    Thanks for the reply. I tried what you suggested, but I am in
    the exact same state. :-(

    I don't want to completely remove my hosted engine setup as it
    is working on the two other hosts in my cluster. I did not run
    the rm -rf steps listed here
    that would wipe my hosted_engine nfs mount. If you know that
    this is 100% necessary, please let me know.

    I did:
    hosted-engine --clean-metadata --force-cleanup --host-id=3
    run the bash script to remove all of the ovirt packages and
    config files
    reinstalled ovirt-hosted-engine-setup
    ran "hosted-engine --deploy"

    I'm back exactly where I started. Is there a way to run just
    the network configuration part of the deploy?

    Since the last attempt, I did upgrade my hosted engine and my
    cluster is now running oVirt 3.6.5.


    I used to have a similar problem where one of my hosts couldn't
    be deployed due to the absence of the ovirtmgmt bridge. Simone
    said it's a bug ( https://bugzilla.redhat.com/1323465 ) which
    would be fixed in 3.6.6.

    This is what I've done to solve it:

    1. In the web UI, set the failed host to maintenance.
    2. Remove it.
    3. In that host, run a script from
    4. Install ovirt-hosted-engine-setup again.
    5. Redeploy again.

    Hope that helps

    I hate to reply to my own message, but I'm really hoping
    someone can help me with my issue

    Does anyone have a suggestion for me? If there is any more
    information that I can provide that would help you to help me,
    please advise.


    I'm trying to add a third host into my oVirt cluster. I have
    hosted engine setup on the first two. It's failing to finish
    the hosted-engine --deploy on this third host. I wiped the
    server and did a CentOS 7 minimal install and ran it again to
    have a clean machine.

    My setup:
    CentOS 7 clean install
    yum install -y
    yum install -y ovirt-hosted-engine-setup
    yum upgrade -y && reboot
    systemctl disable NetworkManager ; systemctl stop
    NetworkManager ; systemctl disable firewalld ; systemctl stop
    hosted-engine --deploy

    hosted-engine --deploy always throws an error:
    [ ERROR ] The VDSM host was found in a failed state. Please
    check engine and bootstrap installation logs.
    [ ERROR ] Unable to add Cultivar2 to the manager
    and then echoes
    [ INFO  ] Waiting for VDSM hardware info
    [ ERROR ] Failed to execute stage 'Closing up': VDSM did not
    start within 120 seconds
    [ INFO  ] Stage: Clean up
    [ INFO  ] Generating answer file
    [ INFO  ] Stage: Pre-termination
    [ INFO  ] Stage: Termination
    [ ERROR ] Hosted Engine deployment failed: this system is not
    reliable, please check the issue, fix and redeploy
        Log file is located at
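For pulling just the failures out of a long deploy transcript, a minimal Python 3 sketch is shown here. It assumes the "[ ERROR ] message" line format seen in the output above; error_messages is a made-up helper name.

```python
import re

# Matches lines such as "[ ERROR ] Unable to add Cultivar2 to the manager"
ERROR_LINE = re.compile(r"^\s*\[\s*ERROR\s*\]\s*(.*)$")


def error_messages(output):
    """Return the message part of every '[ ERROR ]' line in the given text."""
    matches = (ERROR_LINE.match(line) for line in output.splitlines())
    return [m.group(1) for m in matches if m]
```

Wrapped continuation lines are not joined back onto the error they belong to, so run this against the raw log file rather than a copy that has been re-wrapped by a mail client.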

    Full output of hosted-engine --deploy included in the
    attached zip file.
    I've also included vdsm.log (there is more than one attempt's
    worth of output in there).
    You'll also find the
    ovirt-hosted-engine-setup-20160509130658-qb8ev0.log listed above.

    This is my "test" setup. Cultivar0 is my first host and my
    nfs server for storage. I have two hosts in the setup already
    and everything is working fine. The host does show up in the
    oVirt admin, but shows "Installed Failed"

    Trying to reinstall from within the interface just fails again.

    The ovirt bridge interface is not configured and there are no
    config files in /etc/sysconfig/network-scripts related to ovirt.
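A quick way to confirm both observations at once is sketched below in Python 3. It assumes the bridge is named ovirtmgmt (the name mentioned elsewhere in this thread) and that its config file would follow the usual ifcfg-<device> naming; bridge_status is a hypothetical helper, not an oVirt tool.

```python
import os


def bridge_status(bridge="ovirtmgmt",
                  sys_net="/sys/class/net",
                  scripts_dir="/etc/sysconfig/network-scripts"):
    """Report whether the bridge device and its ifcfg file exist on this host."""
    return {
        # bridge devices expose a 'bridge' subdirectory under /sys/class/net/<name>
        "interface_present": os.path.isdir(os.path.join(sys_net, bridge, "bridge")),
        # assumption: config file is named ifcfg-<bridge>
        "ifcfg_present": os.path.isfile(os.path.join(scripts_dir, "ifcfg-" + bridge)),
    }
```

On a host where the deploy got far enough to configure networking, both values would be expected to be True; on the failing host described here, both should come back False.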

    [root@cultivar2 ovirt-hosted-engine-setup]# cat
    CentOS Linux release 7.2.1511 (Core)

    [root@cultivar2 ovirt-hosted-engine-setup]# uname -a
    Linux cultivar2.grove.silverorange.com
    3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC
    2016 x86_64 x86_64 x86_64 GNU/Linux

    [root@cultivar2 ovirt-hosted-engine-setup]# rpm -qa | grep -i
    [root@cultivar2 ovirt-hosted-engine-setup]#
    [root@cultivar2 ovirt-hosted-engine-setup]#
    [root@cultivar2 ovirt-hosted-engine-setup]#
    [root@cultivar2 ovirt-hosted-engine-setup]# rpm -qa | grep -i

    I also have a series of stuck tasks that I can't clear
    related to the host that can't be added... This is a
    secondary issue and I don't want to get off track, but they
    look like this:

    I'd appreciate any help that can be offered.


