Are you able to reproduce this reliably? If so, please send us the full logs from vdsm, ha-broker and ha-agent. So far it seems like there are multiple problems mixed into this thread:

1. libvirt+vdsm+qemu problem when creating a snapshot
2. storage not mounted after reboot

Thank you,
Jirka

On 04/28/2014 12:19 PM, Kevin Tibi wrote:
I'm on CentOS 6.5 and this repo is for Fedora...


2014-04-28 12:16 GMT+02:00 Kevin Tibi <kevint...@hotmail.com>:

    Hi,

    qemu-kvm-0.12.1.2-2.415.el6_5.8.x86_64
    libvirt-0.10.2-29.el6_5.7.x86_64
    vdsm-4.14.6-0.el6.x86_64
    kernel-2.6.32-431.el6.x86_64
    kernel-2.6.32-431.11.2.el6.x86_64

    I added this repo and tried to update.




    2014-04-28 11:57 GMT+02:00 Martin Sivak <msi...@redhat.com>:

        Hi Kevin,

        thanks for the information.

         > Agent.log and broker.log say nothing.

        Can you please attach those files? I would like to see how the
        crashed Qemu process is reported to us and what state machine
        transitions cause the load.

         > 07:23:58,994::libvirtconnection::124::root::(wrapper) Unknown libvirterror:
         > ecode: 84 edom: 10 level: 2 message: Operation not supported: live disk
         > snapshot not supported with this QEMU binary

        What are the versions of vdsm, libvirt, qemu-kvm and kernel?
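
        For example, using the same rpm query style as earlier in this thread:

        # rpm -q vdsm libvirt qemu-kvm kernel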

        If you feel like it, try updating the virt packages from the
        virt-preview repository:
        http://fedoraproject.org/wiki/Virtualization_Preview_Repository

        --
        Martin Sivák
        msi...@redhat.com
        Red Hat Czech
        RHEV-M SLA / Brno, CZ

        ----- Original Message -----
         > Hi,
         >
         > I use this version: ovirt-hosted-engine-ha-1.1.2-1.el6.noarch
         >
         > For 3 days my engine-ha worked perfectly, but I tried to snapshot a VM
         > and the HA service went defunct ==> 400% CPU !!
         >
         > Agent.log and broker.log say nothing, but in vdsm.log I have errors:
         >
         > Thread-9462::DEBUG::2014-04-28 07:23:58,994::libvirtconnection::124::root::(wrapper) Unknown libvirterror:
         > ecode: 84 edom: 10 level: 2 message: Operation not supported: live disk snapshot not supported with this QEMU binary
         >
         > Thread-9462::ERROR::2014-04-28 07:23:58,995::vm::4006::vm.Vm::(snapshot)
         > vmId=`773f6e6d-c670-49f3-ae8c-dfbcfa22d0a5`::Unable to take snapshot
         >
         >
         > Thread-9352::DEBUG::2014-04-28 08:41:39,922::lvm::295::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n
         > /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"]
         > ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3
         > obtain_device_list_from_udev=0 filter = [ \'r|.*|\' ] }  global {
         >  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {
         >  retain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix
         > --separator | -o
         > uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
         > cc51143e-8ad7-4b0b-a4d2-9024dffc1188 ff98d346-4515-4349-8437-fb2f5e9eaadf'
         > (cwd None)
         >
         > I'll try to reboot my node with hosted-engine.
         >
         >
         >
         > 2014-04-25 13:54 GMT+02:00 Martin Sivak <msi...@redhat.com>:
         >
         > > Hi Kevin,
         > >
         > > can you please tell us what version of hosted-engine you are running?
         > >
         > > rpm -q ovirt-hosted-engine-ha
         > >
         > > Also, do I understand it correctly that the engine VM is running, but you
         > > see a bad status when you execute the hosted-engine --vm-status command?
         > >
         > > If that is so, can you give us the current logs from
         > > /var/log/ovirt-hosted-engine-ha?
         > >
         > > --
         > > Martin Sivák
         > > msi...@redhat.com
         > > Red Hat Czech
         > > RHEV-M SLA / Brno, CZ
         > >
         > > ----- Original Message -----
         > > > OK, I mounted the domain for the hosted engine manually and the agent
         > > > went up.
         > > >
         > > > But vm-status shows:
         > > >
         > > > --== Host 2 status ==--
         > > >
         > > > Status up-to-date                  : False
         > > > Hostname                           : 192.168.99.103
         > > > Host ID                            : 2
         > > > Engine status                      : unknown stale-data
         > > > Score                              : 0
         > > > Local maintenance                  : False
         > > > Host timestamp                     : 1398333438
         > > >
         > > > And in my engine, host02 HA is not active.
         > > >
         > > >
         > > > 2014-04-24 12:48 GMT+02:00 Kevin Tibi <kevint...@hotmail.com>:
         > > >
         > > > > Hi,
         > > > >
         > > > > I tried to reboot my hosts and now [supervdsmServer] is <defunct>.
         > > > >
         > > > > /var/log/vdsm/supervdsm.log
         > > > >
         > > > >
         > > > > MainProcess|Thread-120::DEBUG::2014-04-24 12:22:19,955::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper) return validateAccess with None
         > > > > MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,010::supervdsmServer::96::SuperVdsm.ServerCallback::(wrapper) call validateAccess with ('qemu', ('qemu', 'kvm'), '/rhev/data-center/mnt/host01.ovirt.lan:_home_export', 5) {}
         > > > > MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,014::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper) return validateAccess with None
         > > > > MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,059::supervdsmServer::96::SuperVdsm.ServerCallback::(wrapper) call validateAccess with ('qemu', ('qemu', 'kvm'), '/rhev/data-center/mnt/host01.ovirt.lan:_home_iso', 5) {}
         > > > > MainProcess|Thread-120::DEBUG::2014-04-24 12:22:20,063::supervdsmServer::103::SuperVdsm.ServerCallback::(wrapper) return validateAccess with None
         > > > >
         > > > > and one host doesn't mount the NFS used for the hosted engine.
         > > > >
         > > > > MainThread::CRITICAL::2014-04-24 12:36:16,603::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
         > > > > Could not start ha-agent
         > > > > Traceback (most recent call last):
         > > > >   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
         > > > >     self._run_agent()
         > > > >   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
         > > > >     hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
         > > > >   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 299, in start_monitoring
         > > > >     self._initialize_vdsm()
         > > > >   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 418, in _initialize_vdsm
         > > > >     self._sd_path = env_path.get_domain_path(self._config)
         > > > >   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/env/path.py", line 40, in get_domain_path
         > > > >     .format(sd_uuid, parent))
         > > > > Exception: path to storage domain aea040f8-ab9d-435b-9ecf-ddd4272e592f not found in /rhev/data-center/mnt
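         > > > >
         > > > > For reference, mounting the domain by hand looks roughly like this (a
         > > > > sketch; the export name host01.ovirt.lan:/home/NFS01 is inferred from
         > > > > the mount paths elsewhere in this thread, so adjust it to your setup):
         > > > >
         > > > > # mkdir -p '/rhev/data-center/mnt/host01.ovirt.lan:_home_NFS01'
         > > > > # mount -t nfs host01.ovirt.lan:/home/NFS01 '/rhev/data-center/mnt/host01.ovirt.lan:_home_NFS01'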
         > > > >
         > > > >
         > > > >
         > > > > 2014-04-23 17:40 GMT+02:00 Kevin Tibi <kevint...@hotmail.com>:
         > > > >
         > > > > top
         > > > >> 1729 vdsm      20   0     0    0    0 Z 373.8  0.0 252:08.51 ovirt-ha-broker <defunct>
         > > > >>
         > > > >>
         > > > >> [root@host01 ~]# ps axwu | grep 1729
         > > > >> vdsm      1729  0.7  0.0      0     0 ?        Zl   Apr02 240:24 [ovirt-ha-broker] <defunct>
         > > > >>
         > > > >> [root@host01 ~]# ll /rhev/data-center/mnt/host01.ovirt.lan\:_home_NFS01/aea040f8-ab9d-435b-9ecf-ddd4272e592f/ha_agent/
         > > > >> total 2028
         > > > >> -rw-rw----. 1 vdsm kvm 1048576 23 avril 17:35 hosted-engine.lockspace
         > > > >> -rw-rw----. 1 vdsm kvm 1028096 23 avril 17:35 hosted-engine.metadata
         > > > >>
         > > > >> cat /var/log/vdsm/vdsm.log
         > > > >>
         > > > >> Thread-120518::DEBUG::2014-04-23 17:38:02,299::task::1185::TaskManager.Task::(prepare)
         > > > >> Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::finished:
         > > > >> {'aea040f8-ab9d-435b-9ecf-ddd4272e592f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000410963', 'lastCheck': '3.4', 'valid': True}, '5ae613a4-44e4-42cb-89fc-7b5d34c1f30f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000412357', 'lastCheck': '6.8', 'valid': True}, 'cc51143e-8ad7-4b0b-a4d2-9024dffc1188': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.000455292', 'lastCheck': '1.2', 'valid': True}, 'ff98d346-4515-4349-8437-fb2f5e9eaadf': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.00817113', 'lastCheck': '1.7', 'valid': True}}
         > > > >> Thread-120518::DEBUG::2014-04-23 17:38:02,300::task::595::TaskManager.Task::(_updateState)
         > > > >> Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::moving from state preparing -> state finished
         > > > >> Thread-120518::DEBUG::2014-04-23 17:38:02,300::resourceManager::940::ResourceManager.Owner::(releaseAll)
         > > > >> Owner.releaseAll requests {} resources {}
         > > > >> Thread-120518::DEBUG::2014-04-23 17:38:02,300::resourceManager::977::ResourceManager.Owner::(cancelAll)
         > > > >> Owner.cancelAll requests {}
         > > > >> Thread-120518::DEBUG::2014-04-23 17:38:02,300::task::990::TaskManager.Task::(_decref)
         > > > >> Task=`f13e71f1-ac7c-49ab-8079-8f099ebf72b6`::ref 0 aborting False
         > > > >> Thread-120518::ERROR::2014-04-23 17:38:02,302::brokerlink::72::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(connect)
         > > > >> Failed to connect to broker: [Errno 2] No such file or directory
         > > > >> Thread-120518::ERROR::2014-04-23 17:38:02,302::API::1612::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
         > > > >>  Traceback (most recent call last):
         > > > >>   File "/usr/share/vdsm/API.py", line 1603, in _getHaInfo
         > > > >>     stats = instance.get_all_stats()
         > > > >>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 83, in get_all_stats
         > > > >>     with broker.connection():
         > > > >>   File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__
         > > > >>     return self.gen.next()
         > > > >>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 96, in connection
         > > > >>     self.connect()
         > > > >>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 64, in connect
         > > > >>     self._socket.connect(constants.BROKER_SOCKET_FILE)
         > > > >>   File "<string>", line 1, in connect
         > > > >> error: [Errno 2] No such file or directory
         > > > >> Thread-78::DEBUG::2014-04-23 17:38:05,490::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_DATA/5ae613a4-44e4-42cb-89fc-7b5d34c1f30f/dom_md/metadata bs=4096 count=1' (cwd None)
         > > > >> Thread-78::DEBUG::2014-04-23 17:38:05,523::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n545 bytes (545 B) copied, 0.000412209 s, 1.3 MB/s\n'; <rc> = 0
         > > > >>
         > > > >>
         > > > >>
         > > > >>
         > > > >> 2014-04-23 17:27 GMT+02:00 Martin Sivak <msi...@redhat.com>:
         > > > >>
         > > > >> Hi Kevin,
         > > > >>>
         > > > >>> > same pb.
         > > > >>>
         > > > >>> Are you missing the lockspace file as well while running on top of
         > > > >>> GlusterFS?
         > > > >>>
         > > > >>> > ovirt-ha-broker has 400% CPU and is defunct. I can't kill it with -9.
         > > > >>>
         > > > >>> A defunct process eating four full cores? I wonder how that is possible..
         > > > >>> What are the status flags of that process when you do ps axwu?
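         > > > >>>
         > > > >>> For example, a hedged way to list per-thread state (using the PID from the ps output quoted above; a 'Z' main thread with other threads still alive could explain a defunct process burning CPU):
         > > > >>>
         > > > >>> # ps -L -o pid,lwp,stat,pcpu,comm -p 1729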
         > > > >>>
         > > > >>> Can you attach the log files please?
         > > > >>>
         > > > >>> --
         > > > >>> Martin Sivák
         > > > >>> msi...@redhat.com
         > > > >>> Red Hat Czech
         > > > >>> RHEV-M SLA / Brno, CZ
         > > > >>>
         > > > >>> ----- Original Message -----
         > > > >>> > same pb. ovirt-ha-broker has 400% CPU and is defunct. I can't kill it
         > > > >>> with -9.
         > > > >>> >
         > > > >>> >
         > > > >>> > 2014-04-23 13:55 GMT+02:00 Martin Sivak <msi...@redhat.com>:
         > > > >>> >
         > > > >>> > > Hi,
         > > > >>> > >
         > > > >>> > > > Isn't this file created when hosted engine is started?
         > > > >>> > >
         > > > >>> > > The file is created by the setup script. If it got lost, then there
         > > > >>> > > was probably something bad happening in your NFS or Gluster storage.
         > > > >>> > >
         > > > >>> > > > Or how can I create this file manually?
         > > > >>> > >
         > > > >>> > > I can give you an experimental treatment for this. We do not have any
         > > > >>> > > official way, as this is something that should never happen :)
         > > > >>> > >
         > > > >>> > > !! But before you do that, make sure you do not have any nodes running
         > > > >>> > > properly. This will destroy and reinitialize the lockspace database for
         > > > >>> > > the whole hosted-engine environment (which you apparently lack, but..). !!
         > > > >>> > >
         > > > >>> > > You have to create the ha_agent/hosted-engine.lockspace file with the
         > > > >>> > > expected size (1MB) and then tell sanlock to initialize it as a
         > > > >>> > > lockspace using:
         > > > >>> > >
         > > > >>> > > # python
         > > > >>> > > >>> import sanlock
         > > > >>> > > >>> sanlock.write_lockspace(lockspace="hosted-engine",
         > > > >>> > > ...     path="/rhev/data-center/mnt/<nfs>/<hosted engine storage domain>/ha_agent/hosted-engine.lockspace",
         > > > >>> > > ...     offset=0)
         > > > >>> > > >>>
         > > > >>> > >
         > > > >>> > > Then try starting the services (both broker and agent) again.
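         > > > >>> > >
         > > > >>> > > End to end, a minimal sketch of that treatment (placeholders as above; run it with both services stopped on ALL hosts):
         > > > >>> > >
         > > > >>> > > # service ovirt-ha-agent stop; service ovirt-ha-broker stop
         > > > >>> > > # dd if=/dev/zero of='/rhev/data-center/mnt/<nfs>/<sd uuid>/ha_agent/hosted-engine.lockspace' bs=1M count=1
         > > > >>> > > # python -c 'import sanlock; sanlock.write_lockspace(lockspace="hosted-engine", path="/rhev/data-center/mnt/<nfs>/<sd uuid>/ha_agent/hosted-engine.lockspace", offset=0)'
         > > > >>> > > # service ovirt-ha-broker start; service ovirt-ha-agent start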
         > > > >>> > >
         > > > >>> > > --
         > > > >>> > > Martin Sivák
         > > > >>> > > msi...@redhat.com
         > > > >>> > > Red Hat Czech
         > > > >>> > > RHEV-M SLA / Brno, CZ
         > > > >>> > >
         > > > >>> > >
         > > > >>> > > ----- Original Message -----
         > > > >>> > > > On 04/23/2014 11:08 AM, Martin Sivak wrote:
         > > > >>> > > > > Hi René,
         > > > >>> > > > >
         > > > >>> > > > >>>> libvirtError: Failed to acquire lock: No space left on device
         > > > >>> > > > >
         > > > >>> > > > >>>> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
         > > > >>> > > > >>>> lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
         > > > >>> > > > >
         > > > >>> > > > > Can you please check the contents of /rhev/data-center/<your nfs
         > > > >>> > > > > mount>/<nfs domain uuid>/ha_agent/?
         > > > >>> > > > >
         > > > >>> > > > > This is how it should look:
         > > > >>> > > > >
         > > > >>> > > > > [root@dev-03 ~]# ls -al /rhev/data-center/mnt/euryale\:_home_ovirt_he/e16de6a2-53f5-4ab3-95a3-255d08398824/ha_agent/
         > > > >>> > > > > total 2036
         > > > >>> > > > > drwxr-x---. 2 vdsm kvm    4096 Mar 19 18:46 .
         > > > >>> > > > > drwxr-xr-x. 6 vdsm kvm    4096 Mar 19 18:46 ..
         > > > >>> > > > > -rw-rw----. 1 vdsm kvm 1048576 Apr 23 11:05 hosted-engine.lockspace
         > > > >>> > > > > -rw-rw----. 1 vdsm kvm 1028096 Mar 19 18:46 hosted-engine.metadata
         > > > >>> > > > >
         > > > >>> > > > > The errors seem to indicate that you somehow lost the lockspace file.
         > > > >>> > > >
         > > > >>> > > > True :)
         > > > >>> > > > Isn't this file created when hosted engine is started? Or how can I
         > > > >>> > > > create this file manually?
         > > > >>> > > >
         > > > >>> > > > >
         > > > >>> > > > > --
         > > > >>> > > > > Martin Sivák
         > > > >>> > > > > msi...@redhat.com
         > > > >>> > > > > Red Hat Czech
         > > > >>> > > > > RHEV-M SLA / Brno, CZ
         > > > >>> > > > >
         > > > >>> > > > > ----- Original Message -----
         > > > >>> > > > >> On 04/23/2014 12:28 AM, Doron Fediuck wrote:
         > > > >>> > > > >>> Hi Rene,
         > > > >>> > > > >>> any idea what closed your ovirtmgmt bridge?
         > > > >>> > > > >>> as long as it is down, vdsm may have issues starting up properly,
         > > > >>> > > > >>> and this is why you see the complaints on the rpc server.
         > > > >>> > > > >>>
         > > > >>> > > > >>> Can you try manually fixing the network part first and then
         > > > >>> > > > >>> restarting vdsm?
         > > > >>> > > > >>> Once vdsm is happy, the hosted engine VM will start.
         > > > >>> > > > >>
         > > > >>> > > > >> Thanks for your feedback, Doron.
         > > > >>> > > > >>
         > > > >>> > > > >> My ovirtmgmt bridge seems to be up, or isn't it:
         > > > >>> > > > >> # brctl show ovirtmgmt
         > > > >>> > > > >> bridge name        bridge id               STP enabled     interfaces
         > > > >>> > > > >> ovirtmgmt          8000.0025907587c2       no              eth0.200
         > > > >>> > > > >>
         > > > >>> > > > >> # ip a s ovirtmgmt
         > > > >>> > > > >> 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
         > > > >>> > > > >>       link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
         > > > >>> > > > >>       inet 10.0.200.102/24 brd 10.0.200.255 scope global ovirtmgmt
         > > > >>> > > > >>       inet6 fe80::225:90ff:fe75:87c2/64 scope link
         > > > >>> > > > >>          valid_lft forever preferred_lft forever
         > > > >>> > > > >>
         > > > >>> > > > >> # ip a s eth0.200
         > > > >>> > > > >> 6: eth0.200@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
         > > > >>> > > > >>       link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
         > > > >>> > > > >>       inet6 fe80::225:90ff:fe75:87c2/64 scope link
         > > > >>> > > > >>          valid_lft forever preferred_lft forever
         > > > >>> > > > >>
         > > > >>> > > > >> I tried the following yesterday:
         > > > >>> > > > >> Copy the virtual disk from GlusterFS storage to the local disk of the
         > > > >>> > > > >> host and create a new VM with virt-manager which loads the ovirtmgmt
         > > > >>> > > > >> disk. I could reach my engine over the ovirtmgmt bridge (so the
         > > > >>> > > > >> bridge must be working).
         > > > >>> > > > >>
         > > > >>> > > > >> I also started libvirtd with option -v and I saw the following in
         > > > >>> > > > >> libvirtd.log when trying to start the ovirt engine:
         > > > >>> > > > >> 2014-04-22 14:18:25.432+0000: 8901: debug : virCommandRunAsync:2250 : Command result 0, with PID 11491
         > > > >>> > > > >> 2014-04-22 14:18:25.478+0000: 8901: debug : virCommandRun:2045 : Result exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto 'FO-vnet0' is not a chain
         > > > >>> > > > >>
         > > > >>> > > > >> So it could be that something is broken in my hosted-engine network. Do
         > > > >>> > > > >> you have any clue how I can troubleshoot this?
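         > > > >>> > > > >>
         > > > >>> > > > >> A starting point I can think of (FO-vnet0 looks like a per-interface chain from libvirt's nwfilter driver; these are just generic checks, not a known fix):
         > > > >>> > > > >>
         > > > >>> > > > >> # iptables -S | grep -i vnet0
         > > > >>> > > > >> # virsh nwfilter-list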
         > > > >>> > > > >>
         > > > >>> > > > >>
         > > > >>> > > > >> Thanks,
         > > > >>> > > > >> René
         > > > >>> > > > >>
         > > > >>> > > > >>
         > > > >>> > > > >>>
         > > > >>> > > > >>> ----- Original Message -----
         > > > >>> > > > >>>> From: "René Koch" <rk...@linuxland.at>
         > > > >>> > > > >>>> To: "Martin Sivak" <msi...@redhat.com>
         > > > >>> > > > >>>> Cc: users@ovirt.org
         > > > >>> > > > >>>> Sent: Tuesday, April 22, 2014 1:46:38 PM
         > > > >>> > > > >>>> Subject: Re: [ovirt-users] hosted engine health check issues
         > > > >>> > > > >>>>
         > > > >>> > > > >>>> Hi,
         > > > >>> > > > >>>>
         > > > >>> > > > >>>> I rebooted one of my ovirt hosts today and the result is now that I
         > > > >>> > > > >>>> can't start hosted-engine anymore.
         > > > >>> > > > >>>>
         > > > >>> > > > >>>> ovirt-ha-agent isn't running because the lockspace file is missing
         > > > >>> > > > >>>> (sanlock complains about it).
         > > > >>> > > > >>>> So I tried to start hosted-engine with --vm-start and I get the
         > > > >>> > > > >>>> following errors:
         > > > >>> > > > >>>>
         > > > >>> > > > >>>> ==> /var/log/sanlock.log <==
         > > > >>> > > > >>>> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
         > > > >>> > > 2851af27-8744-445d-9fb1-a0d083c8dc82
         > > > >>> > > > >>>>
         > > > >>> > > > >>>> ==> /var/log/messages <==
         > > > >>> > > > >>>> Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
         > > > >>> > > > >>>> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering disabled state
         > > > >>> > > > >>>> Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left promiscuous mode
         > > > >>> > > > >>>> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering disabled state
         > > > >>> > > > >>>>
         > > > >>> > > > >>>> ==> /var/log/vdsm/vdsm.log <==
         > > > >>> > > > >>>> Thread-21::DEBUG::2014-04-22 12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire lock: No space left on device
         > > > >>> > > > >>>> Thread-21::DEBUG::2014-04-22 12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm) vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
         > > > >>> > > > >>>> Thread-21::ERROR::2014-04-22 12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm) vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed
         > > > >>> > > > >>>> Traceback (most recent call last):
         > > > >>> > > > >>>>      File "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm
         > > > >>> > > > >>>>        self._run()
         > > > >>> > > > >>>>      File "/usr/share/vdsm/vm.py", line 3170, in _run
         > > > >>> > > > >>>>        self._connection.createXML(domxml, flags),
         > > > >>> > > > >>>>      File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92, in wrapper
         > > > >>> > > > >>>>        ret = f(*args, **kwargs)
         > > > >>> > > > >>>>      File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML
         > > > >>> > > > >>>>        if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
         > > > >>> > > > >>>> libvirtError: Failed to acquire lock: No space left on device
         > > > >>> > > > >>>>
         > > > >>> > > > >>>> ==> /var/log/messages <==
         > > > >>> > > > >>>> Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed#012Traceback (most recent call last):#012  File "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm#012    self._run()#012  File "/usr/share/vdsm/vm.py", line 3170, in _run#012    self._connection.createXML(domxml, flags),#012  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92, in wrapper#012    ret = f(*args, **kwargs)#012  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in createXML#012    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)#012libvirtError: Failed to acquire lock: No space left on device
         > > > >>> > > > >>>>
         > > > >>> > > > >>>> ==> /var/log/vdsm/vdsm.log <==
         > > > >>> > > > >>>> Thread-21::DEBUG::2014-04-22 12:38:17,569::vm::2731::vm.Vm::(setDownStatus) vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down: Failed to acquire lock: No space left on device
         > > > >>> > > > >>>>
         > > > >>> > > > >>>>
         > > > >>> > > > >>>> No space left on device is nonsense, as there is enough space (I had
         > > > >>> > > > >>>> this issue last time as well, where I had to patch machine.py, but
         > > > >>> > > > >>>> this file is now Python 2.6.6 compatible).
         > > > >>> > > > >>>>
         > > > >>> > > > >>>> Any idea what prevents hosted-engine from starting?
         > > > >>> > > > >>>> ovirt-ha-broker, vdsmd and sanlock are running, btw.
         > > > >>> > > > >>>>
         > > > >>> > > > >>>> Btw, I can see in the log that the json rpc server module is missing
         > > > >>> > > > >>>> - which package is required for CentOS 6.5?
         > > > >>> > > > >>>> Apr 22 12:37:14 ovirt-host02 vdsm vds WARNING Unable to load the json
         > > > >>> > > > >>>> rpc server module. Please make sure it is installed.
         > > > >>> > > > >>>>
         > > > >>> > > > >>>>
         > > > >>> > > > >>>> Thanks,
         > > > >>> > > > >>>> René
         > > > >>> > > > >>>>
         > > > >>> > > > >>>>
         > > > >>> > > > >>>>
         > > > >>> > > > >>>> On 04/17/2014 10:02 AM, Martin Sivak wrote:
         > > > >>> > > > >>>>> Hi,
         > > > >>> > > > >>>>>
         > > > >>> > > > >>>>>>>> How can I disable notifications?
         > > > >>> > > > >>>>>
         > > > >>> > > > >>>>> The notification is configured in
         > > > >>> > > > >>>>> /etc/ovirt-hosted-engine-ha/broker.conf, section notification.
         > > > >>> > > > >>>>> The email is sent when the key state_transition exists and the string
         > > > >>> > > > >>>>> OldState-NewState contains the (case insensitive) regexp from the value.
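         > > > >>> > > > >>>>>
         > > > >>> > > > >>>>> For illustration, such a section can look roughly like this (a sketch; state_transition is the key described above, while the smtp/email key names are assumptions and may differ between versions):
         > > > >>> > > > >>>>>
         > > > >>> > > > >>>>> [notification]
         > > > >>> > > > >>>>> smtp-server = localhost
         > > > >>> > > > >>>>> smtp-port = 25
         > > > >>> > > > >>>>> source-email = root@localhost
         > > > >>> > > > >>>>> destination-emails = root@localhost
         > > > >>> > > > >>>>> state_transition = maintenance|start|stop|migrate|up|down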
         > > > >>> > > > >>>>>
         > > > >>> > > > >>>>>>>> Is it intended to send out these messages and detect that ovirt
         > > > >>> > > > >>>>>>>> engine is down (which is false anyway), but not to restart the vm?
         > > > >>> > > > >>>>>
         > > > >>> > > > >>>>> Forget about emails for now and check the
         > > > >>> > > > >>>>> /var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and attach
         > > > >>> > > > >>>>> them as well, btw).
         > > > >>> > > > >>>>>
         > > > >>> > > > >>>>>>>> oVirt hosts think that hosted engine is down because it seems that
         > > > >>> > > > >>>>>>>> hosts can't write to hosted-engine.lockspace due to glusterfs issues
         > > > >>> > > > >>>>>>>> (or at least I think so).
         > > > >>> > > > >>>>>
         > > > >>> > > > >>>>> Do the hosts just think so, or can they really not write there? The
         > > > >>> > > > >>>>> lockspace is managed by sanlock and our HA daemons do not touch it at
         > > > >>> > > > >>>>> all. We only ask sanlock to make sure we have a unique server id.
         > > > >>> > > > >>>>>
         > > > >>> > > > >>>>>>>> Is it possible or planned to make the whole ha feature optional?
         > > > >>> > > > >>>>>
         > > > >>> > > > >>>>> Well, the system won't perform any automatic actions if you put the
         > > > >>> > > > >>>>> hosted engine into global maintenance and only start/stop/migrate the
         > > > >>> > > > >>>>> VM manually. I would discourage you from stopping agent/broker,
         > > > >>> > > > >>>>> because the engine itself has some logic based on the reporting.
         > > > >>> > > > >>>>> Regards
         > > > >>> > > > >>>>>
         > > > >>> > > > >>>>> --
         > > > >>> > > > >>>>> Martin Sivák
         > > > >>> > > > >>>>> msi...@redhat.com
         > > > >>> > > > >>>>> Red Hat Czech
         > > > >>> > > > >>>>> RHEV-M SLA / Brno, CZ
         > > > >>> > > > >>>>>
         > > > >>> > > > >>>>> ----- Original Message -----
         > > > >>> > > > >>>>>> On 04/15/2014 04:53 PM, Jiri Moskovcak wrote:
         > > > >>> > > > >>>>>>> On 04/14/2014 10:50 AM, René Koch wrote:
         > > > >>> > > > >>>>>>>> Hi,
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>> I have some issues with the hosted engine status.
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>> oVirt hosts think that hosted engine is down because it seems that
         > > > >>> > > > >>>>>>>> hosts can't write to hosted-engine.lockspace due to glusterfs issues
         > > > >>> > > > >>>>>>>> (or at least I think so).
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>> Here's the output of vm-status:
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>> # hosted-engine --vm-status
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>> --== Host 1 status ==--
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>> Status up-to-date                  : False
         > > > >>> > > > >>>>>>>> Hostname                           : 10.0.200.102
         > > > >>> > > > >>>>>>>> Host ID                            : 1
         > > > >>> > > > >>>>>>>> Engine status                      : unknown stale-data
         > > > >>> > > > >>>>>>>> Score                              : 2400
         > > > >>> > > > >>>>>>>> Local maintenance                  : False
         > > > >>> > > > >>>>>>>> Host timestamp                     : 1397035677
         > > > >>> > > > >>>>>>>> Extra metadata (valid at timestamp):
         > > > >>> > > > >>>>>>>>         metadata_parse_version=1
         > > > >>> > > > >>>>>>>>         metadata_feature_version=1
         > > > >>> > > > >>>>>>>>         timestamp=1397035677 (Wed Apr  9 11:27:57 2014)
         > > > >>> > > > >>>>>>>>         host-id=1
         > > > >>> > > > >>>>>>>>         score=2400
         > > > >>> > > > >>>>>>>>         maintenance=False
         > > > >>> > > > >>>>>>>>         state=EngineUp
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>> --== Host 2 status ==--
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>> Status up-to-date                  : True
         > > > >>> > > > >>>>>>>> Hostname                           : 10.0.200.101
         > > > >>> > > > >>>>>>>> Host ID                            : 2
         > > > >>> > > > >>>>>>>> Engine status                      : {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
         > > > >>> > > > >>>>>>>> Score                              : 0
         > > > >>> > > > >>>>>>>> Local maintenance                  : False
         > > > >>> > > > >>>>>>>> Host timestamp                     : 1397464031
         > > > >>> > > > >>>>>>>> Extra metadata (valid at timestamp):
         > > > >>> > > > >>>>>>>>         metadata_parse_version=1
         > > > >>> > > > >>>>>>>>         metadata_feature_version=1
         > > > >>> > > > >>>>>>>>         timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
         > > > >>> > > > >>>>>>>>         host-id=2
         > > > >>> > > > >>>>>>>>         score=0
         > > > >>> > > > >>>>>>>>         maintenance=False
         > > > >>> > > > >>>>>>>>         state=EngineUnexpectedlyDown
         > > > >>> > > > >>>>>>>>         timeout=Mon Apr 14 10:35:05 2014
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>> oVirt engine is sending me 2 emails every 10 minutes with the
         > > > >>> > > > >>>>>>>> following subjects:
         > > > >>> > > > >>>>>>>> - ovirt-hosted-engine state transition EngineDown-EngineStart
         > > > >>> > > > >>>>>>>> - ovirt-hosted-engine state transition EngineStart-EngineUp
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>> In the oVirt webadmin I can see the following message:
         > > > >>> > > > >>>>>>>> VM HostedEngine is down. Exit message: internal error Failed to
         > > > >>> > > > >>>>>>>> acquire lock: error -243.
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>> These messages are really annoying as oVirt isn't doing anything
         > > > >>> > > > >>>>>>>> with hosted engine - my engine vm has an uptime of 9 days.
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>> So my questions are now:
         > > > >>> > > > >>>>>>>> Is it intended to send out these messages and detect that ovirt
         > > > >>> > > > >>>>>>>> engine is down (which is false anyway), but not to restart the vm?
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>> How can I disable notifications? I'm planning to write a Nagios
         > > > >>> > > > >>>>>>>> plugin which parses the output of hosted-engine --vm-status, and
         > > > >>> > > > >>>>>>>> only Nagios should notify me, not the hosted-engine script.
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>> Is it possible or planned to make the whole ha feature optional? I
         > > > >>> > > > >>>>>>>> really really really hate cluster software as it causes more
         > > > >>> > > > >>>>>>>> troubles than standalone machines, and in my case the hosted-engine
         > > > >>> > > > >>>>>>>> ha feature really causes troubles (and I didn't have a hardware or
         > > > >>> > > > >>>>>>>> network outage yet, only issues with the hosted-engine ha agent). I
         > > > >>> > > > >>>>>>>> don't need any ha feature for hosted engine. I just want to run the
         > > > >>> > > > >>>>>>>> engine virtualized on oVirt, and if the engine vm fails (e.g.
         > > > >>> > > > >>>>>>>> because of issues with a host) I'll restart it on another node.
         > > > >>> > > > >>>>>>>
         > > > >>> > > > >>>>>>> Hi, you can:
         > > > >>> > > > >>>>>>> 1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and
         > > > >>> > > > >>>>>>> tweak the logger as you like
         > > > >>> > > > >>>>>>> 2. or kill the ovirt-ha-broker & ovirt-ha-agent services (see the
         > > > >>> > > > >>>>>>> example below)
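         > > > >>> > > > >>>>>>>
         > > > >>> > > > >>>>>>> For example (service names as used elsewhere in this thread):
         > > > >>> > > > >>>>>>>
         > > > >>> > > > >>>>>>> # service ovirt-ha-agent stop
         > > > >>> > > > >>>>>>> # service ovirt-ha-broker stop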
         > > > >>> > > > >>>>>>
         > > > >>> > > > >>>>>> Thanks for the information.
         > > > >>> > > > >>>>>> So the engine is able to run when ovirt-ha-broker and ovirt-ha-agent
         > > > >>> > > > >>>>>> aren't running?
         > > > >>> > > > >>>>>>
         > > > >>> > > > >>>>>>
         > > > >>> > > > >>>>>> Regards,
         > > > >>> > > > >>>>>> René
         > > > >>> > > > >>>>>>
         > > > >>> > > > >>>>>>>
         > > > >>> > > > >>>>>>> --Jirka
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>> Thanks,
         > > > >>> > > > >>>>>>>> René
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>>
         > > > >>> > > > >>>>>>>
         > > > >>> > > > >>>>>>
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users