Re: [ovirt-users] hosted engine health check issues
and create a new vm with virt-manager which loads the ovirtmgmt disk. I could reach my engine over the ovirtmgmt bridge (so the bridge must be working).

I also started libvirtd with option -v and saw the following in libvirtd.log when trying to start the oVirt engine:

2014-04-22 14:18:25.432+: 8901: debug : virCommandRunAsync:2250 : Command result 0, with PID 11491
2014-04-22 14:18:25.478+: 8901: debug : virCommandRun:2045 : Result exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto 'FO-vnet0' is not a chain'

So it could be that something is broken in my hosted-engine network. Do you have any clue how I can troubleshoot this?

Thanks,
René

----- Original Message -----
From: René Koch rk...@linuxland.at
To: Martin Sivak msi...@redhat.com
Cc: users@ovirt.org
Sent: Tuesday, April 22, 2014 1:46:38 PM
Subject: Re: [ovirt-users] hosted engine health check issues

Hi,

I rebooted one of my ovirt hosts today, and the result is that I can't start hosted-engine anymore. ovirt-ha-agent isn't running because the lockspace file is missing (sanlock complains about it).

So I tried to start hosted-engine with --vm-start, and I get the following errors:

== /var/log/sanlock.log ==
2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82

== /var/log/messages ==
Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering disabled state
Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left promiscuous mode
Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering disabled state

== /var/log/vdsm/vdsm.log ==
Thread-21::DEBUG::2014-04-22 12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire lock: No space left on device
Thread-21::DEBUG::2014-04-22 12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm) vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
Thread-21::ERROR::2014-04-22 12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm) vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed
Traceback (most recent call last):
  File /usr/share/vdsm/vm.py, line 2249, in _startUnderlyingVm
    self._run()
  File /usr/share/vdsm/vm.py, line 3170, in _run
    self._connection.createXML(domxml, flags),
  File /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py, line 92, in wrapper
    ret = f(*args, **kwargs)
  File /usr/lib64/python2.6/site-packages/libvirt.py, line 2665, in createXML
    if ret is None: raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: Failed to acquire lock: No space left on device
== /var/log/messages ==
Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed#012Traceback (most recent call last):#012 File /usr/share/vdsm/vm.py, line 2249, in _startUnderlyingVm#012 self._run()#012 File /usr/share/vdsm/vm.py, line 3170, in _run#012 self._connection.createXML(domxml, flags),#012 File /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py, line 92, in wrapper#012ret = f(*args, **kwargs)#012 File /usr/lib64/python2.6/site-packages/libvirt.py, line 2665, in createXML#012if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)#012libvirtError: Failed to acquire lock: No space left on device

== /var/log/vdsm/vdsm.log ==
Thread-21::DEBUG::2014-04-22 12:38:17,569::vm::2731::vm.Vm::(setDownStatus) vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down: Failed to acquire lock: No space left on device

"No space left on device" is nonsense, as there is enough space (I had this issue last time as well, where I had to patch machine.py - but this file is now Python 2.6.6 compatible). Any idea what prevents hosted-engine from starting? ovirt-ha-broker, vdsmd and sanlock are running, btw.

Btw, I can see in the log that the json rpc server module is missing - which package is required for CentOS 6.5?

Apr 22 12:37:14 ovirt-host02 vdsm vds WARNING Unable to load the json rpc server module. Please make sure it is installed.

Thanks,
René

On 04/17/2014 10:02 AM, Martin Sivak wrote:
Hi,

> How can I disable notifications?

The notification is configured in /etc/ovirt-hosted
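The `#012` sequences in the /var/log/messages excerpt above are syslog's escaped newlines (octal 012 = `\n`), which is why the whole Python traceback appears on a single line. A minimal helper (an editor's illustration, not part of the original mail; `decode_syslog` is a made-up name) makes such lines readable again:

```python
def decode_syslog(message):
    """Expand syslog's #012 newline escapes back into real newlines."""
    return message.replace("#012", "\n")


# Example with a shortened fragment of the log line quoted above:
line = "The vm start process failed#012Traceback (most recent call last):#012  ..."
print(decode_syslog(line))
```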
Re: [ovirt-users] hosted engine health check issues
On 04/23/2014 12:28 AM, Doron Fediuck wrote:
Hi Rene,
any idea what closed your ovirtmgmt bridge? As long as it is down, vdsm may have issues starting up properly, and this is why you see the complaints on the rpc server. Can you try manually fixing the network part first and then restarting vdsm? Once vdsm is happy, the hosted engine VM will start.

Thanks for your feedback, Doron.

My ovirtmgmt bridge seems to be up - or isn't it?

# brctl show ovirtmgmt
bridge name     bridge id           STP enabled     interfaces
ovirtmgmt       8000.0025907587c2   no              eth0.200

# ip a s ovirtmgmt
7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
    inet 10.0.200.102/24 brd 10.0.200.255 scope global ovirtmgmt
    inet6 fe80::225:90ff:fe75:87c2/64 scope link
       valid_lft forever preferred_lft forever

# ip a s eth0.200
6: eth0.200@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::225:90ff:fe75:87c2/64 scope link
       valid_lft forever preferred_lft forever

I tried the following yesterday: copy the virtual disk from GlusterFS storage to the local disk of the host and create a new vm with virt-manager which loads the ovirtmgmt disk. I could reach my engine over the ovirtmgmt bridge (so the bridge must be working).

I also started libvirtd with option -v and saw the following in libvirtd.log when trying to start the oVirt engine:

2014-04-22 14:18:25.432+: 8901: debug : virCommandRunAsync:2250 : Command result 0, with PID 11491
2014-04-22 14:18:25.478+: 8901: debug : virCommandRun:2045 : Result exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto 'FO-vnet0' is not a chain'

So it could be that something is broken in my hosted-engine network. Do you have any clue how I can troubleshoot this?
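The `brctl show` output above is easy to misread once the table has been reflowed. As an editor's illustration (not from the thread; `parse_brctl_show` is a made-up name), the standard four-column `brctl show` layout can be parsed into an explicit bridge-to-interfaces mapping:

```python
def parse_brctl_show(text):
    """Map bridge name -> list of enslaved interfaces from `brctl show` output."""
    bridges = {}
    current = None
    for line in text.splitlines()[1:]:      # skip the header row
        if not line.strip():
            continue
        cols = line.split()
        if not line[0].isspace():           # a new bridge row
            current = cols[0]
            bridges[current] = cols[3:]     # interfaces column, if present
        elif current is not None:
            bridges[current].extend(cols)   # continuation rows hold extra ports
    return bridges


sample = ("bridge name\tbridge id\t\tSTP enabled\tinterfaces\n"
          "ovirtmgmt\t8000.0025907587c2\tno\t\teth0.200")
print(parse_brctl_show(sample))  # -> {'ovirtmgmt': ['eth0.200']}
```

Here the bridge does have `eth0.200` enslaved, which matches René's reading that the bridge itself is up.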
Thanks,
René

----- Original Message -----
From: René Koch rk...@linuxland.at
To: Martin Sivak msi...@redhat.com
Cc: users@ovirt.org
Sent: Tuesday, April 22, 2014 1:46:38 PM
Subject: Re: [ovirt-users] hosted engine health check issues
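The iptables failure René quotes (`goto 'FO-vnet0' is not a chain`) means a rule references a chain that was never created. As an editor's illustration (not from the thread; `missing_goto_chains` is a made-up name), an `iptables-save` dump can be scanned for goto/jump targets that lack a chain declaration - chain declarations start with `:`:

```python
def missing_goto_chains(iptables_save_text):
    """Return goto/jump targets that are never declared as chains."""
    declared, referenced = set(), set()
    for line in iptables_save_text.splitlines():
        if line.startswith(":"):            # chain declaration, e.g. ":FORWARD ACCEPT [0:0]"
            declared.add(line[1:].split()[0])
        parts = line.split()
        for flag in ("-g", "-j"):           # goto and jump targets
            if flag in parts:
                referenced.add(parts[parts.index(flag) + 1])
    # Built-in targets are not chains; filter the common ones crudely.
    builtin = {"ACCEPT", "DROP", "REJECT", "RETURN", "MASQUERADE"}
    return sorted(referenced - declared - builtin)


sample = "\n".join([
    "*filter",
    ":FORWARD ACCEPT [0:0]",
    "-A FORWARD -m physdev --physdev-out vnet0 -g FO-vnet0",
    "COMMIT",
])
print(missing_goto_chains(sample))  # -> ['FO-vnet0']
```

A non-empty result would point at exactly the kind of stale rule libvirt complained about here.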
Re: [ovirt-users] hosted engine health check issues
Hi René,

> libvirtError: Failed to acquire lock: No space left on device

> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82

Can you please check the contents of /rhev/data-center/<your nfs mount>/<nfs domain uuid>/ha_agent/? This is how it should look:

[root@dev-03 ~]# ls -al /rhev/data-center/mnt/euryale\:_home_ovirt_he/e16de6a2-53f5-4ab3-95a3-255d08398824/ha_agent/
total 2036
drwxr-x---. 2 vdsm kvm    4096 Mar 19 18:46 .
drwxr-xr-x. 6 vdsm kvm    4096 Mar 19 18:46 ..
-rw-rw----. 1 vdsm kvm 1048576 Apr 23 11:05 hosted-engine.lockspace
-rw-rw----. 1 vdsm kvm 1028096 Mar 19 18:46 hosted-engine.metadata

The errors seem to indicate that you somehow lost the lockspace file.

--
Martin Sivák
msi...@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ

----- Original Message -----
On 04/23/2014 12:28 AM, Doron Fediuck wrote:
Hi Rene,
any idea what closed your ovirtmgmt bridge? As long as it is down, vdsm may have issues starting up properly, and this is why you see the complaints on the rpc server. Can you try manually fixing the network part first and then restarting vdsm? Once vdsm is happy, the hosted engine VM will start.

Thanks for your feedback, Doron.
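The check Martin suggests can be scripted. The sketch below is an editor's illustration (not from the thread; `ha_agent_status` is a made-up name, and the directory argument is your own mount path): it reports whether the two ha_agent files exist and how large they are - the lockspace file should be 1048576 bytes, per the listing above.

```python
import os


def ha_agent_status(ha_agent_dir):
    """Return {filename: size in bytes, or None if the file is missing}."""
    status = {}
    for name in ("hosted-engine.lockspace", "hosted-engine.metadata"):
        path = os.path.join(ha_agent_dir, name)
        status[name] = os.path.getsize(path) if os.path.isfile(path) else None
    return status
```

In René's case this would report `hosted-engine.lockspace: None`, matching sanlock's complaint.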
Re: [ovirt-users] hosted engine health check issues
On 04/23/2014 11:08 AM, Martin Sivak wrote:
> The errors seem to indicate that you somehow lost the lockspace file.

True :) Isn't this file created when the hosted engine is started? Or how can I create this file manually?
Re: [ovirt-users] hosted engine health check issues
Hi,

> Isn't this file created when hosted engine is started?

The file is created by the setup script. If it got lost, then something bad probably happened in your NFS or Gluster storage.

> Or how can I create this file manually?

I can give you an experimental treatment for this. We do not have any official way, as this is something that should never happen :)

!! But before you do that, make sure you do not have any nodes running properly. This will destroy and reinitialize the lockspace database for the whole hosted-engine environment (which you apparently lack, but..). !!

You have to create the ha_agent/hosted-engine.lockspace file with the expected size (1 MB) and then tell sanlock to initialize it as a lockspace using:

# python
>>> import sanlock
>>> sanlock.write_lockspace(lockspace="hosted-engine",
...     path="/rhev/data-center/mnt/nfs/<hosted engine storage domain>/ha_agent/hosted-engine.lockspace",
...     offset=0)

Then try starting the services (both broker and agent) again.

--
Martin Sivák
msi...@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ

----- Original Message -----
On 04/23/2014 11:08 AM, Martin Sivak wrote:
> The errors seem to indicate that you somehow lost the lockspace file.

True :) Isn't this file created when hosted engine is started? Or how can I create this file manually?
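Martin's recipe can be sketched end-to-end in Python. This is an editor's illustration of the same steps, not an official procedure: `create_lockspace_file` and `init_lockspace` are made-up helper names, the path is a placeholder for your real NFS mount and storage-domain UUID, and - as Martin warns - re-initializing the lockspace destroys existing leases.

```python
import os

MiB = 1024 * 1024


def create_lockspace_file(path, size=MiB):
    """Create the lockspace file if missing and pad it to the expected 1 MiB."""
    with open(path, "ab"):
        pass                      # ensure the file exists
    with open(path, "r+b") as f:
        f.truncate(size)          # pad (or trim) to exactly 1 MiB
    return os.path.getsize(path)


def init_lockspace(path):
    """Re-initialize the sanlock lockspace (destroys existing leases)."""
    import sanlock                # python-sanlock bindings, present on oVirt hosts
    sanlock.write_lockspace(lockspace="hosted-engine", path=path, offset=0)


# Usage on the host (as root), with a placeholder path:
#   p = "/rhev/data-center/mnt/<nfs>/<domain uuid>/ha_agent/hosted-engine.lockspace"
#   create_lockspace_file(p)
#   init_lockspace(p)
```

Splitting the two steps mirrors Martin's instructions: the file must exist at the expected size before sanlock is asked to write the lockspace records into it.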
Re: [ovirt-users] hosted engine health check issues
same pb. ovirt-ha-broker have 400% cpu and is defunct. I can't kill with -9. 2014-04-23 13:55 GMT+02:00 Martin Sivak msi...@redhat.com: Hi, Isn't this file created when hosted engine is started? The file is created by the setup script. If it got lost then there was probably something bad happening in your NFS or Gluster storage. Or how can I create this file manually? I can give you experimental treatment for this. We do not have any official way as this is something that should not ever happen :) !! But before you do that make sure you do not have any nodes running properly. This will destroy and reinitialize the lockspace database for the whole hosted-engine environment (which you apparently lack, but..). !! You have to create the ha_agent/hosted-engine.lockspace file with the expected size (1MB) and then tell sanlock to initialize it as a lockspace using: # python import sanlock sanlock.write_lockspace(lockspace=hosted-engine, ... path=/rhev/data-center/mnt/nfs/hosted engine storage domain/ha_agent/hosted-engine.lockspace, ... offset=0) Then try starting the services (both broker and agent) again. -- Martin Sivák msi...@redhat.com Red Hat Czech RHEV-M SLA / Brno, CZ - Original Message - On 04/23/2014 11:08 AM, Martin Sivak wrote: Hi René, libvirtError: Failed to acquire lock: No space left on device 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82 Can you please check the contents of /rhev/data-center/your nfs mount/nfs domain uuid/ha_agent/? This is how it should look like: [root@dev-03 ~]# ls -al /rhev/data-center/mnt/euryale\:_home_ovirt_he/e16de6a2-53f5-4ab3-95a3-255d08398824/ha_agent/ total 2036 drwxr-x---. 2 vdsm kvm4096 Mar 19 18:46 . drwxr-xr-x. 6 vdsm kvm4096 Mar 19 18:46 .. -rw-rw. 1 vdsm kvm 1048576 Apr 23 11:05 hosted-engine.lockspace -rw-rw. 
1 vdsm kvm 1028096 Mar 19 18:46 hosted-engine.metadata The errors seem to indicate that you somehow lost the lockspace file. True :) Isn't this file created when hosted engine is started? Or how can I create this file manually? -- Martin Sivák msi...@redhat.com Red Hat Czech RHEV-M SLA / Brno, CZ - Original Message - On 04/23/2014 12:28 AM, Doron Fediuck wrote: Hi Rene, any idea what closed your ovirtmgmt bridge? as long as it is down vdsm may have issues starting up properly and this is why you see the complaints on the rpc server. Can you try manually fixing the network part first and then restart vdsm? Once vdsm is happy hosted engine VM will start. Thanks for your feedback, Doron. My ovirtmgmt bridge seems to be on or isn't it: # brctl show ovirtmgmt bridge namebridge id STP enabled interfaces ovirtmgmt 8000.0025907587c2 no eth0.200 # ip a s ovirtmgmt 7: ovirtmgmt: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc noqueue state UNKNOWN link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff inet 10.0.200.102/24 brd 10.0.200.255 scope global ovirtmgmt inet6 fe80::225:90ff:fe75:87c2/64 scope link valid_lft forever preferred_lft forever # ip a s eth0.200 6: eth0.200@eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc noqueue state UP link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff inet6 fe80::225:90ff:fe75:87c2/64 scope link valid_lft forever preferred_lft forever I tried the following yesterday: Copy virtual disk from GlusterFS storage to local disk of host and create a new vm with virt-manager which loads ovirtmgmt disk. I could reach my engine over the ovirtmgmt bridge (so bridge must be working). 
I also started libvirtd with option -v and I saw the following in libvirtd.log when trying to start the ovirt engine:

2014-04-22 14:18:25.432+: 8901: debug : virCommandRunAsync:2250 : Command result 0, with PID 11491
2014-04-22 14:18:25.478+: 8901: debug : virCommandRun:2045 : Result exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto 'FO-vnet0' is not a chain'

So it could be that something is broken in my hosted-engine network. Do you have any clue how I can troubleshoot this? Thanks, René

- Original Message - From: René Koch rk...@linuxland.at To: Martin Sivak msi...@redhat.com Cc: users@ovirt.org Sent: Tuesday, April 22, 2014 1:46:38 PM Subject: Re: [ovirt-users] hosted engine health check issues

Hi, I rebooted one of my ovirt hosts today and the result is now that I can't start hosted-engine anymore. ovirt-ha-agent isn't running because the lockspace file is missing (sanlock complains about it). So I tried to start hosted-engine with --vm-start and I get
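Martin's recovery steps (create the 1 MB lockspace file, then have sanlock initialize it) can be sketched as follows. The path here is illustrative - on a real host the file lives under the hosted-engine storage domain's ha_agent directory - and the sanlock call is left commented out because it needs the sanlock daemon and its Python bindings on the host:

```python
import os
import tempfile

# Illustrative location; on a real host this would be
# /rhev/data-center/mnt/<nfs mount>/<domain uuid>/ha_agent/hosted-engine.lockspace
path = os.path.join(tempfile.mkdtemp(), "hosted-engine.lockspace")

# Step 1: create the lockspace file with the expected size (1 MB).
with open(path, "wb") as f:
    f.truncate(1024 * 1024)
print(os.path.getsize(path))  # 1048576

# Step 2 (on the host itself): tell sanlock to initialize it as a lockspace.
# import sanlock
# sanlock.write_lockspace(lockspace="hosted-engine", path=path, offset=0)
```

After both steps, restarting ovirt-ha-broker and ovirt-ha-agent should let sanlock acquire a host id in the fresh lockspace.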
Re: [ovirt-users] hosted engine health check issues
this? Thanks, René - Original Message - From: René Koch rk...@linuxland.at To: Martin Sivak msi...@redhat.com Cc: users@ovirt.org Sent: Tuesday, April 22, 2014 1:46:38 PM Subject: Re: [ovirt-users] hosted engine health check issues Hi, I rebooted one of my ovirt hosts today and the result is now that I can't start hosted-engine anymore. ovirt-ha-agent isn't running because the lockspace file is missing (sanlock complains about it). So I tried to start hosted-engine with --vm-start and I get the following errors: == /var/log/sanlock.log == 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82 == /var/log/messages == Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82 Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering disabled state Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left promiscuous mode Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering disabled state == /var/log/vdsm/vdsm.log == Thread-21::DEBUG::2014-04-22 12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire lock: No space left on device Thread-21::DEBUG::2014-04-22 12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm) vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released Thread-21::ERROR::2014-04-22 12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm) vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed Traceback (most recent call last): File /usr/share/vdsm/vm.py, line 2249, in _startUnderlyingVm self._run() File /usr/share/vdsm/vm.py, line 3170, in _run self._connection.createXML(domxml, flags), File /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py, line 92, in wrapper ret = f(*args, **kwargs) File 
/usr/lib64/python2.6/site-packages/libvirt.py, line 2665, in createXML if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self) libvirtError: Failed to acquire lock: No space left on device

== /var/log/messages ==
Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed#012Traceback (most recent call last):#012 File /usr/share/vdsm/vm.py, line 2249, in _startUnderlyingVm#012 self._run()#012 File /usr/share/vdsm/vm.py, line 3170, in _run#012 self._connection.createXML(domxml, flags),#012 File /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py, line 92, in wrapper#012ret = f(*args, **kwargs)#012 File /usr/lib64/python2.6/site-packages/libvirt.py, line 2665, in createXML#012if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)#012libvirtError: Failed to acquire lock: No space left on device

== /var/log/vdsm/vdsm.log ==
Thread-21::DEBUG::2014-04-22 12:38:17,569::vm::2731::vm.Vm::(setDownStatus) vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down: Failed to acquire lock: No space left on device

No space left on device is nonsense as there is enough space (I had this issue last time as well, where I had to patch machine.py, but this file is now Python 2.6.6 compatible). Any idea what prevents hosted-engine from starting? ovirt-ha-broker, vdsmd and sanlock are running btw.

Btw, I can see in the log that the json rpc server module is missing - which package is required for CentOS 6.5?

Apr 22 12:37:14 ovirt-host02 vdsm vds WARNING Unable to load the json rpc server module. Please make sure it is installed.

Thanks, René

On 04/17/2014 10:02 AM, Martin Sivak wrote: Hi,

How can I disable notifications?

The notification is configured in /etc/ovirt-hosted-engine-ha/broker.conf section notification. The email is sent when the key state_transition exists and the string OldState-NewState contains the (case insensitive) regexp from the value.
Is it intended to send out these messages and detect that ovirt engine is down (which is false anyway), but not to restart the vm? Forget about emails for now and check the /var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and attach them as well btw). oVirt hosts think that hosted engine is down because it seems that hosts can't write to hosted-engine.lockspace due to glusterfs issues (or at least I think so
Re: [ovirt-users] hosted engine health check issues
Hi, I rebooted one of my ovirt hosts today and the result is now that I can't start hosted-engine anymore. ovirt-ha-agent isn't running because the lockspace file is missing (sanlock complains about it). So I tried to start hosted-engine with --vm-start and got the sanlock, kernel and vdsm errors already quoted in full above. [snip] Any idea what prevents hosted-engine from starting? ovirt-ha-broker, vdsmd and sanlock are running btw. Thanks, René

On 04/17/2014 10:02 AM, Martin Sivak wrote: Hi,

How can I disable notifications?

The notification is configured in /etc/ovirt-hosted-engine-ha/broker.conf section notification. The email is sent when the key state_transition exists and the string OldState-NewState contains the (case insensitive) regexp from the value.

Is it intended to send out these messages and detect that ovirt engine is down (which is false anyway), but not to restart the vm?

Forget about emails for now and check the /var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and attach them as well btw). 
oVirt hosts think that hosted engine is down because it seems that hosts can't write to hosted-engine.lockspace due to glusterfs issues (or at least I think so).

The hosts think so or can't really write there? The lockspace is managed by sanlock and our HA daemons do not touch it at all. We only ask sanlock to make sure we have a unique server id.

Is it possible or planned to make the whole ha feature optional?

Well, the system won't perform any automatic actions if you put the hosted engine to global maintenance and only start/stop/migrate the VM manually. I would discourage you from stopping agent/broker, because the engine itself has some logic based on the reporting.

Regards -- Martin Sivák msi...@redhat.com Red Hat Czech RHEV-M SLA / Brno, CZ

- Original Message - On 04/15/2014 04:53 PM, Jiri Moskovcak wrote: On 04/14/2014 10:50 AM, René Koch wrote: Hi, I have
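The global maintenance mode Martin mentions can be toggled from any HA host. An illustrative sketch of the commands (check `hosted-engine --help` on your version, as the exact flags may differ between releases):

```shell
# Stop all automatic HA actions while the agents keep reporting state:
hosted-engine --set-maintenance --mode=global

# Start/stop/migrate the engine VM manually while in this mode, e.g.:
hosted-engine --vm-start

# Re-enable automatic HA actions afterwards:
hosted-engine --set-maintenance --mode=none
```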
Re: [ovirt-users] hosted engine health check issues
On 04/14/2014 11:50 AM, René Koch wrote: Hi, I have some issues with hosted engine status. oVirt hosts think that hosted engine is down because it seems that hosts can't write to hosted-engine.lockspace due to glusterfs issues (or at least I think so). Here's the output of vm-status:

# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date          : False
Hostname                   : 10.0.200.102
Host ID                    : 1
Engine status              : unknown stale-data
Score                      : 2400
Local maintenance          : False
Host timestamp             : 1397035677
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1397035677 (Wed Apr 9 11:27:57 2014)
    host-id=1
    score=2400
    maintenance=False
    state=EngineUp

--== Host 2 status ==--

Status up-to-date          : True
Hostname                   : 10.0.200.101
Host ID                    : 2
Engine status              : {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
Score                      : 0
Local maintenance          : False
Host timestamp             : 1397464031
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
    host-id=2
    score=0
    maintenance=False
    state=EngineUnexpectedlyDown
    timeout=Mon Apr 14 10:35:05 2014

oVirt engine is sending me 2 emails every 10 minutes with the following subjects:
- ovirt-hosted-engine state transition EngineDown-EngineStart
- ovirt-hosted-engine state transition EngineStart-EngineUp

In oVirt webadmin I can see the following message: VM HostedEngine is down. Exit message: internal error Failed to acquire lock: error -243.

These messages are really annoying as oVirt isn't doing anything with hosted engine - I have an uptime of 9 days in my engine vm. So my questions are now: Is it intended to send out these messages and detect that ovirt engine is down (which is false anyway), but not to restart the vm? How can I disable notifications? I'm planning to write a Nagios plugin which parses the output of hosted-engine --vm-status and only Nagios should notify me, not the hosted-engine script. 
Is it possible or planned to make the whole ha feature optional? I really really really hate cluster software as it causes more troubles than standalone machines and in my case the hosted-engine ha feature really causes troubles (and I didn't have a hardware or network outage yet, only issues with the hosted-engine ha agent). I don't need any ha feature for hosted engine. I just want to run the engine virtualized on oVirt and if the engine vm fails (e.g. because of issues with a host) I'll restart it on another node. Thanks, René

I'm pretty sure we removed hosted-engine on gluster due to concerns around the locking issues. is the gluster configured with quorum to avoid split brains?

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
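René's Nagios idea is easy to prototype: parse the health field out of `hosted-engine --vm-status` and map it to a Nagios exit code. A minimal, hypothetical sketch (`check_vm_status` and the field names are taken from the output quoted above, not from any official plugin):

```python
import re

# Sample taken from the --vm-status output quoted earlier in the thread.
sample = """--== Host 2 status ==--
Status up-to-date : True
Hostname : 10.0.200.101
Engine status : {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
Score : 0
"""

def check_vm_status(text):
    """Map hosted-engine --vm-status output to a Nagios-style (exit_code, message)."""
    health = re.search(r"'health':\s*'(\w+)'", text)
    score = re.search(r"^Score\s*:\s*(\d+)", text, re.MULTILINE)
    if health and health.group(1) == "good":
        return 0, "OK - engine health good, score %s" % (score.group(1) if score else "?")
    return 2, "CRITICAL - engine health %s" % (health.group(1) if health else "unknown")

code, message = check_vm_status(sample)
print(code, message)  # 2 CRITICAL - engine health bad
```

In a real plugin one would capture the command output with subprocess, print the message and `sys.exit(code)`, so only Nagios sends notifications.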
Re: [ovirt-users] hosted engine health check issues
On 04/22/2014 04:04 PM, Itamar Heim wrote: On 04/14/2014 11:50 AM, René Koch wrote: Hi, I have some issues with hosted engine status. oVirt hosts think that hosted engine is down because it seems that hosts can't write to hosted-engine.lockspace due to glusterfs issues (or at least I think so). [hosted-engine --vm-status output snipped - quoted in full above] Is it intended to send out these messages and detect that ovirt engine is down (which is false anyway), but not to restart the vm? How can I disable notifications? 
I'm planning to write a Nagios plugin which parses the output of hosted-engine --vm-status and only Nagios should notify me, not the hosted-engine script. Is it possible or planned to make the whole ha feature optional? I really really really hate cluster software as it causes more troubles than standalone machines and in my case the hosted-engine ha feature really causes troubles (and I didn't have a hardware or network outage yet, only issues with the hosted-engine ha agent). I don't need any ha feature for hosted engine. I just want to run the engine virtualized on oVirt and if the engine vm fails (e.g. because of issues with a host) I'll restart it on another node. Thanks, René

I'm pretty sure we removed hosted-engine on gluster due to concerns around the locking issues. is the gluster configured with quorum to avoid split brains?

At the moment there's no quorum (1 host online is enough - but the GlusterFS network is on dedicated nics which are directly connected between the two hosts), as I'm waiting for additional memory and disks for the other 2 nodes (so I have only 2 nodes atm). But GlusterFS looks fine (now) - same for info heal-failed and info split-brain:

# gluster volume heal engine info
Gathering Heal info on volume engine has been successful

Brick ovirt-host01-gluster:/data/engine
Number of entries: 0

Brick ovirt-host02-gluster:/data/engine
Number of entries: 0

I can also create (touch) the lockspace file on the mounted GlusterFS volume - so imho GlusterFS isn't blocking libvirt. Regards, René
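Itamar's quorum question is worth acting on once the additional nodes arrive; with only two bricks any quorum setting is a trade-off, since losing one node can make the volume read-only. The usual GlusterFS knobs, shown as an illustrative sketch against the engine volume named above:

```shell
# Server-side quorum: bricks are stopped when too few servers see each other
gluster volume set engine cluster.server-quorum-type server

# Client-side quorum: writes are refused unless a majority of bricks is reachable
gluster volume set engine cluster.quorum-type auto
```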
Re: [ovirt-users] hosted engine health check issues
Hi Rene, any idea what closed your ovirtmgmt bridge? as long as it is down vdsm may have issues starting up properly and this is why you see the complaints on the rpc server. Can you try manually fixing the network part first and then restart vdsm? Once vdsm is happy the hosted engine VM will start.

- Original Message - From: René Koch rk...@linuxland.at To: Martin Sivak msi...@redhat.com Cc: users@ovirt.org Sent: Tuesday, April 22, 2014 1:46:38 PM Subject: Re: [ovirt-users] hosted engine health check issues

[René's message, the sanlock/kernel/vdsm logs and Martin's earlier reply snipped - all quoted in full above]
Re: [ovirt-users] hosted engine health check issues
On 04/15/2014 04:53 PM, Jiri Moskovcak wrote: On 04/14/2014 10:50 AM, René Koch wrote: Hi, I have some issues with hosted engine status. oVirt hosts think that hosted engine is down because it seems that hosts can't write to hosted-engine.lockspace due to glusterfs issues (or at least I think so). [hosted-engine --vm-status output snipped - quoted in full above] Is it intended to send out these messages and detect that ovirt engine is down (which is false anyway), but not to restart the vm? How can I disable notifications? 
I'm planning to write a Nagios plugin which parses the output of hosted-engine --vm-status and only Nagios should notify me, not the hosted-engine script. Is it possible or planned to make the whole ha feature optional? I really really really hate cluster software as it causes more troubles than standalone machines and in my case the hosted-engine ha feature really causes troubles (and I didn't have a hardware or network outage yet, only issues with the hosted-engine ha agent). I don't need any ha feature for hosted engine. I just want to run the engine virtualized on oVirt and if the engine vm fails (e.g. because of issues with a host) I'll restart it on another node.

Hi, you can:
1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and tweak the logger as you like
2. or kill the ovirt-ha-broker and ovirt-ha-agent services

Thanks for the information. So the engine is able to run when ovirt-ha-broker and ovirt-ha-agent aren't running? Regards, René

--Jirka
Re: [ovirt-users] hosted engine health check issues
On 04/17/2014 09:34 AM, René Koch wrote: On 04/15/2014 04:53 PM, Jiri Moskovcak wrote: On 04/14/2014 10:50 AM, René Koch wrote: Hi, I have some issues with hosted engine status. [hosted-engine --vm-status output snipped - quoted in full above] Is it intended to send out these messages and detect that ovirt engine is down (which is false anyway), but not to restart the vm? How can I disable notifications? 
[Nagios plugin plans and Jirka's logging/kill suggestions snipped - quoted in full above] Thanks for the information. So the engine is able to run when ovirt-ha-broker and ovirt-ha-agent aren't running?

- yes, it might cause some problems if you set up another host for hosted engine and run the agent on the other host, but as long as you don't have the agent running anywhere or you don't need to migrate the engine vm, you should be fine.

--Jirka
Re: [ovirt-users] hosted engine health check issues
On 04/17/2014 09:40 AM, Jiri Moskovcak wrote: On 04/17/2014 09:34 AM, René Koch wrote: On 04/15/2014 04:53 PM, Jiri Moskovcak wrote: On 04/14/2014 10:50 AM, René Koch wrote: Hi, I have some issues with hosted engine status. [hosted-engine --vm-status output snipped - quoted in full above] Is it intended to send out these messages and detect that ovirt engine is down (which is false anyway), but not to restart the vm? How can I disable notifications? 
[Nagios plugin plans and the logging/kill discussion snipped - quoted in full above]

- yes, it might cause some problems if you set up another host for hosted engine and run the agent on the other host, but as long as you don't have the agent running anywhere or you don't need to migrate the engine vm, you should be fine.

Thanks! At the moment I have an issue with ovirt-ha-broker running wild and not reacting to kill -9:

# ps aux | egrep -e '%CPU|\[ovirt-ha-broker\]' | grep -v grep
USER  PID  %CPU %MEM VSZ RSS TTY STAT START TIME      COMMAND
vdsm  3059 224  0.0    0   0 ?   Zl   Mar03 145536:45 [ovirt-ha-broker] <defunct>
# kill -9 3059
# ps aux | egrep -e '%CPU|\[ovirt-ha-broker\]' | grep -v grep
USER  PID  %CPU %MEM VSZ RSS TTY STAT START TIME      COMMAND
vdsm  3059 224  0.0    0   0 ?   Zl   Mar03 145545:17 [ovirt-ha-broker] <defunct>

--Jirka Regards, René
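The <defunct> state above is exactly why kill -9 has no effect: a zombie process is already dead, and only a wait() by its parent removes the entry (restarting the parent, or letting init reap it, is the actual fix). A small Linux-only demonstration:

```python
import os
import signal
import time

def state(pid):
    # Field 3 of /proc/<pid>/stat is the process state; 'Z' means zombie/defunct.
    with open("/proc/%d/stat" % pid) as f:
        return f.read().split()[2]

pid = os.fork()
if pid == 0:
    os._exit(0)               # child dies immediately

time.sleep(0.2)               # parent has not reaped it yet -> zombie
assert state(pid) == "Z"      # shows up as <defunct> in ps, like ovirt-ha-broker

os.kill(pid, signal.SIGKILL)  # signals cannot kill what is already dead
time.sleep(0.1)
assert state(pid) == "Z"      # still defunct

os.waitpid(pid, 0)            # reaping by the parent finally removes it
```

In René's case the parent is presumably the broker's service wrapper, so only restarting it (or rebooting) will clear the defunct entry.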
Re: [ovirt-users] hosted engine health check issues
Hi, How can I disable notifications? The notification is configured in /etc/ovirt-hosted-engine-ha/broker.conf section notification. The email is sent when the key state_transition exists and the string OldState-NewState contains the (case insensitive) regexp from the value. Is it intended to send out these messages and detect that ovirt engine is down (which is false anyway), but not to restart the vm? Forget about emails for now and check the /var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and attach them as well btw). oVirt hosts think that hosted engine is down because it seems that hosts can't write to hosted-engine.lockspace due to glusterfs issues (or at least I think so). The hosts think so or can't really write there? The lockspace is managed by sanlock and our HA daemons do not touch it at all. We only ask sanlock to get make sure we have unique server id. Is is possible or planned to make the whole ha feature optional? Well the system won't perform any automatic actions if you put the hosted engine to global maintenance and only start/stop/migrate the VM manually. I would discourage you from stopping agent/broker, because the engine itself has some logic based on the reporting. Regards -- Martin Sivák msi...@redhat.com Red Hat Czech RHEV-M SLA / Brno, CZ - Original Message - On 04/15/2014 04:53 PM, Jiri Moskovcak wrote: On 04/14/2014 10:50 AM, René Koch wrote: Hi, I have some issues with hosted engine status. oVirt hosts think that hosted engine is down because it seems that hosts can't write to hosted-engine.lockspace due to glusterfs issues (or at least I think so). 
Here's the output of vm-status:

# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date  : False
Hostname           : 10.0.200.102
Host ID            : 1
Engine status      : unknown stale-data
Score              : 2400
Local maintenance  : False
Host timestamp     : 1397035677
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1397035677 (Wed Apr 9 11:27:57 2014)
    host-id=1
    score=2400
    maintenance=False
    state=EngineUp

--== Host 2 status ==--

Status up-to-date  : True
Hostname           : 10.0.200.101
Host ID            : 2
Engine status      : {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
Score              : 0
Local maintenance  : False
Host timestamp     : 1397464031
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
    host-id=2
    score=0
    maintenance=False
    state=EngineUnexpectedlyDown
    timeout=Mon Apr 14 10:35:05 2014

oVirt engine is sending me 2 emails every 10 minutes with the following subjects:
- ovirt-hosted-engine state transition EngineDown-EngineStart
- ovirt-hosted-engine state transition EngineStart-EngineUp

In the oVirt webadmin I can see the following message:
VM HostedEngine is down. Exit message: internal error Failed to acquire lock: error -243.

These messages are really annoying, as oVirt isn't doing anything with the hosted engine - I have an uptime of 9 days in my engine VM.

So my questions are now:
Is it intended to send out these messages and detect that the ovirt engine is down (which is false anyway), but not to restart the VM?
How can I disable notifications? I'm planning to write a Nagios plugin which parses the output of hosted-engine --vm-status, and only Nagios should notify me, not the hosted-engine script.
Is it possible or planned to make the whole HA feature optional?
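A Nagios plugin along those lines only needs the per-host "Engine status" and "Status up-to-date" fields from the --vm-status output. A minimal parsing sketch (hypothetical helper names, not a finished plugin; it assumes the output format shown above and would read the real text via subprocess in practice):

```python
import re

# Split `hosted-engine --vm-status` output into per-host field maps,
# then derive a Nagios state (0=OK, 1=WARNING, 2=CRITICAL).

def parse_vm_status(text):
    hosts = {}
    current = None
    for line in text.splitlines():
        m = re.match(r"--== Host (\d+) status ==--", line.strip())
        if m:
            current = hosts.setdefault(m.group(1), {})
            continue
        if line.startswith(" "):      # skip indented "Extra metadata" lines
            continue
        if current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return hosts

def nagios_state(hosts):
    if any("'health': 'good'" in h.get("Engine status", "")
           for h in hosts.values()):
        return 0                      # some host reports a healthy engine
    if any(h.get("Status up-to-date") == "False" for h in hosts.values()):
        return 1                      # stale data: warn rather than page
    return 2                          # nobody reports the engine as healthy

sample = """\
--== Host 1 status ==--
Status up-to-date : False
Engine status : unknown stale-data
--== Host 2 status ==--
Status up-to-date : True
Engine status : {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
"""

hosts = parse_vm_status(sample)
print(nagios_state(hosts))  # 1: no healthy engine reported, host 1 is stale
```

A real plugin would run the command with subprocess, print a one-line status message, and exit with the computed code (sys.exit), following the usual Nagios exit-code convention.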
I really, really, really hate cluster software, as it causes more trouble than standalone machines, and in my case the hosted-engine HA feature really causes trouble (I haven't had a hardware or network outage yet, only issues with the hosted-engine HA agent). I don't need any HA feature for the hosted engine. I just want to run the engine virtualized on oVirt, and if the engine VM fails (e.g. because of issues with a host) I'll restart it on another node.

Hi, you can:
1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and tweak the logger as you like
2. or kill the ovirt-ha-broker and ovirt-ha-agent services

Thanks for the information. So the engine is able to run when ovirt-ha-broker and ovirt-ha-agent aren't running?

Regards,
René

--Jirka

Thanks,
René
Re: [ovirt-users] hosted engine health check issues
On 04/14/2014 10:50 AM, René Koch wrote:

Hi, I have some issues with hosted engine status. oVirt hosts think that the hosted engine is down because it seems that hosts can't write to hosted-engine.lockspace due to glusterfs issues (or at least I think so).

Here's the output of vm-status:

# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date  : False
Hostname           : 10.0.200.102
Host ID            : 1
Engine status      : unknown stale-data
Score              : 2400
Local maintenance  : False
Host timestamp     : 1397035677
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1397035677 (Wed Apr 9 11:27:57 2014)
    host-id=1
    score=2400
    maintenance=False
    state=EngineUp

--== Host 2 status ==--

Status up-to-date  : True
Hostname           : 10.0.200.101
Host ID            : 2
Engine status      : {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
Score              : 0
Local maintenance  : False
Host timestamp     : 1397464031
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
    host-id=2
    score=0
    maintenance=False
    state=EngineUnexpectedlyDown
    timeout=Mon Apr 14 10:35:05 2014

oVirt engine is sending me 2 emails every 10 minutes with the following subjects:
- ovirt-hosted-engine state transition EngineDown-EngineStart
- ovirt-hosted-engine state transition EngineStart-EngineUp

In the oVirt webadmin I can see the following message:
VM HostedEngine is down. Exit message: internal error Failed to acquire lock: error -243.

These messages are really annoying, as oVirt isn't doing anything with the hosted engine - I have an uptime of 9 days in my engine VM.

So my questions are now:
Is it intended to send out these messages and detect that the ovirt engine is down (which is false anyway), but not to restart the VM?
How can I disable notifications? I'm planning to write a Nagios plugin which parses the output of hosted-engine --vm-status, and only Nagios should notify me, not the hosted-engine script.
Is it possible or planned to make the whole HA feature optional? I really, really, really hate cluster software, as it causes more trouble than standalone machines, and in my case the hosted-engine HA feature really causes trouble (I haven't had a hardware or network outage yet, only issues with the hosted-engine HA agent). I don't need any HA feature for the hosted engine. I just want to run the engine virtualized on oVirt, and if the engine VM fails (e.g. because of issues with a host) I'll restart it on another node.

Hi, you can:
1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and tweak the logger as you like
2. or kill the ovirt-ha-broker and ovirt-ha-agent services

--Jirka

Thanks,
René
[ovirt-users] hosted engine health check issues
Hi,

I have some issues with hosted engine status. oVirt hosts think that the hosted engine is down because it seems that hosts can't write to hosted-engine.lockspace due to glusterfs issues (or at least I think so).

Here's the output of vm-status:

# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date  : False
Hostname           : 10.0.200.102
Host ID            : 1
Engine status      : unknown stale-data
Score              : 2400
Local maintenance  : False
Host timestamp     : 1397035677
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1397035677 (Wed Apr 9 11:27:57 2014)
    host-id=1
    score=2400
    maintenance=False
    state=EngineUp

--== Host 2 status ==--

Status up-to-date  : True
Hostname           : 10.0.200.101
Host ID            : 2
Engine status      : {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
Score              : 0
Local maintenance  : False
Host timestamp     : 1397464031
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
    host-id=2
    score=0
    maintenance=False
    state=EngineUnexpectedlyDown
    timeout=Mon Apr 14 10:35:05 2014

oVirt engine is sending me 2 emails every 10 minutes with the following subjects:
- ovirt-hosted-engine state transition EngineDown-EngineStart
- ovirt-hosted-engine state transition EngineStart-EngineUp

In the oVirt webadmin I can see the following message:
VM HostedEngine is down. Exit message: internal error Failed to acquire lock: error -243.

These messages are really annoying, as oVirt isn't doing anything with the hosted engine - I have an uptime of 9 days in my engine VM.

So my questions are now:
Is it intended to send out these messages and detect that the ovirt engine is down (which is false anyway), but not to restart the VM?
How can I disable notifications? I'm planning to write a Nagios plugin which parses the output of hosted-engine --vm-status, and only Nagios should notify me, not the hosted-engine script.
Is it possible or planned to make the whole HA feature optional? I really, really, really hate cluster software, as it causes more trouble than standalone machines, and in my case the hosted-engine HA feature really causes trouble (I haven't had a hardware or network outage yet, only issues with the hosted-engine HA agent). I don't need any HA feature for the hosted engine. I just want to run the engine virtualized on oVirt, and if the engine VM fails (e.g. because of issues with a host) I'll restart it on another node.

Thanks,
René

--
Best Regards
René Koch
Senior Solution Architect
LIS-Linuxland GmbH
Brünner Straße 163, A-1210 Vienna
Phone: +43 1 236 91 60
Mobile: +43 660 / 512 21 31
E-Mail: rk...@linuxland.at