Hi René,

> >> libvirtError: Failed to acquire lock: No space left on device
> >> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
> >> lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82

Can you please check the contents of
/rhev/data-center/<your nfs mount>/<nfs domain uuid>/ha_agent/?

This is what it should look like:

[root@dev-03 ~]# ls -al /rhev/data-center/mnt/euryale\:_home_ovirt_he/e16de6a2-53f5-4ab3-95a3-255d08398824/ha_agent/
total 2036
drwxr-x---. 2 vdsm kvm    4096 Mar 19 18:46 .
drwxr-xr-x. 6 vdsm kvm    4096 Mar 19 18:46 ..
-rw-rw----. 1 vdsm kvm 1048576 Apr 23 11:05 hosted-engine.lockspace
-rw-rw----. 1 vdsm kvm 1028096 Mar 19 18:46 hosted-engine.metadata

The errors seem to indicate that you somehow lost the lockspace file.

--
Martin Sivák
[email protected]
Red Hat Czech
RHEV-M SLA / Brno, CZ

----- Original Message -----
> On 04/23/2014 12:28 AM, Doron Fediuck wrote:
> > Hi Rene,
> > any idea what closed your ovirtmgmt bridge?
> > As long as it is down, vdsm may have issues starting up properly,
> > and this is why you see the complaints about the rpc server.
> >
> > Can you try manually fixing the network part first and then
> > restart vdsm?
> > Once vdsm is happy, the hosted engine VM will start.
>
> Thanks for your feedback, Doron.
>
> My ovirtmgmt bridge seems to be up, or isn't it?
>
> # brctl show ovirtmgmt
> bridge name     bridge id               STP enabled     interfaces
> ovirtmgmt       8000.0025907587c2       no              eth0.200
>
> # ip a s ovirtmgmt
> 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
>     link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
>     inet 10.0.200.102/24 brd 10.0.200.255 scope global ovirtmgmt
>     inet6 fe80::225:90ff:fe75:87c2/64 scope link
>        valid_lft forever preferred_lft forever
>
> # ip a s eth0.200
> 6: eth0.200@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
>     link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::225:90ff:fe75:87c2/64 scope link
>        valid_lft forever preferred_lft forever
>
> I tried the following yesterday: I copied the virtual disk from GlusterFS
> storage to the local disk of the host and created a new VM with virt-manager
> that uses this disk and the ovirtmgmt bridge. I could reach my engine over
> the ovirtmgmt bridge (so the bridge must be working).
>
> I also started libvirtd with option -v and saw the following in
> libvirtd.log when trying to start the oVirt engine:
>
> 2014-04-22 14:18:25.432+0000: 8901: debug : virCommandRunAsync:2250 :
> Command result 0, with PID 11491
> 2014-04-22 14:18:25.478+0000: 8901: debug : virCommandRun:2045 : Result
> exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto 'FO-vnet0' is
> not a chain
>
> So it could be that something is broken in my hosted-engine network. Do
> you have any clue how I can troubleshoot this?
>
> Thanks,
> René
>
>
> ----- Original Message -----
> >> From: "René Koch" <[email protected]>
> >> To: "Martin Sivak" <[email protected]>
> >> Cc: [email protected]
> >> Sent: Tuesday, April 22, 2014 1:46:38 PM
> >> Subject: Re: [ovirt-users] hosted engine health check issues
> >>
> >> Hi,
> >>
> >> I rebooted one of my oVirt hosts today and the result is that I
> >> can't start hosted-engine anymore.
> >>
> >> ovirt-ha-agent isn't running because the lockspace file is missing
> >> (sanlock complains about it).
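
If that lockspace file really is gone from the ha_agent directory, one possible
way to recreate it is sketched below. Treat it as a sketch only: the lockspace
name ("hosted-engine"), the exact mount path and the vdsm:kvm ownership are
assumptions here, so compare them against a working host or a backup before
running anything.

# dd if=/dev/zero of=/rhev/data-center/mnt/<nfs mount>/<domain uuid>/ha_agent/hosted-engine.lockspace bs=1M count=1
# sanlock direct init -s hosted-engine:0:/rhev/data-center/mnt/<nfs mount>/<domain uuid>/ha_agent/hosted-engine.lockspace:0
# chown vdsm:kvm /rhev/data-center/mnt/<nfs mount>/<domain uuid>/ha_agent/hosted-engine.lockspace

Afterwards, restarting ovirt-ha-broker and ovirt-ha-agent should let sanlock
acquire the lockspace again.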
> >> So I tried to start hosted-engine with --vm-start and I get the
> >> following errors:
> >>
> >> ==> /var/log/sanlock.log <==
> >> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
> >> lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
> >>
> >> ==> /var/log/messages <==
> >> Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22 12:38:17+0200 654
> >> [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed 0 name
> >> 2851af27-8744-445d-9fb1-a0d083c8dc82
> >> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
> >> disabled state
> >> Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left promiscuous mode
> >> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
> >> disabled state
> >>
> >> ==> /var/log/vdsm/vdsm.log <==
> >> Thread-21::DEBUG::2014-04-22
> >> 12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown
> >> libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire
> >> lock: No space left on device
> >> Thread-21::DEBUG::2014-04-22
> >> 12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm)
> >> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
> >> Thread-21::ERROR::2014-04-22
> >> 12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm)
> >> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed
> >> Traceback (most recent call last):
> >>   File "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm
> >>     self._run()
> >>   File "/usr/share/vdsm/vm.py", line 3170, in _run
> >>     self._connection.createXML(domxml, flags),
> >>   File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py",
> >> line 92, in wrapper
> >>     ret = f(*args, **kwargs)
> >>   File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in
> >> createXML
> >>     if ret is None:raise libvirtError('virDomainCreateXML() failed',
> >> conn=self)
> >> libvirtError: Failed to acquire lock: No space left on device
> >>
> >> ==> /var/log/messages <==
> >> Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR
> >> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process
> >> failed#012Traceback (most recent call last):#012  File
> >> "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm#012
> >> self._run()#012  File "/usr/share/vdsm/vm.py", line 3170, in _run#012
> >> self._connection.createXML(domxml, flags),#012  File
> >> "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92,
> >> in wrapper#012    ret = f(*args, **kwargs)#012  File
> >> "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in
> >> createXML#012    if ret is None:raise libvirtError('virDomainCreateXML()
> >> failed', conn=self)#012libvirtError: Failed to acquire lock: No space
> >> left on device
> >>
> >> ==> /var/log/vdsm/vdsm.log <==
> >> Thread-21::DEBUG::2014-04-22
> >> 12:38:17,569::vm::2731::vm.Vm::(setDownStatus)
> >> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down:
> >> Failed to acquire lock: No space left on device
> >>
> >>
> >> "No space left on device" is nonsense, as there is enough space (I had
> >> this issue last time as well, where I had to patch machine.py, but that
> >> file is now Python 2.6.6 compatible).
> >>
> >> Any idea what prevents hosted-engine from starting?
> >> ovirt-ha-broker, vdsmd and sanlock are running, btw.
> >>
> >> Btw, I can see in the log that the json rpc server module is missing -
> >> which package is required for CentOS 6.5?
> >> Apr 22 12:37:14 ovirt-host02 vdsm vds WARNING Unable to load the json
> >> rpc server module. Please make sure it is installed.
> >>
> >>
> >> Thanks,
> >> René
> >>
> >>
> >>
> >> On 04/17/2014 10:02 AM, Martin Sivak wrote:
> >>> Hi,
> >>>
> >>>>>> How can I disable notifications?
> >>>
> >>> The notification is configured in /etc/ovirt-hosted-engine-ha/broker.conf,
> >>> section notification.
> >>> The email is sent when the key state_transition exists and the string
> >>> OldState-NewState contains the (case insensitive) regexp from the value.
> >>>
> >>>>>> Is it intended to send out these messages and detect that ovirt engine
> >>>>>> is down (which is false anyway), but not to restart the vm?
> >>>
> >>> Forget about emails for now and check the
> >>> /var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and attach them
> >>> as well, btw).
> >>>
> >>>>>> oVirt hosts think that hosted engine is down because it seems that
> >>>>>> hosts can't write to hosted-engine.lockspace due to glusterfs issues
> >>>>>> (or at least I think so).
> >>>
> >>> Do the hosts just think so, or can they really not write there? The
> >>> lockspace is managed by sanlock and our HA daemons do not touch it at
> >>> all. We only ask sanlock to make sure we have a unique server id.
> >>>
> >>>>>> Is it possible or planned to make the whole ha feature optional?
> >>>
> >>> Well, the system won't perform any automatic actions if you put the
> >>> hosted engine into global maintenance and only start/stop/migrate the VM
> >>> manually. I would discourage you from stopping agent/broker, because the
> >>> engine itself has some logic based on the reporting.
> >>>
> >>> Regards
> >>>
> >>> --
> >>> Martin Sivák
> >>> [email protected]
> >>> Red Hat Czech
> >>> RHEV-M SLA / Brno, CZ
> >>>
> >>> ----- Original Message -----
> >>>> On 04/15/2014 04:53 PM, Jiri Moskovcak wrote:
> >>>>> On 04/14/2014 10:50 AM, René Koch wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I have some issues with hosted engine status.
> >>>>>>
> >>>>>> oVirt hosts think that hosted engine is down because it seems that
> >>>>>> hosts can't write to hosted-engine.lockspace due to glusterfs issues
> >>>>>> (or at least I think so).
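
For reference, the broker.conf mechanism described a few paragraphs above can
also be used to silence the state transition mails. Based only on that
description, a snippet along these lines should do it; the section and key
names are taken from the text above, and the value is just an illustrative
regexp that matches no real OldState-NewState string, so compare it with the
actual file before editing:

[notification]
state_transition = no-such-transition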
> >>>>>>
> >>>>>> Here's the output of vm-status:
> >>>>>>
> >>>>>> # hosted-engine --vm-status
> >>>>>>
> >>>>>>
> >>>>>> --== Host 1 status ==--
> >>>>>>
> >>>>>> Status up-to-date                  : False
> >>>>>> Hostname                           : 10.0.200.102
> >>>>>> Host ID                            : 1
> >>>>>> Engine status                      : unknown stale-data
> >>>>>> Score                              : 2400
> >>>>>> Local maintenance                  : False
> >>>>>> Host timestamp                     : 1397035677
> >>>>>> Extra metadata (valid at timestamp):
> >>>>>>     metadata_parse_version=1
> >>>>>>     metadata_feature_version=1
> >>>>>>     timestamp=1397035677 (Wed Apr  9 11:27:57 2014)
> >>>>>>     host-id=1
> >>>>>>     score=2400
> >>>>>>     maintenance=False
> >>>>>>     state=EngineUp
> >>>>>>
> >>>>>>
> >>>>>> --== Host 2 status ==--
> >>>>>>
> >>>>>> Status up-to-date                  : True
> >>>>>> Hostname                           : 10.0.200.101
> >>>>>> Host ID                            : 2
> >>>>>> Engine status                      : {'reason': 'vm not running on this
> >>>>>> host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
> >>>>>> Score                              : 0
> >>>>>> Local maintenance                  : False
> >>>>>> Host timestamp                     : 1397464031
> >>>>>> Extra metadata (valid at timestamp):
> >>>>>>     metadata_parse_version=1
> >>>>>>     metadata_feature_version=1
> >>>>>>     timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
> >>>>>>     host-id=2
> >>>>>>     score=0
> >>>>>>     maintenance=False
> >>>>>>     state=EngineUnexpectedlyDown
> >>>>>>     timeout=Mon Apr 14 10:35:05 2014
> >>>>>>
> >>>>>> oVirt engine is sending me 2 emails every 10 minutes with the
> >>>>>> following subjects:
> >>>>>> - ovirt-hosted-engine state transition EngineDown-EngineStart
> >>>>>> - ovirt-hosted-engine state transition EngineStart-EngineUp
> >>>>>>
> >>>>>> In the oVirt webadmin I can see the following message:
> >>>>>> VM HostedEngine is down. Exit message: internal error Failed to
> >>>>>> acquire lock: error -243.
> >>>>>>
> >>>>>> These messages are really annoying as oVirt isn't doing anything with
> >>>>>> the hosted engine - I have an uptime of 9 days in my engine vm.
> >>>>>>
> >>>>>> So my questions are now:
> >>>>>> Is it intended to send out these messages and detect that ovirt engine
> >>>>>> is down (which is false anyway), but not to restart the vm?
> >>>>>>
> >>>>>> How can I disable notifications? I'm planning to write a Nagios plugin
> >>>>>> which parses the output of hosted-engine --vm-status, and only Nagios
> >>>>>> should notify me, not the hosted-engine script.
> >>>>>>
> >>>>>> Is it possible or planned to make the whole ha feature optional? I
> >>>>>> really really really hate cluster software as it causes more trouble
> >>>>>> than standalone machines, and in my case the hosted-engine ha feature
> >>>>>> really causes trouble (I haven't had a hardware or network outage
> >>>>>> yet, only issues with the hosted-engine ha agent). I don't need any ha
> >>>>>> feature for the hosted engine. I just want to run the engine
> >>>>>> virtualized on oVirt, and if the engine vm fails (e.g. because of
> >>>>>> issues with a host) I'll restart it on another node.
> >>>>>
> >>>>> Hi, you can:
> >>>>> 1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and tweak
> >>>>>    the logger as you like
> >>>>> 2. or kill the ovirt-ha-broker & ovirt-ha-agent services
> >>>>
> >>>> Thanks for the information.
> >>>> So the engine is able to run when ovirt-ha-broker and ovirt-ha-agent
> >>>> aren't running?
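
Since the Nagios plugin mentioned above only needs to parse the output of
hosted-engine --vm-status, a minimal sketch of such a check could look like
the one below. It assumes a healthy engine reports "'health': 'good'" in that
output; only the 'bad' case is shown in this thread, so verify the exact
string on a healthy host first:

#!/bin/sh
# Minimal hosted-engine check for Nagios: OK if any host reports the engine
# as healthy, CRITICAL otherwise.
OUT=$(hosted-engine --vm-status 2>&1)
if echo "$OUT" | grep -q "'health': 'good'"; then
    echo "OK - hosted engine reported healthy"
    exit 0
else
    echo "CRITICAL - hosted engine not reported healthy"
    exit 2
fi

As for suppressing the HA actions themselves, putting the setup into global
maintenance with "hosted-engine --set-maintenance --mode=global" (and
--mode=none to leave it again) is the safer route suggested earlier in this
thread, rather than stopping ovirt-ha-agent and ovirt-ha-broker.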
> >>>>
> >>>>
> >>>> Regards,
> >>>> René
> >>>>
> >>>>>
> >>>>> --Jirka
> >>>>>>
> >>>>>> Thanks,
> >>>>>> René
> >>>>>>
> >>>>>
> >>>>
> >>
>
_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users

