Same problem here: ovirt-ha-broker is using 400% CPU and is defunct. I can't kill it, even with -9.
2014-04-23 13:55 GMT+02:00 Martin Sivak <[email protected]>:
> Hi,
>
> > Isn't this file created when hosted engine is started?
>
> The file is created by the setup script. If it got lost, then something
> bad probably happened in your NFS or Gluster storage.
>
> > Or how can I create this file manually?
>
> I can give you an experimental treatment for this. We do not have any
> official way, as this is something that should never happen :)
>
> !! But before you do that, make sure you do not have any nodes running
> properly. This will destroy and reinitialize the lockspace database for
> the whole hosted-engine environment (which you apparently lack, but..). !!
>
> You have to create the ha_agent/hosted-engine.lockspace file with the
> expected size (1 MB) and then tell sanlock to initialize it as a
> lockspace using:
>
> # python
> >>> import sanlock
> >>> sanlock.write_lockspace(lockspace="hosted-engine",
> ...     path="/rhev/data-center/mnt/<nfs>/<hosted engine storage domain>/ha_agent/hosted-engine.lockspace",
> ...     offset=0)
> >>>
>
> Then try starting the services (both broker and agent) again.
>
> --
> Martin Sivák
> [email protected]
> Red Hat Czech
> RHEV-M SLA / Brno, CZ
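Putting Martin's recipe into one script, a minimal sketch of the experimental
recovery above. The mount point and storage domain UUID are placeholders, and
the vdsm:kvm ownership and rw-rw---- mode are assumptions taken from the
ha_agent directory listing quoted further down:

    # Experimental lockspace recovery -- only with no nodes running properly.
    import grp
    import os
    import pwd
    import sanlock

    # Placeholder path: substitute the real NFS mount and storage domain UUID.
    PATH = ("/rhev/data-center/mnt/<nfs>/<hosted engine storage domain>"
            "/ha_agent/hosted-engine.lockspace")

    # Create the lockspace file with the expected size (1 MB of zeros).
    with open(PATH, "wb") as f:
        f.write("\0" * 1048576)

    # Assumed ownership/mode, matching the listing below (vdsm:kvm, rw-rw----).
    os.chown(PATH, pwd.getpwnam("vdsm").pw_uid, grp.getgrnam("kvm").gr_gid)
    os.chmod(PATH, 0o660)

    # Initialize it as a sanlock lockspace, as in Martin's interactive session.
    sanlock.write_lockspace(lockspace="hosted-engine", path=PATH, offset=0)

Then start ovirt-ha-broker and ovirt-ha-agent again, as Martin says.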
> ----- Original Message -----
> > On 04/23/2014 11:08 AM, Martin Sivak wrote:
> > > Hi René,
> > >
> > > >>>> libvirtError: Failed to acquire lock: No space left on device
> > >
> > > >>>> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
> > > >>>> lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
> > >
> > > Can you please check the contents of
> > > /rhev/data-center/<your nfs mount>/<nfs domain uuid>/ha_agent/?
> > >
> > > This is how it should look:
> > >
> > > [root@dev-03 ~]# ls -al /rhev/data-center/mnt/euryale\:_home_ovirt_he/e16de6a2-53f5-4ab3-95a3-255d08398824/ha_agent/
> > > total 2036
> > > drwxr-x---. 2 vdsm kvm    4096 Mar 19 18:46 .
> > > drwxr-xr-x. 6 vdsm kvm    4096 Mar 19 18:46 ..
> > > -rw-rw----. 1 vdsm kvm 1048576 Apr 23 11:05 hosted-engine.lockspace
> > > -rw-rw----. 1 vdsm kvm 1028096 Mar 19 18:46 hosted-engine.metadata
> > >
> > > The errors seem to indicate that you somehow lost the lockspace file.
> >
> > True :)
> > Isn't this file created when hosted engine is started? Or how can I
> > create this file manually?
> >
> > > --
> > > Martin Sivák
> > > [email protected]
> > > Red Hat Czech
> > > RHEV-M SLA / Brno, CZ
> > >
> > > ----- Original Message -----
> > >> On 04/23/2014 12:28 AM, Doron Fediuck wrote:
> > >>> Hi René,
> > >>> any idea what closed your ovirtmgmt bridge?
> > >>> As long as it is down, vdsm may have issues starting up properly,
> > >>> and this is why you see the complaints about the rpc server.
> > >>>
> > >>> Can you try manually fixing the network part first and then
> > >>> restarting vdsm?
> > >>> Once vdsm is happy, the hosted engine VM will start.
> > >>
> > >> Thanks for your feedback, Doron.
> > >>
> > >> My ovirtmgmt bridge seems to be up, or isn't it?
> > >> # brctl show ovirtmgmt
> > >> bridge name     bridge id           STP enabled   interfaces
> > >> ovirtmgmt       8000.0025907587c2   no            eth0.200
> > >>
> > >> # ip a s ovirtmgmt
> > >> 7: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
> > >> state UNKNOWN
> > >>     link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
> > >>     inet 10.0.200.102/24 brd 10.0.200.255 scope global ovirtmgmt
> > >>     inet6 fe80::225:90ff:fe75:87c2/64 scope link
> > >>        valid_lft forever preferred_lft forever
> > >>
> > >> # ip a s eth0.200
> > >> 6: eth0.200@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> > >> noqueue state UP
> > >>     link/ether 00:25:90:75:87:c2 brd ff:ff:ff:ff:ff:ff
> > >>     inet6 fe80::225:90ff:fe75:87c2/64 scope link
> > >>        valid_lft forever preferred_lft forever
> > >>
> > >> I tried the following yesterday: I copied the virtual disk from
> > >> GlusterFS storage to the local disk of the host and created a new vm
> > >> with virt-manager that uses this disk. I could reach my engine over
> > >> the ovirtmgmt bridge, so the bridge must be working.
> > >>
> > >> I also started libvirtd with option -v, and I saw the following in
> > >> libvirtd.log when trying to start the ovirt engine:
> > >> 2014-04-22 14:18:25.432+0000: 8901: debug : virCommandRunAsync:2250 :
> > >> Command result 0, with PID 11491
> > >> 2014-04-22 14:18:25.478+0000: 8901: debug : virCommandRun:2045 : Result
> > >> exit status 255, stdout: '' stderr: 'iptables v1.4.7: goto 'FO-vnet0' is
> > >> not a chain
> > >>
> > >> So it could be that something is broken in my hosted-engine network.
> > >> Do you have any clue how I can troubleshoot this?
> > >>
> > >> Thanks,
> > >> René
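One way to confirm that iptables symptom, a small diagnostic sketch (nothing
oVirt-specific is assumed; it simply checks whether the FO-vnet0 chain the
libvirtd log complains about exists, written for the Python 2 shipped on the
host):

    # Check whether the filter chain named in the libvirtd error exists.
    import subprocess

    proc = subprocess.Popen(["iptables", "-S"], stdout=subprocess.PIPE)
    rules = proc.communicate()[0]
    chains = [line.split()[1] for line in rules.splitlines()
              if line.startswith("-N")]
    if "FO-vnet0" in chains:
        print("FO-vnet0 chain exists")
    else:
        print("FO-vnet0 chain is missing -- matches the libvirtd error")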
> > >>>
> > >>> ----- Original Message -----
> > >>>> From: "René Koch" <[email protected]>
> > >>>> To: "Martin Sivak" <[email protected]>
> > >>>> Cc: [email protected]
> > >>>> Sent: Tuesday, April 22, 2014 1:46:38 PM
> > >>>> Subject: Re: [ovirt-users] hosted engine health check issues
> > >>>>
> > >>>> Hi,
> > >>>>
> > >>>> I rebooted one of my ovirt hosts today, and the result is that I
> > >>>> can't start hosted-engine anymore.
> > >>>>
> > >>>> ovirt-ha-agent isn't running because the lockspace file is missing
> > >>>> (sanlock complains about it).
> > >>>> So I tried to start hosted-engine with --vm-start and I get the
> > >>>> following errors:
> > >>>>
> > >>>> ==> /var/log/sanlock.log <==
> > >>>> 2014-04-22 12:38:17+0200 654 [3093]: r2 cmd_acquire 2,9,5733 invalid
> > >>>> lockspace found -1 failed 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
> > >>>>
> > >>>> ==> /var/log/messages <==
> > >>>> Apr 22 12:38:17 ovirt-host02 sanlock[3079]: 2014-04-22 12:38:17+0200
> > >>>> 654 [3093]: r2 cmd_acquire 2,9,5733 invalid lockspace found -1 failed
> > >>>> 0 name 2851af27-8744-445d-9fb1-a0d083c8dc82
> > >>>> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
> > >>>> disabled state
> > >>>> Apr 22 12:38:17 ovirt-host02 kernel: device vnet0 left promiscuous mode
> > >>>> Apr 22 12:38:17 ovirt-host02 kernel: ovirtmgmt: port 2(vnet0) entering
> > >>>> disabled state
> > >>>>
> > >>>> ==> /var/log/vdsm/vdsm.log <==
> > >>>> Thread-21::DEBUG::2014-04-22
> > >>>> 12:38:17,563::libvirtconnection::124::root::(wrapper) Unknown
> > >>>> libvirterror: ecode: 38 edom: 42 level: 2 message: Failed to acquire
> > >>>> lock: No space left on device
> > >>>> Thread-21::DEBUG::2014-04-22
> > >>>> 12:38:17,563::vm::2263::vm.Vm::(_startUnderlyingVm)
> > >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::_ongoingCreations released
> > >>>> Thread-21::ERROR::2014-04-22
> > >>>> 12:38:17,564::vm::2289::vm.Vm::(_startUnderlyingVm)
> > >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process failed
> > >>>> Traceback (most recent call last):
> > >>>>   File "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm
> > >>>>     self._run()
> > >>>>   File "/usr/share/vdsm/vm.py", line 3170, in _run
> > >>>>     self._connection.createXML(domxml, flags),
> > >>>>   File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py",
> > >>>>   line 92, in wrapper
> > >>>>     ret = f(*args, **kwargs)
> > >>>>   File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in
> > >>>>   createXML
> > >>>>     if ret is None:raise libvirtError('virDomainCreateXML() failed',
> > >>>>     conn=self)
> > >>>> libvirtError: Failed to acquire lock: No space left on device
> > >>>>
> > >>>> ==> /var/log/messages <==
> > >>>> Apr 22 12:38:17 ovirt-host02 vdsm vm.Vm ERROR
> > >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::The vm start process
> > >>>> failed#012Traceback (most recent call last):#012  File
> > >>>> "/usr/share/vdsm/vm.py", line 2249, in _startUnderlyingVm#012
> > >>>> self._run()#012  File "/usr/share/vdsm/vm.py", line 3170, in _run#012
> > >>>> self._connection.createXML(domxml, flags),#012  File
> > >>>> "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92,
> > >>>> in wrapper#012    ret = f(*args, **kwargs)#012  File
> > >>>> "/usr/lib64/python2.6/site-packages/libvirt.py", line 2665, in
> > >>>> createXML#012    if ret is None:raise libvirtError('virDomainCreateXML()
> > >>>> failed', conn=self)#012libvirtError: Failed to acquire lock: No space
> > >>>> left on device
> > >>>>
> > >>>> ==> /var/log/vdsm/vdsm.log <==
> > >>>> Thread-21::DEBUG::2014-04-22
> > >>>> 12:38:17,569::vm::2731::vm.Vm::(setDownStatus)
> > >>>> vmId=`f26dd37e-13b5-430c-b2f2-ecd098b82a91`::Changed state to Down:
> > >>>> Failed to acquire lock: No space left on device
> > >>>>
> > >>>> "No space left on device" is nonsense, as there is enough space. (I
> > >>>> had this issue last time as well, where I had to patch machine.py, but
> > >>>> this file is now Python 2.6.6 compatible.)
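Since the error above comes from sanlock rather than from the filesystem, a
quick sanity check is whether the lockspace file exists at all and has the
expected 1 MB size (1048576 bytes, per Martin's listing; the path is the same
placeholder as before):

    import os

    # Placeholder path, as above.
    PATH = ("/rhev/data-center/mnt/<nfs>/<hosted engine storage domain>"
            "/ha_agent/hosted-engine.lockspace")

    if not os.path.exists(PATH):
        print("lockspace file is missing -- sanlock cannot acquire a lease")
    elif os.stat(PATH).st_size != 1048576:
        print("unexpected lockspace size: %d" % os.stat(PATH).st_size)
    else:
        print("lockspace file looks sane")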
> > >>>>
> > >>>> Any idea what prevents hosted-engine from starting?
> > >>>> ovirt-ha-broker, vdsmd and sanlock are running, btw.
> > >>>>
> > >>>> Btw, I can see in the log that the json rpc server module is missing -
> > >>>> which package is required for CentOS 6.5?
> > >>>> Apr 22 12:37:14 ovirt-host02 vdsm vds WARNING Unable to load the json
> > >>>> rpc server module. Please make sure it is installed.
> > >>>>
> > >>>> Thanks,
> > >>>> René
> > >>>>
> > >>>> On 04/17/2014 10:02 AM, Martin Sivak wrote:
> > >>>>> Hi,
> > >>>>>
> > >>>>>>>> How can I disable notifications?
> > >>>>>
> > >>>>> The notification is configured in
> > >>>>> /etc/ovirt-hosted-engine-ha/broker.conf, section notification.
> > >>>>> The email is sent when the key state_transition exists and the string
> > >>>>> OldState-NewState contains the (case insensitive) regexp from the
> > >>>>> value.
> > >>>>>
> > >>>>>>>> Is it intended to send out these messages and detect that ovirt
> > >>>>>>>> engine is down (which is false anyway), but not to restart the vm?
> > >>>>>
> > >>>>> Forget about emails for now and check
> > >>>>> /var/log/ovirt-hosted-engine-ha/agent.log and broker.log (and attach
> > >>>>> them as well, btw).
> > >>>>>
> > >>>>>>>> oVirt hosts think that hosted engine is down because it seems that
> > >>>>>>>> hosts can't write to hosted-engine.lockspace due to glusterfs
> > >>>>>>>> issues (or at least I think so).
> > >>>>>
> > >>>>> Do the hosts think so, or can they really not write there? The
> > >>>>> lockspace is managed by sanlock and our HA daemons do not touch it at
> > >>>>> all. We only ask sanlock to make sure we have a unique server id.
> > >>>>>
> > >>>>>>>> Is it possible or planned to make the whole ha feature optional?
> > >>>>>
> > >>>>> Well, the system won't perform any automatic actions if you put the
> > >>>>> hosted engine into global maintenance and only start/stop/migrate the
> > >>>>> VM manually. I would discourage you from stopping agent/broker,
> > >>>>> because the engine itself has some logic based on the reporting.
> > >>>>>
> > >>>>> Regards
> > >>>>>
> > >>>>> --
> > >>>>> Martin Sivák
> > >>>>> [email protected]
> > >>>>> Red Hat Czech
> > >>>>> RHEV-M SLA / Brno, CZ
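To illustrate the matching rule Martin describes, a small sketch (this only
models the described behavior, a mail going out when the configured regexp
matches the OldState-NewState string case-insensitively; it is not the
broker's actual code):

    import re

    def notification_wanted(configured_value, old_state, new_state):
        # Per the description above: send mail when the configured regexp
        # matches the "OldState-NewState" string, ignoring case.
        transition = "%s-%s" % (old_state, new_state)
        return re.search(configured_value, transition, re.IGNORECASE) is not None

    print(notification_wanted("engine", "EngineDown", "EngineStart"))       # True
    print(notification_wanted("maintenance", "EngineDown", "EngineStart"))  # False

So a value that matches nothing, or removing the state_transition key
entirely, would silence the mails, which is what René is after.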
> > >>>>>
> > >>>>> ----- Original Message -----
> > >>>>>> On 04/15/2014 04:53 PM, Jiri Moskovcak wrote:
> > >>>>>>> On 04/14/2014 10:50 AM, René Koch wrote:
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> I have some issues with hosted engine status.
> > >>>>>>>>
> > >>>>>>>> oVirt hosts think that hosted engine is down because it seems that
> > >>>>>>>> hosts can't write to hosted-engine.lockspace due to glusterfs
> > >>>>>>>> issues (or at least I think so).
> > >>>>>>>>
> > >>>>>>>> Here's the output of vm-status:
> > >>>>>>>>
> > >>>>>>>> # hosted-engine --vm-status
> > >>>>>>>>
> > >>>>>>>> --== Host 1 status ==--
> > >>>>>>>>
> > >>>>>>>> Status up-to-date            : False
> > >>>>>>>> Hostname                     : 10.0.200.102
> > >>>>>>>> Host ID                      : 1
> > >>>>>>>> Engine status                : unknown stale-data
> > >>>>>>>> Score                        : 2400
> > >>>>>>>> Local maintenance            : False
> > >>>>>>>> Host timestamp               : 1397035677
> > >>>>>>>> Extra metadata (valid at timestamp):
> > >>>>>>>>     metadata_parse_version=1
> > >>>>>>>>     metadata_feature_version=1
> > >>>>>>>>     timestamp=1397035677 (Wed Apr  9 11:27:57 2014)
> > >>>>>>>>     host-id=1
> > >>>>>>>>     score=2400
> > >>>>>>>>     maintenance=False
> > >>>>>>>>     state=EngineUp
> > >>>>>>>>
> > >>>>>>>> --== Host 2 status ==--
> > >>>>>>>>
> > >>>>>>>> Status up-to-date            : True
> > >>>>>>>> Hostname                     : 10.0.200.101
> > >>>>>>>> Host ID                      : 2
> > >>>>>>>> Engine status                : {'reason': 'vm not running on this
> > >>>>>>>> host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}
> > >>>>>>>> Score                        : 0
> > >>>>>>>> Local maintenance            : False
> > >>>>>>>> Host timestamp               : 1397464031
> > >>>>>>>> Extra metadata (valid at timestamp):
> > >>>>>>>>     metadata_parse_version=1
> > >>>>>>>>     metadata_feature_version=1
> > >>>>>>>>     timestamp=1397464031 (Mon Apr 14 10:27:11 2014)
> > >>>>>>>>     host-id=2
> > >>>>>>>>     score=0
> > >>>>>>>>     maintenance=False
> > >>>>>>>>     state=EngineUnexpectedlyDown
> > >>>>>>>>     timeout=Mon Apr 14 10:35:05 2014
> > >>>>>>>>
> > >>>>>>>> oVirt engine is sending me 2 emails every 10 minutes with the
> > >>>>>>>> following subjects:
> > >>>>>>>> - ovirt-hosted-engine state transition EngineDown-EngineStart
> > >>>>>>>> - ovirt-hosted-engine state transition EngineStart-EngineUp
> > >>>>>>>>
> > >>>>>>>> In oVirt webadmin I can see the following message:
> > >>>>>>>> VM HostedEngine is down. Exit message: internal error Failed to
> > >>>>>>>> acquire lock: error -243.
> > >>>>>>>>
> > >>>>>>>> These messages are really annoying, as oVirt isn't doing anything
> > >>>>>>>> with hosted engine - I have an uptime of 9 days in my engine vm.
> > >>>>>>>>
> > >>>>>>>> So my questions are now:
> > >>>>>>>> Is it intended to send out these messages and detect that ovirt
> > >>>>>>>> engine is down (which is false anyway), but not to restart the vm?
> > >>>>>>>>
> > >>>>>>>> How can I disable notifications? I'm planning to write a Nagios
> > >>>>>>>> plugin which parses the output of hosted-engine --vm-status, and
> > >>>>>>>> only Nagios should notify me, not the hosted-engine script.
> > >>>>>>>>
> > >>>>>>>> Is it possible or planned to make the whole ha feature optional? I
> > >>>>>>>> really, really, really hate cluster software, as it causes more
> > >>>>>>>> trouble than standalone machines, and in my case the hosted-engine
> > >>>>>>>> ha feature really causes trouble (I haven't had a hardware or
> > >>>>>>>> network outage yet, only issues with the hosted-engine ha agent).
> > >>>>>>>> I don't need any ha feature for hosted engine. I just want to run
> > >>>>>>>> the engine virtualized on oVirt, and if the engine vm fails (e.g.
> > >>>>>>>> because of issues with a host) I'll restart it on another node.
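For the Nagios plugin René mentions, a minimal sketch of the parsing approach
(the 'health' values are taken from the Engine status dict in the output
above; 'good' as the healthy value is an assumption and should be verified
against a healthy setup):

    #!/usr/bin/env python
    # Minimal Nagios-style check around `hosted-engine --vm-status`.
    import re
    import subprocess
    import sys

    OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

    proc = subprocess.Popen(["hosted-engine", "--vm-status"],
                            stdout=subprocess.PIPE)
    output = proc.communicate()[0]
    if proc.returncode != 0:
        print("UNKNOWN: hosted-engine --vm-status failed")
        sys.exit(UNKNOWN)

    # 'health': 'bad' appears in the output above; 'good' is assumed.
    if re.search(r"'health':\s*'good'", output):
        print("OK: hosted engine reports good health")
        sys.exit(OK)
    if re.search(r"'health':\s*'bad'", output):
        print("CRITICAL: hosted engine reports bad health")
        sys.exit(CRITICAL)
    print("WARNING: no engine health field found (stale data?)")
    sys.exit(WARNING)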
> > >>>>>>> Hi, you can:
> > >>>>>>> 1. edit /etc/ovirt-hosted-engine-ha/{agent,broker}-log.conf and
> > >>>>>>>    tweak the logger as you like
> > >>>>>>> 2. or kill the ovirt-ha-broker & ovirt-ha-agent services
> > >>>>>>
> > >>>>>> Thanks for the information.
> > >>>>>> So the engine is able to run when ovirt-ha-broker and ovirt-ha-agent
> > >>>>>> aren't running?
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>> René
> > >>>>>>
> > >>>>>>> --Jirka
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> René