[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent
>> >> 1.2 All bricks healed (gluster volume heal data info summary) and no
>> >> split-brain
>> >
>> > gluster volume heal data info
>> >
>> > Brick node-msk-gluster203:/opt/gluster/data
>> > Status: Connected
>> > Number of entries: 0
>> >
>> > Brick node-msk-gluster205:/opt/gluster/data
>> > Status: Connected
>> > Number of entries: 7
>> >
>> > Brick node-msk-gluster201:/opt/gluster/data
>> > Status: Connected
>> > Number of entries: 7
>>
>> Data needs healing.
>> Run: cluster volume heal data full
>
> This does not work.

Yeah, that's because my phone corrects 'gluster' to 'cluster'.

Usually the gluster daemons detect the need for a heal, but with 'gluster volume heal data full && sleep 5 && gluster volume heal data info summary && sleep 5 && gluster volume heal data info summary' you can force syncing and get the result.

Let's see what happens with DNS.

Best Regards,
Strahil Nikolov
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/BR4Y4X5AGRUWGYOSKNQPRR6XHCOMQXZG/
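The force-heal-then-recheck chain above can be wrapped in a small polling loop. A minimal sketch, not Strahil's actual procedure: `heal_entries` is stubbed here so the script runs anywhere (on a real node it would parse `gluster volume heal <vol> info summary`), and `STUB_ENTRIES`/`HEAL_POLL_SLEEP` are illustrative knobs, not gluster options:

```shell
#!/usr/bin/env bash
# Poll heal status until no entries remain, or give up after N tries.
# heal_entries is a stub for illustration; on a real node it would run:
#   gluster volume heal "$1" info summary | awk '/Number of entries:/ {s+=$NF} END {print s}'
heal_entries() {
    echo "${STUB_ENTRIES:-0}"
}

wait_for_heal() {
    local vol=$1 tries=${2:-10} n=-1
    # On a real node, kick off a full heal first:
    #   gluster volume heal "$vol" full
    for ((i = 0; i < tries; i++)); do
        n=$(heal_entries "$vol")
        if [ "$n" -eq 0 ]; then
            echo "volume $vol healed"
            return 0
        fi
        sleep "${HEAL_POLL_SLEEP:-5}"
    done
    echo "volume $vol still has $n entries pending"
    return 1
}

STUB_ENTRIES=0 HEAL_POLL_SLEEP=0 wait_for_heal data
```

With the stub reporting zero pending entries, the loop exits on the first pass; against a real volume it would keep polling every 5 seconds.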
[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent
Thx for your help, Strahil!

Hmmm, I see DNS resolution fails for a hostname without FQDN. I'll try to fix it.

19.03.2019, 09:43, "Strahil":

Hi Alexei,

>> 1.2 All bricks healed (gluster volume heal data info summary) and no split-brain
>
> gluster volume heal data info
>
> Brick node-msk-gluster203:/opt/gluster/data
> Status: Connected
> Number of entries: 0
>
> Brick node-msk-gluster205:/opt/gluster/data
> 78043-0943-48f8-a4fe-9b23e2ba3404>
> 7-1746-471b-a49d-8d824db9fd72>
> 8-4370-46ce-b976-ac22d2f680ee>
> 9142-7843fd260c70>
> Status: Connected
> Number of entries: 7
>
> Brick node-msk-gluster201:/opt/gluster/data
> 78043-0943-48f8-a4fe-9b23e2ba3404>
> 7-1746-471b-a49d-8d824db9fd72>
> 8-4370-46ce-b976-ac22d2f680ee>
> 9142-7843fd260c70>
> Status: Connected
> Number of entries: 7

Data needs healing.
Run: cluster volume heal data full

This does not work.

If it still doesn't heal (check in 5 min), go to /rhev/data-center/mnt/glusterSD/msk-gluster-facility._data and run 'find . -exec stat {} \;' without the quotes.

Done. https://yadi.sk/i/nXu0RV646YpD6Q

As I have understood you, the oVirt Hosted Engine is running and can be started on all nodes except 1.

Ovirt Hosted Engine works and can be run on all nodes with no exceptions. The Hosted Engine volume /rhev/data-center/mnt/glusterSD/msk-gluster-facility._engine can be mounted by all nodes without problems.

>> 2. Go to the problematic host and check the mount point is there
>
> No mount point on problematic node /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data
> If I create a mount point manually, it is deleted after the node is activated.
>
> Other nodes can mount this volume without problems.
Only this node has connection problems after the update.

> Here is a part of the log at the time of activation of the node:
>
> vdsm log
>
> 2019-03-18 16:46:00,548+0300 INFO (jsonrpc/5) [vds] Setting Hosted Engine HA local maintenance to False (API:1630)
> 2019-03-18 16:46:00,549+0300 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.setHaMaintenanceMode succeeded in 0.00 seconds (__init__:573)
> 2019-03-18 16:46:00,581+0300 INFO (jsonrpc/7) [vdsm.api] START connectStorageServer(domType=7, spUUID=u'5a5cca91-01f8-01af-0297-025f', conList=[{u'id': u'5799806e-7969-45da-b17d-b47a63e6a8e4', u'connection': u'msk-gluster-facility.:/data', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'vfs_type': u'glusterfs', u'password': '', u'port': u''}], options=None) from=:::10.77.253.210,56630, flow_id=81524ed, task_id=5f353993-95de-480d-afea-d32dc94fd146 (api:46)
> 2019-03-18 16:46:00,621+0300 INFO (jsonrpc/7) [storage.StorageServer.MountConnection] Creating directory u'/rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data' (storageServer:167)
> 2019-03-18 16:46:00,622+0300 INFO (jsonrpc/7) [storage.fileUtils] Creating directory: /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data mode: None (fileUtils:197)
> 2019-03-18 16:46:00,622+0300 WARN (jsonrpc/7) [storage.StorageServer.MountConnection] gluster server u'msk-gluster-facility.' is not in bricks ['node-msk-gluster203', 'node-msk-gluster205', 'node-msk-gluster201'], possibly mounting duplicate servers (storageServer:317)

This seems very strange. As you have hidden the hostname, I'm not sure which one this is.
Check that DNS can be resolved from all hosts and that the hostname of this host is resolvable.

Name resolution works without problems.

dig msk-gluster-facility.

;; ANSWER SECTION:
msk-gluster-facility.. 1786 IN A 10.77.253.205 # <-- node-msk-gluster205.
msk-gluster-facility.. 1786 IN A 10.77.253.201 # <-- node-msk-gluster201.
msk-gluster-facility.. 1786 IN A 10.77.253.203 # <-- node-msk-gluster203.
;; Query time: 5 msec
;; SERVER: 10.77.16.155#53(10.77.16.155)
;; WHEN: Tue Mar 19 14:55:10 MSK 2019
;; MSG SIZE rcvd: 110

Also check if it is in the peer list.

msk-gluster-facility. is just an A-type record in DNS. It is used in the web UI for mounting gluster volumes and for gluster storage HA.

Try to manually mount the gluster volume:
mount -t glusterfs msk-gluster-facility.:/data /mnt

Well, the mount works from hypervisor node77-202, and it does not work from hypervisor node77-204 (the problematic node).

node77-204 /var/log/glusterfs/mnt.log

[2019-03-19 12:15:11.106226] I [MSGID: 100030] [glusterfsd.c:2511:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.12.15 (args: /usr/sbin/glusterfs --volfile-server=msk-gluster-facility. --volfile-id=/data /mnt)
[2019-03-19 12:15:11.109577] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2019-03-19 12:15:11.129652] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-03-19 12:15:11.135384] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2019-03-19
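Before retrying the manual mount, it is worth confirming that every name involved resolves from the problematic node. A minimal, self-contained sketch (the hostnames in the comment are the thread's; `localhost` is used in the live call only so the snippet runs anywhere):

```shell
#!/usr/bin/env bash
# Verify every hostname the mount will need resolves, before calling
# mount -t glusterfs. Prints OK/FAIL per name; non-zero exit on any failure.
check_resolution() {
    local failed=0 h
    for h in "$@"; do
        if getent hosts "$h" > /dev/null; then
            echo "OK   $h"
        else
            echo "FAIL $h"
            failed=1
        fi
    done
    return $failed
}

# On the problematic node one would check the round-robin name and each brick host:
#   check_resolution msk-gluster-facility node-msk-gluster201 node-msk-gluster203 node-msk-gluster205
check_resolution localhost
```

If any name fails here, the glusterfs client's volfile fetch will fail the same way, which matches the "DNS resolution failed in hostname without FQDN" observation above.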
[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent
Hi Alexei,

>> 1.2 All bricks healed (gluster volume heal data info summary) and no
>> split-brain
>
> gluster volume heal data info
>
> Brick node-msk-gluster203:/opt/gluster/data
> Status: Connected
> Number of entries: 0
>
> Brick node-msk-gluster205:/opt/gluster/data
> Status: Connected
> Number of entries: 7
>
> Brick node-msk-gluster201:/opt/gluster/data
> Status: Connected
> Number of entries: 7

Data needs healing.
Run: cluster volume heal data full

If it still doesn't heal (check in 5 min), go to /rhev/data-center/mnt/glusterSD/msk-gluster-facility._data and run 'find . -exec stat {} \;' without the quotes.

As I have understood you, the oVirt Hosted Engine is running and can be started on all nodes except 1.

>> 2. Go to the problematic host and check the mount point is there
>
> No mount point on problematic node /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data
> If I create a mount point manually, it is deleted after the node is activated.
>
> Other nodes can mount this volume without problems. Only this node has connection problems after the update.
> Here is a part of the log at the time of activation of the node:
>
> vdsm log
>
> 2019-03-18 16:46:00,548+0300 INFO (jsonrpc/5) [vds] Setting Hosted Engine HA local maintenance to False (API:1630)
> 2019-03-18 16:46:00,549+0300 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.setHaMaintenanceMode succeeded in 0.00 seconds (__init__:573)
> 2019-03-18 16:46:00,581+0300 INFO (jsonrpc/7) [vdsm.api] START connectStorageServer(domType=7, spUUID=u'5a5cca91-01f8-01af-0297-025f', conList=[{u'id': u'5799806e-7969-45da-b17d-b47a63e6a8e4', u'connection': u'msk-gluster-facility.:/data', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'vfs_type': u'glusterfs', u'password': '', u'port': u''}], options=None) from=:::10.77.253.210,56630, flow_id=81524ed, task_id=5f353993-95de-480d-afea-d32dc94fd146 (api:46)
> 2019-03-18 16:46:00,621+0300 INFO (jsonrpc/7) [storage.StorageServer.MountConnection] Creating directory u'/rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data' (storageServer:167)
> 2019-03-18 16:46:00,622+0300 INFO (jsonrpc/7) [storage.fileUtils] Creating directory: /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data mode: None (fileUtils:197)
> 2019-03-18 16:46:00,622+0300 WARN (jsonrpc/7) [storage.StorageServer.MountConnection] gluster server u'msk-gluster-facility.' is not in bricks ['node-msk-gluster203', 'node-msk-gluster205', 'node-msk-gluster201'], possibly mounting duplicate servers (storageServer:317)

This seems very strange. As you have hidden the hostname, I'm not sure which one this is.
Check that DNS can be resolved from all hosts and that the hostname of this host is resolvable. Also check if it is in the peer list.

Try to manually mount the gluster volume:

mount -t glusterfs msk-gluster-facility.:/data /mnt

Is this a second FQDN/IP of this server? If so, gluster accepts that via:

gluster peer probe IP2

>> 2.1. Check permissions (should be vdsm:kvm) and fix with chown -R if needed
>> 2.2. Check the OVF_STORE from the logs that it exists
>
> How can I do this?

Go to /rhev/data-center/mnt/glusterSD/host_engine and use find inside the domain UUID for files that are not owned by vdsm:kvm. I usually run 'chown -R vdsm:kvm 823xx---zzz' and it will fix any misconfiguration.

Best Regards,
Strahil Nikolov
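The ownership check Strahil describes can be sketched as a small helper. This is an illustrative sketch, not his exact command: `find_wrong_owner` is a made-up name, and the demo runs against a temp directory owned by the current user because the `vdsm` user only exists on an oVirt host:

```shell
#!/usr/bin/env bash
# List files under a storage-domain directory whose owner or group differs
# from the expected pair (vdsm:kvm on an oVirt host); anything printed
# would be a candidate for chown -R vdsm:kvm.
find_wrong_owner() {
    local dir=$1 user=$2 group=$3
    find "$dir" \( -not -user "$user" -o -not -group "$group" \) -print
}

# Self-contained demo on a temp dir owned by the current user; on a real host:
#   find_wrong_owner /rhev/data-center/mnt/glusterSD/<host>:_engine/<DOMAIN-UUID> vdsm kvm
demo=$(mktemp -d)
touch "$demo/disk-image"
wrong=$(find_wrong_owner "$demo" "$(id -un)" "$(id -gn)")
if [ -z "$wrong" ]; then
    echo "ownership consistent"
else
    printf 'needs chown:\n%s\n' "$wrong"
fi
rm -rf "$demo"
```

Listing first and chowning second is slightly safer than a blind `chown -R`, since it shows exactly which files were touched by the wrong user.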
[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent
Thx for the answer!

18.03.2019, 14:52, "Strahil Nikolov":

Hi Alexei,

In order to debug it check the following:

1. Check gluster:
1.1 All bricks up?

All peers up. Gluster version is 3.12.15.

[root@node-msk-gluster203 ~]# gluster peer status
Number of Peers: 2

Hostname: node-msk-gluster205.
Uuid: 188d8444-3246-4696-a0a7-2872e0a01067
State: Peer in Cluster (Connected)

Hostname: node-msk-gluster201.
Uuid: 919b0a60-b9b7-4091-a60a-51d43b995285
State: Peer in Cluster (Connected)

All bricks on all gluster servers are UP.

Volume Name: data
Type: Replicate
Volume ID: 8fb43ba3-b2e9-4e33-b4c3-b0b03cd8cba3
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: node-msk-gluster203.:/opt/gluster/data
Brick2: node-msk-gluster205.:/opt/gluster/data
Brick3: node-msk-gluster201.:/opt/gluster/data (arbiter)

Volume Name: engine
Type: Replicate
Volume ID: 5dda8427-c69b-4b96-bcd6-eff3be2e0b5c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: node-msk-gluster205.:/opt/gluster/engine
Brick2: node-msk-gluster203.:/opt/gluster/engine
Brick3: node-msk-gluster201.:/opt/gluster/engine (arbiter)

1.2 All bricks healed (gluster volume heal data info summary) and no split-brain

gluster volume heal data info

Brick node-msk-gluster203:/opt/gluster/data
Status: Connected
Number of entries: 0

Brick node-msk-gluster205:/opt/gluster/data
Status: Connected
Number of entries: 7

Brick node-msk-gluster201:/opt/gluster/data
Status: Connected
Number of entries: 7

gluster volume heal engine info

Brick node-msk-gluster205.:/opt/gluster/engine
Status: Connected
Number of entries: 0

Brick node-msk-gluster203.:/opt/gluster/engine
Status: Connected
Number of entries: 0

Brick node-msk-gluster201.:/opt/gluster/engine
Status: Connected
Number of entries: 0

2. Go to the problematic host and check the mount point is there

No mount point on problematic node /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data
If I create a mount point manually, it is deleted after the node is activated.

Other nodes can mount this volume without problems. Only this node has connection problems after the update.

Here is a part of the log at the time of activation of the node:

vdsm log

2019-03-18 16:46:00,548+0300 INFO (jsonrpc/5) [vds] Setting Hosted Engine HA local maintenance to False (API:1630)
2019-03-18 16:46:00,549+0300 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.setHaMaintenanceMode succeeded in 0.00 seconds (__init__:573)
2019-03-18 16:46:00,581+0300 INFO (jsonrpc/7) [vdsm.api] START connectStorageServer(domType=7, spUUID=u'5a5cca91-01f8-01af-0297-025f', conList=[{u'id': u'5799806e-7969-45da-b17d-b47a63e6a8e4', u'connection': u'msk-gluster-facility.:/data', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'vfs_type': u'glusterfs', u'password': '', u'port': u''}], options=None) from=:::10.77.253.210,56630, flow_id=81524ed, task_id=5f353993-95de-480d-afea-d32dc94fd146 (api:46)
2019-03-18 16:46:00,621+0300 INFO (jsonrpc/7) [storage.StorageServer.MountConnection] Creating directory u'/rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data' (storageServer:167)
2019-03-18 16:46:00,622+0300 INFO (jsonrpc/7) [storage.fileUtils] Creating directory: /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data mode: None (fileUtils:197)
2019-03-18 16:46:00,622+0300 WARN (jsonrpc/7) [storage.StorageServer.MountConnection] gluster server u'msk-gluster-facility.' is not in bricks ['node-msk-gluster203', 'node-msk-gluster205', 'node-msk-gluster201'], possibly mounting duplicate servers (storageServer:317)
2019-03-18 16:46:00,622+0300 INFO (jsonrpc/7) [storage.Mount] mounting msk-gluster-facility.:/data at /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data (mount:204)
2019-03-18 16:46:00,809+0300 ERROR (jsonrpc/7) [storage.HSM] Could not connect to storageServer (hsm:2415)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2412, in connectStorageServer
    conObj.connect()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 179, in connect
    six.reraise(t, v, tb)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 171, in connect
    self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 207, in mount
    cgroup=cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 53, in <lambda>
    **kwargs)
  File "<string>", line 2, in mount
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
MountError: (1, ';Running scope as unit run-72797.scope.\nMount failed. Please check the log file for more details.\n')

2.1. Check permissions (should be vdsm:kvm) and fix with chown -R if needed
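Tracebacks like the one above are easier to spot if the ERROR records are filtered out of vdsm.log together with their continuation lines. A minimal sketch, with an inlined sample log so it runs anywhere; the helper name and the sample contents are illustrative, not a vdsm tool:

```shell
#!/usr/bin/env bash
# Print each ERROR record from a vdsm.log, including its traceback lines,
# stopping at the next timestamped record.
extract_errors() {
    awk '
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { printing = ($0 ~ / ERROR /) }
        printing
    ' "$1"
}

# Inlined sample so the sketch is self-contained:
sample=$(mktemp)
cat > "$sample" <<'EOF'
2019-03-18 16:46:00,622+0300 WARN (jsonrpc/7) [storage.Mount] mounting ...
2019-03-18 16:46:00,809+0300 ERROR (jsonrpc/7) [storage.HSM] Could not connect to storageServer (hsm:2415)
Traceback (most recent call last):
  File "hsm.py", line 2412, in connectStorageServer
2019-03-18 16:46:01,000+0300 INFO (jsonrpc/7) [vdsm.api] FINISH connectStorageServer
EOF
extract_errors "$sample"   # prints the ERROR record and its two traceback lines
rm -f "$sample"
```

On a host, `extract_errors /var/log/vdsm/vdsm.log` would cut the activation log down to just the failing records.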
[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent
Hi Alexei,

In order to debug it check the following:

1. Check gluster:
1.1 All bricks up?
1.2 All bricks healed (gluster volume heal data info summary) and no split-brain

2. Go to the problematic host and check the mount point is there
2.1. Check permissions (should be vdsm:kvm) and fix with chown -R if needed
2.2. Check the OVF_STORE from the logs that it exists
2.3. Check that vdsm can extract the file:
sudo -u vdsm tar -tvf /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data/DOMAIN-UUID/Volume-UUID/Image-ID

3. Configure a virsh alias, as it's quite helpful:
alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'

4. If the VM is running - go to the host and get the xml:
virsh dumpxml HostedEngine > /root/HostedEngine.xml
4.1. Get the network:
virsh net-dumpxml vdsm-ovirtmgmt > /root/vdsm-ovirtmgmt.xml
4.2 If not, here is mine:
[root@ovirt1 ~]# virsh net-dumpxml vdsm-ovirtmgmt
vdsm-ovirtmgmt
7ae538ce-d419-4dae-93b8-3a4d27700227
The UUID is not important, as my first recovery was with a different one.

5. If your Hosted Engine is down:
5.1 Remove the VM (if it exists anywhere) on all nodes:
virsh undefine HostedEngine
5.2 Verify that the nodes are in global maintenance:
hosted-engine --vm-status
5.3 Define the Engine on only 1 machine:
virsh define HostedEngine.xml
virsh net-define vdsm-ovirtmgmt.xml
virsh start HostedEngine

Note: if it complains about the storage - there is no link in /var/run/vdsm/storage/DOMAIN-UUID/Volume-UUID to your Volume-UUID. Here is how mine looks:

[root@ovirt1 808423f9-8a5c-40cd-bc9f-2568c85b8c74]# ll /var/run/vdsm/storage/808423f9-8a5c-40cd-bc9f-2568c85b8c74
total 24
lrwxrwxrwx. 1 vdsm kvm 139 Mar 17 07:42 2c74697a-8bd9-4472-8a98-bf624f3462d5 -> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/2c74697a-8bd9-4472-8a98-bf624f3462d5
lrwxrwxrwx. 1 vdsm kvm 139 Mar 17 07:45 3ec27d6d-921c-4348-b799-f50543b6f919 -> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/3ec27d6d-921c-4348-b799-f50543b6f919
lrwxrwxrwx. 1 vdsm kvm 139 Mar 17 08:28 441abdc8-6cb1-49a4-903f-a1ec0ed88429 -> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429
lrwxrwxrwx. 1 vdsm kvm 139 Mar 17 21:15 8ec7a465-151e-4ac3-92a7-965ecf854501 -> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/8ec7a465-151e-4ac3-92a7-965ecf854501
lrwxrwxrwx. 1 vdsm kvm 139 Mar 17 08:28 94ade632-6ecc-4901-8cec-8e39f3d69cb0 -> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0
lrwxrwxrwx. 1 vdsm kvm 139 Mar 17 07:42 fe62a281-51e9-4b23-87b3-2deb52357304 -> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/fe62a281-51e9-4b23-87b3-2deb52357304

Once you create your link, start it again.

6. Wait till OVF is fixed (takes more than the settings in the engine :) )

Good Luck!

Best Regards,
Strahil Nikolov

On Monday, 18 March 2019 at 12:57:30 GMT+2, Nikolaev Alexei wrote:

Hi all! I have a very similar problem after updating one of the two nodes to version 4.3.1. The node77-02 lost connection to the gluster volume named DATA, but not to the volume with the hosted engine.
node77-02 /var/log/messages

Mar 18 13:40:00 node77-02 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed scanning for OVF_STORE due to Command Volume.getInfo with args {'storagepoolID': '----', 'storagedomainID': '2ee71105-1810-46eb-9388-cc6caccf9fac', 'volumeID': u'224e4b80-2744-4d7f-bd9f-43eb8fe6cf11', 'imageID': u'43b75b50-cad4-411f-8f51-2e99e52f4c77'} failed:#012(code=201, message=Volume does not exist: (u'224e4b80-2744-4d7f-bd9f-43eb8fe6cf11',))
Mar 18 13:40:00 node77-02 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs

The HostedEngine VM works fine on all nodes. But node77-02 failed with an error in the web UI:

ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=5a5cca91-01f8-01af-0297-025f, msdUUID=7d5de684-58ff-4fbc-905d-3048fc55b2b1'

node77-02 vdsm.log

2019-03-18 13:51:46,287+0300 WARN (jsonrpc/7) [storage.StorageServer.MountConnection] gluster server u'msk-gluster-facility.' is not in bricks ['node-msk-gluster203', 'node-msk-gluster205', 'node-msk-gluster201'], possibly mounting duplicate servers (storageServer:317)
2019-03-18 13:51:46,287+0300 INFO (jsonrpc/7) [storage.Mount] mounting msk-gluster-facility.ipt.fsin.uis:/data at /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data (mount:204)
2019-03-18
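The missing /var/run/vdsm/storage link that Strahil's note above describes can be recreated with a small helper. This is a hedged sketch, not an oVirt tool: `ensure_volume_link` is a made-up name, the roots are parameterized, and the demo runs in temp directories so it is safe anywhere:

```shell
#!/usr/bin/env bash
# Recreate the /var/run/vdsm/storage/<DOMAIN-UUID>/<IMAGE-UUID> symlink that
# points at the image directory on the mounted storage domain.
ensure_volume_link() {
    local run_root=$1 mount_root=$2 domain=$3 image=$4
    mkdir -p "$run_root/$domain"
    ln -sfn "$mount_root/$domain/images/$image" "$run_root/$domain/$image"
}

# Demo with temporary directories; on a real host the roots would be
# /var/run/vdsm/storage and /rhev/data-center/mnt/glusterSD/<host>:_engine,
# and the UUIDs would come from the 'll' listing in the note above.
run_root=$(mktemp -d)
mount_root=$(mktemp -d)
domain=808423f9-8a5c-40cd-bc9f-2568c85b8c74
image=2c74697a-8bd9-4472-8a98-bf624f3462d5
mkdir -p "$mount_root/$domain/images/$image"
ensure_volume_link "$run_root" "$mount_root" "$domain" "$image"
readlink "$run_root/$domain/$image"
rm -rf "$run_root" "$mount_root"
```

`ln -sfn` makes the operation idempotent: rerunning it after a reboot (when /var/run is wiped) simply replaces the link.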
[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent
Hi all! I have a very similar problem after updating one of the two nodes to version 4.3.1. The node77-02 lost connection to the gluster volume named DATA, but not to the volume with the hosted engine.

node77-02 /var/log/messages

Mar 18 13:40:00 node77-02 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed scanning for OVF_STORE due to Command Volume.getInfo with args {'storagepoolID': '----', 'storagedomainID': '2ee71105-1810-46eb-9388-cc6caccf9fac', 'volumeID': u'224e4b80-2744-4d7f-bd9f-43eb8fe6cf11', 'imageID': u'43b75b50-cad4-411f-8f51-2e99e52f4c77'} failed:#012(code=201, message=Volume does not exist: (u'224e4b80-2744-4d7f-bd9f-43eb8fe6cf11',))
Mar 18 13:40:00 node77-02 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs

The HostedEngine VM works fine on all nodes. But node77-02 failed with an error in the web UI:

ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=5a5cca91-01f8-01af-0297-025f, msdUUID=7d5de684-58ff-4fbc-905d-3048fc55b2b1'

node77-02 vdsm.log

2019-03-18 13:51:46,287+0300 WARN (jsonrpc/7) [storage.StorageServer.MountConnection] gluster server u'msk-gluster-facility.' is not in bricks ['node-msk-gluster203', 'node-msk-gluster205', 'node-msk-gluster201'], possibly mounting duplicate servers (storageServer:317)
2019-03-18 13:51:46,287+0300 INFO (jsonrpc/7) [storage.Mount] mounting msk-gluster-facility.ipt.fsin.uis:/data at /rhev/data-center/mnt/glusterSD/msk-gluster-facility.:_data (mount:204)
2019-03-18 13:51:46,474+0300 ERROR (jsonrpc/7) [storage.HSM] Could not connect to storageServer (hsm:2415)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2412, in connectStorageServer
    conObj.connect()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 179, in connect
    six.reraise(t, v, tb)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 171, in connect
    self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 207, in mount
    cgroup=cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55, in __call__
    return callMethod()
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 53, in <lambda>
    **kwargs)
  File "<string>", line 2, in mount
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
MountError: (1, ';Running scope as unit run-10121.scope.\nMount failed. Please check the log file for more details.\n')

--

2019-03-18 13:51:46,830+0300 ERROR (jsonrpc/4) [storage.TaskManager.Task] (Task='fe81642e-2421-4169-a08b-51467e8f01fe') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in connectStoragePool
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1035, in connectStoragePool
    spUUID, hostID, msdUUID, masterVersion, domainsMap)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1097, in _connectStoragePool
    res = pool.connect(hostID, msdUUID, masterVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 700, in connect
    self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1274, in __rebuild
    self.setMasterDomain(msdUUID, masterVersion)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1495, in setMasterDomain
    raise se.StoragePoolMasterNotFound(self.spUUID, msdUUID)
StoragePoolMasterNotFound: Cannot find master domain: u'spUUID=5a5cca91-01f8-01af-0297-025f, msdUUID=7d5de684-58ff-4fbc-905d-3048fc55b2b1'

What is the best practice to recover from this problem?

15.03.2019, 13:47, "Strahil Nikolov":

On Fri, Mar 15, 2019 at 8:12 AM Strahil Nikolov wrote:

Ok, I have managed to recover again and no issues are detected this time. I guess this case is quite rare and nobody has experienced that.

>Hi,
>can you please explain how you fixed it?

I have set again to global maintenance, defined the HostedEngine from the old xml (taken from an old vdsm log), defined the network and powered it off. Set the OVF update period to 5 min, but it took several hours until the OVF_STORE were updated.
Once this happened I restarted the ovirt-ha-agent and ovirt-ha-broker on both nodes. Then I powered off the HostedEngine and undefined it from ovirt1. Then I set the maintenance to 'none' and the VM powered up on ovirt1. In order to
[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent
Hi Simone,

I have noticed that my Engine's root disk is 'vda', just as in standalone KVM. I have the feeling that was not the case before. Can someone check a default engine and post the output of lsblk?

Thanks in advance.

Best Regards,
Strahil Nikolov

On Mar 15, 2019 12:46, Strahil Nikolov wrote:
>
> On Fri, Mar 15, 2019 at 8:12 AM Strahil Nikolov wrote:
>>
>> Ok,
>>
>> I have managed to recover again and no issues are detected this time.
>> I guess this case is quite rare and nobody has experienced that.
>
> >Hi,
> >can you please explain how you fixed it?
>
> I have set again to global maintenance, defined the HostedEngine from the old xml (taken from an old vdsm log), defined the network and powered it off.
> Set the OVF update period to 5 min, but it took several hours until the OVF_STORE were updated. Once this happened I restarted the ovirt-ha-agent and ovirt-ha-broker on both nodes. Then I powered off the HostedEngine and undefined it from ovirt1.
>
> Then I set the maintenance to 'none' and the VM powered up on ovirt1.
> In order to test a failure, I removed the global maintenance and powered off the HostedEngine from itself (via ssh). It was brought back on the other node.
>
> In order to test failure of ovirt2, I set ovirt1 in local maintenance and removed it (mode 'none') and again shut down the VM via ssh, and it started again on ovirt1.
>
> It seems to be working, as I have later shut down the Engine several times and it managed to start without issues.
>
> I'm not sure this is related, but I had detected that ovirt2 was out-of-sync on the vdsm-ovirtmgmt network, but it got fixed easily via the UI.
>
> Best Regards,
> Strahil Nikolov
[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent
On Fri, Mar 15, 2019 at 8:12 AM Strahil Nikolov wrote:

Ok,

I have managed to recover again and no issues are detected this time. I guess this case is quite rare and nobody has experienced that.

>Hi,
>can you please explain how you fixed it?

I have set again to global maintenance, defined the HostedEngine from the old xml (taken from an old vdsm log), defined the network and powered it off. Set the OVF update period to 5 min, but it took several hours until the OVF_STORE were updated. Once this happened I restarted the ovirt-ha-agent and ovirt-ha-broker on both nodes. Then I powered off the HostedEngine and undefined it from ovirt1.

Then I set the maintenance to 'none' and the VM powered up on ovirt1.

In order to test a failure, I removed the global maintenance and powered off the HostedEngine from itself (via ssh). It was brought back on the other node.

In order to test failure of ovirt2, I set ovirt1 in local maintenance and removed it (mode 'none') and again shut down the VM via ssh, and it started again on ovirt1.

It seems to be working, as I have later shut down the Engine several times and it managed to start without issues.

I'm not sure this is related, but I had detected that ovirt2 was out-of-sync on the vdsm-ovirtmgmt network, but it got fixed easily via the UI.

Best Regards,
Strahil Nikolov
[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent
On Fri, Mar 15, 2019 at 8:12 AM Strahil Nikolov wrote:
> Ok,
>
> I have managed to recover again and no issues are detected this time.
> I guess this case is quite rare and nobody has experienced that.
>

Hi,

can you please explain how you fixed it?

> Best Regards,
> Strahil Nikolov
>
> On Wednesday, 13 March 2019 at 13:03:38 GMT+2, Strahil Nikolov <hunter86...@yahoo.com> wrote:
>
> Dear Simone,
>
> it seems that there is some kind of problem, as the OVF got updated with wrong configuration:
> [root@ovirt2 ~]# ls -l /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/{441abdc8-6cb1-49a4-903f-a1ec0ed88429,94ade632-6ecc-4901-8cec-8e39f3d69cb0}
>
> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429:
> total 66591
> -rw-rw. 1 vdsm kvm 30720 Mar 12 08:06 c3309fc0-8707-4de1-903d-8d4bbb024f81
> -rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 c3309fc0-8707-4de1-903d-8d4bbb024f81.lease
> -rw-r--r--. 1 vdsm kvm 435 Mar 12 08:06 c3309fc0-8707-4de1-903d-8d4bbb024f81.meta
>
> /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0:
> total 66591
> -rw-rw. 1 vdsm kvm 30720 Mar 13 11:07 9460fc4b-54f3-48e3-b7b6-da962321ecf4
> -rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 9460fc4b-54f3-48e3-b7b6-da962321ecf4.lease
> -rw-r--r--. 1 vdsm kvm 435 Mar 13 11:07 9460fc4b-54f3-48e3-b7b6-da962321ecf4.meta
>
> Starting the hosted-engine fails with:
>
> 2019-03-13 12:48:21,237+0200 ERROR (vm/8474ae07) [virt.vm] (vmId='8474ae07-f172-4a20-b516-375c73903df7') The vm start process failed (vm:937)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 866, in _startUnderlyingVm
>     self._run()
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2852, in _run
>     dom = self._connection.defineXML(self._domain.xml)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
>     ret = f(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in wrapper
>     return func(inst, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3743, in defineXML
>     if ret is None: raise libvirtError('virDomainDefineXML() failed', conn=self)
> libvirtError: XML error: No PCI buses available
>
> Best Regards,
> Strahil Nikolov
>
> On Tuesday, 12 March 2019 at 14:14:26 GMT+2, Strahil Nikolov <hunter86...@yahoo.com> wrote:
>
> Dear Simone,
>
> it should be 60 min, but I have checked several hours after that and it didn't update it.
>
> [root@engine ~]# engine-config -g OvfUpdateIntervalInMinutes
> OvfUpdateIntervalInMinutes: 60 version: general
>
> How can I make a backup of the VM config, as you have noticed the local copy in /var/run/ovirt-hosted-engine-ha/vm.conf won't work?
>
> I will keep the HostedEngine's xml - so I can redefine if needed.
>
> Best Regards,
> Strahil Nikolov
[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent
Ok, I have managed to recover again and no issues are detected this time. I guess this case is quite rare and nobody has experienced that. Best Regards, Strahil Nikolov On Wednesday, March 13, 2019 at 13:03:38 GMT+2, Strahil Nikolov wrote: Dear Simone, it seems that there is some kind of problem, as the OVF got updated with wrong configuration:

[root@ovirt2 ~]# ls -l /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/{441abdc8-6cb1-49a4-903f-a1ec0ed88429,94ade632-6ecc-4901-8cec-8e39f3d69cb0}

/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429:
total 66591
-rw-rw. 1 vdsm kvm   30720 Mar 12 08:06 c3309fc0-8707-4de1-903d-8d4bbb024f81
-rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 c3309fc0-8707-4de1-903d-8d4bbb024f81.lease
-rw-r--r--. 1 vdsm kvm 435 Mar 12 08:06 c3309fc0-8707-4de1-903d-8d4bbb024f81.meta

/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0:
total 66591
-rw-rw. 1 vdsm kvm   30720 Mar 13 11:07 9460fc4b-54f3-48e3-b7b6-da962321ecf4
-rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 9460fc4b-54f3-48e3-b7b6-da962321ecf4.lease
-rw-r--r--. 1 vdsm kvm 435 Mar 13 11:07 9460fc4b-54f3-48e3-b7b6-da962321ecf4.meta

Starting the hosted-engine fails with:

2019-03-13 12:48:21,237+0200 ERROR (vm/8474ae07) [virt.vm] (vmId='8474ae07-f172-4a20-b516-375c73903df7') The vm start process failed (vm:937) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 866, in _startUnderlyingVm self._run() File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2852, in _run dom = self._connection.defineXML(self._domain.xml) File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper ret = f(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 94, in wrapper return func(inst, *args, **kwargs) File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3743, in defineXML if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self) libvirtError: XML error: No PCI buses available

Best Regards, Strahil Nikolov

On Tuesday, March 12, 2019 at 14:14:26 GMT+2, Strahil Nikolov wrote: Dear Simone, it should be 60 min, but I have checked several hours after that and it didn't update it.

[root@engine ~]# engine-config -g OvfUpdateIntervalInMinutes
OvfUpdateIntervalInMinutes: 60 version: general

How can I make a backup of the VM config, as you have noticed the local copy in /var/run/ovirt-hosted-engine-ha/vm.conf won't work? I will keep the HostedEngine's xml - so I can redefine if needed. Best Regards, Strahil Nikolov List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZRPIBZKOD533HODP6VER726XWGQEZXM7/
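Since OvfUpdateIntervalInMinutes keeps coming up, a hedged sketch for the engine VM: read the current interval and, for testing only, shorten it so the engine rewrites the OVF_STORE disks sooner. The value 5 is an arbitrary assumption; restore the default of 60 afterwards.

```shell
# Inspect the current OVF refresh interval, shorten it temporarily, and
# restart the engine so the engine-config change takes effect.
engine-config -g OvfUpdateIntervalInMinutes
engine-config -s OvfUpdateIntervalInMinutes=5
systemctl restart ovirt-engine
```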
[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent
On Tue, Mar 12, 2019 at 9:48 AM Strahil Nikolov wrote: > Latest update - the system is back and running normally. > After a day (or maybe a little more), the OVF is OK: > Normally it should try every 60 minutes. Can you please execute engine-config -g OvfUpdateIntervalInMinutes on your engine VM and check the results? it should be 60 minutes by default. > > [root@ovirt1 ~]# ls -l > /rhev/data-center/mnt/glusterSD/ovirt1.localdomain\:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/{441abdc8-6cb1-49a4-903f-a1ec0ed88429,94ade632-6ecc-4901-8cec-8e39f3d69cb0} > > /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429: > total 66591 > -rw-rw. 1 vdsm kvm 30720 Mar 12 08:06 > c3309fc0-8707-4de1-903d-8d4bbb024f81 > -rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 > c3309fc0-8707-4de1-903d-8d4bbb024f81.lease > -rw-r--r--. 1 vdsm kvm 435 Mar 12 08:06 > c3309fc0-8707-4de1-903d-8d4bbb024f81.meta > > > /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0: > total 66591 > -rw-rw. 1 vdsm kvm 30720 Mar 12 08:06 > 9460fc4b-54f3-48e3-b7b6-da962321ecf4 > -rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 > 9460fc4b-54f3-48e3-b7b6-da962321ecf4.lease > -rw-r--r--. 
1 vdsm kvm 435 Mar 12 08:06 > 9460fc4b-54f3-48e3-b7b6-da962321ecf4.meta > > > Once it got fixed, I have managed to start the hosted-engine properly (I > have rebooted the whole cluster just to be on the safe side): > > [root@ovirt1 ~]# hosted-engine --vm-status > > > --== Host ovirt1.localdomain (id: 1) status ==-- > > conf_on_shared_storage : True > Status up-to-date : True > Hostname : ovirt1.localdomain > Host ID: 1 > Engine status : {"health": "good", "vm": "up", > "detail": "Up"} > Score : 3400 > stopped: False > Local maintenance : False > crc32 : 8ec26591 > local_conf_timestamp : 49704 > Host timestamp : 49704 > Extra metadata (valid at timestamp): > metadata_parse_version=1 > metadata_feature_version=1 > timestamp=49704 (Tue Mar 12 10:47:43 2019) > host-id=1 > score=3400 > vm_conf_refresh_time=49704 (Tue Mar 12 10:47:43 2019) > conf_on_shared_storage=True > maintenance=False > state=EngineUp > stopped=False > > > --== Host ovirt2.localdomain (id: 2) status ==-- > > conf_on_shared_storage : True > Status up-to-date : True > Hostname : ovirt2.localdomain > Host ID: 2 > Engine status : {"reason": "vm not running on this > host", "health": "bad", "vm": "down", "detail": "unknown"} > Score : 3400 > stopped: False > Local maintenance : False > crc32 : f9f39dcd > local_conf_timestamp : 14458 > Host timestamp : 14458 > Extra metadata (valid at timestamp): > metadata_parse_version=1 > metadata_feature_version=1 > timestamp=14458 (Tue Mar 12 10:47:41 2019) > host-id=2 > score=3400 > vm_conf_refresh_time=14458 (Tue Mar 12 10:47:41 2019) > conf_on_shared_storage=True > maintenance=False > state=EngineDown > stopped=False > > > > Best Regards, > Strahil Nikolov > > On Sunday, March 10, 2019 at 5:05:33 GMT+2, Strahil Nikolov < > hunter86...@yahoo.com> wrote: > > > Hello again, > > Latest update: the engine is up and running (or at least the login portal). > > [root@ovirt1 ~]# hosted-engine --check-liveliness > Hosted Engine is up!
> > I have found online the xml for the network: > > [root@ovirt1 ~]# cat ovirtmgmt_net.xml > > vdsm-ovirtmgmt > > > > > Sadly, I had to create a symbolic link to the main disk in > /var/run/vdsm/storage, as it was missing. > > So, what's next? > > Issues up to now: > 2 OVF - 0 bytes > Problem with local copy of the HostedEngine config - used xml from an old > vdsm log > Missing vdsm-ovirtmgmt definition > No link for the main raw disk in /var/run/vdsm/storage. > > Can you hint me how to recover the 2 OVF tars now? > > Best Regards, > Strahil Nikolov > > List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/A23KL3SPOAMCJKGUQZUX65IF46AYITK7/
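On the question of recovering the OVF tars: each OVF_STORE volume is a plain tar archive, so listing it distinguishes a healthy copy from a wiped one. A minimal sketch; the path in the usage comment is the volume from this thread and is an assumption for other setups.

```shell
# List the contents of an OVF_STORE volume; a healthy copy contains one
# <vmId>.ovf entry per stored VM, while a wiped (0-byte) copy lists
# nothing or errors out.
list_ovf_store() {
    tar -tvf "$1"
}

# Usage (path assumed from this thread):
# list_ovf_store /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429/c3309fc0-8707-4de1-903d-8d4bbb024f81
```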
[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent
Latest update - the system is back and running normally. After a day (or maybe a little more), the OVF is OK:

[root@ovirt1 ~]# ls -l /rhev/data-center/mnt/glusterSD/ovirt1.localdomain\:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/{441abdc8-6cb1-49a4-903f-a1ec0ed88429,94ade632-6ecc-4901-8cec-8e39f3d69cb0}

/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429:
total 66591
-rw-rw. 1 vdsm kvm   30720 Mar 12 08:06 c3309fc0-8707-4de1-903d-8d4bbb024f81
-rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 c3309fc0-8707-4de1-903d-8d4bbb024f81.lease
-rw-r--r--. 1 vdsm kvm 435 Mar 12 08:06 c3309fc0-8707-4de1-903d-8d4bbb024f81.meta

/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0:
total 66591
-rw-rw. 1 vdsm kvm   30720 Mar 12 08:06 9460fc4b-54f3-48e3-b7b6-da962321ecf4
-rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 9460fc4b-54f3-48e3-b7b6-da962321ecf4.lease
-rw-r--r--. 1 vdsm kvm 435 Mar 12 08:06 9460fc4b-54f3-48e3-b7b6-da962321ecf4.meta

Once it got fixed, I have managed to start the hosted-engine properly (I have rebooted the whole cluster just to be on the safe side):

[root@ovirt1 ~]# hosted-engine --vm-status

--== Host ovirt1.localdomain (id: 1) status ==--

conf_on_shared_storage : True
Status up-to-date : True
Hostname : ovirt1.localdomain
Host ID : 1
Engine status : {"health": "good", "vm": "up", "detail": "Up"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : 8ec26591
local_conf_timestamp : 49704
Host timestamp : 49704
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=49704 (Tue Mar 12 10:47:43 2019)
host-id=1
score=3400
vm_conf_refresh_time=49704 (Tue Mar 12 10:47:43 2019)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False

--== Host ovirt2.localdomain (id: 2) status ==--

conf_on_shared_storage : True
Status up-to-date : True
Hostname : ovirt2.localdomain
Host ID : 2
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 3400
stopped : False
Local maintenance : False
crc32 : f9f39dcd
local_conf_timestamp : 14458
Host timestamp : 14458
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=14458 (Tue Mar 12 10:47:41 2019)
host-id=2
score=3400
vm_conf_refresh_time=14458 (Tue Mar 12 10:47:41 2019)
conf_on_shared_storage=True
maintenance=False
state=EngineDown
stopped=False

Best Regards, Strahil Nikolov

On Sunday, March 10, 2019 at 5:05:33 GMT+2, Strahil Nikolov wrote:

Hello again, Latest update: the engine is up and running (or at least the login portal).

[root@ovirt1 ~]# hosted-engine --check-liveliness
Hosted Engine is up!

I have found online the xml for the network:

[root@ovirt1 ~]# cat ovirtmgmt_net.xml
vdsm-ovirtmgmt

Sadly, I had to create a symbolic link to the main disk in /var/run/vdsm/storage, as it was missing.

So, what's next?

Issues up to now:
- 2 OVF - 0 bytes
- Problem with local copy of the HostedEngine config - used xml from an old vdsm log
- Missing vdsm-ovirtmgmt definition
- No link for the main raw disk in /var/run/vdsm/storage

Can you hint me how to recover the 2 OVF tars now?

Best Regards, Strahil Nikolov List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/ODFPHVS5LYY6JWFWKWR3PBYTF3QSDKGV/
[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent
Hi Simone, and thanks for your help. So far I found out that there is some problem with the local copy of the HostedEngine config (see attached part of vdsm.log). I have found an older xml configuration (in an old vdsm.log) and defining the VM works, but powering it on reports:

[root@ovirt1 ~]# virsh define hosted-engine.xml
Domain HostedEngine defined from hosted-engine.xml

[root@ovirt1 ~]# virsh list --all
 Id   Name           State
 -    HostedEngine   shut off

[root@ovirt1 ~]# virsh start HostedEngine
error: Failed to start domain HostedEngine
error: Network not found: no network with matching name 'vdsm-ovirtmgmt'

[root@ovirt1 ~]# virsh net-list --all
 Name          State      Autostart   Persistent
 ;vdsmdummy;   active     no          no
 default       inactive   no          yes

[root@ovirt1 ~]# brctl show
bridge name   bridge id           STP enabled   interfaces
;vdsmdummy;   8000.               no
ovirtmgmt     8000.bc5ff467f5b3   no            enp2s0

[root@ovirt1 ~]# ip a s
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp2s0: mtu 9000 qdisc mq master ovirtmgmt state UP group default qlen 1000
    link/ether bc:5f:f4:67:f5:b3 brd ff:ff:ff:ff:ff:ff
3: ovs-system: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether f6:78:c7:2d:32:f9 brd ff:ff:ff:ff:ff:ff
4: br-int: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 66:36:dd:63:dc:48 brd ff:ff:ff:ff:ff:ff
20: ovirtmgmt: mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether bc:5f:f4:67:f5:b3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.90/24 brd 192.168.1.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.243/24 brd 192.168.1.255 scope global secondary ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::be5f:f4ff:fe67:f5b3/64 scope link
       valid_lft forever preferred_lft forever
21: ;vdsmdummy;: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ce:36:8d:b7:64:bd brd ff:ff:ff:ff:ff:ff

192.168.1.243/24 is one of the IPs in ctdb.

So, now comes the question - is there an xml in the logs that defines the network? My hope is to power up the HostedEngine properly and hope that it will push all the configurations to the right places... maybe this is way too optimistic. At least I have learned a lot about oVirt.

Best Regards, Strahil Nikolov

On Thursday, March 7, 2019 at 17:55:12 GMT+2, Simone Tiraboschi wrote:

On Thu, Mar 7, 2019 at 2:54 PM Strahil Nikolov wrote:
>The OVF_STORE volume is going to get periodically recreated by the engine so at least you need a running engine.
>In order to avoid this kind of issue we have two OVF_STORE disks, in your case:
>MainThread::INFO::2019-03-06 06:50:02,391::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:441abdc8-6cb1-49a4-903f-a1ec0ed88429, volUUID:c3309fc0-8707-4de1-903d-8d4bbb024f81
>MainThread::INFO::2019-03-06 06:50:02,748::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:94ade632-6ecc-4901-8cec-8e39f3d69cb0, volUUID:9460fc4b-54f3-48e3-b7b6-da962321ecf4
>Can you please check if you have at least the second copy?

Second copy is empty too:
[root@ovirt1 ~]# ll /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429
total 66561
-rw-rw. 1 vdsm kvm 0 Mar 4 05:23 c3309fc0-8707-4de1-903d-8d4bbb024f81
-rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 c3309fc0-8707-4de1-903d-8d4bbb024f81.lease
-rw-r--r--. 1 vdsm kvm 435 Mar 4 05:24 c3309fc0-8707-4de1-903d-8d4bbb024f81.meta

>And even in the case you lost both, we are storing on the shared storage the initial vm.conf:
>MainThread::ERROR::2019-03-06 06:50:02,971::config_ovf::70::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store) Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
>Can you please check what you have in /var/run/ovirt-hosted-engine-ha/vm.conf ?

It exists and has the following:

[root@ovirt1 ~]# cat /var/run/ovirt-hosted-engine-ha/vm.conf
# Editing the hosted engine VM is only possible via the manager UI\API
# This file was generated at Thu Mar 7 15:37:26 2019
vmId=8474ae07-f172-4a20-b516-375c73903df7
memSize=4096
display=vnc
devices={index:2,iface:ide,address:{ controller:0, target:0,unit:0, bus:1,
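On the missing 'vdsm-ovirtmgmt' network above: it can usually be recreated by hand from a minimal bridge-mode libvirt definition. A hedged sketch follows; the bridge name ovirtmgmt matches the brctl output in this message, but compare the whole XML against `virsh net-dumpxml vdsm-ovirtmgmt` on a healthy host before trusting it.

```shell
# Recreate the vdsm-ovirtmgmt libvirt network as a bridge-mode network on
# top of the existing ovirtmgmt bridge, then make it start automatically.
cat > /root/vdsm-ovirtmgmt.xml <<'EOF'
<network>
  <name>vdsm-ovirtmgmt</name>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>
EOF
virsh net-define /root/vdsm-ovirtmgmt.xml
virsh net-start vdsm-ovirtmgmt
virsh net-autostart vdsm-ovirtmgmt
```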
[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent
On Fri, Mar 8, 2019 at 12:49 PM Strahil Nikolov wrote: > Hi Simone, > > sadly it seems that starting the engine from an alternative config is not > working. > Virsh reports that the VM is defined , but shut down and the dumpxml > doesn't show any disks - maybe this is normal for oVirt (I have never > checked a running VM). > No, it's not: devices={index:0,iface:virtio,format:raw,poolID:- ---,volumeID:a9ab832f-c4f2-4b9b- 9d99-6393fd995979,imageID:8ec7a465-151e-4ac3-92a7- 965ecf854501,specParams:{},readonly:false,domainID:808423f9-8a5c-40cd-bc9f- 2568c85b8c74,optional:false,deviceId:a9ab832f-c4f2-4b9b- 9d99-6393fd995979,address:{bus:0x00, slot:0x06, the disk was definitively there in vm.conf but then the VM ignores it. I'd suggest to double check vdsm.log for errors or something like that. > > Strangely , both OVF have been wiped out at almost the same time. > > I'm attaching some console output and gluster logs. In the gluster stuff I > can see: > glusterd.log > > The message "E [MSGID: 101191] > [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch > handler" repeated 47 times between [2019-03-04 05: > 17:15.810686] and [2019-03-04 05:19:08.576724] > [2019-03-04 05:19:16.147795] I [MSGID: 106488] > [glusterd-handler.c:1558:__glusterd_handle_cli_get_volume] 0-management: > Received get vol req > [2019-03-04 05:19:16.149524] E [MSGID: 101191] > [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch > handler > [2019-03-04 05:19:42.728693] E [MSGID: 106419] > [glusterd-utils.c:6943:glusterd_add_inode_size_to_dict] 0-management: could > not find (null) to getinode size f > or systemd-1 (autofs): (null) package missing? 
> [2019-03-04 05:20:54.236659] I [MSGID: 106499] > [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume > data > [2019-03-04 05:20:54.245844] I [MSGID: 106499] > [glusterd-handler.c:4389:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume > engine > > > and the log of the mountpoint: > > [2019-03-04 05:19:35.381378] E [MSGID: 101191] > [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch > handler > [2019-03-04 05:19:37.294931] I [MSGID: 108031] > [afr-common.c:2543:afr_local_discovery_cbk] 0-engine-replicate-0: selecting > local read_child engine-client-0 > The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] > 0-engine-replicate-0: selecting local read_child engine-client-0" repeated > 7 times > between [2019-03-04 05:19:37.294931] and [2019-03-04 05:21:26.171701] > The message "E [MSGID: 101191] > [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch > handler" repeated 865 times between [2019-03-04 05 > :19:35.381378] and [2019-03-04 05:21:26.233004] > [2019-03-04 05:21:35.699082] E [MSGID: 101191] > [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch > handler > [2019-03-04 05:21:38.671811] I [MSGID: 108031] > [afr-common.c:2543:afr_local_discovery_cbk] 0-engine-replicate-0: selecting > local read_child engine-client-0 > The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] > 0-engine-replicate-0: selecting local read_child engine-client-0" repeated > 7 times > between [2019-03-04 05:21:38.671811] and [2019-03-04 05:23:31.654205] > The message "E [MSGID: 101191] > [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch > handler" repeated 889 times between [2019-03-04 05 > :21:35.699082] and [2019-03-04 05:23:32.613797] > > Adding also Sahina here. 
> > Best Regards, > Strahil Nikolov > List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/FLPFUNGH3FKQPXNRH3JM2JDXU6S4M2XJ/
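To quantify how often errors like the repeated "Failed to dispatch handler" messages occur, the MSGID codes can be tallied straight from a gluster log. A small sketch; the log path in the usage comment is an assumption, and any glusterd or mount log works.

```shell
# Count occurrences of each gluster MSGID code in a log file -- basic
# triage for "which error repeats N times" questions like the ones above.
count_msgids() {
    awk 'match($0, /MSGID: [0-9]+/) {
             id = substr($0, RSTART, RLENGTH)   # e.g. "MSGID: 101191"
             n[id]++
         }
         END { for (id in n) print n[id], id }' "$1"
}

# Usage (path assumed):
# count_msgids /var/log/glusterfs/glusterd.log
```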
[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent
On Thu, Mar 7, 2019 at 2:54 PM Strahil Nikolov wrote: > > > > >The OVF_STORE volume is going to get periodically recreated by the engine > so at least you need a running engine. > > >In order to avoid this kind of issue we have two OVF_STORE disks, in your > case: > > >MainThread::INFO::2019-03-06 > 06:50:02,391::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) > Found >OVF_STORE: imgUUID:441abdc8-6cb1-49a4-903f-a1ec0ed88429, > volUUID:c3309fc0-8707-4de1-903d-8d4bbb024f81 > >MainThread::INFO::2019-03-06 > 06:50:02,748::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) > Found >OVF_STORE: imgUUID:94ade632-6ecc-4901-8cec-8e39f3d69cb0, > volUUID:9460fc4b-54f3-48e3-b7b6-da962321ecf4 > > >Can you please check if you have at lest the second copy? > > Second Copy is empty too: > [root@ovirt1 ~]# ll > /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429 > total 66561 > -rw-rw. 1 vdsm kvm 0 Mar 4 05:23 > c3309fc0-8707-4de1-903d-8d4bbb024f81 > -rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 > c3309fc0-8707-4de1-903d-8d4bbb024f81.lease > -rw-r--r--. 1 vdsm kvm 435 Mar 4 05:24 > c3309fc0-8707-4de1-903d-8d4bbb024f81.meta > > > > >And even in the case you lost both, we are storing on the shared storage > the initial vm.conf: > >MainThread::ERROR::2019-03-06 > >06:50:02,971::config_ovf::70::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::>(_get_vm_conf_content_from_ovf_store) > Failed extracting VM OVF from the OVF_STORE volume, falling back to initial > vm.conf > > >Can you please check what do you have > in /var/run/ovirt-hosted-engine-ha/vm.conf ? 
> > It exists and has the following: > > [root@ovirt1 ~]# cat /var/run/ovirt-hosted-engine-ha/vm.conf > # Editing the hosted engine VM is only possible via the manager UI\API > # This file was generated at Thu Mar 7 15:37:26 2019 > > vmId=8474ae07-f172-4a20-b516-375c73903df7 > memSize=4096 > display=vnc > devices={index:2,iface:ide,address:{ controller:0, target:0,unit:0, bus:1, > type:drive},specParams:{},readonly:true,deviceId:,path:,device:cdrom,shared:false,type:disk} > devices={index:0,iface:virtio,format:raw,poolID:----,volumeID:a9ab832f-c4f2-4b9b-9d99-6393fd995979,imageID:8ec7a465-151e-4ac3-92a7-965ecf854501,specParams:{},readonly:false,domainID:808423f9-8a5c-40cd-bc9f-2568c85b8c74,optional:false,deviceId:a9ab832f-c4f2-4b9b-9d99-6393fd995979,address:{bus:0x00, > slot:0x06, domain:0x, type:pci, > function:0x0},device:disk,shared:exclusive,propagateErrors:off,type:disk,bootOrder:1} > devices={device:scsi,model:virtio-scsi,type:controller} > devices={nicModel:pv,macAddr:00:16:3e:62:72:c8,linkActive:true,network:ovirtmgmt,specParams:{},deviceId:,address:{bus:0x00, > slot:0x03, domain:0x, type:pci, > function:0x0},device:bridge,type:interface} > devices={device:console,type:console} > devices={device:vga,alias:video0,type:video} > devices={device:vnc,type:graphics} > vmName=HostedEngine > > spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir > smp=1 > maxVCpus=8 > cpuType=Opteron_G5 > emulatedMachine=emulated_machine_list.json['values']['system_option_value'][0]['value'].replace('[','').replace(']','').split(', > ')|first > devices={device:virtio,specParams:{source:urandom},model:virtio,type:rng} > You should be able to copy it to /root/myvm.conf.xml and start the engine VM with hosted-engine --vm-start --vm-conf=/root/myvm.conf > > > > Also, I think this happened when I was upgrading ovirt1 (last in the > gluster cluster) from 4.3.0 to 4.3.1 . The engine got restarted , because I > forgot to enable the global maintenance. 
> > > >Sorry, I don't understand > >Can you please explain what happened? > > I have updated the engine first -> All OK, next was the arbiter -> again > no issues with it. > Next was the empty host -> ovirt2 and everything went OK. > After that I migrated the engine to ovirt2 , and tried to updated ovirt1. > The web showed that the installation failed, but using "yum update" was > working. > During the update via yum of ovirt1 -> the engine app crashed and > restarted on ovirt2. > After the reboot of ovirt1 I have noticed the error about pinging the > gateway ,thus I stopped the engine and stopped the following services on > both hosts (global maintenance): > ovirt-ha-agent ovirt-ha-broker vdsmd supervdsmd sanlock > > Next was a reinitialization of the sanlock space via 'sanlock direct -s'. > In the end I have managed to power on the hosted-engine and it was running > for a while. > > As the errors did not stop - I have decided to shutdown everything, then > power it up , heal gluster and check what will happen. > > Currently I'm not able to power up the engine: > > > [root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-status > > > !! Cluster is in GLOBAL MAINTENANCE mode !! > Please notice that in
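Recovery steps like the ones described in this message are usually wrapped in global maintenance so the HA agents don't fight the manual start. A hedged sketch of that sequence; /root/myvm.conf is an assumed file name for the copied config.

```shell
# Enter global maintenance, keep a working copy of the fallback vm.conf,
# and start the hosted engine from that explicit copy.
hosted-engine --set-maintenance --mode=global
cp /var/run/ovirt-hosted-engine-ha/vm.conf /root/myvm.conf
hosted-engine --vm-start --vm-conf=/root/myvm.conf
```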
[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent
>The OVF_STORE volume is going to get periodically recreated by the engine so >at least you need a running engine. >In order to avoid this kind of issue we have two OVF_STORE disks, in your case: >MainThread::INFO::2019-03-06 >06:50:02,391::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) > Found >OVF_STORE: imgUUID:441abdc8-6cb1-49a4-903f-a1ec0ed88429, >volUUID:c3309fc0-8707-4de1-903d-8d4bbb024f81>MainThread::INFO::2019-03-06 >06:50:02,748::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) > Found >OVF_STORE: imgUUID:94ade632-6ecc-4901-8cec-8e39f3d69cb0, >volUUID:9460fc4b-54f3-48e3-b7b6-da962321ecf4 >Can you please check if you have at lest the second copy? Second Copy is empty too:[root@ovirt1 ~]# ll /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429 total 66561 -rw-rw. 1 vdsm kvm 0 Mar 4 05:23 c3309fc0-8707-4de1-903d-8d4bbb024f81 -rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 c3309fc0-8707-4de1-903d-8d4bbb024f81.lease -rw-r--r--. 1 vdsm kvm 435 Mar 4 05:24 c3309fc0-8707-4de1-903d-8d4bbb024f81.meta >And even in the case you lost both, we are storing on the shared storage the >initial vm.conf:>MainThread::ERROR::2019-03-06 >>06:50:02,971::config_ovf::70::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::>(_get_vm_conf_content_from_ovf_store) > Failed extracting VM OVF from the OVF_STORE volume, falling back to initial >vm.conf >Can you please check what do you have in >/var/run/ovirt-hosted-engine-ha/vm.conf ? 
It exists and has the following: [root@ovirt1 ~]# cat /var/run/ovirt-hosted-engine-ha/vm.conf # Editing the hosted engine VM is only possible via the manager UI\API # This file was generated at Thu Mar 7 15:37:26 2019 vmId=8474ae07-f172-4a20-b516-375c73903df7 memSize=4096 display=vnc devices={index:2,iface:ide,address:{ controller:0, target:0,unit:0, bus:1, type:drive},specParams:{},readonly:true,deviceId:,path:,device:cdrom,shared:false,type:disk} devices={index:0,iface:virtio,format:raw,poolID:----,volumeID:a9ab832f-c4f2-4b9b-9d99-6393fd995979,imageID:8ec7a465-151e-4ac3-92a7-965ecf854501,specParams:{},readonly:false,domainID:808423f9-8a5c-40cd-bc9f-2568c85b8c74,optional:false,deviceId:a9ab832f-c4f2-4b9b-9d99-6393fd995979,address:{bus:0x00, slot:0x06, domain:0x, type:pci, function:0x0},device:disk,shared:exclusive,propagateErrors:off,type:disk,bootOrder:1} devices={device:scsi,model:virtio-scsi,type:controller} devices={nicModel:pv,macAddr:00:16:3e:62:72:c8,linkActive:true,network:ovirtmgmt,specParams:{},deviceId:,address:{bus:0x00, slot:0x03, domain:0x, type:pci, function:0x0},device:bridge,type:interface} devices={device:console,type:console} devices={device:vga,alias:video0,type:video} devices={device:vnc,type:graphics} vmName=HostedEngine spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir smp=1 maxVCpus=8 cpuType=Opteron_G5 emulatedMachine=emulated_machine_list.json['values']['system_option_value'][0]['value'].replace('[','').replace(']','').split(', ')|first devices={device:virtio,specParams:{source:urandom},model:virtio,type:rng} Also, I think this happened when I was upgrading ovirt1 (last in the gluster cluster) from 4.3.0 to 4.3.1 . The engine got restarted , because I forgot to enable the global maintenance. >Sorry, I don't understand>Can you please explain what happened? 
I updated the engine first -> all OK. Next was the arbiter -> again no issues with it. Next was the empty host, ovirt2, and everything went OK. After that I migrated the engine to ovirt2 and tried to update ovirt1. The web UI showed that the installation failed, but "yum update" was working. During the update of ovirt1 via yum, the engine app crashed and restarted on ovirt2.

After the reboot of ovirt1 I noticed the error about pinging the gateway, so I stopped the engine and stopped the following services on both hosts (global maintenance): ovirt-ha-agent, ovirt-ha-broker, vdsmd, supervdsmd, sanlock. Next was a reinitialization of the sanlock space via 'sanlock direct -s'.

In the end I managed to power on the hosted engine and it ran for a while. As the errors did not stop, I decided to shut everything down, power it up, heal gluster and check what happens. Currently I'm not able to power up the engine:

[root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-status

!! Cluster is in GLOBAL MAINTENANCE mode !!

--== Host ovirt1.localdomain (id: 1) status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : ovirt1.localdomain
Host ID                : 1
Engine status          : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score
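An aside for anyone scripting around this: the vm.conf dump earlier in the thread is a flat key=value file in which the `devices` key repeats once per device. A minimal parsing sketch (the helper name is mine, not part of oVirt):

```python
def parse_vm_conf(text):
    """Parse hosted-engine vm.conf text: flat key=value lines,
    '#' comments, and a repeating 'devices' key collected into a list."""
    conf = {}
    devices = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        if key == "devices":
            devices.append(value)
        else:
            conf[key] = value
    conf["devices"] = devices
    return conf

sample = """# generated file
vmId=8474ae07-f172-4a20-b516-375c73903df7
memSize=4096
devices={device:console,type:console}
devices={device:vnc,type:graphics}
vmName=HostedEngine
"""
conf = parse_vm_conf(sample)
print(conf["vmName"], conf["memSize"], len(conf["devices"]))  # HostedEngine 4096 2
```

This kind of check is handy for confirming that a fallback vm.conf still points at the expected domainID/volumeID before attempting to start the engine VM from it.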
[ovirt-users] Re: Ovirt 4.3.1 problem with HA agent
On Thu, Mar 7, 2019 at 9:19 AM Strahil Nikolov wrote:

> Hi Simone,
>
> I think I found the problem - ovirt-ha cannot extract the file containing the needed data.
> In my case it is completely empty:
>
> [root@ovirt1 ~]# ll /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0
> total 66561
> -rw-rw. 1 vdsm kvm       0 Mar  4 05:21 9460fc4b-54f3-48e3-b7b6-da962321ecf4
> -rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 9460fc4b-54f3-48e3-b7b6-da962321ecf4.lease
> -rw-r--r--. 1 vdsm kvm     435 Mar  4 05:22 9460fc4b-54f3-48e3-b7b6-da962321ecf4.meta
>
> Any hint how to recreate that? Maybe wipe and restart the ovirt-ha-broker and agent?

The OVF_STORE volume is going to get periodically recreated by the engine, so at least you need a running engine.
In order to avoid this kind of issue we have two OVF_STORE disks, in your case:

MainThread::INFO::2019-03-06 06:50:02,391::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:441abdc8-6cb1-49a4-903f-a1ec0ed88429, volUUID:c3309fc0-8707-4de1-903d-8d4bbb024f81
MainThread::INFO::2019-03-06 06:50:02,748::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:94ade632-6ecc-4901-8cec-8e39f3d69cb0, volUUID:9460fc4b-54f3-48e3-b7b6-da962321ecf4

Can you please check if you have at least the second copy?

And even in the case you lost both, we are storing the initial vm.conf on the shared storage:

MainThread::ERROR::2019-03-06 06:50:02,971::config_ovf::70::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store) Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf

Can you please check what you have in /var/run/ovirt-hosted-engine-ha/vm.conf ?

> Also, I think this happened when I was upgrading ovirt1 (last in the gluster cluster) from 4.3.0 to 4.3.1. The engine got restarted because I forgot to enable global maintenance.

Sorry, I don't understand. Can you please explain what happened?

> Best Regards,
> Strahil Nikolov
>
> On Wednesday, 6 March 2019 at 16:57:30 GMT+2, Simone Tiraboschi <stira...@redhat.com> wrote:
>
> On Wed, Mar 6, 2019 at 3:09 PM Strahil Nikolov wrote:
>
> Hi Simone,
>
> thanks for your reply.
>
> >Are you really sure that the issue was on the ping?
> >On storage errors the broker restarts itself, and while the broker is restarting the agent cannot ask the broker to trigger the gateway monitor (the ping one), hence that error message.
>
> It seemed so in that moment, but I'm not so sure right now :)
>
> >Which kind of storage are you using?
> >Can you please attach /var/log/ovirt-hosted-engine-ha/broker.log ?
>
> I'm using GlusterFS v5 from oVirt 4.3.1 with a FUSE mount.
> Please have a look at the attached logs.
>
> Nothing seems that strange there but that error.
> Can you please try with ovirt-ha-agent and ovirt-ha-broker in debug mode?
> You have to set level=DEBUG in the [logger_root] section in /etc/ovirt-hosted-engine-ha/agent-log.conf and /etc/ovirt-hosted-engine-ha/broker-log.conf and restart the two services.
>
> Best Regards,
> Strahil Nikolov
>
> On Wednesday, 6 March 2019 at 9:53:20 GMT+2, Simone Tiraboschi <stira...@redhat.com> wrote:
>
> On Wed, Mar 6, 2019 at 6:13 AM Strahil wrote:
>
> Hi guys,
>
> After updating to 4.3.1 I had an issue where the ovirt-ha-broker was complaining that it couldn't ping the gateway.
>
> Are you really sure that the issue was on the ping?
> On storage errors the broker restarts itself, and while the broker is restarting the agent cannot ask the broker to trigger the gateway monitor (the ping one), hence that error message.
>
> As I have seen that before - I stopped ovirt-ha-agent, ovirt-ha-broker, vdsmd, supervdsmd and sanlock on the nodes and reinitialized the lockspace.
>
> I guess I didn't do it properly, as now I receive:
>
> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
>
> Any hints how to fix this? Of course a redeploy is possible, but I prefer to recover from that.
>
> Which kind of storage are you using?
> Can you please attach /var/log/ovirt-hosted-engine-ha/broker.log ?
>
> Best Regards,
> Strahil Nikolov
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/OU3FKLEPH7AHT2LO2IYZ47RJHRA72C3Z/
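For reference, an OVF_STORE volume is, as far as I understand it, a tar archive holding one OVF file per VM, which is why a zero-byte volume leaves the agent with nothing to extract. If you copy a healthy volume out (e.g. with dd), a sketch like the following can inspect it; the archive built here is a mock stand-in, not real engine output:

```python
import io
import os
import tarfile
import tempfile

def list_ovf_members(tar_path):
    """Return the names of *.ovf members inside an OVF_STORE-style tar."""
    with tarfile.open(tar_path) as tar:
        return [m.name for m in tar.getmembers() if m.name.endswith(".ovf")]

# Build a mock archive standing in for a copied-out OVF_STORE volume.
with tempfile.TemporaryDirectory() as workdir:
    mock = os.path.join(workdir, "ovf_store.tar")
    with tarfile.open(mock, "w") as tar:
        payload = b"<ovf:Envelope/>"  # placeholder OVF content
        member = tarfile.TarInfo("8474ae07-f172-4a20-b516-375c73903df7.ovf")
        member.size = len(payload)
        tar.addfile(member, io.BytesIO(payload))
    print(list_ovf_members(mock))  # ['8474ae07-f172-4a20-b516-375c73903df7.ovf']
```

If `tarfile` cannot even open the copied volume, that is consistent with the "Failed extracting VM OVF" error above.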
Hi Simone,

I think I found the problem - ovirt-ha cannot extract the file containing the needed data.
In my case it is completely empty:

[root@ovirt1 ~]# ll /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/94ade632-6ecc-4901-8cec-8e39f3d69cb0
total 66561
-rw-rw. 1 vdsm kvm       0 Mar  4 05:21 9460fc4b-54f3-48e3-b7b6-da962321ecf4
-rw-rw. 1 vdsm kvm 1048576 Jan 31 13:24 9460fc4b-54f3-48e3-b7b6-da962321ecf4.lease
-rw-r--r--. 1 vdsm kvm     435 Mar  4 05:22 9460fc4b-54f3-48e3-b7b6-da962321ecf4.meta

Any hint how to recreate that? Maybe wipe and restart the ovirt-ha-broker and agent?

Also, I think this happened when I was upgrading ovirt1 (last in the gluster cluster) from 4.3.0 to 4.3.1. The engine got restarted because I forgot to enable global maintenance.

Best Regards,
Strahil Nikolov

On Wednesday, 6 March 2019 at 16:57:30 GMT+2, Simone Tiraboschi wrote:

> On Wed, Mar 6, 2019 at 3:09 PM Strahil Nikolov wrote:
>
> Hi Simone,
>
> thanks for your reply.
>
> >Are you really sure that the issue was on the ping?
> >On storage errors the broker restarts itself, and while the broker is restarting the agent cannot ask the broker to trigger the gateway monitor (the ping one), hence that error message.
>
> It seemed so in that moment, but I'm not so sure right now :)
>
> >Which kind of storage are you using?
> >Can you please attach /var/log/ovirt-hosted-engine-ha/broker.log ?
>
> I'm using GlusterFS v5 from oVirt 4.3.1 with a FUSE mount.
> Please have a look at the attached logs.
>
> Nothing seems that strange there but that error.
> Can you please try with ovirt-ha-agent and ovirt-ha-broker in debug mode?
> You have to set level=DEBUG in the [logger_root] section in /etc/ovirt-hosted-engine-ha/agent-log.conf and /etc/ovirt-hosted-engine-ha/broker-log.conf and restart the two services.
>
> Best Regards,
> Strahil Nikolov
>
> On Wednesday, 6 March 2019 at 9:53:20 GMT+2, Simone Tiraboschi wrote:
>
> On Wed, Mar 6, 2019 at 6:13 AM Strahil wrote:
>
> Hi guys,
>
> After updating to 4.3.1 I had an issue where the ovirt-ha-broker was complaining that it couldn't ping the gateway.
>
> Are you really sure that the issue was on the ping?
> On storage errors the broker restarts itself, and while the broker is restarting the agent cannot ask the broker to trigger the gateway monitor (the ping one), hence that error message.
>
> As I have seen that before - I stopped ovirt-ha-agent, ovirt-ha-broker, vdsmd, supervdsmd and sanlock on the nodes and reinitialized the lockspace.
>
> I guess I didn't do it properly, as now I receive:
>
> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
>
> Any hints how to fix this? Of course a redeploy is possible, but I prefer to recover from that.
>
> Which kind of storage are you using?
> Can you please attach /var/log/ovirt-hosted-engine-ha/broker.log ?

Best Regards,
Strahil Nikolov
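The zero-byte data volume in the listing above (only its .lease and .meta companions have content) is the actual symptom. A small sketch to flag such volumes in an image directory; the layout mimics the ll output in this thread, and the helper is hypothetical, not an oVirt tool:

```python
import os
import tempfile

def find_empty_volumes(image_dir):
    """Return volume files (ignoring .lease/.meta companions) that are 0 bytes."""
    empty = []
    for name in sorted(os.listdir(image_dir)):
        if name.endswith((".lease", ".meta")):
            continue
        path = os.path.join(image_dir, name)
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            empty.append(name)
    return empty

# Demo on a temporary directory shaped like an OVF_STORE image directory.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "9460fc4b"), "w").close()  # 0-byte data volume
    with open(os.path.join(d, "9460fc4b.meta"), "w") as f:
        f.write("DOMAIN=...")
    print(find_empty_volumes(d))  # ['9460fc4b']
```

Pointed at both OVF_STORE image directories under the storage domain, this would confirm at a glance whether either copy survived.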
On Wed, Mar 6, 2019 at 3:09 PM Strahil Nikolov wrote:

> Hi Simone,
>
> thanks for your reply.
>
> >Are you really sure that the issue was on the ping?
> >On storage errors the broker restarts itself, and while the broker is restarting the agent cannot ask the broker to trigger the gateway monitor (the ping one), hence that error message.
>
> It seemed so in that moment, but I'm not so sure right now :)
>
> >Which kind of storage are you using?
> >Can you please attach /var/log/ovirt-hosted-engine-ha/broker.log ?
>
> I'm using GlusterFS v5 from oVirt 4.3.1 with a FUSE mount.
> Please have a look at the attached logs.

Nothing seems that strange there but that error.
Can you please try with ovirt-ha-agent and ovirt-ha-broker in debug mode?
You have to set level=DEBUG in the [logger_root] section in /etc/ovirt-hosted-engine-ha/agent-log.conf and /etc/ovirt-hosted-engine-ha/broker-log.conf and restart the two services.

> Best Regards,
> Strahil Nikolov
>
> On Wednesday, 6 March 2019 at 9:53:20 GMT+2, Simone Tiraboschi <stira...@redhat.com> wrote:
>
> On Wed, Mar 6, 2019 at 6:13 AM Strahil wrote:
>
> Hi guys,
>
> After updating to 4.3.1 I had an issue where the ovirt-ha-broker was complaining that it couldn't ping the gateway.
>
> Are you really sure that the issue was on the ping?
> On storage errors the broker restarts itself, and while the broker is restarting the agent cannot ask the broker to trigger the gateway monitor (the ping one), hence that error message.
>
> As I have seen that before - I stopped ovirt-ha-agent, ovirt-ha-broker, vdsmd, supervdsmd and sanlock on the nodes and reinitialized the lockspace.
>
> I guess I didn't do it properly, as now I receive:
>
> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
>
> Any hints how to fix this? Of course a redeploy is possible, but I prefer to recover from that.
>
> Which kind of storage are you using?
> Can you please attach /var/log/ovirt-hosted-engine-ha/broker.log ?
>
> Best Regards,
> Strahil Nikolov
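Simone's debug-mode suggestion amounts to a one-line change in two ini-style files. A sketch of the edit using configparser, operating on an inline sample rather than the real /etc files (the section names follow the message above):

```python
import configparser
import io

def set_root_level_debug(conf_text):
    """Return the config text with [logger_root] level forced to DEBUG."""
    cp = configparser.ConfigParser()
    cp.read_string(conf_text)
    cp["logger_root"]["level"] = "DEBUG"
    out = io.StringIO()
    cp.write(out)
    return out.getvalue()

sample = "[logger_root]\nlevel=INFO\nhandlers=syslog,logfile\n"
print(set_root_level_debug(sample))
```

After editing the real agent-log.conf and broker-log.conf the same way, restart ovirt-ha-agent and ovirt-ha-broker so the new level takes effect.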
On Wed, Mar 6, 2019 at 6:13 AM Strahil wrote:

> Hi guys,
>
> After updating to 4.3.1 I had an issue where the ovirt-ha-broker was complaining that it couldn't ping the gateway.

Are you really sure that the issue was on the ping?
On storage errors the broker restarts itself, and while the broker is restarting the agent cannot ask the broker to trigger the gateway monitor (the ping one), hence that error message.

> As I have seen that before - I stopped ovirt-ha-agent, ovirt-ha-broker, vdsmd, supervdsmd and sanlock on the nodes and reinitialized the lockspace.
>
> I guess I didn't do it properly, as now I receive:
>
> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
>
> Any hints how to fix this? Of course a redeploy is possible, but I prefer to recover from that.

Which kind of storage are you using?
Can you please attach /var/log/ovirt-hosted-engine-ha/broker.log ?

> Best Regards,
> Strahil Nikolov
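Since broker.log keeps coming up in this thread: the agent/broker logs use a `Thread::LEVEL::timestamp::...` line format, so pulling out just the ERROR and WARNING lines before attaching a log is straightforward. A small filtering sketch (the sample line is adapted from the log quoted above; the helper name is mine):

```python
def extract_log_levels(log_text, levels=("ERROR", "WARNING")):
    """Return hosted-engine HA log lines whose '::LEVEL::' field matches."""
    wanted = tuple("::%s::" % lv for lv in levels)
    return [line for line in log_text.splitlines()
            if any(tag in line for tag in wanted)]

sample = (
    "MainThread::INFO::2019-03-06 06:50:02,391::ovf_store::120::(scan) Found OVF_STORE\n"
    "MainThread::ERROR::2019-03-06 06:50:02,971::config_ovf::70::"
    "(_get_vm_conf_content_from_ovf_store) Failed extracting VM OVF\n"
)
for line in extract_log_levels(sample):
    print(line)
```

Run against /var/log/ovirt-hosted-engine-ha/broker.log, this would narrow a multi-megabyte log down to the lines worth posting.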