Hi Simone, and thanks for your help.

So far I have found out that there is some problem with the local copy of the HostedEngine config (see the attached part of vdsm.log). I found an older XML configuration (in an old vdsm.log) and defining the VM works, but powering it on reports:

[root@ovirt1 ~]# virsh define hosted-engine.xml
Domain HostedEngine defined from hosted-engine.xml

[root@ovirt1 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     HostedEngine                   shut off

[root@ovirt1 ~]# virsh start HostedEngine
error: Failed to start domain HostedEngine
error: Network not found: no network with matching name 'vdsm-ovirtmgmt'

[root@ovirt1 ~]# virsh net-list --all
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 ;vdsmdummy;          active     no            no
 default              inactive   no            yes

[root@ovirt1 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;     8000.000000000000       no
ovirtmgmt       8000.bc5ff467f5b3       no              enp2s0

[root@ovirt1 ~]# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master ovirtmgmt state UP group default qlen 1000
    link/ether bc:5f:f4:67:f5:b3 brd ff:ff:ff:ff:ff:ff
3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether f6:78:c7:2d:32:f9 brd ff:ff:ff:ff:ff:ff
4: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 66:36:dd:63:dc:48 brd ff:ff:ff:ff:ff:ff
20: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether bc:5f:f4:67:f5:b3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.90/24 brd 192.168.1.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet 192.168.1.243/24 brd 192.168.1.255 scope global secondary ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::be5f:f4ff:fe67:f5b3/64 scope link
       valid_lft forever preferred_lft forever
21: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ce:36:8d:b7:64:bd brd ff:ff:ff:ff:ff:ff
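For context on the 'vdsm-ovirtmgmt' error: vdsm normally defines a transient libvirt network with that name on top of the ovirtmgmt bridge, which is why virsh cannot find it while vdsm's definitions are gone. Below is a minimal sketch of recreating it by hand; the vdsm-<bridge> naming and the plain bridge-forward definition are assumptions based on a standard setup, so compare with a working host first:

```shell
# Sketch (assumption: vdsm names its transient libvirt networks
# "vdsm-<bridge>" and forwards them straight to the host bridge).
cat > /tmp/vdsm-ovirtmgmt.xml <<'EOF'
<network>
  <name>vdsm-ovirtmgmt</name>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>
EOF

# net-create makes the network transient, matching vdsm's own behaviour.
# Guarded so the snippet is a no-op on machines without libvirt.
if command -v virsh >/dev/null 2>&1; then
    virsh net-create /tmp/vdsm-ovirtmgmt.xml
    virsh net-list --all   # vdsm-ovirtmgmt should now show as active
fi
```

Restarting vdsmd should recreate this network as well; the manual definition is only a stopgap so that "virsh start HostedEngine" can proceed.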
192.168.1.243/24 is one of the IPs in ctdb. So now comes the question: is there an XML in the logs that defines the network? My hope is to power up the HostedEngine properly, and that it will push all the configurations to the right places... maybe this is way too optimistic. At least I have learned a lot about oVirt.

Best Regards,
Strahil Nikolov

On Thursday, March 7, 2019, 17:55:12 GMT+2, Simone Tiraboschi <stira...@redhat.com> wrote:

On Thu, Mar 7, 2019 at 2:54 PM Strahil Nikolov <hunter86...@yahoo.com> wrote:

> The OVF_STORE volume is going to get periodically recreated by the engine, so at least you need a running engine.
> In order to avoid this kind of issue we have two OVF_STORE disks, in your case:
> MainThread::INFO::2019-03-06 06:50:02,391::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:441abdc8-6cb1-49a4-903f-a1ec0ed88429, volUUID:c3309fc0-8707-4de1-903d-8d4bbb024f81
> MainThread::INFO::2019-03-06 06:50:02,748::ovf_store::120::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:94ade632-6ecc-4901-8cec-8e39f3d69cb0, volUUID:9460fc4b-54f3-48e3-b7b6-da962321ecf4
> Can you please check if you have at least the second copy?

The second copy is empty too:

[root@ovirt1 ~]# ll /rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429
total 66561
-rw-rw----. 1 vdsm kvm       0 Mar  4 05:23 c3309fc0-8707-4de1-903d-8d4bbb024f81
-rw-rw----. 1 vdsm kvm 1048576 Jan 31 13:24 c3309fc0-8707-4de1-903d-8d4bbb024f81.lease
-rw-r--r--. 1 vdsm kvm     435 Mar  4 05:24 c3309fc0-8707-4de1-903d-8d4bbb024f81.meta

> And even in the case you lost both, we are storing the initial vm.conf on the shared storage:
> MainThread::ERROR::2019-03-06 06:50:02,971::config_ovf::70::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm::(_get_vm_conf_content_from_ovf_store) Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
> Can you please check what you have in /var/run/ovirt-hosted-engine-ha/vm.conf?

It exists and has the following:

[root@ovirt1 ~]# cat /var/run/ovirt-hosted-engine-ha/vm.conf
# Editing the hosted engine VM is only possible via the manager UI\API
# This file was generated at Thu Mar  7 15:37:26 2019
vmId=8474ae07-f172-4a20-b516-375c73903df7
memSize=4096
display=vnc
devices={index:2,iface:ide,address:{ controller:0, target:0,unit:0, bus:1, type:drive},specParams:{},readonly:true,deviceId:,path:,device:cdrom,shared:false,type:disk}
devices={index:0,iface:virtio,format:raw,poolID:00000000-0000-0000-0000-000000000000,volumeID:a9ab832f-c4f2-4b9b-9d99-6393fd995979,imageID:8ec7a465-151e-4ac3-92a7-965ecf854501,specParams:{},readonly:false,domainID:808423f9-8a5c-40cd-bc9f-2568c85b8c74,optional:false,deviceId:a9ab832f-c4f2-4b9b-9d99-6393fd995979,address:{bus:0x00, slot:0x06, domain:0x0000, type:pci, function:0x0},device:disk,shared:exclusive,propagateErrors:off,type:disk,bootOrder:1}
devices={device:scsi,model:virtio-scsi,type:controller}
devices={nicModel:pv,macAddr:00:16:3e:62:72:c8,linkActive:true,network:ovirtmgmt,specParams:{},deviceId:,address:{bus:0x00, slot:0x03, domain:0x0000, type:pci, function:0x0},device:bridge,type:interface}
devices={device:console,type:console}
devices={device:vga,alias:video0,type:video}
devices={device:vnc,type:graphics}
vmName=HostedEngine
spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
smp=1
maxVCpus=8
cpuType=Opteron_G5
emulatedMachine=emulated_machine_list.json['values']['system_option_value'][0]['value'].replace('[','').replace(']','').split(', ')|first
devices={device:virtio,specParams:{source:urandom},model:virtio,type:rng}

You should be able to copy it to /root/myvm.conf.xml and start the engine VM with:

hosted-engine --vm-start --vm-conf=/root/myvm.conf

Also, I think this happened when I was upgrading ovirt1 (the last in the gluster cluster) from 4.3.0 to 4.3.1. The engine got restarted because I had forgotten to enable global maintenance.

> Sorry, I don't understand.
> Can you please explain what happened?

I updated the engine first -> all OK. Next was the arbiter -> again, no issues with it. Next was the empty host, ovirt2, and everything went OK. After that I migrated the engine to ovirt2 and tried to update ovirt1. The web UI showed that the installation failed, but "yum update" was working. During the update of ovirt1 via yum, the engine app crashed and restarted on ovirt2. After the reboot of ovirt1 I noticed the error about pinging the gateway, so I stopped the engine and stopped the following services on both hosts (global maintenance): ovirt-ha-agent, ovirt-ha-broker, vdsmd, supervdsmd, sanlock. Next was a reinitialization of the sanlock space via 'sanlock direct -s'. In the end I managed to power on the hosted engine and it was running for a while. As the errors did not stop, I decided to shut everything down, power it back up, heal gluster, and check what would happen. Currently I'm not able to power up the engine:

[root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-status

!! Cluster is in GLOBAL MAINTENANCE mode !!

Please notice that in global maintenance mode nothing will try to start the engine VM for you.
I assume you tried to exit global maintenance mode, or at least you tried to manually start it with hosted-engine --vm-start, right?
--== Host ovirt1.localdomain (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.localdomain
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 45e6772b
local_conf_timestamp               : 288
Host timestamp                     : 287
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=287 (Thu Mar  7 15:34:06 2019)
        host-id=1
        score=3400
        vm_conf_refresh_time=288 (Thu Mar  7 15:34:07 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False

--== Host ovirt2.localdomain (id: 2) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt2.localdomain
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 2e9a0444
local_conf_timestamp               : 3886
Host timestamp                     : 3885
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=3885 (Thu Mar  7 15:34:05 2019)
        host-id=2
        score=3400
        vm_conf_refresh_time=3886 (Thu Mar  7 15:34:06 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False

!! Cluster is in GLOBAL MAINTENANCE mode !!

[root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-start
Command VM.getStats with args {'vmID': '8474ae07-f172-4a20-b516-375c73903df7'} failed:
(code=1, message=Virtual machine does not exist: {'vmId': u'8474ae07-f172-4a20-b516-375c73903df7'})

[root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-start
VM exists and is down, cleaning up and restarting

[root@ovirt1 ovirt-hosted-engine-ha]# hosted-engine --vm-status

!! Cluster is in GLOBAL MAINTENANCE mode !!

--== Host ovirt1.localdomain (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.localdomain
Host ID                            : 1
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "Down"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 6b086b7c
local_conf_timestamp               : 328
Host timestamp                     : 327
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=327 (Thu Mar  7 15:34:46 2019)
        host-id=1
        score=3400
        vm_conf_refresh_time=328 (Thu Mar  7 15:34:47 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False

--== Host ovirt2.localdomain (id: 2) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt2.localdomain
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : c5890e9c
local_conf_timestamp               : 3926
Host timestamp                     : 3925
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=3925 (Thu Mar  7 15:34:45 2019)
        host-id=2
        score=3400
        vm_conf_refresh_time=3926 (Thu Mar  7 15:34:45 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False

!! Cluster is in GLOBAL MAINTENANCE mode !!

[root@ovirt1 ovirt-hosted-engine-ha]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     HostedEngine                   shut off

I am really puzzled why both volumes are wiped out.

This is really scary: can you please double-check the gluster logs for warnings and errors?

Best Regards,
Strahil Nikolov
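A side note on the status output above: nothing will auto-start the engine VM until global maintenance is exited with "hosted-engine --set-maintenance --mode=none". Also, since the "Engine status" field in the --vm-status output is a JSON blob, it can be pulled out for quick comparison across hosts. A small sketch, using sample lines copied from the output above (the /tmp path is arbitrary):

```shell
# Extract the JSON "Engine status" values from saved vm-status output.
# The two sample lines are taken from the status dumps above.
cat > /tmp/vm-status.txt <<'EOF'
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "Down"}
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
EOF

# Keep only the JSON part of each "Engine status" line:
sed -n 's/^Engine status[[:space:]]*: //p' /tmp/vm-status.txt
```

Each printed line can then be fed to any JSON tool to compare "vm", "health", and "reason" per host.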
[Attachment: failed-hosted-engine-vdsm.log (binary data)]
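On the empty OVF_STORE copies discussed earlier: the "Failed extracting VM OVF from the OVF_STORE volume" log line suggests the agent reads the volume as a tar archive holding one <vmId>.ovf entry per VM. That format is an assumption here, worth verifying against the ovirt-hosted-engine-ha sources. A runnable sketch of the check, using a stand-in archive since the real volume path (/rhev/data-center/mnt/glusterSD/ovirt1.localdomain:_engine/808423f9-8a5c-40cd-bc9f-2568c85b8c74/images/441abdc8-6cb1-49a4-903f-a1ec0ed88429/c3309fc0-8707-4de1-903d-8d4bbb024f81) only exists on the host:

```shell
# Build a stand-in OVF_STORE archive so the check runs anywhere; on the
# real host you would point tar at the volume file from the listing above.
VM_ID=8474ae07-f172-4a20-b516-375c73903df7   # vmId from vm.conf above
WORK=$(mktemp -d)
printf '<ovf:Envelope>...</ovf:Envelope>' > "$WORK/$VM_ID.ovf"
tar -cf "$WORK/store.tar" -C "$WORK" "$VM_ID.ovf"

# A healthy OVF_STORE copy lists a <vmId>.ovf entry; a 0-byte volume
# like the ones shown earlier obviously cannot contain one.
tar -tf "$WORK/store.tar"
```

If the listing on the real volume shows no <vmId>.ovf entry (or tar refuses the file outright), that copy is unusable and the agent's fall-back to the initial vm.conf is expected.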
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/TCNPAHJMT6PGJB6TGO3CUYQJOCGSH2EC/