Re: [ovirt-users] unable to bring up hosted engine after botched 4.2 upgrade
Hi, the VM is up according to the status (at least for a while). You should be able to use console and diagnose anything that happened inside (like the need for fsck and such) now. Check the presence of those links again now; the metadata file content is not important, but the file has to exist (agents will populate it with status data). I have no new idea about what is wrong with that though. Best regards Martin On Fri, Jan 12, 2018 at 5:47 PM, Jayme wrote: > The lock space issue was an issue I needed to clear but I don't believe it > has resolved the problem. I shutdown agent and broker on all hosts and > disconnected hosted-storage then enabled broker/agent on just one host and > connected storage. I started the VM and actually didn't get any errors in > the logs barely at all which was good to see, however the VM is still not > running: > > HOST3: > > Engine status : {"reason": "failed liveliness check", > "health": "bad", "vm": "up", "detail": "Up"} > > ==> /var/log/messages <== > Jan 12 12:42:57 cultivar3 kernel: ovirtmgmt: port 2(vnet0) entered disabled > state > Jan 12 12:42:57 cultivar3 kernel: device vnet0 entered promiscuous mode > Jan 12 12:42:57 cultivar3 kernel: ovirtmgmt: port 2(vnet0) entered blocking > state > Jan 12 12:42:57 cultivar3 kernel: ovirtmgmt: port 2(vnet0) entered > forwarding state > Jan 12 12:42:57 cultivar3 lldpad: recvfrom(Event interface): No buffer space > available > Jan 12 12:42:57 cultivar3 systemd-machined: New machine qemu-111-Cultivar. > Jan 12 12:42:57 cultivar3 systemd: Started Virtual Machine > qemu-111-Cultivar. > Jan 12 12:42:57 cultivar3 systemd: Starting Virtual Machine > qemu-111-Cultivar.
> Jan 12 12:42:57 cultivar3 kvm: 3 guests now active > Jan 12 12:44:38 cultivar3 libvirtd: 2018-01-12 16:44:38.737+: 1535: > error : qemuDomainAgentAvailable:6010 : Guest agent is not responding: QEMU > guest agent is not connected > > Interestingly though, now I'm seeing this in the logs which may be a new > clue: > > > ==> /var/log/vdsm/vdsm.log <== > File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 126, > in findDomain > return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID)) > File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 116, > in findDomainPath > raise se.StorageDomainDoesNotExist(sdUUID) > StorageDomainDoesNotExist: Storage domain does not exist: > (u'248f46f0-d793-4581-9810-c9d965e2f286',) > jsonrpc/4::ERROR::2018-01-12 > 12:40:30,380::dispatcher::82::storage.Dispatcher::(wrapper) FINISH > getStorageDomainInfo error=Storage domain does not exist: > (u'248f46f0-d793-4581-9810-c9d965e2f286',) > periodic/42::ERROR::2018-01-12 12:40:35,430::api::196::root::(_getHaInfo) > failed to retrieve Hosted Engine HA score '[Errno 2] No such file or > directory'Is the Hosted Engine setup finished? > periodic/43::ERROR::2018-01-12 12:40:50,473::api::196::root::(_getHaInfo) > failed to retrieve Hosted Engine HA score '[Errno 2] No such file or > directory'Is the Hosted Engine setup finished? > periodic/40::ERROR::2018-01-12 12:41:05,519::api::196::root::(_getHaInfo) > failed to retrieve Hosted Engine HA score '[Errno 2] No such file or > directory'Is the Hosted Engine setup finished? > periodic/43::ERROR::2018-01-12 12:41:20,566::api::196::root::(_getHaInfo) > failed to retrieve Hosted Engine HA score '[Errno 2] No such file or > directory'Is the Hosted Engine setup finished? 
> > ==> /var/log/ovirt-hosted-engine-ha/broker.log <== > File > "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", > line 151, in get_raw_stats > f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC) > OSError: [Errno 2] No such file or directory: > '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8' > StatusStorageThread::ERROR::2018-01-12 > 12:32:06,049::status_broker::92::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) > Failed to read state. > Traceback (most recent call last): > File > "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", > line 88, in run > self._storage_broker.get_raw_stats() > File > "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", > line 162, in get_raw_stats > .format(str(e))) > RequestError: failed to read metadata: [Errno 2] No such file or directory: > '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8' > > On Fri, Jan 12, 2018 at 12:02 PM, Martin Sivak wrote: >> >> The lock is the issue. >> >> - try running sanlock client status on all hosts >> - also make sure you do not have some forgotten host still connected >> to the lockspace, but without ha daemons running (and with the VM) >> >> I need to go to our president election now, I might check the email >> later tonight. >> >> Marti
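The broker traceback above is the telling part: os.open() fails with ENOENT on a path that ls can still list. That combination is the classic signature of a dangling symlink (the directory entry exists but its target does not), which matches the missing-symlink discussion later in the thread. A minimal sketch in plain Python (not vdsm or broker code; the scratch paths are invented for illustration) of how to tell the cases apart:

```python
import errno
import os
import tempfile

def classify(path):
    """Explain why open(path) might raise ENOENT."""
    if not os.path.lexists(path):
        return "missing"            # no directory entry at all
    if os.path.islink(path) and not os.path.exists(path):
        return "dangling-symlink"   # entry exists, target does not
    return "ok"

# Simulate the broker's situation in a scratch directory.
workdir = tempfile.mkdtemp()
link = os.path.join(workdir, "metadata")
os.symlink(os.path.join(workdir, "no-such-target"), link)

state = classify(link)
try:
    # The same open() the broker does; it fails even though
    # `ls -l` would happily show the entry.
    os.open(link, os.O_RDONLY)
    raised = None
except OSError as e:
    raised = e.errno

print(state, raised == errno.ENOENT)
```

On a healthy host the metadata entry under /var/run/vdsm/storage/... should be a symlink whose target under /rhev/ resolves; os.readlink() (or readlink(1)) shows where it points.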
Re: [ovirt-users] unable to bring up hosted engine after botched 4.2 upgrade
> Can you please stop all hosted engine tooling (...) On all hosts, I should have added. Martin On Fri, Jan 12, 2018 at 3:22 PM, Martin Sivak wrote: >> RequestError: failed to read metadata: [Errno 2] No such file or directory: >> '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8' >> >> ls -al >> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 >> -rw-rw. 1 vdsm kvm 1028096 Jan 12 09:59 >> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 >> >> Is this due to the symlink problem you guys are referring to that was >> addressed in RC1 or something else? > > No, this file is the symlink. It should point to somewhere inside > /rhev/. I see it is a 1G file in your case. That is really > interesting. > > Can you please stop all hosted engine tooling (ovirt-ha-agent, > ovirt-ha-broker), move the file (metadata file is not important when > services are stopped, but better safe than sorry) and restart all > services again? > >> Could there possibly be a permissions >> problem somewhere? > > Maybe, but the file itself looks out of the ordinary. I wonder how it got > there. > > Best regards > > Martin Sivak > > On Fri, Jan 12, 2018 at 3:09 PM, Jayme wrote: >> Thanks for the help thus far. Storage could be related but all other VMs on >> same storage are running ok. The storage is mounted via NFS from within one >> of the three hosts, I realize this is not ideal. This was setup by a >> previous admin more as a proof of concept and VMs were put on there that >> should not have been placed in a proof of concept environment.. it was >> intended to be rebuilt with proper storage down the road.
>> >> So the storage is on HOST0 and the other hosts mount NFS >> >> cultivar0.grove.silverorange.com:/exports/data 4861742080 >> 1039352832 3822389248 22% >> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_data >> cultivar0.grove.silverorange.com:/exports/iso 4861742080 >> 1039352832 3822389248 22% >> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_iso >> cultivar0.grove.silverorange.com:/exports/import_export 4861742080 >> 1039352832 3822389248 22% >> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_import__export >> cultivar0.grove.silverorange.com:/exports/hosted_engine 4861742080 >> 1039352832 3822389248 22% >> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_hosted__engine >> >> Like I said, the VM data storage itself seems to be working ok, as all other >> VMs appear to be running. >> >> I'm curious why the broker log says this file is not found when it is >> correct and I can see the file at that path: >> >> RequestError: failed to read metadata: [Errno 2] No such file or directory: >> '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8' >> >> ls -al >> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 >> -rw-rw. 1 vdsm kvm 1028096 Jan 12 09:59 >> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 >> >> Is this due to the symlink problem you guys are referring to that was >> addressed in RC1 or something else? Could there possibly be a permissions >> problem somewhere? >> >> Assuming that all three hosts have 4.2 rpms installed and the host engine >> will not start is it safe for me to update hosts to 4.2 RC1 rpms? Or >> perhaps install that repo and *only* update the ovirt HA packages? 
>> Assuming that I cannot yet apply the same updates to the inaccessible hosted >> engine VM. >> >> I should also mention one more thing. I originally upgraded the engine VM >> first using new RPMS then engine-setup. It failed due to not being in >> global maintenance, so I set global maintenance and ran it again, which >> appeared to complete as intended but never came back up after. Just in case >> this might have anything at all to do with what could have happened. >> >> Thanks very much again, I very much appreciate the help! >> >> - Jayme >> >> On Fri, Jan 12, 2018 at 8:44 AM, Simone Tiraboschi >> wrote: >>> >>> >>> >>> On Fri, Jan 12, 2018 at 11:11 AM, Martin Sivak wrote: Hi, the hosted engine agent issue might be fixed by restarting ovirt-ha-broker or updating to newest ovirt-hosted-engine-ha and -setup. We improved handling of the missing symlink. >>> >>> >>> Available just in oVirt 4.2.1 RC1 >>> All the other issues seem to point to some storage problem I am afraid. You said you started the VM, do you see it in virsh -r list? Best regards Martin Sivak On Th
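The recovery Martin proposes in this exchange (stop ovirt-ha-agent and ovirt-ha-broker on every host, move the suspect metadata file aside rather than deleting it, then restart the services) can be sketched as below. The systemctl steps appear only as comments; the move_aside helper and the scratch directory are illustrative, not the real /var/run/vdsm path:

```python
import os
import shutil
import tempfile
import time

# On each host, around the file handling (not executed here):
#   systemctl stop ovirt-ha-agent ovirt-ha-broker
#   ... move the metadata file aside ...
#   systemctl start ovirt-ha-broker ovirt-ha-agent

def move_aside(path):
    """Rename a suspect file to a timestamped backup ("better safe
    than sorry"); the agent repopulates the metadata on restart."""
    backup = "%s.bak.%d" % (path, int(time.time()))
    shutil.move(path, backup)
    return backup

# Demo against a scratch directory instead of the live storage path.
workdir = tempfile.mkdtemp()
suspect = os.path.join(workdir, "metadata")
with open(suspect, "w") as f:
    f.write("stale")

backup = move_aside(suspect)
print(os.path.exists(suspect), os.path.exists(backup))
```

The original name is then free for the agent to recreate, and the odd 1G file is preserved for later inspection.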
Re: [ovirt-users] unable to bring up hosted engine after botched 4.2 upgrade
> RequestError: failed to read metadata: [Errno 2] No such file or directory: > '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8' > > ls -al > /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 > -rw-rw. 1 vdsm kvm 1028096 Jan 12 09:59 > /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 > > Is this due to the symlink problem you guys are referring to that was > addressed in RC1 or something else? No, this file is the symlink. It should point to somewhere inside /rhev/. I see it is a 1G file in your case. That is really interesting. Can you please stop all hosted engine tooling (ovirt-ha-agent, ovirt-ha-broker), move the file (metadata file is not important when services are stopped, but better safe than sorry) and restart all services again? > Could there possibly be a permissions > problem somewhere? Maybe, but the file itself looks out of the ordinary. I wonder how it got there. Best regards Martin Sivak On Fri, Jan 12, 2018 at 3:09 PM, Jayme wrote: > Thanks for the help thus far. Storage could be related but all other VMs on > same storage are running ok. The storage is mounted via NFS from within one > of the three hosts, I realize this is not ideal. This was setup by a > previous admin more as a proof of concept and VMs were put on there that > should not have been placed in a proof of concept environment... it was > intended to be rebuilt with proper storage down the road.
> > So the storage is on HOST0 and the other hosts mount NFS > > cultivar0.grove.silverorange.com:/exports/data 4861742080 > 1039352832 3822389248 22% > /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_data > cultivar0.grove.silverorange.com:/exports/iso 4861742080 > 1039352832 3822389248 22% > /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_iso > cultivar0.grove.silverorange.com:/exports/import_export 4861742080 > 1039352832 3822389248 22% > /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_import__export > cultivar0.grove.silverorange.com:/exports/hosted_engine 4861742080 > 1039352832 3822389248 22% > /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_hosted__engine > > Like I said, the VM data storage itself seems to be working ok, as all other > VMs appear to be running. > > I'm curious why the broker log says this file is not found when it is > correct and I can see the file at that path: > > RequestError: failed to read metadata: [Errno 2] No such file or directory: > '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8' > > ls -al > /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 > -rw-rw. 1 vdsm kvm 1028096 Jan 12 09:59 > /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 > > Is this due to the symlink problem you guys are referring to that was > addressed in RC1 or something else? Could there possibly be a permissions > problem somewhere? > > Assuming that all three hosts have 4.2 rpms installed and the host engine > will not start is it safe for me to update hosts to 4.2 RC1 rpms? Or > perhaps install that repo and *only* update the ovirt HA packages? > Assuming that I cannot yet apply the same updates to the inaccessible hosted > engine VM. 
> > I should also mention one more thing. I originally upgraded the engine VM > first using new RPMS then engine-setup. It failed due to not being in > global maintenance, so I set global maintenance and ran it again, which > appeared to complete as intended but never came back up after. Just in case > this might have anything at all to do with what could have happened. > > Thanks very much again, I very much appreciate the help! > > - Jayme > > On Fri, Jan 12, 2018 at 8:44 AM, Simone Tiraboschi > wrote: >> >> >> >> On Fri, Jan 12, 2018 at 11:11 AM, Martin Sivak wrote: >>> >>> Hi, >>> >>> the hosted engine agent issue might be fixed by restarting >>> ovirt-ha-broker or updating to newest ovirt-hosted-engine-ha and >>> -setup. We improved handling of the missing symlink. >> >> >> Available just in oVirt 4.2.1 RC1 >> >>> >>> >>> All the other issues seem to point to some storage problem I am afraid. >>> >>> You said you started the VM, do you see it in virsh -r list? >>> >>> Best regards >>> >>> Martin Sivak >>> >>> On Thu, Jan 11, 2018 at 10:00 PM, Jayme wrote: >>> > Please help, I'm really not sure what else to try at this point. Thank >>> > you >>> > for reading! >>> > >>> > >>> > I'm still working on trying to get my hosted engine running after a >>> > botched >>> > upgrade to 4.2. St
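Martin's suggestion to check `virsh -r list` can be scripted; below is a small parser for the table virsh prints. The sample output is invented to match the thread's logs, where the engine guest runs under the libvirt name Cultivar (which may explain why the later `hosted-engine` console attempt finds no domain literally named HostedEngine):

```python
def running_domains(virsh_list_output):
    """Parse the table printed by `virsh -r list` into
    (id, name, state) tuples, skipping the header and separator."""
    rows = []
    for line in virsh_list_output.splitlines():
        parts = line.split(None, 2)
        # Data rows start with a numeric domain id; the header
        # ("Id Name State") and the dashed separator do not.
        if len(parts) == 3 and parts[0].isdigit():
            rows.append((int(parts[0]), parts[1], parts[2].strip()))
    return rows

# Example output only: what a host with the engine VM running might print.
sample = """ Id    Name                           State
----------------------------------------------------
 108   Cultivar                       running
"""

print(running_domains(sample))
```

If the list is empty on every host, libvirt never started the guest and the storage errors above are the place to look.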
Re: [ovirt-users] unable to bring up hosted engine after botched 4.2 upgrade
The blockIoTune error should be harmless. It is just a result of a data check by another component (MOM) that encountered a VM that no longer exists. I thought we squashed all the logs like that though... Martin On Fri, Jan 12, 2018 at 3:12 PM, Jayme wrote: > One more thing to add, I've also been seeing a lot of this in the syslog as > well: > > Jan 12 10:10:49 cultivar2 journal: vdsm jsonrpc.JsonRpcServer ERROR Internal > server error#012Traceback (most recent call last):#012  File > "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in > _handle_request#012res = method(**params)#012  File > "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 197, in > _dynamicMethod#012result = fn(*methodArgs)#012  File "", line 2, > in getAllVmIoTunePolicies#012  File > "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in > method#012ret = func(*args, **kwargs)#012  File > "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1354, in > getAllVmIoTunePolicies#012io_tune_policies_dict = > self._cif.getAllVmIoTunePolicies()#012  File > "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 524, in > getAllVmIoTunePolicies#012'current_values': v.getIoTune()}#012  File > "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3481, in > getIoTune#012result = self.getIoTuneResponse()#012  File > "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3500, in > getIoTuneResponse#012res = self._dom.blockIoTune(#012  File > "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in > __getattr__#012% self.vmid)#012NotConnectedError: VM > '4013c829-c9d7-4b72-90d5-6fe58137504c' was not defined yet or was undefined > > On Fri, Jan 12, 2018 at 10:09 AM, Jayme wrote: >> >> Thanks for the help thus far. Storage could be related but all other VMs >> on same storage are running ok. The storage is mounted via NFS from within >> one of the three hosts, I realize this is not ideal.
This was setup by a >> previous admin more as a proof of concept and VMs were put on there that >> should not have been placed in a proof of concept environment.. it was >> intended to be rebuilt with proper storage down the road. >> >> So the storage is on HOST0 and the other hosts mount NFS >> >> cultivar0.grove.silverorange.com:/exports/data 4861742080 >> 1039352832 3822389248 22% >> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_data >> cultivar0.grove.silverorange.com:/exports/iso 4861742080 >> 1039352832 3822389248 22% >> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_iso >> cultivar0.grove.silverorange.com:/exports/import_export 4861742080 >> 1039352832 3822389248 22% >> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_import__export >> cultivar0.grove.silverorange.com:/exports/hosted_engine 4861742080 >> 1039352832 3822389248 22% >> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_hosted__engine >> >> Like I said, the VM data storage itself seems to be working ok, as all >> other VMs appear to be running. >> >> I'm curious why the broker log says this file is not found when it is >> correct and I can see the file at that path: >> >> RequestError: failed to read metadata: [Errno 2] No such file or >> directory: >> '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8' >> >> ls -al >> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 >> -rw-rw. 1 vdsm kvm 1028096 Jan 12 09:59 >> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 >> >> Is this due to the symlink problem you guys are referring to that was >> addressed in RC1 or something else? Could there possibly be a permissions >> problem somewhere? 
>> >> Assuming that all three hosts have 4.2 rpms installed and the host engine >> will not start is it safe for me to update hosts to 4.2 RC1 rpms? Or >> perhaps install that repo and *only* update the ovirt HA packages? >> Assuming that I cannot yet apply the same updates to the inaccessible hosted >> engine VM. >> >> I should also mention one more thing. I originally upgraded the engine VM >> first using new RPMS then engine-setup. It failed due to not being in >> global maintenance, so I set global maintenance and ran it again, which >> appeared to complete as intended but never came back up after. Just in case >> this might have anything at all to do with what could have happened. >> >> Thanks very much again, I very much appreciate the help! >> >> - Jayme >> >> On Fri, Jan 12, 2018 at 8:44 AM, Simone Tiraboschi >> wrote: >>> >>> >>> >>> On Fri, Jan 12, 2018 at 11:11 AM, Martin Sivak wrote: Hi, the hosted engine agent issue might be fixed by restarting ovirt-ha-broker or updating to newest ovirt-host
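Martin explains above that the blockIoTune noise comes from a poller (MOM) racing with a VM that has already gone away. The usual fix for that pattern is to treat "VM no longer defined" as an empty result rather than an internal error. A sketch with a stand-in exception class (not vdsm's actual virdomain.NotConnectedError):

```python
class NotConnectedError(Exception):
    """Stand-in for vdsm's virdomain.NotConnectedError."""

def get_io_tune_safe(dom):
    """Poll a per-VM stat, but swallow the 'VM was not defined yet
    or was undefined' race instead of logging a server error."""
    try:
        return dom.blockIoTune()
    except NotConnectedError:
        return []  # VM vanished between listing and polling

class GoneVM:
    """A domain object whose backing VM has already been undefined."""
    def blockIoTune(self):
        raise NotConnectedError("was not defined yet or was undefined")

print(get_io_tune_safe(GoneVM()))
```

This is only an illustration of why the log line is harmless; it does not change anything on the hosts.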
Re: [ovirt-users] unable to bring up hosted engine after botched 4.2 upgrade
One more thing to add, I've also been seeing a lot of this in the syslog as well: Jan 12 10:10:49 cultivar2 journal: vdsm jsonrpc.JsonRpcServer ERROR Internal server error#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in _handle_request#012res = method(**params)#012 File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 197, in _dynamicMethod#012result = fn(*methodArgs)#012 File "", line 2, in getAllVmIoTunePolicies#012 File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method#012ret = func(*args, **kwargs)#012 File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1354, in getAllVmIoTunePolicies#012io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()#012 File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 524, in getAllVmIoTunePolicies#012'current_values': v.getIoTune()}#012 File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3481, in getIoTune#012result = self.getIoTuneResponse()#012 File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3500, in getIoTuneResponse#012res = self._dom.blockIoTune(#012 File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in __getattr__#012% self.vmid)#012NotConnectedError: VM '4013c829-c9d7-4b72-90d5-6fe58137504c' was not defined yet or was undefined On Fri, Jan 12, 2018 at 10:09 AM, Jayme wrote: > Thanks for the help thus far. Storage could be related but all other VMs > on same storage are running ok. The storage is mounted via NFS from within > one of the three hosts, I realize this is not ideal. This was setup by a > previous admin more as a proof of concept and VMs were put on there that > should not have been placed in a proof of concept environment.. it was > intended to be rebuilt with proper storage down the road. 
> > So the storage is on HOST0 and the other hosts mount NFS > > cultivar0.grove.silverorange.com:/exports/data 4861742080 > 1039352832 3822389248 22% /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_data > cultivar0.grove.silverorange.com:/exports/iso 4861742080 > 1039352832 3822389248 22% /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_iso > cultivar0.grove.silverorange.com:/exports/import_export 4861742080 > 1039352832 3822389248 22% /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_import__export > cultivar0.grove.silverorange.com:/exports/hosted_engine 4861742080 > 1039352832 3822389248 22% /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_hosted__engine > > Like I said, the VM data storage itself seems to be working ok, as all > other VMs appear to be running. > > I'm curious why the broker log says this file is not found when it is > correct and I can see the file at that path: > > RequestError: failed to read metadata: [Errno 2] No such file or > directory: '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8' > > ls -al /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 > -rw-rw. 1 vdsm kvm 1028096 Jan 12 09:59 /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 > > Is this due to the symlink problem you guys are referring to that was > addressed in RC1 or something else? Could there possibly be a permissions > problem somewhere? > > Assuming that all three hosts have 4.2 rpms installed and the host engine > will not start is it safe for me to update hosts to 4.2 RC1 rpms? Or > perhaps install that repo and *only* update the ovirt HA packages? > Assuming that I cannot yet apply the same updates to the inaccessible > hosted engine VM.
> > I should also mention one more thing. I originally upgraded the engine VM > first using new RPMS then engine-setup. It failed due to not being in > global maintenance, so I set global maintenance and ran it again, which > appeared to complete as intended but never came back up after. Just in > case this might have anything at all to do with what could have happened. > > Thanks very much again, I very much appreciate the help! > > - Jayme > > On Fri, Jan 12, 2018 at 8:44 AM, Simone Tiraboschi > wrote: > >> >> >> On Fri, Jan 12, 2018 at 11:11 AM, Martin Sivak wrote: >> >>> Hi, >>> >>> the hosted engine agent issue might be fixed by restarting >>> ovirt-ha-broker or updating to newest ovirt-hosted-engine-ha and >>> -setup. We improved handling of the missing symlink. >>> >> >> Available just in oVirt 4.2.1 RC1 >> >> >>> >>> All the other issues seem to point to some storage problem I am afraid. >>> >>> You said you started the VM, do you see it in virsh -r list? >>> >>> Best regards >>> >>> Martin Sivak >>> >>> On Thu, Jan 11, 2018 at 10:00 PM, Jayme wrote:
Re: [ovirt-users] unable to bring up hosted engine after botched 4.2 upgrade
Thanks for the help thus far. Storage could be related but all other VMs on same storage are running ok. The storage is mounted via NFS from within one of the three hosts, I realize this is not ideal. This was setup by a previous admin more as a proof of concept and VMs were put on there that should not have been placed in a proof of concept environment... it was intended to be rebuilt with proper storage down the road. So the storage is on HOST0 and the other hosts mount NFS cultivar0.grove.silverorange.com:/exports/data 4861742080 1039352832 3822389248 22% /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_data cultivar0.grove.silverorange.com:/exports/iso 4861742080 1039352832 3822389248 22% /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_iso cultivar0.grove.silverorange.com:/exports/import_export 4861742080 1039352832 3822389248 22% /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_import__export cultivar0.grove.silverorange.com:/exports/hosted_engine 4861742080 1039352832 3822389248 22% /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_hosted__engine Like I said, the VM data storage itself seems to be working ok, as all other VMs appear to be running. I'm curious why the broker log says this file is not found when it is correct and I can see the file at that path: RequestError: failed to read metadata: [Errno 2] No such file or directory: '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8' ls -al /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 -rw-rw. 1 vdsm kvm 1028096 Jan 12 09:59 /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 Is this due to the symlink problem you guys are referring to that was addressed in RC1 or something else?
Could there possibly be a permissions problem somewhere? Assuming that all three hosts have 4.2 rpms installed and the host engine will not start is it safe for me to update hosts to 4.2 RC1 rpms? Or perhaps install that repo and *only* update the ovirt HA packages? Assuming that I cannot yet apply the same updates to the inaccessible hosted engine VM. I should also mention one more thing. I originally upgraded the engine VM first using new RPMS then engine-setup. It failed due to not being in global maintenance, so I set global maintenance and ran it again, which appeared to complete as intended but never came back up after. Just in case this might have anything at all to do with what could have happened. Thanks very much again, I very much appreciate the help! - Jayme On Fri, Jan 12, 2018 at 8:44 AM, Simone Tiraboschi wrote: > > > On Fri, Jan 12, 2018 at 11:11 AM, Martin Sivak wrote: > >> Hi, >> >> the hosted engine agent issue might be fixed by restarting >> ovirt-ha-broker or updating to newest ovirt-hosted-engine-ha and >> -setup. We improved handling of the missing symlink. >> > > Available just in oVirt 4.2.1 RC1 > > >> >> All the other issues seem to point to some storage problem I am afraid. >> >> You said you started the VM, do you see it in virsh -r list? >> >> Best regards >> >> Martin Sivak >> >> On Thu, Jan 11, 2018 at 10:00 PM, Jayme wrote: >> > Please help, I'm really not sure what else to try at this point. Thank >> you >> > for reading! >> > >> > >> > I'm still working on trying to get my hosted engine running after a >> botched >> > upgrade to 4.2. Storage is NFS mounted from within one of the hosts. >> Right >> > now I have 3 centos7 hosts that are fully updated with yum packages from >> > ovirt 4.2, the engine was fully updated with yum packages and failed to >> come >> > up after reboot. As of right now, everything should have full yum >> updates >> > and all having 4.2 rpms. 
I have global maintenance mode on right now >> and >> > started hosted-engine on one of the three host and the status is >> currently: >> > Engine status : {"reason": "failed liveliness check", "health": "bad", >> "vm": >> > "up", "detail": "Up"} >> > >> > >> > this is what I get when trying to enter hosted-vm --console >> > >> > >> > The engine VM is running on this host >> > >> > error: failed to get domain 'HostedEngine' >> > >> > error: Domain not found: no domain with matching name 'HostedEngine' >> > >> > >> > Here are logs from various sources when I start the VM on HOST3: >> > >> > >> > hosted-engine --vm-start >> > >> > Command VM.getStats with args {'vmID': >> > '4013c829-c9d7-4b72-90d5-6fe58137504c'} failed: >> > >> > (code=1, message=Virtual machine does not exist: {'vmId': >> > u'4013c829-c9d7-4b72-90d5-6fe58137504c'}) >> > >> > >> > Jan 11 16:55:57 cultivar3 systemd-machined: New machine >> qemu-110-Cultivar. >> > >> > Jan 11 16:55:57 cultivar3 systemd: Started Virtual Machine >> > qemu-110-Cultiv
Re: [ovirt-users] unable to bring up hosted engine after botched 4.2 upgrade
On Fri, Jan 12, 2018 at 11:11 AM, Martin Sivak wrote: > Hi, > > the hosted engine agent issue might be fixed by restarting > ovirt-ha-broker or updating to newest ovirt-hosted-engine-ha and > -setup. We improved handling of the missing symlink. > Available just in oVirt 4.2.1 RC1 > > All the other issues seem to point to some storage problem I am afraid. > > You said you started the VM, do you see it in virsh -r list? > > Best regards > > Martin Sivak > > On Thu, Jan 11, 2018 at 10:00 PM, Jayme wrote: > > Please help, I'm really not sure what else to try at this point. Thank > you > > for reading! > > > > > > I'm still working on trying to get my hosted engine running after a > botched > > upgrade to 4.2. Storage is NFS mounted from within one of the hosts. > Right > > now I have 3 centos7 hosts that are fully updated with yum packages from > > ovirt 4.2, the engine was fully updated with yum packages and failed to > come > > up after reboot. As of right now, everything should have full yum > updates > > and all having 4.2 rpms. I have global maintenance mode on right now and > > started hosted-engine on one of the three host and the status is > currently: > > Engine status : {"reason": "failed liveliness check", "health": "bad", > "vm": > > "up", "detail": "Up"} > > > > > > this is what I get when trying to enter hosted-vm --console > > > > > > The engine VM is running on this host > > > > error: failed to get domain 'HostedEngine' > > > > error: Domain not found: no domain with matching name 'HostedEngine' > > > > > > Here are logs from various sources when I start the VM on HOST3: > > > > > > hosted-engine --vm-start > > > > Command VM.getStats with args {'vmID': > > '4013c829-c9d7-4b72-90d5-6fe58137504c'} failed: > > > > (code=1, message=Virtual machine does not exist: {'vmId': > > u'4013c829-c9d7-4b72-90d5-6fe58137504c'}) > > > > > > Jan 11 16:55:57 cultivar3 systemd-machined: New machine > qemu-110-Cultivar.
> > > > Jan 11 16:55:57 cultivar3 systemd: Started Virtual Machine > > qemu-110-Cultivar. > > > > Jan 11 16:55:57 cultivar3 systemd: Starting Virtual Machine > > qemu-110-Cultivar. > > > > Jan 11 16:55:57 cultivar3 kvm: 3 guests now active > > > > > > ==> /var/log/vdsm/vdsm.log <== > > > > File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, > in > > method > > > > ret = func(*args, **kwargs) > > > > File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line > 2718, in > > getStorageDomainInfo > > > > dom = self.validateSdUUID(sdUUID) > > > > File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line > 304, in > > validateSdUUID > > > > sdDom.validate() > > > > File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line > 515, > > in validate > > > > raise se.StorageDomainAccessError(self.sdUUID) > > > > StorageDomainAccessError: Domain is either partially accessible or > entirely > > inaccessible: (u'248f46f0-d793-4581-9810-c9d965e2f286',) > > > > jsonrpc/2::ERROR::2018-01-11 > > 16:55:16,144::dispatcher::82::storage.Dispatcher::(wrapper) FINISH > > getStorageDomainInfo error=Domain is either partially accessible or > entirely > > inaccessible: (u'248f46f0-d793-4581-9810-c9d965e2f286',) > > > > > > ==> /var/log/libvirt/qemu/Cultivar.log <== > > > > LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin > > QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name > > guest=Cultivar,debug-threads=on -S -object > > secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-108-Cultivar/master-key.aes > > -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu > > Conroe -m 8192 -realtime mlock=off -smp > > 2,maxcpus=16,sockets=16,cores=1,threads=1 -uuid > > 4013c829-c9d7-4b72-90d5-6fe58137504c -smbios > > 'type=1,manufacturer=oVirt,product=oVirt > > Node,version=7-4.1708.el7.centos,serial=44454C4C-4300-1034-8035-CAC04F424331,uuid=4013c829-c9d7-4b72-90d5-6fe58137504c' > > -no-user-config -nodefaults
-chardev > > socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain- > 108-Cultivar/monitor.sock,server,nowait > > -mon chardev=charmonitor,id=monitor,mode=control -rtc > > base=2018-01-11T20:33:19,driftfix=slew -global > > kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -boot strict=on > -device > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive > > file=/var/run/vdsm/storage/248f46f0-d793-4581-9810- > c9d965e2f286/c2dde892-f978-4dfc-a421-c8e04cf387f9/23aa0a66-fa6c-4967-a1e5- > fbe47c0cd705,format=raw,if=none,id=drive-virtio-disk0, > serial=c2dde892-f978-4dfc-a421-c8e04cf387f9,cache=none, > werror=stop,rerror=stop,aio=threads > > -device > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive- > virtio-disk0,id=virtio-disk0,bootindex=1 > > -drive if=none,id=drive-ide0-1-0,readonly=on -device > > ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev > > tap,fd=30,id=hostnet0,vhost=on,vhostfd=32 -device > > virtio-n
On Jan 12, 2018 11:43 AM, "Jayme" wrote:

Please help, I'm really not sure what else to try at this point. Thank you for reading!

I'm still working on trying to get my hosted engine running after a botched upgrade to 4.2. Storage is NFS mounted from within one of the hosts.

That's not a good idea. What happens when the host fails?
Y.

Right now I have 3 centos7 hosts that are fully updated with yum packages from ovirt 4.2, the engine was fully updated with yum packages and failed to come up after reboot. As of right now, everything should have full yum updates and all having 4.2 rpms. I have global maintenance mode on right now and started hosted-engine on one of the three hosts and the status is currently:

Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}

this is what I get when trying to enter hosted-vm --console

The engine VM is running on this host
error: failed to get domain 'HostedEngine'
error: Domain not found: no domain with matching name 'HostedEngine'

Here are logs from various sources when I start the VM on HOST3:

hosted-engine --vm-start

Command VM.getStats with args {'vmID': '4013c829-c9d7-4b72-90d5-6fe58137504c'} failed:
(code=1, message=Virtual machine does not exist: {'vmId': u'4013c829-c9d7-4b72-90d5-6fe58137504c'})

Jan 11 16:55:57 cultivar3 systemd-machined: New machine qemu-110-Cultivar.
Jan 11 16:55:57 cultivar3 systemd: Started Virtual Machine qemu-110-Cultivar.
Jan 11 16:55:57 cultivar3 systemd: Starting Virtual Machine qemu-110-Cultivar.
Jan 11 16:55:57 cultivar3 kvm: 3 guests now active

==> /var/log/vdsm/vdsm.log <==
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2718, in getStorageDomainInfo
    dom = self.validateSdUUID(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 304, in validateSdUUID
    sdDom.validate()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 515, in validate
    raise se.StorageDomainAccessError(self.sdUUID)
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: (u'248f46f0-d793-4581-9810-c9d965e2f286',)

jsonrpc/2::ERROR::2018-01-11 16:55:16,144::dispatcher::82::storage.Dispatcher::(wrapper) FINISH getStorageDomainInfo error=Domain is either partially accessible or entirely inaccessible: (u'248f46f0-d793-4581-9810-c9d965e2f286',)

==> /var/log/libvirt/qemu/Cultivar.log <==
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name guest=Cultivar,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-108-Cultivar/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu Conroe -m 8192 -realtime mlock=off -smp 2,maxcpus=16,sockets=16,cores=1,threads=1 -uuid 4013c829-c9d7-4b72-90d5-6fe58137504c -smbios 'type=1,manufacturer=oVirt,product=oVirt Node,version=7-4.1708.el7.centos,serial=44454C4C-4300-1034-8035-CAC04F424331,uuid=4013c829-c9d7-4b72-90d5-6fe58137504c' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-108-Cultivar/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2018-01-11T20:33:19,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/c2dde892-f978-4dfc-a421-c8e04cf387f9/23aa0a66-fa6c-4967-a1e5-fbe47c0cd705,format=raw,if=none,id=drive-virtio-disk0,serial=c2dde892-f978-4dfc-a421-c8e04cf387f9,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=32 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:7f:d6:83,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/4013c829-c9d7-4b72-90d5-6fe58137504c.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/4013c829-c9d7-4b72-90d5-6fe58137504c.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -chardev socket,id=charchannel3,path=/var/lib/libvirt/qemu/channels/4013c829-c9d7-4b72-90d5-6fe5813750
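The status blob above is the key symptom: "vm": "up" together with "health": "bad" means the qemu process is running, but the engine inside the VM is not answering the HA agent's liveliness check. A minimal sketch of that distinction (the key names come from the status output above; the helper and its return strings are mine, not ovirt-hosted-engine-ha code):

```python
import json

def interpret_engine_status(raw):
    """Classify a hosted-engine status blob like the one quoted above.

    The keys ("vm", "health", "reason") appear in the status output in
    this thread; the classification itself is an illustrative sketch.
    """
    status = json.loads(raw)
    if status.get("vm") != "up":
        # The VM process itself never started -- look at vdsm/libvirt.
        return "vm-not-running"
    if status.get("health") == "bad":
        # qemu is up, but the engine inside is unresponsive -- look
        # inside the guest (console, fsck, engine services).
        return "vm-running-engine-unresponsive: " + status.get("reason", "")
    return "ok"

raw = ('{"reason": "failed liveliness check", '
       '"health": "bad", "vm": "up", "detail": "Up"}')
print(interpret_engine_status(raw))
```

Read this way, the status suggests the next step is diagnosing inside the guest rather than re-checking how libvirt starts the VM.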
Hi,

the hosted engine agent issue might be fixed by restarting ovirt-ha-broker or updating to the newest ovirt-hosted-engine-ha and -setup. We improved handling of the missing symlink.

All the other issues seem to point to some storage problem, I am afraid.

You said you started the VM, do you see it in virsh -r list?

Best regards

Martin Sivak

On Thu, Jan 11, 2018 at 10:00 PM, Jayme wrote:
> Please help, I'm really not sure what else to try at this point. Thank you
> for reading!
>
> I'm still working on trying to get my hosted engine running after a botched
> upgrade to 4.2. Storage is NFS mounted from within one of the hosts. Right
> now I have 3 centos7 hosts that are fully updated with yum packages from
> ovirt 4.2, the engine was fully updated with yum packages and failed to come
> up after reboot. As of right now, everything should have full yum updates
> and all having 4.2 rpms. I have global maintenance mode on right now and
> started hosted-engine on one of the three hosts and the status is currently:
>
> Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}
>
> this is what I get when trying to enter hosted-vm --console
>
> The engine VM is running on this host
> error: failed to get domain 'HostedEngine'
> error: Domain not found: no domain with matching name 'HostedEngine'
>
> Here are logs from various sources when I start the VM on HOST3:
>
> hosted-engine --vm-start
>
> Command VM.getStats with args {'vmID': '4013c829-c9d7-4b72-90d5-6fe58137504c'} failed:
> (code=1, message=Virtual machine does not exist: {'vmId': u'4013c829-c9d7-4b72-90d5-6fe58137504c'})
>
> Jan 11 16:55:57 cultivar3 systemd-machined: New machine qemu-110-Cultivar.
> Jan 11 16:55:57 cultivar3 systemd: Started Virtual Machine qemu-110-Cultivar.
> Jan 11 16:55:57 cultivar3 systemd: Starting Virtual Machine qemu-110-Cultivar.
> Jan 11 16:55:57 cultivar3 kvm: 3 guests now active
>
> ==> /var/log/vdsm/vdsm.log <==
>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
>     ret = func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2718, in getStorageDomainInfo
>     dom = self.validateSdUUID(sdUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 304, in validateSdUUID
>     sdDom.validate()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 515, in validate
>     raise se.StorageDomainAccessError(self.sdUUID)
> StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: (u'248f46f0-d793-4581-9810-c9d965e2f286',)
>
> jsonrpc/2::ERROR::2018-01-11 16:55:16,144::dispatcher::82::storage.Dispatcher::(wrapper) FINISH getStorageDomainInfo error=Domain is either partially accessible or entirely inaccessible: (u'248f46f0-d793-4581-9810-c9d965e2f286',)
>
> ==> /var/log/libvirt/qemu/Cultivar.log <==
> LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name guest=Cultivar,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-108-Cultivar/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu Conroe -m 8192 -realtime mlock=off -smp 2,maxcpus=16,sockets=16,cores=1,threads=1 -uuid 4013c829-c9d7-4b72-90d5-6fe58137504c -smbios 'type=1,manufacturer=oVirt,product=oVirt Node,version=7-4.1708.el7.centos,serial=44454C4C-4300-1034-8035-CAC04F424331,uuid=4013c829-c9d7-4b72-90d5-6fe58137504c' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-108-Cultivar/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2018-01-11T20:33:19,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/c2dde892-f978-4dfc-a421-c8e04cf387f9/23aa0a66-fa6c-4967-a1e5-fbe47c0cd705,format=raw,if=none,id=drive-virtio-disk0,serial=c2dde892-f978-4dfc-a421-c8e04cf387f9,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=32 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:7f:d6:83,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/4013c829-c9d7-4b72-90d5-6fe58137504c.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/li
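The missing-symlink situation Martin mentions can be checked mechanically: the agent's run directory must contain its expected files and symlinks (their content matters less, since the agents repopulate it). A sketch, with a hypothetical list of entry names -- the actual layout under /var/run/ovirt-hosted-engine-ha varies by version, so substitute the names your installation creates:

```python
import os

# Hypothetical entry names -- replace with the files/symlinks your
# ovirt-hosted-engine-ha version actually creates in its run directory.
EXPECTED_ENTRIES = ["vm.conf", "hosted-engine.metadata", "hosted-engine.lockspace"]

def missing_entries(run_dir, expected=EXPECTED_ENTRIES):
    """Return the expected agent files/symlinks absent from run_dir.

    os.path.lexists() reports a symlink even when its target is gone,
    so a dangling link counts as present (a different problem than a
    missing one).
    """
    return [name for name in expected
            if not os.path.lexists(os.path.join(run_dir, name))]
```

Anything this returns for the real run directory would be a candidate for the "missing symlink" case that restarting ovirt-ha-broker is supposed to repair.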
[ovirt-users] unable to bring up hosted engine after botched 4.2 upgrade
Please help, I'm really not sure what else to try at this point. Thank you for reading!

I'm still working on trying to get my hosted engine running after a botched upgrade to 4.2. Storage is NFS mounted from within one of the hosts. Right now I have 3 centos7 hosts that are fully updated with yum packages from ovirt 4.2, the engine was fully updated with yum packages and failed to come up after reboot. As of right now, everything should have full yum updates and all having 4.2 rpms. I have global maintenance mode on right now and started hosted-engine on one of the three hosts and the status is currently:

Engine status : {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}

this is what I get when trying to enter hosted-vm --console

The engine VM is running on this host
error: failed to get domain 'HostedEngine'
error: Domain not found: no domain with matching name 'HostedEngine'

Here are logs from various sources when I start the VM on HOST3:

hosted-engine --vm-start

Command VM.getStats with args {'vmID': '4013c829-c9d7-4b72-90d5-6fe58137504c'} failed:
(code=1, message=Virtual machine does not exist: {'vmId': u'4013c829-c9d7-4b72-90d5-6fe58137504c'})

Jan 11 16:55:57 cultivar3 systemd-machined: New machine qemu-110-Cultivar.
Jan 11 16:55:57 cultivar3 systemd: Started Virtual Machine qemu-110-Cultivar.
Jan 11 16:55:57 cultivar3 systemd: Starting Virtual Machine qemu-110-Cultivar.
Jan 11 16:55:57 cultivar3 kvm: 3 guests now active

==> /var/log/vdsm/vdsm.log <==
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2718, in getStorageDomainInfo
    dom = self.validateSdUUID(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 304, in validateSdUUID
    sdDom.validate()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 515, in validate
    raise se.StorageDomainAccessError(self.sdUUID)
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: (u'248f46f0-d793-4581-9810-c9d965e2f286',)

jsonrpc/2::ERROR::2018-01-11 16:55:16,144::dispatcher::82::storage.Dispatcher::(wrapper) FINISH getStorageDomainInfo error=Domain is either partially accessible or entirely inaccessible: (u'248f46f0-d793-4581-9810-c9d965e2f286',)

==> /var/log/libvirt/qemu/Cultivar.log <==
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name guest=Cultivar,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-108-Cultivar/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu Conroe -m 8192 -realtime mlock=off -smp 2,maxcpus=16,sockets=16,cores=1,threads=1 -uuid 4013c829-c9d7-4b72-90d5-6fe58137504c -smbios 'type=1,manufacturer=oVirt,product=oVirt Node,version=7-4.1708.el7.centos,serial=44454C4C-4300-1034-8035-CAC04F424331,uuid=4013c829-c9d7-4b72-90d5-6fe58137504c' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-108-Cultivar/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2018-01-11T20:33:19,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/c2dde892-f978-4dfc-a421-c8e04cf387f9/23aa0a66-fa6c-4967-a1e5-fbe47c0cd705,format=raw,if=none,id=drive-virtio-disk0,serial=c2dde892-f978-4dfc-a421-c8e04cf387f9,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=32 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:7f:d6:83,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/4013c829-c9d7-4b72-90d5-6fe58137504c.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/4013c829-c9d7-4b72-90d5-6fe58137504c.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -chardev socket,id=charchannel3,path=/var/lib/libvirt/qemu/channels/4013c829-c9d7-4b72-90d5-6fe58137504c.org.ovirt.hosted-engine-setup.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=4,chardev=charchannel3,id=c
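The StorageDomainAccessError in the traceback above is vdsm failing a basic sanity check on the storage-domain directory under the NFS mount. A rough stand-in for that check (the exception class and the simple directory test are illustrative; real vdsm's fileSD.validate() goes further and reads the domain metadata):

```python
import os

class StorageDomainAccessError(Exception):
    """Mimics vdsm's error for an unreachable storage domain (sketch)."""

def validate_domain_path(mnt_root, sd_uuid):
    """Raise if the storage-domain directory is missing or unreadable.

    mnt_root stands in for the mount point vdsm uses (for NFS, a
    directory under /rhev/data-center/mnt/); real vdsm also validates
    the domain's metadata, not just directory access.
    """
    path = os.path.join(mnt_root, sd_uuid)
    if not os.path.isdir(path) or not os.access(path, os.R_OK | os.X_OK):
        raise StorageDomainAccessError(sd_uuid)
    return path
```

Applied to this thread: if the equivalent check fails for 248f46f0-d793-4581-9810-c9d965e2f286 on the host, the NFS export itself (mount state, permissions, server availability) is the thing to inspect before anything VM-side.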