Re: [ovirt-users] unable to bring up hosted engine after botched 4.2 upgrade

2018-01-12 Thread Martin Sivak
Hi,

the VM is up according to the status (at least for a while). You
should be able to use the console now and diagnose anything that happened
inside (like the need for an fsck and such).

Check the presence of those links again now; the metadata file content
is not important, but the file has to exist (the agents will populate it
with status data). I have no new idea about what is wrong there,
though.
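
For reference, something along these lines should show whether the link is
back and where it points; the path is the one from the broker log quoted
further down:

  ls -l /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
  readlink -f /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8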

Best regards

Martin



On Fri, Jan 12, 2018 at 5:47 PM, Jayme  wrote:
> The lock space issue was something I needed to clear, but I don't believe it
> has resolved the problem.  I shut down the agent and broker on all hosts and
> disconnected hosted-storage, then enabled the broker/agent on just one host and
> connected storage.  I started the VM and barely got any errors in the logs
> at all, which was good to see; however, the VM is still not
> running:
>
> HOST3:
>
> Engine status  : {"reason": "failed liveliness check",
> "health": "bad", "vm": "up", "detail": "Up"}
>
> ==> /var/log/messages <==
> Jan 12 12:42:57 cultivar3 kernel: ovirtmgmt: port 2(vnet0) entered disabled
> state
> Jan 12 12:42:57 cultivar3 kernel: device vnet0 entered promiscuous mode
> Jan 12 12:42:57 cultivar3 kernel: ovirtmgmt: port 2(vnet0) entered blocking
> state
> Jan 12 12:42:57 cultivar3 kernel: ovirtmgmt: port 2(vnet0) entered
> forwarding state
> Jan 12 12:42:57 cultivar3 lldpad: recvfrom(Event interface): No buffer space
> available
> Jan 12 12:42:57 cultivar3 systemd-machined: New machine qemu-111-Cultivar.
> Jan 12 12:42:57 cultivar3 systemd: Started Virtual Machine
> qemu-111-Cultivar.
> Jan 12 12:42:57 cultivar3 systemd: Starting Virtual Machine
> qemu-111-Cultivar.
> Jan 12 12:42:57 cultivar3 kvm: 3 guests now active
> Jan 12 12:44:38 cultivar3 libvirtd: 2018-01-12 16:44:38.737+: 1535:
> error : qemuDomainAgentAvailable:6010 : Guest agent is not responding: QEMU
> guest agent is not connected
>
> Interestingly though, now I'm seeing this in the logs which may be a new
> clue:
>
>
> ==> /var/log/vdsm/vdsm.log <==
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 126,
> in findDomain
> return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 116,
> in findDomainPath
> raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist:
> (u'248f46f0-d793-4581-9810-c9d965e2f286',)
> jsonrpc/4::ERROR::2018-01-12
> 12:40:30,380::dispatcher::82::storage.Dispatcher::(wrapper) FINISH
> getStorageDomainInfo error=Storage domain does not exist:
> (u'248f46f0-d793-4581-9810-c9d965e2f286',)
> periodic/42::ERROR::2018-01-12 12:40:35,430::api::196::root::(_getHaInfo)
> failed to retrieve Hosted Engine HA score '[Errno 2] No such file or
> directory'Is the Hosted Engine setup finished?
> periodic/43::ERROR::2018-01-12 12:40:50,473::api::196::root::(_getHaInfo)
> failed to retrieve Hosted Engine HA score '[Errno 2] No such file or
> directory'Is the Hosted Engine setup finished?
> periodic/40::ERROR::2018-01-12 12:41:05,519::api::196::root::(_getHaInfo)
> failed to retrieve Hosted Engine HA score '[Errno 2] No such file or
> directory'Is the Hosted Engine setup finished?
> periodic/43::ERROR::2018-01-12 12:41:20,566::api::196::root::(_getHaInfo)
> failed to retrieve Hosted Engine HA score '[Errno 2] No such file or
> directory'Is the Hosted Engine setup finished?
>
> ==> /var/log/ovirt-hosted-engine-ha/broker.log <==
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
> line 151, in get_raw_stats
> f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
> OSError: [Errno 2] No such file or directory:
> '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
> StatusStorageThread::ERROR::2018-01-12
> 12:32:06,049::status_broker::92::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run)
> Failed to read state.
> Traceback (most recent call last):
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py",
> line 88, in run
> self._storage_broker.get_raw_stats()
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
> line 162, in get_raw_stats
> .format(str(e)))
> RequestError: failed to read metadata: [Errno 2] No such file or directory:
> '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
>
> On Fri, Jan 12, 2018 at 12:02 PM, Martin Sivak  wrote:
>>
>> The lock is the issue.
>>
>> - try running sanlock client status on all hosts
>> - also make sure you do not have some forgotten host still connected
>> to the lockspace, but without ha daemons running (and with the VM)
>>
>> I need to go to our president election now, I might check 

Re: [ovirt-users] unable to bring up hosted engine after botched 4.2 upgrade

2018-01-12 Thread Martin Sivak
> Can you please stop all hosted engine tooling (

On all hosts I should have added.

Martin

On Fri, Jan 12, 2018 at 3:22 PM, Martin Sivak  wrote:
>> RequestError: failed to read metadata: [Errno 2] No such file or directory:
>> '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
>>
>>  ls -al
>> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
>> -rw-rw. 1 vdsm kvm 1028096 Jan 12 09:59
>> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
>>
>> Is this due to the symlink problem you guys are referring to that was
>> addressed in RC1 or something else?
>
> No, this file is the symlink. It should point to somewhere inside
> /rhev/. I see it is a 1G file in your case. That is really
> interesting.
>
> Can you please stop all hosted engine tooling (ovirt-ha-agent,
> ovirt-ha-broker), move the file (metadata file is not important when
> services are stopped, but better safe than sorry) and restart all
> services again?
>
>> Could there possibly be a permissions
>> problem somewhere?
>
> Maybe, but the file itself looks out of the ordinary. I wonder how it got 
> there.
>
> Best regards
>
> Martin Sivak
>
> On Fri, Jan 12, 2018 at 3:09 PM, Jayme  wrote:
>> Thanks for the help thus far.  Storage could be related but all other VMs on
>> same storage are running ok.  The storage is mounted via NFS from within one
>> of the three hosts, I realize this is not ideal.  This was setup by a
>> previous admin more as a proof of concept and VMs were put on there that
>> should not have been placed in a proof of concept environment.. it was
>> intended to be rebuilt with proper storage down the road.
>>
>> So the storage is on HOST0 and the other hosts mount NFS
>>
>> cultivar0.grove.silverorange.com:/exports/data  4861742080
>> 1039352832 3822389248  22%
>> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_data
>> cultivar0.grove.silverorange.com:/exports/iso   4861742080
>> 1039352832 3822389248  22%
>> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_iso
>> cultivar0.grove.silverorange.com:/exports/import_export 4861742080
>> 1039352832 3822389248  22%
>> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_import__export
>> cultivar0.grove.silverorange.com:/exports/hosted_engine 4861742080
>> 1039352832 3822389248  22%
>> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_hosted__engine
>>
>> Like I said, the VM data storage itself seems to be working ok, as all other
>> VMs appear to be running.
>>
>> I'm curious why the broker log says this file is not found when it is
>> correct and I can see the file at that path:
>>
>> RequestError: failed to read metadata: [Errno 2] No such file or directory:
>> '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
>>
>>  ls -al
>> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
>> -rw-rw. 1 vdsm kvm 1028096 Jan 12 09:59
>> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
>>
>> Is this due to the symlink problem you guys are referring to that was
>> addressed in RC1 or something else?  Could there possibly be a permissions
>> problem somewhere?
>>
>> Assuming that all three hosts have 4.2 rpms installed and the host engine
>> will not start is it safe for me to update hosts to 4.2 RC1 rpms?   Or
>> perhaps install that repo and *only* update the ovirt HA packages?
>> Assuming that I cannot yet apply the same updates to the inaccessible hosted
>> engine VM.
>>
>> I should also mention one more thing.  I originally upgraded the engine VM
>> first using new RPMS then engine-setup.  It failed due to not being in
>> global maintenance, so I set global maintenance and ran it again, which
>> appeared to complete as intended but never came back up after.  Just in case
>> this might have anything at all to do with what could have happened.
>>
>> Thanks very much again, I very much appreciate the help!
>>
>> - Jayme
>>
>> On Fri, Jan 12, 2018 at 8:44 AM, Simone Tiraboschi 
>> wrote:
>>>
>>>
>>>
>>> On Fri, Jan 12, 2018 at 11:11 AM, Martin Sivak  wrote:

 Hi,

 the hosted engine agent issue might be fixed by restarting
 ovirt-ha-broker or updating to newest ovirt-hosted-engine-ha and
 -setup. We improved handling of the missing symlink.
>>>
>>>
>>> Available just in oVirt 4.2.1 RC1
>>>


 All the other issues seem to point to some storage problem I am afraid.

 You said you started the VM, do you see it 

Re: [ovirt-users] unable to bring up hosted engine after botched 4.2 upgrade

2018-01-12 Thread Martin Sivak
> RequestError: failed to read metadata: [Errno 2] No such file or directory:
> '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
>
>  ls -al
> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
> -rw-rw. 1 vdsm kvm 1028096 Jan 12 09:59
> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
>
> Is this due to the symlink problem you guys are referring to that was
> addressed in RC1 or something else?

No, this file is the symlink. It should point to somewhere inside
/rhev/. I see it is a 1G file in your case. That is really
interesting.

Can you please stop all hosted engine tooling (ovirt-ha-agent,
ovirt-ha-broker), move the file (metadata file is not important when
services are stopped, but better safe than sorry) and restart all
services again?
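
On the affected host that would be roughly the following (the .bak name is
just an example):

  systemctl stop ovirt-ha-agent ovirt-ha-broker
  mv /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8{,.bak}
  systemctl start ovirt-ha-broker ovirt-ha-agent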

> Could there possibly be a permissions
> problem somewhere?

Maybe, but the file itself looks out of the ordinary. I wonder how it got there.

Best regards

Martin Sivak

On Fri, Jan 12, 2018 at 3:09 PM, Jayme  wrote:
> Thanks for the help thus far.  Storage could be related but all other VMs on
> same storage are running ok.  The storage is mounted via NFS from within one
> of the three hosts, I realize this is not ideal.  This was setup by a
> previous admin more as a proof of concept and VMs were put on there that
> should not have been placed in a proof of concept environment.. it was
> intended to be rebuilt with proper storage down the road.
>
> So the storage is on HOST0 and the other hosts mount NFS
>
> cultivar0.grove.silverorange.com:/exports/data  4861742080
> 1039352832 3822389248  22%
> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_data
> cultivar0.grove.silverorange.com:/exports/iso   4861742080
> 1039352832 3822389248  22%
> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_iso
> cultivar0.grove.silverorange.com:/exports/import_export 4861742080
> 1039352832 3822389248  22%
> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_import__export
> cultivar0.grove.silverorange.com:/exports/hosted_engine 4861742080
> 1039352832 3822389248  22%
> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_hosted__engine
>
> Like I said, the VM data storage itself seems to be working ok, as all other
> VMs appear to be running.
>
> I'm curious why the broker log says this file is not found when it is
> correct and I can see the file at that path:
>
> RequestError: failed to read metadata: [Errno 2] No such file or directory:
> '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
>
>  ls -al
> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
> -rw-rw. 1 vdsm kvm 1028096 Jan 12 09:59
> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
>
> Is this due to the symlink problem you guys are referring to that was
> addressed in RC1 or something else?  Could there possibly be a permissions
> problem somewhere?
>
> Assuming that all three hosts have 4.2 rpms installed and the host engine
> will not start is it safe for me to update hosts to 4.2 RC1 rpms?   Or
> perhaps install that repo and *only* update the ovirt HA packages?
> Assuming that I cannot yet apply the same updates to the inaccessible hosted
> engine VM.
>
> I should also mention one more thing.  I originally upgraded the engine VM
> first using new RPMS then engine-setup.  It failed due to not being in
> global maintenance, so I set global maintenance and ran it again, which
> appeared to complete as intended but never came back up after.  Just in case
> this might have anything at all to do with what could have happened.
>
> Thanks very much again, I very much appreciate the help!
>
> - Jayme
>
> On Fri, Jan 12, 2018 at 8:44 AM, Simone Tiraboschi 
> wrote:
>>
>>
>>
>> On Fri, Jan 12, 2018 at 11:11 AM, Martin Sivak  wrote:
>>>
>>> Hi,
>>>
>>> the hosted engine agent issue might be fixed by restarting
>>> ovirt-ha-broker or updating to newest ovirt-hosted-engine-ha and
>>> -setup. We improved handling of the missing symlink.
>>
>>
>> Available just in oVirt 4.2.1 RC1
>>
>>>
>>>
>>> All the other issues seem to point to some storage problem I am afraid.
>>>
>>> You said you started the VM, do you see it in virsh -r list?
>>>
>>> Best regards
>>>
>>> Martin Sivak
>>>
>>> On Thu, Jan 11, 2018 at 10:00 PM, Jayme  wrote:
>>> > Please help, I'm really not sure what else to try at this point.  Thank
>>> > you
>>> > for reading!
>>> >
>>> >
>>> > I'm still working on trying to 

Re: [ovirt-users] unable to bring up hosted engine after botched 4.2 upgrade

2018-01-12 Thread Martin Sivak
The blockIoTune error should be harmless. It is just the result of a
data check by another component (mom) that encountered a VM that no
longer exists.
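
To double-check that the VM id in that error no longer corresponds to
anything defined on the host, something like the following should do (the
vdsm-client call is an assumption based on the 4.2 tooling):

  virsh -r list --all
  vdsm-client Host getVMList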

I thought we had squashed all the logs like that, though.

Martin

On Fri, Jan 12, 2018 at 3:12 PM, Jayme  wrote:
> One more thing to add, I've also been seeing a lot of this in the syslog as
> well:
>
> Jan 12 10:10:49 cultivar2 journal: vdsm jsonrpc.JsonRpcServer ERROR Internal
> server error#012Traceback (most recent call last):#012  File
> "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in
> _handle_request#012res = method(**params)#012  File
> "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 197, in
> _dynamicMethod#012result = fn(*methodArgs)#012  File "", line 2,
> in getAllVmIoTunePolicies#012  File
> "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in
> method#012ret = func(*args, **kwargs)#012  File
> "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1354, in
> getAllVmIoTunePolicies#012io_tune_policies_dict =
> self._cif.getAllVmIoTunePolicies()#012  File
> "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 524, in
> getAllVmIoTunePolicies#012'current_values': v.getIoTune()}#012  File
> "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3481, in
> getIoTune#012result = self.getIoTuneResponse()#012  File
> "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3500, in
> getIoTuneResponse#012res = self._dom.blockIoTune(#012  File
> "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in
> __getattr__#012% self.vmid)#012NotConnectedError: VM
> '4013c829-c9d7-4b72-90d5-6fe58137504c' was not defined yet or was undefined
>
> On Fri, Jan 12, 2018 at 10:09 AM, Jayme  wrote:
>>
>> Thanks for the help thus far.  Storage could be related but all other VMs
>> on same storage are running ok.  The storage is mounted via NFS from within
>> one of the three hosts, I realize this is not ideal.  This was setup by a
>> previous admin more as a proof of concept and VMs were put on there that
>> should not have been placed in a proof of concept environment.. it was
>> intended to be rebuilt with proper storage down the road.
>>
>> So the storage is on HOST0 and the other hosts mount NFS
>>
>> cultivar0.grove.silverorange.com:/exports/data  4861742080
>> 1039352832 3822389248  22%
>> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_data
>> cultivar0.grove.silverorange.com:/exports/iso   4861742080
>> 1039352832 3822389248  22%
>> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_iso
>> cultivar0.grove.silverorange.com:/exports/import_export 4861742080
>> 1039352832 3822389248  22%
>> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_import__export
>> cultivar0.grove.silverorange.com:/exports/hosted_engine 4861742080
>> 1039352832 3822389248  22%
>> /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_hosted__engine
>>
>> Like I said, the VM data storage itself seems to be working ok, as all
>> other VMs appear to be running.
>>
>> I'm curious why the broker log says this file is not found when it is
>> correct and I can see the file at that path:
>>
>> RequestError: failed to read metadata: [Errno 2] No such file or
>> directory:
>> '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
>>
>>  ls -al
>> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
>> -rw-rw. 1 vdsm kvm 1028096 Jan 12 09:59
>> /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
>>
>> Is this due to the symlink problem you guys are referring to that was
>> addressed in RC1 or something else?  Could there possibly be a permissions
>> problem somewhere?
>>
>> Assuming that all three hosts have 4.2 rpms installed and the host engine
>> will not start is it safe for me to update hosts to 4.2 RC1 rpms?   Or
>> perhaps install that repo and *only* update the ovirt HA packages?
>> Assuming that I cannot yet apply the same updates to the inaccessible hosted
>> engine VM.
>>
>> I should also mention one more thing.  I originally upgraded the engine VM
>> first using new RPMS then engine-setup.  It failed due to not being in
>> global maintenance, so I set global maintenance and ran it again, which
>> appeared to complete as intended but never came back up after.  Just in case
>> this might have anything at all to do with what could have happened.
>>
>> Thanks very much again, I very much appreciate the help!
>>
>> - Jayme
>>
>> On Fri, Jan 12, 2018 at 8:44 AM, Simone Tiraboschi 
>> wrote:
>>>
>>>
>>>
>>> On Fri, Jan 12, 2018 at 11:11 AM, Martin Sivak  wrote:

 Hi,

 the hosted engine agent issue might 

Re: [ovirt-users] unable to bring up hosted engine after botched 4.2 upgrade

2018-01-12 Thread Jayme
One more thing to add: I've also been seeing a lot of this in the syslog
as well:

Jan 12 10:10:49 cultivar2 journal: vdsm jsonrpc.JsonRpcServer ERROR Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in _handle_request
    res = method(**params)
  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 197, in _dynamicMethod
    result = fn(*methodArgs)
  File "<string>", line 2, in getAllVmIoTunePolicies
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1354, in getAllVmIoTunePolicies
    io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()
  File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 524, in getAllVmIoTunePolicies
    'current_values': v.getIoTune()}
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3481, in getIoTune
    result = self.getIoTuneResponse()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3500, in getIoTuneResponse
    res = self._dom.blockIoTune(
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in __getattr__
    % self.vmid)
NotConnectedError: VM '4013c829-c9d7-4b72-90d5-6fe58137504c' was not defined yet or was undefined

On Fri, Jan 12, 2018 at 10:09 AM, Jayme  wrote:

> Thanks for the help thus far.  Storage could be related but all other VMs
> on same storage are running ok.  The storage is mounted via NFS from within
> one of the three hosts, I realize this is not ideal.  This was setup by a
> previous admin more as a proof of concept and VMs were put on there that
> should not have been placed in a proof of concept environment.. it was
> intended to be rebuilt with proper storage down the road.
>
> So the storage is on HOST0 and the other hosts mount NFS
>
> cultivar0.grove.silverorange.com:/exports/data  4861742080
> 1039352832 3822389248  22% /rhev/data-center/mnt/cultivar
> 0.grove.silverorange.com:_exports_data
> cultivar0.grove.silverorange.com:/exports/iso   4861742080
> 1039352832 3822389248  22% /rhev/data-center/mnt/cultivar
> 0.grove.silverorange.com:_exports_iso
> cultivar0.grove.silverorange.com:/exports/import_export 4861742080
> 1039352832 3822389248  22% /rhev/data-center/mnt/cultivar
> 0.grove.silverorange.com:_exports_import__export
> cultivar0.grove.silverorange.com:/exports/hosted_engine 4861742080
> 1039352832 3822389248  22% /rhev/data-center/mnt/cultivar
> 0.grove.silverorange.com:_exports_hosted__engine
>
> Like I said, the VM data storage itself seems to be working ok, as all
> other VMs appear to be running.
>
> I'm curious why the broker log says this file is not found when it is
> correct and I can see the file at that path:
>
> RequestError: failed to read metadata: [Errno 2] No such file or
> directory: '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/
> 14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
>
>  ls -al /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/1
> 4a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
> -rw-rw. 1 vdsm kvm 1028096 Jan 12 09:59 /var/run/vdsm/storage/248f46f0
> -d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7
> c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
>
> Is this due to the symlink problem you guys are referring to that was
> addressed in RC1 or something else?  Could there possibly be a permissions
> problem somewhere?
>
> Assuming that all three hosts have 4.2 rpms installed and the host engine
> will not start is it safe for me to update hosts to 4.2 RC1 rpms?   Or
> perhaps install that repo and *only* update the ovirt HA packages?
>  Assuming that I cannot yet apply the same updates to the inaccessible
> hosted engine VM.
>
> I should also mention one more thing.  I originally upgraded the engine VM
> first using new RPMS then engine-setup.  It failed due to not being in
> global maintenance, so I set global maintenance and ran it again, which
> appeared to complete as intended but never came back up after.  Just in
> case this might have anything at all to do with what could have happened.
>
> Thanks very much again, I very much appreciate the help!
>
> - Jayme
>
> On Fri, Jan 12, 2018 at 8:44 AM, Simone Tiraboschi 
> wrote:
>
>>
>>
>> On Fri, Jan 12, 2018 at 11:11 AM, Martin Sivak  wrote:
>>
>>> Hi,
>>>
>>> the hosted engine agent issue might be fixed by restarting
>>> ovirt-ha-broker or updating to newest ovirt-hosted-engine-ha and
>>> -setup. We improved handling of the missing symlink.
>>>
>>
>> Available just in oVirt 4.2.1 RC1
>>
>>
>>>
>>> All the other issues seem to point to some storage problem I am afraid.
>>>
>>> You said you started the VM, do you see it in virsh -r list?
>>>
>>> Best regards
>>>
>>> Martin 

Re: [ovirt-users] unable to bring up hosted engine after botched 4.2 upgrade

2018-01-12 Thread Jayme
Thanks for the help thus far.  Storage could be related, but all other VMs
on the same storage are running OK.  The storage is mounted via NFS from within
one of the three hosts; I realize this is not ideal.  This was set up by a
previous admin more as a proof of concept, and VMs were put on there that
should not have been placed in a proof-of-concept environment; it was
intended to be rebuilt with proper storage down the road.

So the storage is on HOST0, and the other hosts mount it via NFS:

cultivar0.grove.silverorange.com:/exports/data  4861742080
1039352832 3822389248  22%
/rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_data
cultivar0.grove.silverorange.com:/exports/iso   4861742080
1039352832 3822389248  22%
/rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_iso
cultivar0.grove.silverorange.com:/exports/import_export 4861742080
1039352832 3822389248  22%
/rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_import__export
cultivar0.grove.silverorange.com:/exports/hosted_engine 4861742080
1039352832 3822389248  22%
/rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_hosted__engine

Like I said, the VM data storage itself seems to be working ok, as all
other VMs appear to be running.

I'm curious why the broker log says this file is not found when the path is
correct and I can see the file there:

RequestError: failed to read metadata: [Errno 2] No such file or directory:
'/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'

 ls -al /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
-rw-rw. 1 vdsm kvm 1028096 Jan 12 09:59 /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8

Is this due to the symlink problem you guys are referring to that was
addressed in RC1 or something else?  Could there possibly be a permissions
problem somewhere?
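
One way to tell whether that path is a plain file or a symlink whose target
has gone missing, using the path from the broker log above, would be roughly:

  file /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
  stat -L /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
  sudo -u vdsm head -c1 /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 >/dev/null && echo readable

stat -L follows the link, so it fails with ENOENT if the link target is gone;
the last command is just a quick read as the vdsm user to rule out permissions.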

Assuming that all three hosts have 4.2 rpms installed and the hosted engine
will not start, is it safe for me to update the hosts to 4.2 RC1 rpms?  Or
perhaps install that repo and *only* update the oVirt HA packages?  I'm
assuming that I cannot yet apply the same updates to the inaccessible
hosted engine VM.
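
If only the HA packages were to be updated, that would presumably just be
something like the following on each host once the 4.2.1 RC1 repository is
enabled (whether that is advisable in this mixed state is exactly the open
question above):

  yum update ovirt-hosted-engine-ha ovirt-hosted-engine-setup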

I should also mention one more thing.  I originally upgraded the engine VM
first using the new RPMs and then engine-setup.  It failed due to not being in
global maintenance, so I set global maintenance and ran it again, which
appeared to complete as intended, but the engine never came back up afterwards.
Just in case this has anything at all to do with what could have happened.

Thanks very much again, I very much appreciate the help!

- Jayme

On Fri, Jan 12, 2018 at 8:44 AM, Simone Tiraboschi 
wrote:

>
>
> On Fri, Jan 12, 2018 at 11:11 AM, Martin Sivak  wrote:
>
>> Hi,
>>
>> the hosted engine agent issue might be fixed by restarting
>> ovirt-ha-broker or updating to newest ovirt-hosted-engine-ha and
>> -setup. We improved handling of the missing symlink.
>>
>
> Available just in oVirt 4.2.1 RC1
>
>
>>
>> All the other issues seem to point to some storage problem I am afraid.
>>
>> You said you started the VM, do you see it in virsh -r list?
>>
>> Best regards
>>
>> Martin Sivak
>>
>> On Thu, Jan 11, 2018 at 10:00 PM, Jayme  wrote:
>> > Please help, I'm really not sure what else to try at this point.  Thank
>> you
>> > for reading!
>> >
>> >
>> > I'm still working on trying to get my hosted engine running after a
>> botched
>> > upgrade to 4.2.  Storage is NFS mounted from within one of the hosts.
>> Right
>> > now I have 3 centos7 hosts that are fully updated with yum packages from
>> > ovirt 4.2, the engine was fully updated with yum packages and failed to
>> come
>> > up after reboot.  As of right now, everything should have full yum
>> updates
>> > and all having 4.2 rpms.  I have global maintenance mode on right now
>> and
>> > started hosted-engine on one of the three host and the status is
>> currently:
>> > Engine status : {"reason": "failed liveliness check”; "health": "bad",
>> "vm":
>> > "up", "detail": "Up"}
>> >
>> >
>> > this is what I get when trying to enter hosted-vm --console
>> >
>> >
>> > The engine VM is running on this host
>> >
>> > error: failed to get domain 'HostedEngine'
>> >
>> > error: Domain not found: no domain with matching name 'HostedEngine'
>> >
>> >
>> > Here are logs from various sources when I start the VM on HOST3:
>> >
>> >
>> > hosted-engine --vm-start
>> >
>> > Command VM.getStats with args {'vmID':
>> > '4013c829-c9d7-4b72-90d5-6fe58137504c'} failed:
>> >
>> > (code=1, message=Virtual machine does not exist: {'vmId':
>> > u'4013c829-c9d7-4b72-90d5-6fe58137504c'})
>> >
>> >
>> > Jan 11 16:55:57 cultivar3 systemd-machined: New machine
>> qemu-110-Cultivar.
>> >
>> > Jan 11 16:55:57 

Re: [ovirt-users] unable to bring up hosted engine after botched 4.2 upgrade

2018-01-12 Thread Simone Tiraboschi
On Fri, Jan 12, 2018 at 11:11 AM, Martin Sivak  wrote:

> Hi,
>
> the hosted engine agent issue might be fixed by restarting
> ovirt-ha-broker or updating to newest ovirt-hosted-engine-ha and
> -setup. We improved handling of the missing symlink.
>

Available just in oVirt 4.2.1 RC1


>
> All the other issues seem to point to some storage problem I am afraid.
>
> You said you started the VM, do you see it in virsh -r list?
>
> Best regards
>
> Martin Sivak
>
> On Thu, Jan 11, 2018 at 10:00 PM, Jayme  wrote:
> > Please help, I'm really not sure what else to try at this point.  Thank
> you
> > for reading!
> >
> >
> > I'm still working on trying to get my hosted engine running after a
> botched
> > upgrade to 4.2.  Storage is NFS mounted from within one of the hosts.
> Right
> > now I have 3 centos7 hosts that are fully updated with yum packages from
> > ovirt 4.2, the engine was fully updated with yum packages and failed to
> come
> > up after reboot.  As of right now, everything should have full yum
> updates
> > and all having 4.2 rpms.  I have global maintenance mode on right now and
> > started hosted-engine on one of the three host and the status is
> currently:
> > Engine status : {"reason": "failed liveliness check”; "health": "bad",
> "vm":
> > "up", "detail": "Up"}
> >
> >
> > this is what I get when trying to enter hosted-vm --console
> >
> >
> > The engine VM is running on this host
> >
> > error: failed to get domain 'HostedEngine'
> >
> > error: Domain not found: no domain with matching name 'HostedEngine'
> >
> >
> > Here are logs from various sources when I start the VM on HOST3:
> >
> >
> > hosted-engine --vm-start
> >
> > Command VM.getStats with args {'vmID':
> > '4013c829-c9d7-4b72-90d5-6fe58137504c'} failed:
> >
> > (code=1, message=Virtual machine does not exist: {'vmId':
> > u'4013c829-c9d7-4b72-90d5-6fe58137504c'})
> >
> >
> > Jan 11 16:55:57 cultivar3 systemd-machined: New machine
> qemu-110-Cultivar.
> >
> > Jan 11 16:55:57 cultivar3 systemd: Started Virtual Machine
> > qemu-110-Cultivar.
> >
> > Jan 11 16:55:57 cultivar3 systemd: Starting Virtual Machine
> > qemu-110-Cultivar.
> >
> > Jan 11 16:55:57 cultivar3 kvm: 3 guests now active
> >
> >
> > ==> /var/log/vdsm/vdsm.log <==
> >
> >   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48,
> in
> > method
> >
> > ret = func(*args, **kwargs)
> >
> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line
> 2718, in
> > getStorageDomainInfo
> >
> > dom = self.validateSdUUID(sdUUID)
> >
> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line
> 304, in
> > validateSdUUID
> >
> > sdDom.validate()
> >
> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line
> 515,
> > in validate
> >
> > raise se.StorageDomainAccessError(self.sdUUID)
> >
> > StorageDomainAccessError: Domain is either partially accessible or
> entirely
> > inaccessible: (u'248f46f0-d793-4581-9810-c9d965e2f286',)
> >
> > jsonrpc/2::ERROR::2018-01-11
> > 16:55:16,144::dispatcher::82::storage.Dispatcher::(wrapper) FINISH
> > getStorageDomainInfo error=Domain is either partially accessible or
> entirely
> > inaccessible: (u'248f46f0-d793-4581-9810-c9d965e2f286',)
> >
> >
> > ==> /var/log/libvirt/qemu/Cultivar.log <==
> >
> > LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
> > QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name
> > guest=Cultivar,debug-threads=on -S -object
> > secret,id=masterKey0,format=raw,file=/var/lib/libvirt/
> qemu/domain-108-Cultivar/master-key.aes
> > -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu
> > Conroe -m 8192 -realtime mlock=off -smp
> > 2,maxcpus=16,sockets=16,cores=1,threads=1 -uuid
> > 4013c829-c9d7-4b72-90d5-6fe58137504c -smbios
> > 'type=1,manufacturer=oVirt,product=oVirt
> > Node,version=7-4.1708.el7.centos,serial=44454C4C-4300-
> 1034-8035-CAC04F424331,uuid=4013c829-c9d7-4b72-90d5-6fe58137504c'
> > -no-user-config -nodefaults -chardev
> > socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-
> 108-Cultivar/monitor.sock,server,nowait
> > -mon chardev=charmonitor,id=monitor,mode=control -rtc
> > base=2018-01-11T20:33:19,driftfix=slew -global
> > kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -boot strict=on
> -device
> > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
> > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive
> > file=/var/run/vdsm/storage/248f46f0-d793-4581-9810-
> c9d965e2f286/c2dde892-f978-4dfc-a421-c8e04cf387f9/23aa0a66-fa6c-4967-a1e5-
> fbe47c0cd705,format=raw,if=none,id=drive-virtio-disk0,
> serial=c2dde892-f978-4dfc-a421-c8e04cf387f9,cache=none,
> werror=stop,rerror=stop,aio=threads
> > -device
> > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-
> virtio-disk0,id=virtio-disk0,bootindex=1
> > -drive if=none,id=drive-ide0-1-0,readonly=on -device
> > ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev
> > 

[ovirt-users] unable to bring up hosted engine after botched 4.2 upgrade

2018-01-12 Thread Jayme
Please help, I'm really not sure what else to try at this point.  Thank you
for reading!


I'm still working on trying to get my hosted engine running after a botched
upgrade to 4.2.  Storage is NFS mounted from within one of the hosts.  Right
now I have 3 CentOS 7 hosts that are fully updated with yum packages from
oVirt 4.2; the engine was also fully updated with yum packages and failed to
come up after reboot.  As of right now, everything should have full yum
updates and all hosts have 4.2 rpms.  I have global maintenance mode on right
now and started hosted-engine on one of the three hosts, and the status is
currently:

Engine status : {"reason": "failed liveliness check", "health": "bad",
"vm": "up", "detail": "Up"}

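The "failed liveliness check" above just means the HA agent cannot reach the
engine's health page even though the VM itself reports as up. A quick manual
probe, with <engine-fqdn> replaced by the real engine hostname (a placeholder,
not taken from the logs), is something like:

  curl http://<engine-fqdn>/ovirt-engine/services/health
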

this is what I get when trying to enter hosted-engine --console


The engine VM is running on this host

error: failed to get domain 'HostedEngine'

error: Domain not found: no domain with matching name 'HostedEngine'


Here are logs from various sources when I start the VM on HOST3:


hosted-engine --vm-start

Command VM.getStats with args {'vmID':
'4013c829-c9d7-4b72-90d5-6fe58137504c'} failed:

(code=1, message=Virtual machine does not exist: {'vmId':
u'4013c829-c9d7-4b72-90d5-6fe58137504c'})


Jan 11 16:55:57 cultivar3 systemd-machined: New machine qemu-110-Cultivar.

Jan 11 16:55:57 cultivar3 systemd: Started Virtual Machine
qemu-110-Cultivar.

Jan 11 16:55:57 cultivar3 systemd: Starting Virtual Machine
qemu-110-Cultivar.

Jan 11 16:55:57 cultivar3 kvm: 3 guests now active


==> /var/log/vdsm/vdsm.log <==

  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in
method

ret = func(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2718,
in getStorageDomainInfo

dom = self.validateSdUUID(sdUUID)

  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 304, in
validateSdUUID

sdDom.validate()

  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 515,
in validate

raise se.StorageDomainAccessError(self.sdUUID)

StorageDomainAccessError: Domain is either partially accessible or entirely
inaccessible: (u'248f46f0-d793-4581-9810-c9d965e2f286',)

jsonrpc/2::ERROR::2018-01-11
16:55:16,144::dispatcher::82::storage.Dispatcher::(wrapper) FINISH
getStorageDomainInfo error=Domain is either partially accessible or
entirely inaccessible: (u'248f46f0-d793-4581-9810-c9d965e2f286',)
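
That StorageDomainAccessError suggests vdsm cannot validate the hosted_engine
storage domain itself; a quick manual check from the host, using the mount
point and domain UUID that appear elsewhere in this thread, would be roughly:

  mount | grep _exports_hosted__engine
  ls /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_hosted__engine/248f46f0-d793-4581-9810-c9d965e2f286/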


==> /var/log/libvirt/qemu/Cultivar.log <==

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name
guest=Cultivar,debug-threads=on -S -object
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-108-Cultivar/master-key.aes
-machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu
Conroe -m 8192 -realtime mlock=off -smp
2,maxcpus=16,sockets=16,cores=1,threads=1 -uuid
4013c829-c9d7-4b72-90d5-6fe58137504c -smbios
'type=1,manufacturer=oVirt,product=oVirt
Node,version=7-4.1708.el7.centos,serial=44454C4C-4300-1034-8035-CAC04F424331,uuid=4013c829-c9d7-4b72-90d5-6fe58137504c'
-no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-108-Cultivar/monitor.sock,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc
base=2018-01-11T20:33:19,driftfix=slew -global
kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -boot strict=on -device
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive
file=/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/c2dde892-f978-4dfc-a421-c8e04cf387f9/23aa0a66-fa6c-4967-a1e5-fbe47c0cd705,format=raw,if=none,id=drive-virtio-disk0,serial=c2dde892-f978-4dfc-a421-c8e04cf387f9,cache=none,werror=stop,rerror=stop,aio=threads
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-drive if=none,id=drive-ide0-1-0,readonly=on -device
ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev
tap,fd=30,id=hostnet0,vhost=on,vhostfd=32 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:7f:d6:83,bus=pci.0,addr=0x3
-chardev
socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/4013c829-c9d7-4b72-90d5-6fe58137504c.com.redhat.rhevm.vdsm,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm
-chardev
socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/4013c829-c9d7-4b72-90d5-6fe58137504c.org.qemu.guest_agent.0,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
-chardev spicevmc,id=charchannel2,name=vdagent -device
virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0
-chardev
socket,id=charchannel3,path=/var/lib/libvirt/qemu/channels/4013c829-c9d7-4b72-90d5-6fe58137504c.org.ovirt.hosted-engine-setup.0,server,nowait
-device