Sorry, I am mistaken, two hosts failed for the agent with the following error:

ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Shutting down the agent because of 3 failures in a row!

What could cause these timeouts? Some other service not running?
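For reference, this is what I am checking on each host (the sanlock invocation is from memory, so correct me if it is wrong):

  # are both HA services actually up?
  systemctl status ovirt-ha-agent ovirt-ha-broker

  # which lockspaces does sanlock currently hold on this host?
  sanlock client status

  # watch the agent and broker logs together
  tail -f /var/log/ovirt-hosted-engine-ha/agent.log /var/log/ovirt-hosted-engine-ha/broker.log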
On Thu, Jun 29, 2017 at 5:03 PM, cmc <iuco...@gmail.com> wrote:
> Both services are up on all three hosts. The broker logs just report:
>
> Thread-6549::INFO::2017-06-29 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-6549::INFO::2017-06-29 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>
> Thanks,
>
> Cam
>
> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak <msi...@redhat.com> wrote:
>> Hi,
>>
>> please make sure that both the ovirt-ha-agent and ovirt-ha-broker services
>> are restarted and up. The error says the agent can't talk to the broker.
>> Is there anything in the broker.log?
>>
>> Best regards
>>
>> Martin Sivak
>>
>> On Thu, Jun 29, 2017 at 4:42 PM, cmc <iuco...@gmail.com> wrote:
>>> I've restarted those two services across all hosts, have taken the
>>> Hosted Engine host out of maintenance, and when I try to migrate the
>>> Hosted Engine over to another host, it reports that all three hosts
>>> 'did not satisfy internal filter HA because it is not a Hosted Engine host'.
>>>
>>> On the host that the Hosted Engine is currently on, it reports in the agent.log:
>>>
>>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Connection closed: Connection closed
>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception getting service path: Connection closed
>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
>>>     return action(he)
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 64, in action_proper
>>>     return he.start_monitoring()
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 411, in start_monitoring
>>>     self._initialize_sanlock()
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 691, in _initialize_sanlock
>>>     constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION)
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 162, in get_service_path
>>>     .format(str(e)))
>>> RequestError: Failed to get service path: Connection closed
>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
>>>
>>> On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak <msi...@redhat.com> wrote:
>>>> Hi,
>>>>
>>>> yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker services.
>>>>
>>>> The scheduling message just means that the host has score 0 or is not
>>>> reporting a score at all.
>>>>
>>>> Martin
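>>>> (On each host, something along these lines - broker first, then agent,
>>>> since the agent reconnects to the broker when it starts:
>>>>
>>>>   systemctl restart ovirt-ha-broker
>>>>   systemctl restart ovirt-ha-agent
>>>> )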
>>>> On Thu, Jun 29, 2017 at 1:33 PM, cmc <iuco...@gmail.com> wrote:
>>>>> Thanks Martin, do I have to restart anything? When I try to use the
>>>>> 'migrate' operation, it complains that the other two hosts 'did not
>>>>> satisfy internal filter HA because it is not a Hosted Engine host'
>>>>> (even though I reinstalled both these hosts with the 'deploy hosted
>>>>> engine' option), which suggests that something needs restarting. Should
>>>>> I worry about the sanlock errors, or will that be resolved by the
>>>>> change in host_id?
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Cam
>>>>>
>>>>> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak <msi...@redhat.com> wrote:
>>>>>> Change the ids so they are distinct. I need to check if there is a way
>>>>>> to read the SPM ids from the engine, as using the same numbers would be
>>>>>> best.
>>>>>>
>>>>>> Martin
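>>>>>> (From memory, the mapping should live in the vds_spm_id_map table of
>>>>>> the engine database, so something like this on the engine VM ought to
>>>>>> show it - I have not verified the column names:
>>>>>>
>>>>>>   sudo -u postgres psql engine -c "SELECT vds_id, vds_spm_id FROM vds_spm_id_map;"
>>>>>> )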
>>>>>> On Thu, Jun 29, 2017 at 12:46 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>> Is there any way of recovering from this situation? I'd prefer to fix
>>>>>>> the issue rather than re-deploy, but if there is no recovery path, I
>>>>>>> could perhaps try re-deploying the hosted engine. In that case, would
>>>>>>> the best option be to take a backup of the Hosted Engine, then shut it
>>>>>>> down, re-initialise the SAN partition (or use another partition) and
>>>>>>> retry the deployment? Would it be better to use the older backup from
>>>>>>> the bare-metal engine that I originally used, or a backup from the
>>>>>>> Hosted Engine? I'm not sure if any VMs have been added since switching
>>>>>>> to the Hosted Engine.
>>>>>>>
>>>>>>> Unfortunately I have very little time left to get this working before
>>>>>>> I have to hand it over for eval (by end of Friday).
>>>>>>>
>>>>>>> Here are some current log snippets from the cluster.
>>>>>>>
>>>>>>> In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine:
>>>>>>>
>>>>>>> 2017-06-29 10:50:15,071+0100 INFO (monitor/207221b) [storage.SANLock] Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f (id: 3) (clusterlock:282)
>>>>>>> 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) [storage.Monitor] Error acquiring host id 3 for domain 207221b2-959b-426b-b945-18e1adfed62f (monitor:558)
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "/usr/share/vdsm/storage/monitor.py", line 555, in _acquireHostId
>>>>>>>     self.domain.acquireHostId(self.hostId, async=True)
>>>>>>>   File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
>>>>>>>     self._manifest.acquireHostId(hostId, async)
>>>>>>>   File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
>>>>>>>     self._domainLock.acquireHostId(hostId, async)
>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 297, in acquireHostId
>>>>>>>     raise se.AcquireHostIdFailure(self._sdUUID, e)
>>>>>>> AcquireHostIdFailure: Cannot acquire host id: ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
>>>>>>>
>>>>>>> From /var/log/ovirt-hosted-engine-ha/agent.log on the same host:
>>>>>>>
>>>>>>> MainThread::ERROR::2017-06-19 13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>> MainThread::WARNING::2017-06-19 13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>> MainThread::WARNING::2017-06-19 13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 443, in start_monitoring
>>>>>>>     self._initialize_domain_monitor()
>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 823, in _initialize_domain_monitor
>>>>>>>     raise Exception(msg)
>>>>>>> Exception: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>> MainThread::ERROR::2017-06-19 13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
>>>>>>>
>>>>>>> From sanlock.log:
>>>>>>>
>>>>>>> 2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace 207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 conflicts with name of list1 s5 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
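>>>>>>> (For what it's worth, I believe the delta leases on the ids volume
>>>>>>> can be inspected read-only with something like:
>>>>>>>
>>>>>>>   sanlock direct dump /dev/207221b2-959b-426b-b945-18e1adfed62f/ids
>>>>>>>
>>>>>>> which should show which host ids are currently registered in the
>>>>>>> lockspace - syntax from memory.)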
>>>>>>>
>>>>>>> From the two other hosts:
>>>>>>>
>>>>>>> host 2:
>>>>>>>
>>>>>>> vdsm.log
>>>>>>>
>>>>>>> 2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) [jsonrpc.JsonRpcServer] Internal server error (__init__:570)
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 565, in _handle_request
>>>>>>>     res = method(**params)
>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in _dynamicMethod
>>>>>>>     result = fn(*methodArgs)
>>>>>>>   File "/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies
>>>>>>>     io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()
>>>>>>>   File "/usr/share/vdsm/clientIF.py", line 448, in getAllVmIoTunePolicies
>>>>>>>     'current_values': v.getIoTune()}
>>>>>>>   File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune
>>>>>>>     result = self.getIoTuneResponse()
>>>>>>>   File "/usr/share/vdsm/virt/vm.py", line 2816, in getIoTuneResponse
>>>>>>>     res = self._dom.blockIoTune(
>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in __getattr__
>>>>>>>     % self.vmid)
>>>>>>> NotConnectedError: VM u'a79e6b0e-fff4-4cba-a02c-4c00be151300' was not started yet or was shut down
>>>>>>>
>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log
>>>>>>>
>>>>>>> MainThread::INFO::2017-06-29 10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:222610db-7880-4f4f-8559-a3635fd73555, volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1
>>>>>>> MainThread::INFO::2017-06-29 10:56:33,926::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
>>>>>>> MainThread::INFO::2017-06-29 10:56:33,938::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/images/222610db-7880-4f4f-8559-a3635fd73555/c6e0d29b-eabf-4a09-a330-df54cfdd73f1
>>>>>>> MainThread::INFO::2017-06-29 10:56:33,967::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
>>>>>>> MainThread::INFO::2017-06-29 10:56:33,971::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
>>>>>>> MainThread::INFO::2017-06-29 10:56:36,736::states::678::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Thu Jun 29 10:53:59 2017
>>>>>>> MainThread::INFO::2017-06-29 10:56:36,736::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
>>>>>>> MainThread::INFO::2017-06-29 10:56:46,772::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
>>>>>>>
>>>>>>> /var/log/messages:
>>>>>>>
>>>>>>> Jun 29 10:53:46 kvm-ldn-02 kernel: dd: sending ioctl 80306d02 to a partition!
>>>>>>>
>>>>>>> host 1:
>>>>>>>
>>>>>>> /var/log/messages, also in sanlock.log:
>>>>>>>
>>>>>>> Jun 29 11:01:02 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:02+0100 678325 [9132]: s4531 delta_acquire host_id 1 busy1 1 2 1193177 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>>>>>>> Jun 29 11:01:03 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:03+0100 678326 [24159]: s4531 add_lockspace fail result -262
>>>>>>>
>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log:
>>>>>>>
>>>>>>> MainThread::ERROR::2017-06-27 15:21:01,143::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>> MainThread::WARNING::2017-06-27 15:21:01,144::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>> MainThread::WARNING::2017-06-27 15:21:01,144::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 443, in start_monitoring
>>>>>>>     self._initialize_domain_monitor()
>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 823, in _initialize_domain_monitor
>>>>>>>     raise Exception(msg)
>>>>>>> Exception: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>> MainThread::ERROR::2017-06-27 15:21:01,144::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
>>>>>>> MainThread::INFO::2017-06-27 15:21:06,717::hosted_engine::848::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>>>> MainThread::INFO::2017-06-27 15:21:09,335::hosted_engine::776::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Failed to stop monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f): Storage domain is member of pool: u'domain=207221b2-959b-426b-b945-18e1adfed62f'
>>>>>>> MainThread::INFO::2017-06-27 15:21:09,339::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
>>>>>>>
>>>>>>> Thanks for any help,
>>>>>>>
>>>>>>> Cam
>>>>>>>
>>>>>>> On Wed, Jun 28, 2017 at 11:25 AM, cmc <iuco...@gmail.com> wrote:
>>>>>>>> Hi Martin,
>>>>>>>>
>>>>>>>> yes, on two of the machines they have the same host_id. The other has
>>>>>>>> a different host_id.
>>>>>>>>
>>>>>>>> To update since yesterday: I reinstalled and deployed Hosted Engine on
>>>>>>>> the other host (so all three hosts in the cluster now have it
>>>>>>>> installed). The second one I deployed said it was able to host the
>>>>>>>> engine (unlike the first I reinstalled), so I tried putting the host
>>>>>>>> with the Hosted Engine on it into maintenance to see if it would
>>>>>>>> migrate over. It managed to move all VMs but the Hosted Engine. And
>>>>>>>> now the host that said it was able to host the engine says
>>>>>>>> 'unavailable due to HA score'. The host that it was trying to move
>>>>>>>> from has now been in 'preparing for maintenance' for the last 12 hours.
>>>>>>>>
>>>>>>>> The summary is:
>>>>>>>>
>>>>>>>> kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, reinstalled
>>>>>>>> with 'Deploy Hosted Engine'. No icon saying it can host the Hosted
>>>>>>>> Engine; host_id of '2' in /etc/ovirt-hosted-engine/hosted-engine.conf.
>>>>>>>> 'add_lockspace' fails in sanlock.log.
>>>>>>>>
>>>>>>>> kvm-ldn-02 - the other host that was pre-existing before Hosted Engine
>>>>>>>> was created. Reinstalled with 'Deploy Hosted Engine'. Had an icon
>>>>>>>> saying that it was able to host the Hosted Engine, but after migration
>>>>>>>> was attempted when putting kvm-ldn-03 into maintenance, it reports:
>>>>>>>> 'unavailable due to HA score'. It has a host_id of '1' in
>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. No errors in sanlock.log.
>>>>>>>>
>>>>>>>> kvm-ldn-03 - this was the host I deployed Hosted Engine on, which was
>>>>>>>> not part of the original cluster. I restored the bare-metal engine
>>>>>>>> backup in the Hosted Engine on this host when deploying it, without
>>>>>>>> error. It currently has the Hosted Engine on it (as the only VM after
>>>>>>>> I put that host into maintenance to test the HA of Hosted Engine).
>>>>>>>> Sanlock log shows conflicts.
>>>>>>>>
>>>>>>>> I will look through all the logs for any other errors. Please let me
>>>>>>>> know if you need any logs or other clarification/information.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Campbell
>>>>>>>>
>>>>>>>> On Wed, Jun 28, 2017 at 9:25 AM, Martin Sivak <msi...@redhat.com> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> can you please check the contents of
>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf or
>>>>>>>>> /etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which one it is
>>>>>>>>> right now) and search for host-id?
>>>>>>>>>
>>>>>>>>> Make sure the IDs are different. If they are not, then there is a bug
>>>>>>>>> somewhere.
>>>>>>>>>
>>>>>>>>> Martin
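>>>>>>>>> (e.g. something like this on every host - each should print a
>>>>>>>>> different number; the exact key name may be host_id or host-id,
>>>>>>>>> I don't remember offhand:
>>>>>>>>>
>>>>>>>>>   grep -i host.id /etc/ovirt-hosted-engine/hosted-engine.conf
>>>>>>>>> )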
>>>>>>>>> On Tue, Jun 27, 2017 at 6:26 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>> I see this on the host it is trying to migrate to, in /var/log/sanlock:
>>>>>>>>>>
>>>>>>>>>> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>>>>>>>>> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire host_id 1 busy1 1 2 1042692 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>>>>>>>>>> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace fail result -262
>>>>>>>>>>
>>>>>>>>>> The sanlock service is running. Why would this occur?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> C
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 27, 2017 at 5:21 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the reply. I have done this, and the deployment completed
>>>>>>>>>>> without error. However, it still will not allow the Hosted Engine to
>>>>>>>>>>> migrate to another host. The /etc/ovirt-hosted-engine/hosted-engine.conf
>>>>>>>>>>> got created OK on the host I re-installed, but the ovirt-ha-broker.service,
>>>>>>>>>>> though it starts, reports:
>>>>>>>>>>>
>>>>>>>>>>> --------------------8<-------------------
>>>>>>>>>>>
>>>>>>>>>>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
>>>>>>>>>>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR Failed to read metadata from /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 129, in get_raw_stats_for_service_type
>>>>>>>>>>>     f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
>>>>>>>>>>> OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata'
>>>>>>>>>>>
>>>>>>>>>>> --------------------8<-------------------
>>>>>>>>>>>
>>>>>>>>>>> I checked the path, and it exists. I can run 'less -f' on it fine. The
>>>>>>>>>>> perms are slightly different on the host that is running the VM vs the
>>>>>>>>>>> one that is reporting errors (600 vs 660); ownership is vdsm:qemu. Is
>>>>>>>>>>> this a sanlock issue?
>>>>>>>>>>>
>>>>>>>>>>> Thanks for any help,
>>>>>>>>>>>
>>>>>>>>>>> Cam
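>>>>>>>>>>> (Those ha_agent entries are symlinks into /var/run/vdsm/storage on
>>>>>>>>>>> my hosts, so I'm checking whether the link target actually resolves
>>>>>>>>>>> on the failing host - a dangling link would explain the ENOENT even
>>>>>>>>>>> though the path 'exists':
>>>>>>>>>>>
>>>>>>>>>>>   readlink -f /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
>>>>>>>>>>>   ls -lL /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/
>>>>>>>>>>> )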
>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak <msi...@redhat.com> wrote:
>>>>>>>>>>>>> Should it be? It was not in the instructions for the migration from
>>>>>>>>>>>>> bare-metal to Hosted VM
>>>>>>>>>>>>
>>>>>>>>>>>> The hosted engine will only migrate to hosts that have the services
>>>>>>>>>>>> running. Please put one other host to maintenance and select Hosted
>>>>>>>>>>>> engine action: DEPLOY in the reinstall dialog.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards
>>>>>>>>>>>>
>>>>>>>>>>>> Martin Sivak
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>> I changed the 'os.other.devices.display.protocols.value.3.6 =
>>>>>>>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols
>>>>>>>>>>>>> as the 4.x one, and the hosted engine now appears in the list of VMs.
>>>>>>>>>>>>> I am guessing the compatibility version was causing it to use the 3.6
>>>>>>>>>>>>> value. However, I am still unable to migrate the engine VM to another
>>>>>>>>>>>>> host. When I try putting the host it is currently on into maintenance,
>>>>>>>>>>>>> it reports:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Error while executing action: Cannot switch the Host(s) to Maintenance mode.
>>>>>>>>>>>>> There are no available hosts capable of running the engine VM.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Running 'hosted-engine --vm-status' still shows 'Engine status:
>>>>>>>>>>>>> unknown stale-data'.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The ovirt-ha-broker service is only running on one host. It was set to
>>>>>>>>>>>>> 'disabled' in systemd. It won't start as there is no
>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts.
>>>>>>>>>>>>> Should it be? It was not in the instructions for the migration from
>>>>>>>>>>>>> bare-metal to Hosted VM.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 1:07 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>> Hi Tomas,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So in my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my
>>>>>>>>>>>>>> engine VM, I have:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>>>>>>> os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/cirrus,vnc/qxl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That seems to match - I assume since this is 4.1, the 3.6 value should
>>>>>>>>>>>>>> not apply.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there somewhere else I should be looking?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cam
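>>>>>>>>>>>>>> (An aside: I've since read - not yet verified - that local overrides
>>>>>>>>>>>>>> are meant to go in a drop-in properties file rather than in the
>>>>>>>>>>>>>> defaults file itself, so upgrades don't overwrite them, e.g.
>>>>>>>>>>>>>> /etc/ovirt-engine/osinfo.conf.d/99-display.properties containing:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> followed by 'systemctl restart ovirt-engine'.)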
>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek <tjeli...@redhat.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek <michal.skriva...@redhat.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > On 22 Jun 2017, at 12:31, Martin Sivak <msi...@redhat.com> wrote:
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Tomas, what fields are needed in a VM to pass the check that causes
>>>>>>>>>>>>>>>> > the following error?
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> to match the OS and VM display type ;-)
>>>>>>>>>>>>>>>> Configuration is in osinfo… e.g. if that is an import from older
>>>>>>>>>>>>>>>> releases on Linux, this is typically caused by the change of cirrus to
>>>>>>>>>>>>>>>> vga for non-SPICE VMs
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> yep, the default supported combinations for 4.0+ are these:
>>>>>>>>>>>>>>> os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > Thanks.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > On Thu, Jun 22, 2017 at 12:19 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>>>> >> Hi Martin,
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>> just as a random comment, do you still have the database backup from
>>>>>>>>>>>>>>>> >>> the bare metal -> VM attempt? It might be possible to just try again
>>>>>>>>>>>>>>>> >>> using it. Or in the worst case, update the offending value there
>>>>>>>>>>>>>>>> >>> before restoring it to the new engine instance.
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> I still have the backup. I'd rather do the latter, as re-running the
>>>>>>>>>>>>>>>> >> HE deployment is quite lengthy and involved (I have to re-initialise
>>>>>>>>>>>>>>>> >> the FC storage each time). Do you know what the offending value(s)
>>>>>>>>>>>>>>>> >> would be? Would it be in the Postgres DB or in a config file somewhere?
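>>>>>>>>>>>>>>>> >> (If it's in the DB, I'd guess the video device row in the vm_device
>>>>>>>>>>>>>>>> >> table - something like the query below might show it, though I'm not
>>>>>>>>>>>>>>>> >> sure of the schema:
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>   sudo -u postgres psql engine -c "SELECT vm_id, device FROM vm_device WHERE type = 'video';"
>>>>>>>>>>>>>>>> >> )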
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> Cheers,
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> Cam
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>> Regards
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> Martin Sivak
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> On Thu, Jun 22, 2017 at 11:39 AM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>>>> >>>> Hi Yanir,
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> Thanks for the reply.
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>>> First of all, maybe a chain reaction of:
>>>>>>>>>>>>>>>> >>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>> >>>>> is causing the hosted engine vm not to be set up correctly, and
>>>>>>>>>>>>>>>> >>>>> further actions were made when the hosted engine vm wasn't in a
>>>>>>>>>>>>>>>> >>>>> stable state.
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> As for now, are you trying to revert back to a previous/initial state?
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> I'm not trying to revert it to a previous state for now. This was a
>>>>>>>>>>>>>>>> >>>> migration from a bare-metal engine, and it didn't report any error
>>>>>>>>>>>>>>>> >>>> during the migration. I'd had some problems on my first attempts at
>>>>>>>>>>>>>>>> >>>> this migration, whereby it never completed (due to a proxy issue), but
>>>>>>>>>>>>>>>> >>>> I managed to resolve this. Do you know of a way to get the Hosted
>>>>>>>>>>>>>>>> >>>> Engine VM into a stable state, without rebuilding the entire cluster
>>>>>>>>>>>>>>>> >>>> from scratch (since I have a lot of VMs on it)?
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> Thanks for any help.
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> Regards,
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> Cam
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>>> Regards,
>>>>>>>>>>>>>>>> >>>>> Yanir
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> On Wed, Jun 21, 2017 at 4:32 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> >>>>>> Hi Jenny/Martin,
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> >>>>>> Any idea what I can do here? The hosted engine VM has no log on any
>>>>>>>>>>>>>>>> >>>>>> host in /var/log/libvirt/qemu, and I fear that if I need to put the
>>>>>>>>>>>>>>>> >>>>>> host I created it on (which I think is hosting it) into maintenance,
>>>>>>>>>>>>>>>> >>>>>> e.g. to upgrade it, or if it fails for any reason, it won't get
>>>>>>>>>>>>>>>> >>>>>> migrated to another host, and I will not be able to manage the
>>>>>>>>>>>>>>>> >>>>>> cluster. It seems to be a very dangerous position to be in.
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> >>>>>> Thanks,
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> >>>>>> Cam
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> >>>>>> On Wed, Jun 21, 2017 at 11:48 AM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>> Thanks Martin. The hosts are all part of the same cluster.
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> I get these errors in the engine.log on the engine:
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm' failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z INFO [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object 'EngineLock:{exclusiveLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<VM, ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>, HostedEngine=<VM_NAME, ACTION_TYPE_FAILED_NAME_ALREADY_USED>]', sharedLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<REMOTE_VM, ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}'
>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z ERROR [org.ovirt.engine.core.bll.HostedEngineImporter] (org.ovirt.thread.pool-6-thread-23) [] Failed importing the Hosted Engine VM
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> The sanlock.log reports conflicts on that same host, and a different
>>>>>>>>>>>>>>>> >>>>>>> error on the other hosts; not sure if they are related.
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> And this in the /var/log/ovirt-hosted-engine-ha/agent.log on the host
>>>>>>>>>>>>>>>> >>>>>>> which I deployed the hosted engine VM on:
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Unable to extract HEVM OVF
>>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> I've seen some of these issues reported in bugzilla, but they were for
>>>>>>>>>>>>>>>> >>>>>>> older versions of oVirt (and appear to be resolved).
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> I will install that package on the other two hosts, for which I will
>>>>>>>>>>>>>>>> >>>>>>> put them in maintenance, as vdsm is installed as an upgrade. I guess
>>>>>>>>>>>>>>>> >>>>>>> restarting vdsm is a good idea after that?
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> Thanks,
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> Campbell
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak <msi...@redhat.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>>> Hi,
>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>> you do not have to install it on all hosts.
>>>>>>>>>>>>>>>> >>>>>>>> But you should have more
>>>>>>>>>>>>>>>> >>>>>>>> than one, and ideally all hosted engine enabled nodes should belong to
>>>>>>>>>>>>>>>> >>>>>>>> the same engine cluster.
>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>> Best regards
>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>> Martin Sivak
>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>> On Wed, Jun 21, 2017 at 11:29 AM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>> Does ovirt-hosted-engine-ha need to be installed across all hosts?
>>>>>>>>>>>>>>>> >>>>>>>>> Could that be the reason it is failing to see it properly?
>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>> Cam
>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>> On Mon, Jun 19, 2017 at 1:27 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>> Logs are attached. I can see errors in there, but am unsure how
>>>>>>>>>>>>>>>> >>>>>>>>>> they arose.
>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>> Campbell
>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>> On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar <eto...@redhat.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>>>>>> From the output it looks like the agent is down; try starting it
>>>>>>>>>>>>>>>> >>>>>>>>>>> by running:
>>>>>>>>>>>>>>>> >>>>>>>>>>> systemctl start ovirt-ha-agent
>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>> The engine is supposed to see the hosted engine storage domain and
>>>>>>>>>>>>>>>> >>>>>>>>>>> import it into the system; then it should import the hosted engine vm.
>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>> Can you attach the agent log from the host
>>>>>>>>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-hosted-engine-ha/agent.log)
>>>>>>>>>>>>>>>> >>>>>>>>>>> and the engine log from the engine vm
>>>>>>>>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-engine/engine.log)?
>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> >>>>>>>>>>> Jenny
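>>>>>>>>>>>>>>>> >>>>>>>>>>> (And if the services are not enabled, something like this so they
>>>>>>>>>>>>>>>> >>>>>>>>>>> come back after a reboot:
>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>   systemctl enable ovirt-ha-broker ovirt-ha-agent
>>>>>>>>>>>>>>>> >>>>>>>>>>>   systemctl start ovirt-ha-broker ovirt-ha-agent
>>>>>>>>>>>>>>>> >>>>>>>>>>> )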
>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>> On Mon, Jun 19, 2017 at 12:41 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What version are you running?
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>> 4.1.2.2-1.el7.centos
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>> For the hosted engine vm to be imported and displayed in the engine,
>>>>>>>>>>>>>>>> >>>>>>>>>>>>> you must first create a master storage domain.
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>> To provide a bit more detail: this was a migration of a bare-metal
>>>>>>>>>>>>>>>> >>>>>>>>>>>> engine in an existing cluster to a hosted engine VM for that cluster.
>>>>>>>>>>>>>>>> >>>>>>>>>>>> As part of this migration, I built an entirely new host and ran
>>>>>>>>>>>>>>>> >>>>>>>>>>>> 'hosted-engine --deploy' (following these instructions:
>>>>>>>>>>>>>>>> >>>>>>>>>>>> http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_Metal_to_an_EL-Based_Self-Hosted_Environment/).
>>>>>>>>>>>>>>>> >>>>>>>>>>>> I restored the backup from the engine and it completed without any
>>>>>>>>>>>>>>>> >>>>>>>>>>>> errors. I didn't see any instructions regarding a master storage
>>>>>>>>>>>>>>>> >>>>>>>>>>>> domain in the page above. The cluster has two existing storage
>>>>>>>>>>>>>>>> >>>>>>>>>>>> domains: one is fibre channel, which is up, and one ISO domain,
>>>>>>>>>>>>>>>> >>>>>>>>>>>> which is currently offline.
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What do you mean the hosted engine commands are failing? What
>>>>>>>>>>>>>>>> >>>>>>>>>>>>> happens when you run hosted-engine --vm-status now?
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Interestingly, whereas when I ran it before it exited with no output
>>>>>>>>>>>>>>>> >>>>>>>>>>>> and a return code of '1', it now reports:
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>> --== Host 1 status ==--
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage : True
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Status up-to-date      : False
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hostname               : kvm-ldn-03.ldn.fscfc.co.uk
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Host ID                : 1
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Engine status          : unknown stale-data
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Score                  : 0
>>>>>>>>>>>>>>>> >>>>>>>>>>>> stopped                : True
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Local maintenance      : False
>>>>>>>>>>>>>>>> >>>>>>>>>>>> crc32                  : 0217f07b
>>>>>>>>>>>>>>>> >>>>>>>>>>>> local_conf_timestamp   : 2911
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Host timestamp         : 2897
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Extra metadata (valid at timestamp):
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     metadata_parse_version=1
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     metadata_feature_version=1
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     timestamp=2897 (Thu Jun 15 16:22:54 2017)
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     host-id=1
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     score=0
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     vm_conf_refresh_time=2911 (Thu Jun 15 16:23:08 2017)
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     conf_on_shared_storage=True
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     maintenance=False
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     state=AgentStopped
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     stopped=True
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Yet I can log in to the web GUI fine. I guess it is not HA due to
>>>>>>>>>>>>>>>> >>>>>>>>>>>> being in an unknown state currently? Does the hosted-engine-ha rpm
>>>>>>>>>>>>>>>> >>>>>>>>>>>> need to be installed across all nodes in the cluster, btw?
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks for the help,
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Jenny Tokar
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jun 15, 2017 at 6:32 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I've migrated from a bare-metal engine to a hosted engine. There
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> were no errors during the install; however, the hosted engine did
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> not get started.
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I tried running:
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> hosted-engine --status
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> on the host I deployed it on, and it returns nothing (the exit code
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is 1, however). I could not ping it either. So I tried starting it
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> via 'hosted-engine --vm-start' and it returned:
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Virtual machine does not exist
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> But it then became available. I logged into it successfully. It is
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> not in the list of VMs, however.
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Any ideas why the hosted-engine commands fail, and why it is not in
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> the list of virtual machines?
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for any help,
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Cam

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users