Sorry, I am mistaken, two hosts failed for the agent with the following error:

ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Shutting down the agent because of 3 failures in a row!

What could cause these timeouts? Some other service not running?
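For reference, this is what I am checking on each host (the sanlock invocation is from memory, so correct me if it is wrong):

  # are both HA services actually up?
  systemctl status ovirt-ha-agent ovirt-ha-broker

  # which lockspaces does sanlock currently hold on this host?
  sanlock client status

  # watch the agent and broker logs together
  tail -f /var/log/ovirt-hosted-engine-ha/agent.log /var/log/ovirt-hosted-engine-ha/broker.log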
On Thu, Jun 29, 2017 at 5:03 PM, cmc <iuco...@gmail.com> wrote:
> Both services are up on all three hosts. The broker logs just report:
>
> Thread-6549::INFO::2017-06-29 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
> Thread-6549::INFO::2017-06-29 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>
> Thanks,
>
> Cam
>
> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak <msi...@redhat.com> wrote:
>> Hi,
>>
>> please make sure that both the ovirt-ha-agent and ovirt-ha-broker services
>> are restarted and up. The error says the agent can't talk to the broker.
>> Is there anything in the broker.log?
>>
>> Best regards
>>
>> Martin Sivak
>>
>> On Thu, Jun 29, 2017 at 4:42 PM, cmc <iuco...@gmail.com> wrote:
>>> I've restarted those two services across all hosts, have taken the
>>> Hosted Engine host out of maintenance, and when I try to migrate the
>>> Hosted Engine over to another host, it reports that all three hosts
>>> 'did not satisfy internal filter HA because it is not a Hosted Engine host'.
>>>
>>> On the host that the Hosted Engine is currently on, it reports in the agent.log:
>>>
>>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Connection closed: Connection closed
>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception getting service path: Connection closed
>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
>>>     return action(he)
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 64, in action_proper
>>>     return he.start_monitoring()
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 411, in start_monitoring
>>>     self._initialize_sanlock()
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 691, in _initialize_sanlock
>>>     constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION)
>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 162, in get_service_path
>>>     .format(str(e)))
>>> RequestError: Failed to get service path: Connection closed
>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
>>>
>>> On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak <msi...@redhat.com> wrote:
>>>> Hi,
>>>>
>>>> yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker services.
>>>>
>>>> The scheduling message just means that the host has score 0 or is not
>>>> reporting a score at all.
>>>>
>>>> Martin
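>>>> (On each host, something along these lines - broker first, then agent,
>>>> since the agent reconnects to the broker when it starts:
>>>>
>>>>   systemctl restart ovirt-ha-broker
>>>>   systemctl restart ovirt-ha-agent
>>>> )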
>>>> On Thu, Jun 29, 2017 at 1:33 PM, cmc <iuco...@gmail.com> wrote:
>>>>> Thanks Martin, do I have to restart anything? When I try to use the
>>>>> 'migrate' operation, it complains that the other two hosts 'did not
>>>>> satisfy internal filter HA because it is not a Hosted Engine host'
>>>>> (even though I reinstalled both these hosts with the 'deploy hosted
>>>>> engine' option), which suggests that something needs restarting. Should
>>>>> I worry about the sanlock errors, or will that be resolved by the
>>>>> change in host_id?
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Cam
>>>>>
>>>>> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak <msi...@redhat.com> wrote:
>>>>>> Change the ids so they are distinct. I need to check if there is a way
>>>>>> to read the SPM ids from the engine, as using the same numbers would be
>>>>>> best.
>>>>>>
>>>>>> Martin
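>>>>>> (From memory, the mapping should live in the vds_spm_id_map table of
>>>>>> the engine database, so something like this on the engine VM ought to
>>>>>> show it - I have not verified the column names:
>>>>>>
>>>>>>   sudo -u postgres psql engine -c "SELECT vds_id, vds_spm_id FROM vds_spm_id_map;"
>>>>>> )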
>>>>>> On Thu, Jun 29, 2017 at 12:46 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>> Is there any way of recovering from this situation? I'd prefer to fix
>>>>>>> the issue rather than re-deploy, but if there is no recovery path, I
>>>>>>> could perhaps try re-deploying the hosted engine. In that case, would
>>>>>>> the best option be to take a backup of the Hosted Engine, then shut it
>>>>>>> down, re-initialise the SAN partition (or use another partition) and
>>>>>>> retry the deployment? Would it be better to use the older backup from
>>>>>>> the bare-metal engine that I originally used, or a backup from the
>>>>>>> Hosted Engine? I'm not sure if any VMs have been added since switching
>>>>>>> to the Hosted Engine.
>>>>>>>
>>>>>>> Unfortunately I have very little time left to get this working before
>>>>>>> I have to hand it over for eval (by end of Friday).
>>>>>>>
>>>>>>> Here are some current log snippets from the cluster.
>>>>>>>
>>>>>>> In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine:
>>>>>>>
>>>>>>> 2017-06-29 10:50:15,071+0100 INFO (monitor/207221b) [storage.SANLock] Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f (id: 3) (clusterlock:282)
>>>>>>> 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) [storage.Monitor] Error acquiring host id 3 for domain 207221b2-959b-426b-b945-18e1adfed62f (monitor:558)
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "/usr/share/vdsm/storage/monitor.py", line 555, in _acquireHostId
>>>>>>>     self.domain.acquireHostId(self.hostId, async=True)
>>>>>>>   File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
>>>>>>>     self._manifest.acquireHostId(hostId, async)
>>>>>>>   File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
>>>>>>>     self._domainLock.acquireHostId(hostId, async)
>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 297, in acquireHostId
>>>>>>>     raise se.AcquireHostIdFailure(self._sdUUID, e)
>>>>>>> AcquireHostIdFailure: Cannot acquire host id: ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
>>>>>>>
>>>>>>> From /var/log/ovirt-hosted-engine-ha/agent.log on the same host:
>>>>>>>
>>>>>>> MainThread::ERROR::2017-06-19 13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>> MainThread::WARNING::2017-06-19 13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>> MainThread::WARNING::2017-06-19 13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 443, in start_monitoring
>>>>>>>     self._initialize_domain_monitor()
>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 823, in _initialize_domain_monitor
>>>>>>>     raise Exception(msg)
>>>>>>> Exception: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>> MainThread::ERROR::2017-06-19 13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
>>>>>>>
>>>>>>> From sanlock.log:
>>>>>>>
>>>>>>> 2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace 207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0 conflicts with name of list1 s5 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
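>>>>>>> (For what it's worth, I believe the delta leases on the ids volume
>>>>>>> can be inspected read-only with something like:
>>>>>>>
>>>>>>>   sanlock direct dump /dev/207221b2-959b-426b-b945-18e1adfed62f/ids
>>>>>>>
>>>>>>> which should show which host ids are currently registered in the
>>>>>>> lockspace - syntax from memory.)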
>>>>>>>
>>>>>>> From the two other hosts:
>>>>>>>
>>>>>>> host 2:
>>>>>>>
>>>>>>> vdsm.log
>>>>>>>
>>>>>>> 2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) [jsonrpc.JsonRpcServer] Internal server error (__init__:570)
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 565, in _handle_request
>>>>>>>     res = method(**params)
>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in _dynamicMethod
>>>>>>>     result = fn(*methodArgs)
>>>>>>>   File "/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies
>>>>>>>     io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()
>>>>>>>   File "/usr/share/vdsm/clientIF.py", line 448, in getAllVmIoTunePolicies
>>>>>>>     'current_values': v.getIoTune()}
>>>>>>>   File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune
>>>>>>>     result = self.getIoTuneResponse()
>>>>>>>   File "/usr/share/vdsm/virt/vm.py", line 2816, in getIoTuneResponse
>>>>>>>     res = self._dom.blockIoTune(
>>>>>>>   File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in __getattr__
>>>>>>>     % self.vmid)
>>>>>>> NotConnectedError: VM u'a79e6b0e-fff4-4cba-a02c-4c00be151300' was not started yet or was shut down
>>>>>>>
>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log
>>>>>>>
>>>>>>> MainThread::INFO::2017-06-29 10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:222610db-7880-4f4f-8559-a3635fd73555, volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1
>>>>>>> MainThread::INFO::2017-06-29 10:56:33,926::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
>>>>>>> MainThread::INFO::2017-06-29 10:56:33,938::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/images/222610db-7880-4f4f-8559-a3635fd73555/c6e0d29b-eabf-4a09-a330-df54cfdd73f1
>>>>>>> MainThread::INFO::2017-06-29 10:56:33,967::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM, trying to convert
>>>>>>> MainThread::INFO::2017-06-29 10:56:33,971::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
>>>>>>> MainThread::INFO::2017-06-29 10:56:36,736::states::678::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Score is 0 due to unexpected vm shutdown at Thu Jun 29 10:53:59 2017
>>>>>>> MainThread::INFO::2017-06-29 10:56:36,736::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
>>>>>>> MainThread::INFO::2017-06-29 10:56:46,772::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) Reloading vm.conf from the shared storage domain
>>>>>>>
>>>>>>> /var/log/messages:
>>>>>>>
>>>>>>> Jun 29 10:53:46 kvm-ldn-02 kernel: dd: sending ioctl 80306d02 to a partition!
>>>>>>>
>>>>>>> host 1:
>>>>>>>
>>>>>>> /var/log/messages, also in sanlock.log:
>>>>>>>
>>>>>>> Jun 29 11:01:02 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:02+0100 678325 [9132]: s4531 delta_acquire host_id 1 busy1 1 2 1193177 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>>>>>>> Jun 29 11:01:03 kvm-ldn-01 sanlock[2400]: 2017-06-29 11:01:03+0100 678326 [24159]: s4531 add_lockspace fail result -262
>>>>>>>
>>>>>>> /var/log/ovirt-hosted-engine-ha/agent.log:
>>>>>>>
>>>>>>> MainThread::ERROR::2017-06-27 15:21:01,143::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>> MainThread::WARNING::2017-06-27 15:21:01,144::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>> MainThread::WARNING::2017-06-27 15:21:01,144::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 443, in start_monitoring
>>>>>>>     self._initialize_domain_monitor()
>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 823, in _initialize_domain_monitor
>>>>>>>     raise Exception(msg)
>>>>>>> Exception: Failed to start monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout during domain acquisition
>>>>>>> MainThread::ERROR::2017-06-27 15:21:01,144::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
>>>>>>> MainThread::INFO::2017-06-27 15:21:06,717::hosted_engine::848::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING
>>>>>>> MainThread::INFO::2017-06-27 15:21:09,335::hosted_engine::776::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Failed to stop monitoring domain (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f): Storage domain is member of pool: u'domain=207221b2-959b-426b-b945-18e1adfed62f'
>>>>>>> MainThread::INFO::2017-06-27 15:21:09,339::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
>>>>>>>
>>>>>>> Thanks for any help,
>>>>>>>
>>>>>>> Cam
>>>>>>>
>>>>>>> On Wed, Jun 28, 2017 at 11:25 AM, cmc <iuco...@gmail.com> wrote:
>>>>>>>> Hi Martin,
>>>>>>>>
>>>>>>>> yes, on two of the machines they have the same host_id. The other has
>>>>>>>> a different host_id.
>>>>>>>>
>>>>>>>> To update since yesterday: I reinstalled and deployed Hosted Engine on
>>>>>>>> the other host (so all three hosts in the cluster now have it
>>>>>>>> installed). The second one I deployed said it was able to host the
>>>>>>>> engine (unlike the first I reinstalled), so I tried putting the host
>>>>>>>> with the Hosted Engine on it into maintenance to see if it would
>>>>>>>> migrate over. It managed to move all VMs but the Hosted Engine. And
>>>>>>>> now the host that said it was able to host the engine says
>>>>>>>> 'unavailable due to HA score'. The host that it was trying to move
>>>>>>>> from has now been in 'preparing for maintenance' for the last 12 hours.
>>>>>>>>
>>>>>>>> The summary is:
>>>>>>>>
>>>>>>>> kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, reinstalled
>>>>>>>> with 'Deploy Hosted Engine'. No icon saying it can host the Hosted
>>>>>>>> Engine; host_id of '2' in /etc/ovirt-hosted-engine/hosted-engine.conf.
>>>>>>>> 'add_lockspace' fails in sanlock.log.
>>>>>>>>
>>>>>>>> kvm-ldn-02 - the other host that was pre-existing before Hosted Engine
>>>>>>>> was created. Reinstalled with 'Deploy Hosted Engine'. Had an icon
>>>>>>>> saying that it was able to host the Hosted Engine, but after migration
>>>>>>>> was attempted when putting kvm-ldn-03 into maintenance, it reports:
>>>>>>>> 'unavailable due to HA score'. It has a host_id of '1' in
>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf. No errors in sanlock.log.
>>>>>>>>
>>>>>>>> kvm-ldn-03 - this was the host I deployed Hosted Engine on, which was
>>>>>>>> not part of the original cluster. I restored the bare-metal engine
>>>>>>>> backup in the Hosted Engine on this host when deploying it, without
>>>>>>>> error. It currently has the Hosted Engine on it (as the only VM after
>>>>>>>> I put that host into maintenance to test the HA of Hosted Engine).
>>>>>>>> Sanlock log shows conflicts.
>>>>>>>>
>>>>>>>> I will look through all the logs for any other errors. Please let me
>>>>>>>> know if you need any logs or other clarification/information.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Campbell
>>>>>>>>
>>>>>>>> On Wed, Jun 28, 2017 at 9:25 AM, Martin Sivak <msi...@redhat.com> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> can you please check the contents of
>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf or
>>>>>>>>> /etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which one it is
>>>>>>>>> right now) and search for host-id?
>>>>>>>>>
>>>>>>>>> Make sure the IDs are different. If they are not, then there is a bug
>>>>>>>>> somewhere.
>>>>>>>>>
>>>>>>>>> Martin
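>>>>>>>>> (e.g. something like this on every host - each should print a
>>>>>>>>> different number; the exact key name may be host_id or host-id,
>>>>>>>>> I don't remember offhand:
>>>>>>>>>
>>>>>>>>>   grep -i host.id /etc/ovirt-hosted-engine/hosted-engine.conf
>>>>>>>>> )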
>>>>>>>>> On Tue, Jun 27, 2017 at 6:26 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>> I see this on the host it is trying to migrate to, in /var/log/sanlock:
>>>>>>>>>>
>>>>>>>>>> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>>>>>>>>> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire host_id 1 busy1 1 2 1042692 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>>>>>>>>>> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace fail result -262
>>>>>>>>>>
>>>>>>>>>> The sanlock service is running. Why would this occur?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> C
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 27, 2017 at 5:21 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the reply. I have done this, and the deployment completed
>>>>>>>>>>> without error. However, it still will not allow the Hosted Engine to
>>>>>>>>>>> migrate to another host. The /etc/ovirt-hosted-engine/hosted-engine.conf
>>>>>>>>>>> got created OK on the host I re-installed, but the ovirt-ha-broker.service,
>>>>>>>>>>> though it starts, reports:
>>>>>>>>>>>
>>>>>>>>>>> --------------------8<-------------------
>>>>>>>>>>>
>>>>>>>>>>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
>>>>>>>>>>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR Failed to read metadata from /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 129, in get_raw_stats_for_service_type
>>>>>>>>>>>     f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
>>>>>>>>>>> OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata'
>>>>>>>>>>>
>>>>>>>>>>> --------------------8<-------------------
>>>>>>>>>>>
>>>>>>>>>>> I checked the path, and it exists. I can run 'less -f' on it fine. The
>>>>>>>>>>> perms are slightly different on the host that is running the VM vs the
>>>>>>>>>>> one that is reporting errors (600 vs 660); ownership is vdsm:qemu. Is
>>>>>>>>>>> this a sanlock issue?
>>>>>>>>>>>
>>>>>>>>>>> Thanks for any help,
>>>>>>>>>>>
>>>>>>>>>>> Cam
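>>>>>>>>>>> (Those ha_agent entries are symlinks into /var/run/vdsm/storage on
>>>>>>>>>>> my hosts, so I'm checking whether the link target actually resolves
>>>>>>>>>>> on the failing host - a dangling link would explain the ENOENT even
>>>>>>>>>>> though the path 'exists':
>>>>>>>>>>>
>>>>>>>>>>>   readlink -f /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
>>>>>>>>>>>   ls -lL /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/
>>>>>>>>>>> )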
>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak <msi...@redhat.com> wrote:
>>>>>>>>>>>>> Should it be? It was not in the instructions for the migration from
>>>>>>>>>>>>> bare-metal to Hosted VM
>>>>>>>>>>>>
>>>>>>>>>>>> The hosted engine will only migrate to hosts that have the services
>>>>>>>>>>>> running. Please put one other host to maintenance and select Hosted
>>>>>>>>>>>> engine action: DEPLOY in the reinstall dialog.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards
>>>>>>>>>>>>
>>>>>>>>>>>> Martin Sivak
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>> I changed the 'os.other.devices.display.protocols.value.3.6 =
>>>>>>>>>>>>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols
>>>>>>>>>>>>> as the 4.x one, and the hosted engine now appears in the list of VMs.
>>>>>>>>>>>>> I am guessing the compatibility version was causing it to use the 3.6
>>>>>>>>>>>>> value. However, I am still unable to migrate the engine VM to another
>>>>>>>>>>>>> host. When I try putting the host it is currently on into maintenance,
>>>>>>>>>>>>> it reports:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Error while executing action: Cannot switch the Host(s) to Maintenance mode.
>>>>>>>>>>>>> There are no available hosts capable of running the engine VM.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Running 'hosted-engine --vm-status' still shows 'Engine status:
>>>>>>>>>>>>> unknown stale-data'.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The ovirt-ha-broker service is only running on one host. It was set to
>>>>>>>>>>>>> 'disabled' in systemd. It won't start as there is no
>>>>>>>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts.
>>>>>>>>>>>>> Should it be? It was not in the instructions for the migration from
>>>>>>>>>>>>> bare-metal to Hosted VM.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cam
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 1:07 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>> Hi Tomas,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So in my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my
>>>>>>>>>>>>>> engine VM, I have:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>>>>>>> os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/cirrus,vnc/qxl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That seems to match - I assume since this is 4.1, the 3.6 value should
>>>>>>>>>>>>>> not apply.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there somewhere else I should be looking?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cam
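>>>>>>>>>>>>>> (An aside: I've since read - not yet verified - that local overrides
>>>>>>>>>>>>>> are meant to go in a drop-in properties file rather than in the
>>>>>>>>>>>>>> defaults file itself, so upgrades don't overwrite them, e.g.
>>>>>>>>>>>>>> /etc/ovirt-engine/osinfo.conf.d/99-display.properties containing:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> followed by 'systemctl restart ovirt-engine'.)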
>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek <tjeli...@redhat.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek <michal.skriva...@redhat.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > On 22 Jun 2017, at 12:31, Martin Sivak <msi...@redhat.com> wrote:
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Tomas, what fields are needed in a VM to pass the check that causes
>>>>>>>>>>>>>>>> > the following error?
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> >>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> to match the OS and VM display type ;-)
>>>>>>>>>>>>>>>> Configuration is in osinfo… e.g. if that is an import from older
>>>>>>>>>>>>>>>> releases on Linux, this is typically caused by the change of cirrus to
>>>>>>>>>>>>>>>> vga for non-SPICE VMs
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> yep, the default supported combinations for 4.0+ are these:
>>>>>>>>>>>>>>> os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > Thanks.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > On Thu, Jun 22, 2017 at 12:19 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>>>> >> Hi Martin,
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>> just as a random comment, do you still have the database backup from
>>>>>>>>>>>>>>>> >>> the bare metal -> VM attempt? It might be possible to just try again
>>>>>>>>>>>>>>>> >>> using it. Or in the worst case, update the offending value there
>>>>>>>>>>>>>>>> >>> before restoring it to the new engine instance.
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> I still have the backup. I'd rather do the latter, as re-running the
>>>>>>>>>>>>>>>> >> HE deployment is quite lengthy and involved (I have to re-initialise
>>>>>>>>>>>>>>>> >> the FC storage each time). Do you know what the offending value(s)
>>>>>>>>>>>>>>>> >> would be? Would it be in the Postgres DB or in a config file somewhere?
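>>>>>>>>>>>>>>>> >> (If it's in the DB, I'd guess the video device row in the vm_device
>>>>>>>>>>>>>>>> >> table - something like the query below might show it, though I'm not
>>>>>>>>>>>>>>>> >> sure of the schema:
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>   sudo -u postgres psql engine -c "SELECT vm_id, device FROM vm_device WHERE type = 'video';"
>>>>>>>>>>>>>>>> >> )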
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> Cheers,
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> Cam
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>> Regards
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> Martin Sivak
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> On Thu, Jun 22, 2017 at 11:39 AM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>>>> >>>> Hi Yanir,
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> Thanks for the reply.
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>>> First of all, maybe a chain reaction of:
>>>>>>>>>>>>>>>> >>>>> WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>>>>>>>>>>>>>>> >>>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>>>>>>>>>>>>>>>> >>>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>>>>>>>>>>>>>>> >>>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>> >>>>> is causing the hosted engine vm not to be set up correctly, and
>>>>>>>>>>>>>>>> >>>>> further actions were made when the hosted engine vm wasn't in a
>>>>>>>>>>>>>>>> >>>>> stable state.
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> As for now, are you trying to revert back to a previous/initial state?
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> I'm not trying to revert it to a previous state for now. This was a
>>>>>>>>>>>>>>>> >>>> migration from a bare-metal engine, and it didn't report any error
>>>>>>>>>>>>>>>> >>>> during the migration. I'd had some problems on my first attempts at
>>>>>>>>>>>>>>>> >>>> this migration, whereby it never completed (due to a proxy issue), but
>>>>>>>>>>>>>>>> >>>> I managed to resolve this. Do you know of a way to get the Hosted
>>>>>>>>>>>>>>>> >>>> Engine VM into a stable state, without rebuilding the entire cluster
>>>>>>>>>>>>>>>> >>>> from scratch (since I have a lot of VMs on it)?
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> Thanks for any help.
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> Regards,
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> Cam
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>>> Regards,
>>>>>>>>>>>>>>>> >>>>> Yanir
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> On Wed, Jun 21, 2017 at 4:32 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> >>>>>> Hi Jenny/Martin,
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> >>>>>> Any idea what I can do here? The hosted engine VM has no log on any
>>>>>>>>>>>>>>>> >>>>>> host in /var/log/libvirt/qemu, and I fear that if I need to put the
>>>>>>>>>>>>>>>> >>>>>> host I created it on (which I think is hosting it) into maintenance,
>>>>>>>>>>>>>>>> >>>>>> e.g. to upgrade it, or if it fails for any reason, it won't get
>>>>>>>>>>>>>>>> >>>>>> migrated to another host, and I will not be able to manage the
>>>>>>>>>>>>>>>> >>>>>> cluster. It seems to be a very dangerous position to be in.
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> >>>>>> Thanks,
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> >>>>>> Cam
>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>> >>>>>> On Wed, Jun 21, 2017 at 11:48 AM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>> Thanks Martin. The hosts are all part of the same cluster.
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> I get these errors in the engine.log on the engine:
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z WARN [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm' failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z INFO [org.ovirt.engine.core.bll.exportimport.ImportVmCommand] (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object 'EngineLock:{exclusiveLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<VM, ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>, HostedEngine=<VM_NAME, ACTION_TYPE_FAILED_NAME_ALREADY_USED>]', sharedLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<REMOTE_VM, ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}'
>>>>>>>>>>>>>>>> >>>>>>> 2017-06-19 03:28:05,030Z ERROR [org.ovirt.engine.core.bll.HostedEngineImporter] (org.ovirt.thread.pool-6-thread-23) [] Failed importing the Hosted Engine VM
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> The sanlock.log reports conflicts on that same host, and a different
>>>>>>>>>>>>>>>> >>>>>>> error on the other hosts; not sure if they are related.
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> And this in the /var/log/ovirt-hosted-engine-ha/agent.log on the host
>>>>>>>>>>>>>>>> >>>>>>> which I deployed the hosted engine VM on:
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Unable to extract HEVM OVF
>>>>>>>>>>>>>>>> >>>>>>> MainThread::ERROR::2017-06-19 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Failed extracting VM OVF from the OVF_STORE volume, falling back to initial vm.conf
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> I've seen some of these issues reported in bugzilla, but they were for
>>>>>>>>>>>>>>>> >>>>>>> older versions of oVirt (and appear to be resolved).
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> I will install that package on the other two hosts, for which I will
>>>>>>>>>>>>>>>> >>>>>>> put them in maintenance, as vdsm is installed as an upgrade. I guess
>>>>>>>>>>>>>>>> >>>>>>> restarting vdsm is a good idea after that?
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> Thanks,
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> Campbell
>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>> >>>>>>> On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak <msi...@redhat.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>>> Hi,
>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>> you do not have to install it on all hosts.
>>>>>>>>>>>>>>>> >>>>>>>> But you should have more
>>>>>>>>>>>>>>>> >>>>>>>> than one, and ideally all hosted engine enabled nodes should belong to
>>>>>>>>>>>>>>>> >>>>>>>> the same engine cluster.
>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>> Best regards
>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>> Martin Sivak
>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>> On Wed, Jun 21, 2017 at 11:29 AM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>> Does ovirt-hosted-engine-ha need to be installed across all hosts?
>>>>>>>>>>>>>>>> >>>>>>>>> Could that be the reason it is failing to see it properly?
>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>> Cam
>>>>>>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>> On Mon, Jun 19, 2017 at 1:27 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>> Logs are attached. I can see errors in there, but am unsure how
>>>>>>>>>>>>>>>> >>>>>>>>>> they arose.
>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>> Campbell
>>>>>>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>> On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar <eto...@redhat.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>>>>>> From the output it looks like the agent is down; try starting it
>>>>>>>>>>>>>>>> >>>>>>>>>>> by running:
>>>>>>>>>>>>>>>> >>>>>>>>>>> systemctl start ovirt-ha-agent
>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>> The engine is supposed to see the hosted engine storage domain and
>>>>>>>>>>>>>>>> >>>>>>>>>>> import it into the system; then it should import the hosted engine vm.
>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>> Can you attach the agent log from the host
>>>>>>>>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-hosted-engine-ha/agent.log)
>>>>>>>>>>>>>>>> >>>>>>>>>>> and the engine log from the engine vm
>>>>>>>>>>>>>>>> >>>>>>>>>>> (/var/log/ovirt-engine/engine.log)?
>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> >>>>>>>>>>> Jenny
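>>>>>>>>>>>>>>>> >>>>>>>>>>> (And if the services are not enabled, something like this so they
>>>>>>>>>>>>>>>> >>>>>>>>>>> come back after a reboot:
>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>   systemctl enable ovirt-ha-broker ovirt-ha-agent
>>>>>>>>>>>>>>>> >>>>>>>>>>>   systemctl start ovirt-ha-broker ovirt-ha-agent
>>>>>>>>>>>>>>>> >>>>>>>>>>> )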
>>>>>>>>>>>>>>>> >>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>> On Mon, Jun 19, 2017 at 12:41 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hi Jenny,
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What version are you running?
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>> 4.1.2.2-1.el7.centos
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>> For the hosted engine vm to be imported and displayed in the engine,
>>>>>>>>>>>>>>>> >>>>>>>>>>>>> you must first create a master storage domain.
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>> To provide a bit more detail: this was a migration of a bare-metal
>>>>>>>>>>>>>>>> >>>>>>>>>>>> engine in an existing cluster to a hosted engine VM for that cluster.
>>>>>>>>>>>>>>>> >>>>>>>>>>>> As part of this migration, I built an entirely new host and ran
>>>>>>>>>>>>>>>> >>>>>>>>>>>> 'hosted-engine --deploy' (following these instructions:
>>>>>>>>>>>>>>>> >>>>>>>>>>>> http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_Metal_to_an_EL-Based_Self-Hosted_Environment/).
>>>>>>>>>>>>>>>> >>>>>>>>>>>> I restored the backup from the engine and it completed without any
>>>>>>>>>>>>>>>> >>>>>>>>>>>> errors. I didn't see any instructions regarding a master storage
>>>>>>>>>>>>>>>> >>>>>>>>>>>> domain in the page above. The cluster has two existing storage
>>>>>>>>>>>>>>>> >>>>>>>>>>>> domains: one is fibre channel, which is up, and one ISO domain,
>>>>>>>>>>>>>>>> >>>>>>>>>>>> which is currently offline.
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>> What do you mean the hosted engine commands are failing? What
>>>>>>>>>>>>>>>> >>>>>>>>>>>>> happens when you run hosted-engine --vm-status now?
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Interestingly, whereas when I ran it before it exited with no output
>>>>>>>>>>>>>>>> >>>>>>>>>>>> and a return code of '1', it now reports:
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>> --== Host 1 status ==--
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>> conf_on_shared_storage : True
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Status up-to-date      : False
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Hostname               : kvm-ldn-03.ldn.fscfc.co.uk
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Host ID                : 1
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Engine status          : unknown stale-data
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Score                  : 0
>>>>>>>>>>>>>>>> >>>>>>>>>>>> stopped                : True
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Local maintenance      : False
>>>>>>>>>>>>>>>> >>>>>>>>>>>> crc32                  : 0217f07b
>>>>>>>>>>>>>>>> >>>>>>>>>>>> local_conf_timestamp   : 2911
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Host timestamp         : 2897
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Extra metadata (valid at timestamp):
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     metadata_parse_version=1
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     metadata_feature_version=1
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     timestamp=2897 (Thu Jun 15 16:22:54 2017)
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     host-id=1
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     score=0
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     vm_conf_refresh_time=2911 (Thu Jun 15 16:23:08 2017)
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     conf_on_shared_storage=True
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     maintenance=False
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     state=AgentStopped
>>>>>>>>>>>>>>>> >>>>>>>>>>>>     stopped=True
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Yet I can log in to the web GUI fine. I guess it is not HA due to
>>>>>>>>>>>>>>>> >>>>>>>>>>>> being in an unknown state currently? Does the hosted-engine-ha rpm
>>>>>>>>>>>>>>>> >>>>>>>>>>>> need to be installed across all nodes in the cluster, btw?
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Thanks for the help,
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>> Cam
>>>>>>>>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>> Jenny Tokar
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Jun 15, 2017 at 6:32 PM, cmc <iuco...@gmail.com> wrote:
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I've migrated from a bare-metal engine to a hosted engine. There
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> were no errors during the install; however, the hosted engine did
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> not get started.
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> I tried running:
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> hosted-engine --status
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> on the host I deployed it on, and it returns nothing (the exit code
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> is 1, however). I could not ping it either. So I tried starting it
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> via 'hosted-engine --vm-start' and it returned:
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Virtual machine does not exist
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> But it then became available. I logged into it successfully. It is
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> not in the list of VMs, however.
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Any ideas why the hosted-engine commands fail, and why it is not in
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> the list of virtual machines?
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for any help,
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Cam

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users