Re: [ovirt-users] hosted-engine unknow stale-data

2018-01-22 Thread Martin Sivak
Hi Artem,

Make sure the host IDs are different; change them manually if you must!

That is all you need to do to get the agent up, I think. The symlink
issue is probably related to another change we made (it happens when a
new hosted-engine node is deployed by the engine), and a simple broker
restart should fix it too.
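
A minimal sketch of that check, using only the configuration path and service
names that appear elsewhere in this thread (adjust the ID values to your own setup):

  # run on each host; the two values must differ
  grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
  # if both hosts report host_id=1, edit the file on the second host so it
  # reads host_id=2, then restart the HA services on that host:
  systemctl restart ovirt-ha-broker ovirt-ha-agent
  hosted-engine --vm-status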

Best regards

Martin Sivak

On Mon, Jan 22, 2018 at 8:03 AM, Artem Tambovskiy
 wrote:
> Hello Kasturi,
>
> Yes, I set global maintenance mode intentionally.
> I'd run out of ideas troubleshooting my cluster and decided to undeploy
> the hosted engine from the second host, clean the installation and add it
> back to the cluster.
> I also cleaned the metadata with hosted-engine --clean-metadata --host-id=2
> --force-clean. But once I added the second host to the cluster again it
> doesn't show the capability to run the hosted engine, and it doesn't even
> appear in the output of hosted-engine --vm-status:
> [root@ovirt1 ~]# hosted-engine --vm-status
>
> --== Host 1 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date  : True
> Hostname   : ovirt1.telia.ru
> Host ID: 1
> Engine status  : {"health": "good", "vm": "up", "detail": "up"}
> Score  : 3400
> stopped: False
> Local maintenance  : False
> crc32  : a23c7cbd
> local_conf_timestamp   : 848931
> Host timestamp : 848930
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=848930 (Mon Jan 22 09:53:29 2018)
> host-id=1
> score=3400
> vm_conf_refresh_time=848931 (Mon Jan 22 09:53:29 2018)
> conf_on_shared_storage=True
> maintenance=False
> state=GlobalMaintenance
> stopped=False
>
> On the redeployed second host I see unknown stale-data again, and the second host
> doesn't show up as hosted-engine capable.
> [root@ovirt2 ~]# hosted-engine --vm-status
>
>
> --== Host 1 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date  : False
> Hostname   : ovirt1.telia.ru
> Host ID: 1
> Engine status  : unknown stale-data
> Score  : 0
> stopped: False
> Local maintenance  : False
> crc32  : 18765f68
> local_conf_timestamp   : 848951
> Host timestamp : 848951
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=848951 (Mon Jan 22 09:53:49 2018)
> host-id=1
> score=0
> vm_conf_refresh_time=848951 (Mon Jan 22 09:53:50 2018)
> conf_on_shared_storage=True
> maintenance=False
> state=ReinitializeFSM
> stopped=False
>
>
> Really strange situation ...
>
> Regards,
> Artem
>
>
>
> On Mon, Jan 22, 2018 at 9:46 AM, Kasturi Narra  wrote:
>>
>> Hello Artem,
>>
>> Any reason why you chose the hosted-engine undeploy action for the second
>> host? I see that the cluster is in global maintenance mode; was this
>> intended?
>>
>> The command to clear the entries from hosted-engine --vm-status is
>> "hosted-engine --clean-metadata --host-id= --force-clean"
>>
>> Hope this helps !!
>>
>> Thanks
>> kasturi
>>
>>
>> On Fri, Jan 19, 2018 at 12:07 AM, Artem Tambovskiy
>>  wrote:
>>>
>>> Hi,
>>>
>>> OK, I decided to remove the second host from the cluster.
>>> I reinstalled it from the web UI with the hosted-engine action UNDEPLOY, and
>>> removed it from the cluster afterwards.
>>> All VMs are fine and the hosted engine is running OK,
>>> but hosted-engine --vm-status still shows 2 hosts.
>>>
>>> How can I clean up the traces of the second host in a correct way?
>>>
>>>
>>> --== Host 1 status ==--
>>>
>>> conf_on_shared_storage : True
>>> Status up-to-date  : True
>>> Hostname   : ovirt1.telia.ru
>>> Host ID: 1
>>> Engine status  : {"health": "good", "vm": "up",
>>> "detail": "up"}
>>> Score  : 3400
>>> stopped: False
>>> Local maintenance  : False
>>> crc32  : 1b1b6f6d
>>> local_conf_timestamp   : 545385
>>> Host timestamp : 545385
>>> Extra metadata (valid at timestamp):
>>> metadata_parse_version=1
>>> metadata_feature_version=1
>>> timestamp=545385 (Thu Jan 18 21:34:25 2018)
>>> host-id=1
>>> score=3400
>>> vm_conf_refresh_time=545385 (Thu Jan 18 21:34:25 2018)
>>> conf_on_shared_storage=True
>>> maintenance=False
>>> state=GlobalMaintenance
>>> stopped=False
>>>
>>>
>>> --== Host 2 status ==--
>>>
>>> conf_on_shared_storage : True
>>> Status up-to-date  : False
>>> Hostname   : ovirt1.telia.ru
>>> Host ID: 2
>>> Engine status  : unknown stale-data
>>> 

Re: [ovirt-users] hosted-engine unknow stale-data

2018-01-21 Thread Artem Tambovskiy
Hello Kasturi,

Yes, I set global maintenance mode intentionally.
I'd run out of ideas troubleshooting my cluster and decided to undeploy
the hosted engine from the second host, clean the installation and add it
back to the cluster.
I also cleaned the metadata with hosted-engine --clean-metadata
--host-id=2 --force-clean. But once I added the second host to the cluster
again it doesn't show the capability to run the hosted engine, and it doesn't
even appear in the output of hosted-engine --vm-status:
[root@ovirt1 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date  : True
Hostname   : ovirt1.telia.ru
Host ID: 1
Engine status  : {"health": "good", "vm": "up", "detail": "up"}
Score  : 3400
stopped: False
Local maintenance  : False
crc32  : a23c7cbd
local_conf_timestamp   : 848931
Host timestamp : 848930
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=848930 (Mon Jan 22 09:53:29 2018)
host-id=1
score=3400
vm_conf_refresh_time=848931 (Mon Jan 22 09:53:29 2018)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False

On the redeployed second host I see unknown stale-data again, and the second host
doesn't show up as hosted-engine capable.
[root@ovirt2 ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date  : False
Hostname   : ovirt1.telia.ru
Host ID: 1
Engine status  : unknown stale-data
Score  : 0
stopped: False
Local maintenance  : False
crc32  : 18765f68
local_conf_timestamp   : 848951
Host timestamp : 848951
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=848951 (Mon Jan 22 09:53:49 2018)
host-id=1
score=0
vm_conf_refresh_time=848951 (Mon Jan 22 09:53:50 2018)
conf_on_shared_storage=True
maintenance=False
state=ReinitializeFSM
stopped=False


Really strange situation ...

Regards,
Artem



On Mon, Jan 22, 2018 at 9:46 AM, Kasturi Narra  wrote:

> Hello Artem,
>
> Any reason why you chose the hosted-engine undeploy action for the second
> host? I see that the cluster is in global maintenance mode; was this
> intended?
>
> The command to clear the entries from hosted-engine --vm-status is "hosted-engine
> --clean-metadata --host-id= --force-clean"
>
> Hope this helps !!
>
> Thanks
> kasturi
>
>
> On Fri, Jan 19, 2018 at 12:07 AM, Artem Tambovskiy <
> artem.tambovs...@gmail.com> wrote:
>
>> Hi,
>>
>> OK, I decided to remove the second host from the cluster.
>> I reinstalled it from the web UI with the hosted-engine action UNDEPLOY, and
>> removed it from the cluster afterwards.
>> All VMs are fine and the hosted engine is running OK,
>> but hosted-engine --vm-status still shows 2 hosts.
>>
>> How can I clean up the traces of the second host in a correct way?
>>
>>
>> --== Host 1 status ==--
>>
>> conf_on_shared_storage : True
>> Status up-to-date  : True
>> Hostname   : ovirt1.telia.ru
>> Host ID: 1
>> Engine status  : {"health": "good", "vm": "up",
>> "detail": "up"}
>> Score  : 3400
>> stopped: False
>> Local maintenance  : False
>> crc32  : 1b1b6f6d
>> local_conf_timestamp   : 545385
>> Host timestamp : 545385
>> Extra metadata (valid at timestamp):
>> metadata_parse_version=1
>> metadata_feature_version=1
>> timestamp=545385 (Thu Jan 18 21:34:25 2018)
>> host-id=1
>> score=3400
>> vm_conf_refresh_time=545385 (Thu Jan 18 21:34:25 2018)
>> conf_on_shared_storage=True
>> maintenance=False
>> state=GlobalMaintenance
>> stopped=False
>>
>>
>> --== Host 2 status ==--
>>
>> conf_on_shared_storage : True
>> Status up-to-date  : False
>> Hostname   : ovirt1.telia.ru
>> Host ID: 2
>> Engine status  : unknown stale-data
>> Score  : 0
>> stopped: True
>> Local maintenance  : False
>> crc32  : c7037c03
>> local_conf_timestamp   : 7530
>> Host timestamp : 7530
>> Extra metadata (valid at timestamp):
>> metadata_parse_version=1
>> metadata_feature_version=1
>> timestamp=7530 (Fri Jan 12 16:10:12 2018)
>> host-id=2
>> score=0
>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>> conf_on_shared_storage=True
>> 

Re: [ovirt-users] hosted-engine unknow stale-data

2018-01-21 Thread Kasturi Narra
Hello Artem,

Any reason why you chose the hosted-engine undeploy action for the second
host? I see that the cluster is in global maintenance mode; was this
intended?

The command to clear the entries from hosted-engine --vm-status is "hosted-engine
--clean-metadata --host-id= --force-clean"
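
For example, to drop the stale host 2 entry shown in the status output above,
something like this should do it (a sketch only; double-check the host ID first,
and run it only once the HA agent for that host is stopped or the host has been
removed):

  hosted-engine --clean-metadata --host-id=2 --force-clean
  hosted-engine --vm-status   # the stale "Host 2" section should disappear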

Hope this helps !!

Thanks
kasturi


On Fri, Jan 19, 2018 at 12:07 AM, Artem Tambovskiy <
artem.tambovs...@gmail.com> wrote:

> Hi,
>
> OK, I decided to remove the second host from the cluster.
> I reinstalled it from the web UI with the hosted-engine action UNDEPLOY, and
> removed it from the cluster afterwards.
> All VMs are fine and the hosted engine is running OK,
> but hosted-engine --vm-status still shows 2 hosts.
>
> How can I clean up the traces of the second host in a correct way?
>
>
> --== Host 1 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date  : True
> Hostname   : ovirt1.telia.ru
> Host ID: 1
> Engine status  : {"health": "good", "vm": "up",
> "detail": "up"}
> Score  : 3400
> stopped: False
> Local maintenance  : False
> crc32  : 1b1b6f6d
> local_conf_timestamp   : 545385
> Host timestamp : 545385
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=545385 (Thu Jan 18 21:34:25 2018)
> host-id=1
> score=3400
> vm_conf_refresh_time=545385 (Thu Jan 18 21:34:25 2018)
> conf_on_shared_storage=True
> maintenance=False
> state=GlobalMaintenance
> stopped=False
>
>
> --== Host 2 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date  : False
> Hostname   : ovirt1.telia.ru
> Host ID: 2
> Engine status  : unknown stale-data
> Score  : 0
> stopped: True
> Local maintenance  : False
> crc32  : c7037c03
> local_conf_timestamp   : 7530
> Host timestamp : 7530
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=7530 (Fri Jan 12 16:10:12 2018)
> host-id=2
> score=0
> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
> conf_on_shared_storage=True
> maintenance=False
> state=AgentStopped
> stopped=True
>
>
> !! Cluster is in GLOBAL MAINTENANCE mode !!
>
> Thank you in advance!
> Regards,
> Artem
>
>
> On Wed, Jan 17, 2018 at 6:47 PM, Artem Tambovskiy <
> artem.tambovs...@gmail.com> wrote:
>
>> Hello,
>>
>> Any further suggestions on how to fix the issue and get the HA setup
>> working? Could completely removing the second host (including removal of the
>> oVirt configuration files and packages) from the cluster and adding it again
>> solve the issue? Or might it completely ruin the cluster?
>>
>> Regards,
>> Artem
>>
>> On Jan 16, 2018 at 17:00, "Artem Tambovskiy" <
>> artem.tambovs...@gmail.com> wrote:
>>
>> Hi Martin,
>>>
>>> Thanks for the feedback.
>>>
>>> All hosts and the hosted engine are running the 4.1.8 release.
>>> The strange thing: I can see that the host ID is set to 1 on both hosts in
>>> the /etc/ovirt-hosted-engine/hosted-engine.conf file.
>>> I have no idea how this happened; the only thing I have changed recently
>>> is the mnt_options, in order to add backup-volfile-servers
>>> using the hosted-engine --set-shared-config command.
>>>
>>> Both the agent and the broker are running on the second host:
>>>
>>> [root@ovirt2 ovirt-hosted-engine-ha]# ps -ef | grep ovirt-ha-
>>> vdsm  42331  1 26 14:40 ?00:31:35 /usr/bin/python
>>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>>> vdsm  42332  1  0 14:40 ?00:00:16 /usr/bin/python
>>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>>>
>>> but I saw some tracebacks during the broker start
>>>
>>> [root@ovirt2 ovirt-hosted-engine-ha]# systemctl status ovirt-ha-broker
>>> -l
>>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability
>>> Communications Broker
>>>Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
>>> enabled; vendor preset: disabled)
>>>Active: active (running) since Tue 2018-01-16 14:40:15 MSK; 1h 58min
>>> ago
>>>  Main PID: 42331 (ovirt-ha-broker)
>>>CGroup: /system.slice/ovirt-ha-broker.service
>>>└─42331 /usr/bin/python 
>>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
>>> --no-daemon
>>>
>>> Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Started oVirt Hosted Engine
>>> High Availability Communications Broker.
>>> Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Starting oVirt Hosted
>>> Engine High Availability Communications Broker...
>>> Jan 

Re: [ovirt-users] hosted-engine unknow stale-data

2018-01-18 Thread Artem Tambovskiy
Hi,

OK, I decided to remove the second host from the cluster.
I reinstalled it from the web UI with the hosted-engine action UNDEPLOY, and removed
it from the cluster afterwards.
All VMs are fine and the hosted engine is running OK,
but hosted-engine --vm-status still shows 2 hosts.

How can I clean up the traces of the second host in a correct way?


--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date  : True
Hostname   : ovirt1.telia.ru
Host ID: 1
Engine status  : {"health": "good", "vm": "up",
"detail": "up"}
Score  : 3400
stopped: False
Local maintenance  : False
crc32  : 1b1b6f6d
local_conf_timestamp   : 545385
Host timestamp : 545385
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=545385 (Thu Jan 18 21:34:25 2018)
host-id=1
score=3400
vm_conf_refresh_time=545385 (Thu Jan 18 21:34:25 2018)
conf_on_shared_storage=True
maintenance=False
state=GlobalMaintenance
stopped=False


--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date  : False
Hostname   : ovirt1.telia.ru
Host ID: 2
Engine status  : unknown stale-data
Score  : 0
stopped: True
Local maintenance  : False
crc32  : c7037c03
local_conf_timestamp   : 7530
Host timestamp : 7530
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=7530 (Fri Jan 12 16:10:12 2018)
host-id=2
score=0
vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
conf_on_shared_storage=True
maintenance=False
state=AgentStopped
stopped=True


!! Cluster is in GLOBAL MAINTENANCE mode !!

Thank you in advance!
Regards,
Artem


On Wed, Jan 17, 2018 at 6:47 PM, Artem Tambovskiy <
artem.tambovs...@gmail.com> wrote:

> Hello,
>
> Any further suggestions on how to fix the issue and get the HA setup working?
> Could completely removing the second host (including removal of the oVirt
> configuration files and packages) from the cluster and adding it again solve
> the issue? Or might it completely ruin the cluster?
>
> Regards,
> Artem
>
> On Jan 16, 2018 at 17:00, "Artem Tambovskiy" <
> artem.tambovs...@gmail.com> wrote:
>
> Hi Martin,
>>
>> Thanks for the feedback.
>>
>> All hosts and the hosted engine are running the 4.1.8 release.
>> The strange thing: I can see that the host ID is set to 1 on both hosts in
>> the /etc/ovirt-hosted-engine/hosted-engine.conf file.
>> I have no idea how this happened; the only thing I have changed recently is
>> the mnt_options, in order to add backup-volfile-servers
>> using the hosted-engine --set-shared-config command.
>>
>> Both the agent and the broker are running on the second host:
>>
>> [root@ovirt2 ovirt-hosted-engine-ha]# ps -ef | grep ovirt-ha-
>> vdsm  42331  1 26 14:40 ?00:31:35 /usr/bin/python
>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>> vdsm  42332  1  0 14:40 ?00:00:16 /usr/bin/python
>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>>
>> but I saw some tracebacks during the broker start
>>
>> [root@ovirt2 ovirt-hosted-engine-ha]# systemctl status ovirt-ha-broker -l
>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability
>> Communications Broker
>>Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
>> enabled; vendor preset: disabled)
>>Active: active (running) since Tue 2018-01-16 14:40:15 MSK; 1h 58min
>> ago
>>  Main PID: 42331 (ovirt-ha-broker)
>>CGroup: /system.slice/ovirt-ha-broker.service
>>└─42331 /usr/bin/python 
>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
>> --no-daemon
>>
>> Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Started oVirt Hosted Engine
>> High Availability Communications Broker.
>> Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Starting oVirt Hosted Engine
>> High Availability Communications Broker...
>> Jan 16 14:40:16 ovirt2.telia.ru ovirt-ha-broker[42331]: ovirt-ha-broker
>> ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error
>> handling request, data: 'set-storage-domain FilesystemBackend
>> dom_type=glusterfs sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162'
>> Traceback (most
>> recent call last):
>>   File
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
>> line 166, in handle
>> 

Re: [ovirt-users] hosted-engine unknow stale-data

2018-01-17 Thread Artem Tambovskiy
Hello,

Any further suggestions on how to fix the issue and get the HA setup working?
Could completely removing the second host (including removal of the oVirt
configuration files and packages) from the cluster and adding it again solve
the issue? Or might it completely ruin the cluster?

Regards,
Artem

On Jan 16, 2018 at 17:00, "Artem Tambovskiy" <
artem.tambovs...@gmail.com> wrote:

> Hi Martin,
>
> Thanks for the feedback.
>
> All hosts and the hosted engine are running the 4.1.8 release.
> The strange thing: I can see that the host ID is set to 1 on both hosts in
> the /etc/ovirt-hosted-engine/hosted-engine.conf file.
> I have no idea how this happened; the only thing I have changed recently is
> the mnt_options, in order to add backup-volfile-servers
> using the hosted-engine --set-shared-config command.
>
> Both the agent and the broker are running on the second host:
>
> [root@ovirt2 ovirt-hosted-engine-ha]# ps -ef | grep ovirt-ha-
> vdsm  42331  1 26 14:40 ?00:31:35 /usr/bin/python
> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
> vdsm  42332  1  0 14:40 ?00:00:16 /usr/bin/python
> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>
> but I saw some tracebacks during the broker start
>
> [root@ovirt2 ovirt-hosted-engine-ha]# systemctl status ovirt-ha-broker -l
> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability
> Communications Broker
>Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
> enabled; vendor preset: disabled)
>Active: active (running) since Tue 2018-01-16 14:40:15 MSK; 1h 58min ago
>  Main PID: 42331 (ovirt-ha-broker)
>CGroup: /system.slice/ovirt-ha-broker.service
>└─42331 /usr/bin/python 
> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
> --no-daemon
>
> Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Started oVirt Hosted Engine
> High Availability Communications Broker.
> Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Starting oVirt Hosted Engine
> High Availability Communications Broker...
> Jan 16 14:40:16 ovirt2.telia.ru ovirt-ha-broker[42331]: ovirt-ha-broker
> ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error
> handling request, data: 'set-storage-domain FilesystemBackend
> dom_type=glusterfs sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162'
> Traceback (most
> recent call last):
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
> line 166, in handle
> data)
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
> line 299, in _dispatch
>
> .set_storage_domain(client, sd_type, **options)
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
> line 66, in set_storage_domain
>
> self._backends[client].connect()
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py",
> line 462, in connect
> self._dom_type)
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py",
> line 107, in get_domain_path
> " in
> {1}".format(sd_uuid, parent))
>
> BackendFailureException: path to storage domain 
> 4a7f8717-9bb0-4d80-8016-498fa4b88162
> not found in /rhev/data-center/mnt/glusterSD
>
>
>
> I have tried to issue hosted-engine --connect-storage on the second host,
> followed by an agent & broker restart,
> but there is no visible improvement.
>
> Regards,
> Artem
>
>
>
>
>
>
>
> On Tue, Jan 16, 2018 at 4:18 PM, Martin Sivak  wrote:
>
>> Hi everybody,
>>
>> there are a couple of things to check here.
>>
>> - what version of the hosted-engine agent is this? The logs look like they
>> are coming from 4.1
>> - what version of the engine is used?
>> - check the host ID in /etc/ovirt-hosted-engine/hosted-engine.conf on
>> both hosts, the numbers must be different
>> - it looks like the agent or broker on host 2 is not active (or there
>> would be a report)
>> - the second host does not see data from the first host (unknown
>> stale-data), wait for a minute and check again, then check the storage
>> connection
>>
>> And then the general troubleshooting:
>>
>> - put hosted engine in global maintenance mode (and check that it is
>> visible from the other host using he --vm-status)
>> - mount storage domain (hosted-engine --connect-storage)
>> - check sanlock client status to see if proper lockspaces are present
>>
>> Best regards
>>
>> Martin Sivak
>>
>> On Tue, Jan 16, 2018 at 1:16 PM, Derek Atkins  

Re: [ovirt-users] hosted-engine unknow stale-data

2018-01-16 Thread Artem Tambovskiy
Hi Martin,

Thanks for the feedback.

All hosts and the hosted engine are running the 4.1.8 release.
The strange thing: I can see that the host ID is set to 1 on both hosts in
the /etc/ovirt-hosted-engine/hosted-engine.conf file.
I have no idea how this happened; the only thing I have changed recently is
the mnt_options, in order to add backup-volfile-servers
using the hosted-engine --set-shared-config command.
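
For reference, a sketch of that kind of mnt_options change (the backup host name
is a placeholder, and the --type values below are an assumption that may differ
between oVirt versions, so verify against the documentation for your release):

  hosted-engine --set-shared-config mnt_options \
      backup-volfile-servers=<backup-host> --type=he_local
  hosted-engine --set-shared-config mnt_options \
      backup-volfile-servers=<backup-host> --type=he_shared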

Both the agent and the broker are running on the second host:

[root@ovirt2 ovirt-hosted-engine-ha]# ps -ef | grep ovirt-ha-
vdsm  42331  1 26 14:40 ?00:31:35 /usr/bin/python
/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
vdsm  42332  1  0 14:40 ?00:00:16 /usr/bin/python
/usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

but I saw some tracebacks during the broker start

[root@ovirt2 ovirt-hosted-engine-ha]# systemctl status ovirt-ha-broker -l
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability
Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service;
enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-01-16 14:40:15 MSK; 1h 58min ago
 Main PID: 42331 (ovirt-ha-broker)
   CGroup: /system.slice/ovirt-ha-broker.service
   └─42331 /usr/bin/python
/usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon

Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Started oVirt Hosted Engine
High Availability Communications Broker.
Jan 16 14:40:15 ovirt2.telia.ru systemd[1]: Starting oVirt Hosted Engine
High Availability Communications Broker...
Jan 16 14:40:16 ovirt2.telia.ru ovirt-ha-broker[42331]: ovirt-ha-broker
ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error
handling request, data: 'set-storage-domain FilesystemBackend
dom_type=glusterfs sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162'
Traceback (most
recent call last):
  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
line 166, in handle
data)
  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
line 299, in _dispatch

.set_storage_domain(client, sd_type, **options)
  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
line 66, in set_storage_domain

self._backends[client].connect()
  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py",
line 462, in connect
self._dom_type)
  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py",
line 107, in get_domain_path
" in
{1}".format(sd_uuid, parent))

BackendFailureException: path to storage domain
4a7f8717-9bb0-4d80-8016-498fa4b88162 not found in
/rhev/data-center/mnt/glusterSD



I have tried to issue hosted-engine --connect-storage on the second host,
followed by an agent & broker restart,
but there is no visible improvement.
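
A rough sequence for retrying that on the second host, using only commands and
paths already mentioned in this thread (the UUID is the storage domain from the
traceback above):

  hosted-engine --connect-storage
  ls /rhev/data-center/mnt/glusterSD/
  # the storage domain 4a7f8717-9bb0-4d80-8016-498fa4b88162 should be visible
  # under one of the gluster mount directories listed here
  systemctl restart ovirt-ha-broker ovirt-ha-agent
  hosted-engine --vm-status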

Regards,
Artem







On Tue, Jan 16, 2018 at 4:18 PM, Martin Sivak  wrote:

> Hi everybody,
>
> there are a couple of things to check here.
>
> - what version of the hosted-engine agent is this? The logs look like they
> are coming from 4.1
> - what version of the engine is used?
> - check the host ID in /etc/ovirt-hosted-engine/hosted-engine.conf on
> both hosts, the numbers must be different
> - it looks like the agent or broker on host 2 is not active (or there
> would be a report)
> - the second host does not see data from the first host (unknown
> stale-data), wait for a minute and check again, then check the storage
> connection
>
> And then the general troubleshooting:
>
> - put hosted engine in global maintenance mode (and check that it is
> visible from the other host using he --vm-status)
> - mount storage domain (hosted-engine --connect-storage)
> - check sanlock client status to see if proper lockspaces are present
>
> Best regards
>
> Martin Sivak
>
> On Tue, Jan 16, 2018 at 1:16 PM, Derek Atkins  wrote:
> > Why are both hosts reporting as ovirt 1?
> > Look at the hostname fields to see what I mean.
> >
> > -derek
> > Sent using my mobile device. Please excuse any typos.
> >
> > On January 16, 2018 7:11:09 AM Artem Tambovskiy <
> artem.tambovs...@gmail.com>
> > wrote:
> >>
> >> Hello,
> >>
> >> Yes, I followed exactly the same procedure while reinstalling the hosts
> >> (the only difference is that I have an SSH key configured instead of a
> >> password).
> >>
> >> I just reinstalled the second host one more time; after 20 min the host
> >> still hasn't reached active 

Re: [ovirt-users] hosted-engine unknow stale-data

2018-01-16 Thread Martin Sivak
Hi everybody,

there are a couple of things to check here.

- what version of the hosted-engine agent is this? The logs look like they
are coming from 4.1
- what version of the engine is used?
- check the host ID in /etc/ovirt-hosted-engine/hosted-engine.conf on
both hosts; the numbers must be different
- it looks like the agent or broker on host 2 is not active (or there
would be a report)
- the second host does not see data from the first host (unknown
stale-data); wait for a minute and check again, then check the storage
connection

And then the general troubleshooting (commands sketched below):

- put the hosted engine in global maintenance mode (and check that it is
visible from the other host using hosted-engine --vm-status)
- mount the storage domain (hosted-engine --connect-storage)
- check sanlock client status to see if the proper lockspaces are present
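
A minimal command sketch of those general steps (the maintenance switch is the
standard hosted-engine option; verify the exact syntax against your version):

  hosted-engine --set-maintenance --mode=global
  hosted-engine --vm-status     # run on the other host too; it should report GlobalMaintenance
  hosted-engine --connect-storage
  sanlock client status         # look for the hosted-engine lockspace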

Best regards

Martin Sivak

On Tue, Jan 16, 2018 at 1:16 PM, Derek Atkins  wrote:
> Why are both hosts reporting as ovirt 1?
> Look at the hostname fields to see what I mean.
>
> -derek
> Sent using my mobile device. Please excuse any typos.
>
> On January 16, 2018 7:11:09 AM Artem Tambovskiy 
> wrote:
>>
>> Hello,
>>
>> Yes, I followed exactly the same procedure while reinstalling the hosts
>> (the only difference is that I have an SSH key configured instead of a
>> password).
>>
>> I just reinstalled the second host one more time; after 20 min the host
>> still hasn't reached an active score of 3400 (Hosted Engine HA: Not Active) and
>> I still don't see the crown icon for this host.
>>
>> hosted-engine --vm-status  from ovirt1 host
>>
>> [root@ovirt1 ~]# hosted-engine --vm-status
>>
>>
>> --== Host 1 status ==--
>>
>> conf_on_shared_storage : True
>> Status up-to-date  : True
>> Hostname   : ovirt1.telia.ru
>> Host ID: 1
>> Engine status  : {"health": "good", "vm": "up",
>> "detail": "up"}
>> Score  : 3400
>> stopped: False
>> Local maintenance  : False
>> crc32  : 3f94156a
>> local_conf_timestamp   : 349144
>> Host timestamp : 349144
>> Extra metadata (valid at timestamp):
>> metadata_parse_version=1
>> metadata_feature_version=1
>> timestamp=349144 (Tue Jan 16 15:03:45 2018)
>> host-id=1
>> score=3400
>> vm_conf_refresh_time=349144 (Tue Jan 16 15:03:45 2018)
>> conf_on_shared_storage=True
>> maintenance=False
>> state=EngineUp
>> stopped=False
>>
>>
>> --== Host 2 status ==--
>>
>> conf_on_shared_storage : True
>> Status up-to-date  : False
>> Hostname   : ovirt1.telia.ru
>> Host ID: 2
>> Engine status  : unknown stale-data
>> Score  : 0
>> stopped: True
>> Local maintenance  : False
>> crc32  : c7037c03
>> local_conf_timestamp   : 7530
>> Host timestamp : 7530
>> Extra metadata (valid at timestamp):
>> metadata_parse_version=1
>> metadata_feature_version=1
>> timestamp=7530 (Fri Jan 12 16:10:12 2018)
>> host-id=2
>> score=0
>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
>> conf_on_shared_storage=True
>> maintenance=False
>> state=AgentStopped
>> stopped=True
>>
>>
>> hosted-engine --vm-status output from ovirt2 host
>>
>> [root@ovirt2 ovirt-hosted-engine-ha]# hosted-engine --vm-status
>>
>>
>> --== Host 1 status ==--
>>
>> conf_on_shared_storage : True
>> Status up-to-date  : False
>> Hostname   : ovirt1.telia.ru
>> Host ID: 1
>> Engine status  : unknown stale-data
>> Score  : 3400
>> stopped: False
>> Local maintenance  : False
>> crc32  : 6d3606f1
>> local_conf_timestamp   : 349264
>> Host timestamp : 349264
>> Extra metadata (valid at timestamp):
>> metadata_parse_version=1
>> metadata_feature_version=1
>> timestamp=349264 (Tue Jan 16 15:05:45 2018)
>> host-id=1
>> score=3400
>> vm_conf_refresh_time=349264 (Tue Jan 16 15:05:45 2018)
>> conf_on_shared_storage=True
>> maintenance=False
>> state=EngineUp
>> stopped=False
>>
>>
>> --== Host 2 status ==--
>>
>> conf_on_shared_storage : True
>> Status up-to-date  : False
>> Hostname   : ovirt1.telia.ru
>> Host ID: 2
>> Engine status  : unknown stale-data
>> Score  : 0
>> 

Re: [ovirt-users] hosted-engine unknow stale-data

2018-01-16 Thread Derek Atkins

Why are both hosts reporting as ovirt 1?
Look at the hostname fields to see what I mean.

-derek
Sent using my mobile device. Please excuse any typos.



On January 16, 2018 7:11:09 AM Artem Tambovskiy 
 wrote:



Hello,

Yes, I followed exactly the same procedure while reinstalling the hosts
(the only difference is that I have an SSH key configured instead of a
password).

I just reinstalled the second host one more time; after 20 min the host still
hasn't reached an active score of 3400 (Hosted Engine HA: Not Active) and I
still don't see the crown icon for this host.

hosted-engine --vm-status  from ovirt1 host

[root@ovirt1 ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date  : True
Hostname   : ovirt1.telia.ru
Host ID: 1
Engine status  : {"health": "good", "vm": "up",
"detail": "up"}
Score  : 3400
stopped: False
Local maintenance  : False
crc32  : 3f94156a
local_conf_timestamp   : 349144
Host timestamp : 349144
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=349144 (Tue Jan 16 15:03:45 2018)
host-id=1
score=3400
vm_conf_refresh_time=349144 (Tue Jan 16 15:03:45 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False


--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date  : False
Hostname   : ovirt1.telia.ru
Host ID: 2
Engine status  : unknown stale-data
Score  : 0
stopped: True
Local maintenance  : False
crc32  : c7037c03
local_conf_timestamp   : 7530
Host timestamp : 7530
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=7530 (Fri Jan 12 16:10:12 2018)
host-id=2
score=0
vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
conf_on_shared_storage=True
maintenance=False
state=AgentStopped
stopped=True


hosted-engine --vm-status output from ovirt2 host

[root@ovirt2 ovirt-hosted-engine-ha]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date  : False
Hostname   : ovirt1.telia.ru
Host ID: 1
Engine status  : unknown stale-data
Score  : 3400
stopped: False
Local maintenance  : False
crc32  : 6d3606f1
local_conf_timestamp   : 349264
Host timestamp : 349264
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=349264 (Tue Jan 16 15:05:45 2018)
host-id=1
score=3400
vm_conf_refresh_time=349264 (Tue Jan 16 15:05:45 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False


--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date  : False
Hostname   : ovirt1.telia.ru
Host ID: 2
Engine status  : unknown stale-data
Score  : 0
stopped: True
Local maintenance  : False
crc32  : c7037c03
local_conf_timestamp   : 7530
Host timestamp : 7530
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=7530 (Fri Jan 12 16:10:12 2018)
host-id=2
score=0
vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
conf_on_shared_storage=True
maintenance=False
state=AgentStopped
stopped=True


I also saw some log messages in the web GUI about time drift, like

"Host ovirt2.telia.ru has time-drift of 5305 seconds while maximum
configured value is 300 seconds." That is a bit weird, as I haven't touched
any time settings since I installed the cluster.
Both hosts have the same time and timezone (MSK), but the hosted engine lives in
the UTC timezone. Is it mandatory to have everything in sync and in the same
timezone?

Regards,
Artem






On Tue, Jan 16, 2018 at 2:20 PM, Kasturi Narra  wrote:


Hello,

 I now see that your hosted engine is up and running. Can you let me
know how you tried reinstalling the host? Below is the 

Re: [ovirt-users] hosted-engine unknow stale-data

2018-01-16 Thread Artem Tambovskiy
Hello,

Yes, I followed exactly the same procedure while reinstalling the hosts
(the only difference is that I have an SSH key configured instead of a
password).

I just reinstalled the second host one more time; after 20 min the host still
hasn't reached an active score of 3400 (Hosted Engine HA: Not Active) and I
still don't see the crown icon for this host.

hosted-engine --vm-status  from ovirt1 host

[root@ovirt1 ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date  : True
Hostname   : ovirt1.telia.ru
Host ID: 1
Engine status  : {"health": "good", "vm": "up",
"detail": "up"}
Score  : 3400
stopped: False
Local maintenance  : False
crc32  : 3f94156a
local_conf_timestamp   : 349144
Host timestamp : 349144
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=349144 (Tue Jan 16 15:03:45 2018)
host-id=1
score=3400
vm_conf_refresh_time=349144 (Tue Jan 16 15:03:45 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False


--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date  : False
Hostname   : ovirt1.telia.ru
Host ID: 2
Engine status  : unknown stale-data
Score  : 0
stopped: True
Local maintenance  : False
crc32  : c7037c03
local_conf_timestamp   : 7530
Host timestamp : 7530
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=7530 (Fri Jan 12 16:10:12 2018)
host-id=2
score=0
vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
conf_on_shared_storage=True
maintenance=False
state=AgentStopped
stopped=True


hosted-engine --vm-status output from ovirt2 host

[root@ovirt2 ovirt-hosted-engine-ha]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date  : False
Hostname   : ovirt1.telia.ru
Host ID: 1
Engine status  : unknown stale-data
Score  : 3400
stopped: False
Local maintenance  : False
crc32  : 6d3606f1
local_conf_timestamp   : 349264
Host timestamp : 349264
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=349264 (Tue Jan 16 15:05:45 2018)
host-id=1
score=3400
vm_conf_refresh_time=349264 (Tue Jan 16 15:05:45 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False


--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date  : False
Hostname   : ovirt1.telia.ru
Host ID: 2
Engine status  : unknown stale-data
Score  : 0
stopped: True
Local maintenance  : False
crc32  : c7037c03
local_conf_timestamp   : 7530
Host timestamp : 7530
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=7530 (Fri Jan 12 16:10:12 2018)
host-id=2
score=0
vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
conf_on_shared_storage=True
maintenance=False
state=AgentStopped
stopped=True


I also saw some log messages in the web GUI about time drift, like

"Host ovirt2.telia.ru has time-drift of 5305 seconds while maximum
configured value is 300 seconds." That is a bit weird, as I haven't touched
any time settings since I installed the cluster.
Both hosts have the same time and timezone (MSK), but the hosted engine lives in
the UTC timezone. Is it mandatory to have everything in sync and in the same
timezone?
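
Not an oVirt-specific step, but a quick way to check whether the clocks really
agree is to compare the time and NTP state on each host and on the engine VM
(assuming the usual EL7 tooling, i.e. timedatectl and chrony, is available):

  timedatectl        # shows local time, universal time and whether NTP is synchronized
  chronyc tracking   # offset from the NTP sources, if chrony is in use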

Regards,
Artem






On Tue, Jan 16, 2018 at 2:20 PM, Kasturi Narra  wrote:

> Hello,
>
>  I now see that your hosted engine is up and running. Can you let me
> know how you tried reinstalling the host? Below is the procedure to use; I
> hope you did not miss any step while reinstalling. If not, can you
> try reinstalling again and see if that works?
>
> 1) Move the host to maintenance
> 2) click on reinstall
> 3) provide the password
> 4) 

Re: [ovirt-users] hosted-engine unknow stale-data

2018-01-15 Thread Artem Tambovskiy
Hello,

I have uploaded 2 archives with all relevant logs to shared hosting:
files from host 1 (which is currently running all VMs, including
hosted_engine) - https://yadi.sk/d/PttRoYV63RTvhK
files from the second host - https://yadi.sk/d/UBducEsV3RTvhc

I have tried to restart both ovirt-ha-agent and ovirt-ha-broker, but it
has no effect. I have also tried to shut down the hosted_engine VM, stop the
ovirt-ha-agent and ovirt-ha-broker services, disconnect the storage and connect
it again - no effect either.
I also tried to reinstall the second host from the web GUI - this led to an
interesting situation - now hosted-engine --vm-status shows that both
hosts have the same address.

[root@ovirt1 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date  : True
Hostname   : ovirt1.telia.ru
Host ID: 1
Engine status  : {"health": "good", "vm": "up",
"detail": "up"}
Score  : 3400
stopped: False
Local maintenance  : False
crc32  : a7758085
local_conf_timestamp   : 259327
Host timestamp : 259327
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=259327 (Mon Jan 15 14:06:48 2018)
host-id=1
score=3400
vm_conf_refresh_time=259327 (Mon Jan 15 14:06:48 2018)
conf_on_shared_storage=True
maintenance=False
state=EngineUp
stopped=False


--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date  : False
Hostname   : ovirt1.telia.ru
Host ID: 2
Engine status  : unknown stale-data
Score  : 0
stopped: True
Local maintenance  : False
crc32  : c7037c03
local_conf_timestamp   : 7530
Host timestamp : 7530
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=7530 (Fri Jan 12 16:10:12 2018)
host-id=2
score=0
vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018)
conf_on_shared_storage=True
maintenance=False
state=AgentStopped
stopped=True

Gluster seems to be working fine; all Gluster nodes show a connected state.

Any advice on how to resolve this situation is highly appreciated!

Regards,
Artem


On Mon, Jan 15, 2018 at 11:45 AM, Kasturi Narra  wrote:

> Hello Artem,
>
> Can you check whether the glusterd service is running on host1 and all the
> peers are in a connected state? If yes, can you restart the ovirt-ha-agent and
> broker services and check if things are working fine?
>
> Thanks
> kasturi
>
> On Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy <
> artem.tambovs...@gmail.com> wrote:
>
>> I explored the logs on both hosts.
>> broker.log shows no errors.
>>
>> agent.log is not looking good:
>>
>> on host1 (which is running the hosted engine):
>>
>> MainThread::ERROR::2018-01-12 21:51:03,883::agent::205::ovir
>> t_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most
>> recent call last):
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>> line 191, in _run_agent
>> return action(he)
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>> line 64, in action_proper
>> return he.start_monitoring()
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>> line 411, in start_monitoring
>> self._initialize_sanlock()
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>> line 749, in _initialize_sanlock
>> "Failed to initialize sanlock, the number of errors has"
>> SanlockInitializationError: Failed to initialize sanlock, the number of
>> errors has exceeded the limit
>>
>> MainThread::ERROR::2018-01-12 21:51:03,884::agent::206::ovir
>> t_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart
>> agent
>> MainThread::WARNING::2018-01-12 21:51:08,889::agent::209::ovir
>> t_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent,
>> attempt '1'
>> MainThread::INFO::2018-01-12 21:51:08,919::hosted_engine::2
>> 42::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
>> Found certificate common name: ovirt1.telia.ru
>> MainThread::INFO::2018-01-12 21:51:08,921::hosted_engine::6
>> 04::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
>> Initializing VDSM
>> MainThread::INFO::2018-01-12 21:51:11,398::hosted_engine::6
>> 30::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
>> Connecting the storage
>> 

Re: [ovirt-users] hosted-engine unknow stale-data

2018-01-15 Thread Kasturi Narra
Hello Artem,

Can you check whether the glusterd service is running on host1 and all the
peers are in a connected state? If yes, can you restart the ovirt-ha-agent and
broker services and check if things are working fine?
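
A short sketch of those checks on the host, using standard gluster/systemd
commands plus the HA services already named in this thread:

  systemctl status glusterd
  gluster peer status    # peers should show "Peer in Cluster (Connected)"
  systemctl restart ovirt-ha-broker ovirt-ha-agent
  hosted-engine --vm-status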

Thanks
kasturi

On Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy <
artem.tambovs...@gmail.com> wrote:

> I explored the logs on both hosts.
> broker.log shows no errors.
>
> agent.log is not looking good:
>
> on host1 (which is running the hosted engine):
>
> MainThread::ERROR::2018-01-12 21:51:03,883::agent::205::
> ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most
> recent call last):
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> line 191, in _run_agent
> return action(he)
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> line 64, in action_proper
> return he.start_monitoring()
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 411, in start_monitoring
> self._initialize_sanlock()
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 749, in _initialize_sanlock
> "Failed to initialize sanlock, the number of errors has"
> SanlockInitializationError: Failed to initialize sanlock, the number of
> errors has exceeded the limit
>
> MainThread::ERROR::2018-01-12 21:51:03,884::agent::206::
> ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart
> agent
> MainThread::WARNING::2018-01-12 21:51:08,889::agent::209::
> ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent,
> attempt '1'
> MainThread::INFO::2018-01-12 21:51:08,919::hosted_engine::
> 242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
> Found certificate common name: ovirt1.telia.ru
> MainThread::INFO::2018-01-12 21:51:08,921::hosted_engine::
> 604::ovirt_hosted_engine_ha.agent.hosted_engine.
> HostedEngine::(_initialize_vdsm) Initializing VDSM
> MainThread::INFO::2018-01-12 21:51:11,398::hosted_engine::
> 630::ovirt_hosted_engine_ha.agent.hosted_engine.
> HostedEngine::(_initialize_storage_images) Connecting the storage
> MainThread::INFO::2018-01-12 21:51:11,399::storage_server::
> 220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
> Validating storage server
> MainThread::INFO::2018-01-12 21:51:13,725::storage_server::
> 239::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> Connecting storage server
> MainThread::INFO::2018-01-12 21:51:18,390::storage_server::
> 246::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> Connecting storage server
> MainThread::INFO::2018-01-12 21:51:18,423::storage_server::
> 253::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> Refreshing the storage domain
> MainThread::INFO::2018-01-12 21:51:18,689::hosted_engine::
> 663::ovirt_hosted_engine_ha.agent.hosted_engine.
> HostedEngine::(_initialize_storage_images) Preparing images
> MainThread::INFO::2018-01-12 21:51:18,690::image::126::
> ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Preparing images
> MainThread::INFO::2018-01-12 21:51:21,895::hosted_engine::
> 666::ovirt_hosted_engine_ha.agent.hosted_engine.
> HostedEngine::(_initialize_storage_images) Refreshing vm.conf
> MainThread::INFO::2018-01-12 21:51:21,895::config::493::
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
> Reloading vm.conf from the shared storage domain
> MainThread::INFO::2018-01-12 21:51:21,896::config::416::
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.
> config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher
> copy of vm configuration from the OVF_STORE
> MainThread::INFO::2018-01-12 21:51:21,896::ovf_store::132::
> ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
> Extracting Engine VM OVF from the OVF_STORE
> MainThread::INFO::2018-01-12 21:51:21,897::ovf_store::134::
> ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
> OVF_STORE volume path: /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-
> 498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-
> d109fa36dfcf
> MainThread::INFO::2018-01-12 21:51:21,915::config::435::
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.
> config::(_get_vm_conf_content_from_ovf_store) Found an OVF for HE VM,
> trying to convert
> MainThread::INFO::2018-01-12 21:51:21,918::config::440::
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.
> config::(_get_vm_conf_content_from_ovf_store) Got vm.conf from OVF_STORE
> MainThread::INFO::2018-01-12 21:51:21,919::hosted_engine::
> 509::ovirt_hosted_engine_ha.agent.hosted_engine.
> HostedEngine::(_initialize_broker) Initializing ha-broker connection
> MainThread::INFO::2018-01-12 21:51:21,919::brokerlink::130:
> 

Re: [ovirt-users] hosted-engine unknow stale-data

2018-01-12 Thread Artem Tambovskiy
I explored the logs on both hosts.
broker.log shows no errors.

agent.log is not looking good:

on host1 (which is running the hosted engine):

MainThread::ERROR::2018-01-12
21:51:03,883::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Traceback (most recent call last):
  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
line 191, in _run_agent
return action(he)
  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
line 64, in action_proper
return he.start_monitoring()
  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 411, in start_monitoring
self._initialize_sanlock()
  File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 749, in _initialize_sanlock
"Failed to initialize sanlock, the number of errors has"
SanlockInitializationError: Failed to initialize sanlock, the number of
errors has exceeded the limit

MainThread::ERROR::2018-01-12
21:51:03,884::agent::206::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Trying to restart agent
MainThread::WARNING::2018-01-12
21:51:08,889::agent::209::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
Restarting agent, attempt '1'
MainThread::INFO::2018-01-12
21:51:08,919::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
Found certificate common name: ovirt1.telia.ru
MainThread::INFO::2018-01-12
21:51:08,921::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
Initializing VDSM
MainThread::INFO::2018-01-12
21:51:11,398::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
Connecting the storage
MainThread::INFO::2018-01-12
21:51:11,399::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server)
Validating storage server
MainThread::INFO::2018-01-12
21:51:13,725::storage_server::239::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server
MainThread::INFO::2018-01-12
21:51:18,390::storage_server::246::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Connecting storage server
MainThread::INFO::2018-01-12
21:51:18,423::storage_server::253::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
Refreshing the storage domain
MainThread::INFO::2018-01-12
21:51:18,689::hosted_engine::663::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
Preparing images
MainThread::INFO::2018-01-12
21:51:18,690::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images)
Preparing images
MainThread::INFO::2018-01-12
21:51:21,895::hosted_engine::666::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
Refreshing vm.conf
MainThread::INFO::2018-01-12
21:51:21,895::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
Reloading vm.conf from the shared storage domain
MainThread::INFO::2018-01-12
21:51:21,896::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2018-01-12
21:51:21,896::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2018-01-12
21:51:21,897::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
OVF_STORE volume path:
/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf
MainThread::INFO::2018-01-12
21:51:21,915::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
Found an OVF for HE VM, trying to convert
MainThread::INFO::2018-01-12
21:51:21,918::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
Got vm.conf from OVF_STORE
MainThread::INFO::2018-01-12
21:51:21,919::hosted_engine::509::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
Initializing ha-broker connection
MainThread::INFO::2018-01-12
21:51:21,919::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
Starting monitor ping, options {'addr': '80.239.162.97'}
MainThread::INFO::2018-01-12
21:51:21,922::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
Success, id 140547104457680
MainThread::INFO::2018-01-12
21:51:21,922::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name':
'ovirtmgmt', 'address': '0'}
MainThread::INFO::2018-01-12