Re: [ovirt-users] Fwd: why host is not capable to run HE?

2018-02-21 Thread Artem Tambovskiy
I took the HE VM down and stopped ovirt-ha-agent on both hosts.
I tried hosted-engine --reinitialize-lockspace; the command runs silently
and I'm not sure whether it does anything at all.
I also tried to clean the metadata. On one host it worked; on the second
host it always fails with the following messages:

INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:VDSM domain monitor status: PENDING
ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed to start monitoring domain (sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162, host_id=2): timeout during domain acquisition
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 67, in action_clean
    return he.clean(options.force_cleanup)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 345, in clean
    self._initialize_domain_monitor()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 829, in _initialize_domain_monitor
    raise Exception(msg)
Exception: Failed to start monitoring domain (sd_uuid=4a7f8717-9bb0-4d80-8016-498fa4b88162, host_id=2): timeout during domain acquisition

ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent
WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '0'
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors occurred, giving up. Please review the log and consider filing a bug.
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down

I'm not an expert at reading sanlock output, but it looks a bit
strange to me:

from first host (host_id=2)

[root@ovirt1 ~]# sanlock client status
daemon b1d7fea2-e8a9-4645-b449-97702fc3808e.ovirt1.tel
p -1 helper
p -1 listener
p -1 status
p 3763
p 62861 quaggaVM
p 63111 powerDNS
p 107818 pjsip_freepbx_14
p 109092 revizorro_dev
p 109589 routerVM
s hosted-engine:2:/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769:0
s a40cc3a9-54d6-40fd-acee-525ef29c8ce3:2:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_data/a40cc3a9-54d6-40fd-acee-525ef29c8ce3/dom_md/ids:0
s 4a7f8717-9bb0-4d80-8016-498fa4b88162:1:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_engine/4a7f8717-9bb0-4d80-8016-498fa4b88162/dom_md/ids:0
r a40cc3a9-54d6-40fd-acee-525ef29c8ce3:SDM:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_data/a40cc3a9-54d6-40fd-acee-525ef29c8ce3/dom_md/leases:1048576:49 p 3763


from second host (host_id=1)

[root@ovirt2 ~]# sanlock client status
daemon 9263e081-e5ea-416b-866a-0a73fe32fe16.ovirt2.tel
p -1 helper
p -1 listener
p 150440 CentOS-Desk
p 151061 centos-dev-box
p 151288 revizorro_nfq
p 151954 gitlabVM
p -1 status
s hosted-engine:1:/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769:0
s a40cc3a9-54d6-40fd-acee-525ef29c8ce3:1:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_data/a40cc3a9-54d6-40fd-acee-525ef29c8ce3/dom_md/ids:0
s 4a7f8717-9bb0-4d80-8016-498fa4b88162:1:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_engine/4a7f8717-9bb0-4d80-8016-498fa4b88162/dom_md/ids:0 ADD

I'm not sure whether there is a problem with lockspace
4a7f8717-9bb0-4d80-8016-498fa4b88162,
but both hosts show 1 as the host_id here. Is this correct? Shouldn't they
have different IDs here?
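
The comparison above can be automated. The following is a minimal Python sketch, not an oVirt or sanlock tool: it assumes that lockspace lines in `sanlock client status` output follow the `s NAME:HOST_ID:PATH:OFFSET` layout seen in the two listings, and it reports every lockspace in which more than one host claims the same host ID. The function names and the `STATUS_BY_HOST` mapping are hypothetical helpers introduced only for illustration; the sample lines are copied from the listings.

```python
import re
from collections import defaultdict

# Lockspace lines look like "s NAME:HOST_ID:PATH:OFFSET [FLAG ...]".
# PATH may contain escaped colons ("\:"), so it is matched greedily and the
# trailing numeric offset (plus optional flags such as ADD) anchors the end.
LOCKSPACE_RE = re.compile(
    r"^s (?P<name>[^:]+):(?P<host_id>\d+):(?P<path>.+):(?P<offset>\d+)(?: \w+)*$"
)


def parse_lockspaces(status_text):
    """Return {lockspace_name: host_id} for one host's status output."""
    result = {}
    for line in status_text.splitlines():
        match = LOCKSPACE_RE.match(line.strip())
        if match:
            result[match.group("name")] = int(match.group("host_id"))
    return result


def find_duplicate_ids(status_by_host):
    """Return {lockspace: {host_id: [hosts]}} for IDs claimed by several hosts."""
    claims = defaultdict(lambda: defaultdict(list))
    for host, text in status_by_host.items():
        for name, host_id in parse_lockspaces(text).items():
            claims[name][host_id].append(host)
    duplicates = {}
    for name, ids in claims.items():
        shared = {hid: sorted(hosts) for hid, hosts in ids.items() if len(hosts) > 1}
        if shared:
            duplicates[name] = shared
    return duplicates


# Lockspace lines copied from the two listings (process and resource lines omitted).
STATUS_BY_HOST = {
    "ovirt1": r"""
s hosted-engine:2:/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769:0
s a40cc3a9-54d6-40fd-acee-525ef29c8ce3:2:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_data/a40cc3a9-54d6-40fd-acee-525ef29c8ce3/dom_md/ids:0
s 4a7f8717-9bb0-4d80-8016-498fa4b88162:1:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_engine/4a7f8717-9bb0-4d80-8016-498fa4b88162/dom_md/ids:0
""",
    "ovirt2": r"""
s hosted-engine:1:/var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769:0
s a40cc3a9-54d6-40fd-acee-525ef29c8ce3:1:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_data/a40cc3a9-54d6-40fd-acee-525ef29c8ce3/dom_md/ids:0
s 4a7f8717-9bb0-4d80-8016-498fa4b88162:1:/rhev/data-center/mnt/glusterSD/ovirt2.telia.ru\:_engine/4a7f8717-9bb0-4d80-8016-498fa4b88162/dom_md/ids:0 ADD
""",
}

duplicates = find_duplicate_ids(STATUS_BY_HOST)
print(duplicates)
```

On the two listings, the only lockspace flagged is 4a7f8717-9bb0-4d80-8016-498fa4b88162, with host ID 1 claimed by both hosts; the hosted-engine and a40cc3a9 lockspaces show distinct IDs.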

Once the ha-agents have been started, hosted-engine --vm-status shows
'unknown-stale-data' for the second host, and the HE just doesn't start on
the second host at all.
Redeploying the host hasn't helped either.
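
A stale-data state like this can also be spotted in a script. The sketch below is a hedged illustration only: it assumes the status is available as JSON (for example from `hosted-engine --vm-status --json`) and that each per-host entry carries `hostname` and `live-data` fields. Those field names, the `stale_hosts` helper, and the sample document are assumptions to verify against the installed version; the sample is invented and is not output from this cluster.

```python
import json


def stale_hosts(status_json):
    """Return the sorted names of hosts whose entry does not report live data."""
    status = json.loads(status_json)
    stale = []
    for key, entry in status.items():
        if not isinstance(entry, dict):
            continue  # skip non-host entries such as a global flag
        if not entry.get("live-data", False):
            # Fall back to the dictionary key if no hostname is present.
            stale.append(entry.get("hostname", key))
    return sorted(stale)


# Invented sample with hypothetical host names, for illustration only.
SAMPLE_STATUS = json.dumps({
    "1": {"hostname": "host-a", "live-data": True},
    "2": {"hostname": "host-b", "live-data": False},
    "global_maintenance": False,
})

print(stale_hosts(SAMPLE_STATUS))
```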

Any advice on this?
Regards,
Artem


On Mon, Feb 19, 2018 at 9:32 PM, Artem Tambovskiy <
artem.tambovs...@gmail.com> wrote:

> Thanks Martin.
>
> As you suggested, I updated hosted-engine.conf with the correct host_id values
> and restarted the ovirt-ha-agent services on both hosts, and now I run into a
> problem with the status "unknown-stale-data" :(
> And the second host still doesn't look capable of running the HE.
>
> Should I stop the HE VM, bring down the ovirt-ha-agents, run reinitialize-lockspace
> and start the ovirt-ha-agents again?
>
> Regards,
> Artem
>
>
>
> On Mon, Feb 19, 2018 at 6:45 PM, Martin Sivak  wrote:
>
>> Hi Artem,
>>
>> just a restart of ovirt-ha-agent services should be enough.
>>
>> Best regards
>>
>> Martin Sivak
>>
>> On Mon, Feb 19, 2018 at 4:40 PM, Artem Tambovskiy
>>  wrote:
>> > Ok, understood.
>> > Once I set correct 

Re: [ovirt-users] Fwd: why host is not capable to run HE?

2018-02-19 Thread Martin Sivak
Hi Artem,

just a restart of ovirt-ha-agent services should be enough.

Best regards

Martin Sivak

On Mon, Feb 19, 2018 at 4:40 PM, Artem Tambovskiy
 wrote:
> Ok, understood.
> Once I set the correct host_id on both hosts, how do I put the changes into
> effect with minimal downtime? Or do I need to reboot both hosts anyway?
>
> Regards,
> Artem
>
> On Feb 19, 2018 at 18:18, "Simone Tiraboschi" wrote:
>
>>
>>
>> On Mon, Feb 19, 2018 at 4:12 PM, Artem Tambovskiy
>>  wrote:
>>>
>>>
>>> Thanks a lot, Simone!
>>>
>>> This clearly shows a problem:
>>>
>>> [root@ov-eng ovirt-engine]# sudo -u postgres psql -d engine -c 'select vds_name, vds_spm_id from vds'
>>>    vds_name   | vds_spm_id
>>> --------------+------------
>>>  ovirt1.local |          2
>>>  ovirt2.local |          1
>>> (2 rows)
>>>
>>> Meanwhile, hosted-engine.conf on ovirt1.local has host_id=1 and on
>>> ovirt2.local host_id=2, so the values are exactly reversed.
>>> How can I fix this in a simple way? Update the engine DB?
>>
>>
>> I'd suggest manually fixing /etc/ovirt-hosted-engine/hosted-engine.conf on
>> both hosts.
>>
>>>
>>>
>>> Regards,
>>> Artem
>>>
>>> On Mon, Feb 19, 2018 at 5:37 PM, Simone Tiraboschi 
>>> wrote:



>>>> On Mon, Feb 19, 2018 at 12:13 PM, Artem Tambovskiy wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> Last weekend my cluster suffered from a massive power outage due to
>>>>> human error.
>>>>> I'm using a SHE setup with Gluster. I managed to bring the cluster up
>>>>> quickly, but once again I have a problem with a duplicated host_id
>>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1543988) on the second host,
>>>>> and because of this the second host is not capable of running the HE.
>>>>>
>>>>> I manually updated hosted-engine.conf with the correct host_id and
>>>>> restarted the agent and broker, with no effect. Then I rebooted the host
>>>>> itself, still with no change. How can I fix this issue?
>>>>
>>>>
>>>> I'd suggest running this command on the engine VM:
>>>> sudo -u postgres scl enable rh-postgresql95 -- psql -d engine -c 'select vds_name, vds_spm_id from vds'
>>>> (just sudo -u postgres psql -d engine -c 'select vds_name, vds_spm_id from vds' if still on 4.1)
>>>> and checking /etc/ovirt-hosted-engine/hosted-engine.conf on all the involved hosts.
>>>> You may also have a leftover configuration file on an undeployed host.
>>>>
>>>> When you find a conflict, you should manually bring down sanlock.
>>>> If in doubt, a reboot of both hosts will certainly solve it.
>>>>
>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>> Artem
>>>>>
>>>>> ___
>>>>> Users mailing list
>>>>> Users@ovirt.org
>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>

>>>
>>>
>>>
>>>
>>
>


Re: [ovirt-users] Fwd: why host is not capable to run HE?

2018-02-19 Thread Artem Tambovskiy
Ok, understood.
Once I set the correct host_id on both hosts, how do I put the changes into
effect with minimal downtime? Or do I need to reboot both hosts anyway?

Regards,
Artem

On Feb 19, 2018 at 18:18, "Simone Tiraboschi" wrote:

>
>
> On Mon, Feb 19, 2018 at 4:12 PM, Artem Tambovskiy <
> artem.tambovs...@gmail.com> wrote:
>
>>
>> Thanks a lot, Simone!
>>
>> This clearly shows a problem:
>>
>> [root@ov-eng ovirt-engine]# sudo -u postgres psql -d engine -c 'select vds_name, vds_spm_id from vds'
>>    vds_name   | vds_spm_id
>> --------------+------------
>>  ovirt1.local |          2
>>  ovirt2.local |          1
>> (2 rows)
>>
>> Meanwhile, hosted-engine.conf on ovirt1.local has host_id=1 and on
>> ovirt2.local host_id=2, so the values are exactly reversed.
>> How can I fix this in a simple way? Update the engine DB?
>>
>
> I'd suggest manually fixing /etc/ovirt-hosted-engine/hosted-engine.conf
> on both hosts.
>
>
>>
>> Regards,
>> Artem
>>
>> On Mon, Feb 19, 2018 at 5:37 PM, Simone Tiraboschi 
>> wrote:
>>
>>>
>>>
>>> On Mon, Feb 19, 2018 at 12:13 PM, Artem Tambovskiy <
>>> artem.tambovs...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> Last weekend my cluster suffered from a massive power outage due to
>>>> human error.
>>>> I'm using a SHE setup with Gluster. I managed to bring the cluster up
>>>> quickly, but once again I have a problem with a duplicated host_id
>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1543988) on the second host,
>>>> and because of this the second host is not capable of running the HE.
>>>>
>>>> I manually updated hosted-engine.conf with the correct host_id and
>>>> restarted the agent and broker, with no effect. Then I rebooted the host
>>>> itself, still with no change. How can I fix this issue?
>>>>
>>>
>>> I'd suggest running this command on the engine VM:
>>> sudo -u postgres scl enable rh-postgresql95 -- psql -d engine -c 'select vds_name, vds_spm_id from vds'
>>> (just sudo -u postgres psql -d engine -c 'select vds_name, vds_spm_id from vds' if still on 4.1)
>>> and checking /etc/ovirt-hosted-engine/hosted-engine.conf on all the involved hosts.
>>> You may also have a leftover configuration file on an undeployed host.
>>>
>>> When you find a conflict, you should manually bring down sanlock.
>>> If in doubt, a reboot of both hosts will certainly solve it.
>>>
>>>
>>>
>>>>
>>>> Regards,
>>>> Artem



>>>
>>
>>
>>
>>
>


Re: [ovirt-users] Fwd: why host is not capable to run HE?

2018-02-19 Thread Simone Tiraboschi
On Mon, Feb 19, 2018 at 4:12 PM, Artem Tambovskiy <
artem.tambovs...@gmail.com> wrote:

>
> Thanks a lot, Simone!
>
> This clearly shows a problem:
>
> [root@ov-eng ovirt-engine]# sudo -u postgres psql -d engine -c 'select vds_name, vds_spm_id from vds'
>    vds_name   | vds_spm_id
> --------------+------------
>  ovirt1.local |          2
>  ovirt2.local |          1
> (2 rows)
>
> Meanwhile, hosted-engine.conf on ovirt1.local has host_id=1 and on
> ovirt2.local host_id=2, so the values are exactly reversed.
> How can I fix this in a simple way? Update the engine DB?
>

I'd suggest manually fixing /etc/ovirt-hosted-engine/hosted-engine.conf on
both hosts.
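
Before editing the files, it may help to confirm exactly which hosts disagree with the engine database. The sketch below is a rough illustration in Python, not part of oVirt: it parses the psql table quoted above and a `host_id=` line from each host's hosted-engine.conf, then lists the hosts whose configured value differs from vds_spm_id. The helper names are hypothetical, and the configuration text is reduced to the single `host_id` line reported in this thread.

```python
def parse_spm_ids(psql_output):
    """Return {vds_name: vds_spm_id} from 'name | id' rows of psql output."""
    ids = {}
    for line in psql_output.splitlines():
        parts = [part.strip() for part in line.split("|")]
        # Header, separator and "(2 rows)" lines fail this check and are skipped.
        if len(parts) == 2 and parts[1].isdigit():
            ids[parts[0]] = int(parts[1])
    return ids


def read_host_id(conf_text):
    """Return the integer host_id from hosted-engine.conf text, or None."""
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "host_id":
            return int(value.strip())
    return None


def mismatches(spm_ids, conf_by_host):
    """Return {host: (expected_from_db, found_in_conf)} for differing hosts."""
    report = {}
    for host, expected in spm_ids.items():
        actual = read_host_id(conf_by_host.get(host, ""))
        if actual != expected:
            report[host] = (expected, actual)
    return report


PSQL_OUTPUT = """
   vds_name   | vds_spm_id
--------------+------------
 ovirt1.local |          2
 ovirt2.local |          1
(2 rows)
"""

# Only the host_id line is modelled; the values are those reported in the thread.
CONF_BY_HOST = {
    "ovirt1.local": "host_id=1\n",
    "ovirt2.local": "host_id=2\n",
}

report = mismatches(parse_spm_ids(PSQL_OUTPUT), CONF_BY_HOST)
print(report)
```

With the values from this thread, both hosts are reported: ovirt1.local should have host_id=2 and ovirt2.local should have host_id=1.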


>
> Regards,
> Artem
>
> On Mon, Feb 19, 2018 at 5:37 PM, Simone Tiraboschi 
> wrote:
>
>>
>>
>> On Mon, Feb 19, 2018 at 12:13 PM, Artem Tambovskiy <
>> artem.tambovs...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> Last weekend my cluster suffered from a massive power outage due to
>>> human error.
>>> I'm using a SHE setup with Gluster. I managed to bring the cluster up
>>> quickly, but once again I have a problem with a duplicated host_id
>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1543988) on the second host,
>>> and because of this the second host is not capable of running the HE.
>>>
>>> I manually updated hosted-engine.conf with the correct host_id and
>>> restarted the agent and broker, with no effect. Then I rebooted the host
>>> itself, still with no change. How can I fix this issue?
>>>
>>
>> I'd suggest running this command on the engine VM:
>> sudo -u postgres scl enable rh-postgresql95 -- psql -d engine -c 'select vds_name, vds_spm_id from vds'
>> (just sudo -u postgres psql -d engine -c 'select vds_name, vds_spm_id from vds' if still on 4.1)
>> and checking /etc/ovirt-hosted-engine/hosted-engine.conf on all the involved hosts.
>> You may also have a leftover configuration file on an undeployed host.
>>
>> When you find a conflict, you should manually bring down sanlock.
>> If in doubt, a reboot of both hosts will certainly solve it.
>>
>>
>>
>>>
>>> Regards,
>>> Artem
>>>
>>>
>>>
>>
>
>
>
>