Hi all,

Last week we had a problem on our oVirt infrastructure. The hosted engine 
didn't come back up after the reboot of the host that was running it. With the 
help of some people on the #ovirt IRC channel (msivak, nsoffer and some 
others, thanks to all of them) I managed to get my hosted engine up and 
running, but the underlying problem is still there. I think there is an 
inconsistency between the sanlock IDs of the hosts.

Some background:

We installed oVirt 3.5 on CentOS 7 about 9 months ago. We have one DC with 
two clusters:

cluster 1: 4 hosts (virt1, virt2, virt3, virt4) that were installed with 
'hosted-engine --deploy', so they are capable of running the engine VM.
cluster 2: 2 hosts (virt6 and virt7) that were installed via the web UI, so 
they are 'normal' oVirt hosts.

Since then we have successfully upgraded oVirt to 3.6 and set our clusters to 
3.6 compatibility mode.

Some weeks later something broke and the virt4 host rebooted. With some help 
on the IRC channel, I managed to get the engine VM up and running. After that 
I dug into the problem, which seems to be around the sanlock part.

From the explanations I got, my understanding is:

sanlock manages locks at the DC level. There is a hosted_engine lock to 
manage who runs the engine VM, and there is a VDSM-level lock on the 
hosted_engine disk (or any other VM disk) to determine who can write to the 
disk.

The problem in my case is that on some of the hosts that were installed in 
3.5, the hosted-engine host_id and the vds_spm_id are not the same, and some 
hosts have a vds_spm_id identical to another host's hosted-engine host_id. So 
in some cases a host can't acquire the lock on some disks, and the same host 
appears with different IDs in the sanlock lockspaces.

An example, in my case:

#
# For the hosted_engine hosts:
#
[root@virt1 ~]# grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
host_id=1

[root@virt2 ~]# grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
host_id=2

[root@virt3 ~]# grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
host_id=3

[root@virt4 ~]# grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
host_id=4

#
# For all hosts, including hosted engine:
#
[root@virt1 ~]# sanlock client status
daemon 3a99892c-5d3a-4d3d-bac7-d35259363c98.virt1
p -1 helper
p -1 listener
p -1 status
s hosted-engine:1:/var/run/vdsm/storage/377ae8e8-0eeb-4591-b50f-3d21298b4146/607719dd-b71e-4527-814a-964ed0c1f8ea/6a0b878d-fe7e-4fb6-bd5d-1254bebb0ca0:0
s 295207d7-41ea-4cda-a028-f860c357d46b:1:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0
s daf1b53c-7e29-4b18-a9e2-910605cc7080:1:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0
s 680d5ed1-ed70-4340-a430-ddfa39ee3052:1:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0
s 350e5736-41c0-4017-a8fd-9866edad3333:1:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0
s 377ae8e8-0eeb-4591-b50f-3d21298b4146:1:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0

[root@virt2 ~]# sanlock client status
daemon 48fe11a1-6c64-4a56-abf0-6f9690e6a8c2.virt2
p -1 helper
p -1 listener
p -1 status
s hosted-engine:2:/var/run/vdsm/storage/377ae8e8-0eeb-4591-b50f-3d21298b4146/607719dd-b71e-4527-814a-964ed0c1f8ea/6a0b878d-fe7e-4fb6-bd5d-1254bebb0ca0:0
s 377ae8e8-0eeb-4591-b50f-3d21298b4146:2:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0
s 295207d7-41ea-4cda-a028-f860c357d46b:3:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0
s daf1b53c-7e29-4b18-a9e2-910605cc7080:3:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0
s 680d5ed1-ed70-4340-a430-ddfa39ee3052:3:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0
s 350e5736-41c0-4017-a8fd-9866edad3333:3:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0
r 350e5736-41c0-4017-a8fd-9866edad3333:SDM:/dev/350e5736-41c0-4017-a8fd-9866edad3333/leases:1048576:26
 p 9304
r 377ae8e8-0eeb-4591-b50f-3d21298b4146:d704cf05-e294-4ada-9627-920c9997cf22:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/leases:111149056:21
 p 32747

[root@virt3 ~]# sanlock client status
daemon 3388d8e5-922d-45ab-8ecb-6e321a7a8a4a.virt3
p -1 helper
p -1 listener
p -1 status
s daf1b53c-7e29-4b18-a9e2-910605cc7080:2:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0
s 680d5ed1-ed70-4340-a430-ddfa39ee3052:2:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0
s 295207d7-41ea-4cda-a028-f860c357d46b:2:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0
s 350e5736-41c0-4017-a8fd-9866edad3333:2:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0
s 377ae8e8-0eeb-4591-b50f-3d21298b4146:2:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0 ADD

[root@virt4 ~]# sanlock client status
daemon 3ec5f49e-9920-48a5-97a7-2e900ae374ed.virt4
p -1 helper
p -1 listener
p -1 status
s 377ae8e8-0eeb-4591-b50f-3d21298b4146:4:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0
s daf1b53c-7e29-4b18-a9e2-910605cc7080:6:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0
s 295207d7-41ea-4cda-a028-f860c357d46b:6:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0
s 680d5ed1-ed70-4340-a430-ddfa39ee3052:6:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0
s 350e5736-41c0-4017-a8fd-9866edad3333:6:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0
s hosted-engine:4:/var/run/vdsm/storage/377ae8e8-0eeb-4591-b50f-3d21298b4146/607719dd-b71e-4527-814a-964ed0c1f8ea/6a0b878d-fe7e-4fb6-bd5d-1254bebb0ca0:0

[root@virt6 bagasse]# sanlock client status
daemon 031a9126-52ac-497a-8403-cd8c3f2db1c1.virt6
p -1 helper
p -1 listener
p -1 status
s 377ae8e8-0eeb-4591-b50f-3d21298b4146:5:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0
s 295207d7-41ea-4cda-a028-f860c357d46b:5:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0
s daf1b53c-7e29-4b18-a9e2-910605cc7080:5:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0
s 680d5ed1-ed70-4340-a430-ddfa39ee3052:5:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0
s 350e5736-41c0-4017-a8fd-9866edad3333:5:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0

[root@virt7 ~]# sanlock client status
daemon 3ef87845-975a-443c-af71-0df1981fb8d4.virt7
p -1 helper
p -1 listener
p -1 status
s 350e5736-41c0-4017-a8fd-9866edad3333:4:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0
s daf1b53c-7e29-4b18-a9e2-910605cc7080:4:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0 ADD
s 377ae8e8-0eeb-4591-b50f-3d21298b4146:4:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0 ADD
s 295207d7-41ea-4cda-a028-f860c357d46b:4:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0 ADD
s 680d5ed1-ed70-4340-a430-ddfa39ee3052:4:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0 ADD

#
# The output that I've found in engine database:
#
engine=# SELECT vds_spm_id_map.storage_pool_id, vds_spm_id_map.vds_spm_id,
                vds_spm_id_map.vds_id, vds.vds_name
         FROM vds_spm_id_map, vds
         WHERE vds_spm_id_map.vds_id = vds.vds_id;

           storage_pool_id            | vds_spm_id |                vds_id                | vds_name
--------------------------------------+------------+--------------------------------------+----------
 00000002-0002-0002-0002-000000000208 |          6 | c6aef4f9-e972-40a0-916e-4ed296de46db | virt4
 00000002-0002-0002-0002-000000000208 |          5 | 5922e88b-c6de-41ce-ab64-046f66c8d08e | virt6
 00000002-0002-0002-0002-000000000208 |          4 | fcd962ea-3158-468d-a0b9-d7bb864ba959 | virt7
 00000002-0002-0002-0002-000000000208 |          1 | b43933d7-7338-41f6-9a71-f7cd389b9167 | virt1
 00000002-0002-0002-0002-000000000208 |          2 | 031526e8-110e-4254-97ef-1a26cb67b835 | virt3
 00000002-0002-0002-0002-000000000208 |          3 | 09609537-0c33-437a-93fa-b246f0bb57e4 | virt2
(6 rows)

So, in this case, for example, the host virt4 have host_id=4 and vds_spm_id=6, 
so this host have these 2 ids in sanlock
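For cross-checking the other hosts quickly, I used something like the sketch 
below (hypothetical helper name; it assumes the line-wrapped `s` entries 
above are rejoined into single `s lockspace:host_id:path:offset` lines, as 
sanlock normally prints them):

```shell
# lockspace_ids: print "lockspace-name host-id" for every lockspace ('s')
# line of a saved `sanlock client status` dump (hypothetical helper name).
lockspace_ids() {
  awk '/^s / { split($2, f, ":"); print f[1], f[2] }'
}

# Example with one lockspace line copied from virt4's output:
echo 's 377ae8e8-0eeb-4591-b50f-3d21298b4146:4:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0' \
  | lockspace_ids
# -> 377ae8e8-0eeb-4591-b50f-3d21298b4146 4
```

Piping a full per-host dump through it makes the mismatched IDs easy to see 
side by side.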


So my questions are: 
* How can I get out of this situation? 
* Since I didn't find the vds_spm_id stored anywhere on the hosts themselves, 
can I modify this value in the database to make it identical to each host's 
host_id? 
* Could this strange behaviour be a side effect of the upgrade to oVirt 3.6 
and the import of the hosted-engine storage domain into the engine?
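To be concrete about the second question, what I have in mind (but have NOT 
run, and don't know is safe: I don't know whether the engine caches this 
mapping elsewhere, and the existing rows would clearly need to be renumbered 
first, since vds_spm_id=4 currently belongs to virt7) is something like:

```sql
-- HYPOTHETICAL, not executed: align virt4's vds_spm_id with its
-- hosted-engine host_id (4), using the vds_id from the query above.
-- A collision with virt7 (which currently holds vds_spm_id=4) would
-- have to be resolved first.
UPDATE vds_spm_id_map
   SET vds_spm_id = 4
 WHERE vds_id = 'c6aef4f9-e972-40a0-916e-4ed296de46db';
```

Is that the right direction, or is there a supported way to do this?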

Any pointers are welcome.


Have a nice day.

Regards.

-- 
Baptiste
_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users
