On Tue, Apr 12, 2016 at 5:12 PM, Martin Sivak <[email protected]> wrote:
> Hi,
>
> thanks for the summary, this is what I was suspecting.
>
> Just a clarification about the hosted engine host-id and lockspace.
> Hosted engine has a lockspace separate from VDSM's and consistently
> uses the hosted-engine host-id there to protect a metadata
> whiteboard. It has nothing to do with the VM and there is no conflict
> here.
>
> The issue seems to be that the VDSM lockspace is used when connect
> storage domain is called, and both hosted engine and ovirt-engine can
> issue the connect command. Unfortunately, hosted engine does not know
> the vds_spm_id when mounting the volume for the first time (even
> before the ovirt-engine VM is started) and uses the host-id instead.
>
> Now, there is probably no issue when all hosts accessing that storage
> domain are hosted-engine enabled right from the start, because the
> storage domain is then mounted on all hosts before the engine starts
> and the locking uses a consistent id (the hosted-engine host-id).
>
> The problem surfaces on a host where the engine manages to call
> "connect hosted engine storage domain" first, because the engine uses
> the vds_spm_id for the requested lease and a collision happens.
>
> I do not see any easy fix at this moment, except maybe telling the
> engine to use the hosted-engine id when it tries to connect the
> hosted-engine storage domain. That feels like a hack, but might work.
>
> There is also a bug filed for this issue now:
> https://bugzilla.redhat.com/show_bug.cgi?id=1322849
>
> Simone/Nir, can you please comment on the issue to confirm that our
> findings are correct?
I think so, but the solution you proposed is probably not enough: we
also allow mixing hosted-engine enabled hosts and regular hosts (which
have no hosted-engine id) in the same cluster and, once the
hosted-engine storage domain has been imported by the engine, the
engine will connect it on all of them.
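
To make the mismatch easy to spot on a given host, something like the
following could be run on each hosted-engine host. It is only a rough
sketch, nothing official: it compares the host_id from
hosted-engine.conf with the id the host actually registered in the
VDSM lockspace of the hosted-engine storage domain. The storage domain
UUID is the one from Baptiste's outputs quoted below; substitute your
own.

# Sketch: compare the hosted-engine host_id with the id this host
# registered in the VDSM lockspace of the hosted-engine storage domain.
HE_SD=377ae8e8-0eeb-4591-b50f-3d21298b4146   # from the outputs below
he_id=$(awk -F= '/^host_id=/ {print $2}' /etc/ovirt-hosted-engine/hosted-engine.conf)
sd_id=$(sanlock client status | awk -v sd="$HE_SD" \
    '$1 == "s" { split($2, f, ":"); if (f[1] == sd) print f[2] }')
if [ "$he_id" = "$sd_id" ]; then
    echo "OK: host_id ($he_id) matches the lockspace id"
else
    echo "MISMATCH: host_id=$he_id, lockspace id=$sd_id"
fi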

> Thanks
>
> Regards
>
> --
> Martin Sivak
> SLA / oVirt
>
> On Tue, Apr 12, 2016 at 4:31 PM, Baptiste Agasse
> <[email protected]> wrote:
>> Hi all,
>>
>> Last week we had a problem on our oVirt infrastructure. The hosted
>> engine didn't come up after the reboot of the host which hosted it.
>> With the help of some people on the #ovirt IRC channel (msivak,
>> nsoffer and some others, thanks to all of them) I managed to get my
>> hosted engine up and running, but the underlying problem is still
>> there. I think there is an inconsistency between the sanlock IDs of
>> the hosts.
>>
>> Some background:
>>
>> We installed oVirt 3.5 on CentOS 7 about 9 months ago. We have one DC
>> with two clusters:
>>
>> cluster 1: 4 hosts (virt1, virt2, virt3, virt4) that were installed
>> with 'hosted-engine --deploy', so they are capable of running the
>> engine VM.
>> cluster 2: 2 hosts (virt6 and virt7) that were installed via the
>> webui, so they are 'normal' ovirt hosts.
>>
>> Since then we have successfully upgraded oVirt to 3.6 and set our
>> clusters to 3.6 compatibility mode.
>>
>> Some weeks ago something broke and the virt4 host rebooted. After
>> some help on the IRC channel, I managed to get the engine VM up and
>> running. Afterwards I dug into the problem, which seems to be around
>> the sanlock part.
>>
>> What I understand from the explanations is:
>>
>> sanlock manages locks at the DC level. There is a hosted_engine lock
>> to manage who runs the VM, and there is a VDSM-level lock on the
>> hosted_engine disk (or any other VM disk) to know who can write to
>> the disk.
>>
>> The problem in my case is that on some hosts that were installed in
>> 3.5, the hosted_engine ID and the vds_spm_id are not the same, and
>> some hosts have a vds_spm_id identical to another host's
>> hosted_engine ID. So in some cases a host can't acquire the lock on
>> some disks, and a host can appear under different IDs in the sanlock
>> lockspaces.
>>
>> Example, in my case:
>>
>> #
>> # For the hosted_engine hosts:
>> #
>> [root@virt1 ~]# grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
>> host_id=1
>>
>> [root@virt2 ~]# grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
>> host_id=2
>>
>> [root@virt3 ~]# grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
>> host_id=3
>>
>> [root@virt4 ~]# grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
>> host_id=4
>>
>> #
>> # For all hosts, including hosted engine:
>> #
>> [root@virt1 ~]# sanlock client status
>> daemon 3a99892c-5d3a-4d3d-bac7-d35259363c98.virt1
>> p -1 helper
>> p -1 listener
>> p -1 status
>> s hosted-engine:1:/var/run/vdsm/storage/377ae8e8-0eeb-4591-b50f-3d21298b4146/607719dd-b71e-4527-814a-964ed0c1f8ea/6a0b878d-fe7e-4fb6-bd5d-1254bebb0ca0:0
>> s 295207d7-41ea-4cda-a028-f860c357d46b:1:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0
>> s daf1b53c-7e29-4b18-a9e2-910605cc7080:1:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0
>> s 680d5ed1-ed70-4340-a430-ddfa39ee3052:1:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0
>> s 350e5736-41c0-4017-a8fd-9866edad3333:1:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0
>> s 377ae8e8-0eeb-4591-b50f-3d21298b4146:1:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0
>>
>> [root@virt2 ~]# sanlock client status
>> daemon 48fe11a1-6c64-4a56-abf0-6f9690e6a8c2.virt2
>> p -1 helper
>> p -1 listener
>> p -1 status
>> s hosted-engine:2:/var/run/vdsm/storage/377ae8e8-0eeb-4591-b50f-3d21298b4146/607719dd-b71e-4527-814a-964ed0c1f8ea/6a0b878d-fe7e-4fb6-bd5d-1254bebb0ca0:0
>> s 377ae8e8-0eeb-4591-b50f-3d21298b4146:2:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0
>> s 295207d7-41ea-4cda-a028-f860c357d46b:3:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0
>> s daf1b53c-7e29-4b18-a9e2-910605cc7080:3:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0
>> s 680d5ed1-ed70-4340-a430-ddfa39ee3052:3:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0
>> s 350e5736-41c0-4017-a8fd-9866edad3333:3:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0
>> r 350e5736-41c0-4017-a8fd-9866edad3333:SDM:/dev/350e5736-41c0-4017-a8fd-9866edad3333/leases:1048576:26 p 9304
>> r 377ae8e8-0eeb-4591-b50f-3d21298b4146:d704cf05-e294-4ada-9627-920c9997cf22:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/leases:111149056:21 p 32747
>>
>> [root@virt3 ~]# sanlock client status
>> daemon 3388d8e5-922d-45ab-8ecb-6e321a7a8a4a.virt3
>> p -1 helper
>> p -1 listener
>> p -1 status
>> s daf1b53c-7e29-4b18-a9e2-910605cc7080:2:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0
>> s 680d5ed1-ed70-4340-a430-ddfa39ee3052:2:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0
>> s 295207d7-41ea-4cda-a028-f860c357d46b:2:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0
>> s 350e5736-41c0-4017-a8fd-9866edad3333:2:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0
>> s 377ae8e8-0eeb-4591-b50f-3d21298b4146:2:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0 ADD
>>
>> [root@virt4 ~]# sanlock client status
>> daemon 3ec5f49e-9920-48a5-97a7-2e900ae374ed.virt4
>> p -1 helper
>> p -1 listener
>> p -1 status
>> s 377ae8e8-0eeb-4591-b50f-3d21298b4146:4:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0
>> s daf1b53c-7e29-4b18-a9e2-910605cc7080:6:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0
>> s 295207d7-41ea-4cda-a028-f860c357d46b:6:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0
>> s 680d5ed1-ed70-4340-a430-ddfa39ee3052:6:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0
>> s 350e5736-41c0-4017-a8fd-9866edad3333:6:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0
>> s hosted-engine:4:/var/run/vdsm/storage/377ae8e8-0eeb-4591-b50f-3d21298b4146/607719dd-b71e-4527-814a-964ed0c1f8ea/6a0b878d-fe7e-4fb6-bd5d-1254bebb0ca0:0
>>
>> [root@virt6 bagasse]# sanlock client status
>> daemon 031a9126-52ac-497a-8403-cd8c3f2db1c1.virt6
>> p -1 helper
>> p -1 listener
>> p -1 status
>> s 377ae8e8-0eeb-4591-b50f-3d21298b4146:5:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0
>> s 295207d7-41ea-4cda-a028-f860c357d46b:5:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0
>> s daf1b53c-7e29-4b18-a9e2-910605cc7080:5:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0
>> s 680d5ed1-ed70-4340-a430-ddfa39ee3052:5:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0
>> s 350e5736-41c0-4017-a8fd-9866edad3333:5:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0
>>
>> [root@virt7 ~]# sanlock client status
>> daemon 3ef87845-975a-443c-af71-0df1981fb8d4.virt7
>> p -1 helper
>> p -1 listener
>> p -1 status
>> s 350e5736-41c0-4017-a8fd-9866edad3333:4:/dev/350e5736-41c0-4017-a8fd-9866edad3333/ids:0
>> s daf1b53c-7e29-4b18-a9e2-910605cc7080:4:/dev/daf1b53c-7e29-4b18-a9e2-910605cc7080/ids:0 ADD
>> s 377ae8e8-0eeb-4591-b50f-3d21298b4146:4:/dev/377ae8e8-0eeb-4591-b50f-3d21298b4146/ids:0 ADD
>> s 295207d7-41ea-4cda-a028-f860c357d46b:4:/dev/295207d7-41ea-4cda-a028-f860c357d46b/ids:0 ADD
>> s 680d5ed1-ed70-4340-a430-ddfa39ee3052:4:/dev/680d5ed1-ed70-4340-a430-ddfa39ee3052/ids:0 ADD
>>
>> #
>> # The output that I found in the engine database:
>> #
>> engine=# SELECT vds_spm_id_map.storage_pool_id, vds_spm_id_map.vds_spm_id,
>>          vds_spm_id_map.vds_id, vds.vds_name
>>          FROM vds_spm_id_map, vds WHERE vds_spm_id_map.vds_id = vds.vds_id;
>>            storage_pool_id            | vds_spm_id |                vds_id                | vds_name
>> --------------------------------------+------------+--------------------------------------+----------
>>  00000002-0002-0002-0002-000000000208 |          6 | c6aef4f9-e972-40a0-916e-4ed296de46db | virt4
>>  00000002-0002-0002-0002-000000000208 |          5 | 5922e88b-c6de-41ce-ab64-046f66c8d08e | virt6
>>  00000002-0002-0002-0002-000000000208 |          4 | fcd962ea-3158-468d-a0b9-d7bb864ba959 | virt7
>>  00000002-0002-0002-0002-000000000208 |          1 | b43933d7-7338-41f6-9a71-f7cd389b9167 | virt1
>>  00000002-0002-0002-0002-000000000208 |          2 | 031526e8-110e-4254-97ef-1a26cb67b835 | virt3
>>  00000002-0002-0002-0002-000000000208 |          3 | 09609537-0c33-437a-93fa-b246f0bb57e4 | virt2
>> (6 rows)
>>
>> So, in this case, for example, the host virt4 has host_id=4 and
>> vds_spm_id=6, so this host holds these two IDs in sanlock.
>>
>> So my questions are:
>> * How can I get out of this situation?
>> * As I didn't find the vds_spm_id stored anywhere on the hosts, can I
>>   modify this value in the database to make them identical to the
>>   host_id?
>> * Is this strange behaviour a possible side effect of the upgrade to
>>   oVirt 3.6 and of the import of the hosted-engine storage domain into
>>   the engine?
>>
>> Any pointers are welcome.
>>
>> Have a nice day.
>>
>> Regards.
>>
>> --
>> Baptiste
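
The collision is actually visible in the outputs above: virt4
(hosted-engine host_id=4) and virt7 (vds_spm_id=4) both use id 4 in
the lockspace of the hosted-engine storage domain
(377ae8e8-0eeb-4591-b50f-3d21298b4146). To flag such duplicates
automatically, a throwaway sketch along these lines could collect the
(lockspace, id) pairs from every host and report any id claimed by two
different hosts in the same lockspace; the host list and passwordless
root ssh are assumptions here, adjust to your setup.

for h in virt1 virt2 virt3 virt4 virt6 virt7; do
    # print "lockspace id host" for every lockspace the host has joined
    ssh root@"$h" sanlock client status | awk -v host="$h" \
        '$1 == "s" { split($2, f, ":"); print f[1], f[2], host }'
done | awk '
    {
        key = $1 ":" $2     # lockspace:id
        if (key in owner && owner[key] != $3)
            printf "COLLISION in %s: id %s used by %s and %s\n", $1, $2, owner[key], $3
        else
            owner[key] = $3
    }'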
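
On the second question: the vds_spm_id indeed lives only in the engine
database (the vds_spm_id_map table you queried), which is why you
cannot find it on the hosts; whether editing it there is safe is a
separate question. Before touching anything, the two ids can at least
be listed side by side. This is a read-only sketch, run from the
engine VM; the database credentials and passwordless root ssh to the
hosts are assumptions (virt6/virt7 simply have no hosted-engine.conf):

# List each host's vds_spm_id (from the engine DB) next to its
# hosted-engine host_id (read from the host). Adjust psql auth to taste.
psql -d engine -At -c "
    SELECT vds.vds_name, vds_spm_id_map.vds_spm_id
    FROM vds_spm_id_map
    JOIN vds ON vds_spm_id_map.vds_id = vds.vds_id;" |
while IFS='|' read -r name spm_id; do
    # -n keeps ssh from swallowing the rest of the piped host list
    he_id=$(ssh -n root@"$name" \
        "awk -F= '/^host_id=/ {print \$2}' /etc/ovirt-hosted-engine/hosted-engine.conf" \
        2>/dev/null)
    printf '%-8s hosted-engine host_id=%-6s vds_spm_id=%s\n' \
        "$name" "${he_id:-none}" "$spm_id"
done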

