Hi,

> Hello,
> 
> I have a 3-node HCI cluster with glusterfs, oVirt 4.4.9.5-1. In the last 2
> weeks I experienced 2 outages where the HE and all/some VMs were restarted.
> While digging in the logs I can see that sanlock cannot renew leases, which
> leads to killing VMs, as is very well described in [1].
> 
> It looks to me like some hw issue with one of the hosts, but I cannot find
> which one.

When you check the sanlock log (/var/log/sanlock.log) around the time of the
outage, you should be able to see which of the hosts failed to renew their
sanlock leases. It could be only some of them (which points to an issue with
those hosts) or all of them (in which case a network or storage issue is
more likely).
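
To compare the hosts side by side, something like the following could help.
It is only a minimal sketch: it assumes the syslog-style line format quoted
further down in this thread, and the function name and the list of message
markers are illustrative (taken from that excerpt, not an exhaustive list of
sanlock errors).

```python
import re
from collections import Counter

# Matches syslog-style sanlock lines as quoted in this thread, e.g.
# "Jan 13 08:27:25 ovirt-hci02 sanlock[1263]: 2022-01-13 08:27:25 1416706 [341378]: s7 delta_renew ..."
# Assumption: lines in /var/log/sanlock.log lack the syslog prefix and
# would need a slightly different pattern.
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+sanlock\[\d+\]:\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+\d+\s+\[\d+\]:\s+(?P<msg>.*)$"
)

# Message fragments seen in the excerpt from this thread (assumption: not exhaustive).
FAILURE_MARKERS = ("delta_renew", "write_sectors delta_leader")


def summarize_renewal_failures(lines):
    """Count lease renewal problems per (host, lease file path)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a sanlock line in the expected format
        msg = match.group("msg")
        if not any(marker in msg for marker in FAILURE_MARKERS):
            continue  # sanlock message unrelated to lease renewal
        # In the quoted lines the lease file path is the last token.
        lease_path = msg.split()[-1]
        counts[(match.group("host"), lease_path)] += 1
    return counts
```

Feeding it the relevant lines from /var/log/messages of every host and
printing the resulting counter shows at a glance whether the failures are
limited to one host or one storage domain, or spread across all of them.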

Also, if you only want to find out which hosts weren't able to renew their
leases, it's even easier: they are the hosts whose VMs were killed. If a host
runs HA VMs and is not able to renew its leases, sanlock will kill the VMs
running on that host.

Vojta
 
> for example today's outage restarted vms on hosts 1 and 2 but not 3.
> Sanlock logs
> 
> there are these lines in /var/log/messages on host 2 (ovirt-hci02)
> 
> Jan 13 08:27:25 ovirt-hci02 sanlock[1263]: 2022-01-13 08:27:25 1416706 [341378]: s7 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/glusterSD/10.0.4.11:_vms/6de5ae6d-c7cc-4292-bdbf-10495a38837b/dom_md/ids
> Jan 13 08:28:59 ovirt-hci02 sanlock[1263]: 2022-01-13 08:28:59 1416800 [341257]: write_sectors delta_leader offset 1024 rv -202 /rhev/data-center/mnt/glusterSD/10.0.4.11:_engine/816a3d0b-2e10-4900-b3cb-4a9b5cd0dd5d/dom_md/ids
> Jan 13 08:29:27 ovirt-hci02 sanlock[1263]: 2022-01-13 08:29:27 1416828 [4189968]: write_sectors delta_leader offset 1024 rv -202 /rhev/data-center/mnt/glusterSD/10.0.4.11:_engine/816a3d0b-2e10-4900-b3cb-4a9b5cd0dd5d/dom_md/ids
> 
> but not on hosts 1 and 3. Could it indicate that there could be storage
> related problem on host 1?
> 
> could you please suggest further/better debugging approach?
> 
> Thanx a lot,
> 
> Jiri
> 
> [1] https://www.ovirt.org/develop/developer-guide/vdsm/sanlock.html

_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/UJYSBUBM3CGP762Q4WRZSF67KJNGGIVC/