> On Mon, Nov 4, 2019 at 9:18 PM Albl, Oliver <Oliver.Albl(a)fabasoft.com>
> wrote:
>
> What was the last change in the system? upgrade? network change?
> storage change?
Last change was four weeks ago: oVirt upgrade from 4.3.3 to 4.3.6.7
(including CentOS hosts to 7.7 1908).

> This is expected if some domain is not accessible on all hosts.
>
> This means sanlock timed out renewing the lockspace.
>
> If a host cannot access all storage domains in the DC, the system sets
> it to non-operational, and will probably try to reconnect it later.
>
> This means reading 4k from the start of the metadata lv took 9.6
> seconds. Something in the way to storage is bad (kernel, network,
> storage).
>
> We have 20 seconds (4 retries, 5 seconds per retry) grace time in
> multipath when there are no active paths, before I/O fails, pausing
> the VM. We also resume paused VMs when storage monitoring works again,
> so maybe the VMs were paused and resumed.
>
> However, for storage monitoring we have a strict 10 second timeout. If
> reading from the metadata lv times out or fails and does not operate
> normally after 5 minutes, the domain will become inactive.
>
> This can explain the read timeouts.
>
> This looks like the right way to troubleshoot this.
>
> We need vdsm logs to understand this failure.
>
> This does not mean the OVF is corrupted, only that we could not store
> new data. The older data on the other OVFSTORE disk is probably fine.
> Hopefully the system will not try to write to the other OVFSTORE disk,
> overwriting the last good version.
>
> This is normal, the first 2048 bytes are always zeroes. This area was
> used for domain metadata in older versions.
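The 10 second monitoring read described above can be reproduced by hand. A rough sketch (the metadata LV path is site-specific, so a throwaway temp file stands in here; on a real host point TARGET at the domain's metadata LV and add iflag=direct to bypass the page cache):

```shell
# Time a single 4 KiB read with a 10 s budget -- the same shape as the
# storage-monitoring read from the start of the metadata LV.
# TARGET is a stand-in temp file so the sketch runs anywhere; substitute
# the real metadata LV device on an affected host.
TARGET=$(mktemp)
dd if=/dev/zero of="$TARGET" bs=4096 count=1 status=none

start=$(date +%s%N)
timeout 10 dd if="$TARGET" of=/dev/null bs=4096 count=1 status=none
rc=$?
elapsed_ms=$(( ($(date +%s%N) - start) / 1000000 ))

if [ "$rc" -eq 0 ]; then
    echo "read ok in ${elapsed_ms} ms"
else
    echo "read failed or exceeded 10 s (would count as a monitoring failure)"
fi
rm -f "$TARGET"
```

A healthy read should complete in a few milliseconds; anything approaching whole seconds points at the path to storage, consistent with the 9.6 second read above.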
>
> Please share more details:
>
> - output of "lsblk"
> - output of "multipath -ll"
> - output of "/usr/libexec/vdsm/fc-scan -v"
> - output of "vgs -o +tags problem-domain-id"
> - output of "lvs -o +tags problem-domain-id"
> - contents of /etc/multipath.conf
> - contents of /etc/multipath.conf.d/*.conf
> - /var/log/messages since the issue started
> - /var/log/vdsm/vdsm.log* since the issue started on one of the hosts
>
> A bug is probably the best place to keep these logs and make them easy
> to track. Please see https://bugzilla.redhat.com/show_bug.cgi?id=1768821
>
> Thanks,
> Nir

Thank you!

Oliver
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/QZ5ZN2S7N54JYVV3RWOYOHTEAWFQ23Q7/
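One way to gather the requested output in a single shot, for attaching to the bug, is a small collection script. This is only a sketch: most of the commands need root, the vgs/lvs calls need the real problem domain UUID (left commented out with a placeholder), and any command missing on the host is noted rather than aborting the run:

```shell
# Collect the requested diagnostics into one tarball for the bug report.
outdir=$(mktemp -d)

run() {
    name=$1; shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$outdir/$name.txt" 2>&1 || true
    else
        echo "$1: not installed" >"$outdir/$name.txt"
    fi
}

run lsblk lsblk
run multipath multipath -ll
run fc-scan /usr/libexec/vdsm/fc-scan -v
# Substitute the real problem domain UUID before uncommenting:
# run vgs vgs -o +tags <problem-domain-id>
# run lvs lvs -o +tags <problem-domain-id>

# Multipath config, where present (also grab /var/log/messages and
# /var/log/vdsm/vdsm.log* the same way on an affected host):
for f in /etc/multipath.conf /etc/multipath.conf.d/*.conf; do
    [ -f "$f" ] && cp "$f" "$outdir/" || true
done

tar czf storage-diag.tar.gz -C "$outdir" .
echo "wrote storage-diag.tar.gz"
```

Running this on each affected host and attaching the tarballs to the bug keeps the logs together, as suggested above.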

