> On Mon, Nov 4, 2019 at 9:18 PM Albl, Oliver <Oliver.Albl(a)fabasoft.com> 
> wrote:
> 
> What was the last change in the system? upgrade? network change? storage 
> change?
> 

The last change was an oVirt upgrade from 4.3.3 to 4.3.6.7 four weeks ago 
(including upgrading the CentOS hosts to 7.7 1908).

> 
> This is expected if some domain is not accessible on all hosts.
> 
> 
> This means sanlock timed out renewing the lockspace
> 
> 
> If a host cannot access all storage domains in the DC, the system sets
> it to non-operational, and will
> probably try to reconnect it later.
> 
> 
> This means reading 4k from the start of the metadata lv took 9.6 seconds.
> Something in
> the path to storage is bad (kernel, network, storage).
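
That check is essentially a single 4 KiB direct read, so it can be reproduced
by hand. A hedged sketch (the path /dev/vg-id/metadata is a placeholder;
substitute the real metadata lv of the problem domain):

```shell
# Sketch: time one 4 KiB O_DIRECT read, the same kind of read the
# storage monitor issues. DEV is a placeholder for the real metadata lv.
DEV=${DEV:-/dev/vg-id/metadata}
t0=$(date +%s%3N)    # start time in milliseconds
dd if="$DEV" of=/dev/null bs=4096 count=1 iflag=direct 2>/dev/null
t1=$(date +%s%3N)    # end time in milliseconds
echo "read latency: $((t1 - t0)) ms"
```

On healthy FC storage this should take a few milliseconds; seconds-long
latencies point at the same path problem described above.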
> 
> 
> We have 20 seconds (4 retries, 5 seconds per retry) of grace time in multipath
> when there are
> no active paths before I/O fails, pausing the VM. We also resume
> paused VMs when
> storage monitoring works again, so maybe the VMs were paused and resumed.
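
The 20 second figure follows from the multipath settings described above. A
hedged sketch of the relevant multipath.conf fragment (values illustrate the
described behavior; they are not taken from the affected hosts):

```
defaults {
    polling_interval  5   # path checker runs every 5 seconds
    no_path_retry     4   # queue I/O for 4 checker intervals before failing
}
# 4 retries x 5 seconds = ~20 seconds of grace before I/O fails
# and the VM is paused.
```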
> 
> However, for storage monitoring we have a strict 10 second timeout. If
> reading from
> the metadata lv times out or fails and does not return to normal operation
> after 5 minutes, the
> domain will become inactive.
> 
> 
> This can explain the read timeouts.
> 
> 
> This looks like the right way to troubleshoot this.
> 
> 
> We need vdsm logs to understand this failure.
> 
> 
> This does not mean the OVF is corrupted, only that we could not store new
> data. The older data on the other
> OVFSTORE disk is probably fine. Hopefully the system will not try to
> write to the other OVFSTORE disk,
> overwriting the last good version.
> 
> 
> This is normal; the first 2048 bytes are always zeroes. This area was
> used for domain
> metadata in older versions.
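
A quick way to confirm this is to compare that region against /dev/zero. A
hedged sketch (again, /dev/vg-id/metadata is a placeholder for the real
metadata lv):

```shell
# Sketch: check whether the first 2048 bytes of the metadata lv are all
# zeroes, as expected on current storage format versions.
DEV=${DEV:-/dev/vg-id/metadata}
if cmp -s -n 2048 "$DEV" /dev/zero; then
    echo "first 2048 bytes are all zeroes (expected)"
else
    echo "first 2048 bytes contain data (or the device is unreadable)"
fi
```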
> 
> 
> Please share more details:
> 
> - output of "lsblk"
> - output of "multipath -ll"
> - output of "/usr/libexec/vdsm/fc-scan -v"
> - output of "vgs -o +tags problem-domain-id"
> - output of "lvs -o +tags problem-domain-id"
> - contents of /etc/multipath.conf
> - contents of /etc/multipath.conf.d/*.conf
> - /var/log/messages since the issue started
> - /var/log/vdsm/vdsm.log* since the issue started on one of the hosts
> 
> A bug is probably the best place to keep these logs and make them easy to track.
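
A hedged sketch of gathering those command outputs in one pass for attaching
to the bug (SD_UUID is a placeholder for the problem domain id; the paths are
the usual oVirt 4.3 locations):

```shell
# Sketch: collect the requested diagnostics into a single tarball.
# Missing tools/files simply produce error text in the corresponding file.
SD_UUID=${SD_UUID:-problem-domain-id}
OUT=$(mktemp -d)
lsblk                        > "$OUT/lsblk.txt"      2>&1
multipath -ll                > "$OUT/multipath.txt"  2>&1
/usr/libexec/vdsm/fc-scan -v > "$OUT/fc-scan.txt"    2>&1
vgs -o +tags "$SD_UUID"      > "$OUT/vgs.txt"        2>&1
lvs -o +tags "$SD_UUID"      > "$OUT/lvs.txt"        2>&1
cp /etc/multipath.conf "$OUT/" 2>/dev/null
cp /etc/multipath.conf.d/*.conf "$OUT/" 2>/dev/null
tar czf /tmp/storage-diag.tar.gz -C "$OUT" .
echo "wrote /tmp/storage-diag.tar.gz"
```

/var/log/messages and /var/log/vdsm/vdsm.log* can be added the same way, but
they are large and may be better collected with ovirt-log-collector.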

Please see https://bugzilla.redhat.com/show_bug.cgi?id=1768821

> 
> Thanks,
> Nir

Thank you!
Oliver
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/QZ5ZN2S7N54JYVV3RWOYOHTEAWFQ23Q7/
