[ovirt-users] Re: fixed: ovirt small network outage causes HE root xfs crash due to race condition
25.12.2018 10:14, Mike Lykov writes:

1. Why, when it cannot boot due to the corruption, does it show nothing at all on the console? I can get to the GRUB menu (if I move fast enough), but if I continue the boot I see only a blinking cursor for many minutes. The GRUB options do not contain any splash/quiet parameters (the exception is the EDD message, which is meaningless; with edd=off I get only a black console). Where are the kernel boot logs / console output? Does it even try to load the initrd?

2. How can I set timeouts so that the ha-agent does NOT try to restart the HE after 1-2 unsuccessful pings and a 10-second outage? For the HE VM, stability (not crashing or breaking the fs) is more important than availability: I can live with it being unavailable for 10-15 seconds, but not with a broken VM.

3. I stopped the ha-agent, the broker, and the HE VM on all (two) nodes and fixed the partition inside the VM. Then I started the ha-agent on the nodes, and it BROKE the VM fs AGAIN (while the agents were deciding where the VM should start). I fixed the VM fs once more, put the cluster into maintenance mode, started the VM on one node by hand, checked that its status/health were OK, and only then put the ha-agent back to work (maintenance mode "none"). It is easy to break the cluster by crashing the HE VM fs, simply by not putting it into global maintenance mode first.

---
Dec 21 12:32:56 ovirtnode6 kernel: bnx2x :3b:00.0 enp59s0f0: NIC Link is Down
Dec 21 12:32:56 ovirtnode6 kernel: ovirtmgmt: port 1(enp59s0f0) entered disabled state
Dec 21 12:33:13 ovirtnode6 kernel: bnx2x :3b:00.0 enp59s0f0: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
Dec 21 12:33:13 ovirtnode6 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp59s0f0: link becomes ready
Dec 21 12:33:13 ovirtnode6 kernel: ovirtmgmt: port 1(enp59s0f0) entered forwarding state
Dec 21 12:33:13 ovirtnode6 NetworkManager[1715]: [1545381193.2204] device (enp59s0f0): carrier: link connected
---

That is a 17-second outage; at 12:33:13 the link came back.
BUT the events that lead to the crash follow later.

HA agent log:
--
MainThread::INFO::2018-12-21 12:32:59,540::states::444::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm running on localhost
MainThread::INFO::2018-12-21 12:32:59,662::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineUp (score: 3400)
MainThread::INFO::2018-12-21 12:33:09,797::states::136::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Penalizing score by 1280 due to gateway status
MainThread::INFO::2018-12-21 12:33:09,798::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineUp (score: 2120)
MainThread::ERROR::2018-12-21 12:33:19,815::states::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Host ovirtnode1.miac (id 1) score is significantly better than local score, shutting down VM on this host
--

syslog messages:

Dec 21 12:33:19 ovirtnode6 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Host ovirtnode1.miac (id 1) score is significantly better than local score, shutting down VM on this host
Dec 21 12:33:29 ovirtnode6 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
Dec 21 12:33:37 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered disabled state
Dec 21 12:33:37 ovirtnode6 kernel: device vnet1 left promiscuous mode
Dec 21 12:33:37 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered disabled state
Dec 21 12:33:37 ovirtnode6 NetworkManager[1715]: [1545381217.1796] device (vnet1): state change: disconnected -> unmanaged (reason 'unmanaged', sys-iface-state: 'removed')
Dec 21 12:33:37 ovirtnode6 NetworkManager[1715]: [1545381217.1798] device (vnet1): released from master device ovirtmgmt
Dec 21 12:33:37 ovirtnode6 libvirtd: 2018-12-21 08:33:37.192+: 2783: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor  <-- WHAT IS THIS?
Dec 21 12:33:37 ovirtnode6 kvm: 2 guests now active
Dec 21 12:33:37 ovirtnode6 systemd-machined: Machine qemu-2-HostedEngine terminated.
Dec 21 12:33:37 ovirtnode6 firewalld[1693]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -w -D libvirt-out -m physdev --physdev-is-bridged --physdev-out vnet1 -g FP-vnet1' failed: iptables v1.4.21: goto 'FP-vnet1' is not a chain#012Try `iptables -h' or 'iptables --help' for more information.
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered blocking state
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered disabled state
Dec 21 12:33:55 ovirtnode6 kernel: device vnet1 entered promiscuous mode
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered blocking state
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered forwarding state
Dec 21 12:33:55 ovirtnode6 lldpad: recvfrom(Event interface): No buffer space available
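The takeaway from point 3 above (the cluster must be in global maintenance before you touch the HE VM, its storage, or its network) can be sketched as a shell snippet. `hosted-engine --set-maintenance` is the standard oVirt command for this; the `run`/`DRY_RUN` wrapper is only an illustration device so the commands can be printed instead of executed.

```shell
# Sketch: pause all HA state-machine actions before planned network work,
# so ovirt-ha-agent will not penalize the score and shut down/restart the
# HE VM during the outage. With DRY_RUN=1 (the default here) the commands
# are only printed, not executed.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Enter global maintenance: the agents keep monitoring but take no action.
run hosted-engine --set-maintenance --mode=global

# ... perform the switch/NIC maintenance here ...

# Leave maintenance only after the network is stable again.
run hosted-engine --set-maintenance --mode=none
```

This avoids exactly the race shown in the logs: the agent penalizing the local score by 1280 over a short gateway blip and shutting down a VM that was still healthy.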
[ovirt-users] Re: fixed: ovirt small network outage causes HE root xfs crash due to race condition
24.12.2018 11:30, Mike Lykov writes:

Host nodes (CentOS 7.5) are named ovirtnode1, 5 and 6. Timeouts (in the ha agent) are the defaults, and sanlock is configured (as far as I can tell). The HE was running on ovirtnode6, with a spare HE deployed on ovirtnode1. Fixed (it seems) by the guestfish/xfs_repair method: it requires zeroing the XFS metadata log, and it heavily relies on luck.

1. Why, when it cannot boot due to the corruption, does it show nothing at all on the console? I can get to the GRUB menu (if I move fast enough), but if I continue the boot I see only a blinking cursor for many minutes. The GRUB options do not contain any splash/quiet parameters (the exception is the EDD message, which is meaningless; with edd=off I get only a black console). Where are the kernel boot logs / console output? Does it even try to load the initrd?

2. How can I set timeouts so that the ha-agent does NOT try to restart the HE after 1-2 unsuccessful pings and a 10-second outage? For the HE VM, stability (not crashing or breaking the fs) is more important than availability: I can live with it being unavailable for 10-15 seconds, but not with a broken VM.

---
Dec 21 12:32:56 ovirtnode6 kernel: bnx2x :3b:00.0 enp59s0f0: NIC Link is Down
Dec 21 12:32:56 ovirtnode6 kernel: ovirtmgmt: port 1(enp59s0f0) entered disabled state
Dec 21 12:33:13 ovirtnode6 kernel: bnx2x :3b:00.0 enp59s0f0: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
Dec 21 12:33:13 ovirtnode6 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp59s0f0: link becomes ready
Dec 21 12:33:13 ovirtnode6 kernel: ovirtmgmt: port 1(enp59s0f0) entered forwarding state
Dec 21 12:33:13 ovirtnode6 NetworkManager[1715]: [1545381193.2204] device (enp59s0f0): carrier: link connected
---

That is a 17-second outage; at 12:33:13 the link came back.
BUT the events that lead to the crash follow later.

HA agent log:
--
MainThread::INFO::2018-12-21 12:32:59,540::states::444::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm running on localhost
MainThread::INFO::2018-12-21 12:32:59,662::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineUp (score: 3400)
MainThread::INFO::2018-12-21 12:33:09,797::states::136::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) Penalizing score by 1280 due to gateway status
MainThread::INFO::2018-12-21 12:33:09,798::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state EngineUp (score: 2120)
MainThread::ERROR::2018-12-21 12:33:19,815::states::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Host ovirtnode1.miac (id 1) score is significantly better than local score, shutting down VM on this host
--

syslog messages:

Dec 21 12:33:19 ovirtnode6 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Host ovirtnode1.miac (id 1) score is significantly better than local score, shutting down VM on this host
Dec 21 12:33:29 ovirtnode6 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
Dec 21 12:33:37 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered disabled state
Dec 21 12:33:37 ovirtnode6 kernel: device vnet1 left promiscuous mode
Dec 21 12:33:37 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered disabled state
Dec 21 12:33:37 ovirtnode6 NetworkManager[1715]: [1545381217.1796] device (vnet1): state change: disconnected -> unmanaged (reason 'unmanaged', sys-iface-state: 'removed')
Dec 21 12:33:37 ovirtnode6 NetworkManager[1715]: [1545381217.1798] device (vnet1): released from master device ovirtmgmt
Dec 21 12:33:37 ovirtnode6 libvirtd: 2018-12-21 08:33:37.192+: 2783: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor  <-- WHAT IS THIS?
Dec 21 12:33:37 ovirtnode6 kvm: 2 guests now active
Dec 21 12:33:37 ovirtnode6 systemd-machined: Machine qemu-2-HostedEngine terminated.
Dec 21 12:33:37 ovirtnode6 firewalld[1693]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -w -D libvirt-out -m physdev --physdev-is-bridged --physdev-out vnet1 -g FP-vnet1' failed: iptables v1.4.21: goto 'FP-vnet1' is not a chain#012Try `iptables -h' or 'iptables --help' for more information.
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered blocking state
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered disabled state
Dec 21 12:33:55 ovirtnode6 kernel: device vnet1 entered promiscuous mode
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered blocking state
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered forwarding state
Dec 21 12:33:55 ovirtnode6 lldpad: recvfrom(Event interface): No buffer space available
Dec 21 12:33:55 ovirtnode6 NetworkManager[1715]: [1545381235.8086] manager: (vnet1):
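The guestfish/xfs_repair recovery mentioned above could look roughly like this. The disk image path and the partition device are assumptions (they depend on the storage domain layout), and zeroing the XFS log (`forcelogzero:true`, the guestfish equivalent of `xfs_repair -L`) discards unreplayed metadata, which is why the method "relies on luck". With DRY_RUN=1 the commands are only printed, not executed.

```shell
# Sketch of the offline repair flow, under the assumptions above.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# 1. Stop the HA services on EVERY host first, so no agent restarts the
#    VM (and corrupts its fs again) while the image is being repaired.
run systemctl stop ovirt-ha-agent ovirt-ha-broker

# 2. Repair the HE disk image offline with guestfish. The image path and
#    /dev/sda2 are hypothetical; adjust them to the actual layout.
HE_DISK=/path/to/he-disk-image   # hypothetical path
run guestfish --rw -a "$HE_DISK" run : xfs-repair /dev/sda2 forcelogzero:true

# 3. Boot the VM by hand, verify its health, then restart the HA services.
run hosted-engine --vm-start
run systemctl start ovirt-ha-agent ovirt-ha-broker
```

Doing step 1 on all hosts (or using global maintenance) is what prevents the "it BROKE the VM fs AGAIN" scenario from the follow-up message: two agents racing to start the same VM against a freshly repaired filesystem.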