[ovirt-users] Re: fixed: ovirt small network outage causes HE root xfs crash due to race condition

2018-12-24 Thread Mike Lykov

On 25.12.2018 10:14, Mike Lykov wrote:

1. Why does it show nothing at all on the console when it cannot boot 
because of the corruption?
I can get to the grub menu (if I move fast enough), but if I continue 
booting I see only a blinking cursor for many minutes and nothing more. 
The grub options do not contain any splash/quiet parameters.
(The only exception is the EDD message, which is meaningless; if I use 
edd=off I get only a black console.)


Where are the kernel boot logs/console output? Does it at least try to 
load the initrd?


2. How can I set timeouts so that ha-agent does NOT try to restart the 
HE after 1-2 unsuccessful pings and a 10-second outage?
For me, HE VM stability (no crashed/broken fs) is more important than 
availability (I can live with it being unavailable for 10-15 sec, but 
not with a broken VM).


3. I stopped ha-agent, the broker and the HE VM on all (two) nodes and 
fixed the partition in the VM. Then I started ha-agent on the nodes, and 
it BROKE the VM fs AGAIN (while trying to decide where the VM should 
start)!


I fixed the VM fs again, put the cluster in maintenance mode, started 
the VM on one node by hand, checked that its status/health was ok, and 
only then put ha-agent back into working (none) mode. It is easy to 
break the cluster by crashing the HE VM fs (by not putting it into 
global maintenance mode).
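
The order of operations described above can be written down as a small 
dry-run script. This is only a sketch: it assumes the standard 
hosted-engine CLI options (--set-maintenance, --vm-start, --vm-status) 
are what was used, and the function name run_recovery is made up for 
illustration. By default it only prints the commands.

```python
import subprocess

# Order taken from the message above: global maintenance first, manual
# VM start, status check, and only then hand control back to ha-agent.
RECOVERY_STEPS = [
    ["hosted-engine", "--set-maintenance", "--mode=global"],
    ["hosted-engine", "--vm-start"],
    ["hosted-engine", "--vm-status"],
    ["hosted-engine", "--set-maintenance", "--mode=none"],
]


def run_recovery(dry_run=True):
    """Print the recovery steps in order; execute them if dry_run is False."""
    executed = []
    for argv in RECOVERY_STEPS:
        line = " ".join(argv)
        print(line)
        if not dry_run:
            # check=True aborts the sequence at the first failing step.
            subprocess.run(argv, check=True)
        executed.append(line)
    return executed


if __name__ == "__main__":
    run_recovery(dry_run=True)
```

In a real run one would stop after --vm-status and look at the reported 
health by hand before leaving global maintenance; the script does not 
interpret that output.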






---
Dec 21 12:32:56 ovirtnode6 kernel: bnx2x :3b:00.0 enp59s0f0: NIC 
Link is Down
Dec 21 12:32:56 ovirtnode6 kernel: ovirtmgmt: port 1(enp59s0f0) 
entered disabled state
Dec 21 12:33:13 ovirtnode6 kernel: bnx2x :3b:00.0 enp59s0f0: NIC 
Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
Dec 21 12:33:13 ovirtnode6 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): 
enp59s0f0: link becomes ready
Dec 21 12:33:13 ovirtnode6 kernel: ovirtmgmt: port 1(enp59s0f0) 
entered forwarding state
Dec 21 12:33:13 ovirtnode6 NetworkManager[1715]:  
[1545381193.2204] device (enp59s0f0): carrier: link connected

---

That is 17 seconds; at 33:13 the link is back. BUT all the events that 
led to the crash followed later:


HA agent log:
--
MainThread::INFO::2018-12-21 
12:32:59,540::states::444::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) 
Engine vm running on localhost
MainThread::INFO::2018-12-21 
12:32:59,662::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) 
Current state EngineUp (score: 3400)
MainThread::INFO::2018-12-21 
12:33:09,797::states::136::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) 
Penalizing score by 1280 due to gateway status
MainThread::INFO::2018-12-21 
12:33:09,798::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) 
Current state EngineUp (score: 2120)
MainThread::ERROR::2018-12-21 
12:33:19,815::states::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) 
Host ovirtnode1.miac (id 1) score is significantly better than local 
score, shutting down VM on this host

--
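
The score arithmetic in the agent log above can be reproduced in a few 
lines. This is a sketch, not the agent's real code: the base score and 
the penalty are taken from the log, while the remote score (assumed to 
be an unpenalized 3400) and the "significantly better" threshold are 
assumptions, since neither appears in the log.

```python
BASE_SCORE = 3400        # from log: "Current state EngineUp (score: 3400)"
GATEWAY_PENALTY = 1280   # from log: "Penalizing score by 1280 due to gateway status"

# Assumptions, NOT taken from the log:
ASSUMED_REMOTE_SCORE = 3400  # ovirtnode1 assumed to have no penalties
ASSUMED_THRESHOLD = 800      # hypothetical "significantly better" margin


def local_score(base, penalties):
    """Subtract all penalties from the base score, never going below zero."""
    return max(0, base - sum(penalties))


def remote_is_significantly_better(local, remote, threshold):
    """True when the remote host leads by at least the threshold."""
    return remote - local >= threshold


local = local_score(BASE_SCORE, [GATEWAY_PENALTY])
print(local)  # 2120, matching "Current state EngineUp (score: 2120)"
print(remote_is_significantly_better(local, ASSUMED_REMOTE_SCORE, ASSUMED_THRESHOLD))
```

Under these assumptions a single gateway penalty is already enough to 
make the other host look better, which is consistent with the shutdown 
decision logged at 12:33:19.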


syslog messages:

Dec 21 12:33:19 ovirtnode6 journal: ovirt-ha-agent 
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Host 
ovirtnode1.miac (id 1) score is significantly better than local score, 
shutting down VM on this host
Dec 21 12:33:29 ovirtnode6 journal: ovirt-ha-agent 
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine 
VM stopped on localhost
Dec 21 12:33:37 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
disabled state

Dec 21 12:33:37 ovirtnode6 kernel: device vnet1 left promiscuous mode
Dec 21 12:33:37 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
disabled state
Dec 21 12:33:37 ovirtnode6 NetworkManager[1715]:  
[1545381217.1796] device (vnet1): state change: disconnected -> 
unmanaged (reason 'unmanaged', sys-iface-state: 'removed')
Dec 21 12:33:37 ovirtnode6 NetworkManager[1715]:  
[1545381217.1798] device (vnet1): released from master device ovirtmgmt
Dec 21 12:33:37 ovirtnode6 libvirtd: 2018-12-21 08:33:37.192+: 
2783: error : qemuMonitorIO:719 : internal error: End of 
file from qemu monitor
(WHAT IS THIS?)

Dec 21 12:33:37 ovirtnode6 kvm: 2 guests now active
Dec 21 12:33:37 ovirtnode6 systemd-machined: Machine 
qemu-2-HostedEngine terminated.
Dec 21 12:33:37 ovirtnode6 firewalld[1693]: WARNING: COMMAND_FAILED: 
'/usr/sbin/iptables -w2 -w -D libvirt-out -m physdev 
--physdev-is-bridged --physdev-out vnet1 -g FP-vnet1' failed: iptables 
v1.4.21: goto 'FP-vnet1' is not a chain#012#012Try `iptables -h' or 
'iptables --help' for more information.

Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
blocking state
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
disabled state

Dec 21 12:33:55 ovirtnode6 kernel: device vnet1 entered promiscuous mode
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
blocking state
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
forwarding state
Dec 21 12:33:55 ovirtnode6 lldpad: 
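
To see how the events in the logs above line up against the 17-second 
link outage, the timestamps can be turned into offsets from the moment 
the link went down. A minimal sketch: the timestamps are copied from the 
log lines (agent timestamps truncated to whole seconds), and the short 
event labels are my own wording.

```python
from datetime import datetime

# (timestamp from the logs, short label)
EVENTS = [
    ("12:32:56", "NIC link down"),
    ("12:33:09", "agent penalizes score by 1280 (gateway)"),
    ("12:33:13", "NIC link up"),
    ("12:33:19", "agent decides to shut down the HE VM"),
    ("12:33:29", "Engine VM stopped on localhost"),
    ("12:33:37", "qemu-2-HostedEngine terminated"),
    ("12:33:55", "vnet1 port enters forwarding state again"),
]


def offsets(events):
    """Return (seconds since the first event, label) for each event."""
    t0 = datetime.strptime(events[0][0], "%H:%M:%S")
    result = []
    for stamp, label in events:
        delta = datetime.strptime(stamp, "%H:%M:%S") - t0
        result.append((int(delta.total_seconds()), label))
    return result


for seconds, label in offsets(EVENTS):
    print(f"+{seconds:>2}s  {label}")
```

The shutdown decision comes 23 seconds after the link went down, which 
is 6 seconds after the link was already back.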

[ovirt-users] Re: fixed: ovirt small network outage causes HE root xfs crash due to race condition

2018-12-24 Thread Mike Lykov

On 24.12.2018 11:30, Mike Lykov wrote:

The host nodes (CentOS 7.5) are named ovirtnode1, 5 and 6. The timeouts 
(in ha-agent) are at their defaults. Sanlock is configured (I think).

The HE is running on ovirtnode6, and a spare HE is deployed on ovirtnode1.


Fixed (it seems) with the guestfish/xfs_repair method. It requires 
zeroing the xfs metadata log, and it relies heavily on luck.
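
For reference, the two xfs_repair invocations such a repair usually 
comes down to can be sketched like this. It only builds and prints the 
command lines: the device path is a placeholder, and how the disk is 
made visible (guestfish, a rescue image, etc.) is left out. The flags 
are the standard ones: -n checks without modifying, -L zeroes the log 
before repairing, which can lose the metadata updates still in it.

```python
def xfs_repair_commands(device):
    """Return the check-only and the log-zeroing xfs_repair command lines."""
    return [
        ["xfs_repair", "-n", device],  # no-modify mode: only report problems
        ["xfs_repair", "-L", device],  # zero the log, then repair (last resort)
    ]


# Placeholder device name, NOT taken from the original message.
for argv in xfs_repair_commands("/dev/HYPOTHETICAL_ROOT"):
    print(" ".join(argv))
```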


1. Why does it show nothing at all on the console when it cannot boot 
because of the corruption?
I can get to the grub menu (if I move fast enough), but if I continue 
booting I see only a blinking cursor for many minutes and nothing more. 
The grub options do not contain any splash/quiet parameters.
(The only exception is the EDD message, which is meaningless; if I use 
edd=off I get only a black console.)


Where are the kernel boot logs/console output? Does it at least try to 
load the initrd?


2. How can I set timeouts so that ha-agent does NOT try to restart the 
HE after 1-2 unsuccessful pings and a 10-second outage?
For me, HE VM stability (no crashed/broken fs) is more important than 
availability (I can live with it being unavailable for 10-15 sec, but 
not with a broken VM).






---
Dec 21 12:32:56 ovirtnode6 kernel: bnx2x :3b:00.0 enp59s0f0: NIC 
Link is Down
Dec 21 12:32:56 ovirtnode6 kernel: ovirtmgmt: port 1(enp59s0f0) entered 
disabled state
Dec 21 12:33:13 ovirtnode6 kernel: bnx2x :3b:00.0 enp59s0f0: NIC 
Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
Dec 21 12:33:13 ovirtnode6 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): 
enp59s0f0: link becomes ready
Dec 21 12:33:13 ovirtnode6 kernel: ovirtmgmt: port 1(enp59s0f0) entered 
forwarding state
Dec 21 12:33:13 ovirtnode6 NetworkManager[1715]:  
[1545381193.2204] device (enp59s0f0): carrier: link connected

---

That is 17 seconds; at 33:13 the link is back. BUT all the events that 
led to the crash followed later:


HA agent log:
--
MainThread::INFO::2018-12-21 
12:32:59,540::states::444::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) 
Engine vm running on localhost
MainThread::INFO::2018-12-21 
12:32:59,662::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) 
Current state EngineUp (score: 3400)
MainThread::INFO::2018-12-21 
12:33:09,797::states::136::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) 
Penalizing score by 1280 due to gateway status
MainThread::INFO::2018-12-21 
12:33:09,798::hosted_engine::491::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) 
Current state EngineUp (score: 2120)
MainThread::ERROR::2018-12-21 
12:33:19,815::states::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) 
Host ovirtnode1.miac (id 1) score is significantly better than local 
score, shutting down VM on this host

--


syslog messages:

Dec 21 12:33:19 ovirtnode6 journal: ovirt-ha-agent 
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Host 
ovirtnode1.miac (id 1) score is significantly better than local score, 
shutting down VM on this host
Dec 21 12:33:29 ovirtnode6 journal: ovirt-ha-agent 
ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM 
stopped on localhost
Dec 21 12:33:37 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
disabled state

Dec 21 12:33:37 ovirtnode6 kernel: device vnet1 left promiscuous mode
Dec 21 12:33:37 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
disabled state
Dec 21 12:33:37 ovirtnode6 NetworkManager[1715]:  
[1545381217.1796] device (vnet1): state change: disconnected -> 
unmanaged (reason 'unmanaged', sys-iface-state: 'removed')
Dec 21 12:33:37 ovirtnode6 NetworkManager[1715]:  
[1545381217.1798] device (vnet1): released from master device ovirtmgmt
Dec 21 12:33:37 ovirtnode6 libvirtd: 2018-12-21 08:33:37.192+: 2783: 
error : qemuMonitorIO:719 : internal error: End of file 
from qemu monitor
(WHAT IS THIS?)

Dec 21 12:33:37 ovirtnode6 kvm: 2 guests now active
Dec 21 12:33:37 ovirtnode6 systemd-machined: Machine qemu-2-HostedEngine 
terminated.
Dec 21 12:33:37 ovirtnode6 firewalld[1693]: WARNING: COMMAND_FAILED: 
'/usr/sbin/iptables -w2 -w -D libvirt-out -m physdev 
--physdev-is-bridged --physdev-out vnet1 -g FP-vnet1' failed: iptables 
v1.4.21: goto 'FP-vnet1' is not a chain#012#012Try `iptables -h' or 
'iptables --help' for more information.

Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
blocking state
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
disabled state

Dec 21 12:33:55 ovirtnode6 kernel: device vnet1 entered promiscuous mode
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
blocking state
Dec 21 12:33:55 ovirtnode6 kernel: ovirtmgmt: port 3(vnet1) entered 
forwarding state
Dec 21 12:33:55 ovirtnode6 lldpad: recvfrom(Event interface): No buffer 
space available
Dec 21 12:33:55 ovirtnode6 NetworkManager[1715]:  
[1545381235.8086] manager: (vnet1): 
