Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
On Thu, Feb 04, 2016 at 06:26:14PM +0100, Stefano Danzi wrote:
>
> On 04/02/2016 16.55, Dan Kenigsberg wrote:
> > On Wed, Jan 06, 2016 at 08:45:16AM +0200, Dan Kenigsberg wrote:
> > > On Mon, Jan 04, 2016 at 01:54:37PM +0200, Dan Kenigsberg wrote:
> > > > On Mon, Jan 04, 2016 at 12:31:38PM +0100, Stefano Danzi wrote:
> > > > > I did some tests:
> > > > >
> > > > > kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I detach
> > > > > one network cable the network is stable)
> > > > > kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine
> > > > Would you be kind enough to file a kernel bug in bugzilla.redhat.com?
> > > > Summarize the information from this thread (e.g. your ifcfgs and in what
> > > > way mode 4 doesn't work).
> > > >
> > > > To get the bug solved quickly we'd better find a paying RHEL7 customer
> > > > subscribing to it. But I'll try to push from my direction.
> > > Stefano has been kind enough to open
> > >
> > >   Bug 1295423 - Unstable network link using bond mode = 4
> > >   https://bugzilla.redhat.com/show_bug.cgi?id=1295423
> > >
> > > which we fail to reproduce in our own lab. I'd be pleased if anybody who
> > > experiences it would add their networking config to the bug (if it is
> > > different). Can you also lay out your switch's hardware and
> > > configuration?
> > Stefano, could you share your /proc/net/bonding/* files with us?
> > I heard similar reports where the bond slaves had mismatching
> > aggregator ids. Could it be your case as well?
>
> Here:
>
> [root@ovirt01 ~]# cat /proc/net/bonding/bond0
> Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
>
> Bonding Mode: IEEE 802.3ad Dynamic link aggregation
> Transmit Hash Policy: layer2 (0)
> MII Status: up
> MII Polling Interval (ms): 100
> Up Delay (ms): 0
> Down Delay (ms): 0
>
> 802.3ad info
> LACP rate: slow
> Min links: 0
> Aggregator selection policy (ad_select): stable
> Active Aggregator Info:
>         Aggregator ID: 2
>         Number of ports: 1
>         Actor Key: 9
>         Partner Key: 1
>         Partner Mac Address: 00:00:00:00:00:00
>
> Slave Interface: enp4s0
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 2
> Permanent HW addr: **:**:**:**:**:f1
> Slave queue ID: 0
> Aggregator ID: 1          ---^^^
> Actor Churn State: churned
> Partner Churn State: churned
> Actor Churned Count: 4
> Partner Churned Count: 5
> details actor lacp pdu:
>     system priority: 65535
>     port key: 9
>     port priority: 255
>     port number: 1
>     port state: 69
> details partner lacp pdu:
>     system priority: 65535
>     oper key: 1
>     port priority: 255
>     port number: 1
>     port state: 1
>
> Slave Interface: enp5s0
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 1
> Permanent HW addr: **:**:**:**:**:f2
> Slave queue ID: 0
> Aggregator ID: 2          ---^^^

It sounds awfully familiar: mismatching aggregator IDs, and an all-zero
partner MAC. Can you double-check that both your NICs are wired to the same
switch, and that the switch is properly configured to use LACP on these two
ports?

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
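The aggregator-ID mismatch Dan points at can be checked mechanically. Below is a small sketch of our own (the function name `check_agg_ids` is our invention, not an oVirt or kernel tool) that compares each slave's Aggregator ID in a `/proc/net/bonding/*` file against the active aggregator's ID:

```shell
# check_agg_ids: flag bond slaves whose Aggregator ID differs from the
# bond's active aggregator. Reads /proc/net/bonding/bond0 by default,
# or any file passed as the first argument.
check_agg_ids() {
    awk '
        /Active Aggregator Info:/ { in_active = 1 }
        in_active && /Aggregator ID:/ { active = $3; in_active = 0; next }
        /Slave Interface:/ { slave = $3 }
        slave && /Aggregator ID:/ {
            if ($3 != active)
                printf "MISMATCH: %s has aggregator %s (active is %s)\n", slave, $3, active
            else
                printf "OK: %s on active aggregator %s\n", slave, $3
        }
    ' "${1:-/proc/net/bonding/bond0}"
}
# Usage on a host:  check_agg_ids /proc/net/bonding/bond0
```

On a healthy 802.3ad bond every slave reports the active aggregator's ID; a slave stuck on another aggregator carries no traffic, which fits the one-cable-works symptom described in this thread.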
Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
On 04/02/2016 16.55, Dan Kenigsberg wrote:
> On Wed, Jan 06, 2016 at 08:45:16AM +0200, Dan Kenigsberg wrote:
> > On Mon, Jan 04, 2016 at 01:54:37PM +0200, Dan Kenigsberg wrote:
> > > On Mon, Jan 04, 2016 at 12:31:38PM +0100, Stefano Danzi wrote:
> > > > I did some tests:
> > > >
> > > > kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I detach
> > > > one network cable the network is stable)
> > > > kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine
> > > Would you be kind enough to file a kernel bug in bugzilla.redhat.com?
> > > Summarize the information from this thread (e.g. your ifcfgs and in what
> > > way mode 4 doesn't work).
> > >
> > > To get the bug solved quickly we'd better find a paying RHEL7 customer
> > > subscribing to it. But I'll try to push from my direction.
> > Stefano has been kind enough to open
> >
> >   Bug 1295423 - Unstable network link using bond mode = 4
> >   https://bugzilla.redhat.com/show_bug.cgi?id=1295423
> >
> > which we fail to reproduce in our own lab. I'd be pleased if anybody who
> > experiences it would add their networking config to the bug (if it is
> > different). Can you also lay out your switch's hardware and
> > configuration?
> Stefano, could you share your /proc/net/bonding/* files with us?
> I heard similar reports where the bond slaves had mismatching
> aggregator ids. Could it be your case as well?

Here:

[root@ovirt01 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 1
        Actor Key: 9
        Partner Key: 1
        Partner Mac Address: 00:00:00:00:00:00

Slave Interface: enp4s0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: **:**:**:**:**:f1
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 4
Partner Churned Count: 5
details actor lacp pdu:
    system priority: 65535
    port key: 9
    port priority: 255
    port number: 1
    port state: 69
details partner lacp pdu:
    system priority: 65535
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1

Slave Interface: enp5s0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: **:**:**:**:**:f2
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 2
details actor lacp pdu:
    system priority: 65535
    port key: 9
    port priority: 255
    port number: 2
    port state: 77
details partner lacp pdu:
    system priority: 65535
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1
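The "port state" values in the dump above (69 and 77 for the actors, 1 for the partner) are bitmasks defined by IEEE 802.3ad. As a rough reading aid, here is a small decoder of our own (the bit names follow the standard; the helper itself is not part of any tool mentioned in this thread):

```shell
# decode_port_state: print the flags set in an LACP "port state" byte.
# Bit values per IEEE 802.3ad: 1 activity, 2 timeout, 4 aggregation,
# 8 synchronization, 16 collecting, 32 distributing, 64 defaulted, 128 expired.
decode_port_state() {
    s=$1; out=""
    [ $((s & 1)) -ne 0 ]   && out="$out lacp_activity"
    [ $((s & 2)) -ne 0 ]   && out="$out lacp_timeout"
    [ $((s & 4)) -ne 0 ]   && out="$out aggregation"
    [ $((s & 8)) -ne 0 ]   && out="$out synchronization"
    [ $((s & 16)) -ne 0 ]  && out="$out collecting"
    [ $((s & 32)) -ne 0 ]  && out="$out distributing"
    [ $((s & 64)) -ne 0 ]  && out="$out defaulted"
    [ $((s & 128)) -ne 0 ] && out="$out expired"
    echo "$s:$out"
}

decode_port_state 69   # prints "69: lacp_activity aggregation defaulted"
decode_port_state 63   # a healthy port: collecting and distributing set
```

"defaulted" means the port gave up waiting for LACPDUs from the partner and fell back to defaults, which is consistent with the all-zero Partner Mac Address in the dump.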
Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
On Wed, Jan 06, 2016 at 08:45:16AM +0200, Dan Kenigsberg wrote:
> On Mon, Jan 04, 2016 at 01:54:37PM +0200, Dan Kenigsberg wrote:
> > On Mon, Jan 04, 2016 at 12:31:38PM +0100, Stefano Danzi wrote:
> > > I did some tests:
> > >
> > > kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I detach
> > > one network cable the network is stable)
> > > kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine
> >
> > Would you be kind enough to file a kernel bug in bugzilla.redhat.com?
> > Summarize the information from this thread (e.g. your ifcfgs and in what
> > way mode 4 doesn't work).
> >
> > To get the bug solved quickly we'd better find a paying RHEL7 customer
> > subscribing to it. But I'll try to push from my direction.
>
> Stefano has been kind enough to open
>
>   Bug 1295423 - Unstable network link using bond mode = 4
>   https://bugzilla.redhat.com/show_bug.cgi?id=1295423
>
> which we fail to reproduce in our own lab. I'd be pleased if anybody who
> experiences it would add their networking config to the bug (if it is
> different). Can you also lay out your switch's hardware and
> configuration?

Stefano, could you share your /proc/net/bonding/* files with us?

I heard similar reports where the bond slaves had mismatching
aggregator ids. Could it be your case as well?
Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
----- Original Message -----
> From: "Stefano Danzi"
> To: "Dan Kenigsberg"
> Cc: users@ovirt.org
> Sent: Thursday, January 7, 2016 3:53:11 PM
> Subject: Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
>
> On 07/01/2016 12.18, Stefano Danzi wrote:
> >
> > On 06/01/2016 7.45, Dan Kenigsberg wrote:
> > > On Mon, Jan 04, 2016 at 01:54:37PM +0200, Dan Kenigsberg wrote:
> > > > On Mon, Jan 04, 2016 at 12:31:38PM +0100, Stefano Danzi wrote:
> > > > > I did some tests:
> > > > >
> > > > > kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I
> > > > > detach one network cable the network is stable)
> > > > > kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine
> > > > Would you be kind enough to file a kernel bug in bugzilla.redhat.com?
> > > > Summarize the information from this thread (e.g. your ifcfgs and in
> > > > what way mode 4 doesn't work).
> > > >
> > > > To get the bug solved quickly we'd better find a paying RHEL7 customer
> > > > subscribing to it. But I'll try to push from my direction.
> > > Stefano has been kind enough to open
> > >
> > >   Bug 1295423 - Unstable network link using bond mode = 4
> > >   https://bugzilla.redhat.com/show_bug.cgi?id=1295423
> > >
> > > which we fail to reproduce in our own lab. I'd be pleased if anybody who
> > > experiences it would add their networking config to the bug (if it is
> > > different). Can you also lay out your switch's hardware and
> > > configuration?
> >
> > I made some tests using kernel 3.10.0-327.4.4.el7.x86_64.
> > I did a TCP dump on virtual interface "DMZ" (VLAN X on bond0).
> >
> > When I have two network cables connected I can see ARP requests but not
> > ARP replies. When I detach one network cable I can see ARP requests and
> > ARP replies (and networking on the VM works).
> >
> > Maybe the problem isn't in the bonding config but in qemu/kvm/vhost_net.
>
> How can I enable a debug log for the bond?

Hi Michael,

Maybe you can assist. How can debug logging be enabled for the bond?
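On the question of enabling a debug log for the bond: one option (an assumption on our side, not confirmed anywhere in this thread) is the kernel's dynamic debug facility, which can switch on the `pr_debug()` messages inside the bonding module on kernels built with CONFIG_DYNAMIC_DEBUG:

```shell
# Sketch: enable verbose bonding-driver messages via dynamic debug.
# Requires root and a mounted debugfs; prints an explanatory message
# instead of failing when those are not available.
CTRL=/sys/kernel/debug/dynamic_debug/control
if [ -w "$CTRL" ]; then
    echo 'module bonding +p' > "$CTRL"   # +p enables pr_debug() callsites
    msg="bonding debug enabled; watch 'dmesg' or 'journalctl -k'"
else
    msg="dynamic debug not writable at $CTRL (need root and debugfs mounted)"
fi
echo "$msg"
```

The messages then show up in the kernel log; `echo 'module bonding -p' > "$CTRL"` turns them off again.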
Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
On 06/01/2016 7.45, Dan Kenigsberg wrote:
> On Mon, Jan 04, 2016 at 01:54:37PM +0200, Dan Kenigsberg wrote:
> > On Mon, Jan 04, 2016 at 12:31:38PM +0100, Stefano Danzi wrote:
> > > I did some tests:
> > >
> > > kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I detach
> > > one network cable the network is stable)
> > > kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine
> > Would you be kind enough to file a kernel bug in bugzilla.redhat.com?
> > Summarize the information from this thread (e.g. your ifcfgs and in what
> > way mode 4 doesn't work).
> >
> > To get the bug solved quickly we'd better find a paying RHEL7 customer
> > subscribing to it. But I'll try to push from my direction.
> Stefano has been kind enough to open
>
>   Bug 1295423 - Unstable network link using bond mode = 4
>   https://bugzilla.redhat.com/show_bug.cgi?id=1295423
>
> which we fail to reproduce in our own lab. I'd be pleased if anybody who
> experiences it would add their networking config to the bug (if it is
> different). Can you also lay out your switch's hardware and
> configuration?

I made some tests using kernel 3.10.0-327.4.4.el7.x86_64.
I did a TCP dump on virtual interface "DMZ" (VLAN X on bond0).

When I have two network cables connected I can see ARP requests but not
ARP replies. When I detach one network cable I can see ARP requests and
ARP replies (and networking on the VM works).

Maybe the problem isn't in the bonding config but in qemu/kvm/vhost_net.
Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
On Mon, Jan 04, 2016 at 01:54:37PM +0200, Dan Kenigsberg wrote:
> On Mon, Jan 04, 2016 at 12:31:38PM +0100, Stefano Danzi wrote:
> > I did some tests:
> >
> > kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I detach
> > one network cable the network is stable)
> > kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine
>
> Would you be kind enough to file a kernel bug in bugzilla.redhat.com?
> Summarize the information from this thread (e.g. your ifcfgs and in what
> way mode 4 doesn't work).
>
> To get the bug solved quickly we'd better find a paying RHEL7 customer
> subscribing to it. But I'll try to push from my direction.

Stefano has been kind enough to open

  Bug 1295423 - Unstable network link using bond mode = 4
  https://bugzilla.redhat.com/show_bug.cgi?id=1295423

which we fail to reproduce in our own lab. I'd be pleased if anybody who
experiences it would add their networking config to the bug (if it is
different). Can you also lay out your switch's hardware and
configuration?
Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
On Mon, Jan 04, 2016 at 12:31:38PM +0100, Stefano Danzi wrote:
> I did some tests:
>
> kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I detach
> one network cable the network is stable)
> kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine

Would you be kind enough to file a kernel bug in bugzilla.redhat.com?
Summarize the information from this thread (e.g. your ifcfgs and in what
way mode 4 doesn't work).

To get the bug solved quickly we'd better find a paying RHEL7 customer
subscribing to it. But I'll try to push from my direction.

Regards,
Dan.
Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
I did some tests:

kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I detach
one network cable the network is stable)
kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine

On 31/12/2015 9.44, Dan Kenigsberg wrote:
> On Wed, Dec 30, 2015 at 09:39:12PM +0100, Stefano Danzi wrote:
> > Hi Dan,
> > some info about my network setup:
> >
> > - My bond is used only for VM networking. ovirtmgmt has a dedicated
> >   ethernet card.
> > - I haven't set any ethtool opts.
> [cut]
>
> I do not see anything suspicious here.
>
> Which kernel version worked well for you? Would it be possible to boot
> the machine with it, and retest bond mode 4, so that we can
> whole-heartedly place the blame on the kernel?
Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
Just to put in my experience: I installed a single system with a
self-hosted engine on 3.6.0 and CentOS 7.1, then updated to oVirt 3.6.1
and CentOS 7.2. I never had problems regarding bonding, neither in 3.6.0
nor in 3.6.1.

My current kernel is 3.10.0-327.3.1.el7.x86_64.
The server hw is a blade PowerEdge M910 with 4 Gigabit adapters:

[root@ractor ~]# lspci | grep igab
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet (rev 20)
01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet (rev 20)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet (rev 20)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet (rev 20)

They are connected to Cisco switches with ports configured as 802.3ad (I
have no details at hand for the Cisco model but I can verify).

And this is the situation for VM bonding, where I only customized mode=4
to specify lacp_rate=1 (default is slow):

- bridges

[root@ractor ~]# brctl show
bridge name     bridge id           STP enabled  interfaces
;vdsmdummy;     8000.               no
ovirtmgmt       8000.002564ff0bf4   no           bond1
                                                 vnet0
vlan65          8000.002564ff0bf0   no           bond0.65
                                                 vnet1
                                                 vnet2

- bond device for VM vlans

[root@ractor ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 2
        Actor Key: 9
        Partner Key: 8
        Partner Mac Address: 00:01:02:03:04:0c

Slave Interface: em1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:64:ff:0b:f0
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    port key: 9
    port priority: 255
    port number: 1
    port state: 63
details partner lacp pdu:
    system priority: 32768
    oper key: 8
    port priority: 32768
    port number: 137
    port state: 63

Slave Interface: em2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:64:ff:0b:f2
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    port key: 9
    port priority: 255
    port number: 2
    port state: 63
details partner lacp pdu:
    system priority: 32768
    oper key: 8
    port priority: 32768
    port number: 603
    port state: 63

- bond device for ovirtmgmt

[root@ractor ~]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 2
        Actor Key: 9
        Partner Key: 16
        Partner Mac Address: 00:01:02:03:04:0c

Slave Interface: em3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:64:ff:0b:f4
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    port key: 9
    port priority: 255
    port number: 1
    port state: 63
details partner lacp pdu:
    system priority: 32768
    oper key: 16
    port priority: 32768
    port number: 145
    port state: 63

Slave Interface: em4
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:64:ff:0b:f6
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    port key: 9
    port priority: 255
    port number: 2
    port state: 63
details partner lacp pdu:
    system priority: 32768
    oper key: 16
    port priority: 32768
    port number: 611
    port state: 63

No particular settings for single interfaces. This is what has been set
by the system for em1, em2, em3 and em4 alike (output shown only for em1):

[root@ractor ~]# ethtool -k em1
Features for em1:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large
Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
Hi,

I'm not near the server for a while, but the network is set up as 2x
Broadcom NICs with whatever driver works out of the box. Both are set as
slaves to bond0, which is in mode=4 with no explicit options.

The switch is a fairly basic TP-Link but works almost identically to a
Cisco. It has the 2 ports set up in a port channel.

Short and long: this config works with the earlier kernel, but not with
the kernel shipped in 7.2. The release notes for RHEL 7.2 suggest some
work on bonding has been done; I wonder if default options (LACP speed?)
have changed?

Jon

On 30 December 2015 09:44:02 GMT+00:00, Dan Kenigsberg wrote:
> On Tue, Dec 29, 2015 at 09:57:07PM +, Jon Archer wrote:
> > Hi Stefano,
> >
> > It's definitely not the switch; it seems to be the latest kernel
> > package (kernel-3.10.0-327.3.1.el7.x86_64) which stops bonding working
> > correctly. Reverting back to the previous kernel brings the network up
> > in 802.3ad mode (4).
> >
> > I know, from reading the release notes of 7.2, that there were some
> > changes to the bonding bits in the kernel, so I'm guessing maybe some
> > defaults have changed.
> >
> > I'll keep digging and post back as soon as I have something.
> >
> > Jon
> >
> > On 29/12/15 19:55, Stefano Danzi wrote:
> > > Hi! I didn't solve it yet. I'm still using mode 2 on the bond
> > > interface. What's your switch model and firmware version?
>
> Hi Jon and Stefano,
>
> We've been testing bond mode 4 with (an earlier)
> kernel-3.10.0-327.el7.x86_64 and experienced no such behaviour.
>
> However, to better identify the suspected kernel bug, could you provide
> more information regarding your network connectivity?
>
> What is the make of your NICs? Which driver do you use?
>
> Do you set special ethtool opts (LRO with bridge was broken in the 7.2.0
> kernel if I am not mistaken)?
>
> You have the ovirtmgmt bridge on top of your bond, right?
>
> Can you share your ifcfg*?

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
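If changed kernel defaults are the suspicion, one way to rule them out is to pin the bonding options explicitly instead of relying on whatever the module defaults to. An illustrative ifcfg fragment for a mode-4 bond (example values of ours, not taken from any host in this thread):

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0  (example values)
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
ONBOOT=yes
BOOTPROTO=none
# mode 4 with the knobs spelled out: MII polling, LACP rate, hash policy
BONDING_OPTS="mode=802.3ad miimon=100 lacp_rate=slow xmit_hash_policy=layer2"
```

With the options written out, a kernel whose built-in defaults changed will still bring the bond up the same way, which makes before/after comparisons across kernel versions meaningful.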
Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
On Wed, Dec 30, 2015 at 09:39:12PM +0100, Stefano Danzi wrote:
> Hi Dan,
> some info about my network setup:
>
> - My bond is used only for VM networking. ovirtmgmt has a dedicated
>   ethernet card.
> - I haven't set any ethtool opts.
> - NICs on bond specs:
>
> 04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
>         Subsystem: ASUSTeK Computer Inc. Motherboard
>         Flags: bus master, fast devsel, latency 0, IRQ 16
>         Memory at df20 (32-bit, non-prefetchable) [size=128K]
>         I/O ports at e000 [size=32]
>         Memory at df22 (32-bit, non-prefetchable) [size=16K]
>         Capabilities: [c8] Power Management version 2
>         Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
>         Capabilities: [e0] Express Endpoint, MSI 00
>         Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
>         Capabilities: [100] Advanced Error Reporting
>         Kernel driver in use: e1000e
>
> [root@ovirt01 ~]# ifconfig
> DMZ: flags=4163 mtu 1500
>         txqueuelen 0 (Ethernet)
>         RX packets 43546  bytes 2758816 (2.6 MiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 0  bytes 0 (0.0 B)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>
> LAN_HAW: flags=4163 mtu 1500
>         txqueuelen 0 (Ethernet)
>         RX packets 2090262  bytes 201078292 (191.7 MiB)
>         RX errors 0  dropped 86  overruns 0  frame 0
>         TX packets 0  bytes 0 (0.0 B)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>
> bond0: flags=5187 mtu 1500
>         txqueuelen 0 (Ethernet)
>         RX packets 2408059  bytes 456371629 (435.2 MiB)
>         RX errors 0  dropped 185  overruns 0  frame 0
>         TX packets 118966  bytes 14862549 (14.1 MiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>
> bond0.1: flags=4163 mtu 1500
>         txqueuelen 0 (Ethernet)
>         RX packets 2160985  bytes 210157656 (200.4 MiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 0  bytes 0 (0.0 B)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>
> bond0.3: flags=4163 mtu 1500
>         txqueuelen 0 (Ethernet)
>         RX packets 151195  bytes 185253584 (176.6 MiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 118663  bytes 13857950 (13.2 MiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>
> enp4s0: flags=6211 mtu 1500
>         txqueuelen 1000 (Ethernet)
>         RX packets 708141  bytes 95034564 (90.6 MiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 16714  bytes 5193108 (4.9 MiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>         device interrupt 16  memory 0xdf20-df22
>
> enp5s0: flags=6211 mtu 1500
>         txqueuelen 1000 (Ethernet)
>         RX packets 1699934  bytes 361339105 (344.5 MiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 102252  bytes 9669441 (9.2 MiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>         device interrupt 17  memory 0xdf10-df12
>
> enp6s1: flags=4163 mtu 1500
>         txqueuelen 1000 (Ethernet)
>         RX packets 2525232  bytes 362345893 (345.5 MiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 388452  bytes 208145492 (198.5 MiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>
> lo: flags=73 mtu 65536
>         inet 127.0.0.1  netmask 255.0.0.0
>         loop  txqueuelen 0 (Local Loopback)
>         RX packets 116465661  bytes 1515059255942 (1.3 TiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 116465661  bytes 1515059255942 (1.3 TiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>
> ovirtmgmt: flags=4163 mtu 1500
>         inet 192.168.1.50  netmask 255.255.255.0  broadcast 192.168.1.255
>         txqueuelen 0 (Ethernet)
>         RX packets 3784298  bytes 36509 (529.8 MiB)
>         RX errors 0  dropped 86  overruns 0  frame 0
>         TX packets 1737669  bytes 1401650369 (1.3 GiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>
> vnet0: flags=4163 mtu 1500
>         txqueuelen 500 (Ethernet)
>         RX packets 558574  bytes 107521742 (102.5 MiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 1316892  bytes 487764500 (465.1 MiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>
> vnet1: flags=4163 mtu 1500
>         txqueuelen 500 (Ethernet)
>         RX packets 42282  bytes 7373007 (7.0 MiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 40498  bytes 17598215 (16.7 MiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>
> vnet2: flags=4163 mtu 1500
>         txqueuelen 500 (Ethernet)
>         RX packets 79388  bytes 16807917 (16.0 MiB)
>         R
Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
Hi Dan,
some info about my network setup:

- My bond is used only for VM networking. ovirtmgmt has a dedicated
  ethernet card.
- I haven't set any ethtool opts.
- NICs on bond specs:

04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
        Subsystem: ASUSTeK Computer Inc. Motherboard
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at df20 (32-bit, non-prefetchable) [size=128K]
        I/O ports at e000 [size=32]
        Memory at df22 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [e0] Express Endpoint, MSI 00
        Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: e1000e

[root@ovirt01 ~]# ifconfig
DMZ: flags=4163 mtu 1500
        txqueuelen 0 (Ethernet)
        RX packets 43546  bytes 2758816 (2.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

LAN_HAW: flags=4163 mtu 1500
        txqueuelen 0 (Ethernet)
        RX packets 2090262  bytes 201078292 (191.7 MiB)
        RX errors 0  dropped 86  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

bond0: flags=5187 mtu 1500
        txqueuelen 0 (Ethernet)
        RX packets 2408059  bytes 456371629 (435.2 MiB)
        RX errors 0  dropped 185  overruns 0  frame 0
        TX packets 118966  bytes 14862549 (14.1 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

bond0.1: flags=4163 mtu 1500
        txqueuelen 0 (Ethernet)
        RX packets 2160985  bytes 210157656 (200.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

bond0.3: flags=4163 mtu 1500
        txqueuelen 0 (Ethernet)
        RX packets 151195  bytes 185253584 (176.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 118663  bytes 13857950 (13.2 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

enp4s0: flags=6211 mtu 1500
        txqueuelen 1000 (Ethernet)
        RX packets 708141  bytes 95034564 (90.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 16714  bytes 5193108 (4.9 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device interrupt 16  memory 0xdf20-df22

enp5s0: flags=6211 mtu 1500
        txqueuelen 1000 (Ethernet)
        RX packets 1699934  bytes 361339105 (344.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 102252  bytes 9669441 (9.2 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device interrupt 17  memory 0xdf10-df12

enp6s1: flags=4163 mtu 1500
        txqueuelen 1000 (Ethernet)
        RX packets 2525232  bytes 362345893 (345.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 388452  bytes 208145492 (198.5 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73 mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 0 (Local Loopback)
        RX packets 116465661  bytes 1515059255942 (1.3 TiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 116465661  bytes 1515059255942 (1.3 TiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

ovirtmgmt: flags=4163 mtu 1500
        inet 192.168.1.50  netmask 255.255.255.0  broadcast 192.168.1.255
        txqueuelen 0 (Ethernet)
        RX packets 3784298  bytes 36509 (529.8 MiB)
        RX errors 0  dropped 86  overruns 0  frame 0
        TX packets 1737669  bytes 1401650369 (1.3 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

vnet0: flags=4163 mtu 1500
        txqueuelen 500 (Ethernet)
        RX packets 558574  bytes 107521742 (102.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1316892  bytes 487764500 (465.1 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

vnet1: flags=4163 mtu 1500
        txqueuelen 500 (Ethernet)
        RX packets 42282  bytes 7373007 (7.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 40498  bytes 17598215 (16.7 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

vnet2: flags=4163 mtu 1500
        txqueuelen 500 (Ethernet)
        RX packets 79388  bytes 16807917 (16.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 164596  bytes 183858757 (175.3 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing
Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
On Tue, Dec 29, 2015 at 09:57:07PM +, Jon Archer wrote:
> Hi Stefano,
>
> It's definitely not the switch; it seems to be the latest kernel package
> (kernel-3.10.0-327.3.1.el7.x86_64) which stops bonding working correctly.
> Reverting back to the previous kernel brings the network up in 802.3ad
> mode (4).
>
> I know, from reading the release notes of 7.2, that there were some
> changes to the bonding bits in the kernel, so I'm guessing maybe some
> defaults have changed.
>
> I'll keep digging and post back as soon as I have something.
>
> Jon
>
> On 29/12/15 19:55, Stefano Danzi wrote:
> > Hi! I didn't solve it yet. I'm still using mode 2 on the bond
> > interface. What's your switch model and firmware version?

Hi Jon and Stefano,

We've been testing bond mode 4 with (an earlier)
kernel-3.10.0-327.el7.x86_64 and experienced no such behaviour.

However, to better identify the suspected kernel bug, could you provide
more information regarding your network connectivity?

What is the make of your NICs? Which driver do you use?

Do you set special ethtool opts (LRO with bridge was broken in the 7.2.0
kernel if I am not mistaken)?

You have the ovirtmgmt bridge on top of your bond, right?

Can you share your ifcfg*?
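Dan's LRO remark can be checked rather than guessed at. A small helper of our own (`check_offloads` is our name, not an ethtool feature) that scans `ethtool -k` output for offload settings known to interact badly with bridging:

```shell
# check_offloads: read `ethtool -k <nic>` output on stdin and warn when
# large-receive-offload is on; report a few other relevant features.
check_offloads() {
    grep -E 'large-receive-offload|generic-receive-offload|rx-checksumming' \
        | while read -r line; do
              case $line in
                  'large-receive-offload: on'*) echo "WARN: LRO is on: $line" ;;
                  *) echo "info: $line" ;;
              esac
          done
}
# Usage on a real host (requires ethtool):
#   ethtool -k em1 | check_offloads
```

If LRO turns out to be on, `ethtool -K <nic> lro off` disables it; whether that is relevant to this particular bond instability is only a hypothesis here.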
Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
Hi Stefano, It's definitely not the switch, it seems to be the latest kernel package (kernel-3.10.0-327.3.1.el7.x86_64) which stops bonding working correctly, reverting back to the previous kernel brings the network up in 802.3ad mode (4). I know, from reading the release notes of 7.2, that there were some changes to the bonding bits in the kernel so i'm guessing maybe some defaults have changed. I'll keep digging and post back as soon as i have something. Jon On 29/12/15 19:55, Stefano Danzi wrote: Hi! I didn't solve yet. I'm still using mode 2 on bond interface. What's your switch model and firmware version? Messaggio originale Da: Jon Archer Data: 29/12/2015 19:26 (GMT+01:00) A: users@ovirt.org Oggetto: Re: [ovirt-users] Network instability after upgrade 3.6.0 -> 3.6.1 Stefano, I am currently experiencing the same issue. 2x nic lacp config at switch, mode 4 bond at server with no connectivity. Interestingly I am able to ping the switch itself. I haven't had time to investigate thoroughly but my first thought is an update somewhere. Did you ever resolve and get back to mode=4? Jon On 17 December 2015 17:51:50 GMT+00:00, Stefano Danzi wrote: I partially solve the problem. My host machine has 2 network interfaces with a bond. The bond was configured with mode=4 (802.3ad) and switch was configured in the same way. If I remove one network cable the network become stable. With both cables attached the network is instable. I removed the link aggregation configuration from switch and change the bond in mode=2 (balance-xor). Now the network are stable. The strange thing is that previous configuration worked fine for one year... since the last upgrade. Now ha-agent don't reboot the hosted-engine anymore, but I receive two emails from brocker evere 2/5 minutes. 
First a mail with "ovirt-hosted-engine state transition StartState-ReinitializeFSM" and then one with "ovirt-hosted-engine state transition ReinitializeFSM-EngineStarting".

On 17/12/2015 10.51, Stefano Danzi wrote: Hello, I have one testing host (only one host) with a self-hosted engine and 2 VMs (one Linux and one Windows). After upgrading oVirt from 3.6.0 to 3.6.1 the network connection works discontinuously. Every 10 minutes the HA agent restarts the hosted engine VM because it appears down. But the machine is UP; only the network stops working for some minutes. I activated global maintenance mode to prevent engine reboots. If I ssh to the hosted engine, sometimes the connection works and sometimes it doesn't. Using a VNC connection to the engine I see that sometimes the VM reaches the external network and sometimes it doesn't. If I do a tcpdump on the physical ethernet interface I don't see any packets when the network on the VM doesn't work. The same thing happens for the other two VMs. Before the upgrade I never had network problems.

-- Sent from my Android device with K-9 Mail. Please excuse my brevity.
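[Editor's note] The mode=4 to mode=2 change Stefano describes is typically made in the bond's ifcfg file on EL7. A hypothetical fragment; the device and bridge names and option values are illustrative, not taken from his actual configuration:

```shell
# Hypothetical /etc/sysconfig/network-scripts/ifcfg-bond0 showing the
# change described above: mode=4 (802.3ad) -> mode=2 (balance-xor).
# Names and option values are examples only.
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
BRIDGE=ovirtmgmt
BONDING_OPTS="mode=balance-xor miimon=100"   # previously: mode=802.3ad miimon=100
```

Note that balance-xor needs no LACP support on the switch, which is why it masks a broken 802.3ad negotiation rather than fixing it.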
Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1 [SOLVED]
On Mon, Dec 28, 2015 at 4:06 PM, Yedidyah Bar David wrote: > On Mon, Dec 28, 2015 at 3:48 PM, Stefano Danzi wrote: > > Problem solved!!! > > > > The file hosted-engine.conf had a wrong fqdn. > > I don't think that this happened during the upgrade... maybe my colleague > > did something wrong... > > Thanks for the report :-) > > > > > > On 20/12/2015 14.52, Stefano Danzi wrote: > > > > The network problem was solved after changing the bond mode (which is strange; I > > have to investigate qemu-kvm, CentOS 7.2 and the switch firmware), but > > the broker problem still exists. If I turn on the host, the ha agent starts the engine > > VM. When the engine VM is up, the broker starts to send email. I don't have > > detailed logs here. > > > > Original message > > From: Yedidyah Bar David > > Date: 20/12/2015 11:20 (GMT+01:00) > > To: Stefano Danzi, Dan Kenigsberg > > Cc: users > > Subject: Re: [ovirt-users] Network instability after upgrade 3.6.0 -> 3.6.1 > > > > On Fri, Dec 18, 2015 at 5:31 PM, Stefano Danzi wrote: > >> I found this in vdsm.log and I think that could be the problem: > >> > >> Thread-3771::ERROR::2015-12-18 > >> 16:18:58,597::brokerlink::279::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(_communicate) > >> Connection closed: Connection closed > >> Thread-3771::ERROR::2015-12-18 16:18:58,597::API::1847::vds::(_getHaInfo) > >> failed to retrieve Hosted Engine HA info > >> Traceback (most recent call last): > >> File "/usr/share/vdsm/API.py", line 1827, in _getHaInfo > >> stats = instance.get_all_stats() > >> File > >> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", > >> line 103, in get_all_stats > >> self._configure_broker_conn(broker) > >> File > >> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", > >> line 180, in _configure_broker_conn > >> dom_type=dom_type) > >> File > >> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", > >> line 176, in 
set_storage_domain > >> .format(sd_type, options, e)) > >> RequestError: Failed to set storage domain FilesystemBackend, options > >> {'dom_type': 'nfs3', 'sd_uuid': '46f55a31-f35f-465c-b3e2-df45c05e06a7'}: > >> Connection closed > > > > My guess is that this is a consequence of your networking problems. > > > > Adding Dan. > > > >> > >> > >> On 17/12/2015 18.51, Stefano Danzi wrote: > >>> > >>> I partially solved the problem. > >>> > >>> My host machine has 2 network interfaces with a bond. The bond was > >>> configured with mode=4 (802.3ad) and the switch was configured the same > >>> way. > >>> If I remove one network cable the network becomes stable. With both cables > >>> attached the network is unstable. > >>> > >>> I removed the link aggregation configuration from the switch and changed the > >>> bond to mode=2 (balance-xor). Now the network is stable. > >>> The strange thing is that the previous configuration worked fine for one > >>> year... until the last upgrade. > >>> > >>> Now ha-agent doesn't reboot the hosted-engine anymore, but I receive two > >>> emails from the broker every 2-5 minutes. > >>> First a mail with "ovirt-hosted-engine state transition > >>> StartState-ReinitializeFSM" and then one with "ovirt-hosted-engine state > >>> transition > >>> ReinitializeFSM-EngineStarting" > >>> > >>> > >>> On 17/12/2015 10.51, Stefano Danzi wrote: > Hello, > I have one testing host (only one host) with a self-hosted engine and 2 VMs > (one Linux and one Windows). > > After upgrading oVirt from 3.6.0 to 3.6.1 the network connection works > discontinuously. > Every 10 minutes the HA agent restarts the hosted engine VM because it appears > down. > But the machine is UP; > only the network stops working for some minutes. > I activated global maintenance mode to prevent engine reboots. If I ssh > to > the hosted engine, sometimes > the connection works and sometimes it doesn't. Using a VNC connection to the engine > I > see that sometimes the VM reaches the external network > and sometimes it doesn't. 
> If I do a tcpdump on the physical ethernet interface I don't see any > packets > when the network on the VM doesn't work. > > The same thing happens for the other two VMs. > > Before the upgrade I never had network problems. > > -- > > Didi
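[Editor's note] The root cause reported above was a wrong fqdn in hosted-engine.conf. A sketch of how to spot this class of problem; the canned `sample` stands in for `/etc/ovirt-hosted-engine/hosted-engine.conf` on a real 3.6 host, and the hostname and second key are made-up examples:

```shell
# Sketch: extract the engine fqdn recorded in hosted-engine.conf.
# "sample" stands in for /etc/ovirt-hosted-engine/hosted-engine.conf;
# hostname and the second key are illustrative, not from this thread.
sample='fqdn=engine.example.local
vdsm_use_ssl=true'

fqdn=$(printf '%s\n' "$sample" | awk -F= '$1 == "fqdn" { print $2 }')
echo "configured engine fqdn: $fqdn"

# On a live host, also verify the recorded name actually resolves:
#   getent hosts "$fqdn" || echo "WARNING: $fqdn does not resolve"
```

Comparing this value against DNS (or /etc/hosts) would have flagged the misconfiguration before the HA broker started failing.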
Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
On Mon, Dec 28, 2015 at 3:48 PM, Stefano Danzi wrote: > Problem solved!!! > > The file hosted-engine.conf had a wrong fqdn. > I don't think that this happened during the upgrade... maybe my colleague > did something wrong... Thanks for the report :-) > > > On 20/12/2015 14.52, Stefano Danzi wrote: > > The network problem was solved after changing the bond mode (which is strange; I > have to investigate qemu-kvm, CentOS 7.2 and the switch firmware), but > the broker problem still exists. If I turn on the host, the ha agent starts the engine > VM. When the engine VM is up, the broker starts to send email. I don't have > detailed logs here. > > Original message > From: Yedidyah Bar David > Date: 20/12/2015 11:20 (GMT+01:00) > To: Stefano Danzi, Dan Kenigsberg > Cc: users > Subject: Re: [ovirt-users] Network instability after upgrade 3.6.0 -> 3.6.1 > > On Fri, Dec 18, 2015 at 5:31 PM, Stefano Danzi wrote: >> I found this in vdsm.log and I think that could be the problem: >> >> Thread-3771::ERROR::2015-12-18 >> 16:18:58,597::brokerlink::279::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(_communicate) >> Connection closed: Connection closed >> Thread-3771::ERROR::2015-12-18 16:18:58,597::API::1847::vds::(_getHaInfo) >> failed to retrieve Hosted Engine HA info >> Traceback (most recent call last): >> File "/usr/share/vdsm/API.py", line 1827, in _getHaInfo >> stats = instance.get_all_stats() >> File >> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", >> line 103, in get_all_stats >> self._configure_broker_conn(broker) >> File >> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", >> line 180, in _configure_broker_conn >> dom_type=dom_type) >> File >> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", >> line 176, in set_storage_domain >> .format(sd_type, options, e)) >> RequestError: Failed to set storage domain FilesystemBackend, options >> {'dom_type': 'nfs3', 'sd_uuid': 
'46f55a31-f35f-465c-b3e2-df45c05e06a7'}: >> Connection closed > > My guess is that this is a consequence of your networking problems. > > Adding Dan. > >> >> >> On 17/12/2015 18.51, Stefano Danzi wrote: >>> >>> I partially solved the problem. >>> >>> My host machine has 2 network interfaces with a bond. The bond was >>> configured with mode=4 (802.3ad) and the switch was configured the same >>> way. >>> If I remove one network cable the network becomes stable. With both cables >>> attached the network is unstable. >>> >>> I removed the link aggregation configuration from the switch and changed the >>> bond to mode=2 (balance-xor). Now the network is stable. >>> The strange thing is that the previous configuration worked fine for one >>> year... until the last upgrade. >>> >>> Now ha-agent doesn't reboot the hosted-engine anymore, but I receive two >>> emails from the broker every 2-5 minutes. >>> First a mail with "ovirt-hosted-engine state transition >>> StartState-ReinitializeFSM" and then one with "ovirt-hosted-engine state >>> transition >>> ReinitializeFSM-EngineStarting" >>> >>> >>> On 17/12/2015 10.51, Stefano Danzi wrote: Hello, I have one testing host (only one host) with a self-hosted engine and 2 VMs (one Linux and one Windows). After upgrading oVirt from 3.6.0 to 3.6.1 the network connection works discontinuously. Every 10 minutes the HA agent restarts the hosted engine VM because it appears down. But the machine is UP; only the network stops working for some minutes. I activated global maintenance mode to prevent engine reboots. If I ssh to the hosted engine, sometimes the connection works and sometimes it doesn't. Using a VNC connection to the engine I see that sometimes the VM reaches the external network and sometimes it doesn't. If I do a tcpdump on the physical ethernet interface I don't see any packets when the network on the VM doesn't work. The same thing happens for the other two VMs. Before the upgrade I never had network problems. 
-- Didi
Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1
Problem solved!!! The file hosted-engine.conf had a wrong fqdn. I don't think that this happened during the upgrade... maybe my colleague did something wrong...

On 20/12/2015 14.52, Stefano Danzi wrote: The network problem was solved after changing the bond mode (which is strange; I have to investigate qemu-kvm, CentOS 7.2 and the switch firmware), but the broker problem still exists. If I turn on the host, the ha agent starts the engine VM. When the engine VM is up, the broker starts to send email. I don't have detailed logs here.

Original message From: Yedidyah Bar David Date: 20/12/2015 11:20 (GMT+01:00) To: Stefano Danzi, Dan Kenigsberg Cc: users Subject: Re: [ovirt-users] Network instability after upgrade 3.6.0 -> 3.6.1

On Fri, Dec 18, 2015 at 5:31 PM, Stefano Danzi wrote: > I found this in vdsm.log and I think that could be the problem: > > Thread-3771::ERROR::2015-12-18 > 16:18:58,597::brokerlink::279::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(_communicate) > Connection closed: Connection closed > Thread-3771::ERROR::2015-12-18 16:18:58,597::API::1847::vds::(_getHaInfo) > failed to retrieve Hosted Engine HA info > Traceback (most recent call last): > File "/usr/share/vdsm/API.py", line 1827, in _getHaInfo > stats = instance.get_all_stats() > File > "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", > line 103, in get_all_stats > self._configure_broker_conn(broker) > File > "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", > line 180, in _configure_broker_conn > dom_type=dom_type) > File > "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", > line 176, in set_storage_domain > .format(sd_type, options, e)) > RequestError: Failed to set storage domain FilesystemBackend, options > {'dom_type': 'nfs3', 'sd_uuid': '46f55a31-f35f-465c-b3e2-df45c05e06a7'}: > Connection closed My guess is that this is a consequence of your networking problems. Adding Dan. 
> > > On 17/12/2015 18.51, Stefano Danzi wrote: >> >> I partially solved the problem. >> >> My host machine has 2 network interfaces with a bond. The bond was >> configured with mode=4 (802.3ad) and the switch was configured the same way. >> If I remove one network cable the network becomes stable. With both cables >> attached the network is unstable. >> >> I removed the link aggregation configuration from the switch and changed the >> bond to mode=2 (balance-xor). Now the network is stable. >> The strange thing is that the previous configuration worked fine for one >> year... until the last upgrade. >> >> Now ha-agent doesn't reboot the hosted-engine anymore, but I receive two >> emails from the broker every 2-5 minutes. >> First a mail with "ovirt-hosted-engine state transition >> StartState-ReinitializeFSM" and then one with "ovirt-hosted-engine state transition >> ReinitializeFSM-EngineStarting" >> >> >> On 17/12/2015 10.51, Stefano Danzi wrote: >>> >>> Hello, >>> I have one testing host (only one host) with a self-hosted engine and 2 VMs >>> (one Linux and one Windows). >>> >>> After upgrading oVirt from 3.6.0 to 3.6.1 the network connection works >>> discontinuously. >>> Every 10 minutes the HA agent restarts the hosted engine VM because it appears down. >>> But the machine is UP; >>> only the network stops working for some minutes. >>> I activated global maintenance mode to prevent engine reboots. If I ssh to >>> the hosted engine, sometimes >>> the connection works and sometimes it doesn't. Using a VNC connection to the engine I >>> see that sometimes the VM reaches the external network >>> and sometimes it doesn't. >>> If I do a tcpdump on the physical ethernet interface I don't see any packets >>> when the network on the VM doesn't work. >>> >>> The same thing happens for the other two VMs. >>> >>> Before the upgrade I never had network problems. 
-- Didi