Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

2016-02-04 Thread Dan Kenigsberg
On Thu, Feb 04, 2016 at 06:26:14PM +0100, Stefano Danzi wrote:
> 
> 
> >On 04/02/2016 16.55, Dan Kenigsberg wrote:
> >On Wed, Jan 06, 2016 at 08:45:16AM +0200, Dan Kenigsberg wrote:
> >>On Mon, Jan 04, 2016 at 01:54:37PM +0200, Dan Kenigsberg wrote:
> >>>On Mon, Jan 04, 2016 at 12:31:38PM +0100, Stefano Danzi wrote:
> I did some tests:
> 
> kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I detach
> one network cable the network is stable)
> kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine
> >>>Would you be kind to file a kernel bug in bugzilla.redhat.com?
> >>>Summarize the information from this thread (e.g. your ifcfgs and in what
> >>>way does mode 4 doesn't work).
> >>>
> >>>To get the bug solved quickly we'd better find paying RHEL7 customer
> >>>subscribing to it. But I'll try to push from my direction.
> >>Stefano has been kind to open
> >>
> >> Bug 1295423 - Unstable network link using bond mode = 4
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1295423
> >>
> >>which we fail to reproduce on our own lab. I'd be pleased if anybody who
> >>experiences it, and their networking config to the bug (if it is
> >>different). Can you also lay out your switch's hardware and
> >>configuration?
> >Stefano, could you share your /proc/net/bonding/* files with us?
> >I heard about similar reports were the bond slaves had mismatching
> >aggregator id. Could it be your case as well?
> >
> 
> Here:
> 
> [root@ovirt01 ~]# cat /proc/net/bonding/bond0
> Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
> 
> Bonding Mode: IEEE 802.3ad Dynamic link aggregation
> Transmit Hash Policy: layer2 (0)
> MII Status: up
> MII Polling Interval (ms): 100
> Up Delay (ms): 0
> Down Delay (ms): 0
> 
> 802.3ad info
> LACP rate: slow
> Min links: 0
> Aggregator selection policy (ad_select): stable
> Active Aggregator Info:
> Aggregator ID: 2
> Number of ports: 1
> Actor Key: 9
> Partner Key: 1
> Partner Mac Address: 00:00:00:00:00:00
> 
> Slave Interface: enp4s0
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 2
> Permanent HW addr: **:**:**:**:**:f1
> Slave queue ID: 0
> Aggregator ID: 1

---^^^


> Actor Churn State: churned
> Partner Churn State: churned
> Actor Churned Count: 4
> Partner Churned Count: 5
> details actor lacp pdu:
> system priority: 65535
> port key: 9
> port priority: 255
> port number: 1
> port state: 69
> details partner lacp pdu:
> system priority: 65535
> oper key: 1
> port priority: 255
> port number: 1
> port state: 1
> 
> Slave Interface: enp5s0
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 1
> Permanent HW addr: **:**:**:**:**:f2
> Slave queue ID: 0
> Aggregator ID: 2

---^^^


It sounds awfully familiar - mismatching aggregator IDs and an all-zero
partner MAC. Can you double-check that both your NICs are wired to the
same switch, and that the switch is properly configured to use LACP on
these two ports?
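
(For anyone checking their own host against this, a minimal sketch for pulling
out just the relevant 802.3ad fields; the bond name is taken from the output above:)

# On a healthy 802.3ad bond the Aggregator ID reported by every slave
# matches the active aggregator, and the Partner Mac Address is the
# switch's LACP system MAC, not 00:00:00:00:00:00.
grep -E 'Slave Interface|Aggregator ID|Partner Mac Address|Churn State' \
    /proc/net/bonding/bond0

In the output above, enp4s0 ended up in aggregator 1 while enp5s0 and the
active aggregator are 2, i.e. each port negotiated its own aggregator instead
of joining a single LAG - which matches a port-channel/LACP mismatch on the
switch side.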

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

2016-02-04 Thread Stefano Danzi



On 04/02/2016 16.55, Dan Kenigsberg wrote:

On Wed, Jan 06, 2016 at 08:45:16AM +0200, Dan Kenigsberg wrote:

On Mon, Jan 04, 2016 at 01:54:37PM +0200, Dan Kenigsberg wrote:

On Mon, Jan 04, 2016 at 12:31:38PM +0100, Stefano Danzi wrote:

I did some tests:

kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I detach
one network cable the network is stable)
kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine

Would you be kind to file a kernel bug in bugzilla.redhat.com?
Summarize the information from this thread (e.g. your ifcfgs and in what
way mode 4 doesn't work).

To get the bug solved quickly we'd better find a paying RHEL7 customer
subscribing to it. But I'll try to push from my direction.

Stefano has been kind to open

 Bug 1295423 - Unstable network link using bond mode = 4
 https://bugzilla.redhat.com/show_bug.cgi?id=1295423

which we fail to reproduce in our own lab. I'd be pleased if anybody who
experiences it would add their networking config to the bug (if it is
different). Can you also lay out your switch's hardware and
configuration?

Stefano, could you share your /proc/net/bonding/* files with us?
I heard about similar reports where the bond slaves had mismatching
aggregator IDs. Could it be your case as well?



Here:

[root@ovirt01 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 2
Number of ports: 1
Actor Key: 9
Partner Key: 1
Partner Mac Address: 00:00:00:00:00:00

Slave Interface: enp4s0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: **:**:**:**:**:f1
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 4
Partner Churned Count: 5
details actor lacp pdu:
system priority: 65535
port key: 9
port priority: 255
port number: 1
port state: 69
details partner lacp pdu:
system priority: 65535
oper key: 1
port priority: 255
port number: 1
port state: 1

Slave Interface: enp5s0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: **:**:**:**:**:f2
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 2
details actor lacp pdu:
system priority: 65535
port key: 9
port priority: 255
port number: 2
port state: 77
details partner lacp pdu:
system priority: 65535
oper key: 1
port priority: 255
port number: 1
port state: 1



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

2016-02-04 Thread Dan Kenigsberg
On Wed, Jan 06, 2016 at 08:45:16AM +0200, Dan Kenigsberg wrote:
> On Mon, Jan 04, 2016 at 01:54:37PM +0200, Dan Kenigsberg wrote:
> > On Mon, Jan 04, 2016 at 12:31:38PM +0100, Stefano Danzi wrote:
> > > I did some tests:
> > > 
> > > kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I detach
> > > one network cable the network is stable)
> > > kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine
> > 
> > Would you be kind to file a kernel bug in bugzilla.redhat.com?
> > Summarize the information from this thread (e.g. your ifcfgs and in what
> > way does mode 4 doesn't work).
> > 
> > To get the bug solved quickly we'd better find paying RHEL7 customer
> > subscribing to it. But I'll try to push from my direction.
> 
> Stefano has been kind to open
> 
> Bug 1295423 - Unstable network link using bond mode = 4
> https://bugzilla.redhat.com/show_bug.cgi?id=1295423
> 
> which we fail to reproduce on our own lab. I'd be pleased if anybody who
> experiences it, and their networking config to the bug (if it is
> different). Can you also lay out your switch's hardware and
> configuration?

Stefano, could you share your /proc/net/bonding/* files with us?
I heard about similar reports where the bond slaves had mismatching
aggregator IDs. Could it be your case as well?

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

2016-01-11 Thread Alona Kaplan
- Original Message -
> From: "Stefano Danzi" <s.da...@hawai.it>
> To: "Dan Kenigsberg" <dan...@redhat.com>
> Cc: users@ovirt.org
> Sent: Thursday, January 7, 2016 3:53:11 PM
> Subject: Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 
> 3.6.1
> 
> 
> 
> > On 07/01/2016 12.18, Stefano Danzi wrote:
> >
> >
> > > On 06/01/2016 7.45, Dan Kenigsberg wrote:
> >> On Mon, Jan 04, 2016 at 01:54:37PM +0200, Dan Kenigsberg wrote:
> >>> On Mon, Jan 04, 2016 at 12:31:38PM +0100, Stefano Danzi wrote:
> >>>> I did some tests:
> >>>>
> >>>> kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I
> >>>> detach
> >>>> one network cable the network is stable)
> >>>> kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine
> >>> Would you be kind to file a kernel bug in bugzilla.redhat.com?
> >>> Summarize the information from this thread (e.g. your ifcfgs and in
> >>> what
> >>> way does mode 4 doesn't work).
> >>>
> >>> To get the bug solved quickly we'd better find paying RHEL7 customer
> >>> subscribing to it. But I'll try to push from my direction.
> >> Stefano has been kind to open
> >>
> >>  Bug 1295423 - Unstable network link using bond mode = 4
> >>  https://bugzilla.redhat.com/show_bug.cgi?id=1295423
> >>
> >> which we fail to reproduce on our own lab. I'd be pleased if anybody who
> >> experiences it, and their networking config to the bug (if it is
> >> different). Can you also lay out your switch's hardware and
> >> configuration?
> >>
> >
> > I made some tests using kernel  3.10.0-327.4.4.el7.x86_64.
> > I did a TCP dump on virtual interface "DMZ" (VLAN X on bond0).
> >
> > When I have two network cables connected I can see ARP requests but not
> > ARP replies.
> > When I detach one network cable I can see ARP requests and ARP replies
> > (and networking on the VM works).
> >
> > Maybe the problem isn't in the bonding config but in qemu/kvm/vhost_net
> 
> How can I enable a debug log for the bond?

Hi Michael,

Maybe you can assist.
How can debug logging for the bond be enabled?
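
(For reference, one generic way to get debug output from the bonding driver on
an EL7 kernel - a minimal sketch, assuming the kernel was built with
CONFIG_DYNAMIC_DEBUG:)

# Enable all pr_debug()/dev_dbg() messages from the bonding module;
# they end up in dmesg / journalctl -k.
mount -t debugfs none /sys/kernel/debug 2>/dev/null   # usually already mounted
echo 'module bonding +p' > /sys/kernel/debug/dynamic_debug/control

# Turn the extra logging off again afterwards:
echo 'module bonding -p' > /sys/kernel/debug/dynamic_debug/control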

> 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
> 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

2016-01-07 Thread Stefano Danzi



On 06/01/2016 7.45, Dan Kenigsberg wrote:

On Mon, Jan 04, 2016 at 01:54:37PM +0200, Dan Kenigsberg wrote:

On Mon, Jan 04, 2016 at 12:31:38PM +0100, Stefano Danzi wrote:

I did some tests:

kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I detach
one network cable the network is stable)
kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine

Would you be kind to file a kernel bug in bugzilla.redhat.com?
Summarize the information from this thread (e.g. your ifcfgs and in what
way mode 4 doesn't work).

To get the bug solved quickly we'd better find a paying RHEL7 customer
subscribing to it. But I'll try to push from my direction.

Stefano has been kind to open

 Bug 1295423 - Unstable network link using bond mode = 4
 https://bugzilla.redhat.com/show_bug.cgi?id=1295423

which we fail to reproduce in our own lab. I'd be pleased if anybody who
experiences it would add their networking config to the bug (if it is
different). Can you also lay out your switch's hardware and
configuration?



I made some tests using kernel 3.10.0-327.4.4.el7.x86_64.
I did a tcpdump on the virtual interface "DMZ" (VLAN X on bond0).

When I have two network cables connected I can see ARP requests but not ARP
replies.
When I detach one network cable I can see ARP requests and ARP replies (and
networking on the VM works).

Maybe the problem isn't in the bonding config but in qemu/kvm/vhost_net.
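
(A minimal sketch of how the same capture can be repeated per physical slave,
so a missing ARP reply can be pinned to one leg of the bond; interface names
are assumed from the earlier messages in this thread:)

# Run these in separate terminals and compare where the replies stop.
# If replies arrive on a slave but never show up on bond0 (or on the
# VLAN/bridge above it), the frames are being lost between the two.
tcpdump -nni enp4s0 arp
tcpdump -nni enp5s0 arp
tcpdump -nni bond0  arp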


 


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

2016-01-05 Thread Dan Kenigsberg
On Mon, Jan 04, 2016 at 01:54:37PM +0200, Dan Kenigsberg wrote:
> On Mon, Jan 04, 2016 at 12:31:38PM +0100, Stefano Danzi wrote:
> > I did some tests:
> > 
> > kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I detach
> > one network cable the network is stable)
> > kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine
> 
> Would you be kind to file a kernel bug in bugzilla.redhat.com?
> Summarize the information from this thread (e.g. your ifcfgs and in what
> way mode 4 doesn't work).
> 
> To get the bug solved quickly we'd better find a paying RHEL7 customer
> subscribing to it. But I'll try to push from my direction.

Stefano has been kind to open

Bug 1295423 - Unstable network link using bond mode = 4
https://bugzilla.redhat.com/show_bug.cgi?id=1295423

which we fail to reproduce in our own lab. I'd be pleased if anybody who
experiences it would add their networking config to the bug (if it is
different). Can you also lay out your switch's hardware and
configuration?
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

2016-01-04 Thread Dan Kenigsberg
On Mon, Jan 04, 2016 at 12:31:38PM +0100, Stefano Danzi wrote:
> I did some tests:
> 
> kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I detach
> one network cable the network is stable)
> kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine

Would you be kind to file a kernel bug in bugzilla.redhat.com?
Summarize the information from this thread (e.g. your ifcfgs and in what
way mode 4 doesn't work).

To get the bug solved quickly we'd better find a paying RHEL7 customer
subscribing to it. But I'll try to push from my direction.

Regards,
Dan.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

2016-01-04 Thread Stefano Danzi

I did some tests:

kernel-3.10.0-327.3.1.el7.x86_64 -> bond mode 4 doesn't work (if I 
detach one network cable the network is stable)

kernel-3.10.0-229.20.1.el7.x86_64 -> bond mode 4 works fine
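
(For the record, a minimal sketch of how such a kernel-by-kernel retest can be
driven on CentOS 7, assuming the older kernel is still installed:)

# List the boot entries grub knows about, then pick the old kernel as default.
grubby --info=ALL | grep -E '^(index|title)'
grubby --set-default=/boot/vmlinuz-3.10.0-229.20.1.el7.x86_64
reboot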

On 31/12/2015 9.44, Dan Kenigsberg wrote:

On Wed, Dec 30, 2015 at 09:39:12PM +0100, Stefano Danzi wrote:

Hi Dan,
some info about my network setup:

- My bond is used only for VM networking. ovirtmgmt has a dedicated ethernet
card.
- I haven't set any ethtool opts.
[cut]

I do not see anything suspicious here.

Which kernel version worked well for you?

Would it be possible to boot the machine with it, and retest bond mode
4, so that we can whole-heartedly place the blame on the kernel?



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

2016-01-02 Thread Gianluca Cecchi
Just to add my experience.
I installed a single system with a self-hosted (SH) engine on 3.6.0 and CentOS 7.1,
then updated to oVirt 3.6.1 and CentOS 7.2.
I never had problems with bonding, either in 3.6.0 or in 3.6.1.

My current kernel is 3.10.0-327.3.1.el7.x86_64.
The server hardware is a blade PowerEdge M910 with 4 Gigabit adapters:

[root@ractor ~]# lspci | grep igab
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S
Gigabit Ethernet (rev 20)
01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S
Gigabit Ethernet (rev 20)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S
Gigabit Ethernet (rev 20)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S
Gigabit Ethernet (rev 20)

They are connected to Cisco switches with ports configured as 802.3ad (I
have no details at hand for the Cisco model but I can verify).

And this is the situation for VM bonding, where I only customized mode=4 to
specify lacp_rate=1 (default is slow)
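
(For comparison, a hypothetical sketch of where that customization typically
lives on CentOS 7 - the file name and values here are illustrative, not copied
from this host:)

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BONDING_OPTS="mode=4 lacp_rate=1 miimon=100"
ONBOOT=yes
BOOTPROTO=none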

- bridges
[root@ractor ~]# brctl show
bridge name bridge id STP enabled interfaces
;vdsmdummy; 8000. no
ovirtmgmt 8000.002564ff0bf4 no bond1
vnet0
vlan65 8000.002564ff0bf0 no bond0.65
vnet1
vnet2

- bond device for VMs vlans
[root@ractor ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 9
Partner Key: 8
Partner Mac Address: 00:01:02:03:04:0c

Slave Interface: em1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:64:ff:0b:f0
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
port key: 9
port priority: 255
port number: 1
port state: 63
details partner lacp pdu:
system priority: 32768
oper key: 8
port priority: 32768
port number: 137
port state: 63

Slave Interface: em2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:64:ff:0b:f2
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
port key: 9
port priority: 255
port number: 2
port state: 63
details partner lacp pdu:
system priority: 32768
oper key: 8
port priority: 32768
port number: 603
port state: 63


- bond device for ovirtmgmt
[root@ractor ~]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 9
Partner Key: 16
Partner Mac Address: 00:01:02:03:04:0c

Slave Interface: em3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:64:ff:0b:f4
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
port key: 9
port priority: 255
port number: 1
port state: 63
details partner lacp pdu:
system priority: 32768
oper key: 16
port priority: 32768
port number: 145
port state: 63

Slave Interface: em4
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:64:ff:0b:f6
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
port key: 9
port priority: 255
port number: 2
port state: 63
details partner lacp pdu:
system priority: 32768
oper key: 16
port priority: 32768
port number: 611
port state: 63


No particular settings for single interfaces. This is what has been set by
the system for both em1, em2, em3 and em4 (output shown only for em1):

[root@ractor ~]# ethtool -k em1
Features for em1:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on

Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

2016-01-02 Thread Jon Archer
Hi,

I'm not near the server for a while, but the network is set up as:

2x Broadcom NICs with whatever driver works out of the box.

Both set as slaves to bond0, which is in mode=4 with no explicit options.

The switch is a fairly basic TP-Link but works almost identically to a Cisco. It has
the 2 ports set up in a port channel.

Long and short: this config works with the earlier kernel, but not with the kernel
shipped in 7.2.

The release notes for RHEL 7.2 suggest some work on bonding has been done; I wonder if
the default options (LACP rate?) have changed.
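
(One quick way to see what the running bond actually negotiated, and whether the
defaults differ between the two kernels - a minimal sketch, assuming the bond is
named bond0:)

# Values the bonding driver is currently using; compare them after booting
# each kernel.
cat /sys/class/net/bond0/bonding/mode
cat /sys/class/net/bond0/bonding/lacp_rate
cat /sys/class/net/bond0/bonding/ad_select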

Jon

On 30 December 2015 09:44:02 GMT+00:00, Dan Kenigsberg  
wrote:
>On Tue, Dec 29, 2015 at 09:57:07PM +, Jon Archer wrote:
>> Hi Stefano,
>> 
>> It's definitely not the switch, it seems to be the latest kernel
>package
>> (kernel-3.10.0-327.3.1.el7.x86_64) which stops bonding working
>correctly,
>> reverting back to the previous kernel brings the network up in
>802.3ad mode
>> (4).
>> 
>> I know, from reading the release notes of 7.2, that there were some
>changes
>> to the bonding bits in the kernel so i'm guessing maybe some defaults
>have
>> changed.
>> 
>> I'll keep digging and post back as soon as i have something.
>> 
>> Jon
>> 
>> On 29/12/15 19:55, Stefano Danzi wrote:
> >Hi! I haven't solved it yet. I'm still using mode 2 on the bond interface.
> >What's your switch model and firmware version?
>
>Hi Jon and Stefano,
>
>We've been testing bond mode 4 with (an earlier)
>kernel-3.10.0-327.el7.x86_64 and experienced no such behaviour.
>
>However, to better identify the suspected kernel bug, could you provide
>more information regarding your network connectivity?
>
>What is the make of your NICs? Which driver do you use?
>
>Do you set special ethtool opts (LRO with bridge was broken in 7.2.0
>kernel if I am not mistaken)?
>
>You have the ovirtmgmt bridge on top of your bond, right?
>
>Can you share your ifcfg*?

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

2015-12-31 Thread Dan Kenigsberg
On Wed, Dec 30, 2015 at 09:39:12PM +0100, Stefano Danzi wrote:
> Hi Dan,
> some info about my network setup:
> 
> - My bond is used only for VM networking. ovirtmgmt has a dedicated ethernet
> card.
> - I haven't set any ethtool opts.
> - Nics on bond specs:
> 04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network
> Connection
> Subsystem: ASUSTeK Computer Inc. Motherboard
> Flags: bus master, fast devsel, latency 0, IRQ 16
> Memory at df20 (32-bit, non-prefetchable) [size=128K]
> I/O ports at e000 [size=32]
> Memory at df22 (32-bit, non-prefetchable) [size=16K]
> Capabilities: [c8] Power Management version 2
> Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
> Capabilities: [e0] Express Endpoint, MSI 00
> Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
> Capabilities: [100] Advanced Error Reporting
> Kernel driver in use: e1000e
> 
> [root@ovirt01 ~]# ifconfig
> DMZ: flags=4163  mtu 1500
> txqueuelen 0  (Ethernet)
> RX packets 43546  bytes 2758816 (2.6 MiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 0  bytes 0 (0.0 B)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
> LAN_HAW: flags=4163  mtu 1500
> txqueuelen 0  (Ethernet)
> RX packets 2090262  bytes 201078292 (191.7 MiB)
> RX errors 0  dropped 86  overruns 0  frame 0
> TX packets 0  bytes 0 (0.0 B)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
> bond0: flags=5187  mtu 1500
> txqueuelen 0  (Ethernet)
> RX packets 2408059  bytes 456371629 (435.2 MiB)
> RX errors 0  dropped 185  overruns 0  frame 0
> TX packets 118966  bytes 14862549 (14.1 MiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
> bond0.1: flags=4163  mtu 1500
> txqueuelen 0  (Ethernet)
> RX packets 2160985  bytes 210157656 (200.4 MiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 0  bytes 0 (0.0 B)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
> bond0.3: flags=4163  mtu 1500
> txqueuelen 0  (Ethernet)
> RX packets 151195  bytes 185253584 (176.6 MiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 118663  bytes 13857950 (13.2 MiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
> enp4s0: flags=6211  mtu 1500
> txqueuelen 1000  (Ethernet)
> RX packets 708141  bytes 95034564 (90.6 MiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 16714  bytes 5193108 (4.9 MiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> device interrupt 16  memory 0xdf20-df22
> 
> enp5s0: flags=6211  mtu 1500
> txqueuelen 1000  (Ethernet)
> RX packets 1699934  bytes 361339105 (344.5 MiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 102252  bytes 9669441 (9.2 MiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> device interrupt 17  memory 0xdf10-df12
> 
> enp6s1: flags=4163  mtu 1500
> txqueuelen 1000  (Ethernet)
> RX packets 2525232  bytes 362345893 (345.5 MiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 388452  bytes 208145492 (198.5 MiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
> lo: flags=73  mtu 65536
> inet 127.0.0.1  netmask 255.0.0.0
> loop  txqueuelen 0  (Local Loopback)
> RX packets 116465661  bytes 1515059255942 (1.3 TiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 116465661  bytes 1515059255942 (1.3 TiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
> ovirtmgmt: flags=4163  mtu 1500
> inet 192.168.1.50  netmask 255.255.255.0  broadcast 192.168.1.255
> txqueuelen 0  (Ethernet)
> RX packets 3784298  bytes 36509 (529.8 MiB)
> RX errors 0  dropped 86  overruns 0  frame 0
> TX packets 1737669  bytes 1401650369 (1.3 GiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
> vnet0: flags=4163  mtu 1500
> txqueuelen 500  (Ethernet)
> RX packets 558574  bytes 107521742 (102.5 MiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 1316892  bytes 487764500 (465.1 MiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
> vnet1: flags=4163  mtu 1500
>

Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

2015-12-30 Thread Stefano Danzi

Hi Dan,
some info about my network setup:

- My bond is used only for VM networking. ovirtmgmt has a dedicated 
ethernet card.

- I haven't set any ethtool opts.
- Nics on bond specs:
04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network 
Connection

Subsystem: ASUSTeK Computer Inc. Motherboard
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at df20 (32-bit, non-prefetchable) [size=128K]
I/O ports at e000 [size=32]
Memory at df22 (32-bit, non-prefetchable) [size=16K]
Capabilities: [c8] Power Management version 2
Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [e0] Express Endpoint, MSI 00
Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: e1000e

[root@ovirt01 ~]# ifconfig
DMZ: flags=4163  mtu 1500
txqueuelen 0  (Ethernet)
RX packets 43546  bytes 2758816 (2.6 MiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 0  bytes 0 (0.0 B)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

LAN_HAW: flags=4163  mtu 1500
txqueuelen 0  (Ethernet)
RX packets 2090262  bytes 201078292 (191.7 MiB)
RX errors 0  dropped 86  overruns 0  frame 0
TX packets 0  bytes 0 (0.0 B)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

bond0: flags=5187  mtu 1500
txqueuelen 0  (Ethernet)
RX packets 2408059  bytes 456371629 (435.2 MiB)
RX errors 0  dropped 185  overruns 0  frame 0
TX packets 118966  bytes 14862549 (14.1 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

bond0.1: flags=4163  mtu 1500
txqueuelen 0  (Ethernet)
RX packets 2160985  bytes 210157656 (200.4 MiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 0  bytes 0 (0.0 B)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

bond0.3: flags=4163  mtu 1500
txqueuelen 0  (Ethernet)
RX packets 151195  bytes 185253584 (176.6 MiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 118663  bytes 13857950 (13.2 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp4s0: flags=6211  mtu 1500
txqueuelen 1000  (Ethernet)
RX packets 708141  bytes 95034564 (90.6 MiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 16714  bytes 5193108 (4.9 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
device interrupt 16  memory 0xdf20-df22

enp5s0: flags=6211  mtu 1500
txqueuelen 1000  (Ethernet)
RX packets 1699934  bytes 361339105 (344.5 MiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 102252  bytes 9669441 (9.2 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
device interrupt 17  memory 0xdf10-df12

enp6s1: flags=4163  mtu 1500
txqueuelen 1000  (Ethernet)
RX packets 2525232  bytes 362345893 (345.5 MiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 388452  bytes 208145492 (198.5 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73  mtu 65536
inet 127.0.0.1  netmask 255.0.0.0
loop  txqueuelen 0  (Local Loopback)
RX packets 116465661  bytes 1515059255942 (1.3 TiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 116465661  bytes 1515059255942 (1.3 TiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ovirtmgmt: flags=4163  mtu 1500
inet 192.168.1.50  netmask 255.255.255.0  broadcast 
192.168.1.255

txqueuelen 0  (Ethernet)
RX packets 3784298  bytes 36509 (529.8 MiB)
RX errors 0  dropped 86  overruns 0  frame 0
TX packets 1737669  bytes 1401650369 (1.3 GiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vnet0: flags=4163  mtu 1500
txqueuelen 500  (Ethernet)
RX packets 558574  bytes 107521742 (102.5 MiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 1316892  bytes 487764500 (465.1 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vnet1: flags=4163  mtu 1500
txqueuelen 500  (Ethernet)
RX packets 42282  bytes 7373007 (7.0 MiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 40498  bytes 17598215 (16.7 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vnet2: 

Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

2015-12-30 Thread Dan Kenigsberg
On Tue, Dec 29, 2015 at 09:57:07PM +, Jon Archer wrote:
> Hi Stefano,
> 
> It's definitely not the switch, it seems to be the latest kernel package
> (kernel-3.10.0-327.3.1.el7.x86_64) which stops bonding working correctly,
> reverting back to the previous kernel brings the network up in 802.3ad mode
> (4).
> 
> I know, from reading the release notes of 7.2, that there were some changes
> to the bonding bits in the kernel so i'm guessing maybe some defaults have
> changed.
> 
> I'll keep digging and post back as soon as i have something.
> 
> Jon
> 
> On 29/12/15 19:55, Stefano Danzi wrote:
> >Hi! I haven't solved it yet. I'm still using mode 2 on the bond interface. What's
> >your switch model and firmware version?

Hi Jon and Stefano,

We've been testing bond mode 4 with (an earlier)
kernel-3.10.0-327.el7.x86_64 and experienced no such behaviour.

However, to better identify the suspected kernel bug, could you provide
more information regarding your network connectivity?

What is the make of your NICs? Which driver do you use?

Do you set special ethtool opts (LRO with a bridge was broken in the 7.2.0
kernel if I am not mistaken)?
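
(A minimal sketch of the commands that answer the driver and LRO questions;
"em1" is just a placeholder slave name here:)

# Driver, version and firmware of a slave NIC:
ethtool -i em1
# Offload settings; large-receive-offload is the LRO flag in question:
ethtool -k em1 | grep -E 'large-receive-offload|generic-receive-offload'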

You have the ovirtmgmt bridge on top of your bond, right?

Can you share your ifcfg*?
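
(For reference, a hypothetical sketch of what the relevant ifcfg files look like
for a bridge-on-bond setup on EL7 - device names and values are illustrative only:)

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BONDING_OPTS="mode=4 miimon=100"
BRIDGE=ovirtmgmt
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
DEVICE=ovirtmgmt
TYPE=Bridge
BOOTPROTO=none
ONBOOT=yes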

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

2015-12-29 Thread Jon Archer

Hi Stefano,

It's definitely not the switch; it seems to be the latest kernel package
(kernel-3.10.0-327.3.1.el7.x86_64) which stops bonding from working
correctly. Reverting back to the previous kernel brings the network up
in 802.3ad mode (4).


I know, from reading the release notes of 7.2, that there were some
changes to the bonding bits in the kernel, so I'm guessing maybe some
defaults have changed.


I'll keep digging and post back as soon as I have something.

Jon

On 29/12/15 19:55, Stefano Danzi wrote:
Hi! I haven't solved it yet. I'm still using mode 2 on the bond interface.
What's your switch model and firmware version?


 Original message 
From: Jon Archer 
Date: 29/12/2015 19:26 (GMT+01:00)
To: users@ovirt.org
Subject: Re: [ovirt-users] Network instability after upgrade 3.6.0 ->
3.6.1


Stefano,

I am currently experiencing the same issue. 2x nic lacp config at 
switch, mode 4 bond at server with no connectivity. Interestingly I am 
able to ping the switch itself.


I haven't had time to investigate thoroughly but my first thought is 
an update somewhere.


Did you ever resolve and get back to mode=4?

Jon

On 17 December 2015 17:51:50 GMT+00:00, Stefano Danzi wrote:


I partially solved the problem.

My host machine has 2 network interfaces in a bond. The bond was
configured with mode=4 (802.3ad) and the switch was configured the same way.
If I remove one network cable the network becomes stable. With both
cables attached the network is unstable.

I removed the link aggregation configuration from the switch and changed the
bond to mode=2 (balance-xor). Now the network is stable.
The strange thing is that the previous configuration worked fine for a
year... until the last upgrade.

Now the ha-agent doesn't reboot the hosted-engine anymore, but I receive two
emails from the broker every 2-5 minutes:
first a mail with "ovirt-hosted-engine state transition
StartState-ReinitializeFSM" and then "ovirt-hosted-engine state
transition ReinitializeFSM-EngineStarting".


On 17/12/2015 10.51, Stefano Danzi wrote:

Hello, I have one testing host (only one host) with a self-hosted
engine and 2 VMs (one Linux and one Windows). After upgrading
oVirt from 3.6.0 to 3.6.1 the network connection works
discontinuously. Every 10 minutes the HA agent restarts the hosted
engine VM because it appears down. But the machine is UP; only the
network stops working for some minutes. I activated global
maintenance mode to prevent engine reboots. If I ssh to the
hosted engine, sometimes the connection works and sometimes it
doesn't. Using a VNC connection to the engine I see that sometimes
the VM reaches the external network and sometimes it doesn't. If I do
a tcpdump on the physical ethernet interface I don't see any packets
when the network on the VM doesn't work. The same thing happens for
the other two VMs. Before the upgrade I never had network problems.

Users mailing list Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users





Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

-- Sent from my Android device with K-9 Mail. Please excuse my brevity. 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

2015-12-28 Thread Yedidyah Bar David
On Mon, Dec 28, 2015 at 3:48 PM, Stefano Danzi  wrote:
> Problem solved!!!
>
> The file hosted-engine.conf had a wrong fqdn.
> I don't think that this happened during the upgrade... maybe my colleague
> did something wrong...

Thanks for the report :-)

>
>
> On 20/12/2015 14.52, Stefano Danzi wrote:
>
> The network problems were solved after changing the bond mode (and it's strange; I
> have to investigate around qemu-kvm, CentOS 7.2 and the switch firmware), but the
> broker problem still exists.  If I turn on the host, the ha agent starts the engine
> VM. When the engine VM is up, the broker starts to send email.  I don't have
> detailed logs here right now.
>
>
>  Original message 
> From: Yedidyah Bar David 
> Date: 20/12/2015 11:20 (GMT+01:00)
> To: Stefano Danzi , Dan Kenigsberg 
> Cc: users 
> Subject: Re: [ovirt-users] Network instability after upgrade 3.6.0 -> 3.6.1
>
> On Fri, Dec 18, 2015 at 5:31 PM, Stefano Danzi  wrote:
>> I found this in vdsm.log and I think that could be the problem:
>>
>> Thread-3771::ERROR::2015-12-18
>>
>> 16:18:58,597::brokerlink::279::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(_communicate)
>> Connection closed: Connection closed
>> Thread-3771::ERROR::2015-12-18 16:18:58,597::API::1847::vds::(_getHaInfo)
>> failed to retrieve Hosted Engine HA info
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/API.py", line 1827, in _getHaInfo
>> stats = instance.get_all_stats()
>>   File
>>
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
>> line 103, in get_all_stats
>> self._configure_broker_conn(broker)
>>   File
>>
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
>> line 180, in _configure_broker_conn
>> dom_type=dom_type)
>>   File
>>
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>> line 176, in set_storage_domain
>> .format(sd_type, options, e))
>> RequestError: Failed to set storage domain FilesystemBackend, options
>> {'dom_type': 'nfs3', 'sd_uuid': '46f55a31-f35f-465c-b3e2-df45c05e06a7'}:
>> Connection closed
>
> My guess is that this is a consequence of your networking problems.
>
> Adding Dan.
>
>>
>>
>> On 17/12/2015 18.51, Stefano Danzi wrote:
>>>
>>> I partially solve the problem.
>>>
>>> My host machine has 2 network interfaces with a bond. The bond was
>>> configured with  mode=4 (802.3ad) and switch was configured in the same
>>> way.
>>> If I remove one network cable the network become stable. With both cables
>>> attached the network is instable.
>>>
>>> I removed the link aggregation configuration from switch and change the
>>> bond in mode=2 (balance-xor). Now the network are stable.
>>> The strange thing is that previous configuration worked fine for one
>>> year... since the last upgrade.
>>>
>>> Now ha-agent don't reboot the hosted-engine anymore, but I receive two
>>> emails from brocker evere 2/5 minutes.
>>> First a mail with "ovirt-hosted-engine state transition
>>> StartState-ReinitializeFSM" and after "ovirt-hosted-engine state
>>> transition
>>> ReinitializeFSM-EngineStarting"
>>>
>>>
>>> On 17/12/2015 10.51, Stefano Danzi wrote:

 Hello,
 I have one testing host (only one host) with self hosted engine and 2 VM
 (one linux and one windows).

 After upgrade ovirt from 3.6.0 to 3.6.1 the network connection works
 discontinuously.
 Every 10 minutes HA agent restart hosted engine VM because result down.
 But the machine is UP,
 only the network stop to work for some minutes.
 I activate global maintenace mode to prevent engine reboot. If I ssh to
 the hosted engine sometimes
 the connection work and sometimes no.  Using VNC connection to engine I
 see that sometime VM reach external network
 and sometimes no.
 If I do a tcpdump on phisical ethernet interface I don't see any packet
 when network on vm don't work.

 Same thing happens fo others two VM.

 Before the upgrade I never had network problems.
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

>>>
>>> ___
>>> Users mailing list
>>> Users@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>
>
>
> --
> Didi
>
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>



-- 
Didi
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

2015-12-28 Thread Stefano Danzi

Problem solved!!!

The file hosted-engine.conf had a wrong fqdn.
I don't think that this happened during the upgrade... maybe my
colleague did something wrong...
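
(For anyone checking the same thing - a minimal sketch, assuming the default
path and key name used by the hosted-engine setup:)

# The engine FQDN the HA services use; it must resolve to the engine VM.
grep '^fqdn' /etc/ovirt-hosted-engine/hosted-engine.conf

# Quick sanity check that the name resolves from the host:
getent hosts "$(grep '^fqdn' /etc/ovirt-hosted-engine/hosted-engine.conf | cut -d= -f2)"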


On 20/12/2015 14.52, Stefano Danzi wrote:
The network problems were solved after changing the bond mode (and it's
strange; I have to investigate around qemu-kvm, CentOS 7.2 and the switch
firmware), but the broker problem still exists.  If I turn on the host, the
ha agent starts the engine VM. When the engine VM is up, the broker starts
to send email.  I don't have detailed logs here right now.



 Original message 
From: Yedidyah Bar David 
Date: 20/12/2015 11:20 (GMT+01:00)
To: Stefano Danzi , Dan Kenigsberg 
Cc: users 
Subject: Re: [ovirt-users] Network instability after upgrade 3.6.0 ->
3.6.1


On Fri, Dec 18, 2015 at 5:31 PM, Stefano Danzi  wrote:
> I found this in vdsm.log and I think that could be the problem:
>
> Thread-3771::ERROR::2015-12-18
> 
16:18:58,597::brokerlink::279::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(_communicate)

> Connection closed: Connection closed
> Thread-3771::ERROR::2015-12-18 
16:18:58,597::API::1847::vds::(_getHaInfo)

> failed to retrieve Hosted Engine HA info
> Traceback (most recent call last):
>   File "/usr/share/vdsm/API.py", line 1827, in _getHaInfo
> stats = instance.get_all_stats()
>   File
> 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",

> line 103, in get_all_stats
> self._configure_broker_conn(broker)
>   File
> 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",

> line 180, in _configure_broker_conn
> dom_type=dom_type)
>   File
> 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",

> line 176, in set_storage_domain
> .format(sd_type, options, e))
> RequestError: Failed to set storage domain FilesystemBackend, options
> {'dom_type': 'nfs3', 'sd_uuid': '46f55a31-f35f-465c-b3e2-df45c05e06a7'}:
> Connection closed

My guess is that this is a consequence of your networking problems.

Adding Dan.

>
>
>> On 17/12/2015 18.51, Stefano Danzi wrote:
>>
>> I partially solve the problem.
>>
>> My host machine has 2 network interfaces with a bond. The bond was
>> configured with  mode=4 (802.3ad) and switch was configured in the 
same way.
>> If I remove one network cable the network become stable. With both 
cables

>> attached the network is instable.
>>
>> I removed the link aggregation configuration from switch and change the
>> bond in mode=2 (balance-xor). Now the network are stable.
>> The strange thing is that previous configuration worked fine for one
>> year... since the last upgrade.
>>
>> Now ha-agent don't reboot the hosted-engine anymore, but I receive two
>> emails from brocker evere 2/5 minutes.
>> First a mail with "ovirt-hosted-engine state transition
>> StartState-ReinitializeFSM" and after "ovirt-hosted-engine state 
transition

>> ReinitializeFSM-EngineStarting"
>>
>>
>> On 17/12/2015 10.51, Stefano Danzi wrote:
>>>
>>> Hello,
>>> I have one testing host (only one host) with self hosted engine 
and 2 VM

>>> (one linux and one windows).
>>>
>>> After upgrade ovirt from 3.6.0 to 3.6.1 the network connection works
>>> discontinuously.
>>> Every 10 minutes HA agent restart hosted engine VM because result 
down.

>>> But the machine is UP,
>>> only the network stop to work for some minutes.
>>> I activate global maintenace mode to prevent engine reboot. If I 
ssh to

>>> the hosted engine sometimes
>>> the connection work and sometimes no.  Using VNC connection to 
engine I

>>> see that sometime VM reach external network
>>> and sometimes no.
>>> If I do a tcpdump on phisical ethernet interface I don't see any 
packet

>>> when network on vm don't work.
>>>
>>> Same thing happens fo others two VM.
>>>
>>> Before the upgrade I never had network problems.
>>> ___
>>> Users mailing list
>>> Users@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>>
>>
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users



--
Didi


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1 [SOLVED]

2015-12-28 Thread Roy Golan
On Mon, Dec 28, 2015 at 4:06 PM, Yedidyah Bar David  wrote:

> On Mon, Dec 28, 2015 at 3:48 PM, Stefano Danzi  wrote:
> > Problem solved!!!
> >
> > The file hosted-engine.conf had a wrong fqdn.
> > I don't think that this happened during upgrade... mybe thay my colleague
> > did something of wrong...
>
> Thanks for the report :-)
>
> >
> >
> > On 20/12/2015 14.52, Stefano Danzi wrote:
> >
> > Network problems was solved after changing Bond mode  (and it's strange.
> I
> > have to investigate around qemu-kvm, cento 7.2 and switch firmware ), but
> > broker problem still exist.  If I turn on the host, ha agent start engine
> > vm. When engine VM is up, broker strats to send email.  Now I haven't
> here
> > detailed logs.
> >
> >
> >  Original message 
> > From: Yedidyah Bar David 
> > Date: 20/12/2015 11:20 (GMT+01:00)
> > To: Stefano Danzi , Dan Kenigsberg 
> > Cc: users 
> > Subject: Re: [ovirt-users] Network instability after upgrade 3.6.0 ->
> 3.6.1
> >
> > On Fri, Dec 18, 2015 at 5:31 PM, Stefano Danzi  wrote:
> >> I found this in vdsm.log and I think that could be the problem:
> >>
> >> Thread-3771::ERROR::2015-12-18
> >>
> >>
> 16:18:58,597::brokerlink::279::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(_communicate)
> >> Connection closed: Connection closed
> >> Thread-3771::ERROR::2015-12-18
> 16:18:58,597::API::1847::vds::(_getHaInfo)
> >> failed to retrieve Hosted Engine HA info
> >> Traceback (most recent call last):
> >>   File "/usr/share/vdsm/API.py", line 1827, in _getHaInfo
> >> stats = instance.get_all_stats()
> >>   File
> >>
> >>
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
> >> line 103, in get_all_stats
> >> self._configure_broker_conn(broker)
> >>   File
> >>
> >>
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
> >> line 180, in _configure_broker_conn
> >> dom_type=dom_type)
> >>   File
> >>
> >>
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
> >> line 176, in set_storage_domain
> >> .format(sd_type, options, e))
> >> RequestError: Failed to set storage domain FilesystemBackend, options
> >> {'dom_type': 'nfs3', 'sd_uuid': '46f55a31-f35f-465c-b3e2-df45c05e06a7'}:
> >> Connection closed
> >
> > My guess is that this is a consequence of your networking problems.
> >
> > Adding Dan.
> >
> >>
> >>
> >>> On 17/12/2015 18.51, Stefano Danzi wrote:
> >>>
> >>> I partially solve the problem.
> >>>
> >>> My host machine has 2 network interfaces with a bond. The bond was
> >>> configured with  mode=4 (802.3ad) and switch was configured in the same
> >>> way.
> >>> If I remove one network cable the network become stable. With both
> cables
> >>> attached the network is instable.
> >>>
> >>> I removed the link aggregation configuration from switch and change the
> >>> bond in mode=2 (balance-xor). Now the network are stable.
> >>> The strange thing is that previous configuration worked fine for one
> >>> year... since the last upgrade.
> >>>
> >>> Now ha-agent don't reboot the hosted-engine anymore, but I receive two
> >>> emails from brocker evere 2/5 minutes.
> >>> First a mail with "ovirt-hosted-engine state transition
> >>> StartState-ReinitializeFSM" and after "ovirt-hosted-engine state
> >>> transition
> >>> ReinitializeFSM-EngineStarting"
> >>>
> >>>
> >>> On 17/12/2015 10.51, Stefano Danzi wrote:
> 
>  Hello,
>  I have one testing host (only one host) with self hosted engine and 2
> VM
>  (one linux and one windows).
> 
>  After upgrade ovirt from 3.6.0 to 3.6.1 the network connection works
>  discontinuously.
>  Every 10 minutes HA agent restart hosted engine VM because result
> down.
>  But the machine is UP,
>  only the network stop to work for some minutes.
>  I activate global maintenace mode to prevent engine reboot. If I ssh
> to
>  the hosted engine sometimes
>  the connection work and sometimes no.  Using VNC connection to engine
> I
>  see that sometime VM reach external network
>  and sometimes no.
>  If I do a tcpdump on phisical ethernet interface I don't see any
> packet
>  when network on vm don't work.
> 
>  Same thing happens fo others two VM.
> 
>  Before the upgrade I never had network problems.
>  ___
>  Users mailing list
>  Users@ovirt.org
>  http://lists.ovirt.org/mailman/listinfo/users
> 
> >>>
> >>> ___
> >>> Users mailing list
> >>> Users@ovirt.org
> >>> http://lists.ovirt.org/mailman/listinfo/users
> >>
> >> ___
> >> Users mailing list
> >> Users@ovirt.org
> >> http://lists.ovirt.org/mailman/listinfo/users
> >
> >
> >
> > --
> > Didi
> >
> >
> > 

[ovirt-users] R: Re: Network instability after upgrade 3.6.0 -> 3.6.1

2015-12-20 Thread Stefano Danzi


The network problems were solved after changing the bond mode (and it's strange; I have
to investigate around qemu-kvm, CentOS 7.2 and the switch firmware), but the broker
problem still exists.  If I turn on the host, the ha agent starts the engine VM. When
the engine VM is up, the broker starts to send email.  I don't have detailed logs here right now.

 Original message 
From: Yedidyah Bar David 
Date: 20/12/2015 11:20 (GMT+01:00)
To: Stefano Danzi , Dan Kenigsberg 
Cc: users 
Subject: Re: [ovirt-users] Network instability after upgrade 3.6.0 -> 3.6.1

On Fri, Dec 18, 2015 at 5:31 PM, Stefano Danzi  wrote:
> I found this in vdsm.log and I think that could be the problem:
>
> Thread-3771::ERROR::2015-12-18
> 16:18:58,597::brokerlink::279::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(_communicate)
> Connection closed: Connection closed
> Thread-3771::ERROR::2015-12-18 16:18:58,597::API::1847::vds::(_getHaInfo)
> failed to retrieve Hosted Engine HA info
> Traceback (most recent call last):
>   File "/usr/share/vdsm/API.py", line 1827, in _getHaInfo
> stats = instance.get_all_stats()
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
> line 103, in get_all_stats
> self._configure_broker_conn(broker)
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
> line 180, in _configure_broker_conn
> dom_type=dom_type)
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
> line 176, in set_storage_domain
> .format(sd_type, options, e))
> RequestError: Failed to set storage domain FilesystemBackend, options
> {'dom_type': 'nfs3', 'sd_uuid': '46f55a31-f35f-465c-b3e2-df45c05e06a7'}:
> Connection closed

My guess is that this is a consequence of your networking problems.

Adding Dan.

>
>
> On 17/12/2015 18.51, Stefano Danzi wrote:
>>
>> I partially solve the problem.
>>
>> My host machine has 2 network interfaces with a bond. The bond was
>> configured with  mode=4 (802.3ad) and switch was configured in the same way.
>> If I remove one network cable the network become stable. With both cables
>> attached the network is instable.
>>
>> I removed the link aggregation configuration from switch and change the
>> bond in mode=2 (balance-xor). Now the network are stable.
>> The strange thing is that previous configuration worked fine for one
>> year... since the last upgrade.
>>
>> Now ha-agent don't reboot the hosted-engine anymore, but I receive two
>> emails from brocker evere 2/5 minutes.
>> First a mail with "ovirt-hosted-engine state transition
>> StartState-ReinitializeFSM" and after "ovirt-hosted-engine state transition
>> ReinitializeFSM-EngineStarting"
>>
>>
>>> On 17/12/2015 10.51, Stefano Danzi wrote:
>>>
>>> Hello,
>>> I have one testing host (only one host) with self hosted engine and 2 VM
>>> (one linux and one windows).
>>>
>>> After upgrade ovirt from 3.6.0 to 3.6.1 the network connection works
>>> discontinuously.
>>> Every 10 minutes HA agent restart hosted engine VM because result down.
>>> But the machine is UP,
>>> only the network stop to work for some minutes.
>>> I activate global maintenace mode to prevent engine reboot. If I ssh to
>>> the hosted engine sometimes
>>> the connection work and sometimes no.  Using VNC connection to engine I
>>> see that sometime VM reach external network
>>> and sometimes no.
>>> If I do a tcpdump on phisical ethernet interface I don't see any packet
>>> when network on vm don't work.
>>>
>>> Same thing happens fo others two VM.
>>>
>>> Before the upgrade I never had network problems.
>>> ___
>>> Users mailing list
>>> Users@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>>
>>
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users



-- 
Didi
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users