Re: [ovs-discuss] Possible bug with OVS LACP + VPC

2017-01-17 Thread Shu Shen
On Tue, Jan 17, 2017 at 04:54:51PM -0600, Chad Norgan wrote:
> Given that the partner port_id on the rogue packet matches the slave
> it's sent out on, I lean towards #1: the LACP implementation is
> somehow mixing up the status for the slave's PDU, rather than leaking
> eth1's PDU out the eth0 interface.
> 
> -Chad

Hi Chad,

A few observations and questions as below:

1) I wrote an additional test case for the slave-down-and-back-up case,
which appears to work fine. I added extra debug messages (not in the
commit referred to below, though) to trace the LACPDUs sent by all
slaves and did not see any rogue packets. Of course, the test case
uses two OVS switches and patch ports, so it may be far from
reproducing the problem you are having. You can find the test case
here:


https://github.com/shushen/ovs/commit/72aa0afc6b61d5135ea9253b8aaf31a57c7c4734

And travis-ci builds with the above test case included are passing:
https://travis-ci.org/shushen/ovs/builds/192922935

2) Could you please elaborate a bit more about how you "manually down
the eth1 interface" and "bring eth1 back up"? Did you unplug a physical
link or did you use any ovs/Linux CLI to do so? This may help me refine
the test case to reproduce what you are doing.

3) I find it interesting that, in the packet trace from the gist you
posted, the source MAC address from the peer switch is all zeros; see

https://gist.github.com/beardymcbeards/7bd9feca87c0574e996a397d90d5ff98#file-2_tcpdump-L81

If I read it correctly, Section 6.2.11.1 of 802.1AX-2014 says:

Protocol entities sourcing frames from within the Link Aggregation
sublayer (e.g., LACP and the Marker protocol) use the MAC address of
the MAC within an underlying Aggregation Port as the SA in frames
transmitted through that Aggregation Port.

I'm not sure why the peer switch is using the all-zero MAC address, but
it probably shouldn't. I don't know how the OVS datapath handles such
packets. If the source MAC address is also all zeros when eth1 comes
back up, could this affect how the LACPDU from eth1 is handled? I
welcome comments from you and the list.
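To check the point about the all-zero source MAC mechanically, a minimal sketch along these lines could flag LACPDUs whose addressing looks wrong. It assumes the raw, untagged (no 802.1Q tag) frame bytes are available, e.g. exported from the capture; the function name is hypothetical. The constants are the standard Slow Protocols destination address 01:80:c2:00:00:02, EtherType 0x8809 and LACP subtype 1.

```python
# IEEE 802.3 Slow Protocols constants used by LACP.
SLOW_PROTOCOLS_DST = bytes.fromhex("0180c2000002")
SLOW_PROTOCOLS_ETHERTYPE = 0x8809
LACP_SUBTYPE = 0x01


def lacpdu_address_problems(frame: bytes) -> list:
    """Return addressing problems found in an untagged LACPDU frame (empty if none)."""
    if len(frame) < 15:
        return ["frame too short"]
    dst, src = frame[0:6], frame[6:12]
    ethertype = int.from_bytes(frame[12:14], "big")
    problems = []
    if ethertype != SLOW_PROTOCOLS_ETHERTYPE or frame[14] != LACP_SUBTYPE:
        problems.append("not an LACPDU")
    if dst != SLOW_PROTOCOLS_DST:
        problems.append("destination is not 01:80:c2:00:00:02")
    if src == bytes(6):
        # 802.1AX 6.2.11.1: the SA should be the MAC of the Aggregation Port.
        problems.append("source MAC is all zeros")
    elif src[0] & 0x01:
        # The I/G bit is set, so this is a group address, not a port's own MAC.
        problems.append("source MAC is a group address")
    return problems


if __name__ == "__main__":
    header = SLOW_PROTOCOLS_DST + bytes(6) + SLOW_PROTOCOLS_ETHERTYPE.to_bytes(2, "big")
    print(lacpdu_address_problems(header + bytes([LACP_SUBTYPE, 0x01])))
```

Running it on a frame built with an all-zero source, as in the example at the bottom, reports only the zero source MAC.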

I'd appreciate it if you could provide a bit more information on 2), or
any other thoughts. My intention is to investigate this problem a bit
further.

/Shu

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Possible bug with OVS LACP + VPC

2017-01-17 Thread Ethan J. Jackson
Short answer is no, it doesn't ring a bell in particular. It definitely
doesn't sound like the expected behavior.

FWIW, based on my very rusty memory of the code, it sounds like one of two
things is happening:

1) The LACP protocol implementation itself has a bug in which it's sending
the incorrect state on the happy slave.

or 2) For some reason, packets intended for just eth1 are getting broadcast
to eth0 as well because of how the OpenFlow is set up.

Those are just guesses though, sorry I can't be of more help.

Ethan



On Tue, Jan 17, 2017 at 9:10 AM, Ben Pfaff  wrote:

> On Thu, Jan 12, 2017 at 06:36:52PM -0600, Chad Norgan wrote:
> > I've been doing some bug chasing around some unintended impacts we've
> > been noticing on our bonded hypervisors. The servers have a bond with
> > two slave interfaces each going to a different upstream switch which
> > have been configured with a Virtual PortChannel (VPC). To OVS, the VPC
> > configuration makes the switches appear as if they are a single device
> > with a single PortChannel. The configuration works great, but we have
> > noticed some unexpected data plane outages when interfaces come back
> > up, not when they go down.
> >
> > For instance, if my server has eth0 and eth1 in a bond and I down the
> > link on eth1, everything is fine. When I re-enable eth1 and it starts
> > to negotiate LACP again, it causes eth0's LACP status to go
> > unsynchronized and stop passing traffic. I have packet captures for
> > this scenario here:
> > https://gist.github.com/beardymcbeards/7bd9feca87c0574e996a397d90d5ff98.
> > If you look at lines 49-57 of the 2nd file, you can see that when the
> > 2nd interface is brought back online, a rogue LACPDU is sent out the
> > working slave interface with a LACP state that doesn't match the
> > current slave. The state mismatch then causes the switch to stop
> > forwarding and restart the LACP negotiation.
> >
> > Does anyone have an idea on why this might be happening?
>
> Thank you for this bug report.  This may explain certain other bug
> reports that we've received over the last few years that did not have
> enough information to follow up on.  I guess that, probably, one of us
> will have to dive into it.
>
> However, before we do that--since I don't think that anyone on the OVS
> team currently has a good working understanding of LACP--I'd like to ask
> Ethan about it.  Ethan, does this ring any bells for you?  Do you have
> any idea where one might start looking on this?
>
> Thanks,
>
> Ben.
>



-- 
Ethan J. Jackson
ejj.sh


Re: [ovs-discuss] VXLAN support in OVS 2.5.0

2017-01-17 Thread Scott Lowe
On 01/16/2017 11:39 PM, Shravan S K wrote:
> How to do the IP connectivity part using OVS internal ports?


[SL] Take a look at either of these blog posts:





Both posts illustrate the use of an OVS internal port as an IP interface.

Hope this helps!


> Thanks.
> 
> Shravan
> 
> On 16 January 2017 at 04:40, Scott Lowe  > wrote:
> 
> On 01/13/2017 03:20 AM, Shravan S K wrote:
>> My goal is to simulate VXLAN functionality on a bigger topology using
>> mininet.
>> My plan: as Mininet uses OVS bridges to simulate vswitch functionality,
>> we can use ovs-vsctl to configure VXLAN functionality on the bridges. I
>> thought I would first try a simple topology without Mininet, just
>> using OVS on a single host and 2 VMs. If it works, then I can make a
>> similar configuration for a bigger topology using mininet.
>>
>> OVS on a single host and 2 VMs : vm1-br1---br2-vm2
>> I am confused about how to perform the VXLAN config for the above setup.
>>
>> If the above one works, I could try on the mininet topologies.
>> For the mininet topology ( --topo=linear,2 )
>> h1 - s1 -- s2 - h2 (h1,h2 are hosts, s1,s2 are switches -
>> actually ovs bridges)
> 
> 
> Setting aside the mininet question for the moment, the way to get VXLAN
> working between two OVS bridges is to establish an IP endpoint (also
> known as a VXLAN Tunnel Endpoint, or VTEP) for each bridge. So, in your
> example configuration, br1 and br2 each need an interface (of some sort)
> with an IP address. You'd then configure a VXLAN port on br1 that points
> to the IP endpoint for br2, and configure a VXLAN port on br2 that
> points to the IP endpoint for br1. Since you're trying to do this within
> a single host, you might consider using OVS internal ports as the IP
> endpoints for each bridge. As long as each IP endpoint can reach the
> other, then in theory it should work.
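A minimal sketch of the configuration described above, for the single-host vm1-br1---br2-vm2 case. It only builds the ovs-vsctl and ip commands and prints them; after reviewing them, they could be run as root, e.g. with subprocess.run(cmd, check=True). The bridge names come from the thread. The port names vtep1/vtep2/vxlan1/vxlan2, the 10.0.0.x addresses, and setting options:local_ip (on the assumption that it helps OVS tell two tunnel endpoints on the same host apart) are illustrative assumptions, and the result has not been tested.

```python
import shlex


def vxlan_side(bridge, vtep_port, vtep_ip, vxlan_port, peer_ip, prefix=24):
    """Commands for one bridge: an internal port as its VTEP plus a VXLAN port to the peer."""
    return [
        ["ovs-vsctl", "--may-exist", "add-br", bridge],
        # Internal port that carries this bridge's IP endpoint (VTEP).
        ["ovs-vsctl", "--may-exist", "add-port", bridge, vtep_port,
         "--", "set", "Interface", vtep_port, "type=internal"],
        ["ip", "addr", "add", f"{vtep_ip}/{prefix}", "dev", vtep_port],
        ["ip", "link", "set", vtep_port, "up"],
        # VXLAN port pointing at the other bridge's VTEP address.
        ["ovs-vsctl", "--may-exist", "add-port", bridge, vxlan_port,
         "--", "set", "Interface", vxlan_port, "type=vxlan",
         f"options:remote_ip={peer_ip}", f"options:local_ip={vtep_ip}"],
    ]


if __name__ == "__main__":
    # Hypothetical names and addresses; each side points at the other's VTEP.
    cmds = (vxlan_side("br1", "vtep1", "10.0.0.1", "vxlan1", "10.0.0.2")
            + vxlan_side("br2", "vtep2", "10.0.0.2", "vxlan2", "10.0.0.1"))
    for cmd in cmds:
        print(shlex.join(cmd))
```

The VMs' ports would still need to be added to br1 and br2 separately; the sketch covers only the tunnel endpoints.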

-- 
Scott



Re: [ovs-discuss] Possible bug with OVS LACP + VPC

2017-01-17 Thread Chad Norgan
Given that the partner port_id on the rogue packet matches the slave
it's sent out on, I lean towards #1: the LACP implementation is
somehow mixing up the status for the slave's PDU, rather than leaking
eth1's PDU out the eth0 interface.

-Chad
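The comparison Chad makes (the partner port_id in the rogue LACPDU versus the slave it was sent on, and the mismatched state) can be done mechanically by decoding the Actor and Partner TLVs. A minimal sketch, assuming untagged frames and the LACPDU layout from 802.1AX: after the 14-byte Ethernet header come the subtype and version octets, then a 20-byte Actor TLV (starting 16 bytes into the frame) and a 20-byte Partner TLV (starting at 36). The function names are hypothetical, and this has not been run against the capture in the gist.

```python
import struct

# Actor/Partner state bits, least significant bit first (IEEE 802.1AX).
STATE_BITS = ("activity", "timeout", "aggregation", "synchronization",
              "collecting", "distributing", "defaulted", "expired")


def decode_state(state: int) -> list:
    """Names of the flags set in an actor/partner state octet."""
    return [name for bit, name in enumerate(STATE_BITS) if state & (1 << bit)]


def decode_port_info(frame: bytes, offset: int) -> dict:
    """Decode one Actor (type 1) or Partner (type 2) TLV starting at offset."""
    tlv_type, length, sys_prio, system, key, port_prio, port, state = struct.unpack_from(
        "!BBH6sHHHB", frame, offset)
    if length != 20:
        raise ValueError(f"unexpected TLV length {length}")
    return {
        "type": tlv_type,
        "system_priority": sys_prio,
        "system": system.hex(":"),
        "key": key,
        "port_priority": port_prio,
        "port": port,
        "state": decode_state(state),
    }


def decode_lacpdu(frame: bytes) -> dict:
    """Decode actor and partner information from an untagged LACPDU frame."""
    return {"actor": decode_port_info(frame, 16),
            "partner": decode_port_info(frame, 36)}
```

Decoding the LACPDUs seen on eth0 before and after eth1 comes back would show whether the "synchronization" flag, the port numbers, or the system IDs change in the rogue frame.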


Re: [ovs-discuss] Possible bug with OVS LACP + VPC

2017-01-17 Thread Ben Pfaff
On Thu, Jan 12, 2017 at 06:36:52PM -0600, Chad Norgan wrote:
> I've been doing some bug chasing around some unintended impacts we've
> been noticing on our bonded hypervisors. The servers have a bond with
> two slave interfaces each going to a different upstream switch which
> have been configured with a Virtual PortChannel (VPC). To OVS, the VPC
> configuration makes the switches appear as if they are a single device
> with a single PortChannel. The configuration works great, but we have
> noticed some unexpected data plane outages when interfaces come back
> up, not when they go down.
> 
> For instance, if my server has eth0 and eth1 in a bond and I down the
> link on eth1, everything is fine. When I re-enable eth1 and it starts
> to negotiate LACP again, it causes eth0's LACP status to go
> unsynchronized and stop passing traffic. I have packet captures for
> this scenario here:
> https://gist.github.com/beardymcbeards/7bd9feca87c0574e996a397d90d5ff98.
> If you look at lines 49-57 of the 2nd file, you can see that when the
> 2nd interface is brought back online, a rogue LACPDU is sent out the
> working slave interface with a LACP state that doesn't match the
> current slave. The state mismatch then causes the switch to stop
> forwarding and restart the LACP negotiation.
> 
> Does anyone have an idea on why this might be happening?

Thank you for this bug report.  This may explain certain other bug
reports that we've received over the last few years that did not have
enough information to follow up on.  I guess that, probably, one of us
will have to dive into it.

However, before we do that--since I don't think that anyone on the OVS
team currently has a good working understanding of LACP--I'd like to ask
Ethan about it.  Ethan, does this ring any bells for you?  Do you have
any idea where one might start looking on this?

Thanks,

Ben.


Re: [ovs-discuss] OVS DPDK with no DPDK nics on NUMA 0

2017-01-17 Thread Stokes, Ian
> This article
> https://software.intel.com/en-us/articles/using-open-vswitch-and-dpdk-with-neutron-in-devstack
> says: "If you have the NICs installed entirely on a NUMA Node other than 0,
> you will encounter a bug that will prevent correct OVS setup. You may wish
> to move your NIC device to a different PCIe slot."
> 
> Anyone know what this bug is, and is it in OVS-DPDK?  "correct OVS setup"
> suggests that it might just be a bug in the devstack script.

I think this bug is related to the fact that the OVS PMD coremask value defined
in local.conf is 0x4 by default, which selects CPU 2 (the third core, counting
from 0), located on socket 0 in this case. If the NIC is attached to NUMA node 1,
then I believe it will fail to initialize.

I think this is still an issue today for OVS-DPDK, but there are plans to
enable cross-NUMA PMD configurations (although note that there would be a
performance penalty for crossing NUMA nodes).

There is a workaround mentioned in the guide under 'Additional OVS/DPDK
Options of Note': essentially, you can change the PMD coremask to select a
CPU on NUMA node 1.
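The failure described above comes down to which CPUs the coremask selects compared with the CPUs on the NIC's NUMA node. A minimal sketch of that arithmetic, assuming Linux sysfs paths (/sys/devices/system/node/nodeN/cpulist and /sys/bus/pci/devices/ADDR/numa_node) and a plain "0-3,8-11" style cpulist without stride syntax; all function names are hypothetical.

```python
def parse_cpulist(text):
    """Parse a Linux cpulist string such as '0-3,8-11' into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def mask_to_cpus(mask):
    """CPU ids selected by a coremask: bit N set means CPU N is used."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def cpus_to_mask(cpus):
    """Build a coremask with one bit set per CPU id."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def node_cpus(node):
    """CPUs on a NUMA node, read from Linux sysfs."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def nic_numa_node(pci_addr):
    """NUMA node of a PCI device such as '0000:81:00.0' (-1 if unknown)."""
    with open(f"/sys/bus/pci/devices/{pci_addr}/numa_node") as f:
        return int(f.read())
```

For example, mask_to_cpus(0x4) returns [2]. If node 1's cpulist were "8-15" (a hypothetical layout), hex(cpus_to_mask([8])) gives '0x100', a mask that keeps the PMD on node 1. In OVS itself the PMD mask is the other_config:pmd-cpu-mask key (ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=...); how the devstack local.conf setting maps onto it is not confirmed here.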

Thanks
Ian



[ovs-discuss] OVS DPDK with no DPDK nics on NUMA 0

2017-01-17 Thread O'Reilly, Darragh

This article
https://software.intel.com/en-us/articles/using-open-vswitch-and-dpdk-with-neutron-in-devstack
says: "If you have the NICs installed entirely on a NUMA Node other than 0,
you will encounter a bug that will prevent correct OVS setup. You may wish to
move your NIC device to a different PCIe slot."

Anyone know what this bug is, and is it in OVS-DPDK?  "correct OVS setup" 
suggests that it might just be a bug in the devstack script.

Cheers,
Darragh.