Re: [Openstack-operators] [neutron] ML2/OVS dropping packets?

2017-06-21 Thread Kevin Benton
Can you do a tcpdump to see if the VM is sending any packets out that other
interface with the source MAC of the primary interface?
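
A minimal sketch of that check (the tap device name and MACs below are placeholders, and the `tcpdump -e` lines are invented sample text so the filter logic is visible): capture link-level headers on the secondary tap and flag any frame whose source MAC belongs to the primary interface.

```shell
# Sketch of the capture described above. "tap1" and the MACs are
# placeholders -- substitute the VM's real secondary tap device and the
# primary interface's real MAC. Live use would be:
#   tcpdump -e -n -i tap1 | awk -v p="$primary" '$2 == p'
# Here sample tcpdump -e style output is inlined instead.
primary="fa:16:3e:aa:bb:cc"
sample='12:00:01.000000 fa:16:3e:dd:ee:ff > ff:ff:ff:ff:ff:ff, ARP, length 42
12:00:02.000000 fa:16:3e:aa:bb:cc > 00:11:22:33:44:55, IPv4, length 74'
# Any line printed is a frame leaving the secondary tap with the primary
# interface's source MAC -- exactly the leak that would confuse learning.
echo "$sample" | awk -v p="$primary" '$2 == p { print "leak: " $0 }'
```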

We make use of the NORMAL action, which does MAC learning, so it's possible
something is slipping through that is causing OVS to get the wrong port
association.
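
One way to see what that learning currently says (a sketch; the bridge name, port numbers, and MACs below are made up): `ovs-appctl fdb/show <bridge>` dumps the learned MAC table, and watching whether the VM's primary MAC sits on, or flaps to, the secondary interface's port would confirm or rule this out.

```shell
# Sketch: parse `ovs-appctl fdb/show br-eth1`-style output for a given MAC.
# The table below is an invented sample; in live use replace the echo with
# the real command. A MAC learned on (or flapping to) the wrong port is the
# wrong-port association in action.
fdb_sample='port  VLAN  MAC                Age
   1     0  fa:16:3e:aa:bb:cc    5
   7     0  fa:16:3e:dd:ee:ff    2'
mac="fa:16:3e:aa:bb:cc"
echo "$fdb_sample" | awk -v m="$mac" '$3 == m { print "MAC " m " learned on port " $1 }'
```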

The other possibility is that the ARP entry on the upstream router is
learning the secondary MAC for the IP of the primary due to some traffic
slipping out.

Seeing the dest MAC of upstream traffic going into the filtering bridge of
the secondary interface should tell you if it's a MAC learning problem or
an ARP problem.


On Jun 21, 2017 10:52, "Jonathan Proulx"  wrote:



So this all gets more interesting: the packets aren't lost, they get
routed (switched?) to the wrong interface...


The VM has two interfaces on the same network. Not sure this makes
sense; it was done because this was a straight physical-to-virtual
migration.  But it seems like it should work.

So the VM is sending a SYN from its (vm)eth0 -> tap0 -> qvb0 -> qvo0 ->
int-eth1-br -> phy-eth1-br -> (hypervisor)eth1 -> WORLD

but the ACK is coming back (hypervisor)eth1 -> phy-eth1-br ->
int-eth1-br -> qvo1 !!! -> qvb1 -> tap1, where presumably sec-group
rules see it as invalid and drop it.

This is quite odd.  The default route on the VM is through eth0, which is
where the packets are originating and where the IPv4 address they should
return to is.

Really puzzled why OVS is sending packets back through the wrong path.

On the one hand I want to say "stop doing that, just put both addresses
on one port"; on the other, I see no reason why it shouldn't work.

-Jon


On Wed, Jun 21, 2017 at 05:35:02PM +0100, Stig Telfer wrote:
:Hi Jon -
:
:From what I understand, while you might have gone to the trouble of
:configuring a lossless data centre ethernet, that guarantee against
:packet loss ends at the hypervisor. OVS (and other virtual switches)
:will drop packets rather than exert back pressure.
:
:I saw a useful paper from IBM Zurich on developing a flow-controlled
:virtual switch:
:
:http://researcher.ibm.com/researcher/files/zurich-DCR/Got%20Loss%20Get%20zOVN.pdf
:
:It’s a bit dated (2013) but may still apply.
:
:If you figure out a way of preventing this with modern OVS, I’d be very
:interested to know.
:
:Best wishes,
:Stig
:
:
:> On 21 Jun 2017, at 16:24, Jonathan Proulx  wrote:
:>
:> On Wed, Jun 21, 2017 at 02:39:23AM -0700, Kevin Benton wrote:
:> :Are there any events going on during these outages that would cause
:> :reprogramming by the Neutron agent? (e.g. port updates) If not, it's
:> :likely an OVS issue and you might want to cross-post to the ovs-discuss
:> :mailing list.
:>
:> Guess I'll have to wander deeper into OVS land.
:>
:> No agent updates and nothing in ovs logs (at INFO), flipping to Debug
:> and there's so many messages they get dropped:
:>
:> 2017-06-21T15:15:36.972Z|00794|dpif(handler12)|DBG|Dropped 35 log messages in last 0 seconds (most recently, 0 seconds ago) due to excessive rate
:>
:> /me wanders over to ovs-discuss
:>
:> Thanks,
:> -Jon
:>
:> :Can you check the vswitch logs during the packet loss to see if there
:> :are any messages indicating a reason? If that doesn't show anything and
:> :it can be reliably reproduced, it might be worth increasing the logging
:> :for the vswitch to debug.
:> :
:> :
:> :
:> :On Tue, Jun 20, 2017 at 12:36 PM, Jonathan Proulx  wrote:
:> :
:> :> Hi All,
:> :>
:> :> I have a very busy VM (well, one of my users does; I don't have access
:> :> but do have a cooperative and competent admin to interact with on the
:> :> other end).
:> :>
:> :> At peak times it *sometimes* misses packets.  I've been digging in for
:> :> a bit and it looks like they get dropped in OVS land.
:> :>
:> :> The VM's main function in life is to pull down webpages from other
:> :> sites and analyze as requested.  During peak times ( EU/US working
:> :> hours ) it sometimes hangs some requests and sometimes fails.
:> :>
:> :> Looking at traffic, the outbound SYN request from the VM is always good
:> :> and the returning ACK always gets to the physical interface of the
:> :> hypervisor (on a provider vlan).
:> :>
:> :> When packets get dropped they do not make it to the qvo-XX on
:> :> the integration bridge.
:> :>
:> :> My suspicion is that OVS isn't keeping up with the eth1-br flow rules
:> :> remapping from external to internal vlan-id, but I'm not quite sure how
:> :> to prove that or what to do about it.
:> :>
:> :> My initial thought had been to blame conntrack, but drops are happening
:> :> before the iptables rules, and while there's a lot of connections on
:> :> this hypervisor:
:> :>
:> :> net.netfilter.nf_conntrack_count = 351880
:> :>
:> :> There should be plenty of headroom to handle:
:> :>
:> :> net.netfilter.nf_conntrack_max = 1048576
:> :>
:> :> Anyone have thoughts on where to go with this?
:> :>
:> :> version details:
:> :> Ubuntu 14.04
:> :> OpenStack Mitaka
:> :> ovs-vsctl (Open vSwitch) 2.5.0
:> :>
:> :> Thanks,
:> :> -Jon




[Openstack-operators] [neutron] ML2/OVS dropping packets?

2017-06-20 Thread Jonathan Proulx
Hi All,

I have a very busy VM (well, one of my users does; I don't have access
but do have a cooperative and competent admin to interact with on the
other end).

At peak times it *sometimes* misses packets.  I've been digging in for
a bit and it looks like they get dropped in OVS land.

The VM's main function in life is to pull down webpages from other
sites and analyze as requested.  During peak times ( EU/US working
hours ) it sometimes hangs some requests and sometimes fails.

Looking at traffic, the outbound SYN request from the VM is always good
and the returning ACK always gets to the physical interface of the
hypervisor (on a provider vlan).

When packets get dropped they do not make it to the qvo-XX on
the integration bridge.

My suspicion is that OVS isn't keeping up with the eth1-br flow rules
remapping from external to internal vlan-id, but I'm not quite sure how
to prove that or what to do about it.
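
One hedged way to test the "OVS can't keep up" theory: the kernel datapath reports, in the `lost:` field of `ovs-dpctl show`, how many upcalls it dropped before userspace could install a flow. The command is real OVS tooling, but the sample output and numbers below are invented; a `lost` counter climbing during peak hours would point at exactly this.

```shell
# Sketch: check the datapath's "lost" counter -- packets the kernel module
# dropped because userspace couldn't service upcalls fast enough.
# Sample `ovs-dpctl show` output inlined (numbers invented); live use:
#   ovs-dpctl show | grep -o 'lost:[0-9]*'
dp_sample='system@ovs-system:
  lookups: hit:82960539 missed:1273043 lost:4157
  flows: 1024'
echo "$dp_sample" | grep -o 'lost:[0-9]*'
```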

My initial thought had been to blame conntrack, but drops are happening
before the iptables rules, and while there's a lot of connections on
this hypervisor:

net.netfilter.nf_conntrack_count = 351880

There should be plenty of headroom to handle:

net.netfilter.nf_conntrack_max = 1048576
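
For what it's worth, those two numbers put the table at roughly a third full; a quick sketch of the arithmetic (values hard-coded from the sysctl output above):

```shell
# Conntrack headroom check, with the values from this mail hard-coded;
# live use would read them instead:
#   count=$(sysctl -n net.netfilter.nf_conntrack_count)
#   max=$(sysctl -n net.netfilter.nf_conntrack_max)
count=351880
max=1048576
pct=$(( count * 100 / max ))
echo "conntrack table ${pct}% full"   # well below the limit
```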

Anyone have thoughts on where to go with this?

version details:
Ubuntu 14.04
OpenStack Mitaka
ovs-vsctl (Open vSwitch) 2.5.0

Thanks,
-Jon

-- 

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators