Re: [Openstack] Weird nova-network bridging problem with precise/essex

2012-07-22 Thread Lorin Hochstein




On Jul 20, 2012, at 9:57 PM, Narayan Desai wrote:

 Just for the record, we found the issue. There was some filtering
 being applied in the bridge code which randomly (?) dropped some DNS
 requests. Setting:
 net.bridge.bridge-nf-call-arptables = 0
 net.bridge.bridge-nf-call-iptables = 0
 net.bridge.bridge-nf-call-ip6tables = 0
 
 completely resolved the problem.
 
 I've written up full details here:
 http://buriedlede.blogspot.com/2012/07/debugging-networking-problems-with.html
 -nld
 

Great writeup. I proposed a section for the docs based on it:
https://review.openstack.org/10106


Take care,

Lorin
--
Lorin Hochstein
Lead Architect - Cloud Services
Nimbis Services, Inc.
www.nimbisservices.com

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Weird nova-network bridging problem with precise/essex

2012-07-21 Thread Xu (Simon) Chen
Narayan,

If you set net.bridge.bridge-nf-call-iptables = 0 on the network
controller, do floating IPs still work? For each tenant/network, a subnet
is created, and nova-network has a .1 gateway configured on the bridge
with the vlan interface plugged in.

The packets from VMs are actually sent to the bridge for NATting. But if
you don't allow the bridges to call iptables, it might break public
access altogether. I don't know, maybe I'm not understanding the sysctl
flag correctly... Maybe it only applies to packets transiting the
bridge, and doesn't affect the ones destined for nova-network itself?

-Simon



Re: [Openstack] Weird nova-network bridging problem with precise/essex

2012-07-21 Thread Narayan Desai
On Sat, Jul 21, 2012 at 6:47 AM, Xu (Simon) Chen xche...@gmail.com wrote:
 Narayan,

 If you set net.bridge.bridge-nf-call-iptables = 0 on the network controller,
 do floating IPs still work? For each tenant/network, a subnet is created,
 and nova-network has a .1 gateway configured on the bridge with the vlan
 interface plugged in.

 The packets from VMs are actually sent to the bridge for NATting. But if you
 don't allow the bridges to call iptables, it might break public access
 altogether. I don't know, maybe I'm not understanding the sysctl flag
 correctly... Maybe it only applies to packets transiting the bridge, and
 doesn't affect the ones destined for nova-network itself?

Do you mean floating (private) or fixed (public) IPs? I suspect that
you mean fixed. Fixed IPs worked regardless of this setting.

The crux of the issue was that packets transiting the bridge (i.e. being
moved from vlan200 to the virtual br200) were hitting filtering rules.
It looks to me like the sysctls only apply to traffic moving across
the bridge (i.e. exactly between vlan200 and br200), but don't bypass
iptables entirely. I don't think that should affect NAT/SNAT in any
case.
 -nld
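
A minimal sketch for checking which of these bridge hooks are currently enabled on a host. It assumes the values are exposed under /proc/sys/net/bridge, which is only the case while the kernel's bridge netfilter code is loaded. The function name and the optional directory argument are hypothetical and exist only to make the sketch easy to try out. A value of 1 means bridged traffic is passed to the corresponding tables; 0 means it is not.

```shell
#!/bin/sh
# Print the current bridge-nf-call-* values, one "name = value" per line.
# The optional directory argument is an assumption added for testing;
# by default the function reads the live values from /proc.
show_bridge_nf() {
    dir=${1:-/proc/sys/net/bridge}
    for f in "$dir"/bridge-nf-call-*; do
        # Skip silently if the directory or files do not exist
        [ -r "$f" ] || continue
        printf '%s = %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}
```

Calling show_bridge_nf with no arguments on the nova-network server should list the three settings discussed in this thread, or print nothing if the bridge sysctls are not present.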



Re: [Openstack] Weird nova-network bridging problem with precise/essex

2012-07-21 Thread Xu (Simon) Chen
OK, that sounds good...

I was talking about fixed IP to floating IP SNAT, which happens on the
bridge interfaces. But if the sysctl flag only affects transiting packets,
we should be good...

-Simon




Re: [Openstack] Weird nova-network bridging problem with precise/essex

2012-07-20 Thread Narayan Desai
Just for the record, we found the issue. There was some filtering
being applied in the bridge code which randomly (?) dropped some DNS
requests. Setting:
net.bridge.bridge-nf-call-arptables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-ip6tables = 0

completely resolved the problem.

I've written up full details here:
http://buriedlede.blogspot.com/2012/07/debugging-networking-problems-with.html
 -nld
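
One way to make these three settings persistent, sketched under assumptions: the function name and the file name in the usage comment are hypothetical, and the keys can only be applied while the kernel's bridge netfilter code is loaded, so a fragment read early at boot may not take effect until it is reloaded.

```shell
#!/bin/sh
# Emit a sysctl fragment that stops bridged traffic from being passed
# to arptables, iptables and ip6tables (the three settings above).
bridge_nf_fragment() {
    for table in arptables iptables ip6tables; do
        printf 'net.bridge.bridge-nf-call-%s = 0\n' "$table"
    done
}

# Usage, as root (the file name is an assumption, not from this thread):
#   bridge_nf_fragment > /etc/sysctl.d/60-bridge-nf.conf
#   sysctl -p /etc/sysctl.d/60-bridge-nf.conf
```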



Re: [Openstack] Weird nova-network bridging problem with precise/essex

2012-07-16 Thread Nathanael Burton
Narayan,

Are you doing bonding in conjunction with your bridging + vlans? Or is it
just a single interface backing the vlan_interface?

Nate
On Jul 16, 2012 9:55 PM, Narayan Desai narayan.de...@gmail.com wrote:

 We're running into what looks like a Linux bridging bug, which causes
 both substantial (20-40%) packet loss and DNS lookups to fail about that
 same fraction of the time. We're running essex on precise, with dedicated
 nova-network servers and VlanManager. On either of our nova-network
 servers, we see the same behavior. When tracking this down, I found
 the following when tcpdump'ing along the path between the VM instance
 and the n-net gateway.

 The packets appear to make it to the nova-network server, and are
 properly pulled out of dot1q tagging:
 root@m5-p:~# tcpdump -K -p -i vlan200 -v -vv udp port 53
 tcpdump: WARNING: vlan200: no IPv4 address assigned
 tcpdump: listening on vlan200, link-type EN10MB (Ethernet), capture size 65535 bytes
 20:34:02.377711 IP (tos 0x0, ttl 64, id 59761, offset 0, flags [none], proto UDP (17), length 60)
     10.0.0.3.54937 > 10.0.0.1.domain: 52874+ A? www.google.com. (32)
 20:34:07.377942 IP (tos 0x0, ttl 64, id 59762, offset 0, flags [none], proto UDP (17), length 60)
     10.0.0.3.54937 > 10.0.0.1.domain: 52874+ A? www.google.com. (32)
 20:34:12.378248 IP (tos 0x0, ttl 64, id 59763, offset 0, flags [none], proto UDP (17), length 60)
     10.0.0.3.54937 > 10.0.0.1.domain: 52874+ A? www.google.com. (32)
 20:34:12.378428 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 170)
     10.0.0.1.domain > 10.0.0.3.54937: 52874 q: A? www.google.com. 6/0/0 www.google.com. [1d3h55m19s] CNAME www.l.google.com., www.l.google.com. [1m33s] A 74.125.225.209, www.l.google.com. [1m33s] A 74.125.225.208, www.l.google.com. [1m33s] A 74.125.225.212, www.l.google.com. [1m33s] A 74.125.225.211, www.l.google.com. [1m33s] A 74.125.225.210 (142)

 But some packets don't make it all of the way to the bridged interface:
 root@m5-p:~# brctl show
 bridge name     bridge id               STP enabled     interfaces
 br200           8000.fa163e18927b       no              vlan200

 root@m5-p:~# tcpdump -K -p -i br200 -v -vv udp port 53
 tcpdump: listening on br200, link-type EN10MB (Ethernet), capture size 65535 bytes
 20:34:12.378264 IP (tos 0x0, ttl 64, id 59763, offset 0, flags [none], proto UDP (17), length 60)
     10.0.0.3.54937 > 10.0.0.1.domain: 52874+ A? www.google.com. (32)
 20:34:12.378424 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 170)
     10.0.0.1.domain > 10.0.0.3.54937: 52874 q: A? www.google.com. 6/0/0 www.google.com. [1d3h55m19s] CNAME www.l.google.com., www.l.google.com. [1m33s] A 74.125.225.209, www.l.google.com. [1m33s] A 74.125.225.208, www.l.google.com. [1m33s] A 74.125.225.212, www.l.google.com. [1m33s] A 74.125.225.211, www.l.google.com. [1m33s] A 74.125.225.210 (142)

 I can't find any way that ipfilter could be implicated in this; there
 aren't any deny rules being hit.

 Oddly enough, this seems to cause no loss in ICMP traffic, even with ping
 -f.

 So far, searching hasn't netted very much. I've found this similar
 sounding Ubuntu bug report, but it looks like no one is working on it:
 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/986043

 We're at 3.2.0-24, and there is a 3.2.0-25, but it is reported not to
 fix this issue, and neither do the 3.4 kernels.

 It seems sad to try backrevving to an oneiric kernel, but that is on
 my list for tomorrow.  If this is a kernel bug, it will make the
 precise default kernel unusable for nova-network servers with dot1q
 (or whatever the appropriate feature interaction is).

 Does this ring any bells, or is there another course of action I should
 attempt?
 Thanks in advance for any suggestions.
  -nld
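
The comparison between the two captures above can be automated. The following is a sketch only: it assumes both captures were saved as `tcpdump -v` text output, and the function and file names are hypothetical. It extracts the IP id of every packet in each capture and prints the ids that were seen on the first interface but never on the second.

```shell
#!/bin/sh
# Print the sorted, unique IP id values found in a `tcpdump -v` text capture.
extract_ids() {
    grep -o 'id [0-9][0-9]*' "$1" | awk '{print $2}' | sort -u
}

# Print the ids present in the first capture but absent from the second,
# i.e. packets that arrived on one interface and never showed up on the other.
missing_ids() {
    a=$(mktemp) && b=$(mktemp) || return 1
    extract_ids "$1" > "$a"
    extract_ids "$2" > "$b"
    # comm -23 keeps lines unique to the first (sorted) input
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage (hypothetical file names):
#   missing_ids vlan200.txt br200.txt
```

Run against the two captures shown above, this should report ids 59761 and 59762, the two DNS queries that reached vlan200 but not br200.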
