Re: [Openstack] Weird nova-network bridging problem with precise/essex
On Jul 20, 2012, at 9:57 PM, Narayan Desai wrote:

> Just for the record, we found the issue. There was some filtering being
> applied in the bridge code which randomly (?) dropped some DNS requests.
> Setting:
>
>     net.bridge.bridge-nf-call-arptables = 0
>     net.bridge.bridge-nf-call-iptables = 0
>     net.bridge.bridge-nf-call-ip6tables = 0
>
> completely resolved the problem.
>
> I've written up full details here:
> http://buriedlede.blogspot.com/2012/07/debugging-networking-problems-with.html
> -nld

Great writeup, I proposed a section to the docs for this:

https://review.openstack.org/10106

Take care,

Lorin
--
Lorin Hochstein
Lead Architect - Cloud Services
Nimbis Services, Inc.
www.nimbisservices.com

___
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp
Re: [Openstack] Weird nova-network bridging problem with precise/essex
Narayan,

If you set net.bridge.bridge-nf-call-iptables = 0 on the network
controller, does floating IP still work? For each tenant/network, a subnet
is created, and nova-network has a .1 gateway configured on the bridge
with the vlan interface plugged in. The packets from VMs are actually sent
to the bridge for NATting. But if you don't allow the bridges to call
iptables, it might break public access altogether.

Don't know, maybe I'm not understanding the sysctl flag correctly... Maybe
it only applies to packets transiting the bridge, without affecting the
ones destined for nova-network?

-Simon
Re: [Openstack] Weird nova-network bridging problem with precise/essex
On Sat, Jul 21, 2012 at 6:47 AM, Xu (Simon) Chen xche...@gmail.com wrote:

> Narayan,
>
> If you set net.bridge.bridge-nf-call-iptables = 0 on the network
> controller, does floating IP still work? For each tenant/network, a
> subnet is created, and nova-network has a .1 gateway configured on the
> bridge with the vlan interface plugged in. The packets from VMs are
> actually sent to the bridge for NATting. But if you don't allow the
> bridges to call iptables, it might break public access altogether.
>
> Don't know, maybe I'm not understanding the sysctl flag correctly...
> Maybe it only applies to packets transiting the bridge, without
> affecting the ones destined for nova-network?

Do you mean floating (private) or fixed (public) IPs? I suspect that you
mean fixed. Fixed IPs worked regardless of this setting.

The crux of the issue was that packets transiting the bridge (i.e. being
moved from vlan200 to the virtual br200) were hitting filtering rules. It
looks to me like the sysctls only apply to traffic moving across the
bridge (i.e. exactly between vlan200 and br200), but don't bypass iptables
entirely. I don't think that should affect NAT/SNAT in any case.

-nld
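One way to check the point above (that NAT keeps working with the bridge sysctls set to 0) is to watch the packet counters on the nat table rules before and after the change. The following is only a sketch: the helper name nat_rule_hits is made up for illustration, and it assumes the counter-prefixed rule format that `iptables-save -c` prints.

```shell
#!/bin/sh
# Sketch: list packet counters for SNAT/DNAT rules.
# Reads `iptables-save -c -t nat` output on stdin, where each rule line
# looks like "[packets:bytes] -A CHAIN ... -j SNAT --to-source ADDR",
# and prints "<packets> <chain>" for every SNAT or DNAT rule.
# nat_rule_hits is a hypothetical helper name, not part of any tool.
nat_rule_hits() {
    awk '/^\[[0-9]+:[0-9]+\] -A / && / -j (SNAT|DNAT) / {
        split(substr($1, 2), counters, ":")   # "[42:2520]" -> "42", "2520]"
        print counters[1], $3                 # packet count, chain name
    }'
}

# Typical use on the nova-network host (needs root):
#   iptables-save -c -t nat | nat_rule_hits
# Run it, generate some traffic to a floating IP, then run it again;
# counters that keep increasing suggest NAT is still being applied.
```

If the counters stop moving after the sysctl change, that would point to NAT being affected after all, which is not what was observed in this thread.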
Re: [Openstack] Weird nova-network bridging problem with precise/essex
OK, that sounds good... I was talking about fixed IP to floating IP SNAT,
which happens on the bridge interfaces. But if the sysctl flag only
affects transiting packets, we should be good...

-Simon
Re: [Openstack] Weird nova-network bridging problem with precise/essex
Just for the record, we found the issue. There was some filtering being
applied in the bridge code which randomly (?) dropped some DNS requests.
Setting:

    net.bridge.bridge-nf-call-arptables = 0
    net.bridge.bridge-nf-call-iptables = 0
    net.bridge.bridge-nf-call-ip6tables = 0

completely resolved the problem.

I've written up full details here:
http://buriedlede.blogspot.com/2012/07/debugging-networking-problems-with.html

-nld
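For anyone applying the same fix, a small check like the following may help confirm which of the three bridge netfilter sysctls are still enabled on a host. It is a sketch only: the function name enabled_bridge_nf and the drop-in file name in the comments are assumptions, not something from the thread.

```shell
#!/bin/sh
# Sketch: report bridge netfilter sysctls that are still enabled.
# Reads "key = value" lines (the format printed by `sysctl -a`) on stdin
# and prints each bridge-nf-call-* key whose value is not 0.
# enabled_bridge_nf is a hypothetical helper name.
enabled_bridge_nf() {
    awk -F'[ \t]*=[ \t]*' '
        $1 ~ /^net\.bridge\.bridge-nf-call-(arp|ip|ip6)tables$/ && $2 != "0" {
            print $1
        }
    '
}

# Typical use (the keys only exist once the bridge module is loaded):
#   sysctl -a 2>/dev/null | enabled_bridge_nf
#
# To keep the settings across reboots, one option is a drop-in file.
# The file name below is an assumption; any name under /etc/sysctl.d works:
#   cat > /etc/sysctl.d/60-bridge-nf.conf <<'EOF'
#   net.bridge.bridge-nf-call-arptables = 0
#   net.bridge.bridge-nf-call-iptables = 0
#   net.bridge.bridge-nf-call-ip6tables = 0
#   EOF
#   sysctl -p /etc/sysctl.d/60-bridge-nf.conf
```

Empty output from the check means all three flags are already 0.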
Re: [Openstack] Weird nova-network bridging problem with precise/essex
Narayan,

Are you doing bonding in conjunction with your bridging + vlans? Or is it
just a single interface backing the vlan_interface?

Nate

On Jul 16, 2012 9:55 PM, Narayan Desai narayan.de...@gmail.com wrote:

> We're running into what looks like a Linux bridging bug, which causes
> both substantial (20-40%) packet loss and DNS to fail about that same
> fraction of the time. We're running essex on precise, with dedicated
> nova-network servers and VLANManager. On either of our nova-network
> servers, we see the same behavior. When tracking this down, I found the
> following when tcpdump'ing along the path between the vm instance and
> the n-net gateway.
>
> The packets appear to make it to the nova-network server, and are
> properly pulled out of dot1q tagging:
>
> root@m5-p:~# tcpdump -K -p -i vlan200 -v -vv udp port 53
> tcpdump: WARNING: vlan200: no IPv4 address assigned
> tcpdump: listening on vlan200, link-type EN10MB (Ethernet), capture size 65535 bytes
> 20:34:02.377711 IP (tos 0x0, ttl 64, id 59761, offset 0, flags [none], proto UDP (17), length 60)
>     10.0.0.3.54937 > 10.0.0.1.domain: 52874+ A? www.google.com. (32)
> 20:34:07.377942 IP (tos 0x0, ttl 64, id 59762, offset 0, flags [none], proto UDP (17), length 60)
>     10.0.0.3.54937 > 10.0.0.1.domain: 52874+ A? www.google.com. (32)
> 20:34:12.378248 IP (tos 0x0, ttl 64, id 59763, offset 0, flags [none], proto UDP (17), length 60)
>     10.0.0.3.54937 > 10.0.0.1.domain: 52874+ A? www.google.com. (32)
> 20:34:12.378428 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 170)
>     10.0.0.1.domain > 10.0.0.3.54937: 52874 q: A? www.google.com. 6/0/0 www.google.com. [1d3h55m19s] CNAME www.l.google.com., www.l.google.com. [1m33s] A 74.125.225.209, www.l.google.com. [1m33s] A 74.125.225.208, www.l.google.com. [1m33s] A 74.125.225.212, www.l.google.com. [1m33s] A 74.125.225.211, www.l.google.com. [1m33s] A 74.125.225.210 (142)
>
> But some packets don't make it all of the way to the bridged interface:
>
> root@m5-p:~# brctl show
> bridge name     bridge id               STP enabled     interfaces
> br200           8000.fa163e18927b       no              vlan200
>
> root@m5-p:~# tcpdump -K -p -i br200 -v -vv udp port 53
> tcpdump: listening on br200, link-type EN10MB (Ethernet), capture size 65535 bytes
> 20:34:12.378264 IP (tos 0x0, ttl 64, id 59763, offset 0, flags [none], proto UDP (17), length 60)
>     10.0.0.3.54937 > 10.0.0.1.domain: 52874+ A? www.google.com. (32)
> 20:34:12.378424 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 170)
>     10.0.0.1.domain > 10.0.0.3.54937: 52874 q: A? www.google.com. 6/0/0 www.google.com. [1d3h55m19s] CNAME www.l.google.com., www.l.google.com. [1m33s] A 74.125.225.209, www.l.google.com. [1m33s] A 74.125.225.208, www.l.google.com. [1m33s] A 74.125.225.212, www.l.google.com. [1m33s] A 74.125.225.211, www.l.google.com. [1m33s] A 74.125.225.210 (142)
>
> I can't find any way that ipfilter could be implicated in this; there
> aren't deny rules that are hitting. Oddly enough, this seems to cause no
> loss in ICMP traffic, even with ping -f.
>
> So far, searching hasn't netted very much. I've found this
> similar-sounding Ubuntu bug report, but it looks like no one is working
> on it:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/986043
>
> We're at 3.2.0-24, and there is a 3.2.0-25, but it is reported not to
> fix this issue, and neither do the 3.4 kernels. It seems sad to try
> backrevving to an oneiric kernel, but that is on my list for tomorrow.
> If this is a kernel bug, it will make the precise default kernel
> unusable for nova-network servers with dot1q (or whatever the
> appropriate feature interaction is).
>
> Does this ring any bells, or is there another course of action I should
> attempt? Thanks in advance for any suggestions.
>
> -nld
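The comparison described in the quoted message (queries visible on vlan200 but not on br200) can be automated by matching IP id values between two saved captures. This is a rough sketch under some assumptions: the function names and capture file names are made up, and it relies on the `id N,` field that `tcpdump -v` prints in each packet header line. Packets sent with id 0 (as the DF replies above are) cannot be told apart this way, so it is mainly useful for the client queries.

```shell
#!/bin/sh
# Sketch: find packets seen on the VLAN interface but not on the bridge.
# ip_ids and missing_on_bridge are hypothetical helper names.

# Print the sorted, unique IP id values found in `tcpdump -v` text output.
ip_ids() {
    sed -n 's/.*, id \([0-9][0-9]*\),.*/\1/p' "$1" | sort -u
}

# $1: capture text from the VLAN interface, $2: capture text from the bridge.
# Prints the IP ids that appear only in the first capture.
missing_on_bridge() {
    ids_a=$(mktemp)
    ids_b=$(mktemp)
    ip_ids "$1" > "$ids_a"
    ip_ids "$2" > "$ids_b"
    comm -23 "$ids_a" "$ids_b"    # lines unique to the first file
    rm -f "$ids_a" "$ids_b"
}

# Typical use (file names are assumptions; run both captures at the same
# time in separate terminals, then stop them and compare):
#   tcpdump -l -n -p -i vlan200 -v udp port 53 > vlan200.txt
#   tcpdump -l -n -p -i br200   -v udp port 53 > br200.txt
#   missing_on_bridge vlan200.txt br200.txt
```

Run against the two captures quoted above, this approach would flag the queries with ids 59761 and 59762, which reached vlan200 but never showed up on br200.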