Hopefully I've gotten the attention of networking folk with that subject. We have an odd routing problem in our devtest environment whose cause I haven't figured out - I'm wondering if anyone has some insight before I start climbing through the linuxbridge kernel code :)
In the interests of brevity I'm going to skip over *why* the setup is the way it is, but I can expand on that if needed ;).

The devtest environment uses a test network - 192.0.2.0/24, broken down into bits - on an isolated bridge, to emulate a datacentre with baremetal machines. We then use 3+ VMs to simulate a deployment story. The 'seed' VM hosts a one-node baremetal nova with one registered baremetal node. The second VM - the 'undercloud' - is deployed by the 'seed' cloud and is a one-node baremetal nova with all the remaining VMs registered as baremetal nodes. The third VM - the 'overcloud control plane' - is deployed by the undercloud, and is a combined control node + neutron network node; if we have only 3 VMs then it also runs nova-compute kvm. Any additional VMs are scaled-out nova-compute kvm nodes.

The 'seed' VM has two devices - eth0 connected to virbr0, and eth1 connected to br99. The other nodes are all connected to br99. This approximates a datacentre network where there is no L2 connectivity between the environment the OpenStack tools are being run in and the undercloud.

On the host we have two bridges:

  virbr0: .. inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
  br99:   .. inet6 fe80::ac1b:6ff:fee1:6440/64 scope link

Both are linuxbridge devices. We have routing into the test environment:

  $ ip route
  192.0.2.0/24 via 192.168.122.128 dev virbr0
  192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1

And naturally the seed VM has IP forwarding enabled. We have masquerading on eth0 of the seed node to permit the other VMs access to the internet via libvirt's NAT rules:

  iptables -A POSTROUTING -s 192.0.2.0/24 ! -d 192.168.122.1/32 -o eth0 -j MASQUERADE

The nodes booted by the seed node have the seed node as their default route:

  $ ip route
  default via 192.0.2.33 dev eth0
  192.0.2.32/29 dev eth0 proto kernel scope link src 192.0.2.34

Now, with linux bridge, this doesn't work. With an OVS bridge for br99 it works fine. The *way* in which it doesn't work is the mysterious thing.

ping from a seed-booted instance - e.g. from the undercloud VM - to 192.168.122.1 (the virbr0 address of the host) works fine, and tcpdump shows bidirectional traffic on br99. ping from 192.168.122.1 to the undercloud VM - 192.0.2.34 - doesn't work.

My immediate reaction was 'this is a NAT problem'. However, if it were an inbound NAT issue, the traffic wouldn't reach br99 - and it does:

  tcpdumping virbr0 shows the ICMP from 192.168.122.1 -> 192.0.2.34 as expected.
  tcpdumping eth0 within the seed node - ditto.
  tcpdumping eth1 within the seed node - ditto.
  tcpdumping br99 from the host - ditto.

And if it were a NAT problem I'd still expect to see the incoming traffic on eth0 of the undercloud VM. tcpdumping eth0 of the undercloud VM doesn't see the frames at all.

I thought it might be a checksum issue, so I tried adding a checksum-fill rule to POSTROUTING on the seed node, but it had no effect.

So there is the puzzle: how can traffic reach eth0 of the undercloud VM via the seed node when it is in reply to a session initiated by the undercloud VM, but not when it's initiated from the host, while it is visible on br99 in both cases?

I'm sure I've missed something simple, in my getting-over-a-virus fog of thought.

Puzzledly-yrs,
Rob

-- 
Robert Collins <rbtcoll...@hp.com>
Distinguished Technologist
HP Cloud Services
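(For reference: a minimal sketch of the plumbing described above, using the addresses and interface names from this message. It assumes 192.168.122.128 is the seed's eth0 address on virbr0 - the routes above imply this but don't state it - and it writes the masquerade rule as an explicit '-t nat' command rather than quoting the seed's actual rules; treat it as an approximation of the setup, not a verbatim copy.)

  # On the host: route the test network towards the seed's virbr0-facing address.
  # (Assumption: 192.168.122.128 is the seed's eth0 address on virbr0.)
  ip route add 192.0.2.0/24 via 192.168.122.128 dev virbr0

  # On the seed VM: forward between eth1 (br99, 192.0.2.0/24) and eth0 (virbr0),
  # masquerading the test network as it leaves eth0.
  sysctl -w net.ipv4.ip_forward=1
  iptables -t nat -A POSTROUTING -s 192.0.2.0/24 ! -d 192.168.122.1/32 -o eth0 -j MASQUERADE

  # On the undercloud VM: default route back through the seed.
  ip route add default via 192.0.2.33 dev eth0

  # Observation points for an ICMP echo from the host (192.168.122.1)
  # to the undercloud VM (192.0.2.34), matching the description above:
  tcpdump -ni virbr0 icmp   # on the host           - request seen
  tcpdump -ni eth0 icmp     # inside the seed       - request seen
  tcpdump -ni eth1 icmp     # inside the seed       - request seen
  tcpdump -ni br99 icmp     # on the host           - request seen
  tcpdump -ni eth0 icmp     # inside the undercloud - nothing arrives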