Re: [Openstack] A Grizzly GRE failure [SOLVED]
I've had a terrible time getting the community to help me with this problem. So special thanks to Darragh O'Reilly, and to rkeene on #openstack, who was mean and a bit of a wisenheimer (I'd use different words elsewhere), but at least he talked to me and got me to think twice about my GRE setup. But enough of that: the problem is solved and a bug report has been submitted: https://bugs.launchpad.net/quantum/+bug/1179223. I added an "s" to the front of "persists" in the subject, but whatever. I always leave one thing in the hotel room, and I always leave one embarrassing typo.

Here's the part explaining how it was fixed:

SOLUTION:

mysql> delete from ovs_tunnel_endpoints where id = 1;
Query OK, 1 row affected (0.00 sec)

mysql> select * from ovs_tunnel_endpoints;
+-----------------+----+
| ip_address      | id |
+-----------------+----+
| 192.168.239.110 |  3 |
| 192.168.239.114 |  4 |
| 192.168.239.115 |  5 |
| 192.168.239.99  |  2 |
+-----------------+----+
4 rows in set (0.00 sec)

* After doing that, I simply restarted the quantum OVS agents on the network and compute nodes. The old GRE tunnel is not re-created. Thereafter, VM network traffic to and from the external network proceeds without incident.

* Should these tables be cleaned up as well, I wonder:

mysql> select * from ovs_network_bindings;
+--------------------------------------+--------------+------------------+-----------------+
| network_id                           | network_type | physical_network | segmentation_id |
+--------------------------------------+--------------+------------------+-----------------+
| 4e8aacca-8b38-40ac-a628-18cac3168fe6 | gre          | NULL             |               2 |
| af224f3f-8de6-4e0d-b043-6bcd5cb014c5 | gre          | NULL             |               1 |
+--------------------------------------+--------------+------------------+-----------------+
2 rows in set (0.00 sec)

mysql> select * from ovs_tunnel_allocations where allocated != 0;
+-----------+-----------+
| tunnel_id | allocated |
+-----------+-----------+
|         1 |         1 |
|         2 |         1 |
+-----------+-----------+
2 rows in set (0.00 sec)

Cheers, and happy openstacking. Even you, rkeene!

--Greg Chavez

On Sat, May 11, 2013 at 2:28 PM, Greg Chavez greg.cha...@gmail.com wrote:
...

On Fri, May 10, 2013 at 7:17 PM, Darragh O'Reilly dara2002-openst...@yahoo.com wrote:

I'm not sure how to rectify that. You may have to delete the bad row from the DB and restart the agents: ...
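For anyone hitting the same stale-endpoint problem, the fix above condenses into a short sequence. This is a sketch, not a definitive recipe: it assumes the Ubuntu Grizzly service name quantum-plugin-openvswitch-agent, a database named quantum, and that the stale row really is id 1 on your system — check the table before deleting anything.

```shell
# On the node running MySQL: confirm which endpoint row is stale,
# then delete it (id 1 here matches the broken 192.168.241.99 entry).
mysql -u root -p quantum -e "select * from ovs_tunnel_endpoints;"
mysql -u root -p quantum -e "delete from ovs_tunnel_endpoints where id = 1;"

# On the network node and every compute node: restart the OVS agent
# so it rebuilds its tunnel ports from the cleaned-up table.
service quantum-plugin-openvswitch-agent restart

# Verify on each node that the stale tunnel port is gone from br-tun.
ovs-vsctl list-ports br-tun
```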
Re: [Openstack] A Grizzly GRE failure
So to be clear:

* I have three NICs on my network node. The VM traffic goes out the 1st NIC on 192.168.239.99/24 to the other compute nodes, while management traffic goes out the 2nd NIC on 192.168.241.99. The 3rd NIC is external and has no IP.

* I have four GRE endpoints on the VM network, one at the network node (192.168.239.99) and three on compute nodes (192.168.239.{110,114,115}), all with IDs 2-5.

* I have a fifth GRE endpoint with id 1 to 192.168.241.99, the network node's management interface. This was the first tunnel created when I deployed the network node, because that is how I had set the remote_ip in the OVS plugin ini. I corrected the setting later, but the 192.168.241.99 endpoint persists and, as your response implies, *this extraneous endpoint is the cause of my troubles*.

My next question then is: what is happening? My guess:

* I ping a guest from the external network using its floater (10.21.166.4).

* It gets NAT'd at the tenant router on the network node to 192.168.252.3, at which point an ARP request is sent over the unified GRE broadcast domain.

* On a compute node, the ARP request is received by the VM, which then sends a reply to the tenant router's MAC (which I verified with tcpdumps).

* There are four endpoints for the packet to go down:

    Bridge br-tun
        Port br-tun
            Interface br-tun
                type: internal
        Port gre-1
            Interface gre-1
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip=192.168.241.99}
        Port gre-4
            Interface gre-4
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip=192.168.239.114}
        Port gre-3
            Interface gre-3
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip=192.168.239.110}
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port gre-2
            Interface gre-2
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip=192.168.239.99}

Here's where I get confused. Does it know that gre-1 is a different broadcast domain than the others, or does it see all the endpoints as the same domain? What happens here? Is this the cause of my network timeouts on external connections to the VMs? Does this also explain the sporadic nature of the timeouts, why they aren't consistent in frequency or duration? Finally, what happens when I remove the oddball endpoint from the DB? Sounds risky!

Thanks for your help

--Greg Chavez

On Fri, May 10, 2013 at 7:17 PM, Darragh O'Reilly dara2002-openst...@yahoo.com wrote:

I'm not sure how to rectify that. You may have to delete the bad row from the DB and restart the agents:

mysql> use quantum;
mysql> select * from ovs_tunnel_endpoints;
...

On Fri, May 10, 2013 at 6:43 PM, Greg Chavez greg.cha...@gmail.com wrote:

...
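One way to answer the broadcast-domain question above is to inspect the OpenFlow rules on br-tun directly instead of guessing. A sketch, assuming the standard Grizzly OVS agent flow layout (these are real ovs-ofctl subcommands, run on the network node):

```shell
# Dump the flow table on br-tun; the flood entries for unknown/broadcast
# destinations list the output:N actions, i.e. which tunnel ports a
# packet is replicated down -- if gre-1 appears there alongside the
# others, OVS is treating all five endpoints as one domain.
ovs-ofctl dump-flows br-tun

# Map OpenFlow port numbers back to port names (gre-1, gre-2, ...) so
# the output:N actions above can be interpreted.
ovs-ofctl show br-tun
```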
[Openstack] A Grizzly GRE failure
I'm refactoring my question once again (see "A Grizzly arping failure" and "Failure to arp by quantum router"). Quickly: the problem is in a multi-node Grizzly+Raring setup with a separate network node and a dedicated VLAN for VM traffic. External connections time out within a minute and don't resume until traffic is initiated from the VM.

I got some rather annoying and hostile assistance just now on IRC, and while it didn't result in a fix, it got me to realize that the problem is possibly with my GRE setup. I made a mistake when I originally set this up, assigning the mgmt interface of the network node (192.168.241.99) as its GRE remote_ip instead of the vm_config network interface (192.168.239.99). I realized my mistake, reconfigured the OVS plugin on the network node, and moved on. But now, taking a look at my OVS bridges on the network node, I see that the old remote IP is still there!

    Bridge br-tun
        [snip]
        Port gre-1
            Interface gre-1
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip=192.168.241.99}
        [snip]

This is also on all the compute nodes. (Full ovs-vsctl show output here: http://pastebin.com/xbre1fNV)

What's more, I have this error every time I restart OVS:

2013-05-10 18:21:24 ERROR [quantum.agent.linux.ovs_lib] Unable to execute ['ovs-vsctl', '--timeout=2', 'add-port', 'br-tun', 'gre-5'].
Exception:
Command: ['sudo', 'quantum-rootwrap', '/etc/quantum/rootwrap.conf', 'ovs-vsctl', '--timeout=2', 'add-port', 'br-tun', 'gre-5']
Exit code: 1
Stdout: ''
Stderr: 'ovs-vsctl: cannot create a port named gre-5 because a port named gre-5 already exists on bridge br-tun\n'

Could that be because gre-1 is vestigial and possibly fouling up the works by creating two possible paths for VM traffic? Is it as simple as removing it with ovs-vsctl, or is something else required? Or is it actually needed for some reason? Argh... help!
--
\*..+.-
--Greg Chavez
+//..;};

___
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp
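As the SOLVED follow-up in this thread shows, removing the port with ovs-vsctl alone isn't enough: the agent appears to re-create tunnel ports from the ovs_tunnel_endpoints table, so the stale DB row has to be deleted as well. For reference, removing the port by hand would look like this (a sketch; run on each node that carries the vestigial port):

```shell
# Delete the vestigial tunnel port from br-tun; --if-exists makes the
# command a no-op on nodes where the port is already absent.
ovs-vsctl --if-exists del-port br-tun gre-1

# Without the DB cleanup described in the SOLVED message, the quantum
# OVS agent will likely just re-add gre-1 on its next resync.
```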
Re: [Openstack] A Grizzly GRE failure
I'm not sure how to rectify that. You may have to delete the bad row from the DB and restart the agents:

mysql> use quantum;
mysql> select * from ovs_tunnel_endpoints;
...