Re: [openstack-dev] [neutron] VXLAN with single-NIC compute nodes: Avoiding the MTU pitfalls

2015-03-12 Thread Fredy Neeser


On 11.03.2015 19:31, Ian Wells wrote:
On 11 March 2015 at 04:27, Fredy Neeser fredy.nee...@solnet.ch wrote:


7: br-ex.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.1
       valid_lft forever preferred_lft forever

8: br-ex.12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc noqueue state UNKNOWN group default
    link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.12
       valid_lft forever preferred_lft forever


I find it hard to believe that you want the same address configured on 
*both* of these interfaces - which one do you think will be sending 
packets?


Ian, thanks for your feedback!

I did choose the same address for the two interfaces, for three reasons:

1.  Within my home single-LAN (underlay) environment, traffic is 
switched, and VXLAN traffic is confined to VLAN 12, so there is never a 
conflict between IP 192.168.1.14 on VLAN 1 and the same IP on VLAN 12.
OTOH, for a more scalable VXLAN setup (with multiple underlays and L3 
routing in between), I would like to use different IPs for br-ex.1 and 
br-ex.12 -- for example by using separate subnets

  192.168.1.0/26  for VLAN 1
  192.168.12.0/26  for VLAN 12
However, I'm not quite there yet (see 3.).

2.  I'm using policy routing on my hosts to steer VXLAN traffic (UDP 
dest. port 4789) to interface br-ex.12 --  all other traffic from 
192.168.1.14 is source routed from br-ex.1, presumably because br-ex.1 
is a lower-numbered interface than br-ex.12  (?) -- interesting question 
whether I'm relying here on the order in which I created these two 
interfaces.


  [root@langrain ~]# ip a
  ...
  7: br-ex.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
      link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
      inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.1
         valid_lft forever preferred_lft forever
  8: br-ex.12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc noqueue state UNKNOWN group default
      link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
      inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.12
         valid_lft forever preferred_lft forever

3.  It's not clear to me how to set up multiple nodes with packstack if a 
node's tunnel IP does not equal its admin IP (or the OpenStack API IP in 
case of a controller node).  With packstack, I can only specify the 
compute node IPs through CONFIG_COMPUTE_HOSTS. Presumably, these IPs are 
used for both packstack deployment (admin IP) and for configuring the 
VXLAN tunnel IPs (local_ip and remote_ip parameters).  How would I 
specify different IPs for these purposes? (Recall that my hosts have a 
single NIC).



In any case, native traffic on bridge br-ex is sent via br-ex.1 (VLAN 
1), which is also the reason the Neutron gateway port qg-XXX needs to be 
an access port for VLAN 1 (tag: 1).   VXLAN traffic is sent from 
br-ex.12 on all compute nodes.  See the 2 cases below:



Case 1: Max-size ping from compute node 'langrain' (192.168.1.14) to
another host on same LAN
 => Native traffic sent from br-ex.1; no traffic sent from br-ex.12


[fn@langrain ~]$ ping -M do -s 1472 -c 1 192.168.1.54
PING 192.168.1.54 (192.168.1.54) 1472(1500) bytes of data.
1480 bytes from 192.168.1.54: icmp_seq=1 ttl=64 time=0.766 ms

[root@langrain ~]# tcpdump -n -i br-ex.1 dst 192.168.1.54
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-ex.1, link-type EN10MB (Ethernet), capture size 65535 bytes
10:32:37.666572 IP 192.168.1.14 > 192.168.1.54: ICMP echo request, id 10432, seq 1, length 1480
10:32:42.673665 ARP, Request who-has 192.168.1.54 tell 192.168.1.14, length 28
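
The payload size of 1472 bytes passed to ping follows from the 1500-byte MTU of br-ex.1: `-s` sets only the ICMP payload, and the ICMP echo header (8 bytes) plus the IPv4 header (20 bytes, assuming no IP options) are added on top. That is also why tcpdump reports an ICMP length of 1480. A minimal Python sketch of the arithmetic (the function name is hypothetical, used only for illustration):

```python
IPV4_HEADER = 20  # bytes, assuming an IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request/reply header


def max_ping_payload(mtu: int) -> int:
    """Largest `ping -s` value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


if __name__ == "__main__":
    payload = max_ping_payload(1500)
    # ICMP length as shown by tcpdump = payload + ICMP header
    print(payload, payload + ICMP_HEADER)
```

With `-M do` (don't fragment), a payload one byte larger than this value should fail locally instead of being fragmented, which is what makes the test a useful MTU probe.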



Case 2: Max-size ping from a guest1 (10.0.0.1) on compute node
'langrain' (192.168.1.14) to a guest2 (10.0.0.3) on another compute node
(192.168.1.21) via VXLAN tunnel.

 Guests are on the same virtual network 10.0.0.0/24
 => Encapsulated traffic sent from br-ex.12; no traffic sent from br-ex.1


$ ping -M do -s 1472 -c 1 10.0.0.3
PING 10.0.0.3 (10.0.0.3) 1472(1500) bytes of data.
1480 bytes from 10.0.0.3: icmp_seq=1 ttl=64 time=2.22 ms

[root@langrain ~]# tcpdump -n -i br-ex.12
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-ex.12, link-type EN10MB (Ethernet), capture size 65535 bytes

11:02:56.916265 IP 192.168.1.14.47872 > 192.168.1.21.4789: VXLAN, flags [I] (0x08), vni 10
ARP, Request who-has 10.0.0.3 tell 10.0.0.1, length 28
11:02:56.916991 IP 192.168.1.21.51408 > 192.168.1.14.4789: VXLAN, flags [I] (0x08), vni 10
ARP, Reply 10.0.0.3 is-at fa:16:3e:e6:e1:c8, length 28
11:02:56.917282 IP

[openstack-dev] [neutron] VXLAN with single-NIC compute nodes: Avoiding the MTU pitfalls

2015-03-06 Thread Fredy Neeser

Hello world

I recently created a VXLAN test setup with single-NIC compute nodes 
(using OpenStack Juno on Fedora 20), consciously ignoring the OpenStack 
advice of using nodes with at least 2 NICs ;-) .


The fact that both native and encapsulated traffic need to pass through 
the same NIC does create some interesting challenges, but I finally got 
it working cleanly, staying clear of MTU pitfalls ...


I documented my findings here:

  [1] 
http://blog.systemathic.ch/2015/03/06/openstack-vxlan-with-single-nic-compute-nodes/
  [2] 
http://blog.systemathic.ch/2015/03/05/openstack-mtu-pitfalls-with-tunnels/


For those interested in single-NIC setups, I'm curious what you think 
about [1]  (a small patch is needed to add VLAN awareness to the 
qg-XXX Neutron gateway ports).



While catching up with Neutron changes for OpenStack Kilo, I came across 
the in-progress work on MTU selection and advertisement:


  [3]  Spec: 
https://github.com/openstack/neutron-specs/blob/master/specs/kilo/mtu-selection-and-advertisement.rst

  [4]  Patch review:  https://review.openstack.org/#/c/153733/
  [5]  Spec update:  https://review.openstack.org/#/c/159146/

Seems like [1] eliminates some additional MTU pitfalls that are not 
addressed by [3-5].


But I think it would be nice if we could achieve [1] while coordinating 
with the MTU selection and advertisement work [3-5].


Thoughts?

Cheers,
- Fredy

Fredy (Freddie) Neeser
http://blog.systeMathic.ch


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] VXLAN with single-NIC compute nodes: Avoiding the MTU pitfalls

2015-03-11 Thread Fredy Neeser

[resent with a clarification of what [6] is doing towards EoM]


OK, I looked at the devstack patch

   [6] Configure mtu for ovs with the common protocols

but no -- it doesn't do the job for the VLAN-based separation
of native and encapsulated traffic, which I'm using in [1] for
a clean (correct MTUs ...) VXLAN setup with single-NIC compute nodes.

As shown in Figure 2 of [1], I'm using VLANs 1 and 12 for native
and encapsulated traffic, respectively.  I needed to manually
create br-ex ports br-ex.1 (VLAN 1) and br-ex.12 (VLAN 12) and
configure their MTUs.  Moreover, I needed a small VLAN awareness
patch to ensure that the Neutron router gateway port qg-XXX uses VLAN 1.


Consider the example below:

<Example>

# ip a
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc pfifo_fast master ovs-system state UP group default qlen 1000
    link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
...
6: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc noqueue state UNKNOWN group default
    link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff

7: br-ex.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.1
       valid_lft forever preferred_lft forever

8: br-ex.12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc noqueue state UNKNOWN group default
    link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.12
       valid_lft forever preferred_lft forever


# ovs-vsctl show
c0618b20-1eeb-486c-88bd-fb96988dbf96
    Bridge br-tun
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port br-tun
            Interface br-tun
                type: internal
        Port vxlan-c0a80115
            Interface vxlan-c0a80115
                type: vxlan
                options: {df_default=true, in_key=flow, local_ip=192.168.1.14, out_key=flow, remote_ip=192.168.1.21}

    Bridge br-ex
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port br-ex.12
            tag: 12
            Interface br-ex.12
                type: internal
        Port br-ex.1
            tag: 1
            Interface br-ex.1
                type: internal
        Port eth0
            tag: 1
            trunks: [1, 12]
            Interface eth0
        Port qg-e046ec4e-e3
            tag: 1
            Interface qg-e046ec4e-e3
                type: internal
        Port br-ex
            Interface br-ex
                type: internal

</Example>


My home LAN (external network) is enabled for Jumbo frames, as can be
seen from eth0's MTU of 1554, so my path MTU for VXLAN is 1554 bytes.

VLAN 12 supports encapsulated traffic with an MTU of 1554 bytes.

This allows my VMs to use the standard MTU of 1500 regardless
of whether they are on different compute nodes (so they communicate
via VXLAN) or on the same compute node, i.e., the effective
L2 segment MTU is 1500 bytes.  Because this is the default,
I don't need to change guest MTUs at all.
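
The relation between the 1554-byte underlay MTU and the 1500-byte guest MTU can be checked with a little arithmetic. The thread does not itemize the 54 bytes; the sketch below assumes the usual VXLAN-over-IPv4 header sizes plus room for a 4-byte 802.1Q tag on the inner frame, which is one reading that matches the numbers above. The function name is hypothetical.

```python
# Per-packet overhead that VXLAN adds around the guest's IP packet
# (IPv4 underlay). Header sizes are the standard ones; counting an
# inner 802.1Q tag is an assumption made for this sketch.
INNER_ETHERNET = 14  # inner Ethernet header carried inside the tunnel
INNER_VLAN_TAG = 4   # optional 802.1Q tag on the inner frame
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20      # assuming no IP options


def underlay_mtu_needed(guest_mtu: int, inner_vlan_tag: bool = True) -> int:
    """IP MTU the tunnel interface needs so guests can keep `guest_mtu`."""
    overhead = INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4
    if inner_vlan_tag:
        overhead += INNER_VLAN_TAG
    return guest_mtu + overhead


if __name__ == "__main__":
    print(underlay_mtu_needed(1500))                        # with inner VLAN tag
    print(underlay_mtu_needed(1500, inner_vlan_tag=False))  # without
```

Read the other way around, an underlay limited to 1500 bytes would leave guests with at most 1450 bytes (or 1446 with the inner tag), which is the situation the Jumbo-frame setup avoids.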

For bridge br-ex, I configured two internal ports br-ex.{1,12}
as shown in the table below:

  br-ex Port       VLAN   MTU    Remarks
  ------------------------------------------------------------
  br-ex.1          1      1500
  br-ex.12         12     1554
  br-ex                          Unused
  qg-e046ec4e-e3   1             VLAN awareness patch (cf. [1])


All native traffic (including routed traffic to/from a Neutron router
and traffic generated by the Network Node itself) uses VLAN 1 on
my LAN, with an MTU of 1500 bytes.

For my small VXLAN test setup, I didn't need to assign different IPs to
br-ex.1 and br-ex.12; both are assigned 192.168.1.14/24.

So why doesn't [6] do the right thing? --

Well, obviously [6] does not add the VLAN awareness that I need
for the Neutron qg-XXX gateway ports.

Moreover, [6] tries to auto-configure the L2 segment MTU based on
guessing the path MTU by determining the MTU of an interface associated
with $TUNNEL_ENDPOINT_IP, which is 192.168.1.14 in my case.

It does this essentially by querying

  # ip -o address | awk "/192.168.1.14/ {print \$2}"

getting the MTU of that interface and then subtracting out the overhead
for VXLAN encapsulation.

However, in my case, the above lookup would return *two* interfaces:
  br-ex.1
  br-ex.12
so the patch [6] wouldn't know which interface's MTU it should take.
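
The ambiguity can be reproduced without touching a real host. The sketch below mimics the awk lookup on sample text shaped like `ip -o address` output; the sample lines are illustrative rather than captured from the host above, and both function names are hypothetical. Note that the substring test only approximates awk's regex match.

```python
# Illustrative `ip -o address`-style lines (one address per line,
# interface name in the second field). Not captured output.
SAMPLE = """\
1: lo    inet 127.0.0.1/8 scope host lo
7: br-ex.1    inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.1
8: br-ex.12    inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.12
"""


def interfaces_with_ip(ip_o_output: str, ip: str) -> list[str]:
    """Rough equivalent of: awk "/<ip>/ {print \\$2}" (substring match)."""
    names = []
    for line in ip_o_output.splitlines():
        fields = line.split()
        if ip in line and len(fields) >= 2:
            names.append(fields[1])
    return names


def pick_tunnel_interface(ip_o_output: str, ip: str) -> str:
    """Return the single interface carrying `ip`, or fail loudly."""
    names = interfaces_with_ip(ip_o_output, ip)
    if len(names) != 1:
        raise ValueError(f"expected one interface for {ip}, found {names}")
    return names[0]


if __name__ == "__main__":
    print(interfaces_with_ip(SAMPLE, "192.168.1.14"))
```

A lookup that refuses to guess when more than one interface matches at least makes the problem visible; choosing the right MTU still requires knowing which interface carries the tunnel traffic.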

Also, when I'm doing VLAN-based traffic separation for an overlay
setup using single-NIC nodes, then I already know both the
L3 path MTU and the desired L2 segment MTU.


I'm currently checking if the MTU selection and advertisement patches
[3-5] are compatible with the VLAN-based traffic separation [1].


Regards

Fredy Neeser
http://blog.systeMathic.ch


On 06.03.2015 18:37, Attila Fazekas wrote:


Can

Re: [openstack-dev] [neutron] VXLAN with single-NIC compute nodes: Avoiding the MTU pitfalls

2015-03-11 Thread Fredy Neeser

OK, I looked at the devstack patch

   [6] Configure mtu for ovs with the common protocols

but no -- it doesn't do the job for the VLAN-based separation
of native and encapsulated traffic, which I'm using in [1] for
a clean (correct MTUs ...) VXLAN setup with single-NIC compute nodes.

As shown in Figure 2 of [1], I'm using VLANs 1 and 12 for native
and encapsulated traffic, respectively.  I needed to manually
create br-ex ports br-ex.1 (VLAN 1) and br-ex.12 (VLAN 12) and
configure their MTUs.  Moreover, I needed a small VLAN awareness
patch to ensure that the Neutron router gateway port qg-XXX uses VLAN 1.


Consider the example below:

<Example>

# ip a
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc pfifo_fast master ovs-system state UP group default qlen 1000
    link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
...
6: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc noqueue state UNKNOWN group default
    link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff

7: br-ex.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.1
       valid_lft forever preferred_lft forever

8: br-ex.12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc noqueue state UNKNOWN group default
    link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.12
       valid_lft forever preferred_lft forever


# ovs-vsctl show
c0618b20-1eeb-486c-88bd-fb96988dbf96
    Bridge br-tun
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port br-tun
            Interface br-tun
                type: internal
        Port vxlan-c0a80115
            Interface vxlan-c0a80115
                type: vxlan
                options: {df_default=true, in_key=flow, local_ip=192.168.1.14, out_key=flow, remote_ip=192.168.1.21}

    Bridge br-ex
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port br-ex.12
            tag: 12
            Interface br-ex.12
                type: internal
        Port br-ex.1
            tag: 1
            Interface br-ex.1
                type: internal
        Port eth0
            tag: 1
            trunks: [1, 12]
            Interface eth0
        Port qg-e046ec4e-e3
            tag: 1
            Interface qg-e046ec4e-e3
                type: internal
        Port br-ex
            Interface br-ex
                type: internal

</Example>


My home LAN (external network) is enabled for Jumbo frames, as can be
seen from eth0's MTU of 1554, so my path MTU for VXLAN is 1554 bytes.

VLAN 12 supports encapsulated traffic with an MTU of 1554 bytes.

This allows my VMs to use the standard MTU of 1500 regardless
of whether they are on different compute nodes (so they communicate
via VXLAN) or on the same compute node, i.e., the effective
L2 segment MTU is 1500 bytes.  Because this is the default,
I don't need to change guest MTUs at all.

For bridge br-ex, I configured two internal ports br-ex.{1,12}
as shown in the table below:

  br-ex Port       VLAN   MTU    Remarks
  ------------------------------------------------------------
  br-ex.1          1      1500
  br-ex.12         12     1554
  br-ex                          Unused
  qg-e046ec4e-e3   1             VLAN awareness patch (cf. [1])


All native traffic (including routed traffic to/from a Neutron router
and traffic generated by the Network Node itself) uses VLAN 1 on
my LAN, with an MTU of 1500 bytes.

For my small VXLAN test setup, I didn't need to assign different IPs to
br-ex.1 and br-ex.12; both are assigned 192.168.1.14/24.

So why doesn't [6] do the right thing? --

Well, obviously [6] does not add the VLAN awareness that I need
for the Neutron qg-XXX gateway ports.
Moreover, [6] tries to auto-configure the L2 segment MTU based on
determining the MTU of an interface associated with
$TUNNEL_ENDPOINT_IP, which is 192.168.1.14 in my case.

It does this essentially by querying

  # ip -o address | awk "/192.168.1.14/ {print \$2}"

However, in my case, this would return *two* interfaces:
  br-ex.1
  br-ex.12
so the patch [6] wouldn't know which interface's MTU it should take.


I'm currently checking if the MTU selection and advertisement patches
[3-5] are compatible with the VLAN-based traffic separation [1].


Regards

Fredy Neeser
http://blog.systeMathic.ch


On 06.03.2015 18:37, Attila Fazekas wrote:


Can you check if this patch does the right thing [6]:

[6] https://review.openstack.org/#/c/112523/6

- Original Message -

From: Fredy Neeser fredy.nee...@solnet.ch
To: openstack-dev@lists.openstack.org
Sent: Friday, March 6, 2015 6:01:08 PM
Subject: [openstack-dev] [neutron] VXLAN with single-NIC compute nodes:  
Avoiding the MTU pitfalls

Hello world

I recently