Re: tpmr(4): 802.1Q Two-Port MAC Relay

2019-07-30 Thread David Gwynne




Re: tpmr(4): 802.1Q Two-Port MAC Relay

2019-07-30 Thread Remi Locherer

Re: tpmr(4): 802.1Q Two-Port MAC Relay

2019-07-29 Thread David Gwynne
On Tue, Jul 30, 2019 at 01:36:59PM +1000, David Gwynne wrote:
> a Two-Port MAC Relay is basically a cut down bridge(4). it only supports
> two ports, and unconditionally relays packets between those ports
> instead of doing learning or anything like that.

i had written a manpage too:


TPMR(4)  Device Drivers Manual TPMR(4)

NAME
 tpmr - IEEE 802.1Q Two-Port MAC Relay interface

SYNOPSIS
 pseudo-device tpmr

DESCRIPTION
 The tpmr driver implements an 802.1Q (originally 802.1aj) Two-Port MAC
 Relay (TPMR), otherwise known as an Ethernet cross-connect, or layer 2
 circuit.

 A TPMR is a simplified Ethernet bridge that provides a subset of the
 functionality as found in bridge(4).  A TPMR has exactly two ports, and
 unconditionally relays Ethernet packets between the two ports.

 tpmr interfaces can be created at runtime using the ifconfig tpmrN create
 command or by setting up a hostname.if(5) configuration file for
 netstart(8).  The interface itself can be configured with ifconfig(8);
 see its manual page for more information.

 tpmr interfaces may be configured with ifconfig(8) and netstart(8) using
 the following options:

 trunkport child-iface
         Add child-iface as a port.

 -trunkport child-iface
         Remove the port child-iface.

 Other forms of Ethernet bridging are available using the bridge(4)
 driver.  Other forms of aggregation of Ethernet interfaces are available
 using the aggr(4) and trunk(4) drivers.

EXAMPLES
 tpmr can be used to cross-connect Ethernet devices that support different
 physical media.  For example, a device that supports a 100baseTX half-
 duplex connection can be connected to a switch with 1000baseSX optical
 ports by using tpmr with a pair of physical network interfaces, each of
 which supports the required media types.  If fxp(4) is used to connect to
 the 100baseTX device, and em(4) is used to connect to the 1000baseSX
 switch, the following configuration can be used:

 # ifconfig tpmr0 create
 # ifconfig tpmr0 trunkport fxp0 trunkport em0
 # ifconfig fxp0 up
 # ifconfig em0 up
 # ifconfig tpmr0 up

 Multiple TPMRs can be chained to transport Ethernet traffic for a pair of
 devices over another network.  Given two physically separate Ethernet
 switches, TPMRs can be used as follows to provide a point-to-point
 Ethernet link between them.  TPMRs allow the use of the Link Aggregation
 Control Protocol (LACP) or Spanning Tree Protocol (STP) by the switches
 to detect communication failures or connectivity loops respectively,
 which is not possible using bridge(4) as it filters those protocols.

 If Host A connected to Router B has the external IP address 192.0.2.10 on
 em0, Host D connected to Router C has the external IP address
 198.51.100.14 on em0, and both hosts have em1 connected to the switches,
 the following configuration can be used to connect the switches together.
 etherip(4) is used to transport the Ethernet packets over the IP network.

 Switch X ---- Host A ---------- tunnel ----------- Host D ---- Switch E
                \                                    /
                 \                                  /
                  +---- Router B ---- Router C ----+

 Create the tpmr and etherip(4) interfaces:

   # ifconfig etherip0 create
   # ifconfig tpmr0 create

 Configure the etherip interface:

   (on Host A) # ifconfig etherip0 tunnel 192.0.2.10 198.51.100.14 up
   (on Host D) # ifconfig etherip0 tunnel 198.51.100.14 192.0.2.10 up

 Add the etherip interface and physical interface to the TPMR:

   # ifconfig tpmr0 trunkport em1 trunkport etherip0 up

 An equivalent setup using MPLS pseudowires instead of IP as the transport
 can be built using mpw(4) interfaces.

SEE ALSO
 aggr(4), bridge(4), trunk(4), hostname.if(5), ifconfig(8), netstart(8)

HISTORY
 The tpmr driver first appeared in OpenBSD 6.6.

OpenBSD 6.5  July 5, 2019  OpenBSD 6.5


Index: Makefile
===================================================================
RCS file: /cvs/src/share/man/man4/Makefile,v
retrieving revision 1.716
diff -u -p -r1.716 Makefile
--- Makefile	5 Jul 2019 01:41:14 -0000	1.716
+++ Makefile	30 Jul 2019 04:10:34 -0000
@@ -70,8 +70,8 @@ MAN=  aac.4 abcrtc.4 ac97.4 acphy.4 acrtc
    st.4 ste.4 stge.4 sti.4 stp.4 sv.4 switch.4 sxiccmu.4 sximmc.4 \
    sxipio.4 sxirsb.4 sxirtc.4 sxitemp.4 sxitwi.4 sym.4 sypwr.4 syscon.4 \
    tcic.4 tcp.4 termios.4 tht.4 ti.4 tipmic.4 tl.4 \
-   tlphy.4 thmc.4 tpm.4 tqphy.4 trm.4 trunk.4 tsl.4 tty.4 tun.4 tap.4 \
-   twe.4 \
+   tlphy.4 thmc.4 tpm.4 tpmr.4 tqphy.4 trm.4 trunk.4 tsl.4 tty.4 \
+   tun.4 tap.4 twe.4 \
    txp.4 txphy.4 uaudio.4 uark

tpmr(4): 802.1Q Two-Port MAC Relay

2019-07-29 Thread David Gwynne
a Two-Port MAC Relay is basically a cut down bridge(4). it only supports
two ports, and unconditionally relays packets between those ports
instead of doing learning or anything like that.

i've been trying to get a redundant pair of bridges set up between two
datacenters here to help me while i migrate between them. so far all my
efforts to make it redundant have mostly worked, until they introduced
loops in the layer 2 topology, which generates a broadcast storm, which
basically takes the net down for a few minutes at a time. it feels
very betraying.

my frustration is that switches plugged together have mechanisms to
prevent loops like that, more specifically they use spanning tree or
lacp to make appropriate use of redundant links. i got to a point where
i just wanted the switches to talk to each other and do their own thing
to negotiate use of the redundant links.

unfortunately the only way to get ethernet packets off a physical
wire and onto a tunnel over an ip network is bridge(4), and bridge(4)
tries to be a compliant switch from a standards point of view. this
means it intercepts packets that are meant to be processed by bridges,
because it is a bridge. these types of packets include spanning tree and
lacp, which means i couldn't get the physical switches at each site to
talk to each other. sadface.
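
for illustration, here is a minimal sketch of the kind of check involved. it is not bridge(4)'s actual code, and the function name and layout are made up for the example. 802.1D/802.1Q reserve the group addresses 01:80:c2:00:00:00 through 01:80:c2:00:00:0f for protocols that a bridge terminates itself instead of forwarding; spanning tree bpdus are sent to :00 and lacp frames to :02.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN	6

/* first five bytes of the 802.1D/802.1Q reserved group address block */
static const uint8_t reserved_prefix[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };

/*
 * hypothetical helper: returns true if dst falls inside
 * 01:80:c2:00:00:00 - 01:80:c2:00:00:0f, i.e. a frame that a
 * standards-compliant bridge keeps to itself rather than relaying.
 */
static bool
is_bridge_reserved(const uint8_t dst[ETHER_ADDR_LEN])
{
	assert(dst != NULL);

	return (memcmp(dst, reserved_prefix, sizeof(reserved_prefix)) == 0 &&
	    (dst[5] & 0xf0) == 0x00);
}
```

a relay that skips this check passes both spanning tree and lacp frames through untouched, which is the behaviour wanted here.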

so to solve my problem i hacked up a small driver that did less than
bridge(4). however, it turns out that what i hacked up is an actual
thing that already exists as something done in the real world. IEEE
802.1Q describes TPMR, which is defined as intercepting far less
than a real bridge does. one of the appendices specifically describes
lacp going through one, which is exactly what i wanted. cisco does
something like this with their layer 2 cross-connects (search for cisco
xconnect for examples), juniper has l2circuits, and so on.

the way i'm using this is like below. i have a pair of bridges in each
datacenter, so 4 boxes in total. they peer directly with the ip network
that sits between the datacenters. each box has 4 physical network
ports. 2 of those ports are configured with aggr(4) and talk IP into the
core network. the other two ports are connected to the switches at
each site for use with tpmr. there's 2 etherip interfaces configured on
each physical box, each of which is connected to the tpmr.

all that together looks a bit like the following:

 +-+ +--------------------------+      +---------------------------+ +-+
 |d|-|ix2 <-> tpmr0 <-> etherip0|------|etherip0 <-> tpmr0 <-> ixl0|-|d|
 |c| |                          |      |                           | |c|
 |0|-|ix3 <-> tpmr1 <-> etherip1|------|etherip1 <-> tpmr1 <-> ixl1|-|1|
 ||| +--------------------------+ \  / +---------------------------+ |||
 |s|         dc0-bridge0           \/          dc1-bridge0           |s|
 |w|                               /\                                |w|
 |i| +--------------------------+ /  \ +---------------------------+ |i|
 |t|-|ix2 <-> tpmr0 <-> etherip0|------|etherip0 <-> tpmr0 <-> ixl0|-|t|
 |c| |                          |      |                           | |c|
 |h|-|ix3 <-> tpmr1 <-> etherip1|------|etherip1 <-> tpmr1 <-> ixl1|-|h|
 +-+ +--------------------------+      +---------------------------+ +-+
             dc0-bridge1                       dc1-bridge1

each switch has a 4 port port-channel (lacp aggregation) set up. because
each physical interface on the bridges is tied to a single tunnel, the
packets effectively traverse a point-to-point link, ie, a really
complicated wire. because lacp makes it from each point to the other
point, the switches make sure only active lacp ports are used, which
avoids layer 2 loops. lacp also means i get to use all the links when
they're available.

with the topology above i can lose a bridge at each site and should
still have a working link to the other side, so i get my redundancy. the
use of the extra links with lacp is a bonus. at this point i would have
been happy for spanning tree to shut links down.

anyway, here's the code.

it was originally called xcon(4) since it provides a software
cross-connect, but i changed my mind after looking at 802.1Q. it might
be unfair to refer to 802.1Q because tpmr(4) does none of the filtering
that the spec says it should. i just needed it to work though.

the guts of it is tpmr_input(). it basically gets the rxed packet from
one port and enqueues it for transmission immediately on the other port.
it does run bpf though, and supports filtering on bpf, which has been
handy for us when we needed to test taking bpdus off the wire for a bit.
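
the relay step described above can be sketched in userland roughly as follows. this is not the driver's tpmr_input(); the struct names, the fixed-size queue, and relay_input() itself are assumptions made up for the example, and the bpf tap and filter are left out.

```c
#include <assert.h>
#include <stddef.h>

#define RELAY_NPORTS	2	/* a tpmr has exactly two ports */
#define RELAY_QLEN	8	/* arbitrary queue depth for the sketch */

struct frame {
	const void	*data;
	size_t		 len;
};

struct port {
	struct frame	 txq[RELAY_QLEN];	/* pending transmissions */
	size_t		 txq_len;
};

struct relay {
	struct port	 ports[RELAY_NPORTS];
};

/*
 * take a frame received on rx_port and queue it on the other port.
 * no address learning, no lookup, no reserved-address filtering.
 * returns 0 on success, -1 if the transmit queue is full.
 */
static int
relay_input(struct relay *r, unsigned int rx_port, struct frame f)
{
	struct port *tx;

	assert(rx_port < RELAY_NPORTS);

	tx = &r->ports[rx_port ^ 1];	/* 0 -> 1, 1 -> 0 */
	if (tx->txq_len == RELAY_QLEN)
		return (-1);

	tx->txq[tx->txq_len++] = f;
	return (0);
}
```

with only two ports the output port is always "the other one", so there is no forwarding table to consult or maintain.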

because it does such a small amount of work, it is relatively fast.
hrvoje popovski has given it a quick spin and seen the following
results on a fast box with a pair of ix(4) interfaces:

plain ip forwarding: 1.5Mpps
bridge(4) under load from 14Mpps: 500Kpps
bridge(4) under load from 1Mpps: 800Kpps
tpmr(4): 1.75Mpps
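
to put those figures in perspective, a small sketch that converts a packet rate into a per-packet time budget and compares two rates. the helper names are made up for the example; the only inputs are the rates listed above.

```c
#include <assert.h>

/* nanoseconds available to handle each packet at a given rate */
static double
ns_per_packet(double pps)
{
	assert(pps > 0.0);
	return (1e9 / pps);
}

/* how many times faster rate a is than rate b */
static double
speedup(double a_pps, double b_pps)
{
	assert(b_pps > 0.0);
	return (a_pps / b_pps);
}
```

ns_per_packet(1.75e6) comes to roughly 571ns per packet for tpmr(4), against 1250ns for bridge(4) at 800Kpps. speedup(1.75e6, 0.5e6) gives 3.5 for tpmr(4) over bridge(4) under the 14Mpps load, and speedup(1.75e6, 1.5e6) gives about 1.17 over plain ip forwarding.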

1.75Mpps was lower than I was expecting, but it turns ou