Re: mvsw(4): present each port as a separate interface

David Gwynne Mon, 30 May 2022 14:51:39 -0700

On Mon, May 30, 2022 at 09:14:47PM +0200, Mark Kettenis wrote:
> > Date: Sat, 28 May 2022 15:08:49 +1000
> > From: David Gwynne <[email protected]>
> > 
> > the espressobin is the least worst thing ive settled on. it's not
> > too expensive, it has a case, it has multiple interfaces, and
> > kettenis and patrick have already worked on the platform stuff. the
> > only problem with it was that the interfaces are on a switch chip
> > supported by mvsw, and mvsw configures the switch as a switch. this
> > makes it hard to use as a router.
> 
> So I have a Turris MOX that is based on the same SoC.  It is modular
> and I have a switch module that integrates the same Marvell switch
> chip.  That is the hardware I used to develop the current mvsw(4)
> driver.  The interesting thing is that the Turris MOX actually has two
> mvneta(4) interfaces; mvneta0 is connected to a normal Ethernet PHY,
> whereas mvneta1 is connected to the switch chip.  In this setup it
> makes sense to use the mvneta0 port as a "WAN" port and the switch
> ports as "LAN" ports and let the hardware do the switching
> between the "LAN" ports.


nice. i didn't realise you were using the mox in a load bearing
capacity already.

the way mvsw interacts with mvneta after this diff depends on what's
in the fdt. one of the mvsw port nodes should reference
the phandle of an ethernet interface, which is how the "cpu" ports
on the switch are identified. my diff also adds glue to of_misc.c
that allows drivers to register interfaces so they can be found by
node or phandle, and i've added that registration to mvneta. that
allows mvsw to find it later, and then steal it.
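
to make the lookup idea concrete, here's a minimal sketch in plain
userland c. it is not the code in the diff: the demo_* names and the
list layout are made up for illustration, and the real glue may look
quite different.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical stand-in for the kernel's struct ifnet */
struct demo_ifnet {
	const char		*name;
};

/* one registration: ties an fdt node and phandle to an interface */
struct demo_if_device {
	int			 node;
	uint32_t		 phandle;
	struct demo_ifnet	*ifp;
	struct demo_if_device	*next;
};

static struct demo_if_device *demo_if_devices;

/* called by an ethernet driver (eg, mvneta) once it has attached */
void
demo_if_register(struct demo_if_device *d)
{
	d->next = demo_if_devices;
	demo_if_devices = d;
}

/* called by the switch driver to resolve a port's ethernet reference */
struct demo_ifnet *
demo_if_byphandle(uint32_t phandle)
{
	struct demo_if_device *d;

	if (phandle == 0)
		return (NULL);

	for (d = demo_if_devices; d != NULL; d = d->next) {
		if (d->phandle == phandle)
			return (d->ifp);
	}

	return (NULL);
}

/* same lookup, keyed on the fdt node instead of the phandle */
struct demo_ifnet *
demo_if_bynode(int node)
{
	struct demo_if_device *d;

	for (d = demo_if_devices; d != NULL; d = d->next) {
		if (d->node == node)
			return (d->ifp);
	}

	return (NULL);
}
```

a lookup that returns NULL corresponds to the case where nothing has
registered for that phandle, so the switch driver has nothing to take
over.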

stealing it means it takes it over like aggr/trunk takes over an
interface. it replaces the input handler on mvneta (which is
ether_input) with mvsw_p_input, which allows the kernel to demux
packets coming from mvsw. mvsw itself is configured to add its dsa
tag to packets heading toward the cpu, which is how the kernel knows
which downstream/user port actually received the packet.
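
roughly, the demux step looks something like the sketch below. it's
standalone c rather than kernel code, and it assumes a 4 byte tag
sitting right after the two mac addresses, with the source port in
the top five bits of the second tag byte. that layout, the port count,
and the demo_* names are assumptions for illustration, not lifted from
the diff or the datasheet.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETHER_ADDR_LEN	6
#define DEMO_TAG_LEN		4	/* assumed tag size */
#define DEMO_NPORTS		8	/* assumed number of switch ports */

/* hypothetical per-port counters, standing in for mvsport interfaces */
static unsigned long demo_port_ipackets[DEMO_NPORTS];

/*
 * Work out which switch port a tagged frame arrived on, then strip
 * the tag in place so the frame looks like plain ethernet again.
 * Returns the port number, or -1 if the frame is too short or the
 * port is out of range. *lenp is updated on success.
 */
int
demo_dsa_demux(uint8_t *pkt, size_t *lenp)
{
	size_t hdrlen = 2 * DEMO_ETHER_ADDR_LEN;
	size_t len = *lenp;
	uint8_t *tag;
	unsigned int port;

	/* need both addresses, the tag, and the ethertype */
	if (len < hdrlen + DEMO_TAG_LEN + 2)
		return (-1);

	tag = pkt + hdrlen;
	port = (tag[1] >> 3) & 0x1f;
	if (port >= DEMO_NPORTS)
		return (-1);

	/* slide the ethertype and payload down over the tag */
	memmove(tag, tag + DEMO_TAG_LEN, len - hdrlen - DEMO_TAG_LEN);
	*lenp = len - DEMO_TAG_LEN;

	demo_port_ipackets[port]++;
	return ((int)port);
}
```

after this the frame can be handed to the normal ethernet input path
on behalf of the per-port interface instead of the underlying one.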

however, if the fdt doesnt have an ethernet port node for an mvneta
interface, mvneta is left alone. on the mox i think mvneta0 will
remain independent, and mvneta1 will get taken over by the switch.

> > this diff makes mvsw present its ports as individual interfaces,
> > and configures the switch so each external port can only talk to
> > the host system. it's basically an ethernet mux instead of a switch.
> > 
> > this is inspired by the dsa framework in linux, which in turn seems
> > to have been inspired by and written for the link street family of
> > switch chips thats in the espressobin.
> > 
> > like dsa, mvsw now tries to understand the different roles of switch
> > ports and their topology. most ports are externally accessible, and one
> > is wired up internally to mvneta. the fdt describes these roles and
> > relationships, and now mvsw does different things depending on these
> > roles.
> > 
> > the internal port wired up to mvneta is a "cpu" interface. mvsw now
> > takes over mvneta, much like how aggr or trunk takes over an interface,
> > and unconditionally configures the switch port to tag packets so the
> > kernel can know which mvsw port a packet was received on.
> > 
> > the externally accessible ports are enumerated and attached as separate
> > network interfaces in the kernel, and configured on the switch so that
> > they can only talk to the cpu/mvneta switch port. packets sent out
> > a kernel mvsport interface are tagged and sent out mvneta instead,
> > kind of like vlan interfaces.
> 
> How does configuring IP addresses work with your diff?  Do you still
> configure an address on the mvneta(4) interface?  Or do you configure
> addresses on the individual mvsport(4) interfaces?  And how does one
> configure additional VLANs in this scenario?

mvneta (or whatever ethernet interface is wired up to the switch)
with an mvsw on it is no longer usable for layer 3 in the kernel.
you just have to bring it up so packets can flow over it. the stack
operates on mvsport interfaces instead.

mvsw is configured to ignore vlans, so they get passed straight
from the port they're received on to the cpu, and vice versa.
my isp here uses pppoe over vlan 2, which works fine over an mvsport
interface.

ebin# for i in /etc/hostname.*; do echo == $i ==; cat $i; done
== /etc/hostname.mvneta0 ==
up
== /etc/hostname.mvsport1 ==
up
== /etc/hostname.mvsport2 ==
inet autoconf
ebin# ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 32768
        index 3 priority 0 llprio 3
        groups: lo
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet 127.0.0.1 netmask 0xff000000
mvneta0: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr f0:ad:4e:1c:08:42
        index 1 priority 0 llprio 3
        trunk: trunkdev mvsw0
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
enc0: flags=0<>
        index 2 priority 0 llprio 3
        groups: enc
        status: active
mvsport0: flags=8002<BROADCAST,MULTICAST> mtu 1500
        lladdr 00:51:82:11:22:03
        description: lan1
        index 4 priority 0 llprio 3
        media: Ethernet autoselect (none)
        status: no carrier
mvsport1: flags=8043<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        lladdr 00:51:82:11:22:02
        description: lan0
        index 5 priority 0 llprio 3
        media: Ethernet autoselect (none)
        status: no carrier
mvsport2: flags=808043<UP,BROADCAST,RUNNING,MULTICAST,AUTOCONF4> mtu 1500
        lladdr 00:51:82:11:22:01
        description: wan
        index 6 priority 0 llprio 3
        groups: egress
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet 192.168.1.212 netmask 0xffffff00 broadcast 192.168.1.255
pflog0: flags=141<UP,RUNNING,PROMISC> mtu 33136
        index 7 priority 0 llprio 3
        groups: pflog

> > there is no longer any config that allows mvsw switch ports to
> > communicate directly, if you want to bridge them now you have to
> > do it in software with veb (or bridge or tpmr).
> 
> So this would "break" my current setup on the Turris MOX.  Now
> actually I currently only use a single port on the switch, but my plan
> was to connect some of my equipment directly to the MOX instead of my
> 24-port switch.  I just didn't get around to doing that.

yes. right now you would need to configure veb/bridge and add the
mvsport interfaces to get l2 switching across all the ports. the
downside is the switching would be done in software, which is fine
if most of your traffic is north/south (hosts mostly talk to the
router), but sucks if it's east/west.

linux has integration between dsa and its bridge code, so adding
ports (eg, mvsport) to a bridge enables "offloading" of the
bridging to the switch chip. i intend to implement this in openbsd
too.

> > presenting the ports as separate interfaces allows them to fit in
> > nicely with the existing kernel functionality. in particular, it
> > lets us wire up ifmedia and mii so you can see the phy on each port,
> > which in turn will let ifconfig operate on them as you'd expect.
> 
> That *is* nice.

except dtucker has found that i break negotiation with 100Mbps
devices, so it needs some work. it looks like mvsw itself can handle
media, so maybe i dont have to wire the miibus up and present a
phy?

> > the diff below is very rough, but i think it's far enough along
> > that it demonstrates where im going. if the ideas are acceptable,
> > i'd like to commit it and hack on it in the tree.
> 
> I would like to understand things a bit better.  And make sure this
> doesn't break things too badly for me.  That Turris MOX is my firewall
> and wireless access point, so breaking things would be bad for OpenBSD
> development ;).  Unfortunately, it also means it isn't trivial for me
> to test things.

understood. i can keep hacking on it out of the tree, but beyond a
certain point it's nice to do small changes and explain them in
commit messages.

> > there's some future work to be done. the mdio bus probably needs a lock
> > around it. i have no idea how mii/ifmedia works, so some pointers
> > there would be good.
> 
> The mvmdio(4) driver already uses a mutex to control access to the bus.

cool, good to know.

> > it would be nice to hack on another switch chip at some
> > point. recent banana pi routers might be a good candidate for
> > that. they have mediatek or realtek switches on them from what i can
> > tell, and they link to the doco for them. it might be possible to
> > factor the "port interface" code out for all these and just have the
> > switch drivers provide glue for them.
> 
> The "Banana Pi BPI-R2 Pro Router" might be the most interesting one.
> That one is based on a Rockchip RK3568 SoC for which we already have
> some support in the tree and hardware in the hands of other developers
> (me and patrick).  That one integrates a MediaTek MT7531BE switch chip.

and there's readable doco for the switch. which would be lovely.

> The other Banana Pi router boards are based on SoCs that we don't
> support.
> 
> > it would be interesting to teach veb and vlan how to offload switching
> > to a switch chip for port interfaces. there's a bunch of functionality
> > we offer in our virtual bridges that wouldn't work on a hardware
> > chip (eg, pf, bridge rules, ipsec, etc), but veb already special
> > cases vport interfaces so special casing these ports could work
> > too.
> > 
> > however, having routed interfaces is more useful to me than bridging
> > right now.
> >
