Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Mon, 2012-03-12 at 09:48 +0100, Lennert Buytenhek wrote:

> Since it can lead to problems (address database mismatches, doesn't
> correctly handle STP transitions or topology changes automatically),
> I think it should be avoided whenever possible. I don't see any
> advantages of hardware based learning over software based learning
> anyway ('flexibility' doesn't seem like a very good argument).

Indeed, address mismatches may happen if you have two databases. You
then have two choices: do learning in user space, or be able to
tolerate some transient inconsistency (if you have some software that
lazily looks at the database). But there is a case where the database
sits only in hardware. In such a case, you can't have mismatches.
I think the STP problem can be handled by user space regardless of
whether address mismatch happens or not.

> It should be doable along the lines of the current DSA patch -- add
> a VLAN ID argument to the interface add/remove callbacks, and when a
> VLAN virtual interface is added to the bridge, call the relevant
> callbacks with the parent interface + VLAN ID instead. (This doesn't
> work for stacked VLANs, but the current net/dsa supported chips
> don't handle those anyway.)

Sounds like a good start - we could have a different interface for
stacked variants. I think you should push in the patch.

cheers,
jamal
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Tue, 2012-03-06 at 15:09 +0100, Lennert Buytenhek wrote:

> Why so? (I think the switch chips should just never do learning at
> all..)

I agree that learning in software gives you more flexibility; however,
I am for providing interface flexibility as well - switches have
learning features. I think I should be able to use them when it makes
sense to.

>> I think it should also be up to the admin to decide whether the
>> learning happens in the kernel or user space.
>
> I can't see any point in doing it in userspace. What would be the
> advantage of that? And based on what would the admin make the
> decision?

If I wanted to do some funky access control based on some new MAC
address showing up - the best place to do it is user space.

> It does, there is an STP state field per port in the switch chip,
> which controls whether learning takes place on this port (in
> Learning and Forwarding states) and whether packets are forwarded
> (in the Forwarding state).

ok, makes sense.

> But e.g. it doesn't automatically flush this port's FDB entries if
> you move a port from Forwarding to Listening -- the STP state field
> only controls direct learning and forwarding for received packets.
> And when you receive a BPDU with the topology change notification
> bit set, the switch won't automatically shorten the FDB entry
> timeout for you until the topology change is over, either.

I have to go back and look at some manuals I have - but iirc, the ones
I've played with behaved similarly. As long as we provide knobs to
set/unset those different attributes, I think the handling of all that
should be from software (likely some daemon in user space); then it
shouldn't matter whether we are working with STP BPDUs or TRILL or
thenewprotocol(TM) etc.

> Keep in mind that these chips also do VLAN tagging in hardware, and
> so a scenario like:
>
>   # brctl addbr br123
>   # brctl addif br123 lan1.123
>   # brctl addif br123 lan2.123
>
> is also one that can be handled in hardware (which the current
> patchwork patch doesn't handle yet).

We would need to work with offloading VLANs, no? Do the current VLAN
offloads used for NICs suffice for switching chips as well? i.e.
typically most chips have a table associated with some port, in which
the VLAN is part of, or is, the lookup key.

> You can let the switch rate limit the number of packets passed up to
> the CPU. 500 kp/s broadcast traffic seems somewhat excessive in any
> case, and I'm not sure if this deserves handling apart from QoSing
> those streams to manageable levels.

Yes, that would provide a solution. I haven't seen anything where you
can rate limit the learning (SA lookup failure).

cheers,
jamal
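The software-side handling discussed in this message (flush a port's FDB entries when it leaves a learning-capable STP state, and age entries faster while a topology change is in progress) could be modelled roughly as below. This is only a sketch of what a user-space daemon might do, not code from any patch in the thread: the class name, method names and the two timeout values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class StpState(Enum):
    DISABLED = 0
    BLOCKING = 1
    LISTENING = 2
    LEARNING = 3
    FORWARDING = 4


# States in which received source addresses may be learned.
LEARNING_STATES = {StpState.LEARNING, StpState.FORWARDING}


@dataclass
class SoftFdb:
    ageing: float = 300.0        # placeholder normal timeout, in seconds
    short_ageing: float = 15.0   # placeholder timeout during a topology change
    topology_change: bool = False
    port_state: dict = field(default_factory=dict)  # port -> StpState
    entries: dict = field(default_factory=dict)     # mac -> (port, last_seen)

    def set_state(self, port, state):
        old = self.port_state.get(port, StpState.DISABLED)
        self.port_state[port] = state
        # Leaving Learning/Forwarding: drop everything learned on this port.
        if old in LEARNING_STATES and state not in LEARNING_STATES:
            self.entries = {m: e for m, e in self.entries.items()
                            if e[0] != port}

    def learn(self, mac, port, now):
        # Only ports in a learning-capable state populate the table.
        if self.port_state.get(port) in LEARNING_STATES:
            self.entries[mac] = (port, now)

    def expire(self, now):
        limit = self.short_ageing if self.topology_change else self.ageing
        self.entries = {m: e for m, e in self.entries.items()
                        if now - e[1] <= limit}
```

Whatever ends up in `entries` would then be pushed to the switch chip through whichever add/delete interface the thread settles on.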
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Mon, 2012-03-05 at 17:53 +0100, Lennert Buytenhek wrote:

> net/dsa currently configures any switch chips in the system to do
> auto-learning.

So we clearly need the (user configurable) knob to turn on/off
learning. I think it should also be up to the admin to decide whether
the learning happens in the kernel or user space.

> However, I would much prefer to disable that, and have the switch
> chip just pass up packets for new source addresses, have Linux do
> the learning, and then mirror the Linux software FDB into the
> hardware instead -- that avoids having to manually flush the
> hardware FDB on certain STP state transitions or having to configure
> the hardware to use a shorter address learning timeout when we're in
> the middle of an STP topology change, which are problems we are
> running into in practice.

So in the scenario you are describing it seems the h/ware has no STP
state toggles, correct? In other ASICs I have seen, the STP state
influences behavior.

> Just curious -- while your patches allow propagating FDB entries
> into the hardware, do you also have hooks to tell the hardware which
> ports are to share address databases?

I think those are missing in this discussion, and it makes a lot of
sense for them to be part of the interface.

> net/dsa currently solves this by not having the hardware handle
> broadcast packets at all, which circumvents the problem, but for
> multicast traffic you would still like to be able to do at least the
> forwarding that can be done in hardware in hardware. (Unicast
> doesn't have this problem as long as the kernel and the switch chip
> agree on their view of the FDB.)

Of course this could represent an interesting opportunity for a DoS.
Even on a 4 port switch at 100 Mb/s, hitting the CPU with 500 Kpps (I
am thinking these tiny switches end up attached to some tiny MIPS/ARM
cpu) could be devastating. How do you deal with that?

cheers,
jamal
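The arrangement Lennert describes above (the chip passes up frames with new source addresses, software does the learning, and the software FDB is mirrored into hardware) might look roughly like this in outline. It is a minimal sketch under stated assumptions: `MirroredFdb`, `hw_add` and `hw_del` are hypothetical names standing in for whatever driver hooks would actually program the chip.

```python
class MirroredFdb:
    """Software FDB that replays every change into a hardware table.

    hw_add and hw_del are hypothetical callbacks standing in for the
    driver operations that would program the switch chip.
    """

    def __init__(self, hw_add, hw_del):
        self.table = {}        # mac -> port
        self.hw_add = hw_add
        self.hw_del = hw_del

    def learn(self, mac, port):
        """Handle a frame passed up for a source address; True if hw changed."""
        old = self.table.get(mac)
        if old == port:
            return False               # already known, nothing to mirror
        if old is not None:
            self.hw_del(mac, old)      # station moved: remove the stale entry
        self.table[mac] = port
        self.hw_add(mac, port)
        return True

    def forget(self, mac):
        """Remove an entry (ageing, flush) and mirror the removal."""
        port = self.table.pop(mac, None)
        if port is not None:
            self.hw_del(mac, port)
```

Because the hardware only ever sees what software wrote, the two tables cannot disagree for longer than it takes to run the callback.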
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Thu, 2012-03-01 at 14:17 -0800, John Fastabend wrote:

> Hmm so I think what I'll do is this...
>
>   both: ndm_flags = 0
>   sw  : ndm_flags = NTF_SW_FDB
>   hw  : ndm_flags = NTF_HW_FDB
>
> Then current tools will work with embedded bridges and software
> bridges with the interesting case being when a port supporting an
> offloaded FDB is attached to a SW bridge. Doing both in this case
> seems to be a reasonable default to me.

Looks good. Although it seems like no backward compat is broken, it
feels like the default should be what's going on today, i.e. s/ware
only. IOW, I would make that the 0.

cheers,
jamal
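To make the two candidate defaults concrete, here is a small sketch of how a receiver might map `ndm_flags` to the tables an FDB request applies to. The flag names come from the message above; the bit values and the `targets` helper are assumptions made up for this illustration, since the thread does not give them.

```python
# Hypothetical bit values: the thread names these flags but not their values.
NTF_SW_FDB = 0x1
NTF_HW_FDB = 0x2


def targets(ndm_flags, zero_means=("sw", "hw")):
    """Return the set of FDB tables a request applies to.

    zero_means selects the meaning of ndm_flags == 0: both tables
    (John's proposal) or software only (Jamal's suggestion).
    """
    if ndm_flags == 0:
        return set(zero_means)
    selected = set()
    if ndm_flags & NTF_SW_FDB:
        selected.add("sw")
    if ndm_flags & NTF_HW_FDB:
        selected.add("hw")
    return selected
```

With the default arguments, `targets(0)` selects both tables; passing `zero_means=("sw",)` gives the software-only default suggested in the reply.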
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Wed, 2012-02-29 at 09:25 -0800, John Fastabend wrote:

> Well I think NETLINK_ROUTE is the most correct type to use in this
> case. Per netlink.h it's for routing and device hooks.
>
>   #define NETLINK_ROUTE 0 /* Routing/device hook */
>
> And NETLINK_ROUTE msg_types use the RTM_* prefix. The _*NEIGH
> postfix was merely a copy from the SW BRIDGE code paths. How about,
>
>   PF_BRIDGE:RTM_FDB_NEWENTRY
>   PF_BRIDGE:RTM_FDB_DELENTRY
>   PF_BRIDGE:RTM_FDB_GETENTRY

OK, I guess ;->

> And a new group RTNLGRP_FDB.

Nod.

> Also using NETLINK_ROUTE gives the correct rtnl locking semantics
> for free.

Makes sense.

> Agreed. I think adding some ndo_ops for bridging offloads here would
> work. For example the DSA infrastructure and/or macvlan devices
> might need this. Along the lines of extending this RFC,
>
>   [RFC] hardware bridging support for DSA switches
>   http://patchwork.ozlabs.org/patch/16578/

Certainly - that's one approach that is reasonable. Where is Lennert?
;-> I changed his email address to one that I am familiar with.

cheers,
jamal
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Wed, 2012-02-29 at 10:19 -0800, John Fastabend wrote:

>> I want to see a unified API so that user space control applications
>> (RSTP, TRILL?) can use one set of netlink calls for both software
>> bridge and hardware offloaded bridges. Does this proposal meet that
>> requirement?
>
> I don't see any issues with those requirements being met. Jamal, so
> why do "They have to be different calls"? I'm not so sure anymore...
> moving to RTM_FDB_XXXENTRY saved some refactoring in the bridge
> module but that is just cosmetic.

I may not want to use the s/ware bridge, i.e. I may want to use the
h/ware bridge. I may want to use both. So there are 3 variations
there. You need at least 1.5 bits to represent them if you are going
to use the same interface. There may be features in h/ware that are
not in s/ware and vice-versa. A single interface with flags which say
"this applies to hware:sware:both" would be good, but it may be harder
to achieve - that's why I suggested they be different.

cheers,
jamal
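The "at least 1.5 bits" remark can be checked with a line of arithmetic: three alternatives carry log2(3) bits of information, which rounds up to a two-bit field with one code left over. The `Target` enum below is a hypothetical name used only for this worked example.

```python
import math
from enum import Enum


class Target(Enum):
    """The three variations named in the message (names are illustrative)."""
    SW = 0
    HW = 1
    BOTH = 2


exact_bits = math.log2(len(Target))          # information content, about 1.585
field_bits = math.ceil(exact_bits)           # whole bits a message field needs
spare_codes = 2 ** field_bits - len(Target)  # unused code points in that field
```

So a shared interface needs a two-bit selector in practice, and the fourth code point is free to mean "unspecified" or a default.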
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Tue, 2012-02-28 at 20:40 -0800, John Fastabend wrote:

> OK back to this. The last piece is where to put these messages... we
> could take PF_ROUTE:RTM_*NEIGH
>
>   PF_ROUTE:RTM_NEWNEIGH - Add a new FDB entry to an offloaded switch.
>   PF_ROUTE:RTM_DELNEIGH - Delete a FDB entry from an offloaded switch.
>   PF_ROUTE:RTM_GETNEIGH - Dumps the embedded FDB table

Why RTM_*NEIGH? RTM tends to map to Route/L3 and NEIGH tends to map to
ndisc or ARP, both tied to IP address resolution. While both
ARP/Ndisc may play a role in the user space app populating the FDB, I
don't think they are necessary players. Learning could be via a table
entry miss and packet redirect to user space. So my suggestion is to
use FDB_*ENTRY for names.

> The neighbor code is using the PF_UNSPEC protocol type so we won't
> collide with these unless someone was using PF_ROUTE and relying on
> falling back to PF_UNSPEC however I couldn't find any programs that
> did this iproute2 certainly doesn't. And the bridge pieces are using
> PF_BRIDGE so no collision there.

They have to be different calls from the calls that talk to the s/ware
bridge. In my opinion, as controversial as this may sound, you need to
be flexible enough that some vendor can replace these calls with
proprietary calls which are more efficient for their hardware. So a
plugin to replace these calls in the user space code would be a good
idea. Alternatively, you could make that something they do at the
driver level, i.e. from user space to kernel it is a "hardware, please
addthistotheFDBtable()" call, and the implementation of that could be
proprietary to the specific hardware.

[..]

> Also if there are embedded switches with learning capabilities they
> might want to trigger events to user space. In this case having a
> protocol type makes user space a bit easier to manage. I've added
> Lennert so maybe he can comment I think the Marvell chipsets might
> support something along these lines. The SR-IOV chipsets I'm aware
> of _today_ don't do learning.

Learning makes the event model more plausible. The other event to
consider is aging of hardware entries.

> The other mechanism would be to embed some more attributes into the
> PF_UNSPEC:RTM_XXXLINK msg however I'm thinking that if we want to
> support learning and triggering events then we likely also don't
> want to send these events to every app with RTNLGRP_LINK set.

I think this needs to be a different event message. FDB_TABLEMISS?
FDB_EXCEPTION?

> Plus there is already a proliferation of LINK attributes and dumping
> the FDB out of this seems a bit much but could be done with some
> bitmasks. Although the current ext_filter_mask u32 doesn't seem to
> be sufficient for events to trigger this.

Dumping the FDB table should be something along the lines of FDB_GET
with the dump flag. It shouldn't tie to the LINK side of things.

cheers,
jamal
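The point about not delivering FDB events to every listener of the link group can be pictured with a toy publish/subscribe model. This is not netlink code: `EventBus`, `GROUP_LINK` and `GROUP_FDB` are hypothetical stand-ins for multicast groups, and the event name `FDB_TABLEMISS` is one of the names floated above.

```python
from collections import defaultdict

GROUP_LINK = "link"  # stands in for RTNLGRP_LINK
GROUP_FDB = "fdb"    # hypothetical dedicated group for FDB events


class EventBus:
    """Toy model of multicast groups: listeners only see groups they joined."""

    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, group, callback):
        self._subs[group].append(callback)

    def publish(self, group, event, **attrs):
        listeners = self._subs.get(group, [])
        for callback in listeners:
            callback(event, attrs)
        return len(listeners)   # number of listeners that saw the event
```

A learning or ageing daemon would join only the FDB group, while existing link monitors keep their current traffic and never see table-miss or ageing events.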
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Tue, 2012-02-28 at 21:14 -0800, John Fastabend wrote:

> Just checked looks like the DSA infrastructure has commands to
> enable STP so guess it is doing learning.

IIRC, Lennert built some of this stuff tied to the kernel.

cheers,
jamal
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Fri, 2012-02-17 at 09:10 -0800, John Fastabend wrote:

> Yes I agree that is the goal.
>
>> One last comment: With synchronization there are other challenges
>> when the entry in the hardware conflicts with the entry in software
>> when you intend the behavior to be the same. This is not such a big
>> deal with bridging but becomes more apparent when you start
>> offloading ACLs etc.
>
> OK and these sorts of conflicts certainly don't need to be resolved
> by kernel code. So I think this is a reasonable reason to drive the
> synchronization into a user space daemon.

Yep. Thanks for listening John. Waiting to see them patches.

cheers,
jamal
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Wed, 2012-02-15 at 17:26 -0800, John Fastabend wrote:
> On 2/15/2012 6:10 AM, Jamal Hadi Salim wrote:
>> On Tue, 2012-02-14 at 10:57 -0800, John Fastabend wrote:
>>> Roopa was likely on the right track here,
>>> http://patchwork.ozlabs.org/patch/123064/
>>
>> Doesn't seem related to the bridging stuff - the modeling looks
>> reasonable however.
>
> The operations are really the same ADD/DEL/GET additional MAC
> addresses to a port, in this case a macvlan type port. The
> difference is the macvlan port type drops any packet with an address
> not in the FDB where the bridge type floods these.

Ok. [the vlan piece really should have been an integrated part of
bridging; in the early days this was the case]

> [root@jf-dev1-dcblab src]# br fdb help
> Usage: br fdb { add | del | replace } ADDR dev DEV
>        br fdb {show} [ dev DEV ]
>
> In my example I just dumped all bridge devices,

Ok, makes sense.

> Seems we need both a synchronize and a { add | del | replace }
> option.

I am conflicted on this. Not sure if that is a command line thing or
something built into a user space daemon. It may be useful to have the
command line variant, but I feel having a daemon take care of things
helps in faster synchronization.

I think user space is a good spot to add such functionality (as
opposed to the kernel). That way user space can work with h/ware
switching such as yours as well as standalone switching chips (from
silicon vendors like Marvell etc). IMO, the average user doesn't need
to be aware of such low level stuff; so the default should be for the
user not to be responsible for configuration of synchronization. IOW,
I want to just run well understood user interface tools like ifconfig,
ip link etc and the new br tool, and not even need to be aware that we
are offloading. So as long as s/w br0 is mapping to the bridge on
ixgb-0 I don't need to know the ixgb0 h/w bridge exists.

One last comment: With synchronization there are other challenges when
the entry in the hardware conflicts with the entry in software when
you intend the behavior to be the same. This is not such a big deal
with bridging but becomes more apparent when you start offloading ACLs
etc.

> So I think what you're saying is a per port bit to disable
> learning... hmm but if you start tweaking it too much it looks less
> and less like a 802.1D bridge and more like something you would want
> to build with tc or openvswitch or tc+bridge or tc+macvlan.

These are pretty commodity features in most silicon switching chips
I've come across. You have a knob to control learning and another to
control flooding.

cheers,
jamal
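The two per-port knobs mentioned at the end of this message, one for learning and one for flooding, can be shown in a small forwarding model. It is a sketch only: `Port` and `handle_frame` are hypothetical names, and real switch chips apply these controls in hardware rather than in a function like this.

```python
from dataclasses import dataclass


@dataclass
class Port:
    learning: bool = True   # may frames received here populate the FDB?
    flooding: bool = True   # may unknown-destination frames be sent out here?


def handle_frame(fdb, ports, ingress, src, dst):
    """Return the list of egress port names for one received frame.

    fdb maps mac -> port name, ports maps port name -> Port.
    """
    if ports[ingress].learning:
        fdb[src] = ingress
    out = fdb.get(dst)
    if out is not None:
        # Known destination: forward, but never back out the ingress port.
        return [] if out == ingress else [out]
    # Unknown destination: flood to every other port that allows it.
    return [name for name, port in ports.items()
            if name != ingress and port.flooding]
```

With flooding disabled on every port, an unknown destination is simply dropped, which is the macvlan-style behaviour John contrasts with the bridge earlier in the message.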
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Thu, 2012-02-16 at 03:58 +, Ben Hutchings wrote:

> Well, in addition, there are SR-IOV network adapters that don't have
> any bridge. For these, the software bridge is necessary to handle
> multicast, broadcast and forwarding between local ports, not only to
> do learning.

For the scenario where there is no h/w bridge - the s/ware bridge
should be usable. There's no way of working around that. My contention
is only with the case where there is a h/w bridge and there are two
FDB tables, one in hardware and another in s/w, with both the h/w and
s/w bridges doing flooding and learning. It is desirable to have
options to use one or the other, or both with some synchronization.

> Solarflare's implementation of accelerated guest networking (which
> Shradha and I are gradually sending upstream) builds on libvirt's
> existing support for software bridges and assigns VFs to guests as a
> means to offload some of the forwarding.
>
> If and when we implement a hardware bridge, we would probably still
> want to keep the software bridge as a fallback. If a guest is
> dependent on a VF that's connected to a hardware bridge, it becomes
> impossible or at least very disruptive to migrate it to another host
> that doesn't have a compatible VF available.

In the scheme I described to John in the last email, libvirt need not
be aware of the existence of hardware offloading (and migration should
be transparent to whether a h/w bridge exists or not)...

cheers,
jamal
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Tue, 2012-02-14 at 10:57 -0800, John Fastabend wrote:

> Roopa was likely on the right track here,
> http://patchwork.ozlabs.org/patch/123064/

Doesn't seem related to the bridging stuff - the modeling looks
reasonable however.

> But I think the proper syntax is to use the existing
> PF_BRIDGE:RTM_XXX netlink messages. And if possible drive this
> without extending ndo_ops. An ideal user space interaction IMHO
> would look like,
>
> [root@jf-dev1-dcblab iproute2]# ./br/br fdb add 52:e5:62:7b:57:88 dev veth10
> [root@jf-dev1-dcblab iproute2]# ./br/br fdb
> port    mac addr            flags
> veth2   36:a6:35:9b:96:c4   local
> veth4   aa:54:b0:7b:42:ef   local
> veth0   2a:e8:5c:95:6c:1b   local
> veth6   6e:26:d5:43:a3:36   local
> veth0   f2:c1:39:76:6a:fb
> veth8   4e:35:16:af:87:13   local
> veth10  52:e5:62:7b:57:88   static
> veth10  aa:a9:35:21:15:c4   local

Looks nice, where is the targeted bridge (e.g. br0) in that syntax?

> Using Stephen's br tool. First command adds FDB entry to SW bridge
> and if the same tool could be used to add entries to embedded bridge
> I think that would be the best case.

That would be nice (although it adds a dependency on the presence of
the s/ware bridge). It would be nicer to have a knob in the kernel to
say "synchronize with h/w bridge foo", which can be turned off.

> So no RTNETLINK error on the second cmd. Then embedded FDB entries
> could be dumped this way also so I get a complete view of my FDB
> setup across multiple sw bridges and embedded bridges.

So if you had multiple h/ware bridges - which one is tied to br0?

> Yes. The hardware has a bit to support this which is currently not
> exposed to user space. That's a case where we have 'yet another
> knob' that needs a clean solution. This causes real bugs today when
> users try to use the macvlan devices in VEPA mode on top of SR-IOV.
> By the way these modes are all part of the 802.1Qbg spec which
> people actually want to use with Linux so a good clean solution is
> probably needed.

I think the knobs to flood and learn are important. The hardware seems
to have the flood but not the learn/discover. I think the s/ware
bridge needs to have both. At the moment - as pointed out in that
*NEIGH* notification - the s/w bridge assumes a policy that could be
considered a security flaw in some circles. Just because you are my
neighbor does not mean I trust you to come into my house; I may trust
you partially and allow you only to come through the front door. Even
in Canada, with a default policy of not locking your door, we
sometimes lock our doors ;->

> I have no problem with drawing the line here and trying to implement
> something over PF_BRIDGE:RTM_xxx nlmsgs.

My comment/concern was in regard to the bridge's built-in policy of
reading from the neighbor updates (refer to above comments).

cheers,
jamal
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Mon, 2012-02-13 at 07:13 -0800, John Fastabend wrote:

> The use case here is multiple VFs but the same solution should work
> with multiple PFs as well. FDB controls should be independent of how
> the ports are exposed VFs, PFs, VMDQ/queue pairs, macvlan, etc.

Makes sense.

> With events and ADD/DEL/GET FDB controls we can solve both cases.
> This also solves Roopa's case with macvlan where he wants to add
> additional addresses to macvlan ports.

Not familiar with that issue - I'll prowl the list.

> Yes it should flood here, unless it's acting as a 802.1Qbg VEB or
> VEPA.

Ok. So there is a toggle somewhere which controls how flooding should
happen.

> Maybe not. But the kernel already has the needed signals with one
> extra hook we can save running a daemon in user space. Maybe that's
> not a great argument to add kernel code though.

You make a reasonable argument to have it in the kernel, but I think
we win more if we separate the control. So while I empathize, I am
hoping that you'd go with the path that is hard to travel ;->

> The PF_BRIDGE:RTM_GETNEIGH,RTM_NEWNEIGH,RTM_DELNEIGH are registered
> in the br_netlink_init() path.

Hrm - hadn't paid attention to that before. Nasty. The bridge seems to
be hard-coding policy on station movement, no? This is a good example
of the qualms I have on adding things to the kernel ;-> I may not want
to auto update a MAC address moving ports as part of some policy I
have. I can go and add YAK (Yet Another Knob) - but where is the line
drawn?

cheers,
jamal
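The station-movement concern raised here, that an FDB entry silently follows a MAC address to a new port, is easy to illustrate with a learning routine that consults a policy hook before accepting a move. This is a sketch of the idea of keeping that policy outside the forwarding code; the `learn` function and its `allow_move` callback are hypothetical and do not correspond to any existing bridge interface.

```python
def learn(fdb, mac, port, allow_move=None):
    """Record mac on port; return True if the table now maps mac to port.

    allow_move(mac, old_port, new_port) is a hypothetical policy hook.
    When it is None, moves are always accepted, which is the hard-coded
    behaviour objected to in the message above.
    """
    old = fdb.get(mac)
    if old is None or old == port:
        fdb[mac] = port
        return True
    if allow_move is None or allow_move(mac, old, port):
        fdb[mac] = port          # station moved and policy permits it
        return True
    return False                 # move refused, existing entry kept
```

A controller could pass a callback that pins selected addresses to their ports and logs refused moves, without the forwarding path knowing anything about that policy.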
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Fri, 2012-02-10 at 08:39 -0800, Stephen Hemminger wrote:

> Some related discussion points:
> * the bridge needs to support control from both userspace (MSTP,
>   TRILL, ...) and kernel space (offload etc)

I think all are pretty much covered if you let some controller (I
prefer user space) ADD/DEL/GET/Event on the fdb.

TRILL really is outside the scope of this; from an encap/decap view it
probably needs to be YAND (Yet Another NetDev), and from a control
side of things you need to just provide the above netlink ops (ADD,
etc) on the fdb and let the controller worry about things. (Actually
you _may_ need to have learning done outside of the kernel for TRILL.)

> * the bridge forwarding database is simpler and different than the
>   existing neighbor table, don't remember the details but last time
>   I checked it using neighbor table in bridge would be putting
>   square peg in round hole.

Agreed.

cheers,
jamal
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
Hi John,

I went backwards to summarize at the top after going through your
email.

TL;DR version 0.1: you provide a good use case where it makes sense to
do things in the kernel. IMO, you could make the same argument if your
embedded switch could do ACLs, IPv4 forwarding etc. And the kernel
bloats. I am always biased towards moving all policy control to user
space instead of bloating the kernel.

On Thu, 2012-02-09 at 20:14 -0800, John Fastabend wrote:

> Hi Jamal,
>
> The user space app in this case would listen for FDB updates to the
> SW bridge and then mirror them at the embedded NIC. In this case it
> seems easier to just add a notifier chain and let the kernel keep
> these in sync. Otherwise we need a daemon in user space to replicate
> these.

A user space daemon, if you need to ensure synchronization. That's
what I meant when I said there was a disadvantage over the simple case
when the goal is always to synchronize.

> On the other hand if you could make the same RTM_NEWNEIGH,
> RTM_DELNEIGH, and RTM_GETNEIGH work for the bridge, embedded bridge,
> and macvlan you would have one common interface to drive these. But
> the bridge already has this protocol/msgtype so that would require
> either some demux or new protocol/msgtype pairs to be created.

The bridge is very netlink friendly these days. Given the rest of the
network stack (the *NEIGH* you mention above) talks netlink to user
space, it should be workable.

> Let me think on it. I'm tempted by the simplicity of adding notifier
> hooks though.

If something is missing bridge-side it may need to be added (as per
Stephen's comment) - I just took it one step further, indicating those
notifiers also need to speak netlink.

> Actually because the bridge is adding/removing fdb entries
> dynamically maybe it's best this gets done in kernel. Here's the
> example case,
>
> [..]
>
> With the flow by letters above hope this is not too difficult to
> follow.
>
> (A) veth0 a virtual device transmits packet destined for ethx.y
> (B) SW bridge receives frames and updates FDB flooding to C
> (C) eth0 the PF in this case sends the frame to the HW backed by the
>     embedded bridge

Following so far. Can you have more than one PF per embedded switch?
Or is the intent here purely to do VM/VF separation?

> (D) The HW embedded switch has a static entry for ethx.y and
>     forwards the frame to the VF or if it's a broadcast frame also
>     floods it to the wire and ethx.y

nod.

> (E) ethx.y receives the frame and generates a response to the dest
>     mac of veth0

nod. Since you said in #D the entries in the switch are static, I am
assuming at this point neither ethx.y nor veth0 exist in the embedded
FDB.

> Now here is the potential issue,
>
> (G) The frame transmitted from ethx.y with the destination address
>     of veth0 but the embedded switch is not a learning switch. If
>     the FDB update is done in user space it's possible (likely?)
>     that the FDB entry for veth0 has not been added to the embedded
>     switch yet.

Ok, got it - so the catch here is the switch is not capable of
learning. I think this depends on where learning is done. Your intent
is to use the S/W bridge as something that does the learning for you,
i.e. in the kernel. This makes the s/w bridge part of
MUST-have-for-this-to-run. And that may be the case for your use case.
What if I don't wanna run the S/W bridge at all? I've been making the
point that with a simple knob (Stephen doesn't like to add such a
knob), the SW bridge could defer learning to user space. [This way you
can add a lot of richness, e.g. ACLs restricting which MAC addresses
are allowed to talk to which ones etc.] But if I bypass the s/w bridge
altogether and learn in user space, or have a static config with which
I populate the embedded switch, I don't see the issue.

> Now we either have to flood the frame which is not horrible but not
> ideal or worse if the embedded switch does not support flooding send
> it to the wire and veth0 never receives it.

If it is a switch it has to flood, no? Otherwise it sounds broken.

> If the SW bridge pushes the FDB update down into the embedded switch
> the address is for sure in the embedded switches forwarding tables
> and the switching works as expected.

Yes, there is a small gap between the s/w bridge learning and the
synchronization happening to the embedded nic switch. That gap gets
larger if you defer learning to user space. But like you said earlier,
during that gap packets are flooded - and do you care if the
synchronization doesn't happen immediately?

> So to handle this case correctly it's probably best IMHO to use a
> notifier hook. Having a RTM_GETNEIGH for the embedded switch
> implemented though would be nice for dumping the FDB of the embedded
> switch and SET/DEL could be used to configure the FDB when it's not
> being driven by the SW switch.

Of course we should try to be minimalists here. Do you need to have a
different *NEIGH* than what we already have really
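The synchronization gap discussed in this message can be reproduced with a toy model of a non-learning embedded switch: until someone programs an entry for veth0's address, the reply from ethx.y is flooded, and once the entry is pushed down it is forwarded to eth0 only. The `EmbeddedSwitch` class, the `"wire"` uplink port and the placeholder MAC strings are all assumptions made for illustration.

```python
class EmbeddedSwitch:
    """Non-learning switch: it forwards only on explicitly programmed entries."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.fdb = {}                 # mac -> port, written only by program()

    def program(self, mac, port):
        """Install a static entry, e.g. pushed down from the software bridge."""
        self.fdb[mac] = port

    def forward(self, ingress, dst):
        if dst in self.fdb:
            return [self.fdb[dst]]
        # Unknown destination: flood to every port except the ingress one.
        return [p for p in self.ports if p != ingress]


# eth0 is the PF, ethx.y a VF, "wire" a hypothetical uplink port.
switch = EmbeddedSwitch(["eth0", "ethx.y", "wire"])
switch.program("mac-ethx.y", "ethx.y")                   # static entry, step (D)
before_sync = switch.forward("ethx.y", "mac-veth0")      # step (G), inside the gap
switch.program("mac-veth0", "eth0")                      # FDB update arrives
after_sync = switch.forward("ethx.y", "mac-veth0")
```

In this model the frame still reaches eth0 during the gap, only less efficiently, which matches the observation that flooding covers the window as long as the switch does flood.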
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Thu, 2012-02-09 at 09:52 -0800, John Fastabend wrote:

> By netlink_notifier do you mean adding a notifier_block and using
> atomic_notifier_call_chain() probably in rtnl_notify()? Then drivers
> could register with the notifier chain with
> atomic_notifier_chain_register() and receive the events correctly.
> Or did I miss some notifier chain that already exists?

Yes, that is what I mean. The callbacks you need may or may not
already be present.

I'll go one step further. This stuff shouldn't be in the kernel at
all. The disadvantage is you need a user space app to update the
hardware. i.e. the same mechanism should be usable for either a switch
embedded in a NIC or a standalone hardware switch (with/out the s/ware
bridge presence).

cheers,
jamal
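The notifier-chain idea John asks about can be pictured with a small user-space analogy: the bridge publishes FDB changes on a chain, and any driver that registered a callback gets to mirror them. This is a Python model of the pattern only, not the kernel notifier API; `NotifierChain`, `HwFdbDriver` and the event names `FDB_ADD` and `FDB_DEL` are hypothetical.

```python
class NotifierChain:
    """Toy analogue of a notifier chain: registered callbacks run per event."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        self._callbacks.append(callback)

    def unregister(self, callback):
        self._callbacks.remove(callback)

    def call_chain(self, event, data):
        # Iterate over a copy so a callback may unregister itself safely.
        for callback in list(self._callbacks):
            callback(event, data)


class HwFdbDriver:
    """Hypothetical driver that mirrors software FDB events into its table."""

    def __init__(self):
        self.hw_table = {}

    def notify(self, event, data):
        mac, port = data
        if event == "FDB_ADD":
            self.hw_table[mac] = port
        elif event == "FDB_DEL":
            self.hw_table.pop(mac, None)
```

The same events could equally be sent over netlink to a user-space daemon, which is the alternative argued for in the reply above.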