Re: [B.A.T.M.A.N.] [PATCH 03/17] batman-adv: Add network_coding and mcast sysfs files to README
On Sat, Oct 29, 2016 at 12:56:28PM +0200, Jiri Pirko wrote: > >> I strongly believe it is a huge mistake to use sysfs for things like > >> this. This should be done via generic netlink api. > > > >This doesn't change the problem that it is already that way. This patch > >only adds the list of available files to the README. > > Sure. Just found out you did it like that. Therefore I commented. I > suggest to rework the api to use genl entirely. Hi Jiri, Thanks for sharing your thoughts! Could you explain a bit more on which disadvantages you see in the usage of sysfs here? Regards, Linus
Re: Problem with bridge (mcast-to-ucast + hairpin) and Broadcom's 802.11f in their FullMAC fw
On Mon, Mar 12, 2018 at 12:08:56PM +0100, Linus Lüssing wrote: > On Tue, Feb 27, 2018 at 11:08:20AM +0100, Rafał Miłecki wrote: > > I've problem when using OpenWrt/LEDE on a home router with Broadcom's > > FullMAC WiFi chipset. > > Hi Rafał, > > Thanks for reporting this issue! > > > Can you see any solution for this problem? Is that an option to stop > > multicast-to-unicast from touching 802.11f packets? Some other ideas? > > Obviously I can't modify Broadcom's firmware and drop that obsoleted > > standard. > > Just to avoid some potential confusion: This is more an issue of > hairpinning than it is an issue of multicast-to-unicast per se, > right? > > That is, if you set this to 0 manually: > /sys/class/net//brport/multicast_to_unicast > Then the issue still occurs, right? Btw., if in OpenWRT/LEDE you set 'option multicast_to_unicast 0' in /etc/config/network for the bridge (and not bridge port) then netifd should refrain from setting the bridge hairpinning and AP isolation on wireless devices, too, if I remember correctly. Can you confirm that the issue disappears for you then?
Re: Problem with bridge (mcast-to-ucast + hairpin) and Broadcom's 802.11f in their FullMAC fw
On Mon, Mar 12, 2018 at 10:46:45AM +0100, Rafał Miłecki wrote: > On 27 February 2018 at 18:05, Stephen Hemminger [...] > > ebtables is your friend in dealing with weird and broken devices. > > It may be weird, not sure if actually broken. Anyway I'd like to have > some generic solution instead of telling every user to use ebtables to > workaround the problem. I agree that a "broken by default" in OpenWRT/LEDE for a variety of Broadcom devices is not really acceptable. Technically we could teach netifd in OpenWRT/LEDE to configure ebtables accordingly, at least for a list of affected devices, so that users would not have to. However, as ebtables is not managed by the fw3 in OpenWRT/LEDE, that would probably interfer with user provided ebtables rules and scripts... > That said I think we still should look for a solution for existing > firmwares. I guess it may takes months to years to never to release > new firmwares for all supported chipsets. Hm, we could change the default in OpenWRT/LEDE for multicast-to-unicast (or more precisely bridge hairpinning) to disabled again for now.
Re: Problem with bridge (mcast-to-ucast + hairpin) and Broadcom's 802.11f in their FullMAC fw
On Tue, Feb 27, 2018 at 11:08:20AM +0100, Rafał Miłecki wrote: > I've problem when using OpenWrt/LEDE on a home router with Broadcom's > FullMAC WiFi chipset. Hi Rafał, Thanks for reporting this issue! > Can you see any solution for this problem? Is that an option to stop > multicast-to-unicast from touching 802.11f packets? Some other ideas? > Obviously I can't modify Broadcom's firmware and drop that obsoleted > standard. Just to avoid some potential confusion: This is more an issue of hairpinning than it is an issue of multicast-to-unicast per se, right? That is, if you set this to 0 manually: /sys/class/net//brport/multicast_to_unicast Then the issue still occurs, right? Regards, Linus
Re: [PATCH net-next] bridge: ebtables: Avoid resetting limit rule state
On Fri, Dec 08, 2017 at 06:46:06AM +0100, Linus Lüssing wrote: > Extending the usersize to include info->prev would probably be too > hackish/ugly, right? And wouldn't be enough anyway, since info->{credit,credit_cap,cost} would still be zeroed... Hm.
Re: [PATCH net-next] bridge: ebtables: Avoid resetting limit rule state
On Thu, Dec 07, 2017 at 01:26:19AM +0100, Pablo Neira Ayuso wrote: > > I also had a quick look at a 4.15-rc1 kernel in a VM now. I still > > end up in ebt_limit_mt_check() with the variables being reset > > when editing the table somewhere. > > My question is if your fix would work with 4.15-rc1. You are absoluetly right, it's not working anymore since the commit you mentioned initially :-( ("xtables: extend matches and targets with .usersize"). info->prev is always 0 since exactly this commit. That means, trying tricks in ebt_limit_mt_check() is too late now, the old values are already overwritten? (or is there some commit scheme which installs the ebt_limit_info provided by ebt_limit_check() some time after its call?) Extending the usersize to include info->prev would probably be too hackish/ugly, right?
Re: [Bridge] [PATCH net-next] bridge: ebtables: Avoid resetting limit rule state
On Mon, Dec 04, 2017 at 05:53:35AM +0100, Linus Lüssing wrote: > And so, no I do not have this patch. I looked at it now, but it > does not seem to have any relation with .matchinfo, does it? Relation between .usersize and .checkentry I ment, not .usersize and .matchinfo.
Re: [PATCH net-next] bridge: ebtables: Avoid resetting limit rule state
Hi Pablo, Thanks for your reply! On Tue, Nov 28, 2017 at 12:30:08AM +0100, Pablo Neira Ayuso wrote: > [...] > > diff --git a/net/bridge/netfilter/ebt_limit.c > > b/net/bridge/netfilter/ebt_limit.c > > index 61a9f1be1263..f74b48633feb 100644 > > --- a/net/bridge/netfilter/ebt_limit.c > > +++ b/net/bridge/netfilter/ebt_limit.c > > @@ -69,6 +69,10 @@ static int ebt_limit_mt_check(const struct > > xt_mtchk_param *par) > > { > > struct ebt_limit_info *info = par->matchinfo; > > > > + /* Do not reset state on unrelated table changes */ > > + if (info->prev) > > + return 0; > > What kernel version are you using? I suspect you don't have this > applied? I'm indeed using a 4.4.102 kernel, as LEDE is still in the process of updating to 4.14. So 4.4 with LEDE is where I got the measurement results from. > > commit ec23189049651b16dc2ffab35a4371dc1f491aca > Author: Willem de Bruijn> Date: Mon Jan 2 17:19:46 2017 -0500 > > xtables: extend matches and targets with .usersize And so, no I do not have this patch. I looked at it now, but it does not seem to have any relation with .matchinfo, does it? I also had a quick look at a 4.15-rc1 kernel in a VM now. I still end up in ebt_limit_mt_check() with the variables being reset when editing the table somewhere. Regards, Linus
[PATCH net-next] bridge: ebtables: Avoid resetting limit rule state
So far any changes with ebtables will reset the state of limit rules, leading to spikes in traffic. This is especially noticeable if changes are done frequently, for instance via a daemon. This patch fixes this by bailing out from (re)setting if the limit rule was initialized before. When sending packets every 250ms for 600s, with a "--limit 1/sec --limit-burst 50" rule and a command like this in the background: $ ebtables -N VOIDCHAIN $ while true; do ebtables -F VOIDCHAIN; sleep 30; done The results are: Before: ~1600 packets After: 650 packets Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- net/bridge/netfilter/ebt_limit.c | 4 1 file changed, 4 insertions(+) diff --git a/net/bridge/netfilter/ebt_limit.c b/net/bridge/netfilter/ebt_limit.c index 61a9f1be1263..f74b48633feb 100644 --- a/net/bridge/netfilter/ebt_limit.c +++ b/net/bridge/netfilter/ebt_limit.c @@ -69,6 +69,10 @@ static int ebt_limit_mt_check(const struct xt_mtchk_param *par) { struct ebt_limit_info *info = par->matchinfo; + /* Do not reset state on unrelated table changes */ + if (info->prev) + return 0; + /* Check for overflow. */ if (info->burst == 0 || user2credits(info->avg * info->burst) < user2credits(info->avg)) { -- 2.11.0
[PATCH net v3] bridge: ebtables: fix reception of frames DNAT-ed to bridge device/port
When trying to redirect bridged frames to the bridge device itself or a bridge port (brouting) via the dnat target then this currently fails: The ethernet destination of the frame is dnat'ed to the MAC address of the bridge device or port just fine. However, the IP code drops it in the beginning of ip_input.c/ip_rcv() as the dnat target left the skb->pkt_type as PACKET_OTHERHOST. Fixing this by resetting skb->pkt_type to an appropriate type after dnat'ing. Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- Changelog v3: - moved pkt_type fixup into ebtable dnat code -> v1/v2 only fixed it for prerouting/dnat so far, now tested and verified that v3 fixes it for brouting/dnat, too - updated commit message Changelog v2: - refrain from altering pkt_type for multicast packets with a unicast destination MAC --- net/bridge/netfilter/ebt_dnat.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/net/bridge/netfilter/ebt_dnat.c b/net/bridge/netfilter/ebt_dnat.c index 4e0b0c3..21acb53 100644 --- a/net/bridge/netfilter/ebt_dnat.c +++ b/net/bridge/netfilter/ebt_dnat.c @@ -9,6 +9,7 @@ */ #include #include +#include "../br_private.h" #include #include #include @@ -18,11 +19,32 @@ static unsigned int ebt_dnat_tg(struct sk_buff *skb, const struct xt_action_param *par) { const struct ebt_nat_info *info = par->targinfo; + struct net_device *dev; if (!skb_make_writable(skb, 0)) return EBT_DROP; ether_addr_copy(eth_hdr(skb)->h_dest, info->mac); + + if (is_multicast_ether_addr(info->mac)) { + if (is_broadcast_ether_addr(info->mac)) + skb->pkt_type = PACKET_BROADCAST; + else + skb->pkt_type = PACKET_MULTICAST; + } else { + rcu_read_lock(); + if (xt_hooknum(par) != NF_BR_BROUTING) + dev = br_port_get_rcu(xt_in(par))->br->dev; + else + dev = xt_in(par); + + if (ether_addr_equal(info->mac, dev->dev_addr)) + skb->pkt_type = PACKET_HOST; + else + skb->pkt_type = PACKET_OTHERHOST; + rcu_read_unlock(); + } + return info->target; } -- 2.1.4
Re: [PATCH v2] bridge: ebtables: fix reception of frames DNAT-ed to bridge device
On Tue, Mar 21, 2017 at 04:32:45PM -0700, Stephen Hemminger wrote: > On Tue, 21 Mar 2017 23:28:45 +0100 > Linus Lüssing <linus.luess...@c0d3.blue> wrote: > > > However, the IP code drops it in the beginning of ip_input.c/ip_rcv() > > as the dnat target did not update the skb->pkt_type. If after > > dnat'ing the packet is now destined to us then the skb->pkt_type > > needs to be updated from PACKET_OTHERHOST to PACKET_HOST, too. > > Why not fix DNAT netfilter module rather than hacking bridge code here. Sorry for the late response. Wanted to do some more testing before replying. My assumptions regarding macvlan were wrong: A) The code my patch adds is not touched in the case of a macvlan device on top of a bridge - with macvlan there seems to be no FDB entry for the MAC address of the macvlan device at all, actually, resulting in the flooding of a frame intended for that device (is this, ehm, known/intended?). B) ip_rcv() does not drop for a macvlan device, because macvlan unconditionally sets skb->pkt_type to PACKET_HOST in macvlan_handle_frame(). (And I guess, maybe I shouldn't actually care that much about macvlans on top of a bridge, as a veth-pair is much cleaner for that?) Regarding netdev::dev_addrs, if I understand the kernel code correctly, it is only, potentially populated with more than netdev::dev_addr for two drivers, via dev_addr_add(): bnx2x and ixgbe. So for a bridge device, there should never be more than one item in netdev::dev_addrs (= netdev::dev_addr), correct? So, currently I'm happy with moving the fix into the ebtables dnat code and just checking against the netdev::dev_addr of the bridge device now. (if anyone has any suggestions regarding upper devices I should test with that, then please let me know) (Sorry, Pablo, for me pressing against your earlier suggestion to put the fix in the ebtables dnat instead of bridge code. :( ) Regards, Linus
[PATCH v2] bridge: ebtables: fix reception of frames DNAT-ed to bridge device
When trying to redirect bridged frames to the bridge device itself via the ebtables nat-prerouting chain and the dnat target then this currently fails: The ethernet destination of the frame is dnat'ed to the MAC address of the bridge itself just fine and the correctly altered frame can even be captured via a tcpdump on br0 (with or without promisc mode). However, the IP code drops it in the beginning of ip_input.c/ip_rcv() as the dnat target did not update the skb->pkt_type. If after dnat'ing the packet is now destined to us then the skb->pkt_type needs to be updated from PACKET_OTHERHOST to PACKET_HOST, too. Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- Changelog v2: * refrain from altering pkt_type for multicast packets with a unicast destination MAC --- net/bridge/br_input.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c index 013f2290b..fd7bc4c 100644 --- a/net/bridge/br_input.c +++ b/net/bridge/br_input.c @@ -198,8 +198,13 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb if (dst) { unsigned long now = jiffies; - if (dst->is_local) + if (dst->is_local) { + /* fix up potential DNAT inconsistencies */ + if (skb->pkt_type == PACKET_OTHERHOST) + skb->pkt_type = PACKET_HOST; + return br_pass_frame_up(skb); + } if (now != dst->used) dst->used = now; -- 2.1.4
Re: [Bridge] [PATCH net] bridge: ebtables: fix reception of frames DNAT-ed to bridge device
On Sun, Mar 19, 2017 at 05:55:06PM +0100, Linus Lüssing wrote: > On Fri, Mar 17, 2017 at 02:10:44PM +0100, Pablo Neira Ayuso wrote: > > Wait. > > > > May this break local multicast listener that are bound to the bridge > > interface? Assuming the bridge interface got an IP address, and that > > there is local multicast listener. > > > > Missing anything here? > > Hm, for multicast packets usually the code path a few lines > later in br_handle_frame_finish() should be taken instead. > > But you might be right for IP multicast packets with a unicast MAC > destination (due to whatever reason, for instance via DNAT'ing > again). > > Will check that - thanks! Ok, I tested DNAT'ing an IP multicast packet to the unicast MAC address of the bridge interface. Both ping-ing to an IPv4 and IPv6 multicast listener on br0 worked and was replied to fine, both with or without changing skb->pkt_type from PACKET_MULTICAST to PACKET_HOST. ("$ ping 224.1.0.123" and "$ ping6 ff02::1:ff40:707c%in0" from a network namespace, tied into the bridge via veth) Also, a DNAT'ed PACKET_BROADCAST worked, with or without changing it to PACKET_HOST. I also checked via tcpdump that the destination MAC was changed successfully. So, so far I wasn't able to find any bugs with the current patch. But I think I like the idea of leaving the skb->pkt_type unaltered for PACKET_MULTICAST and PACKET_BROADCAST, seems cleaner. I'd just add an "if (skb->pkt_type == PACKET_OTHERHOST)" check then and resend a PATCH v2.
Re: [PATCH net] bridge: ebtables: fix reception of frames DNAT-ed to bridge device
On Fri, Mar 17, 2017 at 02:10:44PM +0100, Pablo Neira Ayuso wrote: > Wait. > > May this break local multicast listener that are bound to the bridge > interface? Assuming the bridge interface got an IP address, and that > there is local multicast listener. > > Missing anything here? Hm, for multicast packets usually the code path a few lines later in br_handle_frame_finish() should be taken instead. But you might be right for IP multicast packets with a unicast MAC destination (due to whatever reason, for instance via DNAT'ing again). Will check that - thanks!
Re: [PATCH net] bridge: ebtables: fix reception of frames DNAT-ed to bridge device
On Wed, Mar 15, 2017 at 07:15:39PM +0100, Pablo Neira Ayuso wrote: > Could you update ebtables dnat to check if the ethernet address > matches the one of the input bridge interface, so we mangle the > ->pkt_type accordingly from there, instead of doing this from the > core? Actually, that was the approach I thought about and went for first (and it would probably work for me). Just checking against the bridge device's net_device::dev_addr. I scratched it though, as I was afraid that the issue might still exist for people using some other upper device on top of the bridge device. For instance, macvlan? And iterating over the net_device::dev_addrs list seemed too costly for fast path to me.
Re: [PATCH net] bridge: ebtables: fix reception of frames DNAT-ed to bridge device
On Wed, Mar 15, 2017 at 11:42:11AM +0100, Pablo Neira Ayuso wrote: > I'm missing then why redirect is not then just enough for Linus usecase. For my usecase, the MAC address is configured by the user from a Web-UI. It may or may not be the one from the bridge device. Besides, found it counter intuitive that DNAT did not work here and took me some time to find out why. At least I didn't read about any such known limitations of the dnat target in the ebtables manpage.
[PATCH net] bridge: ebtables: fix reception of frames DNAT-ed to bridge device
When trying to redirect bridged frames to the bridge device itself via the ebtables nat-prerouting chain and the dnat target then this currently fails: The ethernet destination of the frame is dnat'ed to the MAC address of the bridge itself just fine and the correctly altered frame can even be captured via a tcpdump on br0 (with or without promisc mode). However, the IP code drops it in the beginning of ip_input.c/ip_rcv() as the dnat target did not update the skb->pkt_type. If after dnat'ing the packet is now destined to us then the skb->pkt_type needs to be updated from PACKET_OTHERHOST to PACKET_HOST, too. Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- net/bridge/br_input.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c index 013f2290b..ec83175 100644 --- a/net/bridge/br_input.c +++ b/net/bridge/br_input.c @@ -198,8 +198,12 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb if (dst) { unsigned long now = jiffies; - if (dst->is_local) + if (dst->is_local) { + /* fix up potential DNAT mess */ + skb->pkt_type = PACKET_HOST; + return br_pass_frame_up(skb); + } if (now != dst->used) dst->used = now; -- 2.1.4
[PATCH net] ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches
When for instance a mobile Linux device roams from one access point to another with both APs sharing the same broadcast domain and a multicast snooping switch in between: 1)(c) <~~~> (AP1) <--[SSW]--> (AP2) 2) (AP1) <--[SSW]--> (AP2) <~~~> (c) Then currently IPv6 multicast packets will get lost for (c) until an MLD Querier sends its next query message. The packet loss occurs because upon roaming the Linux host so far stayed silent regarding MLD and the snooping switch will therefore be unaware of the multicast topology change for a while. This patch fixes this by always resending MLD reports when an interface change happens, for instance from NO-CARRIER to CARRIER state. Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- Initial problem report was sent to the bridge mailing list a while ago: - https://lists.linuxfoundation.org/pipermail/bridge/2015-September/009754.html The RFCs concerning IGMP, MLD and snooping switches seem a have a hole concerning roaming. A request for clarification to mcast-w...@ietf.org was left unanswered, unfortunately: - https://mailarchive.ietf.org/arch/msg/mcast-wifi/Ghn2cGy1oN2ZwG1qaQO9SE13g6g However, simply resending reports seems to be the straight forward way to me to fix the issue mentioned above. --- net/ipv6/addrconf.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index f60e88e..81f7b4e 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -3386,9 +3386,15 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event, } if (idev) { - if (idev->if_flags & IF_READY) - /* device is already configured. */ + if (idev->if_flags & IF_READY) { + /* device is already configured - +* but resend MLD reports, we might +* have roamed and need to update +* multicast snooping switches +*/ + ipv6_mc_up(idev); break; + } idev->if_flags |= IF_READY; } -- 2.1.4
[PATCH net-next v5] bridge: multicast to unicast
From: Felix Fietkau <n...@nbd.name> Implements an optional, per bridge port flag and feature to deliver multicast packets to any host on the according port via unicast individually. This is done by copying the packet per host and changing the multicast destination MAC to a unicast one accordingly. multicast-to-unicast works on top of the multicast snooping feature of the bridge. Which means unicast copies are only delivered to hosts which are interested in it and signalized this via IGMP/MLD reports previously. This feature is intended for interface types which have a more reliable and/or efficient way to deliver unicast packets than broadcast ones (e.g. wifi). However, it should only be enabled on interfaces where no IGMPv2/MLDv1 report suppression takes place. This feature is disabled by default. The initial patch and idea is from Felix Fietkau. Signed-off-by: Felix Fietkau <n...@nbd.name> [linus.luess...@c0d3.blue: various bug + style fixes, commit message] Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- This feature is used and enabled by default in OpenWRT and LEDE for AP interfaces for more than a year now to allow both a more robust multicast delivery and multicast at higher rates (e.g. multicast streaming). In OpenWRT/LEDE the IGMP/MLD report suppression issue is overcome by the network daemon enabling AP isolation and by that separating all STAs. Delivery of STA-to-STA IP mulitcast is made possible again by enabling and utilizing the bridge hairpin mode, which considers the incoming port as a potential outgoing port, too. Hairpin-mode is performed after multicast snooping, therefore leading to only deliver reports to STAs running a multicast router. Changes in v5: * fix a potential pagefault in br_ip6_multicast_mld2_report(): -> a pskb_may_pull() might reallocate skb->data, therefore perform the "src = eth_hdr(skb)->h_source" only afterwards * simplify code by always adding ether source address to a port group and checking the per-port multicast-to-unicast flag instead of a per-port-group one (thanks Stephen!) Changes in v4: * readd "From: Felix Fietkau [...]" (missed it again in v3) Changes in v3: * fix an uninitialized variable bug introduced in br_multicast_flood() in v2, found by kbuild test bot Changes in v2: * netlink support (thanks Nik!) * fixed switching between multicast_to_unicast on/off -> even after toggling an already existing entry would stale in its mode and would never be replaced -> new extra check in br_port_group_equal) * reduced checks in br_multicast_flood() from two to one to address fast-path concerns (thanks Nik!) * rev-christmas tree ordering (thanks Nik!) * removed "net_bridge_port_group::unicast", using ::flags instead (thanks Nik!) * BR_MULTICAST_TO_UCAST -> BR_MULTICAST_TO_UNICAST (BR_MULTICAST_FLAST_LEAVE has the same length anyway) * simplified maybe_deliver_addr() (no return, no "prev" paramater -> was a NOP anyway) * added "From: Felix Fietkau [...]" * added "Signed-off-by: Felix Fietkau [...]" --- include/linux/if_bridge.h| 1 + include/uapi/linux/if_link.h | 1 + net/bridge/br_forward.c | 39 ++- net/bridge/br_mdb.c | 2 +- net/bridge/br_multicast.c| 90 net/bridge/br_netlink.c | 5 +++ net/bridge/br_private.h | 3 +- net/bridge/br_sysfs_if.c | 2 + 8 files changed, 114 insertions(+), 29 deletions(-) diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h index c6587c0..debc9d5 100644 --- a/include/linux/if_bridge.h +++ b/include/linux/if_bridge.h @@ -46,6 +46,7 @@ struct br_ip_list { #define BR_LEARNING_SYNC BIT(9) #define BR_PROXYARP_WIFI BIT(10) #define BR_MCAST_FLOOD BIT(11) +#define BR_MULTICAST_TO_UNICASTBIT(12) #define BR_DEFAULT_AGEING_TIME (300 * HZ) diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 6b13e59..4e59565 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -321,6 +321,7 @@ enum { IFLA_BRPORT_MULTICAST_ROUTER, IFLA_BRPORT_PAD, IFLA_BRPORT_MCAST_FLOOD, + IFLA_BRPORT_MCAST_TO_UCAST, __IFLA_BRPORT_MAX }; #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1) diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c index 7cb41ae..a0f9d00 100644 --- a/net/bridge/br_forward.c +++ b/net/bridge/br_forward.c @@ -174,6 +174,31 @@ static struct net_bridge_port *maybe_deliver( return p; } +static void maybe_deliver_addr(struct net_bridge_port *p, struct sk_buff *skb, + const unsigned char *addr, bool local_orig) +{ + struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev; + const unsigned char *src = eth_hdr(skb)->h_source; + + if (!should_deliver(p, skb)) + return; + +
[PATCH net-next v4] bridge: multicast to unicast
From: Felix Fietkau <n...@nbd.name> Implements an optional, per bridge port flag and feature to deliver multicast packets to any host on the according port via unicast individually. This is done by copying the packet per host and changing the multicast destination MAC to a unicast one accordingly. multicast-to-unicast works on top of the multicast snooping feature of the bridge. Which means unicast copies are only delivered to hosts which are interested in it and signalized this via IGMP/MLD reports previously. This feature is intended for interface types which have a more reliable and/or efficient way to deliver unicast packets than broadcast ones (e.g. wifi). However, it should only be enabled on interfaces where no IGMPv2/MLDv1 report suppression takes place. This feature is disabled by default. The initial patch and idea is from Felix Fietkau. Signed-off-by: Felix Fietkau <n...@nbd.name> [linus.luess...@c0d3.blue: various bug + style fixes, commit message] Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- This feature is used and enabled by default in OpenWRT and LEDE for AP interfaces for more than a year now to allow both a more robust multicast delivery and multicast at higher rates (e.g. multicast streaming). In OpenWRT/LEDE the IGMP/MLD report suppression issue is overcome by the network daemon enabling AP isolation and by that separating all STAs. Delivery of STA-to-STA IP mulitcast is made possible again by enabling and utilizing the bridge hairpin mode, which considers the incoming port as a potential outgoing port, too. Hairpin-mode is performed after multicast snooping, therefore leading to only deliver reports to STAs running a multicast router. Changes in v4: * readd "From: Felix Fietkau [...]" (missed it again in v3) Changes in v3: * fix an uninitialized variable bug introduced in br_multicast_flood() in v2, found by kbuild test bot Changes in v2: * netlink support (thanks Nik!) * fixed switching between multicast_to_unicast on/off -> even after toggling an already existing entry would stale in its mode and would never be replaced -> new extra check in br_port_group_equal) * reduced checks in br_multicast_flood() from two to one to address fast-path concerns (thanks Nik!) * rev-christmas tree ordering (thanks Nik!) * removed "net_bridge_port_group::unicast", using ::flags instead (thanks Nik!) * BR_MULTICAST_TO_UCAST -> BR_MULTICAST_TO_UNICAST (BR_MULTICAST_FLAST_LEAVE has the same length anyway) * simplified maybe_deliver_addr() (no return, no "prev" paramater -> was a NOP anyway) * added "From: Felix Fietkau [...]" * added "Signed-off-by: Felix Fietkau [...]" --- include/linux/if_bridge.h| 1 + include/uapi/linux/if_link.h | 1 + net/bridge/br_forward.c | 37 - net/bridge/br_mdb.c | 2 +- net/bridge/br_multicast.c| 96 net/bridge/br_netlink.c | 5 +++ net/bridge/br_private.h | 8 ++-- net/bridge/br_sysfs_if.c | 2 + 8 files changed, 121 insertions(+), 31 deletions(-) diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h index c6587c0..debc9d5 100644 --- a/include/linux/if_bridge.h +++ b/include/linux/if_bridge.h @@ -46,6 +46,7 @@ struct br_ip_list { #define BR_LEARNING_SYNC BIT(9) #define BR_PROXYARP_WIFI BIT(10) #define BR_MCAST_FLOOD BIT(11) +#define BR_MULTICAST_TO_UNICASTBIT(12) #define BR_DEFAULT_AGEING_TIME (300 * HZ) diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 6b13e59..4e59565 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -321,6 +321,7 @@ enum { IFLA_BRPORT_MULTICAST_ROUTER, IFLA_BRPORT_PAD, IFLA_BRPORT_MCAST_FLOOD, + IFLA_BRPORT_MCAST_TO_UCAST, __IFLA_BRPORT_MAX }; #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1) diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c index 7cb41ae..a6c8a27 100644 --- a/net/bridge/br_forward.c +++ b/net/bridge/br_forward.c @@ -174,6 +174,29 @@ static struct net_bridge_port *maybe_deliver( return p; } +static void maybe_deliver_addr(struct net_bridge_port *p, struct sk_buff *skb, + const unsigned char *addr, bool local_orig) +{ + struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev; + const unsigned char *src = eth_hdr(skb)->h_source; + + if (!should_deliver(p, skb)) + return; + + /* Even with hairpin, no soliloquies - prevent breaking IPv6 DAD */ + if (skb->dev == p->dev && ether_addr_equal(src, addr)) + return; + + skb = skb_copy(skb, GFP_ATOMIC); + if (!skb) { + dev->stats.tx_dropped++; + return; + } + + memcpy(eth_hdr(skb)->h_dest, addr, ETH_ALEN); + __br_forward(p, skb, loc
[PATCH net-next v3] bridge: multicast to unicast
Implements an optional, per bridge port flag and feature to deliver multicast packets to any host on the according port via unicast individually. This is done by copying the packet per host and changing the multicast destination MAC to a unicast one accordingly. multicast-to-unicast works on top of the multicast snooping feature of the bridge. Which means unicast copies are only delivered to hosts which are interested in it and signalized this via IGMP/MLD reports previously. This feature is intended for interface types which have a more reliable and/or efficient way to deliver unicast packets than broadcast ones (e.g. wifi). However, it should only be enabled on interfaces where no IGMPv2/MLDv1 report suppression takes place. This feature is disabled by default. The initial patch and idea is from Felix Fietkau. Signed-off-by: Felix Fietkau <n...@nbd.name> [linus.luess...@c0d3.blue: various bug + style fixes, commit message] Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- This feature is used and enabled by default in OpenWRT and LEDE for AP interfaces for more than a year now to allow both a more robust multicast delivery and multicast at higher rates (e.g. multicast streaming). In OpenWRT/LEDE the IGMP/MLD report suppression issue is overcome by the network daemon enabling AP isolation and by that separating all STAs. Delivery of STA-to-STA IP mulitcast is made possible again by enabling and utilizing the bridge hairpin mode, which considers the incoming port as a potential outgoing port, too. Hairpin-mode is performed after multicast snooping, therefore leading to only deliver reports to STAs running a multicast router. Changes in v3: * fix an uninitialized variable bug introduced in br_multicast_flood() in v2, found by kbuild test bot Changes in v2: * netlink support (thanks Nik!) * fixed switching between multicast_to_unicast on/off -> even after toggling an already existing entry would stale in its mode and would never be replaced -> new extra check in br_port_group_equal) * reduced checks in br_multicast_flood() from two to one to address fast-path concerns (thanks Nik!) * rev-christmas tree ordering (thanks Nik!) * removed "net_bridge_port_group::unicast", using ::flags instead (thanks Nik!) * BR_MULTICAST_TO_UCAST -> BR_MULTICAST_TO_UNICAST (BR_MULTICAST_FLAST_LEAVE has the same length anyway) * simplified maybe_deliver_addr() (no return, no "prev" paramater -> was a NOP anyway) * added "From: Felix Fietkau [...]" * added "Signed-off-by: Felix Fietkau [...]" --- include/linux/if_bridge.h| 1 + include/uapi/linux/if_link.h | 1 + net/bridge/br_forward.c | 37 - net/bridge/br_mdb.c | 2 +- net/bridge/br_multicast.c| 96 net/bridge/br_netlink.c | 5 +++ net/bridge/br_private.h | 8 ++-- net/bridge/br_sysfs_if.c | 2 + 8 files changed, 121 insertions(+), 31 deletions(-) diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h index c6587c0..debc9d5 100644 --- a/include/linux/if_bridge.h +++ b/include/linux/if_bridge.h @@ -46,6 +46,7 @@ struct br_ip_list { #define BR_LEARNING_SYNC BIT(9) #define BR_PROXYARP_WIFI BIT(10) #define BR_MCAST_FLOOD BIT(11) +#define BR_MULTICAST_TO_UNICASTBIT(12) #define BR_DEFAULT_AGEING_TIME (300 * HZ) diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 6b13e59..4e59565 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -321,6 +321,7 @@ enum { IFLA_BRPORT_MULTICAST_ROUTER, IFLA_BRPORT_PAD, IFLA_BRPORT_MCAST_FLOOD, + IFLA_BRPORT_MCAST_TO_UCAST, __IFLA_BRPORT_MAX }; #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1) diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c index 7cb41ae..a6c8a27 100644 --- a/net/bridge/br_forward.c +++ b/net/bridge/br_forward.c @@ -174,6 +174,29 @@ static struct net_bridge_port *maybe_deliver( return p; } +static void maybe_deliver_addr(struct net_bridge_port *p, struct sk_buff *skb, + const unsigned char *addr, bool local_orig) +{ + struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev; + const unsigned char *src = eth_hdr(skb)->h_source; + + if (!should_deliver(p, skb)) + return; + + /* Even with hairpin, no soliloquies - prevent breaking IPv6 DAD */ + if (skb->dev == p->dev && ether_addr_equal(src, addr)) + return; + + skb = skb_copy(skb, GFP_ATOMIC); + if (!skb) { + dev->stats.tx_dropped++; + return; + } + + memcpy(eth_hdr(skb)->h_dest, addr, ETH_ALEN); + __br_forward(p, skb, local_orig); +} + /* called under rcu_read_lock */ void br_flood(struct net_bridge *br, struct sk_buff *skb,
[PATCH net-next v2] bridge: multicast to unicast
From: Felix Fietkau <n...@nbd.name> Implements an optional, per bridge port flag and feature to deliver multicast packets to any host on the according port via unicast individually. This is done by copying the packet per host and changing the multicast destination MAC to a unicast one accordingly. multicast-to-unicast works on top of the multicast snooping feature of the bridge. Which means unicast copies are only delivered to hosts which are interested in it and signalized this via IGMP/MLD reports previously. This feature is intended for interface types which have a more reliable and/or efficient way to deliver unicast packets than broadcast ones (e.g. wifi). However, it should only be enabled on interfaces where no IGMPv2/MLDv1 report suppression takes place. This feature is disabled by default. The initial patch and idea is from Felix Fietkau. Signed-off-by: Felix Fietkau <n...@nbd.name> [linus.luess...@c0d3.blue: various bug + style fixes, commit message] Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- This feature is used and enabled by default in OpenWRT and LEDE for AP interfaces for more than a year now to allow both a more robust multicast delivery and multicast at higher rates (e.g. multicast streaming). In OpenWRT/LEDE the IGMP/MLD report suppression issue is overcome by the network daemon enabling AP isolation and by that separating all STAs. Delivery of STA-to-STA IP mulitcast is made possible again by enabling and utilizing the bridge hairpin mode, which considers the incoming port as a potential outgoing port, too. Hairpin-mode is performed after multicast snooping, therefore leading to only deliver reports to STAs running a multicast router. Changes in v2: * netlink support (thanks Nik!) * fixed switching between multicast_to_unicast on/off -> even after toggling an already existing entry would stale in its mode and would never be replaced -> new extra check in br_port_group_equal) * reduced checks in br_multicast_flood() from two to one to address fast-path concerns (thanks Nik!) * rev-christmas tree ordering (thanks Nik!) * removed "net_bridge_port_group::unicast", using ::flags instead (thanks Nik!) * BR_MULTICAST_TO_UCAST -> BR_MULTICAST_TO_UNICAST (BR_MULTICAST_FLAST_LEAVE has the same length anyway) * simplified maybe_deliver_addr() (no return, no "prev" paramater -> was a NOP anyway) * added "From: Felix Fietkau [...]" * added "Signed-off-by: Felix Fietkau [...]" --- include/linux/if_bridge.h| 1 + include/uapi/linux/if_link.h | 1 + net/bridge/br_forward.c | 37 - net/bridge/br_mdb.c | 2 +- net/bridge/br_multicast.c| 96 net/bridge/br_netlink.c | 5 +++ net/bridge/br_private.h | 8 ++-- net/bridge/br_sysfs_if.c | 2 + 8 files changed, 121 insertions(+), 31 deletions(-) diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h index c6587c0..debc9d5 100644 --- a/include/linux/if_bridge.h +++ b/include/linux/if_bridge.h @@ -46,6 +46,7 @@ struct br_ip_list { #define BR_LEARNING_SYNC BIT(9) #define BR_PROXYARP_WIFI BIT(10) #define BR_MCAST_FLOOD BIT(11) +#define BR_MULTICAST_TO_UNICASTBIT(12) #define BR_DEFAULT_AGEING_TIME (300 * HZ) diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 6b13e59..4e59565 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -321,6 +321,7 @@ enum { IFLA_BRPORT_MULTICAST_ROUTER, IFLA_BRPORT_PAD, IFLA_BRPORT_MCAST_FLOOD, + IFLA_BRPORT_MCAST_TO_UCAST, __IFLA_BRPORT_MAX }; #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1) diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c index 7cb41ae..75d041e 100644 --- a/net/bridge/br_forward.c +++ b/net/bridge/br_forward.c @@ -174,6 +174,29 @@ static struct net_bridge_port *maybe_deliver( return p; } +static void maybe_deliver_addr(struct net_bridge_port *p, struct sk_buff *skb, + const unsigned char *addr, bool local_orig) +{ + struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev; + const unsigned char *src = eth_hdr(skb)->h_source; + + if (!should_deliver(p, skb)) + return; + + /* Even with hairpin, no soliloquies - prevent breaking IPv6 DAD */ + if (skb->dev == p->dev && ether_addr_equal(src, addr)) + return; + + skb = skb_copy(skb, GFP_ATOMIC); + if (!skb) { + dev->stats.tx_dropped++; + return; + } + + memcpy(eth_hdr(skb)->h_dest, addr, ETH_ALEN); + __br_forward(p, skb, local_orig); +} + /* called under rcu_read_lock */ void br_flood(struct net_bridge *br, struct sk_buff *skb, enum br_pkt_type pkt_type, bool local_rcv, bool local_orig) @@ -241,10 +264,20 @@ void b
Re: [PATCH net-next] bridge: multicast to unicast
On Mon, Jan 09, 2017 at 01:30:32PM -0800, Stephen Hemminger wrote: > I wonder if MAC80211 should be doing IGMP snooping and not bridge > in this environment. In the long term, yes. For now, not quite sure. I personally like to go for simple solutions first :).
Re: [PATCH net-next] bridge: multicast to unicast
On Mon, Jan 09, 2017 at 10:42:46PM +0100, Johannes Berg wrote: > On Mon, 2017-01-09 at 22:33 +0100, Linus Lüssing wrote: > > On Mon, Jan 09, 2017 at 01:44:03PM +0100, Johannes Berg wrote: > > > > > > > > A host SHOULD silently discard a datagram that is > > > > > received via > > > > > a link-layer broadcast (see Section 2.4) but does not > > > > > specify > > > > > an IP multicast or broadcast destination address. > > > > > > > > This example is the other way round. It specifies how the IP > > > > destination should look like in case of link-layer broadcast. Not > > > > how the link-layer destination should look like in case of a > > > > multicast/broadcast IP destination. > > > > > > You stopped reading too early - snipped the context part for you :) > > > > Sorry for writing to you directly, but I still have some > > difficulties. In pseudo-code that line says: > > > > - > > if ll_dst(pkt) == bcast AND ip_dst(pkt) != mcast/bcast: > > -> drop(pkt) > > - > > > > But after multicast-to-unicast conversion, we have: > > > > - > > ll_dst(pkt) == ucast AND ip_dst(pkt) == mcast > > - > > > > So none of the two requirements for dropping are matched? > > > > Exactly. My point is that this is breaking the expectation that hosts > are actually able to drop such packets. [readding CCs I removed earlier] Ah! Thanks. I was worried about creating packetloss :D. Hm, for this other other way round, I think it does not apply for the bridge multicast-to-unicast patch if I'm not misreading the bridge code: For a packet with a link-layer multicast address but a unicast IP destination, the bridge MDB lookup will fail. (http://lxr.free-electrons.com/source/net/bridge/br_multicast.c?v=4.8#L178 returns NULL) Case A): No multicast router on port: -> bridge, br_multicast_flood(), will drop the packet already (no matter if multicast-to-unicast is enabled or not) Case B): Multicast router present on port: -> The new patch does not apply multicast-to-unicast but just floods packet unaltered ("else { port = rport; addr = NULL; }" branch) Regards, Linus
Re: [PATCH net-next] bridge: multicast to unicast
On Mon, Jan 09, 2017 at 12:44:19PM +0100, M. Braun wrote: > Am 09.01.2017 um 09:08 schrieb Johannes Berg: > > Does it make sense to implement the two in separate layers though? > > > > Clearly, this part needs to be implemented in the bridge layer due to > > the snooping knowledge, but the code is very similar to what mac80211 > > has now. > > Does the bridge always know about all stations connected? The bridge does not always know about all stations, especially the silent ones like in your DVB-T example. However, concerning IP multicast, there is IGMP/MLD. So the bridge does know about all stations which are interested in a specific IP multicast stream. (As long as there is a querier on the link, which periodically queriers for IGMP/MLD reports from any listener. If there is no querier then the bridge multicast snooping, including the bridge multicast-to-unicast will fall back to flooding) So if your television example uses IP multicast properly, it is completely doable with the bridge multicast-to-unicast, thanks to IGMP/MLD.
Re: [PATCH net-next] bridge: multicast to unicast
On Mon, Jan 09, 2017 at 09:05:49AM +0100, Johannes Berg wrote: > On Sat, 2017-01-07 at 16:15 +0100, Linus Lüssing wrote: > > > Actually, I do not quite understand that remark in the mac80211 > > multicast-to-unicast patch. IP should not care about the ethernet > > header? > > But it does, for example RFC 1122 states: > > When a host sends a datagram to a link-layer broadcast address, > the IP destination address MUST be a legal IP broadcast or IP > multicast address. > > A host SHOULD silently discard a datagram that is received via > a link-layer broadcast (see Section 2.4) but does not specify > an IP multicast or broadcast destination address. This example is the other way round. It specifies how the IP destination should look like in case of link-layer broadcast. Not how the link-layer destination should look like in case of a multicast/broadcast IP destination. Any other examples?
Re: [PATCH net-next] bridge: multicast to unicast
On Fri, Jan 06, 2017 at 01:47:52PM +0100, Johannes Berg wrote: > How does this compare and/or relate to the multicast-to-unicast feature > we were going to add to the wifi stack, particularly mac80211? Do we > perhaps not need that feature at all, if bridging will have it? > > I suppose that the feature there could apply also to locally generated > traffic when the AP interface isn't in a bridge, but I think I could > live with requiring the AP to be put into a bridge to achieve a similar > configuration? > > Additionally, on an unrelated note, this seems to apply generically to > all kinds of frames, losing information by replacing the address. > Shouldn't it have similar limitations as the wifi stack feature has > then, like only applying to ARP, IPv4, IPv6 and not general protocols? (should all three be answered with Michael's and my reply to Michael's mail, I think) > > Also, it should probably come with the same caveat as we documented for > the wifi feature: > > Note that this may break certain expectations of the receiver, > such as the ability to drop unicast IP packets received within > multicast L2 frames, or the ability to not send ICMP destination > unreachable messages for packets received in L2 multicast (which > is required, but the receiver can't tell the difference if this > new option is enabled.) Actually, I do not quite understand that remark in the mac80211 multicast-to-unicast patch. IP should not care about the ethernet header?
Re: [PATCH net-next] bridge: multicast to unicast
On Fri, Jan 06, 2017 at 07:13:56PM -0800, Stephen Hemminger wrote: > On Mon, 2 Jan 2017 20:32:14 +0100 > Linus Lüssing <linus.luess...@c0d3.blue> wrote: > > > This feature is intended for interface types which have a more reliable > > and/or efficient way to deliver unicast packets than broadcast ones > > (e.g. wifi). > > > Why is this not done in MAC80211 rather than bridge? Because mac80211 does not have the IGMP/MLD snooping code as the bridge has? Reimplementing the snooping in mac80211 does not make sense because of duplicating code. Moving the snooping code from the bridge to mac80211 does not make sense either, because we want multicast snooping, software based, (virtually) wired switches, too. The "best way" (TM) would probably be to migrate the IGMP/MLD snooping from the bridge code to the net device code on the long run to make such a database usable for any kind of device, without needing this bridge hack. But such a migration would also need a way more invasive patchset. While Felix's idea might look a little "ugly" due it's hacky nature, I think it is also quite beautiful thanks to it's simplicity.
Re: [PATCH net-next] bridge: multicast to unicast
On Sat, Jan 07, 2017 at 11:32:57AM +0100, M. Braun wrote: > Am 06.01.2017 um 14:54 schrieb Johannes Berg: > > > >> The bridge layer can use IGMP snooping to ensure that the multicast > >> stream is only transmitted to clients that are actually a member of > >> the group. Can the mac80211 feature do the same? > > > > No, it'll convert the packet for all clients that are behind that > > netdev. But that's an argument for dropping the mac80211 feature, which > > hasn't been merged upstream yet, no? > > But there is multicast/broadcast traffic like e.g. ARP and some IP > multicast groups that are not covered by IGMP snooping. The mac80211 > patch converts this to unicast as well, which the bridge cannot do. > > That way, these features both complement and overlap each other. Right, I'd agree with that. I didn't write it explicitly in the commit message, but yes, the like anything concerning bridge multicast snooping, bridge multicast-to-unicast can only affect packets as noted in RFC4541 ("Considerations for Internet Group Management Protocol (IGMP) and Multicast Listener Discovery (MLD) Snooping Switches"), too. So it is only working for IPv4 multicast, excluding link-local (224.0.0.0/24), and IPv6 multicast, excluding all-host-multicast (ff02::1). And does not concern ARP in any way. The nice complementary effect is, that the bridge can first sieve out those IP packets thanks to IGMP/MLD snooping knowledge and for anything else, like ARP, 224.0.0.x or ff02::1, the mac80211 multicast-to-unicast could do its job. For APs with a small number of STAs (like your private home AP), you might want to enable both bridge multicast-to-unicast and mac80211 multicast-to-unicast for this complementary effect. While on public APs with 30 to 50 STAs with varying distances and bitrates, you might only one to enable the bridge one, because sending an ARP packet 50x might actually reduce performance and airtime significantly.
[PATCH net-next] bridge: multicast to unicast
Implements an optional, per bridge port flag and feature to deliver multicast packets to any host on the according port via unicast individually. This is done by copying the packet per host and changing the multicast destination MAC to a unicast one accordingly. multicast-to-unicast works on top of the multicast snooping feature of the bridge. Which means unicast copies are only delivered to hosts which are interested in it and signalized this via IGMP/MLD reports previously. This feature is intended for interface types which have a more reliable and/or efficient way to deliver unicast packets than broadcast ones (e.g. wifi). However, it should only be enabled on interfaces where no IGMPv2/MLDv1 report suppression takes place. This feature is disabled by default. The initial patch and idea is from Felix Fietkau. Cc: Felix Fietkau <n...@nbd.name> Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- This feature is used and enabled by default in OpenWRT and LEDE for AP interfaces for more than a year now to allow both a more robust multicast delivery and multicast at higher rates (e.g. multicast streaming). In OpenWRT/LEDE the IGMP/MLD report suppression issue is overcome by the network daemon enabling AP isolation and by that separating all STAs. Delivery of STA-to-STA IP mulitcast is made possible again by enabling and utilizing the bridge hairpin mode, which considers the incoming port as a potential outgoing port, too. Hairpin-mode is performed after multicast snooping, therefore leading to only deliver reports to STAs running a multicast router. --- include/linux/if_bridge.h | 1 + net/bridge/br_forward.c | 44 +-- net/bridge/br_mdb.c | 2 +- net/bridge/br_multicast.c | 92 ++- net/bridge/br_private.h | 4 ++- net/bridge/br_sysfs_if.c | 2 ++ 6 files changed, 115 insertions(+), 30 deletions(-) diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h index c6587c0..f1b0d78 100644 --- a/include/linux/if_bridge.h +++ b/include/linux/if_bridge.h @@ -46,6 +46,7 @@ struct br_ip_list { #define BR_LEARNING_SYNC BIT(9) #define BR_PROXYARP_WIFI BIT(10) #define BR_MCAST_FLOOD BIT(11) +#define BR_MULTICAST_TO_UCAST BIT(12) #define BR_DEFAULT_AGEING_TIME (300 * HZ) diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c index 7cb41ae..49d742d 100644 --- a/net/bridge/br_forward.c +++ b/net/bridge/br_forward.c @@ -174,6 +174,33 @@ static struct net_bridge_port *maybe_deliver( return p; } +static struct net_bridge_port *maybe_deliver_addr( + struct net_bridge_port *prev, struct net_bridge_port *p, + struct sk_buff *skb, const unsigned char *addr, + bool local_orig) +{ + struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev; + const unsigned char *src = eth_hdr(skb)->h_source; + + if (!should_deliver(p, skb)) + return prev; + + /* Even with hairpin, no soliloquies - prevent breaking IPv6 DAD */ + if (skb->dev == p->dev && ether_addr_equal(src, addr)) + return prev; + + skb = skb_copy(skb, GFP_ATOMIC); + if (!skb) { + dev->stats.tx_dropped++; + return prev; + } + + memcpy(eth_hdr(skb)->h_dest, addr, ETH_ALEN); + __br_forward(p, skb, local_orig); + + return prev; +} + /* called under rcu_read_lock */ void br_flood(struct net_bridge *br, struct sk_buff *skb, enum br_pkt_type pkt_type, bool local_rcv, bool local_orig) @@ -231,6 +258,7 @@ void br_multicast_flood(struct net_bridge_mdb_entry *mdst, struct net_bridge_port *prev = NULL; struct net_bridge_port_group *p; struct hlist_node *rp; + const unsigned char *addr; rp = rcu_dereference(hlist_first_rcu(>router_list)); p = mdst ? rcu_dereference(mdst->ports) : NULL; @@ -241,10 +269,20 @@ void br_multicast_flood(struct net_bridge_mdb_entry *mdst, rport = rp ? hlist_entry(rp, struct net_bridge_port, rlist) : NULL; - port = (unsigned long)lport > (unsigned long)rport ? - lport : rport; + if ((unsigned long)lport > (unsigned long)rport) { + port = lport; + addr = p->unicast ? p->eth_addr : NULL; + } else { + port = rport; + addr = NULL; + } + + if (addr) + prev = maybe_deliver_addr(prev, port, skb, addr, + local_orig); + else + prev = maybe_deliver(prev, port, skb, local_orig); - prev = maybe_deliver(prev, port, skb, local_orig); if (IS_ERR(prev)) goto out; if (prev == port
Re: [PATCH] mac80211_hwsim: suggest nl80211 instead of wext driver in documentation
On Mon, Oct 17, 2016 at 11:39:04AM +0200, Johannes Berg wrote: > On Mon, 2016-10-17 at 00:39 +0200, Linus Lüssing wrote: > > For mac80211_hwsim interfaces, suggest to use wpa_supplicant with the > > more modern, netlink based driver instead of wext. > > Makes sense, applied. > > > Actually, I wasn't even able to make a connection with the > > configuration > > files and information provided in > > Documentation/networking/mac80211_hwsim/{README,hostapd.conf/wpa_supp > > licant.conf} > > > This less so, we even have a few test cases we run regularly, but I > don't know. Maybe there's something special in those configuration > files that we don't test for otherwise. I investigated a little further and the problem is probably that I'm running a minimal Linux kernel which was compiled without CONFIG_CFG80211_WEXT :-). Anyways, thanks for applying the patch! Regards, Linus
[PATCH] mac80211_hwsim: suggest nl80211 instead of wext driver in documentation
For mac80211_hwsim interfaces, suggest to use wpa_supplicant with the more modern, netlink based driver instead of wext. Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- Actually, I wasn't even able to make a connection with the configuration files and information provided in Documentation/networking/mac80211_hwsim/{README,hostapd.conf/wpa_supplicant.conf} Changing -Dwext to -Dnl80211 helped and made the WPA-PSK connection with mac80211_hwsim interfaces work for me. --- Documentation/networking/mac80211_hwsim/README | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/networking/mac80211_hwsim/README b/Documentation/networking/mac80211_hwsim/README index 24ac91d..3566a72 100644 --- a/Documentation/networking/mac80211_hwsim/README +++ b/Documentation/networking/mac80211_hwsim/README @@ -60,7 +60,7 @@ modprobe mac80211_hwsim hostapd hostapd.conf # Run wpa_supplicant (station) for wlan1 -wpa_supplicant -Dwext -iwlan1 -c wpa_supplicant.conf +wpa_supplicant -Dnl80211 -iwlan1 -c wpa_supplicant.conf More test cases are available in hostap.git: -- 2.1.4
Re: [Bridge] [PATCH net-next v2 2/2] net: bridge: add per-port multicast flood flag
On Wed, Aug 31, 2016 at 08:02:22AM +0200, Nikolay Aleksandrov wrote: > On 31/08/16 03:37, Linus Lüssing wrote: > > On Tue, Aug 30, 2016 at 05:23:08PM +0200, Nikolay Aleksandrov via Bridge > > wrote: > >> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c > >> index 1da3221845f1..ed0dd3340084 100644 > >> --- a/net/bridge/br_if.c > >> +++ b/net/bridge/br_if.c > >> @@ -362,7 +362,7 @@ static struct net_bridge_port *new_nbp(struct > >> net_bridge *br, > >>p->path_cost = port_cost(dev); > >>p->priority = 0x8000 >> BR_PORT_BITS; > >>p->port_no = index; > >> - p->flags = BR_LEARNING | BR_FLOOD; > >> + p->flags = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD; > > > > I'm discontent with this new flag becoming the default. > > > > Could you elaborate a little more on your use-case, when/why do > > you want/need this flag? > > > > The use case is the current default behaviour if we don't make this flag on > by default > then we'll change user-visible default behaviour. Right now we flood > unregistered mcast > traffic by default (if there's no querier and router port, which continues to > function > as before). Also we have the port flags equal to BR_AUTO_MASK by default. Ok, you're right, the way you implemented it doesn't change the default behaviour (ignoring the BR_AUTO_MASK change you removed in v3). I guess the "similar to the unknown unicast flood flag" confused me a little and I was afraid that the "flood if there is no listener / MDB entry" behaviour, which was removed some years ago, would be reintroduced. (but yeah, looking at the code more closely, it doesn't do that) Thanks for the clarification! Regards, Linus
Re: [Bridge] [PATCH net-next v2 2/2] net: bridge: add per-port multicast flood flag
On Tue, Aug 30, 2016 at 05:23:08PM +0200, Nikolay Aleksandrov via Bridge wrote: > diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c > index 1da3221845f1..ed0dd3340084 100644 > --- a/net/bridge/br_if.c > +++ b/net/bridge/br_if.c > @@ -362,7 +362,7 @@ static struct net_bridge_port *new_nbp(struct net_bridge > *br, > p->path_cost = port_cost(dev); > p->priority = 0x8000 >> BR_PORT_BITS; > p->port_no = index; > - p->flags = BR_LEARNING | BR_FLOOD; > + p->flags = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD; I'm discontent with this new flag becoming the default. Could you elaborate a little more on your use-case, when/why do you want/need this flag?
[PATCH] cfg80211: Add stub for cfg80211_get_station()
This allows modules using this function (currently: batman-adv) to compile even if cfg80211 is not built at all, thus relaxing dependencies. Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- include/net/cfg80211.h | 9 + 1 file changed, 9 insertions(+) diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h index 9c23f4d3..beb7610 100644 --- a/include/net/cfg80211.h +++ b/include/net/cfg80211.h @@ -1102,6 +1102,7 @@ struct station_info { struct cfg80211_tid_stats pertid[IEEE80211_NUM_TIDS + 1]; }; +#if IS_ENABLED(CONFIG_CFG80211) /** * cfg80211_get_station - retrieve information about a given station * @dev: the device where the station is supposed to be connected to @@ -1114,6 +1115,14 @@ struct station_info { */ int cfg80211_get_station(struct net_device *dev, const u8 *mac_addr, struct station_info *sinfo); +#else +static inline int cfg80211_get_station(struct net_device *dev, + const u8 *mac_addr, + struct station_info *sinfo) +{ + return -ENOENT; +} +#endif /** * enum monitor_flags - monitor flags -- 2.1.4
Re: [PATCH net] Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address
On Tue, Jun 28, 2016 at 08:04:42AM -0400, David Miller wrote: > From: Linus Lüssing <linus.luess...@c0d3.blue> > [...] > > Fixes: 1d81d4c3dd88 ("bridge: check return value of ipv6_dev_get_saddr()") > > You're missing an initial 'd' in that SHA1-ID. > > With that fixed, applied and queued up for -stable. Sorry :(. Thanks for taking care of it!
Re: [Bridge] [PATCH net-next] net: bridge: add support for IGMP/MLD stats and export them via netlink
On Mon, Jun 27, 2016 at 08:10:48PM +0200, Nikolay Aleksandrov via Bridge wrote: > These are invaluable when monitoring or debugging complex multicast setups > with bridges. Indeed! Great patch :). Especially if people are unable to provide pcap files for debugging (due to whatever reason). Hopefully that will help with bugzilla ticket #99081, too... I know it might not quite fit into your current patch, which simply stores the ICMPv6 and IGMP type in the bridge private skb->cb, but do you think you could count and export the following two more things, too: * MLDv1 vs. MLDv2 querier (and IGMP accordingly) * Number of (potential) MLD/IGMP parse errors (e.g. beginning of br_multicast_ipv{4,6}_rcv(): http://lxr.free-electrons.com/source/net/bridge/br_multicast.c?v=4.5#L1588 and http://lxr.free-electrons.com/source/net/bridge/br_multicast.c?v=4.5#L1634) The former would help to know how the network is expected to behave (for instance whether you should see MLDv2 reports at all or whether / how much report suppression to expect). The latter would help to spot either potential IGMP/MLD parsing bugs in the bridge or malformed IGMP/MLD messages send by someone else. Ideally, there would be per port counters again for the overall IPv4/IPv6 multicast traffic. That would help for multicast streams for instance, to easily see whether multicast counters increase rapidly on the ports you would expect them to. And whether snooping is working in general for such streames, without needing to check each port individually via tcpdump, for instance. Just some thoughts, would love to hear what you think about them. Regards, Linus
Re: [PATCH net] Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address
On Fri, Jun 24, 2016 at 12:35:18PM +0200, Daniel Danzberger wrote: > The bridge is falsly dropping ipv6 mulitcast packets if there is: > 1. No ipv6 address assigned on the brigde. > 2. No external mld querier present. > 3. The internal querier enabled. > > When the bridge fails to build mld queries, because it has no > ipv6 address, it slilently returns, but keeps the local querier enabled. > This specific case causes confusing packet loss. > > Ipv6 multicast snooping can only work if: > a) An external querier is present > OR > b) The bridge has an ipv6 address an is capable of sending own queries > > Otherwise it has to forward/flood the ipv6 multicast traffic, > because snooping cannot work. > > This patch fixes the issue by adding a flag to the bridge struct that > indicates that there is currently no ipv6 address assinged to the bridge > and returns a false state for the local querier in > __br_multicast_querier_exists(). Fixes: 1d81d4c3dd88 ("bridge: check return value of ipv6_dev_get_saddr()")
Re: [PATCH net] Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address
On Fri, Jun 24, 2016 at 12:35:18PM +0200, Daniel Danzberger wrote: > The bridge is falsly dropping ipv6 mulitcast packets if there is: > 1. No ipv6 address assigned on the brigde. > 2. No external mld querier present. > 3. The internal querier enabled. > > When the bridge fails to build mld queries, because it has no > ipv6 address, it slilently returns, but keeps the local querier enabled. > This specific case causes confusing packet loss. > > Ipv6 multicast snooping can only work if: > a) An external querier is present > OR > b) The bridge has an ipv6 address an is capable of sending own queries > > Otherwise it has to forward/flood the ipv6 multicast traffic, > because snooping cannot work. > > This patch fixes the issue by adding a flag to the bridge struct that > indicates that there is currently no ipv6 address assinged to the bridge > and returns a false state for the local querier in > __br_multicast_querier_exists(). Acked-by: Linus Lüssing <linus.luess...@c0d3.blue>
Re: [PATCH] Bridge: Fix ipv6 mc snooping if it has no ipv6 address.
Hi Daniel, Thanks for submitting this patch here :). On Thu, Jun 23, 2016 at 11:28:55AM +0200, daniel wrote: > The bridge is falsly dropping ipv6 mulitcast packets > if there is no ipv6 address assigned on the brigde and no > external mld querier is present. and if the bridge internal querier is enabled (usually disabled by default in the bridge code, but enabled by default in OpenWRT for instance). > > When the bridge fails to build mld queries, because it has no > ipv6 address, it silently returns, but keeps the local querier enabled. > (br_multicast.c:832) Not sure whether David or others like line numbers in commit messages, as they can change over time. > > Ipv6 multicast snooping can only work if: > a) an external querier is present maybe clarify that this is an OR, not AND? I think you can add a [PATCH net] tag, as it seems small enough for stable kernels and fixes a potential, confusing packet loss case. Also maybe add a: -- Fixes: 1d81d4c3dd88 ("bridge: check return value of ipv6_dev_get_saddr()") -- Regards, Linus PS: Does not seem to apply for me on either David's net branch or Torvald's master branch. "fatal: patch fragment without header at line 7: @@ -599,10 +612,12 @@ static inline bool" Try using "git format-patch" and "git send-email" instead. Also check ./scripts/get_maintainer.pl for a few more email addresses to add.
[PATCHv2 net] bridge: fix igmp / mld query parsing
With the newly introduced helper functions the skb pulling is hidden in the checksumming function - and undone before returning to the caller. The IGMP and MLD query parsing functions in the bridge still assumed that the skb is pointing to the beginning of the IGMP/MLD message while it is now kept at the beginning of the IPv4/6 header. If there is a querier somewhere else, then this either causes the multicast snooping to stay disabled even though it could be enabled. Or, if we have the querier enabled too, then this can create unnecessary IGMP / MLD query messages on the link. Fixing this by taking the offset between IP and IGMP/MLD header into account, too. Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code") Reported-by: Simon Wunderlich <s...@simonwunderlich.de> Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- v2: changed "int" to "unsigned int" net/bridge/br_multicast.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index 03661d9..ea98937 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -1270,6 +1270,7 @@ static int br_ip4_multicast_query(struct net_bridge *br, struct br_ip saddr; unsigned long max_delay; unsigned long now = jiffies; + unsigned int offset = skb_transport_offset(skb); __be32 group; int err = 0; @@ -1280,14 +1281,14 @@ static int br_ip4_multicast_query(struct net_bridge *br, group = ih->group; - if (skb->len == sizeof(*ih)) { + if (skb->len == offset + sizeof(*ih)) { max_delay = ih->code * (HZ / IGMP_TIMER_SCALE); if (!max_delay) { max_delay = 10 * HZ; group = 0; } - } else if (skb->len >= sizeof(*ih3)) { + } else if (skb->len >= offset + sizeof(*ih3)) { ih3 = igmpv3_query_hdr(skb); if (ih3->nsrcs) goto out; @@ -1348,6 +1349,7 @@ static int br_ip6_multicast_query(struct net_bridge *br, struct br_ip saddr; unsigned long max_delay; unsigned long now = jiffies; + unsigned int offset = skb_transport_offset(skb); const struct in6_addr *group = NULL; bool is_general_query; int err = 0; @@ -1357,8 +1359,8 @@ static int br_ip6_multicast_query(struct net_bridge *br, (port && port->state == BR_STATE_DISABLED)) goto out; - if (skb->len == sizeof(*mld)) { - if (!pskb_may_pull(skb, sizeof(*mld))) { + if (skb->len == offset + sizeof(*mld)) { + if (!pskb_may_pull(skb, offset + sizeof(*mld))) { err = -EINVAL; goto out; } @@ -1367,7 +1369,7 @@ static int br_ip6_multicast_query(struct net_bridge *br, if (max_delay) group = >mld_mca; } else { - if (!pskb_may_pull(skb, sizeof(*mld2q))) { + if (!pskb_may_pull(skb, offset + sizeof(*mld2q))) { err = -EINVAL; goto out; } -- 1.7.10.4
Re: [PATCH net] bridge: fix igmp / mld query parsing
On Tue, May 03, 2016 at 01:26:23PM -0700, Stephen Hemminger wrote: > On Tue, 3 May 2016 22:18:54 +0200 > Linus Lüssing <linus.luess...@c0d3.blue> wrote: > > > diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c > > index 03661d9..7105cdf 100644 > > --- a/net/bridge/br_multicast.c > > +++ b/net/bridge/br_multicast.c > > @@ -1271,6 +1271,7 @@ static int br_ip4_multicast_query(struct net_bridge > > *br, > > unsigned long max_delay; > > unsigned long now = jiffies; > > __be32 group; > > + int offset = skb_transport_offset(skb); > shouldn't this be unsigned? Yes, should always be unsigned here. Ok, I'm changing that (even though skb_transport_offset() is "static inline int").
[PATCH net] bridge: fix igmp / mld query parsing
With the newly introduced helper functions the skb pulling is hidden in the checksumming function - and undone before returning to the caller. The IGMP and MLD query parsing functions in the bridge still assumed that the skb is pointing to the beginning of the IGMP/MLD message while it is now kept at the beginning of the IPv4/6 header. If there is a querier somewhere else, then this either causes the multicast snooping to stay disabled even though it could be enabled. Or, if we have the querier enabled too, then this can create unnecessary IGMP / MLD query messages on the link. Fixing this by taking the offset between IP and IGMP/MLD header into account, too. Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code") Reported-by: Simon Wunderlich <s...@simonwunderlich.de> Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- net/bridge/br_multicast.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index 03661d9..7105cdf 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -1271,6 +1271,7 @@ static int br_ip4_multicast_query(struct net_bridge *br, unsigned long max_delay; unsigned long now = jiffies; __be32 group; + int offset = skb_transport_offset(skb); int err = 0; spin_lock(>multicast_lock); @@ -1280,14 +1281,14 @@ static int br_ip4_multicast_query(struct net_bridge *br, group = ih->group; - if (skb->len == sizeof(*ih)) { + if (skb->len == offset + sizeof(*ih)) { max_delay = ih->code * (HZ / IGMP_TIMER_SCALE); if (!max_delay) { max_delay = 10 * HZ; group = 0; } - } else if (skb->len >= sizeof(*ih3)) { + } else if (skb->len >= offset + sizeof(*ih3)) { ih3 = igmpv3_query_hdr(skb); if (ih3->nsrcs) goto out; @@ -1350,6 +1351,7 @@ static int br_ip6_multicast_query(struct net_bridge *br, unsigned long now = jiffies; const struct in6_addr *group = NULL; bool is_general_query; + int offset = skb_transport_offset(skb); int err = 0; spin_lock(>multicast_lock); @@ -1357,8 +1359,8 @@ static int br_ip6_multicast_query(struct net_bridge *br, (port && port->state == BR_STATE_DISABLED)) goto out; - if (skb->len == sizeof(*mld)) { - if (!pskb_may_pull(skb, sizeof(*mld))) { + if (skb->len == offset + sizeof(*mld)) { + if (!pskb_may_pull(skb, offset + sizeof(*mld))) { err = -EINVAL; goto out; } @@ -1367,7 +1369,7 @@ static int br_ip6_multicast_query(struct net_bridge *br, if (max_delay) group = >mld_mca; } else { - if (!pskb_may_pull(skb, sizeof(*mld2q))) { + if (!pskb_may_pull(skb, offset + sizeof(*mld2q))) { err = -EINVAL; goto out; } -- 1.7.10.4
[PATCHv3] net: fix bridge multicast packet checksum validation
We need to update the skb->csum after pulling the skb, otherwise an unnecessary checksum (re)computation can ocure for IGMP/MLD packets in the bridge code. Additionally this fixes the following splats for network devices / bridge ports with support for and enabled RX checksum offloading: [...] [ 43.986968] eth0: hw csum failure [ 43.990344] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0 #2 [ 43.996193] Hardware name: BCM2709 [ 43.999647] [<800204e0>] (unwind_backtrace) from [<8001cf14>] (show_stack+0x10/0x14) [ 44.007432] [<8001cf14>] (show_stack) from [<801ab614>] (dump_stack+0x80/0x90) [ 44.014695] [<801ab614>] (dump_stack) from [<802e4548>] (__skb_checksum_complete+0x6c/0xac) [ 44.023090] [<802e4548>] (__skb_checksum_complete) from [<803a055c>] (ipv6_mc_validate_checksum+0x104/0x178) [ 44.032959] [<803a055c>] (ipv6_mc_validate_checksum) from [<802e111c>] (skb_checksum_trimmed+0x130/0x188) [ 44.042565] [<802e111c>] (skb_checksum_trimmed) from [<803a06e8>] (ipv6_mc_check_mld+0x118/0x338) [ 44.051501] [<803a06e8>] (ipv6_mc_check_mld) from [<803b2c98>] (br_multicast_rcv+0x5dc/0xd00) [ 44.060077] [<803b2c98>] (br_multicast_rcv) from [<803aa510>] (br_handle_frame_finish+0xac/0x51c) [...] Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code") Reported-by: Álvaro Fernández Rojas <nolt...@gmail.com> Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- v3: * Missed that an skb_postpush_rcsum() was just added for v4.5, using that one instead v2: * substituted the storing of the csum with an skb_push_rcsum() approach (did some more reading about CHECKSUM_PARTIAL, and it seems to me that the skb_push direction is actually the easier/"safer" one, so there should be no resetting to CHECKSUM_NONE necessary) * Rewording of commit message, this bug should not cause any packet loss due to "automatic" checksum recomputations in software if skb->csum is bogus (in the CHECKSUM_COMPLETE case) net/core/skbuff.c | 22 -- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 5bf88f5..8616d11 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -2948,6 +2948,24 @@ int skb_append_pagefrags(struct sk_buff *skb, struct page *page, EXPORT_SYMBOL_GPL(skb_append_pagefrags); /** + * skb_push_rcsum - push skb and update receive checksum + * @skb: buffer to update + * @len: length of data pulled + * + * This function performs an skb_push on the packet and updates + * the CHECKSUM_COMPLETE checksum. It should be used on + * receive path processing instead of skb_push unless you know + * that the checksum difference is zero (e.g., a valid IP header) + * or you are setting ip_summed to CHECKSUM_NONE. + */ +static unsigned char *skb_push_rcsum(struct sk_buff *skb, unsigned len) +{ + skb_push(skb, len); + skb_postpush_rcsum(skb, skb->data, len); + return skb->data; +} + +/** * skb_pull_rcsum - pull skb and update receive checksum * @skb: buffer to update * @len: length of data pulled @@ -4084,9 +4102,9 @@ struct sk_buff *skb_checksum_trimmed(struct sk_buff *skb, if (!pskb_may_pull(skb_chk, offset)) goto err; - __skb_pull(skb_chk, offset); + skb_pull_rcsum(skb_chk, offset); ret = skb_chkf(skb_chk); - __skb_push(skb_chk, offset); + skb_push_rcsum(skb_chk, offset); if (ret) goto err; -- 1.7.10.4
[PATCHv2] net: fix bridge multicast packet checksum validation
We need to update the skb->csum after pulling the skb, otherwise an unnecessary checksum (re)computation can ocure for IGMP/MLD packets in the bridge code. Additionally this fixes the following splats for network devices / bridge ports with support for and enabled RX checksum offloading: [...] [ 43.986968] eth0: hw csum failure [ 43.990344] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0 #2 [ 43.996193] Hardware name: BCM2709 [ 43.999647] [<800204e0>] (unwind_backtrace) from [<8001cf14>] (show_stack+0x10/0x14) [ 44.007432] [<8001cf14>] (show_stack) from [<801ab614>] (dump_stack+0x80/0x90) [ 44.014695] [<801ab614>] (dump_stack) from [<802e4548>] (__skb_checksum_complete+0x6c/0xac) [ 44.023090] [<802e4548>] (__skb_checksum_complete) from [<803a055c>] (ipv6_mc_validate_checksum+0x104/0x178) [ 44.032959] [<803a055c>] (ipv6_mc_validate_checksum) from [<802e111c>] (skb_checksum_trimmed+0x130/0x188) [ 44.042565] [<802e111c>] (skb_checksum_trimmed) from [<803a06e8>] (ipv6_mc_check_mld+0x118/0x338) [ 44.051501] [<803a06e8>] (ipv6_mc_check_mld) from [<803b2c98>] (br_multicast_rcv+0x5dc/0xd00) [ 44.060077] [<803b2c98>] (br_multicast_rcv) from [<803aa510>] (br_handle_frame_finish+0xac/0x51c) [...] Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code") Reported-by: Álvaro Fernández Rojas <nolt...@gmail.com> Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- v2: * substituted the storing of the csum with an skb_push_rcsum() approach (did some more reading about CHECKSUM_PARTIAL, and it seems to me that the skb_push direction is actually the easier/"safer" one, so there should be no resetting to CHECKSUM_NONE necessary) * Rewording of commit message, this bug should not cause any packet loss due to "automatic" checksum recomputations in software if skb->csum is bogus (in the CHECKSUM_COMPLETE case) include/linux/skbuff.h | 17 + net/core/skbuff.c | 22 -- 2 files changed, 37 insertions(+), 2 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 4ce9ff7..d66e007 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2784,6 +2784,23 @@ static inline int skb_linearize_cow(struct sk_buff *skb) } /** + * skb_postpush_rcsum - update checksum for received skb after push + * @skb: buffer to update + * @start: start of data before push + * @len: length of data pushed + * + * After doing a push on a received packet, you need to call this to + * update the CHECKSUM_COMPLETE checksum. + */ + +static inline void skb_postpush_rcsum(struct sk_buff *skb, + const void *start, unsigned int len) +{ + if (skb->ip_summed == CHECKSUM_COMPLETE) + skb->csum = csum_add(skb->csum, csum_partial(start, len, 0)); +} + +/** * skb_postpull_rcsum - update checksum for received skb after pull * @skb: buffer to update * @start: start of data before pull diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 5bf88f5..8616d11 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -2948,6 +2948,24 @@ int skb_append_pagefrags(struct sk_buff *skb, struct page *page, EXPORT_SYMBOL_GPL(skb_append_pagefrags); /** + * skb_push_rcsum - push skb and update receive checksum + * @skb: buffer to update + * @len: length of data pulled + * + * This function performs an skb_push on the packet and updates + * the CHECKSUM_COMPLETE checksum. It should be used on + * receive path processing instead of skb_push unless you know + * that the checksum difference is zero (e.g., a valid IP header) + * or you are setting ip_summed to CHECKSUM_NONE. + */ +static unsigned char *skb_push_rcsum(struct sk_buff *skb, unsigned len) +{ + skb_push(skb, len); + skb_postpush_rcsum(skb, skb->data, len); + return skb->data; +} + +/** * skb_pull_rcsum - pull skb and update receive checksum * @skb: buffer to update * @len: length of data pulled @@ -4084,9 +4102,9 @@ struct sk_buff *skb_checksum_trimmed(struct sk_buff *skb, if (!pskb_may_pull(skb_chk, offset)) goto err; - __skb_pull(skb_chk, offset); + skb_pull_rcsum(skb_chk, offset); ret = skb_chkf(skb_chk); - __skb_push(skb_chk, offset); + skb_push_rcsum(skb_chk, offset); if (ret) goto err; -- 1.7.10.4
Re: [PATCH] net: fix bridge multicast packet checksum validation
On Thu, Feb 18, 2016 at 01:51:34PM +0100, Steinar H. Gunderson wrote: > On Mon, Feb 15, 2016 at 03:07:06AM +0100, Linus Lüssing wrote: > > Steinar, can you check whether this fixes the bridge issues you reported on > > bugzilla #99081? Not quite sure whether it is the same as yours as you > > do not seem to have any such call traces. > > It doesn't immediately sound like the same problem; why would promisc change > anything if the problem is the checksumming? The mdb you provided in the bugzilla ticket misses reports, so it was unable to parse reports. Which could point to a checksumming problem. Enabling promisc probably did not fix the parsing for you, but instead promisc forces to forward packets upstream on your interface independent of the mdb. I would assume that even with promisc, your output from "bridge mdb show" looks rather empty. Can you check? > > I don't have any reboots scheduled for this machine right now, but I'll see > what I can do wrt. testing. Thanks :).
[PATCH] net: fix bridge multicast packet checksum validation
We need to update the skb->csum after pulling the skb, otherwise checksum validation will fail. This fixes multicast packet loss for bridges and splats like the following: [...] [ 43.986968] eth0: hw csum failure [ 43.990344] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0 #2 [ 43.996193] Hardware name: BCM2709 [ 43.999647] [<800204e0>] (unwind_backtrace) from [<8001cf14>] (show_stack+0x10/0x14) [ 44.007432] [<8001cf14>] (show_stack) from [<801ab614>] (dump_stack+0x80/0x90) [ 44.014695] [<801ab614>] (dump_stack) from [<802e4548>] (__skb_checksum_complete+0x6c/0xac) [ 44.023090] [<802e4548>] (__skb_checksum_complete) from [<803a055c>] (ipv6_mc_validate_checksum+0x104/0x178) [ 44.032959] [<803a055c>] (ipv6_mc_validate_checksum) from [<802e111c>] (skb_checksum_trimmed+0x130/0x188) [ 44.042565] [<802e111c>] (skb_checksum_trimmed) from [<803a06e8>] (ipv6_mc_check_mld+0x118/0x338) [ 44.051501] [<803a06e8>] (ipv6_mc_check_mld) from [<803b2c98>] (br_multicast_rcv+0x5dc/0xd00) [ 44.060077] [<803b2c98>] (br_multicast_rcv) from [<803aa510>] (br_handle_frame_finish+0xac/0x51c) [...] Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code") Reported-by: Álvaro Fernández Rojas <nolt...@gmail.com> Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- Steinar, can you check whether this fixes the bridge issues you reported on bugzilla #99081? Not quite sure whether it is the same as yours as you do not seem to have any such call traces. I am not super happy with how this patch looks, but there is no "skb_push_rcsum" available and skb_pull_rcsum() seems non-reversible as is. Alternative suggestions always welcome. net/core/skbuff.c | 19 ++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 5bf88f5..6c34ef6 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4076,6 +4076,11 @@ struct sk_buff *skb_checksum_trimmed(struct sk_buff *skb, struct sk_buff *skb_chk; unsigned int offset = skb_transport_offset(skb); __sum16 ret; + int ip_summed; + int csum_valid; + int csum_level; + int csum_bad; + __wsum csum; skb_chk = skb_checksum_maybe_trim(skb, transport_len); if (!skb_chk) @@ -4084,10 +4089,22 @@ struct sk_buff *skb_checksum_trimmed(struct sk_buff *skb, if (!pskb_may_pull(skb_chk, offset)) goto err; - __skb_pull(skb_chk, offset); + ip_summed = skb->ip_summed; + csum_valid = skb->csum_valid; + csum_level = skb->csum_level; + csum_bad = skb->csum_bad; + csum = skb->csum; + + skb_pull_rcsum(skb_chk, offset); ret = skb_chkf(skb_chk); __skb_push(skb_chk, offset); + skb->ip_summed = ip_summed; + skb->csum_valid = csum_valid; + skb->csum_level = csum_level; + skb->csum_bad = csum_bad; + skb->csum = csum; + if (ret) goto err; -- 1.7.10.4
[PATCH net] bridge: fix igmpv3 / mldv2 report parsing
With the newly introduced helper functions the skb pulling is hidden in the checksumming function - and undone before returning to the caller. The IGMPv3 and MLDv2 report parsing functions in the bridge still assumed that the skb is pointing to the beginning of the IGMP/MLD message while it is now kept at the beginning of the IPv4/6 header, breaking the message parsing and creating packet loss. Fixing this by taking the offset between IP and IGMP/MLD header into account, too. Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code") Reported-by: Tobias Powalowski <tobias.powalow...@googlemail.com> Tested-by: Tobias Powalowski <tobias.powalow...@googlemail.com> Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue> --- net/bridge/br_multicast.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index 66efdc2..480b3de 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -1006,7 +1006,7 @@ static int br_ip4_multicast_igmp3_report(struct net_bridge *br, ih = igmpv3_report_hdr(skb); num = ntohs(ih->ngrec); - len = sizeof(*ih); + len = skb_transport_offset(skb) + sizeof(*ih); for (i = 0; i < num; i++) { len += sizeof(*grec); @@ -1067,7 +1067,7 @@ static int br_ip6_multicast_mld2_report(struct net_bridge *br, icmp6h = icmp6_hdr(skb); num = ntohs(icmp6h->icmp6_dataun.un_data16[1]); - len = sizeof(*icmp6h); + len = skb_transport_offset(skb) + sizeof(*icmp6h); for (i = 0; i < num; i++) { __be16 *nsrcs, _nsrcs; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] net: fix wrong skb_get() usage / crash in IGMP/MLD parsing code
The recent refactoring of the IGMP and MLD parsing code into ipv6_mc_check_mld() / ip_mc_check_igmp() introduced a potential crash / BUG() invocation for bridges: I wrongly assumed that skb_get() could be used as a simple reference counter for an skb which is not the case. skb_get() bears additional semantics, a user count. This leads to a BUG() invocation in pskb_expand_head() / kernel panic if pskb_may_pull() is called on an skb with a user count greater than one - unfortunately the refactoring did just that. Fixing this by removing the skb_get() call and changing the API: The caller of ipv6_mc_check_mld() / ip_mc_check_igmp() now needs to additionally check whether the returned skb_trimmed is a clone. Fixes: 9afd85c9e455 (net: Export IGMP/MLD message validation code) Reported-by: Brenden Blanco bbla...@plumgrid.com Signed-off-by: Linus Lüssing linus.luess...@c0d3.blue --- net/bridge/br_multicast.c |4 ++-- net/core/skbuff.c | 37 ++--- net/ipv4/igmp.c | 33 ++--- net/ipv6/mcast_snoop.c| 33 ++--- 4 files changed, 56 insertions(+), 51 deletions(-) diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index 0b39dcc..1285eaf 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -1591,7 +1591,7 @@ static int br_multicast_ipv4_rcv(struct net_bridge *br, break; } - if (skb_trimmed) + if (skb_trimmed skb_trimmed != skb) kfree_skb(skb_trimmed); return err; @@ -1636,7 +1636,7 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br, break; } - if (skb_trimmed) + if (skb_trimmed skb_trimmed != skb) kfree_skb(skb_trimmed); return err; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index b6a19ca..bf9a5d9 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4022,8 +4022,8 @@ EXPORT_SYMBOL(skb_checksum_setup); * Otherwise returns the provided skb. Returns NULL in error cases * (e.g. transport_len exceeds skb length or out-of-memory). * - * Caller needs to set the skb transport header and release the returned skb. - * Provided skb is consumed. + * Caller needs to set the skb transport header and free any returned skb if it + * differs from the provided skb. */ static struct sk_buff *skb_checksum_maybe_trim(struct sk_buff *skb, unsigned int transport_len) @@ -4032,16 +4032,12 @@ static struct sk_buff *skb_checksum_maybe_trim(struct sk_buff *skb, unsigned int len = skb_transport_offset(skb) + transport_len; int ret; - if (skb-len len) { - kfree_skb(skb); + if (skb-len len) return NULL; - } else if (skb-len == len) { + else if (skb-len == len) return skb; - } skb_chk = skb_clone(skb, GFP_ATOMIC); - kfree_skb(skb); - if (!skb_chk) return NULL; @@ -4066,8 +4062,8 @@ static struct sk_buff *skb_checksum_maybe_trim(struct sk_buff *skb, * If the skb has data beyond the given transport length, then a * trimmed cloned skb is checked and returned. * - * Caller needs to set the skb transport header and release the returned skb. - * Provided skb is consumed. + * Caller needs to set the skb transport header and free any returned skb if it + * differs from the provided skb. */ struct sk_buff *skb_checksum_trimmed(struct sk_buff *skb, unsigned int transport_len, @@ -4079,23 +4075,26 @@ struct sk_buff *skb_checksum_trimmed(struct sk_buff *skb, skb_chk = skb_checksum_maybe_trim(skb, transport_len); if (!skb_chk) - return NULL; + goto err; - if (!pskb_may_pull(skb_chk, offset)) { - kfree_skb(skb_chk); - return NULL; - } + if (!pskb_may_pull(skb_chk, offset)) + goto err; __skb_pull(skb_chk, offset); ret = skb_chkf(skb_chk); __skb_push(skb_chk, offset); - if (ret) { - kfree_skb(skb_chk); - return NULL; - } + if (ret) + goto err; return skb_chk; + +err: + if (skb_chk skb_chk != skb) + kfree_skb(skb_chk); + + return NULL; + } EXPORT_SYMBOL(skb_checksum_trimmed); diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index 651cdf6..9fdfd9d 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -1435,33 +1435,35 @@ static int __ip_mc_check_igmp(struct sk_buff *skb, struct sk_buff **skb_trimmed) struct sk_buff *skb_chk; unsigned int transport_len; unsigned int len = skb_transport_offset(skb) + sizeof(struct igmphdr); - int ret; + int ret = -EINVAL; transport_len = ntohs(ip_hdr(skb)-tot_len) - ip_hdrlen(skb); - skb_get(skb
Re: ipv6_mc_check_mld - kernel BUG at net/core/skbuff.c:1128
On Mon, Aug 10, 2015 at 02:56:12PM -0700, Brenden Blanco wrote: Doing some code reading with Alexei, we found a suspect commit, which introduces an skb_get and skb_may_pull of the same skb, which leads to the BUG when skb-len == len. Urgh, didn't know that pskb_may_pull() doesn't like an skb with a reference count greater than one... But yes, the BUG() call in skbuff.c:1128 / pskb_expand_head() says that (though in this case the BUG() in skbuff.c call actually seems kinda weird (/wrong?), as it isn't shared between different code paths). Thanks for the thorough analysis, going to provide a patch within the next 24h (hopefully). Cheers, Linus -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ipv6_mc_check_mld - kernel BUG at net/core/skbuff.c:1128
On Tue, Aug 11, 2015 at 10:51:40PM +0200, Linus Lüssing wrote: On Mon, Aug 10, 2015 at 02:56:12PM -0700, Brenden Blanco wrote: Doing some code reading with Alexei, we found a suspect commit, which introduces an skb_get and skb_may_pull of the same skb, which leads to the BUG when skb-len == len. Urgh, didn't know that pskb_may_pull() doesn't like an skb with a reference count greater than one... But yes, the BUG() call in skbuff.c:1128 / pskb_expand_head() says that (though in this case the BUG() in skbuff.c call actually seems kinda weird (/wrong?), as it isn't shared between different code paths). The more I think about it, I'm tending to remove the BUG() call in pskb_expand_head() as in this case it obviously isn't a bug. The skb_get() allows a simple and in my opinion easy to read cleanup part of skb_trimmed for any caller of ip{v6,}_mc_check_mld(). No need to check whether skb == skb_trimmed for a caller for instance, simply checking whether skb_trimmed exists is enough. Any objections to remove the if (skb_shared(skb)) BUG() part in pskb_expand_head()? Or would there be any other undesired side effects in utilising skb_get() like that? Cheers, Linus -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Caller or callee setting skb-network_header?
Hi, For an .ndo_start_xmit handler, can the callee rely on the caller setting the skb network header? Or should the callee set it before performing any skb_network_header()/ip_hdr()/ipv6_hdr()/... calls? Cheers, Linus PS: Currently looking at batman-adv's ndo_start_xmit handler interface_tx(), whether an skb_set_network_header() should be added there as it uses ip_hdr()/ipv6_hdr() later (- batadv_mcast_forw_mode_check_ipv{4,6}()). Testing this in VMs it seemed to be set correctly by the caller, but wanted to check whether this can or shouldn't be relied on. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net] bridge: fix br_multicast_query_expired() bug
On Thu, May 28, 2015 at 04:42:54AM -0700, Eric Dumazet wrote: Intent of the code was to clear port field, not the pointer to querier. Acked-by: Linus Lüssing linus.luess...@c0d3.blue -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv3 net-next] bridge: allow setting hash_max + multicast_router if interface is down
Network managers like netifd (used in OpenWRT for instance) try to configure interface options after creation but before setting the interface up. Unfortunately the sysfs / bridge currently only allows to configure the hash_max and multicast_router options when the bridge interface is up. But since br_multicast_init() doesn't start any timers and only sets default values and initializes timers it should be save to reconfigure the default values after that, before things actually get active after the bridge is set up. Signed-off-by: Linus Lüssing linus.luess...@c0d3.blue --- Changelog v3: * Readded two breaks (cosmetic / for future safety reasons) Changelog v2: * remove another now unnecessary -ENOENT initialization * consistently initialize to -EINVAL, allowing to shorten two switch-cases net/bridge/br_multicast.c | 24 +++- 1 file changed, 3 insertions(+), 21 deletions(-) diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index 2d69d5c..28a87c2 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -1772,11 +1772,9 @@ out: int br_multicast_set_router(struct net_bridge *br, unsigned long val) { - int err = -ENOENT; + int err = -EINVAL; spin_lock_bh(br-multicast_lock); - if (!netif_running(br-dev)) - goto unlock; switch (val) { case 0: @@ -1787,13 +1785,8 @@ int br_multicast_set_router(struct net_bridge *br, unsigned long val) br-multicast_router = val; err = 0; break; - - default: - err = -EINVAL; - break; } -unlock: spin_unlock_bh(br-multicast_lock); return err; @@ -1802,11 +1795,9 @@ unlock: int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val) { struct net_bridge *br = p-br; - int err = -ENOENT; + int err = -EINVAL; spin_lock(br-multicast_lock); - if (!netif_running(br-dev) || p-state == BR_STATE_DISABLED) - goto unlock; switch (val) { case 0: @@ -1828,13 +1819,8 @@ int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val) br_multicast_add_router(br, p); break; - - default: - err = -EINVAL; - break; } -unlock: spin_unlock(br-multicast_lock); return err; @@ -1939,15 +1925,11 @@ unlock: int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val) { - int err = -ENOENT; + int err = -EINVAL; u32 old; struct net_bridge_mdb_htable *mdb; spin_lock_bh(br-multicast_lock); - if (!netif_running(br-dev)) - goto unlock; - - err = -EINVAL; if (!is_power_of_2(val)) goto unlock; -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 net-next] bridge: allow setting hash_max + multicast_router if interface is down
Network managers like netifd (used in OpenWRT for instance) try to configure interface options after creation but before setting the interface up. Unfortunately the sysfs / bridge currently only allows to configure the hash_max and multicast_router options when the bridge interface is up. But since br_multicast_init() doesn't start any timers and only sets default values and initializes timers it should be save to reconfigure the default values after that, before things actually get active after the bridge is set up. Signed-off-by: Linus Lüssing linus.luess...@c0d3.blue --- Changelog v2: * remove another now unnecessary -ENOENT initialization * consistently initialize to -EINVAL, allowing to shorten two switch-cases net/bridge/br_multicast.c | 26 +++--- 1 file changed, 3 insertions(+), 23 deletions(-) diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index 2d69d5c..9955fa4 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -1772,11 +1772,9 @@ out: int br_multicast_set_router(struct net_bridge *br, unsigned long val) { - int err = -ENOENT; + int err = -EINVAL; spin_lock_bh(br-multicast_lock); - if (!netif_running(br-dev)) - goto unlock; switch (val) { case 0: @@ -1786,14 +1784,8 @@ int br_multicast_set_router(struct net_bridge *br, unsigned long val) case 1: br-multicast_router = val; err = 0; - break; - - default: - err = -EINVAL; - break; } -unlock: spin_unlock_bh(br-multicast_lock); return err; @@ -1802,11 +1794,9 @@ unlock: int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val) { struct net_bridge *br = p-br; - int err = -ENOENT; + int err = -EINVAL; spin_lock(br-multicast_lock); - if (!netif_running(br-dev) || p-state == BR_STATE_DISABLED) - goto unlock; switch (val) { case 0: @@ -1827,14 +1817,8 @@ int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val) break; br_multicast_add_router(br, p); - break; - - default: - err = -EINVAL; - break; } -unlock: spin_unlock(br-multicast_lock); return err; @@ -1939,15 +1923,11 @@ unlock: int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val) { - int err = -ENOENT; + int err = -EINVAL; u32 old; struct net_bridge_mdb_htable *mdb; spin_lock_bh(br-multicast_lock); - if (!netif_running(br-dev)) - goto unlock; - - err = -EINVAL; if (!is_power_of_2(val)) goto unlock; -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH net-next] bridge: allow setting hash_max + multicast_router if interface is down
On Thu, May 21, 2015 at 11:49:21AM +0800, Herbert Xu wrote: The timer operations are all supposed to be idempotent. So enabling a port twice or stopping it twice should be OK. Oki doki. * Might calls to br_multicast_add_router() via br_multicast_enable_port() cause unintended side-effects? What do you mean? How does add_router get called from enable_port? Sorry, ment br_multicast_add_router() via br_multicast_set_port_router(). But it's not modifying any timers, and other modifications are locked by the multicast lock, right. See above. It's there so that you don't readd a timer when we're calling del_timer_sync. del_timer_sync has to be called without the multicast lock so that's why we need another mechanism to prevent the timers from being readded. Right, all the touched functions never rearm a timer. The multicast_router timer may only get readded upon receiving a multicast query. (br_multicast_query_received()-br_multicast_mark_router() ) By removing the netif_running check we might only delete a timer which wasn't running anyway which as you said already is safe. AFAICS the spots you patched aren't adding timers so they *should* be OK. Okay, thanks for your thorough explanations about the timers and how the locking is supposed to work. After your explanations I went over the code a few more times and am fairly confident too now, that this patch is supposed to work fine. Going to resend this patch without the RFC tag. Cheers, Linus -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next] bridge: allow setting hash_max + multicast_router if interface is down
Network managers like netifd (used in OpenWRT for instance) try to configure interface options after creation but before setting the interface up. Unfortunately the sysfs / bridge currently only allows to configure the hash_max and multicast_router options when the bridge interface is up. But since br_multicast_init() doesn't start any timers and only sets default values and initializes timers it should be save to reconfigure the default values after that, before things actually get active after the bridge is set up. With this patch hash_max and multicast_router attributes can be changed even if the according bridge (port) is down, just like other other bridge (port) attributes allow too. Signed-off-by: Linus Lüssing linus.luess...@c0d3.blue --- Changelog: * [RFC PATCH net-next] - [PATCH net-next] net/bridge/br_multicast.c | 15 ++- 1 file changed, 2 insertions(+), 13 deletions(-) diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index 2d69d5c..066199e 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -1775,8 +1775,6 @@ int br_multicast_set_router(struct net_bridge *br, unsigned long val) int err = -ENOENT; spin_lock_bh(br-multicast_lock); - if (!netif_running(br-dev)) - goto unlock; switch (val) { case 0: @@ -1793,7 +1791,6 @@ int br_multicast_set_router(struct net_bridge *br, unsigned long val) break; } -unlock: spin_unlock_bh(br-multicast_lock); return err; @@ -1802,18 +1799,15 @@ unlock: int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val) { struct net_bridge *br = p-br; - int err = -ENOENT; + int err = 0; spin_lock(br-multicast_lock); - if (!netif_running(br-dev) || p-state == BR_STATE_DISABLED) - goto unlock; switch (val) { case 0: case 1: case 2: p-multicast_router = val; - err = 0; if (val 2 !hlist_unhashed(p-rlist)) hlist_del_init_rcu(p-rlist); @@ -1834,7 +1828,6 @@ int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val) break; } -unlock: spin_unlock(br-multicast_lock); return err; @@ -1939,15 +1932,11 @@ unlock: int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val) { - int err = -ENOENT; + int err = -EINVAL; u32 old; struct net_bridge_mdb_htable *mdb; spin_lock_bh(br-multicast_lock); - if (!netif_running(br-dev)) - goto unlock; - - err = -EINVAL; if (!is_power_of_2(val)) goto unlock; -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH net-next] bridge: allow setting hash_max + multicast_router if interface is down
Network managers like netifd (used in OpenWRT for instance) try to configure interface options after creation but before setting the interface up. Unfortunately the sysfs / bridge currently only allows to configure the hash_max and multicast_router options when the bridge interface is up. But since br_multicast_init() doesn't start any timers and only sets default values and initializes timers it should be save to reconfigure the default values after that, before things actually get active after the bridge is set up. With this patch hash_max and multicast_router attributes can be changed even if the according bridge (port) is down, just like other other bridge (port) attributes allow too. Signed-off-by: Linus Lüssing linus.luess...@c0d3.blue --- I'm currently a little unsure about a few things (that's why I'm sending this as an RFC): * For i=br_multicast_init(), e=br_multicast_enable() and s=br_multicast_stop() is the order i-e-s-e-s-e-... always ensured by the netdev API? Will this work even if I have many fast and wild calls to ip link set up dev br0 and ip link set down dev br0 and some changes to hash_max and multicast_router in between? * Might calls to br_multicast_add_router() via br_multicast_enable_port() cause unintended side-effects? * (maybe independent of this patch: ) Can fast changes to the multicast_router attribute of a bridge (port) cause race conditions because a del_timer() instead of a del_timer_sync() is used in br_multicast_set_{port,}router()? (or: why does br_multicast_stop() use del_timer_sync() while br_multicast_disable_port() doesn't?) * @Herbert: Do you remember whether there was a reason for checking netif_running() or the bridge port state which I might have overlooked? PS: Running this patch didn't create any crashes for me so far and multicast traffic still runs fine. No locust infestation in sight yet either. Still careful though :P. net/bridge/br_multicast.c | 15 ++- 1 file changed, 2 insertions(+), 13 deletions(-) diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index 2d69d5c..066199e 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -1775,8 +1775,6 @@ int br_multicast_set_router(struct net_bridge *br, unsigned long val) int err = -ENOENT; spin_lock_bh(br-multicast_lock); - if (!netif_running(br-dev)) - goto unlock; switch (val) { case 0: @@ -1793,7 +1791,6 @@ int br_multicast_set_router(struct net_bridge *br, unsigned long val) break; } -unlock: spin_unlock_bh(br-multicast_lock); return err; @@ -1802,18 +1799,15 @@ unlock: int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val) { struct net_bridge *br = p-br; - int err = -ENOENT; + int err = 0; spin_lock(br-multicast_lock); - if (!netif_running(br-dev) || p-state == BR_STATE_DISABLED) - goto unlock; switch (val) { case 0: case 1: case 2: p-multicast_router = val; - err = 0; if (val 2 !hlist_unhashed(p-rlist)) hlist_del_init_rcu(p-rlist); @@ -1834,7 +1828,6 @@ int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val) break; } -unlock: spin_unlock(br-multicast_lock); return err; @@ -1939,15 +1932,11 @@ unlock: int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val) { - int err = -ENOENT; + int err = -EINVAL; u32 old; struct net_bridge_mdb_htable *mdb; spin_lock_bh(br-multicast_lock); - if (!netif_running(br-dev)) - goto unlock; - - err = -EINVAL; if (!is_power_of_2(val)) goto unlock; -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html