Re: [B.A.T.M.A.N.] [PATCH 03/17] batman-adv: Add network_coding and mcast sysfs files to README

2018-03-27 Thread Linus Lüssing
On Sat, Oct 29, 2016 at 12:56:28PM +0200, Jiri Pirko wrote:
> >> I strongly believe it is a huge mistake to use sysfs for things like
> >> this. This should be done via generic netlink api.
> >
> >This doesn't change the problem that it is already that way. This patch
> >only adds the list of available files to the README.
> 
> Sure. Just found out you did it like that. Therefore I commented. I
> suggest to rework the api to use genl entirely.

Hi Jiri,

Thanks for sharing your thoughts!

Could you explain a bit more on which disadvantages you see in
the usage of sysfs here?

Regards, Linus


Re: Problem with bridge (mcast-to-ucast + hairpin) and Broadcom's 802.11f in their FullMAC fw

2018-03-12 Thread Linus Lüssing
On Mon, Mar 12, 2018 at 12:08:56PM +0100, Linus Lüssing wrote:
> On Tue, Feb 27, 2018 at 11:08:20AM +0100, Rafał Miłecki wrote:
> > I've problem when using OpenWrt/LEDE on a home router with Broadcom's
> > FullMAC WiFi chipset.
> 
> Hi Rafał,
> 
> Thanks for reporting this issue!
> 
> > Can you see any solution for this problem? Is that an option to stop
> > multicast-to-unicast from touching 802.11f packets? Some other ideas?
> > Obviously I can't modify Broadcom's firmware and drop that obsoleted
> > standard.
> 
> Just to avoid some potential confusion: This is more an issue of
> hairpinning than it is an issue of multicast-to-unicast per se,
> right?
> 
> That is, if you set this to 0 manually:
> /sys/class/net//brport/multicast_to_unicast
> Then the issue still occurs, right?

Btw., if in OpenWRT/LEDE you set 'option multicast_to_unicast 0'
in /etc/config/network for the bridge (and not bridge port)
then netifd should refrain from setting the bridge hairpinning and
AP isolation on wireless devices, too, if I remember correctly.

Can you confirm that the issue disappears for you then?


Re: Problem with bridge (mcast-to-ucast + hairpin) and Broadcom's 802.11f in their FullMAC fw

2018-03-12 Thread Linus Lüssing
On Mon, Mar 12, 2018 at 10:46:45AM +0100, Rafał Miłecki wrote:
> On 27 February 2018 at 18:05, Stephen Hemminger
[...]
> > ebtables is your friend in dealing with weird and broken devices.
> 
> It may be weird, not sure if actually broken. Anyway I'd like to have
> some generic solution instead of telling every user to use ebtables to
> workaround the problem.

I agree that a "broken by default" in OpenWRT/LEDE for a variety
of Broadcom devices is not really acceptable.

Technically we could teach netifd in OpenWRT/LEDE to configure
ebtables accordingly, at least for a list of affected devices,
so that users would not have to. However, as ebtables is not
managed by the fw3 in OpenWRT/LEDE, that would probably interfer
with user provided ebtables rules and scripts...


> That said I think we still should look for a solution for existing
> firmwares. I guess it may takes months to years to never to release
> new firmwares for all supported chipsets.

Hm, we could change the default in OpenWRT/LEDE for
multicast-to-unicast (or more precisely bridge hairpinning) to
disabled again for now.


Re: Problem with bridge (mcast-to-ucast + hairpin) and Broadcom's 802.11f in their FullMAC fw

2018-03-12 Thread Linus Lüssing
On Tue, Feb 27, 2018 at 11:08:20AM +0100, Rafał Miłecki wrote:
> I've problem when using OpenWrt/LEDE on a home router with Broadcom's
> FullMAC WiFi chipset.

Hi Rafał,

Thanks for reporting this issue!

> Can you see any solution for this problem? Is that an option to stop
> multicast-to-unicast from touching 802.11f packets? Some other ideas?
> Obviously I can't modify Broadcom's firmware and drop that obsoleted
> standard.

Just to avoid some potential confusion: This is more an issue of
hairpinning than it is an issue of multicast-to-unicast per se,
right?

That is, if you set this to 0 manually:
/sys/class/net//brport/multicast_to_unicast
Then the issue still occurs, right?

Regards, Linus


Re: [PATCH net-next] bridge: ebtables: Avoid resetting limit rule state

2017-12-07 Thread Linus Lüssing
On Fri, Dec 08, 2017 at 06:46:06AM +0100, Linus Lüssing wrote:
> Extending the usersize to include info->prev would probably be too
> hackish/ugly, right?

And wouldn't be enough anyway, since
info->{credit,credit_cap,cost} would still be zeroed... Hm.


Re: [PATCH net-next] bridge: ebtables: Avoid resetting limit rule state

2017-12-07 Thread Linus Lüssing
On Thu, Dec 07, 2017 at 01:26:19AM +0100, Pablo Neira Ayuso wrote:
> > I also had a quick look at a 4.15-rc1 kernel in a VM now. I still
> > end up in ebt_limit_mt_check() with the variables being reset
> > when editing the table somewhere.
> 
> My question is if your fix would work with 4.15-rc1.

You are absoluetly right, it's not working anymore since the
commit you mentioned initially :-(
("xtables: extend matches and targets with .usersize").
info->prev is always 0 since exactly this commit.

That means, trying tricks in ebt_limit_mt_check() is too
late now, the old values are already overwritten? (or is there
some commit scheme which installs the ebt_limit_info provided
by ebt_limit_check() some time after its call?)

Extending the usersize to include info->prev would probably be too
hackish/ugly, right?


Re: [Bridge] [PATCH net-next] bridge: ebtables: Avoid resetting limit rule state

2017-12-03 Thread Linus Lüssing
On Mon, Dec 04, 2017 at 05:53:35AM +0100, Linus Lüssing wrote:
> And so, no I do not have this patch. I looked at it now, but it
> does not seem to have any relation with .matchinfo, does it?

Relation between .usersize and .checkentry I ment, not
.usersize and .matchinfo.


Re: [PATCH net-next] bridge: ebtables: Avoid resetting limit rule state

2017-12-03 Thread Linus Lüssing
Hi Pablo,

Thanks for your reply!

On Tue, Nov 28, 2017 at 12:30:08AM +0100, Pablo Neira Ayuso wrote:
> [...]
> > diff --git a/net/bridge/netfilter/ebt_limit.c 
> > b/net/bridge/netfilter/ebt_limit.c
> > index 61a9f1be1263..f74b48633feb 100644
> > --- a/net/bridge/netfilter/ebt_limit.c
> > +++ b/net/bridge/netfilter/ebt_limit.c
> > @@ -69,6 +69,10 @@ static int ebt_limit_mt_check(const struct 
> > xt_mtchk_param *par)
> >  {
> > struct ebt_limit_info *info = par->matchinfo;
> >  
> > +   /* Do not reset state on unrelated table changes */
> > +   if (info->prev)
> > +   return 0;
> 
> What kernel version are you using? I suspect you don't have this
> applied?

I'm indeed using a 4.4.102 kernel, as LEDE is still in the process
of updating to 4.14. So 4.4 with LEDE is where I got the measurement
results from.

> 
> commit ec23189049651b16dc2ffab35a4371dc1f491aca
> Author: Willem de Bruijn 
> Date:   Mon Jan 2 17:19:46 2017 -0500
> 
> xtables: extend matches and targets with .usersize

And so, no I do not have this patch. I looked at it now, but it
does not seem to have any relation with .matchinfo, does it?

I also had a quick look at a 4.15-rc1 kernel in a VM now. I still
end up in ebt_limit_mt_check() with the variables being reset
when editing the table somewhere.

Regards, Linus


[PATCH net-next] bridge: ebtables: Avoid resetting limit rule state

2017-11-24 Thread Linus Lüssing
So far any changes with ebtables will reset the state of limit rules,
leading to spikes in traffic. This is especially noticeable if changes
are done frequently, for instance via a daemon.

This patch fixes this by bailing out from (re)setting if the limit
rule was initialized before.

When sending packets every 250ms for 600s, with a
"--limit 1/sec --limit-burst 50" rule and a command like this
in the background:

$ ebtables -N VOIDCHAIN
$ while true; do ebtables -F VOIDCHAIN; sleep 30; done

The results are:

Before: ~1600 packets
After: 650 packets

Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>
---
 net/bridge/netfilter/ebt_limit.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/bridge/netfilter/ebt_limit.c b/net/bridge/netfilter/ebt_limit.c
index 61a9f1be1263..f74b48633feb 100644
--- a/net/bridge/netfilter/ebt_limit.c
+++ b/net/bridge/netfilter/ebt_limit.c
@@ -69,6 +69,10 @@ static int ebt_limit_mt_check(const struct xt_mtchk_param 
*par)
 {
struct ebt_limit_info *info = par->matchinfo;
 
+   /* Do not reset state on unrelated table changes */
+   if (info->prev)
+   return 0;
+
/* Check for overflow. */
if (info->burst == 0 ||
user2credits(info->avg * info->burst) < user2credits(info->avg)) {
-- 
2.11.0



[PATCH net v3] bridge: ebtables: fix reception of frames DNAT-ed to bridge device/port

2017-04-19 Thread Linus Lüssing
When trying to redirect bridged frames to the bridge device itself or
a bridge port (brouting) via the dnat target then this currently fails:

The ethernet destination of the frame is dnat'ed to the MAC address of
the bridge device or port just fine. However, the IP code drops it in
the beginning of ip_input.c/ip_rcv() as the dnat target left
the skb->pkt_type as PACKET_OTHERHOST.

Fixing this by resetting skb->pkt_type to an appropriate type after
dnat'ing.

Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>

---

Changelog v3:
- moved pkt_type fixup into ebtable dnat code
  -> v1/v2 only fixed it for prerouting/dnat so far, now tested
 and verified that v3 fixes it for brouting/dnat, too
- updated commit message

Changelog v2:
- refrain from altering pkt_type for multicast packets
  with a unicast destination MAC
---
 net/bridge/netfilter/ebt_dnat.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/net/bridge/netfilter/ebt_dnat.c b/net/bridge/netfilter/ebt_dnat.c
index 4e0b0c3..21acb53 100644
--- a/net/bridge/netfilter/ebt_dnat.c
+++ b/net/bridge/netfilter/ebt_dnat.c
@@ -9,6 +9,7 @@
  */
 #include 
 #include 
+#include "../br_private.h"
 #include 
 #include 
 #include 
@@ -18,11 +19,32 @@ static unsigned int
 ebt_dnat_tg(struct sk_buff *skb, const struct xt_action_param *par)
 {
const struct ebt_nat_info *info = par->targinfo;
+   struct net_device *dev;
 
if (!skb_make_writable(skb, 0))
return EBT_DROP;
 
ether_addr_copy(eth_hdr(skb)->h_dest, info->mac);
+
+   if (is_multicast_ether_addr(info->mac)) {
+   if (is_broadcast_ether_addr(info->mac))
+   skb->pkt_type = PACKET_BROADCAST;
+   else
+   skb->pkt_type = PACKET_MULTICAST;
+   } else {
+   rcu_read_lock();
+   if (xt_hooknum(par) != NF_BR_BROUTING)
+   dev = br_port_get_rcu(xt_in(par))->br->dev;
+   else
+   dev = xt_in(par);
+
+   if (ether_addr_equal(info->mac, dev->dev_addr))
+   skb->pkt_type = PACKET_HOST;
+   else
+   skb->pkt_type = PACKET_OTHERHOST;
+   rcu_read_unlock();
+   }
+
return info->target;
 }
 
-- 
2.1.4



Re: [PATCH v2] bridge: ebtables: fix reception of frames DNAT-ed to bridge device

2017-04-17 Thread Linus Lüssing
On Tue, Mar 21, 2017 at 04:32:45PM -0700, Stephen Hemminger wrote:
> On Tue, 21 Mar 2017 23:28:45 +0100
> Linus Lüssing <linus.luess...@c0d3.blue> wrote:
> 
> > However, the IP code drops it in the beginning of ip_input.c/ip_rcv()
> > as the dnat target did not update the skb->pkt_type. If after
> > dnat'ing the packet is now destined to us then the skb->pkt_type
> > needs to be updated from PACKET_OTHERHOST to PACKET_HOST, too.
> 
> Why not fix DNAT netfilter module rather than hacking bridge code here.

Sorry for the late response. Wanted to do some more testing before
replying.

My assumptions regarding macvlan were wrong:

A) The code my patch adds is not touched in the case of a macvlan
device on top of a bridge - with macvlan there seems to be no FDB entry
for the MAC address of the macvlan device at all, actually, resulting
in the flooding of a frame intended for that device (is this, ehm,
known/intended?).

B) ip_rcv() does not drop for a macvlan device, because macvlan
unconditionally sets skb->pkt_type to PACKET_HOST in macvlan_handle_frame().

(And I guess, maybe I shouldn't actually care that much about
macvlans on top of a bridge, as a veth-pair is much cleaner for that?)


Regarding netdev::dev_addrs, if I understand the kernel code correctly, it
is only, potentially populated with more than netdev::dev_addr for two
drivers, via dev_addr_add(): bnx2x and ixgbe. So for a bridge device,
there should never be more than one item in netdev::dev_addrs
(= netdev::dev_addr), correct?


So, currently I'm happy with moving the fix into the ebtables dnat
code and just checking against the netdev::dev_addr of the bridge
device now. (if anyone has any suggestions regarding upper devices
I should test with that, then please let me know)

(Sorry, Pablo, for me pressing against your earlier suggestion to
put the fix in the ebtables dnat instead of bridge code. :( )

Regards, Linus


[PATCH v2] bridge: ebtables: fix reception of frames DNAT-ed to bridge device

2017-03-21 Thread Linus Lüssing
When trying to redirect bridged frames to the bridge device itself
via the ebtables nat-prerouting chain and the dnat target then this
currently fails:

The ethernet destination of the frame is dnat'ed to the MAC address of
the bridge itself just fine and the correctly altered frame can even
be captured via a tcpdump on br0 (with or without promisc mode).

However, the IP code drops it in the beginning of ip_input.c/ip_rcv()
as the dnat target did not update the skb->pkt_type. If after
dnat'ing the packet is now destined to us then the skb->pkt_type
needs to be updated from PACKET_OTHERHOST to PACKET_HOST, too.

Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>

---
Changelog v2:
* refrain from altering pkt_type for multicast packets
  with a unicast destination MAC
---
 net/bridge/br_input.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 013f2290b..fd7bc4c 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -198,8 +198,13 @@ int br_handle_frame_finish(struct net *net, struct sock 
*sk, struct sk_buff *skb
if (dst) {
unsigned long now = jiffies;
 
-   if (dst->is_local)
+   if (dst->is_local) {
+   /* fix up potential DNAT inconsistencies */
+   if (skb->pkt_type == PACKET_OTHERHOST)
+   skb->pkt_type = PACKET_HOST;
+
return br_pass_frame_up(skb);
+   }
 
if (now != dst->used)
dst->used = now;
-- 
2.1.4



Re: [Bridge] [PATCH net] bridge: ebtables: fix reception of frames DNAT-ed to bridge device

2017-03-20 Thread Linus Lüssing
On Sun, Mar 19, 2017 at 05:55:06PM +0100, Linus Lüssing wrote:
> On Fri, Mar 17, 2017 at 02:10:44PM +0100, Pablo Neira Ayuso wrote:
> > Wait.
> > 
> > May this break local multicast listener that are bound to the bridge
> > interface? Assuming the bridge interface got an IP address, and that
> > there is local multicast listener.
> > 
> > Missing anything here?
> 
> Hm, for multicast packets usually the code path a few lines
> later in br_handle_frame_finish() should be taken instead.
> 
> But you might be right for IP multicast packets with a unicast MAC
> destination (due to whatever reason, for instance via DNAT'ing
> again).
> 
> Will check that - thanks!

Ok, I tested DNAT'ing an IP multicast packet to the unicast MAC address
of the bridge interface.

Both ping-ing to an IPv4 and IPv6 multicast listener on br0 worked
and was replied to fine, both with or without changing skb->pkt_type
from PACKET_MULTICAST to PACKET_HOST.
("$ ping 224.1.0.123" and "$ ping6 ff02::1:ff40:707c%in0" from a
 network namespace, tied into the bridge via veth)

Also, a DNAT'ed PACKET_BROADCAST worked, with or without changing
it to PACKET_HOST.

I also checked via tcpdump that the destination MAC was changed
successfully.


So, so far I wasn't able to find any bugs with the current
patch. But I think I like the idea of leaving the skb->pkt_type
unaltered for PACKET_MULTICAST and PACKET_BROADCAST, seems cleaner.

I'd just add an "if (skb->pkt_type == PACKET_OTHERHOST)" check
then and resend a PATCH v2.


Re: [PATCH net] bridge: ebtables: fix reception of frames DNAT-ed to bridge device

2017-03-19 Thread Linus Lüssing
On Fri, Mar 17, 2017 at 02:10:44PM +0100, Pablo Neira Ayuso wrote:
> Wait.
> 
> May this break local multicast listener that are bound to the bridge
> interface? Assuming the bridge interface got an IP address, and that
> there is local multicast listener.
> 
> Missing anything here?

Hm, for multicast packets usually the code path a few lines
later in br_handle_frame_finish() should be taken instead.

But you might be right for IP multicast packets with a unicast MAC
destination (due to whatever reason, for instance via DNAT'ing
again).

Will check that - thanks!


Re: [PATCH net] bridge: ebtables: fix reception of frames DNAT-ed to bridge device

2017-03-15 Thread Linus Lüssing
On Wed, Mar 15, 2017 at 07:15:39PM +0100, Pablo Neira Ayuso wrote:
> Could you update ebtables dnat to check if the ethernet address
> matches the one of the input bridge interface, so we mangle the
> ->pkt_type accordingly from there, instead of doing this from the
> core?

Actually, that was the approach I thought about and went for first
(and it would probably work for me). Just checking against the
bridge device's net_device::dev_addr.

I scratched it though, as I was afraid that the issue might still
exist for people using some other upper device on top of the bridge
device. For instance, macvlan? And iterating over the
net_device::dev_addrs list seemed too costly for fast path to me.


Re: [PATCH net] bridge: ebtables: fix reception of frames DNAT-ed to bridge device

2017-03-15 Thread Linus Lüssing
On Wed, Mar 15, 2017 at 11:42:11AM +0100, Pablo Neira Ayuso wrote:
> I'm missing then why redirect is not then just enough for Linus usecase.

For my usecase, the MAC address is configured by the user from a
Web-UI. It may or may not be the one from the bridge device.

Besides, found it counter intuitive that DNAT did not work here
and took me some time to find out why. At least I didn't read about
any such known limitations of the dnat target in the ebtables
manpage.


[PATCH net] bridge: ebtables: fix reception of frames DNAT-ed to bridge device

2017-03-14 Thread Linus Lüssing
When trying to redirect bridged frames to the bridge device itself
via the ebtables nat-prerouting chain and the dnat target then this
currently fails:

The ethernet destination of the frame is dnat'ed to the MAC address of
the bridge itself just fine and the correctly altered frame can even
be captured via a tcpdump on br0 (with or without promisc mode).

However, the IP code drops it in the beginning of ip_input.c/ip_rcv()
as the dnat target did not update the skb->pkt_type. If after
dnat'ing the packet is now destined to us then the skb->pkt_type
needs to be updated from PACKET_OTHERHOST to PACKET_HOST, too.

Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>
---
 net/bridge/br_input.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 013f2290b..ec83175 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -198,8 +198,12 @@ int br_handle_frame_finish(struct net *net, struct sock 
*sk, struct sk_buff *skb
if (dst) {
unsigned long now = jiffies;
 
-   if (dst->is_local)
+   if (dst->is_local) {
+   /* fix up potential DNAT mess */
+   skb->pkt_type = PACKET_HOST;
+
return br_pass_frame_up(skb);
+   }
 
if (now != dst->used)
dst->used = now;
-- 
2.1.4



[PATCH net] ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches

2017-02-02 Thread Linus Lüssing
When for instance a mobile Linux device roams from one access point to
another with both APs sharing the same broadcast domain and a
multicast snooping switch in between:

1)(c) <~~~> (AP1) <--[SSW]--> (AP2)

2)  (AP1) <--[SSW]--> (AP2) <~~~> (c)

Then currently IPv6 multicast packets will get lost for (c) until an
MLD Querier sends its next query message. The packet loss occurs
because upon roaming the Linux host so far stayed silent regarding
MLD and the snooping switch will therefore be unaware of the
multicast topology change for a while.

This patch fixes this by always resending MLD reports when an interface
change happens, for instance from NO-CARRIER to CARRIER state.

Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>

---

Initial problem report was sent to the bridge mailing list a while ago:
- https://lists.linuxfoundation.org/pipermail/bridge/2015-September/009754.html

The RFCs concerning IGMP, MLD and snooping switches seem a have a hole
concerning roaming. A request for clarification to mcast-w...@ietf.org
was left unanswered, unfortunately:
- https://mailarchive.ietf.org/arch/msg/mcast-wifi/Ghn2cGy1oN2ZwG1qaQO9SE13g6g

However, simply resending reports seems to be the straight forward way
to me to fix the issue mentioned above.
---
 net/ipv6/addrconf.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index f60e88e..81f7b4e 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3386,9 +3386,15 @@ static int addrconf_notify(struct notifier_block *this, 
unsigned long event,
}
 
if (idev) {
-   if (idev->if_flags & IF_READY)
-   /* device is already configured. */
+   if (idev->if_flags & IF_READY) {
+   /* device is already configured -
+* but resend MLD reports, we might
+* have roamed and need to update
+* multicast snooping switches
+*/
+   ipv6_mc_up(idev);
break;
+   }
idev->if_flags |= IF_READY;
}
 
-- 
2.1.4



[PATCH net-next v5] bridge: multicast to unicast

2017-01-21 Thread Linus Lüssing
From: Felix Fietkau <n...@nbd.name>

Implements an optional, per bridge port flag and feature to deliver
multicast packets to any host on the according port via unicast
individually. This is done by copying the packet per host and
changing the multicast destination MAC to a unicast one accordingly.

multicast-to-unicast works on top of the multicast snooping feature of
the bridge. Which means unicast copies are only delivered to hosts which
are interested in it and signalized this via IGMP/MLD reports
previously.

This feature is intended for interface types which have a more reliable
and/or efficient way to deliver unicast packets than broadcast ones
(e.g. wifi).

However, it should only be enabled on interfaces where no IGMPv2/MLDv1
report suppression takes place. This feature is disabled by default.

The initial patch and idea is from Felix Fietkau.

Signed-off-by: Felix Fietkau <n...@nbd.name>
[linus.luess...@c0d3.blue: various bug + style fixes, commit message]
Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>

---

This feature is used and enabled by default in OpenWRT and LEDE for AP
interfaces for more than a year now to allow both a more robust multicast
delivery and multicast at higher rates (e.g. multicast streaming).

In OpenWRT/LEDE the IGMP/MLD report suppression issue is overcome by
the network daemon enabling AP isolation and by that separating all STAs.
Delivery of STA-to-STA IP mulitcast is made possible again by
enabling and utilizing the bridge hairpin mode, which considers the
incoming port as a potential outgoing port, too.

Hairpin-mode is performed after multicast snooping, therefore leading to
only deliver reports to STAs running a multicast router.

Changes in v5:
* fix a potential pagefault in br_ip6_multicast_mld2_report():
  -> a pskb_may_pull() might reallocate skb->data, therefore perform
 the "src = eth_hdr(skb)->h_source" only afterwards
* simplify code by always adding ether source address to a port group
  and checking the per-port multicast-to-unicast flag instead of a
  per-port-group one (thanks Stephen!)

Changes in v4:
* readd "From: Felix Fietkau [...]" (missed it again in v3)

Changes in v3:
* fix an uninitialized variable bug introduced in br_multicast_flood()
  in v2, found by kbuild test bot

Changes in v2:
* netlink support (thanks Nik!)
* fixed switching between multicast_to_unicast on/off
  -> even after toggling an already existing entry would
 stale in its mode and would never be replaced
  -> new extra check in br_port_group_equal)
* reduced checks in br_multicast_flood() from two to one
  to address fast-path concerns (thanks Nik!)
* rev-christmas tree ordering (thanks Nik!)
* removed "net_bridge_port_group::unicast", using
  ::flags instead (thanks Nik!)
* BR_MULTICAST_TO_UCAST -> BR_MULTICAST_TO_UNICAST
  (BR_MULTICAST_FLAST_LEAVE has the same length anyway)
* simplified maybe_deliver_addr()
  (no return, no "prev" paramater -> was a NOP anyway)
* added "From: Felix Fietkau [...]"
* added "Signed-off-by: Felix Fietkau [...]"
---
 include/linux/if_bridge.h|  1 +
 include/uapi/linux/if_link.h |  1 +
 net/bridge/br_forward.c  | 39 ++-
 net/bridge/br_mdb.c  |  2 +-
 net/bridge/br_multicast.c| 90 
 net/bridge/br_netlink.c  |  5 +++
 net/bridge/br_private.h  |  3 +-
 net/bridge/br_sysfs_if.c |  2 +
 8 files changed, 114 insertions(+), 29 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index c6587c0..debc9d5 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -46,6 +46,7 @@ struct br_ip_list {
 #define BR_LEARNING_SYNC   BIT(9)
 #define BR_PROXYARP_WIFI   BIT(10)
 #define BR_MCAST_FLOOD BIT(11)
+#define BR_MULTICAST_TO_UNICASTBIT(12)
 
 #define BR_DEFAULT_AGEING_TIME (300 * HZ)
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 6b13e59..4e59565 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -321,6 +321,7 @@ enum {
IFLA_BRPORT_MULTICAST_ROUTER,
IFLA_BRPORT_PAD,
IFLA_BRPORT_MCAST_FLOOD,
+   IFLA_BRPORT_MCAST_TO_UCAST,
__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 7cb41ae..a0f9d00 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -174,6 +174,31 @@ static struct net_bridge_port *maybe_deliver(
return p;
 }
 
+static void maybe_deliver_addr(struct net_bridge_port *p, struct sk_buff *skb,
+  const unsigned char *addr, bool local_orig)
+{
+   struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev;
+   const unsigned char *src = eth_hdr(skb)->h_source;
+
+   if (!should_deliver(p, skb))
+   return;
+
+

[PATCH net-next v4] bridge: multicast to unicast

2017-01-18 Thread Linus Lüssing
From: Felix Fietkau <n...@nbd.name>

Implements an optional, per bridge port flag and feature to deliver
multicast packets to any host on the according port via unicast
individually. This is done by copying the packet per host and
changing the multicast destination MAC to a unicast one accordingly.

multicast-to-unicast works on top of the multicast snooping feature of
the bridge. Which means unicast copies are only delivered to hosts which
are interested in it and signalized this via IGMP/MLD reports
previously.

This feature is intended for interface types which have a more reliable
and/or efficient way to deliver unicast packets than broadcast ones
(e.g. wifi).

However, it should only be enabled on interfaces where no IGMPv2/MLDv1
report suppression takes place. This feature is disabled by default.

The initial patch and idea is from Felix Fietkau.

Signed-off-by: Felix Fietkau <n...@nbd.name>
[linus.luess...@c0d3.blue: various bug + style fixes, commit message]
Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>

---

This feature is used and enabled by default in OpenWRT and LEDE for AP
interfaces for more than a year now to allow both a more robust multicast
delivery and multicast at higher rates (e.g. multicast streaming).

In OpenWRT/LEDE the IGMP/MLD report suppression issue is overcome by
the network daemon enabling AP isolation and by that separating all STAs.
Delivery of STA-to-STA IP mulitcast is made possible again by
enabling and utilizing the bridge hairpin mode, which considers the
incoming port as a potential outgoing port, too.

Hairpin-mode is performed after multicast snooping, therefore leading to
only deliver reports to STAs running a multicast router.

Changes in v4:
* readd "From: Felix Fietkau [...]" (missed it again in v3)

Changes in v3:
* fix an uninitialized variable bug introduced in br_multicast_flood()
  in v2, found by kbuild test bot

Changes in v2:
* netlink support (thanks Nik!)
* fixed switching between multicast_to_unicast on/off
  -> even after toggling an already existing entry would
 stale in its mode and would never be replaced
  -> new extra check in br_port_group_equal)
* reduced checks in br_multicast_flood() from two to one
  to address fast-path concerns (thanks Nik!)
* rev-christmas tree ordering (thanks Nik!)
* removed "net_bridge_port_group::unicast", using
  ::flags instead (thanks Nik!)
* BR_MULTICAST_TO_UCAST -> BR_MULTICAST_TO_UNICAST
  (BR_MULTICAST_FLAST_LEAVE has the same length anyway)
* simplified maybe_deliver_addr()
  (no return, no "prev" paramater -> was a NOP anyway)
* added "From: Felix Fietkau [...]"
* added "Signed-off-by: Felix Fietkau [...]"
---
 include/linux/if_bridge.h|  1 +
 include/uapi/linux/if_link.h |  1 +
 net/bridge/br_forward.c  | 37 -
 net/bridge/br_mdb.c  |  2 +-
 net/bridge/br_multicast.c| 96 
 net/bridge/br_netlink.c  |  5 +++
 net/bridge/br_private.h  |  8 ++--
 net/bridge/br_sysfs_if.c |  2 +
 8 files changed, 121 insertions(+), 31 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index c6587c0..debc9d5 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -46,6 +46,7 @@ struct br_ip_list {
 #define BR_LEARNING_SYNC   BIT(9)
 #define BR_PROXYARP_WIFI   BIT(10)
 #define BR_MCAST_FLOOD BIT(11)
+#define BR_MULTICAST_TO_UNICASTBIT(12)
 
 #define BR_DEFAULT_AGEING_TIME (300 * HZ)
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 6b13e59..4e59565 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -321,6 +321,7 @@ enum {
IFLA_BRPORT_MULTICAST_ROUTER,
IFLA_BRPORT_PAD,
IFLA_BRPORT_MCAST_FLOOD,
+   IFLA_BRPORT_MCAST_TO_UCAST,
__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 7cb41ae..a6c8a27 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -174,6 +174,29 @@ static struct net_bridge_port *maybe_deliver(
return p;
 }
 
+static void maybe_deliver_addr(struct net_bridge_port *p, struct sk_buff *skb,
+  const unsigned char *addr, bool local_orig)
+{
+   struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev;
+   const unsigned char *src = eth_hdr(skb)->h_source;
+
+   if (!should_deliver(p, skb))
+   return;
+
+   /* Even with hairpin, no soliloquies - prevent breaking IPv6 DAD */
+   if (skb->dev == p->dev && ether_addr_equal(src, addr))
+   return;
+
+   skb = skb_copy(skb, GFP_ATOMIC);
+   if (!skb) {
+   dev->stats.tx_dropped++;
+   return;
+   }
+
+   memcpy(eth_hdr(skb)->h_dest, addr, ETH_ALEN);
+   __br_forward(p, skb, loc

[PATCH net-next v3] bridge: multicast to unicast

2017-01-18 Thread Linus Lüssing
Implements an optional, per bridge port flag and feature to deliver
multicast packets to any host on the according port via unicast
individually. This is done by copying the packet per host and
changing the multicast destination MAC to a unicast one accordingly.

multicast-to-unicast works on top of the multicast snooping feature of
the bridge. Which means unicast copies are only delivered to hosts which
are interested in it and signalized this via IGMP/MLD reports
previously.

This feature is intended for interface types which have a more reliable
and/or efficient way to deliver unicast packets than broadcast ones
(e.g. wifi).

However, it should only be enabled on interfaces where no IGMPv2/MLDv1
report suppression takes place. This feature is disabled by default.

The initial patch and idea is from Felix Fietkau.

Signed-off-by: Felix Fietkau <n...@nbd.name>
[linus.luess...@c0d3.blue: various bug + style fixes, commit message]
Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>

---

This feature is used and enabled by default in OpenWRT and LEDE for AP
interfaces for more than a year now to allow both a more robust multicast
delivery and multicast at higher rates (e.g. multicast streaming).

In OpenWRT/LEDE the IGMP/MLD report suppression issue is overcome by
the network daemon enabling AP isolation and by that separating all STAs.
Delivery of STA-to-STA IP mulitcast is made possible again by
enabling and utilizing the bridge hairpin mode, which considers the
incoming port as a potential outgoing port, too.

Hairpin-mode is performed after multicast snooping, therefore leading to
only deliver reports to STAs running a multicast router.

Changes in v3:
* fix an uninitialized variable bug introduced in br_multicast_flood()
  in v2, found by kbuild test bot

Changes in v2:
* netlink support (thanks Nik!)
* fixed switching between multicast_to_unicast on/off
  -> even after toggling an already existing entry would
 stale in its mode and would never be replaced
  -> new extra check in br_port_group_equal)
* reduced checks in br_multicast_flood() from two to one
  to address fast-path concerns (thanks Nik!)
* rev-christmas tree ordering (thanks Nik!)
* removed "net_bridge_port_group::unicast", using
  ::flags instead (thanks Nik!)
* BR_MULTICAST_TO_UCAST -> BR_MULTICAST_TO_UNICAST
  (BR_MULTICAST_FLAST_LEAVE has the same length anyway)
* simplified maybe_deliver_addr()
  (no return, no "prev" paramater -> was a NOP anyway)
* added "From: Felix Fietkau [...]"
* added "Signed-off-by: Felix Fietkau [...]"
---
 include/linux/if_bridge.h|  1 +
 include/uapi/linux/if_link.h |  1 +
 net/bridge/br_forward.c  | 37 -
 net/bridge/br_mdb.c  |  2 +-
 net/bridge/br_multicast.c| 96 
 net/bridge/br_netlink.c  |  5 +++
 net/bridge/br_private.h  |  8 ++--
 net/bridge/br_sysfs_if.c |  2 +
 8 files changed, 121 insertions(+), 31 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index c6587c0..debc9d5 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -46,6 +46,7 @@ struct br_ip_list {
 #define BR_LEARNING_SYNC   BIT(9)
 #define BR_PROXYARP_WIFI   BIT(10)
 #define BR_MCAST_FLOOD BIT(11)
+#define BR_MULTICAST_TO_UNICASTBIT(12)
 
 #define BR_DEFAULT_AGEING_TIME (300 * HZ)
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 6b13e59..4e59565 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -321,6 +321,7 @@ enum {
IFLA_BRPORT_MULTICAST_ROUTER,
IFLA_BRPORT_PAD,
IFLA_BRPORT_MCAST_FLOOD,
+   IFLA_BRPORT_MCAST_TO_UCAST,
__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 7cb41ae..a6c8a27 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -174,6 +174,29 @@ static struct net_bridge_port *maybe_deliver(
return p;
 }
 
+static void maybe_deliver_addr(struct net_bridge_port *p, struct sk_buff *skb,
+  const unsigned char *addr, bool local_orig)
+{
+   struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev;
+   const unsigned char *src = eth_hdr(skb)->h_source;
+
+   if (!should_deliver(p, skb))
+   return;
+
+   /* Even with hairpin, no soliloquies - prevent breaking IPv6 DAD */
+   if (skb->dev == p->dev && ether_addr_equal(src, addr))
+   return;
+
+   skb = skb_copy(skb, GFP_ATOMIC);
+   if (!skb) {
+   dev->stats.tx_dropped++;
+   return;
+   }
+
+   memcpy(eth_hdr(skb)->h_dest, addr, ETH_ALEN);
+   __br_forward(p, skb, local_orig);
+}
+
 /* called under rcu_read_lock */
 void br_flood(struct net_bridge *br, struct sk_buff *skb,
  

[PATCH net-next v2] bridge: multicast to unicast

2017-01-17 Thread Linus Lüssing
From: Felix Fietkau <n...@nbd.name>

Implements an optional, per bridge port flag and feature to deliver
multicast packets to any host on the according port via unicast
individually. This is done by copying the packet per host and
changing the multicast destination MAC to a unicast one accordingly.

multicast-to-unicast works on top of the multicast snooping feature of
the bridge. Which means unicast copies are only delivered to hosts which
are interested in it and signalized this via IGMP/MLD reports
previously.

This feature is intended for interface types which have a more reliable
and/or efficient way to deliver unicast packets than broadcast ones
(e.g. wifi).

However, it should only be enabled on interfaces where no IGMPv2/MLDv1
report suppression takes place. This feature is disabled by default.

The initial patch and idea is from Felix Fietkau.

Signed-off-by: Felix Fietkau <n...@nbd.name>
[linus.luess...@c0d3.blue: various bug + style fixes, commit message]
Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>

---

This feature is used and enabled by default in OpenWRT and LEDE for AP
interfaces for more than a year now to allow both a more robust multicast
delivery and multicast at higher rates (e.g. multicast streaming).

In OpenWRT/LEDE the IGMP/MLD report suppression issue is overcome by
the network daemon enabling AP isolation and by that separating all STAs.
Delivery of STA-to-STA IP mulitcast is made possible again by
enabling and utilizing the bridge hairpin mode, which considers the
incoming port as a potential outgoing port, too.

Hairpin-mode is performed after multicast snooping, therefore leading to
only deliver reports to STAs running a multicast router.

Changes in v2:
* netlink support (thanks Nik!)
* fixed switching between multicast_to_unicast on/off
  -> even after toggling an already existing entry would
 stale in its mode and would never be replaced
  -> new extra check in br_port_group_equal)
* reduced checks in br_multicast_flood() from two to one
  to address fast-path concerns (thanks Nik!)
* rev-christmas tree ordering (thanks Nik!)
* removed "net_bridge_port_group::unicast", using
  ::flags instead (thanks Nik!)
* BR_MULTICAST_TO_UCAST -> BR_MULTICAST_TO_UNICAST
  (BR_MULTICAST_FLAST_LEAVE has the same length anyway)
* simplified maybe_deliver_addr()
  (no return, no "prev" paramater -> was a NOP anyway)
* added "From: Felix Fietkau [...]"
* added "Signed-off-by: Felix Fietkau [...]"
---
 include/linux/if_bridge.h|  1 +
 include/uapi/linux/if_link.h |  1 +
 net/bridge/br_forward.c  | 37 -
 net/bridge/br_mdb.c  |  2 +-
 net/bridge/br_multicast.c| 96 
 net/bridge/br_netlink.c  |  5 +++
 net/bridge/br_private.h  |  8 ++--
 net/bridge/br_sysfs_if.c |  2 +
 8 files changed, 121 insertions(+), 31 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index c6587c0..debc9d5 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -46,6 +46,7 @@ struct br_ip_list {
 #define BR_LEARNING_SYNC   BIT(9)
 #define BR_PROXYARP_WIFI   BIT(10)
 #define BR_MCAST_FLOOD BIT(11)
+#define BR_MULTICAST_TO_UNICASTBIT(12)
 
 #define BR_DEFAULT_AGEING_TIME (300 * HZ)
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 6b13e59..4e59565 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -321,6 +321,7 @@ enum {
IFLA_BRPORT_MULTICAST_ROUTER,
IFLA_BRPORT_PAD,
IFLA_BRPORT_MCAST_FLOOD,
+   IFLA_BRPORT_MCAST_TO_UCAST,
__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 7cb41ae..75d041e 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -174,6 +174,29 @@ static struct net_bridge_port *maybe_deliver(
return p;
 }
 
+static void maybe_deliver_addr(struct net_bridge_port *p, struct sk_buff *skb,
+  const unsigned char *addr, bool local_orig)
+{
+   struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev;
+   const unsigned char *src = eth_hdr(skb)->h_source;
+
+   if (!should_deliver(p, skb))
+   return;
+
+   /* Even with hairpin, no soliloquies - prevent breaking IPv6 DAD */
+   if (skb->dev == p->dev && ether_addr_equal(src, addr))
+   return;
+
+   skb = skb_copy(skb, GFP_ATOMIC);
+   if (!skb) {
+   dev->stats.tx_dropped++;
+   return;
+   }
+
+   memcpy(eth_hdr(skb)->h_dest, addr, ETH_ALEN);
+   __br_forward(p, skb, local_orig);
+}
+
 /* called under rcu_read_lock */
 void br_flood(struct net_bridge *br, struct sk_buff *skb,
  enum br_pkt_type pkt_type, bool local_rcv, bool local_orig)
@@ -241,10 +264,20 @@ void b

Re: [PATCH net-next] bridge: multicast to unicast

2017-01-09 Thread Linus Lüssing
On Mon, Jan 09, 2017 at 01:30:32PM -0800, Stephen Hemminger wrote:
> I wonder if MAC80211 should be doing IGMP snooping and not bridge
> in this environment.

In the long term, yes. For now, not quite sure.

I personally like to go for simple solutions first :).


Re: [PATCH net-next] bridge: multicast to unicast

2017-01-09 Thread Linus Lüssing
On Mon, Jan 09, 2017 at 10:42:46PM +0100, Johannes Berg wrote:
> On Mon, 2017-01-09 at 22:33 +0100, Linus Lüssing wrote:
> > On Mon, Jan 09, 2017 at 01:44:03PM +0100, Johannes Berg wrote:
> > > 
> > > > >  A host SHOULD silently discard a datagram that is
> > > > > received via
> > > > >  a link-layer broadcast (see Section 2.4) but does not
> > > > > specify
> > > > >  an IP multicast or broadcast destination address.
> > > > 
> > > > This example is the other way round. It specifies how the IP
> > > > destination should look like in case of link-layer broadcast. Not
> > > > how the link-layer destination should look like in case of a
> > > > multicast/broadcast IP destination.
> > > 
> > > You stopped reading too early - snipped the context part for you :)
> > 
> > Sorry for writing to you directly, but I still have some
> > difficulties. In pseudo-code that line says:
> > 
> > -
> > if ll_dst(pkt) == bcast AND ip_dst(pkt) != mcast/bcast:
> > -> drop(pkt)
> > -
> > 
> > But after multicast-to-unicast conversion, we have:
> > 
> > -
> > ll_dst(pkt) == ucast AND ip_dst(pkt) == mcast
> > -
> > 
> > So none of the two requirements for dropping are matched?
> > 
> 
> Exactly. My point is that this is breaking the expectation that hosts
> are actually able to drop such packets.

[readding CCs I removed earlier]

Ah! Thanks. I was worried about creating packetloss :D.

Hm, for this other other way round, I think it does not apply for
the bridge multicast-to-unicast patch if I'm not misreading the bridge code:

For a packet with a link-layer multicast address but a unicast IP
destination, the bridge MDB lookup will fail.
(http://lxr.free-electrons.com/source/net/bridge/br_multicast.c?v=4.8#L178
 returns NULL)

Case A): No multicast router on port:
-> bridge, br_multicast_flood(), will drop the packet already
   (no matter if multicast-to-unicast is enabled or not)

Case B): Multicast router present on port:
-> The new patch does not apply multicast-to-unicast but just floods
   packet unaltered
   ("else { port = rport; addr = NULL; }" branch)

Regards, Linus


Re: [PATCH net-next] bridge: multicast to unicast

2017-01-09 Thread Linus Lüssing
On Mon, Jan 09, 2017 at 12:44:19PM +0100, M. Braun wrote:
> Am 09.01.2017 um 09:08 schrieb Johannes Berg:
> > Does it make sense to implement the two in separate layers though?
> > 
> > Clearly, this part needs to be implemented in the bridge layer due to
> > the snooping knowledge, but the code is very similar to what mac80211
> > has now.
> 
> Does the bridge always know about all stations connected?

The bridge does not always know about all stations, especially the
silent ones like in your DVB-T example.

However, concerning IP multicast, there is IGMP/MLD. So the bridge
does know about all stations which are interested in a specific IP
multicast stream.

(As long as there is a querier on the link, which periodically
queriers for IGMP/MLD reports from any listener. If there is no
querier then the bridge multicast snooping, including the bridge
multicast-to-unicast will fall back to flooding)


So if your television example uses IP multicast properly, it is
completely doable with the bridge multicast-to-unicast, thanks to
IGMP/MLD.


Re: [PATCH net-next] bridge: multicast to unicast

2017-01-09 Thread Linus Lüssing
On Mon, Jan 09, 2017 at 09:05:49AM +0100, Johannes Berg wrote:
> On Sat, 2017-01-07 at 16:15 +0100, Linus Lüssing wrote:
> 
> > Actually, I do not quite understand that remark in the mac80211
> > multicast-to-unicast patch. IP should not care about the ethernet
> > header?
> 
> But it does, for example RFC 1122 states:
> 
>  When a host sends a datagram to a link-layer broadcast address,
>  the IP destination address MUST be a legal IP broadcast or IP
>  multicast address.
> 
>  A host SHOULD silently discard a datagram that is received via
>  a link-layer broadcast (see Section 2.4) but does not specify
>  an IP multicast or broadcast destination address.

This example is the other way round. It specifies how the IP
destination should look like in case of link-layer broadcast. Not
how the link-layer destination should look like in case of a
multicast/broadcast IP destination.

Any other examples?


Re: [PATCH net-next] bridge: multicast to unicast

2017-01-07 Thread Linus Lüssing
On Fri, Jan 06, 2017 at 01:47:52PM +0100, Johannes Berg wrote:
> How does this compare and/or relate to the multicast-to-unicast feature
> we were going to add to the wifi stack, particularly mac80211? Do we
> perhaps not need that feature at all, if bridging will have it?
> 
> I suppose that the feature there could apply also to locally generated
> traffic when the AP interface isn't in a bridge, but I think I could
> live with requiring the AP to be put into a bridge to achieve a similar
> configuration?
> 
> Additionally, on an unrelated note, this seems to apply generically to
> all kinds of frames, losing information by replacing the address.
> Shouldn't it have similar limitations as the wifi stack feature has
> then, like only applying to ARP, IPv4, IPv6 and not general protocols?

(should all three be answered with Michael's and my reply to
Michael's mail, I think)

> 
> Also, it should probably come with the same caveat as we documented for
> the wifi feature:
> 
> Note that this may break certain expectations of the receiver,
> such as the ability to drop unicast IP packets received within
> multicast L2 frames, or the ability to not send ICMP destination
> unreachable messages for packets received in L2 multicast (which
> is required, but the receiver can't tell the difference if this
> new option is enabled.)

Actually, I do not quite understand that remark in the mac80211
multicast-to-unicast patch. IP should not care about the ethernet
header?


Re: [PATCH net-next] bridge: multicast to unicast

2017-01-07 Thread Linus Lüssing
On Fri, Jan 06, 2017 at 07:13:56PM -0800, Stephen Hemminger wrote:
> On Mon,  2 Jan 2017 20:32:14 +0100
> Linus Lüssing <linus.luess...@c0d3.blue> wrote:
> 
> > This feature is intended for interface types which have a more reliable
> > and/or efficient way to deliver unicast packets than broadcast ones
> > (e.g. wifi).
> 
> 
> Why is this not done in MAC80211 rather than  bridge?

Because mac80211 does not have the IGMP/MLD snooping code as
the bridge has?

Reimplementing the snooping in mac80211 does not make sense
because of duplicating code. Moving the snooping code from the
bridge to mac80211 does not make sense either, because we want
multicast snooping, software based, (virtually) wired switches,
too.

The "best way" (TM) would probably be to migrate the IGMP/MLD
snooping from the bridge code to the net device code on the long
run to make such a database usable for any kind of device, without
needing this bridge hack.

But such a migration would also need a way more invasive patchset.

While Felix's idea might look a little "ugly" due it's hacky
nature, I think it is also quite beautiful thanks to it's
simplicity.


Re: [PATCH net-next] bridge: multicast to unicast

2017-01-07 Thread Linus Lüssing
On Sat, Jan 07, 2017 at 11:32:57AM +0100, M. Braun wrote:
> Am 06.01.2017 um 14:54 schrieb Johannes Berg:
> > 
> >> The bridge layer can use IGMP snooping to ensure that the multicast
> >> stream is only transmitted to clients that are actually a member of
> >> the group. Can the mac80211 feature do the same?
> > 
> > No, it'll convert the packet for all clients that are behind that
> > netdev. But that's an argument for dropping the mac80211 feature, which
> > hasn't been merged upstream yet, no?
> 
> But there is multicast/broadcast traffic like e.g. ARP and some IP
> multicast groups that are not covered by IGMP snooping. The mac80211
> patch converts this to unicast as well, which the bridge cannot do.
> 
> That way, these features both complement and overlap each other.

Right, I'd agree with that.

I didn't write it explicitly in the commit message, but yes, the
like anything concerning bridge multicast snooping, bridge
multicast-to-unicast can only affect packets as noted in
RFC4541 ("Considerations for Internet Group Management Protocol (IGMP)
and Multicast Listener Discovery (MLD) Snooping Switches"), too.

So it is only working for IPv4 multicast, excluding link-local
(224.0.0.0/24), and IPv6 multicast, excluding all-host-multicast
(ff02::1).

And does not concern ARP in any way.


The nice complementary effect is, that the bridge can first sieve
out those IP packets thanks to IGMP/MLD snooping knowledge and for
anything else, like ARP, 224.0.0.x or ff02::1, the mac80211
multicast-to-unicast could do its job.


For APs with a small number of STAs (like your private home AP),
you might want to enable both bridge multicast-to-unicast and
mac80211 multicast-to-unicast for this complementary effect. While
on public APs with 30 to 50 STAs with varying distances and bitrates,
you might only one to enable the bridge one, because sending an ARP
packet 50x might actually reduce performance and airtime
significantly.


[PATCH net-next] bridge: multicast to unicast

2017-01-02 Thread Linus Lüssing
Implements an optional, per bridge port flag and feature to deliver
multicast packets to any host on the according port via unicast
individually. This is done by copying the packet per host and
changing the multicast destination MAC to a unicast one accordingly.

multicast-to-unicast works on top of the multicast snooping feature of
the bridge. Which means unicast copies are only delivered to hosts which
are interested in it and signalized this via IGMP/MLD reports
previously.

This feature is intended for interface types which have a more reliable
and/or efficient way to deliver unicast packets than broadcast ones
(e.g. wifi).

However, it should only be enabled on interfaces where no IGMPv2/MLDv1
report suppression takes place. This feature is disabled by default.

The initial patch and idea is from Felix Fietkau.

Cc: Felix Fietkau <n...@nbd.name>
Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>

---

This feature is used and enabled by default in OpenWRT and LEDE for AP
interfaces for more than a year now to allow both a more robust multicast
delivery and multicast at higher rates (e.g. multicast streaming).

In OpenWRT/LEDE the IGMP/MLD report suppression issue is overcome by
the network daemon enabling AP isolation and by that separating all STAs.
Delivery of STA-to-STA IP mulitcast is made possible again by
enabling and utilizing the bridge hairpin mode, which considers the
incoming port as a potential outgoing port, too.

Hairpin-mode is performed after multicast snooping, therefore leading to
only deliver reports to STAs running a multicast router.
---
 include/linux/if_bridge.h |  1 +
 net/bridge/br_forward.c   | 44 +--
 net/bridge/br_mdb.c   |  2 +-
 net/bridge/br_multicast.c | 92 ++-
 net/bridge/br_private.h   |  4 ++-
 net/bridge/br_sysfs_if.c  |  2 ++
 6 files changed, 115 insertions(+), 30 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index c6587c0..f1b0d78 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -46,6 +46,7 @@ struct br_ip_list {
 #define BR_LEARNING_SYNC   BIT(9)
 #define BR_PROXYARP_WIFI   BIT(10)
 #define BR_MCAST_FLOOD BIT(11)
+#define BR_MULTICAST_TO_UCAST  BIT(12)
 
 #define BR_DEFAULT_AGEING_TIME (300 * HZ)
 
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 7cb41ae..49d742d 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -174,6 +174,33 @@ static struct net_bridge_port *maybe_deliver(
return p;
 }
 
+static struct net_bridge_port *maybe_deliver_addr(
+   struct net_bridge_port *prev, struct net_bridge_port *p,
+   struct sk_buff *skb, const unsigned char *addr,
+   bool local_orig)
+{
+   struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev;
+   const unsigned char *src = eth_hdr(skb)->h_source;
+
+   if (!should_deliver(p, skb))
+   return prev;
+
+   /* Even with hairpin, no soliloquies - prevent breaking IPv6 DAD */
+   if (skb->dev == p->dev && ether_addr_equal(src, addr))
+   return prev;
+
+   skb = skb_copy(skb, GFP_ATOMIC);
+   if (!skb) {
+   dev->stats.tx_dropped++;
+   return prev;
+   }
+
+   memcpy(eth_hdr(skb)->h_dest, addr, ETH_ALEN);
+   __br_forward(p, skb, local_orig);
+
+   return prev;
+}
+
 /* called under rcu_read_lock */
 void br_flood(struct net_bridge *br, struct sk_buff *skb,
  enum br_pkt_type pkt_type, bool local_rcv, bool local_orig)
@@ -231,6 +258,7 @@ void br_multicast_flood(struct net_bridge_mdb_entry *mdst,
struct net_bridge_port *prev = NULL;
struct net_bridge_port_group *p;
struct hlist_node *rp;
+   const unsigned char *addr;
 
rp = rcu_dereference(hlist_first_rcu(>router_list));
p = mdst ? rcu_dereference(mdst->ports) : NULL;
@@ -241,10 +269,20 @@ void br_multicast_flood(struct net_bridge_mdb_entry *mdst,
rport = rp ? hlist_entry(rp, struct net_bridge_port, rlist) :
 NULL;
 
-   port = (unsigned long)lport > (unsigned long)rport ?
-  lport : rport;
+   if ((unsigned long)lport > (unsigned long)rport) {
+   port = lport;
+   addr = p->unicast ? p->eth_addr : NULL;
+   } else {
+   port = rport;
+   addr = NULL;
+   }
+
+   if (addr)
+   prev = maybe_deliver_addr(prev, port, skb, addr,
+ local_orig);
+   else
+   prev = maybe_deliver(prev, port, skb, local_orig);
 
-   prev = maybe_deliver(prev, port, skb, local_orig);
if (IS_ERR(prev))
goto out;
if (prev == port

Re: [PATCH] mac80211_hwsim: suggest nl80211 instead of wext driver in documentation

2016-10-17 Thread Linus Lüssing
On Mon, Oct 17, 2016 at 11:39:04AM +0200, Johannes Berg wrote:
> On Mon, 2016-10-17 at 00:39 +0200, Linus Lüssing wrote:
> > For mac80211_hwsim interfaces, suggest to use wpa_supplicant with the
> > more modern, netlink based driver instead of wext.
> 
> Makes sense, applied.
> 
> > Actually, I wasn't even able to make a connection with the
> > configuration
> > files and information provided in
> > Documentation/networking/mac80211_hwsim/{README,hostapd.conf/wpa_supp
> > licant.conf}
> > 
> This less so, we even have a few test cases we run regularly, but I
> don't know. Maybe there's something special in those configuration
> files that we don't test for otherwise.

I investigated a little further and the problem is probably that
I'm running a minimal Linux kernel which was compiled without
CONFIG_CFG80211_WEXT :-).

Anyways, thanks for applying the patch!

Regards, Linus


[PATCH] mac80211_hwsim: suggest nl80211 instead of wext driver in documentation

2016-10-16 Thread Linus Lüssing
For mac80211_hwsim interfaces, suggest to use wpa_supplicant with the more
modern, netlink based driver instead of wext.

Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>
---

Actually, I wasn't even able to make a connection with the configuration
files and information provided in
Documentation/networking/mac80211_hwsim/{README,hostapd.conf/wpa_supplicant.conf}

Changing -Dwext to -Dnl80211 helped and made the WPA-PSK connection with
mac80211_hwsim interfaces work for me.
---
 Documentation/networking/mac80211_hwsim/README | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/networking/mac80211_hwsim/README 
b/Documentation/networking/mac80211_hwsim/README
index 24ac91d..3566a72 100644
--- a/Documentation/networking/mac80211_hwsim/README
+++ b/Documentation/networking/mac80211_hwsim/README
@@ -60,7 +60,7 @@ modprobe mac80211_hwsim
 hostapd hostapd.conf
 
 # Run wpa_supplicant (station) for wlan1
-wpa_supplicant -Dwext -iwlan1 -c wpa_supplicant.conf
+wpa_supplicant -Dnl80211 -iwlan1 -c wpa_supplicant.conf
 
 
 More test cases are available in hostap.git:
-- 
2.1.4



Re: [Bridge] [PATCH net-next v2 2/2] net: bridge: add per-port multicast flood flag

2016-08-31 Thread Linus Lüssing
On Wed, Aug 31, 2016 at 08:02:22AM +0200, Nikolay Aleksandrov wrote:
> On 31/08/16 03:37, Linus Lüssing wrote:
> > On Tue, Aug 30, 2016 at 05:23:08PM +0200, Nikolay Aleksandrov via Bridge 
> > wrote:
> >> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> >> index 1da3221845f1..ed0dd3340084 100644
> >> --- a/net/bridge/br_if.c
> >> +++ b/net/bridge/br_if.c
> >> @@ -362,7 +362,7 @@ static struct net_bridge_port *new_nbp(struct 
> >> net_bridge *br,
> >>p->path_cost = port_cost(dev);
> >>p->priority = 0x8000 >> BR_PORT_BITS;
> >>p->port_no = index;
> >> -  p->flags = BR_LEARNING | BR_FLOOD;
> >> +  p->flags = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD;
> > 
> > I'm discontent with this new flag becoming the default.
> > 
> > Could you elaborate a little more on your use-case, when/why do
> > you want/need this flag?
> > 
> 
> The use case is the current default behaviour if we don't make this flag on 
> by default
> then we'll change user-visible default behaviour. Right now we flood 
> unregistered mcast
> traffic by default (if there's no querier and router port, which continues to 
> function
> as before). Also we have the port flags equal to BR_AUTO_MASK by default.

Ok, you're right, the way you implemented it doesn't
change the default behaviour (ignoring the BR_AUTO_MASK change you
removed in v3).

I guess the "similar to the unknown unicast flood flag" confused
me a little and I was afraid that the "flood if there is no
listener / MDB entry" behaviour, which was removed some years ago,
would be reintroduced. (but yeah, looking at the code more
closely, it doesn't do that)

Thanks for the clarification!

Regards, Linus


Re: [Bridge] [PATCH net-next v2 2/2] net: bridge: add per-port multicast flood flag

2016-08-30 Thread Linus Lüssing
On Tue, Aug 30, 2016 at 05:23:08PM +0200, Nikolay Aleksandrov via Bridge wrote:
> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> index 1da3221845f1..ed0dd3340084 100644
> --- a/net/bridge/br_if.c
> +++ b/net/bridge/br_if.c
> @@ -362,7 +362,7 @@ static struct net_bridge_port *new_nbp(struct net_bridge 
> *br,
>   p->path_cost = port_cost(dev);
>   p->priority = 0x8000 >> BR_PORT_BITS;
>   p->port_no = index;
> - p->flags = BR_LEARNING | BR_FLOOD;
> + p->flags = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD;

I'm discontent with this new flag becoming the default.

Could you elaborate a little more on your use-case, when/why do
you want/need this flag?


[PATCH] cfg80211: Add stub for cfg80211_get_station()

2016-08-19 Thread Linus Lüssing
This allows modules using this function (currently: batman-adv) to
compile even if cfg80211 is not built at all, thus relaxing
dependencies.

Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>
---
 include/net/cfg80211.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index 9c23f4d3..beb7610 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -1102,6 +1102,7 @@ struct station_info {
struct cfg80211_tid_stats pertid[IEEE80211_NUM_TIDS + 1];
 };
 
+#if IS_ENABLED(CONFIG_CFG80211)
 /**
  * cfg80211_get_station - retrieve information about a given station
  * @dev: the device where the station is supposed to be connected to
@@ -1114,6 +1115,14 @@ struct station_info {
  */
 int cfg80211_get_station(struct net_device *dev, const u8 *mac_addr,
 struct station_info *sinfo);
+#else
+static inline int cfg80211_get_station(struct net_device *dev,
+  const u8 *mac_addr,
+  struct station_info *sinfo)
+{
+   return -ENOENT;
+}
+#endif
 
 /**
  * enum monitor_flags - monitor flags
-- 
2.1.4



Re: [PATCH net] Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address

2016-06-28 Thread Linus Lüssing
On Tue, Jun 28, 2016 at 08:04:42AM -0400, David Miller wrote:
> From: Linus Lüssing <linus.luess...@c0d3.blue>
> [...]
> > Fixes: 1d81d4c3dd88 ("bridge: check return value of ipv6_dev_get_saddr()")
> 
> You're missing an initial 'd' in that SHA1-ID.
> 
> With that fixed, applied and queued up for -stable.

Sorry :(. Thanks for taking care of it!


Re: [Bridge] [PATCH net-next] net: bridge: add support for IGMP/MLD stats and export them via netlink

2016-06-28 Thread Linus Lüssing
On Mon, Jun 27, 2016 at 08:10:48PM +0200, Nikolay Aleksandrov via Bridge wrote:
> These are invaluable when monitoring or debugging complex multicast setups
> with bridges.

Indeed! Great patch :). Especially if people are unable to provide
pcap files for debugging (due to whatever reason). Hopefully that
will help with bugzilla ticket #99081, too...

I know it might not quite fit into your current patch, which simply
stores the ICMPv6 and IGMP type in the bridge private skb->cb, but
do you think you could count and export the following two more
things, too:

* MLDv1 vs. MLDv2 querier (and IGMP accordingly)
* Number of (potential) MLD/IGMP parse errors
  (e.g. beginning of br_multicast_ipv{4,6}_rcv():
   http://lxr.free-electrons.com/source/net/bridge/br_multicast.c?v=4.5#L1588 
and
   http://lxr.free-electrons.com/source/net/bridge/br_multicast.c?v=4.5#L1634)

The former would help to know how the network is expected to
behave (for instance whether you should see MLDv2 reports at all or
whether / how much report suppression to expect).

The latter would help to spot either potential IGMP/MLD parsing bugs in
the bridge or malformed IGMP/MLD messages send by someone else.


Ideally, there would be per port counters again for the overall
IPv4/IPv6 multicast traffic. That would help for multicast streams
for instance, to easily see whether multicast counters increase
rapidly on the ports you would expect them to. And whether snooping
is working in general for such streames, without needing to check
each port individually via tcpdump, for instance.


Just some thoughts, would love to hear what you think about them.

Regards, Linus


Re: [PATCH net] Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address

2016-06-25 Thread Linus Lüssing
On Fri, Jun 24, 2016 at 12:35:18PM +0200, Daniel Danzberger wrote:
> The bridge is falsly dropping ipv6 mulitcast packets if there is:
>  1. No ipv6 address assigned on the brigde.
>  2. No external mld querier present.
>  3. The internal querier enabled.
> 
> When the bridge fails to build mld queries, because it has no
> ipv6 address, it slilently returns, but keeps the local querier enabled.
> This specific case causes confusing packet loss.
> 
> Ipv6 multicast snooping can only work if:
>  a) An external querier is present
>  OR
>  b) The bridge has an ipv6 address an is capable of sending own queries
> 
> Otherwise it has to forward/flood the ipv6 multicast traffic,
> because snooping cannot work.
> 
> This patch fixes the issue by adding a flag to the bridge struct that
> indicates that there is currently no ipv6 address assinged to the bridge
> and returns a false state for the local querier in
> __br_multicast_querier_exists().

Fixes: 1d81d4c3dd88 ("bridge: check return value of ipv6_dev_get_saddr()")


Re: [PATCH net] Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address

2016-06-24 Thread Linus Lüssing
On Fri, Jun 24, 2016 at 12:35:18PM +0200, Daniel Danzberger wrote:
> The bridge is falsly dropping ipv6 mulitcast packets if there is:
>  1. No ipv6 address assigned on the brigde.
>  2. No external mld querier present.
>  3. The internal querier enabled.
> 
> When the bridge fails to build mld queries, because it has no
> ipv6 address, it slilently returns, but keeps the local querier enabled.
> This specific case causes confusing packet loss.
> 
> Ipv6 multicast snooping can only work if:
>  a) An external querier is present
>  OR
>  b) The bridge has an ipv6 address an is capable of sending own queries
> 
> Otherwise it has to forward/flood the ipv6 multicast traffic,
> because snooping cannot work.
> 
> This patch fixes the issue by adding a flag to the bridge struct that
> indicates that there is currently no ipv6 address assinged to the bridge
> and returns a false state for the local querier in
> __br_multicast_querier_exists().

Acked-by: Linus Lüssing <linus.luess...@c0d3.blue>


Re: [PATCH] Bridge: Fix ipv6 mc snooping if it has no ipv6 address.

2016-06-23 Thread Linus Lüssing
Hi Daniel,

Thanks for submitting this patch here :).


On Thu, Jun 23, 2016 at 11:28:55AM +0200, daniel wrote:
> The bridge is falsly dropping ipv6 mulitcast packets
> if there is no ipv6 address assigned on the brigde and no
> external mld querier is present.

and if the bridge internal querier is enabled (usually disabled
by default in the bridge code, but enabled by default in OpenWRT
for instance).

> 
> When the bridge fails to build mld queries, because it has no
> ipv6 address, it silently returns, but keeps the local querier enabled.
> (br_multicast.c:832)

Not sure whether David or others like line numbers in commit messages, as
they can change over time.

> 
> Ipv6 multicast snooping can only work if:
>  a) an external querier is present

maybe clarify that this is an OR, not AND?


I think you can add a [PATCH net] tag, as it seems small
enough for stable kernels and fixes a potential, confusing packet
loss case.

Also maybe add a:
--
Fixes: 1d81d4c3dd88 ("bridge: check return value of ipv6_dev_get_saddr()")
--

Regards, Linus


PS: Does not seem to apply for me on either David's net branch or
Torvald's master branch.

"fatal: patch fragment without header at line 7: @@ -599,10
+612,12 @@ static inline bool"

Try using "git format-patch" and "git send-email" instead. Also
check ./scripts/get_maintainer.pl for a few more email addresses
to add.


[PATCHv2 net] bridge: fix igmp / mld query parsing

2016-05-04 Thread Linus Lüssing
With the newly introduced helper functions the skb pulling is hidden
in the checksumming function - and undone before returning to the
caller.

The IGMP and MLD query parsing functions in the bridge still
assumed that the skb is pointing to the beginning of the IGMP/MLD
message while it is now kept at the beginning of the IPv4/6 header.

If there is a querier somewhere else, then this either causes
the multicast snooping to stay disabled even though it could be
enabled. Or, if we have the querier enabled too, then this can
create unnecessary IGMP / MLD query messages on the link.

Fixing this by taking the offset between IP and IGMP/MLD header into
account, too.

Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code")
Reported-by: Simon Wunderlich <s...@simonwunderlich.de>
Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>
---
v2: changed "int" to "unsigned int"

 net/bridge/br_multicast.c |   12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 03661d9..ea98937 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1270,6 +1270,7 @@ static int br_ip4_multicast_query(struct net_bridge *br,
struct br_ip saddr;
unsigned long max_delay;
unsigned long now = jiffies;
+   unsigned int offset = skb_transport_offset(skb);
__be32 group;
int err = 0;
 
@@ -1280,14 +1281,14 @@ static int br_ip4_multicast_query(struct net_bridge *br,
 
group = ih->group;
 
-   if (skb->len == sizeof(*ih)) {
+   if (skb->len == offset + sizeof(*ih)) {
max_delay = ih->code * (HZ / IGMP_TIMER_SCALE);
 
if (!max_delay) {
max_delay = 10 * HZ;
group = 0;
}
-   } else if (skb->len >= sizeof(*ih3)) {
+   } else if (skb->len >= offset + sizeof(*ih3)) {
ih3 = igmpv3_query_hdr(skb);
if (ih3->nsrcs)
goto out;
@@ -1348,6 +1349,7 @@ static int br_ip6_multicast_query(struct net_bridge *br,
struct br_ip saddr;
unsigned long max_delay;
unsigned long now = jiffies;
+   unsigned int offset = skb_transport_offset(skb);
const struct in6_addr *group = NULL;
bool is_general_query;
int err = 0;
@@ -1357,8 +1359,8 @@ static int br_ip6_multicast_query(struct net_bridge *br,
(port && port->state == BR_STATE_DISABLED))
goto out;
 
-   if (skb->len == sizeof(*mld)) {
-   if (!pskb_may_pull(skb, sizeof(*mld))) {
+   if (skb->len == offset + sizeof(*mld)) {
+   if (!pskb_may_pull(skb, offset + sizeof(*mld))) {
err = -EINVAL;
goto out;
}
@@ -1367,7 +1369,7 @@ static int br_ip6_multicast_query(struct net_bridge *br,
if (max_delay)
group = >mld_mca;
} else {
-   if (!pskb_may_pull(skb, sizeof(*mld2q))) {
+   if (!pskb_may_pull(skb, offset + sizeof(*mld2q))) {
err = -EINVAL;
goto out;
}
-- 
1.7.10.4



Re: [PATCH net] bridge: fix igmp / mld query parsing

2016-05-04 Thread Linus Lüssing
On Tue, May 03, 2016 at 01:26:23PM -0700, Stephen Hemminger wrote:
> On Tue,  3 May 2016 22:18:54 +0200
> Linus Lüssing <linus.luess...@c0d3.blue> wrote:
> 
> > diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> > index 03661d9..7105cdf 100644
> > --- a/net/bridge/br_multicast.c
> > +++ b/net/bridge/br_multicast.c
> > @@ -1271,6 +1271,7 @@ static int br_ip4_multicast_query(struct net_bridge 
> > *br,
> > unsigned long max_delay;
> > unsigned long now = jiffies;
> > __be32 group;
> > +   int offset = skb_transport_offset(skb);
> shouldn't this be unsigned?

Yes, should always be unsigned here.

Ok, I'm changing that (even though skb_transport_offset() is "static
inline int").


[PATCH net] bridge: fix igmp / mld query parsing

2016-05-03 Thread Linus Lüssing
With the newly introduced helper functions the skb pulling is hidden
in the checksumming function - and undone before returning to the
caller.

The IGMP and MLD query parsing functions in the bridge still
assumed that the skb is pointing to the beginning of the IGMP/MLD
message while it is now kept at the beginning of the IPv4/6 header.

If there is a querier somewhere else, then this either causes
the multicast snooping to stay disabled even though it could be
enabled. Or, if we have the querier enabled too, then this can
create unnecessary IGMP / MLD query messages on the link.

Fixing this by taking the offset between IP and IGMP/MLD header into
account, too.

Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code")
Reported-by: Simon Wunderlich <s...@simonwunderlich.de>
Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>
---
 net/bridge/br_multicast.c |   12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 03661d9..7105cdf 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1271,6 +1271,7 @@ static int br_ip4_multicast_query(struct net_bridge *br,
unsigned long max_delay;
unsigned long now = jiffies;
__be32 group;
+   int offset = skb_transport_offset(skb);
int err = 0;
 
spin_lock(>multicast_lock);
@@ -1280,14 +1281,14 @@ static int br_ip4_multicast_query(struct net_bridge *br,
 
group = ih->group;
 
-   if (skb->len == sizeof(*ih)) {
+   if (skb->len == offset + sizeof(*ih)) {
max_delay = ih->code * (HZ / IGMP_TIMER_SCALE);
 
if (!max_delay) {
max_delay = 10 * HZ;
group = 0;
}
-   } else if (skb->len >= sizeof(*ih3)) {
+   } else if (skb->len >= offset + sizeof(*ih3)) {
ih3 = igmpv3_query_hdr(skb);
if (ih3->nsrcs)
goto out;
@@ -1350,6 +1351,7 @@ static int br_ip6_multicast_query(struct net_bridge *br,
unsigned long now = jiffies;
const struct in6_addr *group = NULL;
bool is_general_query;
+   int offset = skb_transport_offset(skb);
int err = 0;
 
spin_lock(>multicast_lock);
@@ -1357,8 +1359,8 @@ static int br_ip6_multicast_query(struct net_bridge *br,
(port && port->state == BR_STATE_DISABLED))
goto out;
 
-   if (skb->len == sizeof(*mld)) {
-   if (!pskb_may_pull(skb, sizeof(*mld))) {
+   if (skb->len == offset + sizeof(*mld)) {
+   if (!pskb_may_pull(skb, offset + sizeof(*mld))) {
err = -EINVAL;
goto out;
}
@@ -1367,7 +1369,7 @@ static int br_ip6_multicast_query(struct net_bridge *br,
if (max_delay)
group = >mld_mca;
} else {
-   if (!pskb_may_pull(skb, sizeof(*mld2q))) {
+   if (!pskb_may_pull(skb, offset + sizeof(*mld2q))) {
err = -EINVAL;
goto out;
}
-- 
1.7.10.4



[PATCHv3] net: fix bridge multicast packet checksum validation

2016-02-23 Thread Linus Lüssing
We need to update the skb->csum after pulling the skb, otherwise
an unnecessary checksum (re)computation can ocure for IGMP/MLD packets
in the bridge code. Additionally this fixes the following splats for
network devices / bridge ports with support for and enabled RX checksum
offloading:

[...]
[   43.986968] eth0: hw csum failure
[   43.990344] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0 #2
[   43.996193] Hardware name: BCM2709
[   43.999647] [<800204e0>] (unwind_backtrace) from [<8001cf14>] 
(show_stack+0x10/0x14)
[   44.007432] [<8001cf14>] (show_stack) from [<801ab614>] 
(dump_stack+0x80/0x90)
[   44.014695] [<801ab614>] (dump_stack) from [<802e4548>] 
(__skb_checksum_complete+0x6c/0xac)
[   44.023090] [<802e4548>] (__skb_checksum_complete) from [<803a055c>] 
(ipv6_mc_validate_checksum+0x104/0x178)
[   44.032959] [<803a055c>] (ipv6_mc_validate_checksum) from [<802e111c>] 
(skb_checksum_trimmed+0x130/0x188)
[   44.042565] [<802e111c>] (skb_checksum_trimmed) from [<803a06e8>] 
(ipv6_mc_check_mld+0x118/0x338)
[   44.051501] [<803a06e8>] (ipv6_mc_check_mld) from [<803b2c98>] 
(br_multicast_rcv+0x5dc/0xd00)
[   44.060077] [<803b2c98>] (br_multicast_rcv) from [<803aa510>] 
(br_handle_frame_finish+0xac/0x51c)
[...]

Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code")
Reported-by: Álvaro Fernández Rojas <nolt...@gmail.com>
Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>
---

v3:
* Missed that an skb_postpush_rcsum() was just added for v4.5, using that
  one instead

v2:
* substituted the storing of the csum with an skb_push_rcsum() approach
 (did some more reading about CHECKSUM_PARTIAL, and it seems to me that the
  skb_push direction is actually the easier/"safer" one, so there should be
  no resetting to CHECKSUM_NONE necessary)
* Rewording of commit message, this bug should not cause any packet loss
  due to "automatic" checksum recomputations in software if skb->csum
  is bogus (in the CHECKSUM_COMPLETE case)


 net/core/skbuff.c |   22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5bf88f5..8616d11 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2948,6 +2948,24 @@ int skb_append_pagefrags(struct sk_buff *skb, struct 
page *page,
 EXPORT_SYMBOL_GPL(skb_append_pagefrags);
 
 /**
+ * skb_push_rcsum - push skb and update receive checksum
+ * @skb: buffer to update
+ * @len: length of data pulled
+ *
+ * This function performs an skb_push on the packet and updates
+ * the CHECKSUM_COMPLETE checksum.  It should be used on
+ * receive path processing instead of skb_push unless you know
+ * that the checksum difference is zero (e.g., a valid IP header)
+ * or you are setting ip_summed to CHECKSUM_NONE.
+ */
+static unsigned char *skb_push_rcsum(struct sk_buff *skb, unsigned len)
+{
+   skb_push(skb, len);
+   skb_postpush_rcsum(skb, skb->data, len);
+   return skb->data;
+}
+
+/**
  * skb_pull_rcsum - pull skb and update receive checksum
  * @skb: buffer to update
  * @len: length of data pulled
@@ -4084,9 +4102,9 @@ struct sk_buff *skb_checksum_trimmed(struct sk_buff *skb,
if (!pskb_may_pull(skb_chk, offset))
goto err;
 
-   __skb_pull(skb_chk, offset);
+   skb_pull_rcsum(skb_chk, offset);
ret = skb_chkf(skb_chk);
-   __skb_push(skb_chk, offset);
+   skb_push_rcsum(skb_chk, offset);
 
if (ret)
goto err;
-- 
1.7.10.4



[PATCHv2] net: fix bridge multicast packet checksum validation

2016-02-23 Thread Linus Lüssing
We need to update the skb->csum after pulling the skb, otherwise
an unnecessary checksum (re)computation can ocure for IGMP/MLD packets
in the bridge code. Additionally this fixes the following splats for
network devices / bridge ports with support for and enabled RX checksum
offloading:

[...]
[   43.986968] eth0: hw csum failure
[   43.990344] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0 #2
[   43.996193] Hardware name: BCM2709
[   43.999647] [<800204e0>] (unwind_backtrace) from [<8001cf14>] 
(show_stack+0x10/0x14)
[   44.007432] [<8001cf14>] (show_stack) from [<801ab614>] 
(dump_stack+0x80/0x90)
[   44.014695] [<801ab614>] (dump_stack) from [<802e4548>] 
(__skb_checksum_complete+0x6c/0xac)
[   44.023090] [<802e4548>] (__skb_checksum_complete) from [<803a055c>] 
(ipv6_mc_validate_checksum+0x104/0x178)
[   44.032959] [<803a055c>] (ipv6_mc_validate_checksum) from [<802e111c>] 
(skb_checksum_trimmed+0x130/0x188)
[   44.042565] [<802e111c>] (skb_checksum_trimmed) from [<803a06e8>] 
(ipv6_mc_check_mld+0x118/0x338)
[   44.051501] [<803a06e8>] (ipv6_mc_check_mld) from [<803b2c98>] 
(br_multicast_rcv+0x5dc/0xd00)
[   44.060077] [<803b2c98>] (br_multicast_rcv) from [<803aa510>] 
(br_handle_frame_finish+0xac/0x51c)
[...]

Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code")
Reported-by: Álvaro Fernández Rojas <nolt...@gmail.com>
Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>
---

v2:
* substituted the storing of the csum with an skb_push_rcsum() approach
 (did some more reading about CHECKSUM_PARTIAL, and it seems to me that the
  skb_push direction is actually the easier/"safer" one, so there should be
  no resetting to CHECKSUM_NONE necessary)
* Rewording of commit message, this bug should not cause any packet loss
  due to "automatic" checksum recomputations in software if skb->csum
  is bogus (in the CHECKSUM_COMPLETE case)

 include/linux/skbuff.h |   17 +
 net/core/skbuff.c  |   22 --
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4ce9ff7..d66e007 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2784,6 +2784,23 @@ static inline int skb_linearize_cow(struct sk_buff *skb)
 }
 
 /**
+ * skb_postpush_rcsum - update checksum for received skb after push
+ * @skb: buffer to update
+ * @start: start of data before push
+ * @len: length of data pushed
+ *
+ * After doing a push on a received packet, you need to call this to
+ * update the CHECKSUM_COMPLETE checksum.
+ */
+
+static inline void skb_postpush_rcsum(struct sk_buff *skb,
+ const void *start, unsigned int len)
+{
+   if (skb->ip_summed == CHECKSUM_COMPLETE)
+   skb->csum = csum_add(skb->csum, csum_partial(start, len, 0));
+}
+
+/**
  * skb_postpull_rcsum - update checksum for received skb after pull
  * @skb: buffer to update
  * @start: start of data before pull
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5bf88f5..8616d11 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2948,6 +2948,24 @@ int skb_append_pagefrags(struct sk_buff *skb, struct 
page *page,
 EXPORT_SYMBOL_GPL(skb_append_pagefrags);
 
 /**
+ * skb_push_rcsum - push skb and update receive checksum
+ * @skb: buffer to update
+ * @len: length of data pulled
+ *
+ * This function performs an skb_push on the packet and updates
+ * the CHECKSUM_COMPLETE checksum.  It should be used on
+ * receive path processing instead of skb_push unless you know
+ * that the checksum difference is zero (e.g., a valid IP header)
+ * or you are setting ip_summed to CHECKSUM_NONE.
+ */
+static unsigned char *skb_push_rcsum(struct sk_buff *skb, unsigned len)
+{
+   skb_push(skb, len);
+   skb_postpush_rcsum(skb, skb->data, len);
+   return skb->data;
+}
+
+/**
  * skb_pull_rcsum - pull skb and update receive checksum
  * @skb: buffer to update
  * @len: length of data pulled
@@ -4084,9 +4102,9 @@ struct sk_buff *skb_checksum_trimmed(struct sk_buff *skb,
if (!pskb_may_pull(skb_chk, offset))
goto err;
 
-   __skb_pull(skb_chk, offset);
+   skb_pull_rcsum(skb_chk, offset);
ret = skb_chkf(skb_chk);
-   __skb_push(skb_chk, offset);
+   skb_push_rcsum(skb_chk, offset);
 
if (ret)
goto err;
-- 
1.7.10.4



Re: [PATCH] net: fix bridge multicast packet checksum validation

2016-02-18 Thread Linus Lüssing
On Thu, Feb 18, 2016 at 01:51:34PM +0100, Steinar H. Gunderson wrote:
> On Mon, Feb 15, 2016 at 03:07:06AM +0100, Linus Lüssing wrote:
> > Steinar, can you check whether this fixes the bridge issues you reported on
> > bugzilla #99081? Not quite sure whether it is the same as yours as you
> > do not seem to have any such call traces.
> 
> It doesn't immediately sound like the same problem; why would promisc change
> anything if the problem is the checksumming?

The mdb you provided in the bugzilla ticket misses reports, so it
was unable to parse reports. Which could point to a checksumming
problem.

Enabling promisc probably did not fix the parsing for you, but instead
promisc forces to forward packets upstream on your interface independent
of the mdb. I would assume that even with promisc, your output
from "bridge mdb show" looks rather empty. Can you check?

> 
> I don't have any reboots scheduled for this machine right now, but I'll see
> what I can do wrt. testing.

Thanks :).


[PATCH] net: fix bridge multicast packet checksum validation

2016-02-14 Thread Linus Lüssing
We need to update the skb->csum after pulling the skb, otherwise
checksum validation will fail. This fixes multicast packet loss for
bridges and splats like the following:

[...]
[   43.986968] eth0: hw csum failure
[   43.990344] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0 #2
[   43.996193] Hardware name: BCM2709
[   43.999647] [<800204e0>] (unwind_backtrace) from [<8001cf14>] 
(show_stack+0x10/0x14)
[   44.007432] [<8001cf14>] (show_stack) from [<801ab614>] 
(dump_stack+0x80/0x90)
[   44.014695] [<801ab614>] (dump_stack) from [<802e4548>] 
(__skb_checksum_complete+0x6c/0xac)
[   44.023090] [<802e4548>] (__skb_checksum_complete) from [<803a055c>] 
(ipv6_mc_validate_checksum+0x104/0x178)
[   44.032959] [<803a055c>] (ipv6_mc_validate_checksum) from [<802e111c>] 
(skb_checksum_trimmed+0x130/0x188)
[   44.042565] [<802e111c>] (skb_checksum_trimmed) from [<803a06e8>] 
(ipv6_mc_check_mld+0x118/0x338)
[   44.051501] [<803a06e8>] (ipv6_mc_check_mld) from [<803b2c98>] 
(br_multicast_rcv+0x5dc/0xd00)
[   44.060077] [<803b2c98>] (br_multicast_rcv) from [<803aa510>] 
(br_handle_frame_finish+0xac/0x51c)
[...]

Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code")
Reported-by: Álvaro Fernández Rojas <nolt...@gmail.com>
Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>
---

Steinar, can you check whether this fixes the bridge issues you reported on
bugzilla #99081? Not quite sure whether it is the same as yours as you
do not seem to have any such call traces.


I am not super happy with how this patch looks, but there is no "skb_push_rcsum"
available and skb_pull_rcsum() seems non-reversible as is. Alternative 
suggestions
always welcome.


 net/core/skbuff.c |   19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5bf88f5..6c34ef6 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4076,6 +4076,11 @@ struct sk_buff *skb_checksum_trimmed(struct sk_buff *skb,
struct sk_buff *skb_chk;
unsigned int offset = skb_transport_offset(skb);
__sum16 ret;
+   int ip_summed;
+   int csum_valid;
+   int csum_level;
+   int csum_bad;
+   __wsum csum;
 
skb_chk = skb_checksum_maybe_trim(skb, transport_len);
if (!skb_chk)
@@ -4084,10 +4089,22 @@ struct sk_buff *skb_checksum_trimmed(struct sk_buff 
*skb,
if (!pskb_may_pull(skb_chk, offset))
goto err;
 
-   __skb_pull(skb_chk, offset);
+   ip_summed = skb->ip_summed;
+   csum_valid = skb->csum_valid;
+   csum_level = skb->csum_level;
+   csum_bad = skb->csum_bad;
+   csum = skb->csum;
+
+   skb_pull_rcsum(skb_chk, offset);
ret = skb_chkf(skb_chk);
__skb_push(skb_chk, offset);
 
+   skb->ip_summed = ip_summed;
+   skb->csum_valid = csum_valid;
+   skb->csum_level = csum_level;
+   skb->csum_bad = csum_bad;
+   skb->csum = csum;
+
if (ret)
goto err;
 
-- 
1.7.10.4



[PATCH net] bridge: fix igmpv3 / mldv2 report parsing

2015-09-11 Thread Linus Lüssing
With the newly introduced helper functions the skb pulling is hidden in
the checksumming function - and undone before returning to the caller.

The IGMPv3 and MLDv2 report parsing functions in the bridge still
assumed that the skb is pointing to the beginning of the IGMP/MLD
message while it is now kept at the beginning of the IPv4/6 header,
breaking the message parsing and creating packet loss.

Fixing this by taking the offset between IP and IGMP/MLD header into
account, too.

Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code")
Reported-by: Tobias Powalowski <tobias.powalow...@googlemail.com>
Tested-by: Tobias Powalowski <tobias.powalow...@googlemail.com>
Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>
---
 net/bridge/br_multicast.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 66efdc2..480b3de 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1006,7 +1006,7 @@ static int br_ip4_multicast_igmp3_report(struct 
net_bridge *br,
 
ih = igmpv3_report_hdr(skb);
num = ntohs(ih->ngrec);
-   len = sizeof(*ih);
+   len = skb_transport_offset(skb) + sizeof(*ih);
 
for (i = 0; i < num; i++) {
len += sizeof(*grec);
@@ -1067,7 +1067,7 @@ static int br_ip6_multicast_mld2_report(struct net_bridge 
*br,
 
icmp6h = icmp6_hdr(skb);
num = ntohs(icmp6h->icmp6_dataun.un_data16[1]);
-   len = sizeof(*icmp6h);
+   len = skb_transport_offset(skb) + sizeof(*icmp6h);
 
for (i = 0; i < num; i++) {
__be16 *nsrcs, _nsrcs;
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] net: fix wrong skb_get() usage / crash in IGMP/MLD parsing code

2015-08-12 Thread Linus Lüssing
The recent refactoring of the IGMP and MLD parsing code into
ipv6_mc_check_mld() / ip_mc_check_igmp() introduced a potential crash /
BUG() invocation for bridges:

I wrongly assumed that skb_get() could be used as a simple reference
counter for an skb which is not the case. skb_get() bears additional
semantics, a user count. This leads to a BUG() invocation in
pskb_expand_head() / kernel panic if pskb_may_pull() is called on an skb
with a user count greater than one - unfortunately the refactoring did
just that.

Fixing this by removing the skb_get() call and changing the API: The
caller of ipv6_mc_check_mld() / ip_mc_check_igmp() now needs to
additionally check whether the returned skb_trimmed is a clone.

Fixes: 9afd85c9e455 (net: Export IGMP/MLD message validation code)
Reported-by: Brenden Blanco bbla...@plumgrid.com
Signed-off-by: Linus Lüssing linus.luess...@c0d3.blue
---
 net/bridge/br_multicast.c |4 ++--
 net/core/skbuff.c |   37 ++---
 net/ipv4/igmp.c   |   33 ++---
 net/ipv6/mcast_snoop.c|   33 ++---
 4 files changed, 56 insertions(+), 51 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 0b39dcc..1285eaf 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1591,7 +1591,7 @@ static int br_multicast_ipv4_rcv(struct net_bridge *br,
break;
}
 
-   if (skb_trimmed)
+   if (skb_trimmed  skb_trimmed != skb)
kfree_skb(skb_trimmed);
 
return err;
@@ -1636,7 +1636,7 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
break;
}
 
-   if (skb_trimmed)
+   if (skb_trimmed  skb_trimmed != skb)
kfree_skb(skb_trimmed);
 
return err;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b6a19ca..bf9a5d9 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4022,8 +4022,8 @@ EXPORT_SYMBOL(skb_checksum_setup);
  * Otherwise returns the provided skb. Returns NULL in error cases
  * (e.g. transport_len exceeds skb length or out-of-memory).
  *
- * Caller needs to set the skb transport header and release the returned skb.
- * Provided skb is consumed.
+ * Caller needs to set the skb transport header and free any returned skb if it
+ * differs from the provided skb.
  */
 static struct sk_buff *skb_checksum_maybe_trim(struct sk_buff *skb,
   unsigned int transport_len)
@@ -4032,16 +4032,12 @@ static struct sk_buff *skb_checksum_maybe_trim(struct 
sk_buff *skb,
unsigned int len = skb_transport_offset(skb) + transport_len;
int ret;
 
-   if (skb-len  len) {
-   kfree_skb(skb);
+   if (skb-len  len)
return NULL;
-   } else if (skb-len == len) {
+   else if (skb-len == len)
return skb;
-   }
 
skb_chk = skb_clone(skb, GFP_ATOMIC);
-   kfree_skb(skb);
-
if (!skb_chk)
return NULL;
 
@@ -4066,8 +4062,8 @@ static struct sk_buff *skb_checksum_maybe_trim(struct 
sk_buff *skb,
  * If the skb has data beyond the given transport length, then a
  * trimmed  cloned skb is checked and returned.
  *
- * Caller needs to set the skb transport header and release the returned skb.
- * Provided skb is consumed.
+ * Caller needs to set the skb transport header and free any returned skb if it
+ * differs from the provided skb.
  */
 struct sk_buff *skb_checksum_trimmed(struct sk_buff *skb,
 unsigned int transport_len,
@@ -4079,23 +4075,26 @@ struct sk_buff *skb_checksum_trimmed(struct sk_buff 
*skb,
 
skb_chk = skb_checksum_maybe_trim(skb, transport_len);
if (!skb_chk)
-   return NULL;
+   goto err;
 
-   if (!pskb_may_pull(skb_chk, offset)) {
-   kfree_skb(skb_chk);
-   return NULL;
-   }
+   if (!pskb_may_pull(skb_chk, offset))
+   goto err;
 
__skb_pull(skb_chk, offset);
ret = skb_chkf(skb_chk);
__skb_push(skb_chk, offset);
 
-   if (ret) {
-   kfree_skb(skb_chk);
-   return NULL;
-   }
+   if (ret)
+   goto err;
 
return skb_chk;
+
+err:
+   if (skb_chk  skb_chk != skb)
+   kfree_skb(skb_chk);
+
+   return NULL;
+
 }
 EXPORT_SYMBOL(skb_checksum_trimmed);
 
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 651cdf6..9fdfd9d 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1435,33 +1435,35 @@ static int __ip_mc_check_igmp(struct sk_buff *skb, 
struct sk_buff **skb_trimmed)
struct sk_buff *skb_chk;
unsigned int transport_len;
unsigned int len = skb_transport_offset(skb) + sizeof(struct igmphdr);
-   int ret;
+   int ret = -EINVAL;
 
transport_len = ntohs(ip_hdr(skb)-tot_len) - ip_hdrlen(skb);
 
-   skb_get(skb

Re: ipv6_mc_check_mld - kernel BUG at net/core/skbuff.c:1128

2015-08-11 Thread Linus Lüssing
On Mon, Aug 10, 2015 at 02:56:12PM -0700, Brenden Blanco wrote:
 Doing some code reading with Alexei, we found a suspect commit, which
 introduces an skb_get and skb_may_pull of the same skb, which leads to the BUG
 when skb-len == len.

Urgh, didn't know that pskb_may_pull() doesn't like an skb with a
reference count greater than one... But yes, the BUG() call in
skbuff.c:1128 / pskb_expand_head() says that (though in this case
the BUG() in skbuff.c call actually seems kinda weird (/wrong?), as
it isn't shared between different code paths).

Thanks for the thorough analysis, going to provide a patch within
the next 24h (hopefully).

Cheers, Linus
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ipv6_mc_check_mld - kernel BUG at net/core/skbuff.c:1128

2015-08-11 Thread Linus Lüssing
On Tue, Aug 11, 2015 at 10:51:40PM +0200, Linus Lüssing wrote:
 On Mon, Aug 10, 2015 at 02:56:12PM -0700, Brenden Blanco wrote:
  Doing some code reading with Alexei, we found a suspect commit, which
  introduces an skb_get and skb_may_pull of the same skb, which leads to the 
  BUG
  when skb-len == len.
 
 Urgh, didn't know that pskb_may_pull() doesn't like an skb with a
 reference count greater than one... But yes, the BUG() call in
 skbuff.c:1128 / pskb_expand_head() says that (though in this case
 the BUG() in skbuff.c call actually seems kinda weird (/wrong?), as
 it isn't shared between different code paths).

The more I think about it, I'm tending to remove the BUG() call in
pskb_expand_head() as in this case it obviously isn't a bug.

The skb_get() allows a simple and in my opinion easy to read cleanup
part of skb_trimmed for any caller of ip{v6,}_mc_check_mld(). No need
to check whether skb == skb_trimmed for a caller for instance,
simply checking whether skb_trimmed exists is enough.


Any objections to remove the if (skb_shared(skb)) BUG() part in
pskb_expand_head()? Or would there be any other undesired side
effects in utilising skb_get() like that?

Cheers, Linus
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Caller or callee setting skb-network_header?

2015-06-29 Thread Linus Lüssing
Hi,

For an .ndo_start_xmit handler, can the callee rely on the caller
setting the skb network header?

Or should the callee set it before performing any
skb_network_header()/ip_hdr()/ipv6_hdr()/... calls?

Cheers, Linus

PS: Currently looking at batman-adv's ndo_start_xmit handler
interface_tx(), whether an skb_set_network_header() should be
added there as it uses ip_hdr()/ipv6_hdr() later
(- batadv_mcast_forw_mode_check_ipv{4,6}()). Testing this in
VMs it seemed to be set correctly by the caller, but wanted to
check whether this can or shouldn't be relied on.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] bridge: fix br_multicast_query_expired() bug

2015-05-28 Thread Linus Lüssing
On Thu, May 28, 2015 at 04:42:54AM -0700, Eric Dumazet wrote:
 Intent of the code was to clear port field, not the pointer to querier.

Acked-by: Linus Lüssing linus.luess...@c0d3.blue
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv3 net-next] bridge: allow setting hash_max + multicast_router if interface is down

2015-05-22 Thread Linus Lüssing
Network managers like netifd (used in OpenWRT for instance) try to
configure interface options after creation but before setting the
interface up.

Unfortunately the sysfs / bridge currently only allows to configure the
hash_max and multicast_router options when the bridge interface is up.
But since br_multicast_init() doesn't start any timers and only sets
default values and initializes timers it should be save to reconfigure
the default values after that, before things actually get active after
the bridge is set up.

Signed-off-by: Linus Lüssing linus.luess...@c0d3.blue
---
Changelog v3:
* Readded two breaks (cosmetic / for future safety reasons)

Changelog v2:
* remove another now unnecessary -ENOENT initialization
* consistently initialize to -EINVAL, allowing to shorten two switch-cases

 net/bridge/br_multicast.c |   24 +++-
 1 file changed, 3 insertions(+), 21 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 2d69d5c..28a87c2 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1772,11 +1772,9 @@ out:
 
 int br_multicast_set_router(struct net_bridge *br, unsigned long val)
 {
-   int err = -ENOENT;
+   int err = -EINVAL;
 
spin_lock_bh(br-multicast_lock);
-   if (!netif_running(br-dev))
-   goto unlock;
 
switch (val) {
case 0:
@@ -1787,13 +1785,8 @@ int br_multicast_set_router(struct net_bridge *br, 
unsigned long val)
br-multicast_router = val;
err = 0;
break;
-
-   default:
-   err = -EINVAL;
-   break;
}
 
-unlock:
spin_unlock_bh(br-multicast_lock);
 
return err;
@@ -1802,11 +1795,9 @@ unlock:
 int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val)
 {
struct net_bridge *br = p-br;
-   int err = -ENOENT;
+   int err = -EINVAL;
 
spin_lock(br-multicast_lock);
-   if (!netif_running(br-dev) || p-state == BR_STATE_DISABLED)
-   goto unlock;
 
switch (val) {
case 0:
@@ -1828,13 +1819,8 @@ int br_multicast_set_port_router(struct net_bridge_port 
*p, unsigned long val)
 
br_multicast_add_router(br, p);
break;
-
-   default:
-   err = -EINVAL;
-   break;
}
 
-unlock:
spin_unlock(br-multicast_lock);
 
return err;
@@ -1939,15 +1925,11 @@ unlock:
 
 int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val)
 {
-   int err = -ENOENT;
+   int err = -EINVAL;
u32 old;
struct net_bridge_mdb_htable *mdb;
 
spin_lock_bh(br-multicast_lock);
-   if (!netif_running(br-dev))
-   goto unlock;
-
-   err = -EINVAL;
if (!is_power_of_2(val))
goto unlock;
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 net-next] bridge: allow setting hash_max + multicast_router if interface is down

2015-05-22 Thread Linus Lüssing
Network managers like netifd (used in OpenWRT for instance) try to
configure interface options after creation but before setting the
interface up.

Unfortunately the sysfs / bridge currently only allows to configure the
hash_max and multicast_router options when the bridge interface is up.
But since br_multicast_init() doesn't start any timers and only sets
default values and initializes timers it should be save to reconfigure
the default values after that, before things actually get active after
the bridge is set up.

Signed-off-by: Linus Lüssing linus.luess...@c0d3.blue
---
Changelog v2:
* remove another now unnecessary -ENOENT initialization
* consistently initialize to -EINVAL, allowing to shorten two switch-cases

 net/bridge/br_multicast.c |   26 +++---
 1 file changed, 3 insertions(+), 23 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 2d69d5c..9955fa4 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1772,11 +1772,9 @@ out:
 
 int br_multicast_set_router(struct net_bridge *br, unsigned long val)
 {
-   int err = -ENOENT;
+   int err = -EINVAL;
 
spin_lock_bh(br-multicast_lock);
-   if (!netif_running(br-dev))
-   goto unlock;
 
switch (val) {
case 0:
@@ -1786,14 +1784,8 @@ int br_multicast_set_router(struct net_bridge *br, 
unsigned long val)
case 1:
br-multicast_router = val;
err = 0;
-   break;
-
-   default:
-   err = -EINVAL;
-   break;
}
 
-unlock:
spin_unlock_bh(br-multicast_lock);
 
return err;
@@ -1802,11 +1794,9 @@ unlock:
 int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val)
 {
struct net_bridge *br = p-br;
-   int err = -ENOENT;
+   int err = -EINVAL;
 
spin_lock(br-multicast_lock);
-   if (!netif_running(br-dev) || p-state == BR_STATE_DISABLED)
-   goto unlock;
 
switch (val) {
case 0:
@@ -1827,14 +1817,8 @@ int br_multicast_set_port_router(struct net_bridge_port 
*p, unsigned long val)
break;
 
br_multicast_add_router(br, p);
-   break;
-
-   default:
-   err = -EINVAL;
-   break;
}
 
-unlock:
spin_unlock(br-multicast_lock);
 
return err;
@@ -1939,15 +1923,11 @@ unlock:
 
 int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val)
 {
-   int err = -ENOENT;
+   int err = -EINVAL;
u32 old;
struct net_bridge_mdb_htable *mdb;
 
spin_lock_bh(br-multicast_lock);
-   if (!netif_running(br-dev))
-   goto unlock;
-
-   err = -EINVAL;
if (!is_power_of_2(val))
goto unlock;
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH net-next] bridge: allow setting hash_max + multicast_router if interface is down

2015-05-21 Thread Linus Lüssing
On Thu, May 21, 2015 at 11:49:21AM +0800, Herbert Xu wrote:
 The timer operations are all supposed to be idempotent.  So enabling
 a port twice or stopping it twice should be OK.

Oki doki.

 
  * Might calls to br_multicast_add_router() via br_multicast_enable_port()
cause unintended side-effects?
 
 What do you mean? How does add_router get called from enable_port?

Sorry, ment br_multicast_add_router() via
br_multicast_set_port_router(). But it's not modifying any timers,
and other modifications are locked by the multicast lock, right.

 See above.  It's there so that you don't readd a timer when we're
 calling del_timer_sync.  del_timer_sync has to be called without
 the multicast lock so that's why we need another mechanism to
 prevent the timers from being readded.

Right, all the touched functions never rearm a timer. The
multicast_router timer may only get readded upon receiving a
multicast query.
(br_multicast_query_received()-br_multicast_mark_router() )
By removing the netif_running check we might only delete a timer
which wasn't running anyway which as you said already is safe.

 
 AFAICS the spots you patched aren't adding timers so they *should*
 be OK.

Okay, thanks for your thorough explanations about the timers and
how the locking is supposed to work. After your explanations I
went over the code a few more times and am fairly confident too
now, that this patch is supposed to work fine.

Going to resend this patch without the RFC tag.

Cheers, Linus
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next] bridge: allow setting hash_max + multicast_router if interface is down

2015-05-21 Thread Linus Lüssing
Network managers like netifd (used in OpenWRT for instance) try to
configure interface options after creation but before setting the
interface up.

Unfortunately the sysfs / bridge currently only allows to configure the
hash_max and multicast_router options when the bridge interface is up.
But since br_multicast_init() doesn't start any timers and only sets
default values and initializes timers it should be save to reconfigure
the default values after that, before things actually get active after
the bridge is set up.

With this patch hash_max and multicast_router attributes can be
changed even if the according bridge (port) is down, just like other
other bridge (port) attributes allow too.

Signed-off-by: Linus Lüssing linus.luess...@c0d3.blue
---
Changelog:
* [RFC PATCH net-next] - [PATCH net-next]


 net/bridge/br_multicast.c |   15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 2d69d5c..066199e 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1775,8 +1775,6 @@ int br_multicast_set_router(struct net_bridge *br, 
unsigned long val)
int err = -ENOENT;
 
spin_lock_bh(br-multicast_lock);
-   if (!netif_running(br-dev))
-   goto unlock;
 
switch (val) {
case 0:
@@ -1793,7 +1791,6 @@ int br_multicast_set_router(struct net_bridge *br, 
unsigned long val)
break;
}
 
-unlock:
spin_unlock_bh(br-multicast_lock);
 
return err;
@@ -1802,18 +1799,15 @@ unlock:
 int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val)
 {
struct net_bridge *br = p-br;
-   int err = -ENOENT;
+   int err = 0;
 
spin_lock(br-multicast_lock);
-   if (!netif_running(br-dev) || p-state == BR_STATE_DISABLED)
-   goto unlock;
 
switch (val) {
case 0:
case 1:
case 2:
p-multicast_router = val;
-   err = 0;
 
if (val  2  !hlist_unhashed(p-rlist))
hlist_del_init_rcu(p-rlist);
@@ -1834,7 +1828,6 @@ int br_multicast_set_port_router(struct net_bridge_port 
*p, unsigned long val)
break;
}
 
-unlock:
spin_unlock(br-multicast_lock);
 
return err;
@@ -1939,15 +1932,11 @@ unlock:
 
 int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val)
 {
-   int err = -ENOENT;
+   int err = -EINVAL;
u32 old;
struct net_bridge_mdb_htable *mdb;
 
spin_lock_bh(br-multicast_lock);
-   if (!netif_running(br-dev))
-   goto unlock;
-
-   err = -EINVAL;
if (!is_power_of_2(val))
goto unlock;
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH net-next] bridge: allow setting hash_max + multicast_router if interface is down

2015-05-20 Thread Linus Lüssing
Network managers like netifd (used in OpenWRT for instance) try to
configure interface options after creation but before setting the
interface up.

Unfortunately the sysfs / bridge currently only allows to configure the
hash_max and multicast_router options when the bridge interface is up.
But since br_multicast_init() doesn't start any timers and only sets
default values and initializes timers it should be save to reconfigure
the default values after that, before things actually get active after
the bridge is set up.

With this patch hash_max and multicast_router attributes can be
changed even if the according bridge (port) is down, just like other
other bridge (port) attributes allow too.

Signed-off-by: Linus Lüssing linus.luess...@c0d3.blue
---

I'm currently a little unsure about a few things (that's why I'm sending
this as an RFC):

* For i=br_multicast_init(), e=br_multicast_enable() and
  s=br_multicast_stop() is the order i-e-s-e-s-e-... always
  ensured by the netdev API? Will this work even if I have
  many fast and wild calls to ip link set up dev br0 and
  ip link set down dev br0 and some changes to hash_max and
  multicast_router in between?

* Might calls to br_multicast_add_router() via br_multicast_enable_port()
  cause unintended side-effects?

* (maybe independent of this patch: )
  Can fast changes to the multicast_router attribute of a bridge (port)
  cause race conditions because a del_timer() instead of a
  del_timer_sync() is used in br_multicast_set_{port,}router()?
  (or: why does br_multicast_stop() use del_timer_sync() while
   br_multicast_disable_port() doesn't?)

* @Herbert: Do you remember whether there was a reason for
  checking netif_running() or the bridge port state which
  I might have overlooked?


PS: Running this patch didn't create any crashes for me so far
and multicast traffic still runs fine. No locust infestation in sight
yet either. Still careful though :P.

 net/bridge/br_multicast.c |   15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 2d69d5c..066199e 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1775,8 +1775,6 @@ int br_multicast_set_router(struct net_bridge *br, 
unsigned long val)
int err = -ENOENT;
 
spin_lock_bh(br-multicast_lock);
-   if (!netif_running(br-dev))
-   goto unlock;
 
switch (val) {
case 0:
@@ -1793,7 +1791,6 @@ int br_multicast_set_router(struct net_bridge *br, 
unsigned long val)
break;
}
 
-unlock:
spin_unlock_bh(br-multicast_lock);
 
return err;
@@ -1802,18 +1799,15 @@ unlock:
 int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val)
 {
struct net_bridge *br = p-br;
-   int err = -ENOENT;
+   int err = 0;
 
spin_lock(br-multicast_lock);
-   if (!netif_running(br-dev) || p-state == BR_STATE_DISABLED)
-   goto unlock;
 
switch (val) {
case 0:
case 1:
case 2:
p-multicast_router = val;
-   err = 0;
 
if (val  2  !hlist_unhashed(p-rlist))
hlist_del_init_rcu(p-rlist);
@@ -1834,7 +1828,6 @@ int br_multicast_set_port_router(struct net_bridge_port 
*p, unsigned long val)
break;
}
 
-unlock:
spin_unlock(br-multicast_lock);
 
return err;
@@ -1939,15 +1932,11 @@ unlock:
 
 int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val)
 {
-   int err = -ENOENT;
+   int err = -EINVAL;
u32 old;
struct net_bridge_mdb_htable *mdb;
 
spin_lock_bh(br-multicast_lock);
-   if (!netif_running(br-dev))
-   goto unlock;
-
-   err = -EINVAL;
if (!is_power_of_2(val))
goto unlock;
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html