from:"Nikolay Aleksandrov"

Re: [PATCH net-next v2 0/3] net: bridge: convert multicast to generic rhashtable

2018-12-04 Thread Nikolay Aleksandrov

On 12/4/18 5:01 PM, Nikolay Aleksandrov wrote:
> Hi,
> The current bridge multicast code uses a custom rhashtable
> implementation which predates the generic rhashtable API. Patch 01
> converts it to use the generic kernel rhashtable which simplifies the
> code a lot and removes duplicated functionality. The convert also makes
> hash_elasticity obsolete as the generic rhashtable already has such
> checks and has a fixed elasticity of RHT_ELASTICITY (16 currently) so we
> emit a warning whenever elasticity is set and return RHT_ELASTICITY when
> read (patch 02). Since now we have the generic rhashtable which
> autoshrinks we can be more liberal with the default hash maximum so patch
> 03 increases it to 4096 and moves it to a define in br_private.h.
> 
> v2: send the latest version of the set which handles when IGMP snooping
> is not defined, changes are in patch 01
> 
> Thanks,
>  Nik

Unfortunately I'll have to send a v3, I missed the fact that rhashtable_lookup
requires RCU to be always held and even though br_mdb_ip_get() does the
lookups under multicast_lock, we can still trigger suspicious RCU usage because 
RCU
is not held in those protected places. Need to add a non-rcu mdb lookup variant.

On a related note I saw Paul's call_rcu patches hit, so I'll wait for those
to go in and will rebase on top of them before sending the v3 as the bridge
change will have a conflict with this set.

Thanks,
 Nik

[PATCH net-next v2 1/3] net: bridge: convert multicast to generic rhashtable

2018-12-04 Thread Nikolay Aleksandrov

The bridge multicast code currently uses a custom resizable hashtable
which predates the generic rhashtable interface. It has many
shortcomings compared and duplicates functionality that is presently
available via the generic rhashtable, so this patch removes the custom
rhashtable implementation in favor of the kernel's generic rhashtable.
The hash maximum is kept and the rhashtable's size is used to do a loose
check if it's reached in which case we revert to the old behaviour and
disable further bridge multicast processing. Also now we can support any
hash maximum, doesn't need to be a power of 2.

v2: handle when IGMP snooping is undefined, add br_mdb_init/uninit
placeholders

Signed-off-by: Nikolay Aleksandrov 
---
 net/bridge/br_device.c|  10 +
 net/bridge/br_mdb.c   | 120 +---
 net/bridge/br_multicast.c | 403 ++
 net/bridge/br_private.h   |  38 ++--
 4 files changed, 146 insertions(+), 425 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index c6abf927f0c9..9f41a5d4da3f 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -131,9 +131,17 @@ static int br_dev_init(struct net_device *dev)
return err;
}
 
+   err = br_mdb_hash_init(br);
+   if (err) {
+   free_percpu(br->stats);
+   br_fdb_hash_fini(br);
+   return err;
+   }
+
err = br_vlan_init(br);
if (err) {
free_percpu(br->stats);
+   br_mdb_hash_fini(br);
br_fdb_hash_fini(br);
return err;
}
@@ -142,6 +150,7 @@ static int br_dev_init(struct net_device *dev)
if (err) {
free_percpu(br->stats);
br_vlan_flush(br);
+   br_mdb_hash_fini(br);
br_fdb_hash_fini(br);
}
br_set_lockdep_class(dev);
@@ -156,6 +165,7 @@ static void br_dev_uninit(struct net_device *dev)
br_multicast_dev_del(br);
br_multicast_uninit_stats(br);
br_vlan_flush(br);
+   br_mdb_hash_fini(br);
br_fdb_hash_fini(br);
free_percpu(br->stats);
 }
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index a7ea2d431714..ea8abdb56df3 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -78,82 +78,72 @@ static void __mdb_entry_to_br_ip(struct br_mdb_entry 
*entry, struct br_ip *ip)
 static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
struct net_device *dev)
 {
+   int idx = 0, s_idx = cb->args[1], err = 0;
struct net_bridge *br = netdev_priv(dev);
-   struct net_bridge_mdb_htable *mdb;
+   struct net_bridge_mdb_entry *mp;
struct nlattr *nest, *nest2;
-   int i, err = 0;
-   int idx = 0, s_idx = cb->args[1];
 
if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))
return 0;
 
-   mdb = rcu_dereference(br->mdb);
-   if (!mdb)
-   return 0;
-
nest = nla_nest_start(skb, MDBA_MDB);
if (nest == NULL)
return -EMSGSIZE;
 
-   for (i = 0; i < mdb->max; i++) {
-   struct net_bridge_mdb_entry *mp;
+   hlist_for_each_entry_rcu(mp, >mdb_list, mdb_node) {
struct net_bridge_port_group *p;
struct net_bridge_port_group __rcu **pp;
struct net_bridge_port *port;
 
-   hlist_for_each_entry_rcu(mp, >mhash[i], hlist[mdb->ver]) {
-   if (idx < s_idx)
-   goto skip;
+   if (idx < s_idx)
+   goto skip;
 
-   nest2 = nla_nest_start(skb, MDBA_MDB_ENTRY);
-   if (nest2 == NULL) {
-   err = -EMSGSIZE;
-   goto out;
-   }
+   nest2 = nla_nest_start(skb, MDBA_MDB_ENTRY);
+   if (!nest2) {
+   err = -EMSGSIZE;
+   break;
+   }
 
-   for (pp = >ports;
-(p = rcu_dereference(*pp)) != NULL;
- pp = >next) {
-   struct nlattr *nest_ent;
-   struct br_mdb_entry e;
-
-   port = p->port;
-   if (!port)
-   continue;
-
-   memset(, 0, sizeof(e));
-   e.ifindex = port->dev->ifindex;
-   e.vid = p->addr.vid;
-   __mdb_entry_fill_flags(, p->flags);
-   if (p->addr.proto == htons(ETH_P_IP))
-   e.addr.u.ip4 = p->addr.u.ip4;
+   for (pp = >ports; (p = rcu_dereference(*pp)) != NULL;
+

[PATCH net-next v2 3/3] net: bridge: increase multicast's default maximum number of entries

2018-12-04 Thread Nikolay Aleksandrov

bridge's default hash_max was 512 which is rather conservative, now that
we're using the generic rhashtable API which autoshrinks let's increase
it to 4096 and move it to a define in br_private.h.

Signed-off-by: Nikolay Aleksandrov 
---
v2: no change

 net/bridge/br_multicast.c | 2 +-
 net/bridge/br_private.h   | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index d71edc9fa5c3..031067283678 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1743,7 +1743,7 @@ static void br_ip6_multicast_query_expired(struct 
timer_list *t)
 
 void br_multicast_init(struct net_bridge *br)
 {
-   br->hash_max = 512;
+   br->hash_max = BR_MULTICAST_DEFAULT_HASH_MAX;
 
br->multicast_router = MDB_RTR_TYPE_TEMP_QUERY;
br->multicast_last_member_count = 2;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index df7584068d25..20a703a679e1 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -31,6 +31,8 @@
 #define BR_PORT_BITS   10
 #define BR_MAX_PORTS   (1<

[PATCH net-next v2 2/3] net: bridge: mark hash_elasticity as obsolete

2018-12-04 Thread Nikolay Aleksandrov

Now that the bridge multicast uses the generic rhashtable interface we
can drop the hash_elasticity option as that is already done for us and
it's hardcoded to a maximum of RHT_ELASTICITY (16 currently). Add a
warning about the obsolete option when the hash_elasticity is set.

Signed-off-by: Nikolay Aleksandrov 
---
v2: no change

 net/bridge/br_multicast.c |  1 -
 net/bridge/br_netlink.c   | 11 ---
 net/bridge/br_private.h   |  1 -
 net/bridge/br_sysfs_br.c  |  6 +++---
 4 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index a3ac47dd0e5f..d71edc9fa5c3 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1743,7 +1743,6 @@ static void br_ip6_multicast_query_expired(struct 
timer_list *t)
 
 void br_multicast_init(struct net_bridge *br)
 {
-   br->hash_elasticity = 4;
br->hash_max = 512;
 
br->multicast_router = MDB_RTR_TYPE_TEMP_QUERY;
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 13cd50326af2..d3a5963de35d 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1189,11 +1189,9 @@ static int br_changelink(struct net_device *brdev, 
struct nlattr *tb[],
return err;
}
 
-   if (data[IFLA_BR_MCAST_HASH_ELASTICITY]) {
-   u32 val = nla_get_u32(data[IFLA_BR_MCAST_HASH_ELASTICITY]);
-
-   br->hash_elasticity = val;
-   }
+   if (data[IFLA_BR_MCAST_HASH_ELASTICITY])
+   br_warn(br, "the hash_elasticity option has been deprecated and 
is always %u\n",
+   RHT_ELASTICITY);
 
if (data[IFLA_BR_MCAST_HASH_MAX]) {
u32 hash_max = nla_get_u32(data[IFLA_BR_MCAST_HASH_MAX]);
@@ -1457,8 +1455,7 @@ static int br_fill_info(struct sk_buff *skb, const struct 
net_device *brdev)
   br_opt_get(br, BROPT_MULTICAST_QUERIER)) ||
nla_put_u8(skb, IFLA_BR_MCAST_STATS_ENABLED,
   br_opt_get(br, BROPT_MULTICAST_STATS_ENABLED)) ||
-   nla_put_u32(skb, IFLA_BR_MCAST_HASH_ELASTICITY,
-   br->hash_elasticity) ||
+   nla_put_u32(skb, IFLA_BR_MCAST_HASH_ELASTICITY, RHT_ELASTICITY) ||
nla_put_u32(skb, IFLA_BR_MCAST_HASH_MAX, br->hash_max) ||
nla_put_u32(skb, IFLA_BR_MCAST_LAST_MEMBER_CNT,
br->multicast_last_member_count) ||
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index a0ce0822921c..df7584068d25 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -372,7 +372,6 @@ struct net_bridge {
 
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 
-   u32 hash_elasticity;
u32 hash_max;
 
u32 multicast_last_member_count;
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index 6a378a7e16ea..05a910d3b70a 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -424,13 +424,13 @@ static DEVICE_ATTR_RW(multicast_querier);
 static ssize_t hash_elasticity_show(struct device *d,
struct device_attribute *attr, char *buf)
 {
-   struct net_bridge *br = to_bridge(d);
-   return sprintf(buf, "%u\n", br->hash_elasticity);
+   return sprintf(buf, "%u\n", RHT_ELASTICITY);
 }
 
 static int set_elasticity(struct net_bridge *br, unsigned long val)
 {
-   br->hash_elasticity = val;
+   br_warn(br, "the hash_elasticity option has been deprecated and is 
always %u\n",
+   RHT_ELASTICITY);
return 0;
 }
 
-- 
2.17.2

[PATCH net-next v2 0/3] net: bridge: convert multicast to generic rhashtable

2018-12-04 Thread Nikolay Aleksandrov

Hi,
The current bridge multicast code uses a custom rhashtable
implementation which predates the generic rhashtable API. Patch 01
converts it to use the generic kernel rhashtable which simplifies the
code a lot and removes duplicated functionality. The convert also makes
hash_elasticity obsolete as the generic rhashtable already has such
checks and has a fixed elasticity of RHT_ELASTICITY (16 currently) so we
emit a warning whenever elasticity is set and return RHT_ELASTICITY when
read (patch 02). Since now we have the generic rhashtable which
autoshrinks we can be more liberal with the default hash maximum so patch
03 increases it to 4096 and moves it to a define in br_private.h.

v2: send the latest version of the set which handles when IGMP snooping
is not defined, changes are in patch 01

Thanks,
 Nik


Nikolay Aleksandrov (3):
  net: bridge: convert multicast to generic rhashtable
  net: bridge: mark hash_elasticity as obsolete
  net: bridge: increase multicast's default maximum number of entries

 net/bridge/br_device.c|  10 +
 net/bridge/br_mdb.c   | 120 +--
 net/bridge/br_multicast.c | 406 +++---
 net/bridge/br_netlink.c   |  11 +-
 net/bridge/br_private.h   |  41 ++--
 net/bridge/br_sysfs_br.c  |   6 +-
 6 files changed, 156 insertions(+), 438 deletions(-)

-- 
2.17.2

Re: [PATCH net-next 0/3] net: bridge: convert multicast to generic rhashtable

2018-12-04 Thread Nikolay Aleksandrov

On 04/12/2018 16:44, Nikolay Aleksandrov wrote:
> Hi,
> The current bridge multicast code uses a custom rhashtable
> implementation which predates the generic rhashtable API. Patch 01
> converts it to use the generic kernel rhashtable which simplifies the
> code a lot and removes duplicated functionality. The convert also makes
> hash_elasticity obsolete as the generic rhashtable already has such
> checks and has a fixed elasticity of RHT_ELASTICITY (16 currently) so we
> emit a warning whenever elasticity is set and return RHT_ELASTICITY when
> read (patch 02). Since now we have the generic rhashtable which
> autoshrinks we can be more liberal with the default hash maximum so patch
> 03 increases it to 4096 and moves it to a define in br_private.h.
> 
> Thanks,
>  Nik
> 
> 
> Nikolay Aleksandrov (3):
>   net: bridge: convert multicast to generic rhashtable
>   net: bridge: mark hash_elasticity as obsolete
>   net: bridge: increase multicast's default maximum number of entries
> 
>  net/bridge/br_device.c|  11 ++
>  net/bridge/br_mdb.c   | 120 +--
>  net/bridge/br_multicast.c | 405 ++
>  net/bridge/br_netlink.c   |  11 +-
>  net/bridge/br_private.h   |  36 ++--
>  net/bridge/br_sysfs_br.c  |   6 +-
>  6 files changed, 149 insertions(+), 440 deletions(-)
> 

Self-NAK, sent an older version of the patch-set which has a problem
when IGMP_SNOOPING is not defined. Will send the current in a few.

[PATCH net-next 2/3] net: bridge: mark hash_elasticity as obsolete

2018-12-04 Thread Nikolay Aleksandrov

Now that the bridge multicast uses the generic rhashtable interface we
can drop the hash_elasticity option as that is already done for us and
it's hardcoded to a maximum of RHT_ELASTICITY (16 currently). Add a
warning about the obsolete option when the hash_elasticity is set.

Signed-off-by: Nikolay Aleksandrov 
---
 net/bridge/br_multicast.c |  1 -
 net/bridge/br_netlink.c   | 11 ---
 net/bridge/br_private.h   |  1 -
 net/bridge/br_sysfs_br.c  |  6 +++---
 4 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index fdca91231815..79f981a4c225 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1743,7 +1743,6 @@ static void br_ip6_multicast_query_expired(struct 
timer_list *t)
 
 void br_multicast_init(struct net_bridge *br)
 {
-   br->hash_elasticity = 4;
br->hash_max = 512;
 
br->multicast_router = MDB_RTR_TYPE_TEMP_QUERY;
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 13cd50326af2..d3a5963de35d 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1189,11 +1189,9 @@ static int br_changelink(struct net_device *brdev, 
struct nlattr *tb[],
return err;
}
 
-   if (data[IFLA_BR_MCAST_HASH_ELASTICITY]) {
-   u32 val = nla_get_u32(data[IFLA_BR_MCAST_HASH_ELASTICITY]);
-
-   br->hash_elasticity = val;
-   }
+   if (data[IFLA_BR_MCAST_HASH_ELASTICITY])
+   br_warn(br, "the hash_elasticity option has been deprecated and 
is always %u\n",
+   RHT_ELASTICITY);
 
if (data[IFLA_BR_MCAST_HASH_MAX]) {
u32 hash_max = nla_get_u32(data[IFLA_BR_MCAST_HASH_MAX]);
@@ -1457,8 +1455,7 @@ static int br_fill_info(struct sk_buff *skb, const struct 
net_device *brdev)
   br_opt_get(br, BROPT_MULTICAST_QUERIER)) ||
nla_put_u8(skb, IFLA_BR_MCAST_STATS_ENABLED,
   br_opt_get(br, BROPT_MULTICAST_STATS_ENABLED)) ||
-   nla_put_u32(skb, IFLA_BR_MCAST_HASH_ELASTICITY,
-   br->hash_elasticity) ||
+   nla_put_u32(skb, IFLA_BR_MCAST_HASH_ELASTICITY, RHT_ELASTICITY) ||
nla_put_u32(skb, IFLA_BR_MCAST_HASH_MAX, br->hash_max) ||
nla_put_u32(skb, IFLA_BR_MCAST_LAST_MEMBER_CNT,
br->multicast_last_member_count) ||
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index ff443aea279f..e3c56b418543 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -372,7 +372,6 @@ struct net_bridge {
 
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 
-   u32 hash_elasticity;
u32 hash_max;
 
u32 multicast_last_member_count;
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index 6a378a7e16ea..05a910d3b70a 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -424,13 +424,13 @@ static DEVICE_ATTR_RW(multicast_querier);
 static ssize_t hash_elasticity_show(struct device *d,
struct device_attribute *attr, char *buf)
 {
-   struct net_bridge *br = to_bridge(d);
-   return sprintf(buf, "%u\n", br->hash_elasticity);
+   return sprintf(buf, "%u\n", RHT_ELASTICITY);
 }
 
 static int set_elasticity(struct net_bridge *br, unsigned long val)
 {
-   br->hash_elasticity = val;
+   br_warn(br, "the hash_elasticity option has been deprecated and is 
always %u\n",
+   RHT_ELASTICITY);
return 0;
 }
 
-- 
2.17.2

[PATCH net-next 3/3] net: bridge: increase multicast's default maximum number of entries

2018-12-04 Thread Nikolay Aleksandrov

bridge's default hash_max was 512 which is rather conservative, now that
we're using the generic rhashtable API which autoshrinks let's increase
it to 4096 and move it to a define in br_private.h.

Signed-off-by: Nikolay Aleksandrov 
---
 net/bridge/br_multicast.c | 2 +-
 net/bridge/br_private.h   | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 79f981a4c225..5871086a0ab1 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1743,7 +1743,7 @@ static void br_ip6_multicast_query_expired(struct 
timer_list *t)
 
 void br_multicast_init(struct net_bridge *br)
 {
-   br->hash_max = 512;
+   br->hash_max = BR_MULTICAST_DEFAULT_HASH_MAX;
 
br->multicast_router = MDB_RTR_TYPE_TEMP_QUERY;
br->multicast_last_member_count = 2;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index e3c56b418543..a469a779478e 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -31,6 +31,8 @@
 #define BR_PORT_BITS   10
 #define BR_MAX_PORTS   (1<

[PATCH net-next 0/3] net: bridge: convert multicast to generic rhashtable

2018-12-04 Thread Nikolay Aleksandrov

Hi,
The current bridge multicast code uses a custom rhashtable
implementation which predates the generic rhashtable API. Patch 01
converts it to use the generic kernel rhashtable which simplifies the
code a lot and removes duplicated functionality. The convert also makes
hash_elasticity obsolete as the generic rhashtable already has such
checks and has a fixed elasticity of RHT_ELASTICITY (16 currently) so we
emit a warning whenever elasticity is set and return RHT_ELASTICITY when
read (patch 02). Since now we have the generic rhashtable which
autoshrinks we can be more liberal with the default hash maximum so patch
03 increases it to 4096 and moves it to a define in br_private.h.

Thanks,
 Nik


Nikolay Aleksandrov (3):
  net: bridge: convert multicast to generic rhashtable
  net: bridge: mark hash_elasticity as obsolete
  net: bridge: increase multicast's default maximum number of entries

 net/bridge/br_device.c|  11 ++
 net/bridge/br_mdb.c   | 120 +--
 net/bridge/br_multicast.c | 405 ++
 net/bridge/br_netlink.c   |  11 +-
 net/bridge/br_private.h   |  36 ++--
 net/bridge/br_sysfs_br.c  |   6 +-
 6 files changed, 149 insertions(+), 440 deletions(-)

-- 
2.17.2

[PATCH net-next 1/3] net: bridge: convert multicast to generic rhashtable

2018-12-04 Thread Nikolay Aleksandrov

The bridge multicast code currently uses a custom resizable hashtable
which predates the generic rhashtable interface. It has many
shortcomings compared and duplicates functionality that is presently
available via the generic rhashtable, so this patch removes the custom
rhashtable implementation in favor of the kernel's generic rhashtable.
The hash maximum is kept and the rhashtable's size is used to do a loose
check if it's reached in which case we revert to the old behaviour and
disable further bridge multicast processing. Also now we can support any
hash maximum, doesn't need to be a power of 2.

Signed-off-by: Nikolay Aleksandrov 
---
 net/bridge/br_device.c|  11 ++
 net/bridge/br_mdb.c   | 120 +---
 net/bridge/br_multicast.c | 402 ++
 net/bridge/br_private.h   |  33 ++--
 4 files changed, 139 insertions(+), 427 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index c6abf927f0c9..1cb09aaaf193 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -131,9 +131,17 @@ static int br_dev_init(struct net_device *dev)
return err;
}
 
+   err = br_mdb_hash_init(br);
+   if (err) {
+   free_percpu(br->stats);
+   br_fdb_hash_fini(br);
+   return err;
+   }
+
err = br_vlan_init(br);
if (err) {
free_percpu(br->stats);
+   br_mdb_hash_fini(br);
br_fdb_hash_fini(br);
return err;
}
@@ -142,6 +150,7 @@ static int br_dev_init(struct net_device *dev)
if (err) {
free_percpu(br->stats);
br_vlan_flush(br);
+   br_mdb_hash_fini(br);
br_fdb_hash_fini(br);
}
br_set_lockdep_class(dev);
@@ -156,6 +165,7 @@ static void br_dev_uninit(struct net_device *dev)
br_multicast_dev_del(br);
br_multicast_uninit_stats(br);
br_vlan_flush(br);
+   br_mdb_hash_fini(br);
br_fdb_hash_fini(br);
free_percpu(br->stats);
 }
@@ -426,6 +436,7 @@ void br_dev_setup(struct net_device *dev)
spin_lock_init(>lock);
INIT_LIST_HEAD(>port_list);
INIT_HLIST_HEAD(>fdb_list);
+   INIT_HLIST_HEAD(>mdb_list);
spin_lock_init(>hash_lock);
 
br->bridge_id.prio[0] = 0x80;
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index a7ea2d431714..ea8abdb56df3 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -78,82 +78,72 @@ static void __mdb_entry_to_br_ip(struct br_mdb_entry 
*entry, struct br_ip *ip)
 static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
struct net_device *dev)
 {
+   int idx = 0, s_idx = cb->args[1], err = 0;
struct net_bridge *br = netdev_priv(dev);
-   struct net_bridge_mdb_htable *mdb;
+   struct net_bridge_mdb_entry *mp;
struct nlattr *nest, *nest2;
-   int i, err = 0;
-   int idx = 0, s_idx = cb->args[1];
 
if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))
return 0;
 
-   mdb = rcu_dereference(br->mdb);
-   if (!mdb)
-   return 0;
-
nest = nla_nest_start(skb, MDBA_MDB);
if (nest == NULL)
return -EMSGSIZE;
 
-   for (i = 0; i < mdb->max; i++) {
-   struct net_bridge_mdb_entry *mp;
+   hlist_for_each_entry_rcu(mp, >mdb_list, mdb_node) {
struct net_bridge_port_group *p;
struct net_bridge_port_group __rcu **pp;
struct net_bridge_port *port;
 
-   hlist_for_each_entry_rcu(mp, >mhash[i], hlist[mdb->ver]) {
-   if (idx < s_idx)
-   goto skip;
+   if (idx < s_idx)
+   goto skip;
 
-   nest2 = nla_nest_start(skb, MDBA_MDB_ENTRY);
-   if (nest2 == NULL) {
-   err = -EMSGSIZE;
-   goto out;
-   }
+   nest2 = nla_nest_start(skb, MDBA_MDB_ENTRY);
+   if (!nest2) {
+   err = -EMSGSIZE;
+   break;
+   }
 
-   for (pp = >ports;
-(p = rcu_dereference(*pp)) != NULL;
- pp = >next) {
-   struct nlattr *nest_ent;
-   struct br_mdb_entry e;
-
-   port = p->port;
-   if (!port)
-   continue;
-
-   memset(, 0, sizeof(e));
-   e.ifindex = port->dev->ifindex;
-   e.vid = p->addr.vid;
-   __mdb_entry_fill_flags(,

Re: [net:master 17/19] net//bridge/br_multicast.c:1432:32: error: 'union ' has no member named 'ip6'; did you mean 'ip4'?

2018-10-27 Thread Nikolay Aleksandrov

On 27/10/2018 03:50, kbuild test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
> head:   aab456dfa404f3a16d6f1780e62a6a8533c4d008
> commit: 5a2de63fd1a59c30c02526d427bc014b98adf508 [17/19] bridge: do not add 
> port to router list when receives query with source 0.0.0.0
> config: powerpc-defconfig (attached as .config)
> compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
> reproduce:
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> git checkout 5a2de63fd1a59c30c02526d427bc014b98adf508
> # save the attached .config to linux build tree
> GCC_VERSION=7.2.0 make.cross ARCH=powerpc 
> 
> All errors (new ones prefixed by >>):
> 
>net//bridge/br_multicast.c: In function 'br_multicast_query_received':
>>> net//bridge/br_multicast.c:1432:32: error: 'union ' has no 
>>> member named 'ip6'; did you mean 'ip4'?
>   !ipv6_addr_any(>u.ip6)))
>^~~
>ip4
> 
> vim +1432 net//bridge/br_multicast.c
> 
>   1414
>   1415static void br_multicast_query_received(struct net_bridge *br,
>   1416struct net_bridge_port 
> *port,
>   1417struct 
> bridge_mcast_other_query *query,
>   1418struct br_ip *saddr,
>   1419unsigned long max_delay)
>   1420{
>   1421if (!br_multicast_select_querier(br, port, saddr))
>   1422return;
>   1423
>   1424br_multicast_update_query_timer(br, query, max_delay);
>   1425
>   1426/* Based on RFC4541, section 2.1.1 IGMP Forwarding 
> Rules,
>   1427 * the arrival port for IGMP Queries where the source 
> address
>   1428 * is 0.0.0.0 should not be added to router port list.
>   1429 */
>   1430if ((saddr->proto == htons(ETH_P_IP) && saddr->u.ip4) ||
>   1431(saddr->proto == htons(ETH_P_IPV6) &&
>> 1432  !ipv6_addr_any(>u.ip6)))
>   1433br_multicast_mark_router(br, port);
>   1434}
>   1435
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
> 

Should've seen this one coming when reviewing the patch, ip6 is defined
only when IPv6 is configured.
I'll send a fix in a minute after running a few tests.

Thanks.

Re: [PATCH net] net: ipmr: fix unresolved entry dumps

2018-10-17 Thread Nikolay Aleksandrov

On 17/10/2018 22:34, Nikolay Aleksandrov wrote:
> If the skb space ends in an unresolved entry while dumping we'll miss
> some unresolved entries. The reason is due to zeroing the entry counter
> between dumping resolved and unresolved mfc entries. We should just
> keep counting until the whole table is dumped and zero when we move to
> the next as we have a separate table counter.
> 
> Reported-by: Colin Ian King 
> Fixes: 8fb472c09b9d ("ipmr: improve hash scalability")
> Signed-off-by: Nikolay Aleksandrov 
> ---
> Dropped Yuval's mail because it bounces.
> 
>  net/ipv4/ipmr_base.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/net/ipv4/ipmr_base.c b/net/ipv4/ipmr_base.c
> index 1ad9aa62a97b..eab8cd5ec2f5 100644
> --- a/net/ipv4/ipmr_base.c
> +++ b/net/ipv4/ipmr_base.c
> @@ -296,8 +296,6 @@ int mr_rtm_dumproute(struct sk_buff *skb, struct 
> netlink_callback *cb,
>  next_entry:
>   e++;
>   }
> - e = 0;
> - s_e = 0;
>  
>   spin_lock_bh(lock);
>   list_for_each_entry(mfc, >mfc_unres_queue, list) {
> 

+CC Colin
Sorry about that, my script somehow missed the reported-by email.

[PATCH net] net: ipmr: fix unresolved entry dumps

2018-10-17 Thread Nikolay Aleksandrov

If the skb space ends in an unresolved entry while dumping we'll miss
some unresolved entries. The reason is due to zeroing the entry counter
between dumping resolved and unresolved mfc entries. We should just
keep counting until the whole table is dumped and zero when we move to
the next as we have a separate table counter.

Reported-by: Colin Ian King 
Fixes: 8fb472c09b9d ("ipmr: improve hash scalability")
Signed-off-by: Nikolay Aleksandrov 
---
Dropped Yuval's mail because it bounces.

 net/ipv4/ipmr_base.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/ipv4/ipmr_base.c b/net/ipv4/ipmr_base.c
index 1ad9aa62a97b..eab8cd5ec2f5 100644
--- a/net/ipv4/ipmr_base.c
+++ b/net/ipv4/ipmr_base.c
@@ -296,8 +296,6 @@ int mr_rtm_dumproute(struct sk_buff *skb, struct 
netlink_callback *cb,
 next_entry:
e++;
}
-   e = 0;
-   s_e = 0;
 
spin_lock_bh(lock);
list_for_each_entry(mfc, >mfc_unres_queue, list) {
-- 
2.17.2

Re: ipmr, ip6mr: Unite dumproute flows

2018-10-17 Thread Nikolay Aleksandrov

On 17/10/2018 21:22, Colin Ian King wrote:
> Hi,
> 
> Static analysis on linux-next has picked up a potential bug with commit:
> 
> commit 7b0db85737db3f4d76b2a412e4f19eae59b8b494
> Author: Yuval Mintz 
> Date:   Wed Feb 28 23:29:39 2018 +0200
> 
> ipmr, ip6mr: Unite dumproute flows
> 
> in function mr_table_dump(), s_e is and never seems to change, so the
> check if (e < s_e) is never true:  See code below, as annotated by
> CoverityScan:
> 
> 317}
>assignment: Assigning: e = 0U.
> 
> 318e = 0;
>assignment: Assigning: s_e = 0U.
> 
> 319s_e = 0;
> 320
> 321spin_lock_bh(lock);
> 322list_for_each_entry(mfc, >mfc_unres_queue, list) {
>at_least: At condition e < s_e, the value of e must be at least 0.
>const: At condition e < s_e, the value of s_e must be equal to 0.
>dead_error_condition: The condition e < s_e cannot be true.
> 
> 323if (e < s_e)
> 
>CID 1474511 (#1 of 1): Logically dead code (DEADCODE)
> dead_error_line: Execution cannot reach this statement: goto next_entry2;.
> 
> 324goto next_entry2;
> 325if (filter->dev &&
> 326!mr_mfc_uses_dev(mrt, mfc, filter->dev))
> 327goto next_entry2;
> 328
> 329err = fill(mrt, skb, NETLINK_CB(cb->skb).portid,
> 330   cb->nlh->nlmsg_seq, mfc, RTM_NEWROUTE, flags);
> 331if (err < 0) {
> 332spin_unlock_bh(lock);
> 333goto out;
> 334}
> 335next_entry2:
> 
>incr: Incrementing e. The value of e is now 1.
>incr: Incrementing e. The value of e is now 2.
> 
> 336    e++;
> 
> 
> Note that the zero'ing of e and s_e for ipmr and ip6mr was added in by
> earlier commits:
> 
> commit 8fb472c09b9df478a062eacc7841448e40fc3c17
> Author: Nikolay Aleksandrov 
> Date:   Thu Jan 12 15:53:33 2017 +0100
> 
> ipmr: improve hash scalability
> 
> commit 87c418bf1323d57140f4b448715f64de3fbb7e91
> Author: Yuval Mintz 
> Date:   Wed Feb 28 23:29:31 2018 +0200
> 
> ip6mr: Align hash implementation to ipmr
> 
> Anyhow, this looks incorrect, but I'm not familiar with this code to
> suggest the correct fix.
> 
> Colin
> 

Indeed, there's a case where we might miss an unresolved entry due to the 
zeroing.
I'll send a fix shortly after testing.

Thanks,
 Nik

Re: [PATCH iproute2 net-next] bridge: add support for backup port

2018-10-12 Thread Nikolay Aleksandrov

On October 12, 2018 6:40:28 PM GMT+03:00, Stephen Hemminger 
 wrote:
>On Fri, 12 Oct 2018 14:42:55 +0300
>Nikolay Aleksandrov  wrote:
>
>> This patch adds support for the new backup port option that can be
>set
>> on a bridge port. If the port's carrier goes down all of the traffic
>> gets redirected to the configured backup port. We add the following
>new
>> arguments:
>> $ ip link set dev brport type bridge_slave backup_port brport2
>> $ ip link set dev brport type bridge_slave nobackup_port
>> 
>> $ bridge link set dev brport backup_port brport2
>> $ bridge link set dev brport nobackup_port
>> 
>> The man pages are updated respectively.
>> Also 2 minor style adjustments:
>> - add missing space to bridge man page's state argument
>> - use lower starting case for vlan_tunnel in ip-link man page (to be
>> consistent with the rest)
>> 
>> Signed-off-by: Nikolay Aleksandrov 
>
>This seems a bit like feature creep in the bridge.
>Why not use team or bonding?

Not really, the bond/team cannot work in such way. We did discuss it when the 
kernel patches were sent, for more information please check
https://www.spinics.net/lists/netdev/msg514642.html

Thanks,
  Nik

[PATCH iproute2 net-next] bridge: add support for backup port

2018-10-12 Thread Nikolay Aleksandrov

This patch adds support for the new backup port option that can be set
on a bridge port. If the port's carrier goes down all of the traffic
gets redirected to the configured backup port. We add the following new
arguments:
$ ip link set dev brport type bridge_slave backup_port brport2
$ ip link set dev brport type bridge_slave nobackup_port

$ bridge link set dev brport backup_port brport2
$ bridge link set dev brport nobackup_port

The man pages are updated respectively.
Also 2 minor style adjustments:
- add missing space to bridge man page's state argument
- use lower starting case for vlan_tunnel in ip-link man page (to be
consistent with the rest)

Signed-off-by: Nikolay Aleksandrov 
---
 bridge/link.c| 26 ++
 ip/iplink_bridge_slave.c | 18 ++
 man/man8/bridge.8| 13 -
 man/man8/ip-link.8.in| 14 --
 4 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/bridge/link.c b/bridge/link.c
index 09df489b26ab..4a14845da591 100644
--- a/bridge/link.c
+++ b/bridge/link.c
@@ -152,6 +152,16 @@ static void print_protinfo(FILE *fp, struct rtattr *attr)
if (prtb[IFLA_BRPORT_VLAN_TUNNEL])
print_onoff(fp, "vlan_tunnel",

rta_getattr_u8(prtb[IFLA_BRPORT_VLAN_TUNNEL]));
+
+   if (prtb[IFLA_BRPORT_BACKUP_PORT]) {
+   int ifidx;
+
+   ifidx = rta_getattr_u32(prtb[IFLA_BRPORT_BACKUP_PORT]);
+   print_string(PRINT_ANY,
+"backup_port", "backup_port %s ",
+ll_index_to_name(ifidx));
+   }
+
if (prtb[IFLA_BRPORT_ISOLATED])
print_onoff(fp, "isolated",
rta_getattr_u8(prtb[IFLA_BRPORT_ISOLATED]));
@@ -255,6 +265,7 @@ static void usage(void)
fprintf(stderr, "   [ vlan_tunnel {on | 
off} ]\n");
fprintf(stderr, "   [ isolated {on | off} 
]\n");
fprintf(stderr, "   [ hwmode {vepa | veb} 
]\n");
+   fprintf(stderr, "   [ backup_port DEVICE ] 
[ nobackup_port ]\n");
fprintf(stderr, "   [ self ] [ master ]\n");
fprintf(stderr, "   bridge link show [dev DEV]\n");
exit(-1);
@@ -289,6 +300,7 @@ static int brlink_modify(int argc, char **argv)
.ifm.ifi_family = PF_BRIDGE,
};
char *d = NULL;
+   int backup_port_idx = -1;
__s8 neigh_suppress = -1;
__s8 learning = -1;
__s8 learning_sync = -1;
@@ -395,6 +407,16 @@ static int brlink_modify(int argc, char **argv)
NEXT_ARG();
if (!on_off("isolated", , *argv))
return -1;
+   } else if (strcmp(*argv, "backup_port") == 0) {
+   NEXT_ARG();
+   backup_port_idx = ll_name_to_index(*argv);
+   if (!backup_port_idx) {
+   fprintf(stderr, "Error: device %s does not 
exist\n",
+   *argv);
+   return -1;
+   }
+   } else if (strcmp(*argv, "nobackup_port") == 0) {
+   backup_port_idx = 0;
} else {
usage();
}
@@ -456,6 +478,10 @@ static int brlink_modify(int argc, char **argv)
if (isolated != -1)
addattr8(, sizeof(req), IFLA_BRPORT_ISOLATED, isolated);
 
+   if (backup_port_idx != -1)
+   addattr32(, sizeof(req), IFLA_BRPORT_BACKUP_PORT,
+ backup_port_idx);
+
addattr_nest_end(, nest);
 
/* IFLA_AF_SPEC nested attribute. Contains IFLA_BRIDGE_FLAGS that
diff --git a/ip/iplink_bridge_slave.c b/ip/iplink_bridge_slave.c
index 5a6e48559781..8b4f93f265be 100644
--- a/ip/iplink_bridge_slave.c
+++ b/ip/iplink_bridge_slave.c
@@ -41,6 +41,7 @@ static void print_explain(FILE *f)
"[ neigh_suppress {on | off} ]\n"
"[ vlan_tunnel {on | off} ]\n"
"[ isolated {on | off} ]\n"
+   "[ backup_port DEVICE ] [ nobackup_port 
]\n"
);
 }
 
@@ -279,6 +280,13 @@ static void bridge_slave_print_opt(struct link_util *lu, 
FILE *f,
if (tb[IFLA_BRPORT_ISOLATED])
_print_onoff(f, "isolated", "isolated",
 rta_getattr_u8(tb[IFLA_BRPORT_ISOLATE

Re: [PATCH] MAINTAINERS: change bridge maintainers

2018-09-27 Thread Nikolay Aleksandrov

On 27/09/18 11:47, Stephen Hemminger wrote:
> I haven't been doing reviews only but not active development on bridge
> code for several years. Roopa and Nikolay have been doing most of
> the new features and have agreed to take over as new co-maintainers.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  MAINTAINERS | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7233a9ed0f5b..123ae2e65d94 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5470,7 +5470,8 @@ S:  Odd Fixes
>  F:   drivers/net/ethernet/agere/
>  
>  ETHERNET BRIDGE
> -M:   Stephen Hemminger 
> +M:   Roopa Prabhu 
> +M:   Nikolay Aleksandrov 
>  L:   bri...@lists.linux-foundation.org (moderated for non-subscribers)
>  L:   netdev@vger.kernel.org
>  W:   http://www.linuxfoundation.org/en/Net:Bridge
> 

Thank you,
Acked-by: Nikolay Aleksandrov

[PATCH iproute2 net-next] bridge: fdb: add support for sticky flag

2018-09-27 Thread Nikolay Aleksandrov

Add support for the new sticky flag that can be set on fdbs and update the
man page.

CC: David Ahern 
Signed-off-by: Nikolay Aleksandrov 
---
 bridge/fdb.c  | 9 +++--
 man/man8/bridge.8 | 6 +-
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/bridge/fdb.c b/bridge/fdb.c
index 4dbc894ceab9..828fdab264cb 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -37,8 +37,8 @@ static void usage(void)
fprintf(stderr,
"Usage: bridge fdb { add | append | del | replace } ADDR dev 
DEV\n"
"  [ self ] [ master ] [ use ] [ router ] [ 
extern_learn ]\n"
-   "  [ local | static | dynamic ] [ dst IPADDR ] [ 
vlan VID ]\n"
-   "  [ port PORT] [ vni VNI ] [ via DEV ]\n"
+   "  [ sticky ] [ local | static | dynamic ] [ dst 
IPADDR ]\n"
+   "  [ vlan VID ] [ port PORT] [ vni VNI ] [ via DEV 
]\n"
"   bridge fdb [ show [ br BRDEV ] [ brport DEV ] [ vlan 
VID ] [ state STATE ] ]\n");
exit(-1);
 }
@@ -101,6 +101,9 @@ static void fdb_print_flags(FILE *fp, unsigned int flags)
if (flags & NTF_MASTER)
print_string(PRINT_ANY, NULL, "%s ", "master");
 
+   if (flags & NTF_STICKY)
+   print_string(PRINT_ANY, NULL, "%s ", "sticky");
+
close_json_array(PRINT_JSON, NULL);
 }
 
@@ -414,6 +417,8 @@ static int fdb_modify(int cmd, int flags, int argc, char 
**argv)
req.ndm.ndm_flags |= NTF_USE;
} else if (matches(*argv, "extern_learn") == 0) {
req.ndm.ndm_flags |= NTF_EXT_LEARNED;
+   } else if (matches(*argv, "sticky") == 0) {
+   req.ndm.ndm_flags |= NTF_STICKY;
} else {
if (strcmp(*argv, "to") == 0)
NEXT_ARG();
diff --git a/man/man8/bridge.8 b/man/man8/bridge.8
index 53cd3d0a3d93..c0415bc646df 100644
--- a/man/man8/bridge.8
+++ b/man/man8/bridge.8
@@ -63,7 +63,7 @@ bridge \- show / manipulate bridge addresses and devices
 .B dev
 .IR DEV " { "
 .BR local " | " static " | " dynamic " } [ "
-.BR self " ] [ " master " ] [ " router " ] [ " use " ] [ " extern_learn " ] [ "
+.BR self " ] [ " master " ] [ " router " ] [ " use " ] [ " extern_learn " ] [ 
" sticky " ] [ "
 .B dst
 .IR IPADDR " ] [ "
 .B vni
@@ -448,6 +448,10 @@ indicate to the kernel that an entry was hardware or 
user-space
 controller learnt dynamic entry. Kernel will not age such an entry.
 .sp
 
+.B sticky
+- this entry will not change its port due to learning.
+.sp
+
 .in -8
 The next command line parameters apply only
 when the specified device
-- 
2.11.0

[PATCH net-next] selftests: forwarding: test for bridge sticky flag

2018-09-27 Thread Nikolay Aleksandrov

This test adds an fdb entry with the sticky flag and sends traffic from
a different port with the same mac as a source address expecting the entry
to not change ports if the flag is operating correctly.

Signed-off-by: Nikolay Aleksandrov 
---
 .../selftests/net/forwarding/bridge_sticky_fdb.sh  | 69 ++
 1 file changed, 69 insertions(+)
 create mode 100755 tools/testing/selftests/net/forwarding/bridge_sticky_fdb.sh

diff --git a/tools/testing/selftests/net/forwarding/bridge_sticky_fdb.sh 
b/tools/testing/selftests/net/forwarding/bridge_sticky_fdb.sh
new file mode 100755
index ..1f8ef0eff862
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/bridge_sticky_fdb.sh
@@ -0,0 +1,69 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+ALL_TESTS="sticky"
+NUM_NETIFS=4
+TEST_MAC=de:ad:be:ef:13:37
+source lib.sh
+
+switch_create()
+{
+   ip link add dev br0 type bridge
+
+   ip link set dev $swp1 master br0
+   ip link set dev $swp2 master br0
+
+   ip link set dev br0 up
+   ip link set dev $h1 up
+   ip link set dev $swp1 up
+   ip link set dev $h2 up
+   ip link set dev $swp2 up
+}
+
+switch_destroy()
+{
+   ip link set dev $swp2 down
+   ip link set dev $h2 down
+   ip link set dev $swp1 down
+   ip link set dev $h1 down
+
+   ip link del dev br0
+}
+
+setup_prepare()
+{
+   h1=${NETIFS[p1]}
+   swp1=${NETIFS[p2]}
+   h2=${NETIFS[p3]}
+   swp2=${NETIFS[p4]}
+
+   switch_create
+}
+
+cleanup()
+{
+   pre_cleanup
+   switch_destroy
+}
+
+sticky()
+{
+   bridge fdb add $TEST_MAC dev $swp1 master static sticky
+   check_err $? "Could not add fdb entry"
+   bridge fdb del $TEST_MAC dev $swp1 vlan 1 master static sticky
+   $MZ $h2 -c 1 -a $TEST_MAC -t arp "request" -q
+   bridge -j fdb show br br0 brport $swp1\
+   | jq -e ".[] | select(.mac == \"$TEST_MAC\")" &> /dev/null
+   check_err $? "Did not find FDB record when should"
+
+   log_test "Sticky fdb entry"
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tests_run
+
+exit $EXIT_STATUS
-- 
2.11.0

[PATCH net-next] net: bridge: explicitly zero is_sticky in fdb_create

2018-09-27 Thread Nikolay Aleksandrov

We need to explicitly zero is_sticky when creating a new fdb, otherwise
we might get a stale value for a new entry.

Fixes: 435f2e7cc0b7 ("net: bridge: add support for sticky fdb entries")
Signed-off-by: Nikolay Aleksandrov 
---
 net/bridge/br_fdb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index a56ed7f2a3a3..74331690a390 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -504,6 +504,7 @@ static struct net_bridge_fdb_entry *fdb_create(struct 
net_bridge *br,
fdb->added_by_user = 0;
fdb->added_by_external_learn = 0;
fdb->offloaded = 0;
+   fdb->is_sticky = 0;
fdb->updated = fdb->used = jiffies;
if (rhashtable_lookup_insert_fast(>fdb_hash_tbl,
  >rhnode,
-- 
2.11.0

[PATCH net-next] bonding: don't cast const buf in sysfs store

2018-07-22 Thread Nikolay Aleksandrov

As was recently discussed [1], let's avoid casting the const buf in
bonding_sysfs_store_option and use kstrndup/kfree instead.

[1] http://lists.openwall.net/netdev/2018/07/22/25

Signed-off-by: Nikolay Aleksandrov 
---
 drivers/net/bonding/bond_sysfs.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 6096440e96ea..35847250da5a 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -160,14 +160,19 @@ static ssize_t bonding_sysfs_store_option(struct device 
*d,
 {
struct bonding *bond = to_bond(d);
const struct bond_option *opt;
+   char *buffer_clone;
int ret;
 
opt = bond_opt_get_by_name(attr->attr.name);
if (WARN_ON(!opt))
return -ENOENT;
-   ret = bond_opt_tryset_rtnl(bond, opt->id, (char *)buffer);
+   buffer_clone = kstrndup(buffer, count, GFP_KERNEL);
+   if (!buffer_clone)
+   return -ENOMEM;
+   ret = bond_opt_tryset_rtnl(bond, opt->id, buffer_clone);
if (!ret)
ret = count;
+   kfree(buffer_clone);
 
return ret;
 }
-- 
2.11.0

Re: 答复: [PATCH][net-next] bridge: clean up mtu_set_by_user setting to false and comments

2018-07-13 Thread Nikolay Aleksandrov

On 13/07/18 12:11, Li,Rongqing wrote:
> 
> 
>> -邮件原件-----
>> 发件人: Nikolay Aleksandrov [mailto:niko...@cumulusnetworks.com]
>> 发送时间: 2018年7月13日 16:01
>> 收件人: Li,Rongqing ; netdev@vger.kernel.org
>> 主题: Re: [PATCH][net-next] bridge: clean up mtu_set_by_user setting to
>> false and comments
>>
>> On 13/07/18 09:47, Li RongQing wrote:
>>> Once mtu_set_by_user is set to true, br_mtu_auto_adjust will not run,
>>> and no chance to clear mtu_set_by_user.
>>>
>> ^^
>> This was by design, there is no error here and no "cleanup" is needed.
>> If you read the ndo_change_mtu() call you'll see the comment:
>> /* this flag will be cleared if the MTU was automatically adjusted */
>>
> But after this comment, mtu_set_by_user is set to true, and br_mtu_auto_adjust
>  will not truly be run, how to set mtu_set_by_user to false?

It will be run if the mtu_set_by_user is set to "false".

> 
> 230 /* this flag will be cleared if the MTU was automatically adjusted */
> 231 br->mtu_set_by_user = true;
> 
> And the line 457  is useless, since it run only if it is false?

dev_set_mtu() calls ndo_change_mtu() which makes mtu_set_by_user = true but 
that is
not really _true_ since this is an automatic MTU adjustment so we need to 
revert the
value to false so the automatic adjustment will continue to work.
But if you go ahead and set the MTU yourself manually, then mtu_set_by_user 
will be
equal to true and will stay that way, thus disabling the auto adjust behaviour.

This is used to differentiate when auto adjust is used and when user has set 
the MTU.
As I already said everything is working as expected and you should not remove 
this code.

> 
> 445 void br_mtu_auto_adjust(struct net_bridge *br)
> 446 {
> 447 ASSERT_RTNL();
> 448 
> 449 /* if the bridge MTU was manually configured don't mess with it */
> 450 if (br->mtu_set_by_user)
> 451 return;
> 452 
> 453 /* change to the minimum MTU and clear the flag which was set by
> 454  * the bridge ndo_change_mtu callback
> 455  */
> 456 dev_set_mtu(br->dev, br_mtu_min(br));
> 457 br->mtu_set_by_user = false;
> 458 }
> 
> 
> -R
>

[PATCH net-next] net: ipmr: add support for passing full packet on wrong vif

2018-07-13 Thread Nikolay Aleksandrov

This patch adds support for IGMPMSG_WRVIFWHOLE which is used to pass
full packet and real vif id when the incoming interface is wrong.
While the RP and FHR are setting up state we need to be sending the
registers encapsulated with all the data inside otherwise we lose it.
The RP then decapsulates it and forwards it to the interested parties.
Currently with WRONGVIF we can only be sending empty register packets
and will lose that data.
This behaviour can be enabled by using MRT_PIM with
val == IGMPMSG_WRVIFWHOLE. This doesn't prevent IGMPMSG_WRONGVIF from
happening, it happens in addition to it, also it is controlled by the same
throttling parameters as WRONGVIF (i.e. 1 packet per 3 seconds currently).
Both messages are generated to keep backwards compatibily and avoid
breaking someone who was enabling MRT_PIM with val == 4, since any
positive val is accepted and treated the same.

Signed-off-by: Nikolay Aleksandrov 
---
We have been running with this patch for over an year and FRRouting fully
supports this WRVIFWHOLE message officially. Actually it uses both WRONGVIF
and WRVIFWHOLE for different purposes right now.

 include/linux/mroute_base.h |  1 +
 include/uapi/linux/mroute.h |  2 ++
 net/ipv4/ipmr.c | 21 -
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/include/linux/mroute_base.h b/include/linux/mroute_base.h
index fd436cdd4725..6675b9f81979 100644
--- a/include/linux/mroute_base.h
+++ b/include/linux/mroute_base.h
@@ -254,6 +254,7 @@ struct mr_table {
atomic_tcache_resolve_queue_len;
boolmroute_do_assert;
boolmroute_do_pim;
+   boolmroute_do_wrvifwhole;
int mroute_reg_vif_num;
 };
 
diff --git a/include/uapi/linux/mroute.h b/include/uapi/linux/mroute.h
index 10f9ff9426a2..5d37a9ccce63 100644
--- a/include/uapi/linux/mroute.h
+++ b/include/uapi/linux/mroute.h
@@ -120,6 +120,7 @@ enum {
IPMRA_TABLE_MROUTE_DO_ASSERT,
IPMRA_TABLE_MROUTE_DO_PIM,
IPMRA_TABLE_VIFS,
+   IPMRA_TABLE_MROUTE_DO_WRVIFWHOLE,
__IPMRA_TABLE_MAX
 };
 #define IPMRA_TABLE_MAX (__IPMRA_TABLE_MAX - 1)
@@ -173,5 +174,6 @@ enum {
 #define IGMPMSG_NOCACHE1   /* Kern cache fill 
request to mrouted */
 #define IGMPMSG_WRONGVIF   2   /* For PIM assert processing 
(unused) */
 #define IGMPMSG_WHOLEPKT   3   /* For PIM Register processing 
*/
+#define IGMPMSG_WRVIFWHOLE 4   /* For PIM Register and assert 
processing */
 
 #endif /* _UAPI__LINUX_MROUTE_H */
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 82f914122f1b..5660adcf7a04 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1052,7 +1052,7 @@ static int ipmr_cache_report(struct mr_table *mrt,
struct sk_buff *skb;
int ret;
 
-   if (assert == IGMPMSG_WHOLEPKT)
+   if (assert == IGMPMSG_WHOLEPKT || assert == IGMPMSG_WRVIFWHOLE)
skb = skb_realloc_headroom(pkt, sizeof(struct iphdr));
else
skb = alloc_skb(128, GFP_ATOMIC);
@@ -1060,7 +1060,7 @@ static int ipmr_cache_report(struct mr_table *mrt,
if (!skb)
return -ENOBUFS;
 
-   if (assert == IGMPMSG_WHOLEPKT) {
+   if (assert == IGMPMSG_WHOLEPKT || assert == IGMPMSG_WRVIFWHOLE) {
/* Ugly, but we have no choice with this interface.
 * Duplicate old header, fix ihl, length etc.
 * And all this only to mangle msg->im_msgtype and
@@ -1071,9 +1071,12 @@ static int ipmr_cache_report(struct mr_table *mrt,
skb_reset_transport_header(skb);
msg = (struct igmpmsg *)skb_network_header(skb);
memcpy(msg, skb_network_header(pkt), sizeof(struct iphdr));
-   msg->im_msgtype = IGMPMSG_WHOLEPKT;
+   msg->im_msgtype = assert;
msg->im_mbz = 0;
-   msg->im_vif = mrt->mroute_reg_vif_num;
+   if (assert == IGMPMSG_WRVIFWHOLE)
+   msg->im_vif = vifi;
+   else
+   msg->im_vif = mrt->mroute_reg_vif_num;
ip_hdr(skb)->ihl = sizeof(struct iphdr) >> 2;
ip_hdr(skb)->tot_len = htons(ntohs(ip_hdr(pkt)->tot_len) +
 sizeof(struct iphdr));
@@ -1372,6 +1375,7 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, 
char __user *optval,
struct mr_table *mrt;
struct vifctl vif;
struct mfcctl mfc;
+   bool do_wrvifwhole;
u32 uval;
 
/* There's one exception to the lock - MRT_DONE which needs to unlock */
@@ -1502,10 +1506,12 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, 
char __user *optval,
break;
}
 
+   do_wrvifwhole = (val == IGMPMSG_WRVIFWHOLE);

Re: [PATCH][net-next] bridge: clean up mtu_set_by_user setting to false and comments

2018-07-13 Thread Nikolay Aleksandrov

On 13/07/18 09:47, Li RongQing wrote:
> Once mtu_set_by_user is set to true, br_mtu_auto_adjust will
> not run, and no chance to clear mtu_set_by_user.
> 
^^
This was by design, there is no error here and no "cleanup" is needed.
If you read the ndo_change_mtu() call you'll see the comment:
/* this flag will be cleared if the MTU was automatically adjusted */

It is the only way we can know that the MTU was automatically adjusted or set
by the user manually in which case we need to _stop_ automatically adjusting
MTU. This was done to be backwards compatible as much as possible but still
give the option to have user-configured MTU which doesn't disappear (and is
not overwritten).

So please next time read the original commit.

>From the original commit 804b854d374e ("net: bridge: disable bridge MTU auto 
>tuning if it was set manually"):
" ...
Let's improve on that situation and allow for the user to
set any MTU within ETH_MIN/MAX limits, but once manually configured it
is the user's responsibility to keep it correct afterwards.

In case the MTU isn't manually set - the behaviour reverts to the
previous and the bridge follows the minimum MTU.
...
"

> and br_mtu_auto_adjust will run only if mtu_set_by_user is
> false, so not need to set it to false again
> 
> Cc: Nikolay Aleksandrov 
> Signed-off-by: Li RongQing 
> ---
>  net/bridge/br_device.c | 1 -
>  net/bridge/br_if.c | 4 
>  2 files changed, 5 deletions(-)
> 
> diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
> index e682a668ce57..c636bc2749c2 100644
> --- a/net/bridge/br_device.c
> +++ b/net/bridge/br_device.c
> @@ -227,7 +227,6 @@ static int br_change_mtu(struct net_device *dev, int 
> new_mtu)
>  
>   dev->mtu = new_mtu;
>  
> - /* this flag will be cleared if the MTU was automatically adjusted */
>   br->mtu_set_by_user = true;
>  #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
>   /* remember the MTU in the rtable for PMTU */
> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> index 05e42d86882d..47c65da4b1be 100644
> --- a/net/bridge/br_if.c
> +++ b/net/bridge/br_if.c
> @@ -450,11 +450,7 @@ void br_mtu_auto_adjust(struct net_bridge *br)
>   if (br->mtu_set_by_user)
>   return;
>  
> - /* change to the minimum MTU and clear the flag which was set by
> -  * the bridge ndo_change_mtu callback
> -  */
>   dev_set_mtu(br->dev, br_mtu_min(br));
> - br->mtu_set_by_user = false;
>  }
>  
>  static void br_set_gso_limits(struct net_bridge *br)
> 

Nacked-by: Nikolay Aleksandrov

[PATCH net-next 1/2] selftests: forwarding: lib: extract ping and ping6 so they can be reused

2018-07-03 Thread Nikolay Aleksandrov

Extract ping and ping6 command execution so the return value can be
checked by the caller, this is needed for port isolation tests that are
intended to fail.

Signed-off-by: Nikolay Aleksandrov 
---
 tools/testing/selftests/net/forwarding/lib.sh | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/net/forwarding/lib.sh 
b/tools/testing/selftests/net/forwarding/lib.sh
index e073918bbe15..2bb9cf303c53 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -659,30 +659,40 @@ multipath_eval()
 ##
 # Tests
 
-ping_test()
+ping_do()
 {
local if_name=$1
local dip=$2
local vrf_name
 
-   RET=0
-
vrf_name=$(master_name_get $if_name)
ip vrf exec $vrf_name $PING $dip -c 10 -i 0.1 -w 2 &> /dev/null
+}
+
+ping_test()
+{
+   RET=0
+
+   ping_do $1 $2
check_err $?
log_test "ping"
 }
 
-ping6_test()
+ping6_do()
 {
local if_name=$1
local dip=$2
local vrf_name
 
-   RET=0
-
vrf_name=$(master_name_get $if_name)
ip vrf exec $vrf_name $PING6 $dip -c 10 -i 0.1 -w 2 &> /dev/null
+}
+
+ping6_test()
+{
+   RET=0
+
+   ping6_do $1 $2
check_err $?
log_test "ping6"
 }
-- 
2.11.0

[PATCH iproute2 net-next] bridge: add support for isolated option

2018-07-03 Thread Nikolay Aleksandrov

This patch adds support for the new isolated port option which, if set,
would allow the isolated ports to communicate only with non-isolated
ports and the bridge device. The option can be set via the bridge or ip
link type bridge_slave commands, e.g.:
$ ip link set dev eth0 type bridge_slave isolated on
$ bridge link set dev eth0 isolated on

Signed-off-by: Nikolay Aleksandrov 
---
 bridge/link.c| 11 +++
 ip/iplink_bridge_slave.c |  9 +
 man/man8/bridge.8|  6 ++
 man/man8/ip-link.8.in|  6 --
 4 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/bridge/link.c b/bridge/link.c
index 8d89aca2e638..9656ca338782 100644
--- a/bridge/link.c
+++ b/bridge/link.c
@@ -152,6 +152,9 @@ static void print_protinfo(FILE *fp, struct rtattr *attr)
if (prtb[IFLA_BRPORT_VLAN_TUNNEL])
print_onoff(fp, "vlan_tunnel",

rta_getattr_u8(prtb[IFLA_BRPORT_VLAN_TUNNEL]));
+   if (prtb[IFLA_BRPORT_ISOLATED])
+   print_onoff(fp, "isolated",
+   rta_getattr_u8(prtb[IFLA_BRPORT_ISOLATED]));
} else
print_portstate(rta_getattr_u8(attr));
 }
@@ -250,6 +253,7 @@ static void usage(void)
fprintf(stderr, "   [ mcast_flood {on | 
off} ]\n");
fprintf(stderr, "   [ neigh_suppress {on | 
off} ]\n");
fprintf(stderr, "   [ vlan_tunnel {on | 
off} ]\n");
+   fprintf(stderr, "   [ isolated {on | off} 
]\n");
fprintf(stderr, "   [ hwmode {vepa | veb} 
]\n");
fprintf(stderr, "   [ self ] [ master ]\n");
fprintf(stderr, "   bridge link show [dev DEV]\n");
@@ -291,6 +295,7 @@ static int brlink_modify(int argc, char **argv)
__s8 flood = -1;
__s8 vlan_tunnel = -1;
__s8 mcast_flood = -1;
+   __s8 isolated = -1;
__s8 hairpin = -1;
__s8 bpdu_guard = -1;
__s8 fast_leave = -1;
@@ -386,6 +391,10 @@ static int brlink_modify(int argc, char **argv)
if (!on_off("vlan_tunnel", _tunnel,
*argv))
return -1;
+   } else if (strcmp(*argv, "isolated") == 0) {
+   NEXT_ARG();
+   if (!on_off("isolated", , *argv))
+   return -1;
} else {
usage();
}
@@ -444,6 +453,8 @@ static int brlink_modify(int argc, char **argv)
if (vlan_tunnel != -1)
addattr8(, sizeof(req), IFLA_BRPORT_VLAN_TUNNEL,
 vlan_tunnel);
+   if (isolated != -1)
+   addattr8(, sizeof(req), IFLA_BRPORT_ISOLATED, isolated);
 
addattr_nest_end(, nest);
 
diff --git a/ip/iplink_bridge_slave.c b/ip/iplink_bridge_slave.c
index 3fbfb878cdc4..5a6e48559781 100644
--- a/ip/iplink_bridge_slave.c
+++ b/ip/iplink_bridge_slave.c
@@ -40,6 +40,7 @@ static void print_explain(FILE *f)
"[ group_fwd_mask MASK ]\n"
"[ neigh_suppress {on | off} ]\n"
"[ vlan_tunnel {on | off} ]\n"
+   "[ isolated {on | off} ]\n"
);
 }
 
@@ -274,6 +275,10 @@ static void bridge_slave_print_opt(struct link_util *lu, 
FILE *f,
if (tb[IFLA_BRPORT_VLAN_TUNNEL])
_print_onoff(f, "vlan_tunnel", "vlan_tunnel",
 rta_getattr_u8(tb[IFLA_BRPORT_VLAN_TUNNEL]));
+
+   if (tb[IFLA_BRPORT_ISOLATED])
+   _print_onoff(f, "isolated", "isolated",
+rta_getattr_u8(tb[IFLA_BRPORT_ISOLATED]));
 }
 
 static void bridge_slave_parse_on_off(char *arg_name, char *arg_val,
@@ -379,6 +384,10 @@ static int bridge_slave_parse_opt(struct link_util *lu, 
int argc, char **argv,
NEXT_ARG();
bridge_slave_parse_on_off("vlan_tunnel", *argv, n,
  IFLA_BRPORT_VLAN_TUNNEL);
+   } else if (matches(*argv, "isolated") == 0) {
+   NEXT_ARG();
+   bridge_slave_parse_on_off("isolated", *argv, n,
+ IFLA_BRPORT_ISOLATED);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
diff --git a/man/man8/bridge.8 b/man/man8/bridge.8
index e7f7148315e1..f6d228c5ebfe 1

[PATCH net-next 2/2] selftests: forwarding: test for bridge port isolation

2018-07-03 Thread Nikolay Aleksandrov

This test checks if the bridge port isolation feature works as expected
by performing ping/ping6 tests between hosts that are isolated (should
not work) and between an isolated and non-isolated hosts (should work).
Same test is performed for flooding from and to isolated and
non-isolated ports.

Signed-off-by: Nikolay Aleksandrov 
---
 .../net/forwarding/bridge_port_isolation.sh| 151 +
 1 file changed, 151 insertions(+)
 create mode 100755 
tools/testing/selftests/net/forwarding/bridge_port_isolation.sh

diff --git a/tools/testing/selftests/net/forwarding/bridge_port_isolation.sh 
b/tools/testing/selftests/net/forwarding/bridge_port_isolation.sh
new file mode 100755
index ..a43b4645c4de
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/bridge_port_isolation.sh
@@ -0,0 +1,151 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+ALL_TESTS="ping_ipv4 ping_ipv6 flooding"
+NUM_NETIFS=6
+CHECK_TC="yes"
+source lib.sh
+
+h1_create()
+{
+   simple_if_init $h1 192.0.2.1/24 2001:db8:1::1/64
+}
+
+h1_destroy()
+{
+   simple_if_fini $h1 192.0.2.1/24 2001:db8:1::1/64
+}
+
+h2_create()
+{
+   simple_if_init $h2 192.0.2.2/24 2001:db8:1::2/64
+}
+
+h2_destroy()
+{
+   simple_if_fini $h2 192.0.2.2/24 2001:db8:1::2/64
+}
+
+h3_create()
+{
+   simple_if_init $h3 192.0.2.3/24 2001:db8:1::3/64
+}
+
+h3_destroy()
+{
+   simple_if_fini $h3 192.0.2.3/24 2001:db8:1::3/64
+}
+
+switch_create()
+{
+   ip link add dev br0 type bridge
+
+   ip link set dev $swp1 master br0
+   ip link set dev $swp2 master br0
+   ip link set dev $swp3 master br0
+
+   ip link set dev $swp1 type bridge_slave isolated on
+   check_err $? "Can't set isolation on port $swp1"
+   ip link set dev $swp2 type bridge_slave isolated on
+   check_err $? "Can't set isolation on port $swp2"
+   ip link set dev $swp3 type bridge_slave isolated off
+   check_err $? "Can't disable isolation on port $swp3"
+
+   ip link set dev br0 up
+   ip link set dev $swp1 up
+   ip link set dev $swp2 up
+   ip link set dev $swp3 up
+}
+
+switch_destroy()
+{
+   ip link set dev $swp3 down
+   ip link set dev $swp2 down
+   ip link set dev $swp1 down
+
+   ip link del dev br0
+}
+
+setup_prepare()
+{
+   h1=${NETIFS[p1]}
+   swp1=${NETIFS[p2]}
+
+   swp2=${NETIFS[p3]}
+   h2=${NETIFS[p4]}
+
+   swp3=${NETIFS[p5]}
+   h3=${NETIFS[p6]}
+
+   vrf_prepare
+
+   h1_create
+   h2_create
+   h3_create
+
+   switch_create
+}
+
+cleanup()
+{
+   pre_cleanup
+
+   switch_destroy
+
+   h3_destroy
+   h2_destroy
+   h1_destroy
+
+   vrf_cleanup
+}
+
+ping_ipv4()
+{
+   RET=0
+   ping_do $h1 192.0.2.2
+   check_fail $? "Ping worked when it should not have"
+
+   RET=0
+   ping_do $h3 192.0.2.2
+   check_err $? "Ping didn't work when it should have"
+
+   log_test "Isolated port ping"
+}
+
+ping_ipv6()
+{
+   RET=0
+   ping6_do $h1 2001:db8:1::2
+   check_fail $? "Ping6 worked when it should not have"
+
+   RET=0
+   ping6_do $h3 2001:db8:1::2
+   check_err $? "Ping6 didn't work when it should have"
+
+   log_test "Isolated port ping6"
+}
+
+flooding()
+{
+   local mac=de:ad:be:ef:13:37
+   local ip=192.0.2.100
+
+   RET=0
+   flood_test_do false $mac $ip $h1 $h2
+   check_err $? "Packet was flooded when it should not have been"
+
+   RET=0
+   flood_test_do true $mac $ip $h3 $h2
+   check_err $? "Packet was not flooded when it should have been"
+
+   log_test "Isolated port flooding"
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tests_run
+
+exit $EXIT_STATUS
-- 
2.11.0

[PATCH net-next 0/2] bridge: iproute2 isolated port and selftests

2018-07-03 Thread Nikolay Aleksandrov

Add support to iproute2 for port isolation config and selftests for it.

Nikolay Aleksandrov (2):
  selftests: forwarding: lib: extract ping and ping6 so they can be
reused
  selftests: forwarding: test for bridge port isolation

 .../net/forwarding/bridge_port_isolation.sh| 151 +
 tools/testing/selftests/net/forwarding/lib.sh  |  22 ++-
 2 files changed, 167 insertions(+), 6 deletions(-)
 create mode 100755 
tools/testing/selftests/net/forwarding/bridge_port_isolation.sh

-- 
2.11.0

Re: [PATCH net-next] net: bridge: Lock before br_fdb_find()

2018-05-28 Thread Nikolay Aleksandrov

On 28/05/18 18:52, Nikolay Aleksandrov wrote:
> On 28/05/18 18:44, Petr Machata wrote:
>> Callers of br_fdb_find() need to hold the hash lock, which
>> br_fdb_find_port() doesn't do. Add the missing lock/unlock
>> pair.
>>
>> Signed-off-by: Petr Machata <pe...@mellanox.com>
>> ---
>>  net/bridge/br_fdb.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>> index b19e310..3f5691a 100644
>> --- a/net/bridge/br_fdb.c
>> +++ b/net/bridge/br_fdb.c
>> @@ -135,9 +135,11 @@ struct net_device *br_fdb_find_port(const struct 
>> net_device *br_dev,
>>  return NULL;
>>  
>>  br = netdev_priv(br_dev);
>> +spin_lock_bh(>hash_lock);
>>  f = br_fdb_find(br, addr, vid);
>>  if (f && f->dst)
>>  dev = f->dst->dev;
>> +spin_unlock_bh(>hash_lock);
>>  
>>  return dev;
>>  }
>>
> 
> There's also a lockdep assert for hash_lock in br_find_fdb().
> 
> Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
> 

Fixes: 4d4fd36126d6 ("net: bridge: Publish bridge accessor functions")

Re: [PATCH net-next] net: bridge: Lock before br_fdb_find()

2018-05-28 Thread Nikolay Aleksandrov

On 28/05/18 18:44, Petr Machata wrote:
> Callers of br_fdb_find() need to hold the hash lock, which
> br_fdb_find_port() doesn't do. Add the missing lock/unlock
> pair.
> 
> Signed-off-by: Petr Machata <pe...@mellanox.com>
> ---
>  net/bridge/br_fdb.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index b19e310..3f5691a 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -135,9 +135,11 @@ struct net_device *br_fdb_find_port(const struct 
> net_device *br_dev,
>   return NULL;
>  
>   br = netdev_priv(br_dev);
> + spin_lock_bh(>hash_lock);
>   f = br_fdb_find(br, addr, vid);
>   if (f && f->dst)
>   dev = f->dst->dev;
> + spin_unlock_bh(>hash_lock);
>  
>   return dev;
>  }
> 

There's also a lockdep assert for hash_lock in br_find_fdb().

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next 6/7] net: bridge: Notify about bridge VLANs

2018-05-25 Thread Nikolay Aleksandrov

On 24/05/18 18:10, Petr Machata wrote:
> A driver might need to react to changes in settings of brentry VLANs.
> Therefore send switchdev port notifications for these as well. Reuse
> SWITCHDEV_OBJ_ID_PORT_VLAN for this purpose. Listeners should use
> netif_is_bridge_master() on orig_dev to determine whether the
> notification is about a bridge port or a bridge.
> 
> Signed-off-by: Petr Machata <pe...@mellanox.com>
> ---
>  net/bridge/br_vlan.c | 14 ++
>  1 file changed, 14 insertions(+)
> 

Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next 1/7] net: bridge: Extract boilerplate around switchdev_port_obj_*()

2018-05-25 Thread Nikolay Aleksandrov

On 24/05/18 18:10, Petr Machata wrote:
> A call to switchdev_port_obj_add() or switchdev_port_obj_del() involves
> initializing a struct switchdev_obj_port_vlan, a piece of code that
> repeats on each call site almost verbatim. While in the current codebase
> there is just one duplicated add call, the follow-up patches add more of
> both add and del calls.
> 
> Thus to remove the duplication, extract the repetition into named
> functions and reuse.
> 
> Signed-off-by: Petr Machata <pe...@mellanox.com>
> ---
>  net/bridge/br_vlan.c | 44 +++-
>  1 file changed, 23 insertions(+), 21 deletions(-)
> 

Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

[PATCH net-next] net: bridge: add support for port isolation

2018-05-24 Thread Nikolay Aleksandrov

This patch adds support for a new port flag - BR_ISOLATED. If it is set
then isolated ports cannot communicate between each other, but they can
still communicate with non-isolated ports. The same can be achieved via
ACLs but they can't scale with large number of ports and also the
complexity of the rules grows. This feature can be used to achieve
isolated vlan functionality (similar to pvlan) as well, though currently
it will be port-wide (for all vlans on the port). The new test in
should_deliver uses data that is already cache hot and the new boolean
is used to avoid an additional source port test in should_deliver.

Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
---
 include/linux/if_bridge.h| 1 +
 include/uapi/linux/if_link.h | 1 +
 net/bridge/br_forward.c  | 3 ++-
 net/bridge/br_input.c| 1 +
 net/bridge/br_netlink.c  | 9 -
 net/bridge/br_private.h  | 9 +
 net/bridge/br_sysfs_if.c | 2 ++
 7 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 585d27182425..7843b98e1c6e 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -50,6 +50,7 @@ struct br_ip_list {
 #define BR_VLAN_TUNNEL BIT(13)
 #define BR_BCAST_FLOOD BIT(14)
 #define BR_NEIGH_SUPPRESS  BIT(15)
+#define BR_ISOLATEDBIT(16)
 
 #define BR_DEFAULT_AGEING_TIME (300 * HZ)
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index b85266420bfb..cf01b6824244 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -333,6 +333,7 @@ enum {
IFLA_BRPORT_BCAST_FLOOD,
IFLA_BRPORT_GROUP_FWD_MASK,
IFLA_BRPORT_NEIGH_SUPPRESS,
+   IFLA_BRPORT_ISOLATED,
__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 7a7fd672ccf2..9019f326fe81 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -30,7 +30,8 @@ static inline int should_deliver(const struct net_bridge_port 
*p,
vg = nbp_vlan_group_rcu(p);
return ((p->flags & BR_HAIRPIN_MODE) || skb->dev != p->dev) &&
br_allowed_egress(vg, skb) && p->state == BR_STATE_FORWARDING &&
-   nbp_switchdev_allowed_egress(p, skb);
+   nbp_switchdev_allowed_egress(p, skb) &&
+   !br_skb_isolated(p, skb);
 }
 
 int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff 
*skb)
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 7f98a7d25866..72074276c088 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -114,6 +114,7 @@ int br_handle_frame_finish(struct net *net, struct sock 
*sk, struct sk_buff *skb
goto drop;
 
BR_INPUT_SKB_CB(skb)->brdev = br->dev;
+   BR_INPUT_SKB_CB(skb)->src_port_isolated = !!(p->flags & BR_ISOLATED);
 
if (IS_ENABLED(CONFIG_INET) &&
(skb->protocol == htons(ETH_P_ARP) ||
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 015f465c514b..9f5eb05b0373 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -139,6 +139,7 @@ static inline size_t br_port_info_size(void)
+ nla_total_size(1) /* IFLA_BRPORT_PROXYARP_WIFI */
+ nla_total_size(1) /* IFLA_BRPORT_VLAN_TUNNEL */
+ nla_total_size(1) /* IFLA_BRPORT_NEIGH_SUPPRESS */
+   + nla_total_size(1) /* IFLA_BRPORT_ISOLATED */
+ nla_total_size(sizeof(struct ifla_bridge_id)) /* 
IFLA_BRPORT_ROOT_ID */
+ nla_total_size(sizeof(struct ifla_bridge_id)) /* 
IFLA_BRPORT_BRIDGE_ID */
+ nla_total_size(sizeof(u16))   /* IFLA_BRPORT_DESIGNATED_PORT 
*/
@@ -213,7 +214,8 @@ static int br_port_fill_attrs(struct sk_buff *skb,
BR_VLAN_TUNNEL)) ||
nla_put_u16(skb, IFLA_BRPORT_GROUP_FWD_MASK, p->group_fwd_mask) ||
nla_put_u8(skb, IFLA_BRPORT_NEIGH_SUPPRESS,
-  !!(p->flags & BR_NEIGH_SUPPRESS)))
+  !!(p->flags & BR_NEIGH_SUPPRESS)) ||
+   nla_put_u8(skb, IFLA_BRPORT_ISOLATED, !!(p->flags & BR_ISOLATED)))
return -EMSGSIZE;
 
timerval = br_timer_value(>message_age_timer);
@@ -660,6 +662,7 @@ static const struct nla_policy 
br_port_policy[IFLA_BRPORT_MAX + 1] = {
[IFLA_BRPORT_VLAN_TUNNEL] = { .type = NLA_U8 },
[IFLA_BRPORT_GROUP_FWD_MASK] = { .type = NLA_U16 },
[IFLA_BRPORT_NEIGH_SUPPRESS] = { .type = NLA_U8 },
+   [IFLA_BRPORT_ISOLATED]  = { .type = NLA_U8 },
 };
 
 /* Change the state of the port and notify spanning tree */
@@ -810,6 +813,10 @@ static int br_setport(struct net_bridge_port *p, struct 
nlattr *tb[])
if

Re: [PATCH net] ipmr: properly check rhltable_init() return value

2018-05-21 Thread Nikolay Aleksandrov

On 05/21/2018 08:51 PM, Eric Dumazet wrote:
> commit 8fb472c09b9d ("ipmr: improve hash scalability")
> added a call to rhltable_init() without checking its return value.
> 
> This problem was then later copied to IPv6 and factorized in commit
> 0bbbf0e7d0e7 ("ipmr, ip6mr: Unite creation of new mr_table")
> 
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 31552 Comm: syz-executor7 Not tainted 4.17.0-rc5+ #60
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> RIP: 0010:rht_key_hashfn include/linux/rhashtable.h:277 [inline]
> RIP: 0010:__rhashtable_lookup include/linux/rhashtable.h:630 [inline]
> RIP: 0010:rhltable_lookup include/linux/rhashtable.h:716 [inline]
> RIP: 0010:mr_mfc_find_parent+0x2ad/0xbb0 net/ipv4/ipmr_base.c:63
> RSP: 0018:8801826aef70 EFLAGS: 00010203
> RAX: 0001 RBX: 0001 RCX: c90001ea
> RDX: 0079 RSI: 8661e859 RDI: 000c
> RBP: 8801826af1c0 R08: 8801b2212000 R09: ed003b5e46c2
> R10: ed003b5e46c2 R11: 8801daf23613 R12: dc00
> R13: 8801826af198 R14: 8801cf8225c0 R15: 8801826af658
> FS:  7ff7fa732700() GS:8801daf0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0003ff9c CR3: 0001b021 CR4: 001406e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  ip6mr_cache_find_parent net/ipv6/ip6mr.c:981 [inline]
>  ip6mr_mfc_delete+0x1fe/0x6b0 net/ipv6/ip6mr.c:1221
>  ip6_mroute_setsockopt+0x15c6/0x1d70 net/ipv6/ip6mr.c:1698
>  do_ipv6_setsockopt.isra.9+0x422/0x4660 net/ipv6/ipv6_sockglue.c:163
>  ipv6_setsockopt+0xbd/0x170 net/ipv6/ipv6_sockglue.c:922
>  rawv6_setsockopt+0x59/0x140 net/ipv6/raw.c:1060
>  sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3039
>  __sys_setsockopt+0x1bd/0x390 net/socket.c:1903
>  __do_sys_setsockopt net/socket.c:1914 [inline]
>  __se_sys_setsockopt net/socket.c:1911 [inline]
>  __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
>  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Fixes: 8fb472c09b9d ("ipmr: improve hash scalability")
> Fixes: 0bbbf0e7d0e7 ("ipmr, ip6mr: Unite creation of new mr_table")
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Cc: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
> Cc: Yuval Mintz <yuv...@mellanox.com>
> Reported-by: syzbot <syzkal...@googlegroups.com>
> ---
>  net/ipv4/ipmr_base.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 

Oops :). Thanks!

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next 1/4] net: bridge: Allow bridge master in br_vlan_get_info()

2018-05-10 Thread Nikolay Aleksandrov

On 10/05/18 13:13, Ido Schimmel wrote:
> From: Petr Machata <pe...@mellanox.com>
> 
> Mirroring offload in mlxsw needs to check that a given VLAN is allowed
> to ingress the bridge device. br_vlan_get_info() is the function that is
> used for this, however currently it only supports bridge port devices.
> Extend it to support bridge masters as well.
> 
> Signed-off-by: Petr Machata <pe...@mellanox.com>
> Signed-off-by: Ido Schimmel <ido...@mellanox.com>
> ---
>  net/bridge/br_vlan.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index df37a5137c25..dc832c0934c6 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -1176,6 +1176,8 @@ int br_vlan_get_info(const struct net_device *dev, u16 
> vid,
>   p = br_port_get_check_rtnl(dev);
>   if (p)
>   vg = nbp_vlan_group(p);
> + else if (netif_is_bridge_master(dev))
> + vg = br_vlan_group(netdev_priv(dev));
>   else
>   return -EINVAL;
>  
> 

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next] net: dsa: fix added_by_user switchdev notification

2018-05-09 Thread Nikolay Aleksandrov

sa_port_fdb_add+0x50/0x6c)
 [   75.811371] [<806a9c2c>] (dsa_port_fdb_add) from [<806ab228>] 
(dsa_slave_switchdev_event_work+0xb0/0x10c)
 [   75.819635]  r4:9ddffc00
 [   75.820885] [<806ab178>] (dsa_slave_switchdev_event_work) from 
[<80131e30>] (process_one_work+0x120/0x3a4)
 [   75.829241]  r6:9f40d200 r5:9f4c1900 r4:9ddffc00 r3:806ab178
 [   75.833612] [<80131d10>] (process_one_work) from [<80132124>] 
(worker_thread+0x70/0x574)
 [   75.840415]  r10:9f4c1900 r9:9f40d200 r8:9f54a038 r7:9f40d214 
r6:812119a0 r5:9f4c1918
 [   75.846945]  r4:9f40d200
 [   75.848191] [<801320b4>] (worker_thread) from [<80137ad0>] 
(kthread+0x130/0x160)
 [   75.854300]  r10:9f4f7e88 r9:9f5f2dd8 r8:801320b4 r7:9f4c1900 
r6: r5:9f53dbc0
 [   75.860830]  r4:9f5f2dc0
 [   75.862076] [<801379a0>] (kthread) from [<801010e8>] 
(ret_from_fork+0x14/0x2c)
 [   75.867999] Exception stack(0x9f54bfb0 to 0x9f54bff8)
 [   75.871753] bfa0:   
 
 [   75.878640] bfc0:       
 
 [   75.885519] bfe0:     0013 
 [   75.890844]  r10: r9: r8: r7: 
r6: r5:801379a0
 [   75.897377]  r4:9f53dbc0 r3:9f54a000
 [   75.899663] Code: e3a02000 e3a03000 e14b26f4 e24bc055 (e5973000)
 [   75.904575] ---[ end trace fbca818a124dbf0d ]---

Fixes: 816a3bed9549 ("switchdev: Add fdb.added_by_user to switchdev 
notifications")
Signed-off-by: Vivien Didelot <vivien.dide...@savoirfairelinux.com>
---
@petr I expect the same issue with Rocker, but I haven't tested it.

  net/dsa/slave.c | 12 +++-
  1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index c287f1ef964c..746ab428a17a 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1395,6 +1395,9 @@ static void dsa_slave_switchdev_event_work(struct 
work_struct *work)
switch (switchdev_work->event) {
case SWITCHDEV_FDB_ADD_TO_DEVICE:
fdb_info = _work->fdb_info;
+   if (!fdb_info->added_by_user)
+   break;
+
err = dsa_port_fdb_add(dp, fdb_info->addr, fdb_info->vid);
if (err) {
netdev_dbg(dev, "fdb add failed err=%d\n", err);
@@ -1406,6 +1409,9 @@ static void dsa_slave_switchdev_event_work(struct 
work_struct *work)
  
  	case SWITCHDEV_FDB_DEL_TO_DEVICE:

fdb_info = _work->fdb_info;
+   if (!fdb_info->added_by_user)
+   break;
+
err = dsa_port_fdb_del(dp, fdb_info->addr, fdb_info->vid);
if (err) {
netdev_dbg(dev, "fdb del failed err=%d\n", err);
@@ -1441,7 +1447,6 @@ static int dsa_slave_switchdev_event(struct 
notifier_block *unused,
 unsigned long event, void *ptr)
  {
struct net_device *dev = switchdev_notifier_info_to_dev(ptr);
-   struct switchdev_notifier_fdb_info *fdb_info = ptr;
struct dsa_switchdev_event_work *switchdev_work;
  
  	if (!dsa_slave_dev_check(dev))

@@ -1459,10 +1464,7 @@ static int dsa_slave_switchdev_event(struct 
notifier_block *unused,
switch (event) {
case SWITCHDEV_FDB_ADD_TO_DEVICE: /* fall through */
case SWITCHDEV_FDB_DEL_TO_DEVICE:
-   if (!fdb_info->added_by_user)
-   break;
-   if (dsa_slave_switchdev_fdb_work_init(switchdev_work,
-         fdb_info))
+   if (dsa_slave_switchdev_fdb_work_init(switchdev_work, ptr))
goto err_fdb_work_init;
dev_hold(dev);
break;



Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next mlxsw v2 2/2] net: bridge: Notify about !added_by_user FDB entries

2018-05-03 Thread Nikolay Aleksandrov


On 03/05/18 15:43, Petr Machata wrote:

Do not automatically bail out on sending notifications about activity on
non-user-added FDB entries. Instead, notify about this activity except
for cases where the activity itself originates in a notification, to
avoid sending duplicate notifications.

Signed-off-by: Petr Machata <pe...@mellanox.com>
---
  net/bridge/br.c   |  4 ++--
  net/bridge/br_fdb.c   | 47 ++-
  net/bridge/br_private.h   |  6 --
  net/bridge/br_switchdev.c |  2 +-
  4 files changed, 33 insertions(+), 26 deletions(-)



Thanks, looks good to me! In the future please add the reviewers to the CC list
when sending a v2, I actually missed the v2 set and saw your reply to the cover
letter patch later.

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

[PATCH net-next] net: bridge: avoid duplicate notification on up/down/change netdev events

2018-05-03 Thread Nikolay Aleksandrov

While handling netdevice events, br_device_event() sometimes uses
br_stp_(disable|enable)_port which unconditionally send a notification,
but then a second notification for the same event is sent at the end of
the br_device_event() function. To avoid sending duplicate notifications
in such cases, check if one has already been sent (i.e.
br_stp_enable/disable_port have been called).
The patch is based on a change by Satish Ashok.

Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
---
We've been running with a similar patch for over an year, it's been
thoroughly tested. Sending for net-next since it's an improvement and
not really a bug fix.

 net/bridge/br.c | 12 
 net/bridge/br_if.c  | 11 ---
 net/bridge/br_private.h |  2 +-
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/net/bridge/br.c b/net/bridge/br.c
index 671d13c10f6f..2ca035054664 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -34,6 +34,7 @@ static int br_device_event(struct notifier_block *unused, 
unsigned long event, v
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
struct net_bridge_port *p;
struct net_bridge *br;
+   bool notified = false;
bool changed_addr;
int err;
 
@@ -67,7 +68,7 @@ static int br_device_event(struct notifier_block *unused, 
unsigned long event, v
break;
 
case NETDEV_CHANGE:
-   br_port_carrier_check(p);
+   br_port_carrier_check(p, );
break;
 
case NETDEV_FEAT_CHANGE:
@@ -76,8 +77,10 @@ static int br_device_event(struct notifier_block *unused, 
unsigned long event, v
 
case NETDEV_DOWN:
spin_lock_bh(>lock);
-   if (br->dev->flags & IFF_UP)
+   if (br->dev->flags & IFF_UP) {
br_stp_disable_port(p);
+   notified = true;
+   }
spin_unlock_bh(>lock);
break;
 
@@ -85,6 +88,7 @@ static int br_device_event(struct notifier_block *unused, 
unsigned long event, v
if (netif_running(br->dev) && netif_oper_up(dev)) {
spin_lock_bh(>lock);
br_stp_enable_port(p);
+   notified = true;
spin_unlock_bh(>lock);
}
break;
@@ -110,8 +114,8 @@ static int br_device_event(struct notifier_block *unused, 
unsigned long event, v
}
 
/* Events that may cause spanning tree to refresh */
-   if (event == NETDEV_CHANGEADDR || event == NETDEV_UP ||
-   event == NETDEV_CHANGE || event == NETDEV_DOWN)
+   if (!notified && (event == NETDEV_CHANGEADDR || event == NETDEV_UP ||
+ event == NETDEV_CHANGE || event == NETDEV_DOWN))
br_ifinfo_notify(RTM_NEWLINK, NULL, p);
 
return NOTIFY_DONE;
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 82c1a6f430b3..e3a8ea1bcbe2 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -64,7 +64,7 @@ static int port_cost(struct net_device *dev)
 
 
 /* Check for port carrier transitions. */
-void br_port_carrier_check(struct net_bridge_port *p)
+void br_port_carrier_check(struct net_bridge_port *p, bool *notified)
 {
struct net_device *dev = p->dev;
struct net_bridge *br = p->br;
@@ -73,16 +73,21 @@ void br_port_carrier_check(struct net_bridge_port *p)
netif_running(dev) && netif_oper_up(dev))
p->path_cost = port_cost(dev);
 
+   *notified = false;
if (!netif_running(br->dev))
return;
 
spin_lock_bh(>lock);
if (netif_running(dev) && netif_oper_up(dev)) {
-   if (p->state == BR_STATE_DISABLED)
+   if (p->state == BR_STATE_DISABLED) {
br_stp_enable_port(p);
+   *notified = true;
+   }
} else {
-   if (p->state != BR_STATE_DISABLED)
+   if (p->state != BR_STATE_DISABLED) {
br_stp_disable_port(p);
+   *notified = true;
+   }
}
spin_unlock_bh(>lock);
 }
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 1a5093115534..0ddeeea2c6a7 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -573,7 +573,7 @@ void br_flood(struct net_bridge *br, struct sk_buff *skb,
  enum br_pkt_type pkt_type, bool local_rcv, bool local_orig);
 
 /* br_if.c */
-void br_port_carrier_check(struct net_bridge_port *p);
+void br_port_carrier_check(struct net_bridge_port *p, bool *notified);
 int br_add_bridge(struct net *net, const char *name);
 int br_del_bridge(struct net *net, const char *name);
 int br_add_if(struct net_bridge *br, struct net_device *dev,
-- 
2.11.0

Re: [PATCH net-next 1/2] switchdev: Add fdb.added_by_user to switchdev notifications

2018-05-01 Thread Nikolay Aleksandrov


On 01/05/18 20:04, Petr Machata wrote:

The following patch enables sending notifications also for events on FDB
entries that weren't added by the user. Give the drivers the information
necessary to distinguish between the two origins of FDB entries.

To maintain the current behavior, have switchdev-implementing drivers
bail out on notifications about non-user-added FDB entries. In case of
mlxsw driver, allow a call to mlxsw_sp_span_respin() so that SPAN over
bridge catches up with the changed FDB.

Signed-off-by: Petr Machata <pe...@mellanox.com>
---
  drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c |  4 
  drivers/net/ethernet/rocker/rocker_main.c|  2 ++
  include/net/switchdev.h  |  1 +
  net/bridge/br_switchdev.c| 10 +++---
  net/dsa/slave.c  |  5 -
  5 files changed, 18 insertions(+), 4 deletions(-)



Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next 2/2] net: bridge: Notify about !added_by_user FDB entries

2018-05-01 Thread Nikolay Aleksandrov


On 01/05/18 20:04, Petr Machata wrote:

Do not automatically bail out on sending notifications about activity on
non-user-added FDB entries. Instead, notify about this activity except
for cases where the activity itself originates in a notification, to
avoid sending duplicate notifications.

Signed-off-by: Petr Machata 
---
  net/bridge/br.c   |  4 ++--
  net/bridge/br_fdb.c   | 40 +++-
  net/bridge/br_private.h   |  4 ++--
  net/bridge/br_switchdev.c |  2 +-
  4 files changed, 32 insertions(+), 18 deletions(-)



Hi Petr,
We already have 7 different fdb delete functions, I'm really not a fan of
adding yet another one for such trivial change.
Why don't you just add the new notify parameter to the already existing
fdb_delete() ? (actually about the name see below)
IMO it's confusing - if one wants a notification then use fdb_delete() or 
__fdb_delete(true)
vs __fdb_delete(false) if a notification is not required. I think simply having 
the last
parameter everywhere for fdb_delete() shows the intention clearer and avoids 
another
fdb delete function.

Another point, the notify parameter has a confusing name in this context because
you're controlling the switchdev notifications not the rtnetlink ones. I'd 
suggest
changing the name to something more descriptive like swdev_notify, otherwise you
could get the funny end result of calling __fdb_notify() with notify == false 
which
to me means don't notify. :-)

Also please add the bridge maintainers to the CC list.

Thanks,
 Nik

Re: [PATCH net-next v3 6/6] mlxsw: spectrum_span: Allow bridge for gretap mirror

2018-04-29 Thread Nikolay Aleksandrov


On 29/04/18 10:56, Ido Schimmel wrote:

From: Petr Machata <pe...@mellanox.com>

When handling mirroring to a gretap or ip6gretap netdevice in mlxsw, the
underlay address (i.e. the remote address of the tunnel) may be routed
to a bridge.

In that case, look up the resolved neighbor Ethernet address in that
bridge's FDB. Then configure the offload to direct the mirrored traffic
to that port, possibly with tagging.

Signed-off-by: Petr Machata <pe...@mellanox.com>
Signed-off-by: Ido Schimmel <ido...@mellanox.com>
---
  .../net/ethernet/mellanox/mlxsw/spectrum_span.c| 95 --
  .../net/ethernet/mellanox/mlxsw/spectrum_span.h|  1 +
  2 files changed, 90 insertions(+), 6 deletions(-)



Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next v3 1/6] net: bridge: Publish bridge accessor functions

2018-04-29 Thread Nikolay Aleksandrov


On 29/04/18 10:56, Ido Schimmel wrote:

From: Petr Machata <pe...@mellanox.com>

Add a couple new functions to allow querying FDB and vlan settings of a
bridge.

Signed-off-by: Petr Machata <pe...@mellanox.com>
Signed-off-by: Ido Schimmel <ido...@mellanox.com>
---
  include/linux/if_bridge.h | 28 
  net/bridge/br_fdb.c   | 22 ++
  net/bridge/br_private.h   | 11 +++
  net/bridge/br_vlan.c  | 39 +++
  4 files changed, 100 insertions(+)



Looks good,
Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCHv2 net] bridge: check iface upper dev when setting master via ioctl

2018-04-28 Thread Nikolay Aleksandrov


On 27/04/18 15:59, Hangbin Liu wrote:

When we set a bond slave's master to bridge via ioctl, we only check
the IFF_BRIDGE_PORT flag. Although we will find the slave's real master
at netdev_master_upper_dev_link() later, it already does some settings
and allocates some resources. It would be better to return as early
as possible.

v1 -> v2:
use netdev_master_upper_dev_get() instead of netdev_has_any_upper_dev()
to check if we have a master, because not all upper devs are masters,
e.g. vlan device.

Reported-by: syzbot+de73361ee4971b6e6...@syzkaller.appspotmail.com
Signed-off-by: Hangbin Liu <liuhang...@gmail.com>
---
  net/bridge/br_if.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 82c1a6f..5bb6681 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -518,8 +518,8 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
return -ELOOP;
}
  
-	/* Device is already being bridged */

-   if (br_port_exists(dev))
+   /* Device has master upper dev */
+   if (netdev_master_upper_dev_get(dev))
return -EBUSY;
  
  	/* No bridging devices that dislike that (e.g. wireless) */




Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next v2 1/6] net: bridge: Publish bridge accessor functions

2018-04-27 Thread Nikolay Aleksandrov

On 27/04/18 19:08, Petr Machata wrote:
> Nikolay Aleksandrov <niko...@cumulusnetworks.com> writes:
> 
>> On 27/04/18 18:11, Ido Schimmel wrote:
>>> From: Petr Machata <pe...@mellanox.com>
>>>
>>> Add a couple new functions to allow querying FDB and vlan settings of a
>>> bridge.
>>>
>>> Signed-off-by: Petr Machata <pe...@mellanox.com>
>>> Signed-off-by: Ido Schimmel <ido...@mellanox.com>
>>> ---
>>>  include/linux/if_bridge.h | 28 
>>>  net/bridge/br_fdb.c   | 22 ++
>>>  net/bridge/br_private.h   | 11 +++
>>>  net/bridge/br_vlan.c  | 39 +++
>>>  4 files changed, 100 insertions(+)
>>>
>>
>> Thanks! This looks good to me although the new exported helpers could've
>> taken both bridge or port and return the result. Usually when adding a
>> port-only functions we name them with nbp_ prefix instead of br_.
>>
>> Anyway this can be done later since the API is internal,
> 
> The idea is that the API would accept both ports and bridges, but
> currently there's no user for the parts that I didn't code up--it would
> be a dead code. It's super simple to extend these if/when there are
> clients, e.g.:
> 
> modified   net/bridge/br_vlan.c
> @@ -1155,7 +1155,10 @@ int br_vlan_pvid_rtnl(const struct net_device *dev, 
> u16 *p_pvid)
>   struct net_bridge_vlan_group *vg;
>  
>   ASSERT_RTNL();
> - if (netif_is_bridge_master(dev))
> + p = br_port_get_check_rtnl(dev);
> + if (p)
> + vg = nbp_vlan_group(p);
> + else if (netif_is_bridge_master(dev))
>   vg = br_vlan_group(netdev_priv(dev));
>   else
>   return -EINVAL;
> 
> I can post a follow-up with the renames if you prefer that.
> 
> Thanks,
> Petr
> 

I know, that's why I said it can be done later. :-)  No need to do it
now, it was only a comment based on other accessors in the bridge code
and for completeness. I'm very happy with this version and have
acked/reviewed it.

Cheers,
 Nik

Re: [PATCH net-next v2 6/6] mlxsw: spectrum_span: Allow bridge for gretap mirror

2018-04-27 Thread Nikolay Aleksandrov

On 27/04/18 18:11, Ido Schimmel wrote:
> From: Petr Machata <pe...@mellanox.com>
> 
> When handling mirroring to a gretap or ip6gretap netdevice in mlxsw, the
> underlay address (i.e. the remote address of the tunnel) may be routed
> to a bridge.
> 
> In that case, look up the resolved neighbor Ethernet address in that
> bridge's FDB. Then configure the offload to direct the mirrored traffic
> to that port, possibly with tagging.
> 
> Signed-off-by: Petr Machata <pe...@mellanox.com>
> Signed-off-by: Ido Schimmel <ido...@mellanox.com>
> ---
>  .../net/ethernet/mellanox/mlxsw/spectrum_span.c| 95 
> --
>  .../net/ethernet/mellanox/mlxsw/spectrum_span.h|  1 +
>  2 files changed, 90 insertions(+), 6 deletions(-)
>

Looks good, thanks!

Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next v2 1/6] net: bridge: Publish bridge accessor functions

2018-04-27 Thread Nikolay Aleksandrov

On 27/04/18 18:11, Ido Schimmel wrote:
> From: Petr Machata <pe...@mellanox.com>
> 
> Add a couple new functions to allow querying FDB and vlan settings of a
> bridge.
> 
> Signed-off-by: Petr Machata <pe...@mellanox.com>
> Signed-off-by: Ido Schimmel <ido...@mellanox.com>
> ---
>  include/linux/if_bridge.h | 28 
>  net/bridge/br_fdb.c   | 22 ++
>  net/bridge/br_private.h   | 11 +++
>  net/bridge/br_vlan.c  | 39 +++
>  4 files changed, 100 insertions(+)
> 

Thanks! This looks good to me although the new exported helpers could've
taken both bridge or port and return the result. Usually when adding a
port-only functions we name them with nbp_ prefix instead of br_.

Anyway this can be done later since the API is internal,

Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net] bridge: check iface upper dev when setting master via ioctl

2018-04-27 Thread Nikolay Aleksandrov


On 27/04/18 04:31, Hangbin Liu wrote:

Hi Nikolay,

Thanks for the comments.
On Thu, Apr 26, 2018 at 05:22:46PM +0300, Nikolay Aleksandrov wrote:

Not all upper devs are masters. This can break some setups.


Ah, like vlan device.. So how about

+   if (netdev_master_upper_dev_get(dev))
return -EBUSY;


That should be fine, yes.








Also it's not really a bug, the device begins to get initialized but it
will get removed at netdev_master_upper_dev_link() anyway if there's
already a master. Why would it be better ?



It's clearly wrong to try and enslave a device that already has a master
via ioctl, rtnetlink already deals with that and the old ioctl interface
will get an error, yes it will initialize some structs but they'll get
freed later. This is common practice, check the bonding for example.


Bonding use netdev_is_rx_handler_busy(slave_dev) to check if the slave
already has a master, which is another solution.


Some masters don't use rx_handlers and the bonding fails at linking them
as a master which is still fine, it cleans up after the error like the bridge.



If anything do the check in the ioctl interface (add_del_if) only and
maybe target net-next, there's really no bug fix here. IMO it's not


What if someone do like

while true; do brctl addif br0 bond_slave &; done

I know this is stupid and almost no one will do that in real world.
But syzbot run some similar test and get warn from kobject_add_internal()
with -ENOMEM. That's why I think we should fix it before allocate any
resource.

What do you think?


The bridge code is only a symptom of what happened, that warn was placed to
warn people against doing stupid things - it was literally in the commit message
of some kobject patch. As long as the resources involved are cleaned up and it's
returned to the bridge to cleanup after itself, it should be fine.
 
You can add the check if you feel like it, I don't have an

objection against failing earlier. My main concern was the netdev_has_any_upper
usage which can break some setups.

Cheers,
 Nik



[1] 
https://syzkaller.appspot.com/bug?id=3e0339080acd6a2a350a900bc6533b03f5498490

Thanks
Hangbin

Re: [PATCH net-next] bridge: use hlist_entry_safe

2018-04-26 Thread Nikolay Aleksandrov

On 26/04/18 06:07, YueHaibing wrote:
> Use hlist_entry_safe() instead of open-coding it.
> 
> Signed-off-by: YueHaibing <yuehaib...@huawei.com>
> ---
>  net/bridge/br_forward.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index b4eed11..7a7fd67 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -274,8 +274,7 @@ void br_multicast_flood(struct net_bridge_mdb_entry *mdst,
>   struct net_bridge_port *port, *lport, *rport;
>  
>   lport = p ? p->port : NULL;
> - rport = rp ? hlist_entry(rp, struct net_bridge_port, rlist) :
> -  NULL;
> + rport = hlist_entry_safe(rp, struct net_bridge_port, rlist);
>  
>   if ((unsigned long)lport > (unsigned long)rport) {
>   port = lport;
> 

Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net] bridge: check iface upper dev when setting master via ioctl

2018-04-26 Thread Nikolay Aleksandrov

On 26/04/18 17:00, Nikolay Aleksandrov wrote:
> On 26/04/18 16:56, Hangbin Liu wrote:
>> When we set a bond slave's master to bridge via ioctl, we only check
>> the IFF_BRIDGE_PORT flag. Although we will find the slave's real master
>> at netdev_master_upper_dev_link() later, it already does some settings
>> and allocates some resources. So it would be better to return as early
>> as possible.
>>
>> Reported-by: syzbot+de73361ee4971b6e6...@syzkaller.appspotmail.com
>> Signed-off-by: Hangbin Liu <liuhang...@gmail.com>
>> ---
>>  net/bridge/br_if.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
>> index 82c1a6f..176de8a9 100644
>> --- a/net/bridge/br_if.c
>> +++ b/net/bridge/br_if.c
>> @@ -518,8 +518,8 @@ int br_add_if(struct net_bridge *br, struct net_device 
>> *dev,
>>  return -ELOOP;
>>  }
>>  
>> -/* Device is already being bridged */
>> -if (br_port_exists(dev))
>> +/* Device has master upper dev */
>> +if (netdev_has_any_upper_dev(dev))
>>  return -EBUSY;
>>  
>>  /* No bridging devices that dislike that (e.g. wireless) */
>>
> 
> Not all upper devs are masters. This can break some setups.
> 
> 

Also it's not really a bug, the device begins to get initialized but it
will get removed at netdev_master_upper_dev_link() anyway if there's
already a master. Why would it be better ?
It's clearly wrong to try and enslave a device that already has a master
via ioctl, rtnetlink already deals with that and the old ioctl interface
will get an error, yes it will initialize some structs but they'll get
freed later. This is common practice, check the bonding for example.

If anything do the check in the ioctl interface (add_del_if) only and
maybe target net-next, there's really no bug fix here. IMO it's not
needed even there, but it doesn't hurt either so up to you.

Thanks,
 Nik

Re: [PATCH net] bridge: check iface upper dev when setting master via ioctl

2018-04-26 Thread Nikolay Aleksandrov

On 26/04/18 16:56, Hangbin Liu wrote:
> When we set a bond slave's master to bridge via ioctl, we only check
> the IFF_BRIDGE_PORT flag. Although we will find the slave's real master
> at netdev_master_upper_dev_link() later, it already does some settings
> and allocates some resources. So it would be better to return as early
> as possible.
> 
> Reported-by: syzbot+de73361ee4971b6e6...@syzkaller.appspotmail.com
> Signed-off-by: Hangbin Liu 
> ---
>  net/bridge/br_if.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> index 82c1a6f..176de8a9 100644
> --- a/net/bridge/br_if.c
> +++ b/net/bridge/br_if.c
> @@ -518,8 +518,8 @@ int br_add_if(struct net_bridge *br, struct net_device 
> *dev,
>   return -ELOOP;
>   }
>  
> - /* Device is already being bridged */
> - if (br_port_exists(dev))
> + /* Device has master upper dev */
> + if (netdev_has_any_upper_dev(dev))
>   return -EBUSY;
>  
>   /* No bridging devices that dislike that (e.g. wireless) */
> 

Not all upper devs are masters. This can break some setups.

Re: WARNING: kobject bug in br_add_if

2018-04-26 Thread Nikolay Aleksandrov


On 26/04/18 14:49, Nikolay Aleksandrov wrote:

On 26/04/18 13:37, Hangbin Liu wrote:

On Thu, Apr 26, 2018 at 10:04:16AM +0200, Dmitry Vyukov wrote:

On Thu, Apr 26, 2018 at 8:13 AM, Hangbin Liu <liuhang...@gmail.com> wrote:

On Wed, Apr 11, 2018 at 05:18:23PM +0200, Dmitry Vyukov wrote:

On Wed, Apr 11, 2018 at 5:15 PM, syzbot
<syzbot+de73361ee4971b6e6...@syzkaller.appspotmail.com> wrote:

kobject_add_internal failed for brport (error: -12 parent: bond0)

[snip]


Re-checked the error. This is a -ENOMEM. So normally we could ignore it.

But on the other hand, although we could find out the slave iface's
master in netdev_master_upper_dev_link(). It already go much further
and allocate some resource and change iface state. e.g.

[54273.968516] br0: port 1(em1) entered blocking state
[54273.973979] br0: port 1(em1) entered disabled state

So I think we'd better return as early as possible. I will post a fix
for this.

Thanks
Hangbin


If I'm not mistaken the bridge allocated resources for the port are
cleaned on kobject_init_and_add() error return. Or are you talking
about some other resources ?



Ah, my bad - you weren't talking about resource freeing.
Nevermind my comment.

Re: WARNING: kobject bug in br_add_if

2018-04-26 Thread Nikolay Aleksandrov


On 26/04/18 13:37, Hangbin Liu wrote:

On Thu, Apr 26, 2018 at 10:04:16AM +0200, Dmitry Vyukov wrote:

On Thu, Apr 26, 2018 at 8:13 AM, Hangbin Liu  wrote:

On Wed, Apr 11, 2018 at 05:18:23PM +0200, Dmitry Vyukov wrote:

On Wed, Apr 11, 2018 at 5:15 PM, syzbot
 wrote:

kobject_add_internal failed for brport (error: -12 parent: bond0)

[snip]


Re-checked the error. This is a -ENOMEM. So normally we could ignore it.

But on the other hand, although we could find out the slave iface's
master in netdev_master_upper_dev_link(). It already go much further
and allocate some resource and change iface state. e.g.

[54273.968516] br0: port 1(em1) entered blocking state
[54273.973979] br0: port 1(em1) entered disabled state

So I think we'd better return as early as possible. I will post a fix
for this.

Thanks
Hangbin


If I'm not mistaken the bridge allocated resources for the port are
cleaned on kobject_init_and_add() error return. Or are you talking
about some other resources ?

Re: [PATCH net-next 6/6] mlxsw: spectrum_span: Allow bridge for gretap mirror

2018-04-26 Thread Nikolay Aleksandrov


On 26/04/18 12:06, Ido Schimmel wrote:

From: Petr Machata 

When handling mirroring to a gretap or ip6gretap netdevice in mlxsw, the
underlay address (i.e. the remote address of the tunnel) may be routed
to a bridge.

In that case, look up the resolved neighbor Ethernet address in that
bridge's FDB. Then configure the offload to direct the mirrored traffic
to that port, possibly with tagging.

Signed-off-by: Petr Machata 
Signed-off-by: Ido Schimmel 
---
  .../net/ethernet/mellanox/mlxsw/spectrum_span.c| 102 +++--
  .../net/ethernet/mellanox/mlxsw/spectrum_span.h|   1 +
  2 files changed, 97 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
index 65a77708ff61..92fb81839852 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
@@ -32,6 +32,7 @@
   * POSSIBILITY OF SUCH DAMAGE.
   */
  
+#include 

  #include 
  #include 
  #include 
@@ -39,8 +40,9 @@
  #include 
  
  #include "spectrum.h"

-#include "spectrum_span.h"
  #include "spectrum_ipip.h"
+#include "spectrum_span.h"
+#include "spectrum_switchdev.h"
  
  int mlxsw_sp_span_init(struct mlxsw_sp *mlxsw_sp)

  {
@@ -167,6 +169,79 @@ mlxsw_sp_span_entry_unoffloadable(struct 
mlxsw_sp_span_parms *sparmsp)
return 0;
  }
  
+static struct net_device *

+mlxsw_sp_span_entry_bridge_8021q(const struct net_device *br_dev,
+unsigned char *dmac,
+u16 *p_vid)
+{
+   struct net_bridge_vlan_group *vg = br_vlan_group_rtnl(br_dev);
+   u16 pvid = br_vlan_group_pvid(vg);
+   struct net_device *edev = NULL;
+   struct net_bridge_vlan *v;
+


Hi,
These structures are not really exported anywhere outside of br_private.h
Instead of passing them around and risking someone else actually trying to
dereference an incomplete type, why don't you add just 2 new helpers -
br_vlan_pvid(netdevice) and br_vlan_info(netdevice, vid, *bridge_vlan_info)

br_vlan_info can return the exported and already available type 
"bridge_vlan_info"
(defined in uapi/linux/if_bridge.h) into the bridge_vlan_info arg
and br_vlan_pvid is obvious - return the current dev pvid if available.

Another bonus you'll avoid dealing with the bridge's vlan internals.


+   if (pvid)
+   edev = br_fdb_find_port_hold(br_dev, dmac, pvid);
+   if (!edev)
+   return NULL;
+
+   /* RTNL prevents edev from being removed. */
+   dev_put(edev);


Minor point here - since this is always called in rtnl, the dev_hold/put dance 
isn't needed,
unless you plan to use the fdb_find_port_hold outside of rtnl of course. In 
case that is
not planned why don't you do br_fdb_find_port_rtnl() instead with RCU inside 
(thus not blocking
learning) and ASSERT_RTNL() to avoid some complexity ?

Thanks,
 Nik


+
+   vg = br_port_vlan_group_rtnl(edev);
+   v = br_vlan_find(vg, pvid);
+   if (!v)
+   return NULL;
+   if (!(br_vlan_flags(v) & BRIDGE_VLAN_INFO_UNTAGGED))
+   *p_vid = pvid;
+   return edev;
+}
+
+static struct net_device *
+mlxsw_sp_span_entry_bridge_8021d(const struct net_device *br_dev,
+unsigned char *dmac)
+{
+   struct net_device *edev = br_fdb_find_port_hold(br_dev, dmac, 0);
+
+   /* RTNL prevents edev from being removed. */
+   dev_put(edev);
+
+   return edev;
+}
+
+static struct net_device *
+mlxsw_sp_span_entry_bridge(const struct net_device *br_dev,
+  unsigned char dmac[ETH_ALEN],
+  u16 *p_vid)
+{
+   struct mlxsw_sp_bridge_port *bridge_port;
+   enum mlxsw_reg_spms_state spms_state;
+   struct mlxsw_sp_port *port;
+   struct net_device *dev;
+   u8 stp_state;
+
+   if (br_vlan_enabled(br_dev))
+   dev = mlxsw_sp_span_entry_bridge_8021q(br_dev, dmac, p_vid);
+   else
+   dev = mlxsw_sp_span_entry_bridge_8021d(br_dev, dmac);
+   if (!dev)
+   return NULL;
+
+   port = mlxsw_sp_port_dev_lower_find(dev);
+   if (!port)
+   return NULL;
+
+   bridge_port = mlxsw_sp_bridge_port_find(port->mlxsw_sp->bridge, dev);
+   if (!bridge_port)
+   return NULL;
+
+   stp_state = mlxsw_sp_bridge_port_stp_state(bridge_port);
+   spms_state = mlxsw_sp_stp_spms_state(stp_state);
+   if (spms_state != MLXSW_REG_SPMS_STATE_FORWARDING)
+   return NULL;
+
+   return dev;
+}
+
  static __maybe_unused int
  mlxsw_sp_span_entry_tunnel_parms_common(struct net_device *l3edev,
union mlxsw_sp_l3addr saddr,
@@ -177,13 +252,22 @@ mlxsw_sp_span_entry_tunnel_parms_common(struct net_device 
*l3edev,
struct mlxsw_sp_span_parms

Re: [RFC PATCH] net: bridge: multicast querier per VLAN support

2018-04-18 Thread Nikolay Aleksandrov

On April 18, 2018 6:54:07 PM GMT+03:00, Stephen Hemminger 
<step...@networkplumber.org> wrote:
>On Wed, 18 Apr 2018 16:14:26 +0300
>Nikolay Aleksandrov <niko...@cumulusnetworks.com> wrote:
>
>> On 18/04/18 16:07, Joachim Nilsson wrote:
>> > On Wed, Apr 18, 2018 at 03:31:57PM +0300, Nikolay Aleksandrov
>wrote:  
>> >> On 18/04/18 15:07, Joachim Nilsson wrote:  
>> >>> - First of all, is this patch useful to anyone  
>> >> Obviously to us as it's based on our patch. :-)
>> >> We actually recently discussed what will be needed to make it
>acceptable to upstream.  
>> > 
>> > Great! :)
>> >   
>> >>> - The current br_multicast.c is very complex.  The support for
>both IPv4
>> >>>and IPv6 is a no-brainer, but it also has #ifdef
>VLAN_FILTERING and
>> >>>'br->vlan_enabled' ... this has likely been discussed before,
>but if
>> >>>we could remove those code paths I believe what's left would
>be quite
>> >>>a bit easier to read and maintain.  
>> >> br->vlan_enabled has a wrapper that can be used without ifdefs, as
>does br_vlan_find()
>> >> so in short - you can remove the ifdefs and use the wrappers, 
>they'll degrade to always
>> >> false/null when vlans are disabled.  
>> > 
>> > Thanks, I'll have a look at that and prepare an RFC v2!
>> >   
>> >>> - Many per-bridge specific multicast sysfs settings may need to
>have a
>> >>>corresponding per-VLAN setting, e.g. snooping, query_interval,
>etc.
>> >>>How should we go about that? (For status reporting I have a
>proposal)  
>> >> We'll have to add more to the per-vlan context, but yes it has to
>happen.
>> >> It will be only netlink interface for config/retrieval, no sysfs. 
>
>> > 
>> > Some settings are possible to do with sysfs, like
>multicast_query_interval
>> > and ...  
>> 
>> We want to avoid sysfs in general, all of networking config and stats
>> are moving to netlink. It is better controlled and structured for
>such
>> changes, also provides nice interfaces for automatic  type checks
>etc.
>> 
>> Also (but a minor reason) there is no tree/entity in sysfs for the
>vlans
>> where to add this. It will either have to be a file which does some
>> format string hack (like us currently) or will need to add new tree
>for
>> them which I'd really like to avoid for the bridge.
>
>In general, all bridge attributes need to show in netlink and sysfs.
>Sysfs is easier for scripting from languages.

True, but vlans and per-vlan settings have never been exposed via sysfs, only 
through netlink.
I'd like to avoid adding a directory with potentially 4k multiplied by the attr 
number for each vlan entries.

There is already vlan config infrastructure via netlink.

Re: [RFC PATCH] net: bridge: multicast querier per VLAN support

2018-04-18 Thread Nikolay Aleksandrov

On 18/04/18 16:07, Joachim Nilsson wrote:
> On Wed, Apr 18, 2018 at 03:31:57PM +0300, Nikolay Aleksandrov wrote:
>> On 18/04/18 15:07, Joachim Nilsson wrote:
>>> - First of all, is this patch useful to anyone
>> Obviously to us as it's based on our patch. :-)
>> We actually recently discussed what will be needed to make it acceptable to 
>> upstream.
> 
> Great! :)
> 
>>> - The current br_multicast.c is very complex.  The support for both IPv4
>>>and IPv6 is a no-brainer, but it also has #ifdef VLAN_FILTERING and
>>>'br->vlan_enabled' ... this has likely been discussed before, but if
>>>we could remove those code paths I believe what's left would be quite
>>>a bit easier to read and maintain.
>> br->vlan_enabled has a wrapper that can be used without ifdefs, as does 
>> br_vlan_find()
>> so in short - you can remove the ifdefs and use the wrappers,  they'll 
>> degrade to always
>> false/null when vlans are disabled.
> 
> Thanks, I'll have a look at that and prepare an RFC v2!
> 
>>> - Many per-bridge specific multicast sysfs settings may need to have a
>>>corresponding per-VLAN setting, e.g. snooping, query_interval, etc.
>>>How should we go about that? (For status reporting I have a proposal)
>> We'll have to add more to the per-vlan context, but yes it has to happen.
>> It will be only netlink interface for config/retrieval, no sysfs.
> 
> Some settings are possible to do with sysfs, like multicast_query_interval
> and ...

We want to avoid sysfs in general, all of networking config and stats
are moving to netlink. It is better controlled and structured for such
changes, also provides nice interfaces for automatic  type checks etc.

Also (but a minor reason) there is no tree/entity in sysfs for the vlans
where to add this. It will either have to be a file which does some
format string hack (like us currently) or will need to add new tree for
them which I'd really like to avoid for the bridge.

> 
>>> - Dito per-port specific multicast sysfs settings, e.g. multicast_router
>> I'm not sure I follow this one, there is per-port mcast router config now ?
> 
> Sorry no, I meant we may want to add more per-VLAN settings when we get
> this base patch merged.  Like router ports, we may want to be able to
> set them per VLAN.

Sure, that can be done easily via netlink. br_afspec() can decode any
additional per-vlan attributes and can be fairly easily extended.
Also after my vlan rhastable change, we have per-vlan context even today
(e.g. per-vlan stats use it) so we'll just extend that.

> 
>> Thanks for the effort, I see that you have done some of the required cleanups
>> for this to be upstreamable, but as you've noted above we need to make it
>> complete (with the per-vlan contexts and all).
> 
> There's definitely more work to be done.  Agreeing on a base set of changes
> to start with is maybe the most important, as well as making it complete.>
>> I will review this patch in detail later and come back if there's anything.
> 
> Thank you so much for the quick feedback so far! :)
> 
> Cheers
>  /Joachim
>

Re: [RFC PATCH] net: bridge: multicast querier per VLAN support

2018-04-18 Thread Nikolay Aleksandrov


On 18/04/18 15:07, Joachim Nilsson wrote:

This RFC patch¹ is an attempt to add multicast querier per VLAN support
to a VLAN aware bridge.  I'm posting it as RFC for now since non-VLAN
aware bridges are not handled, and one of my questions is if that is
complexity we need to continue supporting?

 From what I understand, multicast join/report already support per VLAN
operation, and the MDB as well support filtering per VLAN, but queries
are currently limited to per-port operation on VLAN-aware bridges.

The naive² approach of this patch relocates query timers from the bridge
to operate per VLAN, on timer expiry we send queries to all bridge ports
in the same VLAN.  Tagged port members have tagged VLAN queries.

Unlike the original patch¹, which uses a sysfs entry to set the querier
address of each VLAN, this use the IP address of the VLAN interface when
initiating a per VLAN query.  A version of inet_select_addr() is used
for this, called inet_select_dev_addr(), not included in this patch.

Open questions/TODO:

- First of all, is this patch useful to anyone


Obviously to us as it's based on our patch. :-)
We actually recently discussed what will be needed to make it acceptable to 
upstream.


- The current br_multicast.c is very complex.  The support for both IPv4
   and IPv6 is a no-brainer, but it also has #ifdef VLAN_FILTERING and
   'br->vlan_enabled' ... this has likely been discussed before, but if
   we could remove those code paths I believe what's left would be quite
   a bit easier to read and maintain.


br->vlan_enabled has a wrapper that can be used without ifdefs, as does 
br_vlan_find()
so in short - you can remove the ifdefs and use the wrappers,  they'll degrade 
to always
false/null when vlans are disabled.


- Many per-bridge specific multicast sysfs settings may need to have a
   corresponding per-VLAN setting, e.g. snooping, query_interval, etc.
   How should we go about that? (For status reporting I have a proposal)


We'll have to add more to the per-vlan context, but yes it has to happen.
It will be only netlink interface for config/retrieval, no sysfs.


- Dito per-port specific multicast sysfs settings, e.g. multicast_router


I'm not sure I follow this one, there is per-port mcast router config now ?
Take a look at br_multicast_set_port_router().


- The MLD support has been kept in sync with the rest but is completely
   untested.  In particular I suspect the wrong source IP will be used.

¹) Initially based on a patch by Cumulus Networks

http://repo3.cumulusnetworks.com/repo/pool/cumulus/l/linux/linux-source-4.1_4.1.33-1+cl3u11_all.deb


I knew this looked familiar when I glanced through it :)


²) This patch is currently limited to work only on bridges with VLAN
enabled.  Care has been taken to support MLD snooping, but it is
completely untested.

Thank you for reading this far!

Signed-off-by: Joachim Nilsson 


Thanks for the effort, I see that you have done some of the required cleanups
for this to be upstreamable, but as you've noted above we need to make it
complete (with the per-vlan contexts and all).

I will review this patch in detail later and come back if there's anything.

Cheers,
 Nik


---
  net/bridge/br_device.c|   2 +-
  net/bridge/br_input.c |   2 +-
  net/bridge/br_multicast.c | 456 --
  net/bridge/br_private.h   |  38 +++-
  net/bridge/br_stp.c   |   5 +-
  net/bridge/br_vlan.c  |   3 +
  6 files changed, 327 insertions(+), 179 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 02f9f8aab047..ba35485032d8 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -98,7 +98,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct 
net_device *dev)
  
  		mdst = br_mdb_get(br, skb, vid);

if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
-   br_multicast_querier_exists(br, eth_hdr(skb)))
+   br_multicast_querier_exists(br, vid, eth_hdr(skb)))
br_multicast_flood(mdst, skb, false, true);
else
br_flood(br, skb, BR_PKT_MULTICAST, false, true);
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 56bb9189c374..13d48489e0e1 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -137,7 +137,7 @@ int br_handle_frame_finish(struct net *net, struct sock 
*sk, struct sk_buff *skb
mdst = br_mdb_get(br, skb, vid);
if ((mdst && mdst->addr.proto == htons(ETH_P_ALL)) ||
((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
-br_multicast_querier_exists(br, eth_hdr(skb {
+br_multicast_querier_exists(br, vid, eth_hdr(skb {
if ((mdst && mdst->host_joined) ||
br_multicast_is_router(br)) {
local_rcv = true;
diff --git a/net/bridge/br_multicast.c

Re: [PATCH] net: bridge: add missing NULL checks

2018-04-08 Thread Nikolay Aleksandrov


On 08/04/18 20:49, Laszlo Toth wrote:

br_port_get_rtnl() can return NULL

Signed-off-by: Laszlo Toth <lasz...@gmail.com>
---
  net/bridge/br_netlink.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)



Nacked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
More below.


diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 015f465c..cbec11f 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -939,14 +939,17 @@ static int br_port_slave_changelink(struct net_device 
*brdev,
struct nlattr *data[],
struct netlink_ext_ack *extack)
  {
+   struct net_bridge_port *port = br_port_get_rtnl(dev);
struct net_bridge *br = netdev_priv(brdev);
int ret;
  
  	if (!data)

return 0;
+   if (!port)
+   return -EINVAL;
  


If we're here, it means the master device of dev is a bridge => dev is a bridge 
port,
since we're running with RTNL that cannot change, so this check is unnecessary.

Have you actually hit a bug with this code ?


spin_lock_bh(>lock);
-   ret = br_setport(br_port_get_rtnl(dev), data);
+   ret = br_setport(port, data);
spin_unlock_bh(>lock);
  
  	return ret;

@@ -956,7 +959,12 @@ static int br_port_fill_slave_info(struct sk_buff *skb,
   const struct net_device *brdev,
   const struct net_device *dev)
  {
-   return br_port_fill_attrs(skb, br_port_get_rtnl(dev));
+   struct net_bridge_port *port = br_port_get_rtnl(dev);
+
+   if (!port)
+   return -EINVAL;
+
+   return br_port_fill_attrs(skb, port);


Same rationale here, fill_slave_info is called via a master device's ops
under RTNL, which means dev is a bridge port and that also cannot change.

If you have hit a bug with this code, can we see the trace ?
The problem might be elsewhere.

Thanks,
 Nik


  }
  
  static size_t br_port_get_slave_size(const struct net_device *brdev,

Re: [PATCH] net: bond: skip vlan header when do layer 3+4 hash policy

2018-03-31 Thread Nikolay Aleksandrov


On 31/03/18 12:14, liujia...@huawei.com wrote:

From: liujian 

When the hash policy is BOND_XMIT_POLICY_LAYER34 mode and skb protocol
is 802.1q VLAN, the policy will be degenerated to LAYER2 mode; Now,
change it to get the next layer protocol to ensure that it worked in
BOND_XMIT_POLICY_LAYER34 mode.

Signed-off-by: liujian 
---
  drivers/net/bonding/bond_main.c | 11 ---
  1 file changed, 8 insertions(+), 3 deletions(-)



Nak
Use BOND_XMIT_POLICY_ENCAP34 (encap3+4), that was one of the main reasons it 
was added.

[PATCH net-next 2/2] net: bridge: disable bridge MTU auto tuning if it was set manually

2018-03-30 Thread Nikolay Aleksandrov

As Roopa noted today the biggest source of problems when configuring
bridge and ports is that the bridge MTU keeps changing automatically on
port events (add/del/changemtu). That leads to inconsistent behaviour
and network config software needs to chase the MTU and fix it on each
such event. Let's improve on that situation and allow for the user to
set any MTU within ETH_MIN/MAX limits, but once manually configured it
is the user's responsibility to keep it correct afterwards.

In case the MTU isn't manually set - the behaviour reverts to the
previous and the bridge follows the minimum MTU.

Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
---
 net/bridge/br.c |  2 +-
 net/bridge/br_device.c  |  5 ++---
 net/bridge/br_if.c  | 36 +---
 net/bridge/br_private.h |  3 ++-
 4 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/net/bridge/br.c b/net/bridge/br.c
index 565ff055813b..671d13c10f6f 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -52,7 +52,7 @@ static int br_device_event(struct notifier_block *unused, 
unsigned long event, v
 
switch (event) {
case NETDEV_CHANGEMTU:
-   dev_set_mtu(br->dev, br_mtu(br, false));
+   br_mtu_auto_adjust(br);
break;
 
case NETDEV_CHANGEADDR:
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index edb9967eb165..e682a668ce57 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -225,11 +225,10 @@ static int br_change_mtu(struct net_device *dev, int 
new_mtu)
 {
struct net_bridge *br = netdev_priv(dev);
 
-   if (new_mtu > br_mtu(br, br_vlan_enabled(dev)))
-   return -EINVAL;
-
dev->mtu = new_mtu;
 
+   /* this flag will be cleared if the MTU was automatically adjusted */
+   br->mtu_set_by_user = true;
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
/* remember the MTU in the rtable for PMTU */
dst_metric_set(>fake_rtable.dst, RTAX_MTU, new_mtu);
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 7d5dc5a91084..82c1a6f430b3 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -424,28 +424,34 @@ int br_del_bridge(struct net *net, const char *name)
return ret;
 }
 
-/* MTU of the bridge pseudo-device: ETH_DATA_LEN if there are no ports, the
- * minimum of the ports if @max is false or the maximum if it's true
- */
-int br_mtu(const struct net_bridge *br, bool max)
+/* MTU of the bridge pseudo-device: ETH_DATA_LEN or the minimum of the ports */
+static int br_mtu_min(const struct net_bridge *br)
 {
const struct net_bridge_port *p;
int ret_mtu = 0;
 
-   ASSERT_RTNL();
-
-   list_for_each_entry(p, >port_list, list) {
-   if (!max) {
-   if (!ret_mtu || ret_mtu > p->dev->mtu)
-   ret_mtu = p->dev->mtu;
-   } else if (p->dev->mtu > ret_mtu) {
+   list_for_each_entry(p, >port_list, list)
+   if (!ret_mtu || ret_mtu > p->dev->mtu)
ret_mtu = p->dev->mtu;
-   }
-   }
 
return ret_mtu ? ret_mtu : ETH_DATA_LEN;
 }
 
+void br_mtu_auto_adjust(struct net_bridge *br)
+{
+   ASSERT_RTNL();
+
+   /* if the bridge MTU was manually configured don't mess with it */
+   if (br->mtu_set_by_user)
+   return;
+
+   /* change to the minimum MTU and clear the flag which was set by
+* the bridge ndo_change_mtu callback
+*/
+   dev_set_mtu(br->dev, br_mtu_min(br));
+   br->mtu_set_by_user = false;
+}
+
 static void br_set_gso_limits(struct net_bridge *br)
 {
unsigned int gso_max_size = GSO_MAX_SIZE;
@@ -597,7 +603,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
if (changed_addr)
call_netdevice_notifiers(NETDEV_CHANGEADDR, br->dev);
 
-   dev_set_mtu(br->dev, br_mtu(br, false));
+   br_mtu_auto_adjust(br);
br_set_gso_limits(br);
 
kobject_uevent(>kobj, KOBJ_ADD);
@@ -644,7 +650,7 @@ int br_del_if(struct net_bridge *br, struct net_device *dev)
 */
del_nbp(p);
 
-   dev_set_mtu(br->dev, br_mtu(br, false));
+   br_mtu_auto_adjust(br);
br_set_gso_limits(br);
 
spin_lock_bh(>lock);
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 586f84b9670d..a7cb3ece5031 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -410,6 +410,7 @@ struct net_bridge {
int offload_fwd_mark;
 #endif
boolneigh_suppress_enabled;
+   boolmtu_set_by_user;
struct hlist_head   fdb_list;
 };
 
@@ -578,7 +579,7 @@ int br_del_bridge(struct net *net, const char *name);
 int br_add_if(struct net_bridge *br, struct net_device *dev,
  struct netlink_ext_ack *extack

[PATCH net-next 1/2] net: bridge: set min MTU on port events and allow user to set max

2018-03-30 Thread Nikolay Aleksandrov

Recently the bridge was changed to automatically set maximum MTU on port
events (add/del/changemtu) when vlan filtering is enabled, but that
actually changes behaviour in a way which breaks some setups and can lead
to packet drops. In order to still allow that maximum to be set while being
compatible, we add the ability for the user to tune the bridge MTU up to
the maximum when vlan filtering is enabled, but that has to be done
explicitly and all port events (add/del/changemtu) lead to resetting that
MTU to the minimum as before.

Suggested-by: Roopa Prabhu <ro...@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
---
 net/bridge/br.c |  2 +-
 net/bridge/br_device.c  |  3 ++-
 net/bridge/br_if.c  | 43 ++-
 net/bridge/br_private.h |  2 +-
 4 files changed, 18 insertions(+), 32 deletions(-)

diff --git a/net/bridge/br.c b/net/bridge/br.c
index 26e1616b2c90..565ff055813b 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -52,7 +52,7 @@ static int br_device_event(struct notifier_block *unused, 
unsigned long event, v
 
switch (event) {
case NETDEV_CHANGEMTU:
-   dev_set_mtu(br->dev, br_mtu(br));
+   dev_set_mtu(br->dev, br_mtu(br, false));
break;
 
case NETDEV_CHANGEADDR:
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 278fc999d355..edb9967eb165 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -224,7 +224,8 @@ static void br_get_stats64(struct net_device *dev,
 static int br_change_mtu(struct net_device *dev, int new_mtu)
 {
struct net_bridge *br = netdev_priv(dev);
-   if (new_mtu > br_mtu(br))
+
+   if (new_mtu > br_mtu(br, br_vlan_enabled(dev)))
return -EINVAL;
 
dev->mtu = new_mtu;
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 87b2afd455c7..7d5dc5a91084 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -424,41 +424,26 @@ int br_del_bridge(struct net *net, const char *name)
return ret;
 }
 
-static bool min_mtu(int a, int b)
-{
-   return a < b ? 1 : 0;
-}
-
-static bool max_mtu(int a, int b)
-{
-   return a > b ? 1 : 0;
-}
-
-/* MTU of the bridge pseudo-device: ETH_DATA_LEN or the minimum of the ports */
-static int __br_mtu(const struct net_bridge *br, bool (compare_fn)(int, int))
+/* MTU of the bridge pseudo-device: ETH_DATA_LEN if there are no ports, the
+ * minimum of the ports if @max is false or the maximum if it's true
+ */
+int br_mtu(const struct net_bridge *br, bool max)
 {
const struct net_bridge_port *p;
-   int mtu = 0;
+   int ret_mtu = 0;
 
ASSERT_RTNL();
 
-   if (list_empty(>port_list))
-   mtu = ETH_DATA_LEN;
-   else {
-   list_for_each_entry(p, >port_list, list) {
-   if (!mtu || compare_fn(p->dev->mtu, mtu))
-   mtu = p->dev->mtu;
+   list_for_each_entry(p, >port_list, list) {
+   if (!max) {
+   if (!ret_mtu || ret_mtu > p->dev->mtu)
+   ret_mtu = p->dev->mtu;
+   } else if (p->dev->mtu > ret_mtu) {
+   ret_mtu = p->dev->mtu;
}
}
-   return mtu;
-}
 
-int br_mtu(const struct net_bridge *br)
-{
-   if (br_vlan_enabled(br->dev))
-   return __br_mtu(br, max_mtu);
-   else
-   return __br_mtu(br, min_mtu);
+   return ret_mtu ? ret_mtu : ETH_DATA_LEN;
 }
 
 static void br_set_gso_limits(struct net_bridge *br)
@@ -612,7 +597,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
if (changed_addr)
call_netdevice_notifiers(NETDEV_CHANGEADDR, br->dev);
 
-   dev_set_mtu(br->dev, br_mtu(br));
+   dev_set_mtu(br->dev, br_mtu(br, false));
br_set_gso_limits(br);
 
kobject_uevent(>kobj, KOBJ_ADD);
@@ -659,7 +644,7 @@ int br_del_if(struct net_bridge *br, struct net_device *dev)
 */
del_nbp(p);
 
-   dev_set_mtu(br->dev, br_mtu(br));
+   dev_set_mtu(br->dev, br_mtu(br, false));
br_set_gso_limits(br);
 
spin_lock_bh(>lock);
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 048d5b51813b..586f84b9670d 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -578,7 +578,7 @@ int br_del_bridge(struct net *net, const char *name);
 int br_add_if(struct net_bridge *br, struct net_device *dev,
  struct netlink_ext_ack *extack);
 int br_del_if(struct net_bridge *br, struct net_device *dev);
-int br_mtu(const struct net_bridge *br);
+int br_mtu(const struct net_bridge *br, bool max);
 netdev_features_t br_features_recompute(struct net_bridge *br,
netdev_features_t features);
 void br_port_flags_change(struct net_bridge_port *port, unsigned long mask);
-- 
2.11.0

[PATCH net-next 0/2] net: bridge: MTU handling changes

2018-03-30 Thread Nikolay Aleksandrov

Hi,
As previously discussed the recent changes break some setups and could lead
to packet drops. Thus the first patch reverts the behaviour for the bridge
to follow the minimum MTU but also keeps the ability to set the MTU to the
maximum (out of all ports) if vlan filtering is enabled. Patch 02 is the
bigger change in behaviour - we've always had trouble when configuring
bridges and their MTU which is auto tuning on port events
(add/del/changemtu), which means config software needs to chase it and fix
it after each such event, after patch 02 we allow the user to configure any
MTU (ETH_MIN/MAX limited) but once that is done the bridge stops auto
tuning and relies on the user to keep the MTU correct.
This should be compatible with cases that don't touch the MTU (or set it
to the same value), while allowing to configure the MTU and not worry
about it changing afterwards.

The patches are intentionally split like this, so that if they get accepted
and there are any complaints patch 02 can be reverted.

Thanks,
 Nik

Nikolay Aleksandrov (2):
  net: bridge: set min MTU on port events and allow user to set max
  net: bridge: disable bridge MTU auto tuning if it was set manually

 net/bridge/br.c |  2 +-
 net/bridge/br_device.c  |  4 ++--
 net/bridge/br_if.c  | 49 -
 net/bridge/br_private.h |  3 ++-
 4 files changed, 25 insertions(+), 33 deletions(-)

-- 
2.11.0

Re: [bpf-next PATCH] net: br_vlan build error

2018-03-27 Thread Nikolay Aleksandrov

On 27/03/18 18:38, John Fastabend wrote:
> Fix build error in br_if.c
> 
> net/bridge/br_if.c: In function ‘br_mtu’:
> net/bridge/br_if.c:458:8: error: ‘const struct net_bridge’ has no member 
> named ‘vlan_enabled’
>   if (br->vlan_enabled)
> ^
> net/bridge/br_if.c:462:1: warning: control reaches end of non-void function 
> [-Wreturn-type]
>  }
>  ^
> Fixes: 419d14af9e07f ("bridge: Allow max MTU when multiple VLANs present")
> Signed-off-by: John Fastabend <john.fastab...@gmail.com>
> ---
>  net/bridge/br_if.c |   27 +--
>  1 file changed, 17 insertions(+), 10 deletions(-)
> 

I'm not sure what the rules about merging are, but just in case I
already fixed this in net-next couple of days ago.

commit 82792a070b16
Author: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
Date:   Fri Mar 23 18:27:06 2018 +0200

net: bridge: fix direct access to bridge vlan_enabled and use helper

https://patchwork.ozlabs.org/patch/890043/


Cheers,
 Nik

Re: [PATCH net 0/3] bonding: a bunch of fixes for dev hwaddr sync in bond_enslave

2018-03-26 Thread Nikolay Aleksandrov


On 25/03/18 20:16, Xin Long wrote:

This patchset is mainly to fix a crash when adding vlan as slave of
bond which is also the parent link in patch 2/3,  and also fix some
err process problems in bond_enslave in patch 1/3 and 3/3.

Xin Long (3):
   bonding: fix the err path for dev hwaddr sync in bond_enslave
   bonding: move dev_mc_sync after master_upper_dev_link in bond_enslave
   bonding: process the err returned by dev_set_allmulti properly in
 bond_enslave

  drivers/net/bonding/bond_main.c | 73 +
  1 file changed, 37 insertions(+), 36 deletions(-)



I have no objections to the patches but you've missed to add the bonding 
maintainers
to the CC list (added now and removed the bouncing emails).

Thanks,
 Nik

Re: [PATCH net 1/3] bonding: fix the err path for dev hwaddr sync in bond_enslave

2018-03-26 Thread Nikolay Aleksandrov


On 25/03/18 20:16, Xin Long wrote:

vlan_vids_add_by_dev is called right after dev hwaddr sync, so on
the err path it should unsync dev hwaddr. Otherwise, the slave
dev's hwaddr will never be unsync when this err happens.

Fixes: 1ff412ad7714 ("bonding: change the bond's vlan syncing functions with the 
standard ones")
Signed-off-by: Xin Long <lucien@gmail.com>
---
  drivers/net/bonding/bond_main.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)



Seems I've missed the err path back then.
Thanks,
Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

[PATCH net-next] net: bridge: fix direct access to bridge vlan_enabled and use helper

2018-03-23 Thread Nikolay Aleksandrov

We need to use br_vlan_enabled() helper otherwise we'll break builds
without bridge vlans:
net/bridge//br_if.c: In function ‘br_mtu’:
net/bridge//br_if.c:458:8: error: ‘const struct net_bridge’ has no
member named ‘vlan_enabled’
  if (br->vlan_enabled)
^
net/bridge//br_if.c:462:1: warning: control reaches end of non-void
function [-Wreturn-type]
 }
 ^
scripts/Makefile.build:324: recipe for target 'net/bridge//br_if.o'
failed

Fixes: 419d14af9e07 ("bridge: Allow max MTU when multiple VLANs present")
Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
---
 net/bridge/br_if.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 48dc4d2e2be3..87b2afd455c7 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -455,7 +455,7 @@ static int __br_mtu(const struct net_bridge *br, bool 
(compare_fn)(int, int))
 
 int br_mtu(const struct net_bridge *br)
 {
-   if (br->vlan_enabled)
+   if (br_vlan_enabled(br->dev))
return __br_mtu(br, max_mtu);
else
return __br_mtu(br, min_mtu);
-- 
2.1.4

Re: [PATCH net-next] bridge: Allow max MTU when multiple VLANs present

2018-03-23 Thread Nikolay Aleksandrov

On 23/03/18 18:17, David Miller wrote:
> From: Chas Williams <3ch...@gmail.com>
> Date: Thu, 22 Mar 2018 11:34:06 -0400
> 
>> If the bridge is allowing multiple VLANs, some VLANs may have
>> different MTUs.  Instead of choosing the minimum MTU for the
>> bridge interface, choose the maximum MTU of the bridge members.
>> With this the user only needs to set a larger MTU on the member
>> ports that are participating in the large MTU VLANS.
>>
>> Signed-off-by: Chas Williams <3ch...@gmail.com>
> 
> Applied, thanks.
> 

Argh, this will break on builds without vlans because br->vlan_enabled shouldn't
be accessed directly. I missed that when reviewing.
I'll send a follow up fix in a second that uses br_vlan_enabled().

Re: [PATCH net-next] bridge: Allow max MTU when multiple VLANs present

2018-03-22 Thread Nikolay Aleksandrov

On 22/03/18 17:34, Chas Williams wrote:
> If the bridge is allowing multiple VLANs, some VLANs may have
> different MTUs.  Instead of choosing the minimum MTU for the
> bridge interface, choose the maximum MTU of the bridge members.
> With this the user only needs to set a larger MTU on the member
> ports that are participating in the large MTU VLANS.
> 
> Signed-off-by: Chas Williams <3ch...@gmail.com>
> ---
>  net/bridge/br.c |  2 +-
>  net/bridge/br_device.c  |  2 +-
>  net/bridge/br_if.c  | 26 ++
>  net/bridge/br_private.h |  2 +-
>  4 files changed, 25 insertions(+), 7 deletions(-)
> 

Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH v3 net-next 09/10] net: Remove unused get_hash_from_flow functions

2018-03-02 Thread Nikolay Aleksandrov

On 02/03/18 18:32, David Ahern wrote:
> __get_hash_from_flowi6 is still used for flowlabels, but the IPv4
> variant and the wrappers to both are not used. Remove them.
> 
> Signed-off-by: David Ahern <dsah...@gmail.com>
> Reviewed-by: Ido Schimmel <ido...@mellanox.com>
> ---
>  include/net/flow.h| 16 
>  net/core/flow_dissector.c | 16 
>  2 files changed, 32 deletions(-)
> 

Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH v3 net-next 07/10] net/ipv6: Add support for path selection using hash of 5-tuple

2018-03-02 Thread Nikolay Aleksandrov

On 02/03/18 18:32, David Ahern wrote:
> Some operators prefer IPv6 path selection to use a standard 5-tuple
> hash rather than just an L3 hash with the flow the label. To that end
> add support to IPv6 for multipath hash policy similar to bf4e0a3db97eb
> ("net: ipv4: add support for ECMP hash policy choice"). The default
> is still L3 which covers source and destination addresses along with
> flow label and IPv6 protocol.
> 
> Signed-off-by: David Ahern <dsah...@gmail.com>
> Reviewed-by: Ido Schimmel <ido...@mellanox.com>
> Tested-by: Ido Schimmel <ido...@mellanox.com>
> ---
>  Documentation/networking/ip-sysctl.txt |  7 
>  include/net/ip6_route.h|  4 +-
>  include/net/netevent.h |  1 +
>  include/net/netns/ipv6.h   |  1 +
>  net/ipv6/icmp.c|  2 +-
>  net/ipv6/route.c   | 68 
> ++
>  net/ipv6/sysctl_net_ipv6.c | 27 ++
>  7 files changed, 91 insertions(+), 19 deletions(-)
> 

LGTM,

Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH v3 net-next 06/10] net/ipv6: Pass skb to route lookup

2018-03-02 Thread Nikolay Aleksandrov

On 02/03/18 18:32, David Ahern wrote:
> IPv6 does path selection for multipath routes deep in the lookup
> functions. The next patch adds L4 hash option and needs the skb
> for the forward path. To get the skb to the relevant FIB lookup
> functions it needs to go through the fib rules layer, so add a
> lookup_data argument to the fib_lookup_arg struct.
> 
> Signed-off-by: David Ahern <dsah...@gmail.com>
> Reviewed-by: Ido Schimmel <ido...@mellanox.com>
> ---
>  drivers/infiniband/core/cma.c  |  2 +-
>  drivers/net/ipvlan/ipvlan_core.c   |  3 +-
>  drivers/net/vrf.c  |  7 ++--
>  include/net/fib_rules.h|  1 +
>  include/net/ip6_fib.h  |  4 ++-
>  include/net/ip6_route.h| 11 +++---
>  net/ipv6/anycast.c |  2 +-
>  net/ipv6/fib6_rules.c  |  8 +++--
>  net/ipv6/icmp.c|  3 +-
>  net/ipv6/ip6_fib.c |  3 +-
>  net/ipv6/ip6_gre.c |  2 +-
>  net/ipv6/ip6_tunnel.c  |  4 +--
>  net/ipv6/ip6_vti.c |  2 +-
>  net/ipv6/mcast.c   |  4 +--
>  net/ipv6/netfilter/ip6t_rpfilter.c |  2 +-
>  net/ipv6/netfilter/nft_fib_ipv6.c  |  3 +-
>  net/ipv6/route.c   | 72 
> +++---
>  net/ipv6/seg6_local.c  |  4 +--
>  18 files changed, 83 insertions(+), 54 deletions(-)
> 

Indeed, I remember this requirement back when I was making the IPv4
changes and looked into how IPv6 is handled, skb wasn't passed down
until now. Looks good to me!

Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH v3 net-next 05/10] net: Rename NETEVENT_MULTIPATH_HASH_UPDATE

2018-03-02 Thread Nikolay Aleksandrov

On 02/03/18 18:32, David Ahern wrote:
> Rename NETEVENT_MULTIPATH_HASH_UPDATE to
> NETEVENT_IPV4_MPATH_HASH_UPDATE to denote it relates to a change
> in the IPv4 hash policy.
> 
> Signed-off-by: David Ahern <dsah...@gmail.com>
> Reviewed-by: Ido Schimmel <ido...@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 2 +-
>  include/net/netevent.h| 2 +-
>  net/ipv4/sysctl_net_ipv4.c| 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 

Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH v3 net-next 04/10] net/ipv6: Make rt6_multipath_hash similar to fib_multipath_hash

2018-03-02 Thread Nikolay Aleksandrov

On 02/03/18 18:32, David Ahern wrote:
> Make rt6_multipath_hash more of a direct parallel to fib_multipath_hash
> and reduce stack and overhead in the process: get_hash_from_flowi6 is
> just a wrapper around __get_hash_from_flowi6 with another stack
> allocation for flow_keys. Move setting the addresses, protocol and
> label into rt6_multipath_hash and allow it to make the call to
> flow_hash_from_keys.
> 
> Signed-off-by: David Ahern <dsah...@gmail.com>
> Reviewed-by: Ido Schimmel <ido...@mellanox.com>
> ---
>  net/ipv6/route.c | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 

Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH v3 net-next 03/10] net/ipv4: Simplify fib_multipath_hash with optional flow keys

2018-03-02 Thread Nikolay Aleksandrov

On 02/03/18 18:32, David Ahern wrote:
> As of commit e37b1e978bec5 ("ipv6: route: dissect flow in input path if
> fib rules need it") fib_multipath_hash takes an optional flow keys. If
> non-NULL it means the skb has already been dissected. If not set, then
> fib_multipath_hash needs to call skb_flow_dissect_flow_keys.
> 
> Simplify the logic by setting flkeys to the local stack variable keys.
> Simplifies fib_multipath_hash by only have 1 set of instructions
> setting hash_keys.
> 
> Signed-off-by: David Ahern <dsah...@gmail.com>
> Reviewed-by: Ido Schimmel <ido...@mellanox.com>
> ---
>  net/ipv4/route.c | 24 ++--
>  1 file changed, 10 insertions(+), 14 deletions(-)
> 

Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH v3 net-next 02/10] net: Align ip_multipath_l3_keys and ip6_multipath_l3_keys

2018-03-02 Thread Nikolay Aleksandrov

On 02/03/18 18:32, David Ahern wrote:
> Symmetry is good and allows easy comparison that ipv4 and ipv6 are
> doing the same thing. To that end, change ip_multipath_l3_keys to
> set addresses at the end after the icmp compares, and move the
> initialization of ipv6 flow keys to rt6_multipath_hash.
> 
> Signed-off-by: David Ahern <dsah...@gmail.com>
> Reviewed-by: Ido Schimmel <ido...@mellanox.com>
> ---
>  net/ipv4/route.c | 20 +++-
>  net/ipv6/route.c |  4 ++--
>  2 files changed, 13 insertions(+), 11 deletions(-)
>

Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH v3 net-next 01/10] net/ipv4: Pass net to fib_multipath_hash instead of fib_info

2018-03-02 Thread Nikolay Aleksandrov

On 02/03/18 18:32, David Ahern wrote:
> fib_multipath_hash only needs net struct to check a sysctl. Make it
> clear by passing net instead of fib_info. In the end this allows
> alignment between the ipv4 and ipv6 versions.
> 
> Signed-off-by: David Ahern <dsah...@gmail.com>
> ---
>  include/net/ip_fib.h | 2 +-
>  net/ipv4/fib_semantics.c | 2 +-
>  net/ipv4/route.c | 5 ++---
>  3 files changed, 4 insertions(+), 5 deletions(-)
> 

Reviewed-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH v2 net-next 06/11] ipmr, ip6mr: Make mfc_cache a common structure

2018-02-28 Thread Nikolay Aleksandrov

On 28/02/18 23:29, Yuval Mintz wrote:
> mfc_cache and mfc6_cache are almost identical - the main difference is
> in the origin/group addresses and comparison-key. Make a common
> structure encapsulating most of the multicast routing logic  - mr_mfc
> and convert both ipmr and ip6mr into using it.
> 
> For easy conversion [casting, in this case] mr_mfc has to be the first
> field inside every multicast routing abstraction utilizing it.
> 
> Signed-off-by: Yuval Mintz <yuv...@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c |  21 +-
>  include/linux/mroute.h|  45 +---
>  include/linux/mroute6.h   |  23 +-
>  include/linux/mroute_base.h   |  45 
>  net/ipv4/ipmr.c   | 233 ++--
>  net/ipv6/ip6mr.c  | 248 
> +++---
>  6 files changed, 312 insertions(+), 303 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH v2 net-next 07/11] ipmr, ip6mr: Unite logic for searching in MFC cache

2018-02-28 Thread Nikolay Aleksandrov

On 28/02/18 23:29, Yuval Mintz wrote:
> ipmr and ip6mr utilize the exact same methods for searching the
> hashed resolved connections, difference being only in the construction
> of the hash comparison key.
> 
> In order to unite the flow, introduce an mr_table operation set that
> would contain the protocol specific information required for common
> flows, in this case - the hash parameters and a comparison key
> representing a (*,*) route.
> 
> Signed-off-by: Yuval Mintz <yuv...@mellanox.com>
> ---
>  include/linux/mroute_base.h | 52 +--
>  net/ipv4/ipmr.c | 71 ++-
>  net/ipv4/ipmr_base.c| 54 +++--
>  net/ipv6/ip6mr.c| 74 
> +++--
>  4 files changed, 134 insertions(+), 117 deletions(-)
> 

Thanks!

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next v2 5/5] ipv6: route: dissect flow in input path if fib rules need it

2018-02-28 Thread Nikolay Aleksandrov

On 28/02/18 05:52, Roopa Prabhu wrote:
> From: Roopa Prabhu <ro...@cumulusnetworks.com>
> 
> Dissect flow in fwd path if fib rules require it. Controlled by
> a flag to avoid penatly for the common case. Flag is set when fib
> rules with sport, dport and proto match that require flow dissect
> are installed. Also passes the dissected hash keys to the multipath
> hash function when applicable to avoid dissecting the flow again.
> icmp packets will continue to use inner header for hash
> calculations.
> 
> Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
> ---
>  include/net/ip6_fib.h| 25 +
>  include/net/ip6_route.h  |  4 +++-
>  include/net/netns/ipv6.h |  3 ++-
>  net/ipv6/fib6_rules.c| 16 
>  net/ipv6/icmp.c  |  2 +-
>  net/ipv6/route.c | 34 +-
>  6 files changed, 72 insertions(+), 12 deletions(-)
> 

Looks good,
Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next v2 4/5] ipv4: route: dissect flow in input path if fib rules need it

2018-02-28 Thread Nikolay Aleksandrov

On 28/02/18 05:52, Roopa Prabhu wrote:
> From: Roopa Prabhu <ro...@cumulusnetworks.com>
> 
> Dissect flow in fwd path if fib rules require it. Controlled by
> a flag to avoid penatly for the common case. Flag is set when fib
> rules with sport, dport and proto match that require flow dissect
> are installed. Also passes the dissected hash keys to the multipath
> hash function when applicable to avoid dissecting the flow again.
> icmp packets will continue to use inner header for hash
> calculations (Thanks to Nikolay Aleksandrov for some review here).
> 
> Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
> ---
>  include/net/ip_fib.h | 27 ++-
>  include/net/netns/ipv4.h |  1 +
>  net/ipv4/fib_rules.c |  8 
>  net/ipv4/fib_semantics.c |  2 +-
>  net/ipv4/route.c | 43 +--
>  5 files changed, 65 insertions(+), 16 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next v2 2/5] ipv4: fib_rules: support match on sport, dport and ip proto

2018-02-28 Thread Nikolay Aleksandrov

On 28/02/18 05:52, Roopa Prabhu wrote:
> From: Roopa Prabhu <ro...@cumulusnetworks.com>
> 
> support to match on src port, dst port and ip protocol.
> 
> Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
> ---
>  net/ipv4/fib_rules.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
> index 35d646a..16083b8 100644
> --- a/net/ipv4/fib_rules.c
> +++ b/net/ipv4/fib_rules.c
> @@ -182,6 +182,17 @@ static int fib4_rule_match(struct fib_rule *rule, struct 
> flowi *fl, int flags)
>   if (r->tos && (r->tos != fl4->flowi4_tos))
>   return 0;
>  
> + if (rule->ip_proto && (rule->ip_proto != fl4->flowi4_proto))
> + return 0;
> +
> + if (fib_rule_port_range_set(>sport_range) &&
> + !fib_rule_port_inrange(>sport_range, fl4->fl4_sport))
> + return 0;
> +
> + if (fib_rule_port_range_set(>dport_range) &&
> + !fib_rule_port_inrange(>dport_range, fl4->fl4_dport))
> + return 0;
> +
>   return 1;
>  }
>  
> 

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next v2 3/5] ipv6: fib6_rules: support for match on sport, dport and ip proto

2018-02-28 Thread Nikolay Aleksandrov

On 28/02/18 05:52, Roopa Prabhu wrote:
> From: Roopa Prabhu <ro...@cumulusnetworks.com>
> 
> support to match on src port, dst port and ip protocol.
> 
> Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
> ---
>  net/ipv6/fib6_rules.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
> index 95a2c9e..bcd1f22 100644
> --- a/net/ipv6/fib6_rules.c
> +++ b/net/ipv6/fib6_rules.c
> @@ -223,6 +223,17 @@ static int fib6_rule_match(struct fib_rule *rule, struct 
> flowi *fl, int flags)
>   if (r->tclass && r->tclass != ip6_tclass(fl6->flowlabel))
>   return 0;
>  
> + if (rule->ip_proto && (rule->ip_proto != fl6->flowi6_proto))
> + return 0;
> +
> + if (fib_rule_port_range_set(>sport_range) &&
> + !fib_rule_port_inrange(>sport_range, fl6->fl6_sport))
> + return 0;
> +
> + if (fib_rule_port_range_set(>dport_range) &&
> + !fib_rule_port_inrange(>dport_range, fl6->fl6_dport))
> + return 0;
> +
>   return 1;
>  }
>  
> 

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next v2 1/5] net: fib_rules: support for match on ip_proto, sport and dport

2018-02-28 Thread Nikolay Aleksandrov

On 28/02/18 05:52, Roopa Prabhu wrote:
> From: Roopa Prabhu <ro...@cumulusnetworks.com>
> 
> uapi for ip_proto, sport and dport range match
> in fib rules.
> 
> Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
> ---
>  include/net/fib_rules.h| 36 -
>  include/uapi/linux/fib_rules.h |  8 
>  net/core/fib_rules.c   | 92 
> +-
>  3 files changed, 133 insertions(+), 3 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next 11/11] ipmr, ip6mr: Unite dumproute flows

2018-02-27 Thread Nikolay Aleksandrov

On 27/02/18 20:58, Yuval Mintz wrote:
> The various MFC entries are being held in the same kind of mr_tables
> for both ipmr and ip6mr, and their traversal logic is identical.
> Also, with the exception of the addresses [and other small tidbits]
> the major bulk of the nla setting is identical.
> 
> Unite as much of the dumping as possible between the two.
> Notice this requires creating an mr_table iterator for each, as the
> for-each preprocessor macro can't be used by the common logic.
> 
> Signed-off-by: Yuval Mintz <yuv...@mellanox.com>
> ---
>  include/linux/mroute_base.h |  29 
>  net/ipv4/ipmr.c | 161 
> +++-
>  net/ipv4/ipmr_base.c| 123 +
>  net/ipv6/ip6mr.c| 156 +++---
>  4 files changed, 230 insertions(+), 239 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next 09/11] ipmr, ip6mr: Unite vif seq functions

2018-02-27 Thread Nikolay Aleksandrov

On 27/02/18 20:58, Yuval Mintz wrote:
> Same as previously done with the mfc seq, the logic for the vif seq is
> refactored to be shared between ipmr and ip6mr.
> 
> Signed-off-by: Yuval Mintz <yuv...@mellanox.com>
> ---
>  include/linux/mroute_base.h | 33 ++
>  net/ipv4/ipmr.c | 49 +---
>  net/ipv4/ipmr_base.c| 33 ++
>  net/ipv6/ip6mr.c| 50 
> +
>  4 files changed, 76 insertions(+), 89 deletions(-)
> 

LGTM,

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next 10/11] ip6mr: Remove MFC_NOTIFY and refactor flags

2018-02-27 Thread Nikolay Aleksandrov

On 27/02/18 20:58, Yuval Mintz wrote:
> MFC_NOTIFY exists in ip6mr, probably as some legacy code
> [was already removed for ipmr in commit
> 06bd6c0370bb ("net: ipmr: remove unused MFC_NOTIFY flag and make the flags 
> enum").
> Remove it from ip6mr as well, and move the enum into a common file;
> Notice MFC_OFFLOAD is currently only used by ipmr.
> 
> Signed-off-by: Yuval Mintz <yuv...@mellanox.com>
> ---
>  include/linux/mroute.h  | 9 -
>  include/linux/mroute6.h | 3 ---
>  include/linux/mroute_base.h | 9 +
>  net/ipv6/ip6mr.c| 3 ---
>  4 files changed, 9 insertions(+), 15 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next 08/11] ipmr, ip6mr: Unite mfc seq logic

2018-02-27 Thread Nikolay Aleksandrov

On 27/02/18 20:58, Yuval Mintz wrote:
> With the exception of the final dump, ipmr and ip6mr have the exact same
> seq logic for traversing a given mr_table. Refactor that code and make
> it common.
> 
> Signed-off-by: Yuval Mintz <yuv...@mellanox.com>
> ---
>  include/linux/mroute_base.h | 69 
>  net/ipv4/ipmr.c | 93 +++
>  net/ipv4/ipmr_base.c| 62 +
>  net/ipv6/ip6mr.c| 97 
> -
>  4 files changed, 143 insertions(+), 178 deletions(-)
> 

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next 07/11] ipmr, ip6mr: Unite logic for searching in MFC cache

2018-02-27 Thread Nikolay Aleksandrov

On 27/02/18 20:58, Yuval Mintz wrote:
> ipmr and ip6mr utilize the exact same methods for searching the
> hashed resolved connections, difference being only in the construction
> of the hash comparison key.
> 
> In order to unite the flow, introduce an mr_table operation set that
> would contain the protocol specific information required for common
> flows, in this case - the hash parameters and a comparison key
> representing a (*,*) route.
> 
> Signed-off-by: Yuval Mintz 
> ---
>  include/linux/mroute_base.h | 52 +--
>  net/ipv4/ipmr.c | 71 ++-
>  net/ipv4/ipmr_base.c| 54 +++--
>  net/ipv6/ip6mr.c| 74 
> +++--
>  4 files changed, 134 insertions(+), 117 deletions(-)
> 
> diff --git a/include/linux/mroute_base.h b/include/linux/mroute_base.h
> index 2769e2f..18a1d75 100644
> --- a/include/linux/mroute_base.h
> +++ b/include/linux/mroute_base.h
> @@ -89,10 +89,23 @@ struct mr_mfc {
>   struct rcu_head rcu;
>  };
>  
> +struct mr_table;
> +
> +/**
> + * struct mr_table_ops - callbacks and info for protocol-specific ops
> + * rht_params: parameters for accessing the MFC hash
> + * cmparg_any: a hash key to be used for matching on (*,*) routes

kernel doc style uses @variable

> + */
> +struct mr_table_ops {
> + const struct rhashtable_params *rht_params;
> + void *cmparg_any;
> +};
> +
>  /**
>   * struct mr_table - a multicast routing table
>   * @list: entry within a list of multicast routing tables
>   * @net: net where this table belongs
> + * @op: protocol specific operations

nit: s/op/ops/

>   * @id: identifier of the table
>   * @mroute_sk: socket associated with the table
>   * @ipmr_expire_timer: timer for handling unresolved routes
> @@ -109,6 +122,7 @@ struct mr_mfc {
>  struct mr_table {
>   struct list_headlist;
>   possible_net_t  net;
> + struct mr_table_ops ops;
>   u32 id;
>   struct sock __rcu   *mroute_sk;
>   struct timer_list   ipmr_expire_timer;
> @@ -133,10 +147,19 @@ void vif_device_init(struct vif_device *v,
>  
>  struct mr_table *
>  mr_table_alloc(struct net *net, u32 id,
> -const struct rhashtable_params *rht_params,
> +struct mr_table_ops *ops,
>  void (*expire_func)(struct timer_list *t),
>  void (*table_set)(struct mr_table *mrt,
>struct net *net));
> +
> +/* These actually return 'struct mr_mfc *', but to avoid need for explicit
> + * castings they simply return void.
> + */
> +void *mr_mfc_find_parent(struct mr_table *mrt,
> +  void *hasharg, int parent);
> +void *mr_mfc_find_any_parent(struct mr_table *mrt, int vifi);
> +void *mr_mfc_find_any(struct mr_table *mrt, int vifi, void *hasharg);
> +
>  #else
>  static inline void vif_device_init(struct vif_device *v,
>  struct net_device *dev,
> @@ -147,14 +170,37 @@ static inline void vif_device_init(struct vif_device *v,
>  {
>  }
>  
> -static inline struct mr_table *
> +static inline void *
>  mr_table_alloc(struct net *net, u32 id,
> -const struct rhashtable_params *rht_params,
> +struct mr_table_ops *ops,
>  void (*expire_func)(struct timer_list *t),
>  void (*table_set)(struct mr_table *mrt,
>struct net *net))
>  {
>   return NULL;
>  }
> +
> +static inline void *mr_mfc_find_parent(struct mr_table *mrt,
> +void *hasharg, int parent)
> +{
> + return NULL;
> +}
> +
> +static inline void *mr_mfc_find_any_parent(struct mr_table *mrt,
> +int vifi)
> +{
> + return NULL;
> +}
> +
> +static inline struct mr_mfc *mr_mfc_find_any(struct mr_table *mrt,
> +  int vifi, void *hasharg)
> +{
> + return NULL;
> +}
>  #endif
> +
> +static inline void *mr_mfc_find(struct mr_table *mrt, void *hasharg)
> +{
> + return mr_mfc_find_parent(mrt, hasharg, -1);
> +}
>  #endif
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index 238a579..d0bf021 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -359,6 +359,16 @@ static void ipmr_new_table_set(struct mr_table *mrt,
>  #endif
>  }
>  
> +static struct mfc_cache_cmp_arg ipmr_mr_table_ops_cmparg_any = {
> + .mfc_mcastgrp = htonl(INADDR_ANY),
> + .mfc_origin = htonl(INADDR_ANY),
> +};
> +
> +static struct mr_table_ops ipmr_mr_table_ops = {
> + .rht_params = _rht_params,
> + .cmparg_any = _mr_table_ops_cmparg_any,
> +};
> +
>  static struct mr_table *ipmr_new_table(struct net *net, u32 id)
>  {
>   struct mr_table *mrt;
> @@ -371,7 +381,7 @@ static struct mr_table *ipmr_new_table(struct net *net, 
> u32 id)
>   if (mrt)
>   return mrt;
>  
> -

Re: [PATCH net-next 06/11] ipmr, ip6mr: Make mfc_cache a common structure

2018-02-27 Thread Nikolay Aleksandrov

On 27/02/18 20:58, Yuval Mintz wrote:
> mfc_cache and mfc6_cache are almost identical - the main difference is
> in the origin/group addresses and comparison-key. Make a common
> structure encapsulating most of the multicast routing logic  - mr_mfc
> and convert both ipmr and ip6mr into using it.
> 
> For easy conversion [casting, in this case] mr_mfc has to be the first
> field inside every multicast routing abstraction utilizing it.
> 
> Signed-off-by: Yuval Mintz <yuv...@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c |  21 +-
>  include/linux/mroute.h|  45 +---
>  include/linux/mroute6.h   |  23 +--
>  include/linux/mroute_base.h   |  45 
>  net/ipv4/ipmr.c   | 234 +++--
>  net/ipv6/ip6mr.c  | 241 
> +++---
>  6 files changed, 312 insertions(+), 297 deletions(-)
> 

I feel uneasy about these casts all over the place, anyway functionally
the patch looks fine.

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next 05/11] ipmr, ip6mr: Unite creation of new mr_table

2018-02-27 Thread Nikolay Aleksandrov

On 27/02/18 20:58, Yuval Mintz wrote:
> Now that both ipmr and ip6mr are using the same mr_table structure,
> we can have a common function to allocate & initialize a new instance.
> 
> Signed-off-by: Yuval Mintz <yuv...@mellanox.com>
> ---
>  include/linux/mroute_base.h | 17 +
>  net/ipv4/ipmr.c | 27 ++-
>  net/ipv4/ipmr_base.c| 27 +++
>  net/ipv6/ip6mr.c| 30 ++
>  4 files changed, 64 insertions(+), 37 deletions(-)
> 

I don't like the function definition broken with the name and type on different 
rows, but
I guess it's a personal preference, the patch looks okay.

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next 04/11] mroute*: Make mr_table a common struct

2018-02-27 Thread Nikolay Aleksandrov

On 27/02/18 20:58, Yuval Mintz wrote:
> Following previous changes to ip6mr, mr_table and mr6_table are
> basically the same [up to mr6_table having additional '6' suffixes to
> its variable names].
> Move the common structure definition into a common header; This
> requires renaming all references in ip6mr to variables that had the
> distinct suffix.
> 
> Signed-off-by: Yuval Mintz <yuv...@mellanox.com>
> ---
>  include/linux/mroute.h  |  21 
>  include/linux/mroute6.h |   1 -
>  include/linux/mroute_base.h |  46 +++
>  include/net/netns/ipv6.h|   2 +-
>  net/ipv4/ipmr.c |   2 -
>  net/ipv6/ip6mr.c| 301 
> 
>  6 files changed, 186 insertions(+), 187 deletions(-)
> 

Nice,

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next 03/11] ip6mr: Align hash implementation to ipmr

2018-02-27 Thread Nikolay Aleksandrov

On 27/02/18 20:58, Yuval Mintz wrote:
> Since commit 8fb472c09b9d ("ipmr: improve hash scalability") ipmr has
> been using rhashtable as a basis for its mfc routes, but ip6mr is
> currently still using the old private MFC hash implementation.
> 
> Align ip6mr to the current ipmr implementation.
> 
> Signed-off-by: Yuval Mintz <yuv...@mellanox.com>
> ---
>  include/linux/mroute6.h |  30 ++---
>  net/ipv6/ip6mr.c| 313 
> ++--
>  2 files changed, 184 insertions(+), 159 deletions(-)
> 

This looks like my ipmr patch ported for v6, pretty straightforward and
for a long time on my TODO list. :-)

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next 02/11] ip6mr: Make mroute_sk rcu-based

2018-02-27 Thread Nikolay Aleksandrov

On 27/02/18 20:58, Yuval Mintz wrote:
> In ipmr the mr_table socket is handled under RCU. Introduce the same
> for ip6mr.
> 
> Signed-off-by: Yuval Mintz <yuv...@mellanox.com>
> ---
>  include/linux/mroute6.h |  6 +++---
>  net/ipv6/ip6_output.c   |  2 +-
>  net/ipv6/ip6mr.c| 45 +++--
>  3 files changed, 31 insertions(+), 22 deletions(-)

LGTM, though do we need the write_lock when changing mroute6_sk after RCUfying 
it ?
In any case that can be resolved later,

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next 01/11] ipmr,ipmr6: Define a uniform vif_device

2018-02-27 Thread Nikolay Aleksandrov

On 27/02/18 20:58, Yuval Mintz wrote:
> The two implementations have almost identical structures - vif_device and
> mif_device. As a step toward uniforming the mr_tables, eliminate the
> mif_device and relocate the vif_device definition into a new common
> header file.
> 
> Also, introduce a common initializing function for setting most of the
> vif_device fields in a new common source file. This requires modifying
> the ipv{4,6] Kconfig and ipv4 makefile as we're introducing a new common
> config option - CONFIG_IP_MROUTE_COMMON.
> 
> Signed-off-by: Yuval Mintz <yuv...@mellanox.com>
> ---
>  include/linux/mroute.h  | 13 +---
>  include/linux/mroute6.h | 11 +-
>  include/linux/mroute_base.h | 52 
> +
>  net/ipv4/Kconfig|  5 +
>  net/ipv4/Makefile   |  1 +
>  net/ipv4/ipmr.c | 32 +---
>  net/ipv4/ipmr_base.c| 28 
>  net/ipv6/Kconfig|  1 +
>  net/ipv6/ip6mr.c| 37 
>  9 files changed, 117 insertions(+), 63 deletions(-)
>  create mode 100644 include/linux/mroute_base.h
>  create mode 100644 net/ipv4/ipmr_base.c
> 

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net] bridge: Fix VLAN reference count problem

2018-02-25 Thread Nikolay Aleksandrov

On 25/02/18 21:59, Ido Schimmel wrote:
> When a VLAN is added on a port, a reference is taken on the
> corresponding master VLAN entry. If it does not already exist, then it
> is created and a reference taken.
> 
> However, in the second case a reference is not really taken when
> CONFIG_REFCOUNT_FULL is enabled as refcount_inc() is replaced by
> refcount_inc_not_zero().
> 
> Fix this by using refcount_set() on a newly created master VLAN entry.
> 
> Fixes: 251277598596 ("net, bridge: convert net_bridge_vlan.refcnt from 
> atomic_t to refcount_t")
> Signed-off-by: Ido Schimmel <ido...@mellanox.com>
> ---
>  net/bridge/br_vlan.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index 51935270c651..9896f4975353 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -168,6 +168,8 @@ static struct net_bridge_vlan *br_vlan_get_master(struct 
> net_bridge *br, u16 vid
>   masterv = br_vlan_find(vg, vid);
>   if (WARN_ON(!masterv))
>   return NULL;
> + refcount_set(>refcnt, 1);
> +     return masterv;
>   }
>   refcount_inc(>refcnt);
>  
> 

Good catch,

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH net-next 1/5] net: fib_rules: support for match on ip_proto, sport and dport

2018-02-25 Thread Nikolay Aleksandrov

On 25/02/18 17:04, Nikolay Aleksandrov wrote:
> On 25/02/18 07:44, Roopa Prabhu wrote:
>> From: Roopa Prabhu <ro...@cumulusnetworks.com>
>>
>> uapi for ip_proto, sport and dport range match
>> in fib rules.
>>
>> Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
>> ---
[snip]
>>  struct rcu_head rcu;
>>  };
>>  
>> @@ -144,6 +151,28 @@ static inline u32 frh_get_table(struct fib_rule_hdr 
>> *frh, struct nlattr **nla)
>>  return frh->table;
>>  }
>>  
>> +static inline bool fib_rule_port_inrange(struct fib_port_range *a,
>> + __be16 port)
>> +{
>> +if (!a->start)
>> +return true;
> 
> Can start be == 0 ?
> IIUC this check is unnecessary because when you're adding the new rule,
> you do a check for start > 0 so it shouldn't be possible to be 0.

Nevermind this comment, I spoke too soon and saw the match later. :-)

> 
>> +return ntohs(port) >= a->start &&
>> +ntohs(port) <= a->end;
>> +}
>> +
>> +static inline bool fib_rule_port_range_valid(const struct fib_port_range *a)
>> +{
>> +return a->start > 0 && a->end < 0x &&
>> +a->start <= a->end;
> 
> nit: alignment (also can be on a single line)
> 
>> +}
>> +
>> +static inline bool fib_rule_port_range_compare(struct fib_port_range *a,
>> +   struct fib_port_range *b)
>> +{
>> +return a->start == b->start &&
>> +a->end == b->end;
> 
> nit: alignment (also can be on a single line)
> 
>> +}
>> +
>>  struct fib_rules_ops *fib_rules_register(const struct fib_rules_ops *,
>>   struct net *);
>>  void fib_rules_unregister(struct fib_rules_ops *);
>> diff --git a/include/uapi/linux/fib_rules.h b/include/uapi/linux/fib_rules.h
>> index 77d90ae..232df14 100644
>> --- a/include/uapi/linux/fib_rules.h
>> +++ b/include/uapi/linux/fib_rules.h
>> @@ -35,6 +35,11 @@ struct fib_rule_uid_range {
>>  __u32   end;
>>  };
>>  
>> +struct fib_rule_port_range {
>> +__u16   start;
>> +__u16   end;
>> +};
>> +
>>  enum {
>>  FRA_UNSPEC,
>>  FRA_DST,/* destination address */
>> @@ -59,6 +64,9 @@ enum {
>>  FRA_L3MDEV, /* iif or oif is l3mdev goto its table */
>>  FRA_UID_RANGE,  /* UID range */
>>  FRA_PROTOCOL,   /* Originator of the rule */
>> +FRA_IP_PROTO,   /* ip proto */
>> +FRA_SPORT_RANGE, /* sport */
>> +FRA_DPORT_RANGE, /* dport */
>>  __FRA_MAX
>>  };
>>  
>> diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
>> index a6aea80..5008235 100644
>> --- a/net/core/fib_rules.c
>> +++ b/net/core/fib_rules.c
>> @@ -33,6 +33,10 @@ bool fib_rule_matchall(const struct fib_rule *rule)
>>  if (!uid_eq(rule->uid_range.start, fib_kuid_range_unset.start) ||
>>  !uid_eq(rule->uid_range.end, fib_kuid_range_unset.end))
>>  return false;
>> +if (fib_rule_port_range_valid(>sport_range))
>> +return false;
>> +if (fib_rule_port_range_valid(>dport_range))
>> +return false;
>>  return true;
>>  }
>>  EXPORT_SYMBOL_GPL(fib_rule_matchall);
>> @@ -221,6 +225,12 @@ static int nla_put_uid_range(struct sk_buff *skb, 
>> struct fib_kuid_range *range)
>>  return nla_put(skb, FRA_UID_RANGE, sizeof(out), );
>>  }
>>  
>> +static int nla_put_port_range(struct sk_buff *skb, int attrtype,
>> +  struct fib_port_range *range)
>> +{
>> +return nla_put(skb, attrtype, sizeof(*range), range);
>> +}
>> +
>>  static int fib_rule_match(struct fib_rule *rule, struct fib_rules_ops *ops,
>>struct flowi *fl, int flags,
>>struct fib_lookup_arg *arg)
>> @@ -425,6 +435,17 @@ static int rule_exists(struct fib_rules_ops *ops, 
>> struct fib_rule_hdr *frh,
>>  !uid_eq(r->uid_range.end, rule->uid_range.end))
>>  continue;
>>  
>> +if (r->ip_proto != rule->ip_proto)
>> +continue;
>> +
>> +if (!fib_rule_port_range_compare(>sport_range,
>> + >sport_range))
>> +continue

Re: [PATCH net-next 5/5] ipv6: route: dissect flow in input path if fib rules need it

2018-02-25 Thread Nikolay Aleksandrov

On 25/02/18 07:44, Roopa Prabhu wrote:
> From: Roopa Prabhu 
> 
> Dissect flow in fwd path if fib rules require it. Controlled by
> a flag to avoid penatly for the common case. Flag is set when fib
> rules with sport, dport and proto match that require flow dissect
> are installed. Also passes the dissected hash keys to the multipath
> hash function when applicable to avoid dissecting the flow again.
> icmp packets will continue to use inner header for hash
> calculations.
> 
> Signed-off-by: Roopa Prabhu 
> ---
>  include/net/ip6_route.h  |  3 ++-
>  include/net/netns/ipv6.h |  1 +
>  net/ipv6/fib6_rules.c|  5 +
>  net/ipv6/icmp.c  |  2 +-
>  net/ipv6/route.c | 45 -
>  5 files changed, 45 insertions(+), 11 deletions(-)
> 
> diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
> index 27d23a6..218f89c 100644
> --- a/include/net/ip6_route.h
> +++ b/include/net/ip6_route.h
> @@ -127,7 +127,8 @@ static inline int ip6_route_get_saddr(struct net *net, 
> struct rt6_info *rt,
>  
>  struct rt6_info *rt6_lookup(struct net *net, const struct in6_addr *daddr,
>   const struct in6_addr *saddr, int oif, int flags);
> -u32 rt6_multipath_hash(const struct flowi6 *fl6, const struct sk_buff *skb);
> +u32 rt6_multipath_hash(const struct flowi6 *fl6, const struct sk_buff *skb,
> +struct flow_keys *hkeys);
>  
>  struct dst_entry *icmp6_dst_alloc(struct net_device *dev, struct flowi6 
> *fl6);
>  
> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
> index 987cc45..7aca00e 100644
> --- a/include/net/netns/ipv6.h
> +++ b/include/net/netns/ipv6.h
> @@ -72,6 +72,7 @@ struct netns_ipv6 {
>   unsigned longip6_rt_last_gc;
>  #ifdef CONFIG_IPV6_MULTIPLE_TABLES
>   bool fib6_has_custom_rules;
> + bool fib6_rules_require_fldissect;
>   struct rt6_info *ip6_prohibit_entry;
>   struct rt6_info *ip6_blk_hole_entry;
>   struct fib6_table   *fib6_local_tbl;
> diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
> index 678d664..e3a7861 100644
> --- a/net/ipv6/fib6_rules.c
> +++ b/net/ipv6/fib6_rules.c
> @@ -267,6 +267,11 @@ static int fib6_rule_configure(struct fib_rule *rule, 
> struct sk_buff *skb,
>   rule6->dst.plen = frh->dst_len;
>   rule6->tclass = frh->tos;
>  
> + if (rule->ip_proto ||
> + fib_rule_port_range_valid(>sport_range) ||
> + fib_rule_port_range_valid(>dport_range))
> + net->ipv6.fib6_rules_require_fldissect = true;
> +
>   net->ipv6.fib6_has_custom_rules = true;
>   err = 0;
>  errout:
> diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
> index 4fa4f1b..b0778d3 100644
> --- a/net/ipv6/icmp.c
> +++ b/net/ipv6/icmp.c
> @@ -522,7 +522,7 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 
> code, __u32 info,
>   fl6.fl6_icmp_type = type;
>   fl6.fl6_icmp_code = code;
>   fl6.flowi6_uid = sock_net_uid(net, NULL);
> - fl6.mp_hash = rt6_multipath_hash(, skb);
> + fl6.mp_hash = rt6_multipath_hash(, skb, NULL);
>   security_skb_classify_flow(skb, flowi6_to_flowi());
>  
>   sk = icmpv6_xmit_lock(net);
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index aa709b6..778212b 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -460,7 +460,7 @@ static struct rt6_info *rt6_multipath_select(struct 
> rt6_info *match,
>* case it will always be non-zero. Otherwise now is the time to do it.
>*/
>   if (!fl6->mp_hash)
> - fl6->mp_hash = rt6_multipath_hash(fl6, NULL);
> + fl6->mp_hash = rt6_multipath_hash(fl6, NULL, NULL);
>  
>   if (fl6->mp_hash <= atomic_read(>rt6i_nh_upper_bound))
>   return match;
> @@ -1786,10 +1786,12 @@ struct dst_entry *ip6_route_input_lookup(struct net 
> *net,
>  EXPORT_SYMBOL_GPL(ip6_route_input_lookup);
>  
>  static void ip6_multipath_l3_keys(const struct sk_buff *skb,
> -   struct flow_keys *keys)
> +   struct flow_keys *keys,
> +   struct flow_keys *flkeys)
>  {
>   const struct ipv6hdr *outer_iph = ipv6_hdr(skb);
>   const struct ipv6hdr *key_iph = outer_iph;
> + struct flow_keys *_flkeys = flkeys;
>   const struct ipv6hdr *inner_iph;
>   const struct icmp6hdr *icmph;
>   struct ipv6hdr _inner_iph;
> @@ -1811,22 +1813,31 @@ static void ip6_multipath_l3_keys(const struct 
> sk_buff *skb,
>   goto out;
>  
>   key_iph = inner_iph;
> + _flkeys = NULL;
>  out:
>   memset(keys, 0, sizeof(*keys));
>   keys->control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
> - keys->addrs.v6addrs.src = key_iph->saddr;
> - keys->addrs.v6addrs.dst = key_iph->daddr;
> - keys->tags.flow_label = ip6_flowinfo(key_iph);
> - keys->basic.ip_proto

Re: [PATCH net-next 4/5] ipv4: route: dissect flow in input path if fib rules need it

2018-02-25 Thread Nikolay Aleksandrov

On 25/02/18 07:44, Roopa Prabhu wrote:
> From: Roopa Prabhu <ro...@cumulusnetworks.com>
> 
> Dissect flow in fwd path if fib rules require it. Controlled by
> a flag to avoid penatly for the common case. Flag is set when fib
> rules with sport, dport and proto match that require flow dissect
> are installed. Also passes the dissected hash keys to the multipath
> hash function when applicable to avoid dissecting the flow again.
> icmp packets will continue to use inner header for hash
> calculations (Thanks to Nikolay Aleksandrov for some review here).
> 
> Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
> ---
>  include/net/ip_fib.h |  2 +-
>  include/net/netns/ipv4.h |  1 +
>  net/ipv4/fib_rules.c |  6 ++
>  net/ipv4/fib_semantics.c |  2 +-
>  net/ipv4/route.c | 52 
> +++-
>  5 files changed, 47 insertions(+), 16 deletions(-)
> 
> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index f805243..5ada772 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -371,7 +371,7 @@ int fib_sync_up(struct net_device *dev, unsigned int 
> nh_flags);
>  
>  #ifdef CONFIG_IP_ROUTE_MULTIPATH
>  int fib_multipath_hash(const struct fib_info *fi, const struct flowi4 *fl4,
> -const struct sk_buff *skb);
> +const struct sk_buff *skb, struct flow_keys *flkeys);
>  #endif
>  void fib_select_multipath(struct fib_result *res, int hash);
>  void fib_select_path(struct net *net, struct fib_result *res,
> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> index 44668c2..87b8fdc 100644
> --- a/include/net/netns/ipv4.h
> +++ b/include/net/netns/ipv4.h
> @@ -52,6 +52,7 @@ struct netns_ipv4 {
>  #ifdef CONFIG_IP_MULTIPLE_TABLES
>   struct fib_rules_ops*rules_ops;
>   boolfib_has_custom_rules;
> + boolfib_rules_require_fldissect;
>   struct fib_table __rcu  *fib_main;
>   struct fib_table __rcu  *fib_default;
>  #endif
> diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
> index 9d55c90..83aa786 100644
> --- a/net/ipv4/fib_rules.c
> +++ b/net/ipv4/fib_rules.c
> @@ -253,6 +253,11 @@ static int fib4_rule_configure(struct fib_rule *rule, 
> struct sk_buff *skb,
>   }
>  #endif
>  
> + if (rule->ip_proto ||
> + fib_rule_port_range_valid(>sport_range) ||
> + fib_rule_port_range_valid(>dport_range))
> + net->ipv4.fib_rules_require_fldissect = true;
> +
>   rule4->src_len = frh->src_len;
>   rule4->srcmask = inet_make_mask(rule4->src_len);
>   rule4->dst_len = frh->dst_len;
> @@ -398,6 +403,7 @@ int __net_init fib4_rules_init(struct net *net)
>   goto fail;
>   net->ipv4.rules_ops = ops;
>   net->ipv4.fib_has_custom_rules = false;
> + net->ipv4.fib_rules_require_fldissect = false;
>   return 0;
>  
>  fail:
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index cd46d76..b0c398a 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -1770,7 +1770,7 @@ void fib_select_path(struct net *net, struct fib_result 
> *res,
>  
>  #ifdef CONFIG_IP_ROUTE_MULTIPATH
>   if (res->fi->fib_nhs > 1) {
> - int h = fib_multipath_hash(res->fi, fl4, skb);
> + int h = fib_multipath_hash(res->fi, fl4, skb, NULL);
>  
>   fib_select_multipath(res, h);
>   }
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 26eefa2..72dd6c6 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1783,7 +1783,7 @@ static void ip_multipath_l3_keys(const struct sk_buff 
> *skb,
>  
>  /* if skb is set it will be used and fl4 can be NULL */
>  int fib_multipath_hash(const struct fib_info *fi, const struct flowi4 *fl4,
> -const struct sk_buff *skb)
> +const struct sk_buff *skb, struct flow_keys *flkeys)
>  {
>   struct net *net = fi->fib_net;
>   struct flow_keys hash_keys;
> @@ -1810,14 +1810,23 @@ int fib_multipath_hash(const struct fib_info *fi, 
> const struct flowi4 *fl4,
>   if (skb->l4_hash)
>   return skb_get_hash_raw(skb) >> 1;
>   memset(_keys, 0, sizeof(hash_keys));
> - skb_flow_dissect_flow_keys(skb, , flag);
>  
> - hash_keys.control.addr_type = 
> FLOW_DISSECTOR_KEY_IPV4_ADDRS;
> - hash_keys.addrs.v4addrs.src = keys.addrs.v4addrs.src;
> - hash_keys.addrs.v4addrs.dst = keys.addrs.v4addr

Re: [PATCH net-next 1/5] net: fib_rules: support for match on ip_proto, sport and dport

2018-02-25 Thread Nikolay Aleksandrov

On 25/02/18 07:44, Roopa Prabhu wrote:
> From: Roopa Prabhu 
> 
> uapi for ip_proto, sport and dport range match
> in fib rules.
> 
> Signed-off-by: Roopa Prabhu 
> ---
>  include/net/fib_rules.h| 31 +-
>  include/uapi/linux/fib_rules.h |  8 
>  net/core/fib_rules.c   | 94 
> +-
>  3 files changed, 130 insertions(+), 3 deletions(-)
> 

You should probably update validate_rulemsg() as well, these aren't added in 
the per-proto
policies and nothing validates if the attribute data is actually there. Maybe 
I'm missing
something obvious, but it looks like many other FRA_ attributes don't have such 
checks.

> diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
> index b3d2162..6d99202 100644
> --- a/include/net/fib_rules.h
> +++ b/include/net/fib_rules.h
> @@ -11,6 +11,11 @@
>  #include 
>  #include 
>  
> +struct fib_port_range {
> + __u16 start;
> + __u16 end;
> +};
> +
>  struct fib_kuid_range {
>   kuid_t start;
>   kuid_t end;
> @@ -27,7 +32,7 @@ struct fib_rule {
>   u8  action;
>   u8  l3mdev;
>   u8  proto;
> - /* 1 byte hole, try to use */
> + u8  ip_proto;
>   u32 target;
>   __be64  tun_id;
>   struct fib_rule __rcu   *ctarget;
> @@ -40,6 +45,8 @@ struct fib_rule {
>   chariifname[IFNAMSIZ];
>   charoifname[IFNAMSIZ];
>   struct fib_kuid_range   uid_range;
> + struct fib_port_range   sport_range;
> + struct fib_port_range   dport_range;
>   struct rcu_head rcu;
>  };
>  
> @@ -144,6 +151,28 @@ static inline u32 frh_get_table(struct fib_rule_hdr 
> *frh, struct nlattr **nla)
>   return frh->table;
>  }
>  
> +static inline bool fib_rule_port_inrange(struct fib_port_range *a,
> +  __be16 port)
> +{
> + if (!a->start)
> + return true;

Can start be == 0 ?
IIUC this check is unnecessary because when you're adding the new rule,
you do a check for start > 0 so it shouldn't be possible to be 0.

> + return ntohs(port) >= a->start &&
> + ntohs(port) <= a->end;
> +}
> +
> +static inline bool fib_rule_port_range_valid(const struct fib_port_range *a)
> +{
> + return a->start > 0 && a->end < 0x &&
> + a->start <= a->end;

nit: alignment (also can be on a single line)

> +}
> +
> +static inline bool fib_rule_port_range_compare(struct fib_port_range *a,
> +struct fib_port_range *b)
> +{
> + return a->start == b->start &&
> + a->end == b->end;

nit: alignment (also can be on a single line)

> +}
> +
>  struct fib_rules_ops *fib_rules_register(const struct fib_rules_ops *,
>struct net *);
>  void fib_rules_unregister(struct fib_rules_ops *);
> diff --git a/include/uapi/linux/fib_rules.h b/include/uapi/linux/fib_rules.h
> index 77d90ae..232df14 100644
> --- a/include/uapi/linux/fib_rules.h
> +++ b/include/uapi/linux/fib_rules.h
> @@ -35,6 +35,11 @@ struct fib_rule_uid_range {
>   __u32   end;
>  };
>  
> +struct fib_rule_port_range {
> + __u16   start;
> + __u16   end;
> +};
> +
>  enum {
>   FRA_UNSPEC,
>   FRA_DST,/* destination address */
> @@ -59,6 +64,9 @@ enum {
>   FRA_L3MDEV, /* iif or oif is l3mdev goto its table */
>   FRA_UID_RANGE,  /* UID range */
>   FRA_PROTOCOL,   /* Originator of the rule */
> + FRA_IP_PROTO,   /* ip proto */
> + FRA_SPORT_RANGE, /* sport */
> + FRA_DPORT_RANGE, /* dport */
>   __FRA_MAX
>  };
>  
> diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
> index a6aea80..5008235 100644
> --- a/net/core/fib_rules.c
> +++ b/net/core/fib_rules.c
> @@ -33,6 +33,10 @@ bool fib_rule_matchall(const struct fib_rule *rule)
>   if (!uid_eq(rule->uid_range.start, fib_kuid_range_unset.start) ||
>   !uid_eq(rule->uid_range.end, fib_kuid_range_unset.end))
>   return false;
> + if (fib_rule_port_range_valid(>sport_range))
> + return false;
> + if (fib_rule_port_range_valid(>dport_range))
> + return false;
>   return true;
>  }
>  EXPORT_SYMBOL_GPL(fib_rule_matchall);
> @@ -221,6 +225,12 @@ static int nla_put_uid_range(struct sk_buff *skb, struct 
> fib_kuid_range *range)
>   return nla_put(skb, FRA_UID_RANGE, sizeof(out), );
>  }
>  
> +static int nla_put_port_range(struct sk_buff *skb, int attrtype,
> +   struct fib_port_range *range)
> +{
> + return nla_put(skb, attrtype, sizeof(*range), range);
> +}
> +
>  static int fib_rule_match(struct fib_rule *rule, struct fib_rules_ops *ops,
> struct flowi *fl, int flags,
>

Re: [PATCH net] net: ipv4: Set addr_type in hash_keys for forwarded case

2018-02-21 Thread Nikolay Aleksandrov

On 21/02/18 21:00, David Ahern wrote:
> The result of the skb flow dissect is copied from keys to hash_keys to
> ensure only the intended data is hashed. The original L4 hash patch
> overlooked setting the addr_type for this case; add it.
> 
> Fixes: bf4e0a3db97eb ("net: ipv4: add support for ECMP hash policy choice")
> Reported-by: Ido Schimmel <ido...@idosch.org>
> Signed-off-by: David Ahern <dsah...@gmail.com>
> ---
>  net/ipv4/route.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 49cc1c1df1ba..a4f44d815a61 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1826,6 +1826,8 @@ int fib_multipath_hash(const struct fib_info *fi, const 
> struct flowi4 *fl4,
>   return skb_get_hash_raw(skb) >> 1;
>   memset(_keys, 0, sizeof(hash_keys));
>   skb_flow_dissect_flow_keys(skb, , flag);
> +
> + hash_keys.control.addr_type = 
> FLOW_DISSECTOR_KEY_IPV4_ADDRS;
>   hash_keys.addrs.v4addrs.src = keys.addrs.v4addrs.src;
>   hash_keys.addrs.v4addrs.dst = keys.addrs.v4addrs.dst;
>   hash_keys.ports.src = keys.ports.src;
> 

Good catch,

Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

Re: [PATCH] net: bridge: Fix uninitialized error in br_fdb_sync_static()

2018-02-01 Thread Nikolay Aleksandrov

On 01/02/18 12:25, Geert Uytterhoeven wrote:
> With gcc-4.1.2.:
> 
> net/bridge/br_fdb.c: In function ‘br_fdb_sync_static’:
> net/bridge/br_fdb.c:996: warning: ‘err’ may be used uninitialized in this 
> function
> 
> Indeed, if the list is empty, err will be uninitialized, and will be
> propagated up as the function return value.
> 
> Fix this by preinitializing err to zero.
> 
> Fixes: eb7935830d00b9e0 ("net: bridge: use rhashtable for fdbs")
> Signed-off-by: Geert Uytterhoeven <ge...@linux-m68k.org>
> ---
>  net/bridge/br_fdb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index dc87fbc9a23b04e6..d9e69e4514beb20d 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -993,7 +993,7 @@ int br_fdb_delete(struct ndmsg *ndm, struct nlattr *tb[],
>  int br_fdb_sync_static(struct net_bridge *br, struct net_bridge_port *p)
>  {
>   struct net_bridge_fdb_entry *f, *tmp;
> -     int err;
> + int err = 0;
>  
>   ASSERT_RTNL();
>  
> 

Thanks,
Acked-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 998 matches

Mail list logo