Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-05 Thread महेश बंडेवार
On Sat, Nov 4, 2017 at 4:53 PM, Serge E. Hallyn  wrote:
>
> Quoting Mahesh Bandewar (mah...@bandewar.net):
> > Init-user-ns is always uncontrolled and a process that has SYS_ADMIN
> > that belongs to uncontrolled user-ns can create another (child) user-
> > namespace that is uncontrolled. Any other process (that either does
> > not have SYS_ADMIN or belongs to a controlled user-ns) can only
> > create a user-ns that is controlled.
>
> That's a huge change though.  It means that any system that previously
> used unprivileged containers will need new privileged code (which always
> risks more privilege leaks through the new code) to re-enable what was
> possible without privilege before.  That's a regression.
>
I wouldn't call it a regression, since the existing behavior is
preserved as long as the default mask is not altered, i.e. an
uncontrolled process can create a user-ns and have full control inside
that user-ns. The only difference is this: if, for example, 'something'
comes up that lets a specific capability expose ring-0, the admin can
quickly remove the capability in question from the mask, so that no
untrusted code can exploit that capability until either the kernel is
patched or workloads are sanitized with the discovery in mind. (I have
given some real-life examples of recently published vulnerabilities
involving CAP_NET_RAW in the cover letter.)

> I'm very much interested in what you want to do,  But it seems like
> it would be worth starting with some automated code analysis that shows
> exactly what code becomes accessible to unprivileged users with user
> namespaces which was accessible to unprivileged users before.  Then we
> can reason about classifying that code and perhaps limiting access to
> some of it.
I would like to look at this as 'a tool' available to admins, who can
quickly bring a possible compromise situation under control, probably
at the cost of some loss of functionality, until the kernel is
patched and the mask is restored to its default value.

I'm not sure if automated tools could discover anything since these
changes should not alter behavior in any way.
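
The rule discussed in this thread can be sketched in a few lines of
userspace C. This is only an illustration of the idea as described in
the messages above, not code from the patch: the struct, the helper
names and the way the mask is passed in are all hypothetical, and only
the two capability bit numbers follow linux/capability.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit numbers as in linux/capability.h; every other name is hypothetical. */
#define SKETCH_CAP_NET_RAW   13
#define SKETCH_CAP_SYS_ADMIN 21
#define SKETCH_CAP_BIT(n)    (UINT64_C(1) << (n))

/* Hypothetical stand-in for a user namespace. */
struct userns_sketch {
    bool controlled;
};

/*
 * A child namespace is uncontrolled only when its creator holds
 * SYS_ADMIN and itself lives in an uncontrolled namespace.
 */
static bool child_is_controlled(const struct userns_sketch *parent,
                                uint64_t creator_caps)
{
    assert(parent != NULL);
    bool has_admin = (creator_caps & SKETCH_CAP_BIT(SKETCH_CAP_SYS_ADMIN)) != 0;

    return parent->controlled || !has_admin;
}

/* The admin-set mask only limits processes in controlled namespaces. */
static uint64_t effective_caps(const struct userns_sketch *ns,
                               uint64_t requested, uint64_t mask)
{
    assert(ns != NULL);
    return ns->controlled ? (requested & mask) : requested;
}
```

With the default (all-ones) mask nothing changes for anyone; clearing
the CAP_NET_RAW bit removes that capability only inside controlled
namespaces.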


[PATCH net-next] net: netlink: Update attr validation to require exact length for some types

2017-11-05 Thread David Ahern
Attributes using NLA_U* and NLA_S* (where * is 8, 16, 32 and 64) are
expected to be an exact length. Split these data types from
nla_attr_minlen into nla_attr_len and update validate_nla to require
the attribute to have exact length for them.

Signed-off-by: David Ahern 
---
resend. First one was in response to a thread and did not show up
in patchworks.

 lib/nlattr.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/lib/nlattr.c b/lib/nlattr.c
index 927c2f19f119..b5e360e7dfc8 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -14,19 +14,23 @@
 #include 
 #include 
 
-static const u8 nla_attr_minlen[NLA_TYPE_MAX+1] = {
+/* for these data types attribute length must be exactly given size */
+static const u8 nla_attr_len[NLA_TYPE_MAX+1] = {
[NLA_U8]= sizeof(u8),
[NLA_U16]   = sizeof(u16),
[NLA_U32]   = sizeof(u32),
[NLA_U64]   = sizeof(u64),
-   [NLA_MSECS] = sizeof(u64),
-   [NLA_NESTED]= NLA_HDRLEN,
[NLA_S8]= sizeof(s8),
[NLA_S16]   = sizeof(s16),
[NLA_S32]   = sizeof(s32),
[NLA_S64]   = sizeof(s64),
 };
 
+static const u8 nla_attr_minlen[NLA_TYPE_MAX+1] = {
+   [NLA_MSECS] = sizeof(u64),
+   [NLA_NESTED]= NLA_HDRLEN,
+};
+
 static int validate_nla_bitfield32(const struct nlattr *nla,
   u32 *valid_flags_allowed)
 {
@@ -64,6 +68,13 @@ static int validate_nla(const struct nlattr *nla, int maxtype,
 
BUG_ON(pt->type > NLA_TYPE_MAX);
 
+   /* for data types NLA_U* and NLA_S* require exact length */
+   if (nla_attr_len[pt->type]) {
+   if (attrlen != nla_attr_len[pt->type])
+   return -ERANGE;
+   return 0;
+   }
+
switch (pt->type) {
case NLA_FLAG:
if (attrlen > 0)
-- 
2.1.4
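
The split between exact-length and minimum-length types in the patch
above can be tried out in userspace with a minimal sketch like the one
below. The enum, table names and function are hypothetical stand-ins
chosen for illustration; only the two-table lookup and the -ERANGE
result mirror what the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute types, standing in for a few NLA_* values. */
enum sketch_type {
    SK_UNSPEC,
    SK_U8,
    SK_U16,
    SK_U32,
    SK_U64,
    SK_MSECS,
    SK_TYPE_MAX = SK_MSECS,
};

/* Types whose payload must have exactly this size. */
static const uint8_t sketch_exact_len[SK_TYPE_MAX + 1] = {
    [SK_U8]  = sizeof(uint8_t),
    [SK_U16] = sizeof(uint16_t),
    [SK_U32] = sizeof(uint32_t),
    [SK_U64] = sizeof(uint64_t),
};

/* Types whose payload must have at least this size. */
static const uint8_t sketch_min_len[SK_TYPE_MAX + 1] = {
    [SK_MSECS] = sizeof(uint64_t),
};

/* Returns 0 when the length is acceptable, -ERANGE otherwise. */
static int sketch_validate(enum sketch_type type, size_t attrlen)
{
    assert(type <= SK_TYPE_MAX);

    /* Exact-length types are decided here and never reach the min check. */
    if (sketch_exact_len[type])
        return attrlen == sketch_exact_len[type] ? 0 : -ERANGE;

    if (attrlen < sketch_min_len[type])
        return -ERANGE;
    return 0;
}
```

In this sketch an 8-byte payload for SK_U32 is rejected, whereas a pure
minimum-length check would have accepted it.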



[PATCH net-next 1/3] net: neigh: Add helper to flush entries on carrier down

2017-11-05 Thread David Ahern
Add neigh_carrier_down helper to flush non-permanent entries on carrier
down.

Signed-off-by: David Ahern 
---
 include/net/neighbour.h |  1 +
 net/core/neighbour.c| 26 ++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 2492000e1035..b2eff6ffbfdc 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -320,6 +320,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, u32 flags,
 void __neigh_set_probe_once(struct neighbour *neigh);
 bool neigh_remove_one(struct neighbour *ndel, struct neigh_table *tbl);
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev);
+int neigh_carrier_down(struct neigh_table *tbl, struct net_device *dev);
 int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
 int neigh_resolve_output(struct neighbour *neigh, struct sk_buff *skb);
 int neigh_connected_output(struct neighbour *neigh, struct sk_buff *skb);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 6ea3a1a7f36a..a7012bca1ce9 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -229,7 +229,8 @@ static void pneigh_queue_purge(struct sk_buff_head *list)
}
 }
 
-static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
+static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev,
+   bool skip_perm)
 {
int i;
struct neigh_hash_table *nht;
@@ -247,6 +248,10 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
np = &n->next;
continue;
}
+   if (skip_perm && n->nud_state & NUD_PERMANENT) {
+   np = &n->next;
+   continue;
+   }
rcu_assign_pointer(*np,
   rcu_dereference_protected(n->next,
lockdep_is_held(&tbl->lock)));
@@ -282,20 +287,33 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev)
 {
write_lock_bh(&tbl->lock);
-   neigh_flush_dev(tbl, dev);
+   neigh_flush_dev(tbl, dev, false);
write_unlock_bh(&tbl->lock);
 }
 EXPORT_SYMBOL(neigh_changeaddr);
 
-int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
+static void __neigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
+  bool skip_perm)
 {
write_lock_bh(&tbl->lock);
-   neigh_flush_dev(tbl, dev);
+   neigh_flush_dev(tbl, dev, skip_perm);
pneigh_ifdown(tbl, dev);
write_unlock_bh(&tbl->lock);
 
del_timer_sync(&tbl->proxy_timer);
pneigh_queue_purge(&tbl->proxy_queue);
+}
+
+int neigh_carrier_down(struct neigh_table *tbl, struct net_device *dev)
+{
+   __neigh_ifdown(tbl, dev, true);
+   return 0;
+}
+EXPORT_SYMBOL(neigh_carrier_down);
+
+int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
+{
+   __neigh_ifdown(tbl, dev, false);
return 0;
 }
 EXPORT_SYMBOL(neigh_ifdown);
-- 
2.1.4
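
The skip_perm logic added to neigh_flush_dev() can be illustrated with
a plain singly linked list. The sketch below is a userspace
approximation under simplifying assumptions (no RCU, no locking, no
hash table, entries are only unlinked and not released); the struct and
function names are hypothetical, and the flag value is copied from
NUD_PERMANENT purely for readability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_NUD_PERMANENT 0x80 /* same value as NUD_PERMANENT */

/* Hypothetical, much reduced neighbour entry. */
struct sk_neigh {
    struct sk_neigh *next;
    int ifindex;
    unsigned int state;
};

/*
 * Unlink every entry that belongs to ifindex. With skip_perm set,
 * permanent entries are left in place. Returns the number of
 * entries unlinked.
 */
static int sk_flush_dev(struct sk_neigh **head, int ifindex, bool skip_perm)
{
    struct sk_neigh **np = head;
    struct sk_neigh *n;
    int removed = 0;

    assert(head != NULL);
    while ((n = *np) != NULL) {
        if (n->ifindex != ifindex ||
            (skip_perm && (n->state & SK_NUD_PERMANENT))) {
            np = &n->next; /* keep this entry, move on */
            continue;
        }
        *np = n->next; /* unlink */
        n->next = NULL;
        removed++;
    }
    return removed;
}
```

Calling it with skip_perm set corresponds to the carrier-down case,
while passing false corresponds to the existing ifdown behaviour of
flushing everything for the device.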



Re: [PATCH] reduce the spinlock conflict during massive connect

2017-11-05 Thread Eric Dumazet
On Mon, 2017-11-06 at 14:48 +0800, Liu Yu wrote:
> On Mon, Nov 6, 2017 at 1:27 PM, Eric Dumazet  wrote:
> > On Mon, 2017-11-06 at 10:28 +0800, Liu Yu wrote:
> >> From: Liu Yu 
> >>
> >> When a mount of processes connect to the same port at the same address
> >> simultaneously, they are likely getting the same bhash and therefore
> >> conflict with each other.
> >>
> >> The more the cpu number, the worse in this case.
> >>
> >> Use spin_trylock instead for this scene, which seems doesn't matter
> >> for common case.
> >>
> >> Signed-off-by: Liu Yu 
> >> ---
> >>  net/ipv4/inet_hashtables.c |6 +-
> >>  1 files changed, 5 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> >> index e7d15fb..cc11ec7 100644
> >> --- a/net/ipv4/inet_hashtables.c
> >> +++ b/net/ipv4/inet_hashtables.c
> >> @@ -581,13 +581,17 @@ int __inet_hash_connect(struct 
> >> inet_timewait_death_row *death_row,
> >>  other_parity_scan:
> >>   port = low + offset;
> >>   for (i = 0; i < remaining; i += 2, port += 2) {
> >> + int ret;
> >> +
> >>   if (unlikely(port >= high))
> >>   port -= remaining;
> >>   if (inet_is_local_reserved_port(net, port))
> >>   continue;
> >>   head = &hinfo->bhash[inet_bhashfn(net, port,
> >> hinfo->bhash_size)];
> >> - spin_lock_bh(&head->lock);
> >> + ret = spin_trylock(&head->lock);
> >> + if (unlikely(!ret))
> >> + continue;
> >>
> >>   /* Does not bother with rcv_saddr checks, because
> >>* the established check is already unique enough.
> >
> > This is broken.
> >
> > I am pretty sure you have not really tested this patch properly.
> >
> > Chances are very high that a connect() will miss slots and wont succeed,
> > when table is almost full.
> 
> Thanks for your comments!
> 
> Can you explain how connect() misses slots when the table is almost full?


Every time your spin_trylock() fails, you skip a port without examining it.

Now go back to the loop:

for (i = 0; i < remaining; i += 2, port += 2) {

Since we no longer look at all the ports, we are going to fail to find a
4-tuple.

A spin_trylock() is really not a good idea.

Find something else if you really have contention at connect()
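
To make the objection concrete, here is a small userspace model of the
port scan. It is a sketch under simplifying assumptions, not kernel
code: the struct, the "contended" flag and the function name are
hypothetical, and a failed trylock is modelled as simply skipping the
slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one candidate port during the scan. */
struct sk_port {
    bool in_use;    /* a usable 4-tuple is not available on this port */
    bool contended; /* another CPU holds the bucket lock right now */
};

/* Returns the index of the first usable port, or -1 if none is found. */
static int sk_scan(const struct sk_port *ports, size_t n, bool use_trylock)
{
    assert(ports != NULL);
    for (size_t i = 0; i < n; i++) {
        if (use_trylock && ports[i].contended)
            continue; /* trylock failed: this port is never examined */

        /* A blocking lock would wait here and then examine the port. */
        if (!ports[i].in_use)
            return (int)i;
    }
    return -1;
}
```

When the only free port happens to sit in a contended bucket, the
blocking variant still finds it, while the trylock variant returns -1
even though the table is not full.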

> 
> >
> > Performance is nice, but we actually need to allocate a 4-tuple in a
> > more deterministic fashion.
> >
> 
> So, what 4th element would you suggest?

What 4th element am I suggesting?




[PATCH net-next 2/3] net: ipv4: flush neighbor entries when carrier is off

2017-11-05 Thread David Ahern
Commit a6db4494d218c ("net: ipv4: Consider failed nexthops in multipath
routes") added support for checking neighbor state when selecting a path
for multipath route lookups. It works but incurs a delay waiting for
the neighbor entry to timeout. Improve the path selection by flushing
non-permanent neighbor entries when carrier is off.

Signed-off-by: Satish Ashok 
Signed-off-by: David Ahern 
---
 net/ipv4/fib_frontend.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index f02819134ba2..aa8fea74858f 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1226,10 +1226,13 @@ static int fib_netdev_event(struct notifier_block 
*this, unsigned long event, vo
break;
case NETDEV_CHANGE:
flags = dev_get_flags(dev);
-   if (flags & (IFF_RUNNING | IFF_LOWER_UP))
+   if (flags & (IFF_RUNNING | IFF_LOWER_UP)) {
fib_sync_up(dev, RTNH_F_LINKDOWN);
-   else
+   } else {
fib_sync_down_dev(dev, event, false);
+   if (IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev))
+   neigh_carrier_down(&arp_tbl, dev);
+   }
/* fall through */
case NETDEV_CHANGEMTU:
rt_cache_flush(net);
-- 
2.1.4



[PATCH net-next 0/3] net: flush neighbor entries when carrier is off

2017-11-05 Thread David Ahern
Commit a6db4494d218c ("net: ipv4: Consider failed nexthops in multipath
routes") added support for checking neighbor state when selecting a path
for multipath route lookups. It works but incurs a delay waiting for
the neighbor entry to timeout. Improve the path selection by flushing
non-permanent neighbor entries when carrier is off.

David Ahern (3):
  net: neigh: Add helper to flush entries on carrier down
  net: ipv4: flush neighbor entries when carrier is off
  net: ipv6: flush neighbor entries when carrier is off

 include/net/neighbour.h |  1 +
 net/core/neighbour.c| 26 ++
 net/ipv4/fib_frontend.c |  7 +--
 net/ipv6/addrconf.c |  3 +++
 4 files changed, 31 insertions(+), 6 deletions(-)

-- 
2.1.4



[PATCH net-next 3/3] net: ipv6: flush neighbor entries when carrier is off

2017-11-05 Thread David Ahern
Similar to IPv4, flush non-permanent neighbor entries on carrier down
to improve path selection for multipath routes.

Signed-off-by: Satish Ashok 
Signed-off-by: David Ahern 
---
 net/ipv6/addrconf.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 5a8a10229a07..85bddff5eac6 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3432,6 +3432,9 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
run_pending = 1;
}
} else if (event == NETDEV_CHANGE) {
+   if (idev && idev->cnf.ignore_routes_with_linkdown)
+   neigh_carrier_down(&nd_tbl, dev);
+
if (!addrconf_link_ready(dev)) {
/* device is still not ready. */
break;
-- 
2.1.4



Re: [PATCH] reduce the spinlock conflict during massive connect

2017-11-05 Thread Liu Yu
On Mon, Nov 6, 2017 at 1:27 PM, Eric Dumazet  wrote:
> On Mon, 2017-11-06 at 10:28 +0800, Liu Yu wrote:
>> From: Liu Yu 
>>
>> When a mount of processes connect to the same port at the same address
>> simultaneously, they are likely getting the same bhash and therefore
>> conflict with each other.
>>
>> The more the cpu number, the worse in this case.
>>
>> Use spin_trylock instead for this scene, which seems doesn't matter
>> for common case.
>>
>> Signed-off-by: Liu Yu 
>> ---
>>  net/ipv4/inet_hashtables.c |6 +-
>>  1 files changed, 5 insertions(+), 1 deletions(-)
>>
>> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
>> index e7d15fb..cc11ec7 100644
>> --- a/net/ipv4/inet_hashtables.c
>> +++ b/net/ipv4/inet_hashtables.c
>> @@ -581,13 +581,17 @@ int __inet_hash_connect(struct inet_timewait_death_row 
>> *death_row,
>>  other_parity_scan:
>>   port = low + offset;
>>   for (i = 0; i < remaining; i += 2, port += 2) {
>> + int ret;
>> +
>>   if (unlikely(port >= high))
>>   port -= remaining;
>>   if (inet_is_local_reserved_port(net, port))
>>   continue;
>>   head = &hinfo->bhash[inet_bhashfn(net, port,
>> hinfo->bhash_size)];
>> - spin_lock_bh(&head->lock);
>> + ret = spin_trylock(&head->lock);
>> + if (unlikely(!ret))
>> + continue;
>>
>>   /* Does not bother with rcv_saddr checks, because
>>* the established check is already unique enough.
>
> This is broken.
>
> I am pretty sure you have not really tested this patch properly.
>
> Chances are very high that a connect() will miss slots and wont succeed,
> when table is almost full.

Thanks for your comments!

Can you explain how connect() misses slots when the table is almost full?

>
> Performance is nice, but we actually need to allocate a 4-tuple in a
> more deterministic fashion.
>

So, what 4th element would you suggest?


Re: [PATCH] reduce the spinlock conflict during massive connect

2017-11-05 Thread Liu Yu
On Mon, Nov 6, 2017 at 12:12 PM, Cong Wang  wrote:
> On Sun, Nov 5, 2017 at 6:28 PM, Liu Yu  wrote:
>> -   spin_lock_bh(&head->lock);
>> +   ret = spin_trylock(&head->lock);
>
> Clearly you want spin_trylock_bh() instead.

Good catch! Thanks!


[patch net-next 8/9] mlxsw: spectrum: Support RED xstats

2017-11-05 Thread Jiri Pirko
From: Nogah Frankel 

Add support for ndo_setup_tc with enum tc_setup_type value of
TC_SETUP_RED_XSTATS. This call returns the RED qdisc xstats from the cache
if the requested handle ID matches the root qdisc ID, and fails
otherwise.

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |  9 
 .../net/ethernet/mellanox/mlxsw/spectrum_qdisc.c   | 51 ++
 2 files changed, 60 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index e68299e..a86a493 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "port.h"
 #include "core.h"
@@ -211,6 +212,14 @@ enum mlxsw_sp_qdisc_type {
 struct mlxsw_sp_qdisc {
u32 handle;
enum mlxsw_sp_qdisc_type type;
+   struct red_stats xstats_base;
+   union {
+   struct {
+   u64 tail_drop_base;
+   u64 ecn_base;
+   u64 wred_drop_base;
+   } red;
+   } xstats;
 };
 
 /* No need an internal lock; At worse - miss a single periodic iteration */
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
index c33e51a..b97b30e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "spectrum.h"
 #include "reg.h"
@@ -77,6 +78,27 @@ mlxsw_sp_tclass_congestion_disable(struct mlxsw_sp_port *mlxsw_sp_port,
return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(cwtpm), cwtpm_cmd);
 }
 
+static void
+mlxsw_sp_setup_tc_qdisc_clean_stats(struct mlxsw_sp_port *mlxsw_sp_port,
+   struct mlxsw_sp_qdisc *mlxsw_sp_qdisc,
+   int tclass_num)
+{
+   struct red_stats *xstats_base = &mlxsw_sp_qdisc->xstats_base;
+   struct mlxsw_sp_port_xstats *xstats;
+
+   xstats = &mlxsw_sp_port->periodic_hw_stats.xstats;
+
+   switch (mlxsw_sp_qdisc->type) {
+   case MLXSW_SP_QDISC_RED:
+   xstats_base->prob_mark = xstats->ecn;
+   xstats_base->prob_drop = xstats->wred_drop[tclass_num];
+   xstats_base->pdrop = xstats->tail_drop[tclass_num];
+   break;
+   default:
+   break;
+   }
+}
+
 static int
 mlxsw_sp_qdisc_red_destroy(struct mlxsw_sp_port *mlxsw_sp_port, u32 handle,
   struct mlxsw_sp_qdisc *mlxsw_sp_qdisc,
@@ -135,6 +157,11 @@ mlxsw_sp_qdisc_red_replace(struct mlxsw_sp_port *mlxsw_sp_port, u32 handle,
goto err_config;
 
mlxsw_sp_qdisc->type = MLXSW_SP_QDISC_RED;
+   if (mlxsw_sp_qdisc->handle != handle)
+   mlxsw_sp_setup_tc_qdisc_clean_stats(mlxsw_sp_port,
+   mlxsw_sp_qdisc,
+   tclass_num);
+
mlxsw_sp_qdisc->handle = handle;
return 0;
 
@@ -146,6 +173,26 @@ mlxsw_sp_qdisc_red_replace(struct mlxsw_sp_port *mlxsw_sp_port, u32 handle,
return err;
 }
 
+static int
+mlxsw_sp_qdisc_get_red_xstats(struct mlxsw_sp_port *mlxsw_sp_port, u32 handle,
+ struct mlxsw_sp_qdisc *mlxsw_sp_qdisc,
+ int tclass_num, struct red_stats *res)
+{
+   struct red_stats *xstats_base = &mlxsw_sp_qdisc->xstats_base;
+   struct mlxsw_sp_port_xstats *xstats;
+
+   if (mlxsw_sp_qdisc->handle != handle ||
+   mlxsw_sp_qdisc->type != MLXSW_SP_QDISC_RED)
+   return -EOPNOTSUPP;
+
+   xstats = &mlxsw_sp_port->periodic_hw_stats.xstats;
+
+   res->prob_drop = xstats->wred_drop[tclass_num] - xstats_base->prob_drop;
+   res->prob_mark = xstats->ecn - xstats_base->prob_mark;
+   res->pdrop = xstats->tail_drop[tclass_num] - xstats_base->pdrop;
+   return 0;
+}
+
 #define MLXSW_SP_PORT_DEFAULT_TCLASS 0
 
 int mlxsw_sp_setup_tc_red(struct mlxsw_sp_port *mlxsw_sp_port,
@@ -168,6 +215,10 @@ int mlxsw_sp_setup_tc_red(struct mlxsw_sp_port *mlxsw_sp_port,
case TC_RED_DESTROY:
return mlxsw_sp_qdisc_red_destroy(mlxsw_sp_port, p->handle,
  mlxsw_sp_qdisc, tclass_num);
+   case TC_RED_XSTATS:
+   return mlxsw_sp_qdisc_get_red_xstats(mlxsw_sp_port, p->handle,
+mlxsw_sp_qdisc, tclass_num,
+p->xstats);
default:
return -EOPNOTSUPP;
}
-- 
2.9.5
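
The xstats handling in the patch above boils down to a baseline
snapshot: remember the cumulative hardware counters when the qdisc is
(re)created, and report current minus baseline afterwards. A minimal
userspace sketch of that idea follows; the struct and function names
are hypothetical and the per-tclass indexing is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cumulative hardware counters for one traffic class. */
struct sk_hw_counters {
    uint64_t ecn;
    uint64_t wred_drop;
    uint64_t tail_drop;
};

/* Hypothetical per-qdisc state: counters as seen at qdisc setup time. */
struct sk_qdisc {
    struct sk_hw_counters base;
};

/* Take the baseline snapshot, as done when a new handle is installed. */
static void sk_qdisc_clean_stats(struct sk_qdisc *q,
                                 const struct sk_hw_counters *hw)
{
    assert(q != NULL && hw != NULL);
    q->base = *hw;
}

/* Report what has accumulated since the baseline was taken. */
static struct sk_hw_counters
sk_qdisc_get_xstats(const struct sk_qdisc *q, const struct sk_hw_counters *hw)
{
    struct sk_hw_counters res;

    assert(q != NULL && hw != NULL);
    res.ecn = hw->ecn - q->base.ecn;
    res.wred_drop = hw->wred_drop - q->base.wred_drop;
    res.tail_drop = hw->tail_drop - q->base.tail_drop;
    return res;
}
```

Because the baseline is not touched on read, repeated reads return the
running total since setup rather than per-read increments.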



[patch net-next 9/9] mlxsw: spectrum: Support general qdisc stats

2017-11-05 Thread Jiri Pirko
From: Nogah Frankel 

Add support for ndo_setup_tc with enum tc_setup_type value of
TC_SETUP_QDISC_STATS. This call updates the generic qdisc stats from the
cache if the requested handle ID matches the root qdisc ID, and fails
otherwise.
Currently, qlen and requeues are not supported.

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |  5 +++
 .../net/ethernet/mellanox/mlxsw/spectrum_qdisc.c   | 51 ++
 2 files changed, 56 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index a86a493..58cf222 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -220,6 +220,11 @@ struct mlxsw_sp_qdisc {
u64 wred_drop_base;
} red;
} xstats;
+
+   u64 tx_bytes;
+   u64 tx_packets;
+   u64 drops;
+   u64 overlimits;
 };
 
 /* No need an internal lock; At worse - miss a single periodic iteration */
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
index b97b30e..c33beac 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
@@ -85,14 +85,24 @@ mlxsw_sp_setup_tc_qdisc_clean_stats(struct mlxsw_sp_port *mlxsw_sp_port,
 {
struct red_stats *xstats_base = &mlxsw_sp_qdisc->xstats_base;
struct mlxsw_sp_port_xstats *xstats;
+   struct rtnl_link_stats64 *stats;
 
xstats = &mlxsw_sp_port->periodic_hw_stats.xstats;
+   stats = &mlxsw_sp_port->periodic_hw_stats.stats;
+
+   mlxsw_sp_qdisc->tx_packets = stats->tx_packets;
+   mlxsw_sp_qdisc->tx_bytes = stats->tx_bytes;
 
switch (mlxsw_sp_qdisc->type) {
case MLXSW_SP_QDISC_RED:
xstats_base->prob_mark = xstats->ecn;
xstats_base->prob_drop = xstats->wred_drop[tclass_num];
xstats_base->pdrop = xstats->tail_drop[tclass_num];
+
+   mlxsw_sp_qdisc->overlimits = xstats_base->prob_drop +
+xstats_base->prob_mark;
+   mlxsw_sp_qdisc->drops = xstats_base->prob_drop +
+   xstats_base->pdrop;
break;
default:
break;
@@ -193,6 +203,43 @@ mlxsw_sp_qdisc_get_red_xstats(struct mlxsw_sp_port *mlxsw_sp_port, u32 handle,
return 0;
 }
 
+static int
+mlxsw_sp_qdisc_get_red_stats(struct mlxsw_sp_port *mlxsw_sp_port, u32 handle,
+struct mlxsw_sp_qdisc *mlxsw_sp_qdisc,
+int tclass_num,
+struct tc_red_qopt_offload_stats *res)
+{
+   u64 tx_bytes, tx_packets, overlimits, drops;
+   struct mlxsw_sp_port_xstats *xstats;
+   struct rtnl_link_stats64 *stats;
+
+   if (mlxsw_sp_qdisc->handle != handle ||
+   mlxsw_sp_qdisc->type != MLXSW_SP_QDISC_RED)
+   return -EOPNOTSUPP;
+
+   xstats = &mlxsw_sp_port->periodic_hw_stats.xstats;
+   stats = &mlxsw_sp_port->periodic_hw_stats.stats;
+
+   tx_bytes = stats->tx_bytes - mlxsw_sp_qdisc->tx_bytes;
+   tx_packets = stats->tx_packets - mlxsw_sp_qdisc->tx_packets;
+   overlimits = xstats->wred_drop[tclass_num] + xstats->ecn -
+mlxsw_sp_qdisc->overlimits;
+   drops = xstats->wred_drop[tclass_num] + xstats->tail_drop[tclass_num] -
+   mlxsw_sp_qdisc->drops;
+
+   _bstats_update(res->bstats, tx_bytes, tx_packets);
+   res->qstats->overlimits += overlimits;
+   res->qstats->drops += drops;
+   res->qstats->backlog += mlxsw_sp_cells_bytes(mlxsw_sp_port->mlxsw_sp,
+   xstats->backlog[tclass_num]);
+
+   mlxsw_sp_qdisc->drops +=  drops;
+   mlxsw_sp_qdisc->overlimits += overlimits;
+   mlxsw_sp_qdisc->tx_bytes += tx_bytes;
+   mlxsw_sp_qdisc->tx_packets += tx_packets;
+   return 0;
+}
+
 #define MLXSW_SP_PORT_DEFAULT_TCLASS 0
 
 int mlxsw_sp_setup_tc_red(struct mlxsw_sp_port *mlxsw_sp_port,
@@ -219,6 +266,10 @@ int mlxsw_sp_setup_tc_red(struct mlxsw_sp_port *mlxsw_sp_port,
return mlxsw_sp_qdisc_get_red_xstats(mlxsw_sp_port, p->handle,
 mlxsw_sp_qdisc, tclass_num,
 p->xstats);
+   case TC_RED_STATS:
+   return mlxsw_sp_qdisc_get_red_stats(mlxsw_sp_port, p->handle,
+   mlxsw_sp_qdisc, tclass_num,
+   &p->stats);
default:
return -EOPNOTSUPP;
}
-- 
2.9.5
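
Unlike the xstats call, the generic stats call in the patch above
reports increments: it adds the change since the previous read to the
caller's totals and then advances its own cached value by the same
amount. The sketch below shows that pattern for a single counter; the
struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache: how much of the hardware total was already reported. */
struct sk_drop_cache {
    uint64_t reported;
};

/*
 * Add the not-yet-reported part of hw_total to *qstats_drops and
 * remember that it has been reported. Returns the increment.
 */
static uint64_t sk_report_drops(struct sk_drop_cache *cache,
                                uint64_t hw_total, uint64_t *qstats_drops)
{
    uint64_t delta;

    assert(cache != NULL && qstats_drops != NULL);
    delta = hw_total - cache->reported;
    *qstats_drops += delta;
    cache->reported += delta;
    return delta;
}
```

A second read with an unchanged hardware total therefore adds nothing,
which is what keeps the caller's accumulated totals from double
counting.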



[patch net-next 6/9] mlxsw: reg: Add ext and tc-cong counter groups

2017-11-05 Thread Jiri Pirko
From: Yuval Mintz 

This adds the definitions for 2 new counter groups, which are
necessary for obtaining the ECN and WRED counters.

Signed-off-by: Yuval Mintz 
Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index db394ec..6c4e08b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -3341,8 +3341,10 @@ MLXSW_ITEM32(reg, ppcnt, pnat, 0x00, 14, 2);
 
 enum mlxsw_reg_ppcnt_grp {
MLXSW_REG_PPCNT_IEEE_8023_CNT = 0x0,
+   MLXSW_REG_PPCNT_EXT_CNT = 0x5,
MLXSW_REG_PPCNT_PRIO_CNT = 0x10,
MLXSW_REG_PPCNT_TC_CNT = 0x11,
+   MLXSW_REG_PPCNT_TC_CONG_TC = 0x13,
 };
 
 /* reg_ppcnt_grp
@@ -3358,6 +3360,7 @@ enum mlxsw_reg_ppcnt_grp {
  * 0x10: Per Priority Counters
  * 0x11: Per Traffic Class Counters
  * 0x12: Physical Layer Counters
+ * 0x13: Per Traffic Class Congestion Counters
  * Access: Index
  */
 MLXSW_ITEM32(reg, ppcnt, grp, 0x00, 0, 6);
@@ -3496,6 +3499,14 @@ MLXSW_ITEM64(reg, ppcnt, a_pause_mac_ctrl_frames_received,
 MLXSW_ITEM64(reg, ppcnt, a_pause_mac_ctrl_frames_transmitted,
 MLXSW_REG_PPCNT_COUNTERS_OFFSET + 0x90, 0, 64);
 
+/* Ethernet Extended Counter Group Counters */
+
+/* reg_ppcnt_ecn_marked
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, ecn_marked,
+MLXSW_REG_PPCNT_COUNTERS_OFFSET + 0x08, 0, 64);
+
 /* Ethernet Per Priority Group Counters */
 
 /* reg_ppcnt_rx_octets
@@ -3571,6 +3582,14 @@ MLXSW_ITEM64(reg, ppcnt, tc_transmit_queue,
 MLXSW_ITEM64(reg, ppcnt, tc_no_buffer_discard_uc,
 MLXSW_REG_PPCNT_COUNTERS_OFFSET + 0x08, 0, 64);
 
+/* Ethernet Per Traffic Class Congestion Group Counters */
+
+/* reg_ppcnt_wred_discard
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, ppcnt, wred_discard,
+MLXSW_REG_PPCNT_COUNTERS_OFFSET + 0x00, 0, 64);
+
 static inline void mlxsw_reg_ppcnt_pack(char *payload, u8 local_port,
enum mlxsw_reg_ppcnt_grp grp,
u8 prio_tc)
-- 
2.9.5



[patch net-next 3/9] net_sch: cbs: Change TC_SETUP_CBS to TC_SETUP_QDISC_CBS

2017-11-05 Thread Jiri Pirko
From: Nogah Frankel 

Change TC_SETUP_CBS to TC_SETUP_QDISC_CBS to match the new convention.

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
 include/linux/netdevice.h | 2 +-
 net/sched/sch_cbs.c   | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index e22bce7..43cf395 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2488,7 +2488,7 @@ static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type,
struct igb_adapter *adapter = netdev_priv(dev);
 
switch (type) {
-   case TC_SETUP_CBS:
+   case TC_SETUP_QDISC_CBS:
return igb_offload_cbs(adapter, type_data);
 
default:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 703885a..30f0f29 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -776,7 +776,7 @@ enum tc_setup_type {
TC_SETUP_CLSMATCHALL,
TC_SETUP_CLSBPF,
TC_SETUP_BLOCK,
-   TC_SETUP_CBS,
+   TC_SETUP_QDISC_CBS,
TC_SETUP_QDISC_RED,
 };
 
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index bdb533b..7a72980 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -212,7 +212,7 @@ static void cbs_disable_offload(struct net_device *dev,
cbs.queue = q->queue;
cbs.enable = 0;
 
-   err = ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
+   err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_CBS, &cbs);
if (err < 0)
pr_warn("Couldn't disable CBS offload for queue %d\n",
cbs.queue);
@@ -236,7 +236,7 @@ static int cbs_enable_offload(struct net_device *dev, struct cbs_sched_data *q,
cbs.idleslope = opt->idleslope;
cbs.sendslope = opt->sendslope;
 
-   err = ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
+   err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_CBS, &cbs);
if (err < 0)
return err;
 
-- 
2.9.5



[patch net-next 7/9] mlxsw: spectrum: Collect tclass related stats periodically

2017-11-05 Thread Jiri Pirko
From: Nogah Frankel 

Add more statistics to be collected from the HW periodically. These stats
are per tclass (except for ECN-marked packets, which are only counted
per port). They are needed to expose RED qdisc stats and xstats correctly.

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 34 ++
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |  9 +++
 2 files changed, 43 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index e42b3e7..1497b43 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1324,6 +1324,38 @@ static int mlxsw_sp_port_get_hw_stats(struct net_device *dev,
return err;
 }
 
+static void
+mlxsw_sp_port_get_hw_xstats(struct net_device *dev,
+   struct mlxsw_sp_port_xstats *xstats)
+{
+   char ppcnt_pl[MLXSW_REG_PPCNT_LEN];
+   int err, i;
+
+   err = mlxsw_sp_port_get_stats_raw(dev, MLXSW_REG_PPCNT_EXT_CNT, 0,
+ ppcnt_pl);
+   if (!err)
+   xstats->ecn = mlxsw_reg_ppcnt_ecn_marked_get(ppcnt_pl);
+
+   for (i = 0; i < TC_MAX_QUEUE; i++) {
+   err = mlxsw_sp_port_get_stats_raw(dev,
+ MLXSW_REG_PPCNT_TC_CONG_TC,
+ i, ppcnt_pl);
+   if (!err)
+   xstats->wred_drop[i] =
+   mlxsw_reg_ppcnt_wred_discard_get(ppcnt_pl);
+
+   err = mlxsw_sp_port_get_stats_raw(dev, MLXSW_REG_PPCNT_TC_CNT,
+ i, ppcnt_pl);
+   if (err)
+   continue;
+
+   xstats->backlog[i] =
+   mlxsw_reg_ppcnt_tc_transmit_queue_get(ppcnt_pl);
+   xstats->tail_drop[i] =
+   mlxsw_reg_ppcnt_tc_no_buffer_discard_uc_get(ppcnt_pl);
+   }
+}
+
 static void update_stats_cache(struct work_struct *work)
 {
struct mlxsw_sp_port *mlxsw_sp_port =
@@ -1335,6 +1367,8 @@ static void update_stats_cache(struct work_struct *work)
 
mlxsw_sp_port_get_hw_stats(mlxsw_sp_port->dev,
   &mlxsw_sp_port->periodic_hw_stats.stats);
+   mlxsw_sp_port_get_hw_xstats(mlxsw_sp_port->dev,
+   &mlxsw_sp_port->periodic_hw_stats.xstats);
 
 out:
mlxsw_core_schedule_dw(&mlxsw_sp_port->periodic_hw_stats.update_dw,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 76ebd58..e68299e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -213,6 +213,14 @@ struct mlxsw_sp_qdisc {
enum mlxsw_sp_qdisc_type type;
 };
 
+/* No need an internal lock; At worse - miss a single periodic iteration */
+struct mlxsw_sp_port_xstats {
+   u64 ecn;
+   u64 wred_drop[TC_MAX_QUEUE];
+   u64 tail_drop[TC_MAX_QUEUE];
+   u64 backlog[TC_MAX_QUEUE];
+};
+
 struct mlxsw_sp_port {
struct net_device *dev;
struct mlxsw_sp_port_pcpu_stats __percpu *pcpu_stats;
@@ -242,6 +250,7 @@ struct mlxsw_sp_port {
struct {
#define MLXSW_HW_STATS_UPDATE_TIME HZ
struct rtnl_link_stats64 stats;
+   struct mlxsw_sp_port_xstats xstats;
struct delayed_work update_dw;
} periodic_hw_stats;
struct mlxsw_sp_port_sample *sample;
-- 
2.9.5



[patch net-next 4/9] mlxsw: reg: Add cwtp & cwtpm registers

2017-11-05 Thread Jiri Pirko
From: Nogah Frankel 

This patch adds 2 new registers:
 - Congestion WRED ECN TClass Profile Register [CWTP]
 - Congestion WRED ECN TClass and Pool Mapping Register [CWTPM]

These registers would later be needed to offload RED-related
functionality to the HW.

Signed-off-by: Yuval Mintz 
Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 187 ++
 1 file changed, 187 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 5066553..db394ec 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -1758,6 +1758,191 @@ static inline void mlxsw_reg_spvmlr_pack(char *payload, u8 local_port,
}
 }
 
+/* CWTP - Congestion WRED ECN TClass Profile
+ * 
+ * Configures the profiles for queues of egress port and traffic class
+ */
+#define MLXSW_REG_CWTP_ID 0x2802
+#define MLXSW_REG_CWTP_BASE_LEN 0x28
+#define MLXSW_REG_CWTP_PROFILE_DATA_REC_LEN 0x08
+#define MLXSW_REG_CWTP_LEN 0x40
+
+MLXSW_REG_DEFINE(cwtp, MLXSW_REG_CWTP_ID, MLXSW_REG_CWTP_LEN);
+
+/* reg_cwtp_local_port
+ * Local port number
+ * Not supported for CPU port
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, cwtp, local_port, 0, 16, 8);
+
+/* reg_cwtp_traffic_class
+ * Traffic Class to configure
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, cwtp, traffic_class, 32, 0, 8);
+
+/* reg_cwtp_profile_min
+ * Minimum Average Queue Size of the profile in cells.
+ * Access: RW
+ */
+MLXSW_ITEM32_INDEXED(reg, cwtp, profile_min, MLXSW_REG_CWTP_BASE_LEN,
+0, 20, MLXSW_REG_CWTP_PROFILE_DATA_REC_LEN, 0, false);
+
+/* reg_cwtp_profile_percent
+ * Percentage of WRED and ECN marking for maximum Average Queue size
+ * Range is 0 to 100, units of integer percentage
+ * Access: RW
+ */
+MLXSW_ITEM32_INDEXED(reg, cwtp, profile_percent, MLXSW_REG_CWTP_BASE_LEN,
+24, 7, MLXSW_REG_CWTP_PROFILE_DATA_REC_LEN, 4, false);
+
+/* reg_cwtp_profile_max
+ * Maximum Average Queue size of the profile in cells
+ * Access: RW
+ */
+MLXSW_ITEM32_INDEXED(reg, cwtp, profile_max, MLXSW_REG_CWTP_BASE_LEN,
+0, 20, MLXSW_REG_CWTP_PROFILE_DATA_REC_LEN, 4, false);
+
+#define MLXSW_REG_CWTP_MIN_VALUE 64
+#define MLXSW_REG_CWTP_MAX_PROFILE 2
+#define MLXSW_REG_CWTP_DEFAULT_PROFILE 1
+
+static inline void mlxsw_reg_cwtp_pack(char *payload, u8 local_port,
+  u8 traffic_class)
+{
+   int i;
+
+   MLXSW_REG_ZERO(cwtp, payload);
+   mlxsw_reg_cwtp_local_port_set(payload, local_port);
+   mlxsw_reg_cwtp_traffic_class_set(payload, traffic_class);
+
+   for (i = 0; i <= MLXSW_REG_CWTP_MAX_PROFILE; i++) {
+   mlxsw_reg_cwtp_profile_min_set(payload, i,
+  MLXSW_REG_CWTP_MIN_VALUE);
+   mlxsw_reg_cwtp_profile_max_set(payload, i,
+  MLXSW_REG_CWTP_MIN_VALUE);
+   }
+}
+
+#define MLXSW_REG_CWTP_PROFILE_TO_INDEX(profile) (profile - 1)
+
+static inline void
+mlxsw_reg_cwtp_profile_pack(char *payload, u8 profile, u32 min, u32 max,
+   u32 probability)
+{
+   u8 index = MLXSW_REG_CWTP_PROFILE_TO_INDEX(profile);
+
+   mlxsw_reg_cwtp_profile_min_set(payload, index, min);
+   mlxsw_reg_cwtp_profile_max_set(payload, index, max);
+   mlxsw_reg_cwtp_profile_percent_set(payload, index, probability);
+}
+
+/* CWTPM - Congestion WRED ECN TClass and Pool Mapping
+ * ---
+ * The CWTPM register maps each egress port and traffic class to profile num.
+ */
+#define MLXSW_REG_CWTPM_ID 0x2803
+#define MLXSW_REG_CWTPM_LEN 0x44
+
+MLXSW_REG_DEFINE(cwtpm, MLXSW_REG_CWTPM_ID, MLXSW_REG_CWTPM_LEN);
+
+/* reg_cwtpm_local_port
+ * Local port number
+ * Not supported for CPU port
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, cwtpm, local_port, 0, 16, 8);
+
+/* reg_cwtpm_traffic_class
+ * Traffic Class to configure
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, cwtpm, traffic_class, 32, 0, 8);
+
+/* reg_cwtpm_ew
+ * Control enablement of WRED for traffic class:
+ * 0 - Disable
+ * 1 - Enable
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, cwtpm, ew, 36, 1, 1);
+
+/* reg_cwtpm_ee
+ * Control enablement of ECN for traffic class:
+ * 0 - Disable
+ * 1 - Enable
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, cwtpm, ee, 36, 0, 1);
+
+/* reg_cwtpm_tcp_g
+ * TCP Green Profile.
+ * Index of the profile within {port, traffic class} to use.
+ * 0 for disabling both WRED and ECN for this type of traffic.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, cwtpm, tcp_g, 52, 0, 2);
+
+/* reg_cwtpm_tcp_y
+ * TCP Yellow Profile.
+ * Index of the profile within {port, traffic class} to use.
+ * 0 for disabling both WRED and ECN for this type of traffic.
+ * Access: RW
+ */

[patch net-next 5/9] mlxsw: spectrum: Support RED qdisc offload

2017-11-05 Thread Jiri Pirko
From: Nogah Frankel 

Add support for ndo_setup_tc with the enum tc_setup_type value
TC_SETUP_QDISC_RED. This call sets a RED qdisc on a traffic class.
This patch supports the RED qdisc only as a root qdisc and sets it on the
default tclass. It can be set with or without ECN.

Signed-off-by: Yuval Mintz 
Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/Makefile   |   3 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c |   2 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |  15 ++
 .../net/ethernet/mellanox/mlxsw/spectrum_qdisc.c   | 174 +
 4 files changed, 193 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c

diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile 
b/drivers/net/ethernet/mellanox/mlxsw/Makefile
index 80f4efd..9463c3f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Makefile
+++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
@@ -19,7 +19,8 @@ mlxsw_spectrum-objs   := spectrum.o 
spectrum_buffers.o \
   spectrum_acl.o spectrum_flower.o \
   spectrum_cnt.o spectrum_fid.o \
   spectrum_ipip.o spectrum_acl_flex_actions.o \
-  spectrum_mr.o spectrum_mr_tcam.o
+  spectrum_mr.o spectrum_mr_tcam.o \
+  spectrum_qdisc.o
 mlxsw_spectrum-$(CONFIG_MLXSW_SPECTRUM_DCB)+= spectrum_dcb.o
 mlxsw_spectrum-$(CONFIG_NET_DEVLINK) += spectrum_dpipe.o
 obj-$(CONFIG_MLXSW_MINIMAL)+= mlxsw_minimal.o
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 63e5087..e42b3e7 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1797,6 +1797,8 @@ static int mlxsw_sp_setup_tc(struct net_device *dev, enum 
tc_setup_type type,
switch (type) {
case TC_SETUP_BLOCK:
return mlxsw_sp_setup_tc_block(mlxsw_sp_port, type_data);
+   case TC_SETUP_QDISC_RED:
+   return mlxsw_sp_setup_tc_red(mlxsw_sp_port, type_data);
default:
return -EOPNOTSUPP;
}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 47dd7e0..76ebd58 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -203,6 +203,16 @@ struct mlxsw_sp_port_vlan {
struct list_head bridge_vlan_node;
 };
 
+enum mlxsw_sp_qdisc_type {
+   MLXSW_SP_QDISC_NO_QDISC,
+   MLXSW_SP_QDISC_RED,
+};
+
+struct mlxsw_sp_qdisc {
+   u32 handle;
+   enum mlxsw_sp_qdisc_type type;
+};
+
 struct mlxsw_sp_port {
struct net_device *dev;
struct mlxsw_sp_port_pcpu_stats __percpu *pcpu_stats;
@@ -236,6 +246,7 @@ struct mlxsw_sp_port {
} periodic_hw_stats;
struct mlxsw_sp_port_sample *sample;
struct list_head vlans_list;
+   struct mlxsw_sp_qdisc root_qdisc;
 };
 
 static inline bool
@@ -546,6 +557,10 @@ void mlxsw_sp_flower_destroy(struct mlxsw_sp_port 
*mlxsw_sp_port, bool ingress,
 int mlxsw_sp_flower_stats(struct mlxsw_sp_port *mlxsw_sp_port, bool ingress,
  struct tc_cls_flower_offload *f);
 
+/* spectrum_qdisc.c */
+int mlxsw_sp_setup_tc_red(struct mlxsw_sp_port *mlxsw_sp_port,
+ struct tc_red_qopt_offload *p);
+
 /* spectrum_fid.c */
 int mlxsw_sp_fid_flood_set(struct mlxsw_sp_fid *fid,
   enum mlxsw_sp_flood_type packet_type, u8 local_port,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
new file mode 100644
index 000..c33e51a
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
@@ -0,0 +1,174 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c
+ * Copyright (c) 2017 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2017 Nogah Frankel 
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *contributors may be used to endorse or promote products derived from
+ *this software without specific prior written permission.
+ *
+ * Alternatively, this software 

[patch net-next 2/9] net_sch: mqprio: Change TC_SETUP_MQPRIO to TC_SETUP_QDISC_MQPRIO

2017-11-05 Thread Jiri Pirko
From: Nogah Frankel 

Change TC_SETUP_MQPRIO to TC_SETUP_QDISC_MQPRIO to match the new
convention.

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c   | 2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c| 2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c  | 2 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 2 +-
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c| 2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c| 2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 2 +-
 drivers/net/ethernet/sfc/falcon/tx.c   | 2 +-
 drivers/net/ethernet/sfc/tx.c  | 2 +-
 drivers/net/ethernet/ti/netcp_core.c   | 2 +-
 include/linux/netdevice.h  | 2 +-
 net/sched/sch_mqprio.c | 5 +++--
 15 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 3d53153..a74a8fb 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -2206,7 +2206,7 @@ static int xgbe_setup_tc(struct net_device *netdev, enum 
tc_setup_type type,
struct tc_mqprio_qopt *mqprio = type_data;
u8 tc;
 
-   if (type != TC_SETUP_MQPRIO)
+   if (type != TC_SETUP_QDISC_MQPRIO)
return -EOPNOTSUPP;
 
mqprio->hw = TC_MQPRIO_HW_OFFLOAD_TCS;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 1216c1f..4c739d5 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4289,7 +4289,7 @@ int __bnx2x_setup_tc(struct net_device *dev, enum 
tc_setup_type type,
 {
struct tc_mqprio_qopt *mqprio = type_data;
 
-   if (type != TC_SETUP_MQPRIO)
+   if (type != TC_SETUP_QDISC_MQPRIO)
return -EOPNOTSUPP;
 
mqprio->hw = TC_MQPRIO_HW_OFFLOAD_TCS;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 96416f5..e5472e5 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -7388,7 +7388,7 @@ static int bnxt_setup_tc(struct net_device *dev, enum 
tc_setup_type type,
switch (type) {
case TC_SETUP_BLOCK:
return bnxt_setup_tc_block(dev, type_data);
-   case TC_SETUP_MQPRIO: {
+   case TC_SETUP_QDISC_MQPRIO: {
struct tc_mqprio_qopt *mqprio = type_data;
 
mqprio->hw = TC_MQPRIO_HW_OFFLOAD_TCS;
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index ebc55b6..784dbf5 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -351,7 +351,7 @@ static int dpaa_setup_tc(struct net_device *net_dev, enum 
tc_setup_type type,
u8 num_tc;
int i;
 
-   if (type != TC_SETUP_MQPRIO)
+   if (type != TC_SETUP_QDISC_MQPRIO)
return -EOPNOTSUPP;
 
mqprio->hw = TC_MQPRIO_HW_OFFLOAD_TCS;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
index 2a0af11..5941509 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
@@ -1252,7 +1252,7 @@ static int hns3_setup_tc(struct net_device *netdev, void 
*type_data)
 static int hns3_nic_setup_tc(struct net_device *dev, enum tc_setup_type type,
 void *type_data)
 {
-   if (type != TC_SETUP_MQPRIO)
+   if (type != TC_SETUP_QDISC_MQPRIO)
return -EOPNOTSUPP;
 
return hns3_setup_tc(dev, type_data);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 81e4425..adc62fb 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1389,7 +1389,7 @@ static int __fm10k_setup_tc(struct net_device *dev, enum 
tc_setup_type type,
 {
struct tc_mqprio_qopt *mqprio = type_data;
 
-   if (type != TC_SETUP_MQPRIO)
+   if (type != TC_SETUP_QDISC_MQPRIO)
return -EOPNOTSUPP;
 
mqprio->hw = TC_MQPRIO_HW_OFFLOAD_TCS;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 05b94d8..17e6f64 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7550,7 +7550,7 

[patch net-next 1/9] net_sch: red: Add offload ability to RED qdisc

2017-11-05 Thread Jiri Pirko
From: Nogah Frankel 

Add the ability to offload RED qdisc by using ndo_setup_tc.
There are four commands for RED offloading:
* TC_RED_REPLACE: handles set and change.
* TC_RED_DESTROY: handles qdisc destroy.
* TC_RED_STATS: updates the qdisc's counters (given as reference).
* TC_RED_XSTATS: returns RED xstats.

Whether RED is offloaded is determined every time the dump action is
called, because a parent change of this qdisc could change its offload
state without requiring any RED function to be called.
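
To make the command set concrete, here is a user-space sketch of how a
driver-side handler might dispatch on the command. The toy_* types only
loosely mirror struct tc_red_qopt_offload from this patch, and the policy
shown (rejecting min >= max, answering -EOPNOTSUPP for stats when nothing
is offloaded) is an assumption made for the illustration, not what any
real driver is required to do.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_red_command {
    TOY_RED_REPLACE,
    TOY_RED_DESTROY,
    TOY_RED_STATS,
    TOY_RED_XSTATS,
};

struct toy_red_params {
    uint32_t min;
    uint32_t max;
    uint32_t probability;
    bool is_ecn;
};

struct toy_red_offload {
    enum toy_red_command command;
    uint32_t handle;
    struct toy_red_params set;
};

/* Hypothetical per-port state kept by the toy driver. */
struct toy_port {
    bool offloaded;
    uint32_t handle;
    struct toy_red_params hw;
};

static int toy_setup_tc_red(struct toy_port *port,
                            const struct toy_red_offload *p)
{
    assert(port && p);

    switch (p->command) {
    case TOY_RED_REPLACE:
        /* Covers both the initial set and later changes. */
        if (p->set.min >= p->set.max)
            return -EINVAL;
        port->hw = p->set;
        port->handle = p->handle;
        port->offloaded = true;
        return 0;
    case TOY_RED_DESTROY:
        if (port->offloaded && port->handle == p->handle)
            port->offloaded = false;
        return 0;
    case TOY_RED_STATS:
    case TOY_RED_XSTATS:
        /* A dump can use this answer to learn the offload state. */
        if (!port->offloaded || port->handle != p->handle)
            return -EOPNOTSUPP;
        return 0;
    }
    return -EOPNOTSUPP;
}
```

In this model the stats commands double as the "is it offloaded?" probe,
which is the role the dump path plays in the description above.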

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 include/linux/netdevice.h  |  1 +
 include/net/pkt_cls.h  | 30 
 include/uapi/linux/pkt_sched.h |  1 +
 net/sched/sch_red.c| 79 ++
 4 files changed, 111 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fda527c..71968a2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -777,6 +777,7 @@ enum tc_setup_type {
TC_SETUP_CLSBPF,
TC_SETUP_BLOCK,
TC_SETUP_CBS,
+   TC_SETUP_QDISC_RED,
 };
 
 /* These structures hold the attributes of bpf state that are being passed
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 98fef32..03c208d 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -703,4 +703,34 @@ struct tc_cookie {
u8  *data;
u32 len;
 };
+
+enum tc_red_command {
+   TC_RED_REPLACE,
+   TC_RED_DESTROY,
+   TC_RED_STATS,
+   TC_RED_XSTATS,
+};
+
+struct tc_red_qopt_offload_params {
+   u32 min;
+   u32 max;
+   u32 probability;
+   bool is_ecn;
+};
+struct tc_red_qopt_offload_stats {
+   struct gnet_stats_basic_packed *bstats;
+   struct gnet_stats_queue *qstats;
+};
+
+struct tc_red_qopt_offload {
+   enum tc_red_command command;
+   u32 handle;
+   u32 parent;
+   union {
+   struct tc_red_qopt_offload_params set;
+   struct tc_red_qopt_offload_stats stats;
+   struct red_stats *xstats;
+   };
+};
+
 #endif
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 5002562..6a2c5ea 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -256,6 +256,7 @@ struct tc_red_qopt {
 #define TC_RED_ECN 1
 #define TC_RED_HARDDROP2
 #define TC_RED_ADAPTATIVE  4
+#define TC_RED_OFFLOADED   8
 };
 
 struct tc_red_xstats {
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index fdfdb56..007dd8e 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -148,11 +149,37 @@ static void red_reset(struct Qdisc *sch)
red_restart(&q->vars);
 }
 
+static int red_offload(struct Qdisc *sch, bool enable)
+{
+   struct red_sched_data *q = qdisc_priv(sch);
+   struct net_device *dev = qdisc_dev(sch);
+   struct tc_red_qopt_offload opt = {
+   .handle = sch->handle,
+   .parent = sch->parent,
+   };
+
+   if (!tc_can_offload(dev) || !dev->netdev_ops->ndo_setup_tc)
+   return -EOPNOTSUPP;
+
+   if (enable) {
+   opt.command = TC_RED_REPLACE;
+   opt.set.min = q->parms.qth_min >> q->parms.Wlog;
+   opt.set.max = q->parms.qth_max >> q->parms.Wlog;
+   opt.set.probability = q->parms.max_P;
+   opt.set.is_ecn = red_use_ecn(q);
+   } else {
+   opt.command = TC_RED_DESTROY;
+   }
+
+   return dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED, &opt);
+}
+
 static void red_destroy(struct Qdisc *sch)
 {
struct red_sched_data *q = qdisc_priv(sch);
 
del_timer_sync(&q->adapt_timer);
+   red_offload(sch, false);
qdisc_destroy(q->qdisc);
 }
 
@@ -219,6 +246,7 @@ static int red_change(struct Qdisc *sch, struct nlattr *opt)
red_start_of_idle_period(&q->vars);
 
sch_tree_unlock(sch);
+   red_offload(sch, true);
return 0;
 }
 
@@ -244,6 +272,33 @@ static int red_init(struct Qdisc *sch, struct nlattr *opt)
return red_change(sch, opt);
 }
 
+static int red_dump_offload(struct Qdisc *sch, struct tc_red_qopt *opt)
+{
+   struct net_device *dev = qdisc_dev(sch);
+   struct tc_red_qopt_offload hw_stats = {
+   .handle = sch->handle,
+   .parent = sch->parent,
+   .command = TC_RED_STATS,
+   .stats.bstats = &sch->bstats,
+   .stats.qstats = &sch->qstats,
+   };
+   int err;
+
+   opt->flags &= ~TC_RED_OFFLOADED;
+   if (!tc_can_offload(dev) || !dev->netdev_ops->ndo_setup_tc)
+   return 0;
+
+   err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED,
+   &hw_stats);
+   if (err == -EOPNOTSUPP)
+   

[patch net-next 0/9] qdisc RED offload

2017-11-05 Thread Jiri Pirko
From: Jiri Pirko 

Nogah says:

Add offload support for the RED qdisc to the mlxsw driver.
The first patch adds the ability to offload the RED qdisc by using
ndo_setup_tc. It gives RED commands to offload, change or delete the
qdisc, to get the qdisc's generic stats and to get its RED xstats.
A driver is not forced to offload the qdisc; the decision is left to the
driver.
A RED qdisc is first created and only later grafted to a parent (unless
it is a root qdisc). For that reason, the return value of the offload
replace command that is called in the init process doesn't reflect the
actual offload state. The offload state is determined in the dump function
so it can be reflected to the user. This function is also responsible for
updating the stats.

Patches 2-3 rename TC_SETUP_MQPRIO and TC_SETUP_CBS to match the new
convention of a QDISC prefix.
The rest of the patchset is driver support for the qdisc. Currently it is
supported only as a root qdisc set on the default traffic class. Only the
following RED parameters are supported: min, max, probability and ECN mode.
Limit and burst size related params are ignored for now.

---
v7->v8 internal: (external RFC->v1)
- patch 1/9:
 - unite the offload and un-offload functions
 - clear the OFFLOAD flag when the qdisc is not offloaded
- patch 2/9:
 - minor change to avoid a conflict
- patch 5/9:
 - check for bad min/max values
 - clean the offloaded qdisc after a bad config call

Nogah Frankel (8):
  net_sch: red: Add offload ability to RED qdisc
  net_sch: mqprio: Change TC_SETUP_MQPRIO to TC_SETUP_QDISC_MQPRIO
  net_sch: cbs: Change TC_SETUP_CBS to TC_SETUP_QDISC_CBS
  mlxsw: reg: Add cwtp & cwtpm registers
  mlxsw: spectrum: Support RED qdisc offload
  mlxsw: spectrum: Collect tclass related stats periodically
  mlxsw: spectrum: Support RED xstats
  mlxsw: spectrum: Support general qdisc stats

Yuval Mintz (1):
  mlxsw: reg: Add ext and tc-cong counter groups

 drivers/net/ethernet/amd/xgbe/xgbe-drv.c   |   2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c|   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c  |   2 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c |   2 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c |   2 +-
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c|   2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c|   2 +-
 drivers/net/ethernet/intel/igb/igb_main.c  |   2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |   2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   2 +-
 drivers/net/ethernet/mellanox/mlxsw/Makefile   |   3 +-
 drivers/net/ethernet/mellanox/mlxsw/reg.h  | 206 +++
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c |  36 +++
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |  38 +++
 .../net/ethernet/mellanox/mlxsw/spectrum_qdisc.c   | 276 +
 drivers/net/ethernet/sfc/falcon/tx.c   |   2 +-
 drivers/net/ethernet/sfc/tx.c  |   2 +-
 drivers/net/ethernet/ti/netcp_core.c   |   2 +-
 include/linux/netdevice.h  |   5 +-
 include/net/pkt_cls.h  |  30 +++
 include/uapi/linux/pkt_sched.h |   1 +
 net/sched/sch_cbs.c|   4 +-
 net/sched/sch_mqprio.c |   5 +-
 net/sched/sch_red.c|  79 ++
 25 files changed, 690 insertions(+), 21 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c

-- 
2.9.5



[PATCH net-next V3 2/3] tools: bpftool: show filenames of pinned objects

2017-11-05 Thread Prashant Bhole
Added support to show filenames of pinned objects.

For example:

root@test# ./bpftool prog
3: tracepoint  name tracepoint__irq  tag f677a7dd722299a3
loaded_at Oct 26/11:39  uid 0
xlated 160B  not jited  memlock 4096B  map_ids 4
pinned /sys/fs/bpf/softirq_prog

4: tracepoint  name tracepoint__irq  tag ea5dc530d00b92b6
loaded_at Oct 26/11:39  uid 0
xlated 392B  not jited  memlock 4096B  map_ids 4,6

root@test# ./bpftool --json --pretty prog
[{
"id": 3,
"type": "tracepoint",
"name": "tracepoint__irq",
"tag": "f677a7dd722299a3",
"loaded_at": "Oct 26/11:39",
"uid": 0,
"bytes_xlated": 160,
"jited": false,
"bytes_memlock": 4096,
"map_ids": [4
],
"pinned": ["/sys/fs/bpf/softirq_prog"
]
},{
"id": 4,
"type": "tracepoint",
"name": "tracepoint__irq",
"tag": "ea5dc530d00b92b6",
"loaded_at": "Oct 26/11:39",
"uid": 0,
"bytes_xlated": 392,
"jited": false,
"bytes_memlock": 4096,
"map_ids": [4,6
],
"pinned": []
}
]

root@test# ./bpftool map
4: hash  name start  flags 0x0
key 4B  value 16B  max_entries 10240  memlock 1003520B
pinned /sys/fs/bpf/softirq_map1
5: hash  name iptr  flags 0x0
key 4B  value 8B  max_entries 10240  memlock 921600B

root@test# ./bpftool --json --pretty map
[{
"id": 4,
"type": "hash",
"name": "start",
"flags": 0,
"bytes_key": 4,
"bytes_value": 16,
"max_entries": 10240,
"bytes_memlock": 1003520,
"pinned": ["/sys/fs/bpf/softirq_map1"
]
},{
"id": 5,
"type": "hash",
"name": "iptr",
"flags": 0,
"bytes_key": 4,
"bytes_value": 8,
"max_entries": 10240,
"bytes_memlock": 921600,
"pinned": []
}
]

Signed-off-by: Prashant Bhole 
---
v2:
 - Dynamically identify bpf-fs mountpoint
 - Close file descriptors before returning on error
 - Fixed line break for proper output formatting
 - Code style: wrapped lines > 80, used reverse christmastree style

v3:
 - Handle multiple bpffs mountpoints
 - Code style: fixed line break indentation
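
For the "multiple bpffs mountpoints" item, a minimal sketch of collecting
every bpf mountpoint from a mounts table with getmntent(3). The function
name, the fixed-size output array and the exact strcmp() on the filesystem
type are choices made for this illustration and are not taken from the
patch; the mounts file is a parameter so the walk can be tried on any file
in /proc/mounts format.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

#define TOY_PATH_MAX 256

/*
 * Copy the mount directory of every "bpf" filesystem listed in
 * mounts_file into dirs[], up to max entries.
 * Returns the number of entries stored, or -1 if the file cannot be read.
 */
static int toy_find_bpffs(const char *mounts_file,
                          char dirs[][TOY_PATH_MAX], int max)
{
    struct mntent *ent;
    FILE *fp;
    int n = 0;

    assert(mounts_file && dirs && max >= 0);

    fp = setmntent(mounts_file, "r");
    if (!fp)
        return -1;

    while (n < max && (ent = getmntent(fp)) != NULL) {
        if (strcmp(ent->mnt_type, "bpf") != 0)
            continue;
        snprintf(dirs[n], TOY_PATH_MAX, "%s", ent->mnt_dir);
        n++;
    }

    endmntent(fp); /* counterpart of setmntent() */
    return n;
}
```

Each directory returned this way could then be handed to a tree walker
such as fts_open(3), one mountpoint at a time.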

 tools/bpf/bpftool/common.c | 85 ++
 tools/bpf/bpftool/main.c   |  8 +
 tools/bpf/bpftool/main.h   | 17 ++
 tools/bpf/bpftool/map.c| 21 
 tools/bpf/bpftool/prog.c   | 24 +
 5 files changed, 155 insertions(+)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 4556947709ee..152c5bdbe2e9 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -45,6 +45,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 
@@ -290,3 +292,86 @@ void print_hex_data_json(uint8_t *data, size_t len)
jsonw_printf(json_wtr, "\"0x%02hhx\"", data[i]);
jsonw_end_array(json_wtr);
 }
+
+int build_pinned_obj_table(struct pinned_obj_table *tab,
+  enum bpf_obj_type type)
+{
+   struct bpf_prog_info pinned_info = {};
+   __u32 len = sizeof(pinned_info);
+   struct pinned_obj *obj_node = NULL;
+   enum bpf_obj_type objtype;
+   struct mntent *mntent = NULL;
+   FILE *mntfile = NULL;
+   FTSENT *ftse = NULL;
+   FTS *fts = NULL;
+   int fd, err;
+
+   mntfile = setmntent("/proc/mounts", "r");
+   if (!mntfile)
+   return -1;
+
+   while ((mntent = getmntent(mntfile)) != NULL) {
+   char *path[] = {mntent->mnt_dir, 0};
+
+   if (strncmp(mntent->mnt_type, "bpf", 3) != 0)
+   continue;
+
+   fts = fts_open(path, 0, NULL);
+   if (!fts)
+   continue;
+
+   while ((ftse = fts_read(fts)) != NULL) {
+   if (!(ftse->fts_info & FTS_F))
+   continue;
+   fd = open_obj_pinned(ftse->fts_path);
+   if (fd < 0)
+   continue;
+
+   objtype = get_fd_type(fd);
+   if (objtype != type) {
+   close(fd);
+   continue;
+   }
+   memset(&pinned_info, 0, sizeof(pinned_info));
+   err = bpf_obj_get_info_by_fd(fd, &pinned_info, &len);
+   if (err) {
+   close(fd);
+   continue;
+   }
+
+   obj_node = malloc(sizeof(*obj_node));
+   if (!obj_node) {
+   close(fd);
+   fts_close(fts);
+   fclose(mntfile);
+   return -1;
+   }
+
+   

[PATCH net-next V3 3/3] tools: bpftool: optionally show filenames of pinned objects

2017-11-05 Thread Prashant Bhole
Make showing file names of pinned objects optional, because it scans the
complete bpf-fs filesystem, which is costly.
Add the option -f|--bpffs and update the documentation.
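
A stand-alone sketch of the option handling, using getopt_long(3). The
struct and function names are invented for the example; only the option
letters and the rule that --pretty implies --json come from the patch and
its documentation.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_opts {
    bool json_output;
    bool pretty_output;
    bool show_pinned;   /* set by -f / --bpffs */
};

/*
 * Parse the leading options. Returns the index of the first non-option
 * argument, or -1 on an unknown option.
 */
static int toy_parse_opts(int argc, char **argv, struct toy_opts *out)
{
    static const struct option options[] = {
        { "json",   no_argument, NULL, 'j' },
        { "pretty", no_argument, NULL, 'p' },
        { "bpffs",  no_argument, NULL, 'f' },
        { 0 }
    };
    int opt;

    assert(argv != NULL && out != NULL);
    out->json_output = false;
    out->pretty_output = false;
    out->show_pinned = false;
    optind = 1; /* restart scanning so the function can be called again */

    while ((opt = getopt_long(argc, argv, "jpf", options, NULL)) >= 0) {
        switch (opt) {
        case 'p':
            out->pretty_output = true;
            out->json_output = true; /* --pretty implies --json */
            break;
        case 'j':
            out->json_output = true;
            break;
        case 'f':
            out->show_pinned = true;
            break;
        default:
            return -1;
        }
    }
    return optind;
}
```

Keeping the costly scan behind a flag that defaults to false means the
plain "bpftool prog" and "bpftool map" invocations keep their old cost.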

Signed-off-by: Prashant Bhole 
---
v2:
 - Change command line option from {-l|--pinned} to {-f|--bpffs}
 - Updated documentation

v3:
 - No change

 tools/bpf/bpftool/Documentation/bpftool-map.rst  |  5 -
 tools/bpf/bpftool/Documentation/bpftool-prog.rst |  5 -
 tools/bpf/bpftool/main.c | 14 +++---
 tools/bpf/bpftool/main.h |  3 ++-
 tools/bpf/bpftool/map.c  |  3 ++-
 tools/bpf/bpftool/prog.c |  3 ++-
 6 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst 
b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index abb9ee940b15..9f51a268eb06 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -12,7 +12,7 @@ SYNOPSIS
 
**bpftool** [*OPTIONS*] **map** *COMMAND*
 
-   *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] }
+   *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { 
**-f** | **--bpffs** } }
 
*COMMANDS* :=
{ **show** | **dump** | **update** | **lookup** | **getnext** | 
**delete**
@@ -86,6 +86,9 @@ OPTIONS
-p, --pretty
  Generate human-readable JSON output. Implies **-j**.
 
+   -f, --bpffs
+ Show file names of pinned maps.
+
 EXAMPLES
 
 **# bpftool map show**
diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index 0f25d3c39e05..36e8d1c3c40d 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -12,7 +12,7 @@ SYNOPSIS
 
**bpftool** [*OPTIONS*] **prog** *COMMAND*
 
-   *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] }
+   *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { 
**-f** | **--bpffs** } }
 
*COMMANDS* :=
{ **show** | **dump xlated** | **dump jited** | **pin** | **help** }
@@ -75,6 +75,9 @@ OPTIONS
-p, --pretty
  Generate human-readable JSON output. Implies **-j**.
 
+   -f, --bpffs
+ Show file names of pinned programs.
+
 EXAMPLES
 
 **# bpftool prog show**
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 6ad53f1797fa..d6e4762170a4 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -54,6 +54,7 @@ static int (*last_do_help)(int argc, char **argv);
 json_writer_t *json_wtr;
 bool pretty_output;
 bool json_output;
+bool show_pinned;
 struct pinned_obj_table prog_table;
 struct pinned_obj_table map_table;
 
@@ -265,6 +266,7 @@ int main(int argc, char **argv)
{ "help",   no_argument,NULL,   'h' },
{ "pretty", no_argument,NULL,   'p' },
{ "version",no_argument,NULL,   'V' },
+   { "bpffs",  no_argument,NULL,   'f' },
{ 0 }
};
int opt, ret;
@@ -272,12 +274,13 @@ int main(int argc, char **argv)
last_do_help = do_help;
pretty_output = false;
json_output = false;
+   show_pinned = false;
bin_name = argv[0];
 
hash_init(prog_table.table);
hash_init(map_table.table);
 
-   while ((opt = getopt_long(argc, argv, "Vhpj",
+   while ((opt = getopt_long(argc, argv, "Vhpjf",
  options, NULL)) >= 0) {
switch (opt) {
case 'V':
@@ -290,6 +293,9 @@ int main(int argc, char **argv)
case 'j':
json_output = true;
break;
+   case 'f':
+   show_pinned = true;
+   break;
default:
usage();
}
@@ -316,8 +322,10 @@ int main(int argc, char **argv)
if (json_output)
jsonw_destroy(&json_wtr);
 
-   delete_pinned_obj_table(&prog_table);
-   delete_pinned_obj_table(&map_table);
+   if (show_pinned) {
+   delete_pinned_obj_table(&prog_table);
+   delete_pinned_obj_table(&map_table);
+   }
+   }
 
return ret;
 }
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 726f6e27a706..32846a0e42fb 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -59,7 +59,7 @@
 #define HELP_SPEC_PROGRAM  \
"PROG := { id PROG_ID | pinned FILE | tag PROG_TAG }"
 #define HELP_SPEC_OPTIONS  \
-   "OPTIONS := { {-j|--json} [{-p|--pretty}] }"
+   "OPTIONS := { {-j|--json} [{-p|--pretty}] | {-f|--bpffs} }"
 
 enum bpf_obj_type {
   

[PATCH net-next V3 1/3] tools: bpftool: open pinned object without type check

2017-11-05 Thread Prashant Bhole
This is needed to open any file in bpf-fs without knowing
its object type.

Signed-off-by: Prashant Bhole 
---
v2:
 - No change

v3:
 - No change

 tools/bpf/bpftool/common.c | 15 +--
 tools/bpf/bpftool/main.h   |  1 +
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index f0288269dae8..4556947709ee 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -91,9 +91,8 @@ static int mnt_bpffs(const char *target, char *buff, size_t 
bufflen)
return 0;
 }
 
-int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type)
+int open_obj_pinned(char *path)
 {
-   enum bpf_obj_type type;
int fd;
 
fd = bpf_obj_get(path);
@@ -105,6 +104,18 @@ int open_obj_pinned_any(char *path, enum bpf_obj_type 
exp_type)
return -1;
}
 
+   return fd;
+}
+
+int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type)
+{
+   enum bpf_obj_type type;
+   int fd;
+
+   fd = open_obj_pinned(path);
+   if (fd < 0)
+   return -1;
+
type = get_fd_type(fd);
if (type < 0) {
close(fd);
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index d315d01be645..4b5685005cb0 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -86,6 +86,7 @@ int cmd_select(const struct cmd *cmds, int argc, char **argv,
 int get_fd_type(int fd);
 const char *get_fd_type_name(enum bpf_obj_type type);
 char *get_fdinfo(int fd, const char *key);
+int open_obj_pinned(char *path);
 int open_obj_pinned_any(char *path, enum bpf_obj_type exp_type);
 int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32));
 
-- 
2.13.6




[PATCH net-next V3 0/3] tools: bpftool: show filenames of pinned objects

2017-11-05 Thread Prashant Bhole
This patchset adds support to show pinned objects in object details.

Patch 1 adds the ability to open a path in bpf-fs regardless of its object
type.

Patch 2 adds the actual functionality by scanning the bpf-fs once and adding
object information to a hash table, with the object id as the key. One object
may be associated with multiple paths because an object can be pinned
multiple times.

Patch 3 adds a command line option to enable this functionality. It is
optional because scanning bpf-fs can be costly.
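
To illustrate the id-keyed table with several paths per object, here is a
small stand-alone sketch. It uses a hand-rolled chained hash table with
invented toy_* names rather than the hashtable helpers bpftool uses, so
treat it as a model of the idea and not as the patch's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BUCKETS 16

struct toy_pinned_obj {
    uint32_t id;                 /* object id, used as the hash key */
    char *path;                  /* one pinned path for that id */
    struct toy_pinned_obj *next; /* next entry in the same bucket */
};

struct toy_pinned_table {
    struct toy_pinned_obj *buckets[TOY_BUCKETS];
};

/* Record one (id, path) pair; the same id may be added many times. */
static int toy_table_add(struct toy_pinned_table *tab, uint32_t id,
                         const char *path)
{
    struct toy_pinned_obj *node;
    size_t len;

    assert(tab && path);
    len = strlen(path) + 1;

    node = malloc(sizeof(*node));
    if (!node)
        return -1;
    node->path = malloc(len);
    if (!node->path) {
        free(node);
        return -1;
    }
    memcpy(node->path, path, len);
    node->id = id;
    node->next = tab->buckets[id % TOY_BUCKETS];
    tab->buckets[id % TOY_BUCKETS] = node;
    return 0;
}

/* Count the paths recorded for an id (what a "pinned" list would print). */
static size_t toy_table_count(const struct toy_pinned_table *tab, uint32_t id)
{
    const struct toy_pinned_obj *obj;
    size_t n = 0;

    for (obj = tab->buckets[id % TOY_BUCKETS]; obj; obj = obj->next)
        if (obj->id == id)
            n++;
    return n;
}

/* Release every entry and leave the table empty. */
static void toy_table_free(struct toy_pinned_table *tab)
{
    size_t i;

    for (i = 0; i < TOY_BUCKETS; i++) {
        struct toy_pinned_obj *obj = tab->buckets[i];

        while (obj) {
            struct toy_pinned_obj *next = obj->next;

            free(obj->path);
            free(obj);
            obj = next;
        }
        tab->buckets[i] = NULL;
    }
}
```

Because the table is filled in a single scan, listing N objects costs one
walk of bpf-fs plus N cheap lookups instead of N walks.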

v1->v2:
 - Dynamically identify bpf-fs mountpoint
 - Close file descriptors before returning on error
 - Change command line option from {-l|--pinned} to {-f|--bpffs}
 - Updated documentation
 - Fixed line break for proper output formatting
 - Code style: wrapped lines > 80, used reverse christmastree style

v2->v3:
 - Handle multiple bpffs mountpoints
 - Code style: fixed line break indentation

Prashant Bhole (3):
  tools: bpftool: open pinned object without type check
  tools: bpftool: show filenames of pinned objects
  tools: bpftool: optionally show filenames of pinned objects

 tools/bpf/bpftool/Documentation/bpftool-map.rst  |   5 +-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst |   5 +-
 tools/bpf/bpftool/common.c   | 100 ++-
 tools/bpf/bpftool/main.c |  18 +++-
 tools/bpf/bpftool/main.h |  21 -
 tools/bpf/bpftool/map.c  |  22 +
 tools/bpf/bpftool/prog.c |  25 ++
 7 files changed, 190 insertions(+), 6 deletions(-)

-- 
2.13.6




Re: Bond recovery from BOND_LINK_FAIL state not working

2017-11-05 Thread Jarod Wilson

On 2017-11-02 9:11 PM, Jay Vosburgh wrote:

Alex Sidorenko  wrote:

...

I think I see the flaw in the logic.


1) bond_miimon_inspect finds link_state = 0, then makes a call
to bond_propose_link_state(BOND_LINK_FAIL), setting link_new_state to
BOND_LINK_FAIL.  _inspect then sets slave->new_link = BOND_LINK_DOWN and
returns non-zero.

2) bond_mii_monitor rtnl_trylock fails, it reschedules.

3) bond_mii_monitor runs again, and calls bond_miimon_inspect.

4) the slave's link has recovered, so link_state != 0.
slave->link is still BOND_LINK_UP.  The slave's link_new_state remains
set to BOND_LINK_FAIL, but new_link is reset to NOCHANGE.
bond_miimon_inspect returns 0, so nothing is committed.

5) step 4 can repeat indefinitely.

6) eventually, the other slave does something that causes
commit++, making bond_mii_monitor call bond_commit_link_state and then
bond_miimon_commit.  The slave in question from steps 1-4 still has
link_new_state as BOND_LINK_FAIL, but new_link is NOCHANGE, so it ends
up in BOND_LINK_FAIL state.

I think step 6 could also occur concurrently with the initial
pass through step 4 to induce the problem.

It looks like Mahesh mostly fixed this in

commit fb9eb899a6dc663e4a2deed9af2ac28f507d0ffb
Author: Mahesh Bandewar 
Date:   Tue Apr 11 22:36:00 2017 -0700

 bonding: handle link transition from FAIL to UP correctly

but the window still exists, and requires the slave link state
to change between the failed rtnl_trylock and the second pass through
_inspect.  The problem is that a state transition has been kept between
invocations to _inspect, but the condition that induced the transition
has changed.

I haven't tested these, but I suspect the solution is either to
clear link_new_state on entry to the loop in bond_miimon_inspect, or
merge new_link and link_new_state as a single "next state" (which is
cleared on entry to the loop).

The first of these is a pretty simple patch:

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 18b58e1376f1..6f89f9981a6c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2046,6 +2046,7 @@ static int bond_miimon_inspect(struct bonding *bond)
  
 	bond_for_each_slave_rcu(bond, slave, iter) {
 		slave->new_link = BOND_LINK_NOCHANGE;
+		slave->link_new_state = slave->link;
 
 		link_state = bond_check_dev_link(bond, slave->dev, 0);
  


Alex / Jarod, could you check my logic, and would you be able to
test this patch if my analysis appears sound?


This patch looks good, the original reproducing setup successfully 
recovers after the original active slave goes down, even with 
NetworkManager in the mix.


Reviewed-by: Jarod Wilson 

--
Jarod Wilson
ja...@redhat.com


RE: [PATCH net-next V2 2/3] tools: bpftool: show filenames of pinned objects

2017-11-05 Thread Prashant Bhole


> From: Quentin Monnet [mailto:quentin.mon...@netronome.com]
> 
> 2017-11-02 16:59 UTC+0900 ~ Prashant Bhole
> 
> > Added support to show filenames of pinned objects.
> >
> > For example:
> >
> 
> […]
> 
> >
> > Signed-off-by: Prashant Bhole 
> > ---
> > v2:
> >  - Dynamically identify bpf-fs mountpoint
> >  - Close file descriptors before returning on error
> >  - Fixed line break for proper output formatting
> >  - Code style: wrapped lines > 80, used reverse christmastree style
> 
> Thanks for those changes!
> 
> >
> >  tools/bpf/bpftool/common.c | 93 ++
> >  tools/bpf/bpftool/main.c   |  8 
> >  tools/bpf/bpftool/main.h   | 17 +
> >  tools/bpf/bpftool/map.c| 21 +++
> >  tools/bpf/bpftool/prog.c   | 24 
> >  5 files changed, 163 insertions(+)
> >
> > diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> > index 4556947709ee..78a16c02c778 100644
> > --- a/tools/bpf/bpftool/common.c
> > +++ b/tools/bpf/bpftool/common.c
> > @@ -45,6 +45,8 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> >
> >  #include 
> >
> > @@ -290,3 +292,94 @@ void print_hex_data_json(uint8_t *data, size_t len)
> > jsonw_printf(json_wtr, "\"0x%02hhx\"", data[i]);
> > jsonw_end_array(json_wtr);
> >  }
> > +
> > +int build_pinned_obj_table(struct pinned_obj_table *tab,
> > +  enum bpf_obj_type type)
> > +{
> > +   struct bpf_prog_info pinned_info = {};
> > +   __u32 len = sizeof(pinned_info);
> > +   struct pinned_obj *obj_node = NULL;
> > +   enum bpf_obj_type objtype;
> > +   struct mntent *mntent = NULL;
> > +   FILE *mntfile = NULL;
> > +   char *bpf_dir = NULL;
> > +   FTSENT *ftse = NULL;
> > +   FTS *ftsp = NULL;
> > +   int fd, err;
> > +
> > +   mntfile = setmntent("/proc/mounts", "r");
> > +   if (!mntfile)
> > +   return -1;
> > +
> > +   while ((mntent = getmntent(mntfile)) != NULL) {
> > +   if (strncmp(mntent->mnt_type, "bpf", 3) == 0) {
> > +   bpf_dir = mntent->mnt_dir;
> > +   break;
> 
> It works well to find a bpf virtual file system, but it stops after the
> first one it finds, although it is possible to have several bpffs on the
> system. Since you already have all the logic, could you move the
> fts_read() step inside this loop, so as to browse all existing bpffs
> instead of just the first one?
> 

Thanks. Sending V3 soon with this change and other coding style fixes.

Prashant
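
For reference, a minimal sketch of what walking every bpffs mount could look like. This is not the V3 code: collect_bpffs_mounts() and BPFFS_DIR_LEN are hypothetical names, the mounts file is passed in as a parameter so the function can be tried on a test file, and it matches the type with strcmp() where the quoted hunk uses strncmp(). Only setmntent(), getmntent() and endmntent() are real libc calls.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

#define BPFFS_DIR_LEN 256	/* illustrative limit, not from the patch */

/*
 * Copy the mount point of every "bpf" entry in mounts_path into dirs[].
 * Returns the number of entries stored, or -1 if the file cannot be opened.
 */
static int collect_bpffs_mounts(const char *mounts_path,
				char dirs[][BPFFS_DIR_LEN], int max_dirs)
{
	struct mntent *ent;
	FILE *mntfile;
	int n = 0;

	mntfile = setmntent(mounts_path, "r");
	if (!mntfile)
		return -1;

	while (n < max_dirs && (ent = getmntent(mntfile)) != NULL) {
		if (strcmp(ent->mnt_type, "bpf") != 0)
			continue;
		/* getmntent() reuses its buffer, so keep a private copy. */
		snprintf(dirs[n], BPFFS_DIR_LEN, "%s", ent->mnt_dir);
		n++;
	}

	endmntent(mntfile);
	return n;
}
```

A caller would pass "/proc/mounts" and then run the fts walk once per returned directory. Copying mnt_dir matters here because the pointer returned by getmntent() is only valid until the next call.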




[RFC 8/9] net: move adaptive interrupt coalescing code to lib/

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek 

This moves the code that is now generically named to lib/.

This move allows drivers to add private structure elements to track the
number of packets, bytes, and interrupt events per ring.  A driver
also defines a workqueue handler to act on this collected data once per
poll and modify the coalescing parameters per ring.

Signed-off-by: Andy Gospodarek 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c |   1 +
 .../net/ethernet/mellanox/mlx5/core/net_rx_am.c| 303 
 .../net/ethernet/mellanox/mlx5/core/net_rx_am.h| 107 ---
 include/linux/net_rx_am.h  | 109 
 lib/Makefile   |   2 +-
 lib/net_rx_am.c| 306 +
 8 files changed, 419 insertions(+), 413 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.h
 create mode 100644 include/linux/net_rx_am.h
 create mode 100644 lib/net_rx_am.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index d5d6d3d..19b21b4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,7 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o 
fpga/conn.o fpga/sdk.o \
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o 
\
en_tx.o en_rx.o en_rx_am.o en_txrx.o en_stats.o vxlan.o \
-   en_arfs.o en_fs_ethtool.o en_selftest.o net_rx_am.o
+   en_arfs.o en_fs_ethtool.o en_selftest.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 203dc7b..04b36fe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -49,7 +49,7 @@
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
-#include "net_rx_am.h"
+#include <linux/net_rx_am.h>
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index 1f8fda1..391f1ba 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -31,6 +31,7 @@
  */
 
 #include "en.h"
+#include 
 
 void mlx5e_rx_am_work(struct work_struct *work)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c
deleted file mode 100644
index 37ea6d1..000
--- a/drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c
+++ /dev/null
@@ -1,303 +0,0 @@
-/*
- * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
- * Copyright (c) 2017, Broadcom Limiited. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- *  - Redistributions of source code must retain the above
- *copyright notice, this list of conditions and the following
- *disclaimer.
- *
- *  - Redistributions in binary form must reproduce the above
- *copyright notice, this list of conditions and the following
- *disclaimer in the documentation and/or other materials
- *provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-
-#include "en.h"
-
-#define NET_PARAMS_AM_NUM_PROFILES 5
-/* Adaptive moderation profiles */
-#define NET_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
-#define NET_RX_AM_DEF_PROFILE_CQE 1
-#define NET_RX_AM_DEF_PROFILE_EQE 1
-
-/* All profiles sizes must be NET_PARAMS_AM_NUM_PROFILES */
-#define NET_AM_EQE_PROFILES { \
-   {1,   NET_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {8,   NET_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {64,  

[RFC 3/9] mlx5_en: remove rq references in mlx5e_rx_am

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek 

This makes mlx5e_am_sample more generic so that it can be called easily
from a driver that does not use the same data structure to store these
values in a single structure.

Signed-off-by: Andy Gospodarek 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 22 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h |  5 -
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  5 -
 3 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index acf32fe..845dbb8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -260,13 +260,15 @@ static bool mlx5e_am_decision(struct mlx5e_rx_am_stats 
*curr_stats,
return am->profile_ix != prev_ix;
 }
 
-static void mlx5e_am_sample(struct mlx5e_rq *rq,
+static void mlx5e_am_sample(u16 event_ctr,
+   u64 packets,
+   u64 bytes,
struct mlx5e_rx_am_sample *s)
 {
s->time  = ktime_get();
-   s->pkt_ctr   = rq->stats.packets;
-   s->byte_ctr  = rq->stats.bytes;
-   s->event_ctr = rq->cq.event_ctr;
+   s->pkt_ctr   = packets;
+   s->byte_ctr  = bytes;
+   s->event_ctr = event_ctr;
 }
 
 #define MLX5E_AM_NEVENTS 64
@@ -305,20 +307,22 @@ void mlx5e_rx_am_work(struct work_struct *work)
am->state = MLX5E_AM_START_MEASURE;
 }
 
-void mlx5e_rx_am(struct mlx5e_rq *rq)
+void mlx5e_rx_am(struct mlx5e_rx_am *am,
+u16 event_ctr,
+u64 packets,
+u64 bytes)
 {
-   struct mlx5e_rx_am *am = &rq->am;
struct mlx5e_rx_am_sample end_sample;
struct mlx5e_rx_am_stats curr_stats;
u16 nevents;
 
switch (am->state) {
case MLX5E_AM_MEASURE_IN_PROGRESS:
-   nevents = BIT_GAP(BITS_PER_TYPE(u16), rq->cq.event_ctr,
+   nevents = BIT_GAP(BITS_PER_TYPE(u16), event_ctr,
  am->start_sample.event_ctr);
if (nevents < MLX5E_AM_NEVENTS)
break;
-   mlx5e_am_sample(rq, &end_sample);
+   mlx5e_am_sample(event_ctr, packets, bytes, &end_sample);
mlx5e_am_calc_stats(&am->start_sample, &end_sample,
&curr_stats);
if (mlx5e_am_decision(&curr_stats, am)) {
@@ -328,7 +332,7 @@ void mlx5e_rx_am(struct mlx5e_rq *rq)
}
/* fall through */
case MLX5E_AM_START_MEASURE:
-   mlx5e_am_sample(rq, &am->start_sample);
+   mlx5e_am_sample(event_ctr, packets, bytes, &am->start_sample);
am->state = MLX5E_AM_MEASURE_IN_PROGRESS;
break;
case MLX5E_AM_APPLY_NEW_PROFILE:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
index 869e4e7..90e4913 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
@@ -71,7 +71,10 @@ enum {
MLX5_CQ_PERIOD_NUM_MODES
 };
 
-void mlx5e_rx_am(struct mlx5e_rq *rq);
+void mlx5e_rx_am(struct mlx5e_rx_am *am,
+u16 event_ctr,
+u64 packets,
+u64 bytes);
 void mlx5e_rx_am_work(struct work_struct *work);
 struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index e906b75..8fed6c6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -77,7 +77,10 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
mlx5e_cq_arm(&c->sq[i].cq);
 
if (MLX5E_TEST_BIT(c->rq.state, MLX5E_RQ_STATE_AM))
-   mlx5e_rx_am(&c->rq);
+   mlx5e_rx_am(&c->rq.am,
+   c->rq.cq.event_ctr,
+   c->rq.stats.packets,
+   c->rq.stats.bytes);
 
mlx5e_cq_arm(&c->rq.cq);
mlx5e_cq_arm(&c->icosq.cq);
-- 
2.7.4
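
As an illustration of why plain counter arguments are enough, here is a small user-space sketch of a sampler that knows nothing about the driver's ring structure. The struct and function names are hypothetical, and the timestamp is passed in by the caller (an assumption made to keep the sketch deterministic; the patch itself calls ktime_get()). The counter widths follow the sample structure used by the driver.

```c
#include <stdint.h>

/* Hypothetical user-space mirror of the sample record in the patch. */
struct am_sample {
	uint64_t time_us;	/* assumption: caller-supplied timestamp */
	uint32_t pkt_ctr;
	uint32_t byte_ctr;
	uint16_t event_ctr;
};

/* Record a snapshot from plain counter values, with no driver struct. */
static void am_sample(uint64_t now_us, uint16_t event_ctr, uint64_t packets,
		      uint64_t bytes, struct am_sample *s)
{
	s->time_us = now_us;
	s->pkt_ctr = (uint32_t)packets;	/* truncated: only deltas are used */
	s->byte_ctr = (uint32_t)bytes;
	s->event_ctr = event_ctr;
}

/* Events seen since start, correct across a 16-bit counter wrap. */
static uint16_t am_events_since(uint16_t start, uint16_t now)
{
	return (uint16_t)(now - start);
}
```

Any driver that can produce an event count, a packet count and a byte count per ring could feed such a sampler. The subtraction in am_events_since() is taken modulo 2^16, so a counter that wrapped from 65530 to 5 still reports 11 events.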



[RFC 5/9] mlx5_en: move generic functions to new file

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek 

These functions were identified as ones that could be made generic and
used by multiple drivers.  Most of the contents of en_rx_am.c are moved
to net_rx_am.c.

Signed-off-by: Andy Gospodarek 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 272 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h |   1 +
 .../net/ethernet/mellanox/mlx5/core/net_rx_am.c| 303 +
 4 files changed, 307 insertions(+), 271 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 19b21b4..d5d6d3d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,7 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o 
fpga/conn.o fpga/sdk.o \
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o 
\
en_tx.o en_rx.o en_rx_am.o en_txrx.o en_stats.o vxlan.o \
-   en_arfs.o en_fs_ethtool.o en_selftest.o
+   en_arfs.o en_fs_ethtool.o en_selftest.o net_rx_am.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index 02d4f80..b9b434b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -32,249 +32,13 @@
 
 #include "en.h"
 
-/* Adaptive moderation profiles */
-#define MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
-#define MLX5E_RX_AM_DEF_PROFILE_CQE 1
-#define MLX5E_RX_AM_DEF_PROFILE_EQE 1
-#define MLX5E_PARAMS_AM_NUM_PROFILES 5
-
-/* All profiles sizes must be MLX5E_PARAMS_AM_NUM_PROFILES */
-#define MLX5_AM_EQE_PROFILES { \
-   {1,   MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {8,   MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {64,  MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {128, MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {256, MLX5E_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-}
-
-#define MLX5_AM_CQE_PROFILES { \
-   {2,  256}, \
-   {8,  128}, \
-   {16, 64},  \
-   {32, 64},  \
-   {64, 64}   \
-}
-
-static const struct mlx5e_cq_moder
-profile[MLX5_CQ_PERIOD_NUM_MODES][MLX5E_PARAMS_AM_NUM_PROFILES] = {
-   MLX5_AM_EQE_PROFILES,
-   MLX5_AM_CQE_PROFILES,
-};
-
-static inline struct mlx5e_cq_moder mlx5e_am_get_profile(u8 cq_period_mode, 
int ix)
-{
-   return profile[cq_period_mode][ix];
-}
-
-struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode)
-{
-   int default_profile_ix;
-
-   if (rx_cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE)
-   default_profile_ix = MLX5E_RX_AM_DEF_PROFILE_CQE;
-   else /* MLX5_CQ_PERIOD_MODE_START_FROM_EQE */
-   default_profile_ix = MLX5E_RX_AM_DEF_PROFILE_EQE;
-
-   return profile[rx_cq_period_mode][default_profile_ix];
-}
-
-
-static bool mlx5e_am_on_top(struct mlx5e_rx_am *am)
-{
-   switch (am->tune_state) {
-   case MLX5E_AM_PARKING_ON_TOP:
-   case MLX5E_AM_PARKING_TIRED:
-   return true;
-   case MLX5E_AM_GOING_RIGHT:
-   return (am->steps_left > 1) && (am->steps_right == 1);
-   default: /* MLX5E_AM_GOING_LEFT */
-   return (am->steps_right > 1) && (am->steps_left == 1);
-   }
-}
-
-static void mlx5e_am_turn(struct mlx5e_rx_am *am)
-{
-   switch (am->tune_state) {
-   case MLX5E_AM_PARKING_ON_TOP:
-   case MLX5E_AM_PARKING_TIRED:
-   break;
-   case MLX5E_AM_GOING_RIGHT:
-   am->tune_state = MLX5E_AM_GOING_LEFT;
-   am->steps_left = 0;
-   break;
-   case MLX5E_AM_GOING_LEFT:
-   am->tune_state = MLX5E_AM_GOING_RIGHT;
-   am->steps_right = 0;
-   break;
-   }
-}
-
-static int mlx5e_am_step(struct mlx5e_rx_am *am)
-{
-   if (am->tired == (MLX5E_PARAMS_AM_NUM_PROFILES * 2))
-   return MLX5E_AM_TOO_TIRED;
-
-   switch (am->tune_state) {
-   case MLX5E_AM_PARKING_ON_TOP:
-   case MLX5E_AM_PARKING_TIRED:
-   break;
-   case MLX5E_AM_GOING_RIGHT:
-   if (am->profile_ix == (MLX5E_PARAMS_AM_NUM_PROFILES - 1))
-   return MLX5E_AM_ON_EDGE;
-   am->profile_ix++;
-   am->steps_right++;
-   break;
-   case MLX5E_AM_GOING_LEFT:
-   if (am->profile_ix == 0)
-   return MLX5E_AM_ON_EDGE;
-   am->profile_ix--;
-   am->steps_left++;
-   break;
-   }
-
-   am->tired++;
-   return 

[RFC 6/9] mlx5_en: rename en_rx_am.h to net_rx_am.h

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek 

This is so net_rx_am.h can be easily moved out of mlx5/core.

Signed-off-by: Andy Gospodarek 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h | 108 -
 .../net/ethernet/mellanox/mlx5/core/net_rx_am.h| 108 +
 3 files changed, 109 insertions(+), 109 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 1c56d16..a9dc118 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -49,7 +49,7 @@
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
-#include "en_rx_am.h"
+#include "net_rx_am.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
deleted file mode 100644
index ef86bf8..000
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
+++ /dev/null
@@ -1,108 +0,0 @@
-/*
- * Copyright (c) 2013-2015, Mellanox Technologies, Ltd.  All rights reserved.
- * Copyright (c) 2017, Broadcom Limited
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- *  - Redistributions of source code must retain the above
- *copyright notice, this list of conditions and the following
- *disclaimer.
- *
- *  - Redistributions in binary form must reproduce the above
- *copyright notice, this list of conditions and the following
- *disclaimer in the documentation and/or other materials
- *provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
-*/
-
-#ifndef MLX5_AM_H
-#define MLX5_AM_H
-
-struct mlx5e_cq_moder {
-   u16 usec;
-   u16 pkts;
-};
-
-struct mlx5e_rx_am_sample {
-   ktime_t time;
-   u32 pkt_ctr;
-   u32 byte_ctr;
-   u16 event_ctr;
-};
-
-struct mlx5e_rx_am_stats {
-   int ppms; /* packets per msec */
-   int bpms; /* bytes per msec */
-   int epms; /* events per msec */
-};
-
-struct mlx5e_rx_am { /* Adaptive Moderation */
-   u8  state;
-   struct mlx5e_rx_am_statsprev_stats;
-   struct mlx5e_rx_am_sample   start_sample;
-   struct work_struct  work;
-   u8  profile_ix;
-   u8  mode;
-   u8  tune_state;
-   u8  steps_right;
-   u8  steps_left;
-   u8  tired;
-};
-
-enum {
-   MLX5_CQ_PERIOD_MODE_START_FROM_EQE = 0x0,
-   MLX5_CQ_PERIOD_MODE_START_FROM_CQE = 0x1,
-   MLX5_CQ_PERIOD_NUM_MODES
-};
-
-/* Adaptive moderation logic */
-enum {
-   MLX5E_AM_START_MEASURE,
-   MLX5E_AM_MEASURE_IN_PROGRESS,
-   MLX5E_AM_APPLY_NEW_PROFILE,
-};
-
-enum {
-   MLX5E_AM_PARKING_ON_TOP,
-   MLX5E_AM_PARKING_TIRED,
-   MLX5E_AM_GOING_RIGHT,
-   MLX5E_AM_GOING_LEFT,
-};
-
-enum {
-   MLX5E_AM_STATS_WORSE,
-   MLX5E_AM_STATS_SAME,
-   MLX5E_AM_STATS_BETTER,
-};
-
-enum {
-   MLX5E_AM_STEPPED,
-   MLX5E_AM_TOO_TIRED,
-   MLX5E_AM_ON_EDGE,
-};
-
-void mlx5e_rx_am(struct mlx5e_rx_am *am,
-u16 event_ctr,
-u64 packets,
-u64 bytes);
-void mlx5e_rx_am_work(struct work_struct *work);
-struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
-struct mlx5e_cq_moder mlx5e_am_get_profile(u8 cq_period_mode, int ix);
-
-#endif /* MLX5_AM_H */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.h 
b/drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.h
new file mode 100644
index 000..ef86bf8
--- /dev/null
+++ 
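
The sample and stats structures in the header above suggest how per-millisecond rates fall out of two snapshots. The following is a user-space sketch under stated assumptions, not the driver's implementation: names are hypothetical, timestamps are plain microsecond counts, and the window is assumed to span 64 events, matching the MLX5E_AM_NEVENTS value seen in 3/9.

```c
#include <stdint.h>

#define AM_NEVENTS 64		/* events per measurement window, as in 3/9 */
#define AM_USEC_PER_MSEC 1000

struct am_sample {
	uint64_t time_us;	/* assumption: timestamp kept in microseconds */
	uint32_t pkt_ctr;
	uint32_t byte_ctr;
	uint16_t event_ctr;
};

struct am_stats {
	int ppms;	/* packets per msec */
	int bpms;	/* bytes per msec */
	int epms;	/* events per msec */
};

/* Fill *out with per-millisecond rates; returns 0 if no time elapsed. */
static int am_calc_stats(const struct am_sample *start,
			 const struct am_sample *end, struct am_stats *out)
{
	uint64_t delta_us = end->time_us - start->time_us;
	uint32_t npkts = end->pkt_ctr - start->pkt_ctr;	/* wrap-safe */
	uint32_t nbytes = end->byte_ctr - start->byte_ctr;

	if (!delta_us)
		return 0;

	out->ppms = (int)((uint64_t)npkts * AM_USEC_PER_MSEC / delta_us);
	out->bpms = (int)((uint64_t)nbytes * AM_USEC_PER_MSEC / delta_us);
	out->epms = (int)((uint64_t)AM_NEVENTS * AM_USEC_PER_MSEC / delta_us);
	return 1;
}
```

The unsigned 32-bit subtractions keep the deltas correct when a counter wraps between samples, and the widening to 64 bits avoids overflow in the multiplication for large byte counts.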

[RFC 4/9] mlx5_en: move AM logic enums

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek 

More movement to help make this code more generic.

Signed-off-by: Andy Gospodarek 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 25 -
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h | 26 ++
 2 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index 845dbb8..02d4f80 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -78,31 +78,6 @@ struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 
rx_cq_period_mode)
return profile[rx_cq_period_mode][default_profile_ix];
 }
 
-/* Adaptive moderation logic */
-enum {
-   MLX5E_AM_START_MEASURE,
-   MLX5E_AM_MEASURE_IN_PROGRESS,
-   MLX5E_AM_APPLY_NEW_PROFILE,
-};
-
-enum {
-   MLX5E_AM_PARKING_ON_TOP,
-   MLX5E_AM_PARKING_TIRED,
-   MLX5E_AM_GOING_RIGHT,
-   MLX5E_AM_GOING_LEFT,
-};
-
-enum {
-   MLX5E_AM_STATS_WORSE,
-   MLX5E_AM_STATS_SAME,
-   MLX5E_AM_STATS_BETTER,
-};
-
-enum {
-   MLX5E_AM_STEPPED,
-   MLX5E_AM_TOO_TIRED,
-   MLX5E_AM_ON_EDGE,
-};
 
 static bool mlx5e_am_on_top(struct mlx5e_rx_am *am)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
index 90e4913..efbee99 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
@@ -71,6 +71,32 @@ enum {
MLX5_CQ_PERIOD_NUM_MODES
 };
 
+/* Adaptive moderation logic */
+enum {
+   MLX5E_AM_START_MEASURE,
+   MLX5E_AM_MEASURE_IN_PROGRESS,
+   MLX5E_AM_APPLY_NEW_PROFILE,
+};
+
+enum {
+   MLX5E_AM_PARKING_ON_TOP,
+   MLX5E_AM_PARKING_TIRED,
+   MLX5E_AM_GOING_RIGHT,
+   MLX5E_AM_GOING_LEFT,
+};
+
+enum {
+   MLX5E_AM_STATS_WORSE,
+   MLX5E_AM_STATS_SAME,
+   MLX5E_AM_STATS_BETTER,
+};
+
+enum {
+   MLX5E_AM_STEPPED,
+   MLX5E_AM_TOO_TIRED,
+   MLX5E_AM_ON_EDGE,
+};
+
 void mlx5e_rx_am(struct mlx5e_rx_am *am,
 u16 event_ctr,
 u64 packets,
-- 
2.7.4
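
The tune-state and step-result enums moved by this patch drive a small walk over the five-entry profile table. The sketch below is a user-space rendering of the stepping function visible in the 5/9 hunk earlier in this thread, with shortened hypothetical names. That hunk is cut off at its final return statement, so the AM_STEPPED return value here is an assumption.

```c
#include <stdint.h>

#define AM_NUM_PROFILES 5	/* matches the five-entry tables in the thread */

enum { AM_PARKING_ON_TOP, AM_PARKING_TIRED, AM_GOING_RIGHT, AM_GOING_LEFT };
enum { AM_STEPPED, AM_TOO_TIRED, AM_ON_EDGE };

/* Hypothetical subset of the moderation state used for stepping. */
struct am_walk {
	uint8_t profile_ix;
	uint8_t tune_state;
	uint8_t steps_right;
	uint8_t steps_left;
	uint8_t tired;
};

/* Move one profile in the current direction, or report why not. */
static int am_step(struct am_walk *am)
{
	if (am->tired == AM_NUM_PROFILES * 2)
		return AM_TOO_TIRED;

	switch (am->tune_state) {
	case AM_PARKING_ON_TOP:
	case AM_PARKING_TIRED:
		break;
	case AM_GOING_RIGHT:
		if (am->profile_ix == AM_NUM_PROFILES - 1)
			return AM_ON_EDGE;
		am->profile_ix++;
		am->steps_right++;
		break;
	case AM_GOING_LEFT:
		if (am->profile_ix == 0)
			return AM_ON_EDGE;
		am->profile_ix--;
		am->steps_left++;
		break;
	}

	am->tired++;
	return AM_STEPPED;	/* assumed; the quoted hunk is cut off here */
}
```

Starting at index 1 and going right, three calls reach the last profile and the fourth reports AM_ON_EDGE without moving. The tired counter bounds how long the walk keeps searching before it gives up.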



[RFC 9/9] bnxt_en: add support for software adaptive interrupt moderation

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek 

This implements the changes needed for the bnxt_en driver to add support
for adaptive interrupt moderation per ring.

This does add additional counters in the receive path, but testing shows
that any additional instructions are offset by throughput gain when the
default configuration is for low latency.

Signed-off-by: Andy Gospodarek 
---
 drivers/net/ethernet/broadcom/bnxt/Makefile   |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 51 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 34 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  7 
 drivers/net/ethernet/broadcom/bnxt/bnxt_rx_am.c   | 32 ++
 5 files changed, 114 insertions(+), 12 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_rx_am.c

diff --git a/drivers/net/ethernet/broadcom/bnxt/Makefile 
b/drivers/net/ethernet/broadcom/bnxt/Makefile
index 59c8ec9..1b0c78c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/Makefile
+++ b/drivers/net/ethernet/broadcom/bnxt/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_BNXT) += bnxt_en.o
 
-bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o 
bnxt_xdp.o bnxt_vfr.o bnxt_devlink.o
+bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o 
bnxt_xdp.o bnxt_vfr.o bnxt_devlink.o bnxt_rx_am.o
 bnxt_en-$(CONFIG_BNXT_FLOWER_OFFLOAD) += bnxt_tc.o
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 4e3d569..e1110d9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1482,6 +1482,7 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_napi 
*bnapi, u32 *raw_cons,
u32 tmp_raw_cons = *raw_cons;
u16 cfa_code, cons, prod, cp_cons = RING_CMP(tmp_raw_cons);
struct bnxt_sw_rx_bd *rx_buf;
+   unsigned int pkts = 0;
unsigned int len;
u8 *data_ptr, agg_bufs, cmp_type;
dma_addr_t dma_addr;
@@ -1522,6 +1523,8 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_napi 
*bnapi, u32 *raw_cons,
 
rc = -ENOMEM;
if (likely(skb)) {
+   struct skb_shared_info *shinfo = skb_shinfo(skb);
+   pkts = shinfo->nr_frags;
bnxt_deliver_skb(bp, bnapi, skb);
rc = 1;
}
@@ -1645,6 +1648,8 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_napi 
*bnapi, u32 *raw_cons,
rxr->rx_next_cons = NEXT_RX(cons);
 
 next_rx_no_prod:
+   cpr->rx_packets += pkts ? : 1;
+   cpr->rx_bytes += len;
*raw_cons = tmp_raw_cons;
 
return rc;
@@ -1798,6 +1803,7 @@ static irqreturn_t bnxt_msix(int irq, void *dev_instance)
struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
u32 cons = RING_CMP(cpr->cp_raw_cons);
 
+   cpr->event_ctr++;
prefetch(&cpr->cp_desc_ring[CP_RING(cons)][CP_IDX(cons)]);
napi_schedule(&bnapi->napi);
return IRQ_HANDLED;
@@ -2021,6 +2027,11 @@ static int bnxt_poll(struct napi_struct *napi, int 
budget)
break;
}
}
+   if (bp->flags & BNXT_FLAG_RX_AM)
+   net_rx_am(>am,
+ cpr->event_ctr,
+ cpr->rx_packets,
+ cpr->rx_bytes);
mmiowb();
return work_done;
 }
@@ -2606,6 +2617,8 @@ static void bnxt_init_cp_rings(struct bnxt *bp)
struct bnxt_ring_struct *ring = &cpr->cp_ring_struct;
 
ring->fw_ring_id = INVALID_HW_RING_ID;
+   cpr->rx_ring_coal.coal_ticks = bp->rx_coal.coal_ticks;
+   cpr->rx_ring_coal.coal_bufs = bp->rx_coal.coal_bufs;
}
 }
 
@@ -4579,6 +4592,38 @@ static void bnxt_hwrm_set_coal_params(struct bnxt_coal 
*hw_coal,
req->flags = cpu_to_le16(flags);
 }
 
+int bnxt_hwrm_set_ring_coal(struct bnxt *bp, struct bnxt_napi *bnapi)
+{
+   struct hwrm_ring_cmpl_ring_cfg_aggint_params_input req_rx = {0};
+   struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
+   struct bnxt_coal coal;
+   int rc = 0;
+
+/* Tick values in micro seconds.
+ * 1 coal_buf x bufs_per_record = 1 completion record.
+ */
+   memcpy(&coal, &bp->rx_coal, sizeof(struct bnxt_coal));
+
+   coal.coal_ticks = cpr->rx_ring_coal.coal_ticks;
+   coal.coal_bufs = cpr->rx_ring_coal.coal_bufs;
+
+   if (!bnapi->rx_ring)
+   return -ENODEV;
+
+   bnxt_hwrm_cmd_hdr_init(bp, &req_rx,
+  HWRM_RING_CMPL_RING_CFG_AGGINT_PARAMS, -1, -1);
+
+   bnxt_hwrm_set_coal_params(&coal, &req_rx);
+
+   mutex_lock(&bp->hwrm_cmd_lock);
+   req_rx.ring_id = cpr->cp_ring_struct.fw_ring_id;
+
+   rc = _hwrm_send_message(bp, &req_rx, sizeof(req_rx),
+   HWRM_CMD_TIMEOUT);
+   mutex_unlock(&bp->hwrm_cmd_lock);
+   return rc;
+}
+
 int bnxt_hwrm_set_coal(struct bnxt *bp)
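
The receive-path accounting added by the hunks above can be summarised in a few lines. This is a user-space sketch with hypothetical names: only the field names rx_packets, rx_bytes and event_ctr are taken from the patch, and the counting rule mirrors the "pkts ? : 1" expression written without the GNU extension.

```c
#include <stdint.h>

/* Hypothetical per-ring counters, named after the fields in the hunks. */
struct ring_counters {
	uint64_t rx_packets;
	uint64_t rx_bytes;
	uint16_t event_ctr;
};

/* Called once per interrupt; the 16-bit counter is allowed to wrap. */
static void ring_note_irq(struct ring_counters *r)
{
	r->event_ctr++;
}

/* Called once per completed receive; frags == 0 still counts one packet. */
static void ring_note_rx(struct ring_counters *r, unsigned int frags,
			 unsigned int len)
{
	r->rx_packets += frags ? frags : 1;	/* portable "frags ?: 1" */
	r->rx_bytes += len;
}
```

The poll routine would then hand the three running totals to the moderation code once per poll. Because consumers only look at differences between snapshots, the counters never need to be reset.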
 

[RFC 8/9] net: move adaptive interrupt coalescing code to lib/

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek 

This moves the code that is now generically named to lib/.

This move allows drivers to add private structure elements to track the
number of packets, bytes, and interrupt events per ring.  A driver
also defines a workqueue handler to act on this collected data once per
poll and modify the coalescing parameters per ring.

Signed-off-by: Andy Gospodarek 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c |   1 +
 .../net/ethernet/mellanox/mlx5/core/net_rx_am.c| 303 
 .../net/ethernet/mellanox/mlx5/core/net_rx_am.h| 107 ---
 include/linux/net_rx_am.h  | 109 
 lib/Makefile   |   2 +-
 lib/net_rx_am.c| 306 +
 8 files changed, 419 insertions(+), 413 deletions(-)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.h
 create mode 100644 include/linux/net_rx_am.h
 create mode 100644 lib/net_rx_am.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index d5d6d3d..19b21b4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,7 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o 
fpga/conn.o fpga/sdk.o \
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o 
\
en_tx.o en_rx.o en_rx_am.o en_txrx.o en_stats.o vxlan.o \
-   en_arfs.o en_fs_ethtool.o en_selftest.o net_rx_am.o
+   en_arfs.o en_fs_ethtool.o en_selftest.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 203dc7b..04b36fe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -49,7 +49,7 @@
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
-#include "net_rx_am.h"
+#include <linux/net_rx_am.h>
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index 1f8fda1..391f1ba 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -31,6 +31,7 @@
  */
 
 #include "en.h"
+#include 
 
 void mlx5e_rx_am_work(struct work_struct *work)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c
deleted file mode 100644
index 37ea6d1..000
--- a/drivers/net/ethernet/mellanox/mlx5/core/net_rx_am.c
+++ /dev/null
@@ -1,303 +0,0 @@
-/*
- * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
- * Copyright (c) 2017, Broadcom Limiited. All rights reserved.
- *
- * This software is available to you under a choice of one of two
- * licenses.  You may choose to be licensed under the terms of the GNU
- * General Public License (GPL) Version 2, available from the file
- * COPYING in the main directory of this source tree, or the
- * OpenIB.org BSD license below:
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- *  - Redistributions of source code must retain the above
- *copyright notice, this list of conditions and the following
- *disclaimer.
- *
- *  - Redistributions in binary form must reproduce the above
- *copyright notice, this list of conditions and the following
- *disclaimer in the documentation and/or other materials
- *provided with the distribution.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
- * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
- * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- * SOFTWARE.
- */
-
-#include "en.h"
-
-#define NET_PARAMS_AM_NUM_PROFILES 5
-/* Adaptive moderation profiles */
-#define NET_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE 256
-#define NET_RX_AM_DEF_PROFILE_CQE 1
-#define NET_RX_AM_DEF_PROFILE_EQE 1
-
-/* All profiles sizes must be NET_PARAMS_AM_NUM_PROFILES */
-#define NET_AM_EQE_PROFILES { \
-   {1,   NET_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {8,   NET_AM_DEFAULT_RX_CQ_MODERATION_PKTS_FROM_EQE}, \
-   {64,  

[RFC 7/9] mlx5_en: remove Mellanox references in AM code

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek 

Rename all mlx5* and MLX* references to net_ and NET_, respectively, in
code that handles software interrupt moderation.

Signed-off-by: Andy Gospodarek 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   7 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  18 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/net_rx_am.c| 214 ++---
 .../net/ethernet/mellanox/mlx5/core/net_rx_am.h|  57 +++---
 8 files changed, 157 insertions(+), 157 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index a9dc118..203dc7b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -221,8 +221,8 @@ struct mlx5e_params {
u8  num_tc;
u8  rx_cq_period_mode;
bool rx_cqe_compress_def;
-   struct mlx5e_cq_moder rx_cq_moderation;
-   struct mlx5e_cq_moder tx_cq_moderation;
+   struct net_cq_moder rx_cq_moderation;
+   struct net_cq_moder tx_cq_moderation;
bool lro_en;
u32 lro_wqe_sz;
u16 tx_max_inline;
@@ -505,7 +505,7 @@ struct mlx5e_rq {
unsigned long  state;
intix;
 
-   struct mlx5e_rx_am am; /* Adaptive Moderation */
+   struct net_rx_am am; /* Adaptive Moderation */
 
/* XDP */
struct bpf_prog   *xdp_prog;
@@ -1036,4 +1036,5 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
struct mlx5e_params *params,
u16 max_channels);
 
+void mlx5e_rx_am_work(struct work_struct *work);
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index b34aa8e..3955521 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1454,11 +1454,11 @@ static int set_pflag_rx_cqe_based_moder(struct 
net_device *netdev, bool enable)
int err = 0;
 
rx_cq_period_mode = enable ?
-   MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
-   MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
+   NET_CQ_PERIOD_MODE_START_FROM_CQE :
+   NET_CQ_PERIOD_MODE_START_FROM_EQE;
rx_mode_changed = rx_cq_period_mode != 
priv->channels.params.rx_cq_period_mode;
 
-   if (rx_cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE &&
+   if (rx_cq_period_mode == NET_CQ_PERIOD_MODE_START_FROM_CQE &&
!MLX5_CAP_GEN(mdev, cq_period_start_from_cqe))
return -EOPNOTSUPP;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 28ae00b..dcd96fe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1571,7 +1571,7 @@ static void mlx5e_destroy_cq(struct mlx5e_cq *cq)
 }
 
 static int mlx5e_open_cq(struct mlx5e_channel *c,
-struct mlx5e_cq_moder moder,
+struct net_cq_moder moder,
 struct mlx5e_cq_param *param,
 struct mlx5e_cq *cq)
 {
@@ -1748,7 +1748,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, 
int ix,
  struct mlx5e_channel_param *cparam,
  struct mlx5e_channel **cp)
 {
-   struct mlx5e_cq_moder icocq_moder = {0, 0};
+   struct net_cq_moder icocq_moder = {0, 0};
struct net_device *netdev = priv->netdev;
struct mlx5e_channel *c;
unsigned int irq;
@@ -1987,7 +1987,7 @@ static void mlx5e_build_tx_cq_param(struct mlx5e_priv 
*priv,
 
mlx5e_build_common_cq_param(priv, param);
 
-   param->cq_period_mode = MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
+   param->cq_period_mode = NET_CQ_PERIOD_MODE_START_FROM_EQE;
 }
 
 static void mlx5e_build_ico_cq_param(struct mlx5e_priv *priv,
@@ -2000,7 +2000,7 @@ static void mlx5e_build_ico_cq_param(struct mlx5e_priv 
*priv,
 
mlx5e_build_common_cq_param(priv, param);
 
-   param->cq_period_mode = MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
+   param->cq_period_mode = NET_CQ_PERIOD_MODE_START_FROM_EQE;
 }
 
 static void mlx5e_build_icosq_param(struct mlx5e_priv *priv,
@@ -3996,16 +3996,16 @@ void mlx5e_set_rx_cq_mode_params(struct mlx5e_params 
*params, u8 cq_period_mode)
params->rx_cq_moderation.usec =
MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC;
 
-   if (cq_period_mode == MLX5_CQ_PERIOD_MODE_START_FROM_CQE)
+   if (cq_period_mode == NET_CQ_PERIOD_MODE_START_FROM_CQE)
   

[RFC 1/9] mlx5_en: move interrupt moderation structs to new file

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek 

Create new header file to prepare to move code that handles irq
moderation to a library.

Signed-off-by: Andy Gospodarek 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   | 32 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h | 74 ++
 include/linux/mlx5/mlx5_ifc.h  |  6 --
 3 files changed, 75 insertions(+), 37 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index e613ce0..1bde086 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -49,6 +49,7 @@
 #include "wq.h"
 #include "mlx5_core.h"
 #include "en_stats.h"
+#include "en_rx_am.h"
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
@@ -209,11 +210,6 @@ enum mlx5e_priv_flag {
 #define MLX5E_MAX_BW_ALLOC 100 /* Max percentage of BW allocation */
 #endif
 
-struct mlx5e_cq_moder {
-   u16 usec;
-   u16 pkts;
-};
-
 struct mlx5e_params {
u8  log_sq_size;
u8  rq_wq_type;
@@ -449,32 +445,6 @@ struct mlx5e_mpw_info {
u16 skbs_frags[MLX5_MPWRQ_PAGES_PER_WQE];
 };
 
-struct mlx5e_rx_am_stats {
-   int ppms; /* packets per msec */
-   int bpms; /* bytes per msec */
-   int epms; /* events per msec */
-};
-
-struct mlx5e_rx_am_sample {
-   ktime_t time;
-   u32 pkt_ctr;
-   u32 byte_ctr;
-   u16 event_ctr;
-};
-
-struct mlx5e_rx_am { /* Adaptive Moderation */
-   u8  state;
-   struct mlx5e_rx_am_statsprev_stats;
-   struct mlx5e_rx_am_sample   start_sample;
-   struct work_struct  work;
-   u8  profile_ix;
-   u8  mode;
-   u8  tune_state;
-   u8  steps_right;
-   u8  steps_left;
-   u8  tired;
-};
-
 /* a single cache unit is capable to serve one napi call (for non-striding rq)
  * or a MPWQE (for striding rq).
  */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
new file mode 100644
index 000..176a732
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
@@ -0,0 +1,74 @@
+/*
+ * Copyright (c) 2013-2015, Mellanox Technologies, Ltd.  All rights reserved.
+ * Copyright (c) 2017, Broadcom Limited
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+*/
+
+#ifndef MLX5_AM_H
+#define MLX5_AM_H
+
+struct mlx5e_cq_moder {
+   u16 usec;
+   u16 pkts;
+};
+
+struct mlx5e_rx_am_sample {
+   ktime_t time;
+   u32 pkt_ctr;
+   u32 byte_ctr;
+   u16 event_ctr;
+};
+
+struct mlx5e_rx_am_stats {
+   int ppms; /* packets per msec */
+   int bpms; /* bytes per msec */
+   int epms; /* events per msec */
+};
+
+struct mlx5e_rx_am { /* Adaptive Moderation */
+   u8  state;
+   struct mlx5e_rx_am_statsprev_stats;
+   struct mlx5e_rx_am_sample   start_sample;
+   struct work_struct  work;
+   u8  profile_ix;
+   u8  mode;
+   u8  tune_state;
+   u8 

[RFC 2/9] mlx5_en: move interrupt moderation forward declarations

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek 

Move these to newly created file to prepare to move these functions to a
library.

Signed-off-by: Andy Gospodarek 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   | 4 
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h | 4 
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 1bde086..1c56d16 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -798,10 +798,6 @@ void mlx5e_dealloc_rx_wqe(struct mlx5e_rq *rq, u16 ix);
 void mlx5e_dealloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix);
 void mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi);
 
-void mlx5e_rx_am(struct mlx5e_rq *rq);
-void mlx5e_rx_am_work(struct work_struct *work);
-struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
-
 void mlx5e_update_stats(struct mlx5e_priv *priv, bool full);
 
 int mlx5e_create_flow_steering(struct mlx5e_priv *priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
index 176a732..869e4e7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.h
@@ -71,4 +71,8 @@ enum {
MLX5_CQ_PERIOD_NUM_MODES
 };
 
+void mlx5e_rx_am(struct mlx5e_rq *rq);
+void mlx5e_rx_am_work(struct work_struct *work);
+struct mlx5e_cq_moder mlx5e_am_get_def_profile(u8 rx_cq_period_mode);
+
 #endif /* MLX5_AM_H */
-- 
2.7.4



[RFC 0/9] net: create adaptive software irq moderation library

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek 

This RFC converts the adaptive interrupt moderation code from the
mlx5_en driver into a library so it can be used by any driver.  The last
patch in this set adds support for interrupt moderation in the bnxt_en
driver.

The main purpose of this code in the mlx5 driver is to allow an
administrator to make sure that default coalesce settings are optimized
for low latency, but quickly adapt to handle high throughput traffic and
optimize how many packets are received during each napi poll.

For any new driver the following changes would be needed to use this
library:

- add elements in ring struct to track items needed by this library
- create function that can be called to actually set coalesce settings
  for the driver

My main reason for making this an RFC is that I would like verification
from Mellanox that the performance of their driver does not change in an
unintended way.  I did some basic testing (netperf) and did not note a
statistically significant change in throughput or CPU utilization before
and after this set.  

Andy Gospodarek (9):
  mlx5_en: move interrupt moderation structs to new file
  mlx5_en: move interrupt moderation forward declarations
  mlx5_en: remove rq references in mlx5e_rx_am
  mlx5_en: move AM logic enums
  mlx5_en: move generic functions to new file
  mlx5_en: rename en_rx_am.h to net_rx_am.h
  mlx5_en: remove Mellanox references in AM code
  net: move adaptive interrupt coalescing code to lib/
  bnxt_en: add support for software adaptive interrupt moderation

 drivers/net/ethernet/broadcom/bnxt/Makefile|   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c  |  51 
 drivers/net/ethernet/broadcom/bnxt/bnxt.h  |  34 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  |   7 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_rx_am.c|  32 +++
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  43 +--
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  18 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 298 +---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   5 +-
 include/linux/mlx5/mlx5_ifc.h  |   6 -
 include/linux/net_rx_am.h  | 109 
 lib/Makefile   |   2 +-
 lib/net_rx_am.c| 306 +
 15 files changed, 558 insertions(+), 365 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_rx_am.c
 create mode 100644 include/linux/net_rx_am.h
 create mode 100644 lib/net_rx_am.c

-- 
2.7.4



[RFC 0/9] net: create adaptive software irq moderation library

2017-11-05 Thread Andy Gospodarek
From: Andy Gospodarek 

This RFC converts the adaptive interrupt moderation code from the
mlx5_en driver into a library so it can be used by any driver.  The last
patch in this set adds support for interrupt moderation in the bnxt_en
driver.

The main purpose of this code in the mlx5_en driver is to allow an
administrator to make sure that default coalesce settings are optimized
for low latency, but quickly adapt to handle high throughput traffic and
optimize how many packets are received during each napi poll.

For any new driver the following changes would be needed to use this
library:

- add elements in ring struct to track items needed by this library
- create function that can be called to actually set coalesce settings
  for the driver
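
The two driver-side changes listed above could look roughly like the following userspace sketch. Everything in it is hypothetical and for illustration only: the names (am_moder, am_state, my_ring, my_set_coalesce, am_step), the profile values, and the step rule are assumptions and are not the API or the algorithm in this series. It only shows the shape of the integration: per-ring state that the moderation code updates, plus a driver hook that applies the chosen settings.

```c
#include <stdint.h>

/* Hypothetical moderation setting: raise an interrupt after `usec`
 * microseconds or `pkts` packets. */
struct am_moder {
	uint16_t usec;
	uint16_t pkts;
};

/* Hypothetical profile table, ordered from lowest latency to highest
 * throughput. The values are made up for this sketch. */
#define AM_NUM_PROFILES 5
static const struct am_moder am_profiles[AM_NUM_PROFILES] = {
	{1, 64}, {8, 64}, {64, 64}, {128, 64}, {256, 64},
};

/* Per-ring state a driver would carry for the moderation code. */
struct am_state {
	uint8_t  profile_ix;	/* index into am_profiles */
	uint32_t prev_ppms;	/* packets per msec in the previous sample */
};

/* Hypothetical driver ring with the tracking state embedded in it. */
struct my_ring {
	struct am_state am;
	struct am_moder applied;	/* settings currently "programmed" */
};

/* Driver-provided hook that actually applies coalesce settings.
 * A real driver would write hardware registers here. */
static void my_set_coalesce(struct my_ring *ring, struct am_moder m)
{
	ring->applied = m;
}

/* One adaptation step with a deliberately naive rule: move one profile
 * towards throughput when the packet rate rose, one towards latency
 * when it fell, and stay put when it is unchanged. */
static void am_step(struct my_ring *ring, uint32_t ppms)
{
	struct am_state *am = &ring->am;

	if (ppms > am->prev_ppms && am->profile_ix < AM_NUM_PROFILES - 1)
		am->profile_ix++;
	else if (ppms < am->prev_ppms && am->profile_ix > 0)
		am->profile_ix--;

	am->prev_ppms = ppms;
	my_set_coalesce(ring, am_profiles[am->profile_ix]);
}
```

In this sketch a driver would call am_step() from its poll routine with a fresh rate sample; keeping the hardware write behind a single hook is what lets the adaptation logic stay driver-independent.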

My main reason for making this an RFC is that I would like verification
from Mellanox that the performance of their driver does not change in an
unintended way.  I did some basic testing (netperf) and did not note a
statistically significant change in throughput or CPU utilization before
and after this set.  

Credit to Rob Rice and Lee Reed for doing some of the initial proof of
concept and testing for this patch.

Andy Gospodarek (9):
  mlx5_en: move interrupt moderation structs to new file
  mlx5_en: move interrupt moderation forward declarations
  mlx5_en: remove rq references in mlx5e_rx_am
  mlx5_en: move AM logic enums
  mlx5_en: move generic functions to new file
  mlx5_en: rename en_rx_am.h to net_rx_am.h
  mlx5_en: remove Mellanox references in AM code
  net: move adaptive interrupt coalescing code to lib/
  bnxt_en: add support for software adaptive interrupt moderation

 drivers/net/ethernet/broadcom/bnxt/Makefile|   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c  |  51 
 drivers/net/ethernet/broadcom/bnxt/bnxt.h  |  34 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c  |   7 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_rx_am.c|  32 +++
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  43 +--
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  18 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 298 +---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   5 +-
 include/linux/mlx5/mlx5_ifc.h  |   6 -
 include/linux/net_rx_am.h  | 109 
 lib/Makefile   |   2 +-
 lib/net_rx_am.c| 306 +
 15 files changed, 558 insertions(+), 365 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_rx_am.c
 create mode 100644 include/linux/net_rx_am.h
 create mode 100644 lib/net_rx_am.c

-- 
2.7.4



Re: [PATCH net-next] bpf: Rename tcp_bbf.readme to tcp_bpf.readme

2017-11-05 Thread Daniel Borkmann

On 11/06/2017 03:44 AM, Lawrence Brakmo wrote:

The original patch had the wrong filename.

Fixes: bfdf75693875 ("bpf: create samples/bpf/tcp_bpf.readme")
Signed-off-by: Lawrence Brakmo 


Acked-by: Daniel Borkmann 


Re: [PATCH] reduce the spinlock conflict during massive connect

2017-11-05 Thread Eric Dumazet
On Mon, 2017-11-06 at 10:28 +0800, Liu Yu wrote:
> From: Liu Yu 
> 
> When a large number of processes connect to the same port at the same
> address simultaneously, they are likely to get the same bhash and
> therefore contend with each other.
> 
> The more CPUs there are, the worse this gets.
> 
> Use spin_trylock instead for this case, which does not seem to matter
> for the common case.
> 
> Signed-off-by: Liu Yu 
> ---
>  net/ipv4/inet_hashtables.c |6 +-
>  1 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> index e7d15fb..cc11ec7 100644
> --- a/net/ipv4/inet_hashtables.c
> +++ b/net/ipv4/inet_hashtables.c
> @@ -581,13 +581,17 @@ int __inet_hash_connect(struct inet_timewait_death_row 
> *death_row,
>  other_parity_scan:
>   port = low + offset;
>   for (i = 0; i < remaining; i += 2, port += 2) {
> + int ret;
> +
>   if (unlikely(port >= high))
>   port -= remaining;
>   if (inet_is_local_reserved_port(net, port))
>   continue;
>   head = &hinfo->bhash[inet_bhashfn(net, port,
> hinfo->bhash_size)];
> - spin_lock_bh(&head->lock);
> + ret = spin_trylock(&head->lock);
> + if (unlikely(!ret))
> + continue;
>  
>   /* Does not bother with rcv_saddr checks, because
>* the established check is already unique enough.

This is broken.

I am pretty sure you have not really tested this patch properly.

Chances are very high that a connect() will miss slots and won't succeed
when the table is almost full.

Performance is nice, but we actually need to allocate a 4-tuple in a
more deterministic fashion.
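
To illustrate the concern, here is a minimal userspace sketch, not the kernel code: the bucket layout, the atomic_flag based lock, and all function names are assumptions made for this example. It models a table in which only one slot is free and that slot's lock happens to be held at the moment of the scan. A scan that skips contended buckets reports failure even though a free slot exists, while a scan that waits for the lock finds it.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NBUCKETS 4

/* Hypothetical bucket: a lock plus a flag saying whether its slot is taken. */
struct bucket {
	atomic_flag lock;
	bool in_use;
};

static void table_init(struct bucket *tbl, const bool *in_use)
{
	for (int i = 0; i < NBUCKETS; i++) {
		atomic_flag_clear(&tbl[i].lock);
		tbl[i].in_use = in_use[i];
	}
}

/* test_and_set returns the previous value: false means we took the lock. */
static bool bucket_trylock(struct bucket *b)
{
	return !atomic_flag_test_and_set(&b->lock);
}

static void bucket_lock(struct bucket *b)
{
	while (atomic_flag_test_and_set(&b->lock))
		;	/* spin until the holder releases the lock */
}

static void bucket_unlock(struct bucket *b)
{
	atomic_flag_clear(&b->lock);
}

/* Claim the slot if it is free; the caller must hold the bucket lock. */
static bool claim_locked(struct bucket *b)
{
	if (b->in_use)
		return false;
	b->in_use = true;
	return true;
}

/* Scan that skips contended buckets: a busy bucket is never examined,
 * so a free slot behind a held lock is silently missed. */
static int scan_trylock(struct bucket *tbl)
{
	for (int i = 0; i < NBUCKETS; i++) {
		bool ok;

		if (!bucket_trylock(&tbl[i]))
			continue;
		ok = claim_locked(&tbl[i]);
		bucket_unlock(&tbl[i]);
		if (ok)
			return i;
	}
	return -1;	/* reported as "no slot available" */
}

/* Scan that waits for each lock: every bucket is examined. */
static int scan_blocking(struct bucket *tbl)
{
	for (int i = 0; i < NBUCKETS; i++) {
		bool ok;

		bucket_lock(&tbl[i]);
		ok = claim_locked(&tbl[i]);
		bucket_unlock(&tbl[i]);
		if (ok)
			return i;
	}
	return -1;
}
```

With buckets 0 to 2 in use and bucket 3 free but locked by someone else, scan_trylock() returns -1; once the lock is released, scan_blocking() returns 3. The failure of the first scan depends purely on timing, which is the non-determinism being objected to.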





Re: [PATCH net-next] bpf: Rename tcp_bbf.readme to tcp_bpf.readme

2017-11-05 Thread Alexei Starovoitov

On 11/6/17 11:44 AM, Lawrence Brakmo wrote:

The original patch had the wrong filename.

Fixes: bfdf75693875 ("bpf: create samples/bpf/tcp_bpf.readme")
Signed-off-by: Lawrence Brakmo 


Acked-by: Alexei Starovoitov 



Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members

2017-11-05 Thread Sandipan Das
Hi Alexei, Naveen,

On 11/04/2017 11:01 PM, Naveen N. Rao wrote:
> 
> I think the offsets described in dwarf were incorrect with 
> CONFIG_GCC_PLUGIN_RANDSTRUCT, but I'll let Sandipan confirm that.
> 

I think that the offsets described in DWARF are probably incorrect when
CONFIG_GCC_PLUGIN_RANDSTRUCT is enabled. To verify this, I used perf
to attach a probe to try_to_wake_up(), which is also the function to
which waker() is attached in the previously mentioned kernel sample. So,
if I run the following:

# perf probe "try_to_wake_up" "p->pid"
# perf record -a -e probe:try_to_wake_up
# perf script

The value of p->pid is reported as 0. Similarly, if I try to read
p->comm, it is reported to be an empty string. The same problem is
seen with systemtap as well.

Also, if I do a printk with offsetof(struct task_struct, pid) and
offsetof(struct task_struct, comm) inside the kernel code and then
compare the values with the offsets reported by pahole, they are
completely different.
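
The effect can be reproduced in miniature in userspace. The sketch below is only an illustration under stated assumptions: demo_task and demo_task_shuffled are made-up stand-ins (not the real task_struct), and the reordering is done by hand rather than by the randstruct plugin. Reading a field through an offset taken from the wrong layout returns unrelated bytes, which is how a tracer relying on stale debuginfo offsets can end up reporting a pid of 0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout as a debuginfo consumer believes it to be. */
struct demo_task {
	long state;
	int pid;
	char comm[16];
};

/* Same fields in a different order, standing in for the layout the
 * compiler actually used after randomization. */
struct demo_task_shuffled {
	char comm[16];
	long state;
	int pid;
};

/* Read an int at a byte offset from the start of an object, the way a
 * tracer dereferences a member using an offset from debuginfo. */
static int read_int_at(const void *base, size_t off)
{
	int v;

	memcpy(&v, (const char *)base + off, sizeof(v));
	return v;
}
```

For an object of the shuffled type with an empty comm, read_int_at() with offsetof(struct demo_task, pid) lands inside the zeroed comm array, whereas offsetof(struct demo_task_shuffled, pid) yields the real value.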

- Sandipan



Re: [PATCH net-next v15] openvswitch: enable NSH support

2017-11-05 Thread Yang, Yi
On Sat, Nov 04, 2017 at 10:29:46PM +0800, Pravin Shelar wrote:
> On Tue, Oct 31, 2017 at 9:03 PM, Yi Yang  wrote:
> > +int nsh_push(struct sk_buff *skb, const struct nshhdr *pushed_nh)
> > +{
> > +   struct nshhdr *nh;
> > +   size_t length = nsh_hdr_len(pushed_nh);
> > +   u8 next_proto;
> > +
> > +   if (skb->mac_len) {
> > +   next_proto = TUN_P_ETHERNET;
> > +   } else {
> > +   next_proto = tun_p_from_eth_p(skb->protocol);
> > +   if (!next_proto)
> > +   return -EAFNOSUPPORT;
> check for supported protocols can be moved to flow install validation
> in __ovs_nla_copy_actions().
> 
> > +   }
> > +
> > +   /* Add the NSH header */
> > +   if (skb_cow_head(skb, length) < 0)
> > +   return -ENOMEM;
> > +
> > +   skb_push(skb, length);
> > +   nh = (struct nshhdr *)(skb->data);
> > +   memcpy(nh, pushed_nh, length);
> > +   nh->np = next_proto;
> > +
> > +   skb->protocol = htons(ETH_P_NSH);
> > +   skb_reset_mac_header(skb);
> > +   skb_reset_network_header(skb);
> > +   skb_reset_mac_len(skb);
> > +
> > +   return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(nsh_push);
> > +
> > +int nsh_pop(struct sk_buff *skb)
> > +{
> > +   struct nshhdr *nh;
> > +   size_t length;
> > +   __be16 inner_proto;
> > +
> > +   if (!pskb_may_pull(skb, NSH_BASE_HDR_LEN))
> > +   return -ENOMEM;
> > +   nh = (struct nshhdr *)(skb->data);
> > +   length = nsh_hdr_len(nh);
> > +   inner_proto = tun_p_to_eth_p(nh->np);
> same as above, this check can be moved to flow install 
> __ovs_nla_copy_actions().

Pravin, these two functions are not only for OVS; as you can see, they are
in net/nsh/nsh.c. Jiri and Eric mentioned they could also be used by TC.

I understand you expect some checks to be moved to the slow path, but
for these two cases we can't move them into __ovs_nla_copy_actions().

> 
> > +   if (!pskb_may_pull(skb, length))
> > +   return -ENOMEM;
> > +
> > +   if (!inner_proto)
> > +   return -EAFNOSUPPORT;
> > +
> > +   skb_pull(skb, length);
> > +   skb_reset_mac_header(skb);
> > +   skb_reset_network_header(skb);
> > +   skb_reset_mac_len(skb);
> > +   skb->protocol = inner_proto;
> > +
> > +   return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(nsh_pop);
> > +
> ...
> > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> > index a551232..dd1449d 100644
> > --- a/net/openvswitch/actions.c
> > +++ b/net/openvswitch/actions.c
> ...
> > +static int pop_nsh(struct sk_buff *skb, struct sw_flow_key *key)
> > +{
> > +   int err;
> > +
> > +   if (ovs_key_mac_proto(key) != MAC_PROTO_NONE ||
> > +   skb->protocol != htons(ETH_P_NSH)) {
> > +   return -EINVAL;
> > +   }
> > +
> These checks can be moved to flow install.

Done in v16, here is incremental patch. I have sent out v16.

diff -u b/net/openvswitch/actions.c b/net/openvswitch/actions.c
--- b/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -400,11 +400,6 @@
 {
int err;
 
-   if (ovs_key_mac_proto(key) != MAC_PROTO_NONE ||
-   skb->protocol != htons(ETH_P_NSH)) {
-   return -EINVAL;
-   }
-
err = nsh_pop(skb);
if (err)
return err;
diff -u b/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
--- b/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -2737,6 +2737,8 @@
break;
 
case OVS_KEY_ATTR_NSH:
+   if (eth_type != htons(ETH_P_NSH))
+   return -EINVAL;
if (!validate_nsh(nla_data(a), masked, false, log))
return -EINVAL;
break;
@@ -3006,6 +3008,8 @@
break;
 
case OVS_ACTION_ATTR_POP_NSH:
+   if (eth_type != htons(ETH_P_NSH))
+   return -EINVAL;
if (key->nsh.base.np == TUN_P_ETHERNET)
mac_proto = MAC_PROTO_ETHERNET;
else
> 
> > +   err = nsh_pop(skb);
> > +   if (err)
> > +   return err;
> > +
> > +   /* safe right before invalidate_flow_key */
> > +   if (skb->protocol == htons(ETH_P_TEB))
> > +   key->mac_proto = MAC_PROTO_ETHERNET;
> > +   else
> > +   key->mac_proto = MAC_PROTO_NONE;
> > +   invalidate_flow_key(key);
> > +   return 0;
> > +}
> > +


[PATCH net-next v16] openvswitch: enable NSH support

2017-11-05 Thread Yi Yang
v15->v16
 - Add csum recalculation for nsh_push, nsh_pop and set_nsh
   pointed out by Pravin
 - Move nsh key into the union with ipv4 and ipv6 and add
   check for nsh key in match_validate pointed out by Pravin
 - Add nsh check in validate_set and __ovs_nla_copy_actions

v14->v15
 - Check size in nsh_hdr_from_nlattr
 - Fixed four small issues pointed out by Jiri and Eric

v13->v14
 - Rename skb_push_nsh to nsh_push per Dave's comment
 - Rename skb_pop_nsh to nsh_pop per Dave's comment

v12->v13
 - Fix NSH header length check in set_nsh

v11->v12
 - Fix missing changes pointed out in old comments
 - Fix new comments for v11

v10->v11
 - Fix the three remaining disputed comments for v9
   that were not fixed in v10.

v9->v10
 - Change struct ovs_key_nsh to
   struct ovs_nsh_key_base base;
   __be32 context[NSH_MD1_CONTEXT_SIZE];
 - Fix new comments for v9

v8->v9
 - Fix build error reported by daily intel build
   because nsh module isn't selected by openvswitch

v7->v8
 - Rework nested value and mask for OVS_KEY_ATTR_NSH
 - Change pop_nsh to adapt to nsh kernel module
 - Fix many issues per comments from Jiri Benc

v6->v7
 - Remove NSH GSO patches in v6 because Jiri Benc
   reworked it as another patch series and they have
   been merged.
 - Change it to adapt to nsh kernel module added by NSH
   GSO patch series

v5->v6
 - Fix the remaining comments for v4.
 - Add NSH GSO support for VxLAN-gpe + NSH and
   Eth + NSH.

v4->v5
 - Fix many comments by Jiri Benc and Eric Garver
   for v4.

v3->v4
 - Add new NSH match field ttl
 - Update NSH header to the latest format
   which will be final format and won't change
   per its author's confirmation.
 - Fix comments for v3.

v2->v3
 - Change OVS_KEY_ATTR_NSH to nested key to handle
   length-fixed attributes and length-variable
   attributes more flexibly.
 - Remove struct ovs_action_push_nsh completely
 - Add code to handle nested attribute for SET_MASKED
 - Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH
   to transfer NSH header data.
 - Fix comments and coding style issues by Jiri and Eric

v1->v2
 - Change encap_nsh and decap_nsh to push_nsh and pop_nsh
 - Dynamically allocate struct ovs_action_push_nsh for
   length-variable metadata.

The OVS master and 2.8 branches have merged the NSH userspace
patch series. This patch enables NSH support
in the kernel data path so that OVS can support
NSH in compat mode by porting this.

Signed-off-by: Yi Yang 
Acked-by: Jiri Benc 
Acked-by: Eric Garver 
Acked-by: Pravin Shelar 
---
 include/net/nsh.h|   3 +
 include/uapi/linux/openvswitch.h |  29 
 net/nsh/nsh.c|  60 +++
 net/openvswitch/Kconfig  |   1 +
 net/openvswitch/actions.c| 116 ++
 net/openvswitch/flow.c   |  51 ++
 net/openvswitch/flow.h   |   7 +
 net/openvswitch/flow_netlink.c   | 330 ++-
 net/openvswitch/flow_netlink.h   |   5 +
 9 files changed, 600 insertions(+), 2 deletions(-)

diff --git a/include/net/nsh.h b/include/net/nsh.h
index a1eaea2..350b1ad 100644
--- a/include/net/nsh.h
+++ b/include/net/nsh.h
@@ -304,4 +304,7 @@ static inline void nsh_set_flags_ttl_len(struct nshhdr 
*nsh, u8 flags,
NSH_FLAGS_MASK | NSH_TTL_MASK | NSH_LEN_MASK);
 }
 
+int nsh_push(struct sk_buff *skb, const struct nshhdr *pushed_nh);
+int nsh_pop(struct sk_buff *skb);
+
 #endif /* __NET_NSH_H */
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 501e4c4..ec75a68 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -336,6 +336,7 @@ enum ovs_key_attr {
OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking label */
OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct ovs_key_ct_tuple_ipv4 */
OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct ovs_key_ct_tuple_ipv6 */
+   OVS_KEY_ATTR_NSH,   /* Nested set of ovs_nsh_key_* */
 
 #ifdef __KERNEL__
OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
@@ -495,6 +496,30 @@ struct ovs_key_ct_tuple_ipv6 {
__u8   ipv6_proto;
 };
 
+enum ovs_nsh_key_attr {
+   OVS_NSH_KEY_ATTR_UNSPEC,
+   OVS_NSH_KEY_ATTR_BASE,  /* struct ovs_nsh_key_base. */
+   OVS_NSH_KEY_ATTR_MD1,   /* struct ovs_nsh_key_md1. */
+   OVS_NSH_KEY_ATTR_MD2,   /* variable-length octets for MD type 2. */
+   __OVS_NSH_KEY_ATTR_MAX
+};
+
+#define OVS_NSH_KEY_ATTR_MAX (__OVS_NSH_KEY_ATTR_MAX - 1)
+
+struct ovs_nsh_key_base {
+   __u8 flags;
+   __u8 ttl;
+   __u8 mdtype;
+   __u8 np;
+   __be32 path_hdr;
+};
+
+#define NSH_MD1_CONTEXT_SIZE 4
+
+struct ovs_nsh_key_md1 {
+   __be32 context[NSH_MD1_CONTEXT_SIZE];
+};
+
 /**
  * enum ovs_flow_attr - attributes for %OVS_FLOW_* commands.
  * @OVS_FLOW_ATTR_KEY: Nested %OVS_KEY_ATTR_* attributes specifying the flow
@@ -811,6 +836,8 @@ struct ovs_action_push_eth {
  * 

Re: [PATCH] reduce the spinlock conflict during massive connect

2017-11-05 Thread Cong Wang
On Sun, Nov 5, 2017 at 6:28 PM, Liu Yu  wrote:
> -   spin_lock_bh(&head->lock);
> +   ret = spin_trylock(&head->lock);

Clearly you want spin_trylock_bh() instead.


[PATCH net-next] bpf: Rename tcp_bbf.readme to tcp_bpf.readme

2017-11-05 Thread Lawrence Brakmo
The original patch had the wrong filename.

Fixes: bfdf75693875 ("bpf: create samples/bpf/tcp_bpf.readme")
Signed-off-by: Lawrence Brakmo 
---
 samples/bpf/{tcp_bbf.readme => tcp_bpf.readme} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename samples/bpf/{tcp_bbf.readme => tcp_bpf.readme} (100%)

diff --git a/samples/bpf/tcp_bbf.readme b/samples/bpf/tcp_bpf.readme
similarity index 100%
rename from samples/bpf/tcp_bbf.readme
rename to samples/bpf/tcp_bpf.readme
-- 
2.9.5



Re: [PATCH net-next 4/6] net: hns3: add support for set_link_ksettings

2017-11-05 Thread lipeng (Y)



On 2017/11/4 3:52, Florian Fainelli wrote:

On 11/02/2017 09:18 PM, Lipeng wrote:

From: Fuyun Liang 

This patch adds set_link_ksettings support for ethtool cmd.

Signed-off-by: Fuyun Liang 
Signed-off-by: Lipeng 
---
  drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
index c7b8ebd..7fe193b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
@@ -653,6 +653,16 @@ static int hns3_get_link_ksettings(struct net_device 
*netdev,
return 0;
  }
  
+static int hns3_set_link_ksettings(struct net_device *netdev,

+  const struct ethtool_link_ksettings *cmd)
+{
+   /* Only support ksettings_set for netdev with phy attached for now */
+   if (netdev->phydev)
+   return phy_ethtool_ksettings_set(netdev->phydev, cmd);
+
+   return -EOPNOTSUPP;

Consider using phy_ethtool_get_link_ksettings() which already checks for
netdev->phydev.

Agreed, thanks for your comment.

As this patch has already been applied to net-next, we will send a
separate cleanup patch.






[PATCH] reduce the spinlock conflict during massive connect

2017-11-05 Thread Liu Yu
From: Liu Yu 

When a large number of processes connect to the same port at the same address
simultaneously, they are likely to hit the same bhash bucket and therefore
contend with each other for its lock.

The more CPUs there are, the worse the contention gets.

Use spin_trylock instead in this case and skip a contended bucket, which does
not seem to matter for the common case.

Signed-off-by: Liu Yu 
---
 net/ipv4/inet_hashtables.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index e7d15fb..cc11ec7 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -581,13 +581,17 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 other_parity_scan:
port = low + offset;
for (i = 0; i < remaining; i += 2, port += 2) {
+   int ret;
+
if (unlikely(port >= high))
port -= remaining;
if (inet_is_local_reserved_port(net, port))
continue;
head = &hinfo->bhash[inet_bhashfn(net, port,
  hinfo->bhash_size)];
-   spin_lock_bh(&head->lock);
+   ret = spin_trylock(&head->lock);
+   if (unlikely(!ret))
+   continue;
 
/* Does not bother with rcv_saddr checks, because
 * the established check is already unique enough.
-- 
1.7.1
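
The idea in the patch above, skipping a hash bucket whose lock is currently held instead of spinning on it, can be illustrated with a minimal userspace sketch. This is not kernel code: `struct bucket`, `bucket_trylock()`, `bucket_unlock()` and `pick_free_bucket()` are hypothetical names, and a C11 `atomic_flag` stands in for the kernel spinlock, so the bottom-half handling that `spin_trylock_bh()` provides is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for one bind-hash bucket and its lock. */
struct bucket {
    atomic_flag lock;   /* set while some thread owns the bucket */
};

/* Try once to take the lock; returns 1 on success, 0 if it is contended. */
static int bucket_trylock(struct bucket *b)
{
    return !atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire);
}

static void bucket_unlock(struct bucket *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/*
 * Scan the buckets starting at 'start' and return the index of the first
 * one whose lock could be taken, or -1 if every bucket was contended.
 * The caller owns the returned bucket and must unlock it afterwards.
 */
static int pick_free_bucket(struct bucket *tbl, size_t n, size_t start)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        size_t idx = (start + i) % n;

        if (!bucket_trylock(&tbl[idx]))
            continue;   /* contended: move on instead of spinning */
        return (int)idx;
    }
    return -1;
}
```

The trade-off is that a contended candidate is simply passed over, so a caller may end up with a later bucket than it would have got by waiting.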



RE: [PATCH net v5 2/2] net: fec: Let fec_ptp have its own interrupt routine

2017-11-05 Thread Andy Duan
From: Troy Kisky  Sent: Saturday, November 04, 2017 1:30 AM
>This is better for code locality and should slightly speed up normal 
>interrupts.
>
>This also allows PPS clock output to start working for i.mx7. This is because
>i.mx7 was already using the limit of 3 interrupts, and needed another.
>
>Signed-off-by: Troy Kisky 
>

This version looks fine to me. Thanks.
Acked-by: Fugang Duan 

>---
>
>v2: made this change independent of any devicetree change so that old dtbs
>continue to work.
>
>Continue to register ptp clock if interrupt is not found.
>
>v3: renamed "ptp" interrupt to "pps" interrupt
>
>v4: no change
>
>v5: moving binding documentation to this patch
>   as requested by Shawn Guo
>s/irq_index/irq_idx/
>add function fec_enet_get_irq_cnt() to encapsulate if,
>   as requested by Andy Duan
>---
> Documentation/devicetree/bindings/net/fsl-fec.txt | 13 
> drivers/net/ethernet/freescale/fec.h  |  3 +-
> drivers/net/ethernet/freescale/fec_main.c | 31 ++---
> drivers/net/ethernet/freescale/fec_ptp.c  | 82 +--
> 4 files changed, 84 insertions(+), 45 deletions(-)
>
>diff --git a/Documentation/devicetree/bindings/net/fsl-fec.txt
>b/Documentation/devicetree/bindings/net/fsl-fec.txt
>index 6f55bdd52f8a..f0dc94409107 100644
>--- a/Documentation/devicetree/bindings/net/fsl-fec.txt
>+++ b/Documentation/devicetree/bindings/net/fsl-fec.txt
>@@ -34,6 +34,19 @@ Optional properties:
> - fsl,err006687-workaround-present: If present indicates that the system has
>   the hardware workaround for ERR006687 applied and does not need a software
>   workaround.
>+ -interrupt-names:  names of the interrupts listed in interrupts property in
>+  the same order. The defaults if not specified are
>+  __Number of interrupts__   __Default__
>+  1   "int0"
>+  2   "int0", "pps"
>+  3   "int0", "int1", "int2"
>+  4   "int0", "int1", "int2", "pps"
>+  The order may be changed as long as they correspond to the interrupts
>+  property. Currently, only i.mx7 uses "int1" and "int2". They correspond to
>+  tx/rx queues 1 and 2. "int0" will be used for queue 0 and ENET_MII interrupts.
>+  For imx6sx, "int0" handles all 3 queues and ENET_MII. "pps" is for the pulse
>+  per second interrupt associated with 1588 precision time protocol(PTP).
>+
>+
>
> Optional subnodes:
> - mdio : specifies the mdio bus in the FEC, used as a container for phy nodes
>diff --git a/drivers/net/ethernet/freescale/fec.h
>b/drivers/net/ethernet/freescale/fec.h
>index ede1876a9a19..0af58991ca8f 100644
>--- a/drivers/net/ethernet/freescale/fec.h
>+++ b/drivers/net/ethernet/freescale/fec.h
>@@ -582,12 +582,11 @@ struct fec_enet_private {
>   u64 ethtool_stats[0];
> };
>
>-void fec_ptp_init(struct platform_device *pdev);
>+void fec_ptp_init(struct platform_device *pdev, int irq_idx);
> void fec_ptp_stop(struct platform_device *pdev);
> void fec_ptp_start_cyclecounter(struct net_device *ndev);
> int fec_ptp_set(struct net_device *ndev, struct ifreq *ifr);
> int fec_ptp_get(struct net_device *ndev, struct ifreq *ifr);
>-uint fec_ptp_check_pps_event(struct fec_enet_private *fep);
>
>
>/**
>**/
> #endif /* FEC_H */
>diff --git a/drivers/net/ethernet/freescale/fec_main.c
>b/drivers/net/ethernet/freescale/fec_main.c
>index 3dc2d771a222..610573855213 100644
>--- a/drivers/net/ethernet/freescale/fec_main.c
>+++ b/drivers/net/ethernet/freescale/fec_main.c
>@@ -1602,10 +1602,6 @@ fec_enet_interrupt(int irq, void *dev_id)
>   ret = IRQ_HANDLED;
>   complete(&fep->mdio_done);
>   }
>-
>-  if (fep->ptp_clock)
>-  if (fec_ptp_check_pps_event(fep))
>-  ret = IRQ_HANDLED;
>   return ret;
> }
>
>@@ -3312,6 +3308,19 @@ fec_enet_get_queue_num(struct platform_device *pdev, int *num_tx, int *num_rx)
>
> }
>
>+static int fec_enet_get_irq_cnt(struct platform_device *pdev) {
>+  int irq_cnt = platform_irq_count(pdev);
>+
>+  if (irq_cnt > FEC_IRQ_NUM)
>+  irq_cnt = FEC_IRQ_NUM;  /* last for pps */
>+  else if (irq_cnt == 2)
>+  irq_cnt = 1;/* last for pps */
>+  else if (irq_cnt <= 0)
>+  irq_cnt = 1;/* At least 1 irq is needed */
>+  return irq_cnt;
>+}
>+
> static int
> fec_probe(struct platform_device *pdev)
> {
>@@ -3325,6 +3334,8 @@ fec_probe(struct platform_device *pdev)
>   struct device_node *np = pdev->dev.of_node, *phy_node;
>   int num_tx_qs;
>   int num_rx_qs;
>+  char irq_name[8];
>+  int irq_cnt;
>
>   fec_enet_get_queue_num(pdev, &num_tx_qs, &num_rx_qs);
>
>@@ -3465,18 +3476,20 @@ fec_probe(struct platform_device *pdev)
>   if (ret)
>   goto failed_reset;
>
>+  irq_cnt = 

Re: [PATCH] net/mlx5e/core/en_fs: fix pointer dereference after free in mlx5e_execute_l2_action

2017-11-05 Thread Gustavo A. R. Silva

Hi Saeed,

Quoting Saeed Mahameed :


On Sat, Nov 4, 2017 at 8:54 PM, Gustavo A. R. Silva
 wrote:

hn is being kfree'd in mlx5e_del_l2_from_hash and then dereferenced
by accessing hn->ai.addr

Fix this by copying the MAC address into a local variable for its safe use
in all possible execution paths within function mlx5e_execute_l2_action.

Addresses-Coverity-ID: 1417789
Fixes: eeb66cdb6826 ("net/mlx5: Separate between E-Switch and MPFS")
Signed-off-by: Gustavo A. R. Silva 


Acked-by: Saeed Mahameed 

Looks good.
Thank you Gustavo.



Glad to help.

Thanks
--
Gustavo A. R. Silva







[PATCHv2 net] bonding: discard lowest hash bit for 802.3ad layer3+4

2017-11-05 Thread Hangbin Liu
After commit 07f4c90062f8 ("tcp/dccp: try to not exhaust ip_local_port_range
in connect()"), we will try to use even ports for connect(). Then if an
application (seen clearly with iperf) opens multiple streams to the same
destination IP and port, each stream will be given an even source port.

So the bonding driver's simple xmit_hash_policy based on layer3+4 addressing
will always hash all these streams to the same interface, and the total
throughput will be limited to that of a single slave.

Changing the TCP code would affect TCP behavior as a whole just for the sake
of bonding. Paolo Abeni suggested fixing this by changing the bonding code
only, which is more reasonable and has less impact.

Fix this by discarding the lowest hash bit because it contains little entropy.
After the fix we can re-balance between slaves.

Signed-off-by: Paolo Abeni 
Signed-off-by: Hangbin Liu 
---
 drivers/net/bonding/bond_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index c99dc59..76e8054 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3253,7 +3253,7 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
hash ^= (hash >> 16);
hash ^= (hash >> 8);
 
-   return hash;
+   return hash >> 1;
 }
 
 /*-- Device entry points */
-- 
2.5.5
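
Why dropping the lowest bit helps can be seen with a small model. The sketch below is a simplified, host-byte-order stand-in for a layer3+4 XOR hash followed by the two folding steps shown in the hunk above; it is not the kernel's flow-dissector code. The function names, the example addresses and ports, and the `hash % slave_cnt` slave selection are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of a layer3+4 transmit hash: XOR ports and addresses,
 * then fold the upper bits down. Host byte order is used for clarity.
 */
static uint32_t l34_hash(uint32_t saddr, uint32_t daddr,
                         uint16_t sport, uint16_t dport)
{
    uint32_t hash = (((uint32_t)sport << 16) | dport) ^ saddr ^ daddr;

    hash ^= hash >> 16;
    hash ^= hash >> 8;
    return hash;
}

/* Pick a slave index; 'shift' is 0 for the old behaviour, 1 for the fix. */
static unsigned int pick_slave(uint32_t hash, unsigned int shift,
                               unsigned int slave_cnt)
{
    assert(slave_cnt > 0);
    return (hash >> shift) % slave_cnt;
}

/* Count how many of the given source ports land on 'slave' (of two). */
static size_t streams_on_slave(const uint16_t *sports, size_t n,
                               unsigned int shift, unsigned int slave)
{
    /* Assumed example flow: 10.0.0.1 -> 10.0.0.2, destination port 5001. */
    const uint32_t saddr = 0x0a000001, daddr = 0x0a000002;
    const uint16_t dport = 5001;
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t h = l34_hash(saddr, daddr, sports[i], dport);

        if (pick_slave(h, shift, 2) == slave)
            count++;
    }
    return count;
}
```

In this model, the four neighbouring even source ports 40000, 40002, 40004 and 40006 all select the same one of two slaves when `shift` is 0, and split two and two when `shift` is 1.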



Re: [PATCH net] bonding: discard lowest hash bit for 802.3ad layer3+4

2017-11-05 Thread Hangbin Liu
On Sun, Nov 05, 2017 at 01:38:47PM -0800, Eric Dumazet wrote:
> > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > index c99dc59..728fa08 100644
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -3237,7 +3237,7 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
> >
> > if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
> > skb->l4_hash)
> > -   return skb->hash;
> > +   return skb->hash >> 1;
> 
> Why are you changing this part ?
> 
> The l4 hash provided by local TCP stack does not use a pathological
> XOR based on ports/addresses,
> but a random value with pretty good entropy.
> 

Oh, my bad. I only tested that the patch works, but did not carefully check
skb->hash when skb->l4_hash is 1. I will send a v2 patch to fix this.

Regards
Hangbin


Re: How to identify net namespace in kernel messages?

2017-11-05 Thread David Ahern
On 11/6/17 5:56 AM, Vasily Averin wrote:
> On 2017-11-05 15:48, David Miller wrote:
>> From: Vasily Averin 
>>> I doubt that pointer to freed net have value for someone except
>>> developers, on the other hand it helps to speed up the problem
>>> investigation.
>>
>> Any kernel pointer printed has value to attackers.
> 
> David, could you please advise how to identify net namespace in kernel 
> messages?
> 
> In OpenVz we got many requests from host admins, they need to understand
> which container triggered the message. In such cases we have added our custom
> Container Id, but mainline lacks it.
> 
> I expected that mainline can use net pointer for such purposes,
> nfsd does it for example:
> 
>  NFSD: starting 90-second grace period (net ffff880e307fe240)
> 
> Now you recommend not using the net pointer.
> However, could you please suggest an alternative?
> 

Perf now exports the device and inode numbers. See perf_ns_link_info and its uses.
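
From userspace, the same device/inode pair can be read by calling stat() on a namespace link such as /proc/self/ns/net. The following is a minimal sketch of that approach; `struct ns_id`, `ns_identity()` and `same_ns()` are hypothetical helper names, and the code is a plain stat() wrapper rather than anything taken from perf.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Device and inode pair that identifies a namespace link. */
struct ns_id {
    dev_t dev;
    ino_t ino;
};

/*
 * Fill 'id' with the device and inode of 'path', for example
 * "/proc/self/ns/net". Returns 0 on success, -1 if stat() fails.
 */
static int ns_identity(const char *path, struct ns_id *id)
{
    struct stat st;

    assert(path && id);
    if (stat(path, &st) != 0)
        return -1;
    id->dev = st.st_dev;
    id->ino = st.st_ino;
    return 0;
}

/* Two links refer to the same namespace if both numbers match. */
static int same_ns(const struct ns_id *a, const struct ns_id *b)
{
    return a->dev == b->dev && a->ino == b->ino;
}
```

A log message could then carry the inode number instead of a kernel pointer, and an admin could match it against the links under /proc/<pid>/ns/ for each container.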


[PATCH v2 net-next 5/5] ila: Add ila.txt

2017-11-05 Thread Tom Herbert
Add documentation for kernel ILA. This describes ILA, its features, and its
configuration, and gives some examples.

Signed-off-by: Tom Herbert 
---
 Documentation/networking/ila.txt | 286 +++
 1 file changed, 286 insertions(+)
 create mode 100644 Documentation/networking/ila.txt

diff --git a/Documentation/networking/ila.txt b/Documentation/networking/ila.txt
new file mode 100644
index 000000000000..e9923218cd99
--- /dev/null
+++ b/Documentation/networking/ila.txt
@@ -0,0 +1,286 @@
+Identifier Locator Addressing (ILA)
+
+
+Introduction
+============
+
+Identifier-locator addressing (ILA) is a technique used with IPv6 that
+differentiates between location and identity of a network node. Part of an
+address expresses the immutable identity of the node, and another part
+indicates the location of the node which can be dynamic. Identifier-locator
+addressing can be used to efficiently implement overlay networks for
+network virtualization as well as solutions for use cases in mobility.
+
+ILA can be thought of as means to implement an overlay network without
+encapsulation. This is accomplished by performing network address
+translation on destination addresses as a packet traverses a network. To
+the network, an ILA translated packet appears to be no different than any
+other IPv6 packet. For instance, if the transport protocol is TCP then an
+ILA translated packet looks like just another TCP/IPv6 packet. The
+advantage of this is that ILA is transparent to the network so that
+optimizations in the network, such as ECMP, RSS, GRO, GSO, etc., just work.
+
+The ILA protocol is described in Internet-Draft draft-herbert-intarea-ila.
+
+
+ILA terminology
+===============
+
+  - Identifier A number that identifies an addressable node in the network
+               independent of its location. ILA identifiers are sixty-four
+               bit values.
+
+  - Locator    A network prefix that routes to a physical host. Locators
+               provide the topological location of an addressed node. ILA
+               locators are sixty-four bit prefixes.
+
+  - ILA mapping
+               A mapping of an ILA identifier to a locator (or to a
+               locator and meta data). An ILA domain maintains a database
+               that contains mappings for all destinations in the domain.
+
+  - SIR address
+               An IPv6 address composed of a SIR prefix (upper sixty-four
+               bits) and an identifier (lower sixty-four bits).
+               SIR addresses are visible to applications and provide a
+               means for them to address nodes independent of their
+               location.
+
+  - ILA address
+               An IPv6 address composed of a locator (upper sixty-four
+               bits) and an identifier (low order sixty-four bits). ILA
+               addresses are never visible to an application.
+
+  - ILA host   An end host that is capable of performing ILA translations
+               on transmit or receive.
+
+  - ILA router A network node that performs ILA translation and forwarding
+               of translated packets.
+
+  - ILA forwarding cache
+               A type of ILA router that only maintains a working set
+               cache of mappings.
+
+  - ILA node   A network node capable of performing ILA translations. This
+               can be an ILA router, ILA forwarding cache, or ILA host.
+
+
+Operation
+=========
+
+There are two fundamental operations with ILA:
+
+  - Translate a SIR address to an ILA address. This is performed on ingress
+to an ILA overlay.
+
+  - Translate an ILA address to a SIR address. This is performed on egress
+from the ILA overlay.
+
+ILA can be deployed either on end hosts or intermediate devices in the
+network; these are provided by "ILA hosts" and "ILA routers" respectively.
+Configuration and datapath for these two points of deployment is somewhat
+different.
+
+The diagram below illustrates the flow of packets through ILA as well
+as showing ILA hosts and routers.
+
++--------+                                                +--------+
+| Host A +-+                                         +--->| Host B |
+|        | |              (2) ILA                   (')   |        |
++--------+ |            ...addressed....           (   )  +--------+
+           V  +---+--+  .  packet      .  +---+--+  (_)
+   (1) SIR |  | ILA  |----->-------->---->| ILA  |   |   (3) SIR
+ addressed +->|router|  .              .  |router|->-+   addressed
+    packet    +---+--+  .     IPv6     .  +---+--+       packet
+               /        .    Network   .
+              /         .              .   +--+-++--------+
++--------+   /          .              .   |ILA ||  Host  |
+|  Host  +--+           .              .- -|host||        |
+|        |              .              .   +--+-++--------+
++--------+              ................
+
+
+Transport 

Re: [PATCH 4/4] RFC: net: dsa: realtek-smi: Add Realtek SMI driver

2017-11-05 Thread Andrew Lunn
Hi Linus

> +static int realtek_smi_read_reg(struct realtek_smi *smi, u32 addr, u32 *data)
> +{
> + unsigned long flags;
> + u8 lo = 0;
> + u8 hi = 0;
> + int ret;
> +
> + spin_lock_irqsave(&smi->lock, flags);
> +
> + realtek_smi_start(smi);
> +
> + /* send READ command */
> + ret = realtek_smi_write_byte(smi, smi->cmd_read);
> + if (ret)
> + goto out;
> +
> + /* set ADDR[7:0] */
> + ret = realtek_smi_write_byte(smi, addr & 0xff);
> + if (ret)
> + goto out;
> +
> + /* set ADDR[15:8] */
> + ret = realtek_smi_write_byte(smi, addr >> 8);
> + if (ret)
> + goto out;
> +
> + /* read DATA[7:0] */
> + realtek_smi_read_byte0(smi, &lo);
> + /* read DATA[15:8] */
> + realtek_smi_read_byte1(smi, &hi);
> +
> + *data = ((u32) lo) | (((u32) hi) << 8);

If I'm reading this correctly, addr is a u16 and data is also a u16?  So
it is pretty similar to SMI.

I'm wondering if this should be modelled as a normal MDIO bus? Put the
driver as drivers/net/mdio-realtek.c?

I need to study the rest of the code to see if this is a good idea or
not.

Andrew
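
The byte layout of the quoted read sequence can be sketched in isolation: one command byte, the 16-bit address sent low byte first, and the 16-bit data read back low byte first. The helper names `smi_read_frame()` and `smi_join_data()` are hypothetical, and the command value is left to the caller because it is chip specific (`smi->cmd_read` in the quoted code).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Fill 'frame' with the three bytes written for a register read: the
 * read command, then the low and the high address byte.
 */
static void smi_read_frame(uint8_t cmd_read, uint16_t addr, uint8_t frame[3])
{
    assert(frame);
    frame[0] = cmd_read;
    frame[1] = addr & 0xff;     /* ADDR[7:0]  */
    frame[2] = addr >> 8;       /* ADDR[15:8] */
}

/* Combine the two data bytes read back, low byte first. */
static uint16_t smi_join_data(uint8_t lo, uint8_t hi)
{
    return (uint16_t)(lo | ((uint16_t)hi << 8));
}
```

Seen this way, both the address and the data fit in 16 bits, which is the observation behind the question of modelling the interface as a normal MDIO bus.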



[PATCH v2 net-next 4/5] ila: Add a hook type for LWT routes

2017-11-05 Thread Tom Herbert
In LWT tunnels both an input and output route method is defined.
If both of these are executed in the same path then double translation
happens and the effect is not correct.

This patch adds a new attribute that indicates the hook type. Two
values are defined, for route input and route output. ILA
translation is only done for the one that is set. The default is
to enable ILA on route output.

Signed-off-by: Tom Herbert 
---
 include/uapi/linux/ila.h |  7 +++
 net/ipv6/ila/ila_lwt.c   | 39 ---
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/ila.h b/include/uapi/linux/ila.h
index 8353c78a7781..483b77af4eb8 100644
--- a/include/uapi/linux/ila.h
+++ b/include/uapi/linux/ila.h
@@ -18,6 +18,7 @@ enum {
ILA_ATTR_PAD,
ILA_ATTR_CSUM_MODE, /* u8 */
ILA_ATTR_IDENT_TYPE,/* u8 */
+   ILA_ATTR_HOOK_TYPE, /* u8 */
 
__ILA_ATTR_MAX,
 };
@@ -57,4 +58,10 @@ enum {
 
ILA_ATYPE_USE_FORMAT = 32, /* Get type from type field in identifier */
 };
+
+enum {
+   ILA_HOOK_ROUTE_OUTPUT,
+   ILA_HOOK_ROUTE_INPUT,
+};
+
 #endif /* _UAPI_LINUX_ILA_H */
diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c
index 4b97d573f223..3d56a2fb6f86 100644
--- a/net/ipv6/ila/ila_lwt.c
+++ b/net/ipv6/ila/ila_lwt.c
@@ -20,6 +20,7 @@ struct ila_lwt {
struct ila_params p;
struct dst_cache dst_cache;
u32 connected : 1;
+   u32 lwt_output : 1;
 };
 
 static inline struct ila_lwt *ila_lwt_lwtunnel(
@@ -45,8 +46,10 @@ static int ila_output(struct net *net, struct sock *sk, struct sk_buff *skb)
if (skb->protocol != htons(ETH_P_IPV6))
goto drop;
 
-   ila_update_ipv6_locator(skb, ila_params_lwtunnel(orig_dst->lwtstate),
-   true);
+   if (ilwt->lwt_output)
+   ila_update_ipv6_locator(skb,
+   ila_params_lwtunnel(orig_dst->lwtstate),
+   true);
 
if (rt->rt6i_flags & (RTF_GATEWAY | RTF_CACHE)) {
/* Already have a next hop address in route, no need for
@@ -98,11 +101,15 @@ static int ila_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 static int ila_input(struct sk_buff *skb)
 {
struct dst_entry *dst = skb_dst(skb);
+   struct ila_lwt *ilwt = ila_lwt_lwtunnel(dst->lwtstate);
 
if (skb->protocol != htons(ETH_P_IPV6))
goto drop;
 
-   ila_update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate), false);
+   if (!ilwt->lwt_output)
+   ila_update_ipv6_locator(skb,
+   ila_params_lwtunnel(dst->lwtstate),
+   false);
 
return dst->lwtstate->orig_input(skb);
 
@@ -115,6 +122,7 @@ static const struct nla_policy ila_nl_policy[ILA_ATTR_MAX + 1] = {
[ILA_ATTR_LOCATOR] = { .type = NLA_U64, },
[ILA_ATTR_CSUM_MODE] = { .type = NLA_U8, },
[ILA_ATTR_IDENT_TYPE] = { .type = NLA_U8, },
+   [ILA_ATTR_HOOK_TYPE] = { .type = NLA_U8, },
 };
 
 static int ila_build_state(struct nlattr *nla,
@@ -129,7 +137,9 @@ static int ila_build_state(struct nlattr *nla,
const struct fib6_config *cfg6 = cfg;
struct ila_addr *iaddr;
u8 ident_type = ILA_ATYPE_USE_FORMAT;
+   u8 hook_type = ILA_HOOK_ROUTE_OUTPUT;
u8 csum_mode = ILA_CSUM_NO_ACTION;
+   bool lwt_output = true;
u8 eff_ident_type;
int ret;
 
@@ -180,6 +190,20 @@ static int ila_build_state(struct nlattr *nla,
return -EINVAL;
}
 
+   if (tb[ILA_ATTR_HOOK_TYPE])
+   hook_type = nla_get_u8(tb[ILA_ATTR_HOOK_TYPE]);
+
+   switch (hook_type) {
+   case ILA_HOOK_ROUTE_OUTPUT:
+   lwt_output = true;
+   break;
+   case ILA_HOOK_ROUTE_INPUT:
+   lwt_output = false;
+   break;
+   default:
+   return -EINVAL;
+   }
+
if (tb[ILA_ATTR_CSUM_MODE])
csum_mode = nla_get_u8(tb[ILA_ATTR_CSUM_MODE]);
 
@@ -202,6 +226,8 @@ static int ila_build_state(struct nlattr *nla,
return ret;
}
 
+   ilwt->lwt_output = !!lwt_output;
+
p = ila_params_lwtunnel(newts);
 
p->csum_mode = csum_mode;
@@ -236,6 +262,7 @@ static int ila_fill_encap_info(struct sk_buff *skb,
   struct lwtunnel_state *lwtstate)
 {
struct ila_params *p = ila_params_lwtunnel(lwtstate);
+   struct ila_lwt *ilwt = ila_lwt_lwtunnel(lwtstate);
 
if (nla_put_u64_64bit(skb, ILA_ATTR_LOCATOR, (__force 
u64)p->locator.v64,
  ILA_ATTR_PAD))
@@ -247,6 +274,11 @@ static int ila_fill_encap_info(struct sk_buff *skb,
if (nla_put_u8(skb, ILA_ATTR_IDENT_TYPE, (__force u8)p->ident_type))
goto 

[PATCH v2 net-next 3/5] ila: allow configuration of identifier type

2017-11-05 Thread Tom Herbert
Allow the identifier type to be explicitly configured for a mapping.
This can either be one of the identifier types specified in the
ILA draft or a value of ILA_ATYPE_USE_FORMAT which means the
identifier type is inferred from the identifier type field.
If a value other than ILA_ATYPE_USE_FORMAT is set for a
mapping then it is assumed that the identifier type field is
not present in an identifier.

Signed-off-by: Tom Herbert 
---
 include/uapi/linux/ila.h | 13 
 net/ipv6/ila/ila.h   | 12 +---
 net/ipv6/ila/ila_lwt.c   | 51 +---
 net/ipv6/ila/ila_xlat.c  | 18 -
 4 files changed, 71 insertions(+), 23 deletions(-)

diff --git a/include/uapi/linux/ila.h b/include/uapi/linux/ila.h
index 0744881dcef3..8353c78a7781 100644
--- a/include/uapi/linux/ila.h
+++ b/include/uapi/linux/ila.h
@@ -17,6 +17,7 @@ enum {
ILA_ATTR_DIR,   /* u32 */
ILA_ATTR_PAD,
ILA_ATTR_CSUM_MODE, /* u8 */
+   ILA_ATTR_IDENT_TYPE,/* u8 */
 
__ILA_ATTR_MAX,
 };
@@ -44,4 +45,16 @@ enum {
ILA_CSUM_NEUTRAL_MAP_AUTO,
 };
 
+enum {
+   ILA_ATYPE_IID = 0,
+   ILA_ATYPE_LUID,
+   ILA_ATYPE_VIRT_V4,
+   ILA_ATYPE_VIRT_UNI_V6,
+   ILA_ATYPE_VIRT_MULTI_V6,
+   ILA_ATYPE_NONLOCAL_ADDR,
+   ILA_ATYPE_RSVD_1,
+   ILA_ATYPE_RSVD_2,
+
+   ILA_ATYPE_USE_FORMAT = 32, /* Get type from type field in identifier */
+};
 #endif /* _UAPI_LINUX_ILA_H */
diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h
index e0170f62bc39..3c7a11b62334 100644
--- a/net/ipv6/ila/ila.h
+++ b/net/ipv6/ila/ila.h
@@ -55,17 +55,6 @@ struct ila_identifier {
};
 };
 
-enum {
-   ILA_ATYPE_IID = 0,
-   ILA_ATYPE_LUID,
-   ILA_ATYPE_VIRT_V4,
-   ILA_ATYPE_VIRT_UNI_V6,
-   ILA_ATYPE_VIRT_MULTI_V6,
-   ILA_ATYPE_RSVD_1,
-   ILA_ATYPE_RSVD_2,
-   ILA_ATYPE_RSVD_3,
-};
-
 #define CSUM_NEUTRAL_FLAG  htonl(0x10000000)
 
 struct ila_addr {
@@ -93,6 +82,7 @@ struct ila_params {
struct ila_locator locator_match;
__wsum csum_diff;
u8 csum_mode;
+   u8 ident_type;
 };
 
 static inline __wsum compute_csum_diff8(const __be32 *from, const __be32 *to)
diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c
index 104af07d83a6..4b97d573f223 100644
--- a/net/ipv6/ila/ila_lwt.c
+++ b/net/ipv6/ila/ila_lwt.c
@@ -114,6 +114,7 @@ static int ila_input(struct sk_buff *skb)
 static const struct nla_policy ila_nl_policy[ILA_ATTR_MAX + 1] = {
[ILA_ATTR_LOCATOR] = { .type = NLA_U64, },
[ILA_ATTR_CSUM_MODE] = { .type = NLA_U8, },
+   [ILA_ATTR_IDENT_TYPE] = { .type = NLA_U8, },
 };
 
 static int ila_build_state(struct nlattr *nla,
@@ -127,19 +128,14 @@ static int ila_build_state(struct nlattr *nla,
struct lwtunnel_state *newts;
const struct fib6_config *cfg6 = cfg;
struct ila_addr *iaddr;
+   u8 ident_type = ILA_ATYPE_USE_FORMAT;
u8 csum_mode = ILA_CSUM_NO_ACTION;
+   u8 eff_ident_type;
int ret;
 
if (family != AF_INET6)
return -EINVAL;
 
-   if (cfg6->fc_dst_len < 8 * sizeof(struct ila_locator) + 3) {
-   /* Need to have full locator and at least type field
-* included in destination
-*/
-   return -EINVAL;
-   }
-
ret = nla_parse_nested(tb, ILA_ATTR_MAX, nla, ila_nl_policy, extack);
if (ret < 0)
return ret;
@@ -149,6 +145,41 @@ static int ila_build_state(struct nlattr *nla,
 
iaddr = (struct ila_addr *)&cfg6->fc_dst;
 
+   if (tb[ILA_ATTR_IDENT_TYPE])
+   ident_type = nla_get_u8(tb[ILA_ATTR_IDENT_TYPE]);
+
+   if (ident_type == ILA_ATYPE_USE_FORMAT) {
+   /* Infer identifier type from type field in formatted
+* identifier.
+*/
+
+   if (cfg6->fc_dst_len < 8 * sizeof(struct ila_locator) + 3) {
+   /* Need to have full locator and at least type field
+* included in destination
+*/
+   return -EINVAL;
+   }
+
+   eff_ident_type = iaddr->ident.type;
+   } else {
+   eff_ident_type = ident_type;
+   }
+
+   switch (eff_ident_type) {
+   case ILA_ATYPE_IID:
+   /* Don't allow ILA for IID type */
+   return -EINVAL;
+   case ILA_ATYPE_LUID:
+   break;
+   case ILA_ATYPE_VIRT_V4:
+   case ILA_ATYPE_VIRT_UNI_V6:
+   case ILA_ATYPE_VIRT_MULTI_V6:
+   case ILA_ATYPE_NONLOCAL_ADDR:
+   /* These ILA formats are not supported yet. */
+   default:
+   return -EINVAL;
+   }
+
if (tb[ILA_ATTR_CSUM_MODE])
csum_mode = nla_get_u8(tb[ILA_ATTR_CSUM_MODE]);
 
@@ -174,6 +205,7 @@ static int ila_build_state(struct nlattr 

[PATCH v2 net-next 1/5] ila: cleanup checksum diff

2017-11-05 Thread Tom Herbert
Consolidate computing checksum diff into one function.

Add get_csum_diff_iaddr that computes the checksum diff between
an address argument and locator being written. get_csum_diff
calls this using the destination address in the IP header as
the argument.

Also moved ila_init_saved_csum to be close to the checksum
diff functions.

Signed-off-by: Tom Herbert 
---
 net/ipv6/ila/ila_common.c | 39 ++-
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
index aba0998ddbfb..f1d9248d8b86 100644
--- a/net/ipv6/ila/ila_common.c
+++ b/net/ipv6/ila/ila_common.c
@@ -13,15 +13,28 @@
 #include 
 #include "ila.h"
 
-static __wsum get_csum_diff(struct ipv6hdr *ip6h, struct ila_params *p)
+void ila_init_saved_csum(struct ila_params *p)
 {
-   struct ila_addr *iaddr = ila_a2i(&ip6h->daddr);
+   if (!p->locator_match.v64)
+   return;
+
+   p->csum_diff = compute_csum_diff8(
+   (__be32 *)&p->locator,
+   (__be32 *)&p->locator_match);
+}
 
+static __wsum get_csum_diff_iaddr(struct ila_addr *iaddr, struct ila_params *p)
+{
if (p->locator_match.v64)
return p->csum_diff;
else
-   return compute_csum_diff8((__be32 *)&iaddr->loc,
- (__be32 *)&p->locator);
+   return compute_csum_diff8((__be32 *)&p->locator,
+ (__be32 *)&iaddr->loc);
+}
+
+static __wsum get_csum_diff(struct ipv6hdr *ip6h, struct ila_params *p)
+{
+   return get_csum_diff_iaddr(ila_a2i(&ip6h->daddr), p);
 }
 
 static void ila_csum_do_neutral(struct ila_addr *iaddr,
@@ -30,13 +43,7 @@ static void ila_csum_do_neutral(struct ila_addr *iaddr,
__sum16 *adjust = (__force __sum16 *)&iaddr->ident.v16[3];
__wsum diff, fval;
 
-   /* Check if checksum adjust value has been cached */
-   if (p->locator_match.v64) {
-   diff = p->csum_diff;
-   } else {
-   diff = compute_csum_diff8((__be32 *)&p->locator,
- (__be32 *)iaddr);
-   }
+   diff = get_csum_diff_iaddr(iaddr, p);
 
fval = (__force __wsum)(ila_csum_neutral_set(iaddr->ident) ?
CSUM_NEUTRAL_FLAG : ~CSUM_NEUTRAL_FLAG);
@@ -134,16 +141,6 @@ void ila_update_ipv6_locator(struct sk_buff *skb, struct ila_params *p,
iaddr->loc = p->locator;
 }
 
-void ila_init_saved_csum(struct ila_params *p)
-{
-   if (!p->locator_match.v64)
-   return;
-
-   p->csum_diff = compute_csum_diff8(
-   (__be32 *)&p->locator,
-   (__be32 *)&p->locator_match);
-}
-
 static int __init ila_init(void)
 {
int ret;
-- 
2.11.0



[PATCH v2 net-next 0/5] ila: make identifier format optional and other fixes

2017-11-05 Thread Tom Herbert
The identifier type and checksum neutral mapping bits are optional
in identifier formats. This patch set fixes the implementation to
make them optional and configurable.

Specific items:

  - Clean up checksum diff code in ILA
  - Add checksum neutral mapping auto so that checksum neutral
mapping can be configured without requiring use of the C-bit
  - Add identifier type configuration so that the identifier
    type field does not need to be present
  - Add ILA documentation: ila.txt

I have fixes for ILA in iproute2 that will be posted separately.

Tested: Ran netperf TCP_RR on various combinations of checksum
mode and the two supported identifier types.

v2:
  - Add proper sign off
  - In ILA LWT, only check prefix length includes identifier type
if identifier type is enabled (ILA_ATYPE_USE_FORMAT).
  - Add a hook type so that it can be specified whether ILA
    translation is done in the input or output route function in
    LWT.

Tom Herbert (5):
  ila: cleanup checksum diff
  ila: add checksum neutral map auto
  ila: allow configuration of identifier type
  ila: Add a hook type for LWT routes
  ila: Add ila.txt

 Documentation/networking/ila.txt | 286 +++
 include/uapi/linux/ila.h |  21 +++
 net/ipv6/ila/ila.h   |  12 +-
 net/ipv6/ila/ila_common.c| 104 +++---
 net/ipv6/ila/ila_lwt.c   | 111 ---
 net/ipv6/ila/ila_xlat.c  |  26 ++--
 6 files changed, 474 insertions(+), 86 deletions(-)
 create mode 100644 Documentation/networking/ila.txt

-- 
2.11.0
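
The checksum-neutral mapping that this series makes configurable rests on one's-complement arithmetic: when the locator half of an address is overwritten, the difference can be folded into one 16-bit word of the identifier so that the sum over the whole address, and therefore any transport checksum covering it, stays the same. The sketch below shows only that principle on abstract 16-bit words. The function names are hypothetical, the C-bit is not modelled, and the code is not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One's-complement 16-bit addition with end-around carry. */
static uint16_t ones_add(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;

    return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* One's-complement sum over n 16-bit words. */
static uint16_t ones_sum(const uint16_t *w, size_t n)
{
    uint16_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum = ones_add(sum, w[i]);
    return sum;
}

/*
 * Overwrite the locator (words 0..3) of a 128-bit address and fold the
 * difference into the last identifier word, so that the one's-complement
 * sum of the whole address is unchanged.
 */
static void neutral_rewrite(uint16_t addr[8], const uint16_t new_loc[4])
{
    uint16_t old_sum, new_sum, delta;

    assert(addr && new_loc);
    old_sum = ones_sum(addr, 4);
    new_sum = ones_sum(new_loc, 4);
    /* old_sum minus new_sum in one's-complement arithmetic */
    delta = ones_add(old_sum, (uint16_t)~new_sum);

    memcpy(addr, new_loc, 4 * sizeof(uint16_t));
    addr[7] = ones_add(addr[7], delta);
}
```

The cost of this approach is that the adjusted identifier word no longer holds its original value until the reverse translation is applied.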



[PATCH v2 net-next 2/5] ila: add checksum neutral map auto

2017-11-05 Thread Tom Herbert
Add checksum neutral auto that performs checksum neutral mapping
without using the C-bit. This is enabled by configuration of
a mapping.

The checksum neutral function has been split into
ila_csum_do_neutral_fmt and ila_csum_do_neutral_nofmt. The former
handles the C-bit and includes it in the adjustment value. The latter
just sets the adjustment value on the locator diff only.

Added configuration for checksum neutral map auto in ila_lwt
and ila_xlat.

Signed-off-by: Tom Herbert 
---
 include/uapi/linux/ila.h  |  1 +
 net/ipv6/ila/ila_common.c | 65 ---
 net/ipv6/ila/ila_lwt.c| 29 +++--
 net/ipv6/ila/ila_xlat.c   | 10 +---
 4 files changed, 61 insertions(+), 44 deletions(-)

diff --git a/include/uapi/linux/ila.h b/include/uapi/linux/ila.h
index f54853288f99..0744881dcef3 100644
--- a/include/uapi/linux/ila.h
+++ b/include/uapi/linux/ila.h
@@ -41,6 +41,7 @@ enum {
ILA_CSUM_ADJUST_TRANSPORT,
ILA_CSUM_NEUTRAL_MAP,
ILA_CSUM_NO_ACTION,
+   ILA_CSUM_NEUTRAL_MAP_AUTO,
 };
 
 #endif /* _UAPI_LINUX_ILA_H */
diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
index f1d9248d8b86..8c88ecf29b93 100644
--- a/net/ipv6/ila/ila_common.c
+++ b/net/ipv6/ila/ila_common.c
@@ -37,8 +37,8 @@ static __wsum get_csum_diff(struct ipv6hdr *ip6h, struct ila_params *p)
return get_csum_diff_iaddr(ila_a2i(&ip6h->daddr), p);
 }
 
-static void ila_csum_do_neutral(struct ila_addr *iaddr,
-   struct ila_params *p)
+static void ila_csum_do_neutral_fmt(struct ila_addr *iaddr,
+   struct ila_params *p)
 {
__sum16 *adjust = (__force __sum16 *)&iaddr->ident.v16[3];
__wsum diff, fval;
@@ -60,13 +60,23 @@ static void ila_csum_do_neutral(struct ila_addr *iaddr,
iaddr->ident.csum_neutral ^= 1;
 }
 
-static void ila_csum_adjust_transport(struct sk_buff *skb,
+static void ila_csum_do_neutral_nofmt(struct ila_addr *iaddr,
  struct ila_params *p)
 {
+   __sum16 *adjust = (__force __sum16 *)&iaddr->ident.v16[3];
__wsum diff;
-   struct ipv6hdr *ip6h = ipv6_hdr(skb);
-   struct ila_addr *iaddr = ila_a2i(&ip6h->daddr);
+
+   diff = get_csum_diff_iaddr(iaddr, p);
+
+   *adjust = ~csum_fold(csum_add(diff, csum_unfold(*adjust)));
+}
+
+static void ila_csum_adjust_transport(struct sk_buff *skb,
+ struct ila_params *p)
+{
size_t nhoff = sizeof(struct ipv6hdr);
+   struct ipv6hdr *ip6h = ipv6_hdr(skb);
+   __wsum diff;
 
switch (ip6h->nexthdr) {
case NEXTHDR_TCP:
@@ -105,36 +115,39 @@ static void ila_csum_adjust_transport(struct sk_buff *skb,
}
break;
}
-
-   /* Now change destination address */
-   iaddr->loc = p->locator;
 }
 
 void ila_update_ipv6_locator(struct sk_buff *skb, struct ila_params *p,
-bool set_csum_neutral)
+bool sir2ila)
 {
struct ipv6hdr *ip6h = ipv6_hdr(skb);
struct ila_addr *iaddr = ila_a2i(&ip6h->daddr);
 
-   /* First deal with the transport checksum */
-   if (ila_csum_neutral_set(iaddr->ident)) {
-   /* C-bit is set in the locator indicating that this
-* is a locator being translated to a SIR address.
-* Perform (receiver) checksum-neutral translation.
-*/
-   if (!set_csum_neutral)
-   ila_csum_do_neutral(iaddr, p);
-   } else {
-   switch (p->csum_mode) {
-   case ILA_CSUM_ADJUST_TRANSPORT:
-   ila_csum_adjust_transport(skb, p);
-   break;
-   case ILA_CSUM_NEUTRAL_MAP:
-   ila_csum_do_neutral(iaddr, p);
-   break;
-   case ILA_CSUM_NO_ACTION:
+   switch (p->csum_mode) {
+   case ILA_CSUM_ADJUST_TRANSPORT:
+   ila_csum_adjust_transport(skb, p);
+   break;
+   case ILA_CSUM_NEUTRAL_MAP:
+   if (sir2ila) {
+   if (WARN_ON(ila_csum_neutral_set(iaddr->ident))) {
+   /* Checksum flag should never be
+* set in a formatted SIR address.
+*/
+   break;
+   }
+   } else if (!ila_csum_neutral_set(iaddr->ident)) {
+   /* ILA to SIR translation and C-bit isn't
+* set so we're good.
+*/
break;
}
+   ila_csum_do_neutral_fmt(iaddr, p);
+   break;
+   case ILA_CSUM_NEUTRAL_MAP_AUTO:
+   ila_csum_do_neutral_nofmt(iaddr, p);
+   break;
+   case ILA_CSUM_NO_ACTION:
+   break;
}
 
/* Now 

Re: [PATCH] NFC: fix device-allocation error return

2017-11-05 Thread Samuel Ortiz
Hi Johan,

On Sun, Jul 09, 2017 at 01:08:58PM +0200, Johan Hovold wrote:
> A recent change fixing NFC device allocation itself introduced an
> error-handling bug by returning an error pointer in case device-id
> allocation failed. This is clearly broken as the callers still expected
> NULL to be returned on errors as detected by Dan's static checker.
> 
> Fix this up by returning NULL in the event that we've run out of memory
> when allocating a new device id.
> 
> Note that the offending commit is marked for stable (3.8) so this fix
> needs to be backported along with it.
> 
> Fixes: 20777bc57c34 ("NFC: fix broken device allocation")
> Cc: stable    # 3.8
> Reported-by: Dan Carpenter 
> Signed-off-by: Johan Hovold 
> ---
>  net/nfc/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
Applied, thanks for the fix.

Cheers,
Samuel.


Re: [PATCH 3/4] RFC: net: dsa: Add bindings for Realtek SMI DSAs

2017-11-05 Thread Andrew Lunn
> This interrupt construction is similar to how we handle
> interrupt controllers inside PCI bridges etc.

Hi Linus

Your interrupt handling is going in the right direction, but needs
further work. The PHY interrupt is a phy property, so should be in the
PHY node in device tree.

The Marvell driver gives an example of this, and
vf610-zii-dev-rev-c.dts is an example DT blob you can look at.

> + ports {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + reg = <0>;
> + port@0 {
> + reg = <0>;
> + label = "lan0";

So here, you should have a

phy-handle = <&phy0>;

linking this MAC to the PHY connected to it.


> + };

And then an MDIO bus, listing the PHYs

   mdio {
#address-cells = <1>;
#size-cells = <0>;

phy0: phy@0 {
reg = <0>;
interrupt-parent = <&switch_intc>;
interrupts = <0>;
};

It is here you list the interrupts. And the PHY subsystem will link
the interrupt to the PHY when it enumerates the MDIO bus.

You have most of the code already for implementing the MDIO bus. The
rest you can probably borrow from the mv88e6xxx driver.

 Andrew


[PATCH 1/4] RFC: net/dsa: Allow DSA PHYs to define link IRQs

2017-11-05 Thread Linus Walleij
PHYs attached to DSAs may provide IRQs from GPIOs or other
interrupt controllers in the device tree. For these cases,
we need to go and grab the IRQ before registering the slave
so the PHY core can grab and enable this IRQ.

Cc: Antti Seppälä 
Cc: Roman Yeryomin 
Cc: Colin Leitner 
Cc: Gabor Juhos 
Cc: Florian Fainelli 
Signed-off-by: Linus Walleij 
---
 net/dsa/slave.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 2afa99506f8b..9909d7fe80b1 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1119,6 +1120,13 @@ static int dsa_slave_phy_connect(struct dsa_slave_priv 
*p,
return -ENODEV;
}
 
+   /*
+* If the PHY has a link IRQ associated with it in the device tree,
+* then assign it so it can be claimed by the core.
+*/
+   if (of_irq_count(p->dp->dn))
+   p->phy->irq = irq_of_parse_and_map(p->dp->dn, 0);
+
/* Use already configured phy mode */
if (p->phy_interface == PHY_INTERFACE_MODE_NA)
p->phy_interface = p->phy->interface;
-- 
2.13.6



[PATCH 4/4] RFC: net: dsa: realtek-smi: Add Realtek SMI driver

2017-11-05 Thread Linus Walleij
This adds a driver core for the Realtek SMI chips and a subdriver
for the RTL8366RB. I just added this chip simply because it is
all I can test.

The code is a massaged variant of the code that has been sitting
out-of-tree in OpenWRT for years in the absence of a proper switch
subsystem. I have tried to credit the original authors wherever
possible.

The main changes I've done from the OpenWRT code:
- Added a callback to set the MAC address.
- Added an IRQ chip inside the RTL8366RB switch to demux and
  handle the line state IRQs.
- Distributed the phy handling out to the PHY driver.
- Added some RTL8366RB code that was missing in the driver,
  such as setting up "green ethernet" with a funny jam table
  and forcing MAC5 (the CPU port) into 1 GBit.

Cc: Antti Seppälä 
Cc: Roman Yeryomin 
Cc: Colin Leitner 
Cc: Gabor Juhos 
Cc: Florian Fainelli 
Signed-off-by: Linus Walleij 
---
 drivers/net/dsa/Kconfig   |   12 +
 drivers/net/dsa/Makefile  |2 +
 drivers/net/dsa/realtek-smi.c |  436 
 drivers/net/dsa/realtek-smi.h |  145 ++
 drivers/net/dsa/rtl8366.c |  493 ++
 drivers/net/dsa/rtl8366rb.c   | 1103 +
 6 files changed, 2191 insertions(+)
 create mode 100644 drivers/net/dsa/realtek-smi.c
 create mode 100644 drivers/net/dsa/realtek-smi.h
 create mode 100644 drivers/net/dsa/rtl8366.c
 create mode 100644 drivers/net/dsa/rtl8366rb.c

diff --git a/drivers/net/dsa/Kconfig b/drivers/net/dsa/Kconfig
index 83a9bc892a3b..d25fa9a35ad3 100644
--- a/drivers/net/dsa/Kconfig
+++ b/drivers/net/dsa/Kconfig
@@ -52,6 +52,18 @@ config NET_DSA_QCA8K
  This enables support for the Qualcomm Atheros QCA8K Ethernet
  switch chips.
 
+config NET_DSA_RTK_SMI
+   tristate "Realtek SMI Ethernet switch family support"
+   depends on NET_DSA
+   # FIXME: select NET_DSA_TAG_RTK
+   select FIXED_PHY
+   select IRQ_DOMAIN
+   select REALTEK_PHY
+   select NET_DSA_TAG_TRAILER
+   ---help---
+ This enables support for the Realtek SMI-based switch
+ chips, currently only RTL8366RB.
+
 config NET_DSA_SMSC_LAN9303
tristate
select NET_DSA_TAG_LAN9303
diff --git a/drivers/net/dsa/Makefile b/drivers/net/dsa/Makefile
index 4a5b5bd297ee..f660096cbad1 100644
--- a/drivers/net/dsa/Makefile
+++ b/drivers/net/dsa/Makefile
@@ -4,6 +4,8 @@ obj-$(CONFIG_NET_DSA_LOOP)  += dsa_loop.o dsa_loop_bdinfo.o
 obj-$(CONFIG_NET_DSA_MT7530)   += mt7530.o
 obj-$(CONFIG_NET_DSA_MV88E6060) += mv88e6060.o
 obj-$(CONFIG_NET_DSA_QCA8K)+= qca8k.o
+obj-$(CONFIG_NET_DSA_RTK_SMI)  += realtek.o
+realtek-objs   := realtek-smi.o rtl8366.o rtl8366rb.o
 obj-$(CONFIG_NET_DSA_SMSC_LAN9303) += lan9303-core.o
 obj-$(CONFIG_NET_DSA_SMSC_LAN9303_I2C) += lan9303_i2c.o
 obj-$(CONFIG_NET_DSA_SMSC_LAN9303_MDIO) += lan9303_mdio.o
diff --git a/drivers/net/dsa/realtek-smi.c b/drivers/net/dsa/realtek-smi.c
new file mode 100644
index ..de37b9a776fa
--- /dev/null
+++ b/drivers/net/dsa/realtek-smi.c
@@ -0,0 +1,436 @@
+/*
+ * Realtek Simple Management Interface (SMI) driver
+ * It can be discussed how "simple" this interface is.
+ *
+ * The SMI protocol piggy-backs on the MDIO MDC and MDIO signal levels
+ * but the protocol is not MDIO at all. Instead it is a Realtek
+ * peculiarity that needs to bit-bang the lines in a special way to
+ * communicate with the switch.
+ *
+ * ASICs we intend to support with this driver:
+ *
+ * RTL8366   - The original version, apparently
+ * RTL8369   - Similar enough to have the same datasheet as RTL8366
+ * RTL8366RB - Probably reads out "RTL8366 revision B", has a quite
+ * different register layout from the other two
+ * RTL8366S  - Is this "RTL8366 super"?
+ * RTL8367   - Has an OpenWRT driver as well
+ * RTL8368S  - Seems to be an alternative name for RTL8366RB
+ * RTL8370   - Also uses SMI
+ *
+ * Copyright (C) 2017 Linus Walleij 
+ * Copyright (C) 2010 Antti Seppälä 
+ * Copyright (C) 2010 Roman Yeryomin 
+ * Copyright (C) 2011 Colin Leitner 
+ * Copyright (C) 2009-2010 Gabor Juhos 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "realtek-smi.h"
+
+#define REALTEK_SMI_ACK_RETRY_COUNT5
+#define REALTEK_SMI_HW_STOP_DELAY  25  /* msecs */
+#define REALTEK_SMI_HW_START_DELAY 100 /* msecs */
+
+static inline void realtek_smi_clk_delay(struct realtek_smi *smi)
+{
+ 

[PATCH 3/4] RFC: net: dsa: Add bindings for Realtek SMI DSAs

2017-11-05 Thread Linus Walleij
The Realtek SMI family is a set of DSA chips that provide
switching in routers. This binding just follows the pattern
set by other switches but with the introduction of an embedded
irqchip to demux and handle the interrupts fired by the single
line from the chip.

This interrupt construction is similar to how we handle
interrupt controllers inside PCI bridges etc.

Cc: Antti Seppälä 
Cc: Roman Yeryomin 
Cc: Colin Leitner 
Cc: Gabor Juhos 
Cc: Florian Fainelli 
Cc: devicet...@vger.kernel.org
Signed-off-by: Linus Walleij 
---
 .../devicetree/bindings/net/dsa/realtek-smi.txt| 104 +
 1 file changed, 104 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/dsa/realtek-smi.txt

diff --git a/Documentation/devicetree/bindings/net/dsa/realtek-smi.txt 
b/Documentation/devicetree/bindings/net/dsa/realtek-smi.txt
new file mode 100644
index ..95e96d49c0be
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/dsa/realtek-smi.txt
@@ -0,0 +1,104 @@
+Realtek SMI-based Switches
+==
+
+The SMI "Simple Management Interface" is a two-wire protocol using
+bit-banged GPIO. It reuses the MDIO lines MDC and MDIO but does
+not use the MDIO protocol. This binding defines how to specify the
+SMI-based Realtek devices.
+
+Required properties:
+
+- compatible: must be exactly one of:
+  "realtek,rtl8366"
+  "realtek,rtl8369"
+  "realtek,rtl8366rb"
+  "realtek,rtl8366s"
+  "realtek,rtl8367"
+  "realtek,rtl8367b"
+
+Required subnode:
+
+- interrupt-controller
+
+  This defines an interrupt controller with an IRQ line (typically
+  a GPIO) that will demultiplex and handle the interrupt from the single
+  interrupt line coming out of one of the SMI-based chips. It most
+  importantly provides link up/down interrupts to the PHY blocks inside
+  the ASIC.
+
+Required properties of interrupt-controller:
+
+- interrupts: parent interrupt, see interrupt-controller/interrupts.txt
+- interrupt-controller: see interrupt-controller/interrupts.txt
+- #address-cells: should be <0>
+- #interrupt-cells: should be <1>
+
+See net/dsa/dsa.txt for a list of additional required and optional properties
+and subnodes.
+
+
+Examples:
+
+switch {
+   compatible = "realtek,rtl8366rb";
+   reg = <0>;
+   /* 22 = MDIO (has input reads), 21 = MDC (clock, output only) */
+   mdc-gpios = < 21 GPIO_ACTIVE_HIGH>;
+   mdio-gpios = < 22 GPIO_ACTIVE_HIGH>;
+   reset-gpios = < 14 GPIO_ACTIVE_LOW>;
+
+   switch_intc: interrupt-controller {
+   /* GPIO 15 provides the interrupt */
+   interrupt-parent = <>;
+   interrupts = <15 IRQ_TYPE_LEVEL_LOW>;
+   interrupt-controller;
+   #address-cells = <0>;
+   #interrupt-cells = <1>;
+   };
+
+   ports {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0>;
+   port@0 {
+   reg = <0>;
+   label = "lan0";
+   interrupt-parent = <&switch_intc>;
+   interrupts = <0>;
+   };
+   port@1 {
+   reg = <1>;
+   label = "lan1";
+   interrupt-parent = <&switch_intc>;
+   interrupts = <1>;
+   };
+   port@2 {
+   reg = <2>;
+   label = "lan2";
+   interrupt-parent = <&switch_intc>;
+   interrupts = <2>;
+   };
+   port@3 {
+   reg = <3>;
+   label = "lan3";
+   interrupt-parent = <&switch_intc>;
+   interrupts = <3>;
+   };
+   port@4 {
+   reg = <4>;
+   label = "wan";
+   interrupt-parent = <&switch_intc>;
+   interrupts = <4>;
+   };
+   phy0: port@5 {
+   reg = <5>;
+   label = "cpu";
+   ethernet = <>;
+   phy-mode = "rgmii";
+   fixed-link {
+   speed = <1000>;
+   full-duplex;
+   };
+   };
+   };
+};
-- 
2.13.6



[PATCH 2/4] RFC: net: phy: realtek: Support RTL8366RB variant

2017-11-05 Thread Linus Walleij
The RTL8366RB is an ASIC with five internal PHYs for
LAN0..LAN3 and WAN. The PHYs are spawned off the main
device so they can be handled in a distributed manner
by the Realtek PHY driver. All that is really needed
is the power save feature enablement and letting the
PHY driver core pick up the IRQ from the switch chip.

Cc: Antti Seppälä 
Cc: Roman Yeryomin 
Cc: Colin Leitner 
Cc: Gabor Juhos 
Cc: Florian Fainelli 
Signed-off-by: Linus Walleij 
---
 drivers/net/phy/realtek.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index 9cbe645e3d89..2fb2eb7a32be 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -29,6 +29,9 @@
 #define RTL8211F_PAGE_SELECT   0x1f
 #define RTL8211F_TX_DELAY  0x100
 
+#define RTL8366RB_POWER_SAVE   0x21
+#define RTL8366RB_POWER_SAVE_ON 0x1000
+
 MODULE_DESCRIPTION("Realtek PHY driver");
 MODULE_AUTHOR("Johnson Leung");
 MODULE_LICENSE("GPL");
@@ -119,6 +122,22 @@ static int rtl8211f_config_init(struct phy_device *phydev)
return 0;
 }
 
+static int rtl8366rb_config_init(struct phy_device *phydev)
+{
+   int ret;
+   u16 reg;
+
+   ret = genphy_config_init(phydev);
+   if (ret < 0)
+   return ret;
+
+   reg = phy_read(phydev, RTL8366RB_POWER_SAVE);
+   reg |= RTL8366RB_POWER_SAVE_ON;
+   phy_write(phydev, RTL8366RB_POWER_SAVE, reg);
+
+   return 0;
+}
+
 static struct phy_driver realtek_drvs[] = {
{
.phy_id = 0x8201,
@@ -175,6 +194,18 @@ static struct phy_driver realtek_drvs[] = {
.config_intr= _config_intr,
.suspend= genphy_suspend,
.resume = genphy_resume,
+   }, {
+   /* The main part of this DSA is in drivers/net/dsa */
+   .phy_id = 0x001cc961,
+   .name   = "RTL8366RB Gigabit Ethernet",
+   .phy_id_mask= 0x001f,
+   .features   = PHY_GBIT_FEATURES,
+   .flags  = PHY_HAS_INTERRUPT,
+   .config_aneg= _config_aneg,
+   .config_init= &rtl8366rb_config_init,
+   .read_status= _read_status,
+   .suspend= genphy_suspend,
+   .resume = genphy_resume,
},
 };
 
@@ -185,6 +216,7 @@ static struct mdio_device_id __maybe_unused realtek_tbl[] = 
{
{ 0x001cc914, 0x001f },
{ 0x001cc915, 0x001f },
{ 0x001cc916, 0x001f },
+   { 0x001cc961, 0x001f },
{ }
 };
 
-- 
2.13.6



[PATCH 0/4] RFC: Realtek 83xx SMI driver core

2017-11-05 Thread Linus Walleij
Hi folks,

I'm working a bit on this. Since DSA is big, complex and hard for
a novice I just wanted to throw what I have in my tree out there
so you can take a look at how I hacked this up and give me some
help how to continue.

I am running it for trials on the D-Link DIR-685 and it looks
fun, but my ethernet driver for Gemini is not yet working so
I cannot really do proper testing. I'll get there I guess.

Example from dmesg:
realtek-smi 0.switch: deasserted RESET
realtek-smi 0.switch: found an RTL8366RB switch
DSA: switch 0 0 parsed
DSA: tree 0 parsed
realtek-smi 0.switch: RTL5937 ver 3 chip found
realtek-smi 0.switch: active low/falling IRQ
realtek-smi 0.switch: set MAC: CE:32:3B:FB:58:13
libphy: dsa slave smi: probed
RTL8366RB Gigabit Ethernet dsa-0.0:00: attached PHY driver [RTL8366RB Gigabit 
Ethernet] (mii_bus:phy_addr=dsa-0.0:00, irq=37)
RTL8366RB Gigabit Ethernet dsa-0.0:01: attached PHY driver [RTL8366RB Gigabit 
Ethernet] (mii_bus:phy_addr=dsa-0.0:01, irq=38)
RTL8366RB Gigabit Ethernet dsa-0.0:02: attached PHY driver [RTL8366RB Gigabit 
Ethernet] (mii_bus:phy_addr=dsa-0.0:02, irq=39)
RTL8366RB Gigabit Ethernet dsa-0.0:03: attached PHY driver [RTL8366RB Gigabit 
Ethernet] (mii_bus:phy_addr=dsa-0.0:03, irq=40)
RTL8366RB Gigabit Ethernet dsa-0.0:04: attached PHY driver [RTL8366RB Gigabit 
Ethernet] (mii_bus:phy_addr=dsa-0.0:04, irq=41)
realtek-smi 0.switch: adjust link on CPU port
gmac-gemini 6000.ethernet eth0: connected to PHY "fixed-0:00"
Generic PHY fixed-0:00: attached PHY driver [Generic PHY] 
(mii_bus:phy_addr=fixed-0:00, irq=POLL)
phy_id=0x, phy_mode=rgmii
gmac-gemini 6000.ethernet: set GMAC0 and GMAC1 to MII/RGMII mode
gmac-gemini 6000.ethernet eth0: connect to RGMII
gmac-gemini 6000.ethernet eth0: opened
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
realtek-smi 0.switch: enable port 0
IPv6: ADDRCONF(NETDEV_UP): lan0: link is not ready
realtek-smi 0.switch lan0: Link is Down
realtek-smi 0.switch lan0: Link is Up - 1Gbps/Full - flow control rx/tx
IPv6: ADDRCONF(NETDEV_CHANGE): lan0: link becomes ready

cat /proc/interrupts
 36:  6  FTGPIO010  15 Level RTL8366RB
 37:  3  RTL8366RB   0 Edge  dsa-0.0:00
 38:  0  RTL8366RB   1 Edge  dsa-0.0:01
 39:  2  RTL8366RB   2 Edge  dsa-0.0:02
 40:  2  RTL8366RB   3 Edge  dsa-0.0:03
 41:  0  RTL8366RB   4 Edge  dsa-0.0:04

Plugged some cables in/out. Hooray, no polling needed.

Linus Walleij (4):
  RFC: net/dsa: Allow DSA PHYs to define link IRQs
  RFC: net: phy: realtek: Support RTL8366RB variant
  RFC: net: dsa: Add bindings for Realtek SMI DSAs
  RFC: net: dsa: realtek-smi: Add Realtek SMI driver

 .../devicetree/bindings/net/dsa/realtek-smi.txt|  104 ++
 drivers/net/dsa/Kconfig|   12 +
 drivers/net/dsa/Makefile   |2 +
 drivers/net/dsa/realtek-smi.c  |  436 
 drivers/net/dsa/realtek-smi.h  |  145 +++
 drivers/net/dsa/rtl8366.c  |  493 +
 drivers/net/dsa/rtl8366rb.c| 1103 
 drivers/net/phy/realtek.c  |   32 +
 net/dsa/slave.c|8 +
 9 files changed, 2335 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/dsa/realtek-smi.txt
 create mode 100644 drivers/net/dsa/realtek-smi.c
 create mode 100644 drivers/net/dsa/realtek-smi.h
 create mode 100644 drivers/net/dsa/rtl8366.c
 create mode 100644 drivers/net/dsa/rtl8366rb.c

-- 
2.13.6



Re: [PATCH net] bonding: discard lowest hash bit for 802.3ad layer3+4

2017-11-05 Thread Eric Dumazet
On Sun, Nov 5, 2017 at 6:56 AM, Hangbin Liu  wrote:
> After commit 07f4c90062f8 ("tcp/dccp: try to not exhaust ip_local_port_range
> in connect()"), we will try to use even ports for connect(). Then If an
> application (seen clearly with iperf) opens multiple streams to the same
> destination IP and port, each stream will be given an even source port.
>
> So the bonding driver's simple xmit_hash_policy based on layer3+4 addressing
> will always hash all these streams to the same interface. And the total
> throughput will limited to a single slave.
>
> Change the tcp code will impact the whole tcp behavior, only for bonding
> usage. Paolo Abeni suggested fix this by changing the bonding code only,
> which should be more reasonable, and less impact.
>
> Fix this by discarding the lowest hash bit because it contains little entropy.
> After the fix we can re-balance between slaves.
>
> Signed-off-by: Paolo Abeni 
> Signed-off-by: Hangbin Liu 
> ---
>  drivers/net/bonding/bond_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index c99dc59..728fa08 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -3237,7 +3237,7 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff 
> *skb)
>
> if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
> skb->l4_hash)
> -   return skb->hash;
> +   return skb->hash >> 1;

Why are you changing this part ?

The l4 hash provided by local TCP stack does not use a pathological
XOR based on ports/addresses,
but a random value with pretty good entropy.

No need to try to 'enhance' it by actually making it slightly worse.


>
> if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
> !bond_flow_dissect(bond, skb, ))
> @@ -3253,7 +3253,7 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff 
> *skb)
> hash ^= (hash >> 16);
> hash ^= (hash >> 8);
>
> -   return hash;
> +   return hash >> 1;
>  }
>
>  /*-- Device entry points 
> */
> --
> 2.5.5
>
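
For readers following the thread, the effect described in the commit
message can be reproduced with a small userspace model. This is only a
sketch under stated assumptions: model_l34_hash() is a simplified
XOR-and-fold hash, not the code in bond_xmit_hash(), and the port word
layout, addresses and port numbers are made up for illustration.

```c
#include <stdint.h>

/*
 * Simplified model of an XOR-and-fold layer3+4 hash. The bit layout of
 * the port word is an assumption made for illustration only.
 */
static uint32_t model_l34_hash(uint32_t saddr, uint32_t daddr,
			       uint16_t sport, uint16_t dport)
{
	uint32_t hash = ((uint32_t)sport << 16) | dport;

	hash ^= saddr ^ daddr;
	hash ^= hash >> 16;
	hash ^= hash >> 8;
	return hash;
}

/*
 * Hash eight streams with consecutive even source ports and count how
 * many of two slaves receive traffic. 'shift' is 0 for the original
 * behaviour and 1 to discard the lowest hash bit.
 */
static int model_slaves_used(unsigned int shift)
{
	const uint32_t saddr = 0xc0a80001; /* 192.168.0.1 */
	const uint32_t daddr = 0xc0a80002; /* 192.168.0.2 */
	int used[2] = { 0, 0 };
	uint16_t sport;

	for (sport = 40000; sport < 40016; sport += 2) {
		uint32_t hash = model_l34_hash(saddr, daddr, sport, 5001);

		used[(hash >> shift) % 2] = 1;
	}
	return used[0] + used[1];
}
```

In this model the lowest folded bit never changes across the even
ports, so both slaves are only reached once that bit is shifted out.
It says nothing about the skb->hash path questioned above, where the
value is already random.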


How to identify net namespace in kernel messages?

2017-11-05 Thread Vasily Averin
On 2017-11-05 15:48, David Miller wrote:
> From: Vasily Averin 
>> I doubt that pointer to freed net have value for someone except
>> developers, on the other hand it helps to speed up the problem
>> investigation.
> 
> Any kernel pointer printed has value to attackers.

David, could you please advise how to identify a net namespace in kernel messages?

In OpenVZ we got many requests from host admins who need to understand
which container triggered a message. In such cases we added our custom
Container ID, but mainline lacks one.

I expected that mainline could use the net pointer for such purposes;
nfsd does so, for example:

 NFSD: starting 90-second grace period (net 880e307fe240)

Now you recommend not using the net pointer.
Could you please suggest an alternative?


Removal of IrDA

2017-11-05 Thread Petr Cvek

Hello,

I got some time to update from v4.10 to v4.14-rc and found that IrDA is
in staging (and is scheduled to be removed).


I'm still using the nsc-ircc driver on a Thinkpad T60p and the
pxaficp_ir driver (I also have some kingsun-sir dongles and a use for
irtty-sir). Actually I'm pretty dependent on the second one (my phone
has only USB host/device and IrDA).


If the removal is inevitable, will there at least be some out-of-tree
repository?


best regards,
Petr



Re: [PATCH 06/21] nfs client: exit_net cleanup check added

2017-11-05 Thread Trond Myklebust
On Sun, 2017-11-05 at 19:48 +0300, Vasily Averin wrote:
> On 2017-11-05 19:02, Trond Myklebust wrote:
> > On Sun, 2017-11-05 at 13:00 +0300, Vasily Averin wrote:
> > > Be sure that nfs_client_list and nfs_volume_list lists
> > > initialized
> > > in net_init hook were return to initial state in net_exit hook.
> > > 
> > > Signed-off-by: Vasily Averin 
> > > ---
> > >  fs/nfs/client.c | 4 
> > >  1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> > > index 22880ef..7c0691c 100644
> > > --- a/fs/nfs/client.c
> > > +++ b/fs/nfs/client.c
> > > @@ -204,6 +204,10 @@ void nfs_cleanup_cb_ident_idr(struct net
> > > *net)
> > >   struct nfs_net *nn = net_generic(net, nfs_net_id);
> > >  
> > >   idr_destroy(&nn->cb_ident_idr);
> > > + WARN(!list_empty(&nn->nfs_client_list),
> > > +  "net %p exit: nfs_client_list is not empty\n",
> > > net);
> > > + WARN(!list_empty(&nn->nfs_volume_list),
> > > +  "net %p exit: nfs_volume_list is not empty\n",
> > > net);
> > >  }
> > >  
> > 
> > Why do we need these? Is there a specific bug that you are trying
> > to
> > track down?
> 
> I hope such checks allows to detect leaked per-netns objects.
> Also I hope that all new pernet_operations will inherit such checks
> too.
> 
> I assume that elements added into per-net lists should not live
> longer than net namespace,
> and should be deleted from the list. I think exit_net hook is good
> place for such check.
> 
> Recently I've found lost list_entry and enabled timer on stop of net
> namespace.
> Then I've reviewed all existing pernet_operations and found that many
> drivers
> have such checks already. So I decided to complete this task and add
> such checks
> into all affected subsystems.
> 

Unless there is a known problem that has specific debugging needs, this
kind of assert should take the form of a WARN_ONCE() so that it doesn't
fill user syslogs with redundant warnings if triggered.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.mykleb...@primarydata.com
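
To illustrate the difference a once-only assert makes, here is a small
userspace model. It is a sketch, not the kernel's WARN_ONCE()
implementation: MODEL_WARN_ONCE(), model_exit_net() and
model_warnings_emitted are hypothetical names introduced for this
example.

```c
#include <stdbool.h>
#include <stdio.h>

static int model_warnings_emitted;

/*
 * Userspace model of a warn-once check: each expansion site keeps its
 * own static flag, so only the first hit at that site prints.
 */
#define MODEL_WARN_ONCE(cond, msg)				\
	do {							\
		static bool warned;				\
		if ((cond) && !warned) {			\
			warned = true;				\
			model_warnings_emitted++;		\
			fprintf(stderr, "WARNING: %s\n", msg);	\
		}						\
	} while (0)

/* Hypothetical per-namespace exit hook that finds a non-empty list. */
static void model_exit_net(int entries_left)
{
	MODEL_WARN_ONCE(entries_left != 0,
			"exit: client list is not empty");
}
```

Calling model_exit_net() repeatedly with leftover entries logs a single
warning, which is the behaviour asked for here: the leak is still
reported, but the log is not flooded when many namespaces hit it.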


[net-next:master 2025/2045] net/dsa/dsa2.c:659:7: error: implicit declaration of function 'of_property_read_variable_u32_array'

2017-11-05 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
master
head:   2798b80b385384d51a81832556ee9ad25d175f9b
commit: 975e6e32215e6cbc09b65d762865b1a46e8e9103 [2025/2045] net: dsa: rework 
switch parsing
config: x86_64-randconfig-u0-11060023 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
git checkout 975e6e32215e6cbc09b65d762865b1a46e8e9103
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   net/dsa/dsa2.c: In function 'dsa_switch_parse_member_of':
>> net/dsa/dsa2.c:659:7: error: implicit declaration of function 
>> 'of_property_read_variable_u32_array' [-Werror=implicit-function-declaration]
 sz = of_property_read_variable_u32_array(dn, "dsa,member", m, 2, 2);
  ^~~
   In file included from include/linux/ioport.h:13:0,
from include/linux/device.h:16,
from net/dsa/dsa2.c:13:
   net/dsa/dsa2.c: At top level:
   include/linux/compiler.h:163:4: warning: '__f' is static but declared in 
inline function 'strcpy' which is not static
   __f = { \
   ^
   include/linux/compiler.h:155:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
   include/linux/string.h:422:2: note: in expansion of macro 'if'
 if (p_size == (size_t)-1 && q_size == (size_t)-1)
 ^~
   include/linux/compiler.h:163:4: warning: '__f' is static but declared in 
inline function 'kmemdup' which is not static
   __f = { \
   ^
   include/linux/compiler.h:155:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
   include/linux/string.h:412:2: note: in expansion of macro 'if'
 if (p_size < size)
 ^~
   include/linux/compiler.h:163:4: warning: '__f' is static but declared in 
inline function 'kmemdup' which is not static
   __f = { \
   ^
   include/linux/compiler.h:155:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
   include/linux/string.h:410:2: note: in expansion of macro 'if'
 if (__builtin_constant_p(size) && p_size < size)
 ^~
   include/linux/compiler.h:163:4: warning: '__f' is static but declared in 
inline function 'memchr_inv' which is not static
   __f = { \
   ^
   include/linux/compiler.h:155:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
   include/linux/string.h:401:2: note: in expansion of macro 'if'
 if (p_size < size)
 ^~
   include/linux/compiler.h:163:4: warning: '__f' is static but declared in 
inline function 'memchr_inv' which is not static
   __f = { \
   ^
   include/linux/compiler.h:155:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
   include/linux/string.h:399:2: note: in expansion of macro 'if'
 if (__builtin_constant_p(size) && p_size < size)
 ^~
   include/linux/compiler.h:163:4: warning: '__f' is static but declared in 
inline function 'memchr' which is not static
   __f = { \
   ^
   include/linux/compiler.h:155:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
   include/linux/string.h:390:2: note: in expansion of macro 'if'
 if (p_size < size)
 ^~
   include/linux/compiler.h:163:4: warning: '__f' is static but declared in 
inline function 'memchr' which is not static
   __f = { \
   ^
   include/linux/compiler.h:155:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
   include/linux/string.h:388:2: note: in expansion of macro 'if'
 if (__builtin_constant_p(size) && p_size < size)
 ^~
   include/linux/compiler.h:163:4: warning: '__f' is static but declared in 
inline function 'memcmp' which is not static
   __f = { \
   ^
   include/linux/compiler.h:155:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
   include/linux/string.h:380:2: note: in expansion of macro 'if'
 if (p_size < size || q_size < size)
 ^~
   include/linux/compiler.h:163:4: warning: '__f' is static but declared in 
inline function 'memcmp' which is not static
   __f = { \
   ^
   include/linux/compiler.h:155:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  

[PATCH net-next 2/3] ip6_gre: Refactor ip6gre xmit codes

2017-11-05 Thread William Tu
This patch refactors ip6gre_xmit_{ipv4, ipv6}.
It is prep work for adding the ip6erspan tunnel.

Signed-off-by: William Tu 
---
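
The encapsulation-limit handling that this patch moves into
prepare_ip6gre_xmit_ipv6() can be summarised with a small userspace
model. This is a sketch only: model_encap_limit() and its parameters
are hypothetical names, and the ICMPv6 error the real code sends is
reduced to a return value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the encapsulation-limit decision. Returns -1 when the packet
 * must be rejected (limit already exhausted), 0 otherwise. On entry
 * *encap_limit is expected to be -1, meaning "no limit".
 */
static int model_encap_limit(bool has_tlv, uint8_t tlv_limit,
			     bool ignore_limit, uint8_t tunnel_limit,
			     int *encap_limit)
{
	if (has_tlv) {
		if (tlv_limit == 0)
			return -1;	/* the patch sends an ICMPv6 error here */
		*encap_limit = tlv_limit - 1;
	} else if (!ignore_limit) {
		*encap_limit = tunnel_limit;
	}
	return 0;
}
```

A limit carried in the packet takes precedence and is decremented; the
tunnel's configured limit applies only when the packet carries none and
the ignore flag is clear.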
 net/ipv6/ip6_gre.c | 124 -
 1 file changed, 76 insertions(+), 48 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 3e10c51e7e0c..8c7612f32926 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -497,6 +497,79 @@ static int gre_handle_offloads(struct sk_buff *skb, bool 
csum)
csum ? SKB_GSO_GRE_CSUM : SKB_GSO_GRE);
 }
 
+static inline void prepare_ip6gre_xmit_ipv4(struct sk_buff *skb,
+   struct net_device *dev,
+   struct flowi6 *fl6, __u8 *dsfield,
+   int *encap_limit)
+{
+   struct ip6_tnl *t = netdev_priv(dev);
+   const struct iphdr *iph = ip_hdr(skb);
+
+   if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
+   *encap_limit = t->parms.encap_limit;
+
+   memcpy(fl6, &t->fl.u.ip6, sizeof(*fl6));
+
+   if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
+   *dsfield = ipv4_get_dsfield(iph);
+   else
+   *dsfield = ip6_tclass(t->parms.flowinfo);
+
+   if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
+   fl6->flowi6_mark = skb->mark;
+   else
+   fl6->flowi6_mark = t->parms.fwmark;
+
+   fl6->flowi6_uid = sock_net_uid(dev_net(dev), NULL);
+}
+
+static inline int prepare_ip6gre_xmit_ipv6(struct sk_buff *skb,
+  struct net_device *dev,
+  struct flowi6 *fl6, __u8 *dsfield,
+  int *encap_limit)
+{
+   struct ip6_tnl *t = netdev_priv(dev);
+   struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+   __u16 offset;
+
+   offset = ip6_tnl_parse_tlv_enc_lim(skb, skb_network_header(skb));
+   /* ip6_tnl_parse_tlv_enc_lim() might have reallocated skb->head */
+   ipv6h = ipv6_hdr(skb);
+
+   if (offset > 0) {
+   struct ipv6_tlv_tnl_enc_lim *tel;
+
+   tel = (struct ipv6_tlv_tnl_enc_lim 
*)&skb_network_header(skb)[offset];
+   if (tel->encap_limit == 0) {
+   icmpv6_send(skb, ICMPV6_PARAMPROB,
+   ICMPV6_HDR_FIELD, offset + 2);
+   return -1;
+   }
+   *encap_limit = tel->encap_limit - 1;
+   } else if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) {
+   *encap_limit = t->parms.encap_limit;
+   }
+
+   memcpy(fl6, &t->fl.u.ip6, sizeof(*fl6));
+
+   if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
+   *dsfield = ipv6_get_dsfield(ipv6h);
+   else
+   *dsfield = ip6_tclass(t->parms.flowinfo);
+
+   if (t->parms.flags & IP6_TNL_F_USE_ORIG_FLOWLABEL)
+   fl6->flowlabel |= ip6_flowlabel(ipv6h);
+
+   if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
+   fl6->flowi6_mark = skb->mark;
+   else
+   fl6->flowi6_mark = t->parms.fwmark;
+
+   fl6->flowi6_uid = sock_net_uid(dev_net(dev), NULL);
+
+   return 0;
+}
+
 static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
   struct net_device *dev, __u8 dsfield,
   struct flowi6 *fl6, int encap_limit,
@@ -533,7 +606,6 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
 static inline int ip6gre_xmit_ipv4(struct sk_buff *skb, struct net_device *dev)
 {
struct ip6_tnl *t = netdev_priv(dev);
-   const struct iphdr  *iph = ip_hdr(skb);
int encap_limit = -1;
struct flowi6 fl6;
__u8 dsfield;
@@ -542,21 +614,7 @@ static inline int ip6gre_xmit_ipv4(struct sk_buff *skb, 
struct net_device *dev)
 
memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
 
-   if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
-   encap_limit = t->parms.encap_limit;
-
-   memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
-
-   if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
-   dsfield = ipv4_get_dsfield(iph);
-   else
-   dsfield = ip6_tclass(t->parms.flowinfo);
-   if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
-   fl6.flowi6_mark = skb->mark;
-   else
-   fl6.flowi6_mark = t->parms.fwmark;
-
-   fl6.flowi6_uid = sock_net_uid(dev_net(dev), NULL);
+   prepare_ip6gre_xmit_ipv4(skb, dev, &fl6, &dsfield, &encap_limit);
 
err = gre_handle_offloads(skb, !!(t->parms.o_flags & TUNNEL_CSUM));
if (err)
@@ -580,7 +638,6 @@ static inline int ip6gre_xmit_ipv6(struct sk_buff *skb, 
struct net_device *dev)
struct ip6_tnl *t = netdev_priv(dev);
struct ipv6hdr *ipv6h = ipv6_hdr(skb);
int encap_limit = -1;
-   __u16 offset;
struct flowi6 fl6;
__u8 dsfield;
__u32 mtu;

[PATCH net-next 1/3] ip_gre: Refactor the erspan tunnel code.

2017-11-05 Thread William Tu
Move two erspan functions to the header file erspan.h so that the IPv6
erspan implementation can use them.

Signed-off-by: William Tu 
---
 include/net/erspan.h | 51 +
 net/ipv4/ip_gre.c| 54 +---
 2 files changed, 56 insertions(+), 49 deletions(-)

diff --git a/include/net/erspan.h b/include/net/erspan.h
index ca94fc86865e..c2d265684c2d 100644
--- a/include/net/erspan.h
+++ b/include/net/erspan.h
@@ -58,4 +58,55 @@ struct erspanhdr {
struct erspan_metadata md;
 };
 
+static inline u8 tos_to_cos(u8 tos)
+{
+   u8 dscp, cos;
+
+   dscp = tos >> 2;
+   cos = dscp >> 3;
+   return cos;
+}
+
+static void erspan_build_header(struct sk_buff *skb,
+   __be32 id, u32 index,
+   bool truncate, bool is_ipv4)
+{
+   struct ethhdr *eth = eth_hdr(skb);
+   enum erspan_encap_type enc_type;
+   struct erspanhdr *ershdr;
+   struct qtag_prefix {
+   __be16 eth_type;
+   __be16 tci;
+   } *qp;
+   u16 vlan_tci = 0;
+   u8 tos;
+
+   tos = is_ipv4 ? ip_hdr(skb)->tos :
+   (ipv6_hdr(skb)->priority << 4) +
+   (ipv6_hdr(skb)->flow_lbl[0] >> 4);
+
+   enc_type = ERSPAN_ENCAP_NOVLAN;
+
+   /* If mirrored packet has vlan tag, extract tci and
+*  preserve vlan header in the mirrored frame.
+*/
+   if (eth->h_proto == htons(ETH_P_8021Q)) {
+   qp = (struct qtag_prefix *)(skb->data + 2 * ETH_ALEN);
+   vlan_tci = ntohs(qp->tci);
+   enc_type = ERSPAN_ENCAP_INFRAME;
+   }
+
+   skb_push(skb, sizeof(*ershdr));
+   ershdr = (struct erspanhdr *)skb->data;
+   memset(ershdr, 0, sizeof(*ershdr));
+
+   ershdr->ver_vlan = htons((vlan_tci & VLAN_MASK) |
+(ERSPAN_VERSION << VER_OFFSET));
+   ershdr->session_id = htons((u16)(ntohl(id) & ID_MASK) |
+  ((tos_to_cos(tos) << COS_OFFSET) & COS_MASK) |
+  (enc_type << EN_OFFSET & EN_MASK) |
+  ((truncate << T_OFFSET) & T_MASK));
+   ershdr->md.index = htonl(index & INDEX_MASK);
+}
+
 #endif
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index c105a315b1a3..007b733195cb 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -114,7 +114,8 @@ MODULE_PARM_DESC(log_ecn_error, "Log packets received with 
corrupted ECN");
 static struct rtnl_link_ops ipgre_link_ops __read_mostly;
 static int ipgre_tunnel_init(struct net_device *dev);
 static void erspan_build_header(struct sk_buff *skb,
-   __be32 id, u32 index, bool truncate);
+   __be32 id, u32 index,
+   bool truncate, bool is_ipv4);
 
 static unsigned int ipgre_net_id __read_mostly;
 static unsigned int gre_tap_net_id __read_mostly;
@@ -589,7 +590,7 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct 
net_device *dev,
goto err_free_rt;
 
erspan_build_header(skb, tunnel_id_to_key32(key->tun_id),
-   ntohl(md->index), truncate);
+   ntohl(md->index), truncate, true);
 
gre_build_header(skb, 8, TUNNEL_SEQ,
 htons(ETH_P_ERSPAN), 0, htonl(tunnel->o_seqno++));
@@ -668,52 +669,6 @@ static netdev_tx_t ipgre_xmit(struct sk_buff *skb,
return NETDEV_TX_OK;
 }
 
-static inline u8 tos_to_cos(u8 tos)
-{
-   u8 dscp, cos;
-
-   dscp = tos >> 2;
-   cos = dscp >> 3;
-   return cos;
-}
-
-static void erspan_build_header(struct sk_buff *skb,
-   __be32 id, u32 index, bool truncate)
-{
-   struct iphdr *iphdr = ip_hdr(skb);
-   struct ethhdr *eth = eth_hdr(skb);
-   enum erspan_encap_type enc_type;
-   struct erspanhdr *ershdr;
-   struct qtag_prefix {
-   __be16 eth_type;
-   __be16 tci;
-   } *qp;
-   u16 vlan_tci = 0;
-
-   enc_type = ERSPAN_ENCAP_NOVLAN;
-
-   /* If mirrored packet has vlan tag, extract tci and
-*  perserve vlan header in the mirrored frame.
-*/
-   if (eth->h_proto == htons(ETH_P_8021Q)) {
-   qp = (struct qtag_prefix *)(skb->data + 2 * ETH_ALEN);
-   vlan_tci = ntohs(qp->tci);
-   enc_type = ERSPAN_ENCAP_INFRAME;
-   }
-
-   skb_push(skb, sizeof(*ershdr));
-   ershdr = (struct erspanhdr *)skb->data;
-   memset(ershdr, 0, sizeof(*ershdr));
-
-   ershdr->ver_vlan = htons((vlan_tci & VLAN_MASK) |
-(ERSPAN_VERSION << VER_OFFSET));
-   ershdr->session_id = htons((u16)(ntohl(id) & ID_MASK) |
-  ((tos_to_cos(iphdr->tos) << COS_OFFSET) & COS_MASK) |
-  (enc_type << EN_OFFSET & EN_MASK) |
-  

[PATCH net-next 3/3] ip6_gre: Add ERSPAN native tunnel support

2017-11-05 Thread William Tu
This patch adds support for ERSPAN tunnels over IPv6.

Signed-off-by: William Tu 
---
 include/net/ip6_tunnel.h |   1 +
 net/ipv6/ip6_gre.c   | 266 ++-
 2 files changed, 263 insertions(+), 4 deletions(-)

diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
index d66f70f63734..3475dad0aa77 100644
--- a/include/net/ip6_tunnel.h
+++ b/include/net/ip6_tunnel.h
@@ -36,6 +36,7 @@ struct __ip6_tnl_parm {
__be32  o_key;
 
__u32   fwmark;
+   __u32 index;/* ERSPAN type II index */
 };
 
 /* IPv6 tunnel */
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 8c7612f32926..eb00a65f9d4c 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -55,6 +55,7 @@
 #include 
 #include 
 #include 
+#include 
 
 
 static bool log_ecn_error = true;
@@ -73,10 +74,12 @@ struct ip6gre_net {
 
 static struct rtnl_link_ops ip6gre_link_ops __read_mostly;
 static struct rtnl_link_ops ip6gre_tap_ops __read_mostly;
+static struct rtnl_link_ops ip6erspan_tap_ops __read_mostly;
 static int ip6gre_tunnel_init(struct net_device *dev);
 static void ip6gre_tunnel_setup(struct net_device *dev);
 static void ip6gre_tunnel_link(struct ip6gre_net *ign, struct ip6_tnl *t);
 static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu);
+static void ip6erspan_tap_setup(struct net_device *dev);
 
 /* Tunnel hash table */
 
@@ -121,7 +124,8 @@ static struct ip6_tnl *ip6gre_tunnel_lookup(struct 
net_device *dev,
unsigned int h1 = HASH_KEY(key);
struct ip6_tnl *t, *cand = NULL;
struct ip6gre_net *ign = net_generic(net, ip6gre_net_id);
-   int dev_type = (gre_proto == htons(ETH_P_TEB)) ?
+   int dev_type = (gre_proto == htons(ETH_P_TEB) ||
+   gre_proto == htons(ETH_P_ERSPAN)) ?
   ARPHRD_ETHER : ARPHRD_IP6GRE;
int score, cand_score = 4;
 
@@ -469,6 +473,40 @@ static int ip6gre_rcv(struct sk_buff *skb, const struct 
tnl_ptk_info *tpi)
return PACKET_REJECT;
 }
 
+static int ip6erspan_rcv(struct sk_buff *skb, int gre_hdr_len,
+struct tnl_ptk_info *tpi)
+{
+   const struct ipv6hdr *ipv6h;
+   struct erspanhdr *ershdr;
+   struct ip6_tnl *tunnel;
+   __be32 index;
+
+   ipv6h = ipv6_hdr(skb);
+   ershdr = (struct erspanhdr *)skb->data;
+
+   if (unlikely(!pskb_may_pull(skb, sizeof(*ershdr))))
+   return PACKET_REJECT;
+
+   tpi->key = cpu_to_be32(ntohs(ershdr->session_id) & ID_MASK);
+   index = ershdr->md.index;
+
+   tunnel = ip6gre_tunnel_lookup(skb->dev,
+ &ipv6h->saddr, &ipv6h->daddr, tpi->key,
+ tpi->proto);
+   if (tunnel) {
+   if (__iptunnel_pull_header(skb, sizeof(*ershdr),
+  htons(ETH_P_TEB),
+  false, false) < 0)
+   return PACKET_REJECT;
+
+   ip6_tnl_rcv(tunnel, skb, tpi, NULL, false);
+
+   return PACKET_RCVD;
+   }
+
+   return PACKET_RCVD;
+}
+
 static int gre_rcv(struct sk_buff *skb)
 {
struct tnl_ptk_info tpi;
@@ -482,6 +520,12 @@ static int gre_rcv(struct sk_buff *skb)
if (iptunnel_pull_header(skb, hdr_len, tpi.proto, false))
goto drop;
 
+   if (unlikely(tpi.proto == htons(ETH_P_ERSPAN))) {
+   if (ip6erspan_rcv(skb, hdr_len, &tpi) == PACKET_RCVD)
+   return 0;
+   goto drop;
+   }
+
if (ip6gre_rcv(skb, &tpi) == PACKET_RCVD)
return 0;
 
@@ -739,6 +783,84 @@ static netdev_tx_t ip6gre_tunnel_xmit(struct sk_buff *skb,
return NETDEV_TX_OK;
 }
 
+static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
+struct net_device *dev)
+{
+   struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+   struct ip6_tnl *t = netdev_priv(dev);
+   struct dst_entry *dst = skb_dst(skb);
+   struct net_device_stats *stats;
+   int encap_limit = -1;
+   __u8 dsfield = false;
+   struct flowi6 fl6;
+   int err = -EINVAL;
+   bool truncate;
+   __u32 mtu;
+
+   stats = &t->dev->stats;
+
+   if (!ip6_tnl_xmit_ctl(t, &t->parms.laddr, &t->parms.raddr))
+   goto tx_err;
+
+   if (gre_handle_offloads(skb, false))
+   goto tx_err;
+
+   switch (skb->protocol) {
+   case htons(ETH_P_IP):
+   memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
+   prepare_ip6gre_xmit_ipv4(skb, dev, &fl6,
+&dsfield, &encap_limit);
+   break;
+   case htons(ETH_P_IPV6):
+   if (ipv6_addr_equal(&t->parms.raddr, &ipv6h->saddr))
+   goto tx_err;
+   if (prepare_ip6gre_xmit_ipv6(skb, dev, &fl6,
+&dsfield, &encap_limit))
+   

[PATCH net-next 0/3] ip6_gre: add erspan native tunnel for ipv6

2017-11-05 Thread William Tu
This patch series adds support for ERSPAN tunnels over IPv6.  The first patch
refactors the existing IPv4 GRE implementation and the second refactors the
IPv6 GRE xmit code.  The last patch introduces the ERSPAN protocol.

William Tu (3):
  ip_gre: Refactor the erspan tunnel code.
  ip6_gre: Refactor ip6gre xmit codes
  ip6_gre: Add ERSPAN native tunnel support

 include/net/erspan.h |  51 +++
 include/net/ip6_tunnel.h |   1 +
 net/ipv4/ip_gre.c|  54 +--
 net/ipv6/ip6_gre.c   | 390 ---
 4 files changed, 395 insertions(+), 101 deletions(-)

-- 
A test script is provided below:
#!/bin/bash
# In the namespace NS0, create veth0 and ip6erspan00
# Out of the namespace, create veth1 and ip6erspan11
# Ping in and out of namespace using ERSPAN protocol 

# Patch for iproute2
# https://marc.info/?l=linux-netdev&m=150990695210278&w=2

cleanup() {
set +ex
ip netns del ns0
ip link del ip6erspan11
ip link del veth1
}

main() {
trap cleanup 0 2 3 9

ip netns add ns0
ip link add veth0 type veth peer name veth1
ip link set veth0 netns ns0

# non-namespace
ip addr add dev veth1 fc00:100::2/96
ip link add dev ip6erspan11 type ip6erspan seq key 102 erspan 123 \
 local fc00:100::2 \
remote fc00:100::1

ip addr add dev ip6erspan11 fc00:200::2/96
ip addr add dev ip6erspan11 10.10.200.2/24

# namespace: ns0 
ip netns exec ns0 ip addr add fc00:100::1/96 dev veth0

# Tunnel
ip netns exec ns0 ip link add dev ip6erspan00 type ip6erspan seq key 
102 erspan 12 \
 local fc00:100::1 \
remote fc00:100::2

ip netns exec ns0 ip addr add dev ip6erspan00 fc00:200::1/96
ip netns exec ns0 ip addr add dev ip6erspan00 10.10.200.1/24

ip link set dev veth1 up
ip link set dev ip6erspan11 up
ip netns exec ns0 ip link set dev ip6erspan00 up
ip netns exec ns0 ip link set dev veth0 up
}

main

# Ping underlying
ping6 -c 1 fc00:100::1 || true

# ping overlay
ping -c 3 10.10.200.1
ping6 -c 3 fc00:200::1
---


2.7.4



[PATCH net-next iproute2] ip6_gre: add support for ERSPAN tunnel

2017-11-05 Thread William Tu
The patch adds ERSPAN type II tunnel support for IPv6.
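
As a rough illustration of the range check that the new "erspan IDX" option
applies, the sketch below restates it as a standalone predicate. The index
must be non-zero and must fit in 20 bits. The names erspan_index_valid() and
ERSPAN_INDEX_BITS are made up for this example and do not exist in iproute2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: width of the ERSPAN type II index field as
 * implied by the (1<<20) - 1 mask used in the option parser.
 */
#define ERSPAN_INDEX_BITS 20

/* Hypothetical helper: true when idx would be accepted by the parser,
 * i.e. it is greater than zero and has no bits set above bit 19.
 */
static inline bool erspan_index_valid(uint32_t idx)
{
	uint32_t mask = (UINT32_C(1) << ERSPAN_INDEX_BITS) - 1;

	return idx != 0 && (idx & ~mask) == 0;
}

/* Usage example with the index values from the kernel series' test script. */
void erspan_index_examples(void)
{
	assert(erspan_index_valid(123));
	assert(erspan_index_valid(12));
	assert(!erspan_index_valid(0));
}
```

Because zero is rejected at parse time, the patch can use erspan_idx != 0 as
the signal that the attribute should be sent to the kernel at all.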

Signed-off-by: William Tu 
---
 ip/ipaddress.c   |  5 +++--
 ip/iplink.c  |  6 +++---
 ip/link_gre6.c   | 28 +++-
 man/man8/ip-address.8.in |  1 +
 man/man8/ip-link.8.in|  4 
 5 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 9e9a7e0a6477..c0cd66538ea2 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -77,8 +77,9 @@ static void usage(void)
fprintf(stderr, "LFT := forever | SECONDS\n");
fprintf(stderr, "TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | 
macvtap |\n");
fprintf(stderr, "  bridge | bond | ipoib | ip6tnl | ipip | sit 
| vxlan | lowpan |\n");
-   fprintf(stderr, "  gre | gretap | erspan | ip6gre | ip6gretap | 
vti | nlmon | can |\n");
-   fprintf(stderr, "  bond_slave | ipvlan | geneve | bridge_slave 
| vrf | hsr | macsec }\n");
+   fprintf(stderr, "  gre | gretap | erspan | ip6gre | ip6gretap | 
ip6erspan | vti |\n");
+   fprintf(stderr, "  nlmon | can | bond_slave | ipvlan | geneve | 
bridge_slave |\n");
+   fprintf(stderr, "  hsr | macsec\n");
 
exit(-1);
 }
diff --git a/ip/iplink.c b/ip/iplink.c
index 6a96ea9ff56a..be41749af036 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -112,9 +112,9 @@ void iplink_usage(void)
"\n"
"TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | 
macvtap |\n"
"  bridge | bond | team | ipoib | ip6tnl | ipip 
| sit | vxlan |\n"
-   "  gre | gretap | erspan | ip6gre | ip6gretap | 
vti | nlmon |\n"
-   "  team_slave | bond_slave | ipvlan | geneve | 
bridge_slave |\n"
-   "  vrf | macsec }\n");
+   "  gre | gretap | erspan | ip6gre | ip6gretap | 
ip6erspan |\n"
+   "  vti | nlmon | team_slave | bond_slave | 
ipvlan | geneve |\n"
+   "  bridge_slave | vrf | macsec }\n");
}
exit(-1);
 }
diff --git a/ip/link_gre6.c b/ip/link_gre6.c
index 7d07932a60f0..a738739ff1e8 100644
--- a/ip/link_gre6.c
+++ b/ip/link_gre6.c
@@ -33,7 +33,7 @@
 static void print_usage(FILE *f)
 {
fprintf(f,
-   "Usage: ... { ip6gre | ip6gretap } [ remote ADDR ]\n"
+   "Usage: ... { ip6gre | ip6gretap | ip6erspan} [ remote ADDR ]\n"
"  [ local ADDR ]\n"
"  [ [i|o]seq ]\n"
"  [ [i|o]key KEY ]\n"
@@ -52,6 +52,7 @@ static void print_usage(FILE *f)
"  [ [no]encap-csum ]\n"
"  [ [no]encap-csum6 ]\n"
"  [ [no]encap-remcsum ]\n"
+   "  [ erspan IDX ]\n"
"\n"
"Where: ADDR  := IPV6_ADDRESS\n"
"   TTL   := { 0..255 } (default=%d)\n"
@@ -106,6 +107,7 @@ static int gre_parse_opt(struct link_util *lu, int argc, 
char **argv,
__u16 encapdport = 0;
int len;
__u32 fwmark = 0;
+   __u32 erspan_idx = 0;
 
if (!(n->nlmsg_flags & NLM_F_CREATE)) {
if (rtnl_talk(, , , sizeof(req)) < 0) {
@@ -180,6 +182,9 @@ get_failed:
 
if (greinfo[IFLA_GRE_FWMARK])
fwmark = rta_getattr_u32(greinfo[IFLA_GRE_FWMARK]);
+
+   if (greinfo[IFLA_GRE_ERSPAN_INDEX])
+   erspan_idx = 
rta_getattr_u32(greinfo[IFLA_GRE_ERSPAN_INDEX]);
}
 
while (argc > 0) {
@@ -369,6 +374,12 @@ get_failed:
encap_limit = uval;
flags &= ~IP6_TNL_F_IGN_ENCAP_LIMIT;
}
+   } else if (strcmp(*argv, "erspan") == 0) {
+   NEXT_ARG();
+   if (get_u32(&erspan_idx, *argv, 0))
+   invarg("invalid erspan index\n", *argv);
+   if (erspan_idx & ~((1<<20) - 1) || erspan_idx == 0)
+   invarg("erspan index must be > 0 and <= 
20-bit\n", *argv);
} else
usage();
argc--; argv++;
@@ -387,6 +398,8 @@ get_failed:
addattr_l(n, 1024, IFLA_GRE_FLOWINFO, , 4);
addattr32(n, 1024, IFLA_GRE_FLAGS, flags);
addattr32(n, 1024, IFLA_GRE_FWMARK, fwmark);
+   if (erspan_idx != 0)
+   addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
 
addattr16(n, 1024, IFLA_GRE_ENCAP_TYPE, encaptype);
addattr16(n, 1024, IFLA_GRE_ENCAP_FLAGS, encapflags);
@@ -554,6 +567,11 @@ static void 

[PATCH v2 21/21] sunrpc: exit_net cleanup check added

2017-11-05 Thread Vasily Averin
Be sure that the all_clients list initialized in the net_init hook was
returned to its initial state.

Signed-off-by: Vasily Averin 
---
 net/sunrpc/sunrpc_syms.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/sunrpc/sunrpc_syms.c b/net/sunrpc/sunrpc_syms.c
index c73de18..4a25658 100644
--- a/net/sunrpc/sunrpc_syms.c
+++ b/net/sunrpc/sunrpc_syms.c
@@ -65,10 +65,14 @@ static __net_init int sunrpc_init_net(struct net *net)
 
 static __net_exit void sunrpc_exit_net(struct net *net)
 {
+   struct sunrpc_net *sn = net_generic(net, sunrpc_net_id);
+
rpc_pipefs_exit_net(net);
unix_gid_cache_destroy(net);
ip_map_cache_destroy(net);
rpc_proc_exit(net);
+   WARN(!list_empty(&sn->all_clients),
+"%s: all_clients list is not empty\n", __func__);
 }
 
 static struct pernet_operations sunrpc_net_ops = {
-- 
2.7.4



[PATCH v2 20/21] phonet: exit_net cleanup check added

2017-11-05 Thread Vasily Averin
Be sure that pndevs.list initialized in the net_init hook was returned
to its initial state.

Signed-off-by: Vasily Averin 
---
 net/phonet/pn_dev.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/phonet/pn_dev.c b/net/phonet/pn_dev.c
index 2cb4c5d..81b4eb0 100644
--- a/net/phonet/pn_dev.c
+++ b/net/phonet/pn_dev.c
@@ -331,7 +331,11 @@ static int __net_init phonet_init_net(struct net *net)
 
 static void __net_exit phonet_exit_net(struct net *net)
 {
+   struct phonet_net *pnn = phonet_pernet(net);
+
remove_proc_entry("phonet", net->proc_net);
+   WARN(!list_empty(&pnn->pndevs.list),
+"%s: pndevs.list is not empty\n", __func__);
 }
 
 static struct pernet_operations phonet_net_ops = {
-- 
2.7.4



[PATCH v2 19/21] packet: exit_net cleanup check added

2017-11-05 Thread Vasily Averin
Be sure that packet.sklist initialized in the net_init hook was returned
to its initial state.

Signed-off-by: Vasily Averin 
---
 net/packet/af_packet.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index bec01a3..7ceb97c 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -4562,6 +4562,8 @@ static int __net_init packet_net_init(struct net *net)
 static void __net_exit packet_net_exit(struct net *net)
 {
remove_proc_entry("packet", net->proc_net);
+   WARN(!hlist_empty(&net->packet.sklist),
+"%s: sklist is not empty\n", __func__);
 }
 
 static struct pernet_operations packet_net_ops = {
-- 
2.7.4



[PATCH v2 18/21] recent: exit_net cleanup check added

2017-11-05 Thread Vasily Averin
Be sure that the tables list initialized in the net_init hook was returned
to its initial state.

Signed-off-by: Vasily Averin 
---
 net/netfilter/xt_recent.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/netfilter/xt_recent.c b/net/netfilter/xt_recent.c
index 245fa35..230d00f 100644
--- a/net/netfilter/xt_recent.c
+++ b/net/netfilter/xt_recent.c
@@ -681,7 +681,11 @@ static int __net_init recent_net_init(struct net *net)
 
 static void __net_exit recent_net_exit(struct net *net)
 {
+   struct recent_net *recent_net = recent_pernet(net);
+
recent_proc_net_exit(net);
+   WARN(!list_empty(&recent_net->tables),
+"%s: tables list is not empty\n", __func__);
 }
 
 static struct pernet_operations recent_net_ops = {
-- 
2.7.4



[PATCH v2 17/21] hashlimit: exit_net cleanup check added

2017-11-05 Thread Vasily Averin
Be sure that the htables hlist initialized in the net_init hook was returned
to its initial state.

Signed-off-by: Vasily Averin 
---
 net/netfilter/xt_hashlimit.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 5da8746..abef6b4 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -1338,7 +1338,11 @@ static int __net_init hashlimit_net_init(struct net *net)
 
 static void __net_exit hashlimit_net_exit(struct net *net)
 {
+   struct hashlimit_net *hashlimit_net = hashlimit_pernet(net);
+
hashlimit_proc_net_exit(net);
+   WARN(!hlist_empty(&hashlimit_net->htables),
+"%s: htables hlist is not empty\n", __func__);
 }
 
 static struct pernet_operations hashlimit_net_ops = {
-- 
2.7.4



[PATCH v2 16/21] x_tables: exit_net cleanup check added

2017-11-05 Thread Vasily Averin
Be sure that the xt.tables array initialized in the net_init hook was
returned to its initial state.

Signed-off-by: Vasily Averin 
---
 net/netfilter/x_tables.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index d8571f4..8125363 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1714,8 +1714,18 @@ static int __net_init xt_net_init(struct net *net)
return 0;
 }
 
+static void __net_exit xt_net_exit(struct net *net)
+{
+   int i;
+
+   for (i = 0; i < NFPROTO_NUMPROTO; i++)
+   WARN(!list_empty(&net->xt.tables[i]),
+"%s: tables list is not empty\n", __func__);
+}
+
 static struct pernet_operations xt_net_ops = {
.init = xt_net_init,
+   .exit = xt_net_exit,
 };
 
 static int __init xt_init(void)
-- 
2.7.4



[PATCH v2 14/21] nfnetlink_log: exit_net cleanup check added

2017-11-05 Thread Vasily Averin
Be sure that the instance_table array initialized in the net_init hook
was returned to its initial state.
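
The check for a hash-table style array can be pictured with the following
user-space sketch. It scans every bucket and stops at the first non-empty
one, so a leak produces a single warning rather than one per bucket. All
names here (hnode, hhead, DEMO_BUCKETS, demo_check_buckets) are hypothetical
simplifications and not the kernel hlist API; printing the bucket index is
an addition for the demo only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_BUCKETS 16	/* hypothetical table size */

/* Simplified stand-ins for hlist_node and hlist_head. */
struct hnode {
	struct hnode *next;
};

struct hhead {
	struct hnode *first;
};

/* Return the index of the first non-empty bucket, or -1 if every bucket
 * is empty. The early return plays the role of the "break" in the patch.
 */
int demo_check_buckets(const struct hhead *table, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i].first) {
			fprintf(stderr, "%s: bucket %zu is not empty\n",
				__func__, i);
			return (int)i;
		}
	}
	return -1;
}

/* Usage example: one leaked entry in bucket 5 is reported once. */
void demo_check_buckets_example(void)
{
	struct hhead table[DEMO_BUCKETS] = { { NULL } };
	struct hnode leaked = { NULL };

	assert(demo_check_buckets(table, DEMO_BUCKETS) == -1);
	table[5].first = &leaked;
	assert(demo_check_buckets(table, DEMO_BUCKETS) == 5);
}
```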

Signed-off-by: Vasily Averin 
---
 net/netfilter/nfnetlink_log.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index cad6498..c99f427 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -1093,10 +1093,16 @@ static int __net_init nfnl_log_net_init(struct net *net)
 
 static void __net_exit nfnl_log_net_exit(struct net *net)
 {
+   unsigned int i;
+   struct nfnl_log_net *log = nfnl_log_pernet(net);
 #ifdef CONFIG_PROC_FS
remove_proc_entry("nfnetlink_log", net->nf.proc_netfilter);
 #endif
nf_log_unset(net, _logger);
+   for (i = 0; i < INSTANCE_BUCKETS; i++)
+   if (WARN(!hlist_empty(&log->instance_table[i]),
+"%s: instance_table is not empty\n", __func__))
+   break;
 }
 
 static struct pernet_operations nfnl_log_net_ops = {
-- 
2.7.4



[PATCH v2 13/21] nf_tables: exit_net cleanup check added

2017-11-05 Thread Vasily Averin
Be sure that the lists initialized in the net_init hook were returned
to their initial state.

Signed-off-by: Vasily Averin 
---
 net/netfilter/nf_tables_api.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 64e1ee0..8219b2f 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -5778,6 +5778,14 @@ static int __net_init nf_tables_init_net(struct net *net)
return 0;
 }
 
+static void __net_exit nf_tables_exit_net(struct net *net)
+{
+   WARN(!list_empty(&net->nft.af_info),
+"%s: af_info list is not empty\n", __func__);
+   WARN(!list_empty(&net->nft.commit_list),
+"%s: commit_list is not empty\n", __func__);
+}
+
 int __nft_release_basechain(struct nft_ctx *ctx)
 {
struct nft_rule *rule, *nr;
@@ -5848,6 +5856,7 @@ static void __nft_release_afinfo(struct net *net, struct 
nft_af_info *afi)
 
 static struct pernet_operations nf_tables_net_ops = {
.init   = nf_tables_init_net,
+   .exit   = nf_tables_exit_net,
 };
 
 static int __init nf_tables_module_init(void)
-- 
2.7.4



[PATCH v2 15/21] nfnetlink_queue: exit_net cleanup check added

2017-11-05 Thread Vasily Averin
Be sure that the instance_table array initialized in the net_init hook
was returned to its initial state.

Signed-off-by: Vasily Averin 
---
 net/netfilter/nfnetlink_queue.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index c979662..0fa56d9 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -1512,10 +1512,17 @@ static int __net_init nfnl_queue_net_init(struct net 
*net)
 
 static void __net_exit nfnl_queue_net_exit(struct net *net)
 {
+   unsigned int i;
+   struct nfnl_queue_net *q = nfnl_queue_pernet(net);
+
nf_unregister_queue_handler(net);
 #ifdef CONFIG_PROC_FS
remove_proc_entry("nfnetlink_queue", net->nf.proc_netfilter);
 #endif
+   for (i = 0; i < INSTANCE_BUCKETS; i++)
+   if (WARN(!hlist_empty(&q->instance_table[i]),
+"%s: instance_table isn't empty\n", __func__))
+   break;
 }
 
 static void nfnl_queue_net_exit_batch(struct list_head *net_exit_list)
-- 
2.7.4



[PATCH v2 09/21] clusterip: exit_net cleanup check added

2017-11-05 Thread Vasily Averin
Be sure that the configs list initialized in the net_init hook was returned
to its initial state.

Signed-off-by: Vasily Averin 
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c 
b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 17b4ca5..038f0a9 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -819,6 +819,8 @@ static void clusterip_net_exit(struct net *net)
cn->procdir = NULL;
 #endif
nf_unregister_net_hook(net, _arp_ops);
+   WARN(!list_empty(&cn->configs),
+"%s: configs list is not empty\n", __func__);
 }
 
 static struct pernet_operations clusterip_net_ops = {
-- 
2.7.4



[PATCH v2 11/21] af_key: replace BUG_ON with WARN_ON in net_exit hook

2017-11-05 Thread Vasily Averin
Signed-off-by: Vasily Averin 
---
 net/key/af_key.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index a00d607..3dffb89 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3845,7 +3845,7 @@ static void __net_exit pfkey_net_exit(struct net *net)
struct netns_pfkey *net_pfkey = net_generic(net, pfkey_net_id);
 
pfkey_exit_proc(net);
-   BUG_ON(!hlist_empty(&net_pfkey->table));
+   WARN_ON(!hlist_empty(&net_pfkey->table));
 }
 
 static struct pernet_operations pfkey_net_ops = {
-- 
2.7.4



[PATCH v2 12/21] l2tp: exit_net cleanup check added

2017-11-05 Thread Vasily Averin
Be sure that the l2tp_session_hlist array initialized in the net_init hook
was returned to its initial state.

Signed-off-by: Vasily Averin 
---
 net/l2tp/l2tp_core.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 02d6110..1136341 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1888,6 +1888,7 @@ static __net_exit void l2tp_exit_net(struct net *net)
 {
struct l2tp_net *pn = l2tp_pernet(net);
struct l2tp_tunnel *tunnel = NULL;
+   int hash;
 
rcu_read_lock_bh();
list_for_each_entry_rcu(tunnel, &pn->l2tp_tunnel_list, list) {
@@ -1897,6 +1898,11 @@ static __net_exit void l2tp_exit_net(struct net *net)
 
flush_workqueue(l2tp_wq);
rcu_barrier();
+
+   for (hash = 0; hash < L2TP_HASH_SIZE_2; hash++)
+   if (WARN(!hlist_empty(&pn->l2tp_session_hlist[hash]),
+"%s: session_hlist is not empty\n", __func__))
+   break;
 }
 
 static struct pernet_operations l2tp_net_ops = {
-- 
2.7.4



[PATCH v2 10/21] xfrm6_tunnel: exit_net cleanup check added

2017-11-05 Thread Vasily Averin
Be sure that the spi_byaddr and spi_byspi arrays initialized in the net_init
hook were returned to their initial state.

Signed-off-by: Vasily Averin 
---
 net/ipv6/xfrm6_tunnel.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/net/ipv6/xfrm6_tunnel.c b/net/ipv6/xfrm6_tunnel.c
index 4e438bc..8280152 100644
--- a/net/ipv6/xfrm6_tunnel.c
+++ b/net/ipv6/xfrm6_tunnel.c
@@ -338,6 +338,18 @@ static int __net_init xfrm6_tunnel_net_init(struct net 
*net)
 
 static void __net_exit xfrm6_tunnel_net_exit(struct net *net)
 {
+   struct xfrm6_tunnel_net *xfrm6_tn = xfrm6_tunnel_pernet(net);
+   unsigned int i;
+
+   for (i = 0; i < XFRM6_TUNNEL_SPI_BYADDR_HSIZE; i++)
+   if (WARN(!hlist_empty(&xfrm6_tn->spi_byaddr[i]),
+"%s: spi_byaddr is not empty\n", __func__))
+   break;
+
+   for (i = 0; i < XFRM6_TUNNEL_SPI_BYSPI_HSIZE; i++)
+   if (WARN(!hlist_empty(&xfrm6_tn->spi_byspi[i]),
+"%s: spi_byspi is not empty\n", __func__))
+   break;
 }
 
 static struct pernet_operations xfrm6_tunnel_net_ops = {
-- 
2.7.4



[PATCH v2 08/21] fib_rules: exit_net cleanup check added

2017-11-05 Thread Vasily Averin
Be sure that the rules_ops list initialized in the net_init hook was returned
to its initial state.

Signed-off-by: Vasily Averin 
---
 net/core/fib_rules.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 9a6d97c..5ab4fac 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -1019,8 +1019,15 @@ static int __net_init fib_rules_net_init(struct net *net)
return 0;
 }
 
+static void __net_exit fib_rules_net_exit(struct net *net)
+{
+   WARN(!list_empty(&net->rules_ops),
+"%s: rules_ops list is not empty\n", __func__);
+}
+
 static struct pernet_operations fib_rules_net_ops = {
.init = fib_rules_net_init,
+   .exit = fib_rules_net_exit,
 };
 
 static int __init fib_rules_init(void)
-- 
2.7.4



[PATCH v2 07/21] fib_notifier: exit_net cleanup check added

2017-11-05 Thread Vasily Averin
Be sure that the fib_notifier_ops list initialized in the net_init hook was
returned to its initial state.

Signed-off-by: Vasily Averin 
---
 net/core/fib_notifier.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/core/fib_notifier.c b/net/core/fib_notifier.c
index 4fc202d..1f57ec0 100644
--- a/net/core/fib_notifier.c
+++ b/net/core/fib_notifier.c
@@ -161,8 +161,15 @@ static int __net_init fib_notifier_net_init(struct net 
*net)
return 0;
 }
 
+static void __net_exit fib_notifier_net_exit(struct net *net)
+{
+   WARN(!list_empty(&net->fib_notifier_ops),
+"%s: fib_notifier_ops list is not empty\n", __func__);
+}
+
 static struct pernet_operations fib_notifier_net_ops = {
.init = fib_notifier_net_init,
+   .exit = fib_notifier_net_exit,
 };
 
 static int __init fib_notifier_init(void)
-- 
2.7.4



[PATCH v2 06/21] nfs client: exit_net cleanup check added

2017-11-05 Thread Vasily Averin
Be sure that the nfs_client_list and nfs_volume_list lists initialized
in the net_init hook were returned to their initial state in the net_exit hook.

Signed-off-by: Vasily Averin 
---
 fs/nfs/client.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 22880ef..e099a01 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -204,6 +204,10 @@ void nfs_cleanup_cb_ident_idr(struct net *net)
struct nfs_net *nn = net_generic(net, nfs_net_id);
 
idr_destroy(&nn->cb_ident_idr);
+   WARN(!list_empty(&nn->nfs_client_list),
+"nfs net_exit: nfs_client_list is not empty\n");
+   WARN(!list_empty(&nn->nfs_volume_list),
+"nfs net_exit: nfs_volume_list is not empty\n");
 }
 
 /* nfs_client_lock held */
-- 
2.7.4



[PATCH v2 05/21] nfs4blocklayout: exit_net cleanup check added

2017-11-05 Thread Vasily Averin
Be sure that the bl_wq wait queue initialized in the net_init hook
is no longer in use.

Signed-off-by: Vasily Averin 
---
 fs/nfs/blocklayout/rpc_pipefs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/nfs/blocklayout/rpc_pipefs.c b/fs/nfs/blocklayout/rpc_pipefs.c
index 9fb067a6..faae48f 100644
--- a/fs/nfs/blocklayout/rpc_pipefs.c
+++ b/fs/nfs/blocklayout/rpc_pipefs.c
@@ -256,6 +256,8 @@ static void nfs4blocklayout_net_exit(struct net *net)
nfs4blocklayout_unregister_net(net, nn->bl_device_pipe);
rpc_destroy_pipe_data(nn->bl_device_pipe);
nn->bl_device_pipe = NULL;
+   WARN(!list_empty(&nn->bl_wq.head),
+"%s: bl_wq head is not empty\n", __func__);
 }
 
 static struct pernet_operations nfs4blocklayout_net_ops = {
-- 
2.7.4



[PATCH v2 02/21] ppp: exit_net cleanup checks added

2017-11-05 Thread Vasily Averin
Be sure that the lists initialized in the net_init hook were returned
to their initial state.

Signed-off-by: Vasily Averin 
---
 drivers/net/ppp/ppp_generic.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index e365866..10cee62 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -960,6 +960,10 @@ static __net_exit void ppp_exit_net(struct net *net)
rtnl_unlock();
 
idr_destroy(&pn->units_idr);
+   WARN(!list_empty(&pn->all_channels),
+"%s: all_channels list is not empty\n", __func__);
+   WARN(!list_empty(&pn->new_channels),
+"%s: new_channels list is not empty\n", __func__);
 }
 
 static struct pernet_operations ppp_net_ops = {
-- 
2.7.4



[PATCH v2 04/21] netdev: exit_net cleanup check added

2017-11-05 Thread Vasily Averin
Be sure that the dev_base_head list initialized in the net_init hook was
returned to its initial state.

Signed-off-by: Vasily Averin 
---
 net/core/dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 588b473..198f137 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8562,6 +8562,9 @@ static void __net_exit netdev_exit(struct net *net)
 {
kfree(net->dev_name_head);
kfree(net->dev_index_head);
+   if (net != &init_net)
+   WARN(!list_empty(&net->dev_base_head),
+"%s: dev_base_head is not empty\n", __func__);
 }
 
 static struct pernet_operations __net_initdata netdev_net_ops = {
-- 
2.7.4



[PATCH v2 01/21] exit_net cleanup: geneve sock_list check

2017-11-05 Thread Vasily Averin
Be sure that sock_list initialized in the net_init hook was returned
to its initial state.

Signed-off-by: Vasily Averin 
---
 drivers/net/geneve.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index f640407..dece711 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1673,6 +1673,8 @@ static void __net_exit geneve_exit_net(struct net *net)
/* unregister the devices gathered above */
unregister_netdevice_many(&list);
rtnl_unlock();
+   WARN(!list_empty(&gn->sock_list),
+"%s: sock_list is not empty\n", __func__);
 }
 
 static struct pernet_operations geneve_net_ops = {
-- 
2.7.4



[PATCH v2 03/21] vxlan: exit_net cleanup checks added

2017-11-05 Thread Vasily Averin
Be sure that the sock_list array initialized in the net_init hook was
returned to its initial state.

Signed-off-by: Vasily Averin 
---
 drivers/net/vxlan.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d7c49cf..f72c1de 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3704,6 +3704,7 @@ static void __net_exit vxlan_exit_net(struct net *net)
struct vxlan_net *vn = net_generic(net, vxlan_net_id);
struct vxlan_dev *vxlan, *next;
struct net_device *dev, *aux;
+   unsigned int h;
LIST_HEAD(list);
 
rtnl_lock();
@@ -3723,6 +3724,11 @@ static void __net_exit vxlan_exit_net(struct net *net)
 
unregister_netdevice_many(&list);
rtnl_unlock();
+
+   for (h = 0; h < PORT_HASH_SIZE; ++h)
+   if (WARN(!hlist_empty(&vn->sock_list[h]),
+"%s: sock_list is not empty\n", __func__))
+   break;
 }
 
 static struct pernet_operations vxlan_net_ops = {
-- 
2.7.4



[PATCH v2 00/21] exit_net checks for objects initialized in net_init hook

2017-11-05 Thread Vasily Averin
This patch set checks that lists initialized in net_init hooks were
returned to their initial state at the end of net_exit hooks.

I hope such checks help detect leaked per-netns objects.
I also hope that all new pernet_operations will inherit such checks.

I assume that elements added to per-net lists should not outlive the net
namespace and should be deleted from the list. I think the exit_net hook is
a good place for such a check.

Recently I found a lost list_entry and a still-enabled timer when a net
namespace was stopped. I then reviewed all existing pernet_operations and found
that many drivers already have such checks. So I decided to complete this task
and add such checks to all affected subsystems.
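
The pattern described above can be sketched in plain userspace C. This is only
an illustration under stated assumptions, not code from the series: the names
list_node, demo_net, demo_init_net and demo_exit_net are hypothetical stand-ins
for the kernel's struct list_head, a per-net private structure and the
net_init/net_exit hooks, and fprintf() stands in for WARN().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's struct list_head (illustrative only). */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static bool list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

/* Insert node right after head. */
static void list_add_node(struct list_node *node, struct list_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

static void list_del_node(struct list_node *node)
{
	assert(node->next && node->prev); /* node must currently be linked */
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node->prev = NULL;
}

/* Hypothetical per-namespace state, analogous to a driver's per-net struct. */
struct demo_net {
	struct list_node sock_list;
};

/* Analogue of a net_init hook: put the list into its initial (empty) state. */
static void demo_init_net(struct demo_net *dn)
{
	list_init(&dn->sock_list);
}

/*
 * Analogue of a net_exit hook: after normal cleanup, verify the list is back
 * in its initial state. Returns true if a leaked entry was detected.
 */
static bool demo_exit_net(struct demo_net *dn)
{
	if (!list_is_empty(&dn->sock_list)) {
		fprintf(stderr, "%s: sock_list is not empty\n", __func__);
		return true;
	}
	return false;
}
```

If an entry is added with list_add_node() and never removed, demo_exit_net()
reports it; once list_del_node() has run, the check passes silently.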

v2:
- net pointer removed from output
- fixed compilation for phonet driver

Vasily Averin (21):
  exit_net cleanup: geneve sock_list check
  ppp: exit_net cleanup checks added
  vxlan: exit_net cleanup checks added
  netdev: exit_net cleanup check added
  nfs4blocklayout: exit_net cleanup check added
  nfs client: exit_net cleanup check added
  fib_notifier: exit_net cleanup check added
  fib_rules: exit_net cleanup check added
  clusterip: exit_net cleanup check added
  xfrm6_tunnel: exit_net cleanup check added
  af_key: replace BUG_ON on WARN_ON in net_exit hook
  l2tp: exit_net cleanup check added
  nf_tables: exit_net cleanup check added
  nfnetlink_log: exit_net cleanup check added
  nfnetlink_queue: exit_net cleanup check added
  x_tables: exit_net cleanup check added
  hashlimit: exit_net cleanup check added
  recent: exit_net cleanup check added
  packet: exit_net cleanup check added
  phonet: exit_net cleanup check added
  sunrpc: exit_net cleanup check added

 drivers/net/geneve.c               |  2 ++
 drivers/net/ppp/ppp_generic.c      |  4 ++++
 drivers/net/vxlan.c                |  6 ++++++
 fs/nfs/blocklayout/rpc_pipefs.c    |  2 ++
 fs/nfs/client.c                    |  4 ++++
 net/core/dev.c                     |  3 +++
 net/core/fib_notifier.c            |  7 +++++++
 net/core/fib_rules.c               |  7 +++++++
 net/ipv4/netfilter/ipt_CLUSTERIP.c |  2 ++
 net/ipv6/xfrm6_tunnel.c            | 12 ++++++++++++
 net/key/af_key.c                   |  2 +-
 net/l2tp/l2tp_core.c               |  6 ++++++
 net/netfilter/nf_tables_api.c      |  9 +++++++++
 net/netfilter/nfnetlink_log.c      |  6 ++++++
 net/netfilter/nfnetlink_queue.c    |  7 +++++++
 net/netfilter/x_tables.c           | 10 ++++++++++
 net/netfilter/xt_hashlimit.c       |  4 ++++
 net/netfilter/xt_recent.c          |  4 ++++
 net/packet/af_packet.c             |  2 ++
 net/phonet/pn_dev.c                |  4 ++++
 net/sunrpc/sunrpc_syms.c           |  4 ++++
 21 files changed, 106 insertions(+), 1 deletion(-)

-- 
2.7.4


