Re: [PATCH net] ipv6: fix a dst leak when removing its exception

2018-11-14 Thread Xin Long
On Thu, Nov 15, 2018 at 3:33 PM David Ahern  wrote:
>
> On 11/14/18 11:03 AM, David Ahern wrote:
> > On 11/13/18 8:48 AM, Xin Long wrote:
> >> There is no need to hold the dst before calling rt6_remove_exception_rt().
> >> The call to dst_hold_safe() in ip6_link_failure() was for ip6_del_rt(),
> >> which has been removed in Commit 93531c674315 ("net/ipv6: separate
> >> handling of FIB entries from dst based routes"). Otherwise, it will
> >> cause a dst leak.
> >>
> >> This patch is to simply remove the dst_hold_safe() call before calling
> >> rt6_remove_exception_rt() and also do the same in ip6_del_cached_rt().
> >> It's safe, because the removal of the exception that holds its dst's
> >> refcnt is protected by rt6_exception_lock.
> >>
> >> Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
> >> Fixes: 23fb93a4d3f1 ("net/ipv6: Cleanup exception and cache route handling")
> >> Reported-by: Li Shuang 
> >> Signed-off-by: Xin Long 
> >> ---
> >>  net/ipv6/route.c | 7 +++
> >>  1 file changed, 3 insertions(+), 4 deletions(-)
> >
> > was this problem actually hit or is this patch based on a code analysis?
> >
>
> I ask because I have not been able to reproduce the leak using existing
> tests (e.g., pmtu) that I know create exceptions.
>
> If this problem was hit, it would be good to get a test case for it.
Attached is ip6_dst.sh, which uses IPVS.

# sh ip6_dst.sh

But this one triggers the kernel warning below, which has two causes:
   unregister_netdevice: waiting for br0 to become free. Usage count = 3

1. One is in IPVS; I just posted the fix:
   https://patchwork.ozlabs.org/patch/998123/  [1]
2. The other is in IPv6, where ip6_link_failure() is hit.

To reproduce this cleanly, you may want to apply patch [1] first.


ip6_dst.sh
Description: Bourne shell script


Re: [PATCH v3 net-next 0/4] net: batched receive in GRO path

2018-11-14 Thread Eric Dumazet



On 11/14/2018 10:07 AM, Edward Cree wrote:
>
> Conclusion:
> * TCP b/w is 16.5% faster for traffic which cannot be coalesced by GRO.
>

But only for traffic that actually was a perfect GRO candidate, right?

Now what happens if all the packets you are batching are hitting different TCP sockets?

(DDOS attack patterns)

By the time we build a list of 64 packets, the first packets in the list
won't be in L1 cache anymore (32 KB, 8-way associative typically), and we
will probably see cache thrashing.



Re: [PATCH net] l2tp: fix a sock refcnt leak in l2tp_tunnel_register

2018-11-14 Thread David Miller
From: Xin Long 
Date: Tue, 13 Nov 2018 01:08:25 +0800

> This issue happens when trying to add an existing tunnel. It
> doesn't call sock_put() before returning -EEXIST to release
> the sock refcnt that was held by calling sock_hold() before
> the existence check.
> 
> This patch is to fix it by holding the sock after doing the
> existence check.
> 
> Fixes: f6cd651b056f ("l2tp: fix race in duplicate tunnel detection")
> Reported-by: Jianlin Shi 
> Signed-off-by: Xin Long 

Applied and queued up for -stable.


Re: [PATCH net-next 2/2] net/sched: act_police: don't use spinlock in the data path

2018-11-14 Thread Eric Dumazet



On 09/13/2018 10:29 AM, Davide Caratti wrote:
> use RCU instead of spinlocks, to protect concurrent read/write on
> act_police configuration. This reduces the effects of contention in the
> data path, in case multiple readers are present.
> 
> Signed-off-by: Davide Caratti 
> ---
>  net/sched/act_police.c | 156 -
>  1 file changed, 92 insertions(+), 64 deletions(-)
> 

I must be missing something obvious with this patch.

How can the following piece of code in tcf_police_act() possibly run
without a spinlock or something preventing multiple CPUs from messing badly
with the state variables?


now = ktime_get_ns();
toks = min_t(s64, now - p->tcfp_t_c, p->tcfp_burst);
if (p->peak_present) {
        ptoks = toks + p->tcfp_ptoks;
        if (ptoks > p->tcfp_mtu_ptoks)
                ptoks = p->tcfp_mtu_ptoks;
        ptoks -= (s64)psched_l2t_ns(&p->peak,
                                    qdisc_pkt_len(skb));
}
toks += p->tcfp_toks;
if (toks > p->tcfp_burst)
        toks = p->tcfp_burst;
toks -= (s64)psched_l2t_ns(&p->rate, qdisc_pkt_len(skb));
if ((toks|ptoks) >= 0) {
        p->tcfp_t_c = now;
        p->tcfp_toks = toks;
        p->tcfp_ptoks = ptoks;
        ret = p->tcfp_result;
        goto inc_drops;
}



Re: [patch net-next] net: 8021q: move vlan offload registrations into vlan_core

2018-11-14 Thread David Ahern
On 11/13/18 2:22 PM, Jiri Pirko wrote:
> From: Jiri Pirko 
> 
> Currently, the vlan packet offloads are registered only upon 8021q module
> load. However, even without this module loaded, the offloads could be
> utilized, for example by the openvswitch datapath. As reported by Michael,
> this change gives a 2x to 5x performance improvement, depending on the testcase.
> 
> So move the vlan offload registrations into vlan_core and make this
> available even without 8021q module loaded.
> 
> Reported-by: Michael Shteinbok 
> Signed-off-by: Jiri Pirko 
> Tested-by: Michael Shteinbok 
> ---
>  net/8021q/vlan.c  | 96 -
>  net/8021q/vlan_core.c | 99 +++
>  2 files changed, 99 insertions(+), 96 deletions(-)
> 

Reviewed-by: David Ahern 


Re: [PATCH net] ipv6: fix a dst leak when removing its exception

2018-11-14 Thread David Ahern
On 11/14/18 11:03 AM, David Ahern wrote:
> On 11/13/18 8:48 AM, Xin Long wrote:
>> There is no need to hold the dst before calling rt6_remove_exception_rt().
>> The call to dst_hold_safe() in ip6_link_failure() was for ip6_del_rt(),
>> which has been removed in Commit 93531c674315 ("net/ipv6: separate
>> handling of FIB entries from dst based routes"). Otherwise, it will
>> cause a dst leak.
>>
>> This patch is to simply remove the dst_hold_safe() call before calling
>> rt6_remove_exception_rt() and also do the same in ip6_del_cached_rt().
>> It's safe, because the removal of the exception that holds its dst's
>> refcnt is protected by rt6_exception_lock.
>>
>> Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
>> Fixes: 23fb93a4d3f1 ("net/ipv6: Cleanup exception and cache route handling")
>> Reported-by: Li Shuang 
>> Signed-off-by: Xin Long 
>> ---
>>  net/ipv6/route.c | 7 +++
>>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> was this problem actually hit or is this patch based on a code analysis?
> 

I ask because I have not been able to reproduce the leak using existing
tests (e.g., pmtu) that I know create exceptions.

If this problem was hit, it would be good to get a test case for it.


[PATCH net-next 2/7] net: sched: gred: pass extack to nla_parse_nested()

2018-11-14 Thread Jakub Kicinski
In case netlink wants to provide a parsing error, pass extack
to nla_parse_nested().

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 net/sched/sch_gred.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 22110be9d285..9f6a4ddd262a 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -406,7 +406,7 @@ static int gred_change(struct Qdisc *sch, struct nlattr *opt,
if (opt == NULL)
return -EINVAL;
 
-   err = nla_parse_nested(tb, TCA_GRED_MAX, opt, gred_policy, NULL);
+   err = nla_parse_nested(tb, TCA_GRED_MAX, opt, gred_policy, extack);
if (err < 0)
return err;
 
@@ -476,7 +476,7 @@ static int gred_init(struct Qdisc *sch, struct nlattr *opt,
if (!opt)
return -EINVAL;
 
-   err = nla_parse_nested(tb, TCA_GRED_MAX, opt, gred_policy, NULL);
+   err = nla_parse_nested(tb, TCA_GRED_MAX, opt, gred_policy, extack);
if (err < 0)
return err;
 
-- 
2.17.1



[PATCH net-next 6/7] net: sched: gred: store red flags per virtual queue

2018-11-14 Thread Jakub Kicinski
Right now ECN marking and HARD drop (the common RED flags) can only
be configured for the entire Qdisc.  In preparation for per-vq flags,
store the values in the virtual queue structure.  Setting per-vq
flags will only be allowed when no flags are set for the entire Qdisc.
For the new flags we will also make sure undefined bits are 0.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 net/sched/sch_gred.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index dc09a32c4b4f..47133106c7e2 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -29,12 +29,15 @@
 #define GRED_DEF_PRIO (MAX_DPs / 2)
 #define GRED_VQ_MASK (MAX_DPs - 1)
 
+#define GRED_VQ_RED_FLAGS  (TC_RED_ECN | TC_RED_HARDDROP)
+
 struct gred_sched_data;
 struct gred_sched;
 
 struct gred_sched_data {
u32 limit;  /* HARD maximal queue length*/
u32 DP; /* the drop parameters */
+   u32 red_flags;  /* virtualQ version of red_flags */
u64 bytesin;/* bytes seen on virtualQ so far*/
u32 packetsin;  /* packets seen on virtualQ so far*/
u32 backlog;/* bytes on the virtualQ */
@@ -139,14 +142,14 @@ static inline void gred_store_wred_set(struct gred_sched *table,
table->wred_set.qidlestart = q->vars.qidlestart;
 }
 
-static inline int gred_use_ecn(struct gred_sched *t)
+static int gred_use_ecn(struct gred_sched_data *q)
 {
-   return t->red_flags & TC_RED_ECN;
+   return q->red_flags & TC_RED_ECN;
 }
 
-static inline int gred_use_harddrop(struct gred_sched *t)
+static int gred_use_harddrop(struct gred_sched_data *q)
 {
-   return t->red_flags & TC_RED_HARDDROP;
+   return q->red_flags & TC_RED_HARDDROP;
 }
 
 static int gred_enqueue(struct sk_buff *skb, struct Qdisc *sch,
@@ -212,7 +215,7 @@ static int gred_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
case RED_PROB_MARK:
qdisc_qstats_overlimit(sch);
-   if (!gred_use_ecn(t) || !INET_ECN_set_ce(skb)) {
+   if (!gred_use_ecn(q) || !INET_ECN_set_ce(skb)) {
q->stats.prob_drop++;
goto congestion_drop;
}
@@ -222,7 +225,7 @@ static int gred_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 
case RED_HARD_MARK:
qdisc_qstats_overlimit(sch);
-   if (gred_use_harddrop(t) || !gred_use_ecn(t) ||
+   if (gred_use_harddrop(q) || !gred_use_ecn(q) ||
!INET_ECN_set_ce(skb)) {
q->stats.forced_drop++;
goto congestion_drop;
@@ -305,6 +308,7 @@ static int gred_change_table_def(struct Qdisc *sch, struct nlattr *dps,
 {
struct gred_sched *table = qdisc_priv(sch);
struct tc_gred_sopt *sopt;
+   bool red_flags_changed;
int i;
 
if (!dps)
@@ -329,6 +333,7 @@ static int gred_change_table_def(struct Qdisc *sch, struct nlattr *dps,
sch_tree_lock(sch);
table->DPs = sopt->DPs;
table->def = sopt->def_DP;
+   red_flags_changed = table->red_flags != sopt->flags;
table->red_flags = sopt->flags;
 
/*
@@ -348,6 +353,12 @@ static int gred_change_table_def(struct Qdisc *sch, struct nlattr *dps,
gred_disable_wred_mode(table);
}
 
+   if (red_flags_changed)
+   for (i = 0; i < table->DPs; i++)
+   if (table->tab[i])
+   table->tab[i]->red_flags =
+   table->red_flags & GRED_VQ_RED_FLAGS;
+
for (i = table->DPs; i < MAX_DPs; i++) {
if (table->tab[i]) {
pr_warn("GRED: Warning: Destroying shadowed VQ 0x%x\n",
@@ -379,6 +390,7 @@ static inline int gred_change_vq(struct Qdisc *sch, int dp,
*prealloc = NULL;
if (!q)
return -ENOMEM;
+   q->red_flags = table->red_flags & GRED_VQ_RED_FLAGS;
}
 
q->DP = dp;
-- 
2.17.1



[PATCH net-next 3/7] net: sched: gred: use extack to provide more details on configuration errors

2018-11-14 Thread Jakub Kicinski
Add extack messages to -EINVAL errors, to help users identify
their mistakes.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 net/sched/sch_gred.c | 44 +---
 1 file changed, 33 insertions(+), 11 deletions(-)

diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 9f6a4ddd262a..3d7bd374b303 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -300,7 +300,8 @@ static inline void gred_destroy_vq(struct gred_sched_data *q)
kfree(q);
 }
 
-static inline int gred_change_table_def(struct Qdisc *sch, struct nlattr *dps)
+static int gred_change_table_def(struct Qdisc *sch, struct nlattr *dps,
+struct netlink_ext_ack *extack)
 {
struct gred_sched *table = qdisc_priv(sch);
struct tc_gred_sopt *sopt;
@@ -311,9 +312,19 @@ static inline int gred_change_table_def(struct Qdisc *sch, struct nlattr *dps)
 
sopt = nla_data(dps);
 
-   if (sopt->DPs > MAX_DPs || sopt->DPs == 0 ||
-   sopt->def_DP >= sopt->DPs)
+   if (sopt->DPs > MAX_DPs) {
+   NL_SET_ERR_MSG_MOD(extack, "number of virtual queues too high");
return -EINVAL;
+   }
+   if (sopt->DPs == 0) {
+   NL_SET_ERR_MSG_MOD(extack,
+  "number of virtual queues can't be 0");
+   return -EINVAL;
+   }
+   if (sopt->def_DP >= sopt->DPs) {
+   NL_SET_ERR_MSG_MOD(extack, "default virtual queue above virtual queue count");
+   return -EINVAL;
+   }
 
sch_tree_lock(sch);
table->DPs = sopt->DPs;
@@ -352,13 +363,16 @@ static inline int gred_change_table_def(struct Qdisc *sch, struct nlattr *dps)
 static inline int gred_change_vq(struct Qdisc *sch, int dp,
 struct tc_gred_qopt *ctl, int prio,
 u8 *stab, u32 max_P,
-struct gred_sched_data **prealloc)
+struct gred_sched_data **prealloc,
+struct netlink_ext_ack *extack)
 {
struct gred_sched *table = qdisc_priv(sch);
struct gred_sched_data *q = table->tab[dp];
 
-   if (!red_check_params(ctl->qth_min, ctl->qth_max, ctl->Wlog))
+   if (!red_check_params(ctl->qth_min, ctl->qth_max, ctl->Wlog)) {
+   NL_SET_ERR_MSG_MOD(extack, "invalid RED parameters");
return -EINVAL;
+   }
 
if (!q) {
table->tab[dp] = q = *prealloc;
@@ -413,21 +427,25 @@ static int gred_change(struct Qdisc *sch, struct nlattr *opt,
if (tb[TCA_GRED_PARMS] == NULL && tb[TCA_GRED_STAB] == NULL) {
if (tb[TCA_GRED_LIMIT] != NULL)
sch->limit = nla_get_u32(tb[TCA_GRED_LIMIT]);
-   return gred_change_table_def(sch, tb[TCA_GRED_DPS]);
+   return gred_change_table_def(sch, tb[TCA_GRED_DPS], extack);
}
 
if (tb[TCA_GRED_PARMS] == NULL ||
tb[TCA_GRED_STAB] == NULL ||
-   tb[TCA_GRED_LIMIT] != NULL)
+   tb[TCA_GRED_LIMIT] != NULL) {
+   NL_SET_ERR_MSG_MOD(extack, "can't configure Qdisc and virtual queue at the same time");
return -EINVAL;
+   }
 
max_P = tb[TCA_GRED_MAX_P] ? nla_get_u32(tb[TCA_GRED_MAX_P]) : 0;
 
ctl = nla_data(tb[TCA_GRED_PARMS]);
stab = nla_data(tb[TCA_GRED_STAB]);
 
-   if (ctl->DP >= table->DPs)
+   if (ctl->DP >= table->DPs) {
+   NL_SET_ERR_MSG_MOD(extack, "virtual queue index above virtual queue count");
return -EINVAL;
+   }
 
if (gred_rio_mode(table)) {
if (ctl->prio == 0) {
@@ -447,7 +465,8 @@ static int gred_change(struct Qdisc *sch, struct nlattr *opt,
prealloc = kzalloc(sizeof(*prealloc), GFP_KERNEL);
sch_tree_lock(sch);
 
-   err = gred_change_vq(sch, ctl->DP, ctl, prio, stab, max_P, &prealloc);
+   err = gred_change_vq(sch, ctl->DP, ctl, prio, stab, max_P, &prealloc,
+extack);
if (err < 0)
goto err_unlock_free;
 
@@ -480,8 +499,11 @@ static int gred_init(struct Qdisc *sch, struct nlattr *opt,
if (err < 0)
return err;
 
-   if (tb[TCA_GRED_PARMS] || tb[TCA_GRED_STAB])
+   if (tb[TCA_GRED_PARMS] || tb[TCA_GRED_STAB]) {
+   NL_SET_ERR_MSG_MOD(extack,
+  "virtual queue configuration can't be specified at initialization time");
return -EINVAL;
+   }
 
if (tb[TCA_GRED_LIMIT])
sch->limit = nla_get_u32(tb[TCA_GRED_LIMIT]);
@@ -489,7 +511,7 @@ static int gred_init(struct Qdisc *sch, struct nlattr *opt,
sch->limit = qdisc_dev(sch)->tx_queue_len
 * psched_mtu(qdisc_dev(sch));
 
-   return gred_change_table_def(sch, tb[TCA_GRED_DPS]);
+   return gred_change_table_def(sch, tb[TCA_GRED_DPS], extack);

[PATCH net-next 1/7] net: sched: gred: separate error and non-error path in gred_change()

2018-11-14 Thread Jakub Kicinski
We will soon want to add more code to the non-error path; separate
it from the error-handling flow.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 net/sched/sch_gred.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 4a042abf844c..22110be9d285 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -423,12 +423,11 @@ static int gred_change(struct Qdisc *sch, struct nlattr *opt,
 
max_P = tb[TCA_GRED_MAX_P] ? nla_get_u32(tb[TCA_GRED_MAX_P]) : 0;
 
-   err = -EINVAL;
ctl = nla_data(tb[TCA_GRED_PARMS]);
stab = nla_data(tb[TCA_GRED_STAB]);
 
if (ctl->DP >= table->DPs)
-   goto errout;
+   return -EINVAL;
 
if (gred_rio_mode(table)) {
if (ctl->prio == 0) {
@@ -450,7 +449,7 @@ static int gred_change(struct Qdisc *sch, struct nlattr *opt,
 
err = gred_change_vq(sch, ctl->DP, ctl, prio, stab, max_P, &prealloc);
if (err < 0)
-   goto errout_locked;
+   goto err_unlock_free;
 
if (gred_rio_mode(table)) {
gred_disable_wred_mode(table);
@@ -458,12 +457,13 @@ static int gred_change(struct Qdisc *sch, struct nlattr *opt,
gred_enable_wred_mode(table);
}
 
-   err = 0;
+   sch_tree_unlock(sch);
+   kfree(prealloc);
+   return 0;
 
-errout_locked:
+err_unlock_free:
sch_tree_unlock(sch);
kfree(prealloc);
-errout:
return err;
 }
 
-- 
2.17.1



[PATCH net-next 5/7] net: sched: gred: provide a better structured dump and expose stats

2018-11-14 Thread Jakub Kicinski
Currently all GRED's virtual queue data is dumped in a single
array in a single attribute.  This makes it pretty much impossible
to add new fields.  In order to expose more detailed stats add a
new set of attributes.  We can now expose the 64 bit value of bytesin
and all the mark stats which were not part of the original design.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 include/uapi/linux/pkt_sched.h | 26 +
 net/sched/sch_gred.c   | 53 +-
 2 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index ee017bc057a3..c8f717346b60 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -291,11 +291,37 @@ enum {
TCA_GRED_DPS,
TCA_GRED_MAX_P,
TCA_GRED_LIMIT,
+   TCA_GRED_VQ_LIST,   /* nested TCA_GRED_VQ_ENTRY */
__TCA_GRED_MAX,
 };
 
 #define TCA_GRED_MAX (__TCA_GRED_MAX - 1)
 
+enum {
+   TCA_GRED_VQ_ENTRY_UNSPEC,
+   TCA_GRED_VQ_ENTRY,  /* nested TCA_GRED_VQ_* */
+   __TCA_GRED_VQ_ENTRY_MAX,
+};
+#define TCA_GRED_VQ_ENTRY_MAX (__TCA_GRED_VQ_ENTRY_MAX - 1)
+
+enum {
+   TCA_GRED_VQ_UNSPEC,
+   TCA_GRED_VQ_PAD,
+   TCA_GRED_VQ_DP, /* u32 */
+   TCA_GRED_VQ_STAT_BYTES, /* u64 */
+   TCA_GRED_VQ_STAT_PACKETS,   /* u32 */
+   TCA_GRED_VQ_STAT_BACKLOG,   /* u32 */
+   TCA_GRED_VQ_STAT_PROB_DROP, /* u32 */
+   TCA_GRED_VQ_STAT_PROB_MARK, /* u32 */
+   TCA_GRED_VQ_STAT_FORCED_DROP,   /* u32 */
+   TCA_GRED_VQ_STAT_FORCED_MARK,   /* u32 */
+   TCA_GRED_VQ_STAT_PDROP, /* u32 */
+   TCA_GRED_VQ_STAT_OTHER, /* u32 */
+   __TCA_GRED_VQ_MAX
+};
+
+#define TCA_GRED_VQ_MAX (__TCA_GRED_VQ_MAX - 1)
+
 struct tc_gred_qopt {
__u32   limit;/* HARD maximal queue length (bytes)*/
__u32   qth_min;  /* Min average length threshold (bytes) */
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 6f209c83ee7a..dc09a32c4b4f 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -404,6 +404,7 @@ static const struct nla_policy gred_policy[TCA_GRED_MAX + 1] = {
[TCA_GRED_DPS]  = { .len = sizeof(struct tc_gred_sopt) },
[TCA_GRED_MAX_P]= { .type = NLA_U32 },
[TCA_GRED_LIMIT]= { .type = NLA_U32 },
+   [TCA_GRED_VQ_LIST]  = { .type = NLA_REJECT },
 };
 
 static int gred_change(struct Qdisc *sch, struct nlattr *opt,
@@ -517,7 +518,7 @@ static int gred_init(struct Qdisc *sch, struct nlattr *opt,
 static int gred_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
struct gred_sched *table = qdisc_priv(sch);
-   struct nlattr *parms, *opts = NULL;
+   struct nlattr *parms, *vqs, *opts = NULL;
int i;
u32 max_p[MAX_DPs];
struct tc_gred_sopt sopt = {
@@ -544,6 +545,7 @@ static int gred_dump(struct Qdisc *sch, struct sk_buff *skb)
if (nla_put_u32(skb, TCA_GRED_LIMIT, sch->limit))
goto nla_put_failure;
 
+   /* Old style all-in-one dump of VQs */
parms = nla_nest_start(skb, TCA_GRED_PARMS);
if (parms == NULL)
goto nla_put_failure;
@@ -594,6 +596,55 @@ static int gred_dump(struct Qdisc *sch, struct sk_buff *skb)
 
nla_nest_end(skb, parms);
 
+   /* Dump the VQs again, in more structured way */
+   vqs = nla_nest_start(skb, TCA_GRED_VQ_LIST);
+   if (!vqs)
+   goto nla_put_failure;
+
+   for (i = 0; i < MAX_DPs; i++) {
+   struct gred_sched_data *q = table->tab[i];
+   struct nlattr *vq;
+
+   if (!q)
+   continue;
+
+   vq = nla_nest_start(skb, TCA_GRED_VQ_ENTRY);
+   if (!vq)
+   goto nla_put_failure;
+
+   if (nla_put_u32(skb, TCA_GRED_VQ_DP, q->DP))
+   goto nla_put_failure;
+
+   /* Stats */
+   if (nla_put_u64_64bit(skb, TCA_GRED_VQ_STAT_BYTES, q->bytesin,
+ TCA_GRED_VQ_PAD))
+   goto nla_put_failure;
+   if (nla_put_u32(skb, TCA_GRED_VQ_STAT_PACKETS, q->packetsin))
+   goto nla_put_failure;
+   if (nla_put_u32(skb, TCA_GRED_VQ_STAT_BACKLOG,
+   gred_backlog(table, q, sch)))
+   goto nla_put_failure;
+   if (nla_put_u32(skb, TCA_GRED_VQ_STAT_PROB_DROP,
+   q->stats.prob_drop))
+   goto nla_put_failure;
+   if (nla_put_u32(skb, TCA_GRED_VQ_STAT_PROB_MARK,
+   q->stats.prob_mark))
+   goto nla_put_failure;
+   if (nla_put_u32(skb, TCA_GRED_VQ_STAT_FORCED_DROP,
+   q->stats.forced_drop))
+   goto nla_put_failure;

[PATCH net-next 7/7] net: sched: gred: allow manipulating per-DP RED flags

2018-11-14 Thread Jakub Kicinski
Allow users to set and dump RED flags (ECN enabled and harddrop)
on per-virtual queue basis.  Validation of attributes is split
from changes to make sure we won't have to undo previous operations
when we find out configuration is invalid.

The objective is to allow changing per-Qdisc parameters without
overwriting the per-vq configured flags.

Old user space will not pass the TCA_GRED_VQ_FLAGS attribute and
per-Qdisc flags will always get propagated to the virtual queues.

New user space which wants to make use of per-vq flags should set
per-Qdisc flags to 0 and then configure per-vq flags as it
sees fit.  Once per-vq flags are set per-Qdisc flags can't be
changed to non-zero.  Vice versa - if the per-Qdisc flags are
non-zero the TCA_GRED_VQ_FLAGS attribute has to either be omitted
or set to the same value as per-Qdisc flags.

Update per-Qdisc parameters:
per-Qdisc | per-VQ | result
0 |  0 | all vq flags updated
0 |  non-0 | error (vq flags in use)
non-0 |  0 | -- impossible --
non-0 |  non-0 | all vq flags updated

Update per-VQ state (flags parameter not specified):
   no change to flags

Update per-VQ state (flags parameter set):
per-Qdisc | per-VQ | result
0 |   any  | per-vq flags updated
non-0 |  0 | -- impossible --
non-0 |  non-0 | error (per-Qdisc flags in use)

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 include/uapi/linux/pkt_sched.h |   1 +
 net/sched/sch_gred.c   | 144 -
 2 files changed, 144 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index c8f717346b60..0d18b1d1fbbc 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -317,6 +317,7 @@ enum {
TCA_GRED_VQ_STAT_FORCED_MARK,   /* u32 */
TCA_GRED_VQ_STAT_PDROP, /* u32 */
TCA_GRED_VQ_STAT_OTHER, /* u32 */
+   TCA_GRED_VQ_FLAGS,  /* u32 */
__TCA_GRED_VQ_MAX
 };
 
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 47133106c7e2..8b8c325f48bc 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -152,6 +152,19 @@ static int gred_use_harddrop(struct gred_sched_data *q)
return q->red_flags & TC_RED_HARDDROP;
 }
 
+static bool gred_per_vq_red_flags_used(struct gred_sched *table)
+{
+   unsigned int i;
+
+   /* Local per-vq flags couldn't have been set unless global are 0 */
+   if (table->red_flags)
+   return false;
+   for (i = 0; i < MAX_DPs; i++)
+   if (table->tab[i] && table->tab[i]->red_flags)
+   return true;
+   return false;
+}
+
 static int gred_enqueue(struct sk_buff *skb, struct Qdisc *sch,
struct sk_buff **to_free)
 {
@@ -329,6 +342,10 @@ static int gred_change_table_def(struct Qdisc *sch, struct nlattr *dps,
NL_SET_ERR_MSG_MOD(extack, "default virtual queue above virtual queue count");
return -EINVAL;
}
+   if (sopt->flags && gred_per_vq_red_flags_used(table)) {
+   NL_SET_ERR_MSG_MOD(extack, "can't set per-Qdisc RED flags when per-virtual queue flags are used");
+   return -EINVAL;
+   }
 
sch_tree_lock(sch);
table->DPs = sopt->DPs;
@@ -410,15 +427,127 @@ static inline int gred_change_vq(struct Qdisc *sch, int dp,
return 0;
 }
 
+static const struct nla_policy gred_vq_policy[TCA_GRED_VQ_MAX + 1] = {
+   [TCA_GRED_VQ_DP]= { .type = NLA_U32 },
+   [TCA_GRED_VQ_FLAGS] = { .type = NLA_U32 },
+};
+
+static const struct nla_policy gred_vqe_policy[TCA_GRED_VQ_ENTRY_MAX + 1] = {
+   [TCA_GRED_VQ_ENTRY] = { .type = NLA_NESTED },
+};
+
 static const struct nla_policy gred_policy[TCA_GRED_MAX + 1] = {
[TCA_GRED_PARMS]= { .len = sizeof(struct tc_gred_qopt) },
[TCA_GRED_STAB] = { .len = 256 },
[TCA_GRED_DPS]  = { .len = sizeof(struct tc_gred_sopt) },
[TCA_GRED_MAX_P]= { .type = NLA_U32 },
[TCA_GRED_LIMIT]= { .type = NLA_U32 },
-   [TCA_GRED_VQ_LIST]  = { .type = NLA_REJECT },
+   [TCA_GRED_VQ_LIST]  = { .type = NLA_NESTED },
 };
 
+static void gred_vq_apply(struct gred_sched *table, const struct nlattr *entry)
+{
+   struct nlattr *tb[TCA_GRED_VQ_MAX + 1];
+   u32 dp;
+
+   nla_parse_nested(tb, TCA_GRED_VQ_MAX, entry, gred_vq_policy, NULL);
+
+   dp = nla_get_u32(tb[TCA_GRED_VQ_DP]);
+
+   if (tb[TCA_GRED_VQ_FLAGS])
+   table->tab[dp]->red_flags = nla_get_u32(tb[TCA_GRED_VQ_FLAGS]);
+}
+
+static void gred_vqs_apply(struct gred_sched *table, struct nlattr *vqs)
+{
+   const struct nlattr *attr;
+   int rem;
+
+   nla_for_each_nested(attr, vqs, rem) {
+   switch (nla_type(attr)) {
+   case TCA_GRED_VQ_ENTRY:
+   gred_vq_apply(table, attr);
+   break;

[PATCH net-next 0/7] net: sched: gred: introduce per-virtual queue attributes

2018-11-14 Thread Jakub Kicinski
Hi!

This series updates the GRED Qdisc.  The Qdisc matches nfp offload very
well, but before we can offload it there are a number of improvements
to make.

First few patches add extack messages to the Qdisc and pass extack
to netlink validation.

Next a new netlink attribute group is added, to allow GRED to be
extended more easily.  Currently GRED passes C structures as attributes,
and even an array of C structs for virtual queue configuration.  User
space has hard coded the expected length of that array, so adding new
fields is not possible.

New two-level attribute group is added:

  [TCA_GRED_VQ_LIST]
[TCA_GRED_VQ_ENTRY]
  [TCA_GRED_VQ_DP]
  [TCA_GRED_VQ_FLAGS]
  [TCA_GRED_VQ_STAT_*]
[TCA_GRED_VQ_ENTRY]
  [TCA_GRED_VQ_DP]
  [TCA_GRED_VQ_FLAGS]
  [TCA_GRED_VQ_STAT_*]
[TCA_GRED_VQ_ENTRY]
   ...

Statistics are dump only. Patch 4 switches the byte counts to be 64 bit,
and patch 5 introduces the new stats attributes for dump.  Patch 6
switches RED flags to be per-virtual queue, and patch 7 allows them
to be dumped and set at virtual queue granularity.


Jakub Kicinski (7):
  net: sched: gred: separate error and non-error path in gred_change()
  net: sched: gred: pass extack to nla_parse_nested()
  net: sched: gred: use extack to provide more details on configuration
errors
  net: sched: gred: store bytesin as a 64 bit value
  net: sched: gred: provide a better structured dump and expose stats
  net: sched: gred: store red flags per virtual queue
  net: sched: gred: allow manipulating per-DP RED flags

 include/uapi/linux/pkt_sched.h |  27 
 net/sched/sch_gred.c   | 281 +
 2 files changed, 281 insertions(+), 27 deletions(-)

-- 
2.17.1



[PATCH net-next 4/7] net: sched: gred: store bytesin as a 64 bit value

2018-11-14 Thread Jakub Kicinski
32 bit counters for bytes are not really going to last long in the modern
world.  Make sch_gred count bytes on a 64 bit counter.  It will still
get truncated during dump, but a follow-up patch will add a set of new
stat dump attributes.

Signed-off-by: Jakub Kicinski 
Reviewed-by: John Hurley 
---
 net/sched/sch_gred.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 3d7bd374b303..6f209c83ee7a 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -35,7 +35,7 @@ struct gred_sched;
 struct gred_sched_data {
u32 limit;  /* HARD maximal queue length*/
u32 DP; /* the drop parameters */
-   u32 bytesin;/* bytes seen on virtualQ so far*/
+   u64 bytesin;/* bytes seen on virtualQ so far*/
u32 packetsin;  /* packets seen on virtualQ so far*/
u32 backlog;/* bytes on the virtualQ */
u8  prio;   /* the prio of this vq */
-- 
2.17.1





Re: [PATCH net-next 6/8] net: eth: altera: tse: add support for ptp and timestamping

2018-11-14 Thread Richard Cochran
On Wed, Nov 14, 2018 at 04:50:45PM -0800, Dalon Westergreen wrote:
> From: Dalon Westergreen 
> 
> Add support for the ptp clock used with the tse, and update
> the driver to support timestamping when enabled.  We also
> enable debugfs entries for the ptp clock to allow some user
> control and interaction with the ptp clock.
> 
> Signed-off-by: Dalon Westergreen 
> ---
>  drivers/net/ethernet/altera/Kconfig   |   1 +
>  drivers/net/ethernet/altera/Makefile  |   3 +-
>  drivers/net/ethernet/altera/altera_ptp.c  | 473 ++
>  drivers/net/ethernet/altera/altera_ptp.h  |  77 +++

Wouldn't it be nice if the PTP maintainer were included on CC for new
PTP drivers?  One can always wish...

> diff --git a/drivers/net/ethernet/altera/Kconfig b/drivers/net/ethernet/altera/Kconfig
> index fdddba51473e..36aee0fc0b51 100644
> --- a/drivers/net/ethernet/altera/Kconfig
> +++ b/drivers/net/ethernet/altera/Kconfig
> @@ -2,6 +2,7 @@ config ALTERA_TSE
>   tristate "Altera Triple-Speed Ethernet MAC support"
>   depends on HAS_DMA
>   select PHYLIB
> + select PTP_1588_CLOCK

We now use "imply" instead of "select".

> +/* A fine ToD HW clock offset adjustment.
> + * To perform the fine offset adjustment the AdjustPeriod register is used
> + * to replace the Period register for AdjustCount clock cycles in hardware.
> + */
> +static int fine_adjust_tod_clock(struct altera_ptp_private *priv,
> +  u32 adjust_period, u32 adjust_count)

The function naming is really poor throughout this driver.  You should
use a unique prefix that relates to your driver or HW.

> +static int adjust_fine(struct ptp_clock_info *ptp, long scaled_ppm)
> +{

This is named like a global function.  Needs prefix.

> + struct altera_ptp_private *priv =
> + container_of(ptp, struct altera_ptp_private, ptp_clock_ops);
> + unsigned long flags;
> + int ret = 0;
> + u64 ppb;
> + u32 tod_period;
> + u32 tod_rem;
> + u32 tod_drift_adjust_fns;
> + u32 tod_drift_adjust_rate;
> + unsigned long rate;

Please put all like types together on one line, and use reverse
Christmas tree style.

> +
> + priv->scaled_ppm = scaled_ppm;
> +
> + /* only unlock if it is currently locked */
> + if (mutex_is_locked(&priv->ppm_mutex))
> + mutex_unlock(&priv->ppm_mutex);
> +
> + if (!priv->ptp_correct_freq)
> + goto out;
> +
> + rate = clk_get_rate(priv->tod_clk);
> +
> + /* From scaled_ppm_to_ppb */
> + ppb = 1 + scaled_ppm;
> + ppb *= 125;
> + ppb >>= 13;
> +
> + ppb += NOMINAL_PPB;
> +
> + tod_period = div_u64_rem(ppb << 16, rate, &tod_rem);
> + if (tod_period > TOD_PERIOD_MAX) {
> + ret = -ERANGE;
> + goto out;
> + }
> +
> + /* The drift of ToD adjusted periodically by adding a drift_adjust_fns
> +  * correction value every drift_adjust_rate count of clock cycles.
> +  */
> + tod_drift_adjust_fns = tod_rem / gcd(tod_rem, rate);
> + tod_drift_adjust_rate = rate / gcd(tod_rem, rate);
> +
> + while ((tod_drift_adjust_fns > TOD_DRIFT_ADJUST_FNS_MAX) |
> + (tod_drift_adjust_rate > TOD_DRIFT_ADJUST_RATE_MAX)) {
> + tod_drift_adjust_fns = tod_drift_adjust_fns >> 1;
> + tod_drift_adjust_rate = tod_drift_adjust_rate >> 1;
> + }
> +
> + if (tod_drift_adjust_fns == 0)
> + tod_drift_adjust_rate = 0;
> +
> + spin_lock_irqsave(&priv->tod_lock, flags);
> + csrwr32(tod_period, priv->tod_ctrl, tod_csroffs(period));
> + csrwr32(0, priv->tod_ctrl, tod_csroffs(adjust_period));
> + csrwr32(0, priv->tod_ctrl, tod_csroffs(adjust_count));
> + csrwr32(tod_drift_adjust_fns, priv->tod_ctrl,
> + tod_csroffs(drift_adjust));
> + csrwr32(tod_drift_adjust_rate, priv->tod_ctrl,
> + tod_csroffs(drift_adjust_rate));
> + spin_unlock_irqrestore(&priv->tod_lock, flags);
> +
> +out:
> + return ret;
> +}
> +
> +static int adjust_time(struct ptp_clock_info *ptp, s64 delta)
> +{
> + struct altera_ptp_private *priv =
> + container_of(ptp, struct altera_ptp_private, ptp_clock_ops);
> + int ret = 0;
> + u64 abs_delta;
> + unsigned long flags;
> + u32 period;
> + u32 diff;
> + u64 count;
> + u32 rem;

Same here.

> + if (!priv->ptp_correct_offs)
> + goto out;
> +
> + if (delta < 0)
> + abs_delta = -delta;
> + else
> + abs_delta = delta;
> +
> + spin_lock_irqsave(&priv->tod_lock, flags);
> +
> + /* Get the maximum possible value of the Period register offset
> +  * adjustment in nanoseconds scale. This depends on the current
> +  * Period register settings and the maximum and minimum possible
> +  * values of the Period register.
> +  */
> + period = csrrd32(priv->tod_ctrl, tod_csroffs(period));
> +
> + if (delta < 0)
> + diff = (perio

[PATCH net-next] selftests: add explicit test for multiple concurrent GRO sockets

2018-11-14 Thread Paolo Abeni
This covers for proper accounting of encap needed static keys

Signed-off-by: Paolo Abeni 
---
 tools/testing/selftests/net/udpgro.sh | 34 +++
 1 file changed, 34 insertions(+)

diff --git a/tools/testing/selftests/net/udpgro.sh 
b/tools/testing/selftests/net/udpgro.sh
index e94ef8067173..aeac53a99aeb 100755
--- a/tools/testing/selftests/net/udpgro.sh
+++ b/tools/testing/selftests/net/udpgro.sh
@@ -91,6 +91,28 @@ run_one_nat() {
wait $(jobs -p)
 }
 
+run_one_2sock() {
+   # use 'rx' as separator between sender args and receiver args
+   local -r all="$@"
+   local -r tx_args=${all%rx*}
+   local -r rx_args=${all#*rx}
+
+   cfg_veth
+
+   ip netns exec "${PEER_NS}" ./udpgso_bench_rx ${rx_args} -p 12345 &
+   ip netns exec "${PEER_NS}" ./udpgso_bench_rx ${rx_args} && \
+   echo "ok" || \
+   echo "failed" &
+
+   # Hack: let bg programs complete the startup
+   sleep 0.1
+   ./udpgso_bench_tx ${tx_args} -p 12345
+   sleep 0.1
+   # first UDP GSO socket should be closed at this point
+   ./udpgso_bench_tx ${tx_args}
+   wait $(jobs -p)
+}
+
 run_nat_test() {
local -r args=$@
 
@@ -98,6 +120,13 @@ run_nat_test() {
./in_netns.sh $0 __subprocess_nat $2 rx -r $3
 }
 
+run_2sock_test() {
+   local -r args=$@
+
+   printf " %-40s" "$1"
+   ./in_netns.sh $0 __subprocess_2sock $2 rx -G -r $3
+}
+
 run_all() {
local -r core_args="-l 4"
local -r ipv4_args="${core_args} -4 -D 192.168.1.1"
@@ -120,6 +149,7 @@ run_all() {
run_test "GRO with custom segment size cmsg" "${ipv4_args} -M 1 -s 
14720 -S 500 " "-4 -n 1 -l 14720 -S 500"
 
run_nat_test "bad GRO lookup" "${ipv4_args} -M 1 -s 14720 -S 0" "-n 10 
-l 1472"
+   run_2sock_test "multiple GRO socks" "${ipv4_args} -M 1 -s 14720 -S 0 " 
"-4 -n 1 -l 14720 -S 1472"
 
echo "ipv6"
run_test "no GRO" "${ipv6_args} -M 10 -s 1400" "-n 10 -l 1400"
@@ -130,6 +160,7 @@ run_all() {
run_test "GRO with custom segment size cmsg" "${ipv6_args} -M 1 -s 
14520 -S 500" "-n 1 -l 14520 -S 500"
 
run_nat_test "bad GRO lookup" "${ipv6_args} -M 1 -s 14520 -S 0" "-n 10 
-l 1452"
+   run_2sock_test "multiple GRO socks" "${ipv6_args} -M 1 -s 14520 -S 0 " 
"-n 1 -l 14520 -S 1452"
 }
 
 if [ ! -f ../bpf/xdp_dummy.o ]; then
@@ -145,4 +176,7 @@ elif [[ $1 == "__subprocess" ]]; then
 elif [[ $1 == "__subprocess_nat" ]]; then
shift
run_one_nat $@
+elif [[ $1 == "__subprocess_2sock" ]]; then
+   shift
+   run_one_2sock $@
 fi
-- 
2.17.2



[PATCH ipsec-next] xfrm: policy: fix netlink/pf_key policy lookups

2018-11-14 Thread Florian Westphal
Colin Ian King says:
 Static analysis with CoverityScan found a potential issue [..]
 It seems that pointer pol is set to NULL and then a check to see if it
 is non-null is used to set pol to tmp; however, this check is always
 going to be false because pol is always NULL.

Fix this and update the test script to catch it. With only the script updated:
./xfrm_policy.sh ; echo $?
RTNETLINK answers: No such file or directory
FAIL: ip -net ns3 xfrm policy get src 10.0.1.0/24 dst 10.0.2.0/24 dir out
RTNETLINK answers: No such file or directory
[..]
PASS: policy before exception matches
PASS: ping to .254 bypassed ipsec tunnel
PASS: direct policy matches
PASS: policy matches
1
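The essence of the fix: with `pol` starting out NULL, the old condition `tmp && pol && tmp->pos < pol->pos` could never perform the first assignment, so the lookup always returned NULL. A minimal userspace sketch of the corrected selection loop (using a hypothetical `fake_policy` stand-in, not the kernel's `struct xfrm_policy`):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xfrm_policy: only 'pos' matters here. */
struct fake_policy {
	int pos;
};

/* Return the candidate with the smallest pos, or NULL if no sub-tree matched.
 * Mirrors the fixed loop: skip NULL candidates, then take the first match or
 * any later match with a lower pos. */
static struct fake_policy *pick_best(struct fake_policy **cand, size_t n)
{
	struct fake_policy *pol = NULL;

	for (size_t i = 0; i < n; i++) {
		struct fake_policy *tmp = cand[i];

		if (!tmp)
			continue;	/* no match in this sub-tree */

		if (!pol || tmp->pos < pol->pos)
			pol = tmp;	/* buggy version also required pol != NULL here */
	}
	return pol;
}
```

With the buggy condition, `pol` would stay NULL forever; the fixed loop seeds it on the first non-NULL candidate.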

Fixes: 6be3b0db6db ("xfrm: policy: add inexact policy search tree 
infrastructure")
Reported-by: Colin Ian King 
Signed-off-by: Florian Westphal 
---
 net/xfrm/xfrm_policy.c |  5 ++-
 tools/testing/selftests/net/xfrm_policy.sh | 38 ++
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index bd80b8a4322f..cff8c5b720b4 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1663,7 +1663,10 @@ struct xfrm_policy *xfrm_policy_bysel_ctx(struct net 
*net, u32 mark, u32 if_id,
tmp = __xfrm_policy_bysel_ctx(cand.res[i], mark,
  if_id, type, dir,
  sel, ctx);
-   if (tmp && pol && tmp->pos < pol->pos)
+   if (!tmp)
+   continue;
+
+   if (!pol || tmp->pos < pol->pos)
pol = tmp;
}
} else {
diff --git a/tools/testing/selftests/net/xfrm_policy.sh 
b/tools/testing/selftests/net/xfrm_policy.sh
index 6ca63a6e50e9..8db35b99457c 100755
--- a/tools/testing/selftests/net/xfrm_policy.sh
+++ b/tools/testing/selftests/net/xfrm_policy.sh
@@ -21,6 +21,7 @@
 # Kselftest framework requirement - SKIP code is 4.
 ksft_skip=4
 ret=0
+policy_checks_ok=1
 
 KEY_SHA=0xdeadbeef1234567890abcdefabcdefabcdefabcd
 KEY_AES=0x0123456789abcdef0123456789012345
@@ -45,6 +46,26 @@ do_esp() {
 ip -net $ns xfrm policy add src $rnet dst $lnet dir fwd tmpl src $remote 
dst $me proto esp mode tunnel priority 100 action allow
 }
 
+do_esp_policy_get_check() {
+local ns=$1
+local lnet=$2
+local rnet=$3
+
+ip -net $ns xfrm policy get src $lnet dst $rnet dir out > /dev/null
+if [ $? -ne 0 ] && [ $policy_checks_ok -eq 1 ] ;then
+policy_checks_ok=0
+echo "FAIL: ip -net $ns xfrm policy get src $lnet dst $rnet dir out"
+ret=1
+fi
+
+ip -net $ns xfrm policy get src $rnet dst $lnet dir fwd > /dev/null
+if [ $? -ne 0 ] && [ $policy_checks_ok -eq 1 ] ;then
+policy_checks_ok=0
+echo "FAIL: ip -net $ns xfrm policy get src $rnet dst $lnet dir fwd"
+ret=1
+fi
+}
+
 do_exception() {
 local ns=$1
 local me=$2
@@ -112,31 +133,31 @@ check_xfrm() {
# 1: iptables -m policy rule count != 0
rval=$1
ip=$2
-   ret=0
+   lret=0
 
ip netns exec ns1 ping -q -c 1 10.0.2.$ip > /dev/null
 
check_ipt_policy_count ns3
if [ $? -ne $rval ] ; then
-   ret=1
+   lret=1
fi
check_ipt_policy_count ns4
if [ $? -ne $rval ] ; then
-   ret=1
+   lret=1
fi
 
ip netns exec ns2 ping -q -c 1 10.0.1.$ip > /dev/null
 
check_ipt_policy_count ns3
if [ $? -ne $rval ] ; then
-   ret=1
+   lret=1
fi
check_ipt_policy_count ns4
if [ $? -ne $rval ] ; then
-   ret=1
+   lret=1
fi
 
-   return $ret
+   return $lret
 }
 
 #check for needed privileges
@@ -227,6 +248,11 @@ do_esp ns4 dead:3::10 dead:3::1 dead:2::/64 dead:1::/64 
$SPI2 $SPI1
 do_dummies4 ns3
 do_dummies6 ns4
 
+do_esp_policy_get_check ns3 10.0.1.0/24 10.0.2.0/24
+do_esp_policy_get_check ns4 10.0.2.0/24 10.0.1.0/24
+do_esp_policy_get_check ns3 dead:1::/64 dead:2::/64
+do_esp_policy_get_check ns4 dead:2::/64 dead:1::/64
+
 # ping to .254 should use ipsec, exception is not installed.
 check_xfrm 1 254
 if [ $? -ne 0 ]; then
-- 
2.18.1



[PATCH net-next] udp: fix jump label misuse

2018-11-14 Thread Paolo Abeni
The commit 60fb9567bf30 ("udp: implement complete book-keeping for
encap_needed") introduced a severe misuse of the jump label APIs, which
syzbot, as reported by Eric, was able to exploit.

When multiple sockets/processes can concurrently request (and then
disable) UDP encap, we need to track the activation count with the
*_inc()/*_dec() jump label variants, or we can experience bad things
at disable time.
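The difference between enable/disable and inc/dec semantics can be shown with a toy counter model (this is an illustration of the reference-counting idea, not the kernel's code-patching static key implementation):

```c
#include <assert.h>

/* Toy model of a static key: the kernel patches branch sites, here we just
 * keep a count. static_branch_enable()/disable() force the state regardless
 * of how many users exist; the _inc()/_dec() variants reference-count it. */
struct toy_static_key {
	int enabled;	/* number of active users */
};

static void key_inc(struct toy_static_key *k)    { k->enabled++; }
static void key_dec(struct toy_static_key *k)    { k->enabled--; }
static int  key_active(const struct toy_static_key *k) { return k->enabled > 0; }

/* Variants modeled after the enable/disable semantics the patch removes. */
static void key_enable(struct toy_static_key *k)  { k->enabled = 1; }
static void key_disable(struct toy_static_key *k) { k->enabled = 0; }
```

With enable/disable, the first socket to close encap turns the key off for everyone; with inc/dec the key stays active until the last user is gone.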

Fixes: 60fb9567bf30 ("udp: implement complete book-keeping for encap_needed")
Reported-by: Eric Dumazet 
Signed-off-by: Paolo Abeni 
---
 net/ipv4/udp.c | 4 ++--
 net/ipv6/udp.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 6f8890c5bc7e..aff2a8e99e01 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -587,7 +587,7 @@ static inline bool __udp_is_mcast_sock(struct net *net, 
struct sock *sk,
 DEFINE_STATIC_KEY_FALSE(udp_encap_needed_key);
 void udp_encap_enable(void)
 {
-   static_branch_enable(&udp_encap_needed_key);
+   static_branch_inc(&udp_encap_needed_key);
 }
 EXPORT_SYMBOL(udp_encap_enable);
 
@@ -2524,7 +2524,7 @@ void udp_destroy_sock(struct sock *sk)
encap_destroy(sk);
}
if (up->encap_enabled)
-   static_branch_disable(&udp_encap_needed_key);
+   static_branch_dec(&udp_encap_needed_key);
}
 }
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index dde51fc7ac16..09cba4cfe31f 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -448,7 +448,7 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, 
size_t len,
 DEFINE_STATIC_KEY_FALSE(udpv6_encap_needed_key);
 void udpv6_encap_enable(void)
 {
-   static_branch_enable(&udpv6_encap_needed_key);
+   static_branch_inc(&udpv6_encap_needed_key);
 }
 EXPORT_SYMBOL(udpv6_encap_enable);
 
@@ -1579,7 +1579,7 @@ void udpv6_destroy_sock(struct sock *sk)
encap_destroy(sk);
}
if (up->encap_enabled)
-   static_branch_disable(&udpv6_encap_needed_key);
+   static_branch_dec(&udpv6_encap_needed_key);
}
 
inet6_destroy_sock(sk);
-- 
2.17.2



[PATCH net-next v1 1/4] etf: Cancel timer if there are no pending skbs

2018-11-14 Thread Vinicius Costa Gomes
From: Jesus Sanchez-Palencia 

There is no point in firing the qdisc watchdog if there are no future
skbs pending in the queue and the watchdog was armed previously.

Signed-off-by: Jesus Sanchez-Palencia 
---
 net/sched/sch_etf.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index 1538d6fa8165..fa85b24ac794 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -117,8 +117,10 @@ static void reset_watchdog(struct Qdisc *sch)
struct sk_buff *skb = etf_peek_timesortedlist(sch);
ktime_t next;
 
-   if (!skb)
+   if (!skb) {
+   qdisc_watchdog_cancel(&q->watchdog);
return;
+   }
 
next = ktime_sub_ns(skb->tstamp, q->delta);
qdisc_watchdog_schedule_ns(&q->watchdog, ktime_to_ns(next));
-- 
2.19.1



[PATCH net-next v1 4/4] etf: Drop all expired packets

2018-11-14 Thread Vinicius Costa Gomes
From: Jesus Sanchez-Palencia 

Currently on dequeue() ETF only drops the first expired packet, which
causes a problem if the next packet is already expired. When this
happens, the watchdog will be configured with a time in the past, fire
straight away, and the packet will finally be dropped once the qdisc's
dequeue() function is called again.

We can save quite a few cycles and improve the overall behavior of the
qdisc by dropping all expired packets at once when the head packet has
expired. This should allow ETF to recover faster from bad situations. But
packet drops are still a very serious warning that the requirements
imposed on the system aren't reasonable.

This was inspired by how the hrtimers implementation uses the rb_tree
inside the kernel.
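Since the queue is sorted by tstamp, all expired packets sit at the front, so one forward walk suffices. A toy userspace model of that cleanup pass (a sorted array stands in for the skb rbtree):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the dequeue-side cleanup: packets sorted by tstamp ascending;
 * drop every leading packet whose tstamp is already in the past, in one pass,
 * instead of dropping one per dequeue() call. */
static size_t drop_expired(const long long *tstamps, size_t n, long long now,
			   size_t *dropped)
{
	size_t i = 0;

	while (i < n && tstamps[i] < now)	/* same test as ktime_before() */
		i++;

	*dropped = i;
	return n - i;		/* packets still pending */
}
```

The kernel patch does the same walk with `skb_rbtree_walk_from_safe()`, stopping at the first tstamp in the future.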

Signed-off-by: Jesus Sanchez-Palencia 
---
 net/sched/sch_etf.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index bfe04748d5f0..1150f22983df 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -190,29 +190,35 @@ static int etf_enqueue_timesortedlist(struct sk_buff 
*nskb, struct Qdisc *sch,
return NET_XMIT_SUCCESS;
 }
 
-static void timesortedlist_drop(struct Qdisc *sch, struct sk_buff *skb)
+static void timesortedlist_drop(struct Qdisc *sch, struct sk_buff *skb,
+   ktime_t now)
 {
struct etf_sched_data *q = qdisc_priv(sch);
struct sk_buff *to_free = NULL;
+   struct sk_buff *tmp = NULL;
 
-   rb_erase_cached(&skb->rbnode, &q->head);
+   skb_rbtree_walk_from_safe(skb, tmp) {
+   if (ktime_after(skb->tstamp, now))
+   break;
 
-   /* The rbnode field in the skb re-uses these fields, now that
-* we are done with the rbnode, reset them.
-*/
-   skb->next = NULL;
-   skb->prev = NULL;
-   skb->dev = qdisc_dev(sch);
+   rb_erase_cached(&skb->rbnode, &q->head);
 
-   qdisc_qstats_backlog_dec(sch, skb);
+   /* The rbnode field in the skb re-uses these fields, now that
+* we are done with the rbnode, reset them.
+*/
+   skb->next = NULL;
+   skb->prev = NULL;
+   skb->dev = qdisc_dev(sch);
 
-   report_sock_error(skb, ECANCELED, SO_EE_CODE_TXTIME_MISSED);
+   report_sock_error(skb, ECANCELED, SO_EE_CODE_TXTIME_MISSED);
 
-   qdisc_drop(skb, sch, &to_free);
-   kfree_skb_list(to_free);
-   qdisc_qstats_overlimit(sch);
+   qdisc_qstats_backlog_dec(sch, skb);
+   qdisc_drop(skb, sch, &to_free);
+   qdisc_qstats_overlimit(sch);
+   sch->q.qlen--;
+   }
 
-   sch->q.qlen--;
+   kfree_skb_list(to_free);
 }
 
 static void timesortedlist_remove(struct Qdisc *sch, struct sk_buff *skb)
@@ -251,7 +257,7 @@ static struct sk_buff *etf_dequeue_timesortedlist(struct 
Qdisc *sch)
 
/* Drop if packet has expired while in queue. */
if (ktime_before(skb->tstamp, now)) {
-   timesortedlist_drop(sch, skb);
+   timesortedlist_drop(sch, skb, now);
skb = NULL;
goto out;
}
-- 
2.19.1



[PATCH net-next v1 3/4] etf: Split timersortedlist_erase()

2018-11-14 Thread Vinicius Costa Gomes
From: Jesus Sanchez-Palencia 

This is just a refactor that will simplify the implementation of the
next patch in this series which will drop all expired packets on the
dequeue flow.

Signed-off-by: Jesus Sanchez-Palencia 
---
 net/sched/sch_etf.c | 44 +---
 1 file changed, 29 insertions(+), 15 deletions(-)

diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index 52452b546564..bfe04748d5f0 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -190,10 +190,10 @@ static int etf_enqueue_timesortedlist(struct sk_buff 
*nskb, struct Qdisc *sch,
return NET_XMIT_SUCCESS;
 }
 
-static void timesortedlist_erase(struct Qdisc *sch, struct sk_buff *skb,
-bool drop)
+static void timesortedlist_drop(struct Qdisc *sch, struct sk_buff *skb)
 {
struct etf_sched_data *q = qdisc_priv(sch);
+   struct sk_buff *to_free = NULL;
 
rb_erase_cached(&skb->rbnode, &q->head);
 
@@ -206,19 +206,33 @@ static void timesortedlist_erase(struct Qdisc *sch, 
struct sk_buff *skb,
 
qdisc_qstats_backlog_dec(sch, skb);
 
-   if (drop) {
-   struct sk_buff *to_free = NULL;
+   report_sock_error(skb, ECANCELED, SO_EE_CODE_TXTIME_MISSED);
 
-   report_sock_error(skb, ECANCELED, SO_EE_CODE_TXTIME_MISSED);
+   qdisc_drop(skb, sch, &to_free);
+   kfree_skb_list(to_free);
+   qdisc_qstats_overlimit(sch);
 
-   qdisc_drop(skb, sch, &to_free);
-   kfree_skb_list(to_free);
-   qdisc_qstats_overlimit(sch);
-   } else {
-   qdisc_bstats_update(sch, skb);
+   sch->q.qlen--;
+}
 
-   q->last = skb->tstamp;
-   }
+static void timesortedlist_remove(struct Qdisc *sch, struct sk_buff *skb)
+{
+   struct etf_sched_data *q = qdisc_priv(sch);
+
+   rb_erase_cached(&skb->rbnode, &q->head);
+
+   /* The rbnode field in the skb re-uses these fields, now that
+* we are done with the rbnode, reset them.
+*/
+   skb->next = NULL;
+   skb->prev = NULL;
+   skb->dev = qdisc_dev(sch);
+
+   qdisc_qstats_backlog_dec(sch, skb);
+
+   qdisc_bstats_update(sch, skb);
+
+   q->last = skb->tstamp;
 
sch->q.qlen--;
 }
@@ -237,7 +251,7 @@ static struct sk_buff *etf_dequeue_timesortedlist(struct 
Qdisc *sch)
 
/* Drop if packet has expired while in queue. */
if (ktime_before(skb->tstamp, now)) {
-   timesortedlist_erase(sch, skb, true);
+   timesortedlist_drop(sch, skb);
skb = NULL;
goto out;
}
@@ -246,7 +260,7 @@ static struct sk_buff *etf_dequeue_timesortedlist(struct 
Qdisc *sch)
 * txtime from deadline to (now + delta).
 */
if (q->deadline_mode) {
-   timesortedlist_erase(sch, skb, false);
+   timesortedlist_remove(sch, skb);
skb->tstamp = now;
goto out;
}
@@ -255,7 +269,7 @@ static struct sk_buff *etf_dequeue_timesortedlist(struct 
Qdisc *sch)
 
/* Dequeue only if now is within the [txtime - delta, txtime] range. */
if (ktime_after(now, next))
-   timesortedlist_erase(sch, skb, false);
+   timesortedlist_remove(sch, skb);
else
skb = NULL;
 
-- 
2.19.1



[PATCH net-next v1 2/4] etf: Use cached rb_root

2018-11-14 Thread Vinicius Costa Gomes
From: Jesus Sanchez-Palencia 

ETF's peek() operation is heavily used, so use an rb_root_cached instead
and leverage rb_first_cached(), which runs in O(1) instead of
O(log n).

Even though 'timesortedlist_clear()' could use rb_erase(), we choose
rb_erase_cached(): if in the future we allow runtime changes to ETF
parameters and need to do a '_clear()', a stale cached leftmost pointer
might cause some hard-to-debug issues.
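The idea behind rb_root_cached is simply to keep a pointer to the leftmost (smallest) node up to date on every insert and erase, so peek never walks the tree. A toy illustration of the caching scheme, with a sorted singly linked list standing in for the rb-tree (not the kernel rbtree API):

```c
#include <assert.h>
#include <stddef.h>

struct toy_node {
	long long tstamp;
	struct toy_node *next;
};

/* Analogue of rb_root_cached: the container plus a cached smallest element. */
struct toy_root_cached {
	struct toy_node *head;		/* the sorted list itself */
	struct toy_node *leftmost;	/* cached smallest element */
};

static void toy_insert(struct toy_root_cached *root, struct toy_node *n)
{
	struct toy_node **p = &root->head;

	while (*p && (*p)->tstamp <= n->tstamp)
		p = &(*p)->next;
	n->next = *p;
	*p = n;

	/* Update the cache only if the new node became the new leftmost,
	 * mirroring rb_insert_color_cached(..., leftmost). */
	if (root->head == n)
		root->leftmost = n;
}

/* O(1) peek, the analogue of rb_first_cached(). */
static struct toy_node *toy_peek(const struct toy_root_cached *root)
{
	return root->leftmost;
}
```

In the real rbtree the insert cost stays O(log n), but the `leftmost` bookkeeping makes every peek constant-time, which matters because ETF peeks on every enqueue, dequeue, and watchdog reset.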

Signed-off-by: Jesus Sanchez-Palencia 
---
 net/sched/sch_etf.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index fa85b24ac794..52452b546564 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -30,7 +30,7 @@ struct etf_sched_data {
int queue;
s32 delta; /* in ns */
ktime_t last; /* The txtime of the last skb sent to the netdevice. */
-   struct rb_root head;
+   struct rb_root_cached head;
struct qdisc_watchdog watchdog;
ktime_t (*get_time)(void);
 };
@@ -104,7 +104,7 @@ static struct sk_buff *etf_peek_timesortedlist(struct Qdisc 
*sch)
struct etf_sched_data *q = qdisc_priv(sch);
struct rb_node *p;
 
-   p = rb_first(&q->head);
+   p = rb_first_cached(&q->head);
if (!p)
return NULL;
 
@@ -156,8 +156,9 @@ static int etf_enqueue_timesortedlist(struct sk_buff *nskb, 
struct Qdisc *sch,
  struct sk_buff **to_free)
 {
struct etf_sched_data *q = qdisc_priv(sch);
-   struct rb_node **p = &q->head.rb_node, *parent = NULL;
+   struct rb_node **p = &q->head.rb_root.rb_node, *parent = NULL;
ktime_t txtime = nskb->tstamp;
+   bool leftmost = true;
 
if (!is_packet_valid(sch, nskb)) {
report_sock_error(nskb, EINVAL,
@@ -170,13 +171,15 @@ static int etf_enqueue_timesortedlist(struct sk_buff 
*nskb, struct Qdisc *sch,
 
parent = *p;
skb = rb_to_skb(parent);
-   if (ktime_after(txtime, skb->tstamp))
+   if (ktime_after(txtime, skb->tstamp)) {
p = &parent->rb_right;
-   else
+   leftmost = false;
+   } else {
p = &parent->rb_left;
+   }
}
rb_link_node(&nskb->rbnode, parent, p);
-   rb_insert_color(&nskb->rbnode, &q->head);
+   rb_insert_color_cached(&nskb->rbnode, &q->head, leftmost);
 
qdisc_qstats_backlog_inc(sch, nskb);
sch->q.qlen++;
@@ -192,7 +195,7 @@ static void timesortedlist_erase(struct Qdisc *sch, struct 
sk_buff *skb,
 {
struct etf_sched_data *q = qdisc_priv(sch);
 
-   rb_erase(&skb->rbnode, &q->head);
+   rb_erase_cached(&skb->rbnode, &q->head);
 
/* The rbnode field in the skb re-uses these fields, now that
 * we are done with the rbnode, reset them.
@@ -388,14 +391,14 @@ static int etf_init(struct Qdisc *sch, struct nlattr *opt,
 static void timesortedlist_clear(struct Qdisc *sch)
 {
struct etf_sched_data *q = qdisc_priv(sch);
-   struct rb_node *p = rb_first(&q->head);
+   struct rb_node *p = rb_first_cached(&q->head);
 
while (p) {
struct sk_buff *skb = rb_to_skb(p);
 
p = rb_next(p);
 
-   rb_erase(&skb->rbnode, &q->head);
+   rb_erase_cached(&skb->rbnode, &q->head);
rtnl_kfree_skbs(skb, skb);
sch->q.qlen--;
}
-- 
2.19.1



[PATCH net-next 5/8] net: eth: altera: tse: Move common functions to altera_utils

2018-11-14 Thread Dalon Westergreen
From: Dalon Westergreen 

Move request_and_map() and other shared functions to altera_utils. This
is the first step in moving common code out of TSE-specific code so
that it can be shared with future Altera ethernet IP.

Signed-off-by: Dalon Westergreen 
---
 drivers/net/ethernet/altera/altera_tse.h  | 45 --
 .../net/ethernet/altera/altera_tse_ethtool.c  |  1 +
 drivers/net/ethernet/altera/altera_tse_main.c | 32 +
 drivers/net/ethernet/altera/altera_utils.c| 30 
 drivers/net/ethernet/altera/altera_utils.h| 46 +++
 5 files changed, 78 insertions(+), 76 deletions(-)

diff --git a/drivers/net/ethernet/altera/altera_tse.h 
b/drivers/net/ethernet/altera/altera_tse.h
index 7f246040135d..f435fb0eca90 100644
--- a/drivers/net/ethernet/altera/altera_tse.h
+++ b/drivers/net/ethernet/altera/altera_tse.h
@@ -500,49 +500,4 @@ struct altera_tse_private {
  */
 void altera_tse_set_ethtool_ops(struct net_device *);
 
-static inline
-u32 csrrd32(void __iomem *mac, size_t offs)
-{
-   void __iomem *paddr = (void __iomem *)((uintptr_t)mac + offs);
-   return readl(paddr);
-}
-
-static inline
-u16 csrrd16(void __iomem *mac, size_t offs)
-{
-   void __iomem *paddr = (void __iomem *)((uintptr_t)mac + offs);
-   return readw(paddr);
-}
-
-static inline
-u8 csrrd8(void __iomem *mac, size_t offs)
-{
-   void __iomem *paddr = (void __iomem *)((uintptr_t)mac + offs);
-   return readb(paddr);
-}
-
-static inline
-void csrwr32(u32 val, void __iomem *mac, size_t offs)
-{
-   void __iomem *paddr = (void __iomem *)((uintptr_t)mac + offs);
-
-   writel(val, paddr);
-}
-
-static inline
-void csrwr16(u16 val, void __iomem *mac, size_t offs)
-{
-   void __iomem *paddr = (void __iomem *)((uintptr_t)mac + offs);
-
-   writew(val, paddr);
-}
-
-static inline
-void csrwr8(u8 val, void __iomem *mac, size_t offs)
-{
-   void __iomem *paddr = (void __iomem *)((uintptr_t)mac + offs);
-
-   writeb(val, paddr);
-}
-
 #endif /* __ALTERA_TSE_H__ */
diff --git a/drivers/net/ethernet/altera/altera_tse_ethtool.c 
b/drivers/net/ethernet/altera/altera_tse_ethtool.c
index 7c367713c3e6..2998655ab316 100644
--- a/drivers/net/ethernet/altera/altera_tse_ethtool.c
+++ b/drivers/net/ethernet/altera/altera_tse_ethtool.c
@@ -33,6 +33,7 @@
 #include 
 
 #include "altera_tse.h"
+#include "altera_utils.h"
 
 #define TSE_STATS_LEN  31
 #define TSE_NUM_REGS   128
diff --git a/drivers/net/ethernet/altera/altera_tse_main.c 
b/drivers/net/ethernet/altera/altera_tse_main.c
index f6b6a14b1ce9..b25d03506470 100644
--- a/drivers/net/ethernet/altera/altera_tse_main.c
+++ b/drivers/net/ethernet/altera/altera_tse_main.c
@@ -34,7 +34,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -44,7 +43,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 
@@ -1332,35 +1331,6 @@ static struct net_device_ops altera_tse_netdev_ops = {
.ndo_validate_addr  = eth_validate_addr,
 };
 
-static int request_and_map(struct platform_device *pdev, const char *name,
-  struct resource **res, void __iomem **ptr)
-{
-   struct resource *region;
-   struct device *device = &pdev->dev;
-
-   *res = platform_get_resource_byname(pdev, IORESOURCE_MEM, name);
-   if (*res == NULL) {
-   dev_err(device, "resource %s not defined\n", name);
-   return -ENODEV;
-   }
-
-   region = devm_request_mem_region(device, (*res)->start,
-resource_size(*res), dev_name(device));
-   if (region == NULL) {
-   dev_err(device, "unable to request %s\n", name);
-   return -EBUSY;
-   }
-
-   *ptr = devm_ioremap_nocache(device, region->start,
-   resource_size(region));
-   if (*ptr == NULL) {
-   dev_err(device, "ioremap_nocache of %s failed!", name);
-   return -ENOMEM;
-   }
-
-   return 0;
-}
-
 /* Probe Altera TSE MAC device
  */
 static int altera_tse_probe(struct platform_device *pdev)
diff --git a/drivers/net/ethernet/altera/altera_utils.c 
b/drivers/net/ethernet/altera/altera_utils.c
index d7eeb1713ad2..bc33b7f0b0c5 100644
--- a/drivers/net/ethernet/altera/altera_utils.c
+++ b/drivers/net/ethernet/altera/altera_utils.c
@@ -42,3 +42,33 @@ int tse_bit_is_clear(void __iomem *ioaddr, size_t offs, u32 
bit_mask)
u32 value = csrrd32(ioaddr, offs);
return (value & bit_mask) ? 0 : 1;
 }
+
+int request_and_map(struct platform_device *pdev, const char *name,
+   struct resource **res, void __iomem **ptr)
+{
+   struct resource *region;
+   struct device *device = &pdev->dev;
+
+   *res = platform_get_resource_byname(pdev, IORESOURCE_MEM, name);
+   if (!*res) {
+   dev_err(device, "resource %s not defined\n", name);
+   return -ENODEV;
+   }
+
+   region = d

[PATCH net-next 8/8] net: eth: altera: tse: update devicetree bindings documentation

2018-11-14 Thread Dalon Westergreen
From: Dalon Westergreen 

Update devicetree bindings documentation to include msgdma
prefetcher and ptp bindings.

Signed-off-by: Dalon Westergreen 
---
 .../devicetree/bindings/net/altera_tse.txt| 98 +++
 1 file changed, 79 insertions(+), 19 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/altera_tse.txt 
b/Documentation/devicetree/bindings/net/altera_tse.txt
index 0e21df94a53f..d35806942a8f 100644
--- a/Documentation/devicetree/bindings/net/altera_tse.txt
+++ b/Documentation/devicetree/bindings/net/altera_tse.txt
@@ -2,50 +2,79 @@
 
 Required properties:
 - compatible: Should be "altr,tse-1.0" for legacy SGDMA based TSE, and should
-   be "altr,tse-msgdma-1.0" for the preferred MSGDMA based TSE.
+   be "altr,tse-msgdma-1.0" for the preferred MSGDMA based TSE,
+   and "altr,tse-msgdma-2.0" for MSGDMA with prefetcher based
+   implementations.
ALTR is supported for legacy device trees, but is deprecated.
altr should be used for all new designs.
 - reg: Address and length of the register set for the device. It contains
   the information of registers in the same order as described by reg-names
 - reg-names: Should contain the reg names
-  "control_port": MAC configuration space region
-  "tx_csr":   xDMA Tx dispatcher control and status space region
-  "tx_desc":  MSGDMA Tx dispatcher descriptor space region
-  "rx_csr" :  xDMA Rx dispatcher control and status space region
-  "rx_desc":  MSGDMA Rx dispatcher descriptor space region
-  "rx_resp":  MSGDMA Rx dispatcher response space region
-  "s1":  SGDMA descriptor memory
 - interrupts: Should contain the TSE interrupts and it's mode.
 - interrupt-names: Should contain the interrupt names
-  "rx_irq":   xDMA Rx dispatcher interrupt
-  "tx_irq":   xDMA Tx dispatcher interrupt
+  "rx_irq":   DMA Rx dispatcher interrupt
+  "tx_irq":   DMA Tx dispatcher interrupt
 - rx-fifo-depth: MAC receive FIFO buffer depth in bytes
 - tx-fifo-depth: MAC transmit FIFO buffer depth in bytes
 - phy-mode: See ethernet.txt in the same directory.
 - phy-handle: See ethernet.txt in the same directory.
 - phy-addr: See ethernet.txt in the same directory. A configuration should
include phy-handle or phy-addr.
-- altr,has-supplementary-unicast:
-   If present, TSE supports additional unicast addresses.
-   Otherwise additional unicast addresses are not supported.
-- altr,has-hash-multicast-filter:
-   If present, TSE supports a hash based multicast filter.
-   Otherwise, hash-based multicast filtering is not supported.
-
 - mdio device tree subnode: When the TSE has a phy connected to its local
mdio, there must be device tree subnode with the following
required properties:
-
- compatible: Must be "altr,tse-mdio".
- #address-cells: Must be <1>.
- #size-cells: Must be <0>.
 
For each phy on the mdio bus, there must be a node with the following
fields:
-
- reg: phy id used to communicate to phy.
- device_type: Must be "ethernet-phy".
 
+- altr,has-supplementary-unicast:
+   If present, TSE supports additional unicast addresses.
+   Otherwise additional unicast addresses are not supported.
+- altr,has-hash-multicast-filter:
+   If present, TSE supports a hash based multicast filter.
+   Otherwise, hash-based multicast filtering is not supported.
+- altr,has-ptp:
+   If present, TSE supports 1588 timestamping.  Currently only
+   supported with the msgdma prefetcher.
+- altr,tx-poll-cnt:
+   Optional cycle count for Tx prefetcher to poll descriptor
+   list.  If not present, defaults to 128, which at 125MHz is
+   roughly 1usec. Only for "altr,tse-msgdma-2.0".
+- altr,rx-poll-cnt:
+   Optional cycle count for Rx prefetcher to poll descriptor
+   list.  If not present, defaults to 128, which at 125MHz is
+   roughly 1usec. Only for "altr,tse-msgdma-2.0".
+
+Required registers by compatibility string:
+ - "altr,tse-1.0"
+   "control_port": MAC configuration space region
+   "tx_csr":   DMA Tx dispatcher control and status space region
+   "rx_csr" :  DMA Rx dispatcher control and status space region
+   "s1":   DMA descriptor memory
+
+ - "altr,tse-msgdma-1.0"
+   "control_port": MAC configuration space region
+   "tx_csr":   DMA Tx dispatcher control and status space region
+   "tx_desc":  DMA Tx dispatcher descriptor space region
+   "rx_csr" :  DMA Rx dispatcher control and status space region
+   "rx_desc":  DMA Rx dispatcher descriptor space region
+   "rx_resp":  DMA Rx dispatcher response space region
+
+ - "altr,tse-msgdma-2.0"
+   "control_port": M

[PATCH net-next 7/8] net: eth: altera: tse: add msgdma prefetcher

2018-11-14 Thread Dalon Westergreen
From: Dalon Westergreen 

Add support for the mSGDMA prefetcher.  The prefetcher adds support
for a linked list of descriptors in system memory.  The prefetcher
feeds these to the mSGDMA dispatcher.

The prefetcher is configured to poll for the next descriptor in the
list to be owned by hardware, then pass the descriptor to the
dispatcher.  It will then poll the next descriptor until it is
owned by hardware.

The dispatcher responses are written back to the appropriate
descriptor, and the owned by hardware bit is cleared.

The driver sets up a linked list of twice the tx and rx ring sizes,
with the last descriptor pointing back to the first.  This ensures
that the ring of descriptors will always have inactive descriptors
preventing the prefetcher from looping over and reusing descriptors
inappropriately.  The prefetcher will continuously loop over these
descriptors.  The driver modifies descriptors as required to update
the skb address and length as well as the owned by hardware bit.
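
The ring arrangement described above can be sketched as a small user-space model (the field and function names here are illustrative, not the driver's actual descriptor layout):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the prefetcher descriptor ring: 2 * ring_size
 * entries, with the last one linking back to the first, so the
 * hardware prefetcher can loop over the list forever while software
 * controls activity through an owned-by-hardware flag.
 */
struct pref_desc {
	uint32_t buf_addr;   /* DMA address of the skb data (illustrative) */
	uint32_t len;
	uint32_t next_idx;   /* index of the next descriptor in the list */
	int owned_by_hw;     /* 0: inactive, the prefetcher polls and skips */
};

static void pref_ring_init(struct pref_desc *ring, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++) {
		ring[i].owned_by_hw = 0;   /* everything inactive at start */
		ring[i].next_idx = i + 1;
	}
	ring[n - 1].next_idx = 0;          /* close the loop */
}

/* Hand one buffer to hardware: fill the descriptor, then flip the
 * ownership bit last (the real driver needs a write barrier here).
 */
static void pref_ring_post(struct pref_desc *d, uint32_t addr, uint32_t len)
{
	d->buf_addr = addr;
	d->len = len;
	d->owned_by_hw = 1;
}
```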

In addition to the above, the mSGDMA prefetcher can be used to
handle rx and tx timestamps coming from the ethernet ip.  These
can be included in the prefetcher response in the descriptor.

Signed-off-by: Dalon Westergreen 
---
 drivers/net/ethernet/altera/Makefile  |   2 +-
 .../altera/altera_msgdma_prefetcher.c | 433 ++
 .../altera/altera_msgdma_prefetcher.h |  30 ++
 .../altera/altera_msgdmahw_prefetcher.h   |  87 
 drivers/net/ethernet/altera/altera_tse.h  |  14 +
 drivers/net/ethernet/altera/altera_tse_main.c |  51 +++
 6 files changed, 616 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/altera/altera_msgdma_prefetcher.c
 create mode 100644 drivers/net/ethernet/altera/altera_msgdma_prefetcher.h
 create mode 100644 drivers/net/ethernet/altera/altera_msgdmahw_prefetcher.h

diff --git a/drivers/net/ethernet/altera/Makefile b/drivers/net/ethernet/altera/Makefile
index ad80be42fa26..73b32876f126 100644
--- a/drivers/net/ethernet/altera/Makefile
+++ b/drivers/net/ethernet/altera/Makefile
@@ -5,4 +5,4 @@
 obj-$(CONFIG_ALTERA_TSE) += altera_tse.o
 altera_tse-objs := altera_tse_main.o altera_tse_ethtool.o \
   altera_msgdma.o altera_sgdma.o altera_utils.o \
-  altera_ptp.o
+  altera_ptp.o altera_msgdma_prefetcher.o
diff --git a/drivers/net/ethernet/altera/altera_msgdma_prefetcher.c b/drivers/net/ethernet/altera/altera_msgdma_prefetcher.c
new file mode 100644
index ..55b475e9e15b
--- /dev/null
+++ b/drivers/net/ethernet/altera/altera_msgdma_prefetcher.c
@@ -0,0 +1,433 @@
+// SPDX-License-Identifier: GPL-2.0
+/* MSGDMA Prefetcher driver for Altera ethernet devices
+ *
+ * Copyright (C) 2018 Intel Corporation. All rights reserved.
+ * Author(s):
+ *   Dalon Westergreen 
+ */
+
+#include 
+#include 
+#include 
+#include "altera_utils.h"
+#include "altera_tse.h"
+#include "altera_msgdma.h"
+#include "altera_msgdmahw.h"
+#include "altera_msgdma_prefetcher.h"
+#include "altera_msgdmahw_prefetcher.h"
+
+int msgdma_pref_initialize(struct altera_tse_private *priv)
+{
+   int i;
+   struct msgdma_pref_extended_desc *rx_descs;
+   struct msgdma_pref_extended_desc *tx_descs;
+   dma_addr_t rx_descsphys;
+   dma_addr_t tx_descsphys;
+   u32 rx_ring_size;
+   u32 tx_ring_size;
+
+   priv->pref_rxdescphys = (dma_addr_t)0;
+   priv->pref_txdescphys = (dma_addr_t)0;
+
+   /* we need to allocate more pref descriptors than ringsize, for now
+* just double ringsize
+*/
+   rx_ring_size = priv->rx_ring_size * 2;
+   tx_ring_size = priv->tx_ring_size * 2;
+
+   /* The prefetcher requires the descriptors to be aligned to the
+* descriptor read/write master's data width which worst case is
+* 512 bits.  Currently we DO NOT CHECK THIS and only support 32-bit
+* prefetcher masters.
+*/
+
+   /* allocate memory for rx descriptors */
+   priv->pref_rxdesc =
+   dma_zalloc_coherent(priv->device,
+   sizeof(struct msgdma_pref_extended_desc)
+   * rx_ring_size,
+   &priv->pref_rxdescphys, GFP_KERNEL);
+
+   if (!priv->pref_rxdesc)
+   goto err_rx;
+
+   /* allocate memory for tx descriptors */
+   priv->pref_txdesc =
+   dma_zalloc_coherent(priv->device,
+   sizeof(struct msgdma_pref_extended_desc)
+   * tx_ring_size,
+   &priv->pref_txdescphys, GFP_KERNEL);
+
+   if (!priv->pref_txdesc)
+   goto err_tx;
+
+   /* setup base descriptor ring for tx & rx */
+   rx_descs = (struct msgdma_pref_extended_desc *)priv->pref_rxdesc;
+   tx_descs = (struct msgdma_pref_extended_desc *)priv->pref_txdesc;
+   tx_descsphys = priv->pref_txdescphys;
+   rx_descsphys = pr

[PATCH net-next 4/8] net: eth: altera: tse: add optional function to start tx dma

2018-11-14 Thread Dalon Westergreen
From: Dalon Westergreen 

Allow optional startup of the tx DMA when the start_txdma
function is defined in altera_dmaops.

Signed-off-by: Dalon Westergreen 
---
 drivers/net/ethernet/altera/altera_tse.h  | 1 +
 drivers/net/ethernet/altera/altera_tse_main.c | 5 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/altera/altera_tse.h b/drivers/net/ethernet/altera/altera_tse.h
index d5b97e02e6d6..7f246040135d 100644
--- a/drivers/net/ethernet/altera/altera_tse.h
+++ b/drivers/net/ethernet/altera/altera_tse.h
@@ -412,6 +412,7 @@ struct altera_dmaops {
int (*init_dma)(struct altera_tse_private *priv);
void (*uninit_dma)(struct altera_tse_private *priv);
void (*start_rxdma)(struct altera_tse_private *priv);
+   void (*start_txdma)(struct altera_tse_private *priv);
 };
 
 /* This structure is private to each device.
diff --git a/drivers/net/ethernet/altera/altera_tse_main.c b/drivers/net/ethernet/altera/altera_tse_main.c
index 0c0e8f9bba9b..f6b6a14b1ce9 100644
--- a/drivers/net/ethernet/altera/altera_tse_main.c
+++ b/drivers/net/ethernet/altera/altera_tse_main.c
@@ -1256,6 +1256,9 @@ static int tse_open(struct net_device *dev)
 
priv->dmaops->start_rxdma(priv);
 
+   if (priv->dmaops->start_txdma)
+   priv->dmaops->start_txdma(priv);
+
/* Start MAC Rx/Tx */
spin_lock(&priv->mac_cfg_lock);
tse_set_mac(priv, true);
@@ -1658,6 +1661,7 @@ static const struct altera_dmaops altera_dtype_sgdma = {
.init_dma = sgdma_initialize,
.uninit_dma = sgdma_uninitialize,
.start_rxdma = sgdma_start_rxdma,
+   .start_txdma = NULL,
 };
 
 static const struct altera_dmaops altera_dtype_msgdma = {
@@ -1677,6 +1681,7 @@ static const struct altera_dmaops altera_dtype_msgdma = {
.init_dma = msgdma_initialize,
.uninit_dma = msgdma_uninitialize,
.start_rxdma = msgdma_start_rxdma,
+   .start_txdma = NULL,
 };
 
 static const struct of_device_id altera_tse_ids[] = {
-- 
2.19.1



[PATCH net-next 6/8] net: eth: altera: tse: add support for ptp and timestamping

2018-11-14 Thread Dalon Westergreen
From: Dalon Westergreen 

Add support for the ptp clock used with the tse, and update
the driver to support timestamping when enabled.  We also
enable debugfs entries for the ptp clock to allow some user
control and interaction with the ptp clock.

Signed-off-by: Dalon Westergreen 
---
 drivers/net/ethernet/altera/Kconfig   |   1 +
 drivers/net/ethernet/altera/Makefile  |   3 +-
 drivers/net/ethernet/altera/altera_ptp.c  | 473 ++
 drivers/net/ethernet/altera/altera_ptp.h  |  77 +++
 drivers/net/ethernet/altera/altera_tse.h  |  10 +
 .../net/ethernet/altera/altera_tse_ethtool.c  |  28 ++
 drivers/net/ethernet/altera/altera_tse_main.c | 164 +-
 7 files changed, 754 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/altera/altera_ptp.c
 create mode 100644 drivers/net/ethernet/altera/altera_ptp.h

diff --git a/drivers/net/ethernet/altera/Kconfig b/drivers/net/ethernet/altera/Kconfig
index fdddba51473e..36aee0fc0b51 100644
--- a/drivers/net/ethernet/altera/Kconfig
+++ b/drivers/net/ethernet/altera/Kconfig
@@ -2,6 +2,7 @@ config ALTERA_TSE
tristate "Altera Triple-Speed Ethernet MAC support"
depends on HAS_DMA
select PHYLIB
+   select PTP_1588_CLOCK
---help---
  This driver supports the Altera Triple-Speed (TSE) Ethernet MAC.
 
diff --git a/drivers/net/ethernet/altera/Makefile b/drivers/net/ethernet/altera/Makefile
index d4a187e45369..ad80be42fa26 100644
--- a/drivers/net/ethernet/altera/Makefile
+++ b/drivers/net/ethernet/altera/Makefile
@@ -4,4 +4,5 @@
 
 obj-$(CONFIG_ALTERA_TSE) += altera_tse.o
 altera_tse-objs := altera_tse_main.o altera_tse_ethtool.o \
-altera_msgdma.o altera_sgdma.o altera_utils.o
+  altera_msgdma.o altera_sgdma.o altera_utils.o \
+  altera_ptp.o
diff --git a/drivers/net/ethernet/altera/altera_ptp.c b/drivers/net/ethernet/altera/altera_ptp.c
new file mode 100644
index ..4467b3c90c59
--- /dev/null
+++ b/drivers/net/ethernet/altera/altera_ptp.c
@@ -0,0 +1,473 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Altera PTP Hardware Clock (PHC) Linux driver
+ * Copyright (C) 2015-2016 Altera Corporation. All rights reserved.
+ * Copyright (C) 2017-2018 Intel Corporation. All rights reserved.
+ *
+ * Author(s):
+ * Dalon Westergreen 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "altera_ptp.h"
+#include "altera_utils.h"
+
+#define NOMINAL_PPB10ULL
+#define TOD_PERIOD_MAX 0xf
+#define TOD_PERIOD_MIN 0
+#define TOD_DRIFT_ADJUST_FNS_MAX   0x
+#define TOD_DRIFT_ADJUST_RATE_MAX  0x
+#define TOD_ADJUST_COUNT_MAX   0xf
+#define TOD_ADJUST_MS_MAX  (((((TOD_PERIOD_MAX) >> 16) + 1) * \
+ ((TOD_ADJUST_COUNT_MAX) + 1)) /  \
+100UL)
+
+/* A fine ToD HW clock offset adjustment.
+ * To perform the fine offset adjustment the AdjustPeriod register is used
+ * to replace the Period register for AdjustCount clock cycles in hardware.
+ */
+static int fine_adjust_tod_clock(struct altera_ptp_private *priv,
+u32 adjust_period, u32 adjust_count)
+{
+   int limit;
+
+   csrwr32(adjust_period, priv->tod_ctrl, tod_csroffs(adjust_period));
+   csrwr32(adjust_count, priv->tod_ctrl, tod_csroffs(adjust_count));
+
+   /* Wait for present offset adjustment update to complete */
+   limit = TOD_ADJUST_MS_MAX;
+   while (limit--) {
+   if (!csrrd32(priv->tod_ctrl, tod_csroffs(adjust_count)))
+   break;
+   mdelay(1);
+   }
+   if (limit < 0)
+   return -EBUSY;
+
+   return 0;
+}
+
+/* A coarse ToD HW clock offset adjustment.
+ * The coarse time adjustment performs by adding or subtracting the delta value
+ * from the current ToD HW clock time.
+ */
+static int coarse_adjust_tod_clock(struct altera_ptp_private *priv, s64 delta)
+{
+   u64 seconds;
+   u32 seconds_msb;
+   u32 seconds_lsb;
+   u32 nanosec;
+   u64 now;
+
+   if (delta == 0)
+   goto out;
+
+   /* Get current time */
+   nanosec = csrrd32(priv->tod_ctrl, tod_csroffs(nanosec));
+   seconds_lsb = csrrd32(priv->tod_ctrl, tod_csroffs(seconds_lsb));
+   seconds_msb = csrrd32(priv->tod_ctrl, tod_csroffs(seconds_msb));
+
+   /* Calculate new time */
+   seconds = (((u64)(seconds_msb & 0x)) << 32) | seconds_lsb;
+   now = seconds * NSEC_PER_SEC + nanosec + delta;
+
+   seconds = div_u64_rem(now, NSEC_PER_SEC, &nanosec);
+   seconds_msb = upper_32_bits(seconds) & 0x;
+   seconds_lsb = lower_32_bits(seconds);
+
+   /* Set corrected time */
+   csrwr32(seconds_msb, priv->tod_ctrl, tod_csroffs(seconds_msb));
+   csrwr32(seconds_lsb, priv->tod_ctrl,

[PATCH net-next 0/8] net: eth: altera: tse: Add PTP and mSGDMA prefetcher

2018-11-14 Thread Dalon Westergreen
From: Dalon Westergreen 

This patch series cleans up the Altera TSE driver and adds support
for the newer msgdma prefetcher as well as ptp support when using
the msgdma prefetcher.

Dalon Westergreen (8):
  net: eth: altera: tse_start_xmit ignores tx_buffer call response
  net: eth: altera: set rx and tx ring size before init_dma call
  net: eth: altera: tse: fix altera_dmaops declaration
  net: eth: altera: tse: add optional function to start tx dma
  net: eth: altera: tse: Move common functions to altera_utils
  net: eth: altera: tse: add support for ptp and timestamping
  net: eth: altera: tse: add msgdma prefetcher
  net: eth: altera: tse: update devicetree bindings documentation

 .../devicetree/bindings/net/altera_tse.txt|  98 +++-
 drivers/net/ethernet/altera/Kconfig   |   1 +
 drivers/net/ethernet/altera/Makefile  |   3 +-
 .../altera/altera_msgdma_prefetcher.c | 433 
 .../altera/altera_msgdma_prefetcher.h |  30 ++
 .../altera/altera_msgdmahw_prefetcher.h   |  87 
 drivers/net/ethernet/altera/altera_ptp.c  | 473 ++
 drivers/net/ethernet/altera/altera_ptp.h  |  77 +++
 drivers/net/ethernet/altera/altera_sgdma.c|  14 +-
 drivers/net/ethernet/altera/altera_tse.h  | 100 ++--
 .../net/ethernet/altera/altera_tse_ethtool.c  |  29 ++
 drivers/net/ethernet/altera/altera_tse_main.c | 244 -
 drivers/net/ethernet/altera/altera_utils.c|  30 ++
 drivers/net/ethernet/altera/altera_utils.h|  46 ++
 14 files changed, 1554 insertions(+), 111 deletions(-)
 create mode 100644 drivers/net/ethernet/altera/altera_msgdma_prefetcher.c
 create mode 100644 drivers/net/ethernet/altera/altera_msgdma_prefetcher.h
 create mode 100644 drivers/net/ethernet/altera/altera_msgdmahw_prefetcher.h
 create mode 100644 drivers/net/ethernet/altera/altera_ptp.c
 create mode 100644 drivers/net/ethernet/altera/altera_ptp.h

-- 
2.19.1



[PATCH net-next 3/8] net: eth: altera: tse: fix altera_dmaops declaration

2018-11-14 Thread Dalon Westergreen
From: Dalon Westergreen 

The declaration of struct altera_dmaops does not have
identifier names.  Add identifier names to conform with
the required coding style.

Signed-off-by: Dalon Westergreen 
---
 drivers/net/ethernet/altera/altera_tse.h | 30 +---
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/altera/altera_tse.h b/drivers/net/ethernet/altera/altera_tse.h
index e2feee87180a..d5b97e02e6d6 100644
--- a/drivers/net/ethernet/altera/altera_tse.h
+++ b/drivers/net/ethernet/altera/altera_tse.h
@@ -396,20 +396,22 @@ struct altera_tse_private;
 struct altera_dmaops {
int altera_dtype;
int dmamask;
-   void (*reset_dma)(struct altera_tse_private *);
-   void (*enable_txirq)(struct altera_tse_private *);
-   void (*enable_rxirq)(struct altera_tse_private *);
-   void (*disable_txirq)(struct altera_tse_private *);
-   void (*disable_rxirq)(struct altera_tse_private *);
-   void (*clear_txirq)(struct altera_tse_private *);
-   void (*clear_rxirq)(struct altera_tse_private *);
-   int (*tx_buffer)(struct altera_tse_private *, struct tse_buffer *);
-   u32 (*tx_completions)(struct altera_tse_private *);
-   void (*add_rx_desc)(struct altera_tse_private *, struct tse_buffer *);
-   u32 (*get_rx_status)(struct altera_tse_private *);
-   int (*init_dma)(struct altera_tse_private *);
-   void (*uninit_dma)(struct altera_tse_private *);
-   void (*start_rxdma)(struct altera_tse_private *);
+   void (*reset_dma)(struct altera_tse_private *priv);
+   void (*enable_txirq)(struct altera_tse_private *priv);
+   void (*enable_rxirq)(struct altera_tse_private *priv);
+   void (*disable_txirq)(struct altera_tse_private *priv);
+   void (*disable_rxirq)(struct altera_tse_private *priv);
+   void (*clear_txirq)(struct altera_tse_private *priv);
+   void (*clear_rxirq)(struct altera_tse_private *priv);
+   int (*tx_buffer)(struct altera_tse_private *priv,
+struct tse_buffer *buffer);
+   u32 (*tx_completions)(struct altera_tse_private *priv);
+   void (*add_rx_desc)(struct altera_tse_private *priv,
+   struct tse_buffer *buffer);
+   u32 (*get_rx_status)(struct altera_tse_private *priv);
+   int (*init_dma)(struct altera_tse_private *priv);
+   void (*uninit_dma)(struct altera_tse_private *priv);
+   void (*start_rxdma)(struct altera_tse_private *priv);
 };
 
 /* This structure is private to each device.
-- 
2.19.1



[PATCH net-next 2/8] net: eth: altera: set rx and tx ring size before init_dma call

2018-11-14 Thread Dalon Westergreen
From: Dalon Westergreen 

It is more appropriate to set the rx and tx ring size before calling
the init function for the dma.

Signed-off-by: Dalon Westergreen 
---
 drivers/net/ethernet/altera/altera_tse_main.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/altera/altera_tse_main.c b/drivers/net/ethernet/altera/altera_tse_main.c
index dcb330129e23..0c0e8f9bba9b 100644
--- a/drivers/net/ethernet/altera/altera_tse_main.c
+++ b/drivers/net/ethernet/altera/altera_tse_main.c
@@ -1166,6 +1166,10 @@ static int tse_open(struct net_device *dev)
int i;
unsigned long int flags;
 
+   /* set tx and rx ring size */
+   priv->rx_ring_size = dma_rx_num;
+   priv->tx_ring_size = dma_tx_num;
+
/* Reset and configure TSE MAC and probe associated PHY */
ret = priv->dmaops->init_dma(priv);
if (ret != 0) {
@@ -1208,8 +1212,6 @@ static int tse_open(struct net_device *dev)
priv->dmaops->reset_dma(priv);
 
/* Create and initialize the TX/RX descriptors chains. */
-   priv->rx_ring_size = dma_rx_num;
-   priv->tx_ring_size = dma_tx_num;
ret = alloc_init_skbufs(priv);
if (ret) {
netdev_err(dev, "DMA descriptors initialization failed\n");
-- 
2.19.1



[PATCH net-next 1/8] net: eth: altera: tse_start_xmit ignores tx_buffer call response

2018-11-14 Thread Dalon Westergreen
From: Dalon Westergreen 

The return from the tx_buffer call in tse_start_xmit is
inappropriately ignored.  tse_buffer calls should return
0 for success or NETDEV_TX_BUSY.  tse_start_xmit should
not report a successful transmit when the tse_buffer
call returns an error condition.

In addition to the above, the msgdma and sgdma do not return
the same value on success or failure.  The sgdma_tx_buffer
returned 0 on failure and a positive number of transmitted
packets on success.  Given that it only ever sends 1 packet,
this made no sense.  The msgdma implementation msgdma_tx_buffer
returns 0 on success.

  -> Don't ignore the return from tse_buffer calls
  -> Fix sgdma tse_buffer call to return 0 on success
 and NETDEV_TX_BUSY on failure.
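
The corrected contract can be sketched as a minimal user-space model (the names `tx_buffer` and `start_xmit` mirror the driver's roles, but the bodies are illustrative stand-ins and `dma_busy` models the hardware state):

```c
#include <assert.h>

#define NETDEV_TX_OK   0x00	/* kernel values */
#define NETDEV_TX_BUSY 0x10

/* tx_buffer() returns 0 on success or NETDEV_TX_BUSY when the DMA
 * engine cannot take the packet; the queueing internals are elided.
 */
static int tx_buffer(int dma_busy)
{
	if (dma_busy)
		return NETDEV_TX_BUSY;
	/* ... enqueue exactly one packet ... */
	return 0;
}

/* The caller checks the result instead of assuming success. */
static int start_xmit(int dma_busy, int *transmitted)
{
	int ret = tx_buffer(dma_busy);

	if (ret)
		return ret;	/* do not report a successful transmit */

	*transmitted = 1;	/* timestamp the skb, update stats, ... */
	return NETDEV_TX_OK;
}
```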

Signed-off-by: Dalon Westergreen 
---
 drivers/net/ethernet/altera/altera_sgdma.c| 14 --
 drivers/net/ethernet/altera/altera_tse_main.c |  4 +++-
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/altera/altera_sgdma.c b/drivers/net/ethernet/altera/altera_sgdma.c
index 88ef67a998b4..eb47b9b820bb 100644
--- a/drivers/net/ethernet/altera/altera_sgdma.c
+++ b/drivers/net/ethernet/altera/altera_sgdma.c
@@ -15,6 +15,7 @@
  */
 
 #include 
+#include 
 #include "altera_utils.h"
 #include "altera_tse.h"
 #include "altera_sgdmahw.h"
@@ -170,10 +171,11 @@ void sgdma_clear_txirq(struct altera_tse_private *priv)
SGDMA_CTRLREG_CLRINT);
 }
 
-/* transmits buffer through SGDMA. Returns number of buffers
- * transmitted, 0 if not possible.
- *
- * tx_lock is held by the caller
+/* transmits buffer through SGDMA.
+ *   The original behavior returned the number of transmitted packets
+ *   (always 1) and returned 0 on error, which differs from the msgdma.
+ *   The calling function now actually checks the return code, so from
+ *   now on 0 means success and NETDEV_TX_BUSY is returned when busy.
  */
 int sgdma_tx_buffer(struct altera_tse_private *priv, struct tse_buffer *buffer)
 {
@@ -185,7 +187,7 @@ int sgdma_tx_buffer(struct altera_tse_private *priv, struct tse_buffer *buffer)
 
/* wait 'til the tx sgdma is ready for the next transmit request */
if (sgdma_txbusy(priv))
-   return 0;
+   return NETDEV_TX_BUSY;
 
sgdma_setup_descrip(cdesc,  /* current descriptor */
ndesc,  /* next descriptor */
@@ -202,7 +204,7 @@ int sgdma_tx_buffer(struct altera_tse_private *priv, struct tse_buffer *buffer)
/* enqueue the request to the pending transmit queue */
queue_tx(priv, buffer);
 
-   return 1;
+   return 0;
 }
 
 
diff --git a/drivers/net/ethernet/altera/altera_tse_main.c b/drivers/net/ethernet/altera/altera_tse_main.c
index baca8f704a45..dcb330129e23 100644
--- a/drivers/net/ethernet/altera/altera_tse_main.c
+++ b/drivers/net/ethernet/altera/altera_tse_main.c
@@ -606,7 +606,9 @@ static int tse_start_xmit(struct sk_buff *skb, struct net_device *dev)
buffer->dma_addr = dma_addr;
buffer->len = nopaged_len;
 
-   priv->dmaops->tx_buffer(priv, buffer);
+   ret = priv->dmaops->tx_buffer(priv, buffer);
+   if (ret)
+   goto out;
 
skb_tx_timestamp(skb);
 
-- 
2.19.1



RE: [Intel-wired-lan] [PATCH net] igb: fix uninitialized variables

2018-11-14 Thread Brown, Aaron F
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
> Behalf Of wangyunjian
> Sent: Tuesday, November 6, 2018 12:27 AM
> To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Cc: stone.z...@huawei.com; Yunjian Wang 
> Subject: [Intel-wired-lan] [PATCH net] igb: fix uninitialized variables
> 
> From: Yunjian Wang 
> 
> This patch fixes the variable 'phy_word' may be used uninitialized.
> 
> Signed-off-by: Yunjian Wang 
> ---
>  drivers/net/ethernet/intel/igb/e1000_i210.c | 1 +
>  1 file changed, 1 insertion(+)
> 

Tested-by: Aaron Brown 



[net-next 09/14] i40e: always set ks->base.speed in i40e_get_settings_link_up

2018-11-14 Thread Jeff Kirsher
From: Jacob Keller 

In i40e_get_settings_link_up, set ks->base.speed to SPEED_UNKNOWN
in the case where we don't know the link speed.

Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 1ed241a5c3f4..a6bc7847346b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -906,6 +906,7 @@ static void i40e_get_settings_link_up(struct i40e_hw *hw,
ks->base.speed = SPEED_100;
break;
default:
+   ks->base.speed = SPEED_UNKNOWN;
break;
}
ks->base.duplex = DUPLEX_FULL;
-- 
2.19.1



[net-next 03/14] i40e: Add capability flag for stopping FW LLDP

2018-11-14 Thread Jeff Kirsher
From: Krzysztof Galazka 

Add HW capability flag to indicate that firmware supports stopping
LLDP agent. This feature has been added in FW API 1.7 for XL710
devices and 1.6 for X722. Also raise expected minor version number
for X722 FW API to 6.

Signed-off-by: Krzysztof Galazka 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_adminq.c | 6 ++
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h | 4 +++-
 drivers/net/ethernet/intel/i40e/i40e_type.h   | 1 +
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.c b/drivers/net/ethernet/intel/i40e/i40e_adminq.c
index 501ee718177f..7ab61f6ebb5f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.c
@@ -588,6 +588,12 @@ i40e_status i40e_init_adminq(struct i40e_hw *hw)
hw->aq.api_maj_ver == I40E_FW_API_VERSION_MAJOR &&
hw->aq.api_min_ver >= I40E_MINOR_VER_GET_LINK_INFO_XL710) {
hw->flags |= I40E_HW_FLAG_AQ_PHY_ACCESS_CAPABLE;
+   hw->flags |= I40E_HW_FLAG_FW_LLDP_STOPPABLE;
+   }
+   if (hw->mac.type == I40E_MAC_X722 &&
+   hw->aq.api_maj_ver == I40E_FW_API_VERSION_MAJOR &&
+   hw->aq.api_min_ver >= I40E_MINOR_VER_FW_LLDP_STOPPABLE_X722) {
+   hw->flags |= I40E_HW_FLAG_FW_LLDP_STOPPABLE;
}
 
/* Newer versions of firmware require lock when reading the NVM */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index 80e3eec6134e..11506102471c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -11,7 +11,7 @@
  */
 
 #define I40E_FW_API_VERSION_MAJOR  0x0001
-#define I40E_FW_API_VERSION_MINOR_X722 0x0005
+#define I40E_FW_API_VERSION_MINOR_X722 0x0006
 #define I40E_FW_API_VERSION_MINOR_X710 0x0007
 
 #define I40E_FW_MINOR_VERSION(_h) ((_h)->mac.type == I40E_MAC_XL710 ? \
@@ -20,6 +20,8 @@
 
 /* API version 1.7 implements additional link and PHY-specific APIs  */
 #define I40E_MINOR_VER_GET_LINK_INFO_XL710 0x0007
+/* API version 1.6 for X722 devices adds ability to stop FW LLDP agent */
+#define I40E_MINOR_VER_FW_LLDP_STOPPABLE_X722 0x0006
 
 struct i40e_aq_desc {
__le16 flags;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index 7df969c59855..2781ab91ca82 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -615,6 +615,7 @@ struct i40e_hw {
 #define I40E_HW_FLAG_802_1AD_CAPABLE        BIT_ULL(1)
 #define I40E_HW_FLAG_AQ_PHY_ACCESS_CAPABLE  BIT_ULL(2)
 #define I40E_HW_FLAG_NVM_READ_REQUIRES_LOCK BIT_ULL(3)
+#define I40E_HW_FLAG_FW_LLDP_STOPPABLE  BIT_ULL(4)
u64 flags;
 
/* Used in set switch config AQ command */
-- 
2.19.1



[net-next 05/14] i40e: Protect access to VF control methods

2018-11-14 Thread Jeff Kirsher
From: Jan Sokolowski 

A scenario has been found in which simultaneous
addition/removal and modification of VFs might cause
unstable behaviour, up to and including kernel panics.

Protect the methods that create/modify/destroy VFs
by locking them behind an atomically set bit in the PF
state bitfield.
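
The guard is the familiar test_and_set_bit pattern; here is a minimal user-space sketch of the idea (the kernel uses atomic bitops on the `__I40E_VIRTCHNL_OP_PENDING` bit in `pf->state`; the `atomic_flag` and the `-1` return below are illustrative stand-ins for the state bit and `-EAGAIN`):

```c
#include <assert.h>
#include <stdatomic.h>

/* Only one caller at a time may enter the VF control path; the
 * others back off and retry later.
 */
static atomic_flag op_pending = ATOMIC_FLAG_INIT;

static int vf_op(int *did_run)
{
	/* returns nonzero if the flag was already set by another caller */
	if (atomic_flag_test_and_set(&op_pending))
		return -1;		/* -EAGAIN: other operation pending */

	*did_run = 1;			/* ... create/modify/destroy VFs ... */

	atomic_flag_clear(&op_pending);	/* release on every exit path */
	return 0;
}
```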

Signed-off-by: Jan Sokolowski 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h|  1 +
 .../ethernet/intel/i40e/i40e_virtchnl_pf.c| 64 +--
 2 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 876cac317e79..5595a4614206 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -146,6 +146,7 @@ enum i40e_state_t {
__I40E_CLIENT_SERVICE_REQUESTED,
__I40E_CLIENT_L2_CHANGE,
__I40E_CLIENT_RESET,
+   __I40E_VIRTCHNL_OP_PENDING,
/* This must be last as it determines the size of the BITMAP */
__I40E_STATE_SIZE__,
 };
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index ac5698ed0b11..8e0a247e7e5a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1675,13 +1675,20 @@ static int i40e_pci_sriov_enable(struct pci_dev *pdev, int num_vfs)
 int i40e_pci_sriov_configure(struct pci_dev *pdev, int num_vfs)
 {
struct i40e_pf *pf = pci_get_drvdata(pdev);
+   int ret = 0;
+
+   if (test_and_set_bit(__I40E_VIRTCHNL_OP_PENDING, pf->state)) {
+   dev_warn(&pdev->dev, "Unable to configure VFs, other operation is pending.\n");
+   return -EAGAIN;
+   }
 
if (num_vfs) {
if (!(pf->flags & I40E_FLAG_VEB_MODE_ENABLED)) {
pf->flags |= I40E_FLAG_VEB_MODE_ENABLED;
i40e_do_reset_safe(pf, I40E_PF_RESET_FLAG);
}
-   return i40e_pci_sriov_enable(pdev, num_vfs);
+   ret = i40e_pci_sriov_enable(pdev, num_vfs);
+   goto sriov_configure_out;
}
 
if (!pci_vfs_assigned(pf->pdev)) {
@@ -1690,9 +1697,12 @@ int i40e_pci_sriov_configure(struct pci_dev *pdev, int num_vfs)
i40e_do_reset_safe(pf, I40E_PF_RESET_FLAG);
} else {
dev_warn(&pdev->dev, "Unable to free VFs because some are assigned to VMs.\n");
-   return -EINVAL;
+   ret = -EINVAL;
+   goto sriov_configure_out;
}
-   return 0;
+sriov_configure_out:
+   clear_bit(__I40E_VIRTCHNL_OP_PENDING, pf->state);
+   return ret;
 }
 
 /***virtual channel routines**/
@@ -3893,6 +3903,11 @@ int i40e_ndo_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
goto error_param;
}
 
+   if (test_and_set_bit(__I40E_VIRTCHNL_OP_PENDING, pf->state)) {
+   dev_warn(&pf->pdev->dev, "Unable to configure VFs, other operation is pending.\n");
+   return -EAGAIN;
+   }
+
if (is_multicast_ether_addr(mac)) {
dev_err(&pf->pdev->dev,
"Invalid Ethernet address %pM for VF %d\n", mac, vf_id);
@@ -3941,6 +3956,7 @@ int i40e_ndo_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
dev_info(&pf->pdev->dev, "Bring down and up the VF interface to make this change effective.\n");
 
 error_param:
+   clear_bit(__I40E_VIRTCHNL_OP_PENDING, pf->state);
return ret;
 }
 
@@ -3992,6 +4008,11 @@ int i40e_ndo_set_vf_port_vlan(struct net_device *netdev, int vf_id,
struct i40e_vf *vf;
int ret = 0;
 
+   if (test_and_set_bit(__I40E_VIRTCHNL_OP_PENDING, pf->state)) {
+   dev_warn(&pf->pdev->dev, "Unable to configure VFs, other operation is pending.\n");
+   return -EAGAIN;
+   }
+
/* validate the request */
ret = i40e_validate_vf(pf, vf_id);
if (ret)
@@ -4107,6 +4128,7 @@ int i40e_ndo_set_vf_port_vlan(struct net_device *netdev, int vf_id,
ret = 0;
 
 error_pvid:
+   clear_bit(__I40E_VIRTCHNL_OP_PENDING, pf->state);
return ret;
 }
 
@@ -4128,6 +4150,11 @@ int i40e_ndo_set_vf_bw(struct net_device *netdev, int vf_id, int min_tx_rate,
struct i40e_vf *vf;
int ret = 0;
 
+   if (test_and_set_bit(__I40E_VIRTCHNL_OP_PENDING, pf->state)) {
+   dev_warn(&pf->pdev->dev, "Unable to configure VFs, other operation is pending.\n");
+   return -EAGAIN;
+   }
+
/* validate the request */
ret = i40e_validate_vf(pf, vf_id);
if (ret)
@@ -4154,6 +4181,7 @@ int i40e_ndo_set_vf_bw(struct net_device *netdev, int vf_id, int min_tx_rate,
 
vf->tx_rate = max_tx_rate;
 error:
+   clear_bit(__I40E_VIRTCHNL_OP_PENDING, pf->state);
return ret;
 }
 
@

[net-next 02/14] i40e: Use a local variable for readability

2018-11-14 Thread Jeff Kirsher
From: Jan Sokolowski 

Use a local variable to make the code a bit more readable.

Signed-off-by: Jan Sokolowski 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 1384a5a006a4..c4d44096cdaf 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -3528,6 +3528,7 @@ static int i40e_xmit_xdp_ring(struct xdp_frame *xdpf,
u16 i = xdp_ring->next_to_use;
struct i40e_tx_buffer *tx_bi;
struct i40e_tx_desc *tx_desc;
+   void *data = xdpf->data;
u32 size = xdpf->len;
dma_addr_t dma;
 
@@ -3535,8 +3536,7 @@ static int i40e_xmit_xdp_ring(struct xdp_frame *xdpf,
xdp_ring->tx_stats.tx_busy++;
return I40E_XDP_CONSUMED;
}
-
-   dma = dma_map_single(xdp_ring->dev, xdpf->data, size, DMA_TO_DEVICE);
+   dma = dma_map_single(xdp_ring->dev, data, size, DMA_TO_DEVICE);
if (dma_mapping_error(xdp_ring->dev, dma))
return I40E_XDP_CONSUMED;
 
-- 
2.19.1



[net-next 08/14] i40e: don't restart nway if autoneg not supported

2018-11-14 Thread Jeff Kirsher
From: Mitch Williams 

On link types that do not support autoneg, we cannot attempt to restart
nway negotiation. This results in a dead link that requires a power
cycle to remedy.

Fix this by saving off the autoneg state and checking this value before
we try to restart nway.
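
The gating logic can be sketched as follows (a user-space model; `is_an` stands in for `hw_link_info->an_info & I40E_AQ_AN_COMPLETED`, and `restart_nway()` is an illustrative stand-in for the driver's renegotiation path):

```c
#include <assert.h>

static int restarted;

static void restart_nway(void)
{
	restarted = 1;	/* model of kicking off autonegotiation */
}

/* Model of the fix: capture the autoneg state once and gate the
 * nway-restart path on it, so links without autoneg support are
 * never bounced into a dead state.
 */
static void apply_pause_cfg(int link_up, int is_an)
{
	/* ... program flow control settings ... */
	if (link_up && is_an)
		restart_nway();	/* only renegotiate when autoneg exists */
}
```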

Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 311edac272aa..1ed241a5c3f4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -1335,6 +1335,7 @@ static int i40e_set_pauseparam(struct net_device *netdev,
i40e_status status;
u8 aq_failures;
int err = 0;
+   u32 is_an;
 
/* Changing the port's flow control is not supported if this isn't the
 * port's controlling PF
@@ -1347,15 +1348,14 @@ static int i40e_set_pauseparam(struct net_device *netdev,
if (vsi != pf->vsi[pf->lan_vsi])
return -EOPNOTSUPP;
 
-   if (pause->autoneg != ((hw_link_info->an_info & I40E_AQ_AN_COMPLETED) ?
-   AUTONEG_ENABLE : AUTONEG_DISABLE)) {
+   is_an = hw_link_info->an_info & I40E_AQ_AN_COMPLETED;
+   if (pause->autoneg != is_an) {
netdev_info(netdev, "To change autoneg please use: ethtool -s  autoneg \n");
return -EOPNOTSUPP;
}
 
/* If we have link and don't have autoneg */
-   if (!test_bit(__I40E_DOWN, pf->state) &&
-   !(hw_link_info->an_info & I40E_AQ_AN_COMPLETED)) {
+   if (!test_bit(__I40E_DOWN, pf->state) && !is_an) {
/* Send message that it might not necessarily work*/
netdev_info(netdev, "Autoneg did not complete so changing settings may not result in an actual change.\n");
}
@@ -1406,7 +1406,7 @@ static int i40e_set_pauseparam(struct net_device *netdev,
err = -EAGAIN;
}
 
-   if (!test_bit(__I40E_DOWN, pf->state)) {
+   if (!test_bit(__I40E_DOWN, pf->state) && is_an) {
/* Give it a little more time to try to come back */
msleep(75);
if (!test_bit(__I40E_DOWN, pf->state))
-- 
2.19.1



[net-next 10/14] virtchnl: white space and reorder

2018-11-14 Thread Jeff Kirsher
From: Alice Michael 

White space change.

Move the check on the virtchnl_vsi_queue_config_info struct
to be close to the struct like all the other similar checks.
This keeps it clearer and easier to read.

Signed-off-by: Alice Michael 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 include/linux/avf/virtchnl.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -171,7 +171,7 @@ struct virtchnl_msg {
 
 VIRTCHNL_CHECK_STRUCT_LEN(20, virtchnl_msg);
 
-/* Message descriptions and data structures.*/
+/* Message descriptions and data structures. */
 
 /* VIRTCHNL_OP_VERSION
  * VF posts its version number to the PF. PF responds with its version number
@@ -342,6 +342,8 @@ struct virtchnl_vsi_queue_config_info {
struct virtchnl_queue_pair_info qpair[1];
 };
 
+VIRTCHNL_CHECK_STRUCT_LEN(72, virtchnl_vsi_queue_config_info);
+
 /* VIRTCHNL_OP_REQUEST_QUEUES
  * VF sends this message to request the PF to allocate additional queues to
  * this VF.  Each VF gets a guaranteed number of queues on init but asking for
@@ -357,8 +359,6 @@ struct virtchnl_vf_res_request {
u16 num_queue_pairs;
 };
 
-VIRTCHNL_CHECK_STRUCT_LEN(72, virtchnl_vsi_queue_config_info);
-
 /* VIRTCHNL_OP_CONFIG_IRQ_MAP
  * VF uses this message to map vectors to queues.
  * The rxq_map and txq_map fields are bitmaps used to indicate which queues
-- 
2.19.1



[net-next 11/14] virtchnl: Fix off by one error

2018-11-14 Thread Jeff Kirsher
From: Alice Michael 

When calculating the valid length for a VIRTCHNL_OP_ENABLE_CHANNELS
message, we accidentally allowed messages with one extra
virtchnl_channel_info structure on the end. This happened due
to an off by one error, because we forgot that valid_len already
accounted for one virtchnl_channel_info structure, so we need to
subtract one from the num_tc value.

Signed-off-by: Alice Michael 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 include/linux/avf/virtchnl.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index 3130dec40b93..7605b5919c3a 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -819,8 +819,8 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
if (msglen >= valid_len) {
struct virtchnl_tc_info *vti =
(struct virtchnl_tc_info *)msg;
-   valid_len += vti->num_tc *
-   sizeof(struct virtchnl_channel_info);
+   valid_len += (vti->num_tc - 1) *
+sizeof(struct virtchnl_channel_info);
if (vti->num_tc == 0)
err_msg_format = true;
}
-- 
2.19.1



[net-next 01/14] i40e: Replace spin_is_locked() with lockdep

2018-11-14 Thread Jeff Kirsher
From: Lance Roy 

lockdep_assert_held() is better suited to checking locking requirements,
since it won't get confused when someone else holds the lock. This is
also a step towards possibly removing spin_is_locked().

Signed-off-by: Lance Roy 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 21c2688d6308..7e4c07227832 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -1493,8 +1493,7 @@ int i40e_del_mac_filter(struct i40e_vsi *vsi, const u8 *macaddr)
bool found = false;
int bkt;
 
-   WARN(!spin_is_locked(&vsi->mac_filter_hash_lock),
-"Missing mac_filter_hash_lock\n");
+   lockdep_assert_held(&vsi->mac_filter_hash_lock);
hash_for_each_safe(vsi->mac_filter_hash, bkt, h, f, hlist) {
if (ether_addr_equal(macaddr, f->macaddr)) {
__i40e_del_filter(vsi, f);
-- 
2.19.1



[net-next 07/14] i40e: Allow disabling FW LLDP on X722 devices

2018-11-14 Thread Jeff Kirsher
From: Patryk Małek 

This patch allows disabling FW LLDP agent on X722 devices.
It also changes a source of information for this feature from
pf->hw_features to pf->hw.flags which are set in i40e_init_adminq.

Signed-off-by: Patryk Małek 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h |  1 -
 drivers/net/ethernet/intel/i40e/i40e_common.c  |  3 +++
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 13 +++--
 drivers/net/ethernet/intel/i40e/i40e_main.c| 15 +++
 4 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 5595a4614206..cda37d7ae5d6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -495,7 +495,6 @@ struct i40e_pf {
 #define I40E_HW_STOP_FW_LLDP   BIT(16)
 #define I40E_HW_PORT_ID_VALID  BIT(17)
 #define I40E_HW_RESTART_AUTONEGBIT(18)
-#define I40E_HW_STOPPABLE_FW_LLDP  BIT(19)
 
u32 flags;
 #define I40E_FLAG_RX_CSUM_ENABLED  BIT(0)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 85f75b5978fc..97a9b1fb4763 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -3723,6 +3723,9 @@ i40e_aq_set_dcb_parameters(struct i40e_hw *hw, bool dcb_enable,
(struct i40e_aqc_set_dcb_parameters *)&desc.params.raw;
i40e_status status;
 
+   if (!(hw->flags & I40E_HW_FLAG_FW_LLDP_STOPPABLE))
+   return I40E_ERR_DEVICE_NOT_SUPPORTED;
+
i40e_fill_default_direct_cmd_desc(&desc,
  i40e_aqc_opc_set_dcb_parameters);
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 9c1211ad2c6b..311edac272aa 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -4660,14 +4660,15 @@ static int i40e_set_priv_flags(struct net_device *dev, u32 flags)
return -EOPNOTSUPP;
 
/* If the driver detected FW LLDP was disabled on init, this flag could
-* be set, however we do not support _changing_ the flag if NPAR is
-* enabled or FW API version < 1.7.  There are situations where older
-* FW versions/NPAR enabled PFs could disable LLDP, however we _must_
-* not allow the user to enable/disable LLDP with this flag on
-* unsupported FW versions.
+* be set, however we do not support _changing_ the flag:
+* - on XL710 if NPAR is enabled or FW API version < 1.7
+* - on X722 with FW API version < 1.6
+* There are situations where older FW versions/NPAR enabled PFs could
+* disable LLDP, however we _must_ not allow the user to enable/disable
+* LLDP with this flag on unsupported FW versions.
 */
if (changed_flags & I40E_FLAG_DISABLE_FW_LLDP) {
-   if (!(pf->hw_features & I40E_HW_STOPPABLE_FW_LLDP)) {
+   if (!(pf->hw.flags & I40E_HW_FLAG_FW_LLDP_STOPPABLE)) {
dev_warn(&pf->pdev->dev,
 "Device does not support changing FW LLDP\n");
return -EOPNOTSUPP;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index dbd6fffd9b85..d4461eec26bd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11331,16 +11331,15 @@ static int i40e_sw_init(struct i40e_pf *pf)
/* IWARP needs one extra vector for CQP just like MISC.*/
pf->num_iwarp_msix = (int)num_online_cpus() + 1;
}
-   /* Stopping the FW LLDP engine is only supported on the
-* XL710 with a FW ver >= 1.7.  Also, stopping FW LLDP
-* engine is not supported if NPAR is functioning on this
-* part
+   /* Stopping FW LLDP engine is supported on XL710 and X722
+* starting from FW versions determined in i40e_init_adminq.
+* Stopping the FW LLDP engine is not supported on XL710
+* if NPAR is functioning so unset this hw flag in this case.
 */
if (pf->hw.mac.type == I40E_MAC_XL710 &&
-   !pf->hw.func_caps.npar_enable &&
-   (pf->hw.aq.api_maj_ver > 1 ||
-(pf->hw.aq.api_maj_ver == 1 && pf->hw.aq.api_min_ver > 6)))
-   pf->hw_features |= I40E_HW_STOPPABLE_FW_LLDP;
+   pf->hw.func_caps.npar_enable &&
+   (pf->hw.flags & I40E_HW_FLAG_FW_LLDP_STOPPABLE))
+   pf->hw.flags &= ~I40E_HW_FLAG_FW_LLDP_STOPPABLE;
 
 #ifdef CONFIG_PCI_IOV
if (pf->hw.func_caps.num_vfs && pf->hw.partition_id == 1) {
-- 
2.19.1



[net-next 06/14] i40e: update driver version

2018-11-14 Thread Jeff Kirsher
From: Alice Michael 

The version numbers have not been kept up to date, and this is
an effort to amend that.

Signed-off-by: Alice Michael 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b6f4ebb4557e..dbd6fffd9b85 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -26,8 +26,8 @@ static const char i40e_driver_string[] =
 #define DRV_KERN "-k"
 
 #define DRV_VERSION_MAJOR 2
-#define DRV_VERSION_MINOR 3
-#define DRV_VERSION_BUILD 2
+#define DRV_VERSION_MINOR 7
+#define DRV_VERSION_BUILD 6
 #define DRV_VERSION __stringify(DRV_VERSION_MAJOR) "." \
 __stringify(DRV_VERSION_MINOR) "." \
 __stringify(DRV_VERSION_BUILD)DRV_KERN
-- 
2.19.1



[net-next 00/14][pull request] 40GbE Intel Wired LAN Driver Updates 2018-11-14

2018-11-14 Thread Jeff Kirsher
This series contains updates to i40e and virtchnl.

Lance Roy updates i40e to use lockdep_assert_held() instead of
spin_is_locked(), since it is better suited to check locking
requirements.

Jan improves code readability in XDP by adding the use of a local
variable.  He also protects the methods that create/modify/destroy
VFs via a locking mechanism to prevent unstable behaviour and
potential kernel panics.

Krzysztof adds a hardware capability flag to indicate whether firmware
supports stopping the LLDP agent.

Patryk replaces the use of strncpy() with strlcpy() to ensure the buffer
is NULL terminated.

Mitch fixes the issue of trying to start nway on devices that do not
support auto-negotiation, by checking the autoneg state before
attempting to restart nway.

Alice updates virtchnl to keep the checks all together for ease of
readability and consistency.  Also fixed an "off by one" error in the
number of traffic classes being calculated.

Richard fixed VF port VLANs, where the priority bits were incorrectly
set because the incorrect shift and mask bits were being used.

Alan adds a bit to set and check if a timeout recovery is already
pending to prevent overlapping transmit timeout recovery.

The following are changes since commit 15cef30974c5f8b256008beb62dcbf15792b77a9:
  Merge branch 'aquantia-add-rx-flow-filter-support'
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Alan Brady (1):
  i40e: prevent overlapping tx_timeout recover

Alice Michael (3):
  i40e: update driver version
  virtchnl: white space and reorder
  virtchnl: Fix off by one error

Jacob Keller (1):
  i40e: always set ks->base.speed in i40e_get_settings_link_up

Jan Sokolowski (2):
  i40e: Use a local variable for readability
  i40e: Protect access to VF control methods

Krzysztof Galazka (1):
  i40e: Add capability flag for stopping FW LLDP

Lance Roy (1):
  i40e: Replace spin_is_locked() with lockdep

Mitch Williams (2):
  i40e: don't restart nway if autoneg not supported
  i40e: suppress bogus error message

Patryk Małek (2):
  i40e: Replace strncpy with strlcpy to ensure null termination
  i40e: Allow disabling FW LLDP on X722 devices

Richard Rodriguez (1):
  i40e: Use correct shift for VLAN priority

 drivers/net/ethernet/intel/i40e/i40e.h|  3 +-
 drivers/net/ethernet/intel/i40e/i40e_adminq.c |  6 ++
 .../net/ethernet/intel/i40e/i40e_adminq_cmd.h |  4 +-
 drivers/net/ethernet/intel/i40e/i40e_common.c |  3 +
 .../net/ethernet/intel/i40e/i40e_ethtool.c| 24 ---
 drivers/net/ethernet/intel/i40e/i40e_main.c   | 41 ++--
 drivers/net/ethernet/intel/i40e/i40e_ptp.c|  2 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  4 +-
 drivers/net/ethernet/intel/i40e/i40e_type.h   |  1 +
 .../ethernet/intel/i40e/i40e_virtchnl_pf.c| 67 +--
 .../ethernet/intel/i40e/i40e_virtchnl_pf.h|  4 +-
 include/linux/avf/virtchnl.h  | 10 +--
 12 files changed, 121 insertions(+), 48 deletions(-)

-- 
2.19.1



[net-next 12/14] i40e: Use correct shift for VLAN priority

2018-11-14 Thread Jeff Kirsher
From: Richard Rodriguez 

When using a port VLAN for VFs and setting priority bits, the device
was sending out incorrect priority bits, and also setting the CFI
bit incorrectly.

To fix this, change the shift and mask bit definitions used by this
function to the correct values.

Signed-off-by: Richard Rodriguez 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
index bf67d62e2b5f..f9621026beef 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
@@ -13,9 +13,9 @@
 #define I40E_DEFAULT_NUM_MDD_EVENTS_ALLOWED3
 #define I40E_DEFAULT_NUM_INVALID_MSGS_ALLOWED  10
 
-#define I40E_VLAN_PRIORITY_SHIFT   12
+#define I40E_VLAN_PRIORITY_SHIFT   13
 #define I40E_VLAN_MASK 0xFFF
-#define I40E_PRIORITY_MASK 0x7000
+#define I40E_PRIORITY_MASK 0xE000
 
 /* Various queue ctrls */
 enum i40e_queue_ctrl {
-- 
2.19.1



[net-next 14/14] i40e: prevent overlapping tx_timeout recover

2018-11-14 Thread Jeff Kirsher
From: Alan Brady 

If a TX hang occurs, we attempt to recover by incrementally resetting.
If we're starved for CPU time, it's possible the reset doesn't actually
complete (or even fire) before another tx_timeout fires, causing us to
fly through the different resets without actually doing them.

This adds a bit to set and check if a timeout recovery is already
pending and, if so, bail out of tx_timeout.  The bit will get cleared at
the end of i40e_rebuild when reset is complete.

Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h  | 1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c | 5 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index cda37d7ae5d6..8de9085bba9e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -122,6 +122,7 @@ enum i40e_state_t {
__I40E_MDD_EVENT_PENDING,
__I40E_VFLR_EVENT_PENDING,
__I40E_RESET_RECOVERY_PENDING,
+   __I40E_TIMEOUT_RECOVERY_PENDING,
__I40E_MISC_IRQ_REQUESTED,
__I40E_RESET_INTR_RECEIVED,
__I40E_REINIT_REQUESTED,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d4461eec26bd..47f0fdadbac9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -338,6 +338,10 @@ static void i40e_tx_timeout(struct net_device *netdev)
  (pf->tx_timeout_last_recovery + netdev->watchdog_timeo)))
return;   /* don't do any new action before the next timeout */
 
+   /* don't kick off another recovery if one is already pending */
+   if (test_and_set_bit(__I40E_TIMEOUT_RECOVERY_PENDING, pf->state))
+   return;
+
if (tx_ring) {
head = i40e_get_head(tx_ring);
/* Read interrupt register */
@@ -9631,6 +9635,7 @@ static void i40e_rebuild(struct i40e_pf *pf, bool reinit, bool lock_acquired)
clear_bit(__I40E_RESET_FAILED, pf->state);
 clear_recovery:
clear_bit(__I40E_RESET_RECOVERY_PENDING, pf->state);
+   clear_bit(__I40E_TIMEOUT_RECOVERY_PENDING, pf->state);
 }
 
 /**
-- 
2.19.1



[net-next 04/14] i40e: Replace strncpy with strlcpy to ensure null termination

2018-11-14 Thread Jeff Kirsher
From: Patryk Małek 

Using strncpy can leave the destination buffer not null terminated
after the copy takes place.  strlcpy ensures that's not the case by
explicitly setting the last element in the buffer to '\0'.

Signed-off-by: Patryk Małek 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 14 +++---
 drivers/net/ethernet/intel/i40e/i40e_ptp.c  |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 7e4c07227832..b6f4ebb4557e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -14301,23 +14301,23 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
switch (hw->bus.speed) {
case i40e_bus_speed_8000:
-   strncpy(speed, "8.0", PCI_SPEED_SIZE); break;
+   strlcpy(speed, "8.0", PCI_SPEED_SIZE); break;
case i40e_bus_speed_5000:
-   strncpy(speed, "5.0", PCI_SPEED_SIZE); break;
+   strlcpy(speed, "5.0", PCI_SPEED_SIZE); break;
case i40e_bus_speed_2500:
-   strncpy(speed, "2.5", PCI_SPEED_SIZE); break;
+   strlcpy(speed, "2.5", PCI_SPEED_SIZE); break;
default:
break;
}
switch (hw->bus.width) {
case i40e_bus_width_pcie_x8:
-   strncpy(width, "8", PCI_WIDTH_SIZE); break;
+   strlcpy(width, "8", PCI_WIDTH_SIZE); break;
case i40e_bus_width_pcie_x4:
-   strncpy(width, "4", PCI_WIDTH_SIZE); break;
+   strlcpy(width, "4", PCI_WIDTH_SIZE); break;
case i40e_bus_width_pcie_x2:
-   strncpy(width, "2", PCI_WIDTH_SIZE); break;
+   strlcpy(width, "2", PCI_WIDTH_SIZE); break;
case i40e_bus_width_pcie_x1:
-   strncpy(width, "1", PCI_WIDTH_SIZE); break;
+   strlcpy(width, "1", PCI_WIDTH_SIZE); break;
default:
break;
}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
index 1199f0502d6d..e6fc0aff8c99 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
@@ -694,7 +694,7 @@ static long i40e_ptp_create_clock(struct i40e_pf *pf)
if (!IS_ERR_OR_NULL(pf->ptp_clock))
return 0;
 
-   strncpy(pf->ptp_caps.name, i40e_driver_name,
+   strlcpy(pf->ptp_caps.name, i40e_driver_name,
sizeof(pf->ptp_caps.name) - 1);
pf->ptp_caps.owner = THIS_MODULE;
pf->ptp_caps.max_adj = 9;
-- 
2.19.1



[net-next 13/14] i40e: suppress bogus error message

2018-11-14 Thread Jeff Kirsher
From: Mitch Williams 

The i40e driver complains about unprivileged VFs trying to configure
promiscuous mode each time a VF reset occurs. This isn't the fault of
the poor VF driver - the PF driver itself is making the request.

To fix this, skip the privilege check if the request is to disable all
promiscuous activity. This gets rid of the bogus message, but doesn't
affect privilege checks, since we really only care if the unprivileged
VF is trying to enable promiscuous mode.

Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 8e0a247e7e5a..2ac23ebfbf31 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1112,7 +1112,8 @@ static i40e_status i40e_config_vf_promiscuous_mode(struct i40e_vf *vf,
if (!i40e_vc_isvalid_vsi_id(vf, vsi_id) || !vsi)
return I40E_ERR_PARAM;
 
-   if (!test_bit(I40E_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps)) {
+   if (!test_bit(I40E_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps) &&
+   (allmulti || alluni)) {
dev_err(&pf->pdev->dev,
"Unprivileged VF %d is attempting to configure promiscuous mode\n",
vf->vf_id);
-- 
2.19.1



Re: xfrm: policy: add inexact policy search tree infrastructure

2018-11-14 Thread Florian Westphal
Colin Ian King  wrote:
> Hi,
> 
> Static analysis with CoverityScan found a potential issue with the commit:
> 
> commit 6be3b0db6db82cf056a72cc18042048edd27f8ee
> Author: Florian Westphal 
> Date:   Wed Nov 7 23:00:37 2018 +0100
> 
> xfrm: policy: add inexact policy search tree infrastructure
> 
> It seems that pointer pol is set to NULL and then a check to see if it
> is non-null is used to set pol to tmp; however, this check is always
> going to be false because pol is always NULL.

Right.  This is in the control-plane code to retrieve a policy
via netlink or PF_KEY.

> The issue is reported by CoverityScan as follows:
> 
> Line
> 1658
> assignment: Assigning: pol = NULL.
> 1659pol = NULL;
> 1660for (i = 0; i < ARRAY_SIZE(cand.res); i++) {
> 1661struct xfrm_policy *tmp;
> 1662
> 1663tmp = __xfrm_policy_bysel_ctx(cand.res[i], mark,
> 1664  if_id, type, dir,
> 1665  sel, ctx);
> 
> null: At condition pol, the value of pol must be NULL.
> dead_error_condition: The condition pol cannot be true.
> 
> CID 1475480 (#1 of 1): Logically dead code
> 
> (DEADCODE) dead_error_line: Execution cannot reach the expression
> tmp->pos < pol->pos inside this statement: if (tmp && pol && tmp->pos < pol->pos)
> 
> 1666if (tmp && pol && tmp->pos < pol->pos)
> 1667pol = tmp;
>
> 
> I suspect this is not intentional and is probably a bug.

Right, bug.  Would like to just break after first 'tmp != NULL' but
that might make us return a different policy than old linear search.
So we should update pol in case its NULL as well.

Steffen, let me know if you want an incremental fix or if you
prefer to squash this:

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1663,7 +1663,10 @@ struct xfrm_policy *xfrm_policy_bysel_ctx(struct net 
*net, u32 mark, u32 if_id,
tmp = __xfrm_policy_bysel_ctx(cand.res[i], mark,
  if_id, type, dir,
  sel, ctx);
-   if (tmp && pol && tmp->pos < pol->pos)
+   if (!tmp)
+   continue;
+
+   if (!pol || tmp->pos < pol->pos)
pol = tmp;
}
} else {


re: xfrm: policy: add inexact policy search tree infrastructure

2018-11-14 Thread Colin Ian King
Hi,

Static analysis with CoverityScan found a potential issue with the commit:

commit 6be3b0db6db82cf056a72cc18042048edd27f8ee
Author: Florian Westphal 
Date:   Wed Nov 7 23:00:37 2018 +0100

xfrm: policy: add inexact policy search tree infrastructure

It seems that pointer pol is set to NULL and then a check to see if it
is non-null is used to set pol to tmp; however, this check is always
going to be false because pol is always NULL.

The issue is reported by CoverityScan as follows:

Line
1658
assignment: Assigning: pol = NULL.
1659pol = NULL;
1660for (i = 0; i < ARRAY_SIZE(cand.res); i++) {
1661struct xfrm_policy *tmp;
1662
1663tmp = __xfrm_policy_bysel_ctx(cand.res[i], mark,
1664  if_id, type, dir,
1665  sel, ctx);

null: At condition pol, the value of pol must be NULL.
dead_error_condition: The condition pol cannot be true.

CID 1475480 (#1 of 1): Logically dead code

(DEADCODE) dead_error_line: Execution cannot reach the expression
tmp->pos < pol->pos inside this statement: if (tmp && pol && tmp->pos < pol->pos)

1666if (tmp && pol && tmp->pos < pol->pos)
1667pol = tmp;
1668}


I suspect this is not intentional and is probably a bug.

Colin


[PATCH AUTOSEL 4.18 57/59] net: qualcomm: rmnet: Fix incorrect assignment of real_dev

2018-11-14 Thread Sasha Levin
From: Subash Abhinov Kasiviswanathan 

[ Upstream commit d02854dc1999ed3e7fd79ec700c64ac23ac0c458 ]

A null dereference was observed when a sysctl was being set
from userspace and rmnet was stuck trying to complete some actions
in the NETDEV_REGISTER callback. This is because the real_dev is set
only after the device registration handler completes.

sysctl call stack -

<6> Unable to handle kernel NULL pointer dereference at virtual address 0108
<2> pc : rmnet_vnd_get_iflink+0x1c/0x28
<2> lr : dev_get_iflink+0x2c/0x40
<2>  rmnet_vnd_get_iflink+0x1c/0x28
<2>  inet6_fill_ifinfo+0x15c/0x234
<2>  inet6_ifinfo_notify+0x68/0xd4
<2>  ndisc_ifinfo_sysctl_change+0x1b8/0x234
<2>  proc_sys_call_handler+0xac/0x100
<2>  proc_sys_write+0x3c/0x4c
<2>  __vfs_write+0x54/0x14c
<2>  vfs_write+0xcc/0x188
<2>  SyS_write+0x60/0xc0
<2>  el0_svc_naked+0x34/0x38

device register call stack -

<2>  notifier_call_chain+0x84/0xbc
<2>  raw_notifier_call_chain+0x38/0x48
<2>  call_netdevice_notifiers_info+0x40/0x70
<2>  call_netdevice_notifiers+0x38/0x60
<2>  register_netdevice+0x29c/0x3d8
<2>  rmnet_vnd_newlink+0x68/0xe8
<2>  rmnet_newlink+0xa0/0x160
<2>  rtnl_newlink+0x57c/0x6c8
<2>  rtnetlink_rcv_msg+0x1dc/0x328
<2>  netlink_rcv_skb+0xac/0x118
<2>  rtnetlink_rcv+0x24/0x30
<2>  netlink_unicast+0x158/0x1f0
<2>  netlink_sendmsg+0x32c/0x338
<2>  sock_sendmsg+0x44/0x60
<2>  SyS_sendto+0x150/0x1ac
<2>  el0_svc_naked+0x34/0x38

Fixes: b752eff5be24 ("net: qualcomm: rmnet: Implement ndo_get_iflink")
Signed-off-by: Sean Tranchetti 
Signed-off-by: Subash Abhinov Kasiviswanathan 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
index b9a7548ec6a0..2efdf7d2dec8 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -234,7 +234,7 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,
  struct net_device *real_dev,
  struct rmnet_endpoint *ep)
 {
-   struct rmnet_priv *priv;
+   struct rmnet_priv *priv = netdev_priv(rmnet_dev);
int rc;
 
if (ep->egress_dev)
@@ -247,6 +247,8 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,
rmnet_dev->hw_features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
rmnet_dev->hw_features |= NETIF_F_SG;
 
+   priv->real_dev = real_dev;
+
rc = register_netdevice(rmnet_dev);
if (!rc) {
ep->egress_dev = rmnet_dev;
@@ -255,9 +257,7 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,
 
rmnet_dev->rtnl_link_ops = &rmnet_link_ops;
 
-   priv = netdev_priv(rmnet_dev);
priv->mux_id = id;
-   priv->real_dev = real_dev;
 
netdev_dbg(rmnet_dev, "rmnet dev created\n");
}
-- 
2.17.1



Re: [PATCH net] net/sched: act_pedit: fix memory leak when IDR allocation fails

2018-11-14 Thread Cong Wang
(Cc'ing Jamal)

On Wed, Nov 14, 2018 at 3:26 AM Davide Caratti  wrote:
>
> tcf_idr_check_alloc() can return a negative value, on allocation failures
> (-ENOMEM) or IDR exhaustion (-ENOSPC): don't leak keys_ex in these cases.

I think the comments above tcf_idr_check_alloc() need to improve too,
they imply tcf_idr_check_alloc() returns either 0 or 1.

Of course, this can be done with a separated patch.

>
> Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action")
> Signed-off-by: Davide Caratti 

I think your patch is correct.

Acked-by: Cong Wang 


Re: [RFC v1 3/3] vxlan: handle underlay VRF changes

2018-11-14 Thread David Ahern
On 11/14/18 1:31 AM, Alexis Bauvin wrote:
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 7477b5510a04..188c0cdb8838 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -208,6 +208,18 @@ static inline struct vxlan_rdst *first_remote_rtnl(struct vxlan_fdb *fdb)
>   return list_first_entry(&fdb->remotes, struct vxlan_rdst, list);
>  }
>  
> +static int vxlan_is_in_l3mdev_chain(struct net_device *chain,
> + struct net_device *dev)
> +{
> + if (!chain)
> + return 0;
> +
> + if (chain->ifindex == dev->ifindex)
> + return 1;
> + return vxlan_is_in_l3mdev_chain(netdev_master_upper_dev_get(chain),
> + dev);

l3mdev_master_dev_rcu



Re: [RFC v1 2/3] vxlan: add support for underlay in non-default VRF

2018-11-14 Thread David Ahern
You are making this more specific than it needs to be.

On 11/14/18 1:31 AM, Alexis Bauvin wrote:
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 27bd586b94b0..7477b5510a04 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -208,11 +208,23 @@ static inline struct vxlan_rdst *first_remote_rtnl(struct vxlan_fdb *fdb)
>   return list_first_entry(&fdb->remotes, struct vxlan_rdst, list);
>  }
>  
> +static int vxlan_get_l3mdev(struct net *net, int ifindex)
> +{
> + struct net_device *dev;
> +
> + dev = __dev_get_by_index(net, ifindex);
> + while (dev && !netif_is_l3_master(dev))
> + dev = netdev_master_upper_dev_get(dev);
> +
> + return dev ? dev->ifindex : 0;
> +}

l3mdev_master_ifindex_by_index should work instead of defining this for
vxlan.

But I do not believe you need this function.


> +
>  /* Find VXLAN socket based on network namespace, address family and UDP port
>   * and enabled unshareable flags.
>   */
>  static struct vxlan_sock *vxlan_find_sock(struct net *net, sa_family_t family,
> -   __be16 port, u32 flags)
> +   __be16 port, u32 flags,
> +   int l3mdev_ifindex)
>  {
>   struct vxlan_sock *vs;
>  
> @@ -221,7 +233,8 @@ static struct vxlan_sock *vxlan_find_sock(struct net *net, sa_family_t family,
>   hlist_for_each_entry_rcu(vs, vs_head(net, port), hlist) {
>   if (inet_sk(vs->sock->sk)->inet_sport == port &&
>   vxlan_get_sk_family(vs) == family &&
> - vs->flags == flags)
> + vs->flags == flags &&
> + vs->sock->sk->sk_bound_dev_if == l3mdev_ifindex)

Why not allow the vxlan socket to bind to any ifindex? In that case this
socket lookup follows what we do for tcp, udp and raw sockets, and  you
don't need to call out vrf / l3mdev directly (ie.,
s/l3mdev_ifindex/ifindex/g) - it comes for free.



Re: [iproute PATCH] ip-address: Fix filtering by negated address flags

2018-11-14 Thread Stephen Hemminger
On Wed, 14 Nov 2018 11:52:51 +0100
Phil Sutter  wrote:

> Hi Stephen,
> 
> On Tue, Nov 13, 2018 at 02:47:59PM -0800, Stephen Hemminger wrote:
> > On Tue, 13 Nov 2018 16:12:01 +0100
> > Phil Sutter  wrote:
> >   
> > > + if (arg[0] == '-') {
> > > + inv = true;
> > > + arg++;
> > > + }  
> > The inverse logic needs to be moved into the loop handling filter names.
> > 
> > Otherwise, you get weirdness like "-dynamic" being accepted and not
> > doing what was expected.  
> 
> I intentionally moved it there to allow for '-dynamic' and '-primary'
> as well. IMO this is consistent: 'dynamic' is the inverse of 'permanent'
> and 'primary' the inverse of 'secondary' but currently only '-permanent'
> and '-secondary' are allowed. With my patch applied, one may specify not
> only '-permanent' to get the same effect as 'dynamic' but also
> '-dynamic' to get the same effect as 'permanent'. Likewise for the other
> two. Did I miss something?
> 
> > Also, please make sure the man page matches the code.  
> 
> Oh, right. Given the above is fine with you, I will add the man page
> change in v2.
> 
> Thanks, Phil

I was thinking something like this which simplifies the logic.

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index cd8cc76a3473..3f1510383071 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1212,37 +1212,34 @@ static void print_ifa_flags(FILE *fp, const struct ifaddrmsg *ifa,
 static int get_filter(const char *arg)
 {
unsigned int i;
+   bool inv = false;
 
/* Special cases */
if (strcmp(arg, "dynamic") == 0) {
-   filter.flags &= ~IFA_F_PERMANENT;
-   filter.flagmask |= IFA_F_PERMANENT;
+   arg = "-permanent";
} else if (strcmp(arg, "primary") == 0) {
-   filter.flags &= ~IFA_F_SECONDARY;
-   filter.flagmask |= IFA_F_SECONDARY;
-   } else if (*arg == '-') {
-   for (i = 0; i < ARRAY_SIZE(ifa_flag_names); i++) {
-   if (strcmp(arg + 1, ifa_flag_names[i].name))
-   continue;
+   arg = "-secondary";
+   }
 
-   filter.flags &= ifa_flag_names[i].value;
-   filter.flagmask |= ifa_flag_names[i].value;
-   return 0;
-   }
+   if (*arg == '-') {
+   inv = true;
+   ++arg;
+   }
 
-   return -1;
-   } else {
-   for (i = 0; i < ARRAY_SIZE(ifa_flag_names); i++) {
-   if (strcmp(arg, ifa_flag_names[i].name))
-   continue;
+   for (i = 0; i < ARRAY_SIZE(ifa_flag_names); i++) {
+   if (strcmp(arg, ifa_flag_names[i].name))
+   continue;
+
+   if (inv) {
+   filter.flags &= ~ifa_flag_names[i].value;
+   filter.flagmask |= ifa_flag_names[i].value;
+   } else {
filter.flags |= ifa_flag_names[i].value;
filter.flagmask |= ifa_flag_names[i].value;
-   return 0;
}
-   return -1;
+   return 0;
}
-
-   return 0;
+   return -1;
 }
 
 static int ifa_label_match_rta(int ifindex, const struct rtattr *rta)


Re: [iproute PATCH] ip-route: Fix nexthop encap parsing

2018-11-14 Thread Stephen Hemminger
On Tue, 13 Nov 2018 13:39:04 +0100
Phil Sutter  wrote:

> When parsing nexthop parameters, a buffer of 4k bytes is provided. Yet,
> in lwt_parse_encap() and some functions called by it, buffer size was
> assumed to be 1k despite the actual size was provided. This led to
> spurious buffer size errors if the buffer was filled by previous nexthop
> parameters to exceed that 1k boundary.
> 
> Fixes: 1e5293056a02c ("lwtunnel: Add encapsulation support to ip route")
> Fixes: 5866bddd9aa9e ("ila: Add support for ILA lwtunnels")
> Fixes: ed67f83806538 ("ila: Support for checksum neutral translation")
> Fixes: 86905c8f057c0 ("ila: support for configuring identifier and hook types")
> Fixes: b15f440e78373 ("lwt: BPF support for LWT")
> Signed-off-by: Phil Sutter 

Applied


Re: [iproute PATCH] man: ip-route.8: Document nexthop limit

2018-11-14 Thread Stephen Hemminger
On Mon, 12 Nov 2018 23:21:01 +0100
Phil Sutter  wrote:

> Add a note to 'nexthop' description stating the maximum number of
> nexthops per command and pointing at 'append' command as a workaround.
> 
> Signed-off-by: Phil Sutter 

Applied



Re: [PATCH net] ipv6: fix a dst leak when removing its exception

2018-11-14 Thread David Ahern
On 11/13/18 8:48 AM, Xin Long wrote:
> There is no need to hold dst before calling rt6_remove_exception_rt().
> The call to dst_hold_safe() in ip6_link_failure() was for ip6_del_rt(),
> which has been removed in Commit 93531c674315 ("net/ipv6: separate
> handling of FIB entries from dst based routes"). Otherwise, it will
> cause a dst leak.
> 
> This patch is to simply remove the dst_hold_safe() call before calling
> rt6_remove_exception_rt() and also do the same in ip6_del_cached_rt().
> It's safe, because the removal of the exception that holds its dst's
> refcnt is protected by rt6_exception_lock.
> 
> Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
> Fixes: 23fb93a4d3f1 ("net/ipv6: Cleanup exception and cache route handling")
> Reported-by: Li Shuang 
> Signed-off-by: Xin Long 
> ---
>  net/ipv6/route.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)

was this problem actually hit or is this patch based on a code analysis?
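The leak pattern described in the patch is easy to model outside the kernel: if the caller takes an extra reference before a removal routine that only drops the object's own reference, every call leaks one count. A toy sketch with invented types (not the real dst/rt6 API):

```c
/* Toy refcounted object standing in for a dst entry */
struct obj { int refcnt; };

static void hold(struct obj *o) { o->refcnt++; }
static void put(struct obj *o)  { o->refcnt--; }

/* Drops the reference that the exception entry itself held.
 * In the real code this runs under rt6_exception_lock, which is
 * what makes the extra hold unnecessary.
 */
static void remove_exception(struct obj *o)
{
	put(o);
}

/* Buggy pattern: the extra hold was only needed by a since-removed
 * ip6_del_rt() call, so nothing ever balances it.
 */
static int remove_leaky(struct obj *o)
{
	hold(o);
	remove_exception(o);
	return o->refcnt;
}

/* Fixed pattern: rely on the lock, take no extra reference. */
static int remove_fixed(struct obj *o)
{
	remove_exception(o);
	return o->refcnt;
}
```

With one initial reference, the leaky path leaves the count at 1 (object never freed, which is the "waiting for br0 to become free" symptom), while the fixed path reaches 0.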


[PATCH net-next] nxp: fix trivial comment typo

2018-11-14 Thread Andrea Claudi
s/rxfliterctrl/rxfilterctrl

Signed-off-by: Andrea Claudi 
---
 drivers/net/ethernet/nxp/lpc_eth.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/nxp/lpc_eth.c 
b/drivers/net/ethernet/nxp/lpc_eth.c
index bd8695a4faaa..89d17399fb5a 100644
--- a/drivers/net/ethernet/nxp/lpc_eth.c
+++ b/drivers/net/ethernet/nxp/lpc_eth.c
@@ -280,7 +280,7 @@
 #define LPC_FCCR_MIRRORCOUNTERCURRENT(n)   ((n) & 0x)
 
 /*
- * rxfliterctrl, rxfilterwolstatus, and rxfilterwolclear shared
+ * rxfilterctrl, rxfilterwolstatus, and rxfilterwolclear shared
  * register definitions
  */
 #define LPC_RXFLTRW_ACCEPTUNICAST  (1 << 0)
@@ -291,7 +291,7 @@
 #define LPC_RXFLTRW_ACCEPTPERFECT  (1 << 5)
 
 /*
- * rxfliterctrl register definitions
+ * rxfilterctrl register definitions
  */
 #define LPC_RXFLTRWSTS_MAGICPACKETENWOL(1 << 12)
 #define LPC_RXFLTRWSTS_RXFILTERENWOL   (1 << 13)
-- 
2.17.2



[PATCH v3 net-next 4/4] net/core: handle GRO_NORMAL skbs as a list in napi_gro_receive_list

2018-11-14 Thread Edward Cree
Allows GRO-using drivers to get the benefits of batching for non-GROable
 traffic.

Signed-off-by: Edward Cree 
---
 net/core/dev.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 35427167f6fb..65bfe28fbc81 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5664,6 +5664,7 @@ EXPORT_SYMBOL(napi_gro_receive);
 int napi_gro_receive_list(struct napi_struct *napi, struct list_head *head)
 {
struct sk_buff *skb, *next;
+   struct list_head sublist;
gro_result_t result;
int kept = 0;
 
@@ -5673,14 +5674,26 @@ int napi_gro_receive_list(struct napi_struct *napi, 
struct list_head *head)
skb_gro_reset_offset(skb);
}
 
+   INIT_LIST_HEAD(&sublist);
list_for_each_entry_safe(skb, next, head, list) {
list_del(&skb->list);
skb->next = NULL;
result = dev_gro_receive(napi, skb);
-   result = napi_skb_finish(result, skb);
-   if (result != GRO_DROP)
-   kept++;
+   if (result == GRO_NORMAL) {
+   list_add_tail(&skb->list, &sublist);
+   continue;
+   } else {
+   if (!list_empty(&sublist)) {
+   /* Handle the GRO_NORMAL skbs to prevent OoO */
+   kept += netif_receive_skb_list_internal(&sublist);
+   INIT_LIST_HEAD(&sublist);
+   }
+   result = napi_skb_finish(result, skb);
+   if (result != GRO_DROP)
+   kept++;
+   }
}
+   kept += netif_receive_skb_list_internal(&sublist);
return kept;
 }
 EXPORT_SYMBOL(napi_gro_receive_list);
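The ordering rule in napi_gro_receive_list() above — batch GRO_NORMAL packets on a sublist, but flush that sublist before finishing any packet that took another path — can be modeled without skbs. A sketch with invented stand-in types:

```c
#include <stddef.h>

enum toy_result { TOY_NORMAL, TOY_MERGED };

/* Delivers packets 0..n-1 into out[] in arrival order: TOY_NORMAL
 * packets are queued on a batch (the "sublist"), and the batch is
 * flushed before any non-NORMAL packet is finished, mirroring the
 * out-of-order guard in the patch. Sketch only: the batch is capped
 * at 16 entries.
 */
static size_t deliver(const enum toy_result *in, size_t n, int *out)
{
	int batch[16];
	size_t nbatch = 0, nout = 0, i, j;

	for (i = 0; i < n; i++) {
		if (in[i] == TOY_NORMAL) {
			batch[nbatch++] = (int)i;	/* queue on sublist */
			continue;
		}
		/* Flush batched NORMAL packets first to keep ordering */
		for (j = 0; j < nbatch; j++)
			out[nout++] = batch[j];
		nbatch = 0;
		out[nout++] = (int)i;			/* then finish this one */
	}
	for (j = 0; j < nbatch; j++)			/* final flush */
		out[nout++] = batch[j];
	return nout;
}
```

Whatever mix of results arrives, delivery order equals arrival order; the only thing batching changes is how many packets are handed to the (cheaper) list path at once.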


[PATCH v3 net-next 3/4] net: make listified RX functions return number of good packets

2018-11-14 Thread Edward Cree
'Good' packets are defined as skbs for which netif_receive_skb() would
 have returned %NET_RX_SUCCESS.  Thus, drivers can use this number for
 adaptive interrupt moderation where they previously reacted to the
 return code from netif_receive_skb().

Signed-off-by: Edward Cree 
---
 include/linux/netdevice.h |  4 +--
 include/net/ip.h  |  4 +--
 include/net/ipv6.h|  4 +--
 net/core/dev.c| 63 +--
 net/ipv4/ip_input.c   | 39 ++---
 net/ipv6/ip6_input.c  | 37 +---
 6 files changed, 92 insertions(+), 59 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2cef1d0fb2b1..76b98386a5dd 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2357,7 +2357,7 @@ struct packet_type {
 struct net_device *,
 struct packet_type *,
 struct net_device *);
-   void(*list_func) (struct list_head *,
+   int (*list_func) (struct list_head *,
  struct packet_type *,
  struct net_device *);
bool(*id_match)(struct packet_type *ptype,
@@ -3587,7 +3587,7 @@ int netif_rx(struct sk_buff *skb);
 int netif_rx_ni(struct sk_buff *skb);
 int netif_receive_skb(struct sk_buff *skb);
 int netif_receive_skb_core(struct sk_buff *skb);
-void netif_receive_skb_list(struct list_head *head);
+int netif_receive_skb_list(struct list_head *head);
 gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb);
 int napi_gro_receive_list(struct napi_struct *napi, struct list_head *head);
 void napi_gro_flush(struct napi_struct *napi, bool flush_old);
diff --git a/include/net/ip.h b/include/net/ip.h
index 8866bfce6121..33ab464f7a09 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -152,8 +152,8 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct 
sock *sk,
  struct ip_options_rcu *opt);
 int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
   struct net_device *orig_dev);
-void ip_list_rcv(struct list_head *head, struct packet_type *pt,
-struct net_device *orig_dev);
+int ip_list_rcv(struct list_head *head, struct packet_type *pt,
+   struct net_device *orig_dev);
 int ip_local_deliver(struct sk_buff *skb);
 void ip_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int proto);
 int ip_mr_input(struct sk_buff *skb);
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index daf80863d3a5..e25920829a94 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -914,8 +914,8 @@ static inline __be32 flowi6_get_flowlabel(const struct 
flowi6 *fl6)
 
 int ipv6_rcv(struct sk_buff *skb, struct net_device *dev,
 struct packet_type *pt, struct net_device *orig_dev);
-void ipv6_list_rcv(struct list_head *head, struct packet_type *pt,
-  struct net_device *orig_dev);
+int ipv6_list_rcv(struct list_head *head, struct packet_type *pt,
+ struct net_device *orig_dev);
 
 int ip6_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 8f0fb56170b3..35427167f6fb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4969,24 +4969,27 @@ int netif_receive_skb_core(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(netif_receive_skb_core);
 
-static inline void __netif_receive_skb_list_ptype(struct list_head *head,
- struct packet_type *pt_prev,
- struct net_device *orig_dev)
+static inline int __netif_receive_skb_list_ptype(struct list_head *head,
+struct packet_type *pt_prev,
+struct net_device *orig_dev)
 {
struct sk_buff *skb, *next;
+   int kept = 0;
 
if (!pt_prev)
-   return;
+   return 0;
if (list_empty(head))
-   return;
+   return 0;
if (pt_prev->list_func != NULL)
-   pt_prev->list_func(head, pt_prev, orig_dev);
+   kept = pt_prev->list_func(head, pt_prev, orig_dev);
else
list_for_each_entry_safe(skb, next, head, list)
-   pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+   if (pt_prev->func(skb, skb->dev, pt_prev, orig_dev) == NET_RX_SUCCESS)
+   kept++;
+   return kept;
 }
 
-static void __netif_receive_skb_list_core(struct list_head *head, bool pfmemalloc)
+static int __netif_receive_skb_list_core(struct list_head *head, bool pfmemalloc)
 {
/* Fast-path assumptions:
 * - There is 

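The fallback path added to __netif_receive_skb_list_ptype() reduces to a simple pattern: when no list function exists, deliver each packet individually and count the ones the handler accepted. A toy model with invented types:

```c
#include <stddef.h>

enum { TOY_RX_SUCCESS, TOY_RX_DROP };

/* Per-packet handler stand-in for pt_prev->func() */
typedef int (*toy_func)(int pkt);

/* Example handler: drops odd-numbered packets */
static int drop_odd(int pkt)
{
	return (pkt & 1) ? TOY_RX_DROP : TOY_RX_SUCCESS;
}

/* Returns the number of "good" packets, i.e. those for which the
 * handler reported success — the value drivers can feed into
 * adaptive interrupt moderation.
 */
static int deliver_list(const int *pkts, size_t n, toy_func fn)
{
	int kept = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (fn(pkts[i]) == TOY_RX_SUCCESS)
			kept++;
	return kept;
}
```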
[PATCH v3 net-next 2/4] sfc: use batched receive for GRO

2018-11-14 Thread Edward Cree
Signed-off-by: Edward Cree 
---
 drivers/net/ethernet/sfc/efx.c| 11 +--
 drivers/net/ethernet/sfc/net_driver.h |  1 +
 drivers/net/ethernet/sfc/rx.c | 16 +---
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 98fe7e762e17..dbe4a70b36b0 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -263,9 +263,9 @@ static int efx_check_disabled(struct efx_nic *efx)
  */
 static int efx_process_channel(struct efx_channel *channel, int budget)
 {
+   struct list_head rx_list, gro_list;
struct efx_tx_queue *tx_queue;
-   struct list_head rx_list;
-   int spent;
+   int spent, gro_count;
 
if (unlikely(!channel->enabled))
return 0;
@@ -275,6 +275,10 @@ static int efx_process_channel(struct efx_channel 
*channel, int budget)
INIT_LIST_HEAD(&rx_list);
channel->rx_list = &rx_list;
 
+   EFX_WARN_ON_PARANOID(channel->gro_list != NULL);
+   INIT_LIST_HEAD(&gro_list);
+   channel->gro_list = &gro_list;
+
efx_for_each_channel_tx_queue(tx_queue, channel) {
tx_queue->pkts_compl = 0;
tx_queue->bytes_compl = 0;
@@ -300,6 +304,9 @@ static int efx_process_channel(struct efx_channel *channel, 
int budget)
/* Receive any packets we queued up */
netif_receive_skb_list(channel->rx_list);
channel->rx_list = NULL;
+   gro_count = napi_gro_receive_list(&channel->napi_str, channel->gro_list);
+   channel->irq_mod_score += gro_count * 2;
+   channel->gro_list = NULL;
 
return spent;
 }
diff --git a/drivers/net/ethernet/sfc/net_driver.h 
b/drivers/net/ethernet/sfc/net_driver.h
index 961b92979640..72addac7a84a 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -502,6 +502,7 @@ struct efx_channel {
unsigned int rx_pkt_index;
 
struct list_head *rx_list;
+   struct list_head *gro_list;
 
struct efx_rx_queue rx_queue;
struct efx_tx_queue tx_queue[EFX_TXQ_TYPES];
diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 396ff01298cd..0534a54048c6 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -453,9 +453,19 @@ efx_rx_packet_gro(struct efx_channel *channel, struct 
efx_rx_buffer *rx_buf,
 
skb_record_rx_queue(skb, channel->rx_queue.core_index);
 
-   gro_result = napi_gro_frags(napi);
-   if (gro_result != GRO_DROP)
-   channel->irq_mod_score += 2;
+   /* Pass the packet up */
+   if (channel->gro_list != NULL) {
+   /* Clear napi->skb and prepare skb for GRO */
+   skb = napi_frags_skb(napi);
+   if (skb)
+   /* Add to list, will pass up later */
+   list_add_tail(&skb->list, channel->gro_list);
+   } else {
+   /* No list, so pass it up now */
+   gro_result = napi_gro_frags(napi);
+   if (gro_result != GRO_DROP)
+   channel->irq_mod_score += 2;
+   }
 }
 
 /* Allocate and construct an SKB around page fragments */



[PATCH v3 net-next 1/4] net: introduce list entry point for GRO

2018-11-14 Thread Edward Cree
Also export napi_frags_skb() so that drivers using the napi_gro_frags()
 interface can prepare their SKBs properly for submitting on such a list.

Signed-off-by: Edward Cree 
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c| 28 +++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 487fa5e0e165..2cef1d0fb2b1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3589,8 +3589,10 @@ int netif_receive_skb(struct sk_buff *skb);
 int netif_receive_skb_core(struct sk_buff *skb);
 void netif_receive_skb_list(struct list_head *head);
 gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb);
+int napi_gro_receive_list(struct napi_struct *napi, struct list_head *head);
 void napi_gro_flush(struct napi_struct *napi, bool flush_old);
 struct sk_buff *napi_get_frags(struct napi_struct *napi);
+struct sk_buff *napi_frags_skb(struct napi_struct *napi);
 gro_result_t napi_gro_frags(struct napi_struct *napi);
 struct packet_offload *gro_find_receive_by_type(__be16 type);
 struct packet_offload *gro_find_complete_by_type(__be16 type);
diff --git a/net/core/dev.c b/net/core/dev.c
index bf7e0a471186..8f0fb56170b3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5645,6 +5645,31 @@ gro_result_t napi_gro_receive(struct napi_struct *napi, 
struct sk_buff *skb)
 }
 EXPORT_SYMBOL(napi_gro_receive);
 
+/* Returns the number of SKBs on the list successfully received */
+int napi_gro_receive_list(struct napi_struct *napi, struct list_head *head)
+{
+   struct sk_buff *skb, *next;
+   gro_result_t result;
+   int kept = 0;
+
+   list_for_each_entry(skb, head, list) {
+   skb_mark_napi_id(skb, napi);
+   trace_napi_gro_receive_entry(skb);
+   skb_gro_reset_offset(skb);
+   }
+
+   list_for_each_entry_safe(skb, next, head, list) {
+   list_del(&skb->list);
+   skb->next = NULL;
+   result = dev_gro_receive(napi, skb);
+   result = napi_skb_finish(result, skb);
+   if (result != GRO_DROP)
+   kept++;
+   }
+   return kept;
+}
+EXPORT_SYMBOL(napi_gro_receive_list);
+
 static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
 {
if (unlikely(skb->pfmemalloc)) {
@@ -5716,7 +5741,7 @@ static gro_result_t napi_frags_finish(struct napi_struct 
*napi,
  * Drivers could call both napi_gro_frags() and napi_gro_receive()
  * We copy ethernet header into skb->data to have a common layout.
  */
-static struct sk_buff *napi_frags_skb(struct napi_struct *napi)
+struct sk_buff *napi_frags_skb(struct napi_struct *napi)
 {
struct sk_buff *skb = napi->skb;
const struct ethhdr *eth;
@@ -5752,6 +5777,7 @@ static struct sk_buff *napi_frags_skb(struct napi_struct 
*napi)
 
return skb;
 }
+EXPORT_SYMBOL(napi_frags_skb);
 
 gro_result_t napi_gro_frags(struct napi_struct *napi)
 {



[PATCH v3 net-next 0/4] net: batched receive in GRO path

2018-11-14 Thread Edward Cree
This series listifies part of GRO processing, in a manner which allows those
 packets which are not GROed (i.e. for which dev_gro_receive returns
 GRO_NORMAL) to be passed on to the listified regular receive path.
dev_gro_receive() itself is not listified, nor the per-protocol GRO
 callback, since GRO's need to hold packets on lists under napi->gro_hash
 makes keeping the packets on other lists awkward, and since the GRO control
 block state of held skbs can refer only to one 'new' skb at a time.

Performance figures with this series, collected on a back-to-back pair of
 Solarflare sfn8522-r2 NICs with 120-second NetPerf tests.  In the stats,
 sample size n for old and new code is 6 runs each; p is from a Welch t-test.
Tests were run both with GRO enabled and disabled, the latter simulating
 uncoalesceable packets (e.g. due to IP or TCP options).  Payload_size in all
 tests was 8000 bytes.  BW tests use 4 streams, RR tests use 100.
TCP Stream, GRO on:
net-next: 9.415 Gb/s (line rate); 190% total rxcpu
after #4: 9.415 Gb/s; 192% total rxcpu
 p_bw = 0.155; p_cpu = 0.382
TCP Stream, GRO off:
net-next: 5.625 Gb/s
after #4: 6.551 Gb/s
  16.5% faster; p < 0.001
TCP RR, GRO on:
net-next: 837.6 us
after #4: 840.0 us
  0.3% slower; p = 0.229
TCP RR, GRO off:
net-next: 867.6 us
after #4: 860.1 us
  0.9% faster; p = 0.064
UDP Stream (GRO off):
net-next: 7.808 Gb/s
after #4: 7.848 Gb/s
  0.5% slower; p = 0.144
Conclusion:
* TCP b/w is 16.5% faster for traffic which cannot be coalesced by GRO.
* TCP latency might be slightly improved in the same case, but it's not
  quite statistically significant
* Both see no statistically significant change in performance with GRO
  active
* UDP throughput might be slightly slowed (probably by patch #3) but it's
  not statistically significant.  Note that drivers which (unlike sfc) pass
  UDP traffic to GRO will probably see gains here as this gives them access
  to bundling.

Change history:
v3: Rebased on latest net-next.  Re-ran performance tests and added TCP_RR
 tests at suggestion of Eric Dumazet.  Expanded changelog of patch #3.

v2: Rebased on latest net-next.  Removed RFC tags.  Otherwise unchanged
 owing to lack of comments on v1.

Edward Cree (4):
  net: introduce list entry point for GRO
  sfc: use batched receive for GRO
  net: make listified RX functions return number of good packets
  net/core: handle GRO_NORMAL skbs as a list in napi_gro_receive_list

 drivers/net/ethernet/sfc/efx.c|  11 +++-
 drivers/net/ethernet/sfc/net_driver.h |   1 +
 drivers/net/ethernet/sfc/rx.c |  16 +-
 include/linux/netdevice.h |   6 +-
 include/net/ip.h  |   4 +-
 include/net/ipv6.h|   4 +-
 net/core/dev.c| 104 ++
 net/ipv4/ip_input.c   |  39 -
 net/ipv6/ip6_input.c  |  37 +++-
 9 files changed, 157 insertions(+), 65 deletions(-)



Re: [RFC v1 1/3] udp_tunnel: add config option to bind to a device

2018-11-14 Thread Alexis Bauvin
Le 14 nov. 2018 à 17:07, Nicolas Dichtel  a écrit :
> Le 14/11/2018 à 10:31, Alexis Bauvin a écrit :
>> UDP tunnel sockets are always opened unbound to a specific device. This
>> patch allows the socket to be bound to a custom device, which
>> incidentally makes UDP tunnels VRF-aware if binding to an l3mdev.
>> 
>> Signed-off-by: Alexis Bauvin 
>> Reviewed-by: Amine Kherbouche 
>> Tested-by: Amine Kherbouche 
> What is the difference with the previous version?
> Maybe a cover letter would help to track the history.
Unless I made a mistake, you should have received the cover letter in
the previous email. The previous version had a typo in the commit log of the
third patch of this set.

> Regards,
> Nicolas
Regards,
Alexis


Re: [PATCH net-next 00/13] nfp: abm: track all Qdiscs

2018-11-14 Thread David Miller
From: Jakub Kicinski 
Date: Mon, 12 Nov 2018 14:58:06 -0800

> Our Qdisc offload so far has been very simplistic.  We held
> an array of marking thresholds and statistics sized to the
> number of PF queues.  This was sufficient since the only
> configuration we supported was single layer of RED Qdiscs
> (on top of MQ or not, but MQ isn't really about queuing).
> 
> As we move to add more Qdiscs it's time to actually try to
> track the full Qdisc hierarchy.  This allows us to make sure
> our offloaded configuration reflects the SW path better.
> We add graft notifications to MQ and RED (PRIO already sends
> them) to allow drivers offloading those to learn how Qdiscs
> are linked.  MQ graft gives us the obvious advantage of being
> able to track when Qdiscs are shared or moved.  It seems
> unlikely HW would offload RED's child Qdiscs but since the
> behaviour would change based on linked child we should
> stop offloading REDs with modified child.  RED will also
> handle the child differently during reconfig when limit
> parameter is set - so we have to inform the drivers about
> the limit, and have them reset the child state when
> appropriate.
> 
> The NFP driver will now allocate a structure to track each
> Qdisc and link it to its children.  We will also maintain
> a shadow copy of threshold settings - to save device writes
> and make it easier to apply defaults when config is
> re-evaluated.

Series applied, thanks.


Re: [PATCH net-next v2 0/6] net: aquantia: add rx-flow filter support

2018-11-14 Thread David Miller
From: Igor Russkikh 
Date: Mon, 12 Nov 2018 15:45:56 +

> In this patchset the rx-flow filters functionality and vlan filter offloads
> are implemented.
> 
> The rules in NIC hardware have fixed order and priorities.
> To support this, the locations of filters from ethtool perspective are also 
> fixed:
> 
> * Locations 0 - 15 for VLAN ID filters
> * Locations 16 - 31 for L2 EtherType and PCP filters
> * Locations 32 - 39 for L3/L4 5-tuple filters (locations 32, 36 for IPv6)

Series applied, thanks.


Re: [PATCH net-next 17/17] net: sched: unlock rules update API

2018-11-14 Thread Vlad Buslov


On Wed 14 Nov 2018 at 06:44, Jiri Pirko  wrote:
> Tue, Nov 13, 2018 at 02:46:54PM CET, vla...@mellanox.com wrote:
>>On Mon 12 Nov 2018 at 17:30, David Miller  wrote:
>>> From: Vlad Buslov 
>>> Date: Mon, 12 Nov 2018 09:55:46 +0200
>>>
 Register netlink protocol handlers for message types RTM_NEWTFILTER,
 RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set rtnl_held variable that
 tracks rtnl mutex state to be false by default.
>>>
>>> This whole conditional locking mechanism is really not clean and makes
>>> this code so much harder to understand and audit.
>>>
>>> Please improve the code so that this kind of construct is not needed.
>>>
>>> Thank you.
>>
>>Hi David,
>>
>>I considered several approaches to this problem and decided that this
>>one is the most straightforward to implement. I understand your concern and
>>agree that this code is not the easiest to understand and can suggest
>>several possible solutions that do not require this kind of elaborate
>>locking mechanism in cls API, but have their own drawbacks:
>>
>>1. Convert all qdiscs and classifiers to support unlocked execution,
>>like we did for actions. However, according to my experience with
>>converting flower classifier, these require much more code than actions.
>>I would estimate it to be more work than whole current unlocking effort
>>(hundred+ patches). Also, authors of some of them might be unhappy with
>>such intrusive changes. I don't think this approach is realistic.
>>
>>2. Somehow determine if rtnl is needed at the beginning of cls API rule
>>update functions. Currently, this is not possible because locking
>>requirements are determined by qdisc_class_ops and tcf_proto_ops 'flags'
>>field, which requires code to first do whole ops lookup sequence.
>>However, instead of class field I can put 'flags' in some kind of hash
>>table or array that will map qdisc/classifier type string to flags, so
>>it will be possible to determine locking requirements by just parsing
>>netlink message and obtaining flags by qdisc/classifier type. I do not
>>consider it pretty solution either, but maybe you have different
>>opinion.
>
> I think you will have to do 2. or some modification. Can't you just
> check for cls ability to run unlocked early on in tc_new_tfilter()?
> You would call tcf_proto_locking_check(nla_data(tca[TCA_KIND]), ...),
> which would do tcf_proto_lookup_ops() for ops and check the flags?

I guess that would work. However, such solution requires calling
tcf_proto_lookup_ops(), which iterates over tcf_proto_base list and
calls strcmp() for each proto, for every rule update call. That is why I
suggested to use some kind of optimized data structure for that purpose
in my first reply. Dunno if such solution will significantly impact rule
update performance. We don't have that many classifiers and their names
are short, so I guess not?

>
>
>>
>>3. Anything you can suggest? I might be missing something simple that
>>you would consider more elegant solution to this problem.
>>
>>Thanks,
>>Vlad
>>



Re: [RFC v1 1/3] udp_tunnel: add config option to bind to a device

2018-11-14 Thread Nicolas Dichtel
Le 14/11/2018 à 10:31, Alexis Bauvin a écrit :
> UDP tunnel sockets are always opened unbound to a specific device. This
> patch allows the socket to be bound to a custom device, which
> incidentally makes UDP tunnels VRF-aware if binding to an l3mdev.
> 
> Signed-off-by: Alexis Bauvin 
> Reviewed-by: Amine Kherbouche 
> Tested-by: Amine Kherbouche 
What is the difference with the previous version?
Maybe a cover letter would help to track the history.


Regards,
Nicolas


Re: [PATCHv2 net-next 1/4] sctp: define subscribe in sctp_sock as __u16

2018-11-14 Thread Xin Long
On Wed, Nov 14, 2018 at 2:16 AM Neil Horman  wrote:
>
> On Tue, Nov 13, 2018 at 02:24:53PM +0800, Xin Long wrote:
> >
> >   /* Default Peer Address Parameters.  These defaults can
> >* be modified via SCTP_PEER_ADDR_PARAMS
> > @@ -5267,14 +5274,24 @@ static int sctp_getsockopt_disable_fragments(struct sock *sk, int len,
> >  static int sctp_getsockopt_events(struct sock *sk, int len, char __user *optval,
> > int __user *optlen)
> >  {
> > + struct sctp_event_subscribe subscribe;
> > + __u8 *sn_type = (__u8 *)&subscribe;
> > + int i;
> > +
> >   if (len == 0)
> >   return -EINVAL;
> >   if (len > sizeof(struct sctp_event_subscribe))
> >   len = sizeof(struct sctp_event_subscribe);
> >   if (put_user(len, optlen))
> >   return -EFAULT;
> > - if (copy_to_user(optval, &sctp_sk(sk)->subscribe, len))
> > +
> > + for (i = 0; i <= len; i++)
> > + sn_type[i] = sctp_ulpevent_type_enabled(sctp_sk(sk)->subscribe,
> > + SCTP_SN_TYPE_BASE + i);
> > +
> This seems like an off-by-one error.  sctp_event_subscribe has N bytes in it
> (1 byte for each event), meaning that events 0-(N-1) are subscribable.
> Iterating this loop implies that you are going to check N events, overrunning
> the sctp_event_subscribe struct.
you're right, thanks.

>
> Neil
>
> >

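The overrun Neil points out is the classic inclusive upper bound: with `i <= len`, the last iteration writes one byte past a `len`-byte destination. A reduced sketch with a hypothetical stand-in for sctp_ulpevent_type_enabled():

```c
#include <stddef.h>

/* Hypothetical stand-in: event 'type' is enabled if its bit is set */
static unsigned char type_enabled(unsigned int subscribe, int type)
{
	return (subscribe >> type) & 1;
}

/* Fills exactly 'len' bytes: the bound must be i < len, not i <= len,
 * or the final iteration writes one byte past the destination buffer.
 */
static void fill_events(unsigned char *dst, size_t len, unsigned int subscribe)
{
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] = type_enabled(subscribe, (int)i);
}
```

A sentinel byte just past the requested length makes the fix observable: it must remain untouched.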

[PATCH net] qed: Fix qed compilation issue when CONFIG_QED_RDMA not defined

2018-11-14 Thread Denis Bolotin
Add a missing semicolon to a line in an empty implementation function.

Signed-off-by: Denis Bolotin 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qed/qed_rdma.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.h 
b/drivers/net/ethernet/qlogic/qed/qed_rdma.h
index 50d609c..5eec88c 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.h
@@ -183,7 +183,10 @@ struct qed_rdma_qp {
 static inline void qed_rdma_dpm_conf(struct qed_hwfn *p_hwfn, struct qed_ptt 
*p_ptt) {}
 static inline void qed_rdma_dpm_bar(struct qed_hwfn *p_hwfn,
struct qed_ptt *p_ptt) {}
-static inline int qed_rdma_info_alloc(struct qed_hwfn *p_hwfn) {return -EINVAL}
+static inline int qed_rdma_info_alloc(struct qed_hwfn *p_hwfn)
+{
+   return -EINVAL;
+}
 static inline void qed_rdma_info_free(struct qed_hwfn *p_hwfn) {}
 #endif
 
-- 
1.8.3.1



[PATCH] allow DSCP values in ip rules

2018-11-14 Thread Pavel Balaev
Hello, for now IP rules support only the old TOS values and we cannot use
DSCP.

This patch adds support for DSCP values in IP rules:

$ ip r add default via 192.168.0.6 table test
$ ip ru add tos 0x80 table test
$ ip ru
0:  from all lookup local 
32764:  from all tos CS4 lookup test 
32766:  from all lookup main 
32767:  from all lookup default 
$ ip r get fibmatch 8.8.8.9 tos 0x80
default tos CS4 via 192.168.0.6 dev lan table test

Signed-off-by: Pavel Balaev 
---
 net/ipv4/fib_rules.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index f8eb78d0..7a6c5bfe 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -220,7 +220,7 @@ static int fib4_rule_configure(struct fib_rule *rule, 
struct sk_buff *skb,
int err = -EINVAL;
struct fib4_rule *rule4 = (struct fib4_rule *) rule;
 
-   if (frh->tos & ~IPTOS_TOS_MASK) {
+   if (frh->tos & ~(IPTOS_TOS_MASK | IPTOS_PREC_MASK)) {
NL_SET_ERR_MSG(extack, "Invalid tos");
goto errout;
}
-- 
2.18.1
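The effect of widening the mask can be checked numerically: IPTOS_TOS_MASK (0x1e) covers only the legacy TOS bits, while DSCP class selectors such as CS4 (tos 0x80) live entirely in the precedence bits (IPTOS_PREC_MASK, 0xe0). A small model of the validity test in fib4_rule_configure():

```c
/* Values as defined in include/uapi/linux/ip.h */
#define IPTOS_TOS_MASK  0x1e
#define IPTOS_PREC_MASK 0xe0

/* Pre-patch check: any bit outside the legacy TOS field is invalid */
static int tos_ok_old(unsigned char tos)
{
	return !(tos & ~IPTOS_TOS_MASK);
}

/* Post-patch check: precedence (upper DSCP) bits are allowed too */
static int tos_ok_new(unsigned char tos)
{
	return !(tos & ~(IPTOS_TOS_MASK | IPTOS_PREC_MASK));
}
```

Note that bit 0 stays invalid under both checks, since the combined mask 0xfe still excludes it.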



[net:master 25/27] drivers/net/ethernet/qlogic/qed/qed_rdma.h:186:79: error: expected ';' before '}' token

2018-11-14 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
head:   db8ddde766adf09605b5282e7978fa0ba76c3ee3
commit: 291d57f67d2449737d1e370ab5b9a583818eaa0c [25/27] qed: Fix rdma_info 
structure allocation
config: i386-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
git checkout 291d57f67d2449737d1e370ab5b9a583818eaa0c
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from drivers/net/ethernet/qlogic/qed/qed_cxt.c:49:0:
   drivers/net/ethernet/qlogic/qed/qed_rdma.h: In function 
'qed_rdma_info_alloc':
>> drivers/net/ethernet/qlogic/qed/qed_rdma.h:186:79: error: expected ';' before '}' token
static inline int qed_rdma_info_alloc(struct qed_hwfn *p_hwfn) {return -EINVAL}

  ^

vim +186 drivers/net/ethernet/qlogic/qed/qed_rdma.h

   176  
   177  #if IS_ENABLED(CONFIG_QED_RDMA)
   178  void qed_rdma_dpm_bar(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt);
   179  void qed_rdma_dpm_conf(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt);
   180  int qed_rdma_info_alloc(struct qed_hwfn *p_hwfn);
   181  void qed_rdma_info_free(struct qed_hwfn *p_hwfn);
   182  #else
   183  static inline void qed_rdma_dpm_conf(struct qed_hwfn *p_hwfn, struct 
qed_ptt *p_ptt) {}
   184  static inline void qed_rdma_dpm_bar(struct qed_hwfn *p_hwfn,
   185  struct qed_ptt *p_ptt) {}
  > 186  static inline int qed_rdma_info_alloc(struct qed_hwfn *p_hwfn) {return -EINVAL}
   187  static inline void qed_rdma_info_free(struct qed_hwfn *p_hwfn) {}
   188  #endif
   189  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


[PATCH net-next 3/3] dpaa2-eth: bql support

2018-11-14 Thread Ioana Ciocoi Radulescu
Add support for byte queue limit.

On NAPI poll, we save the total number of Tx confirmed frames/bytes
and register them with bql at the end of the poll function.

Signed-off-by: Ioana Radulescu 
---
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 59 ++--
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h |  2 +
 2 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c 
b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index be31287..640967a 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -288,7 +288,7 @@ static void dpaa2_eth_rx(struct dpaa2_eth_priv *priv,
  * Observance of NAPI budget is not our concern, leaving that to the caller.
  */
 static int consume_frames(struct dpaa2_eth_channel *ch,
- enum dpaa2_eth_fq_type *type)
+ struct dpaa2_eth_fq **src)
 {
struct dpaa2_eth_priv *priv = ch->priv;
struct dpaa2_eth_fq *fq = NULL;
@@ -322,10 +322,10 @@ static int consume_frames(struct dpaa2_eth_channel *ch,
ch->stats.frames += cleaned;
 
/* A dequeue operation only pulls frames from a single queue
-* into the store. Return the frame queue type as an out param.
+* into the store. Return the frame queue as an out param.
 */
-   if (type)
-   *type = fq->type;
+   if (src)
+   *src = fq;
 
return cleaned;
 }
@@ -570,8 +570,10 @@ static netdev_tx_t dpaa2_eth_tx(struct sk_buff *skb, 
struct net_device *net_dev)
struct rtnl_link_stats64 *percpu_stats;
struct dpaa2_eth_drv_stats *percpu_extras;
struct dpaa2_eth_fq *fq;
+   struct netdev_queue *nq;
u16 queue_mapping;
unsigned int needed_headroom;
+   u32 fd_len;
int err, i;
 
percpu_stats = this_cpu_ptr(priv->percpu_stats);
@@ -643,8 +645,12 @@ static netdev_tx_t dpaa2_eth_tx(struct sk_buff *skb, 
struct net_device *net_dev)
/* Clean up everything, including freeing the skb */
free_tx_fd(priv, &fd);
} else {
+   fd_len = dpaa2_fd_get_len(&fd);
percpu_stats->tx_packets++;
-   percpu_stats->tx_bytes += dpaa2_fd_get_len(&fd);
+   percpu_stats->tx_bytes += fd_len;
+
+   nq = netdev_get_tx_queue(net_dev, queue_mapping);
+   netdev_tx_sent_queue(nq, fd_len);
}
 
return NETDEV_TX_OK;
@@ -660,10 +666,11 @@ static netdev_tx_t dpaa2_eth_tx(struct sk_buff *skb, 
struct net_device *net_dev)
 static void dpaa2_eth_tx_conf(struct dpaa2_eth_priv *priv,
  struct dpaa2_eth_channel *ch __always_unused,
  const struct dpaa2_fd *fd,
- struct dpaa2_eth_fq *fq __always_unused)
+ struct dpaa2_eth_fq *fq)
 {
struct rtnl_link_stats64 *percpu_stats;
struct dpaa2_eth_drv_stats *percpu_extras;
+   u32 fd_len = dpaa2_fd_get_len(fd);
u32 fd_errors;
 
/* Tracing point */
@@ -671,7 +678,10 @@ static void dpaa2_eth_tx_conf(struct dpaa2_eth_priv *priv,
 
percpu_extras = this_cpu_ptr(priv->percpu_extras);
percpu_extras->tx_conf_frames++;
-   percpu_extras->tx_conf_bytes += dpaa2_fd_get_len(fd);
+   percpu_extras->tx_conf_bytes += fd_len;
+
+   fq->dq_frames++;
+   fq->dq_bytes += fd_len;
 
/* Check frame errors in the FD field */
fd_errors = dpaa2_fd_get_ctrl(fd) & DPAA2_FD_TX_ERR_MASK;
@@ -932,8 +942,9 @@ static int dpaa2_eth_poll(struct napi_struct *napi, int 
budget)
struct dpaa2_eth_channel *ch;
struct dpaa2_eth_priv *priv;
int rx_cleaned = 0, txconf_cleaned = 0;
-   enum dpaa2_eth_fq_type type = 0;
-   int store_cleaned;
+   struct dpaa2_eth_fq *fq, *txc_fq = NULL;
+   struct netdev_queue *nq;
+   int store_cleaned, work_done;
int err;
 
ch = container_of(napi, struct dpaa2_eth_channel, napi);
@@ -947,18 +958,25 @@ static int dpaa2_eth_poll(struct napi_struct *napi, int 
budget)
/* Refill pool if appropriate */
refill_pool(priv, ch, priv->bpid);
 
-   store_cleaned = consume_frames(ch, &type);
-   if (type == DPAA2_RX_FQ)
+   store_cleaned = consume_frames(ch, &fq);
+   if (!store_cleaned)
+   break;
+   if (fq->type == DPAA2_RX_FQ) {
rx_cleaned += store_cleaned;
-   else
+   } else {
txconf_cleaned += store_cleaned;
+   /* We have a single Tx conf FQ on this channel */
+   txc_fq = fq;
+   }
 
/* If we either consumed the whole NAPI budget with Rx frames
 * or we reached the Tx confirmations threshold

[PATCH net-next 2/3] dpaa2-eth: Don't use multiple queues per channel

2018-11-14 Thread Ioana Ciocoi Radulescu
The DPNI object on which we build a network interface has a
certain number of {Rx, Tx, Tx confirmation} frame queues as
resources. The default hardware setup offers one queue of each
type, as well as one DPCON channel, for each core available
in the system.

There are however cases where the number of queues is greater
than the number of cores or channels. Until now, we configured
and used all the frame queues associated with a DPNI, even if it
meant assigning multiple queues of one type to the same channel.

Update the driver to only use a number of queues equal to the
number of channels, ensuring each channel will contain exactly
one Rx and one Tx confirmation queue.

From the user viewpoint, this change is completely transparent.
Performance-wise there is no impact in most scenarios. When the
number of queues is larger than, and not a multiple of, the
number of channels, Rx hash distribution now offers better load
balancing between cores, which can have a positive impact on
overall system performance.

Signed-off-by: Ioana Radulescu 
---
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 2 +-
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c 
b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index 048414a..be31287 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -1601,7 +1601,7 @@ static int setup_dpio(struct dpaa2_eth_priv *priv)
/* Stop if we already have enough channels to accommodate all
 * RX and TX conf queues
 */
-   if (priv->num_channels == dpaa2_eth_queue_count(priv))
+   if (priv->num_channels == priv->dpni_attrs.num_queues)
break;
}
 
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h 
b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
index 287b6853..3af706a 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
@@ -433,9 +433,10 @@ static inline unsigned int dpaa2_eth_rx_head_room(struct 
dpaa2_eth_priv *priv)
   DPAA2_ETH_RX_HWA_SIZE;
 }
 
+/* We have exactly one {Rx, Tx conf} queue per channel */
 static int dpaa2_eth_queue_count(struct dpaa2_eth_priv *priv)
 {
-   return priv->dpni_attrs.num_queues;
+   return priv->num_channels;
 }
 
 int dpaa2_eth_set_hash(struct net_device *net_dev, u64 flags);
-- 
2.7.4



[PATCH net-next 1/3] dpaa2-eth: Update callback signature

2018-11-14 Thread Ioana Ciocoi Radulescu
Change the frame consume callback signature:
* the entire FQ structure is passed to the callback instead
of just the queue index
* the NAPI structure can be easily obtained from the channel
it is associated to, so we don't need to pass it explicitly

Signed-off-by: Ioana Radulescu 
---
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 12 +---
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h |  3 +--
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c 
b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index bdfb13b..048414a 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -203,8 +203,7 @@ static struct sk_buff *build_frag_skb(struct dpaa2_eth_priv 
*priv,
 static void dpaa2_eth_rx(struct dpaa2_eth_priv *priv,
 struct dpaa2_eth_channel *ch,
 const struct dpaa2_fd *fd,
-struct napi_struct *napi,
-u16 queue_id)
+struct dpaa2_eth_fq *fq)
 {
dma_addr_t addr = dpaa2_fd_get_addr(fd);
u8 fd_format = dpaa2_fd_get_format(fd);
@@ -267,12 +266,12 @@ static void dpaa2_eth_rx(struct dpaa2_eth_priv *priv,
}
 
skb->protocol = eth_type_trans(skb, priv->net_dev);
-   skb_record_rx_queue(skb, queue_id);
+   skb_record_rx_queue(skb, fq->flowid);
 
percpu_stats->rx_packets++;
percpu_stats->rx_bytes += dpaa2_fd_get_len(fd);
 
-   napi_gro_receive(napi, skb);
+   napi_gro_receive(&ch->napi, skb);
 
return;
 
@@ -312,7 +311,7 @@ static int consume_frames(struct dpaa2_eth_channel *ch,
fd = dpaa2_dq_fd(dq);
fq = (struct dpaa2_eth_fq *)(uintptr_t)dpaa2_dq_fqd_ctx(dq);
 
-   fq->consume(priv, ch, fd, &ch->napi, fq->flowid);
+   fq->consume(priv, ch, fd, fq);
cleaned++;
} while (!is_last);
 
@@ -661,8 +660,7 @@ static netdev_tx_t dpaa2_eth_tx(struct sk_buff *skb, struct 
net_device *net_dev)
 static void dpaa2_eth_tx_conf(struct dpaa2_eth_priv *priv,
  struct dpaa2_eth_channel *ch __always_unused,
  const struct dpaa2_fd *fd,
- struct napi_struct *napi __always_unused,
- u16 queue_id __always_unused)
+ struct dpaa2_eth_fq *fq __always_unused)
 {
struct rtnl_link_stats64 *percpu_stats;
struct dpaa2_eth_drv_stats *percpu_extras;
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h 
b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
index 452a8e9..287b6853 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
@@ -277,8 +277,7 @@ struct dpaa2_eth_fq {
void (*consume)(struct dpaa2_eth_priv *priv,
struct dpaa2_eth_channel *ch,
const struct dpaa2_fd *fd,
-   struct napi_struct *napi,
-   u16 queue_id);
+   struct dpaa2_eth_fq *fq);
struct dpaa2_eth_fq_stats stats;
 };
 
-- 
2.7.4



[PATCH net-next 0/3] dpaa2-eth: add bql support

2018-11-14 Thread Ioana Ciocoi Radulescu
The first two patches make minor tweaks to the driver to
simplify the BQL implementation. The third patch adds the actual
BQL support.

Ioana Radulescu (3):
  dpaa2-eth: Update callback signature
  dpaa2-eth: Don't use multiple queues per channel
  dpaa2-eth: bql support

 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 71 
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h |  8 ++-
 2 files changed, 54 insertions(+), 25 deletions(-)

-- 
2.7.4



[PATCH net] net/sched: act_pedit: fix memory leak when IDR allocation fails

2018-11-14 Thread Davide Caratti
tcf_idr_check_alloc() can return a negative value on allocation failure
(-ENOMEM) or IDR exhaustion (-ENOSPC): don't leak keys_ex in these cases.

Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action")
Signed-off-by: Davide Caratti 
---
 net/sched/act_pedit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index da3dd0f68cc2..2b372a06b432 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -201,7 +201,8 @@ static int tcf_pedit_init(struct net *net, struct nlattr 
*nla,
goto out_release;
}
} else {
-   return err;
+   ret = err;
+   goto out_free;
}
 
p = to_pedit(*a);
-- 
2.19.1



Re: [RFC PATCH 5/6] net: marvell: neta: add support for 2500base-X

2018-11-14 Thread Russell King - ARM Linux
On Wed, Nov 14, 2018 at 02:18:14PM +0530, Kishon Vijay Abraham I wrote:
> Hi,
> 
> On 12/11/18 6:01 PM, Russell King wrote:
> > Signed-off-by: Russell King 
> > ---
> >  drivers/net/ethernet/marvell/mvneta.c | 58 
> > ++-
> >  1 file changed, 51 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/marvell/mvneta.c 
> > b/drivers/net/ethernet/marvell/mvneta.c
> > index 5bfd349bf41a..7305d4cc0630 100644
> > --- a/drivers/net/ethernet/marvell/mvneta.c
> > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > @@ -27,6 +27,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -437,6 +438,7 @@ struct mvneta_port {
> > struct device_node *dn;
> > unsigned int tx_csum_limit;
> > struct phylink *phylink;
> > +   struct phy *comphy;
> >  
> > struct mvneta_bm *bm_priv;
> > struct mvneta_bm_pool *pool_long;
> > @@ -3150,6 +3152,8 @@ static void mvneta_start_dev(struct mvneta_port *pp)
> >  {
> > int cpu;
> >  
> > +   WARN_ON(phy_power_on(pp->comphy));
> > +
> > mvneta_max_rx_size_set(pp, pp->pkt_size);
> > mvneta_txq_max_tx_size_set(pp, pp->pkt_size);
> >  
> > @@ -3212,6 +3216,8 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
> >  
> > mvneta_tx_reset(pp);
> > mvneta_rx_reset(pp);
> > +
> > +   WARN_ON(phy_power_off(pp->comphy));
> >  }
> >  
> >  static void mvneta_percpu_enable(void *arg)
> > @@ -3337,6 +3343,7 @@ static int mvneta_set_mac_addr(struct net_device 
> > *dev, void *addr)
> >  static void mvneta_validate(struct net_device *ndev, unsigned long 
> > *supported,
> > struct phylink_link_state *state)
> >  {
> > +   struct mvneta_port *pp = netdev_priv(ndev);
> > __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
> >  
> > /* We only support QSGMII, SGMII, 802.3z and RGMII modes */
> > @@ -3357,14 +3364,14 @@ static void mvneta_validate(struct net_device 
> > *ndev, unsigned long *supported,
> > /* Asymmetric pause is unsupported */
> > phylink_set(mask, Pause);
> >  
> > -   /* We cannot use 1Gbps when using the 2.5G interface. */
> > -   if (state->interface == PHY_INTERFACE_MODE_2500BASEX) {
> > -   phylink_set(mask, 2500baseT_Full);
> > -   phylink_set(mask, 2500baseX_Full);
> > -   } else {
> > +   /* Half-duplex at speeds higher than 100Mbit is unsupported */
> > +   if (pp->comphy || state->interface != PHY_INTERFACE_MODE_2500BASEX) {
> > phylink_set(mask, 1000baseT_Full);
> > phylink_set(mask, 1000baseX_Full);
> > }
> > +   if (pp->comphy || state->interface == PHY_INTERFACE_MODE_2500BASEX) {
> > +   phylink_set(mask, 2500baseX_Full);
> > +   }
> >  
> > if (!phy_interface_mode_is_8023z(state->interface)) {
> > /* 10M and 100M are only supported in non-802.3z mode */
> > @@ -3378,6 +3385,11 @@ static void mvneta_validate(struct net_device *ndev, 
> > unsigned long *supported,
> >__ETHTOOL_LINK_MODE_MASK_NBITS);
> > bitmap_and(state->advertising, state->advertising, mask,
> >__ETHTOOL_LINK_MODE_MASK_NBITS);
> > +
> > +   /* We can only operate at 2500BaseX or 1000BaseX.  If requested
> > +* to advertise both, only report advertising at 2500BaseX.
> > +*/
> > +   phylink_helper_basex_speed(state);
> >  }
> >  
> >  static int mvneta_mac_link_state(struct net_device *ndev,
> > @@ -3389,7 +3401,9 @@ static int mvneta_mac_link_state(struct net_device 
> > *ndev,
> > gmac_stat = mvreg_read(pp, MVNETA_GMAC_STATUS);
> >  
> > if (gmac_stat & MVNETA_GMAC_SPEED_1000)
> > -   state->speed = SPEED_1000;
> > +   state->speed =
> > +   state->interface == PHY_INTERFACE_MODE_2500BASEX ?
> > +   SPEED_2500 : SPEED_1000;
> > else if (gmac_stat & MVNETA_GMAC_SPEED_100)
> > state->speed = SPEED_100;
> > else
> > @@ -3504,12 +3518,32 @@ static void mvneta_mac_config(struct net_device 
> > *ndev, unsigned int mode,
> > MVNETA_GMAC_FORCE_LINK_DOWN);
> > }
> >  
> > +
> > /* When at 2.5G, the link partner can send frames with shortened
> >  * preambles.
> >  */
> > if (state->speed == SPEED_2500)
> > new_ctrl4 |= MVNETA_GMAC4_SHORT_PREAMBLE_ENABLE;
> >  
> > +   if (pp->comphy) {
> > +   enum phy_mode mode = PHY_MODE_INVALID;
> > +
> > +   switch (state->interface) {
> > +   case PHY_INTERFACE_MODE_SGMII:
> > +   case PHY_INTERFACE_MODE_1000BASEX:
> > +   mode = PHY_MODE_SGMII;
> > +   break;
> > +   case PHY_INTERFACE_MODE_2500BASEX:
> > +   mode = PHY_MODE_2500SGMII;
> > +   break;
> > +   default:
> > +   break;
> > +   }
> > +
> > +   if (mode != PHY_MODE_INVALID)
> > +   WARN_ON(phy_set_mode(pp->comphy, mode));
> > +   }
> > +
>

Re: [RFC PATCH 0/6] Armada 38x comphy driver to support 2.5Gbps networking

2018-11-14 Thread Russell King - ARM Linux
On Wed, Nov 14, 2018 at 01:39:29PM +0530, Kishon Vijay Abraham I wrote:
> Hi,
> 
> On 12/11/18 5:59 PM, Russell King - ARM Linux wrote:
> > Hi,
> > 
> > This series adds support for dynamically switching between 1Gbps
> > and 2.5Gbps networking for the Marvell Armada 38x SoCs, tested on
> > Armada 388 on the Clearfog platform.
> > 
> > This is necessary to be able to connect (eg) a Clearfog platform
> > with a Macchiatobin platform via the SFP sockets, as Clearfog
> > currently only supports 1Gbps networking via the SFP socket and
> > Macchiatobin defaults to 2.5Gbps when using Fiberchannel SFPs.
> > 
> > In order to allow dynamic switching, we need to implement a common
> > phy driver to switch the ethernet serdes lane speed - 2.5Gbps is
> > just 1Gbps up-clocked by 2.5x.  We implement a simple comphy
> > driver to achieve this, which only supports networking.
> > 
> > With this, we are able to support both Fiberchannel SFPs operating
> > at 2.5Gbps or 1Gbps, and 1G ethernet SFPs plugged into the Clearfog
> > platform, dynamically selecting according to the SFPs abilities.
> > 
> > I'm aware of the proposed changes to the PHY layer, changing
> > phy_set_mode() to take the ethernet phy interface type, hence why
> > this is RFC - there's also the question about how this will be
> > merged.  This series is currently based on 4.20-rc1, but will
> > likely need to be rebased when the PHY layer changes hit.
> 
> For this case, I'd prefer the phy_set_mode series and the phy and net changes
> here (after rebasing) go via linux-phy tree.

Please let me know when they've hit, thanks.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [iproute PATCH] ip-address: Fix filtering by negated address flags

2018-11-14 Thread Phil Sutter
Hi Stephen,

On Tue, Nov 13, 2018 at 02:47:59PM -0800, Stephen Hemminger wrote:
> On Tue, 13 Nov 2018 16:12:01 +0100
> Phil Sutter  wrote:
> 
> > +   if (arg[0] == '-') {
> > +   inv = true;
> > +   arg++;
> > +   }
> The inverse logic needs to be moved into the loop handling filter names.
> 
> Otherwise, you get weirdness like "-dynamic" being accepted and not
> doing what was expected.

I intentionally moved it there to allow for '-dynamic' and '-primary'
as well. IMO this is consistent: 'dynamic' is the inverse of 'permanent'
and 'primary' the inverse of 'secondary' but currently only '-permanent'
and '-secondary' are allowed. With my patch applied, one may specify not
only '-permanent' to get the same effect as 'dynamic' but also
'-dynamic' to get the same effect as 'permanent'. Likewise for the other
two. Did I miss something?

> Also, please make sure the man page matches the code.

Oh, right. Given the above is fine with you, I will add the man page
change in v2.

Thanks, Phil


Re: BUG: sleeping function called from invalid context at mm/slab.h:421

2018-11-14 Thread Naresh Kamboju
Hi Roman,

On Tue, 13 Nov 2018 at 23:07, Roman Gushchin  wrote:
>
> On Tue, Nov 13, 2018 at 10:03:38PM +0530, Naresh Kamboju wrote:
> > While running kernel selftests bpf test_cgroup_storage test this
> > kernel BUG reported every time on all devices running Linux -next
> > 4.20.0-rc2-next-20181113 (from 4.19.0-rc5-next-20180928).
> > This kernel BUG log is from x86_64 machine.
> >
> > Do you see at your end ?
> >
> > [   73.047526] BUG: sleeping function called from invalid context at
> > /srv/oe/build/tmp-rpb-glibc/work-shared/intel-corei7-64/kernel-source/mm/slab.h:421
> > [   73.060915] in_atomic(): 1, irqs_disabled(): 0, pid: 3157, name:
> > test_cgroup_sto
> > [   73.068342] INFO: lockdep is turned off.
> > [   73.072293] CPU: 2 PID: 3157 Comm: test_cgroup_sto Not tainted
> > 4.20.0-rc2-next-20181113 #1
> > [   73.080548] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> > 2.0b 07/27/2017
> > [   73.088018] Call Trace:
> > [   73.090463]  dump_stack+0x70/0xa5
> > [   73.093783]  ___might_sleep+0x152/0x240
> > [   73.097619]  __might_sleep+0x4a/0x80
> > [   73.101191]  __kmalloc_node+0x1cf/0x2f0
> > [   73.105031]  ? cgroup_storage_update_elem+0x46/0x90
> > [   73.109909]  cgroup_storage_update_elem+0x46/0x90
>
> Hi Naresh!
>
> Thank you for the report! Can you, please, try the following patch?

The patch below has been tested and is working.
After applying it, I no longer see the reported "BUG:".
Thanks for the fix patch.
Happy to test :)

- Naresh

>
> Thanks!
>
> --
>
> diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
> index c97a8f968638..d91710fb8360 100644
> --- a/kernel/bpf/local_storage.c
> +++ b/kernel/bpf/local_storage.c
> @@ -139,8 +139,8 @@ static int cgroup_storage_update_elem(struct bpf_map 
> *map, void *_key,
> return -ENOENT;
>
> new = kmalloc_node(sizeof(struct bpf_storage_buffer) +
> -  map->value_size, __GFP_ZERO | GFP_USER,
> -  map->numa_node);
> +  map->value_size, __GFP_ZERO | GFP_ATOMIC |
> +  __GFP_NOWARN, map->numa_node);
> if (!new)
> return -ENOMEM;
>


Re: [RFC PATCH 5/6] net: marvell: neta: add support for 2500base-X

2018-11-14 Thread Kishon Vijay Abraham I
Hi,

On 12/11/18 6:01 PM, Russell King wrote:
> Signed-off-by: Russell King 
> ---
>  drivers/net/ethernet/marvell/mvneta.c | 58 
> ++-
>  1 file changed, 51 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c 
> b/drivers/net/ethernet/marvell/mvneta.c
> index 5bfd349bf41a..7305d4cc0630 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -437,6 +438,7 @@ struct mvneta_port {
>   struct device_node *dn;
>   unsigned int tx_csum_limit;
>   struct phylink *phylink;
> + struct phy *comphy;
>  
>   struct mvneta_bm *bm_priv;
>   struct mvneta_bm_pool *pool_long;
> @@ -3150,6 +3152,8 @@ static void mvneta_start_dev(struct mvneta_port *pp)
>  {
>   int cpu;
>  
> + WARN_ON(phy_power_on(pp->comphy));
> +
>   mvneta_max_rx_size_set(pp, pp->pkt_size);
>   mvneta_txq_max_tx_size_set(pp, pp->pkt_size);
>  
> @@ -3212,6 +3216,8 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
>  
>   mvneta_tx_reset(pp);
>   mvneta_rx_reset(pp);
> +
> + WARN_ON(phy_power_off(pp->comphy));
>  }
>  
>  static void mvneta_percpu_enable(void *arg)
> @@ -3337,6 +3343,7 @@ static int mvneta_set_mac_addr(struct net_device *dev, 
> void *addr)
>  static void mvneta_validate(struct net_device *ndev, unsigned long 
> *supported,
>   struct phylink_link_state *state)
>  {
> + struct mvneta_port *pp = netdev_priv(ndev);
>   __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
>  
>   /* We only support QSGMII, SGMII, 802.3z and RGMII modes */
> @@ -3357,14 +3364,14 @@ static void mvneta_validate(struct net_device *ndev, 
> unsigned long *supported,
>   /* Asymmetric pause is unsupported */
>   phylink_set(mask, Pause);
>  
> - /* We cannot use 1Gbps when using the 2.5G interface. */
> - if (state->interface == PHY_INTERFACE_MODE_2500BASEX) {
> - phylink_set(mask, 2500baseT_Full);
> - phylink_set(mask, 2500baseX_Full);
> - } else {
> + /* Half-duplex at speeds higher than 100Mbit is unsupported */
> + if (pp->comphy || state->interface != PHY_INTERFACE_MODE_2500BASEX) {
>   phylink_set(mask, 1000baseT_Full);
>   phylink_set(mask, 1000baseX_Full);
>   }
> + if (pp->comphy || state->interface == PHY_INTERFACE_MODE_2500BASEX) {
> + phylink_set(mask, 2500baseX_Full);
> + }
>  
>   if (!phy_interface_mode_is_8023z(state->interface)) {
>   /* 10M and 100M are only supported in non-802.3z mode */
> @@ -3378,6 +3385,11 @@ static void mvneta_validate(struct net_device *ndev, 
> unsigned long *supported,
>  __ETHTOOL_LINK_MODE_MASK_NBITS);
>   bitmap_and(state->advertising, state->advertising, mask,
>  __ETHTOOL_LINK_MODE_MASK_NBITS);
> +
> + /* We can only operate at 2500BaseX or 1000BaseX.  If requested
> +  * to advertise both, only report advertising at 2500BaseX.
> +  */
> + phylink_helper_basex_speed(state);
>  }
>  
>  static int mvneta_mac_link_state(struct net_device *ndev,
> @@ -3389,7 +3401,9 @@ static int mvneta_mac_link_state(struct net_device 
> *ndev,
>   gmac_stat = mvreg_read(pp, MVNETA_GMAC_STATUS);
>  
>   if (gmac_stat & MVNETA_GMAC_SPEED_1000)
> - state->speed = SPEED_1000;
> + state->speed =
> + state->interface == PHY_INTERFACE_MODE_2500BASEX ?
> + SPEED_2500 : SPEED_1000;
>   else if (gmac_stat & MVNETA_GMAC_SPEED_100)
>   state->speed = SPEED_100;
>   else
> @@ -3504,12 +3518,32 @@ static void mvneta_mac_config(struct net_device 
> *ndev, unsigned int mode,
>   MVNETA_GMAC_FORCE_LINK_DOWN);
>   }
>  
> +
>   /* When at 2.5G, the link partner can send frames with shortened
>* preambles.
>*/
>   if (state->speed == SPEED_2500)
>   new_ctrl4 |= MVNETA_GMAC4_SHORT_PREAMBLE_ENABLE;
>  
> + if (pp->comphy) {
> + enum phy_mode mode = PHY_MODE_INVALID;
> +
> + switch (state->interface) {
> + case PHY_INTERFACE_MODE_SGMII:
> + case PHY_INTERFACE_MODE_1000BASEX:
> + mode = PHY_MODE_SGMII;
> + break;
> + case PHY_INTERFACE_MODE_2500BASEX:
> + mode = PHY_MODE_2500SGMII;
> + break;
> + default:
> + break;
> + }
> +
> + if (mode != PHY_MODE_INVALID)
> + WARN_ON(phy_set_mode(pp->comphy, mode));
> + }
> +
>   if (new_ctrl0 != gmac_ctrl0)
>   mvreg_write(pp, MVNETA_GMAC_CTRL_0, new_ctrl0);
>   if (new_ctrl2 != gmac_ctrl2)
> @@ -4411,7 +4445,7 @@ static int mvneta_port_power_up(str

Re: [RFC PATCH 2/6] phy: armada38x: add common phy support

2018-11-14 Thread Kishon Vijay Abraham I
Hi,

On 12/11/18 6:00 PM, Russell King wrote:
> Add support for the Armada 38x common phy to allow us to change the
> speed of the Ethernet serdes lane.  This driver only supports
> manipulation of the speed; it does not support configuration of the
> common phy.
> 
> Signed-off-by: Russell King 
> ---
>  drivers/phy/marvell/Kconfig|  10 ++
>  drivers/phy/marvell/Makefile   |   1 +
>  drivers/phy/marvell/phy-armada38x-comphy.c | 236 
> +
>  3 files changed, 247 insertions(+)
>  create mode 100644 drivers/phy/marvell/phy-armada38x-comphy.c
> 
> diff --git a/drivers/phy/marvell/Kconfig b/drivers/phy/marvell/Kconfig
> index 6fb4b56e4c14..224ea4e6a46d 100644
> --- a/drivers/phy/marvell/Kconfig
> +++ b/drivers/phy/marvell/Kconfig
> @@ -21,6 +21,16 @@ config PHY_BERLIN_USB
>   help
> Enable this to support the USB PHY on Marvell Berlin SoCs.
>  
> +config PHY_MVEBU_A38X_COMPHY
> + tristate "Marvell Armada 38x comphy driver"
> + depends on ARCH_MVEBU || COMPILE_TEST
> + depends on OF
> + select GENERIC_PHY
> + help
> +   This driver allows control of the comphy, a hardware block providing
> +   shared serdes PHYs on Marvell Armada 38x. Its serdes lanes can be
> +   used by various controllers (Ethernet, SATA, USB, PCIe...).
> +
>  config PHY_MVEBU_CP110_COMPHY
>   tristate "Marvell CP110 comphy driver"
>   depends on ARCH_MVEBU || COMPILE_TEST
> diff --git a/drivers/phy/marvell/Makefile b/drivers/phy/marvell/Makefile
> index 3975b144f8ec..59b6c03ef756 100644
> --- a/drivers/phy/marvell/Makefile
> +++ b/drivers/phy/marvell/Makefile
> @@ -2,6 +2,7 @@
>  obj-$(CONFIG_ARMADA375_USBCLUSTER_PHY)   += phy-armada375-usb2.o
>  obj-$(CONFIG_PHY_BERLIN_SATA)+= phy-berlin-sata.o
>  obj-$(CONFIG_PHY_BERLIN_USB) += phy-berlin-usb.o
> +obj-$(CONFIG_PHY_MVEBU_A38X_COMPHY)  += phy-armada38x-comphy.o
>  obj-$(CONFIG_PHY_MVEBU_CP110_COMPHY) += phy-mvebu-cp110-comphy.o
>  obj-$(CONFIG_PHY_MVEBU_SATA) += phy-mvebu-sata.o
>  obj-$(CONFIG_PHY_PXA_28NM_HSIC)  += phy-pxa-28nm-hsic.o
> diff --git a/drivers/phy/marvell/phy-armada38x-comphy.c 
> b/drivers/phy/marvell/phy-armada38x-comphy.c
> new file mode 100644
> index ..61d1965e1cf6
> --- /dev/null
> +++ b/drivers/phy/marvell/phy-armada38x-comphy.c
> @@ -0,0 +1,236 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2018 Russell King, Deep Blue Solutions Ltd.
> + *
> + * Partly derived from CP110 comphy driver by Antoine Tenart
> + * 
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define MAX_A38X_COMPHY  6
> +#define MAX_A38X_PORTS   3
> +
> +#define COMPHY_CFG1  0x00
> +#define  COMPHY_CFG1_GEN_TX(x)   ((x) << 26)
> +#define  COMPHY_CFG1_GEN_TX_MSK  COMPHY_CFG1_GEN_TX(15)
> +#define  COMPHY_CFG1_GEN_RX(x)   ((x) << 22)
> +#define  COMPHY_CFG1_GEN_RX_MSK  COMPHY_CFG1_GEN_RX(15)
> +#define  GEN_SGMII_1_25GBPS  6
> +#define  GEN_SGMII_3_125GBPS 8
> +
> +#define COMPHY_STAT1 0x18
> +#define  COMPHY_STAT1_PLL_RDY_TX BIT(3)
> +#define  COMPHY_STAT1_PLL_RDY_RX BIT(2)
> +
> +#define COMPHY_SELECTOR  0xfc
> +
> +struct a38x_comphy;
> +
> +struct a38x_comphy_lane {
> + void __iomem *base;
> + struct a38x_comphy *priv;
> + unsigned int n;
> +
> + int port;
> +};
> +
> +struct a38x_comphy {
> + void __iomem *base;
> + struct device *dev;
> + struct a38x_comphy_lane lane[MAX_A38X_COMPHY];
> +};
> +
> +static const u8 gbe_mux[MAX_A38X_COMPHY][MAX_A38X_PORTS] = {
> + { 0, 0, 0 },
> + { 4, 5, 0 },
> + { 0, 4, 0 },
> + { 0, 0, 4 },
> + { 0, 3, 0 },
> + { 0, 0, 3 },
> +};
> +
> +static void a38x_comphy_set_reg(struct a38x_comphy_lane *lane,
> + unsigned int offset, u32 mask, u32 value)
> +{
> + u32 val;
> +
> + val = readl_relaxed(lane->base + offset) & ~mask;
> + writel(val | value, lane->base + offset);
> +}
> +
> +static void a38x_comphy_set_speed(struct a38x_comphy_lane *lane,
> +   unsigned int gen_tx, unsigned int gen_rx)
> +{
> + a38x_comphy_set_reg(lane, COMPHY_CFG1,
> + COMPHY_CFG1_GEN_TX_MSK | COMPHY_CFG1_GEN_RX_MSK,
> + COMPHY_CFG1_GEN_TX(gen_tx) |
> + COMPHY_CFG1_GEN_RX(gen_rx));
> +}
> +
> +static int a38x_comphy_poll(struct a38x_comphy_lane *lane,
> + unsigned int offset, u32 mask, u32 value)
> +{
> + unsigned int timeout = 10;
> + u32 val;
> +
> + while (1) {
> + val = readl_relaxed(lane->base + offset);
> + if ((val & mask) == value)
> + return 0;
> + if (!timeout--)
> + break;
> + udelay(10);
> + }
> +
> + dev_err(lane->priv->dev, "comphy%u: timed

[PATCH net-next 09/11] mlxsw: spectrum: acl: Push code related to num_ctcam_erps inc/dec into separate helpers

2018-11-14 Thread Ido Schimmel
From: Jiri Pirko 

Later on, the same code is going to be needed for deltas as well, so
push the procedures related to incrementing and decrementing
num_ctcam_erps into separate helpers.

Signed-off-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 .../mellanox/mlxsw/spectrum_acl_erp.c | 108 ++
 1 file changed, 59 insertions(+), 49 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
index 818e03cf9add..bfd0c8a6cabf 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
@@ -623,9 +623,50 @@ mlxsw_sp_acl_erp_region_ctcam_disable(struct 
mlxsw_sp_acl_erp_table *erp_table)
mlxsw_sp_acl_erp_table_enable(erp_table, false);
 }
 
+static int
+__mlxsw_sp_acl_erp_table_other_inc(struct mlxsw_sp_acl_erp_table *erp_table,
+  unsigned int *inc_num)
+{
+   int err;
+
+   /* If there are C-TCAM eRPs in use we need to transition
+* the region to use eRP table, if it is not already done
+*/
+   if (erp_table->ops != &erp_two_masks_ops &&
+   erp_table->ops != &erp_multiple_masks_ops) {
+   err = mlxsw_sp_acl_erp_region_table_trans(erp_table);
+   if (err)
+   return err;
+   }
+
+   /* When C-TCAM is used, the eRP table must be used */
+   if (erp_table->ops != &erp_multiple_masks_ops)
+   erp_table->ops = &erp_multiple_masks_ops;
+
+   (*inc_num)++;
+
+   return 0;
+}
+
+static int mlxsw_sp_acl_erp_ctcam_inc(struct mlxsw_sp_acl_erp_table *erp_table)
+{
+   return __mlxsw_sp_acl_erp_table_other_inc(erp_table,
+ &erp_table->num_ctcam_erps);
+}
+
 static void
-mlxsw_sp_acl_erp_ctcam_table_ops_set(struct mlxsw_sp_acl_erp_table *erp_table)
+__mlxsw_sp_acl_erp_table_other_dec(struct mlxsw_sp_acl_erp_table *erp_table,
+  unsigned int *dec_num)
 {
+   (*dec_num)--;
+
+   /* If there are no C-TCAM eRPs in use, the state we
+* transition to depends on the number of A-TCAM eRPs currently
+* in use.
+*/
+   if (erp_table->num_ctcam_erps > 0)
+   return;
+
switch (erp_table->num_atcam_erps) {
case 2:
/* Keep using the eRP table, but correctly set the
@@ -659,9 +700,15 @@ mlxsw_sp_acl_erp_ctcam_table_ops_set(struct 
mlxsw_sp_acl_erp_table *erp_table)
}
 }
 
+static void mlxsw_sp_acl_erp_ctcam_dec(struct mlxsw_sp_acl_erp_table 
*erp_table)
+{
+   __mlxsw_sp_acl_erp_table_other_dec(erp_table,
+  &erp_table->num_ctcam_erps);
+}
+
 static struct mlxsw_sp_acl_erp *
-__mlxsw_sp_acl_erp_ctcam_mask_create(struct mlxsw_sp_acl_erp_table *erp_table,
-struct mlxsw_sp_acl_erp_key *key)
+mlxsw_sp_acl_erp_ctcam_mask_create(struct mlxsw_sp_acl_erp_table *erp_table,
+  struct mlxsw_sp_acl_erp_key *key)
 {
struct mlxsw_sp_acl_erp *erp;
int err;
@@ -673,7 +720,11 @@ __mlxsw_sp_acl_erp_ctcam_mask_create(struct 
mlxsw_sp_acl_erp_table *erp_table,
memcpy(&erp->key, key, sizeof(*key));
bitmap_from_arr32(erp->mask_bitmap, (u32 *) key->mask,
  MLXSW_SP_ACL_TCAM_MASK_LEN);
-   erp_table->num_ctcam_erps++;
+
+   err = mlxsw_sp_acl_erp_ctcam_inc(erp_table);
+   if (err)
+   goto err_erp_ctcam_inc;
+
erp->erp_table = erp_table;
 
err = mlxsw_sp_acl_erp_master_mask_set(erp_table, &erp->key);
@@ -684,50 +735,17 @@ __mlxsw_sp_acl_erp_ctcam_mask_create(struct 
mlxsw_sp_acl_erp_table *erp_table,
if (err)
goto err_erp_region_ctcam_enable;
 
-   /* When C-TCAM is used, the eRP table must be used */
-   erp_table->ops = &erp_multiple_masks_ops;
-
return erp;
 
 err_erp_region_ctcam_enable:
mlxsw_sp_acl_erp_master_mask_clear(erp_table, &erp->key);
 err_master_mask_set:
-   erp_table->num_ctcam_erps--;
+   mlxsw_sp_acl_erp_ctcam_dec(erp_table);
+err_erp_ctcam_inc:
kfree(erp);
return ERR_PTR(err);
 }
 
-static struct mlxsw_sp_acl_erp *
-mlxsw_sp_acl_erp_ctcam_mask_create(struct mlxsw_sp_acl_erp_table *erp_table,
-  struct mlxsw_sp_acl_erp_key *key)
-{
-   struct mlxsw_sp_acl_erp *erp;
-   int err;
-
-   /* There is a special situation where we need to spill rules
-* into the C-TCAM, yet the region is still using a master
-* mask and thus not performing a lookup in the C-TCAM. This
-* can happen when two rules that only differ in priority - and
-* thus sharing the same key - are programmed. In this case
-* we transition the region to use an eRP table
-*/
-   err = mlxsw_sp_acl_erp_region_table_trans(erp_table

[PATCH net-next 11/11] selftests: mlxsw: spectrum-2: Add simple delta test

2018-11-14 Thread Ido Schimmel
From: Jiri Pirko 

Track the basic codepaths of delta handling, using objagg tracepoints.

Signed-off-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 .../drivers/net/mlxsw/spectrum-2/tc_flower.sh | 82 ++-
 1 file changed, 81 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh 
b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
index 84ef95320c96..00ae99fbc253 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
@@ -8,7 +8,7 @@
 lib_dir=$(dirname $0)/../../../../net/forwarding
 
 ALL_TESTS="single_mask_test identical_filters_test two_masks_test \
-   multiple_masks_test ctcam_edge_cases_test"
+   multiple_masks_test ctcam_edge_cases_test delta_simple_test"
 NUM_NETIFS=2
 source $lib_dir/tc_common.sh
 source $lib_dir/lib.sh
@@ -324,6 +324,86 @@ ctcam_edge_cases_test()
ctcam_no_atcam_masks_test
 }
 
+tp_record()
+{
+   local tracepoint=$1
+   local cmd=$2
+
+   perf record -q -e $tracepoint $cmd
+   return $?
+}
+
+tp_check_hits()
+{
+   local tracepoint=$1
+   local count=$2
+
+   perf_output=`perf script -F trace:event,trace`
+   hits=`echo "$perf_output" | grep -c "$tracepoint:"`
+   if [[ "$count" -ne "$hits" ]]; then
+   return 1
+   fi
+   return 0
+}
+
+delta_simple_test()
+{
+   # The first filter will create an eRP, and the second filter will fit
+   # into the first eRP with a delta. Then remove the first rule and
+   # check that the eRP stays (referenced by the second filter).
+
+   RET=0
+
+   if [[ "$tcflags" != "skip_sw" ]]; then
+   return 0;
+   fi
+
+   tp_record "objagg:*" "tc filter add dev $h2 ingress protocol ip \
+  pref 1 handle 101 flower $tcflags dst_ip 192.0.0.0/24 \
+  action drop"
+   tp_check_hits "objagg:objagg_obj_root_create" 1
+   check_err $? "eRP was not created"
+
+   tp_record "objagg:*" "tc filter add dev $h2 ingress protocol ip \
+  pref 2 handle 102 flower $tcflags dst_ip 192.0.2.2 \
+  action drop"
+   tp_check_hits "objagg:objagg_obj_root_create" 0
+   check_err $? "eRP was incorrectly created"
+   tp_check_hits "objagg:objagg_obj_parent_assign" 1
+   check_err $? "delta was not created"
+
+   $MZ $h1 -c 1 -p 64 -a $h1mac -b $h2mac -A 192.0.2.1 -B 192.0.2.2 \
+   -t ip -q
+
+   tc_check_packets "dev $h2 ingress" 101 1
+   check_fail $? "Matched a wrong filter"
+
+   tc_check_packets "dev $h2 ingress" 102 1
+   check_err $? "Did not match on correct filter"
+
+   tp_record "objagg:*" "tc filter del dev $h2 ingress protocol ip \
+  pref 1 handle 101 flower"
+   tp_check_hits "objagg:objagg_obj_root_destroy" 0
+   check_err $? "eRP was incorrectly destroyed"
+   tp_check_hits "objagg:objagg_obj_parent_unassign" 0
+   check_err $? "delta was incorrectly destroyed"
+
+   $MZ $h1 -c 1 -p 64 -a $h1mac -b $h2mac -A 192.0.2.1 -B 192.0.2.2 \
+   -t ip -q
+
+   tc_check_packets "dev $h2 ingress" 102 2
+   check_err $? "Did not match on correct filter after the first was removed"
+
+   tp_record "objagg:*" "tc filter del dev $h2 ingress protocol ip \
+  pref 2 handle 102 flower"
+   tp_check_hits "objagg:objagg_obj_parent_unassign" 1
+   check_err $? "delta was not destroyed"
+   tp_check_hits "objagg:objagg_obj_root_destroy" 1
+   check_err $? "eRP was not destroyed"
+
+   log_test "delta simple test ($tcflags)"
+}
+
 setup_prepare()
 {
h1=${NETIFS[p1]}
-- 
2.19.1



[PATCH net-next 07/11] mlxsw: spectrum: acl: Don't encode the key again in mlxsw_sp_acl_atcam_12kb_lkey_id_get()

2018-11-14 Thread Ido Schimmel
From: Jiri Pirko 

There is no need to encode the key again in
mlxsw_sp_acl_atcam_12kb_lkey_id_get(). Instead, introduce
a new helper that just clears the unused blocks.

Signed-off-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 .../mellanox/mlxsw/core_acl_flex_keys.c   | 10 +++
 .../mellanox/mlxsw/core_acl_flex_keys.h   |  3 ++
 .../mellanox/mlxsw/spectrum_acl_atcam.c   | 22 +++
 .../mellanox/mlxsw/spectrum_acl_flex_keys.c   | 28 +--
 4 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c 
b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c
index 98c00ea9c398..0900ccfdf315 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c
@@ -458,3 +458,13 @@ void mlxsw_afk_encode(struct mlxsw_afk *mlxsw_afk,
}
 }
 EXPORT_SYMBOL(mlxsw_afk_encode);
+
+void mlxsw_afk_clear(struct mlxsw_afk *mlxsw_afk, char *key,
+int block_start, int block_end)
+{
+   int i;
+
+   for (i = block_start; i <= block_end; i++)
+   mlxsw_afk->ops->clear_block(key, i);
+}
+EXPORT_SYMBOL(mlxsw_afk_clear);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h 
b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h
index 6a44501d8af7..a5303c0b53b4 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h
@@ -189,6 +189,7 @@ struct mlxsw_afk_ops {
const struct mlxsw_afk_block *blocks;
unsigned int blocks_count;
void (*encode_block)(char *output, int block_index, char *block);
+   void (*clear_block)(char *output, int block_index);
 };
 
 struct mlxsw_afk *mlxsw_afk_create(unsigned int max_blocks,
@@ -229,5 +230,7 @@ void mlxsw_afk_encode(struct mlxsw_afk *mlxsw_afk,
  struct mlxsw_afk_key_info *key_info,
  struct mlxsw_afk_element_values *values,
  char *key, char *mask, int block_start, int block_end);
+void mlxsw_afk_clear(struct mlxsw_afk *mlxsw_afk, char *key,
+int block_start, int block_end);
 
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
index 5a0b88707269..ffdf464660be 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
@@ -14,8 +14,8 @@
 #include "spectrum_acl_tcam.h"
 #include "core_acl_flex_keys.h"
 
-#define MLXSW_SP_ACL_ATCAM_LKEY_ID_BLOCK_START 6
-#define MLXSW_SP_ACL_ATCAM_LKEY_ID_BLOCK_END   11
+#define MLXSW_SP_ACL_ATCAM_LKEY_ID_BLOCK_CLEAR_START   0
+#define MLXSW_SP_ACL_ATCAM_LKEY_ID_BLOCK_CLEAR_END 5
 
 struct mlxsw_sp_acl_atcam_lkey_id_ht_key {
char enc_key[MLXSW_REG_PTCEX_FLEX_KEY_BLOCKS_LEN]; /* MSB blocks */
@@ -34,7 +34,7 @@ struct mlxsw_sp_acl_atcam_region_ops {
void (*fini)(struct mlxsw_sp_acl_atcam_region *aregion);
struct mlxsw_sp_acl_atcam_lkey_id *
(*lkey_id_get)(struct mlxsw_sp_acl_atcam_region *aregion,
-  struct mlxsw_sp_acl_rule_info *rulei, u8 erp_id);
+  char *enc_key, u8 erp_id);
void (*lkey_id_put)(struct mlxsw_sp_acl_atcam_region *aregion,
struct mlxsw_sp_acl_atcam_lkey_id *lkey_id);
 };
@@ -90,8 +90,7 @@ mlxsw_sp_acl_atcam_region_generic_fini(struct 
mlxsw_sp_acl_atcam_region *aregion
 
 static struct mlxsw_sp_acl_atcam_lkey_id *
 mlxsw_sp_acl_atcam_generic_lkey_id_get(struct mlxsw_sp_acl_atcam_region 
*aregion,
-  struct mlxsw_sp_acl_rule_info *rulei,
-  u8 erp_id)
+  char *enc_key, u8 erp_id)
 {
struct mlxsw_sp_acl_atcam_region_generic *region_generic;
 
@@ -220,8 +219,7 @@ mlxsw_sp_acl_atcam_lkey_id_destroy(struct 
mlxsw_sp_acl_atcam_region *aregion,
 
 static struct mlxsw_sp_acl_atcam_lkey_id *
 mlxsw_sp_acl_atcam_12kb_lkey_id_get(struct mlxsw_sp_acl_atcam_region *aregion,
-   struct mlxsw_sp_acl_rule_info *rulei,
-   u8 erp_id)
+   char *enc_key, u8 erp_id)
 {
struct mlxsw_sp_acl_atcam_region_12kb *region_12kb = aregion->priv;
struct mlxsw_sp_acl_tcam_region *region = aregion->region;
@@ -230,9 +228,10 @@ mlxsw_sp_acl_atcam_12kb_lkey_id_get(struct 
mlxsw_sp_acl_atcam_region *aregion,
struct mlxsw_afk *afk = mlxsw_sp_acl_afk(mlxsw_sp->acl);
struct mlxsw_sp_acl_atcam_lkey_id *lkey_id;
 
-   mlxsw_afk_encode(afk, region->key_info, &rulei->values, ht_key.enc_key,
-NULL, MLXSW_SP_ACL_ATCAM_LKEY_ID_BLOCK_START,
-MLXSW_SP_ACL_ATCAM_LKEY_ID_BLOCK_END);
+   memcpy(ht_ke

[PATCH net-next 00/11] mlxsw: spectrum: acl: Introduce ERP sharing by multiple masks

2018-11-14 Thread Ido Schimmel
Jiri says:

The Spectrum-2 hardware has a limited number of ERPs per region. In
order to accommodate more masks than the number of ERPs, the hardware
supports inserting rules with delta bits. This way, rules with masks
that differ in up to 8 consecutive bits can share the same ERP.
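The sharing condition above can be sketched in plain C. This is only an illustration: the mask width, the LSB-first bit numbering, and the function name are assumptions for the example, not the hardware's actual key layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define MASK_LEN 12 /* bytes; illustrative, not the real key-block size */

/* Return true if the two masks differ only in bits that all fit inside
 * one window of at most 8 consecutive bit positions -- the condition
 * under which, per the cover letter, the rules can share an ERP. */
static bool masks_can_share_erp(const uint8_t *a, const uint8_t *b)
{
	int first = -1, last = -1;

	for (int i = 0; i < MASK_LEN * 8; i++) {
		int bit_a = (a[i / 8] >> (i % 8)) & 1;
		int bit_b = (b[i / 8] >> (i % 8)) & 1;

		if (bit_a != bit_b) {
			if (first < 0)
				first = i;
			last = i;
		}
	}
	if (first < 0)
		return true;			/* identical masks */
	return last - first + 1 <= 8;	/* differing bits span <= 8 */
}
```

For example, dst_ip 192.0.0.0/24 and 192.0.2.2/32 differ only in the last 8 mask bits, so they can share an ERP; a /8 and a /32 mask cannot.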

Patches 1 and 2 fix a couple of issues that would appear in existing
selftests after adding delta support.

Patch 3 introduces a generic object aggregation library. For now the
aggregation is static, but the library will be extended to recalculate
aggregations in the future in order to reach a more optimal aggregation.

Patch 4 just simply converts existing ERP code to use the objagg library
instead of a rhashtable.

Patches 5-9 make small changes that prepare the ground for the last
patch.

Patch 10 fills-up delta callbacks of objagg library and utilizes the
delta bits for rule insertion.

The last patch adds selftest to test the mlxsw Spectrum-2 delta flows.

Jiri Pirko (11):
  selftests: Adjust spectrum-2 two_mask_test
  selftests: Adjust spectrum-2 ctcam_two_atcam_masks_test
  lib: introduce initial implementation of object aggregation manager
  mlxsw: spectrum: acl_erp: Convert to use objagg for tracking ERPs
  mlxsw: spectrum: acl: Pass key pointer to master_mask_set/clear
  mlxsw: core_acl: Change order of args of ops->encode_block()
  mlxsw: spectrum: acl: Don't encode the key again in
mlxsw_sp_acl_atcam_12kb_lkey_id_get()
  mlxsw: spectrum: acl: Remove mlxsw_afk_encode() block range args and
key/mask check
  mlxsw: spectrum: acl: Push code related to num_ctcam_erps inc/dec into
separate helpers
  mlxsw: spectrum: acl: Implement delta for ERP
  selftests: mlxsw: spectrum-2: Add simple delta test

 MAINTAINERS   |   8 +
 drivers/net/ethernet/mellanox/mlxsw/Kconfig   |   1 +
 .../mellanox/mlxsw/core_acl_flex_keys.c   |  22 +-
 .../mellanox/mlxsw/core_acl_flex_keys.h   |   7 +-
 drivers/net/ethernet/mellanox/mlxsw/reg.h |   8 +-
 .../mellanox/mlxsw/spectrum2_acl_tcam.c   |  12 +-
 .../mellanox/mlxsw/spectrum_acl_atcam.c   |  75 +-
 .../mellanox/mlxsw/spectrum_acl_ctcam.c   |   5 +-
 .../mellanox/mlxsw/spectrum_acl_erp.c | 454 +++---
 .../mellanox/mlxsw/spectrum_acl_flex_keys.c   |  32 +-
 .../mellanox/mlxsw/spectrum_acl_tcam.h|  42 +-
 include/linux/objagg.h|  46 +
 include/trace/events/objagg.h | 228 +
 lib/Kconfig   |   3 +
 lib/Kconfig.debug |  10 +
 lib/Makefile  |   2 +
 lib/objagg.c  | 501 +++
 lib/test_objagg.c | 835 ++
 .../drivers/net/mlxsw/spectrum-2/tc_flower.sh |  86 +-
 19 files changed, 2184 insertions(+), 193 deletions(-)
 create mode 100644 include/linux/objagg.h
 create mode 100644 include/trace/events/objagg.h
 create mode 100644 lib/objagg.c
 create mode 100644 lib/test_objagg.c

-- 
2.19.1



[PATCH net-next 02/11] selftests: Adjust spectrum-2 ctcam_two_atcam_masks_test

2018-11-14 Thread Ido Schimmel
From: Jiri Pirko 

In order for this to behave as required with delta bits, change the mask
of the rule with handle 103.

Signed-off-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 .../testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh 
b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
index 6b96eb6f6e74..84ef95320c96 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
@@ -235,7 +235,7 @@ ctcam_two_atcam_masks_test()
$tcflags dst_ip 192.0.2.2 action drop
# Filter goes into A-TCAM
tc filter add dev $h2 ingress protocol ip pref 3 handle 103 flower \
-   $tcflags dst_ip 192.0.2.0/24 action drop
+   $tcflags dst_ip 192.0.0.0/16 action drop
 
$MZ $h1 -c 1 -p 64 -a $h1mac -b $h2mac -A 192.0.2.1 -B 192.0.2.2 \
-t ip -q
-- 
2.19.1



[PATCH net-next 08/11] mlxsw: spectrum: acl: Remove mlxsw_afk_encode() block range args and key/mask check

2018-11-14 Thread Ido Schimmel
From: Jiri Pirko 

Since the two remaining users of mlxsw_afk_encode() do not specify
block ranges to work on, remove the arguments. Also, key/mask are always
non-NULL now, so drop the checks.

Signed-off-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 .../net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c | 12 ++--
 .../net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h |  2 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c |  4 +---
 .../net/ethernet/mellanox/mlxsw/spectrum_acl_ctcam.c |  5 +
 4 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c 
b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c
index 0900ccfdf315..df78d23b3ec3 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c
@@ -426,15 +426,17 @@ mlxsw_sp_afk_encode_one(const struct 
mlxsw_afk_element_inst *elinst,
 void mlxsw_afk_encode(struct mlxsw_afk *mlxsw_afk,
  struct mlxsw_afk_key_info *key_info,
  struct mlxsw_afk_element_values *values,
- char *key, char *mask, int block_start, int block_end)
+ char *key, char *mask)
 {
+   unsigned int blocks_count =
+   mlxsw_afk_key_info_blocks_count_get(key_info);
char block_mask[MLXSW_SP_AFK_KEY_BLOCK_MAX_SIZE];
char block_key[MLXSW_SP_AFK_KEY_BLOCK_MAX_SIZE];
const struct mlxsw_afk_element_inst *elinst;
enum mlxsw_afk_element element;
int block_index, i;
 
-   for (i = block_start; i <= block_end; i++) {
+   for (i = 0; i < blocks_count; i++) {
memset(block_key, 0, MLXSW_SP_AFK_KEY_BLOCK_MAX_SIZE);
memset(block_mask, 0, MLXSW_SP_AFK_KEY_BLOCK_MAX_SIZE);
 
@@ -451,10 +453,8 @@ void mlxsw_afk_encode(struct mlxsw_afk *mlxsw_afk,
values->storage.mask);
}
 
-   if (key)
-   mlxsw_afk->ops->encode_block(key, i, block_key);
-   if (mask)
-   mlxsw_afk->ops->encode_block(mask, i, block_mask);
+   mlxsw_afk->ops->encode_block(key, i, block_key);
+   mlxsw_afk->ops->encode_block(mask, i, block_mask);
}
 }
 EXPORT_SYMBOL(mlxsw_afk_encode);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h 
b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h
index a5303c0b53b4..bcd264135af7 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h
@@ -229,7 +229,7 @@ void mlxsw_afk_values_add_buf(struct 
mlxsw_afk_element_values *values,
 void mlxsw_afk_encode(struct mlxsw_afk *mlxsw_afk,
  struct mlxsw_afk_key_info *key_info,
  struct mlxsw_afk_element_values *values,
- char *key, char *mask, int block_start, int block_end);
+ char *key, char *mask);
 void mlxsw_afk_clear(struct mlxsw_afk *mlxsw_afk, char *key,
 int block_start, int block_end);
 
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
index ffdf464660be..12798ce33a60 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
@@ -439,12 +439,10 @@ __mlxsw_sp_acl_atcam_entry_add(struct mlxsw_sp *mlxsw_sp,
char mask[MLXSW_REG_PTCEX_FLEX_KEY_BLOCKS_LEN] = { 0 };
struct mlxsw_afk *afk = mlxsw_sp_acl_afk(mlxsw_sp->acl);
struct mlxsw_sp_acl_erp_mask *erp_mask;
-   unsigned int blocks_count;
int err;
 
-   blocks_count = mlxsw_afk_key_info_blocks_count_get(region->key_info);
mlxsw_afk_encode(afk, region->key_info, &rulei->values,
-aentry->ht_key.enc_key, mask, 0, blocks_count - 1);
+aentry->ht_key.enc_key, mask);
 
erp_mask = mlxsw_sp_acl_erp_mask_get(aregion, mask, false);
if (IS_ERR(erp_mask))
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_ctcam.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_ctcam.c
index e3c6fe8b1d40..f3e834bfea1a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_ctcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_ctcam.c
@@ -46,7 +46,6 @@ mlxsw_sp_acl_ctcam_region_entry_insert(struct mlxsw_sp 
*mlxsw_sp,
struct mlxsw_sp_acl_tcam_region *region = cregion->region;
struct mlxsw_afk *afk = mlxsw_sp_acl_afk(mlxsw_sp->acl);
char ptce2_pl[MLXSW_REG_PTCE2_LEN];
-   unsigned int blocks_count;
char *act_set;
u32 priority;
char *mask;
@@ -63,9 +62,7 @@ mlxsw_sp_acl_ctcam_region_entry_insert(struct mlxsw_sp 
*mlxsw_sp,
 centry->parman_item.index, priority);
key = mlxsw_reg_ptce2

[PATCH net-next 01/11] selftests: Adjust spectrum-2 two_mask_test

2018-11-14 Thread Ido Schimmel
From: Jiri Pirko 

In order for this to behave as required with delta bits, change the mask
of the rule with handle 103.

Signed-off-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 .../testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh 
b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
index 3b75180f455d..6b96eb6f6e74 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/spectrum-2/tc_flower.sh
@@ -142,7 +142,7 @@ two_masks_test()
tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \
$tcflags dst_ip 192.0.2.2 action drop
tc filter add dev $h2 ingress protocol ip pref 3 handle 103 flower \
-   $tcflags dst_ip 192.0.0.0/16 action drop
+   $tcflags dst_ip 192.0.0.0/8 action drop
 
$MZ $h1 -c 1 -p 64 -a $h1mac -b $h2mac -A 192.0.2.1 -B 192.0.2.2 \
-t ip -q
-- 
2.19.1



[PATCH net-next 04/11] mlxsw: spectrum: acl_erp: Convert to use objagg for tracking ERPs

2018-11-14 Thread Ido Schimmel
From: Jiri Pirko 

Currently the ERPs are tracked internally in a hashtable. Benefit from
the newly introduced objagg library and use it to track ERPs. At this
point, no nesting of objects is done, as the delta_create callback
always returns -EOPNOTSUPP. Along the way, add "mask" to the names of
the ERP mask get/set functions and structs.

Signed-off-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/Kconfig   |   1 +
 .../mellanox/mlxsw/spectrum2_acl_tcam.c   |  12 +-
 .../mellanox/mlxsw/spectrum_acl_atcam.c   |  22 +--
 .../mellanox/mlxsw/spectrum_acl_erp.c | 135 +-
 .../mellanox/mlxsw/spectrum_acl_tcam.h|  19 +--
 5 files changed, 99 insertions(+), 90 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/Kconfig 
b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
index 8a291eb36c64..080ddd1942ec 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
@@ -80,6 +80,7 @@ config MLXSW_SPECTRUM
depends on IPV6_GRE || IPV6_GRE=n
select GENERIC_ALLOCATOR
select PARMAN
+   select OBJAGG
select MLXFW
default m
---help---
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum2_acl_tcam.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum2_acl_tcam.c
index 8ca77f3e8f27..62e6cf4bc16e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum2_acl_tcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum2_acl_tcam.c
@@ -34,15 +34,15 @@ mlxsw_sp2_acl_ctcam_region_entry_insert(struct 
mlxsw_sp_acl_ctcam_region *cregio
 {
struct mlxsw_sp_acl_atcam_region *aregion;
struct mlxsw_sp_acl_atcam_entry *aentry;
-   struct mlxsw_sp_acl_erp *erp;
+   struct mlxsw_sp_acl_erp_mask *erp_mask;
 
aregion = mlxsw_sp_acl_tcam_cregion_aregion(cregion);
aentry = mlxsw_sp_acl_tcam_centry_aentry(centry);
 
-   erp = mlxsw_sp_acl_erp_get(aregion, mask, true);
-   if (IS_ERR(erp))
-   return PTR_ERR(erp);
-   aentry->erp = erp;
+   erp_mask = mlxsw_sp_acl_erp_mask_get(aregion, mask, true);
+   if (IS_ERR(erp_mask))
+   return PTR_ERR(erp_mask);
+   aentry->erp_mask = erp_mask;
 
return 0;
 }
@@ -57,7 +57,7 @@ mlxsw_sp2_acl_ctcam_region_entry_remove(struct 
mlxsw_sp_acl_ctcam_region *cregio
aregion = mlxsw_sp_acl_tcam_cregion_aregion(cregion);
aentry = mlxsw_sp_acl_tcam_centry_aentry(centry);
 
-   mlxsw_sp_acl_erp_put(aregion, aentry->erp);
+   mlxsw_sp_acl_erp_mask_put(aregion, aentry->erp_mask);
 }
 
 static const struct mlxsw_sp_acl_ctcam_region_ops
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
index 2dda028f94db..5a0b88707269 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
@@ -64,7 +64,7 @@ static const struct rhashtable_params 
mlxsw_sp_acl_atcam_entries_ht_params = {
 static bool
 mlxsw_sp_acl_atcam_is_centry(const struct mlxsw_sp_acl_atcam_entry *aentry)
 {
-   return mlxsw_sp_acl_erp_is_ctcam_erp(aentry->erp);
+   return mlxsw_sp_acl_erp_mask_is_ctcam(aentry->erp_mask);
 }
 
 static int
@@ -379,7 +379,7 @@ mlxsw_sp_acl_atcam_region_entry_insert(struct mlxsw_sp 
*mlxsw_sp,
   struct mlxsw_sp_acl_rule_info *rulei)
 {
struct mlxsw_sp_acl_tcam_region *region = aregion->region;
-   u8 erp_id = mlxsw_sp_acl_erp_id(aentry->erp);
+   u8 erp_id = mlxsw_sp_acl_erp_mask_erp_id(aentry->erp_mask);
struct mlxsw_sp_acl_atcam_lkey_id *lkey_id;
char ptce3_pl[MLXSW_REG_PTCE3_LEN];
u32 kvdl_index, priority;
@@ -418,7 +418,7 @@ mlxsw_sp_acl_atcam_region_entry_remove(struct mlxsw_sp 
*mlxsw_sp,
 {
struct mlxsw_sp_acl_atcam_lkey_id *lkey_id = aentry->lkey_id;
struct mlxsw_sp_acl_tcam_region *region = aregion->region;
-   u8 erp_id = mlxsw_sp_acl_erp_id(aentry->erp);
+   u8 erp_id = mlxsw_sp_acl_erp_mask_erp_id(aentry->erp_mask);
char ptce3_pl[MLXSW_REG_PTCE3_LEN];
 
mlxsw_reg_ptce3_pack(ptce3_pl, false, MLXSW_REG_PTCE3_OP_WRITE_WRITE, 0,
@@ -438,7 +438,7 @@ __mlxsw_sp_acl_atcam_entry_add(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_acl_tcam_region *region = aregion->region;
char mask[MLXSW_REG_PTCEX_FLEX_KEY_BLOCKS_LEN] = { 0 };
struct mlxsw_afk *afk = mlxsw_sp_acl_afk(mlxsw_sp->acl);
-   struct mlxsw_sp_acl_erp *erp;
+   struct mlxsw_sp_acl_erp_mask *erp_mask;
unsigned int blocks_count;
int err;
 
@@ -446,11 +446,11 @@ __mlxsw_sp_acl_atcam_entry_add(struct mlxsw_sp *mlxsw_sp,
mlxsw_afk_encode(afk, region->key_info, &rulei->values,
 aentry->ht_key.enc_key, mask, 0, blocks_count - 1);
 
-   erp = mlxsw_sp_acl_erp_get(aregion, mask, false);
-   if (IS_ERR(erp))
-   return PTR_ERR(

[PATCH net-next 03/11] lib: introduce initial implementation of object aggregation manager

2018-11-14 Thread Ido Schimmel
From: Jiri Pirko 

This library tracks objects which can be of two types:
1) root object
2) nested object - with a "delta" which differentiates it from
   the associated root object
The objects are tracked by a hashtable and reference-counted. The user
is responsible for implementing callbacks to create/destroy the root
entity related to each root object and a callback to create/destroy the
nested object delta.
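The interning and reference counting described above can be modeled in a few lines of C. This is a toy sketch with made-up names: the real objagg hashes arbitrary objects via rhashtable and additionally supports nested delta objects, both of which this model omits.

```c
#include <stddef.h>

/* Toy model of the objagg idea: equal objects are interned in a table
 * and reference-counted. The first user of a key creates a "root"
 * (where the real library would call ops->root_create()); later users
 * of an equal key just bump the refcount and share it. */
struct toy_obj {
	int key;		/* stands in for the tracked object */
	int refcount;
	int root_created;	/* did the (simulated) root_create() run? */
};

#define TOY_MAX 16
static struct toy_obj table[TOY_MAX];

static struct toy_obj *toy_obj_get(int key)
{
	struct toy_obj *free_slot = NULL;

	for (int i = 0; i < TOY_MAX; i++) {
		if (table[i].refcount && table[i].key == key) {
			table[i].refcount++;	/* existing object: share */
			return &table[i];
		}
		if (!table[i].refcount && !free_slot)
			free_slot = &table[i];
	}
	if (!free_slot)
		return NULL;
	free_slot->key = key;
	free_slot->refcount = 1;
	free_slot->root_created = 1;	/* would call ops->root_create() */
	return free_slot;
}

static void toy_obj_put(struct toy_obj *obj)
{
	if (--obj->refcount == 0)
		obj->root_created = 0;	/* would call ops->root_destroy() */
}
```

This mirrors the objagg_obj_get()/objagg_obj_put() pairing that the ERP code converts to in patch 4.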

Signed-off-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 MAINTAINERS   |   8 +
 include/linux/objagg.h|  46 ++
 include/trace/events/objagg.h | 228 ++
 lib/Kconfig   |   3 +
 lib/Kconfig.debug |  10 +
 lib/Makefile  |   2 +
 lib/objagg.c  | 501 
 lib/test_objagg.c | 835 ++
 8 files changed, 1633 insertions(+)
 create mode 100644 include/linux/objagg.h
 create mode 100644 include/trace/events/objagg.h
 create mode 100644 lib/objagg.c
 create mode 100644 lib/test_objagg.c

diff --git a/MAINTAINERS b/MAINTAINERS
index e110e327bf38..3bd775ba51ce 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10679,6 +10679,14 @@ L: linux-...@lists.01.org (moderated for 
non-subscribers)
 S: Supported
 F: drivers/nfc/nxp-nci
 
+OBJAGG
+M: Jiri Pirko 
+L: netdev@vger.kernel.org
+S: Supported
+F: lib/objagg.c
+F: lib/test_objagg.c
+F: include/linux/objagg.h
+
 OBJTOOL
 M: Josh Poimboeuf 
 M: Peter Zijlstra 
diff --git a/include/linux/objagg.h b/include/linux/objagg.h
new file mode 100644
index ..34f38c186ea0
--- /dev/null
+++ b/include/linux/objagg.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0 */
+/* Copyright (c) 2018 Mellanox Technologies. All rights reserved */
+
+#ifndef _OBJAGG_H
+#define _OBJAGG_H
+
+struct objagg_ops {
+   size_t obj_size;
+   void * (*delta_create)(void *priv, void *parent_obj, void *obj);
+   void (*delta_destroy)(void *priv, void *delta_priv);
+   void * (*root_create)(void *priv, void *obj);
+   void (*root_destroy)(void *priv, void *root_priv);
+};
+
+struct objagg;
+struct objagg_obj;
+
+const void *objagg_obj_root_priv(const struct objagg_obj *objagg_obj);
+const void *objagg_obj_delta_priv(const struct objagg_obj *objagg_obj);
+const void *objagg_obj_raw(const struct objagg_obj *objagg_obj);
+
+struct objagg_obj *objagg_obj_get(struct objagg *objagg, void *obj);
+void objagg_obj_put(struct objagg *objagg, struct objagg_obj *objagg_obj);
+struct objagg *objagg_create(const struct objagg_ops *ops, void *priv);
+void objagg_destroy(struct objagg *objagg);
+
+struct objagg_obj_stats {
+   unsigned int user_count;
+   unsigned int delta_user_count; /* includes delta object users */
+};
+
+struct objagg_obj_stats_info {
+   struct objagg_obj_stats stats;
+   struct objagg_obj *objagg_obj; /* associated object */
+   bool is_root;
+};
+
+struct objagg_stats {
+   unsigned int stats_info_count;
+   struct objagg_obj_stats_info stats_info[];
+};
+
+const struct objagg_stats *objagg_stats_get(struct objagg *objagg);
+void objagg_stats_put(const struct objagg_stats *objagg_stats);
+
+#endif
diff --git a/include/trace/events/objagg.h b/include/trace/events/objagg.h
new file mode 100644
index ..fcec0fc9eb0c
--- /dev/null
+++ b/include/trace/events/objagg.h
@@ -0,0 +1,228 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0 */
+/* Copyright (c) 2018 Mellanox Technologies. All rights reserved */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM objagg
+
+#if !defined(__TRACE_OBJAGG_H) || defined(TRACE_HEADER_MULTI_READ)
+#define __TRACE_OBJAGG_H
+
+#include 
+
+struct objagg;
+struct objagg_obj;
+
+TRACE_EVENT(objagg_create,
+   TP_PROTO(const struct objagg *objagg),
+
+   TP_ARGS(objagg),
+
+   TP_STRUCT__entry(
+   __field(const void *, objagg)
+   ),
+
+   TP_fast_assign(
+   __entry->objagg = objagg;
+   ),
+
+   TP_printk("objagg %p", __entry->objagg)
+);
+
+TRACE_EVENT(objagg_destroy,
+   TP_PROTO(const struct objagg *objagg),
+
+   TP_ARGS(objagg),
+
+   TP_STRUCT__entry(
+   __field(const void *, objagg)
+   ),
+
+   TP_fast_assign(
+   __entry->objagg = objagg;
+   ),
+
+   TP_printk("objagg %p", __entry->objagg)
+);
+
+TRACE_EVENT(objagg_obj_create,
+   TP_PROTO(const struct objagg *objagg,
+const struct objagg_obj *obj),
+
+   TP_ARGS(objagg, obj),
+
+   TP_STRUCT__entry(
+   __field(const void *, objagg)
+   __field(const void *, obj)
+   ),
+
+   TP_fast_assign(
+   __entry->objagg = objagg;
+   __entry->obj = obj;
+   ),
+
+   TP_printk("objagg %p, obj %p", __entry->objagg, __entry->obj)
+);
+
+TRACE_EVENT(objagg_obj_destroy,
+   TP_PROTO(const struct objagg *objagg,
+const struct objag

[PATCH net-next 10/11] mlxsw: spectrum: acl: Implement delta for ERP

2018-11-14 Thread Ido Schimmel
From: Jiri Pirko 

Allow ERP sharing for multiple masks. Do it by properly implementing
the delta_create() objagg callback. Use the computed delta info when
inserting rules into the A-TCAM.
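The PTCE3 packing below gains delta_start/delta_mask/delta_value fields. A hedged sketch of how such a delta could be derived from a root mask, a rule's mask, and the rule's key is shown here; the bit numbering, the anchoring of `start`, and the key width are illustrative assumptions, not the documented register semantics.

```c
#include <stdbool.h>
#include <stdint.h>

#define KEY_LEN 12 /* bytes; illustrative */

struct delta_info {
	uint16_t start;	/* bit offset of the delta window */
	uint8_t mask;	/* which bits inside the window are the delta */
	uint8_t value;	/* the rule's key bits at those positions */
};

/* Compute a hypothetical delta between a root mask and a rule mask that
 * differs from it only inside one window of at most 8 consecutive bits;
 * the value field is filled from the rule's key. Returns false when the
 * masks are identical or no single 8-bit window covers the difference. */
static bool delta_compute(const uint8_t *root_mask, const uint8_t *mask,
			  const uint8_t *key, struct delta_info *delta)
{
	int first = -1, last = -1;

	for (int i = 0; i < KEY_LEN * 8; i++) {
		int differs = ((root_mask[i / 8] ^ mask[i / 8]) >> (i % 8)) & 1;

		if (differs) {
			if (first < 0)
				first = i;
			last = i;
		}
	}
	if (first < 0 || last - first >= 8)
		return false;
	delta->start = (uint16_t)first;
	delta->mask = 0;
	delta->value = 0;
	for (int i = first; i <= last; i++) {
		int differs = ((root_mask[i / 8] ^ mask[i / 8]) >> (i % 8)) & 1;

		if (!differs)
			continue;
		delta->mask |= (uint8_t)(1u << (i - first));
		delta->value |= (uint8_t)(((key[i / 8] >> (i % 8)) & 1u)
					  << (i - first));
	}
	return true;
}
```

The three computed fields correspond to what the patch feeds into mlxsw_reg_ptce3_pack() via aentry->delta_info.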

Signed-off-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h |   8 +-
 .../mellanox/mlxsw/spectrum_acl_atcam.c   |  27 ++-
 .../mellanox/mlxsw/spectrum_acl_erp.c | 193 +-
 .../mellanox/mlxsw/spectrum_acl_tcam.h|  21 +-
 4 files changed, 237 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h 
b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index db3d2790aeec..d3babcc49fd2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -2834,8 +2834,9 @@ static inline void mlxsw_reg_ptce3_pack(char *payload, 
bool valid,
u32 priority,
const char *tcam_region_info,
const char *key, u8 erp_id,
-   bool large_exists, u32 lkey_id,
-   u32 action_pointer)
+   u16 delta_start, u8 delta_mask,
+   u8 delta_value, bool large_exists,
+   u32 lkey_id, u32 action_pointer)
 {
MLXSW_REG_ZERO(ptce3, payload);
mlxsw_reg_ptce3_v_set(payload, valid);
@@ -2844,6 +2845,9 @@ static inline void mlxsw_reg_ptce3_pack(char *payload, 
bool valid,
mlxsw_reg_ptce3_tcam_region_info_memcpy_to(payload, tcam_region_info);
mlxsw_reg_ptce3_flex2_key_blocks_memcpy_to(payload, key);
mlxsw_reg_ptce3_erp_id_set(payload, erp_id);
+   mlxsw_reg_ptce3_delta_start_set(payload, delta_start);
+   mlxsw_reg_ptce3_delta_mask_set(payload, delta_mask);
+   mlxsw_reg_ptce3_delta_value_set(payload, delta_value);
mlxsw_reg_ptce3_large_exists_set(payload, large_exists);
mlxsw_reg_ptce3_large_entry_key_id_set(payload, lkey_id);
mlxsw_reg_ptce3_action_pointer_set(payload, action_pointer);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
index 12798ce33a60..e7bd8733e58e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_atcam.c
@@ -398,6 +398,9 @@ mlxsw_sp_acl_atcam_region_entry_insert(struct mlxsw_sp 
*mlxsw_sp,
mlxsw_reg_ptce3_pack(ptce3_pl, true, MLXSW_REG_PTCE3_OP_WRITE_WRITE,
 priority, region->tcam_region_info,
 aentry->ht_key.enc_key, erp_id,
+aentry->delta_info.start,
+aentry->delta_info.mask,
+aentry->delta_info.value,
 refcount_read(&lkey_id->refcnt) != 1, lkey_id->id,
 kvdl_index);
err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(ptce3), ptce3_pl);
@@ -419,11 +422,16 @@ mlxsw_sp_acl_atcam_region_entry_remove(struct mlxsw_sp 
*mlxsw_sp,
struct mlxsw_sp_acl_atcam_lkey_id *lkey_id = aentry->lkey_id;
struct mlxsw_sp_acl_tcam_region *region = aregion->region;
u8 erp_id = mlxsw_sp_acl_erp_mask_erp_id(aentry->erp_mask);
+   char *enc_key = aentry->ht_key.enc_key;
char ptce3_pl[MLXSW_REG_PTCE3_LEN];
 
mlxsw_reg_ptce3_pack(ptce3_pl, false, MLXSW_REG_PTCE3_OP_WRITE_WRITE, 0,
-region->tcam_region_info, aentry->ht_key.enc_key,
-erp_id, refcount_read(&lkey_id->refcnt) != 1,
+region->tcam_region_info,
+enc_key, erp_id,
+aentry->delta_info.start,
+aentry->delta_info.mask,
+aentry->delta_info.value,
+refcount_read(&lkey_id->refcnt) != 1,
 lkey_id->id, 0);
mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(ptce3), ptce3_pl);
aregion->ops->lkey_id_put(aregion, lkey_id);
@@ -438,17 +446,30 @@ __mlxsw_sp_acl_atcam_entry_add(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_acl_tcam_region *region = aregion->region;
char mask[MLXSW_REG_PTCEX_FLEX_KEY_BLOCKS_LEN] = { 0 };
struct mlxsw_afk *afk = mlxsw_sp_acl_afk(mlxsw_sp->acl);
+   const struct mlxsw_sp_acl_erp_delta *delta;
struct mlxsw_sp_acl_erp_mask *erp_mask;
int err;
 
mlxsw_afk_encode(afk, region->key_info, &rulei->values,
-aentry->ht_key.enc_key, mask);
+aentry->full_enc_key, mask);
 
erp_mask = mlxsw_sp_acl_erp_mask_get(aregion, mask, false);
if (IS_ERR(erp_mask))
return PTR_ERR(erp_mask);
aentry->erp_mask = erp_mask;
aentry

[PATCH net-next 06/11] mlxsw: core_acl: Change order of args of ops->encode_block()

2018-11-14 Thread Ido Schimmel
From: Jiri Pirko 

Change the order so it is aligned with the usual convention where the
"write to" buffer comes as the first argument.

Signed-off-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c  | 4 ++--
 drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h  | 2 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.c  | 8 
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c 
b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c
index 785bf01fe2be..98c00ea9c398 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c
@@ -452,9 +452,9 @@ void mlxsw_afk_encode(struct mlxsw_afk *mlxsw_afk,
}
 
if (key)
-   mlxsw_afk->ops->encode_block(block_key, i, key);
+   mlxsw_afk->ops->encode_block(key, i, block_key);
if (mask)
-   mlxsw_afk->ops->encode_block(block_mask, i, mask);
+   mlxsw_afk->ops->encode_block(mask, i, block_mask);
}
 }
 EXPORT_SYMBOL(mlxsw_afk_encode);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h 
b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h
index c29c045d826d..6a44501d8af7 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h
@@ -188,7 +188,7 @@ struct mlxsw_afk;
 struct mlxsw_afk_ops {
const struct mlxsw_afk_block *blocks;
unsigned int blocks_count;
-   void (*encode_block)(char *block, int block_index, char *output);
+   void (*encode_block)(char *output, int block_index, char *block);
 };
 
 struct mlxsw_afk *mlxsw_afk_create(unsigned int max_blocks,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.c
index d409b09ba8df..9b93c6c3c89b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.c
@@ -98,8 +98,8 @@ static const struct mlxsw_afk_block mlxsw_sp1_afk_blocks[] = {
 
 #define MLXSW_SP1_AFK_KEY_BLOCK_SIZE 16
 
-static void mlxsw_sp1_afk_encode_block(char *block, int block_index,
-  char *output)
+static void mlxsw_sp1_afk_encode_block(char *output, int block_index,
+  char *block)
 {
unsigned int offset = block_index * MLXSW_SP1_AFK_KEY_BLOCK_SIZE;
char *output_indexed = output + offset;
@@ -263,8 +263,8 @@ static const struct mlxsw_sp2_afk_block_layout mlxsw_sp2_afk_blocks_layout[] = {
MLXSW_SP2_AFK_BLOCK_LAYOUT(block11, 0x00, 12),
 };
 
-static void mlxsw_sp2_afk_encode_block(char *block, int block_index,
-  char *output)
+static void mlxsw_sp2_afk_encode_block(char *output, int block_index,
+  char *block)
 {
u64 block_value = mlxsw_sp2_afk_block_value_get(block);
const struct mlxsw_sp2_afk_block_layout *block_layout;
-- 
2.19.1



[PATCH net-next 05/11] mlxsw: spectrum: acl: Pass key pointer to master_mask_set/clear

2018-11-14 Thread Ido Schimmel
From: Jiri Pirko 

The device requires that the master mask of each region be composed of
the logical OR of all the unmasked bits in the region. Currently, this
is just a logical OR between all the eRPs used in the region, but the
next patch is going to introduce delta bits support, which needs to be
taken into account as well.

Since the eRP does not include the delta bits, pass the key pointer to
mlxsw_sp_acl_erp_master_mask_set/clear instead. Convert key->mask to a
bitmap on the fly.

Signed-off-by: Jiri Pirko 
Signed-off-by: Ido Schimmel 
---
 .../mellanox/mlxsw/spectrum_acl_erp.c | 30 +++
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
index 52cbdf79bc18..818e03cf9add 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
@@ -176,12 +176,15 @@ mlxsw_sp_acl_erp_master_mask_update(struct mlxsw_sp_acl_erp_table *erp_table)
 
 static int
 mlxsw_sp_acl_erp_master_mask_set(struct mlxsw_sp_acl_erp_table *erp_table,
-const struct mlxsw_sp_acl_erp *erp)
+struct mlxsw_sp_acl_erp_key *key)
 {
+   DECLARE_BITMAP(mask_bitmap, MLXSW_SP_ACL_TCAM_MASK_LEN);
unsigned long bit;
int err;
 
-   for_each_set_bit(bit, erp->mask_bitmap, MLXSW_SP_ACL_TCAM_MASK_LEN)
+   bitmap_from_arr32(mask_bitmap, (u32 *) key->mask,
+ MLXSW_SP_ACL_TCAM_MASK_LEN);
+   for_each_set_bit(bit, mask_bitmap, MLXSW_SP_ACL_TCAM_MASK_LEN)
mlxsw_sp_acl_erp_master_mask_bit_set(bit,
 &erp_table->master_mask);
 
@@ -192,7 +195,7 @@ mlxsw_sp_acl_erp_master_mask_set(struct mlxsw_sp_acl_erp_table *erp_table,
return 0;
 
 err_master_mask_update:
-   for_each_set_bit(bit, erp->mask_bitmap, MLXSW_SP_ACL_TCAM_MASK_LEN)
+   for_each_set_bit(bit, mask_bitmap, MLXSW_SP_ACL_TCAM_MASK_LEN)
mlxsw_sp_acl_erp_master_mask_bit_clear(bit,
   &erp_table->master_mask);
return err;
@@ -200,12 +203,15 @@ mlxsw_sp_acl_erp_master_mask_set(struct mlxsw_sp_acl_erp_table *erp_table,
 
 static int
 mlxsw_sp_acl_erp_master_mask_clear(struct mlxsw_sp_acl_erp_table *erp_table,
-  const struct mlxsw_sp_acl_erp *erp)
+  struct mlxsw_sp_acl_erp_key *key)
 {
+   DECLARE_BITMAP(mask_bitmap, MLXSW_SP_ACL_TCAM_MASK_LEN);
unsigned long bit;
int err;
 
-   for_each_set_bit(bit, erp->mask_bitmap, MLXSW_SP_ACL_TCAM_MASK_LEN)
+   bitmap_from_arr32(mask_bitmap, (u32 *) key->mask,
+ MLXSW_SP_ACL_TCAM_MASK_LEN);
+   for_each_set_bit(bit, mask_bitmap, MLXSW_SP_ACL_TCAM_MASK_LEN)
mlxsw_sp_acl_erp_master_mask_bit_clear(bit,
   &erp_table->master_mask);
 
@@ -216,7 +222,7 @@ mlxsw_sp_acl_erp_master_mask_clear(struct mlxsw_sp_acl_erp_table *erp_table,
return 0;
 
 err_master_mask_update:
-   for_each_set_bit(bit, erp->mask_bitmap, MLXSW_SP_ACL_TCAM_MASK_LEN)
+   for_each_set_bit(bit, mask_bitmap, MLXSW_SP_ACL_TCAM_MASK_LEN)
mlxsw_sp_acl_erp_master_mask_bit_set(bit,
 &erp_table->master_mask);
return err;
@@ -238,13 +244,11 @@ mlxsw_sp_acl_erp_generic_create(struct mlxsw_sp_acl_erp_table *erp_table,
goto err_erp_id_get;
 
memcpy(&erp->key, key, sizeof(*key));
-   bitmap_from_arr32(erp->mask_bitmap, (u32 *) key->mask,
- MLXSW_SP_ACL_TCAM_MASK_LEN);
list_add(&erp->list, &erp_table->atcam_erps_list);
erp_table->num_atcam_erps++;
erp->erp_table = erp_table;
 
-   err = mlxsw_sp_acl_erp_master_mask_set(erp_table, erp);
+   err = mlxsw_sp_acl_erp_master_mask_set(erp_table, &erp->key);
if (err)
goto err_master_mask_set;
 
@@ -264,7 +268,7 @@ mlxsw_sp_acl_erp_generic_destroy(struct mlxsw_sp_acl_erp *erp)
 {
struct mlxsw_sp_acl_erp_table *erp_table = erp->erp_table;
 
-   mlxsw_sp_acl_erp_master_mask_clear(erp_table, erp);
+   mlxsw_sp_acl_erp_master_mask_clear(erp_table, &erp->key);
erp_table->num_atcam_erps--;
list_del(&erp->list);
mlxsw_sp_acl_erp_id_put(erp_table, erp->id);
@@ -672,7 +676,7 @@ __mlxsw_sp_acl_erp_ctcam_mask_create(struct mlxsw_sp_acl_erp_table *erp_table,
erp_table->num_ctcam_erps++;
erp->erp_table = erp_table;
 
-   err = mlxsw_sp_acl_erp_master_mask_set(erp_table, erp);
+   err = mlxsw_sp_acl_erp_master_mask_set(erp_table, &erp->key);
if (err)
goto err_master_mask_set;
 
@@ -686,7 +690,7 @@ __mlxsw_sp_acl_erp_ctcam

Re: [RFC PATCH 0/6] Armada 38x comphy driver to support 2.5Gbps networking

2018-11-14 Thread Kishon Vijay Abraham I
Hi,

On 12/11/18 5:59 PM, Russell King - ARM Linux wrote:
> Hi,
> 
> This series adds support for dynamically switching between 1Gbps
> and 2.5Gbps networking for the Marvell Armada 38x SoCs, tested on
> Armada 388 on the Clearfog platform.
> 
> This is necessary to be able to connect (eg) a Clearfog platform
> with a Macchiatobin platform via the SFP sockets, as Clearfog
> currently only supports 1Gbps networking via the SFP socket and
> Macchiatobin defaults to 2.5Gbps when using Fiberchannel SFPs.
> 
> In order to allow dynamic switching, we need to implement a common
> phy driver to switch the ethernet serdes lane speed - 2.5Gbps is
> just 1Gbps up-clocked by 2.5x.  We implement a simple comphy
> driver to achieve this, which only supports networking.
> 
> With this, we are able to support both Fiberchannel SFPs operating
> at 2.5Gbps or 1Gbps, and 1G ethernet SFPs plugged into the Clearfog
> platform, dynamically selecting according to the SFPs abilities.
> 
> I'm aware of the proposed changes to the PHY layer, changing
> phy_set_mode() to take the ethernet phy interface type, hence why
> this is RFC - there's also the question about how this will be
> merged.  This series is currently based on 4.20-rc1, but will
> likely need to be rebased when the PHY layer changes hit.

For this case, I'd prefer that the phy_set_mode series and the phy and net
changes here (after rebasing) go via the linux-phy tree.

Thanks
Kishon


af_xdp zero copy ideas

2018-11-14 Thread Michael S. Tsirkin
So as I mentioned during the presentation for the af_xdp zero copy I
think it's pretty important to be able to close the device and get back
the affected memory. One way would be to unmap the DMA memory from
userspace and map in some other memory. It's tricky since you need
to also replace the mapping to the backing file, which could be
hugetlbfs, tmpfs, or just a file ...

HTH,

-- 
MST