[net-next PATCH v2] bpf: devmap fix mutex in rcu critical section

2017-08-04 Thread John Fastabend
Originally we used a mutex to protect concurrent devmap update
and delete operations from racing with netdev unregister notifier
callbacks.

The notifier hook is needed because we increment the netdev ref
count when a dev is added to the devmap. This ensures the netdev
reference is valid in the datapath. However, we don't want to block
unregister events, hence the initial mutex and notifier handler.

The concern was in the notifier hook we search the map for dev
entries that hold a refcnt on the net device being torn down. But,
in order to do this we require two steps,

  (i) dereference the netdev:  dev = rcu_dereference(map[i])
 (ii) test ifindex:   dev->ifindex == removing_ifindex

and then finally we can swap in the NULL dev in the map via an
xchg operation,

  xchg(map[i], NULL)
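
Put together, the pre-patch notifier-side walk is roughly the following
(a simplified sketch of the check-then-xchg pattern, not the literal
patch hunk):

    for (i = 0; i < dtab->map.max_entries; i++) {
            struct bpf_dtab_netdev *dev;

            dev = dtab->netdev_map[i];                /* (i)  dereference   */
            if (!dev || dev->dev->ifindex != netdev->ifindex)
                    continue;                         /* (ii) ifindex test  */
            dev = xchg(&dtab->netdev_map[i], NULL);   /* unconditional swap */
            if (dev)
                    call_rcu(&dev->rcu, __dev_map_entry_free);
    }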

The danger here is that a concurrent update could run its own
xchg op, leading us to incorrectly replace the new dev with a
NULL dev.

   CPU 1                               CPU 2

   notifier hook                       bpf devmap update

   dev = rcu_dereference(map[i])
                                       dev = rcu_dereference(map[i])
                                       xchg(map[i], new_dev);
                                       call_rcu(dev, ...)
   xchg(map[i], NULL)

The above flow would create the incorrect state with the dev
reference in the update path being lost. To resolve this the
original code used a mutex around the above block. However,
updates, deletes, and lookups occur inside rcu critical sections
so we can't use a mutex in this context safely.

Fortunately, by writing slightly better code we can avoid the
mutex altogether. If CPU 1 in the above example uses a cmpxchg
and _only_ replaces the dev reference in the map when it is in
fact the expected dev, the race is removed completely. The two
cases are illustrated below, first the race condition,

   CPU 1                               CPU 2

   notifier hook                       bpf devmap update

   dev = rcu_dereference(map[i])
                                       dev = rcu_dereference(map[i])
                                       xchg(map[i], new_dev);
                                       call_rcu(dev, ...)
   odev = cmpxchg(map[i], dev, NULL)

Now we can test the cmpxchg return value, detect odev != dev and
abort. Or in the good case,

   CPU 1                               CPU 2

   notifier hook                       bpf devmap update

   dev = rcu_dereference(map[i])
   odev = cmpxchg(map[i], dev, NULL)
                                       [...]

Now 'odev == dev' and we can do proper cleanup.

And voilà, the original race we tried to solve with a mutex is
corrected and the trace noted by Sasha below is resolved due
to the removal of the mutex.
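
For reference, a condensed sketch of the notifier walk after this
change (simplified, not the literal patch hunk): the slot is read with
READ_ONCE() and only cleared if it still holds the dev we matched on,
all under the RCU read lock.

    rcu_read_lock();
    for (i = 0; i < dtab->map.max_entries; i++) {
            struct bpf_dtab_netdev *dev, *odev;

            dev = READ_ONCE(dtab->netdev_map[i]);
            if (!dev || dev->dev->ifindex != netdev->ifindex)
                    continue;
            odev = cmpxchg(&dtab->netdev_map[i], dev, NULL);
            if (odev == dev)        /* nobody raced us, safe to free */
                    call_rcu(&dev->rcu, __dev_map_entry_free);
    }
    rcu_read_unlock();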

Note: When walking the devmap and removing dev references as needed
we depend on the core to fail any calls to dev_get_by_index() using
the ifindex of the device being removed. This way we do not race with
the user while searching the devmap.

Additionally, the mutex was also protecting list add/del/read on
the list of maps in-use. This patch converts this to an RCU list
and spinlock implementation. This protects the list from concurrent
alloc/free operations. The notifier hook walks this list so it uses
RCU read semantics.
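
A minimal sketch of that list handling (the symbol names here are
illustrative): writers take a short, non-sleeping spinlock section,
while the notifier walks the list under the RCU read lock.

    static DEFINE_SPINLOCK(dev_map_lock);   /* serializes list writers only */
    static LIST_HEAD(dev_map_list);

    /* map alloc/free side: short critical section, no sleeping */
    static void dev_map_list_add(struct bpf_dtab *dtab)
    {
            spin_lock(&dev_map_lock);
            list_add_tail_rcu(&dtab->list, &dev_map_list);
            spin_unlock(&dev_map_lock);
    }

    /* notifier side: read-only walk, safe inside rcu_read_lock() */
    static void dev_map_walk_maps(void)
    {
            struct bpf_dtab *dtab;

            rcu_read_lock();
            list_for_each_entry_rcu(dtab, &dev_map_list, list) {
                    /* ... scan dtab->netdev_map[] as sketched above ... */
            }
            rcu_read_unlock();
    }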

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
in_atomic(): 1, irqs_disabled(): 0, pid: 16315, name: syz-executor1
1 lock held by syz-executor1/16315:
 #0:  (rcu_read_lock){..}, at: [] map_delete_elem 
kernel/bpf/syscall.c:577 [inline]
 #0:  (rcu_read_lock){..}, at: [] SYSC_bpf 
kernel/bpf/syscall.c:1427 [inline]
 #0:  (rcu_read_lock){..}, at: [] SyS_bpf+0x1d32/0x4ba0 
kernel/bpf/syscall.c:1388

Fixes: 2ddf71e23cc2 ("net: add notifier hooks for devmap bpf map")
Reported-by: Sasha Levin 
Signed-off-by: Daniel Borkmann 
Signed-off-by: John Fastabend 
---
 kernel/bpf/devmap.c |   48 +---
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index d439ee0..7192fb6 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -40,11 +40,12 @@
  * contain a reference to the net device and remove them. This is a two step
  * process (a) dereference the bpf_dtab_netdev object in netdev_map and (b)
  * check to see if the ifindex is the same as the net_device being removed.
- * Unfortunately, the xchg() operations do not protect against this. To avoid
- * potentially removing incorrect objects the dev_map_list_mutex protects
- * conflicting netdev unregister and BPF syscall operations. Updates and
- * deletes from a BPF program (done in rcu critical section) are blocked
- * because of this mutex.
+ * When removing the dev a cmpxchg() is used to ensure the correct dev is
+ * removed, in the case of a concurrent update or delete operation it is
+ * possible that the initially referenced dev is no 

[Patch net-next 1/2] net_sched: refactor notification code for RTM_DELTFILTER

2017-08-04 Thread Cong Wang
It is confusing to use 'unsigned long fh' as both a handle
and a pointer, especially since commit 9ee7837449b3
("net sched filters: fix notification of filter delete with proper handle").

This patch introduces tfilter_del_notify() so that we can
pass it as a pointer as before, and we don't need to check
RTM_DELTFILTER in tcf_fill_node() any more.

This prepares for the next patch.

Cc: Jamal Hadi Salim 
Cc: Jiri Pirko 
Signed-off-by: Cong Wang 
---
 net/sched/cls_api.c | 44 +++-
 1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index e655221c654e..afd099727aea 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -104,6 +104,10 @@ static int tfilter_notify(struct net *net, struct sk_buff 
*oskb,
  struct nlmsghdr *n, struct tcf_proto *tp,
  unsigned long fh, int event, bool unicast);
 
+static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
+ struct nlmsghdr *n, struct tcf_proto *tp,
+ unsigned long fh, bool unicast, bool *last);
+
 static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
 struct nlmsghdr *n,
 struct tcf_chain *chain, int event)
@@ -595,11 +599,10 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
}
break;
case RTM_DELTFILTER:
-   err = tp->ops->delete(tp, fh, &last);
+   err = tfilter_del_notify(net, skb, n, tp, fh, false,
+        &last);
if (err)
goto errout;
-   tfilter_notify(net, skb, n, tp, t->tcm_handle,
-  RTM_DELTFILTER, false);
if (last) {
tcf_chain_tp_remove(chain, &chain_info, tp);
tcf_proto_destroy(tp);
@@ -659,9 +662,9 @@ static int tcf_fill_node(struct net *net, struct sk_buff 
*skb,
goto nla_put_failure;
if (nla_put_u32(skb, TCA_CHAIN, tp->chain->index))
goto nla_put_failure;
-   tcm->tcm_handle = fh;
-   if (RTM_DELTFILTER != event) {
+   if (!fh) {
tcm->tcm_handle = 0;
+   } else {
if (tp->ops->dump && tp->ops->dump(net, tp, fh, skb, tcm) < 0)
goto nla_put_failure;
}
@@ -698,6 +701,37 @@ static int tfilter_notify(struct net *net, struct sk_buff 
*oskb,
  n->nlmsg_flags & NLM_F_ECHO);
 }
 
+static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
+ struct nlmsghdr *n, struct tcf_proto *tp,
+ unsigned long fh, bool unicast, bool *last)
+{
+   struct sk_buff *skb;
+   u32 portid = oskb ? NETLINK_CB(oskb).portid : 0;
+   int err;
+
+   skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
+   if (!skb)
+   return -ENOBUFS;
+
+   if (tcf_fill_node(net, skb, tp, fh, portid, n->nlmsg_seq,
+ n->nlmsg_flags, RTM_DELTFILTER) <= 0) {
+   kfree_skb(skb);
+   return -EINVAL;
+   }
+
+   err = tp->ops->delete(tp, fh, last);
+   if (err) {
+   kfree_skb(skb);
+   return err;
+   }
+
+   if (unicast)
+   return netlink_unicast(net->rtnl, skb, portid, MSG_DONTWAIT);
+
+   return rtnetlink_send(skb, net, portid, RTNLGRP_TC,
+ n->nlmsg_flags & NLM_F_ECHO);
+}
+
 struct tcf_dump_args {
struct tcf_walker w;
struct sk_buff *skb;
-- 
2.13.0



[Patch net-next 2/2] net_sched: use void pointer for filter handle

2017-08-04 Thread Cong Wang
Now that we use 'unsigned long fh' as a pointer in every place,
it is safe to convert it to a void pointer. This gets
rid of many casts.
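
To illustrate the kind of cast removal this enables, here is a
hypothetical classifier ->get() before and after ('example_filter' and
'example_lookup' are made-up stand-ins, not code from the hunks below):

    struct example_filter;
    static struct example_filter *example_lookup(struct tcf_proto *tp, u32 handle);

    /* before: 'unsigned long' forces casts on both sides */
    static unsigned long example_get_old(struct tcf_proto *tp, u32 handle)
    {
            struct example_filter *f = example_lookup(tp, handle);

            return (unsigned long)f;        /* caller casts this back again */
    }

    /* after: 'void *' carries the pointer directly, no cast needed */
    static void *example_get_new(struct tcf_proto *tp, u32 handle)
    {
            return example_lookup(tp, handle);
    }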

Cc: Jamal Hadi Salim 
Cc: Jiri Pirko 
Signed-off-by: Cong Wang 
---
 include/net/pkt_cls.h |  2 +-
 include/net/sch_generic.h |  8 
 net/sched/cls_api.c   | 17 -
 net/sched/cls_basic.c | 22 ++
 net/sched/cls_bpf.c   | 27 ---
 net/sched/cls_cgroup.c| 12 ++--
 net/sched/cls_flow.c  | 24 
 net/sched/cls_flower.c| 22 +++---
 net/sched/cls_fw.c| 26 +-
 net/sched/cls_matchall.c  | 16 
 net/sched/cls_route.c | 26 +-
 net/sched/cls_rsvp.h  | 24 
 net/sched/cls_tcindex.c   | 36 
 net/sched/cls_u32.c   | 30 +++---
 14 files changed, 141 insertions(+), 151 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index e0c54f111467..4667e6173fd7 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -11,7 +11,7 @@ struct tcf_walker {
int stop;
int skip;
int count;
-   int (*fn)(struct tcf_proto *, unsigned long node, struct tcf_walker 
*);
+   int (*fn)(struct tcf_proto *, void *node, struct tcf_walker *);
 };
 
 int register_tcf_proto_ops(struct tcf_proto_ops *ops);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 1c123e2b2415..e79f5ad1c5f3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -213,16 +213,16 @@ struct tcf_proto_ops {
int (*init)(struct tcf_proto*);
void(*destroy)(struct tcf_proto*);
 
-   unsigned long   (*get)(struct tcf_proto*, u32 handle);
+   void*   (*get)(struct tcf_proto*, u32 handle);
int (*change)(struct net *net, struct sk_buff *,
struct tcf_proto*, unsigned long,
u32 handle, struct nlattr **,
-   unsigned long *, bool);
-   int (*delete)(struct tcf_proto*, unsigned long, 
bool*);
+   void **, bool);
+   int (*delete)(struct tcf_proto*, void *, bool*);
void(*walk)(struct tcf_proto*, struct tcf_walker 
*arg);
 
/* rtnetlink specific */
-   int (*dump)(struct net*, struct tcf_proto*, 
unsigned long,
+   int (*dump)(struct net*, struct tcf_proto*, void *,
struct sk_buff *skb, struct tcmsg*);
 
struct module   *owner;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index afd099727aea..668afb6e9885 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -102,11 +102,11 @@ EXPORT_SYMBOL(unregister_tcf_proto_ops);
 
 static int tfilter_notify(struct net *net, struct sk_buff *oskb,
  struct nlmsghdr *n, struct tcf_proto *tp,
- unsigned long fh, int event, bool unicast);
+ void *fh, int event, bool unicast);
 
 static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
  struct nlmsghdr *n, struct tcf_proto *tp,
- unsigned long fh, bool unicast, bool *last);
+ void *fh, bool unicast, bool *last);
 
 static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
 struct nlmsghdr *n,
@@ -432,7 +432,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
struct tcf_proto *tp;
const struct Qdisc_class_ops *cops;
unsigned long cl;
-   unsigned long fh;
+   void *fh;
int err;
int tp_created;
 
@@ -571,7 +571,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
 
fh = tp->ops->get(tp, t->tcm_handle);
 
-   if (fh == 0) {
+   if (!fh) {
if (n->nlmsg_type == RTM_DELTFILTER && t->tcm_handle == 0) {
tcf_chain_tp_remove(chain, &chain_info, tp);
tfilter_notify(net, skb, n, tp, fh,
@@ -641,7 +641,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct 
nlmsghdr *n,
 }
 
 static int tcf_fill_node(struct net *net, struct sk_buff *skb,
-struct tcf_proto *tp, unsigned long fh, u32 portid,
+struct tcf_proto *tp, void *fh, u32 portid,
 u32 seq, u16 flags, int event)
 {
struct tcmsg *tcm;
@@ -679,7 +679,7 @@ static int tcf_fill_node(struct net *net, struct sk_buff 
*skb,
 
 static int 

[Patch net-next 0/2] net_sched: clean up filter handle

2017-08-04 Thread Cong Wang
This patchset has sat in my local branch for a long time; it is time to
send it out. It cleans up the ambiguous use of 'unsigned long fh';
please see each patch for details.

Cong Wang (2):
  net_sched: refactor notification code for RTM_DELTFILTER
  net_sched: use void pointer for filter handle

 include/net/pkt_cls.h |  2 +-
 include/net/sch_generic.h |  8 +++
 net/sched/cls_api.c   | 57 +--
 net/sched/cls_basic.c | 22 +-
 net/sched/cls_bpf.c   | 27 ++
 net/sched/cls_cgroup.c| 12 +-
 net/sched/cls_flow.c  | 24 ++--
 net/sched/cls_flower.c| 22 +-
 net/sched/cls_fw.c| 26 ++---
 net/sched/cls_matchall.c  | 16 ++---
 net/sched/cls_route.c | 26 ++---
 net/sched/cls_rsvp.h  | 24 ++--
 net/sched/cls_tcindex.c   | 36 +-
 net/sched/cls_u32.c   | 30 -
 14 files changed, 178 insertions(+), 154 deletions(-)

-- 
2.13.0



Re: [PATCH net-next] aquantia: Switch to use napi_gro_receive

2017-08-04 Thread David Miller
From: Pavel Belous 
Date: Thu,  3 Aug 2017 18:15:32 +0300

> From: Pavel Belous 
> 
> Add support for GRO (generic receive offload) for aQuantia Atlantic driver.
> This results in a performance improvement when GRO is enabled.
> 
> Signed-off-by: Pavel Belous 

Applied, thank you.
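
For context, the change essentially hands completed receive skbs to the
GRO layer with napi_gro_receive() instead of the plain netif_receive_skb()
path. A minimal sketch of an RX poll handler doing that (the driver
structure here is hypothetical, not the aQuantia code):

    static int example_rx_poll(struct napi_struct *napi, int budget)
    {
            struct sk_buff *skb;
            int done = 0;

            /* example_next_rx_skb() is a made-up helper standing in for
             * the driver's descriptor/ring handling.
             */
            while (done < budget && (skb = example_next_rx_skb(napi))) {
                    /* let the stack coalesce TCP segments before they
                     * reach the protocol layers
                     */
                    napi_gro_receive(napi, skb);
                    done++;
            }

            if (done < budget)
                    napi_complete_done(napi, done);

            return done;
    }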


[PATCH net-next v2] lwtunnel: replace EXPORT_SYMBOL with EXPORT_SYMBOL_GPL

2017-08-04 Thread Roopa Prabhu
From: Roopa Prabhu 

Signed-off-by: Roopa Prabhu 
---
v2 - fixed an incorrect replace

 net/core/lwtunnel.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index d9cb353..435f35f 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -65,7 +65,7 @@ struct lwtunnel_state *lwtunnel_state_alloc(int encap_len)
 
return lws;
 }
-EXPORT_SYMBOL(lwtunnel_state_alloc);
+EXPORT_SYMBOL_GPL(lwtunnel_state_alloc);
 
 static const struct lwtunnel_encap_ops __rcu *
lwtun_encaps[LWTUNNEL_ENCAP_MAX + 1] __read_mostly;
@@ -80,7 +80,7 @@ int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops 
*ops,
&lwtun_encaps[num],
NULL, ops) ? 0 : -1;
 }
-EXPORT_SYMBOL(lwtunnel_encap_add_ops);
+EXPORT_SYMBOL_GPL(lwtunnel_encap_add_ops);
 
 int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *ops,
   unsigned int encap_type)
@@ -99,7 +99,7 @@ int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops 
*ops,
 
return ret;
 }
-EXPORT_SYMBOL(lwtunnel_encap_del_ops);
+EXPORT_SYMBOL_GPL(lwtunnel_encap_del_ops);
 
 int lwtunnel_build_state(u16 encap_type,
 struct nlattr *encap, unsigned int family,
@@ -138,7 +138,7 @@ int lwtunnel_build_state(u16 encap_type,
 
return ret;
 }
-EXPORT_SYMBOL(lwtunnel_build_state);
+EXPORT_SYMBOL_GPL(lwtunnel_build_state);
 
 int lwtunnel_valid_encap_type(u16 encap_type, struct netlink_ext_ack *extack)
 {
@@ -175,7 +175,7 @@ int lwtunnel_valid_encap_type(u16 encap_type, struct 
netlink_ext_ack *extack)
 
return ret;
 }
-EXPORT_SYMBOL(lwtunnel_valid_encap_type);
+EXPORT_SYMBOL_GPL(lwtunnel_valid_encap_type);
 
 int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int remaining,
   struct netlink_ext_ack *extack)
@@ -205,7 +205,7 @@ int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int 
remaining,
 
return 0;
 }
-EXPORT_SYMBOL(lwtunnel_valid_encap_type_attr);
+EXPORT_SYMBOL_GPL(lwtunnel_valid_encap_type_attr);
 
 void lwtstate_free(struct lwtunnel_state *lws)
 {
@@ -219,7 +219,7 @@ void lwtstate_free(struct lwtunnel_state *lws)
}
module_put(ops->owner);
 }
-EXPORT_SYMBOL(lwtstate_free);
+EXPORT_SYMBOL_GPL(lwtstate_free);
 
 int lwtunnel_fill_encap(struct sk_buff *skb, struct lwtunnel_state *lwtstate)
 {
@@ -259,7 +259,7 @@ int lwtunnel_fill_encap(struct sk_buff *skb, struct 
lwtunnel_state *lwtstate)
 
return (ret == -EOPNOTSUPP ? 0 : ret);
 }
-EXPORT_SYMBOL(lwtunnel_fill_encap);
+EXPORT_SYMBOL_GPL(lwtunnel_fill_encap);
 
 int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate)
 {
@@ -281,7 +281,7 @@ int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate)
 
return ret;
 }
-EXPORT_SYMBOL(lwtunnel_get_encap_size);
+EXPORT_SYMBOL_GPL(lwtunnel_get_encap_size);
 
 int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b)
 {
@@ -309,7 +309,7 @@ int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct 
lwtunnel_state *b)
 
return ret;
 }
-EXPORT_SYMBOL(lwtunnel_cmp_encap);
+EXPORT_SYMBOL_GPL(lwtunnel_cmp_encap);
 
 int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
@@ -343,7 +343,7 @@ int lwtunnel_output(struct net *net, struct sock *sk, 
struct sk_buff *skb)
 
return ret;
 }
-EXPORT_SYMBOL(lwtunnel_output);
+EXPORT_SYMBOL_GPL(lwtunnel_output);
 
 int lwtunnel_xmit(struct sk_buff *skb)
 {
@@ -378,7 +378,7 @@ int lwtunnel_xmit(struct sk_buff *skb)
 
return ret;
 }
-EXPORT_SYMBOL(lwtunnel_xmit);
+EXPORT_SYMBOL_GPL(lwtunnel_xmit);
 
 int lwtunnel_input(struct sk_buff *skb)
 {
@@ -412,4 +412,4 @@ int lwtunnel_input(struct sk_buff *skb)
 
return ret;
 }
-EXPORT_SYMBOL(lwtunnel_input);
+EXPORT_SYMBOL_GPL(lwtunnel_input);
-- 
2.1.4



Re: [PATCH] of_mdio: use of_property_read_u32_array()

2017-08-04 Thread Andrew Lunn
On Sat, Aug 05, 2017 at 12:43:43AM +0300, Sergei Shtylyov wrote:
> The "fixed-link" prop support predated of_property_read_u32_array(), so
> basically had to open-code it. Using the modern API saves 24 bytes of the
> object code (ARM gcc 4.8.5); the only behavior change would be that the
> prop length check is now less strict (however the strict pre-check done
> in of_phy_is_fixed_link() is left intact anyway)...
> 
> Signed-off-by: Sergei Shtylyov 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH net-next 1/3] net: dsa: assign switch device in legacy code

2017-08-04 Thread Andrew Lunn
> @@ -251,8 +251,9 @@ dsa_switch_setup(struct dsa_switch_tree *dst, struct 
> net_device *master,
>   ds->cd = cd;
>   ds->ops = ops;
>   ds->priv = priv;
> + ds->dev = parent;

Hi Vivien

Is this even needed? dsa_switch_alloc() does ds->dev = dev.

   Andrew


Re: [PATCH net] bpf: fix byte order test in test_verifier

2017-08-04 Thread David Miller
From: Daniel Borkmann 
Date: Fri,  4 Aug 2017 22:24:41 +0200

> We really must check with #if __BYTE_ORDER == XYZ instead of
> just presence of #ifdef __LITTLE_ENDIAN. I noticed that when
> actually running this on big endian machine, the latter test
> resolves to true for user space, same for #ifdef __BIG_ENDIAN.
> 
> E.g., looking at endian.h from libc, both are also defined
> there, so we really must test this against __BYTE_ORDER instead
> for proper insns selection. For the kernel, such checks are
> fine though e.g. see 13da9e200fe4 ("Revert "endian: #define
> __BYTE_ORDER"") and 415586c9e6d3 ("UAPI: fix endianness conditionals
> in M32R's asm/stat.h") for some more context, but not for
> user space. Let's also make sure to properly include endian.h.
> After that, suite passes for me:
> 
> ./test_verifier: ELF 64-bit MSB executable, [...]
> 
> Linux foo 4.13.0-rc3+ #4 SMP Fri Aug 4 06:59:30 EDT 2017 s390x s390x s390x 
> GNU/Linux
> 
> Before fix: Summary: 505 PASSED, 11 FAILED
> After  fix: Summary: 516 PASSED,  0 FAILED
> 
> Fixes: 18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx 
> field loads")
> Signed-off-by: Daniel Borkmann 

Applied and queued up for -stable, thanks!
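
For readers unfamiliar with the pitfall described above: userspace
endian.h defines both __LITTLE_ENDIAN and __BIG_ENDIAN unconditionally,
so only a comparison against __BYTE_ORDER selects the right branch.
A minimal illustration (not part of the patch):

    #include <endian.h>
    #include <stdio.h>

    int main(void)
    {
            /* Both macros below are always defined in userspace, so an
             * #ifdef test is meaningless; only the comparison works.
             */
    #if __BYTE_ORDER == __LITTLE_ENDIAN
            puts("built for little endian");
    #else
            puts("built for big endian");
    #endif
            return 0;
    }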


[PATCH net-next v4 1/2] bpf: add support for sys_enter_* and sys_exit_* tracepoints

2017-08-04 Thread Yonghong Song
Currently, bpf programs cannot be attached to sys_enter_* and sys_exit_*
style tracepoints. The iovisor/bcc issue #748
(https://github.com/iovisor/bcc/issues/748) documents this issue.
For example, if you try to attach a bpf program to tracepoints
syscalls/sys_enter_newfstat, you will get the following error:
   # ./tools/trace.py t:syscalls:sys_enter_newfstat
   Ioctl(PERF_EVENT_IOC_SET_BPF): Invalid argument
   Failed to attach BPF to tracepoint

The main reason is that syscalls/sys_enter_* and syscalls/sys_exit_*
tracepoints are treated differently from other tracepoints and there
is no bpf hook for them.

This patch adds bpf support for these syscalls tracepoints by
  . permitting bpf attachment in ioctl PERF_EVENT_IOC_SET_BPF
  . calling bpf programs in perf_syscall_enter and perf_syscall_exit

The legality of bpf program ctx access is also checked:
trace_event_get_offsets() returns the correct max offset for each
specific syscall tracepoint, which is compared against the maximum
ctx offset accessed by the bpf program.

Signed-off-by: Yonghong Song 
---
 include/linux/syscalls.h  | 12 ++
 kernel/events/core.c  | 10 
 kernel/trace/trace_syscalls.c | 53 +--
 3 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 3cb15ea..c917021 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -172,8 +172,20 @@ extern struct trace_event_functions 
exit_syscall_print_funcs;
static struct syscall_metadata __used   \
  __attribute__((section("__syscalls_metadata")))   \
 *__p_syscall_meta_##sname = &__syscall_meta_##sname;
+
+static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
+{
+   return tp_event->class == &event_class_syscall_enter ||
+  tp_event->class == &event_class_syscall_exit;
+}
+
 #else
 #define SYSCALL_METADATA(sname, nb, ...)
+
+static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
+{
+   return 0;
+}
 #endif
 
 #define SYSCALL_DEFINE0(sname) \
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 426c2ff..a7a6c1d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8050,7 +8050,7 @@ static void perf_event_free_bpf_handler(struct perf_event 
*event)
 
 static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
 {
-   bool is_kprobe, is_tracepoint;
+   bool is_kprobe, is_tracepoint, is_syscall_tp;
struct bpf_prog *prog;
 
if (event->attr.type != PERF_TYPE_TRACEPOINT)
@@ -8061,7 +8061,8 @@ static int perf_event_set_bpf_prog(struct perf_event 
*event, u32 prog_fd)
 
is_kprobe = event->tp_event->flags & TRACE_EVENT_FL_UKPROBE;
is_tracepoint = event->tp_event->flags & TRACE_EVENT_FL_TRACEPOINT;
-   if (!is_kprobe && !is_tracepoint)
+   is_syscall_tp = is_syscall_trace_event(event->tp_event);
+   if (!is_kprobe && !is_tracepoint && !is_syscall_tp)
/* bpf programs can only be attached to u/kprobe or tracepoint 
*/
return -EINVAL;
 
@@ -8070,13 +8071,14 @@ static int perf_event_set_bpf_prog(struct perf_event 
*event, u32 prog_fd)
return PTR_ERR(prog);
 
if ((is_kprobe && prog->type != BPF_PROG_TYPE_KPROBE) ||
-   (is_tracepoint && prog->type != BPF_PROG_TYPE_TRACEPOINT)) {
+   (is_tracepoint && prog->type != BPF_PROG_TYPE_TRACEPOINT) ||
+   (is_syscall_tp && prog->type != BPF_PROG_TYPE_TRACEPOINT)) {
/* valid fd, but invalid bpf program type */
bpf_prog_put(prog);
return -EINVAL;
}
 
-   if (is_tracepoint) {
+   if (is_tracepoint || is_syscall_tp) {
int off = trace_event_get_offsets(event->tp_event);
 
if (prog->aux->max_ctx_offset > off) {
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 5e10395..7a1a920 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -559,11 +559,29 @@ static DECLARE_BITMAP(enabled_perf_exit_syscalls, 
NR_syscalls);
 static int sys_perf_refcount_enter;
 static int sys_perf_refcount_exit;
 
+static int perf_call_bpf_enter(struct bpf_prog *prog, struct pt_regs *regs,
+ struct syscall_metadata *sys_data,
+ struct syscall_trace_enter *rec) {
+   struct syscall_tp_t {
+   unsigned long long regs;
+   unsigned long syscall_nr;
+   unsigned long args[sys_data->nb_args];
+   } param;
+   int i;
+
+   *(struct pt_regs **)&param = regs;
+   param.syscall_nr = rec->nr;
+   for (i = 0; i < sys_data->nb_args; i++)
+   param.args[i] = rec->args[i];
+   return trace_call_bpf(prog, &param);
+}
+
 static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 {

[PATCH net-next v4 2/2] bpf: add a test case for syscalls/sys_{enter|exit}_* tracepoints

2017-08-04 Thread Yonghong Song
Signed-off-by: Yonghong Song 
---
 samples/bpf/Makefile  |  4 +++
 samples/bpf/syscall_tp_kern.c | 62 +
 samples/bpf/syscall_tp_user.c | 71 +++
 3 files changed, 137 insertions(+)
 create mode 100644 samples/bpf/syscall_tp_kern.c
 create mode 100644 samples/bpf/syscall_tp_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 770d46c..f1010fe 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -39,6 +39,7 @@ hostprogs-y += per_socket_stats_example
 hostprogs-y += load_sock_ops
 hostprogs-y += xdp_redirect
 hostprogs-y += xdp_redirect_map
+hostprogs-y += syscall_tp
 
 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o
@@ -82,6 +83,7 @@ test_map_in_map-objs := bpf_load.o $(LIBBPF) 
test_map_in_map_user.o
 per_socket_stats_example-objs := $(LIBBPF) cookie_uid_helper_example.o
 xdp_redirect-objs := bpf_load.o $(LIBBPF) xdp_redirect_user.o
 xdp_redirect_map-objs := bpf_load.o $(LIBBPF) xdp_redirect_map_user.o
+syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -125,6 +127,7 @@ always += tcp_iw_kern.o
 always += tcp_clamp_kern.o
 always += xdp_redirect_kern.o
 always += xdp_redirect_map_kern.o
+always += syscall_tp_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
@@ -163,6 +166,7 @@ HOSTLOADLIBES_xdp_tx_iptunnel += -lelf
 HOSTLOADLIBES_test_map_in_map += -lelf
 HOSTLOADLIBES_xdp_redirect += -lelf
 HOSTLOADLIBES_xdp_redirect_map += -lelf
+HOSTLOADLIBES_syscall_tp += -lelf
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on 
cmdline:
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc 
CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/syscall_tp_kern.c b/samples/bpf/syscall_tp_kern.c
new file mode 100644
index 000..9149c52
--- /dev/null
+++ b/samples/bpf/syscall_tp_kern.c
@@ -0,0 +1,62 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include 
+#include "bpf_helpers.h"
+
+struct syscalls_enter_open_args {
+   unsigned long long unused;
+   long syscall_nr;
+   long filename_ptr;
+   long flags;
+   long mode;
+};
+
+struct syscalls_exit_open_args {
+   unsigned long long unused;
+   long syscall_nr;
+   long ret;
+};
+
+struct bpf_map_def SEC("maps") enter_open_map = {
+   .type = BPF_MAP_TYPE_ARRAY,
+   .key_size = sizeof(u32),
+   .value_size = sizeof(u32),
+   .max_entries = 1,
+};
+
+struct bpf_map_def SEC("maps") exit_open_map = {
+   .type = BPF_MAP_TYPE_ARRAY,
+   .key_size = sizeof(u32),
+   .value_size = sizeof(u32),
+   .max_entries = 1,
+};
+
+static __always_inline void count(void *map)
+{
+   u32 key = 0;
+   u32 *value, init_val = 1;
+
+   value = bpf_map_lookup_elem(map, &key);
+   if (value)
+   *value += 1;
+   else
+   bpf_map_update_elem(map, &key, &init_val, BPF_NOEXIST);
+}
+
+SEC("tracepoint/syscalls/sys_enter_open")
+int trace_enter_open(struct syscalls_enter_open_args *ctx)
+{
+   count((void *)&enter_open_map);
+   return 0;
+}
+
+SEC("tracepoint/syscalls/sys_exit_open")
+int trace_enter_exit(struct syscalls_exit_open_args *ctx)
+{
+   count((void *)&exit_open_map);
+   return 0;
+}
diff --git a/samples/bpf/syscall_tp_user.c b/samples/bpf/syscall_tp_user.c
new file mode 100644
index 000..a3cb91e
--- /dev/null
+++ b/samples/bpf/syscall_tp_user.c
@@ -0,0 +1,71 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "libbpf.h"
+#include "bpf_load.h"
+
+/* This program verifies bpf attachment to tracepoint sys_enter_* and 
sys_exit_*.
+ * This requires kernel CONFIG_FTRACE_SYSCALLS to be set.
+ */
+
+static void verify_map(int map_id)
+{
+   __u32 key = 0;
+   __u32 val;
+
+   if (bpf_map_lookup_elem(map_id, &key, &val) != 0) {
+   fprintf(stderr, "map_lookup failed: %s\n", strerror(errno));
+   return;
+   }
+   if (val == 0)
+   fprintf(stderr, "failed: map #%d returns value 0\n", map_id);
+}
+
+int main(int argc, char **argv)
+{
+   struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+   char filename[256];
+   int fd;
+
+   setrlimit(RLIMIT_MEMLOCK, &r);
+   snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+   if (load_bpf_file(filename)) {
+   fprintf(stderr, "%s", bpf_log_buf);
+   return 1;
+ 

[PATCH net-next v4 0/2] bpf: add support for sys_{enter|exit}_* tracepoints

2017-08-04 Thread Yonghong Song
Currently, bpf programs cannot be attached to sys_enter_* and sys_exit_*
style tracepoints. The main reason is that syscalls/sys_enter_* and 
syscalls/sys_exit_*
tracepoints are treated differently from other tracepoints and there
is no bpf hook for them.

This patch set adds bpf support for these syscalls tracepoints and also
adds a test case for it.

Changelogs:
v3 -> v4:
 - Check the legality of ctx offset access for syscall tracepoint as well.
   trace_event_get_offsets will return correct max offset for each
   specific syscall tracepoint.
 - Use a variable length array to avoid hardcoding 6 as the maximum
   number of arguments beyond syscall_nr.
v2 -> v3:
 - Fix a build issue
v1 -> v2:
 - Do not use TRACE_EVENT_FL_CAP_ANY to identify syscall tracepoint.
   Instead use trace_event_call->class.

Yonghong Song (2):
  bpf: add support for sys_enter_* and sys_exit_* tracepoints
  bpf: add a test case for syscalls/sys_{enter|exit}_* tracepoints

 include/linux/syscalls.h  | 12 
 kernel/events/core.c  | 10 +++---
 kernel/trace/trace_syscalls.c | 53 ++--
 samples/bpf/Makefile  |  4 +++
 samples/bpf/syscall_tp_kern.c | 62 +
 samples/bpf/syscall_tp_user.c | 71 +++
 6 files changed, 206 insertions(+), 6 deletions(-)
 create mode 100644 samples/bpf/syscall_tp_kern.c
 create mode 100644 samples/bpf/syscall_tp_user.c

-- 
2.9.4



Re: [PATCH net-next] lwtunnel: replace EXPORT_SYMBOL with EXPORT_SYMBOL_GPL

2017-08-04 Thread Roopa Prabhu
On Fri, Aug 4, 2017 at 3:25 PM, Andrew Lunn  wrote:
> On Fri, Aug 04, 2017 at 03:23:37PM -0700, Roopa Prabhu wrote:
>> From: Roopa Prabhu 
>>
>> Signed-off-by: Roopa Prabhu 
>> ---
>>  net/core/lwtunnel.c | 26 +-
>>  1 file changed, 13 insertions(+), 13 deletions(-)
>>
>> diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
>> index d9cb353..8693ff8 100644
>> --- a/net/core/lwtunnel.c
>> +++ b/net/core/lwtunnel.c
>> @@ -65,7 +65,7 @@ struct lwtunnel_state *lwtunnel_state_alloc(int encap_len)
>>
>>   return lws;
>>  }
>> -EXPORT_SYMBOL(lwtunnel_state_alloc);
>> +EXPORT_SYMBOL_GPL_GPL(lwtunnel_state_alloc);
>
> Hi Roopa
>
> GPL_GPL?
>

oops, i had fixed it but sent the wrong version. thx


Re: uapi: MAX_ADDR_LEN vs. numeric 32

2017-08-04 Thread Mikko Rapeli
On Sat, Aug 05, 2017 at 01:25:19AM +0300, Dmitry V. Levin wrote:
>
> On Sat, Aug 05, 2017 at 12:33:25AM +0300, Mikko Rapeli wrote:
> > 
> > I find using MAX_ADDR_LEN better than numeric 32, though I doubt this will
> > change any time soon. Would you mind if I change packet_diag.h and
> > if_link.h to use that instead and fix the userspace compilation
> > problems by including netdevice.h?
> 
> The alternative fix, that is, to include <linux/netdevice.h>
> which pulls in other headers and a lot of definitions with them,
> has been mentioned in the discussion, too.
> We decided that the fix that was applied would be the least of all evils.

Ok, that's fine then. I'll drop my netdevice.h inclusion patch.

Thanks,

-Mikko


[PATCH net-next 2/3] net: dsa: remove useless args of dsa_cpu_dsa_setup

2017-08-04 Thread Vivien Didelot
dsa_cpu_dsa_setup currently takes 4 arguments but they are all available
from the dsa_port argument. Remove all others.

Signed-off-by: Vivien Didelot 
---
 net/dsa/dsa.c  | 10 +-
 net/dsa/dsa2.c |  4 ++--
 net/dsa/dsa_priv.h |  3 +--
 net/dsa/legacy.c   |  4 +---
 4 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 0ba842c08dd3..64db6eece3c1 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -67,17 +67,17 @@ const struct dsa_device_ops *dsa_device_ops[DSA_TAG_LAST] = 
{
[DSA_TAG_PROTO_NONE] = _ops,
 };
 
-int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
- struct dsa_port *dport, int port)
+int dsa_cpu_dsa_setup(struct dsa_port *port)
 {
-   struct device_node *port_dn = dport->dn;
+   struct device_node *port_dn = port->dn;
+   struct dsa_switch *ds = port->ds;
struct phy_device *phydev;
int ret, mode;
 
if (of_phy_is_fixed_link(port_dn)) {
ret = of_phy_register_fixed_link(port_dn);
if (ret) {
-   dev_err(dev, "failed to register fixed PHY\n");
+   dev_err(ds->dev, "failed to register fixed PHY\n");
return ret;
}
phydev = of_phy_find_device(port_dn);
@@ -90,7 +90,7 @@ int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device 
*dev,
genphy_config_init(phydev);
genphy_read_status(phydev);
if (ds->ops->adjust_link)
-   ds->ops->adjust_link(ds, port, phydev);
+   ds->ops->adjust_link(ds, port->index, phydev);
 
put_device(&phydev->mdio.dev);
}
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index c442051d5a55..2a0120493cf1 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -219,7 +219,7 @@ static int dsa_dsa_port_apply(struct dsa_port *port)
struct dsa_switch *ds = port->ds;
int err;
 
-   err = dsa_cpu_dsa_setup(ds, ds->dev, port, port->index);
+   err = dsa_cpu_dsa_setup(port);
if (err) {
dev_warn(ds->dev, "Failed to setup dsa port %d: %d\n",
 port->index, err);
@@ -243,7 +243,7 @@ static int dsa_cpu_port_apply(struct dsa_port *port)
struct dsa_switch *ds = port->ds;
int err;
 
-   err = dsa_cpu_dsa_setup(ds, ds->dev, port, port->index);
+   err = dsa_cpu_dsa_setup(port);
if (err) {
dev_warn(ds->dev, "Failed to setup cpu port %d: %d\n",
 port->index, err);
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 7aa0656296c2..46851c91c7fe 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -101,8 +101,7 @@ struct dsa_slave_priv {
 };
 
 /* dsa.c */
-int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
- struct dsa_port *dport, int port);
+int dsa_cpu_dsa_setup(struct dsa_port *port);
 void dsa_cpu_dsa_destroy(struct dsa_port *dport);
 const struct dsa_device_ops *dsa_resolve_tag_protocol(int tag_protocol);
 int dsa_cpu_port_ethtool_setup(struct dsa_port *cpu_dp);
diff --git a/net/dsa/legacy.c b/net/dsa/legacy.c
index c565787e1c78..05be0bc10735 100644
--- a/net/dsa/legacy.c
+++ b/net/dsa/legacy.c
@@ -80,15 +80,13 @@ dsa_switch_probe(struct device *parent, struct device 
*host_dev, int sw_addr,
 /* basic switch operations **/
 static int dsa_cpu_dsa_setups(struct dsa_switch *ds)
 {
-   struct dsa_port *dport;
int ret, port;
 
for (port = 0; port < ds->num_ports; port++) {
if (!(dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port)))
continue;
 
-   dport = &ds->ports[port];
-   ret = dsa_cpu_dsa_setup(ds, ds->dev, dport, port);
+   ret = dsa_cpu_dsa_setup(&ds->ports[port]);
if (ret)
return ret;
}
-- 
2.13.3



Re: [PATCH net] bpf: fix byte order test in test_verifier

2017-08-04 Thread Yonghong Song



On 8/4/17 1:24 PM, Daniel Borkmann wrote:

We really must check with #if __BYTE_ORDER == XYZ instead of
just presence of #ifdef __LITTLE_ENDIAN. I noticed that when
actually running this on big endian machine, the latter test
resolves to true for user space, same for #ifdef __BIG_ENDIAN.

E.g., looking at endian.h from libc, both are also defined
there, so we really must test this against __BYTE_ORDER instead
for proper insns selection. For the kernel, such checks are
fine though e.g. see 13da9e200fe4 ("Revert "endian: #define
__BYTE_ORDER"") and 415586c9e6d3 ("UAPI: fix endianness conditionals
in M32R's asm/stat.h") for some more context, but not for
user space. Let's also make sure to properly include endian.h.
After that, suite passes for me:

./test_verifier: ELF 64-bit MSB executable, [...]

Linux foo 4.13.0-rc3+ #4 SMP Fri Aug 4 06:59:30 EDT 2017 s390x s390x s390x 
GNU/Linux

Before fix: Summary: 505 PASSED, 11 FAILED
After  fix: Summary: 516 PASSED,  0 FAILED

Fixes: 18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx field 
loads")
Signed-off-by: Daniel Borkmann 
---
  tools/testing/selftests/bpf/test_verifier.c | 19 ++-
  1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index addea82..d3ed732 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -8,6 +8,7 @@
   * License as published by the Free Software Foundation.
   */
  
+#include <endian.h>

  #include 
  #include 
  #include 
@@ -1098,7 +1099,7 @@ struct test_val {
"check skb->hash byte load permitted",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
-#ifdef __LITTLE_ENDIAN
+#if __BYTE_ORDER == __LITTLE_ENDIAN
BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
offsetof(struct __sk_buff, hash)),
  #else
@@ -1135,7 +1136,7 @@ struct test_val {
"check skb->hash byte load not permitted 3",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
-#ifdef __LITTLE_ENDIAN
+#if __BYTE_ORDER == __LITTLE_ENDIAN
BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
offsetof(struct __sk_buff, hash) + 3),
  #else
@@ -1244,7 +1245,7 @@ struct test_val {
"check skb->hash half load permitted",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
-#ifdef __LITTLE_ENDIAN
+#if __BYTE_ORDER == __LITTLE_ENDIAN
BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
offsetof(struct __sk_buff, hash)),
  #else
@@ -1259,7 +1260,7 @@ struct test_val {
"check skb->hash half load not permitted",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
-#ifdef __LITTLE_ENDIAN
+#if __BYTE_ORDER == __LITTLE_ENDIAN
BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
offsetof(struct __sk_buff, hash) + 2),
  #else
@@ -5422,7 +5423,7 @@ struct test_val {
"check bpf_perf_event_data->sample_period byte load permitted",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
-#ifdef __LITTLE_ENDIAN
+#if __BYTE_ORDER == __LITTLE_ENDIAN
BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
offsetof(struct bpf_perf_event_data, 
sample_period)),
  #else
@@ -5438,7 +5439,7 @@ struct test_val {
"check bpf_perf_event_data->sample_period half load permitted",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
-#ifdef __LITTLE_ENDIAN
+#if __BYTE_ORDER == __LITTLE_ENDIAN
BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
offsetof(struct bpf_perf_event_data, 
sample_period)),
  #else
@@ -5454,7 +5455,7 @@ struct test_val {
"check bpf_perf_event_data->sample_period word load permitted",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
-#ifdef __LITTLE_ENDIAN
+#if __BYTE_ORDER == __LITTLE_ENDIAN
BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
offsetof(struct bpf_perf_event_data, 
sample_period)),
  #else
@@ -5481,7 +5482,7 @@ struct test_val {
"check skb->data half load not permitted",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
-#ifdef __LITTLE_ENDIAN
+#if __BYTE_ORDER == __LITTLE_ENDIAN
BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
offsetof(struct __sk_buff, data)),
  #else
@@ -5497,7 +5498,7 @@ struct test_val {
"check skb->tc_classid half load not permitted for lwt prog",
.insns = {

[PATCH net-next 0/3] net: dsa: remove unnecessary arguments

2017-08-04 Thread Vivien Didelot
Several DSA core functions take many arguments, mostly because the
legacy code does not assign ds->dev. This patch series assigns ds->dev
in legacy and removes the unnecessary arguments of these functions,
where either the dsa_switch or dsa_port argument is enough.

Vivien Didelot (3):
  net: dsa: assign switch device in legacy code
  net: dsa: remove useless args of dsa_cpu_dsa_setup
  net: dsa: remove useless args of dsa_slave_create

 net/dsa/dsa.c  | 10 +-
 net/dsa/dsa2.c |  6 +++---
 net/dsa/dsa_priv.h |  6 ++
 net/dsa/legacy.c   | 19 +--
 net/dsa/slave.c| 14 +++---
 5 files changed, 26 insertions(+), 29 deletions(-)

-- 
2.13.3



[PATCH net-next 1/3] net: dsa: assign switch device in legacy code

2017-08-04 Thread Vivien Didelot
Assign the parent device to the dev member of the newly allocated
dsa_switch structure in the legacy dsa_switch_setup function, so that
the underlying dsa_switch_setup_one and dsa_cpu_dsa_setups functions can
access it instead of requiring an additional struct device argument.

Signed-off-by: Vivien Didelot 
---
 net/dsa/legacy.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/net/dsa/legacy.c b/net/dsa/legacy.c
index 1d7a3282f2a7..c565787e1c78 100644
--- a/net/dsa/legacy.c
+++ b/net/dsa/legacy.c
@@ -78,7 +78,7 @@ dsa_switch_probe(struct device *parent, struct device 
*host_dev, int sw_addr,
 }
 
 /* basic switch operations **/
-static int dsa_cpu_dsa_setups(struct dsa_switch *ds, struct device *dev)
+static int dsa_cpu_dsa_setups(struct dsa_switch *ds)
 {
struct dsa_port *dport;
int ret, port;
@@ -88,15 +88,15 @@ static int dsa_cpu_dsa_setups(struct dsa_switch *ds, struct 
device *dev)
continue;
 
dport = &ds->ports[port];
-   ret = dsa_cpu_dsa_setup(ds, dev, dport, port);
+   ret = dsa_cpu_dsa_setup(ds, ds->dev, dport, port);
if (ret)
return ret;
}
return 0;
 }
 
-static int dsa_switch_setup_one(struct dsa_switch *ds, struct net_device 
*master,
-   struct device *parent)
+static int dsa_switch_setup_one(struct dsa_switch *ds,
+   struct net_device *master)
 {
const struct dsa_switch_ops *ops = ds->ops;
struct dsa_switch_tree *dst = ds->dst;
@@ -176,7 +176,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, 
struct net_device *master
}
 
if (!ds->slave_mii_bus && ops->phy_read) {
-   ds->slave_mii_bus = devm_mdiobus_alloc(parent);
+   ds->slave_mii_bus = devm_mdiobus_alloc(ds->dev);
if (!ds->slave_mii_bus)
return -ENOMEM;
dsa_slave_mii_bus_init(ds);
@@ -196,14 +196,14 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, 
struct net_device *master
if (!(ds->enabled_port_mask & (1 << i)))
continue;
 
-   ret = dsa_slave_create(ds, parent, i, cd->port_names[i]);
+   ret = dsa_slave_create(ds, ds->dev, i, cd->port_names[i]);
if (ret < 0)
netdev_err(master, "[%d]: can't create dsa slave device 
for port %d(%s): %d\n",
   index, i, cd->port_names[i], ret);
}
 
/* Perform configuration of the CPU and DSA ports */
-   ret = dsa_cpu_dsa_setups(ds, parent);
+   ret = dsa_cpu_dsa_setups(ds);
if (ret < 0)
netdev_err(master, "[%d] : can't configure CPU and DSA ports\n",
   index);
@@ -251,8 +251,9 @@ dsa_switch_setup(struct dsa_switch_tree *dst, struct 
net_device *master,
ds->cd = cd;
ds->ops = ops;
ds->priv = priv;
+   ds->dev = parent;
 
-   ret = dsa_switch_setup_one(ds, master, parent);
+   ret = dsa_switch_setup_one(ds, master);
if (ret)
return ERR_PTR(ret);
 
-- 
2.13.3



[PATCH net-next 3/3] net: dsa: remove useless args of dsa_slave_create

2017-08-04 Thread Vivien Didelot
dsa_slave_create currently takes 4 arguments while it only needs the
related dsa_port and its name. Remove all other arguments.

Signed-off-by: Vivien Didelot 
---
 net/dsa/dsa2.c |  2 +-
 net/dsa/dsa_priv.h |  3 +--
 net/dsa/legacy.c   |  2 +-
 net/dsa/slave.c| 14 +++---
 4 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 2a0120493cf1..cceaa4dd9f53 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -275,7 +275,7 @@ static int dsa_user_port_apply(struct dsa_port *port)
if (!name)
name = "eth%d";
 
-   err = dsa_slave_create(ds, ds->dev, port->index, name);
+   err = dsa_slave_create(port, name);
if (err) {
dev_warn(ds->dev, "Failed to create slave %d: %d\n",
 port->index, err);
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 46851c91c7fe..73426f9c2cca 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -148,8 +148,7 @@ int dsa_port_vlan_dump(struct dsa_port *dp,
 extern const struct dsa_device_ops notag_netdev_ops;
 void dsa_slave_mii_bus_init(struct dsa_switch *ds);
 void dsa_cpu_port_ethtool_init(struct ethtool_ops *ops);
-int dsa_slave_create(struct dsa_switch *ds, struct device *parent,
-int port, const char *name);
+int dsa_slave_create(struct dsa_port *port, const char *name);
 void dsa_slave_destroy(struct net_device *slave_dev);
 int dsa_slave_suspend(struct net_device *slave_dev);
 int dsa_slave_resume(struct net_device *slave_dev);
diff --git a/net/dsa/legacy.c b/net/dsa/legacy.c
index 05be0bc10735..db7c1eaad078 100644
--- a/net/dsa/legacy.c
+++ b/net/dsa/legacy.c
@@ -194,7 +194,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds,
if (!(ds->enabled_port_mask & (1 << i)))
continue;
 
-   ret = dsa_slave_create(ds, ds->dev, i, cd->port_names[i]);
+   ret = dsa_slave_create(&ds->ports[i], cd->port_names[i]);
if (ret < 0)
netdev_err(master, "[%d]: can't create dsa slave device 
for port %d(%s): %d\n",
   index, i, cd->port_names[i], ret);
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index e196562035b1..c8eb33746850 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1165,9 +1165,9 @@ int dsa_slave_resume(struct net_device *slave_dev)
return 0;
 }
 
-int dsa_slave_create(struct dsa_switch *ds, struct device *parent,
-int port, const char *name)
+int dsa_slave_create(struct dsa_port *port, const char *name)
 {
+   struct dsa_switch *ds = port->ds;
struct dsa_switch_tree *dst = ds->dst;
struct net_device *master;
struct net_device *slave_dev;
@@ -1197,13 +1197,13 @@ int dsa_slave_create(struct dsa_switch *ds, struct 
device *parent,
netdev_for_each_tx_queue(slave_dev, dsa_slave_set_lockdep_class_one,
 NULL);
 
-   SET_NETDEV_DEV(slave_dev, parent);
-   slave_dev->dev.of_node = ds->ports[port].dn;
+   SET_NETDEV_DEV(slave_dev, port->ds->dev);
+   slave_dev->dev.of_node = port->dn;
slave_dev->vlan_features = master->vlan_features;
 
p = netdev_priv(slave_dev);
u64_stats_init(&p->stats64.syncp);
-   p->dp = &ds->ports[port];
+   p->dp = port;
INIT_LIST_HEAD(>mall_tc_list);
p->xmit = dst->tag_ops->xmit;
 
@@ -1211,12 +1211,12 @@ int dsa_slave_create(struct dsa_switch *ds, struct 
device *parent,
p->old_link = -1;
p->old_duplex = -1;
 
-   ds->ports[port].netdev = slave_dev;
+   port->netdev = slave_dev;
ret = register_netdev(slave_dev);
if (ret) {
netdev_err(master, "error %d registering interface %s\n",
   ret, slave_dev->name);
-   ds->ports[port].netdev = NULL;
+   port->netdev = NULL;
free_netdev(slave_dev);
return ret;
}
-- 
2.13.3



Re: [PATCH net-next] lwtunnel: replace EXPORT_SYMBOL with EXPORT_SYMBOL_GPL

2017-08-04 Thread Andrew Lunn
On Fri, Aug 04, 2017 at 03:23:37PM -0700, Roopa Prabhu wrote:
> From: Roopa Prabhu 
> 
> Signed-off-by: Roopa Prabhu 
> ---
>  net/core/lwtunnel.c | 26 +-
>  1 file changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
> index d9cb353..8693ff8 100644
> --- a/net/core/lwtunnel.c
> +++ b/net/core/lwtunnel.c
> @@ -65,7 +65,7 @@ struct lwtunnel_state *lwtunnel_state_alloc(int encap_len)
>  
>   return lws;
>  }
> -EXPORT_SYMBOL(lwtunnel_state_alloc);
> +EXPORT_SYMBOL_GPL_GPL(lwtunnel_state_alloc);

Hi Roopa

GPL_GPL?

Andrew


Re: uapi: MAX_ADDR_LEN vs. numeric 32

2017-08-04 Thread Dmitry V. Levin
Hi,

On Sat, Aug 05, 2017 at 12:33:25AM +0300, Mikko Rapeli wrote:
> First, thanks Dmitry for fixing several uapi compilation problems in
> user space. I got a bit demotivated

That's quite understandable.

> about the slow review progress, e.g.
> no feedback whatsoever on some of the patches, but let's try again...
> 
> I rebased my tree now and saw
> 
> commit 745cb7f8a5de0805cade3de3991b7a95317c7c73
> Author: Dmitry V. Levin 
> Date:   Tue Mar 7 23:50:50 2017 +0300
> 
> uapi: fix linux/packet_diag.h userspace compilation error
> 
> which does:
> 
> --- a/include/uapi/linux/packet_diag.h
> +++ b/include/uapi/linux/packet_diag.h
> @@ -64,7 +64,7 @@ struct packet_diag_mclist {
> __u32   pdmc_count;
> __u16   pdmc_type;
> __u16   pdmc_alen;
> -   __u8    pdmc_addr[MAX_ADDR_LEN];
> +   __u8    pdmc_addr[32]; /* MAX_ADDR_LEN */
>  };
>  
>  struct packet_diag_ring {
> 
> In my tree I had fixed that case with:
> 
> --- a/include/uapi/linux/packet_diag.h
> +++ b/include/uapi/linux/packet_diag.h
> @@ -2,6 +2,7 @@
>  #define __PACKET_DIAG_H__
>  
>  #include <linux/types.h>
> +#include <linux/netdevice.h>
>  
>  struct packet_diag_req {
> __u8    sdiag_family;
> 
> since netdevice.h has the definition also in user space
> 
> #define MAX_ADDR_LEN32  /* Largest hardware address length */
> 
> I find using MAX_ADDR_LEN better than numeric 32, though I doubt this will
> change any time soon. Would you mind if I change packet_diag.h and
> if_link.h to use that instead and fix the userspace compilation
> problems by including netdevice.h?

The alternative fix, that is, to include <linux/netdevice.h>
which pulls in other headers and a lot of definitions with them,
has been mentioned in the discussion, too.
We decided that the fix that was applied would be the least of all evils.


-- 
ldv


signature.asc
Description: PGP signature


[PATCH net-next] lwtunnel: replace EXPORT_SYMBOL with EXPORT_SYMBOL_GPL

2017-08-04 Thread Roopa Prabhu
From: Roopa Prabhu 

Signed-off-by: Roopa Prabhu 
---
 net/core/lwtunnel.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index d9cb353..8693ff8 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -65,7 +65,7 @@ struct lwtunnel_state *lwtunnel_state_alloc(int encap_len)
 
return lws;
 }
-EXPORT_SYMBOL(lwtunnel_state_alloc);
+EXPORT_SYMBOL_GPL_GPL(lwtunnel_state_alloc);
 
 static const struct lwtunnel_encap_ops __rcu *
lwtun_encaps[LWTUNNEL_ENCAP_MAX + 1] __read_mostly;
@@ -80,7 +80,7 @@ int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops 
*ops,
&lwtun_encaps[num],
NULL, ops) ? 0 : -1;
 }
-EXPORT_SYMBOL(lwtunnel_encap_add_ops);
+EXPORT_SYMBOL_GPL(lwtunnel_encap_add_ops);
 
 int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *ops,
   unsigned int encap_type)
@@ -99,7 +99,7 @@ int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops 
*ops,
 
return ret;
 }
-EXPORT_SYMBOL(lwtunnel_encap_del_ops);
+EXPORT_SYMBOL_GPL(lwtunnel_encap_del_ops);
 
 int lwtunnel_build_state(u16 encap_type,
 struct nlattr *encap, unsigned int family,
@@ -138,7 +138,7 @@ int lwtunnel_build_state(u16 encap_type,
 
return ret;
 }
-EXPORT_SYMBOL(lwtunnel_build_state);
+EXPORT_SYMBOL_GPL(lwtunnel_build_state);
 
 int lwtunnel_valid_encap_type(u16 encap_type, struct netlink_ext_ack *extack)
 {
@@ -175,7 +175,7 @@ int lwtunnel_valid_encap_type(u16 encap_type, struct 
netlink_ext_ack *extack)
 
return ret;
 }
-EXPORT_SYMBOL(lwtunnel_valid_encap_type);
+EXPORT_SYMBOL_GPL(lwtunnel_valid_encap_type);
 
 int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int remaining,
   struct netlink_ext_ack *extack)
@@ -205,7 +205,7 @@ int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int 
remaining,
 
return 0;
 }
-EXPORT_SYMBOL(lwtunnel_valid_encap_type_attr);
+EXPORT_SYMBOL_GPL(lwtunnel_valid_encap_type_attr);
 
 void lwtstate_free(struct lwtunnel_state *lws)
 {
@@ -219,7 +219,7 @@ void lwtstate_free(struct lwtunnel_state *lws)
}
module_put(ops->owner);
 }
-EXPORT_SYMBOL(lwtstate_free);
+EXPORT_SYMBOL_GPL(lwtstate_free);
 
 int lwtunnel_fill_encap(struct sk_buff *skb, struct lwtunnel_state *lwtstate)
 {
@@ -259,7 +259,7 @@ int lwtunnel_fill_encap(struct sk_buff *skb, struct 
lwtunnel_state *lwtstate)
 
return (ret == -EOPNOTSUPP ? 0 : ret);
 }
-EXPORT_SYMBOL(lwtunnel_fill_encap);
+EXPORT_SYMBOL_GPL(lwtunnel_fill_encap);
 
 int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate)
 {
@@ -281,7 +281,7 @@ int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate)
 
return ret;
 }
-EXPORT_SYMBOL(lwtunnel_get_encap_size);
+EXPORT_SYMBOL_GPL(lwtunnel_get_encap_size);
 
 int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b)
 {
@@ -309,7 +309,7 @@ int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct 
lwtunnel_state *b)
 
return ret;
 }
-EXPORT_SYMBOL(lwtunnel_cmp_encap);
+EXPORT_SYMBOL_GPL(lwtunnel_cmp_encap);
 
 int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
@@ -343,7 +343,7 @@ int lwtunnel_output(struct net *net, struct sock *sk, 
struct sk_buff *skb)
 
return ret;
 }
-EXPORT_SYMBOL(lwtunnel_output);
+EXPORT_SYMBOL_GPL(lwtunnel_output);
 
 int lwtunnel_xmit(struct sk_buff *skb)
 {
@@ -378,7 +378,7 @@ int lwtunnel_xmit(struct sk_buff *skb)
 
return ret;
 }
-EXPORT_SYMBOL(lwtunnel_xmit);
+EXPORT_SYMBOL_GPL(lwtunnel_xmit);
 
 int lwtunnel_input(struct sk_buff *skb)
 {
@@ -412,4 +412,4 @@ int lwtunnel_input(struct sk_buff *skb)
 
return ret;
 }
-EXPORT_SYMBOL(lwtunnel_input);
+EXPORT_SYMBOL_GPL(lwtunnel_input);
-- 
2.1.4



Re: [PATCH v5] ss: Enclose IPv6 address in brackets

2017-08-04 Thread Stephen Hemminger
On Fri, 04 Aug 2017 13:46:27 -0700
Eric Dumazet  wrote:

> On Fri, 2017-08-04 at 12:05 -0700, Stephen Hemminger wrote:
> > On Fri, 4 Aug 2017 20:02:52 +0200
> > Florian Lehner  wrote:
> >   
> > > diff --git a/misc/ss.c b/misc/ss.c
> > > index 12763c9..83683b5 100644
> > > --- a/misc/ss.c
> > > +++ b/misc/ss.c
> > > @@ -1046,8 +1046,9 @@ do_numeric:
> > > 
> > >  static void inet_addr_print(const inet_prefix *a, int port, unsigned
> > > int ifindex)
> > >  {  
> > 
> > Your email client is wrapping long lines which leads to malformed patch.
> > 
> > You didn't need buf2, and the code was more complex than it needed to be.
> > 
> > Rather than waiting for yet another version, I just merged in similar
> > code.  
> 
> 
> Also, is this new format accepted for the filter ?
> 
> ss src [::1]:22

It looks like ss has always done that (since the earliest git version).


Re: [net-next PATCH] net: devmap fix mutex in rcu critical section

2017-08-04 Thread John Fastabend
On 08/04/2017 02:21 PM, John Fastabend wrote:
> Originally we used a mutex to protect concurrent devmap update
> and delete operations from racing with netdev unregister notifier
> callbacks.
> 

[...]

>  }
> @@ -396,22 +385,20 @@ static int dev_map_notification(struct notifier_block 
> *notifier,
>  

Daniel reminds me this is not in a rcu_read_lock/unlock() section as
needed, so v2 on its way. Thanks!

>   switch (event) {
>   case NETDEV_UNREGISTER:
> - mutex_lock(&dev_map_list_mutex);

rcu_read_lock();

>   list_for_each_entry(dtab, &dev_map_list, list) {
>   for (i = 0; i < dtab->map.max_entries; i++) {
> - struct bpf_dtab_netdev *dev;
> + struct bpf_dtab_netdev *dev, *odev;
>  
> - dev = dtab->netdev_map[i];
> + dev = READ_ONCE(dtab->netdev_map[i]);
>   if (!dev ||
>   dev->dev->ifindex != netdev->ifindex)
>   continue;
> - dev = xchg(&dtab->netdev_map[i], NULL);
> - if (dev)
> + odev = cmpxchg(&dtab->netdev_map[i], dev, NULL);
> + if (dev == odev)
>   call_rcu(&dev->rcu,
>__dev_map_entry_free);
>   }
>   }

rcu_read_unlock();

> - mutex_unlock(&dev_map_list_mutex);
>   break;
>   default:
>   break;
> 



uapi: MAX_ADDR_LEN vs. numeric 32

2017-08-04 Thread Mikko Rapeli
Hi,

First, thanks Dmitry for fixing several uapi compilation problems in
user space. I got a bit demotivated about the slow review progress, e.g.
no feedback whatsoever on some of the patches, but let's try again...

I rebased my tree now and saw

commit 745cb7f8a5de0805cade3de3991b7a95317c7c73
Author: Dmitry V. Levin 
Date:   Tue Mar 7 23:50:50 2017 +0300

uapi: fix linux/packet_diag.h userspace compilation error

which does:

--- a/include/uapi/linux/packet_diag.h
+++ b/include/uapi/linux/packet_diag.h
@@ -64,7 +64,7 @@ struct packet_diag_mclist {
__u32   pdmc_count;
__u16   pdmc_type;
__u16   pdmc_alen;
-   __u8    pdmc_addr[MAX_ADDR_LEN];
+   __u8    pdmc_addr[32]; /* MAX_ADDR_LEN */
 };
 
 struct packet_diag_ring {

In my tree I had fixed that case with:

--- a/include/uapi/linux/packet_diag.h
+++ b/include/uapi/linux/packet_diag.h
@@ -2,6 +2,7 @@
 #define __PACKET_DIAG_H__
 
 #include <linux/types.h>
+#include <linux/netdevice.h>
 
 struct packet_diag_req {
__u8    sdiag_family;

since netdevice.h has the definition also in user space

#define MAX_ADDR_LEN32  /* Largest hardware address length */

I find using MAX_ADDR_LEN better than numeric 32, though I doubt this will
change any time soon. Would you mind if I change packet_diag.h and
if_link.h to use that instead and fix the userspace compilation
problems by including netdevice.h?

Thanks,

-Mikko


Re: [PATCH] PCI: Update ACS quirk for more Intel 10G NICs

2017-08-04 Thread Roland Dreier
> I think the conclusion is that a hard-wired ACS capability is a
> positive indication of isolation for a multifunction device, the code
> is intended to support this and appears to do so, and Roland was going
> to investigate the sightings that inspired this patch in more detail.
> Dropping for now is appropriate.  Thanks,

That's right.  I confirmed that the issue we found was due to another PCI quirk.

It may make sense to add more 82599 variants to the table, but the X540 and X550
work without a quirk.

Sorry for the noise.

 - R.


[PATCH] of_mdio: use of_property_read_u32_array()

2017-08-04 Thread Sergei Shtylyov
The "fixed-link" prop support predated of_property_read_u32_array(), so
it basically had to be open-coded. Using the modern API saves 24 bytes of the
object code (ARM gcc 4.8.5); the only behavior change would be that the
prop length check is now less strict (however the strict pre-check done
in of_phy_is_fixed_link() is left intact anyway)...

Signed-off-by: Sergei Shtylyov 

---
The patch is against the 'dt/next' branch of Rob Herring's 'linux-git' repo
plus the previously posted patch killing the useless local variable in
of_phy_register_fixed_link().

 drivers/of/of_mdio.c |   16 
 1 file changed, 8 insertions(+), 8 deletions(-)

Index: linux/drivers/of/of_mdio.c
===
--- linux.orig/drivers/of/of_mdio.c
+++ linux/drivers/of/of_mdio.c
@@ -421,10 +421,10 @@ int of_phy_register_fixed_link(struct de
 {
struct fixed_phy_status status = {};
struct device_node *fixed_link_node;
-   const __be32 *fixed_link_prop;
+   u32 fixed_link_prop[5];
struct phy_device *phy;
const char *managed;
-   int link_gpio, len;
+   int link_gpio;
 
	if (of_property_read_string(np, "managed", &managed) == 0) {
if (strcmp(managed, "in-band-status") == 0) {
@@ -459,13 +459,13 @@ int of_phy_register_fixed_link(struct de
}
 
/* Old binding */
-   fixed_link_prop = of_get_property(np, "fixed-link", &len);
-   if (fixed_link_prop && len == (5 * sizeof(__be32))) {
+   if (of_property_read_u32_array(np, "fixed-link", fixed_link_prop,
+  ARRAY_SIZE(fixed_link_prop)) == 0) {
status.link = 1;
-   status.duplex = be32_to_cpu(fixed_link_prop[1]);
-   status.speed = be32_to_cpu(fixed_link_prop[2]);
-   status.pause = be32_to_cpu(fixed_link_prop[3]);
-   status.asym_pause = be32_to_cpu(fixed_link_prop[4]);
+   status.duplex = fixed_link_prop[1];
+   status.speed  = fixed_link_prop[2];
+   status.pause  = fixed_link_prop[3];
+   status.asym_pause = fixed_link_prop[4];
	phy = fixed_phy_register(PHY_POLL, &status, -1, np);
return PTR_ERR_OR_ZERO(phy);
}



[net-next PATCH] net: devmap fix mutex in rcu critical section

2017-08-04 Thread John Fastabend
Originally we used a mutex to protect concurrent devmap update
and delete operations from racing with netdev unregister notifier
callbacks.

The notifier hook is needed because we increment the netdev ref
count when a dev is added to the devmap. This ensures the netdev
reference is valid in the datapath. However, we don't want to block
unregister events, hence the initial mutex and notifier handler.

The concern was in the notifier hook we search the map for dev
entries that hold a refcnt on the net device being torn down. But,
in order to do this we require two steps,

  (i) dereference the netdev:  dev = rcu_dereference(map[i])
 (ii) test ifindex:   dev->ifindex == removing_ifindex

and then finally we can swap in the NULL dev in the map via an
xchg operation,

  xchg(map[i], NULL)

The danger here is a concurrent update could run a different
xchg op concurrently leading us to replace the new dev with a
NULL dev incorrectly.

  CPU 1CPU 2

   notifier hook   bpf devmap update

   dev = rcu_dereference(map[i])
   dev = rcu_dereference(map[i])
   xchg(map[i]), new_dev);
   rcu_call(dev,...)
   xchg(map[i], NULL)

The above flow would create the incorrect state with the dev
reference in the update path being lost. To resolve this the
original code used a mutex around the above block. However,
updates, deletes, and lookups occur inside rcu critical sections
so we can't use a mutex in this context safely.

Fortunately, by writing slightly better code we can avoid the
mutex altogether. If CPU 1 in the above example uses a cmpxchg
and _only_ replaces the dev reference in the map when it is in
fact the expected dev the race is removed completely. The two
cases being illustrated here, first the race condition,

  CPU 1  CPU 2

   notifier hook bpf devmap update

   dev = rcu_dereference(map[i])
 dev = rcu_dereference(map[i])
 xchg(map[i]), new_dev);
 rcu_call(dev,...)
   odev = cmpxchg(map[i], dev, NULL)

Now we can test the cmpxchg return value, detect odev != dev and
abort. Or in the good case,

  CPU 1  CPU 2

   notifier hook bpf devmap update
   dev = rcu_dereference(map[i])
   odev = cmpxchg(map[i], dev, NULL)
 [...]

Now 'odev == dev' and we can do proper cleanup.
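
In code, the notifier-side removal then follows this pattern (sketch only,
using the devmap-internal names from the patch below; dtab walks dev_map_list
and netdev is the device being unregistered):

	for (i = 0; i < dtab->map.max_entries; i++) {
		struct bpf_dtab_netdev *dev, *odev;

		/* (i) dereference and (ii) test the ifindex */
		dev = READ_ONCE(dtab->netdev_map[i]);
		if (!dev || dev->dev->ifindex != netdev->ifindex)
			continue;

		/* only clear the slot if it still holds the dev we tested */
		odev = cmpxchg(&dtab->netdev_map[i], dev, NULL);
		if (dev == odev)
			call_rcu(&dev->rcu, __dev_map_entry_free);
	}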

And voila, the original race we tried to solve with a mutex is
corrected and the trace noted by Sasha below is resolved due
to removal of the mutex.

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
in_atomic(): 1, irqs_disabled(): 0, pid: 16315, name: syz-executor1
1 lock held by syz-executor1/16315:
 #0:  (rcu_read_lock){..}, at: [] map_delete_elem 
kernel/bpf/syscall.c:577 [inline]
 #0:  (rcu_read_lock){..}, at: [] SYSC_bpf 
kernel/bpf/syscall.c:1427 [inline]
 #0:  (rcu_read_lock){..}, at: [] SyS_bpf+0x1d32/0x4ba0 
kernel/bpf/syscall.c:1388

Fixes: 2ddf71e23cc2 ("net: add notifier hooks for devmap bpf map")
Reported-by: Sasha Levin 
Signed-off-by: Daniel Borkmann 
Signed-off-by: John Fastabend 
---
 kernel/bpf/devmap.c |   27 +++
 1 file changed, 7 insertions(+), 20 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index d439ee0..087f7230 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -40,11 +40,9 @@
  * contain a reference to the net device and remove them. This is a two step
  * process (a) dereference the bpf_dtab_netdev object in netdev_map and (b)
  * check to see if the ifindex is the same as the net_device being removed.
- * Unfortunately, the xchg() operations do not protect against this. To avoid
- * potentially removing incorrect objects the dev_map_list_mutex protects
- * conflicting netdev unregister and BPF syscall operations. Updates and
- * deletes from a BPF program (done in rcu critical section) are blocked
- * because of this mutex.
+ * When removing the dev a cmpxchg() is used to ensure the correct dev is
+ * removed, in the case of a concurrent update or delete operation it is
+ * possible that the initially referenced dev is no longer in the map.
  */
 #include 
 #include 
@@ -68,7 +66,6 @@ struct bpf_dtab {
struct list_head list;
 };
 
-static DEFINE_MUTEX(dev_map_list_mutex);
 static LIST_HEAD(dev_map_list);
 
 static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
@@ -128,9 +125,7 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
if (!dtab->netdev_map)
goto free_dtab;
 
-   mutex_lock(&dev_map_list_mutex);
	list_add_tail(&dtab->list, &dev_map_list);
-   mutex_unlock(&dev_map_list_mutex);
	return &dtab->map;
 
 free_dtab:
@@ 

Re: [PATCH v8 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-08-04 Thread Casey Leedom
| From: Ding Tianhong 
| Sent: Thursday, August 3, 2017 6:44 AM
|
| diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
| index 6967c6b..1e1cdbe 100644
| --- a/drivers/pci/quirks.c
| +++ b/drivers/pci/quirks.c
| @@ -4016,6 +4016,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
|quirk_tw686x_class);
|
|  /*
| + * Some devices have problems with Transaction Layer Packets with the Relaxed
| + * Ordering Attribute set.  Such devices should mark themselves and other
| + * Device Drivers should check before sending TLPs with RO set.
| + */
| +static void quirk_relaxedordering_disable(struct pci_dev *dev)
| +{
| +   dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
| +}
| +
| +/*
| + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
| + * cause performance problems with Upstream Transaction Layer Packets with
| + * Relaxed Ordering set.
| + */
| +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f02, 
PCI_CLASS_NOT_DEFINED, 8,
| + quirk_relaxedordering_disable);
| +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f04, 
PCI_CLASS_NOT_DEFINED, 8,
| + quirk_relaxedordering_disable);
| +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f08, 
PCI_CLASS_NOT_DEFINED, 8,
| + quirk_relaxedordering_disable);
| + ...

It looks like this is missing the set of Root Complex IDs that were noted in
the document to which Patrick Cramer sent us a reference:

https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf

In section 3.9.1 we have:

3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
  and Toward MMIO Regions (P2P)

In order to maximize performance for PCIe devices in the processors
listed in Table 3-6 below, the software should determine whether the
accesses are toward coherent memory (system memory) or toward MMIO
regions (P2P access to other devices). If the access is toward MMIO
region, then software can command HW to set the RO bit in the TLP
header, as this would allow hardware to achieve maximum throughput for
these types of accesses. For accesses toward coherent memory, software
can command HW to clear the RO bit in the TLP header (no RO), as this
would allow hardware to achieve maximum throughput for these types of
accesses.

Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
   PCIe Performance

Processor                            CPU RP Device IDs

Intel Xeon processors based on   6F01H-6F0EH
Broadwell microarchitecture

Intel Xeon processors based on   2F01H-2F0EH
Haswell microarchitecture

The PCI Device IDs you have there are the first ones that I guessed at
having the performance problem with Relaxed Ordering.  We now apparently
have a complete list from Intel.

I don't want to phrase this as a "NAK" because you've gone around the
mulberry bush a bunch of times already.  So maybe just go with what you've
got in version 8 of your patch and then do a follow-on patch to complete the
table?
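
For reference, such a follow-on would presumably just extend the table added
above with the remaining Root Port IDs from Table 3-6, e.g. (illustrative
sketch only, not a tested patch; the full set runs 0x6f01-0x6f0e and
0x2f01-0x2f0e):

/* Additional Broadwell/Haswell root ports from Table 3-6, reusing the
 * quirk_relaxedordering_disable() hook introduced by the patch above.
 */
DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f01, PCI_CLASS_NOT_DEFINED, 8,
			      quirk_relaxedordering_disable);
DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x2f01, PCI_CLASS_NOT_DEFINED, 8,
			      quirk_relaxedordering_disable);
/* ... and so on through 0x6f0e and 0x2f0e ... */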

Casey

Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-08-04 Thread Casey Leedom
| From: Raj, Ashok 
| Sent: Friday, August 4, 2017 1:21 PM
|
| On Fri, Aug 04, 2017 at 08:20:37PM +, Casey Leedom wrote:
| > ...
| > As I've noted a number of times, it would be great if the Intel Hardware
| > Engineers who attempted to implement the Relaxed Ordering semantics in the
| > current generation of Root Complexes had left the ability to turn off the
| > logic which is obviously not working.  If there was a way to disable the
| > logic via an undocumented register, then we could have the Linux PCI Quirk
| > do that.  Since Relaxed Ordering is just a hint, it's completely legitimate
| > to completely ignore it.
|
| Suppose you are looking for the existence of a chicken bit to instruct the
| port to ignore RO traffic. So all we would do is turn the chicken bit on
| but would permit p2p traffic to be allowed since we won't turn off the
| capability as currently proposed.
|
| Let me look into that and keep you posted.

Huh, I'd never heard it called a "chicken bit" before, but yes, that's what
I'm talking about.

Whenever our Hardware Designers implement new functionality in our hardware,
they almost always put in A. several "knobs" which can control fundamental
parameters of the new Hardware Feature, and B.  a mechanism of completely
disabling it if necessary.  This stems from the incredibly long Design ->
Deployment cycle for Hardware (as opposed to the edit->compile->run cycle for software)!

It's obvious that handling Relaxed Ordering is a new Hardware Feature for
Intel's Root Complexes since previous versions simply ignored it (because
that's legal[1]).  If I was a Hardware Engineer tasked with implementing
Relaxed Ordering semantics for a device, I would certainly have also
implemented a switch to turn it off in case there were unintended
consequences (performance in this case).

And if there is such a mechanism to simply disable processing of Relaxed
Ordering semantics in the Root Complex, that would be a far easier "fix" for
this problem ... and leave the code in place to continue sending Relaxed
Ordering TLPs for a future Root Complex implementation which got it right ...

Casey

[1] One can't ~quite~ just ignore the Relaxed Ordering Attribute on an
incoming Transaction Layer Packet Request: PCIe completion rules (see
section 2.2.9 of the PCIe 3.0 specification) require that the Relaxed
Ordering and No Snoop Attributes in a Request TLP be reflected back
verbatim in any corresponding Response TLP.  (The rules for ID-Based
Ordering are more complex.)


Re: [PATCH v5] ss: Enclose IPv6 address in brackets

2017-08-04 Thread Eric Dumazet
On Fri, 2017-08-04 at 12:05 -0700, Stephen Hemminger wrote:
> On Fri, 4 Aug 2017 20:02:52 +0200
> Florian Lehner  wrote:
> 
> > diff --git a/misc/ss.c b/misc/ss.c
> > index 12763c9..83683b5 100644
> > --- a/misc/ss.c
> > +++ b/misc/ss.c
> > @@ -1046,8 +1046,9 @@ do_numeric:
> > 
> >  static void inet_addr_print(const inet_prefix *a, int port, unsigned
> > int ifindex)
> >  {
> 
> Your email client is wrapping long lines which leads to malformed patch.
> 
> You didn't need buf2, and the code was more complex than it needed to be.
> 
> Rather than waiting for yet another version, I just merged in similar
> code.


Also, is this new format accepted for the filter ?

ss src [::1]:22






Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-08-04 Thread Raj, Ashok
On Fri, Aug 04, 2017 at 08:20:37PM +, Casey Leedom wrote:
> | From: Raj, Ashok 
> | Sent: Thursday, August 3, 2017 1:31 AM
> |
> | I don't understand this completely.. So your driver would know not to send
> | RO TLP's to root complex. But you want to send RO to the NVMe device? This
> | is the peer-2-peer case correct?
> 
> Yes, this is the "heavy hammer" issue which you alluded to later.  There are
> applications where a device will want to send TLPs to a Root Complex without
> Relaxed Ordering set, but will want to use it when sending TLPs to a Peer
> device (say, an NVMe storage device).  The current approach doesn't make
> that easy ... and in fact, I still don't know how to code a solution for this
> with the proposed APIs.  This means that we may be trading off one
> performance problem for another and that Relaxed Ordering may be doomed for
> use under Linux for the foreseeable future.
> 
> As I've noted a number of times, it would be great if the Intel Hardware
> Engineers who attempted to implement the Relaxed Ordering semantics in the
> current generation of Root Complexes had left the ability to turn off the
> logic which is obviously not working.  If there was a way to disable the
> logic via an undocumented register, then we could have the Linux PCI Quirk
> do that.  Since Relaxed Ordering is just a hint, it's completely legitimate
> to completely ignore it.

Suppose you are looking for the existence of a chicken bit to instruct the
port to ignore RO traffic. So all we would do is turn the chicken bit on
but would permit p2p traffic to be allowed since we won't turn off the
capability as currently proposed.

Let me look into that and keep you posted.

Cheers,
Ashok
> 
> Casey


[PATCH net] bpf: fix byte order test in test_verifier

2017-08-04 Thread Daniel Borkmann
We really must check with #if __BYTE_ORDER == XYZ instead of
just the presence of #ifdef __LITTLE_ENDIAN. I noticed that when
actually running this on a big endian machine, the latter test
resolves to true for user space, same for #ifdef __BIG_ENDIAN.

E.g., looking at endian.h from libc, both are also defined
there, so we really must test this against __BYTE_ORDER instead
for proper insns selection. For the kernel, such checks are
fine though e.g. see 13da9e200fe4 ("Revert "endian: #define
__BYTE_ORDER"") and 415586c9e6d3 ("UAPI: fix endianness conditionals
in M32R's asm/stat.h") for some more context, but not for
user space. Let's also make sure to properly include endian.h.
After that, suite passes for me:

./test_verifier: ELF 64-bit MSB executable, [...]

Linux foo 4.13.0-rc3+ #4 SMP Fri Aug 4 06:59:30 EDT 2017 s390x s390x s390x 
GNU/Linux

Before fix: Summary: 505 PASSED, 11 FAILED
After  fix: Summary: 516 PASSED,  0 FAILED
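
(For anyone who wants to see the endian.h behaviour directly, a trivial
userspace check like the sketch below shows that both __LITTLE_ENDIAN and
__BIG_ENDIAN are defined regardless of the host byte order, so only the
__BYTE_ORDER comparison selects correctly. Illustrative only.)

#include <endian.h>
#include <stdio.h>

int main(void)
{
#ifdef __LITTLE_ENDIAN
	printf("__LITTLE_ENDIAN defined as %d\n", __LITTLE_ENDIAN);
#endif
#ifdef __BIG_ENDIAN
	printf("__BIG_ENDIAN defined as %d\n", __BIG_ENDIAN);
#endif
#if __BYTE_ORDER == __LITTLE_ENDIAN
	printf("host byte order: little endian\n");
#else
	printf("host byte order: big endian\n");
#endif
	return 0;
}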

Fixes: 18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx field 
loads")
Signed-off-by: Daniel Borkmann 
---
 tools/testing/selftests/bpf/test_verifier.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index addea82..d3ed732 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -8,6 +8,7 @@
  * License as published by the Free Software Foundation.
  */
 
+#include <endian.h>
 #include 
 #include 
 #include 
@@ -1098,7 +1099,7 @@ struct test_val {
"check skb->hash byte load permitted",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
-#ifdef __LITTLE_ENDIAN
+#if __BYTE_ORDER == __LITTLE_ENDIAN
BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
offsetof(struct __sk_buff, hash)),
 #else
@@ -1135,7 +1136,7 @@ struct test_val {
"check skb->hash byte load not permitted 3",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
-#ifdef __LITTLE_ENDIAN
+#if __BYTE_ORDER == __LITTLE_ENDIAN
BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
offsetof(struct __sk_buff, hash) + 3),
 #else
@@ -1244,7 +1245,7 @@ struct test_val {
"check skb->hash half load permitted",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
-#ifdef __LITTLE_ENDIAN
+#if __BYTE_ORDER == __LITTLE_ENDIAN
BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
offsetof(struct __sk_buff, hash)),
 #else
@@ -1259,7 +1260,7 @@ struct test_val {
"check skb->hash half load not permitted",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
-#ifdef __LITTLE_ENDIAN
+#if __BYTE_ORDER == __LITTLE_ENDIAN
BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
offsetof(struct __sk_buff, hash) + 2),
 #else
@@ -5422,7 +5423,7 @@ struct test_val {
"check bpf_perf_event_data->sample_period byte load permitted",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
-#ifdef __LITTLE_ENDIAN
+#if __BYTE_ORDER == __LITTLE_ENDIAN
BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
offsetof(struct bpf_perf_event_data, 
sample_period)),
 #else
@@ -5438,7 +5439,7 @@ struct test_val {
"check bpf_perf_event_data->sample_period half load permitted",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
-#ifdef __LITTLE_ENDIAN
+#if __BYTE_ORDER == __LITTLE_ENDIAN
BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
offsetof(struct bpf_perf_event_data, 
sample_period)),
 #else
@@ -5454,7 +5455,7 @@ struct test_val {
"check bpf_perf_event_data->sample_period word load permitted",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
-#ifdef __LITTLE_ENDIAN
+#if __BYTE_ORDER == __LITTLE_ENDIAN
BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
offsetof(struct bpf_perf_event_data, 
sample_period)),
 #else
@@ -5481,7 +5482,7 @@ struct test_val {
"check skb->data half load not permitted",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
-#ifdef __LITTLE_ENDIAN
+#if __BYTE_ORDER == __LITTLE_ENDIAN
BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
offsetof(struct __sk_buff, data)),
 #else
@@ -5497,7 +5498,7 @@ struct test_val {
"check skb->tc_classid half load not permitted for lwt prog",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
-#ifdef __LITTLE_ENDIAN
+#if __BYTE_ORDER == 

[PATCH] wan: dscc4: add checks for dma mapping errors

2017-08-04 Thread Alexey Khoroshilov
The driver does not check whether mapping DMA memory succeeds.
The patch adds the checks and failure handling.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov 
---
 drivers/net/wan/dscc4.c | 52 +++--
 1 file changed, 37 insertions(+), 15 deletions(-)

diff --git a/drivers/net/wan/dscc4.c b/drivers/net/wan/dscc4.c
index 799830ffcae2..1a94f0a95b2c 100644
--- a/drivers/net/wan/dscc4.c
+++ b/drivers/net/wan/dscc4.c
@@ -522,19 +522,27 @@ static inline int try_get_rx_skb(struct dscc4_dev_priv 
*dpriv,
struct RxFD *rx_fd = dpriv->rx_fd + dirty;
const int len = RX_MAX(HDLC_MAX_MRU);
struct sk_buff *skb;
-   int ret = 0;
+   dma_addr_t addr;
 
skb = dev_alloc_skb(len);
dpriv->rx_skbuff[dirty] = skb;
-   if (skb) {
-   skb->protocol = hdlc_type_trans(skb, dev);
-   rx_fd->data = cpu_to_le32(pci_map_single(dpriv->pci_priv->pdev,
- skb->data, len, PCI_DMA_FROMDEVICE));
-   } else {
-   rx_fd->data = 0;
-   ret = -1;
-   }
-   return ret;
+   if (!skb)
+   goto err_out;
+
+   skb->protocol = hdlc_type_trans(skb, dev);
+   addr = pci_map_single(dpriv->pci_priv->pdev,
+ skb->data, len, PCI_DMA_FROMDEVICE);
+   if (pci_dma_mapping_error(dpriv->pci_priv->pdev, addr))
+   goto err_free_skb;
+
+   rx_fd->data = cpu_to_le32(addr);
+   return 0;
+
+err_free_skb:
+   dev_kfree_skb_any(skb);
+err_out:
+   rx_fd->data = 0;
+   return -1;
 }
 
 /*
@@ -1147,14 +1155,22 @@ static netdev_tx_t dscc4_start_xmit(struct sk_buff *skb,
struct dscc4_dev_priv *dpriv = dscc4_priv(dev);
struct dscc4_pci_priv *ppriv = dpriv->pci_priv;
struct TxFD *tx_fd;
+   dma_addr_t addr;
int next;
 
+   addr = pci_map_single(ppriv->pdev, skb->data, skb->len,
+ PCI_DMA_TODEVICE);
+   if (pci_dma_mapping_error(ppriv->pdev, addr)) {
+   dev_kfree_skb_any(skb);
+   dev->stats.tx_errors++;
+   return NETDEV_TX_OK;
+   }
+
next = dpriv->tx_current%TX_RING_SIZE;
dpriv->tx_skbuff[next] = skb;
tx_fd = dpriv->tx_fd + next;
tx_fd->state = FrameEnd | TO_STATE_TX(skb->len);
-   tx_fd->data = cpu_to_le32(pci_map_single(ppriv->pdev, skb->data, 
skb->len,
-PCI_DMA_TODEVICE));
+   tx_fd->data = cpu_to_le32(addr);
	tx_fd->complete = 0x00000000;
tx_fd->jiffies = jiffies;
mb();
@@ -1889,14 +1905,20 @@ static struct sk_buff *dscc4_init_dummy_skb(struct 
dscc4_dev_priv *dpriv)
if (skb) {
int last = dpriv->tx_dirty%TX_RING_SIZE;
struct TxFD *tx_fd = dpriv->tx_fd + last;
+   dma_addr_t addr;
 
skb->len = DUMMY_SKB_SIZE;
skb_copy_to_linear_data(skb, version,
strlen(version) % DUMMY_SKB_SIZE);
tx_fd->state = FrameEnd | TO_STATE_TX(DUMMY_SKB_SIZE);
-   tx_fd->data = cpu_to_le32(pci_map_single(dpriv->pci_priv->pdev,
-skb->data, DUMMY_SKB_SIZE,
-PCI_DMA_TODEVICE));
+   addr = pci_map_single(dpriv->pci_priv->pdev,
+ skb->data, DUMMY_SKB_SIZE,
+ PCI_DMA_TODEVICE);
+   if (pci_dma_mapping_error(dpriv->pci_priv->pdev, addr)) {
+   dev_kfree_skb_any(skb);
+   return NULL;
+   }
+   tx_fd->data = cpu_to_le32(addr);
dpriv->tx_skbuff[last] = skb;
}
return skb;
-- 
2.7.4



Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-08-04 Thread Casey Leedom
| From: Raj, Ashok 
| Sent: Thursday, August 3, 2017 1:31 AM
|
| I don't understand this completely.. So your driver would know not to send
| RO TLP's to root complex. But you want to send RO to the NVMe device? This
| is the peer-2-peer case correct?

Yes, this is the "heavy hammer" issue which you alluded to later.  There are
applications where a device will want to send TLPs to a Root Complex without
Relaxed Ordering set, but will want to use it when sending TLPs to a Peer
device (say, an NVMe storage device).  The current approach doesn't make
that easy ... and in fact, I still don't know how to code a solution for this
with the proposed APIs.  This means that we may be trading off one
performance problem for another and that Relaxed Ordering may be doomed for
use under Linux for the foreseeable future.

As I've noted a number of times, it would be great if the Intel Hardware
Engineers who attempted to implement the Relaxed Ordering semantics in the
current generation of Root Complexes had left the ability to turn off the
logic which is obviously not working.  If there was a way to disable the
logic via an undocumented register, then we could have the Linux PCI Quirk
do that.  Since Relaxed Ordering is just a hint, it's completely legitimate
to completely ignore it.

Casey


[PATCH v2 net-next 6/7] net: ipv6: add second dif to inet6 socket lookups

2017-08-04 Thread David Ahern
Add a second device index, sdif, to inet6 socket lookups. sdif is the
index for ingress devices enslaved to an l3mdev. It allows the lookups
to consider the enslaved device as well as the L3 domain when searching
for a socket.

TCP moves the data in the cb. Prior to tcp_v6_rcv (e.g., early demux) the
ingress index is obtained from IP6CB using inet6_sdif and after tcp_v6_rcv
tcp_v6_sdif is used.

Signed-off-by: David Ahern 
---
 include/net/inet6_hashtables.h | 22 +-
 include/net/tcp.h  | 10 ++
 net/dccp/ipv6.c|  4 ++--
 net/ipv6/inet6_hashtables.c| 28 +---
 net/ipv6/tcp_ipv6.c| 13 -
 net/ipv6/udp.c |  7 ---
 net/netfilter/xt_TPROXY.c  |  4 ++--
 7 files changed, 56 insertions(+), 32 deletions(-)

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index b87becacd9d3..6e91e38a31da 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -49,7 +49,8 @@ struct sock *__inet6_lookup_established(struct net *net,
const struct in6_addr *saddr,
const __be16 sport,
const struct in6_addr *daddr,
-   const u16 hnum, const int dif);
+   const u16 hnum, const int dif,
+   const int sdif);
 
 struct sock *inet6_lookup_listener(struct net *net,
   struct inet_hashinfo *hashinfo,
@@ -57,7 +58,8 @@ struct sock *inet6_lookup_listener(struct net *net,
   const struct in6_addr *saddr,
   const __be16 sport,
   const struct in6_addr *daddr,
-  const unsigned short hnum, const int dif);
+  const unsigned short hnum,
+  const int dif, const int sdif);
 
 static inline struct sock *__inet6_lookup(struct net *net,
  struct inet_hashinfo *hashinfo,
@@ -66,24 +68,25 @@ static inline struct sock *__inet6_lookup(struct net *net,
  const __be16 sport,
  const struct in6_addr *daddr,
  const u16 hnum,
- const int dif,
+ const int dif, const int sdif,
  bool *refcounted)
 {
struct sock *sk = __inet6_lookup_established(net, hashinfo, saddr,
-   sport, daddr, hnum, dif);
+sport, daddr, hnum,
+dif, sdif);
*refcounted = true;
if (sk)
return sk;
*refcounted = false;
return inet6_lookup_listener(net, hashinfo, skb, doff, saddr, sport,
-daddr, hnum, dif);
+daddr, hnum, dif, sdif);
 }
 
 static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo,
  struct sk_buff *skb, int doff,
  const __be16 sport,
  const __be16 dport,
- int iif,
+ int iif, int sdif,
  bool *refcounted)
 {
struct sock *sk = skb_steal_sock(skb);
@@ -95,7 +98,7 @@ static inline struct sock *__inet6_lookup_skb(struct 
inet_hashinfo *hashinfo,
return __inet6_lookup(dev_net(skb_dst(skb)->dev), hashinfo, skb,
  doff, &ipv6_hdr(skb)->saddr, sport,
  &ipv6_hdr(skb)->daddr, ntohs(dport),
- iif, refcounted);
+ iif, sdif, refcounted);
 }
 
 struct sock *inet6_lookup(struct net *net, struct inet_hashinfo *hashinfo,
@@ -107,13 +110,14 @@ struct sock *inet6_lookup(struct net *net, struct 
inet_hashinfo *hashinfo,
 int inet6_hash(struct sock *sk);
 #endif /* IS_ENABLED(CONFIG_IPV6) */
 
-#define INET6_MATCH(__sk, __net, __saddr, __daddr, __ports, __dif) \
+#define INET6_MATCH(__sk, __net, __saddr, __daddr, __ports, __dif, __sdif) \
(((__sk)->sk_portpair == (__ports)) &&  \
 ((__sk)->sk_family == AF_INET6)&&  \
 ipv6_addr_equal(&(__sk)->sk_v6_daddr, (__saddr))   &&  
\
 ipv6_addr_equal(&(__sk)->sk_v6_rcv_saddr, (__daddr))   &&  \
 (!(__sk)->sk_bound_dev_if  ||  \
- 

[PATCH v2 net-next 2/7] net: ipv4: add second dif to inet socket lookups

2017-08-04 Thread David Ahern
Add a second device index, sdif, to inet socket lookups. sdif is the
index for ingress devices enslaved to an l3mdev. It allows the lookups
to consider the enslaved device as well as the L3 domain when searching
for a socket.

TCP moves the data in the cb. Prior to tcp_v4_rcv (e.g., early demux) the
ingress index is obtained from IPCB using inet_sdif and after the cb move
in  tcp_v4_rcv the tcp_v4_sdif helper is used.

Signed-off-by: David Ahern 
---
 include/net/inet_hashtables.h | 31 +--
 include/net/tcp.h | 10 ++
 net/dccp/ipv4.c   |  2 +-
 net/ipv4/inet_hashtables.c| 27 +--
 net/ipv4/tcp_ipv4.c   | 13 -
 net/ipv4/udp.c|  6 +++---
 net/netfilter/xt_TPROXY.c |  2 +-
 7 files changed, 57 insertions(+), 34 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 5026b1f08bb8..2dbbbff5e1e3 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -221,16 +221,16 @@ struct sock *__inet_lookup_listener(struct net *net,
const __be32 saddr, const __be16 sport,
const __be32 daddr,
const unsigned short hnum,
-   const int dif);
+   const int dif, const int sdif);
 
 static inline struct sock *inet_lookup_listener(struct net *net,
struct inet_hashinfo *hashinfo,
struct sk_buff *skb, int doff,
__be32 saddr, __be16 sport,
-   __be32 daddr, __be16 dport, int dif)
+   __be32 daddr, __be16 dport, int dif, int sdif)
 {
return __inet_lookup_listener(net, hashinfo, skb, doff, saddr, sport,
- daddr, ntohs(dport), dif);
+ daddr, ntohs(dport), dif, sdif);
 }
 
 /* Socket demux engine toys. */
@@ -262,22 +262,24 @@ static inline struct sock *inet_lookup_listener(struct 
net *net,
   (((__force __u64)(__be32)(__daddr)) << 32) | 
\
   ((__force __u64)(__be32)(__saddr)))
 #endif /* __BIG_ENDIAN */
-#define INET_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif)
\
+#define INET_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif, 
__sdif) \
(((__sk)->sk_portpair == (__ports)) &&  \
 ((__sk)->sk_addrpair == (__cookie))&&  \
 (!(__sk)->sk_bound_dev_if  ||  \
-  ((__sk)->sk_bound_dev_if == (__dif)))&&  \
+  ((__sk)->sk_bound_dev_if == (__dif)) ||  \
+  ((__sk)->sk_bound_dev_if == (__sdif)))   &&  \
 net_eq(sock_net(__sk), (__net)))
 #else /* 32-bit arch */
 #define INET_ADDR_COOKIE(__name, __saddr, __daddr) \
const int __name __deprecated __attribute__((unused))
 
-#define INET_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif) \
+#define INET_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif, 
__sdif) \
(((__sk)->sk_portpair == (__ports)) &&  \
 ((__sk)->sk_daddr  == (__saddr))   &&  \
 ((__sk)->sk_rcv_saddr  == (__daddr))   &&  \
 (!(__sk)->sk_bound_dev_if  ||  \
-  ((__sk)->sk_bound_dev_if == (__dif)))&&  \
+  ((__sk)->sk_bound_dev_if == (__dif)) ||  \
+  ((__sk)->sk_bound_dev_if == (__sdif)))   &&  \
 net_eq(sock_net(__sk), (__net)))
 #endif /* 64-bit arch */
 
@@ -288,7 +290,7 @@ struct sock *__inet_lookup_established(struct net *net,
   struct inet_hashinfo *hashinfo,
   const __be32 saddr, const __be16 sport,
   const __be32 daddr, const u16 hnum,
-  const int dif);
+  const int dif, const int sdif);
 
 static inline struct sock *
inet_lookup_established(struct net *net, struct inet_hashinfo *hashinfo,
@@ -297,7 +299,7 @@ static inline struct sock *
const int dif)
 {
return __inet_lookup_established(net, hashinfo, saddr, sport, daddr,
-ntohs(dport), dif);
+ntohs(dport), dif, 0);
 }
 
 static inline struct sock *__inet_lookup(struct net *net,
@@ -305,20 +307,20 @@ static inline struct sock *__inet_lookup(struct net *net,
 struct sk_buff *skb, int doff,
 const __be32 saddr, const __be16 sport,
   

Re: [PATCH net-next 00/10] net: l3mdev: Support for sockets bound to enslaved device

2017-08-04 Thread David Ahern
On 8/1/17 6:41 PM, David Miller wrote:
> From: David Ahern 
> Date: Mon, 31 Jul 2017 20:13:16 -0700
> 
>> Existing code for socket lookups already pass in 6+ arguments. Rather
>> than add another for the enslaved device index, the existing lookups
>> are converted to use a new sk_lookup struct. From there, the enslaved
>> device index becomes another element of the struct.
> 
> Sorry, not gonna happen :-)
> 
> I know it's difficult, but maybe we should think about why we're
> passing so much crap into each lookup.

The 'crap' is essential data to find a socket -- the hash table (common
lookup functions for multiple backends), destination address, dest port,
source address, source port and device index make 6 parameters to be
matched. The socket data is what I consolidated into 1 struct.
Ultimately that data has to make its way from high level wrappers to
compute_score and INET{6}_MATCH at the very bottom.

> 
> And perhaps, why it can't (for example) be constituted in the lookup
> function itself given sufficient (relevant) context.

There are several contexts for lookups -- ingress packets (ipv4 and v6,
tcp, udp, udplite, dccp), error paths for those protocols, netfilter and
socket diagnostic API. The 5 socket parameters do not have a consistent
storage across those contexts.

Further there are several layers of wrappers to what are the real lookup
functions, each wrapper abstracting some piece of data. The call
hierarchy for inet lookups, for example:

tcp_v4_early_demux
__inet_lookup_established

tcp_v4_rcv / dccp_v4_rcv
__inet_lookup_skb
__inet_lookup
__inet_lookup_listener
__inet_lookup_established

tcp_v4_rcv - TCP_TW_SYN
inet_lookup_listener
__inet_lookup_listener

nf_tproxy_get_sock_v4
inet_lookup_listener
__inet_lookup_listener

inet_lookup_established
__inet_lookup_established


inet_diag_find_one_icsk / nf_socket_get_sock_v4
inet_lookup
__inet_lookup
__inet_lookup_listener
__inet_lookup_established

tcp_v4_send_reset - MD5 path
__inet_lookup_listener


tcp_v4_err / dccp_v4_err
__inet_lookup_established


> 
> I think passing a big struct into the lookups, by reference, is a big
> step backwards.
> 
> For one thing, if you pass it by pointer then the compiler can't
> potentially pass parts in registers even if it could.  However
> if you pass it by value, that's actually a possibility.
> 
> But I'd like to avoid this on-stack blob altogether if possible.

Just sent a v2 that just adds sdif to the existing functions.


Re: [PATCH iproute2 1/2] tc actions: Improved batching and time filtered dumping

2017-08-04 Thread Stephen Hemminger
On Wed,  2 Aug 2017 07:46:26 -0400
Jamal Hadi Salim  wrote:

> From: Jamal Hadi Salim 
> 
> dump more than TCA_ACT_MAX_PRIO actions per batch when the kernel
> supports it.
> 
> Introduced keyword "since" for time based filtering of actions.
> Some examples (we have 400 actions bound to 400 filters) at
> installation time. Using the updated tc, setting the time of
> interest to 120 seconds earlier (we see 400 actions):
> prompt$ hackedtc actions ls action gact since 12| grep index | wc -l
> 400
> 
> go get some coffee and wait for > 120 seconds and try again:
> 
> prompt$ hackedtc actions ls action gact since 12 | grep index | wc -l
> 0
> 
> Lets see a filter bound to one of these actions:
> 
> filter pref 10 u32
> filter pref 10 u32 fh 800: ht divisor 1
> filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule 
> hit 2 success 1)
>   match 7f02/ at 12 (success 1 )
> action order 1: gact action pass
>  random type none pass val 0
>  index 23 ref 2 bind 1 installed 1145 sec used 802 sec
> Action statistics:
> Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> ...
> 
> that coffee took long, no? It was good.
> 
> Now lets ping -c 1 127.0.0.2, then run the actions again:
> prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l
> 1
> 
> More details please:
> prompt$ hackedtc -s actions ls action gact since 12
> 
> action order 0: gact action pass
>  random type none pass val 0
>  index 23 ref 2 bind 1 installed 1270 sec used 30 sec
> Action statistics:
> Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> 
> And the filter?
> filter pref 10 u32
> filter pref 10 u32 fh 800: ht divisor 1
> filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule 
> hit 4 success 2)
>   match 7f02/ at 12 (success 2 )
> action order 1: gact action pass
>  random type none pass val 0
>  index 23 ref 2 bind 1 installed 1324 sec used 84 sec
> Action statistics:
> Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> 
> Signed-off-by: Jamal Hadi Salim 

Applied to net-next branch. Thanks Jamal



[PATCH v2 net-next 7/7] net: ipv6: add second dif to raw socket lookups

2017-08-04 Thread David Ahern
Add a second device index, sdif, to raw socket lookups. sdif is the
index for ingress devices enslaved to an l3mdev. It allows the lookups
to consider the enslaved device as well as the L3 domain when searching
for a socket.

Signed-off-by: David Ahern 
---
 include/net/rawv6.h |  2 +-
 net/ipv4/raw_diag.c |  2 +-
 net/ipv6/raw.c  | 13 -
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/net/rawv6.h b/include/net/rawv6.h
index cbe4e9de1894..4addc5c988e0 100644
--- a/include/net/rawv6.h
+++ b/include/net/rawv6.h
@@ -6,7 +6,7 @@
 extern struct raw_hashinfo raw_v6_hashinfo;
 struct sock *__raw_v6_lookup(struct net *net, struct sock *sk,
 unsigned short num, const struct in6_addr 
*loc_addr,
-const struct in6_addr *rmt_addr, int dif);
+const struct in6_addr *rmt_addr, int dif, int 
sdif);
 
 int raw_abort(struct sock *sk, int err);
 
diff --git a/net/ipv4/raw_diag.c b/net/ipv4/raw_diag.c
index c600d3c71d4d..c200065ef9a5 100644
--- a/net/ipv4/raw_diag.c
+++ b/net/ipv4/raw_diag.c
@@ -52,7 +52,7 @@ static struct sock *raw_lookup(struct net *net, struct sock 
*from,
sk = __raw_v6_lookup(net, from, r->sdiag_raw_protocol,
 (const struct in6_addr *)r->id.idiag_src,
 (const struct in6_addr *)r->id.idiag_dst,
-r->id.idiag_if);
+r->id.idiag_if, 0);
 #endif
return sk;
 }
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 60be012fe708..e4462b0ff801 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -72,7 +72,7 @@ EXPORT_SYMBOL_GPL(raw_v6_hashinfo);
 
 struct sock *__raw_v6_lookup(struct net *net, struct sock *sk,
unsigned short num, const struct in6_addr *loc_addr,
-   const struct in6_addr *rmt_addr, int dif)
+   const struct in6_addr *rmt_addr, int dif, int sdif)
 {
bool is_multicast = ipv6_addr_is_multicast(loc_addr);
 
@@ -86,7 +86,9 @@ struct sock *__raw_v6_lookup(struct net *net, struct sock *sk,
	    !ipv6_addr_equal(&sk->sk_v6_daddr, rmt_addr))
continue;
 
-   if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif)
+   if (sk->sk_bound_dev_if &&
+   sk->sk_bound_dev_if != dif &&
+   sk->sk_bound_dev_if != sdif)
continue;
 
	if (!ipv6_addr_any(&sk->sk_v6_rcv_saddr)) {
@@ -178,7 +180,8 @@ static bool ipv6_raw_deliver(struct sk_buff *skb, int 
nexthdr)
goto out;
 
net = dev_net(skb->dev);
-   sk = __raw_v6_lookup(net, sk, nexthdr, daddr, saddr, inet6_iif(skb));
+   sk = __raw_v6_lookup(net, sk, nexthdr, daddr, saddr,
+inet6_iif(skb), inet6_sdif(skb));
 
while (sk) {
int filtered;
@@ -222,7 +225,7 @@ static bool ipv6_raw_deliver(struct sk_buff *skb, int 
nexthdr)
}
}
sk = __raw_v6_lookup(net, sk_next(sk), nexthdr, daddr, saddr,
-inet6_iif(skb));
+inet6_iif(skb), inet6_sdif(skb));
}
 out:
	read_unlock(&raw_v6_hashinfo.lock);
@@ -378,7 +381,7 @@ void raw6_icmp_error(struct sk_buff *skb, int nexthdr,
net = dev_net(skb->dev);
 
while ((sk = __raw_v6_lookup(net, sk, nexthdr, saddr, daddr,
-   inet6_iif(skb)))) {
+inet6_iif(skb), inet6_iif(skb)))) {
rawv6_err(sk, skb, NULL, type, code,
inner_offset, info);
sk = sk_next(sk);
-- 
2.1.4



[PATCH v2 net-next 3/7] net: ipv4: add second dif to raw socket lookups

2017-08-04 Thread David Ahern
Add a second device index, sdif, to raw socket lookups. sdif is the
index for ingress devices enslaved to an l3mdev. It allows the lookups
to consider the enslaved device as well as the L3 domain when searching
for a socket.

Signed-off-by: David Ahern 
---
 include/net/raw.h   |  2 +-
 net/ipv4/raw.c  | 16 +++-
 net/ipv4/raw_diag.c |  2 +-
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/include/net/raw.h b/include/net/raw.h
index 57c33dd22ec4..99d26d0c4a19 100644
--- a/include/net/raw.h
+++ b/include/net/raw.h
@@ -26,7 +26,7 @@ extern struct proto raw_prot;
 extern struct raw_hashinfo raw_v4_hashinfo;
 struct sock *__raw_v4_lookup(struct net *net, struct sock *sk,
 unsigned short num, __be32 raddr,
-__be32 laddr, int dif);
+__be32 laddr, int dif, int sdif);
 
 int raw_abort(struct sock *sk, int err);
 void raw_icmp_error(struct sk_buff *, int, u32);
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index b0bb5d0a30bd..2726aecf224b 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -122,7 +122,8 @@ void raw_unhash_sk(struct sock *sk)
 EXPORT_SYMBOL_GPL(raw_unhash_sk);
 
 struct sock *__raw_v4_lookup(struct net *net, struct sock *sk,
-   unsigned short num, __be32 raddr, __be32 laddr, int dif)
+unsigned short num, __be32 raddr, __be32 laddr,
+int dif, int sdif)
 {
sk_for_each_from(sk) {
struct inet_sock *inet = inet_sk(sk);
@@ -130,7 +131,8 @@ struct sock *__raw_v4_lookup(struct net *net, struct sock 
*sk,
if (net_eq(sock_net(sk), net) && inet->inet_num == num  &&
!(inet->inet_daddr && inet->inet_daddr != raddr)&&
!(inet->inet_rcv_saddr && inet->inet_rcv_saddr != laddr) &&
-   !(sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif))
+   !(sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif &&
+ sk->sk_bound_dev_if != sdif))
goto found; /* gotcha */
}
sk = NULL;
@@ -171,6 +173,7 @@ static int icmp_filter(const struct sock *sk, const struct 
sk_buff *skb)
  */
 static int raw_v4_input(struct sk_buff *skb, const struct iphdr *iph, int hash)
 {
+   int sdif = inet_sdif(skb);
struct sock *sk;
struct hlist_head *head;
int delivered = 0;
@@ -184,7 +187,7 @@ static int raw_v4_input(struct sk_buff *skb, const struct 
iphdr *iph, int hash)
net = dev_net(skb->dev);
sk = __raw_v4_lookup(net, __sk_head(head), iph->protocol,
 iph->saddr, iph->daddr,
-skb->dev->ifindex);
+skb->dev->ifindex, sdif);
 
while (sk) {
delivered = 1;
@@ -199,7 +202,7 @@ static int raw_v4_input(struct sk_buff *skb, const struct 
iphdr *iph, int hash)
}
sk = __raw_v4_lookup(net, sk_next(sk), iph->protocol,
 iph->saddr, iph->daddr,
-skb->dev->ifindex);
+skb->dev->ifindex, sdif);
}
 out:
	read_unlock(&raw_v4_hashinfo.lock);
@@ -297,12 +300,15 @@ void raw_icmp_error(struct sk_buff *skb, int protocol, 
u32 info)
	read_lock(&raw_v4_hashinfo.lock);
	raw_sk = sk_head(&raw_v4_hashinfo.ht[hash]);
if (raw_sk) {
+   int dif = skb->dev->ifindex;
+   int sdif = inet_sdif(skb);
+
iph = (const struct iphdr *)skb->data;
net = dev_net(skb->dev);
 
while ((raw_sk = __raw_v4_lookup(net, raw_sk, protocol,
iph->daddr, iph->saddr,
-   skb->dev->ifindex)) != NULL) {
+   dif, sdif)) != NULL) {
raw_err(raw_sk, skb, info);
raw_sk = sk_next(raw_sk);
iph = (const struct iphdr *)skb->data;
diff --git a/net/ipv4/raw_diag.c b/net/ipv4/raw_diag.c
index e1a51ca68d23..c600d3c71d4d 100644
--- a/net/ipv4/raw_diag.c
+++ b/net/ipv4/raw_diag.c
@@ -46,7 +46,7 @@ static struct sock *raw_lookup(struct net *net, struct sock 
*from,
sk = __raw_v4_lookup(net, from, r->sdiag_raw_protocol,
 r->id.idiag_dst[0],
 r->id.idiag_src[0],
-r->id.idiag_if);
+r->id.idiag_if, 0);
 #if IS_ENABLED(CONFIG_IPV6)
else
sk = __raw_v6_lookup(net, from, r->sdiag_raw_protocol,
-- 
2.1.4



[PATCH v2 net-next 5/7] net: ipv6: add second dif to udp socket lookups

2017-08-04 Thread David Ahern
Add a second device index, sdif, to udp socket lookups. sdif is the
index for ingress devices enslaved to an l3mdev. It allows the lookups
to consider the enslaved device as well as the L3 domain when searching
for a socket.

Early demux lookups are handled in the next patch as part of INET_MATCH
changes.

Signed-off-by: David Ahern 
---
 include/linux/ipv6.h | 10 ++
 include/net/udp.h|  2 +-
 net/ipv4/udp_diag.c  |  4 ++--
 net/ipv6/udp.c   | 40 ++--
 4 files changed, 35 insertions(+), 21 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 474d6bbc158c..ac2da4e11d5e 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -159,6 +159,16 @@ static inline bool inet6_is_jumbogram(const struct sk_buff 
*skb)
 }
 
 /* can not be used in TCP layer after tcp_v6_fill_cb */
+static inline int inet6_sdif(const struct sk_buff *skb)
+{
+#if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
+   if (skb && ipv6_l3mdev_skb(IP6CB(skb)->flags))
+   return IP6CB(skb)->iif;
+#endif
+   return 0;
+}
+
+/* can not be used in TCP layer after tcp_v6_fill_cb */
 static inline bool inet6_exact_dif_match(struct net *net, struct sk_buff *skb)
 {
 #if defined(CONFIG_NET_L3_MASTER_DEV)
diff --git a/include/net/udp.h b/include/net/udp.h
index 826c713d5a48..20dcdca4e85c 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -298,7 +298,7 @@ struct sock *udp6_lib_lookup(struct net *net,
 struct sock *__udp6_lib_lookup(struct net *net,
   const struct in6_addr *saddr, __be16 sport,
   const struct in6_addr *daddr, __be16 dport,
-  int dif, struct udp_table *tbl,
+  int dif, int sdif, struct udp_table *tbl,
   struct sk_buff *skb);
 struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
 __be16 sport, __be16 dport);
diff --git a/net/ipv4/udp_diag.c b/net/ipv4/udp_diag.c
index 1f07fe109535..d0390d844ac8 100644
--- a/net/ipv4/udp_diag.c
+++ b/net/ipv4/udp_diag.c
@@ -53,7 +53,7 @@ static int udp_dump_one(struct udp_table *tbl, struct sk_buff 
*in_skb,
req->id.idiag_sport,
(struct in6_addr *)req->id.idiag_dst,
req->id.idiag_dport,
-   req->id.idiag_if, tbl, NULL);
+   req->id.idiag_if, 0, tbl, NULL);
 #endif
	if (sk && !refcount_inc_not_zero(&sk->sk_refcnt))
sk = NULL;
@@ -198,7 +198,7 @@ static int __udp_diag_destroy(struct sk_buff *in_skb,
req->id.idiag_dport,
(struct in6_addr *)req->id.idiag_src,
req->id.idiag_sport,
-   req->id.idiag_if, tbl, NULL);
+   req->id.idiag_if, 0, tbl, NULL);
}
 #endif
else {
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 578142b7ca3e..d96a877798a7 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -129,7 +129,7 @@ static void udp_v6_rehash(struct sock *sk)
 static int compute_score(struct sock *sk, struct net *net,
 const struct in6_addr *saddr, __be16 sport,
 const struct in6_addr *daddr, unsigned short hnum,
-int dif, bool exact_dif)
+int dif, int sdif, bool exact_dif)
 {
int score;
struct inet_sock *inet;
@@ -161,9 +161,13 @@ static int compute_score(struct sock *sk, struct net *net,
}
 
if (sk->sk_bound_dev_if || exact_dif) {
-   if (sk->sk_bound_dev_if != dif)
+   bool dev_match = (sk->sk_bound_dev_if == dif ||
+ sk->sk_bound_dev_if == sdif);
+
+   if (exact_dif && !dev_match)
return -1;
-   score++;
+   if (sk->sk_bound_dev_if && dev_match)
+   score++;
}
 
if (sk->sk_incoming_cpu == raw_smp_processor_id())
@@ -175,9 +179,9 @@ static int compute_score(struct sock *sk, struct net *net,
 /* called with rcu_read_lock() */
 static struct sock *udp6_lib_lookup2(struct net *net,
const struct in6_addr *saddr, __be16 sport,
-   const struct in6_addr *daddr, unsigned int hnum, int dif,
-   bool exact_dif, struct udp_hslot *hslot2,
-   struct sk_buff *skb)
+   const struct in6_addr *daddr, unsigned int hnum,
+   int dif, int sdif, bool exact_dif,
+   struct udp_hslot *hslot2, struct sk_buff *skb)
 {
struct sock *sk, *result;
int score, badness, matches = 0, reuseport = 0;
@@ -187,7 +191,7 @@ static struct sock *udp6_lib_lookup2(struct net *net,
badness = -1;

[PATCH v2 net-next 4/7] net: ipv4: add second dif to multicast source filter

2017-08-04 Thread David Ahern
Signed-off-by: David Ahern 
---
 include/linux/igmp.h | 3 ++-
 net/ipv4/igmp.c  | 6 --
 net/ipv4/raw.c   | 2 +-
 net/ipv4/udp.c   | 2 +-
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/igmp.h b/include/linux/igmp.h
index 97caf1821de8..f8231854b5d6 100644
--- a/include/linux/igmp.h
+++ b/include/linux/igmp.h
@@ -118,7 +118,8 @@ extern int ip_mc_msfget(struct sock *sk, struct ip_msfilter 
*msf,
struct ip_msfilter __user *optval, int __user *optlen);
 extern int ip_mc_gsfget(struct sock *sk, struct group_filter *gsf,
struct group_filter __user *optval, int __user *optlen);
-extern int ip_mc_sf_allow(struct sock *sk, __be32 local, __be32 rmt, int dif);
+extern int ip_mc_sf_allow(struct sock *sk, __be32 local, __be32 rmt,
+ int dif, int sdif);
 extern void ip_mc_init_dev(struct in_device *);
 extern void ip_mc_destroy_dev(struct in_device *);
 extern void ip_mc_up(struct in_device *);
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 28f14afd0dd3..5bc8570c2ec3 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -2549,7 +2549,8 @@ int ip_mc_gsfget(struct sock *sk, struct group_filter 
*gsf,
 /*
  * check if a multicast source filter allows delivery for a given <src,dst,intf>
  */
-int ip_mc_sf_allow(struct sock *sk, __be32 loc_addr, __be32 rmt_addr, int dif)
+int ip_mc_sf_allow(struct sock *sk, __be32 loc_addr, __be32 rmt_addr,
+  int dif, int sdif)
 {
struct inet_sock *inet = inet_sk(sk);
struct ip_mc_socklist *pmc;
@@ -2564,7 +2565,8 @@ int ip_mc_sf_allow(struct sock *sk, __be32 loc_addr, 
__be32 rmt_addr, int dif)
rcu_read_lock();
for_each_pmc_rcu(inet, pmc) {
if (pmc->multi.imr_multiaddr.s_addr == loc_addr &&
-   pmc->multi.imr_ifindex == dif)
+   (pmc->multi.imr_ifindex == dif ||
+(sdif && pmc->multi.imr_ifindex == sdif)))
break;
}
ret = inet->mc_all;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 2726aecf224b..33b70bfd1122 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -193,7 +193,7 @@ static int raw_v4_input(struct sk_buff *skb, const struct 
iphdr *iph, int hash)
delivered = 1;
if ((iph->protocol != IPPROTO_ICMP || !icmp_filter(sk, skb)) &&
ip_mc_sf_allow(sk, iph->daddr, iph->saddr,
-  skb->dev->ifindex)) {
+  skb->dev->ifindex, sdif)) {
struct sk_buff *clone = skb_clone(skb, GFP_ATOMIC);
 
/* Not releasing hash table! */
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 608b91b912c9..9a77b29f4ebd 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -606,7 +606,7 @@ static inline bool __udp_is_mcast_sock(struct net *net, 
struct sock *sk,
(sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif &&
 sk->sk_bound_dev_if != sdif))
return false;
-   if (!ip_mc_sf_allow(sk, loc_addr, rmt_addr, dif))
+   if (!ip_mc_sf_allow(sk, loc_addr, rmt_addr, dif, sdif))
return false;
return true;
 }
-- 
2.1.4



[PATCH v2 net-next 1/7] net: ipv4: add second dif to udp socket lookups

2017-08-04 Thread David Ahern
Add a second device index, sdif, to udp socket lookups. sdif is the
index for ingress devices enslaved to an l3mdev. It allows the lookups
to consider the enslaved device as well as the L3 domain when searching
for a socket.

Early demux lookups are handled in the next patch as part of INET_MATCH
changes.

Signed-off-by: David Ahern 
---
 include/net/ip.h| 10 +
 include/net/udp.h   |  2 +-
 net/ipv4/udp.c  | 58 +++--
 net/ipv4/udp_diag.c |  6 +++---
 4 files changed, 48 insertions(+), 28 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 821cedcc8e73..6a2c4b4aaa98 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -78,6 +78,16 @@ struct ipcm_cookie {
 #define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
 #define PKTINFO_SKB_CB(skb) ((struct in_pktinfo *)((skb)->cb))
 
+/* return enslaved device index if relevant */
+static inline int inet_sdif(struct sk_buff *skb)
+{
+#if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
+   if (skb && ipv4_l3mdev_skb(IPCB(skb)->flags))
+   return IPCB(skb)->iif;
+#endif
+   return 0;
+}
+
 struct ip_ra_chain {
struct ip_ra_chain __rcu *next;
struct sock *sk;
diff --git a/include/net/udp.h b/include/net/udp.h
index cc8036987dcb..826c713d5a48 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -287,7 +287,7 @@ int udp_lib_setsockopt(struct sock *sk, int level, int 
optname,
 struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
 __be32 daddr, __be16 dport, int dif);
 struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
-  __be32 daddr, __be16 dport, int dif,
+  __be32 daddr, __be16 dport, int dif, int sdif,
   struct udp_table *tbl, struct sk_buff *skb);
 struct sock *udp4_lib_lookup_skb(struct sk_buff *skb,
 __be16 sport, __be16 dport);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e6276fa3750b..ff79bcf19276 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -380,8 +380,8 @@ int udp_v4_get_port(struct sock *sk, unsigned short snum)
 
 static int compute_score(struct sock *sk, struct net *net,
 __be32 saddr, __be16 sport,
-__be32 daddr, unsigned short hnum, int dif,
-bool exact_dif)
+__be32 daddr, unsigned short hnum,
+int dif, int sdif, bool exact_dif)
 {
int score;
struct inet_sock *inet;
@@ -413,10 +413,15 @@ static int compute_score(struct sock *sk, struct net *net,
}
 
if (sk->sk_bound_dev_if || exact_dif) {
-   if (sk->sk_bound_dev_if != dif)
+   bool dev_match = (sk->sk_bound_dev_if == dif ||
+ sk->sk_bound_dev_if == sdif);
+
+   if (exact_dif && !dev_match)
return -1;
-   score += 4;
+   if (sk->sk_bound_dev_if && dev_match)
+   score += 4;
}
+
if (sk->sk_incoming_cpu == raw_smp_processor_id())
score++;
return score;
@@ -436,10 +441,11 @@ static u32 udp_ehashfn(const struct net *net, const 
__be32 laddr,
 
 /* called with rcu_read_lock() */
 static struct sock *udp4_lib_lookup2(struct net *net,
-   __be32 saddr, __be16 sport,
-   __be32 daddr, unsigned int hnum, int dif, bool exact_dif,
-   struct udp_hslot *hslot2,
-   struct sk_buff *skb)
+__be32 saddr, __be16 sport,
+__be32 daddr, unsigned int hnum,
+int dif, int sdif, bool exact_dif,
+struct udp_hslot *hslot2,
+struct sk_buff *skb)
 {
struct sock *sk, *result;
int score, badness, matches = 0, reuseport = 0;
@@ -449,7 +455,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
badness = 0;
	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
score = compute_score(sk, net, saddr, sport,
- daddr, hnum, dif, exact_dif);
+ daddr, hnum, dif, sdif, exact_dif);
if (score > badness) {
reuseport = sk->sk_reuseport;
if (reuseport) {
@@ -477,8 +483,8 @@ static struct sock *udp4_lib_lookup2(struct net *net,
  * harder than this. -DaveM
  */
 struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
-   __be16 sport, __be32 daddr, __be16 dport,
-   int dif, struct udp_table *udptable, struct sk_buff *skb)
+   __be16 sport, __be32 daddr, __be16 dport, int dif,
+   int sdif, struct udp_table *udptable, struct sk_buff 

[PATCH v2 net-next 0/7] net: l3mdev: Support for sockets bound to enslaved device

2017-08-04 Thread David Ahern
A missing piece to the VRF puzzle is the ability to bind sockets to
devices enslaved to a VRF. This patch set adds the enslaved device
index, sdif, to IPv4 and IPv6 socket lookups. The end result for users
is the following scope options for services:

1. "global" services - sockets not bound to any device

   Allows 1 service to work across all network interfaces with
    connected sockets bound to the VRF the connection originates on
   (Requires net.ipv4.tcp_l3mdev_accept=1 for TCP and
net.ipv4.udp_l3mdev_accept=1 for UDP)

2. "VRF" local services - sockets bound to a VRF

   Sockets work across all network interfaces enslaved to a VRF but
   are limited to just the one VRF.

3. "device" services - sockets bound to a specific network interface

    Service works only through the one specific interface (see the sketch below).
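
To make the "device" case concrete, the service simply binds its socket to the
enslaved interface; with this series, packets arriving on that interface still
reach the socket even though they are processed through the VRF, because the
lookup now also compares the enslaved ingress index (sdif). Minimal sketch
only -- "eth1" and its VRF enslavement are hypothetical, and SO_BINDTODEVICE
needs CAP_NET_RAW:

#include <sys/socket.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char ifname[] = "eth1";	/* hypothetical: enslaved to a VRF */
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0 || setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE,
				 ifname, sizeof(ifname)) < 0) {
		perror("SO_BINDTODEVICE");
		return 1;
	}
	/* bind()/listen()/accept() as usual; the lookup now matches the
	 * enslaved ingress device (sdif) as well as the L3 domain (dif).
	 */
	close(fd);
	return 0;
}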

v2
- remove sk_lookup struct and add sdif as an argument to existing
  functions

Changes since RFC:
- no significant logic changes; mainly whitespace cleanups

David Ahern (7):
  net: ipv4: add second dif to udp socket lookups
  net: ipv4: add second dif to inet socket lookups
  net: ipv4: add second dif to raw socket lookups
  net: ipv4: add second dif to multicast source filter
  net: ipv6: add second dif to udp socket lookups
  net: ipv6: add second dif to inet6 socket lookups
  net: ipv6: add second dif to raw socket lookups

 include/linux/igmp.h   |  3 +-
 include/linux/ipv6.h   | 10 +++
 include/net/inet6_hashtables.h | 22 --
 include/net/inet_hashtables.h  | 31 +++-
 include/net/ip.h   | 10 +++
 include/net/raw.h  |  2 +-
 include/net/rawv6.h|  2 +-
 include/net/tcp.h  | 20 +
 include/net/udp.h  |  4 +--
 net/dccp/ipv4.c|  2 +-
 net/dccp/ipv6.c|  4 +--
 net/ipv4/igmp.c|  6 ++--
 net/ipv4/inet_hashtables.c | 27 ++---
 net/ipv4/raw.c | 18 
 net/ipv4/raw_diag.c|  4 +--
 net/ipv4/tcp_ipv4.c| 13 +
 net/ipv4/udp.c | 66 --
 net/ipv4/udp_diag.c| 10 +++
 net/ipv6/inet6_hashtables.c| 28 +++---
 net/ipv6/raw.c | 13 +
 net/ipv6/tcp_ipv6.c| 13 +
 net/ipv6/udp.c | 47 --
 net/netfilter/xt_TPROXY.c  |  6 ++--
 23 files changed, 227 insertions(+), 134 deletions(-)

-- 
2.1.4



[PATCH v2 2/2] dt-bindings: net: Document bindings for anarion-gmac

2017-08-04 Thread Alexandru Gagniuc
Signed-off-by: Alexandru Gagniuc 
---
 .../devicetree/bindings/net/anarion-gmac.txt   | 25 ++
 1 file changed, 25 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/anarion-gmac.txt

diff --git a/Documentation/devicetree/bindings/net/anarion-gmac.txt 
b/Documentation/devicetree/bindings/net/anarion-gmac.txt
new file mode 100644
index 000..fe67896
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/anarion-gmac.txt
@@ -0,0 +1,25 @@
+*  Adaptrum Anarion ethernet controller
+
+This device is a platform glue layer for stmmac.
+Please see stmmac.txt for the other unchanged properties.
+
+Required properties:
+ - compatible:  Should be "adaptrum,anarion-gmac", "snps,dwmac"
+ - phy-mode:Should be "rgmii". Other modes are not currently supported.
+
+
+Examples:
+
+   gmac1: ethernet@f2014000 {
+   compatible = "adaptrum,anarion-gmac", "snps,dwmac";
+   reg = <0xf2014000 0x4000>, <0xf2018100 8>;
+
+   interrupt-parent = <_intc>;
+   interrupts = <21>;
+   interrupt-names = "macirq";
+
+   clocks = <_clk>;
+   clock-names = "stmmaceth";
+
+   phy-mode = "rgmii";
+   };
-- 
2.9.3



[PATCH v2 1/2] net: stmmac: Add Adaptrum Anarion GMAC glue layer

2017-08-04 Thread Alexandru Gagniuc
Before the GMAC on the Anarion chip can be used, the PHY interface
selection must be configured with the DWMAC block in reset.

This layer covers a block containing only two registers. Although it
is possible to model this as a reset controller and use the "resets"
property of stmmac, it's much more intuitive to include this in the
glue layer instead.

At this time only RGMII is supported, because it is the only mode
which has been validated hardware-wise.

Signed-off-by: Alexandru Gagniuc 
---
Changes since v1:
 * Moved documentation for bindings to separate patch

 drivers/net/ethernet/stmicro/stmmac/Kconfig|   9 ++
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   1 +
 .../net/ethernet/stmicro/stmmac/dwmac-anarion.c| 152 +
 3 files changed, 162 insertions(+)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-anarion.c

diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig 
b/drivers/net/ethernet/stmicro/stmmac/Kconfig
index 85c0e41..9703576 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Kconfig
+++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig
@@ -45,6 +45,15 @@ config DWMAC_GENERIC
  platform specific code to function or is using platform
  data for setup.
 
+config DWMAC_ANARION
+   tristate "Adaptrum Anarion GMAC support"
+   default ARC
+   depends on OF && (ARC || COMPILE_TEST)
+   help
+ Support for Adaptrum Anarion GMAC Ethernet controller.
+
+ This selects the Anarion SoC glue layer support for the stmmac driver.
+
 config DWMAC_IPQ806X
tristate "QCA IPQ806x DWMAC support"
default ARCH_QCOM
diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile 
b/drivers/net/ethernet/stmicro/stmmac/Makefile
index fd4937a..238307f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -7,6 +7,7 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o 
ring_mode.o  \
 
 # Ordering matters. Generic driver must be last.
 obj-$(CONFIG_STMMAC_PLATFORM)  += stmmac-platform.o
+obj-$(CONFIG_DWMAC_ANARION)+= dwmac-anarion.o
 obj-$(CONFIG_DWMAC_IPQ806X)+= dwmac-ipq806x.o
 obj-$(CONFIG_DWMAC_LPC18XX)+= dwmac-lpc18xx.o
 obj-$(CONFIG_DWMAC_MESON)  += dwmac-meson.o dwmac-meson8b.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-anarion.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-anarion.c
new file mode 100644
index 000..85ce80c
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-anarion.c
@@ -0,0 +1,152 @@
+/*
+ * Adaptrum Anarion DWMAC glue layer
+ *
+ * Copyright (C) 2017, Adaptrum, Inc.
+ * (Written by Alexandru Gagniuc  for Adaptrum, Inc.)
+ * Licensed under the GPLv2 or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "stmmac.h"
+#include "stmmac_platform.h"
+
+#define GMAC_RESET_CONTROL_REG 0
+#define GMAC_SW_CONFIG_REG 4
+#define  GMAC_CONFIG_INTF_SEL_MASK (0x7 << 0)
+#define  GMAC_CONFIG_INTF_RGMII(0x1 << 0)
+
+struct anarion_gmac {
+   uintptr_t ctl_block;
+   uint32_t phy_intf_sel;
+};
+
+static uint32_t gmac_read_reg(struct anarion_gmac *gmac, uint8_t reg)
+{
+   return readl((void *)(gmac->ctl_block + reg));
+};
+
+static void gmac_write_reg(struct anarion_gmac *gmac, uint8_t reg, uint32_t 
val)
+{
+   writel(val, (void *)(gmac->ctl_block + reg));
+}
+
+static int anarion_gmac_init(struct platform_device *pdev, void *priv)
+{
+   uint32_t sw_config;
+   struct anarion_gmac *gmac = priv;
+
+   /* Reset logic, configure interface mode, then release reset. SIMPLE! */
+   gmac_write_reg(gmac, GMAC_RESET_CONTROL_REG, 1);
+
+   sw_config = gmac_read_reg(gmac, GMAC_SW_CONFIG_REG);
+   sw_config &= ~GMAC_CONFIG_INTF_SEL_MASK;
+   sw_config |= (gmac->phy_intf_sel & GMAC_CONFIG_INTF_SEL_MASK);
+   gmac_write_reg(gmac, GMAC_SW_CONFIG_REG, sw_config);
+
+   gmac_write_reg(gmac, GMAC_RESET_CONTROL_REG, 0);
+
+   return 0;
+}
+
+static void anarion_gmac_exit(struct platform_device *pdev, void *priv)
+{
+   struct anarion_gmac *gmac = priv;
+
+   gmac_write_reg(gmac, GMAC_RESET_CONTROL_REG, 1);
+}
+
+static struct anarion_gmac *anarion_config_dt(struct platform_device *pdev)
+{
+   int phy_mode;
+   struct resource *res;
+   void __iomem *ctl_block;
+   struct anarion_gmac *gmac;
+
+   res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+   ctl_block = devm_ioremap_resource(&pdev->dev, res);
+   if (IS_ERR(ctl_block)) {
+   dev_err(&pdev->dev, "Cannot get reset region (%ld)!\n",
+   PTR_ERR(ctl_block));
+   return ctl_block;
+   }
+
+   gmac = devm_kzalloc(&pdev->dev, sizeof(*gmac), GFP_KERNEL);
+   if (!gmac)
+   return ERR_PTR(-ENOMEM);
+
+   gmac->ctl_block = (uintptr_t)ctl_block;
+
+   phy_mode = of_get_phy_mode(pdev->dev.of_node);
+   switch 

Re: STABLE: net: reduce skb_warn_bad_offload() noise

2017-08-04 Thread Greg KH
On Fri, Jul 28, 2017 at 10:22:52PM -0700, Eric Dumazet wrote:
> On Fri, 2017-07-28 at 12:30 -0700, David Miller wrote:
> > From: Mark Salyzyn 
> > Date: Fri, 28 Jul 2017 10:29:57 -0700
> > 
> > > Please backport the upstream patch to the stable trees (including
> > > 3.10.y, 3.18.y, 4.4.y and 4.9.y):
> > > 
> > > b2504a5dbef3305ef41988ad270b0e8ec289331c net: reduce
> > > skb_warn_bad_offload() noise
> > > 
> > > It impacts performance and creates unnecessary alarm, and will result
> > > in a kernel panic for panic_on_warn configurations.
> > 
> > Yeah this is fine.
> 
> If you do so, also backport 6e7bc478c9a006c701c14476ec9d389a484b4864
> ("net: skb_needs_check() accepts CHECKSUM_NONE for tx")

Thanks for letting us know, I've queued this up as well.

greg k-h


[PATCH net-next 1/1] netvsc: fix rtnl deadlock on unregister of vf

2017-08-04 Thread Stephen Hemminger
With the new transparent VF support, it is possible to get a deadlock
when some of the deferred work is running and unregister_vf
is trying to cancel that work element. The solution is to use
trylock and reschedule (similar to the bonding and team devices).

Reported-by: Vitaly Kuznetsov 
Fixes: 0c195567a8f6 ("netvsc: transparent VF management")
Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/netvsc_drv.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index c71728d82049..e75c0f852a63 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1601,7 +1601,11 @@ static void netvsc_vf_setup(struct work_struct *w)
struct net_device *ndev = hv_get_drvdata(ndev_ctx->device_ctx);
struct net_device *vf_netdev;
 
-   rtnl_lock();
+   if (!rtnl_trylock()) {
+   schedule_work(w);
+   return;
+   }
+
vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
if (vf_netdev)
__netvsc_vf_setup(ndev, vf_netdev);
@@ -1655,7 +1659,11 @@ static void netvsc_vf_update(struct work_struct *w)
struct net_device *vf_netdev;
bool vf_is_up;
 
-   rtnl_lock();
+   if (!rtnl_trylock()) {
+   schedule_work(w);
+   return;
+   }
+
vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
if (!vf_netdev)
goto unlock;
-- 
2.11.0



[PATCH net-next 0/1] netvsc: fix deadlock in VF unregister

2017-08-04 Thread Stephen Hemminger
There was a race in VF unregister (in net-next only) which can be triggered
if SR-IOV is disabled on the host side, causing PCI hotplug removal.

Stephen Hemminger (1):
  netvsc: fix rtnl deadlock on unregister of vf

 drivers/net/hyperv/netvsc_drv.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

-- 
2.11.0



Re: [PATCH v5] ss: Enclose IPv6 address in brackets

2017-08-04 Thread Stephen Hemminger
On Fri, 4 Aug 2017 20:02:52 +0200
Florian Lehner  wrote:

> diff --git a/misc/ss.c b/misc/ss.c
> index 12763c9..83683b5 100644
> --- a/misc/ss.c
> +++ b/misc/ss.c
> @@ -1046,8 +1046,9 @@ do_numeric:
> 
>  static void inet_addr_print(const inet_prefix *a, int port, unsigned
> int ifindex)
>  {

Your email client is wrapping long lines, which leads to a malformed patch.

You didn't need buf2, and the code was more complex than it needed to be.

Rather than waiting for yet another version, I just merged in similar
code.
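
For illustration only, a minimal sketch of how the bracket wrapping can be
done without a second buffer. This is an editorial sketch rather than the
code that was actually merged; it assumes the buf/ap variables of the quoted
inet_addr_print() and format_host() as used in the patch, and it ignores the
hostname-resolution corner case the v5 patch guarded with strchr():

	if (a->data[0] == 0 && a->data[1] == 0 &&
	    a->data[2] == 0 && a->data[3] == 0) {
		/* IN6ADDR_ANY: keep printing a bare wildcard */
		buf[0] = '*';
		buf[1] = '\0';
	} else {
		/* format straight into buf, adding RFC 2732 brackets */
		snprintf(buf, sizeof(buf), "[%s]",
			 format_host(a->family, 16, a->data));
	}
	ap = buf;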


Re: [PATCH net-next v3 1/2] bpf: add support for sys_enter_* and sys_exit_* tracepoints

2017-08-04 Thread Yonghong Song



On 8/4/17 11:40 AM, Alexei Starovoitov wrote:

On 8/3/17 5:09 PM, Y Song wrote:

On Thu, Aug 3, 2017 at 7:08 PM, Alexei Starovoitov  wrote:

On 8/3/17 6:29 AM, Yonghong Song wrote:


@@ -578,8 +596,9 @@ static void perf_syscall_enter(void *ignore, struct
pt_regs *regs, long id)
if (!sys_data)
return;

+   prog = READ_ONCE(sys_data->enter_event->prog);
head = this_cpu_ptr(sys_data->enter_event->perf_events);
-   if (hlist_empty(head))
+   if (!prog && hlist_empty(head))
return;

/* get the size after alignment with the u32 buffer size 
field */
@@ -594,6 +613,13 @@ static void perf_syscall_enter(void *ignore, 
struct

pt_regs *regs, long id)
rec->nr = syscall_nr;
syscall_get_arguments(current, regs, 0, sys_data->nb_args,
   (unsigned long *)&rec->args);
+
+   if ((prog && !perf_call_bpf_enter(prog, regs, sys_data, 
rec)) ||

+   hlist_empty(head)) {
+   perf_swevent_put_recursion_context(rctx);
+   return;
+   }



hmm. if I read the patch correctly that makes it different from
kprobe/uprobe/tracepoints+bpf behavior. Why make it different and
force user space to perf_event_open() on every cpu?
In other cases it's the job of the bpf program to filter by cpu
if necessary and that is well understood by bcc scripts.


The patch actually does allow the bpf program to track all cpus.
The test:

+   if (!prog && hlist_empty(head))
return;

ensures that if prog is not empty, it will not return even if the
event in the current cpu is empty. Later on, perf_call_bpf_enter will
be called if prog is not empty. This ensures that
the bpf program will execute regardless of the current cpu.

Maybe I missed something here?


you're right. sorry. misread && for ||.
That part looks good indeed.

Another question...
that part:
 if (is_tracepoint) {
 int off = trace_event_get_offsets(event->tp_event);

 if (prog->aux->max_ctx_offset > off) {
seems to be not used in this new path...
or new is_syscall_tp is also is_tracepoint ?


Good catch! I think I need "is_tracepoint || is_syscall_tp" here.
If trace_event_get_offsets can get the correct offset for the current
particular syscall_{enter|exit}_* event, we will be fine.
I will double check this and have another patch.


If so, then it's ok...
and trace_event_get_offsets() returns the actual number
of syscall args or always upper bound of 6?


Since the specific event is fed here, I think the actual number
will be returned.


just curious how this new code checks that bpf prog cannot
access args[6+].

Thanks!



Re: [PATCH] samples/bpf: Fix cross compiler error with bpf sample

2017-08-04 Thread Daniel Borkmann

On 08/04/2017 08:33 PM, Joel Fernandes wrote:

On Fri, Aug 4, 2017 at 6:58 AM, Daniel Borkmann  wrote:

On 08/04/2017 07:46 AM, Joel Fernandes wrote:


When cross-compiling the bpf sample map_perf_test for aarch64, I find that
__NR_getpgrp is undefined. This causes build errors. Fix it by allowing
the
deprecated syscall in the sample.

Signed-off-by: Joel Fernandes 
---
   samples/bpf/map_perf_test_user.c | 2 ++
   1 file changed, 2 insertions(+)

diff --git a/samples/bpf/map_perf_test_user.c
b/samples/bpf/map_perf_test_user.c
index 1a8894b5ac51..6e6fc7121640 100644
--- a/samples/bpf/map_perf_test_user.c
+++ b/samples/bpf/map_perf_test_user.c
@@ -8,7 +8,9 @@
   #include 
   #include 
   #include 
+#define __ARCH_WANT_SYSCALL_DEPRECATED
   #include 
+#undef __ARCH_WANT_SYSCALL_DEPRECATED



So the only arch that sets __ARCH_WANT_SYSCALL_DEPRECATED
is score:

   $ git grep -n __ARCH_WANT_SYSCALL_DEPRECATED
   arch/score/include/uapi/asm/unistd.h:7:#define
__ARCH_WANT_SYSCALL_DEPRECATED
   include/uapi/asm-generic/unistd.h:837:#ifdef
__ARCH_WANT_SYSCALL_DEPRECATED
   include/uapi/asm-generic/unistd.h:899:#endif /*
__ARCH_WANT_SYSCALL_DEPRECATED */

But even if this would make aarch64 compile, the syscall
numbers don't match up:

   $ git grep -n __NR_getpgrp include/uapi/asm-generic/unistd.h
   include/uapi/asm-generic/unistd.h:841:#define __NR_getpgrp 1060
   include/uapi/asm-generic/unistd.h:843:__SYSCALL(__NR_getpgrp, sys_getpgrp)

The only thing that can be found on arm64 is:

   $ git grep -n __NR_getpgrp arch/arm64/
   arch/arm64/include/asm/unistd32.h:154:#define __NR_getpgrp 65
   arch/arm64/include/asm/unistd32.h:155:__SYSCALL(__NR_getpgrp, sys_getpgrp)

In arch/arm64/include/asm/unistd.h, it does include the
uapi/asm/unistd.h when compat is not set, but without the
__ARCH_WANT_SYSCALL_DEPRECATED. That doesn't look correct
unless I'm missing something, hmm, can't we just attach the
kprobes to a different syscall, one that is not deprecated,
so that we don't run into this in the first place?


Yes, I agree that's better. I think we can use getpgid. I'll try to
whip something up today and send it out.


Ok, cool. Please make sure that this doesn't clash with anything
else attached to map_perf_test_kern.c already given the obj file
is loaded first with the attachment points.


I also wanted to fix something else: HOSTCC is set to gcc, but I want
the bootstrap part of the sample to run on ARM, so I have to make HOSTCC
my cross-compiler. Right now I'm hacking it to point to the arm64 gcc;
however, I think I'd like to add a 'cross compile mode' or something
where HOSTCC points to CROSS_COMPILE instead. I'm happy to discuss
any ideas to get this fixed too.


Yeah, sounds like a good idea to add such a possibility. In case of
cross compiling to a target arch with a different endianness, you might
also need to specifically select bpfeb (big endian) or bpfel
(little endian) as the clang target. (The plain bpf target uses host endianness.)

Thanks, Joel!


Re: [PATCH net-next v3 1/2] bpf: add support for sys_enter_* and sys_exit_* tracepoints

2017-08-04 Thread Alexei Starovoitov

On 8/3/17 5:09 PM, Y Song wrote:

On Thu, Aug 3, 2017 at 7:08 PM, Alexei Starovoitov  wrote:

On 8/3/17 6:29 AM, Yonghong Song wrote:


@@ -578,8 +596,9 @@ static void perf_syscall_enter(void *ignore, struct
pt_regs *regs, long id)
if (!sys_data)
return;

+   prog = READ_ONCE(sys_data->enter_event->prog);
head = this_cpu_ptr(sys_data->enter_event->perf_events);
-   if (hlist_empty(head))
+   if (!prog && hlist_empty(head))
return;

/* get the size after alignment with the u32 buffer size field */
@@ -594,6 +613,13 @@ static void perf_syscall_enter(void *ignore, struct
pt_regs *regs, long id)
rec->nr = syscall_nr;
syscall_get_arguments(current, regs, 0, sys_data->nb_args,
   (unsigned long *)&rec->args);
+
+   if ((prog && !perf_call_bpf_enter(prog, regs, sys_data, rec)) ||
+   hlist_empty(head)) {
+   perf_swevent_put_recursion_context(rctx);
+   return;
+   }



hmm. if I read the patch correctly that makes it different from
kprobe/uprobe/tracepoints+bpf behavior. Why make it different and
force user space to perf_event_open() on every cpu?
In other cases it's the job of the bpf program to filter by cpu
if necessary and that is well understood by bcc scripts.


The patch actually does allow the bpf program to track all cpus.
The test:

+   if (!prog && hlist_empty(head))
return;

ensures that if prog is not empty, it will not return even if the
event in the current cpu is empty. Later on, perf_call_bpf_enter will
be called if prog is not empty. This ensures that
the bpf program will execute regardless of the current cpu.

Maybe I missed something here?


you're right. sorry. misread && for ||.
That part looks good indeed.

Another question...
that part:
if (is_tracepoint) {
int off = trace_event_get_offsets(event->tp_event);

if (prog->aux->max_ctx_offset > off) {
seems to be not used in this new path...
or new is_syscall_tp is also is_tracepoint ?
If so, then it's ok...
and trace_event_get_offsets() returns the actual number
of syscall args or always upper bound of 6?
just curious how this new code checks that bpf prog cannot
access args[6+].

Thanks!



Re: [PATCH] samples/bpf: Fix cross compiler error with bpf sample

2017-08-04 Thread Joel Fernandes
On Fri, Aug 4, 2017 at 6:58 AM, Daniel Borkmann  wrote:
> On 08/04/2017 07:46 AM, Joel Fernandes wrote:
>>
>> When cross-compiling the bpf sample map_perf_test for aarch64, I find that
>> __NR_getpgrp is undefined. This causes build errors. Fix it by allowing
>> the
>> deprecated syscall in the sample.
>>
>> Signed-off-by: Joel Fernandes 
>> ---
>>   samples/bpf/map_perf_test_user.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/samples/bpf/map_perf_test_user.c
>> b/samples/bpf/map_perf_test_user.c
>> index 1a8894b5ac51..6e6fc7121640 100644
>> --- a/samples/bpf/map_perf_test_user.c
>> +++ b/samples/bpf/map_perf_test_user.c
>> @@ -8,7 +8,9 @@
>>   #include 
>>   #include 
>>   #include 
>> +#define __ARCH_WANT_SYSCALL_DEPRECATED
>>   #include 
>> +#undef __ARCH_WANT_SYSCALL_DEPRECATED
>
>
> So the only arch that sets __ARCH_WANT_SYSCALL_DEPRECATED
> is score:
>
>   $ git grep -n __ARCH_WANT_SYSCALL_DEPRECATED
>   arch/score/include/uapi/asm/unistd.h:7:#define
> __ARCH_WANT_SYSCALL_DEPRECATED
>   include/uapi/asm-generic/unistd.h:837:#ifdef
> __ARCH_WANT_SYSCALL_DEPRECATED
>   include/uapi/asm-generic/unistd.h:899:#endif /*
> __ARCH_WANT_SYSCALL_DEPRECATED */
>
> But even if this would make aarch64 compile, the syscall
> numbers don't match up:
>
>   $ git grep -n __NR_getpgrp include/uapi/asm-generic/unistd.h
>   include/uapi/asm-generic/unistd.h:841:#define __NR_getpgrp 1060
>   include/uapi/asm-generic/unistd.h:843:__SYSCALL(__NR_getpgrp, sys_getpgrp)
>
> The only thing that can be found on arm64 is:
>
>   $ git grep -n __NR_getpgrp arch/arm64/
>   arch/arm64/include/asm/unistd32.h:154:#define __NR_getpgrp 65
>   arch/arm64/include/asm/unistd32.h:155:__SYSCALL(__NR_getpgrp, sys_getpgrp)
>
> In arch/arm64/include/asm/unistd.h, it does include the
> uapi/asm/unistd.h when compat is not set, but without the
> __ARCH_WANT_SYSCALL_DEPRECATED. That doesn't look correct
> unless I'm missing something, hmm, can't we just attach the
> kprobes to a different syscall, one that is not deprecated,
> so that we don't run into this in the first place?

Yes, I agree that's better. I think we can use getpgid. I'll try to
whip something up today and send it out.

I also wanted to fix something else: HOSTCC is set to gcc, but I want
the bootstrap part of the sample to run on ARM, so I have to make HOSTCC
my cross-compiler. Right now I'm hacking it to point to the arm64 gcc;
however, I think I'd like to add a 'cross compile mode' or something
where HOSTCC points to CROSS_COMPILE instead. I'm happy to discuss
any ideas to get this fixed too.

thanks!

-Joel


>
> Thanks,
> Daniel


Re: [PATCH net] xgene: Always get clk source, but ignore if it's missing for SGMII ports

2017-08-04 Thread David Miller
From: Thomas Bogendoerfer 
Date: Thu, 3 Aug 2017 15:43:14 +0200

> From: Thomas Bogendoerfer 
> 
> Even though the driver doesn't do anything with the clk source for SGMII
> ports, it needs to be enabled by doing a devm_clk_get(), if there is
> a clk source in the DT.
> 
> Fixes: 0db01097cabd ('xgene: Don't fail probe, if there is no clk resource 
> for SGMII interfaces')
> Signed-off-by: Thomas Bogendoerfer 
> Tested-by: Laura Abbott 
> Acked-by: Iyappan Subramanian 

Applied, thanks.


Re: [net-next PATCH v2] net: comment fixes against BPF devmap helper calls

2017-08-04 Thread David Miller
From: John Fastabend 
Date: Fri, 04 Aug 2017 08:24:05 -0700

> Update BPF comments to accurately reflect XDP usage.
> 
> Fixes: 97f91a7cf04ff ("bpf: add bpf_redirect_map helper routine")
> Reported-by: Alexei Starovoitov 
> Signed-off-by: John Fastabend 

Applied, thanks.


Re: [PATCH net-next 2/5] ipv6: sr: export SRH insertion functions

2017-08-04 Thread David Miller
From: David Lebrun 
Date: Fri, 4 Aug 2017 15:24:12 +0200

> +EXPORT_SYMBOL(seg6_do_srh_encap);
 ...
> +EXPORT_SYMBOL(seg6_do_srh_inline);

EXPORT_SYMBOL_GPL() please.


Re: [PATCH] MIPS: Add missing file for eBPF JIT.

2017-08-04 Thread David Miller
From: Daniel Borkmann 
Date: Fri, 04 Aug 2017 15:05:19 +0200

> On 08/04/2017 02:10 AM, David Daney wrote:
>> Inexplicably, commit f381bf6d82f0 ("MIPS: Add support for eBPF JIT.")
>> lost a file somewhere on its path to Linus' tree.  Add back the
>> missing ebpf_jit.c so that we can build with CONFIG_BPF_JIT selected.
>>
>> This version of ebpf_jit.c is identical to the original except for two
>> minor changes needed to resolve conflicts with changes merged from the
>> BPF branch:
>>
>> A) Set prog->jited_len = image_size;
>> B) Use BPF_TAIL_CALL instead of BPF_CALL | BPF_X
>>
>> Fixes: f381bf6d82f0 ("MIPS: Add support for eBPF JIT.")
>> Signed-off-by: David Daney 
>> ---
>>
>> It might be best to merge this along the path of BPF fixes rather than
>> MIPS, as the MIPS maintainer (Ralf) seems to be inactive recently.
> 
> Looks like the situation is that multiple people, including myself, tried
> to contact Ralf due to the 'half/mis-applied' MIPS BPF JIT in [1,2] that
> currently sits in Linus' tree, but never got a reply back since mid
> June.
> 
> Given the work was accepted long ago but incorrectly merged, it would be
> great if this could still be fixed up with this patch. Given Ralf seems
> unfortunately unresponsive, is there a chance, if people are fine with
> it, that we could try to route this fix e.g. via -net instead before a
> final v4.13?
> 
> Anyway, the generic pieces interacting with core BPF look good to me:
> 
> Acked-by: Daniel Borkmann 

Ok, I've applied this to the net GIT tree.

Thanks.


Re: [patch net-next v2 00/20] net: sched: summer cleanup part 1, mainly in exts area

2017-08-04 Thread David Miller
From: Jiri Pirko 
Date: Fri,  4 Aug 2017 14:28:55 +0200

> From: Jiri Pirko 
> 
> This patchset is one of a couple of cleanup patchsets I have queued.
> The motivation, aside from the obvious need to "make things nicer", is
> to prepare for the introduction of shared filter blocks. That requires
> tp->q removal, and therefore removal of all tp->q users.
> 
> Patch 1 is just a small thing I spotted along the way
> Patch 2 removes one user of tp->q, namely tcf_em_tree_change
> Patches 3-8 do preparations for exts->nr_actions removal
> Patches 9-10 do simple renames of functions in cls*
> Patches 11-19 remove unnecessary calls of the tcf_exts_change helper
> The last patch changes tcf_exts_change to not take the lock
> 
> Tested with tools/testing/selftests/tc-testing
> 
> v1->v2:
> - removed the conversion of the action array to a list, as noted by Cong
> - added the last patch instead
> - small rebases of patches 11-19

Series applied, thanks Jiri.


Re: [PATCH net 0/2] Two BPF fixes for s390

2017-08-04 Thread David Miller
From: Daniel Borkmann 
Date: Fri,  4 Aug 2017 14:20:53 +0200

> Found while testing some other work touching JITs.

Series applied and patch #1 queued up for -stable, thanks!


Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed

2017-08-04 Thread Willy Tarreau
On Fri, Aug 04, 2017 at 02:01:34PM -0400, Neal Cardwell wrote:
> On Fri, Aug 4, 2017 at 1:10 PM, Willy Tarreau  wrote:
> > Hi Neal,
> >
> > On Fri, Aug 04, 2017 at 12:59:51PM -0400, Neal Cardwell wrote:
> >> I have attached patches for this fix rebased on to v3.10.107, the
> >> latest stable release for 3.10. That's pretty far back in history, so
> >> there were substantial conflict resolutions and adjustments required.
> >> :-) Hope that helps.
> >
> > At least it will help me :-)
> >
> > Do you suggest that I queue them for 3.10.108, that I wait for Maowenan
> > to test them more broadly first or anything else ?
> 
> Let's wait for Maowenan to test them first.

Fine, thanks.
Willy


Re: [patch net 0/2] mlxsw: Couple of fixes

2017-08-04 Thread David Miller
From: Jiri Pirko 
Date: Fri,  4 Aug 2017 14:12:28 +0200

> From: Jiri Pirko 
> 
> Ido says:
> 
> The first patch prevents us from warning about valid situations that can
> happen due to the fact that some operations in switchdev are deferred.
> 
> Second patch fixes a long standing problem in which we didn't correctly
> free resources upon module removal, resulting in a memory leak.

Series applied, thanks!


Re: [PATCH net-next] net: hns: Fix for __udivdi3 compiler error

2017-08-04 Thread David Miller
From: Yunsheng Lin 
Date: Fri, 4 Aug 2017 17:24:59 +0800

> This patch fixes the __udivdi3 undefined error reported by
> the test robot.
> 
> Fixes: b8c17f708831 ("net: hns: Add self-adaptive interrupt coalesce support 
> in hns driver")
> Signed-off-by: Yunsheng Lin 

Applied, thank you.


Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed

2017-08-04 Thread Neal Cardwell
On Fri, Aug 4, 2017 at 1:10 PM, Willy Tarreau  wrote:
> Hi Neal,
>
> On Fri, Aug 04, 2017 at 12:59:51PM -0400, Neal Cardwell wrote:
>> I have attached patches for this fix rebased on to v3.10.107, the
>> latest stable release for 3.10. That's pretty far back in history, so
>> there were substantial conflict resolutions and adjustments required.
>> :-) Hope that helps.
>
> At least it will help me :-)
>
> Do you suggest that I queue them for 3.10.108, that I wait for Maowenan
> to test them more broadly first or anything else ?

Let's wait for Maowenan to test them first.

Thanks!

neal


[PATCH v5] ss: Enclose IPv6 address in brackets

2017-08-04 Thread Florian Lehner
This updated patch adds support for the RFC 2732 IPv6 address format with
brackets to the ss tool.

It now checks the complete IPv6 address to determine whether it is IN6ADDR_ANY.

Signed-off-by: Lehner Florian 
---
 misc/ss.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 12763c9..83683b5 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -1046,8 +1046,9 @@ do_numeric:

 static void inet_addr_print(const inet_prefix *a, int port, unsigned
int ifindex)
 {
-   char buf[1024];
+   char buf[1024], buf2[1024];
const char *ap = buf;
+   char *c = NULL;
int est_len = addr_width;
const char *ifname = NULL;

@@ -1059,7 +1060,18 @@ static void inet_addr_print(const inet_prefix *a,
int port, unsigned int ifindex
ap = format_host(AF_INET, 4, a->data);
}
} else {
-   ap = format_host(a->family, 16, a->data);
+   if (a->data[0] == 0 && a->data[1] == 0 &&
+   a->data[2] == 0 && a->data[3] == 0) {
+   buf[0] = '*';
+   buf[1] = 0;
+   } else {
+   ap = format_host(a->family, 16, a->data);
+   c = strchr(ap, ':');
+   if (c != NULL && a->family == AF_INET6) {
+   sprintf(buf2, "[%s]", ap);
+   ap = buf2;
+   }
+   }
est_len = strlen(ap);
if (est_len <= addr_width)
est_len = addr_width;
-- 
2.9.4


Re: [PATCH] net: phy: marvell: logical vs bitwise OR typo

2017-08-04 Thread David Miller
From: Dan Carpenter 
Date: Fri, 4 Aug 2017 11:17:21 +0300

> This was supposed to be a bitwise OR but there is a || vs | typo.
> 
> Fixes: 864dc729d528 ("net: phy: marvell: Refactor m88e1121 RGMII delay 
> configuration")
> Signed-off-by: Dan Carpenter 

Applied, but please specify "[PATCH net-next]" in your Subject line
and make your target tree explicit in the future.

Thanks.


Re: [PATCH net-next V2] net ipv6: convert fib6_table rwlock to a percpu lock

2017-08-04 Thread David Miller
From: David Ahern 
Date: Fri, 4 Aug 2017 11:11:40 -0600

> On 8/4/17 11:07 AM, Eric Dumazet wrote:
>> On Fri, 2017-08-04 at 09:38 -0700, Shaohua Li wrote:
>>> From: Shaohua Li 
>>>
>>> In a syn flooding test, the fib6_table rwlock is a significant
>>> bottleneck. While converting the rwlock to RCU sounds straightforward,
>>> it is very challenging, if it is possible at all. A percpu spinlock (lglock
>>> has been removed from the kernel, so I added a simple implementation here)
>>> is quite trivial for this problem, since updating the routing table is a
>>> rare event. In my test, the server receives around 1.5 Mpps in a syn
>>> flooding test without the patch on a dual-socket, 56-CPU system. With the
>>> patch, the server receives around 3.8 Mpps, and the perf report doesn't
>>> show the locking issue.
>>>
>>> Of course the percpu lock isn't as good as RCU, so this isn't intended
>>> to replace RCU, but it is much better than the current read-write lock.
>>> Before we have an RCU implementation, this is a good temporary solution.
>>> Plus, this is a trivial change; there is nothing to prevent pursuing an
>>> RCU implementation.
>>>
>>> Cc: Wei Wang 
>>> Cc: Eric Dumazet 
>>> Cc: Stephen Hemminger 
>>> Signed-off-by: Shaohua Li 
>>> ---
>> 
>> Wei has almost done the RCU conversion.
>> 
>> This patch is probably coming too late.
> 
> 
> +1
> 
> I'd rather see the RCU conversion than a move to per-cpu locks.

Me too.


Re: [PATCH] net: phy: marvell: logical vs bitwise OR typo

2017-08-04 Thread Andrew Lunn
On Fri, Aug 04, 2017 at 02:54:45PM +, David Laight wrote:
> From: Dan Carpenter
> > Sent: 04 August 2017 09:17
> > This was supposed to be a bitwise OR but there is a || vs | typo.
> > 
> > Fixes: 864dc729d528 ("net: phy: marvell: Refactor m88e1121 RGMII delay 
> > configuration")
> > Signed-off-by: Dan Carpenter 
> > 
> > diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> > index 361fe9927ef2..15cbcdba618a 100644
> > --- a/drivers/net/phy/marvell.c
> > +++ b/drivers/net/phy/marvell.c
> > @@ -83,7 +83,7 @@
> >  #define MII_88E1121_PHY_MSCR_REG   21
> >  #define MII_88E1121_PHY_MSCR_RX_DELAY  BIT(5)
> >  #define MII_88E1121_PHY_MSCR_TX_DELAY  BIT(4)
> > -#define MII_88E1121_PHY_MSCR_DELAY_MASK(~(BIT(5) || BIT(4)))
> > +#define MII_88E1121_PHY_MSCR_DELAY_MASK(~(BIT(5) | BIT(4)))
> 
> Wouldn't:
> +#define MII_88E1121_PHY_MSCR_DELAY_MASK  
> (~(MII_88E1121_PHY_MSCR_RX_DELAY | MII_88E1121_PHY_MSCR_TX_DELAY))
> be more correct?
> If a little long?
> Actually the ~ looks odd here
> Reads code
> Kill the define and explicitly mask off the two values just before
> conditionally setting them.

Hi David

I will put this on my TODO list. But lets get Dan's fix included
first.

Andrew
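
For reference, a hypothetical sketch of the approach suggested above: clear
both delay bits explicitly, then conditionally set them. Register/page access
is simplified here and this is not the code that was eventually merged; the
MII_88E1121_PHY_MSCR_* names are taken from the quoted patch:

	int mscr;

	mscr = phy_read(phydev, MII_88E1121_PHY_MSCR_REG);
	if (mscr < 0)
		return mscr;

	/* clear both delay bits before deciding which ones to set */
	mscr &= ~(MII_88E1121_PHY_MSCR_RX_DELAY |
		  MII_88E1121_PHY_MSCR_TX_DELAY);

	if (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
		mscr |= MII_88E1121_PHY_MSCR_RX_DELAY |
			MII_88E1121_PHY_MSCR_TX_DELAY;
	else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID)
		mscr |= MII_88E1121_PHY_MSCR_RX_DELAY;
	else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID)
		mscr |= MII_88E1121_PHY_MSCR_TX_DELAY;

	return phy_write(phydev, MII_88E1121_PHY_MSCR_REG, mscr);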


Re: [PATCH] net: phy: marvell: logical vs bitwise OR typo

2017-08-04 Thread Andrew Lunn
On Fri, Aug 04, 2017 at 11:17:21AM +0300, Dan Carpenter wrote:
> This was supposed to be a bitwise OR but there is a || vs | typo.
> 
> Fixes: 864dc729d528 ("net: phy: marvell: Refactor m88e1121 RGMII delay 
> configuration")
> Signed-off-by: Dan Carpenter 

Hi Dan

Thanks for the fix.

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH net-next] net: dsa: User per-cpu 64-bit statistics

2017-08-04 Thread Eric Dumazet
On Fri, 2017-08-04 at 10:11 -0700, Eric Dumazet wrote:

> You could add a debug version of u64_stats_update_begin()
> 
> doing 
> 
> int ret = atomic_inc((atomic_t *)syncp);

I meant atomic_inc_return() of course.

> 
> BUG_ON(ret & 1);
> 
> 
> And u64_stats_update_end()
> 
> int ret = atomic_inc((atomic_t *)syncp);
> 
> BUG_ON(!(ret & 1));
> 
> 
> We probably could have a CONFIG_DEBUG_U64_STATS  that could be used on
> 64bit kernels as well...
> 
> 
> 




Re: [PATCH net-next V2] net ipv6: convert fib6_table rwlock to a percpu lock

2017-08-04 Thread Wei Wang
On Fri, Aug 4, 2017 at 10:11 AM, David Ahern  wrote:
> On 8/4/17 11:07 AM, Eric Dumazet wrote:
>> On Fri, 2017-08-04 at 09:38 -0700, Shaohua Li wrote:
>>> From: Shaohua Li 
>>>
>>> In a syn flooding test, the fib6_table rwlock is a significant
>>> bottleneck. While converting the rwlock to RCU sounds straightforward,
>>> it is very challenging, if it is possible at all. A percpu spinlock (lglock
>>> has been removed from the kernel, so I added a simple implementation here)
>>> is quite trivial for this problem, since updating the routing table is a
>>> rare event. In my test, the server receives around 1.5 Mpps in a syn
>>> flooding test without the patch on a dual-socket, 56-CPU system. With the
>>> patch, the server receives around 3.8 Mpps, and the perf report doesn't
>>> show the locking issue.
>>>
>>> Of course the percpu lock isn't as good as RCU, so this isn't intended
>>> to replace RCU, but it is much better than the current read-write lock.
>>> Before we have an RCU implementation, this is a good temporary solution.
>>> Plus, this is a trivial change; there is nothing to prevent pursuing an
>>> RCU implementation.
>>>
>>> Cc: Wei Wang 
>>> Cc: Eric Dumazet 
>>> Cc: Stephen Hemminger 
>>> Signed-off-by: Shaohua Li 
>>> ---
>>
>> Wei has almost done the RCU conversion.
>>
>> This patch is probably coming too late.
>
>
> +1
>
> I'd rather see the RCU conversion than a move to per-cpu locks.

I am actively working on the RCU conversion.
The main coding part is mostly done and I am working on testing it.
Some more time is needed to catch the rest of the missing pieces and
get the patches ready.

Thanks.
Wei


Re: [PATCH net 1/2] bpf, s390: fix jit branch offset related to ldimm64

2017-08-04 Thread David Miller
From: Michael Holzheu 
Date: Fri, 4 Aug 2017 19:10:33 +0200

> At least I would vote for "Cc: stable".

No, please do not ever do this for networking patches.


Re: [PATCH net-next] net: dsa: User per-cpu 64-bit statistics

2017-08-04 Thread Eric Dumazet
On Fri, 2017-08-04 at 08:51 -0700, Florian Fainelli wrote:
> On 08/03/2017 10:36 PM, Eric Dumazet wrote:
> > On Thu, 2017-08-03 at 21:33 -0700, Florian Fainelli wrote:
> >> During testing with a background iperf pushing 1Gbit/sec worth of
> >> traffic and having both ifconfig and ethtool collect statistics, we
> >> could see quite frequent deadlocks. Convert the often accessed DSA slave
> >> network devices statistics to per-cpu 64-bit statistics to remove these
> >> deadlocks and provide fast efficient statistics updates.
> >>
> > 
> > This seems to be a bug fix, it would be nice to get a proper tag like :
> > 
> > Fixes: f613ed665bb3 ("net: dsa: Add support for 64-bit statistics")
> 
> Right, should have been added, thanks!
> 
> > 
> > Problem here is that if multiple cpus can call dsa_switch_rcv() at the
> > same time, then u64_stats_update_begin() contract is not respected.
> 
> This is really where I struggled to understand what is wrong in the
> non-per-CPU version; my understanding is that we have:
> 
> - writers for xmit execute in process context
> - writers for receive execute from NAPI (from the DSA master network
> device through its own NAPI doing netif_receive_skb -> netdev_uses_dsa
> -> netif_receive_skb)
> 
> Readers should all execute in process context. The test scenario that
> led to a deadlock involved running iperf in the background, having a
> while loop with both ifconfig and ethtool reading stats, and somehow
> when iperf exited, either reader would just get stuck. So I guess this
> leaves us with the two writers not being mutually excluded then, right?

You could add a debug version of u64_stats_update_begin()

doing 

int ret = atomic_inc((atomic_t *)syncp);

BUG_ON(ret & 1);


And u64_stats_update_end()

int ret = atomic_inc((atomic_t *)syncp);

BUG_ON(!(ret & 1));


We probably could have a CONFIG_DEBUG_U64_STATS  that could be used on
64bit kernels as well...
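
For illustration, a minimal sketch of what such debug wrappers might look
like, assuming a hypothetical CONFIG_DEBUG_U64_STATS option and the
seqcount-based u64_stats_sync layout (the 32-bit SMP case); the parity checks
are written in terms of the value returned by atomic_inc_return(), i.e. the
post-increment value:

#ifdef CONFIG_DEBUG_U64_STATS		/* hypothetical option */
static inline void u64_stats_update_begin_dbg(struct u64_stats_sync *syncp)
{
	int ret = atomic_inc_return((atomic_t *)syncp);

	/* sequence must be odd inside a write section */
	BUG_ON(!(ret & 1));
}

static inline void u64_stats_update_end_dbg(struct u64_stats_sync *syncp)
{
	int ret = atomic_inc_return((atomic_t *)syncp);

	/* sequence must be even again once the write section is closed */
	BUG_ON(ret & 1);
}
#endif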





Re: [PATCH net-next V2] net ipv6: convert fib6_table rwlock to a percpu lock

2017-08-04 Thread David Ahern
On 8/4/17 11:07 AM, Eric Dumazet wrote:
> On Fri, 2017-08-04 at 09:38 -0700, Shaohua Li wrote:
>> From: Shaohua Li 
>>
>> In a syn flooding test, the fib6_table rwlock is a significant
>> bottleneck. While converting the rwlock to RCU sounds straightforward,
>> it is very challenging, if it is possible at all. A percpu spinlock (lglock
>> has been removed from the kernel, so I added a simple implementation here)
>> is quite trivial for this problem, since updating the routing table is a
>> rare event. In my test, the server receives around 1.5 Mpps in a syn
>> flooding test without the patch on a dual-socket, 56-CPU system. With the
>> patch, the server receives around 3.8 Mpps, and the perf report doesn't
>> show the locking issue.
>>
>> Of course the percpu lock isn't as good as RCU, so this isn't intended
>> to replace RCU, but it is much better than the current read-write lock.
>> Before we have an RCU implementation, this is a good temporary solution.
>> Plus, this is a trivial change; there is nothing to prevent pursuing an
>> RCU implementation.
>>
>> Cc: Wei Wang 
>> Cc: Eric Dumazet 
>> Cc: Stephen Hemminger 
>> Signed-off-by: Shaohua Li 
>> ---
> 
> Wei has almost done the RCU conversion.
> 
> This patch is probably coming too late.


+1

I'd rather see the RCU conversion than a move to per-cpu locks.


Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed

2017-08-04 Thread Willy Tarreau
Hi Neal,

On Fri, Aug 04, 2017 at 12:59:51PM -0400, Neal Cardwell wrote:
> I have attached patches for this fix rebased on to v3.10.107, the
> latest stable release for 3.10. That's pretty far back in history, so
> there were substantial conflict resolutions and adjustments required.
> :-) Hope that helps.

At least it will help me :-)

Do you suggest that I queue them for 3.10.108, that I wait for Maowenan
to test them more broadly first or anything else ?

I'm fine with any option.
Thanks!
Willy


Re: [PATCH net 1/2] bpf, s390: fix jit branch offset related to ldimm64

2017-08-04 Thread Michael Holzheu
On Fri, 04 Aug 2017 15:52:47 +0200, Daniel Borkmann wrote:

> On 08/04/2017 03:44 PM, Michael Holzheu wrote:
> > On Fri, 4 Aug 2017 14:20:54 +0200, Daniel Borkmann wrote:
> [...]
> >
> > What about "Cc: sta...@vger.kernel.org"?
> 
> Handled by Dave, see also: Documentation/networking/netdev-FAQ.txt +117

Thanks, good to know! At least I would vote for "Cc: stable".

Michael



Re: [PATCH 0/3] ARM: dts: keystone-k2g: Add DCAN instances to 66AK2G

2017-08-04 Thread Santosh Shilimkar

Hi Franklin,

On 8/2/2017 1:18 PM, Franklin S Cooper Jr wrote:

Add DCAN nodes to the 66AK2G-based SoC dtsi.

Franklin S Cooper Jr (2):
   dt-bindings: net: c_can: Update binding for clock and power-domains
 property
   ARM: configs: keystone: Enable D_CAN driver

Lokesh Vutla (1):
   ARM: dts: k2g: Add DCAN nodes


Any DCAN driver dependency with this patchset? If not, I can
queue this up, so do let me know.

Regards,
Santosh


Re: [PATCH net-next V2] net ipv6: convert fib6_table rwlock to a percpu lock

2017-08-04 Thread Eric Dumazet
On Fri, 2017-08-04 at 09:38 -0700, Shaohua Li wrote:
> From: Shaohua Li 
> 
> In a syn flooding test, the fib6_table rwlock is a significant
> bottleneck. While converting the rwlock to RCU sounds straightforward,
> it is very challenging, if it is possible at all. A percpu spinlock (lglock
> has been removed from the kernel, so I added a simple implementation here)
> is quite trivial for this problem, since updating the routing table is a
> rare event. In my test, the server receives around 1.5 Mpps in a syn
> flooding test without the patch on a dual-socket, 56-CPU system. With the
> patch, the server receives around 3.8 Mpps, and the perf report doesn't
> show the locking issue.
>
> Of course the percpu lock isn't as good as RCU, so this isn't intended
> to replace RCU, but it is much better than the current read-write lock.
> Before we have an RCU implementation, this is a good temporary solution.
> Plus, this is a trivial change; there is nothing to prevent pursuing an
> RCU implementation.
> 
> Cc: Wei Wang 
> Cc: Eric Dumazet 
> Cc: Stephen Hemminger 
> Signed-off-by: Shaohua Li 
> ---

Wei has almost done the RCU conversion.

This patch is probably coming too late.





Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed

2017-08-04 Thread Neal Cardwell
On Fri, Aug 4, 2017 at 3:33 AM, maowenan  wrote:
> [Mao Wenan] Following up on the previous mail: in lower versions such as 3.10,
> I found there are many timer types in use, e.g. ICSK_TIME_EARLY_RETRANS, RTO,
> and PTO. I'm not sure whether some unknown problem exists if we don't check
> icsk_pending here and in the if branch below. And could you please post a
> patch for a lower version such as 3.10? Thanks a lot.
>
> #define ICSK_TIME_RETRANS   1   /* Retransmit timer */
> #define ICSK_TIME_DACK  2   /* Delayed ack timer */
> #define ICSK_TIME_PROBE03   /* Zero window probe timer */
> #define ICSK_TIME_EARLY_RETRANS 4   /* Early retransmit timer */
> #define ICSK_TIME_LOSS_PROBE5   /* Tail loss probe timer */

I think you'll find that if you audit how these patches interact with
those other timer types, the behavior either matches
the existing code or is improved.
packetdrill test suite and found that was the case for all our
existing tests. But please let us know if you find a specific scenario
where that is not the case.

I have attached patches for this fix rebased on to v3.10.107, the
latest stable release for 3.10. That's pretty far back in history, so
there were substantial conflict resolutions and adjustments required.
:-) Hope that helps.

thanks,
neal


0001-tcp-introduce-tcp_rto_delta_us-helper-for-xmit-timer.patch
Description: Binary data


0002-tcp-enable-xmit-timer-fix-by-having-TLP-use-time-whe.patch
Description: Binary data


0003-tcp-fix-xmit-timer-to-only-be-reset-if-data-ACKed-SA.patch
Description: Binary data


Re: XFRM pcpu cache issue

2017-08-04 Thread Florian Westphal
Ilan Tayari  wrote:
> I debugged the regression I told you about the other day a little...
> 
> Steps and Symptoms:
> 1. Set up a host-to-host IPSec tunnel (or transport, doesn't matter)
> 2. Ping over IPSec, or do something to populate the pcpu cache
> 3. Join an MC group, then leave the MC group
> 4. Try to ping again using the same CPU as before -> traffic doesn't egress the
> machine at all
> 
> If trying from another CPU (with a clean cache), it pings fine.
> If the pcpu cache is cleared, it works again.

Yes, I think I see the problem, thanks for debugging this.

I dropped the stale_bundle() check relative to the RFC version; that was a stupid
thing to do, because that check is what would detect this.


Does this help?

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1818,7 +1818,8 @@ xfrm_resolve_and_create_bundle(struct xfrm_policy **pols, 
int num_pols,
xdst->num_pols == num_pols &&
!xfrm_pol_dead(xdst) &&
memcmp(xdst->pols, pols,
-  sizeof(struct xfrm_policy *) * num_pols) == 0) {
+  sizeof(struct xfrm_policy *) * num_pols) == 0 &&
+   xfrm_bundle_ok(xdst)) {
dst_hold(&xdst->u.dst);
return xdst;
}




Re: [RFC] iproute: Add support for extended ack to rtnl_talk

2017-08-04 Thread Stephen Hemminger
On Fri, 4 Aug 2017 13:31:48 +0200
Simon Horman  wrote:

> On Thu, Aug 03, 2017 at 02:26:58PM -0600, David Ahern wrote:
> > On 5/18/17 10:24 PM, David Ahern wrote:  
> > > On 5/18/17 3:02 AM, Daniel Borkmann wrote:  
> > >> So effectively this means libmnl has to be used for new stuff, no one
> > >> has time to do the work to convert the existing tooling over (which
> > >> by itself might be a challenge in testing everything to make sure
> > >> there are no regressions) given there's not much activity around
> > >> lib/libnetlink.c anyway, and existing users not using libmnl today
> > >> won't see/notice new improvements on netlink side when they do an
> > >> upgrade. So we'll be stuck with that dual library mess pretty much
> > >> for a very long time. :(  
> > > 
> > > lib/libnetlink.c with all of its duplicate functions weighs in at just
> > > 947 LOC -- a mere 12% of the code in lib/. From a total SLOC of iproute2
> > > it is a negligible part of the code base.
> > > 
> > > Given that, there is very little gain -- but a lot of risk in
> > > regressions -- in converting such a small, low-level code base to libmnl
> > > just for the sake of using a library - something Phil noted in his
> > > cursory attempt at converting ip to libmnl. I.e., the level of effort
> > > required vs. the benefit is just not worth it.
> > > 
> > > There are so many other parts of the ip code base that need work with a
> > > much higher return on the time investment.
> > >   
> > 
> > Stephen: It has been 3 months since the first extack patches were posted
> > and still nothing in iproute2, all of it hung up on your decision to
> > require libmnl. Do you plan to finish the libmnl support any time soon
> > and send out patches?  
> 
> FWIW I would also like to see some way to get this enhancement accepted.

I will put in the libmnl version. If it doesn't work because no one sent
me test cases, then fine; send a patch for that.


[PATCH net-next V2] net ipv6: convert fib6_table rwlock to a percpu lock

2017-08-04 Thread Shaohua Li
From: Shaohua Li 

In a syn flooding test, the fib6_table rwlock is a significant
bottleneck. While converting the rwlock to RCU sounds straightforward,
it is very challenging, if it is possible at all. A percpu spinlock (lglock
has been removed from the kernel, so I added a simple implementation here)
is quite trivial for this problem, since updating the routing table is a
rare event. In my test, the server receives around 1.5 Mpps in a syn
flooding test without the patch on a dual-socket, 56-CPU system. With the
patch, the server receives around 3.8 Mpps, and the perf report doesn't
show the locking issue.

Of course the percpu lock isn't as good as RCU, so this isn't intended
to replace RCU, but it is much better than the current read-write lock.
Before we have an RCU implementation, this is a good temporary solution.
Plus, this is a trivial change; there is nothing to prevent pursuing an
RCU implementation.

Cc: Wei Wang 
Cc: Eric Dumazet 
Cc: Stephen Hemminger 
Signed-off-by: Shaohua Li 
---
 include/net/ip6_fib.h | 57 +-
 net/ipv6/addrconf.c   |  8 +++---
 net/ipv6/ip6_fib.c| 76 ---
 net/ipv6/route.c  | 54 ++--
 4 files changed, 129 insertions(+), 66 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 1d790ea..124eb04 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -247,7 +247,7 @@ struct rt6_statistics {
 struct fib6_table {
struct hlist_node   tb6_hlist;
u32 tb6_id;
-   rwlock_ttb6_lock;
+   spinlock_t __percpu *percpu_tb6_lock;
struct fib6_nodetb6_root;
struct inet_peer_base   tb6_peers;
unsigned intflags;
@@ -255,6 +255,61 @@ struct fib6_table {
 #define RT6_TABLE_HAS_DFLT_ROUTER  BIT(0)
 };
 
+static inline void fib6_table_read_lock_bh(struct fib6_table *table)
+{
+   preempt_disable();
+   spin_lock_bh(this_cpu_ptr(table->percpu_tb6_lock));
+}
+
+static inline void fib6_table_read_unlock_bh(struct fib6_table *table)
+{
+   spin_unlock_bh(this_cpu_ptr(table->percpu_tb6_lock));
+   preempt_enable();
+}
+
+static inline void fib6_table_read_lock(struct fib6_table *table)
+{
+   preempt_disable();
+   spin_lock(this_cpu_ptr(table->percpu_tb6_lock));
+}
+
+static inline void fib6_table_read_unlock(struct fib6_table *table)
+{
+   spin_unlock(this_cpu_ptr(table->percpu_tb6_lock));
+   preempt_enable();
+}
+
+static inline void fib6_table_write_lock_bh(struct fib6_table *table)
+{
+   int first = nr_cpu_ids;
+   int i;
+
+   for_each_possible_cpu(i) {
+   if (first == nr_cpu_ids) {
+   first = i;
+   spin_lock_bh(per_cpu_ptr(table->percpu_tb6_lock, i));
+   } else
+   spin_lock_nest_lock(
+   per_cpu_ptr(table->percpu_tb6_lock, i),
+   per_cpu_ptr(table->percpu_tb6_lock, first));
+   }
+}
+
+static inline void fib6_table_write_unlock_bh(struct fib6_table *table)
+{
+   int first = nr_cpu_ids;
+   int i;
+
+   for_each_possible_cpu(i) {
+   if (first == nr_cpu_ids) {
+   first = i;
+   continue;
+   }
+   spin_unlock(per_cpu_ptr(table->percpu_tb6_lock, i));
+   }
+   spin_unlock_bh(per_cpu_ptr(table->percpu_tb6_lock, first));
+}
+
 #define RT6_TABLE_UNSPEC   RT_TABLE_UNSPEC
 #define RT6_TABLE_MAIN RT_TABLE_MAIN
 #define RT6_TABLE_DFLT RT6_TABLE_MAIN
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 30ee23e..22e2ad2 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2313,7 +2313,7 @@ static struct rt6_info *addrconf_get_prefix_route(const 
struct in6_addr *pfx,
if (!table)
return NULL;
 
-   read_lock_bh(&table->tb6_lock);
+   fib6_table_read_lock_bh(table);
fn = fib6_locate(&table->tb6_root, pfx, plen, NULL, 0);
if (!fn)
goto out;
@@ -2330,7 +2330,7 @@ static struct rt6_info *addrconf_get_prefix_route(const 
struct in6_addr *pfx,
break;
}
 out:
-   read_unlock_bh(&table->tb6_lock);
+   fib6_table_read_unlock_bh(table);
return rt;
 }
 
@@ -5929,7 +5929,7 @@ void addrconf_disable_policy_idev(struct inet6_dev *idev, 
int val)
struct fib6_table *table = rt->rt6i_table;
int cpu;
 
-   read_lock(&table->tb6_lock);
+   fib6_table_read_lock(table);
addrconf_set_nopolicy(ifa->rt, val);
if (rt->rt6i_pcpu) {
for_each_possible_cpu(cpu) {
@@ -5939,7 +5939,7 @@ void addrconf_disable_policy_idev(struct 

Re: [PATCH] iproute2/misc: do not mix CFLAGS with LDFLAGS

2017-08-04 Thread Stephen Hemminger
On Fri,  4 Aug 2017 11:54:02 +0200
Marcus Meissner  wrote:

> During linking, do not use CFLAGS. This avoids clashes when doing PIE builds.
> ---
>  misc/Makefile | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/misc/Makefile b/misc/Makefile
> index 72807678..1d86c44d 100644
> --- a/misc/Makefile
> +++ b/misc/Makefile
> @@ -23,17 +23,17 @@ all: $(TARGETS)
>  ss: $(SSOBJ)
>   $(QUIET_LINK)$(CC) $^ $(LDFLAGS) $(LDLIBS) -o $@
>  
> -nstat: nstat.c
> - $(QUIET_CC)$(CC) $(CFLAGS) $(LDFLAGS) -o nstat nstat.c $(LIBNETLINK) -lm
> +nstat: nstat.o
> + $(QUIET_CC)$(CC) $(LDFLAGS) -o nstat nstat.o $(LIBNETLINK) -lm
>  
> -ifstat: ifstat.c
> - $(QUIET_CC)$(CC) $(CFLAGS) $(LDFLAGS) -o ifstat ifstat.c $(LIBNETLINK) 
> -lm
> +ifstat: ifstat.o
> + $(QUIET_CC)$(CC) $(LDFLAGS) -o ifstat ifstat.o $(LIBNETLINK) -lm
>  
> -rtacct: rtacct.c
> - $(QUIET_CC)$(CC) $(CFLAGS) $(LDFLAGS) -o rtacct rtacct.c $(LIBNETLINK) 
> -lm
> +rtacct: rtacct.o
> + $(QUIET_CC)$(CC) $(LDFLAGS) -o rtacct rtacct.o $(LIBNETLINK) -lm
>  
> -arpd: arpd.c
> - $(QUIET_CC)$(CC) $(CFLAGS) -I$(DBM_INCLUDE) $(LDFLAGS) -o arpd arpd.c 
> $(LIBNETLINK) -ldb -lpthread
> +arpd: arpd.o
> + $(QUIET_CC)$(CC) $(LDFLAGS) -o arpd arpd.o $(LIBNETLINK) -ldb -lpthread
>  
>  ssfilter.c: ssfilter.y
>   $(QUIET_YACC)bison ssfilter.y -o ssfilter.c

Some CFLAGS do need to be passed to gcc when linking; think of -flto.
I don't see this problem on gcc with Debian and hardening.


Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed

2017-08-04 Thread Neal Cardwell
On Fri, Aug 4, 2017 at 3:12 AM, maowenan  wrote:
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -2380,21 +2380,12 @@ bool tcp_schedule_loss_probe(struct sock *sk)
> > > u32 rtt = usecs_to_jiffies(tp->srtt_us >> 3);
> > > u32 timeout, rto_delta_us;
> > >
> > > -   /* No consecutive loss probes. */
> > > -   if (WARN_ON(icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)) {
> > > -   tcp_rearm_rto(sk);
> > > -   return false;
> > > -   }
> [Mao Wenan] I'm sorry, I can't see why you deleted this and the "if" branch below?

We deleted those two "if" branches in tcp_schedule_loss_probe()
because they were assuming that TLP probes would only be scheduled in
a context where an RTO had already been scheduled. With the old
implementation that was true: on every ACK (tcp_ack()) or send of new
data (tcp_event_new_data_sent()) we would first schedule an RTO (by
calling tcp_rearm_rto()) and then schedule a TLP (by calling
tcp_schedule_loss_probe()). So the checks were the right ones for the
old implementation.

With the new implementation, we do not first rearm the RTO on every
incoming ACK. That means when we get to tcp_schedule_loss_probe() we
may find either an RTO or TLP is pending.

Hope that helps clear that up.

cheers,
neal


Re: [PATCH net-next] net: dsa: User per-cpu 64-bit statistics

2017-08-04 Thread Florian Fainelli
On 08/03/2017 10:36 PM, Eric Dumazet wrote:
> On Thu, 2017-08-03 at 21:33 -0700, Florian Fainelli wrote:
>> During testing with a background iperf pushing 1Gbit/sec worth of
>> traffic and having both ifconfig and ethtool collect statistics, we
>> could see quite frequent deadlocks. Convert the often accessed DSA slave
>> network devices statistics to per-cpu 64-bit statistics to remove these
>> deadlocks and provide fast efficient statistics updates.
>>
> 
> This seems to be a bug fix, it would be nice to get a proper tag like :
> 
> Fixes: f613ed665bb3 ("net: dsa: Add support for 64-bit statistics")

Right, should have been added, thanks!

> 
> Problem here is that if multiple cpus can call dsa_switch_rcv() at the
> same time, then u64_stats_update_begin() contract is not respected.

This is really where I struggled to understand what is wrong in the
non-per-CPU version; my understanding is that we have:

- writers for xmit execute in process context
- writers for receive execute from NAPI (from the DSA master network
device through its own NAPI doing netif_receive_skb -> netdev_uses_dsa
-> netif_receive_skb)

Readers should all execute in process context. The test scenario that
led to a deadlock involved running iperf in the background, having a
while loop with both ifconfig and ethtool reading stats, and somehow
when iperf exited, either reader would just get stuck. So I guess this
leaves us with the two writers not being mutually excluded then, right?

> 
> include/linux/u64_stats_sync.h states :
> 
>  * Usage :
>  *
>  * Stats producer (writer) should use following template granted it already 
> got
>  * an exclusive access to counters (a lock is already taken, or per cpu
>  * data is used [in a non preemptable context])
>  *
>  *   spin_lock_bh(...) or other synchronization to get exclusive access
>  *   ...
>  *   u64_stats_update_begin(>syncp);
> 
> 
> 

-- 
Florian
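
For context, a minimal sketch of what the per-CPU conversion looks like in a
receive hot path. This is illustrative only, not the actual DSA patch; it
assumes the device's per-CPU counters were allocated with
netdev_alloc_pcpu_stats():

	struct pcpu_sw_netstats *stats = this_cpu_ptr(dev->tstats);

	/* each CPU owns its own counters and syncp, so concurrent
	 * writers on different CPUs no longer share one u64_stats_sync
	 */
	u64_stats_update_begin(&stats->syncp);
	stats->rx_packets++;
	stats->rx_bytes += skb->len;
	u64_stats_update_end(&stats->syncp);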


Re: [PATCH 1/2] net: mvneta: remove bogus use of

2017-08-04 Thread Marcin Wojtas
Hi Gregory,

From my side: +1 to your modification.

Thanks,
Marcin

2017-08-04 17:26 GMT+02:00 Gregory CLEMENT :
> Hi Rob,
>
>  On Thu, Jul 20 2017, Rob Herring  wrote:
>
>> On Thu, Jul 20, 2017 at 10:06 AM, Gregory CLEMENT
>>  wrote:
>>> Hi Rob,
>>>
>>>  On Thu, Jul 20 2017, Rob Herring  wrote:
>>>
>>> (Adding Marcin in CC who wrote this part of code)
>>>
 Nothing ever sets data, so it is always NULL. Remove it as this is
 the only user of data ptr in the whole kernel, and it is going to be
 removed from struct device_node.
>>>
>>> Actually the use of device_node.data ptr is not bogus and it is set in
>>> mvneta_bm_probe:
>>> http://elixir.free-electrons.com/linux/latest/source/drivers/net/ethernet/marvell/mvneta_bm.c#L433
>>
>> Indeed. Looks like some complicated Kconfig logic, so I hadn't been
>> able to trigger a build failure, nor did 0-day (so far).
>>
>>> Your patch will break the BM support on this driver. So if you need to
>>> remove this data ptr, then you have to offer an alternative for it.
>>
>> How about something like this (WS damaged) patch:
>
> I finally took time to test your patch. There were some missing parts
> which prevented it from building, like including linux/of_platform.h, or
> providing stub functions when CONFIG_MVNETA_BM was not enabled.
>
> Also, the fact that you still call mvneta_bm_port_init() even if bm_priv
> was NULL was not really nice. So I propose the following patch, which I
> tested on a Clearfog with and without CONFIG_MVNETA_BM enabled.
>
> From 03c4028bc1f52d3d214e8506d9f0f0d3985d Mon Sep 17 00:00:00 2001
> From: Gregory CLEMENT 
> Date: Fri, 4 Aug 2017 17:18:38 +0200
> Subject: [PATCH] net: mvneta: remove data pointer usage from device_node
>  structure
>
> In order to be able to remove the data pointer from the device_node
> structure. We have to modify the way the BM resources are shared between
> the mvneta port.
>
> Signed-off-by: Gregory CLEMENT 
> ---
>  drivers/net/ethernet/marvell/mvneta.c| 18 --
>  drivers/net/ethernet/marvell/mvneta_bm.c | 13 +
>  drivers/net/ethernet/marvell/mvneta_bm.h |  8 ++--
>  3 files changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/marvell/mvneta.c 
> b/drivers/net/ethernet/marvell/mvneta.c
> index 63b6147753fe..fd84447582f7 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -4295,12 +4295,16 @@ static int mvneta_probe(struct platform_device *pdev)
>
> /* Obtain access to BM resources if enabled and already initialized */
> bm_node = of_parse_phandle(dn, "buffer-manager", 0);
> -   if (bm_node && bm_node->data) {
> -   pp->bm_priv = bm_node->data;
> -   err = mvneta_bm_port_init(pdev, pp);
> -   if (err < 0) {
> -   dev_info(&pdev->dev, "use SW buffer management\n");
> -   pp->bm_priv = NULL;
> +   if (bm_node) {
> +   pp->bm_priv = mvneta_bm_get(bm_node);
> +   if (pp->bm_priv) {
> +   err = mvneta_bm_port_init(pdev, pp);
> +   if (err < 0) {
> +   dev_info(&pdev->dev,
> +"use SW buffer management\n");
> +   mvneta_bm_put(pp->bm_priv);
> +   pp->bm_priv = NULL;
> +   }
> }
> }
> of_node_put(bm_node);
> @@ -4369,6 +4373,7 @@ static int mvneta_probe(struct platform_device *pdev)
> mvneta_bm_pool_destroy(pp->bm_priv, pp->pool_long, 1 << 
> pp->id);
> mvneta_bm_pool_destroy(pp->bm_priv, pp->pool_short,
>1 << pp->id);
> +   mvneta_bm_put(pp->bm_priv);
> }
>  err_free_stats:
> free_percpu(pp->stats);
> @@ -4410,6 +4415,7 @@ static int mvneta_remove(struct platform_device *pdev)
> mvneta_bm_pool_destroy(pp->bm_priv, pp->pool_long, 1 << 
> pp->id);
> mvneta_bm_pool_destroy(pp->bm_priv, pp->pool_short,
>1 << pp->id);
> +   mvneta_bm_put(pp->bm_priv);
> }
>
> return 0;
> diff --git a/drivers/net/ethernet/marvell/mvneta_bm.c 
> b/drivers/net/ethernet/marvell/mvneta_bm.c
> index 466939f8f0cf..01e3152e76c8 100644
> --- a/drivers/net/ethernet/marvell/mvneta_bm.c
> +++ b/drivers/net/ethernet/marvell/mvneta_bm.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -392,6 +393,18 @@ static void mvneta_bm_put_sram(struct mvneta_bm *priv)
>   MVNETA_BM_BPPI_SIZE);
>  }
>
> +struct mvneta_bm *mvneta_bm_get(struct device_node *node)
> 

Re: [PATCH 0/6] In-kernel QMI handling

2017-08-04 Thread Dan Williams
On Fri, 2017-08-04 at 07:59 -0700, Bjorn Andersson wrote:
> This series starts by moving the common definitions of the QMUX protocol
> to the uapi header, as they are shared with clients - both in kernel and
> userspace.
> 
> This series then introduces in-kernel helper functions for aiding the
> handling of QMI encoded messages in the kernel. QMI encoding is a
> wire-format used in exchanging messages between the majority of QRTR
> clients and services.

This raises a few red flags for me.  So far, we've kept almost
everything QMI-related in userspace and handled all QMI control-channel
messages from libraries like libqmi or uqmi via the cdc-wdm driver and
the "rmnet" interface via the qmi_wwan driver.  The kernel drivers just
serve as the transport.

Can you describe what kinds of in-kernel drivers need to actually parse
QMI messages as part of their operation?

Dan

> It then adds an abstraction to reduce the duplication of common code
> in drivers that need to query the name server and send and receive
> encoded messages to a remote service.
> 
> Finally it introduces a sample implementation for showing QRTR and
> the QMI helpers in action. The sample device instantiates in response
> to finding the "test service" and implements the "test protocol".
> 
> Bjorn Andersson (6):
>   net: qrtr: Invoke sk_error_report() after setting sk_err
>   net: qrtr: Move constants to header file
>   net: qrtr: Add control packet definition to uapi
>   soc: qcom: Introduce QMI encoder/decoder
>   soc: qcom: Introduce QMI helpers
>   samples: Introduce Qualcomm QRTR sample client
> 
>  drivers/soc/qcom/Kconfig  |   8 +
>  drivers/soc/qcom/Makefile |   3 +
>  drivers/soc/qcom/qmi_encdec.c | 812
> ++
>  drivers/soc/qcom/qmi_interface.c  | 540 +
>  include/linux/soc/qcom/qmi.h  | 249 
>  include/uapi/linux/qrtr.h |  35 ++
>  net/qrtr/qrtr.c   |  16 +-
>  samples/Kconfig   |   8 +
>  samples/Makefile  |   2 +-
>  samples/qrtr/Makefile |   1 +
>  samples/qrtr/qrtr_sample_client.c | 603 
>  11 files changed, 2261 insertions(+), 16 deletions(-)
>  create mode 100644 drivers/soc/qcom/qmi_encdec.c
>  create mode 100644 drivers/soc/qcom/qmi_interface.c
>  create mode 100644 include/linux/soc/qcom/qmi.h
>  create mode 100644 samples/qrtr/Makefile
>  create mode 100644 samples/qrtr/qrtr_sample_client.c
> 


Re: [PATCH net-next v2 00/13] Change DSA's FDB API and perform switchdev cleanup

2017-08-04 Thread Vivien Didelot
Hi Arkadi, Jiri,

Jiri Pirko  writes:

>>It seems impossible currently to make the self the default, and this
>>introduces a regression which you don't approve of, so it seems there
>>are few options left:
>>
>>a) Leave two ways to add an fdb entry: through the bridge (by using the
>>   master flag), which is introduced in this patchset, and by using the
>>   self flag, which is the legacy way. In this way no regression will be
>>   introduced, yet it feels a bit confusing. The benefit is that we
>>   (DSA/mlxsw) will be synced.
>>b) Leave only the self (which means removing patch no 4,5).
>
> I believe that option a) is the correct way to go. Introduction of the
> self inclusion was a mistake from the very beginning. I think that we
> should just move on and correct this mistake.
>
> Vivien, any arguments against a)?

I do agree with a). Arkadi, when moving switchdev implementations inside
of DSA core, can I ask you to move the ones considered legacy into
legacy.c and ideally comment them? Configuration from userspace is
still very confusing and this will remind us to get rid of it one day.


Thanks,

Vivien


Re: [net-next PATCH v2] net: comment fixes against BPF devmap helper calls

2017-08-04 Thread Daniel Borkmann

On 08/04/2017 05:24 PM, John Fastabend wrote:

Update BPF comments to accurately reflect XDP usage.

Fixes: 97f91a7cf04ff ("bpf: add bpf_redirect_map helper routine")
Reported-by: Alexei Starovoitov 
Signed-off-by: John Fastabend 


Acked-by: Daniel Borkmann 

Thanks!


Re: [PATCH 1/2] net: mvneta: remove bogus use of

2017-08-04 Thread Gregory CLEMENT
Hi Rob,
 
 On Thu., Jul. 20 2017, Rob Herring  wrote:

> On Thu, Jul 20, 2017 at 10:06 AM, Gregory CLEMENT
>  wrote:
>> Hi Rob,
>>
>>  On Thu., Jul. 20 2017, Rob Herring  wrote:
>>
>> (Adding Marcin in CC who wrote this part of code)
>>
>>> Nothing ever sets data, so it is always NULL. Remove it as this is
>>> the only user of data ptr in the whole kernel, and it is going to be
>>> removed from struct device_node.
>>
>> Actually the use of device_node.data ptr is not bogus and it is set in
>> mvneta_bm_probe:
>> http://elixir.free-electrons.com/linux/latest/source/drivers/net/ethernet/marvell/mvneta_bm.c#L433
>
> Indeed. Looks like some complicated Kconfig logic, so I hadn't been
> able to trigger a build failure, nor did 0-day (so far).
>
>> Your patch will break the BM support on this driver. So if you need to
>> remove this data ptr, then you have to offer an alternative for it.
>
> How about something like this (WS damaged) patch:

I finally took time to test your patch. There were some missing parts
which prevented it from building, like including linux/of_platform.h, or
providing a stub function when CONFIG_MVNETA_BM was not enabled.

Also, the fact that you still called mvneta_bm_port_init() even if bm_priv
was NULL was not really nice. So I propose the following patch, which I
tested on a Clearfog with and without CONFIG_MVNETA_BM enabled.

From 03c4028bc1f52d3d214e8506d9f0f0d3985d Mon Sep 17 00:00:00 2001
From: Gregory CLEMENT 
Date: Fri, 4 Aug 2017 17:18:38 +0200
Subject: [PATCH] net: mvneta: remove data pointer usage from device_node
 structure

In order to be able to remove the data pointer from the device_node
structure, we have to modify the way the BM resources are shared between
the mvneta ports.

Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/mvneta.c| 18 --
 drivers/net/ethernet/marvell/mvneta_bm.c | 13 +
 drivers/net/ethernet/marvell/mvneta_bm.h |  8 ++--
 3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 63b6147753fe..fd84447582f7 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -4295,12 +4295,16 @@ static int mvneta_probe(struct platform_device *pdev)
 
/* Obtain access to BM resources if enabled and already initialized */
bm_node = of_parse_phandle(dn, "buffer-manager", 0);
-   if (bm_node && bm_node->data) {
-   pp->bm_priv = bm_node->data;
-   err = mvneta_bm_port_init(pdev, pp);
-   if (err < 0) {
-   dev_info(&pdev->dev, "use SW buffer management\n");
-   pp->bm_priv = NULL;
+   if (bm_node) {
+   pp->bm_priv = mvneta_bm_get(bm_node);
+   if (pp->bm_priv) {
+   err = mvneta_bm_port_init(pdev, pp);
+   if (err < 0) {
+   dev_info(&pdev->dev,
+            "use SW buffer management\n");
+   mvneta_bm_put(pp->bm_priv);
+   pp->bm_priv = NULL;
+   }
}
}
of_node_put(bm_node);
@@ -4369,6 +4373,7 @@ static int mvneta_probe(struct platform_device *pdev)
mvneta_bm_pool_destroy(pp->bm_priv, pp->pool_long, 1 << pp->id);
mvneta_bm_pool_destroy(pp->bm_priv, pp->pool_short,
   1 << pp->id);
+   mvneta_bm_put(pp->bm_priv);
}
 err_free_stats:
free_percpu(pp->stats);
@@ -4410,6 +4415,7 @@ static int mvneta_remove(struct platform_device *pdev)
mvneta_bm_pool_destroy(pp->bm_priv, pp->pool_long, 1 << pp->id);
mvneta_bm_pool_destroy(pp->bm_priv, pp->pool_short,
   1 << pp->id);
+   mvneta_bm_put(pp->bm_priv);
}
 
return 0;
diff --git a/drivers/net/ethernet/marvell/mvneta_bm.c 
b/drivers/net/ethernet/marvell/mvneta_bm.c
index 466939f8f0cf..01e3152e76c8 100644
--- a/drivers/net/ethernet/marvell/mvneta_bm.c
+++ b/drivers/net/ethernet/marvell/mvneta_bm.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include <linux/of_platform.h>
 #include 
 #include 
 #include 
@@ -392,6 +393,18 @@ static void mvneta_bm_put_sram(struct mvneta_bm *priv)
  MVNETA_BM_BPPI_SIZE);
 }
 
+struct mvneta_bm *mvneta_bm_get(struct device_node *node)
+{
+   struct platform_device *pdev = of_find_device_by_node(node);
+
+   return pdev ? platform_get_drvdata(pdev) : NULL;
+}
+
+void mvneta_bm_put(struct mvneta_bm *priv)
+{
+   platform_device_put(priv->pdev);
+}
+
 static int mvneta_bm_probe(struct platform_device *pdev)
 {
struct device_node *dn = pdev->dev.of_node;
diff 
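
The mvneta_bm.h hunk is cut off in the archive. Going by Gregory's note above
about needing stubs when CONFIG_MVNETA_BM is disabled, the missing part
presumably looks roughly like the sketch below; the declarations follow the
new functions in the patch, while the stub bodies are an assumption.

/* Rough sketch of the truncated mvneta_bm.h change (not the actual hunk):
 * real declarations for BM-enabled builds, inline stubs otherwise, so that
 * mvneta.c keeps building when CONFIG_MVNETA_BM is not enabled.
 */
#ifdef CONFIG_MVNETA_BM
struct mvneta_bm *mvneta_bm_get(struct device_node *node);
void mvneta_bm_put(struct mvneta_bm *priv);
#else
static inline struct mvneta_bm *mvneta_bm_get(struct device_node *node)
{
	return NULL;
}

static inline void mvneta_bm_put(struct mvneta_bm *priv)
{
}
#endif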

[net-next PATCH v2] net: comment fixes against BPF devmap helper calls

2017-08-04 Thread John Fastabend
Update BPF comments to accurately reflect XDP usage.

Fixes: 97f91a7cf04ff ("bpf: add bpf_redirect_map helper routine")
Reported-by: Alexei Starovoitov 
Signed-off-by: John Fastabend 
---
 include/uapi/linux/bpf.h |   16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1106a8c..1d06be1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -345,14 +345,20 @@ enum bpf_attach_type {
  * int bpf_redirect(ifindex, flags)
  * redirect to another netdev
  * @ifindex: ifindex of the net device
- * @flags: bit 0 - if set, redirect to ingress instead of egress
- * other bits - reserved
- * Return: TC_ACT_REDIRECT
- * int bpf_redirect_map(key, map, flags)
+ * @flags:
+ *   cls_bpf:
+ *  bit 0 - if set, redirect to ingress instead of egress
+ *  other bits - reserved
+ *   xdp_bpf:
+ * all bits - reserved
+ * Return: cls_bpf: TC_ACT_REDIRECT on success or TC_ACT_SHOT on error
+ *xdp_bpf: XDP_REDIRECT on success or XDP_ABORT on error
+ * int bpf_redirect_map(map, key, flags)
  * redirect to endpoint in map
+ * @map: pointer to dev map
  * @key: index in map to lookup
- * @map: fd of map to do lookup in
  * @flags: --
+ * Return: XDP_REDIRECT on success or XDP_ABORT on error
  *
  * u32 bpf_get_route_realm(skb)
  * retrieve a dst's tclassid
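
For illustration, a minimal XDP program that exercises bpf_redirect_map()
against a devmap might look like the sketch below. It is written with
current libbpf conventions; the map name, its size and the key are made up.

/* Minimal sketch: redirect every frame through slot 0 of a devmap.
 * Map name, size and key are illustrative only.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_DEVMAP);
	__uint(max_entries, 64);
	__type(key, __u32);
	__type(value, __u32);		/* ifindex of the egress netdev */
} tx_ports SEC(".maps");

SEC("xdp")
int xdp_devmap_redirect(struct xdp_md *ctx)
{
	__u32 slot = 0;

	/* For XDP all flag bits are reserved, so flags must be 0; the helper
	 * returns XDP_REDIRECT on success and an abort code otherwise.
	 */
	return bpf_redirect_map(&tx_ports, slot, 0);
}

char _license[] SEC("license") = "GPL";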



[PATCH 0/6] In-kernel QMI handling

2017-08-04 Thread Bjorn Andersson
This series starts by moving the common definitions of the QMUX protocol to the
uapi header, as they are shared with clients - both in kernel and userspace.

This series then introduces in-kernel helper functions for aiding the handling
of QMI encoded messages in the kernel. QMI encoding is a wire-format used in
exchanging messages between the majority of QRTR clients and services.

It then adds an abstraction to reduce the duplication of common code in
drivers that need to query the name server and send and receive encoded
messages to a remote service.

Finally it introduces a sample implementation for showing QRTR and the QMI
helpers in action. The sample device instantiates in response to finding the
"test service" and implements the "test protocol".

Bjorn Andersson (6):
  net: qrtr: Invoke sk_error_report() after setting sk_err
  net: qrtr: Move constants to header file
  net: qrtr: Add control packet definition to uapi
  soc: qcom: Introduce QMI encoder/decoder
  soc: qcom: Introduce QMI helpers
  samples: Introduce Qualcomm QRTR sample client

 drivers/soc/qcom/Kconfig  |   8 +
 drivers/soc/qcom/Makefile |   3 +
 drivers/soc/qcom/qmi_encdec.c | 812 ++
 drivers/soc/qcom/qmi_interface.c  | 540 +
 include/linux/soc/qcom/qmi.h  | 249 
 include/uapi/linux/qrtr.h |  35 ++
 net/qrtr/qrtr.c   |  16 +-
 samples/Kconfig   |   8 +
 samples/Makefile  |   2 +-
 samples/qrtr/Makefile |   1 +
 samples/qrtr/qrtr_sample_client.c | 603 
 11 files changed, 2261 insertions(+), 16 deletions(-)
 create mode 100644 drivers/soc/qcom/qmi_encdec.c
 create mode 100644 drivers/soc/qcom/qmi_interface.c
 create mode 100644 include/linux/soc/qcom/qmi.h
 create mode 100644 samples/qrtr/Makefile
 create mode 100644 samples/qrtr/qrtr_sample_client.c

-- 
2.12.0



[PATCH 1/6] net: qrtr: Invoke sk_error_report() after setting sk_err

2017-08-04 Thread Bjorn Andersson
Rather than manually waking up any context sleeping on the sock to
signal an error, we should call sk_error_report(). This has the added
benefit that in-kernel consumers can override this notification with
their own callback.

Signed-off-by: Bjorn Andersson 
---
 net/qrtr/qrtr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/qrtr/qrtr.c b/net/qrtr/qrtr.c
index 5586609afa27..2058b27821a4 100644
--- a/net/qrtr/qrtr.c
+++ b/net/qrtr/qrtr.c
@@ -541,7 +541,7 @@ static void qrtr_reset_ports(void)
 
 sock_hold(&ipc->sk);
 ipc->sk.sk_err = ENETRESET;
-   wake_up_interruptible(sk_sleep(&ipc->sk));
+   ipc->sk.sk_error_report(&ipc->sk);
 sock_put(&ipc->sk);
 }
 mutex_unlock(&qrtr_port_lock);
-- 
2.12.0
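
To illustrate the "in-kernel consumers" point, a kernel-side user of a qrtr
socket could hook the callback roughly like this; the function names below
are made up and not part of the patch.

#include <linux/net.h>
#include <linux/printk.h>
#include <net/sock.h>

/* Sketch only: replace the default sk_error_report() on a kernel-created
 * AF_QRTR socket so the driver is told about ENETRESET directly.
 */
static void my_qrtr_error_report(struct sock *sk)
{
	/* e.g. wake a private waitqueue or schedule reconnect work */
	pr_debug("qrtr socket error: %d\n", sk->sk_err);
}

static void my_qrtr_hook_errors(struct socket *sock)
{
	sock->sk->sk_error_report = my_qrtr_error_report;
}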



[PATCH 4/6] soc: qcom: Introduce QMI encoder/decoder

2017-08-04 Thread Bjorn Andersson
Add the helper library for encoding and decoding QMI encoded messages.
The implementation is taken from lib/qmi_encdec.c of the Qualcomm kernel
(msm-3.18).

Modifications have been made to the public API, source buffers have been
made const, and the debug-logging part was omitted for now.

Signed-off-by: Bjorn Andersson 
---
 drivers/soc/qcom/Kconfig  |   8 +
 drivers/soc/qcom/Makefile |   2 +
 drivers/soc/qcom/qmi_encdec.c | 812 ++
 include/linux/soc/qcom/qmi.h  | 116 ++
 4 files changed, 938 insertions(+)
 create mode 100644 drivers/soc/qcom/qmi_encdec.c
 create mode 100644 include/linux/soc/qcom/qmi.h

diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
index 9fca977ef18d..2541ae07ad2a 100644
--- a/drivers/soc/qcom/Kconfig
+++ b/drivers/soc/qcom/Kconfig
@@ -24,6 +24,14 @@ config QCOM_PM
  modes. It interface with various system drivers to put the cores in
  low power modes.
 
+config QCOM_QMI_HELPERS
+   bool
+   help
+ Helper library for handling QMI encoded messages.  QMI encoded
+ messages are used in communication between the majority of QRTR
+ clients and these helpers provide the common functionality needed for
+ doing this from a kernel driver.
+
 config QCOM_SMEM
tristate "Qualcomm Shared Memory Manager (SMEM)"
depends on ARCH_QCOM
diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
index 414f0de274fa..27b60da7a062 100644
--- a/drivers/soc/qcom/Makefile
+++ b/drivers/soc/qcom/Makefile
@@ -1,6 +1,8 @@
 obj-$(CONFIG_QCOM_GSBI)+=  qcom_gsbi.o
 obj-$(CONFIG_QCOM_MDT_LOADER)  += mdt_loader.o
 obj-$(CONFIG_QCOM_PM)  +=  spm.o
+obj-$(CONFIG_QCOM_QMI_HELPERS) += qmi_helpers.o
+qmi_helpers-y  += qmi_encdec.o
 obj-$(CONFIG_QCOM_SMD_RPM) += smd-rpm.o
 obj-$(CONFIG_QCOM_SMEM) += smem.o
 obj-$(CONFIG_QCOM_SMEM_STATE) += smem_state.o
diff --git a/drivers/soc/qcom/qmi_encdec.c b/drivers/soc/qcom/qmi_encdec.c
new file mode 100644
index ..4eb3099c64e7
--- /dev/null
+++ b/drivers/soc/qcom/qmi_encdec.c
@@ -0,0 +1,812 @@
+/*
+ * Copyright (c) 2012-2015, The Linux Foundation. All rights reserved.
+ * Copyright (C) 2017 Linaro Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define QMI_ENCDEC_ENCODE_TLV(type, length, p_dst) do { \
+   *p_dst++ = type; \
+   *p_dst++ = ((uint8_t)((length) & 0xFF)); \
+   *p_dst++ = ((uint8_t)(((length) >> 8) & 0xFF)); \
+} while (0)
+
+#define QMI_ENCDEC_DECODE_TLV(p_type, p_length, p_src) do { \
+   *p_type = (uint8_t)*p_src++; \
+   *p_length = (uint8_t)*p_src++; \
+   *p_length |= ((uint8_t)*p_src) << 8; \
+} while (0)
+
+#define QMI_ENCDEC_ENCODE_N_BYTES(p_dst, p_src, size) \
+do { \
+   memcpy(p_dst, p_src, size); \
+   p_dst = (uint8_t *)p_dst + size; \
+   p_src = (uint8_t *)p_src + size; \
+} while (0)
+
+#define QMI_ENCDEC_DECODE_N_BYTES(p_dst, p_src, size) \
+do { \
+   memcpy(p_dst, p_src, size); \
+   p_dst = (uint8_t *)p_dst + size; \
+   p_src = (uint8_t *)p_src + size; \
+} while (0)
+
+#define UPDATE_ENCODE_VARIABLES(temp_si, buf_dst, \
+   encoded_bytes, tlv_len, encode_tlv, rc) \
+do { \
+   buf_dst = (uint8_t *)buf_dst + rc; \
+   encoded_bytes += rc; \
+   tlv_len += rc; \
+   temp_si = temp_si + 1; \
+   encode_tlv = 1; \
+} while (0)
+
+#define UPDATE_DECODE_VARIABLES(buf_src, decoded_bytes, rc) \
+do { \
+   buf_src = (uint8_t *)buf_src + rc; \
+   decoded_bytes += rc; \
+} while (0)
+
+#define TLV_LEN_SIZE sizeof(uint16_t)
+#define TLV_TYPE_SIZE sizeof(uint8_t)
+#define OPTIONAL_TLV_TYPE_START 0x10
+
+static int qmi_encode(struct qmi_elem_info *ei_array, void *out_buf,
+ const void *in_c_struct, uint32_t out_buf_len,
+ int enc_level);
+
+static int qmi_decode(struct qmi_elem_info *ei_array, void *out_c_struct,
+ const void *in_buf, uint32_t in_buf_len, int dec_level);
+
+/**
+ * skip_to_next_elem() - Skip to next element in the structure to be encoded
+ * @ei_array: Struct info describing the element to be skipped.
+ * @level: Depth level of encoding/decoding to identify nested structures.
+ *
+ * Returns struct info of the next element that can be encoded.
+ *
+ * This function is used while encoding optional elements. If the flag
+ * corresponding to an optional element is 
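
As a worked illustration of the TLV layout that QMI_ENCDEC_ENCODE_TLV()
above produces, the hand-rolled snippet below emits a type-0x01 element
carrying the 4-byte value "ping": one byte of type, a little-endian 16-bit
length, then the payload. The function name is made up for the example.

#include <stdint.h>
#include <string.h>

/* Hand-rolled equivalent of QMI_ENCDEC_ENCODE_TLV() plus the value copy. */
static size_t encode_ping_tlv(uint8_t *dst)
{
	const char value[4] = { 'p', 'i', 'n', 'g' };
	uint16_t length = sizeof(value);
	uint8_t *p = dst;

	*p++ = 0x01;				/* TLV type */
	*p++ = (uint8_t)(length & 0xFF);	/* length, low byte */
	*p++ = (uint8_t)((length >> 8) & 0xFF);	/* length, high byte */
	memcpy(p, value, length);

	return 3 + length;	/* TLV_TYPE_SIZE + TLV_LEN_SIZE + payload */
}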

[PATCH 2/6] net: qrtr: Move constants to header file

2017-08-04 Thread Bjorn Andersson
The constants are used by both the name server and clients, so make
their value explicit and move them to the uapi header.

Signed-off-by: Bjorn Andersson 
---
 include/uapi/linux/qrtr.h | 3 +++
 net/qrtr/qrtr.c   | 2 --
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/qrtr.h b/include/uapi/linux/qrtr.h
index 9d76c566f66e..63e8803e4d90 100644
--- a/include/uapi/linux/qrtr.h
+++ b/include/uapi/linux/qrtr.h
@@ -4,6 +4,9 @@
 #include 
 #include 
 
+#define QRTR_NODE_BCAST	0xffffffffu
+#define QRTR_PORT_CTRL	0xfffffffeu
+
 struct sockaddr_qrtr {
__kernel_sa_family_t sq_family;
__u32 sq_node;
diff --git a/net/qrtr/qrtr.c b/net/qrtr/qrtr.c
index 2058b27821a4..0d7d3968414e 100644
--- a/net/qrtr/qrtr.c
+++ b/net/qrtr/qrtr.c
@@ -61,8 +61,6 @@ struct qrtr_hdr {
 } __packed;
 
 #define QRTR_HDR_SIZE sizeof(struct qrtr_hdr)
-#define QRTR_NODE_BCAST ((unsigned int)-1)
-#define QRTR_PORT_CTRL ((unsigned int)-2)
 
 struct qrtr_sock {
/* WARNING: sk must be the first member */
-- 
2.12.0
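
A hedged sketch of why these values belong in uapi: once QRTR_PORT_CTRL is
visible, a userspace tool can address the control port directly. The snippet
assumes a libc that already defines AF_QRTR and a local node id of 1, and
error handling is trimmed.

#include <linux/qrtr.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sketch: open an AF_QRTR socket and aim it at the control port. */
static int open_ctrl_port(void)
{
	struct sockaddr_qrtr sq;
	int fd;

	fd = socket(AF_QRTR, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	memset(&sq, 0, sizeof(sq));
	sq.sq_family = AF_QRTR;
	sq.sq_node = 1;			/* assumed local node id */
	sq.sq_port = QRTR_PORT_CTRL;	/* the now-public control port */

	if (connect(fd, (struct sockaddr *)&sq, sizeof(sq)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}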



[PATCH 3/6] net: qrtr: Add control packet definition to uapi

2017-08-04 Thread Bjorn Andersson
The QMUX protocol specification defines the structure of the special
control packet messages sent between handlers of the control port.

Add these to the uapi header, as this structure and the associated types
are shared between the kernel and all userspace handlers of control
messages.

Signed-off-by: Bjorn Andersson 
---
 include/uapi/linux/qrtr.h | 32 
 net/qrtr/qrtr.c   | 12 
 2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/include/uapi/linux/qrtr.h b/include/uapi/linux/qrtr.h
index 63e8803e4d90..179af64846e0 100644
--- a/include/uapi/linux/qrtr.h
+++ b/include/uapi/linux/qrtr.h
@@ -13,4 +13,36 @@ struct sockaddr_qrtr {
__u32 sq_port;
 };
 
+enum qrtr_pkt_type {
+   QRTR_TYPE_DATA  = 1,
+   QRTR_TYPE_HELLO = 2,
+   QRTR_TYPE_BYE   = 3,
+   QRTR_TYPE_NEW_SERVER= 4,
+   QRTR_TYPE_DEL_SERVER= 5,
+   QRTR_TYPE_DEL_CLIENT= 6,
+   QRTR_TYPE_RESUME_TX = 7,
+   QRTR_TYPE_EXIT  = 8,
+   QRTR_TYPE_PING  = 9,
+   QRTR_TYPE_NEW_LOOKUP= 10,
+   QRTR_TYPE_DEL_LOOKUP= 11,
+};
+
+struct qrtr_ctrl_pkt {
+   __le32 cmd;
+
+   union {
+   struct {
+   __le32 service;
+   __le32 instance;
+   __le32 node;
+   __le32 port;
+   } server;
+
+   struct {
+   __le32 node;
+   __le32 port;
+   } client;
+   };
+} __packed;
+
 #endif /* _LINUX_QRTR_H */
diff --git a/net/qrtr/qrtr.c b/net/qrtr/qrtr.c
index 0d7d3968414e..fac7cd6ea445 100644
--- a/net/qrtr/qrtr.c
+++ b/net/qrtr/qrtr.c
@@ -26,18 +26,6 @@
 #define QRTR_MIN_EPH_SOCKET 0x4000
 #define QRTR_MAX_EPH_SOCKET 0x7fff
 
-enum qrtr_pkt_type {
-   QRTR_TYPE_DATA  = 1,
-   QRTR_TYPE_HELLO = 2,
-   QRTR_TYPE_BYE   = 3,
-   QRTR_TYPE_NEW_SERVER= 4,
-   QRTR_TYPE_DEL_SERVER= 5,
-   QRTR_TYPE_DEL_CLIENT= 6,
-   QRTR_TYPE_RESUME_TX = 7,
-   QRTR_TYPE_EXIT  = 8,
-   QRTR_TYPE_PING  = 9,
-};
-
 /**
  * struct qrtr_hdr - (I|R)PCrouter packet header
  * @version: protocol version
-- 
2.12.0
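
To make the layout concrete, here is a hedged sketch of filling the new
structure for a NEW_LOOKUP request; service 15 matches the test service
mentioned in the cover letter, and exactly which union members NEW_LOOKUP
consumes is an assumption here.

#include <linux/qrtr.h>
#include <linux/string.h>
#include <asm/byteorder.h>

/* Sketch: ask the name server to report servers of the test service. */
static void fill_test_lookup(struct qrtr_ctrl_pkt *pkt)
{
	memset(pkt, 0, sizeof(*pkt));
	pkt->cmd = cpu_to_le32(QRTR_TYPE_NEW_LOOKUP);
	pkt->server.service = cpu_to_le32(15);	/* test service */
	pkt->server.instance = cpu_to_le32(0);	/* any instance */
}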



[PATCH 5/6] soc: qcom: Introduce QMI helpers

2017-08-04 Thread Bjorn Andersson
Drivers that need to communicate with a remote QMI service all have to
perform the operations of discovering the service, encoding and decoding
the messages, and operating the socket. This introduces an abstraction for
these common operations, reducing most of the duplication in such cases.

Signed-off-by: Bjorn Andersson 
---
 drivers/soc/qcom/Makefile|   1 +
 drivers/soc/qcom/qmi_interface.c | 540 +++
 include/linux/soc/qcom/qmi.h | 133 ++
 3 files changed, 674 insertions(+)
 create mode 100644 drivers/soc/qcom/qmi_interface.c

diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
index 27b60da7a062..812402ae9cfa 100644
--- a/drivers/soc/qcom/Makefile
+++ b/drivers/soc/qcom/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_QCOM_MDT_LOADER)   += mdt_loader.o
 obj-$(CONFIG_QCOM_PM)  +=  spm.o
 obj-$(CONFIG_QCOM_QMI_HELPERS) += qmi_helpers.o
 qmi_helpers-y  += qmi_encdec.o
+qmi_helpers-y  += qmi_interface.o
 obj-$(CONFIG_QCOM_SMD_RPM) += smd-rpm.o
 obj-$(CONFIG_QCOM_SMEM) += smem.o
 obj-$(CONFIG_QCOM_SMEM_STATE) += smem_state.o
diff --git a/drivers/soc/qcom/qmi_interface.c b/drivers/soc/qcom/qmi_interface.c
new file mode 100644
index ..41853a9becfd
--- /dev/null
+++ b/drivers/soc/qcom/qmi_interface.c
@@ -0,0 +1,540 @@
+/*
+ * Qualcomm QMI interface helpers
+ *
+ * Copyright (C) 2017 Linaro Ltd.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * qrtr_client_new_server() - handler of NEW_SERVER control message
+ * @qrtr:  qrtr handle
+ * @node:  node of the new server
+ * @port:  port of the new server
+ *
+ * Calls the new_server callback to inform the client about a newly registered
+ * server matching the currently registered service lookup.
+ */
+static void qrtr_client_new_server(struct qrtr_handle *qrtr,
+  unsigned int node, unsigned int port)
+{
+   struct qrtr_handle_ops *ops = &qrtr->ops;
+   struct qrtr_service *service;
+   int ret;
+
+   if (!ops->new_server)
+   return;
+
+   /* Ignore EOF marker */
+   if (!node && !port)
+   return;
+
+   service = kzalloc(sizeof(*service), GFP_KERNEL);
+   if (!service)
+   return;
+
+   service->node = node;
+   service->port = port;
+
+   ret = ops->new_server(qrtr, service);
+   if (ret < 0)
+   kfree(service);
+   else
+   list_add(&service->list_node, &qrtr->services);
+}
+
+/**
+ * qrtr_client_del_server() - handler of DEL_SERVER control message
+ * @qrtr:  qrtr handle
+ * @node:  node of the dying server, a value of -1 matches all nodes
+ * @port:  port of the dying server, a value of -1 matches all ports
+ *
+ * Calls the del_server callback for each previously seen server, allowing the
+ * client to react to the disappearing server.
+ */
+static void qrtr_client_del_server(struct qrtr_handle *qrtr,
+  unsigned int node, unsigned int port)
+{
+   struct qrtr_handle_ops *ops = &qrtr->ops;
+   struct qrtr_service *service;
+   struct qrtr_service *tmp;
+
+   list_for_each_entry_safe(service, tmp, &qrtr->services, list_node) {
+   if (node != -1 && service->node != node)
+   continue;
+   if (port != -1 && service->port != port)
+   continue;
+
+   if (ops->del_server)
+   ops->del_server(qrtr, service);
+
+   list_del(&service->list_node);
+   kfree(service);
+   }
+}
+
+static void qrtr_client_ctrl_pkt(struct qrtr_handle *qrtr,
+const void *buf, size_t len)
+{
+   const struct qrtr_ctrl_pkt *pkt = buf;
+
+   if (len < sizeof(struct qrtr_ctrl_pkt)) {
+   pr_debug("ignoring short control packet\n");
+   return;
+   }
+
+   switch (le32_to_cpu(pkt->cmd)) {
+   case QRTR_TYPE_NEW_SERVER:
+   qrtr_client_new_server(qrtr,
+  le32_to_cpu(pkt->server.node),
+  le32_to_cpu(pkt->server.port));
+   break;
+   case QRTR_TYPE_DEL_SERVER:
+   qrtr_client_del_server(qrtr,
+  le32_to_cpu(pkt->server.node),
+  le32_to_cpu(pkt->server.port));
+   break;
+  
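
The excerpt stops here. To show how the two callbacks above are meant to be
consumed, a sketch of a client's ops table follows; the callback signatures
are inferred from the calls in the excerpt, and everything else is assumed.

#include <linux/printk.h>

/* Sketch of a consumer of the helpers above; signatures inferred from how
 * ops->new_server()/ops->del_server() are invoked in the excerpt.
 */
static int my_new_server(struct qrtr_handle *qrtr,
			 struct qrtr_service *service)
{
	pr_info("test service up at node %u port %u\n",
		service->node, service->port);
	return 0;	/* non-negative keeps it on the handle's service list */
}

static void my_del_server(struct qrtr_handle *qrtr,
			  struct qrtr_service *service)
{
	pr_info("test service at node %u port %u gone\n",
		service->node, service->port);
}

static struct qrtr_handle_ops my_qrtr_ops = {
	.new_server = my_new_server,
	.del_server = my_del_server,
};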

[PATCH 6/6] samples: Introduce Qualcomm QRTR sample client

2017-08-04 Thread Bjorn Andersson
Introduce a sample driver that registers for server notifications and
spawns clients for each available test service (service 15). The spawned
clients implement the interface for encoding "ping" and "data" requests
and decoding the responses from the remote.

Signed-off-by: Bjorn Andersson 
---
 samples/Kconfig   |   8 +
 samples/Makefile  |   2 +-
 samples/qrtr/Makefile |   1 +
 samples/qrtr/qrtr_sample_client.c | 603 ++
 4 files changed, 613 insertions(+), 1 deletion(-)
 create mode 100644 samples/qrtr/Makefile
 create mode 100644 samples/qrtr/qrtr_sample_client.c

diff --git a/samples/Kconfig b/samples/Kconfig
index 9cb63188d3ef..18796928ab58 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -62,6 +62,14 @@ config SAMPLE_KDB
  Build an example of how to dynamically add the hello
  command to the kdb shell.
 
+config SAMPLE_QRTR_CLIENT
+   tristate "Build qrtr client sample -- loadable modules only"
+   depends on m
+   select QCOM_QMI_HELPERS
+   help
+ Build a qrtr client sample driver, which demonstrates how to
+ communicate with a remote QRTR service, using QMI encoded messages.
+
 config SAMPLE_RPMSG_CLIENT
tristate "Build rpmsg client sample -- loadable modules only"
depends on RPMSG && m
diff --git a/samples/Makefile b/samples/Makefile
index db54e766ddb1..4bf64276860c 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -3,4 +3,4 @@
 obj-$(CONFIG_SAMPLES)  += kobject/ kprobes/ trace_events/ livepatch/ \
   hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/ \
   configfs/ connector/ v4l/ trace_printk/ blackfin/ \
-  vfio-mdev/ statx/
+  vfio-mdev/ statx/ qrtr/
diff --git a/samples/qrtr/Makefile b/samples/qrtr/Makefile
new file mode 100644
index ..3f2c2cfdf2e7
--- /dev/null
+++ b/samples/qrtr/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_SAMPLE_QRTR_CLIENT) += qrtr_sample_client.o
diff --git a/samples/qrtr/qrtr_sample_client.c 
b/samples/qrtr/qrtr_sample_client.c
new file mode 100644
index ..ccb359de4340
--- /dev/null
+++ b/samples/qrtr/qrtr_sample_client.c
@@ -0,0 +1,603 @@
+/*
+ * Sample QRTR client driver
+ *
+ * Copyright (c) 2013-2014, The Linux Foundation. All rights reserved.
+ * Copyright (C) 2017 Linaro Ltd.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define PING_REQ1_TLV_TYPE 0x1
+#define PING_RESP1_TLV_TYPE	0x2
+#define PING_OPT1_TLV_TYPE 0x10
+#define PING_OPT2_TLV_TYPE 0x11
+
+#define DATA_REQ1_TLV_TYPE 0x1
+#define DATA_RESP1_TLV_TYPE	0x2
+#define DATA_OPT1_TLV_TYPE 0x10
+#define DATA_OPT2_TLV_TYPE 0x11
+
+#define TEST_MED_DATA_SIZE_V01 8192
+#define TEST_MAX_NAME_SIZE_V01 255
+
+#define TEST_PING_REQ_MSG_ID_V01   0x20
+#define TEST_DATA_REQ_MSG_ID_V01   0x21
+
+#define TEST_PING_REQ_MAX_MSG_LEN_V01  266
+#define TEST_DATA_REQ_MAX_MSG_LEN_V01  8456
+
+struct test_name_type_v01 {
+   uint32_t name_len;
+   char name[TEST_MAX_NAME_SIZE_V01];
+};
+
+static struct qmi_elem_info test_name_type_v01_ei[] = {
+   {
+   .data_type  = QMI_DATA_LEN,
+   .elem_len   = 1,
+   .elem_size  = sizeof(uint8_t),
+   .is_array   = NO_ARRAY,
+   .tlv_type   = QMI_COMMON_TLV_TYPE,
+   .offset = offsetof(struct test_name_type_v01,
+  name_len),
+   },
+   {
+   .data_type  = QMI_UNSIGNED_1_BYTE,
+   .elem_len   = TEST_MAX_NAME_SIZE_V01,
+   .elem_size  = sizeof(char),
+   .is_array   = VAR_LEN_ARRAY,
+   .tlv_type   = QMI_COMMON_TLV_TYPE,
+   .offset = offsetof(struct test_name_type_v01,
+  name),
+   },
+   {}
+};
+
+struct test_ping_req_msg_v01 {
+   char ping[4];
+
+   uint8_t client_name_valid;
+   struct test_name_type_v01 client_name;
+};
+
+struct qmi_elem_info test_ping_req_msg_v01_ei[] = {
+   {
+   .data_type  = QMI_UNSIGNED_1_BYTE,
+   .elem_len   = 4,
+   .elem_size  = sizeof(char),
+ 

RE: [PATCH] net: phy: marvell: logical vs bitwise OR typo

2017-08-04 Thread David Laight
From: Dan Carpenter
> Sent: 04 August 2017 09:17
> This was supposed to be a bitwise OR but there is a || vs | typo.
> 
> Fixes: 864dc729d528 ("net: phy: marvell: Refactor m88e1121 RGMII delay 
> configuration")
> Signed-off-by: Dan Carpenter 
> 
> diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> index 361fe9927ef2..15cbcdba618a 100644
> --- a/drivers/net/phy/marvell.c
> +++ b/drivers/net/phy/marvell.c
> @@ -83,7 +83,7 @@
>  #define MII_88E1121_PHY_MSCR_REG 21
>  #define MII_88E1121_PHY_MSCR_RX_DELAY	BIT(5)
>  #define MII_88E1121_PHY_MSCR_TX_DELAY	BIT(4)
> -#define MII_88E1121_PHY_MSCR_DELAY_MASK  (~(BIT(5) || BIT(4)))
> +#define MII_88E1121_PHY_MSCR_DELAY_MASK  (~(BIT(5) | BIT(4)))

Wouldn't:
+#define MII_88E1121_PHY_MSCR_DELAY_MASK \
	(~(MII_88E1121_PHY_MSCR_RX_DELAY | MII_88E1121_PHY_MSCR_TX_DELAY))
be more correct, if a little long?

Actually, the ~ looks odd here... (reads code) ... Kill the define and
explicitly mask off the two values just before conditionally setting them.

David
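
For concreteness, David's suggestion might look roughly like the sketch
below. It ignores the MSCR page selection the real driver performs, and the
function name is made up; treat it as a sketch against the generic
phy_read()/phy_write() accessors rather than an actual patch.

#include <linux/phy.h>

/* Sketch: clear both delay bits in place, then set only what the RGMII
 * interface mode requires, with no DELAY_MASK define at all.
 */
static int m88e1121_set_rgmii_delays(struct phy_device *phydev)
{
	int mscr;

	mscr = phy_read(phydev, MII_88E1121_PHY_MSCR_REG);
	if (mscr < 0)
		return mscr;

	mscr &= ~(MII_88E1121_PHY_MSCR_RX_DELAY |
		  MII_88E1121_PHY_MSCR_TX_DELAY);

	if (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
		mscr |= MII_88E1121_PHY_MSCR_RX_DELAY |
			MII_88E1121_PHY_MSCR_TX_DELAY;
	else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID)
		mscr |= MII_88E1121_PHY_MSCR_RX_DELAY;
	else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID)
		mscr |= MII_88E1121_PHY_MSCR_TX_DELAY;

	return phy_write(phydev, MII_88E1121_PHY_MSCR_REG, mscr);
}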


