date:20170210

Re: [PATH v3 net-next] net: remove member 'max' of struct scm_fp_list

2017-02-10 Thread David Miller

From: yuan linyu 
Date: Sat, 11 Feb 2017 11:41:17 +0800

> From: yuan linyu 
> 
> 'max' only used at three places in scm.c,
> 1. in scm_fp_copy(), fpl->max = SCM_MAX_FD;
> 2. in scm_fp_copy(), if (fpl->count + num > fpl->max)
> 3. in scm_fp_dup(), new_fpl->max = new_fpl->count;
> at place 3, the worst case is new_fpl->count = SCM_MAX_FD,
> so do a full size dup, then 'max' field will always
> SCM_MAX_FD and it can be removed.
> 
> Signed-off-by: yuan linyu 

Please don't take this the wrong way, but I am ignoring your
patches on this issue.  This is even more broken than your
previous two submissions.

Sorry.

Re: [RFC][PATCH] nfsd: add +1 to reference counting scheme for struct nfsd4_session

2017-02-10 Thread David Windsor



> Signed-off-by: David Windsor 
> ---
>  fs/nfsd/nfs4state.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index a0dee8a..b0f3010 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -196,7 +196,7 @@ static void nfsd4_put_session_locked(struct nfsd4_session 
> *ses)
>
> lockdep_assert_held(>client_lock);
>
> -   if (atomic_dec_and_test(&ses->se_ref) && is_session_dead(ses))
> +   if (!atomic_add_unless(&ses->se_ref, -1, 1) && is_session_des(ses))

This should read:
if (!atomic_add_unless(&ses->se_ref, -1, 1) && is_session_dead(ses))

> free_session(ses);
> put_client_renew_locked(clp);
>  }
> @@ -1645,7 +1645,7 @@ static void init_session(struct svc_rqst *rqstp, struct 
> nfsd4_session *new, stru
> new->se_flags = cses->flags;
> new->se_cb_prog = cses->callback_prog;
> new->se_cb_sec = cses->cb_sec;
> -   atomic_set(&new->se_ref, 0);
> +   atomic_set(&new->se_ref, 1);
> idx = hash_sessionid(>se_sessionid);
> list_add(>se_hash, >sessionid_hashtbl[idx]);
> spin_lock(>cl_lock);
> @@ -1792,7 +1792,7 @@ free_client(struct nfs4_client *clp)
> ses = list_entry(clp->cl_sessions.next, struct nfsd4_session,
> se_perclnt);
> list_del(>se_perclnt);
> -   WARN_ON_ONCE(atomic_read(&ses->se_ref));
> +   WARN_ON_ONCE((atomic_read(&ses->se_ref) > 1));
> free_session(ses);
> }
> rpc_destroy_wait_queue(>cl_cb_waitq);
> --
> 2.7.4
>

[RFC][PATCH] nfsd: add +1 to reference counting scheme for struct nfsd4_session

2017-02-10 Thread David Windsor

In furtherance of the KSPP effort to add overflow protection to kernel
reference counters, a new type (refcount_t) and API have been created.
Part of the refcount_t API is refcount_inc(), which will not increment a
refcount_t variable if its value is 0 (as this would indicate a possible
use-after-free condition). 

In auditing the kernel for refcounting corner cases, we've come across the
case of struct nfsd4_session.  

From fs/nfsd/state.h:

/*
 * Representation of a v4.1+ session. These are refcounted in a similar 
 * fashion to the nfs4_client. References are only taken when the server
 * is actively working on the object (primarily during the processing of
 * compounds).
 */
struct nfsd4_session {
atomic_t se_ref;
...
};


From fs/nfsd/nfs4state.c:

static void init_session(..., struct nfsd4_session *new, ...)
{
...
atomic_set(&new->se_ref, 0);
...
}
 
Since nfsd4_session objects are initialized with refcount = 0, subsequent
increments will fail using the new refcount_t API.
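To make the failure mode concrete, here is a minimal userspace sketch of a counter that refuses to increment from zero. The names toy_refcount and toy_refcount_inc() are hypothetical stand-ins built on C11 atomics; this is not the kernel's refcount_t implementation, only a model of the behaviour described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in; not the kernel's refcount_t. */
struct toy_refcount {
	atomic_uint refs;
};

/* Increment only if the current value is non-zero; report whether it happened. */
bool toy_refcount_inc(struct toy_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->refs, &old, old + 1))
			return true;
	}
	return false;	/* counter was 0: treat the object as already released */
}

/* Returns 0 if both scenarios behave as the comments describe. */
int toy_refcount_demo(void)
{
	struct toy_refcount zero_based, one_based;

	atomic_init(&zero_based.refs, 0);
	atomic_init(&one_based.refs, 1);

	assert(!toy_refcount_inc(&zero_based));	/* refused, stays at 0 */
	assert(atomic_load(&zero_based.refs) == 0);

	assert(toy_refcount_inc(&one_based));	/* 1 -> 2 */
	assert(atomic_load(&one_based.refs) == 2);
	return 0;
}
```

An object whose count starts at 0 can never be taken under this model, which is why an initial value of 1 is being considered.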

Being largely unfamiliar with this subsystem's garbage collection
mechanism, I'm unsure how to best fix this.  Attached is a patch that
performs a logical +1 on struct nfsd4_session's reference counting
scheme.
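For readers less familiar with atomic_add_unless(), the following userspace sketch models its documented semantics: add a value unless the counter already equals a given value, and report whether the addition happened. toy_add_unless() is a hypothetical helper written with C11 atomics, not the kernel function itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace model of add-unless semantics: add 'a' to '*v' unless '*v'
 * equals 'u'. Returns true if the addition was performed.
 * Illustrative stand-in only, not the kernel implementation.
 */
bool toy_add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Returns 0 if the two calls behave as the comments describe. */
int toy_add_unless_demo(void)
{
	atomic_int ref;

	/* One holder on top of an initial bias of 1. */
	atomic_init(&ref, 2);
	assert(toy_add_unless(&ref, -1, 1));	/* 2 -> 1, addition performed */
	assert(atomic_load(&ref) == 1);

	/* Counter already at the floor value: nothing is subtracted. */
	assert(!toy_add_unless(&ref, -1, 1));
	assert(atomic_load(&ref) == 1);
	return 0;
}
```

Under this model, the negated call is true only when the counter is already 1 at the time of the call, so it is worth checking that condition against the old atomic_dec_and_test() test when reviewing.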

If this is the correct route to take, I will resubmit this patch with
updated comments for how struct nfsd4_session is refcounted (see the above
comment from fs/nfsd/state.h).  This is in preparation for the previously
mentioned refcount_t API series.

Thanks,
David Windsor

Signed-off-by: David Windsor 
---
 fs/nfsd/nfs4state.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index a0dee8a..b0f3010 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -196,7 +196,7 @@ static void nfsd4_put_session_locked(struct nfsd4_session 
*ses)
 
lockdep_assert_held(>client_lock);
 
-   if (atomic_dec_and_test(&ses->se_ref) && is_session_dead(ses))
+   if (!atomic_add_unless(&ses->se_ref, -1, 1) && is_session_des(ses))
free_session(ses);
put_client_renew_locked(clp);
 }
@@ -1645,7 +1645,7 @@ static void init_session(struct svc_rqst *rqstp, struct 
nfsd4_session *new, stru
new->se_flags = cses->flags;
new->se_cb_prog = cses->callback_prog;
new->se_cb_sec = cses->cb_sec;
-   atomic_set(&new->se_ref, 0);
+   atomic_set(&new->se_ref, 1);
idx = hash_sessionid(>se_sessionid);
list_add(>se_hash, >sessionid_hashtbl[idx]);
spin_lock(>cl_lock);
@@ -1792,7 +1792,7 @@ free_client(struct nfs4_client *clp)
ses = list_entry(clp->cl_sessions.next, struct nfsd4_session,
se_perclnt);
list_del(>se_perclnt);
-   WARN_ON_ONCE(atomic_read(&ses->se_ref));
+   WARN_ON_ONCE((atomic_read(&ses->se_ref) > 1));
free_session(ses);
}
rpc_destroy_wait_queue(>cl_cb_waitq);
-- 
2.7.4

[PATCH net-next] vxlan: remove vni zero check and drop for COLLECT_METADATA

2017-02-10 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch drops the vni zero check for COLLECT_METADATA mode.
The check is not needed: vni zero is a valid vni.

Fixes: 3ad7a4b141eb ("vxlan: support fdb and learning in COLLECT_METADATA mode")
Reported-by: Joe Stringer 
Signed-off-by: Roopa Prabhu 
---
 drivers/net/vxlan.c |3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 2374a75..4e27c5b 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1333,9 +1333,6 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
 
vni = vxlan_vni(vxlan_hdr(skb)->vx_vni);
 
-   if ((vs->flags & VXLAN_F_COLLECT_METADATA) && !vni)
-   goto drop;
-
vxlan = vxlan_vs_find_vni(vs, vni);
if (!vxlan)
goto drop;
-- 
1.7.10.4

Re: [PATCH net-next v2 2/5] vxlan: support fdb and learning in COLLECT_METADATA mode

2017-02-10 Thread Roopa Prabhu

On 2/10/17, 8:05 PM, Joe Stringer wrote:
> On 31 January 2017 at 22:59, Roopa Prabhu  wrote:
>> @@ -1289,7 +1331,12 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff 
>> *skb)
>> if (!vs)
>> goto drop;
>>
>> -   vxlan = vxlan_vs_find_vni(vs, vxlan_vni(vxlan_hdr(skb)->vx_vni));
>> +   vni = vxlan_vni(vxlan_hdr(skb)->vx_vni);
>> +
>> +   if ((vs->flags & VXLAN_F_COLLECT_METADATA) && !vni)
>> +   goto drop;
>> +
>> +   vxlan = vxlan_vs_find_vni(vs, vni);
>> if (!vxlan)
>> goto drop;
> Hi Roopa,
>
> We've noticed a failure in OVS system-traffic kmod test cases and
> bisected it down to this commit. It seems that it's related to this
> new drop condition here. Can you explain what's meant to be special
> about VNI 0? I can't see anything mentioned about it in RFC7348, so I
> don't see why it should be dropped.
>
> In the OVS testsuite, we configure OVS in the root namespace with an
> OVS vxlan device (which has VXLAN_F_COLLECT_METADATA set), with vni 0.
> Then, we configure a veth pair into another namespace where we have
> the other end of the tunnel configured using a regular native linux
> vxlan device on vni 0. Prior to this commit, the test worked; after
> this commit it failed. If we manually change to use a nonzero VNI, it
> works. The test is here:
To be honest, I thought vni 0 was only used for the collect metadata device for
lookup of the device until a real vni was derived. And since I moved the line
that got the vni from the packet up, I ended up adding that check. I did not
realize vni 0 could be a valid vni in the packet.
>
> https://github.com/openvswitch/ovs/blob/branch-2.7/tests/system-traffic.at#L218
>
> Jarno also tried setting up two namespaces with regular vxlan devices
> and VNI 0, and this worked too. Presumably this is because this would
> not use VXLAN_F_COLLECT_METADATA.
yeah, that should be it.

I will send a patch in a few hours. Thanks for reporting. I am glad you ran
these tests, as I was not able to completely verify all cases for ovs.

[PATCH v2 net] bpf: introduce BPF_F_ALLOW_OVERRIDE flag

2017-02-10 Thread Alexei Starovoitov

If the BPF_F_ALLOW_OVERRIDE flag is used in a BPF_PROG_ATTACH command
to the given cgroup, the descendent cgroups will be able to override the
effective bpf program that was inherited from this cgroup.
By default the flag is not passed, so override is disallowed.

Examples:
1.
prog X attached to /A with default
prog Y fails to attach to /A/B and /A/B/C
Everything under /A runs prog X

2.
prog X attached to /A with allow_override.
prog Y fails to attach to /A/B with default (non-override)
prog M attached to /A/B with allow_override.
Everything under /A/B runs prog M only.

3.
prog X attached to /A with allow_override.
prog Y fails to attach to /A with default.
The user has to detach first to switch the mode.

In the future this behavior may be extended with a chain of
non-overridable programs.

Also fix the bug where detach from cgroup where nothing is attached
was not throwing error. Return ENOENT in such case.
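The attach rules in the examples above can be sketched as a small userspace model. Everything here (struct toy_cgroup, toy_attach(), toy_detach(), the integer program ids) is hypothetical and only mirrors the behaviour described in this message; it is not the code in kernel/bpf/cgroup.c. Rejecting a mode switch in both directions without a detach is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cgroup node; 'prog' is a program id, 0 means nothing attached here. */
struct toy_cgroup {
	struct toy_cgroup *parent;
	int prog;
	bool allow_override;	/* mode the local program was attached with */
};

/* Nearest ancestor (excluding cg itself) that has a program attached. */
struct toy_cgroup *toy_inherited_from(struct toy_cgroup *cg)
{
	for (cg = cg->parent; cg; cg = cg->parent)
		if (cg->prog)
			return cg;
	return NULL;
}

/* Program that runs for cg: its own if attached, else the inherited one. */
int toy_effective(struct toy_cgroup *cg)
{
	struct toy_cgroup *anc;

	if (cg->prog)
		return cg->prog;
	anc = toy_inherited_from(cg);
	return anc ? anc->prog : 0;
}

int toy_attach(struct toy_cgroup *cg, int prog, bool allow_override)
{
	struct toy_cgroup *anc = toy_inherited_from(cg);

	/* An inherited program may only be overridden if it was attached
	 * with allow_override, and only by another overridable program. */
	if (anc && (!anc->allow_override || !allow_override))
		return -EPERM;
	/* Switching mode on this cgroup needs a detach first (model assumption
	 * for the non-override to override direction). */
	if (cg->prog && cg->allow_override != allow_override)
		return -EPERM;
	cg->prog = prog;
	cg->allow_override = allow_override;
	return 0;
}

int toy_detach(struct toy_cgroup *cg)
{
	if (!cg->prog)
		return -ENOENT;	/* nothing attached */
	cg->prog = 0;
	cg->allow_override = false;
	return 0;
}

/* Walks through examples 1 to 3; returns 0 if every step matches. */
int toy_override_demo(void)
{
	struct toy_cgroup a = { NULL, 0, false };
	struct toy_cgroup b = { &a, 0, false };
	struct toy_cgroup c = { &b, 0, false };

	/* Example 1: X (id 1) on /A with default flags. */
	assert(toy_attach(&a, 1, false) == 0);
	assert(toy_attach(&b, 2, false) == -EPERM);
	assert(toy_attach(&c, 2, false) == -EPERM);
	assert(toy_effective(&c) == 1);

	assert(toy_detach(&a) == 0);
	assert(toy_detach(&a) == -ENOENT);	/* double detach reports an error */

	/* Example 2: X on /A with allow_override, M (id 3) on /A/B. */
	assert(toy_attach(&a, 1, true) == 0);
	assert(toy_attach(&b, 2, false) == -EPERM);
	assert(toy_attach(&b, 3, true) == 0);
	assert(toy_effective(&b) == 3);
	assert(toy_effective(&c) == 3);
	assert(toy_effective(&a) == 1);

	/* Example 3: Y with default flags on /A fails while X is overridable. */
	assert(toy_attach(&a, 2, false) == -EPERM);
	return 0;
}
```

The model walks up the parent chain on every call instead of caching an inherited flag per cgroup; that keeps the sketch short and is not meant to reflect how the kernel stores this state.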

Add several testcases and adjust libbpf.

Fixes: 3007098494be ("cgroup: add support for eBPF programs")
Signed-off-by: Alexei Starovoitov 
---
v1->v2: disallowed overridable->non_override transition as suggested by Andy
added tests and fixed double detach bug

Andy, Daniel,
please review and ack quickly, so it can land into 4.10.
---
 include/linux/bpf-cgroup.h   | 13 
 include/uapi/linux/bpf.h |  7 +
 kernel/bpf/cgroup.c  | 59 +++---
 kernel/bpf/syscall.c | 20 
 kernel/cgroup.c  |  9 +++---
 samples/bpf/test_cgrp2_attach.c  |  2 +-
 samples/bpf/test_cgrp2_attach2.c | 68 +---
 samples/bpf/test_cgrp2_sock.c|  2 +-
 samples/bpf/test_cgrp2_sock2.c   |  2 +-
 tools/lib/bpf/bpf.c  |  4 ++-
 tools/lib/bpf/bpf.h  |  3 +-
 11 files changed, 151 insertions(+), 38 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 92bc89ae7e20..c970a25d2a49 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -21,20 +21,19 @@ struct cgroup_bpf {
 */
struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE];
struct bpf_prog __rcu *effective[MAX_BPF_ATTACH_TYPE];
+   bool disallow_override[MAX_BPF_ATTACH_TYPE];
 };
 
 void cgroup_bpf_put(struct cgroup *cgrp);
 void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent);
 
-void __cgroup_bpf_update(struct cgroup *cgrp,
-struct cgroup *parent,
-struct bpf_prog *prog,
-enum bpf_attach_type type);
+int __cgroup_bpf_update(struct cgroup *cgrp, struct cgroup *parent,
+   struct bpf_prog *prog, enum bpf_attach_type type,
+   bool overridable);
 
 /* Wrapper for __cgroup_bpf_update() protected by cgroup_mutex */
-void cgroup_bpf_update(struct cgroup *cgrp,
-  struct bpf_prog *prog,
-  enum bpf_attach_type type);
+int cgroup_bpf_update(struct cgroup *cgrp, struct bpf_prog *prog,
+ enum bpf_attach_type type, bool overridable);
 
 int __cgroup_bpf_run_filter_skb(struct sock *sk,
struct sk_buff *skb,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e5b8cf16cbaf..69f65b710b10 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -116,6 +116,12 @@ enum bpf_attach_type {
 
 #define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
 
+/* If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
+ * to the given target_fd cgroup the descendent cgroup will be able to
+ * override effective bpf program that was inherited from this cgroup
+ */
+#define BPF_F_ALLOW_OVERRIDE   (1U << 0)
+
 #define BPF_PSEUDO_MAP_FD  1
 
 /* flags for BPF_MAP_UPDATE_ELEM command */
@@ -171,6 +177,7 @@ union bpf_attr {
__u32   target_fd;  /* container object to attach 
to */
__u32   attach_bpf_fd;  /* eBPF program to attach */
__u32   attach_type;
+   __u32   attach_flags;
};
 } __attribute__((aligned(8)));
 
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index a515f7b007c6..da0f53690295 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -52,6 +52,7 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup 
*parent)
e = rcu_dereference_protected(parent->bpf.effective[type],
  lockdep_is_held(&cgroup_mutex));
rcu_assign_pointer(cgrp->bpf.effective[type], e);
+   cgrp->bpf.disallow_override[type] = 
parent->bpf.disallow_override[type];
}
 }
 
@@ -82,30 +83,63 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup 
*parent)
  *
  * Must be called with cgroup_mutex held.
  */
-void __cgroup_bpf_update(struct cgroup *cgrp,
-struct cgroup *parent,
-struct bpf_prog

Re: [PATCH net-next v2 2/5] vxlan: support fdb and learning in COLLECT_METADATA mode

2017-02-10 Thread Joe Stringer

On 31 January 2017 at 22:59, Roopa Prabhu  wrote:
> @@ -1289,7 +1331,12 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff 
> *skb)
> if (!vs)
> goto drop;
>
> -   vxlan = vxlan_vs_find_vni(vs, vxlan_vni(vxlan_hdr(skb)->vx_vni));
> +   vni = vxlan_vni(vxlan_hdr(skb)->vx_vni);
> +
> +   if ((vs->flags & VXLAN_F_COLLECT_METADATA) && !vni)
> +   goto drop;
> +
> +   vxlan = vxlan_vs_find_vni(vs, vni);
> if (!vxlan)
> goto drop;

Hi Roopa,

We've noticed a failure in OVS system-traffic kmod test cases and
bisected it down to this commit. It seems that it's related to this
new drop condition here. Can you explain what's meant to be special
about VNI 0? I can't see anything mentioned about it in RFC7348, so I
don't see why it should be dropped.

In the OVS testsuite, we configure OVS in the root namespace with an
OVS vxlan device (which has VXLAN_F_COLLECT_METADATA set), with vni 0.
Then, we configure a veth pair into another namespace where we have
the other end of the tunnel configured using a regular native linux
vxlan device on vni 0. Prior to this commit, the test worked; after
this commit it failed. If we manually change to use a nonzero VNI, it
works. The test is here:

https://github.com/openvswitch/ovs/blob/branch-2.7/tests/system-traffic.at#L218

Jarno also tried setting up two namespaces with regular vxlan devices
and VNI 0, and this worked too. Presumably this is because this would
not use VXLAN_F_COLLECT_METADATA.
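As a point of reference for the VNI 0 question, here is a small userspace sketch that decodes the 24-bit VNI from an 8-byte VXLAN header laid out as in RFC 7348 (flags byte, 24 reserved bits, 24-bit VNI, 8 reserved bits). toy_vxlan_vni() is a hypothetical helper for illustration, not the kernel's vxlan_vni().

```c
#include <assert.h>
#include <stdint.h>

#define TOY_VXLAN_HLEN 8

/* Extract the 24-bit VNI from bytes 4..6 of the header (network byte order). */
uint32_t toy_vxlan_vni(const uint8_t hdr[TOY_VXLAN_HLEN])
{
	return ((uint32_t)hdr[4] << 16) | ((uint32_t)hdr[5] << 8) | hdr[6];
}

/* Returns 0 if both headers decode as the comments describe. */
int toy_vxlan_demo(void)
{
	/* Flags byte 0x08 sets the "VNI present" (I) bit. */
	const uint8_t vni0[TOY_VXLAN_HLEN]  = { 0x08, 0, 0, 0, 0x00, 0x00, 0x00, 0 };
	const uint8_t vni42[TOY_VXLAN_HLEN] = { 0x08, 0, 0, 0, 0x00, 0x00, 0x2a, 0 };

	assert(toy_vxlan_vni(vni0) == 0);	/* decodes like any other value */
	assert(toy_vxlan_vni(vni42) == 42);
	return 0;
}
```

A header with the I flag set and all-zero VNI bytes decodes to VNI 0 in the same way as any other value, so a receiver has to decide explicitly whether to treat it specially.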

Re: [PATCH] net: add regs attribute to phy device for user diagnose

2017-02-10 Thread yuan linyu

On Thu, 2017-01-19 at 02:01 +0100, Andrew Lunn wrote:
> > 
> > I will add two ethtool command in kernel to read and write register in PHY. 
> Write access will get NACKed by me. Read only please.
Some registers need a value written to them first before they can be read.
With read-only access, it will not achieve the goal.
> 
> > 
> > ethtool can use these command to dump what user want, there is no
> > more work to PHY driver.
> Please think about how you handle PHYs with pages. This needs to be
> part of the API.
Thanks, I will.
> 
>  Andrew

[PATH v3 net-next] net: remove member 'max' of struct scm_fp_list

2017-02-10 Thread yuan linyu

From: yuan linyu 

'max' is only used in three places in scm.c:
1. in scm_fp_copy(), fpl->max = SCM_MAX_FD;
2. in scm_fp_copy(), if (fpl->count + num > fpl->max)
3. in scm_fp_dup(), new_fpl->max = new_fpl->count;
At place 3, the worst case is new_fpl->count = SCM_MAX_FD,
so if we do a full-size dup, the 'max' field will always be
SCM_MAX_FD and it can be removed.

Signed-off-by: yuan linyu 
---
v2->v3:
change scm_fp_dup() to do a full size dup

v1->v2:
update commit log to describe correct reason to remove 'max'

 include/net/scm.h |  3 +--
 net/core/scm.c| 23 ++-
 2 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/include/net/scm.h b/include/net/scm.h
index 59fa93c..1301227 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -19,8 +19,7 @@ struct scm_creds {
 };
 
 struct scm_fp_list {
-   short   count;
-   short   max;
+   unsigned intcount;
struct user_struct  *user;
struct file *fp[SCM_MAX_FD];
 };
diff --git a/net/core/scm.c b/net/core/scm.c
index b6d8368..fb3ab32 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -69,15 +69,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct 
scm_fp_list **fplp)
int *fdp = (int*)CMSG_DATA(cmsg);
struct scm_fp_list *fpl = *fplp;
struct file **fpp;
-   int i, num;
-
-   num = (cmsg->cmsg_len - sizeof(struct cmsghdr))/sizeof(int);
-
-   if (num <= 0)
-   return 0;
-
-   if (num > SCM_MAX_FD)
-   return -EINVAL;
+   unsigned int i, num;
 
if (!fpl)
{
@@ -86,18 +78,17 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct 
scm_fp_list **fplp)
return -ENOMEM;
*fplp = fpl;
fpl->count = 0;
-   fpl->max = SCM_MAX_FD;
fpl->user = NULL;
}
-   fpp = &fpl->fp[fpl->count];
 
-   if (fpl->count + num > fpl->max)
+   num = (cmsg->cmsg_len - sizeof(struct cmsghdr))/sizeof(int);
+   if (fpl->count + num > SCM_MAX_FD)
return -EINVAL;
 
/*
 *  Verify the descriptors and increment the usage count.
 */
-
+   fpp = &fpl->fp[fpl->count];
for (i=0; i< num; i++)
{
int fd = fdp[i];
@@ -112,7 +103,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct 
scm_fp_list **fplp)
if (!fpl->user)
fpl->user = get_uid(current_user());
 
-   return num;
+   return 0;
 }
 
 void __scm_destroy(struct scm_cookie *scm)
@@ -336,12 +327,10 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
if (!fpl)
return NULL;
 
-   new_fpl = kmemdup(fpl, offsetof(struct scm_fp_list, fp[fpl->count]),
- GFP_KERNEL);
+   new_fpl = kmemdup(fpl, sizeof(*fpl), GFP_KERNEL);
if (new_fpl) {
for (i = 0; i < fpl->count; i++)
get_file(fpl->fp[i]);
-   new_fpl->max = new_fpl->count;
new_fpl->user = get_uid(fpl->user);
}
return new_fpl;
-- 
2.7.4

[PATCH 3/7] staging: r8712u: Fix macros used to read/write the TX/RX descriptors

2017-02-10 Thread Larry Finger

Although the driver works on big-endian hardware, Sparse generates a lot
of warnings. Many of these are the result of incorrect coding of these
macros.
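The following userspace sketch illustrates the idea behind the macro change: a flag in a little-endian 16-bit field can be tested either by converting the mask to wire order or by converting the field to CPU order, and both give the same answer on any host. The toy_* helpers and TOY_TO_DS are hypothetical names; TOY_TO_DS is assumed to be bit 8, the To DS bit of the 802.11 frame control field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_TO_DS (1u << 8)	/* assumed: To DS bit of the frame control field */

/* CPU-order value -> 16-bit object whose bytes are little-endian. */
uint16_t toy_cpu_to_le16(uint16_t v)
{
	uint8_t b[2] = { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) };
	uint16_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* 16-bit object holding little-endian bytes -> CPU-order value. */
uint16_t toy_le16_to_cpu(uint16_t v)
{
	uint8_t b[2];

	memcpy(b, &v, sizeof(b));
	return (uint16_t)(b[0] | (b[1] << 8));
}

/* Test the bit by converting the mask to wire order. */
bool toy_get_to_ds_mask(const void *pbuf)
{
	uint16_t raw;

	memcpy(&raw, pbuf, sizeof(raw));
	return (raw & toy_cpu_to_le16(TOY_TO_DS)) != 0;
}

/* Test the bit by converting the field to CPU order. */
bool toy_get_to_ds_value(const void *pbuf)
{
	uint16_t raw;

	memcpy(&raw, pbuf, sizeof(raw));
	return (toy_le16_to_cpu(raw) & TOY_TO_DS) != 0;
}

/* Returns 0 if both styles agree on two sample frame control fields. */
int toy_endian_demo(void)
{
	const uint8_t to_ds[2]   = { 0x08, 0x01 };	/* bit 8 set */
	const uint8_t from_ds[2] = { 0x08, 0x02 };	/* bit 9 set, bit 8 clear */

	assert(toy_get_to_ds_mask(to_ds) && toy_get_to_ds_value(to_ds));
	assert(!toy_get_to_ds_mask(from_ds) && !toy_get_to_ds_value(from_ds));
	return 0;
}
```

Since a 16-bit byte swap is its own inverse, converting the mask with either direction of helper yields the same bits; the difference that Sparse cares about is the annotated type of each operand.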

Signed-off-by: Larry Finger 
---
 drivers/staging/rtl8712/wifi.h | 109 -
 1 file changed, 52 insertions(+), 57 deletions(-)

diff --git a/drivers/staging/rtl8712/wifi.h b/drivers/staging/rtl8712/wifi.h
index 7ebf247..74dfc9b 100644
--- a/drivers/staging/rtl8712/wifi.h
+++ b/drivers/staging/rtl8712/wifi.h
@@ -151,92 +151,88 @@ enum WIFI_REG_DOMAIN {
 #define _ORDER_BIT(15)
 
 #define SetToDs(pbuf) ({ \
-   *(unsigned short *)(pbuf) |= cpu_to_le16(_TO_DS_); \
+   *(__le16 *)(pbuf) |= cpu_to_le16(_TO_DS_); \
 })
 
-#define GetToDs(pbuf)  (((*(unsigned short *)(pbuf)) & \
-   le16_to_cpu(_TO_DS_)) != 0)
+#define GetToDs(pbuf)  (((*(__le16 *)(pbuf)) & cpu_to_le16(_TO_DS_)) != 0)
 
 #define ClearToDs(pbuf)({ \
-   *(unsigned short *)(pbuf) &= (~cpu_to_le16(_TO_DS_)); \
+   *(__le16 *)(pbuf) &= (~cpu_to_le16(_TO_DS_)); \
 })
 
 #define SetFrDs(pbuf) ({ \
-   *(unsigned short *)(pbuf) |= cpu_to_le16(_FROM_DS_); \
+   *(__le16 *)(pbuf) |= cpu_to_le16(_FROM_DS_); \
 })
 
-#define GetFrDs(pbuf)  (((*(unsigned short *)(pbuf)) & \
-   le16_to_cpu(_FROM_DS_)) != 0)
+#define GetFrDs(pbuf)  (((*(__le16 *)(pbuf)) & cpu_to_le16(_FROM_DS_)) != 0)
 
 #define ClearFrDs(pbuf)({ \
-   *(unsigned short *)(pbuf) &= (~cpu_to_le16(_FROM_DS_)); \
+   *(__le16 *)(pbuf) &= (~cpu_to_le16(_FROM_DS_)); \
 })
 
 #define get_tofr_ds(pframe)((GetToDs(pframe) << 1) | GetFrDs(pframe))
 
 
 #define SetMFrag(pbuf) ({ \
-   *(unsigned short *)(pbuf) |= cpu_to_le16(_MORE_FRAG_); \
+   *(__le16 *)(pbuf) |= cpu_to_le16(_MORE_FRAG_); \
 })
 
-#define GetMFrag(pbuf) (((*(unsigned short *)(pbuf)) & \
-   le16_to_cpu(_MORE_FRAG_)) != 0)
+#define GetMFrag(pbuf) (((*(__le16 *)(pbuf)) & cpu_to_le16(_MORE_FRAG_)) != 0)
 
 #define ClearMFrag(pbuf) ({ \
-   *(unsigned short *)(pbuf) &= (~cpu_to_le16(_MORE_FRAG_)); \
+   *(__le16 *)(pbuf) &= (~cpu_to_le16(_MORE_FRAG_)); \
 })
 
 #define SetRetry(pbuf) ({ \
-   *(unsigned short *)(pbuf) |= cpu_to_le16(_RETRY_); \
+   *(__le16 *)(pbuf) |= cpu_to_le16(_RETRY_); \
 })
 
-#define GetRetry(pbuf) (((*(unsigned short *)(pbuf)) & \
-   le16_to_cpu(_RETRY_)) != 0)
+#define GetRetry(pbuf) (((*(__le16 *)(pbuf)) & cpu_to_le16(_RETRY_)) != 0)
 
 #define ClearRetry(pbuf) ({ \
-   *(unsigned short *)(pbuf) &= (~cpu_to_le16(_RETRY_)); \
+   *(__le16 *)(pbuf) &= (~cpu_to_le16(_RETRY_)); \
 })
 
 #define SetPwrMgt(pbuf) ({ \
-   *(unsigned short *)(pbuf) |= cpu_to_le16(_PWRMGT_); \
+   *(__le16 *)(pbuf) |= cpu_to_le16(_PWRMGT_); \
 })
 
-#define GetPwrMgt(pbuf)(((*(unsigned short *)(pbuf)) & \
-   le16_to_cpu(_PWRMGT_)) != 0)
+#define GetPwrMgt(pbuf)(((*(__le16 *)(pbuf)) & \
+   cpu_to_le16(_PWRMGT_)) != 0)
 
 #define ClearPwrMgt(pbuf) ({ \
-   *(unsigned short *)(pbuf) &= (~cpu_to_le16(_PWRMGT_)); \
+   *(__le16 *)(pbuf) &= (~cpu_to_le16(_PWRMGT_)); \
 })
 
 #define SetMData(pbuf) ({ \
-   *(unsigned short *)(pbuf) |= cpu_to_le16(_MORE_DATA_); \
+   *(__le16 *)(pbuf) |= cpu_to_le16(_MORE_DATA_); \
 })
 
-#define GetMData(pbuf) (((*(unsigned short *)(pbuf)) & \
-   le16_to_cpu(_MORE_DATA_)) != 0)
+#define GetMData(pbuf) (((*(__le16 *)(pbuf)) & \
+   cpu_to_le16(_MORE_DATA_)) != 0)
 
 #define ClearMData(pbuf) ({ \
-   *(unsigned short *)(pbuf) &= (~cpu_to_le16(_MORE_DATA_)); \
+   *(__le16 *)(pbuf) &= (~cpu_to_le16(_MORE_DATA_)); \
 })
 
 #define SetPrivacy(pbuf) ({ \
-   *(unsigned short *)(pbuf) |= cpu_to_le16(_PRIVACY_); \
+   *(__le16 *)(pbuf) |= cpu_to_le16(_PRIVACY_); \
 })
 
-#define GetPrivacy(pbuf)   (((*(unsigned short *)(pbuf)) & \
-   le16_to_cpu(_PRIVACY_)) != 0)
+#define GetPrivacy(pbuf)   (((*(__le16 *)(pbuf)) & \
+   cpu_to_le16(_PRIVACY_)) != 0)
 
-#define GetOrder(pbuf) (((*(unsigned short *)(pbuf)) & \
-   le16_to_cpu(_ORDER_)) != 0)
+#define GetOrder(pbuf) (((*(__le16 *)(pbuf)) & \
+   cpu_to_le16(_ORDER_)) != 0)
 
 #define GetFrameType(pbuf) (le16_to_cpu(*(__le16 *)(pbuf)) & \
(BIT(3) | BIT(2)))
 
 #define SetFrameType(pbuf, type)   \
do {\
-   *(unsigned short *)(pbuf) &= cpu_to_le16(~(BIT(3) | \
+   *(__le16 *)(pbuf) &= cpu_to_le16(~(BIT(3) | \
BIT(2))); \
-   *(unsigned short *)(pbuf) |= cpu_to_le16(type); \
+   *(__le16 *)(pbuf) |= cpu_to_le16(type); \
} while (0)
 
 #define GetFrameSubType(pbuf)  (le16_to_cpu(*(__le16 *)(pbuf)) & \
@@ -245,44 +241,43 @@

[PATCH 7/7] staging: r8712u: Fix Sparse warnings in rtl871x_mlme.c

2017-02-10 Thread Larry Finger

Sparse reports the following:
  CHECK   drivers/staging/rtl8712/rtl871x_mlme.c
drivers/staging/rtl8712/rtl871x_mlme.c:1653:46: warning: incorrect type in 
assignment (different base types)
drivers/staging/rtl8712/rtl871x_mlme.c:1653:46:expected unsigned int 
[unsigned] [usertype] DSConfig
drivers/staging/rtl8712/rtl871x_mlme.c:1653:46:got restricted __le32 
[usertype] 
drivers/staging/rtl8712/rtl871x_mlme.c:1656:56: warning: incorrect type in 
assignment (different base types)
drivers/staging/rtl8712/rtl871x_mlme.c:1656:56:expected unsigned int 
[unsigned] [usertype] ATIMWindow
drivers/staging/rtl8712/rtl871x_mlme.c:1656:56:got restricted __le32 
[usertype] 
drivers/staging/rtl8712/rtl871x_mlme.c:1712:35: warning: incorrect type in 
assignment (different base types)
drivers/staging/rtl8712/rtl871x_mlme.c:1712:35:expected restricted __le16 
[addressable] [usertype] cap_info
drivers/staging/rtl8712/rtl871x_mlme.c:1712:35:got int

Signed-off-by: Larry Finger 
---
 drivers/staging/rtl8712/rtl871x_mlme.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/rtl8712/rtl871x_mlme.c 
b/drivers/staging/rtl8712/rtl871x_mlme.c
index fd8d96d..bf1ac22 100644
--- a/drivers/staging/rtl8712/rtl871x_mlme.c
+++ b/drivers/staging/rtl8712/rtl871x_mlme.c
@@ -1650,10 +1650,9 @@ void r8712_update_registrypriv_dev_network(struct 
_adapter *adapter)
/* TODO */
break;
}
-   pdev_network->Configuration.DSConfig = cpu_to_le32(
-  pregistrypriv->channel);
+   pdev_network->Configuration.DSConfig = pregistrypriv->channel;
if (cur_network->network.InfrastructureMode == Ndis802_11IBSS)
-   pdev_network->Configuration.ATIMWindow = cpu_to_le32(3);
+   pdev_network->Configuration.ATIMWindow = 3;
pdev_network->InfrastructureMode = 
cur_network->network.InfrastructureMode;
/* 1. Supported rates
 * 2. IE
@@ -1709,12 +1708,12 @@ unsigned int r8712_restructure_ht_ie(struct _adapter 
*padapter, u8 *in_ie,
}
out_len = *pout_len;
memset(&ht_capie, 0, sizeof(struct ieee80211_ht_cap));
-   ht_capie.cap_info = IEEE80211_HT_CAP_SUP_WIDTH |
+   ht_capie.cap_info = cpu_to_le16(IEEE80211_HT_CAP_SUP_WIDTH |
IEEE80211_HT_CAP_SGI_20 |
IEEE80211_HT_CAP_SGI_40 |
IEEE80211_HT_CAP_TX_STBC |
IEEE80211_HT_CAP_MAX_AMSDU |
-   IEEE80211_HT_CAP_DSSSCCK40;
+   IEEE80211_HT_CAP_DSSSCCK40);
ht_capie.ampdu_params_info = (IEEE80211_HT_CAP_AMPDU_FACTOR &
0x03) | (IEEE80211_HT_CAP_AMPDU_DENSITY & 0x00);
r8712_set_ie(out_ie + out_len, _HT_CAPABILITY_IE_,
-- 
2.10.2

[PATCH 1/7] staging: rtl8712: Fix some Sparse endian messages

2017-02-10 Thread Larry Finger

Sparse reports the following:

  CHECK   drivers/staging/rtl8712/rtl8712_xmit.c
drivers/staging/rtl8712/rtl8712_xmit.c:564:42: warning: cast from restricted 
__le32
drivers/staging/rtl8712/rtl8712_xmit.c:569:42: warning: cast from restricted 
__le32
drivers/staging/rtl8712/rtl8712_xmit.c:571:42: warning: cast from restricted 
__le32

Each of these cases is transferring a quantity that is little-endian. There
is no need for conversion.

Signed-off-by: Larry Finger 
---
 drivers/staging/rtl8712/rtl8712_xmit.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/rtl8712/rtl8712_xmit.c 
b/drivers/staging/rtl8712/rtl8712_xmit.c
index 4231a0a..7fe6265 100644
--- a/drivers/staging/rtl8712/rtl8712_xmit.c
+++ b/drivers/staging/rtl8712/rtl8712_xmit.c
@@ -561,14 +561,14 @@ static void update_txdesc(struct xmit_frame *pxmitframe, 
uint *pmem, int sz)
 
ptxdesc_mp = _mp;
/* offset 8 */
-   ptxdesc->txdw2 = cpu_to_le32(ptxdesc_mp->txdw2);
+   ptxdesc->txdw2 = ptxdesc_mp->txdw2;
if (bmcst)
ptxdesc->txdw2 |= cpu_to_le32(BMC);
ptxdesc->txdw2 |= cpu_to_le32(BK);
/* offset 16 */
-   ptxdesc->txdw4 = cpu_to_le32(ptxdesc_mp->txdw4);
+   ptxdesc->txdw4 = ptxdesc_mp->txdw4;
/* offset 20 */
-   ptxdesc->txdw5 = cpu_to_le32(ptxdesc_mp->txdw5);
+   ptxdesc->txdw5 = ptxdesc_mp->txdw5;
pattrib->pctrl = 0;/* reset to zero; */
}
} else if (pxmitframe->frame_tag == MGNT_FRAMETAG) {
-- 
2.10.2

[PATCH 2/7] staging: rtl8712u: Fix endian settings for structs describing network packets

2017-02-10 Thread Larry Finger

The headers describing a number of network packets do not have the
correct endian settings for several types of data.

Signed-off-by: Larry Finger 
---
 drivers/staging/rtl8712/ieee80211.h | 84 ++---
 1 file changed, 42 insertions(+), 42 deletions(-)

diff --git a/drivers/staging/rtl8712/ieee80211.h 
b/drivers/staging/rtl8712/ieee80211.h
index 67ab580..68fd65e 100644
--- a/drivers/staging/rtl8712/ieee80211.h
+++ b/drivers/staging/rtl8712/ieee80211.h
@@ -138,51 +138,51 @@ struct ieee_ibss_seq {
 };
 
 struct ieee80211_hdr {
-   u16 frame_ctl;
-   u16 duration_id;
+   __le16 frame_ctl;
+   __le16 duration_id;
u8 addr1[ETH_ALEN];
u8 addr2[ETH_ALEN];
u8 addr3[ETH_ALEN];
-   u16 seq_ctl;
+   __le16 seq_ctl;
u8 addr4[ETH_ALEN];
-} __packed;
+}  __packed __aligned(2);
 
 struct ieee80211_hdr_3addr {
-   u16 frame_ctl;
-   u16 duration_id;
+   __le16 frame_ctl;
+   __le16 duration_id;
u8 addr1[ETH_ALEN];
u8 addr2[ETH_ALEN];
u8 addr3[ETH_ALEN];
-   u16 seq_ctl;
-} __packed;
+   __le16 seq_ctl;
+}  __packed __aligned(2);
 
 struct ieee80211_hdr_qos {
-   u16 frame_ctl;
-   u16 duration_id;
+   __le16 frame_ctl;
+   __le16 duration_id;
u8 addr1[ETH_ALEN];
u8 addr2[ETH_ALEN];
u8 addr3[ETH_ALEN];
-   u16 seq_ctl;
+   __le16 seq_ctl;
u8 addr4[ETH_ALEN];
-   u16 qc;
-}  __packed;
+   __le16  qc;
+}   __packed __aligned(2);
 
 struct  ieee80211_hdr_3addr_qos {
-   u16 frame_ctl;
-   u16 duration_id;
+   __le16 frame_ctl;
+   __le16 duration_id;
u8  addr1[ETH_ALEN];
u8  addr2[ETH_ALEN];
u8  addr3[ETH_ALEN];
-   u16 seq_ctl;
-   u16 qc;
+   __le16 seq_ctl;
+   __le16 qc;
 }  __packed;
 
 struct eapol {
u8 snap[6];
-   u16 ethertype;
+   __be16 ethertype;
u8 version;
u8 type;
-   u16 length;
+   __le16 length;
 } __packed;
 
 enum eap_type {
@@ -514,13 +514,13 @@ struct ieee80211_security {
  */
 
 struct ieee80211_header_data {
-   u16 frame_ctl;
-   u16 duration_id;
+   __le16 frame_ctl;
+   __le16 duration_id;
u8 addr1[6];
u8 addr2[6];
u8 addr3[6];
-   u16 seq_ctrl;
-};
+   __le16 seq_ctrl;
+} __packed __aligned(2);
 
 #define BEACON_PROBE_SSID_ID_POSITION 12
 
@@ -552,18 +552,18 @@ struct ieee80211_info_element {
 /*
  * These are the data types that can make up management packets
  *
-   u16 auth_algorithm;
-   u16 auth_sequence;
-   u16 beacon_interval;
-   u16 capability;
+   __le16 auth_algorithm;
+   __le16 auth_sequence;
+   __le16 beacon_interval;
+   __le16 capability;
u8 current_ap[ETH_ALEN];
-   u16 listen_interval;
+   __le16 listen_interval;
struct {
u16 association_id:14, reserved:2;
} __packed;
-   u32 time_stamp[2];
-   u16 reason;
-   u16 status;
+   __le32 time_stamp[2];
+   __le16 reason;
+   __le16 status;
 */
 
 #define IEEE80211_DEFAULT_TX_ESSID "Penguin"
@@ -571,16 +571,16 @@ struct ieee80211_info_element {
 
 struct ieee80211_authentication {
struct ieee80211_header_data header;
-   u16 algorithm;
-   u16 transaction;
-   u16 status;
+   __le16 algorithm;
+   __le16 transaction;
+   __le16 status;
 } __packed;
 
 struct ieee80211_probe_response {
struct ieee80211_header_data header;
-   u32 time_stamp[2];
-   u16 beacon_interval;
-   u16 capability;
+   __le32 time_stamp[2];
+   __le16 beacon_interval;
+   __le16 capability;
struct ieee80211_info_element info_element;
 } __packed;
 
@@ -590,16 +590,16 @@ struct ieee80211_probe_request {
 
 struct ieee80211_assoc_request_frame {
struct ieee80211_hdr_3addr header;
-   u16 capability;
-   u16 listen_interval;
+   __le16 capability;
+   __le16 listen_interval;
struct ieee80211_info_element_hdr info_element;
 } __packed;
 
 struct ieee80211_assoc_response_frame {
struct ieee80211_hdr_3addr header;
-   u16 capability;
-   u16 status;
-   u16 aid;
+   __le16 capability;
+   __le16 status;
+   __le16 aid;
 } __packed;
 
 struct ieee80211_txb {
-- 
2.10.2

[PATCH 4/7] staging: r8712u: Fix Sparse warning in rtl871x_xmit.c

2017-02-10 Thread Larry Finger

Sparse reports the following:
  CHECK   drivers/staging/rtl8712/rtl871x_xmit.c
drivers/staging/rtl8712/rtl871x_xmit.c:350:44: warning: restricted __le32 
degrades to integer
drivers/staging/rtl8712/rtl871x_xmit.c:491:23: warning: incorrect type in 
initializer (different base types)
drivers/staging/rtl8712/rtl871x_xmit.c:491:23:expected unsigned short 
[usertype] *fctrl
drivers/staging/rtl8712/rtl871x_xmit.c:491:23:got restricted __le16 
*
drivers/staging/rtl8712/rtl871x_xmit.c:580:36: warning: incorrect type in 
assignment (different base types)
drivers/staging/rtl8712/rtl871x_xmit.c:580:36:expected unsigned short 
[unsigned] [short] [usertype] 
drivers/staging/rtl8712/rtl871x_xmit.c:580:36:got restricted __be16 
[usertype] 

Signed-off-by: Larry Finger 
---
 drivers/staging/rtl8712/rtl871x_xmit.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/rtl8712/rtl871x_xmit.c 
b/drivers/staging/rtl8712/rtl871x_xmit.c
index 4ab82ba..de88819 100644
--- a/drivers/staging/rtl8712/rtl871x_xmit.c
+++ b/drivers/staging/rtl8712/rtl871x_xmit.c
@@ -347,7 +347,8 @@ sint r8712_update_attrib(struct _adapter *padapter, _pkt 
*pkt,
 * some settings above.
 */
if (check_fwstate(pmlmepriv, WIFI_MP_STATE))
-   pattrib->priority = (txdesc.txdw1 >> QSEL_SHT) & 0x1f;
+   pattrib->priority =
+   (le32_to_cpu(txdesc.txdw1) >> QSEL_SHT) & 0x1f;
return _SUCCESS;
 }
 
@@ -488,7 +489,7 @@ static sint make_wlanhdr(struct _adapter *padapter, u8 *hdr,
struct ieee80211_hdr *pwlanhdr = (struct ieee80211_hdr *)hdr;
struct mlme_priv *pmlmepriv = >mlmepriv;
struct qos_priv *pqospriv = >qospriv;
-   u16 *fctrl = &pwlanhdr->frame_ctl;
+   __le16 *fctrl = &pwlanhdr->frame_ctl;
 
memset(hdr, 0, WLANHDR_OFFSET);
SetFrameSubType(fctrl, pattrib->subtype);
@@ -577,7 +578,7 @@ static sint r8712_put_snap(u8 *data, u16 h_proto)
snap->oui[0] = oui[0];
snap->oui[1] = oui[1];
snap->oui[2] = oui[2];
-   *(u16 *)(data + SNAP_SIZE) = htons(h_proto);
+   *(__be16 *)(data + SNAP_SIZE) = htons(h_proto);
return SNAP_SIZE + sizeof(u16);
 }
 
-- 
2.10.2

[PATCH 5/7] staging: r8712u: Fix Sparse endian warning in rtl871x_recv.c

2017-02-10 Thread Larry Finger

Sparse reports the following:
  CHECK   drivers/staging/rtl8712/rtl871x_recv.c
drivers/staging/rtl8712/rtl871x_recv.c:657:21: warning: incorrect type in 
assignment (different base types)
drivers/staging/rtl8712/rtl871x_recv.c:657:21:expected unsigned short 
[unsigned] [assigned] [usertype] len
drivers/staging/rtl8712/rtl871x_recv.c:657:21:got restricted __be16 
[usertype] 

Signed-off-by: Larry Finger 
---
 drivers/staging/rtl8712/rtl871x_recv.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/rtl8712/rtl871x_recv.c 
b/drivers/staging/rtl8712/rtl871x_recv.c
index 147b75b..2ef31a4 100644
--- a/drivers/staging/rtl8712/rtl871x_recv.c
+++ b/drivers/staging/rtl8712/rtl871x_recv.c
@@ -654,8 +654,9 @@ sint r8712_wlanhdr_to_ethhdr(union recv_frame *precvframe)
memcpy(ptr, pattrib->dst, ETH_ALEN);
memcpy(ptr + ETH_ALEN, pattrib->src, ETH_ALEN);
if (!bsnaphdr) {
-   len = htons(len);
-   memcpy(ptr + 12, , 2);
+   __be16 be_tmp = htons(len);
+
+   memcpy(ptr + 12, _tmp, 2);
}
return _SUCCESS;
 }
-- 
2.10.2
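
The idea behind the change above can be sketched in user space. This is only an illustration under stated assumptions, not the driver code: `put_be16_len_sketch()` and its parameters are hypothetical names, plain `uint16_t` stands in for the kernel's `__be16` annotation (which only Sparse checks), and `htons()` is the user-space one from `<arpa/inet.h>`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store a host-order length as a big-endian 16-bit
 * value at offset 12 of an Ethernet-style header. Using a separate
 * temporary keeps the host-order variable and the wire-order value
 * apart, which is what allows the two to carry different types.
 */
static void put_be16_len_sketch(uint8_t *hdr, uint16_t len)
{
	uint16_t be_tmp = htons(len);	/* wire (big-endian) representation */

	assert(hdr != NULL);
	memcpy(hdr + 12, &be_tmp, sizeof(be_tmp));
}
```

Because the bytes are written from the converted temporary, the buffer contents should be the same on little-endian and big-endian hosts.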

[PATCH 0/7] Fix Sparse endian warnings in r8712u

2017-02-10 Thread Larry Finger

Now that endian checking is an automatic part of Sparse, it is advisable
to fix these warnings under controlled conditions, which include testing
on big-endian hardware. This set of patches fixes all the issues.

Signed-off-by: Larry Finger 


Larry Finger (7):
  staging: r8712u: Fix some Sparse endian messages
  staging: r8712u: Fix endian settings for structs describing network
packets
  staging: r8712u: Fix macros used to read/write the TX/RX descriptors
  staging: r8712u: Fix Sparse warning in rtl871x_xmit.c
  staging: r8712u: Fix Sparse endian warning in rtl871x_recv.c
  staging: r8712u: Fix Sparse warnings in rtl871x_ioctl_linux.c
  staging: r8712u: Fix Sparse warnings in rtl871x_mlme.c

 drivers/staging/rtl8712/ieee80211.h   |  84 ++--
 drivers/staging/rtl8712/rtl8712_xmit.c|   6 +-
 drivers/staging/rtl8712/rtl871x_ioctl_linux.c |   4 +-
 drivers/staging/rtl8712/rtl871x_mlme.c|   9 +--
 drivers/staging/rtl8712/rtl871x_recv.c|   5 +-
 drivers/staging/rtl8712/rtl871x_xmit.c|   7 +-
 drivers/staging/rtl8712/wifi.h| 109 --
 7 files changed, 110 insertions(+), 114 deletions(-)

-- 
2.10.2

[PATCH 6/7] staging: r8712u: Fix Sparse warnings in rtl871x_ioctl_linux.c

2017-02-10 Thread Larry Finger

Sparse reports the following:
  CHECK   drivers/staging/rtl8712/rtl871x_ioctl_linux.c
drivers/staging/rtl8712/rtl871x_ioctl_linux.c:1422:46: warning: restricted 
__le16 degrades to integer
drivers/staging/rtl8712/rtl871x_ioctl_linux.c:1424:46: warning: restricted 
__le16 degrades to integer

Signed-off-by: Larry Finger 
---
 drivers/staging/rtl8712/rtl871x_ioctl_linux.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/rtl8712/rtl871x_ioctl_linux.c 
b/drivers/staging/rtl8712/rtl871x_ioctl_linux.c
index 0dc18d6..f4167f1 100644
--- a/drivers/staging/rtl8712/rtl871x_ioctl_linux.c
+++ b/drivers/staging/rtl8712/rtl871x_ioctl_linux.c
@@ -1419,9 +1419,9 @@ static int r8711_wx_get_rate(struct net_device *dev,
ht_cap = true;
pht_capie = (struct ieee80211_ht_cap *)(p + 2);
memcpy(_rate, pht_capie->supp_mcs_set, 2);
-   bw_40MHz = (pht_capie->cap_info &
+   bw_40MHz = (le16_to_cpu(pht_capie->cap_info) &
IEEE80211_HT_CAP_SUP_WIDTH) ? 1 : 0;
-   short_GI = (pht_capie->cap_info &
+   short_GI = (le16_to_cpu(pht_capie->cap_info) &
(IEEE80211_HT_CAP_SGI_20 |
IEEE80211_HT_CAP_SGI_40)) ? 1 : 0;
}
-- 
2.10.2
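
Why the conversion matters before masking can be shown with a small user-space sketch. All names here are hypothetical, and the mask value 0x0002 is an assumption made for illustration; the driver's own header defines the real constant. The second helper only simulates what a big-endian CPU would load from the same two bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration; the driver's header defines the real mask. */
#define HT_CAP_SUP_WIDTH_SKETCH 0x0002

/* Portable equivalent of le16_to_cpu() applied to two raw bytes. */
static uint16_t le16_decode_sketch(const uint8_t raw[2])
{
	return (uint16_t)(raw[0] | (raw[1] << 8));
}

/* What a big-endian CPU would see if it loaded the same bytes unconverted. */
static uint16_t be_host_raw_load_sketch(const uint8_t raw[2])
{
	return (uint16_t)((raw[0] << 8) | raw[1]);
}

/* Test the capability bit only after converting to host order. */
static bool supports_40mhz_sketch(const uint8_t cap_info[2])
{
	assert(cap_info);
	return (le16_decode_sketch(cap_info) & HT_CAP_SUP_WIDTH_SKETCH) != 0;
}
```

With the field bytes `02 00`, the converted value has the bit set, while the unconverted big-endian load would see 0x0200 and miss it.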

Re: [PATH v2 net-next] net: remove member 'max' of struct scm_fp_list

2017-02-10 Thread yuan linyu

Hi,
Yes, that was my misunderstanding.

It is an error if the list is used after the dup.

Can we do a full-size (SCM_MAX_FD) dup instead?
 

On Sat, 2017-02-11 at 10:36 +0800, yuan linyu wrote:
> From: yuan linyu 
> 
> 'max' only used at three places in scm.c,
> 1. in scm_fp_copy(), fpl->max = SCM_MAX_FD;
> 2. in scm_fp_copy(), if (fpl->count + num > fpl->max)
> 3. in scm_fp_dup(), new_fpl->max = new_fpl->count;
> at place 2, fpl->max can be replaced with SCM_MAX_FD.
> no other place read this 'max' again, so it can be removed.
> 
> Signed-off-by: yuan linyu 
> ---
> v1->v2:
> update commit log to describe correct reason to remove 'max'
> 
>  include/net/scm.h |  3 +--
>  net/core/scm.c| 20 +---
>  2 files changed, 6 insertions(+), 17 deletions(-)
> 
> diff --git a/include/net/scm.h b/include/net/scm.h
> index 59fa93c..1301227 100644
> --- a/include/net/scm.h
> +++ b/include/net/scm.h
> @@ -19,8 +19,7 @@ struct scm_creds {
>  };
>  
>  struct scm_fp_list {
> - short   count;
> - short   max;
> + unsigned intcount;
>   struct user_struct  *user;
>   struct file *fp[SCM_MAX_FD];
>  };
> diff --git a/net/core/scm.c b/net/core/scm.c
> index b6d8368..53679517 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -69,15 +69,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct 
> scm_fp_list **fplp)
>   int *fdp = (int*)CMSG_DATA(cmsg);
>   struct scm_fp_list *fpl = *fplp;
>   struct file **fpp;
> - int i, num;
> -
> - num = (cmsg->cmsg_len - sizeof(struct cmsghdr))/sizeof(int);
> -
> - if (num <= 0)
> - return 0;
> -
> - if (num > SCM_MAX_FD)
> - return -EINVAL;
> + unsigned int i, num;
>  
>   if (!fpl)
>   {
> @@ -86,18 +78,17 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct 
> scm_fp_list **fplp)
>   return -ENOMEM;
>   *fplp = fpl;
>   fpl->count = 0;
> - fpl->max = SCM_MAX_FD;
>   fpl->user = NULL;
>   }
> - fpp = >fp[fpl->count];
>  
> - if (fpl->count + num > fpl->max)
> + num = (cmsg->cmsg_len - sizeof(struct cmsghdr))/sizeof(int);
> + if (fpl->count + num > SCM_MAX_FD)
>   return -EINVAL;
>  
>   /*
>    *  Verify the descriptors and increment the usage count.
>    */
> -
> + fpp = >fp[fpl->count];
>   for (i=0; i< num; i++)
>   {
>   int fd = fdp[i];
> @@ -112,7 +103,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct 
> scm_fp_list **fplp)
>   if (!fpl->user)
>   fpl->user = get_uid(current_user());
>  
> - return num;
> + return 0;
>  }
>  
>  void __scm_destroy(struct scm_cookie *scm)
> @@ -341,7 +332,6 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
>   if (new_fpl) {
>   for (i = 0; i < fpl->count; i++)
>   get_file(fpl->fp[i]);
> - new_fpl->max = new_fpl->count;
>   new_fpl->user = get_uid(fpl->user);
>   }
>   return new_fpl;

[PATH v2 net-next] net: remove member 'max' of struct scm_fp_list

2017-02-10 Thread yuan linyu

From: yuan linyu 

'max' only used at three places in scm.c,
1. in scm_fp_copy(), fpl->max = SCM_MAX_FD;
2. in scm_fp_copy(), if (fpl->count + num > fpl->max)
3. in scm_fp_dup(), new_fpl->max = new_fpl->count;
at place 2, fpl->max can be replaced with SCM_MAX_FD.
no other place read this 'max' again, so it can be removed.

Signed-off-by: yuan linyu 
---
v1->v2:
update commit log to describe correct reason to remove 'max'

 include/net/scm.h |  3 +--
 net/core/scm.c| 20 +---
 2 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/include/net/scm.h b/include/net/scm.h
index 59fa93c..1301227 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -19,8 +19,7 @@ struct scm_creds {
 };
 
 struct scm_fp_list {
-   short   count;
-   short   max;
+   unsigned intcount;
struct user_struct  *user;
struct file *fp[SCM_MAX_FD];
 };
diff --git a/net/core/scm.c b/net/core/scm.c
index b6d8368..53679517 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -69,15 +69,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct 
scm_fp_list **fplp)
int *fdp = (int*)CMSG_DATA(cmsg);
struct scm_fp_list *fpl = *fplp;
struct file **fpp;
-   int i, num;
-
-   num = (cmsg->cmsg_len - sizeof(struct cmsghdr))/sizeof(int);
-
-   if (num <= 0)
-   return 0;
-
-   if (num > SCM_MAX_FD)
-   return -EINVAL;
+   unsigned int i, num;
 
if (!fpl)
{
@@ -86,18 +78,17 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct 
scm_fp_list **fplp)
return -ENOMEM;
*fplp = fpl;
fpl->count = 0;
-   fpl->max = SCM_MAX_FD;
fpl->user = NULL;
}
-   fpp = >fp[fpl->count];
 
-   if (fpl->count + num > fpl->max)
+   num = (cmsg->cmsg_len - sizeof(struct cmsghdr))/sizeof(int);
+   if (fpl->count + num > SCM_MAX_FD)
return -EINVAL;
 
/*
 *  Verify the descriptors and increment the usage count.
 */
-
+   fpp = >fp[fpl->count];
for (i=0; i< num; i++)
{
int fd = fdp[i];
@@ -112,7 +103,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct 
scm_fp_list **fplp)
if (!fpl->user)
fpl->user = get_uid(current_user());
 
-   return num;
+   return 0;
 }
 
 void __scm_destroy(struct scm_cookie *scm)
@@ -341,7 +332,6 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
if (new_fpl) {
for (i = 0; i < fpl->count; i++)
get_file(fpl->fp[i]);
-   new_fpl->max = new_fpl->count;
new_fpl->user = get_uid(fpl->user);
}
return new_fpl;
-- 
2.7.4

Re: [RFC PATCH net] bpf: introduce BPF_F_ALLOW_OVERRIDE flag

2017-02-10 Thread Alexei Starovoitov


On 2/10/17 1:38 PM, Andy Lutomirski wrote:

On Thu, Feb 9, 2017 at 10:59 AM, Alexei Starovoitov  wrote:

If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
to the given cgroup the descendent cgroup will be able to override
effective bpf program that was inherited from this cgroup.
By default it's not passed, therefore override is disallowed.

Examples:
1.
prog X attached to /A with default
prog Y fails to attach to /A/B and /A/B/C
Everything under /A runs prog X

2.
prog X attached to /A with ALLOW_OVERRIDE
prog Y attached to /A/B with default. Everything under /A/B runs prog Y


I think that, for ease of future extension, Y should also need
ALLOW_OVERRIDE.  Otherwise, when non-overridable hooks can stack,
there could be confusion as to whether Y should override something or
should stack.


I see. Fair enough. It's indeed easier for future extensions.


2.
we can add another flag to reverse this call order too.
Instead of calling the progs from child to parent, do parent to child.


I think the order should depend on the hook.  Hooks for
process-initiated actions (egress, socket creation) should run
innermost first and hooks for outside actions (ingress) should be
outermost first.


There are use cases where both ingress and egress
would want both orderings. For example, monitoring would want to
see the bytes that the app wants to send and it would want
to see the bytes that it's actually sending. So if something
in the middle wants to drop due to whatever conditions,
the monitoring needs to be the first and the last in the prog chain.
That's one of the use cases for 'attach_priority'.
Some high priority can be reserved for debugging and so on.


Andy,
does it all make sense?


Yes with the caveat above.


great!


Do you still insist on submitting this patch officially?


I'm not sure what you mean.


it's an RFC. In netdev we never apply rfc patches.


or you're ok keeping it overridable for now.


I really think the default should change for 4.10.  People are going


fine. will respin with requested change.
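
The two examples from the RFC text quoted above can be modelled with a small user-space sketch. It is a simplification under stated assumptions: the struct, the function names and the `-EPERM` return value are hypothetical, only one program per cgroup is modelled, and the later suggestion that Y should also need ALLOW_OVERRIDE is not implemented.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a cgroup node; not the kernel's data structure. */
struct cgroup_sketch {
	struct cgroup_sketch *parent;	/* NULL for the root */
	const char *prog;		/* attached program, NULL if none */
	bool allow_override;		/* attached with the override flag? */
};

/*
 * Attach fails if any ancestor has a program attached without the
 * override flag; otherwise the program is recorded on this cgroup.
 */
static int attach_sketch(struct cgroup_sketch *cg, const char *prog,
			 bool allow_override)
{
	struct cgroup_sketch *p;

	assert(cg && prog);
	for (p = cg->parent; p; p = p->parent)
		if (p->prog && !p->allow_override)
			return -EPERM;	/* assumed error code */

	cg->prog = prog;
	cg->allow_override = allow_override;
	return 0;
}

/* The effective program is the nearest one found walking toward the root. */
static const char *effective_prog_sketch(const struct cgroup_sketch *cg)
{
	for (; cg; cg = cg->parent)
		if (cg->prog)
			return cg->prog;
	return NULL;
}
```

In this model, attaching X to /A with the default makes every later attach below /A fail, while attaching X with the override flag lets Y take effect for /A/B and everything under it.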

Re: [PATCH] net: remove member 'max' of struct scm_fp_list

2017-02-10 Thread yuan linyu

On Fri, 2017-02-10 at 10:25 -0500, David Miller wrote:
> From: yuan linyu 
> Date: Fri, 10 Feb 2017 20:11:13 +0800
> 
> > From: yuan linyu 
> > 
> > SCM_MAX_FD can fully replace it.
> > 
> > Signed-off-by: yuan linyu 
> 
> I don't think so:
> 
> > @@ -341,7 +332,6 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
> >   if (new_fpl) {
> >   for (i = 0; i < fpl->count; i++)
> >   get_file(fpl->fp[i]);
> > - new_fpl->max = new_fpl->count;
> >   new_fpl->user = get_uid(fpl->user);
> 
> It's not set to SCM_MAX_FD here, it's set to whatever fpl->count is.
> 
> In other words, your patch breaks things.
Maybe "SCM_MAX_FD can fully replace it" was not a good description.
Actually the 'max' field is useless; the 'count' field is enough.
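
The objection quoted above can be illustrated with a user-space sketch. It assumes, as the later "full size dup" question in this thread suggests, that the duplicate is allocated with room for only `count` entries; the struct layout, the flexible array member and all function names below are hypothetical stand-ins rather than the code in net/core/scm.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the list; ints replace the file pointers. */
struct fp_list_sketch {
	short count;	/* entries in use */
	short max;	/* entries this allocation has room for */
	int fp[];
};

static struct fp_list_sketch *fp_list_alloc_sketch(short max)
{
	struct fp_list_sketch *l;

	assert(max >= 0);
	l = malloc(sizeof(*l) + (size_t)max * sizeof(l->fp[0]));
	if (!l)
		return NULL;
	l->count = 0;
	l->max = max;
	return l;
}

/* Append is bounded by the capacity of this particular allocation. */
static int fp_list_append_sketch(struct fp_list_sketch *l, int fd)
{
	if (l->count + 1 > l->max)
		return -1;
	l->fp[l->count++] = fd;
	return 0;
}

/* The copy only has room for the entries in use, so max becomes count. */
static struct fp_list_sketch *fp_list_dup_sketch(const struct fp_list_sketch *src)
{
	size_t bytes = sizeof(*src) + (size_t)src->count * sizeof(src->fp[0]);
	struct fp_list_sketch *copy = malloc(bytes);

	if (!copy)
		return NULL;
	memcpy(copy, src, bytes);
	copy->max = copy->count;
	return copy;
}
```

If the bound were a compile-time constant instead of `max`, an append to the shorter copy would write past its allocation; in this sketch the per-list `max` is what prevents that.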

[PATCH] net: ethernet: ti: cpsw: return NET_XMIT_DROP if skb_padto failed

2017-02-10 Thread Ivan Khoronzhuk

If skb_padto failed the skb has been dropped already, so it was
consumed, but it doesn't mean it was sent, thus no need to update
queue tx time, etc. So, return NET_XMIT_DROP as more appropriate.

Signed-off-by: Ivan Khoronzhuk 
---
Based on net-next/master

 drivers/net/ethernet/ti/cpsw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 4d1c0c3..503fa8a 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1604,7 +1604,7 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff 
*skb,
if (skb_padto(skb, CPSW_MIN_PACKET_SIZE)) {
cpsw_err(priv, tx_err, "packet pad failed\n");
ndev->stats.tx_dropped++;
-   return NETDEV_TX_OK;
+   return NET_XMIT_DROP;
}
 
if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP &&
-- 
2.7.4

Re: [PATCH 2/3] Bluetooth: cmtp: fix possible might sleep error in cmtp_session

2017-02-10 Thread Brian Norris

Hi,

On Tue, Jan 24, 2017 at 12:07:50PM +0800, Jeffy Chen wrote:
> It looks like cmtp_session has the same pattern as the issue reported in
> old rfcomm:
> 
>   while (1) {
>   set_current_state(TASK_INTERRUPTIBLE);
>   if (condition)
>   break;
>   // may call might_sleep here
>   schedule();
>   }
>   __set_current_state(TASK_RUNNING);
> 
> Which was fixed in:
>   dfb2fae Bluetooth: Fix nested sleeps
> 
> So let's fix it in the same way, also following the suggestion of:
> https://lwn.net/Articles/628628/
> 
> Signed-off-by: Jeffy Chen 
> ---
> 
>  net/bluetooth/cmtp/core.c | 21 ++---
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/net/bluetooth/cmtp/core.c b/net/bluetooth/cmtp/core.c
> index 9e59b66..6b03f2b 100644
> --- a/net/bluetooth/cmtp/core.c
> +++ b/net/bluetooth/cmtp/core.c
> @@ -280,16 +280,16 @@ static int cmtp_session(void *arg)
>   struct cmtp_session *session = arg;
>   struct sock *sk = session->sock->sk;
>   struct sk_buff *skb;
> - wait_queue_t wait;
> + DEFINE_WAIT_FUNC(wait, woken_wake_function);
>  
>   BT_DBG("session %p", session);
>  
>   set_user_nice(current, -15);
>  
> - init_waitqueue_entry(, current);
>   add_wait_queue(sk_sleep(sk), );
>   while (1) {
> - set_current_state(TASK_INTERRUPTIBLE);
> + /* Ensure session->terminate is updated */
> + smp_mb__before_atomic();
>  
>   if (atomic_read(>terminate))
>   break;
> @@ -306,9 +306,8 @@ static int cmtp_session(void *arg)
>  
>   cmtp_process_transmit(session);
>  
> - schedule();
> + wait_woken(, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
>   }
> - __set_current_state(TASK_RUNNING);
>   remove_wait_queue(sk_sleep(sk), );
>  
>   down_write(_session_sem);
> @@ -393,7 +392,11 @@ int cmtp_add_connection(struct cmtp_connadd_req *req, 
> struct socket *sock)
>   err = cmtp_attach_device(session);
>   if (err < 0) {
>   atomic_inc(>terminate);
> - wake_up_process(session->task);
> +
> + /* Ensure session->terminate is updated */
> + smp_mb__after_atomic();
> +

Same comment about the barrier.

> + wake_up_interruptible(sk_sleep(session->sock->sk));
>   up_write(_session_sem);
>   return err;
>   }
> @@ -431,7 +434,11 @@ int cmtp_del_connection(struct cmtp_conndel_req *req)
>  
>   /* Stop session thread */
>   atomic_inc(>terminate);
> - wake_up_process(session->task);
> +
> + /* Ensure session->terminate is updated */
> + smp_mb__after_atomic();

And again.

But otherwise I think this looks OK, again with the caveat that I don't
know Bluetooth/CMTP that well:

Reviewed-by: Brian Norris 

> +
> + wake_up_interruptible(sk_sleep(session->sock->sk));
>   } else
>   err = -ENOENT;
>  
> -- 
> 2.1.4
> 
>
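
The reason the patched loop may call sleeping functions between the condition check and the wait can be sketched with a single-threaded model. This is an assumption-laden illustration, not the kernel implementation: the names are hypothetical, a plain `bool` stands in for the entry's "woken" flag, and actually blocking is replaced by a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wait-queue entry; 'woken' models the per-entry woken flag. */
struct wait_entry_sketch {
	bool woken;
};

/* Waker side: record the wake-up on the entry itself. */
static void wake_sketch(struct wait_entry_sketch *w)
{
	assert(w != NULL);
	w->woken = true;
}

/*
 * Waiter side: returns true if the caller would have to block, false if
 * a wake-up had already been recorded. The flag is consumed either way.
 */
static bool wait_woken_sketch(struct wait_entry_sketch *w)
{
	bool would_block;

	assert(w != NULL);
	would_block = !w->woken;
	w->woken = false;
	return would_block;
}
```

A wake-up that arrives while the loop body is still running is remembered on the entry, so the following wait returns immediately instead of losing it.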

Re: [PATCH 1/3] Bluetooth: bnep: fix possible might sleep error in bnep_session

2017-02-10 Thread Brian Norris

Hi,

On Tue, Jan 24, 2017 at 12:07:49PM +0800, Jeffy Chen wrote:
> It looks like bnep_session has the same pattern as the issue reported in
> old rfcomm:
> 
>   while (1) {
>   set_current_state(TASK_INTERRUPTIBLE);
>   if (condition)
>   break;
>   // may call might_sleep here
>   schedule();
>   }
>   __set_current_state(TASK_RUNNING);
> 
> Which was fixed in:
>   dfb2fae Bluetooth: Fix nested sleeps
> 
> So let's fix it in the same way, also following the suggestion of:
> https://lwn.net/Articles/628628/
> 
> Signed-off-by: Jeffy Chen 
> ---
> 
>  net/bluetooth/bnep/core.c | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/net/bluetooth/bnep/core.c b/net/bluetooth/bnep/core.c
> index fbf251f..da04d51 100644
> --- a/net/bluetooth/bnep/core.c
> +++ b/net/bluetooth/bnep/core.c
> @@ -484,16 +484,16 @@ static int bnep_session(void *arg)
>   struct net_device *dev = s->dev;
>   struct sock *sk = s->sock->sk;
>   struct sk_buff *skb;
> - wait_queue_t wait;
> + DEFINE_WAIT_FUNC(wait, woken_wake_function);
>  
>   BT_DBG("");
>  
>   set_user_nice(current, -15);
>  
> - init_waitqueue_entry(, current);
>   add_wait_queue(sk_sleep(sk), );
>   while (1) {
> - set_current_state(TASK_INTERRUPTIBLE);
> + /* Ensure session->terminate is updated */
> + smp_mb__before_atomic();
>  
>   if (atomic_read(>terminate))
>   break;
> @@ -515,9 +515,8 @@ static int bnep_session(void *arg)
>   break;
>   netif_wake_queue(dev);
>  
> - schedule();
> + wait_woken(, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
>   }
> - __set_current_state(TASK_RUNNING);
>   remove_wait_queue(sk_sleep(sk), );
>  
>   /* Cleanup session */
> @@ -666,7 +665,11 @@ int bnep_del_connection(struct bnep_conndel_req *req)
>   s = __bnep_get_session(req->dst);
>   if (s) {
>   atomic_inc(>terminate);
> - wake_up_process(s->task);
> +
> + /* Ensure session->terminate is updated */
> + smp_mb__after_atomic();
> +

__wake_up() suggests:

 * It may be assumed that this function implies a write memory barrier before
 * changing the task state if and only if any tasks are woken up.

so the above barrier is probably unnecessary. I'm not so sure about the
one before atomic_read(); seems fine.

Other than that, I think this looks OK:

Reviewed-by: Brian Norris 

But I haven't been testing BNEP.

Brian

> + wake_up_interruptible(sk_sleep(s->sock->sk));
>   } else
>   err = -ENOENT;
>  
> -- 
> 2.1.4
> 
>

Re: [PATCH 3/3] Bluetooth: hidp: fix possible might sleep error in hidp_session_thread

2017-02-10 Thread Brian Norris

Hi Jeffy,

I'm really not an expert on bluetooth or HIDP, but I can't bring myself
to say that this is correct. I still think you have a problem.

On Tue, Jan 24, 2017 at 12:07:51PM +0800, Jeffy Chen wrote:
> It looks like hidp_session_thread has the same pattern as the issue reported in
> old rfcomm:
> 
>   while (1) {
>   set_current_state(TASK_INTERRUPTIBLE);
>   if (condition)
>   break;
>   // may call might_sleep here
>   schedule();
>   }
>   __set_current_state(TASK_RUNNING);
> 
> Which was fixed in:
>   dfb2fae Bluetooth: Fix nested sleeps
> 
> So let's fix it in the same way, also following the suggestion of:
> https://lwn.net/Articles/628628/
> 
> Signed-off-by: Jeffy Chen 
> ---
> 
>  net/bluetooth/hidp/core.c | 23 +++
>  1 file changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
> index 0bec458..43d6e6a 100644
> --- a/net/bluetooth/hidp/core.c
> +++ b/net/bluetooth/hidp/core.c
> @@ -36,6 +36,7 @@
>  #define VERSION "1.2"
>  
>  static DECLARE_RWSEM(hidp_session_sem);
> +static DECLARE_WAIT_QUEUE_HEAD(hidp_session_wq);
>  static LIST_HEAD(hidp_session_list);
>  
>  static unsigned char hidp_keycode[256] = {
> @@ -1068,12 +1069,15 @@ static int hidp_session_start_sync(struct 
> hidp_session *session)
>   * Wake up session thread and notify it to stop. This is asynchronous and
>   * returns immediately. Call this whenever a runtime error occurs and you 
> want
>   * the session to stop.
> - * Note: wake_up_process() performs any necessary memory-barriers for us.
>   */
>  static void hidp_session_terminate(struct hidp_session *session)
>  {
>   atomic_inc(>terminate);
> - wake_up_process(session->task);
> +
> + /* Ensure session->terminate is updated */
> + smp_mb__after_atomic();
> +
> + wake_up_interruptible(_session_wq);

So, you're adding a whole new wait queue here.

>  }
>  
>  /*
> @@ -1180,7 +1184,9 @@ static void hidp_session_run(struct hidp_session 
> *session)
>   struct sock *ctrl_sk = session->ctrl_sock->sk;
>   struct sock *intr_sk = session->intr_sock->sk;
>   struct sk_buff *skb;
> + DEFINE_WAIT_FUNC(wait, woken_wake_function);
>  
> + add_wait_queue(_session_wq, );
>   for (;;) {
>   /*
>* This thread can be woken up two ways:
> @@ -1188,12 +1194,10 @@ static void hidp_session_run(struct hidp_session 
> *session)
>*session->terminate flag and wakes this thread up.
>*  - Via modifying the socket state of ctrl/intr_sock. This
>*thread is woken up by ->sk_state_changed().
> -  *
> -  * Note: set_current_state() performs any necessary
> -  * memory-barriers for us.
>*/
> - set_current_state(TASK_INTERRUPTIBLE);
>  
> + /* Ensure session->terminate is updated */
> + smp_mb__before_atomic();
>   if (atomic_read(>terminate))
>   break;
>  
> @@ -1227,11 +1231,14 @@ static void hidp_session_run(struct hidp_session 
> *session)
>   hidp_process_transmit(session, >ctrl_transmit,
> session->ctrl_sock);
>  
> - schedule();
> + wait_woken(, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);

And you're waiting on it here.

But you're already on two other wait queues (hidp_session_thread()). So
the nice WQ_FLAG_WOKEN handling will only happen if you get woken via
the new hidp_session_wq queue. But what about the other two? Seems like
again you might have a race condition that would lead you to
(temporarily, at least?) missing a wake-up attempt.

I'm not really sure what the best way to resolve this would be. My best
guess would be to either consolidate the use of these wait queues, or
else roll a version of wait_woken() to handle 2 or more wait heads...

Am I wrong? I easily could be.

Brian

>   }
> + remove_wait_queue(_session_wq, );
>  
>   atomic_inc(>terminate);
> - set_current_state(TASK_RUNNING);
> +
> + /* Ensure session->terminate is updated */
> + smp_mb__after_atomic();
>  }
>  
>  /*
> -- 
> 2.1.4
> 
>
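
The concern raised in this review can be modelled in a single-threaded sketch. It rests on assumptions: all names are hypothetical, one entry uses a wake function that records a "woken" flag while the other only makes the task runnable, and blocking is replaced by a return value. It is not the kernel's wait_woken().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task_sketch {
	bool runnable;
};

/* Hypothetical wait-queue entry attached to one wait queue head. */
struct wq_entry_sketch {
	struct task_sketch *task;
	bool sets_woken;	/* true: wake function also records 'woken' */
	bool woken;
};

/* Wake-up delivered through the queue this entry sits on. */
static void wake_via_sketch(struct wq_entry_sketch *e)
{
	assert(e && e->task);
	if (e->sets_woken)
		e->woken = true;
	e->task->runnable = true;
}

/*
 * Model of waiting on one entry: mark the task as sleeping, then block
 * unless this entry recorded a wake-up. Returns true if it would block.
 */
static bool wait_woken_blocks_sketch(struct wq_entry_sketch *e)
{
	bool blocks;

	assert(e && e->task);
	e->task->runnable = false;
	blocks = !e->woken;
	if (!blocks)
		e->task->runnable = true;
	e->woken = false;
	return blocks;
}
```

In this model a wake-up delivered through the socket entry just before the wait leaves no trace on the session entry, so the waiter would block anyway; that is the missed wake-up the review asks about.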

Re: [PATCH] net: ethernet: ti: netcp_core: return netdev_tx_t in xmit

2017-02-10 Thread Ivan Khoronzhuk

On Fri, Feb 10, 2017 at 02:45:21PM -0500, David Miller wrote:
> From: Ivan Khoronzhuk 
> Date: Thu,  9 Feb 2017 16:24:14 +0200
> 
> > @@ -1300,7 +1301,7 @@ static int netcp_ndo_start_xmit(struct sk_buff *skb, 
> > struct net_device *ndev)
> > dev_warn(netcp->ndev_dev, "padding failed (%d), packet 
> > dropped\n",
> >  ret);
> > tx_stats->tx_dropped++;
> > -   return ret;
> > +   return NETDEV_TX_BUSY;
> > }
> > skb->len = NETCP_MIN_PACKET_SIZE;
> > }
> > @@ -1329,7 +1330,7 @@ static int netcp_ndo_start_xmit(struct sk_buff *skb, 
> > struct net_device *ndev)
> > if (desc)
> > netcp_free_tx_desc_chain(netcp, desc, sizeof(*desc));
> > dev_kfree_skb(skb);
> > -   return ret;
> > +   return NETDEV_TX_BUSY;
> >  }
> 
> I really think these should be returning NET_XMIT_DROP.

Yes, it seems a few more changes are needed here then; I will send a new
version later.

Re: net: hix5hd2_gmac uninitialized net_device

2017-02-10 Thread Marty Plummer

On Fri, Feb 10, 2017 at 06:21:35PM +0800, Dongpo Li wrote:
> I think the error "No irq resource" happened for some other reason and has no
> relation to the info "(unnamed net_device) (uninitialized):".
> You can add more debug info to find the bug.
Do you have any particular suggestions as to what to check out, or is
this just a general 'debug more' instruction?
> Yes, I agree with you that the ndev has not been initialized completely,
> because the function "register_netdev" has not been called yet.
> It's better to use the "dev_err" to replace the "netdev_err".
>
Ah, I see. So, prior to line 1266's call to register_netdev, it will
always be uninitialized and unnamed, regardless of what is or isn't
right elsewhere. Good to know. So, I could replace these netdev_err
with dev_err for now, up until that point, so I can get a bit more info,
yes?
> 
> Regards,
> Dongpo
>

Regards,
Marty

Re: [PATCH v3 net-next 4/9] sunvnet: add driver stats for ethtool support

2017-02-10 Thread Stephen Hemminger

On Fri, 10 Feb 2017 09:38:20 -0800
Shannon Nelson  wrote:

> +static void vsw_get_ethtool_stats(struct net_device *dev,
> +   struct ethtool_stats *estats, u64 *data)
> +{
> + int i = 0;
> +
> + data[i++] = dev->stats.rx_packets;
> + data[i++] = dev->stats.tx_packets;
> + data[i++] = dev->stats.rx_bytes;
> + data[i++] = dev->stats.tx_bytes;
> + data[i++] = dev->stats.rx_errors;
> + data[i++] = dev->stats.tx_errors;
> + data[i++] = dev->stats.rx_dropped;
> + data[i++] = dev->stats.tx_dropped;
> + data[i++] = dev->stats.multicast;

Please do not duplicate regular network statistics into ethtool.
This doesn't really add any value.

[PATCHv6 0/7] Refactor macvtap to re-use tap functionality by other virtual interfaces

2017-02-10 Thread Sainath Grandhi

Tap character devices can be implemented on other virtual interfaces like
ipvlan, similar to macvtap. Source code for tap functionality in macvtap
can be re-used for this purpose.

This patch series splits macvtap source into two modules, macvtap and tap.
This patch series also includes a patch for implementing tap character
device driver based on the IP-VLAN network interface, called ipvtap.

These patches are tested on x86 platform.

Sainath Grandhi (7):
  tap: Refactoring macvtap.c
  tap: Renaming tap related APIs, data structures, macros
  tap: Tap character device creation/destroy API
  tap: Abstract type of virtual interface from tap implementation
  tap: Extending tap device create/destroy APIs
  tap: tap as an independent module
  ipvtap: IP-VLAN based tap driver

 drivers/net/Kconfig  |   20 +
 drivers/net/Makefile |2 +
 drivers/net/ipvlan/Makefile  |1 +
 drivers/net/ipvlan/ipvlan.h  |7 +
 drivers/net/ipvlan/ipvlan_core.c |3 +-
 drivers/net/ipvlan/ipvlan_main.c |   27 +-
 drivers/net/ipvlan/ipvtap.c  |  241 +++
 drivers/net/macvlan.c|2 +-
 drivers/net/macvtap.c| 1229 ++--
 drivers/net/tap.c| 1285 ++
 drivers/vhost/Kconfig|2 +-
 drivers/vhost/net.c  |3 +-
 include/linux/if_macvlan.h   |   17 +-
 include/linux/if_tap.h   |   75 +++
 14 files changed, 1706 insertions(+), 1208 deletions(-)
 create mode 100644 drivers/net/ipvlan/ipvtap.c
 create mode 100644 drivers/net/tap.c
 create mode 100644 include/linux/if_tap.h

-- 
2.7.4

Re: [PATCH 0/2] net: ethernet: ti: cpsw: fix susp/resume

2017-02-10 Thread Ivan Khoronzhuk

On Fri, Feb 10, 2017 at 12:05:07PM -0600, Grygorii Strashko wrote:
> 
> 
> On 02/09/2017 07:45 PM, David Miller wrote:
> >From: Ivan Khoronzhuk 
> >Date: Fri, 10 Feb 2017 00:54:24 +0200
> >
> >>On Thu, Feb 09, 2017 at 05:21:26PM -0500, David Miller wrote:
> >>>From: Ivan Khoronzhuk 
> >>>Date: Thu,  9 Feb 2017 02:07:34 +0200
> >>>
> These two patches fix suspend/resume chain.
> >>>
> >>>Patch 2 doesn't apply cleanly to the 'net' tree, please
> >>>respin this series.
> >>
> >>Strange, I've just checked it on net-next/master, it was applied w/o any
> >>warnings.
> >
> >It makes no sense to test "net-next" when I am telling you that it is
> >the "net" tree it doesn't apply to.
> >
> >This is a bug fix, so it should be targetting the "net" tree.
> >
> 
> Looks like the first fix is for net, but the second one is for net-next
> I do not see
> 03fd01ad0eead23eb79294b6fb4d71dcac493855
> "net: ethernet: ti: cpsw: don't duplicate ndev_running"
> in net.

There is a dependency: both are for net-next, and only the first is for the net tree.

> 
> -- 
> regards,
> -grygorii

[PATCHv6 1/7] tap: Refactoring macvtap.c

2017-02-10 Thread Sainath Grandhi

The macvtap module has code for tap/queue management and link management. This
patch splits the code into macvtap_main.c for link management and tap.c for
tap/queue management. Functionality in tap.c can be re-used for implementing
tap on other virtual interfaces.

Signed-off-by: Sainath Grandhi 
---
 drivers/net/Makefile |   2 +
 drivers/net/macvtap_main.c   | 218 +++
 drivers/net/{macvtap.c => tap.c} | 204 ++--
 include/linux/if_macvtap.h   |  10 ++
 4 files changed, 238 insertions(+), 196 deletions(-)
 create mode 100644 drivers/net/macvtap_main.c
 rename drivers/net/{macvtap.c => tap.c} (84%)
 create mode 100644 include/linux/if_macvtap.h

diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 7336cbd..19b03a9 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -29,6 +29,8 @@ obj-$(CONFIG_GTP) += gtp.o
 obj-$(CONFIG_NLMON) += nlmon.o
 obj-$(CONFIG_NET_VRF) += vrf.o
 
+macvtap-objs := macvtap_main.o tap.o
+
 #
 # Networking Drivers
 #
diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
new file mode 100644
index 000..96ffa60
--- /dev/null
+++ b/drivers/net/macvtap_main.c
@@ -0,0 +1,218 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Variables for dealing with macvtaps device numbers.
+ */
+static dev_t macvtap_major;
+#define MACVTAP_NUM_DEVS (1U << MINORBITS)
+
+static const void *macvtap_net_namespace(struct device *d)
+{
+   struct net_device *dev = to_net_dev(d->parent);
+   return dev_net(dev);
+}
+
+static struct class macvtap_class = {
+   .name = "macvtap",
+   .owner = THIS_MODULE,
+   .ns_type = _ns_type_operations,
+   .namespace = macvtap_net_namespace,
+};
+static struct cdev macvtap_cdev;
+
+#define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
+ NETIF_F_TSO6 | NETIF_F_UFO)
+
+static int macvtap_newlink(struct net *src_net,
+  struct net_device *dev,
+  struct nlattr *tb[],
+  struct nlattr *data[])
+{
+   struct macvlan_dev *vlan = netdev_priv(dev);
+   int err;
+
+   INIT_LIST_HEAD(>queue_list);
+
+   /* Since macvlan supports all offloads by default, make
+* tap support all offloads also.
+*/
+   vlan->tap_features = TUN_OFFLOADS;
+
+   err = netdev_rx_handler_register(dev, macvtap_handle_frame, vlan);
+   if (err)
+   return err;
+
+   /* Don't put anything that may fail after macvlan_common_newlink
+* because we can't undo what it does.
+*/
+   err = macvlan_common_newlink(src_net, dev, tb, data);
+   if (err) {
+   netdev_rx_handler_unregister(dev);
+   return err;
+   }
+
+   return 0;
+}
+
+static void macvtap_dellink(struct net_device *dev,
+   struct list_head *head)
+{
+   netdev_rx_handler_unregister(dev);
+   macvtap_del_queues(dev);
+   macvlan_dellink(dev, head);
+}
+
+static void macvtap_setup(struct net_device *dev)
+{
+   macvlan_common_setup(dev);
+   dev->tx_queue_len = TUN_READQ_SIZE;
+}
+
+static struct rtnl_link_ops macvtap_link_ops __read_mostly = {
+   .kind   = "macvtap",
+   .setup  = macvtap_setup,
+   .newlink= macvtap_newlink,
+   .dellink= macvtap_dellink,
+};
+
+static int macvtap_device_event(struct notifier_block *unused,
+   unsigned long event, void *ptr)
+{
+   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+   struct macvlan_dev *vlan;
+   struct device *classdev;
+   dev_t devt;
+   int err;
+   char tap_name[IFNAMSIZ];
+
+   if (dev->rtnl_link_ops != _link_ops)
+   return NOTIFY_DONE;
+
+   snprintf(tap_name, IFNAMSIZ, "tap%d", dev->ifindex);
+   vlan = netdev_priv(dev);
+
+   switch (event) {
+   case NETDEV_REGISTER:
+   /* Create the device node here after the network device has
+* been registered but before register_netdevice has
+* finished running.
+*/
+   err = macvtap_get_minor(vlan);
+   if (err)
+   return notifier_from_errno(err);
+
+   devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
+   classdev = device_create(_class, >dev, devt,
+dev, tap_name);
+   if (IS_ERR(classdev)) {
+   macvtap_free_minor(vlan);
+   return notifier_from_errno(PTR_ERR(classdev));
+   }
+   err =

[PATCHv6 2/7] tap: Renaming tap related APIs, data structures, macros

2017-02-10 Thread Sainath Grandhi

Renaming tap related APIs, data structures and macros in tap.c from
macvtap_.* to tap_.*

Signed-off-by: Sainath Grandhi 
---
 drivers/net/macvtap_main.c |  18 +--
 drivers/net/tap.c  | 332 ++---
 drivers/vhost/net.c|   3 +-
 include/linux/if_macvlan.h |  17 +--
 include/linux/if_macvtap.h |  10 --
 include/linux/if_tap.h |  23 
 6 files changed, 202 insertions(+), 201 deletions(-)
 delete mode 100644 include/linux/if_macvtap.h
 create mode 100644 include/linux/if_tap.h

diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
index 96ffa60..548f339 100644
--- a/drivers/net/macvtap_main.c
+++ b/drivers/net/macvtap_main.c
@@ -1,6 +1,6 @@
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -62,7 +62,7 @@ static int macvtap_newlink(struct net *src_net,
 */
vlan->tap_features = TUN_OFFLOADS;
 
-   err = netdev_rx_handler_register(dev, macvtap_handle_frame, vlan);
+   err = netdev_rx_handler_register(dev, tap_handle_frame, vlan);
if (err)
return err;
 
@@ -82,7 +82,7 @@ static void macvtap_dellink(struct net_device *dev,
struct list_head *head)
 {
netdev_rx_handler_unregister(dev);
-   macvtap_del_queues(dev);
+   tap_del_queues(dev);
macvlan_dellink(dev, head);
 }
 
@@ -121,7 +121,7 @@ static int macvtap_device_event(struct notifier_block 
*unused,
 * been registered but before register_netdevice has
 * finished running.
 */
-   err = macvtap_get_minor(vlan);
+   err = tap_get_minor(vlan);
if (err)
return notifier_from_errno(err);
 
@@ -129,7 +129,7 @@ static int macvtap_device_event(struct notifier_block 
*unused,
classdev = device_create(_class, >dev, devt,
 dev, tap_name);
if (IS_ERR(classdev)) {
-   macvtap_free_minor(vlan);
+   tap_free_minor(vlan);
return notifier_from_errno(PTR_ERR(classdev));
}
err = sysfs_create_link(>dev.kobj, >kobj,
@@ -144,10 +144,10 @@ static int macvtap_device_event(struct notifier_block 
*unused,
sysfs_remove_link(>dev.kobj, tap_name);
devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
device_destroy(_class, devt);
-   macvtap_free_minor(vlan);
+   tap_free_minor(vlan);
break;
case NETDEV_CHANGE_TX_QUEUE_LEN:
-   if (macvtap_queue_resize(vlan))
+   if (tap_queue_resize(vlan))
return NOTIFY_BAD;
break;
}
@@ -159,7 +159,7 @@ static struct notifier_block macvtap_notifier_block 
__read_mostly = {
.notifier_call  = macvtap_device_event,
 };
 
-extern struct file_operations macvtap_fops;
+extern struct file_operations tap_fops;
 static int macvtap_init(void)
 {
int err;
@@ -169,7 +169,7 @@ static int macvtap_init(void)
if (err)
goto out1;
 
-   cdev_init(_cdev, _fops);
+   cdev_init(_cdev, _fops);
err = cdev_add(_cdev, macvtap_major, MACVTAP_NUM_DEVS);
if (err)
goto out2;
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 6f6228e..15ca2d5 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -24,16 +24,16 @@
 #include 
 
 /*
- * A macvtap queue is the central object of this driver, it connects
+ * A tap queue is the central object of this driver, it connects
  * an open character device to a macvlan interface. There can be
  * multiple queues on one interface, which map back to queues
  * implemented in hardware on the underlying device.
  *
- * macvtap_proto is used to allocate queues through the sock allocation
+ * tap_proto is used to allocate queues through the sock allocation
  * mechanism.
  *
  */
-struct macvtap_queue {
+struct tap_queue {
struct sock sk;
struct socket sock;
struct socket_wq wq;
@@ -47,21 +47,21 @@ struct macvtap_queue {
struct skb_array skb_array;
 };
 
-#define MACVTAP_FEATURES (IFF_VNET_HDR | IFF_MULTI_QUEUE)
+#define TAP_IFFEATURES (IFF_VNET_HDR | IFF_MULTI_QUEUE)
 
-#define MACVTAP_VNET_LE 0x8000
-#define MACVTAP_VNET_BE 0x4000
+#define TAP_VNET_LE 0x8000
+#define TAP_VNET_BE 0x4000
 
 #ifdef CONFIG_TUN_VNET_CROSS_LE
-static inline bool macvtap_legacy_is_little_endian(struct macvtap_queue *q)
+static inline bool tap_legacy_is_little_endian(struct tap_queue *q)
 {
-   return q->flags & MACVTAP_VNET_BE ? false :
+   return q->flags & TAP_VNET_BE ? false :
virtio_legacy_is_little_endian();
 }
 
-static long macvtap_get_vnet_be(struct macvtap_queue *q, int __user *sp)
+static long tap_get_vnet_be(struct tap_queue *q, int __user *sp)

RE: [PATCHv5 0/7] Refactor macvtap to re-use tap functionality by other virtual intefaces

2017-02-10 Thread Grandhi, Sainath



> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, February 09, 2017 2:08 PM
> To: Grandhi, Sainath 
> Cc: netdev@vger.kernel.org; mah...@bandewar.net; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCHv5 0/7] Refactor macvtap to re-use tap functionality by
> other virtual intefaces
> 
> From: Sainath Grandhi 
> Date: Wed,  8 Feb 2017 13:37:09 -0800
> 
> > Tap character devices can be implemented on other virtual interfaces
> > like ipvlan, similar to macvtap. Source code for tap functionality in
> > macvtap can be re-used for this purpose.
> >
> > This patch series splits macvtap source into two modules, macvtap and tap.
> > This patch series also includes a patch for implementing tap character
> > device driver based on the IP-VLAN network interface, called ipvtap.
> >
> > These patches are tested on x86 platform.
> 
> I get rejects on patch #7 when I try to apply this to net-next, please respin.

Please check the next version; I have based it on net-next.
There is a change to ipvlan_core.c in the "net-next" repo that has not made it
into the "net" repo.

[PATCHv6 4/7] tap: Abstract type of virtual interface from tap implementation

2017-02-10 Thread Sainath Grandhi

The macvlan object is restructured to hold tap-related elements in a separate
entity, tap_dev. Upon the NETDEV_REGISTER device event, tap_dev is registered
with the idr and fetched again on tap_open. A few of the tap functions are
modified to accept tap_dev as an argument. The tap_dev object includes callbacks
used by the underlying virtual interface to take care of tx and rx accounting.
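
The embedding and callback arrangement described above can be sketched in user
space as follows. This is only an illustration under stated assumptions, not
the code from the patch: every demo_* name, including the local
demo_container_of macro, is a hypothetical stand-in for the kernel structures
and for the kernel's container_of().

```c
#include <stddef.h>

/* User-space stand-in for the kernel's container_of() (assumption). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic tap state; it knows nothing about the interface type. */
struct demo_tap {
	void (*count_tx_dropped)(struct demo_tap *tap);
};

/* Interface-specific state, reduced to a single counter. */
struct demo_vlan {
	unsigned long tx_dropped;
};

/* Wrapper that embeds both, in the spirit of struct macvtap_dev. */
struct demo_vtap {
	struct demo_vlan vlan;
	struct demo_tap tap;
};

/* Callback supplied by the interface-specific driver. */
static void demo_vtap_count_tx_dropped(struct demo_tap *tap)
{
	struct demo_vtap *vtap = demo_container_of(tap, struct demo_vtap, tap);

	vtap->vlan.tx_dropped++;
}

/* Generic tap code reports a drop through the callback only. */
static void demo_tap_drop(struct demo_tap *tap)
{
	if (tap->count_tx_dropped)
		tap->count_tx_dropped(tap);
}
```

The point of the design is that the generic code only ever sees the embedded
tap object, while the callback recovers the enclosing wrapper to update the
statistics of its own interface type.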

Signed-off-by: Sainath Grandhi 
---
 drivers/net/macvlan.c  |   2 +-
 drivers/net/macvtap_main.c |  71 +---
 drivers/net/tap.c  | 264 -
 include/linux/if_tap.h |  57 +-
 4 files changed, 229 insertions(+), 165 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index cbfc1be..9261722 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -1525,7 +1525,6 @@ static const struct nla_policy 
macvlan_policy[IFLA_MACVLAN_MAX + 1] = {
 int macvlan_link_register(struct rtnl_link_ops *ops)
 {
/* common fields */
-   ops->priv_size  = sizeof(struct macvlan_dev);
ops->validate   = macvlan_validate;
ops->maxtype= IFLA_MACVLAN_MAX;
ops->policy = macvlan_policy;
@@ -1548,6 +1547,7 @@ static struct rtnl_link_ops macvlan_link_ops = {
.newlink= macvlan_newlink,
.dellink= macvlan_dellink,
.get_link_net   = macvlan_get_link_net,
+   .priv_size  = sizeof(struct macvlan_dev),
 };
 
 static int macvlan_device_event(struct notifier_block *unused,
diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
index 215ab7a..0238df6 100644
--- a/drivers/net/macvtap_main.c
+++ b/drivers/net/macvtap_main.c
@@ -24,6 +24,11 @@
 #include 
 #include 
 
+struct macvtap_dev {
+   struct macvlan_dev vlan;
+   struct tap_dev tap;
+};
+
 /*
  * Variables for dealing with macvtaps device numbers.
  */
@@ -46,22 +51,55 @@ static struct cdev macvtap_cdev;
 #define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
  NETIF_F_TSO6 | NETIF_F_UFO)
 
+static void macvtap_count_tx_dropped(struct tap_dev *tap)
+{
+   struct macvtap_dev *vlantap = container_of(tap, struct macvtap_dev, 
tap);
+   struct macvlan_dev *vlan = >vlan;
+
+   this_cpu_inc(vlan->pcpu_stats->tx_dropped);
+}
+
+static void macvtap_count_rx_dropped(struct tap_dev *tap)
+{
+   struct macvtap_dev *vlantap = container_of(tap, struct macvtap_dev, 
tap);
+   struct macvlan_dev *vlan = >vlan;
+
+   macvlan_count_rx(vlan, 0, 0, 0);
+}
+
+static void macvtap_update_features(struct tap_dev *tap,
+   netdev_features_t features)
+{
+   struct macvtap_dev *vlantap = container_of(tap, struct macvtap_dev, 
tap);
+   struct macvlan_dev *vlan = >vlan;
+
+   vlan->set_features = features;
+   netdev_update_features(vlan->dev);
+}
+
 static int macvtap_newlink(struct net *src_net,
   struct net_device *dev,
   struct nlattr *tb[],
   struct nlattr *data[])
 {
-   struct macvlan_dev *vlan = netdev_priv(dev);
+   struct macvtap_dev *vlantap = netdev_priv(dev);
int err;
 
-   INIT_LIST_HEAD(>queue_list);
+   INIT_LIST_HEAD(>tap.queue_list);
 
/* Since macvlan supports all offloads by default, make
 * tap support all offloads also.
 */
-   vlan->tap_features = TUN_OFFLOADS;
+   vlantap->tap.tap_features = TUN_OFFLOADS;
 
-   err = netdev_rx_handler_register(dev, tap_handle_frame, vlan);
+   /* Register callbacks for rx/tx drops accounting and updating
+* net_device features
+*/
+   vlantap->tap.count_tx_dropped = macvtap_count_tx_dropped;
+   vlantap->tap.count_rx_dropped = macvtap_count_rx_dropped;
+   vlantap->tap.update_features  = macvtap_update_features;
+
+   err = netdev_rx_handler_register(dev, tap_handle_frame, >tap);
if (err)
return err;
 
@@ -74,14 +112,18 @@ static int macvtap_newlink(struct net *src_net,
return err;
}
 
+   vlantap->tap.dev = vlantap->vlan.dev;
+
return 0;
 }
 
 static void macvtap_dellink(struct net_device *dev,
struct list_head *head)
 {
+   struct macvtap_dev *vlantap = netdev_priv(dev);
+
netdev_rx_handler_unregister(dev);
-   tap_del_queues(dev);
+   tap_del_queues(>tap);
macvlan_dellink(dev, head);
 }
 
@@ -96,13 +138,14 @@ static struct rtnl_link_ops macvtap_link_ops __read_mostly 
= {
.setup  = macvtap_setup,
.newlink= macvtap_newlink,
.dellink= macvtap_dellink,
+   .priv_size  = sizeof(struct macvtap_dev),
 };
 
 static int macvtap_device_event(struct notifier_block *unused,
unsigned long event, void *ptr)
 {
struct net_device *dev =

[PATCHv6 7/7] ipvtap: IP-VLAN based tap driver

2017-02-10 Thread Sainath Grandhi

This patch adds a tap character device driver that is based on the
IP-VLAN network interface, called ipvtap. An ipvtap device can be created
in the same way as an ipvlan device, using 'type ipvtap', and then accessed
using the tap user space interface.

Signed-off-by: Sainath Grandhi 
---
 drivers/net/Kconfig  |  13 +++
 drivers/net/Makefile |   1 +
 drivers/net/ipvlan/Makefile  |   1 +
 drivers/net/ipvlan/ipvlan.h  |   7 ++
 drivers/net/ipvlan/ipvlan_core.c |   3 +-
 drivers/net/ipvlan/ipvlan_main.c |  27 +++--
 drivers/net/ipvlan/ipvtap.c  | 241 +++
 7 files changed, 280 insertions(+), 13 deletions(-)
 create mode 100644 drivers/net/ipvlan/ipvtap.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 5763503..823bc2f 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -166,6 +166,19 @@ config IPVLAN
   To compile this driver as a module, choose M here: the module
   will be called ipvlan.
 
+config IPVTAP
+   tristate "IP-VLAN based tap driver"
+   depends on IPVLAN
+   depends on INET
+   select TAP
+   ---help---
+ This adds a specialized tap character device driver that is based
+ on the IP-VLAN network interface, called ipvtap. An ipvtap device
+ can be added in the same way as a ipvlan device, using 'type
+ ipvtap', and then be accessed through the tap user space interface.
+
+ To compile this driver as a module, choose M here: the module
+ will be called ipvtap.
 
 config VXLAN
tristate "Virtual eXtensible Local Area Network (VXLAN)"
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 7dd86ca..98ed4d9 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -7,6 +7,7 @@
 #
 obj-$(CONFIG_BONDING) += bonding/
 obj-$(CONFIG_IPVLAN) += ipvlan/
+obj-$(CONFIG_IPVTAP) += ipvlan/
 obj-$(CONFIG_DUMMY) += dummy.o
 obj-$(CONFIG_EQUALIZER) += eql.o
 obj-$(CONFIG_IFB) += ifb.o
diff --git a/drivers/net/ipvlan/Makefile b/drivers/net/ipvlan/Makefile
index df79910..8a2c64d 100644
--- a/drivers/net/ipvlan/Makefile
+++ b/drivers/net/ipvlan/Makefile
@@ -3,5 +3,6 @@
 #
 
 obj-$(CONFIG_IPVLAN) += ipvlan.o
+obj-$(CONFIG_IPVTAP) += ipvtap.o
 
 ipvlan-objs := ipvlan_core.o ipvlan_main.o
diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
index 406ae4f..800a46c 100644
--- a/drivers/net/ipvlan/ipvlan.h
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -135,4 +135,11 @@ struct sk_buff *ipvlan_l3_rcv(struct net_device *dev, 
struct sk_buff *skb,
  u16 proto);
 unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
 const struct nf_hook_state *state);
+void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
+unsigned int len, bool success, bool mcast);
+int ipvlan_link_new(struct net *src_net, struct net_device *dev,
+   struct nlattr *tb[], struct nlattr *data[]);
+void ipvlan_link_delete(struct net_device *dev, struct list_head *head);
+void ipvlan_link_setup(struct net_device *dev);
+int ipvlan_link_register(struct rtnl_link_ops *ops);
 #endif /* __IPVLAN_H */
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 8ae335d..1f3295e 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -16,7 +16,7 @@ void ipvlan_init_secret(void)
net_get_random_once(_jhash_secret, sizeof(ipvlan_jhash_secret));
 }
 
-static void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
+void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
unsigned int len, bool success, bool mcast)
 {
if (likely(success)) {
@@ -33,6 +33,7 @@ static void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
this_cpu_inc(ipvlan->pcpu_stats->rx_errs);
}
 }
+EXPORT_SYMBOL_GPL(ipvlan_count_rx);
 
 static u8 ipvlan_get_v6_hash(const void *iaddr)
 {
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 95b18f4..aa8575c 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -496,8 +496,8 @@ static int ipvlan_nl_fillinfo(struct sk_buff *skb,
return ret;
 }
 
-static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
-  struct nlattr *tb[], struct nlattr *data[])
+int ipvlan_link_new(struct net *src_net, struct net_device *dev,
+   struct nlattr *tb[], struct nlattr *data[])
 {
struct ipvl_dev *ipvlan = netdev_priv(dev);
struct ipvl_port *port;
@@ -594,8 +594,9 @@ static int ipvlan_link_new(struct net *src_net, struct 
net_device *dev,
ipvlan_port_destroy(phy_dev);
return err;
 }
+EXPORT_SYMBOL_GPL(ipvlan_link_new);
 
-static void ipvlan_link_delete(struct net_device *dev, struct list_head *head)
+void ipvlan_link_delete(struct net_device *dev, struct list_head *head)
 {

[PATCHv6 5/7] tap: Extending tap device create/destroy APIs

2017-02-10 Thread Sainath Grandhi

Extending tap APIs get/free_minor and create/destroy_cdev to handle more than
one type of virtual interface.
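
The idea of keeping one record per device major and looking it up before
touching the minor numbers can be sketched as below. This is a simplified
user-space illustration, assuming a plain singly linked list with no locking;
the patch itself uses a kernel list walked under RCU. All demo_* names are
hypothetical.

```c
#include <stddef.h>

/* One record per registered major, e.g. one for each tap flavour. */
struct demo_major_info {
	unsigned int major;
	const char *device_name;
	struct demo_major_info *next;
};

static struct demo_major_info *demo_major_list;

/* Register a major by pushing its record onto the list head. */
static void demo_major_add(struct demo_major_info *info)
{
	info->next = demo_major_list;
	demo_major_list = info;
}

/* Return the record for a major, or NULL if it was never registered. */
static struct demo_major_info *demo_get_major(unsigned int major)
{
	struct demo_major_info *pos;

	for (pos = demo_major_list; pos; pos = pos->next) {
		if (pos->major == major)
			return pos;
	}

	return NULL;
}
```

A caller that gets NULL back should fail the request, which corresponds to the
-EINVAL path in tap_get_minor() in the diff below.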

Signed-off-by: Sainath Grandhi 
---
 drivers/net/macvtap_main.c |   6 +--
 drivers/net/tap.c  | 118 +
 include/linux/if_tap.h |   4 +-
 3 files changed, 102 insertions(+), 26 deletions(-)

diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
index 0238df6..a4bfc10 100644
--- a/drivers/net/macvtap_main.c
+++ b/drivers/net/macvtap_main.c
@@ -163,7 +163,7 @@ static int macvtap_device_event(struct notifier_block 
*unused,
 * been registered but before register_netdevice has
 * finished running.
 */
-   err = tap_get_minor(>tap);
+   err = tap_get_minor(macvtap_major, >tap);
if (err)
return notifier_from_errno(err);
 
@@ -171,7 +171,7 @@ static int macvtap_device_event(struct notifier_block 
*unused,
classdev = device_create(_class, >dev, devt,
 dev, tap_name);
if (IS_ERR(classdev)) {
-   tap_free_minor(>tap);
+   tap_free_minor(macvtap_major, >tap);
return notifier_from_errno(PTR_ERR(classdev));
}
err = sysfs_create_link(>dev.kobj, >kobj,
@@ -186,7 +186,7 @@ static int macvtap_device_event(struct notifier_block 
*unused,
sysfs_remove_link(>dev.kobj, tap_name);
devt = MKDEV(MAJOR(macvtap_major), vlantap->tap.minor);
device_destroy(_class, devt);
-   tap_free_minor(>tap);
+   tap_free_minor(macvtap_major, >tap);
break;
case NETDEV_CHANGE_TX_QUEUE_LEN:
if (tap_queue_resize(>tap))
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 7d3e8b1..71bbf0b 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -99,12 +99,17 @@ static struct proto tap_proto = {
 };
 
 #define TAP_NUM_DEVS (1U << MINORBITS)
+
+static LIST_HEAD(major_list);
+
 struct major_info {
+   struct rcu_head rcu;
dev_t major;
struct idr minor_idr;
struct mutex minor_lock;
const char *device_name;
-} macvtap_major;
+   struct list_head next;
+};
 
 #define GOODCOPY_LEN 128
 
@@ -385,44 +390,89 @@ rx_handler_result_t tap_handle_frame(struct sk_buff 
**pskb)
return RX_HANDLER_CONSUMED;
 }
 
-int tap_get_minor(struct tap_dev *tap)
+static struct major_info *tap_get_major(int major)
+{
+   struct major_info *tap_major;
+
+   list_for_each_entry_rcu(tap_major, _list, next) {
+   if (tap_major->major == major)
+   return tap_major;
+   }
+
+   return NULL;
+}
+
+int tap_get_minor(dev_t major, struct tap_dev *tap)
 {
int retval = -ENOMEM;
+   struct major_info *tap_major;
+
+   rcu_read_lock();
+   tap_major = tap_get_major(MAJOR(major));
+   if (!tap_major) {
+   retval = -EINVAL;
+   goto unlock;
+   }
 
-   mutex_lock(_major.minor_lock);
-   retval = idr_alloc(_major.minor_idr, tap, 1, TAP_NUM_DEVS, 
GFP_KERNEL);
+   mutex_lock(_major->minor_lock);
+   retval = idr_alloc(_major->minor_idr, tap, 1, TAP_NUM_DEVS, 
GFP_KERNEL);
if (retval >= 0) {
tap->minor = retval;
} else if (retval == -ENOSPC) {
netdev_err(tap->dev, "Too many tap devices\n");
retval = -EINVAL;
}
-   mutex_unlock(_major.minor_lock);
+   mutex_unlock(_major->minor_lock);
+
+unlock:
+   rcu_read_unlock();
return retval < 0 ? retval : 0;
 }
 
-void tap_free_minor(struct tap_dev *tap)
+void tap_free_minor(dev_t major, struct tap_dev *tap)
 {
-   mutex_lock(_major.minor_lock);
+   struct major_info *tap_major;
+
+   rcu_read_lock();
+   tap_major = tap_get_major(MAJOR(major));
+   if (!tap_major) {
+   goto unlock;
+   }
+
+   mutex_lock(_major->minor_lock);
if (tap->minor) {
-   idr_remove(_major.minor_idr, tap->minor);
+   idr_remove(_major->minor_idr, tap->minor);
tap->minor = 0;
}
-   mutex_unlock(_major.minor_lock);
+   mutex_unlock(_major->minor_lock);
+
+unlock:
+   rcu_read_unlock();
 }
 
-static struct tap_dev *dev_get_by_tap_minor(int minor)
+static struct tap_dev *dev_get_by_tap_file(int major, int minor)
 {
struct net_device *dev = NULL;
struct tap_dev *tap;
+   struct major_info *tap_major;
 
-   mutex_lock(_major.minor_lock);
-   tap = idr_find(_major.minor_idr, minor);
+   rcu_read_lock();
+   tap_major = tap_get_major(major);
+   if (!tap_major) {
+   tap = NULL;
+   goto unlock;
+   }
+
+   mutex_lock(_major->minor_lock);
+   tap =

[PATCHv6 6/7] tap: tap as an independent module

2017-02-10 Thread Sainath Grandhi

This patch makes tap a separate module so that other types of virtual
interfaces, for example ipvlan, can use it.

Signed-off-by: Sainath Grandhi 
---
 drivers/net/Kconfig   |  7 +++
 drivers/net/Makefile  |  3 +--
 drivers/net/{macvtap_main.c => macvtap.c} |  0
 drivers/net/tap.c | 11 +++
 drivers/vhost/Kconfig |  2 +-
 include/linux/if_tap.h|  4 ++--
 6 files changed, 22 insertions(+), 5 deletions(-)
 rename drivers/net/{macvtap_main.c => macvtap.c} (100%)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index a993cbe..5763503 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -135,6 +135,7 @@ config MACVTAP
tristate "MAC-VLAN based tap driver"
depends on MACVLAN
depends on INET
+   select TAP
help
  This adds a specialized tap character device driver that is based
  on the MAC-VLAN network interface, called macvtap. A macvtap device
@@ -287,6 +288,12 @@ config TUN
 
  If you don't know what to use this for, you don't need it.
 
+config TAP
+   tristate
+   ---help---
+ This option is selected by any driver implementing tap user space
+ interface for a virtual interface to re-use core tap functionality.
+
 config TUN_VNET_CROSS_LE
bool "Support for cross-endian vnet headers on little-endian kernels"
default n
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 19b03a9..7dd86ca 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_PHYLIB) += phy/
 obj-$(CONFIG_RIONET) += rionet.o
 obj-$(CONFIG_NET_TEAM) += team/
 obj-$(CONFIG_TUN) += tun.o
+obj-$(CONFIG_TAP) += tap.o
 obj-$(CONFIG_VETH) += veth.o
 obj-$(CONFIG_VIRTIO_NET) += virtio_net.o
 obj-$(CONFIG_VXLAN) += vxlan.o
@@ -29,8 +30,6 @@ obj-$(CONFIG_GTP) += gtp.o
 obj-$(CONFIG_NLMON) += nlmon.o
 obj-$(CONFIG_NET_VRF) += vrf.o
 
-macvtap-objs := macvtap_main.o tap.o
-
 #
 # Networking Drivers
 #
diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap.c
similarity index 100%
rename from drivers/net/macvtap_main.c
rename to drivers/net/macvtap.c
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 71bbf0b..35b55a2 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -312,6 +312,7 @@ void tap_del_queues(struct tap_dev *tap)
/* guarantee that any future tap_set_queue will fail */
tap->numvtaps = MAX_TAP_QUEUES;
 }
+EXPORT_SYMBOL_GPL(tap_del_queues);
 
 rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
 {
@@ -389,6 +390,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
kfree_skb(skb);
return RX_HANDLER_CONSUMED;
 }
+EXPORT_SYMBOL_GPL(tap_handle_frame);
 
 static struct major_info *tap_get_major(int major)
 {
@@ -428,6 +430,7 @@ int tap_get_minor(dev_t major, struct tap_dev *tap)
rcu_read_unlock();
return retval < 0 ? retval : 0;
 }
+EXPORT_SYMBOL_GPL(tap_get_minor);
 
 void tap_free_minor(dev_t major, struct tap_dev *tap)
 {
@@ -449,6 +452,7 @@ void tap_free_minor(dev_t major, struct tap_dev *tap)
 unlock:
rcu_read_unlock();
 }
+EXPORT_SYMBOL_GPL(tap_free_minor);
 
 static struct tap_dev *dev_get_by_tap_file(int major, int minor)
 {
@@ -1210,6 +1214,7 @@ int tap_queue_resize(struct tap_dev *tap)
kfree(arrays);
return ret;
 }
+EXPORT_SYMBOL_GPL(tap_queue_resize);
 
 static int tap_list_add(dev_t major, const char *device_name)
 {
@@ -1257,6 +1262,7 @@ int tap_create_cdev(struct cdev *tap_cdev,
 out1:
return err;
 }
+EXPORT_SYMBOL_GPL(tap_create_cdev);
 
 void tap_destroy_cdev(dev_t major, struct cdev *tap_cdev)
 {
@@ -1272,3 +1278,8 @@ void tap_destroy_cdev(dev_t major, struct cdev *tap_cdev)
}
}
 }
+EXPORT_SYMBOL_GPL(tap_destroy_cdev);
+
+MODULE_AUTHOR("Arnd Bergmann ");
+MODULE_AUTHOR("Sainath Grandhi ");
+MODULE_LICENSE("GPL");
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 40764ec..cfdecea 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -1,6 +1,6 @@
 config VHOST_NET
tristate "Host kernel accelerator for virtio net"
-   depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP)
+   depends on NET && EVENTFD && (TUN || !TUN) && (TAP || !TAP)
select VHOST
---help---
  This kernel module can be loaded in host kernel to accelerate
diff --git a/include/linux/if_tap.h b/include/linux/if_tap.h
index 362e71c..3482c3c 100644
--- a/include/linux/if_tap.h
+++ b/include/linux/if_tap.h
@@ -1,7 +1,7 @@
 #ifndef _LINUX_IF_TAP_H_
 #define _LINUX_IF_TAP_H_
 
-#if IS_ENABLED(CONFIG_MACVTAP)
+#if IS_ENABLED(CONFIG_TAP)
 struct socket *tap_get_socket(struct file *);
 #else
 #include 
@@ -12,7 +12,7 @@ static inline struct socket *tap_get_socket(struct file *f)
 {
return ERR_PTR(-EINVAL);
 }
-#endif /*

[PATCHv6 3/7] tap: Tap character device creation/destroy API

2017-02-10 Thread Sainath Grandhi

This patch provides tap device create/destroy APIs in tap.c.

Signed-off-by: Sainath Grandhi 
---
 drivers/net/macvtap_main.c | 30 +++---
 drivers/net/tap.c  | 62 ++
 include/linux/if_tap.h |  3 +++
 3 files changed, 63 insertions(+), 32 deletions(-)

diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
index 548f339..215ab7a 100644
--- a/drivers/net/macvtap_main.c
+++ b/drivers/net/macvtap_main.c
@@ -28,7 +28,6 @@
  * Variables for dealing with macvtaps device numbers.
  */
 static dev_t macvtap_major;
-#define MACVTAP_NUM_DEVS (1U << MINORBITS)
 
 static const void *macvtap_net_namespace(struct device *d)
 {
@@ -159,57 +158,46 @@ static struct notifier_block macvtap_notifier_block 
__read_mostly = {
.notifier_call  = macvtap_device_event,
 };
 
-extern struct file_operations tap_fops;
 static int macvtap_init(void)
 {
int err;
 
-   err = alloc_chrdev_region(_major, 0,
-   MACVTAP_NUM_DEVS, "macvtap");
-   if (err)
-   goto out1;
+   err = tap_create_cdev(_cdev, _major, "macvtap");
 
-   cdev_init(_cdev, _fops);
-   err = cdev_add(_cdev, macvtap_major, MACVTAP_NUM_DEVS);
if (err)
-   goto out2;
+   goto out1;
 
err = class_register(_class);
if (err)
-   goto out3;
+   goto out2;
 
err = register_netdevice_notifier(_notifier_block);
if (err)
-   goto out4;
+   goto out3;
 
err = macvlan_link_register(_link_ops);
if (err)
-   goto out5;
+   goto out4;
 
return 0;
 
-out5:
-   unregister_netdevice_notifier(_notifier_block);
 out4:
-   class_unregister(_class);
+   unregister_netdevice_notifier(_notifier_block);
 out3:
-   cdev_del(_cdev);
+   class_unregister(_class);
 out2:
-   unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS);
+   tap_destroy_cdev(macvtap_major, _cdev);
 out1:
return err;
 }
 module_init(macvtap_init);
 
-extern struct idr minor_idr;
 static void macvtap_exit(void)
 {
rtnl_link_unregister(_link_ops);
unregister_netdevice_notifier(_notifier_block);
class_unregister(_class);
-   cdev_del(_cdev);
-   unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS);
-   idr_destroy(_idr);
+   tap_destroy_cdev(macvtap_major, _cdev);
 }
 module_exit(macvtap_exit);
 
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 15ca2d5..04ba978 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -123,8 +123,12 @@ static struct proto tap_proto = {
 };
 
 #define TAP_NUM_DEVS (1U << MINORBITS)
-static DEFINE_MUTEX(minor_lock);
-DEFINE_IDR(minor_idr);
+struct major_info {
+   dev_t major;
+   struct idr minor_idr;
+   struct mutex minor_lock;
+   const char *device_name;
+} macvtap_major;
 
 #define GOODCOPY_LEN 128
 
@@ -413,26 +417,26 @@ int tap_get_minor(struct macvlan_dev *vlan)
 {
int retval = -ENOMEM;
 
-   mutex_lock(_lock);
-   retval = idr_alloc(_idr, vlan, 1, TAP_NUM_DEVS, GFP_KERNEL);
+   mutex_lock(_major.minor_lock);
+   retval = idr_alloc(_major.minor_idr, vlan, 1, TAP_NUM_DEVS, 
GFP_KERNEL);
if (retval >= 0) {
vlan->minor = retval;
} else if (retval == -ENOSPC) {
netdev_err(vlan->dev, "Too many tap devices\n");
retval = -EINVAL;
}
-   mutex_unlock(_lock);
+   mutex_unlock(_major.minor_lock);
return retval < 0 ? retval : 0;
 }
 
 void tap_free_minor(struct macvlan_dev *vlan)
 {
-   mutex_lock(_lock);
+   mutex_lock(_major.minor_lock);
if (vlan->minor) {
-   idr_remove(_idr, vlan->minor);
+   idr_remove(_major.minor_idr, vlan->minor);
vlan->minor = 0;
}
-   mutex_unlock(_lock);
+   mutex_unlock(_major.minor_lock);
 }
 
 static struct net_device *dev_get_by_tap_minor(int minor)
@@ -440,13 +444,13 @@ static struct net_device *dev_get_by_tap_minor(int minor)
struct net_device *dev = NULL;
struct macvlan_dev *vlan;
 
-   mutex_lock(_lock);
-   vlan = idr_find(_idr, minor);
+   mutex_lock(_major.minor_lock);
+   vlan = idr_find(_major.minor_idr, minor);
if (vlan) {
dev = vlan->dev;
dev_hold(dev);
}
-   mutex_unlock(_lock);
+   mutex_unlock(_major.minor_lock);
return dev;
 }
 
@@ -1184,3 +1188,39 @@ int tap_queue_resize(struct macvlan_dev *vlan)
kfree(arrays);
return ret;
 }
+
+int tap_create_cdev(struct cdev *tap_cdev,
+   dev_t *tap_major, const char *device_name)
+{
+   int err;
+
+   err = alloc_chrdev_region(tap_major, 0, TAP_NUM_DEVS, device_name);
+   if (err)
+   goto out1;
+
+   cdev_init(tap_cdev, _fops);
+

[PATCH] NET: Fix /proc/net/arp for AX.25

2017-02-10 Thread Ralf Baechle

When sending ARP requests over AX.25 links, the hardware address in the
neighbour cache entry is not initialized.  For such an incomplete ARP entry
ax2asc2 will generate an empty string, resulting in /proc/net/arp output
like the following:

$ cat /proc/net/arp
IP address       HW type     Flags       HW address            Mask     Device
192.168.122.1    0x1         0x2         52:54:00:00:5d:5f     *        ens3
172.20.1.99      0x3         0x0              *        bpq0

The missing field will confuse the procfs parsing of arp(8) resulting in
incorrect output for the device such as the following:

$ arp
Address                  HWtype  HWaddress           Flags Mask            Iface
gateway                  ether   52:54:00:00:5d:5f   C                     ens3
172.20.1.99                      (incomplete)                              ens3

This changes the content of /proc/net/arp to:

$ cat /proc/net/arp
IP address       HW type     Flags       HW address            Mask     Device
172.20.1.99      0x3         0x0         *                     *        bpq0
192.168.122.1    0x1         0x2         52:54:00:00:5d:5f     *        ens3

To do so, it changes ax2asc2 to put the string "*" in buf for a NULL address
argument.  Finally, the HW address field is left-aligned in a 17-character
field (the length of an Ethernet HW address in the usual hex notation) for
readability.
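
The two changes described above can be sketched in user space as follows. This
is an illustration only, not the kernel code: demo_fix_hwaddr and
demo_format_row are hypothetical helpers, and the format string is copied as
it appears in this archive, so the spacing between the mask and device columns
may differ from the kernel source.

```c
#include <stdio.h>

/* Write "*" into buf when the converted address is empty or starts
 * with '-'. buf must have room for at least two bytes.
 */
static void demo_fix_hwaddr(char *buf)
{
	if (buf[0] == '\0' || buf[0] == '-') {
		buf[0] = '*';
		buf[1] = '\0';
	}
}

/* Format one table row; "%-17s" left-aligns the hardware address in a
 * 17-character field so the following columns stay put.
 */
static int demo_format_row(char *out, size_t len, const char *ip,
			   unsigned int hatype, unsigned int flags,
			   const char *hwaddr, const char *dev)
{
	return snprintf(out, len, "%-16s 0x%-10x0x%-10x%-17s *%s\n",
			ip, hatype, flags, hwaddr, dev);
}
```

Because the placeholder is written into the caller's buffer, the field is never
empty, and a parser that splits on whitespace sees the same number of columns
for complete and incomplete entries.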

Signed-off-by: Ralf Baechle 
---
 net/ipv4/arp.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 89a8cac4..51b27ae 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1263,7 +1263,7 @@ void __init arp_init(void)
 /*
  * ax25 -> ASCII conversion
  */
-static char *ax2asc2(ax25_address *a, char *buf)
+static void ax2asc2(ax25_address *a, char *buf)
 {
char c, *s;
int n;
@@ -1285,10 +1285,10 @@ static char *ax2asc2(ax25_address *a, char *buf)
*s++ = n + '0';
*s++ = '\0';
 
-   if (*buf == '\0' || *buf == '-')
-   return "*";
-
-   return buf;
+   if (*buf == '\0' || *buf == '-') {
+   buf[0] = '*';
+   buf[1] = '\0';
+   }
 }
 #endif /* CONFIG_AX25 */
 
@@ -1322,7 +1322,7 @@ static void arp_format_neigh_entry(struct seq_file *seq,
}
 #endif
sprintf(tbuf, "%pI4", n->primary_key);
-   seq_printf(seq, "%-16s 0x%-10x0x%-10x%s *%s\n",
+   seq_printf(seq, "%-16s 0x%-10x0x%-10x%-17s *%s\n",
   tbuf, hatype, arp_state_to_flags(n), hbuffer, dev->name);
read_unlock(>lock);
 }

[PATCH] ibmvnic: Initialize completion variables before starting work

2017-02-10 Thread Nathan Fontenot

Initialize condition variables prior to invoking any work that can
mark them complete. This resolves a race in the ibmvnic driver where
the driver faults trying to complete an uninitialized condition
variable.
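
The ordering rule can be illustrated with a deliberately simplified,
single-threaded model. It is a sketch under stated assumptions, not the driver
code: the demo_* names are hypothetical, the "response" is delivered
synchronously inside the send call to stand in for a very fast reply, and the
model shows only why the order matters, not the fault itself.

```c
/* Minimal stand-in for a completion: a counter of completions seen. */
struct demo_completion {
	unsigned int done;
};

static void demo_init_completion(struct demo_completion *c)
{
	c->done = 0;
}

static void demo_complete(struct demo_completion *c)
{
	c->done++;
}

/* Models a request whose response handler runs before send returns. */
static void demo_send_request(struct demo_completion *c)
{
	demo_complete(c);
}

/* Correct order: initialize first, then start the work. */
static int demo_request_init_first(struct demo_completion *c)
{
	demo_init_completion(c);
	demo_send_request(c);
	return c->done != 0;
}

/* Wrong order: the handler touches the completion before it is
 * initialized, and the late init then discards the event.
 */
static int demo_request_init_late(struct demo_completion *c)
{
	demo_send_request(c);
	demo_init_completion(c);
	return c->done != 0;
}
```

In this model the late-init variant never observes the completion, so a real
waiter would block; in the driver the handler may additionally operate on
uninitialized state, which is the fault the patch addresses.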

Signed-off-by: Nathan Fontenot 
---
 drivers/net/ethernet/ibm/ibmvnic.c |   17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index a024141..752b082 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -189,9 +189,10 @@ static int alloc_long_term_buff(struct ibmvnic_adapter 
*adapter,
}
ltb->map_id = adapter->map_id;
adapter->map_id++;
+
+   init_completion(>fw_done);
send_request_map(adapter, ltb->addr,
 ltb->size, ltb->map_id);
-   init_completion(>fw_done);
wait_for_completion(>fw_done);
return 0;
 }
@@ -1121,10 +1122,10 @@ static void ibmvnic_get_ethtool_stats(struct net_device 
*dev,
crq.request_statistics.ioba = cpu_to_be32(adapter->stats_token);
crq.request_statistics.len =
cpu_to_be32(sizeof(struct ibmvnic_statistics));
-   ibmvnic_send_crq(adapter, );
 
/* Wait for data to be written */
init_completion(>stats_done);
+   ibmvnic_send_crq(adapter, );
wait_for_completion(>stats_done);
 
for (i = 0; i < ARRAY_SIZE(ibmvnic_stats); i++)
@@ -2799,9 +2800,9 @@ static ssize_t trace_read(struct file *file, char __user 
*user_buf, size_t len,
crq.collect_fw_trace.correlator = adapter->ras_comps[num].correlator;
crq.collect_fw_trace.ioba = cpu_to_be32(trace_tok);
crq.collect_fw_trace.len = adapter->ras_comps[num].trace_buff_size;
-   ibmvnic_send_crq(adapter, );
 
init_completion(>fw_done);
+   ibmvnic_send_crq(adapter, );
wait_for_completion(>fw_done);
 
if (*ppos + len > be32_to_cpu(adapter->ras_comps[num].trace_buff_size))
@@ -3581,9 +3582,9 @@ static int ibmvnic_dump_show(struct seq_file *seq, void 
*v)
memset(, 0, sizeof(crq));
crq.request_dump_size.first = IBMVNIC_CRQ_CMD;
crq.request_dump_size.cmd = REQUEST_DUMP_SIZE;
-   ibmvnic_send_crq(adapter, );
 
init_completion(>fw_done);
+   ibmvnic_send_crq(adapter, );
wait_for_completion(>fw_done);
 
seq_write(seq, adapter->dump_data, adapter->dump_data_size);
@@ -3629,8 +3630,8 @@ static void handle_crq_init_rsp(struct work_struct *work)
}
}
 
-   send_version_xchg(adapter);
reinit_completion(>init_done);
+   send_version_xchg(adapter);
if (!wait_for_completion_timeout(>init_done, timeout)) {
dev_err(dev, "Passive init timeout\n");
goto task_failed;
@@ -3640,9 +3641,9 @@ static void handle_crq_init_rsp(struct work_struct *work)
if (adapter->renegotiate) {
adapter->renegotiate = false;
release_sub_crqs_no_irqs(adapter);
-   send_cap_queries(adapter);
 
reinit_completion(&adapter->init_done);
+   send_cap_queries(adapter);
if (!wait_for_completion_timeout(&adapter->init_done,
 timeout)) {
dev_err(dev, "Passive init timeout\n");
@@ -3772,9 +3773,9 @@ static int ibmvnic_probe(struct vio_dev *dev, const 
struct vio_device_id *id)
adapter->debugfs_dump = ent;
}
}
-   ibmvnic_send_crq_init(adapter);
 
init_completion(&adapter->init_done);
+   ibmvnic_send_crq_init(adapter);
if (!wait_for_completion_timeout(&adapter->init_done, timeout))
return 0;
 
@@ -3782,9 +3783,9 @@ static int ibmvnic_probe(struct vio_dev *dev, const 
struct vio_device_id *id)
if (adapter->renegotiate) {
adapter->renegotiate = false;
release_sub_crqs_no_irqs(adapter);
-   send_cap_queries(adapter);
 
reinit_completion(&adapter->init_done);
+   send_cap_queries(adapter);
if (!wait_for_completion_timeout(&adapter->init_done,
 timeout))
return 0;
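
Every hunk in this diff moves the completion (re)initialisation in front
of the call that sends the request. A plausible reading is that this guards
against a reply that arrives before the completion has been armed. The
sketch below is a single-threaded userspace model of that ordering problem,
not the driver code; the fake_* names are hypothetical, and the "firmware"
is assumed to answer before the send call returns.

```c
#include <stdbool.h>

/* Hypothetical single-threaded stand-in for the kernel's struct completion. */
struct fake_completion {
	unsigned int done;
};

static void fake_init_completion(struct fake_completion *c)
{
	c->done = 0;
}

static void fake_complete(struct fake_completion *c)
{
	c->done++;
}

/* Non-blocking stand-in for wait_for_completion(): true if a reply is pending. */
static bool fake_try_wait(struct fake_completion *c)
{
	if (!c->done)
		return false;
	c->done--;
	return true;
}

/* Assumption: the reply is delivered before the send call has returned. */
static void fake_send_request(struct fake_completion *c)
{
	fake_complete(c);
}

/* Old ordering: the reply is recorded, then wiped out by the late init. */
bool reply_seen_send_then_init(void)
{
	struct fake_completion c = { 0 };

	fake_send_request(&c);
	fake_init_completion(&c);
	return fake_try_wait(&c);
}

/* Patched ordering: the completion is armed before the request goes out. */
bool reply_seen_init_then_send(void)
{
	struct fake_completion c = { 0 };

	fake_init_completion(&c);
	fake_send_request(&c);
	return fake_try_wait(&c);
}
```

In this model the old ordering loses the reply, so a real waiter would
block forever (or until a timeout), while the patched ordering sees it.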

[PATCH iproute2 1/1] man page: add page for skbmod action

2017-02-10 Thread Lucas Bates

Signed-off-by: Lucas Bates 
Signed-off-by: Jamal Hadi Salim 
Signed-off-by: Roman Mashak 
---
 man/man8/Makefile|   2 +-
 man/man8/tc-skbmod.8 | 137 +++
 2 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 man/man8/tc-skbmod.8

diff --git a/man/man8/Makefile b/man/man8/Makefile
index bc2fc81..1bd2f02 100644
--- a/man/man8/Makefile
+++ b/man/man8/Makefile
@@ -16,7 +16,7 @@ MAN8PAGES = $(TARGETS) ip.8 arpd.8 lnstat.8 routel.8 rtacct.8 
rtmon.8 rtpr.8 ss.
tc-basic.8 tc-cgroup.8 tc-flow.8 tc-flower.8 tc-fw.8 tc-route.8 \
tc-tcindex.8 tc-u32.8 tc-matchall.8 \
tc-connmark.8 tc-csum.8 tc-mirred.8 tc-nat.8 tc-pedit.8 tc-police.8 \
-   tc-simple.8 tc-skbedit.8 tc-vlan.8 tc-xt.8  tc-ife.8 \
+   tc-simple.8 tc-skbedit.8 tc-vlan.8 tc-xt.8 tc-skbmod.8 tc-ife.8 \
tc-tunnel_key.8 \
devlink.8 devlink-dev.8 devlink-monitor.8 devlink-port.8 devlink-sb.8 \
ifstat.8
diff --git a/man/man8/tc-skbmod.8 b/man/man8/tc-skbmod.8
new file mode 100644
index 000..46418b6
--- /dev/null
+++ b/man/man8/tc-skbmod.8
@@ -0,0 +1,137 @@
+.TH "skbmod action in tc" 8 "21 Sep 2016" "iproute2" "Linux"
+
+.SH NAME
+skbmod - user-friendly packet editor action
+.SH SYNOPSIS
+.in +8
+.ti -8
+.BR tc " ... " "action skbmod " "{ [ " "set "
+.IR SETTABLE " ] [ "
+.BI swap " SWAPPABLE"
+.RI " ] [ " CONTROL " ] [ "
+.BI index " INDEX "
+] }
+
+.ti -8
+.IR SETTABLE " := "
+.RB " [ " dmac
+.IR DMAC " ] "
+.RB " [ " smac
+.IR SMAC " ] "
+.RB " [ " etype
+.IR ETYPE " ] "
+
+.ti -8
+.IR SWAPPABLE " := "
+.B mac
+.ti -8
+.IR CONTROL " := {"
+.BR reclassify " | " pipe " | " drop " | " shot " | " continue " | " pass " }"
+.SH DESCRIPTION
+The
+.B skbmod
+action is intended as a usability upgrade to the existing
+.B pedit
+action. Instead of having to manually edit 8-, 16-, or 32-bit chunks of an
+ethernet header,
+.B skbmod
+allows complete substitution of supported elements.
+.SH OPTIONS
+.TP
+.BI dmac " DMAC"
+Change the destination mac to the specified address.
+.TP
+.BI smac " SMAC"
+Change the source mac to the specified address.
+.TP
+.BI etype " ETYPE"
+Change the ethertype to the specified value.
+.TP
+.BI mac
+Used to swap mac addresses. The
+.B swap mac
+directive is performed
+after any outstanding D/SMAC changes.
+.TP
+.I CONTROL
+The following keywords allow to control how the tree of qdisc, classes,
+filters and actions is further traversed after this action.
+.RS
+.TP
+.B reclassify
+Restart with the first filter in the current list.
+.TP
+.B pipe
+Continue with the next action attached to the same filter.
+.TP
+.B drop
+.TQ
+.B shot
+Drop the packet.
+.TP
+.B continue
+Continue classification with the next filter in line.
+.TP
+.B pass
+Finish classification process and return to calling qdisc for further packet
+processing. This is the default.
+.SH EXAMPLES
+To start, observe the following filter with a pedit action:
+
+.RS
+.EX
+tc filter add dev eth1 parent 1: protocol ip prio 10 \\
+   u32 match ip protocol 1 0xff flowid 1:2 \\
+   action pedit munge offset -14 u8 set 0x02 \\
+   munge offset -13 u8 set 0x15 \\
+   munge offset -12 u8 set 0x15 \\
+   munge offset -11 u8 set 0x15 \\
+   munge offset -10 u16 set 0x1515 \\
+   pipe
+.EE
+.RE
+
+Using the skbmod action, this command can be simplified to:
+
+.RS
+.EX
+tc filter add dev eth1 parent 1: protocol ip prio 10 \\
+   u32 match ip protocol 1 0xff flowid 1:2 \\
+   action skbmod set dmac 02:15:15:15:15:15 \\
+   pipe
+.EE
+.RE
+
+Complexity will increase if source mac and ethertype are also being edited
+as part of the action. If all three fields are to be changed with skbmod:
+
+.RS
+.EX
+tc filter add dev eth5 parent 1: protocol ip prio 10 \\
+   u32 match ip protocol 1 0xff flowid 1:2 \\
+   action skbmod \\
+   set etype 0xBEEF \\
+   set dmac 02:12:13:14:15:16 \\
+   set smac 02:22:23:24:25:26
+.EE
+.RE
+
+Finally, swap the destination and source mac addresses in the header:
+
+.RS
+.EX
+tc filter add dev eth3 parent 1: protocol ip prio 10 \\
+   u32 match ip protocol 1 0xff flowid 1:2 \\
+   action skbmod \\
+   swap mac
+.EE
+.RE
+
+As mentioned above, the swap action will occur after any
+.B " smac/dmac "
+substitutions are executed, if they are present.
+
+.SH SEE ALSO
+.BR tc (8),
+.BR tc-u32 (8),
+.BR tc-pedit (8)
-- 
2.7.4
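
For readers who want to see what the "set dmac" and "swap mac" directives
amount to at the byte level, here is a small userspace sketch. It is not
the skbmod implementation; the struct and function names are made up for
illustration. The five pedit munges in the first example write the six
bytes at offsets -14 to -9, which is the destination MAC, and the sketch
does the same as one whole-field copy. It also applies the swap after the
substitution, following the ordering the man page describes.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

/* Minimal Ethernet header layout: destination, source, ethertype. */
struct eth_hdr_sketch {
	uint8_t dmac[SKETCH_ETH_ALEN];
	uint8_t smac[SKETCH_ETH_ALEN];
	uint16_t etype;
};

/* Parse "aa:bb:cc:dd:ee:ff" into six bytes; returns 0 on success, -1 on error. */
int parse_mac(const char *text, uint8_t out[SKETCH_ETH_ALEN])
{
	unsigned int b[SKETCH_ETH_ALEN];
	char extra;
	int i;

	/* A seventh conversion succeeding means trailing garbage. */
	if (sscanf(text, "%2x:%2x:%2x:%2x:%2x:%2x%c",
		   &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &extra) != 6)
		return -1;
	for (i = 0; i < SKETCH_ETH_ALEN; i++)
		out[i] = (uint8_t)b[i];
	return 0;
}

/* Whole-field substitution, the counterpart of the five pedit munges. */
void set_dmac(struct eth_hdr_sketch *h, const uint8_t mac[SKETCH_ETH_ALEN])
{
	memcpy(h->dmac, mac, SKETCH_ETH_ALEN);
}

/* Exchange source and destination addresses in place. */
void swap_mac(struct eth_hdr_sketch *h)
{
	uint8_t tmp[SKETCH_ETH_ALEN];

	memcpy(tmp, h->dmac, SKETCH_ETH_ALEN);
	memcpy(h->dmac, h->smac, SKETCH_ETH_ALEN);
	memcpy(h->smac, tmp, SKETCH_ETH_ALEN);
}
```

With set followed by swap, the newly set destination address ends up in
the source field, which is the consequence of the documented ordering.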

[PATCH] ibmvnic: Call napi_disable instead of napi_enable in failure path

2017-02-10 Thread Nathan Fontenot

The failure path in ibmvnic_open() mistakenly makes a second call
to napi_enable instead of calling napi_disable. This can result
in a BUG_ON for any queues that were enabled in the previous call
to napi_enable.

Signed-off-by: Nathan Fontenot 
---
 drivers/net/ethernet/ibm/ibmvnic.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index c125966..a024141 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -505,7 +505,7 @@ static int ibmvnic_open(struct net_device *netdev)
adapter->rx_pool = NULL;
 rx_pool_arr_alloc_failed:
for (i = 0; i < adapter->req_rx_queues; i++)
-   napi_enable(&adapter->napi[i]);
+   napi_disable(&adapter->napi[i]);
 alloc_napi_failed:
return -ENOMEM;
 }
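
The fix follows the usual rule that an error path undoes what the success
path did. A minimal userspace sketch of that unwind pattern is below; the
fake_napi type and the failure injection are hypothetical and only model
the enabled/disabled state, with a second enable reported as an error where
the kernel would hit the BUG_ON.

```c
#include <errno.h>

#define NUM_QUEUES 4

/* Hypothetical stand-in for struct napi_struct: only tracks enabled state. */
struct fake_napi {
	int enabled;
};

/* Returns -1 where the kernel would BUG: enabling an already enabled queue. */
static int fake_napi_enable(struct fake_napi *n)
{
	if (n->enabled)
		return -1;
	n->enabled = 1;
	return 0;
}

static void fake_napi_disable(struct fake_napi *n)
{
	n->enabled = 0;
}

/* Open path with an injected allocation failure after the enable loop. */
int fake_open(struct fake_napi *napi, int fail_alloc)
{
	int i;

	for (i = 0; i < NUM_QUEUES; i++)
		if (fake_napi_enable(&napi[i]))
			return -EBUSY;

	if (fail_alloc)
		goto alloc_failed;

	return 0;

alloc_failed:
	/* Undo the enable loop; calling enable again here would be the bug. */
	for (i = 0; i < NUM_QUEUES; i++)
		fake_napi_disable(&napi[i]);
	return -ENOMEM;
}
```

After a failed open every queue is disabled again, so a later open can
enable them without tripping the double-enable check.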

[PATCH] NET: mkiss/6pack: Fix SIOCSIFENCAP ioctl

2017-02-10 Thread Ralf Baechle

When looking at Thomas' mkiss fix 7ba1b6890387 ("NET: mkiss: Fix panic")
I noticed that the mkiss SIOCSIFENCAP ioctl was also doing a slightly
strange assignment:

   dev->hard_header_len = AX25_KISS_HEADER_LEN +
  AX25_MAX_HEADER_LEN + 3;

AX25_MAX_HEADER_LEN already accounts for the KISS byte, so adding
AX25_KISS_HEADER_LEN counts it twice, and the "+ 3" does not seem to
be necessary either.  So this can be simplified to

   dev->hard_header_len = AX25_MAX_HEADER_LEN

which, after the preceding fix, is a redundant assignment of what
ax_setup has already assigned, so delete the line.  The assignments
to dev->addr_len and dev->type are similarly redundant.

The SIOCSIFENCAP argument was never checked for validity.  Check that
it is 4 and return -EINVAL if not.  The magic constant 4 dates back to
the days when KISS was handled by the SLIP driver where it had the
symbol name SL_MODE_AX25.

Since however mkiss.c only supports a single encapsulation mode there
is no point in storing it in struct mkiss so delete all that.

Note that while useless we can't delete the SIOCSIFENCAP ioctl as
kissattach(8) is still using it and without mkiss issuing a
SIOCSIFENCAP ioctl an older kernel that does not have Thomas' mkiss fix
would still panic on attempt to transmit via mkiss.

6pack was suffering from the same issue, except that there SIOCGIFENCAP was
returning 0 for the encapsulation while the spattach utility was passing 4
for the mode, so the mode check added for 6pack is a bit more lenient and
allows the values 0 and 4 to be set.  That way we retain the option
to set different encapsulation modes for future extensions.

Signed-off-by: Ralf Baechle 

 drivers/net/hamradio/6pack.c | 10 --
 drivers/net/hamradio/mkiss.c | 10 --
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/net/hamradio/6pack.c b/drivers/net/hamradio/6pack.c
index 470b3dc..d949b9f 100644
--- a/drivers/net/hamradio/6pack.c
+++ b/drivers/net/hamradio/6pack.c
@@ -104,7 +104,6 @@ struct sixpack {
int buffsize;   /* Max buffers sizes */
 
unsigned long   flags;  /* Flag values/ mode etc */
-   unsigned char   mode;   /* 6pack mode */
 
/* 6pack stuff */
unsigned char   tx_delay;
@@ -723,11 +722,10 @@ static int sixpack_ioctl(struct tty_struct *tty, struct 
file *file,
break;
}
 
-   sp->mode = tmp;
-   dev->addr_len= AX25_ADDR_LEN;
-   dev->hard_header_len = AX25_KISS_HEADER_LEN +
-  AX25_MAX_HEADER_LEN + 3;
-   dev->type= ARPHRD_AX25;
+   if (tmp != 0 && tmp != 4) {
+   err = -EINVAL;
+   break;
+   }
 
err = 0;
break;
diff --git a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c
index 1dfe230..cdaf819 100644
--- a/drivers/net/hamradio/mkiss.c
+++ b/drivers/net/hamradio/mkiss.c
@@ -71,7 +71,6 @@ struct mkiss {
 #define AXF_KEEPTEST   3   /* Keepalive test flag  */
 #define AXF_OUTWAIT4   /* is outpacket was flag*/
 
-   int mode;
 intcrcmode;/* MW: for FlexNet, SMACK etc.  */
int crcauto;/* CRC auto mode */
 
@@ -841,11 +840,10 @@ static int mkiss_ioctl(struct tty_struct *tty, struct 
file *file,
break;
}
 
-   ax->mode = tmp;
-   dev->addr_len= AX25_ADDR_LEN;
-   dev->hard_header_len = AX25_KISS_HEADER_LEN +
-  AX25_MAX_HEADER_LEN + 3;
-   dev->type= ARPHRD_AX25;
+   if (tmp != 4) {
+   err = -EINVAL;
+   break;
+   }
 
err = 0;
break;
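
The two new checks differ only in which values they let through. Pulled
out of the ioctl handlers, they can be sketched as the two helpers below.
The helper names and the ENCAP_MODE_AX25 macro are made up for this
illustration; the patch itself uses the bare constant 4.

```c
#include <errno.h>

/* Illustrative name for the magic constant 4 (formerly SL_MODE_AX25). */
#define ENCAP_MODE_AX25 4

/* mkiss-style check: only the single supported encapsulation is accepted. */
int mkiss_check_encap(int mode)
{
	return mode == ENCAP_MODE_AX25 ? 0 : -EINVAL;
}

/* 6pack-style check: also tolerate 0, the value SIOCGIFENCAP reported. */
int sixpack_check_encap(int mode)
{
	return (mode == 0 || mode == ENCAP_MODE_AX25) ? 0 : -EINVAL;
}
```

Accepting 0 for 6pack keeps a get-then-set round trip working, while 4
keeps the mode that the message says spattach passes.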

[PATCH] net: natsemi: ns83820: use new api ethtool_{get|set}_link_ksettings

2017-02-10 Thread Philippe Reynes

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/natsemi/ns83820.c |   46 +--
 1 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/natsemi/ns83820.c 
b/drivers/net/ethernet/natsemi/ns83820.c
index f9d2eb9..729095d 100644
--- a/drivers/net/ethernet/natsemi/ns83820.c
+++ b/drivers/net/ethernet/natsemi/ns83820.c
@@ -1217,12 +1217,13 @@ static void ns83820_update_stats(struct ns83820 *dev)
 }
 
 /* Let ethtool retrieve info */
-static int ns83820_get_settings(struct net_device *ndev,
-   struct ethtool_cmd *cmd)
+static int ns83820_get_link_ksettings(struct net_device *ndev,
+ struct ethtool_link_ksettings *cmd)
 {
struct ns83820 *dev = PRIV(ndev);
u32 cfg, tanar, tbicr;
int fullduplex   = 0;
+   u32 supported;
 
/*
 * Here's the list of available ethtool commands from other drivers:
@@ -1244,44 +1245,47 @@ static int ns83820_get_settings(struct net_device *ndev,
 
fullduplex = (cfg & CFG_DUPSTS) ? 1 : 0;
 
-   cmd->supported = SUPPORTED_Autoneg;
+   supported = SUPPORTED_Autoneg;
 
if (dev->CFG_cache & CFG_TBI_EN) {
/* we have optical interface */
-   cmd->supported |= SUPPORTED_1000baseT_Half |
+   supported |= SUPPORTED_1000baseT_Half |
SUPPORTED_1000baseT_Full |
SUPPORTED_FIBRE;
-   cmd->port   = PORT_FIBRE;
+   cmd->base.port   = PORT_FIBRE;
} else {
/* we have copper */
-   cmd->supported |= SUPPORTED_10baseT_Half |
+   supported |= SUPPORTED_10baseT_Half |
SUPPORTED_10baseT_Full | SUPPORTED_100baseT_Half |
SUPPORTED_100baseT_Full | SUPPORTED_1000baseT_Half |
SUPPORTED_1000baseT_Full |
SUPPORTED_MII;
-   cmd->port = PORT_MII;
+   cmd->base.port = PORT_MII;
}
 
-   cmd->duplex = fullduplex ? DUPLEX_FULL : DUPLEX_HALF;
+   ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported,
+   supported);
+
+   cmd->base.duplex = fullduplex ? DUPLEX_FULL : DUPLEX_HALF;
switch (cfg / CFG_SPDSTS0 & 3) {
case 2:
-   ethtool_cmd_speed_set(cmd, SPEED_1000);
+   cmd->base.speed = SPEED_1000;
break;
case 1:
-   ethtool_cmd_speed_set(cmd, SPEED_100);
+   cmd->base.speed = SPEED_100;
break;
default:
-   ethtool_cmd_speed_set(cmd, SPEED_10);
+   cmd->base.speed = SPEED_10;
break;
}
-   cmd->autoneg = (tbicr & TBICR_MR_AN_ENABLE)
+   cmd->base.autoneg = (tbicr & TBICR_MR_AN_ENABLE)
? AUTONEG_ENABLE : AUTONEG_DISABLE;
return 0;
 }
 
 /* Let ethool change settings*/
-static int ns83820_set_settings(struct net_device *ndev,
-   struct ethtool_cmd *cmd)
+static int ns83820_set_link_ksettings(struct net_device *ndev,
+ const struct ethtool_link_ksettings *cmd)
 {
struct ns83820 *dev = PRIV(ndev);
u32 cfg, tanar;
@@ -1306,10 +1310,10 @@ static int ns83820_set_settings(struct net_device *ndev,
spin_lock(&dev->tx_lock);
 
/* Set duplex */
-   if (cmd->duplex != fullduplex) {
+   if (cmd->base.duplex != fullduplex) {
if (have_optical) {
/*set full duplex*/
-   if (cmd->duplex == DUPLEX_FULL) {
+   if (cmd->base.duplex == DUPLEX_FULL) {
/* force full duplex */
writel(readl(dev->base + TXCFG)
| TXCFG_CSI | TXCFG_HBI | TXCFG_ATP,
@@ -1333,7 +1337,7 @@ static int ns83820_set_settings(struct net_device *ndev,
 
/* Set autonegotiation */
if (1) {
-   if (cmd->autoneg == AUTONEG_ENABLE) {
+   if (cmd->base.autoneg == AUTONEG_ENABLE) {
/* restart auto negotiation */
writel(TBICR_MR_AN_ENABLE | TBICR_MR_RESTART_AN,
dev->base + TBICR);
@@ -1348,7 +1352,7 @@ static int ns83820_set_settings(struct net_device *ndev,
}
 
printk(KERN_INFO "%s: autoneg %s via ethtool\n", ndev->name,
-   cmd->autoneg ? "ENABLED" : "DISABLED");
+   cmd->base.autoneg ? "ENABLED" : "DISABLED");
}
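
The get path above builds the supported modes as a legacy u32 mask and
then converts it into the link-mode bitmap. A userspace sketch of that
conversion step follows. It is an approximation, not the kernel helper:
the LEGACY_* bit values, the 64-bit bitmap width and all function names
are assumptions made for this example.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Legacy-style capability bits; the values are illustrative. */
#define LEGACY_10baseT_Half	(1u << 0)
#define LEGACY_10baseT_Full	(1u << 1)
#define LEGACY_100baseT_Half	(1u << 2)
#define LEGACY_100baseT_Full	(1u << 3)
#define LEGACY_1000baseT_Half	(1u << 4)
#define LEGACY_1000baseT_Full	(1u << 5)
#define LEGACY_Autoneg		(1u << 6)
#define LEGACY_MII		(1u << 9)
#define LEGACY_FIBRE		(1u << 10)

/* Hypothetical bitmap width, expressed in unsigned long words. */
#define LINK_MODE_NBITS	64
#define BITS_PER_ULONG	(sizeof(unsigned long) * CHAR_BIT)
#define LINK_MODE_WORDS	((LINK_MODE_NBITS + BITS_PER_ULONG - 1) / BITS_PER_ULONG)

/* Clear the bitmap, then place the 32 legacy bits in the lowest word. */
void legacy_u32_to_link_mode(unsigned long *dst, uint32_t legacy)
{
	memset(dst, 0, LINK_MODE_WORDS * sizeof(*dst));
	dst[0] = legacy;
}

int link_mode_test_bit(const unsigned long *map, unsigned int bit)
{
	return (int)((map[bit / BITS_PER_ULONG] >> (bit % BITS_PER_ULONG)) & 1ul);
}

/* Mask construction mirroring the optical/copper branches of the patch. */
uint32_t supported_mask(int optical)
{
	uint32_t supported = LEGACY_Autoneg;

	if (optical)
		supported |= LEGACY_1000baseT_Half | LEGACY_1000baseT_Full |
			     LEGACY_FIBRE;
	else
		supported |= LEGACY_10baseT_Half | LEGACY_10baseT_Full |
			     LEGACY_100baseT_Half | LEGACY_100baseT_Full |
			     LEGACY_1000baseT_Half | LEGACY_1000baseT_Full |
			     LEGACY_MII;
	return supported;
}
```

Accumulating into a local u32 and converting once is the same shape as
the patch, which replaces the direct writes to cmd->supported.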

Re: [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions

2017-02-10 Thread Daniel Borkmann


On 02/10/2017 06:22 AM, Alexei Starovoitov wrote:

On Thu, Feb 09, 2017 at 12:25:37PM +0100, Daniel Borkmann wrote:


Correct, the overlap both use-cases share is the dump itself. It needs
to be in such a condition for CRIU, that it can be reloaded eventually,


I don't think it makes sense to drag criu into this discussion.
I expressed my take on criu in the other thread. tldr:
bpf is a graph of dependencies between programs, maps, applications
and kernel events. So to save/restore this graph one would need to solve
very hard problems of stopping multiple applications at once,
stopping kernel events and so on. I don't think it's worth going that route.


Definitely not straightforward, fully agree. Worst-case you probably
need to go via stop_machine() (like in ftrace case when it modifies
code) in order to get a global consistent snapshot at a specific time.
Sounds ugly. Or if small steps first, tail calls etc would not be supported;
then you would need to tackle progs and generic maps, for the progs part
it could be a very similar interface at least, thus I'm saying that it
would be good if it's designed to be extensible in the future in that regard.


- Alternatively, the attach is always done by passing the FD as an
attribute, so the netlink dump could attach an fd to the running
program, return the FD as an attribute and the bpf program is retrieved
from the fd. This is a major departure from how dumps work with
processing attributes and needing to attach open files to a process will
be problematic. Integrating the bpf into the dump is a natural fit.


Right, I think it's a natural fit to place it into the various points/
places where it's attached to, as we're stuck with that anyway for the
attachment part. Meaning in cls_bpf, it would go as a mem blob into the
netlink attribute. There would need to be a common BPF core helper that
the various subsystem users call in order to generate that mentioned
output format, and that resulting mem blob is then stuck into either
nlattr, mem provided by syscall, etc.


I think if we use ten different ways to dump it, it will
complicate the user space tooling.
I'd rather see one way of doing it via new syscall command.
Pass prog_fd and it will return insns in some form.

Here is more concrete proposal:
- add two flags to PROG_LOAD:
   BPF_F_ENFORCE_STATELESS - it will require verifier to check that program
   doesn't use maps and any other global state (doesn't use bpf_redirect,
   doesn't use bpf_set_tunnel_key and tunnel_opt)
   This will ensure that program is stateless and pure instruction
   dump is meaningful. For 'ip vrf' case it will be enough.


I don't think such flag will be needed from uapi pov. Verifier can
just set a flag like that in the bpf_prog aux bits while verifying ...


   BPF_F_ALLOW_DUMP - it will save original program, so in the common
   case we wouldn't need to waste memory to save program


... and when that one is passed and the prog has state, then it gets
rejected. Effectively, both flags are saying the same thing. Plus side
is that you don't waste any resources when not set, but problem I see
is that BPF_F_ALLOW_DUMP requires explicit cooperation from a process,
when used for introspection doing that transparently instead might be
more desirable. Problem is that even when transparent, we have mentioned
limitations, so someone who doesn't want to cooperate could then just
use f.e. an empty tail call map on exit and that would be enough to
make dump not supported again. But also with just BPF_F_ALLOW_DUMP, I
can foresee that in half a year or so people request that dump should
be possible also without BPF_F_ALLOW_DUMP explicitly set.
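
To make the interaction of the two proposed flags concrete, here is a
rough sketch of the load-time policy as it reads from the exchange above.
Nothing here is an existing kernel interface: the flag values, the struct
and the error codes are all hypothetical, and "uses_state" stands in for
whatever the verifier would record about maps, tail calls and so on.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical values for the two proposed PROG_LOAD flags. */
#define SKETCH_F_ENFORCE_STATELESS	(1u << 0)
#define SKETCH_F_ALLOW_DUMP		(1u << 1)

struct sketch_prog {
	bool uses_state;	/* imagined verifier result: maps, tail calls, ... */
	bool insns_saved;	/* original instructions kept for a later dump */
};

/*
 * Load-time policy as discussed: a stateful program is rejected when
 * either flag is passed, which is why the two flags say much the same.
 */
int sketch_prog_load(struct sketch_prog *p, unsigned int flags, bool uses_state)
{
	p->uses_state = uses_state;
	p->insns_saved = false;

	if (uses_state &&
	    (flags & (SKETCH_F_ENFORCE_STATELESS | SKETCH_F_ALLOW_DUMP)))
		return -EINVAL;

	/* Memory for the original instructions is spent only on request. */
	if (flags & SKETCH_F_ALLOW_DUMP)
		p->insns_saved = true;
	return 0;
}

/* Dump succeeds only for programs that opted in at load time. */
int sketch_prog_dump(const struct sketch_prog *p)
{
	return p->insns_saved ? 0 : -EPERM;
}
```

The sketch also shows the cooperation concern: a program loaded without
the dump flag cannot be dumped afterwards, however harmless it is.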


- add new bpf syscall command BPF_PROG_DUMP
   input: prog_fd, output: insns
   it will work right away with OBJ_GET command and the user will
   be able to dump stateless programs pinned in bpffs


(And with that it really requires cooperation by design.)


- add appropriate interfaces for different attach points to return prog_fd:
   for cgroup it will be new BPF_PROG_GET command.
   for socket it will be new getsockopt. (Actually BPF_PROG_GET can work
   for sockets too and probably better).


I assume you mean above BPF_PROG_DUMP, right? Yeah, for them it's
not that difficult, agree.


   for xdp and tc we need to find a way to return prog_fd.
   netlink is no good, since it would be very weird to install fd
   and return it async in netlink body. We can simply say that
   whoever wants to dump programs need to first pin them in bpffs
   and then attach to tc/xdp. iproute2 already does it anyway.
   Realistically tc/xdp programs are almost always stateful, so
   dump won't be available for them anyway.


Right, but if it's just for introspection, I still think that this
format I described earlier could work. Meaning for maps, you dump
all the params used to create the map along with refs where they
are used, that would allow for tc/xdp to dump it at least. It still
wouldn't support tail calls, but you

Re: [PATCH net-next 4/4] net/sched: cls_bpf: Use skip flags to reflect HW offload status

2017-02-10 Thread Jakub Kicinski

On Thu,  9 Feb 2017 16:18:08 +0200, Or Gerlitz wrote:
> Currently there is no way of querying whether a filter is
> offloaded to HW or not when using both policy (no flag).
> 
> Reuse the skip flags to show the insertion status by setting
> the skip_hw flag in case the filter wasn't offloaded.
> 
> Signed-off-by: Or Gerlitz 

FWIW I tested this one and it works.  I also tested this version which
would take advantage of @offloaded:

diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index d9c97018317d..51d464f991ff 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -568,8 +568,8 @@ static int cls_bpf_dump(struct net *net, struct tcf_proto 
*tp, unsigned long fh,
struct sk_buff *skb, struct tcmsg *tm)
 {
struct cls_bpf_prog *prog = (struct cls_bpf_prog *) fh;
+   u32 gen_flags, bpf_flags = 0;
struct nlattr *nest;
-   u32 bpf_flags = 0;
int ret;
 
if (prog == NULL)
@@ -601,8 +601,11 @@ static int cls_bpf_dump(struct net *net, struct tcf_proto 
*tp, unsigned long fh,
bpf_flags |= TCA_BPF_FLAG_ACT_DIRECT;
if (bpf_flags && nla_put_u32(skb, TCA_BPF_FLAGS, bpf_flags))
goto nla_put_failure;
-   if (prog->gen_flags &&
-   nla_put_u32(skb, TCA_BPF_FLAGS_GEN, prog->gen_flags))
+
+   gen_flags = prog->gen_flags;
+   if (!prog->offloaded)
+   gen_flags |= TCA_CLS_FLAGS_SKIP_HW;
+   if (gen_flags && nla_put_u32(skb, TCA_BPF_FLAGS_GEN, gen_flags))
goto nla_put_failure;
 
nla_nest_end(skb, nest);
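
Stripped of the netlink plumbing, the flag computation in this variant
comes down to the small function below. This is a standalone sketch; the
SKETCH_* names and the function are made up here, and the bit values are
only illustrative stand-ins for the TCA_CLS_FLAGS_* constants.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the skip_hw / skip_sw flag bits. */
#define SKETCH_FLAGS_SKIP_HW	(1u << 0)
#define SKETCH_FLAGS_SKIP_SW	(1u << 1)

/* Flags reported on dump: user flags plus skip_hw when nothing was offloaded. */
uint32_t dump_gen_flags(uint32_t user_flags, bool offloaded)
{
	uint32_t gen_flags = user_flags;

	if (!offloaded)
		gen_flags |= SKETCH_FLAGS_SKIP_HW;
	return gen_flags;
}
```

A filter added with no flags therefore dumps as skip_hw when the offload
did not happen and with no flags when it did.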

Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

2017-02-10 Thread Al Viro

[repost with netdev added - hadn't realized it wasn't in Cc]

On Tue, Aug 09, 2016 at 03:58:36PM +0100, Al Viro wrote:

> Actually returning to the original behaviour would be "restore ->msg_iter
> if we tried skb_copy_and_csum_datagram() and failed for any reason".  Which
> would be bloody inconsistent wrt EFAULT, since the other branch (chunk
> large enough to cover the entire recvmsg()) will copy as much as it can
> and (in old kernel) drain iovec or (on the current one) leave iov_iter
> advance unreverted.

To resurrect the old thread: the problem is still there.  Namely, csum
mismatch on packet should leave the iterator as it had been.  That much
is clear; the question is what should be done on EFAULT halfway through.

Semantics of both csum and non-csum skb_copy_datagram_msg() variants in
EFAULT case is an interesting question.  None of that family report
partial copy; it's full or -EFAULT.  So for the sake of basic sanity
it would be better to leave iterator in the original state when that
kind of thing happens.  On the other hand, quite a few callers don't
care about the state of iterator after that and I wonder if the overhead
would be sensitive.  OTOH, the overhead in question is "save 5 words into
local variable and don't use it in the normal case" - in the code that
copies an skb worth of data.

AFAICS, the following gives consistent (and minimally surprising) semantics,
as well as fixing the outright bug with iov_iter left advanced in case of csum
errors.  Comments?

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index c27011bbe30c..14ae17e77603 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -848,7 +848,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
total += VLAN_HLEN;
 
-   ret = skb_copy_datagram_iter(skb, 0, iter, vlan_offset);
+   ret = __skb_copy_datagram_iter(skb, 0, iter, vlan_offset);
if (ret || !iov_iter_count(iter))
goto done;
 
@@ -857,7 +857,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
goto done;
}
 
-   ret = skb_copy_datagram_iter(skb, vlan_offset, iter,
+   ret = __skb_copy_datagram_iter(skb, vlan_offset, iter,
 skb->len - vlan_offset);
 
 done:
@@ -899,11 +899,14 @@ static ssize_t macvtap_do_read(struct macvtap_queue *q,
finish_wait(sk_sleep(&q->sk), &wait);
 
if (skb) {
+   struct iov_iter saved = *to;
ret = macvtap_put_user(q, skb, to);
-   if (unlikely(ret < 0))
+   if (unlikely(ret < 0)) {
+   *to = saved;
kfree_skb(skb);
-   else
+   } else {
consume_skb(skb);
+   }
}
return ret;
 }
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index a411b43a69eb..0d8badc3c4e9 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -480,7 +480,7 @@ static ssize_t ppp_read(struct file *file, char __user *buf,
iov.iov_base = buf;
iov.iov_len = count;
iov_iter_init(&to, READ, &iov, 1, count);
-   if (skb_copy_datagram_iter(skb, 0, &to, skb->len))
+   if (__skb_copy_datagram_iter(skb, 0, &to, skb->len))
goto outf;
ret = skb->len;
 
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 30863e378925..2003b8c9970e 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1430,7 +1430,7 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 
vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
 
-   ret = skb_copy_datagram_iter(skb, 0, iter, vlan_offset);
+   ret = __skb_copy_datagram_iter(skb, 0, iter, vlan_offset);
if (ret || !iov_iter_count(iter))
goto done;
 
@@ -1439,7 +1439,8 @@ static ssize_t tun_put_user(struct tun_struct *tun,
goto done;
}
 
-   skb_copy_datagram_iter(skb, vlan_offset, iter, skb->len - vlan_offset);
+   /* XXX: no error check? */
+   __skb_copy_datagram_iter(skb, vlan_offset, iter, skb->len - vlan_offset);
 
 done:
/* caller is in process context, */
@@ -1501,6 +1502,7 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct 
tun_file *tfile,
 {
struct sk_buff *skb;
ssize_t ret;
+   struct iov_iter saved;
int err;
 
tun_debug(KERN_INFO, tun, "tun_do_read\n");
@@ -1513,11 +1515,14 @@ static ssize_t tun_do_read(struct tun_struct *tun, 
struct tun_file *tfile,
if (!skb)
return err;
 
+   saved = *to;
ret = tun_put_user(tun, tfile, skb, to);
-   if (unlikely(ret < 0))
+   if (unlikely(ret < 0)) {
+   *to = saved;
kfree_skb(skb);
-   else
+   } else
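
The save/restore pattern used in the tap and tun hunks can be tried out
in isolation. The sketch below is a userspace model under stated
assumptions: sketch_iter is a made-up flat-buffer cursor rather than
struct iov_iter, and the fault is injected by index instead of coming
from a real copy to user space.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct iov_iter: a cursor over one flat buffer. */
struct sketch_iter {
	char *buf;
	size_t pos;
	size_t count;	/* bytes still available */
};

/*
 * Raw copy that advances as it goes and may fail part-way. fault_at is
 * the source index at which a fault is injected; pass len for no fault.
 */
int raw_copy_to_iter(struct sketch_iter *it, const char *src, size_t len,
		     size_t fault_at)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (i == fault_at || it->count == 0)
			return -EFAULT;
		it->buf[it->pos++] = src[i];
		it->count--;
	}
	return 0;
}

/* All-or-nothing wrapper: snapshot the iterator and restore it on failure. */
int copy_to_iter_restoring(struct sketch_iter *it, const char *src, size_t len,
			   size_t fault_at)
{
	struct sketch_iter saved = *it;
	int ret = raw_copy_to_iter(it, src, len, fault_at);

	if (ret < 0)
		*it = saved;
	return ret;
}
```

The snapshot is a plain struct copy, which matches the "save a few words
into a local variable" cost estimate given in the message.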

Re: [RFC PATCH net] bpf: introduce BPF_F_ALLOW_OVERRIDE flag

2017-02-10 Thread Andy Lutomirski

On Thu, Feb 9, 2017 at 10:59 AM, Alexei Starovoitov  wrote:
> If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
> to the given cgroup the descendent cgroup will be able to override
> effective bpf program that was inherited from this cgroup.
> By default it's not passed, therefore override is disallowed.
>
> Examples:
> 1.
> prog X attached to /A with default
> prog Y fails to attach to /A/B and /A/B/C
> Everything under /A runs prog X
>
> 2.
> prog X attached to /A with ALLOW_OVERRIDE
> prog Y attached to /A/B with default. Everything under /A/B runs prog Y

I think that, for ease of future extension, Y should also need
ALLOW_OVERRIDE.  Otherwise, when non-overridable hooks can stack,
there could be confusion as to whether Y should override something or
should stack.

> prog M attached to /A/C with default. Everything under /A/C runs prog M
> prog N fails to attach to /A/C/foo.
> prog L attached to /A/D with ALLOW_OVERRIDE.
>   Events under /A/D run prog L and can be overridden in /A/D/foo
>
> /A still runs prog X
> prog K attached to /A with ALLOW_OVERRIDE.
>   /A now runs prog K while /A/B runs prog Y and /A/C runs prog M
> prog J attached to /A with default.
>   /A now runs prog J while /A/B runs prog Y.
>   /A/B cannot be changed anymore (since parent disallows override),
>   but can be cleared. After detach /A/B will run prog J.
>
> Signed-off-by: Alexei Starovoitov 
> ---
>
> Below are few proposals for future extensions and not definitive:
> 1.
> we can extend the behavior with a chain of non-overridable like:
> prog X attached to /A with default
> prog Y attached to /A/B with default
> The events scoped by /A/B will run program Y first and if it returns 1
> the prog X will be run. For control app there will be an illusion
> that it owns cgroup /A/B with single prog and detach from /A/B will delete
> prog Y unambiguously.
> While another control app that attached to /A also see its prog X running,
> unless prog Y filtered it out, which means (from X point of view)
> that event didn't happen.
> Attaching two programs to /A is not allowed.
> We would need to combine prog X and Y into array to avoid link list
> traversal for performance reasons, but that's an implementation detail.
>
> 2.
> we can add another flag to reverse this call order too.
> Instead of calling the progs from child to parent, do parent to child.

I think the order should depend on the hook.  Hooks for
process-initiated actions (egress, socket creation) should run
innermost first and hooks for outside actions (ingress) should be
outermost first.

>
> 3.
> we can extend the api further by adding 'attach_priority' flag as:
> prog X attach /A prio=20
> prog Y attach /A prio=10
> prog N attach /A/B prio=20
> prog M attach /A/B prio=10
> in /A/B the sequence of progs will be M -> N -> Y -> X

I haven't thought of a use for this.  Maybe there is one.

>
> prog X attach /A prio=10 and prog Y attach /A prio=10 will be disallowed,
> but attach with the same prio to different cgroups is ok.
> If attached with prio, detach must specify prio as well.
> Attach transitions:
> allow_override -> disable_override/single_prog = ok
> allow_override -> prio (multi prog at the same cgroup) = ok
> disable_override/single_prog -> prio = ok (with respect to child/parent order)
> prio -> allow_override = fail
> prio -> disable_override/single_prog = fail
>
> ***
> To summarize the key to not breaking abi is to preserve user space
> expectations. Right now (without this patch) we have progs
> overridable by any descendent. Which means that control plane
> application has to expect that something may overwrite the program.
> Hence any new flag will not break this expectation
> (overridable == control plane cannot assume that its attached
> programs will run in the hostile environment)
> and that's the main reason why I don't think we need to change anything now
> and hence this patch is an RFC.
>
> Adding 'allow_override' flag and changing the default to
> override disallowed is also fine from api extensibility point of view.
> Since for 'override disallowed' case the control plane app will
> be expecting that any processes will not override its program
> in the descendent cgroups and it will run. This would have to be preserved.
> That's why the future api extensions (like #1 above) would have to do
> the program chaining to preserve 'disallow override' flag expectations.
> So imo it's safer to keep overridable as it is today, since this flag
> adds a bit more restrictions to the future extensions
> comparing to everything overridable.
>
> Andy,
> does it all make sense?

Yes with the caveat above.

> Do you still insist on submitting this patch officially?

I'm not sure what you mean.

> or you're ok keeping it overridable for now.

I really think the default should change for 4.10.  People are going
to use this feature for sandboxing or in systemd or whatever, and that
code should keep working in newer kernels
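
For anyone trying to follow the quoted examples, the attach rules can be
modelled in a few lines of userspace C. This is a simplified reading of
the RFC text, not the kernel implementation: the types and names are
hypothetical, only the nearest ancestor with a program attached is
consulted, and the suggestion that a child should also need ALLOW_OVERRIDE
is not modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_cgroup {
	struct sketch_cgroup *parent;
	char prog;		/* 0 when nothing is attached here */
	bool allow_override;	/* meaningful only when prog != 0 */
};

/* Nearest strict ancestor that has a program attached, or NULL. */
static const struct sketch_cgroup *
nearest_attached_ancestor(const struct sketch_cgroup *cg)
{
	for (cg = cg->parent; cg; cg = cg->parent)
		if (cg->prog)
			return cg;
	return NULL;
}

/* Attach fails when the inherited program was attached without override. */
int sketch_attach(struct sketch_cgroup *cg, char prog, bool allow_override)
{
	const struct sketch_cgroup *anc = nearest_attached_ancestor(cg);

	if (anc && !anc->allow_override)
		return -EPERM;
	cg->prog = prog;
	cg->allow_override = allow_override;
	return 0;
}

/* Detach is always allowed; the cgroup falls back to what it inherits. */
void sketch_detach(struct sketch_cgroup *cg)
{
	cg->prog = 0;
	cg->allow_override = false;
}

/* Effective program: own attachment, else the nearest ancestor's. */
char sketch_effective(const struct sketch_cgroup *cg)
{
	for (; cg; cg = cg->parent)
		if (cg->prog)
			return cg->prog;
	return 0;
}
```

Replaying examples 1 and 2 against this model gives the outcomes listed
in the quoted description, including the case where /A/B can no longer
be changed but can still be cleared.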

Re: [PATCH] [net-next] ARM: orion: fix PHYLIB dependency

2017-02-10 Thread Arnd Bergmann

On Fri, Feb 10, 2017 at 9:57 PM, Florian Fainelli  wrote:
> On 02/10/2017 12:05 PM, Arnd Bergmann wrote:
>> On Friday, February 10, 2017 9:42:21 AM CET Florian Fainelli wrote:
>>> On 02/10/2017 12:20 AM, Arnd Bergmann wrote:
>>>> On Thu, Feb 9, 2017 at 7:22 PM, Florian Fainelli wrote:
>>>>> On 02/09/2017 07:08 AM, Arnd Bergmann wrote:
>>>>> I disabled CONFIG_NETDEVICES to force CONFIG_PHY not to be set here, and
>>>>> I was not able to reproduce this, what am I missing?
>>>>
>>>> In the ARMv5 allmodconfig build, this fails because CONFIG_PHY=m, and
>>>> we can't call into it. You could use IS_BUILTIN instead of IS_ENABLED in
>>>> the header as a oneline workaround, but I think that would be more confusing
>>>> to real users that try to use CONFIG_PHY=m without realizing why they lose
>>>> access to their switch.
>>>
>>> I see, this patch should also help fixing this:
>>>
>>> http://patchwork.ozlabs.org/patch/726381/
>>
>> I think you still have the same problem, as you can still have the
>> boardinfo registration in a loadable module.
>
> The patch exports mdiobus_register_board_info() so that should solve
> your problem here, and I did verify this with a loadable module that
> references mdiobus_register_board_info() in that case.

No, that's a different problem. What you get with arm allmodconfig
(try it!) is that mdio-bus.ko is a loadable module, but referenced
from built-in code rather than the other way around. Exporting
the symbol doesn't change anything since the module cannot
be loaded by the time we need the symbol.
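
The difference between the two guards can be shown outside the kernel.
The sketch below is a userspace imitation of the kconfig.h helpers with
made-up KC_* names, and it simulates PHYLIB=m by defining only the _MODULE
symbol. It is meant to illustrate the point about built-in callers, not to
reproduce the kernel header exactly.

```c
/* Simulate a tristate option set to "m": only the _MODULE symbol exists. */
#define CONFIG_PHYLIB_MODULE 1

/* Userspace imitation of the kernel's preprocessor trick (names made up). */
#define KC_PLACEHOLDER_1 0,
#define KC_SECOND_ARG(ignored, val, ...) val
#define KC_IS_DEFINED(x) KC_IS_DEFINED_(x)
#define KC_IS_DEFINED_(val) KC_IS_DEFINED__(KC_PLACEHOLDER_##val)
#define KC_IS_DEFINED__(arg1_or_junk) KC_SECOND_ARG(arg1_or_junk 1, 0, 0)

/* 1 only for =y, 1 only for =m, and 1 for either, respectively. */
#define KC_IS_BUILTIN(option) KC_IS_DEFINED(option)
#define KC_IS_MODULE(option) KC_IS_DEFINED(option##_MODULE)
#define KC_IS_ENABLED(option) (KC_IS_BUILTIN(option) || KC_IS_MODULE(option))

int phylib_is_enabled(void)
{
	return KC_IS_ENABLED(CONFIG_PHYLIB);
}

int phylib_is_builtin(void)
{
	return KC_IS_BUILTIN(CONFIG_PHYLIB);
}

/*
 * Built-in code can only link against a symbol whose provider is built
 * in as well, so the builtin test is the one that matters for it.
 */
int builtin_caller_can_link(void)
{
	return phylib_is_builtin();
}
```

With the option set to "m", the enabled test is true but the builtin test
is false, so a header that declares the real function under the enabled
test leaves built-in callers with an undefined reference.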

>>
>> I have come up with a patch too now and done some randconfig testing
>> on it (it took me several tries as well), please see below. It does
>> some of the same things as yours and some others.
>>
>> The main trick is to have a separate 'MDIO_BOARDINFO' Kconfig symbol
>> that can be selected regardless of all the other symbols, and that
>> will lead to the registration being either built-in when it's needed
>> or not built at all when either no board calls it, or PHYLIB is
>> disabled.
>
> Your patch is fine in premise except that you are making CONFIG_MDIO
> encompass both drivers/net/mdio.c and
> drivers/net/phy/mdio_{bus,device}.c and these do share the same header
> (for better or for worse), but are not quite dealing with MDIO at the
> same level. drivers/net/mdio.c is more like PHYLIB for the old-style,
> pre mdiobus() drivers helper functions.

Ah, makes sense. I had missed that part.

> I like it that you made MDIO_BOARDINFO separate, and that is probably a
> patch I should incorporate in the other patch splitting things up, see
> below though for the remainder of the changes.

Ok.

>>
>> From f35e89cacfabdf7b822772013389132605941def Mon Sep 17 00:00:00 2001
>> From: Arnd Bergmann 
>> Date: Wed, 27 Apr 2016 11:51:18 +0200
>> Subject: [PATCH] [RFC] move ethernet PHY config into drivers/phy/Kconfig
>>
>> Calling mdiobus_register_board_info from builtin code with CONFIG_PHYLIB=m
>> currently results in a link error:
>>
>> arch/arm/plat-orion/common.o: In function `orion_ge00_switch_init':
>> common.c:(.init.text+0x6a4): undefined reference to `mdiobus_register_board_info'
>>
>> As the long-term strategy is to separate mdio from phylib, and to get generic-phy
>> and (networking-only) phylib closer together, this performs a first step in that
>> direction: The Kconfig file for phylib gets logically pulled under the PHY
>> driver configuration and becomes independent from networking. This lets us
>> select the new CONFIG_MDIO_BOARDINFO from platforms that need it, and provide
>> the functions exactly when we need them.
>
> This is too broad; the only part of drivers/net/phy/ that is worth
> pulling out is what I tried to extract: mdio bus and
> device. There are some bad inter-dependencies between that code and
> phy_device.c and phy.c which makes it hard to split and make that part
> completely standalone for now.
>
> The only part that is truly valuable to non-Ethernet PHY devices is the
> MDIO bus/device registration part, which is available in my patch with
> CONFIG_MDIO_DEVICE, and which probably should not depend on
> NETDEVICES, so the other part of your patch makes sense too here.

My patch started out from something I had done a long time ago when
we discussed how the two subsystems (generic-phy and phylib) can
be tied together more. This has two aspects:

- Moving them into a single top-level Kconfig menu (and eventually
  directory) to make it easier to find one of them when you look in
  the wrong place. My patch starts doing that.
- making the MDIO bus available to generic-phy drivers. This is
  what your patch does.

Right now, we only really need part of my patch to fix the link
error, but it makes way more sense once all the parts come together.

Arnd

Re: [PATCH v4 0/3] Miscellaneous fixes for BPF (perf tree)

2017-02-10 Thread Joe Stringer

On 10 February 2017 at 09:42, Arnaldo Carvalho de Melo  wrote:
> On Wed, Feb 08, 2017 at 09:27:41PM +0100, Mickaël Salaün wrote:
>> This series brings some fixes and small improvements to the BPF samples.
>>
>> This is intended for the perf tree and apply on 7a5980f9c006 ("tools lib bpf:
>> Add missing header to the library").
>
> Wang, are you ok with this series? Joe?

The changes look good to me. I also tried tracex5 and it seems to work fine.

[GIT] Networking

2017-02-10 Thread David Miller


1) If the timing is wrong we can indefinitely stop generating new
   ipv6 temporary addresses, from Marcus Huewe.

2) Don't double free per-cpu stats in ipv6 SIT tunnel driver, from
   Cong Wang.

3) Put protections in place so that AF_PACKET is not able to submit
   packets which don't even have a link level header to drivers.  From
   Willem de Bruijn.

4) Fix memory leaks in ipv4 and ipv6 multicast code, from Hangbin Liu.

5) Don't use udp_ioctl() in l2tp code, UDP version expects a UDP socket
   and that doesn't go over very well when it is passed an L2TP one.
   Fix from Eric Dumazet.

6) Don't crash on NULL pointer in phy_attach_direct(), from Florian
   Fainelli.

Please pull, thanks a lot.

The following changes since commit 926af6273fc683cd98cd0ce7bf0d04a02eed6742:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-02-07 12:10:57 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 72fb96e7bdbbdd4421b0726992496531060f3636:

  l2tp: do not use udp_ioctl() (2017-02-10 15:57:34 -0500)


Boris Ostrovsky (1):
  xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend()

David Ahern (1):
  lwtunnel: valid encap attr check should return 0 when lwtunnel is disabled

David S. Miller (2):
  Merge branch 'net-header-length-truncation'
  Merge branch 'sierra_net-fixes'

Eric Dumazet (1):
  l2tp: do not use udp_ioctl()

Florian Fainelli (2):
  net: dsa: Do not destroy invalid network devices
  net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()

Hangbin Liu (1):
  igmp, mld: Fix memory leak in igmpv3/mld_del_delrec()

Kejian Yan (1):
  net: hns: Fix the device being used for dma mapping during TX

Marcus Huewe (1):
  ipv6: addrconf: fix generation of new temporary addresses

Ralf Baechle (1):
  NET: mkiss: Fix panic

Ross Lagerwall (1):
  xen-netfront: Improve error handling during initialization

Stefan Brüns (2):
  sierra_net: Add support for IPv6 and Dual-Stack Link Sense Indications
  sierra_net: Skip validating irrelevant fields for IDLE LSIs

Thanneeru Srinivasulu (1):
  net: thunderx: Fix PHY autoneg for SGMII QLM mode

Vineeth Remanan Pillai (1):
  xen-netfront: Rework the fix for Rx stall during OOM and network stress

WANG Cong (3):
  sit: fix a double free on error path
  ping: fix a null pointer dereference
  kcm: fix 0-length case for kcm_sendmsg()

Willem de Bruijn (2):
  net: introduce device min_header_len
  packet: round up linear to header len

Yendapally Reddy Dhananjaya Reddy (1):
  net: phy: Initialize mdio clock at probe function

 drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 108
 drivers/net/ethernet/cavium/thunder/thunder_bgx.h |   5 +
 drivers/net/ethernet/hisilicon/hns/hns_enet.c     |   2 +-
 drivers/net/hamradio/mkiss.c                      |   4 ++--
 drivers/net/loopback.c                            |   1 +
 drivers/net/phy/mdio-bcm-iproc.c                  |   6 ++
 drivers/net/phy/phy_device.c                      |  28
 drivers/net/usb/sierra_net.c                      | 111 +++
 drivers/net/xen-netfront.c                        |  46 --
 include/linux/netdevice.h                         |   4
 include/net/lwtunnel.h                            |   5 -
 net/dsa/dsa2.c                                    |   1 +
 net/ethernet/eth.c                                |   1 +
 net/ipv4/igmp.c                                   |   1 +
 net/ipv4/ping.c                                   |   2 ++
 net/ipv6/addrconf.c                               |   6 ++
 net/ipv6/mcast.c                                  |   1 +
 net/ipv6/sit.c                                    |   1 +
 net/kcm/kcmsock.c                                 |  40 ++--
 net/l2tp/l2tp_core.h                              |   1 +
 net/l2tp/l2tp_ip.c                                |  27 ++-
 net/l2tp/l2tp_ip6.c                               |   2 +-
 net/packet/af_packet.c                            |   7 ---
 23 files changed, 297 insertions(+), 113 deletions(-)

Re: [PATCH net 1/1] net: fec: fix multicast filtering hardware setup

2017-02-10 Thread Fabio Estevam

On Fri, Feb 10, 2017 at 3:54 AM, Andy Duan  wrote:
> Fix hardware setup of multicast address hash:
> - Never clear the hardware hash (to avoid packet loss)
> - Construct the hash register values in software and then write once
> to hardware
>
> Signed-off-by: Fugang Duan 
> Signed-off-by: Rui Sousa 

It seems you forgot to put Rui's name in the From: field.
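
For readers following the quoted commit message ("construct the hash register values in software and then write once to hardware"), here is a rough standalone sketch of that pattern. It is not the FEC driver code: struct toy_regs, toy_set_multicast(), the 6-bit hash width and the CRC-based bit selection are all assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_BITS 6	/* assumed: 64-entry hash split over two registers */

/* Stand-in for two memory-mapped hash registers, plus a write counter. */
struct toy_regs {
	uint32_t hash_high;
	uint32_t hash_low;
	unsigned int writes;
};

/* Plain bitwise little-endian CRC-32 (polynomial 0xedb88320). */
static uint32_t toy_crc32_le(const uint8_t *p, size_t len)
{
	uint32_t crc = 0xffffffffu;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= p[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1u) ? 0xedb88320u : 0u);
	}
	return crc;
}

/*
 * Accumulate the whole filter in local variables first, then store each
 * register exactly once. The registers are never cleared to zero in
 * between, so there is no window in which wanted groups are filtered out.
 */
static void toy_set_multicast(struct toy_regs *regs,
			      const uint8_t addrs[][6], size_t n)
{
	uint32_t hi = 0, lo = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t crc = toy_crc32_le(addrs[i], 6);
		unsigned int bit = (crc >> (32 - TOY_HASH_BITS)) & 0x3f;

		if (bit > 31)
			hi |= 1u << (bit - 32);
		else
			lo |= 1u << bit;
	}

	regs->hash_high = hi;
	regs->writes++;
	regs->hash_low = lo;
	regs->writes++;
}
```

The point of the sketch is the ordering: the loop touches only locals, and the two stores at the end replace the old filter with the new one directly.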

Re: [PATCH net-next,v2] gtp: add MAINTAINERS

2017-02-10 Thread David Miller

From: Pablo Neira Ayuso 
Date: Fri, 10 Feb 2017 13:26:27 +0100

> From: Pablo Neira 
> 
> Add maintainers for this tunnel driver. Include main osmocom.org mailing
> list too.
> 
> Signed-off-by: Pablo Neira Ayuso 
> ---
> v2: Harald suggests osmocom-net-g...@lists.osmocom.org is better ML for this.

Applied, thanks.

Re: [PATCH 2/3] tipc: Fix tipc_sk_reinit race conditions

2017-02-10 Thread Ying Xue

On 02/07/2017 08:39 PM, Herbert Xu wrote:
> There are two problems with the function tipc_sk_reinit.  Firstly
> it's doing a manual walk over an rhashtable.  This is broken as
> an rhashtable can be resized and if you manually walk over it
> during a resize then you may miss entries.
> 
> Secondly it's missing memory barriers as previously the code used
> spinlocks which provide the barriers implicitly.
> 
> This patch fixes both problems.
> 
> Fixes: 07f6c4bc048a ("tipc: convert tipc reference table to...")
> Signed-off-by: Herbert Xu 

Acked-by: Ying Xue 

> ---
> 
>  net/tipc/net.c    |    4 
>  net/tipc/socket.c |   30 +++---
>  2 files changed, 23 insertions(+), 11 deletions(-)
> 
> diff --git a/net/tipc/net.c b/net/tipc/net.c
> index 28bf4fe..ab8a2d5 100644
> --- a/net/tipc/net.c
> +++ b/net/tipc/net.c
> @@ -110,6 +110,10 @@ int tipc_net_start(struct net *net, u32 addr)
>   char addr_string[16];
>  
>   tn->own_addr = addr;
> +
> + /* Ensure that the new address is visible before we reinit. */
> + smp_mb();
> +
>   tipc_named_reinit(net);
>   tipc_sk_reinit(net);
>  
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 333c5da..20240e1 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -384,8 +384,6 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
>   INIT_LIST_HEAD(&tsk->publications);
>   msg = &tsk->phdr;
>   tn = net_generic(sock_net(sk), tipc_net_id);
> - tipc_msg_init(tn->own_addr, msg, TIPC_LOW_IMPORTANCE, TIPC_NAMED_MSG,
> -   NAMED_H_SIZE, 0);
>  
>   /* Finish initializing socket data structures */
>   sock->ops = ops;
> @@ -395,6 +393,13 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
>   pr_warn("Socket create failed; port number exhausted\n");
>   return -EINVAL;
>   }
> +
> + /* Ensure tsk is visible before we read own_addr. */
> + smp_mb();
> +
> + tipc_msg_init(tn->own_addr, msg, TIPC_LOW_IMPORTANCE, TIPC_NAMED_MSG,
> +   NAMED_H_SIZE, 0);
> +
>   msg_set_origport(msg, tsk->portid);
>   setup_timer(&sk->sk_timer, tipc_sk_timeout, (unsigned long)tsk);
>   sk->sk_shutdown = 0;
> @@ -2267,24 +2272,27 @@ static int tipc_sk_withdraw(struct tipc_sock *tsk, uint scope,
>  void tipc_sk_reinit(struct net *net)
>  {
>   struct tipc_net *tn = net_generic(net, tipc_net_id);
> - const struct bucket_table *tbl;
> - struct rhash_head *pos;
> + struct rhashtable_iter iter;
>   struct tipc_sock *tsk;
>   struct tipc_msg *msg;
> - int i;
>  
> - rcu_read_lock();
> - tbl = rht_dereference_rcu((&tn->sk_rht)->tbl, &tn->sk_rht);
> - for (i = 0; i < tbl->size; i++) {
> - rht_for_each_entry_rcu(tsk, pos, tbl, i, node) {
> + rhashtable_walk_enter(&tn->sk_rht, &iter);
> +
> + do {
> + tsk = ERR_PTR(rhashtable_walk_start(&iter));
> + if (tsk)
> + continue;
> +
> + while ((tsk = rhashtable_walk_next(&iter)) && !IS_ERR(tsk)) {
>   spin_lock_bh(&tsk->sk.sk_lock.slock);
>   msg = &tsk->phdr;
>   msg_set_prevnode(msg, tn->own_addr);
>   msg_set_orignode(msg, tn->own_addr);
>   spin_unlock_bh(&tsk->sk.sk_lock.slock);
>   }
> - }
> - rcu_read_unlock();
> +
> + rhashtable_walk_stop(&iter);
> + } while (tsk == ERR_PTR(-EAGAIN));
>  }
>  
>  static struct tipc_sock *tipc_sk_lookup(struct net *net, u32 portid)
>
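
The restart-on-resize loop in the patch above can be modelled outside the kernel. The following is only a toy sketch: struct toy_iter and toy_walk_next() are invented stand-ins and not the rhashtable walker API, and the "resize" is simulated by a counter. It shows why the outer loop repeats while the walker reports -EAGAIN, and why that is acceptable when the per-entry update is idempotent.

```c
#include <errno.h>
#include <stddef.h>

/* Toy walker state; resizes_pending simulates concurrent table resizes. */
struct toy_iter {
	int *items;
	size_t count;
	size_t pos;
	int resizes_pending;
};

/*
 * Returns 1 and stores the next entry, 0 at the end of the table, or
 * -EAGAIN when a simulated resize interrupts the walk halfway through.
 */
static int toy_walk_next(struct toy_iter *it, int **out)
{
	if (it->pos == it->count)
		return 0;
	if (it->resizes_pending > 0 && it->pos == it->count / 2) {
		it->resizes_pending--;
		return -EAGAIN;
	}
	*out = &it->items[it->pos++];
	return 1;
}

/*
 * Rewrite every entry with new_addr. Entries may be visited more than
 * once across restarts; the update is a plain store, so repeating it is
 * harmless. Returns the total number of visits.
 */
static int toy_reinit_all(struct toy_iter *it, int new_addr)
{
	int visits = 0;
	int *item;
	int ret;

	do {
		it->pos = 0;	/* begin (or restart) the walk */
		while ((ret = toy_walk_next(it, &item)) > 0) {
			*item = new_addr;
			visits++;
		}
	} while (ret == -EAGAIN);

	return visits;
}
```

With four entries and one simulated resize, the first pass visits two entries before being interrupted and the second pass visits all four, so no entry is missed.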

Re: [PATCH net-next 00/12] Netronome NFP4000 and NFP6000 PF driver

2017-02-10 Thread David Miller

From: Jakub Kicinski 
Date: Thu,  9 Feb 2017 09:17:26 -0800

> This is a base PF driver for Netronome NFP4000 and NFP6000 chips.  This
> series doesn't add any exciting new features, it provides a foundation
> for supporting more advanced firmware applications.

Series applied, thank you.

Re: [PATCH v2 net] l2tp: do not use udp_ioctl()

2017-02-10 Thread David Miller

From: Eric Dumazet 
Date: Thu, 09 Feb 2017 16:15:52 -0800

> From: Eric Dumazet 
> 
> udp_ioctl(), as its name suggests, is used by UDP protocols,
> but is also used by L2TP :(
> 
> L2TP should use its own handler, because it really does not
> look the same.
> 
> SIOCINQ for instance should not assume UDP checksum or headers.
> 
> Thanks to Andrey and syzkaller team for providing the report
> and a nice reproducer.
> 
> While crashes only happen on recent kernels (after commit 
> 7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")), this
> probably needs to be backported to older kernels.
> 
> Fixes: 7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")
> Fixes: 85584672012e ("udp: Fix udp_poll() and ioctl()")
> Signed-off-by: Eric Dumazet 
> Reported-by: Andrey Konovalov 
> Acked-by: Paolo Abeni 
> ---
> v2: Adding the EXPORT_SYMBOL(l2tp_ioctl) for ipv6, of course...

Applied and queued up for -stable, thanks Eric.

Re: [PATCH] [net-next] ARM: orion: fix PHYLIB dependency

2017-02-10 Thread Florian Fainelli

On 02/10/2017 12:05 PM, Arnd Bergmann wrote:
> On Friday, February 10, 2017 9:42:21 AM CET Florian Fainelli wrote:
>> On 02/10/2017 12:20 AM, Arnd Bergmann wrote:
>>> On Thu, Feb 9, 2017 at 7:22 PM, Florian Fainelli wrote:
>>>> On 02/09/2017 07:08 AM, Arnd Bergmann wrote:
>>>> I disabled CONFIG_NETDEVICES to force CONFIG_PHY not to be set here, and
>>>> I was not able to reproduce this, what am I missing?
>>>
>>> In the ARMv5 allmodconfig build, this fails because CONFIG_PHY=m, and
>>> we can't call into it. You could use IS_BUILTIN instead of IS_ENABLED in
>>> the header as a oneline workaround, but I think that would be more confusing
>>> to real users that try to use CONFIG_PHY=m without realizing why they lose
>>> access to their switch.
>>
>> I see, this patch should also help fixing this:
>>
>> http://patchwork.ozlabs.org/patch/726381/
> 
> I think you still have the same problem, as you can still have the
> boardinfo registration in a loadable module.

The patch exports mdiobus_register_board_info() so that should solve
your problem here, and I did verify this with a loadable module that
references mdiobus_register_board_info() in that case.

> 
> I have come up with a patch too now and done some randconfig testing
> on it (it took me several tries as well), please see below. It does
> some of the same things as yours and some others.
> 
> The main trick is to have a separate 'MDIO_BOARDINFO' Kconfig symbol
> that can be selected regardless of all the other symbols, and that
> will lead to the registration being either built-in when it's needed
> or not built at all when either no board calls it, or PHYLIB is
> disabled.

Your patch is fine in premise except that you are making CONFIG_MDIO
encompass both drivers/net/mdio.c and
drivers/net/phy/mdio_{bus,device}.c and these do share the same header
(for better or for worse), but are not quite dealing with MDIO at the
same level. drivers/net/mdio.c is more like PHYLIB for the old-style,
pre mdiobus() drivers helper functions.

I like it that you made MDIO_BOARDINFO separate, and that is probably a
patch I should incorporate in the other patch splitting things up, see
below though for the remainder of the changes.

> 
> From f35e89cacfabdf7b822772013389132605941def Mon Sep 17 00:00:00 2001
> From: Arnd Bergmann 
> Date: Wed, 27 Apr 2016 11:51:18 +0200
> Subject: [PATCH] [RFC] move ethernet PHY config into drivers/phy/Kconfig
> 
> Calling mdiobus_register_board_info from builtin code with CONFIG_PHYLIB=m
> currently results in a link error:
> 
> arch/arm/plat-orion/common.o: In function `orion_ge00_switch_init':
> common.c:(.init.text+0x6a4): undefined reference to `mdiobus_register_board_info'
> 
> As the long-term strategy is to separate mdio from phylib, and to get generic-phy
> and (networking-only) phylib closer together, this performs a first step in that
> direction: The Kconfig file for phylib gets logically pulled under the PHY
> driver configuration and becomes independent from networking. This lets us
> select the new CONFIG_MDIO_BOARDINFO from platforms that need it, and provide
> the functions exactly when we need them.

This is too broad; the only part of drivers/net/phy/ that is worth
pulling out is what I tried to extract: mdio bus and
device. There are some bad inter-dependencies between that code and
phy_device.c and phy.c which makes it hard to split and make that part
completely standalone for now.

The only part that is truly valuable to non-Ethernet PHY devices is the
MDIO bus/device registration part, which is available in my patch with
CONFIG_MDIO_DEVICE, and which probably should not depend on
NETDEVICES, so the other part of your patch makes sense too here.

Thanks!


> 
> In the same step, we can also split out the MDIO driver configuration from
> phylib. This is based on an older experimental patch I had, but it still
> requires some code changes in phylib itself to let users actually rely on
> MDIO without all of PHYLIB.
> 
> Signed-off-by: Arnd Bergmann 
> 
> diff --git a/arch/arm/mach-orion5x/Kconfig b/arch/arm/mach-orion5x/Kconfig
> index 468b8cb7fd5f..e1126e1aa3d2 100644
> --- a/arch/arm/mach-orion5x/Kconfig
> +++ b/arch/arm/mach-orion5x/Kconfig
> @@ -4,6 +4,7 @@ menuconfig ARCH_ORION5X
>   select CPU_FEROCEON
>   select GENERIC_CLOCKEVENTS
>   select GPIOLIB
> + select MDIO_BOARDINFO
>   select MVEBU_MBUS
>   select PCI
>   select PLAT_ORION_LEGACY
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index a993cbeb9e0c..9eb15b7518bd 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -378,8 +378,6 @@ config NET_SB1000
>  
> If you don't have this card, of course say N.
>  
> -source "drivers/net/phy/Kconfig"
> -
>  source "drivers/net/plip/Kconfig"
>  
>  source "drivers/net/ppp/Kconfig"
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index

Re: [PATCH net-next v5 00/11] Improve BPF selftests and use the library (net-next tree)

2017-02-10 Thread David Miller

From: Mickaël Salaün 
Date: Fri, 10 Feb 2017 00:21:34 +0100

> This series brings some fixes to selftests, add the ability to test
> unprivileged BPF programs as root and replace bpf_sys.h with calls to the BPF
> library.
> 
> This is intended for the net-next tree and apply on c0e4dadb3494 ("net: dsa:
> mv88e6xxx: Move forward declaration to where it is needed").

Series applied, thank you.

[PATCH 1/2] net: fs_enet: Fix an error handling path

2017-02-10 Thread Christophe JAILLET

'of_node_put(fpi->phy_node)' should also be called if we branch to
'out_deregister_fixed_link' error handling path.

Signed-off-by: Christophe JAILLET 
---
 drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
index 54e3ce9bd94c..5c6426756d11 100644
--- a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
+++ b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
@@ -1045,10 +1045,10 @@ static int fs_enet_probe(struct platform_device *ofdev)
 out_free_dev:
free_netdev(ndev);
 out_put:
-   of_node_put(fpi->phy_node);
if (fpi->clk_per)
clk_disable_unprepare(fpi->clk_per);
 out_deregister_fixed_link:
+   of_node_put(fpi->phy_node);
if (of_phy_is_fixed_link(ofdev->dev.of_node))
of_phy_deregister_fixed_link(ofdev->dev.of_node);
 out_free_fpi:
-- 
2.9.3
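
The label reordering in this patch follows the usual goto-unwind rule: a release belongs under the earliest label that any path holding the resource can jump to. Below is a small standalone sketch of that rule. struct toy_state, toy_probe() and the acquisition order are assumptions made for illustration and are not taken from fs_enet_probe().

```c
/* Tracks which toy resources are currently held. */
struct toy_state {
	int link_registered;
	int node_ref_held;
	int clk_enabled;
};

/*
 * fail_at: 0 = success, 1 = clock enable fails, 2 = a later step fails.
 * The node reference is taken before the clock, so it has to be dropped
 * under out_deregister_fixed_link, which both error paths pass through.
 */
static int toy_probe(struct toy_state *s, int fail_at)
{
	int ret;

	s->link_registered = 1;		/* step 1: register the fixed link */
	s->node_ref_held = 1;		/* step 2: take the node reference */

	if (fail_at == 1) {
		ret = -1;
		goto out_deregister_fixed_link;
	}
	s->clk_enabled = 1;		/* step 3: enable the clock */

	if (fail_at == 2) {
		ret = -2;
		goto out_put;
	}
	return 0;

out_put:
	s->clk_enabled = 0;
out_deregister_fixed_link:
	s->node_ref_held = 0;		/* dropped on every error path */
	s->link_registered = 0;
	return ret;
}
```

If the node reference were dropped under out_put instead, the fail_at == 1 path would return with node_ref_held still set, which is the kind of leak the patch description points at.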

[PATCH 2/2] net: fs_enet: Simplify code

2017-02-10 Thread Christophe JAILLET

There is no need to use an intermediate variable to handle an error code
in this case.

Signed-off-by: Christophe JAILLET 
---
I think that the remaining use of 'err' a few lines above could also be
dropped. However, it could change the return value (i.e. propagation of the
error returned by 'of_phy_register_fixed_link' instead of -ENODEV) and I'm
unsure it would be correct. So I leave it as-is.
---
 drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
index 5c6426756d11..753259091b22 100644
--- a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
+++ b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
@@ -964,11 +964,10 @@ static int fs_enet_probe(struct platform_device *ofdev)
 */
clk = devm_clk_get(&ofdev->dev, "per");
if (!IS_ERR(clk)) {
-   err = clk_prepare_enable(clk);
-   if (err) {
-   ret = err;
+   ret = clk_prepare_enable(clk);
+   if (ret)
goto out_deregister_fixed_link;
-   }
+
fpi->clk_per = clk;
}
 
-- 
2.9.3

Re: [PATCH] [net-next] ARM: orion: fix PHYLIB dependency

2017-02-10 Thread Arnd Bergmann

On Friday, February 10, 2017 9:42:21 AM CET Florian Fainelli wrote:
> On 02/10/2017 12:20 AM, Arnd Bergmann wrote:
> > On Thu, Feb 9, 2017 at 7:22 PM, Florian Fainelli wrote:
> >> On 02/09/2017 07:08 AM, Arnd Bergmann wrote:
> >> I disabled CONFIG_NETDEVICES to force CONFIG_PHY not to be set here, and
> >> I was not able to reproduce this, what am I missing?
> > 
> > In the ARMv5 allmodconfig build, this fails because CONFIG_PHY=m, and
> > we can't call into it. You could use IS_BUILTIN instead of IS_ENABLED in
> > the header as a oneline workaround, but I think that would be more confusing
> > to real users that try to use CONFIG_PHY=m without realizing why they lose
> > access to their switch.
> 
> I see, this patch should also help fixing this:
> 
> http://patchwork.ozlabs.org/patch/726381/

I think you still have the same problem, as you can still have the
boardinfo registration in a loadable module.

I have come up with a patch too now and done some randconfig testing
on it (it took me several tries as well), please see below. It does
some of the same things as yours and some others.

The main trick is to have a separate 'MDIO_BOARDINFO' Kconfig symbol
that can be selected regardless of all the other symbols, and that
will lead to the registration being either built-in when it's needed
or not built at all when either no board calls it, or PHYLIB is
disabled.

From f35e89cacfabdf7b822772013389132605941def Mon Sep 17 00:00:00 2001
From: Arnd Bergmann 
Date: Wed, 27 Apr 2016 11:51:18 +0200
Subject: [PATCH] [RFC] move ethernet PHY config into drivers/phy/Kconfig

Calling mdiobus_register_board_info from builtin code with CONFIG_PHYLIB=m
currently results in a link error:

arch/arm/plat-orion/common.o: In function `orion_ge00_switch_init':
common.c:(.init.text+0x6a4): undefined reference to `mdiobus_register_board_info'

As the long-term strategy is to separate mdio from phylib, and to get generic-phy
and (networking-only) phylib closer together, this performs a first step in that
direction: The Kconfig file for phylib gets logically pulled under the PHY
driver configuration and becomes independent from networking. This lets us
select the new CONFIG_MDIO_BOARDINFO from platforms that need it, and provide
the functions exactly when we need them.

In the same step, we can also split out the MDIO driver configuration from
phylib. This is based on an older experimental patch I had, but it still
requires some code changes in phylib itself to let users actually rely on
MDIO without all of PHYLIB.

Signed-off-by: Arnd Bergmann 

diff --git a/arch/arm/mach-orion5x/Kconfig b/arch/arm/mach-orion5x/Kconfig
index 468b8cb7fd5f..e1126e1aa3d2 100644
--- a/arch/arm/mach-orion5x/Kconfig
+++ b/arch/arm/mach-orion5x/Kconfig
@@ -4,6 +4,7 @@ menuconfig ARCH_ORION5X
select CPU_FEROCEON
select GENERIC_CLOCKEVENTS
select GPIOLIB
+   select MDIO_BOARDINFO
select MVEBU_MBUS
select PCI
select PLAT_ORION_LEGACY
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index a993cbeb9e0c..9eb15b7518bd 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -378,8 +378,6 @@ config NET_SB1000
 
  If you don't have this card, of course say N.
 
-source "drivers/net/phy/Kconfig"
-
 source "drivers/net/plip/Kconfig"
 
 source "drivers/net/ppp/Kconfig"
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 7336cbd3ef5d..3ab87e9f9442 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -17,7 +17,7 @@ obj-$(CONFIG_MII) += mii.o
 obj-$(CONFIG_MDIO) += mdio.o
 obj-$(CONFIG_NET) += Space.o loopback.o
 obj-$(CONFIG_NETCONSOLE) += netconsole.o
-obj-$(CONFIG_PHYLIB) += phy/
+obj-y += phy/
 obj-$(CONFIG_RIONET) += rionet.o
 obj-$(CONFIG_NET_TEAM) += team/
 obj-$(CONFIG_TUN) += tun.o
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index 8c08f9deef92..9c4652ae2750 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -11,9 +11,6 @@ menuconfig ETHERNET
 
 if ETHERNET
 
-config MDIO
-   tristate
-
 config SUNGEM_PHY
tristate
 
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 8dbd59baa34d..37f5552cc5b3 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -3,8 +3,9 @@
 #
 
 menuconfig PHYLIB
-   tristate "PHY Device support and infrastructure"
+   tristate "Ethernet PHY Device support and infrastructure"
depends on NETDEVICES
+   select MDIO
help
  Ethernet controllers are usually attached to PHY
  devices.  This option provides infrastructure for
@@ -248,6 +249,16 @@ config FIXED_PHY
  PHYs that are not connected to the real MDIO bus.
 
  Currently tested with mpc866ads and mpc8349e-mitx.
+endif # PHYLIB
+
+config MDIO
+   tristate
+   help
+ The MDIO bus is typically used by ethernet PHYs, but can also be
+ used by other PHY drivers.
+

Re: [patch net-next] spectrum: flower: Treat ETH_P_ALL as a special case and translate for HW

2017-02-10 Thread David Miller

From: Jiri Pirko 
Date: Thu,  9 Feb 2017 14:42:03 +0100

> From: Jiri Pirko 
> 
> HW does not understand ETH_P_ALL. So treat this special case differently
> and translate to 0/0 key/mask. That will allow HW to match all ethertypes.
> 
> Fixes: 7aa0f5aa9030 ("mlxsw: spectrum: Implement TC flower offload")
> Signed-off-by: Jiri Pirko 
> Reviewed-by: Ido Schimmel 

Applied.

Re: [patch net-next 0/4] devlink: small cleanup around eswitch [sg]et

2017-02-10 Thread David Miller

From: Jiri Pirko 
Date: Thu,  9 Feb 2017 15:54:32 +0100

> Contains small devlink cleanup around eswitch get/set commands.

Series applied, thanks.

Re: [PATCH] net: ethernet: ti: netcp_core: return netdev_tx_t in xmit

2017-02-10 Thread David Miller

From: Ivan Khoronzhuk 
Date: Thu,  9 Feb 2017 16:24:14 +0200

> @@ -1300,7 +1301,7 @@ static int netcp_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>   dev_warn(netcp->ndev_dev, "padding failed (%d), packet dropped\n",
>ret);
>   tx_stats->tx_dropped++;
> - return ret;
> + return NETDEV_TX_BUSY;
>   }
>   skb->len = NETCP_MIN_PACKET_SIZE;
>   }
> @@ -1329,7 +1330,7 @@ static int netcp_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>   if (desc)
>   netcp_free_tx_desc_chain(netcp, desc, sizeof(*desc));
>   dev_kfree_skb(skb);
> - return ret;
> + return NETDEV_TX_BUSY;
>  }

I really think these should be returning NET_XMIT_DROP.
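
A short standalone sketch of the concern, under stated assumptions: both quoted error paths have already given up the skb (the second one calls dev_kfree_skb()), while a "busy" return tells the caller to keep the skb and retry it. The toy_* names below are invented; the constants mirror what I understand NETDEV_TX_OK, NETDEV_TX_BUSY and NET_XMIT_DROP to be in the kernel headers, and should be checked against include/linux/netdevice.h rather than taken from here.

```c
#include <stdlib.h>

/* Local stand-ins for the kernel return codes (assumed values). */
enum toy_tx {
	TOY_TX_OK   = 0x00,	/* packet consumed by the driver */
	TOY_TX_BUSY = 0x10,	/* caller keeps the packet and retries later */
};
#define TOY_XMIT_DROP 0x01	/* packet was dropped and already freed */

struct toy_skb {
	unsigned int len;
};

struct toy_stats {
	unsigned long tx_dropped;
};

/*
 * The function owns skb once called. On the error path it frees the
 * buffer, so it must not report TOY_TX_BUSY: the caller would then
 * retry with a pointer that has already been freed.
 */
static int toy_start_xmit(struct toy_skb *skb, struct toy_stats *stats,
			  int hw_error)
{
	if (hw_error) {
		free(skb);
		stats->tx_dropped++;
		return TOY_XMIT_DROP;
	}

	free(skb);	/* stands in for handing the buffer to hardware */
	return TOY_TX_OK;
}
```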

Re: [PATCH net-next v2 00/12] net: dsa: remove unnecessary phy.h include

2017-02-10 Thread Florian Fainelli

On 02/10/2017 10:51 AM, David Miller wrote:
> From: Kalle Valo 
> Date: Thu, 09 Feb 2017 16:10:06 +0200
> 
>> Florian Fainelli  writes:
>>
>>>>> If not, for something like this it's a must:
>>>>>
>>>>> drivers/net/wireless/ath/wil6210/cfg80211.c:24:30: error: expected ‘)’ before ‘bool’
>>>>>  module_param(disable_ap_sme, bool, 0444);
>>>>>   ^
>>>>> drivers/net/wireless/ath/wil6210/cfg80211.c:25:34: error: expected ‘)’ before string constant
>>>>>  MODULE_PARM_DESC(disable_ap_sme, " let user space handle AP mode SME");
>>>>>   ^
>>>>> Looks like that file needs linux/module.h included.
>>>>
>>>> Johannes already fixed a similar (or same) problem in my tree:
>>>>
>>>> wil6210: include moduleparam.h
>>>>
>>>> https://git.kernel.org/cgit/linux/kernel/git/kvalo/wireless-drivers-next.git/commit/?id=949c2d0096753d518ef6e0bd8418c8086747196b
>>>>
>>>> I'm planning to send you a pull request tomorrow which contains that
>>>> one.
>>>
>>> Thanks Kalle!
>>>
>>> David, can you hold on this series until Kalle's pull request gets
>>> submitted? Past this error, allmodconfig builds fine with this patch
>>> series (just tested). Thanks!
>>
>> Just submitted the pull request:
>>
>> https://patchwork.ozlabs.org/patch/726133/
> 
> I've retried this patch series, and will push it out assuming the build
> completes properly.

I see it merged in net-next/master. This is going to save a
lot of cycles in the future, thanks David!
-- 
Florian

Re: [PATCH net-next 0/4] net/sched: Use TC skip flags to reflect HW offload status

2017-02-10 Thread David Miller

From: Or Gerlitz 
Date: Thu,  9 Feb 2017 16:18:04 +0200

> Currently there is no way of querying whether a filter is
> offloaded to HW or not when using both policy (no flag).
> 
> Reuse the skip flags to show the insertion status by setting
> the skip_hw flag in case the filter wasn't offloaded.
> 
> The bpf patch is compile tested only, Daniel/Jakub, will 
> appreciate your review/ack.
 ...

I'm leaning towards suggesting that you use new flags, this way it
will be unambiguous whether we are running an old kernel.

If you just use the skip flag, it's impossible to tell the difference.

Re: [PATCH 1/3] ath10k: remove ath10k_vif_to_arvif()

2017-02-10 Thread Adrian Chadd

On 9 February 2017 at 23:37, Joe Perches  wrote:
> On Thu, 2017-02-09 at 23:14 -0800, Adrian Chadd wrote:
>
>> If there
>> were accessors for the skb data / len fields (like we do for mbufs)
>> then porting the code would've involved about 5,000 less changed
>> lines.
>
> What generic mechanisms would you suggest to make
> porting easier between bsd and linux and what in
> your opinion are the best naming schemes to make
> these functions easiest to read and implement
> without resorting to excessive identifier lengths?
>
> If you have some, please provide examples.

(Why not, it's pre-coffee o'clock.)

The biggest barriers are direct struct accessors. Most of the time the
kernels have similar enough semantics that I can just implement a
linux shim layer (like we do for graphics layer porting from Linux.)
Eg, having skb_data(skb) (and skb_data_const(skb)) + skb_len(skb)
instead of skb->data and skb->len would remove a lot of churn. Having
say, a vif_to_drvpriv() method analogous to ath10k_vif_to_arvif()
would also simplify the changes. For the rest of it we can just use a
linux-like shim layer to get everything else working pretty darn well.
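
As a rough sketch of what such accessors could look like: the names skb_data(), skb_data_const(), skb_len() and vif_to_drvpriv() come from the paragraph above, but the struct layouts here are toy stand-ins and not the real sk_buff or ieee80211_vif definitions.

```c
#include <stddef.h>

/* Toy stand-ins; a port would map these onto mbufs or its own vif type. */
struct toy_skb {
	unsigned char *data;
	unsigned int len;
};

struct toy_vif {
	int type;
	void *drv_priv;
};

static inline unsigned char *skb_data(struct toy_skb *skb)
{
	return skb->data;
}

static inline const unsigned char *skb_data_const(const struct toy_skb *skb)
{
	return skb->data;
}

static inline unsigned int skb_len(const struct toy_skb *skb)
{
	return skb->len;
}

static inline void *vif_to_drvpriv(struct toy_vif *vif)
{
	return vif->drv_priv;
}

/*
 * Driver-side code written only against the accessors: it would stay
 * unchanged if the structs above were swapped for another OS's types.
 */
static unsigned int toy_sum_payload(const struct toy_skb *skb)
{
	const unsigned char *p = skb_data_const(skb);
	unsigned int sum = 0;
	unsigned int i;

	for (i = 0; i < skb_len(skb); i++)
		sum += p[i];
	return sum;
}
```

Only the four small accessors would need a per-OS implementation; code like toy_sum_payload() never touches a struct field directly.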

But the biggest thing that helps is a quasi HAL code structure. I know
HAL is a dirty word, so think of it more as "how would one separate
out the OS interface layer from the rest of the driver." A good
example in ath10k is the difference between say, wmi.c, the pci /
copyengine code and mac.c.

* the pci / copyengine code is almost 100% compilable on other
platforms, save the differences in little things (malloc, free, KVA
versus physical memory allocation, bounce buffering, sync'ing, etc.) A
sufficiently refactored driver like ath10k where almost all of that
stuff happens in the pci/copyengine code made porting that much less
painful.

* the wmi code is almost exclusively portable - besides the
malloc/free, etc mechanical changes which honestly can be stubbed, it
uses the lower layers (pci/ce, hif, htc, etc) for doing actual work,
and the upper layer uses a well-defined API + callback mechanism for
getting work done. Porting that was mechanical but reasonably easy.

* however, the mac.c code contains both code which sends commands to
the firmware (vif create/destroy, pdev commands, station
associate/update/destroy, crypto key handling, peer rate control, etc)
/and/ very linux mac80211/cfg80211 specific bits. If mac.c were split
into mac-mac80211.c (which was /just/ mac80211, cfg80211, etc bits)
and mac-utils.c (the bits that actually /sent/ the commands,
responses, all the support code, etc) then my port would just
implement mac-net80211.c as a completely new file, and the rest would
just be modified as required by porting.

A lot of the ath10k headers too mix linux specific things (eg struct
device, dependencies) with hardware specific definitions for say,
register accesses. I split out the register and firmware command /
structures into separate header files that didn't mingle OS and driver
specific structures to make it much easier to reuse that code. I find
that good driver writing hygiene in any case.

I'm not expecting an intel ethernet driver style HAL separation,
although that'd certainly make life easier in porting over drivers.
But just having inlined accessor functions for most things and some
stricter driver structure for OS touch points (dma setup/teardown,
bounce buffer stuff, mac80211/cfg80211, ethtool, etc APIs) would make
porting and testing things a lot easier. :-)

2c, and I'll do the porting/reimplementing work anyway regardless of
how much coffee it requires,

-adrian

Re: pull-request: mac80211-next 2017-02-09

2017-02-10 Thread David Miller

From: Johannes Berg 
Date: Thu,  9 Feb 2017 15:27:33 +0100

> Here are some more (final) updates for -next. Nothing here is
> really interesting, mostly cleanups and small fixes.
> 
> Please pull and let me know if there's any problem.

Pulled, thank you.

Re: [PATCH 0/2] net: ethernet: ti: cpsw: fix susp/resume

2017-02-10 Thread Grygorii Strashko

On 02/09/2017 07:45 PM, David Miller wrote:
> From: Ivan Khoronzhuk 
> Date: Fri, 10 Feb 2017 00:54:24 +0200
>
>> On Thu, Feb 09, 2017 at 05:21:26PM -0500, David Miller wrote:
>>> From: Ivan Khoronzhuk 
>>> Date: Thu,  9 Feb 2017 02:07:34 +0200
>>>
>>>> These two patches fix suspend/resume chain.
>>>
>>> Patch 2 doesn't apply cleanly to the 'net' tree, please
>>> respin this series.
>>
>> Strange, I've just checked it on net-next/master, it was applied w/o any
>> warnings.
>
> It makes no sense to test "net-next" when I am telling you that it is
> the "net" tree it doesn't apply to.
>
> This is a bug fix, so it should be targeting the "net" tree.

Looks like the first fix is for net, but the second one is for net-next.
I do not see
03fd01ad0eead23eb79294b6fb4d71dcac493855
"net: ethernet: ti: cpsw: don't duplicate ndev_running"
in net.

--
regards,
-grygorii

Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Sowmini Varadhan

On (02/10/17 10:00), Cong Wang wrote:
> My understanding about the race here is packet_release() doesn't
> wait for flying packets correctly, which leads to a flying packet still
> refers to the struct sock which is being released.
> 
> This could happen because struct packet_fanout is refcn'ted, it is
   :
> At least I believe this explains the crash Dmitry reported.

hmm, the proof of the pudding is in the eating- would be good to 
be able to reliably reproduce this somewhere (thus proving that
root-cause analysis is rock-solid), maybe by introducing artificial
delays to slow down paths..

I'm travelling at the moment but may be able to give this (try
to reproduce it reliably) next week.

--Sowmini

Re: [PATCH] NET: mkiss: Fix panic

2017-02-10 Thread David Miller

From: Ralf Baechle 
Date: Thu, 9 Feb 2017 14:12:11 +0100

> If a USB-to-serial adapter is unplugged, the driver re-initializes, with
> dev->hard_header_len and dev->addr_len set to zero, instead of the correct
> values.  If then a packet is sent through the half-dead interface, the
> kernel will panic due to running out of headroom in the skb when pushing
> for the AX.25 headers resulting in this panic:
> 
> [] (skb_panic) from [] (skb_push+0x4c/0x50)
> [] (skb_push) from [] (ax25_hard_header+0x34/0xf4 [ax25])
> [] (ax25_hard_header [ax25]) from [] (ax_header+0x38/0x40 
> [mkiss])
> [] (ax_header [mkiss]) from [] 
> (neigh_compat_output+0x8c/0xd8)
> [] (neigh_compat_output) from [] 
> (ip_finish_output+0x2a0/0x914)
> [] (ip_finish_output) from [] (ip_output+0xd8/0xf0)
> [] (ip_output) from [] (ip_local_out_sk+0x44/0x48)
> 
> This patch makes mkiss behave like the 6pack driver. 6pack does not
> panic.  In 6pack.c sp_setup() (same function name here) the values for
> dev->hard_header_len and dev->addr_len are set to the same values as in
> my mkiss patch.
> 
> [r...@linux-mips.org: Massages original submission to conform to the usual
> standards for patch submissions.]
> 
> Signed-off-by: Thomas Osterried 
> Signed-off-by: Ralf Baechle 

Applied, thank you.

Re: [PATCH net-next v2 00/12] net: dsa: remove unnecessary phy.h include

2017-02-10 Thread David Miller

From: Kalle Valo 
Date: Thu, 09 Feb 2017 16:10:06 +0200

> Florian Fainelli  writes:
> 
>>>> If not, for something like this it's a must:
>>>>
>>>> drivers/net/wireless/ath/wil6210/cfg80211.c:24:30: error: expected ‘)’ before ‘bool’
>>>>  module_param(disable_ap_sme, bool, 0444);
>>>>   ^
>>>> drivers/net/wireless/ath/wil6210/cfg80211.c:25:34: error: expected ‘)’ before string constant
>>>>  MODULE_PARM_DESC(disable_ap_sme, " let user space handle AP mode SME");
>>>>   ^
>>>> Looks like that file needs linux/module.h included.
>>> 
>>> Johannes already fixed a similar (or same) problem in my tree:
>>> 
>>> wil6210: include moduleparam.h
>>> 
>>> https://git.kernel.org/cgit/linux/kernel/git/kvalo/wireless-drivers-next.git/commit/?id=949c2d0096753d518ef6e0bd8418c8086747196b
>>> 
>>> I'm planning to send you a pull request tomorrow which contains that
>>> one.
>>
>> Thanks Kalle!
>>
>> David, can you hold on this series until Kalle's pull request gets
>> submitted? Past this error, allmodconfig builds fine with this patch
>> series (just tested). Thanks!
> 
> Just submitted the pull request:
> 
> https://patchwork.ozlabs.org/patch/726133/

I've retried this patch series, and will push it out assuming the build
completes properly.

[PATCH net-next] sfc: fix swapped arguments to efx_ef10_handle_rx_event_errors

2017-02-10 Thread Edward Cree

Fixes: a0ee35414837 ("sfc: process RX event inner checksum flags")
Reported-by: Colin Ian King 
Signed-off-by: Edward Cree 
---
 drivers/net/ethernet/sfc/ef10.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 6bba2d2..761ccc6 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -3356,8 +3356,9 @@ static int efx_ef10_handle_rx_event(struct efx_channel 
*channel,
EFX_AND_QWORD(errors, *event, errors);
if (unlikely(!EFX_QWORD_IS_ZERO(errors))) {
flags |= efx_ef10_handle_rx_event_errors(channel, n_packets,
+rx_encap_hdr,
 rx_l3_class, 
rx_l4_class,
-rx_encap_hdr, event);
+event);
} else {
bool tcpudp = rx_l4_class == ESE_DZ_L4_CLASS_TCP ||
  rx_l4_class == ESE_DZ_L4_CLASS_UDP;

Re: Extending socket timestamping API for NTP

2017-02-10 Thread Denny Page


> On Feb 09, 2017, at 16:33, Denny Page  wrote:
> 
> 
>> On Feb 09, 2017, at 11:42, sdncurious  wrote:
>> 
>> I am still at a loss as to why transpose is required in case of HW
>> time stamping. If STF is used for both Tx and Rx time stamping the
>> timing is absolutely correct.
> 
> Perhaps this will help. The specific transposition is:
> 
>  transposed_timestamp_ns = timestamp_ns + (frame_len_bits * 10) / 
> (interface_speed * 100)
> 
> The transposition is applied to received timestamps only.


Before anyone else asks, yes, I know this can be reduced. :)
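
A minimal sketch of the transposition in reduced form. The units are an
assumption made for illustration, not something stated in the thread:
timestamp in nanoseconds, frame length in bits, link speed in Mb/s. The
function name is hypothetical. The added term is simply the time the frame
occupies the wire at that link speed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shift a hardware RX timestamp by the time the frame spends on the wire.
 * Assumed units (not taken from the thread): timestamp in nanoseconds,
 * frame length in bits, link speed in Mb/s.
 *
 *   bits / (speed_mbps * 10^6 bit/s) seconds
 *     = bits * 10^9 / (speed_mbps * 10^6) ns
 *     = bits * 1000 / speed_mbps ns
 */
static uint64_t transpose_rx_timestamp_ns(uint64_t timestamp_ns,
					  uint64_t frame_len_bits,
					  uint64_t speed_mbps)
{
	assert(speed_mbps > 0);	/* a link with no speed has no frame time */
	return timestamp_ns + (frame_len_bits * 1000) / speed_mbps;
}
```

With these units, a 1500-byte frame (12000 bits) on a 1000 Mb/s link moves
the receive timestamp by 12000 ns.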

Re: pull-request: wireless-drivers-next 2017-02-09

2017-02-10 Thread David Miller

From: Kalle Valo 
Date: Thu, 09 Feb 2017 16:08:25 +0200

> another pull request for net-next. If the merge window starts on Sunday
> this would be the last pull request from me with new features. But if it
> doesn't open, I'm planning to send one more next week.
> 
> Please let me know if there any problems.

Pulled, thank you.

Re: [PATCH net] at803x: insure minimum delay for SGMII link AN completion ckeck

2017-02-10 Thread Florian Fainelli

On 02/10/2017 08:42 AM, Claudiu Manoil wrote:
> Commit: f62265b "at803x: double check SGMII side autoneg"
> introduced a regression for the p1010rdb board which has
> two of the ethernet controllers (eTSEC) connected through
> SGMII links to external Atheros SGMII AR8033 PHYs.
> The issue consists in a dead link for these ports, and is
> 100% reproducible on kernel 4.9 (and later):
> 
> root@p1010rdb-pb:~# ifconfig eth2 172.16.1.1
> [  203.274263] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready
> root@p1010rdb-pb:~# [  206.408255] 803x_aneg_done: SGMII link is not ok
> 
> root@p1010rdb-pb:~# ethtool eth2
> Settings for eth2:
> Supported ports: [ MII ]
> Supported link modes:   10baseT/Half 10baseT/Full
> 100baseT/Half 100baseT/Full
> 1000baseT/Full
> Supported pause frame use: Symmetric Receive-only
> Supports auto-negotiation: Yes
> Advertised link modes:  10baseT/Half 10baseT/Full
> 100baseT/Half 100baseT/Full
> 1000baseT/Full
> Advertised pause frame use: No
> Advertised auto-negotiation: Yes
> Link partner advertised link modes:  10baseT/Half 10baseT/Full
>  100baseT/Half 100baseT/Full
>  1000baseT/Half 1000baseT/Full
> Link partner advertised pause frame use: Symmetric Receive-only
> Link partner advertised auto-negotiation: Yes
> Speed: 1000Mb/s
> Duplex: Full
> Port: MII
> PHYAD: 2
> Transceiver: internal
> Auto-negotiation: on
> Supports Wake-on: g
> Wake-on: d
> Current message level: 0x003f (63)
>drv probe link timer ifdown ifup
> Link detected: no
> 
> Insuring up to 100 usecs for the SGMII link side AN to complete
> proves to be enough to have a working SGMII link, for this board.
> The need for a delay for the SGMII link side may be explained by
> the fact that there are two levels of auto-negotiation (AN) for a
> SGMII link.  First the PHY autonegotiates the link parameters w/
> its link partner over the copper link. In the second stage, the
> AN results are then passed to the eTSEC MAC over the SGMII link
> using the Clause 37 auto-negotiation functionality.  While the
> aneg_done() hook is called by the phylib state machine to check
> for the completion of the 1st stage AN of the external PHY,
> there's no mechanism to insure proper AN completion of the internal
> SGMII link (which is actually handled on the eTSEC side by a
> "internal PHY", called TBI).
> 
> Fixes: f62265b "at803x: double check SGMII side autoneg"
> 
> Signed-off-by: Claudiu Manoil 
> ---
>  drivers/net/phy/at803x.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
> index a52b560..55fa7c4 100644
> --- a/drivers/net/phy/at803x.c
> +++ b/drivers/net/phy/at803x.c
> @@ -366,6 +366,7 @@ static void at803x_link_change_notify(struct phy_device 
> *phydev)
>  static int at803x_aneg_done(struct phy_device *phydev)
>  {
>   int ccr;
> + int timeout = 100; /* usecs */

unsigned int, and use reverse christmas tree declarations, order from
longest variable to shortest.

-- 
Florian
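
For readers following the review comment, here is a rough user-space sketch
of a bounded poll for SGMII-side AN completion, with the locals declared as
unsigned int in reverse christmas tree order (longest line first). This is
not the code from the patch: the callback types, the fake link, and the
10 usec polling step are assumptions made so the loop can run outside the
kernel. Only the 100 usec bound comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hooks so the loop can run outside the kernel. */
typedef bool (*an_complete_fn)(void *ctx);
typedef void (*delay_us_fn)(void *ctx, unsigned int usecs);

/* Poll until SGMII-side AN completes or about 100 usecs have elapsed. */
static bool wait_sgmii_an_done(an_complete_fn an_complete,
			       delay_us_fn delay_us, void *ctx)
{
	unsigned int timeout = 100;	/* usecs, the bound from the patch */
	unsigned int step = 10;		/* polling interval, an assumption */

	assert(an_complete != NULL && delay_us != NULL);

	while (!an_complete(ctx)) {
		if (timeout < step)
			return false;	/* budget used up, link still not ok */
		delay_us(ctx, step);
		timeout -= step;
	}
	return true;
}

/* Fake link used only to exercise the loop. */
struct fake_link {
	unsigned int polls_left;	/* failed polls before AN completes */
	unsigned int waited_us;		/* total simulated delay */
};

static bool fake_an_complete(void *ctx)
{
	struct fake_link *link = ctx;

	if (link->polls_left == 0)
		return true;
	link->polls_left--;
	return false;
}

static void fake_delay_us(void *ctx, unsigned int usecs)
{
	struct fake_link *link = ctx;

	link->waited_us += usecs;
}
```

The status is checked once before any delay, so a link that is already up
costs nothing, and a dead link costs at most the full 100 usecs.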

Re: [PATCH net-next V3 0/3] net/sched: act_pedit: Use offset relative to conventional network headers

2017-02-10 Thread David Miller

From: Amir Vadai 
Date: Tue,  7 Feb 2017 09:56:05 +0200

> Some FW/HW parser APIs are such that they need to get the specific header
> type (e.g. IPV4 or IPV6, TCP or UDP) and not only the networking level
> (e.g. network or transport).
> 
> Enhancing the UAPI to allow for specifying that would allow the same flows
> to be set into both SW and HW.
> 
> This patchset also makes pedit more robust. Currently the field offset is
> specified relative to the IP header, with negative offsets used for
> MAC layer fields.
> 
> This series enables the user to set offset relative to the relevant header.
> 
> Usage example:
> $ tc filter add dev enp0s9 protocol ip parent : \
>flower \
>  ip_proto tcp \
> dst_port 80 \
>action \
>pedit munge ip ttl add 0xff \
>pedit munge tcp dport set 8080 \
>  pipe action mirred egress redirect dev veth0
> 
> Will forward traffic destined to tcp dport 80, while modifying the
> destination port to 8080, and decreasing the ttl by one.
> 
> I've uploaded a draft for the userspace [2] to make it easier to review and
> test the patchset.
> 
> [1] - http://patchwork.ozlabs.org/patch/700909/
> [2] - git: https://bitbucket.org/av42/iproute2.git
>   branch: pedit
> 
> Patchset was tested and applied on top of upstream commit bd092ad1463c ("Merge
> branch 'remove-__napi_complete_done'")

Series applied, thank you.

Re: [PATCH] xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend()

2017-02-10 Thread David Miller

From: Boris Ostrovsky 
Date: Thu, 9 Feb 2017 08:42:59 -0500

> Are you going to take this to your tree or would you rather it goes
> via Xen tree?

Ok, I just did.

> And the same question for
> 
> https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00625.html

As I stated in the thread, I applied this one.

> https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00754.html

Likewise.

In the future, if you use netdev patchwork URLs, two things will
happen.  You will see immediately in the discussion log and the patch
state whether I applied it or not.  And second, I will be able to
reference and do something with the patch that much more quickly
and easily.

Thank you.

Re: [PATCH] net: ethernet: ti: netcp_core: remove netif_trans_update

2017-02-10 Thread David Miller

From: Ivan Khoronzhuk 
Date: Thu,  9 Feb 2017 16:17:40 +0200

> No need to update jiffies in txq->trans_start twice and only for tx 0,
> it's supposed to be done in netdev_start_xmit() and per tx queue.
> 
> Signed-off-by: Ivan Khoronzhuk 
> ---
> Based on net-next/master

Applied, thanks.

Re: [PATCH V2 net] net: hns: Fix the device being used for dma mapping during TX

2017-02-10 Thread David Miller

From: Salil Mehta 
Date: Thu, 9 Feb 2017 11:46:15 +

> From: Kejian Yan 
> 
> This patch fixes the device being used to DMA map skb->data.
> Erroneous device assignment causes the crash when SMMU is enabled.
> This happens during TX since buffer gets DMA mapped with device
> corresponding to net_device and gets unmapped using the device
> related to DSAF.
> 
> Signed-off-by: Kejian Yan 
> Reviewed-by: Yisen Zhuang 
> Signed-off-by: Salil Mehta 

Applied, thank you.

Re: [patch net-next] spectrum: flower: Treat ETH_P_ALL as a special case and translate for HW

2017-02-10 Thread David Miller

From: Jiri Pirko 
Date: Thu,  9 Feb 2017 14:42:03 +0100

> From: Jiri Pirko 
> 
> HW does not understand ETH_P_ALL. So treat this special case differently
> and translate to 0/0 key/mask. That will allow HW to match all ethertypes.
> 
> Fixes: 7aa0f5aa9030 ("mlxsw: spectrum: Implement TC flower offload")
> Signed-off-by: Jiri Pirko 
> Reviewed-by: Ido Schimmel 

Applied, thanks.

Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Cong Wang

On Fri, Feb 10, 2017 at 10:02 AM, Eric Dumazet  wrote:
> On Fri, 2017-02-10 at 09:59 -0800, Eric Dumazet wrote:
>> On Fri, 2017-02-10 at 09:49 -0800, Cong Wang wrote:
>> > On Thu, Feb 9, 2017 at 7:23 PM, Eric Dumazet  
>> > wrote:
>> > > On Thu, 2017-02-09 at 19:19 -0800, Eric Dumazet wrote:
>> > >
>> > >> More likely the bug is in fanout_add(), with a buggy sequence in error
>> > >> case, and not correct locking.
>> > >>
>> > >> kfree(po->rollover);
>> > >> po->rollover = NULL;
>> > >>
>> > >> Two cpus entering fanout_add() (using the same af_packet socket,
>> > >> syzkaller courtesy...) might both see po->fanout being NULL.
>> > >>
>> > >> Then they grab the mutex.  Too late...
>> > >
>> > > Patch could be :
>> > >
>> >
>> > For me, clearly the data structure that use-after-free'd is struct sock
>> > rather than struct packet_rollover.
>>
>> Fine. But your patch makes absolutely no sense.
>
> At least, Anoob patch is making a step into the right direction ;)
>
> https://patchwork.ozlabs.org/patch/726532/
>

Yeah, but it still looks like a different one from the one Dmitry reported.

Re: [PATCH net-next v3 06/10] net: dsa: Migrate to device_find_class()

2017-02-10 Thread Florian Fainelli

On 02/10/2017 05:02 AM, Greg KH wrote:
> On Thu, Jan 19, 2017 at 04:51:55PM +, Russell King - ARM Linux wrote:
>> (This is mainly for Greg's benefit to help him understand the issue.)
>>
>> I think the diagram you gave initially made this confusing, as it
>> talks about a CPU(sic) producing the "RGMII" and "MII-MGMT".
>>
>> Let's instead show a better representation that hopefully helps Greg
>> understand networking. :)
>>
>>
>>   CPU
>> System <-B->  Ethernet controller <-P-> } PHY <---> network cable
>> } - - - - - - - or - - - - - - -
>>   MDIO bus ---M---> } Switch <-P-> PHYs <--> network
>>   `M^cables
>>
>> 'B' can be an on-SoC bus or something like PCI.
>>
>> 'P' are the high-speed connectivity between the ethernet controller and
>> PHY which carries the packet data.  It has no addressing, it's a point
>> to point link.  RGMII is just one wiring example, there are many
>> different interfaces there (SGMII, AUI, XAUI, XGMII to name a few.)
>>
>> 'M' are the MDIO bus, which is the bus by which ethernet PHYs and
>> switches can be identified and controlled.
>>
>> The MDIO bus has a bus_type, has host drivers which are sometimes
>> part of the ethernet controller, but can also be stand-alone devices
>> shared between multiple ethernet controllers.
>>
>> PHYs are a kind of MDIO device which are members of the MDIO bus
>> type.  Each PHY (and switch) has a numerical address, and identifying
>> numbers within its register set which identifies the manufacturer
>> and device type.  We have device_driver objects for these.
>>
>> Expanding the above diagram to make it (hopefully) even clearer,
>> we can have this classic setup:
>>
>>   CPU
>> System <-B-> Ethernet controller <-P-> PHY <---> network cable
>>  MDIO bus ---M--^
>>
>> Or, in the case of two DSA switches attached to an Ethernet controller:
>>
>>  ||
>> System <-B-> Ethernet controller <-P-> Switch <-P-> PHY1 <--> network cable
>>  MDIO bus +--M--->   1<-P-> PHY2 <--> network cable
>>   |  |...|
>>   |  |<-P-> PHYn <--> network cable
>>   |  |^...|  |
>>   |   |  `---M---'
>>   |   P
>>   |   |
>>   |  |v~~~|
>>   `--> Switch <-P-> PHY1 <--> network cable
>>  |   2...|
>>  |<-P-> PHYn <--> network cable
>>  ||  |
>>  `---M---'
>>
>> The problem that the DSA guys are trying to deal with is how to
>> represent the link between the DSA switches (which are devices
>> sitting off their controlling bus - the MDIO bus) and the ethernet
>> controller associated with that collection of devices, be it a
>> switch or PHY.
> 
> Why do they have to represent that link?  This is a driver that somehow
> binds the two together in some sort of "control plane"?

We have to represent that link because the CPU/host/management Ethernet
MAC is physically connected to the CPU/management port of the switch. It
does indeed participate in establishing the control plane.

The basic idea of DSA is that the switch inserts vendor tags to indicate
why the packet is sent towards the CPU in the first place: flooding,
management, copy etc along with information as to which
originating/destination port(s) this packet comes/goes from/to. On top
of that, we demultiplex that tag to deliver normal Ethernet frames to
per-port network devices (virtual network devices).

If we did leave the switch in an unmanaged mode and not logically
attached to an Ethernet MAC for management, we'd lose all that
information (we could use per-port VLANs to re-create it, but it would
be inferior to what a switch with proprietary tags can do)

Code in net/dsa/dsa2.c that binds the two (switch and Ethernet MAC)
together is not strictly a driver, it just is resident in memory and
waits for dsa_register_switch() to be called until it tries to do this
binding.
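
As a rough illustration of the tag demultiplexing described above, the
sketch below strips a made-up 2-byte tag from the front of a frame and
accounts the frame to the per-port structure named in the tag. Everything
here is hypothetical: real switch tags are vendor specific and differ in
size, position and layout, and this is not code from net/dsa.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_TAG_LEN	2	/* hypothetical tag prepended to the frame */
#define TOY_NUM_PORTS	8	/* matches the 3-bit port field below */

/* Stand-in for a per-port virtual network device. */
struct toy_port {
	size_t rx_frames;
	size_t rx_bytes;
};

/*
 * Strip the toy tag and deliver the frame to the port it names.
 * Returns the source port index, or -1 if the buffer is too short.
 */
static int toy_tag_rcv(struct toy_port *ports, uint8_t *buf, size_t *len)
{
	unsigned int port;

	assert(ports != NULL && buf != NULL && len != NULL);

	if (*len < TOY_TAG_LEN)
		return -1;

	/* Hypothetical layout: low 3 bits of byte 1 carry the source port. */
	port = buf[1] & 0x07;

	/* Remove the tag so the rest looks like a normal Ethernet frame. */
	memmove(buf, buf + TOY_TAG_LEN, *len - TOY_TAG_LEN);
	*len -= TOY_TAG_LEN;

	ports[port].rx_frames++;
	ports[port].rx_bytes += *len;
	return (int)port;
}
```

The point of the sketch is only that the per-port information travels in
band with each frame, which is why the switch has to be logically attached
to the management Ethernet MAC.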

> 
>> Merely changing the parent/child relationships to try and solve
>> one issue just creates exactly the same problem elsewhere.
> 
> Fair enough.
> 
>> So, I hope with these diagrams, you can see that trying to make
>> the ethernet controller a child device of the DSA switches
>> means that (eg) it's no longer a PCI device, which is rather
>> absurd, especially when considering that what happens to the
>> right of the ethernet controller in the diagrams above is
>> normally external chips to the SoC or ethernet device.
> 
> Ok, thanks for the long explanations and

Re: cafe8df8b9bc clashes with DSA

2017-02-10 Thread Andrew Lunn

On Fri, Feb 10, 2017 at 12:55:44PM -0500, Vivien Didelot wrote:
> Hi Florian,
> 
> Florian Fainelli  writes:
> 
> > Fixed in the "net" tree with:
> >
> > 6d9f66ac7fec2a6ccd649e5909806dfe36f1fc25 ("net: phy: Fix PHY module
> > checks and NULL deref in phy_attach_direct()"), applies fine to net-next
> > as well.
> 
> Correct, this fixes my setup. Shouldn't this be submitted to net-next as
> well then?

Hi Vivien

David will at some point merge net into net-next.

Until then, you can work around the issue by enabling the PHY drivers
for your hardware. You are also likely to gain a few nice features,
like PHY interrupts rather than polling, maybe some temperature
sensors, PHY statistics, etc...

 Andrew

Re: [PATCH 0/4] Whitespace checkpatch fixes

2017-02-10 Thread David Miller

From: "Tobin C. Harding" 
Date: Thu,  9 Feb 2017 17:56:03 +1100

> This patch set fixes various whitespace checkpatch errors and warnings.

Series applied.

[PATCH net-next] net_sched: fix error recovery at qdisc creation

2017-02-10 Thread Eric Dumazet

From: Eric Dumazet 

Dmitry reported uses after free in qdisc code [1]

The problem here is that ops->init() can return an error.

qdisc_create_dflt() then calls ops->destroy(),
while qdisc_create() does _not_ call it.

Four qdiscs chose to call their own ops->destroy(), assuming their caller
would not.

This patch makes sure qdisc_create() calls ops->destroy()
and fixes the four qdiscs to avoid double free.
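
The ownership rule described above can be modelled in user space roughly as
follows. This is a toy sketch, not the kernel code: the toy_* names, the
fail_at knob and the destroy_calls counter are invented for illustration.
It assumes the object starts zeroed, so destroy() can run safely on a
partially initialised instance.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define TOY_NR_ARRAYS 4

struct toy_qdisc {
	void *arrays[TOY_NR_ARRAYS];	/* NULL until allocated */
	int fail_at;			/* index at which init fails, -1 = never */
	int destroy_calls;		/* how often destroy() has run */
};

/* destroy() must cope with a partially initialised object. */
static void toy_destroy(struct toy_qdisc *q)
{
	int i;

	for (i = 0; i < TOY_NR_ARRAYS; i++) {
		free(q->arrays[i]);	/* free(NULL) is a no-op */
		q->arrays[i] = NULL;
	}
	q->destroy_calls++;
}

/* init() only reports the error; cleanup is left to the caller. */
static int toy_init(struct toy_qdisc *q)
{
	int i;

	for (i = 0; i < TOY_NR_ARRAYS; i++) {
		if (i == q->fail_at)
			return -ENOMEM;	/* simulated allocation failure */
		q->arrays[i] = malloc(16);
		if (!q->arrays[i])
			return -ENOMEM;
	}
	return 0;
}

/* The creator is the single place that calls destroy() after init fails. */
static int toy_create(struct toy_qdisc *q)
{
	int err;

	assert(q != NULL);

	err = toy_init(q);
	if (err)
		toy_destroy(q);
	return err;
}
```

If toy_init() also called toy_destroy() on failure, the cleanup would run
twice, which is the double free the patch avoids.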

[1]
BUG: KASAN: use-after-free in mq_destroy+0x242/0x290 net/sched/sch_mq.c:33 at 
addr 8801d415d440
Read of size 8 by task syz-executor2/5030
CPU: 0 PID: 5030 Comm: syz-executor2 Not tainted 4.3.5-smp-DEV #119
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
 0046 8801b435b870 81bbbed4 8801db000400
 8801d415d440 8801d415dc40 8801c4988510 8801b435b898
 816682b1 8801b435b928 8801d415d440 8801c49880c0
Call Trace:
 [] __dump_stack lib/dump_stack.c:15 [inline]
 [] dump_stack+0x6c/0x98 lib/dump_stack.c:51
 [] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
 [] print_address_description mm/kasan/report.c:196 [inline]
 [] kasan_report_error+0x1b4/0x4b0 mm/kasan/report.c:285
 [] kasan_report mm/kasan/report.c:305 [inline]
 [] __asan_report_load8_noabort+0x43/0x50 
mm/kasan/report.c:326
 [] mq_destroy+0x242/0x290 net/sched/sch_mq.c:33
 [] qdisc_destroy+0x12d/0x290 net/sched/sch_generic.c:953
 [] qdisc_create_dflt+0xf0/0x120 net/sched/sch_generic.c:848
 [] attach_default_qdiscs net/sched/sch_generic.c:1029 
[inline]
 [] dev_activate+0x6ad/0x880 net/sched/sch_generic.c:1064
 [] __dev_open+0x221/0x320 net/core/dev.c:1403
 [] __dev_change_flags+0x15e/0x3e0 net/core/dev.c:6858
 [] dev_change_flags+0x8e/0x140 net/core/dev.c:6926
 [] dev_ifsioc+0x446/0x890 net/core/dev_ioctl.c:260
 [] dev_ioctl+0x1ba/0xb80 net/core/dev_ioctl.c:546
 [] sock_do_ioctl+0x99/0xb0 net/socket.c:879
 [] sock_ioctl+0x2a0/0x390 net/socket.c:958
 [] vfs_ioctl fs/ioctl.c:44 [inline]
 [] do_vfs_ioctl+0x8a8/0xe50 fs/ioctl.c:611
 [] SYSC_ioctl fs/ioctl.c:626 [inline]
 [] SyS_ioctl+0x94/0xc0 fs/ioctl.c:617
 [] entry_SYSCALL_64_fastpath+0x12/0x17

Signed-off-by: Eric Dumazet 
Reported-by: Dmitry Vyukov 
---
 net/sched/sch_api.c|2 ++
 net/sched/sch_hhf.c|8 ++--
 net/sched/sch_mq.c |   10 +++---
 net/sched/sch_mqprio.c |   19 ++-
 net/sched/sch_sfq.c|3 ++-
 5 files changed, 19 insertions(+), 23 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 
adeabaec0d0b8bd3115e8d8db756460227142c60..a13c15e8f08782f9a428690052bf5585c446b6fe
 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1019,6 +1019,8 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
 
return sch;
}
+   /* ops->init() failed, we call ->destroy() like qdisc_create_dflt() */
+   ops->destroy(sch);
 err_out3:
dev_put(dev);
kfree((char *) sch - sch->padded);
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index 
e3d0458af17ba32cb203d4a5bed952baf9d22588..2fae8b5f1b80c017c4ae60df54c9143f82de4e9d
 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -627,7 +627,9 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt)
q->hhf_arrays[i] = hhf_zalloc(HHF_ARRAYS_LEN *
  sizeof(u32));
if (!q->hhf_arrays[i]) {
-   hhf_destroy(sch);
+   /* Note: hhf_destroy() will be called
+* by our caller.
+*/
return -ENOMEM;
}
}
@@ -638,7 +640,9 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt)
q->hhf_valid_bits[i] = hhf_zalloc(HHF_ARRAYS_LEN /
  BITS_PER_BYTE);
if (!q->hhf_valid_bits[i]) {
-   hhf_destroy(sch);
+   /* Note: hhf_destroy() will be called
+* by our caller.
+*/
return -ENOMEM;
}
}
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 
2bc8d7f8df161005bc89245ca5ebc52f3360e3af..20b7f1646f69270e08d8b7588759a0146f262e89
 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -52,7 +52,7 @@ static int mq_init(struct Qdisc *sch, struct nlattr *opt)
/* pre-allocate qdiscs, attachment can't fail */
priv->qdiscs = kcalloc(dev->num_tx_queues, sizeof(priv->qdiscs[0]),
   GFP_KERNEL);
-   if (priv->qdiscs == NULL)
+   if (!priv->qdiscs)
return -ENOMEM;
 
for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
@@ -60,18

[PATCH net-next] net: ethtool: add support for forward error correction modes

2017-02-10 Thread Vidya Sagar Ravipati

From: Vidya Sagar Ravipati 

Forward Error Correction (FEC) modes, i.e. Base-R
and Reed-Solomon modes, are introduced in 25G/40G/100G standards
for providing good BER at high speeds.
Various networking devices which support 25G/40G/100G provide the ability
to manage supported FEC modes, and the lack of FEC encoding control and
reporting today is a source of interoperability issues for many vendors.
FEC capability as well as specific FEC mode, i.e. Base-R
or RS modes, can be requested or advertised through bits D44:47 of base link
codeword.

This patch set intends to provide option under ethtool to manage and report
FEC encoding settings for networking devices as per IEEE 802.3 bj, bm and by
specs.

set-fec/show-fec option(s) are designed to control and report
the FEC encoding on the link.

SET FEC option:
root@tor: ethtool --set-fec  swp1 encoding [off | RS | BaseR | auto]

Encoding: Types of encoding
Off:  Turning off any encoding
RS :  enforcing RS-FEC encoding on supported speeds
BaseR  :  enforcing Base R encoding on supported speeds
Auto   :  IEEE defaults for the speed/medium combination

Here are a few examples of what we would expect if encoding=auto:
- if autoneg is on, we are expecting FEC to be negotiated as on or off
  as long as the protocol supports it
- if the hardware is capable of detecting the FEC encoding on its
  receiver, it will reconfigure its encoder to match
- in absence of the above, the configuration would be set to IEEE
  defaults.

From our understanding, this is essentially what most hardware/driver
combinations are doing today in the absence of a way for users to
control the behavior.

SHOW FEC option:
root@tor: ethtool --show-fec  swp1
FEC parameters for swp1:
Active FEC encodings: RS
Configured FEC encodings:  RS | BaseR

ETHTOOL DEVNAME output modification:

ethtool devname output:
root@tor:~# ethtool swp1
Settings for swp1:
root@hpe-7712-03:~# ethtool swp18
Settings for swp18:
Supported ports: [ FIBRE ]
Supported link modes:   4baseCR4/Full
4baseSR4/Full
4baseLR4/Full
10baseSR4/Full
10baseCR4/Full
10baseLR4_ER4/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Supported FEC modes: [RS | BaseR | None | Not reported]
Advertised link modes:  Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: [RS | BaseR | None | Not reported]
 One or more FEC modes
Speed: 10Mb/s
Duplex: Full
Port: FIBRE
PHYAD: 106
Transceiver: internal
Auto-negotiation: off
Link detected: yes

This patch includes following changes
a) New ETHTOOL_GFECPARAM/ETHTOOL_SFECPARAM API, handled by
  the new get_fecparam/set_fecparam callbacks, provides support
  for configuration of forward error correction modes.
b) Link mode bits for FEC modes i.e. None (No FEC mode), RS, BaseR/FC
  are defined so that users can configure these fec modes for supported
  and advertising fields as part of link autonegotiation.

Signed-off-by: Vidya Sagar Ravipati 

Changes in RFC PATCH v2:
- Implement Gal Pressman and Casey Leedom feedback
- Removing autonegotiation field in fecparam structure
  and included active_fec to provide mechanism to indicate
  the configured and active FEC modes on port
---
 include/linux/ethtool.h  |  4 
 include/uapi/linux/ethtool.h | 48 +++-
 net/core/ethtool.c   | 34 +++
 3 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 9ded8c6..79a0bab 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -372,5 +372,9 @@ struct ethtool_ops {
  struct ethtool_link_ksettings *);
int (*set_link_ksettings)(struct net_device *,
  const struct ethtool_link_ksettings *);
+   int (*get_fecparam)(struct net_device *,
+ struct ethtool_fecparam *);
+   int (*set_fecparam)(struct net_device *,
+ struct ethtool_fecparam *);
 };
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 3dc91a4..38dbfeb 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1238,6 +1238,47 @@ struct ethtool_per_queue_op {
chardata[];
 };
 
+/**
+ * struct ethtool_fecparam - Ethernet forward error correction(fec) parameters
+ * @cmd: Command number = %ETHTOOL_GFECPARAM or %ETHTOOL_SFECPARAM
+ * @active_fec: FEC mode which is active on porte
+ * @fec: Bitmask of supported/configured FEC modes
+ * @rsvd: Reserved for future extensions. i.e FEC bypass feature.
+ *
+ * Drivers

Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Eric Dumazet

On Fri, 2017-02-10 at 10:02 -0800, Cong Wang wrote:

> I don't have to give a 100% correct patch to prove my explanation
> of the crash. At least it makes more sense than yours...

I will submit it regardless of what you think.

It solves _another_ issue, one of 10 in af_packet.c

Re: cafe8df8b9bc clashes with DSA

2017-02-10 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

> David will at some point merge net into net-next.

Yes I know that, I just wasn't sure if having such a crash in net-next was
tolerated or not. Cherry-picking 6d9f66ac7fec does the job on my side.

> Until then, you can work around the issue by enabling the PHY drivers
> for your hardware. You are also likely to gain a few nice features,
> like PHY interrupts rather than polling, maybe some temperature
> sensors, PHY statistics, etc...

Hum I have CONFIG_MARVELL_PHY enabled, am I missing something?

Thanks,

Vivien

Re: cafe8df8b9bc clashes with DSA

2017-02-10 Thread David Miller

From: Vivien Didelot 
Date: Fri, 10 Feb 2017 12:55:44 -0500

> Hi Florian,
> 
> Florian Fainelli  writes:
> 
>> Fixed in the "net" tree with:
>>
>> 6d9f66ac7fec2a6ccd649e5909806dfe36f1fc25 ("net: phy: Fix PHY module
>> checks and NULL deref in phy_attach_direct()"), applies fine to net-next
>> as well.
> 
> Correct, this fixes my setup. Shouldn't this be submitted to net-next as
> well then?

It will propagate there the next time I merge to Linus and then merge
net into net-next.

Re: [RFC PATCH net] bpf: introduce BPF_F_ALLOW_OVERRIDE flag

2017-02-10 Thread Alexei Starovoitov

On Thu, Feb 09, 2017 at 10:59:23AM -0800, Alexei Starovoitov wrote:
> Andy,
> does it all make sense?

Andy, ping.

Re: [patch net-next 00/10] mlxsw: Offload MC flood for unregister MC

2017-02-10 Thread David Miller

From: Jiri Pirko 
Date: Thu,  9 Feb 2017 14:54:39 +0100

> From: Jiri Pirko 
> 
> Nogah says:
> 
> When multicast is enabled, the Linux bridge floods unregistered multicast
> packets only to ports connected to a multicast router. Devices capable of
> offloading the Linux bridge need to be made aware of such ports, for
> proper flooding behavior.
> On the other hand, when multicast is disabled, such packets should be
> flooded to all ports. This patchset aims to fix that, by offloading
> the multicast state and the list of multicast router ports.
> 
> The first 3 patches add switchdev attributes to offload this data.
> The rest of the patchset adds implementation for handling this data in the
> mlxsw driver.
> 
> The effects this data has on the MDB (namely, when the multicast is
> disabled the MDB should be considered as invalid, and when it is enabled, a
> packet that is flooded by it should also be flooded to the multicast
> routers ports) is subject of future work.
> 
> Testing of this patchset included:
> Sending 3 mc packets streams, LL, registered and unregistered, and checking
> that they reached only to the ports that should have received them.
> The configs were:
> mc disabled, mc without mc router ports and mc with fixed router port.
> It was checked for vlan aware bridge, vlan unaware bridge and vlan unaware
> bridge with another vlan unaware bridge on the same machine

Series applied, thanks.

Re: cafe8df8b9bc clashes with DSA

2017-02-10 Thread Florian Fainelli

On 02/10/2017 10:15 AM, Vivien Didelot wrote:
> Hi Andrew,
> 
> Andrew Lunn  writes:
> 
>> David will at some point merge net into net-next.
> 
> Yes I know that, I just wasn't sure if having such a crash in net-next was
> tolerated or not. Cherry-picking 6d9f66ac7fec does the job on my side.
> 
>> Until then, you can work around the issue by enabling the PHY drivers
>> for your hardware. You are also likely to gain a few nice features,
>> like PHY interrupts rather than polling, maybe some temperature
>> sensors, PHY statistics, etc...
> 
> Hum I have CONFIG_MARVELL_PHY enabled, am I missing something?

If you have fixed PHYs they'll use Generic PHY, suddenly the dungeons
collapses, you die.
-- 
Florian

Re: [PATCH v4] net: ethernet: faraday: To support device tree usage.

2017-02-10 Thread Rob Herring

On Wed, Feb 8, 2017 at 5:59 AM, Greentime Hu  wrote:
> On Sat, Jan 28, 2017 at 6:17 AM, Rob Herring  wrote:
>>
>> On Wed, Jan 25, 2017 at 10:09:20PM +0100, Arnd Bergmann wrote:
>> > On Wed, Jan 25, 2017 at 6:34 PM, David Miller  wrote:
>> > > From: Greentime Hu 
>> > > Date: Tue, 24 Jan 2017 16:46:14 +0800
>> > >> We also use the same binding document to describe the same faraday
>> > >> ethernet controller and add faraday to vendor-prefixes.txt.
>> > >
>> > > Why are you renaming the MOXA binding file instead of adding a
>> > > completely new one for faraday?  The MOXA one should stick around,
>> > > I don't see a justification for removing it.
>> >
>> > This was my suggestion, basically fixing the name of the existing
>> > binding, which was accidentally named after one of the users rather
>> > than the company that did the hardware.
>> >
>> > We can't change the compatible string, but I'd much prefer having only
>> > one binding file for this device rather than two separate ones that
>> > could possibly become incompatible in case we add new properties to
>> > them. If there is only one of them, naming it according to the
>> > hardware design is the general policy.
>> >
>> > Note that we currently have two separate device drivers, but that is more a
>> > historic artifact, and if we ever get around to merging them into one
>> > driver, that should not impact the binding.
>>
>> The change is fine with me, but the subject and commit message need some
>> work.
>
> Hi, Rob:
>
> Would you please advise me of the proper subject and commit messages?

Split the binding to a separate commit and summarize the email
discussion here. For a subject, something like this:

"dt-bindings: net: generalize moxart-mac to support all faraday based ftmac IP"

Rob

Re: [PATCH RFC net] net/mlx5e: Add preemption enable/disable around TC statistics upcall

2017-02-10 Thread Jakub Kicinski

On Fri, 10 Feb 2017 18:21:25 +0200, Or Gerlitz wrote:
> On Fri, Feb 10, 2017 at 3:34 AM, Jakub Kicinski wrote:
> > On Thu,  9 Feb 2017 17:38:43 +0200, Or Gerlitz wrote:  
> >> Running with CONFIG_PREEMPT set, I get a
> >>
> >> BUG: using smp_processor_id() in preemptible [] code: tc/3793
> >>
> >> assertion from the TC action (mirred) stats_update callback, when they do
> >>
> >>   _bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets)
> >>
> >> As done by commit 66860be "nfp: bpf: allow offloaded filters to update stats",
> >> disabling/enabling preemption around the TC upcall solves that.
> >>
> >> Fixes: aad7e08d39bd ('net/mlx5e: Hardware offloaded flower filter statistics support')
> >> Signed-off-by: Or Gerlitz 
> >> ---
> >>
> >> I marked it as RFC, since I wasn't fully sure on the nature of the
> >> problem, nor if this is the direction we should take to the fix.  
> 
> > I think it's the right fix  
> 
> Do you understand the problem? What's wrong with the call done in the TC
> action code w.r.t. preemption?
> 
> does it make sense to do this (say) 100K times/sec?

TC actions have per-cpu stats, and referencing them has to be done with
preemption disabled.  Let's CC Jamal and Cong - maybe there are some
more clever things we could do here?  The situation in a nutshell is
that the offload drivers read the stats from HW and want to write them
back to the TC action stats.  The writeback happens in process context
when the user requests a stats dump (potentially for multiple actions, but
we currently would just iterate over all actions in driver code).
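
A minimal userspace sketch of the rule described above, not kernel code.
preempt_disable(), preempt_enable() and this_cpu_bstats() below are stand-in
stubs written for this illustration; NR_CPUS, current_cpu and the struct
layout are assumptions as well. It only models the bracketing that the
patch under discussion puts around the stats upcall: the per-CPU slot may
be looked up and updated only while preemption is off.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4                      /* assumed CPU count for the model */

struct bstats {
	uint64_t bytes;
	uint64_t packets;
};

struct bstats cpu_bstats[NR_CPUS];     /* one counter slot per CPU */
int preempt_count;                     /* stub for the preemption counter */
int current_cpu;                       /* stub for smp_processor_id() */

/* Stubs: the real primitives also keep the task on its current CPU. */
void preempt_disable(void) { preempt_count++; }
void preempt_enable(void)  { assert(preempt_count > 0); preempt_count--; }

/* Model of this_cpu_ptr(): refuses to run in "preemptible" context. */
struct bstats *this_cpu_bstats(void)
{
	assert(preempt_count > 0);
	return &cpu_bstats[current_cpu];
}

/* Write back HW counters with the per-CPU access bracketed. */
void stats_update(uint64_t bytes, uint64_t packets)
{
	struct bstats *b;

	preempt_disable();
	b = this_cpu_bstats();
	b->bytes += bytes;
	b->packets += packets;
	preempt_enable();
}

/* Readers sum over all per-CPU slots. */
uint64_t total_bytes(void)
{
	uint64_t sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sum += cpu_bstats[cpu].bytes;
	return sum;
}
```

Calling this_cpu_bstats() without the surrounding disable/enable pair trips
the assertion in the model, which loosely mirrors the "using
smp_processor_id() in preemptible code" report quoted above.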

Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Eric Dumazet

On Fri, 2017-02-10 at 09:59 -0800, Eric Dumazet wrote:
> On Fri, 2017-02-10 at 09:49 -0800, Cong Wang wrote:
> > On Thu, Feb 9, 2017 at 7:23 PM, Eric Dumazet  wrote:
> > > On Thu, 2017-02-09 at 19:19 -0800, Eric Dumazet wrote:
> > >
> > >> More likely the bug is in fanout_add(), with a buggy sequence in error
> > >> case, and not correct locking.
> > >>
> > >> kfree(po->rollover);
> > >> po->rollover = NULL;
> > >>
> > >> Two cpus entering fanout_add() (using the same af_packet socket,
> > >> syzkaller courtesy...) might both see po->fanout being NULL.
> > >>
> > >> Then they grab the mutex.  Too late...
> > >
> > > Patch could be :
> > >
> > 
> > For me, clearly the data structure that use-after-free'd is struct sock
> > rather than struct packet_rollover.
> 
> Fine. But your patch makes absolutely no sense.

At least Anoob's patch is a step in the right direction ;)

https://patchwork.ozlabs.org/patch/726532/

Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Cong Wang

On Fri, Feb 10, 2017 at 9:59 AM, Eric Dumazet  wrote:
> On Fri, 2017-02-10 at 09:49 -0800, Cong Wang wrote:
>> On Thu, Feb 9, 2017 at 7:23 PM, Eric Dumazet  wrote:
>> > On Thu, 2017-02-09 at 19:19 -0800, Eric Dumazet wrote:
>> >
>> >> More likely the bug is in fanout_add(), with a buggy sequence in error
>> >> case, and not correct locking.
>> >>
>> >> kfree(po->rollover);
>> >> po->rollover = NULL;
>> >>
>> >> Two cpus entering fanout_add() (using the same af_packet socket,
>> >> syzkaller courtesy...) might both see po->fanout being NULL.
>> >>
>> >> Then they grab the mutex.  Too late...
>> >
>> > Patch could be :
>> >
>>
>> For me, clearly the data structure that use-after-free'd is struct sock
>> rather than struct packet_rollover.
>
> Fine. But your patch makes absolutely no sense.

I don't have to give a 100% correct patch to prove my explanation
of the crash. At least it makes more sense than yours...

Re: [PATCH] [net-next] ARM: orion: fix PHYLIB dependency

2017-02-10 Thread Florian Fainelli

On 02/10/2017 12:20 AM, Arnd Bergmann wrote:
> On Thu, Feb 9, 2017 at 7:22 PM, Florian Fainelli  wrote:
>> On 02/09/2017 07:08 AM, Arnd Bergmann wrote:
>> I disabled CONFIG_NETDEVICES to force CONFIG_PHY not to be set here, and
>> I was not able to reproduce this, what am I missing?
> 
> In the ARMv5 allmodconfig build, this fails because CONFIG_PHY=m, and
> we can't call into it. You could use IS_BUILTIN instead of IS_ENABLED in
> the header as a oneline workaround, but I think that would be more confusing
> to real users that try to use CONFIG_PHY=m without realizing why they lose
> access to their switch.

I see, this patch should also help fixing this:

http://patchwork.ozlabs.org/patch/726381/

-- 
Florian
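
The IS_BUILTIN versus IS_ENABLED distinction mentioned in the quoted text
can be sketched in plain C. This is a simplified re-creation for
illustration, not a copy of the kernel's include/linux/kconfig.h: the KC_*
helper names are made up here, and CONFIG_PHY is simply the option name
used in this thread, assumed to be built as a module (=m).

```c
#include <assert.h>

/* Assumed configuration: the option is built as a module (=m). */
#define CONFIG_PHY_MODULE 1

/* Helper macros (hypothetical names): evaluate to 1 if x is defined as 1. */
#define KC_ARG_PLACEHOLDER_1 0,
#define KC_TAKE_SECOND(ignored, val, ...) val
#define KC_IS_DEFINED(x)      KC_IS_DEFINED_(x)
#define KC_IS_DEFINED_(val)   KC_IS_DEFINED__(KC_ARG_PLACEHOLDER_##val)
#define KC_IS_DEFINED__(arg1_or_junk) KC_TAKE_SECOND(arg1_or_junk 1, 0, 0)

#define IS_BUILTIN(option) KC_IS_DEFINED(option)            /* =y only */
#define IS_MODULE(option)  KC_IS_DEFINED(option##_MODULE)   /* =m only */
#define IS_ENABLED(option) (IS_BUILTIN(option) || IS_MODULE(option))

/* True when the option is set at all (=y or =m). */
int phy_enabled(void)
{
	return IS_ENABLED(CONFIG_PHY);
}

/* True only when built-in code could link against the symbols directly. */
int phy_callable_from_builtin_code(void)
{
	return IS_BUILTIN(CONFIG_PHY);
}
```

With the option set to =m, IS_ENABLED() is true while IS_BUILTIN() is
false, so a header guarded by IS_BUILTIN() would hand built-in callers the
stub path even though the module exists.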

Re: net/packet: use-after-free in packet_rcv_fanout

2017-02-10 Thread Cong Wang

On Thu, Feb 9, 2017 at 7:33 PM, Sowmini Varadhan
 wrote:
> On (02/09/17 19:19), Eric Dumazet wrote:
>>
>> More likely the bug is in fanout_add(), with a buggy sequence in error
>> case, and not correct locking.
>>
>> kfree(po->rollover);
>> po->rollover = NULL;
>>
>> Two cpus entering fanout_add() (using the same af_packet socket,
>> syzkaller courtesy...) might both see po->fanout being NULL.
>>
>> Then they grab the mutex.  Too late...
>
> I'm not sure I follow- aiui the panic was in accessing the
> sk_receive_queue.lock in a socket that had been closed earlier. I think
> the assumption is that rcu_read_lock_bh in __dev_queue_xmit (and
> rcu_read_lock in dev_queue_xmit_nit?) should make sure that the nit
> packet delivery can be done safely, and the synchronize_net in
> packet_release() makes sure that the Tx paths are quiesced before freeing
> the socket.  What is the race-hole here? Does it have to do with the
> _bh and softirq context, somehow?

My understanding of the race here is that packet_release() doesn't
wait for in-flight packets correctly, so an in-flight packet can still
refer to the struct sock which is being released.

This could happen because struct packet_fanout is refcounted: it is
still there when this is not the last sock referring to it, and therefore
the callback packet_rcv_fanout() is not removed yet. When packet_release()
tries to remove the pointer to struct sock from f->arr[i] in
__fanout_unlink(), an in-flight packet could race with f->arr[i]:

po = pkt_sk(f->arr[idx]);

Of course, the fix may not be as easy as just adding a synchronize_net(),
perhaps we need the spinlock too in fanout_demux_rollover().

At least I believe this explains the crash Dmitry reported.
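
A rough userspace model of the race described above, under stated
assumptions. struct fanout_model, struct sock_model and the three helper
functions are hypothetical names for this sketch, and a pthread mutex is
swapped in for the kernel's locking and RCU machinery. It is not the
actual fix, only an illustration of serializing the f->arr[idx] lookup
against the unlink.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_MEMBERS 8

struct sock_model {
	int id;
};

struct fanout_model {
	pthread_mutex_t lock;
	unsigned int num_members;
	struct sock_model *arr[MAX_MEMBERS];
};

void fanout_init(struct fanout_model *f)
{
	pthread_mutex_init(&f->lock, NULL);
	f->num_members = 0;
}

/* Append a socket to the group; returns -1 when the group is full. */
int fanout_link(struct fanout_model *f, struct sock_model *sk)
{
	int ret = -1;

	pthread_mutex_lock(&f->lock);
	if (f->num_members < MAX_MEMBERS) {
		f->arr[f->num_members++] = sk;
		ret = 0;
	}
	pthread_mutex_unlock(&f->lock);
	return ret;
}

/* Remove a socket by moving the last member into its slot. */
void fanout_unlink(struct fanout_model *f, struct sock_model *sk)
{
	unsigned int i;

	pthread_mutex_lock(&f->lock);
	for (i = 0; i < f->num_members; i++) {
		if (f->arr[i] == sk) {
			f->arr[i] = f->arr[f->num_members - 1];
			f->num_members--;
			break;
		}
	}
	pthread_mutex_unlock(&f->lock);
}

/*
 * Pick a member by hash and read it while holding the lock, so the
 * socket cannot be unlinked between the lookup and the dereference.
 * Returns the chosen socket id, or -1 if the group is empty.
 */
int fanout_deliver(struct fanout_model *f, unsigned int hash)
{
	int id = -1;

	pthread_mutex_lock(&f->lock);
	if (f->num_members > 0)
		id = f->arr[hash % f->num_members]->id;
	pthread_mutex_unlock(&f->lock);
	return id;
}
```

In this model the owner may free a socket only after fanout_unlink()
returns, because any delivery that picked it has already finished under
the same lock. Whether a lock on the receive path is acceptable for the
real code is a separate question that this sketch does not answer.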

Re: [PATCH v4 0/3] Miscellaneous fixes for BPF (perf tree)

2017-02-10 Thread Arnaldo Carvalho de Melo

Em Wed, Feb 08, 2017 at 09:27:41PM +0100, Mickaël Salaün escreveu:
> This series brings some fixes and small improvements to the BPF samples.
> 
> This is intended for the perf tree and apply on 7a5980f9c006 ("tools lib bpf:
> Add missing header to the library").

Wang, are you ok with this series? Joe?

- Arnaldo
 
> Changes since v3:
> * remove applied patch 1/5
> * remove patch 2/5 on bpf_load_program() as requested by Wang Nan
> 
> Changes since v2:
> * add this cover letter
> 
> Changes since v1:
> * exclude patches not intended for the perf tree
> 
> Regards,
> 
> Mickaël Salaün (3):
>   samples/bpf: Ignore already processed ELF sections
>   samples/bpf: Reset global variables
>   samples/bpf: Add missing header
> 
>  samples/bpf/bpf_load.c | 7 +++
>  samples/bpf/tracex5_kern.c | 1 +
>  2 files changed, 8 insertions(+)
> 
> -- 
> 2.11.0
