Re: [PATH v3 net-next] net: remove member 'max' of struct scm_fp_list
From: yuan linyu
Date: Sat, 11 Feb 2017 11:41:17 +0800

> From: yuan linyu
>
> 'max' only used at three places in scm.c,
> 1. in scm_fp_copy(), fpl->max = SCM_MAX_FD;
> 2. in scm_fp_copy(), if (fpl->count + num > fpl->max)
> 3. in scm_fp_dup(), new_fpl->max = new_fpl->count;
> at place 3, the worst case is new_fpl->count = SCM_MAX_FD,
> so do a full size dup, then 'max' field will always
> SCM_MAX_FD and it can be removed.
>
> Signed-off-by: yuan linyu

Please don't take this the wrong way, but I am ignoring your patches
on this issue.

This is even more broken than your previous two submissions.

Sorry.
Re: [RFC][PATCH] nfsd: add +1 to reference counting scheme for struct nfsd4_session
> Signed-off-by: David Windsor
> ---
>  fs/nfsd/nfs4state.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index a0dee8a..b0f3010 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -196,7 +196,7 @@ static void nfsd4_put_session_locked(struct nfsd4_session *ses)
>
>  	lockdep_assert_held(&nn->client_lock);
>
> -	if (atomic_dec_and_test(&ses->se_ref) && is_session_dead(ses))
> +	if (!atomic_add_unless(&ses->se_ref, -1, 1) && is_session_des(ses))

This should read:

	if (!atomic_add_unless(&ses->se_ref, -1, 1) && is_session_dead(ses))

>  		free_session(ses);
>  	put_client_renew_locked(clp);
>  }
> @@ -1645,7 +1645,7 @@ static void init_session(struct svc_rqst *rqstp, struct nfsd4_session *new, stru
>  	new->se_flags = cses->flags;
>  	new->se_cb_prog = cses->callback_prog;
>  	new->se_cb_sec = cses->cb_sec;
> -	atomic_set(&new->se_ref, 0);
> +	atomic_set(&new->se_ref, 1);
>  	idx = hash_sessionid(&new->se_sessionid);
>  	list_add(&new->se_hash, &nn->sessionid_hashtbl[idx]);
>  	spin_lock(&clp->cl_lock);
> @@ -1792,7 +1792,7 @@ free_client(struct nfs4_client *clp)
>  		ses = list_entry(clp->cl_sessions.next, struct nfsd4_session,
>  				se_perclnt);
>  		list_del(&ses->se_perclnt);
> -		WARN_ON_ONCE(atomic_read(&ses->se_ref));
> +		WARN_ON_ONCE((atomic_read(&ses->se_ref) > 1));
>  		free_session(ses);
>  	}
>  	rpc_destroy_wait_queue(&clp->cl_cb_waitq);
> --
> 2.7.4
>
[RFC][PATCH] nfsd: add +1 to reference counting scheme for struct nfsd4_session
In furtherance of the KSPP effort to add overflow protection to kernel
reference counters, a new type (refcount_t) and API have been created.
Part of the refcount_t API is refcount_inc(), which will not increment a
refcount_t variable if its value is 0 (as this would indicate a possible
use-after-free condition).

In auditing the kernel for refcounting corner cases, we've come across the
case of struct nfsd4_session.

From fs/nfsd/state.h:
/*
 * Representation of a v4.1+ session. These are refcounted in a similar
 * fashion to the nfs4_client. References are only taken when the server
 * is actively working on the object (primarily during the processing of
 * compounds).
 */
struct nfsd4_session {
	atomic_t		se_ref;
	...
};

From fs/nfsd/nfs4state.c:
static void init_session(..., struct nfsd4_session *new, ...)
{
	...
	atomic_set(&new->se_ref, 0);
	...
}

Since nfsd4_session objects are initialized with refcount = 0, subsequent
increments will fail using the new refcount_t API.

Being largely unfamiliar with this subsystem's garbage collection
mechanism, I'm unsure how to best fix this. Attached is a patch that
performs a logical +1 on struct nfsd4_session's reference counting scheme.

If this is the correct route to take, I will resubmit this patch with
updated comments for how struct nfsd4_session is refcounted (see the above
comment from fs/nfsd/state.h).

This is in preparation for the previously mentioned refcount_t API series.
Thanks,
David Windsor

Signed-off-by: David Windsor
---
 fs/nfsd/nfs4state.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index a0dee8a..b0f3010 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -196,7 +196,7 @@ static void nfsd4_put_session_locked(struct nfsd4_session *ses)
 
 	lockdep_assert_held(&nn->client_lock);
 
-	if (atomic_dec_and_test(&ses->se_ref) && is_session_dead(ses))
+	if (!atomic_add_unless(&ses->se_ref, -1, 1) && is_session_des(ses))
 		free_session(ses);
 	put_client_renew_locked(clp);
 }
@@ -1645,7 +1645,7 @@ static void init_session(struct svc_rqst *rqstp, struct nfsd4_session *new, stru
 	new->se_flags = cses->flags;
 	new->se_cb_prog = cses->callback_prog;
 	new->se_cb_sec = cses->cb_sec;
-	atomic_set(&new->se_ref, 0);
+	atomic_set(&new->se_ref, 1);
 	idx = hash_sessionid(&new->se_sessionid);
 	list_add(&new->se_hash, &nn->sessionid_hashtbl[idx]);
 	spin_lock(&clp->cl_lock);
@@ -1792,7 +1792,7 @@ free_client(struct nfs4_client *clp)
 		ses = list_entry(clp->cl_sessions.next, struct nfsd4_session,
 				se_perclnt);
 		list_del(&ses->se_perclnt);
-		WARN_ON_ONCE(atomic_read(&ses->se_ref));
+		WARN_ON_ONCE((atomic_read(&ses->se_ref) > 1));
 		free_session(ses);
 	}
 	rpc_destroy_wait_queue(&clp->cl_cb_waitq);
-- 
2.7.4
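As a rough illustration of the failure mode the cover letter describes, here
is a minimal userspace sketch using C11 atomics. The model_ref type and both
helpers are hypothetical stand-ins, not the kernel's refcount_t code; the
only behavior taken from the mail is that an increment is refused when the
counter is 0.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; NOT the kernel's refcount_t implementation. */
struct model_ref {
	atomic_uint count;
};

/* Take a reference unless the counter is already 0; true on success. */
static bool model_ref_inc_not_zero(struct model_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; true when this call brought the counter to 0. */
static bool model_ref_dec_and_test(struct model_ref *r)
{
	return atomic_fetch_sub(&r->count, 1) == 1;
}
```

With a counter initialized to 0, as init_session() does before the patch,
model_ref_inc_not_zero() returns false. Starting from 1, as the patch
proposes, the increment succeeds.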
[PATCH net-next] vxlan: remove vni zero check and drop for COLLECT_METADATA
From: Roopa Prabhu

This patch drops the vni zero check for COLLECT_METADATA mode.
It is not really needed, vni zero is a valid vni.

Fixes: 3ad7a4b141eb ("vxlan: support fdb and learning in COLLECT_METADATA mode")
Reported-by: Joe Stringer
Signed-off-by: Roopa Prabhu
---
 drivers/net/vxlan.c |    3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 2374a75..4e27c5b 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1333,9 +1333,6 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
 
 	vni = vxlan_vni(vxlan_hdr(skb)->vx_vni);
 
-	if ((vs->flags & VXLAN_F_COLLECT_METADATA) && !vni)
-		goto drop;
-
 	vxlan = vxlan_vs_find_vni(vs, vni);
 	if (!vxlan)
 		goto drop;
-- 
1.7.10.4
Re: [PATCH net-next v2 2/5] vxlan: support fdb and learning in COLLECT_METADATA mode
On 2/10/17, 8:05 PM, Joe Stringer wrote:
> On 31 January 2017 at 22:59, Roopa Prabhu wrote:
>> @@ -1289,7 +1331,12 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
>>  	if (!vs)
>>  		goto drop;
>>
>> -	vxlan = vxlan_vs_find_vni(vs, vxlan_vni(vxlan_hdr(skb)->vx_vni));
>> +	vni = vxlan_vni(vxlan_hdr(skb)->vx_vni);
>> +
>> +	if ((vs->flags & VXLAN_F_COLLECT_METADATA) && !vni)
>> +		goto drop;
>> +
>> +	vxlan = vxlan_vs_find_vni(vs, vni);
>>  	if (!vxlan)
>>  		goto drop;
> Hi Roopa,
>
> We've noticed a failure in OVS system-traffic kmod test cases and
> bisected it down to this commit. It seems that it's related to this
> new drop condition here. Can you explain what's meant to be special
> about VNI 0? I can't see anything mentioned about it in RFC7348, so I
> don't see why it should be dropped.
>
> In the OVS testsuite, we configure OVS in the root namespace with an
> OVS vxlan device (which has VXLAN_F_COLLECT_METADATA set), with vni 0.
> Then, we configure a veth pair into another namespace where we have
> the other end of the tunnel configured using a regular native linux
> vxlan device on vni 0. Prior to this commit, the test worked; after
> this test it failed. If we manually change to use a nonzero VNI, it
> works. The test is here:

To be honest, I thought vni 0 was only used for the collect metadata device
for lookup of the device until a real vni was derived. and since i moved
the line that got the vni from the packet up, I ended up adding that check.
Did not realize vni 0 could be valid vni in the packet.

>
> https://github.com/openvswitch/ovs/blob/branch-2.7/tests/system-traffic.at#L218
>
> Jarno also tried setting up two namespaces with regular vxlan devices
> and VNI 0, and this worked too. Presumably this is because this would
> not use VXLAN_F_COLLECT_METADATA.

yeah, that should be it. I will send a patch in a few hours. Thanks for
reporting. I am glad you ran these tests.. as I was not able to completely
verify all cases for ovs.
[PATCH v2 net] bpf: introduce BPF_F_ALLOW_OVERRIDE flag
If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
to the given cgroup the descendent cgroup will be able to override
effective bpf program that was inherited from this cgroup.
By default it's not passed, therefore override is disallowed.

Examples:
1.
prog X attached to /A with default
prog Y fails to attach to /A/B and /A/B/C
Everything under /A runs prog X

2.
prog X attached to /A with allow_override.
prog Y fails to attach to /A/B with default (non-override)
prog M attached to /A/B with allow_override.
Everything under /A/B runs prog M only.

3.
prog X attached to /A with allow_override.
prog Y fails to attach to /A with default.
The user has to detach first to switch the mode.

In the future this behavior may be extended with a chain of
non-overridable programs.

Also fix the bug where detach from cgroup where nothing is attached
was not throwing error. Return ENOENT in such case.

Add several testcases and adjust libbpf.

Fixes: 3007098494be ("cgroup: add support for eBPF programs")
Signed-off-by: Alexei Starovoitov
---
v1->v2: disallowed overridable->non_override transition as suggested by Andy
        added tests and fixed double detach bug

Andy, Daniel,
please review and ack quickly, so it can land into 4.10.
---
 include/linux/bpf-cgroup.h       | 13
 include/uapi/linux/bpf.h         |  7 +
 kernel/bpf/cgroup.c              | 59 +++---
 kernel/bpf/syscall.c             | 20
 kernel/cgroup.c                  |  9 +++---
 samples/bpf/test_cgrp2_attach.c  |  2 +-
 samples/bpf/test_cgrp2_attach2.c | 68 +---
 samples/bpf/test_cgrp2_sock.c    |  2 +-
 samples/bpf/test_cgrp2_sock2.c   |  2 +-
 tools/lib/bpf/bpf.c              |  4 ++-
 tools/lib/bpf/bpf.h              |  3 +-
 11 files changed, 151 insertions(+), 38 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 92bc89ae7e20..c970a25d2a49 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -21,20 +21,19 @@ struct cgroup_bpf {
 	 */
 	struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE];
 	struct bpf_prog __rcu *effective[MAX_BPF_ATTACH_TYPE];
+	bool disallow_override[MAX_BPF_ATTACH_TYPE];
 };
 
 void cgroup_bpf_put(struct cgroup *cgrp);
 void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent);
 
-void __cgroup_bpf_update(struct cgroup *cgrp,
-			 struct cgroup *parent,
-			 struct bpf_prog *prog,
-			 enum bpf_attach_type type);
+int __cgroup_bpf_update(struct cgroup *cgrp, struct cgroup *parent,
+			struct bpf_prog *prog, enum bpf_attach_type type,
+			bool overridable);
 
 /* Wrapper for __cgroup_bpf_update() protected by cgroup_mutex */
-void cgroup_bpf_update(struct cgroup *cgrp,
-		       struct bpf_prog *prog,
-		       enum bpf_attach_type type);
+int cgroup_bpf_update(struct cgroup *cgrp, struct bpf_prog *prog,
+		      enum bpf_attach_type type, bool overridable);
 
 int __cgroup_bpf_run_filter_skb(struct sock *sk,
 				struct sk_buff *skb,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e5b8cf16cbaf..69f65b710b10 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -116,6 +116,12 @@ enum bpf_attach_type {
 
 #define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
 
+/* If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
+ * to the given target_fd cgroup the descendent cgroup will be able to
+ * override effective bpf program that was inherited from this cgroup
+ */
+#define BPF_F_ALLOW_OVERRIDE	(1U << 0)
+
 #define BPF_PSEUDO_MAP_FD	1
 
 /* flags for BPF_MAP_UPDATE_ELEM command */
@@ -171,6 +177,7 @@ union bpf_attr {
 		__u32		target_fd;	/* container object to attach to */
 		__u32		attach_bpf_fd;	/* eBPF program to attach */
 		__u32		attach_type;
+		__u32		attach_flags;
 	};
 } __attribute__((aligned(8)));
 
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index a515f7b007c6..da0f53690295 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -52,6 +52,7 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent)
 		e = rcu_dereference_protected(parent->bpf.effective[type],
 					      lockdep_is_held(&cgroup_mutex));
 		rcu_assign_pointer(cgrp->bpf.effective[type], e);
+		cgrp->bpf.disallow_override[type] = parent->bpf.disallow_override[type];
 	}
 }
 
@@ -82,30 +83,63 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent)
  *
  * Must be called with cgroup_mutex held.
  */
-void __cgroup_bpf_update(struct cgroup *cgrp,
-			 struct cgroup *parent,
-			 struct bpf_prog
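The three attach examples in the BPF_F_ALLOW_OVERRIDE commit message can be
modeled in a few lines of userspace C. This is only a toy sketch of the
stated rules, not the kernel's __cgroup_bpf_update(): the toy_* names are
made up, each cgroup tracks a single attach type, and returning -EPERM for
a refused attach is an assumption (the message only names ENOENT, for the
detach case).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one attach type on a cgroup tree; all names are hypothetical. */
struct toy_cgroup {
	struct toy_cgroup *parent;
	bool has_prog;		/* a program is attached directly here */
	bool overridable;	/* mode that program was attached with */
};

/* Nearest ancestor with an attached program, or NULL if there is none. */
static const struct toy_cgroup *toy_inherited(const struct toy_cgroup *cg)
{
	for (cg = cg->parent; cg; cg = cg->parent)
		if (cg->has_prog)
			return cg;
	return NULL;
}

static int toy_attach(struct toy_cgroup *cg, bool allow_override)
{
	const struct toy_cgroup *anc = toy_inherited(cg);

	/*
	 * Example 1: a non-overridable ancestor blocks every descendant.
	 * Example 2: under an overridable ancestor, only an attach that
	 * itself passes allow_override is accepted.
	 */
	if (anc && (!anc->overridable || !allow_override))
		return -EPERM;	/* errno choice is an assumption */

	/* Example 3: switching mode in place requires a detach first. */
	if (cg->has_prog && cg->overridable != allow_override)
		return -EPERM;

	cg->has_prog = true;
	cg->overridable = allow_override;
	return 0;
}

static int toy_detach(struct toy_cgroup *cg)
{
	/* Detaching when nothing is attached reports ENOENT. */
	if (!cg->has_prog)
		return -ENOENT;
	cg->has_prog = false;
	return 0;
}
```

Building /A, /A/B and /A/B/C as three linked toy_cgroup objects and
replaying the examples in order gives the outcomes listed in the commit
message.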
Re: [PATCH net-next v2 2/5] vxlan: support fdb and learning in COLLECT_METADATA mode
On 31 January 2017 at 22:59, Roopa Prabhu wrote:
> @@ -1289,7 +1331,12 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
>  	if (!vs)
>  		goto drop;
>
> -	vxlan = vxlan_vs_find_vni(vs, vxlan_vni(vxlan_hdr(skb)->vx_vni));
> +	vni = vxlan_vni(vxlan_hdr(skb)->vx_vni);
> +
> +	if ((vs->flags & VXLAN_F_COLLECT_METADATA) && !vni)
> +		goto drop;
> +
> +	vxlan = vxlan_vs_find_vni(vs, vni);
>  	if (!vxlan)
>  		goto drop;

Hi Roopa,

We've noticed a failure in OVS system-traffic kmod test cases and
bisected it down to this commit. It seems that it's related to this
new drop condition here. Can you explain what's meant to be special
about VNI 0? I can't see anything mentioned about it in RFC7348, so I
don't see why it should be dropped.

In the OVS testsuite, we configure OVS in the root namespace with an
OVS vxlan device (which has VXLAN_F_COLLECT_METADATA set), with vni 0.
Then, we configure a veth pair into another namespace where we have
the other end of the tunnel configured using a regular native linux
vxlan device on vni 0. Prior to this commit, the test worked; after
this test it failed. If we manually change to use a nonzero VNI, it
works. The test is here:

https://github.com/openvswitch/ovs/blob/branch-2.7/tests/system-traffic.at#L218

Jarno also tried setting up two namespaces with regular vxlan devices
and VNI 0, and this worked too. Presumably this is because this would
not use VXLAN_F_COLLECT_METADATA.
Re: [PATCH] net: add regs attribute to phy device for user diagnose
On Thu, 2017-01-19 at 02:01 +0100, Andrew Lunn wrote:
> >
> > I will add two ethtool command in kernel to read and write register in PHY.
> Write access will get NACKed by me. Read only please.

Some registers need a value written to them first before they can be read.
If access is read-only, it will not achieve the goal.

>
> >
> > ethtool can use these command to dump what user want, there is no
> > more work to PHY driver.
> Please think about how you handle PHYs with pages. This needs to be
> part of the API.

Thanks, I will.

>
> Andrew
[PATH v3 net-next] net: remove member 'max' of struct scm_fp_list
From: yuan linyu

'max' only used at three places in scm.c,
1. in scm_fp_copy(), fpl->max = SCM_MAX_FD;
2. in scm_fp_copy(), if (fpl->count + num > fpl->max)
3. in scm_fp_dup(), new_fpl->max = new_fpl->count;
at place 3, the worst case is new_fpl->count = SCM_MAX_FD,
so do a full size dup, then 'max' field will always
SCM_MAX_FD and it can be removed.

Signed-off-by: yuan linyu
---
v2->v3:
	change scm_fp_dup() to do a full size dup

v1->v2:
	update commit log to describe correct reason to remove 'max'

 include/net/scm.h |  3 +--
 net/core/scm.c    | 23 ++-
 2 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/include/net/scm.h b/include/net/scm.h
index 59fa93c..1301227 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -19,8 +19,7 @@ struct scm_creds {
 };
 
 struct scm_fp_list {
-	short			count;
-	short			max;
+	unsigned int		count;
 	struct user_struct	*user;
 	struct file		*fp[SCM_MAX_FD];
 };
diff --git a/net/core/scm.c b/net/core/scm.c
index b6d8368..fb3ab32 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -69,15 +69,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
 	int *fdp = (int*)CMSG_DATA(cmsg);
 	struct scm_fp_list *fpl = *fplp;
 	struct file **fpp;
-	int i, num;
-
-	num = (cmsg->cmsg_len - sizeof(struct cmsghdr))/sizeof(int);
-
-	if (num <= 0)
-		return 0;
-
-	if (num > SCM_MAX_FD)
-		return -EINVAL;
+	unsigned int i, num;
 
 	if (!fpl)
 	{
@@ -86,18 +78,17 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
 			return -ENOMEM;
 		*fplp = fpl;
 		fpl->count = 0;
-		fpl->max = SCM_MAX_FD;
 		fpl->user = NULL;
 	}
-	fpp = &fpl->fp[fpl->count];
 
-	if (fpl->count + num > fpl->max)
+	num = (cmsg->cmsg_len - sizeof(struct cmsghdr))/sizeof(int);
+	if (fpl->count + num > SCM_MAX_FD)
 		return -EINVAL;
 
 	/*
 	 *	Verify the descriptors and increment the usage count.
 	 */
-
+	fpp = &fpl->fp[fpl->count];
 	for (i=0; i< num; i++)
 	{
 		int fd = fdp[i];
@@ -112,7 +103,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
 	if (!fpl->user)
 		fpl->user = get_uid(current_user());
 
-	return num;
+	return 0;
 }
 
 void __scm_destroy(struct scm_cookie *scm)
@@ -336,12 +327,10 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
 	if (!fpl)
 		return NULL;
 
-	new_fpl = kmemdup(fpl, offsetof(struct scm_fp_list, fp[fpl->count]),
-			  GFP_KERNEL);
+	new_fpl = kmemdup(fpl, sizeof(*fpl), GFP_KERNEL);
 	if (new_fpl) {
 		for (i = 0; i < fpl->count; i++)
 			get_file(fpl->fp[i]);
-		new_fpl->max = new_fpl->count;
 		new_fpl->user = get_uid(fpl->user);
 	}
 	return new_fpl;
-- 
2.7.4
[PATCH 3/7] staging: r8712u: Fix macros used to read/write the TX/RX descriptors
Although the driver works on big-endian hardware, Sparse generates a lot of warnings. Many of these are the result of incorrect coding of these macros. Signed-off-by: Larry Finger--- drivers/staging/rtl8712/wifi.h | 109 - 1 file changed, 52 insertions(+), 57 deletions(-) diff --git a/drivers/staging/rtl8712/wifi.h b/drivers/staging/rtl8712/wifi.h index 7ebf247..74dfc9b 100644 --- a/drivers/staging/rtl8712/wifi.h +++ b/drivers/staging/rtl8712/wifi.h @@ -151,92 +151,88 @@ enum WIFI_REG_DOMAIN { #define _ORDER_BIT(15) #define SetToDs(pbuf) ({ \ - *(unsigned short *)(pbuf) |= cpu_to_le16(_TO_DS_); \ + *(__le16 *)(pbuf) |= cpu_to_le16(_TO_DS_); \ }) -#define GetToDs(pbuf) (((*(unsigned short *)(pbuf)) & \ - le16_to_cpu(_TO_DS_)) != 0) +#define GetToDs(pbuf) (((*(__le16 *)(pbuf)) & cpu_to_le16(_TO_DS_)) != 0) #define ClearToDs(pbuf)({ \ - *(unsigned short *)(pbuf) &= (~cpu_to_le16(_TO_DS_)); \ + *(__le16 *)(pbuf) &= (~cpu_to_le16(_TO_DS_)); \ }) #define SetFrDs(pbuf) ({ \ - *(unsigned short *)(pbuf) |= cpu_to_le16(_FROM_DS_); \ + *(__le16 *)(pbuf) |= cpu_to_le16(_FROM_DS_); \ }) -#define GetFrDs(pbuf) (((*(unsigned short *)(pbuf)) & \ - le16_to_cpu(_FROM_DS_)) != 0) +#define GetFrDs(pbuf) (((*(__le16 *)(pbuf)) & cpu_to_le16(_FROM_DS_)) != 0) #define ClearFrDs(pbuf)({ \ - *(unsigned short *)(pbuf) &= (~cpu_to_le16(_FROM_DS_)); \ + *(__le16 *)(pbuf) &= (~cpu_to_le16(_FROM_DS_)); \ }) #define get_tofr_ds(pframe)((GetToDs(pframe) << 1) | GetFrDs(pframe)) #define SetMFrag(pbuf) ({ \ - *(unsigned short *)(pbuf) |= cpu_to_le16(_MORE_FRAG_); \ + *(__le16 *)(pbuf) |= cpu_to_le16(_MORE_FRAG_); \ }) -#define GetMFrag(pbuf) (((*(unsigned short *)(pbuf)) & \ - le16_to_cpu(_MORE_FRAG_)) != 0) +#define GetMFrag(pbuf) (((*(__le16 *)(pbuf)) & cpu_to_le16(_MORE_FRAG_)) != 0) #define ClearMFrag(pbuf) ({ \ - *(unsigned short *)(pbuf) &= (~cpu_to_le16(_MORE_FRAG_)); \ + *(__le16 *)(pbuf) &= (~cpu_to_le16(_MORE_FRAG_)); \ }) #define SetRetry(pbuf) ({ \ - *(unsigned short *)(pbuf) |= 
cpu_to_le16(_RETRY_); \ + *(__le16 *)(pbuf) |= cpu_to_le16(_RETRY_); \ }) -#define GetRetry(pbuf) (((*(unsigned short *)(pbuf)) & \ - le16_to_cpu(_RETRY_)) != 0) +#define GetRetry(pbuf) (((*(__le16 *)(pbuf)) & cpu_to_le16(_RETRY_)) != 0) #define ClearRetry(pbuf) ({ \ - *(unsigned short *)(pbuf) &= (~cpu_to_le16(_RETRY_)); \ + *(__le16 *)(pbuf) &= (~cpu_to_le16(_RETRY_)); \ }) #define SetPwrMgt(pbuf) ({ \ - *(unsigned short *)(pbuf) |= cpu_to_le16(_PWRMGT_); \ + *(__le16 *)(pbuf) |= cpu_to_le16(_PWRMGT_); \ }) -#define GetPwrMgt(pbuf)(((*(unsigned short *)(pbuf)) & \ - le16_to_cpu(_PWRMGT_)) != 0) +#define GetPwrMgt(pbuf)(((*(__le16 *)(pbuf)) & \ + cpu_to_le16(_PWRMGT_)) != 0) #define ClearPwrMgt(pbuf) ({ \ - *(unsigned short *)(pbuf) &= (~cpu_to_le16(_PWRMGT_)); \ + *(__le16 *)(pbuf) &= (~cpu_to_le16(_PWRMGT_)); \ }) #define SetMData(pbuf) ({ \ - *(unsigned short *)(pbuf) |= cpu_to_le16(_MORE_DATA_); \ + *(__le16 *)(pbuf) |= cpu_to_le16(_MORE_DATA_); \ }) -#define GetMData(pbuf) (((*(unsigned short *)(pbuf)) & \ - le16_to_cpu(_MORE_DATA_)) != 0) +#define GetMData(pbuf) (((*(__le16 *)(pbuf)) & \ + cpu_to_le16(_MORE_DATA_)) != 0) #define ClearMData(pbuf) ({ \ - *(unsigned short *)(pbuf) &= (~cpu_to_le16(_MORE_DATA_)); \ + *(__le16 *)(pbuf) &= (~cpu_to_le16(_MORE_DATA_)); \ }) #define SetPrivacy(pbuf) ({ \ - *(unsigned short *)(pbuf) |= cpu_to_le16(_PRIVACY_); \ + *(__le16 *)(pbuf) |= cpu_to_le16(_PRIVACY_); \ }) -#define GetPrivacy(pbuf) (((*(unsigned short *)(pbuf)) & \ - le16_to_cpu(_PRIVACY_)) != 0) +#define GetPrivacy(pbuf) (((*(__le16 *)(pbuf)) & \ + cpu_to_le16(_PRIVACY_)) != 0) -#define GetOrder(pbuf) (((*(unsigned short *)(pbuf)) & \ - le16_to_cpu(_ORDER_)) != 0) +#define GetOrder(pbuf) (((*(__le16 *)(pbuf)) & \ + cpu_to_le16(_ORDER_)) != 0) #define GetFrameType(pbuf) (le16_to_cpu(*(__le16 *)(pbuf)) & \ (BIT(3) | BIT(2))) #define SetFrameType(pbuf, type) \ do {\ - *(unsigned short *)(pbuf) &= cpu_to_le16(~(BIT(3) | \ + *(__le16 *)(pbuf) &= 
cpu_to_le16(~(BIT(3) | \ BIT(2))); \ - *(unsigned short *)(pbuf) |= cpu_to_le16(type); \ + *(__le16 *)(pbuf) |= cpu_to_le16(type); \ } while (0) #define GetFrameSubType(pbuf) (le16_to_cpu(*(__le16 *)(pbuf)) & \ @@ -245,44 +241,43 @@
[PATCH 7/7] staging: r8712u: Fix Sparse warnings in rtl871x_mlme.c
Sparse reports the following:
  CHECK   drivers/staging/rtl8712/rtl871x_mlme.c
drivers/staging/rtl8712/rtl871x_mlme.c:1653:46: warning: incorrect type in assignment (different base types)
drivers/staging/rtl8712/rtl871x_mlme.c:1653:46:    expected unsigned int [unsigned] [usertype] DSConfig
drivers/staging/rtl8712/rtl871x_mlme.c:1653:46:    got restricted __le32 [usertype]
drivers/staging/rtl8712/rtl871x_mlme.c:1656:56: warning: incorrect type in assignment (different base types)
drivers/staging/rtl8712/rtl871x_mlme.c:1656:56:    expected unsigned int [unsigned] [usertype] ATIMWindow
drivers/staging/rtl8712/rtl871x_mlme.c:1656:56:    got restricted __le32 [usertype]
drivers/staging/rtl8712/rtl871x_mlme.c:1712:35: warning: incorrect type in assignment (different base types)
drivers/staging/rtl8712/rtl871x_mlme.c:1712:35:    expected restricted __le16 [addressable] [usertype] cap_info
drivers/staging/rtl8712/rtl871x_mlme.c:1712:35:    got int

Signed-off-by: Larry Finger
---
 drivers/staging/rtl8712/rtl871x_mlme.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/rtl8712/rtl871x_mlme.c b/drivers/staging/rtl8712/rtl871x_mlme.c
index fd8d96d..bf1ac22 100644
--- a/drivers/staging/rtl8712/rtl871x_mlme.c
+++ b/drivers/staging/rtl8712/rtl871x_mlme.c
@@ -1650,10 +1650,9 @@ void r8712_update_registrypriv_dev_network(struct _adapter *adapter)
 		/* TODO */
 		break;
 	}
-	pdev_network->Configuration.DSConfig = cpu_to_le32(
-					       pregistrypriv->channel);
+	pdev_network->Configuration.DSConfig = pregistrypriv->channel;
 	if (cur_network->network.InfrastructureMode == Ndis802_11IBSS)
-		pdev_network->Configuration.ATIMWindow = cpu_to_le32(3);
+		pdev_network->Configuration.ATIMWindow = 3;
 	pdev_network->InfrastructureMode = cur_network->network.InfrastructureMode;
 	/* 1. Supported rates
 	 * 2. IE
@@ -1709,12 +1708,12 @@ unsigned int r8712_restructure_ht_ie(struct _adapter *padapter, u8 *in_ie,
 		}
 		out_len = *pout_len;
 		memset(&ht_capie, 0, sizeof(struct ieee80211_ht_cap));
-		ht_capie.cap_info = IEEE80211_HT_CAP_SUP_WIDTH |
+		ht_capie.cap_info = cpu_to_le16(IEEE80211_HT_CAP_SUP_WIDTH |
 				    IEEE80211_HT_CAP_SGI_20 |
 				    IEEE80211_HT_CAP_SGI_40 |
 				    IEEE80211_HT_CAP_TX_STBC |
 				    IEEE80211_HT_CAP_MAX_AMSDU |
-				    IEEE80211_HT_CAP_DSSSCCK40;
+				    IEEE80211_HT_CAP_DSSSCCK40);
 		ht_capie.ampdu_params_info = (IEEE80211_HT_CAP_AMPDU_FACTOR & 0x03) |
 				(IEEE80211_HT_CAP_AMPDU_DENSITY & 0x00);
 		r8712_set_ie(out_ie + out_len, _HT_CAPABILITY_IE_,
-- 
2.10.2
[PATCH 1/7] staging: rtl8712: Fix some Sparse endian messages
Sparse reports the following:
  CHECK   drivers/staging/rtl8712/rtl8712_xmit.c
drivers/staging/rtl8712/rtl8712_xmit.c:564:42: warning: cast from restricted __le32
drivers/staging/rtl8712/rtl8712_xmit.c:569:42: warning: cast from restricted __le32
drivers/staging/rtl8712/rtl8712_xmit.c:571:42: warning: cast from restricted __le32

Each of these cases is transferring a quantity that is little-endian.
There is no need for conversion.

Signed-off-by: Larry Finger
---
 drivers/staging/rtl8712/rtl8712_xmit.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/rtl8712/rtl8712_xmit.c b/drivers/staging/rtl8712/rtl8712_xmit.c
index 4231a0a..7fe6265 100644
--- a/drivers/staging/rtl8712/rtl8712_xmit.c
+++ b/drivers/staging/rtl8712/rtl8712_xmit.c
@@ -561,14 +561,14 @@ static void update_txdesc(struct xmit_frame *pxmitframe, uint *pmem, int sz)
 			ptxdesc_mp = &txdesc_mp;
 			/* offset 8 */
-			ptxdesc->txdw2 = cpu_to_le32(ptxdesc_mp->txdw2);
+			ptxdesc->txdw2 = ptxdesc_mp->txdw2;
 			if (bmcst)
 				ptxdesc->txdw2 |= cpu_to_le32(BMC);
 			ptxdesc->txdw2 |= cpu_to_le32(BK);
 			/* offset 16 */
-			ptxdesc->txdw4 = cpu_to_le32(ptxdesc_mp->txdw4);
+			ptxdesc->txdw4 = ptxdesc_mp->txdw4;
 			/* offset 20 */
-			ptxdesc->txdw5 = cpu_to_le32(ptxdesc_mp->txdw5);
+			ptxdesc->txdw5 = ptxdesc_mp->txdw5;
 			pattrib->pctrl = 0;/* reset to zero; */
 		}
 	} else if (pxmitframe->frame_tag == MGNT_FRAMETAG) {
-- 
2.10.2
[PATCH 2/7] staging: rtl8712u: Fix endian settings for structs describing network packets
The headers describing a number of network packets do not have the correct endian settings for several types of data. Signed-off-by: Larry Finger--- drivers/staging/rtl8712/ieee80211.h | 84 ++--- 1 file changed, 42 insertions(+), 42 deletions(-) diff --git a/drivers/staging/rtl8712/ieee80211.h b/drivers/staging/rtl8712/ieee80211.h index 67ab580..68fd65e 100644 --- a/drivers/staging/rtl8712/ieee80211.h +++ b/drivers/staging/rtl8712/ieee80211.h @@ -138,51 +138,51 @@ struct ieee_ibss_seq { }; struct ieee80211_hdr { - u16 frame_ctl; - u16 duration_id; + __le16 frame_ctl; + __le16 duration_id; u8 addr1[ETH_ALEN]; u8 addr2[ETH_ALEN]; u8 addr3[ETH_ALEN]; - u16 seq_ctl; + __le16 seq_ctl; u8 addr4[ETH_ALEN]; -} __packed; +} __packed __aligned(2); struct ieee80211_hdr_3addr { - u16 frame_ctl; - u16 duration_id; + __le16 frame_ctl; + __le16 duration_id; u8 addr1[ETH_ALEN]; u8 addr2[ETH_ALEN]; u8 addr3[ETH_ALEN]; - u16 seq_ctl; -} __packed; + __le16 seq_ctl; +} __packed __aligned(2); struct ieee80211_hdr_qos { - u16 frame_ctl; - u16 duration_id; + __le16 frame_ctl; + __le16 duration_id; u8 addr1[ETH_ALEN]; u8 addr2[ETH_ALEN]; u8 addr3[ETH_ALEN]; - u16 seq_ctl; + __le16 seq_ctl; u8 addr4[ETH_ALEN]; - u16 qc; -} __packed; + __le16 qc; +} __packed __aligned(2); struct ieee80211_hdr_3addr_qos { - u16 frame_ctl; - u16 duration_id; + __le16 frame_ctl; + __le16 duration_id; u8 addr1[ETH_ALEN]; u8 addr2[ETH_ALEN]; u8 addr3[ETH_ALEN]; - u16 seq_ctl; - u16 qc; + __le16 seq_ctl; + __le16 qc; } __packed; struct eapol { u8 snap[6]; - u16 ethertype; + __be16 ethertype; u8 version; u8 type; - u16 length; + __le16 length; } __packed; enum eap_type { @@ -514,13 +514,13 @@ struct ieee80211_security { */ struct ieee80211_header_data { - u16 frame_ctl; - u16 duration_id; + __le16 frame_ctl; + __le16 duration_id; u8 addr1[6]; u8 addr2[6]; u8 addr3[6]; - u16 seq_ctrl; -}; + __le16 seq_ctrl; +} __packed __aligned(2); #define BEACON_PROBE_SSID_ID_POSITION 12 @@ -552,18 +552,18 @@ struct 
ieee80211_info_element { /* * These are the data types that can make up management packets * - u16 auth_algorithm; - u16 auth_sequence; - u16 beacon_interval; - u16 capability; + __le16 auth_algorithm; + __le16 auth_sequence; + __le16 beacon_interval; + __le16 capability; u8 current_ap[ETH_ALEN]; - u16 listen_interval; + __le16 listen_interval; struct { u16 association_id:14, reserved:2; } __packed; - u32 time_stamp[2]; - u16 reason; - u16 status; + __le32 time_stamp[2]; + __le16 reason; + __le16 status; */ #define IEEE80211_DEFAULT_TX_ESSID "Penguin" @@ -571,16 +571,16 @@ struct ieee80211_info_element { struct ieee80211_authentication { struct ieee80211_header_data header; - u16 algorithm; - u16 transaction; - u16 status; + __le16 algorithm; + __le16 transaction; + __le16 status; } __packed; struct ieee80211_probe_response { struct ieee80211_header_data header; - u32 time_stamp[2]; - u16 beacon_interval; - u16 capability; + __le32 time_stamp[2]; + __le16 beacon_interval; + __le16 capability; struct ieee80211_info_element info_element; } __packed; @@ -590,16 +590,16 @@ struct ieee80211_probe_request { struct ieee80211_assoc_request_frame { struct ieee80211_hdr_3addr header; - u16 capability; - u16 listen_interval; + __le16 capability; + __le16 listen_interval; struct ieee80211_info_element_hdr info_element; } __packed; struct ieee80211_assoc_response_frame { struct ieee80211_hdr_3addr header; - u16 capability; - u16 status; - u16 aid; + __le16 capability; + __le16 status; + __le16 aid; } __packed; struct ieee80211_txb { -- 2.10.2
[PATCH 4/7] staging: r8712u: Fix Sparse warning in rtl871x_xmit.c
Sparse reports the following:
  CHECK   drivers/staging/rtl8712/rtl871x_xmit.c
drivers/staging/rtl8712/rtl871x_xmit.c:350:44: warning: restricted __le32 degrades to integer
drivers/staging/rtl8712/rtl871x_xmit.c:491:23: warning: incorrect type in initializer (different base types)
drivers/staging/rtl8712/rtl871x_xmit.c:491:23:    expected unsigned short [usertype] *fctrl
drivers/staging/rtl8712/rtl871x_xmit.c:491:23:    got restricted __le16 *
drivers/staging/rtl8712/rtl871x_xmit.c:580:36: warning: incorrect type in assignment (different base types)
drivers/staging/rtl8712/rtl871x_xmit.c:580:36:    expected unsigned short [unsigned] [short] [usertype]
drivers/staging/rtl8712/rtl871x_xmit.c:580:36:    got restricted __be16 [usertype]

Signed-off-by: Larry Finger
---
 drivers/staging/rtl8712/rtl871x_xmit.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/rtl8712/rtl871x_xmit.c b/drivers/staging/rtl8712/rtl871x_xmit.c
index 4ab82ba..de88819 100644
--- a/drivers/staging/rtl8712/rtl871x_xmit.c
+++ b/drivers/staging/rtl8712/rtl871x_xmit.c
@@ -347,7 +347,8 @@ sint r8712_update_attrib(struct _adapter *padapter, _pkt *pkt,
 	 * some settings above.
 	 */
 	if (check_fwstate(pmlmepriv, WIFI_MP_STATE))
-		pattrib->priority = (txdesc.txdw1 >> QSEL_SHT) & 0x1f;
+		pattrib->priority =
+		    (le32_to_cpu(txdesc.txdw1) >> QSEL_SHT) & 0x1f;
 	return _SUCCESS;
 }
 
@@ -488,7 +489,7 @@ static sint make_wlanhdr(struct _adapter *padapter, u8 *hdr,
 	struct ieee80211_hdr *pwlanhdr = (struct ieee80211_hdr *)hdr;
 	struct mlme_priv *pmlmepriv = &padapter->mlmepriv;
 	struct qos_priv *pqospriv = &pmlmepriv->qospriv;
-	u16 *fctrl = &pwlanhdr->frame_ctl;
+	__le16 *fctrl = &pwlanhdr->frame_ctl;
 
 	memset(hdr, 0, WLANHDR_OFFSET);
 	SetFrameSubType(fctrl, pattrib->subtype);
@@ -577,7 +578,7 @@ static sint r8712_put_snap(u8 *data, u16 h_proto)
 	snap->oui[0] = oui[0];
 	snap->oui[1] = oui[1];
 	snap->oui[2] = oui[2];
-	*(u16 *)(data + SNAP_SIZE) = htons(h_proto);
+	*(__be16 *)(data + SNAP_SIZE) = htons(h_proto);
 	return SNAP_SIZE + sizeof(u16);
 }
 
-- 
2.10.2
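The le32_to_cpu() change in r8712_update_attrib() amounts to decoding a
little-endian word before shifting and masking it. A portable userspace
sketch of that idea follows; load_le32() and le32_field() are hypothetical
helpers, and the shift and mask are passed in because the value of QSEL_SHT
is not shown in the patch.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for le32_to_cpu() applied to a raw byte buffer:
 * assemble the value byte by byte so the result does not depend on the
 * host's endianness.
 */
static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/* Extract a bit field from a little-endian descriptor word. */
static uint32_t le32_field(const uint8_t *word, unsigned int shift,
			   uint32_t mask)
{
	return (load_le32(word) >> shift) & mask;
}
```

For the bytes 00 13 00 00, le32_field(buf, 8, 0x1f) yields 0x13 on any host.
Shifting the raw word without the conversion would only give that answer on
little-endian machines, which is the situation Sparse flags.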
[PATCH 5/7] staging: r8712u: Fix Sparse endian warning in rtl871x_recv.c
Sparse reports the following:
  CHECK   drivers/staging/rtl8712/rtl871x_recv.c
drivers/staging/rtl8712/rtl871x_recv.c:657:21: warning: incorrect type in assignment (different base types)
drivers/staging/rtl8712/rtl871x_recv.c:657:21:    expected unsigned short [unsigned] [assigned] [usertype] len
drivers/staging/rtl8712/rtl871x_recv.c:657:21:    got restricted __be16 [usertype]

Signed-off-by: Larry Finger
---
 drivers/staging/rtl8712/rtl871x_recv.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/rtl8712/rtl871x_recv.c b/drivers/staging/rtl8712/rtl871x_recv.c
index 147b75b..2ef31a4 100644
--- a/drivers/staging/rtl8712/rtl871x_recv.c
+++ b/drivers/staging/rtl8712/rtl871x_recv.c
@@ -654,8 +654,9 @@ sint r8712_wlanhdr_to_ethhdr(union recv_frame *precvframe)
 	memcpy(ptr, pattrib->dst, ETH_ALEN);
 	memcpy(ptr + ETH_ALEN, pattrib->src, ETH_ALEN);
 	if (!bsnaphdr) {
-		len = htons(len);
-		memcpy(ptr + 12, &len, 2);
+		__be16 be_tmp = htons(len);
+
+		memcpy(ptr + 12, &be_tmp, 2);
 	}
 	return _SUCCESS;
 }
-- 
2.10.2
[PATCH 0/7] Fix Sparse endian warnings in r8712u
Now that endian checking is an automatic part of Sparse, it is advisable
to fix these warnings under controlled conditions, which include testing
on big-endian hardware. This set of patches fixes all the issues.

Signed-off-by: Larry Finger

Larry Finger (7):
  staging: r8712u: Fix some Sparse endian messages
  staging: r8712u: Fix endian settings for structs describing network
    packets
  staging: r8712u: Fix macros used to read/write the TX/RX descriptors
  staging: r8712u: Fix Sparse warning in rtl871x_xmit.c
  staging: r8712u: Fix Sparse endian warning in rtl871x_recv.c
  staging: r8712u: Fix Sparse warnings in rtl871x_ioctl_linux.c
  staging: r8712u: Fix Sparse warnings in rtl871x_mlme.c

 drivers/staging/rtl8712/ieee80211.h           |  84 ++--
 drivers/staging/rtl8712/rtl8712_xmit.c        |   6 +-
 drivers/staging/rtl8712/rtl871x_ioctl_linux.c |   4 +-
 drivers/staging/rtl8712/rtl871x_mlme.c        |   9 +--
 drivers/staging/rtl8712/rtl871x_recv.c        |   5 +-
 drivers/staging/rtl8712/rtl871x_xmit.c        |   7 +-
 drivers/staging/rtl8712/wifi.h                | 109 --
 7 files changed, 110 insertions(+), 114 deletions(-)

-- 
2.10.2
[PATCH 6/7] staging: r8712u: Fix Sparse warnings in rtl871x_ioctl_linux.c
Sparse reports the following:
  CHECK   drivers/staging/rtl8712/rtl871x_ioctl_linux.c
drivers/staging/rtl8712/rtl871x_ioctl_linux.c:1422:46: warning: restricted __le16 degrades to integer
drivers/staging/rtl8712/rtl871x_ioctl_linux.c:1424:46: warning: restricted __le16 degrades to integer

Signed-off-by: Larry Finger
---
 drivers/staging/rtl8712/rtl871x_ioctl_linux.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/rtl8712/rtl871x_ioctl_linux.c b/drivers/staging/rtl8712/rtl871x_ioctl_linux.c
index 0dc18d6..f4167f1 100644
--- a/drivers/staging/rtl8712/rtl871x_ioctl_linux.c
+++ b/drivers/staging/rtl8712/rtl871x_ioctl_linux.c
@@ -1419,9 +1419,9 @@ static int r8711_wx_get_rate(struct net_device *dev,
 			ht_cap = true;
 			pht_capie = (struct ieee80211_ht_cap *)(p + 2);
 			memcpy(&mcs_rate, pht_capie->supp_mcs_set, 2);
-			bw_40MHz = (pht_capie->cap_info &
+			bw_40MHz = (le16_to_cpu(pht_capie->cap_info) &
 				    IEEE80211_HT_CAP_SUP_WIDTH) ? 1 : 0;
-			short_GI = (pht_capie->cap_info &
+			short_GI = (le16_to_cpu(pht_capie->cap_info) &
 				    (IEEE80211_HT_CAP_SGI_20 |
 				    IEEE80211_HT_CAP_SGI_40)) ? 1 : 0;
 		}
-- 
2.10.2
Re: [PATH v2 net-next] net: remove member 'max' of struct scm_fp_list
Hi,

Yes, that was my misunderstanding: it is an error when the list is used after a dup.
Can we do a full-size (SCM_MAX_FD) dup?

On Sat, 2017-02-11 at 10:36 +0800, yuan linyu wrote:
> From: yuan linyu
>
> 'max' only used at three places in scm.c,
> 1. in scm_fp_copy(), fpl->max = SCM_MAX_FD;
> 2. in scm_fp_copy(), if (fpl->count + num > fpl->max)
> 3. in scm_fp_dup(), new_fpl->max = new_fpl->count;
> at place 2, fpl->max can be replaced with SCM_MAX_FD.
> no other place read this 'max' again, so it can be removed.
>
> Signed-off-by: yuan linyu
> ---
> v1->v2:
> 	update commit log to describe correct reason to remove 'max'
>
>  include/net/scm.h | 3 +--
>  net/core/scm.c    | 20 +---
>  2 files changed, 6 insertions(+), 17 deletions(-)
>
> diff --git a/include/net/scm.h b/include/net/scm.h
> index 59fa93c..1301227 100644
> --- a/include/net/scm.h
> +++ b/include/net/scm.h
> @@ -19,8 +19,7 @@ struct scm_creds {
>  };
>  
>  struct scm_fp_list {
> -	short			count;
> -	short			max;
> +	unsigned int		count;
>  	struct user_struct	*user;
>  	struct file		*fp[SCM_MAX_FD];
>  };
> diff --git a/net/core/scm.c b/net/core/scm.c
> index b6d8368..53679517 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -69,15 +69,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
>  	int *fdp = (int*)CMSG_DATA(cmsg);
>  	struct scm_fp_list *fpl = *fplp;
>  	struct file **fpp;
> -	int i, num;
> -
> -	num = (cmsg->cmsg_len - sizeof(struct cmsghdr))/sizeof(int);
> -
> -	if (num <= 0)
> -		return 0;
> -
> -	if (num > SCM_MAX_FD)
> -		return -EINVAL;
> +	unsigned int i, num;
>  
>  	if (!fpl)
>  	{
> @@ -86,18 +78,17 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
>  			return -ENOMEM;
>  		*fplp = fpl;
>  		fpl->count = 0;
> -		fpl->max = SCM_MAX_FD;
>  		fpl->user = NULL;
>  	}
> -	fpp = &fpl->fp[fpl->count];
>  
> -	if (fpl->count + num > fpl->max)
> +	num = (cmsg->cmsg_len - sizeof(struct cmsghdr))/sizeof(int);
> +	if (fpl->count + num > SCM_MAX_FD)
>  		return -EINVAL;
>  
>  	/*
>  	 *	Verify the descriptors and increment the usage count.
>  	 */
> -
> +	fpp = &fpl->fp[fpl->count];
>  	for (i=0; i< num; i++)
>  	{
>  		int fd = fdp[i];
> @@ -112,7 +103,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
>  	if (!fpl->user)
>  		fpl->user = get_uid(current_user());
>  
> -	return num;
> +	return 0;
>  }
>  
>  void __scm_destroy(struct scm_cookie *scm)
> @@ -341,7 +332,6 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
>  	if (new_fpl) {
>  		for (i = 0; i < fpl->count; i++)
>  			get_file(fpl->fp[i]);
> -		new_fpl->max = new_fpl->count;
>  		new_fpl->user = get_uid(fpl->user);
>  	}
>  	return new_fpl;
[PATH v2 net-next] net: remove member 'max' of struct scm_fp_list
From: yuan linyu

'max' only used at three places in scm.c,
1. in scm_fp_copy(), fpl->max = SCM_MAX_FD;
2. in scm_fp_copy(), if (fpl->count + num > fpl->max)
3. in scm_fp_dup(), new_fpl->max = new_fpl->count;
at place 2, fpl->max can be replaced with SCM_MAX_FD.
no other place read this 'max' again, so it can be removed.

Signed-off-by: yuan linyu
---
v1->v2:
	update commit log to describe correct reason to remove 'max'

 include/net/scm.h | 3 +--
 net/core/scm.c    | 20 +---
 2 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/include/net/scm.h b/include/net/scm.h
index 59fa93c..1301227 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -19,8 +19,7 @@ struct scm_creds {
 };
 
 struct scm_fp_list {
-	short			count;
-	short			max;
+	unsigned int		count;
 	struct user_struct	*user;
 	struct file		*fp[SCM_MAX_FD];
 };
diff --git a/net/core/scm.c b/net/core/scm.c
index b6d8368..53679517 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -69,15 +69,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
 	int *fdp = (int*)CMSG_DATA(cmsg);
 	struct scm_fp_list *fpl = *fplp;
 	struct file **fpp;
-	int i, num;
-
-	num = (cmsg->cmsg_len - sizeof(struct cmsghdr))/sizeof(int);
-
-	if (num <= 0)
-		return 0;
-
-	if (num > SCM_MAX_FD)
-		return -EINVAL;
+	unsigned int i, num;
 
 	if (!fpl)
 	{
@@ -86,18 +78,17 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
 			return -ENOMEM;
 		*fplp = fpl;
 		fpl->count = 0;
-		fpl->max = SCM_MAX_FD;
 		fpl->user = NULL;
 	}
-	fpp = &fpl->fp[fpl->count];
 
-	if (fpl->count + num > fpl->max)
+	num = (cmsg->cmsg_len - sizeof(struct cmsghdr))/sizeof(int);
+	if (fpl->count + num > SCM_MAX_FD)
 		return -EINVAL;
 
 	/*
 	 *	Verify the descriptors and increment the usage count.
 	 */
-
+	fpp = &fpl->fp[fpl->count];
 	for (i=0; i< num; i++)
 	{
 		int fd = fdp[i];
@@ -112,7 +103,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
 	if (!fpl->user)
 		fpl->user = get_uid(current_user());
 
-	return num;
+	return 0;
 }
 
 void __scm_destroy(struct scm_cookie *scm)
@@ -341,7 +332,6 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
 	if (new_fpl) {
 		for (i = 0; i < fpl->count; i++)
 			get_file(fpl->fp[i]);
-		new_fpl->max = new_fpl->count;
 		new_fpl->user = get_uid(fpl->user);
 	}
 	return new_fpl;
-- 
2.7.4
Re: [RFC PATCH net] bpf: introduce BPF_F_ALLOW_OVERRIDE flag
On 2/10/17 1:38 PM, Andy Lutomirski wrote:

On Thu, Feb 9, 2017 at 10:59 AM, Alexei Starovoitov wrote:

If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command to the given cgroup the descendent cgroup will be able to override effective bpf program that was inherited from this cgroup. By default it's not passed, therefore override is disallowed.

Examples:
1. prog X attached to /A with default
   prog Y fails to attach to /A/B and /A/B/C
   Everything under /A runs prog X
2. prog X attached to /A with ALLOW_OVERRIDE
   prog Y attached to /A/B with default.
   Everything under /A/B runs prog Y

I think that, for ease of future extension, Y should also need ALLOW_OVERRIDE. Otherwise, when non-overridable hooks can stack, there could be confusion as to whether Y should override something or should stack.

I see. Fair enough. It's indeed easier for future extensions.

2. we can add another flag to reverse this call order too. Instead of calling the progs from child to parent, do parent to child.

I think the order should depend on the hook. Hooks for process-initiated actions (egress, socket creation) should run innermost first and hooks for outside actions (ingress) should be outermost first.

There are use cases where both ingress and egress would want both ordering. Like the monitoring would want to see the bytes that app wants to send and it would want to see the bytes that it's actually sending. So if something in the middle wants to drop due to whatever conditions, the monitoring needs to be the first and the last in the prog chain. That's one of the use cases for 'attach_priority'. Some high priority can be reserved for debugging and so on.

Andy, does it all make sense?

Yes with the caveat above.

great!

Do you still insist on submitting this patch officially?

I'm not sure what you mean.

it's an RFC. In netdev we never apply rfc patches.

or you're ok keeping it overridable for now.

I really think the default should change for 4.10. People are going

fine. will respin with requested change.
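The two examples in the RFC description can be modelled with a small userspace sketch. This is only a toy model of the semantics as stated in the examples, not the kernel implementation: `struct cg`, `cg_attach()`, `cg_effective()` and `ALLOW_OVERRIDE` are hypothetical names, programs are plain integers, and the change requested in this thread (requiring the flag on Y as well) is deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define ALLOW_OVERRIDE 1u	/* stand-in for BPF_F_ALLOW_OVERRIDE */

/* Hypothetical cgroup node for the model. */
struct cg {
	struct cg *parent;
	int prog;		/* 0 means no program attached here */
	unsigned int flags;	/* flags used when prog was attached */
};

/*
 * Attach prog to c. Returns 0 on success, -1 if the nearest ancestor
 * that has a program attached did not allow overriding.
 */
static int cg_attach(struct cg *c, int prog, unsigned int flags)
{
	const struct cg *p;

	assert(c != NULL);
	for (p = c->parent; p; p = p->parent) {
		if (p->prog) {
			if (!(p->flags & ALLOW_OVERRIDE))
				return -1;
			break;	/* nearest ancestor with a program decides */
		}
	}
	c->prog = prog;
	c->flags = flags;
	return 0;
}

/* Effective program: own program if any, else the nearest ancestor's. */
static int cg_effective(const struct cg *c)
{
	for (; c; c = c->parent)
		if (c->prog)
			return c->prog;
	return 0;
}
```

With three nodes /A, /A/B and /A/B/C, attaching X to /A without the flag makes every attach below it fail, while attaching X with the flag lets Y on /A/B become the effective program for /A/B and /A/B/C.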
Re: [PATCH] net: remove member 'max' of struct scm_fp_list
On Fri, 2017-02-10 at 10:25 -0500, David Miller wrote:
> From: yuan linyu
> Date: Fri, 10 Feb 2017 20:11:13 +0800
>
> > From: yuan linyu
> >
> > SCM_MAX_FD can fully replace it.
> >
> > Signed-off-by: yuan linyu
>
> I don't think so:
>
> > @@ -341,7 +332,6 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
> >  	if (new_fpl) {
> >  		for (i = 0; i < fpl->count; i++)
> >  			get_file(fpl->fp[i]);
> > -		new_fpl->max = new_fpl->count;
> >  		new_fpl->user = get_uid(fpl->user);
>
> It's not set to SCM_MAX_FD here, it's set to whatever fpl->count is.
>
> In other words, your patch breaks things.

Maybe it was not good to say "SCM_MAX_FD can fully replace it".
Actually the 'max' field is useless; the 'count' field is enough.
[PATCH] net: ethernet: ti: cpsw: return NET_XMIT_DROP if skb_padto failed
If skb_padto() fails, the skb has already been freed. It was consumed but not sent, so there is no need to update the queue tx time, etc. Return NET_XMIT_DROP instead, as it is more appropriate.

Signed-off-by: Ivan Khoronzhuk
---
Based on net-next/master

 drivers/net/ethernet/ti/cpsw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 4d1c0c3..503fa8a 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1604,7 +1604,7 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff *skb,
 	if (skb_padto(skb, CPSW_MIN_PACKET_SIZE)) {
 		cpsw_err(priv, tx_err, "packet pad failed\n");
 		ndev->stats.tx_dropped++;
-		return NETDEV_TX_OK;
+		return NET_XMIT_DROP;
 	}
 
 	if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP &&
-- 
2.7.4
Re: [PATCH 2/3] Bluetooth: cmtp: fix possible might sleep error in cmtp_session
Hi,

On Tue, Jan 24, 2017 at 12:07:50PM +0800, Jeffy Chen wrote:
> It looks like cmtp_session has same pattern as the issue reported in
> old rfcomm:
>
> 	while (1) {
> 		set_current_state(TASK_INTERRUPTIBLE);
> 		if (condition)
> 			break;
> 		// may call might_sleep here
> 		schedule();
> 	}
> 	__set_current_state(TASK_RUNNING);
>
> Which fixed at:
> 	dfb2fae Bluetooth: Fix nested sleeps
>
> So let's fix it at the same way, also follow the suggestion of:
> https://lwn.net/Articles/628628/
>
> Signed-off-by: Jeffy Chen
> ---
>
>  net/bluetooth/cmtp/core.c | 21 ++---
>  1 file changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/net/bluetooth/cmtp/core.c b/net/bluetooth/cmtp/core.c
> index 9e59b66..6b03f2b 100644
> --- a/net/bluetooth/cmtp/core.c
> +++ b/net/bluetooth/cmtp/core.c
> @@ -280,16 +280,16 @@ static int cmtp_session(void *arg)
>  	struct cmtp_session *session = arg;
>  	struct sock *sk = session->sock->sk;
>  	struct sk_buff *skb;
> -	wait_queue_t wait;
> +	DEFINE_WAIT_FUNC(wait, woken_wake_function);
>  
>  	BT_DBG("session %p", session);
>  
>  	set_user_nice(current, -15);
>  
> -	init_waitqueue_entry(&wait, current);
>  	add_wait_queue(sk_sleep(sk), &wait);
>  	while (1) {
> -		set_current_state(TASK_INTERRUPTIBLE);
> +		/* Ensure session->terminate is updated */
> +		smp_mb__before_atomic();
>  
>  		if (atomic_read(&session->terminate))
>  			break;
> @@ -306,9 +306,8 @@ static int cmtp_session(void *arg)
>  
>  		cmtp_process_transmit(session);
>  
> -		schedule();
> +		wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
>  	}
> -	__set_current_state(TASK_RUNNING);
>  	remove_wait_queue(sk_sleep(sk), &wait);
>  
>  	down_write(&cmtp_session_sem);
> @@ -393,7 +392,11 @@ int cmtp_add_connection(struct cmtp_connadd_req *req, struct socket *sock)
>  		err = cmtp_attach_device(session);
>  		if (err < 0) {
>  			atomic_inc(&session->terminate);
> -			wake_up_process(session->task);
> +
> +			/* Ensure session->terminate is updated */
> +			smp_mb__after_atomic();
> +

Same comment about the barrier.

> +			wake_up_interruptible(sk_sleep(session->sock->sk));
>  			up_write(&cmtp_session_sem);
>  			return err;
>  		}
> @@ -431,7 +434,11 @@ int cmtp_del_connection(struct cmtp_conndel_req *req)
>  
>  		/* Stop session thread */
>  		atomic_inc(&session->terminate);
> -		wake_up_process(session->task);
> +
> +		/* Ensure session->terminate is updated */
> +		smp_mb__after_atomic();

And again.

But otherwise I think this looks OK, again with the caveat that I don't
know Bluetooth/CMTP that well:

Reviewed-by: Brian Norris

> +
> +		wake_up_interruptible(sk_sleep(session->sock->sk));
>  	} else
>  		err = -ENOENT;
>  
> -- 
> 2.1.4
>
>
Re: [PATCH 1/3] Bluetooth: bnep: fix possible might sleep error in bnep_session
Hi,

On Tue, Jan 24, 2017 at 12:07:49PM +0800, Jeffy Chen wrote:
> It looks like bnep_session has same pattern as the issue reported in
> old rfcomm:
>
> 	while (1) {
> 		set_current_state(TASK_INTERRUPTIBLE);
> 		if (condition)
> 			break;
> 		// may call might_sleep here
> 		schedule();
> 	}
> 	__set_current_state(TASK_RUNNING);
>
> Which fixed at:
> 	dfb2fae Bluetooth: Fix nested sleeps
>
> So let's fix it at the same way, also follow the suggestion of:
> https://lwn.net/Articles/628628/
>
> Signed-off-by: Jeffy Chen
> ---
>
>  net/bluetooth/bnep/core.c | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/net/bluetooth/bnep/core.c b/net/bluetooth/bnep/core.c
> index fbf251f..da04d51 100644
> --- a/net/bluetooth/bnep/core.c
> +++ b/net/bluetooth/bnep/core.c
> @@ -484,16 +484,16 @@ static int bnep_session(void *arg)
>  	struct net_device *dev = s->dev;
>  	struct sock *sk = s->sock->sk;
>  	struct sk_buff *skb;
> -	wait_queue_t wait;
> +	DEFINE_WAIT_FUNC(wait, woken_wake_function);
>  
>  	BT_DBG("");
>  
>  	set_user_nice(current, -15);
>  
> -	init_waitqueue_entry(&wait, current);
>  	add_wait_queue(sk_sleep(sk), &wait);
>  	while (1) {
> -		set_current_state(TASK_INTERRUPTIBLE);
> +		/* Ensure session->terminate is updated */
> +		smp_mb__before_atomic();
>  
>  		if (atomic_read(&s->terminate))
>  			break;
> @@ -515,9 +515,8 @@ static int bnep_session(void *arg)
>  				break;
>  		netif_wake_queue(dev);
>  
> -		schedule();
> +		wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
>  	}
> -	__set_current_state(TASK_RUNNING);
>  	remove_wait_queue(sk_sleep(sk), &wait);
>  
>  	/* Cleanup session */
> @@ -666,7 +665,11 @@ int bnep_del_connection(struct bnep_conndel_req *req)
>  	s = __bnep_get_session(req->dst);
>  	if (s) {
>  		atomic_inc(&s->terminate);
> -		wake_up_process(s->task);
> +
> +		/* Ensure session->terminate is updated */
> +		smp_mb__after_atomic();
> +

__wake_up() suggests:

 * It may be assumed that this function implies a write memory barrier before
 * changing the task state if and only if any tasks are woken up.

so the above barrier is probably unnecessary. I'm not so sure about the
one before atomic_read(); seems fine.

Other than that, I think this looks OK:

Reviewed-by: Brian Norris

But I haven't been testing BNEP.

Brian

> +		wake_up_interruptible(sk_sleep(s->sock->sk));
>  	} else
>  		err = -ENOENT;
>  
> -- 
> 2.1.4
>
>
Re: [PATCH 3/3] Bluetooth: hidp: fix possible might sleep error in hidp_session_thread
Hi Jeffy,

I'm really not an expert on bluetooth or HIDP, but I can't bring myself
to say that this is correct. I still think you have a problem.

On Tue, Jan 24, 2017 at 12:07:51PM +0800, Jeffy Chen wrote:
> It looks like hidp_session_thread has same pattern as the issue reported in
> old rfcomm:
>
> 	while (1) {
> 		set_current_state(TASK_INTERRUPTIBLE);
> 		if (condition)
> 			break;
> 		// may call might_sleep here
> 		schedule();
> 	}
> 	__set_current_state(TASK_RUNNING);
>
> Which fixed at:
> 	dfb2fae Bluetooth: Fix nested sleeps
>
> So let's fix it at the same way, also follow the suggestion of:
> https://lwn.net/Articles/628628/
>
> Signed-off-by: Jeffy Chen
> ---
>
>  net/bluetooth/hidp/core.c | 23 +++
>  1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
> index 0bec458..43d6e6a 100644
> --- a/net/bluetooth/hidp/core.c
> +++ b/net/bluetooth/hidp/core.c
> @@ -36,6 +36,7 @@
>  #define VERSION "1.2"
>  
>  static DECLARE_RWSEM(hidp_session_sem);
> +static DECLARE_WAIT_QUEUE_HEAD(hidp_session_wq);
>  static LIST_HEAD(hidp_session_list);
>  
>  static unsigned char hidp_keycode[256] = {
> @@ -1068,12 +1069,15 @@ static int hidp_session_start_sync(struct hidp_session *session)
>   * Wake up session thread and notify it to stop. This is asynchronous and
>   * returns immediately. Call this whenever a runtime error occurs and you want
>   * the session to stop.
> - * Note: wake_up_process() performs any necessary memory-barriers for us.
>   */
>  static void hidp_session_terminate(struct hidp_session *session)
>  {
>  	atomic_inc(&session->terminate);
> -	wake_up_process(session->task);
> +
> +	/* Ensure session->terminate is updated */
> +	smp_mb__after_atomic();
> +
> +	wake_up_interruptible(&hidp_session_wq);

So, you're adding a whole new wait queue here.

>  }
>  
>  /*
> @@ -1180,7 +1184,9 @@ static void hidp_session_run(struct hidp_session *session)
>  	struct sock *ctrl_sk = session->ctrl_sock->sk;
>  	struct sock *intr_sk = session->intr_sock->sk;
>  	struct sk_buff *skb;
> +	DEFINE_WAIT_FUNC(wait, woken_wake_function);
>  
> +	add_wait_queue(&hidp_session_wq, &wait);
>  	for (;;) {
>  		/*
>  		 * This thread can be woken up two ways:
> @@ -1188,12 +1194,10 @@ static void hidp_session_run(struct hidp_session *session)
>  		 *    session->terminate flag and wakes this thread up.
>  		 *  - Via modifying the socket state of ctrl/intr_sock. This
>  		 *    thread is woken up by ->sk_state_changed().
> -		 *
> -		 * Note: set_current_state() performs any necessary
> -		 * memory-barriers for us.
>  		 */
> -		set_current_state(TASK_INTERRUPTIBLE);
>  
> +		/* Ensure session->terminate is updated */
> +		smp_mb__before_atomic();
>  		if (atomic_read(&session->terminate))
>  			break;
>  
> @@ -1227,11 +1231,14 @@ static void hidp_session_run(struct hidp_session *session)
>  		hidp_process_transmit(session, &session->ctrl_transmit,
>  				      session->ctrl_sock);
>  
> -		schedule();
> +		wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);

And you're waiting on it here. But you're already on two other wait
queues (hidp_session_thread()). So the nice WQ_FLAG_WOKEN handling will
only happen if you get woken via the new hidp_session_wq queue. But
what about the other two? Seems like again you might have a race
condition that would lead you to (temporarily, at least?) missing a
wake-up attempt.

I'm not really sure what the best way to resolve this would be. My best
guess would be to either consolidate the use of these wait queues, or
else roll a version of wait_woken() to handle 2 or more wait heads...

Am I wrong? I easily could be.

Brian

>  	}
> +	remove_wait_queue(&hidp_session_wq, &wait);
>  
>  	atomic_inc(&session->terminate);
> -	set_current_state(TASK_RUNNING);
> +
> +	/* Ensure session->terminate is updated */
> +	smp_mb__after_atomic();
>  }
>  
>  /*
> -- 
> 2.1.4
>
>
Re: [PATCH] net: ethernet: ti: netcp_core: return netdev_tx_t in xmit
On Fri, Feb 10, 2017 at 02:45:21PM -0500, David Miller wrote:
> From: Ivan Khoronzhuk
> Date: Thu, 9 Feb 2017 16:24:14 +0200
>
> > @@ -1300,7 +1301,7 @@ static int netcp_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> >  			dev_warn(netcp->ndev_dev, "padding failed (%d), packet dropped\n",
> >  				 ret);
> >  			tx_stats->tx_dropped++;
> > -			return ret;
> > +			return NETDEV_TX_BUSY;
> >  		}
> >  		skb->len = NETCP_MIN_PACKET_SIZE;
> >  	}
> > @@ -1329,7 +1330,7 @@ static int netcp_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> >  	if (desc)
> >  		netcp_free_tx_desc_chain(netcp, desc, sizeof(*desc));
> >  	dev_kfree_skb(skb);
> > -	return ret;
> > +	return NETDEV_TX_BUSY;
> >  }
>
> I really think these should be returning NET_XMIT_DROP.

Yes, it seems a few more changes are needed here then; I will send a new
version later.
Re: net: hix5hd2_gmac uninitialized net_device
On Fri, Feb 10, 2017 at 06:21:35PM +0800, Dongpo Li wrote:
> I think the error "No irq resource" happened for some other reason, has no relation with
> the info "(unnamed net_device) (uninitialized):".
> You can add more debug info to find bug.

Do you have any particular suggestions as to what to check out, or is
this just a general 'debug more' instruction?

> Yes, I agree with you that the ndev has not been initialized completely,
> because the function "register_netdev" has not been called yet.
> It's better to use the "dev_err" to replace the "netdev_err".
>

Ah, I see. So, prior to line 1266's call to register_netdev, it will
always be uninitialized and unnamed, regardless of what is or isn't
right elsewhere. Good to know. So, I could replace these netdev_err
with dev_err for now, up until that point, so I can get a bit more
info, yes?

>
> Regards,
> Dongpo
>

Regards,
Marty
Re: [PATCH v3 net-next 4/9] sunvnet: add driver stats for ethtool support
On Fri, 10 Feb 2017 09:38:20 -0800
Shannon Nelson wrote:

> +static void vsw_get_ethtool_stats(struct net_device *dev,
> +				  struct ethtool_stats *estats, u64 *data)
> +{
> +	int i = 0;
> +
> +	data[i++] = dev->stats.rx_packets;
> +	data[i++] = dev->stats.tx_packets;
> +	data[i++] = dev->stats.rx_bytes;
> +	data[i++] = dev->stats.tx_bytes;
> +	data[i++] = dev->stats.rx_errors;
> +	data[i++] = dev->stats.tx_errors;
> +	data[i++] = dev->stats.rx_dropped;
> +	data[i++] = dev->stats.tx_dropped;
> +	data[i++] = dev->stats.multicast;

Please do not duplicate regular network statistics into ethtool.
This doesn't really add any value.
[PATCHv6 0/7] Refactor macvtap to re-use tap functionality by other virtual interfaces
Tap character devices can be implemented on other virtual interfaces like ipvlan, similar to macvtap. Source code for tap functionality in macvtap can be re-used for this purpose.

This patch series splits macvtap source into two modules, macvtap and tap. This patch series also includes a patch for implementing tap character device driver based on the IP-VLAN network interface, called ipvtap.

These patches are tested on x86 platform.

Sainath Grandhi (7):
  tap: Refactoring macvtap.c
  tap: Renaming tap related APIs, data structures, macros
  tap: Tap character device creation/destroy API
  tap: Abstract type of virtual interface from tap implementation
  tap: Extending tap device create/destroy APIs
  tap: tap as an independent module
  ipvtap: IP-VLAN based tap driver

 drivers/net/Kconfig              |   20 +
 drivers/net/Makefile             |    2 +
 drivers/net/ipvlan/Makefile      |    1 +
 drivers/net/ipvlan/ipvlan.h      |    7 +
 drivers/net/ipvlan/ipvlan_core.c |    3 +-
 drivers/net/ipvlan/ipvlan_main.c |   27 +-
 drivers/net/ipvlan/ipvtap.c      |  241 +++
 drivers/net/macvlan.c            |    2 +-
 drivers/net/macvtap.c            | 1229 ++--
 drivers/net/tap.c                | 1285 ++
 drivers/vhost/Kconfig            |    2 +-
 drivers/vhost/net.c              |    3 +-
 include/linux/if_macvlan.h       |   17 +-
 include/linux/if_tap.h           |   75 +++
 14 files changed, 1706 insertions(+), 1208 deletions(-)
 create mode 100644 drivers/net/ipvlan/ipvtap.c
 create mode 100644 drivers/net/tap.c
 create mode 100644 include/linux/if_tap.h

-- 
2.7.4
Re: [PATCH 0/2] net: ethernet: ti: cpsw: fix susp/resume
On Fri, Feb 10, 2017 at 12:05:07PM -0600, Grygorii Strashko wrote:
>
>
> On 02/09/2017 07:45 PM, David Miller wrote:
> >From: Ivan Khoronzhuk
> >Date: Fri, 10 Feb 2017 00:54:24 +0200
> >
> >>On Thu, Feb 09, 2017 at 05:21:26PM -0500, David Miller wrote:
> >>>From: Ivan Khoronzhuk
> >>>Date: Thu, 9 Feb 2017 02:07:34 +0200
> >>>
> >>>> These two patches fix suspend/resume chain.
> >>>
> >>>Patch 2 doesn't apply cleanly to the 'net' tree, please
> >>>respin this series.
> >>
> >>Strange, I've just checked it on net-next/master, it was applied w/o any
> >>warnings.
> >
> >It makes no sense to test "net-next" when I am telling you that it is
> >the "net" tree it doesn't apply to.
> >
> >This is a bug fix, so it should be targetting the "net" tree.
> >
>
> Looks like the first fix is for net, but the second one is for net-next
> I do not see
> 03fd01ad0eead23eb79294b6fb4d71dcac493855
> "net: ethernet: ti: cpsw: don't duplicate ndev_running"
> in net.

There is dependency, both for net-next and only first is for net tree

>
> --
> regards,
> -grygorii
[PATCHv6 1/7] tap: Refactoring macvtap.c
macvtap module has code for tap/queue management and link management. This patch splits the code into macvtap_main.c for link management and tap.c for tap/queue management. Functionality in tap.c can be re-used for implementing tap on other virtual interfaces. Signed-off-by: Sainath Grandhi--- drivers/net/Makefile | 2 + drivers/net/macvtap_main.c | 218 +++ drivers/net/{macvtap.c => tap.c} | 204 ++-- include/linux/if_macvtap.h | 10 ++ 4 files changed, 238 insertions(+), 196 deletions(-) create mode 100644 drivers/net/macvtap_main.c rename drivers/net/{macvtap.c => tap.c} (84%) create mode 100644 include/linux/if_macvtap.h diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 7336cbd..19b03a9 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -29,6 +29,8 @@ obj-$(CONFIG_GTP) += gtp.o obj-$(CONFIG_NLMON) += nlmon.o obj-$(CONFIG_NET_VRF) += vrf.o +macvtap-objs := macvtap_main.o tap.o + # # Networking Drivers # diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c new file mode 100644 index 000..96ffa60 --- /dev/null +++ b/drivers/net/macvtap_main.c @@ -0,0 +1,218 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +/* + * Variables for dealing with macvtaps device numbers. 
+ */ +static dev_t macvtap_major; +#define MACVTAP_NUM_DEVS (1U << MINORBITS) + +static const void *macvtap_net_namespace(struct device *d) +{ + struct net_device *dev = to_net_dev(d->parent); + return dev_net(dev); +} + +static struct class macvtap_class = { + .name = "macvtap", + .owner = THIS_MODULE, + .ns_type = _ns_type_operations, + .namespace = macvtap_net_namespace, +}; +static struct cdev macvtap_cdev; + +#define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \ + NETIF_F_TSO6 | NETIF_F_UFO) + +static int macvtap_newlink(struct net *src_net, + struct net_device *dev, + struct nlattr *tb[], + struct nlattr *data[]) +{ + struct macvlan_dev *vlan = netdev_priv(dev); + int err; + + INIT_LIST_HEAD(>queue_list); + + /* Since macvlan supports all offloads by default, make +* tap support all offloads also. +*/ + vlan->tap_features = TUN_OFFLOADS; + + err = netdev_rx_handler_register(dev, macvtap_handle_frame, vlan); + if (err) + return err; + + /* Don't put anything that may fail after macvlan_common_newlink +* because we can't undo what it does. 
+*/ + err = macvlan_common_newlink(src_net, dev, tb, data); + if (err) { + netdev_rx_handler_unregister(dev); + return err; + } + + return 0; +} + +static void macvtap_dellink(struct net_device *dev, + struct list_head *head) +{ + netdev_rx_handler_unregister(dev); + macvtap_del_queues(dev); + macvlan_dellink(dev, head); +} + +static void macvtap_setup(struct net_device *dev) +{ + macvlan_common_setup(dev); + dev->tx_queue_len = TUN_READQ_SIZE; +} + +static struct rtnl_link_ops macvtap_link_ops __read_mostly = { + .kind = "macvtap", + .setup = macvtap_setup, + .newlink= macvtap_newlink, + .dellink= macvtap_dellink, +}; + +static int macvtap_device_event(struct notifier_block *unused, + unsigned long event, void *ptr) +{ + struct net_device *dev = netdev_notifier_info_to_dev(ptr); + struct macvlan_dev *vlan; + struct device *classdev; + dev_t devt; + int err; + char tap_name[IFNAMSIZ]; + + if (dev->rtnl_link_ops != _link_ops) + return NOTIFY_DONE; + + snprintf(tap_name, IFNAMSIZ, "tap%d", dev->ifindex); + vlan = netdev_priv(dev); + + switch (event) { + case NETDEV_REGISTER: + /* Create the device node here after the network device has +* been registered but before register_netdevice has +* finished running. +*/ + err = macvtap_get_minor(vlan); + if (err) + return notifier_from_errno(err); + + devt = MKDEV(MAJOR(macvtap_major), vlan->minor); + classdev = device_create(_class, >dev, devt, +dev, tap_name); + if (IS_ERR(classdev)) { + macvtap_free_minor(vlan); + return notifier_from_errno(PTR_ERR(classdev)); + } + err =
[PATCHv6 2/7] tap: Renaming tap related APIs, data structures, macros
Renaming tap related APIs, data structures and macros in tap.c from macvtap_.* to tap_.* Signed-off-by: Sainath Grandhi--- drivers/net/macvtap_main.c | 18 +-- drivers/net/tap.c | 332 ++--- drivers/vhost/net.c| 3 +- include/linux/if_macvlan.h | 17 +-- include/linux/if_macvtap.h | 10 -- include/linux/if_tap.h | 23 6 files changed, 202 insertions(+), 201 deletions(-) delete mode 100644 include/linux/if_macvtap.h create mode 100644 include/linux/if_tap.h diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c index 96ffa60..548f339 100644 --- a/drivers/net/macvtap_main.c +++ b/drivers/net/macvtap_main.c @@ -1,6 +1,6 @@ #include #include -#include +#include #include #include #include @@ -62,7 +62,7 @@ static int macvtap_newlink(struct net *src_net, */ vlan->tap_features = TUN_OFFLOADS; - err = netdev_rx_handler_register(dev, macvtap_handle_frame, vlan); + err = netdev_rx_handler_register(dev, tap_handle_frame, vlan); if (err) return err; @@ -82,7 +82,7 @@ static void macvtap_dellink(struct net_device *dev, struct list_head *head) { netdev_rx_handler_unregister(dev); - macvtap_del_queues(dev); + tap_del_queues(dev); macvlan_dellink(dev, head); } @@ -121,7 +121,7 @@ static int macvtap_device_event(struct notifier_block *unused, * been registered but before register_netdevice has * finished running. 
*/ - err = macvtap_get_minor(vlan); + err = tap_get_minor(vlan); if (err) return notifier_from_errno(err); @@ -129,7 +129,7 @@ static int macvtap_device_event(struct notifier_block *unused, classdev = device_create(_class, >dev, devt, dev, tap_name); if (IS_ERR(classdev)) { - macvtap_free_minor(vlan); + tap_free_minor(vlan); return notifier_from_errno(PTR_ERR(classdev)); } err = sysfs_create_link(>dev.kobj, >kobj, @@ -144,10 +144,10 @@ static int macvtap_device_event(struct notifier_block *unused, sysfs_remove_link(>dev.kobj, tap_name); devt = MKDEV(MAJOR(macvtap_major), vlan->minor); device_destroy(_class, devt); - macvtap_free_minor(vlan); + tap_free_minor(vlan); break; case NETDEV_CHANGE_TX_QUEUE_LEN: - if (macvtap_queue_resize(vlan)) + if (tap_queue_resize(vlan)) return NOTIFY_BAD; break; } @@ -159,7 +159,7 @@ static struct notifier_block macvtap_notifier_block __read_mostly = { .notifier_call = macvtap_device_event, }; -extern struct file_operations macvtap_fops; +extern struct file_operations tap_fops; static int macvtap_init(void) { int err; @@ -169,7 +169,7 @@ static int macvtap_init(void) if (err) goto out1; - cdev_init(_cdev, _fops); + cdev_init(_cdev, _fops); err = cdev_add(_cdev, macvtap_major, MACVTAP_NUM_DEVS); if (err) goto out2; diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 6f6228e..15ca2d5 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -24,16 +24,16 @@ #include /* - * A macvtap queue is the central object of this driver, it connects + * A tap queue is the central object of this driver, it connects * an open character device to a macvlan interface. There can be * multiple queues on one interface, which map back to queues * implemented in hardware on the underlying device. * - * macvtap_proto is used to allocate queues through the sock allocation + * tap_proto is used to allocate queues through the sock allocation * mechanism. 
* */ -struct macvtap_queue { +struct tap_queue { struct sock sk; struct socket sock; struct socket_wq wq; @@ -47,21 +47,21 @@ struct macvtap_queue { struct skb_array skb_array; }; -#define MACVTAP_FEATURES (IFF_VNET_HDR | IFF_MULTI_QUEUE) +#define TAP_IFFEATURES (IFF_VNET_HDR | IFF_MULTI_QUEUE) -#define MACVTAP_VNET_LE 0x8000 -#define MACVTAP_VNET_BE 0x4000 +#define TAP_VNET_LE 0x8000 +#define TAP_VNET_BE 0x4000 #ifdef CONFIG_TUN_VNET_CROSS_LE -static inline bool macvtap_legacy_is_little_endian(struct macvtap_queue *q) +static inline bool tap_legacy_is_little_endian(struct tap_queue *q) { - return q->flags & MACVTAP_VNET_BE ? false : + return q->flags & TAP_VNET_BE ? false : virtio_legacy_is_little_endian(); } -static long macvtap_get_vnet_be(struct macvtap_queue *q, int __user *sp) +static long tap_get_vnet_be(struct tap_queue *q, int __user *sp)
RE: [PATCHv5 0/7] Refactor macvtap to re-use tap functionality by other virtual interfaces
> -----Original Message-----
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, February 09, 2017 2:08 PM
> To: Grandhi, Sainath
> Cc: netdev@vger.kernel.org; mah...@bandewar.net; linux-ker...@vger.kernel.org
> Subject: Re: [PATCHv5 0/7] Refactor macvtap to re-use tap functionality by other virtual interfaces
>
> From: Sainath Grandhi
> Date: Wed, 8 Feb 2017 13:37:09 -0800
>
> > Tap character devices can be implemented on other virtual interfaces
> > like ipvlan, similar to macvtap. Source code for tap functionality in
> > macvtap can be re-used for this purpose.
> >
> > This patch series splits macvtap source into two modules, macvtap and tap.
> > This patch series also includes a patch for implementing tap character
> > device driver based on the IP-VLAN network interface, called ipvtap.
> >
> > These patches are tested on x86 platform.
>
> I get rejects on patch #7 when I try to apply this to net-next, please respin.

Please check the next version; I have based it on net-next. There is a change to ipvlan_core.c in the "net-next" repo that has not made it into the "net" repo.
[PATCHv6 4/7] tap: Abstract type of virtual interface from tap implementation
The macvlan object is restructured to hold tap-related elements in a separate entity, tap_dev. Upon the NETDEV_REGISTER device_event, tap_dev is registered with idr and fetched again on tap_open. A few of the tap functions are modified to accept tap_dev as an argument. The tap_dev object includes callbacks to be used by the underlying virtual interface to take care of tx and rx accounting.

Signed-off-by: Sainath Grandhi
---
 drivers/net/macvlan.c | 2 +-
 drivers/net/macvtap_main.c | 71 +---
 drivers/net/tap.c | 264 -
 include/linux/if_tap.h | 57 +-
 4 files changed, 229 insertions(+), 165 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index cbfc1be..9261722 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -1525,7 +1525,6 @@ static const struct nla_policy macvlan_policy[IFLA_MACVLAN_MAX + 1] = { int macvlan_link_register(struct rtnl_link_ops *ops) { /* common fields */ - ops->priv_size = sizeof(struct macvlan_dev); ops->validate = macvlan_validate; ops->maxtype = IFLA_MACVLAN_MAX; ops->policy = macvlan_policy; @@ -1548,6 +1547,7 @@ static struct rtnl_link_ops macvlan_link_ops = { .newlink = macvlan_newlink, .dellink = macvlan_dellink, .get_link_net = macvlan_get_link_net, + .priv_size = sizeof(struct macvlan_dev), }; static int macvlan_device_event(struct notifier_block *unused, diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c index 215ab7a..0238df6 100644 --- a/drivers/net/macvtap_main.c +++ b/drivers/net/macvtap_main.c @@ -24,6 +24,11 @@ #include #include +struct macvtap_dev { + struct macvlan_dev vlan; + struct tap_dev tap; +}; + /* * Variables for dealing with macvtaps device numbers. 
*/ @@ -46,22 +51,55 @@ static struct cdev macvtap_cdev; #define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \ NETIF_F_TSO6 | NETIF_F_UFO) +static void macvtap_count_tx_dropped(struct tap_dev *tap) +{ + struct macvtap_dev *vlantap = container_of(tap, struct macvtap_dev, tap); + struct macvlan_dev *vlan = >vlan; + + this_cpu_inc(vlan->pcpu_stats->tx_dropped); +} + +static void macvtap_count_rx_dropped(struct tap_dev *tap) +{ + struct macvtap_dev *vlantap = container_of(tap, struct macvtap_dev, tap); + struct macvlan_dev *vlan = >vlan; + + macvlan_count_rx(vlan, 0, 0, 0); +} + +static void macvtap_update_features(struct tap_dev *tap, + netdev_features_t features) +{ + struct macvtap_dev *vlantap = container_of(tap, struct macvtap_dev, tap); + struct macvlan_dev *vlan = >vlan; + + vlan->set_features = features; + netdev_update_features(vlan->dev); +} + static int macvtap_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[]) { - struct macvlan_dev *vlan = netdev_priv(dev); + struct macvtap_dev *vlantap = netdev_priv(dev); int err; - INIT_LIST_HEAD(>queue_list); + INIT_LIST_HEAD(>tap.queue_list); /* Since macvlan supports all offloads by default, make * tap support all offloads also. 
*/ - vlan->tap_features = TUN_OFFLOADS; + vlantap->tap.tap_features = TUN_OFFLOADS; - err = netdev_rx_handler_register(dev, tap_handle_frame, vlan); + /* Register callbacks for rx/tx drops accounting and updating +* net_device features +*/ + vlantap->tap.count_tx_dropped = macvtap_count_tx_dropped; + vlantap->tap.count_rx_dropped = macvtap_count_rx_dropped; + vlantap->tap.update_features = macvtap_update_features; + + err = netdev_rx_handler_register(dev, tap_handle_frame, >tap); if (err) return err; @@ -74,14 +112,18 @@ static int macvtap_newlink(struct net *src_net, return err; } + vlantap->tap.dev = vlantap->vlan.dev; + return 0; } static void macvtap_dellink(struct net_device *dev, struct list_head *head) { + struct macvtap_dev *vlantap = netdev_priv(dev); + netdev_rx_handler_unregister(dev); - tap_del_queues(dev); + tap_del_queues(>tap); macvlan_dellink(dev, head); } @@ -96,13 +138,14 @@ static struct rtnl_link_ops macvtap_link_ops __read_mostly = { .setup = macvtap_setup, .newlink= macvtap_newlink, .dellink= macvtap_dellink, + .priv_size = sizeof(struct macvtap_dev), }; static int macvtap_device_event(struct notifier_block *unused, unsigned long event, void *ptr) { struct net_device *dev =
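For readers following the tap_dev split, here is a minimal user-space sketch of the pattern the patch relies on: the generic tap code only sees a struct tap_dev and calls back through a function pointer, and the owner recovers its enclosing structure with container_of(). The struct layouts are heavily reduced stand-ins, the tx_dropped counter replaces the per-cpu stats, and tap_drop_packet() is a hypothetical helper, not a function from the series.

```c
#include <stddef.h>

/* User-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct tap_dev {
	/* Callback supplied by the owning virtual interface. */
	void (*count_tx_dropped)(struct tap_dev *tap);
};

/* Reduced stand-in for struct macvlan_dev (assumption: one plain counter). */
struct macvlan_dev {
	unsigned long tx_dropped;
};

struct macvtap_dev {
	struct macvlan_dev vlan;
	struct tap_dev tap;
};

/* Owner-side callback: map the embedded tap_dev back to its macvtap_dev. */
void macvtap_count_tx_dropped(struct tap_dev *tap)
{
	struct macvtap_dev *vlantap = container_of(tap, struct macvtap_dev, tap);

	vlantap->vlan.tx_dropped++;
}

/* Hypothetical generic-tap helper: knows nothing about macvlan. */
void tap_drop_packet(struct tap_dev *tap)
{
	if (tap->count_tx_dropped)
		tap->count_tx_dropped(tap);
}
```

The point of the indirection is that a second owner (ipvtap in this series) can embed the same struct tap_dev and install its own accounting callbacks without tap.c changing.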
[PATCHv6 7/7] ipvtap: IP-VLAN based tap driver
This patch adds a tap character device driver that is based on the IP-VLAN network interface, called ipvtap. An ipvtap device can be created in the same way as an ipvlan device, using 'type ipvtap', and then accessed using the tap user space interface. Signed-off-by: Sainath Grandhi--- drivers/net/Kconfig | 13 +++ drivers/net/Makefile | 1 + drivers/net/ipvlan/Makefile | 1 + drivers/net/ipvlan/ipvlan.h | 7 ++ drivers/net/ipvlan/ipvlan_core.c | 3 +- drivers/net/ipvlan/ipvlan_main.c | 27 +++-- drivers/net/ipvlan/ipvtap.c | 241 +++ 7 files changed, 280 insertions(+), 13 deletions(-) create mode 100644 drivers/net/ipvlan/ipvtap.c diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 5763503..823bc2f 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -166,6 +166,19 @@ config IPVLAN To compile this driver as a module, choose M here: the module will be called ipvlan. +config IPVTAP + tristate "IP-VLAN based tap driver" + depends on IPVLAN + depends on INET + select TAP + ---help--- + This adds a specialized tap character device driver that is based + on the IP-VLAN network interface, called ipvtap. An ipvtap device + can be added in the same way as a ipvlan device, using 'type + ipvtap', and then be accessed through the tap user space interface. + + To compile this driver as a module, choose M here: the module + will be called ipvtap. 
config VXLAN tristate "Virtual eXtensible Local Area Network (VXLAN)" diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 7dd86ca..98ed4d9 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -7,6 +7,7 @@ # obj-$(CONFIG_BONDING) += bonding/ obj-$(CONFIG_IPVLAN) += ipvlan/ +obj-$(CONFIG_IPVTAP) += ipvlan/ obj-$(CONFIG_DUMMY) += dummy.o obj-$(CONFIG_EQUALIZER) += eql.o obj-$(CONFIG_IFB) += ifb.o diff --git a/drivers/net/ipvlan/Makefile b/drivers/net/ipvlan/Makefile index df79910..8a2c64d 100644 --- a/drivers/net/ipvlan/Makefile +++ b/drivers/net/ipvlan/Makefile @@ -3,5 +3,6 @@ # obj-$(CONFIG_IPVLAN) += ipvlan.o +obj-$(CONFIG_IPVTAP) += ipvtap.o ipvlan-objs := ipvlan_core.o ipvlan_main.o diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h index 406ae4f..800a46c 100644 --- a/drivers/net/ipvlan/ipvlan.h +++ b/drivers/net/ipvlan/ipvlan.h @@ -135,4 +135,11 @@ struct sk_buff *ipvlan_l3_rcv(struct net_device *dev, struct sk_buff *skb, u16 proto); unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb, const struct nf_hook_state *state); +void ipvlan_count_rx(const struct ipvl_dev *ipvlan, +unsigned int len, bool success, bool mcast); +int ipvlan_link_new(struct net *src_net, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[]); +void ipvlan_link_delete(struct net_device *dev, struct list_head *head); +void ipvlan_link_setup(struct net_device *dev); +int ipvlan_link_register(struct rtnl_link_ops *ops); #endif /* __IPVLAN_H */ diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c index 8ae335d..1f3295e 100644 --- a/drivers/net/ipvlan/ipvlan_core.c +++ b/drivers/net/ipvlan/ipvlan_core.c @@ -16,7 +16,7 @@ void ipvlan_init_secret(void) net_get_random_once(_jhash_secret, sizeof(ipvlan_jhash_secret)); } -static void ipvlan_count_rx(const struct ipvl_dev *ipvlan, +void ipvlan_count_rx(const struct ipvl_dev *ipvlan, unsigned int len, bool success, bool mcast) { if (likely(success)) { @@ -33,6 
+33,7 @@ static void ipvlan_count_rx(const struct ipvl_dev *ipvlan, this_cpu_inc(ipvlan->pcpu_stats->rx_errs); } } +EXPORT_SYMBOL_GPL(ipvlan_count_rx); static u8 ipvlan_get_v6_hash(const void *iaddr) { diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c index 95b18f4..aa8575c 100644 --- a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -496,8 +496,8 @@ static int ipvlan_nl_fillinfo(struct sk_buff *skb, return ret; } -static int ipvlan_link_new(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[]) +int ipvlan_link_new(struct net *src_net, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[]) { struct ipvl_dev *ipvlan = netdev_priv(dev); struct ipvl_port *port; @@ -594,8 +594,9 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev, ipvlan_port_destroy(phy_dev); return err; } +EXPORT_SYMBOL_GPL(ipvlan_link_new); -static void ipvlan_link_delete(struct net_device *dev, struct list_head *head) +void ipvlan_link_delete(struct net_device *dev, struct list_head *head) {
[PATCHv6 5/7] tap: Extending tap device create/destroy APIs
Extending tap APIs get/free_minor and create/destroy_cdev to handle more than one type of virtual interface. Signed-off-by: Sainath Grandhi--- drivers/net/macvtap_main.c | 6 +-- drivers/net/tap.c | 118 + include/linux/if_tap.h | 4 +- 3 files changed, 102 insertions(+), 26 deletions(-) diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c index 0238df6..a4bfc10 100644 --- a/drivers/net/macvtap_main.c +++ b/drivers/net/macvtap_main.c @@ -163,7 +163,7 @@ static int macvtap_device_event(struct notifier_block *unused, * been registered but before register_netdevice has * finished running. */ - err = tap_get_minor(>tap); + err = tap_get_minor(macvtap_major, >tap); if (err) return notifier_from_errno(err); @@ -171,7 +171,7 @@ static int macvtap_device_event(struct notifier_block *unused, classdev = device_create(_class, >dev, devt, dev, tap_name); if (IS_ERR(classdev)) { - tap_free_minor(>tap); + tap_free_minor(macvtap_major, >tap); return notifier_from_errno(PTR_ERR(classdev)); } err = sysfs_create_link(>dev.kobj, >kobj, @@ -186,7 +186,7 @@ static int macvtap_device_event(struct notifier_block *unused, sysfs_remove_link(>dev.kobj, tap_name); devt = MKDEV(MAJOR(macvtap_major), vlantap->tap.minor); device_destroy(_class, devt); - tap_free_minor(>tap); + tap_free_minor(macvtap_major, >tap); break; case NETDEV_CHANGE_TX_QUEUE_LEN: if (tap_queue_resize(>tap)) diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 7d3e8b1..71bbf0b 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -99,12 +99,17 @@ static struct proto tap_proto = { }; #define TAP_NUM_DEVS (1U << MINORBITS) + +static LIST_HEAD(major_list); + struct major_info { + struct rcu_head rcu; dev_t major; struct idr minor_idr; struct mutex minor_lock; const char *device_name; -} macvtap_major; + struct list_head next; +}; #define GOODCOPY_LEN 128 @@ -385,44 +390,89 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb) return RX_HANDLER_CONSUMED; } -int tap_get_minor(struct tap_dev *tap) 
+static struct major_info *tap_get_major(int major) +{ + struct major_info *tap_major; + + list_for_each_entry_rcu(tap_major, _list, next) { + if (tap_major->major == major) + return tap_major; + } + + return NULL; +} + +int tap_get_minor(dev_t major, struct tap_dev *tap) { int retval = -ENOMEM; + struct major_info *tap_major; + + rcu_read_lock(); + tap_major = tap_get_major(MAJOR(major)); + if (!tap_major) { + retval = -EINVAL; + goto unlock; + } - mutex_lock(_major.minor_lock); - retval = idr_alloc(_major.minor_idr, tap, 1, TAP_NUM_DEVS, GFP_KERNEL); + mutex_lock(_major->minor_lock); + retval = idr_alloc(_major->minor_idr, tap, 1, TAP_NUM_DEVS, GFP_KERNEL); if (retval >= 0) { tap->minor = retval; } else if (retval == -ENOSPC) { netdev_err(tap->dev, "Too many tap devices\n"); retval = -EINVAL; } - mutex_unlock(_major.minor_lock); + mutex_unlock(_major->minor_lock); + +unlock: + rcu_read_unlock(); return retval < 0 ? retval : 0; } -void tap_free_minor(struct tap_dev *tap) +void tap_free_minor(dev_t major, struct tap_dev *tap) { - mutex_lock(_major.minor_lock); + struct major_info *tap_major; + + rcu_read_lock(); + tap_major = tap_get_major(MAJOR(major)); + if (!tap_major) { + goto unlock; + } + + mutex_lock(_major->minor_lock); if (tap->minor) { - idr_remove(_major.minor_idr, tap->minor); + idr_remove(_major->minor_idr, tap->minor); tap->minor = 0; } - mutex_unlock(_major.minor_lock); + mutex_unlock(_major->minor_lock); + +unlock: + rcu_read_unlock(); } -static struct tap_dev *dev_get_by_tap_minor(int minor) +static struct tap_dev *dev_get_by_tap_file(int major, int minor) { struct net_device *dev = NULL; struct tap_dev *tap; + struct major_info *tap_major; - mutex_lock(_major.minor_lock); - tap = idr_find(_major.minor_idr, minor); + rcu_read_lock(); + tap_major = tap_get_major(major); + if (!tap_major) { + tap = NULL; + goto unlock; + } + + mutex_lock(_major->minor_lock); + tap =
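As a rough illustration of the per-major bookkeeping introduced above, the following user-space sketch keeps one minor table per registered major and fails with -EINVAL for an unknown major. It is only a model under stated assumptions: a plain singly linked list replaces the RCU list, a small array replaces the idr, there is no locking, and TAP_MAX_MINORS is an invented limit. The function names mirror the patch, but the bodies are simplified stand-ins.

```c
#include <errno.h>
#include <stddef.h>

#define TAP_MAX_MINORS 8	/* invented for the sketch; the patch uses 1U << MINORBITS */

struct tap_dev {
	int minor;
};

struct major_info {
	int major;
	struct tap_dev *minors[TAP_MAX_MINORS];	/* stand-in for minor_idr */
	struct major_info *next;
};

struct major_info *major_list;	/* stand-in for the RCU-protected list head */

void tap_list_add(struct major_info *info)
{
	info->next = major_list;
	major_list = info;
}

struct major_info *tap_get_major(int major)
{
	struct major_info *m;

	for (m = major_list; m; m = m->next)
		if (m->major == major)
			return m;
	return NULL;
}

/* Returns 0 and sets tap->minor on success, -EINVAL otherwise. */
int tap_get_minor(int major, struct tap_dev *tap)
{
	struct major_info *m = tap_get_major(major);
	int i;

	if (!m)
		return -EINVAL;

	/* Minor 0 is never handed out, matching the idr_alloc() start of 1. */
	for (i = 1; i < TAP_MAX_MINORS; i++) {
		if (!m->minors[i]) {
			m->minors[i] = tap;
			tap->minor = i;
			return 0;
		}
	}
	return -EINVAL;	/* "Too many tap devices" */
}

void tap_free_minor(int major, struct tap_dev *tap)
{
	struct major_info *m = tap_get_major(major);

	if (!m)
		return;
	if (tap->minor) {
		m->minors[tap->minor] = NULL;
		tap->minor = 0;
	}
}
```

Because each major owns its own table, two drivers (for example macvtap and ipvtap) can both hand out minor 1 without colliding.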
[PATCHv6 6/7] tap: tap as an independent module
This patch makes tap a separate module for other types of virtual interfaces, for example, ipvlan to use. Signed-off-by: Sainath Grandhi--- drivers/net/Kconfig | 7 +++ drivers/net/Makefile | 3 +-- drivers/net/{macvtap_main.c => macvtap.c} | 0 drivers/net/tap.c | 11 +++ drivers/vhost/Kconfig | 2 +- include/linux/if_tap.h| 4 ++-- 6 files changed, 22 insertions(+), 5 deletions(-) rename drivers/net/{macvtap_main.c => macvtap.c} (100%) diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index a993cbe..5763503 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -135,6 +135,7 @@ config MACVTAP tristate "MAC-VLAN based tap driver" depends on MACVLAN depends on INET + select TAP help This adds a specialized tap character device driver that is based on the MAC-VLAN network interface, called macvtap. A macvtap device @@ -287,6 +288,12 @@ config TUN If you don't know what to use this for, you don't need it. +config TAP + tristate + ---help--- + This option is selected by any driver implementing tap user space + interface for a virtual interface to re-use core tap functionality. 
+ config TUN_VNET_CROSS_LE bool "Support for cross-endian vnet headers on little-endian kernels" default n diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 19b03a9..7dd86ca 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -21,6 +21,7 @@ obj-$(CONFIG_PHYLIB) += phy/ obj-$(CONFIG_RIONET) += rionet.o obj-$(CONFIG_NET_TEAM) += team/ obj-$(CONFIG_TUN) += tun.o +obj-$(CONFIG_TAP) += tap.o obj-$(CONFIG_VETH) += veth.o obj-$(CONFIG_VIRTIO_NET) += virtio_net.o obj-$(CONFIG_VXLAN) += vxlan.o @@ -29,8 +30,6 @@ obj-$(CONFIG_GTP) += gtp.o obj-$(CONFIG_NLMON) += nlmon.o obj-$(CONFIG_NET_VRF) += vrf.o -macvtap-objs := macvtap_main.o tap.o - # # Networking Drivers # diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap.c similarity index 100% rename from drivers/net/macvtap_main.c rename to drivers/net/macvtap.c diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 71bbf0b..35b55a2 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -312,6 +312,7 @@ void tap_del_queues(struct tap_dev *tap) /* guarantee that any future tap_set_queue will fail */ tap->numvtaps = MAX_TAP_QUEUES; } +EXPORT_SYMBOL_GPL(tap_del_queues); rx_handler_result_t tap_handle_frame(struct sk_buff **pskb) { @@ -389,6 +390,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb) kfree_skb(skb); return RX_HANDLER_CONSUMED; } +EXPORT_SYMBOL_GPL(tap_handle_frame); static struct major_info *tap_get_major(int major) { @@ -428,6 +430,7 @@ int tap_get_minor(dev_t major, struct tap_dev *tap) rcu_read_unlock(); return retval < 0 ? 
retval : 0; } +EXPORT_SYMBOL_GPL(tap_get_minor); void tap_free_minor(dev_t major, struct tap_dev *tap) { @@ -449,6 +452,7 @@ void tap_free_minor(dev_t major, struct tap_dev *tap) unlock: rcu_read_unlock(); } +EXPORT_SYMBOL_GPL(tap_free_minor); static struct tap_dev *dev_get_by_tap_file(int major, int minor) { @@ -1210,6 +1214,7 @@ int tap_queue_resize(struct tap_dev *tap) kfree(arrays); return ret; } +EXPORT_SYMBOL_GPL(tap_queue_resize); static int tap_list_add(dev_t major, const char *device_name) { @@ -1257,6 +1262,7 @@ int tap_create_cdev(struct cdev *tap_cdev, out1: return err; } +EXPORT_SYMBOL_GPL(tap_create_cdev); void tap_destroy_cdev(dev_t major, struct cdev *tap_cdev) { @@ -1272,3 +1278,8 @@ void tap_destroy_cdev(dev_t major, struct cdev *tap_cdev) } } } +EXPORT_SYMBOL_GPL(tap_destroy_cdev); + +MODULE_AUTHOR("Arnd Bergmann "); +MODULE_AUTHOR("Sainath Grandhi "); +MODULE_LICENSE("GPL"); diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 40764ec..cfdecea 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -1,6 +1,6 @@ config VHOST_NET tristate "Host kernel accelerator for virtio net" - depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP) + depends on NET && EVENTFD && (TUN || !TUN) && (TAP || !TAP) select VHOST ---help--- This kernel module can be loaded in host kernel to accelerate diff --git a/include/linux/if_tap.h b/include/linux/if_tap.h index 362e71c..3482c3c 100644 --- a/include/linux/if_tap.h +++ b/include/linux/if_tap.h @@ -1,7 +1,7 @@ #ifndef _LINUX_IF_TAP_H_ #define _LINUX_IF_TAP_H_ -#if IS_ENABLED(CONFIG_MACVTAP) +#if IS_ENABLED(CONFIG_TAP) struct socket *tap_get_socket(struct file *); #else #include @@ -12,7 +12,7 @@ static inline struct socket *tap_get_socket(struct file *f) { return ERR_PTR(-EINVAL); } -#endif /*
[PATCHv6 3/7] tap: Tap character device creation/destroy API
This patch provides tap device create/destroy APIs in tap.c. Signed-off-by: Sainath Grandhi--- drivers/net/macvtap_main.c | 30 +++--- drivers/net/tap.c | 62 ++ include/linux/if_tap.h | 3 +++ 3 files changed, 63 insertions(+), 32 deletions(-) diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c index 548f339..215ab7a 100644 --- a/drivers/net/macvtap_main.c +++ b/drivers/net/macvtap_main.c @@ -28,7 +28,6 @@ * Variables for dealing with macvtaps device numbers. */ static dev_t macvtap_major; -#define MACVTAP_NUM_DEVS (1U << MINORBITS) static const void *macvtap_net_namespace(struct device *d) { @@ -159,57 +158,46 @@ static struct notifier_block macvtap_notifier_block __read_mostly = { .notifier_call = macvtap_device_event, }; -extern struct file_operations tap_fops; static int macvtap_init(void) { int err; - err = alloc_chrdev_region(_major, 0, - MACVTAP_NUM_DEVS, "macvtap"); - if (err) - goto out1; + err = tap_create_cdev(_cdev, _major, "macvtap"); - cdev_init(_cdev, _fops); - err = cdev_add(_cdev, macvtap_major, MACVTAP_NUM_DEVS); if (err) - goto out2; + goto out1; err = class_register(_class); if (err) - goto out3; + goto out2; err = register_netdevice_notifier(_notifier_block); if (err) - goto out4; + goto out3; err = macvlan_link_register(_link_ops); if (err) - goto out5; + goto out4; return 0; -out5: - unregister_netdevice_notifier(_notifier_block); out4: - class_unregister(_class); + unregister_netdevice_notifier(_notifier_block); out3: - cdev_del(_cdev); + class_unregister(_class); out2: - unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS); + tap_destroy_cdev(macvtap_major, _cdev); out1: return err; } module_init(macvtap_init); -extern struct idr minor_idr; static void macvtap_exit(void) { rtnl_link_unregister(_link_ops); unregister_netdevice_notifier(_notifier_block); class_unregister(_class); - cdev_del(_cdev); - unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS); - idr_destroy(_idr); + tap_destroy_cdev(macvtap_major, _cdev); } 
module_exit(macvtap_exit); diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 15ca2d5..04ba978 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -123,8 +123,12 @@ static struct proto tap_proto = { }; #define TAP_NUM_DEVS (1U << MINORBITS) -static DEFINE_MUTEX(minor_lock); -DEFINE_IDR(minor_idr); +struct major_info { + dev_t major; + struct idr minor_idr; + struct mutex minor_lock; + const char *device_name; +} macvtap_major; #define GOODCOPY_LEN 128 @@ -413,26 +417,26 @@ int tap_get_minor(struct macvlan_dev *vlan) { int retval = -ENOMEM; - mutex_lock(_lock); - retval = idr_alloc(_idr, vlan, 1, TAP_NUM_DEVS, GFP_KERNEL); + mutex_lock(_major.minor_lock); + retval = idr_alloc(_major.minor_idr, vlan, 1, TAP_NUM_DEVS, GFP_KERNEL); if (retval >= 0) { vlan->minor = retval; } else if (retval == -ENOSPC) { netdev_err(vlan->dev, "Too many tap devices\n"); retval = -EINVAL; } - mutex_unlock(_lock); + mutex_unlock(_major.minor_lock); return retval < 0 ? retval : 0; } void tap_free_minor(struct macvlan_dev *vlan) { - mutex_lock(_lock); + mutex_lock(_major.minor_lock); if (vlan->minor) { - idr_remove(_idr, vlan->minor); + idr_remove(_major.minor_idr, vlan->minor); vlan->minor = 0; } - mutex_unlock(_lock); + mutex_unlock(_major.minor_lock); } static struct net_device *dev_get_by_tap_minor(int minor) @@ -440,13 +444,13 @@ static struct net_device *dev_get_by_tap_minor(int minor) struct net_device *dev = NULL; struct macvlan_dev *vlan; - mutex_lock(_lock); - vlan = idr_find(_idr, minor); + mutex_lock(_major.minor_lock); + vlan = idr_find(_major.minor_idr, minor); if (vlan) { dev = vlan->dev; dev_hold(dev); } - mutex_unlock(_lock); + mutex_unlock(_major.minor_lock); return dev; } @@ -1184,3 +1188,39 @@ int tap_queue_resize(struct macvlan_dev *vlan) kfree(arrays); return ret; } + +int tap_create_cdev(struct cdev *tap_cdev, + dev_t *tap_major, const char *device_name) +{ + int err; + + err = alloc_chrdev_region(tap_major, 0, TAP_NUM_DEVS, device_name); + if (err) + goto 
out1; + + cdev_init(tap_cdev, _fops); +
[PATCH] NET: Fix /proc/net/arp for AX.25
When sending ARP requests over AX.25 links, the hardware address in the neighbour cache is not initialized. For such an incomplete ARP entry, ax2asc2 generates an empty string, resulting in /proc/net/arp output like the following:

$ cat /proc/net/arp
IP address       HW type     Flags       HW address            Mask     Device
192.168.122.1    0x1         0x2         52:54:00:00:5d:5f     *        ens3
172.20.1.99      0x3         0x0              *        bpq0

The missing field confuses the procfs parsing of arp(8), resulting in incorrect output for the device, such as the following:

$ arp
Address                  HWtype  HWaddress           Flags Mask            Iface
gateway                  ether   52:54:00:00:5d:5f   C                     ens3
172.20.1.99                      (incomplete)                              ens3

This changes the content of /proc/net/arp to:

$ cat /proc/net/arp
IP address       HW type     Flags       HW address            Mask     Device
172.20.1.99      0x3         0x0         *                     *        bpq0
192.168.122.1    0x1         0x2         52:54:00:00:5d:5f     *        ens3

To do so, it changes ax2asc2 to put the string "*" in buf for a NULL address argument. Finally, the HW address field is left-aligned in a 17-character field (the length of an Ethernet HW address in the usual hex notation) for readability.
Signed-off-by: Ralf Baechle--- net/ipv4/arp.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c index 89a8cac4..51b27ae 100644 --- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -1263,7 +1263,7 @@ void __init arp_init(void) /* * ax25 -> ASCII conversion */ -static char *ax2asc2(ax25_address *a, char *buf) +static void ax2asc2(ax25_address *a, char *buf) { char c, *s; int n; @@ -1285,10 +1285,10 @@ static char *ax2asc2(ax25_address *a, char *buf) *s++ = n + '0'; *s++ = '\0'; - if (*buf == '\0' || *buf == '-') - return "*"; - - return buf; + if (*buf == '\0' || *buf == '-') { + buf[0] = '*'; + buf[1] = '\0'; + } } #endif /* CONFIG_AX25 */ @@ -1322,7 +1322,7 @@ static void arp_format_neigh_entry(struct seq_file *seq, } #endif sprintf(tbuf, "%pI4", n->primary_key); - seq_printf(seq, "%-16s 0x%-10x0x%-10x%s *%s\n", + seq_printf(seq, "%-16s 0x%-10x0x%-10x%-17s *%s\n", tbuf, hatype, arp_state_to_flags(n), hbuffer, dev->name); read_unlock(>lock); }
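To make the effect of the two hunks easier to see, here is a user-space sketch of the patched behaviour. The tail of ax2asc2() follows the diff; the conversion loop at its top and the exact spacing of the format string are not visible in the flattened diff above and are filled in here as assumptions. The ax25_address typedef is a reduced stand-in and format_arp_line() is a hypothetical helper that wraps the seq_printf() format in snprintf().

```c
#include <stdio.h>

/* Reduced stand-in: six shifted callsign bytes followed by the SSID byte. */
typedef struct {
	unsigned char ax25_call[7];
} ax25_address;

/* AX.25 -> ASCII; after the patch an empty result is replaced by "*" in buf. */
void ax2asc2(const ax25_address *a, char *buf)
{
	char c, *s = buf;
	int n;

	/* Assumed conversion loop: callsign bytes are stored shifted left by one. */
	for (n = 0; n < 6; n++) {
		c = (a->ax25_call[n] >> 1) & 0x7F;
		if (c != ' ')
			*s++ = c;
	}

	*s++ = '-';
	n = (a->ax25_call[6] >> 1) & 0x0F;
	if (n > 9) {
		*s++ = '1';
		n -= 10;
	}
	*s++ = n + '0';
	*s++ = '\0';

	/* From the diff: never hand back an empty hardware address. */
	if (*buf == '\0' || *buf == '-') {
		buf[0] = '*';
		buf[1] = '\0';
	}
}

/* Hypothetical helper; column spacing is an assumption, %-17s is from the diff. */
int format_arp_line(char *out, size_t len, const char *ip, unsigned int hatype,
		    unsigned int flags, const char *hw, const char *dev)
{
	return snprintf(out, len, "%-16s 0x%-10x0x%-10x%-17s     *        %s\n",
			ip, hatype, flags, hw, dev);
}
```

With the fixed-width HW address column, an incomplete entry still yields six whitespace-separated fields, which is what a field-based parser such as arp(8) expects.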
[PATCH] ibmvnic: Initialize completion variables before starting work
Initialize condition variables prior to invoking any work that can mark them complete. This resolves a race in the ibmvnic driver where the driver faults trying to complete an uninitialized condition variable. Signed-off-by: Nathan Fontenot--- drivers/net/ethernet/ibm/ibmvnic.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index a024141..752b082 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -189,9 +189,10 @@ static int alloc_long_term_buff(struct ibmvnic_adapter *adapter, } ltb->map_id = adapter->map_id; adapter->map_id++; + + init_completion(>fw_done); send_request_map(adapter, ltb->addr, ltb->size, ltb->map_id); - init_completion(>fw_done); wait_for_completion(>fw_done); return 0; } @@ -1121,10 +1122,10 @@ static void ibmvnic_get_ethtool_stats(struct net_device *dev, crq.request_statistics.ioba = cpu_to_be32(adapter->stats_token); crq.request_statistics.len = cpu_to_be32(sizeof(struct ibmvnic_statistics)); - ibmvnic_send_crq(adapter, ); /* Wait for data to be written */ init_completion(>stats_done); + ibmvnic_send_crq(adapter, ); wait_for_completion(>stats_done); for (i = 0; i < ARRAY_SIZE(ibmvnic_stats); i++) @@ -2799,9 +2800,9 @@ static ssize_t trace_read(struct file *file, char __user *user_buf, size_t len, crq.collect_fw_trace.correlator = adapter->ras_comps[num].correlator; crq.collect_fw_trace.ioba = cpu_to_be32(trace_tok); crq.collect_fw_trace.len = adapter->ras_comps[num].trace_buff_size; - ibmvnic_send_crq(adapter, ); init_completion(>fw_done); + ibmvnic_send_crq(adapter, ); wait_for_completion(>fw_done); if (*ppos + len > be32_to_cpu(adapter->ras_comps[num].trace_buff_size)) @@ -3581,9 +3582,9 @@ static int ibmvnic_dump_show(struct seq_file *seq, void *v) memset(, 0, sizeof(crq)); crq.request_dump_size.first = IBMVNIC_CRQ_CMD; crq.request_dump_size.cmd = REQUEST_DUMP_SIZE; - ibmvnic_send_crq(adapter, ); 
init_completion(>fw_done); + ibmvnic_send_crq(adapter, ); wait_for_completion(>fw_done); seq_write(seq, adapter->dump_data, adapter->dump_data_size); @@ -3629,8 +3630,8 @@ static void handle_crq_init_rsp(struct work_struct *work) } } - send_version_xchg(adapter); reinit_completion(>init_done); + send_version_xchg(adapter); if (!wait_for_completion_timeout(>init_done, timeout)) { dev_err(dev, "Passive init timeout\n"); goto task_failed; @@ -3640,9 +3641,9 @@ static void handle_crq_init_rsp(struct work_struct *work) if (adapter->renegotiate) { adapter->renegotiate = false; release_sub_crqs_no_irqs(adapter); - send_cap_queries(adapter); reinit_completion(>init_done); + send_cap_queries(adapter); if (!wait_for_completion_timeout(>init_done, timeout)) { dev_err(dev, "Passive init timeout\n"); @@ -3772,9 +3773,9 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id) adapter->debugfs_dump = ent; } } - ibmvnic_send_crq_init(adapter); init_completion(>init_done); + ibmvnic_send_crq_init(adapter); if (!wait_for_completion_timeout(>init_done, timeout)) return 0; @@ -3782,9 +3783,9 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id) if (adapter->renegotiate) { adapter->renegotiate = false; release_sub_crqs_no_irqs(adapter); - send_cap_queries(adapter); reinit_completion(>init_done); + send_cap_queries(adapter); if (!wait_for_completion_timeout(>init_done, timeout)) return 0;
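The ordering issue can be modelled in a few lines of single-threaded user-space C. This is a toy model, not the kernel implementation: struct completion is reduced to its done counter, send_request() is a hypothetical request whose reply arrives before the caller runs again, and try_wait_for_completion() is used as a non-blocking stand-in for wait_for_completion(). It shows only the lost-completion half of the race; in the real driver, completing a structure whose lock and wait queue were never initialized can also fault.

```c
#include <stdbool.h>

/* Toy model of a completion: only the 'done' counter is kept. */
struct completion {
	unsigned int done;
};

void init_completion(struct completion *c)
{
	c->done = 0;
}

void complete(struct completion *c)
{
	c->done++;
}

/* True if a blocking wait would return immediately, false if it would sleep. */
bool try_wait_for_completion(struct completion *c)
{
	if (!c->done)
		return false;
	c->done--;
	return true;
}

/* Hypothetical request: the reply handler fires before we get to wait. */
void send_request(struct completion *c)
{
	complete(c);
}

/* Old ordering: the init wipes out a completion that has already fired. */
bool request_send_then_init(struct completion *c)
{
	send_request(c);
	init_completion(c);
	return try_wait_for_completion(c);
}

/* Patched ordering: initialize first, then start the work. */
bool request_init_then_send(struct completion *c)
{
	init_completion(c);
	send_request(c);
	return try_wait_for_completion(c);
}
```

In the old ordering the waiter would sleep forever (or until the timeout, where one is used), because the only complete() call happened before the counter was reset.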
[PATCH iproute2 1/1] man page: add page for skbmod action
Signed-off-by: Lucas BatesSigned-off-by: Jamal Hadi Salim Signed-off-by: Roman Mashak --- man/man8/Makefile| 2 +- man/man8/tc-skbmod.8 | 137 +++ 2 files changed, 138 insertions(+), 1 deletion(-) create mode 100644 man/man8/tc-skbmod.8 diff --git a/man/man8/Makefile b/man/man8/Makefile index bc2fc81..1bd2f02 100644 --- a/man/man8/Makefile +++ b/man/man8/Makefile @@ -16,7 +16,7 @@ MAN8PAGES = $(TARGETS) ip.8 arpd.8 lnstat.8 routel.8 rtacct.8 rtmon.8 rtpr.8 ss. tc-basic.8 tc-cgroup.8 tc-flow.8 tc-flower.8 tc-fw.8 tc-route.8 \ tc-tcindex.8 tc-u32.8 tc-matchall.8 \ tc-connmark.8 tc-csum.8 tc-mirred.8 tc-nat.8 tc-pedit.8 tc-police.8 \ - tc-simple.8 tc-skbedit.8 tc-vlan.8 tc-xt.8 tc-ife.8 \ + tc-simple.8 tc-skbedit.8 tc-vlan.8 tc-xt.8 tc-skbmod.8 tc-ife.8 \ tc-tunnel_key.8 \ devlink.8 devlink-dev.8 devlink-monitor.8 devlink-port.8 devlink-sb.8 \ ifstat.8 diff --git a/man/man8/tc-skbmod.8 b/man/man8/tc-skbmod.8 new file mode 100644 index 000..46418b6 --- /dev/null +++ b/man/man8/tc-skbmod.8 @@ -0,0 +1,137 @@ +.TH "skbmod action in tc" 8 "21 Sep 2016" "iproute2" "Linux" + +.SH NAME +skbmod - user-friendly packet editor action +.SH SYNOPSIS +.in +8 +.ti -8 +.BR tc " ... " "action skbmod " "{ [ " "set " +.IR SETTABLE " ] [ " +.BI swap " SWAPPABLE" +.RI " ] [ " CONTROL " ] [ " +.BI index " INDEX " +] } + +.ti -8 +.IR SETTABLE " := " +.RB " [ " dmac +.IR DMAC " ] " +.RB " [ " smac +.IR SMAC " ] " +.RB " [ " etype +.IR ETYPE " ] " + +.ti -8 +.IR SWAPPABLE " := " +.B mac +.ti -8 +.IR CONTROL " := {" +.BR reclassify " | " pipe " | " drop " | " shot " | " continue " | " pass " }" +.SH DESCRIPTION +The +.B skbmod +action is intended as a usability upgrade to the existing +.B pedit +action. Instead of having to manually edit 8-, 16-, or 32-bit chunks of an +ethernet header, +.B skbmod +allows complete substitution of supported elements. +.SH OPTIONS +.TP +.BI dmac " DMAC" +Change the destination mac to the specified address. 
+.TP +.BI smac " SMAC" +Change the source mac to the specified address. +.TP +.BI etype " ETYPE" +Change the ethertype to the specified value. +.TP +.BI mac +Used to swap mac addresses. The +.B swap mac +directive is performed +after any outstanding D/SMAC changes. +.TP +.I CONTROL +The following keywords allow to control how the tree of qdisc, classes, +filters and actions is further traversed after this action. +.RS +.TP +.B reclassify +Restart with the first filter in the current list. +.TP +.B pipe +Continue with the next action attached to the same filter. +.TP +.B drop +.TQ +.B shot +Drop the packet. +.TP +.B continue +Continue classification with the next filter in line. +.TP +.B pass +Finish classification process and return to calling qdisc for further packet +processing. This is the default. +.SH EXAMPLES +To start, observe the following filter with a pedit action: + +.RS +.EX +tc filter add dev eth1 parent 1: protocol ip prio 10 \\ + u32 match ip protocol 1 0xff flowid 1:2 \\ + action pedit munge offset -14 u8 set 0x02 \\ + munge offset -13 u8 set 0x15 \\ + munge offset -12 u8 set 0x15 \\ + munge offset -11 u8 set 0x15 \\ + munge offset -10 u16 set 0x1515 \\ + pipe +.EE +.RE + +Using the skbmod action, this command can be simplified to: + +.RS +.EX +tc filter add dev eth1 parent 1: protocol ip prio 10 \\ + u32 match ip protocol 1 0xff flowid 1:2 \\ + action skbmod set dmac 02:15:15:15:15:15 \\ + pipe +.EE +.RE + +Complexity will increase if source mac and ethertype are also being edited +as part of the action. 
If all three fields are to be changed with skbmod: + +.RS +.EX +tc filter add dev eth5 parent 1: protocol ip prio 10 \\ + u32 match ip protocol 1 0xff flowid 1:2 \\ + action skbmod \\ + set etype 0xBEEF \\ + set dmac 02:12:13:14:15:16 \\ + set smac 02:22:23:24:25:26 +.EE +.RE + +Finally, swap the destination and source mac addresses in the header: + +.RS +.EX +tc filter add dev eth3 parent 1: protocol ip prio 10 \\ + u32 match ip protocol 1 0xff flowid 1:2 \\ + action skbmod \\ + swap mac +.EE +.RE + +As mentioned above, the swap action will occur after any +.B " smac/dmac " +substitutions are executed, if they are present. + +.SH SEE ALSO +.BR tc (8), +.BR tc-u32 (8), +.BR tc-pedit (8) -- 2.7.4
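The ordering rule stated in the man page (set first, swap afterwards) can be illustrated with a small user-space model. This is not the kernel's act_skbmod code; struct eth_hdr, struct skbmod_params and skbmod_apply() are hypothetical names, and the ethertype is kept in host byte order for simplicity.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

struct eth_hdr {
	uint8_t dst[ETH_ALEN];
	uint8_t src[ETH_ALEN];
	uint16_t etype;		/* host byte order in this sketch */
};

/* NULL pointers mean "leave this field alone". */
struct skbmod_params {
	const uint8_t *dmac;
	const uint8_t *smac;
	const uint16_t *etype;
	int swap_mac;
};

void skbmod_apply(struct eth_hdr *h, const struct skbmod_params *p)
{
	uint8_t tmp[ETH_ALEN];

	/* Step 1: apply any "set" substitutions. */
	if (p->dmac)
		memcpy(h->dst, p->dmac, ETH_ALEN);
	if (p->smac)
		memcpy(h->src, p->smac, ETH_ALEN);
	if (p->etype)
		h->etype = *p->etype;

	/* Step 2: the swap sees the already substituted addresses. */
	if (p->swap_mac) {
		memcpy(tmp, h->dst, ETH_ALEN);
		memcpy(h->dst, h->src, ETH_ALEN);
		memcpy(h->src, tmp, ETH_ALEN);
	}
}
```

So a rule that sets dmac and also swaps ends up with the new address in the source field and the packet's original source in the destination field.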
[PATCH] ibmvnic: Call napi_disable instead of napi_enable in failure path
The failure path in ibmvnic_open() mistakenly makes a second call to
napi_enable instead of calling napi_disable. This can result in a BUG_ON
for any queues that were enabled in the previous call to napi_enable.

Signed-off-by: Nathan Fontenot
---
 drivers/net/ethernet/ibm/ibmvnic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index c125966..a024141 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -505,7 +505,7 @@ static int ibmvnic_open(struct net_device *netdev)
 	adapter->rx_pool = NULL;
 rx_pool_arr_alloc_failed:
 	for (i = 0; i < adapter->req_rx_queues; i++)
-		napi_enable(&adapter->napi[i]);
+		napi_disable(&adapter->napi[i]);
 alloc_napi_failed:
 	return -ENOMEM;
 }
[PATCH] NET: mkiss/6pack: Fix SIOCSIFENCAP ioctl
When looking at Thomas' mkiss fix 7ba1b6890387 ("NET: mkiss: Fix panic")
I noticed that the mkiss SIOCSIFENCAP ioctl was also doing a slightly
strange assignment

	dev->hard_header_len = AX25_KISS_HEADER_LEN + AX25_MAX_HEADER_LEN + 3;

AX25_MAX_HEADER_LEN already accounts for the KISS byte, so adding
AX25_KISS_HEADER_LEN counts it twice, nor does the "+ 3" seem to be
necessary. So this can be simplified to

	dev->hard_header_len = AX25_MAX_HEADER_LEN

which after the preceding fix is a redundant assignment of what ax_setup
has already assigned, so delete the line. The assignments to
dev->addr_len and dev->type are similarly redundant.

The SIOCSIFENCAP argument was never checked for validity. Check that it
is 4 and return -EINVAL if not. The magic constant 4 dates back to the
days when KISS was handled by the SLIP driver, where it had the symbol
name SL_MODE_AX25.

Since, however, mkiss.c only supports a single encapsulation mode, there
is no point in storing it in struct mkiss, so delete all that.

Note that while useless we can't delete the SIOCSIFENCAP ioctl as
kissattach(8) is still using it and without mkiss issuing a SIOCSIFENCAP
ioctl an older kernel that does not have Thomas' mkiss fix would still
panic on attempt to transmit via mkiss.

6pack was suffering from the same issue, except there SIOCGIFENCAP was
returning 0 for the encapsulation while the spattach utility was passing
4 for the mode, so the mode check added for 6pack is a bit more lenient,
allowing the values 0 and 4 to be set. That way we retain the option to
set different encapsulation modes for future extensions.
Signed-off-by: Ralf Baechle

 drivers/net/hamradio/6pack.c | 10 --
 drivers/net/hamradio/mkiss.c | 10 --
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/net/hamradio/6pack.c b/drivers/net/hamradio/6pack.c
index 470b3dc..d949b9f 100644
--- a/drivers/net/hamradio/6pack.c
+++ b/drivers/net/hamradio/6pack.c
@@ -104,7 +104,6 @@ struct sixpack {
 	int buffsize; /* Max buffers sizes */
 	unsigned long flags; /* Flag values/ mode etc */
-	unsigned char mode; /* 6pack mode */
 	/* 6pack stuff */
 	unsigned char tx_delay;
@@ -723,11 +722,10 @@ static int sixpack_ioctl(struct tty_struct *tty, struct file *file,
 			break;
 		}
-		sp->mode = tmp;
-		dev->addr_len = AX25_ADDR_LEN;
-		dev->hard_header_len = AX25_KISS_HEADER_LEN +
-				       AX25_MAX_HEADER_LEN + 3;
-		dev->type = ARPHRD_AX25;
+		if (tmp != 0 && tmp != 4) {
+			err = -EINVAL;
+			break;
+		}
 		err = 0;
 		break;
diff --git a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c
index 1dfe230..cdaf819 100644
--- a/drivers/net/hamradio/mkiss.c
+++ b/drivers/net/hamradio/mkiss.c
@@ -71,7 +71,6 @@ struct mkiss {
 #define AXF_KEEPTEST 3 /* Keepalive test flag */
 #define AXF_OUTWAIT 4 /* is outpacket was flag */
-	int mode;
 	int crcmode; /* MW: for FlexNet, SMACK etc. */
 	int crcauto; /* CRC auto mode */
@@ -841,11 +840,10 @@ static int mkiss_ioctl(struct tty_struct *tty, struct file *file,
 			break;
 		}
-		ax->mode = tmp;
-		dev->addr_len = AX25_ADDR_LEN;
-		dev->hard_header_len = AX25_KISS_HEADER_LEN +
-				       AX25_MAX_HEADER_LEN + 3;
-		dev->type = ARPHRD_AX25;
+		if (tmp != 4) {
+			err = -EINVAL;
+			break;
+		}
 		err = 0;
 		break;
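The two mode checks added by this patch differ only in which values they accept. A minimal Python model of that difference, assuming nothing beyond what the commit message and diff state (mkiss accepts only 4, 6pack accepts 0 and 4, everything else yields -EINVAL), might look like this. The function names and the `KISS_ENCAP_MODE` constant are hypothetical and chosen for illustration only.

```python
import errno

# The "magic constant 4" from the commit message (historically SL_MODE_AX25).
KISS_ENCAP_MODE = 4


def mkiss_check_encap(mode):
    """Model of the mkiss SIOCSIFENCAP check: only mode 4 is valid."""
    return 0 if mode == KISS_ENCAP_MODE else -errno.EINVAL


def sixpack_check_encap(mode):
    """Model of the 6pack check: 0 and 4 are both accepted."""
    return 0 if mode in (0, KISS_ENCAP_MODE) else -errno.EINVAL
```

The more lenient 6pack variant follows from the commit message: 6pack reported 0 for the encapsulation while spattach passed 4, so both values need to be accepted.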
[PATCH] net: natsemi: ns83820: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. As I don't have the hardware, I'd be very pleased if someone may test this patch. Signed-off-by: Philippe Reynes--- drivers/net/ethernet/natsemi/ns83820.c | 46 +-- 1 files changed, 25 insertions(+), 21 deletions(-) diff --git a/drivers/net/ethernet/natsemi/ns83820.c b/drivers/net/ethernet/natsemi/ns83820.c index f9d2eb9..729095d 100644 --- a/drivers/net/ethernet/natsemi/ns83820.c +++ b/drivers/net/ethernet/natsemi/ns83820.c @@ -1217,12 +1217,13 @@ static void ns83820_update_stats(struct ns83820 *dev) } /* Let ethtool retrieve info */ -static int ns83820_get_settings(struct net_device *ndev, - struct ethtool_cmd *cmd) +static int ns83820_get_link_ksettings(struct net_device *ndev, + struct ethtool_link_ksettings *cmd) { struct ns83820 *dev = PRIV(ndev); u32 cfg, tanar, tbicr; int fullduplex = 0; + u32 supported; /* * Here's the list of available ethtool commands from other drivers: @@ -1244,44 +1245,47 @@ static int ns83820_get_settings(struct net_device *ndev, fullduplex = (cfg & CFG_DUPSTS) ? 1 : 0; - cmd->supported = SUPPORTED_Autoneg; + supported = SUPPORTED_Autoneg; if (dev->CFG_cache & CFG_TBI_EN) { /* we have optical interface */ - cmd->supported |= SUPPORTED_1000baseT_Half | + supported |= SUPPORTED_1000baseT_Half | SUPPORTED_1000baseT_Full | SUPPORTED_FIBRE; - cmd->port = PORT_FIBRE; + cmd->base.port = PORT_FIBRE; } else { /* we have copper */ - cmd->supported |= SUPPORTED_10baseT_Half | + supported |= SUPPORTED_10baseT_Half | SUPPORTED_10baseT_Full | SUPPORTED_100baseT_Half | SUPPORTED_100baseT_Full | SUPPORTED_1000baseT_Half | SUPPORTED_1000baseT_Full | SUPPORTED_MII; - cmd->port = PORT_MII; + cmd->base.port = PORT_MII; } - cmd->duplex = fullduplex ? DUPLEX_FULL : DUPLEX_HALF; + ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported, + supported); + + cmd->base.duplex = fullduplex ? 
DUPLEX_FULL : DUPLEX_HALF; switch (cfg / CFG_SPDSTS0 & 3) { case 2: - ethtool_cmd_speed_set(cmd, SPEED_1000); + cmd->base.speed = SPEED_1000; break; case 1: - ethtool_cmd_speed_set(cmd, SPEED_100); + cmd->base.speed = SPEED_100; break; default: - ethtool_cmd_speed_set(cmd, SPEED_10); + cmd->base.speed = SPEED_10; break; } - cmd->autoneg = (tbicr & TBICR_MR_AN_ENABLE) + cmd->base.autoneg = (tbicr & TBICR_MR_AN_ENABLE) ? AUTONEG_ENABLE : AUTONEG_DISABLE; return 0; } /* Let ethool change settings*/ -static int ns83820_set_settings(struct net_device *ndev, - struct ethtool_cmd *cmd) +static int ns83820_set_link_ksettings(struct net_device *ndev, + const struct ethtool_link_ksettings *cmd) { struct ns83820 *dev = PRIV(ndev); u32 cfg, tanar; @@ -1306,10 +1310,10 @@ static int ns83820_set_settings(struct net_device *ndev, spin_lock(>tx_lock); /* Set duplex */ - if (cmd->duplex != fullduplex) { + if (cmd->base.duplex != fullduplex) { if (have_optical) { /*set full duplex*/ - if (cmd->duplex == DUPLEX_FULL) { + if (cmd->base.duplex == DUPLEX_FULL) { /* force full duplex */ writel(readl(dev->base + TXCFG) | TXCFG_CSI | TXCFG_HBI | TXCFG_ATP, @@ -1333,7 +1337,7 @@ static int ns83820_set_settings(struct net_device *ndev, /* Set autonegotiation */ if (1) { - if (cmd->autoneg == AUTONEG_ENABLE) { + if (cmd->base.autoneg == AUTONEG_ENABLE) { /* restart auto negotiation */ writel(TBICR_MR_AN_ENABLE | TBICR_MR_RESTART_AN, dev->base + TBICR); @@ -1348,7 +1352,7 @@ static int ns83820_set_settings(struct net_device *ndev, } printk(KERN_INFO "%s: autoneg %s via ethtool\n", ndev->name, - cmd->autoneg ? "ENABLED" : "DISABLED"); + cmd->base.autoneg ? "ENABLED" : "DISABLED"); }
Re: [RFC PATCH net-next 1/2] bpf: Save original ebpf instructions
On 02/10/2017 06:22 AM, Alexei Starovoitov wrote: On Thu, Feb 09, 2017 at 12:25:37PM +0100, Daniel Borkmann wrote: Correct the overlap both use-cases share is the dump itself. It needs to be in such a condition for CRIU, that it can be reloaded eventually, I don't think it makes sense to drag criu into this discussion. I expressed my take on criu in the other thread. tldr: bpf is a graph of dependencies between programs, maps, applications and kernel events. So to save/restore this graph one would need to solve very hard problems of stopping multiple applications at once, stopping kernel events and so on. I don't think it's worth going that route. Definitely not straight forward, fully agree. Worst-case you probably need to go via stop_machine() (like in ftrace case when it modifies code) in order to get a global consistent snapshot at a specific time. Sounds ugly. Or if small steps first, tail calls etc would not be supported; then you would need to tackle progs and generic maps, for the progs part it could be a very similar interface at least, thus I'm saying that it would be good if it's designed extendable in future on that regard. - Alternatively, the attach is always done by passing the FD as an attribute, so the netlink dump could attach an fd to the running program, return the FD as an attribute and the bpf program is retrieved >from the fd. This is a major departure from how dumps work with processing attributes and needing to attach open files to a process will be problematic. Integrating the bpf into the dump is a natural fit. Right, I think it's a natural fit to place it into the various points/ places where it's attached to, as we're stuck with that anyway for the attachment part. Meaning in cls_bpf, it would go as a mem blob into the netlink attribute. 
There would need to be a common BPF core helper that the various subsystem users call in order to generate that mentioned output format, and that resulting mem blob is then stuck into either nlattr, mem provided by syscall, etc. I think if we use ten different ways to dump it, it will complicate the user space tooling. I'd rather see one way of doing it via new syscall command. Pass prog_fd and it will return insns in some form. Here is more concrete proposal: - add two flags to PROG_LOAD: BPF_F_ENFORCE_STATELESS - it will require verifier to check that program doesn't use maps and any other global state (doesn't use bpf_redirect, doesn't use bpf_set_tunnel_key and tunnel_opt) This will ensure that program is stateless and pure instruction dump is meaningful. For 'ip vrf' case it will be enough. I don't think such flag will be needed from uapi pov. Verifier can just set a flag like that in the bpf_prog aux bits while verifying ... BPF_F_ALLOW_DUMP - it will save original program, so in the common case we wouldn't need to waste memory to save program ... and when that one is passed and the prog has state, then it gets rejected. Effectively, both flags are saying the same thing. Plus side is that you don't waste any resources when not set, but problem I see is that BPF_F_ALLOW_DUMP requires explicit cooperation from a process, when used for introspection doing that transparently instead might be more desirable. Problem is that even when transparent, we have mentioned limitations, so someone who doesn't want to cooperate could then just use f.e. an empty tail call map on exit and that would be enough to make dump not supported again. But also with just BPF_F_ALLOW_DUMP, I can foresee that in half a year or so people request that dump should be possible also without BPF_F_ALLOW_DUMP explicitly set. 
- add new bpf syscall command BPF_PROG_DUMP input: prog_fd, output: insns it will work right away with OBJ_GET command and the user will be able to dump stateless programs pinned in bpffs (And with that it requires really cooperation by design.) - add approriate interfaces for different attach points to return prog_fd: for cgroup it will be new BPF_PROG_GET command. for socket it will be new getsockopt. (Actually BPF_PROG_GET can work for sockets too and probably better). I assume you mean above BPF_PROG_DUMP, right? Yeah, for them it's not that difficult, agree. for xdp and tc we need to find a way to return prog_fd. netlink is no good, since it would be very weird to install fd and return it async in netlink body. We can simply say that whoever wants to dump programs need to first pin them in bpffs and then attach to tc/xdp. iproute2 already does it anyway. Realistically tc/xdp programs are almost always stateful, so dump won't be available for them anyway. Right, but if it's just for introspection, I still think that this format I described earlier could work. Meaning for maps, you dump all the params used to create the map along with refs where they are used, that would allow for tc/xdp to dump it at least. It still wouldn't support tail calls, but you
Re: [PATCH net-next 4/4] net/sched: cls_bpf: Use skip flags to reflect HW offload status
On Thu, 9 Feb 2017 16:18:08 +0200, Or Gerlitz wrote:
> Currently there is no way of querying whether a filter is
> offloaded to HW or not when using both policy (no flag).
>
> Reuse the skip flags to show the insertion status by setting
> the skip_hw flag in case the filter wasn't offloaded.
>
> Signed-off-by: Or Gerlitz

FWIW I tested this one and it works. I also tested this version which
would take advantage of @offloaded:

diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index d9c97018317d..51d464f991ff 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -568,8 +568,8 @@ static int cls_bpf_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 			struct sk_buff *skb, struct tcmsg *tm)
 {
 	struct cls_bpf_prog *prog = (struct cls_bpf_prog *) fh;
+	u32 gen_flags, bpf_flags = 0;
 	struct nlattr *nest;
-	u32 bpf_flags = 0;
 	int ret;
 	if (prog == NULL)
@@ -601,8 +601,11 @@ static int cls_bpf_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 		bpf_flags |= TCA_BPF_FLAG_ACT_DIRECT;
 	if (bpf_flags && nla_put_u32(skb, TCA_BPF_FLAGS, bpf_flags))
 		goto nla_put_failure;
-	if (prog->gen_flags &&
-	    nla_put_u32(skb, TCA_BPF_FLAGS_GEN, prog->gen_flags))
+
+	gen_flags = prog->gen_flags;
+	if (!prog->offloaded)
+		gen_flags |= TCA_CLS_FLAGS_SKIP_HW;
+	if (gen_flags && nla_put_u32(skb, TCA_BPF_FLAGS_GEN, gen_flags))
 		goto nla_put_failure;
 	nla_nest_end(skb, nest);
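The flag computation in the second hunk can be restated as a small Python model. This is only a sketch of the logic visible in the diff: the numeric bit values are assumed for illustration, and `dumped_gen_flags` and `should_emit_attr` are hypothetical names, not kernel functions.

```python
# Bit values assumed for illustration; the diff only uses the symbolic names.
TCA_CLS_FLAGS_SKIP_HW = 1 << 0
TCA_CLS_FLAGS_SKIP_SW = 1 << 1


def dumped_gen_flags(gen_flags, offloaded):
    """Flags reported on dump: SKIP_HW is added when the filter is not in HW."""
    if not offloaded:
        gen_flags |= TCA_CLS_FLAGS_SKIP_HW
    return gen_flags


def should_emit_attr(gen_flags, offloaded):
    """The attribute is only put into the message when the result is non-zero."""
    return dumped_gen_flags(gen_flags, offloaded) != 0
```

With this reading, a filter added with no flags reports skip_hw when it did not reach hardware, and reports no generic flags at all when it did.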
Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
[repost with netdev added - hadn't realized it wasn't in Cc]

On Tue, Aug 09, 2016 at 03:58:36PM +0100, Al Viro wrote:

> Actually returning to the original behaviour would be "restore ->msg_iter
> if we tried skb_copy_and_csum_datagram() and failed for any reason". Which
> would be bloody inconsistent wrt EFAULT, since the other branch (chunk
> large enough to cover the entire recvmsg()) will copy as much as it can
> and (in old kernel) drain iovec or (on the current one) leave iov_iter
> advance unreverted.

To resurrect the old thread: the problem is still there. Namely, csum
mismatch on packet should leave the iterator as it had been. That much
is clear; the question is what should be done on EFAULT halfway through.

Semantics of both csum and non-csum skb_copy_datagram_msg() variants in
EFAULT case is an interesting question. None of that family reports
partial copy; it's full or -EFAULT. So for the sake of basic sanity it
would be better to leave iterator in the original state when that kind
of thing happens. On the other hand, quite a few callers don't care
about the state of iterator after that and I wonder if the overhead
would be noticeable. OTOH, the overhead in question is "save 5 words
into local variable and don't use it in the normal case" - in the code
that copies an skb worth of data.

AFAICS, the following gives consistent (and minimally surprising)
semantics, as well as fixing the outright bug with iov_iter left
advanced in case of csum errors. Comments?
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c index c27011bbe30c..14ae17e77603 100644 --- a/drivers/net/macvtap.c +++ b/drivers/net/macvtap.c @@ -848,7 +848,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q, vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto); total += VLAN_HLEN; - ret = skb_copy_datagram_iter(skb, 0, iter, vlan_offset); + ret = __skb_copy_datagram_iter(skb, 0, iter, vlan_offset); if (ret || !iov_iter_count(iter)) goto done; @@ -857,7 +857,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q, goto done; } - ret = skb_copy_datagram_iter(skb, vlan_offset, iter, + ret = __skb_copy_datagram_iter(skb, vlan_offset, iter, skb->len - vlan_offset); done: @@ -899,11 +899,14 @@ static ssize_t macvtap_do_read(struct macvtap_queue *q, finish_wait(sk_sleep(>sk), ); if (skb) { + struct iov_iter saved = *to; ret = macvtap_put_user(q, skb, to); - if (unlikely(ret < 0)) + if (unlikely(ret < 0)) { + *to = saved; kfree_skb(skb); - else + } else { consume_skb(skb); + } } return ret; } diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c index a411b43a69eb..0d8badc3c4e9 100644 --- a/drivers/net/ppp/ppp_generic.c +++ b/drivers/net/ppp/ppp_generic.c @@ -480,7 +480,7 @@ static ssize_t ppp_read(struct file *file, char __user *buf, iov.iov_base = buf; iov.iov_len = count; iov_iter_init(, READ, , 1, count); - if (skb_copy_datagram_iter(skb, 0, , skb->len)) + if (__skb_copy_datagram_iter(skb, 0, , skb->len)) goto outf; ret = skb->len; diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 30863e378925..2003b8c9970e 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1430,7 +1430,7 @@ static ssize_t tun_put_user(struct tun_struct *tun, vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto); - ret = skb_copy_datagram_iter(skb, 0, iter, vlan_offset); + ret = __skb_copy_datagram_iter(skb, 0, iter, vlan_offset); if (ret || !iov_iter_count(iter)) goto done; @@ -1439,7 +1439,8 @@ static ssize_t tun_put_user(struct 
tun_struct *tun, goto done; } - skb_copy_datagram_iter(skb, vlan_offset, iter, skb->len - vlan_offset); + /* XXX: no error check? */ + __skb_copy_datagram_iter(skb, vlan_offset, iter, skb->len - vlan_offset); done: /* caller is in process context, */ @@ -1501,6 +1502,7 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile, { struct sk_buff *skb; ssize_t ret; + struct iov_iter saved; int err; tun_debug(KERN_INFO, tun, "tun_do_read\n"); @@ -1513,11 +1515,14 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile, if (!skb) return err; + saved = *to; ret = tun_put_user(tun, tfile, skb, to); - if (unlikely(ret < 0)) + if (unlikely(ret < 0)) { + *to = saved; kfree_skb(skb); - else + } else
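The caller-side pattern in the macvtap and tun hunks above (snapshot the iterator, attempt the copy, put the snapshot back on failure) can be illustrated with a toy model. The Python sketch below is not kernel code: `IovIter` is a hypothetical stand-in for struct iov_iter that tracks only a buffer and a position, and the failure is injected by a flag rather than caused by a real fault.

```python
import copy
import errno


class IovIter:
    """Toy stand-in for struct iov_iter: a buffer plus a write position."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.pos = 0


def copy_chunk(it, data, fail=False):
    """Copy data at the current position; optionally fail halfway through."""
    n = len(data) // 2 if fail else len(data)
    it.buf[it.pos:it.pos + n] = data[:n]
    it.pos += n  # the iterator is advanced even when the copy later fails
    return -errno.EFAULT if fail else 0


def put_user(it, header, payload, fail_payload=False):
    """Two-step copy, like the header/rest split in the diff."""
    ret = copy_chunk(it, header)
    if ret:
        return ret
    ret = copy_chunk(it, payload, fail=fail_payload)
    return ret if ret else len(header) + len(payload)


def do_read(it, header, payload, fail_payload=False):
    """Snapshot, try, and restore the position on error."""
    saved = copy.copy(it)  # shallow copy: buffer shared, position copied
    ret = put_user(it, header, payload, fail_payload)
    if ret < 0:
        it.pos = saved.pos  # models '*to = saved' in the patch
    return ret
```

The point of the model is the invariant: after a failed read the position is exactly what it was before the call, so the result is all-or-nothing from the caller's point of view.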
Re: [RFC PATCH net] bpf: introduce BPF_F_ALLOW_OVERRIDE flag
On Thu, Feb 9, 2017 at 10:59 AM, Alexei Starovoitov wrote:
> If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
> to the given cgroup the descendent cgroup will be able to override
> effective bpf program that was inherited from this cgroup.
> By default it's not passed, therefore override is disallowed.
>
> Examples:
> 1.
> prog X attached to /A with default
> prog Y fails to attach to /A/B and /A/B/C
> Everything under /A runs prog X
>
> 2.
> prog X attached to /A with ALLOW_OVERRIDE
> prog Y attached to /A/B with default. Everything under /A/B runs prog Y

I think that, for ease of future extension, Y should also need
ALLOW_OVERRIDE. Otherwise, when non-overridable hooks can stack,
there could be confusion as to whether Y should override something or
should stack.

> prog M attached to /A/C with default. Everything under /A/C runs prog M
> prog N fails to attach to /A/C/foo.
> prog L attached to /A/D with ALLOW_OVERRIDE.
> Events under /A/D run prog L and can be overridden in /A/D/foo
>
> /A still runs prog X
> prog K attached to /A with ALLOW_OVERRIDE.
> /A now runs prog K while /A/B runs prog Y and /A/C runs prog M
> prog J attached to /A with default.
> /A now runs prog J while /A/B runs prog Y.
> /A/B cannot be changed anymore (since parent disallows override),
> but can be cleared. After detach /A/B will run prog J.
>
> Signed-off-by: Alexei Starovoitov
> ---
>
> Below are few proposals for future extensions and not definitive:
> 1.
> we can extend the behavior with a chain of non-overridable like:
> prog X attached to /A with default
> prog Y attached to /A/B with default
> The events scoped by /A/B will run program Y first and if it returns 1
> the prog X will be run. For control app there will be an illusion
> that it owns cgroup /A/B with single prog and detach from /A/B will delete
> prog Y unambiguously.
> While another control app that attached to /A also see its prog X running,
> unless prog Y filtered it out, which means (from X point of view)
> that event didn't happen.
> Attaching two programs to /A is not allowed.
> We would need to combine prog X and Y into array to avoid link list
> traversal for performance reasons, but that's an implementation detail.
>
> 2.
> we can add another flag to reverse this call order too.
> Instead of calling the progs from child to parent, do parent to child.

I think the order should depend on the hook. Hooks for
process-initiated actions (egress, socket creation) should run
innermost first and hooks for outside actions (ingress) should be
outermost first.

>
> 3.
> we can extend the api further by adding 'attach_priority' flag as:
> prog X attach /A prio=20
> prog Y attach /A prio=10
> prog N attach /A/B prio=20
> prog M attach /A/B prio=10
> in /A/B the sequence of progs will be M -> N -> Y -> X

I haven't thought of a use for this. Maybe there is one.

>
> prog X attach /A prio=10 and prog Y attach /A prio=10 will be disallowed,
> but attach with the same prio to different cgroups is ok.
> If attached with prio, detach must specify prio as well.
> Attach transitions:
> allow_override -> disable_override/single_prog = ok
> allow_override -> prio (multi prog at the same cgroup) = ok
> disable_override/single_prog -> prio = ok (with respect to child/parent order)
> prio -> allow_override = fail
> prio -> disable_override/single_prog = fail
>
> ***
> To summarize the key to not breaking abi is to preserve user space
> expectations. Right now (without this patch) we have progs
> overridable by any descendent. Which means that control plane
> application has to expect that something may overwrite the program.
> Hence any new flag will not break this expectation
> (overridable == control plane cannot assume that its attached
> programs will run in the hostile environment)
> and that's the main reason why I don't think we need to change anything now
> and hence this patch is an RFC.
>
> Adding 'allow_override' flag and changing the default to
> override disallowed is also fine from api extensibility point of view.
> Since for 'override disallowed' case the control plane app will
> be expecting that any processes will not override its program
> in the descendent cgroups and it will run. This would have to be preserved.
> That's why the future api extensions (like #1 above) would have to do
> the program chaining to preserve 'disallow override' flag expectations.
> So imo it's safer to keep overridable as it is today, since this flag
> adds a bit more restrictions to the future extensions
> comparing to everything overridable.
>
> Andy,
> does it all make sense?

Yes with the caveat above.

> Do you still insist on submitting this patch officially?

I'm not sure what you mean.

> or you're ok keeping it overridable for now.

I really think the default should change for 4.10. People are going
to use this feature for sandboxing or in systemd or whatever, and that
code should keep working in newer kernels
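The attach examples quoted from the RFC can be replayed against a toy model to check that they are mutually consistent. The Python sketch below is one possible reading of those examples, not the kernel implementation: `CgroupBpf` is a hypothetical class, cgroups are plain path strings, a single attach type is modelled, and it is assumed that any ancestor holding a non-overridable program blocks attachment below it. The suggestion made in this reply (that Y should also need ALLOW_OVERRIDE) is deliberately not modelled.

```python
class CgroupBpf:
    """Toy model of one attach type on a cgroup hierarchy (paths as strings)."""

    def __init__(self):
        self.attached = {}  # path -> (prog, allow_override)

    @staticmethod
    def ancestors(path):
        """Proper ancestors of '/A/B/C', nearest first: ['/A/B', '/A']."""
        parts = path.strip("/").split("/")
        return ["/" + "/".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]

    def attach(self, path, prog, allow_override=False):
        """Return True on success, False if an ancestor forbids overriding."""
        for anc in self.ancestors(path):
            if anc in self.attached and not self.attached[anc][1]:
                return False
        self.attached[path] = (prog, allow_override)
        return True

    def detach(self, path):
        """Clearing a program is always allowed in this model."""
        self.attached.pop(path, None)

    def effective(self, path):
        """Program that runs for events in path: nearest attached cgroup wins."""
        for node in [path] + self.ancestors(path):
            if node in self.attached:
                return self.attached[node][0]
        return None
```

Replaying example 2 step by step (X, Y, M, N, L, then K and J replacing the program on /A) reproduces the outcomes listed in the quoted text, including /A/B falling back to prog J once Y is detached.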
Re: [PATCH] [net-next] ARM: orion: fix PHYLIB dependency
On Fri, Feb 10, 2017 at 9:57 PM, Florian Fainelliwrote: > On 02/10/2017 12:05 PM, Arnd Bergmann wrote: >> On Friday, February 10, 2017 9:42:21 AM CET Florian Fainelli wrote: >>> On 02/10/2017 12:20 AM, Arnd Bergmann wrote: On Thu, Feb 9, 2017 at 7:22 PM, Florian Fainelli wrote: > On 02/09/2017 07:08 AM, Arnd Bergmann wrote: > I disabled CONFIG_NETDEVICES to force CONFIG_PHY not to be set here, and > I was not able to reproduce this, what am I missing? In the ARMv5 allmodconfig build, this fails because CONFIG_PHY=m, and we can't call into it. You could use IS_BUILTIN instead of IS_ENABLED in the header as a oneline workaround, but I think that would be more confusing to real users that try to use CONFIG_PHY=m without realizing why they lose access to their switch. >>> >>> I see, this patch should also help fixing this: >>> >>> http://patchwork.ozlabs.org/patch/726381/ >> >> I think you still have the same problem, as you can still have the >> boardinfo registration in a loadable module. > > The patch exports mdiobus_register_board_info() so that should solve > your problem here, and I did verify this with a loadable module that > references mdiobus_register_board_info() in that case. No, that's a different problem. What you get with arm allmodconfig (try it!) is that mdio-bus.ko is a loadable module, but referenced from built-in code rather than the other way around. Exporting the symbol doesn't change anything since the module cannot be loaded by the time we need the symbol. >> >> I have come up with a patch too now and done some randconfig testing >> on it (it took me several tries as well), please see below. It does >> some of the same things as yours and some others. >> >> The main trick is to have a separate 'MDIO_BOARDINFO' Kconfig symbol >> that can be selected regardless of all the other symbols, and that >> will lead to the registration being either built-in when it's needed >> or not built at all when either no board calls it, or PHYLIB is >> disabled. 
> > Your patch is fine in premise except that you are making CONFIG_MDIO > encompass both drivers/net/mdio.c and > drivers/net/phy/mdio_{bus,device}.c and these do share the same header > (for better or for worse), but are not quite dealing with MDIO at the > same level. drivers/net/mdio.c is more like PHYLIB for the old-style, > pre mdiobus() drivers helper functions. Ah, makes sense. I had missed that part. > I like it that you made MDIO_BOARDINFO separate, and that is probably a > patch I should incorporate in the other patch splitting things up, see > below though for the remainder of the changes. Ok. >> >> From f35e89cacfabdf7b822772013389132605941def Mon Sep 17 00:00:00 2001 >> From: Arnd Bergmann >> Date: Wed, 27 Apr 2016 11:51:18 +0200 >> Subject: [PATCH] [RFC] move ethernet PHY config into drivers/phy/Kconfig >> >> Calling mdiobus_register_board_info from builtin code with CONFIG_PHYLIB=m >> currently results in a link error: >> >> arch/arm/plat-orion/common.o: In function `orion_ge00_switch_init': >> common.c:(.init.text+0x6a4): undefined reference to >> `mdiobus_register_board_info' >> >> As the long-term strategy is to separate mdio from phylib, and to get >> generic-phy >> and (networking-only) phylib closer together, this performs a first step in >> that >> direction: The Kconfig file for phylib gets logically pulled under the PHY >> driver configuration and becomes independent from networking. This lets us >> select the new CONFIG_MDIO_BOARDINFO from platforms that need it, and provide >> the functions exactly when we need them. > > This is too broad, the only part that is worth in drivers/net/phy/ of > pulling out of drivers/net/phy/ is what I tried to extract: mdio bus and > device. There are some bad inter-dependencies between that code and > phy_device.c and phy.c which makes it hard to split and make that part > completely standalone for now. 
> > The only part that is truly valuable to non-Ethernet PHY devices is the > MDIO bus/device registration part, which is available in my patch with > CONFIG_MDIO_DEVICE, and which probably should not depend from > NETDEVICES, so the other part of your patch makes sense too here. My patch started out from something I had done a long time ago when we discussed how the two subsystems (generic-phy and phylib) can be tied together more. This has two aspects: - Moving them into a single top-level Kconfig menu (and eventually directory) to make it easier to find one of them when you look in the wrong place. My patch starts doing that. - making the MDIO bus available to generic-phy drivers. This is what your patch does. Right now, we only really need part of my patch to fix the link error, but it makes way more sense once all the parts come together. Arnd
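The link failure discussed in this thread comes down to how tristate Kconfig values interact. The Python sketch below models that interaction under simple assumptions: option values are the strings "y", "m" and "n", and `can_link` is a hypothetical helper expressing the rule that built-in code can only reference symbols that are themselves built in. It is an illustration of the reasoning, not of the Kconfig implementation.

```python
def is_enabled(value):
    """IS_ENABLED(): true for built-in ('y') and modular ('m') options."""
    return value in ("y", "m")


def is_builtin(value):
    """IS_BUILTIN(): true only for 'y'."""
    return value == "y"


def can_link(caller, provider):
    """Can code built as `caller` reference a symbol built as `provider`?

    Built-in code is linked into vmlinux, so it can only use symbols that
    are built in too; a module can use built-in or exported module symbols.
    """
    if caller == "y":
        return provider == "y"
    if caller == "m":
        return provider in ("y", "m")
    return False
```

In the allmodconfig case described above, the platform code is "y" and the MDIO bus code is "m": `is_enabled("m")` is true, so a header guarded only by IS_ENABLED exposes the real prototype, while `can_link("y", "m")` is false, which matches the undefined reference. Exporting the symbol does not change the second result.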
Re: [PATCH v4 0/3] Miscellaneous fixes for BPF (perf tree)
On 10 February 2017 at 09:42, Arnaldo Carvalho de Melo wrote:
> On Wed, Feb 08, 2017 at 09:27:41PM +0100, Mickaël Salaün wrote:
>> This series brings some fixes and small improvements to the BPF samples.
>>
>> This is intended for the perf tree and apply on 7a5980f9c006 ("tools lib bpf:
>> Add missing header to the library").
>
> Wang, are you ok with this series? Joe?

The changes look good to me. I also tried tracex5 and it seems to work fine.
[GIT] Networking
1) If the timing is wrong we can indefinitely stop generating new ipv6
   temporary addresses, from Marcus Huewe.

2) Don't double free per-cpu stats in ipv6 SIT tunnel driver, from
   Cong Wang.

3) Put protections in place so that AF_PACKET is not able to submit
   packets which don't even have a link level header to drivers. From
   Willem de Bruijn.

4) Fix memory leaks in ipv4 and ipv6 multicast code, from Hangbin Liu.

5) Don't use udp_ioctl() in l2tp code, UDP version expects a UDP socket
   and that doesn't go over very well when it is passed an L2TP one.
   Fix from Eric Dumazet.

6) Don't crash on NULL pointer in phy_attach_direct(), from Florian
   Fainelli.

Please pull, thanks a lot.

The following changes since commit 926af6273fc683cd98cd0ce7bf0d04a02eed6742:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-02-07 12:10:57 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git

for you to fetch changes up to 72fb96e7bdbbdd4421b0726992496531060f3636:

  l2tp: do not use udp_ioctl() (2017-02-10 15:57:34 -0500)

Boris Ostrovsky (1):
      xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend()

David Ahern (1):
      lwtunnel: valid encap attr check should return 0 when lwtunnel is disabled

David S. Miller (2):
      Merge branch 'net-header-length-truncation'
      Merge branch 'sierra_net-fixes'

Eric Dumazet (1):
      l2tp: do not use udp_ioctl()

Florian Fainelli (2):
      net: dsa: Do not destroy invalid network devices
      net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()

Hangbin Liu (1):
      igmp, mld: Fix memory leak in igmpv3/mld_del_delrec()

Kejian Yan (1):
      net: hns: Fix the device being used for dma mapping during TX

Marcus Huewe (1):
      ipv6: addrconf: fix generation of new temporary addresses

Ralf Baechle (1):
      NET: mkiss: Fix panic

Ross Lagerwall (1):
      xen-netfront: Improve error handling during initialization

Stefan Brüns (2):
      sierra_net: Add support for IPv6 and Dual-Stack Link Sense Indications
      sierra_net: Skip validating irrelevant fields for IDLE LSIs

Thanneeru Srinivasulu (1):
      net: thunderx: Fix PHY autoneg for SGMII QLM mode

Vineeth Remanan Pillai (1):
      xen-netfront: Rework the fix for Rx stall during OOM and network stress

WANG Cong (3):
      sit: fix a double free on error path
      ping: fix a null pointer dereference
      kcm: fix 0-length case for kcm_sendmsg()

Willem de Bruijn (2):
      net: introduce device min_header_len
      packet: round up linear to header len

Yendapally Reddy Dhananjaya Reddy (1):
      net: phy: Initialize mdio clock at probe function

 drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 108
 drivers/net/ethernet/cavium/thunder/thunder_bgx.h | 5 +
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 2 +-
 drivers/net/hamradio/mkiss.c | 4 ++--
 drivers/net/loopback.c | 1 +
 drivers/net/phy/mdio-bcm-iproc.c | 6 ++
 drivers/net/phy/phy_device.c | 28
 drivers/net/usb/sierra_net.c | 111 +++
 drivers/net/xen-netfront.c | 46 --
 include/linux/netdevice.h | 4
 include/net/lwtunnel.h | 5 -
 net/dsa/dsa2.c | 1 +
 net/ethernet/eth.c | 1 +
 net/ipv4/igmp.c | 1 +
 net/ipv4/ping.c | 2 ++
 net/ipv6/addrconf.c | 6 ++
 net/ipv6/mcast.c | 1 +
 net/ipv6/sit.c | 1 +
 net/kcm/kcmsock.c | 40 ++--
 net/l2tp/l2tp_core.h | 1 +
 net/l2tp/l2tp_ip.c | 27 ++-
 net/l2tp/l2tp_ip6.c | 2 +-
 net/packet/af_packet.c | 7 ---
 23 files changed, 297 insertions(+), 113 deletions(-)
Re: [PATCH net 1/1] net: fec: fix multicast filtering hardware setup
On Fri, Feb 10, 2017 at 3:54 AM, Andy Duan wrote:
> Fix hardware setup of multicast address hash:
> - Never clear the hardware hash (to avoid packet loss)
> - Construct the hash register values in software and then write once
> to hardware
>
> Signed-off-by: Fugang Duan
> Signed-off-by: Rui Sousa

It seems you forgot to put Rui's name in the From: field.
Re: [PATCH net-next,v2] gtp: add MAINTAINERS
From: Pablo Neira Ayuso
Date: Fri, 10 Feb 2017 13:26:27 +0100

> From: Pablo Neira
>
> Add maintainers for this tunnel driver. Include main osmocom.org mailing
> list too.
>
> Signed-off-by: Pablo Neira Ayuso
> ---
> v2: Harald suggests osmocom-net-g...@lists.osmocom.org is better ML for this.

Applied, thanks.
Re: [PATCH 2/3] tipc: Fix tipc_sk_reinit race conditions
On 02/07/2017 08:39 PM, Herbert Xu wrote:
> There are two problems with the function tipc_sk_reinit. Firstly
> it's doing a manual walk over an rhashtable. This is broken as
> an rhashtable can be resized and if you manually walk over it
> during a resize then you may miss entries.
>
> Secondly it's missing memory barriers as previously the code used
> spinlocks which provide the barriers implicitly.
>
> This patch fixes both problems.
>
> Fixes: 07f6c4bc048a ("tipc: convert tipc reference table to...")
> Signed-off-by: Herbert Xu

Acked-by: Ying Xue

> ---
>
>  net/tipc/net.c    |    4
>  net/tipc/socket.c |   30 +++---
>  2 files changed, 23 insertions(+), 11 deletions(-)
>
> diff --git a/net/tipc/net.c b/net/tipc/net.c
> index 28bf4fe..ab8a2d5 100644
> --- a/net/tipc/net.c
> +++ b/net/tipc/net.c
> @@ -110,6 +110,10 @@ int tipc_net_start(struct net *net, u32 addr)
>          char addr_string[16];
>
>          tn->own_addr = addr;
> +
> +        /* Ensure that the new address is visible before we reinit. */
> +        smp_mb();
> +
>          tipc_named_reinit(net);
>          tipc_sk_reinit(net);
>
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 333c5da..20240e1 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -384,8 +384,6 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
>          INIT_LIST_HEAD(&tsk->publications);
>          msg = &tsk->phdr;
>          tn = net_generic(sock_net(sk), tipc_net_id);
> -        tipc_msg_init(tn->own_addr, msg, TIPC_LOW_IMPORTANCE, TIPC_NAMED_MSG,
> -                      NAMED_H_SIZE, 0);
>
>          /* Finish initializing socket data structures */
>          sock->ops = ops;
> @@ -395,6 +393,13 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
>                  pr_warn("Socket create failed; port number exhausted\n");
>                  return -EINVAL;
>          }
> +
> +        /* Ensure tsk is visible before we read own_addr. */
> +        smp_mb();
> +
> +        tipc_msg_init(tn->own_addr, msg, TIPC_LOW_IMPORTANCE, TIPC_NAMED_MSG,
> +                      NAMED_H_SIZE, 0);
> +
>          msg_set_origport(msg, tsk->portid);
>          setup_timer(&sk->sk_timer, tipc_sk_timeout, (unsigned long)tsk);
>          sk->sk_shutdown = 0;
> @@ -2267,24 +2272,27 @@ static int tipc_sk_withdraw(struct tipc_sock *tsk, uint scope,
>  void tipc_sk_reinit(struct net *net)
>  {
>          struct tipc_net *tn = net_generic(net, tipc_net_id);
> -        const struct bucket_table *tbl;
> -        struct rhash_head *pos;
> +        struct rhashtable_iter iter;
>          struct tipc_sock *tsk;
>          struct tipc_msg *msg;
> -        int i;
>
> -        rcu_read_lock();
> -        tbl = rht_dereference_rcu((&tn->sk_rht)->tbl, &tn->sk_rht);
> -        for (i = 0; i < tbl->size; i++) {
> -                rht_for_each_entry_rcu(tsk, pos, tbl, i, node) {
> +        rhashtable_walk_enter(&tn->sk_rht, &iter);
> +
> +        do {
> +                tsk = ERR_PTR(rhashtable_walk_start(&iter));
> +                if (tsk)
> +                        continue;
> +
> +                while ((tsk = rhashtable_walk_next(&iter)) && !IS_ERR(tsk)) {
>                          spin_lock_bh(&tsk->sk.sk_lock.slock);
>                          msg = &tsk->phdr;
>                          msg_set_prevnode(msg, tn->own_addr);
>                          msg_set_orignode(msg, tn->own_addr);
>                          spin_unlock_bh(&tsk->sk.sk_lock.slock);
>                  }
> -        }
> -        rcu_read_unlock();
> +
> +                rhashtable_walk_stop(&iter);
> +        } while (tsk == ERR_PTR(-EAGAIN));
>  }
>
>  static struct tipc_sock *tipc_sk_lookup(struct net *net, u32 portid)
>
Re: [PATCH net-next 00/12] Netronome NFP4000 and NFP6000 PF driver
From: Jakub Kicinski
Date: Thu, 9 Feb 2017 09:17:26 -0800

> This is a base PF driver for Netronome NFP4000 and NFP6000 chips. This
> series doesn't add any exciting new features, it provides a foundation
> for supporting more advanced firmware applications.

Series applied, thank you.
Re: [PATCH v2 net] l2tp: do not use udp_ioctl()
From: Eric Dumazet
Date: Thu, 09 Feb 2017 16:15:52 -0800

> From: Eric Dumazet
>
> udp_ioctl(), as its name suggests, is used by UDP protocols,
> but is also used by L2TP :(
>
> L2TP should use its own handler, because it really does not
> look the same.
>
> SIOCINQ for instance should not assume UDP checksum or headers.
>
> Thanks to Andrey and syzkaller team for providing the report
> and a nice reproducer.
>
> While crashes only happen on recent kernels (after commit
> 7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")), this
> probably needs to be backported to older kernels.
>
> Fixes: 7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")
> Fixes: 85584672012e ("udp: Fix udp_poll() and ioctl()")
> Signed-off-by: Eric Dumazet
> Reported-by: Andrey Konovalov
> Acked-by: Paolo Abeni
> ---
> v2: Adding the EXPORT_SYMBOL(l2tp_ioctl) for ipv6, of course...

Applied and queued up for -stable, thanks Eric.
Re: [PATCH] [net-next] ARM: orion: fix PHYLIB dependency
On 02/10/2017 12:05 PM, Arnd Bergmann wrote:
> On Friday, February 10, 2017 9:42:21 AM CET Florian Fainelli wrote:
>> On 02/10/2017 12:20 AM, Arnd Bergmann wrote:
>>> On Thu, Feb 9, 2017 at 7:22 PM, Florian Fainelli wrote:
>>>> On 02/09/2017 07:08 AM, Arnd Bergmann wrote:
>>>> I disabled CONFIG_NETDEVICES to force CONFIG_PHY not to be set here, and
>>>> I was not able to reproduce this, what am I missing?
>>>
>>> In the ARMv5 allmodconfig build, this fails because CONFIG_PHY=m, and
>>> we can't call into it. You could use IS_BUILTIN instead of IS_ENABLED in
>>> the header as a oneline workaround, but I think that would be more confusing
>>> to real users that try to use CONFIG_PHY=m without realizing why they lose
>>> access to their switch.
>>
>> I see, this patch should also help fixing this:
>>
>> http://patchwork.ozlabs.org/patch/726381/
>
> I think you still have the same problem, as you can still have the
> boardinfo registration in a loadable module.

The patch exports mdiobus_register_board_info() so that should solve
your problem here, and I did verify this with a loadable module that
references mdiobus_register_board_info() in that case.

>
> I have come up with a patch too now and done some randconfig testing
> on it (it took me several tries as well), please see below. It does
> some of the same things as yours and some others.
>
> The main trick is to have a separate 'MDIO_BOARDINFO' Kconfig symbol
> that can be selected regardless of all the other symbols, and that
> will lead to the registration being either built-in when it's needed
> or not built at all when either no board calls it, or PHYLIB is
> disabled.

Your patch is fine in principle except that you are making CONFIG_MDIO
encompass both drivers/net/mdio.c and
drivers/net/phy/mdio_{bus,device}.c. These do share the same header
(for better or for worse), but are not quite dealing with MDIO at the
same level. drivers/net/mdio.c is more like PHYLIB helper functions for
the old-style, pre-mdiobus() drivers.

I like it that you made MDIO_BOARDINFO separate, and that is probably a
patch I should incorporate in the other patch splitting things up, see
below though for the remainder of the changes.

>
> From f35e89cacfabdf7b822772013389132605941def Mon Sep 17 00:00:00 2001
> From: Arnd Bergmann
> Date: Wed, 27 Apr 2016 11:51:18 +0200
> Subject: [PATCH] [RFC] move ethernet PHY config into drivers/phy/Kconfig
>
> Calling mdiobus_register_board_info from builtin code with CONFIG_PHYLIB=m
> currently results in a link error:
>
> arch/arm/plat-orion/common.o: In function `orion_ge00_switch_init':
> common.c:(.init.text+0x6a4): undefined reference to `mdiobus_register_board_info'
>
> As the long-term strategy is to separate mdio from phylib, and to get generic-phy
> and (networking-only) phylib closer together, this performs a first step in that
> direction: The Kconfig file for phylib gets logically pulled under the PHY
> driver configuration and becomes independent from networking. This lets us
> select the new CONFIG_MDIO_BOARDINFO from platforms that need it, and provide
> the functions exactly when we need them.

This is too broad. The only part of drivers/net/phy/ that is worth
pulling out is what I tried to extract: the MDIO bus and device code.
There are some bad inter-dependencies between that code and
phy_device.c and phy.c, which make it hard to split it out and make
that part completely standalone for now.

The only part that is truly valuable to non-Ethernet PHY devices is the
MDIO bus/device registration part, which is available in my patch with
CONFIG_MDIO_DEVICE, and which probably should not depend on NETDEVICES,
so the other part of your patch makes sense too here.

Thanks!

>
> In the same step, we can also split out the MDIO driver configuration from
> phylib. This is based on an older experimental patch I had, but it still
> requires some code changes in phylib itself to let users actually rely on
> MDIO without all of PHYLIB.
>
> Signed-off-by: Arnd Bergmann
>
> diff --git a/arch/arm/mach-orion5x/Kconfig b/arch/arm/mach-orion5x/Kconfig
> index 468b8cb7fd5f..e1126e1aa3d2 100644
> --- a/arch/arm/mach-orion5x/Kconfig
> +++ b/arch/arm/mach-orion5x/Kconfig
> @@ -4,6 +4,7 @@ menuconfig ARCH_ORION5X
>          select CPU_FEROCEON
>          select GENERIC_CLOCKEVENTS
>          select GPIOLIB
> +        select MDIO_BOARDINFO
>          select MVEBU_MBUS
>          select PCI
>          select PLAT_ORION_LEGACY
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index a993cbeb9e0c..9eb15b7518bd 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -378,8 +378,6 @@ config NET_SB1000
>
>            If you don't have this card, of course say N.
>
> -source "drivers/net/phy/Kconfig"
> -
>  source "drivers/net/plip/Kconfig"
>
>  source "drivers/net/ppp/Kconfig"
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index
Re: [PATCH net-next v5 00/11] Improve BPF selftests and use the library (net-next tree)
From: Mickaël Salaün
Date: Fri, 10 Feb 2017 00:21:34 +0100

> This series brings some fixes to selftests, add the ability to test
> unprivileged BPF programs as root and replace bpf_sys.h with calls to the BPF
> library.
>
> This is intended for the net-next tree and apply on c0e4dadb3494 ("net: dsa:
> mv88e6xxx: Move forward declaration to where it is needed").

Series applied, thank you.
[PATCH 1/2] net: fs_enet: Fix an error handling path
'of_node_put(fpi->phy_node)' should also be called if we branch to
'out_deregister_fixed_link' error handling path.

Signed-off-by: Christophe JAILLET
---
 drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
index 54e3ce9bd94c..5c6426756d11 100644
--- a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
+++ b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
@@ -1045,10 +1045,10 @@ static int fs_enet_probe(struct platform_device *ofdev)
 out_free_dev:
         free_netdev(ndev);
 out_put:
-        of_node_put(fpi->phy_node);
         if (fpi->clk_per)
                 clk_disable_unprepare(fpi->clk_per);
 out_deregister_fixed_link:
+        of_node_put(fpi->phy_node);
         if (of_phy_is_fixed_link(ofdev->dev.of_node))
                 of_phy_deregister_fixed_link(ofdev->dev.of_node);
 out_free_fpi:
-- 
2.9.3
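The label ordering this patch relies on can be illustrated with a small user-space sketch. It is not the driver code: struct probe_state, demo_probe() and the fail_at parameter are hypothetical stand-ins for the node reference and the clock, and the point is only that a release placed under the later label is reached from every error path that jumps to it or falls through into it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the resources fs_enet_probe() acquires. */
struct probe_state {
        bool node_held;   /* plays the role of the fpi->phy_node reference */
        bool clk_enabled; /* plays the role of the enabled "per" clock */
};

/*
 * fail_at: 0 = succeed, 1 = fail before the clock is enabled,
 *          2 = fail after the clock is enabled.
 */
static int demo_probe(struct probe_state *st, int fail_at)
{
        int ret = 0;

        assert(st != NULL);

        st->node_held = true;           /* acquired first */

        if (fail_at == 1) {
                ret = -1;
                goto out_put_node;      /* clock not enabled yet, skip its cleanup */
        }

        st->clk_enabled = true;         /* acquired second */

        if (fail_at == 2) {
                ret = -1;
                goto out_disable_clk;
        }

        return 0;

out_disable_clk:
        st->clk_enabled = false;
out_put_node:
        /* Reached from both error paths, so the node is never leaked. */
        st->node_held = false;
        return ret;
}
```

Labels are laid out in reverse order of acquisition, so an early failure jumps past the cleanup of resources it never took while still releasing the ones it did.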
[PATCH 2/2] net: fs_enet: Simplify code
There is no need to use an intermediate variable to handle an error code
in this case.

Signed-off-by: Christophe JAILLET
---
I think that the remaining use of 'err' a few lines above could also be
dropped. However, it could change the return value (i.e. propagation of the
error returned by 'of_phy_register_fixed_link' instead of -ENODEV) and I'm
unsure it would be correct. So I leave it as-is.
---
 drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
index 5c6426756d11..753259091b22 100644
--- a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
+++ b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
@@ -964,11 +964,10 @@ static int fs_enet_probe(struct platform_device *ofdev)
          */
         clk = devm_clk_get(&ofdev->dev, "per");
         if (!IS_ERR(clk)) {
-                err = clk_prepare_enable(clk);
-                if (err) {
-                        ret = err;
+                ret = clk_prepare_enable(clk);
+                if (ret)
                         goto out_deregister_fixed_link;
-                }
+
                 fpi->clk_per = clk;
         }
 
-- 
2.9.3
Re: [PATCH] [net-next] ARM: orion: fix PHYLIB dependency
On Friday, February 10, 2017 9:42:21 AM CET Florian Fainelli wrote:
> On 02/10/2017 12:20 AM, Arnd Bergmann wrote:
> > On Thu, Feb 9, 2017 at 7:22 PM, Florian Fainelli wrote:
> >> On 02/09/2017 07:08 AM, Arnd Bergmann wrote:
> >> I disabled CONFIG_NETDEVICES to force CONFIG_PHY not to be set here, and
> >> I was not able to reproduce this, what am I missing?
> >
> > In the ARMv5 allmodconfig build, this fails because CONFIG_PHY=m, and
> > we can't call into it. You could use IS_BUILTIN instead of IS_ENABLED in
> > the header as a oneline workaround, but I think that would be more confusing
> > to real users that try to use CONFIG_PHY=m without realizing why they lose
> > access to their switch.
>
> I see, this patch should also help fixing this:
>
> http://patchwork.ozlabs.org/patch/726381/

I think you still have the same problem, as you can still have the
boardinfo registration in a loadable module.

I have come up with a patch too now and done some randconfig testing
on it (it took me several tries as well), please see below. It does
some of the same things as yours and some others.

The main trick is to have a separate 'MDIO_BOARDINFO' Kconfig symbol
that can be selected regardless of all the other symbols, and that
will lead to the registration being either built-in when it's needed
or not built at all when either no board calls it, or PHYLIB is
disabled.

>From f35e89cacfabdf7b822772013389132605941def Mon Sep 17 00:00:00 2001
From: Arnd Bergmann
Date: Wed, 27 Apr 2016 11:51:18 +0200
Subject: [PATCH] [RFC] move ethernet PHY config into drivers/phy/Kconfig

Calling mdiobus_register_board_info from builtin code with CONFIG_PHYLIB=m
currently results in a link error:

arch/arm/plat-orion/common.o: In function `orion_ge00_switch_init':
common.c:(.init.text+0x6a4): undefined reference to `mdiobus_register_board_info'

As the long-term strategy is to separate mdio from phylib, and to get generic-phy
and (networking-only) phylib closer together, this performs a first step in that
direction: The Kconfig file for phylib gets logically pulled under the PHY
driver configuration and becomes independent from networking. This lets us
select the new CONFIG_MDIO_BOARDINFO from platforms that need it, and provide
the functions exactly when we need them.

In the same step, we can also split out the MDIO driver configuration from
phylib. This is based on an older experimental patch I had, but it still
requires some code changes in phylib itself to let users actually rely on
MDIO without all of PHYLIB.

Signed-off-by: Arnd Bergmann

diff --git a/arch/arm/mach-orion5x/Kconfig b/arch/arm/mach-orion5x/Kconfig
index 468b8cb7fd5f..e1126e1aa3d2 100644
--- a/arch/arm/mach-orion5x/Kconfig
+++ b/arch/arm/mach-orion5x/Kconfig
@@ -4,6 +4,7 @@ menuconfig ARCH_ORION5X
         select CPU_FEROCEON
         select GENERIC_CLOCKEVENTS
         select GPIOLIB
+        select MDIO_BOARDINFO
         select MVEBU_MBUS
         select PCI
         select PLAT_ORION_LEGACY
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index a993cbeb9e0c..9eb15b7518bd 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -378,8 +378,6 @@ config NET_SB1000

           If you don't have this card, of course say N.

-source "drivers/net/phy/Kconfig"
-
 source "drivers/net/plip/Kconfig"

 source "drivers/net/ppp/Kconfig"
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 7336cbd3ef5d..3ab87e9f9442 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -17,7 +17,7 @@ obj-$(CONFIG_MII) += mii.o
 obj-$(CONFIG_MDIO) += mdio.o
 obj-$(CONFIG_NET) += Space.o loopback.o
 obj-$(CONFIG_NETCONSOLE) += netconsole.o
-obj-$(CONFIG_PHYLIB) += phy/
+obj-y += phy/
 obj-$(CONFIG_RIONET) += rionet.o
 obj-$(CONFIG_NET_TEAM) += team/
 obj-$(CONFIG_TUN) += tun.o
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index 8c08f9deef92..9c4652ae2750 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -11,9 +11,6 @@ menuconfig ETHERNET

 if ETHERNET

-config MDIO
-        tristate
-
 config SUNGEM_PHY
         tristate

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 8dbd59baa34d..37f5552cc5b3 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -3,8 +3,9 @@
 #

 menuconfig PHYLIB
-        tristate "PHY Device support and infrastructure"
+        tristate "Ethernet PHY Device support and infrastructure"
         depends on NETDEVICES
+        select MDIO
         help
           Ethernet controllers are usually attached to PHY
           devices. This option provides infrastructure for
@@ -248,6 +249,16 @@ config FIXED_PHY
           PHYs that are not connected to the real MDIO bus.

           Currently tested with mpc866ads and mpc8349e-mitx.

+endif # PHYLIB
+
+config MDIO
+        tristate
+        help
+          The MDIO bus is typically used ethernet PHYs, but can also be
+          used by other PHY drivers.
+
Re: [patch net-next] spectrum: flower: Treat ETH_P_ALL as a special case and translate for HW
From: Jiri Pirko
Date: Thu, 9 Feb 2017 14:42:03 +0100

> From: Jiri Pirko
>
> HW does not understand ETH_P_ALL. So treat this special case differently
> and translate to 0/0 key/mask. That will allow HW to match all ethertypes.
>
> Fixes: 7aa0f5aa9030 ("mlxsw: spectrum: Implement TC flower offload")
> Signed-off-by: Jiri Pirko
> Reviewed-by: Ido Schimmel

Applied.
Re: [patch net-next 0/4] devlink: small cleanup around eswitch [sg]et
From: Jiri Pirko
Date: Thu, 9 Feb 2017 15:54:32 +0100

> Contains small devlink cleanup around eswitch get/set commands.

Series applied, thanks.
Re: [PATCH] net: ethernet: ti: netcp_core: return netdev_tx_t in xmit
From: Ivan Khoronzhuk
Date: Thu, 9 Feb 2017 16:24:14 +0200

> @@ -1300,7 +1301,7 @@ static int netcp_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>                          dev_warn(netcp->ndev_dev, "padding failed (%d), packet dropped\n",
>                                   ret);
>                          tx_stats->tx_dropped++;
> -                        return ret;
> +                        return NETDEV_TX_BUSY;
>                  }
>                  skb->len = NETCP_MIN_PACKET_SIZE;
>          }
> @@ -1329,7 +1330,7 @@ static int netcp_ndo_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>          if (desc)
>                  netcp_free_tx_desc_chain(netcp, desc, sizeof(*desc));
>          dev_kfree_skb(skb);
> -        return ret;
> +        return NETDEV_TX_BUSY;
>  }

I really think these should be returning NET_XMIT_DROP.
Re: [PATCH net-next v2 00/12] net: dsa: remove unnecessary phy.h include
On 02/10/2017 10:51 AM, David Miller wrote:
> From: Kalle Valo
> Date: Thu, 09 Feb 2017 16:10:06 +0200
>
>> Florian Fainelli writes:
>>
>>>>>> If not, for something like this it's a must:
>>>>>>
>>>>>> drivers/net/wireless/ath/wil6210/cfg80211.c:24:30: error: expected ‘)’ before ‘bool’
>>>>>>  module_param(disable_ap_sme, bool, 0444);
>>>>>>                               ^
>>>>>> drivers/net/wireless/ath/wil6210/cfg80211.c:25:34: error: expected ‘)’ before string constant
>>>>>>  MODULE_PARM_DESC(disable_ap_sme, " let user space handle AP mode SME");
>>>>>>                                   ^
>>>>>> Looks like that file needs linux/module.h included.
>>>>>
>>>>> Johannes already fixed a similar (or same) problem in my tree:
>>>>>
>>>>> wil6210: include moduleparam.h
>>>>>
>>>>> https://git.kernel.org/cgit/linux/kernel/git/kvalo/wireless-drivers-next.git/commit/?id=949c2d0096753d518ef6e0bd8418c8086747196b
>>>>>
>>>>> I'm planning to send you a pull request tomorrow which contains that one.
>>>
>>> Thanks Kalle!
>>>
>>> David, can you hold on this series until Kalle's pull request gets
>>> submitted? Past this error, allmodconfig builds fine with this patch
>>> series (just tested). Thanks!
>>
>> Just submitted the pull request:
>>
>> https://patchwork.ozlabs.org/patch/726133/
>
> I've retried this patch series, and will push it out assuming the build
> completes properly.

I see it merged in net-next/master, thanks a lot this is going to save a
lot of cycles in the future, thanks David!
-- 
Florian
Re: [PATCH net-next 0/4] net/sched: Use TC skip flags to reflect HW offload status
From: Or Gerlitz
Date: Thu, 9 Feb 2017 16:18:04 +0200

> Currently there is no way of querying whether a filter is
> offloaded to HW or not when using both policy (no flag).
>
> Reuse the skip flags to show the insertion status by setting
> the skip_hw flag in case the filter wasn't offloaded.
>
> The bpf patch is compile tested only, Daniel/Jakub, will
> appreciate your review/ack.
 ...

I'm leaning towards suggesting that you use new flags, this way it
will be unambiguous whether we are running an old kernel.

If you just use the skip flag, it's impossible to tell the difference.
Re: [PATCH 1/3] ath10k: remove ath10k_vif_to_arvif()
On 9 February 2017 at 23:37, Joe Perches wrote:
> On Thu, 2017-02-09 at 23:14 -0800, Adrian Chadd wrote:
>
>> If there
>> were accessors for the skb data / len fields (like we do for mbufs)
>> then porting the code would've involved about 5,000 less changed
>> lines.
>
> What generic mechanisms would you suggest to make
> porting easier between bsd and linux and what in
> your opinion are the best naming schemes to make
> these functions easiest to read and implement
> without resorting to excessive identifier lengths?
>
> If you have some, please provide examples.

(Why not, it's pre-coffee o'clock.)

The biggest barriers are direct struct accessors. Most of the time the
kernels have similar enough semantics that I can just implement a
linux shim layer (like we do for graphics layer porting from Linux.)

Eg, having skb_data(skb) (and skb_data_const(skb)) + skb_len(skb)
instead of skb->data and skb->len would remove a lot of churn. Having
say, a vif_to_drvpriv() method analogous to ath10k_vif_to_arvif()
would also simplify the changes. For the rest of it we can just use a
linux-like shim layer to get everything else working pretty darn well.

But the biggest thing that helps is a quasi HAL code structure. I know
HAL is a dirty word, so think of it more as "how would one separate
out the OS interface layer from the rest of the driver." A good
example in ath10k is the difference between say, wmi.c, the pci /
copyengine code and mac.c.

* the pci / copyengine code is almost 100% compilable on other
platforms, save the differences in little things (malloc, free, KVA
versus physical memory allocation, bounce buffering, sync'ing, etc.) A
sufficiently refactored driver like ath10k where almost all of that
stuff happens in the pci/copyengine code made porting that much less
painful.

* the wmi code is almost exclusively portable - besides the
malloc/free, etc mechanical changes which honestly can be stubbed, it
uses the lower layers (pci/ce, hif, htc, etc) for doing actual work,
and the upper layer uses a well-defined API + callback mechanism for
getting work done. Porting that was mechanical but reasonably easy.

* however, the mac.c code contains both code which sends commands to
the firmware (vif create/destroy, pdev commands, station
associate/update/destroy, crypto key handling, peer rate control, etc)
/and/ very linux mac80211/cfg80211 specific bits. If mac.c were split
into mac-mac80211.c (which was /just/ mac80211, cfg80211, etc bits)
and mac-utils.c (the bits that actually /sent/ the commands,
responses, all the support code, etc) then my port would just
implement mac-net80211.c as a completely new file, and the rest would
just be modified as required by porting.

A lot of the ath10k headers too mix linux specific things (eg struct
device, dependencies) with hardware specific definitions for say,
register accesses. I split out the register and firmware command /
structures into separate header files that didn't mingle OS and driver
specific structures to make it much easier to reuse that code. I find
that good driver writing hygiene in any case.

I'm not expecting an intel ethernet driver style HAL separation,
although that'd certainly make life easier in porting over drivers.
But just having inlined accessor functions for most things and some
stricter driver structure for OS touch points (dma setup/teardown,
bounce buffer stuff, mac80211/cfg80211, ethtool, etc APIs) would make
porting and testing things a lot easier. :-)

2c, and I'll do the porting/reimplementing work anyway regardless of
how much coffee it requires,

-adrian
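To make the accessor suggestion in this message concrete, here is a minimal user-space sketch. The accessor names skb_data(), skb_data_const() and skb_len() are the ones proposed above and are not existing kernel functions; struct sk_buff here is a two-field stand-in rather than the real kernel structure, and sum_payload() is a made-up example of driver-style code.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel structure: only the two fields under discussion. */
struct sk_buff {
        unsigned char *data;
        unsigned int len;
};

/* Proposed accessors (hypothetical, named after the suggestion above). */
static inline unsigned char *skb_data(struct sk_buff *skb)
{
        return skb->data;
}

static inline const unsigned char *skb_data_const(const struct sk_buff *skb)
{
        return skb->data;
}

static inline unsigned int skb_len(const struct sk_buff *skb)
{
        return skb->len;
}

/* Example driver-style code that never touches the struct fields directly. */
static unsigned int sum_payload(const struct sk_buff *skb)
{
        const unsigned char *p;
        unsigned int i, sum = 0;

        assert(skb != NULL);

        p = skb_data_const(skb);
        for (i = 0; i < skb_len(skb); i++)
                sum += p[i];
        return sum;
}
```

With code written this way, a port to another kernel would only need to redefine the three accessors over its own buffer type (an mbuf, say), leaving callers such as sum_payload() unchanged.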
Re: pull-request: mac80211-next 2017-02-09
From: Johannes Berg
Date: Thu, 9 Feb 2017 15:27:33 +0100

> Here are some more (final) updates for -next. Nothing here is
> really interesting, mostly cleanups and small fixes.
>
> Please pull and let me know if there's any problem.

Pulled, thank you.
Re: [PATCH 0/2] net: ethernet: ti: cpsw: fix susp/resume
On 02/09/2017 07:45 PM, David Miller wrote:
> From: Ivan Khoronzhuk
> Date: Fri, 10 Feb 2017 00:54:24 +0200
>
>> On Thu, Feb 09, 2017 at 05:21:26PM -0500, David Miller wrote:
>>> From: Ivan Khoronzhuk
>>> Date: Thu, 9 Feb 2017 02:07:34 +0200
>>>
>>>> These two patches fix suspend/resume chain.
>>>
>>> Patch 2 doesn't apply cleanly to the 'net' tree, please
>>> respin this series.
>>
>> Strange, I've just checked it on net-next/master, it was applied w/o any
>> warnings.
>
> It makes no sense to test "net-next" when I am telling you that it is
> the "net" tree it doesn't apply to.
>
> This is a bug fix, so it should be targetting the "net" tree.

Looks like the first fix is for net, but the second one is for net-next.
I do not see 03fd01ad0eead23eb79294b6fb4d71dcac493855
"net: ethernet: ti: cpsw: don't duplicate ndev_running" in net.

-- 
regards,
-grygorii
Re: net/packet: use-after-free in packet_rcv_fanout
On (02/10/17 10:00), Cong Wang wrote:
> My understanding about the race here is packet_release() doesn't
> wait for flying packets correctly, which leads to a flying packet still
> refers to the struct sock which is being released.
>
> This could happen because struct packet_fanout is refcn'ted, it is
  :
> At least I believe this explains the crash Dmitry reported.

hmm, the proof of the pudding is in the eating- would be good
to be able to reliably reproduce this somewhere (thus proving
that root-cause analysis is rock-solid), maybe by introducing
artificial delays to slow down paths..

I'm travelling at the moment but may be able to give this
(try to reproduce it reliably) next week.

--Sowmini
Re: [PATCH] NET: mkiss: Fix panic
From: Ralf Baechle
Date: Thu, 9 Feb 2017 14:12:11 +0100

> If a USB-to-serial adapter is unplugged, the driver re-initializes, with
> dev->hard_header_len and dev->addr_len set to zero, instead of the correct
> values. If then a packet is sent through the half-dead interface, the
> kernel will panic due to running out of headroom in the skb when pushing
> for the AX.25 headers resulting in this panic:
>
> [] (skb_panic) from [] (skb_push+0x4c/0x50)
> [] (skb_push) from [] (ax25_hard_header+0x34/0xf4 [ax25])
> [] (ax25_hard_header [ax25]) from [] (ax_header+0x38/0x40 [mkiss])
> [] (ax_header [mkiss]) from [] (neigh_compat_output+0x8c/0xd8)
> [] (neigh_compat_output) from [] (ip_finish_output+0x2a0/0x914)
> [] (ip_finish_output) from [] (ip_output+0xd8/0xf0)
> [] (ip_output) from [] (ip_local_out_sk+0x44/0x48)
>
> This patch makes mkiss behave like the 6pack driver. 6pack does not
> panic. In 6pack.c sp_setup() (same function name here) the values for
> dev->hard_header_len and dev->addr_len are set to the same values as in
> my mkiss patch.
>
> [r...@linux-mips.org: Massages original submission to conform to the usual
> standards for patch submissions.]
>
> Signed-off-by: Thomas Osterried
> Signed-off-by: Ralf Baechle

Applied, thank you.
Re: [PATCH net-next v2 00/12] net: dsa: remove unnecessary phy.h include
From: Kalle Valo
Date: Thu, 09 Feb 2017 16:10:06 +0200

> Florian Fainelli writes:
>
>>>>> If not, for something like this it's a must:
>>>>>
>>>>> drivers/net/wireless/ath/wil6210/cfg80211.c:24:30: error: expected ‘)’ before ‘bool’
>>>>>  module_param(disable_ap_sme, bool, 0444);
>>>>>                               ^
>>>>> drivers/net/wireless/ath/wil6210/cfg80211.c:25:34: error: expected ‘)’ before string constant
>>>>>  MODULE_PARM_DESC(disable_ap_sme, " let user space handle AP mode SME");
>>>>>                                   ^
>>>>> Looks like that file needs linux/module.h included.
>>>
>>> Johannes already fixed a similar (or same) problem in my tree:
>>>
>>> wil6210: include moduleparam.h
>>>
>>> https://git.kernel.org/cgit/linux/kernel/git/kvalo/wireless-drivers-next.git/commit/?id=949c2d0096753d518ef6e0bd8418c8086747196b
>>>
>>> I'm planning to send you a pull request tomorrow which contains that
>>> one.
>>
>> Thanks Kalle!
>>
>> David, can you hold on this series until Kalle's pull request gets
>> submitted? Past this error, allmodconfig builds fine with this patch
>> series (just tested). Thanks!
>
> Just submitted the pull request:
>
> https://patchwork.ozlabs.org/patch/726133/

I've retried this patch series, and will push it out assuming the build
completes properly.
[PATCH net-next] sfc: fix swapped arguments to efx_ef10_handle_rx_event_errors
Fixes: a0ee35414837 ("sfc: process RX event inner checksum flags")
Reported-by: Colin Ian King
Signed-off-by: Edward Cree
---
 drivers/net/ethernet/sfc/ef10.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 6bba2d2..761ccc6 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -3356,8 +3356,9 @@ static int efx_ef10_handle_rx_event(struct efx_channel *channel,
         EFX_AND_QWORD(errors, *event, errors);
         if (unlikely(!EFX_QWORD_IS_ZERO(errors))) {
                 flags |= efx_ef10_handle_rx_event_errors(channel, n_packets,
+                                                         rx_encap_hdr,
                                                          rx_l3_class, rx_l4_class,
-                                                         rx_encap_hdr, event);
+                                                         event);
         } else {
                 bool tcpudp = rx_l4_class == ESE_DZ_L4_CLASS_TCP ||
                               rx_l4_class == ESE_DZ_L4_CLASS_UDP;
Re: Extending socket timestamping API for NTP
> On Feb 09, 2017, at 16:33, Denny Page wrote:
>
>
>> On Feb 09, 2017, at 11:42, sdncurious wrote:
>>
>> I am still at a loss as to why transpose is required in case of HW
>> time stamping. If STF is used for both Tx and Rx time stamping the
>> timing is absolutely correct.
>
> Perhaps this will help. The specific transposition is:
>
> transposed_timestamp_ns = timestamp_ns + (frame_len_bits * 10) / (interface_speed * 100)
>
> The transposition is applied to received timestamps only.

Before anyone else asks, yes, I know this can be reduced. :)
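As an illustration of the transposition discussed in this message, here is a small C sketch in reduced form. The units are an assumption: the message does not say what interface_speed is measured in, so the sketch takes the link speed in Mb/s, in which case frame_len_bits bits occupy frame_len_bits * 1000 / speed_mbps nanoseconds on the wire. The function and parameter names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shift a receive timestamp by the time the frame occupies on the wire.
 *
 * Assumption (not stated in the thread): speed_mbps is the link speed in
 * Mb/s, so one bit lasts 1000 / speed_mbps nanoseconds.
 */
static uint64_t transpose_rx_timestamp_ns(uint64_t timestamp_ns,
                                          uint64_t frame_len_bits,
                                          uint64_t speed_mbps)
{
        assert(speed_mbps != 0); /* a zero link speed gives no usable duration */

        /* Multiply before dividing so slow links do not truncate to zero. */
        return timestamp_ns + (frame_len_bits * 1000u) / speed_mbps;
}
```

Under that assumption, a 1000-bit frame on a 1000 Mb/s link shifts the timestamp by 1000 ns, and an 8000-bit frame on a 100 Mb/s link shifts it by 80000 ns.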
Re: pull-request: wireless-drivers-next 2017-02-09
From: Kalle Valo
Date: Thu, 09 Feb 2017 16:08:25 +0200

> another pull request for net-next. If the merge window starts on Sunday
> this would be the last pull request from me with new features. But if it
> doesn't open, I'm planning to send one more next week.
>
> Please let me know if there any problems.

Pulled, thank you.
Re: [PATCH net] at803x: insure minimum delay for SGMII link AN completion ckeck
On 02/10/2017 08:42 AM, Claudiu Manoil wrote: > Commit: f62265b "at803x: double check SGMII side autoneg" > introduced a regression for the p1010rdb board which has > two of the ethernet controllers (eTSEC) connected through > SGMII links to external Atheros SGMII AR8033 PHYs. > The issue consists in a dead link for these ports, and is > 100% reproducible on kernel 4.9 (and later): > > root@p1010rdb-pb:~# ifconfig eth2 172.16.1.1 > [ 203.274263] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready > root@p1010rdb-pb:~# [ 206.408255] 803x_aneg_done: SGMII link is not ok > > root@p1010rdb-pb:~# ethtool eth2 > Settings for eth2: > Supported ports: [ MII ] > Supported link modes: 10baseT/Half 10baseT/Full > 100baseT/Half 100baseT/Full > 1000baseT/Full > Supported pause frame use: Symmetric Receive-only > Supports auto-negotiation: Yes > Advertised link modes: 10baseT/Half 10baseT/Full > 100baseT/Half 100baseT/Full > 1000baseT/Full > Advertised pause frame use: No > Advertised auto-negotiation: Yes > Link partner advertised link modes: 10baseT/Half 10baseT/Full > 100baseT/Half 100baseT/Full > 1000baseT/Half 1000baseT/Full > Link partner advertised pause frame use: Symmetric Receive-only > Link partner advertised auto-negotiation: Yes > Speed: 1000Mb/s > Duplex: Full > Port: MII > PHYAD: 2 > Transceiver: internal > Auto-negotiation: on > Supports Wake-on: g > Wake-on: d > Current message level: 0x003f (63) >drv probe link timer ifdown ifup > Link detected: no > > Insuring up to 100 usecs for the SGMII link side AN to complete > proves to be enough to have a working SGMII link, for this board. > The need for a delay for the SGMII link side may be explained by > the fact that there are two levels of auto-negotiation (AN) for a > SGMII link. First the PHY autonegotiates the link parameters w/ > its link partner over the copper link. In the second stage, the > AN results are then passed to the eTSEC MAC over the SGMII link > using the Clause 37 auto-negotiation functionality. 
While the > aneg_done() hook is called by the phylib state machine to check > for the completion of the 1st stage AN of the external PHY, > there's no mechanism to ensure proper AN completion of the internal > SGMII link (which is actually handled on the eTSEC side by an > "internal PHY", called TBI). > > Fixes: f62265b "at803x: double check SGMII side autoneg" > > Signed-off-by: Claudiu Manoil > --- > drivers/net/phy/at803x.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c > index a52b560..55fa7c4 100644 > --- a/drivers/net/phy/at803x.c > +++ b/drivers/net/phy/at803x.c > @@ -366,6 +366,7 @@ static void at803x_link_change_notify(struct phy_device > *phydev) > static int at803x_aneg_done(struct phy_device *phydev) > { > int ccr; > + int timeout = 100; /* usecs */ This should be unsigned int, and please use reverse christmas tree declarations, ordered from longest to shortest. -- Florian
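To illustrate the review comment above together with the bounded wait the patch describes, here is a minimal userspace sketch. It is not the at803x driver code: the callback type link_ok_fn, the function wait_for_sgmii_link() and the helper link_ok_after() are hypothetical names, the callback stands in for a PHY status register read, and the real delay call is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PHY status read; not the at803x API. */
typedef bool (*link_ok_fn)(void *ctx);

/*
 * Poll link_ok() at most timeout_us times, one "microsecond" per try.
 * Locals are declared longest line first ("reverse christmas tree"),
 * and the counter is unsigned because it can never be negative.
 */
static bool wait_for_sgmii_link(link_ok_fn link_ok, void *ctx,
				unsigned int timeout_us)
{
	unsigned int elapsed_us = 0;
	bool done = false;

	while (elapsed_us < timeout_us) {
		done = link_ok(ctx);
		if (done)
			break;
		/* A driver would delay for 1 usec here; omitted in this sketch. */
		elapsed_us++;
	}
	return done;
}

/* Illustrative helper: reports "link ok" once *ctx polls have elapsed. */
static bool link_ok_after(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

With a 100 usec budget, a link that comes up after a handful of polls is reported as done, while one that needs longer than the budget makes the function return false instead of spinning forever.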
Re: [PATCH net-next V3 0/3] net/sched: act_pedit: Use offset relative to conventional network headers
From: Amir Vadai Date: Tue, 7 Feb 2017 09:56:05 +0200 > Some FW/HW parser APIs are such that they need to get the specific header > type (e.g > IPV4 or IPV6, TCP or UDP) and not only the networking level (e.g network or > transport). > > Enhancing the UAPI to allow for specifying that, would allow the same flows > to be > set into both SW and HW. > > This patchset also makes pedit more robust. Currently fields offset is > specified > by offset relative to the ip header, while using negative offsets for > MAC layer fields. > > This series enables the user to set offset relative to the relevant header. > > Usage example: > $ tc filter add dev enp0s9 protocol ip parent : \ >flower \ > ip_proto tcp \ > dst_port 80 \ >action \ >pedit munge ip ttl add 0xff \ >pedit munge tcp dport set 8080 \ > pipe action mirred egress redirect dev veth0 > > Will forward traffic destined to tcp dport 80, while modifying the > destination port to 8080, and decreasing the ttl by one. > > I've uploaded a draft for the userspace [2] to make it easier to review and > test the patchset. > > [1] - http://patchwork.ozlabs.org/patch/700909/ > [2] - git: https://bitbucket.org/av42/iproute2.git > branch: pedit > > Patchset was tested and applied on top of upstream commit bd092ad1463c ("Merge > branch 'remove-__napi_complete_done'") Series applied, thank you.
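The usage example says that "ttl add 0xff" decreases the TTL by one. The arithmetic behind that can be sketched as below. This is a simplified model that assumes the munge acts on the 8-bit field in isolation; munge_add_u8() is a hypothetical name and not part of act_pedit.

```c
#include <stdint.h>

/*
 * Model of an 8-bit "add" munge: the field wraps modulo 256,
 * so adding 0xff has the same effect as subtracting one.
 */
static uint8_t munge_add_u8(uint8_t field, uint8_t addend)
{
	return (uint8_t)(field + addend);
}
```

In this model a TTL of 64 becomes 63, and a TTL of 0 would wrap around to 255, which is worth keeping in mind when relying on modular addition as a decrement.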
Re: [PATCH] xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend()
From: Boris Ostrovsky Date: Thu, 9 Feb 2017 08:42:59 -0500 > Are you going to take this to your tree or would you rather it goes > via Xen tree? Ok, I just did. > And the same question for > > https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00625.html As I stated in the thread, I applied this one. > https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00754.html Likewise. In the future, if you use netdev patchwork URLs, two things will happen. You will see immediately in the discussion log and the patch state whether I applied it or not. And second, I will be able to reference and do something with the patch that much more quickly and easily. Thank you.
Re: [PATCH] net: ethernet: ti: netcp_core: remove netif_trans_update
From: Ivan Khoronzhuk Date: Thu, 9 Feb 2017 16:17:40 +0200 > No need to update jiffies in txq->trans_start twice and only for tx 0, > it's supposed to be done in netdev_start_xmit() and per tx queue. > > Signed-off-by: Ivan Khoronzhuk > --- > Based on net-next/master Applied, thanks.
Re: [PATCH V2 net] net: hns: Fix the device being used for dma mapping during TX
From: Salil Mehta Date: Thu, 9 Feb 2017 11:46:15 + > From: Kejian Yan > > This patch fixes the device being used to DMA map skb->data. > Erroneous device assignment causes the crash when SMMU is enabled. > This happens during TX since buffer gets DMA mapped with device > corresponding to net_device and gets unmapped using the device > related to DSAF. > > Signed-off-by: Kejian Yan > Reviewed-by: Yisen Zhuang > Signed-off-by: Salil Mehta Applied, thank you.
Re: [patch net-next] spectrum: flower: Treat ETH_P_ALL as a special case and translate for HW
From: Jiri Pirko Date: Thu, 9 Feb 2017 14:42:03 +0100 > From: Jiri Pirko > > HW does not understand ETH_P_ALL. So treat this special case differently > and translate to 0/0 key/mask. That will allow HW to match all ethertypes. > > Fixes: 7aa0f5aa9030 ("mlxsw: spectrum: Implement TC flower offload") > Signed-off-by: Jiri Pirko > Reviewed-by: Ido Schimmel Applied, thanks.
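The 0/0 key/mask translation described in the patch can be sketched as follows. This is only an illustration, not the mlxsw code: struct ethertype_match, ethertype_to_hw() and hw_matches() are hypothetical names, and the EtherType is assumed to be in host byte order.

```c
#include <stdint.h>

#define ETH_P_ALL 0x0003 /* same value as in <linux/if_ether.h> */
#define ETH_P_IP  0x0800

/* Hypothetical key/mask pair; not the mlxsw data structure. */
struct ethertype_match {
	uint16_t key;
	uint16_t mask;
};

/* Exact match by default; ETH_P_ALL becomes the 0/0 wildcard. */
static struct ethertype_match ethertype_to_hw(uint16_t proto)
{
	struct ethertype_match m = { .key = proto, .mask = 0xffff };

	if (proto == ETH_P_ALL) {
		m.key = 0;
		m.mask = 0;
	}
	return m;
}

/* Masked compare, as a TCAM-style lookup would do it. */
static int hw_matches(struct ethertype_match m, uint16_t pkt_ethertype)
{
	return (pkt_ethertype & m.mask) == m.key;
}
```

Because every value ANDed with a zero mask equals the zero key, the wildcard entry matches all EtherTypes, whereas passing 0x0003 through unchanged would match almost nothing on the wire.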
Re: net/packet: use-after-free in packet_rcv_fanout
On Fri, Feb 10, 2017 at 10:02 AM, Eric Dumazet wrote: > On Fri, 2017-02-10 at 09:59 -0800, Eric Dumazet wrote: >> On Fri, 2017-02-10 at 09:49 -0800, Cong Wang wrote: >> > On Thu, Feb 9, 2017 at 7:23 PM, Eric Dumazet >> > wrote: >> > > On Thu, 2017-02-09 at 19:19 -0800, Eric Dumazet wrote: >> > > >> > >> More likely the bug is in fanout_add(), with a buggy sequence in error >> > >> case, and not correct locking. >> > >> >> > >> kfree(po->rollover); >> > >> po->rollover = NULL; >> > >> >> > >> Two cpus entering fanout_add() (using the same af_packet socket, >> > >> syzkaller courtesy...) might both see po->fanout being NULL. >> > >> >> > >> Then they grab the mutex. Too late... >> > > >> > > Patch could be : >> > > >> > >> > For me, clearly the data structure that use-after-free'd is struct sock >> > rather than struct packet_rollover. >> >> Fine. But your patch makes absolutely no sense. > > At least, Anoob patch is making a step into the right direction ;) > > https://patchwork.ozlabs.org/patch/726532/ > Yeah, but it still looks like a different issue from the one Dmitry reported.
Re: [PATCH net-next v3 06/10] net: dsa: Migrate to device_find_class()
On 02/10/2017 05:02 AM, Greg KH wrote: > On Thu, Jan 19, 2017 at 04:51:55PM +, Russell King - ARM Linux wrote: >> (This is mainly for Greg's benefit to help him understand the issue.) >> >> I think the diagram you gave initially made this confusing, as it >> talks about a CPU(sic) producing the "RGMII" and "MII-MGMT". >> >> Let's instead show a better representation that hopefully helps Greg >> understand networking. :) >> >> >> CPU >> System <-B-> Ethernet controller <-P-> } PHY <---> network cable >> } - - - - - - - or - - - - - - - >> MDIO bus ---M---> } Switch <-P-> PHYs <--> network >> `M^cables >> >> 'B' can be an on-SoC bus or something like PCI. >> >> 'P' are the high-speed connectivity between the ethernet controller and >> PHY which carries the packet data. It has no addressing, it's a point >> to point link. RGMII is just one wiring example, there are many >> different interfaces there (SGMII, AUI, XAUI, XGMII to name a few.) >> >> 'M' are the MDIO bus, which is the bus by which ethernet PHYs and >> switches can be identified and controlled. >> >> The MDIO bus has a bus_type, has host drivers which are sometimes >> part of the ethernet controller, but can also be stand-alone devices >> shared between multiple ethernet controllers. >> >> PHYs are a kind of MDIO device which are members of the MDIO bus >> type. Each PHY (and switch) has a numerical address, and identifying >> numbers within its register set which identifies the manufacturer >> and device type. We have device_driver objects for these. 
>> >> Expanding the above diagram to make it (hopefully) even clearer, >> we can have this classic setup: >> >> CPU >> System <-B-> Ethernet controller <-P-> PHY <---> network cable >> MDIO bus ---M--^ >> >> Or, in the case of two DSA switches attached to an Ethernet controller: >> >> || >> System <-B-> Ethernet controller <-P-> Switch <-P-> PHY1 <--> network cable >> MDIO bus +--M---> 1<-P-> PHY2 <--> network cable >> | |...| >> | |<-P-> PHYn <--> network cable >> | |^...| | >> | | `---M---' >> | P >> | | >> | |v~~~| >> `--> Switch <-P-> PHY1 <--> network cable >> | 2...| >> |<-P-> PHYn <--> network cable >> || | >> `---M---' >> >> The problem that the DSA guys are trying to deal with is how to >> represent the link between the DSA switches (which are devices >> sitting off their controlling bus - the MDIO bus) and the ethernet >> controller associated with that collection of devices, be it a >> switch or PHY. > > Why do they have to represent that link? This is a driver that somehow > binds the two togther in some sort of "control plane"? We have to represent that link because the CPU/host/management Ethernet MAC is physically connected to the CPU/management port of the switch. It does indeed participate in establishing the control plane. The basic idea of DSA is that the switch inserts vendor tags to indicate why the packet is sent towards the CPU in the first place: flooding, management, copy etc along with information as to which originating/destination port(s) this packet comes/goes from/to. On top of that, we demultiplex that tag to deliver normal Ethernet frames to per-port network devices (virtual network devices). 
If we did leave the switch in an unmanaged mode and not logically attached to an Ethernet MAC for management, we'd lose all that information (we could use per-port VLANs to re-create it, but it would be inferior to what a switch with proprietary tags can do) Code in net/dsa/dsa2.c that binds the two (switch and Ethernet MAC) together is not strictly a driver, it just is resident in memory and waits for dsa_register_switch() to be called until it tries to do this binding. > >> Merely changing the parent/child relationships to try and solve >> one issue just creates exactly the same problem elsewhere. > > Fair enough. > >> So, I hope with these diagrams, you can see that trying to make >> the ethernet controller a child device of the DSA switches >> means that (eg) it's no longer a PCI device, which is rather >> absurd, especially when considering that what happens to the >> right of the ethernet controller in the diagrams above is >> normally external chips to the SoC or ethernet device. > > Ok, thanks for the long explainations and
Re: cafe8df8b9bc clashes with DSA
On Fri, Feb 10, 2017 at 12:55:44PM -0500, Vivien Didelot wrote: > Hi Florian, > > Florian Fainelli writes: > > > Fixed in the "net" tree with: > > > > 6d9f66ac7fec2a6ccd649e5909806dfe36f1fc25 ("net: phy: Fix PHY module > > checks and NULL deref in phy_attach_direct()"), applies fine to net-next > > as well. > > Correct, this fixes my setup. Shouldn't this be submitted to net-next as > well then? Hi Vivien, David will at some point merge net into net-next. Until then, you can work around the issue by enabling the PHY drivers for your hardware. You are also likely to gain a few nice features, like PHY interrupts rather than polling, maybe some temperature sensors, PHY statistics, etc... Andrew
Re: [PATCH 0/4] Whitespace checkpatch fixes
From: "Tobin C. Harding" Date: Thu, 9 Feb 2017 17:56:03 +1100 > This patch set fixes various whitespace checkpatch errors and warnings. Series applied.
[PATCH net-next] net_sched: fix error recovery at qdisc creation
From: Eric DumazetDmitry reported uses after free in qdisc code [1] The problem here is that ops->init() can return an error. qdisc_create_dflt() then call ops->destroy(), while qdisc_create() does _not_ call it. Four qdisc chose to call their own ops->destroy(), assuming their caller would not. This patch makes sure qdisc_create() calls ops->destroy() and fixes the four qdisc to avoid double free. [1] BUG: KASAN: use-after-free in mq_destroy+0x242/0x290 net/sched/sch_mq.c:33 at addr 8801d415d440 Read of size 8 by task syz-executor2/5030 CPU: 0 PID: 5030 Comm: syz-executor2 Not tainted 4.3.5-smp-DEV #119 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 0046 8801b435b870 81bbbed4 8801db000400 8801d415d440 8801d415dc40 8801c4988510 8801b435b898 816682b1 8801b435b928 8801d415d440 8801c49880c0 Call Trace: [] __dump_stack lib/dump_stack.c:15 [inline] [] dump_stack+0x6c/0x98 lib/dump_stack.c:51 [] kasan_object_err+0x21/0x70 mm/kasan/report.c:158 [] print_address_description mm/kasan/report.c:196 [inline] [] kasan_report_error+0x1b4/0x4b0 mm/kasan/report.c:285 [] kasan_report mm/kasan/report.c:305 [inline] [] __asan_report_load8_noabort+0x43/0x50 mm/kasan/report.c:326 [] mq_destroy+0x242/0x290 net/sched/sch_mq.c:33 [] qdisc_destroy+0x12d/0x290 net/sched/sch_generic.c:953 [] qdisc_create_dflt+0xf0/0x120 net/sched/sch_generic.c:848 [] attach_default_qdiscs net/sched/sch_generic.c:1029 [inline] [] dev_activate+0x6ad/0x880 net/sched/sch_generic.c:1064 [] __dev_open+0x221/0x320 net/core/dev.c:1403 [] __dev_change_flags+0x15e/0x3e0 net/core/dev.c:6858 [] dev_change_flags+0x8e/0x140 net/core/dev.c:6926 [] dev_ifsioc+0x446/0x890 net/core/dev_ioctl.c:260 [] dev_ioctl+0x1ba/0xb80 net/core/dev_ioctl.c:546 [] sock_do_ioctl+0x99/0xb0 net/socket.c:879 [] sock_ioctl+0x2a0/0x390 net/socket.c:958 [] vfs_ioctl fs/ioctl.c:44 [inline] [] do_vfs_ioctl+0x8a8/0xe50 fs/ioctl.c:611 [] SYSC_ioctl fs/ioctl.c:626 [inline] [] SyS_ioctl+0x94/0xc0 fs/ioctl.c:617 
[] entry_SYSCALL_64_fastpath+0x12/0x17 Signed-off-by: Eric Dumazet Reported-by: Dmitry Vyukov --- net/sched/sch_api.c|2 ++ net/sched/sch_hhf.c|8 ++-- net/sched/sch_mq.c | 10 +++--- net/sched/sch_mqprio.c | 19 ++- net/sched/sch_sfq.c|3 ++- 5 files changed, 19 insertions(+), 23 deletions(-) diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index adeabaec0d0b8bd3115e8d8db756460227142c60..a13c15e8f08782f9a428690052bf5585c446b6fe 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -1019,6 +1019,8 @@ static struct Qdisc *qdisc_create(struct net_device *dev, return sch; } + /* ops->init() failed, we call ->destroy() like qdisc_create_dflt() */ + ops->destroy(sch); err_out3: dev_put(dev); kfree((char *) sch - sch->padded); diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c index e3d0458af17ba32cb203d4a5bed952baf9d22588..2fae8b5f1b80c017c4ae60df54c9143f82de4e9d 100644 --- a/net/sched/sch_hhf.c +++ b/net/sched/sch_hhf.c @@ -627,7 +627,9 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt) q->hhf_arrays[i] = hhf_zalloc(HHF_ARRAYS_LEN * sizeof(u32)); if (!q->hhf_arrays[i]) { - hhf_destroy(sch); + /* Note: hhf_destroy() will be called +* by our caller. +*/ return -ENOMEM; } } @@ -638,7 +640,9 @@ static int hhf_init(struct Qdisc *sch, struct nlattr *opt) q->hhf_valid_bits[i] = hhf_zalloc(HHF_ARRAYS_LEN / BITS_PER_BYTE); if (!q->hhf_valid_bits[i]) { - hhf_destroy(sch); + /* Note: hhf_destroy() will be called +* by our caller. 
+*/ return -ENOMEM; } } diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c index 2bc8d7f8df161005bc89245ca5ebc52f3360e3af..20b7f1646f69270e08d8b7588759a0146f262e89 100644 --- a/net/sched/sch_mq.c +++ b/net/sched/sch_mq.c @@ -52,7 +52,7 @@ static int mq_init(struct Qdisc *sch, struct nlattr *opt) /* pre-allocate qdiscs, attachment can't fail */ priv->qdiscs = kcalloc(dev->num_tx_queues, sizeof(priv->qdiscs[0]), GFP_KERNEL); - if (priv->qdiscs == NULL) + if (!priv->qdiscs) return -ENOMEM; for (ntx = 0; ntx < dev->num_tx_queues; ntx++) { @@ -60,18
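The ownership rule this patch establishes (a failing ->init() leaves cleanup to its caller, which then calls ->destroy() exactly once) can be modelled in userspace as below. This is a sketch under stated assumptions, not the sch_api.c code: struct sketch_priv and the sketch_*() functions are hypothetical, and fail_at is an illustrative knob for forcing an allocation failure.

```c
#include <stdlib.h>

#define NR_ARRAYS 3

/* Hypothetical private state, loosely modelled on a qdisc with several arrays. */
struct sketch_priv {
	void *arrays[NR_ARRAYS];
	int fail_at;	/* illustrative knob: index whose allocation fails, -1 for none */
	int frees;	/* number of arrays released by destroy */
};

/* On failure, init returns without cleaning up; the caller runs destroy. */
static int sketch_init(struct sketch_priv *q)
{
	int i;

	for (i = 0; i < NR_ARRAYS; i++) {
		q->arrays[i] = (i == q->fail_at) ? NULL : malloc(16);
		if (!q->arrays[i])
			return -1;
	}
	return 0;
}

/* destroy must tolerate partially initialised state and repeated calls. */
static void sketch_destroy(struct sketch_priv *q)
{
	int i;

	for (i = 0; i < NR_ARRAYS; i++) {
		if (q->arrays[i]) {
			free(q->arrays[i]);
			q->arrays[i] = NULL;
			q->frees++;
		}
	}
}

/* The caller owns error recovery: one destroy per failed init. */
static int sketch_create(struct sketch_priv *q)
{
	int err = sketch_init(q);

	if (err)
		sketch_destroy(q);
	return err;
}
```

If init also called destroy on its own error path, the same pointers would be released twice unless destroy cleared them, which is the kind of double free the patch removes from the four qdiscs.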
[PATCH net-next] net: ethtool: add support for forward error correction modes
From: Vidya Sagar Ravipati Forward Error Correction (FEC) modes i.e Base-R and Reed-Solomon modes are introduced in 25G/40G/100G standards for providing good BER at high speeds. Various networking devices which support 25G/40G/100G provide the ability to manage supported FEC modes, and the lack of FEC encoding control and reporting today is a source of interoperability issues for many vendors. FEC capability as well as specific FEC mode i.e. Base-R or RS modes can be requested or advertised through bits D44:47 of base link codeword. This patch set intends to provide option under ethtool to manage and report FEC encoding settings for networking devices as per IEEE 802.3 bj, bm and by specs. set-fec/show-fec option(s) are designed to provide control and report the FEC encoding on the link. SET FEC option: root@tor: ethtool --set-fec swp1 encoding [off | RS | BaseR | auto] Encoding: Types of encoding Off: Turning off any encoding RS : enforcing RS-FEC encoding on supported speeds BaseR : enforcing Base R encoding on supported speeds Auto : IEEE defaults for the speed/medium combination Here are a few examples of what we would expect if encoding=auto: - if autoneg is on, we are expecting FEC to be negotiated as on or off as long as protocol supports it - if the hardware is capable of detecting the FEC encoding on its receiver it will reconfigure its encoder to match - in absence of the above, the configuration would be set to IEEE defaults. From our understanding, this is essentially what most hardware/driver combinations are doing today in the absence of a way for users to control the behavior. 
SHOW FEC option: root@tor: ethtool --show-fec swp1 FEC parameters for swp1: Active FEC encodings: RS Configured FEC encodings: RS | BaseR ETHTOOL DEVNAME output modification: ethtool devname output: root@tor:~# ethtool swp1 Settings for swp1: root@hpe-7712-03:~# ethtool swp18 Settings for swp18: Supported ports: [ FIBRE ] Supported link modes: 4baseCR4/Full 4baseSR4/Full 4baseLR4/Full 10baseSR4/Full 10baseCR4/Full 10baseLR4_ER4/Full Supported pause frame use: No Supports auto-negotiation: Yes Supported FEC modes: [RS | BaseR | None | Not reported] Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Advertised FEC modes: [RS | BaseR | None | Not reported] One or more FEC modes Speed: 10Mb/s Duplex: Full Port: FIBRE PHYAD: 106 Transceiver: internal Auto-negotiation: off Link detected: yes This patch includes following changes a) New ETHTOOL_GFECPARAM/SFECPARAM API, handled by the new get_fecparam/set_fecparam callbacks, provides support for configuration of forward error correction modes. b) Link mode bits for FEC modes i.e. None (No FEC mode), RS, BaseR/FC are defined so that users can configure these fec modes for supported and advertising fields as part of link autonegotiation. 
Signed-off-by: Vidya Sagar Ravipati Changes in RFC PATCH v2: - Implement Gal Pressman and Casey Leedom feedback - Removing autonegotiation field in fecparam structure and included active_fec to provide mechanism to indicate the configured and active FEC modes on port --- include/linux/ethtool.h | 4 include/uapi/linux/ethtool.h | 48 +++- net/core/ethtool.c | 34 +++ 3 files changed, 85 insertions(+), 1 deletion(-) diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index 9ded8c6..79a0bab 100644 --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -372,5 +372,9 @@ struct ethtool_ops { struct ethtool_link_ksettings *); int (*set_link_ksettings)(struct net_device *, const struct ethtool_link_ksettings *); + int (*get_fecparam)(struct net_device *, + struct ethtool_fecparam *); + int (*set_fecparam)(struct net_device *, + struct ethtool_fecparam *); }; #endif /* _LINUX_ETHTOOL_H */ diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index 3dc91a4..38dbfeb 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -1238,6 +1238,47 @@ struct ethtool_per_queue_op { chardata[]; }; +/** + * struct ethtool_fecparam - Ethernet forward error correction(fec) parameters + * @cmd: Command number = %ETHTOOL_GFECPARAM or %ETHTOOL_SFECPARAM + * @active_fec: FEC mode which is active on porte + * @fec: Bitmask of supported/configured FEC modes + * @rsvd: Reserved for future extensions. i.e FEC bypass feature. + * + * Drivers
Re: net/packet: use-after-free in packet_rcv_fanout
On Fri, 2017-02-10 at 10:02 -0800, Cong Wang wrote: > I don't have to give a 100% correct patch to prove my explanation > of the crash. At least it makes more sense than yours... I will submit it regardless of what you think. It solves _another_ issue, one of 10 in af_packet.c
Re: cafe8df8b9bc clashes with DSA
Hi Andrew, Andrew Lunn writes: > David will at some point merge net into net-next. Yes I know that, I just wasn't sure if having such a crash in net-next was tolerated or not. Cherry-picking 6d9f66ac7fec does the job on my side. > Until then, you can work around the issue by enabling the PHY drivers > for your hardware. You are also likely to gain a few nice features, > like PHY interrupts rather than polling, maybe some temperature > sensors, PHY statistics, etc... Hum I have CONFIG_MARVELL_PHY enabled, am I missing something? Thanks, Vivien
Re: cafe8df8b9bc clashes with DSA
From: Vivien Didelot Date: Fri, 10 Feb 2017 12:55:44 -0500 > Hi Florian, > > Florian Fainelli writes: > >> Fixed in the "net" tree with: >> >> 6d9f66ac7fec2a6ccd649e5909806dfe36f1fc25 ("net: phy: Fix PHY module >> checks and NULL deref in phy_attach_direct()"), applies fine to net-next >> as well. > > Correct, this fixes my setup. Shouldn't this be submitted to net-next as > well then? It will propagate there the next time I merge to Linus and then merge net into net-next.
Re: [RFC PATCH net] bpf: introduce BPF_F_ALLOW_OVERRIDE flag
On Thu, Feb 09, 2017 at 10:59:23AM -0800, Alexei Starovoitov wrote: > Andy, > does it all make sense? Andy, ping.
Re: [patch net-next 00/10] mlxsw: Offload MC flood for unregister MC
From: Jiri Pirko Date: Thu, 9 Feb 2017 14:54:39 +0100 > From: Jiri Pirko > > Nogah says: > > When multicast is enabled, the Linux bridge floods unregistered multicast > packets only to ports connected to a multicast router. Devices capable of > offloading the Linux bridge need to be made aware of such ports, for > proper flooding behavior. > On the other hand, when multicast is disabled, such packets should be > flooded to all ports. This patchset aims to fix that, by offloading > the multicast state and the list of multicast router ports. > > The first 3 patches add switchdev attributes to offload this data. > The rest of the patchset adds the implementation for handling this data in the > mlxsw driver. > > The effects this data has on the MDB (namely, when the multicast is > disabled the MDB should be considered as invalid, and when it is enabled, a > packet that is flooded by it should also be flooded to the multicast > routers ports) is subject of future work. > > Testing of this patchset included: > Sending 3 mc packets streams, LL, registered and unregistered, and checking > that they reached only the ports that should have received them. > The configs were: > mc disabled, mc without mc router ports and mc with fixed router port. > It was checked for vlan aware bridge, vlan unaware bridge and vlan unaware > bridge with another vlan unaware bridge on the same machine Series applied, thanks.
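The flooding rule for unregistered multicast described in the cover letter reduces to a small decision, sketched here with ports represented as bits in a mask. The function name and the bitmask representation are assumptions made for illustration; this is not the mlxsw or bridge implementation, and excluding the ingress port is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Egress port set for an unregistered multicast packet:
 * multicast disabled -> flood to every bridge port,
 * multicast enabled  -> flood only to multicast router ports.
 */
static uint32_t unregistered_mc_flood_ports(bool mc_enabled,
					    uint32_t all_ports,
					    uint32_t mrouter_ports)
{
	if (!mc_enabled)
		return all_ports;
	return all_ports & mrouter_ports;
}
```

The three tested configurations map onto this directly: multicast disabled yields all ports, multicast enabled without router ports yields an empty set, and multicast enabled with a fixed router port yields just that port.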
Re: cafe8df8b9bc clashes with DSA
On 02/10/2017 10:15 AM, Vivien Didelot wrote: > Hi Andrew, > > Andrew Lunn writes: > >> David will at some point merge net into net-next. > > Yes I know that, I just wasn't sure if having such crash in net-next was > tolerated or not. Cherry-picking 6d9f66ac7fec does the job on my side. > >> Until then, you can work around the issue by enabling the PHY drivers >> for you hardware. You are also likely to gain a few nice features, >> like PHY interrupts rather than polling, maybe some temperature >> sensors, PHY statistics, etc... > > Hum I have CONFIG_MARVELL_PHY enabled, am I missing something? If you have fixed PHYs they'll use Generic PHY, suddenly the dungeon collapses, you die. -- Florian
Re: [PATCH v4] net: ethernet: faraday: To support device tree usage.
On Wed, Feb 8, 2017 at 5:59 AM, Greentime Hu wrote: > On Sat, Jan 28, 2017 at 6:17 AM, Rob Herring wrote: >> >> On Wed, Jan 25, 2017 at 10:09:20PM +0100, Arnd Bergmann wrote: >> > On Wed, Jan 25, 2017 at 6:34 PM, David Miller wrote: >> > > From: Greentime Hu >> > > Date: Tue, 24 Jan 2017 16:46:14 +0800 >> > >> We also use the same binding document to describe the same faraday >> > >> ethernet >> > >> controller and add faraday to vendor-prefixes.txt. >> > > >> > > Why are you renaming the MOXA binding file instead of adding a >> > > completely new one >> > > for faraday? The MOXA one should stick around, I don't see a >> > > justification for >> > > removing it. >> > >> > This was my suggestion, basically fixing the name of the existing >> > binding, which was >> > accidentally named after one of the users rather than the company that did >> > the >> > hardware. >> > >> > We can't change the compatible string, but I'd much prefer having only >> > one binding >> > file for this device rather than two separate ones that could possibly >> > become >> > incompatible in case we add new properties to them. If there is only >> > one of them, >> > naming it according to the hardware design is the general policy. >> > >> > Note that we currently have two separate device drivers, but that is more a >> > historic artifact, and if we ever get around to merging them into one >> > driver, >> > that should not impact the binding. >> >> The change is fine with me, but the subject and commit message need some >> work. > > Hi, Rob: > > Would you please advise me of the proper subject and commit messages? Split the binding to a separate commit and summarize the email discussion here. For a subject, something like this: "dt-bindings: net: generalize moxart-mac to support all faraday based ftmac IP" Rob
Re: [PATCH RFC net] net/mlx5e: Add preemption enable/disable around TC statistics upcall
On Fri, 10 Feb 2017 18:21:25 +0200, Or Gerlitz wrote: > On Fri, Feb 10, 2017 at 3:34 AM, Jakub Kicinski wrote: > > On Thu, 9 Feb 2017 17:38:43 +0200, Or Gerlitz wrote: > >> Running with CONFIG_PREEMPT set, I get a > >> > >> BUG: using smp_processor_id() in preemptible [] code: tc/3793 > >> > >> assertion from the TC action (mirred) stats_update callback, when they do > >> > >> _bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets) > >> > >> As done by commit 66860be "nfp: bpf: allow offloaded filters to update > >> stats", > >> disabling/enabling preemption around the TC upcall solves that. > >> > >> Fixes: aad7e08d39bd ('net/mlx5e: Hardware offloaded flower filter > >> statistics support') > >> Signed-off-by: Or Gerlitz > >> --- > >> > >> I marked it as RFC, since I wasn't fully sure on the nature of the > >> problem, nor if this is the direction we should take to the fix. > > > I think it's the right fix > > Do you understand the problem? what's wrong with the call done in the TC > action code w.r.t preemption? > > does it make sense to do this (say) 100K times/sec? TC actions have per-cpu stats; referencing them has to be done with preemption disabled. Let's CC Jamal and Cong - maybe there are some more clever things we could do here? The situation in a nutshell is that the offload drivers read the stats from HW and want to write them back to the TC action stats. The writeback happens in process context when the user requests a stats dump (potentially for multiple actions but we currently would just iterate over all actions in driver code).
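The constraint discussed above (per-cpu counters may only be touched while the task cannot be preempted and migrated) can be modelled in userspace roughly as follows. This is a sketch, not kernel code: preempt_disable() and preempt_enable() here are stand-in stubs that merely count nesting, and struct sketch_bstats, sketch_bstats_cpu_update() and sketch_stats_writeback() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel primitives; they only count nesting. */
static int preempt_count;
static void preempt_disable(void) { preempt_count++; }
static void preempt_enable(void)  { preempt_count--; }

struct sketch_bstats {
	uint64_t bytes;
	uint64_t packets;
};

/* One slot stands in for this CPU's copy of the per-cpu counters. */
static struct sketch_bstats this_cpu_bstats;

/* Touching per-cpu data is only safe while the task cannot migrate. */
static void sketch_bstats_cpu_update(uint64_t bytes, uint64_t packets)
{
	assert(preempt_count > 0);
	this_cpu_bstats.bytes += bytes;
	this_cpu_bstats.packets += packets;
}

/* Process-context writeback of counters read from hardware. */
static void sketch_stats_writeback(uint64_t hw_bytes, uint64_t hw_packets)
{
	preempt_disable();
	sketch_bstats_cpu_update(hw_bytes, hw_packets);
	preempt_enable();
}
```

The assert plays the role of the "using smp_processor_id() in preemptible code" check: calling the update without the surrounding disable/enable pair trips it, while the wrapped call passes.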
Re: net/packet: use-after-free in packet_rcv_fanout
On Fri, 2017-02-10 at 09:59 -0800, Eric Dumazet wrote: > On Fri, 2017-02-10 at 09:49 -0800, Cong Wang wrote: > > On Thu, Feb 9, 2017 at 7:23 PM, Eric Dumazet wrote: > > > On Thu, 2017-02-09 at 19:19 -0800, Eric Dumazet wrote: > > > > > >> More likely the bug is in fanout_add(), with a buggy sequence in error > > >> case, and not correct locking. > > >> > > >> kfree(po->rollover); > > >> po->rollover = NULL; > > >> > > >> Two cpus entering fanout_add() (using the same af_packet socket, > > >> syzkaller courtesy...) might both see po->fanout being NULL. > > >> > > >> Then they grab the mutex. Too late... > > > > > > Patch could be : > > > > > > > For me, clearly the data structure that use-after-free'd is struct sock > > rather than struct packet_rollover. > > Fine. But your patch makes absolutely no sense. At least Anoob's patch is a step in the right direction ;) https://patchwork.ozlabs.org/patch/726532/
Re: net/packet: use-after-free in packet_rcv_fanout
On Fri, Feb 10, 2017 at 9:59 AM, Eric Dumazet wrote: > On Fri, 2017-02-10 at 09:49 -0800, Cong Wang wrote: >> On Thu, Feb 9, 2017 at 7:23 PM, Eric Dumazet wrote: >> > On Thu, 2017-02-09 at 19:19 -0800, Eric Dumazet wrote: >> > >> >> More likely the bug is in fanout_add(), with a buggy sequence in error >> >> case, and not correct locking. >> >> >> >> kfree(po->rollover); >> >> po->rollover = NULL; >> >> >> >> Two cpus entering fanout_add() (using the same af_packet socket, >> >> syzkaller courtesy...) might both see po->fanout being NULL. >> >> >> >> Then they grab the mutex. Too late... >> > >> > Patch could be : >> > >> >> For me, clearly the data structure that use-after-free'd is struct sock >> rather than struct packet_rollover. > > Fine. But your patch makes absolutely no sense. I don't have to give a 100% correct patch to prove my explanation of the crash. At least it makes more sense than yours...
Re: [PATCH] [net-next] ARM: orion: fix PHYLIB dependency
On 02/10/2017 12:20 AM, Arnd Bergmann wrote: > On Thu, Feb 9, 2017 at 7:22 PM, Florian Fainelliwrote: >> On 02/09/2017 07:08 AM, Arnd Bergmann wrote: >> I disabled CONFIG_NETDEVICES to force CONFIG_PHY not to be set here, and >> I was not able to reproduce this, what am I missing? > > In the ARMv5 allmodconfig build, this fails because CONFIG_PHY=m, and > we can't call into it. You could use IS_BUILTIN instead of IS_ENABLED in > the header as a oneline workaround, but I think that would be more confusing > to real users that try to use CONFIG_PHY=m without realizing why they lose > access to their switch. I see, this patch should also help fixing this: http://patchwork.ozlabs.org/patch/726381/ -- Florian
Re: net/packet: use-after-free in packet_rcv_fanout
On Thu, Feb 9, 2017 at 7:33 PM, Sowmini Varadhan wrote: > On (02/09/17 19:19), Eric Dumazet wrote: >> >> More likely the bug is in fanout_add(), with a buggy sequence in error >> case, and not correct locking. >> >> kfree(po->rollover); >> po->rollover = NULL; >> >> Two cpus entering fanout_add() (using the same af_packet socket, >> syzkaller courtesy...) might both see po->fanout being NULL. >> >> Then they grab the mutex. Too late... > > I'm not sure I follow- aiui the panic was in accessing the > sk_receive_queue.lock in a socket that had been closed earlier. I think > the assumption is that rcu_read_lock_bh in __dev_queue_xmit (and > rcu_read_lock in dev_queue_xmit_nit?) should make sure that the nit > packet delivery can be done safely, and the synchronize_net in > packet_release() makes sure that the Tx paths are quiesced before freeing > the socket. What is the race-hole here? Does it have to do with the > _bh and softirq context, somehow? My understanding of the race here is that packet_release() doesn't wait for in-flight packets correctly, which leaves an in-flight packet still referring to the struct sock that is being released. This can happen because struct packet_fanout is refcounted: it is still there when this is not the last sock referring to it, and therefore the callback packet_rcv_fanout() is not removed yet. When packet_release() tries to remove the pointer to struct sock from f->arr[i] in __fanout_unlink(), an in-flight packet could race with f->arr[i]: po = pkt_sk(f->arr[idx]); Of course, the fix may not be as easy as just adding a synchronize_net(); perhaps we need the spinlock too in fanout_demux_rollover(). At least I believe this explains the crash Dmitry reported.
Re: [PATCH v4 0/3] Miscellaneous fixes for BPF (perf tree)
On Wed, Feb 08, 2017 at 09:27:41PM +0100, Mickaël Salaün wrote: > This series brings some fixes and small improvements to the BPF samples. > > This is intended for the perf tree and apply on 7a5980f9c006 ("tools lib bpf: > Add missing header to the library"). Wang, are you ok with this series? Joe? - Arnaldo > Changes since v3: > * remove applied patch 1/5 > * remove patch 2/5 on bpf_load_program() as requested by Wang Nan > > Changes since v2: > * add this cover letter > > Changes since v1: > * exclude patches not intended for the perf tree > > Regards, > > Mickaël Salaün (3): > samples/bpf: Ignore already processed ELF sections > samples/bpf: Reset global variables > samples/bpf: Add missing header > > samples/bpf/bpf_load.c | 7 +++ > samples/bpf/tracex5_kern.c | 1 + > 2 files changed, 8 insertions(+) > > -- > 2.11.0