[PATCH] ip: find correct route for socket which is not bound to a device
For multicast, we should also find a valid route (and thus get a meaningful PMTU) for packets sent on a socket that is not bound to a device (sk_bound_dev_if being 0).

From the man page of socket(7):

    SO_BINDTODEVICE
        Bind this socket to a particular device like “eth0”, as specified in the passed interface name. If the name is an empty string or the option length is zero, the socket device binding is removed. The passed option is a variable-length null-terminated interface name string with the maximum size of IFNAMSIZ. If a socket is bound to an interface, only packets received from that particular interface are processed by the socket. Note that this works only for some socket types, particularly AF_INET sockets. It is not supported for packet sockets (use normal bind(2) there).

The man page does not say that packets from a socket that is not bound to a device won't be routed.

I hit a problem where all multicast packets were dropped by the kernel on the sending host. The lower layer is IPoIB with an MTU of 7000, and I was sending multicast packets of length 4096. Inside IPoIB the first send is dropped because it exceeds the internal packet size limit mcast_mtu, which is 2044. So IPoIB calls ip_rt_update_pmtu (indirectly) to set the path MTU. A correct route is configured for the multicast group, so setting the PMTU succeeded, and the next multicast packet to the same target was expected to succeed (it should be fragmented according to the PMTU that was just set).

But the second and later multicast packets were dropped too. The reason is that the route lookup (fib_lookup) is skipped because the socket is not bound to a device (sk_bound_dev_if being 0).

With the patch proposed here applied, it works fine.
Signed-off-by: Wengang Wang
---
 net/ipv4/route.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5f4a556..032481a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2097,7 +2097,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 			 */
 
 			fl4->flowi4_oif = dev_out->ifindex;
-			goto make_route;
+			goto lookup;
 		}
 
 		if (!(fl4->flowi4_flags & FLOWI_FLAG_ANYSRC)) {
@@ -2153,6 +2153,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 		goto make_route;
 	}
 
+lookup:
 	if (fib_lookup(net, fl4, &res, 0)) {
 		res.fi = NULL;
 		res.table = NULL;
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
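The fragmentation the report expects after the PMTU update can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: it assumes the 4096-byte payload is carried in UDP (8-byte header) over IPv4 with a 20-byte header and no options, neither of which the report states, and ipv4_fragment_count() is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of IPv4 fragments needed to carry
 * ip_payload bytes (everything after the IP header) over a path
 * with the given PMTU. Every fragment except the last must carry
 * a multiple of 8 bytes. Returns 0 if no data fits at all.
 */
size_t ipv4_fragment_count(size_t ip_payload, size_t pmtu, size_t ip_hdr_len)
{
	size_t room, per_frag;

	if (pmtu <= ip_hdr_len)
		return 0;

	room = pmtu - ip_hdr_len;	/* data bytes that fit in one packet */
	if (ip_payload <= room)
		return 1;		/* fits without fragmentation */

	per_frag = room & ~(size_t)7;	/* non-final fragments: multiple of 8 */
	if (per_frag == 0)
		return 0;

	/* one final fragment of up to "room" bytes, plus full ones before it */
	return 1 + (ip_payload - room + per_frag - 1) / per_frag;
}
```

Under those assumptions the IP payload is 4096 + 8 = 4104 bytes, so ipv4_fragment_count(4104, 2044, 20) gives 3 fragments at the 2044-byte mcast_mtu, while ipv4_fragment_count(4104, 7000, 20) gives 1, which is why the first send went out unfragmented and was dropped inside IPoIB.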
[PATCH v2 net-next 0/2] bpf: performance improvements
v1->v2: dropped redundant iff_up check in patch 2

At plumbers we discussed different options on how to get rid of skb_clone from bpf_clone_redirect(), the patch 2 implements the best option.
Patch 1 adds 'integrated exts' to cls_bpf to improve performance by combining simple actions into bpf classifier.

Alexei Starovoitov (1):
  bpf: add bpf_redirect() helper

Daniel Borkmann (1):
  cls_bpf: introduce integrated actions

 include/net/sch_generic.h    |  3 ++-
 include/uapi/linux/bpf.h     |  9 +++
 include/uapi/linux/pkt_cls.h |  4 +++
 net/core/dev.c               |  8 ++
 net/core/filter.c            | 58 +++
 net/sched/act_bpf.c          |  1 +
 net/sched/cls_bpf.c          | 61 ++
 samples/bpf/bpf_helpers.h    |  4 +++
 samples/bpf/tcbpf1_kern.c    | 24 -
 9 files changed, 159 insertions(+), 13 deletions(-)

-- 
1.7.9.5
[PATCH v2 net-next 2/2] bpf: add bpf_redirect() helper
Existing bpf_clone_redirect() helper clones skb before redirecting it to RX or TX of destination netdev.
Introduce bpf_redirect() helper that does that without cloning.

Benchmarked with two hosts using 10G ixgbe NICs.
One host is doing line rate pktgen.
Another host is configured as:
$ tc qdisc add dev $dev ingress
$ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
   action bpf run object-file tcbpf1_kern.o section clone_redirect_xmit drop
so it receives the packet on $dev and immediately xmits it on $dev + 1
The section 'clone_redirect_xmit' in tcbpf1_kern.o file has the program that does bpf_clone_redirect() and performance is 2.0 Mpps

$ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
   action bpf run object-file tcbpf1_kern.o section redirect_xmit drop
which is using bpf_redirect() - 2.4 Mpps

and using cls_bpf with integrated actions as:
$ tc filter add dev $dev root pref 10 \
  bpf run object-file tcbpf1_kern.o section redirect_xmit integ_act classid 1
performance is 2.5 Mpps

To summarize:
u32+act_bpf using clone_redirect - 2.0 Mpps
u32+act_bpf using redirect - 2.4 Mpps
cls_bpf using redirect - 2.5 Mpps

For comparison linux bridge in this setup is doing 2.1 Mpps
and ixgbe rx + drop in ip_rcv - 7.8 Mpps

Signed-off-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
---
This approach is using per_cpu scratch area to store ifindex and flags.
The other alternatives discussed at plumbers are slower and more intrusive.
v1->v2: dropped redundant iff_up check

 include/net/sch_generic.h    |  1 +
 include/uapi/linux/bpf.h     |  8
 include/uapi/linux/pkt_cls.h |  1 +
 net/core/dev.c               |  8
 net/core/filter.c            | 44 ++
 net/sched/act_bpf.c          |  1 +
 net/sched/cls_bpf.c          |  1 +
 samples/bpf/bpf_helpers.h    |  4
 samples/bpf/tcbpf1_kern.c    | 24 ++-
 9 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index da61febb9091..4c79ce8c1f92 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -402,6 +402,7 @@ void __qdisc_calculate_pkt_len(struct sk_buff *skb,
 				const struct qdisc_size_table *stab);
 bool tcf_destroy(struct tcf_proto *tp, bool force);
 void tcf_destroy_chain(struct tcf_proto __rcu **fl);
+int skb_do_redirect(struct sk_buff *);
 
 /* Reset all TX qdiscs greater then index of a device. */
 static inline void qdisc_reset_all_tx_gt(struct net_device *dev, unsigned int i)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2fbd1c71fa3b..4ec0b5488294 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -272,6 +272,14 @@ enum bpf_func_id {
 	BPF_FUNC_skb_get_tunnel_key,
 	BPF_FUNC_skb_set_tunnel_key,
 	BPF_FUNC_perf_event_read,	/* u64 bpf_perf_event_read(&map, index) */
+	/**
+	 * bpf_redirect(ifindex, flags) - redirect to another netdev
+	 * @ifindex: ifindex of the net device
+	 * @flags: bit 0 - if set, redirect to ingress instead of egress
+	 *         other bits - reserved
+	 * Return: TC_ACT_REDIRECT
+	 */
+	BPF_FUNC_redirect,
 	__BPF_FUNC_MAX_ID,
 };
 
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 0a262a83f9d4..439873775d49 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -87,6 +87,7 @@ enum {
 #define TC_ACT_STOLEN		4
 #define TC_ACT_QUEUED		5
 #define TC_ACT_REPEAT		6
+#define TC_ACT_REDIRECT		7
 #define TC_ACT_JUMP		0x10000000
 
 /* Action type identifiers*/
diff --git a/net/core/dev.c b/net/core/dev.c
index 877c84834d81..d6a492e57874 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3668,6 +3668,14 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 	case TC_ACT_QUEUED:
 		kfree_skb(skb);
 		return NULL;
+	case TC_ACT_REDIRECT:
+		/* skb_mac_header check was done by cls/act_bpf, so
+		 * we can safely push the L2 header back before
+		 * redirecting to another netdev
+		 */
+		__skb_push(skb, skb->mac_len);
+		skb_do_redirect(skb);
+		return NULL;
 	default:
 		break;
 	}
diff --git a/net/core/filter.c b/net/core/filter.c
index 971d6ba89758..da3f3d94d6e9 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1427,6 +1427,48 @@ const struct bpf_func_proto bpf_clone_redirect_proto = {
 	.arg3_type = ARG_ANYTHING,
 };
 
+struct redirect_info {
+	u32 ifindex;
+	u32 flags;
+};
+
+static DEFINE_PER_CPU(struct redirect_info, redirect_info);
+static u64 bpf_redirect(u64 ifindex, u64 flags, u64 r3, u64 r4, u64 r5)
+{
+	struct redirect_info *ri =
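The "per_cpu scratch area" idea from the note above can be illustrated with a small user-space model. This is only a sketch of the two-step hand-off, not the kernel code (which is cut off above): a single global stands in for the per-CPU variable, and model_bpf_redirect(), model_do_redirect(), struct redirect_target and MODEL_F_INGRESS are hypothetical names. Only struct redirect_info, the meaning of flag bit 0 and TC_ACT_REDIRECT = 7 are taken from the patch.

```c
#include <stdint.h>

#define TC_ACT_REDIRECT	7	/* value added to pkt_cls.h by the patch */
#define MODEL_F_INGRESS	1u	/* hypothetical name for flag bit 0 */

struct redirect_info {
	uint32_t ifindex;
	uint32_t flags;
};

/* One global stands in for the per-CPU scratch area in this model. */
struct redirect_info redirect_scratch;

/* Step 1: the helper only records where the packet should go. */
int model_bpf_redirect(uint32_t ifindex, uint32_t flags)
{
	redirect_scratch.ifindex = ifindex;
	redirect_scratch.flags = flags;
	return TC_ACT_REDIRECT;
}

struct redirect_target {
	uint32_t ifindex;
	int to_ingress;
};

/*
 * Step 2: the caller that sees TC_ACT_REDIRECT reads the scratch
 * area back and acts on it, so no skb clone is needed in between.
 */
struct redirect_target model_do_redirect(void)
{
	struct redirect_target t;

	t.ifindex = redirect_scratch.ifindex;
	t.to_ingress = (redirect_scratch.flags & MODEL_F_INGRESS) != 0;
	return t;
}
```

In this model the program's return value carries only the verdict; the destination travels out of band through the scratch area, which is why the helper needs no access to the packet itself.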
[PATCH v2 net-next 1/2] cls_bpf: introduce integrated actions
From: Daniel Borkmann

Often cls_bpf classifier is used with single action drop attached.
Optimize this use case and let cls_bpf return both classid and action.
For backwards compatibility reasons enable this feature under TCA_BPF_FLAG_ACT_DIRECT flag.

Then more interesting programs like the following are easier to write:

int cls_bpf_prog(struct __sk_buff *skb)
{
	/* classify arp, ip, ipv6 into different traffic classes
	 * and drop all other packets
	 */
	switch (skb->protocol) {
	case htons(ETH_P_ARP):
		skb->tc_classid = 1;
		break;
	case htons(ETH_P_IP):
		skb->tc_classid = 2;
		break;
	case htons(ETH_P_IPV6):
		skb->tc_classid = 3;
		break;
	default:
		return TC_ACT_SHOT;
	}

	return TC_ACT_OK;
}

Joint work with Daniel Borkmann.

Signed-off-by: Daniel Borkmann
Signed-off-by: Alexei Starovoitov
---
v1->v2: no changes

 include/net/sch_generic.h    |  2 +-
 include/uapi/linux/bpf.h     |  1 +
 include/uapi/linux/pkt_cls.h |  3 +++
 net/core/filter.c            | 14 ++
 net/sched/cls_bpf.c          | 60 ++
 5 files changed, 68 insertions(+), 12 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 444faa89a55f..da61febb9091 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -251,7 +251,7 @@ struct tcf_proto {
 struct qdisc_skb_cb {
 	unsigned int		pkt_len;
 	u16			slave_dev_queue_mapping;
-	u16			_pad;
+	u16			tc_classid;
 #define QDISC_CB_PRIV_LEN 20
 	unsigned char		data[QDISC_CB_PRIV_LEN];
 };
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 92a48e2d5461..2fbd1c71fa3b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -293,6 +293,7 @@ struct __sk_buff {
 	__u32 tc_index;
 	__u32 cb[5];
 	__u32 hash;
+	__u32 tc_classid;
 };
 
 struct bpf_tunnel_key {
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 4f0d1bc3647d..0a262a83f9d4 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -373,6 +373,8 @@ enum {
 
 /* BPF classifier */
 
+#define TCA_BPF_FLAG_ACT_DIRECT		(1 << 0)
+
 enum {
 	TCA_BPF_UNSPEC,
 	TCA_BPF_ACT,
@@ -382,6 +384,7 @@ enum {
 	TCA_BPF_OPS,
 	TCA_BPF_FD,
 	TCA_BPF_NAME,
+	TCA_BPF_FLAGS,
 	__TCA_BPF_MAX,
 };
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 13079f03902e..971d6ba89758 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1632,6 +1632,9 @@ static bool __is_valid_access(int off, int size, enum bpf_access_type type)
 static bool sk_filter_is_valid_access(int off, int size,
 				      enum bpf_access_type type)
 {
+	if (off == offsetof(struct __sk_buff, tc_classid))
+		return false;
+
 	if (type == BPF_WRITE) {
 		switch (off) {
 		case offsetof(struct __sk_buff, cb[0]) ...
@@ -1648,6 +1651,9 @@ static bool sk_filter_is_valid_access(int off, int size,
 static bool tc_cls_act_is_valid_access(int off, int size,
 				       enum bpf_access_type type)
 {
+	if (off == offsetof(struct __sk_buff, tc_classid))
+		return type == BPF_WRITE ? true : false;
+
 	if (type == BPF_WRITE) {
 		switch (off) {
 		case offsetof(struct __sk_buff, mark):
@@ -1760,6 +1766,14 @@ static u32 bpf_net_convert_ctx_access(enum bpf_access_type type, int dst_reg,
 		*insn++ = BPF_LDX_MEM(BPF_W, dst_reg, src_reg, ctx_off);
 		break;
 
+	case offsetof(struct __sk_buff, tc_classid):
+		ctx_off -= offsetof(struct __sk_buff, tc_classid);
+		ctx_off += offsetof(struct sk_buff, cb);
+		ctx_off += offsetof(struct qdisc_skb_cb, tc_classid);
+		WARN_ON(type != BPF_WRITE);
+		*insn++ = BPF_STX_MEM(BPF_H, dst_reg, src_reg, ctx_off);
+		break;
+
 	case offsetof(struct __sk_buff, tc_index):
 #ifdef CONFIG_NET_SCHED
 		BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, tc_index) != 2);
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index e5168f8b9640..77b0ef148256 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -38,6 +38,7 @@ struct cls_bpf_prog {
 	struct bpf_prog *filter;
 	struct list_head link;
 	struct tcf_result res;
+	bool exts_integrated;
 	struct tcf_exts exts;
 	u32 handle;
 	union {
@@ -52,6 +53,7 @@ struct cls_bpf_prog {
 
 static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
 	[TCA_BPF_CLASSID]	= { .type = NLA_U32 },
+	[TCA_BPF_FLAGS]		= { .type = NLA_U32 },
 	[TCA_BPF_FD]		= { .type = NLA_U32 },
 	[TCA_BPF_NAME]
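To see what "return both classid and action" means in practice, the example program from the commit message can be modelled in plain user-space C. This is a sketch, not a loadable BPF program: struct model_skb and model_cls_prog() are hypothetical stand-ins for struct __sk_buff and the classifier, the EtherType is compared in host byte order for simplicity, and the TC_ACT_* and ETH_P_* constants are redefined locally with what I believe are their usual values.

```c
#include <stdint.h>

#define TC_ACT_OK	0
#define TC_ACT_SHOT	2

#define ETH_P_IP	0x0800
#define ETH_P_ARP	0x0806
#define ETH_P_IPV6	0x86DD

/* Hypothetical stand-in for the fields of struct __sk_buff used here. */
struct model_skb {
	uint16_t protocol;	/* EtherType, host byte order in this model */
	uint32_t tc_classid;	/* written by the classifier */
};

/*
 * One call yields two results: the class is stored in skb->tc_classid
 * and the action (pass or drop) is the return value.
 */
int model_cls_prog(struct model_skb *skb)
{
	switch (skb->protocol) {
	case ETH_P_ARP:
		skb->tc_classid = 1;
		break;
	case ETH_P_IP:
		skb->tc_classid = 2;
		break;
	case ETH_P_IPV6:
		skb->tc_classid = 3;
		break;
	default:
		return TC_ACT_SHOT;	/* drop everything else */
	}

	return TC_ACT_OK;
}
```

Without integrated actions the same policy would need a classifier plus a separate drop action attached to it, which is the overhead the commit message sets out to avoid.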
[PATCH v2 1/1] eventfd: implementation of EFD_MASK flag
From: Martin Sustrik

When implementing network protocols in user space, one has to implement fake file descriptors to represent the sockets for the protocol. Polling on such fake file descriptors is a problem (poll/select/epoll accept only true file descriptors) and forces protocol implementers to use various workarounds resulting in complex, non-standard and convoluted APIs.

More generally, the ability to create full-blown file descriptors for userspace-to-userspace signalling is missing. While eventfd(2) goes halfway towards this goal, it has the following shortcomings:

I. There's no way to signal POLLPRI, POLLHUP etc.
II. There's no way to signal an arbitrary combination of POLL* flags. Most notably, simultaneous !POLLIN and !POLLOUT, which is a perfectly valid combination for a network protocol (rx buffer is empty and tx buffer is full), cannot be signaled using eventfd.

This patch implements a new EFD_MASK flag which solves the above problems.

Additionally, to provide a way to associate user-space state with the eventfd object, it allows attaching user-space data to the file descriptor.

The semantics of EFD_MASK are as follows:

eventfd(2):

If eventfd is created with the EFD_MASK flag set, it is initialised in such a way as to signal no events on the file descriptor when it is polled on. The 'initval' argument is ignored.

write(2):

User is allowed to write only buffers containing the following structure:

struct efd_mask {
	uint32_t events;
};

The value of 'events' should be any combination of event flags as defined by the poll(2) function (POLLIN, POLLOUT, POLLERR, POLLHUP etc.). The specified events will be signaled when polling (select, poll, epoll) on the eventfd is done later on.

read(2):

read is not supported and will fail with EINVAL.

select(2), poll(2) and similar:

When polling on the eventfd marked by the EFD_MASK flag, all the events specified in the last written 'events' field shall be signaled.
Signed-off-by: Martin Sustrik
[dhobs...@igel.co.jp: Rebased, and resubmitted for Linux 4.3]
Signed-off-by: Damian Hobson-Garcia
---
 fs/eventfd.c                 | 102 ++-
 include/linux/eventfd.h      |  16 +--
 include/uapi/linux/eventfd.h |  39 +
 3 files changed, 132 insertions(+), 25 deletions(-)
 create mode 100644 include/uapi/linux/eventfd.h

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 8d0c0df..1a6a066 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -2,6 +2,7 @@
  *  fs/eventfd.c
  *
  *  Copyright (C) 2007  Davide Libenzi
+ *  Copyright (C) 2013  Martin Sustrik
  *
  */
 
@@ -22,18 +23,31 @@
 #include
 #include
 
+#define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
+#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE | EFD_MASK)
+#define EFD_MASK_VALID_EVENTS (POLLIN | POLLPRI | POLLOUT | POLLERR | POLLHUP)
+
 struct eventfd_ctx {
 	struct kref kref;
 	wait_queue_head_t wqh;
-	/*
-	 * Every time that a write(2) is performed on an eventfd, the
-	 * value of the __u64 being written is added to "count" and a
-	 * wakeup is performed on "wqh". A read(2) will return the "count"
-	 * value to userspace, and will reset "count" to zero. The kernel
-	 * side eventfd_signal() also, adds to the "count" counter and
-	 * issue a wakeup.
-	 */
-	__u64 count;
+	union {
+		/*
+		 * Every time that a write(2) is performed on an eventfd, the
+		 * value of the __u64 being written is added to "count" and a
+		 * wakeup is performed on "wqh". A read(2) will return the
+		 * "count" value to userspace, and will reset "count" to zero.
+		 * The kernel side eventfd_signal() also, adds to the "count"
+		 * counter and issue a wakeup.
+		 */
+		__u64 count;
+
+		/*
+		 * When using eventfd in EFD_MASK mode this structure stores the
+		 * current events to be signaled on the eventfd (events member)
+		 * along with opaque user-defined data (data member).
+		 */
+		struct efd_mask mask;
+	};
 	unsigned int flags;
 };
 
@@ -134,6 +148,14 @@ static unsigned int eventfd_poll(struct file *file, poll_table *wait)
 	return events;
 }
 
+static unsigned int eventfd_mask_poll(struct file *file, poll_table *wait)
+{
+	struct eventfd_ctx *ctx = file->private_data;
+
+	poll_wait(file, &ctx->wqh, wait);
+	return ctx->mask.events;
+}
+
 static void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
 {
 	*cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
@@ -239,6 +261,14 @@ static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count,
 	return put_user(cnt, (__u64 __user *) buf) ? -EFAULT : sizeof(cnt);
 }
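The limitation that motivates EFD_MASK can be observed with an ordinary eventfd, using only the existing API. The sketch below is an illustration, not part of the patch: plain_eventfd_revents() is a hypothetical helper that creates a plain (non-EFD_MASK) eventfd with a given initial counter and returns what poll(2) reports for it. It assumes a Linux system with eventfd support.

```c
#include <poll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a plain eventfd with the given initial
 * counter, poll it once without blocking and return the reported
 * revents, or -1 on error.
 */
int plain_eventfd_revents(unsigned int initval)
{
	struct pollfd pfd;
	int fd, n;

	fd = eventfd(initval, EFD_NONBLOCK);
	if (fd < 0)
		return -1;

	pfd.fd = fd;
	pfd.events = POLLIN | POLLOUT | POLLPRI;
	pfd.revents = 0;

	n = poll(&pfd, 1, 0);	/* timeout 0: report current state only */
	close(fd);

	return n < 0 ? -1 : pfd.revents;
}
```

With a counter of 0 the descriptor is reported writable but not readable, and with a non-zero counter it is both readable and writable. As far as I understand the counter semantics, the events are derived from the counter value alone, so POLLPRI or the "neither readable nor writable" state mentioned in the commit message cannot be produced this way.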
[PATCH v2 0/1] Generalize poll events from eventfd
Using eventfd, user space can generate POLLIN/POLLOUT events, but some applications may want to generate POLLPRI/POLLERR events as well. This patch submission aims to generalize the events generated by an eventfd.

This is a resubmission of a patch from Feb 2013 [1]. The original discussion trailed off without any conclusion, but the original author has recently confirmed [2] that this functionality is still useful, so I volunteered to rebase and resubmit the patch for discussion.

[1] https://lkml.org/lkml/2013/2/18/147
[2] https://lkml.org/lkml/2015/7/9/153

Changes in v2
-------------
 * rebased on Linux v4.3-rc1
 * Move file operation implementations for EFD_MASK to a separate structure
 * Remove 'data' element from efd_mask structure
 * read() is no longer supported when EFD_MASK is set (fails with EINVAL)
 * eventfd_ctx_fileget() now returns EINVAL when EFD_MASK is set, eliminating the possibility of triggering the original BUG_ON() macros, which have now been removed.

Thank you,
Damian

Martin Sustrik (1):
  eventfd: implementation of EFD_MASK flag

 fs/eventfd.c                 | 91 ++--
 include/linux/eventfd.h      | 16 +---
 include/uapi/linux/eventfd.h | 40 +++
 3 files changed, 121 insertions(+), 26 deletions(-)
 create mode 100644 include/uapi/linux/eventfd.h

-- 
1.9.1
Re: kernel 4.2 : "bridge vlan" command return empty result (works with kernel 4.1.3)
>>Do you have a bond in your system ?.

Yes, indeed. Removing the bond fixes the problem.

I'll try your patch today.

Thanks!

Alexandre

----- Original Message -----
From: "roopa"
To: "aderumier"
Cc: "netdev", "Scott Feldman"
Sent: Tuesday, 15 September 2015 21:02:34
Subject: Re: kernel 4.2 : "bridge vlan" command return empty result (works with kernel 4.1.3)

On 9/15/15, 10:39 AM, Alexandre DERUMIER wrote:
> Hi,
>
> since kernel 4.2, "bridge vlan" command return empty result.
>
>
> kernel 4.1.3
> 
> # bridge vlan
> port	vlan ids
> eth0	 1 PVID Egress Untagged
> 	 90
> 	 91
> 	 92
> 	 93
> 	 94
> 	 95
> 	 96
> 	 97
> 	 98
> 	 99
> 	 100
>
> vmbr0	 1 PVID Egress Untagged
> 	 94
>
>
>
> kernel 4.2
> 
> # bridge vlan
> port	vlan ids
>
>
>
> Note that vlans are correctly working,it seem that is just the display.
>
> tcpdump -e -i vmbr0
>
> 19:38:08.005055 00:08:7c:bd:ae:40 (oui Unknown) > 00:18:8b:7c:c8:37 (oui
> Unknown), ethertype 802.1Q (0x8100), length 64: vlan 94, p 0, ethertype IPv4,
> 172.20.0.17.52299 > kvmtest2.odiso.net.ssh: Flags [.], ack 339613, win 5523,
> length 0
> 19:38:08.007730 00:08:7c:bd:ae:40 (oui Unknown) > 00:18:8b:7c:c8:37 (oui
> Unknown), ethertype 802.1Q (0x8100), length 64: vlan 94, p 0, ethertype IPv4,
> 172.20.0.17.52299 > kvmtest2.odiso.net.ssh: Flags [.], ack 342145, win 5568,
> length 0
> 19:38:08.010977 00:08:7c:bd:ae:40 (oui Unknown) > 00:18:8b:7c:c8:37 (oui
> Unknown), ethertype 802.1Q (0x8100), length 64: vlan 94, p 0, ethertype IPv4,
> 172.20.0.17.52299 > kvmtest2.odiso.net.ssh: Flags [.], ack 344677, win 5614,
> length 0
> 19:3

I was able to reproduce this when there is a bond in the system.
Looks like this was due to 85fdb956726ff2a ("switchdev: cut over to new switchdev_port_bridge_getlink"). When CONFIG_SWITCHDEV is off, nodes that use switchdev api for ndo_bridge_getlink (example, bonds, teams, rocker) can return -EOPNOTSUPP.

The problem went away on my box with the following patch. I will submit an official patch in a bit.

Do you have a bond in your system ?.
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 01ced4a..bdb3842 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3013,6 +3013,7 @@ static int rtnl_bridge_getlink(struct sk_buff *skb, struct
 	u32 portid = NETLINK_CB(cb->skb).portid;
 	u32 seq = cb->nlh->nlmsg_seq;
 	u32 filter_mask = 0;
+	int err;
 
 	if (nlmsg_len(cb->nlh) > sizeof(struct ifinfomsg)) {
 		struct nlattr *extfilt;
@@ -3033,20 +3034,25 @@ static int rtnl_bridge_getlink(struct sk_buff *skb, stru
 		struct net_device *br_dev = netdev_master_upper_dev_get(dev);
 
 		if (br_dev && br_dev->netdev_ops->ndo_bridge_getlink) {
-			if (idx >= cb->args[0] &&
-			    br_dev->netdev_ops->ndo_bridge_getlink(
-				    skb, portid, seq, dev, filter_mask,
-				    NLM_F_MULTI) < 0)
-				break;
+			if (idx >= cb->args[0]) {
+				err = br_dev->netdev_ops->ndo_bridge_getlink(
+						skb, portid, seq, dev,
+						filter_mask, NLM_F_MULTI);
+				if ( err < 0 && err != -EOPNOTSUPP)
+					break;
+			}
 			idx++;
 		}
 
 		if (ops->ndo_bridge_getlink) {
-			if (idx >= cb->args[0] &&
-			    ops->ndo_bridge_getlink(skb, portid, seq, dev,
-						    filter_mask,
-						    NLM_F_MULTI) < 0)
-				break;
+			if (idx >= cb->args[0]) {
+				err = ops->ndo_bridge_getlink(skb, portid,
+							      seq, dev,
+							      filter_mask,
+							      NLM_F_MULTI);
+				if ( err < 0 && err != -EOPNOTSUPP)
+					break;
+			}
 			idx++;
 		}
 	}
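The effect of the change can be reproduced with a small user-space model of the dump loop. This is a sketch under assumptions, not the rtnetlink code: getlink_fn, dump_bridge_links() and the two callbacks are hypothetical names, and each array entry stands for one device's ndo_bridge_getlink callback.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a device's ndo_bridge_getlink callback. */
typedef int (*getlink_fn)(int dev_index);

int getlink_ok(int dev_index)
{
	(void)dev_index;
	return 0;
}

/* Models a bond whose switchdev-backed callback is not supported. */
int getlink_unsupported(int dev_index)
{
	(void)dev_index;
	return -EOPNOTSUPP;
}

/*
 * Walk the devices and return how many were processed before the
 * loop stopped. With tolerate_eopnotsupp == 0 any negative return
 * ends the dump (old behaviour); with 1, -EOPNOTSUPP is skipped and
 * only other errors end it (behaviour after the patch).
 */
size_t dump_bridge_links(const getlink_fn *ops, size_t n, int tolerate_eopnotsupp)
{
	size_t idx;

	for (idx = 0; idx < n; idx++) {
		int err = ops[idx] ? ops[idx]((int)idx) : 0;

		if (err < 0 && !(tolerate_eopnotsupp && err == -EOPNOTSUPP))
			break;
	}

	return idx;
}
```

With a device list where the first callback returns -EOPNOTSUPP, the old rule stops before anything is dumped, which matches the empty "bridge vlan" output in the report, while the patched rule walks the whole list.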
Re: [PATCH v2 net-next 2/2] bpf: add bpf_redirect() helper
On 15-09-15 11:05 PM, Alexei Starovoitov wrote:
> Existing bpf_clone_redirect() helper clones skb before redirecting
> it to RX or TX of destination netdev.
> Introduce bpf_redirect() helper that does that without cloning.
>
> Benchmarked with two hosts using 10G ixgbe NICs.
> One host is doing line rate pktgen.
> Another host is configured as:
> $ tc qdisc add dev $dev ingress
> $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
>    action bpf run object-file tcbpf1_kern.o section clone_redirect_xmit drop
> so it receives the packet on $dev and immediately xmits it on $dev + 1
> The section 'clone_redirect_xmit' in tcbpf1_kern.o file has the program
> that does bpf_clone_redirect() and performance is 2.0 Mpps
>
> $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
>    action bpf run object-file tcbpf1_kern.o section redirect_xmit drop
> which is using bpf_redirect() - 2.4 Mpps
>
> and using cls_bpf with integrated actions as:
> $ tc filter add dev $dev root pref 10 \
>   bpf run object-file tcbpf1_kern.o section redirect_xmit integ_act classid 1
> performance is 2.5 Mpps
>
> To summarize:
> u32+act_bpf using clone_redirect - 2.0 Mpps
> u32+act_bpf using redirect - 2.4 Mpps
> cls_bpf using redirect - 2.5 Mpps
>
> For comparison linux bridge in this setup is doing 2.1 Mpps
> and ixgbe rx + drop in ip_rcv - 7.8 Mpps
>
> Signed-off-by: Alexei Starovoitov
> Acked-by: Daniel Borkmann
> ---
> This approach is using per_cpu scratch area to store ifindex and flags.
> The other alternatives discussed at plumbers are slower and more intrusive.
> v1->v2: dropped redundant iff_up check
>
>  include/net/sch_generic.h    |  1 +
>  include/uapi/linux/bpf.h     |  8
>  include/uapi/linux/pkt_cls.h |  1 +
>  net/core/dev.c               |  8
>  net/core/filter.c            | 44 ++
>  net/sched/act_bpf.c          |  1 +
>  net/sched/cls_bpf.c          |  1 +
>  samples/bpf/bpf_helpers.h    |  4
>  samples/bpf/tcbpf1_kern.c    | 24 ++-
>  9 files changed, 91 insertions(+), 1 deletion(-)
>

Acked-by: John Fastabend
Re: bnx2x - occasional high packet loss (on LAN)
On Wed, Sep 16, 2015 at 08:15:41AM +, Ariel Elior wrote:
> Hi Nikola,
> Please provide dmesg output from your system.
> Thanks,
> Ariel

Hello Ariel,

here it is:

http://nik.lbox.cz/download/dmesg.txt

BR

nik

-- 
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.: +420 591 166 214
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
[PATCH v4] add stealth mode
Add option to disable any reply not related to a listening socket, like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
Also disables ICMP replies to echo request and timestamp.
The stealth mode can be enabled selectively for a single interface.

Signed-off-by: Matteo Croce
---
rebased on 4.3-rc1

 Documentation/networking/ip-sysctl.txt | 14 ++
 include/linux/inetdevice.h             |  1 +
 include/linux/ipv6.h                   |  1 +
 include/uapi/linux/ip.h                |  1 +
 net/ipv4/devinet.c                     |  1 +
 net/ipv4/icmp.c                        |  6 ++
 net/ipv4/ip_input.c                    |  5 +++--
 net/ipv4/tcp_ipv4.c                    |  3 ++-
 net/ipv4/udp.c                         |  4 +++-
 net/ipv6/addrconf.c                    |  7 +++
 net/ipv6/icmp.c                        |  3 ++-
 net/ipv6/ip6_input.c                   |  5 +++--
 net/ipv6/tcp_ipv6.c                    |  2 +-
 net/ipv6/udp.c                         |  3 ++-
 14 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index ebe94f2..1d46adc 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1206,6 +1206,13 @@ igmp_link_local_mcast_reports - BOOLEAN
 	224.0.0.X range.
 	Default TRUE
 
+stealth - BOOLEAN
+	Disable any reply not related to a listening socket,
+	like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
+	Also disables ICMP replies to echo requests and timestamp
+	and ICMP errors for unknown protocols.
+	Default value is 0.
+
 Alexey Kuznetsov.
 kuz...@ms2.inr.ac.ru
 
@@ -1635,6 +1642,13 @@ stable_secret - IPv6 address
 
 	By default the stable secret is unset.
 
+stealth - BOOLEAN
+	Disable any reply not related to a listening socket,
+	like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
+	Also disables ICMPv6 replies to echo requests
+	and ICMP errors for unknown protocols.
+	Default value is 0.
+
 icmp/*:
 ratelimit - INTEGER
 	Limit the maximal rates for sending ICMPv6 packets.
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index a4328ce..a64c01e 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -128,6 +128,7 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev)
 #define IN_DEV_ARP_ANNOUNCE(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
 #define IN_DEV_ARP_IGNORE(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
 #define IN_DEV_ARP_NOTIFY(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
+#define IN_DEV_STEALTH(in_dev)		IN_DEV_MAXCONF((in_dev), STEALTH)
 
 struct in_ifaddr {
 	struct hlist_node	hash;
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index f1f32af..a9d0172 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -55,6 +55,7 @@ struct ipv6_devconf {
 	__s32		ndisc_notify;
 	__s32		suppress_frag_ndisc;
 	__s32		accept_ra_mtu;
+	__s32		stealth;
 	struct ipv6_stable_secret {
 		bool initialized;
 		struct in6_addr secret;
diff --git a/include/uapi/linux/ip.h b/include/uapi/linux/ip.h
index 08f894d..4acbf99 100644
--- a/include/uapi/linux/ip.h
+++ b/include/uapi/linux/ip.h
@@ -165,6 +165,7 @@ enum
 	IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL,
 	IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL,
 	IPV4_DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
+	IPV4_DEVCONF_STEALTH,
 	__IPV4_DEVCONF_MAX
 };
 
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 2d9cb17..6d9c080 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2190,6 +2190,7 @@ static struct devinet_sysctl_table {
 					      "promote_secondaries"),
 		DEVINET_SYSCTL_FLUSHING_ENTRY(ROUTE_LOCALNET,
 					      "route_localnet"),
+		DEVINET_SYSCTL_RW_ENTRY(STEALTH, "stealth"),
 	},
 };
 
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 79fe05b..4cd35b2 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -889,6 +889,9 @@ static bool icmp_echo(struct sk_buff *skb)
 {
 	struct net *net;
 
+	if (IN_DEV_STEALTH(skb->dev->ip_ptr))
+		return true;
+
 	net = dev_net(skb_dst(skb)->dev);
 	if (!net->ipv4.sysctl_icmp_echo_ignore_all) {
 		struct icmp_bxm icmp_param;
@@ -922,6 +925,9 @@ static bool icmp_timestamp(struct sk_buff *skb)
 	if (skb->len < 4)
 		goto out_err;
 
+	if (IN_DEV_STEALTH(skb->dev->ip_ptr))
+		return true;
+
 	/*
 	 *	Fill in the current time as ms since midnight UT:
 	 */
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index f4fc8a7..e75f250 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@
Re: [PATCH v2 1/1] eventfd: implementation of EFD_MASK flag
On 2015-09-16 08:27, Damian Hobson-Garcia wrote:
> From: Martin Sustrik
>
> When implementing network protocols in user space, one has to implement
> fake file descriptors to represent the sockets for the protocol. Polling
> on such fake file descriptors is a problem (poll/select/epoll accept only
> true file descriptors) and forces protocol implementers to use various
> workarounds resulting in complex, non-standard and convoluted APIs.
>
> More generally, ability to create full-blown file descriptors for
> userspace-to-userspace signalling is missing. While eventfd(2) goes half
> the way towards this goal it has following shortcomings:
>
> I. There's no way to signal POLLPRI, POLLHUP etc.
> II. There's no way to signal arbitrary combination of POLL* flags. Most
>     notably, simultaneous !POLLIN and !POLLOUT, which is a perfectly valid
>     combination for a network protocol (rx buffer is empty and tx buffer
>     is full), cannot be signaled using eventfd.
>
> This patch implements new EFD_MASK flag which solves the above problems.
>
> Additionally, to provide a way to associate user-space state with eventfd
> object, it allows to attach user-space data to the file descriptor.

The above paragraph is a leftover from the past. The functionality no longer exists.

> The semantics of EFD_MASK are as follows:
>
> eventfd(2):
>
> If eventfd is created with EFD_MASK flag set, it is initialised in such a
> way as to signal no events on the file descriptor when it is polled on.
> The 'initval' argument is ignored.
>
> write(2):
>
> User is allowed to write only buffers containing the following structure:
>
> struct efd_mask {
> 	uint32_t events;
> };

Is it worth having a struct here? Why not just uint32_t?

Martin

> The value of 'events' should be any combination of event flags as defined
> by poll(2) function (POLLIN, POLLOUT, POLLERR, POLLHUP etc.) Specified
> events will be signaled when polling (select, poll, epoll) on the eventfd
> is done later on.
>
> read(2):
>
> read is not supported and will fail with EINVAL.
Re: [PATCH 1/4] stmmac: replace all pr_xxx by their dev_xxx counterpart
On Wed, Sep 09, 2015 at 09:14:42AM -0700, Joe Perches wrote:
> On Wed, 2015-09-09 at 15:14 +0200, LABBE Corentin wrote:
> > The stmmac driver use lots of pr_xxx functions to print information.
> > This is bad since we cannot know which device logs the information.
> > (moreover if two stmmac device are present)
> []
> > So this patch replace all pr_xxx by their dev_xxx counterpart.
>
> Using
> 	netdev_(priv->dev, ...
> instead of
> 	dev_(priv->device,
>
> would be more consistent with other ethernet devices.
>
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> []
> > @@ -298,7 +298,7 @@ bool stmmac_eee_init(struct stmmac_priv *priv)
> >  	 */
> >  	spin_lock_irqsave(&priv->lock, flags);
> >  	if (priv->eee_active) {
> > -		pr_debug("stmmac: disable EEE\n");
> > +		dev_dbg(priv->device, "disable EEE\n");
>
> 	netdev_dbg(priv->dev, ...)
>
> > @@ -657,10 +657,10 @@ static int stmmac_init_ptp(struct stmmac_priv *priv)
> >  	priv->adv_ts = 1;
> >
> >  	if (netif_msg_hw(priv) && priv->dma_cap.time_stamp)
> > -		pr_debug("IEEE 1588-2002 Time Stamp supported\n");
> > +		dev_dbg(priv->device, "IEEE 1588-2002 Time Stamp supported\n");
>
> And these netif_msg_ could be
>
> 	if (priv->dma_cap.timestamp)
> 		netif_dbg(priv, hw, priv->dev, ...);
>

Hello

My main goal is to improve logging from:

[0.796804] stmmaceth 1c5.ethernet: no reset control found
[0.802635] Ring mode enabled
[0.805713] No HW DMA feature register supported
[0.810239] Normal descriptors
[0.813577] TX Checksum insertion supported
[ 23.615074] eth0: device MAC address aa:65:84:d5:a3:58
[ 23.704326] RX IPC Checksum Offload disabled
[ 23.704349] No MAC Management Counters available

to that:

[0.788147] sun7i-dwmac 1c5.ethernet (unnamed net_device) (uninitialized): no reset control found
[0.797400] sun7i-dwmac 1c5.ethernet (unnamed net_device) (uninitialized): Ring mode enabled
[0.806211] sun7i-dwmac 1c5.ethernet (unnamed net_device) (uninitialized): No HW DMA feature register supported
[0.816658] sun7i-dwmac 1c5.ethernet (unnamed net_device) (uninitialized): Normal descriptors
[0.825522] sun7i-dwmac 1c5.ethernet (unnamed net_device) (uninitialized): TX Checksum insertion supported
[ 12.971725] sun7i-dwmac 1c5.ethernet eth0: device MAC address 3e:62:18:6f:c7:f4
[ 13.056902] sun7i-dwmac 1c5.ethernet eth0: RX IPC Checksum Offload disabled
[ 13.056929] sun7i-dwmac 1c5.ethernet eth0: No MAC Management Counters available

But with the netdev_ functions the first five lines are not "pretty",
because of the "(unnamed net_device) (uninitialized)" prefix.

Could I switch those back to dev_xxx, since they are "early device
logging", and so make the output prettier?

Best regards
Re: [PATCH] net: smc91x: convert pxa dma to dmaengine
David Miller writes:

> From: Robert Jarzmik
> Date: Thu, 10 Sep 2015 21:26:04 +0200
>
>> Convert the dma transfers to be dmaengine based, now pxa has a dmaengine
>> slave driver. This makes this driver a bit more PXA agnostic.
>>
>> The driver was tested on pxa27x (mainstone) and pxa310 (zylonite),
>> ie. only pxa platforms.
>>
>> Signed-off-by: Robert Jarzmik
>> Cc: Russell King
>> Cc: Arnd Bergmann
>> ---
>> This has potential to break other platform such as Neponset, Idp,
>> halibut and qsd8x50, so I added Russell and Arnd as they were discussing
>> smc91x support last February.
>
> Is someone testing whether such platforms break or not? I'm waiting for
> that before I consider applying this patch.

My understanding is that Russell is the only one left testing them, or at
least he was the only one complaining about a breakage lately on neponset.

I can wait several weeks for Russell to have a bit of time to try: I know
it will compile correctly at least for neponset, and I know almost all the
code is under #ifdef CONFIG_ARCH_PXA. Still, I would feel far more
comfortable if it was tested, just as you would.

Cheers.

--
Robert
request for stable inclusion
Hi Dave,

Commit 9293267 "net/mlx4_core: Capping number of requested MSIXs to
MAX_MSIX" fixes a bug under which the driver doesn't really start on a
machine with > 32 cores.

The bug was introduced in 4.2-rc1 but the fix missed 4.2 -- could you
please push it to 4.2 -stable? If you prefer that we submit it directly
there, that is fine too.

thanks,

Or.
vhost: build failure
Hi,

While cross-compiling the kernel for openrisc with allmodconfig, the build
failed with the error:

drivers/vhost/vhost.c: In function 'vhost_vring_ioctl':
drivers/vhost/vhost.c:818:3: error: call to '__compiletime_assert_818' declared with attribute error: BUILD_BUG_ON failed: __alignof__ *vq->avail > VRING_AVAIL_ALIGN_SIZE

Can you please give me any idea about what the problem might be and how
it can be solved?

You can see the build log at:
https://travis-ci.org/sudipm-mukherjee/parport/jobs/80365425

regards
sudip
Re: NFS/TCP/IPv6 acting strangely in 4.2
Hi,

On Wed, Sep 16, 2015 at 06:53:57AM +0000, Damien Thébault wrote:
> On Fri, 2015-09-11 at 12:38 +0100, Russell King - ARM Linux wrote:
> > I have a recent Marvell Armada 388 board here which uses the mvneta
> > driver. I'm seeing some weird effects with NFS with it acting as a
> > client.
>
> Hello,
>
> I'm upgrading a Marvell Armada 370 board using the mvneta driver from
> 4.0 to 4.2 and noticed issues with NFS booting.
> Basically, most of the time init returns with an error code, or
> programs segfault or throw illegal instructions.
>
> Since it worked fine on 4.0 I bisected until I found commit
> a84e32894191cfcbffa54180d78d7d4654d56c20 "net: mvneta: fix refilling
> for Rx DMA buffers".
>
> If I revert this commit, everything seems to get back to normal.
> Could you try it ? The two issues look very similar.

I'm not sure, but I see that the accounting was changed by this patch,
and I am not certain of the implications. If the revert above works, it
would be nice to apply only the change below, just to see whether this is
indeed an accounting error or not:

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 62e48bc..4205867 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1463,6 +1463,7 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
 {
 	struct net_device *dev = pp->dev;
 	int rx_done;
+	int missed = 0;
 	u32 rcvd_pkts = 0;
 	u32 rcvd_bytes = 0;
 
@@ -1527,6 +1528,7 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
 		if (err) {
 			netdev_err(dev, "Linux processing - Can't refill\n");
 			rxq->missed++;
+			missed++;
 			goto err_drop_frame;
 		}
 
@@ -1561,7 +1563,7 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
 	}
 
 	/* Update rxq management counters */
-	mvneta_rxq_desc_num_update(pp, rxq, rx_done, rx_done);
+	mvneta_rxq_desc_num_update(pp, rxq, rx_done, rx_done - missed);
 
 	return rx_done;
 }

Regards,
Willy
Re: vhost: build failure
On Wed, Sep 16, 2015 at 01:50:08PM +0530, Sudip Mukherjee wrote:
> Hi,
> While crosscompiling the kernel for openrisc with allmodconfig the build
> failed with the error:
> drivers/vhost/vhost.c: In function 'vhost_vring_ioctl':
> drivers/vhost/vhost.c:818:3: error: call to '__compiletime_assert_818'
> declared with attribute error: BUILD_BUG_ON failed: __alignof__
> *vq->avail > VRING_AVAIL_ALIGN_SIZE
>
> Can you please give me any idea about what the problem might be and how
> it can be solved.
>
> You can see the build log at:
> https://travis-ci.org/sudipm-mukherjee/parport/jobs/80365425
>
> regards
> sudip

Yes - I think I saw this already.
I think the openrisc cross-compiler is broken.

VRING_AVAIL_ALIGN_SIZE is 2

*vq->avail is:

struct vring_avail {
	__virtio16 flags;
	__virtio16 idx;
	__virtio16 ring[];
};

And __virtio16 is just a u16 with some sparse annotations.

Looking at the openrisc architecture document:

Operand             Length    addr[3:0] if aligned
Halfword (or half)  2 bytes   Xxx0

Type   C-TYPE        Sizeof  Alignment  Openrisc Equivalent
Short  Signed short  2       2          Signed halfword

and

16.1.2 Aggregates and Unions
Aggregates (structures and arrays) and unions assume the alignment of
their most strictly aligned element.

So to me, it looks like your gcc violates the ABI by adding alignment
requirements > 2.

--
MST
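The alignment argument above can be checked outside the kernel with a small user-space sketch. This is only an illustration under assumptions: `virtio16` is a stand-in typedef for the kernel's `__virtio16`, `vring_avail_alignment()` is a hypothetical helper, and C11 `_Static_assert` plays the role that `BUILD_BUG_ON` plays in vhost.c. On an ABI where a struct takes the alignment of its most strictly aligned member, the assertion holds; a compiler that pads the struct alignment above 2 would be expected to fail it.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's __virtio16 (a u16 with sparse annotations). */
typedef uint16_t virtio16;

#define VRING_AVAIL_ALIGN_SIZE 2

struct vring_avail {
	virtio16 flags;
	virtio16 idx;
	virtio16 ring[];	/* flexible array member, as in the mail */
};

/* Compile-time check comparable in spirit to the BUILD_BUG_ON in vhost.c. */
_Static_assert(_Alignof(struct vring_avail) <= VRING_AVAIL_ALIGN_SIZE,
	       "struct vring_avail is aligned more strictly than 2 bytes");

/* Hypothetical helper: report the alignment the compiler actually chose. */
size_t vring_avail_alignment(void)
{
	return _Alignof(struct vring_avail);
}
```

Building this file with the suspect cross-compiler would be one way to reproduce the problem without a full kernel build.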
[linux-next] oops in ip_route_input_noref
Hi,

4.3.0-rc1-next-20150916

oops after removal of rndis usb device

...
8146c052: 00
8146c053: 0f b6 55 8a             movzbl -0x76(%rbp),%edx
8146c057: 49 8b bf e8 01 00 00    mov    0x1e8(%r15),%rdi
8146c05e: 45 89 d1                mov    %r10d,%r9d
8146c061: 44 89 f6                mov    %r14d,%esi
8146c064: 44 88 95 70 ff ff ff    mov    %r10b,-0x90(%rbp)
8146c06b: 0f 95 c1                setne  %cl
8146c06e: 81 ce 00 00 00 80       or     $0x8000,%esi
8146c074: 41 83 e1 01             and    $0x1,%r9d
8146c078: 45 31 c0                xor    %r8d,%r8d
8146c07b: e8 49 d5 ff ff          callq  814695c9
8146c080: 48 85 c0                test   %rax,%rax
8146c083: 49 89 c5                mov    %rax,%r13
8146c086: 75 0a                   jne    8146c092 <ip_route_input_noref+0xa75>
8146c088: bb 97 ff ff ff          mov    $0xff97,%ebx
8146c08d: e9 06 f8 ff ff          jmpq   8146b898 <ip_route_input_noref+0x27b>
8146c092: 48 c7 40 58 a3 95 46    movq   $0x814695a3,0x58(%rax)
8146c099: 81
8146c09a: c6 80 a2 00 00 00 01    movb   $0x1,0xa2(%rax)
8146c0a1: 48 8b 45 98             mov    -0x68(%rbp),%rax
8146c0a5: 44 8a 95 70 ff ff ff    mov    -0x90(%rbp),%r10b
8146c0ac: 48 85 c0                test   %rax,%rax
8146c0af: 74 0a                   je     8146c0bb <ip_route_input_noref+0xa9e>
8146c0b1: 8b 40 10                mov    0x10(%rax),%eax
          ^^^
8146c0b4: 41 89 85 b0 00 00 00    mov    %eax,0xb0(%r13)
8146c0bb: 65 ff 05 9e 54 ba 7e    incl   %gs:0x7eba549e(%rip)   # 11560
8146c0c2: 80 7d 8a 07             cmpb   $0x7,-0x76(%rbp)
8146c0c6: 75 1a                   jne    8146c0e2 <ip_route_input_noref+0xac5>
8146c0c8: 41 81 a5 9c 00 00 00    andl   $0x7fff,0x9c(%r13)
8146c0cf: ff ff ff 7f
8146c0d3: f7 db                   neg    %ebx
8146c0d5: 49 c7 45 50 b1 96 46    movq   $0x814696b1,0x50(%r13)
8146c0dc: 81
8146c0dd: 66 41 89 5d 64          mov    %bx,0x64(%r13)
8146c0e2: 45 84 d2                test   %r10b,%r10b
8146c0e5: 74 29                   je     8146c110 <ip_route_input_noref+0xaf3>
8146c0e7: 0f b6 7d 89             movzbl -0x77(%rbp),%edi
8146c0eb: 4c 89 ee                mov    %r13,%rsi
8146c0ee: 48 ff c7                inc    %rdi
8146c0f1: 48 6b ff 60             imul   $0x60,%rdi,%rdi
8146c0f5: 48 03 7d 90             add    -0x70(%rbp),%rdi
8146c0f9: e8 10 d1 ff ff          callq  8146920e
8146c0fe: 84 c0                   test   %al,%al
8146c100: 75 0e                   jne    8146c110 <ip_route_input_noref+0xaf3>
8146c102: 66 41 83 4d 60 10       orw    $0x10,0x60(%r13)
8146c108: 4c 89 ef                mov    %r13,%rdi
8146c10b: e8 7d cc ff ff          callq  81468d8d
8146c110: 4d 89 6c 24 58          mov    %r13,0x58(%r12)
8146c115: 31 db                   xor    %ebx,%ebx
8146c117: e9 7c f7 ff ff          jmpq   8146b898 <ip_route_input_noref+0x27b>
8146c11c: bb 8f ff ff ff          mov    $0xff8f,%ebx
8146c121: c6 45 8a 07             movb   $0x7,-0x76(%rbp)
8146c125: 48 c7 45 90 00 00 00    movq   $0x0,-0x70(%rbp)
...

addr2line -e vmlinux -i 0x8146c0b1
net/ipv4/route.c:1815
net/ipv4/route.c:1905

which seems to be this line in
ip_route_input_noref()->ip_route_input_slow():

...
1813         rth->rt_is_input = 1;
1814         if (res.table)
1815                 rth->rt_table_id = res.table->tb_id;
1816
...

added by b7503e0cdb5dbec5d201aa69dc14679b5ae8

    net: Add FIB table id to rtable

    Add the FIB table id to rtable to make the information available for
    IPv4 as it is for IPv6.

	-ss
Re: NFS/TCP/IPv6 acting strangely in 4.2
Hi Damien,

On Wed., Sept. 16 2015, Damien Thébault wrote:

> On Fri, 2015-09-11 at 12:38 +0100, Russell King - ARM Linux wrote:
>> I have a recent Marvell Armada 388 board here which uses the mvneta
>> driver. I'm seeing some weird effects with NFS with it acting as a
>> client.
>
> Hello,
>
> I'm upgrading a Marvell Armada 370 board using the mvneta driver from
> 4.0 to 4.2 and noticed issues with NFS booting.
> Basically, most of the time init returns with an error code, or
> programs segfault or throw illegal instructions.
>
> Since it worked fine on 4.0 I bisected until I found commit
> a84e32894191cfcbffa54180d78d7d4654d56c20 "net: mvneta: fix refilling
> for Rx DMA buffers".
>
> If I revert this commit, everything seems to get back to normal.
> Could you try it ? The two issues look very similar.

Actually there was a bug in this commit, but a fix was submitted and
accepted yesterday. You can find it here:
https://patchwork.ozlabs.org/patch/518111/

Thanks,

Gregory

>
> Regards

--
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com
Re: [PATCH v2 1/1] eventfd: implementation of EFD_MASK flag
Hi Martin,

On 2015-09-16 3:51 PM, Martin Sustrik wrote:
> On 2015-09-16 08:27, Damian Hobson-Garcia wrote:
>>
>> Additionally, to provide a way to associate user-space state with eventfd
>> object, it allows to attach user-space data to the file descriptor.
>
> The above paragraph is a leftover from the past. The functionality no
> longer exist.
>

Oops, I forgot to delete that part. I'll get rid of it.

>>
>> The semantics of EFD_MASK are as follows:
>>
>> eventfd(2):
>>
>> If eventfd is created with EFD_MASK flag set, it is initialised in such a
>> way as to signal no events on the file descriptor when it is polled on.
>> The 'initval' argument is ignored.
>>
>> write(2):
>>
>> User is allowed to write only buffers containing the following structure:
>>
>> struct efd_mask {
>>   uint32_t events;
>> };
>
> Is it worth having a struct here? Why not just uint32_t?

As it stands right now, no, the struct doesn't really add anything.
uint32_t should be just fine.

>
> Martin

Damian
Re: [PATCH] net: Fix vti use case with oif in dst lookups
On Tue, Sep 15, 2015 at 03:10:50PM -0700, David Ahern wrote:
> Steffen reported that the recent change to add oif to dst lookups breaks
> the VTI use case. The problem is that with the oif set in the flow struct
> the comparison to the nh_oif is triggered. Fix by splitting the
> FLOWI_FLAG_VRFSRC into 2 flags -- one that triggers the vrf device cache
> bypass (FLOWI_FLAG_VRFSRC) and another telling the lookup to not compare
> nh oif (FLOWI_FLAG_SKIP_NH_OIF).
>
> Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups")
>
> Signed-off-by: David Ahern

This works, thanks a lot for the quick fix!

> ---
> IPv6 does not show this problem for me. So no change is added for IPv6.
> If your mileage varies let me know and I'll take another look.

IPv6 works just fine as it is, so no change needed.

Acked-by: Steffen Klassert
Re: [PATCH v2 1/3] net: irda: pxaficp_ir: use sched_clock() for time management
David Miller writes:

> From: Robert Jarzmik
> Date: Sat, 12 Sep 2015 13:45:22 +0200
>
>> Instead of using directly the OS timer through direct register access,
>> use the standard sched_clock(), which will end up in OSCR reading
>> anyway.
>>
>> This is a first step for direct access register removal and machine
>> specific code removal from this driver.
>>
>> Signed-off-by: Robert Jarzmik
>
> What is the granularity of the OSCR register?

It's 307ns (ie. 3.25MHz clock).

> If it is not nanoseconds, then you need to adjust calculations
> such as this one:

Tell me if the 307ns requires something I should adjust. My understanding
is that the flow will be:

  sched_clock()
    rd->read_sched_clock()  (cyc_to_ns() transformed for return)
      pxa_read_sched_clock()
        readl_relaxed(OSCR)

I didn't see any timing issue, as the flow looks equivalent to the
readl(OSCR), but I might have overlooked something.

Cheers.

--
Robert
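The 307ns figure follows from the 3.25MHz clock: one tick lasts 10^9 / 3250000 ns, about 307.7ns, which truncates to 307. A minimal sketch of that conversion is below. It is not the kernel's cyc_to_ns() (which, as far as I know, works with a precomputed multiplier and shift rather than a division); `OSCR_HZ` and `oscr_ticks_to_ns()` are hypothetical names used only for this illustration.

```c
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
#define OSCR_HZ		3250000ULL	/* 3.25MHz, the rate quoted in the mail */

/*
 * Hypothetical helper: convert a number of OSCR ticks to nanoseconds.
 * The multiplication is done first to keep precision; with a 32-bit tick
 * count the 64-bit product cannot overflow.
 */
uint64_t oscr_ticks_to_ns(uint32_t ticks)
{
	return (uint64_t)ticks * NSEC_PER_SEC / OSCR_HZ;
}
```

With this, a driver that used to compare raw OSCR deltas against tick counts would compare sched_clock() deltas against nanosecond values instead, for example 13 ticks becoming 4000ns.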
Re: vhost: build failure
On Wed, Sep 16, 2015 at 11:36:45AM +0300, Michael S. Tsirkin wrote:
> On Wed, Sep 16, 2015 at 01:50:08PM +0530, Sudip Mukherjee wrote:
> > Hi,
> > While crosscompiling the kernel for openrisc with allmodconfig the build
> > failed with the error:
> > drivers/vhost/vhost.c: In function 'vhost_vring_ioctl':
> > drivers/vhost/vhost.c:818:3: error: call to '__compiletime_assert_818'
> > declared with attribute error: BUILD_BUG_ON failed: __alignof__
> > *vq->avail > VRING_AVAIL_ALIGN_SIZE
> >
> > Can you please give me any idea about what the problem might be and how
> > it can be solved.
> >
> > You can see the build log at:
> > https://travis-ci.org/sudipm-mukherjee/parport/jobs/80365425
> >
> > regards
> > sudip
>
> Yes - I think I saw this already.
> I think the openrisc cross-compiler is broken.

I thought so. Thanks for the quick reply. I will open a bug in gcc and
let's see what they say.

regards
sudip
Re: NFS/TCP/IPv6 acting strangely in 4.2
On Fri, 2015-09-11 at 12:38 +0100, Russell King - ARM Linux wrote:
> I have a recent Marvell Armada 388 board here which uses the mvneta
> driver. I'm seeing some weird effects with NFS with it acting as a
> client.

Hello,

I'm upgrading a Marvell Armada 370 board using the mvneta driver from
4.0 to 4.2 and noticed issues with NFS booting.
Basically, most of the time init returns with an error code, or
programs segfault or throw illegal instructions.

Since it worked fine on 4.0 I bisected until I found commit
a84e32894191cfcbffa54180d78d7d4654d56c20 "net: mvneta: fix refilling
for Rx DMA buffers".

If I revert this commit, everything seems to get back to normal.
Could you try it? The two issues look very similar.

Regards
--
Damien Thébault
Re: NFS/TCP/IPv6 acting strangely in 4.2
On Wed, 2015-09-16 at 09:15 +0200, Gregory CLEMENT wrote:
> > Since it worked fine on 4.0 I bisected until I found commit
> > a84e32894191cfcbffa54180d78d7d4654d56c20 "net: mvneta: fix refilling
> > for Rx DMA buffers".
> >
> > If I revert this commit, everything seems to get back to normal.
> > Could you try it ? The two issues look very similar.
> Actually there was a bug with this commit, but a fix had been submitted
> and accepted yesterday, you can find him here:
> https://patchwork.ozlabs.org/patch/518111/.

Hello, this indeed fixes the issue for me, thanks.

--
Damien Thébault
R&D Engineer
VITEC

T. +33 1 46 73 06 06
F. +33 9 59 85 99 92
E. damien.theba...@vitec.com

http://www.vitec.com
RE: bnx2x - occasional high packet loss (on LAN)
Hi Nikola,

Please provide dmesg output from your system.

Thanks,
Ariel

> -----Original Message-----
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On Behalf Of Nikola Ciprich
> Sent: Tuesday, September 15, 2015 7:17 AM
> To: netdev
> Cc: n...@linuxbox.cz
> Subject: bnx2x - occasional high packet loss (on LAN)
>
> Hello,
>
> I'm trying to track down a strange issue with one of our servers and
> would like to ask for recommendations.
>
> I've got a three node cluster (nodes A..C) interconnected with stacked
> broadcom ICX6610. eth0 of each box is connected to the first switch, eth1
> to the second one, bonding set as follows:
> "mode=802.3ad lacp_rate=fast xmit_hash_policy=layer2+3 miimon=100"
>
> It happened a few times that eth1 on box A suddenly started misbehaving
> and communication with other nodes (ie flood ping) started dropping up to
> 30% of packets. When this port was shut on both sides, the problem
> immediately vanished.
>
> We've tried replacing the card, the cable and using a different port on
> the switch, but the problem repeated again yesterday.
>
> Since it's "only" packet loss, and not link loss, bonding doesn't help me
> much.
>
> However, during the weekend the port also had a strange link issue:
>
> Sep 12 15:23:45 remrprv1a kernel: [676373.296786] bnx2x :03:00.1 eth1: NIC Link is Down
> Sep 12 15:23:46 remrprv1a kernel: [676373.356638] bond0: link status definitely down for interface eth1, disabling it
> Sep 12 15:23:46 remrprv1a kernel: [676374.299571] bnx2x :03:00.1 eth1: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
> Sep 12 15:23:47 remrprv1a kernel: [676374.364428] bond0: link status definitely up for interface eth1, 1 Mbps full duplex
> Sep 12 15:23:47 remrprv1a kernel: [676374.372902] bond0: first active interface up!
> Sep 12 15:24:24 remrprv1a kernel: [676411.402511] bnx2x :03:00.1 eth1: NIC Link is Down
> Sep 12 15:24:24 remrprv1a kernel: [676411.407422] bond0: link status definitely down for interface eth1, disabling it
> Sep 12 15:24:25 remrprv1a kernel: [676412.405311] bnx2x :03:00.1 eth1: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
> Sep 12 15:24:25 remrprv1a kernel: [676412.408123] bond0: link status definitely up for interface eth1, 0 Mbps full duplex
> Sep 12 15:24:51 remrprv1a kernel: [676438.477641] bnx2x :03:00.1 eth1: NIC Link is Down
> Sep 12 15:24:51 remrprv1a kernel: [676438.528513] bond0: link status definitely down for interface eth1, disabling it
> Sep 12 15:24:52 remrprv1a kernel: [676439.480472] bnx2x :03:00.1 eth1: NIC Link is Up, 1 Mbps full duplex, Flow control: ON - receive & transmit
> Sep 12 15:24:52 remrprv1a kernel: [676439.536282] bond0: link status definitely up for interface eth1, 1 Mbps full duplex
>
> 0mbps link speed is quite weird I guess.
>
> All three boxes are the same, running a centos6 based system with a 4.0.5
> x86_64 kernel.
>
> The only difference I noticed is that irqbalance was enabled on the
> problematic box and not on the others. So I disabled it and rebooted the
> box. The problem is, I can't really wait for the problem to reappear, so
> I'd like to ask: has anybody seen a similar problem? If so, was it fixed
> in some newer kernel release? I haven't found a mention in the
> changelogs, but still. Or does somebody have a hint on what else I should
> check?
>
> I'll try to reproduce this on a test system (enabling irqbalance and
> doing some network benchmarks), but I'd be most happy if I could prevent
> it on this production system.
>
> thanks a lot for any advice
>
> with best regards
>
> nikola ciprich
>
> PS: here's lspci -vv of the eths. Should I provide any further
> information, please let me know:
>
> http://nik.lbox.cz/download/lspci.txt
>
> --
> -
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
>
> tel.: +420 591 166 214
> fax: +420 596 621 273
> mobil: +420 777 093 799
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: ser...@linuxbox.cz
> -
Re: xfrm4_garbage_collect reaching limit
On Mon, Sep 14, 2015 at 11:14:59PM -0400, Dan Streetman wrote:
> On Fri, Sep 11, 2015 at 5:48 AM, Steffen Klassert wrote:
> >
> >> Possibly the
> >> default value of xfrm4_gc_thresh could be set proportional to
> >> num_online_cpus(), but that doesn't help when cpus are onlined after
> >> boot.
> >
> > This could be an option, we could change the xfrm4_gc_thresh value with
> > a cpu notifier callback if more cpus come up after boot.
>
> the issue there is, if the value is changed by the user, does a cpu
> hotplug reset it back to default...

What about the patch below? With this we are independent of the number
of cpus. It should cover most, if not all, use cases.

While we are at it, we could think about increasing the flowcache
percpu limit. This value was chosen back in 2003, so maybe we could
have more than 4k cache entries per cpu these days.

Subject: [PATCH RFC] xfrm: Let the flowcache handle its size by default.

The xfrm flowcache size is limited by the flowcache limit
(4096 * number of online cpus) and the xfrm garbage collector
threshold (2 * 32768), whatever is reached first. This means
that we can hit the garbage collector limit only on systems
with more than 16 cpus. On such systems we simply refuse new
allocations if we reach the limit, so new flows are dropped.
On systems with 16 or fewer cpus, we hit the flowcache limit.
In this case, we shrink the flow cache instead of refusing new
flows.

We increase the xfrm garbage collector threshold to INT_MAX
to get the same behaviour, independent of the number of cpus.

The xfrm garbage collector threshold can still be set below
the flowcache limit to reduce the memory usage of the flowcache.

Signed-off-by: Steffen Klassert
---
 Documentation/networking/ip-sysctl.txt | 6 --
 net/ipv4/xfrm4_policy.c                | 2 +-
 net/ipv6/xfrm6_policy.c                | 2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index ebe94f2..260f30b 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1199,7 +1199,8 @@ tag - INTEGER
 xfrm4_gc_thresh - INTEGER
 	The threshold at which we will start garbage collecting for IPv4
 	destination cache entries. At twice this value the system will
-	refuse new allocations.
+	refuse new allocations. The value must be set below the flowcache
+	limit (4096 * number of online cpus) to take effect.
 
 igmp_link_local_mcast_reports - BOOLEAN
 	Enable IGMP reports for link local multicast groups in the
@@ -1645,7 +1646,8 @@ ratelimit - INTEGER
 xfrm6_gc_thresh - INTEGER
 	The threshold at which we will start garbage collecting for IPv6
 	destination cache entries. At twice this value the system will
-	refuse new allocations.
+	refuse new allocations. The value must be set below the flowcache
+	limit (4096 * number of online cpus) to take effect.
 
 IPv6 Update by:
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 1e06c4f..3dffc73 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -248,7 +248,7 @@ static struct dst_ops xfrm4_dst_ops = {
 	.destroy =		xfrm4_dst_destroy,
 	.ifdown =		xfrm4_dst_ifdown,
 	.local_out =		__ip_local_out,
-	.gc_thresh =		32768,
+	.gc_thresh =		INT_MAX,
 };
 
 static struct xfrm_policy_afinfo xfrm4_policy_afinfo = {
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index f10b940..e9af39a 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -289,7 +289,7 @@ static struct dst_ops xfrm6_dst_ops = {
 	.destroy =		xfrm6_dst_destroy,
 	.ifdown =		xfrm6_dst_ifdown,
 	.local_out =		__ip6_local_out,
-	.gc_thresh =		32768,
+	.gc_thresh =		INT_MAX,
 };
 
 static struct xfrm_policy_afinfo xfrm6_policy_afinfo = {
--
1.9.1
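The "more than 16 cpus" crossover in the commit message is plain arithmetic on the two limits, and can be sketched as follows. This is only an illustration of the numbers quoted in the mail, not kernel code: the macro and function names are hypothetical, and the comparison simply asks which of the two limits is the smaller one.

```c
#include <stdbool.h>

#define FLOWCACHE_PERCPU_LIMIT	4096ULL		/* entries per online cpu */
#define XFRM_GC_THRESH_OLD	32768ULL	/* default before the patch */

/* Flowcache limit as described in the mail: 4096 * number of online cpus. */
unsigned long long flowcache_limit(unsigned int ncpus)
{
	return FLOWCACHE_PERCPU_LIMIT * ncpus;
}

/* New allocations are refused at twice the garbage collector threshold. */
unsigned long long gc_refuse_limit(unsigned long long gc_thresh)
{
	return 2 * gc_thresh;
}

/* True if the gc limit is reached before the flowcache limit. */
bool gc_limit_hit_first(unsigned int ncpus, unsigned long long gc_thresh)
{
	return gc_refuse_limit(gc_thresh) < flowcache_limit(ncpus);
}
```

With the old default, 2 * 32768 = 65536 = 4096 * 16, so the gc limit comes first only from 17 cpus upwards. With gc_thresh set to INT_MAX, the flowcache limit comes first for any realistic cpu count, which is the behaviour the patch aims for.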
[RESEND PATCH] net: ks8851: Export OF module alias information
Drivers need to export the OF id table, and this needs to be built into
the module, or udev won't have the necessary information to autoload the
driver module when the device is registered via OF.

Signed-off-by: Javier Martinez Canillas
---
 drivers/net/ethernet/micrel/ks8851.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c
index 66d4ab703f45..60f43ec22175 100644
--- a/drivers/net/ethernet/micrel/ks8851.c
+++ b/drivers/net/ethernet/micrel/ks8851.c
@@ -1601,6 +1601,7 @@ static const struct of_device_id ks8851_match_table[] = {
 	{ .compatible = "micrel,ks8851" },
 	{ }
 };
+MODULE_DEVICE_TABLE(of, ks8851_match_table);
 
 static struct spi_driver ks8851_driver = {
 	.driver = {
--
2.4.3
Re: [PATCH] solos-pci: Increase headroom on received packets
On Wed, 2015-09-16 at 11:25 +0100, David Woodhouse wrote:
> A comment in include/linux/skbuff.h says that:
>
>  * Various parts of the networking layer expect at least 32 bytes of
>  * headroom, you should not reduce this.
>
> This was demonstrated by a panic when handling fragmented IPv6 packets:
> http://marc.info/?l=linux-netdev=144236093519172=2
>
> It's not entirely clear if that comment is still valid - and if it is,
> perhaps netif_rx() ought to be enforcing it with a warning.
>
> But either way, it is rather stupid from a performance point of view
> for us to be receiving packets into a buffer which doesn't have enough
> room to prepend an Ethernet header - it means that *every* incoming
> packet is going to need to be reallocated. So let's fix that.
>
> Signed-off-by: David Woodhouse
> ---
> Tested in the DMA code path; I don't believe the DMA-capable devices
> can still be used in MMIO mode. Simon, Guy, would you be able to test
> the MMIO version?

You should use netdev_alloc_skb(): this helper is better for rx skbs, as
it allows for better packing of frames in the GRO or TCP stack.

Also, netdev_alloc_skb_ip_align() might handle the NET_IP_ALIGN stuff
for arches that care.
Re: [PATCH v4] add stealth mode
Matteo Croce wrote:
> Add option to disable any reply not related to a listening socket,
> like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
> Also disables ICMP replies to echo request and timestamp.
> The stealth mode can be enabled selectively for a single interface.

I think it would make more sense to extend the socket match in xtables
if it can't be used to achieve this already.

It seems like

*filter
:INPUT ACCEPT [0:0]
-A INPUT -p tcp -m socket --nowildcard -j ACCEPT
-A INPUT -p tcp -j DROP
COMMIT

already does what you want for tcp; udp should work too.

I'd much rather see xtables and/or nftables extended with whatever
feature(s) are needed to configure such a policy than see this pushed
into the core network stack.
Re: [linux-next] oops in ip_route_input_noref
On 2015-09-16 11:24, Sergey Senozhatsky wrote:
> Hi,
>
> 4.3.0-rc1-next-20150916
>
> oops after removal of rndis usb device
>
> ...
> 8146c052: 00
> 8146c053: 0f b6 55 8a             movzbl -0x76(%rbp),%edx
> 8146c057: 49 8b bf e8 01 00 00    mov    0x1e8(%r15),%rdi
> 8146c05e: 45 89 d1                mov    %r10d,%r9d
> 8146c061: 44 89 f6                mov    %r14d,%esi
> 8146c064: 44 88 95 70 ff ff ff    mov    %r10b,-0x90(%rbp)
> 8146c06b: 0f 95 c1                setne  %cl
> 8146c06e: 81 ce 00 00 00 80       or     $0x8000,%esi
> 8146c074: 41 83 e1 01             and    $0x1,%r9d
> 8146c078: 45 31 c0                xor    %r8d,%r8d
> 8146c07b: e8 49 d5 ff ff          callq  814695c9
> 8146c080: 48 85 c0                test   %rax,%rax
> 8146c083: 49 89 c5                mov    %rax,%r13
> 8146c086: 75 0a                   jne    8146c092 <ip_route_input_noref+0xa75>
> 8146c088: bb 97 ff ff ff          mov    $0xff97,%ebx
> 8146c08d: e9 06 f8 ff ff          jmpq   8146b898 <ip_route_input_noref+0x27b>
> 8146c092: 48 c7 40 58 a3 95 46    movq   $0x814695a3,0x58(%rax)
> 8146c099: 81
> 8146c09a: c6 80 a2 00 00 00 01    movb   $0x1,0xa2(%rax)
> 8146c0a1: 48 8b 45 98             mov    -0x68(%rbp),%rax
> 8146c0a5: 44 8a 95 70 ff ff ff    mov    -0x90(%rbp),%r10b
> 8146c0ac: 48 85 c0                test   %rax,%rax
> 8146c0af: 74 0a                   je     8146c0bb <ip_route_input_noref+0xa9e>
> 8146c0b1: 8b 40 10                mov    0x10(%rax),%eax
>           ^^^
> 8146c0b4: 41 89 85 b0 00 00 00    mov    %eax,0xb0(%r13)
> 8146c0bb: 65 ff 05 9e 54 ba 7e    incl   %gs:0x7eba549e(%rip)   # 11560
> 8146c0c2: 80 7d 8a 07             cmpb   $0x7,-0x76(%rbp)
> 8146c0c6: 75 1a                   jne    8146c0e2 <ip_route_input_noref+0xac5>
> 8146c0c8: 41 81 a5 9c 00 00 00    andl   $0x7fff,0x9c(%r13)
> 8146c0cf: ff ff ff 7f
> 8146c0d3: f7 db                   neg    %ebx
> 8146c0d5: 49 c7 45 50 b1 96 46    movq   $0x814696b1,0x50(%r13)
> 8146c0dc: 81
> 8146c0dd: 66 41 89 5d 64          mov    %bx,0x64(%r13)
> 8146c0e2: 45 84 d2                test   %r10b,%r10b
> 8146c0e5: 74 29                   je     8146c110 <ip_route_input_noref+0xaf3>
> 8146c0e7: 0f b6 7d 89             movzbl -0x77(%rbp),%edi
> 8146c0eb: 4c 89 ee                mov    %r13,%rsi
> 8146c0ee: 48 ff c7                inc    %rdi
> 8146c0f1: 48 6b ff 60             imul   $0x60,%rdi,%rdi
> 8146c0f5: 48 03 7d 90             add    -0x70(%rbp),%rdi
> 8146c0f9: e8 10 d1 ff ff          callq  8146920e
> 8146c0fe: 84 c0                   test   %al,%al
> 8146c100: 75 0e                   jne    8146c110 <ip_route_input_noref+0xaf3>
> 8146c102: 66 41 83 4d 60 10       orw    $0x10,0x60(%r13)
> 8146c108: 4c 89 ef                mov    %r13,%rdi
> 8146c10b: e8 7d cc ff ff          callq  81468d8d
> 8146c110: 4d 89 6c 24 58          mov    %r13,0x58(%r12)
> 8146c115: 31 db                   xor    %ebx,%ebx
> 8146c117: e9 7c f7 ff ff          jmpq   8146b898 <ip_route_input_noref+0x27b>
> 8146c11c: bb 8f ff ff ff          mov    $0xff8f,%ebx
> 8146c121: c6 45 8a 07             movb   $0x7,-0x76(%rbp)
> 8146c125: 48 c7 45 90 00 00 00    movq   $0x0,-0x70(%rbp)
> ...
>
> addr2line -e vmlinux -i 0x8146c0b1
>  net/ipv4/route.c:1815
>  net/ipv4/route.c:1905
>
> which seems to be this line ip_route_input_noref()->ip_route_input_slow():
> ...
> 1813         rth->rt_is_input = 1;
> 1814         if (res.table)
> 1815                 rth->rt_table_id = res.table->tb_id;
> 1816
> ...
>
> added by b7503e0cdb5dbec5d201aa69dc14679b5ae8
>
>     net: Add FIB table id to rtable
>
>     Add the FIB table id to rtable to make the information available for
>     IPv4 as it is for IPv6.
>
>         -ss
Re: [ANNOUNCE] libnftnl 1.0.4 release
On Wednesday 2015-09-16 13:50, Pablo Neira Ayuso wrote:
>The Netfilter project proudly presents:
>
>libnftnl 1.0.4

$ git diff libnftnl-1.0.3..libnftnl-1.0.4 src/libnftnl.map
diff --git a/src/libnftnl.map b/src/libnftnl.map
index be7b998..14ec88c 100644
--- a/src/libnftnl.map
+++ b/src/libnftnl.map
@@ -124,10 +123,12 @@ global:
   nft_set_attr_is_set;
   nft_set_attr_set;
   nft_set_attr_set_u32;
+  nft_set_attr_set_u64;
   nft_set_attr_set_str;
   nft_set_attr_get;
   nft_set_attr_get_str;
   nft_set_attr_get_u32;
+  nft_set_attr_get_u64;
   nft_set_nlmsg_build_payload;
   nft_set_nlmsg_parse;
   nft_set_parse;

You broke the ABI.

A program that uses nft_set_attr_set_u64 and is built against
libnftnl-1.0.4 is marked to be compatible with the "LIBNFTNL_1.0"
symbol group, but this is incorrect, since the nft_set_attr_set_u64
symbol did not previously exist.

Existing symbol groups in .map must not be extended. Always start a
new group.
Re: [PATCH 16/31] net/cavium/liquidio: use kmemdup rather than duplicating its implementation
Ping.

Regards
Andrzej

On 08/07/2015 09:59 AM, Andrzej Hajda wrote:
> The patch was generated using fixed coccinelle semantic patch
> scripts/coccinelle/api/memdup.cocci [1].
>
> [1]: http://permalink.gmane.org/gmane.linux.kernel/2014320
>
> Signed-off-by: Andrzej Hajda
> ---
>  drivers/net/ethernet/cavium/liquidio/octeon_device.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
> index f67641a..8e23e3f 100644
> --- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c
> +++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
> @@ -602,12 +602,10 @@ int octeon_download_firmware(struct octeon_device *oct, const u8 *data,
>  	snprintf(oct->fw_info.liquidio_firmware_version, 32, "LIQUIDIO: %s",
>  		 h->version);
>
> -	buffer = kmalloc(size, GFP_KERNEL);
> +	buffer = kmemdup(data, size, GFP_KERNEL);
>  	if (!buffer)
>  		return -ENOMEM;
>
> -	memcpy(buffer, data, size);
> -
>  	p = buffer + sizeof(struct octeon_firmware_file_header);
>
>  	/* load all images */
Re: [net-next PATCH] net: bridge: fix for bridging 802.1Q without REORDER_HDR
On Tue, Sep 15, 2015 at 10:36:41PM -0400, Vlad Yasevich wrote: > On 09/15/2015 02:17 PM, Phil Sutter wrote: > > On Tue, Sep 15, 2015 at 11:11:53AM -0400, Vlad Yasevich wrote: > >> On 09/14/2015 04:06 PM, Phil Sutter wrote: > >>> On Mon, Sep 14, 2015 at 02:21:10PM -0400, Vlad Yasevich wrote: > On 09/11/2015 04:20 PM, Phil Sutter wrote: > > On Fri, Sep 11, 2015 at 12:24:45PM -0700, Stephen Hemminger wrote: > >> On Fri, 11 Sep 2015 21:22:03 +0200 > >> Phil Sutterwrote: > >> > >>> When forwarding packets from an 802.1Q interface with REORDER_HDR set > >>> to > >>> zero, the VLAN header previously inserted by vlan_do_receive() needs > >>> to > >>> be stripped from the packet and the mac_header adjustment undone, > >>> otherwise a tagged frame with first four bytes missing will be > >>> transmitted. > >>> > >>> Signed-off-by: Phil Sutter > >>> --- > >>> net/bridge/br_input.c | 10 ++ > >>> 1 file changed, 10 insertions(+) > >>> > >>> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c > >>> index f921a5d..e4e3fc7 100644 > >>> --- a/net/bridge/br_input.c > >>> +++ b/net/bridge/br_input.c > >>> @@ -288,6 +288,16 @@ rx_handler_result_t br_handle_frame(struct > >>> sk_buff **pskb) > >>> } > >>> > >>> forward: > >>> + if (is_vlan_dev(skb->dev) && > >>> + !(vlan_dev_priv(skb->dev)->flags & VLAN_FLAG_REORDER_HDR)) { > >>> + unsigned int offset = skb->data - skb_mac_header(skb); > >>> + > >>> + skb_push(skb, offset); > >>> + memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN); > >>> + skb->mac_header += VLAN_HLEN; > >>> + skb_pull(skb, offset); > >>> + skb_reset_mac_len(skb); > >>> + } > >>> switch (p->state) { > >>> case BR_STATE_FORWARDING: > >>> rhook = rcu_dereference(br_should_route_hook); > >> > >> Thanks for finding this. Is this a new thing or has it always been > >> there? > > > > Sorry, I didn't check if this is a regression or not. 
Seen initially > > with RHEL7's kernel-3.10.0-229.7.2, which due to the massive backporting > > is by far not as old as it might seem. But it's surely not a brand new > > problem of net-next or so. > > > > Since nowadays no sane mind touches REORDER_HDR (there was originally a > > bug in NetworkManager which defaulted this to 0), it may very well be > > there for a long time already. > > > >> Sorry, this looks so special case it doesn't seem like a good idea. > >> Something is broken in VLAN handling if this is required. > > > > It is so ugly, I wish I had found a better way to fix the problem. Well, > > maybe I miss something: > > > > - packet enters __netif_receive_skb_core(): > > - skb->protocol is set to ETH_P_8021Q, so: > > - packet is untagged > > - skb->vlan_tci set > > - skb->protocol set to 'real' protocol > > - skb_vlan_tag_present(skb) == true, so: > > - vlan_do_receive() is called: > > - tags the packet again > > - zeroes vlan_tci > > - goto another_round > > - __netif_receive_skb_core(), round 2: > > - skb->protocol is not ETH_P_8021Q -> no untagging > > - skb_vlan_tag_present(skb) == false -> no vlan_do_receive() > > - rx_handler handler (== br_handle_frame) is called > > > > IMO the root of all evil is the existence of REORDER_HDR itself. It > > causes an skb which should have been untagged to being passed along with > > VLAN header present and code dealing with it needs to clean up the mess. > > So the problem here appears the be the code the in > br_dev_queue_push_xmit(). > It assumes that MAC_HLEN worth of data has been removed from the skb, > which is normal in case of normal VLAN processing. However, without > REORDER_HEADER set this is no longer the case. In this case, the > ethernet > header is shifted 4 bytes, and when we push the it back we miss the 4 > bytes > of the destination mac address... 
> >>> > >>> Please note that vlan_do_receive() also inserts the VLAN header in > >>> between ethernet header and IP header, therefore: > >>> > I wonder if it would be safe to just use skb->mac_len. > >>> > >>> Given this works, the bridge would still forward a tagged frame which > >>> should have been untagged in the first place. > >>> > >>> I just wondered where this added VLAN header is dropped if the interface > >>> does not belong to a bridge, but then realized that further packet > >>> processing simply ignores the ethernet header (and everything following > >>> it). So unless I forget something, this should indeed be a > >>> bridge-specific problem. > >>> > >> > >> Looks like macvtap
Re: [PATCH v4] add stealth mode
On Wed, 2015-09-16 at 11:54 +0200, Matteo Croce wrote:

> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 93898e0..fe62ae0 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -77,6 +77,7 @@
>  #include
>
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -1652,7 +1653,7 @@ csum_error:
>  		TCP_INC_STATS_BH(net, TCP_MIB_CSUMERRORS);
>  bad_packet:
>  		TCP_INC_STATS_BH(net, TCP_MIB_INERRS);
> -	} else {
> +	} else if (!IN_DEV_STEALTH(skb->dev->ip_ptr)) {
>  		tcp_v4_send_reset(NULL, skb);
>  	}

It is illegal to deref skb->dev->ip_ptr without proper accessor /
annotations.

Check

	struct in_device *in_dev = __in_dev_get_rcu(skb->dev);

(Same remarks in other places of your patch)
[PATCH v8 0/4] can: Allwinner A10/A20 CAN Controller support - Summary
Hi,

please find attached the next version of my patch set. I have
incorporated all remarks from Maxime Ripard into this version.

Please review, test and report any bugs you find.

The patch set applies to all recent kernel versions (4.x, next etc.).

[PATCH v8 1/4] Device Tree Binding Documentation
[PATCH v8 2/4] Defconfig multi_v7
[PATCH v8 3/4] Defconfig sunxi
[PATCH v8 4/4] Kernel Module

History:

v8: sun4i_can.c: rename sunxi to sun4i
    dt: use sun4i-a10-can as identifier
    can_open: don't use shared IRQ

v7: set_normal_mode: stripped (code inserted in can_stop)
    set_reset_mode: stripped (code inserted in can_start)
    sunxi_can_start: reworked
    sunxi_can_stop: function added
    sunxi_can_err: don't skip if skb alloc fails
    sunxican_bittiming_const: use netdev_dbg instead of netdev_info
    sunxican_probe: CAN_CTRLMODE_PRESUME_ACK

v6: renamed the driver to sun4i as suggested by Maxime Ripard
    removed module version
    removed suspend and resume
    moved clk enable from can_start into open
    / should be balanced between enabling and disabling now
    freeing resources on error

v5: fix license
    modify prefix to mode select defines
    enable and disable clock in sunxican_get_berr_counter
    delete set_normal_mode at the end of sunxi_can_start
    removed sunxican_id_table
    use devm_clk_get instead of clk_get
    use devm_ioremap_resource to simplify probe and remove
    make set-normal-mode and set-reset-mode more readable

v4: defines prefixed with SUNXI_
    sunxi_can_write_cmdreg tweaked
    loops in set_xxx_mode reworked
    add return value to set_xxx_mode
    sunxican_start_xmit reworked
    struct platform_driver stripped
    moved set_bittiming into open
    moved clock start into open
    add clock stop to close
    suspend reworked
    resume reworked
    fixed double counting bug

v3: changed error state change handling (thx Andri for the hint)
    use bittiming function correct (no need to call it)
    strip down priv (suggested by Marc)
    scripts/checkpatch.pl -> no matches anymore
    sparse -> no errors or warnings anymore

v2: cleaning

v1: initial

Signed-off-by: Gerhard Bertelsmann
---
 .../devicetree/bindings/net/can/sun4i_can.txt |  38 +
 arch/arm/configs/multi_v7_defconfig           |   1 +
 arch/arm/configs/sunxi_defconfig              |   2 +
 drivers/net/can/Kconfig                       |  10 +
 drivers/net/can/Makefile                      |   1 +
 drivers/net/can/sun4i_can.c                   | 857 +
 6 files changed, 909 insertions(+)
[PATCH v8 3/4] can: Allwinner A10/A20 CAN Controller support - Defconfig
Defconfig sunxi for Allwinner A10/A20 CAN driver

Signed-off-by: Gerhard Bertelsmann
---
 arch/arm/configs/sunxi_defconfig | 2 +
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/configs/sunxi_defconfig b/arch/arm/configs/sunxi_defconfig
index 51eea22..fe020a5 100644
--- a/arch/arm/configs/sunxi_defconfig
+++ b/arch/arm/configs/sunxi_defconfig
@@ -31,6 +31,8 @@ CONFIG_IP_PNP_BOOTP=y
 # CONFIG_INET_LRO is not set
 # CONFIG_INET_DIAG is not set
 # CONFIG_IPV6 is not set
+CONFIG_CAN=y
+CONFIG_CAN_SUN4I=y
 # CONFIG_WIRELESS is not set
 CONFIG_DEVTMPFS=y
 CONFIG_DEVTMPFS_MOUNT=y
[PATCH v8 2/4] can: Allwinner A10/A20 CAN Controller support - Defconfig
Defconfig multi_v7 for Allwinner A10/A20 CAN driver

Signed-off-by: Gerhard Bertelsmann
---
 arch/arm/configs/multi_v7_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/configs/multi_v7_defconfig b/arch/arm/configs/multi_v7_defconfig
index 03deb7f..14eb6b9 100644
--- a/arch/arm/configs/multi_v7_defconfig
+++ b/arch/arm/configs/multi_v7_defconfig
@@ -153,6 +153,7 @@ CONFIG_CAN_DEV=y
 CONFIG_CAN_AT91=m
 CONFIG_CAN_XILINXCAN=y
 CONFIG_CAN_MCP251X=y
+CONFIG_CAN_SUN4I=y
 CONFIG_BT=m
 CONFIG_BT_MRVL=m
 CONFIG_BT_MRVL_SDIO=m
[PATCH v8 1/3] can: Allwinner A10/A20 CAN Controller support - Devicetree bindings
Devicetree bindings for Allwinner A10/A20 CAN

Signed-off-by: Gerhard Bertelsmann
---
 .../devicetree/bindings/net/can/sun4i_can.txt | 38 +
 1 file changed, 38 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/can/sun4i_can.txt b/Documentation/devicetree/bindings/net/can/sun4i_can.txt
new file mode 100644
index 000..cd0f50c
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/can/sun4i_can.txt
@@ -0,0 +1,38 @@
+Allwinner A10/A20 CAN controller Device Tree Bindings
+-----------------------------------------------------
+
+Required properties:
+- compatible: "allwinner,sun4i-a10-can"
+- reg: physical base address and size of the Allwinner A10/A20 CAN register map.
+- interrupts: interrupt specifier for the sole interrupt.
+- clock: phandle and clock specifier.
+
+
+Example
+-------
+
+SoC common .dtsi file:
+
+	can0_pins_a: can0@0 {
+		allwinner,pins = "PH20","PH21";
+		allwinner,function = "can";
+		allwinner,drive = <0>;
+		allwinner,pull = <0>;
+	};
+...
+	can0: can@01c2bc00 {
+		compatible = "allwinner,sun4i-a10-can";
+		reg = <0x01c2bc00 0x400>;
+		interrupts = <0 26 4>;
+		clocks = <_gates 4>;
+		status = "disabled";
+	};
+
+Board specific .dts file:
+
+	can0: can@01c2bc00 {
+		pinctrl-names = "default";
+		pinctrl-0 = <_pins_a>;
+		status = "okay";
+	};
+
Re: [PATCH] solos-pci: Increase headroom on received packets
On Wed, 2015-09-16 at 03:53 -0700, Eric Dumazet wrote:
> You should use netdev_alloc_skb() : This helper is better for rx skbs,
> as it allows for better packing of frames in GRO or TCP stack.

OK, thanks. I don't have a netdev (this is an ATM device) but I can use
dev_alloc_skb().

> Also netdev_alloc_skb_ip_align() might handle the NET_IP_ALIGN stuff
> for arches that care.

I'd briefly considered NET_IP_ALIGN but decided against it because this
isn't Ethernet and my hardware header is a nice sane 8 bytes, not 14.

But actually, the primary use cases for this are PPPoATM (with 2 bytes
of PPP frame type) and PPPoE over BR2684 (with 14 bytes of Ethernet
header). So NET_IP_ALIGN would actually make sense.

Unfortunately the FPGA can't do DMA to unaligned addresses, so I can't
do it in the DMA case. I can do it for the MMIO code path though (which
I still haven't tested). I'll send a new patch in a moment...

-- 
dwmw2
Re: IPv6 routing/fragmentation panic
On Wed, 2015-09-16 at 01:48 +0200, Florian Westphal wrote:
>
> What I don't understand is why you see this with fragmented ipv6
> packets only (and not with all ipv6 forwarded skbs).
>
> Something like this copy-pastry from ip_finish_output2 should fix it:

That works; thanks.

Tested-by: David Woodhouse

A little extra debugging output shows that the offending fragments were
arriving here with skb_headroom(skb)==10. Which is reasonable, being
the Solos ADSL card's header of 8 bytes followed by 2 bytes of PPP
frame type.

The non-fragmented packets, on the other hand, are arriving with a
headroom of 42 bytes. Could something else already have reallocated
them before they get that far? (Do we have any way to gather statistics
on such reallocations? It seems that might be useful for performance
investigation.)

Johannes and I were talking on IRC yesterday about trying to make this
kind of thing easier to reproduce without odd hardware. We postulated a
skb_torture() function which, when an appropriate debugging option was
enabled, would randomly screw around with the skb in various
interesting ways: shifting the data down so that there's no headroom,
deliberately making it *non-linear*, temporarily cloning it and freeing
the clone a couple of seconds later, etc.

Then we could insert calls to skb_torture() in interesting places like
netif_rx(), ip6_finish_output2() and anywhere else that seems
appropriate (perhaps with flags to indicate *what* kind of torture is
permissible in certain locations). And see what breaks...

-- 
David Woodhouse                            Open Source Technology Centre
david.woodho...@intel.com                              Intel Corporation
Re: [PATCH v4] add stealth mode
On 09/16/2015 11:54 AM, Matteo Croce wrote:
> Add option to disable any reply not related to a listening socket,
> like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
> Also disables ICMP replies to echo request and timestamp.
> The stealth mode can be enabled selectively for a single interface.
>
> Signed-off-by: Matteo Croce
> ---
> rebased on 4.3-rc1
>
>  Documentation/networking/ip-sysctl.txt | 14 ++
>  include/linux/inetdevice.h             |  1 +
>  include/linux/ipv6.h                   |  1 +
>  include/uapi/linux/ip.h                |  1 +
>  net/ipv4/devinet.c                     |  1 +
>  net/ipv4/icmp.c                        |  6 ++
>  net/ipv4/ip_input.c                    |  5 +++--
>  net/ipv4/tcp_ipv4.c                    |  3 ++-
>  net/ipv4/udp.c                         |  4 +++-
>  net/ipv6/addrconf.c                    |  7 +++
>  net/ipv6/icmp.c                        |  3 ++-
>  net/ipv6/ip6_input.c                   |  5 +++--
>  net/ipv6/tcp_ipv6.c                    |  2 +-
>  net/ipv6/udp.c                         |  3 ++-
>  14 files changed, 47 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index ebe94f2..1d46adc 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -1206,6 +1206,13 @@ igmp_link_local_mcast_reports - BOOLEAN
>  	224.0.0.X range.
>  	Default TRUE
>
> +stealth - BOOLEAN
> +	Disable any reply not related to a listening socket,
> +	like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
> +	Also disables ICMP replies to echo requests and timestamp
> +	and ICMP errors for unknown protocols.
> +	Default value is 0.
> +

Hmm, what about all other protocols besides TCP/UDP such as SCTP, DCCP,
etc? It seems it gives false expectations in such cases when the user
enables being "stealth", but finds out it has no effect at all there ...

nmap f.e. has a couple of scanning options for SCTP, and at least SCTP
is still relevant in telco space. I know this question has been asked
before, but the only answer on this was so far: "well, I've never played
with SCTP before" ... :/
[PATCH] solos-pci: Increase headroom on received packets
A comment in include/linux/skbuff.h says that:

 * Various parts of the networking layer expect at least 32 bytes of
 * headroom, you should not reduce this.

This was demonstrated by a panic when handling fragmented IPv6 packets:
http://marc.info/?l=linux-netdev=144236093519172=2

It's not entirely clear if that comment is still valid, and if it is,
perhaps netif_rx() ought to be enforcing it with a warning.

But either way, it is rather stupid from a performance point of view
for us to be receiving packets into a buffer which doesn't have enough
room to prepend an Ethernet header: it means that *every* incoming
packet is going to need to be reallocated. So let's fix that.

Signed-off-by: David Woodhouse
---
Tested in the DMA code path; I don't believe the DMA-capable devices
can still be used in MMIO mode. Simon, Guy, would you be able to test
the MMIO version?

diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c
index 74e18b0..be8225e 100644
--- a/drivers/atm/solos-pci.c
+++ b/drivers/atm/solos-pci.c
@@ -805,13 +805,13 @@ static void solos_bh(unsigned long card_arg)
 				continue;
 			}
 
-			skb = alloc_skb(size + 1, GFP_ATOMIC);
+			skb = alloc_skb(size + NET_SKB_PAD + 1, GFP_ATOMIC);
 			if (!skb) {
 				if (net_ratelimit())
 					dev_warn(>dev->dev, "Failed to allocate sk_buff for RX\n");
 				continue;
 			}
-
+			skb_reserve(skb, NET_SKB_PAD);
 			memcpy_fromio(skb_put(skb, size),
 				      RX_BUF(card, port) + sizeof(*header),
 				      size);
@@ -869,8 +869,10 @@ static void solos_bh(unsigned long card_arg)
 		/* Allocate RX skbs for any ports which need them */
 		if (card->using_dma && card->atmdev[port] &&
 		    !card->rx_skb[port]) {
-			struct sk_buff *skb = alloc_skb(RX_DMA_SIZE, GFP_ATOMIC);
+			struct sk_buff *skb = alloc_skb(RX_DMA_SIZE + NET_SKB_PAD,
+							GFP_ATOMIC);
 			if (skb) {
+				skb_reserve(skb, NET_SKB_PAD);
 				SKB_CB(skb)->dma_addr =
 					dma_map_single(>dev->dev, skb->data,
 						       RX_DMA_SIZE, DMA_FROM_DEVICE);

-- 
David Woodhouse                            Open Source Technology Centre
david.woodho...@intel.com                              Intel Corporation
[PATCH v2] solos-pci: Increase headroom on received packets
A comment in include/linux/skbuff.h says that:

 * Various parts of the networking layer expect at least 32 bytes of
 * headroom, you should not reduce this.

This was demonstrated by a panic when handling fragmented IPv6 packets:
http://marc.info/?l=linux-netdev=144236093519172=2

It's not entirely clear if that comment is still valid, and if it is,
perhaps netif_rx() ought to be enforcing it with a warning.

But either way, it is rather stupid from a performance point of view
for us to be receiving packets into a buffer which doesn't have enough
room to prepend an Ethernet header: it means that *every* incoming
packet is going to need to be reallocated. So let's fix that.

Signed-off-by: David Woodhouse
---
diff --git a/drivers/atm/solos-pci.c b/drivers/atm/solos-pci.c
index 74e18b0..3d7fb65 100644
--- a/drivers/atm/solos-pci.c
+++ b/drivers/atm/solos-pci.c
@@ -805,7 +805,12 @@ static void solos_bh(unsigned long card_arg)
 				continue;
 			}
 
-			skb = alloc_skb(size + 1, GFP_ATOMIC);
+			/* Use netdev_alloc_skb() because it adds NET_SKB_PAD of
+			 * headroom, and ensures we can route packets back out an
+			 * Ethernet interface (for example) without having to
+			 * reallocate. Adding NET_IP_ALIGN also ensures that both
+			 * PPPoATM and PPPoEoBR2684 packets end up aligned. */
+			skb = netdev_alloc_skb_ip_align(NULL, size + 1);
 			if (!skb) {
 				if (net_ratelimit())
 					dev_warn(>dev->dev, "Failed to allocate sk_buff for RX\n");
@@ -869,7 +874,10 @@ static void solos_bh(unsigned long card_arg)
 		/* Allocate RX skbs for any ports which need them */
 		if (card->using_dma && card->atmdev[port] &&
 		    !card->rx_skb[port]) {
-			struct sk_buff *skb = alloc_skb(RX_DMA_SIZE, GFP_ATOMIC);
+			/* Unlike the MMIO case (qv) we can't add NET_IP_ALIGN
+			 * here; the FPGA can only DMA to addresses which are
+			 * aligned to 4 bytes. */
+			struct sk_buff *skb = dev_alloc_skb(RX_DMA_SIZE);
 			if (skb) {
 				SKB_CB(skb)->dma_addr =
 					dma_map_single(>dev->dev, skb->data,

-- 
David Woodhouse                            Open Source Technology Centre
david.woodho...@intel.com                              Intel Corporation
Re: [PATCH v2 net] net/mlx4_en: really allow to change RSS key
On Wed, Sep 16, 2015 at 4:29 AM, Eric Dumazet wrote:
> From: Eric Dumazet
>
> When changing rss key, we do not want to overwrite user provided key
> by the one provided by netdev_rss_key_fill(), which is the host random
> key generated at boot time.
>
> Fixes: 947cbb0ac242 ("net/mlx4_en: Support for configurable RSS hash function")
> Signed-off-by: Eric Dumazet
> Cc: Eyal Perry
> CC: Amir Vadai

Acked-by: Or Gerlitz

Dave, can you please push it to -stable of >= 3.19 ?
[ANNOUNCE] libnftnl 1.0.4 release
Hi!

The Netfilter project proudly presents:

        libnftnl 1.0.4

libnftnl is a userspace library providing a low-level netlink
programming interface (API) to the in-kernel nf_tables subsystem. The
library libnftnl has been previously known as libnftables. This library
is currently used by the nft command line tool.

This release comes with new features available up to 4.2, see
ChangeLog for more details.

In this release, we have renamed most of the library symbols to use the
nftnl_ prefix while keeping aliases to the old ones. We would like to
reserve the nft_ prefix for our higher level library, which should land
soon. We have kept aliases around to reduce the impact of these
changes, but they will be deprecated soon. Sorry for the inconvenience
in any case.

You can download this library from:

http://www.netfilter.org/projects/libnftnl/downloads.html
ftp://ftp.netfilter.org/pub/libnftnl/

Thanks!

Alvaro Neira (12):
      ruleset: clean up the variable names in the xml/json parsing functions
      src: don't create iterator with empty list
      ruleset: refactor nft_ruleset_*_parse_ruleset()
      set: refactor code in json parse function
      rule: don't release the tree parameter from nft_jansson_parse_rule()
      ruleset: fix leak in json/xml in set lists
      ruleset: fix crash if we free sets included in the set_list
      ruleset: crash from error path when we build the xml/json tree
      xml: test if the root node name is initialized
      examples: add nft-ruleset-parse-file
      ruleset: add nft_ruleset_ctx_free
      parser: Add operation not supported error message

Alvaro Neira Ayuso (4):
      buffer: fix missing XML string tag in nft_buf_close
      src: add command tag in JSON/XML export support
      src: add support to import JSON/XML with the new command tag
      tests: update JSON/XML tests with the new syntax

Arturo Borrero Gonzalez (1):
      expr: dynset: fix json/xml parsing

Balazs Scheidler (1):
      expr: redir: fix snprintf to return the number of bytes printed

Carlos Falgueras García (1):
      src: fix memory leaks at nft_[object]_nlmsg_parse

Pablo Neira Ayuso (17):
      src: add missing include in utils.c
      ruleset: fix more leaks in error path
      src: split internal.h is smaller files
      Makefile: internal.h now resides in include
      src: restore static array with expression operations
      src: add batch abstraction
      table: add netdev family support
      chain: add netdev family support
      expr: immediate: fix leak in expression destroy path
      src: introduce nftnl_* aliases for all existing functions
      src: rename existing functions to use the nftnl_ prefix
      src: add compat header file definitions
      src: rename nftnl_rule_expr to nftnl_expr
      src: rename NFTNL_RULE_EXPR_ATTR to NFTNL_EXPR_
      src: get rid of _ATTR_ infix in new nfntl_ definitions
      src: get rid of _attr_ infix in new nftnl_ definitions
      bump version to 1.0.4

Patrick McHardy (11):
      list: fix prefetch dummy
      set: add support for set timeouts
      set_elem: add timeout support
      set: print set elem timeout information
      set_elem: add support for userdata
      expr: add support for the dynset expr
      headers: resync headers for new register definitions
      data: increase maximum possible data size
      expr: seperate expression parsing and building functions
      set_elem: support expressions attached to set elements
      dynset: support expression templates
Re: [PATCH 27/31] net/tipc: use kmemdup rather than duplicating its implementation
Ping.

Regards
Andrzej

On 08/07/2015 09:59 AM, Andrzej Hajda wrote:
> The patch was generated using fixed coccinelle semantic patch
> scripts/coccinelle/api/memdup.cocci [1].
>
> [1]: http://permalink.gmane.org/gmane.linux.kernel/2014320
>
> Signed-off-by: Andrzej Hajda
> ---
>  net/tipc/server.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/net/tipc/server.c b/net/tipc/server.c
> index 922e04a..c187cad 100644
> --- a/net/tipc/server.c
> +++ b/net/tipc/server.c
> @@ -411,13 +411,12 @@ static struct outqueue_entry *tipc_alloc_entry(void *data, int len)
>  	if (!entry)
>  		return NULL;
>
> -	buf = kmalloc(len, GFP_ATOMIC);
> +	buf = kmemdup(data, len, GFP_ATOMIC);
>  	if (!buf) {
>  		kfree(entry);
>  		return NULL;
>  	}
>
> -	memcpy(buf, data, len);
>  	entry->iov.iov_base = buf;
>  	entry->iov.iov_len = len;
>
[PATCH v8 4/4] can: Allwinner A10/A20 CAN Controller support - Kernel module
Kernel module for Allwinner A10/A20 CAN Signed-off-by: Gerhard Bertelsmann--- drivers/net/can/Kconfig| 10 + drivers/net/can/Makefile | 1 + drivers/net/can/sun4i_can.c| 857 + 3 files changed, 868 insertions(+) diff --git a/drivers/net/can/Kconfig b/drivers/net/can/Kconfig index e8c96b8..6d04183 100644 --- a/drivers/net/can/Kconfig +++ b/drivers/net/can/Kconfig @@ -129,6 +129,16 @@ config CAN_RCAR To compile this driver as a module, choose M here: the module will be called rcar_can. +config CAN_SUN4I + tristate "Allwinner A10 CAN controller" + depends on MACH_SUN4I || MACH_SUN7I || COMPILE_TEST + ---help--- + Say Y here if you want to use CAN controller found on Allwinner + A10/A20 SoCs. + + To compile this driver as a module, choose M here: the module will + be called sun4i_can. + config CAN_XILINXCAN tristate "Xilinx CAN" depends on ARCH_ZYNQ || ARM64 || MICROBLAZE || COMPILE_TEST diff --git a/drivers/net/can/Makefile b/drivers/net/can/Makefile index c533c62..1f21cef 100644 --- a/drivers/net/can/Makefile +++ b/drivers/net/can/Makefile @@ -27,6 +27,7 @@ obj-$(CONFIG_CAN_FLEXCAN) += flexcan.o obj-$(CONFIG_PCH_CAN) += pch_can.o obj-$(CONFIG_CAN_GRCAN)+= grcan.o obj-$(CONFIG_CAN_RCAR) += rcar_can.o +obj-$(CONFIG_CAN_SUN4I)+= sun4i_can.o obj-$(CONFIG_CAN_XILINXCAN)+= xilinx_can.o subdir-ccflags-y += -D__CHECK_ENDIAN__ diff --git a/drivers/net/can/sun4i_can.c b/drivers/net/can/sun4i_can.c new file mode 100644 index 000..10d8497 --- /dev/null +++ b/drivers/net/can/sun4i_can.c @@ -0,0 +1,857 @@ +/* + * sun4i_can.c - CAN bus controller driver for Allwinner SUN4I based SoCs + * + * Copyright (C) 2013 Peter Chen + * Copyright (C) 2015 Gerhard Bertelsmann + * All rights reserved. 
+ * + * Parts of this software are based on (derived from) the SJA1000 code by: + * Copyright (C) 2014 Oliver Hartkopp + * Copyright (C) 2007 Wolfgang Grandegger + * Copyright (C) 2002-2007 Volkswagen Group Electronic Research + * Copyright (C) 2003 Matthias Brukner, Trajet Gmbh, Rebenring 33, + * 38106 Braunschweig, GERMANY + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * 3. Neither the name of Volkswagen nor the names of its contributors + *may be used to endorse or promote products derived from this software + *without specific prior written permission. + * + * Alternatively, provided that this notice is retained in full, this + * software may be distributed under the terms of the GNU General + * Public License ("GPL") version 2, in which case the provisions of the + * GPL apply INSTEAD OF those given above. + * + * The provided data structures and external interfaces from this code + * are not restricted to be used by modules with a GPL compatible license. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH + * DAMAGE. + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define DRV_NAME "sun4i_can" + +/* Registers address (physical base address 0x01C2BC00) */ +#define SUN4I_REG_MSEL_ADDR0x /* CAN Mode Select */ +#define SUN4I_REG_CMD_ADDR 0x0004 /* CAN Command */ +#define SUN4I_REG_STA_ADDR 0x0008 /* CAN Status */ +#define SUN4I_REG_INT_ADDR 0x000c /* CAN Interrupt Flag */ +#define SUN4I_REG_INTEN_ADDR 0x0010 /* CAN Interrupt Enable */ +#define SUN4I_REG_BTIME_ADDR 0x0014 /* CAN Bus Timing 0 */ +#define SUN4I_REG_TEWL_ADDR0x0018 /*
Re: [PATCH RFC] solos-pci: Fix BUG() with shared skb
On Tue, September 15, 2015 20:10, David Woodhouse wrote:
> On Wed, 2013-09-04 at 21:41 +0100, David Woodhouse wrote:
>> +++ b/drivers/atm/solos-pci.c
>> @@ -1145,19 +1145,19 @@ static int psend(struct atm_vcc *vcc, struct sk_buff *skb)
>> +	if (skb_headroom(skb) < sizeof(*header)) {
>> +		struct sk_buff *nskb;
>> +
>> +		nskb = skb_realloc_headroom(skb, sizeof(*header));
>> +		if (!nskb) {
>> +			solos_pop(vcc, skb);
>> +			return -ENOMEM;
>> +		}
>> +		if (skb->truesize != nskb->truesize)
>> +			atm_force_charge(vcc, nskb->truesize - skb->truesize);
>> +
>> +		dev_kfree_skb_any(skb);
>> +		skb = nskb;
>>  	}
>
> Simon, did you ever test this?
> Can you still (tell me how to) reproduce the original problem? I think
> that sending on br2684 was necessary but not sufficient...?

I'm currently using this but without the call to atm_force_charge().
I don't know how to reproduce the BUG() but it hasn't happened again.

-- 
Simon Arlott
Experiences with slub bulk use-case for network stack
Hint: this leads up to discussing whether the current bulk *ALLOC* API
needs to be changed...

Alex and I have been working hard on a practical use-case for SLAB
bulking (mostly slUb) in the network stack. Here is a summary of
what we have learned so far.

Bulk free'ing SKBs during TX completion is a big and easy win.

Specifically for slUb, the normal path for freeing these objects (which
are not on c->freelist) requires a locked double_cmpxchg per object.
The bulk free (via the detached freelist patch) allows all objects
belonging to the same slab-page to be free'ed with a single locked
double_cmpxchg. Thus, the bulk free speedup is quite an improvement.

The slUb alloc is hard to beat on speed:
 * accessing c->freelist, local cmpxchg 9 cycles (38% of cost)
 * c->freelist is refilled with a single locked cmpxchg

In micro benchmarking it looks like we can beat alloc, because we do a
local_irq_{disable,enable} (cost 7 cycles) and then pull out all
objects in c->freelist. Thus, saving 9 cycles per object (counting
from the 2nd object).

However, in practical use-cases we are seeing the single object alloc
win over bulk alloc; we believe this to be due to prefetching. When
c->freelist gets (semi) cache-cold, it gets more expensive to walk
the freelist (which is a basic singly linked list to the next free
object).

For bulk alloc the full freelist is walked (right away) and objects are
pulled out into the array. For normal single object alloc only a
single object is returned, but it does a prefetch on the next object
pointer. Thus, the next time single alloc is called the object will
have been prefetched. Doing prefetch in bulk alloc only helps a
little, as there is not enough "time" between accessing/walking the
freelist for objects.

So, how can we solve this and make bulk alloc faster?

Alex and I had the idea of having bulk alloc return an "allocator
specific cache" data-structure (and we add some helpers to access
this).

In the slUb case, the freelist is a singly linked pointer list. In the
network stack the skb objects have a skb->next pointer, which is
located at the same position as the freelist pointer. Thus, simply
returning the freelist directly could be interpreted as an skb-list.
The helper API would then do the prefetching when pulling out
objects.

For the slUb case, we would simply cmpxchg either c->freelist or
page->freelist with a NULL ptr, and then own all objects on the
freelist. This also reduces the time we keep IRQs disabled.

API wise, we don't (necessarily) know how many objects are on the
freelist (without first walking the list, which would cause stalls on
data, which we are trying to avoid).

Thus, the API of always returning the exact number of requested
objects will not work...

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

(related to http://thread.gmane.org/gmane.linux.kernel.mm/137469)
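
For readers following along, here is a minimal userspace sketch of the
"allocator specific cache" idea from the message above: detach the whole
freelist in one step, then pull objects one at a time while prefetching
the next one. Everything named here (struct obj, struct obj_cache,
cache_detach, cache_pull) is hypothetical and not part of any kernel API.
The detach uses plain stores where the message talks about a cmpxchg, and
__builtin_prefetch is the GCC/Clang builtin, assumed to be available.

```c
#include <stddef.h>

/* Hypothetical free object: the first word is the "next free" pointer,
 * in the same position an skb->next pointer would occupy. */
struct obj {
	struct obj *next;
	char payload[56];
};

/* Hypothetical "allocator specific cache": just the detached list head. */
struct obj_cache {
	struct obj *head;
};

/* Take ownership of every object on the freelist in one step.
 * A real allocator would swap the head with an atomic cmpxchg;
 * this sketch is single-threaded and uses plain stores. */
static struct obj_cache cache_detach(struct obj **freelist)
{
	struct obj_cache c = { .head = *freelist };

	*freelist = NULL;
	return c;
}

/* Pull one object and prefetch the one that will be pulled next.
 * Returns NULL once the cache is empty; the caller does not know
 * up front how many objects it will get. */
static struct obj *cache_pull(struct obj_cache *c)
{
	struct obj *o = c->head;

	if (!o)
		return NULL;
	c->head = o->next;
	if (c->head)
		__builtin_prefetch(c->head);
	o->next = NULL;
	return o;
}
```

Note that the caller loops on cache_pull() until it returns NULL rather
than asking for an exact count, which is the API consequence the message
ends on.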
Re: [PATCH v4] add stealth mode
On 09/16/2015 12:45 PM, Matteo Croce wrote:
> 2015-09-16 12:26 GMT+02:00 Daniel Borkmann:
>> On 09/16/2015 11:54 AM, Matteo Croce wrote:
>>> Add option to disable any reply not related to a listening socket,
>>> like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
>>> Also disables ICMP replies to echo request and timestamp.
>>> The stealth mode can be enabled selectively for a single interface.
>>>
>>> Signed-off-by: Matteo Croce
>>> ---
>>> rebased on 4.3-rc1
>>>
>>>  Documentation/networking/ip-sysctl.txt | 14 ++
>>>  include/linux/inetdevice.h             | 1 +
>>>  include/linux/ipv6.h                   | 1 +
>>>  include/uapi/linux/ip.h                | 1 +
>>>  net/ipv4/devinet.c                     | 1 +
>>>  net/ipv4/icmp.c                        | 6 ++
>>>  net/ipv4/ip_input.c                    | 5 +++--
>>>  net/ipv4/tcp_ipv4.c                    | 3 ++-
>>>  net/ipv4/udp.c                         | 4 +++-
>>>  net/ipv6/addrconf.c                    | 7 +++
>>>  net/ipv6/icmp.c                        | 3 ++-
>>>  net/ipv6/ip6_input.c                   | 5 +++--
>>>  net/ipv6/tcp_ipv6.c                    | 2 +-
>>>  net/ipv6/udp.c                         | 3 ++-
>>>  14 files changed, 47 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
>>> index ebe94f2..1d46adc 100644
>>> --- a/Documentation/networking/ip-sysctl.txt
>>> +++ b/Documentation/networking/ip-sysctl.txt
>>> @@ -1206,6 +1206,13 @@ igmp_link_local_mcast_reports - BOOLEAN
>>>  	224.0.0.X range.
>>>  	Default TRUE
>>>
>>> +stealth - BOOLEAN
>>> +	Disable any reply not related to a listening socket,
>>> +	like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
>>> +	Also disables ICMP replies to echo requests and timestamp
>>> +	and ICMP errors for unknown protocols.
>>> +	Default value is 0.
>>> +
>>
>> Hmm, what about all other protocols besides TCP/UDP such as SCTP, DCCP,
>> etc? It seems it gives false expectations in such cases when the user
>> enables being "stealth", but finds out it has no effect at all there ...
>>
>> nmap f.e. has a couple of scanning options for SCTP, and at least SCTP
>> is still relevant in telco space. I know this question has been asked
>> before, but the only answer on this was so far: "well, I've never played
>> with SCTP before" ... :/
>
> Right, I was thinking to add them in a later version

I feel, there would be many follow-ups. :/

Architecturally on the bigger picture, nft and its connection tracker
would be the much better place for such policies, and it also provides
matches for various protocols already. What has been tried to address
this more generically f.e. inside netfilter subsystem, and why is it
absolutely not possible to extend this functionality over there?

Sorry if my question is stubborn, but from reading over the old threads
it still is not fully clear to me.

Thanks again,
Daniel
[PATCH net] bna: check for dma mapping errors
Check for DMA mapping errors, recover from them and register them in ethtool stats like other errors. Cc: Rasesh ModySigned-off-by: Ivan Vecera --- drivers/net/ethernet/brocade/bna/bna_tx_rx.c| 2 ++ drivers/net/ethernet/brocade/bna/bna_types.h| 1 + drivers/net/ethernet/brocade/bna/bnad.c | 29 - drivers/net/ethernet/brocade/bna/bnad.h | 2 ++ drivers/net/ethernet/brocade/bna/bnad_ethtool.c | 4 5 files changed, 37 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/brocade/bna/bna_tx_rx.c b/drivers/net/ethernet/brocade/bna/bna_tx_rx.c index 5d0753c..04b0d16 100644 --- a/drivers/net/ethernet/brocade/bna/bna_tx_rx.c +++ b/drivers/net/ethernet/brocade/bna/bna_tx_rx.c @@ -2400,6 +2400,7 @@ bna_rx_create(struct bna *bna, struct bnad *bnad, q0->rcb->id = 0; q0->rx_packets = q0->rx_bytes = 0; q0->rx_packets_with_error = q0->rxbuf_alloc_failed = 0; + q0->rxbuf_map_failed = 0; bna_rxq_qpt_setup(q0, rxp, dpage_count, PAGE_SIZE, _mem[i], _mem[i], _mem[i]); @@ -2428,6 +2429,7 @@ bna_rx_create(struct bna *bna, struct bnad *bnad, : rx_cfg->q1_buf_size; q1->rx_packets = q1->rx_bytes = 0; q1->rx_packets_with_error = q1->rxbuf_alloc_failed = 0; + q1->rxbuf_map_failed = 0; bna_rxq_qpt_setup(q1, rxp, hpage_count, PAGE_SIZE, _mem[i], _mem[i], diff --git a/drivers/net/ethernet/brocade/bna/bna_types.h b/drivers/net/ethernet/brocade/bna/bna_types.h index e0e797f..c438d03 100644 --- a/drivers/net/ethernet/brocade/bna/bna_types.h +++ b/drivers/net/ethernet/brocade/bna/bna_types.h @@ -587,6 +587,7 @@ struct bna_rxq { u64 rx_bytes; u64 rx_packets_with_error; u64 rxbuf_alloc_failed; + u64 rxbuf_map_failed; }; /* RxQ pair */ diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c index 506047c..21a0cfc 100644 --- a/drivers/net/ethernet/brocade/bna/bnad.c +++ b/drivers/net/ethernet/brocade/bna/bnad.c @@ -399,7 +399,13 @@ bnad_rxq_refill_page(struct bnad *bnad, struct bna_rcb *rcb, u32 nalloc) } dma_addr = dma_map_page(>pcidev->dev, page, 
page_offset, - unmap_q->map_size, DMA_FROM_DEVICE); + unmap_q->map_size, DMA_FROM_DEVICE); + if (dma_mapping_error(>pcidev->dev, dma_addr)) { + put_page(page); + BNAD_UPDATE_CTR(bnad, rxbuf_map_failed); + rcb->rxq->rxbuf_map_failed++; + goto finishing; + } unmap->page = page; unmap->page_offset = page_offset; @@ -454,8 +460,15 @@ bnad_rxq_refill_skb(struct bnad *bnad, struct bna_rcb *rcb, u32 nalloc) rcb->rxq->rxbuf_alloc_failed++; goto finishing; } + dma_addr = dma_map_single(>pcidev->dev, skb->data, buff_sz, DMA_FROM_DEVICE); + if (dma_mapping_error(>pcidev->dev, dma_addr)) { + dev_kfree_skb_any(skb); + BNAD_UPDATE_CTR(bnad, rxbuf_map_failed); + rcb->rxq->rxbuf_map_failed++; + goto finishing; + } unmap->skb = skb; dma_unmap_addr_set(>vector, dma_addr, dma_addr); @@ -3025,6 +3038,11 @@ bnad_start_xmit(struct sk_buff *skb, struct net_device *netdev) unmap = head_unmap; dma_addr = dma_map_single(>pcidev->dev, skb->data, len, DMA_TO_DEVICE); + if (dma_mapping_error(>pcidev->dev, dma_addr)) { + dev_kfree_skb_any(skb); + BNAD_UPDATE_CTR(bnad, tx_skb_map_failed); + return NETDEV_TX_OK; + } BNA_SET_DMA_ADDR(dma_addr, >vector[0].host_addr); txqent->vector[0].length = htons(len); dma_unmap_addr_set(>vectors[0], dma_addr, dma_addr); @@ -3056,6 +3074,15 @@ bnad_start_xmit(struct sk_buff *skb, struct net_device *netdev) dma_addr = skb_frag_dma_map(>pcidev->dev, frag, 0, size, DMA_TO_DEVICE); + if (dma_mapping_error(>pcidev->dev, dma_addr)) { + /* Undo the changes starting at tcb->producer_index */ + bnad_tx_buff_unmap(bnad, unmap_q, q_depth, + tcb->producer_index); + dev_kfree_skb_any(skb); + BNAD_UPDATE_CTR(bnad, tx_skb_map_failed); +
Re: IPv6 routing/fragmentation panic
David Woodhouse wrote:
> On Wed, 2015-09-16 at 01:48 +0200, Florian Westphal wrote:
> >
> > What I don't understand is why you see this with fragmented ipv6
> > packets only (and not with all ipv6 forwarded skbs).
> >
> > Something like this copy-pastry from ip_finish_output2 should fix it:
>
> That works; thanks.
>
> Tested-by: David Woodhouse
>
> A little extra debugging output shows that the offending fragments were
> arriving here with skb_headroom(skb)==10. Which is reasonable, being
> the Solos ADSL card's header of 8 bytes followed by 2 bytes of PPP
> frame type.
>
> The non-fragmented packets, on the other hand, are arriving with a
> headroom of 42 bytes. Could something else already have reallocated
> them before they get that far?

Yep. I missed

	if (skb_cow(skb, dst->dev->hard_header_len)) {

call in ip6_forward().

Problem is of course that we only expand headroom of the skb and not
of the fragment(s) stored in that skbs frag list.

So we have several options for a fix.
- expand headroom in ip6_finish_output2, like we do for ipv4
- expand headroom in ip6_fragment
- defer to slowpath if frags don't have enough headroom.

The latter is the smallest patch and would not add test for
locally generated, non-fragmented skbs.

(not even compile tested)

David, could you test this? I'd do an official patch submission then.

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -586,6 +586,7 @@ int ip6_fragment(struct sock *sk, struct sk_buff *skb,
 	frag_id = ipv6_select_ident(net, &ipv6_hdr(skb)->daddr,
 				    &ipv6_hdr(skb)->saddr);
 
+	hroom = LL_RESERVED_SPACE(rt->dst.dev);
 	if (skb_has_frag_list(skb)) {
 		int first_len = skb_pagelen(skb);
 		struct sk_buff *frag2;
@@ -599,7 +600,7 @@ int ip6_fragment(struct sock *sk, struct sk_buff *skb,
 			/* Correct geometry. */
 			if (frag->len > mtu ||
 			    ((frag->len & 7) && frag->next) ||
-			    skb_headroom(frag) < hlen)
+			    skb_headroom(frag) < (hlen + hroom))
 				goto slow_path_clean;
 
 			/* Partially cloned skb? */
@@ -724,7 +725,6 @@ slow_path:
 	 */
 
 	*prevhdr = NEXTHDR_FRAGMENT;
-	hroom = LL_RESERVED_SPACE(rt->dst.dev);
 	troom = rt->dst.dev->needed_tailroom;
 
 	/*
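
The condition changed by the proposed hunk can be tried out in isolation.
The following is only a userspace sketch under stated assumptions: struct
frag and frag_fits_fast_path() are hypothetical stand-ins for the skb
fields and the inline test in ip6_fragment(), and hroom is passed in as a
plain number instead of being taken from LL_RESERVED_SPACE().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one fragment on the frag list. */
struct frag {
	size_t len;       /* length of this fragment */
	size_t headroom;  /* free bytes in front of the data */
	bool has_next;    /* true if another fragment follows */
};

/* Mirrors the "correct geometry" test with the extra link-layer
 * headroom added: returns true if the fragment may stay on the fast
 * path, false if the caller should fall back to the slow path. */
static bool frag_fits_fast_path(const struct frag *f, size_t mtu,
				size_t hlen, size_t hroom)
{
	if (f->len > mtu)
		return false;
	/* Every fragment except the last must be a multiple of 8 bytes. */
	if ((f->len & 7) && f->has_next)
		return false;
	/* Room is needed for the IPv6 headers plus the link-layer header. */
	if (f->headroom < hlen + hroom)
		return false;
	return true;
}
```

With hroom set to 0 the function behaves like the check before the hunk,
so a fragment with 48 bytes of headroom and hlen 40 passes; with hroom 16
the same fragment is sent to the slow path. These numbers are chosen for
illustration only.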
Re: [linux-next] oops in ip_route_input_noref
On 2015-09-16 15:07, David Ahern wrote: > On 9/16/15 5:50 AM, Richard Alpe wrote: >> On 2015-09-16 11:24, Sergey Senozhatsky wrote: >>> Hi, >>> >>> 4.3.0-rc1-next-20150916 >>> >>> oops after removal of rndis usb device > > Hi Sergey: > > Is this with KVM or baremetal? > > -8<- > thanks for the analysis > >>> addr2line -e vmlinux -i 0x8146c0b1 >>> net/ipv4/route.c:1815 >>> net/ipv4/route.c:1905 >>> >>> >>> which seems to be this line ip_route_input_noref()->ip_route_input_slow(): >>> ... >>> 1813 rth->rt_is_input = 1; >>> 1814 if (res.table) >>> 1815 rth->rt_table_id = res.table->tb_id; >>> 1816 >>> ... >>> >>> >>> added by b7503e0cdb5dbec5d201aa69dc14679b5ae8 >>> >>> net: Add FIB table id to rtable >>> >>> Add the FIB table id to rtable to make the information available for >>> IPv4 as it is for IPv6. >>> >>> >>> -ss > > Hi Richard: > >> I to get an Oops in ip_route_input_noref(). It happens occasionally during >> bootup. >> KVM environment using virtio driver. Let me know if you need any additional >> info or >> if you want me to try to bisect it. >> >> Starting network... >> ... >> [0.877040] BUG: unable to handle kernel NULL pointer dereference at >> 0056 >> [0.877597] IP: [] ip_route_input_noref+0x1a2/0xb00 > > Can you send me your kernel config and qemu command line? KVM with virtio > networking is a primary test vehicle, and I did not encounter this at all. Sure thing. 
Not sure how ppl normally provide files on netdev but I'm just going to go ahead and paste them here :) $ ps aux | grep kvm qemu-system-x86_64 -enable-kvm -name tipc-medium-node1 -S -machine pc-0.14,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid cdec478a-5f0d-49f1-b25e-fac4ca0b290c -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/tipc-medium-node1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot order=n,menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -netdev tap,fd=25,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=00:0f:ff:10:04:01,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:28101 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on $ cat .config # # Automatically generated file; DO NOT EDIT. 
# Linux/x86 3.12.28 Kernel Configuration # CONFIG_64BIT=y CONFIG_X86_64=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf64-x86-64" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_ARCH_HAS_CPU_AUTOPROBE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ZONE_DMA32=y CONFIG_AUDIT_ARCH=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_64_SMP=y CONFIG_X86_HT=y CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" CONFIG_ARCH_CPU_PROBE_RELEASE=y CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_SUSE_KERNEL=y # CONFIG_SUSE_KERNEL_SUPPORTED is not set # CONFIG_SPLIT_PACKAGE is not set CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y CONFIG_KERNEL_GZIP=y # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set 
CONFIG_DEFAULT_HOSTNAME=&qu
Re: [linux-next] oops in ip_route_input_noref
On 2015-09-16 15:53, Richard Alpe wrote: > On 2015-09-16 15:07, David Ahern wrote: >> On 9/16/15 5:50 AM, Richard Alpe wrote: >>> On 2015-09-16 11:24, Sergey Senozhatsky wrote: >>>> Hi, >>>> >>>> 4.3.0-rc1-next-20150916 >>>> >>>> oops after removal of rndis usb device >> >> Hi Sergey: >> >> Is this with KVM or baremetal? >> >> -8<- >> thanks for the analysis >> >>>> addr2line -e vmlinux -i 0x8146c0b1 >>>> net/ipv4/route.c:1815 >>>> net/ipv4/route.c:1905 >>>> >>>> >>>> which seems to be this line ip_route_input_noref()->ip_route_input_slow(): >>>> ... >>>> 1813 rth->rt_is_input = 1; >>>> 1814 if (res.table) >>>> 1815 rth->rt_table_id = res.table->tb_id; >>>> 1816 >>>> ... >>>> >>>> >>>> added by b7503e0cdb5dbec5d201aa69dc14679b5ae8 >>>> >>>> net: Add FIB table id to rtable >>>> >>>> Add the FIB table id to rtable to make the information available for >>>> IPv4 as it is for IPv6. >>>> >>>> >>>> -ss >> >> Hi Richard: >> >>> I to get an Oops in ip_route_input_noref(). It happens occasionally during >>> bootup. >>> KVM environment using virtio driver. Let me know if you need any additional >>> info or >>> if you want me to try to bisect it. >>> >>> Starting network... >>> ... >>> [0.877040] BUG: unable to handle kernel NULL pointer dereference at >>> 0056 >>> [0.877597] IP: [] ip_route_input_noref+0x1a2/0xb00 >> >> Can you send me your kernel config and qemu command line? KVM with virtio >> networking is a primary test vehicle, and I did not encounter this at all. > Sure thing. 
Not sure how ppl normally provide files on netdev but I'm just > going > to go ahead and paste them here :) > > $ ps aux | grep kvm > qemu-system-x86_64 -enable-kvm -name tipc-medium-node1 -S -machine > pc-0.14,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp > 2,sockets=2,cores=1,threads=1 -uuid cdec478a-5f0d-49f1-b25e-fac4ca0b290c > -no-user-config -nodefaults -chardev > socket,id=charmonitor,path=/var/lib/libvirt/qemu/tipc-medium-node1.monitor,server,nowait > -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown > -boot order=n,menu=on,strict=on -device > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -netdev tap,fd=25,id=hostnet0 > -device > e1000,netdev=hostnet0,id=net0,mac=00:0f:ff:10:04:01,bus=pci.0,addr=0x3 > -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 > -vnc 127.0.0.1:28101 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on Sorry about that kvm cmdline was a copy-paste error. Here's the right one using virtio. 
$ ps aux | grep qemu qemu-system-x86_64 -enable-kvm -name tipc-large-node16 -S -machine pc-0.14,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 5c2ffa5f-fc39-47a2-9868-9ef93bada31a -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/tipc-large-node16.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot order=n,menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=48 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:0f:ff:10:05:16,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:29116 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on Regards Richard > > $ cat .config > # > # Automatically generated file; DO NOT EDIT. > # Linux/x86 3.12.28 Kernel Configuration > # > CONFIG_64BIT=y > CONFIG_X86_64=y > CONFIG_X86=y > CONFIG_INSTRUCTION_DECODER=y > CONFIG_OUTPUT_FORMAT="elf64-x86-64" > CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" > CONFIG_LOCKDEP_SUPPORT=y > CONFIG_STACKTRACE_SUPPORT=y > CONFIG_HAVE_LATENCYTOP_SUPPORT=y > CONFIG_MMU=y > CONFIG_NEED_DMA_MAP_STATE=y > CONFIG_NEED_SG_DMA_LENGTH=y > CONFIG_GENERIC_ISA_DMA=y > CONFIG_GENERIC_BUG=y > CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y > CONFIG_GENERIC_HWEIGHT=y > CONFIG_ARCH_MAY_HAVE_PC_FDC=y > CONFIG_RWSEM_XCHGADD_ALGORITHM=y > CONFIG_GENERIC_CALIBRATE_DELAY=y > CONFIG_ARCH_HAS_CPU_RELAX=y
Re: IPv6 routing/fragmentation panic
On Wed, 2015-09-16 at 15:27 +0200, Florian Westphal wrote:
> @@ -599,7 +600,7 @@ int ip6_fragment(struct sock *sk, struct sk_buff *skb,
>  			/* Correct geometry. */
>  			if (frag->len > mtu ||
>  			    ((frag->len & 7) && frag->next) ||
> -			    skb_headroom(frag) < hlen)
> +			    skb_headroom(frag) < (hlen + hroom))
>  				goto slow_path_clean;
>  
>  			/* Partially cloned skb? */

My test is 'ping -s 2000', and I end up with a fragment of 1280 bytes
followed by a fragment of 776 bytes. The test cited above is only
actually running on the latter fragment (which for some reason is fine
and has headroom of 58 bytes).

The first, larger, fragment isn't being checked. And that's the one
with only 10 bytes of headroom.

[   62.027984] has frag list
[   62.030616] line 604 check frag ddc5fcc0 len 776 headroom 58 (hlen 40 hroom 16)
[   62.036720] line 678 send skb ded050c0 len 1280 headroom 10
[   62.041096] skbuff: skb_under_panic: text:c125f9ca len:1294 put:14 head:dec89000 data:dec88ffc tail:0xdec8950a end:0xdec89f50 dev:br-lan

-- 
dwmw2
Re: [linux-next] oops in ip_route_input_noref
On 9/16/15 5:50 AM, Richard Alpe wrote:
> On 2015-09-16 11:24, Sergey Senozhatsky wrote:
>> Hi,
>>
>> 4.3.0-rc1-next-20150916
>>
>> oops after removal of rndis usb device

Hi Sergey:

Is this with KVM or baremetal?

-8<-

thanks for the analysis

>> addr2line -e vmlinux -i 0x8146c0b1
>> net/ipv4/route.c:1815
>> net/ipv4/route.c:1905
>>
>>
>> which seems to be this line ip_route_input_noref()->ip_route_input_slow():
>> ...
>> 1813         rth->rt_is_input = 1;
>> 1814         if (res.table)
>> 1815                 rth->rt_table_id = res.table->tb_id;
>> 1816
>> ...
>>
>>
>> added by b7503e0cdb5dbec5d201aa69dc14679b5ae8
>>
>>     net: Add FIB table id to rtable
>>
>>     Add the FIB table id to rtable to make the information available for
>>     IPv4 as it is for IPv6.
>>
>>
>> 	-ss

Hi Richard:

> I to get an Oops in ip_route_input_noref(). It happens occasionally during bootup.
> KVM environment using virtio driver. Let me know if you need any additional info or
> if you want me to try to bisect it.
>
> Starting network...
> ...
> [0.877040] BUG: unable to handle kernel NULL pointer dereference at 0056
> [0.877597] IP: [] ip_route_input_noref+0x1a2/0xb00

Can you send me your kernel config and qemu command line? KVM with
virtio networking is a primary test vehicle, and I did not encounter
this at all.

Thanks,
David
Re: [PATCH] ARCNET: fix hard_header_len limit
On Wed, Aug 05, 2015 at 05:34:51PM +0200, Michael Grzeschik wrote:
> On Thu, Jul 30, 2015 at 11:16:36AM -0700, David Miller wrote:
> > From: Michael Grzeschik
> > Date: Thu, 30 Jul 2015 15:34:36 +0200
> >
> > > The commit <9c7077622dd9> ("packet: make packet_snd fail on len smaller
> > > than l2 header") adds the check for minimum packet length of the used l2.
> > > For arcnet the hardware header length is not the complete archdr which
> > > includes hard + soft header. This patch changes the length to
> > > sizeof(arc_hardware).
> > >
> > > Signed-off-by: Michael Grzeschik
> >
> > The hard header len is used for other purposes as well, are you sure
> > those don't get broken by this change?
>
> Its meaning is to represent the amount of the hardware (link layer)
> data of one packet.
>
> Which other purposes do you mean?
> Can you point to some code?
>
> > Code assumes that if the data at the SKB mac pointer is taken, for
> > dev->hard_header_len bytes, that is exactly the link layer header.
> > And that this can be used to compare two MAC headers, copy the
> > MAC header from one packet to another, etc.
>
> The link layer size of arcnet is 4 bytes long. 1 byte source, 1 byte
> dest and two offset bytes. As described by struct arc_hardware in
> if_arcnet.h . The above condition is fulfilled when the mac pointer
> is 0.
>
> The following pending bytes of struct archdr have a variable meaning
> depending of the used protocol and are represented by an union.
> (network layer)
>
> In the case of raw packets, the payload comes immediately after the
> hard_header.
>

Ping!

I have the cleanup patches from Joe Perches and several ARCNET patches
on top, waiting to be posted on the list. What is your opinion on the
maintainer request I sent some weeks ago?

Michael

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-     |
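
The size argument in the quoted reply (1 byte source, 1 byte dest, two
offset bytes, followed by a protocol-dependent union) can be checked with
a small sketch. The struct and function names below are hypothetical
stand-ins written from that description, not copied from if_arcnet.h, and
the 8-byte size of the soft part is an arbitrary placeholder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the link-layer ("hard") header as described:
 * 1 byte source, 1 byte dest, two offset bytes. */
struct demo_arc_hardware {
	uint8_t source;
	uint8_t dest;
	uint8_t offset[2];
};

/* Hypothetical stand-in for the full header: hard part plus a
 * protocol-dependent ("soft") part modelled as a union. */
struct demo_archdr {
	struct demo_arc_hardware hard;
	union {
		uint8_t raw[8];	/* placeholder size, for illustration only */
	} soft;
};

/* Length a driver would report if only the link-layer part counts. */
static size_t demo_hard_header_len(void)
{
	return sizeof(struct demo_arc_hardware);
}

/* Offset at which a raw payload would start: right after the hard header. */
static size_t demo_raw_payload_offset(void)
{
	return offsetof(struct demo_archdr, soft);
}
```

Both helpers return 4 here, while sizeof(struct demo_archdr) is larger,
which is the difference between the two candidate values for
hard_header_len discussed in the thread.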
Re: [patch net-next 2/3] mlxsw: expose EMAD transactions statistics via debugfs
On Thu, Aug 27, 2015 at 08:40:29AM +0200, Jiri Pirko wrote:
> Thu, Aug 27, 2015 at 08:36:03AM CEST, da...@davemloft.net wrote:
> >From: Jiri Pirko
> >Date: Thu, 27 Aug 2015 08:27:04 +0200
> >
> >> I'm not saying it is not possible, it certainly is. But I think that
> >> for example rocker internals have no value to default user, he
> >> should not care and he cannot find out what is going on there
> >> without knowledge of rocker.c code. The question is, do we need some
> >> standard interface to expose random debugging data? I don't think
> >> so, I think that debugfs is exactly the tool to be used in that
> >> case.
> >
> >If it is only interesting to rocker.c maintainer, he can keep a local
> >patch he applies when he needs such a facility.
> >
> >This discussion is becoming circular.
> >
> >If it's useful, it needs a well defined interface.
> >
> >If it's not useful, it doesn't belong in the tree.
> >
> >Therefore, debugfs is useless.
>
> Fair enough.

Late reply, sorry, but another idea is to leave the stats in place (as
they were going to be calculated even with debugfs unmounted) and (for
now at least) fetch them with systemtap, perf or something like that.

Then the stats are there for when you need them and with an interface
as flexible as it can get. Even if you happen to do a post-mortem
analysis, the info would at least be there.

  Marcelo
Re: [linux-next] oops in ip_route_input_noref
On 9/16/15 3:24 AM, Sergey Senozhatsky wrote:
> Hi,
>
> 4.3.0-rc1-next-20150916
>
> oops after removal of rndis usb device

Sergey: Can you send me the oops output?

Thanks,
David
Re: [linux-next] oops in ip_route_input_noref
On 9/16/15 7:53 AM, Richard Alpe wrote:
>>> I to get an Oops in ip_route_input_noref(). It happens occasionally during bootup.
>>> KVM environment using virtio driver. Let me know if you need any additional info or
>>> if you want me to try to bisect it.
>>>
>>> Starting network...
>>> ...
>>> [0.877040] BUG: unable to handle kernel NULL pointer dereference at 0056
>>> [0.877597] IP: [] ip_route_input_noref+0x1a2/0xb00
>>
>> Can you send me your kernel config and qemu command line? KVM with virtio
>> networking is a primary test vehicle, and I did not encounter this at all.
> Sure thing. Not sure how ppl normally provide files on netdev but I'm just going
> to go ahead and paste them here :)

An attachment for the config is better than inline.

> $ ps aux | grep kvm
> qemu-system-x86_64 -enable-kvm -name tipc-medium-node1 -S -machine pc-0.14,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid cdec478a-5f0d-49f1-b25e-fac4ca0b290c -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/tipc-medium-node1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot order=n,menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -netdev tap,fd=25,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=00:0f:ff:10:04:01,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:28101 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
>
> $ cat .config
> #
> # Automatically generated file; DO NOT EDIT.
> # Linux/x86 3.12.28 Kernel Configuration
> #

3.12.28? That should say this for net-next:

# Linux/x86 4.2.0 Kernel Configuration

Or are you reporting a problem with 3.12.28?

David
Re: [net-next:master 6/12] include/linux/usb/cdc.h:23: error: redefinition of 'struct usb_cdc_parsed_header'
On Tue, Sep 15, 2015 at 01:27:42PM -0700, David Miller wrote:
> From: kbuild test robot
> Date: Wed, 16 Sep 2015 03:57:11 +0800
>
> > All error/warnings (new ones prefixed by >>):
> >
> >    In file included from drivers/usb/gadget/function/u_ether.h:20,
> >                     from drivers/usb/gadget/legacy/cdc2.c:16:
> >    include/linux/usb/cdc.h:47: warning: 'struct usb_interface' declared inside parameter list
> >    include/linux/usb/cdc.h:47: warning: its scope is only this definition or declaration, which is probably not what you want
> >    In file included from drivers/usb/gadget/function/u_serial.h:16,
> >                     from drivers/usb/gadget/legacy/cdc2.c:17:
> > >> include/linux/usb/cdc.h:23: error: redefinition of 'struct usb_cdc_parsed_header'
> >    include/linux/usb/cdc.h:47: warning: 'struct usb_interface' declared inside parameter list
> > >> include/linux/usb/cdc.h:47: error: conflicting types for 'cdc_parse_cdc_header'
> >    include/linux/usb/cdc.h:47: error: previous declaration of 'cdc_parse_cdc_header' was here
>
> This may be a side effect of the initial warning, does this reproduce with
> that fixed? Please show me what the warning looks like in that case.

Dave, net-next/master commit ad1e7b97b3 ("cdc: Fix build warning.") still
has errors. The problem is, the header file is included twice.

recent_errors
├── arm-arm5
│   ├── include-linux-usb-cdc.h:error:conflicting-types-for-cdc_parse_cdc_header
│   └── include-linux-usb-cdc.h:error:redefinition-of-struct-usb_cdc_parsed_header
├── arm-arm67
│   ├── include-linux-usb-cdc.h:error:conflicting-types-for-cdc_parse_cdc_header
│   └── include-linux-usb-cdc.h:error:redefinition-of-struct-usb_cdc_parsed_header
├── arm-mmp
│   ├── include-linux-usb-cdc.h:error:conflicting-types-for-cdc_parse_cdc_header
│   └── include-linux-usb-cdc.h:error:redefinition-of-struct-usb_cdc_parsed_header
├── arm-omap2plus_defconfig
│   ├── include-linux-usb-cdc.h:error:conflicting-types-for-cdc_parse_cdc_header
│   └── include-linux-usb-cdc.h:error:redefinition-of-struct-usb_cdc_parsed_header
├── avr32-atngw100_defconfig
│   └── include-linux-usb-cdc.h:error:redefinition-of-struct-usb_cdc_parsed_header
├── avr32-atstk1006_defconfig
│   └── include-linux-usb-cdc.h:error:redefinition-of-struct-usb_cdc_parsed_header
└── i386-allmodconfig
    ├── include-linux-usb-cdc.h:error:conflicting-types-for-cdc_parse_cdc_header
    └── include-linux-usb-cdc.h:error:redefinition-of-struct-usb_cdc_parsed_header

The error messages are now:

In file included from drivers/usb/gadget/function/u_ether.h:20:0,
                 from drivers/usb/gadget/function/f_ncm.c:26:
include/linux/usb/cdc.h:23:8: error: redefinition of 'struct usb_cdc_parsed_header'
 struct usb_cdc_parsed_header {
        ^
In file included from drivers/usb/gadget/function/f_ncm.c:24:0:
include/linux/usb/cdc.h:23:8: note: originally defined here
 struct usb_cdc_parsed_header {
        ^
In file included from drivers/usb/gadget/function/u_ether.h:20:0,
                 from drivers/usb/gadget/function/f_ncm.c:26:
include/linux/usb/cdc.h:44:5: error: conflicting types for 'cdc_parse_cdc_header'
 int cdc_parse_cdc_header(struct usb_cdc_parsed_header *hdr,
     ^
In file included from drivers/usb/gadget/function/f_ncm.c:24:0:
include/linux/usb/cdc.h:44:5: note: previous declaration of 'cdc_parse_cdc_header' was here
 int cdc_parse_cdc_header(struct usb_cdc_parsed_header *hdr,
     ^

Thanks,
Fengguang
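A header that is pulled in twice normally survives only if it carries an include guard. The following is a minimal sketch of that idea, assuming a guard is the remedy (the thread does not show the eventual fix). The macro `EXAMPLE_CDC_H` and all `example_*` names are hypothetical, and the header body is pasted twice into one file to simulate two `#include`s of the same header.

```c
#include <assert.h>
#include <stddef.h>

/* First (simulated) inclusion of the header. */
#ifndef EXAMPLE_CDC_H
#define EXAMPLE_CDC_H
struct example_parsed_header {
	int n_descriptors;
};
int example_parse_header(struct example_parsed_header *hdr, int n);
#endif /* EXAMPLE_CDC_H */

/* Second (simulated) inclusion: the guard turns this copy into a no-op,
 * so the struct is not defined a second time. */
#ifndef EXAMPLE_CDC_H
#define EXAMPLE_CDC_H
struct example_parsed_header {
	int n_descriptors;
};
int example_parse_header(struct example_parsed_header *hdr, int n);
#endif /* EXAMPLE_CDC_H */

/* Hypothetical parser: records n and reports success, or -1 on bad input. */
int example_parse_header(struct example_parsed_header *hdr, int n)
{
	if (hdr == NULL || n < 0)
		return -1;
	hdr->n_descriptors = n;
	return 0;
}
```

Without the `#ifndef`/`#define` pair, the second copy of the struct definition would be expected to trigger the same kind of "redefinition of 'struct ...'" error quoted above.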
Re: IPv6 routing/fragmentation panic
On Wed, 2015-09-16 at 15:27 +0200, Florian Westphal wrote:
>
> David, could you test this? I'd do an official patch submission
> then.

Compiles. Doesn't fix the problem.

--
dwmw2
Re: [linux-next] oops in ip_route_input_noref
On 9/16/15 7:59 AM, Richard Alpe wrote:
> Sorry about that kvm cmdline was a copy-paste error. Here's the right one
> using virtio.

I was just about to respond to that as well...
Re: [linux-next] oops in ip_route_input_noref
On 2015-09-16 15:57, David Ahern wrote: > On 9/16/15 7:53 AM, Richard Alpe wrote: I to get an Oops in ip_route_input_noref(). It happens occasionally during bootup. KVM environment using virtio driver. Let me know if you need any additional info or if you want me to try to bisect it. Starting network... ... [0.877040] BUG: unable to handle kernel NULL pointer dereference at 0056 [0.877597] IP: [] ip_route_input_noref+0x1a2/0xb00 >>> >>> Can you send me your kernel config and qemu command line? KVM with virtio >>> networking is a primary test vehicle, and I did not encounter this at all. >> Sure thing. Not sure how ppl normally provide files on netdev but I'm just >> going >> to go ahead and paste them here :) > > An attachment for the config is better than inline. Fantastic day today, I managed to mess up two out of two copy pastes. Sorry about that.. Here is the proper kconfig as .gz :) Regards Richard > >> >> $ ps aux | grep kvm >> qemu-system-x86_64 -enable-kvm -name tipc-medium-node1 -S -machine >> pc-0.14,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp >> 2,sockets=2,cores=1,threads=1 -uuid cdec478a-5f0d-49f1-b25e-fac4ca0b290c >> -no-user-config -nodefaults -chardev >> socket,id=charmonitor,path=/var/lib/libvirt/qemu/tipc-medium-node1.monitor,server,nowait >> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown >> -boot order=n,menu=on,strict=on -device >> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -netdev tap,fd=25,id=hostnet0 >> -device >> e1000,netdev=hostnet0,id=net0,mac=00:0f:ff:10:04:01,bus=pci.0,addr=0x3 >> -chardev pty,id=charserial0 -device >> isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:28101 -device >> cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device >> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on >> >> $ cat .config >> # >> # Automatically generated file; DO NOT EDIT. >> # Linux/x86 3.12.28 Kernel Configuration >> # > > 3.12.28? 
That should say this for net-next: > > # Linux/x86 4.2.0 Kernel Configuration > > Or are you reporting a problem with 3.12.28? > > David config.gz Description: application/gzip
Re: [PATCH 2/2] airo: Implement netif_carrier_on/off
Hello.

On 9/15/2015 6:18 PM, Ondrej Zary wrote:

> Add calls to netif_carrier_on and netif_carrier_off
>
> Signed-off-by: Ondrej Zary
> ---
>  drivers/net/wireless/airo.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/airo.c b/drivers/net/wireless/airo.c
> index a8f2767..629245c 100644
> --- a/drivers/net/wireless/airo.c
> +++ b/drivers/net/wireless/airo.c
[...]
> @@ -3277,7 +3278,9 @@ static void airo_handle_link(struct airo_info *ai)
>  		eth_zero_addr(wrqu.ap_addr.sa_data);
>  		wrqu.ap_addr.sa_family = ARPHRD_ETHER;
>  		wireless_send_event(ai->dev, SIOCGIWAP, &wrqu, NULL);
> -	}
> +		netif_carrier_off(ai->dev);
> +	} else
> +		netif_carrier_off(ai->dev);

   Need {} in all branches, according to Documentation/CodingStyle.

>  }
>
>  static void airo_handle_rx(struct airo_info *ai)
[...]

MBR, Sergei
Re: [linux-next] oops in ip_route_input_noref
On Wed, Sep 16, 2015 at 6:24 AM, Sergey Senozhatsky <sergey.senozhatsky.w...@gmail.com> wrote: > added by b7503e0cdb5dbec5d201aa69dc14679b5ae8 > > net: Add FIB table id to rtable > > Add the FIB table id to rtable to make the information available for > IPv4 as it is for IPv6. I see the same issue here when booting a mx25 ARM processor via NFS. defconfig is arch/arm/configs/imx_v4_v5_defconfig. It happens in 100% of the boots and the log is: fec 50038000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx Sending DHCP requests . Unable to handle kernel NULL pointer dereference at virtual address 0007 pgd = c0004000 [0007] *pgd= Internal error: Oops: 1 [#1] PREEMPT ARM Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.3.0-rc1-next-20150916-dirty #96 Hardware name: Freescale i.MX25 (Device Tree Support) task: c06ac1d0 ti: c06a8000 task.ti: c06a8000 PC is at ip_route_input_noref+0x3d8/0x808 LR is at __local_bh_enable_ip+0x5c/0xdc pc : []lr : []psr: a013 sp : c06a9cb0 ip : 000a fp : r10: c39b7000 r9 : c39c8d00 r8 : 1e00a8c0 r7 : c39c04a0 r6 : r5 : c3969a00 r4 : ff8f r3 : r2 : 0001 r1 : c0438410 r0 : c3969a00 Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 0005317f Table: 80004000 DAC: 0053 Process swapper (pid: 0, stack limit = 0xc06a8190) Stack: (0xc06a9cb0 to 0xc06aa000) 9ca0: 0044 c06ab93c 9cc0: 0100a8c0 c06f0540 c3a8f74e 0002 00070044 c043bba8 9ce0: c06a9d24 0002 1e00 0100a8c0 9d00: 0100a8c0 1e00a8c0 c3a8f720 0001 c3a8f74e 9d20: c39c04a0 c39c04a0 002e c3a8f720 0001 c06fa580 c043bbbc 9d40: c39b7000 c06a9d60 c3a8f720 c39b7000 c39c04a0 c06f0540 002e c043c3a0 9d60: c3a8f896 c06929f4 c39b7000 c30d3ce0 c06a9d78 0100a8c0 c0401190 c06ab9d8 9d80: 0008 c39b7048 0008 c06ab9d8 c39b7000 c39b705c c39c04a0 c040dfa0 9da0: c39c04a0 c3a8f74e 002e c39b753c c39c04a0 c39c04a0 c06ab9c0 c39b705c 9dc0: c39b7520 0008 c39c04a0 c04117c0 9de0: 08e0 c39c04a0 c39c04a0 c39b7520 0001 9e00: c39b7000 c0411400 0003 c04120cc c485d000 0800 9e20: c39c04a0 
c02fde40 c06ac1d0 c06b1c38 0040 c39b7030 c39b7040 9e40: c3943000 0002 c30d39e0 9e60: c39b74b8 0040 c39b7460 c39b7520 c06b3ce0 c383de14 e1e6cf80 c39b7520 9e80: 0001 0040 012c c06a9ea8 8c4d c06b4500 c06fa580 c0411bdc 9ea0: c06a9eb0 8c4d c06a9ea8 c06a9ea8 c06a9eb0 c06a9eb0 0001 9ec0: 0008 0003 c06fd76c c06fa8d0 0101 0004 000c c001bc24 9ee0: c39d2080 0001 000a 8c4c 0020 9f00: c06cf3a4 0001 c06a9f58 41069264 c3802000 c001c148 9f20: c004c958 c06a9f58 c06fd284 c06a9f58 c06a9f8c c06fa69d 9f40: c06b3034 c0009404 c000ac20 6013 c04b5c64 0005317f 9f60: 0005217f 6013 c06aa0f4 c06fae98 c06fa69d c06fae98 c06fa69d 41069264 9f80: c06b3034 60d3 c06a9fa8 c000ac30 c000ac20 6013 9fa0: 0053 c06fae98 c0041724 c06ac1d0 c065ebc4 9fc0: c065e670 c06978bc c06fd174 9fe0: c06aa094 c06978b8 c06ad120 80004000 80695fb8 80008048 [] (ip_route_input_noref) from [] (ip_rcv_finish+0xe8/0x31c) [] (ip_rcv_finish) from [] (ip_rcv+0x2b4/0x3d4) [] (ip_rcv) from [] (__netif_receive_skb_core+0x304/0x944) [] (__netif_receive_skb_core) from [] (netif_receive_skb_internal+0x28/0x78) [] (netif_receive_skb_internal) from [] (napi_gro_receive+0x88/0x130) [] (napi_gro_receive) from [] (fec_enet_rx_napi+0x404/0xa78) [] (fec_enet_rx_napi) from [] (net_rx_action+0xf8/0x334) [] (net_rx_action) from [] (__do_softirq+0x11c/0x3a0) [] (__do_softirq) from [] (irq_exit+0xac/0xf8) [] (irq_exit) from [] (__handle_domain_irq+0x64/0xd0) [] (__handle_domain_irq) from [] (avic_handle_irq+0x34/0x54) [] (avic_handle_irq) from [] (__irq_svc+0x44/0x78) Exception stack(0xc06a9f58 to 0xc06a9fa0) 9f40: 0005317f 9f60: 0005217f 6013 c06aa0f4 c06fae98 c06fa69d c06fae98 c06fa69d 41069264 9f80: c06b3034 60d3 c06a9fa8 c000ac30 c000ac20 6013 [] (__irq_svc) from [] (arch_cpu_idle+0x28/0x44) [] (arch_cpu_idle) from [] (cpu_startup_entry+0x118/0x2bc) [] (cpu_startup_entry) from [] (start_kernel+0x308/0x368) [] (start_kernel) from [<80008048>] (0x80008048) Code: e3a02001 e353 e585102c e5c5205e (15933008) ---[ end trace 443993f61e8bf0a0 ]--- Kernel 
panic - not syncing: Fatal exception in interrupt ---[ end Kernel panic - not syncing: Fatal exception in interrupt
Re: [linux-next] oops in ip_route_input_noref
On 9/16/15 9:00 AM, Fabio Estevam wrote:
> On Wed, Sep 16, 2015 at 6:24 AM, Sergey Senozhatsky wrote:
>> added by b7503e0cdb5dbec5d201aa69dc14679b5ae8
>>
>>     net: Add FIB table id to rtable
>>
>>     Add the FIB table id to rtable to make the information available for
>>     IPv4 as it is for IPv6.
>
> I see the same issue here when booting a mx25 ARM processor via NFS.
> defconfig is arch/arm/configs/imx_v4_v5_defconfig.

I am still not able to reproduce. While I work on a full Cumulus image for
other test cases here's a patch to try; eagle eye Nikolay noted a potential
use without init in the maze of goto's.

Thanks,
David

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index da427a4a33fe..80f7c5b7b832 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1712,6 +1712,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		goto martian_source;
 
 	res.fi = NULL;
+	res.table = NULL;
 	if (ipv4_is_lbcast(daddr) || (saddr == 0 && daddr == 0))
 		goto brd_input;
 
@@ -1834,6 +1835,7 @@ out:	return err;
 	RT_CACHE_STAT_INC(in_no_route);
 	res.type = RTN_UNREACHABLE;
 	res.fi = NULL;
+	res.table = NULL;
 	goto local_input;
 
 	/*
[PATCH net] ipv6: ip6_fragment: fix headroom tests and skb leak
David Woodhouse reports skb_under_panic when we try to push ethernet
header to fragmented ipv6 skbs:

 skbuff: skb_under_panic: text:c1277f1e len:1294 put:14 head:dec98000
 data:dec97ffc tail:0xdec9850a end:0xdec98f40 dev:br-lan
[..]
ip6_finish_output2+0x196/0x4da

David further debugged this:
  [..] offending fragments were arriving here with skb_headroom(skb)==10.
  Which is reasonable, being the Solos ADSL card's header of 8 bytes
  followed by 2 bytes of PPP frame type.

The problem is that if netfilter ipv6 defragmentation is used, skb_cow()
in ip6_forward will only see reassembled skb.

Therefore, headroom is overestimated by 8 bytes (we pulled fragment
header) and we don't check the skbs in the frag_list either.

We can't do these checks in netfilter defrag since outdev isn't known yet.

Furthermore, existing tests in ip6_fragment did not consider the fragment
or ipv6 header size when checking headroom of the fraglist skbs.

While at it, also fix a skb leak on memory allocation -- ip6_fragment
must consume the skb.

I tested this with an e1000 driver hacked to not allocate additional
headroom (we end up in slowpath, since LL_RESERVED_SPACE is 16).

If 2 bytes of headroom are allocated, fastpath is taken (14 byte
ethernet header was pulled, so 16 byte headroom available in all
fragments).

Reported-by: David Woodhouse
Diagnosed-by: David Woodhouse
Signed-off-by: Florian Westphal
---
 net/ipv6/ip6_output.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 26ea479..92b1aa3 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -586,20 +586,22 @@ int ip6_fragment(struct sock *sk, struct sk_buff *skb,
 	frag_id = ipv6_select_ident(net, &ipv6_hdr(skb)->daddr,
 				    &ipv6_hdr(skb)->saddr);
 
+	hroom = LL_RESERVED_SPACE(rt->dst.dev);
 	if (skb_has_frag_list(skb)) {
 		int first_len = skb_pagelen(skb);
 		struct sk_buff *frag2;
 
 		if (first_len - hlen > mtu ||
 		    ((first_len - hlen) & 7) ||
-		    skb_cloned(skb))
+		    skb_cloned(skb) ||
+		    skb_headroom(skb) < (hroom + sizeof(struct frag_hdr)))
 			goto slow_path;
 
 		skb_walk_frags(skb, frag) {
 			/* Correct geometry. */
 			if (frag->len > mtu ||
 			    ((frag->len & 7) && frag->next) ||
-			    skb_headroom(frag) < hlen)
+			    skb_headroom(frag) < (hlen + hroom + sizeof(struct frag_hdr)))
 				goto slow_path_clean;
 
 			/* Partially cloned skb? */
@@ -616,8 +618,6 @@ int ip6_fragment(struct sock *sk, struct sk_buff *skb,
 
 		err = 0;
 		offset = 0;
-		frag = skb_shinfo(skb)->frag_list;
-		skb_frag_list_init(skb);
 		/* BUILD HEADER */
 
 		*prevhdr = NEXTHDR_FRAGMENT;
@@ -625,8 +625,11 @@ int ip6_fragment(struct sock *sk, struct sk_buff *skb,
 		if (!tmp_hdr) {
 			IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
 				      IPSTATS_MIB_FRAGFAILS);
-			return -ENOMEM;
+			err = -ENOMEM;
+			goto fail;
 		}
+		frag = skb_shinfo(skb)->frag_list;
+		skb_frag_list_init(skb);
 
 		__skb_pull(skb, hlen);
 		fh = (struct frag_hdr *)__skb_push(skb, sizeof(struct frag_hdr));
@@ -723,7 +726,6 @@ slow_path:
 	 */
 
 	*prevhdr = NEXTHDR_FRAGMENT;
-	hroom = LL_RESERVED_SPACE(rt->dst.dev);
 	troom = rt->dst.dev->needed_tailroom;
 
 	/*
-- 
2.0.5
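The two headroom conditions the patch adds can be modelled in userspace as plain arithmetic. This is only a sketch under stated assumptions: the function names and `EXAMPLE_FRAG_HDR_LEN` are hypothetical, the 8-byte fragment header size comes from the description above, and the 40-byte `hlen` used in the usage note assumes a bare IPv6 header with no extension headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_FRAG_HDR_LEN 8u	/* IPv6 fragment header, in bytes */

/* Mirrors the new test on the head skb: the fast path is only usable if
 * the link-layer reserve plus the fragment header fit in the headroom. */
static bool example_head_skb_fits(size_t headroom, size_t hroom)
{
	return headroom >= hroom + EXAMPLE_FRAG_HDR_LEN;
}

/* Mirrors the new test on each frag_list skb: the IPv6 header(s) of
 * length hlen must fit as well. */
static bool example_frag_skb_fits(size_t headroom, size_t hlen, size_t hroom)
{
	return headroom >= hlen + hroom + EXAMPLE_FRAG_HDR_LEN;
}
```

With the reported headroom of 10 and an LL_RESERVED_SPACE of 16, `example_head_skb_fits(10, 16)` is false, so such a packet would take the slow path. A fragment with 58 bytes of headroom passes the old `headroom < hlen` test for `hlen` 40 but fails the new one, since 40 + 16 + 8 = 64.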
[PATCH net-next] net: Initialize table in fib result
Sergey, Richard and Fabio reported an oops in ip_route_input_noref. e.g., from Richard: [0.877040] BUG: unable to handle kernel NULL pointer dereference at 0056 [0.877597] IP: [] ip_route_input_noref+0x1a2/0xb00 [0.877597] PGD 3fa14067 PUD 3fa6e067 PMD 0 [0.877597] Oops: [#1] SMP [0.877597] Modules linked in: virtio_net virtio_pci virtio_ring virtio [0.877597] CPU: 1 PID: 119 Comm: ifconfig Not tainted 4.2.0+ #1 [0.877597] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [0.877597] task: 88003fab0bc0 ti: 88003faa8000 task.ti: 88003faa8000 [0.877597] RIP: 0010:[] [] ip_route_input_noref+0x1a2/0xb00 [0.877597] RSP: 0018:88003ed03ba0 EFLAGS: 00010202 [0.877597] RAX: 0046 RBX: ff8f RCX: 0020 [0.877597] RDX: 88003fab50b8 RSI: 0200 RDI: 8152b4b8 [0.877597] RBP: 88003ed03c50 R08: R09: [0.877597] R10: R11: R12: 88003fab6f00 [0.877597] R13: 88003fab5000 R14: R15: 81cb5600 [0.877597] FS: 7f6de5751700() GS:88003ed0() knlGS: [0.877597] CS: 0010 DS: ES: CR0: 80050033 [0.877597] CR2: 0056 CR3: 3fa6d000 CR4: 06e0 [0.877597] Stack: [0.877597] 0046 88003fffa600 88003ed03be0 [0.877597] 88003f9e2c00 697da8c0017da8c0 8800 0007fd00 [0.877597] 0046 0004 [0.877597] Call Trace: [0.877597] [0.877597] [] ? cpumask_next_and+0x2f/0x40 [0.877597] [] arp_process+0x39c/0x690 [0.877597] [] arp_rcv+0x13e/0x170 [0.877597] [] __netif_receive_skb_core+0x60c/0xa00 [0.877597] [] ? __build_skb+0x25/0x100 [0.877597] [] ? __build_skb+0x25/0x100 [0.877597] [] __netif_receive_skb+0x16/0x70 [0.877597] [] netif_receive_skb_internal+0x28/0x90 [0.877597] [] napi_gro_receive+0x7f/0xd0 [0.877597] [] virtnet_receive+0x256/0x910 [virtio_net] [0.877597] [] virtnet_poll+0x18/0x80 [virtio_net] [0.877597] [] net_rx_action+0x1dd/0x2f0 [0.877597] [] __do_softirq+0x98/0x260 [0.877597] [] do_softirq_own_stack+0x1c/0x30 The root cause is use of res.table uninitialized. Thanks to Nikolay for noticing the uninitialized use amongst the maze of gotos. 
Fixes: b7503e0cdb5d ("net: Add FIB table id to rtable")
Reported-by: Sergey Senozhatsky
Reported-by: Richard Alpe
Reported-by: Fabio Estevam
Signed-off-by: David Ahern
---
 net/ipv4/route.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index da427a4a33fe..80f7c5b7b832 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1712,6 +1712,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		goto martian_source;
 
 	res.fi = NULL;
+	res.table = NULL;
 	if (ipv4_is_lbcast(daddr) || (saddr == 0 && daddr == 0))
 		goto brd_input;
 
@@ -1834,6 +1835,7 @@ out:	return err;
 	RT_CACHE_STAT_INC(in_no_route);
 	res.type = RTN_UNREACHABLE;
 	res.fi = NULL;
+	res.table = NULL;
 	goto local_input;
 
 	/*
-- 
2.3.2 (Apple Git-55)
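The failure mode (a result struct whose new field is read on a goto path that never set it) can be shown with a small userspace model. This is a sketch, not the kernel code: `struct example_result` and the `example_*` functions are hypothetical, and the struct is zeroed as a whole here, whereas the patch assigns res.table = NULL at two specific places.

```c
#include <assert.h>
#include <stddef.h>

struct example_result {
	const void *info;	/* stands in for res.fi */
	const int *table;	/* stands in for res.table */
	int type;
};

/* Returns the table id, or 0 when no table is attached. */
static int example_table_id(const struct example_result *res)
{
	return res->table ? *res->table : 0;
}

static int example_route(int take_shortcut, const int *table)
{
	/* Zero every field up front so that no goto can reach "out"
	 * with an indeterminate pointer in res.table. */
	struct example_result res = { 0 };

	if (take_shortcut)
		goto out;	/* skips the assignments below */

	res.table = table;
	res.type = 1;
out:
	return example_table_id(&res);
}
```

If `res` were left uninitialized, the shortcut path would hand `example_table_id()` whatever happened to be on the stack, which matches the small bogus fault addresses seen in the reports.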
Re: Experiences with slub bulk use-case for network stack
On Wed, 16 Sep 2015, Jesper Dangaard Brouer wrote:

>
> Hint, this leads up to discussing if current bulk *ALLOC* API need to
> be changed...
>
> Alex and I have been working hard on practical use-case for SLAB
> bulking (mostly slUb), in the network stack. Here is a summary of
> what we have learned so far.

SLAB refers to the SLAB allocator which is one slab allocator and SLUB is
another slab allocator.

Please keep that consistent, otherwise things get confusing.

> Bulk free'ing SKBs during TX completion is a big and easy win.
>
> Specifically for slUb, normal path for freeing these objects (which
> are not on c->freelist) require a locked double_cmpxchg per object.
> The bulk free (via detached freelist patch) allow to free all objects
> belonging to the same slab-page, to be free'ed with a single locked
> double_cmpxchg. Thus, the bulk free speedup is quite an improvement.

Yep.

> Alex and I had the idea of bulk alloc returns an "allocator specific
> cache" data-structure (and we add some helpers to access this).

Maybe add some Macros to handle this?

> In the slUb case, the freelist is a single linked pointer list. In
> the network stack the skb objects have a skb->next pointer, which is
> located at the same position as freelist pointer. Thus, simply
> returning the freelist directly, could be interpreted as a skb-list.
> The helper API would then do the prefetching, when pulling out
> objects.

The problem with the SLUB case is that the objects must be on the same
slab page.

> For the slUb case, we would simply cmpxchg either c->freelist or
> page->freelist with a NULL ptr, and then own all objects on the
> freelist. This also reduce the time we keep IRQs disabled.

You don't need to disable interrupts for the cmpxchges. There is
additional state in the page struct though so the updates must be done
carefully.
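The "swap the freelist for NULL and own everything on it" idea can be sketched with C11 atomics in userspace. This is a model only, assuming a bare singly linked list: the `example_*` names are hypothetical, and real SLUB keeps additional per-page state alongside the freelist pointer (as noted above), which this sketch ignores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct example_obj {
	struct example_obj *next;	/* plays the role of the freelist link / skb->next */
	int id;
};

struct example_slab {
	_Atomic(struct example_obj *) freelist;
};

/* Atomically take ownership of the whole freelist, leaving it empty.
 * The caller may then walk the returned list without further atomics. */
static struct example_obj *example_detach_all(struct example_slab *slab)
{
	return atomic_exchange(&slab->freelist, (struct example_obj *)NULL);
}

/* Walk a detached list and count its objects. */
static size_t example_count(const struct example_obj *head)
{
	size_t n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return n;
}
```

Because the link sits in the object itself, the detached list can be handed to a consumer as-is, which is the property the skb->next observation relies on.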
[PATCH RFC 0/3] Allow postponed netfilter handling for socket matches
I'm re-addressing the issue of matching socket meta information for
non-established sockets that has been discussed a while ago:

  http://article.gmane.org/gmane.comp.security.firewalls.netfilter.devel/56877

Being able to reliably match on net_cls cgroup ids is crucial in order
to build per-application or per-container firewall rules which don't
leak ingress packets. Such a feature would be very useful to have.

A previous attempt to fix the currently existing issues was to call out
to the early demuxing helper functions from the meta matching callbacks,
but that doesn't suffice because it doesn't address the case of
multicast UDP and other, more complex lookup methods implemented in
various protocol handlers.

This patch set outlines a different approach by adding a flag to
'struct sk_buff' called 'nf_postponed'. This flag is set by
nft_meta_get_eval() in case a decision cannot be made due to a missing
skb->sk. skbs flagged that way will then be run through the netfilter
chain processor again after the protocol handlers did the real socket
lookup. A small addition to 'struct nft_pktinfo' is needed so that the
matching callbacks can access the socket that was passed into nf_hook().

Note that the new flag does not actually bloat 'struct sk_buff', because
it still fits into the 'flags1' bitfield. Also, the extra netfilter
chain iteration will not be done by any subsequent packet in the same
stream, as for those, the early demux code will set skb->sk.

The patch set is obviously not yet finished, because a lot more protocol
handlers need to be patched. Right now, I only addressed tcp_ipv4.
Before I do that, I want to get some feedback on the approach, so please
let me know what you think.

Thanks,
Daniel

Daniel Mack (3):
  netfilter: add socket to struct nft_pktinfo
  netfilter: nft_meta: mark skbs for postponed filter processing
  net: tcp_ipv4: re-run netfilter chains for marked skbs

 include/linux/skbuff.h            |  3 ++-
 include/net/netfilter/nf_tables.h |  2 ++
 net/ipv4/tcp_ipv4.c               | 10 ++++++++++
 net/netfilter/nft_meta.c          |  9 ++++++---
 4 files changed, 20 insertions(+), 4 deletions(-)

-- 
2.5.0
[PATCH RFC 2/3] netfilter: nft_meta: mark skbs for postponed filter processing
When the cgroup matching code in nft_meta is called without a socket to
look at, it currently bails out and lets the packet pass. This is bad,
because the reason for skb->sk being NULL is simply that the packet was
directed to a socket that hasn't been looked up yet by early demux.

This patch does two things:

 a) it uses the newly introduced pkt->sk pointer rather than skb->sk
    to check for the net class ID. This allows us to look at the
    socket the user passed into nf_hook().

 b) in case the socket can't be accessed, it marks the skb as
    'nf_postponed', so that later dispatchers have a chance to
    re-iterate the chain for such packets, after a full demux was
    conducted.

Note that the added flag in 'struct skb' does not increase the size of
the struct, as it fits in the 'flags1' bitfield.

Signed-off-by: Daniel Mack
---
 include/linux/skbuff.h   | 3 ++-
 net/netfilter/nft_meta.c | 9 ++++++---
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2738d35..3590101 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -584,7 +584,8 @@ struct sk_buff {
 				fclone:2,
 				peeked:1,
 				head_frag:1,
-				xmit_more:1;
+				xmit_more:1,
+				nf_postponed:1;
 	/* one bit hole */
 	kmemcheck_bitfield_end(flags1);
 
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index cb2f13e..33b8d23 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -29,8 +29,9 @@ void nft_meta_get_eval(const struct nft_expr *expr,
 		       const struct nft_pktinfo *pkt)
 {
 	const struct nft_meta *priv = nft_expr_priv(expr);
-	const struct sk_buff *skb = pkt->skb;
 	const struct net_device *in = pkt->in, *out = pkt->out;
+	struct sk_buff *skb = pkt->skb;
+	struct sock *sk = pkt->sk;
 	u32 *dest = &regs->data[priv->dreg];
 
 	switch (priv->key) {
@@ -168,9 +169,11 @@ void nft_meta_get_eval(const struct nft_expr *expr,
 		break;
 #ifdef CONFIG_CGROUP_NET_CLASSID
 	case NFT_META_CGROUP:
-		if (skb->sk == NULL || !sk_fullsock(skb->sk))
+		if (sk == NULL || !sk_fullsock(sk)) {
+			skb->nf_postponed = 1;
 			goto err;
-		*dest = skb->sk->sk_classid;
+		}
+		*dest = sk->sk_classid;
 		break;
 #endif
 	default:
-- 
2.5.0
[PATCH RFC 3/3] net: tcp_ipv4: re-run netfilter chains for marked skbs
When an skb has been marked for later re-iteration through netfilter, do
that after __inet_lookup_skb() has been called. This allows packets sent
to unconnected sockets to be filtered reliably.

Note that this will never happen for subsequent packets in the same
stream, as skb->sk will be set due to early demux, and hence
skb->nf_postponed will remain 0.

Signed-off-by: Daniel Mack
---
 net/ipv4/tcp_ipv4.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 93898e0..61e0cb4 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -78,6 +78,7 @@
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1594,6 +1595,15 @@ int tcp_v4_rcv(struct sk_buff *skb)
 	if (!sk)
 		goto no_tcp_socket;
 
+	if (unlikely(skb->nf_postponed)) {
+		ret = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_IN, sk,
+			      skb, skb->dev, NULL, NULL);
+		if (ret != 1) {
+			sock_put(sk);
+			return 0;
+		}
+	}
+
 process:
 	if (sk->sk_state == TCP_TIME_WAIT)
 		goto do_time_wait;
-- 
2.5.0
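The postpone-and-rerun flow described in this series can be modelled in userspace. This is a sketch under assumptions rather than kernel code: every `example_*` name is hypothetical, the "chain" is a single rule that accepts one class id, and a missing socket on the first pass is modelled as "accept for now", following the description that the packet is currently let through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum example_verdict { EXAMPLE_DROP = 0, EXAMPLE_ACCEPT = 1 };

struct example_sock { unsigned int classid; };

struct example_skb {
	struct example_sock *sk;	/* NULL until a socket lookup has run */
	bool postponed;			/* models the proposed nf_postponed bit */
};

/* One-rule chain: accept only sockets that belong to wanted_classid. */
static enum example_verdict example_chain(struct example_skb *skb,
					  struct example_sock *sk,
					  unsigned int wanted_classid)
{
	if (sk == NULL) {
		skb->postponed = true;	/* cannot decide yet */
		return EXAMPLE_ACCEPT;	/* let the packet continue for now */
	}
	return sk->classid == wanted_classid ? EXAMPLE_ACCEPT : EXAMPLE_DROP;
}

/* Models the receive path: early pass, socket lookup, optional second pass. */
static enum example_verdict example_receive(struct example_skb *skb,
					    struct example_sock *looked_up,
					    unsigned int wanted_classid)
{
	if (example_chain(skb, skb->sk, wanted_classid) == EXAMPLE_DROP)
		return EXAMPLE_DROP;
	if (skb->postponed)	/* re-run the chain with the real socket */
		return example_chain(skb, looked_up, wanted_classid);
	return EXAMPLE_ACCEPT;
}
```

A packet that already carries a socket (the early demux case) never sets the flag, so it is evaluated once, which is the behaviour the commit message claims for subsequent packets of a stream.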
Re: [PATCH net-next] net: Initialize table in fib result
On 09/16/2015 05:38 PM, David Ahern wrote: > Sergey, Richard and Fabio reported an oops in ip_route_input_noref. e.g., > from Richard: > > [0.877040] BUG: unable to handle kernel NULL pointer dereference at > 0056 > [0.877597] IP: [] ip_route_input_noref+0x1a2/0xb00 > [0.877597] PGD 3fa14067 PUD 3fa6e067 PMD 0 > [0.877597] Oops: [#1] SMP > [0.877597] Modules linked in: virtio_net virtio_pci virtio_ring virtio > [0.877597] CPU: 1 PID: 119 Comm: ifconfig Not tainted 4.2.0+ #1 > [0.877597] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > [0.877597] task: 88003fab0bc0 ti: 88003faa8000 task.ti: > 88003faa8000 > [0.877597] RIP: 0010:[] [] > ip_route_input_noref+0x1a2/0xb00 > [0.877597] RSP: 0018:88003ed03ba0 EFLAGS: 00010202 > [0.877597] RAX: 0046 RBX: ff8f RCX: > 0020 > [0.877597] RDX: 88003fab50b8 RSI: 0200 RDI: > 8152b4b8 > [0.877597] RBP: 88003ed03c50 R08: R09: > > [0.877597] R10: R11: R12: > 88003fab6f00 > [0.877597] R13: 88003fab5000 R14: R15: > 81cb5600 > [0.877597] FS: 7f6de5751700() GS:88003ed0() > knlGS: > [0.877597] CS: 0010 DS: ES: CR0: 80050033 > [0.877597] CR2: 0056 CR3: 3fa6d000 CR4: > 06e0 > [0.877597] Stack: > [0.877597] 0046 88003fffa600 > 88003ed03be0 > [0.877597] 88003f9e2c00 697da8c0017da8c0 8800 > 0007fd00 > [0.877597] 0046 > 0004 > [0.877597] Call Trace: > [0.877597] > [0.877597] [] ? cpumask_next_and+0x2f/0x40 > [0.877597] [] arp_process+0x39c/0x690 > [0.877597] [] arp_rcv+0x13e/0x170 > [0.877597] [] __netif_receive_skb_core+0x60c/0xa00 > [0.877597] [] ? __build_skb+0x25/0x100 > [0.877597] [] ? __build_skb+0x25/0x100 > [0.877597] [] __netif_receive_skb+0x16/0x70 > [0.877597] [] netif_receive_skb_internal+0x28/0x90 > [0.877597] [] napi_gro_receive+0x7f/0xd0 > [0.877597] [] virtnet_receive+0x256/0x910 [virtio_net] > [0.877597] [] virtnet_poll+0x18/0x80 [virtio_net] > [0.877597] [] net_rx_action+0x1dd/0x2f0 > [0.877597] [] __do_softirq+0x98/0x260 > [0.877597] [] do_softirq_own_stack+0x1c/0x30 > > The root cause is use of res.table uninitialized. 
> > Thanks to Nikolay for noticing the uninitialized use amongst the maze of > gotos. > > Fixes: b7503e0cdb5d ("net: Add FIB table id to rtable") > Reported-by: Sergey Senozhatsky> Reported-by: Richard Alpe > Reported-by: Fabio Estevam > Signed-off-by: David Ahern > --- > net/ipv4/route.c | 2 ++ > 1 file changed, 2 insertions(+) > Just to have it documented: I don't think we need the second NULLing, but it doesn't hurt. Thanks, Signed-off-by: Nikolay Aleksandrov -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC 1/3] netfilter: add socket to struct nft_pktinfo
The high-level netfilter hook API already enables users to pass a
socket, but that information is lost when the chains are walked.

In order to let internal eval callbacks use the passed socket rather
than skb->sk, add a pointer of type 'struct sock' to 'struct
nft_pktinfo' and set that field via nft_set_pktinfo().

This allows us to run filter chains from situations where skb->sk is
unset. Fall back to skb->sk in case state->sk is NULL, so filter
callbacks can be written in a generic way.

Signed-off-by: Daniel Mack
---
 include/net/netfilter/nf_tables.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index aa8bee7..05e97ed 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -13,6 +13,7 @@
 #define NFT_JUMP_STACK_SIZE	16
 
 struct nft_pktinfo {
+	struct sock			*sk;
 	struct sk_buff			*skb;
 	const struct net_device		*in;
 	const struct net_device		*out;
@@ -29,6 +30,7 @@ static inline void nft_set_pktinfo(struct nft_pktinfo *pkt,
 				   struct sk_buff *skb,
 				   const struct nf_hook_state *state)
 {
+	pkt->sk = state->sk ?: skb->sk;
 	pkt->skb = skb;
 	pkt->in = pkt->xt.in = state->in;
 	pkt->out = pkt->xt.out = state->out;
-- 
2.5.0
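The fallback rule ("use the socket handed to the hook, else the one on the skb") is small enough to model directly. A minimal sketch with hypothetical `example_*` types; the patch uses the GNU `a ?: b` shorthand, which is spelled out here as a standard conditional so the sketch does not depend on that extension.

```c
#include <assert.h>
#include <stddef.h>

struct example_sock { int id; };
struct example_skb { struct example_sock *sk; };
struct example_hook_state { struct example_sock *sk; };

struct example_pktinfo {
	struct example_sock *sk;
	struct example_skb *skb;
};

static void example_set_pktinfo(struct example_pktinfo *pkt,
				struct example_skb *skb,
				const struct example_hook_state *state)
{
	/* Prefer the socket passed to the hook; fall back to skb->sk. */
	pkt->sk = state->sk ? state->sk : skb->sk;
	pkt->skb = skb;
}
```

When both pointers are NULL the result is NULL, so callbacks still have to check `pkt->sk` before dereferencing it.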
Re: IPv6 routing/fragmentation panic
David Woodhouse wrote:
> >  		if (frag->len > mtu ||
> >  		    ((frag->len & 7) && frag->next) ||
> > -		    skb_headroom(frag) < hlen)
> > +		    skb_headroom(frag) < (hlen + hroom))
> >  			goto slow_path_clean;
> >
> >  		/* Partially cloned skb? */
>
> My test is 'ping -s 2000', and I end up with a fragment of 1280 bytes
> followed by a fragment of 776 bytes.
>
> The test cited above is only actually running on the latter fragment
> (which for some reason is fine and has headroom of 58 bytes).
>
> The first, larger, fragment isn't being checked. And that's the one
> with only 10 bytes of headroom.

Thanks for this detailed analysis.
I've sent a patch that should address all of these issues.

Turns out that all tests are wrong in your case.

ip6_fragment doesn't expand headroom, since this skb had the ipv6
fragment header pulled, so that part thinks there are 18 bytes available
(we later push the frag header back when sending fragments).

The 'skb_headroom(frag) < hlen' test is wrong since it neither accounts
for device header length nor the fragment header that we need to push.
[PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
Commit b08cc79155fc26d0d112b1470d1ece5034651a4b ("hv_netvsc: Eliminate
memory allocation in the packet send path") introduced skb headroom
request for Hyper-V netvsc driver:

       max_needed_headroom = sizeof(struct hv_netvsc_packet) +
                              sizeof(struct rndis_message) +
                              NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE +
                              NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE;
       ...
       net->needed_headroom = max_needed_headroom;

max_needed_headroom is 220 bytes, which significantly exceeds the
LL_MAX_HEADER setting. This causes each skb to be cloned on send path,
e.g. for IPv4 case we fall into the following clause
(ip_finish_output2()):

       if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
               ...
               skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
               ...
       }

leading to a significant performance regression. Increase LL_MAX_HEADER
to make it suitable for netvsc, make it 224 to be 16-aligned.
Alternatively we could (partially) revert the commit which introduced
skb headroom request restoring manual memory allocation on transmit
path.

Signed-off-by: Vitaly Kuznetsov
---
 include/linux/netdevice.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 88a0069..7233790 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc)
  *	used.
  */
 
-#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
+#if IS_ENABLED(CONFIG_HYPERV_NET)
+# define LL_MAX_HEADER 224
+#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
 # if defined(CONFIG_MAC80211_MESH)
 #  define LL_MAX_HEADER 128
 # else
-- 
2.4.3
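The step from 220 to 224 is a round-up to the next multiple of 16. A minimal sketch of that arithmetic; `example_align_up` is a hypothetical helper written for this note, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static size_t example_align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}
```

For the figures in the commit message, `example_align_up(220, 16)` evaluates to 224: 220 + 15 = 235, and clearing the low four bits of 235 leaves 224.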
[PATCH iproute2] man ip-link: Fix wording in VLAN reorder_hdr explanation
From: Vadim Kochan

Signed-off-by: Vadim Kochan
---
 man/man8/ip-link.8.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 1896eb6..4928249 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -327,7 +327,7 @@ physical device (if this device does not support VLAN offloading), the similar
 on the RX direction - by default the packet will be untagged before being
 received by VLAN device. Reordering allows to accelerate tagging on egress and
 to hide VLAN header on ingress so the packet looks like regular Ethernet packet,
-at the same time it might be confusing while the packet sniffing as the VLAN header
+at the same time it might be confusing for packet capture as the VLAN header
 does not exist within the packet.
 
 VLAN offloading can be checked by
--
2.4.2
Re: [PATCH] man ip-link: Add more explanation about vlan reordering
On Wed, Aug 26, 2015 at 04:27:48PM +0100, Jeremy Harris wrote:
> On 17/08/15 20:22, Vadim Kochan wrote:
> > +.BR reorder_hdr " is " on
> > +then VLAN header will be not inserted immediately but only before passing to the
> > +physical device (if this device does not support VLAN offloading), the similar
> > +on the RX direction - by default the packet will be untagged before being
> > +received by VLAN device. Reordering allows to accelerate tagging on egress and
> > +to hide VLAN header on ingress so the packet looks like regular Ethernet packet,
> > +at the same time it might be confusing while the packet sniffing as the VLAN header
> ^
>
> Does not read well. "for packet capture" perhaps?
> --
> Jeremy

Hi Jeremy,

Thanks for your comment. I have sent a patch, let me know if it is correct now. And sorry for such a late response.

Thanks,
Vadim Kochan
Re: [PATCH net-next] net: Initialize table in fib result
On 9/16/15 9:56 AM, Nikolay Aleksandrov wrote:
> Just to have it documented: I don't think we need the second NULLing, but it
> doesn't hurt.

I think we do. After the second one there is a goto to local_input which uses res.table. The second goto is reachable 'if !IN_DEV_FORWARD(in_dev)' in which case res.table is valid but should not be.

In short, if fi is reset, table should be too.
Re: [PATCH net-next] net: Initialize table in fib result
On Wed, Sep 16, 2015 at 12:38 PM, David Ahern wrote:
> The root cause is use of res.table uninitialized.
>
> Thanks to Nikolay for noticing the uninitialized use amongst the maze of
> gotos.
>
> Fixes: b7503e0cdb5d ("net: Add FIB table id to rtable")
> Reported-by: Sergey Senozhatsky
> Reported-by: Richard Alpe
> Reported-by: Fabio Estevam
> Signed-off-by: David Ahern

Thanks, David. I am able to NFS boot again:

Tested-by: Fabio Estevam
RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> -Original Message- > From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com] > Sent: Wednesday, September 16, 2015 11:50 AM > To: netdev@vger.kernel.org > Cc: David S. Miller; linux-ker...@vger.kernel.org; > KY Srinivasan ; Haiyang Zhang > ; Jason Wang > Subject: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V > > Commit b08cc79155fc26d0d112b1470d1ece5034651a4b ("hv_netvsc: Eliminate > memory allocation in the packet send path") introduced skb headroom > request for Hyper-V netvsc driver: > >max_needed_headroom = sizeof(struct hv_netvsc_packet) + >sizeof(struct rndis_message) + >NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE + >NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE; >... >net->needed_headroom = max_needed_headroom; > > max_needed_headroom is 220 bytes, it significantly exceeds the > LL_MAX_HEADER setting. This causes each skb to be cloned on send path, > e.g. for IPv4 case we fall into the following clause > (ip_finish_output2()): > > if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) { > ... > skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev)); > ... > } > > leading to a significant performance regression. Increase LL_MAX_HEADER > to make it suitable for netvsc, make it 224 to be 16-aligned. > Alternatively we could (partially) revert the commit which introduced > skb > headroom request restoring manual memory allocation on transmit path. > > Signed-off-by: Vitaly Kuznetsov > --- > include/linux/netdevice.h | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 88a0069..7233790 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc) > * used. 
> */ > > -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25) > +#if IS_ENABLED(CONFIG_HYPERV_NET) > +# define LL_MAX_HEADER 224 > +#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25) > # if defined(CONFIG_MAC80211_MESH) > # define LL_MAX_HEADER 128 > # else Thanks for the patch. To avoid we forget to update that 224 number when we add more things into netvsc header, I suggest that we define a macro in netdevice.h such as: #define HVNETVSC_MAX_HEADER 224 #define LL_MAX_HEADER HVNETVSC_MAX_HEADER And, put a note in netvsc code saying the header reservation shouldn't exceed HVNETVSC_MAX_HEADER, or you need to update HVNETVSC_MAX_HEADER. Thanks, - Haiyang -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
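The suggestion above could look roughly like the following. This is only a sketch under stated assumptions: NETVSC_NEEDED_HEADROOM and hvnetvsc_header_slack are hypothetical names, the 220 is the figure reported in the patch description rather than a computed sizeof, and in the kernel one would more likely reach for BUILD_BUG_ON than for C11 static_assert.

```c
#include <assert.h>

/* Hypothetical layout following the suggestion; not merged code. */
#define HVNETVSC_MAX_HEADER 224
#define LL_MAX_HEADER HVNETVSC_MAX_HEADER

/* Value reported in the thread for netvsc's max_needed_headroom. */
#define NETVSC_NEEDED_HEADROOM 220

/* Break the build if the driver ever outgrows the reservation. */
static_assert(NETVSC_NEEDED_HEADROOM <= HVNETVSC_MAX_HEADER,
	      "netvsc headroom exceeds HVNETVSC_MAX_HEADER; update the macro");

/* Bytes of the reservation that netvsc would leave unused. */
int hvnetvsc_header_slack(void)
{
	return HVNETVSC_MAX_HEADER - NETVSC_NEEDED_HEADROOM;
}
```

A compile-time check of this kind would turn the "put a note in netvsc code" reminder into something the build enforces, at the cost of tying netdevice.h and the driver together a little more tightly.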
[PATCH net-next v2] net: Initialize table in fib result
Sergey, Richard and Fabio reported an oops in ip_route_input_noref. e.g., from Richard: [0.877040] BUG: unable to handle kernel NULL pointer dereference at 0056 [0.877597] IP: [] ip_route_input_noref+0x1a2/0xb00 [0.877597] PGD 3fa14067 PUD 3fa6e067 PMD 0 [0.877597] Oops: [#1] SMP [0.877597] Modules linked in: virtio_net virtio_pci virtio_ring virtio [0.877597] CPU: 1 PID: 119 Comm: ifconfig Not tainted 4.2.0+ #1 [0.877597] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [0.877597] task: 88003fab0bc0 ti: 88003faa8000 task.ti: 88003faa8000 [0.877597] RIP: 0010:[] [] ip_route_input_noref+0x1a2/0xb00 [0.877597] RSP: 0018:88003ed03ba0 EFLAGS: 00010202 [0.877597] RAX: 0046 RBX: ff8f RCX: 0020 [0.877597] RDX: 88003fab50b8 RSI: 0200 RDI: 8152b4b8 [0.877597] RBP: 88003ed03c50 R08: R09: [0.877597] R10: R11: R12: 88003fab6f00 [0.877597] R13: 88003fab5000 R14: R15: 81cb5600 [0.877597] FS: 7f6de5751700() GS:88003ed0() knlGS: [0.877597] CS: 0010 DS: ES: CR0: 80050033 [0.877597] CR2: 0056 CR3: 3fa6d000 CR4: 06e0 [0.877597] Stack: [0.877597] 0046 88003fffa600 88003ed03be0 [0.877597] 88003f9e2c00 697da8c0017da8c0 8800 0007fd00 [0.877597] 0046 0004 [0.877597] Call Trace: [0.877597] [0.877597] [] ? cpumask_next_and+0x2f/0x40 [0.877597] [] arp_process+0x39c/0x690 [0.877597] [] arp_rcv+0x13e/0x170 [0.877597] [] __netif_receive_skb_core+0x60c/0xa00 [0.877597] [] ? __build_skb+0x25/0x100 [0.877597] [] ? __build_skb+0x25/0x100 [0.877597] [] __netif_receive_skb+0x16/0x70 [0.877597] [] netif_receive_skb_internal+0x28/0x90 [0.877597] [] napi_gro_receive+0x7f/0xd0 [0.877597] [] virtnet_receive+0x256/0x910 [virtio_net] [0.877597] [] virtnet_poll+0x18/0x80 [virtio_net] [0.877597] [] net_rx_action+0x1dd/0x2f0 [0.877597] [] __do_softirq+0x98/0x260 [0.877597] [] do_softirq_own_stack+0x1c/0x30 The root cause is use of res.table uninitialized. Thanks to Nikolay for noticing the uninitialized use amongst the maze of gotos. 
As Nikolay pointed out the second initialization is not required to fix the oops, but rather to fix a related problem where a valid lookup should be invalidated before creating the rth entry. Fixes: b7503e0cdb5d ("net: Add FIB table id to rtable") Reported-by: Sergey SenozhatskyReported-by: Richard Alpe Reported-by: Fabio Estevam Tested-by: Fabio Estevam Signed-off-by: David Ahern --- v2: - clarification in the commit message regarding the second initialization net/ipv4/route.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index da427a4a33fe..80f7c5b7b832 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1712,6 +1712,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr, goto martian_source; res.fi = NULL; + res.table = NULL; if (ipv4_is_lbcast(daddr) || (saddr == 0 && daddr == 0)) goto brd_input; @@ -1834,6 +1835,7 @@ out: return err; RT_CACHE_STAT_INC(in_no_route); res.type = RTN_UNREACHABLE; res.fi = NULL; + res.table = NULL; goto local_input; /* -- 2.3.2 (Apple Git-55) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
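The "if fi is reset, table should be" rule can be modelled in user space as below. This is a simplified sketch, not the kernel structures: all demo_* names are invented for the example, and the consumer that reads tb_id only when table is set is an assumption about how the result is used afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel types; only the two fields discussed are modelled. */
struct demo_fib_info { int unused; };
struct demo_fib_table { unsigned int tb_id; };

struct demo_fib_result {
	struct demo_fib_info *fi;
	struct demo_fib_table *table;
};

/* Invalidate a lookup result: fi and table are always cleared together. */
void demo_result_reset(struct demo_fib_result *res)
{
	assert(res != NULL);
	res->fi = NULL;
	res->table = NULL;
}

/* A consumer that dereferences table only when it is set. */
unsigned int demo_result_table_id(const struct demo_fib_result *res)
{
	assert(res != NULL);
	return res->table ? res->table->tb_id : 0;
}
```

With an uninitialized table pointer the consumer would dereference stack garbage, which matches the oops above. With a stale but valid pointer it would report a table id for a result that was meant to be invalidated, which is the related problem the second hunk addresses.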
Re: [PATCH net-next v2] net: Initialize table in fib result
On 09/16/2015 06:16 PM, David Ahern wrote: > Sergey, Richard and Fabio reported an oops in ip_route_input_noref. e.g., > from Richard: > > [0.877040] BUG: unable to handle kernel NULL pointer dereference at > 0056 > [0.877597] IP: [] ip_route_input_noref+0x1a2/0xb00 > [0.877597] PGD 3fa14067 PUD 3fa6e067 PMD 0 > [0.877597] Oops: [#1] SMP > [0.877597] Modules linked in: virtio_net virtio_pci virtio_ring virtio > [0.877597] CPU: 1 PID: 119 Comm: ifconfig Not tainted 4.2.0+ #1 > [0.877597] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > [0.877597] task: 88003fab0bc0 ti: 88003faa8000 task.ti: > 88003faa8000 > [0.877597] RIP: 0010:[] [] > ip_route_input_noref+0x1a2/0xb00 > [0.877597] RSP: 0018:88003ed03ba0 EFLAGS: 00010202 > [0.877597] RAX: 0046 RBX: ff8f RCX: > 0020 > [0.877597] RDX: 88003fab50b8 RSI: 0200 RDI: > 8152b4b8 > [0.877597] RBP: 88003ed03c50 R08: R09: > > [0.877597] R10: R11: R12: > 88003fab6f00 > [0.877597] R13: 88003fab5000 R14: R15: > 81cb5600 > [0.877597] FS: 7f6de5751700() GS:88003ed0() > knlGS: > [0.877597] CS: 0010 DS: ES: CR0: 80050033 > [0.877597] CR2: 0056 CR3: 3fa6d000 CR4: > 06e0 > [0.877597] Stack: > [0.877597] 0046 88003fffa600 > 88003ed03be0 > [0.877597] 88003f9e2c00 697da8c0017da8c0 8800 > 0007fd00 > [0.877597] 0046 > 0004 > [0.877597] Call Trace: > [0.877597] > [0.877597] [] ? cpumask_next_and+0x2f/0x40 > [0.877597] [] arp_process+0x39c/0x690 > [0.877597] [] arp_rcv+0x13e/0x170 > [0.877597] [] __netif_receive_skb_core+0x60c/0xa00 > [0.877597] [] ? __build_skb+0x25/0x100 > [0.877597] [] ? __build_skb+0x25/0x100 > [0.877597] [] __netif_receive_skb+0x16/0x70 > [0.877597] [] netif_receive_skb_internal+0x28/0x90 > [0.877597] [] napi_gro_receive+0x7f/0xd0 > [0.877597] [] virtnet_receive+0x256/0x910 [virtio_net] > [0.877597] [] virtnet_poll+0x18/0x80 [virtio_net] > [0.877597] [] net_rx_action+0x1dd/0x2f0 > [0.877597] [] __do_softirq+0x98/0x260 > [0.877597] [] do_softirq_own_stack+0x1c/0x30 > > The root cause is use of res.table uninitialized. 
> > Thanks to Nikolay for noticing the uninitialized use amongst the maze of > gotos. > > As Nikolay pointed out the second initialization is not required to fix > the oops, but rather to fix a related problem where a valid lookup should > be invalidated before creating the rth entry. > > Fixes: b7503e0cdb5d ("net: Add FIB table id to rtable") > Reported-by: Sergey Senozhatsky> Reported-by: Richard Alpe > Reported-by: Fabio Estevam > Tested-by: Fabio Estevam > Signed-off-by: David Ahern > --- > v2: > - clarification in the commit message regarding the second initialization > > net/ipv4/route.c | 2 ++ > 1 file changed, 2 insertions(+) > Thanks again! Signed-off-by: Nikolay Aleksandrov -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
From: Haiyang Zhang > Sent: 16 September 2015 17:09 > > -Original Message- > > From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com] > > Sent: Wednesday, September 16, 2015 11:50 AM > > To: netdev@vger.kernel.org > > Cc: David S. Miller; linux-ker...@vger.kernel.org; > > KY Srinivasan ; Haiyang Zhang > > ; Jason Wang > > Subject: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V > > > > Commit b08cc79155fc26d0d112b1470d1ece5034651a4b ("hv_netvsc: Eliminate > > memory allocation in the packet send path") introduced skb headroom > > request for Hyper-V netvsc driver: > > > >max_needed_headroom = sizeof(struct hv_netvsc_packet) + > >sizeof(struct rndis_message) + > >NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE + > >NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE; > >... > >net->needed_headroom = max_needed_headroom; > > > > max_needed_headroom is 220 bytes, it significantly exceeds the > > LL_MAX_HEADER setting. This causes each skb to be cloned on send path, > > e.g. for IPv4 case we fall into the following clause > > (ip_finish_output2()): > > > > if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) { > > ... > > skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev)); > > ... > > } > > > > leading to a significant performance regression. Increase LL_MAX_HEADER > > to make it suitable for netvsc, make it 224 to be 16-aligned. > > Alternatively we could (partially) revert the commit which introduced > > skb > > headroom request restoring manual memory allocation on transmit path. > > > > Signed-off-by: Vitaly Kuznetsov > > --- > > include/linux/netdevice.h | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > > index 88a0069..7233790 100644 > > --- a/include/linux/netdevice.h > > +++ b/include/linux/netdevice.h > > @@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc) > > * used. 
> > */ > > > > -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25) > > +#if IS_ENABLED(CONFIG_HYPERV_NET) > > +# define LL_MAX_HEADER 224 > > +#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25) > > # if defined(CONFIG_MAC80211_MESH) > > # define LL_MAX_HEADER 128 > > # else > > Thanks for the patch. > To avoid we forget to update that 224 number when we add more things > into netvsc header, I suggest that we define a macro in netdevice.h such > as: > #define HVNETVSC_MAX_HEADER 224 > #define LL_MAX_HEADER HVNETVSC_MAX_HEADER > > And, put a note in netvsc code saying the header reservation shouldn't > exceed HVNETVSC_MAX_HEADER, or you need to update HVNETVSC_MAX_HEADER. Am I right in thinking this is adding an extra 96 unused bytes to the front of almost all skb just so that hyper-v can make its link level header contiguous with whatever follows (IP header ?). Doesn't sound ideal. David -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] atm: deal with setting entry before mkip was called
If we didn't call ATMARP_MKIP before ATMARP_ENCAP the VCC descriptor is non-existant and we'll end up dereferencing a NULL ptr: [1033173.491930] kasan: GPF could be caused by NULL-ptr deref or user memory accessirq event stamp: 123386 [1033173.493678] general protection fault: [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN [1033173.493689] Modules linked in: [1033173.493697] CPU: 9 PID: 23815 Comm: trinity-c64 Not tainted 4.2.0-next-20150911-sasha-00043-g353d875-dirty #2545 [1033173.493706] task: 8800630c4000 ti: 88006311 task.ti: 88006311 [1033173.493823] RIP: clip_ioctl (net/atm/clip.c:320 net/atm/clip.c:689) [1033173.493826] RSP: 0018:880063117a88 EFLAGS: 00010203 [1033173.493828] RAX: dc00 RBX: RCX: 000c [1033173.493830] RDX: 0002 RSI: b3f10720 RDI: 0014 [1033173.493832] RBP: 880063117b80 R08: 88047574d9a4 R09: [1033173.493834] R10: R11: R12: 11000c622f53 [1033173.493836] R13: 8800cb905500 R14: 8808d6da2000 R15: fdfd [1033173.493840] FS: 7fa56b92d700() GS:88047800() knlGS: [1033173.493843] CS: 0010 DS: ES: CR0: 8005003b [1033173.493845] CR2: CR3: 630e8000 CR4: 06a0 [1033173.493855] Stack: [1033173.493862] b0b60444 eaea 41b58ab3 b3c3ce32 [1033173.493867] b0b6f3e0 b0b60444 b5ea2e50 11000c622f5e [1033173.493873] 8800630c4cd8 000ee09a b3ec4888 b5ea2de8 [1033173.493874] Call Trace: [1033173.494108] do_vcc_ioctl (net/atm/ioctl.c:170) [1033173.494113] vcc_ioctl (net/atm/ioctl.c:189) [1033173.494116] svc_ioctl (net/atm/svc.c:605) [1033173.494200] sock_do_ioctl (net/socket.c:874) [1033173.494204] sock_ioctl (net/socket.c:958) [1033173.494244] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607) [1033173.494290] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613) [1033173.494295] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:186) [1033173.494362] Code: fa 48 c1 ea 03 80 3c 02 00 0f 85 50 09 00 00 49 8b 9e 60 06 00 00 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 14 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 14 09 00 All code 0: fa cli 1: 48 c1 ea 03 shr$0x3,%rdx 
5: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) 9: 0f 85 50 09 00 00 jne0x95f f: 49 8b 9e 60 06 00 00mov0x660(%r14),%rbx 16: 48 b8 00 00 00 00 00movabs $0xdc00,%rax 1d: fc ff df 20: 48 8d 7b 14 lea0x14(%rbx),%rdi 24: 48 89 famov%rdi,%rdx 27: 48 c1 ea 03 shr$0x3,%rdx 2b:* 0f b6 04 02 movzbl (%rdx,%rax,1),%eax <-- trapping instruction 2f: 48 89 famov%rdi,%rdx 32: 83 e2 07and$0x7,%edx 35: 38 d0 cmp%dl,%al 37: 7f 08 jg 0x41 39: 84 c0 test %al,%al 3b: 0f 85 14 09 00 00 jne0x955 Code starting with the faulting instruction === 0: 0f b6 04 02 movzbl (%rdx,%rax,1),%eax 4: 48 89 famov%rdi,%rdx 7: 83 e2 07and$0x7,%edx a: 38 d0 cmp%dl,%al c: 7f 08 jg 0x16 e: 84 c0 test %al,%al 10: 0f 85 14 09 00 00 jne0x92a [1033173.494366] RIP clip_ioctl (net/atm/clip.c:320 net/atm/clip.c:689) [1033173.494368] RSP Signed-off-by: Sasha Levin--- net/atm/clip.c |3 +++ 1 file changed, 3 insertions(+) diff --git a/net/atm/clip.c b/net/atm/clip.c index 17e55df..4407b2f 100644 --- a/net/atm/clip.c +++ b/net/atm/clip.c @@ -317,6 +317,9 @@ static int clip_constructor(struct neighbour *neigh) static int clip_encap(struct atm_vcc *vcc, int mode) { + if (!CLIP_VCC(vcc)) + return -EBADFD; + CLIP_VCC(vcc)->encap = mode; return 0; } -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
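The guard added by this patch can be sketched in user space as follows. The demo_* types are invented stand-ins (in the kernel, CLIP_VCC(vcc) yields the per-VCC CLIP state), and the EBADFD fallback value is the Linux one, defined here only in case the local errno.h lacks it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EBADFD
#define EBADFD 77	/* Linux value; fallback for systems without it */
#endif

/* Simplified stand-ins for the VCC and its CLIP state. */
struct demo_clip_vcc { int encap; };
struct demo_atm_vcc { struct demo_clip_vcc *clip; };

/* Mirrors the patched clip_encap(): refuse to touch state that was never set up. */
int demo_clip_encap(struct demo_atm_vcc *vcc, int mode)
{
	assert(vcc != NULL);
	if (!vcc->clip)
		return -EBADFD;

	vcc->clip->encap = mode;
	return 0;
}
```

Calling demo_clip_encap() on a VCC whose clip pointer is still NULL corresponds to issuing ATMARP_ENCAP before ATMARP_MKIP: the call fails with -EBADFD instead of writing through a NULL pointer.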
RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> -Original Message- > From: David Laight [mailto:david.lai...@aculab.com] > Sent: Wednesday, September 16, 2015 9:25 AM > To: Haiyang Zhang; Vitaly Kuznetsov > ; netdev@vger.kernel.org > Cc: David S. Miller ; linux-ker...@vger.kernel.org; > KY Srinivasan ; Jason Wang > Subject: RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper- > V > > From: Haiyang Zhang > > Sent: 16 September 2015 17:09 > > > -Original Message- > > > From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com] > > > Sent: Wednesday, September 16, 2015 11:50 AM > > > To: netdev@vger.kernel.org > > > Cc: David S. Miller ; linux- > ker...@vger.kernel.org; > > > KY Srinivasan ; Haiyang Zhang > > > ; Jason Wang > > > Subject: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper- > V > > > > > > Commit b08cc79155fc26d0d112b1470d1ece5034651a4b ("hv_netvsc: > Eliminate > > > memory allocation in the packet send path") introduced skb headroom > > > request for Hyper-V netvsc driver: > > > > > >max_needed_headroom = sizeof(struct hv_netvsc_packet) + > > >sizeof(struct rndis_message) + > > >NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE + > > >NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE; > > >... > > >net->needed_headroom = max_needed_headroom; > > > > > > max_needed_headroom is 220 bytes, it significantly exceeds the > > > LL_MAX_HEADER setting. This causes each skb to be cloned on send > path, > > > e.g. for IPv4 case we fall into the following clause > > > (ip_finish_output2()): > > > > > > if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) { > > > ... > > > skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev)); > > > ... > > > } > > > > > > leading to a significant performance regression. Increase > LL_MAX_HEADER > > > to make it suitable for netvsc, make it 224 to be 16-aligned. > > > Alternatively we could (partially) revert the commit which introduced > > > skb > > > headroom request restoring manual memory allocation on transmit path. 
> > > > > > Signed-off-by: Vitaly Kuznetsov > > > --- > > > include/linux/netdevice.h | 4 +++- > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > > > index 88a0069..7233790 100644 > > > --- a/include/linux/netdevice.h > > > +++ b/include/linux/netdevice.h > > > @@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc) > > > * used. > > > */ > > > > > > -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25) > > > +#if IS_ENABLED(CONFIG_HYPERV_NET) > > > +# define LL_MAX_HEADER 224 > > > +#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25) > > > # if defined(CONFIG_MAC80211_MESH) > > > # define LL_MAX_HEADER 128 > > > # else > > > > Thanks for the patch. > > To avoid we forget to update that 224 number when we add more things > > into netvsc header, I suggest that we define a macro in netdevice.h such > > as: > > #define HVNETVSC_MAX_HEADER 224 > > #define LL_MAX_HEADER HVNETVSC_MAX_HEADER > > > > And, put a note in netvsc code saying the header reservation shouldn't > > exceed HVNETVSC_MAX_HEADER, or you need to update > HVNETVSC_MAX_HEADER. > > Am I right in thinking this is adding an extra 96 unused bytes to the front > of almost all skb just so that hyper-v can make its link level header > contiguous with whatever follows (IP header ?). > > Doesn't sound ideal. Remote NDIS is the protocol used to send packets from the guest to the host. Every packet needs to be decorated with the RNDIS header and the maximum room needed for the RNDIS header is the hreadroom we want. K. Y > > David -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [net-next:master 6/12] include/linux/usb/cdc.h:23: error: redefinition of 'struct usb_cdc_parsed_header'
From: Fengguang WuDate: Wed, 16 Sep 2015 21:06:58 +0800 > On Tue, Sep 15, 2015 at 01:27:42PM -0700, David Miller wrote: >> From: kbuild test robot >> Date: Wed, 16 Sep 2015 03:57:11 +0800 >> >> > All error/warnings (new ones prefixed by >>): >> > >> >In file included from drivers/usb/gadget/function/u_ether.h:20, >> > from drivers/usb/gadget/legacy/cdc2.c:16: >> >include/linux/usb/cdc.h:47: warning: 'struct usb_interface' declared >> > inside parameter list >> >include/linux/usb/cdc.h:47: warning: its scope is only this definition >> > or declaration, which is probably not what you want >> >In file included from drivers/usb/gadget/function/u_serial.h:16, >> > from drivers/usb/gadget/legacy/cdc2.c:17: >> >>> include/linux/usb/cdc.h:23: error: redefinition of 'struct >> >>> usb_cdc_parsed_header' >> >include/linux/usb/cdc.h:47: warning: 'struct usb_interface' declared >> > inside parameter list >> >>> include/linux/usb/cdc.h:47: error: conflicting types for >> >>> 'cdc_parse_cdc_header' >> >include/linux/usb/cdc.h:47: error: previous declaration of >> > 'cdc_parse_cdc_header' was here >> >> This may be a side effect of the initial warning, does this reproduce with >> that fixed? Please show me what the warning looks like in that case. > > Dave, net-next/master commit ad1e7b97b3 ("cdc: Fix build warning.") > still has errors. > > The problem is, the header file is included twice. That's not possible after the patch I committed from Stephen Rothwell which adds proper include guards: commit b84ee0d7f375ed7840c7c110d46eac24cf94b2a2 Author: Stephen Rothwell Date: Wed Sep 16 11:10:16 2015 +1000 cdc: add header guards Signed-off-by: Stephen Rothwell Signed-off-by: David S. Miller diff --git a/include/linux/usb/cdc.h b/include/linux/usb/cdc.h index 959d0c8..b5706f9 100644 --- a/include/linux/usb/cdc.h +++ b/include/linux/usb/cdc.h @@ -7,6 +7,8 @@ * modify it under the terms of the GNU General Public License * version 2 as published by the Free Software Foundation. 
*/ +#ifndef __LINUX_USB_CDC_H +#define __LINUX_USB_CDC_H #include @@ -45,3 +47,5 @@ int cdc_parse_cdc_header(struct usb_cdc_parsed_header *hdr, struct usb_interface *intf, u8 *buffer, int buflen); + +#endif /* __LINUX_USB_CDC_H */ diff --git a/include/uapi/linux/usb/cdc.h b/include/uapi/linux/usb/cdc.h index b6a9cdd..e2bc417 100644 --- a/include/uapi/linux/usb/cdc.h +++ b/include/uapi/linux/usb/cdc.h @@ -6,8 +6,8 @@ * firmware based USB peripherals. */ -#ifndef __LINUX_USB_CDC_H -#define __LINUX_USB_CDC_H +#ifndef __UAPI_LINUX_USB_CDC_H +#define __UAPI_LINUX_USB_CDC_H #include @@ -444,4 +444,4 @@ struct usb_cdc_ncm_ndp_input_size { #define USB_CDC_NCM_CRC_NOT_APPENDED 0x00 #define USB_CDC_NCM_CRC_APPENDED 0x01 -#endif /* __LINUX_USB_CDC_H */ +#endif /* __UAPI_LINUX_USB_CDC_H */ -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
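The effect of the guards in this commit can be shown in a single file by pasting a header body twice. This is only a sketch: the DEMO_* names and the struct are made up, and the reading that the uapi guard was renamed so the two headers no longer share a guard name is an inference from the diff, not something stated in the commit message.

```c
#include <assert.h>

/* First "inclusion" of a made-up header body, pasted inline for the demo. */
#ifndef DEMO_LINUX_USB_CDC_H
#define DEMO_LINUX_USB_CDC_H
struct demo_parsed_header { int dummy; };
#endif /* DEMO_LINUX_USB_CDC_H */

/* Second "inclusion": the guard is already defined, so nothing is redefined. */
#ifndef DEMO_LINUX_USB_CDC_H
#define DEMO_LINUX_USB_CDC_H
struct demo_parsed_header { int dummy; };
#endif /* DEMO_LINUX_USB_CDC_H */

/* A second header with its own guard; reusing the guard above would skip this body. */
#ifndef DEMO_UAPI_LINUX_USB_CDC_H
#define DEMO_UAPI_LINUX_USB_CDC_H
#define DEMO_CRC_APPENDED 0x01
#endif /* DEMO_UAPI_LINUX_USB_CDC_H */

/* Fails to compile if the second header's body had been skipped. */
static_assert(DEMO_CRC_APPENDED == 0x01, "second header body was skipped");

int demo_crc_flag(void)
{
	return DEMO_CRC_APPENDED;
}
```

Removing the first pair of guard lines reproduces the "redefinition of struct" class of error from the build report, while giving both headers the same guard name would make DEMO_CRC_APPENDED disappear.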
[PATCH net-next] net: fix cdc-phonet.c dependency and build error
From: Randy Dunlap <rdun...@infradead.org>

Fix build error caused by missing Kconfig dependency:

ERROR: "cdc_parse_cdc_header" [drivers/net/usb/cdc-phonet.ko] undefined!

Reported-by: Fengguang Wu <fengguang...@intel.com>
Signed-off-by: Randy Dunlap <rdun...@infradead.org>
---
 drivers/net/usb/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-next-20150916.orig/drivers/net/usb/Kconfig
+++ linux-next-20150916/drivers/net/usb/Kconfig
@@ -541,7 +541,7 @@ config USB_NET_INT51X1
 
 config USB_CDC_PHONET
 	tristate "CDC Phonet support"
-	depends on PHONET
+	depends on PHONET && USB_USBNET
 	help
 	  Choose this option to support the Phonet interface to a Nokia
 	  cellular modem, as found on most Nokia handsets with the
Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
From: David Laight
Date: Wed, 16 Sep 2015 16:25:03 +

> Am I right in thinking this is adding an extra 96 unused bytes to the front
> of almost all skb just so that hyper-v can make its link level header
> contiguous with whatever follows (IP header ?).
>
> Doesn't sound ideal.

Agreed, this is ridiculous, and the entire stack will incur this cost just because hyperv is enabled in the kernel config.
Re: [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches
Daniel Mack wrote:
> I'm re-addressing the issue of matching socket meta information for
> non-established sockets that has been discussed a while ago:
>
> http://article.gmane.org/gmane.comp.security.firewalls.netfilter.devel/56877
>
> Being able to reliably match on net_cls cgroup ids is crucial in
> order to build a per-application or per-container firewall rules
> which don't leak ingress packets. Such a feature would be very
> useful to have.

Could you clarify what 'which don't leak ingress packets' means?

> A previous attempt to fix the currently existing issues was to call
> out to the early demuxing helper functions from the meta matching
> callbacks, but that doesn't suffice because it doesn't address the
> case of multicast UDP and other, more complex lookup methods
> implemented in various protocol handlers.

Yes, but see below.

> This patch set outlines a different approach by adding a flag to
> 'struct sk_buff' called 'nf_postponed'. This flag is set by
> nft_meta_get_eval() in case a decision cannot be made due to a missing
> skb->sk. skbs flagged that way will then be ran through the netfilter
> chain processor again after the protocol handlers did the real socket
> lookup. A small addition to 'struct nft_pktinfo' is needed so that the
> matching callbacks can access the socket that was passed into
> nf_hook().
>
> Note that the new flag does not actually bloat 'struct skb_buff',
> because it still fits into the 'flags1' bitfield. Also, the extra
> netfilter chain iteration will not be done by any subsequent packet in
> the same stream, as for those, the early demux code will set skb->sk.
>
> The patch set is obviously not yet finished, because a lot more
> protocol handlers need to be patched. Right now, I only addressed
> tcp_ipv4. Before I do that, I want to get some feedback on the
> approach, so please let me know what you think.

I think there are several issues.
implementation problems:
- I'm not sure it's legal to call the hook input with skb->sk locked,
  some matches might want to acquire it.
- what makes NFT_META_CGROUP special? (or was that just an example?)

design issues:
The assumption seems to be that a given skb can always be mapped to a particular socket, and hence a cgroup.

That's not necessarily the case, e.g. with broad-/multicasting or when the socket is e.g. in timewait state.

Some skbs will now travel INPUT hooks twice.

And once you'd extend this so that we re-invoke nf hooks for mcast packets, for each socket they've been received on, you change netfilter behaviour again (one skb, one traversal -> n traversals of ruleset, one for each sk).

I think that this makes it a non-starter, sorry.

I would much rather see nft_demux_{udp,tcp,sctp,dccp,...}.c which moves early-demux-esque code into the nft ruleset.

Then you could do something like

nft add rule ip filter input meta l4proto tcp demux meta cgroup 42

The caveat being that even in this case we cannot guarantee that skb->sk is set afterwards, or that a cgroup can be derived from it.

Iff you absolutely need this, I'd seriously entertain the idea of adding NFPROTO_L4_TCP, etc, ... or, maybe better, allow to attach nft ruleset as a socket filter.

But really, at that point, a much better question would be whether net cgroups are the answer to whatever the question was, or what problem we are attempting to address here...
Re: [PATCH] net: qdisc: enhance default_qdisc documentation
On Tue, Sep 15, 2015 at 1:33 AM, Phil Sutter wrote:
> Aside from some lingual cleanup, point out which interfaces are not or
> partly covered by this setting.
>
> Signed-off-by: Phil Sutter

Acked-by: Cong Wang

It would also be worth explaining what the default qdisc means, but we can do that in another patch.

Thanks.
[PATCH net-next v2] xen-netfront: always set num queues if possible
If netfront connects with two (or more) queues and then reconnects with only one queue it fails to delete or rewrite the multi-queue-num-queues key and netback will try to use the wrong number of queues. Always write the num-queues field if the backend has multi-queue support. Signed-off-by: Chas Williams <3ch...@gmail.com> --- drivers/net/xen-netfront.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index f821a97..9bf63c2 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -1819,19 +1819,22 @@ again: goto destroy_ring; } - if (num_queues == 1) { - err = write_queue_xenstore_keys(>queues[0], , 0); /* flat */ - if (err) - goto abort_transaction_no_dev_fatal; - } else { + if (xenbus_exists(XBT_NIL, + info->xbdev->otherend, "multi-queue-max-queues")) { /* Write the number of queues */ - err = xenbus_printf(xbt, dev->nodename, "multi-queue-num-queues", - "%u", num_queues); + err = xenbus_printf(xbt, dev->nodename, + "multi-queue-num-queues", "%u", num_queues); if (err) { message = "writing multi-queue-num-queues"; goto abort_transaction_no_dev_fatal; } + } + if (num_queues == 1) { + err = write_queue_xenstore_keys(>queues[0], , 0); /* flat */ + if (err) + goto abort_transaction_no_dev_fatal; + } else { /* Write the keys for each queue */ for (i = 0; i < num_queues; ++i) { queue = >queues[i]; -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
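The behaviour change described in the commit message can be modelled with a small in-memory stand-in for the frontend's xenstore node. Everything named demo_* is hypothetical, and the model deliberately ignores the per-queue keys and the xenbus transaction; it only shows when multi-queue-num-queues gets (re)written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for the frontend's xenstore node. */
struct demo_frontend_node {
	bool has_num_queues;      /* is multi-queue-num-queues present? */
	unsigned int num_queues;  /* its value, when present */
};

/*
 * Model of the patched logic: whenever the backend advertises
 * multi-queue-max-queues, write multi-queue-num-queues, even for a
 * single queue, so a stale value from an earlier connection is replaced.
 */
void demo_write_num_queues(struct demo_frontend_node *node,
			   bool backend_multi_queue, unsigned int num_queues)
{
	assert(node != NULL && num_queues >= 1);
	if (backend_multi_queue) {
		node->has_num_queues = true;
		node->num_queues = num_queues;
	}
}
```

In this model, connecting with two queues and then reconnecting with one leaves num_queues at 1 rather than a stale 2, while a backend without multi-queue support never sees the key at all.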
Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
On 09/16/2015 10:55 AM, KY Srinivasan wrote:
>
>> -----Original Message-----
>> From: David Laight [mailto:david.lai...@aculab.com]
>> Sent: Wednesday, September 16, 2015 9:25 AM
>> To: Haiyang Zhang; Vitaly Kuznetsov; netdev@vger.kernel.org
>> Cc: David S. Miller; linux-ker...@vger.kernel.org; KY Srinivasan;
>> Jason Wang
>> Subject: RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
>>
>> From: Haiyang Zhang
>>> Sent: 16 September 2015 17:09
>>>> -----Original Message-----
>>>> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
>>>> Sent: Wednesday, September 16, 2015 11:50 AM
>>>> To: netdev@vger.kernel.org
>>>> Cc: David S. Miller; linux-ker...@vger.kernel.org; KY Srinivasan;
>>>> Haiyang Zhang; Jason Wang
>>>> Subject: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
>>>>
>>>> Commit b08cc79155fc26d0d112b1470d1ece5034651a4b ("hv_netvsc: Eliminate
>>>> memory allocation in the packet send path") introduced skb headroom
>>>> request for Hyper-V netvsc driver:
>>>>
>>>>     max_needed_headroom = sizeof(struct hv_netvsc_packet) +
>>>>                           sizeof(struct rndis_message) +
>>>>                           NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE +
>>>>                           NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE;
>>>>     ...
>>>>     net->needed_headroom = max_needed_headroom;
>>>>
>>>> max_needed_headroom is 220 bytes, it significantly exceeds the
>>>> LL_MAX_HEADER setting. This causes each skb to be cloned on send path,
>>>> e.g. for IPv4 case we fall into the following clause
>>>> (ip_finish_output2()):
>>>>
>>>>     if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
>>>>             ...
>>>>             skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
>>>>             ...
>>>>     }
>>>>
>>>> leading to a significant performance regression. Increase LL_MAX_HEADER
>>>> to make it suitable for netvsc, make it 224 to be 16-aligned.
>>>> Alternatively we could (partially) revert the commit which introduced skb
>>>> headroom request restoring manual memory allocation on transmit path.
>>>>
>>>> Signed-off-by: Vitaly Kuznetsov
>>>> ---
>>>>  include/linux/netdevice.h | 4 +++-
>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index 88a0069..7233790 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>>> @@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc)
>>>>   * used.
>>>>   */
>>>>
>>>> -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
>>>> +#if IS_ENABLED(CONFIG_HYPERV_NET)
>>>> +# define LL_MAX_HEADER 224
>>>> +#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
>>>>  # if defined(CONFIG_MAC80211_MESH)
>>>>  # define LL_MAX_HEADER 128
>>>>  # else
>>> Thanks for the patch.
>>> To avoid we forget to update that 224 number when we add more things
>>> into netvsc header, I suggest that we define a macro in netdevice.h such
>>> as:
>>> #define HVNETVSC_MAX_HEADER 224
>>> #define LL_MAX_HEADER HVNETVSC_MAX_HEADER
>>>
>>> And, put a note in netvsc code saying the header reservation shouldn't
>>> exceed HVNETVSC_MAX_HEADER, or you need to update HVNETVSC_MAX_HEADER.
>> Am I right in thinking this is adding an extra 96 unused bytes to the front
>> of almost all skb just so that hyper-v can make its link level header
>> contiguous with whatever follows (IP header ?).
>>
>> Doesn't sound ideal.
> Remote NDIS is the protocol used to send packets from the guest to the host.
> Every packet needs to be decorated with the RNDIS header and the maximum
> room needed for the RNDIS header is the headroom we want.

I think we get that. The question is does the Remote NDIS header and
packet info actually need to be a part of the header data? I would
argue that it probably doesn't.

So for example in netvsc_start_xmit it looks like you are calling
init_page_array in order to populate a set of page buffers, but the
first buffer for the Remote NDIS protocol is populated as a separate
page and offset. As such it doesn't seem like it necessarily needs to
be a part of the header data but could be maintained perhaps in a
separate ring buffer, or perhaps just be a separate page that you break
up to use for each header.

- Alex
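The numbers quoted in this thread can be checked with a few lines of C. This is only a sketch of the arithmetic, not kernel code: ALIGN_UP and the two functions are illustrative names, 220 is the max_needed_headroom value from the patch description, and 128 is the LL_MAX_HEADER value of the CONFIG_MAC80211_MESH branch shown in the diff.

```c
#include <assert.h>

/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

enum {
	NETVSC_HEADROOM   = 220, /* max_needed_headroom quoted in the thread */
	LL_MAX_HEADER_OLD = 128, /* CONFIG_MAC80211_MESH branch in the diff */
};

/* Headroom after rounding up to a 16-byte boundary. */
unsigned int proposed_ll_max_header(void)
{
	return ALIGN_UP(NETVSC_HEADROOM, 16);
}

/* Extra bytes reserved per skb compared with the old maximum. */
unsigned int extra_headroom(void)
{
	return proposed_ll_max_header() - LL_MAX_HEADER_OLD;
}
```

proposed_ll_max_header() yields 224 and extra_headroom() yields 96, which matches both the value proposed in the patch and the "extra 96 unused bytes" raised in the review.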
[PATCH net 5/6] lan78xx: Create lan78xx_get_mdix_status() and lan78xx_set_mdix_status() for MDIX control.
Create lan78xx_get_mdix_status() and lan78xx_set_mdix_status() for MDIX control.

Signed-off-by: Woojung Huh
---
 drivers/net/usb/lan78xx.c | 90 +++
 1 file changed, 52 insertions(+), 38 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 517264d..9102c71 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -1175,6 +1175,55 @@ static void lan78xx_set_msglevel(struct net_device *net, u32 level)
 	dev->msg_enable = level;
 }
 
+static int lan78xx_get_mdix_status(struct net_device *net)
+{
+	struct phy_device *phydev = net->phydev;
+	int buf;
+
+	phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS, LAN88XX_EXT_PAGE_SPACE_1);
+	buf = phy_read(phydev, LAN88XX_EXT_MODE_CTRL);
+	phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS, LAN88XX_EXT_PAGE_SPACE_0);
+
+	return buf;
+}
+
+static void lan78xx_set_mdix_status(struct net_device *net, __u8 mdix_ctrl)
+{
+	struct lan78xx_net *dev = netdev_priv(net);
+	struct phy_device *phydev = net->phydev;
+	int buf;
+
+	if (mdix_ctrl == ETH_TP_MDI) {
+		phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
+			  LAN88XX_EXT_PAGE_SPACE_1);
+		buf = phy_read(phydev, LAN88XX_EXT_MODE_CTRL);
+		buf &= ~LAN88XX_EXT_MODE_CTRL_MDIX_MASK_;
+		phy_write(phydev, LAN88XX_EXT_MODE_CTRL,
+			  buf | LAN88XX_EXT_MODE_CTRL_MDI_);
+		phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
+			  LAN88XX_EXT_PAGE_SPACE_0);
+	} else if (mdix_ctrl == ETH_TP_MDI_X) {
+		phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
+			  LAN88XX_EXT_PAGE_SPACE_1);
+		buf = phy_read(phydev, LAN88XX_EXT_MODE_CTRL);
+		buf &= ~LAN88XX_EXT_MODE_CTRL_MDIX_MASK_;
+		phy_write(phydev, LAN88XX_EXT_MODE_CTRL,
+			  buf | LAN88XX_EXT_MODE_CTRL_MDI_X_);
+		phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
+			  LAN88XX_EXT_PAGE_SPACE_0);
+	} else if (mdix_ctrl == ETH_TP_MDI_AUTO) {
+		phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
+			  LAN88XX_EXT_PAGE_SPACE_1);
+		buf = phy_read(phydev, LAN88XX_EXT_MODE_CTRL);
+		buf &= ~LAN88XX_EXT_MODE_CTRL_MDIX_MASK_;
+		phy_write(phydev, LAN88XX_EXT_MODE_CTRL,
+			  buf | LAN88XX_EXT_MODE_CTRL_AUTO_MDIX_);
+		phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
+			  LAN88XX_EXT_PAGE_SPACE_0);
+	}
+	dev->mdix_ctrl = mdix_ctrl;
+}
+
 static int lan78xx_get_settings(struct net_device *net, struct ethtool_cmd *cmd)
 {
 	struct lan78xx_net *dev = netdev_priv(net);
@@ -1188,9 +1237,7 @@ static int lan78xx_get_settings(struct net_device *net, struct ethtool_cmd *cmd)
 	ret = phy_ethtool_gset(phydev, cmd);
 
-	phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS, LAN88XX_EXT_PAGE_SPACE_1);
-	buf = phy_read(phydev, LAN88XX_EXT_MODE_CTRL);
-	phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS, LAN88XX_EXT_PAGE_SPACE_0);
+	buf = lan78xx_get_mdix_status(net);
 
 	buf &= LAN88XX_EXT_MODE_CTRL_MDIX_MASK_;
 	if (buf == LAN88XX_EXT_MODE_CTRL_AUTO_MDIX_) {
@@ -1221,34 +1268,7 @@ static int lan78xx_set_settings(struct net_device *net, struct ethtool_cmd *cmd)
 		return ret;
 
 	if (dev->mdix_ctrl != cmd->eth_tp_mdix_ctrl) {
-		if (cmd->eth_tp_mdix_ctrl == ETH_TP_MDI) {
-			phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
-				  LAN88XX_EXT_PAGE_SPACE_1);
-			temp = phy_read(phydev, LAN88XX_EXT_MODE_CTRL);
-			temp &= ~LAN88XX_EXT_MODE_CTRL_MDIX_MASK_;
-			phy_write(phydev, LAN88XX_EXT_MODE_CTRL,
-				  temp | LAN88XX_EXT_MODE_CTRL_MDI_);
-			phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
-				  LAN88XX_EXT_PAGE_SPACE_0);
-		} else if (cmd->eth_tp_mdix_ctrl == ETH_TP_MDI_X) {
-			phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
-				  LAN88XX_EXT_PAGE_SPACE_1);
-			temp = phy_read(phydev, LAN88XX_EXT_MODE_CTRL);
-			temp &= ~LAN88XX_EXT_MODE_CTRL_MDIX_MASK_;
-			phy_write(phydev, LAN88XX_EXT_MODE_CTRL,
-				  temp | LAN88XX_EXT_MODE_CTRL_MDI_X_);
-			phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
-				  LAN88XX_EXT_PAGE_SPACE_0);
-		} else if (cmd->eth_tp_mdix_ctrl == ETH_TP_MDI_AUTO) {
-			phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
-				  LAN88XX_EXT_PAGE_SPACE_1);
-			temp = phy_read(phydev,
Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
On 09/16/2015 03:57 PM, KY Srinivasan wrote:
>
>> -----Original Message-----
>> From: Alexander Duyck [mailto:alexander.du...@gmail.com]
>> Sent: Wednesday, September 16, 2015 2:39 PM
>> To: KY Srinivasan; David Laight; Haiyang Zhang; Vitaly Kuznetsov;
>> netdev@vger.kernel.org
>> Cc: David S. Miller; linux-ker...@vger.kernel.org; Jason Wang
>> Subject: Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
>>
>> On 09/16/2015 10:55 AM, KY Srinivasan wrote:
>>>
>>>> -----Original Message-----
>>>> From: David Laight [mailto:david.lai...@aculab.com]
>>>> Sent: Wednesday, September 16, 2015 9:25 AM
>>>> To: Haiyang Zhang; Vitaly Kuznetsov; netdev@vger.kernel.org
>>>> Cc: David S. Miller; linux-ker...@vger.kernel.org; KY Srinivasan;
>>>> Jason Wang
>>>> Subject: RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
>>>>
>>>> From: Haiyang Zhang
>>>>> Sent: 16 September 2015 17:09
>>>>>> -----Original Message-----
>>>>>> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
>>>>>> Sent: Wednesday, September 16, 2015 11:50 AM
>>>>>> To: netdev@vger.kernel.org
>>>>>> Cc: David S. Miller; linux-ker...@vger.kernel.org; KY Srinivasan;
>>>>>> Haiyang Zhang; Jason Wang
>>>>>> Subject: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
>>>>>>
>>>>>> Commit b08cc79155fc26d0d112b1470d1ece5034651a4b ("hv_netvsc: Eliminate
>>>>>> memory allocation in the packet send path") introduced skb headroom
>>>>>> request for Hyper-V netvsc driver:
>>>>>>
>>>>>>     max_needed_headroom = sizeof(struct hv_netvsc_packet) +
>>>>>>                           sizeof(struct rndis_message) +
>>>>>>                           NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE +
>>>>>>                           NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE;
>>>>>>     ...
>>>>>>     net->needed_headroom = max_needed_headroom;
>>>>>>
>>>>>> max_needed_headroom is 220 bytes, it significantly exceeds the
>>>>>> LL_MAX_HEADER setting. This causes each skb to be cloned on send path,
>>>>>> e.g. for IPv4 case we fall into the following clause
>>>>>> (ip_finish_output2()):
>>>>>>
>>>>>>     if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
>>>>>>             ...
>>>>>>             skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
>>>>>>             ...
>>>>>>     }
>>>>>>
>>>>>> leading to a significant performance regression. Increase LL_MAX_HEADER
>>>>>> to make it suitable for netvsc, make it 224 to be 16-aligned.
>>>>>> Alternatively we could (partially) revert the commit which introduced skb
>>>>>> headroom request restoring manual memory allocation on transmit path.
>>>>>>
>>>>>> Signed-off-by: Vitaly Kuznetsov
>>>>>> ---
>>>>>>  include/linux/netdevice.h | 4 +++-
>>>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>> index 88a0069..7233790 100644
>>>>>> --- a/include/linux/netdevice.h
>>>>>> +++ b/include/linux/netdevice.h
>>>>>> @@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc)
>>>>>>   * used.
>>>>>>   */
>>>>>>
>>>>>> -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
>>>>>> +#if IS_ENABLED(CONFIG_HYPERV_NET)
>>>>>> +# define LL_MAX_HEADER 224
>>>>>> +#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
>>>>>>  # if defined(CONFIG_MAC80211_MESH)
>>>>>>  # define LL_MAX_HEADER 128
>>>>>>  # else
>>>>> Thanks for the patch.
>>>>> To avoid we forget to update that 224 number when we add more things
>>>>> into netvsc header, I suggest that we define a macro in netdevice.h such
>>>>> as:
>>>>> #define HVNETVSC_MAX_HEADER 224
>>>>> #define LL_MAX_HEADER HVNETVSC_MAX_HEADER
>>>>>
>>>>> And, put a note in netvsc code saying the header reservation shouldn't
>>>>> exceed HVNETVSC_MAX_HEADER, or you need to update HVNETVSC_MAX_HEADER.
>>>> Am I right in thinking this is adding an extra 96 unused bytes to the front
>>>> of almost all skb just so that hyper-v can make its link level header
>>>> contiguous with whatever follows (IP header ?).
>>>>
>>>> Doesn't sound ideal.
>>> Remote NDIS is the protocol used to send packets from the guest to the host.
>>> Every packet needs to be decorated with the RNDIS header and the maximum
>>> room needed for the RNDIS header is the headroom we want.
>>
>> I think we get that. The question is does the Remote NDIS header and
>> packet info actually need to be a part of the header data? I would
>> argue that it probably doesn't.
>>
>> So for example in netvsc_start_xmit it looks like you are calling
>> init_page_array in order to populate a set of page buffers, but the
>> first buffer for the Remote NDIS protocol is populated as a separate
>> page and offset. As such it doesn't seem like it necessarily needs to
>> be a part of the header data but could be maintained perhaps in a
>> separate ring buffer, or perhaps just be a separate page that you break
>> up to use for each header.
> You are right; the rndis header can be built as a separate fragment and sent.
> Indeed this is what we were doing earlier - on the outgoing path we would
> allocate memory for the rndis header. My goal was to avoid this allocation on
> every packet being sent and
Re: [PATCH net-next v2] net: Initialize table in fib result
On 16/09/15 09:16, David Ahern wrote:
> The root cause is use of res.table uninitialized.
>
> Thanks to Nikolay for noticing the uninitialized use amongst the maze of
> gotos.
>
> As Nikolay pointed out the second initialization is not required to fix
> the oops, but rather to fix a related problem where a valid lookup should
> be invalidated before creating the rth entry.
>
> Fixes: b7503e0cdb5d ("net: Add FIB table id to rtable")
> Reported-by: Sergey Senozhatsky
> Reported-by: Richard Alpe
> Reported-by: Fabio Estevam
> Tested-by: Fabio Estevam
> Signed-off-by: David Ahern

There are enough Tested-by tags, but thanks for fixing this!
--
Florian
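The class of bug described above can be reproduced in miniature. The following is a sketch under stated assumptions rather than the actual route.c code: struct table, struct lookup_result and route_table_id() are hypothetical stand-ins, and the take_shortcut flag models any goto that bypasses the lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's FIB table and lookup result. */
struct table {
	unsigned int id;
};

struct lookup_result {
	const struct table *table;
	const void *info;
};

/* Returns the table id recorded for the route, or 0 when none was set. */
unsigned int route_table_id(int take_shortcut, const struct table *found)
{
	struct lookup_result res;

	/* Initialize up front so that no goto can reach make_route
	 * with res.table holding stack garbage. */
	res.table = NULL;
	res.info = NULL;

	if (take_shortcut)
		goto make_route;

	res.table = found; /* stands in for a successful lookup */

make_route:
	return res.table ? res.table->id : 0;
}
```

Without the two assignments at the top, the take_shortcut path would dereference an uninitialized pointer, which is the kind of oops the commit message refers to.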
RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.du...@gmail.com]
> Sent: Wednesday, September 16, 2015 2:39 PM
> To: KY Srinivasan; David Laight; Haiyang Zhang; Vitaly Kuznetsov;
> netdev@vger.kernel.org
> Cc: David S. Miller; linux-ker...@vger.kernel.org; Jason Wang
> Subject: Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
>
> On 09/16/2015 10:55 AM, KY Srinivasan wrote:
> >
> >> -----Original Message-----
> >> From: David Laight [mailto:david.lai...@aculab.com]
> >> Sent: Wednesday, September 16, 2015 9:25 AM
> >> To: Haiyang Zhang; Vitaly Kuznetsov; netdev@vger.kernel.org
> >> Cc: David S. Miller; linux-ker...@vger.kernel.org; KY Srinivasan;
> >> Jason Wang
> >> Subject: RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> >>
> >> From: Haiyang Zhang
> >>> Sent: 16 September 2015 17:09
> >>>> -----Original Message-----
> >>>> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> >>>> Sent: Wednesday, September 16, 2015 11:50 AM
> >>>> To: netdev@vger.kernel.org
> >>>> Cc: David S. Miller; linux-ker...@vger.kernel.org; KY Srinivasan;
> >>>> Haiyang Zhang; Jason Wang
> >>>> Subject: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> >>>>
> >>>> Commit b08cc79155fc26d0d112b1470d1ece5034651a4b ("hv_netvsc: Eliminate
> >>>> memory allocation in the packet send path") introduced skb headroom
> >>>> request for Hyper-V netvsc driver:
> >>>>
> >>>>     max_needed_headroom = sizeof(struct hv_netvsc_packet) +
> >>>>                           sizeof(struct rndis_message) +
> >>>>                           NDIS_VLAN_PPI_SIZE + NDIS_CSUM_PPI_SIZE +
> >>>>                           NDIS_LSO_PPI_SIZE + NDIS_HASH_PPI_SIZE;
> >>>>     ...
> >>>>     net->needed_headroom = max_needed_headroom;
> >>>>
> >>>> max_needed_headroom is 220 bytes, it significantly exceeds the
> >>>> LL_MAX_HEADER setting. This causes each skb to be cloned on send path,
> >>>> e.g. for IPv4 case we fall into the following clause
> >>>> (ip_finish_output2()):
> >>>>
> >>>>     if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
> >>>>             ...
> >>>>             skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
> >>>>             ...
> >>>>     }
> >>>>
> >>>> leading to a significant performance regression. Increase LL_MAX_HEADER
> >>>> to make it suitable for netvsc, make it 224 to be 16-aligned.
> >>>> Alternatively we could (partially) revert the commit which introduced skb
> >>>> headroom request restoring manual memory allocation on transmit path.
> >>>>
> >>>> Signed-off-by: Vitaly Kuznetsov
> >>>> ---
> >>>>  include/linux/netdevice.h | 4 +++-
> >>>>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>>> index 88a0069..7233790 100644
> >>>> --- a/include/linux/netdevice.h
> >>>> +++ b/include/linux/netdevice.h
> >>>> @@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc)
> >>>>   * used.
> >>>>   */
> >>>>
> >>>> -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
> >>>> +#if IS_ENABLED(CONFIG_HYPERV_NET)
> >>>> +# define LL_MAX_HEADER 224
> >>>> +#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
> >>>>  # if defined(CONFIG_MAC80211_MESH)
> >>>>  # define LL_MAX_HEADER 128
> >>>>  # else
> >>> Thanks for the patch.
> >>> To avoid we forget to update that 224 number when we add more things
> >>> into netvsc header, I suggest that we define a macro in netdevice.h such
> >>> as:
> >>> #define HVNETVSC_MAX_HEADER 224
> >>> #define LL_MAX_HEADER HVNETVSC_MAX_HEADER
> >>>
> >>> And, put a note in netvsc code saying the header reservation shouldn't
> >>> exceed HVNETVSC_MAX_HEADER, or you need to update HVNETVSC_MAX_HEADER.
> >>
> >> Am I right in thinking this is adding an extra 96 unused bytes to the front
> >> of almost all skb just so that hyper-v can make its link level header
> >> contiguous with whatever follows (IP header ?).
> >>
> >> Doesn't sound ideal.
> > Remote NDIS is the protocol used to send packets from the guest to the host.
> > Every packet needs to be decorated with the RNDIS header and the maximum
> > room needed for the RNDIS header is the headroom we want.
>
> I think we get that. The question is does the Remote NDIS header and
> packet info actually need to be a part of the header data? I would
> argue that it probably doesn't.
>
> So for example in netvsc_start_xmit it looks like you are calling
> init_page_array in order to populate a set of page buffers, but the
> first buffer for the Remote NDIS
RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.du...@gmail.com]
> Sent: Wednesday, September 16, 2015 4:49 PM
> To: KY Srinivasan; David Laight; Haiyang Zhang; Vitaly Kuznetsov;
> netdev@vger.kernel.org
> Cc: David S. Miller; linux-ker...@vger.kernel.org; Jason Wang
> Subject: Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
>
> On 09/16/2015 03:57 PM, KY Srinivasan wrote:
> >
> >> -----Original Message-----
> >> From: Alexander Duyck [mailto:alexander.du...@gmail.com]
> >> Sent: Wednesday, September 16, 2015 2:39 PM
> >> To: KY Srinivasan; David Laight; Haiyang Zhang; Vitaly Kuznetsov;
> >> netdev@vger.kernel.org
> >> Cc: David S. Miller; linux-ker...@vger.kernel.org; Jason Wang
> >> Subject: Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> >>
> >> On 09/16/2015 10:55 AM, KY Srinivasan wrote:
> >>>> -----Original Message-----
> >>>> From: David Laight [mailto:david.lai...@aculab.com]
> >>>> Sent: Wednesday, September 16, 2015 9:25 AM
> >>>> To: Haiyang Zhang; Vitaly Kuznetsov; netdev@vger.kernel.org
> >>>> Cc: David S. Miller; linux-ker...@vger.kernel.org; KY Srinivasan;
> >>>> Jason Wang
> >>>> Subject: RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> >>>>
> >>>> From: Haiyang Zhang
> >>>>> Sent: 16 September 2015 17:09
> >>>>>> -----Original Message-----
> >>>>>> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> >>>>>> Sent: Wednesday, September 16, 2015 11:50 AM
> >>>>>> To: netdev@vger.kernel.org
> >>>>>> Cc: David S. Miller; linux-ker...@vger.kernel.org; KY Srinivasan;
> >>>>>> Haiyang Zhang; Jason Wang
> >>>>>> Subject: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> >>>>>>
> >>>>>> Commit b08cc79155fc26d0d112b1470d1ece5034651a4b ("hv_netvsc: Eliminate
> >>>>>> memory allocation in the packet send path") introduced skb headroom
> >>>>>> request for Hyper-V netvsc driver:
> >>>>>>
> >>>>>>     max_needed_headroom = sizeof(struct hv_netvsc_packet) +
> >>>>>>                           sizeof(struct rndis_message) +
> >>>>>>                           NDIS_VLAN_PPI_SIZE +
> >>>>>>                           NDIS_CSUM_PPI_SIZE +
> >>>>>>                           NDIS_LSO_PPI_SIZE +
> >>>>>>                           NDIS_HASH_PPI_SIZE;
> >>>>>>     ...
> >>>>>>     net->needed_headroom = max_needed_headroom;
> >>>>>>
> >>>>>> max_needed_headroom is 220 bytes, it significantly exceeds the
> >>>>>> LL_MAX_HEADER setting. This causes each skb to be cloned on send path,
> >>>>>> e.g. for IPv4 case we fall into the following clause
> >>>>>> (ip_finish_output2()):
> >>>>>>
> >>>>>>     if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
> >>>>>>             ...
> >>>>>>             skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
> >>>>>>             ...
> >>>>>>     }
> >>>>>>
> >>>>>> leading to a significant performance regression. Increase LL_MAX_HEADER
> >>>>>> to make it suitable for netvsc, make it 224 to be 16-aligned.
> >>>>>> Alternatively we could (partially) revert the commit which introduced
> >>>>>> skb headroom request restoring manual memory allocation on transmit
> >>>>>> path.
> >>>>>>
> >>>>>> Signed-off-by: Vitaly Kuznetsov
> >>>>>> ---
> >>>>>>  include/linux/netdevice.h | 4 +++-
> >>>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>>>>> index 88a0069..7233790 100644
> >>>>>> --- a/include/linux/netdevice.h
> >>>>>> +++ b/include/linux/netdevice.h
> >>>>>> @@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc)
> >>>>>>   * used.
> >>>>>>   */
> >>>>>>
> >>>>>> -#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
> >>>>>> +#if IS_ENABLED(CONFIG_HYPERV_NET)
> >>>>>> +# define LL_MAX_HEADER 224
> >>>>>> +#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
> >>>>>>  # if defined(CONFIG_MAC80211_MESH)
> >>>>>>  # define LL_MAX_HEADER 128
> >>>>>>  # else
> >>>>> Thanks for the patch.
> >>>>> To avoid we forget to update that 224 number when we add more things
> >>>>> into netvsc header, I suggest that we define a macro in netdevice.h such
> >>>>> as:
> >>>>> #define HVNETVSC_MAX_HEADER 224
> >>>>> #define LL_MAX_HEADER HVNETVSC_MAX_HEADER
> >>>>>
> >>>>> And, put a note in netvsc code saying the header reservation shouldn't
> >>>>> exceed HVNETVSC_MAX_HEADER, or you need to update HVNETVSC_MAX_HEADER.
> >>>> Am