[PATCH net-next] bridge: ebtables: Avoid resetting limit rule state
So far any changes with ebtables will reset the state of limit rules,
leading to spikes in traffic. This is especially noticeable if changes
are done frequently, for instance via a daemon.

This patch fixes this by bailing out from (re)setting if the limit
rule was initialized before.

When sending packets every 250ms for 600s, with a
"--limit 1/sec --limit-burst 50" rule and a command like this
in the background:

$ ebtables -N VOIDCHAIN
$ while true; do ebtables -F VOIDCHAIN; sleep 30; done

The results are:

Before: ~1600 packets
After: 650 packets

Signed-off-by: Linus Lüssing
---
 net/bridge/netfilter/ebt_limit.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/bridge/netfilter/ebt_limit.c b/net/bridge/netfilter/ebt_limit.c
index 61a9f1be1263..f74b48633feb 100644
--- a/net/bridge/netfilter/ebt_limit.c
+++ b/net/bridge/netfilter/ebt_limit.c
@@ -69,6 +69,10 @@ static int ebt_limit_mt_check(const struct xt_mtchk_param *par)
 {
 	struct ebt_limit_info *info = par->matchinfo;
 
+	/* Do not reset state on unrelated table changes */
+	if (info->prev)
+		return 0;
+
 	/* Check for overflow. */
 	if (info->burst == 0 ||
 	    user2credits(info->avg * info->burst) < user2credits(info->avg)) {
-- 
2.11.0
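
[Editorial sketch] The before/after packet counts quoted in the patch can be
sanity-checked with a small userspace model. This is a simplified token
bucket and not the kernel's ebt_limit code: count_passed() and all of its
parameters are hypothetical names, credits are tracked in thousandths of a
packet, and a "table change" is modelled as simply refilling the bucket.

```c
#include <assert.h>

/*
 * Simplified token bucket, NOT the kernel's ebt_limit implementation.
 * Credits are kept in thousandths of a packet so that the integer
 * arithmetic is exact. All names here are hypothetical.
 *
 * reset_every_ms == 0 means the bucket state is never reset.
 */
static long count_passed(long duration_ms, long interval_ms,
			 long rate_per_sec, long burst, long reset_every_ms)
{
	const long cost = 1000;		/* milli-credits per packet */
	const long cap = burst * cost;	/* bucket size */
	long credits = cap;		/* bucket starts full */
	long next_reset = reset_every_ms;
	long passed = 0;
	long t;

	assert(interval_ms > 0 && rate_per_sec > 0 && burst > 0);

	for (t = 0; t < duration_ms; t += interval_ms) {
		/* Refill for the time since the previous packet. */
		if (t > 0) {
			credits += rate_per_sec * interval_ms;
			if (credits > cap)
				credits = cap;
		}
		/* Model the reported bug: a table change refills the bucket. */
		while (reset_every_ms > 0 && t >= next_reset) {
			credits = cap;
			next_reset += reset_every_ms;
		}
		/* Pass the packet only if a full credit is available. */
		if (credits >= cost) {
			credits -= cost;
			passed++;
		}
	}
	return passed;
}
```

Under these assumptions, count_passed(600000, 250, 1, 50, 0) gives 649 and
count_passed(600000, 250, 1, 50, 30000) gives 1580, which is in line with
the 650 and ~1600 packets reported (the ideal figure without resets is
burst + rate * duration = 50 + 600 = 650).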
Re: [PATCH] uapi: add SPDX identifier to vm_sockets_diag.h
On Fri, Nov 24, 2017 at 8:08 PM, Stephen Hemminger wrote:
> New file seems to have missed the SPDX license scan and update.
>
> Signed-off-by: Stephen Hemminger
> ---
>  include/uapi/linux/vm_sockets_diag.h | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Stefan Hajnoczi
Re: [RFC net-next 0/6] xdp: make stack perform remove and tests
On Fri, 24 Nov 2017 00:02:32 -0800, Jakub Kicinski wrote:
> >>Something I'm still battling with, and would appreciate help of
> >>wiser people is that occasionally during the test something makes
> >>the refcount of init_net drop to 0 :S I tried to create a simple
> >>reproducer, but seems like just running the script in the loop is
> >>the easiest way to go... Could it have something to do with the
> >>recent TC work? The driver is pretty simple and never touches
> >
> > I don't see how...
>
> To be clear I meant the changes made to destruction of filters, not
> your work. The BPF code doesn't touch ref counts and cls exts do seem
> to hold a ref on the net... but perhaps that's just pointing the
> finger unnecessarily :) I will try to investigate again tomorrow.

Looks like I was lazy when adding the offload and just called
__cls_bpf_delete_prog() instead of extending the error path.
Cong missed this extra call in aae2c35ec892 ("cls_bpf: use
tcf_exts_get_net() before call_rcu()").

We need something like this:

diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index a9f3e317055c..40d4289aea28 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -514,12 +514,8 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 		goto errout_idr;
 
 	ret = cls_bpf_offload(tp, prog, oldprog);
-	if (ret) {
-		if (!oldprog)
-			idr_remove_ext(&head->handle_idr, prog->handle);
-		__cls_bpf_delete_prog(prog);
-		return ret;
-	}
+	if (ret)
+		goto errout_parms;
 
 	if (!tc_in_hw(prog->gen_flags))
 		prog->gen_flags |= TCA_CLS_FLAGS_NOT_IN_HW;
@@ -537,6 +533,13 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	*arg = prog;
 	return 0;
 
+errout_parms:
+	if (cls_bpf_is_ebpf(prog))
+		bpf_prog_put(prog->filter);
+	else
+		bpf_prog_destroy(prog->filter);
+	kfree(prog->bpf_name);
+	kfree(prog->bpf_ops);
 errout_idr:
 	if (!oldprog)
 		idr_remove_ext(&head->handle_idr, prog->handle);
Re: [RFC net-next 3/6] net: xdp: make the stack take care of the tear down
On Sat, 25 Nov 2017 00:24:50 +0100, Daniel Borkmann wrote:
> > +static void dev_xdp_uninstall(struct net_device *dev)
> > +{
> > +	struct netdev_bpf xdp;
> > +	bpf_op_t ndo_bpf;
>
> Can you add a comment here stating that generic XDP does not
> need to be handled since we drop the prog from free_netdev()?
> Potentially we could also drop the generic one from here, that
> way we'd make no difference and have a dev_xdp_install() and
> one dev_xdp_uninstall() for all kind of attach types. Given
> generic XDP should simulate native XDP anyway, probably better
> to just do that.

I will move the freeing of generic XDP here and add a simple test to
the last patch. Thanks!

> > +	ndo_bpf = dev->netdev_ops->ndo_bpf;
> > +	if (!ndo_bpf)
> > +		return;
> > +
> > +	__dev_xdp_query(dev, ndo_bpf, &xdp);
> > +	if (xdp.prog_attached == XDP_ATTACHED_NONE)
> > +		return;
> > +
> > +	/* Program removal should always succeed */
> > +	WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags, NULL));
> > +}
Re: [RFC net-next 3/6] net: xdp: make the stack take care of the tear down
On 11/24/2017 03:36 AM, Jakub Kicinski wrote:
> Since day one of XDP drivers had to remember to free the program
> on the remove path. This leads to code duplication and is error
> prone. Make the stack query the installed programs on unregister
> and if something is installed, remove the program.
>
> Because the remove will now be called before notifiers are
> invoked, BPF offload state of the program will not get destroyed
> before uninstall.
>
> Signed-off-by: Jakub Kicinski
> Reviewed-by: Simon Horman
[...]

Nice work, series looks good to me! One really just minor comment
below:

> diff --git a/net/core/dev.c b/net/core/dev.c
> index 3f271c9cb5e0..a3e932f98419 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -7110,6 +7110,23 @@ static int dev_xdp_install(struct net_device *dev, bpf_op_t bpf_op,
>  	return bpf_op(dev, &xdp);
>  }
>
> +static void dev_xdp_uninstall(struct net_device *dev)
> +{
> +	struct netdev_bpf xdp;
> +	bpf_op_t ndo_bpf;

Can you add a comment here stating that generic XDP does not
need to be handled since we drop the prog from free_netdev()?
Potentially we could also drop the generic one from here, that
way we'd make no difference and have a dev_xdp_install() and
one dev_xdp_uninstall() for all kind of attach types. Given
generic XDP should simulate native XDP anyway, probably better
to just do that.

> +	ndo_bpf = dev->netdev_ops->ndo_bpf;
> +	if (!ndo_bpf)
> +		return;
> +
> +	__dev_xdp_query(dev, ndo_bpf, &xdp);
> +	if (xdp.prog_attached == XDP_ATTACHED_NONE)
> +		return;
> +
> +	/* Program removal should always succeed */
> +	WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags, NULL));
> +}
> +
>  /**
>   *	dev_change_xdp_fd - set or clear a bpf program for a device rx path
>   *	@dev: device
> @@ -7240,6 +7257,7 @@ static void rollback_registered_many(struct list_head *head)
>  		/* Shutdown queueing discipline. */
>  		dev_shutdown(dev);
>
> +		dev_xdp_uninstall(dev);
>
>  		/* Notify protocols, that we are about to destroy
>  		 * this device. They should clean all the things.
>

Thanks,
Daniel
Re: [PATCH net] net: dsa: fix 'increment on 0' warning
On 11/24/2017 08:36 AM, Vivien Didelot wrote:
> Setting the refcount to 0 when allocating a tree to match the number of
> switch devices it holds may cause an 'increment on 0; use-after-free',
> if CONFIG_REFCOUNT_FULL is enabled.
>
> To fix this, do not decrement the refcount of a newly allocated tree,
> increment it when an already allocated tree is found, and decrement it
> after the probing of a switch, as done with the previous behavior.
>
> At the same time, make dsa_tree_get and dsa_tree_put accept a NULL
> argument to simplify callers, and return the tree after incrementation,
> as most kref users like of_node_get and of_node_put do.
>
> Fixes: 8e5bf9759a06 ("net: dsa: simplify tree reference counting")
> Signed-off-by: Vivien Didelot

Reviewed-by: Florian Fainelli
Tested-by: Florian Fainelli

Thanks!
-- 
Florian
Re: [PATCH iproute2/net-next v3]tc: B.W limits can now be specified in %.
On Fri, Nov 24, 2017 at 11:25:28AM -0800, Stephen Hemminger wrote:
> On Sat, 18 Nov 2017 02:13:38 +0530
> Nishanth Devarajan wrote:
>
> > This patch adapts the tc command line interface to allow bandwidth limits
> > to be specified as a percentage of the interface's capacity.
> >
> > Adding this functionality requires passing the specified device string to
> > each class/qdisc which changes the prototype for a couple of functions: the
> > .parse_qopt and .parse_copt interfaces. The device string is a required
> > parameter for tc-qdisc and tc-class, and when not specified, the kernel
> > returns ENODEV. In this patch, if the user tries to specify a bandwidth
> > percentage without naming the device, we return an error from userspace.
> >
> > v2:
> > * Modified and moved int read_prop() from ip/iptuntap.c to lib/utils.c,
> > to make it accessible to tc.
> >
> > v3:
> > * Modified and moved int parse_percent() from tc/q_netem.c to ib/util.c for
> > use in tc.
> >
> > * Changed couple variable names in int parse_percent_rate().
> >
> > * Handled showing error message when device speed is unknown.
> >
> > * Updated man page to warn users that when specifying rates in %, tc only
> > uses the current device speed and does not recalculate if it changes after.
> >
> > During cases when properties (like device speed) are unknown, read_prop()
> > assumes that if the property file can be opened but not read, it means
> > that the property is unknown.
> >
> > Signed-off by: Nishanth Devarajan
> >
>
> Applied, but there were three things that I needed to change:
> 1. The DCO tag is "Signed-off-by" not "Signed-off by"
> 2. The revision history should be below the cut line --- in the mail message
> so that it doesn't end up in the commit message.
> 3. The qopt function declarations now are a really long line.
> I will break them up.
>

Thanks for the help, and will do, I'll keep the feedback in mind for future
patches, thanks.

-Nishanth
[PATCH iproute2] SPDX license identifiers
For all files in iproute2 which do not already have an obvious
license identification, mark them with GPL-2.

If any of the original authors want a more permissive license
than that, please let me know.

Signed-off-by: Stephen Hemminger
---
 Makefile                             | 1 +
 bridge/Makefile                      | 1 +
 bridge/br_common.h                   | 2 ++
 bridge/bridge.c                      | 1 +
 bridge/fdb.c                         | 1 +
 bridge/link.c                        | 1 +
 bridge/mdb.c                         | 1 +
 bridge/vlan.c                        | 1 +
 configure                            | 1 +
 devlink/Makefile                     | 1 +
 examples/bpf/bpf_tailcall.c          | 1 +
 genl/Makefile                        | 1 +
 genl/genl_utils.h                    | 1 +
 genl/static-syms.c                   | 1 +
 include/bpf_api.h                    | 1 +
 include/bpf_elf.h                    | 1 +
 include/bpf_scm.h                    | 1 +
 include/color.h                      | 1 +
 include/dlfcn.h                      | 1 +
 include/ip6tables.h                  | 1 +
 include/iptables.h                   | 1 +
 include/iptables/internal.h          | 1 +
 include/libgenl.h                    | 1 +
 include/libiptc/ipt_kernel_headers.h | 1 +
 include/libiptc/libip6tc.h           | 1 +
 include/libiptc/libiptc.h            | 1 +
 include/libiptc/libxtc.h             | 1 +
 include/libiptc/xtcshared.h          | 1 +
 include/libnetlink.h                 | 1 +
 include/list.h                       | 1 +
 include/ll_map.h                     | 1 +
 include/names.h                      | 1 +
 include/namespace.h                  | 1 +
 include/rt_names.h                   | 1 +
 include/rtm_map.h                    | 1 +
 include/utils.h                      | 1 +
 include/xt-internal.h                | 1 +
 include/xtables.h                    | 1 +
 ip/Makefile                          | 1 +
 ip/ifcfg                             | 1 +
 ip/ila_common.h                      | 1 +
 ip/ip_common.h                       | 1 +
 ip/iplink_dummy.c                    | 1 +
 ip/iplink_ifb.c                      | 1 +
 ip/iplink_nlmon.c                    | 1 +
 ip/iplink_team.c                     | 1 +
 ip/iplink_vcan.c                     | 1 +
 ip/ipnetns.c                         | 1 +
 ip/iproute_lwtunnel.h                | 1 +
 ip/routef                            | 1 +
 ip/routel                            | 2 +-
 ip/rtpr                              | 1 +
 ip/static-syms.c                     | 1 +
 ip/xdp.h                             | 1 +
 lib/Makefile                         | 1 +
 lib/color.c                          | 1 +
 lib/dnet_ntop.c                      | 1 +
 lib/dnet_pton.c                      | 1 +
 lib/exec.c                           | 1 +
 lib/ipx_ntop.c                       | 1 +
 lib/ipx_pton.c                       | 1 +
 lib/libgenl.c                        | 1 +
 lib/mpls_ntop.c                      | 2 ++
 lib/mpls_pton.c                      | 2 ++
 man/Makefile                         | 1 +
 man/man3/Makefile                    | 1 +
 man/man7/Makefile                    | 1 +
 man/man8/Makefile                    | 1 +
 misc/Makefile                        | 1 +
 misc/lnstat.h                        | 1 +
 misc/ssfilter.h                      | 1 +
 netem/Makefile                       | 1 +
 rdma/Makefile                        | 1 +
 tc/Makefile                          | 1 +
 tc/emp_ematch.l                      | 1 +
 tc/f_tcindex.c                       | 1 +
 tc/m_ematch.h                        | 1 +
 tc/q_atm.c                           | 1 +
 tc/q_clsact.c                        | 1 +
 tc/q_dsmark.c                        | 1 +
 tc/q_hhf.c                           | 1 +
 tc/static-syms.c                     | 1 +
 tc/tc_cbq.h                          | 1 +
 tc/tc_common.h                       | 1 +
 tc/tc_core.h                         | 1 +
 tc/tc_red.h                          | 1 +
 tc/tc_util.h                         | 1 +
 testsuite/Makefile                   | 1 +
 testsuite/iproute2/Makefile          | 1 +
 testsuite/tools/Makefile             | 1 +
 tipc/Makefile                        | 1 +
 91 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 6ad961043052..6a51e0db9107 100644
--- a/Makefile
+++ b/Makefile
@@ -1,3 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
 # Top level Makefile for iproute2
 
 ifeq ($(VERBOSE),0)
diff --git a/bridge/Makefile b/bridge/Makefile
index b2ae0a4ed04d..c6b7d08dade4 100644
--- a/bridge/Makefile
+++ b/bridge/Makefile
@@ -1,3 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
 BROBJ = bridge.o fdb.o monitor.o link.o mdb.o vlan.o
 
 include ../config.mk
diff --git a/bridge/br_common.h b/bridge/br_common.h
index 01447ddca337..f07c7d1c9090 100644
--- a/bridge/br_common.h
+++ b/bridge/br_common.h
@@ -1,3 +1,5 @@
+/* SPDX-License-Identifier:
Re: [PATCH 1/3] net: core: export dev_alloc_name_ns
From: Rasmus Villemoes
Date: Tue, 21 Nov 2017 01:34:37 +0100

> dev_alloc_name_ns and dev_get_valid_name now do exactly the same
> thing. Let's expose this functionality as dev_alloc_name_ns
> (obviously, a core function like this won't return an invalid
> name...).
>
> Signed-off-by: Rasmus Villemoes

If you're going to keep one of the routines, keep the one with the
simpler and smaller name, "dev_get_valid_name".
Re: [PATCHv2 net-next 1/1] forcedeth: replace pci_unmap_page with dma_unmap_page
From: Zhu Yanjun
Date: Sun, 19 Nov 2017 22:21:08 -0500

> The function pci_unmap_page is obsolete. So it is replaced with
> the function dma_unmap_page.
>
> CC: Srinivas Eeda
> CC: Joe Jin
> CC: Junxiao Bi
> Signed-off-by: Zhu Yanjun
> ---
> V1->V2: fix direction flag error.

Applied, thank you.
[PATCH] uapi: add SPDX identifier to vm_sockets_diag.h
New file seems to have missed the SPDX license scan and update.

Signed-off-by: Stephen Hemminger
---
 include/uapi/linux/vm_sockets_diag.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/vm_sockets_diag.h b/include/uapi/linux/vm_sockets_diag.h
index 14cd7dc5a187..0b4dd54f3d1e 100644
--- a/include/uapi/linux/vm_sockets_diag.h
+++ b/include/uapi/linux/vm_sockets_diag.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /* AF_VSOCK sock_diag(7) interface for querying open sockets */
 
 #ifndef _UAPI__VM_SOCKETS_DIAG_H__
-- 
2.11.0
Re: [PATCH net-next 00/12] rxrpc: Fixes and improvements
From: David Howells
Date: Fri, 24 Nov 2017 14:37:39 +

> Is it too late for this to go to Linus in this merge window?

These look predominantly like fixes so I'll pull this in.

Thanks.
Re: [PATCH iproute2/net-next v3]tc: B.W limits can now be specified in %.
On Sat, 18 Nov 2017 02:13:38 +0530
Nishanth Devarajan wrote:

> This patch adapts the tc command line interface to allow bandwidth limits
> to be specified as a percentage of the interface's capacity.
>
> Adding this functionality requires passing the specified device string to
> each class/qdisc which changes the prototype for a couple of functions: the
> .parse_qopt and .parse_copt interfaces. The device string is a required
> parameter for tc-qdisc and tc-class, and when not specified, the kernel
> returns ENODEV. In this patch, if the user tries to specify a bandwidth
> percentage without naming the device, we return an error from userspace.
>
> v2:
> * Modified and moved int read_prop() from ip/iptuntap.c to lib/utils.c,
> to make it accessible to tc.
>
> v3:
> * Modified and moved int parse_percent() from tc/q_netem.c to ib/util.c for
> use in tc.
>
> * Changed couple variable names in int parse_percent_rate().
>
> * Handled showing error message when device speed is unknown.
>
> * Updated man page to warn users that when specifying rates in %, tc only
> uses the current device speed and does not recalculate if it changes after.
>
> During cases when properties (like device speed) are unknown, read_prop()
> assumes that if the property file can be opened but not read, it means
> that the property is unknown.
>
> Signed-off by: Nishanth Devarajan
>

Applied, but there were three things that I needed to change:
1. The DCO tag is "Signed-off-by" not "Signed-off by"
2. The revision history should be below the cut line --- in the mail message
   so that it doesn't end up in the commit message.
3. The qopt function declarations now are a really long line.
   I will break them up.
Re: [PATCH] net-sysfs: export gso_max_size attribute
On Fri, 2017-11-24 at 11:43 -0700, David Ahern wrote:
> On 11/24/17 11:32 AM, Eric Dumazet wrote:
> > On Fri, 2017-11-24 at 10:14 -0700, David Ahern wrote:
> > > On 11/22/17 5:30 PM, Solio Sarabia wrote:
> > > > The netdevice gso_max_size is exposed to allow users fine-control on
> > > > systems with multiple NICs with different GSO buffer sizes, and where
> > > > the virtual devices like bridge and veth, need to be aware of the GSO
> > > > size of the underlying devices.
> > > >
> > > > In a virtualized environment, setting the right GSO sizes for physical
> > > > and virtual devices makes all TSO work to be on physical NIC, improving
> > > > throughput and reducing CPU util. If virtual devices send buffers
> > > > greater than what NIC supports, it forces host to do TSO for buffers
> > > > exceeding the limit, increasing CPU utilization in host.
> > > >
> > > > Suggested-by: Shiny Sebastian
> > > > Signed-off-by: Solio Sarabia
> > > > ---
> > >
> > > This should be added to rtnetlink rather than sysfs.
> >
> > This is already exposed by rtnetlink [1]
>
> It currently is read-only. This patch wants to control setting it.
>
> >
> > Please lets not add yet another net-sysfs knob.
>
> Which is my main point - no more sysfs files.
>

I was not objecting to your point, sorry if this was not obvious.

I usually hit reply on the latest email, not the first one in the
thread.

Proper support for changing these attributes is more complex than that
trivial change.

Bonding and team devices, and tunnels comes to mind.
Re: [PATCH] net-sysfs: export gso_max_size attribute
On 11/24/17 11:32 AM, Eric Dumazet wrote:
> On Fri, 2017-11-24 at 10:14 -0700, David Ahern wrote:
>> On 11/22/17 5:30 PM, Solio Sarabia wrote:
>>> The netdevice gso_max_size is exposed to allow users fine-control on
>>> systems with multiple NICs with different GSO buffer sizes, and where
>>> the virtual devices like bridge and veth, need to be aware of the GSO
>>> size of the underlying devices.
>>>
>>> In a virtualized environment, setting the right GSO sizes for physical
>>> and virtual devices makes all TSO work to be on physical NIC, improving
>>> throughput and reducing CPU util. If virtual devices send buffers
>>> greater than what NIC supports, it forces host to do TSO for buffers
>>> exceeding the limit, increasing CPU utilization in host.
>>>
>>> Suggested-by: Shiny Sebastian
>>> Signed-off-by: Solio Sarabia
>>> ---
>>
>> This should be added to rtnetlink rather than sysfs.
>
> This is already exposed by rtnetlink [1]

It currently is read-only. This patch wants to control setting it.

>
> Please lets not add yet another net-sysfs knob.

Which is my main point - no more sysfs files.
Re: [PATCH] net-sysfs: export gso_max_size attribute
On Fri, 2017-11-24 at 10:14 -0700, David Ahern wrote:
> On 11/22/17 5:30 PM, Solio Sarabia wrote:
> > The netdevice gso_max_size is exposed to allow users fine-control on
> > systems with multiple NICs with different GSO buffer sizes, and where
> > the virtual devices like bridge and veth, need to be aware of the GSO
> > size of the underlying devices.
> >
> > In a virtualized environment, setting the right GSO sizes for physical
> > and virtual devices makes all TSO work to be on physical NIC, improving
> > throughput and reducing CPU util. If virtual devices send buffers
> > greater than what NIC supports, it forces host to do TSO for buffers
> > exceeding the limit, increasing CPU utilization in host.
> >
> > Suggested-by: Shiny Sebastian
> > Signed-off-by: Solio Sarabia
> > ---
>
> This should be added to rtnetlink rather than sysfs.

This is already exposed by rtnetlink [1]

Please lets not add yet another net-sysfs knob.

[1] c70ce028e834f8e51306217dbdbd441d851c64d3
    net/rtnetlink: add IFLA_GSO_MAX_SEGS and IFLA_GSO_MAX_SIZE attributes
Re: [PATCH net] net: qmi_wwan: add support for Cinterion PLS8
Reinhard Speyerer writes:

> before posting this problem report
> https://developer.gemalto.com/threads/ipv6dualstack-problems-pls8-e-revision-03017
> in the Gemalto developer forum I tested the qmi_wwan/cdc_ether changes
> you suggested above and apart from having two working QMI interfaces
> the IPv6/dualstack problems observed with AT^SWWAN/cdc_ether were
> also gone when using WDSStartNetworkInterface and the QMI interface in
> raw IP mode instead.

Right. I did not know about the "carrier off" issue. But messed up
ethernet headers is a well known problem with all these Qualcomm based
modems. Switching them to raw IP mode is often the only way to make
them work consistently.

Having seen this problem with multiple vendors, where some even have
borrowed our workarounds for their own out-of-tree drivers, makes me
pretty sure that it isn't easily fixable. It's a Qualcomm bug, and I
guess no one is allowed to even look at the code. Much less change it.
Which makes sense given the mess it must be...

> Unfortunately Gemalto does no seems to be willing to provide an
> alternative USB composition which includes QMI interfaces for the
> PLS8. Therefore applying the above changes to qmi_wwan/cdc_ether might
> make the PLS8 network interfaces stop working when Gemalto decides to
> replace their f_rmnet gadget in CDCECM mode with a f_ecm gadget when
> releasing a firmware update.

I don't think this is necessarily a problem. Only the QMI control
channel will stop working should this happen. The qmi_wwan driver will
provide the same network device support as cdc_ether, using CDC ECM
framing.

And to be honest, such a redesign of the modem application for a mature
product is very unlikely, isn't it? Why would Gemalto want to do all
that extra work, taking the risks involved? For what possible purpose?
This is probably the reason they don't want to mess with alternative
USB compositions either.

In any case, I think it is worth adding this device to qmi_wwan if it
works with current firmwares and you, or anyone else, finds it useful.
And it does sound like that based on the IPv6 issues you mention..

But I'll leave the decision to you or anyone else with such a device.

Bjørn
Re: sunrpc: infinite unkillable console spam in xs_tcp_setup_socket
On Mon, 2017-11-20 at 14:02 +0100, Dmitry Vyukov wrote:
> Hello,
>
> The following program triggers infinite stream of the following output
> on console. The program is unkillable and this effectively brings the
> machine down:
>
>
> ** 16 printk messages dropped ** [12875.022917] xs_tcp_setup_socket:
> connect returned unhandled error -113
>

Does the following fix the issue?

8<-
From f48d3f01df45f50f0145060f5272ccf1aea855ac Mon Sep 17 00:00:00 2001
From: Trond Myklebust
Date: Fri, 24 Nov 2017 12:00:24 -0500
Subject: [PATCH] SUNRPC: Allow connect to return EHOSTUNREACH

Reported-by: Dmitry Vyukov
Signed-off-by: Trond Myklebust
---
 net/sunrpc/xprtsock.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 4dad5da388d6..8cb40f8ffa5b 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2437,6 +2437,7 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 	case -ECONNREFUSED:
 	case -ECONNRESET:
 	case -ENETUNREACH:
+	case -EHOSTUNREACH:
 	case -EADDRINUSE:
 	case -ENOBUFS:
 		/*
-- 
2.14.3

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.mykleb...@primarydata.com
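
[Editorial sketch] Error 113 on Linux is EHOSTUNREACH, which is why the
report shows "unhandled error -113" until the case is added. The effect of
the one-line change can be illustrated in userspace as below.
connect_error_is_listed() is a hypothetical helper and not part of
net/sunrpc; only the case list is taken from the quoted hunk, and whatever
the kernel does once one of these cases matches is not shown there and is
not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when a negative errno from a
 * connect attempt is one of the cases listed explicitly in the
 * patched switch statement, false when it would fall through to
 * the "unhandled error" default branch.
 */
static bool connect_error_is_listed(int status)
{
	assert(status <= 0);

	switch (status) {
	case -ECONNREFUSED:
	case -ECONNRESET:
	case -ENETUNREACH:
	case -EHOSTUNREACH:	/* the case added by the patch */
	case -EADDRINUSE:
	case -ENOBUFS:
		return true;
	default:
		return false;
	}
}
```

With the added case, connect_error_is_listed(-EHOSTUNREACH) is true, so in
this model an unreachable host no longer lands in the default branch.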
Re: [PATCH iproute 0/5] ila: additional configuratio support
On Wed, 22 Nov 2017 12:05:32 -0800
Tom Herbert wrote:

> Add configuration support for checksum neutral-map-auto, identifier
> tyoes, and hook type (for LWT).
>
> Tom Herbert (5):
>   ila: Fix reporting of ILA locators and locator match
>   ila: added csum neutral support to ipila
>   ila: support to configure checksum neutral-map-auto
>   ila: support for configuring identifier and hook types
>   ila: create ila_common.h
>
>  ip/ila_common.h       | 105 ++
>  ip/ipila.c            |  57 +--
>  ip/iproute_lwtunnel.c |  68 +++-
>  3 files changed, 200 insertions(+), 30 deletions(-)
>  create mode 100644 ip/ila_common.h
>

Applied, thanks.
Re: [PATCH] net-sysfs: export gso_max_size attribute
On 11/22/17 5:30 PM, Solio Sarabia wrote:
> The netdevice gso_max_size is exposed to allow users fine-control on
> systems with multiple NICs with different GSO buffer sizes, and where
> the virtual devices like bridge and veth, need to be aware of the GSO
> size of the underlying devices.
>
> In a virtualized environment, setting the right GSO sizes for physical
> and virtual devices makes all TSO work to be on physical NIC, improving
> throughput and reducing CPU util. If virtual devices send buffers
> greater than what NIC supports, it forces host to do TSO for buffers
> exceeding the limit, increasing CPU utilization in host.
>
> Suggested-by: Shiny Sebastian
> Signed-off-by: Solio Sarabia
> ---

This should be added to rtnetlink rather than sysfs.
Re: [patch iproute2] tc: move action cookie print out of the stats if
On Fri, 24 Nov 2017 09:28:21 +0100
Jiri Pirko wrote:

> From: Jiri Pirko
>
> Cookie print was made dependent on show_stats for no good reason. Fix
> this bu pushing cookie print ot of the stats if.
>
> Fixes: fd8b3d2c1b9b ("actions: Add support for user cookies")
> Signed-off-by: Jiri Pirko
> ---
>  tc/m_action.c | 17 -
>  1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/tc/m_action.c b/tc/m_action.c
> index 0dce97f..c2fc4f1 100644
> --- a/tc/m_action.c
> +++ b/tc/m_action.c
> @@ -301,19 +301,18 @@ static int tc_print_one_action(FILE *f, struct rtattr *arg)
>  		return err;
>
>  	if (show_stats && tb[TCA_ACT_STATS]) {
> -
>  		fprintf(f, "\tAction statistics:\n");
>  		print_tcstats2_attr(f, tb[TCA_ACT_STATS], "\t", NULL);
> -		if (tb[TCA_ACT_COOKIE]) {
> -			int strsz = RTA_PAYLOAD(tb[TCA_ACT_COOKIE]);
> -			char b1[strsz * 2 + 1];
> -
> -			fprintf(f, "\n\tcookie len %d %s ", strsz,
> -				hexstring_n2a(RTA_DATA(tb[TCA_ACT_COOKIE]),
> -				strsz, b1, sizeof(b1)));
> -		}
>  		fprintf(f, "\n");
>  	}
> +	if (tb[TCA_ACT_COOKIE]) {
> +		int strsz = RTA_PAYLOAD(tb[TCA_ACT_COOKIE]);
> +		char b1[strsz * 2 + 1];
> +
> +		fprintf(f, "\tcookie len %d %s\n", strsz,
> +			hexstring_n2a(RTA_DATA(tb[TCA_ACT_COOKIE]),
> +				      strsz, b1, sizeof(b1)));
> +	}
>
>  	return 0;
>  }

Yes, it should not be under stats flag. The general model is that
-s is for statistics only; and -d is for read only detail values.
So this makes sense.

The problem is that the format of the action cookie needs to be same
on command line argument and on display; i.e. drop the length part of
the display.
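
[Editorial sketch] The requirement in the reply, that the displayed cookie
use the same format as the command line argument, amounts to a round trip:
whatever is printed should parse back to the same bytes. The helpers below
are hypothetical stand-ins written for illustration (they are not
iproute2's hexstring_n2a/hexstring_a2n) and assume the cookie is shown as
plain lowercase hex with no "len N" prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Hypothetical: write len bytes as lowercase hex, without a length prefix. */
static char *cookie_to_hex(const unsigned char *data, size_t len,
			   char *out, size_t outsz)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	assert(outsz >= len * 2 + 1);
	for (i = 0; i < len; i++) {
		out[i * 2] = digits[data[i] >> 4];
		out[i * 2 + 1] = digits[data[i] & 0x0f];
	}
	out[len * 2] = '\0';
	return out;
}

/* Hypothetical: parse a hex string; returns the byte count or -1 on error. */
static int cookie_from_hex(const char *hex, unsigned char *out, size_t outsz)
{
	size_t n = strlen(hex);
	size_t i;

	if (n % 2 || n / 2 > outsz)
		return -1;
	for (i = 0; i < n / 2; i++) {
		int hi = hex_val(hex[i * 2]);
		int lo = hex_val(hex[i * 2 + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (unsigned char)(hi << 4 | lo);
	}
	return (int)(n / 2);
}
```

In this sketch the string produced by cookie_to_hex() can be fed straight
back into cookie_from_hex(); a display that prepends the length would break
that symmetry.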
[PATCH net] net: dsa: fix 'increment on 0' warning
Setting the refcount to 0 when allocating a tree to match the number of
switch devices it holds may cause an 'increment on 0; use-after-free',
if CONFIG_REFCOUNT_FULL is enabled.

To fix this, do not decrement the refcount of a newly allocated tree,
increment it when an already allocated tree is found, and decrement it
after the probing of a switch, as done with the previous behavior.

At the same time, make dsa_tree_get and dsa_tree_put accept a NULL
argument to simplify callers, and return the tree after incrementation,
as most kref users like of_node_get and of_node_put do.

Fixes: 8e5bf9759a06 ("net: dsa: simplify tree reference counting")
Signed-off-by: Vivien Didelot
---
 net/dsa/dsa2.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 44e3fb7dec8c..1e287420ff49 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -51,9 +51,7 @@ static struct dsa_switch_tree *dsa_tree_alloc(int index)
 	INIT_LIST_HEAD(&dst->list);
 	list_add_tail(&dsa_tree_list, &dst->list);
 
-	/* Initialize the reference counter to the number of switches, not 1 */
 	kref_init(&dst->refcount);
-	refcount_set(&dst->refcount.refcount, 0);
 
 	return dst;
 }
@@ -64,20 +62,23 @@ static void dsa_tree_free(struct dsa_switch_tree *dst)
 	kfree(dst);
 }
 
+static struct dsa_switch_tree *dsa_tree_get(struct dsa_switch_tree *dst)
+{
+	if (dst)
+		kref_get(&dst->refcount);
+
+	return dst;
+}
+
 static struct dsa_switch_tree *dsa_tree_touch(int index)
 {
 	struct dsa_switch_tree *dst;
 
 	dst = dsa_tree_find(index);
-	if (!dst)
-		dst = dsa_tree_alloc(index);
-
-	return dst;
-}
-
-static void dsa_tree_get(struct dsa_switch_tree *dst)
-{
-	kref_get(&dst->refcount);
+	if (dst)
+		return dsa_tree_get(dst);
+	else
+		return dsa_tree_alloc(index);
 }
 
 static void dsa_tree_release(struct kref *ref)
@@ -91,7 +92,8 @@ static void dsa_tree_release(struct kref *ref)
 
 static void dsa_tree_put(struct dsa_switch_tree *dst)
 {
-	kref_put(&dst->refcount, dsa_tree_release);
+	if (dst)
+		kref_put(&dst->refcount, dsa_tree_release);
 }
 
 static bool dsa_port_is_dsa(struct dsa_port *port)
@@ -765,6 +767,7 @@ int dsa_register_switch(struct dsa_switch *ds)
 
 	mutex_lock(&dsa2_mutex);
 	err = dsa_switch_probe(ds);
+	dsa_tree_put(ds->dst);
 	mutex_unlock(&dsa2_mutex);
 
 	return err;
-- 
2.15.0
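
[Editorial sketch] The counting scheme described in the commit message (a
new tree starts with one reference, finding an existing tree takes another,
and get/put tolerate NULL) can be modelled in userspace roughly as follows.
This is an illustration only: struct tree, the_tree and the tree_*()
functions are hypothetical, a plain counter stands in for the kernel's
kref, and a one-slot registry stands in for the list of trees.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a refcounted switch tree. */
struct tree {
	int index;
	unsigned int refcount;
};

/* One-slot registry standing in for the list of known trees. */
static struct tree *the_tree;

/* Take a reference; accepts NULL and returns its argument. */
static struct tree *tree_get(struct tree *t)
{
	if (t)
		t->refcount++;
	return t;
}

/* Drop a reference; accepts NULL and frees the tree on the last put. */
static void tree_put(struct tree *t)
{
	if (t && --t->refcount == 0) {
		if (the_tree == t)
			the_tree = NULL;
		free(t);
	}
}

/* Find the tree and take a reference, or allocate it with one reference. */
static struct tree *tree_touch(int index)
{
	struct tree *t;

	if (the_tree)	/* the single slot is taken */
		return the_tree->index == index ? tree_get(the_tree) : NULL;

	t = calloc(1, sizeof(*t));
	if (!t)
		return NULL;
	t->index = index;
	t->refcount = 1;	/* never starts at 0, so get never increments from 0 */
	the_tree = t;
	return t;
}
```

In this model two tree_touch(0) calls leave the counter at 2, and the tree
is only freed after two matching tree_put() calls; the counter is never
incremented from 0, which is the condition the warning in the subject line
complains about.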
Re: 8e5bf9759a ("net: dsa: simplify tree reference counting"): WARNING: CPU: 1 PID: 27 at lib/refcount.c:153 refcount_inc
Hi Fengguang,

Fengguang Wu writes:

> It looks linus/master and linux-next still has this issue.

I sent a fix to net-next before it closes but it hasn't been picked.
Now that it's in the net tree, I'm sending an alternative fix right now.

Thank for the note!

Vivien
[PATCH net-next 00/12] rxrpc: Fixes and improvements
Hi David,

Is it too late for this to go to Linus in this merge window?

---
Here's a set of patches that fix and improve some stuff in the AF_RXRPC
protocol:

The patches are:

 (1) Unlock mutex returned by rxrpc_accept_call().

 (2) Don't set connection upgrade by default.

 (3) Differentiate the call->user_mutex used by the kernel from that used
     by userspace calling sendmsg() to avoid lockdep warnings.

 (4) Delay terminal ACK transmission to a work queue so that it can be
     replaced by the next call if there is one.

 (5) Split the call parameters from the connection parameters so that more
     call-specific parameters can be passed through.

 (6) Fix the call timeouts to work the same as for other RxRPC/AFS
     implementations.

 (7) Don't transmit DELAY ACKs immediately, but instead delay them slightly
     so that they can be discarded or can represent more packets.

 (8) Use RTT to calculate certain protocol timeouts.

 (9) Add a timeout to detect lost ACK/DATA packets.

(10) Add a keepalive function so that we ping the peer if we haven't
     transmitted for a short while, thereby keeping intervening firewall
     routes open.

(11) Make service endpoints expire like they're supposed to so that the UDP
     port can be reused.

(12) Fix connection expiry timers to make cleanup happen in a more timely
     fashion.

The patches can be found here also:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes

Tagged thusly:

	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
	rxrpc-fixes-20171124

David
---
David Howells (12):
      rxrpc: The mutex lock returned by rxrpc_accept_call() needs releasing
      rxrpc: Don't set upgrade by default in sendmsg()
      rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls
      rxrpc: Delay terminal ACK transmission on a client call
      rxrpc: Split the call params from the operation params
      rxrpc: Fix call timeouts
      rxrpc: Don't transmit DELAY ACKs immediately on proposal
      rxrpc: Express protocol timeouts in terms of RTT
      rxrpc: Add a timeout for detecting lost ACKs/lost DATA
      rxrpc: Add keepalive for a call
      rxrpc: Fix service endpoint expiry
      rxrpc: Fix conn expiry timers


 include/trace/events/rxrpc.h |   86
 include/uapi/linux/rxrpc.h   |    1
 net/rxrpc/af_rxrpc.c         |   23
 net/rxrpc/ar-internal.h      |  103 ---
 net/rxrpc/call_accept.c      |    2
 net/rxrpc/call_event.c       |  229 --
 net/rxrpc/call_object.c      |   62 +++
 net/rxrpc/conn_client.c      |   54 --
 net/rxrpc/conn_event.c       |   74 +++---
 net/rxrpc/conn_object.c      |   76 +-
 net/rxrpc/input.c            |   74 +-
 net/rxrpc/misc.c             |   19 +--
 net/rxrpc/net_ns.c           |   33 +-
 net/rxrpc/output.c           |   43
 net/rxrpc/recvmsg.c          |   12 +-
 net/rxrpc/sendmsg.c          |  126 ++-
 net/rxrpc/sysctl.c           |   60 +--
 17 files changed, 752 insertions(+), 325 deletions(-)
[PATCH net-next 01/12] rxrpc: The mutex lock returned by rxrpc_accept_call() needs releasing
The caller of rxrpc_accept_call() must release the lock on call->user_mutex
returned by that function.

Signed-off-by: David Howells
---
 net/rxrpc/sendmsg.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 7d2595582c09..3a99b1a908df 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -619,8 +619,8 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len)
 		/* The socket is now unlocked. */
 		if (IS_ERR(call))
 			return PTR_ERR(call);
-		rxrpc_put_call(call, rxrpc_call_put);
-		return 0;
+		ret = 0;
+		goto out_put_unlock;
 	}
 
 	call = rxrpc_find_call_by_user_ID(rx, p.user_call_ID);
@@ -689,6 +689,7 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len)
 		ret = rxrpc_send_data(rx, call, msg, len, NULL);
 	}
 
+out_put_unlock:
 	mutex_unlock(&call->user_mutex);
 error_put:
 	rxrpc_put_call(call, rxrpc_call_put);
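The fix routes the accept path through the same unlock label as the data path. A minimal userspace sketch of that single-exit idiom follows; struct toy_call, toy_accept() and toy_sendmsg() are hypothetical stand-ins invented for illustration and are not the rxrpc API, and the "mutex" is just a depth counter so that the balance can be checked.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a call object; not the kernel's struct rxrpc_call. */
struct toy_call {
	int lock_depth;	/* 1 while the pretend mutex is held, 0 otherwise */
	int refs;	/* reference count */
};

static void toy_lock(struct toy_call *c)   { c->lock_depth++; }
static void toy_unlock(struct toy_call *c) { c->lock_depth--; }
static void toy_put(struct toy_call *c)    { c->refs--; }

/* Mimics the contract described above: returns with the lock held and a
 * reference taken on behalf of the caller.
 */
static struct toy_call *toy_accept(struct toy_call *c)
{
	c->refs++;
	toy_lock(c);
	return c;
}

/* Both the accept path and the data path leave through out_put_unlock, so
 * the lock handed back by toy_accept() is always released.
 */
static int toy_sendmsg(struct toy_call *c, bool accept)
{
	struct toy_call *call;
	int ret;

	if (accept) {
		call = toy_accept(c);
		ret = 0;
		goto out_put_unlock;
	}

	call = c;
	call->refs++;
	toy_lock(call);
	ret = 1;	/* pretend one unit of data was sent */

out_put_unlock:
	toy_unlock(call);
	toy_put(call);
	return ret;
}
```

Calling toy_sendmsg() with accept set and then clear should leave lock_depth at 0 and refs unchanged in both cases, which is the property the patch restores for the accept path.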
[PATCH net-next 03/12] rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls
Provide a different lockdep key for rxrpc_call::user_mutex when the call is
made on a kernel socket, such as by the AFS filesystem.

The problem is that lockdep registers a false positive between two paths:
userspace calling the sendmsg syscall on a user socket, where
call->user_mutex is held whilst userspace memory is accessed, and the AFS
filesystem, which may perform operations with mmap_sem held by the caller.

In such a case, the following warning is produced.

==
WARNING: possible circular locking dependency detected
4.14.0-fscache+ #243 Tainted: GE
--
modpost/16701 is trying to acquire lock:
 (>io_lock){+.+.}, at: [] afs_begin_vnode_operation+0x33/0x77 [kafs]

but task is already holding lock:
 (>mmap_sem){}, at: [] __do_page_fault+0x1ef/0x486

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (>mmap_sem){}:
       __might_fault+0x61/0x89
       _copy_from_iter_full+0x40/0x1fa
       rxrpc_send_data+0x8dc/0xff3
       rxrpc_do_sendmsg+0x62f/0x6a1
       rxrpc_sendmsg+0x166/0x1b7
       sock_sendmsg+0x2d/0x39
       ___sys_sendmsg+0x1ad/0x22b
       __sys_sendmsg+0x41/0x62
       do_syscall_64+0x89/0x1be
       return_from_SYSCALL_64+0x0/0x75

-> #2 (>user_mutex){+.+.}:
       __mutex_lock+0x86/0x7d2
       rxrpc_new_client_call+0x378/0x80e
       rxrpc_kernel_begin_call+0xf3/0x154
       afs_make_call+0x195/0x454 [kafs]
       afs_vl_get_capabilities+0x193/0x198 [kafs]
       afs_vl_lookup_vldb+0x5f/0x151 [kafs]
       afs_create_volume+0x2e/0x2f4 [kafs]
       afs_mount+0x56a/0x8d7 [kafs]
       mount_fs+0x6a/0x109
       vfs_kern_mount+0x67/0x135
       do_mount+0x90b/0xb57
       SyS_mount+0x72/0x98
       do_syscall_64+0x89/0x1be
       return_from_SYSCALL_64+0x0/0x75

-> #1 (k-sk_lock-AF_RXRPC){+.+.}:
       lock_sock_nested+0x74/0x8a
       rxrpc_kernel_begin_call+0x8a/0x154
       afs_make_call+0x195/0x454 [kafs]
       afs_fs_get_capabilities+0x17a/0x17f [kafs]
       afs_probe_fileserver+0xf7/0x2f0 [kafs]
       afs_select_fileserver+0x83f/0x903 [kafs]
       afs_fetch_status+0x89/0x11d [kafs]
       afs_iget+0x16f/0x4f8 [kafs]
       afs_mount+0x6c6/0x8d7 [kafs]
       mount_fs+0x6a/0x109
       vfs_kern_mount+0x67/0x135
       do_mount+0x90b/0xb57
       SyS_mount+0x72/0x98
       do_syscall_64+0x89/0x1be
       return_from_SYSCALL_64+0x0/0x75

-> #0 (>io_lock){+.+.}:
       lock_acquire+0x174/0x19f
       __mutex_lock+0x86/0x7d2
       afs_begin_vnode_operation+0x33/0x77 [kafs]
       afs_fetch_data+0x80/0x12a [kafs]
       afs_readpages+0x314/0x405 [kafs]
       __do_page_cache_readahead+0x203/0x2ba
       filemap_fault+0x179/0x54d
       __do_fault+0x17/0x60
       __handle_mm_fault+0x6d7/0x95c
       handle_mm_fault+0x24e/0x2a3
       __do_page_fault+0x301/0x486
       do_page_fault+0x236/0x259
       page_fault+0x22/0x30
       __clear_user+0x3d/0x60
       padzero+0x1c/0x2b
       load_elf_binary+0x785/0xdc7
       search_binary_handler+0x81/0x1ff
       do_execveat_common.isra.14+0x600/0x888
       do_execve+0x1f/0x21
       SyS_execve+0x28/0x2f
       do_syscall_64+0x89/0x1be
       return_from_SYSCALL_64+0x0/0x75

other info that might help us debug this:

Chain exists of:
  >io_lock --> >user_mutex --> >mmap_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
  lock(>mmap_sem);
                               lock(>user_mutex);
                               lock(>mmap_sem);
  lock(>io_lock);

 *** DEADLOCK ***

1 lock held by modpost/16701:
 #0: (>mmap_sem){}, at: [] __do_page_fault+0x1ef/0x486

stack backtrace:
CPU: 0 PID: 16701 Comm: modpost Tainted: GE 4.14.0-fscache+ #243
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
 dump_stack+0x67/0x8e
 print_circular_bug+0x341/0x34f
 check_prev_add+0x11f/0x5d4
 ? add_lock_to_list.isra.12+0x8b/0x8b
 ? add_lock_to_list.isra.12+0x8b/0x8b
 ? __lock_acquire+0xf77/0x10b4
 __lock_acquire+0xf77/0x10b4
 lock_acquire+0x174/0x19f
 ? afs_begin_vnode_operation+0x33/0x77 [kafs]
 __mutex_lock+0x86/0x7d2
 ? afs_begin_vnode_operation+0x33/0x77 [kafs]
 ? afs_begin_vnode_operation+0x33/0x77 [kafs]
 ? afs_begin_vnode_operation+0x33/0x77 [kafs]
 afs_begin_vnode_operation+0x33/0x77 [kafs]
 afs_fetch_data+0x80/0x12a [kafs]
 afs_readpages+0x314/0x405 [kafs]
 __do_page_cache_readahead+0x203/0x2ba
 ? filemap_fault+0x179/0x54d
 filemap_fault+0x179/0x54d
 __do_fault+0x17/0x60
 __handle_mm_fault+0x6d7/0x95c
 handle_mm_fault+0x24e/0x2a3
 __do_page_fault+0x301/0x486
 do_page_fault+0x236/0x259
 page_fault+0x22/0x30
RIP: 0010:__clear_user+0x3d/0x60
RSP: 0018:880071e93da0 EFLAGS: 00010202
RAX: RBX: 011c RCX: 011c
RDX: RSI: 0008 RDI: 0060f720
RBP: 0060f720 R08: 0001 R09:
R10:
[PATCH net-next 05/12] rxrpc: Split the call params from the operation params
When rxrpc_sendmsg() parses the control message buffer, it places the parameters extracted into a structure, but lumps together call parameters (such as user call ID) with operation parameters (such as whether to send data, send an abort or accept a call). Split the call parameters out into their own structure, a copy of which is then embedded in the operation parameters struct. The call parameters struct is then passed down into the places that need it instead of passing the individual parameters. This allows for extra call parameters to be added. Signed-off-by: David Howells--- net/rxrpc/af_rxrpc.c|8 ++- net/rxrpc/ar-internal.h | 31 - net/rxrpc/call_object.c | 15 ++ net/rxrpc/sendmsg.c | 51 --- 4 files changed, 60 insertions(+), 45 deletions(-) diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index 9b5c46b052fd..c0cdcf980ffc 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -285,6 +285,7 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock, bool upgrade) { struct rxrpc_conn_parameters cp; + struct rxrpc_call_params p; struct rxrpc_call *call; struct rxrpc_sock *rx = rxrpc_sk(sock->sk); int ret; @@ -302,6 +303,10 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock, if (key && !key->payload.data[0]) key = NULL; /* a no-security key */ + memset(, 0, sizeof(p)); + p.user_call_ID = user_call_ID; + p.tx_total_len = tx_total_len; + memset(, 0, sizeof(cp)); cp.local= rx->local; cp.key = key; @@ -309,8 +314,7 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock, cp.exclusive= false; cp.upgrade = upgrade; cp.service_id = srx->srx_service; - call = rxrpc_new_client_call(rx, , srx, user_call_ID, tx_total_len, -gfp); + call = rxrpc_new_client_call(rx, , srx, , gfp); /* The socket has been unlocked. 
*/ if (!IS_ERR(call)) { call->notify_rx = notify_rx; diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index d1213d503f30..ba63f2231107 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -643,6 +643,35 @@ struct rxrpc_ack_summary { u8 cumulative_acks; }; +/* + * sendmsg() cmsg-specified parameters. + */ +enum rxrpc_command { + RXRPC_CMD_SEND_DATA,/* send data message */ + RXRPC_CMD_SEND_ABORT, /* request abort generation */ + RXRPC_CMD_ACCEPT, /* [server] accept incoming call */ + RXRPC_CMD_REJECT_BUSY, /* [server] reject a call as busy */ +}; + +struct rxrpc_call_params { + s64 tx_total_len; /* Total Tx data length (if send data) */ + unsigned long user_call_ID; /* User's call ID */ + struct { + u32 hard; /* Maximum lifetime (sec) */ + u32 idle; /* Max time since last data packet (msec) */ + u32 normal; /* Max time since last call packet (msec) */ + } timeouts; + u8 nr_timeouts;/* Number of timeouts specified */ +}; + +struct rxrpc_send_params { + struct rxrpc_call_params call; + u32 abort_code; /* Abort code to Tx (if abort) */ + enum rxrpc_command command : 8;/* The command to implement */ + boolexclusive; /* Shared or exclusive call */ + boolupgrade;/* If the connection is upgradeable */ +}; + #include /* @@ -687,7 +716,7 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *, gfp_t); struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *, struct rxrpc_conn_parameters *, struct sockaddr_rxrpc *, -unsigned long, s64, gfp_t); +struct rxrpc_call_params *, gfp_t); int rxrpc_retry_client_call(struct rxrpc_sock *, struct rxrpc_call *, struct rxrpc_conn_parameters *, diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index 1f141dc08ad2..c3e1fa854471 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -208,8 +208,7 @@ static void rxrpc_start_call_timer(struct rxrpc_call *call) struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *rx, struct rxrpc_conn_parameters *cp, struct sockaddr_rxrpc 
*srx, -
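To illustrate the shape of the split described in this patch, here is a minimal userspace sketch. The field names follow the diff, but struct toy_call and the toy_* functions are hypothetical stand-ins, and the timeout fields are left out for brevity. The point is that both the kernel-style caller and the sendmsg-style caller hand one call-parameters struct to the constructor, so a new call parameter only needs adding in one place.

```c
#include <stdint.h>
#include <string.h>

/* Call parameters: field names follow the patch, types are userspace
 * equivalents.
 */
struct call_params {
	int64_t		tx_total_len;	/* Total Tx data length */
	unsigned long	user_call_ID;	/* User's call ID */
};

/* Operation parameters embed a copy of the call parameters. */
struct send_params {
	struct call_params call;
	uint32_t	abort_code;
	int		upgrade;
};

/* Hypothetical call object, for illustration only. */
struct toy_call {
	unsigned long	user_call_ID;
	int64_t		tx_total_len;
};

/* The constructor takes the whole call-parameters struct rather than a
 * growing list of scalar arguments.
 */
static void toy_new_client_call(struct toy_call *call,
				const struct call_params *p)
{
	call->user_call_ID = p->user_call_ID;
	call->tx_total_len = p->tx_total_len;
}

/* Kernel-style caller: builds the call parameters on the stack. */
static void toy_kernel_begin_call(struct toy_call *call,
				  unsigned long user_call_ID,
				  int64_t tx_total_len)
{
	struct call_params p;

	memset(&p, 0, sizeof(p));
	p.user_call_ID = user_call_ID;
	p.tx_total_len = tx_total_len;
	toy_new_client_call(call, &p);
}

/* sendmsg-style caller: passes the embedded call parameters through. */
static void toy_sendmsg_begin_call(struct toy_call *call,
				   const struct send_params *sp)
{
	toy_new_client_call(call, &sp->call);
}
```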
[PATCH net-next 04/12] rxrpc: Delay terminal ACK transmission on a client call
Delay terminal ACK transmission on a client call by deferring it to the connection processor. This allows it to be skipped if we can send the next call instead, the first DATA packet of which will implicitly ack this call. Signed-off-by: David Howells--- net/rxrpc/ar-internal.h | 17 +++ net/rxrpc/conn_client.c | 18 +++ net/rxrpc/conn_event.c | 74 +++ net/rxrpc/conn_object.c | 10 ++ net/rxrpc/recvmsg.c |2 + 5 files changed, 108 insertions(+), 13 deletions(-) diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index a972887b3f5d..d1213d503f30 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -338,8 +338,17 @@ enum rxrpc_conn_flag { RXRPC_CONN_DONT_REUSE, /* Don't reuse this connection */ RXRPC_CONN_COUNTED, /* Counted by rxrpc_nr_client_conns */ RXRPC_CONN_PROBING_FOR_UPGRADE, /* Probing for service upgrade */ + RXRPC_CONN_FINAL_ACK_0, /* Need final ACK for channel 0 */ + RXRPC_CONN_FINAL_ACK_1, /* Need final ACK for channel 1 */ + RXRPC_CONN_FINAL_ACK_2, /* Need final ACK for channel 2 */ + RXRPC_CONN_FINAL_ACK_3, /* Need final ACK for channel 3 */ }; +#define RXRPC_CONN_FINAL_ACK_MASK ((1UL << RXRPC_CONN_FINAL_ACK_0) | \ + (1UL << RXRPC_CONN_FINAL_ACK_1) |\ + (1UL << RXRPC_CONN_FINAL_ACK_2) |\ + (1UL << RXRPC_CONN_FINAL_ACK_3)) + /* * Events that can be raised upon a connection. 
*/ @@ -393,6 +402,7 @@ struct rxrpc_connection { #define RXRPC_ACTIVE_CHANS_MASK((1 << RXRPC_MAXCALLS) - 1) struct list_headwaiting_calls; /* Calls waiting for channels */ struct rxrpc_channel { + unsigned long final_ack_at; /* Time at which to issue final ACK */ struct rxrpc_call __rcu *call; /* Active call */ u32 call_id;/* ID of current call */ u32 call_counter; /* Call ID counter */ @@ -404,6 +414,7 @@ struct rxrpc_connection { }; } channels[RXRPC_MAXCALLS]; + struct timer_list timer; /* Conn event timer */ struct work_struct processor; /* connection event processor */ union { struct rb_node client_node;/* Node in local->client_conns */ @@ -861,6 +872,12 @@ static inline void rxrpc_put_connection(struct rxrpc_connection *conn) rxrpc_put_service_conn(conn); } +static inline void rxrpc_reduce_conn_timer(struct rxrpc_connection *conn, + unsigned long expire_at) +{ + timer_reduce(>timer, expire_at); +} + /* * conn_service.c */ diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c index 5f9624bd311c..cfb997593da9 100644 --- a/net/rxrpc/conn_client.c +++ b/net/rxrpc/conn_client.c @@ -554,6 +554,11 @@ static void rxrpc_activate_one_channel(struct rxrpc_connection *conn, trace_rxrpc_client(conn, channel, rxrpc_client_chan_activate); + /* Cancel the final ACK on the previous call if it hasn't been sent yet +* as the DATA packet will implicitly ACK it. +*/ + clear_bit(RXRPC_CONN_FINAL_ACK_0 + channel, >flags); + write_lock_bh(>state_lock); if (!test_bit(RXRPC_CALL_TX_LASTQ, >flags)) call->state = RXRPC_CALL_CLIENT_SEND_REQUEST; @@ -813,6 +818,19 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *call) goto out_2; } + /* Schedule the final ACK to be transmitted in a short while so that it +* can be skipped if we find a follow-on call. The first DATA packet +* of the follow on call will implicitly ACK this call. 
+*/ + if (test_bit(RXRPC_CALL_EXPOSED, >flags)) { + unsigned long final_ack_at = jiffies + 2; + + WRITE_ONCE(chan->final_ack_at, final_ack_at); + smp_wmb(); /* vs rxrpc_process_delayed_final_acks() */ + set_bit(RXRPC_CONN_FINAL_ACK_0 + channel, >flags); + rxrpc_reduce_conn_timer(conn, final_ack_at); + } + /* Things are more complex and we need the cache lock. We might be * able to simply idle the conn or it might now be lurking on the wait * list. It might even get moved back to the active list whilst we're diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c index 59a51a56e7c8..9e9a8db1bc9c 100644 --- a/net/rxrpc/conn_event.c +++ b/net/rxrpc/conn_event.c @@ -24,9 +24,10 @@ * Retransmit terminal ACK or ABORT of the previous call. */ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, - struct sk_buff *skb) +
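The deferred final ACK can be pictured as one pending bit plus one deadline per channel. The sketch below is a single-threaded userspace model under that assumption: struct toy_conn and the toy_* functions are hypothetical, and the real code's atomic bit operations, memory barrier and connection timer are deliberately left out.

```c
#define TOY_MAXCALLS 4

/* Hypothetical connection model; not the kernel's struct rxrpc_connection. */
struct toy_conn {
	unsigned long flags;			  /* bit n: final ACK pending on channel n */
	unsigned long final_ack_at[TOY_MAXCALLS]; /* tick at which to send it */
	unsigned int  acks_sent;		  /* explicit final ACKs transmitted */
};

/* Call finished on a channel: schedule the final ACK a short while ahead
 * (the patch uses jiffies + 2) instead of sending it immediately.
 */
static void toy_disconnect_call(struct toy_conn *conn, unsigned int channel,
				unsigned long now)
{
	conn->final_ack_at[channel] = now + 2;
	conn->flags |= 1UL << channel;
}

/* A follow-on call takes the channel: its first DATA packet implicitly
 * ACKs the previous call, so the pending final ACK is cancelled.
 */
static void toy_activate_channel(struct toy_conn *conn, unsigned int channel)
{
	conn->flags &= ~(1UL << channel);
}

/* Connection processor: send only those final ACKs that are still pending
 * and whose deadline has passed.
 */
static void toy_process_final_acks(struct toy_conn *conn, unsigned long now)
{
	unsigned int ch;

	for (ch = 0; ch < TOY_MAXCALLS; ch++) {
		if (!(conn->flags & (1UL << ch)))
			continue;
		if ((long)(now - conn->final_ack_at[ch]) < 0)
			continue;	/* not due yet */
		conn->flags &= ~(1UL << ch);
		conn->acks_sent++;
	}
}
```

With two channels disconnected at the same tick and one of them reused before the deadline, only one explicit ACK should be counted once the deadline passes.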
[PATCH net-next 02/12] rxrpc: Don't set upgrade by default in sendmsg()
Don't set upgrade by default when creating a call from sendmsg().  This is
a holdover from when I was testing the code.

Signed-off-by: David Howells
---
 net/rxrpc/sendmsg.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 3a99b1a908df..94555c94b2d8 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -602,7 +602,7 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len)
 		.abort_code	= 0,
 		.command	= RXRPC_CMD_SEND_DATA,
 		.exclusive	= false,
-		.upgrade	= true,
+		.upgrade	= false,
 	};
 
 	_enter("");
[PATCH net-next 07/12] rxrpc: Don't transmit DELAY ACKs immediately on proposal
Don't transmit a DELAY ACK immediately on proposal when the Rx window is
rotated, but rather defer it to the work function.  This means that we have
a chance to queue/consume more received packets before we actually send the
DELAY ACK, or even cancel it entirely, thereby reducing the number of
packets transmitted.

We do, however, want to continue sending other types of packet immediately,
particularly REQUESTED ACKs, as they may be used for RTT calculation by the
other side.

Signed-off-by: David Howells
---
 net/rxrpc/recvmsg.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 0b6609da80b7..fad5f42a3abd 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -219,9 +219,9 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
 		    after_eq(top, call->ackr_seen + 2) ||
 		    (hard_ack == top && after(hard_ack, call->ackr_consumed)))
 			rxrpc_propose_ACK(call, RXRPC_ACK_DELAY, 0, serial,
-					  true, false,
+					  true, true,
 					  rxrpc_propose_ack_rotate_rx);
-		if (call->ackr_reason)
+		if (call->ackr_reason && call->ackr_reason != RXRPC_ACK_DELAY)
 			rxrpc_send_ack_packet(call, false);
 	}
 }
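The decision made after rotating the Rx window can be reduced to a small predicate: a pending DELAY ACK is left for the work function, anything else pending goes out at once. A minimal sketch of just that decision, with a hypothetical struct toy_call and enum rather than the real rxrpc types:

```c
#include <stdbool.h>

/* Hypothetical subset of ACK reasons, for illustration only. */
enum toy_ack_reason { TOY_ACK_NONE = 0, TOY_ACK_REQUESTED, TOY_ACK_DELAY };

struct toy_call {
	enum toy_ack_reason ackr_reason;	/* currently proposed ACK, if any */
	unsigned int sent_immediately;		/* ACKs transmitted straight away */
	unsigned int left_for_worker;		/* DELAY ACKs deferred to the worker */
};

/* Returns true if an ACK was transmitted straight away. */
static bool toy_after_rotate(struct toy_call *call)
{
	if (call->ackr_reason == TOY_ACK_NONE)
		return false;

	if (call->ackr_reason == TOY_ACK_DELAY) {
		/* Leave it proposed; more packets may arrive first, or the
		 * ACK may be superseded entirely.
		 */
		call->left_for_worker++;
		return false;
	}

	/* e.g. a REQUESTED ACK, which the peer may use for RTT measurement */
	call->sent_immediately++;
	call->ackr_reason = TOY_ACK_NONE;
	return true;
}
```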
[PATCH net-next 06/12] rxrpc: Fix call timeouts
Fix the rxrpc call expiration timeouts and make them settable from
userspace.  By analogy with other rx implementations, there should be three
timeouts:

 (1) "Normal timeout"

     This is set for all calls and is triggered if we haven't received any
     packets from the peer in a while.  It is measured from the last time
     we received any packet on that call.  This is not reset by any
     connection packets (such as CHALLENGE/RESPONSE packets).

     If a service operation takes a long time, the server should generate
     PING ACKs at a duration that's substantially less than the normal
     timeout so as to keep both sides alive.  This is set at 1/6 of normal
     timeout.

 (2) "Idle timeout"

     This is set only for a service call and is triggered if we stop
     receiving the DATA packets that comprise the request data.  It is
     measured from the last time we received a DATA packet.

 (3) "Hard timeout"

     This can be set for a call and specifies the maximum lifetime of that
     call.  It should not be specified by default.  Some operations (such
     as volume transfer) take a long time.

Allow userspace to set/change the timeouts on a call with sendmsg, using a
control message:

	RXRPC_SET_CALL_TIMEOUTS

The data to the message is a number of 32-bit words, not all of which need
be given:

	u32 hard_timeout;	/* sec from first packet */
	u32 idle_timeout;	/* msec from data Rx */
	u32 normal_timeout;	/* msec from packet Rx */

This can be set in combination with any other sendmsg() that affects a
call.
Signed-off-by: David Howells--- include/trace/events/rxrpc.h | 69 +++- include/uapi/linux/rxrpc.h |1 net/rxrpc/ar-internal.h | 37 ++--- net/rxrpc/call_event.c | 179 -- net/rxrpc/call_object.c | 27 -- net/rxrpc/conn_client.c |4 - net/rxrpc/input.c| 34 +++- net/rxrpc/misc.c | 19 ++-- net/rxrpc/recvmsg.c |2 net/rxrpc/sendmsg.c | 59 +++--- net/rxrpc/sysctl.c | 60 +++--- 11 files changed, 290 insertions(+), 201 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index ebe96796027a..01dcbc2164b5 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -138,10 +138,20 @@ enum rxrpc_rtt_rx_trace { enum rxrpc_timer_trace { rxrpc_timer_begin, + rxrpc_timer_exp_ack, + rxrpc_timer_exp_hard, + rxrpc_timer_exp_idle, + rxrpc_timer_exp_normal, + rxrpc_timer_exp_ping, + rxrpc_timer_exp_resend, rxrpc_timer_expired, rxrpc_timer_init_for_reply, rxrpc_timer_init_for_send_reply, + rxrpc_timer_restart, rxrpc_timer_set_for_ack, + rxrpc_timer_set_for_hard, + rxrpc_timer_set_for_idle, + rxrpc_timer_set_for_normal, rxrpc_timer_set_for_ping, rxrpc_timer_set_for_resend, rxrpc_timer_set_for_send, @@ -296,12 +306,22 @@ enum rxrpc_congest_change { #define rxrpc_timer_traces \ EM(rxrpc_timer_begin, "Begin ") \ EM(rxrpc_timer_expired, "*EXPR*") \ + EM(rxrpc_timer_exp_ack, "ExpAck") \ + EM(rxrpc_timer_exp_hard,"ExpHrd") \ + EM(rxrpc_timer_exp_idle,"ExpIdl") \ + EM(rxrpc_timer_exp_normal, "ExpNml") \ + EM(rxrpc_timer_exp_ping,"ExpPng") \ + EM(rxrpc_timer_exp_resend, "ExpRsn") \ EM(rxrpc_timer_init_for_reply, "IniRpl") \ EM(rxrpc_timer_init_for_send_reply, "SndRpl") \ + EM(rxrpc_timer_restart, "Restrt") \ EM(rxrpc_timer_set_for_ack, "SetAck") \ + EM(rxrpc_timer_set_for_hard,"SetHrd") \ + EM(rxrpc_timer_set_for_idle,"SetIdl") \ + EM(rxrpc_timer_set_for_normal, "SetNml") \ EM(rxrpc_timer_set_for_ping,"SetPng") \ EM(rxrpc_timer_set_for_resend, "SetRTx") \ - E_(rxrpc_timer_set_for_send,"SetTx ") + E_(rxrpc_timer_set_for_send,"SetSnd") #define 
rxrpc_propose_ack_traces \ EM(rxrpc_propose_ack_client_tx_end, "ClTxEnd") \ @@ -932,39 +952,44 @@ TRACE_EVENT(rxrpc_rtt_rx, TRACE_EVENT(rxrpc_timer, TP_PROTO(struct rxrpc_call *call, enum rxrpc_timer_trace why, -ktime_t now, unsigned long now_j), +unsigned long now), - TP_ARGS(call, why, now, now_j), + TP_ARGS(call, why, now), TP_STRUCT__entry( __field(struct rxrpc_call *,call ) __field(enum rxrpc_timer_trace, why ) - __field_struct(ktime_t, now
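Because the RXRPC_SET_CALL_TIMEOUTS payload is "a number of 32-bit words, not all of which need be given", the receiver has to accept a prefix of the three-word layout. The sketch below shows one way to parse such a payload in userspace. struct toy_timeouts and toy_parse_timeouts() are hypothetical, and the validation rules (non-empty, whole words, at most three) are assumptions made for the example, not a statement of what the kernel checks.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout follows the commit message: hard, then idle, then normal. */
struct toy_timeouts {
	uint32_t hard;		/* sec: maximum lifetime of the call */
	uint32_t idle;		/* msec: max time since last DATA packet */
	uint32_t normal;	/* msec: max time since last packet of any kind */
};

/* Parse a payload holding one to three 32-bit words.  Timeouts that are
 * not supplied are left as zero.  Returns the number of timeouts given,
 * or -1 if the payload is malformed.
 */
static int toy_parse_timeouts(const void *data, size_t len,
			      struct toy_timeouts *t)
{
	if (len == 0 || len % sizeof(uint32_t) != 0 || len > sizeof(*t))
		return -1;

	memset(t, 0, sizeof(*t));
	memcpy(t, data, len);	/* copy only the words actually given */
	return (int)(len / sizeof(uint32_t));
}
```

A caller that only wants a hard timeout would pass a 4-byte payload; passing all 12 bytes sets all three.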
[PATCH net-next 09/12] rxrpc: Add a timeout for detecting lost ACKs/lost DATA
Add an extra timeout that is set/updated when we send a DATA packet that
has the request-ack flag set.  This allows us to detect if we don't get an
ACK in response to the latest flagged packet.

The ACK packet is adjudged to have been lost if it doesn't turn up within
2*RTT of the transmission.

If the timeout occurs, we schedule the sending of a PING ACK to find out
the state of the other side.  If a new DATA packet is ready to go sooner,
we cancel the sending of the ping and set the request-ack flag on that
instead.

If we get back a PING-RESPONSE ACK that indicates a lower tx_top than what
we had at the time of the ping transmission, we adjudge all the DATA
packets sent between the response tx_top and the ping-time tx_top to have
been lost and retransmit immediately.

Rather than sending a PING ACK, we could just pick a DATA packet and
speculatively retransmit that with request-ack set.  It should result in
either a REQUESTED ACK or a DUPLICATE ACK which we can then use in lieu of
the PING-RESPONSE ACK mentioned above.
Signed-off-by: David Howells--- include/trace/events/rxrpc.h | 11 +-- net/rxrpc/ar-internal.h |6 +- net/rxrpc/call_event.c | 26 ++ net/rxrpc/call_object.c |1 + net/rxrpc/input.c| 40 net/rxrpc/output.c | 20 ++-- net/rxrpc/recvmsg.c |4 ++-- net/rxrpc/sendmsg.c |2 +- 8 files changed, 98 insertions(+), 12 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 01dcbc2164b5..84ade8b76a19 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -141,6 +141,7 @@ enum rxrpc_timer_trace { rxrpc_timer_exp_ack, rxrpc_timer_exp_hard, rxrpc_timer_exp_idle, + rxrpc_timer_exp_lost_ack, rxrpc_timer_exp_normal, rxrpc_timer_exp_ping, rxrpc_timer_exp_resend, @@ -151,6 +152,7 @@ enum rxrpc_timer_trace { rxrpc_timer_set_for_ack, rxrpc_timer_set_for_hard, rxrpc_timer_set_for_idle, + rxrpc_timer_set_for_lost_ack, rxrpc_timer_set_for_normal, rxrpc_timer_set_for_ping, rxrpc_timer_set_for_resend, @@ -309,6 +311,7 @@ enum rxrpc_congest_change { EM(rxrpc_timer_exp_ack, "ExpAck") \ EM(rxrpc_timer_exp_hard,"ExpHrd") \ EM(rxrpc_timer_exp_idle,"ExpIdl") \ + EM(rxrpc_timer_exp_lost_ack,"ExpLoA") \ EM(rxrpc_timer_exp_normal, "ExpNml") \ EM(rxrpc_timer_exp_ping,"ExpPng") \ EM(rxrpc_timer_exp_resend, "ExpRsn") \ @@ -318,6 +321,7 @@ enum rxrpc_congest_change { EM(rxrpc_timer_set_for_ack, "SetAck") \ EM(rxrpc_timer_set_for_hard,"SetHrd") \ EM(rxrpc_timer_set_for_idle,"SetIdl") \ + EM(rxrpc_timer_set_for_lost_ack,"SetLoA") \ EM(rxrpc_timer_set_for_normal, "SetNml") \ EM(rxrpc_timer_set_for_ping,"SetPng") \ EM(rxrpc_timer_set_for_resend, "SetRTx") \ @@ -961,6 +965,7 @@ TRACE_EVENT(rxrpc_timer, __field(enum rxrpc_timer_trace, why ) __field(long, now ) __field(long, ack_at ) + __field(long, ack_lost_at ) __field(long, resend_at ) __field(long, ping_at ) __field(long, expect_rx_by ) @@ -974,6 +979,7 @@ TRACE_EVENT(rxrpc_timer, __entry->why= why; __entry->now= now; __entry->ack_at = call->ack_at; + __entry->ack_lost_at= call->ack_lost_at; 
__entry->resend_at = call->resend_at; __entry->expect_rx_by = call->expect_rx_by; __entry->expect_req_by = call->expect_req_by; @@ -981,10 +987,11 @@ TRACE_EVENT(rxrpc_timer, __entry->timer = call->timer.expires; ), - TP_printk("c=%p %s a=%ld r=%ld xr=%ld xq=%ld xt=%ld t=%ld", + TP_printk("c=%p %s a=%ld la=%ld r=%ld xr=%ld xq=%ld xt=%ld t=%ld", __entry->call, __print_symbolic(__entry->why, rxrpc_timer_traces), __entry->ack_at - __entry->now, + __entry->ack_lost_at - __entry->now, __entry->resend_at - __entry->now, __entry->expect_rx_by - __entry->now,
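The lost-ACK logic in the commit message has three pieces: arm a deadline of 2*RTT when a request-ack DATA packet goes out, ask for a ping when that deadline passes, and work out how many packets look lost from the tx_top in the ping response. A minimal sketch of those three steps, where struct toy_call and the toy_* functions are hypothetical and time is an abstract tick counter:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical call model; not the kernel's struct rxrpc_call. */
struct toy_call {
	unsigned long ack_lost_at;	/* deadline for the expected ACK */
	bool timer_armed;
	bool ping_wanted;
};

/* A DATA packet with request-ack set restarts the deadline at 2*RTT and
 * supersedes any ping that was about to be sent.
 */
static void toy_sent_data(struct toy_call *call, bool request_ack,
			  unsigned long now, unsigned long rtt)
{
	if (!request_ack)
		return;
	call->ack_lost_at = now + 2 * rtt;
	call->timer_armed = true;
	call->ping_wanted = false;
}

/* Deadline check: if the ACK hasn't turned up in time, ask for a PING ACK. */
static void toy_check_lost_ack(struct toy_call *call, unsigned long now)
{
	if (call->timer_armed && (long)(now - call->ack_lost_at) >= 0) {
		call->timer_armed = false;
		call->ping_wanted = true;
	}
}

/* Number of DATA packets adjudged lost: those between the tx_top reported
 * in the ping response and the tx_top at the time the ping was sent.  The
 * signed difference copes with sequence-number wrap.
 */
static uint32_t toy_lost_after_ping(uint32_t ping_tx_top, uint32_t resp_tx_top)
{
	int32_t gap = (int32_t)(ping_tx_top - resp_tx_top);

	return gap > 0 ? (uint32_t)gap : 0;
}
```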
[PATCH net-next 11/12] rxrpc: Fix service endpoint expiry
Make RxRPC service endpoints expire like they're supposed to by the
following means:

 (1) Mark dead rxrpc_net structs (with ->live) rather than twiddling the
     global service conn timeout, otherwise the first rxrpc_net struct to
     die will cause connections on all others to expire immediately from
     then on.

 (2) Mark local service endpoints for which the socket has been closed
     (->service_closed) so that the expiration timeout can be much
     shortened for service and client connections going through that
     endpoint.

 (3) rxrpc_put_service_conn() needs to schedule the reaper when the usage
     count reaches 1, not 0, as idle conns have a 1 count.

 (4) The accumulator for the earliest time we might want to schedule for
     should be initialised to jiffies + MAX_JIFFY_OFFSET, not ULONG_MAX as
     the comparison functions use signed arithmetic.

 (5) Simplify the expiration handling, adding the expiration value to the
     idle timestamp each time rather than keeping track of the time in the
     past before which the idle timestamp must go to be expired.  This is
     much easier to read.

 (6) Ignore the timeouts if the net namespace is dead.

 (7) Restart the service reaper work item rather than the client reaper.
Signed-off-by: David Howells--- include/trace/events/rxrpc.h |2 ++ net/rxrpc/af_rxrpc.c | 13 + net/rxrpc/ar-internal.h |3 +++ net/rxrpc/conn_client.c |2 ++ net/rxrpc/conn_object.c | 42 -- net/rxrpc/net_ns.c |3 +++ 6 files changed, 47 insertions(+), 18 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index e98fed6de497..36cb50c111a6 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -49,6 +49,7 @@ enum rxrpc_conn_trace { rxrpc_conn_put_client, rxrpc_conn_put_service, rxrpc_conn_queued, + rxrpc_conn_reap_service, rxrpc_conn_seen, }; @@ -221,6 +222,7 @@ enum rxrpc_congest_change { EM(rxrpc_conn_put_client, "PTc") \ EM(rxrpc_conn_put_service, "PTs") \ EM(rxrpc_conn_queued, "QUE") \ + EM(rxrpc_conn_reap_service, "RPs") \ E_(rxrpc_conn_seen, "SEE") #define rxrpc_client_traces \ diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index c0cdcf980ffc..abb524c2b8f8 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -867,6 +867,19 @@ static int rxrpc_release_sock(struct sock *sk) sock_orphan(sk); sk->sk_shutdown = SHUTDOWN_MASK; + /* We want to kill off all connections from a service socket +* as fast as possible because we can't share these; client +* sockets, on the other hand, can share an endpoint. 
+*/ + switch (sk->sk_state) { + case RXRPC_SERVER_BOUND: + case RXRPC_SERVER_BOUND2: + case RXRPC_SERVER_LISTENING: + case RXRPC_SERVER_LISTEN_DISABLED: + rx->local->service_closed = true; + break; + } + spin_lock_bh(>sk_receive_queue.lock); sk->sk_state = RXRPC_CLOSE; spin_unlock_bh(>sk_receive_queue.lock); diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index cdcbc798f921..a0082c407005 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -84,6 +84,7 @@ struct rxrpc_net { unsigned intnr_client_conns; unsigned intnr_active_client_conns; boolkill_all_client_conns; + boollive; spinlock_t client_conn_cache_lock; /* Lock for ->*_client_conns */ spinlock_t client_conn_discard_lock; /* Prevent multiple discarders */ struct list_headwaiting_client_conns; @@ -265,6 +266,7 @@ struct rxrpc_local { rwlock_tservices_lock; /* lock for services list */ int debug_id; /* debug ID for printks */ booldead; + boolservice_closed; /* Service socket closed */ struct sockaddr_rxrpc srx;/* local address */ }; @@ -881,6 +883,7 @@ void rxrpc_process_connection(struct work_struct *); * conn_object.c */ extern unsigned int rxrpc_connection_expiry; +extern unsigned int rxrpc_closed_conn_expiry; struct rxrpc_connection *rxrpc_alloc_connection(gfp_t); struct rxrpc_connection *rxrpc_find_connection_rcu(struct rxrpc_local *, diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c index 97f6a8de4845..785dfdb9fef1 100644 --- a/net/rxrpc/conn_client.c +++ b/net/rxrpc/conn_client.c @@ -1079,6 +1079,8 @@ void rxrpc_discard_expired_client_conns(struct work_struct *work) expiry = rxrpc_conn_idle_client_expiry; if (nr_conns > rxrpc_reap_client_connections)
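Point (4) of this commit message is easy to check in userspace. The sketch below mirrors the kernel's wrap-safe comparison, (long)(a - b) < 0, and its MAX_JIFFY_OFFSET definition under toy_ names; toy_earliest() is a hypothetical accumulator written for this example. Seeded with ULONG_MAX, no real expiry ever compares as earlier, so the accumulator never moves; seeded with now + MAX_JIFFY_OFFSET it behaves as intended. The sketch assumes the usual two's-complement conversion from unsigned long to long.

```c
#include <limits.h>
#include <stdbool.h>

/* Mirrors the kernel's MAX_JIFFY_OFFSET definition. */
#define TOY_MAX_JIFFY_OFFSET ((unsigned long)((LONG_MAX >> 1) - 1))

/* Wrap-safe "a is before b", as done by the kernel's time_before(). */
static bool toy_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Accumulate the earliest expiry, starting from the given seed value. */
static unsigned long toy_earliest(unsigned long seed,
				  const unsigned long *expiries, int n)
{
	unsigned long earliest = seed;
	int i;

	for (i = 0; i < n; i++)
		if (toy_time_before(expiries[i], earliest))
			earliest = expiries[i];
	return earliest;
}
```

For example, with now = 1000 and expiries at now + 500 and now + 100, the ULONG_MAX seed is returned unchanged, whereas the now + MAX_JIFFY_OFFSET seed yields now + 100.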
[PATCH net-next 10/12] rxrpc: Add keepalive for a call
We need to transmit a packet every so often to act as a keepalive for the peer (which has a timeout from the last time it received a packet) and also to prevent any intervening firewalls from closing the route. Do this by resetting a timer every time we transmit a packet. If the timer ever expires, we transmit a PING ACK packet and thereby also elicit a PING RESPONSE ACK from the other side - which prevents our last-rx timeout from expiring. The timer is set to 1/6 of the last-rx timeout so that we can detect the other side going away if it misses 6 replies in a row. This is particularly necessary for servers where the processing of the service function may take a significant amount of time. Signed-off-by: David Howells--- include/trace/events/rxrpc.h |6 ++ net/rxrpc/ar-internal.h |1 + net/rxrpc/call_event.c | 10 ++ net/rxrpc/output.c | 23 +++ 4 files changed, 40 insertions(+) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 84ade8b76a19..e98fed6de497 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -141,6 +141,7 @@ enum rxrpc_timer_trace { rxrpc_timer_exp_ack, rxrpc_timer_exp_hard, rxrpc_timer_exp_idle, + rxrpc_timer_exp_keepalive, rxrpc_timer_exp_lost_ack, rxrpc_timer_exp_normal, rxrpc_timer_exp_ping, @@ -152,6 +153,7 @@ enum rxrpc_timer_trace { rxrpc_timer_set_for_ack, rxrpc_timer_set_for_hard, rxrpc_timer_set_for_idle, + rxrpc_timer_set_for_keepalive, rxrpc_timer_set_for_lost_ack, rxrpc_timer_set_for_normal, rxrpc_timer_set_for_ping, @@ -162,6 +164,7 @@ enum rxrpc_timer_trace { enum rxrpc_propose_ack_trace { rxrpc_propose_ack_client_tx_end, rxrpc_propose_ack_input_data, + rxrpc_propose_ack_ping_for_keepalive, rxrpc_propose_ack_ping_for_lost_ack, rxrpc_propose_ack_ping_for_lost_reply, rxrpc_propose_ack_ping_for_params, @@ -311,6 +314,7 @@ enum rxrpc_congest_change { EM(rxrpc_timer_exp_ack, "ExpAck") \ EM(rxrpc_timer_exp_hard,"ExpHrd") \ EM(rxrpc_timer_exp_idle,"ExpIdl") \ + 
EM(rxrpc_timer_exp_keepalive, "ExpKA ") \ EM(rxrpc_timer_exp_lost_ack,"ExpLoA") \ EM(rxrpc_timer_exp_normal, "ExpNml") \ EM(rxrpc_timer_exp_ping,"ExpPng") \ @@ -321,6 +325,7 @@ enum rxrpc_congest_change { EM(rxrpc_timer_set_for_ack, "SetAck") \ EM(rxrpc_timer_set_for_hard,"SetHrd") \ EM(rxrpc_timer_set_for_idle,"SetIdl") \ + EM(rxrpc_timer_set_for_keepalive, "KeepAl") \ EM(rxrpc_timer_set_for_lost_ack,"SetLoA") \ EM(rxrpc_timer_set_for_normal, "SetNml") \ EM(rxrpc_timer_set_for_ping,"SetPng") \ @@ -330,6 +335,7 @@ enum rxrpc_congest_change { #define rxrpc_propose_ack_traces \ EM(rxrpc_propose_ack_client_tx_end, "ClTxEnd") \ EM(rxrpc_propose_ack_input_data,"DataIn ") \ + EM(rxrpc_propose_ack_ping_for_keepalive, "KeepAlv") \ EM(rxrpc_propose_ack_ping_for_lost_ack, "LostAck") \ EM(rxrpc_propose_ack_ping_for_lost_reply, "LostRpl") \ EM(rxrpc_propose_ack_ping_for_params, "Params ") \ diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 7e7b817c69f0..cdcbc798f921 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -519,6 +519,7 @@ struct rxrpc_call { unsigned long ack_lost_at;/* When ACK is figured as lost */ unsigned long resend_at; /* When next resend needs to happen */ unsigned long ping_at;/* When next to send a ping */ + unsigned long keepalive_at; /* When next to send a keepalive ping */ unsigned long expect_rx_by; /* When we expect to get a packet by */ unsigned long expect_req_by; /* When we expect to get a request DATA packet by */ unsigned long expect_term_by; /* When we expect call termination by */ diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index c65666b2f39e..bda952ffe6a6 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -366,6 +366,15 @@ void rxrpc_process_call(struct work_struct *work) set_bit(RXRPC_CALL_EV_ACK_LOST, >events); } + t = READ_ONCE(call->keepalive_at); + if (time_after_eq(now, t)) { + trace_rxrpc_timer(call, rxrpc_timer_exp_keepalive, now); + cmpxchg(>keepalive_at, t, now + 
MAX_JIFFY_OFFSET); + rxrpc_propose_ACK(call, RXRPC_ACK_PING, 0, 0, true, true, + rxrpc_propose_ack_ping_for_keepalive); + set_bit(RXRPC_CALL_EV_PING,
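The keepalive scheme boils down to "push the deadline out on every transmission; if it is ever reached, send a ping". A minimal sketch under that reading follows. struct toy_call, its next_rx_timo field and the toy_* functions are hypothetical, and the real code's cmpxchg() and event bits are replaced by plain assignments since the model is single-threaded.

```c
/* Hypothetical call model; not the kernel's struct rxrpc_call. */
struct toy_call {
	unsigned long keepalive_at;	/* when to send the next keepalive ping */
	unsigned long next_rx_timo;	/* last-rx timeout, in ticks */
	unsigned int  pings;		/* keepalive pings transmitted */
};

/* Every transmission pushes the keepalive deadline out to 1/6 of the
 * last-rx timeout from now.
 */
static void toy_transmitted(struct toy_call *call, unsigned long now)
{
	call->keepalive_at = now + call->next_rx_timo / 6;
}

/* Periodic processing: if nothing has been transmitted for long enough,
 * send a ping, which is itself a transmission and so rearms the timer.
 */
static void toy_process(struct toy_call *call, unsigned long now)
{
	if ((long)(now - call->keepalive_at) >= 0) {
		call->pings++;
		toy_transmitted(call, now);
	}
}
```

With a last-rx timeout of 600 ticks, the peer would see a packet at least every 100 ticks, so it would have to miss six in a row before its own timeout fires.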
[PATCH net-next 12/12] rxrpc: Fix conn expiry timers
Fix the rxrpc connection expiry timers so that connections for closed AF_RXRPC sockets get deleted in a more timely fashion, freeing up the transport UDP port much more quickly. (1) Replace the delayed work items with work items plus timers so that timer_reduce() can be used to shorten them and so that the timer doesn't requeue the work item if the net namespace is dead. (2) Don't use queue_delayed_work() as that won't alter the timeout if the timer is already running. (3) Don't rearm the timers if the network namespace is dead. Signed-off-by: David Howells--- net/rxrpc/af_rxrpc.c|2 ++ net/rxrpc/ar-internal.h |6 -- net/rxrpc/conn_client.c | 30 +++--- net/rxrpc/conn_object.c | 28 +--- net/rxrpc/net_ns.c | 30 ++ 5 files changed, 68 insertions(+), 28 deletions(-) diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index abb524c2b8f8..8f7cf4c042be 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -895,6 +895,8 @@ static int rxrpc_release_sock(struct sock *sk) rxrpc_release_calls_on_socket(rx); flush_workqueue(rxrpc_workqueue); rxrpc_purge_queue(>sk_receive_queue); + rxrpc_queue_work(>local->rxnet->service_conn_reaper); + rxrpc_queue_work(>local->rxnet->client_conn_reaper); rxrpc_put_local(rx->local); rx->local = NULL; diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index a0082c407005..416688381eb7 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -79,7 +79,8 @@ struct rxrpc_net { struct list_headconn_proc_list; /* List of conns in this namespace for proc */ struct list_headservice_conns; /* Service conns in this namespace */ rwlock_tconn_lock; /* Lock for ->conn_proc_list, ->service_conns */ - struct delayed_work service_conn_reaper; + struct work_struct service_conn_reaper; + struct timer_list service_conn_reap_timer; unsigned intnr_client_conns; unsigned intnr_active_client_conns; @@ -90,7 +91,8 @@ struct rxrpc_net { struct list_headwaiting_client_conns; struct list_headactive_client_conns; struct 
list_headidle_client_conns; - struct delayed_work client_conn_reaper; + struct work_struct client_conn_reaper; + struct timer_list client_conn_reap_timer; struct list_headlocal_endpoints; struct mutexlocal_mutex;/* Lock for ->local_endpoints */ diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c index 785dfdb9fef1..7f74ca3059f8 100644 --- a/net/rxrpc/conn_client.c +++ b/net/rxrpc/conn_client.c @@ -691,7 +691,7 @@ int rxrpc_connect_call(struct rxrpc_call *call, _enter("{%d,%lx},", call->debug_id, call->user_call_ID); - rxrpc_discard_expired_client_conns(>client_conn_reaper.work); + rxrpc_discard_expired_client_conns(>client_conn_reaper); rxrpc_cull_active_client_conns(rxnet); ret = rxrpc_get_client_conn(call, cp, srx, gfp); @@ -757,6 +757,18 @@ void rxrpc_expose_client_call(struct rxrpc_call *call) } /* + * Set the reap timer. + */ +static void rxrpc_set_client_reap_timer(struct rxrpc_net *rxnet) +{ + unsigned long now = jiffies; + unsigned long reap_at = now + rxrpc_conn_idle_client_expiry; + + if (rxnet->live) + timer_reduce(>client_conn_reap_timer, reap_at); +} + +/* * Disconnect a client call. 
*/ void rxrpc_disconnect_client_call(struct rxrpc_call *call) @@ -896,9 +908,7 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *call) list_move_tail(>cache_link, >idle_client_conns); if (rxnet->idle_client_conns.next == >cache_link && !rxnet->kill_all_client_conns) - queue_delayed_work(rxrpc_workqueue, - >client_conn_reaper, - rxrpc_conn_idle_client_expiry); + rxrpc_set_client_reap_timer(rxnet); } else { trace_rxrpc_client(conn, channel, rxrpc_client_to_inactive); conn->cache_state = RXRPC_CONN_CLIENT_INACTIVE; @@ -1036,8 +1046,7 @@ void rxrpc_discard_expired_client_conns(struct work_struct *work) { struct rxrpc_connection *conn; struct rxrpc_net *rxnet = - container_of(to_delayed_work(work), -struct rxrpc_net, client_conn_reaper); + container_of(work, struct rxrpc_net, client_conn_reaper); unsigned long expiry, conn_expires_at, now; unsigned int nr_conns; bool did_discard = false; @@ -1116,9 +1125,8 @@ void rxrpc_discard_expired_client_conns(struct work_struct *work) */
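Note on point (1) above: the reason for moving to a plain work item plus a timer is that timer_reduce() only ever pulls an expiry earlier, whereas queue_delayed_work() leaves an already-running timeout alone. A minimal user-space sketch of that behaviour follows, assuming nothing about the real kernel structures; `sketch_timer`, `sketch_time_before()`, `sketch_timer_reduce()` and `sketch_set_reap_timer()` are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel timer; not the real struct timer_list. */
struct sketch_timer {
	bool pending;          /* is the timer armed? */
	unsigned long expires; /* expiry in ticks */
};

/* Wrap-safe "a is earlier than b", in the spirit of time_before(). */
static bool sketch_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Arm the timer if idle; if it is pending, only move the expiry earlier. */
static void sketch_timer_reduce(struct sketch_timer *t, unsigned long expires)
{
	assert(t);
	if (!t->pending || sketch_time_before(expires, t->expires)) {
		t->expires = expires;
		t->pending = true;
	}
}

/* Same shape as rxrpc_set_client_reap_timer() in the patch: a dead
 * namespace never (re)arms the timer.
 */
static void sketch_set_reap_timer(struct sketch_timer *t, bool live,
				  unsigned long now, unsigned long idle_expiry)
{
	if (live)
		sketch_timer_reduce(t, now + idle_expiry);
}
```

With this shape, a later request with a longer expiry is a no-op, while a shorter one takes effect immediately.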
[PATCH net-next 08/12] rxrpc: Express protocol timeouts in terms of RTT
Express protocol timeouts for data retransmission and deferred ack
generation in terms of RTT rather than specified timeouts once we have
sufficient RTT samples.

For the moment, this requires just one RTT sample to be able to use this for
ack deferral and two for data retransmission.

The data retransmission timeout is set at RTT*1.5 and the ACK deferral
timeout is set at RTT.

Note that the calculated timeout is limited to a minimum of 4ns to make sure
it doesn't happen too quickly.

Signed-off-by: David Howells
---

 net/rxrpc/call_event.c | 22 ++
 net/rxrpc/sendmsg.c    |  7 +++
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index c14395d5ad8c..da91f16ac77c 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -52,7 +52,7 @@ static void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason,
 				enum rxrpc_propose_ack_trace why)
 {
 	enum rxrpc_propose_ack_outcome outcome = rxrpc_propose_ack_use;
-	unsigned long now, ack_at, expiry = rxrpc_soft_ack_delay;
+	unsigned long expiry = rxrpc_soft_ack_delay;
 	s8 prior = rxrpc_ack_priority[ack_reason];
 
 	/* Pings are handled specially because we don't want to accidentally
@@ -116,7 +116,13 @@ static void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason,
 		    background)
 			rxrpc_queue_call(call);
 	} else {
-		now = jiffies;
+		unsigned long now = jiffies, ack_at;
+
+		if (call->peer->rtt_usage > 0)
+			ack_at = nsecs_to_jiffies(call->peer->rtt);
+		else
+			ack_at = expiry;
+
 		ack_at = jiffies + expiry;
 		if (time_before(ack_at, call->ack_at)) {
 			WRITE_ONCE(call->ack_at, ack_at);
@@ -160,14 +166,22 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j)
 	struct sk_buff *skb;
 	unsigned long resend_at;
 	rxrpc_seq_t cursor, seq, top;
-	ktime_t now, max_age, oldest, ack_ts;
+	ktime_t now, max_age, oldest, ack_ts, timeout, min_timeo;
 	int ix;
 	u8 annotation, anno_type, retrans = 0, unacked = 0;
 
 	_enter("{%d,%d}", call->tx_hard_ack, call->tx_top);
 
+	if (call->peer->rtt_usage > 1)
+		timeout = ns_to_ktime(call->peer->rtt * 3 / 2);
+	else
+		timeout = ms_to_ktime(rxrpc_resend_timeout);
+	min_timeo = ns_to_ktime((10 / HZ) * 4);
+	if (ktime_before(timeout, min_timeo))
+		timeout = min_timeo;
+
 	now = ktime_get_real();
-	max_age = ktime_sub_ms(now, rxrpc_resend_timeout * 1000 / HZ);
+	max_age = ktime_sub(now, timeout);
 
 	spin_lock_bh(&call->lock);
 
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 03e0676db28c..c56ee54fdd1f 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -226,6 +226,13 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call,
 	} else {
 		unsigned long now = jiffies, resend_at;
 
+		if (call->peer->rtt_usage > 1)
+			resend_at = nsecs_to_jiffies(call->peer->rtt * 3 / 2);
+		else
+			resend_at = rxrpc_resend_timeout;
+		if (resend_at < 1)
+			resend_at = 1;
+
 		resend_at = now + rxrpc_resend_timeout;
 		WRITE_ONCE(call->resend_at, resend_at);
 		rxrpc_reduce_call_timer(call, resend_at, now,
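For reference, the timeout selection described in the changelog (RTT*1.5 for retransmission once there is more than one sample, RTT for ACK deferral once there is at least one, a fixed fallback otherwise, and a lower bound) can be sketched in plain user-space C as below. This is only an illustration of the arithmetic under stated assumptions: all `sketch_*` names are hypothetical, the floor and fallback are passed in as parameters rather than taken from the patch, and the tick conversion simply rounds down.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/* Retransmission timeout in ns: 1.5 * RTT once more than one RTT sample
 * exists, otherwise the fixed fallback; never below the given floor.
 */
static uint64_t sketch_resend_timeout_ns(uint64_t rtt_ns, unsigned int rtt_samples,
					 uint64_t fallback_ns, uint64_t floor_ns)
{
	uint64_t timeout = rtt_samples > 1 ? rtt_ns * 3 / 2 : fallback_ns;

	return timeout < floor_ns ? floor_ns : timeout;
}

/* ACK deferral in ns: one RTT once at least one sample exists. */
static uint64_t sketch_ack_delay_ns(uint64_t rtt_ns, unsigned int rtt_samples,
				    uint64_t fallback_ns)
{
	return rtt_samples > 0 ? rtt_ns : fallback_ns;
}

/* Convert ns to timer ticks, rounding down but never returning 0 ticks
 * (the sendmsg.c hunk clamps resend_at to at least 1 in the same way).
 */
static unsigned long sketch_ns_to_ticks(uint64_t ns, unsigned int hz)
{
	uint64_t ticks;

	assert(hz > 0 && hz <= SKETCH_NSEC_PER_SEC);
	ticks = ns / (SKETCH_NSEC_PER_SEC / hz);
	return ticks < 1 ? 1 : (unsigned long)ticks;
}
```

For example, a 20ms RTT with two samples gives a 30ms retransmission timeout, which is 30 ticks at a tick rate of 1000.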
[PATCH v2] VSOCK: Don't call vsock_stream_has_data in atomic context
When using the host personality, VMCI will grab a mutex for any
queue pair access. In the detach callback for the vmci vsock
transport, we call vsock_stream_has_data while holding a spinlock,
and vsock_stream_has_data will access a queue pair.

To avoid this, we can simply omit calling vsock_stream_has_data
for host side queue pairs, since the QPs are empty per default
when the guest has detached.

This bug affects users of VMware Workstation using kernel version
4.4 and later.

Testing: Ran vsock tests between guest and host, and verified that
with this change, the host isn't calling vsock_stream_has_data
during detach. Ran mixedTest between guest and host using both
guest and host as server.

v2: Rebased on top of recent change to sk_state values

Reviewed-by: Adit Ranadive
Reviewed-by: Aditya Sarwade
Reviewed-by: Stefan Hajnoczi
Signed-off-by: Jorgen Hansen
---
 net/vmw_vsock/vmci_transport.c | 10 +++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 391775e..56573dc 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -797,9 +797,13 @@ static void vmci_transport_handle_detach(struct sock *sk)
 
 		/* We should not be sending anymore since the peer won't be
 		 * there to receive, but we can still receive if there is data
-		 * left in our consume queue.
+		 * left in our consume queue. If the local endpoint is a host,
+		 * we can't call vsock_stream_has_data, since that may block,
+		 * but a host endpoint can't read data once the VM has
+		 * detached, so there is no available data in that case.
 		 */
-		if (vsock_stream_has_data(vsk) <= 0) {
+		if (vsk->local_addr.svm_cid == VMADDR_CID_HOST ||
+		    vsock_stream_has_data(vsk) <= 0) {
 			sk->sk_state = TCP_CLOSE;
 
 			if (sk->sk_state == TCP_SYN_SENT) {
@@ -2144,7 +2148,7 @@ static void __exit vmci_transport_exit(void)
 
 MODULE_AUTHOR("VMware, Inc.");
 MODULE_DESCRIPTION("VMCI transport for Virtual Sockets");
-MODULE_VERSION("1.0.4.0-k");
+MODULE_VERSION("1.0.5.0-k");
 MODULE_LICENSE("GPL v2");
 MODULE_ALIAS("vmware_vsock");
 MODULE_ALIAS_NETPROTO(PF_VSOCK);
-- 
1.7.0
[PATCH] atm: lanai: use setup_timer instead of init_timer
From: Colin Ian King

Use setup_timer function instead of initializing timer with the
function and data fields.

Signed-off-by: Colin Ian King
---
 drivers/atm/lanai.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/atm/lanai.c b/drivers/atm/lanai.c
index 2351dad78ff5..87e8b5dfac39 100644
--- a/drivers/atm/lanai.c
+++ b/drivers/atm/lanai.c
@@ -1790,10 +1790,8 @@ static void lanai_timed_poll(unsigned long arg)
 
 static inline void lanai_timed_poll_start(struct lanai_dev *lanai)
 {
-	init_timer(&lanai->timer);
+	setup_timer(&lanai->timer, lanai_timed_poll, (unsigned long)lanai);
 	lanai->timer.expires = jiffies + LANAI_POLL_PERIOD;
-	lanai->timer.data = (unsigned long) lanai;
-	lanai->timer.function = lanai_timed_poll;
 	add_timer(&lanai->timer);
 }
 
-- 
2.14.1
[PATCH] atm: firestream: use setup_timer instead of init_timer
From: Colin Ian King

Use setup_timer function instead of initializing timer with the
function and data fields.

Signed-off-by: Colin Ian King
---
 drivers/atm/firestream.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/atm/firestream.c b/drivers/atm/firestream.c
index 6b6368a56526..534001270be5 100644
--- a/drivers/atm/firestream.c
+++ b/drivers/atm/firestream.c
@@ -1885,9 +1885,7 @@ static int fs_init(struct fs_dev *dev)
 	}
 
 #ifdef FS_POLL_FREQ
-	init_timer (&dev->timer);
-	dev->timer.data = (unsigned long) dev;
-	dev->timer.function = fs_poll;
+	setup_timer(&dev->timer, fs_poll, (unsigned long)dev);
 	dev->timer.expires = jiffies + FS_POLL_FREQ;
 	add_timer (&dev->timer);
 #endif
-- 
2.14.1
Re: [PATCH v2] net: sched: crash on blocks with goto chain action
Fri, Nov 24, 2017 at 12:27:58PM CET, c...@rkapl.cz wrote: >tcf_block_put_ext has assumed that all filters (and thus their goto >actions) are destroyed in RCU callback and thus can not race with our >list iteration. However, that is not true during netns cleanup (see >tcf_exts_get_net comment). > >Prevent the user after free by holding all chains (except 0, that one is >already held). foreach_safe is not enough in this case. > >To reproduce, run the following in a netns and then delete the ns: >ip link add dtest type dummy >tc qdisc add dev dtest ingress >tc filter add dev dtest chain 1 parent : handle 1 prio 1 flower action > goto chain 2 > >Fixes: 822e86d997 ("net_sched: remove tcf_block_put_deferred()") >Signed-off-by: Roman Kapl>--- >v1 -> v2: Hold all chains instead of just the currently iterated one, > the code should be more clear this way. >--- > net/sched/cls_api.c | 17 - > 1 file changed, 12 insertions(+), 5 deletions(-) > >diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c >index 7d97f612c9b9..ddcf04b4ab43 100644 >--- a/net/sched/cls_api.c >+++ b/net/sched/cls_api.c >@@ -336,7 +336,8 @@ static void tcf_block_put_final(struct work_struct *work) > struct tcf_chain *chain, *tmp; > > rtnl_lock(); >- /* Only chain 0 should be still here. */ >+ >+ /* At this point, all the chains should have refcnt == 1. */ > list_for_each_entry_safe(chain, tmp, >chain_list, list) > tcf_chain_put(chain); > rtnl_unlock(); >@@ -344,15 +345,21 @@ static void tcf_block_put_final(struct work_struct *work) > } > > /* XXX: Standalone actions are not allowed to jump to any chain, and bound >- * actions should be all removed after flushing. However, filters are now >- * destroyed in tc filter workqueue with RTNL lock, they can not race here. >+ * actions should be all removed after flushing. 
> */ > void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q, > struct tcf_block_ext_info *ei) > { >- struct tcf_chain *chain, *tmp; >+ struct tcf_chain *chain; > >- list_for_each_entry_safe(chain, tmp, >chain_list, list) >+ /* Hold a refcnt for all chains, except 0, so that they don't disappear >+ * while we are iterating. Would be perhaps nice to mention that the appropriate tcf_chain_put is done in tcf_block_put_final() Regardless of this: Acked-by: Jiri Pirko >+ */ >+ list_for_each_entry(chain, >chain_list, list) >+ if (chain->index) >+ tcf_chain_hold(chain); >+ >+ list_for_each_entry(chain, >chain_list, list) > tcf_chain_flush(chain); > > tcf_block_offload_unbind(block, q, ei); >-- >2.15.0 >
Re: [RFC 0/9] net: create adaptive software irq moderation library
On Fri, Nov 24, 2017 at 4:05 AM, Saeed Mahameedwrote: > On Sun, Nov 5, 2017 at 9:44 PM, Andy Gospodarek wrote: >> From: Andy Gospodarek >> >> This RFC converts the adaptive interrupt moderation library from the >> mlx5_en driver into a library so it can be used by any driver. The last >> patch in this set adds support for interrupt moderation in the bnxt_en >> driver. >> >> The main purpose of this code in the mlx5_en driver is to allow an >> administrator to make sure that default coalesce settings are optimized >> for low latency, but quickly adapt to handle high throughput traffic and >> optimize how many packets are received during each napi poll. >> >> For any new driver the following changes would be needed to use this >> library: >> >> - add elements in ring struct to track items needed by this library >> - create function that can be called to actually set coalesce settings >> for the driver >> >> My main reason for making this an RFC is that I would like verification >> from Mellanox that the performance of their driver does not change in a >> unintended way. I did some basic testing (netperf) and did not note a >> statistically significant change in throughput or CPU utilization before >> and after this set. >> >> Credit to Rob Rice and Lee Reed for doing some of the initial proof of >> concept and testing for this patch. > > Hi Andy, > > Following our conversation in netdev 2.2, i would like to suggest the > following: > > Instead of introducing a new API which demands from the driver to > provide callbacks and function pointers to the adaptive moderation > logic, which might be called on every irq interrupt, and to avoid > performance hit, we can move the generic code and the core adaptive > moderation logic to a header file. > I would like also to suggesting adding Tal Gilboa, as the official maintainer for this new file. as he is the current maintainer and the co-author of this feature in mlx5. 
> the mlx5e am logic and data structures are already written in a very > modular way and can be stripped out of mlx5e fairly easily. > And i would like to suggest to do it in the following manner: > > 1. naming convention: > I would like to change the generic code naming convention to have the > words DIM (Dynamically-Tuned Interrupt Moderation) instead of mlx5e_am > or am, Following our public blog [1] of the matter and the official > name we prefer for this feature. > > [1] https://community.mellanox.com/docs/DOC-2511 > > Suggested naming convention instead of rx_am: net_dim (DIM for net > applications). > As the rx_am or (dim) logic can be applied to other applications. > > 2. Data types: > > All below mlx5e am data types can be used as is as they hold nothing > mlx5 related. > > struct mlx5e_rx_am_sample > - Holds the current stats sample with ktime stamp > - rename to: net_dim_sample > > struct mlx5e_rx_am_stats > - Holds the needed stats (delta) calculation of last 2 samples > - rename to: net_dim_stats > > struct mlx5e_rx_am > - Adaptive moderation handle > - rename to: net_dim > > 3. 
static inline generic functions API (based on the usage from > mlx5e_rx_am function) > > //Make a DIM measurement: > net_dim_sample(struct *net_dim_sample sample, packets, bytes, event_ctr) > - previously mlx5e_am_sample() > - Fills a sample struct with the provided stats and the current timestamp > > > //start a new DIM measurement and handles the DIM state machine initial state: > net_dim_start_sample(struct *net_dim rx_dim) > - Makes a new measurement > - stores it into rx_dim->start > - rx_dim->state = DIM_MEASURE_IN_PROGRESS > > > // Takes a new sample (curr_sample) and makes the decision (handles > DIM_MEASURE_IN_PROGRESS state) > net_dim_decision(struct *net_dim rx_dim, curr_sample) > - previously mlx5e_am_decision > - Note, instead of providing the current_stats (delta between start > and current_sample) I suggest to provide the current_sample and move > the stats calculation logic into net_dime_decision. >- All the logic in this function will move to the generic code. > > 4. Driver implementation: (according to the above suggested API) >- Driver should initialize struct net_dim rx_dim, and provide a > work function to handle "dim apply new profile" decision. >- in napi_poll driver should implement the rx_dim state machine > using the above API before arming the completion event queues as > follows: > > mlx5e_rx_am: > > void mlx5e_rx_am(struct mlx5e_rq *rq) > { >struct net_dim *rx_dim = >dim; >struct net_dim_sample end_sample; >u16 nevents; > >switch (rx_dim->state) { >case DIM_MEASURE_IN_PROGRESS: >// driver specific pre condition to decide whether to > continue or skip >// Note that here we only sample and don't calc the delta > stats, this logic moved into net_dim_decision >net_dim_sample(rq, _sample, rq->packets,
Re: [PATCH net] net: qmi_wwan: add support for Cinterion PLS8
On 24/11/17, Reinhard Speyerer wrote:
> before posting this problem report
> https://developer.gemalto.com/threads/ipv6dualstack-problems-pls8-e-revision-03017
> in the Gemalto developer forum I tested the qmi_wwan/cdc_ether changes
> you suggested above and apart from having two working QMI interfaces
> the IPv6/dualstack problems observed with AT^SWWAN/cdc_ether were
> also gone when using WDSStartNetworkInterface and the QMI interface in
> raw IP mode instead.

thx for sharing this information. IPv6 with PLS8-E is also a topic on our side.

Best Regards,
Oliver
Re: [PATCH net] net: qmi_wwan: add support for Cinterion PLS8
On 23/11/17, Bjørn Mork wrote: > > This is also consistent with the Windows drivers. And being a proper > CDC ECM class function, it should Just Work with the cdc_ether driver. > Except for the "RmNet" part, which I guess is the reason you want to > add this device to qmi_wwan. Which is fine, *if* we can be reasonably > certain that it does support QMI. The description string is a strong > indication, but it would be even better to know this was tested. > > But adding this to qmi_wwan is not enough. You also need to add a > blacklist entry to cdc_ether. Both should use a device+class match, > similar to the Novatel entries. This will make the interface numbering > irrelevant, and will allow a single entry to match both QMI/rmnet > functions. ok I tried it this way: +++ b/drivers/net/usb/cdc_ether.c @@ -562,6 +562,7 @@ static void usbnet_cdc_zte_status(struct usbnet *dev, struct urb *urb) #define MICROSOFT_VENDOR_ID0x045e #define UBLOX_VENDOR_ID0x1546 #define TPLINK_VENDOR_ID 0x2357 +#define CINTERION_VENDOR_ID0x1e2d static const struct usb_device_id products[] = { /* BLACKLIST !! @@ -821,6 +822,13 @@ static void usbnet_cdc_zte_status(struct usbnet *dev, struct urb *urb) .driver_info = 0, }, +/* Cinterion PLS8 - handled by qmi_wwan */ +{ + USB_DEVICE_AND_INTERFACE_INFO(CINTERION_VENDOR_ID, 0x0061, USB_CLASS_COMM, + USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE), + .driver_info = 0, +}, + /* WHITELIST!!! * * CDC Ether uses two interfaces, not necessarily consecutive. 
diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 720a3a2..93e102e 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1221,6 +1221,7 @@ static int qmi_wwan_resume(struct usb_interface *intf) {QMI_FIXED_INTF(0x0b3c, 0xc00a, 6)},/* Olivetti Olicard 160 */ {QMI_FIXED_INTF(0x0b3c, 0xc00b, 4)},/* Olivetti Olicard 500 */ {QMI_FIXED_INTF(0x1e2d, 0x0060, 4)},/* Cinterion PLxx */ + {QMI_FIXED_INTF(0x1e2d, 0x0061, 3)},/* Cinterion PLS8 LTE */ {QMI_FIXED_INTF(0x1e2d, 0x0053, 4)},/* Cinterion PHxx,PXxx */ {QMI_FIXED_INTF(0x1e2d, 0x0082, 4)},/* Cinterion PHxx,PXxx (2 RmNet) */ {QMI_FIXED_INTF(0x1e2d, 0x0082, 5)},/* Cinterion PHxx,PXxx (2 RmNet) */ but now I'am missing an ttyACM4 interface and the edc_ether registering is not working anymore. [ 124.310611] usb 2-1: new high-speed USB device number 2 using ci_hdrc [ 124.457029] usb 2-1: New USB device found, idVendor=1e2d, idProduct=0061 [ 124.463938] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [ 124.471307] usb 2-1: Product: LTE Modem [ 124.475278] usb 2-1: Manufacturer: Cinterion [ 124.536219] cdc_acm 2-1:1.0: ttyACM0: USB ACM device [ 124.563155] cdc_acm 2-1:1.2: ttyACM1: USB ACM device [ 124.589625] cdc_acm 2-1:1.4: ttyACM2: USB ACM device [ 124.613517] cdc_acm 2-1:1.6: ttyACM3: USB ACM device in my working old setup with kernel 3.9.11 it looks like this: [ 129.710622] usb 2-1: new high-speed USB device number 2 using ci_hdrc [ 129.873985] usb 2-1: New USB device found, idVendor=1e2d, idProduct=0061 [ 129.888573] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [ 129.902973] usb 2-1: Product: LTE Modem [ 129.906927] usb 2-1: Manufacturer: Cinterion [ 129.928389] cdc_acm 2-1:1.0: ttyACM0: USB ACM device [ 129.959324] cdc_acm 2-1:1.2: ttyACM1: USB ACM device [ 129.992714] cdc_acm 2-1:1.4: ttyACM2: USB ACM device [ 130.019416] cdc_acm 2-1:1.6: ttyACM3: USB ACM device [ 130.045248] cdc_acm 2-1:1.8: This device cannot do calls on its own. 
It is not a modem. [ 130.073929] cdc_acm 2-1:1.8: ttyACM4: USB ACM device [ 130.100982] cdc_ether 2-1:1.10 usb0: register 'cdc_ether' at usb-ci_hdrc.1-1, CDC Ethernet Device, de:ad:be:ef:00:00 [ 130.136438] cdc_ether 2-1:1.12 usb1: register 'cdc_ether' at usb-ci_hdrc.1-1, CDC Ethernet Device, de:ad:be:ef:00:01 Any clue what I'am doing wrong here? Best regards, Oliver
Re: [RFC 0/9] net: create adaptive software irq moderation library
On Sun, Nov 5, 2017 at 9:44 PM, Andy Gospodarekwrote: > From: Andy Gospodarek > > This RFC converts the adaptive interrupt moderation library from the > mlx5_en driver into a library so it can be used by any driver. The last > patch in this set adds support for interrupt moderation in the bnxt_en > driver. > > The main purpose of this code in the mlx5_en driver is to allow an > administrator to make sure that default coalesce settings are optimized > for low latency, but quickly adapt to handle high throughput traffic and > optimize how many packets are received during each napi poll. > > For any new driver the following changes would be needed to use this > library: > > - add elements in ring struct to track items needed by this library > - create function that can be called to actually set coalesce settings > for the driver > > My main reason for making this an RFC is that I would like verification > from Mellanox that the performance of their driver does not change in a > unintended way. I did some basic testing (netperf) and did not note a > statistically significant change in throughput or CPU utilization before > and after this set. > > Credit to Rob Rice and Lee Reed for doing some of the initial proof of > concept and testing for this patch. Hi Andy, Following our conversation in netdev 2.2, i would like to suggest the following: Instead of introducing a new API which demands from the driver to provide callbacks and function pointers to the adaptive moderation logic, which might be called on every irq interrupt, and to avoid performance hit, we can move the generic code and the core adaptive moderation logic to a header file. the mlx5e am logic and data structures are already written in a very modular way and can be stripped out of mlx5e fairly easily. And i would like to suggest to do it in the following manner: 1. 
naming convention: I would like to change the generic code naming convention to have the words DIM (Dynamically-Tuned Interrupt Moderation) instead of mlx5e_am or am, Following our public blog [1] of the matter and the official name we prefer for this feature. [1] https://community.mellanox.com/docs/DOC-2511 Suggested naming convention instead of rx_am: net_dim (DIM for net applications). As the rx_am or (dim) logic can be applied to other applications. 2. Data types: All below mlx5e am data types can be used as is as they hold nothing mlx5 related. struct mlx5e_rx_am_sample - Holds the current stats sample with ktime stamp - rename to: net_dim_sample struct mlx5e_rx_am_stats - Holds the needed stats (delta) calculation of last 2 samples - rename to: net_dim_stats struct mlx5e_rx_am - Adaptive moderation handle - rename to: net_dim 3. static inline generic functions API (based on the usage from mlx5e_rx_am function) //Make a DIM measurement: net_dim_sample(struct *net_dim_sample sample, packets, bytes, event_ctr) - previously mlx5e_am_sample() - Fills a sample struct with the provided stats and the current timestamp //start a new DIM measurement and handles the DIM state machine initial state: net_dim_start_sample(struct *net_dim rx_dim) - Makes a new measurement - stores it into rx_dim->start - rx_dim->state = DIM_MEASURE_IN_PROGRESS // Takes a new sample (curr_sample) and makes the decision (handles DIM_MEASURE_IN_PROGRESS state) net_dim_decision(struct *net_dim rx_dim, curr_sample) - previously mlx5e_am_decision - Note, instead of providing the current_stats (delta between start and current_sample) I suggest to provide the current_sample and move the stats calculation logic into net_dime_decision. - All the logic in this function will move to the generic code. 4. Driver implementation: (according to the above suggested API) - Driver should initialize struct net_dim rx_dim, and provide a work function to handle "dim apply new profile" decision. 
- in napi_poll driver should implement the rx_dim state machine using the above API before arming the completion event queues as follows: mlx5e_rx_am: void mlx5e_rx_am(struct mlx5e_rq *rq) { struct net_dim *rx_dim = >dim; struct net_dim_sample end_sample; u16 nevents; switch (rx_dim->state) { case DIM_MEASURE_IN_PROGRESS: // driver specific pre condition to decide whether to continue or skip // Note that here we only sample and don't calc the delta stats, this logic moved into net_dim_decision net_dim_sample(rq, _sample, rq->packets, rq->bytes, cq->events); if (net_dim_decision(rx_dim, _sample)) { rx_dim->state = DIM_APPLY_NEW_PROFILE; schedule_work(_dim->work); } /* fall through */ case DIM_START_MEASURE: net_dim_start_sample(rx_dim); break; case DIM_APPLY_NEW_PROFILE: break; } Thanks, Saeed. > > Andy Gospodarek (9): > mlx5_en: move interrupt moderation structs to new file >
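To make the proposed state machine easier to discuss, here is a stand-alone user-space sketch of the three states named above. It is an illustration only: every `sketch_*` name is hypothetical, the decision function is a placeholder threshold on the packet delta (the real decision logic compares rates and is not reproduced here), and the `break` after scheduling the work is an assumption about the intended control flow, since the snippet in the mail falls through to restarting the measurement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_dim_state {
	SKETCH_DIM_START_MEASURE,
	SKETCH_DIM_MEASURE_IN_PROGRESS,
	SKETCH_DIM_APPLY_NEW_PROFILE,
};

struct sketch_dim_sample {
	uint64_t packets;
	uint64_t bytes;
	uint16_t events;
};

struct sketch_dim {
	enum sketch_dim_state state;
	struct sketch_dim_sample start; /* sample taken when measuring began */
	bool work_scheduled;            /* stands in for schedule_work() */
};

/* Placeholder decision: ask for a new profile when the packet delta since
 * the start sample exceeds a threshold. The delta is computed here, from
 * the current sample, rather than by the caller.
 */
static bool sketch_dim_decision(const struct sketch_dim *dim,
				const struct sketch_dim_sample *cur,
				uint64_t threshold)
{
	return cur->packets - dim->start.packets > threshold;
}

/* One step of the state machine, as it might run at the end of a poll. */
static void sketch_dim_step(struct sketch_dim *dim,
			    const struct sketch_dim_sample *cur,
			    uint64_t threshold)
{
	assert(dim && cur);

	switch (dim->state) {
	case SKETCH_DIM_MEASURE_IN_PROGRESS:
		if (sketch_dim_decision(dim, cur, threshold)) {
			dim->state = SKETCH_DIM_APPLY_NEW_PROFILE;
			dim->work_scheduled = true;
			break; /* assumed: wait for the work to apply the profile */
		}
		/* fall through */
	case SKETCH_DIM_START_MEASURE:
		dim->start = *cur;
		dim->state = SKETCH_DIM_MEASURE_IN_PROGRESS;
		break;
	case SKETCH_DIM_APPLY_NEW_PROFILE:
		/* nothing to do until the work item resets the state */
		break;
	}
}
```

In this sketch the work handler, not shown, would be responsible for applying the profile and moving the state back to the start-measure state.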
[PATCH net-next] net: thunderx: Set max queue count taking XDP_TX into account
From: Sunil Goutham

On T81 there are only 4 cores, hence setting max queue count to 4
would leave nothing for XDP_TX. This patch fixes this by doubling
max queue count in above scenarios.

Signed-off-by: Sunil Goutham
Signed-off-by: cjacob
Signed-off-by: Aleksey Makarov
---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index b82e28262c57..52b3a6044f85 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -1891,6 +1891,11 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	nic->pdev = pdev;
 	nic->pnicvf = nic;
 	nic->max_queues = qcount;
+	/* If no of CPUs are too low, there won't be any queues left
+	 * for XDP_TX, hence double it.
+	 */
+	if (!nic->t88)
+		nic->max_queues *= 2;
 
 	/* MAP VF's configuration registers */
 	nic->reg_base = pcim_iomap(pdev, PCI_CFG_REG_BAR_NUM, 0);
-- 
2.15.0
[PATCH net-next] net: thunderx: Add support for xdp redirect
From: Sunil GouthamThis patch adds support for XDP_REDIRECT. Flush is not yet supported. Signed-off-by: Sunil Goutham Signed-off-by: cjacob Signed-off-by: Aleksey Makarov --- drivers/net/ethernet/cavium/thunder/nicvf_main.c | 110 - drivers/net/ethernet/cavium/thunder/nicvf_queues.c | 11 ++- drivers/net/ethernet/cavium/thunder/nicvf_queues.h | 4 + 3 files changed, 94 insertions(+), 31 deletions(-) diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index a063c36c4c58..b82e28262c57 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -65,6 +65,11 @@ module_param(cpi_alg, int, S_IRUGO); MODULE_PARM_DESC(cpi_alg, "PFC algorithm (0=none, 1=VLAN, 2=VLAN16, 3=IP Diffserv)"); +struct nicvf_xdp_tx { + u64 dma_addr; + u8 qidx; +}; + static inline u8 nicvf_netdev_qidx(struct nicvf *nic, u8 qidx) { if (nic->sqs_mode) @@ -500,14 +505,29 @@ static int nicvf_init_resources(struct nicvf *nic) return 0; } +static void nicvf_unmap_page(struct nicvf *nic, struct page *page, u64 dma_addr) +{ + /* Check if it's a recycled page, if not unmap the DMA mapping. +* Recycled page holds an extra reference. 
+*/ + if (page_ref_count(page) == 1) { + dma_addr &= PAGE_MASK; + dma_unmap_page_attrs(>pdev->dev, dma_addr, +RCV_FRAG_LEN + XDP_HEADROOM, +DMA_FROM_DEVICE, +DMA_ATTR_SKIP_CPU_SYNC); + } +} + static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog, struct cqe_rx_t *cqe_rx, struct snd_queue *sq, struct sk_buff **skb) { struct xdp_buff xdp; struct page *page; + struct nicvf_xdp_tx *xdp_tx = NULL; u32 action; - u16 len, offset = 0; + u16 len, err, offset = 0; u64 dma_addr, cpu_addr; void *orig_data; @@ -521,7 +541,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog, cpu_addr = (u64)phys_to_virt(cpu_addr); page = virt_to_page((void *)cpu_addr); - xdp.data_hard_start = page_address(page); + xdp.data_hard_start = page_address(page) + RCV_BUF_HEADROOM; xdp.data = (void *)cpu_addr; xdp_set_data_meta_invalid(); xdp.data_end = xdp.data + len; @@ -540,18 +560,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog, switch (action) { case XDP_PASS: - /* Check if it's a recycled page, if not -* unmap the DMA mapping. -* -* Recycled page holds an extra reference. 
-*/ - if (page_ref_count(page) == 1) { - dma_addr &= PAGE_MASK; - dma_unmap_page_attrs(>pdev->dev, dma_addr, -RCV_FRAG_LEN + XDP_PACKET_HEADROOM, -DMA_FROM_DEVICE, -DMA_ATTR_SKIP_CPU_SYNC); - } + nicvf_unmap_page(nic, page, dma_addr); /* Build SKB and pass on packet to network stack */ *skb = build_skb(xdp.data, @@ -564,6 +573,20 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog, case XDP_TX: nicvf_xdp_sq_append_pkt(nic, sq, (u64)xdp.data, dma_addr, len); return true; + case XDP_REDIRECT: + /* Save DMA address for use while transmitting */ + xdp_tx = (struct nicvf_xdp_tx *)page_address(page); + xdp_tx->dma_addr = dma_addr; + xdp_tx->qidx = nicvf_netdev_qidx(nic, cqe_rx->rq_idx); + + err = xdp_do_redirect(nic->pnicvf->netdev, , prog); + if (!err) + return true; + + /* Free the page on error */ + nicvf_unmap_page(nic, page, dma_addr); + put_page(page); + break; default: bpf_warn_invalid_xdp_action(action); /* fall through */ @@ -571,18 +594,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog, trace_xdp_exception(nic->netdev, prog, action); /* fall through */ case XDP_DROP: - /* Check if it's a recycled page, if not -* unmap the DMA mapping. -* -* Recycled page holds an extra reference. -*/ - if (page_ref_count(page) == 1) { - dma_addr &= PAGE_MASK; -
[PATCH v2] net: sched: crash on blocks with goto chain action
tcf_block_put_ext has assumed that all filters (and thus their goto
actions) are destroyed in RCU callback and thus can not race with our
list iteration. However, that is not true during netns cleanup (see
tcf_exts_get_net comment).

Prevent the use after free by holding all chains (except 0, that one is
already held). foreach_safe is not enough in this case.

To reproduce, run the following in a netns and then delete the ns:
    ip link add dtest type dummy
    tc qdisc add dev dtest ingress
    tc filter add dev dtest chain 1 parent : handle 1 prio 1 flower action goto chain 2

Fixes: 822e86d997 ("net_sched: remove tcf_block_put_deferred()")
Signed-off-by: Roman Kapl
---
v1 -> v2: Hold all chains instead of just the currently iterated one,
the code should be more clear this way.
---
 net/sched/cls_api.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 7d97f612c9b9..ddcf04b4ab43 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -336,7 +336,8 @@ static void tcf_block_put_final(struct work_struct *work)
 	struct tcf_chain *chain, *tmp;
 
 	rtnl_lock();
-	/* Only chain 0 should be still here. */
+
+	/* At this point, all the chains should have refcnt == 1. */
 	list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
 		tcf_chain_put(chain);
 	rtnl_unlock();
@@ -344,15 +345,21 @@ static void tcf_block_put_final(struct work_struct *work)
 }
 
 /* XXX: Standalone actions are not allowed to jump to any chain, and bound
- * actions should be all removed after flushing. However, filters are now
- * destroyed in tc filter workqueue with RTNL lock, they can not race here.
+ * actions should be all removed after flushing.
  */
 void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 		       struct tcf_block_ext_info *ei)
 {
-	struct tcf_chain *chain, *tmp;
+	struct tcf_chain *chain;
 
-	list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
+	/* Hold a refcnt for all chains, except 0, so that they don't disappear
+	 * while we are iterating.
+	 */
+	list_for_each_entry(chain, &block->chain_list, list)
+		if (chain->index)
+			tcf_chain_hold(chain);
+
+	list_for_each_entry(chain, &block->chain_list, list)
 		tcf_chain_flush(chain);
 
 	tcf_block_offload_unbind(block, q, ei);
-- 
2.15.0
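For illustration only (this is not part of the patch): the hold-before-iterate idea can be modelled in userspace. This is a minimal sketch under stated assumptions. `struct chain`, `chain_hold`, `chain_put`, `chain_flush`, `block_put` and `run_demo` are made-up names, not kernel API; the refcount is a plain integer and an `alive` flag stands in for freed memory so that a stale visit can be counted rather than crash. The final put is done at the end of `block_put` for simplicity, whereas the patch leaves it to tcf_block_put_final().

```c
#include <assert.h>

/* Hypothetical userspace stand-in for a filter chain. */
struct chain {
	unsigned int index;
	int refcnt;
	int alive;			/* cleared when the last reference is dropped */
	struct chain *goto_target;	/* chain referenced by a goto action, if any */
};

static void chain_hold(struct chain *c)
{
	c->refcnt++;
}

static void chain_put(struct chain *c)
{
	assert(c->refcnt > 0);
	if (--c->refcnt == 0)
		c->alive = 0;		/* real code would free the chain here */
}

/* Flushing drops the goto action, which releases the target chain. */
static void chain_flush(struct chain *c)
{
	if (c->goto_target) {
		chain_put(c->goto_target);
		c->goto_target = 0;
	}
}

/* Returns how many chains were visited after they had already been released. */
static int block_put(struct chain *chains, int n, int hold_first)
{
	int stale = 0;
	int i;

	if (hold_first)
		for (i = 0; i < n; i++)
			if (chains[i].index)
				chain_hold(&chains[i]);

	for (i = 0; i < n; i++) {
		if (!chains[i].alive) {
			stale++;	/* would be a use after free */
			continue;
		}
		chain_flush(&chains[i]);
	}

	if (hold_first)
		for (i = 0; i < n; i++)
			if (chains[i].index)
				chain_put(&chains[i]);

	return stale;
}

/* Chain 1 holds a goto action to chain 2, as in the reproducer. */
static int run_demo(int hold_first)
{
	struct chain chains[3] = {
		{ .index = 0, .refcnt = 1, .alive = 1 },
		{ .index = 1, .refcnt = 1, .alive = 1 },
		{ .index = 2, .refcnt = 1, .alive = 1 },
	};

	chains[1].goto_target = &chains[2];
	return block_put(chains, 3, hold_first);
}
```

In this model run_demo(0) visits chain 2 after flushing chain 1 has already released it, while run_demo(1) sees no stale chain because every non-zero chain carries an extra reference for the duration of the loop.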
[PATCH] net: thunderbolt: Stop using zero to mean no valid DMA mapping
Commit 86dabda426ac ("net: thunderbolt: Clear finished Tx frame bus
address in tbnet_tx_callback()") fixed a DMA-API violation where the
driver called dma_unmap_page() in tbnet_free_buffers() for a bus address
that might already be unmapped. The fix was to zero out the bus address
of a frame in tbnet_tx_callback().

However, as pointed out by David Miller, zero might well be a valid
mapping (at least in theory) so it is not a good idea to use it here.

It turns out that we don't need the whole map/unmap dance for Tx buffers
at all. Instead we can map the buffers when they are initially allocated
and unmap them when the interface is brought down. In between we just
DMA sync the buffers for the CPU or device as needed.

Signed-off-by: Mika Westerberg
---
 drivers/net/thunderbolt.c | 57 ---
 1 file changed, 24 insertions(+), 33 deletions(-)

diff --git a/drivers/net/thunderbolt.c b/drivers/net/thunderbolt.c
index 228d4aa6d9ae..ca5e375de27c 100644
--- a/drivers/net/thunderbolt.c
+++ b/drivers/net/thunderbolt.c
@@ -335,7 +335,7 @@ static void tbnet_free_buffers(struct tbnet_ring *ring)
 		if (ring->ring->is_tx) {
 			dir = DMA_TO_DEVICE;
 			order = 0;
-			size = tbnet_frame_size(tf);
+			size = TBNET_FRAME_SIZE;
 		} else {
 			dir = DMA_FROM_DEVICE;
 			order = TBNET_RX_PAGE_ORDER;
@@ -512,6 +512,7 @@ static int tbnet_alloc_rx_buffers(struct tbnet *net, unsigned int nbuffers)
 static struct tbnet_frame *tbnet_get_tx_buffer(struct tbnet *net)
 {
 	struct tbnet_ring *ring = &net->tx_ring;
+	struct device *dma_dev = tb_ring_dma_device(ring->ring);
 	struct tbnet_frame *tf;
 	unsigned int index;
 
@@ -522,7 +523,9 @@ static struct tbnet_frame *tbnet_get_tx_buffer(struct tbnet *net)
 
 	tf = &ring->frames[index];
 	tf->frame.size = 0;
-	tf->frame.buffer_phy = 0;
+
+	dma_sync_single_for_cpu(dma_dev, tf->frame.buffer_phy,
+				tbnet_frame_size(tf), DMA_TO_DEVICE);
 
 	return tf;
 }
@@ -531,13 +534,8 @@ static void tbnet_tx_callback(struct tb_ring *ring, struct ring_frame *frame,
 			      bool canceled)
 {
 	struct tbnet_frame *tf = container_of(frame, typeof(*tf), frame);
-	struct device *dma_dev = tb_ring_dma_device(ring);
 	struct tbnet *net = netdev_priv(tf->dev);
 
-	dma_unmap_page(dma_dev, tf->frame.buffer_phy, tbnet_frame_size(tf),
-		       DMA_TO_DEVICE);
-	tf->frame.buffer_phy = 0;
-
 	/* Return buffer to the ring */
 	net->tx_ring.prod++;
 
@@ -548,10 +546,12 @@ static void tbnet_tx_callback(struct tb_ring *ring, struct ring_frame *frame,
 static int tbnet_alloc_tx_buffers(struct tbnet *net)
 {
 	struct tbnet_ring *ring = &net->tx_ring;
+	struct device *dma_dev = tb_ring_dma_device(ring->ring);
 	unsigned int i;
 
 	for (i = 0; i < TBNET_RING_SIZE; i++) {
 		struct tbnet_frame *tf = &ring->frames[i];
+		dma_addr_t dma_addr;
 
 		tf->page = alloc_page(GFP_KERNEL);
 		if (!tf->page) {
@@ -559,7 +559,17 @@ static int tbnet_alloc_tx_buffers(struct tbnet *net)
 			return -ENOMEM;
 		}
 
+		dma_addr = dma_map_page(dma_dev, tf->page, 0, TBNET_FRAME_SIZE,
+					DMA_TO_DEVICE);
+		if (dma_mapping_error(dma_dev, dma_addr)) {
+			__free_page(tf->page);
+			tf->page = NULL;
+			tbnet_free_buffers(ring);
+			return -ENOMEM;
+		}
+
 		tf->dev = net->dev;
+		tf->frame.buffer_phy = dma_addr;
 		tf->frame.callback = tbnet_tx_callback;
 		tf->frame.sof = TBIP_PDF_FRAME_START;
 		tf->frame.eof = TBIP_PDF_FRAME_END;
@@ -881,19 +891,6 @@ static int tbnet_stop(struct net_device *dev)
 	return 0;
 }
 
-static bool tbnet_xmit_map(struct device *dma_dev, struct tbnet_frame *tf)
-{
-	dma_addr_t dma_addr;
-
-	dma_addr = dma_map_page(dma_dev, tf->page, 0, tbnet_frame_size(tf),
-				DMA_TO_DEVICE);
-	if (dma_mapping_error(dma_dev, dma_addr))
-		return false;
-
-	tf->frame.buffer_phy = dma_addr;
-	return true;
-}
-
 static bool tbnet_xmit_csum_and_map(struct tbnet *net, struct sk_buff *skb,
 	struct tbnet_frame **frames, u32 frame_count)
 {
@@ -908,13 +905,14 @@ static bool tbnet_xmit_csum_and_map(struct tbnet *net, struct sk_buff *skb,
 
 	if (skb->ip_summed != CHECKSUM_PARTIAL) {
 		/* No need to calculate checksum so we just update the
-		 * total frame count and map the frames for DMA.
+		 * total frame count and sync the frames for
Re: [PATCH v7 3/5] bpf: add a bpf_override_function helper
On 11/22/2017 10:23 PM, Josef Bacik wrote:
> From: Josef Bacik
>
> Error injection is sloppy and very ad-hoc.  BPF could fill this niche
> perfectly with its kprobe functionality.  We could make sure errors are
> only triggered in specific call chains that we care about with very
> specific situations.  Accomplish this with the bpf_override_function
> helper.  This will modify the probe'd callers return value to the
> specified value and set the PC to an override function that simply
> returns, bypassing the originally probed function.  This gives us a nice
> clean way to implement systematic error injection for all of our code
> paths.
>
> Acked-by: Alexei Starovoitov
> Acked-by: Ingo Molnar
> Signed-off-by: Josef Bacik

Series looks good to me as well; BPF bits:

Acked-by: Daniel Borkmann
Re: [PATCH] dsa: dsa2: fix compile error for !CONFIG_OF
On 11/24/2017 3:28 AM, Andrew Lunn wrote:
> On Thu, Nov 23, 2017 at 08:27:48PM +0100, Arend Van Spriel wrote:
>> + Arnd
>>
>> On Thu, Nov 23, 2017 at 8:12 PM, Arend Van Spriel wrote:
>>> On Thu, Nov 23, 2017 at 3:04 PM, Andrew Lunn wrote:
>>>> On Thu, Nov 23, 2017 at 01:00:51PM +0100, Arend van Spriel wrote:
>>>>> Compilation fails building on x86_64 platform which does not
>>>>> have CONFIG_OF enabled.
>>>>>
>>>>> Signed-off-by: Arend van Spriel
>>>>> ---
>>>>> After rebasing my branch to v4.14 I attempted to build the kernel
>>>>> and hit the following compile issue:
>>>>>
>>>>> net/dsa/dsa2.c: In function ‘dsa_switch_parse_member_of’:
>>>>> net/dsa/dsa2.c:678:2: error: implicit declaration of function
>>>>> 'of_property_read_variable_u32_array'
>>>>
>>>> Hi Arend
>>>>
>>>> https://lkml.org/lkml/2017/11/6/493
>>>
>>> So my email/patch did get through initially. Sorry for the noise and
>>> thanks for the info.
>>
>> Hi Andrew,
>>
>> Getting back to this. It seems that this patch did not get in. At least
>> I searched for it in v4.14.1 but no luck.
>
> Hi Arend
>
> The use of of_property_read_variable_u32_array was added in
> 975e6e32215e ("net: dsa: rework switch parsing"). This patch is not in
> v4.14. It is in linus/master, so v4.15-rc1 should have it. And the fix
> is also in linus/master.
>
> So there does not appear to be anything wrong. I just built v4.14.1
> for x86_64 with DSA without problems.
>
> Thanks,
> Andrew

I am actually using wireless-testing tree which is based on 4.14 and
throws in net-next and the wireless trees. I assume the fix did not go
through net-next. Sorry for the confusion.

Regards,
Arend
[PATCH] atm: nicstar: use the setup_timer helper
From: Colin Ian King

Replace init_timer and two explicit assignments with the setup_timer
helper.

Signed-off-by: Colin Ian King
---
 drivers/atm/nicstar.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/atm/nicstar.c b/drivers/atm/nicstar.c
index a9702836cbae..335447ed0ba4 100644
--- a/drivers/atm/nicstar.c
+++ b/drivers/atm/nicstar.c
@@ -284,10 +284,8 @@ static int __init nicstar_init(void)
 	XPRINTK("nicstar: nicstar_init() returned.\n");
 
 	if (!error) {
-		init_timer(&ns_timer);
+		setup_timer(&ns_timer, ns_poll, 0UL);
 		ns_timer.expires = jiffies + NS_POLL_PERIOD;
-		ns_timer.data = 0UL;
-		ns_timer.function = ns_poll;
 		add_timer(&ns_timer);
 	}
 
-- 
2.14.1
Re: [PATCH] r8152: disable rx checksum offload on Dell TB dock
> Also the MAC address is different, can you just trigger off of Dell's
> MAC address space instead of the address space of the dongle device?

A really good idea, never thought of this. Thanks for the hint :)
Still, I need to ask Dell folks to get all the answers.

Kai-Heng
Re: [PATCH] r8152: disable rx checksum offload on Dell TB dock
> On 24 Nov 2017, at 4:28 PM, Greg KH wrote:
>
> The bcdDevice is different between the dock device and the "real"
> device, why not use that?

Yea, I’ll poke around and see if bcdDevice alone can be a good predicate.

> Then there is still a bug.  Who at ASMedia is working on this, have they
> posted anything to the linux-usb mailing list about it?

I think they are doing this internally. I’ll advise them to ask questions
here if they encounter any problem.

> Maybe.  Have you tried using usbmon to see what the data streams are for
> the two devices and where they have problems and diverge?  Is the dock
> device doing something different in response to something from the host
> that the "real" device does not do?

No I haven’t. Not really sure how to debug network packets over USB.
I’ll do some research on the topic.

Kai-Heng
Re: [PATCH] r8152: disable rx checksum offload on Dell TB dock
On Fri, Nov 24, 2017 at 11:44:02AM +0800, Kai Heng Feng wrote:
> > On 23 Nov 2017, at 5:24 PM, Greg KH wrote:
> >
> > On Thu, Nov 23, 2017 at 04:53:41PM +0800, Kai Heng Feng wrote:
> >>
> >> What I want to do here is to finding this connection:
> >> Realtek r8153 <-> SMSC hub (USB ID: 0424:5537) <->
> >> ASMedia XHCI controller (PCI ID: 1b21:1142).
> >>
> >> Is there a safer way to do this?
> >
> > Nope!  You can't do that at all from within a USB driver, sorry.  As you
> > really should not care at all :)
>
> Got it :)
>
> The r8153 in Dell TB dock has version information, RTL_VER_05.
> We can use it to check for workaround, but many working RTL_VER_05 devices
> will also be affected.
> Do you think it’s an acceptable compromise?

I think all of the users of this device that is working just fine for
them would not like that to happen :(

> >> I have a r8153 <-> USB 3.0 dongle which work just fine. I can’t find any
> >> information to differentiate them. Hence I want to use the connection to
> >> identify if r8153 is on a Dell TB dock.
> >
> > Are you sure there is nothing different in the version or release number
> > of the device?  'lsusb -v' shows the exact same information for both
> > devices?
>
> Yes. I attached `lsusb -v` for r8153 on Dell TB dock, on a RJ45 <-> USB 3.0
> dongle,
> and on a RJ45 <-> USB Type-C dongle.

The bcdDevice is different between the dock device and the "real"
device, why not use that?

> >> Yes. From what I know, ASMedia is working on it, but not sure how long it
> >> will take. In the meantime, I’d like to workaround this issue for the
> >> users.
> >
> > Again, it's a host controller bug, it should be fixed there, don't try
> > to paper over the real issue in different individual drivers.
> >
> > I think I've seen various patches on the linux-usb list for this
> > controller already, have you tried them?
>
> Yes. These patches are all in mainline Linux now.

Then there is still a bug.  Who at ASMedia is working on this, have they
posted anything to the linux-usb mailing list about it?

> >> Actually no.
> >> I just plugged r8153 dongle into the same hub, surprisingly the issue
> >> doesn’t happen in this scenario.
> >
> > Then something seems to be wrong with the device itself, as that would
> > be the same exact electrical/logical path, right?
>
> I have no idea why externally plugged one doesn’t have this issue.
> Maybe it’s related how it’s wired inside the Dell TB dock...

Maybe.  Have you tried using usbmon to see what the data streams are for
the two devices and where they have problems and diverge?  Is the dock
device doing something different in response to something from the host
that the "real" device does not do?

thanks,

greg k-h
[patch iproute2] tc: move action cookie print out of the stats if
From: Jiri Pirko

Cookie print was made dependent on show_stats for no good reason. Fix
this by pushing cookie print out of the stats if.

Fixes: fd8b3d2c1b9b ("actions: Add support for user cookies")
Signed-off-by: Jiri Pirko
---
 tc/m_action.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index 0dce97f..c2fc4f1 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -301,19 +301,18 @@ static int tc_print_one_action(FILE *f, struct rtattr *arg)
 		return err;
 
 	if (show_stats && tb[TCA_ACT_STATS]) {
-
 		fprintf(f, "\tAction statistics:\n");
 		print_tcstats2_attr(f, tb[TCA_ACT_STATS], "\t", NULL);
-		if (tb[TCA_ACT_COOKIE]) {
-			int strsz = RTA_PAYLOAD(tb[TCA_ACT_COOKIE]);
-			char b1[strsz * 2 + 1];
-
-			fprintf(f, "\n\tcookie len %d %s ", strsz,
-				hexstring_n2a(RTA_DATA(tb[TCA_ACT_COOKIE]),
-					      strsz, b1, sizeof(b1)));
-		}
 		fprintf(f, "\n");
 	}
+	if (tb[TCA_ACT_COOKIE]) {
+		int strsz = RTA_PAYLOAD(tb[TCA_ACT_COOKIE]);
+		char b1[strsz * 2 + 1];
+
+		fprintf(f, "\tcookie len %d %s\n", strsz,
+			hexstring_n2a(RTA_DATA(tb[TCA_ACT_COOKIE]),
+				      strsz, b1, sizeof(b1)));
+	}
 
 	return 0;
 }
-- 
2.9.5
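For readers who want to see what the new cookie line looks like without building iproute2, here is a minimal standalone sketch. It is an illustration under assumptions, not the tc code: `hex_encode`, `cookie_line`, `cookie_demo` and the `COOKIE_MAX` bound of 16 bytes are hypothetical names and values introduced here, and `hex_encode` is a plain reimplementation of the general idea of hex encoding rather than iproute2's hexstring_n2a().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define COOKIE_MAX 16	/* assumed upper bound on cookie length for this sketch */

/* Hypothetical helper: writes 2 * len lowercase hex digits plus a NUL into
 * buf. Returns buf, or NULL if buf is too small.
 */
static char *hex_encode(const unsigned char *data, size_t len,
			char *buf, size_t buflen)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	if (buflen < len * 2 + 1)
		return NULL;

	for (i = 0; i < len; i++) {
		buf[2 * i] = digits[data[i] >> 4];
		buf[2 * i + 1] = digits[data[i] & 0xf];
	}
	buf[2 * len] = '\0';
	return buf;
}

/* Builds the line in the same layout as the patched fprintf() call.
 * Returns a pointer to a static buffer (not thread safe), or NULL if the
 * cookie is longer than COOKIE_MAX.
 */
static const char *cookie_line(const unsigned char *cookie, size_t len)
{
	static char line[64];
	char hex[2 * COOKIE_MAX + 1];

	if (!hex_encode(cookie, len, hex, sizeof(hex)))
		return NULL;

	snprintf(line, sizeof(line), "\tcookie len %zu %s\n", len, hex);
	return line;
}

/* Small usage example: a 4-byte cookie yields one tab-indented line. */
static void cookie_demo(void)
{
	static const unsigned char cookie[] = { 0xde, 0xad, 0xbe, 0xef };

	assert(strcmp(cookie_line(cookie, sizeof(cookie)),
		      "\tcookie len 4 deadbeef\n") == 0);
}
```

The point of the patch is only where this line is emitted: after the change it no longer depends on show_stats, and it ends with a newline instead of starting with one.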
Re: [PATCH] r8152: disable rx checksum offload on Dell TB dock
On Fri, Nov 24, 2017 at 09:28:05AM +0100, Greg KH wrote:
> On Fri, Nov 24, 2017 at 11:44:02AM +0800, Kai Heng Feng wrote:
> > > On 23 Nov 2017, at 5:24 PM, Greg KH wrote:
> > >
> > > On Thu, Nov 23, 2017 at 04:53:41PM +0800, Kai Heng Feng wrote:
> > >>
> > >> What I want to do here is to finding this connection:
> > >> Realtek r8153 <-> SMSC hub (USD ID: 0424:5537) <->
> > >> ASMedia XHCI controller (PCI ID: 1b21:1142).
> > >>
> > >> Is there a safer way to do this?
> > >
> > > Nope!  You can't do that at all from within a USB driver, sorry.  As you
> > > really should not care at all :)
> >
> > Got it :)
> >
> > The r8153 in Dell TB dock has version information, RTL_VER_05.
> > We can use it to check for workaround, but many working RTL_VER_05 devices
> > will also be affected.
> > Do you think it’s an acceptable compromise?
>
> I think all of the users of this device that is working just fine for
> them would not like that to happen :(
>
> > >> I have a r8153 <-> USB 3.0 dongle which work just fine. I can’t find any
> > >> information to differentiate them. Hence I want to use the connection to
> > >> identify if r8153 is on a Dell TB dock.
> > >
> > > Are you sure there is nothing different in the version or release number
> > > of the device?  'lsusb -v' shows the exact same information for both
> > > devices?
> >
> > Yes. I attached `lsusb -v` for r8153 on Dell TB dock, on a RJ45 <-> USB 3.0
> > dongle,
> > and on a RJ45 <-> USB Type-C dongle.
>
> The bcdDevice is different between the dock device and the "real"
> device, why not use that?

Also the MAC address is different, can you just trigger off of Dell's
MAC address space instead of the address space of the dongle device?

thanks,

greg k-h
Re: [PATCH 1/6] perf: Add new type PERF_TYPE_PROBE
On Thu, Nov 23, 2017 at 10:31:29PM -0800, Alexei Starovoitov wrote:
> unfortunately 32-bit is more screwed than it seems:
>
> $ cat align.c
> #include <stdio.h>
>
> struct S {
>   unsigned long long a;
> } s;
>
> struct U {
>   unsigned long long a;
> } u;
>
> int main()
> {
>         printf("%d, %d\n", sizeof(unsigned long long),
>                __alignof__(unsigned long long));
>         printf("%d, %d\n", sizeof(s), __alignof__(s));
>         printf("%d, %d\n", sizeof(u), __alignof__(u));
> }
> $ gcc -m32 align.c
> $ ./a.out
> 8, 8
> 8, 4
> 8, 4

*blink* how is that even correct? I understood the spec to say the
alignment of composite types should be the max alignment of any of its
member types (otherwise it cannot guarantee the alignment of its
members).

> so we have to use __aligned_u64 in uapi.

Ideally yes, but effectively it most often doesn't matter.

> Otherwise, yes, we could have used config1 and config2 to pass pointers
> to the kernel, but since they're defined as __u64 already we cannot
> change them and have to do this ugly dance around 'config' field.

I don't understand the reasoning why you cannot use them. Even if they
are not naturally aligned on x86_32, why would it matter?

x86_32 needs two loads in any case, but there is no concurrency, so
split loads is not a problem. Add to that that 'intptr_t' on ILP32
is in fact only a single u32 and thus the other u32 will always be 0.

So yes, alignment is screwy, but I really don't see who cares and why it
would matter in practise.

Please explain.
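As a side illustration of the alignment point discussed above (not something posted in the thread): the i386 System V ABI aligns 64-bit integers inside structs to 4 bytes, so a struct with a 32-bit field followed by a 64-bit field can be laid out differently by 32-bit and 64-bit builds. The sketch below assumes GCC or Clang for the `aligned` attribute; `aligned_u64`, `struct plain`, `struct fixed` and `plain_addr_offset` are hypothetical local names chosen to mirror the idea behind the kernel's __aligned_u64, not definitions taken from uapi headers.

```c
#include <assert.h>
#include <stddef.h>

/* Plain 64-bit member: its offset depends on the target ABI
 * (4 on i386, 8 on x86_64).
 */
struct plain {
	unsigned int flags;
	unsigned long long addr;
};

/* Hypothetical local typedef forcing 8-byte alignment on every target. */
typedef unsigned long long aligned_u64 __attribute__((aligned(8)));

struct fixed {
	unsigned int flags;
	aligned_u64 addr;
};

/* With the explicit attribute the layout no longer depends on -m32/-m64. */
static_assert(_Alignof(struct fixed) == 8, "struct alignment is 8");
static_assert(offsetof(struct fixed, addr) == 8, "addr sits at offset 8");
static_assert(sizeof(struct fixed) == 16, "size is 16 with padding");

/* Reports where the compiler placed the unannotated member. */
static size_t plain_addr_offset(void)
{
	return offsetof(struct plain, addr);
}
```

Building this once with -m32 and once with -m64 and comparing plain_addr_offset() shows the divergence for the unannotated struct, while the static assertions on struct fixed hold in both builds.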
Re: [RFC net-next 4/6] netdevsim: add software driver for testing offloads
Fri, Nov 24, 2017 at 08:49:17AM CET, jakub.kicin...@netronome.com wrote:
>On Thu, Nov 23, 2017 at 11:24 PM, Jiri Pirko wrote:
>> Fri, Nov 24, 2017 at 03:36:11AM CET, jakub.kicin...@netronome.com wrote:
>>>To be able to run selftests without any hardware required we
>>>need a software model.  The model can also serve as an example
>>>implementation for those implementing actual HW offloads.
>>>The dummy driver have previously been extended to test SR-IOV,
>>>but the general consensus seems to be against adding further
>>>features to it.
>>>
>>>Signed-off-by: Jakub Kicinski
>>>Reviewed-by: Simon Horman
>>>---
>>
>> [...]
>>
>>
>>>+++ b/drivers/net/netdevsim/netdev.c
>>>@@ -0,0 +1,136 @@
>>>+/*
>>>+ * Copyright (C) 2017 Netronome Systems, Inc.
>>>+ *
>>>+ * This software is dual licensed under the GNU General License Version 2,
>>>+ * June 1991 as shown in the file COPYING in the top-level directory of this
>>>+ * source tree or the BSD 2-Clause License provided below.  You have the
>>>+ * option to license this software under the complete terms of either license.
>>>+ *
>>>+ * The BSD 2-Clause License:
>>
>> Why gpl2 is not enough for this?
>
>It's the license I got from legal, I will request permission to use
>pure gpl2.  Thanks!

Yeah, I semi-understand need for bsd for actual hw driver (we have it
for mlxsw as well). But for this testing driver, it really does not make
sense.
Re: [RFC net-next 0/6] xdp: make stack perform remove and tests
On Thu, Nov 23, 2017 at 11:45 PM, Jiri Pirko wrote:
> Fri, Nov 24, 2017 at 03:36:07AM CET, jakub.kicin...@netronome.com wrote:
>>Hi!
>>
>>The purpose of this series is to add a software model of BPF offloads
>>to make it easier for everyone to test them and make some of the more
>>arcane rules and assumptions more clear.
>>
>>The series starts with 3 patches aiming to make XDP handling in the
>>drivers less error prone.  Currently driver authors have to remember
>>to free XDP programs if XDP is active during unregister.  With this
>>series the core will disable XDP on its own.  It will take place
>>after close, drivers are not expected to perform reconfiguration
>>when disabling XDP on a downed device.
>>
>>Next two patches add the software netdev driver.  Last but not least
>
> I wonder if for this it is needed to split the driver into multiple
> files. I think that a single file would be better as I don't expect the
> driver would get big.

I was hoping other offloads will be added to their separate files, to
make it easier for people to find "all code relevant when implementing
X".  Sort of related to your comment on the license, I'm hoping to
be able to use SPDX one-line header to lower the overhead of many
files.  Has anyone managed to get an OK to do that?

>>there is a python test which exercises all the corner cases which
>>came to my mind.
>>
>>Test needs to be run as root.  It will print basic information to
>>stdout, but can also create a more detailed log of all commands
>>when --log option is passed.  Log is in Emacs Org-mode format.
>>
>>  ./tools/testing/selftests/bpf/test_offload.py --log /tmp/log
>>
>>Something I'm still battling with, and would appreciate help of
>>wiser people is that occasionally during the test something makes
>>the refcount of init_net drop to 0 :S  I tried to create a simple
>>reproducer, but seems like just running the script in the loop is
>>the easiest way to go...  Could it have something to do with the
>>recent TC work?  The driver is pretty simple and never touches
>
> I don't see how...

To be clear I meant the changes made to destruction of filters, not
your work.  The BPF code doesn't touch ref counts and cls exts do seem
to hold a ref on the net...  but perhaps that's just pointing the
finger unnecessarily :)  I will try to investigate again tomorrow.

>>ref counts.  The only slightly unusual thing is that the BPF code
>>sleeps for a bit on remove in the netdev notifier.
>>
>>
>>Jakub Kicinski (6):
>>  net: xdp: avoid output parameters when querying XDP prog
>>  net: xdp: report flags program was installed with on query
>>  net: xdp: make the stack take care of the tear down
>>  netdevsim: add software driver for testing offloads
>>  netdevsim: add bpf offload support
>>  selftests/bpf: add offload test based on netdevsim
>
> Patchset looks fine to me.
> Thanks for this!

Thanks!