[PATCH iproute2] tc: fix second printing of requeues
Non-JSON tc qdisc output used to print the "requeues" statistic twice.
Commit 4fcec7f3665b ("tc: jsonify stats2") tried to preserve this
behaviour for both standard output and JSON, but used the wrong
statistic (q.qlen). Also duplicating keys in JSON is not allowed, so
the second occurrence should be completely skipped with JSON.

Fixes: 4fcec7f3665b ("tc: jsonify stats2")
Signed-off-by: Jakub Kicinski
---
 tc/tc_util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tc/tc_util.c b/tc/tc_util.c
index 10e5aa91168a..aceb0d944933 100644
--- a/tc/tc_util.c
+++ b/tc/tc_util.c
@@ -846,7 +846,7 @@ void print_tcstats2_attr(FILE *fp, struct rtattr *rta, char *prefix, struct rtat
 		print_string(PRINT_FP, NULL, "backlog %s", sprint_size(q.backlog, b1));
 		print_uint(PRINT_ANY, "qlen", " %up", q.qlen);
-		print_uint(PRINT_ANY, "requeues", " requeues %u", q.qlen);
+		print_uint(PRINT_FP, NULL, " requeues %u", q.requeues);
 	}
 
 	if (xstats)
-- 
2.15.1
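For readers unfamiliar with the two output modes, here is a toy model of the idea behind the fix: a field emitted as "plain text only" is skipped in JSON mode, so the key is not duplicated. The names emit_uint(), emit_stats() and the OUT_* constants are made up for illustration; this is not the iproute2 json_print API, and the field order is only illustrative.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the output types; not the real iproute2 API. */
enum out_type { OUT_FP = 1, OUT_JSON = 2, OUT_ANY = OUT_FP | OUT_JSON };

/*
 * Append one unsigned statistic to buf, but only if the requested type
 * matches the active output mode. Returns the number of bytes appended.
 */
static int emit_uint(char *buf, size_t len, int json_mode, enum out_type type,
		     const char *key, const char *label, unsigned int val)
{
	size_t used = strlen(buf);

	if (json_mode) {
		if (!(type & OUT_JSON) || !key)
			return 0;	/* text-only field: skipped in JSON */
		return snprintf(buf + used, len - used, "\"%s\":%u,", key, val);
	}
	if (!(type & OUT_FP))
		return 0;
	return snprintf(buf + used, len - used, "%s%u", label, val);
}

/* "requeues" is printed once for both modes and a second time for text only. */
static void emit_stats(char *buf, size_t len, int json_mode,
		       unsigned int qlen, unsigned int requeues)
{
	buf[0] = '\0';
	emit_uint(buf, len, json_mode, OUT_ANY, "requeues", "requeues ", requeues);
	emit_uint(buf, len, json_mode, OUT_ANY, "qlen", " qlen ", qlen);
	emit_uint(buf, len, json_mode, OUT_FP, NULL, " requeues ", requeues);
}
```

In text mode the second "requeues" is kept for backward compatibility; in JSON mode the object ends up with a single "requeues" key.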
Re: [PATCH net] ipv6: addrconf: break critical section in addrconf_verify_rtnl()
On Fri, Jan 26, 2018 at 04:10:43PM -0800, Eric Dumazet wrote:
> From: Eric Dumazet
>
> Heiner reported a lockdep splat [1]
>
> This is caused by attempting GFP_KERNEL allocation while RCU lock is
> held and BH blocked.
>
> We believe that addrconf_verify_rtnl() could run for a long period,
> so instead of using GFP_ATOMIC here as Ido suggested, we should break
> the critical section and restart it after the allocation.

[...]

> Fixes: f3d9832e56c4 ("ipv6: addrconf: cleanup locking in ipv6_add_addr")
> Signed-off-by: Eric Dumazet
> Reported-by: Heiner Kallweit

Reviewed-by: Ido Schimmel

Thanks!
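The "break the critical section and restart" pattern described above can be sketched in plain userspace C. This is only an analogy under stated assumptions: struct toy_lock, sleeping_alloc() and scan_with_restart() are hypothetical names, the lock is a simple flag rather than RCU/BH protection, and the real addrconf code is not reproduced here.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical lock: just a flag, so the sketch can check its own rules. */
struct toy_lock { bool held; };

static void toy_lock_acquire(struct toy_lock *l) { l->held = true; }
static void toy_lock_release(struct toy_lock *l) { l->held = false; }

/* Stand-in for a sleeping allocation: refuses to run with the lock held. */
static void *sleeping_alloc(const struct toy_lock *l, size_t n)
{
	if (l->held)
		return NULL;	/* models the bug the splat complained about */
	return malloc(n);
}

/*
 * Walk the entries under the lock. When one needs an allocation, leave
 * the critical section, allocate, record progress, and restart the walk.
 * Returns the number of restarts, or -1 on allocation failure.
 */
static int scan_with_restart(struct toy_lock *l, bool *needs_work, int n)
{
	int restarts = 0;
	int i;

restart:
	toy_lock_acquire(l);
	for (i = 0; i < n; i++) {
		void *mem;

		if (!needs_work[i])
			continue;
		toy_lock_release(l);	/* break the critical section */
		mem = sleeping_alloc(l, 64);
		if (!mem)
			return -1;
		free(mem);
		needs_work[i] = false;	/* progress, so the restart terminates */
		restarts++;
		goto restart;
	}
	toy_lock_release(l);
	return restarts;
}
```

The cost of this design is the repeated walk; the benefit is that the allocation may sleep, which is what makes it preferable to an atomic allocation when the walk can run for a long time.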
Re: [PATCH bpf-next v7 2/5] libbpf: add function to setup XDP
Hi,

On Sat, 2018-01-27 at 02:23 +0100, Daniel Borkmann wrote:
> On 01/25/2018 01:05 AM, Eric Leblond wrote:
> > Most of the code is taken from set_link_xdp_fd() in bpf_load.c and
> > slightly modified to be library compliant.
> >
> > Signed-off-by: Eric Leblond
> > Acked-by: Alexei Starovoitov
> > ---
> >  tools/lib/bpf/bpf.c    | 127 +
> >  tools/lib/bpf/libbpf.c |   2 +
> >  tools/lib/bpf/libbpf.h |   4 ++
> >  3 files changed, 133 insertions(+)
> >
> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > index 5128677e4117..749a447ec9ed 100644
> > --- a/tools/lib/bpf/bpf.c
> > +++ b/tools/lib/bpf/bpf.c
> > @@ -25,6 +25,17 @@
> >  #include
> >  #include
> >  #include "bpf.h"
> > +#include "libbpf.h"
> > +#include "nlattr.h"
> > +#include
>
> Doesn't libbpf pull in already -I$(srctree)/tools/include/uapi? Seems the
> other headers don't need 'uapi/' path prefix.

Right, it works without the uapi.

> > +#include
> > +#include
> > +
> > +#ifndef IFLA_XDP_MAX
> > +#define IFLA_XDP	43
> > +#define IFLA_XDP_FD	1
> > +#define IFLA_XDP_FLAGS	3
> > +#endif
>
> Hm, given we pull in tools/include/uapi/linux/netlink.h, shouldn't we also
> get include/uapi/linux/if_link.h dependency in here, so above ifdef
> workaround can be avoided?

These values are fixed, so we risk nothing by keeping a definition in
case it is not available in the system headers. But it is fine with me
if you want me to add if_link.h to include/uapi/.

BR,
-- 
Eric Leblond
Blog: https://home.regit.org/
Re: [PATCH net] ipv6: change route cache aging logic
On Fri, 2018-01-26 at 11:40 -0800, Wei Wang wrote:
> From: Wei Wang
>
> In current route cache aging logic, if a route has both RTF_EXPIRE and
> RTF_GATEWAY set, the route will only be removed if the neighbor cache
> has no RTN_ROUTE flag. Otherwise, even if the route has expired, it
> won't get deleted.
> Fix this logic to always check if the route has expired first and then
> do the gateway neighbor cache check if previous check decide to not
> remove the exception entry.
>
> Fixes: 1859bac04fb6 ("ipv6: remove from fib tree aged out RTF_CACHE dst")
> Signed-off-by: Wei Wang
> Signed-off-by: Eric Dumazet

Thank you for the fix!

LGTM

Acked-by: Paolo Abeni

Cheers,

/P
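The ordering described in the quoted commit message (expiry first, gateway neighbour check second) can be sketched as below. Everything here is a hypothetical model: struct toy_rt, the TOY_* flags and toy_should_remove() are illustration names, the neighbour lookup is reduced to a boolean, and the plain comparison ignores the jiffies wraparound handling a kernel would need.

```c
#include <stdbool.h>

/* Hypothetical flags and route entry, loosely modelled on the description. */
#define TOY_RTF_GATEWAY 0x1u
#define TOY_RTF_EXPIRES 0x2u

struct toy_rt {
	unsigned int flags;
	unsigned long expires;		/* absolute expiry time */
	bool neigh_is_router;		/* result of the neighbour cache lookup */
};

/* Decide whether a cached route entry should be aged out at time 'now'. */
static bool toy_should_remove(const struct toy_rt *rt, unsigned long now)
{
	/* 1. Expiry is checked first, whatever the other flags are. */
	if ((rt->flags & TOY_RTF_EXPIRES) && now >= rt->expires)
		return true;

	/* 2. Only if the entry survived step 1, consult the gateway state. */
	if ((rt->flags & TOY_RTF_GATEWAY) && !rt->neigh_is_router)
		return true;

	return false;
}
```

With the old ordering, an expired gateway route whose neighbour still looked like a router would have been kept; here it is removed in step 1.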
Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating
Hi Tariq Thanks for your kindly response. That's really appreciated. On 01/25/2018 05:54 PM, Tariq Toukan wrote: > > > On 25/01/2018 8:25 AM, jianchao.wang wrote: >> Hi Eric >> >> Thanks for you kindly response and suggestion. >> That's really appreciated. >> >> Jianchao >> >> On 01/25/2018 11:55 AM, Eric Dumazet wrote: >>> On Thu, 2018-01-25 at 11:27 +0800, jianchao.wang wrote: Hi Tariq On 01/22/2018 10:12 AM, jianchao.wang wrote: >>> On 19/01/2018 5:49 PM, Eric Dumazet wrote: On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: > Hi Tariq > > Very sad that the crash was reproduced again after applied the patch. >> >> Memory barriers vary for different Archs, can you please share more >> details regarding arch and repro steps? > The hardware is HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 > 12/27/2015 > The xen is installed. The crash occurred in DOM0. > Regarding to the repro steps, it is a customer's test which does heavy > disk I/O over NFS storage without any guest. > What is the finial suggestion on this ? If use wmb there, is the performance pulled down ? > > I want to evaluate this effect. > I agree with Eric, expected impact is restricted, especially after batching > the allocations.> >>> >>> Since >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_davem_net-2Dnext.git_commit_-3Fid-3Ddad42c3038a59d27fced28ee4ec1d4a891b28155=DwICaQ=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ=c0oI8duFkyFBILMQYDsqRApHQrOlLY_2uGiz_utcd7s=E4_XKmSI0B63qB0DLQ1EX_fj1bOP78ZdeYADBf33B-k= >>> >>> we batch allocations, so mlx4_en_refill_rx_buffers() is not called that >>> often. >>> >>> I doubt the additional wmb() will have serious impact there. >>> > > I will test the effect (it'll be beginning of next week). > I'll update so we can make a more confident decision. 
I have also sent patches with wmb() and batched allocations to the
customer so they can check whether performance is impacted. I will
update here as soon as I get feedback.

> Thanks,
> Tariq
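For context, the ordering under discussion (descriptor contents must be visible before the producer index is published) can be sketched with C11 atomics. This is a CPU-to-CPU analogy only: struct toy_ring and toy_post_buffer() are hypothetical, and a release fence is not a substitute for the wmb() a driver needs before a device doorbell write.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TOY_RING_SIZE 8

/* Hypothetical receive ring: descriptor slots plus a producer index. */
struct toy_ring {
	uint64_t desc[TOY_RING_SIZE];
	_Atomic uint32_t prod;
};

/* Fill the next descriptor, then publish it by bumping the producer index. */
static void toy_post_buffer(struct toy_ring *r, uint64_t addr)
{
	uint32_t p = atomic_load_explicit(&r->prod, memory_order_relaxed);

	r->desc[p % TOY_RING_SIZE] = addr;

	/*
	 * Order the descriptor write before the index update, so a consumer
	 * that observes the new index also observes the descriptor.
	 */
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&r->prod, p + 1, memory_order_relaxed);
}
```

Because refills are batched, the fence runs once per batch rather than once per packet, which is why the expected performance impact is small.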
Re: [PATCH v2 0/6] wl1251: Fix MAC address for Nokia N900
On Friday 05 January 2018 02:45:10 Luis R. Rodriguez wrote:
> On Tue, Jan 02, 2018 at 08:23:45PM +0100, Pali Rohár wrote:
> > On Friday 10 November 2017 00:38:22 Pali Rohár wrote:
> > > This patch series fix processing MAC address for wl1251 chip found in
> > > Nokia N900.
> > >
> > > Changes since v1:
> > > * Added Acked-by for Pavel Machek
> > > * Fixed grammar
> > > * Magic numbers for NVS offsets are replaced by defines
> > > * Check for validity of mac address NVS data is moved into function
> > > * Changed order of patches as Pavel requested
> > >
> > > Pali Rohár (6):
> > >   wl1251: Update wl->nvs_len after wl->nvs is valid
> > >   wl1251: Generate random MAC address only if driver does not have
> > >     valid
> > >   wl1251: Parse and use MAC address from supplied NVS data
> > >   wl1251: Set generated MAC address back to NVS data
> > >   firmware: Add request_firmware_prefer_user() function
> > >   wl1251: Use request_firmware_prefer_user() for loading NVS
> > >     calibration data
> > >
> > >  drivers/base/firmware_class.c          |  45 +-
> > >  drivers/net/wireless/ti/wl1251/Kconfig |   1 +
> > >  drivers/net/wireless/ti/wl1251/main.c  | 104 ++--
> > >  include/linux/firmware.h               |   9 +++
> > >  4 files changed, 138 insertions(+), 21 deletions(-)
> >
> > Hi! Are there any comments for first 4 patches? If not, could they be
> > accepted and merged?
>
> Since the first 4 patches do not touch the firmware API they seem fine to me
> so long as the maintainer accepts them. Maybe resend and clarify you have
> dropped the other ones and amend with the new tags.

According to get_maintainer.pl, Kalle Valo is the maintainer.

Kalle Valo, if you do not have any other comments, can you accept the
first 4 patches? Or do you really need me to resend the first 4 patches?

-- 
Pali Rohár
pali.ro...@gmail.com
Re: [RFC 0/2] hv_netvsc shutdown redo
Stephen Hemmingerwrites: > These patches change how teardown of Hyper-V network devices > is done. These are tested on WS2012 and WS2016. > > It moves the tx/rx shutdown into the rndis close handling, > and that makes earlier gpadl changes unnecsssary. > Thank you Stephen, I gave these a try and they didn't survive my 'death row' test on WS2016: I run 3 things in parallel: 1) iperf to some external IP 2) while true; do ethtool -L ethX combined 6; ethtool -L ethX combined 8; done 3) while true; do ip link set dev ethX mtu 1400; ip link set dev ethX mtu 1450; done I ended up with a hang: [ 1226.710034] INFO: task ip:2357 blocked for more than 120 seconds. [ 1226.712397] Not tainted 4.15.0-rc9+ #321 [ 1226.714030] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1226.716724] ip D0 2357 1474 0x [ 1226.718698] Call Trace: [ 1226.719588] ? __schedule+0x1da/0x7b0 [ 1226.720910] ? get_page_from_freelist+0x106d/0x15c0 [ 1226.722648] schedule+0x28/0x80 [ 1226.723807] schedule_preempt_disabled+0xa/0x10 [ 1226.725952] __mutex_lock.isra.1+0x1a0/0x4e0 [ 1226.727915] ? rtnetlink_rcv_msg+0x212/0x2d0 [ 1226.729849] rtnetlink_rcv_msg+0x212/0x2d0 [ 1226.731611] ? rtnl_calcit.isra.28+0x110/0x110 [ 1226.733824] netlink_rcv_skb+0x4a/0x120 [ 1226.736916] netlink_unicast+0x19d/0x250 [ 1226.738907] netlink_sendmsg+0x2a5/0x3a0 [ 1226.740762] sock_sendmsg+0x30/0x40 [ 1226.742552] SYSC_sendto+0x10e/0x140 [ 1226.744310] ? __do_page_fault+0x26d/0x4c0 [ 1226.746332] entry_SYSCALL_64_fastpath+0x20/0x83 [ 1226.748730] RIP: 0033:0x7ff2cdc9aa7d [ 1226.750776] RSP: 002b:7ffd0a3455e8 EFLAGS: 0246 [ 1349.590041] INFO: task kworker/3:6:1586 blocked for more than 120 seconds. [ 1349.595358] Not tainted 4.15.0-rc9+ #321 [ 1349.597335] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1349.600638] kworker/3:6 D0 1586 2 0x8000 [ 1349.603335] Workqueue: ipv6_addrconf addrconf_verify_work [ 1349.605779] Call Trace: [ 1349.607080] ? 
__schedule+0x1da/0x7b0 [ 1349.608856] ? update_load_avg+0x563/0x6d0 [ 1349.610834] ? update_curr+0xb9/0x190 [ 1349.613050] schedule+0x28/0x80 [ 1349.615290] schedule_preempt_disabled+0xa/0x10 [ 1349.617306] __mutex_lock.isra.1+0x1a0/0x4e0 [ 1349.619072] ? addrconf_verify_work+0xa/0x20 [ 1349.621108] addrconf_verify_work+0xa/0x20 [ 1349.623107] process_one_work+0x188/0x380 [ 1349.625012] worker_thread+0x2e/0x390 [ 1349.626976] ? process_one_work+0x380/0x380 [ 1349.628925] kthread+0x111/0x130 [ 1349.630498] ? kthread_create_worker_on_cpu+0x70/0x70 [ 1349.632786] ? do_group_exit+0x3a/0xa0 [ 1349.634598] ret_from_fork+0x35/0x40 (I'm not 100% sure this is a _new_ issue btw, it can happen that the race was always there and it's just easier to trigger it now). I'll try to do more testing next week. Thanks, -- Vitaly
Re: [PATCH bpf-next v7 3/5] libbpf: add error reporting in XDP
Hi, On Sat, 2018-01-27 at 02:28 +0100, Daniel Borkmann wrote: > On 01/25/2018 01:05 AM, Eric Leblond wrote: > > Parse netlink ext attribute to get the error message returned by > > the card. Code is partially take from libnl. > > > > We add netlink.h to the uapi include of tools. And we need to > > avoid include of userspace netlink header to have a successful > > build of sample so nlattr.h has a define to avoid > > the inclusion. Using a direct define could have been an issue > > as NLMSGERR_ATTR_MAX can change in the future. > > > > We also define SOL_NETLINK if not defined to avoid to have to > > copy socket.h for a fixed value. > > > > Signed-off-by: Eric Leblond> > Acked-by: Alexei Starovoitov > > > > remote rtne > > > > Signed-off-by: Eric Leblond > > Some leftover artifact from squashing commits? Outch > > samples/bpf/Makefile | 2 +- > > tools/lib/bpf/Build| 2 +- > > tools/lib/bpf/bpf.c| 13 +++- > > tools/lib/bpf/nlattr.c | 187 > > + > > tools/lib/bpf/nlattr.h | 72 +++ > > 5 files changed, 273 insertions(+), 3 deletions(-) > > create mode 100644 tools/lib/bpf/nlattr.c > > create mode 100644 tools/lib/bpf/nlattr.h > > > > diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile > > index 7f61a3d57fa7..5c4cd3745282 100644 > > --- a/samples/bpf/Makefile > > +++ b/samples/bpf/Makefile > > @@ -45,7 +45,7 @@ hostprogs-y += xdp_rxq_info > > hostprogs-y += syscall_tp > > > > # Libbpf dependencies > > -LIBBPF := ../../tools/lib/bpf/bpf.o > > +LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o > > CGROUP_HELPERS := > > ../../tools/testing/selftests/bpf/cgroup_helpers.o > > > > test_lru_dist-objs := test_lru_dist.o $(LIBBPF) > > diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build > > index d8749756352d..64c679d67109 100644 > > --- a/tools/lib/bpf/Build > > +++ b/tools/lib/bpf/Build > > @@ -1 +1 @@ > > -libbpf-y := libbpf.o bpf.o > > +libbpf-y := libbpf.o bpf.o nlattr.o > > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c > > index 
749a447ec9ed..765fd95b0657 100644 > > --- a/tools/lib/bpf/bpf.c > > +++ b/tools/lib/bpf/bpf.c > > @@ -27,7 +27,7 @@ > > #include "bpf.h" > > #include "libbpf.h" > > #include "nlattr.h" > > -#include > > +#include > > Okay, so here it's put back from prior added uapi/linux/rtnetlink.h > into linux/rtnetlink.h. Could you add this properly in the first > commit rather than relative adjustment/fix within the same set? Yes, sure. > > #include > > #include > > > > @@ -37,6 +37,10 @@ > > #define IFLA_XDP_FLAGS 3 > > #endif > > > > +#ifndef SOL_NETLINK > > +#define SOL_NETLINK 270 > > +#endif > > This would need include/linux/socket.h into tools/ include infra > as well, no? Yes, and I fear a lot of dependencies. ++ -- Eric Leblond Blog: https://home.regit.org/
Re: [4.15-rc9] fs_reclaim lockdep trace
On Tue, Jan 23, 2018 at 08:36:51PM -0500, Dave Jones wrote: > Just triggered this on a server I was rsync'ing to. Actually, I can trigger this really easily, even with an rsync from one disk to another. Though that also smells a little like networking in the traces. Maybe netdev has ideas. The first instance: > > WARNING: possible recursive locking detected > 4.15.0-rc9-backup-debug+ #1 Not tainted > > sshd/24800 is trying to acquire lock: > (fs_reclaim){+.+.}, at: [<84f438c2>] > fs_reclaim_acquire.part.102+0x5/0x30 > > but task is already holding lock: > (fs_reclaim){+.+.}, at: [<84f438c2>] > fs_reclaim_acquire.part.102+0x5/0x30 > > other info that might help us debug this: > Possible unsafe locking scenario: > >CPU0 > > lock(fs_reclaim); > lock(fs_reclaim); > > *** DEADLOCK *** > > May be due to missing lock nesting notation > > 2 locks held by sshd/24800: > #0: (sk_lock-AF_INET6){+.+.}, at: [<1a069652>] > tcp_sendmsg+0x19/0x40 > #1: (fs_reclaim){+.+.}, at: [<84f438c2>] > fs_reclaim_acquire.part.102+0x5/0x30 > > stack backtrace: > CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1 > Call Trace: > dump_stack+0xbc/0x13f > ? _atomic_dec_and_lock+0x101/0x101 > ? fs_reclaim_acquire.part.102+0x5/0x30 > ? print_lock+0x54/0x68 > __lock_acquire+0xa09/0x2040 > ? debug_show_all_locks+0x2f0/0x2f0 > ? mutex_destroy+0x120/0x120 > ? hlock_class+0xa0/0xa0 > ? kernel_text_address+0x5c/0x90 > ? __kernel_text_address+0xe/0x30 > ? unwind_get_return_address+0x2f/0x50 > ? __save_stack_trace+0x92/0x100 > ? graph_lock+0x8d/0x100 > ? check_noncircular+0x20/0x20 > ? __lock_acquire+0x616/0x2040 > ? debug_show_all_locks+0x2f0/0x2f0 > ? __lock_acquire+0x616/0x2040 > ? debug_show_all_locks+0x2f0/0x2f0 > ? print_irqtrace_events+0x110/0x110 > ? active_load_balance_cpu_stop+0x7b0/0x7b0 > ? debug_show_all_locks+0x2f0/0x2f0 > ? mark_lock+0x1b1/0xa00 > ? lock_acquire+0x12e/0x350 > lock_acquire+0x12e/0x350 > ? fs_reclaim_acquire.part.102+0x5/0x30 > ? 
lockdep_rcu_suspicious+0x100/0x100 > ? set_next_entity+0x20e/0x10d0 > ? mark_lock+0x1b1/0xa00 > ? match_held_lock+0x8d/0x440 > ? mark_lock+0x1b1/0xa00 > ? save_trace+0x1e0/0x1e0 > ? print_irqtrace_events+0x110/0x110 > ? alloc_extent_state+0xa7/0x410 > fs_reclaim_acquire.part.102+0x29/0x30 > ? fs_reclaim_acquire.part.102+0x5/0x30 > kmem_cache_alloc+0x3d/0x2c0 > ? rb_erase+0xe63/0x1240 > alloc_extent_state+0xa7/0x410 > ? lock_extent_buffer_for_io+0x3f0/0x3f0 > ? find_held_lock+0x6d/0xd0 > ? test_range_bit+0x197/0x210 > ? lock_acquire+0x350/0x350 > ? do_raw_spin_unlock+0x147/0x220 > ? do_raw_spin_trylock+0x100/0x100 > ? iotree_fs_info+0x30/0x30 > __clear_extent_bit+0x3ea/0x570 > ? clear_state_bit+0x270/0x270 > ? count_range_bits+0x2f0/0x2f0 > ? lock_acquire+0x350/0x350 > ? rb_prev+0x21/0x90 > try_release_extent_mapping+0x21a/0x260 > __btrfs_releasepage+0xb0/0x1c0 > ? btrfs_submit_direct+0xca0/0xca0 > ? check_new_page_bad+0x1f0/0x1f0 > ? match_held_lock+0xa5/0x440 > ? debug_show_all_locks+0x2f0/0x2f0 > btrfs_releasepage+0x161/0x170 > ? __btrfs_releasepage+0x1c0/0x1c0 > ? page_rmapping+0xd0/0xd0 > ? rmap_walk+0x100/0x100 > try_to_release_page+0x162/0x1c0 > ? generic_file_write_iter+0x3c0/0x3c0 > ? page_evictable+0xcc/0x110 > ? lookup_address_in_pgd+0x107/0x190 > shrink_page_list+0x1d5a/0x2fb0 > ? putback_lru_page+0x3f0/0x3f0 > ? save_trace+0x1e0/0x1e0 > ? _lookup_address_cpa.isra.13+0x40/0x60 > ? debug_show_all_locks+0x2f0/0x2f0 > ? kmem_cache_free+0x8c/0x280 > ? free_extent_state+0x1c8/0x3b0 > ? mark_lock+0x1b1/0xa00 > ? page_rmapping+0xd0/0xd0 > ? print_irqtrace_events+0x110/0x110 > ? shrink_node_memcg.constprop.88+0x4c9/0x5e0 > ? shrink_node+0x12d/0x260 > ? try_to_free_pages+0x418/0xaf0 > ? __alloc_pages_slowpath+0x976/0x1790 > ? __alloc_pages_nodemask+0x52c/0x5c0 > ? delete_node+0x28d/0x5c0 > ? find_held_lock+0x6d/0xd0 > ? free_pcppages_bulk+0x381/0x570 > ? lock_acquire+0x350/0x350 > ? do_raw_spin_unlock+0x147/0x220 > ? do_raw_spin_trylock+0x100/0x100 > ? 
__lock_is_held+0x51/0xc0 > ? _raw_spin_unlock+0x24/0x30 > ? free_pcppages_bulk+0x381/0x570 > ? mark_lock+0x1b1/0xa00 > ? free_compound_page+0x30/0x30 > ? print_irqtrace_events+0x110/0x110 > ? __kernel_map_pages+0x2c9/0x310 > ? mark_lock+0x1b1/0xa00 > ? print_irqtrace_events+0x110/0x110 > ? __delete_from_page_cache+0x2e7/0x4e0 > ? save_trace+0x1e0/0x1e0 > ? __add_to_page_cache_locked+0x680/0x680 > ? find_held_lock+0x6d/0xd0 > ? __list_add_valid+0x29/0xa0 > ? free_unref_page_commit+0x198/0x270 > ? drain_local_pages_wq+0x20/0x20 > ? stop_critical_timings+0x210/0x210 > ? mark_lock+0x1b1/0xa00 > ? mark_lock+0x1b1/0xa00 > ?
[PATCH net] net_sched: gen_estimator: fix lockdep splat
From: Eric Dumazet

syzbot reported a lockdep splat in gen_new_estimator() /
est_fetch_counters() when attempting to lock est->stats_lock.

Since est_fetch_counters() is called from BH context from timer
interrupt, we need to block BH as well when calling it from process
context.

Most qdiscs use per cpu counters and are immune to the problem,
but net/sched/act_api.c and net/netfilter/xt_RATEEST.c are using
a spinlock to protect their data. They both call gen_new_estimator()
while object is created and not yet alive, so this bug could
not trigger a deadlock, only a lockdep splat.

Fixes: 1c0d32fde5bd ("net_sched: gen_estimator: complete rewrite of rate estimators")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
---
 net/core/gen_estimator.c |    4
 1 file changed, 4 insertions(+)

diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 9834cfa21b21168a7654290dc2a999e41937b534..0a3f88f08727f1f1217560407ff539c8a8c17496 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -159,7 +159,11 @@ int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 	est->intvl_log = intvl_log;
 	est->cpu_bstats = cpu_bstats;
 
+	if (stats_lock)
+		local_bh_disable();
 	est_fetch_counters(est, &b);
+	if (stats_lock)
+		local_bh_enable();
 	est->last_bytes = b.bytes;
 	est->last_packets = b.packets;
 	old = rcu_dereference_protected(*rate_est, 1);
kernel 4.15.0-rc9+ (net-next) high cpu load at 50Gbit/s - about 6Mpps
Hi

Today I ran some real-life traffic tests with kernel 4.15.0-rc9, but
when traffic reaches 50Gbit/s and about 6Mpps, CPU load rises quickly
from 48% to 100% on all CPU cores.

Here is a graph showing how CPU load rises as pps increases:
https://ibb.co/mhD5ob

Here is the perf record from that time:
https://pastebin.com/3zqG1rvE

There are 8x 10G ixgbe 82599 interfaces teamed with teamd.
No traffic queueing, only pfifo_fast on all interfaces.
No NAT or iptables rules other than INPUT (about 30 rules).

All NICs have the same ethtool settings:

ethtool -k eth0
Features for eth0:
Cannot get device udp-fragmentation-offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off
hw-tc-offload: off
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on

ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 2048

ethtool -c eth0
Coalesce parameters for eth0:
Adaptive RX: off TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 512
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0
tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
[bpf-next PATCH 1/5] bpf: Sync kernel ABI header with tooling header for bpf_common.h
I recently fixed up a lot of commits that forgot to keep the tooling
headers in sync. And then I forgot to do the same thing in commit
cb5f7334d479 ("bpf: add comments to BPF ld/ldx sizes"). Let me correct
that before people notice ;-).

Lawrence did partly fix/sync this for bpf.h in commit d6d4f60c3a09
("bpf: add selftest for tcpbpf").

Fixes: cb5f7334d479 ("bpf: add comments to BPF ld/ldx sizes")
Signed-off-by: Jesper Dangaard Brouer
---
 0 files changed

diff --git a/tools/include/uapi/linux/bpf_common.h b/tools/include/uapi/linux/bpf_common.h
index 18be90725ab0..ee97668bdadb 100644
--- a/tools/include/uapi/linux/bpf_common.h
+++ b/tools/include/uapi/linux/bpf_common.h
@@ -15,9 +15,10 @@
 
 /* ld/ldx fields */
 #define BPF_SIZE(code)  ((code) & 0x18)
-#define		BPF_W		0x00
-#define		BPF_H		0x08
-#define		BPF_B		0x10
+#define		BPF_W		0x00 /* 32-bit */
+#define		BPF_H		0x08 /* 16-bit */
+#define		BPF_B		0x10 /*  8-bit */
+/* eBPF		BPF_DW		0x18    64-bit */
 #define BPF_MODE(code)  ((code) & 0xe0)
 #define		BPF_IMM		0x00
 #define		BPF_ABS		0x20
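As a small illustration of what the size comments document, the sketch below decodes the size field of an opcode byte into a width in bytes. The TOY_ prefix marks local copies of the values shown in the diff, and toy_bpf_size_bytes() is a made-up helper, not something from the kernel headers.

```c
#include <stdint.h>

/* Local copies of the ld/ldx size field values from the diff above. */
#define TOY_BPF_SIZE(code)	((code) & 0x18)
#define TOY_BPF_W		0x00	/* 32-bit */
#define TOY_BPF_H		0x08	/* 16-bit */
#define TOY_BPF_B		0x10	/*  8-bit */
#define TOY_BPF_DW		0x18	/* 64-bit (eBPF only) */

/* Return the access width in bytes encoded in a load/store opcode. */
static int toy_bpf_size_bytes(uint8_t code)
{
	switch (TOY_BPF_SIZE(code)) {
	case TOY_BPF_W:
		return 4;
	case TOY_BPF_H:
		return 2;
	case TOY_BPF_B:
		return 1;
	case TOY_BPF_DW:
		return 8;
	}
	return -1;	/* not reachable: the mask leaves only four values */
}
```

For example, opcode 0x61 has size bits 0x00 and decodes to 4 bytes, while 0x79 has size bits 0x18 and decodes to 8 bytes.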
[bpf-next PATCH 3/5] tools/libbpf: add test program for loading BPF ELF files
Signed-off-by: Jesper Dangaard Brouer--- 0 files changed diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile index 83714ca1f22b..f968702f4ef6 100644 --- a/tools/lib/bpf/Makefile +++ b/tools/lib/bpf/Makefile @@ -147,11 +147,11 @@ LIB_FILE := $(addprefix $(OUTPUT),$(LIB_FILE)) CMD_TARGETS = $(LIB_FILE) -TARGETS = $(CMD_TARGETS) +TARGETS = $(CMD_TARGETS) test_libbpf_open all: fixdep all_cmd -all_cmd: $(CMD_TARGETS) +all_cmd: $(TARGETS) $(BPF_IN): force elfdep bpfdep @(test -f ../../include/uapi/linux/bpf.h -a -f ../../../include/uapi/linux/bpf.h && ( \ @@ -168,6 +168,9 @@ $(OUTPUT)libbpf.so: $(BPF_IN) $(OUTPUT)libbpf.a: $(BPF_IN) $(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^ +test_libbpf_open: test_libbpf_open.c $(OUTPUT)libbpf.a + $(QUIET_LINK)$(CC) -lelf -Wall $< $(OUTPUT)libbpf.a -o $(OUTPUT)$@ + define do_install if [ ! -d '$(DESTDIR_SQ)$2' ]; then \ $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$2'; \ diff --git a/tools/lib/bpf/test_libbpf_open.c b/tools/lib/bpf/test_libbpf_open.c new file mode 100644 index ..8fcd1c076add --- /dev/null +++ b/tools/lib/bpf/test_libbpf_open.c @@ -0,0 +1,150 @@ +/* SPDX-License-Identifier: GPL-2.0 + * Copyright (c) 2018 Jesper Dangaard Brouer, Red Hat Inc. 
+ */ +static const char *__doc__ = + "Libbpf test program for loading BPF ELF object files"; + +#include +#include +#include +#include +#include +#include + +static const struct option long_options[] = { + {"help",no_argument,NULL, 'h' }, + {"debug", no_argument,NULL, 'D' }, + {"quiet", no_argument,NULL, 'q' }, + {0, 0, NULL, 0 } +}; + +static void usage(char *argv[]) +{ + int i; + + printf("\nDOCUMENTATION:\n%s\n\n", __doc__); + printf(" Usage: %s (options-see-below) BPF_FILE\n", argv[0]); + printf(" Listing options:\n"); + for (i = 0; long_options[i].name != 0; i++) { + printf(" --%-12s", long_options[i].name); + printf(" short-option: -%c", + long_options[i].val); + printf("\n"); + } + printf("\n"); +} + +#define DEFINE_PRINT_FN(name, enabled) \ +static int libbpf_##name(const char *fmt, ...) \ +{ \ +va_list args; \ +int ret; \ + \ +va_start(args, fmt); \ + if (enabled) { \ + fprintf(stderr, "[" #name "] ");\ + ret = vfprintf(stderr, fmt, args); \ + } \ +va_end(args); \ +return ret;\ +} +DEFINE_PRINT_FN(warning, 1) +DEFINE_PRINT_FN(info, 1) +DEFINE_PRINT_FN(debug, 1) + +#define EXIT_FAIL_LIBBPF EXIT_FAILURE +#define EXIT_FAIL_OPTION 2 + +int test_walk_progs(struct bpf_object *obj, bool verbose) +{ + struct bpf_program *prog; + int cnt = 0; + + bpf_object__for_each_program(prog, obj) { + cnt++; + if (verbose) + printf("Prog (count:%d) section_name: %s\n", cnt, + bpf_program__title(prog, false)); + } + return 0; +} + +int test_walk_maps(struct bpf_object *obj, bool verbose) +{ + struct bpf_map *map; + int cnt = 0; + + bpf_map__for_each(map, obj) { + cnt++; + if (verbose) + printf("Map (count:%d) name: %s\n", cnt, + bpf_map__name(map)); + } + return 0; +} + +int test_open_file(char *filename, bool verbose) +{ + struct bpf_object *bpfobj = NULL; + long err; + + if (verbose) + printf("Open BPF ELF-file with libbpf: %s\n", filename); + + /* Load BPF ELF object file and check for errors */ + bpfobj = bpf_object__open(filename); + err = libbpf_get_error(bpfobj); + if 
(err) { + char err_buf[128]; + libbpf_strerror(err, err_buf, sizeof(err_buf)); + if (verbose) + printf("Unable to load eBPF objects in file '%s': %s\n", + filename, err_buf); + return EXIT_FAIL_LIBBPF; + } + test_walk_progs(bpfobj, verbose); + test_walk_maps(bpfobj, verbose); + + if (verbose) + printf("Close BPF ELF-file with libbpf: %s\n", + bpf_object__name(bpfobj)); + bpf_object__close(bpfobj); + + return 0; +} + +int main(int argc, char **argv) +{ + char filename[1024] = { 0 }; + bool verbose = 1; + int longindex = 0; + int opt; + +
[bpf-next PATCH 4/5] selftests/bpf: add selftest that use test_libbpf_open
Signed-off-by: Jesper Dangaard Brouer--- tools/testing/selftests/bpf/Makefile |9 +- tools/testing/selftests/bpf/test_libbpf.sh | 45 2 files changed, 53 insertions(+), 1 deletion(-) create mode 100755 tools/testing/selftests/bpf/test_libbpf.sh diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index bf05bc5e36e5..ea2e7d498f5a 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -13,6 +13,7 @@ endif CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(GENDIR) $(GENFLAGS) -I../../../include LDLIBS += -lcap -lelf -lrt -lpthread +# Order correspond to 'make run_tests' order TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \ test_align test_verifier_log test_dev_cgroup test_tcpbpf_user @@ -22,7 +23,11 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \ sample_map_ret0.o test_tcpbpf_kern.o -TEST_PROGS := test_kmod.sh test_xdp_redirect.sh test_xdp_meta.sh \ +# Order correspond to 'make run_tests' order +TEST_PROGS := test_kmod.sh \ + test_libbpf.sh \ + test_xdp_redirect.sh \ + test_xdp_meta.sh \ test_offload.py include ../lib.mk @@ -36,6 +41,8 @@ $(TEST_GEN_PROGS): $(BPFOBJ) # force a rebuild of BPFOBJ when its dependencies are updated force: +$(OUTPUT)/test_libbpf_open: $(BPFOBJ) + $(BPFOBJ): force $(MAKE) -C $(BPFDIR) OUTPUT=$(OUTPUT)/ diff --git a/tools/testing/selftests/bpf/test_libbpf.sh b/tools/testing/selftests/bpf/test_libbpf.sh new file mode 100755 index ..bd623d2cbdb8 --- /dev/null +++ b/tools/testing/selftests/bpf/test_libbpf.sh @@ -0,0 +1,45 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 + +export TESTNAME=test_libbpf + +# Determine selftest success via shell exit code +exit_handler() +{ + if (( $? 
== 0 )); then + echo "selftests: $TESTNAME [PASS]"; + else + echo "$TESTNAME: failed at file $LAST_LOADED" 1>&2 + echo "selftests: $TESTNAME [FAILED]"; + fi +} + +libbpf_open_file() +{ + LAST_LOADED=$1 + ./test_libbpf_open --quiet $1 +} + +# Exit script immediately (well catched by trap handler) if any +# program/thing exits with a non-zero status. +set -e + +# (Use 'trap -l' to list meaning of numbers) +trap exit_handler 0 2 3 6 9 + +libbpf_open_file test_l4lb.o + +# TODO: fix libbpf to load noinline functions +# [warning] libbpf: incorrect bpf_call opcode +#libbpf_open_file test_l4lb_noinline.o + +# TODO: fix test_xdp_meta.c to load with libbpf +# [warning] libbpf: test_xdp_meta.o doesn't provide kernel version +#libbpf_open_file test_xdp_meta.o + +# TODO: fix libbpf to handle .eh_frame +# [warning] libbpf: relocation failed: no section(10) +#libbpf_open_file ../../../../samples/bpf/tracex3_kern.o + +# Success +exit 0
[bpf-next PATCH 0/5] tools/libbpf improvements and selftests
While playing with using libbpf for the Suricata project, we had
issues with LLVM >= 4.0.1 generating ELF files that could not be
loaded with libbpf (tools/lib/bpf/).

During the troubleshooting phase, I wrote a test program and improved
the debugging output in libbpf. I turned this into a selftests
program, and it also serves as a code example for libbpf in itself.

I discovered that there are at least three ELF load issues with
libbpf. I left them as TODO comments in (tools/testing/selftests/bpf)
test_libbpf.sh. I've only fixed the load issue with eh_frames. We can
work on the other issues later.

---

Jesper Dangaard Brouer (5):
      bpf: Sync kernel ABI header with tooling header for bpf_common.h
      tools/libbpf: improve the pr_debug statements to contain section numbers
      tools/libbpf: add test program for loading BPF ELF files
      selftests/bpf: add selftest that use test_libbpf_open
      tools/libbpf: handle issues with bpf ELF objects containing .eh_frames

 tools/testing/selftests/bpf/Makefile       |    9 +-
 tools/testing/selftests/bpf/test_libbpf.sh |   45
 2 files changed, 53 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/bpf/test_libbpf.sh

--
[bpf-next PATCH 5/5] tools/libbpf: handle issues with bpf ELF objects containing .eh_frames
If clang >= 4.0.1 is invoked without the option '-target bpf', llc/llvm
will create two ELF sections for "Exception Frames", with section names
'.eh_frame' and '.rel.eh_frame'.

The BPF ELF loader library libbpf fails when loading files with these
sections. The other in-kernel BPF ELF loader, in samples/bpf/bpf_load.c,
handles this gracefully, and the iproute2 loader also seems to work with
these "eh" sections.

The issue in libbpf is caused by bpf_object__elf_collect() skipping the
'.eh_frame' section and thus not creating an internal data structure
pointing to this ELF section index. Later, when the relocation section
'.rel.eh_frame' is processed, it tries to find '.eh_frame' via the ELF
section idx, which fails (in bpf_object__collect_reloc).

I couldn't find a way to see that '.rel.eh_frame' was irrelevant (that
is only determined by looking at the section it references, for which we
no longer have info available). Thus, my solution is simply to match on
the name of the relocation section and skip that too.

Signed-off-by: Jesper Dangaard Brouer
---
 0 files changed

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b4eeaa3ebff5..84e8bbe07347 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -822,6 +822,13 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 			void *reloc = obj->efile.reloc;
 			int nr_reloc = obj->efile.nr_reloc + 1;
 
+			/* Skip decoding of "eh" exception frames */
+			if (strcmp(name, ".rel.eh_frame") == 0) {
+				pr_debug("skip relo section %s(%d) for section(%d)\n",
+					 name, idx, sh.sh_info);
+				continue;
+			}
+
 			reloc = realloc(reloc,
 					sizeof(*obj->efile.reloc) * nr_reloc);
 			if (!reloc) {
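The name-matching idea from the patch above can be tried outside libbpf. The following is only a minimal sketch: skip_reloc_section() is a hypothetical helper name that does not exist in libbpf, and it assumes the caller has already obtained the section name (for example via elf_strptr(), as the loader does).

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libbpf): decide from the section name
 * alone whether a relocation section should be ignored by the loader.
 */
static bool skip_reloc_section(const char *name)
{
	/*
	 * '.rel.eh_frame' holds relocations for '.eh_frame', a section the
	 * loader never collected, so looking up its target index would fail.
	 */
	return strcmp(name, ".rel.eh_frame") == 0;
}
```

Matching on the exact name keeps relocation sections for real program sections untouched; only the exception-frame relocations are dropped.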
[bpf-next PATCH 2/5] tools/libbpf: improve the pr_debug statements to contain section numbers
While debugging a bpf ELF loading issue, I needed to correlate the ELF
section number with the failed relocation section reference. Thus, add
the section number/index to the pr_debug statements.

In debug mode, also print sections that were skipped. This helped me
identify that a section (.eh_frame) was skipped, and that this was the
reason the relocation section (.rel.eh_frame) could not find that
section number.

The section numbers correspond to the [Nr] column of readelf's Section
Headers output.

Signed-off-by: Jesper Dangaard Brouer
---
 0 files changed

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 30c776375118..b4eeaa3ebff5 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -315,8 +315,8 @@ bpf_program__init(void *data, size_t size, char *section_name, int idx,
 
 	prog->section_name = strdup(section_name);
 	if (!prog->section_name) {
-		pr_warning("failed to alloc name for prog under section %s\n",
-			   section_name);
+		pr_warning("failed to alloc name for prog under section(%d) %s\n",
+			   idx, section_name);
 		goto errout;
 	}
 
@@ -759,29 +759,29 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 
 		idx++;
 		if (gelf_getshdr(scn, &sh) != &sh) {
-			pr_warning("failed to get section header from %s\n",
-				   obj->path);
+			pr_warning("failed to get section(%d) header from %s\n",
+				   idx, obj->path);
 			err = -LIBBPF_ERRNO__FORMAT;
 			goto out;
 		}
 
 		name = elf_strptr(elf, ep->e_shstrndx, sh.sh_name);
 		if (!name) {
-			pr_warning("failed to get section name from %s\n",
-				   obj->path);
+			pr_warning("failed to get section(%d) name from %s\n",
+				   idx, obj->path);
 			err = -LIBBPF_ERRNO__FORMAT;
 			goto out;
 		}
 
 		data = elf_getdata(scn, 0);
 		if (!data) {
-			pr_warning("failed to get section data from %s(%s)\n",
-				   name, obj->path);
+			pr_warning("failed to get section(%d) data from %s(%s)\n",
+				   idx, name, obj->path);
 			err = -LIBBPF_ERRNO__FORMAT;
 			goto out;
 		}
-		pr_debug("section %s, size %ld, link %d, flags %lx, type=%d\n",
-			 name, (unsigned long)data->d_size,
+		pr_debug("section(%d) %s, size %ld, link %d, flags %lx, type=%d\n",
+			 idx, name, (unsigned long)data->d_size,
 			 (int)sh.sh_link, (unsigned long)sh.sh_flags,
 			 (int)sh.sh_type);
 
@@ -836,6 +836,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 				obj->efile.reloc[n].shdr = sh;
 				obj->efile.reloc[n].data = data;
 			}
+		} else {
+			pr_debug("skip section(%d) %s\n", idx, name);
 		}
 		if (err)
 			goto out;
@@ -1115,8 +1117,7 @@ static int bpf_object__collect_reloc(struct bpf_object *obj)
 
 		prog = bpf_object__find_prog_by_idx(obj, idx);
 		if (!prog) {
-			pr_warning("relocation failed: no %d section\n",
-				   idx);
+			pr_warning("relocation failed: no section(%d)\n", idx);
 			return -LIBBPF_ERRNO__RELOC;
 		}
 
Re: [net-next 07/15] i40e: Implement an ethtool private flag to stop LLDP in FW
On Fri, Jan 26, 2018 at 11:24 PM, Jeff Kirsher wrote:
> From: Dave Ertman
>
> Implement the private flag disable-fw-lldp for ethtool
> to disable the processing of LLDP packets by the FW.
> This will stop the FW from consuming LLDPDU and cause
> them to be sent up the stack.
>
> The FW is also being configured to apply a default DCB
> configuration on link up.
>
> Toggling the value of this flag will also cause a PF reset.
>
> Disabling FW DCB will also disable DCBx.

Wait, isn't there a knob in the DCB NL UAPI to state where the DCBx
state machine runs {nowhere, host, firmware}? I am pretty sure there is
one, and if not, why not add it there instead of using private flags?

Or.
[PATCH] net: pxa168_eth: add netconsole support
This implements ndo_poll_controller callback which is necessary to
enable netconsole.

Signed-off-by: Alexander Monakov
Cc: Russell King
Cc: Sebastian Hesselbarth
Cc: Florian Fainelli
---
Hello,

I'm using this to enable netconsole on a consumer device built around the
Marvell Berlin BG2CD SoC.

Thanks.
Alexander

 drivers/net/ethernet/marvell/pxa168_eth.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/marvell/pxa168_eth.c b/drivers/net/ethernet/marvell/pxa168_eth.c
index 7bbd86f08e5f..6a188f7b426a 100644
--- a/drivers/net/ethernet/marvell/pxa168_eth.c
+++ b/drivers/net/ethernet/marvell/pxa168_eth.c
@@ -1362,6 +1362,14 @@ static int pxa168_eth_do_ioctl(struct net_device *dev, struct ifreq *ifr,
 	return -EOPNOTSUPP;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+static void pxa168_eth_netpoll(struct net_device *dev)
+{
+	struct pxa168_eth_private *pep = netdev_priv(dev);
+	napi_schedule(&pep->napi);
+}
+#endif
+
 static void pxa168_get_drvinfo(struct net_device *dev,
 			       struct ethtool_drvinfo *info)
 {
@@ -1390,6 +1398,9 @@ static const struct net_device_ops pxa168_eth_netdev_ops = {
 	.ndo_do_ioctl = pxa168_eth_do_ioctl,
 	.ndo_change_mtu = pxa168_eth_change_mtu,
 	.ndo_tx_timeout = pxa168_eth_tx_timeout,
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	.ndo_poll_controller= pxa168_eth_netpoll,
+#endif
 };
 
 static int pxa168_eth_probe(struct platform_device *pdev)
--
2.11.0
Re: [4.15-rc9] fs_reclaim lockdep trace
On Sat, Jan 27, 2018 at 2:24 PM, Dave Jones wrote:
> On Tue, Jan 23, 2018 at 08:36:51PM -0500, Dave Jones wrote:
> > Just triggered this on a server I was rsync'ing to.
>
> Actually, I can trigger this really easily, even with an rsync from one
> disk to another. Though that also smells a little like networking in
> the traces. Maybe netdev has ideas.

Is this new to 4.15? Or is it just that you're testing something new?

If it's new and easy to repro, can you just bisect it? And if it isn't
new, can you perhaps check whether it's new to 4.14 (ie 4.13 being ok)?

Because that fs_reclaim_acquire/release() debugging isn't new to 4.15,
but it was rewritten for 4.14.. I'm wondering if that remodeling ended
up triggering something.

Adding PeterZ to the participants list in case he has ideas. I'm not
seeing what would be the problem in that call chain from hell.

Linus
Re: [RFC 0/2] hv_netvsc shutdown redo
On Sat, 27 Jan 2018 21:00:12 + Haiyang Zhang wrote:
> In the functions, set_channels and change_mtu, we used to call netvsc_close
> which has a wait for ring buffers to drain. Now, we call rndis_filter_close()
> directly without the wait for rings to drain. Could this be a problem?
>

rndis_filter_close now waits for rings to drain.
Re: [PATCH iproute2] ip: address: fix stats64 JSON object name
On Fri, 26 Jan 2018 11:30:35 -0800 Jakub Kicinski wrote:
> The JSON object name for statistics in ip link show is "stats644".
> Looks like a typo, commit d0e720111aad ("ip: ipaddress.c: add support
> for json output") contains an example with the expected "stats64" name.
>
> The fact that no one has noticed until now is probably an indication
> that no one is using this object. Hopefully it's not too late to fix
> this, although IIUC this has already been in 4.13 and 4.14 releases :S
>
> Fixes: d0e720111aad ("ip: ipaddress.c: add support for json output")
> Signed-off-by: Jakub Kicinski
> ---
> ip/ipaddress.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> index ba60125c1b78..67ac6bd31373 100644
> --- a/ip/ipaddress.c
> +++ b/ip/ipaddress.c
> @@ -598,7 +598,7 @@ static void print_link_stats64(FILE *fp, const struct
> rtnl_link_stats64 *s,
> const struct rtattr *carrier_changes)
> {
> if (is_json_context()) {
> - open_json_object("stats644");
> + open_json_object("stats64");
>
> /* RX stats */
> open_json_object("rx");

Thanks for the bugfix. Applied.
Re: [PATCH iproute2] tc: fix second printing of requeues
On Sat, 27 Jan 2018 01:19:04 -0800 Jakub Kicinski wrote:
> Non-JSON tc qdisc output used to print the "requeues" statistic
> twice. Commit 4fcec7f3665b ("tc: jsonify stats2") tried to preserve
> this behaviour for both standard output and JSON, but used the wrong
> statistic (q.qlen). Also duplicating keys in JSON is not allowed,
> so the second occurrence should be completely skipped with JSON.
>
> Fixes: 4fcec7f3665b ("tc: jsonify stats2")
> Signed-off-by: Jakub Kicinski

Also applied this fix
Re: [4.15-rc9] fs_reclaim lockdep trace
Linus Torvalds wrote: > On Sat, Jan 27, 2018 at 2:24 PM, Dave Joneswrote: >> On Tue, Jan 23, 2018 at 08:36:51PM -0500, Dave Jones wrote: >> > Just triggered this on a server I was rsync'ing to. >> >> Actually, I can trigger this really easily, even with an rsync from one >> disk to another. Though that also smells a little like networking in >> the traces. Maybe netdev has ideas. > > Is this new to 4.15? Or is it just that you're testing something new? > > If it's new and easy to repro, can you just bisect it? And if it isn't > new, can you perhaps check whether it's new to 4.14 (ie 4.13 being > ok)? > > Because that fs_reclaim_acquire/release() debugging isn't new to 4.15, > but it was rewritten for 4.14.. I'm wondering if that remodeling ended > up triggering something. --- linux-4.13.16/mm/page_alloc.c +++ linux-4.14.15/mm/page_alloc.c @@ -3527,53 +3519,12 @@ return true; } return false; } #endif /* CONFIG_COMPACTION */ -#ifdef CONFIG_LOCKDEP -struct lockdep_map __fs_reclaim_map = - STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map); - -static bool __need_fs_reclaim(gfp_t gfp_mask) -{ - gfp_mask = current_gfp_context(gfp_mask); - - /* no reclaim without waiting on it */ - if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) - return false; - - /* this guy won't enter reclaim */ - if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC)) - return false; - - /* We're only interested __GFP_FS allocations for now */ - if (!(gfp_mask & __GFP_FS)) - return false; - - if (gfp_mask & __GFP_NOLOCKDEP) - return false; - - return true; -} - -void fs_reclaim_acquire(gfp_t gfp_mask) -{ - if (__need_fs_reclaim(gfp_mask)) - lock_map_acquire(&__fs_reclaim_map); -} -EXPORT_SYMBOL_GPL(fs_reclaim_acquire); - -void fs_reclaim_release(gfp_t gfp_mask) -{ - if (__need_fs_reclaim(gfp_mask)) - lock_map_release(&__fs_reclaim_map); -} -EXPORT_SYMBOL_GPL(fs_reclaim_release); -#endif - /* Perform direct synchronous page reclaim */ static int __perform_reclaim(gfp_t gfp_mask, unsigned 
int order, const struct alloc_context *ac) { struct reclaim_state reclaim_state; @@ -3582,21 +3533,21 @@ cond_resched(); /* We now go into synchronous reclaim */ cpuset_memory_pressure_bump(); noreclaim_flag = memalloc_noreclaim_save(); - fs_reclaim_acquire(gfp_mask); + lockdep_set_current_reclaim_state(gfp_mask); reclaim_state.reclaimed_slab = 0; current->reclaim_state = _state; progress = try_to_free_pages(ac->zonelist, order, gfp_mask, ac->nodemask); current->reclaim_state = NULL; - fs_reclaim_release(gfp_mask); + lockdep_clear_current_reclaim_state(); memalloc_noreclaim_restore(noreclaim_flag); cond_resched(); return progress; } > > Adding PeterZ to the participants list in case he has ideas. I'm not > seeing what would be the problem in that call chain from hell. > >Linus Dave Jones wrote: > > WARNING: possible recursive locking detected > 4.15.0-rc9-backup-debug+ #1 Not tainted > > sshd/24800 is trying to acquire lock: > (fs_reclaim){+.+.}, at: [<84f438c2>] > fs_reclaim_acquire.part.102+0x5/0x30 > > but task is already holding lock: > (fs_reclaim){+.+.}, at: [<84f438c2>] > fs_reclaim_acquire.part.102+0x5/0x30 > > other info that might help us debug this: > Possible unsafe locking scenario: > >CPU0 > > lock(fs_reclaim); > lock(fs_reclaim); > > *** DEADLOCK *** > > May be due to missing lock nesting notation > > 2 locks held by sshd/24800: > #0: (sk_lock-AF_INET6){+.+.}, at: [<1a069652>] tcp_sendmsg+0x19/0x40 > #1: (fs_reclaim){+.+.}, at: [<84f438c2>] > fs_reclaim_acquire.part.102+0x5/0x30 > > stack backtrace: > CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1 > Call Trace: > dump_stack+0xbc/0x13f > __lock_acquire+0xa09/0x2040 > lock_acquire+0x12e/0x350 > fs_reclaim_acquire.part.102+0x29/0x30 > kmem_cache_alloc+0x3d/0x2c0 > alloc_extent_state+0xa7/0x410 > __clear_extent_bit+0x3ea/0x570 > try_release_extent_mapping+0x21a/0x260 > __btrfs_releasepage+0xb0/0x1c0 > btrfs_releasepage+0x161/0x170 > try_to_release_page+0x162/0x1c0 > 
shrink_page_list+0x1d5a/0x2fb0 > shrink_inactive_list+0x451/0x940 > shrink_node_memcg.constprop.88+0x4c9/0x5e0 > shrink_node+0x12d/0x260 > try_to_free_pages+0x418/0xaf0 > __alloc_pages_slowpath+0x976/0x1790 > __alloc_pages_nodemask+0x52c/0x5c0 > new_slab+0x374/0x3f0 > ___slab_alloc.constprop.81+0x47e/0x5a0 >
Re: [4.15-rc9] fs_reclaim lockdep trace
On 2018/01/28 10:16, Tetsuo Handa wrote:
> Linus Torvalds wrote:
>> On Sat, Jan 27, 2018 at 2:24 PM, Dave Jones wrote:
>>> On Tue, Jan 23, 2018 at 08:36:51PM -0500, Dave Jones wrote:
>>> > Just triggered this on a server I was rsync'ing to.
>>>
>>> Actually, I can trigger this really easily, even with an rsync from one
>>> disk to another. Though that also smells a little like networking in
>>> the traces. Maybe netdev has ideas.
>>
>> Is this new to 4.15? Or is it just that you're testing something new?
>>
>> If it's new and easy to repro, can you just bisect it? And if it isn't
>> new, can you perhaps check whether it's new to 4.14 (ie 4.13 being
>> ok)?
>>
>> Because that fs_reclaim_acquire/release() debugging isn't new to 4.15,
>> but it was rewritten for 4.14.. I'm wondering if that remodeling ended
>> up triggering something.
>
> --- linux-4.13.16/mm/page_alloc.c
> +++ linux-4.14.15/mm/page_alloc.c

Oops. This output was inverted.

> @@ -3527,53 +3519,12 @@
>  return true;
>  }
>  return false;
>  }
>  #endif /* CONFIG_COMPACTION */
>
> -#ifdef CONFIG_LOCKDEP
> -struct lockdep_map __fs_reclaim_map =
> -	STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
> -
> -static bool __need_fs_reclaim(gfp_t gfp_mask)
> -{
> -	gfp_mask = current_gfp_context(gfp_mask);
> -
> -	/* no reclaim without waiting on it */
> -	if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
> -		return false;
> -
> -	/* this guy won't enter reclaim */
> -	if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
> -		return false;

Since __kmalloc_reserve() from __alloc_skb() adds
__GFP_NOMEMALLOC | __GFP_NOWARN to gfp_mask, __need_fs_reclaim() is
failing to return false here.

But why checking __GFP_NOMEMALLOC here? __alloc_pages_slowpath() skips
direct reclaim if !(gfp_mask & __GFP_DIRECT_RECLAIM) or
(current->flags & PF_MEMALLOC), doesn't it?

--
static inline struct page *
__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
		       struct alloc_context *ac)
{
(...snipped...)
	/* Caller is not willing to reclaim, we can't balance anything */
	if (!can_direct_reclaim)
		goto nopage;

	/* Avoid recursion of direct reclaim */
	if (current->flags & PF_MEMALLOC)
		goto nopage;

	/* Try direct reclaim and then allocating */
	page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac,
					    &did_some_progress);
	if (page)
		goto got_pg;
(...snipped...)
}
--
Incoming TCP packet validation (of ACK numbers)
Hi there,

As part of our ongoing research effort to understand the discrepancies
among Linux, macOS (FreeBSD), and Windows, we discovered a violation in
the way Linux handles incoming TCP packets, specifically in ACK number
validation.

According to RFC 793, "If the ACK is a duplicate (SEG.ACK < SND.UNA), it
can be ignored. If the ACK acknowledges something not yet sent (SEG.ACK >
SND.NXT) then send an ACK, drop the segment, and return". In RFC 5961,
the first sentence is changed (for more stringent ACK number validation)
but the second sentence remains the same.

Clearly, when the ACK number of the incoming packet is larger than
SND.NXT, we are supposed to send back an ACK. However, Linux currently
chooses to silently discard the packet without any reply. We have
checked the macOS implementation, which adheres to the specification.

I'd love to hear any thoughts on this.

Best,
-Zhiyun
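The RFC 793 rule quoted above can be written down as a small classifier. This is only an illustrative sketch, not the Linux, macOS, or FreeBSD code path: the names classify_ack, seq_before, seq_after, and the enum values are made up for this example, and the RFC 5961 tightening of the duplicate case is not modeled. Sequence numbers are compared modulo 2^32 so that wraparound is handled.

```c
#include <stdint.h>

/* Wraparound-safe sequence comparisons (arithmetic modulo 2^32). */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

static int seq_after(uint32_t a, uint32_t b)
{
	return seq_before(b, a);
}

/* Hypothetical outcome names for the three cases in the RFC 793 text. */
enum ack_action {
	ACK_IGNORE_DUPLICATE,	/* SEG.ACK < SND.UNA */
	ACK_PROCESS,		/* SND.UNA <= SEG.ACK <= SND.NXT */
	ACK_REPLY_AND_DROP	/* SEG.ACK > SND.NXT: send an ACK, drop segment */
};

static enum ack_action classify_ack(uint32_t seg_ack, uint32_t snd_una,
				    uint32_t snd_nxt)
{
	if (seq_after(seg_ack, snd_nxt))
		return ACK_REPLY_AND_DROP;
	if (seq_before(seg_ack, snd_una))
		return ACK_IGNORE_DUPLICATE;
	return ACK_PROCESS;
}
```

The behaviour reported in this mail corresponds to the ACK_REPLY_AND_DROP case: the segment is dropped, but the ACK that RFC 793 asks for is not sent.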
RE: [RFC crypto v3 8/9] chtls: Register the ULP
-Original Message- From: Dave Watson [mailto:davejwat...@fb.com] Sent: Friday, January 26, 2018 2:39 AM To: Atul GuptaCc: herb...@gondor.apana.org.au; linux-cry...@vger.kernel.org; ganes...@chelsio.co; netdev@vger.kernel.org; da...@davemloft.net; Boris Pismenny ; Ilya Lesokhin Subject: Re: [RFC crypto v3 8/9] chtls: Register the ULP <1513769897-26945-1-git-send-email-atul.gu...@chelsio.com> On 12/20/17 05:08 PM, Atul Gupta wrote: > +static void __init chtls_init_ulp_ops(void) { > + chtls_base_prot = tcp_prot; > + chtls_base_prot.hash= chtls_hash; > + chtls_base_prot.unhash = chtls_unhash; > + chtls_base_prot.close = chtls_lsk_close; > + > + chtls_cpl_prot = chtls_base_prot; > + chtls_init_rsk_ops(_cpl_prot, _rsk_ops, > +_prot, PF_INET); > + chtls_cpl_prot.close= chtls_close; > + chtls_cpl_prot.disconnect = chtls_disconnect; > + chtls_cpl_prot.destroy = chtls_destroy_sock; > + chtls_cpl_prot.shutdown = chtls_shutdown; > + chtls_cpl_prot.sendmsg = chtls_sendmsg; > + chtls_cpl_prot.recvmsg = chtls_recvmsg; > + chtls_cpl_prot.sendpage = chtls_sendpage; > + chtls_cpl_prot.setsockopt = chtls_setsockopt; > + chtls_cpl_prot.getsockopt = chtls_getsockopt; > +} Much of this file should go in tls_main.c, reusing as much as possible. For example it doesn't look like the get/set sockopts have changed at all for chtls. Agree, should common code and anything other than TLS_BASE_TX/TLS_SW_TX prot should go in vendor specific file/driver. 
Since, prot require redefinition for hardware the code is kept in chtls_main.c > + > +static int __init chtls_register(void) { > + chtls_init_ulp_ops(); > + register_listen_notifier(_notifier); > + cxgb4_register_uld(CXGB4_ULD_TLS, _uld_info); > + tcp_register_ulp(_chtls_ulp_ops); > + return 0; > +} > + > +static void __exit chtls_unregister(void) { > + unregister_listen_notifier(_notifier); > + tcp_unregister_ulp(_chtls_ulp_ops); > + chtls_free_all_uld(); > + cxgb4_unregister_uld(CXGB4_ULD_TLS); > +} The idea with ULP is that there is one ULP hook per protocol, not per driver. One thought is that apps/lib calling setsockopt pass the required ulp type [tls or chtls or xtls], this enables any HW assist to define base_prot as required and keep common code [tls_main] independent of underlying HW. If we are to have single TLS ULP hook [good from user point] then need a way to determine which Inline tls hw is used? System with multiple Inline TLS capable hw and differing functionality would require checks in tls_main to exercise that specific functionality/callback?
Re: [4.15-rc9] fs_reclaim lockdep trace
Dave, would you try below patch?

From cae2cbf389ae3cdef1b492622722b4aeb07eb284 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa
Date: Sun, 28 Jan 2018 14:17:14 +0900
Subject: [PATCH] lockdep: Fix fs_reclaim warning.

Dave Jones reported fs_reclaim lockdep warnings.

  WARNING: possible recursive locking detected
  4.15.0-rc9-backup-debug+ #1 Not tainted

  sshd/24800 is trying to acquire lock:
   (fs_reclaim){+.+.}, at: [<84f438c2>] fs_reclaim_acquire.part.102+0x5/0x30

  but task is already holding lock:
   (fs_reclaim){+.+.}, at: [<84f438c2>] fs_reclaim_acquire.part.102+0x5/0x30

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0

    lock(fs_reclaim);
    lock(fs_reclaim);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  2 locks held by sshd/24800:
   #0: (sk_lock-AF_INET6){+.+.}, at: [<1a069652>] tcp_sendmsg+0x19/0x40
   #1: (fs_reclaim){+.+.}, at: [<84f438c2>] fs_reclaim_acquire.part.102+0x5/0x30

  stack backtrace:
  CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1
  Call Trace:
   dump_stack+0xbc/0x13f
   __lock_acquire+0xa09/0x2040
   lock_acquire+0x12e/0x350
   fs_reclaim_acquire.part.102+0x29/0x30
   kmem_cache_alloc+0x3d/0x2c0
   alloc_extent_state+0xa7/0x410
   __clear_extent_bit+0x3ea/0x570
   try_release_extent_mapping+0x21a/0x260
   __btrfs_releasepage+0xb0/0x1c0
   btrfs_releasepage+0x161/0x170
   try_to_release_page+0x162/0x1c0
   shrink_page_list+0x1d5a/0x2fb0
   shrink_inactive_list+0x451/0x940
   shrink_node_memcg.constprop.88+0x4c9/0x5e0
   shrink_node+0x12d/0x260
   try_to_free_pages+0x418/0xaf0
   __alloc_pages_slowpath+0x976/0x1790
   __alloc_pages_nodemask+0x52c/0x5c0
   new_slab+0x374/0x3f0
   ___slab_alloc.constprop.81+0x47e/0x5a0
   __slab_alloc.constprop.80+0x32/0x60
   __kmalloc_track_caller+0x267/0x310
   __kmalloc_reserve.isra.40+0x29/0x80
   __alloc_skb+0xee/0x390
   sk_stream_alloc_skb+0xb8/0x340
   tcp_sendmsg_locked+0x8e6/0x1d30
   tcp_sendmsg+0x27/0x40
   inet_sendmsg+0xd0/0x310
   sock_write_iter+0x17a/0x240
   __vfs_write+0x2ab/0x380
   vfs_write+0xfb/0x260
   SyS_write+0xb6/0x140
   do_syscall_64+0x1e5/0xc05
   entry_SYSCALL64_slow_path+0x25/0x25

Since no fs locks are held, doing GFP_KERNEL allocation should be safe
as long as there is PF_MEMALLOC safeguard

	/* Avoid recursion of direct reclaim */
	if (p->flags & PF_MEMALLOC)
		goto nopage;

which prevents infinite recursion.

This warning seems to be caused by commit d92a8cfcb37ecd13
("locking/lockdep: Rework FS_RECLAIM annotation") which moved the
location of

	/* this guy won't enter reclaim */
	if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
		return false;

check added by commit cf40bd16fdad42c0 ("lockdep: annotate reclaim
context (__GFP_NOFS)"). Since __kmalloc_reserve() from __alloc_skb()
adds __GFP_NOMEMALLOC | __GFP_NOWARN to gfp_mask, __need_fs_reclaim() is
failing to return false despite PF_MEMALLOC context (and resulted in
lockdep warning).

Since there was no PF_MEMALLOC safeguard as of cf40bd16fdad42c0,
checking __GFP_NOMEMALLOC might make sense. But since this safeguard was
added by commit 341ce06f69abfafa ("page allocator: calculate the
alloc_flags for allocation only once"), checking __GFP_NOMEMALLOC no
longer makes sense. Thus, let's remove __GFP_NOMEMALLOC check and allow
__need_fs_reclaim() to return false.

Reported-by: Dave Jones
Signed-off-by: Tetsuo Handa
Cc: Peter Zijlstra
Cc: Nick Piggin
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 76c9688..7804b0e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3583,7 +3583,7 @@ static bool __need_fs_reclaim(gfp_t gfp_mask)
 		return false;
 
 	/* this guy won't enter reclaim */
-	if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
+	if (current->flags & PF_MEMALLOC)
 		return false;
 
 	/* We're only interested __GFP_FS allocations for now */
--
1.8.3.1
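The effect of the one-line change can be seen in a small userspace model of the check. This is a simplified sketch, not kernel code: the DEMO_* bit values are invented for illustration, the task flags are passed as a parameter instead of being read from current, and the current_gfp_context() and __GFP_NOLOCKDEP steps are left out.

```c
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real definitions differ. */
#define DEMO_PF_MEMALLOC	0x1u
#define DEMO_GFP_DIRECT_RECLAIM	0x1u
#define DEMO_GFP_FS		0x2u
#define DEMO_GFP_NOMEMALLOC	0x4u

/* Model of the check before the patch. */
static bool need_fs_reclaim_old(unsigned int task_flags, unsigned int gfp)
{
	/* no reclaim without waiting on it */
	if (!(gfp & DEMO_GFP_DIRECT_RECLAIM))
		return false;
	/* PF_MEMALLOC is only honoured when __GFP_NOMEMALLOC is not set */
	if ((task_flags & DEMO_PF_MEMALLOC) && !(gfp & DEMO_GFP_NOMEMALLOC))
		return false;
	if (!(gfp & DEMO_GFP_FS))
		return false;
	return true;
}

/* Model of the check after the patch: PF_MEMALLOC alone is enough. */
static bool need_fs_reclaim_new(unsigned int task_flags, unsigned int gfp)
{
	if (!(gfp & DEMO_GFP_DIRECT_RECLAIM))
		return false;
	if (task_flags & DEMO_PF_MEMALLOC)
		return false;
	if (!(gfp & DEMO_GFP_FS))
		return false;
	return true;
}
```

With DEMO_PF_MEMALLOC set and a mask that includes DEMO_GFP_NOMEMALLOC, which models the __kmalloc_reserve() case from the report, the old model returns true and the new model returns false; for all other inputs the two agree.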