[PATCH net-next 2/9] inet: shrink netns_ipv4 by another cache line

2021-03-31 Thread Eric Dumazet
From: Eric Dumazet By shuffling around some fields to remove 8 bytes of hole, we can save one cache line. pahole result before/after the patch : /* size: 768, cachelines: 12, members: 139 */ /* sum members: 673, holes: 11, sum holes: 39 */ /* padding: 56 */ /* paddings: 2, sum paddings: 7
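The effect of reordering fields to remove holes can be sketched in user space. The two structs below are hypothetical and are not the real netns_ipv4 layout; they only illustrate how interleaving small and large members creates alignment holes that go away once members of similar size are grouped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout (NOT netns_ipv4): each 1-byte flag is followed by
 * an 8-byte member, so the compiler inserts a hole after every flag. */
struct sparse_layout {
    uint8_t  flag_a;     /* 1 byte, followed by an alignment hole */
    uint64_t counter_a;
    uint8_t  flag_b;     /* 1 byte, followed by an alignment hole */
    uint64_t counter_b;
};

/* Same members, large ones first: the two flags now share one slot and
 * only some tail padding remains. */
struct packed_layout {
    uint64_t counter_a;
    uint64_t counter_b;
    uint8_t  flag_a;
    uint8_t  flag_b;
};

/* Bytes recovered purely by reordering members. */
static size_t bytes_saved(void)
{
    return sizeof(struct sparse_layout) - sizeof(struct packed_layout);
}
```

Running pahole on an object file built with debug info reports the same holes and padding for real kernel structures.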

[PATCH net-next 3/9] ipv4: convert fib_notify_on_flag_change sysctl to u8

2021-03-31 Thread Eric Dumazet
From: Eric Dumazet Reduce footprint of sysctls. Signed-off-by: Eric Dumazet --- include/net/netns/ipv4.h | 2 +- net/ipv4/sysctl_net_ipv4.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index

[PATCH net-next 5/9] ipv4: convert fib_multipath_{use_neigh|hash_policy} sysctls to u8

2021-03-31 Thread Eric Dumazet
From: Eric Dumazet Make room for better packing of netns_ipv4 Signed-off-by: Eric Dumazet --- include/net/netns/ipv4.h | 4 ++-- net/ipv4/sysctl_net_ipv4.c | 8 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index

[PATCH net-next 6/9] ipv4: convert igmp_link_local_mcast_reports sysctl to u8

2021-03-31 Thread Eric Dumazet
From: Eric Dumazet This sysctl is a bool, can use less storage. Signed-off-by: Eric Dumazet --- include/net/netns/ipv4.h | 2 +- net/ipv4/sysctl_net_ipv4.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index

[PATCH net-next 7/9] tcp: convert tcp_comp_sack_nr sysctl to u8

2021-03-31 Thread Eric Dumazet
From: Eric Dumazet tcp_comp_sack_nr max value was already 255. Signed-off-by: Eric Dumazet --- include/net/netns/ipv4.h | 2 +- net/ipv4/sysctl_net_ipv4.c | 6 ++ 2 files changed, 3 insertions(+), 5 deletions(-) diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index

[PATCH net-next 8/9] ipv6: convert elligible sysctls to u8

2021-03-31 Thread Eric Dumazet
From: Eric Dumazet Convert most sysctls that can fit in a byte. Signed-off-by: Eric Dumazet --- include/net/netns/ipv6.h | 24 net/ipv6/icmp.c| 12 ++-- net/ipv6/sysctl_net_ipv6.c | 38 ++ 3 files changed, 36

[PATCH net-next 9/9] ipv6: move ip6_dst_ops first in netns_ipv6

2021-03-31 Thread Eric Dumazet
From: Eric Dumazet ip6_dst_ops has cache line alignment. Moving it to the beginning of netns_ipv6 removes a 48-byte hole and shrinks netns_ipv6 from 12 to 11 cache lines. Signed-off-by: Eric Dumazet --- include/net/netns/ipv6.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff
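A user-space sketch of why a cache-line-aligned member placed mid-struct creates a large hole. The struct names, the 64-byte line size, and the member sizes are assumptions chosen for illustration; they are not the real netns_ipv6 definition.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache line size */

/* Hypothetical stand-in for a member that requires cache line alignment. */
struct aligned_ops {
    _Alignas(CACHE_LINE) char body[64];
};

/* Aligned member in the middle: it must start at offset 64, so the
 * 16 bytes before it are followed by a 48-byte hole. */
struct ops_in_middle {
    char small[16];
    struct aligned_ops ops;
    char rest[48];
};

/* Aligned member first: no hole, and the remaining members pack
 * into the following cache line. */
struct ops_first {
    struct aligned_ops ops;
    char small[16];
    char rest[48];
};
```

With these assumed sizes the first layout occupies three cache lines and the second occupies two.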

[PATCH net-next 4/9] ipv4: convert udp_l3mdev_accept sysctl to u8

2021-03-31 Thread Eric Dumazet
From: Eric Dumazet Reduce footprint of sysctls. Signed-off-by: Eric Dumazet --- include/net/netns/ipv4.h | 2 +- net/ipv4/sysctl_net_ipv4.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index

[PATCH net-next] ipv6: remove extra dev_hold() for fallback tunnels

2021-03-31 Thread Eric Dumazet
From: Eric Dumazet My previous commits added a dev_hold() in tunnels ndo_init(), but forgot to remove it from special functions setting up fallback tunnels. Fallback tunnels do call their respective ndo_init() This leads to various reports like : unregister_netdevice: waiting for ip6gre0 to

Re: Fw: [Bug 212515] New: DoS Attack on Fragment Cache

2021-04-01 Thread Eric Dumazet
On 4/1/21 8:08 PM, Stephen Hemminger wrote: > Initial discussion is that this bug is not easily addressable. > Any fragmentation handler is subject to getting poisoned. > > Begin forwarded message: > > Date: Wed, 31 Mar 2021 22:39:12 + > From: bugzilla-dae...@bugzilla.kernel.org > To: step

Re: [PATCH net] atl1c: move tx cleanup processing out of interrupt

2021-04-01 Thread Eric Dumazet
On 4/1/21 7:32 PM, Gatis Peisenieks wrote: > Tx queue cleanup happens in interrupt handler on same core as rx queue > processing. > Both can take considerable amount of processing in high packet-per-second > scenarios. > > Sending big amounts of packets can stall the rx processing which is un

[PATCH net] virtio_net: Do not pull payload in skb->head

2021-04-02 Thread Eric Dumazet
From: Eric Dumazet Xuan Zhuo reported that commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") brought a ~10% performance drop. The reason for the performance drop was that GRO was forced to chain sk_buff (using skb_shinfo(skb)->frag_list), which use

Re: [PATCH] net: initialize local variables in net/ipv6/mcast.c and net/ipv4/igmp.c

2021-04-02 Thread Eric Dumazet
On 4/2/21 7:36 PM, Phillip Potter wrote: > Use memset to initialize two local buffers in net/ipv6/mcast.c, > and another in net/ipv4/igmp.c. Fixes a KMSAN found uninit-value > bug reported by syzbot at: > https://syzkaller.appspot.com/bug?id=0766d38c656abeace60621896d705743aeefed51 According t

Re: [PATCH net v2] atl1c: move tx cleanup processing out of interrupt

2021-04-02 Thread Eric Dumazet
On 4/2/21 7:20 PM, Gatis Peisenieks wrote: > Tx queue cleanup happens in interrupt handler on same core as rx queue > processing. > Both can take considerable amount of processing in high packet-per-second > scenarios. > > Sending big amounts of packets can stall the rx processing which is un

[PATCH net-next] net: reorganize fields in netns_mib

2021-04-02 Thread Eric Dumazet
From: Eric Dumazet Order fields to increase locality for most used protocols. udplite and icmp are moved at the end. Same for proc_net_devsnmp6 which is not used in fast path. This potentially saves one cache line miss for typical TCP/UDP over IPv4/IPv6. Signed-off-by: Eric Dumazet

[PATCH net-next] tcp: reorder tcp_congestion_ops for better cache locality

2021-04-02 Thread Eric Dumazet
From: Eric Dumazet Group all the often used fields in the first cache line, to reduce cache line misses. Signed-off-by: Eric Dumazet --- include/net/tcp.h | 42 +++--- 1 file changed, 27 insertions(+), 15 deletions(-) diff --git a/include/net/tcp.h b

Re: [PATCH net v2] atl1c: move tx cleanup processing out of interrupt

2021-04-02 Thread Eric Dumazet
On 4/2/21 7:20 PM, Gatis Peisenieks wrote: > Tx queue cleanup happens in interrupt handler on same core as rx queue > processing. > Both can take considerable amount of processing in high packet-per-second > scenarios. > ... > @@ -2504,6 +2537,7 @@ static int atl1c_init_netdev(struct net_de

Re: [PATCH] net: initialize local variables in net/ipv6/mcast.c and net/ipv4/igmp.c

2021-04-02 Thread Eric Dumazet
On 4/2/21 8:10 PM, Phillip Potter wrote: > On Fri, Apr 02, 2021 at 07:49:44PM +0200, Eric Dumazet wrote: >> >> >> On 4/2/21 7:36 PM, Phillip Potter wrote: >>> Use memset to initialize two local buffers in net/ipv6/mcast.c, >>> and another in net/ipv4/i

Re: [PATCH] net: initialize local variables in net/ipv6/mcast.c and net/ipv4/igmp.c

2021-04-02 Thread Eric Dumazet
On 4/2/21 10:53 PM, Eric Dumazet wrote: > > > On 4/2/21 8:10 PM, Phillip Potter wrote: >> On Fri, Apr 02, 2021 at 07:49:44PM +0200, Eric Dumazet wrote: >>> >>> >>> On 4/2/21 7:36 PM, Phillip Potter wrote: >>>> Use memset to initialize two

Re: [Patch bpf-next v8 10/16] sock: introduce sk->sk_prot->psock_update_sk_prot()

2021-04-05 Thread Eric Dumazet
On 3/31/21 4:32 AM, Cong Wang wrote: > From: Cong Wang > > Currently sockmap calls into each protocol to update the struct > proto and replace it. This certainly won't work when the protocol > is implemented as a module, for example, AF_UNIX. > > Introduce a new ops sk->sk_prot->psock_update_

Re: [PATCH] net: tun: set tun->dev->addr_len during TUNSETLINK processing

2021-04-06 Thread Eric Dumazet
d uninit-value bug reported by syzbot at: > https://syzkaller.appspot.com/bug?id=0766d38c656abeace60621896d705743aeefed51 > > Reported-by: syzbot+001516d86dbe88862...@syzkaller.appspotmail.com > Signed-off-by: Phillip Potter > --- Please give credits to people who helped. You could h

Re: [PATCH v3] net: tun: set tun->dev->addr_len during TUNSETLINK processing

2021-04-06 Thread Eric Dumazet
d uninit-value bug reported by syzbot at: > https://syzkaller.appspot.com/bug?id=0766d38c656abeace60621896d705743aeefed51 > > Reported-by: syzbot+001516d86dbe88862...@syzkaller.appspotmail.com > Diagnosed-by: Eric Dumazet > Signed-off-by: Phillip Potter > --- SGTM, thanks a lot. Reviewed-by: Eric Dumazet

Re: [PATCH net v4] atl1c: move tx cleanup processing out of interrupt

2021-04-07 Thread Eric Dumazet
On 4/6/21 4:49 PM, Gatis Peisenieks wrote: > Tx queue cleanup happens in interrupt handler on same core as rx queue > processing. Both can take considerable amount of processing in high > packet-per-second scenarios. > > Sending big amounts of packets can stall the rx processing which is unfair

Re: [PATCH net-next] virtio-net: page_to_skb() use build_skb when there's sufficient tailroom

2021-04-07 Thread Eric Dumazet
On 4/7/21 7:49 AM, Xuan Zhuo wrote: > In page_to_skb(), if we have enough tailroom to save skb_shared_info, we > can use build_skb to create skb directly. No need to alloc for > additional space. And it can save a 'frags slot', which is very friendly > to GRO. > > Here, if the payload of the re

Re: [PATCH] net: sched: sch_teql: fix null-pointer dereference

2021-04-08 Thread Eric Dumazet
immediately calls teql_destroy() which does not expect > zero master pointer and we get OOPS. > > Signed-off-by: Pavel Tikhomirov > --- This makes sense, thanks ! Reviewed-by: Eric Dumazet I would think bug origin is Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Can you confirm you have this backported to 3.10.0-1062.7.1.el7.x86_64 ?

Re: Problem in pfmemalloc skb handling in net/core/dev.c

2021-04-09 Thread Eric Dumazet
On 4/9/21 11:14 AM, Xie He wrote: > On Fri, Apr 9, 2021 at 1:44 AM Mel Gorman wrote: >> >> That would imply that the tap was communicating with a swap device to >> allocate a pfmemalloc skb which shouldn't happen. Furthermore, it would >> require the swap device to be deactivated while pfmemall

Re: [PATCH net] net: fix hangup on napi_disable for threaded napi

2021-04-09 Thread Eric Dumazet
On 4/9/21 11:24 AM, Paolo Abeni wrote: > On Wed, 2021-04-07 at 11:13 -0700, Jakub Kicinski wrote: >> On Wed, 07 Apr 2021 16:54:29 +0200 Paolo Abeni wrote: > I think in the above example even the normal processing will be > fooled?!? e.g. even without the napi_disable(), napi_thread_wait(

Re: [PATCH] net/rds: Avoid potential use after free in rds_send_remove_from_sock

2021-04-09 Thread Eric Dumazet
On 4/7/21 2:09 AM, Aditya Pakki wrote: > In case of rs failure in rds_send_remove_from_sock(), the 'rm' resource > is freed and later under spinlock, causing potential use-after-free. > Set the free pointer to NULL to avoid undefined behavior. > > Signed-off-by: Aditya Pakki > --- > net/rds/m

Re: Problem in pfmemalloc skb handling in net/core/dev.c

2021-04-09 Thread Eric Dumazet
On 4/9/21 12:14 PM, Xie He wrote: > On Fri, Apr 9, 2021 at 3:04 AM Eric Dumazet wrote: >> >> Note that pfmemalloc skbs are normally dropped in sk_filter_trim_cap() >> >> Simply make sure your protocol use it. > > It seems "sk_filter_trim_cap" ne

[PATCH net] netfilter: nft_limit: avoid possible divide error in nft_limit_init

2021-04-09 Thread Eric Dumazet
From: Eric Dumazet div_u64() divides u64 by u32. nft_limit_init() wants to divide u64 by u64, use the appropriate math function (div64_u64) divide error: [#1] PREEMPT SMP KASAN CPU: 1 PID: 8390 Comm: syz-executor188 Not tainted 5.12.0-rc4-syzkaller #0 Hardware name: Google Google Compute
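The failure mode can be sketched in user space: when a 64-bit divisor is passed to a helper whose divisor parameter is 32 bits wide, the value is silently narrowed, and any divisor that is a multiple of 2^32 becomes zero. The sketch_* functions below are stand-ins that only mirror the parameter widths of the kernel helpers; they are not the kernel implementations.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in with the same parameter widths as div_u64(): u64 / u32.
 * Calling it with a zero divisor would trap, as in the kernel. */
static uint64_t sketch_div_u64(uint64_t dividend, uint32_t divisor)
{
    return dividend / divisor;
}

/* Stand-in with the same parameter widths as div64_u64(): u64 / u64. */
static uint64_t sketch_div64_u64(uint64_t dividend, uint64_t divisor)
{
    return dividend / divisor;
}

/* What a u64 divisor turns into when it is narrowed to u32. */
static uint32_t narrowed(uint64_t divisor)
{
    return (uint32_t)divisor;
}
```

For example, a divisor of 1ULL << 32 narrows to 0, which is why the 64-by-64 helper is the appropriate choice when the divisor is a u64.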

Re: [PATCH] tcp: Reset tcp connections in SYN-SENT state

2021-04-09 Thread Eric Dumazet
On 4/5/21 7:02 PM, Manoj Basapathi wrote: > Userspace sends tcp connection (sock) destroy on network switch > i.e switching the default network of the device between multiple > networks(Cellular/Wifi/Ethernet). > > Kernel though doesn't send reset for the connections in SYN-SENT state > and the

[PATCH net-next] Revert "tcp: Reset tcp connections in SYN-SENT state"

2021-04-09 Thread Eric Dumazet
From: Eric Dumazet This reverts commit e880f8b3a24a73704731a7227ed5fee14bd90192. 1) Patch has not been properly tested, and is wrong [1] 2) Patch submission did not include TCP maintainer (this is me) [1] divide error: [#1] PREEMPT SMP KASAN CPU: 0 PID: 8426 Comm: syz-executor478 Not

Re: [syzbot] KMSAN: uninit-value in INET_ECN_decapsulate (2)

2021-04-12 Thread Eric Dumazet
On 3/30/21 3:26 PM, syzbot wrote: > Hello, > > syzbot found the following issue on: > > HEAD commit:29ad81a1 arch/x86: add missing include to sparsemem.h > git tree: https://github.com/google/kmsan.git master > console output: https://syzkaller.appspot.com/x/log.txt?x=166fe481d0

Re: [PATCH net-next v2 2/3] net: use skb_for_each_frag() helper where possible

2021-04-12 Thread Eric Dumazet
On 4/12/21 2:38 AM, Matteo Croce wrote: > From: Matteo Croce > > use the new helper macro skb_for_each_frag() which allows to iterate > through all the SKB fragments. > > The patch was created with Coccinelle, this was the semantic patch: > > @@ > struct sk_buff *skb; > identifier i; > state

[PATCH net] gro: ensure frag0 meets IP header alignment

2021-04-13 Thread Eric Dumazet
From: Eric Dumazet After commit 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head") Guenter Roeck reported one failure in his tests using sh architecture. After much debugging, we have been able to spot silent unaligned accesses in inet_gro_receive() The issue at hand

Re: A data race between fanout_demux_rollover() and __fanout_unlink()

2021-04-14 Thread Eric Dumazet
On 4/14/21 1:27 AM, Willem de Bruijn wrote: > On Tue, Apr 13, 2021 at 6:55 PM Xie He wrote: >> >> On Tue, Apr 13, 2021 at 1:51 PM Gong, Sishuai wrote: >>> >>> Hi, >>> >>> We found a data race in linux-5.12-rc3 between af_packet.c functions >>> fanout_demux_rollover() and __fanout_unlink() and

Re: A data race between fanout_demux_rollover() and __fanout_unlink()

2021-04-14 Thread Eric Dumazet
On 4/14/21 6:52 PM, Eric Dumazet wrote: > > > On 4/14/21 1:27 AM, Willem de Bruijn wrote: >> On Tue, Apr 13, 2021 at 6:55 PM Xie He wrote: >>> >>> On Tue, Apr 13, 2021 at 1:51 PM Gong, Sishuai wrote: >>>> >>>> Hi, >>>

[PATCH net] net/packet: remove data races in fanout operations

2021-04-14 Thread Eric Dumazet
From: Eric Dumazet af_packet fanout uses RCU rules to ensure f->arr elements are not dismantled before RCU grace period. However, it lacks rcu accessors to make sure KCSAN and other tools won't detect data races. Stupid compilers could also play games. Fixes: dc99f600698d ("packet: Ad

Re: [PATCH net v2] net: core: make napi_disable more robust

2021-04-14 Thread Eric Dumazet
On 4/15/21 1:21 AM, Jakub Kicinski wrote: > On Wed, 14 Apr 2021 03:08:45 -0500 Lijun Pan wrote: >> There are chances that napi_disable can be called twice by NIC driver. >> This could generate deadlock. For example, >> the first napi_disable will spin until NAPI_STATE_SCHED is cleared >> by napi

Re: [PATCH net v2] net: core: make napi_disable more robust

2021-04-14 Thread Eric Dumazet
On 4/14/21 10:08 AM, Lijun Pan wrote: > There are chances that napi_disable can be called twice by NIC driver. > This could generate deadlock. For example, > the first napi_disable will spin until NAPI_STATE_SCHED is cleared > by napi_complete_done, then set it again. > When napi_disable is call

Re: [PATCH] net: sched: tapr: remove WARN_ON() in taprio_get_start_time()

2021-04-14 Thread Eric Dumazet
On 4/15/21 8:39 AM, Du Cheng wrote: > There is a reproducible sequence from the userland that will trigger a > WARN_ON() > condition in taprio_get_start_time, which causes kernel to panic if configured > as "panic_on_warn". Remove this WARN_ON() to prevent kernel from crashing by > userland-ini

[PATCH net-next] scm: optimize put_cmsg()

2021-04-15 Thread Eric Dumazet
From: Eric Dumazet Calling two copy_to_user() for very small regions has very high overhead. Switch to inlined unsafe_put_user() to save one stac/clac sequence, and avoid copy_to_user(). Signed-off-by: Eric Dumazet Cc: Soheil Hassas Yeganeh --- net/core/scm.c | 21 ++--- 1

Re: [PATCH] net: sched: tapr: remove WARN_ON() in taprio_get_start_time()

2021-04-15 Thread Eric Dumazet
On 4/15/21 9:50 AM, Du Cheng wrote: > On Thu, Apr 15, 2021 at 08:56:09AM +0200, Eric Dumazet wrote: >> >> >> On 4/15/21 8:39 AM, Du Cheng wrote: >>> There is a reproducible sequence from the userland that will trigger a >>> WARN_ON() >>>

Re: [PATCH v2] net: sched: tapr: remove WARN_ON() in taprio_get_start_time()

2021-04-15 Thread Eric Dumazet
On 4/15/21 9:59 AM, Du Cheng wrote: > There is a reproducible sequence from the userland that will trigger a > WARN_ON() > condition in taprio_get_start_time, which causes kernel to panic if configured > as "panic_on_warn". Remove this WARN_ON() to prevent kernel from crashing by > userland-ini

[PATCH net-next] scm: fix a typo in put_cmsg()

2021-04-16 Thread Eric Dumazet
From: Eric Dumazet We need to store cmlen instead of len in cm->cmsg_len. Fixes: 38ebcf5096a8 ("scm: optimize put_cmsg()") Signed-off-by: Eric Dumazet Reported-by: Jakub Kicinski --- net/core/scm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/scm
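The distinction between the two lengths can be sketched with the user-space CMSG_LEN() macro: the value stored in cmsg_len covers the header plus the payload, not the payload alone. The helper below is hypothetical and is not the kernel's put_cmsg(); the assumption that cmlen corresponds to CMSG_LEN(len) is made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: fill a control message header for a payload of
 * `len` bytes. cmsg_len must hold the header-inclusive length. */
static void fill_cmsg_header(struct cmsghdr *cm, int level, int type,
                             size_t len)
{
    cm->cmsg_level = level;
    cm->cmsg_type = type;
    cm->cmsg_len = CMSG_LEN(len);   /* header + payload, not just `len` */
}
```

A receiver walking the control buffer relies on cmsg_len including the header, so storing the bare payload length would make it misparse the message.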

[PATCH net] iwlwifi: provide gso_type to GSO packets

2021-01-25 Thread Eric Dumazet
From: Eric Dumazet net/core/tso.c got recent support for USO, and this broke iwlwifi because the driver implemented a limited form of GSO. Providing ->gso_type allows for skb_is_gso_tcp() to provide a correct result. Fixes: 3d5b459ba0e3 ("net: tso: add UDP segmentation support")

[PATCH net-next] net: reduce indentation level in sk_clone_lock()

2021-01-27 Thread Eric Dumazet
From: Eric Dumazet Rework initial test to jump over init code if memory allocation has failed. Signed-off-by: Eric Dumazet --- net/core/sock.c | 209 1 file changed, 103 insertions(+), 106 deletions(-) diff --git a/net/core/sock.c b/net/core

Re: [PATCH] netdevsim: init u64 stats for 32bit hardware

2021-01-28 Thread Eric Dumazet
On 1/28/21 8:23 AM, Dmitry Vyukov wrote: > On Thu, Jan 28, 2021 at 3:43 AM Hillf Danton wrote: >> >> Init the u64 stats in order to avoid the lockdep prints on the 32bit >> hardware like > > FTR this is not just to avoid lockdep prints, but also to prevent very > real stalls in production. Ar

[PATCH net-next] net: proc: speedup /proc/net/netstat

2021-01-28 Thread Eric Dumazet
From: Eric Dumazet Use cache friendly helpers to better use cpu caches while reading /proc/net/netstat Tested on a platform with 256 threads (AMD Rome) Before: 305 usec spent in netstat_seq_show() After: 130 usec spent in netstat_seq_show() Signed-off-by: Eric Dumazet --- net/ipv4/proc.c

Re: [PATCH net-next V1] net: adjust net_device layout for cacheline usage

2021-01-29 Thread Eric Dumazet
On 1/29/21 8:35 PM, Jakub Kicinski wrote: > kdoc didn't complain, and as you say it's already a mess, plus it's > two screen-fulls of scrolling away... > > I think converting to inline kdoc of members would be an improvement, > if you want to sign up for that? Otherwise -EDIDNTCARE on my side

[PATCH net-next] inet: do not export inet_gro_{receive|complete}

2021-02-02 Thread Eric Dumazet
From: Eric Dumazet inet_gro_receive() and inet_gro_complete() are part of GRO engine which can not be modular. Similarly, inet_gso_segment() does not need to be exported, being part of GSO stack. In other words, net/ipv6/ip6_offload.o is part of vmlinux, regardless of CONFIG_IPV6. Signed-off

Re: [PATCH] net/core/skbuff.c: __netdev_alloc_skb fix when len is greater than KMALLOC_MAX_SIZE

2021-03-01 Thread Eric Dumazet
On 2/26/21 8:11 PM, Pavel Skripkin wrote: > syzbot found WARNING in __alloc_pages_nodemask()[1] when order >= MAX_ORDER. > It was caused by __netdev_alloc_skb(), which doesn't check len value after > adding NET_SKB_PAD. > Order will be >= MAX_ORDER and passed to __alloc_pages_nodemask() if size

Re: Fw: [Bug 212005] New: WARNING: CPU: 1 PID: 356 at net/ipv4/tcp.c:2343 tcp_recvmsg_locked+0x90e/0x29a0

2021-03-01 Thread Eric Dumazet
On 3/1/21 4:58 PM, Stephen Hemminger wrote: > > > Begin forwarded message: > > Date: Mon, 01 Mar 2021 11:50:22 + > From: bugzilla-dae...@bugzilla.kernel.org > To: step...@networkplumber.org > Subject: [Bug 212005] New: WARNING: CPU: 1 PID: 356 at net/ipv4/tcp.c:2343 > tcp_recvmsg_locked+

Re: Fw: [Bug 212005] New: WARNING: CPU: 1 PID: 356 at net/ipv4/tcp.c:2343 tcp_recvmsg_locked+0x90e/0x29a0

2021-03-01 Thread Eric Dumazet
On 3/1/21 5:37 PM, Eric Dumazet wrote: > > > On 3/1/21 4:58 PM, Stephen Hemminger wrote: >> >> >> Begin forwarded message: >> >> Date: Mon, 01 Mar 2021 11:50:22 + >> From: bugzilla-dae...@bugzilla.kernel.org >> To: step...@networkplumbe

[PATCH net] tcp: add sanity tests to TCP_QUEUE_SEQ

2021-03-01 Thread Eric Dumazet
From: Eric Dumazet Qingyu Li reported a syzkaller bug where the repro changes RCV SEQ _after_ restoring data in the receive queue. mprotect(0x4aa000, 12288, PROT_READ)= 0 mmap(0x1000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1000 mmap(0x2000, 16777216

Re: [PATCH net] net: tcp: don't allocate fast clones for fastopen SYN

2021-03-02 Thread Eric Dumazet
ack to non-fast clone skbs, this way > skb_still_in_host_queue() won't prevent the recovery flow > from completing. > > Suggested-by: Eric Dumazet > Fixes: 355a901e6cf1 ("tcp: make connect() mem charging friendly") Hmmm, not sure if this Fixes: tag makes sense. Really, if we delay TX

Re: seqlock lockdep false positives?

2021-03-09 Thread Eric Dumazet
On 3/9/21 8:54 AM, Peter Zijlstra wrote: > On Mon, Mar 08, 2021 at 09:42:08PM +0100, Erhard F. wrote: > >> I can confirm that your patch on top of 5.12-rc2 makes the lockdep >> splat disappear (Ahmeds' 1st patch not installed). > > Excellent, I'll queue the below in locking/urgent then. > >

Re: [syzbot] BUG: unable to handle kernel NULL pointer dereference in htb_select_queue

2021-03-09 Thread Eric Dumazet
On 3/9/21 4:13 PM, syzbot wrote: > Hello, > > syzbot found the following issue on: > > HEAD commit:38b5133a octeontx2-pf: Fix otx2_get_fecparam() > git tree: net-next > console output: https://syzkaller.appspot.com/x/log.txt?x=166288a8d0 > kernel config: https://syzkaller.appspo

Re: [RFC Patch v1 1/3] net: ena: implement local page cache (LPC) system

2021-03-09 Thread Eric Dumazet
On 3/9/21 6:10 PM, Shay Agroskin wrote: > The page cache holds pages we allocated in the past during napi cycle, > and tracks their availability status using page ref count. > > The cache can hold up to 2048 pages. Upon allocating a page, we check > whether the next entry in the cache contains

Re: [PATCH] net: add net namespace inode for all net_dev events

2021-03-09 Thread Eric Dumazet
On 3/9/21 5:43 AM, Tony Lu wrote: > There are lots of net namespaces on the host runs containers like k8s. > It is very common to see the same interface names among different net > namespaces, such as eth0. It is not possible to distinguish them without > net namespace inode. > > This adds net

[PATCH net] macvlan: macvlan_count_rx() needs to be aware of preemption

2021-03-10 Thread Eric Dumazet
From: Eric Dumazet macvlan_count_rx() can be called from process context, it is thus necessary to disable preemption before calling u64_stats_update_begin() syzbot was able to spot this on 32bit arch: WARNING: CPU: 1 PID: 4632 at include/linux/seqlock.h:271 __seqprop_assert include/linux

[PATCH net] net: sched: validate stab values

2021-03-10 Thread Eric Dumazet
From: Eric Dumazet iproute2 package is well behaved, but malicious user space can provide illegal shift values and trigger UBSAN reports. Add stab parameter to red_check_params() to validate user input. syzbot reported: UBSAN: shift-out-of-bounds in ./include/net/red.h:312:18 shift exponent
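The kind of check involved can be sketched as follows: a shift count taken from untrusted input must be smaller than the width of the shifted type, otherwise the shift is undefined behavior and UBSAN reports it. The helper name and the 32-bit width are assumptions for illustration; this is not the actual red_check_params() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical validator: reject shift counts that are out of range
 * for a 32-bit value instead of performing an undefined shift. */
static bool scale_by_shift(uint32_t value, uint32_t shift, uint32_t *out)
{
    if (shift >= 32)
        return false;       /* would be shift-out-of-bounds */
    *out = value << shift;
    return true;
}
```

Rejecting the value at configuration time keeps the fast path free of per-packet checks.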

Re: [PATCH] net: bonding: fix error return code of bond_neigh_init()

2021-03-10 Thread Eric Dumazet
On 3/10/21 10:24 AM, Roi Dayan wrote: > > > On 2021-03-08 5:11 AM, Jia-Ju Bai wrote: >> When slave is NULL or slave_ops->ndo_neigh_setup is NULL, no error >> return code of bond_neigh_init() is assigned. >> To fix this bug, ret is assigned with -EINVAL in these cases. >> >> Fixes: 9e99bfefdbce

Re: [syzbot] BUG: unable to handle kernel NULL pointer dereference in htb_select_queue

2021-03-10 Thread Eric Dumazet
On 3/10/21 3:54 PM, Maxim Mikityanskiy wrote: > On 2021-03-09 17:20, Eric Dumazet wrote: >> >> >> On 3/9/21 4:13 PM, syzbot wrote: >>> Hello, >>> >>> syzbot found the following issue on: >>> >>> HEAD commit:    38b5133a

Re: [syzbot] BUG: unable to handle kernel NULL pointer dereference in htb_select_queue

2021-03-10 Thread Eric Dumazet
On 3/10/21 7:55 PM, Maxim Mikityanskiy wrote: > On 2021-03-10 19:03, Eric Dumazet wrote: >> >> >> On 3/10/21 3:54 PM, Maxim Mikityanskiy wrote: >>> On 2021-03-09 17:20, Eric Dumazet wrote: >>>> >>>> >>>> On 3/9/21 4:13 PM, syzbo

[PATCH net-next 1/3] tcp: plug skb_still_in_host_queue() to TSQ

2021-03-11 Thread Eric Dumazet
From: Eric Dumazet Jakub and Neil reported an increase of RTO timers whenever TX completions are delayed a bit more (by increasing NIC TX coalescing parameters). Main issue is that TCP stack has logic preventing a packet from being retransmitted if the prior clone has not yet been orphaned or freed

[PATCH net-next 0/3] tcp: better deal with delayed TX completions

2021-03-11 Thread Eric Dumazet
From: Eric Dumazet Jakub and Neil reported an increase of RTO timers whenever TX completions are delayed a bit more (by increasing NIC TX coalescing parameters) While problems have been there forever, second patch might introduce some regressions so I prefer not backport them to stable releases

[PATCH net-next 3/3] tcp: remove obsolete check in __tcp_retransmit_skb()

2021-03-11 Thread Eric Dumazet
From: Eric Dumazet TSQ provides a nice way to avoid bufferbloat on individual socket, including retransmit packets. We can get rid of the old heuristic: /* Do not sent more than we queued. 1/4 is reserved for possible * copying overhead: fragmentation, tunneling, mangling etc

[PATCH net-next 2/3] tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()

2021-03-11 Thread Eric Dumazet
From: Eric Dumazet Jakub reported Data included in a Fastopen SYN that had to be retransmit would have to wait for an RTO if TX completions are slow, even with prior fix. This is because tcp_rcv_fastopen_synack() does not use standard rtx logic, meaning TSQ handler exits early in tcp_tsq_write

[PATCH 4.19-stable 2/3] tcp: annotate tp->write_seq lockless reads

2021-03-12 Thread Eric Dumazet
From: Eric Dumazet [ Upstream commit 0f31746452e6793ad6271337438af8f4defb8940 ] There are few places where we fetch tp->write_seq while this field can change from IRQ or other cpu. We need to add READ_ONCE() annotations, and also make sure write sides use corresponding WRITE_ONCE() to av

[PATCH 4.19-stable 1/3] tcp: annotate tp->copied_seq lockless reads

2021-03-12 Thread Eric Dumazet
From: Eric Dumazet [ Upstream commit 7db48e983930285b765743ebd665aecf9850582b ] There are few places where we fetch tp->copied_seq while this field can change from IRQ or other cpu. We need to add READ_ONCE() annotations, and also make sure write sides use corresponding WRITE_ONCE() to av

[PATCH 4.19-stable 3/3] tcp: add sanity tests to TCP_QUEUE_SEQ

2021-03-12 Thread Eric Dumazet
From: Eric Dumazet [ Upstream commit 8811f4a9836e31c14ecdf79d9f3cb7c5d463265d ] Qingyu Li reported a syzkaller bug where the repro changes RCV SEQ _after_ restoring data in the receive queue. mprotect(0x4aa000, 12288, PROT_READ)= 0 mmap(0x1000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED

Re: [PATCH net-next v2] net: sock: simplify tw proto registration

2021-03-12 Thread Eric Dumazet
On 3/12/21 1:10 AM, patchwork-bot+netdev...@kernel.org wrote: > Hello: > > This patch was applied to netdev/net.git (refs/heads/master): > > On Thu, 11 Mar 2021 10:57:36 +0800 you wrote: >> From: Tonghao Zhang >> >> Introduce the new function tw_prot_init (inspired by >> req_prot_init) to sim

[PATCH net] net: qrtr: fix a kernel-infoleak in qrtr_recvmsg()

2021-03-12 Thread Eric Dumazet
From: Eric Dumazet struct sockaddr_qrtr has a 2-byte hole, and qrtr_recvmsg() currently does not clear it before copying kernel data to user space. It might be too late to name the hole since sockaddr_qrtr structure is uapi. BUG: KMSAN: kernel-infoleak in kmsan_copy_to_user+0x9c/0xb0 mm/kmsan
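A user-space sketch of the usual remedy for this class of leak: zero the whole structure before filling its members, so the compiler-inserted hole never carries stale bytes. The struct below is hypothetical, only shaped to have a 2-byte hole after a 16-bit member; it is not the uapi sockaddr_qrtr definition, and the family value is a placeholder.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical address struct: a 16-bit member followed by 32-bit
 * members leaves a 2-byte hole at offsets 2..3. */
struct addr_with_hole {
    uint16_t family;
    uint32_t node;
    uint32_t port;
};

/* Clear every byte, including the hole, before setting members. */
static void fill_addr(struct addr_with_hole *a, uint32_t node, uint32_t port)
{
    memset(a, 0, sizeof(*a));
    a->family = 42;     /* placeholder value, not a real address family */
    a->node = node;
    a->port = port;
}
```

Without the memset(), the bytes in the hole keep whatever was previously on the stack, and copying the struct out wholesale would expose them.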

[PATCH net] tipc: better validate user input in tipc_nl_retrieve_key()

2021-03-15 Thread Eric Dumazet
From: Eric Dumazet Before calling tipc_aead_key_size(ptr), we need to ensure we have enough data to dereference ptr->keylen. We probably also want to make sure tipc_aead_key_size() won't overflow with malicious ptr->keylen values. Syzbot reported: BUG: KMSAN: uninit-va

Re: [PATCH] tcp: relookup sock for RST+ACK packets handled by obsolete req sock

2021-03-15 Thread Eric Dumazet
On 3/12/21 12:07 AM, Alexander Ovechkin wrote: > Currently tcp_check_req can be called with obsolete req socket for which big > socket have been already created (because of CPU race or early demux > assigning req socket to multiple packets in gro batch). > > Commit e0f9759f530bf789e984 ("tcp:

Re: [PATCH net v2] udp: fix skb_copy_and_csum_datagram with odd segment sizes

2021-02-04 Thread Eric Dumazet
of csump pointer (Alexander Duyck) > > Link: https://lore.kernel.org/netdev/20210128152353.GB27281@optiplex/ > Fixes: 950fcaecd5cc ("datagram: consolidate datagram copy to iter helpers") > Reported-by: Oliver Graute > Signed-off-by: Willem de Bruijn > --- > Reviewed-by: Eric Dumazet

[PATCH net] net: gro: do not keep too many GRO packets in napi->rx_list

2021-02-04 Thread Eric Dumazet
From: Eric Dumazet Commit c80794323e82 ("net: Fix packet reordering caused by GRO and listified RX cooperation") had the unfortunate effect of adding latencies in common workloads. Before the patch, GRO packets were immediately passed to upper stacks. After the patch, we can accumula

Re: [PATCH] net/vmw_vsock: fix NULL pointer deref and improve locking

2021-02-04 Thread Eric Dumazet
On 2/4/21 10:28 PM, Norbert Slusarek wrote: > From: Norbert Slusarek > Date: Thu, 4 Feb 2021 18:49:24 +0100 > Subject: [PATCH] net/vmw_vsock: fix NULL pointer deref and improve locking > > In vsock_stream_connect(), a thread will enter schedule_timeout(). > While being scheduled out, another t

Re: [PATCH net-next 7/8] mld: convert ip6_sf_socklist to list macros

2021-02-08 Thread Eric Dumazet
On 2/8/21 6:58 PM, Taehee Yoo wrote: > Currently, struct ip6_sf_socklist doesn't use list API so that code > shape is a little bit different from others. > So it converts ip6_sf_socklist to use list API so it would > improve readability. > > Signed-off-by: Taehee Yoo > --- > include/net/if_in

[PATCH net-next 2/2] tcp: add some entropy in __inet_hash_connect()

2021-02-09 Thread Eric Dumazet
From: Eric Dumazet Even when implementing RFC 6056 3.3.4 (Algorithm 4: Double-Hash Port Selection Algorithm), a patient attacker could still be able to collect enough state from an otherwise idle host. Idea of this patch is to inject some noise, in the cases __inet_hash_connect() found a

[PATCH net-next 0/2] tcp: RFC 6056 induced changes

2021-02-09 Thread Eric Dumazet
From: Eric Dumazet This is based on a report from David Dworken. First patch implements RFC 6056 3.3.4 proposal. Second patch is adding a little bit of noise to make attacker life a bit harder. Eric Dumazet (2): tcp: change source port randomizarion at connect() time tcp: add some entropy

[PATCH net-next 1/2] tcp: change source port randomizarion at connect() time

2021-02-09 Thread Eric Dumazet
From: Eric Dumazet RFC 6056 (Recommendations for Transport-Protocol Port Randomization) provides good summary of why source selection needs extra care. David Dworken reminded us that linux implements Algorithm 3 as described in RFC 6056 3.3.3 Quoting David : In the context of the web, this

[PATCH net-next] net: initialize net->net_cookie at netns setup

2021-02-10 Thread Eric Dumazet
From: Eric Dumazet It is simpler to make net->net_cookie a plain u64 written once in setup_net() instead of looping and using atomic64 helpers. Lorenz Bauer wants to add the SO_NETNS_COOKIE socket option, and this patch would make his patch series simpler. Signed-off-by: Eric Dumazet Cc: Dan

Re: [PATCH bpf 1/4] net: add SO_NETNS_COOKIE socket option

2021-02-10 Thread Eric Dumazet
On 2/10/21 1:04 PM, Lorenz Bauer wrote: > We need to distinguish which network namespace a socket belongs to. > BPF has the useful bpf_get_netns_cookie helper for this, but accessing > it from user space isn't possible. Add a read-only socket option that > returns the netns cookie, similar to SO

[PATCH net] tcp: fix tcp_rmem documentation

2021-02-10 Thread Eric Dumazet
From: Eric Dumazet tcp_rmem[1] has been changed to 131072, we should update the documentation to reflect this. Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB") Signed-off-by: Eric Dumazet Reported-by: Zhibin Liu Cc: Yuchung Cheng --- Doc

Re: KASAN: vmalloc-out-of-bounds Read in bpf_trace_run3

2021-02-10 Thread Eric Dumazet
On 11/13/20 5:08 PM, Yonghong Song wrote: > > > On 11/12/20 9:37 PM, Matt Mullins wrote: >> On Wed, Nov 11, 2020 at 03:57:50PM +0100, Dmitry Vyukov wrote: >>> On Mon, Nov 2, 2020 at 12:54 PM syzbot >>> wrote: Hello, syzbot found the following issue on: HEAD commi

[PATCH net-next] net: add CONFIG_PCPU_DEV_REFCNT

2021-03-19 Thread Eric Dumazet
From: Eric Dumazet I was working on a syzbot issue, claiming one device could not be dismantled because its refcount was -1 unregister_netdevice: waiting for sit0 to become free. Usage count = -1 It would be nice if syzbot could trigger a warning at the time this reference count became

[PATCH v2 net-next] net: add CONFIG_PCPU_DEV_REFCNT

2021-03-19 Thread Eric Dumazet
From: Eric Dumazet I was working on a syzbot issue, claiming one device could not be dismantled because its refcount was -1 unregister_netdevice: waiting for sit0 to become free. Usage count = -1 It would be nice if syzbot could trigger a warning at the time this reference count became

[PATCH net-next] net: set initial device refcount to 1

2021-03-22 Thread Eric Dumazet
From: Eric Dumazet When adding CONFIG_PCPU_DEV_REFCNT, I forgot that the initial net device refcount was 0. When CONFIG_PCPU_DEV_REFCNT is not set, this means the first dev_hold() triggers an illegal refcount operation (addition on 0) refcount_t: addition on 0; use-after-free. WARNING: CPU: 0

Re: [PATCH net-next V5 6/6] icmp: add response to RFC 8335 PROBE messages

2021-03-24 Thread Eric Dumazet
On 3/24/21 7:18 PM, Andreas Roeseler wrote: > Modify the icmp_rcv function to check PROBE messages and call icmp_echo > if a PROBE request is detected. > ... > @@ -1340,6 +1440,7 @@ static int __net_init icmp_sk_init(struct net *net) > > /* Control parameters for ECHO replies. */ >

[PATCH net-next] inet: use bigger hash table for IP ID generation

2021-03-24 Thread Eric Dumazet
From: Eric Dumazet In commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count") I used a very small hash table that could be abused by patient attackers to reveal sensitive information. Switch to a dynamic sizing, depending on RAM size. Typical big hosts will now use 128x more storage

[PATCH net-next] tcp_metrics: tcpm_hash_bucket is strictly local

2021-03-24 Thread Eric Dumazet
From: Eric Dumazet After commit 098a697b497e ("tcp_metrics: Use a single hash table for all network namespaces."), tcpm_hash_bucket is local to net/ipv4/tcp_metrics.c Signed-off-by: Eric Dumazet --- include/net/netns/ipv4.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include

Re: [PATCH] net: change netdev_unregister_timeout_secs min value to 1

2021-03-25 Thread Eric Dumazet
On 3/25/21 11:31 AM, Dmitry Vyukov wrote: > netdev_unregister_timeout_secs=0 can lead to printing the > "waiting for dev to become free" message every jiffy. > This is too frequent and unnecessary. > Set the min value to 1 second. > > Signed-off-by: Dmitry Vyukov

Re: [PATCH] net: change netdev_unregister_timeout_secs min value to 1

2021-03-25 Thread Eric Dumazet
On 3/25/21 3:38 PM, Dmitry Vyukov wrote: > On Thu, Mar 25, 2021 at 3:34 PM Eric Dumazet wrote: >> On 3/25/21 11:31 AM, Dmitry Vyukov wrote: >>> netdev_unregister_timeout_secs=0 can lead to printing the >>> "waiting for dev to become free" message ever

Re: [PATCH net-next v2] net: change netdev_unregister_timeout_secs min value to 1

2021-03-25 Thread Eric Dumazet
e introduced by > "net: make unregister netdev warning timeout configurable": > it changed "refcnt != 1" to "refcnt". > > Signed-off-by: Dmitry Vyukov > Suggested-by: Eric Dumazet > Fixes: 5aa3afe107d9 ("net: make unregister netdev warning timeout

[PATCH net-next 0/5] net: use less storage for most sysctl

2021-03-25 Thread Eric Dumazet
From: Eric Dumazet This patch series adds a new sysctl type, to allow using u8 instead of "int" or "long int" types. Then we convert most sysctls found in struct netns_ipv4 to shrink it by three cache lines. Eric Dumazet (5): sysctl: add proc_dou8vec_minmax() ipv4: sh

[PATCH net-next 1/5] sysctl: add proc_dou8vec_minmax()

2021-03-25 Thread Eric Dumazet
From: Eric Dumazet Networking has many sysctls that could fit in one u8. This patch adds proc_dou8vec_minmax() for this purpose. Note that the .extra1 and .extra2 fields are pointing to integers, because it makes conversions easier. Signed-off-by: Eric Dumazet --- fs/proc/proc_sysctl.c
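
As a rough illustration of the note about .extra1 and .extra2, here is a userspace sketch of a u8-backed setting whose optional bounds are still plain ints. It is not the kernel's proc_dou8vec_minmax(); the struct u8_knob type and the u8_knob_write() helper are hypothetical names invented for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: storage is one byte, while the optional bounds
 * remain pointers to int, mirroring the .extra1/.extra2 remark above. */
struct u8_knob {
    uint8_t *data;      /* where the value lives */
    const int *extra1;  /* optional minimum, NULL means 0 */
    const int *extra2;  /* optional maximum, NULL means 255 */
};

/* Validate a parsed value against the bounds, then store it in one byte.
 * Returns 0 on success or -EINVAL when the value is out of range. */
static int u8_knob_write(const struct u8_knob *knob, long val)
{
    int min = knob->extra1 ? *knob->extra1 : 0;
    int max = knob->extra2 ? *knob->extra2 : 255;

    /* Never accept bounds wider than what a u8 can hold. */
    if (min < 0)
        min = 0;
    if (max > 255)
        max = 255;

    if (val < min || val > max)
        return -EINVAL;

    *knob->data = (uint8_t)val;
    return 0;
}
```

Keeping the bounds as ints means existing shared constants for 0 and 1 can be reused unchanged when a boolean-style setting is narrowed from int to u8; only the storage type changes.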

[PATCH net-next 2/5] ipv4: shrink netns_ipv4 with sysctl conversions

2021-03-25 Thread Eric Dumazet
From: Eric Dumazet Sysctls that can fit in one byte instead of one int are converted to save space and thus reduce cache line misses. - icmp_echo_ignore_all, icmp_echo_ignore_broadcasts, - icmp_ignore_bogus_error_responses, icmp_errors_use_inbound_ifaddr - tcp_ecn, tcp_ecn_fallback

[PATCH net-next 4/5] inet: convert tcp_early_demux and udp_early_demux to u8

2021-03-25 Thread Eric Dumazet
From: Eric Dumazet For these sysctls, their dedicated helpers have to use proc_dou8vec_minmax(). Signed-off-by: Eric Dumazet --- include/net/netns/ipv4.h | 4 ++-- net/ipv4/sysctl_net_ipv4.c | 8 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/include/net/netns/ipv4
