From: Eric Dumazet
By shuffling some fields around to remove an 8-byte hole,
we can save one cache line.
pahole result before/after the patch:
/* size: 768, cachelines: 12, members: 139 */
/* sum members: 673, holes: 11, sum holes: 39 */
/* padding: 56 */
/* paddings: 2, sum paddings: 7
From: Eric Dumazet
Reduce footprint of sysctls.
Signed-off-by: Eric Dumazet
---
include/net/netns/ipv4.h | 2 +-
net/ipv4/sysctl_net_ipv4.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index
From: Eric Dumazet
Make room for better packing of netns_ipv4
Signed-off-by: Eric Dumazet
---
include/net/netns/ipv4.h | 4 ++--
net/ipv4/sysctl_net_ipv4.c | 8
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index
From: Eric Dumazet
This sysctl is a bool, so it can use less storage.
Signed-off-by: Eric Dumazet
---
include/net/netns/ipv4.h | 2 +-
net/ipv4/sysctl_net_ipv4.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index
From: Eric Dumazet
tcp_comp_sack_nr max value was already 255.
Signed-off-by: Eric Dumazet
---
include/net/netns/ipv4.h | 2 +-
net/ipv4/sysctl_net_ipv4.c | 6 ++
2 files changed, 3 insertions(+), 5 deletions(-)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index
From: Eric Dumazet
Convert most sysctls that can fit in a byte.
Signed-off-by: Eric Dumazet
---
include/net/netns/ipv6.h | 24
net/ipv6/icmp.c| 12 ++--
net/ipv6/sysctl_net_ipv6.c | 38 ++
3 files changed, 36
From: Eric Dumazet
ip6_dst_ops has cache line alignment.
Moving it to the beginning of netns_ipv6
removes a 48-byte hole, and shrinks netns_ipv6
from 12 to 11 cache lines.
Signed-off-by: Eric Dumazet
---
include/net/netns/ipv6.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff
From: Eric Dumazet
Reduce footprint of sysctls.
Signed-off-by: Eric Dumazet
---
include/net/netns/ipv4.h | 2 +-
net/ipv4/sysctl_net_ipv4.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index
From: Eric Dumazet
My previous commits added a dev_hold() in tunnels' ndo_init(),
but forgot to remove it from the special functions setting up fallback tunnels.
Fallback tunnels do call their respective ndo_init().
This leads to various reports like:
unregister_netdevice: waiting for ip6gre0 to
On 4/1/21 8:08 PM, Stephen Hemminger wrote:
> Initial discussion is that this bug is not easily addressable.
> Any fragmentation handler is subject to getting poisoned.
>
> Begin forwarded message:
>
> Date: Wed, 31 Mar 2021 22:39:12 +
> From: bugzilla-dae...@bugzilla.kernel.org
> To: step
On 4/1/21 7:32 PM, Gatis Peisenieks wrote:
> Tx queue cleanup happens in interrupt handler on same core as rx queue
> processing.
> Both can take considerable amount of processing in high packet-per-second
> scenarios.
>
> Sending big amounts of packets can stall the rx processing which is un
From: Eric Dumazet
Xuan Zhuo reported that commit 3226b158e67c ("net: avoid 32 x truesize
under-estimation for tiny skbs") brought a ~10% performance drop.
The reason for the performance drop was that GRO was forced
to chain sk_buff (using skb_shinfo(skb)->frag_list), which
use
On 4/2/21 7:36 PM, Phillip Potter wrote:
> Use memset to initialize two local buffers in net/ipv6/mcast.c,
> and another in net/ipv4/igmp.c. Fixes a KMSAN found uninit-value
> bug reported by syzbot at:
> https://syzkaller.appspot.com/bug?id=0766d38c656abeace60621896d705743aeefed51
According t
On 4/2/21 7:20 PM, Gatis Peisenieks wrote:
> Tx queue cleanup happens in interrupt handler on same core as rx queue
> processing.
> Both can take considerable amount of processing in high packet-per-second
> scenarios.
>
> Sending big amounts of packets can stall the rx processing which is un
From: Eric Dumazet
Order fields to increase locality for the most used protocols.
udplite and icmp are moved to the end.
Same for proc_net_devsnmp6, which is not used in the fast path.
This potentially saves one cache line miss for typical TCP/UDP over IPv4/IPv6.
Signed-off-by: Eric Dumazet
From: Eric Dumazet
Group all the often used fields in the first cache line,
to reduce cache line misses.
Signed-off-by: Eric Dumazet
---
include/net/tcp.h | 42 +++---
1 file changed, 27 insertions(+), 15 deletions(-)
diff --git a/include/net/tcp.h b
On 4/2/21 7:20 PM, Gatis Peisenieks wrote:
> Tx queue cleanup happens in interrupt handler on same core as rx queue
> processing.
> Both can take considerable amount of processing in high packet-per-second
> scenarios.
>
...
> @@ -2504,6 +2537,7 @@ static int atl1c_init_netdev(struct net_de
On 4/2/21 8:10 PM, Phillip Potter wrote:
> On Fri, Apr 02, 2021 at 07:49:44PM +0200, Eric Dumazet wrote:
>>
>>
>> On 4/2/21 7:36 PM, Phillip Potter wrote:
>>> Use memset to initialize two local buffers in net/ipv6/mcast.c,
>>> and another in net/ipv4/i
On 4/2/21 10:53 PM, Eric Dumazet wrote:
>
>
> On 4/2/21 8:10 PM, Phillip Potter wrote:
>> On Fri, Apr 02, 2021 at 07:49:44PM +0200, Eric Dumazet wrote:
>>>
>>>
>>> On 4/2/21 7:36 PM, Phillip Potter wrote:
>>>> Use memset to initialize two
On 3/31/21 4:32 AM, Cong Wang wrote:
> From: Cong Wang
>
> Currently sockmap calls into each protocol to update the struct
> proto and replace it. This certainly won't work when the protocol
> is implemented as a module, for example, AF_UNIX.
>
> Introduce a new ops sk->sk_prot->psock_update_
d uninit-value bug reported by syzbot at:
> https://syzkaller.appspot.com/bug?id=0766d38c656abeace60621896d705743aeefed51
>
> Reported-by: syzbot+001516d86dbe88862...@syzkaller.appspotmail.com
> Signed-off-by: Phillip Potter
> ---
Please give credit to the people who helped.
You could h
d uninit-value bug reported by syzbot at:
> https://syzkaller.appspot.com/bug?id=0766d38c656abeace60621896d705743aeefed51
>
> Reported-by: syzbot+001516d86dbe88862...@syzkaller.appspotmail.com
> Diagnosed-by: Eric Dumazet
> Signed-off-by: Phillip Potter
> ---
SGTM, thanks a lot.
Reviewed-by: Eric Dumazet
On 4/6/21 4:49 PM, Gatis Peisenieks wrote:
> Tx queue cleanup happens in interrupt handler on same core as rx queue
> processing. Both can take considerable amount of processing in high
> packet-per-second scenarios.
>
> Sending big amounts of packets can stall the rx processing which is unfair
On 4/7/21 7:49 AM, Xuan Zhuo wrote:
> In page_to_skb(), if we have enough tailroom to save skb_shared_info, we
> can use build_skb to create skb directly. No need to alloc for
> additional space. And it can save a 'frags slot', which is very friendly
> to GRO.
>
> Here, if the payload of the re
imediately calls teql_destroy() which does not expect
> zero master pointer and we get OOPS.
>
> Signed-off-by: Pavel Tikhomirov
> ---
This makes sense, thanks!
Reviewed-by: Eric Dumazet
I would think the bug origin is
Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation")
Can you confirm you have this backported to 3.10.0-1062.7.1.el7.x86_64?
On 4/9/21 11:14 AM, Xie He wrote:
> On Fri, Apr 9, 2021 at 1:44 AM Mel Gorman wrote:
>>
>> That would imply that the tap was communicating with a swap device to
>> allocate a pfmemalloc skb which shouldn't happen. Furthermore, it would
>> require the swap device to be deactivated while pfmemall
On 4/9/21 11:24 AM, Paolo Abeni wrote:
> On Wed, 2021-04-07 at 11:13 -0700, Jakub Kicinski wrote:
>> On Wed, 07 Apr 2021 16:54:29 +0200 Paolo Abeni wrote:
> I think in the above example even the normal processing will be
> fooled?!? e.g. even without the napi_disable(), napi_thread_wait(
On 4/7/21 2:09 AM, Aditya Pakki wrote:
> In case of rs failure in rds_send_remove_from_sock(), the 'rm' resource
> is freed and later under spinlock, causing potential use-after-free.
> Set the free pointer to NULL to avoid undefined behavior.
>
> Signed-off-by: Aditya Pakki
> ---
> net/rds/m
On 4/9/21 12:14 PM, Xie He wrote:
> On Fri, Apr 9, 2021 at 3:04 AM Eric Dumazet wrote:
>>
>> Note that pfmemalloc skbs are normally dropped in sk_filter_trim_cap()
>>
>> Simply make sure your protocol use it.
>
> It seems "sk_filter_trim_cap" ne
From: Eric Dumazet
div_u64() divides a u64 by a u32.
nft_limit_init() wants to divide a u64 by a u64; use the appropriate
math function (div64_u64).
divide error: [#1] PREEMPT SMP KASAN
CPU: 1 PID: 8390 Comm: syz-executor188 Not tainted 5.12.0-rc4-syzkaller #0
Hardware name: Google Google Compute
On 4/5/21 7:02 PM, Manoj Basapathi wrote:
> Userspace sends tcp connection (sock) destroy on network switch
> i.e switching the default network of the device between multiple
> networks(Cellular/Wifi/Ethernet).
>
> Kernel though doesn't send reset for the connections in SYN-SENT state
> and the
From: Eric Dumazet
This reverts commit e880f8b3a24a73704731a7227ed5fee14bd90192.
1) Patch has not been properly tested, and is wrong [1]
2) Patch submission did not include TCP maintainer (this is me)
[1]
divide error: [#1] PREEMPT SMP KASAN
CPU: 0 PID: 8426 Comm: syz-executor478 Not
On 3/30/21 3:26 PM, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:29ad81a1 arch/x86: add missing include to sparsemem.h
> git tree: https://github.com/google/kmsan.git master
> console output: https://syzkaller.appspot.com/x/log.txt?x=166fe481d0
On 4/12/21 2:38 AM, Matteo Croce wrote:
> From: Matteo Croce
>
> use the new helper macro skb_for_each_frag() which allows to iterate
> through all the SKB fragments.
>
> The patch was created with Coccinelle, this was the semantic patch:
>
> @@
> struct sk_buff *skb;
> identifier i;
> state
From: Eric Dumazet
After commit 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head"),
Guenter Roeck reported one failure in his tests using the sh architecture.
After much debugging, we have been able to spot silent unaligned accesses
in inet_gro_receive().
The issue at hand
On 4/14/21 1:27 AM, Willem de Bruijn wrote:
> On Tue, Apr 13, 2021 at 6:55 PM Xie He wrote:
>>
>> On Tue, Apr 13, 2021 at 1:51 PM Gong, Sishuai wrote:
>>>
>>> Hi,
>>>
>>> We found a data race in linux-5.12-rc3 between af_packet.c functions
>>> fanout_demux_rollover() and __fanout_unlink() and
On 4/14/21 6:52 PM, Eric Dumazet wrote:
>
>
> On 4/14/21 1:27 AM, Willem de Bruijn wrote:
>> On Tue, Apr 13, 2021 at 6:55 PM Xie He wrote:
>>>
>>> On Tue, Apr 13, 2021 at 1:51 PM Gong, Sishuai wrote:
>>>>
>>>> Hi,
>>>
From: Eric Dumazet
af_packet fanout uses RCU rules to ensure f->arr elements
are not dismantled before the RCU grace period.
However, it lacks RCU accessors, which are needed so that KCSAN and other
tools do not detect data races. Stupid compilers could also play games.
Fixes: dc99f600698d ("packet: Ad
On 4/15/21 1:21 AM, Jakub Kicinski wrote:
> On Wed, 14 Apr 2021 03:08:45 -0500 Lijun Pan wrote:
>> There are chances that napi_disable can be called twice by NIC driver.
>> This could generate deadlock. For example,
>> the first napi_disable will spin until NAPI_STATE_SCHED is cleared
>> by napi
On 4/14/21 10:08 AM, Lijun Pan wrote:
> There are chances that napi_disable can be called twice by NIC driver.
> This could generate deadlock. For example,
> the first napi_disable will spin until NAPI_STATE_SCHED is cleared
> by napi_complete_done, then set it again.
> When napi_disable is call
On 4/15/21 8:39 AM, Du Cheng wrote:
> There is a reproducible sequence from the userland that will trigger a
> WARN_ON()
> condition in taprio_get_start_time, which causes kernel to panic if configured
> as "panic_on_warn". Remove this WARN_ON() to prevent kernel from crashing by
> userland-ini
From: Eric Dumazet
Calling copy_to_user() twice for very small regions has very high overhead.
Switch to inlined unsafe_put_user() to save one stac/clac sequence,
and avoid copy_to_user().
Signed-off-by: Eric Dumazet
Cc: Soheil Hassas Yeganeh
---
net/core/scm.c | 21 ++---
1
On 4/15/21 9:50 AM, Du Cheng wrote:
> Le Thu, Apr 15, 2021 at 08:56:09AM +0200, Eric Dumazet a écrit :
>>
>>
>> On 4/15/21 8:39 AM, Du Cheng wrote:
>>> There is a reproducible sequence from the userland that will trigger a
>>> WARN_ON()
>>>
On 4/15/21 9:59 AM, Du Cheng wrote:
> There is a reproducible sequence from the userland that will trigger a
> WARN_ON()
> condition in taprio_get_start_time, which causes kernel to panic if configured
> as "panic_on_warn". Remove this WARN_ON() to prevent kernel from crashing by
> userland-ini
From: Eric Dumazet
We need to store cmlen instead of len in cm->cmsg_len.
Fixes: 38ebcf5096a8 ("scm: optimize put_cmsg()")
Signed-off-by: Eric Dumazet
Reported-by: Jakub Kicinski
---
net/core/scm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/scm
From: Eric Dumazet
net/core/tso.c recently gained USO support, and this broke iwlwifi
because the driver implemented a limited form of GSO.
Providing ->gso_type allows skb_is_gso_tcp() to return
a correct result.
Fixes: 3d5b459ba0e3 ("net: tso: add UDP segmentation support")
From: Eric Dumazet
Rework initial test to jump over init code
if memory allocation has failed.
Signed-off-by: Eric Dumazet
---
net/core/sock.c | 209
1 file changed, 103 insertions(+), 106 deletions(-)
diff --git a/net/core/sock.c b/net/core
On 1/28/21 8:23 AM, Dmitry Vyukov wrote:
> On Thu, Jan 28, 2021 at 3:43 AM Hillf Danton wrote:
>>
>> Init the u64 stats in order to avoid the lockdep prints on the 32bit
>> hardware like
>
> FTR this is not just to avoid lockdep prints, but also to prevent very
> real stalls in production.
Ar
From: Eric Dumazet
Use cache-friendly helpers to make better use of CPU caches
while reading /proc/net/netstat.
Tested on a platform with 256 threads (AMD Rome)
Before: 305 usec spent in netstat_seq_show()
After: 130 usec spent in netstat_seq_show()
Signed-off-by: Eric Dumazet
---
net/ipv4/proc.c
On 1/29/21 8:35 PM, Jakub Kicinski wrote:
> kdoc didn't complain, and as you say it's already a mess, plus it's
> two screen-fulls of scrolling away...
>
> I think converting to inline kdoc of members would be an improvement,
> if you want to sign up for that? Otherwise -EDIDNTCARE on my side
From: Eric Dumazet
inet_gro_receive() and inet_gro_complete() are part
of the GRO engine, which cannot be modular.
Similarly, inet_gso_segment() does not need to be exported,
as it is part of the GSO stack.
In other words, net/ipv6/ip6_offload.o is part of vmlinux,
regardless of CONFIG_IPV6.
Signed-off
On 2/26/21 8:11 PM, Pavel Skripkin wrote:
> syzbot found WARNING in __alloc_pages_nodemask()[1] when order >= MAX_ORDER.
> It was caused by __netdev_alloc_skb(), which doesn't check len value after
> adding NET_SKB_PAD.
> Order will be >= MAX_ORDER and passed to __alloc_pages_nodemask() if size
On 3/1/21 4:58 PM, Stephen Hemminger wrote:
>
>
> Begin forwarded message:
>
> Date: Mon, 01 Mar 2021 11:50:22 +
> From: bugzilla-dae...@bugzilla.kernel.org
> To: step...@networkplumber.org
> Subject: [Bug 212005] New: WARNING: CPU: 1 PID: 356 at net/ipv4/tcp.c:2343
> tcp_recvmsg_locked+
On 3/1/21 5:37 PM, Eric Dumazet wrote:
>
>
> On 3/1/21 4:58 PM, Stephen Hemminger wrote:
>>
>>
>> Begin forwarded message:
>>
>> Date: Mon, 01 Mar 2021 11:50:22 +
>> From: bugzilla-dae...@bugzilla.kernel.org
>> To: step...@networkplumbe
From: Eric Dumazet
Qingyu Li reported a syzkaller bug where the repro
changes RCV SEQ _after_ restoring data in the receive queue.
mprotect(0x4aa000, 12288, PROT_READ)= 0
mmap(0x1000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) =
0x1000
mmap(0x2000, 16777216
ack to non-fast clone skbs, this way
> skb_still_in_host_queue() won't prevent the recovery flow
> from completing.
>
> Suggested-by: Eric Dumazet
> Fixes: 355a901e6cf1 ("tcp: make connect() mem charging friendly")
Hmmm, not sure if this Fixes: tag makes sense.
Really, if we delay TX
On 3/9/21 8:54 AM, Peter Zijlstra wrote:
> On Mon, Mar 08, 2021 at 09:42:08PM +0100, Erhard F. wrote:
>
>> I can confirm that your patch on top of 5.12-rc2 makes the lockdep
>> splat disappear (Ahmeds' 1st patch not installed).
>
> Excellent, I'll queue the below in locking/urgent then.
>
>
On 3/9/21 4:13 PM, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:38b5133a octeontx2-pf: Fix otx2_get_fecparam()
> git tree: net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=166288a8d0
> kernel config: https://syzkaller.appspo
On 3/9/21 6:10 PM, Shay Agroskin wrote:
> The page cache holds pages we allocated in the past during napi cycle,
> and tracks their availability status using page ref count.
>
> The cache can hold up to 2048 pages. Upon allocating a page, we check
> whether the next entry in the cache contains
On 3/9/21 5:43 AM, Tony Lu wrote:
> There are lots of net namespaces on the host runs containers like k8s.
> It is very common to see the same interface names among different net
> namespaces, such as eth0. It is not possible to distinguish them without
> net namespace inode.
>
> This adds net
From: Eric Dumazet
macvlan_count_rx() can be called from process context; it is thus
necessary to disable preemption before calling u64_stats_update_begin().
syzbot was able to spot this on 32bit arch:
WARNING: CPU: 1 PID: 4632 at include/linux/seqlock.h:271 __seqprop_assert
include/linux
From: Eric Dumazet
The iproute2 package is well behaved, but malicious user space can
provide illegal shift values and trigger UBSAN reports.
Add a stab parameter to red_check_params() to validate user input.
syzbot reported:
UBSAN: shift-out-of-bounds in ./include/net/red.h:312:18
shift exponent
On 3/10/21 10:24 AM, Roi Dayan wrote:
>
>
> On 2021-03-08 5:11 AM, Jia-Ju Bai wrote:
>> When slave is NULL or slave_ops->ndo_neigh_setup is NULL, no error
>> return code of bond_neigh_init() is assigned.
>> To fix this bug, ret is assigned with -EINVAL in these cases.
>>
>> Fixes: 9e99bfefdbce
On 3/10/21 3:54 PM, Maxim Mikityanskiy wrote:
> On 2021-03-09 17:20, Eric Dumazet wrote:
>>
>>
>> On 3/9/21 4:13 PM, syzbot wrote:
>>> Hello,
>>>
>>> syzbot found the following issue on:
>>>
>>> HEAD commit: 38b5133a
On 3/10/21 7:55 PM, Maxim Mikityanskiy wrote:
> On 2021-03-10 19:03, Eric Dumazet wrote:
>>
>>
>> On 3/10/21 3:54 PM, Maxim Mikityanskiy wrote:
>>> On 2021-03-09 17:20, Eric Dumazet wrote:
>>>>
>>>>
>>>> On 3/9/21 4:13 PM, syzbo
From: Eric Dumazet
Jakub and Neil reported an increase of RTO timers whenever
TX completions are delayed a bit more (by increasing
NIC TX coalescing parameters).
The main issue is that the TCP stack has logic preventing a packet
from being retransmitted if the prior clone has not yet been
orphaned or freed
From: Eric Dumazet
Jakub and Neil reported an increase of RTO timers whenever
TX completions are delayed a bit more (by increasing
NIC TX coalescing parameters).
While the problems have been there forever, the second patch might
introduce some regressions, so I prefer not to backport
them to stable releases.
From: Eric Dumazet
TSQ provides a nice way to avoid bufferbloat on individual sockets,
including retransmitted packets. We can get rid of the old
heuristic:
/* Do not sent more than we queued. 1/4 is reserved for possible
* copying overhead: fragmentation, tunneling, mangling etc
From: Eric Dumazet
Jakub reported that data included in a Fastopen SYN that had to be
retransmitted would have to wait for an RTO if TX completions are slow,
even with the prior fix.
This is because tcp_rcv_fastopen_synack() does not use standard
rtx logic, meaning the TSQ handler exits early in tcp_tsq_write
From: Eric Dumazet
[ Upstream commit 0f31746452e6793ad6271337438af8f4defb8940 ]
There are a few places where we fetch tp->write_seq while
this field can change from IRQ context or another CPU.
We need to add READ_ONCE() annotations, and also make
sure write sides use corresponding WRITE_ONCE() to av
From: Eric Dumazet
[ Upstream commit 7db48e983930285b765743ebd665aecf9850582b ]
There are a few places where we fetch tp->copied_seq while
this field can change from IRQ context or another CPU.
We need to add READ_ONCE() annotations, and also make
sure write sides use corresponding WRITE_ONCE() to av
From: Eric Dumazet
[ Upstream commit 8811f4a9836e31c14ecdf79d9f3cb7c5d463265d ]
Qingyu Li reported a syzkaller bug where the repro
changes RCV SEQ _after_ restoring data in the receive queue.
mprotect(0x4aa000, 12288, PROT_READ)= 0
mmap(0x1000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED
On 3/12/21 1:10 AM, patchwork-bot+netdev...@kernel.org wrote:
> Hello:
>
> This patch was applied to netdev/net.git (refs/heads/master):
>
> On Thu, 11 Mar 2021 10:57:36 +0800 you wrote:
>> From: Tonghao Zhang
>>
>> Introduce the new function tw_prot_init (inspired by
>> req_prot_init) to sim
From: Eric Dumazet
struct sockaddr_qrtr has a 2-byte hole, and qrtr_recvmsg() currently
does not clear it before copying kernel data to user space.
It might be too late to name the hole since the sockaddr_qrtr structure is uapi.
BUG: KMSAN: kernel-infoleak in kmsan_copy_to_user+0x9c/0xb0
mm/kmsan
From: Eric Dumazet
Before calling tipc_aead_key_size(ptr), we need to ensure
we have enough data to dereference ptr->keylen.
We probably also want to make sure tipc_aead_key_size()
won't overflow with malicious ptr->keylen values.
Syzbot reported:
BUG: KMSAN: uninit-va
On 3/12/21 12:07 AM, Alexander Ovechkin wrote:
> Currently tcp_check_req can be called with obsolete req socket for which big
> socket have been already created (because of CPU race or early demux
> assigning req socket to multiple packets in gro batch).
>
> Commit e0f9759f530bf789e984 ("tcp:
of csump pointer (Alexander Duyck)
>
> Link: https://lore.kernel.org/netdev/20210128152353.GB27281@optiplex/
> Fixes: 950fcaecd5cc ("datagram: consolidate datagram copy to iter helpers")
> Reported-by: Oliver Graute
> Signed-off-by: Willem de Bruijn
> ---
>
Reviewed-by: Eric Dumazet
From: Eric Dumazet
Commit c80794323e82 ("net: Fix packet reordering caused by GRO and
listified RX cooperation") had the unfortunate effect of adding
latencies in common workloads.
Before the patch, GRO packets were immediately passed to
upper stacks.
After the patch, we can accumula
On 2/4/21 10:28 PM, Norbert Slusarek wrote:
> From: Norbert Slusarek
> Date: Thu, 4 Feb 2021 18:49:24 +0100
> Subject: [PATCH] net/vmw_vsock: fix NULL pointer deref and improve locking
>
> In vsock_stream_connect(), a thread will enter schedule_timeout().
> While being scheduled out, another t
On 2/8/21 6:58 PM, Taehee Yoo wrote:
> Currently, struct ip6_sf_socklist doesn't use list API so that code
> shape is a little bit different from others.
> So it converts ip6_sf_socklist to use list API so it would
> improve readability.
>
> Signed-off-by: Taehee Yoo
> ---
> include/net/if_in
From: Eric Dumazet
Even when implementing RFC 6056 3.3.4 (Algorithm 4: Double-Hash
Port Selection Algorithm), a patient attacker could still be able
to collect enough state from an otherwise idle host.
The idea of this patch is to inject some noise in the
cases where __inet_hash_connect() found a
From: Eric Dumazet
This is based on a report from David Dworken.
The first patch implements the RFC 6056 3.3.4 proposal.
The second patch adds a little bit of noise to make
an attacker's life a bit harder.
Eric Dumazet (2):
tcp: change source port randomizarion at connect() time
tcp: add some entropy
From: Eric Dumazet
RFC 6056 (Recommendations for Transport-Protocol Port Randomization)
provides a good summary of why source port selection needs extra care.
David Dworken reminded us that Linux implements Algorithm 3
as described in RFC 6056 3.3.3.
Quoting David :
In the context of the web, this
From: Eric Dumazet
It is simpler to make net->net_cookie a plain u64
written once in setup_net() instead of looping
and using atomic64 helpers.
Lorenz Bauer wants to add the SO_NETNS_COOKIE socket option,
and this patch makes his patch series simpler.
Signed-off-by: Eric Dumazet
Cc: Dan
On 2/10/21 1:04 PM, Lorenz Bauer wrote:
> We need to distinguish which network namespace a socket belongs to.
> BPF has the useful bpf_get_netns_cookie helper for this, but accessing
> it from user space isn't possible. Add a read-only socket option that
> returns the netns cookie, similar to SO
From: Eric Dumazet
tcp_rmem[1] has been changed to 131072; we should update the documentation
to reflect this.
Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around
64KB")
Signed-off-by: Eric Dumazet
Reported-by: Zhibin Liu
Cc: Yuchung Cheng
---
Doc
On 11/13/20 5:08 PM, Yonghong Song wrote:
>
>
> On 11/12/20 9:37 PM, Matt Mullins wrote:
>> On Wed, Nov 11, 2020 at 03:57:50PM +0100, Dmitry Vyukov wrote:
>>> On Mon, Nov 2, 2020 at 12:54 PM syzbot
>>> wrote:
Hello,
syzbot found the following issue on:
HEAD commi
From: Eric Dumazet
I was working on a syzbot issue claiming that one device could not be
dismantled because its refcount was -1:
unregister_netdevice: waiting for sit0 to become free. Usage count = -1
It would be nice if syzbot could trigger a warning at the time
this reference count became
From: Eric Dumazet
I was working on a syzbot issue claiming that one device could not be
dismantled because its refcount was -1:
unregister_netdevice: waiting for sit0 to become free. Usage count = -1
It would be nice if syzbot could trigger a warning at the time
this reference count became
From: Eric Dumazet
When adding CONFIG_PCPU_DEV_REFCNT, I forgot that the
initial net device refcount was 0.
When CONFIG_PCPU_DEV_REFCNT is not set, this means
the first dev_hold() triggers an illegal refcount
operation (addition on 0):
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 0
On 3/24/21 7:18 PM, Andreas Roeseler wrote:
> Modify the icmp_rcv function to check PROBE messages and call icmp_echo
> if a PROBE request is detected.
>
...
> @@ -1340,6 +1440,7 @@ static int __net_init icmp_sk_init(struct net *net)
>
> /* Control parameters for ECHO replies. */
>
From: Eric Dumazet
In commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count")
I used a very small hash table that could be abused
by patient attackers to reveal sensitive information.
Switch to dynamic sizing, depending on RAM size.
Typical big hosts will now use 128x more storage
From: Eric Dumazet
After commit 098a697b497e ("tcp_metrics: Use a single hash table
for all network namespaces."), tcpm_hash_bucket is local to
net/ipv4/tcp_metrics.c
Signed-off-by: Eric Dumazet
---
include/net/netns/ipv4.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/include
On 3/25/21 11:31 AM, Dmitry Vyukov wrote:
> netdev_unregister_timeout_secs=0 can lead to printing the
> "waiting for dev to become free" message every jiffy.
> This is too frequent and unnecessary.
> Set the min value to 1 second.
>
> Signed-off-by: Dmitry Vyukov
On 3/25/21 3:38 PM, Dmitry Vyukov wrote:
> On Thu, Mar 25, 2021 at 3:34 PM Eric Dumazet wrote:
>> On 3/25/21 11:31 AM, Dmitry Vyukov wrote:
>>> netdev_unregister_timeout_secs=0 can lead to printing the
>>> "waiting for dev to become free" message ever
e introduced by
> "net: make unregister netdev warning timeout configurable":
> it changed "refcnt != 1" to "refcnt".
>
> Signed-off-by: Dmitry Vyukov
> Suggested-by: Eric Dumazet
> Fixes: 5aa3afe107d9 ("net: make unregister netdev warning timeout
From: Eric Dumazet
This patch series adds a new sysctl type, to allow using u8 instead of
"int" or "long int" types.
Then we convert most sysctls found in struct netns_ipv4
to shrink it by three cache lines.
Eric Dumazet (5):
sysctl: add proc_dou8vec_minmax()
ipv4: sh
From: Eric Dumazet
Networking has many sysctls that could fit in one u8.
This patch adds proc_dou8vec_minmax() for this purpose.
Note that the .extra1 and .extra2 fields are pointing
to integers, because it makes conversions easier.
Signed-off-by: Eric Dumazet
---
fs/proc/proc_sysctl.c
From: Eric Dumazet
These sysctls, which can fit in one byte instead of one int,
are converted to save space and thus reduce cache line misses.
- icmp_echo_ignore_all, icmp_echo_ignore_broadcasts,
- icmp_ignore_bogus_error_responses, icmp_errors_use_inbound_ifaddr
- tcp_ecn, tcp_ecn_fallback
From: Eric Dumazet
For these sysctls, their dedicated helpers have
to use proc_dou8vec_minmax().
Signed-off-by: Eric Dumazet
---
include/net/netns/ipv4.h | 4 ++--
net/ipv4/sysctl_net_ipv4.c | 8
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/include/net/netns/ipv4