Eric Dumazet (2):
tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE
include/uapi/linux/tcp.h | 8 ++
net/ipv4/tcp.c | 186 +
tools/testing/selftests/net/tcp_mmap.c
On 04/24/2018 11:28 PM, Christoph Hellwig wrote:
> On Tue, Apr 24, 2018 at 10:27:21PM -0700, Eric Dumazet wrote:
>> When adding tcp mmap() implementation, I forgot that socket lock
>> had to be taken before current->mm->mmap_sem. syzbot eventually caught
>> the bug.
On 04/24/2018 10:27 PM, Eric Dumazet wrote:
> When adding tcp mmap() implementation, I forgot that socket lock
> had to be taken before current->mm->mmap_sem. syzbot eventually caught
> the bug.
> +
...
> + down_read(&current->mm->mmap_sem);
> +
> + ret
On 04/25/2018 06:42 AM, Toke Høiland-Jørgensen wrote:
> sch_cake targets the home router use case and is intended to squeeze the
> most bandwidth and latency out of even the slowest ISP links and routers,
> while presenting an API simple enough that even an ISP can configure it.
>
* Support for
On 04/25/2018 08:22 AM, Toke Høiland-Jørgensen wrote:
> Hmm, because pure ACKs are not generally aggregated (sorry, I'm not
> quite clear on when exactly GSO will kick in)?
A GSO packet must contain payload in each of its segment.
There is no way some ack aggregation logic could build a GSO p
On 04/25/2018 08:22 AM, Toke Høiland-Jørgensen wrote:
> Eric Dumazet writes:
>> What performance number do you get on a 10Gbit NIC for example ?
>
> Single-flow throughput through 2 hops on a 40Gbit connection (with CAKE
> in unlimited mode vs pfifo_fast on the router):
On 04/25/2018 08:22 AM, Toke Høiland-Jørgensen wrote:
> Eric Dumazet writes:
>> Lack of any pskb_may_pull() is really concerning.
>
> By this you mean "check that the packet is long enough to contain the
> header we are looking for before trying to do ACK filtering"
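The length check being discussed can be illustrated outside the kernel. The sketch below is a plain userspace analogue and not pskb_may_pull() itself: the helper name tcp_header_fits() and the flat-buffer layout are assumptions made for illustration. It only shows the idea of validating lengths before reading any header field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TCP_MIN_HDR_LEN 20u /* fixed part of a TCP header, in bytes */

/*
 * Hypothetical helper: returns true only if a complete TCP header
 * (including options) starting at offset `off` lies inside buf[0..len).
 */
static bool tcp_header_fits(const uint8_t *buf, size_t len, size_t off)
{
    size_t hdr_len;

    /* The fixed 20-byte header must be present before reading any field. */
    if (off > len || len - off < TCP_MIN_HDR_LEN)
        return false;

    /* Data offset: upper 4 bits of byte 12, counted in 32-bit words. */
    hdr_len = (size_t)(buf[off + 12] >> 4) * 4;

    /* Reject impossible values and headers that run past the buffer. */
    if (hdr_len < TCP_MIN_HDR_LEN || hdr_len > len - off)
        return false;

    return true;
}
```

A caller would run this kind of check first and only then look at flags or options, so a short or malformed packet is dropped instead of being read out of bounds.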
On 04/25/2018 09:04 AM, Matthew Wilcox wrote:
> If you don't zap the page range, any of the CPUs in the system where
> any thread in this task have ever run may have a TLB entry pointing to
> this page ... if the page is being recycled into the page allocator,
> then that page might end up as a
On 04/25/2018 09:06 AM, Toke Høiland-Jørgensen wrote:
> Eric Dumazet writes:
>
>> On 04/25/2018 08:22 AM, Toke Høiland-Jørgensen wrote:
>>> Eric Dumazet writes:
>>
>>>> What performance number do you get on a 10Gbit NIC for example ?
>>>
On 04/25/2018 09:22 AM, Andy Lutomirski wrote:
> In general, I suspect that the zerocopy receive mechanism will only
> really be a win in single-threaded applications that consume large
> amounts of receive bandwidth on a single TCP socket using lots of
> memory and don't do all that much else.
On 04/25/2018 09:35 AM, Eric Dumazet wrote:
>
>
> On 04/25/2018 09:22 AM, Andy Lutomirski wrote:
>
>> In general, I suspect that the zerocopy receive mechanism will only
>> really be a win in single-threaded applications that consume large
>> amounts of rec
On 04/25/2018 09:52 AM, Jonathan Morton wrote:
>> We can see here the high cost of forcing software GSO :/
>>
>> Really, this should be done only :
>> 1) If requested by the admin ( tc gso )
>>
>> 2) If packet size is above a threshold.
>> The threshold could be set by the admin, and/or
On 04/25/2018 09:55 AM, Toke Høiland-Jørgensen wrote:
> Well, as I said, 10Gbit+ links are not really the target audience ;)
Well, 640KB of memory is all we need.
>
> We did actually have a threshold at some point, but it was removed
> because it didn't work well (I'm not sure of the details,
On 04/25/2018 09:17 AM, Toke Høiland-Jørgensen wrote:
> Or am I to interpret that as a hard NAK on having this feature in CAKE
> (even if we fix the issues you pointed out)?
No strong NACK, as long as the current code is fixed.
Right now, a malicious packet will kill the box or something bad l
On 04/25/2018 11:34 AM, Toke Høiland-Jørgensen wrote:
> Eric Dumazet writes:
>
>> On 04/25/2018 09:52 AM, Jonathan Morton wrote:
>>>> We can see here the high cost of forcing software GSO :/
>>>>
>>>> Really, this should be done only :
v2:
Added a missing page align of zc->length in tcp_zerocopy_receive()
Properly clear zc->recv_skip_hint in case user request was completed.
Eric Dumazet (2):
tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE
include/uapi
f mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...)
Note that memcg might require additional changes.
Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet
Reported-by: syzbot
Suggested-by: Andy Lutomirski
Cc: linux...@kvack.org
Cc:
number of bytes
that should be read using conventional read()/recv()/recvmsg() system calls,
to skip a sequence of bytes that can not be mapped, because not properly page
aligned.
Signed-off-by: Eric Dumazet
Cc: Andy Lutomirski
Cc: Soheil Hassas Yeganeh
---
tools/testing/selftests/net/tcp_mmap.c
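The page-alignment constraint mentioned in this patch description can be sketched with simple arithmetic. This is not the kernel's tcp_zerocopy_receive() logic: the struct zc_split type, the zc_split_len() helper and the fixed 4096-byte page size are all assumptions for illustration. It only shows how a byte count divides into a page-aligned part that could be mapped and a remainder that would have to be read conventionally.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size; real code should query it */

/* Hypothetical result type for the split. */
struct zc_split {
    uint32_t mappable; /* bytes covered by whole pages */
    uint32_t leftover; /* trailing bytes to fetch with read()/recv() */
};

/* Round `avail` down to a page boundary and report what is left over. */
static struct zc_split zc_split_len(uint32_t avail)
{
    struct zc_split s;

    s.mappable = avail & ~(SKETCH_PAGE_SIZE - 1u);
    s.leftover = avail - s.mappable;
    return s;
}
```

With 10000 bytes available and 4096-byte pages, two pages (8192 bytes) are mappable and 1808 bytes remain for a conventional read.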
On 04/26/2018 06:40 AM, Ka-Cheong Poon wrote:
> A quick question. Is it a normal practice to return a result
> in setsockopt() given that the optval parameter is supposed to
> be a const void *?
Very good question.
Andy suggested an ioctl() or setsockopt(), and I chose setsockopt() but it loo
v3: change TCP_ZEROCOPY_RECEIVE to be a getsockopt() option
instead of setsockopt(), feedback from Ka-Cheong Poon
v2: Added a missing page align of zc->length in tcp_zerocopy_receive()
Properly clear zc->recv_skip_hint in case user request was completed.
Eric Dumazet (2):
tc
On 04/25/2018 06:20 PM, Soheil Hassas Yeganeh wrote:
>
> Acked-by: Soheil Hassas Yeganeh
>
>
Thanks Soheil for reviewing.
I have changed setsockopt() to getsockopt(), so I chose not to carry your Acked-by.
Please add it back if you agree, thanks !
On 04/26/2018 02:16 PM, Andy Lutomirski wrote:
> At the risk of further muddying the waters, there's another minor tweak
> that could improve performance on certain workloads. Currently you mmap()
> a range for a given socket and then getsockopt() to receive. If you made
> it so you could mmap(
On 03/20/2018 09:21 AM, Eric Dumazet wrote:
>
>
> On 03/20/2018 08:53 AM, Ursula Braun wrote:
>> From: Hans Wippel
>>
>> Currently, the SMC experimental TCP option in a SYN packet is lost on
>> the server side when SYN Cookies are active. However, the corresp
On 03/20/2018 10:11 AM, David Lebrun wrote:
> On 20/03/18 15:07, Eric Dumazet wrote:
>> This is not the proper fix.
>>
>> Control path holds RTNL and can sleep if needed.
>>
>> RCU should be avoided in lwtunnel_build_state()
>>
>
> +Roopa
>
On 03/20/2018 04:21 PM, Yonghong Song wrote:
> Without the previous commit,
> "modprobe test_bpf" will have the following errors:
> ...
> [ 98.149165] [ cut here ]
> [ 98.159362] kernel BUG at net/core/skbuff.c:3667!
> [ 98.169756] invalid opcode: [#1] SMP PTI
>
On 03/20/2018 11:47 PM, Yonghong Song wrote:
...
+
> +static __init int test_skb_segment(void)
> +{
> + netdev_features_t features;
> + struct sk_buff *skb;
> + int ret = -1;
> +
> + features = NETIF_F_SG | NETIF_F_GSO_PARTIAL | NETIF_F_IP_CSUM |
> +NETIF_F_IPV6_C
On 03/20/2018 11:47 PM, Yonghong Song wrote:
> +static __init int test_skb_segment(void)
> +{
> + netdev_features_t features;
> + struct sk_buff *skb;
> + int ret = -1;
> +
> + features = NETIF_F_SG | NETIF_F_GSO_PARTIAL | NETIF_F_IP_CSUM |
> +NETIF_F_IPV6_CSUM;
>
On 03/21/2018 02:01 PM, Saeed Mahameed wrote:
> From: Ilya Lesokhin
>
> This patch adds a generic infrastructure to offload TLS crypto to a
...
> +
> +static inline int tls_push_record(struct sock *sk,
> + struct tls_context *ctx,
> +
On 03/22/2018 06:23 AM, Ursula Braun wrote:
> We moved the clear to cookie_v4_check()/cookie_v6_check. However, this does
> not seem to
> be sufficient to prevent the SYNACK from containing the SMC experimental
> option.
> We found that an additional check in tcp_conn_request() helps:
>
> ---
Fixes: 1da177e4c3f41524 ("Linux-2.6.12-rc2")
Reviewed-by: Eric Dumazet
Thanks Alexander.
On 03/23/2018 06:05 AM, Paolo Abeni wrote:
> While building ipv6 datagram we currently allow arbitrary large
> extheaders, even beyond pmtu size. The syzbot has found a way
> to exploit the above to trigger the following splat:
>
...
> As stated by RFC 7112 section 5:
>
>When a host fragme
On 03/23/2018 05:18 AM, Jesper Dangaard Brouer wrote:
> +
> + /* Note, below struct compat code was primarily needed when
> + * page_pool code lived under MM-tree control, given mmots and
> + * net-next trees progress in very different rates.
> + *
> + * Allow kernel deve
On 03/23/2018 05:18 AM, Jesper Dangaard Brouer wrote:
> +
> +void page_pool_destroy_rcu(struct page_pool *pool)
> +{
> + call_rcu(&pool->rcu, __page_pool_destroy_rcu);
> +}
> +EXPORT_SYMBOL(page_pool_destroy_rcu);
>
Why do we need to respect one rcu grace period before destroying a page p
On 03/23/2018 05:18 AM, Jesper Dangaard Brouer wrote:
> +#define PP_ALLOC_CACHE_SIZE 128
> +#define PP_ALLOC_CACHE_REFILL 64
> +struct pp_alloc_cache {
> + u32 count ____cacheline_aligned_in_smp;
> + void *cache[PP_ALLOC_CACHE_SIZE];
> +};
> +
> +struct page_pool_params {
...
>
")
> Reported-by: syzbot+91e6f9932ff122fa4...@syzkaller.appspotmail.com
> Signed-off-by: Paolo Abeni
>
Reviewed-by: Eric Dumazet
Thanks Paolo !
On 03/23/2018 07:15 AM, Jesper Dangaard Brouer wrote:
> On Fri, 23 Mar 2018 06:29:55 -0700
> Eric Dumazet wrote:
>
>> On 03/23/2018 05:18 AM, Jesper Dangaard Brouer wrote:
>>
>>> +
>>> +void page_pool_destroy_rcu(struct page_pool *p
l fs/ioctl.c:46 [inline]
do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
SYSC_ioctl fs/ioctl.c:701 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
Fixes: c757faa8bfa2 ("ipv6: prepare fib6_age() for
On 03/23/2018 07:39 AM, Alexey Kodanev wrote:
> After commit 33c162a980fe ("ipv6: datagram: Update dst cache of a
> connected datagram sk during pmtu update"), when the error occurs on
> sending datagram in udpv6_sendmsg() due to ICMPV6_PKT_TOOBIG type,
> error handler can trigger the following p
On 03/24/2018 01:13 PM, John Fastabend wrote:
> After the qdisc lock was dropped in pfifo_fast we allow multiple
> enqueue threads and dequeue threads to run in parallel. On the
> enqueue side the skb bit ooo_okay is used to ensure all related
> skbs are enqueued in-order. On the dequeue side tho
e same socket.
>
> Fixes: 33c162a980fe ("ipv6: datagram: Update dst cache of a connected
> datagram sk during pmtu update")
> Signed-off-by: Alexey Kodanev
> ---
>
Reviewed-by: Eric Dumazet
Thanks Alexey.
net/socket.c:629 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:639
___sys_sendmsg+0x767/0x8b0 net/socket.c:2047
__sys_sendmsg+0xe5/0x210 net/socket.c:2081
Fixes: 19acc327258a ("gso: Handle Trans-Ether-Bridging protocol in
skb_network_protocol()")
Signed-off-by: Eric Dumazet
Cc: Pravin B She
Refine the RX check summing handling to propagate the
hardware provided checksum so that we do not have to
compute it later in software.
Signed-off-by: Eric Dumazet
Cc: Willem de Bruijn
Cc: Tariq Toukan
---
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 10 --
1 file changed, 4
IPv4 was changed in commit 52a773d645e9 ("net: Export ip fragment
sysctl to unprivileged users")
The only sysctl that is not per-netns is not used :
ip6frag_secret_interval
Signed-off-by: Eric Dumazet
Cc: Nikolay Borisov
---
net/ipv6/reassembly.c | 4
1 file changed, 4 deletion
On 03/16/2018 07:05 AM, David Miller wrote:
> From: Toshiaki Makita
> Date: Tue, 13 Mar 2018 14:51:26 +0900
>
>> As Brandon Carpenter reported[1], sending non-vlan-offloaded packets from
>> bridge devices ends up with corrupted packets. He narrowed down this problem
>> and found that the root c
On 03/28/2018 03:51 AM, Krzysztof Blaszkowski wrote:
> Hi,
>
> I noticed a kernel bug report like below:
>
> [95576.826393] BUG: unable to handle kernel NULL pointer dereference at
> 0038
> [95576.834296] IP: tcp_push+0x3d/0x110
> [95576.837829] PGD 2c8474067 P4D 2c8474067 PUD 1119c
On 03/28/2018 02:26 AM, Anny Hu wrote:
> Dears,
>
> Recently, we find the following patch will impact multi-core network
> throughput performance on kernel-4.9,
> for it will cause rx packet out-of-order.
> commit id: 4cd13c21b207e80ddb1144c576500098f2d5f882
>
> [kernel version]:
> kernel-4.9
On 03/28/2018 12:30 AM, Saurabh Kr wrote:
> Hi Eric/Angelo,
>
> We are seeing the assertion error in linux kernel 2.4.29 “kernel: KERNEL:
> assertion (atomic_read(&skb->users) == 0) failed at dev.c(1397)”. Based on
> patch provided (https://patchwork.kernel.org/patch/5368051/) we mer
On 03/28/2018 07:38 AM, David Miller wrote:
> From: Eric Dumazet
> Date: Wed, 28 Mar 2018 06:38:21 -0700
>
>> https://patchwork.ozlabs.org/patch/886324/
>
> I have this in my current -stable submission set, and I'm working
> actively on this right now.
Thanks a lot David.
On 03/28/2018 08:51 PM, Dongli Zhang wrote:
> The "BUG_ON(!frag_iter)" in function xenvif_rx_next_chunk() is triggered if
> the received sk_buff is malformed, that is, when the sk_buff has pattern
> (skb->data_len && !skb_shinfo(skb)->nr_frags). Below is a sample call
> stack:
>
>...
>
> The
do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
[<00007fd83a4b>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 24803f38a5c0 ("igmp: do not remove igmp souce list info when set link
down")
Signed-off-by: Eric Dumazet
Cc: Hangbin Liu
Reported-by: syzbot+6ca
Extend flowlabel_reflect bitmask to allow conditional
reflection of incoming flowlabels in echo replies.
Note this has precedence against auto flowlabels.
Add flowlabel_reflect enum to replace hard coded
values.
Signed-off-by: Eric Dumazet
---
Documentation/networking/ip-sysctl.txt | 4
On 7/1/19 8:12 AM, Willem de Bruijn wrote:
> On Mon, Jul 1, 2019 at 6:05 AM Antoine Tenart
> wrote:
>>
>> This patch adds support for PTP Hardware Clock (PHC) to the Ocelot
>> switch for both PTP 1-step and 2-step modes.
>>
>> Signed-off-by: Antoine Tenart
>
>> void ocelot_deinit(struct ocel
using slave printk macros")
Signed-off-by: Eric Dumazet
Reported-by: John Sperbeck
Cc: Jarod Wilson
CC: Jay Vosburgh
CC: Veaceslav Falico
CC: Andy Gospodarek
---
drivers/net/bonding/bond_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main
On Mon, Jul 1, 2019 at 9:15 PM Jay Vosburgh wrote:
>
> Eric Dumazet wrote:
>
> >A bonding master can be up while best_slave is NULL.
> >
> >[12105.636318] BUG: unable to handle kernel NULL pointer dereference at
> >
> >[12105.638204
us scenarios before testing in prod env.
No known side effect.
Honestly, applications setting small SO_SNDBUF values can not expect good TCP
performance anyway.
>
> Thanks.
>
> Cheers,
> Tony Lu
>
> On Fri, Jun 21, 2019 at 06:09:55AM -0700, Eric Dumazet wrote:
>
On 7/3/19 1:35 AM, ZhangXiao wrote:
> Hi David & Eric,
>
> Commit 5f3e2bf0 (tcp: add tcp_min_snd_mss sysctl) adds a new interface to
> adjust the network. However, if this variable is set too large, for example
> larger than (MTU - 40), the network link may be damaged. So, how about adding
> some warning
On Sun, Jul 7, 2019 at 1:13 AM Christoph Paasch wrote:
>
> If an app is playing tricks to reuse a socket via tcp_disconnect(),
> bytes_acked/received needs to be reset to 0. Otherwise tcp_info will
> report the sum of the current and the old connection.
>
> Cc: Eri
On 7/8/19 11:13 PM, Chris Packham wrote:
> On 9/07/19 8:43 AM, Chris Packham wrote:
>> On 8/07/19 8:18 PM, Eric Dumazet wrote:
>>>
>>>
>>> On 7/8/19 12:53 AM, Chris Packham wrote:
>>>> tipc_named_node_up() creates a skb list. It passes the list to
On 7/9/19 1:10 PM, Marek Majkowski wrote:
> Morning,
>
> I'm experimenting with flow label reflection from a server point of
> view. I'm able to get it working in both supported ways:
>
> (a) per-socket with flow manager IPV6_FL_F_REFLECT and flowlabel_consistency=0
>
> (b) with global flowla
5: Flags [P.]
> IP6 (flowlabel 0x2f60c, hlim 64) ::1.1235 > ::1.60326: Flags [R]
>
> Now it seem to work reliably. Tested on net-next under virtme.
>
> Marek
>
> On Tue, Jul 9, 2019 at 1:19 PM Eric Dumazet wrote:
>>
>>
>>
>> On 7/9/19 1:10 PM, Marek M
On 7/9/19 3:22 PM, Eric Dumazet wrote:
>
>
> On 7/9/19 2:33 PM, Marek Majkowski wrote:
>> Ha, thanks. I missed that.
>>
>> There is a caveat though. I don't think it's working as intended...
>
>
> Note that my commit really took a
et
is sent on behalf of a 'full' socket.
In Marek use case, this was a socket in TCP_CLOSE state.
Signed-off-by: Eric Dumazet
Reported-by: Marek Majkowski
Tested-by: Marek Majkowski
---
net/ipv6/tcp_ipv6.c | 7 ++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/
nds..
Fixes: 59c820b2317f ("ipv6: elide flowlabel check if no exclusive leases exist")
Signed-off-by: Eric Dumazet
Acked-by: Willem de Bruijn
Reported-by: syzbot
---
net/ipv6/ip6_flowlabel.c | 9 ++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/net/ipv6/ip6_
04 00 00 e8 dc 29 3f fb 49 8d 7e
20 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 16 06
00 00 4d 8b 6e 20 e8 b4 29 3f fb 4c 89 ee
Fixes: 59c820b2317f ("ipv6: elide flowlabel check if no exclusive leases exist")
Signed-off-by: Eric Dumazet
Acked-by: W
On 7/10/19 4:52 PM, Edward Cree wrote:
> Hmm, I was caught out by the call to napi_poll() actually being a local
> function pointer, not the static function of the same name. How did a
> shadow like that ever get allowed?
> But in that case I _really_ don't understand napi_busy_loop(); nothi
On 7/10/19 6:47 PM, Edward Cree wrote:
> On 10/07/2019 16:41, Eric Dumazet wrote:
>> On 7/10/19 4:52 PM, Edward Cree wrote:
>>> Hmm, I was caught out by the call to napi_poll() actually being a local
>>> function pointer, not the static function of the same name.
On 7/10/19 8:23 PM, Prout, Andrew - LLSC - MITLL wrote:
> On 6/17/19 8:19 PM, Christoph Paasch wrote:
>>
>> Yes, this does the trick for my packetdrill-test.
>>
>> I wonder, is there a way we could end up in a situation where we can't
>> retransmit anymore?
>> For example, sk_wmem_queued has gro
On 7/10/19 8:53 PM, Prout, Andrew - LLSC - MITLL wrote:
>
> Our initial rollout was v4.14.130, but I reproduced it with v4.14.132 as
> well, reliably for the samba test and once (not reliably) with synthetic test
> I was trying. A patched v4.14.132 with this patch partially reverted (just
>
On 7/11/19 9:28 AM, Christoph Paasch wrote:
>
>
>> On Jul 10, 2019, at 9:26 PM, Eric Dumazet wrote:
>>
>>
>>
>> On 7/10/19 8:53 PM, Prout, Andrew - LLSC - MITLL wrote:
>>>
>>> Our initial rollout was v4.14.130, but I reproduced it with v
On 7/11/19 9:28 AM, Christoph Paasch wrote:
>
> Would it make sense to always allow the alloc in tcp_fragment when coming
> from __tcp_retransmit_skb() through the retransmit-timer ?
>
> AFAICS, the crasher was when an attacker sends "fake" SACK-blocks. Thus, we
> would still be protected f
On 7/11/19 7:14 PM, Prout, Andrew - LLSC - MITLL wrote:
>
> In my opinion, if a small SO_SNDBUF below a certain value is no longer
> supported, then SOCK_MIN_SNDBUF should be adjusted to reflect this. The
> RCVBUF/SNDBUF sizes are supposed to be hints, no error is returned if they
> are not
On 7/11/19 8:26 PM, Michal Kubecek wrote:
>
> I'm aware it's not a realistic test. It was written as quick and simple
> check of the pre-4.19 patch, but it shows that even TLP may not get
> through.
Most of TLP probes send new data, not rtx.
But yes, I get your point.
SO_SNDBUF=15000 in yo
On 7/11/19 9:04 PM, Jonathan Lemon wrote:
> I discovered we have some production services that set SO_SNDBUF to
> very small values (~4k), as they are essentially doing interactive
> communications, not bulk transfers. But there's a difference between
> "terrible performance" and "TCP stops w
On 7/12/19 5:59 PM, Edward Cree wrote:
> On 10/07/2019 18:39, Eric Dumazet wrote:
>> Holding a small packet in the list up to the point we call busy_poll_stop()
>> will basically make busypoll non working anymore.
>>
>> napi_complete_done() has special behavior
On Tue, 2017-08-15 at 13:08 +, Boris Pismenny wrote:
> Hi Eric,
>
> The IPV6_ADDRFORM socket option assumes that when
> (sk->sk_protocol == IPPROTO_TCP)
> then the sk_proto is set to tcpv6_prot and it replaces it with tcp_prot.
>
> This patch ensures that the IPV6_ADDRFORM socket option do
"[PATCH 1/2] net_sched: call
> qlen_notify only if child qdisc is empty".
> I hadn't tested them separately.
Thanks for the info. I've read this patch and it looks fine indeed :)
Acked-by: Eric Dumazet
On Tue, 2017-08-15 at 12:05 -0300, Marcelo Ricardo Leitner wrote:
> Ok, but I should see a difference in the generated code, right?
Depends on the compiler. Have you tried older versions ?
One argument is that following struct member definition eases code
review.
(It is easier to catch a field
On Tue, 2017-08-15 at 18:30 +0200, Paweł Staszewski wrote:
> Hi
>
>
> Doing some tests, I discovered that when traffic is sent by pktgen to a
> forwarding host where the nexthop for the destination network on the forwarding
> router is not reachable, I have 100% CPU on all cores and perf top shows
> mostly:
>
On Tue, 2017-08-15 at 19:42 +0200, Paweł Staszewski wrote:
> # To display the perf.data header info, please use
> --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 2M of event 'cycles'
> # Event count (approx.): 1585571545969
> #
> # Children Self Command
On Tue, 2017-08-15 at 22:45 +0300, Julian Anastasov wrote:
> Hello,
>
> On Tue, 15 Aug 2017, Eric Dumazet wrote:
>
> > Please try this :
> > diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> > index
> > 16a1
On Fri, 2017-08-11 at 19:41 +0800, Jason Wang wrote:
> We use tun_alloc_skb() which calls sock_alloc_send_pskb() to allocate
> skb in the past. This socket based method is not suitable for high
> speed userspace like virtualization which usually:
>
> - ignore sk_sndbuf (INT_MAX) and expect to rece
On Wed, 2017-08-16 at 11:55 +0800, Jason Wang wrote:
>
> On 2017年08月16日 11:45, Eric Dumazet wrote:
> >
> > You do realize that tun_build_skb() is not thread safe ?
>
> Ok, I think the issue is skb_page_frag_refill(), need a spinlock
> probably. Will prepare a patch.
On Wed, 2017-08-16 at 10:18 +0530, Akshat Kakkar wrote:
> On Mon, Aug 14, 2017 at 2:37 PM, Akshat Kakkar wrote:
> > I have centos 7.3 (Kernel 3.10) running on a server with 128GB RAM and
> > 2 x 10 Core Xeon Processor.
> > I have hosted a webserver on it and enabled ssh for remote maintenance.
> >
On Wed, 2017-08-16 at 04:14 -0400, Chris Mi wrote:
> IDR internally uses a radix tree, which uses unsigned long. It doesn't
> make sense to have the index as a signed value.
>
> Signed-off-by: Chris Mi
> Signed-off-by: Jiri Pirko
> ---
> block/bsg.c | 8 ++--
> bloc
On Wed, 2017-08-16 at 11:03 +0200, Jakub Sitnicki wrote:
> On Tue, 15 Aug 2017 15:25:10 -0700
> Jonathan Basseri wrote:
>
> > If an IPv6 socket has a valid dst cache, then xfrm_lookup_route will get
> > skipped. However, the cache is not invalidated when applying policy to a
> > socket (i.e. IPV6
On Wed, 2017-08-16 at 12:53 +0200, Jiri Pirko wrote:
> rhashtable is an unnecessarily big hammer for this. IDR is a nice fit for
> this purpose.
Obviously IDR does not fit, since you have to change its ABI.
If rhashtable does not fit this, then I wonder why we spent so many days
of work adding it in the
From: Eric Dumazet
syzkaller team reported another problem in DCCP [1]
Problem here is that the structure holding RTO timer
(ccid2_hc_tx_rto_expire() handler) is freed too soon.
We can not use del_timer_sync() to cancel the timer
since this timer wants to grab socket lock (that would risk a
On Wed, 2017-08-16 at 06:31 -0700, Tonghao Zhang wrote:
> Because we remove the tcp_tw_recycle support in the commit
> 4396e46187c ('tcp: remove tcp_tw_recycle') and also delete
> the code 'af_ops->route_req' for sysctl_tw_recycle in tcp_conn_request.
> Now when we call the 'af_ops->route_req', t
On Wed, 2017-08-16 at 13:06 +0200, Jiri Pirko wrote:
> Wed, Aug 16, 2017 at 12:58:53PM CEST, eric.duma...@gmail.com wrote:
> >On Wed, 2017-08-16 at 12:53 +0200, Jiri Pirko wrote:
> >
> >> rhashtable is an unnecessarily big hammer for this. IDR is a nice fit for
> >> this purpose.
> >
> >Obviously IDR does
From: Eric Dumazet
syzkaller reported use-after-free in tipc [1]
When msg->rep skb is freed, set the pointer to NULL,
so that caller does not free it again.
[1]
==
BUG: KASAN: use-after-free in skb_push+0xd4/0xe0 net/c
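The pattern described in this commit message, clearing a pointer right after freeing what it points to, can be shown in a small userspace sketch. The types struct reply_buf and struct msg_ctx and the helper msg_drop_reply() are hypothetical stand-ins, not tipc code; free() stands in for the kernel's skb release.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the reply buffer and the message context. */
struct reply_buf { unsigned char data[64]; };
struct msg_ctx { struct reply_buf *rep; };

/*
 * Release the reply buffer and clear the pointer, so that a caller
 * running its own cleanup afterwards sees NULL and cannot free the
 * same buffer a second time.
 */
static void msg_drop_reply(struct msg_ctx *msg)
{
    free(msg->rep);   /* free(NULL) is a defined no-op */
    msg->rep = NULL;
}
```

Calling msg_drop_reply() twice on the same context is harmless, which is the property the fix relies on.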
From: Eric Dumazet
As found by syzkaller, malicious users can set whatever tx_queue_len
on a tun device and eventually crash the kernel.
Let's remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small
ring buffer is not fast anyway.
Fixes: 2e0ab8ca83c1 ("ptr_ring: array based FIFO for poi
From: Eric Dumazet
While working on yet another syzkaller report, I found
that our IP_MAX_MTU enforcements were not properly done.
gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
final result can be bigger than IP_MAX_MTU :/
This is a problem because device mtu can be c
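The reload hazard described here can be sketched in userspace: if a shared value is read twice, once for the comparison and once for the result, a concurrent writer can make the result exceed the bound. The sketch below reads the value exactly once into a local before clamping. It is an illustration under assumptions, not the patch itself: clamp_mtu_once() is a hypothetical helper, the volatile qualifier stands in for the kernel's READ_ONCE(), and the 0xFFFF limit is assumed.

```c
#include <assert.h>

#define IP_MAX_MTU 0xFFFFu /* assumed limit, for illustration */

/*
 * Take a single snapshot of the shared MTU, then clamp the snapshot.
 * Because the comparison and the result use the same local copy, a
 * concurrent change to *mtu cannot push the result above IP_MAX_MTU.
 */
static unsigned int clamp_mtu_once(const volatile unsigned int *mtu)
{
    unsigned int snapshot = *mtu; /* exactly one read of the shared value */

    return snapshot < IP_MAX_MTU ? snapshot : IP_MAX_MTU;
}
```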
On Wed, 2017-08-16 at 11:50 -0700, Cong Wang wrote:
> On Tue, Aug 15, 2017 at 4:09 AM, Eric Dumazet wrote:
> > From: Eric Dumazet
> >
> > Based on a syzkaller report [1], I found that a per cpu allocation
> > failure in snmp6_alloc_dev() would then
On Wed, 2017-08-16 at 12:13 -0700, David Miller wrote:
> From: John Fastabend
> Date: Wed, 16 Aug 2017 12:06:36 -0700
>
> > On 08/16/2017 11:35 AM, David Miller wrote:
> >> From: David Miller
> >> Date: Wed, 16 Aug 2017 11:28:19 -0700 (PDT)
> >>
> >>> From: John Fastabend
> >>> Date: Tue, 15 A
From: Eric Dumazet
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Eric Dumazet
---
include/net/dst.h
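The overflow protection this commit message refers to can be approximated in userspace with C11 atomics. The sketch below is not the kernel's refcount_t implementation: struct sat_ref and its helpers are hypothetical, and the real API has additional checks and warnings. It only shows the core idea that a counter which saturates instead of wrapping cannot be driven back to zero by an overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical saturating reference counter. */
struct sat_ref { atomic_uint count; };

static void sat_ref_init(struct sat_ref *r, unsigned int n)
{
    atomic_init(&r->count, n);
}

/* Increment, but stay pinned at UINT_MAX instead of wrapping to 0. */
static void sat_ref_inc(struct sat_ref *r)
{
    unsigned int old = atomic_load(&r->count);

    do {
        if (old == UINT_MAX)
            return; /* saturated: leave the counter pinned */
    } while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));
}

/* Decrement; returns true only when the last reference was dropped. */
static bool sat_ref_dec_and_test(struct sat_ref *r)
{
    unsigned int old = atomic_load(&r->count);

    do {
        if (old == UINT_MAX || old == 0)
            return false; /* saturated or already zero: do not touch */
    } while (!atomic_compare_exchange_weak(&r->count, &old, old - 1));

    return old == 1;
}
```

With a plain wrapping counter, enough increments would reach zero and a later decrement path could free an object that is still in use. A saturated counter leaks the object instead, which is the safer failure.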
On Wed, 2017-08-16 at 12:15 -0700, Eric Dumazet wrote:
> On Wed, 2017-08-16 at 11:50 -0700, Cong Wang wrote:
> > On Tue, Aug 15, 2017 at 4:09 AM, Eric Dumazet
> > wrote:
> > > From: Eric Dumazet
> > >
> > > Based on a syzkaller report [1], I found
On Wed, 2017-08-16 at 12:24 -0700, Eric Dumazet wrote:
> From: Eric Dumazet
>
> refcount_t type and corresponding API should be
> used instead of atomic_t when the variable is used as
> a reference counter. This allows to avoid accidental
> refcounter overflows that might lead
From: Eric Dumazet
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Eric Dumazet
---
v2: fix a missing