[PATCH net-next 0/2] tcp: mmap: rework zerocopy receive

2018-04-24 Thread Eric Dumazet
. Eric Dumazet (2): tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE include/uapi/linux/tcp.h | 8 ++ net/ipv4/tcp.c | 186 + tools/testing/selftests/net/tcp_mmap.c

Re: [PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-25 Thread Eric Dumazet
On 04/24/2018 11:28 PM, Christoph Hellwig wrote: > On Tue, Apr 24, 2018 at 10:27:21PM -0700, Eric Dumazet wrote: >> When adding tcp mmap() implementation, I forgot that socket lock >> had to be taken before current->mm->mmap_sem. syzbot eventually caught >> the bug.

Re: [PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-25 Thread Eric Dumazet
On 04/24/2018 10:27 PM, Eric Dumazet wrote: > When adding tcp mmap() implementation, I forgot that socket lock > had to be taken before current->mm->mmap_sem. syzbot eventually caught > the bug. > + ... > + down_read(&current->mm->mmap_sem); > + > + ret
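
The ordering rule quoted above (socket lock first, mmap_sem second) can be illustrated with a small userspace sketch. This is not kernel code and not how lockdep is implemented; the tracker, the rank values and the function names are all hypothetical, chosen only to show why one fixed acquisition order is enforced.

```c
#include <stdbool.h>

/* Hypothetical ranks: a lower rank must be acquired before a higher one.
 * RANK_SOCK stands in for the socket lock, RANK_MMAP_SEM for mmap_sem. */
enum lock_rank { RANK_SOCK = 1, RANK_MMAP_SEM = 2 };

#define MAX_HELD 8

/* Records which ranks the current path holds, in acquisition order. */
struct lock_tracker {
    int held[MAX_HELD];
    int depth;
};

/* Returns false when the acquisition would violate the ordering rule
 * (or when the tracker is full); true when the lock is recorded. */
static bool tracker_acquire(struct lock_tracker *t, int rank)
{
    if (t->depth == MAX_HELD)
        return false;
    if (t->depth > 0 && t->held[t->depth - 1] >= rank)
        return false;   /* e.g. taking the socket lock while holding mmap_sem */
    t->held[t->depth++] = rank;
    return true;
}

/* Drops the most recently acquired lock, if any. */
static void tracker_release(struct lock_tracker *t)
{
    if (t->depth > 0)
        t->depth--;
}
```

Acquiring RANK_SOCK and then RANK_MMAP_SEM succeeds; the reverse order is rejected, which is the inversion the quoted message says syzbot caught.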

Re: [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc

2018-04-25 Thread Eric Dumazet
On 04/25/2018 06:42 AM, Toke Høiland-Jørgensen wrote: > sch_cake targets the home router use case and is intended to squeeze the > most bandwidth and latency out of even the slowest ISP links and routers, > while presenting an API simple enough that even an ISP can configure it. > * Support for

Re: [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc

2018-04-25 Thread Eric Dumazet
On 04/25/2018 08:22 AM, Toke Høiland-Jørgensen wrote: > Hmm, because pure ACKs are not generally aggregated (sorry, I'm not > quite clear on when exactly GSO will kick in)? A GSO packet must contain payload in each of its segments. There is no way some ack aggregation logic could build a GSO p

Re: [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc

2018-04-25 Thread Eric Dumazet
On 04/25/2018 08:22 AM, Toke Høiland-Jørgensen wrote: > Eric Dumazet writes: >> What performance number do you get on a 10Gbit NIC for example ? > > Single-flow throughput through 2 hops on a 40Gbit connection (with CAKE > in unlimited mode vs pfifo_fast on the router):

Re: [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc

2018-04-25 Thread Eric Dumazet
On 04/25/2018 08:22 AM, Toke Høiland-Jørgensen wrote: > Eric Dumazet writes: >> Lack of any pskb_may_pull() is really concerning. > > By this you mean "check that the packet is long enough to contain the > header we are looking for before trying to do ACK filtering"
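
The check-before-read pattern discussed above can be sketched in userspace. In the kernel, pskb_may_pull() does more than a length comparison (it also makes the requested bytes available in the linear area of the skb); the function below, with its hypothetical name and return convention, only shows the idea of refusing to parse a header that the buffer is too short to contain.

```c
#include <stddef.h>
#include <stdint.h>

#define TCP_MIN_HDR_LEN 20    /* fixed part of a TCP header, in bytes */
#define TCP_FLAGS_OFFSET 13   /* byte holding CWR..FIN in the TCP header */
#define TCP_FLAG_ACK 0x10

/* Returns 1 if the ACK flag is set, 0 if it is clear, and -1 if the
 * buffer is too short to hold a TCP header (nothing is read then). */
static int tcp_has_ack(const uint8_t *buf, size_t len)
{
    if (buf == NULL || len < TCP_MIN_HDR_LEN)
        return -1;            /* length check comes before any field access */
    return (buf[TCP_FLAGS_OFFSET] & TCP_FLAG_ACK) ? 1 : 0;
}
```

A caller would treat -1 as "leave the packet alone" rather than guessing at header contents.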

Re: [PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-25 Thread Eric Dumazet
On 04/25/2018 09:04 AM, Matthew Wilcox wrote: > If you don't zap the page range, any of the CPUs in the system where > any thread in this task have ever run may have a TLB entry pointing to > this page ... if the page is being recycled into the page allocator, > then that page might end up as a

Re: [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc

2018-04-25 Thread Eric Dumazet
On 04/25/2018 09:06 AM, Toke Høiland-Jørgensen wrote: > Eric Dumazet writes: > >> On 04/25/2018 08:22 AM, Toke Høiland-Jørgensen wrote: >>> Eric Dumazet writes: >> >>>> What performance number do you get on a 10Gbit NIC for example ? >>> >

Re: [PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-25 Thread Eric Dumazet
On 04/25/2018 09:22 AM, Andy Lutomirski wrote: > In general, I suspect that the zerocopy receive mechanism will only > really be a win in single-threaded applications that consume large > amounts of receive bandwidth on a single TCP socket using lots of > memory and don't do all that much else.

Re: [PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-25 Thread Eric Dumazet
On 04/25/2018 09:35 AM, Eric Dumazet wrote: > > > On 04/25/2018 09:22 AM, Andy Lutomirski wrote: > >> In general, I suspect that the zerocopy receive mechanism will only >> really be a win in single-threaded applications that consume large >> amounts of rec

Re: [Cake] [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc

2018-04-25 Thread Eric Dumazet
On 04/25/2018 09:52 AM, Jonathan Morton wrote: >> We can see here the high cost of forcing software GSO :/ >> >> Really, this should be done only : >> 1) If requested by the admin ( tc gso ) >> >> 2) If packet size is above a threshold. >> The threshold could be set by the admin, and/or

Re: [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc

2018-04-25 Thread Eric Dumazet
On 04/25/2018 09:55 AM, Toke Høiland-Jørgensen wrote: > Well, as I said, 10Gbit+ links are not really the target audience ;) Well, 640KB of memory is all we need. > > We did actually have a threshold at some point, but it was removed > because it didn't work well (I'm not sure of the details,

Re: [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc

2018-04-25 Thread Eric Dumazet
On 04/25/2018 09:17 AM, Toke Høiland-Jørgensen wrote: > Or am I to interpret that as a hard NAK on having this feature in CAKE > (even if we fix the issues you pointed out)? No strong NACK, as long as the current code is fixed. Right now, a malicious packet will kill the box or something bad l

Re: [Cake] [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc

2018-04-25 Thread Eric Dumazet
On 04/25/2018 11:34 AM, Toke Høiland-Jørgensen wrote: > Eric Dumazet writes: > >> On 04/25/2018 09:52 AM, Jonathan Morton wrote: >>>> We can see here the high cost of forcing software GSO :/ >>>> >>>> Really, this should be done only :

[PATCH v2 net-next 0/2] tcp: mmap: rework zerocopy receive

2018-04-25 Thread Eric Dumazet
. v2: Added a missing page align of zc->length in tcp_zerocopy_receive() Properly clear zc->recv_skip_hint in case user request was completed. Eric Dumazet (2): tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE include/uapi
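
The v2 note mentions page-aligning zc->length. A minimal sketch of what rounding a length down to a page multiple looks like follows; whether tcp_zerocopy_receive() uses exactly this expression is an assumption, and the helper name is hypothetical.

```c
#include <stdint.h>

/* Rounds length down to a multiple of page_size.
 * Assumes page_size is a power of two (4096 on many systems). */
static uint32_t page_align_down(uint32_t length, uint32_t page_size)
{
    return length & ~(page_size - 1);
}
```

With 4096-byte pages, a 10000-byte request would cover two whole pages (8192 bytes); the remaining 1808 bytes cannot be mapped as a full page.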

[PATCH v2 net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-25 Thread Eric Dumazet
f mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...) Note that memcg might require additional changes. Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive") Signed-off-by: Eric Dumazet Reported-by: syzbot Suggested-by: Andy Lutomirski Cc: linux...@kvack.org Cc:

[PATCH v2 net-next 2/2] selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE

2018-04-25 Thread Eric Dumazet
number of bytes that should be read using conventional read()/recv()/recvmsg() system calls, to skip a sequence of bytes that can not be mapped, because not properly page aligned. Signed-off-by: Eric Dumazet Cc: Andy Lutomirski Cc: Soheil Hassas Yeganeh --- tools/testing/selftests/net/tcp_mmap.c

Re: [PATCH v2 net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-26 Thread Eric Dumazet
On 04/26/2018 06:40 AM, Ka-Cheong Poon wrote: > A quick question.  Is it a normal practice to return a result > in setsockopt() given that the optval parameter is supposed to > be a const void *? Very good question. Andy suggested an ioctl() or setsockopt(), and I chose setsockopt() but it loo

[PATCH v3 net-next 2/2] selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE

2018-04-26 Thread Eric Dumazet
number of bytes that should be read using conventional read()/recv()/recvmsg() system calls, to skip a sequence of bytes that can not be mapped, because not properly page aligned. Signed-off-by: Eric Dumazet Cc: Andy Lutomirski Cc: Soheil Hassas Yeganeh --- tools/testing/selftests/net/tcp_mmap.c

[PATCH v3 net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-26 Thread Eric Dumazet
f mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...) Note that memcg might require additional changes. Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive") Signed-off-by: Eric Dumazet Reported-by: syzbot Suggested-by: Andy Lutomirski Cc: linux...@kvack.org Cc:

[PATCH v3 net-next 0/2] tcp: mmap: rework zerocopy receive

2018-04-26 Thread Eric Dumazet
. v3: change TCP_ZEROCOPY_RECEIVE to be a getsockopt() option instead of setsockopt(), feedback from Ka-Cheong Poon v2: Added a missing page align of zc->length in tcp_zerocopy_receive() Properly clear zc->recv_skip_hint in case user request was completed. Eric Dumazet (2): tc
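
The reason for the v3 switch is that getsockopt() takes a writable optval and optlen, so the kernel can return a result, whereas setsockopt() takes a const void *. The sketch below shows that calling shape with the standard SO_TYPE option instead of TCP_ZEROCOPY_RECEIVE, since the latter needs a connected TCP socket and a mapped region; the helper name is hypothetical.

```c
#include <sys/socket.h>

/* Returns the socket type (for example SOCK_STREAM), or -1 on error.
 * The kernel writes its answer into 'type' through the writable optval. */
static int query_socket_type(int fd)
{
    int type = 0;
    socklen_t len = sizeof(type);

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type;
}
```

TCP_ZEROCOPY_RECEIVE follows the same in/out pattern: the caller fills a structure, and the kernel updates fields such as the length and recv_skip_hint in place.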

Re: [PATCH v2 net-next 0/2] tcp: mmap: rework zerocopy receive

2018-04-26 Thread Eric Dumazet
On 04/25/2018 06:20 PM, Soheil Hassas Yeganeh wrote: > > Acked-by: Soheil Hassas Yeganeh > > Thanks Soheil for reviewing. I have changed setsockopt() to getsockopt(), so I chose not to carry your Acked-by. Please add it back if you agree, thanks !

Re: [PATCH v2 net-next 0/2] tcp: mmap: rework zerocopy receive

2018-04-26 Thread Eric Dumazet
On 04/26/2018 02:16 PM, Andy Lutomirski wrote: > At the risk of further muddying the waters, there's another minor tweak > that could improve performance on certain workloads. Currently you mmap() > a range for a given socket and then getsockopt() to receive. If you made > it so you could mmap(

Re: [PATCH net-next 1/1] net/ipv4: disable SMC TCP option with SYN Cookies

2018-03-20 Thread Eric Dumazet
On 03/20/2018 09:21 AM, Eric Dumazet wrote: > > > On 03/20/2018 08:53 AM, Ursula Braun wrote: >> From: Hans Wippel >> >> Currently, the SMC experimental TCP option in a SYN packet is lost on >> the server side when SYN Cookies are active. However, the corresp

Re: [PATCH net] ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state

2018-03-20 Thread Eric Dumazet
On 03/20/2018 10:11 AM, David Lebrun wrote: > On 20/03/18 15:07, Eric Dumazet wrote: >> This is not the proper fix. >> >> Control path holds RTNL and can sleep if needed. >> >> RCU should be avoided in lwtunnel_build_state() >> > > +Roopa >

Re: [PATCH net-next v3 2/2] net: bpf: add a test for skb_segment in test_bpf module

2018-03-20 Thread Eric Dumazet
On 03/20/2018 04:21 PM, Yonghong Song wrote: > Without the previous commit, > "modprobe test_bpf" will have the following errors: > ... > [ 98.149165] [ cut here ] > [ 98.159362] kernel BUG at net/core/skbuff.c:3667! > [ 98.169756] invalid opcode: [#1] SMP PTI >

Re: [PATCH net-next v4 2/2] net: bpf: add a test for skb_segment in test_bpf module

2018-03-21 Thread Eric Dumazet
On 03/20/2018 11:47 PM, Yonghong Song wrote: ... + > +static __init int test_skb_segment(void) > +{ > + netdev_features_t features; > + struct sk_buff *skb; > + int ret = -1; > + > + features = NETIF_F_SG | NETIF_F_GSO_PARTIAL | NETIF_F_IP_CSUM | > +NETIF_F_IPV6_C

Re: [PATCH net-next v4 2/2] net: bpf: add a test for skb_segment in test_bpf module

2018-03-21 Thread Eric Dumazet
On 03/20/2018 11:47 PM, Yonghong Song wrote: > +static __init int test_skb_segment(void) > +{ > + netdev_features_t features; > + struct sk_buff *skb; > + int ret = -1; > + > + features = NETIF_F_SG | NETIF_F_GSO_PARTIAL | NETIF_F_IP_CSUM | > +NETIF_F_IPV6_CSUM; >

Re: [PATCH V2 net-next 06/14] net/tls: Add generic NIC offload infrastructure

2018-03-21 Thread Eric Dumazet
On 03/21/2018 02:01 PM, Saeed Mahameed wrote: > From: Ilya Lesokhin > > This patch adds a generic infrastructure to offload TLS crypto to a ... > + > +static inline int tls_push_record(struct sock *sk, > + struct tls_context *ctx, > +

Re: [PATCH net-next 1/1] net/ipv4: disable SMC TCP option with SYN Cookies

2018-03-22 Thread Eric Dumazet
On 03/22/2018 06:23 AM, Ursula Braun wrote: > We moved the clear to cookie_v4_check()/cookie_v6_check. However, this does > not seem to > be sufficient to prevent the SYNACK from containing the SMC experimental > option. > We found that an additional check in tcp_conn_request() helps: > > ---

Re: [PATCH] netlink: make sure nladdr has correct size in netlink_connect()

2018-03-23 Thread Eric Dumazet
Fixes: 1da177e4c3f41524 ("Linux-2.6.12-rc2") Reviewed-by: Eric Dumazet Thanks Alexander.

Re: [PATCH net] ipv6: the entire IPv6 header chain must fit the first fragment

2018-03-23 Thread Eric Dumazet
On 03/23/2018 06:05 AM, Paolo Abeni wrote: > While building ipv6 datagram we currently allow arbitrary large > extheaders, even beyond pmtu size. The syzbot has found a way > to exploit the above to trigger the following splat: > ... > As stated by RFC 7112 section 5: > >When a host fragme

Re: [bpf-next V5 PATCH 11/15] page_pool: refurbish version of page_pool code

2018-03-23 Thread Eric Dumazet
On 03/23/2018 05:18 AM, Jesper Dangaard Brouer wrote: > + > + /* Note, below struct compat code was primarily needed when > + * page_pool code lived under MM-tree control, given mmots and > + * net-next trees progress in very different rates. > + * > + * Allow kernel deve

Re: [bpf-next V5 PATCH 11/15] page_pool: refurbish version of page_pool code

2018-03-23 Thread Eric Dumazet
On 03/23/2018 05:18 AM, Jesper Dangaard Brouer wrote: > + > +void page_pool_destroy_rcu(struct page_pool *pool) > +{ > + call_rcu(&pool->rcu, __page_pool_destroy_rcu); > +} > +EXPORT_SYMBOL(page_pool_destroy_rcu); > Why do we need to respect one rcu grace period before destroying a page p

Re: [bpf-next V5 PATCH 11/15] page_pool: refurbish version of page_pool code

2018-03-23 Thread Eric Dumazet
On 03/23/2018 05:18 AM, Jesper Dangaard Brouer wrote: > +#define PP_ALLOC_CACHE_SIZE 128 > +#define PP_ALLOC_CACHE_REFILL 64 > +struct pp_alloc_cache { > + u32 count ____cacheline_aligned_in_smp; > + void *cache[PP_ALLOC_CACHE_SIZE]; > +}; > + > +struct page_pool_params { ... >

Re: [PATCH net v2] ipv6: the entire IPv6 header chain must fit the first fragment

2018-03-23 Thread Eric Dumazet
") > Reported-by: syzbot+91e6f9932ff122fa4...@syzkaller.appspotmail.com > Signed-off-by: Paolo Abeni > Reviewed-by: Eric Dumazet Thanks Paolo !

Re: [bpf-next V5 PATCH 11/15] page_pool: refurbish version of page_pool code

2018-03-23 Thread Eric Dumazet
On 03/23/2018 07:15 AM, Jesper Dangaard Brouer wrote: > On Fri, 23 Mar 2018 06:29:55 -0700 > Eric Dumazet wrote: > >> On 03/23/2018 05:18 AM, Jesper Dangaard Brouer wrote: >> >>> + >>> +void page_pool_destroy_rcu(struct page_pool *p

[PATCH net] ipv6: fix possible deadlock in rt6_age_examine_exception()

2018-03-23 Thread Eric Dumazet
l fs/ioctl.c:46 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686 SYSC_ioctl fs/ioctl.c:701 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: c757faa8bfa2 ("ipv6: prepare fib6_age() for

Re: [PATCH net] udp6: set dst cache for a connected sk before udp_v6_send_skb

2018-03-23 Thread Eric Dumazet
On 03/23/2018 07:39 AM, Alexey Kodanev wrote: > After commit 33c162a980fe ("ipv6: datagram: Update dst cache of a > connected datagram sk during pmtu update"), when the error occurs on > sending datagram in udpv6_sendmsg() due to ICMPV6_PKT_TOOBIG type, > error handler can trigger the following p

Re: [net PATCH] net: sched, fix OOO packets with pfifo_fast

2018-03-24 Thread Eric Dumazet
On 03/24/2018 01:13 PM, John Fastabend wrote: > After the qdisc lock was dropped in pfifo_fast we allow multiple > enqueue threads and dequeue threads to run in parallel. On the > enqueue side the skb bit ooo_okay is used to ensure all related > skbs are enqueued in-order. On the dequeue side tho

Re: [PATCH net v2] udp6: set dst cache for a connected sk before udp_v6_send_skb

2018-03-26 Thread Eric Dumazet
e same socket. > > Fixes: 33c162a980fe ("ipv6: datagram: Update dst cache of a connected > datagram sk during pmtu update") > Signed-off-by: Alexey Kodanev > --- > Reviewed-by: Eric Dumazet Thanks Alexey.

[PATCH net] net: fix possible out-of-bound read in skb_network_protocol()

2018-03-26 Thread Eric Dumazet
net/socket.c:629 [inline] sock_sendmsg+0xca/0x110 net/socket.c:639 ___sys_sendmsg+0x767/0x8b0 net/socket.c:2047 __sys_sendmsg+0xe5/0x210 net/socket.c:2081 Fixes: 19acc327258a ("gso: Handle Trans-Ether-Bridging protocol in skb_network_protocol()") Signed-off-by: Eric Dumazet Cc: Pravin B She

[PATCH net-next] net/mlx4_en: CHECKSUM_COMPLETE support for fragments

2018-03-27 Thread Eric Dumazet
Refine the RX check summing handling to propagate the hardware provided checksum so that we do not have to compute it later in software. Signed-off-by: Eric Dumazet Cc: Willem de Bruijn Cc: Tariq Toukan --- drivers/net/ethernet/mellanox/mlx4/en_rx.c | 10 -- 1 file changed, 4

[PATCH net-next] ipv6: export ip6 fragments sysctl to unprivileged users

2018-03-27 Thread Eric Dumazet
IPv4 was changed in commit 52a773d645e9 ("net: Export ip fragment sysctl to unprivileged users") The only sysctl that is not per-netns is not used : ip6frag_secret_interval Signed-off-by: Eric Dumazet Cc: Nikolay Borisov --- net/ipv6/reassembly.c | 4 1 file changed, 4 deletion

Re: [PATCH net 0/2] Fix vlan untag and insertion for bridge and vlan with reorder_hdr off

2018-03-27 Thread Eric Dumazet
On 03/16/2018 07:05 AM, David Miller wrote: > From: Toshiaki Makita > Date: Tue, 13 Mar 2018 14:51:26 +0900 > >> As Brandon Carpenter reported[1], sending non-vlan-offloaded packets from >> bridge devices ends up with corrupted packets. He narrowed down this problem >> and found that the root c

Re: 4.14.29 - tcp_push() - null skb's cb dereference

2018-03-28 Thread Eric Dumazet
On 03/28/2018 03:51 AM, Krzysztof Blaszkowski wrote: > Hi, > > I noticed a kernel bug report like below: > > [95576.826393] BUG: unable to handle kernel NULL pointer dereference at > 0038 > [95576.834296] IP: tcp_push+0x3d/0x110 > [95576.837829] PGD 2c8474067 P4D 2c8474067 PUD 1119c

Re: Problems: network rx out-of-order issue

2018-03-28 Thread Eric Dumazet
On 03/28/2018 02:26 AM, Anny Hu wrote: > Dears, > > Recently, we find the following patch will impact multi-core network > throughput performance on kernel-4.9, > for it will cause rx packet out-of-order. > commit id: 4cd13c21b207e80ddb1144c576500098f2d5f882 > > [kernel version]: > kernel-4.9

Re: net_tx_action race condition?

2018-03-28 Thread Eric Dumazet
On 03/28/2018 12:30 AM, Saurabh Kr wrote: > Hi Eric/Angelo, > > We are seeing the assertion error in linux kernel 2.4.29 “kernel: KERNEL: > assertion (atomic_read(&skb->users) == 0) failed at dev.c(1397)”. Based on > patch provided (https://patchwork.kernel.org/patch/5368051/) we mer

Re: 4.14.29 - tcp_push() - null skb's cb dereference

2018-03-28 Thread Eric Dumazet
On 03/28/2018 07:38 AM, David Miller wrote: > From: Eric Dumazet > Date: Wed, 28 Mar 2018 06:38:21 -0700 > >> https://patchwork.ozlabs.org/patch/886324/ > > I have this in my current -stable submission set, and I'm working > actively on this right now. Thanks a lot David.

Re: [PATCH v2 1/1] xen-netback: process malformed sk_buff correctly to avoid BUG_ON()

2018-03-28 Thread Eric Dumazet
On 03/28/2018 08:51 PM, Dongli Zhang wrote: > The "BUG_ON(!frag_iter)" in function xenvif_rx_next_chunk() is triggered if > the received sk_buff is malformed, that is, when the sk_buff has pattern > (skb->data_len && !skb_shinfo(skb)->nr_frags). Below is a sample call > stack: > >... > > The

[PATCH net] igmp: fix memory leak in igmpv3_del_delrec()

2019-06-27 Thread Eric Dumazet
>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301 [<00007fd83a4b>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 24803f38a5c0 ("igmp: do not remove igmp souce list info when set link down") Signed-off-by: Eric Dumazet Cc: Hangbin Liu Reported-by: syzbot+6ca

[PATCH net-next] ipv6: icmp: allow flowlabel reflection in echo replies

2019-07-01 Thread Eric Dumazet
Extend flowlabel_reflect bitmask to allow conditional reflection of incoming flowlabels in echo replies. Note this takes precedence over auto flowlabels. Add flowlabel_reflect enum to replace hard coded values. Signed-off-by: Eric Dumazet --- Documentation/networking/ip-sysctl.txt | 4

Re: [PATCH net-next 8/8] net: mscc: PTP Hardware Clock (PHC) support

2019-07-01 Thread Eric Dumazet
On 7/1/19 8:12 AM, Willem de Bruijn wrote: > On Mon, Jul 1, 2019 at 6:05 AM Antoine Tenart > wrote: >> >> This patch adds support for PTP Hardware Clock (PHC) to the Ocelot >> switch for both PTP 1-step and 2-step modes. >> >> Signed-off-by: Antoine Tenart > >> void ocelot_deinit(struct ocel

[PATCH net-next] bonding/main: fix NULL dereference in bond_select_active_slave()

2019-07-01 Thread Eric Dumazet
using slave printk macros") Signed-off-by: Eric Dumazet Reported-by: John Sperbeck Cc: Jarod Wilson CC: Jay Vosburgh CC: Veaceslav Falico CC: Andy Gospodarek --- drivers/net/bonding/bond_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/bonding/bond_main

Re: [PATCH net-next] bonding/main: fix NULL dereference in bond_select_active_slave()

2019-07-01 Thread Eric Dumazet
On Mon, Jul 1, 2019 at 9:15 PM Jay Vosburgh wrote: > > Eric Dumazet wrote: > > >A bonding master can be up while best_slave is NULL. > > > >[12105.636318] BUG: unable to handle kernel NULL pointer dereference at > > > >[12105.638204

Re: [PATCH net] tcp: refine memory limit test in tcp_fragment()

2019-07-03 Thread Eric Dumazet
us scenarios before testing in prod env. No known side effect. Honestly, applications setting small SO_SNDBUF values can not expect good TCP performance anyway. > > Thanks. > > Cheers, > Tony Lu > > On Fri, Jun 21, 2019 at 06:09:55AM -0700, Eric Dumazet wrote: >

Re: Shall we add some note info for tcp_min_snd_mss?

2019-07-03 Thread Eric Dumazet
On 7/3/19 1:35 AM, ZhangXiao wrote: > Hi David & Eric, > > Commit 5f3e2bf0 (tcp: add tcp_min_snd_mss sysctl) adds a new interface to > adjust network. While if this variable has been set too large, for example > larger than (MTU - 40), the net link may be damaged. So, how about adding > some warning

Re: [PATCH net] tcp: Reset bytes_acked and bytes_received when disconnecting

2019-07-08 Thread Eric Dumazet
On Sun, Jul 7, 2019 at 1:13 AM Christoph Paasch wrote: > > If an app is playing tricks to reuse a socket via tcp_disconnect(), > bytes_acked/received needs to be reset to 0. Otherwise tcp_info will > report the sum of the current and the old connection.. > > Cc: Eri

Re: [PATCH] tipc: ensure skb->lock is initialised

2019-07-09 Thread Eric Dumazet
On 7/8/19 11:13 PM, Chris Packham wrote: > On 9/07/19 8:43 AM, Chris Packham wrote: >> On 8/07/19 8:18 PM, Eric Dumazet wrote: >>> >>> >>> On 7/8/19 12:53 AM, Chris Packham wrote: >>>> tipc_named_node_up() creates a skb list. It passes the list to

Re: IPv6 flow label reflection behave for RST packets

2019-07-09 Thread Eric Dumazet
On 7/9/19 1:10 PM, Marek Majkowski wrote: > Morning, > > I'm experimenting with flow label reflection from a server point of > view. I'm able to get it working in both supported ways: > > (a) per-socket with flow manager IPV6_FL_F_REFLECT and flowlabel_consistency=0 > > (b) with global flowla

Re: IPv6 flow label reflection behave for RST packets

2019-07-09 Thread Eric Dumazet
5: Flags [P.] > IP6 (flowlabel 0x2f60c, hlim 64) ::1.1235 > ::1.60326: Flags [R] > > Now it seem to work reliably. Tested on net-next under virtme. > > Marek > > On Tue, Jul 9, 2019 at 1:19 PM Eric Dumazet wrote: >> >> >> >> On 7/9/19 1:10 PM, Marek M

Re: IPv6 flow label reflection behave for RST packets

2019-07-09 Thread Eric Dumazet
On 7/9/19 3:22 PM, Eric Dumazet wrote: > > > On 7/9/19 2:33 PM, Marek Majkowski wrote: >> Ha, thanks. I missed that. >> >> There is a caveat though. I don't think it's working as intended... > > > Note that my commit really took a

[PATCH net] ipv6: tcp: fix flowlabels reflection for RST packets

2019-07-10 Thread Eric Dumazet
et is sent on behalf of a 'full' socket. In Marek use case, this was a socket in TCP_CLOSE state. Signed-off-by: Eric Dumazet Reported-by: Marek Majkowski Tested-by: Marek Majkowski --- net/ipv6/tcp_ipv6.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/

[PATCH net] ipv6: fix static key imbalance in fl_create()

2019-07-10 Thread Eric Dumazet
nds.. Fixes: 59c820b2317f ("ipv6: elide flowlabel check if no exclusive leases exist") Signed-off-by: Eric Dumazet Acked-by: Willem de Bruijn Reported-by: syzbot --- net/ipv6/ip6_flowlabel.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/net/ipv6/ip6_

[PATCH net] ipv6: fix potential crash in ip6_datagram_dst_update()

2019-07-10 Thread Eric Dumazet
04 00 00 e8 dc 29 3f fb 49 8d 7e 20 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 16 06 00 00 4d 8b 6e 20 e8 b4 29 3f fb 4c 89 ee Fixes: 59c820b2317f ("ipv6: elide flowlabel check if no exclusive leases exist") Signed-off-by: Eric Dumazet Acked-by: W

Re: [RFC PATCH net-next 0/3] net: batched receive in GRO path

2019-07-10 Thread Eric Dumazet
On 7/10/19 4:52 PM, Edward Cree wrote: > Hmm, I was caught out by the call to napi_poll() actually being a local >  function pointer, not the static function of the same name.  How did a >  shadow like that ever get allowed? > But in that case I _really_ don't understand napi_busy_loop(); nothi

Re: [RFC PATCH net-next 0/3] net: batched receive in GRO path

2019-07-10 Thread Eric Dumazet
On 7/10/19 6:47 PM, Edward Cree wrote: > On 10/07/2019 16:41, Eric Dumazet wrote: >> On 7/10/19 4:52 PM, Edward Cree wrote: >>> Hmm, I was caught out by the call to napi_poll() actually being a local >>>  function pointer, not the static function of the same name. 

Re: [PATCH net 2/4] tcp: tcp_fragment() should apply sane memory limits

2019-07-10 Thread Eric Dumazet
On 7/10/19 8:23 PM, Prout, Andrew - LLSC - MITLL wrote: > On 6/17/19 8:19 PM, Christoph Paasch wrote: >> >> Yes, this does the trick for my packetdrill-test. >> >> I wonder, is there a way we could end up in a situation where we can't >> retransmit anymore? >> For example, sk_wmem_queued has gro

Re: [PATCH net 2/4] tcp: tcp_fragment() should apply sane memory limits

2019-07-10 Thread Eric Dumazet
On 7/10/19 8:53 PM, Prout, Andrew - LLSC - MITLL wrote: > > Our initial rollout was v4.14.130, but I reproduced it with v4.14.132 as > well, reliably for the samba test and once (not reliably) with synthetic test > I was trying. A patched v4.14.132 with this patch partially reverted (just >

Re: [PATCH net 2/4] tcp: tcp_fragment() should apply sane memory limits

2019-07-11 Thread Eric Dumazet
On 7/11/19 9:28 AM, Christoph Paasch wrote: > > >> On Jul 10, 2019, at 9:26 PM, Eric Dumazet wrote: >> >> >> >> On 7/10/19 8:53 PM, Prout, Andrew - LLSC - MITLL wrote: >>> >>> Our initial rollout was v4.14.130, but I reproduced it with v

Re: [PATCH net 2/4] tcp: tcp_fragment() should apply sane memory limits

2019-07-11 Thread Eric Dumazet
On 7/11/19 9:28 AM, Christoph Paasch wrote: > > Would it make sense to always allow the alloc in tcp_fragment when coming > from __tcp_retransmit_skb() through the retransmit-timer ? > > AFAICS, the crasher was when an attacker sends "fake" SACK-blocks. Thus, we > would still be protected f

Re: [PATCH net 2/4] tcp: tcp_fragment() should apply sane memory limits

2019-07-11 Thread Eric Dumazet
On 7/11/19 7:14 PM, Prout, Andrew - LLSC - MITLL wrote: > > In my opinion, if a small SO_SNDBUF below a certain value is no longer > supported, then SOCK_MIN_SNDBUF should be adjusted to reflect this. The > RCVBUF/SNDBUF sizes are supposed to be hints, no error is returned if they > are not

Re: [PATCH net 2/4] tcp: tcp_fragment() should apply sane memory limits

2019-07-11 Thread Eric Dumazet
On 7/11/19 8:26 PM, Michal Kubecek wrote: > > I'm aware it's not a realistic test. It was written as quick and simple > check of the pre-4.19 patch, but it shows that even TLP may not get > through. Most of TLP probes send new data, not rtx. But yes, I get your point. SO_SNDBUF=15000 in yo

Re: [PATCH net 2/4] tcp: tcp_fragment() should apply sane memory limits

2019-07-12 Thread Eric Dumazet
On 7/11/19 9:04 PM, Jonathan Lemon wrote: > I discovered we have some production services that set SO_SNDBUF to > very small values (~4k), as they are essentially doing interactive > communications, not bulk transfers.  But there's a difference between > "terrible performance" and "TCP stops w

Re: [RFC PATCH net-next 0/3] net: batched receive in GRO path

2019-07-12 Thread Eric Dumazet
On 7/12/19 5:59 PM, Edward Cree wrote: > On 10/07/2019 18:39, Eric Dumazet wrote: >> Holding a small packet in the list up to the point we call busy_poll_stop() >> will basically make busypoll non working anymore. >> >> napi_complete_done() has special behavior

Re: [PATCH v2 net-next 1/3] ipv6: Prevent unexpected sk->sk_prot changes

2017-08-15 Thread Eric Dumazet
On Tue, 2017-08-15 at 13:08 +, Boris Pismenny wrote: > Hi Eric, > > The IPV6_ADDRFORM socket option assumes that when > (sk->sk_protocol == IPPROTO_TCP) > then the sk_proto is set to tcpv6_prot and it replaces it with tcp_prot. > > This patch ensures that the IPV6_ADDRFORM socket option do

Re: [PATCH] net_sched/sfq: update hierarchical backlog when drop packet

2017-08-15 Thread Eric Dumazet
"[PATCH 1/2] net_sched: call > qlen_notify only if child qdisc is empty". > I hadn't tested them separately. Thanks for the info. I've read this patch and it looks fine indeed :) Acked-by: Eric Dumazet

Re: [PATCH v2] sctp: fully initialize the IPv6 address in sctp_v6_to_addr()

2017-08-15 Thread Eric Dumazet
On Tue, 2017-08-15 at 12:05 -0300, Marcelo Ricardo Leitner wrote: > Ok, but I should see a difference in the generated code, right? Depends on the compiler. Have you tried older versions ? One argument is that following struct member definition eases code review. (It is easier to catch a field

Re: 100% CPU load when generating traffic to destination network that nexthop is not reachable

2017-08-15 Thread Eric Dumazet
On Tue, 2017-08-15 at 18:30 +0200, Paweł Staszewski wrote: > Hi > > > Doing some tests i discovered that when traffic is send by pktgen to > forwarding host where nexthop for destination network on forwarding > router is not reachable i have 100% cpu on all cores and perf top show > mostly: >

Re: 100% CPU load when generating traffic to destination network that nexthop is not reachable

2017-08-15 Thread Eric Dumazet
On Tue, 2017-08-15 at 19:42 +0200, Paweł Staszewski wrote: > # To display the perf.data header info, please use > --header/--header-only options. > # > # > # Total Lost Samples: 0 > # > # Samples: 2M of event 'cycles' > # Event count (approx.): 1585571545969 > # > # Children Self Command

Re: 100% CPU load when generating traffic to destination network that nexthop is not reachable

2017-08-15 Thread Eric Dumazet
On Tue, 2017-08-15 at 22:45 +0300, Julian Anastasov wrote: > Hello, > > On Tue, 15 Aug 2017, Eric Dumazet wrote: > > > Please try this : > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c > > index > > 16a1

Re: [PATCH net-next V2 1/3] tap: use build_skb() for small packet

2017-08-15 Thread Eric Dumazet
On Fri, 2017-08-11 at 19:41 +0800, Jason Wang wrote: > We use tun_alloc_skb() which calls sock_alloc_send_pskb() to allocate > skb in the past. This socket based method is not suitable for high > speed userspace like virtualization which usually: > > - ignore sk_sndbuf (INT_MAX) and expect to rece

Re: [PATCH net-next V2 1/3] tap: use build_skb() for small packet

2017-08-16 Thread Eric Dumazet
On Wed, 2017-08-16 at 11:55 +0800, Jason Wang wrote: > > On 2017年08月16日 11:45, Eric Dumazet wrote: > > > > You do realize that tun_build_skb() is not thread safe ? > > Ok, I think the issue if skb_page_frag_refill(), need a spinlock > probably. Will prepare a patch.

Re: Something hitting my total number of connections to the server

2017-08-16 Thread Eric Dumazet
On Wed, 2017-08-16 at 10:18 +0530, Akshat Kakkar wrote: > On Mon, Aug 14, 2017 at 2:37 PM, Akshat Kakkar wrote: > > I have centos 7.3 (Kernel 3.10) running on a server with 128GB RAM and > > 2 x 10 Core Xeon Processor. > > I have hosted a webserver on it and enabled ssh for remote maintenance. > >

Re: [patch net-next repost 1/3] idr: Use unsigned long instead of int

2017-08-16 Thread Eric Dumazet
On Wed, 2017-08-16 at 04:14 -0400, Chris Mi wrote: > IDR internally uses a radix tree, which uses unsigned long. It doesn't > make sense to have the index as a signed value. > > Signed-off-by: Chris Mi > Signed-off-by: Jiri Pirko > --- > block/bsg.c | 8 ++-- > bloc

Re: [PATCH net] xfrm: Clear sk_dst_cache when applying per-socket policy.

2017-08-16 Thread Eric Dumazet
On Wed, 2017-08-16 at 11:03 +0200, Jakub Sitnicki wrote: > On Tue, 15 Aug 2017 15:25:10 -0700 > Jonathan Basseri wrote: > > > If an IPv6 socket has a valid dst cache, then xfrm_lookup_route will get > > skipped. However, the cache is not invalidated when applying policy to a > > socket (i.e. IPV6

Re: [patch net-next repost 1/3] idr: Use unsigned long instead of int

2017-08-16 Thread Eric Dumazet
On Wed, 2017-08-16 at 12:53 +0200, Jiri Pirko wrote: > rhashtable is unnecessary big hammer for this. IDR is nice fit for > this purpose. Obviously IDR does not fit, since you have to change its ABI. If rhashtable does not fit this, then I wonder why we spent so many days of work adding it in the

[PATCH net] dccp: defer ccid_hc_tx_delete() at dismantle time

2017-08-16 Thread Eric Dumazet
From: Eric Dumazet syzkaller team reported another problem in DCCP [1] Problem here is that the structure holding RTO timer (ccid2_hc_tx_rto_expire() handler) is freed too soon. We cannot use del_timer_sync() to cancel the timer since this timer wants to grab socket lock (that would risk a

Re: [PATCH 1/2] tcp: Remove unnecessary dst check in tcp_conn_request.

2017-08-16 Thread Eric Dumazet
On Wed, 2017-08-16 at 06:31 -0700, Tonghao Zhang wrote: > Because we remove the tcp_tw_recycle support in the commit > 4396e46187c ('tcp: remove tcp_tw_recycle') and also delete > the code 'af_ops->route_req' for sysctl_tw_recycle in tcp_conn_request. > Now when we call the 'af_ops->route_req', t

Re: [patch net-next repost 1/3] idr: Use unsigned long instead of int

2017-08-16 Thread Eric Dumazet
On Wed, 2017-08-16 at 13:06 +0200, Jiri Pirko wrote: > Wed, Aug 16, 2017 at 12:58:53PM CEST, eric.duma...@gmail.com wrote: > >On Wed, 2017-08-16 at 12:53 +0200, Jiri Pirko wrote: > > > >> rhashtable is unnecessary big hammer for this. IDR is nice fit for > >> this purpose. > > > >Obviously IDR does

[PATCH net] tipc: fix use-after-free

2017-08-16 Thread Eric Dumazet
From: Eric Dumazet syzkaller reported use-after-free in tipc [1] When msg->rep skb is freed, set the pointer to NULL, so that caller does not free it again. [1] == BUG: KASAN: use-after-free in skb_push+0xd4/0xe0 net/c
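
The pattern described in the tipc entry above (clear the pointer once the buffer is freed, so a later cleanup path cannot free it again) can be sketched in plain userspace C. This is only an illustration and not the kernel patch: `struct msg_sketch` and `release_rep()` are hypothetical names, and `free()` stands in for the kernel's skb release.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a message that owns a reply buffer. */
struct msg_sketch {
    void *rep;
};

/* Free the reply buffer and clear the pointer, so that a caller
 * running its own cleanup afterwards does not free it a second time. */
static void release_rep(struct msg_sketch *msg)
{
    free(msg->rep);   /* free(NULL) is defined to do nothing */
    msg->rep = NULL;  /* later cleanup now sees "already released" */
}
```

With the pointer cleared, calling `release_rep()` twice on the same message is harmless; without the assignment, the second call would be a double free.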

[PATCH net] ptr_ring: use kmalloc_array()

2017-08-16 Thread Eric Dumazet
From: Eric Dumazet As found by syzkaller, malicious users can set whatever tx_queue_len on a tun device and eventually crash the kernel. Let's remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small ring buffer is not fast anyway. Fixes: 2e0ab8ca83c1 ("ptr_ring: array based FIFO for poi

[PATCH net] ipv4: better IP_MAX_MTU enforcement

2017-08-16 Thread Eric Dumazet
From: Eric Dumazet While working on yet another syzkaller report, I found that our IP_MAX_MTU enforcements were not properly done. gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and final result can be bigger than IP_MAX_MTU :/ This is a problem because device mtu can be c
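
The reload problem mentioned in the IP_MAX_MTU entry above can be illustrated with a small userspace sketch. If a field that another thread may change is read twice, once for the comparison and once for the result, the returned value can exceed the cap. One common remedy is to read the field exactly once into a local. The names below (`struct dev_sketch`, `MAX_MTU_SKETCH`, `read_mtu_once()`, `capped_mtu()`) are hypothetical, and the volatile access is an assumption standing in for a kernel-style single-read helper; this is not presented as the content of the patch.

```c
/* Hypothetical cap, standing in for IP_MAX_MTU. */
#define MAX_MTU_SKETCH 0xFFFFU

/* Hypothetical device whose mtu may be changed concurrently. */
struct dev_sketch {
    unsigned int mtu;
};

/* Read the field through a volatile-qualified pointer so the compiler
 * emits a single load and cannot reload it after the comparison. */
static unsigned int read_mtu_once(const struct dev_sketch *dev)
{
    return *(const volatile unsigned int *)&dev->mtu;
}

/* Compare and return the same snapshot, so the result never
 * exceeds the cap even if dev->mtu changes in between. */
static unsigned int capped_mtu(const struct dev_sketch *dev)
{
    unsigned int mtu = read_mtu_once(dev);

    return mtu < MAX_MTU_SKETCH ? mtu : MAX_MTU_SKETCH;
}
```

The design point is that the comparison and the returned value both use the local `mtu`, never the shared field directly.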

Re: [PATCH net] ipv6: fix NULL dereference in ip6_route_dev_notify()

2017-08-16 Thread Eric Dumazet
On Wed, 2017-08-16 at 11:50 -0700, Cong Wang wrote: > On Tue, Aug 15, 2017 at 4:09 AM, Eric Dumazet wrote: > > From: Eric Dumazet > > > > Based on a syzkaller report [1], I found that a per cpu allocation > > failure in snmp6_alloc_dev() would then

Re: [net-next PATCH 00/10] BPF: sockmap and sk redirect support

2017-08-16 Thread Eric Dumazet
On Wed, 2017-08-16 at 12:13 -0700, David Miller wrote: > From: John Fastabend > Date: Wed, 16 Aug 2017 12:06:36 -0700 > > > On 08/16/2017 11:35 AM, David Miller wrote: > >> From: David Miller > >> Date: Wed, 16 Aug 2017 11:28:19 -0700 (PDT) > >> > >>> From: John Fastabend > >>> Date: Tue, 15 A

[PATCH net-next] ipv4: convert dst_metrics.refcnt from atomic_t to refcount_t

2017-08-16 Thread Eric Dumazet
From: Eric Dumazet refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Eric Dumazet --- include/net/dst.h

Re: [PATCH net] ipv6: fix NULL dereference in ip6_route_dev_notify()

2017-08-16 Thread Eric Dumazet
On Wed, 2017-08-16 at 12:15 -0700, Eric Dumazet wrote: > On Wed, 2017-08-16 at 11:50 -0700, Cong Wang wrote: > > On Tue, Aug 15, 2017 at 4:09 AM, Eric Dumazet > > wrote: > > > From: Eric Dumazet > > > > > > Based on a syzkaller report [1], I found

Re: [PATCH net-next] ipv4: convert dst_metrics.refcnt from atomic_t to refcount_t

2017-08-16 Thread Eric Dumazet
On Wed, 2017-08-16 at 12:24 -0700, Eric Dumazet wrote: > From: Eric Dumazet > > refcount_t type and corresponding API should be > used instead of atomic_t when the variable is used as > a reference counter. This allows to avoid accidental > refcounter overflows that might lead

[PATCH v2 net-next] ipv4: convert dst_metrics.refcnt from atomic_t to refcount_t

2017-08-16 Thread Eric Dumazet
From: Eric Dumazet refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Eric Dumazet --- v2: fix a missing
