[PATCH] iov_iter: optimize page_copy_sane()

2018-12-08 Thread Eric Dumazet
Avoid cache line miss dereferencing struct page if we can. page_copy_sane() mostly deals with order-0 pages. Signed-off-by: Eric Dumazet Cc: Al Viro --- lib/iov_iter.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/lib/iov_iter.c b/lib/iov_iter.c index

Re: [PATCH] Change judgment len position

2018-10-24 Thread Eric Dumazet
On Wed, Oct 24, 2018 at 9:54 AM Joe Perches wrote: > I think if the point is to test for negative numbers, > it's clearer to do that before using min_t, and it's > probably clearer not to use min_t at all. > ... > > if (len > sizeof(int)) > len = sizeof(int); It is a
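The ordering issue raised in this reply can be illustrated in plain userland C. This is only a sketch: both function names are hypothetical, and the second function merely imitates what clamping through an unsigned type does to a negative length. It is not the kernel code under discussion.

```c
#include <stddef.h>

/* Hypothetical helper: reject negative lengths first, then clamp.
 * Returns the clamped length, or -1 if len was negative. */
int clamp_optlen(int len)
{
    if (len < 0)
        return -1;
    if ((size_t)len > sizeof(int))
        len = (int)sizeof(int);
    return len;
}

/* For contrast (also hypothetical): clamping through an unsigned type
 * first turns a negative len into a huge value, which is then clamped
 * to sizeof(int), so a later "len < 0" test can never fire. */
int clamp_unsigned_first(int len)
{
    unsigned int ulen = (unsigned int)len;

    if (ulen > sizeof(int))
        ulen = (unsigned int)sizeof(int);
    return (int)ulen;
}
```

With these definitions, `clamp_optlen(-1)` reports an error, while `clamp_unsigned_first(-1)` silently yields `sizeof(int)`.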

Re: [PATCH 2/2] x86/percpu: Fix this_cpu_read()

2018-10-11 Thread Eric Dumazet
On Thu, Oct 11, 2018 at 8:50 AM Peter Zijlstra wrote: > Right; it goes back a long long way... is: > > 7c3576d261ce ("[PATCH] i386: Convert PDA into the percpu section") > > early enough? That introduces percpu_from_op(), but arguably the > pda_from_op() it replaces was buggy already. Yeah I

Re: [PATCH 2/2] x86/percpu: Fix this_cpu_read()

2018-10-11 Thread Eric Dumazet
On Thu, Oct 11, 2018 at 8:02 AM Eric Dumazet wrote: > > On Thu, Oct 11, 2018 at 3:45 AM Peter Zijlstra wrote: > > > > Eric reported that a sequence count loop using this_cpu_read() got > > optimized out. This is wrong, this_cpu_read() must imply READ_ONCE() > > bec

Re: [PATCH] x86/tsc: use real seqcount_latch in cyc2ns_read_begin()

2018-10-11 Thread Eric Dumazet
On Thu, Oct 11, 2018 at 12:31 AM Peter Zijlstra wrote: > > On Wed, Oct 10, 2018 at 05:33:36PM -0700, Eric Dumazet wrote: > > While looking at native_sched_clock() disassembly I had > > the surprise to see the compiler (gcc 7.3 here) had > > optimized out the loop, me

Re: [PATCH 2/2] x86/percpu: Fix this_cpu_read()

2018-10-11 Thread Eric Dumazet
e per-cpu value. > > Fixes: 59eaef78bfea ("x86/tsc: Remodel cyc2ns to use seqcount_latch()") > Reported-by: Eric Dumazet > Signed-off-by: Peter Zijlstra (Intel) Acked-by: Eric Dumazet > --- > arch/x86/include/asm/percpu.h |8 > 1 file changed, 4 in

[PATCH] x86/tsc: use real seqcount_latch in cyc2ns_read_begin()

2018-10-10 Thread Eric Dumazet
() by one this_cpu_ptr() makes the generated code smaller. Fixes: 59eaef78bfea ("x86/tsc: Remodel cyc2ns to use seqcount_latch()") Signed-off-by: Eric Dumazet Cc: Peter Zijlstra (Intel) Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" --- arch/x86/kernel/tsc.c | 15 +++

Re: [PATCH net-next] tcp: forbid direct reclaim if MSG_DONTWAIT is set in send path

2018-10-09 Thread Eric Dumazet
On Tue, Oct 9, 2018 at 7:58 AM Eric Dumazet wrote: > > We do not add bloat in the kernel if no application is ever going to > use it, especially in the TCP fast path. > BTW, are you willing to change all memory allocations in the kernel as well ? Let's say an application is using a

Re: [PATCH] Input: mousedev - add a schedule point in mousedev_write()

2018-10-04 Thread Eric Dumazet
On 10/04/2018 03:54 PM, Dmitry Torokhov wrote: > OK, I see. I'll apply the patch then. Thanks ! > > I think evdev.c needs similar treatment as it will keep looping while > there is data... Yeah, presumably other drivers need care as well :/

Re: [PATCH] Input: mousedev - add a schedule point in mousedev_write()

2018-10-04 Thread Eric Dumazet
On Thu, Oct 4, 2018 at 12:38 PM Dmitry Torokhov wrote: > > On October 4, 2018 12:28:56 PM PDT, Eric Dumazet wrote: > >On Thu, Oct 4, 2018 at 11:59 AM Dmitry Torokhov > > wrote: > >> > >> Hi Eric, > >> > >> On Thu, Oct 04, 2018 at 08:47:

Re: [PATCH] Input: mousedev - add a schedule point in mousedev_write()

2018-10-04 Thread Eric Dumazet
On Thu, Oct 4, 2018 at 11:59 AM Dmitry Torokhov wrote: > > Hi Eric, > > On Thu, Oct 04, 2018 at 08:47:49AM -0700, Eric Dumazet wrote: > > syzbot was able to trigger rcu stalls by calling write() > > with large number of bytes. > > > > Add a cond_resched() in

[PATCH] Input: mousedev - add a schedule point in mousedev_write()

2018-10-04 Thread Eric Dumazet
syzbot was able to trigger rcu stalls by calling write() with large number of bytes. Add a cond_resched() in the loop to avoid this. Link: https://lkml.org/lkml/2018/8/23/1106 Signed-off-by: Eric Dumazet Reported-by: syzbot+9436b02171ac0894d...@syzkaller.appspotmail.com Cc: Dmitry Torokhov Cc

Re: KMSAN: uninit-value in __dev_mc_add

2018-09-27 Thread Eric Dumazet
On Thu, Sep 27, 2018 at 2:30 PM Vladis Dronov wrote: > > Hello, > > This report is actually for the same bug which was reported in: > > https://syzkaller.appspot.com/bug?id=088efeac32fdde781038a777a63e436c0d4d7036 > > The note there that the bug was fixed by "Commits: net: fix uninit-value in >

Re: KASAN: use-after-free Read in tcf_block_find

2018-09-27 Thread Eric Dumazet
On 09/27/2018 06:02 AM, Dmitry Vyukov wrote: > I am not suggesting to commit this. This is just a hack for debugging. > It in fact led to some warnings, but still allowed me to reproduce > the bug reliably. > Had you got more meaningful stack traces ? (Showing which context was actually

Re: KASAN: use-after-free Read in tcf_block_find

2018-09-27 Thread Eric Dumazet
On 09/27/2018 01:10 AM, Dmitry Vyukov wrote: > > Would a stack trace for call_rcu be helpful here? I have this idea for > a long time, but never get around to implementing it: > https://bugzilla.kernel.org/show_bug.cgi?id=198437 > > Also FWIW I recently used the following hack for another

Re: KASAN: use-after-free Read in tcf_block_find

2018-09-26 Thread Eric Dumazet
On 09/26/2018 02:44 PM, Cong Wang wrote: > On Wed, Sep 26, 2018 at 8:44 AM syzbot > wrote: >> >> Hello, >> >> syzbot found the following crash on: >> >> HEAD commit:4b1bd6976945 net: phy: marvell: Fix build. >> git tree: net-next >> console output:

Re: SLAB_TYPESAFE_BY_RCU without constructors (was Re: [PATCH v4 13/17] khwasan: add hooks implementation)

2018-08-01 Thread Eric Dumazet
On Wed, Aug 1, 2018 at 9:47 AM Dmitry Vyukov wrote: > > Proving with numbers is required for a claimed performance improvement > at the cost of code degradation/increase. For a win-win change there > is really nothing to prove. You have to _prove_ it is a win-win. It is not sufficient to claim

Re: [PATCH] net/rds/Kconfig: RDS should depend on IPV6

2018-07-27 Thread Eric Dumazet
On 07/25/2018 03:20 PM, Anders Roxell wrote: > Build error, implicit declaration of function __inet6_ehashfn shows up > When RDS is enabled but not IPV6. > net/rds/connection.c: In function ‘rds_conn_bucket’: > net/rds/connection.c:67:9: error: implicit declaration of function >

Re: [PATCH net-next] tcp: add SNMP counter for the number of packets pruned from ofo queue

2018-07-25 Thread Eric Dumazet
On 07/25/2018 06:40 AM, Yafang Shao wrote: > > Because we want to know why packets were dropped. > If that could be shown in netstat, we could easily find that it is > dropped due to ofo prune. We have a counter already for these events : LINUX_MIB_OFOPRUNED You want to add another counter

Re: [PATCH net-next] tcp: add SNMP counter for the number of packets pruned from ofo queue

2018-07-25 Thread Eric Dumazet
On 07/25/2018 06:06 AM, Yafang Shao wrote: > LINUX_MIB_OFOPRUNED is used to count how many times ofo queue is pruned, > but sometimes we want to know how many packets are pruned from this queue, > that could help us to track the dropped packets. > > As LINUX_MIB_OFOPRUNED is a useful event for

[tip:timers/core] ktime: Provide typesafe ktime_to_ns()

2018-07-12 Thread tip-bot for Eric Dumazet
Commit-ID: a8802d97e73346bc81609df9dfba7d3306f40d87 Gitweb: https://git.kernel.org/tip/a8802d97e73346bc81609df9dfba7d3306f40d87 Author: Eric Dumazet AuthorDate: Wed, 11 Jul 2018 11:16:41 -0700 Committer: Thomas Gleixner CommitDate: Thu, 12 Jul 2018 21:35:28 +0200 ktime: Provide

[PATCH] ktime: provide typesafe ktime_to_ns()

2018-07-11 Thread Eric Dumazet
Using ktime_to_ns() is nice to help backports to stable kernels. Having a typesafe function instead of a macro avoids stupid typos and the waste of time tracking these typos. Signed-off-by: Eric Dumazet Reported-by: Willem de Bruijn Cc: John Stultz Cc: Peter Zijlstra Cc: Linus Torvalds
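The difference between a pass-through macro and a typed function can be sketched outside the kernel. Everything below is an assumption made for illustration: `my_ktime_t`, `MY_KTIME_TO_NS_MACRO` and `my_ktime_to_ns` are hypothetical stand-ins, not the kernel definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's ktime_t. */
typedef int64_t my_ktime_t;

/* Macro form: expands to its argument, whatever its type. Passing a
 * pointer or a struct by mistake is not diagnosed at this point. */
#define MY_KTIME_TO_NS_MACRO(kt) (kt)

/* Function form: the argument must convert to my_ktime_t, so the
 * compiler can diagnose a pointer or struct passed by mistake. */
static inline int64_t my_ktime_to_ns(const my_ktime_t kt)
{
    return kt;
}
```

Both forms return the same value for a valid argument; only the function form gives the compiler a parameter type to check against.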

Re: [PATCH net-next 2/2] tcp: refactor tcp_queue_rcv

2018-07-11 Thread Eric Dumazet
Hi Yafang On Wed, Jul 11, 2018 at 6:17 AM Yafang Shao wrote: > > There is some code similar to tcp_queue_rcv() in tcp_ofo_queue(), > so refactor tcp_queue_rcv() to make it be used in tcp_ofo_queue(). > > After this change, skb->sk is set when skb is moved from ofo queue into > receive queue

Re: [net-next,v3] tcp: Improve setsockopt() TCP_USER_TIMEOUT accuracy

2018-07-10 Thread Eric Dumazet
On 07/10/2018 05:38 AM, Eric Dumazet wrote: > Note that if we always do jiffies_to_msecs(icsk->icsk_user_timeout) in TCP, > we also could change the convention and store msecs in this field instead of > jiffies. > > That would eliminate the msecs_to_jiffies() and jiffie

Re: [PATCH] tcp: Added check of destination specific CC before sending syn/ack

2018-07-09 Thread Eric Dumazet
On 07/09/2018 04:25 AM, joakim.mis...@gmail.com wrote: > From: Joakim Misund > > Issue: > Currently TCP stack does not check for a destination specific CC before > responding to a syn with a syn/ack. > The system wide default CC is used. If the default CC does not need ECN, but > the

[tip:irq/core] genirq: Speedup show_interrupts()

2018-06-22 Thread tip-bot for Eric Dumazet
Commit-ID: 74bdf7815dfb3805a37b0bba615814063a227bf5 Gitweb: https://git.kernel.org/tip/74bdf7815dfb3805a37b0bba615814063a227bf5 Author: Eric Dumazet AuthorDate: Wed, 20 Jun 2018 08:03:32 -0700 Committer: Thomas Gleixner CommitDate: Fri, 22 Jun 2018 14:22:58 +0200 genirq: Speedup

[PATCH] genirq: speedup show_interrupts()

2018-06-20 Thread Eric Dumazet
t_irqs_cpu() and abuse irq_to_desc() while holding rcu read lock, since desc and desc->kstat_irqs wont disappear or change. Signed-off-by: Eric Dumazet --- kernel/irq/proc.c | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/kernel/irq/proc.c b/kernel/irq

Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"

2018-06-19 Thread Eric Dumazet
On 06/19/2018 05:13 PM, Michael Ellerman wrote: > Just so I'm clear, this turned out to be a driver/hw problem rather than > the arch csum implementation? Yes, that was a driver bug. I will send an official patch to fix this. You guys will have faster RX, since CHECKSUM_COMPLETE will

[tip:irq/core] genirq: Use rcu in kstat_irqs_usr()

2018-06-19 Thread tip-bot for Eric Dumazet
Commit-ID: 4a5f4d2f891bcff7285b5f7490ed5a7a5d516049 Gitweb: https://git.kernel.org/tip/4a5f4d2f891bcff7285b5f7490ed5a7a5d516049 Author: Eric Dumazet AuthorDate: Mon, 18 Jun 2018 05:56:12 -0700 Committer: Thomas Gleixner CommitDate: Tue, 19 Jun 2018 09:19:40 +0200 genirq: Use rcu

Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"

2018-06-18 Thread Eric Dumazet
On 06/18/2018 04:29 PM, Eric Dumazet wrote: > > > On 06/18/2018 11:45 AM, Mathieu Malaterre wrote: > >> >> Here is what I get on my side >> >> [ 53.628847] sungem: sungem wrong csum : 4e04/f97, len 64 bytes >> [ 53.667063] sungem: su

[PATCH] genirq: use rcu in kstat_irqs_usr()

2018-06-18 Thread Eric Dumazet
pts() case will be handled in a separate patch. Signed-off-by: Eric Dumazet Reported-by: Jeremy Dorfman Cc: Thomas Gleixner Cc: Willem de Bruijn --- kernel/irq/irqdesc.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqde

Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks

2018-05-29 Thread Eric Dumazet
On 05/29/2018 11:44 PM, Eric Dumazet wrote: > > And I will add this simple fix, this really should address your initial > concern much better. > > @@ -99,6 +100,8 @@ static int mlx4_alloc_icm_pages(struct scatterlist *mem, > int order, > { >

Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks

2018-05-29 Thread Eric Dumazet
On 05/29/2018 11:34 PM, Eric Dumazet wrote: > I will test : > > diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c > b/drivers/net/ethernet/mellanox/mlx4/icm.c > index > 685337d58276fc91baeeb64387c52985e1bc6dda..4d2a71381acb739585d662175e86caef72338097 > 10064

Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks

2018-05-29 Thread Eric Dumazet
On 05/25/2018 10:23 AM, David Miller wrote: > From: Qing Huang > Date: Wed, 23 May 2018 16:22:46 -0700 > >> When a system is under memory pressure (high usage with fragments), >> the original 256KB ICM chunk allocations will likely trigger kernel >> memory management to enter slow path doing

Re: [PATCH v2 net-next] tcp: use data length instead of skb->len in tcp_probe

2018-05-29 Thread Eric Dumazet
On 05/29/2018 07:36 AM, Yafang Shao wrote: > On Tue, May 29, 2018 at 10:15 PM, David Miller wrote: >> From: Yafang Shao >> Date: Fri, 25 May 2018 18:14:05 +0800 >> >>> skb->len is meaningless to user. >>> data length could be more helpful, with which we can easily filter out >>> the packet

Re: [PATCH v3 net-next 2/2] tcp: minor optimization around tcp_hdr() usage in tcp receive path

2018-05-28 Thread Eric Dumazet
On 05/28/2018 05:41 PM, Yafang Shao wrote: > OK. > > And what about introducing a new helper tcp_hdr_fast() ? > > /* use it when tcp header has not been pulled yet */ > static inline struct tcphdr *tcp_hdr_fast(const struct sk_buff *skb) > > { > > return (const struct tcphdr

Re: [PATCH v3 net-next 2/2] tcp: minor optimization around tcp_hdr() usage in tcp receive path

2018-05-28 Thread Eric Dumazet
On Mon, May 28, 2018 at 8:36 AM Yafang Shao <laoar.s...@gmail.com> wrote: > This is additional to the commit ea1627c20c34 ("tcp: minor optimizations around tcp_hdr() usage"). > At this point, skb->data is same with tcp_hdr() as tcp header has not > been pulled yet

Re: INFO: rcu detected stall in corrupted

2018-05-21 Thread Eric Dumazet
On 05/21/2018 11:09 AM, David Miller wrote: > From: syzbot > Date: Mon, 21 May 2018 11:05:02 -0700 > >> find_match+0x244/0x13a0 net/ipv6/route.c:691 >> find_rr_leaf net/ipv6/route.c:729 [inline] >> rt6_select net/ipv6/route.c:779

Re: [PATCH] bpf: check NULL for sk_to_full_sk()

2018-05-21 Thread Eric Dumazet
On 05/21/2018 12:55 AM, YueHaibing wrote: > like commit df39a9f106d5 ("bpf: check NULL for sk_to_full_sk() return value"), > we should check sk_to_full_sk return value against NULL. > > Signed-off-by: YueHaibing > --- > include/linux/bpf-cgroup.h | 2 +- > 1 file

Re: INFO: rcu detected stall in is_bpf_text_address

2018-05-19 Thread Eric Dumazet
SCTP experts, please take a look. On 05/19/2018 08:55 AM, syzbot wrote: > Hello, > > syzbot found the following crash on: > > HEAD commit:    73fcb1a370c7 Merge branch 'akpm' (patches from Andrew) > git tree:   upstream > console output:

Re: WARNING in ip_recv_error

2018-05-18 Thread Eric Dumazet
On 05/18/2018 05:08 AM, DaeRyong Jeong wrote: > We report the crash: WARNING in ip_recv_error > (I resend the email since I mistakenly missed the subject in my previous > email. I'm sorry.) > > > This crash has been found in v4.17-rc1 using RaceFuzzer (a modified > version of Syzkaller), which

Re: [PATCH v3] mlx4_core: allocate ICM memory in page size chunks

2018-05-17 Thread Eric Dumazet
On 05/17/2018 01:53 PM, Qing Huang wrote: > When a system is under memory pressure (high usage with fragments), > the original 256KB ICM chunk allocations will likely trigger kernel > memory management to enter slow path doing memory compact/migration > ops in order to complete high order memory

Re: [PATCH V2] mlx4_core: allocate ICM memory in page size chunks

2018-05-15 Thread Eric Dumazet
On 05/15/2018 11:53 AM, Qing Huang wrote: > >> This is control path so it is less latency-sensitive. >> Let's not produce unnecessary degradation here, please call kvzalloc so we >> maintain a similar behavior when contiguous memory is available, and a >> fallback for resiliency. > > Not sure

Re: [PATCH v2] {net, IB}/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()'

2018-05-13 Thread Eric Dumazet
On 05/13/2018 12:00 AM, Christophe JAILLET wrote: > When 'kvzalloc()' is used to allocate memory, 'kvfree()' must be used to > free it. > > Signed-off-by: Christophe JAILLET > --- > v1 -> v2: More places to update have been added to the patch Please add

Re: INFO: rcu detected stall in kfree_skbmem

2018-05-11 Thread Eric Dumazet
On 05/11/2018 11:41 AM, Marcelo Ricardo Leitner wrote: > But calling ip6_xmit with rcu_read_lock is expected. tcp stack also > does it. > Thus I think this is more of an issue with IPv6 stack. If a host has > an extensive ip6tables ruleset, it probably generates this more > easily. > >>>

Re: [PATCH 14/32] net/tcp: convert to ->poll_mask

2018-05-11 Thread Eric Dumazet
On 05/11/2018 04:07 AM, Christoph Hellwig wrote: > Signed-off-by: Christoph Hellwig > --- > include/net/tcp.h | 4 ++-- > net/ipv4/af_inet.c | 3 ++- > net/ipv4/tcp.c | 31 ++- > net/ipv6/af_inet6.c | 3 ++- > 4 files changed, 20

Re: [PATCH net-next v2] tcp: Add mark for TIMEWAIT sockets

2018-05-10 Thread Eric Dumazet
On 05/09/2018 11:53 PM, Jon Maxwell wrote: > This version has some suggestions by Eric Dumazet: > > - Use a local variable for the mark in IPv6 instead of ctl_sk to avoid SMP > races. > - Use the more elegant "IP4_REPLY_MARK(net, skb->mark) ?: sk->sk_mark" &

Re: [PATCH net-next v1] tcp: Add mark for TIMEWAIT sockets

2018-05-09 Thread Eric Dumazet
On 05/09/2018 10:21 PM, Jon Maxwell wrote: ... > if (th->rst) > @@ -723,11 +724,17 @@ static void tcp_v4_send_reset(const struct sock *sk, > struct sk_buff *skb) > arg.tos = ip_hdr(skb)->tos; > arg.uid = sock_net_uid(net, sk && sk_fullsock(sk) ? sk : NULL); >

Re: [PATCH net-next] tcp: Add mark for TIMEWAIT sockets

2018-05-09 Thread Eric Dumazet
On 05/09/2018 07:07 PM, Jon Maxwell wrote: > Aidan McGurn from Openwave Mobility systems reported the following bug: > > "Marked routing is broken on customer deployment. Its effects are large > increase in Uplink retransmissions caused by the client never receiving > the final ACK to their

Re: KASAN: use-after-free Read in __dev_queue_xmit

2018-05-09 Thread Eric Dumazet
On 05/09/2018 12:21 PM, Willem de Bruijn wrote: > Indeed. The skb shared info struct is zeroed by dev_validate_header > as a result of dev->hard_header_len exceeding skb->end - skb->data. > > Not exactly sure yet how this can happen. The hard header length space > is accounted for during

Re: BUG: spinlock bad magic in tun_do_read

2018-05-08 Thread Eric Dumazet
On 05/07/2018 10:54 PM, Cong Wang wrote: > On Mon, May 7, 2018 at 10:27 PM, syzbot > wrote: >> Hello, >> >> syzbot found the following crash on: >> >> HEAD commit:75bc37fefc44 Linux 4.17-rc4 >> git tree: upstream >> console

Re: [PATCH] net: 8390: Fix possible data races in __ei_get_stats

2018-05-07 Thread Eric Dumazet
On 05/07/2018 07:16 PM, Jia-Ju Bai wrote: > Yes, "&dev->stats" will not change, because it is a fixed address. > But the field data in "dev->stats" is changed (rx_frame_errors, rx_crc_errors > and rx_missed_errors). > So if the driver returns "&dev->stats" without lock protection (like on line > 858),

Re: [PATCH] net: 8390: Fix possible data races in __ei_get_stats

2018-05-07 Thread Eric Dumazet
On 05/07/2018 05:51 PM, Jia-Ju Bai wrote: > > > On 2018/5/7 22:15, Eric Dumazet wrote: >> >> On 05/07/2018 07:08 AM, Jia-Ju Bai wrote: >>> The write operations to "dev->stats" are protected by >>> the spinlock on line 862-864, but the

Re: [PATCH] net: 8390: Fix possible data races in __ei_get_stats

2018-05-07 Thread Eric Dumazet
On 05/07/2018 07:08 AM, Jia-Ju Bai wrote: > The write operations to "dev->stats" are protected by > the spinlock on line 862-864, but the read operations to > this data on line 858 and 867 are not protected by the spinlock. > Thus, there may exist data races for "dev->stats". > > To fix the

Re: WARNING in kernfs_add_one

2018-05-05 Thread Eric Dumazet
On 05/05/2018 09:40 AM, Greg KH wrote: > On Sat, May 05, 2018 at 08:47:02AM -0700, syzbot wrote: >> Hello, >> >> syzbot found the following crash on: >> >> HEAD commit:8fb11a9a8d51 net/ipv6: rename rt6_next to fib6_next >> git tree: net-next >> console output:

Re: [PATCH] net: disable UDP punt on sockets in RCV_SHUTDWON

2018-05-04 Thread Eric Dumazet
On 05/04/2018 02:08 PM, Chintan Shah wrote: > A UDP application which opens multiple sockets with same local > address/port combination (using SO_REUSEPORT/SO_REUSEADDR socket options); > and issues connect to a remote socket (using one of these local socket). > Now if the same socket, which

[PATCH v4 net-next 2/2] selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE

2018-04-27 Thread Eric Dumazet
number of bytes that should be read using conventional read()/recv()/recvmsg() system calls, to skip a sequence of bytes that can not be mapped, because not properly page aligned. Signed-off-by: Eric Dumazet <eduma...@google.com> Cc: Andy Lutomirski <l...@kernel.org> Acked-by: Soheil Ha

[PATCH v4 net-next 0/2] tcp: mmap: rework zerocopy receive

2018-04-27 Thread Eric Dumazet
p_hint in case user request was completed. Eric Dumazet (2): tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE include/uapi/linux/tcp.h | 8 + net/ipv4/af_inet.c | 2 + net/ipv4

[PATCH v4 net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-27 Thread Eric Dumazet
use of mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...) Note that memcg might require additional changes. Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive") Signed-off-by: Eric Dumazet <eduma...@google.com> Reported-by: syzbot <syzkal...@googlegroups.com>

Re: [PATCH v2 net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-27 Thread Eric Dumazet
On Fri, Apr 27, 2018 at 1:45 AM kbuild test robot <l...@intel.com> wrote: > Hi Eric, > Thank you for the patch! Yet something to improve: > [auto build test ERROR on net-next/master] > url: https://github.com/0day-ci/linux/commits/Eric-Dumazet/tcp-add-TCP_ZEROCOPY_RECEIVE-su

Re: [PATCH v2 net-next 0/2] tcp: mmap: rework zerocopy receive

2018-04-26 Thread Eric Dumazet
On 04/26/2018 02:16 PM, Andy Lutomirski wrote: > At the risk of further muddying the waters, there's another minor tweak > that could improve performance on certain workloads. Currently you mmap() > a range for a given socket and then getsockopt() to receive. If you made > it so you could

Re: [PATCH v2 net-next 0/2] tcp: mmap: rework zerocopy receive

2018-04-26 Thread Eric Dumazet
On 04/25/2018 06:20 PM, Soheil Hassas Yeganeh wrote: > > Acked-by: Soheil Hassas Yeganeh > > Thanks Soheil for reviewing. I have changed setsockopt() to getsockopt() so chose to not carry your Acked-by Please add it back if you agree, thanks !

[PATCH v3 net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-26 Thread Eric Dumazet
use of mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...) Note that memcg might require additional changes. Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive") Signed-off-by: Eric Dumazet <eduma...@google.com> Reported-by: syzbot <syzkal...@googlegroups.com>

[PATCH v3 net-next 2/2] selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE

2018-04-26 Thread Eric Dumazet
number of bytes that should be read using conventional read()/recv()/recvmsg() system calls, to skip a sequence of bytes that can not be mapped, because not properly page aligned. Signed-off-by: Eric Dumazet <eduma...@google.com> Cc: Andy Lutomirski <l...@kernel.org> Cc: Soheil Hassas

[PATCH v3 net-next 0/2] tcp: mmap: rework zerocopy receive

2018-04-26 Thread Eric Dumazet
. v3: change TCP_ZEROCOPY_RECEIVE to be a getsockopt() option instead of setsockopt(), feedback from Ka-Cheon Poon v2: Added a missing page align of zc->length in tcp_zerocopy_receive() Properly clear zc->recv_skip_hint in case user request was completed. Eric Dumazet (2): tc

Re: [PATCH v2 net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-26 Thread Eric Dumazet
On 04/26/2018 06:40 AM, Ka-Cheong Poon wrote: > A quick question.  Is it a normal practice to return a result > in setsockopt() given that the optval parameter is supposed to > be a const void *? Very good question. Andy suggested an ioctl() or setsockopt(), and I chose setsockopt() but it
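The point about `const void *optval` can be made concrete with a mock that touches no real socket. This is a sketch under stated assumptions: `struct demo_zc`, `demo_setopt()` and `demo_getopt()` are hypothetical, the field names only echo those mentioned in the thread, and the way `recv_skip_hint` is computed is a placeholder rather than the kernel's policy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record; field names follow those named in the thread. */
struct demo_zc {
    uint32_t length;          /* in: bytes requested, out: bytes granted */
    uint32_t recv_skip_hint;  /* out: bytes left for a conventional read */
};

/* set-style call: optval is const, so no result can be written back. */
int demo_setopt(const void *optval, size_t optlen)
{
    struct demo_zc zc;

    if (optlen < sizeof(zc))
        return -1;
    memcpy(&zc, optval, sizeof(zc));
    return 0;   /* only a status code reaches the caller */
}

/* get-style call: optval and optlen are writable, so results flow back. */
int demo_getopt(void *optval, size_t *optlen, uint32_t mappable)
{
    struct demo_zc zc;

    if (*optlen < sizeof(zc))
        return -1;
    memcpy(&zc, optval, sizeof(zc));
    if (zc.length > mappable) {
        zc.recv_skip_hint = zc.length - mappable;  /* placeholder policy */
        zc.length = mappable;
    } else {
        zc.recv_skip_hint = 0;
    }
    memcpy(optval, &zc, sizeof(zc));
    *optlen = sizeof(zc);
    return 0;
}
```

The get-style signature is the one that lets a single call carry both the request and its result, which matches the direction the later revisions of the series took.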

[PATCH v2 net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-25 Thread Eric Dumazet
use of mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...) Note that memcg might require additional changes. Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive") Signed-off-by: Eric Dumazet <eduma...@google.com> Reported-by: syzbot <syzkal...@googlegroups.com>

[PATCH v2 net-next 2/2] selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE

2018-04-25 Thread Eric Dumazet
number of bytes that should be read using conventional read()/recv()/recvmsg() system calls, to skip a sequence of bytes that can not be mapped, because not properly page aligned. Signed-off-by: Eric Dumazet <eduma...@google.com> Cc: Andy Lutomirski <l...@kernel.org> Cc: Soheil Hassas

[PATCH v2 net-next 0/2] tcp: mmap: rework zerocopy receive

2018-04-25 Thread Eric Dumazet
. v2: Added a missing page align of zc->length in tcp_zerocopy_receive() Properly clear zc->recv_skip_hint in case user request was completed. Eric Dumazet (2): tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE include/uapi

Re: [PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-25 Thread Eric Dumazet
On 04/25/2018 09:35 AM, Eric Dumazet wrote: > > > On 04/25/2018 09:22 AM, Andy Lutomirski wrote: > >> In general, I suspect that the zerocopy receive mechanism will only >> really be a win in single-threaded applications that consume large >> amounts of rec

Re: [PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-25 Thread Eric Dumazet
On 04/25/2018 09:22 AM, Andy Lutomirski wrote: > In general, I suspect that the zerocopy receive mechanism will only > really be a win in single-threaded applications that consume large > amounts of receive bandwidth on a single TCP socket using lots of > memory and don't do all that much else.

Re: [PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-25 Thread Eric Dumazet
On 04/25/2018 09:04 AM, Matthew Wilcox wrote: > If you don't zap the page range, any of the CPUs in the system where > any thread in this task have ever run may have a TLB entry pointing to > this page ... if the page is being recycled into the page allocator, > then that page might end up as a

Re: [PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-25 Thread Eric Dumazet
On 04/24/2018 10:27 PM, Eric Dumazet wrote: > When adding tcp mmap() implementation, I forgot that socket lock > had to be taken before current->mm->mmap_sem. syzbot eventually caught > the bug. > + ... > + down_read(&current->mm->mmap_sem); > + > + ret

Re: [PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-25 Thread Eric Dumazet
On 04/24/2018 11:28 PM, Christoph Hellwig wrote: > On Tue, Apr 24, 2018 at 10:27:21PM -0700, Eric Dumazet wrote: >> When adding tcp mmap() implementation, I forgot that socket lock >> had to be taken before current->mm->mmap_sem. syzbot eventually caught >> the bug.

[PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

2018-04-24 Thread Eric Dumazet
use of mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...) Note that memcg might require additional changes. Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive") Signed-off-by: Eric Dumazet <eduma...@google.com> Reported-by: syzbot <syzkal...@googlegroups.com>

[PATCH net-next 0/2] tcp: mmap: rework zerocopy receive

2018-04-24 Thread Eric Dumazet
. Eric Dumazet (2): tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE include/uapi/linux/tcp.h | 8 ++ net/ipv4/tcp.c | 186 + tools/testing/selftests/net/tcp_mmap.c

[PATCH net-next 2/2] selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE

2018-04-24 Thread Eric Dumazet
number of bytes that should be read using conventional read()/recv()/recvmsg() system calls, to skip a sequence of bytes that can not be mapped, because not properly page aligned. Signed-off-by: Eric Dumazet <eduma...@google.com> Cc: Andy Lutomirski <l...@kernel.org> Cc: Soheil Hassas

Re: [PATCH net-next] net: init sk_cookie for inet socket

2018-04-24 Thread Eric Dumazet
On 04/24/2018 04:47 AM, Yafang Shao wrote: > > Could you pls. explain the issue to me ? Just run a synflood test on your host, it will definitely show the atomic consuming most cpu cycles in inet_reqsk_alloc(), because of huge contention on a cache line shared by all cpus. Performance is

Re: [PATCH net-next] net: init sk_cookie for inet socket

2018-04-24 Thread Eric Dumazet
On 04/23/2018 09:39 PM, Yafang Shao wrote: > On Tue, Apr 24, 2018 at 12:09 AM, Eric Dumazet <eric.duma...@gmail.com> wrote: >> >> >> On 04/23/2018 08:58 AM, David Miller wrote: >>> From: Yafang Shao <laoar.s...@gmail.com> >>> Date: Sun, 22 Apr

Re: [PATCH net-next 0/4] mm,tcp: provide mmap_hook to solve lockdep issue

2018-04-23 Thread Eric Dumazet
On 04/23/2018 07:04 PM, Andy Lutomirski wrote: > On Mon, Apr 23, 2018 at 2:38 PM, Eric Dumazet <eric.duma...@gmail.com> wrote: >> Hi Andy >> >> On 04/23/2018 02:14 PM, Andy Lutomirski wrote: > >>> I would suggest that you rework the interface a bit. First

Re: [PATCH net-next 0/4] mm,tcp: provide mmap_hook to solve lockdep issue

2018-04-23 Thread Eric Dumazet
Hi Andy On 04/23/2018 02:14 PM, Andy Lutomirski wrote: > On 04/20/2018 08:55 AM, Eric Dumazet wrote: >> This patch series provide a new mmap_hook to fs willing to grab >> a mutex before mm->mmap_sem is taken, to ensure lockdep sanity. >> >> This hook allows us to sho

Re: [PATCH net-next] net: init sk_cookie for inet socket

2018-04-23 Thread Eric Dumazet
On 04/23/2018 08:58 AM, David Miller wrote: > From: Yafang Shao > Date: Sun, 22 Apr 2018 21:50:04 +0800 > >> With sk_cookie we can identify a socket, that is very helpful for >> tracing and statistics, i.e. tcp tracepoint and ebpf. >> So we'd better init it by default for

Re: WARNING: suspicious RCU usage in rt6_check_expired

2018-04-23 Thread Eric Dumazet
On 04/23/2018 01:24 AM, syzbot wrote: > Hello, > > syzbot hit the following crash on net-next commit > 0638eb573cde5888c0886c7f35da604e5db209a6 (Sat Apr 21 20:06:14 2018 +) > Merge branch 'ipv6-Another-followup-to-the-fib6_info-change' > syzbot dashboard link: >

Re: [PATCH tip/core/rcu 07/22] softirq: Eliminate unused cond_resched_softirq() macro

2018-04-23 Thread Eric Dumazet
t. > > > > Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com> > > Cc: Ingo Molnar <mi...@redhat.com> > Fair enough, > Acked-by: Peter Zijlstra (Intel) <pet...@infradead.org> Yes, I suggested this removal in https://www.spinics.net/lists/netdev

Re: [PATCH net-next 0/4] mm,tcp: provide mmap_hook to solve lockdep issue

2018-04-21 Thread Eric Dumazet
On 04/21/2018 02:07 AM, Christoph Hellwig wrote: > On Fri, Apr 20, 2018 at 08:55:38AM -0700, Eric Dumazet wrote: >> This patch series provide a new mmap_hook to fs willing to grab >> a mutex before mm->mmap_sem is taken, to ensure lockdep sanity. >> >> This hook

[PATCH net-next 4/4] tcp: mmap: move the skb cleanup to tcp_mmap_hook()

2018-04-20 Thread Eric Dumazet
s to perform mm operations without delay. Note that the preparation work (building the array of page pointers) can also be done from tcp_mmap_hook() while mmap_sem has not been taken yet, but this is another independent change. Signed-off-by: Eric Dumazet <eduma...@google.com> --- n

[PATCH net-next 2/4] net: implement sock_mmap_hook()

2018-04-20 Thread Eric Dumazet
sock_mmap_hook() is the mmap_hook handler provided for socket_file_ops Following patch will provide tcp_mmap_hook() for TCP protocol. Signed-off-by: Eric Dumazet <eduma...@google.com> --- include/linux/net.h | 1 + net/socket.c| 9 + 2 files changed, 10 insertions(+) diff

[PATCH net-next 1/4] mm: provide a mmap_hook infrastructure

2018-04-20 Thread Eric Dumazet
in multi-threading programs. Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive") Signed-off-by: Eric Dumazet <eduma...@google.com> Reported-by: syzbot <syzkal...@googlegroups.com> --- include/linux/fs.h | 6 ++ mm/util.c | 19 ++- 2 fil

[PATCH net-next 0/4] mm,tcp: provide mmap_hook to solve lockdep issue

2018-04-20 Thread Eric Dumazet
This patch series provides a new mmap_hook to fs willing to grab a mutex before mm->mmap_sem is taken, to ensure lockdep sanity. This hook allows us to shorten tcp_mmap() execution time (while mmap_sem is held), and improve multi-threading scalability. Eric Dumazet (4): mm: provide a mmap_h

[PATCH net-next 3/4] tcp: provide tcp_mmap_hook()

2018-04-20 Thread Eric Dumazet
tcp_mmap() execution time and thus increase mmap() performance in multi-threaded programs. Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive") Signed-off-by: Eric Dumazet <eduma...@google.com> Reported-by: syzbot <syzkal...@googlegroups.com> --- include/net/tcp

Re: [PATCH net] tcp: don't read out-of-bounds opsize

2018-04-20 Thread Eric Dumazet
On 04/20/2018 06:57 AM, Jann Horn wrote: > The old code reads the "opsize" variable from out-of-bounds memory (first > byte behind the segment) if a broken TCP segment ends directly after an > opcode that is neither EOL nor NOP. > > The result of the read isn't used for anything, so the worst
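The class of bug described in the snippet above (reading the length byte of an option when the segment ends right after the opcode) can be illustrated with a small userspace sketch. This is not the kernel's parser and not the patch under discussion: `count_tcp_options()` is a hypothetical helper written only for this illustration. The only facts it relies on are the standard TCP option layout (EOL = 0, NOP = 1, every other option is opcode, length, payload, with the length covering the first two bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_EOL 0 /* end of option list, single byte */
#define TCPOPT_NOP 1 /* padding, single byte */

/*
 * Hypothetical helper, not kernel code: count the well-formed
 * opcode/length/payload options in buf[0..len). EOL stops the walk and
 * NOP is skipped without being counted. Returns -1 if an option is
 * truncated or carries an impossible length.
 */
int count_tcp_options(const uint8_t *buf, size_t len)
{
    size_t i = 0;
    int count = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        uint8_t opcode = buf[i++];
        uint8_t opsize;

        if (opcode == TCPOPT_EOL)
            return count;
        if (opcode == TCPOPT_NOP)
            continue;

        /* Check that the length byte is inside the buffer BEFORE reading it. */
        if (i >= len)
            return -1;
        opsize = buf[i++];

        /* opsize includes the opcode and length bytes, so it is at least 2. */
        if (opsize < 2)
            return -1;
        /* The payload must fit in what is left of the buffer. */
        if ((size_t)(opsize - 2) > len - i)
            return -1;

        i += (size_t)(opsize - 2);
        count++;
    }
    return count;
}
```

The point of the sketch is the ordering: the bounds check comes before the read of `opsize`, so a segment that ends directly after a non-EOL, non-NOP opcode is rejected without touching the byte behind it.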

Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM

2018-04-19 Thread Eric Dumazet
On 04/19/2018 09:12 AM, Mikulas Patocka wrote: > > > These bugs are hard to reproduce because vmalloc falls back to kmalloc > only if memory is fragmented. > This sentence is wrong. It should read: "because kvmalloc() falls back to vmalloc() ..."
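The direction of the fallback in the correction above can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel implementation: `contig_alloc()` and `virt_alloc()` are hypothetical stand-ins for kmalloc() and vmalloc(), the `fragmented` flag simulates a failed contiguous allocation, and the real kvmalloc() has additional conditions (for example around small sizes and GFP flags) that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum backing { BACKING_NONE, BACKING_CONTIGUOUS, BACKING_VIRTUAL };

/* Hypothetical stand-in for kmalloc(): fails when memory is "fragmented". */
static void *contig_alloc(size_t size, bool fragmented)
{
    return fragmented ? NULL : malloc(size);
}

/* Hypothetical stand-in for vmalloc(): does not need contiguous memory. */
static void *virt_alloc(size_t size)
{
    return malloc(size);
}

/*
 * Sketch of the fallback order: try the contiguous allocator first and
 * use the virtually contiguous one only if that attempt fails. *used
 * reports which path produced the returned pointer.
 */
void *kv_alloc_sketch(size_t size, bool fragmented, enum backing *used)
{
    void *p;

    assert(used != NULL);

    p = contig_alloc(size, fragmented);
    if (p) {
        *used = BACKING_CONTIGUOUS;
        return p;
    }

    p = virt_alloc(size);
    *used = p ? BACKING_VIRTUAL : BACKING_NONE;
    return p;
}
```

With this ordering the second path is taken only when the first allocation fails, which is why code that silently depends on physically contiguous memory can go untested for a long time.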

Re: WARNING: suspicious RCU usage in fib6_info_alloc

2018-04-18 Thread Eric Dumazet
On 04/18/2018 02:04 PM, David Ahern wrote: > On 4/18/18 3:02 PM, syzbot wrote: >> stack backtrace: >> CPU: 1 PID: 25 Comm: kworker/1:1 Not tainted 4.16.0+ #5 >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS >> Google 01/01/2011 >> Workqueue: ipv6_addrconf

Re: [PATCH] net: don't use kvzalloc for DMA memory

2018-04-18 Thread Eric Dumazet
On 04/18/2018 10:55 AM, Michael S. Tsirkin wrote: > Imagine you want to pass some data to card. > Natural thing is to just put it in a variable and start DMA. > However DMA API disallows stack access nowadays, > so it's natural to put this within struct device. > > See e.g. > > commit

Re: [PATCH] net: don't use kvzalloc for DMA memory

2018-04-18 Thread Eric Dumazet
On 04/18/2018 09:44 AM, Mikulas Patocka wrote: > > > On Wed, 18 Apr 2018, Eric Dumazet wrote: > >> >> >> On 04/18/2018 07:34 AM, Mikulas Patocka wrote: >>> The patch 74d332c13b21 changes alloc_netdev_mqs to use vzalloc if kzalloc >>

Re: [PATCH] net: don't use kvzalloc for DMA memory

2018-04-18 Thread Eric Dumazet
On 04/18/2018 07:34 AM, Mikulas Patocka wrote: > The patch 74d332c13b21 changes alloc_netdev_mqs to use vzalloc if kzalloc > fails (later patches change it to kvzalloc). > > The problem with this is that if the vzalloc function is actually used, > virtio_net doesn't work (because it expects

Re: [PATCH v2 net-next] net: introduce a new tracepoint for tcp_rcv_space_adjust

2018-04-17 Thread Eric Dumazet
On 04/17/2018 09:36 AM, Yafang Shao wrote: > tcp_rcv_space_adjust is called every time data is copied to user space, > so introducing a tcp tracepoint for it could show us when the packet is > copied to user space. > This could help us figure out whether there's latency in the user process. > > When a

Re: [PATCH net-next] net: introduce a new tracepoint for tcp_rcv_space_adjust

2018-04-16 Thread Eric Dumazet
On 04/16/2018 08:33 AM, Yafang Shao wrote: > tcp_rcv_space_adjust is called every time data is copied to user space, > so introducing a tcp tracepoint for it could show us when the packet is > copied to user space. > This could help us figure out whether there's latency in the user process. > > When a

Re: instant reboot caused by 194a9749c73d650c0

2018-04-16 Thread Eric Dumazet
On 04/16/2018 02:15 AM, Kirill A. Shutemov wrote: > On Mon, Apr 16, 2018 at 08:07:09AM +0200, Ingo Molnar wrote: >> >> * Eric Dumazet <eric.duma...@gmail.com> wrote: >> >>> Hi Kirill >>> >>> For some reason, my hosts instantly cras

instant reboot caused by 194a9749c73d650c0

2018-04-14 Thread Eric Dumazet
Hi Kirill For some reason, my hosts instantly crash at boot time, with absolutely no log on console. Bisection pointed to : $ git bisect bad 194a9749c73d650c0b1dfdee04fb0bdf0a888ba8 is the first bad commit commit 194a9749c73d650c0b1dfdee04fb0bdf0a888ba8 Author: Kirill A. Shutemov
