TCP is limited by receive
> window")
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Excellent catch! Thank you for the fix, Eric!
> ---
> net/ipv4/tcp_output.c | 5 -
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/n
On Thu, Nov 29, 2018 at 10:56 AM Eric Dumazet wrote:
>
> We can remove the loop and conditional branches
> and compute wscale efficiently thanks to ilog2()
>
> Signed-off-by: Eric Dumazet
Acked-by: Soheil Hassas Yeganeh
Very nice, thank you, Eric!
> ---
> net/
> Signed-off-by: Eric Dumazet
> Cc: Willem de Bruijn
> Cc: Soheil Hassas Yeganeh
Acked-by: Soheil Hassas Yeganeh
Thank you, Eric!
> ---
> drivers/net/loopback.c | 4
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/loop
From: Soheil Hassas Yeganeh
When we have less than PAGE_SIZE of data on receive queue,
we set recv_skip_hint to 0. Instead, set it to the actual
number of bytes available.
Signed-off-by: Soheil Hassas Yeganeh
Signed-off-by: Eric Dumazet
---
net/ipv4/tcp.c | 14 +-
1 file changed
From: Soheil Hassas Yeganeh
When SKBs are coalesced, we can have SKBs with different
frag sizes: some equal to PAGE_SIZE and some not.
Since recv_skip_hint is always set to the full SKB size,
it can overestimate the amount that should be read using
normal read for coalesced packets
From: Soheil Hassas Yeganeh
The user-provided value to setsockopt(SO_RCVLOWAT) can be
larger than the maximum possible receive buffer. Such values
mute POLLIN signals on the socket which can stall progress
on the socket.
Limit the user-provided value to half of the maximum receive
buffer, i.e
current packet should be enough.
>
> This should reduce the extra load noticed in DCTCP environments,
> after congestion events.
>
> This is part 2 of our effort to reduce pure ACK packets.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
Thanks for the patch!
umazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
> ---
> net/ipv4/tcp_input.c | 24 +---
> 1 file changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input
but also
> because of ACK compression or losses.
>
> We plan to add SACK compression in the following patch, we
> must therefore not call tcp_enter_quickack_mode()
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
Thank you, Eric!
ted-by: Michael Wenig <mwe...@vmware.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
Thank you for catching and fixing this!
From: Soheil Hassas Yeganeh <soh...@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soh...@google.com>
Signed-off-by: Yuchung Cheng <ych...@google.com>
Signed-off-by: Willem de Bruijn <will...@google.com>
Reviewed-by: Eric Dumazet <eduma...@google.com>
Revi
From: Soheil Hassas Yeganeh <soh...@google.com>
Applications with many concurrent connections, high variance
in receive queue length and tight memory bounds cannot
allocate worst-case buffer size to drain sockets. Knowing
the size of receive queue length, applications can optimi
On Tue, May 1, 2018 at 2:34 PM, David Miller <da...@davemloft.net> wrote:
> From: Soheil Hassas Yeganeh <soheil.k...@gmail.com>
> Date: Tue, 1 May 2018 10:11:27 -0400
>
>> +static inline int tcp_inq_hint(struct sock *sk)
>
> Please do not use 'inline' in foo
On Mon, Apr 30, 2018 at 12:10 PM, David Miller wrote:
> From: Eric Dumazet
> Date: Mon, 30 Apr 2018 09:01:47 -0700
>
>> TCP sockets are read by a single thread really (or synchronized
>> threads), or garbage is ensured, regardless of how the kernel
>>
On Mon, Apr 30, 2018 at 11:43 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On 04/30/2018 08:38 AM, David Miller wrote:
>> From: Soheil Hassas Yeganeh <soheil.k...@gmail.com>
>> Date: Fri, 27 Apr 2018 14:57:32 -0400
>>
>>> Since the socket
On Fri, Apr 27, 2018 at 2:50 PM, Soheil Hassas Yeganeh
<soheil.k...@gmail.com> wrote:
> From: Soheil Hassas Yeganeh <soh...@google.com>
>
> Signed-off-by: Soheil Hassas Yeganeh <soh...@google.com>
> Signed-off-by: Yuchung Cheng <ych...@google.com&g
_RECEIVE ...)
>
> Note that memcg might require additional changes.
>
> Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive")
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Reported-by: syzbot <syzkal...@googlegroups.com>
> Suggested-by:
d, because not properly page
> aligned.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Cc: Andy Lutomirski <l...@kernel.org>
> Cc: Soheil Hassas Yeganeh <soh...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
Thank you, again!
>
adding a new setsockopt() operation and changes mmap()
> behavior.
>
> Second patch changes tcp_mmap reference program.
>
> v2:
> Added a missing page align of zc->length in tcp_zerocopy_receive()
> Properly clear zc->recv_skip_hint in case user request was completed.
Acked-b
From: Soheil Hassas Yeganeh <soh...@google.com>
Clear tp->packets_out when purging the write queue, otherwise
tcp_rearm_rto() mistakenly assumes TCP write queue is not empty.
This results in NULL pointer dereference.
Also, remove the redundant `tp->packets_out = 0` from
tcp_disconn
On Tue, Apr 3, 2018 at 11:19 AM Miroslav Lichvar wrote:
>
> I came across an interesting issue with error messages in sockets with
> enabled timestamping using the SOF_TIMESTAMPING_OPT_CMSG option. When
> the socket is connected and there is an error (e.g. due to destination
On Mon, Mar 19, 2018 at 10:16 AM Eric Dumazet
wrote:
> On 03/19/2018 07:03 AM, David Miller wrote:
> > From: Eric Dumazet
> > Date: Mon, 19 Mar 2018 05:17:37 -0700
> >
> >> We have sent a fix last week, I am not sure if David took it.
> >>
> >>
From: Soheil Hassas Yeganeh <soh...@google.com>
tcp_write_queue_purge clears all the SKBs in the write queue
but does not reset the sk_send_head. As a result, we can have
a NULL pointer dereference anywhere that we use tcp_send_head
instead of the tcp_write_queue_tail.
For example,
On Wed, Mar 14, 2018 at 12:32 PM Willem de Bruijn <
willemdebruijn.ker...@gmail.com> wrote:
> On Tue, Mar 13, 2018 at 4:35 PM, Vinicius Costa Gomes
> wrote:
> > Hi,
> >
> > Changes from the RFC:
> > - tweaked commit messages;
> >
> > Original cover letter:
> >
> > This
From: Soheil Hassas Yeganeh <soh...@google.com>
When the connection is aborted, there is no point in
keeping the packets on the write queue until the connection
is closed.
Similar to a27fd7a8ed38 ('tcp: purge write queue upon RST'),
this is essential for a correct MSG_ZEROCOPY implemen
er
> of packets that are needed to fill the pipe when a device has
> suboptimal TSO limits.
> Eric Dumazet (2):
>tcp_bbr: better deal with suboptimal GSO (II)
>tcp_bbr: remove bbr->tso_segs_goal
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
Thank you
From: Soheil Hassas Yeganeh <soh...@google.com>
When the connection is reset, there is no point in
keeping the packets on the write queue until the connection
is closed.
RFC 793 (page 70) and RFC 793-bis (page 64) both suggest
purging the write queue upon RST:
https://tools.ietf.org/html
From: Soheil Hassas Yeganeh <soh...@google.com>
recvmmsg does not call ___sys_recvmsg when sk_err is set.
That is fine for normal reads but, for MSG_ERRQUEUE, recvmmsg
should always call ___sys_recvmsg regardless of sk->sk_err to
be able to clear the error queue. Otherwise, users are
gt;1746
> >1781
> >1718
> >
> > Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
> > Signed-off-by: Eric Dumazet <eduma...@google.com>
> > Reported-by: Oleksandr Natalenko <oleksa...@natalenko.name>
> > ---
> > net/ipv4/tcp_output.c |9 +
> > 1 file changed, 5 insertions(+), 4 deletions(-)
> Acked-by: Neal Cardwell <ncardw...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
Thank you Eric for the nice patch!
p_sendmsg() only deals with CHECKSUM_PARTIAL
> tcp: remove dead code from tcp_set_skb_tso_segs()
> tcp: remove dead code after CHECKSUM_PARTIAL adoption
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
Very nice patch-series! Thank you, Eric!
>include/net/soc
From: Soheil Hassas Yeganeh <soh...@google.com>
On multi-threaded processes, one common architecture is to have
one (or a small number of) threads polling sockets, and a
considerably larger pool of threads reading from and writing to the
sockets. When we set RPS core on tcp_poll() or ud
From: Soheil Hassas Yeganeh <soh...@google.com>
We should only record RPS on normal reads and writes.
In single threaded processes, all calls record the same state. In
multi-threaded processes where a separate thread processes
errors, the RFS table mispredicts.
Note that, when CONF
stamp in output path")
>> Signed-off-by: Eric Dumazet <eduma...@google.com>
>> Cc: Soheil Hassas Yeganeh <soh...@google.com>
>> Cc: Mike Maloney <malo...@google.com>
>> Cc: Neal Cardwell <ncardw...@google.com>
>> ---
>
> Acked-by: Neal Cardwell <ncardw...@google.com>
>
> Thanks, Eric!
>
> neal
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
This is a very nice catch! Thank you Eric!
T to arrive (often due to delayed ACKs), then
> the TLP timer fires too quickly.
>
> Fixes: df92c8394e6e ("tcp: fix xmit timer to only be reset if data
> ACKed/SACKed")
> Signed-off-by: Neal Cardwell <ncardw...@google.com>
> Signed-off-by: Yuchung Cheng <ych...@google.com>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
Nice fix. Thank you, Neal!
il MTU probing was enabled.
>>
>> Fortunately we can use the tcp_queue enum added later (but in same linux
>> version)
>> for rtx-rb-tree to fix the bug.
>>
>> Fixes: e2080072ed2d ("tcp: new list for sent but unacked skbs for RACK
>> recovery")
>
overhead in both TCP and netem.
>
> v2: removes the swtstamp field from struct tcp_skb_cb
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Cc: Soheil Hassas Yeganeh <soh...@google.com>
> Cc: Wei Wang <wei...@google.com>
> Cc: Willem de Bruijn <will...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
Very nice!
"tcp: update skb->skb_mstamp more carefully")
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
Nice catch!
CK would then lead to a very small RTT sample and min_rtt
>>> would then be lowered to this too small value.
>> ...
>>>
>>> Signed-off-by: Eric Dumazet <eduma...@googl.com>
>>> Reported-by: liujian <liujia...@huawei.com>
>>> ---
>>> net/ipv4/tcp_output.c | 19 ---
>>> 1 file changed, 12 insertions(+), 7 deletions(-)
>>
>> Acked-by: Neal Cardwell <ncardw...@google.com>
> Acked-by: Yuchung Cheng <ych...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
Very nice! Thank you, Eric!
81
>
> lpaa5:/tmp# tc qd replace dev eth1 root fq low_rate_threshold 10Mbit
> lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 50 -r 300,300 -o
> P99_LATENCY
> 99th Percentile Latency Microseconds
> 858
>
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
Thank you, Eric!
p_tx ensures that new
> skbs do not accidentally inherit flags such as SKBTX_SHARED_FRAG.
>
> Signed-off-by: Willem de Bruijn <will...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
From: Soheil Hassas Yeganeh <soh...@google.com>
Prior to f5f99309fa74 (sock: do not set sk_err in
sock_dequeue_err_skb), sk_err was reset to the error of
the skb on the head of the error queue.
Applications, most notably ping, are relying on this
behavior to reset sk_err for ICMP packets
On Thu, Jun 1, 2017 at 11:36 AM, Cyril Hrubis wrote:
> It seems to repeatedly produce (until I plug the cable back):
>
> ee_errno = 113 ee_origin = 2 ee_type = 3 ee_code = 1 ee_info = 0 ee_data = 0
>
> So we get EHOSTUNREACH on SO_EE_ORIGIN_ICMP.
Thank you very much! I have a
On Thu, Jun 1, 2017 at 11:10 AM, Cyril Hrubis wrote:
>> Thank you for the confirmation. Could you please try the following
>> patch to see if it fixes your issue?
>
> Does not seem to help, I still got the same busy loop.
Thank you for trying the patch. Unfortunately, I can't
you please try the following
patch to see if it fixes your issue?
>From 3ec438460425d127741b20f03f78644c9e441e8c Mon Sep 17 00:00:00 2001
From: Soheil Hassas Yeganeh <soh...@google.com>
Date: Thu, 1 Jun 2017 10:34:09 -0400
Subject: [PATCH net] sock: reset sk_err when the error queue is empty
Befor
On Thu, Jun 1, 2017 at 10:00 AM, Cyril Hrubis <chru...@suse.cz> wrote:
> I've bisected the problem to this commit:
>
> commit f5f99309fa7481f59a500f0d08f3379cd6424c1f (HEAD, refs/bisect/bad)
> Author: Soheil Hassas Yeganeh <soh...@google.com>
> Date: Thu Nov 3 18:24
retransmits_timed_out()
>
> Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Signed-off-by: Yuchung Cheng <ych...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
Nice!
>
> Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
Thank you for the fix, Eric!
o correlation to tcp_jiffies32.
>
> We have to convert rto from jiffies to usec, compute a time difference
> in usec, then convert the delta to HZ units.
>
> Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
> Signed-off-by: Eric Dumazet <eduma...@go
On Tue, May 16, 2017 at 8:44 AM, Miroslav Lichvar wrote:
> Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message
> for incoming packets with hardware timestamps. It contains the index of
> the real interface which received the packet and the length of the
>
ngs/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
> ---
> include/linux/skbuff.h | 62 +-
> include/linux/tcp.h
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet <eduma...@google.com> wrote:
> This CC does not need 1 ms tcp_time_stamp and can use
> the jiffy based 'timestamp'.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@goo
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet <eduma...@google.com> wrote:
> This place wants to use tcp_jiffies32, this is good enough.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
> ---
> net/ip
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet <eduma...@google.com> wrote:
> After this patch, all uses of tcp_time_stamp will require
> a change when we introduce 1 ms and/or 1 us TCP TS option.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: So
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet <eduma...@google.com> wrote:
> tcp_time_stamp will become slightly more expensive soon,
> cache its value.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet <eduma...@google.com> wrote:
> Use tcp_jiffies32 instead of tcp_time_stamp, since
> tcp_time_stamp will soon be only used for TCP TS option.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Ye
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet <eduma...@google.com> wrote:
> tcp_time_stamp will no longer be tied to jiffies.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
> ---
> net/ipv4/tcp.c
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet <eduma...@google.com> wrote:
> Use tcp_jiffies32 instead of tcp_time_stamp, since
> tcp_time_stamp will soon be only used for TCP TS option.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Ye
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet <eduma...@google.com> wrote:
> Use tcp_jiffies32 instead of tcp_time_stamp, since
> tcp_time_stamp will soon be only used for TCP TS option.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Ye
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet <eduma...@google.com> wrote:
> Use tcp_jiffies32 instead of tcp_time_stamp, since
> tcp_time_stamp will soon be only used for TCP TS option.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Ye
c Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
> ---
> net/ipv4/tcp_input.c | 14 +++---
> net/ipv4/tcp_metrics.c | 2 +-
> net/ipv4/tcp_output.c | 8
> 3 files changed, 12 insertions(+), 12 deletions(-)
>
t <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
> ---
> include/net/tcp.h | 2 +-
> net/ipv4/tcp.c| 2 +-
> net/ipv4/tcp_cubic.c | 2 +-
> net/ipv4/tcp_input.c | 4 ++--
> net/ipv4/tcp_output.c | 4 ++--
> net/ipv4/tcp_tim
On Tue, May 16, 2017 at 5:00 PM, Eric Dumazet <eduma...@google.com> wrote:
> Use our own macro instead of abusing tcp_time_stamp
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
> ---
> net/dccp/ccids
emory.
>
> Since we want in the future to have 1ms TCP TS clock,
> regardless of HZ value, we want to cleanup things.
>
> tcp_jiffies32 is the truncated jiffies value,
> which will be used only in places where we want a 'host'
> timestamp.
>
> Signed-off-by: Eric Dumazet <
resh tp->tcp_mstamp only when necessary.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
> ---
> net/ipv4/tcp_ipv4.c | 1 +
> net/ipv4/tcp_output.c | 21 +++--
> net/ipv4/tcp_recover
From: Soheil Hassas Yeganeh <soh...@google.com>
Commit bafbb9c73241 ("tcp: eliminate negative reordering
in tcp_clean_rtx_queue") fixes an issue for negative
reordering metrics.
To be resilient to such errors, warn and return
when a negative metric is passed to tcp_update_reord
0 0 0 1696340 197756 1
> 17 83 0 0
> 4 0 0 259829168 46024 27105840 0 16 0 1688472 197158 1
> 17 82 0 0
> 3 0 0 259830224 46024 271040800 0 0 1692450 197212 0
> 18 82 0 0
>
> As expected, number of interrupts per
From: Soheil Hassas Yeganeh <soh...@google.com>
tcp_ack() can call tcp_fragment() which may deduct the
value tp->fackets_out when MSS changes. When prior_fackets
is larger than tp->fackets_out, tcp_clean_rtx_queue() can
invoke tcp_update_reordering() with negative values. This
resul
;sock: enable timestamping using control messages")
> Signed-off-by: Douglas Caetano dos Santos <dougla...@taghos.com.br>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
> ---
> net/packet/af_packet.c | 14 +++---
> 1 file changed, 7 insertions(+), 7 deletions(
92f ("tcp: do not pass timestamp to tcp_rack_detect_loss()")
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
> Cc: Soheil Hassas Yeganeh <soh...@google.com>
> Cc: Neal Cardwell <ncardw...@google.com&
cket.
>
> We will use it in the following patches, removing specific
> skb_mstamp_get() calls, and removing ack_time from
> struct tcp_sacktag_state.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
> ---
>
before the first wakeup.
>
> This might even allow us to remove some barriers we added in the past.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
> ---
> net/ipv4/tcp_input.c | 4 ++--
> 1 file changed, 2
ialization earlier.
>
> The bug was detected with KMSAN.
Nice catch and thanks for the fix! This is missing a "fixes"
attribution, added below.
> Signed-off-by: Alexander Potapenko <gli...@google.com>
Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
From: Soheil Hassas Yeganeh <soh...@google.com>
SOF_TIMESTAMPING_OPT_STATS can be enabled and disabled
while packets are collected on the error queue.
So, checking SOF_TIMESTAMPING_OPT_STATS in sk->sk_tsflags
is not enough to safely assume that the skb contains
OPT_STATS data.
From: Soheil Hassas Yeganeh <soh...@google.com>
__sock_recv_timestamp can be called for both normal skbs (for
receive timestamps) and for skbs on the error queue (for transmit
timestamps).
Commit 1c885808e456
(tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING)
assumes any skb
From: Soheil Hassas Yeganeh <soh...@google.com>
Commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets for each connection)
randomizes TCP timestamps per connection. After this commit,
there is no guarantee that the timestamps received from the
same destination are monotonically incr
From: Soheil Hassas Yeganeh <soh...@google.com>
The tcp_tw_recycle was already broken for connections
behind NAT, since the per-destination timestamp is not
monotonically increasing for multiple machines behind
a single destination address.
After the randomization of TCP timestamp o
On Sat, Mar 11, 2017 at 9:42 AM, Ezequiel Lara Gomez
wrote:
> Also, cleanup some warnings from timestamping code.
Can you please submit the styling fixes as a separate patch?
Thanks,
Soheil
te, like
> sk_wmem_alloc.
>
> Fixes: bf7fa551e0ce ("mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi
> ack path")
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Cc: Alexander Duyck <alexander.h.du...@intel.com>
> Cc: Johannes Berg <johan...@
@intel.com>
> Cc: Johannes Berg <johan...@sipsolutions.net>
> Cc: Soheil Hassas Yeganeh <soh...@google.com>
> Cc: Willem de Bruijn <will...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
> ---
> net/core/skbuff.c | 15 ---
> 1
tpath+0x1f/0xc2
> RIP: 0033:0x4458b9
> RSP: 002b:7fe8b26c2b58 EFLAGS: 0292 ORIG_RAX: 0036
> RAX: ffda RBX: 0006 RCX: 0000004458b9
> RDX: 001a RSI: 0001 RDI: 0006
> RBP: 006e2110 R08:
On Thu, Feb 16, 2017 at 11:08 AM, <l...@pengaru.com> wrote:
> On Thu, Feb 16, 2017 at 10:52:19AM -0500, Soheil Hassas Yeganeh wrote:
>> On Thu, Feb 16, 2017 at 10:50 AM, Soheil Hassas Yeganeh
>> <soh...@google.com> wrote:
>> > Thank you Vito for the report.
&g
On Thu, Feb 16, 2017 at 10:50 AM, Soheil Hassas Yeganeh
<soh...@google.com> wrote:
> Thank you Vito for the report.
>
> The patch you cited actually resolves a similar backward compatibility
> problem for traceroute.
>
> I suspect the problem here is that ther
d message below. This was sent to linux-kernel but
> after digging a little I suspect it's specific to the network stack.
>
> Perusing the net/ changes between 4.9 and 4.10-rc8 this sounded awful related
> to what I'm observing:
>
> commit 83a1a1a70e87f676fbb6086b26b6ac7f7fdd107d
> Au
On Tue, Feb 7, 2017 at 2:32 PM, Willem de Bruijn
wrote:
>>> 2) new SO_TIMESTAMPING option to receive from the error queue only
>>>user data as was passed to sendmsg() instead of Ethernet frames
>>>
>>>Parsing Ethernet and IP headers (especially IPv6
On Tue, Feb 7, 2017 at 6:01 AM, Miroslav Lichvar wrote:
> 2) new SO_TIMESTAMPING option to receive from the error queue only
>user data as was passed to sendmsg() instead of Ethernet frames
>
>Parsing Ethernet and IP headers (especially IPv6 options) is not
>fun
From: Soheil Hassas Yeganeh <soh...@google.com>
For TCP sockets, TX timestamps are only captured when the user data
is successfully and fully written to the socket. In many cases,
however, TCP writes can be partial for which no timestamp is
collected.
Collect timestamps whenever any use
On Wed, Jan 4, 2017 at 7:55 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>
> On Tue, 2017-01-03 at 10:22 -0500, Soheil Hassas Yeganeh wrote:
> > On Mon, Jan 2, 2017 at 3:23 PM, Soheil Hassas Yeganeh <soh...@google.com>
> > wrote:
> > > On Mon, Jan 2,
On Mon, Jan 2, 2017 at 3:23 PM, Soheil Hassas Yeganeh <soh...@google.com> wrote:
> On Mon, Jan 2, 2017 at 3:20 PM, Soheil Hassas Yeganeh
> <soheil.k...@gmail.com> wrote:
>> From: Soheil Hassas Yeganeh <soh...@google.com>
>>
>> For TCP sockets, tx times
On Mon, Jan 2, 2017 at 3:20 PM, Soheil Hassas Yeganeh
<soheil.k...@gmail.com> wrote:
> From: Soheil Hassas Yeganeh <soh...@google.com>
>
> For TCP sockets, tx timestamps are only captured when the user data
> is successfully and fully written to the socket. In many cases,
&
From: Soheil Hassas Yeganeh <soh...@google.com>
For TCP sockets, tx timestamps are only captured when the user data
is successfully and fully written to the socket. In many cases,
however, TCP writes can be partial for which no timestamp is
collected.
Collect timestamps when the use
From: Soheil Hassas Yeganeh <soh...@google.com>
Only when ICMP packets are enqueued onto the error queue,
sk_err is also set. Before f5f99309fa74 (sock: do not set sk_err
in sock_dequeue_err_skb), a subsequent error queue read
would set sk_err to the next error on the queue, or 0 if
we can avoid
> the unaligned helpers.
>
> Suggested-by: David Miller <da...@davemloft.net>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
> ---
> net/ipv4/tcp.c | 11 +--
> 1 file changed, 5 insertions
From: Soheil Hassas Yeganeh <soh...@google.com>
Do not send the next message in sendmmsg for partial sendmsg
invocations.
sendmmsg assumes that it can continue sending the next message
when the return value of the individual sendmsg invocations
is positive. It results in corrupting th