Re: [PATCH net-next v2 1/2] tcp: notify when a delayed ack is sent

2018-07-03 Thread Lawrence Brakmo
On 7/3/18, 6:15 AM, "Neal Cardwell"  wrote:

On Mon, Jul 2, 2018 at 7:49 PM Yuchung Cheng  wrote:
>
> On Mon, Jul 2, 2018 at 2:39 PM, Lawrence Brakmo  wrote:
> >
> > DCTCP depends on the CA_EVENT_NON_DELAYED_ACK and CA_EVENT_DELAYED_ACK
> > notifications to keep track if it needs to send an ACK for packets that
> > were received with a particular ECN state but whose ACK was delayed.
> >
> > Under some circumstances, for example when a delayed ACK is sent with a
> > data packet, DCTCP state was not being updated due to a lack of
> > notification that the previously delayed ACK was sent. As a result, it
> > would sometimes send a duplicate ACK when a new data packet arrived.
> >
> > This patch insures that DCTCP's state is correctly updated so it will
> > not send the duplicate ACK.
> Sorry to chime-in late here (lame excuse: IETF deadline)
>
> IIRC this issue would exist prior to 4.11 kernel. While it'd be good
> to fix that, it's not clear which patch introduced the regression
> between 4.11 and 4.16? I assume you tested Eric's most recent quickack
> fix.

I don't think Larry is saying there's a regression between 4.11 and
4.16. His recent "tcp: force cwnd at least 2 in tcp_cwnd_reduction"
patch here:

  
https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.ozlabs.org_patch_935233_&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=pq_Mqvzfy-C8ltkgyx1u_g&m=7ko5C_ln2b7gZvz2A_UrZzz0AlcnhrW7-9KRahj_PEA&s=HcpZvo1TulN4-Y7Jhba5KM1MIaPwnBC95T8pLZfJESI&e=

said that 4.11 was bad (high tail latency and lots of RTOs) and 4.16
was still bad but with different netstat counters (no RTOs but still
high tail latency):

"""
On 4.11, pcap traces indicate that in some instances the 1st packet of
the RPC is received but no ACK is sent before the packet is
retransmitted. On 4.11 netstat shows TCP timeouts, with some of them
spurious.

On 4.16, we don't see retransmits in netstat but the high tail latencies
are still there.
"""

I suspect the RTOs disappeared but latencies remained too high because
between 4.11 and 4.16 we introduced:
  tcp: allow TLP in ECN CWR
  
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=b4f70c3d4ec32a2ff4c62e1e2da0da5f55fe12bd

So the RTOs probably disappeared because this commit turned them into
TLPs. But the latencies remained high because the fundamental bug
remained throughout 4.11 and 4.16 and today: the DCTCP use of
tcp_send_ack() with an old rcv_nxt caused delayed ACKs to be cancelled
when they should not have been.

> In terms of the fix itself, it seems odd the tcp_send_ack() call in
> DCTCP generates NON_DELAYED_ACK event to toggle DCTCP's
> delayed_ack_reserved bit. Shouldn't the fix to have DCTCP send the
> "prior" ACK w/o cancelling delayed-ACK and mis-recording that it's
> cancelled, because that prior-ACK really is a supplementary old ACK.

This patch is fixing an issue that's orthogonal to the one you are
talking about. Using the taxonomy from our team's internal discussion
yesterday, the issue you mention where the DCTCP "prior" ACK is
cancelling delayed ACKs is issue (4); the issue that this particular
"tcp: notify when a delayed ack is sent" patch from Larry fixes is
issue (3). It's a bit tricky because both issues appeared in Larry's
trace summary and packetdrill script to reproduce the issue.

neal

I was able to track the patch that introduced the problem:

commit 3759824da87b30ce7a35b4873b62b0ba38905ef5
Author: Yuchung Cheng 
Date:   Wed Jul 1 14:11:15 2015 -0700

tcp: PRR uses CRB mode by default and SS mode conditionally

I tested a kernel which reverted the relevant change (see diff below) and the 
high tail latencies of more than 40ms disappeared. However, the 10KB high 
percentile latencies are 4ms vs. less than 200us with my patches. It looks like 
the patch above ended up reducing the cwnd to 1 in the scenarios that were 
triggering the high tail latencies. That is, it increased the likelihood of 
triggering actual bugs in the network stack code that my patch set fixes.


diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2c5d70bc294e..50fabb07d739 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2468,13 +2468,14 @@ void tcp_cwnd_reduction(struct sock *sk, int 
newly_acked_sacked, int flag)
u64 dividend = (u64)tp->snd_ssthresh * tp->prr_delivered +
   tp->prior_cwnd - 1;
sndcnt = div_u64(dividend, tp->prior_cwnd) - tp->prr_out;
-   } else if ((flag & FLAG_RETRANS_DATA_ACKED) &&
-  !(flag & FLAG_LOST_RETRANS)) {
+   } else {
+// } else if ((flag & FLAG_RETRANS_DATA_ACKED) &&
+//!(flag & FLAG_LOST_RETRANS)) {
sndcnt =

Re: [PATCH net-next v2 1/2] tcp: notify when a delayed ack is sent

2018-07-03 Thread Lawrence Brakmo
On 7/2/18, 4:50 PM, "Yuchung Cheng"  wrote:

On Mon, Jul 2, 2018 at 2:39 PM, Lawrence Brakmo  wrote:
>
> DCTCP depends on the CA_EVENT_NON_DELAYED_ACK and CA_EVENT_DELAYED_ACK
> notifications to keep track if it needs to send an ACK for packets that
> were received with a particular ECN state but whose ACK was delayed.
>
> Under some circumstances, for example when a delayed ACK is sent with a
> data packet, DCTCP state was not being updated due to a lack of
> notification that the previously delayed ACK was sent. As a result, it
> would sometimes send a duplicate ACK when a new data packet arrived.
>
> This patch insures that DCTCP's state is correctly updated so it will
> not send the duplicate ACK.
Sorry to chime-in late here (lame excuse: IETF deadline)

IIRC this issue would exist prior to 4.11 kernel. While it'd be good
to fix that, it's not clear which patch introduced the regression
between 4.11 and 4.16? I assume you tested Eric's most recent quickack
fix.

In terms of the fix itself, it seems odd the tcp_send_ack() call in
DCTCP generates NON_DELAYED_ACK event to toggle DCTCP's
delayed_ack_reserved bit. Shouldn't the fix to have DCTCP send the
"prior" ACK w/o cancelling delayed-ACK and mis-recording that it's
cancelled, because that prior-ACK really is a supplementary old ACK.

But it's still unclear how this bug introduces the regression 4.11 - 4.16
   
Feedback is always appreciated! This issue is also present in 4.11 (that is 
where I discovered). I think the bug was introduces much earlier.

Yes, I tested with Eric's quickack fix, it did not fix either of the two issues 
that are fixed with this patch set.

As I mentioned earlier, the bug was introduced before 4.11. I am not sure I 
understand your comments. Yes, at some level it would make sense to change the 
delayed_ack_reserved bit directly, but we would still need to do it whenever we 
send the ACK, so I do not think it can be helped. Please clarify if I 
misunderstood your comment.

>
> Improved based on comments from Neal Cardwell .
>
> Signed-off-by: Lawrence Brakmo 
> ---
>  net/ipv4/tcp_output.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index f8f6129160dd..acefb64e8280 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -172,6 +172,8 @@ static inline void tcp_event_ack_sent(struct sock 
*sk, unsigned int pkts)
> __sock_put(sk);
> }
> tcp_dec_quickack_mode(sk, pkts);
> +   if (inet_csk_ack_scheduled(sk))
> +   tcp_ca_event(sk, CA_EVENT_NON_DELAYED_ACK);
> inet_csk_clear_xmit_timer(sk, ICSK_TIME_DACK);
>  }
>
> @@ -3567,8 +3569,6 @@ void tcp_send_ack(struct sock *sk)
> if (sk->sk_state == TCP_CLOSE)
> return;
>
> -   tcp_ca_event(sk, CA_EVENT_NON_DELAYED_ACK);
> -
> /* We are not putting this on the write queue, so
>  * tcp_transmit_skb() will set the ownership to this
>  * sock.
> --
> 2.17.1
>




Re: [PATCH net-next v2 1/2] tcp: notify when a delayed ack is sent

2018-07-03 Thread Neal Cardwell
On Mon, Jul 2, 2018 at 7:49 PM Yuchung Cheng  wrote:
>
> On Mon, Jul 2, 2018 at 2:39 PM, Lawrence Brakmo  wrote:
> >
> > DCTCP depends on the CA_EVENT_NON_DELAYED_ACK and CA_EVENT_DELAYED_ACK
> > notifications to keep track if it needs to send an ACK for packets that
> > were received with a particular ECN state but whose ACK was delayed.
> >
> > Under some circumstances, for example when a delayed ACK is sent with a
> > data packet, DCTCP state was not being updated due to a lack of
> > notification that the previously delayed ACK was sent. As a result, it
> > would sometimes send a duplicate ACK when a new data packet arrived.
> >
> > This patch insures that DCTCP's state is correctly updated so it will
> > not send the duplicate ACK.
> Sorry to chime-in late here (lame excuse: IETF deadline)
>
> IIRC this issue would exist prior to 4.11 kernel. While it'd be good
> to fix that, it's not clear which patch introduced the regression
> between 4.11 and 4.16? I assume you tested Eric's most recent quickack
> fix.

I don't think Larry is saying there's a regression between 4.11 and
4.16. His recent "tcp: force cwnd at least 2 in tcp_cwnd_reduction"
patch here:

  https://patchwork.ozlabs.org/patch/935233/

said that 4.11 was bad (high tail latency and lots of RTOs) and 4.16
was still bad but with different netstat counters (no RTOs but still
high tail latency):

"""
On 4.11, pcap traces indicate that in some instances the 1st packet of
the RPC is received but no ACK is sent before the packet is
retransmitted. On 4.11 netstat shows TCP timeouts, with some of them
spurious.

On 4.16, we don't see retransmits in netstat but the high tail latencies
are still there.
"""

I suspect the RTOs disappeared but latencies remained too high because
between 4.11 and 4.16 we introduced:
  tcp: allow TLP in ECN CWR
  
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=b4f70c3d4ec32a2ff4c62e1e2da0da5f55fe12bd

So the RTOs probably disappeared because this commit turned them into
TLPs. But the latencies remained high because the fundamental bug
remained throughout 4.11 and 4.16 and today: the DCTCP use of
tcp_send_ack() with an old rcv_nxt caused delayed ACKs to be cancelled
when they should not have been.

> In terms of the fix itself, it seems odd the tcp_send_ack() call in
> DCTCP generates NON_DELAYED_ACK event to toggle DCTCP's
> delayed_ack_reserved bit. Shouldn't the fix to have DCTCP send the
> "prior" ACK w/o cancelling delayed-ACK and mis-recording that it's
> cancelled, because that prior-ACK really is a supplementary old ACK.

This patch is fixing an issue that's orthogonal to the one you are
talking about. Using the taxonomy from our team's internal discussion
yesterday, the issue you mention where the DCTCP "prior" ACK is
cancelling delayed ACKs is issue (4); the issue that this particular
"tcp: notify when a delayed ack is sent" patch from Larry fixes is
issue (3). It's a bit tricky because both issues appeared in Larry's
trace summary and packetdrill script to reproduce the issue.

neal


Re: [PATCH net-next v2 1/2] tcp: notify when a delayed ack is sent

2018-07-02 Thread Yuchung Cheng
On Mon, Jul 2, 2018 at 2:39 PM, Lawrence Brakmo  wrote:
>
> DCTCP depends on the CA_EVENT_NON_DELAYED_ACK and CA_EVENT_DELAYED_ACK
> notifications to keep track if it needs to send an ACK for packets that
> were received with a particular ECN state but whose ACK was delayed.
>
> Under some circumstances, for example when a delayed ACK is sent with a
> data packet, DCTCP state was not being updated due to a lack of
> notification that the previously delayed ACK was sent. As a result, it
> would sometimes send a duplicate ACK when a new data packet arrived.
>
> This patch insures that DCTCP's state is correctly updated so it will
> not send the duplicate ACK.
Sorry to chime-in late here (lame excuse: IETF deadline)

IIRC this issue would exist prior to 4.11 kernel. While it'd be good
to fix that, it's not clear which patch introduced the regression
between 4.11 and 4.16? I assume you tested Eric's most recent quickack
fix.

In terms of the fix itself, it seems odd the tcp_send_ack() call in
DCTCP generates NON_DELAYED_ACK event to toggle DCTCP's
delayed_ack_reserved bit. Shouldn't the fix to have DCTCP send the
"prior" ACK w/o cancelling delayed-ACK and mis-recording that it's
cancelled, because that prior-ACK really is a supplementary old ACK.

But it's still unclear how this bug introduces the regression 4.11 - 4.16


>
> Improved based on comments from Neal Cardwell .
>
> Signed-off-by: Lawrence Brakmo 
> ---
>  net/ipv4/tcp_output.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index f8f6129160dd..acefb64e8280 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -172,6 +172,8 @@ static inline void tcp_event_ack_sent(struct sock *sk, 
> unsigned int pkts)
> __sock_put(sk);
> }
> tcp_dec_quickack_mode(sk, pkts);
> +   if (inet_csk_ack_scheduled(sk))
> +   tcp_ca_event(sk, CA_EVENT_NON_DELAYED_ACK);
> inet_csk_clear_xmit_timer(sk, ICSK_TIME_DACK);
>  }
>
> @@ -3567,8 +3569,6 @@ void tcp_send_ack(struct sock *sk)
> if (sk->sk_state == TCP_CLOSE)
> return;
>
> -   tcp_ca_event(sk, CA_EVENT_NON_DELAYED_ACK);
> -
> /* We are not putting this on the write queue, so
>  * tcp_transmit_skb() will set the ownership to this
>  * sock.
> --
> 2.17.1
>


[PATCH net-next v2 1/2] tcp: notify when a delayed ack is sent

2018-07-02 Thread Lawrence Brakmo
DCTCP depends on the CA_EVENT_NON_DELAYED_ACK and CA_EVENT_DELAYED_ACK
notifications to keep track if it needs to send an ACK for packets that
were received with a particular ECN state but whose ACK was delayed.

Under some circumstances, for example when a delayed ACK is sent with a
data packet, DCTCP state was not being updated due to a lack of
notification that the previously delayed ACK was sent. As a result, it
would sometimes send a duplicate ACK when a new data packet arrived.

This patch insures that DCTCP's state is correctly updated so it will
not send the duplicate ACK.

Improved based on comments from Neal Cardwell .

Signed-off-by: Lawrence Brakmo 
---
 net/ipv4/tcp_output.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f8f6129160dd..acefb64e8280 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -172,6 +172,8 @@ static inline void tcp_event_ack_sent(struct sock *sk, 
unsigned int pkts)
__sock_put(sk);
}
tcp_dec_quickack_mode(sk, pkts);
+   if (inet_csk_ack_scheduled(sk))
+   tcp_ca_event(sk, CA_EVENT_NON_DELAYED_ACK);
inet_csk_clear_xmit_timer(sk, ICSK_TIME_DACK);
 }
 
@@ -3567,8 +3569,6 @@ void tcp_send_ack(struct sock *sk)
if (sk->sk_state == TCP_CLOSE)
return;
 
-   tcp_ca_event(sk, CA_EVENT_NON_DELAYED_ACK);
-
/* We are not putting this on the write queue, so
 * tcp_transmit_skb() will set the ownership to this
 * sock.
-- 
2.17.1