Re: [PATCH net-next v3 0/2] tcp: fix high tail latencies in DCTCP

2018-07-09 Thread Lawrence Brakmo
On 7/9/18, 12:32 PM, "Yuchung Cheng"  wrote:

On Sat, Jul 7, 2018 at 7:07 AM, Neal Cardwell  wrote:
> On Sat, Jul 7, 2018 at 7:15 AM David Miller  wrote:
>>
>> From: Lawrence Brakmo 
>> Date: Tue, 3 Jul 2018 09:26:13 -0700
>>
>> > When have observed high tail latencies when using DCTCP for RPCs as
>> > compared to using Cubic. For example, in one setup there are 2 hosts
>> > sending to a 3rd one, with each sender having 3 flows (1 stream,
>> > 1 1MB back-to-back RPCs and 1 10KB back-to-back RPCs). The following
>> > table shows the 99% and 99.9% latencies for both Cubic and dctcp:
>> >
>> >Cubic 99%  Cubic 99.9%   dctcp 99%dctcp 99.9%
>> > 1MB RPCs2.6ms   5.5ms 43ms  208ms
>> > 10KB RPCs1.1ms   1.3ms 53ms  212ms
>>  ...
>> > v2: Removed call to tcp_ca_event from tcp_send_ack since I added one in
>> > tcp_event_ack_sent. Based on Neal Cardwell 
>> > feedback.
>> > Modified tcp_ecn_check_ce (and renamed it tcp_ecn_check) instead 
of modifying
>> > tcp_ack_send_check to insure an ACK when cwr is received.
>> > v3: Handling cwr in tcp_ecn_accept_cwr instead of in tcp_ecn_check.
>> >
>> > [PATCH net-next v3 1/2] tcp: notify when a delayed ack is sent
>> > [PATCH net-next v3 2/2] tcp: ack immediately when a cwr packet
>>
>> Neal and co., what are your thoughts right now about this patch series?
>>
>> Thank you.
>
> IMHO these patches are a definite improvement over what we have now.
>
> That said, in chatting with Yuchung before the July 4th break, I think
> Yuchung and I agreed that we would ideally like to see something like
> the following:
>
> (1) refactor the DCTCP code to check for pending delayed ACKs directly
> using existing state (inet_csk(sk)->icsk_ack.pending &
> ICSK_ACK_TIMER), and remove the ca->delayed_ack_reserved DCTCP field
> and the CA_EVENT_DELAYED_ACK and CA_EVENT_NON_DELAYED_ACK callbacks
> added for DCTCP (which Larry determined had at least one bug).

I agree that getting rid of the callbacks would be an improvement, but that is 
more about optimizing the code. This could be done after we fix the current 
bugs. My concern is that it may be more complicated that we think and the 
current bug would continue to exist. Yes, I realize that it has been there for 
a while; but not because no one found it before, but because it was hard to 
pinpoint. 

> (2) fix the bug with the DCTCP call to tcp_send_ack(sk) causing
> delayed ACKs to be incorrectly dropped/forgotten (not yet addressed by
> this patch series)

Good idea, but as I mentioned earlier, I would rather fix the really bad bugs 
first and then deal with this one. As far as I can see from my testing of 
DC-TCP, I have not seen any bad consequences from this bug so far.

> (3) then with fixes (1) and (2) in place, re-run tests and see if we
> still need Larry's heuristic (in patch 2) to fire an ACK immediately
> if a receiver receives a CWR packet (I suspect this is still very
> useful, but I think Yuchung is reluctant to add this complexity unless
> we have verified it's still needed after (1) and (2))

I fail to understand how (1) and (2) have anything to do with ACKing 
immediately when we receive a CWR packet. It has nothing to do with a current 
delayed ACK, it has to do with the cwnd closing to 1 when an TCP ECE marked 
packet is received at the end of an RPC and the current TCP delay ACK logic 
choosing to delay the ACK. The issue happens right after the receiver has sent 
its reply to the RPC, so at that stage there are no active delayed ACKs (the 
first patch fixed the issue where DC-TCP thought there was an active delayed 
ACK). 

>
> Our team may be able to help out with some proposed patches for (1) and 
(2).
>
> In any case, I would love to have Yuchung and Eric weigh in (perhaps
> Monday) before we merge this patch series.
Thanks Neal. Sorry for not reflecting these timely before I took off
for July 4 holidays. I was going to post the same comment - Larry: I
could provide draft patches if that helps.

Yuchung: go ahead and send me the drafts. But as I already mentioned, I would 
like to fix the bad bug first and then make it pretty.

>
> Thanks,
> neal
   



Re: [PATCH net-next v3 0/2] tcp: fix high tail latencies in DCTCP

2018-07-09 Thread Yuchung Cheng
On Sat, Jul 7, 2018 at 7:07 AM, Neal Cardwell  wrote:
> On Sat, Jul 7, 2018 at 7:15 AM David Miller  wrote:
>>
>> From: Lawrence Brakmo 
>> Date: Tue, 3 Jul 2018 09:26:13 -0700
>>
>> > When have observed high tail latencies when using DCTCP for RPCs as
>> > compared to using Cubic. For example, in one setup there are 2 hosts
>> > sending to a 3rd one, with each sender having 3 flows (1 stream,
>> > 1 1MB back-to-back RPCs and 1 10KB back-to-back RPCs). The following
>> > table shows the 99% and 99.9% latencies for both Cubic and dctcp:
>> >
>> >Cubic 99%  Cubic 99.9%   dctcp 99%dctcp 99.9%
>> > 1MB RPCs2.6ms   5.5ms 43ms  208ms
>> > 10KB RPCs1.1ms   1.3ms 53ms  212ms
>>  ...
>> > v2: Removed call to tcp_ca_event from tcp_send_ack since I added one in
>> > tcp_event_ack_sent. Based on Neal Cardwell 
>> > feedback.
>> > Modified tcp_ecn_check_ce (and renamed it tcp_ecn_check) instead of 
>> > modifying
>> > tcp_ack_send_check to insure an ACK when cwr is received.
>> > v3: Handling cwr in tcp_ecn_accept_cwr instead of in tcp_ecn_check.
>> >
>> > [PATCH net-next v3 1/2] tcp: notify when a delayed ack is sent
>> > [PATCH net-next v3 2/2] tcp: ack immediately when a cwr packet
>>
>> Neal and co., what are your thoughts right now about this patch series?
>>
>> Thank you.
>
> IMHO these patches are a definite improvement over what we have now.
>
> That said, in chatting with Yuchung before the July 4th break, I think
> Yuchung and I agreed that we would ideally like to see something like
> the following:
>
> (1) refactor the DCTCP code to check for pending delayed ACKs directly
> using existing state (inet_csk(sk)->icsk_ack.pending &
> ICSK_ACK_TIMER), and remove the ca->delayed_ack_reserved DCTCP field
> and the CA_EVENT_DELAYED_ACK and CA_EVENT_NON_DELAYED_ACK callbacks
> added for DCTCP (which Larry determined had at least one bug).
>
> (2) fix the bug with the DCTCP call to tcp_send_ack(sk) causing
> delayed ACKs to be incorrectly dropped/forgotten (not yet addressed by
> this patch series)
>
> (3) then with fixes (1) and (2) in place, re-run tests and see if we
> still need Larry's heuristic (in patch 2) to fire an ACK immediately
> if a receiver receives a CWR packet (I suspect this is still very
> useful, but I think Yuchung is reluctant to add this complexity unless
> we have verified it's still needed after (1) and (2))
>
> Our team may be able to help out with some proposed patches for (1) and (2).
>
> In any case, I would love to have Yuchung and Eric weigh in (perhaps
> Monday) before we merge this patch series.
Thanks Neal. Sorry for not reflecting these timely before I took off
for July 4 holidays. I was going to post the same comment - Larry: I
could provide draft patches if that helps.

>
> Thanks,
> neal


Re: [PATCH net-next v3 0/2] tcp: fix high tail latencies in DCTCP

2018-07-07 Thread Neal Cardwell
On Sat, Jul 7, 2018 at 7:15 AM David Miller  wrote:
>
> From: Lawrence Brakmo 
> Date: Tue, 3 Jul 2018 09:26:13 -0700
>
> > When have observed high tail latencies when using DCTCP for RPCs as
> > compared to using Cubic. For example, in one setup there are 2 hosts
> > sending to a 3rd one, with each sender having 3 flows (1 stream,
> > 1 1MB back-to-back RPCs and 1 10KB back-to-back RPCs). The following
> > table shows the 99% and 99.9% latencies for both Cubic and dctcp:
> >
> >Cubic 99%  Cubic 99.9%   dctcp 99%dctcp 99.9%
> > 1MB RPCs2.6ms   5.5ms 43ms  208ms
> > 10KB RPCs1.1ms   1.3ms 53ms  212ms
>  ...
> > v2: Removed call to tcp_ca_event from tcp_send_ack since I added one in
> > tcp_event_ack_sent. Based on Neal Cardwell 
> > feedback.
> > Modified tcp_ecn_check_ce (and renamed it tcp_ecn_check) instead of 
> > modifying
> > tcp_ack_send_check to insure an ACK when cwr is received.
> > v3: Handling cwr in tcp_ecn_accept_cwr instead of in tcp_ecn_check.
> >
> > [PATCH net-next v3 1/2] tcp: notify when a delayed ack is sent
> > [PATCH net-next v3 2/2] tcp: ack immediately when a cwr packet
>
> Neal and co., what are your thoughts right now about this patch series?
>
> Thank you.

IMHO these patches are a definite improvement over what we have now.

That said, in chatting with Yuchung before the July 4th break, I think
Yuchung and I agreed that we would ideally like to see something like
the following:

(1) refactor the DCTCP code to check for pending delayed ACKs directly
using existing state (inet_csk(sk)->icsk_ack.pending &
ICSK_ACK_TIMER), and remove the ca->delayed_ack_reserved DCTCP field
and the CA_EVENT_DELAYED_ACK and CA_EVENT_NON_DELAYED_ACK callbacks
added for DCTCP (which Larry determined had at least one bug).

(2) fix the bug with the DCTCP call to tcp_send_ack(sk) causing
delayed ACKs to be incorrectly dropped/forgotten (not yet addressed by
this patch series)

(3) then with fixes (1) and (2) in place, re-run tests and see if we
still need Larry's heuristic (in patch 2) to fire an ACK immediately
if a receiver receives a CWR packet (I suspect this is still very
useful, but I think Yuchung is reluctant to add this complexity unless
we have verified it's still needed after (1) and (2))

Our team may be able to help out with some proposed patches for (1) and (2).

In any case, I would love to have Yuchung and Eric weigh in (perhaps
Monday) before we merge this patch series.

Thanks,
neal


Re: [PATCH net-next v3 0/2] tcp: fix high tail latencies in DCTCP

2018-07-07 Thread David Miller
From: Lawrence Brakmo 
Date: Tue, 3 Jul 2018 09:26:13 -0700

> When have observed high tail latencies when using DCTCP for RPCs as
> compared to using Cubic. For example, in one setup there are 2 hosts
> sending to a 3rd one, with each sender having 3 flows (1 stream,
> 1 1MB back-to-back RPCs and 1 10KB back-to-back RPCs). The following
> table shows the 99% and 99.9% latencies for both Cubic and dctcp:
> 
>Cubic 99%  Cubic 99.9%   dctcp 99%dctcp 99.9%
> 1MB RPCs2.6ms   5.5ms 43ms  208ms
> 10KB RPCs1.1ms   1.3ms 53ms  212ms
 ...
> v2: Removed call to tcp_ca_event from tcp_send_ack since I added one in
> tcp_event_ack_sent. Based on Neal Cardwell 
> feedback.
> Modified tcp_ecn_check_ce (and renamed it tcp_ecn_check) instead of 
> modifying
> tcp_ack_send_check to insure an ACK when cwr is received.
> v3: Handling cwr in tcp_ecn_accept_cwr instead of in tcp_ecn_check.
> 
> [PATCH net-next v3 1/2] tcp: notify when a delayed ack is sent
> [PATCH net-next v3 2/2] tcp: ack immediately when a cwr packet

Neal and co., what are your thoughts right now about this patch series?

Thank you.


[PATCH net-next v3 0/2] tcp: fix high tail latencies in DCTCP

2018-07-03 Thread Lawrence Brakmo
When have observed high tail latencies when using DCTCP for RPCs as
compared to using Cubic. For example, in one setup there are 2 hosts
sending to a 3rd one, with each sender having 3 flows (1 stream,
1 1MB back-to-back RPCs and 1 10KB back-to-back RPCs). The following
table shows the 99% and 99.9% latencies for both Cubic and dctcp:

   Cubic 99%  Cubic 99.9%   dctcp 99%dctcp 99.9%
1MB RPCs2.6ms   5.5ms 43ms  208ms
10KB RPCs1.1ms   1.3ms 53ms  212ms

Looking at tcpdump traces showed that there are two causes for the
latency.  

  1) RTOs caused by the receiver sending a dup ACK and not ACKing
 the last (and only) packet sent.
  2) Delaying ACKs when the sender has a cwnd of 1, so everything
 pauses for the duration of the delayed ACK.

The first patch fixes the cause of the dup ACKs, not updating DCTCP
state when an ACK that was initially delayed has been sent with a
data packet.

The second patch insures that an ACK is sent immediately when a
CWR marked packet arrives.

With the patches the latencies for DCTCP now look like:

   dctcp 99%  dctcp 99.9%
1MB RPCs5.8ms   6.9ms
10KB RPCs146us   203us

Note that while the 1MB RPCs tail latencies are higher than Cubic's,
the 10KB latencies are much smaller than Cubic's. These patches fix
issues on the receiver, but tcpdump traces indicate there is an
opportunity to also fix an issue at the sender that adds about 3ms
to the tail latencies.

The following trace shows the issue that tiggers an RTO (fixed by these 
patches):

   Host A sends the last packets of the request
   Host B receives them, and the last packet is marked with congestion (CE)
   Host B sends ACKs for packets not marked with congestion
   Host B sends data packet with reply and ACK for packet marked with
  congestion (TCP flag ECE)
   Host A receives ACKs with no ECE flag
   Host A receives data packet with ACK for the last packet of request
  and which has TCP ECE bit set
   Host A sends 1st data packet of the next request with TCP flag CWR
   Host B receives the packet (as seen in tcpdump at B), no CE flag
   Host B sends a dup ACK that also has the TCP ECE flag
   Host A RTO timer fires!
   Host A to send the next packet
   Host A receives an ACK for everything it has sent (i.e. Host B
  did receive 1st packet of request)
   Host A send more packets…

v2: Removed call to tcp_ca_event from tcp_send_ack since I added one in
tcp_event_ack_sent. Based on Neal Cardwell 
feedback.
Modified tcp_ecn_check_ce (and renamed it tcp_ecn_check) instead of 
modifying
tcp_ack_send_check to insure an ACK when cwr is received.
v3: Handling cwr in tcp_ecn_accept_cwr instead of in tcp_ecn_check.

[PATCH net-next v3 1/2] tcp: notify when a delayed ack is sent
[PATCH net-next v3 2/2] tcp: ack immediately when a cwr packet

 net/ipv4/tcp_input.c  | 9 -
 net/ipv4/tcp_output.c | 4 ++--
 2 files changed, 10 insertions(+), 3 deletions(-)