Re: [RFC PATCHv2 net-next 1/2] tcp: RTO Restart (RTOR)

2015-12-10 Thread Per Hurtig


> 10 dec. 2015 kl. 16:37 skrev Neal Cardwell :
> 
>> On Thu, Dec 10, 2015 at 1:51 AM, Per Hurtig  wrote:
>> 
>>> On 08 Dec 2015, at 14:47, Eric Dumazet  wrote:
>>> 
>>> On Tue, 2015-12-08 at 10:19 +0100, Per Hurtig wrote:
>>> 
 +static u32 tcp_unsent_pkts(const struct sock *sk, u32 ulimit)
 +{
 +struct sk_buff *skb = tcp_send_head(sk);
 +u32 pkts = 0;
 +
 +if (skb)
 +tcp_for_write_queue_from(skb, sk) {
 +pkts += tcp_skb_pcount(skb);
 +
 +if (ulimit && pkts >= ulimit)
 +return ulimit;
 +}
 +
 +return pkts;
 +}
>>> 
>>> 
>>> Considering Yuchung feedback, have you looked at using an approximation
>>> instead ?
>>> 
>>> (ie using tp->write_seq - tp->snd_nxt)
>> 
>> Well, an approximation is rather “dangerous” as missing a single packet
>> could inhibit the desired behaviour. If looping is undesired, I think a
>> better solution is to actually *not* do this check at all and instead rely
>> solely on the
>> 
>> tp->packets_out < TCP_RTORESTART_THRESH
> 
> Yes, this simpler version seems very much preferable, IMHO. I agree
> that it does not seem worth the complexity to try to cover the kind of
> corner cases you outline.
> 
> I would also suggest a TCP_RTORESTART_THRESH value higher than 4.
> 
> In the ID at https://tools.ietf.org/html/draft-ietf-tcpm-rtorestart-10 it 
> says:
> 
>   The RECOMMENDED value of rrthresh is four, as this value will ensure
>   that RTOR is only used when fast retransmit cannot be triggered.
> 
> But my sense is that fast retransmit is often not triggered at
> in-flight counts of much higher than 4, due to drop-tail queues, TSO
> bursts, the initial IW10 being unpaced, etc. It would be interesting
> to see A/B experiments for a few TCP_RTORESTART_THRESH values, say, 4
> vs 10.
> 
> neal
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sure. One idea could also be to use the "reordering" value as a dynamic 
threshhold?

-- Per
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCHv2 net-next 1/2] tcp: RTO Restart (RTOR)

2015-12-10 Thread Neal Cardwell
On Thu, Dec 10, 2015 at 1:51 AM, Per Hurtig  wrote:
>
>> On 08 Dec 2015, at 14:47, Eric Dumazet  wrote:
>>
>> On Tue, 2015-12-08 at 10:19 +0100, Per Hurtig wrote:
>>
>>> +static u32 tcp_unsent_pkts(const struct sock *sk, u32 ulimit)
>>> +{
>>> +struct sk_buff *skb = tcp_send_head(sk);
>>> +u32 pkts = 0;
>>> +
>>> +if (skb)
>>> +tcp_for_write_queue_from(skb, sk) {
>>> +pkts += tcp_skb_pcount(skb);
>>> +
>>> +if (ulimit && pkts >= ulimit)
>>> +return ulimit;
>>> +}
>>> +
>>> +return pkts;
>>> +}
>>
>>
>> Considering Yuchung feedback, have you looked at using an approximation
>> instead ?
>>
>> (ie using tp->write_seq - tp->snd_nxt)
>>
>>
>
> Well, an approximation is rather “dangerous” as missing a single packet
> could inhibit the desired behaviour. If looping is undesired, I think a
> better solution is to actually *not* do this check at all and instead rely
> solely on the
>
> tp->packets_out < TCP_RTORESTART_THRESH

Yes, this simpler version seems very much preferable, IMHO. I agree
that it does not seem worth the complexity to try to cover the kind of
corner cases you outline.

I would also suggest a TCP_RTORESTART_THRESH value higher than 4.

In the ID at https://tools.ietf.org/html/draft-ietf-tcpm-rtorestart-10 it says:

   The RECOMMENDED value of rrthresh is four, as this value will ensure
   that RTOR is only used when fast retransmit cannot be triggered.

But my sense is that fast retransmit is often not triggered at
in-flight counts of much higher than 4, due to drop-tail queues, TSO
bursts, the initial IW10 being unpaced, etc. It would be interesting
to see A/B experiments for a few TCP_RTORESTART_THRESH values, say, 4
vs 10.

neal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCHv2 net-next 1/2] tcp: RTO Restart (RTOR)

2015-12-09 Thread Per Hurtig

> On 08 Dec 2015, at 14:47, Eric Dumazet  wrote:
> 
> On Tue, 2015-12-08 at 10:19 +0100, Per Hurtig wrote:
> 
>> +static u32 tcp_unsent_pkts(const struct sock *sk, u32 ulimit)
>> +{
>> +struct sk_buff *skb = tcp_send_head(sk);
>> +u32 pkts = 0;
>> +
>> +if (skb)
>> +tcp_for_write_queue_from(skb, sk) {
>> +pkts += tcp_skb_pcount(skb);
>> +
>> +if (ulimit && pkts >= ulimit)
>> +return ulimit;
>> +}
>> +
>> +return pkts;
>> +}
> 
> 
> Considering Yuchung feedback, have you looked at using an approximation
> instead ?
> 
> (ie using tp->write_seq - tp->snd_nxt)
> 
> 

Well, an approximation is rather “dangerous” as missing a single packet
could inhibit the desired behaviour. If looping is undesired, I think a
better solution is to actually *not* do this check at all and instead rely
solely on the

tp->packets_out < TCP_RTORESTART_THRESH

check instead. The reason why the number of unsent packets was
included was only to fix a corner case where it should be possible
to use the modified restart, but impossible due to the conditioning.

However, this corner case is likely to not occur very often and we
may be better off with the simpler check.

The corner case (if I remember this correctly) is that the restart is
not triggered when you have 2 segments in flight and (i) have a
congestion window of exactly 3; or (ii) get a packet written to the socket
just between previous data transmission and the arrival of the
acknowledgment that triggers the restart.


— Per



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [RFC PATCHv2 net-next 1/2] tcp: RTO Restart (RTOR)

2015-12-08 Thread Eric Dumazet
On Tue, 2015-12-08 at 10:19 +0100, Per Hurtig wrote:

> +static u32 tcp_unsent_pkts(const struct sock *sk, u32 ulimit)
> +{
> + struct sk_buff *skb = tcp_send_head(sk);
> + u32 pkts = 0;
> +
> + if (skb)
> + tcp_for_write_queue_from(skb, sk) {
> + pkts += tcp_skb_pcount(skb);
> +
> + if (ulimit && pkts >= ulimit)
> + return ulimit;
> + }
> +
> + return pkts;
> +}


Considering Yuchung feedback, have you looked at using an approximation
instead ? 

(ie using tp->write_seq - tp->snd_nxt)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCHv2 net-next 1/2] tcp: RTO Restart (RTOR)

2015-12-08 Thread Per Hurtig

> On 08 Dec 2015, at 11:50, Ilpo Järvinen  wrote:
> 
> On Tue, 8 Dec 2015, Per Hurtig wrote:
> 
>> This patch implements the RTO restart modification (RTOR). When data is
>> ACKed, and the RTO timer is restarted, the time elapsed since the last
>> outstanding segment was transmitted is subtracted from the calculated RTO
>> value. This way, the RTO timer will expire after exactly RTO seconds, and
>> not RTO + RTT [+ delACK] seconds.
>> 
>> This patch also implements a new sysctl (tcp_timer_restart) that is used
>> to control the timer restart behavior.
>> 
>> Signed-off-by: Per Hurtig 
>> ---
>> Documentation/networking/ip-sysctl.txt | 12 
>> include/net/tcp.h  |  6 ++
>> net/ipv4/sysctl_net_ipv4.c | 10 ++
>> net/ipv4/tcp_input.c   | 29 +
>> 4 files changed, 57 insertions(+)
>> 
>> diff --git a/Documentation/networking/ip-sysctl.txt 
>> b/Documentation/networking/ip-sysctl.txt
>> index 2ea4c45..4094128 100644
>> --- a/Documentation/networking/ip-sysctl.txt
>> +++ b/Documentation/networking/ip-sysctl.txt
>> @@ -591,6 +591,18 @@ tcp_syn_retries - INTEGER
>>  with the current initial RTO of 1second. With this the final timeout
>>  for an active TCP connection attempt will happen after 127seconds.
>> 
>> +tcp_timer_restart - INTEGER
>> +Controls how the RTO and PTO timers are restarted (RTOR and TLPR).
>> +If set (per timer or combined) the timers are restarted with
>> +respect to the earliest outstanding segment, to not extend tail loss
>> +latency unnecessarily.
>> +Possible values:
>> +0 disables RTOR and TLPR.
>> +1 enables RTOR.
>> +2 enables TLPR.
>> +3 enables RTOR and TLPR.
>> +Default: 3
>> +
>> tcp_timestamps - BOOLEAN
>>  Enable timestamps as defined in RFC1323.
>> 
>> diff --git a/include/net/tcp.h b/include/net/tcp.h
>> index f80e74c..833efb7 100644
>> --- a/include/net/tcp.h
>> +++ b/include/net/tcp.h
>> @@ -76,6 +76,11 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
>> /* After receiving this amount of duplicate ACKs fast retransmit starts. */
>> #define TCP_FASTRETRANS_THRESH 3
>> 
>> +/* Disable RTO Restart if the number of outstanding segments is at least. */
>> +#define TCP_TIMER_RTORESTART1
>> +#define TCP_TIMER_TLPRESTART2
>> +#define TCP_RTORESTART_THRESH   4
> 
> Unfortunately the comment got now separated from the actual define.
> 
> 
> --
> i.
Darn, I knew I missed something. Well, I’ll fix that in the next round. Suppose 
there are more things that could be improved.

Per



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [RFC PATCHv2 net-next 1/2] tcp: RTO Restart (RTOR)

2015-12-08 Thread Ilpo Järvinen
On Tue, 8 Dec 2015, Per Hurtig wrote:

> This patch implements the RTO restart modification (RTOR). When data is
> ACKed, and the RTO timer is restarted, the time elapsed since the last
> outstanding segment was transmitted is subtracted from the calculated RTO
> value. This way, the RTO timer will expire after exactly RTO seconds, and
> not RTO + RTT [+ delACK] seconds.
> 
> This patch also implements a new sysctl (tcp_timer_restart) that is used
> to control the timer restart behavior.
> 
> Signed-off-by: Per Hurtig 
> ---
>  Documentation/networking/ip-sysctl.txt | 12 
>  include/net/tcp.h  |  6 ++
>  net/ipv4/sysctl_net_ipv4.c | 10 ++
>  net/ipv4/tcp_input.c   | 29 +
>  4 files changed, 57 insertions(+)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt 
> b/Documentation/networking/ip-sysctl.txt
> index 2ea4c45..4094128 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -591,6 +591,18 @@ tcp_syn_retries - INTEGER
>   with the current initial RTO of 1second. With this the final timeout
>   for an active TCP connection attempt will happen after 127seconds.
>  
> +tcp_timer_restart - INTEGER
> + Controls how the RTO and PTO timers are restarted (RTOR and TLPR).
> + If set (per timer or combined) the timers are restarted with
> + respect to the earliest outstanding segment, to not extend tail loss
> + latency unnecessarily.
> + Possible values:
> + 0 disables RTOR and TLPR.
> + 1 enables RTOR.
> + 2 enables TLPR.
> + 3 enables RTOR and TLPR.
> + Default: 3
> +
>  tcp_timestamps - BOOLEAN
>   Enable timestamps as defined in RFC1323.
>  
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index f80e74c..833efb7 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -76,6 +76,11 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
>  /* After receiving this amount of duplicate ACKs fast retransmit starts. */
>  #define TCP_FASTRETRANS_THRESH 3
>  
> +/* Disable RTO Restart if the number of outstanding segments is at least. */
> +#define TCP_TIMER_RTORESTART 1
> +#define TCP_TIMER_TLPRESTART 2
> +#define TCP_RTORESTART_THRESH4

Unfortunately the comment got now separated from the actual define.


-- 
 i.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCHv2 net-next 1/2] tcp: RTO Restart (RTOR)

2015-12-08 Thread Per Hurtig
This patch implements the RTO restart modification (RTOR). When data is
ACKed, and the RTO timer is restarted, the time elapsed since the last
outstanding segment was transmitted is subtracted from the calculated RTO
value. This way, the RTO timer will expire after exactly RTO seconds, and
not RTO + RTT [+ delACK] seconds.

This patch also implements a new sysctl (tcp_timer_restart) that is used
to control the timer restart behavior.

Signed-off-by: Per Hurtig 
---
 Documentation/networking/ip-sysctl.txt | 12 
 include/net/tcp.h  |  6 ++
 net/ipv4/sysctl_net_ipv4.c | 10 ++
 net/ipv4/tcp_input.c   | 29 +
 4 files changed, 57 insertions(+)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index 2ea4c45..4094128 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -591,6 +591,18 @@ tcp_syn_retries - INTEGER
with the current initial RTO of 1second. With this the final timeout
for an active TCP connection attempt will happen after 127seconds.
 
+tcp_timer_restart - INTEGER
+   Controls how the RTO and PTO timers are restarted (RTOR and TLPR).
+   If set (per timer or combined) the timers are restarted with
+   respect to the earliest outstanding segment, to not extend tail loss
+   latency unnecessarily.
+   Possible values:
+   0 disables RTOR and TLPR.
+   1 enables RTOR.
+   2 enables TLPR.
+   3 enables RTOR and TLPR.
+   Default: 3
+
 tcp_timestamps - BOOLEAN
Enable timestamps as defined in RFC1323.
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f80e74c..833efb7 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -76,6 +76,11 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
 /* After receiving this amount of duplicate ACKs fast retransmit starts. */
 #define TCP_FASTRETRANS_THRESH 3
 
+/* Disable RTO Restart if the number of outstanding segments is at least. */
+#define TCP_TIMER_RTORESTART   1
+#define TCP_TIMER_TLPRESTART   2
+#define TCP_RTORESTART_THRESH  4
+
 /* Maximal number of ACKs sent quickly to accelerate slow-start. */
 #define TCP_MAX_QUICKACKS  16U
 
@@ -284,6 +289,7 @@ extern int sysctl_tcp_autocorking;
 extern int sysctl_tcp_invalid_ratelimit;
 extern int sysctl_tcp_pacing_ss_ratio;
 extern int sysctl_tcp_pacing_ca_ratio;
+extern int sysctl_tcp_timer_restart;
 
 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index a0bd7a5..dfb6968 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -28,6 +28,7 @@
 
 static int zero;
 static int one = 1;
+static int three = 3;
 static int four = 4;
 static int thousand = 1000;
 static int gso_max_segs = GSO_MAX_SEGS;
@@ -745,6 +746,15 @@ static struct ctl_table ipv4_table[] = {
.extra2 = &thousand,
},
{
+   .procname   = "tcp_timer_restart",
+   .data   = &sysctl_tcp_timer_restart,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = &zero,
+   .extra2 = &three,
+   },
+   {
.procname   = "tcp_autocorking",
.data   = &sysctl_tcp_autocorking,
.maxlen = sizeof(int),
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2d656ee..2870af3 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -101,6 +101,8 @@ int sysctl_tcp_thin_dupack __read_mostly;
 
 int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
 int sysctl_tcp_early_retrans __read_mostly = 3;
+int sysctl_tcp_timer_restart __read_mostly = TCP_TIMER_RTORESTART |
+TCP_TIMER_TLPRESTART;
 int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
 
 #define FLAG_DATA  0x01 /* Incoming frame contained data.  
*/
@@ -2997,6 +2999,22 @@ static void tcp_cong_avoid(struct sock *sk, u32 ack, u32 
acked)
tcp_sk(sk)->snd_cwnd_stamp = tcp_time_stamp;
 }
 
+static u32 tcp_unsent_pkts(const struct sock *sk, u32 ulimit)
+{
+   struct sk_buff *skb = tcp_send_head(sk);
+   u32 pkts = 0;
+
+   if (skb)
+   tcp_for_write_queue_from(skb, sk) {
+   pkts += tcp_skb_pcount(skb);
+
+   if (ulimit && pkts >= ulimit)
+   return ulimit;
+   }
+
+   return pkts;
+}
+
 /* Restart timer after forward progress on connection.
  * RFC2988 recommends to restart timer to now+rto.
  */
@@ -3027,6 +3045,17 @@ void tcp_rearm_rto(struct sock *sk)
 */
if (delta >