Re: [PATCH net 2/5] tcp: prevent bogus FRTO undos with non-SACK flows

2018-03-07 Thread Ilpo Järvinen
On Wed, 7 Mar 2018, Yuchung Cheng wrote:
> On Wed, Mar 7, 2018 at 11:24 AM, Neal Cardwell wrote:
> > On Wed, Mar 7, 2018 at 7:59 AM, Ilpo Järvinen wrote:
> > >
> > > In a non-SACK case, any non-retransmitted segment acknowledged will
> > > set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
> > > no indication that it would have been delivered for real (the
> > > scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
> > > case). This causes bogus undos in ordinary RTO recoveries where
> > > segments are lost here and there, with a few delivered segments in
> > > between losses. A cumulative ACK will cover the retransmitted ones at
> > > the bottom and the non-retransmitted ones following them, causing
> > > FLAG_ORIG_SACK_ACKED to be set in tcp_clean_rtx_queue and resulting
> > > in a spurious FRTO undo.
> > >
> > > We need to make the check more strict for non-SACK case and check
> > > that none of the cumulatively ACKed segments were retransmitted,
> > > which would be the case for the last step of FRTO algorithm as we
> > > sent out only new segments previously. Only then, allow FRTO undo
> > > to proceed in non-SACK case.
> >
> > Hi Ilpo - Do you have a packet trace or (even better) packetdrill
> > script illustrating this issue? It would be nice to have a test case
> > or at least concrete example of this.
>
> a packetdrill or even a contrived example would be good ...

I've seen all the issues except this one for sure in packet traces.
I'm somewhat old-school, though: while looking for the burst issue I
discovered this one purely by reading the code (making it more than
_one_ issue). However, I think I later also saw this issue in the
traces (as it did not seem to match any of the other burst issues this
whole series is trying to fix). But finding that dump afterwards would
take a really long time; I have more than enough of them from our
recent tests ;-).

But anyway, that was before the condition was recently moved into the
tp->frto block, so it might no longer be triggerable. It clearly was
triggerable beforehand, without the tp->frto guard (and I just
forward-ported past that recent change without thinking much about it).

To trigger it, both an ever-R (ever-retransmitted) skb and a !ever-R skb
would need to be cumulatively ACKed while tp->frto is non-zero. Do you
think that is still possible with FRTO? E.g., after some undo that
leaves some ever-R skbs behind, followed by an RTO that starts the FRTO
procedure?
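
The trigger condition above can be modelled in user space. What follows
is only a minimal sketch, not kernel code: struct model_seg and the
model_* functions are hypothetical names invented for illustration, the
flag values are local to the sketch, and the high_seq and
TCPCB_SACKED_ACKED handling of the real tcp_clean_rtx_queue is left out
on purpose (the model covers the non-SACK accumulation only).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag names follow the patch; the values are local to this sketch. */
#define FLAG_RETRANS_DATA_ACKED 0x08
#define FLAG_ORIG_SACK_ACKED    0x200

/* Hypothetical stand-in for an skb in the retransmission queue. */
struct model_seg {
	bool ever_retrans; /* "ever-R": retransmitted at least once */
};

/*
 * Simplified non-SACK model of the flag accumulation described in the
 * commit message: a cumulative ACK covers the first n_acked segments of
 * the queue, and each one contributes a flag depending on whether it
 * was ever retransmitted.
 */
int model_clean_rtx_queue(const struct model_seg *q, size_t n, size_t n_acked)
{
	int flag = 0;
	size_t i;

	assert(n_acked <= n);
	for (i = 0; i < n_acked; i++) {
		if (q[i].ever_retrans)
			flag |= FLAG_RETRANS_DATA_ACKED;
		else
			flag |= FLAG_ORIG_SACK_ACKED;
	}
	return flag;
}

/* Undo condition without the extra check (tp->frto assumed non-zero). */
bool model_undo_before_patch(int flag)
{
	return (flag & FLAG_ORIG_SACK_ACKED) != 0;
}

/* Undo condition with the check added by the patch. */
bool model_undo_after_patch(int flag, bool is_sack)
{
	return (flag & FLAG_ORIG_SACK_ACKED) &&
	       (is_sack || !(flag & FLAG_RETRANS_DATA_ACKED));
}
```

With a queue of { ever-R, !ever-R, !ever-R } and a cumulative ACK
covering the first two segments, the model sets both flags: the old
condition allows the undo, while the patched condition rejects it for a
non-SACK flow.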

> also why not just avoid setting FLAG_ORIG_SACK_ACKED on non-SACK? seems
> a much cleaner fix.

I guess that would work now that the relevant FRTO condition got moved
into the tp->frto block. It wouldn't have been that simple earlier
as SACK wanted FLAG_ORIG_SACK_ACKED while non-SACK wants
FLAG_ONLY_ORIG_ACKED (that was already available through a combination
of the existing FLAGs).
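
To make the two options concrete, here is a rough user-space sketch
comparing a check at the point of use (as in the posted patch) with
filtering the flag where it is produced. The second variant is only one
possible reading of the suggestion, and all model_* names are
hypothetical; the flag values are local to the sketch.

```c
#include <stdbool.h>

/* Flag names follow the patch; the values are local to this sketch. */
#define FLAG_RETRANS_DATA_ACKED 0x08
#define FLAG_ORIG_SACK_ACKED    0x200

/*
 * Hypothetical helper for the "only originals ACKed" condition, built
 * from a combination of the two existing flags.
 */
bool model_only_orig_acked(int flag)
{
	return (flag & FLAG_ORIG_SACK_ACKED) &&
	       !(flag & FLAG_RETRANS_DATA_ACKED);
}

/* Variant A: decide at the point of use, as the posted patch does. */
bool model_undo_check_at_use(int flag, bool is_sack)
{
	if (is_sack)
		return (flag & FLAG_ORIG_SACK_ACKED) != 0;
	return model_only_orig_acked(flag);
}

/*
 * Variant B (assumed reading of the suggestion): for non-SACK flows,
 * drop FLAG_ORIG_SACK_ACKED where the flags are produced whenever
 * retransmitted data was ACKed too, so the consumer keeps a plain test.
 */
int model_filter_flag(int flag, bool is_sack)
{
	if (!is_sack && (flag & FLAG_RETRANS_DATA_ACKED))
		flag &= ~FLAG_ORIG_SACK_ACKED;
	return flag;
}

/* The plain test the consumer keeps in variant B. */
bool model_undo_check_plain(int flag)
{
	return (flag & FLAG_ORIG_SACK_ACKED) != 0;
}
```

In this model both variants reach the same decision for every
combination of the two flags, with and without SACK; they differ only in
where the non-SACK special case lives.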


-- 
 i.

Re: [PATCH net 2/5] tcp: prevent bogus FRTO undos with non-SACK flows

2018-03-07 Thread Yuchung Cheng
On Wed, Mar 7, 2018 at 11:24 AM, Neal Cardwell wrote:
>
> On Wed, Mar 7, 2018 at 7:59 AM, Ilpo Järvinen wrote:
> >
> > In a non-SACK case, any non-retransmitted segment acknowledged will
> > set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
> > no indication that it would have been delivered for real (the
> > scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
> > case). This causes bogus undos in ordinary RTO recoveries where
> > segments are lost here and there, with a few delivered segments in
> > between losses. A cumulative ACK will cover the retransmitted ones at
> > the bottom and the non-retransmitted ones following them, causing
> > FLAG_ORIG_SACK_ACKED to be set in tcp_clean_rtx_queue and resulting
> > in a spurious FRTO undo.
> >
> > We need to make the check more strict for non-SACK case and check
> > that none of the cumulatively ACKed segments were retransmitted,
> > which would be the case for the last step of FRTO algorithm as we
> > sent out only new segments previously. Only then, allow FRTO undo
> > to proceed in non-SACK case.
>
> Hi Ilpo - Do you have a packet trace or (even better) packetdrill
> script illustrating this issue? It would be nice to have a test case
> or at least concrete example of this.
a packetdrill or even a contrived example would be good ... also why
not just avoid setting FLAG_ORIG_SACK_ACKED on non-SACK? seems a much
cleaner fix.

>
> Thanks!
> neal


Re: [PATCH net 2/5] tcp: prevent bogus FRTO undos with non-SACK flows

2018-03-07 Thread Neal Cardwell
On Wed, Mar 7, 2018 at 7:59 AM, Ilpo Järvinen wrote:
>
> In a non-SACK case, any non-retransmitted segment acknowledged will
> set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
> no indication that it would have been delivered for real (the
> scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
> case). This causes bogus undos in ordinary RTO recoveries where
> segments are lost here and there, with a few delivered segments in
> between losses. A cumulative ACK will cover the retransmitted ones at
> the bottom and the non-retransmitted ones following them, causing
> FLAG_ORIG_SACK_ACKED to be set in tcp_clean_rtx_queue and resulting
> in a spurious FRTO undo.
>
> We need to make the check more strict for non-SACK case and check
> that none of the cumulatively ACKed segments were retransmitted,
> which would be the case for the last step of FRTO algorithm as we
> sent out only new segments previously. Only then, allow FRTO undo
> to proceed in non-SACK case.

Hi Ilpo - Do you have a packet trace or (even better) packetdrill
script illustrating this issue? It would be nice to have a test case
or at least concrete example of this.

Thanks!
neal


[PATCH net 2/5] tcp: prevent bogus FRTO undos with non-SACK flows

2018-03-07 Thread Ilpo Järvinen
In a non-SACK case, any non-retransmitted segment acknowledged will
set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
no indication that it would have been delivered for real (the
scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
case). This causes bogus undos in ordinary RTO recoveries where
segments are lost here and there, with a few delivered segments in
between losses. A cumulative ACK will cover the retransmitted ones at
the bottom and the non-retransmitted ones following them, causing
FLAG_ORIG_SACK_ACKED to be set in tcp_clean_rtx_queue and resulting
in a spurious FRTO undo.

We need to make the check more strict for non-SACK case and check
that none of the cumulatively ACKed segments were retransmitted,
which would be the case for the last step of FRTO algorithm as we
sent out only new segments previously. Only then, allow FRTO undo
to proceed in non-SACK case.

Signed-off-by: Ilpo Järvinen 
---
 net/ipv4/tcp_input.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0305f6d..1a33752 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2629,8 +2629,13 @@ static void tcp_process_loss(struct sock *sk, int flag, bool is_dupack,
 	if (tp->frto) { /* F-RTO RFC5682 sec 3.1 (sack enhanced version). */
 		/* Step 3.b. A timeout is spurious if not all data are
 		 * lost, i.e., never-retransmitted data are (s)acked.
+		 *
+		 * As the non-SACK case does not keep track of TCPCB_SACKED_ACKED,
+		 * we need to ensure that none of the cumulative ACKed segments
+		 * was retransmitted to confirm the validity of FLAG_ORIG_SACK_ACKED.
 		 */
 		if ((flag & FLAG_ORIG_SACK_ACKED) &&
+		    (tcp_is_sack(tp) || !(flag & FLAG_RETRANS_DATA_ACKED)) &&
 		    tcp_try_undo_loss(sk, true))
 			return;
 
-- 
2.7.4