On Sat, Jan 20, 2018 at 05:53:05PM +0100, Mike Belopuhov wrote:
> On Sat, Jan 20, 2018 at 15:17 +0100, Alexander Bluhm wrote:
> While I'm not against making all TCP timeouts look similar, I'd like
> to understand if there's any other reason to do it other than
> "consistency".

I think it prevents a use after free.  The timeouts may fire and
wait for the netlock.  Then the softnet task or a user process calls
tcp_close().  As our net lock does not include splsoftnet anymore,
soft timeouts are not blocked.  So the reaper timeout may immediately
free the tcpcb.  Then the timer functions could operate on invalid
memory.  I have not seen this in practice, it is just a theory.

In 4.4 BSD there was no reaper timeout.  We have added it here:

revision 1.85
date: 2004/11/25 15:32:08;  author: markus;  state: Exp;  lines: +18 -3;
fix for race between invocation for timer and network input
1) add a reaper for TCP and SYN cache states (cf. netbsd pr 20390)
2) additional check for TCP_TIMER_ISARMED(TCPT_REXMT) in tcp_timer_persist()
with mickey@; ok deraadt@

> > The tcp reaper timeout is still imlemented as soft timeout.  So it
> > can run while net lock is held by others and it is not synchronized
> > with the other tcp timeouts.
> 
> Am I right to say that this is not an issue?  Neither pool_put nor
> tcpstat_inc requre a NET_LOCK, correct?

pool_put() does not need the lock to protect itself.  But it must
not free memory that another thread which is holding the net lock
is still using.

> > Convert it to an ordinary tcp timeout
> > so it is scheduled on the same timeout thread.  It grabs the net
> > lock to make sure that softnet has finished its work.
> 
> This just makes other threads who want to grab NET_LOCK wait for
> a pool_put to finish, but where's the benefit?

It restores the bahavior introduced in revision 1.85.  The memory
is only freed after the thread that called tcp_close() has released
the net lock.  As there was no timeout it 4.4 BSD, and the callers
of tcp_close() set their tcpcb pointer to NULL, and all syncronisation
with the timeouts is done by using a single timeout thread, the net
lock in tcp_timer_reaper() is not necessary.

If we fear that freeing the tcpcb is delayed too much by NET_LOCK()
and we consume too much memory, then I am fine when we do not grab
the net lock in tcp_timer_reaper().

Generally I prefer to place to many locks than to miss some.  When
we see lock congestion, we can remove the unnecessary ones.  This
stategy is better than adding necessary locks to a crashing kernel.

> What's up with the diagnostics?  Do you suspect a race of some sorts?

I consider myself not smart enough to program multi threaded.  There
is always some race I did not think about.  So I add diagnostics
to have a better feeling.

As we agree that the net lock is not needed in the TCP reaper, I
propose this diff without diagnostics and lock, but with a comment.

ok?

bluhm

Index: netinet/tcp_subr.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_subr.c,v
retrieving revision 1.167
diff -u -p -r1.167 tcp_subr.c
--- netinet/tcp_subr.c  7 Dec 2017 16:52:21 -0000       1.167
+++ netinet/tcp_subr.c  22 Jan 2018 20:29:06 -0000
@@ -436,7 +436,6 @@ tcp_newtcpcb(struct inpcb *inp)
        TCP_INIT_DELACK(tp);
        for (i = 0; i < TCPT_NTIMERS; i++)
                TCP_TIMER_INIT(tp, i);
-       timeout_set(&tp->t_reap_to, tcp_reaper, tp);
 
        tp->sack_enable = tcp_do_sack;
        tp->t_flags = tcp_do_rfc1323 ? (TF_REQ_SCALE|TF_REQ_TSTMP) : 0;
@@ -532,21 +531,12 @@ tcp_close(struct tcpcb *tp)
        m_free(tp->t_template);
 
        tp->t_flags |= TF_DEAD;
-       timeout_add(&tp->t_reap_to, 0);
+       TCP_TIMER_ARM(tp, TCPT_REAPER, 0);
 
-       inp->inp_ppcb = 0;
+       inp->inp_ppcb = NULL;
        soisdisconnected(so);
        in_pcbdetach(inp);
        return (NULL);
-}
-
-void
-tcp_reaper(void *arg)
-{
-       struct tcpcb *tp = arg;
-
-       pool_put(&tcpcb_pool, tp);
-       tcpstat_inc(tcps_closed);
 }
 
 int
Index: netinet/tcp_timer.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_timer.c,v
retrieving revision 1.60
diff -u -p -r1.60 tcp_timer.c
--- netinet/tcp_timer.c 29 Oct 2017 14:56:36 -0000      1.60
+++ netinet/tcp_timer.c 22 Jan 2018 21:33:03 -0000
@@ -70,12 +70,14 @@ void        tcp_timer_rexmt(void *);
 void   tcp_timer_persist(void *);
 void   tcp_timer_keep(void *);
 void   tcp_timer_2msl(void *);
+void   tcp_timer_reaper(void *);
 
 const tcp_timer_func_t tcp_timer_funcs[TCPT_NTIMERS] = {
        tcp_timer_rexmt,
        tcp_timer_persist,
        tcp_timer_keep,
        tcp_timer_2msl,
+       tcp_timer_reaper,
 };
 
 /*
@@ -463,4 +465,21 @@ tcp_timer_2msl(void *arg)
 
  out:
        NET_UNLOCK();
+}
+
+void
+tcp_timer_reaper(void *arg)
+{
+       struct tcpcb *tp = arg;
+
+       /*
+        * This timer is necessary to delay the pool_put() after all timers
+        * have finished, even if they were sleeping to grab the net lock.
+        * Putting the pool_put() in a timer is sufficinet as all timers run
+        * from the same timeout thread.  Note that neither softnet thread nor
+        * user process may access the tcpcb after arming the reaper timer.
+        * Freeing may run in parallel as it does not grab the net lock.
+        */
+       pool_put(&tcpcb_pool, tp);
+       tcpstat_inc(tcps_closed);
 }
Index: netinet/tcp_timer.h
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_timer.h,v
retrieving revision 1.14
diff -u -p -r1.14 tcp_timer.h
--- netinet/tcp_timer.h 4 Oct 2016 13:54:32 -0000       1.14
+++ netinet/tcp_timer.h 22 Jan 2018 20:29:06 -0000
@@ -39,12 +39,13 @@
  * Definitions of the TCP timers.  These timers are counted
  * down PR_SLOWHZ times a second.
  */
-#define        TCPT_NTIMERS    4
+#define        TCPT_NTIMERS    5
 
 #define        TCPT_REXMT      0               /* retransmit */
 #define        TCPT_PERSIST    1               /* retransmit persistence */
 #define        TCPT_KEEP       2               /* keep alive */
 #define        TCPT_2MSL       3               /* 2*msl quiet time timer */
+#define        TCPT_REAPER     4               /* delayed cleanup timeout */
 
 /*
  * The TCPT_REXMT timer is used to force retransmissions.
@@ -108,8 +109,8 @@
 #define        TCP_DELACK_TICKS (hz / PR_FASTHZ)       /* time to delay ACK */
 
 #ifdef TCPTIMERS
-const char *tcptimers[] =
-    { "REXMT", "PERSIST", "KEEP", "2MSL" };
+const char *tcptimers[TCPT_NTIMERS] =
+    { "REXMT", "PERSIST", "KEEP", "2MSL", "REAPER" };
 #endif /* TCPTIMERS */
 
 /*
Index: netinet/tcp_var.h
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_var.h,v
retrieving revision 1.128
diff -u -p -r1.128 tcp_var.h
--- netinet/tcp_var.h   2 Nov 2017 14:01:18 -0000       1.128
+++ netinet/tcp_var.h   22 Jan 2018 20:29:06 -0000
@@ -193,8 +193,6 @@ struct tcpcb {
        u_short t_pmtud_ip_hl;          /* IP header length from ICMP payload */
 
        int pf;
-
-       struct  timeout t_reap_to;      /* delayed cleanup timeout */
 };
 
 #define        intotcpcb(ip)   ((struct tcpcb *)(ip)->inp_ppcb)
@@ -679,6 +677,7 @@ tcpstat_pkt(enum tcpstat_counters pcount
        counters_pkt(tcpcounters, pcounter, bcounter, v);
 }
 
+extern struct pool tcpcb_pool;
 extern struct inpcbtable tcbtable;     /* head of queue of active tcpcb's */
 extern u_int32_t tcp_now;              /* for RFC 1323 timestamps */
 extern int tcp_do_rfc1323;     /* enabled/disabled? */
@@ -705,7 +704,6 @@ extern      int tcp_syn_cache_active; /* acti
 void    tcp_canceltimers(struct tcpcb *);
 struct tcpcb *
         tcp_close(struct tcpcb *);
-void    tcp_reaper(void *);
 int     tcp_freeq(struct tcpcb *);
 #ifdef INET6
 void    tcp6_ctlinput(int, struct sockaddr *, u_int, void *);

Reply via email to