On Sat, Jan 20, 2018 at 05:53:05PM +0100, Mike Belopuhov wrote: > On Sat, Jan 20, 2018 at 15:17 +0100, Alexander Bluhm wrote: > While I'm not against making all TCP timeouts look similar, I'd like > to understand if there's any other reason to do it other than > "consistency".
I think it prevents a use after free. The timeouts may fire and wait for the netlock. Then the softnet task or a user process calls tcp_close(). As our net lock does not include splsoftnet anymore, soft timeouts are not blocked. So the reaper timeout may immediately free the tcpcb. Then the timer functions could operate on invalid memory. I have not seen this in practice, it is just a theory. In 4.4 BSD there was no reaper timeout. We have added it here: revision 1.85 date: 2004/11/25 15:32:08; author: markus; state: Exp; lines: +18 -3; fix for race between invocation for timer and network input 1) add a reaper for TCP and SYN cache states (cf. netbsd pr 20390) 2) additional check for TCP_TIMER_ISARMED(TCPT_REXMT) in tcp_timer_persist() with mickey@; ok deraadt@ > > The tcp reaper timeout is still imlemented as soft timeout. So it > > can run while net lock is held by others and it is not synchronized > > with the other tcp timeouts. > > Am I right to say that this is not an issue? Neither pool_put nor > tcpstat_inc requre a NET_LOCK, correct? pool_put() does not need the lock to protect itself. But it must not free memory that another thread which is holding the net lock is still using. > > Convert it to an ordinary tcp timeout > > so it is scheduled on the same timeout thread. It grabs the net > > lock to make sure that softnet has finished its work. > > This just makes other threads who want to grab NET_LOCK wait for > a pool_put to finish, but where's the benefit? It restores the bahavior introduced in revision 1.85. The memory is only freed after the thread that called tcp_close() has released the net lock. As there was no timeout it 4.4 BSD, and the callers of tcp_close() set their tcpcb pointer to NULL, and all syncronisation with the timeouts is done by using a single timeout thread, the net lock in tcp_timer_reaper() is not necessary. If we fear that freeing the tcpcb is delayed too much by NET_LOCK() and we consume too much memory, then I am fine when we do not grab the net lock in tcp_timer_reaper(). Generally I prefer to place to many locks than to miss some. When we see lock congestion, we can remove the unnecessary ones. This stategy is better than adding necessary locks to a crashing kernel. > What's up with the diagnostics? Do you suspect a race of some sorts? I consider myself not smart enough to program multi threaded. There is always some race I did not think about. So I add diagnostics to have a better feeling. As we agree that the net lock is not needed in the TCP reaper, I propose this diff without diagnostics and lock, but with a comment. ok? bluhm Index: netinet/tcp_subr.c =================================================================== RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_subr.c,v retrieving revision 1.167 diff -u -p -r1.167 tcp_subr.c --- netinet/tcp_subr.c 7 Dec 2017 16:52:21 -0000 1.167 +++ netinet/tcp_subr.c 22 Jan 2018 20:29:06 -0000 @@ -436,7 +436,6 @@ tcp_newtcpcb(struct inpcb *inp) TCP_INIT_DELACK(tp); for (i = 0; i < TCPT_NTIMERS; i++) TCP_TIMER_INIT(tp, i); - timeout_set(&tp->t_reap_to, tcp_reaper, tp); tp->sack_enable = tcp_do_sack; tp->t_flags = tcp_do_rfc1323 ? (TF_REQ_SCALE|TF_REQ_TSTMP) : 0; @@ -532,21 +531,12 @@ tcp_close(struct tcpcb *tp) m_free(tp->t_template); tp->t_flags |= TF_DEAD; - timeout_add(&tp->t_reap_to, 0); + TCP_TIMER_ARM(tp, TCPT_REAPER, 0); - inp->inp_ppcb = 0; + inp->inp_ppcb = NULL; soisdisconnected(so); in_pcbdetach(inp); return (NULL); -} - -void -tcp_reaper(void *arg) -{ - struct tcpcb *tp = arg; - - pool_put(&tcpcb_pool, tp); - tcpstat_inc(tcps_closed); } int Index: netinet/tcp_timer.c =================================================================== RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_timer.c,v retrieving revision 1.60 diff -u -p -r1.60 tcp_timer.c --- netinet/tcp_timer.c 29 Oct 2017 14:56:36 -0000 1.60 +++ netinet/tcp_timer.c 22 Jan 2018 21:33:03 -0000 @@ -70,12 +70,14 @@ void tcp_timer_rexmt(void *); void tcp_timer_persist(void *); void tcp_timer_keep(void *); void tcp_timer_2msl(void *); +void tcp_timer_reaper(void *); const tcp_timer_func_t tcp_timer_funcs[TCPT_NTIMERS] = { tcp_timer_rexmt, tcp_timer_persist, tcp_timer_keep, tcp_timer_2msl, + tcp_timer_reaper, }; /* @@ -463,4 +465,21 @@ tcp_timer_2msl(void *arg) out: NET_UNLOCK(); +} + +void +tcp_timer_reaper(void *arg) +{ + struct tcpcb *tp = arg; + + /* + * This timer is necessary to delay the pool_put() after all timers + * have finished, even if they were sleeping to grab the net lock. + * Putting the pool_put() in a timer is sufficinet as all timers run + * from the same timeout thread. Note that neither softnet thread nor + * user process may access the tcpcb after arming the reaper timer. + * Freeing may run in parallel as it does not grab the net lock. + */ + pool_put(&tcpcb_pool, tp); + tcpstat_inc(tcps_closed); } Index: netinet/tcp_timer.h =================================================================== RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_timer.h,v retrieving revision 1.14 diff -u -p -r1.14 tcp_timer.h --- netinet/tcp_timer.h 4 Oct 2016 13:54:32 -0000 1.14 +++ netinet/tcp_timer.h 22 Jan 2018 20:29:06 -0000 @@ -39,12 +39,13 @@ * Definitions of the TCP timers. These timers are counted * down PR_SLOWHZ times a second. */ -#define TCPT_NTIMERS 4 +#define TCPT_NTIMERS 5 #define TCPT_REXMT 0 /* retransmit */ #define TCPT_PERSIST 1 /* retransmit persistence */ #define TCPT_KEEP 2 /* keep alive */ #define TCPT_2MSL 3 /* 2*msl quiet time timer */ +#define TCPT_REAPER 4 /* delayed cleanup timeout */ /* * The TCPT_REXMT timer is used to force retransmissions. @@ -108,8 +109,8 @@ #define TCP_DELACK_TICKS (hz / PR_FASTHZ) /* time to delay ACK */ #ifdef TCPTIMERS -const char *tcptimers[] = - { "REXMT", "PERSIST", "KEEP", "2MSL" }; +const char *tcptimers[TCPT_NTIMERS] = + { "REXMT", "PERSIST", "KEEP", "2MSL", "REAPER" }; #endif /* TCPTIMERS */ /* Index: netinet/tcp_var.h =================================================================== RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_var.h,v retrieving revision 1.128 diff -u -p -r1.128 tcp_var.h --- netinet/tcp_var.h 2 Nov 2017 14:01:18 -0000 1.128 +++ netinet/tcp_var.h 22 Jan 2018 20:29:06 -0000 @@ -193,8 +193,6 @@ struct tcpcb { u_short t_pmtud_ip_hl; /* IP header length from ICMP payload */ int pf; - - struct timeout t_reap_to; /* delayed cleanup timeout */ }; #define intotcpcb(ip) ((struct tcpcb *)(ip)->inp_ppcb) @@ -679,6 +677,7 @@ tcpstat_pkt(enum tcpstat_counters pcount counters_pkt(tcpcounters, pcounter, bcounter, v); } +extern struct pool tcpcb_pool; extern struct inpcbtable tcbtable; /* head of queue of active tcpcb's */ extern u_int32_t tcp_now; /* for RFC 1323 timestamps */ extern int tcp_do_rfc1323; /* enabled/disabled? */ @@ -705,7 +704,6 @@ extern int tcp_syn_cache_active; /* acti void tcp_canceltimers(struct tcpcb *); struct tcpcb * tcp_close(struct tcpcb *); -void tcp_reaper(void *); int tcp_freeq(struct tcpcb *); #ifdef INET6 void tcp6_ctlinput(int, struct sockaddr *, u_int, void *);