Hi, thanks for the detailed explanation!
On Mon, Jan 22, 2018 at 22:37 +0100, Alexander Bluhm wrote: > On Sat, Jan 20, 2018 at 05:53:05PM +0100, Mike Belopuhov wrote: > > On Sat, Jan 20, 2018 at 15:17 +0100, Alexander Bluhm wrote: > > While I'm not against making all TCP timeouts look similar, I'd like > > to understand if there's any other reason to do it other than > > "consistency". > > I think it prevents a use after free. The timeouts may fire and > wait for the netlock. Then the softnet task or a user process calls > tcp_close(). As our net lock does not include splsoftnet anymore, > soft timeouts are not blocked. So the reaper timeout may immediately > free the tcpcb. Then the timer functions could operate on invalid > memory. I have not seen this in practice, it is just a theory. > > In 4.4 BSD there was no reaper timeout. We have added it here: > > revision 1.85 > date: 2004/11/25 15:32:08; author: markus; state: Exp; lines: +18 -3; > fix for race between invocation for timer and network input > 1) add a reaper for TCP and SYN cache states (cf. netbsd pr 20390) > 2) additional check for TCP_TIMER_ISARMED(TCPT_REXMT) in tcp_timer_persist() > with mickey@; ok deraadt@ > > > > The tcp reaper timeout is still imlemented as soft timeout. So it > > > can run while net lock is held by others and it is not synchronized > > > with the other tcp timeouts. > > > > Am I right to say that this is not an issue? Neither pool_put nor > > tcpstat_inc requre a NET_LOCK, correct? > > pool_put() does not need the lock to protect itself. But it must > not free memory that another thread which is holding the net lock > is still using. > > > > Convert it to an ordinary tcp timeout > > > so it is scheduled on the same timeout thread. It grabs the net > > > lock to make sure that softnet has finished its work. > > > > This just makes other threads who want to grab NET_LOCK wait for > > a pool_put to finish, but where's the benefit? > > It restores the bahavior introduced in revision 1.85. The memory > is only freed after the thread that called tcp_close() has released > the net lock. As there was no timeout it 4.4 BSD, and the callers > of tcp_close() set their tcpcb pointer to NULL, and all syncronisation > with the timeouts is done by using a single timeout thread, the net > lock in tcp_timer_reaper() is not necessary. > I don't mind restoring the order of operations. After the change below there will be one thread serializing all timeouts. Perhaps we can revisit this after we move to multiple timeout threads? By then we might require different synchronization methods. > If we fear that freeing the tcpcb is delayed too much by NET_LOCK() > and we consume too much memory, then I am fine when we do not grab > the net lock in tcp_timer_reaper(). > > Generally I prefer to place to many locks than to miss some. When > we see lock congestion, we can remove the unnecessary ones. This > stategy is better than adding necessary locks to a crashing kernel. > Yeah, but as you've said the kernel isn't crashing and this is purely theoretical. While I see why the current state of affairs isn't desirable, I don't see how it's still flawed after the change below. > > What's up with the diagnostics? Do you suspect a race of some sorts? > > I consider myself not smart enough to program multi threaded. There > is always some race I did not think about. So I add diagnostics > to have a better feeling. > There's nothing wrong with diagnostics of course, but at the same time, if there's no bug to hunt why bother? I'm certain you've tried the diff yourself and it didn't blow up, otherwise we wouldn't be having this conversation. Do you want to have it temporarily enabled? I won't mind that, but it looks good to me as it is. > As we agree that the net lock is not needed in the TCP reaper, I > propose this diff without diagnostics and lock, but with a comment. > > ok? > > bluhm > > Index: netinet/tcp_subr.c > =================================================================== > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_subr.c,v > retrieving revision 1.167 > diff -u -p -r1.167 tcp_subr.c > --- netinet/tcp_subr.c 7 Dec 2017 16:52:21 -0000 1.167 > +++ netinet/tcp_subr.c 22 Jan 2018 20:29:06 -0000 > @@ -436,7 +436,6 @@ tcp_newtcpcb(struct inpcb *inp) > TCP_INIT_DELACK(tp); > for (i = 0; i < TCPT_NTIMERS; i++) > TCP_TIMER_INIT(tp, i); > - timeout_set(&tp->t_reap_to, tcp_reaper, tp); > > tp->sack_enable = tcp_do_sack; > tp->t_flags = tcp_do_rfc1323 ? (TF_REQ_SCALE|TF_REQ_TSTMP) : 0; > @@ -532,21 +531,12 @@ tcp_close(struct tcpcb *tp) > m_free(tp->t_template); > > tp->t_flags |= TF_DEAD; > - timeout_add(&tp->t_reap_to, 0); > + TCP_TIMER_ARM(tp, TCPT_REAPER, 0); > > - inp->inp_ppcb = 0; > + inp->inp_ppcb = NULL; > soisdisconnected(so); > in_pcbdetach(inp); > return (NULL); > -} > - > -void > -tcp_reaper(void *arg) > -{ > - struct tcpcb *tp = arg; > - > - pool_put(&tcpcb_pool, tp); > - tcpstat_inc(tcps_closed); > } > > int > Index: netinet/tcp_timer.c > =================================================================== > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_timer.c,v > retrieving revision 1.60 > diff -u -p -r1.60 tcp_timer.c > --- netinet/tcp_timer.c 29 Oct 2017 14:56:36 -0000 1.60 > +++ netinet/tcp_timer.c 22 Jan 2018 21:33:03 -0000 > @@ -70,12 +70,14 @@ void tcp_timer_rexmt(void *); > void tcp_timer_persist(void *); > void tcp_timer_keep(void *); > void tcp_timer_2msl(void *); > +void tcp_timer_reaper(void *); > > const tcp_timer_func_t tcp_timer_funcs[TCPT_NTIMERS] = { > tcp_timer_rexmt, > tcp_timer_persist, > tcp_timer_keep, > tcp_timer_2msl, > + tcp_timer_reaper, > }; > > /* > @@ -463,4 +465,21 @@ tcp_timer_2msl(void *arg) > > out: > NET_UNLOCK(); > +} > + > +void > +tcp_timer_reaper(void *arg) > +{ > + struct tcpcb *tp = arg; > + > + /* > + * This timer is necessary to delay the pool_put() after all timers > + * have finished, even if they were sleeping to grab the net lock. > + * Putting the pool_put() in a timer is sufficinet as all timers run > + * from the same timeout thread. Note that neither softnet thread nor > + * user process may access the tcpcb after arming the reaper timer. > + * Freeing may run in parallel as it does not grab the net lock. > + */ > + pool_put(&tcpcb_pool, tp); > + tcpstat_inc(tcps_closed); > } > Index: netinet/tcp_timer.h > =================================================================== > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_timer.h,v > retrieving revision 1.14 > diff -u -p -r1.14 tcp_timer.h > --- netinet/tcp_timer.h 4 Oct 2016 13:54:32 -0000 1.14 > +++ netinet/tcp_timer.h 22 Jan 2018 20:29:06 -0000 > @@ -39,12 +39,13 @@ > * Definitions of the TCP timers. These timers are counted > * down PR_SLOWHZ times a second. > */ > -#define TCPT_NTIMERS 4 > +#define TCPT_NTIMERS 5 > > #define TCPT_REXMT 0 /* retransmit */ > #define TCPT_PERSIST 1 /* retransmit persistence */ > #define TCPT_KEEP 2 /* keep alive */ > #define TCPT_2MSL 3 /* 2*msl quiet time timer */ > +#define TCPT_REAPER 4 /* delayed cleanup timeout */ > > /* > * The TCPT_REXMT timer is used to force retransmissions. > @@ -108,8 +109,8 @@ > #define TCP_DELACK_TICKS (hz / PR_FASTHZ) /* time to delay ACK */ > > #ifdef TCPTIMERS > -const char *tcptimers[] = > - { "REXMT", "PERSIST", "KEEP", "2MSL" }; > +const char *tcptimers[TCPT_NTIMERS] = > + { "REXMT", "PERSIST", "KEEP", "2MSL", "REAPER" }; > #endif /* TCPTIMERS */ > > /* > Index: netinet/tcp_var.h > =================================================================== > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_var.h,v > retrieving revision 1.128 > diff -u -p -r1.128 tcp_var.h > --- netinet/tcp_var.h 2 Nov 2017 14:01:18 -0000 1.128 > +++ netinet/tcp_var.h 22 Jan 2018 20:29:06 -0000 > @@ -193,8 +193,6 @@ struct tcpcb { > u_short t_pmtud_ip_hl; /* IP header length from ICMP payload */ > > int pf; > - > - struct timeout t_reap_to; /* delayed cleanup timeout */ > }; > > #define intotcpcb(ip) ((struct tcpcb *)(ip)->inp_ppcb) > @@ -679,6 +677,7 @@ tcpstat_pkt(enum tcpstat_counters pcount > counters_pkt(tcpcounters, pcounter, bcounter, v); > } > > +extern struct pool tcpcb_pool; > extern struct inpcbtable tcbtable; /* head of queue of active > tcpcb's */ > extern u_int32_t tcp_now; /* for RFC 1323 timestamps */ > extern int tcp_do_rfc1323; /* enabled/disabled? */ > @@ -705,7 +704,6 @@ extern int tcp_syn_cache_active; /* acti > void tcp_canceltimers(struct tcpcb *); > struct tcpcb * > tcp_close(struct tcpcb *); > -void tcp_reaper(void *); > int tcp_freeq(struct tcpcb *); > #ifdef INET6 > void tcp6_ctlinput(int, struct sockaddr *, u_int, void *); >