On Mon, Aug 15, 2022 at 02:36:34PM +0200, Alexander Bluhm wrote:
> On Mon, Aug 15, 2022 at 02:29:02PM +0300, Vitaliy Makkoveev wrote:
> > Nothing special, just get assign `inp' and `tp' from passed socket. I
> > want to do this before (*pru_usrreq)() split to avoid huge code
> > duplication.
> > 
> > We have "tp might get 0 when using socket splicing" commentary. I
> > checked where we set `inp_ppcb' to NULL and found, tcp_close() is the
> > only place, and the following in_pcbdetach() sets `so_pcb' to NULL. So
> > this commentary looks wrong, because we can't have TCP sockets without
> > TCP pcb attached.
> 
> I added the comment and check after a panic in production.
> 
> ----------------------------
> revision 1.109
> date: 2012/01/03 21:50:12;  author: bluhm;  state: Exp;  lines: +6 -2;
> When used with socket splicing, tcp_usrreq() might get called with
> a socket that has an inp but tp is NULL.  The call stack for that
> is tcp_input() tcp_close() soisdisconnected() sorwakeup() somove()
> tcp_usrreq(PRU_RCVD).  To avoid a NULL dereference, just return in
> that case.
> ok henning@
> ----------------------------
> 
> tcp_input() calls tcp_close() after a reset packet.

tcp_close() calls in_pcbdetach() after it sets `inp_ppcb' to NULL:

tcp_close(struct tcpcb *tp)
{
        /* ... */
        inp->inp_ppcb = NULL;
        soisdisconnected(so);
        in_pcbdetach(inp);
        return (NULL);
}

soisdisconnected() doesn't release netlock, so concurrent thread can't
access this `inp'.

soisdisconnected(struct socket *so)
{
        soassertlocked(so);
        so->so_state &= ~(SS_ISCONNECTING|SS_ISCONNECTED|SS_ISDISCONNECTING);
        so->so_state |= (SS_CANTRCVMORE|SS_CANTSENDMORE|SS_ISDISCONNECTED);
        wakeup(&so->so_timeo);
        sowwakeup(so);
        sorwakeup(so);
}

in_pcbdetach() sets `so_pcb' to NULL just at the beginning.

in_pcbdetach(struct inpcb *inp)
{
        struct socket *so = inp->inp_socket;
        struct inpcbtable *table = inp->inp_table;

        NET_ASSERT_LOCKED();

        so->so_pcb = NULL;

        /* ... */
}

So according the current code, we can't have the `inp' with NULL `tp'
after tcp_close() call. Looks like we had missing netlock somewhere.

> 
> > This diff works fine with sys/kern/sosplice test.
> 
> I am not sure that regress covers the tp == NULL case.
> These reset conditions are hard to trigger.
> 
> Maybe it does not happen anymore as I have decoupled somove() with
> a soplice thread 4 years later.
> 
> ----------------------------
> revision 1.155
> date: 2016/08/25 14:13:19;  author: bluhm;  state: Exp;  lines: +48 -9;  
> commitid: PnBATF8qpqSyZflF;
> Spliced TCP sockets become faster when the output part is running
> as its own task thread.  This is inspired by userland copy where a
> process also has to go through the scheduler.  This gives the socket
> buffer a chance to be filled up and tcp_output() is called less
> often and with bigger chunks.
> When two kernel tasks share all the workload, the current scheduler
> implementation will hang userland processes on single cpu machines.
> As a workaround put a yield() into the splicing thread after each
> task execution.  This reduces the number of calls of tcp_output()
> even more.
> OK tedu@ mpi@
> ----------------------------
> 
>                 /*
>                  * TCP has a sendbuffer that can handle multiple packets
>                  * at once.  So queue the stream a bit to accumulate data.
>                  * The sosplice thread will call somove() later and send
>                  * the packets calling tcp_output() only once.
>                  * In the UDP case, send out the packets immediately.
>                  * Using a thread would make things slower.
>                  */
>                 if (so->so_proto->pr_flags & PR_WANTRCVD)
>                         task_add(sosplice_taskq, &so->so_splicetask);
> 
> My original plan was to keep sosplice thread optional and have the
> code work without it.  On the other hand we use it in TCP for 6
> years now.
> 
> Should we move if (tp == NULL) to PRU_RCVD?  That is the relevant
> case.  I think if ((so->so_state & (SS_ISCONNECTED|SS_ISCONNECTING))
> != 0) prevents us from calling tcp_output(tp) with NULL.
> 
> So the only place to crash by dereferencing tp is
>         if (so->so_options & SO_DEBUG) {
>                 otp = tp;
>                 ostate = tp->t_state;
>         }
> 
> When I added the check in rev 1.109, tp was always dereferenced.
>         if (inp) {
>                 tp = intotcpcb(inp);
>                 /* tp might get 0 when using socket splicing */
>                 if (tp == NULL) {
>                         splx(s);
>                         return (0);
>                 }
> #ifdef KPROF
>                 tcp_acounts[tp->t_state][req]++;
> #endif
>                 ostate = tp->t_state;
>         } else
>                 ostate = 0;
> 
> We seem to be safe unless someone removes the sosplice thread and
> runs with SO_DEBUG.  Is this enough history to justify to remove a
> check while just moving code around?
> 
> If we use if (tp != NULL && (so->so_options & SO_DEBUG)) everything
> should be fine.
> 
> > +   KASSERT(*rtp != NULL);
> 
> Remember deraadt@ yelling about too many KASSERT().
> 

What about to return EINVAL in the case, when `tp' is missing? In the
somove() we don't check the PRU_RCVD returned value, so this path
doesn't affected. In the other cases we stops to lie to the userland.

The `inp' without `tp' should not exist, right? This check exists only
to silently prevent panics.

Index: sys/netinet/tcp_usrreq.c
===================================================================
RCS file: /cvs/src/sys/netinet/tcp_usrreq.c,v
retrieving revision 1.187
diff -u -p -r1.187 tcp_usrreq.c
--- sys/netinet/tcp_usrreq.c    15 Aug 2022 09:11:39 -0000      1.187
+++ sys/netinet/tcp_usrreq.c    15 Aug 2022 13:16:39 -0000
@@ -142,6 +142,36 @@ struct     inpcbtable tcbtable;
 int    tcp_fill_info(struct tcpcb *, struct socket *, struct mbuf *);
 int    tcp_ident(void *, size_t *, void *, size_t, int);
 
+static inline int tcp_sogetpcb(struct socket *, struct inpcb **,
+                      struct tcpcb **);
+
+static inline int
+tcp_sogetpcb(struct socket *so, struct inpcb **rinp, struct tcpcb **rtp)
+{
+       struct inpcb *inp;
+       struct tcpcb *tp;
+
+       /*
+        * When a TCP is attached to a socket, then there will be
+        * a (struct inpcb) pointed at by the socket, and this
+        * structure will point at a subsidiary (struct tcpcb).
+        */
+       if ((inp = sotoinpcb(so)) == NULL) {
+               if (so->so_error)
+                       return so->so_error;
+               return EINVAL;
+       }
+
+       /* tp might get 0 when using socket splicing */
+       if ((tp = intotcpcb(inp)) == NULL)
+               return EINVAL;
+
+       *rinp = inp;
+       *rtp = tp;
+
+       return 0;
+}
+
 /*
  * Process a TCP user request for TCP tb.  If this is a send request
  * then m is the mbuf chain of send data.  If this is a timer expiration
@@ -153,7 +183,7 @@ tcp_usrreq(struct socket *so, int req, s
     struct mbuf *control, struct proc *p)
 {
        struct inpcb *inp;
-       struct tcpcb *otp = NULL, *tp = NULL;
+       struct tcpcb *otp = NULL, *tp;
        int error = 0;
        short ostate;
 
@@ -175,22 +205,9 @@ tcp_usrreq(struct socket *so, int req, s
                goto release;
        }
 
-       inp = sotoinpcb(so);
-       /*
-        * When a TCP is attached to a socket, then there will be
-        * a (struct inpcb) pointed at by the socket, and this
-        * structure will point at a subsidiary (struct tcpcb).
-        */
-       if (inp == NULL) {
-               error = so->so_error;
-               if (error == 0)
-                       error = EINVAL;
-               goto release;
-       }
-       tp = intotcpcb(inp);
-       /* tp might get 0 when using socket splicing */
-       if (tp == NULL)
+       if ((error = tcp_sogetpcb(so, &inp, &tp)))
                goto release;
+
        if (so->so_options & SO_DEBUG) {
                otp = tp;
                ostate = tp->t_state;
@@ -739,28 +756,15 @@ int
 tcp_detach(struct socket *so)
 {
        struct inpcb *inp;
-       struct tcpcb *otp = NULL, *tp = NULL;
+       struct tcpcb *otp = NULL, *tp;
        int error = 0;
        short ostate;
 
        soassertlocked(so);
 
-       inp = sotoinpcb(so);
-       /*
-        * When a TCP is attached to a socket, then there will be
-        * a (struct inpcb) pointed at by the socket, and this
-        * structure will point at a subsidiary (struct tcpcb).
-        */
-       if (inp == NULL) {
-               error = so->so_error;
-               if (error == 0)
-                       error = EINVAL;
+       if ((error = tcp_sogetpcb(so, &inp, &tp)))
                return (error);
-       }
-       tp = intotcpcb(inp);
-       /* tp might get 0 when using socket splicing */
-       if (tp == NULL)
-               return (0);
+
        if (so->so_options & SO_DEBUG) {
                otp = tp;
                ostate = tp->t_state;

Reply via email to