On Thu, Jun 30, 2022 at 03:21:40PM +0300, Vitaliy Makkoveev wrote:
> On Thu, Jun 30, 2022 at 11:56:55AM +0200, Claudio Jeker wrote:
> > On Thu, Jun 30, 2022 at 12:34:33PM +0300, Vitaliy Makkoveev wrote:
> > > On Thu, Jun 30, 2022 at 11:08:48AM +0200, Claudio Jeker wrote:
> > > > This diff converts the SRP list to a SMR list in rtsock.c
> > > > SRP is a bit strange with how it works and the SMR code is a bit easier 
> > > > to
> > > > understand. Since we can sleep in the SMR_TAILQ_FOREACH() we need to 
> > > > grab
> > > > a refcount on the route pcb so that we can leave the SMR critical 
> > > > section
> > > > and then enter the SMR critical section at the end of the loop before
> > > > dropping the refcount again.
> > > > 
> > > > The diff does not immeditaly explode but I doubt we can exploit
> > > > parallelism in route_input() so this may fail at some later stage if it 
> > > > is
> > > > wrong.
> > > > 
> > > > Comments from the lock critics welcome
> > > 
> > > We use `so_lock' rwlock(9) to protect route domain sockets. We can't
> > > convert this SRP list to SMR list because we call solock() within
> > > foreach loop.
> > 
> > because of the so_lock the code uses a refcnt on the route pcb to make
> > sure that the object is not freed while we sleep. So that is handled by
> > this diff.
> >  
> > > We can easily crash kernel by running in parallel some "route monitor"
> > > commands and "while true; ifconfig vether0 create ; ifconfig vether0
> > > destroy; done".
> > 
> > That does not cause problem on my system.
> >  
> 
> Sorry, I missed you leave SMR section before solock() call:
> 
> > > > @@ -519,7 +498,8 @@ route_input(struct mbuf *m0, struct sock
> > > >                     rop->rop_proto != sa_family)
> > > >                         continue;
> > > >  
> > > > -
> > > > +               refcnt_take(&rop->rop_refcnt);
> > > > +               smr_read_leave();
> > > >                 so = rop->rop_socket;
> > > >                 solock(so);
> > > >  
> 
> My system is stable with the second version of this diff and the
> following test.

Cool but I fear that the kernel still synchronizes somewhere and we don't
really get full concurrency here. So I'm not sure if this is able to
trigger a reference bug.
 
> #include <sys/ioctl.h>
> #include <sys/select.h>
> #include <sys/socket.h>
> #include <net/if.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <err.h>
> #include <errno.h>
> #include <pthread.h>
> #include <string.h>
> #include <unistd.h>
> 
> static struct ifreq ifr;
> 
> static void *clone(void *arg)
> {
>       int s;
> 
>       if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
>               err(1, "socket");
>       while (1) {
>               if (ioctl(s, SIOCIFCREATE, &ifr) < 0)
>                       if (errno == EINVAL)
>                               err(1, "ioctl");
>               if(ioctl(s, SIOCIFDESTROY, &ifr)<0)
>                       if(errno == EINVAL)
>                               err(1, "ioctl");
>       }
> 
>       return NULL;
> }
> 
> static void *rtsock(void *arg)
> {
>       int s;
> 
>       while (1){
>               if ((s = socket(AF_ROUTE, SOCK_RAW, 0)) < 0)
>                       err(1, "socket");
>               close(s);
>       }
> 
>       return NULL;
> }
> 
> int main(int argc, char *argv[])
> {
>       pthread_t thr;
>       int i;
> 
>       if( argc != 2) {
>               fprintf(stderr, "usage: %s ifname\n", getprogname());
>               return 1;
>       }
> 
>       if (getuid() != 0) {
>               fprintf(stderr, "should be root\n");
>               return 1;
>       }
> 
>       memset(&ifr, 0, sizeof(ifr));
>       strlcpy(ifr.ifr_name, argv[1], sizeof(ifr.ifr_name));
> 
>       for(i = 0; i < 8 * 4; ++i) {
>               if (pthread_create(&thr, NULL, clone, NULL)!=0)
>                       errx(1, "pthread_create");
>       }
> 
>       for (i = 0; i < 8 * 4; ++i) {
>               if (pthread_create(&thr, NULL, rtsock, NULL) != 0)
>                       errx(1, "pthread_create");
>       }
> 
>       select(0, NULL, NULL, NULL, NULL);
> 
>       return 0;
> }
> 

-- 
:wq Claudio

Reply via email to