> Date: Tue, 19 Jun 2018 15:45:48 +0200 > From: Martin Pieuchot <m...@openbsd.org> > > On 19/06/18(Tue) 15:08, Mark Kettenis wrote: > > > Date: Tue, 19 Jun 2018 12:51:35 +0200 > > > From: Martin Pieuchot <m...@openbsd.org> > > > > > > There's one place in our kernel where `f_data' is overwritten while the > > > descriptor sits in multiple shared data structures. It is in diskmap. > > > > > > We want to avoid this situation to be able to treat f_data as immutable. > > > > > > So the diff below remove `fp' from the shared data structures before it > > > gets modified. It is not a complete solution to the reference grabbing > > > problem of vnode. There's still a window where a race is possible > > > between the moment where we release the mutex and increase the vnode's > > > reference. The KERNEL_LOCK() is what protects us for now. However this > > > problem is the next one to solve to be able to unlock syscalls messing > > > with vnodes. > > > > > > Still this diff is a step towards that. > > > > > > Note that this change only apply to vnodes, so it doesn't matter for > > > socket-related syscalls. > > > > > > Ok? > > > > I think this is the wrong approach. Instead of modifying the struct > > file, this code should create a new struct file. Then replace the > > pointer in the file descriptor table with that new file and then drop > > the reference to the old struct file. That way f_data *is* immutable. > > I tried that first... well it didn't work for me. > > The problem is that once you swapped `fp' you still need to free the old > one. But you don't want to close the associated file, so you end up > NULLing `f_data' before calling closef()...
You simply call FRELE() on the old struct file. That should properly close the file once the last reference goes. Obviously you'd remove the code that closes the original vnode from diskmapioctl(). I don't see how closef() would be called in that scenario, since you're not releasing the file descriptor. > On top of that this approach introduces a new error condition if we > reach `maxfiles', yes this is a degenerated case but I'd prefer not > have to deal with it. True. And it changes the semantics of the DIOCMAP ioctl. Currently, if you call DIOCMAP on a dup'ed file descriptor it would change the open file associated with all of them. With my suggestion, it will only change the open file for the file descriptor you're operating on. The other file descriptor would continue to refer to the original open file (typically the diskmap device, but not necessarily). I'd argue that is actually how this ioctl should behave. In fact the current behaviour is a bit of a security risk. I could open a file, pass the descriptor to another process and then call DIOCMAP and suddenly the other process has a ilfe descriptor that refers to a disk device instead of a normal file. > The approach below returns `fp' into the "larval" or "reserved" state, > change it, then insert it again. > > > > > > Index: dev/diskmap.c > > > =================================================================== > > > RCS file: /cvs/src/sys/dev/diskmap.c,v > > > retrieving revision 1.20 > > > diff -u -p -r1.20 diskmap.c > > > --- dev/diskmap.c 9 May 2018 08:42:02 -0000 1.20 > > > +++ dev/diskmap.c 19 Jun 2018 10:38:37 -0000 > > > @@ -59,7 +59,7 @@ diskmapioctl(dev_t dev, u_long cmd, cadd > > > struct filedesc *fdp = p->p_fd; > > > struct file *fp = NULL; > > > struct vnode *vp = NULL, *ovp; > > > - char *devname; > > > + char *devname, flags; > > > int fd, error = EINVAL; > > > > > > if (cmd != DIOCMAP) > > > @@ -106,6 +106,15 @@ diskmapioctl(dev_t dev, u_long cmd, cadd > > > vput(ovp); > > > } > > > > > > + /* Remove the file while we're modifying it to prevent races. */ > > > + flags = fdp->fd_ofileflags[fd]; > > > + mtx_enter(&fhdlk); > > > + fdp->fd_ofiles[fd] = NULL; > > > + fdp->fd_ofileflags[fd] = 0; > > > + if (fp->f_iflags & FIF_INSERTED) > > > + LIST_REMOVE(fp, f_list); > > > + mtx_leave(&fhdlk); > > > + > > > fp->f_data = (caddr_t)vp; > > > fp->f_offset = 0; > > > mtx_enter(&fp->f_mtx); > > > @@ -115,6 +124,9 @@ diskmapioctl(dev_t dev, u_long cmd, cadd > > > fp->f_rbytes = 0; > > > fp->f_wbytes = 0; > > > mtx_leave(&fp->f_mtx); > > > + > > > + /* Insert it back where it was. */ > > > + fdinsert(fdp, fd, flags, fp); > > > > > > VOP_UNLOCK(vp); > > > > > > > > > > >