> Date: Wed, 20 Jun 2018 15:17:27 +0200 > From: Martin Pieuchot <m...@openbsd.org>
Joel, the diff below changes the semantics of the DIOCMAP ioctl a bit. See the bit about DIOCMAP below. I think this approach is better. But maybe I'm missing someting... Martin, I like fnew(). I can use that in the prime/DRI3 stuff as well. I'm still not convinced that inserting the struct file into the list late is the right choice. Yes, inserting it early (in fnew()) means that tools like fstat may see a struct file that is in an inconsistent state. But I'm not sure we can guarantee that anyway. That would allow you to atomically replace the pointer in the file descriptor table and be done. Actually I think you could already simplify the code and skip the "Zap old file" bit. Just do a fdinsert(fdp, fd, 0, fp) which will retain the flags. Now if you really prefer to keep inserting late, could we change fdinsert to only insert if FIF_INSERTED isn't set already? That way we can use it to insert the same open file into multiple file descriptor tables. I can include such a change in my prime/DRI3 diff. Cheers. Mark > On 19/06/18(Tue) 16:40, Mark Kettenis wrote: > > > Date: Tue, 19 Jun 2018 15:45:48 +0200 > > > From: Martin Pieuchot <m...@openbsd.org> > > > On 19/06/18(Tue) 15:08, Mark Kettenis wrote: > > > > [...] > > > > I think this is the wrong approach. Instead of modifying the struct > > > > file, this code should create a new struct file. Then replace the > > > > pointer in the file descriptor table with that new file and then drop > > > > the reference to the old struct file. That way f_data *is* immutable. > > > > > > I tried that first... well it didn't work for me. > > > > > > The problem is that once you swapped `fp' you still need to free the old > > > one. But you don't want to close the associated file, so you end up > > > NULLing `f_data' before calling closef()... > > > > You simply call FRELE() on the old struct file. That should properly > > close the file once the last reference goes. Obviously you'd remove > > the code that closes the original vnode from diskmapioctl(). I don't > > see how closef() would be called in that scenario, since you're not > > releasing the file descriptor. > > We need to close the original file, which means we have to decrement the > refcounter twice. So using closef() is the way to go, we're using the > same semantic as in other file system syscalls. > > Note however that the last reference on the original file will in most > cases be released when returning from diskmapioctl(). > > > > On top of that this approach introduces a new error condition if we > > > reach `maxfiles', yes this is a degenerated case but I'd prefer not > > > have to deal with it. > > > > True. And it changes the semantics of the DIOCMAP ioctl. Currently, > > if you call DIOCMAP on a dup'ed file descriptor it would change the > > open file associated with all of them. With my suggestion, it will > > only change the open file for the file descriptor you're operating on. > > The other file descriptor would continue to refer to the original open > > file (typically the diskmap device, but not necessarily). I'd argue > > that is actually how this ioctl should behave. > > > > In fact the current behaviour is a bit of a security risk. I could > > open a file, pass the descriptor to another process and then call > > DIOCMAP and suddenly the other process has a ilfe descriptor that > > refers to a disk device instead of a normal file. > > Diff below does that. It also moves the fdplock() after vn_open(9) > to not interleave it with the inode lock nor hold it while we recurse > it other subsystems. > > Index: dev/diskmap.c > =================================================================== > RCS file: /cvs/src/sys/dev/diskmap.c,v > retrieving revision 1.20 > diff -u -p -r1.20 diskmap.c > --- dev/diskmap.c 9 May 2018 08:42:02 -0000 1.20 > +++ dev/diskmap.c 20 Jun 2018 13:14:06 -0000 > @@ -57,9 +57,9 @@ diskmapioctl(dev_t dev, u_long cmd, cadd > struct dk_diskmap *dm; > struct nameidata ndp; > struct filedesc *fdp = p->p_fd; > - struct file *fp = NULL; > - struct vnode *vp = NULL, *ovp; > - char *devname; > + struct file *fp0 = NULL, *fp = NULL; > + struct vnode *vp = NULL; > + char *devname, flags; > int fd, error = EINVAL; > > if (cmd != DIOCMAP) > @@ -80,57 +80,59 @@ diskmapioctl(dev_t dev, u_long cmd, cadd > goto invalid; > > /* Attempt to open actual device. */ > - if ((error = getvnode(p, fd, &fp)) != 0) > + if ((error = getvnode(p, fd, &fp0)) != 0) > goto invalid; > > - KASSERT(fp->f_type == DTYPE_VNODE); > - KASSERT(fp->f_ops == &vnops); > - > - fdplock(fdp); > - > NDINIT(&ndp, 0, 0, UIO_SYSSPACE, devname, p); > ndp.ni_pledge = PLEDGE_RPATH; > - if ((error = vn_open(&ndp, fp->f_flag, 0)) != 0) > + if ((error = vn_open(&ndp, fp0->f_flag, 0)) != 0) > goto bad; > > vp = ndp.ni_vp; > + VOP_UNLOCK(vp); > > - /* Close the original vnode. */ > - ovp = (struct vnode *)fp->f_data; > - if (fp->f_flag & FWRITE) > - ovp->v_writecount--; > - > - if (ovp->v_writecount == 0) { > - vn_lock(ovp, LK_EXCLUSIVE | LK_RETRY); > - VOP_CLOSE(ovp, fp->f_flag, p->p_ucred, p); > - vput(ovp); > + fdplock(fdp); > + /* We lost a race, don't go any further. */ > + if (fdp->fd_ofiles[fd] != fp0) { > + error = EAGAIN; > + goto bad; > } > > - fp->f_data = (caddr_t)vp; > - fp->f_offset = 0; > - mtx_enter(&fp->f_mtx); > - fp->f_rxfer = 0; > - fp->f_wxfer = 0; > - fp->f_seek = 0; > - fp->f_rbytes = 0; > - fp->f_wbytes = 0; > - mtx_leave(&fp->f_mtx); > - > - VOP_UNLOCK(vp); > + fp = fnew(p); > + if (fp == NULL) { > + error = ENFILE; > + goto bad; > + } > > - FRELE(fp, p); > + /* Zap old file. */ > + mtx_enter(&fhdlk); > + KASSERT(fdp->fd_ofiles[fd] == fp0); > + flags = fdp->fd_ofileflags[fd]; > + fdp->fd_ofiles[fd] = NULL; > + fdp->fd_ofileflags[fd] = 0; > + mtx_leave(&fhdlk); > + > + /* Insert new file. */ > + fp->f_flag = fp0->f_flag; > + fp->f_type = DTYPE_VNODE; > + fp->f_ops = &vnops; > + fp->f_data = (caddr_t)vp; > + fdinsert(fdp, fd, flags, fp); > fdpunlock(fdp); > + FRELE(fp, p); > + > + closef(fp0, p); > free(devname, M_DEVBUF, PATH_MAX); > > return 0; > > bad: > - if (vp) > - vput(vp); > - if (fp) > - FRELE(fp, p); > - > fdpunlock(fdp); > + > + if (vp) > + vrele(vp); > + if (fp0) > + FRELE(fp0, p); > > invalid: > free(devname, M_DEVBUF, PATH_MAX); > Index: kern/kern_descrip.c > =================================================================== > RCS file: /cvs/src/sys/kern/kern_descrip.c,v > retrieving revision 1.167 > diff -u -p -r1.167 kern_descrip.c > --- kern/kern_descrip.c 20 Jun 2018 10:52:49 -0000 1.167 > +++ kern/kern_descrip.c 20 Jun 2018 11:48:36 -0000 > @@ -971,11 +971,29 @@ restart: > } > return (error); > } > - if (numfiles >= maxfiles) { > + > + fp = fnew(p); > + if (fp == NULL) { > fd_unused(p->p_fd, i); > - tablefull("file"); > return (ENFILE); > } > + > + *resultfp = fp; > + *resultfd = i; > + > + return (0); > +} > + > +struct file * > +fnew(struct proc *p) > +{ > + struct file *fp; > + > + if (numfiles >= maxfiles) { > + tablefull("file"); > + return (NULL); > + } > + > /* > * Allocate a new file descriptor. > * If the process has file descriptor zero open, add to the list > @@ -992,14 +1010,12 @@ restart: > fp->f_count = 1; > fp->f_cred = p->p_ucred; > crhold(fp->f_cred); > - *resultfp = fp; > - *resultfd = i; > > mtx_enter(&fhdlk); > fp->f_count++; > mtx_leave(&fhdlk); > > - return (0); > + return (fp); > } > > /* > Index: sys/filedesc.h > =================================================================== > RCS file: /cvs/src/sys/sys/filedesc.h,v > retrieving revision 1.39 > diff -u -p -r1.39 filedesc.h > --- sys/filedesc.h 18 Jun 2018 09:15:05 -0000 1.39 > +++ sys/filedesc.h 20 Jun 2018 11:49:07 -0000 > @@ -120,6 +120,7 @@ void filedesc_init(void); > int dupfdopen(struct proc *, int, int); > int fdalloc(struct proc *p, int want, int *result); > void fdexpand(struct proc *); > +struct file *fnew(struct proc *_p); > int falloc(struct proc *_p, struct file **_rfp, int *_rfd); > struct filedesc *fdinit(void); > struct filedesc *fdshare(struct process *); > >