Re: [patch] push the KERNEL_LOCK deeper on read(2) and write(2)
Le 2019-06-05 12:06, Mark Kettenis a écrit : Date: Wed, 5 Jun 2019 10:16:25 +0200 From: Sebastien Marie Hi, I would like to have feedback and testing on this diff. The initial work was done by ian@. Don't forget to run "make syscalls" in sys/kern when building your own kernel with this diff! no problem so far with this diff and the scheduler one.
Re: [patch] push the KERNEL_LOCK deeper on read(2) and write(2)
> Date: Wed, 5 Jun 2019 10:16:25 +0200 > From: Sebastien Marie > > Hi, > > I would like to have feedback and testing on this diff. The initial work > was done by ian@. Don't forget to run "make syscalls" in sys/kern when building your own kernel with this diff! > It unlocks read(2) and write(2) families, and push the KERNEL_LOCK > deeper in the code path. With it, read(2) and write(2) on socket will be > KERNEL_LOCK-free. > > read(2) and write(2) are file type agnostics by using `struct fileops' > (see sys/file.h) which define effective implementation per file type. > > Currently, we have the following fileops defined: > - dmabuf dev/pci/drm/drm_linux.c > - kqueue kern/kern_event.c > - pipekern/sys_pipe.c > - socket kern/sys_socket.c > - vnode kern/vfs_vnops.c > > > read(2) family uses dofilereadv(), which calls: > - fileops::fo_read() > - FRELE() -> fdrop() -> fileops::fo_close() > > write(2) family uses dofilewritev(), which calls: > - fileops::fo_write() > - FRELE() -> fdrop() -> fileops::fo_close() > > > The diff pushs the KERNEL_LOCK inside each function, where it is > effectively needed. But as some file types doesn't need the lock (socket > for example), it makes read(2) and write(2) lock-free for them, while > still grabbing the lock for others (vnode for example). The socket calls will still grab the kernel lock for selwakeup and sending signals of course. But some of them will indeed be lock free. It should be fairly easy to also unlock pipes, but that can come later. > fileops::fo_read > - dmabuf_read fine lock-free (just return ENXIO) > - kqueue_read fine lock-free (just return ENXIO) > - pipe_read KERNEL_LOCK/UNLOCK added (tsleep, wakeup, selwakeup) > - soo_readcalls soreceive() which is already called without KERNEL_LOCK > by recvit() (via recvmsg(2)) > - vn_read KERNEL_LOCK/UNLOCK added > > fileops::fo_write > - dmabuf_writefine lock-free (just return ENXIO) > - kqueue_writefine lock-free (just return ENXIO) > - pipe_write KERNEL_LOCK/UNLOCK added (tsleep, wakeup, selwakeup) > - soo_write calls sosend() which is already called without KERNEL_LOCK by > sendit() (via sendmsg(2)) > - vn_writeKERNEL_LOCK/UNLOCK added > > fileops::fo_close > - dmabuf_closealready take care of it > - kqueue_closealready take care of it > - pipe_close already take care of it > - soo_close call soclose() which is already called without KERNEL_LOCK by > socketpair(2) > - vn_closefilealready take care of it > > > In dofilewritev(), I take care of calling ptsignal() with the > KERNEL_LOCK() too, as it requires the lock (it is asserted). > > Others functions should be fine to be called without the KERNEL_LOCK, as > they are already used in such context. Looks good to me. Certainly needs some testing especially on setups that do a lot of socket communication. But if testing doesn't reveal any problems this is ok kettenis@ > Index: kern/sys_generic.c > === > RCS file: /cvs/src/sys/kern/sys_generic.c,v > retrieving revision 1.123 > diff -u -p -r1.123 sys_generic.c > --- kern/sys_generic.c21 Jan 2019 23:41:26 - 1.123 > +++ kern/sys_generic.c4 Jun 2019 06:19:23 - > @@ -366,8 +366,11 @@ dofilewritev(struct proc *p, int fd, str > if (uio->uio_resid != cnt && (error == ERESTART || > error == EINTR || error == EWOULDBLOCK)) > error = 0; > - if (error == EPIPE) > + if (error == EPIPE) { > + KERNEL_LOCK(); > ptsignal(p, SIGPIPE, STHREAD); > + KERNEL_UNLOCK(); > + } > } > cnt -= uio->uio_resid; > > Index: kern/sys_pipe.c > === > RCS file: /cvs/src/sys/kern/sys_pipe.c,v > retrieving revision 1.87 > diff -u -p -r1.87 sys_pipe.c > --- kern/sys_pipe.c 13 Nov 2018 13:02:20 - 1.87 > +++ kern/sys_pipe.c 5 Jun 2019 08:06:19 - > @@ -314,9 +314,11 @@ pipe_read(struct file *fp, struct uio *u > int error; > size_t size, nread = 0; > > + KERNEL_LOCK(); > + > error = pipelock(rpipe); > if (error) > - return (error); > + goto done; > > ++rpipe->pipe_busy; > > @@ -420,6 +422,8 @@ unlocked_error: > if ((rpipe->pipe_buffer.size - rpipe->pipe_buffer.cnt) >= PIPE_BUF) > pipeselwakeup(rpipe); > > +done: > + KERNEL_UNLOCK(); > return (error); > } > > @@ -430,6 +434,8 @@ pipe_write(struct file *fp, struct uio * > size_t orig_resid; > struct pipe *wpipe, *rpipe; > > + KERNEL_LOCK(); > + > rpipe = fp->f_data; > wpipe = rpipe->pipe_peer; > > @@ -437,7 +443,8 @@ pipe_write(struct file *fp, struct uio * >* detect loss of pipe read side, issue SIGPIPE if
[patch] push the KERNEL_LOCK deeper on read(2) and write(2)
Hi, I would like to have feedback and testing on this diff. The initial work was done by ian@. It unlocks read(2) and write(2) families, and push the KERNEL_LOCK deeper in the code path. With it, read(2) and write(2) on socket will be KERNEL_LOCK-free. read(2) and write(2) are file type agnostics by using `struct fileops' (see sys/file.h) which define effective implementation per file type. Currently, we have the following fileops defined: - dmabufdev/pci/drm/drm_linux.c - kqueuekern/kern_event.c - pipe kern/sys_pipe.c - socketkern/sys_socket.c - vnode kern/vfs_vnops.c read(2) family uses dofilereadv(), which calls: - fileops::fo_read() - FRELE() -> fdrop() -> fileops::fo_close() write(2) family uses dofilewritev(), which calls: - fileops::fo_write() - FRELE() -> fdrop() -> fileops::fo_close() The diff pushs the KERNEL_LOCK inside each function, where it is effectively needed. But as some file types doesn't need the lock (socket for example), it makes read(2) and write(2) lock-free for them, while still grabbing the lock for others (vnode for example). fileops::fo_read - dmabuf_read fine lock-free (just return ENXIO) - kqueue_read fine lock-free (just return ENXIO) - pipe_read KERNEL_LOCK/UNLOCK added (tsleep, wakeup, selwakeup) - soo_read calls soreceive() which is already called without KERNEL_LOCK by recvit() (via recvmsg(2)) - vn_read KERNEL_LOCK/UNLOCK added fileops::fo_write - dmabuf_write fine lock-free (just return ENXIO) - kqueue_write fine lock-free (just return ENXIO) - pipe_writeKERNEL_LOCK/UNLOCK added (tsleep, wakeup, selwakeup) - soo_write calls sosend() which is already called without KERNEL_LOCK by sendit() (via sendmsg(2)) - vn_write KERNEL_LOCK/UNLOCK added fileops::fo_close - dmabuf_close already take care of it - kqueue_close already take care of it - pipe_closealready take care of it - soo_close call soclose() which is already called without KERNEL_LOCK by socketpair(2) - vn_closefile already take care of it In dofilewritev(), I take care of calling ptsignal() with the KERNEL_LOCK() too, as it requires the lock (it is asserted). Others functions should be fine to be called without the KERNEL_LOCK, as they are already used in such context. Thanks. -- Sebastien Marie Index: kern/sys_generic.c === RCS file: /cvs/src/sys/kern/sys_generic.c,v retrieving revision 1.123 diff -u -p -r1.123 sys_generic.c --- kern/sys_generic.c 21 Jan 2019 23:41:26 - 1.123 +++ kern/sys_generic.c 4 Jun 2019 06:19:23 - @@ -366,8 +366,11 @@ dofilewritev(struct proc *p, int fd, str if (uio->uio_resid != cnt && (error == ERESTART || error == EINTR || error == EWOULDBLOCK)) error = 0; - if (error == EPIPE) + if (error == EPIPE) { + KERNEL_LOCK(); ptsignal(p, SIGPIPE, STHREAD); + KERNEL_UNLOCK(); + } } cnt -= uio->uio_resid; Index: kern/sys_pipe.c === RCS file: /cvs/src/sys/kern/sys_pipe.c,v retrieving revision 1.87 diff -u -p -r1.87 sys_pipe.c --- kern/sys_pipe.c 13 Nov 2018 13:02:20 - 1.87 +++ kern/sys_pipe.c 5 Jun 2019 08:06:19 - @@ -314,9 +314,11 @@ pipe_read(struct file *fp, struct uio *u int error; size_t size, nread = 0; + KERNEL_LOCK(); + error = pipelock(rpipe); if (error) - return (error); + goto done; ++rpipe->pipe_busy; @@ -420,6 +422,8 @@ unlocked_error: if ((rpipe->pipe_buffer.size - rpipe->pipe_buffer.cnt) >= PIPE_BUF) pipeselwakeup(rpipe); +done: + KERNEL_UNLOCK(); return (error); } @@ -430,6 +434,8 @@ pipe_write(struct file *fp, struct uio * size_t orig_resid; struct pipe *wpipe, *rpipe; + KERNEL_LOCK(); + rpipe = fp->f_data; wpipe = rpipe->pipe_peer; @@ -437,7 +443,8 @@ pipe_write(struct file *fp, struct uio * * detect loss of pipe read side, issue SIGPIPE if lost. */ if ((wpipe == NULL) || (wpipe->pipe_state & PIPE_EOF)) { - return (EPIPE); + error = EPIPE; + goto done; } ++wpipe->pipe_busy; @@ -471,7 +478,7 @@ pipe_write(struct file *fp, struct uio * wpipe->pipe_state &= ~(PIPE_WANT | PIPE_WANTR); wakeup(wpipe); } - return (error); + goto done; } orig_resid = uio->uio_resid; @@ -642,6 +649,8 @@ retrywrite: if (wpipe->pipe_buffer.cnt) pipeselwakeup(wpipe); +done: + KERNEL_UNLOCK(); return (error); } Index: kern/syscalls.master