Re: [patch] push the KERNEL_LOCK deeper on read(2) and write(2)

2019-06-06 Thread Solène Rapenne

Le 2019-06-05 12:06, Mark Kettenis a écrit :

Date: Wed, 5 Jun 2019 10:16:25 +0200
From: Sebastien Marie 

Hi,

I would like to have feedback and testing on this diff. The initial 
work

was done by ian@.


Don't forget to run "make syscalls" in sys/kern when building your own
kernel with this diff!



no problem so far with this diff and the scheduler one.



Re: [patch] push the KERNEL_LOCK deeper on read(2) and write(2)

2019-06-05 Thread Mark Kettenis
> Date: Wed, 5 Jun 2019 10:16:25 +0200
> From: Sebastien Marie 
> 
> Hi,
> 
> I would like to have feedback and testing on this diff. The initial work
> was done by ian@.

Don't forget to run "make syscalls" in sys/kern when building your own
kernel with this diff!

> It unlocks read(2) and write(2) families, and push the KERNEL_LOCK
> deeper in the code path. With it, read(2) and write(2) on socket will be
> KERNEL_LOCK-free.
> 
> read(2) and write(2) are file type agnostics by using `struct fileops'
> (see sys/file.h) which define effective implementation per file type.
> 
> Currently, we have the following fileops defined:
> - dmabuf  dev/pci/drm/drm_linux.c
> - kqueue  kern/kern_event.c
> - pipekern/sys_pipe.c
> - socket  kern/sys_socket.c
> - vnode   kern/vfs_vnops.c
> 
> 
> read(2) family uses dofilereadv(), which calls:
> - fileops::fo_read()
> - FRELE() -> fdrop() -> fileops::fo_close()
> 
> write(2) family uses dofilewritev(), which calls:
> - fileops::fo_write()
> - FRELE() -> fdrop() -> fileops::fo_close()
> 
> 
> The diff pushs the KERNEL_LOCK inside each function, where it is
> effectively needed. But as some file types doesn't need the lock (socket
> for example), it makes read(2) and write(2) lock-free for them, while
> still grabbing the lock for others (vnode for example).

The socket calls will still grab the kernel lock for selwakeup and
sending signals of course.  But some of them will indeed be lock free.

It should be fairly easy to also unlock pipes, but that can come
later.

> fileops::fo_read
> - dmabuf_read fine lock-free (just return ENXIO)
> - kqueue_read fine lock-free (just return ENXIO)
> - pipe_read   KERNEL_LOCK/UNLOCK added (tsleep, wakeup, selwakeup)
> - soo_readcalls soreceive() which is already called without KERNEL_LOCK 
> by recvit() (via recvmsg(2))
> - vn_read KERNEL_LOCK/UNLOCK added
> 
> fileops::fo_write
> - dmabuf_writefine lock-free (just return ENXIO)
> - kqueue_writefine lock-free (just return ENXIO)
> - pipe_write  KERNEL_LOCK/UNLOCK added (tsleep, wakeup, selwakeup)
> - soo_write   calls sosend() which is already called without KERNEL_LOCK by 
> sendit() (via sendmsg(2))
> - vn_writeKERNEL_LOCK/UNLOCK added
> 
> fileops::fo_close
> - dmabuf_closealready take care of it
> - kqueue_closealready take care of it
> - pipe_close  already take care of it
> - soo_close   call soclose() which is already called without KERNEL_LOCK by 
> socketpair(2)
> - vn_closefilealready take care of it
> 
> 
> In dofilewritev(), I take care of calling ptsignal() with the
> KERNEL_LOCK() too, as it requires the lock (it is asserted).
> 
> Others functions should be fine to be called without the KERNEL_LOCK, as
> they are already used in such context.

Looks good to me.  Certainly needs some testing especially on setups
that do a lot of socket communication.  But if testing doesn't reveal
any problems this is ok kettenis@

> Index: kern/sys_generic.c
> ===
> RCS file: /cvs/src/sys/kern/sys_generic.c,v
> retrieving revision 1.123
> diff -u -p -r1.123 sys_generic.c
> --- kern/sys_generic.c21 Jan 2019 23:41:26 -  1.123
> +++ kern/sys_generic.c4 Jun 2019 06:19:23 -
> @@ -366,8 +366,11 @@ dofilewritev(struct proc *p, int fd, str
>   if (uio->uio_resid != cnt && (error == ERESTART ||
>   error == EINTR || error == EWOULDBLOCK))
>   error = 0;
> - if (error == EPIPE)
> + if (error == EPIPE) {
> + KERNEL_LOCK();
>   ptsignal(p, SIGPIPE, STHREAD);
> + KERNEL_UNLOCK();
> + }
>   }
>   cnt -= uio->uio_resid;
>  
> Index: kern/sys_pipe.c
> ===
> RCS file: /cvs/src/sys/kern/sys_pipe.c,v
> retrieving revision 1.87
> diff -u -p -r1.87 sys_pipe.c
> --- kern/sys_pipe.c   13 Nov 2018 13:02:20 -  1.87
> +++ kern/sys_pipe.c   5 Jun 2019 08:06:19 -
> @@ -314,9 +314,11 @@ pipe_read(struct file *fp, struct uio *u
>   int error;
>   size_t size, nread = 0;
>  
> + KERNEL_LOCK();
> +
>   error = pipelock(rpipe);
>   if (error)
> - return (error);
> + goto done;
>  
>   ++rpipe->pipe_busy;
>  
> @@ -420,6 +422,8 @@ unlocked_error:
>   if ((rpipe->pipe_buffer.size - rpipe->pipe_buffer.cnt) >= PIPE_BUF)
>   pipeselwakeup(rpipe);
>  
> +done:
> + KERNEL_UNLOCK();
>   return (error);
>  }
>  
> @@ -430,6 +434,8 @@ pipe_write(struct file *fp, struct uio *
>   size_t orig_resid;
>   struct pipe *wpipe, *rpipe;
>  
> + KERNEL_LOCK();
> +
>   rpipe = fp->f_data;
>   wpipe = rpipe->pipe_peer;
>  
> @@ -437,7 +443,8 @@ pipe_write(struct file *fp, struct uio *
>* detect loss of pipe read side, issue SIGPIPE if 

[patch] push the KERNEL_LOCK deeper on read(2) and write(2)

2019-06-05 Thread Sebastien Marie
Hi,

I would like to have feedback and testing on this diff. The initial work
was done by ian@.

It unlocks read(2) and write(2) families, and push the KERNEL_LOCK
deeper in the code path. With it, read(2) and write(2) on socket will be
KERNEL_LOCK-free.

read(2) and write(2) are file type agnostics by using `struct fileops'
(see sys/file.h) which define effective implementation per file type.

Currently, we have the following fileops defined:
- dmabufdev/pci/drm/drm_linux.c
- kqueuekern/kern_event.c
- pipe  kern/sys_pipe.c
- socketkern/sys_socket.c
- vnode kern/vfs_vnops.c


read(2) family uses dofilereadv(), which calls:
- fileops::fo_read()
- FRELE() -> fdrop() -> fileops::fo_close()

write(2) family uses dofilewritev(), which calls:
- fileops::fo_write()
- FRELE() -> fdrop() -> fileops::fo_close()


The diff pushs the KERNEL_LOCK inside each function, where it is
effectively needed. But as some file types doesn't need the lock (socket
for example), it makes read(2) and write(2) lock-free for them, while
still grabbing the lock for others (vnode for example).

fileops::fo_read
- dmabuf_read   fine lock-free (just return ENXIO)
- kqueue_read   fine lock-free (just return ENXIO)
- pipe_read KERNEL_LOCK/UNLOCK added (tsleep, wakeup, selwakeup)
- soo_read  calls soreceive() which is already called without KERNEL_LOCK 
by recvit() (via recvmsg(2))
- vn_read   KERNEL_LOCK/UNLOCK added

fileops::fo_write
- dmabuf_write  fine lock-free (just return ENXIO)
- kqueue_write  fine lock-free (just return ENXIO)
- pipe_writeKERNEL_LOCK/UNLOCK added (tsleep, wakeup, selwakeup)
- soo_write calls sosend() which is already called without KERNEL_LOCK by 
sendit() (via sendmsg(2))
- vn_write  KERNEL_LOCK/UNLOCK added

fileops::fo_close
- dmabuf_close  already take care of it
- kqueue_close  already take care of it
- pipe_closealready take care of it
- soo_close call soclose() which is already called without KERNEL_LOCK by 
socketpair(2)
- vn_closefile  already take care of it


In dofilewritev(), I take care of calling ptsignal() with the
KERNEL_LOCK() too, as it requires the lock (it is asserted).

Others functions should be fine to be called without the KERNEL_LOCK, as
they are already used in such context.

Thanks.
-- 
Sebastien Marie


Index: kern/sys_generic.c
===
RCS file: /cvs/src/sys/kern/sys_generic.c,v
retrieving revision 1.123
diff -u -p -r1.123 sys_generic.c
--- kern/sys_generic.c  21 Jan 2019 23:41:26 -  1.123
+++ kern/sys_generic.c  4 Jun 2019 06:19:23 -
@@ -366,8 +366,11 @@ dofilewritev(struct proc *p, int fd, str
if (uio->uio_resid != cnt && (error == ERESTART ||
error == EINTR || error == EWOULDBLOCK))
error = 0;
-   if (error == EPIPE)
+   if (error == EPIPE) {
+   KERNEL_LOCK();
ptsignal(p, SIGPIPE, STHREAD);
+   KERNEL_UNLOCK();
+   }
}
cnt -= uio->uio_resid;
 
Index: kern/sys_pipe.c
===
RCS file: /cvs/src/sys/kern/sys_pipe.c,v
retrieving revision 1.87
diff -u -p -r1.87 sys_pipe.c
--- kern/sys_pipe.c 13 Nov 2018 13:02:20 -  1.87
+++ kern/sys_pipe.c 5 Jun 2019 08:06:19 -
@@ -314,9 +314,11 @@ pipe_read(struct file *fp, struct uio *u
int error;
size_t size, nread = 0;
 
+   KERNEL_LOCK();
+
error = pipelock(rpipe);
if (error)
-   return (error);
+   goto done;
 
++rpipe->pipe_busy;
 
@@ -420,6 +422,8 @@ unlocked_error:
if ((rpipe->pipe_buffer.size - rpipe->pipe_buffer.cnt) >= PIPE_BUF)
pipeselwakeup(rpipe);
 
+done:
+   KERNEL_UNLOCK();
return (error);
 }
 
@@ -430,6 +434,8 @@ pipe_write(struct file *fp, struct uio *
size_t orig_resid;
struct pipe *wpipe, *rpipe;
 
+   KERNEL_LOCK();
+
rpipe = fp->f_data;
wpipe = rpipe->pipe_peer;
 
@@ -437,7 +443,8 @@ pipe_write(struct file *fp, struct uio *
 * detect loss of pipe read side, issue SIGPIPE if lost.
 */
if ((wpipe == NULL) || (wpipe->pipe_state & PIPE_EOF)) {
-   return (EPIPE);
+   error = EPIPE;
+   goto done;
}
++wpipe->pipe_busy;
 
@@ -471,7 +478,7 @@ pipe_write(struct file *fp, struct uio *
wpipe->pipe_state &= ~(PIPE_WANT | PIPE_WANTR);
wakeup(wpipe);
}
-   return (error);
+   goto done;
}
 
orig_resid = uio->uio_resid;
@@ -642,6 +649,8 @@ retrywrite:
if (wpipe->pipe_buffer.cnt)
pipeselwakeup(wpipe);
 
+done:
+   KERNEL_UNLOCK();
return (error);
 }
 
Index: kern/syscalls.master