On Wed, Jun 15, 2022 at 06:17:07PM -0600, Theo de Raadt wrote:
> Mark Kettenis <[email protected]> wrote:
> 
> > Well, I believe that Scott was trying to fix a race condition that can
> > only happen if code is using kbind(2) incorrectly, i.e. when the
> > threads deliberately pass different cookies to kbind(2) or execute
> > kbind(2) from different "text" addresses.
> > 
> > I still think the solution is simply to accept that race condition.
> 
> Right.
> 
> People are not calling kbind.  They are calling syscall(SYS_kbind
> 
> The man page says "don't do that".  No user serviceable parts inside.
> Do not provide to children.
> 
> That said, Scott is about to share a diff he and I did a few cycles
> around, to at least make the call-in transaction be a lock.

Okay, here it is.

This patch reorganizes the logic in sys_kbind() preceding the
copyin(9) so that we only need to take the kernel lock in a single
spot if something goes wrong and we need to raise SIGILL.

It also puts the per-process mutex, ps_mtx, around that logic.  This
guarantees that the first thread to reach sys_kbind() sets
ps_kbind_addr and ps_kbind_cookie, even in oddball situations where
the program isn't using ld.so(1) correctly.

I am aware that this is "unsupported", but on balance I would prefer
that the initialization of those two variables was atomic.  I think
taking the mutex is a very small price to pay for the guarantee that
the "security check" logic always happens the way it was intended to
happen in the order it was intended to happen.  Note that we are not
wrapping the binding loop up in a lock, just the ps_kbind_* variable
logic.

... if Mark and Philip want to push back on that, I think I would
yield and just drop the mutex.

... also, if Philip goes ahead with the PT_OPENBSD_KBIND thing we no
longer need the mutex because, as I understand it, we would no longer
need the ps_kbind_cookie.

Even in either of those cases I still think the logic refactoring
shown here is a good thing.

--

I have been running with this now for a day and haven't hit any
panics.  The last time I tried unlocking kbind(2) I hit panics pretty
quickly due to KERNEL_ASSERT_LOCKED() calls down in the binding loop.

Because I'm not seeing that I assume this means something has changed
down in UVM and we no longer need to wrap the binding loop with the
kernel lock.

Thoughts?

Index: uvm/uvm_mmap.c
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_mmap.c,v
retrieving revision 1.169
diff -u -p -r1.169 uvm_mmap.c
--- uvm/uvm_mmap.c      19 Jan 2022 10:43:48 -0000      1.169
+++ uvm/uvm_mmap.c      16 Jun 2022 03:36:53 -0000
@@ -1127,7 +1127,7 @@ sys_kbind(struct proc *p, void *v, regis
        size_t psize, s;
        u_long pc;
        int count, i, extra;
-       int error;
+       int error, sigill = 0;
 
        /*
         * extract syscall args from uap
@@ -1135,23 +1135,41 @@ sys_kbind(struct proc *p, void *v, regis
        paramp = SCARG(uap, param);
        psize = SCARG(uap, psize);
 
-       /* a NULL paramp disables the syscall for the process */
-       if (paramp == NULL) {
-               if (pr->ps_kbind_addr != 0)
-                       sigexit(p, SIGILL);
-               pr->ps_kbind_addr = BOGO_PC;
-               return 0;
-       }
-
-       /* security checks */
+       /*
+        * If paramp is NULL and we're uninitialized, disable the syscall
+        * for the process.  Raise SIGILL if we're already initialized.
+        *
+        * If paramp is non-NULL and we're uninitialized, do initialization.
+        * Otherwise, do security checks raise SIGILL on failure.
+        */
        pc = PROC_PC(p);
-       if (pr->ps_kbind_addr == 0) {
+       mtx_enter(&pr->ps_mtx);
+       if (paramp == NULL) {
+               if (pr->ps_kbind_addr == 0)
+                       pr->ps_kbind_addr = BOGO_PC;
+               else
+                       sigill = 1;
+       } else if (pr->ps_kbind_addr == 0) {
                pr->ps_kbind_addr = pc;
                pr->ps_kbind_cookie = SCARG(uap, proc_cookie);
-       } else if (pc != pr->ps_kbind_addr || pc == BOGO_PC)
-               sigexit(p, SIGILL);
-       else if (pr->ps_kbind_cookie != SCARG(uap, proc_cookie))
+       } else if (pc != pr->ps_kbind_addr || pc == BOGO_PC ||
+           pr->ps_kbind_cookie != SCARG(uap, proc_cookie)) {
+               sigill = 1;
+       }
+       mtx_leave(&pr->ps_mtx);
+
+       /* Raise SIGILL if something is off. */
+       if (sigill) {
+               KERNEL_LOCK();
                sigexit(p, SIGILL);
+               /* NOTREACHED */
+               KERNEL_UNLOCK();
+       }
+
+       /* We're done if we were disabling the syscall. */
+       if (paramp == NULL)
+               return 0;
+
        if (psize < sizeof(struct __kbind) || psize > sizeof(param))
                return EINVAL;
        if ((error = copyin(paramp, &param, psize)))
Index: kern/syscalls.master
===================================================================
RCS file: /cvs/src/sys/kern/syscalls.master,v
retrieving revision 1.224
diff -u -p -r1.224 syscalls.master
--- kern/syscalls.master        16 May 2022 07:36:04 -0000      1.224
+++ kern/syscalls.master        16 Jun 2022 03:36:53 -0000
@@ -194,7 +194,7 @@
                            const struct timespec *times, int flag); }
 85     STD             { int sys_futimens(int fd, \
                            const struct timespec *times); }
-86     STD             { int sys_kbind(const struct __kbind *param, \
+86     STD NOLOCK      { int sys_kbind(const struct __kbind *param, \
                            size_t psize, int64_t proc_cookie); }
 87     STD NOLOCK      { int sys_clock_gettime(clockid_t clock_id, \
                            struct timespec *tp); }
Index: kern/init_sysent.c
===================================================================
RCS file: /cvs/src/sys/kern/init_sysent.c,v
retrieving revision 1.237
diff -u -p -r1.237 init_sysent.c
--- kern/init_sysent.c  16 May 2022 07:38:10 -0000      1.237
+++ kern/init_sysent.c  16 Jun 2022 03:36:53 -0000
@@ -1,4 +1,4 @@
-/*     $OpenBSD: init_sysent.c,v 1.237 2022/05/16 07:38:10 mvs Exp $   */
+/*     $OpenBSD$       */
 
 /*
  * System call switch table.
@@ -204,7 +204,7 @@ const struct sysent sysent[] = {
            sys_utimensat },                    /* 84 = utimensat */
        { 2, s(struct sys_futimens_args), 0,
            sys_futimens },                     /* 85 = futimens */
-       { 3, s(struct sys_kbind_args), 0,
+       { 3, s(struct sys_kbind_args), SY_NOLOCK | 0,
            sys_kbind },                        /* 86 = kbind */
        { 2, s(struct sys_clock_gettime_args), SY_NOLOCK | 0,
            sys_clock_gettime },                /* 87 = clock_gettime */

Reply via email to