Re: mmap: Do not push KERNEL_LOCK() too far

2020-10-05 Thread Mark Kettenis
> Date: Mon, 5 Oct 2020 11:25:39 +0200
> From: Martin Pieuchot 
> 
> On 03/10/20(Sat) 12:59, Mark Kettenis wrote:
> > > Date: Fri, 2 Oct 2020 10:32:27 +0200
> > > From: Martin Pieuchot 
> > > 
> > > On 01/10/20(Thu) 21:44, Mark Kettenis wrote:
> > > > > Date: Thu, 1 Oct 2020 14:10:56 +0200
> > > > > From: Martin Pieuchot 
> > > > > 
> > > > > While studying a bug report from naddy@ in 2017 when testing 
> > > > > guenther@'s
> > > > > amap/anon locking diff I figured out that we have been too optimistic 
> > > > > in
> > > > > the !MAP_ANON case.
> > > > > 
> > > > > The reported panic involves, I'd guess, a race between fd_getfile() 
> > > > > and
> > > > > vref():
> > > > > 
> > > > >   panic: vref used where vget required
> > > > >   db_enter() at db_enter+0x5
> > > > >   panic() at panic+0x129
> > > > >   vref(ff03b20d29e8) at vref+0x5d
> > > > >   uvn_attach(101,ff03a5879dc0) at uvn_attach+0x11d
> > > > >   uvm_mmapfile(7,ff03a5879dc0,2,1,13,10012) at 
> > > > > uvm_mmapfile+0x12c
> > > > >   sys_mmap(c50,8000225f82a0,1) at sys_mmap+0x604
> > > > >   syscall() at syscall+0x279
> > > > >   --- syscall (number 198) ---
> > > > >   end of kernel
> > > > > 
> > > > > Removing the KERNEL_LOCK() from file mapping was out of the scope of 
> > > > > this
> > > > > previous work, so I'd like to go back to a single KERNEL_LOCK/UNLOCK 
> > > > > dance
> > > > > in this code path to remove any false positive.
> > > > > 
> > > > > Note that this code is currently always run under KERNEL_LOCK() so 
> > > > > this
> > > > > will only have effect once the syscall will be unlocked.
> > > > > 
> > > > > ok?
> > > > 
> > > > Hmm, I thought fd_getfile() was fully mpsafe.
> > > 
> > > It is to get a reference on `fp'.  However if the current thread
> > > releases the KERNEL_LOCK() before calling vref(9) it might lose a
> > > race.
> > 
> > I don't see the race.  The function returns a 'fp' with a reference,
> > so 'fp' will be valid regardless of whether we hold the kernel lock or
> > not.  So we should be able to take the kernel lock after the
> > fd_getfile() call isn't it?
> 
> Should we?  I'd assume we can't unless somebody can explain the
> contrary.  My point is the following: by releasing the KERNEL_LOCK() we
> allow other parts of the kernel: syscalls and fault handlers to mess
> with this vnode.
> 
> > > > But I suppose the kernel lock needs to be grabbed before we start
> > > > looking at the vnode?
> > > 
> > > Yes, or to say it otherwise not released.
> > 
> > So the problem is that while we have an 'fp', its f_data member points
> > to a vnode that has already been put on the freelist and therefore has
> > v_usecount set to zero?  How does that happen?
> 
> I don't know.  I'm trying to be conservative to be able to concentrate
> on amaps & anons.  I'd rather keep all the rest under a single
> KERNEL_LOCK().
> 
> Hopefully this can be revisited soon.

Fair enough.  ok kettenis@

> > > > Your diff makes the locking a bit convoluted, but I suppose adding a
> > > > KERNEL_UNLOCK() before every "goto out" is worse?
> > > 
> > > I tried to keep the diff as small as possible to not obfuscate the change.
> > > If we want cleaner code we can move the !ANON case in a different 
> > > function.
> > 
> > Splitting would be hard because of the "goto is_anon".
> > 
> > > > > Index: uvm/uvm_mmap.c
> > > > > ===
> > > > > RCS file: /cvs/src/sys/uvm/uvm_mmap.c,v
> > > > > retrieving revision 1.161
> > > > > diff -u -p -r1.161 uvm_mmap.c
> > > > > --- uvm/uvm_mmap.c4 Mar 2020 21:15:39 -   1.161
> > > > > +++ uvm/uvm_mmap.c28 Sep 2020 09:48:26 -
> > > > > @@ -288,8 +288,11 @@ sys_mmap(struct proc *p, void *v, regist
> > > > >  
> > > > >   /* check for file mappings (i.e. not anonymous) and verify 
> > > > > file. */
> > > > >   if ((flags & MAP_ANON) == 0) {
> > > > > - if ((fp = fd_getfile(fdp, fd)) == NULL)
> > > > > - return (EBADF);
> > > > > + KERNEL_LOCK();
> > > > > + if ((fp = fd_getfile(fdp, fd)) == NULL) {
> > > > > + error = EBADF;
> > > > > + goto out;
> > > > > + }
> > > > >  
> > > > >   if (fp->f_type != DTYPE_VNODE) {
> > > > >   error = ENODEV; /* only mmap vnodes! */
> > > > > @@ -313,6 +316,7 @@ sys_mmap(struct proc *p, void *v, regist
> > > > >   flags |= MAP_ANON;
> > > > >   FRELE(fp, p);
> > > > >   fp = NULL;
> > > > > + KERNEL_UNLOCK();
> > > > >   goto is_anon;
> > > > >   }
> > > > >  
> > > > > @@ -362,9 +366,7 @@ sys_mmap(struct proc *p, void *v, regist
> > > > >* EPERM.
> > > > >*/
> > > > >   if (fp->f_flag & FWRITE) {
> > > > > - KE

Re: mmap: Do not push KERNEL_LOCK() too far

2020-10-05 Thread Martin Pieuchot
On 03/10/20(Sat) 12:59, Mark Kettenis wrote:
> > Date: Fri, 2 Oct 2020 10:32:27 +0200
> > From: Martin Pieuchot 
> > 
> > On 01/10/20(Thu) 21:44, Mark Kettenis wrote:
> > > > Date: Thu, 1 Oct 2020 14:10:56 +0200
> > > > From: Martin Pieuchot 
> > > > 
> > > > While studying a bug report from naddy@ in 2017 when testing guenther@'s
> > > > amap/anon locking diff I figured out that we have been too optimistic in
> > > > the !MAP_ANON case.
> > > > 
> > > > The reported panic involves, I'd guess, a race between fd_getfile() and
> > > > vref():
> > > > 
> > > >   panic: vref used where vget required
> > > >   db_enter() at db_enter+0x5
> > > >   panic() at panic+0x129
> > > >   vref(ff03b20d29e8) at vref+0x5d
> > > >   uvn_attach(101,ff03a5879dc0) at uvn_attach+0x11d
> > > >   uvm_mmapfile(7,ff03a5879dc0,2,1,13,10012) at 
> > > > uvm_mmapfile+0x12c
> > > >   sys_mmap(c50,8000225f82a0,1) at sys_mmap+0x604
> > > >   syscall() at syscall+0x279
> > > >   --- syscall (number 198) ---
> > > >   end of kernel
> > > > 
> > > > Removing the KERNEL_LOCK() from file mapping was out of the scope of 
> > > > this
> > > > previous work, so I'd like to go back to a single KERNEL_LOCK/UNLOCK 
> > > > dance
> > > > in this code path to remove any false positive.
> > > > 
> > > > Note that this code is currently always run under KERNEL_LOCK() so this
> > > > will only have effect once the syscall will be unlocked.
> > > > 
> > > > ok?
> > > 
> > > Hmm, I thought fd_getfile() was fully mpsafe.
> > 
> > It is to get a reference on `fp'.  However if the current thread
> > releases the KERNEL_LOCK() before calling vref(9) it might lose a
> > race.
> 
> I don't see the race.  The function returns a 'fp' with a reference,
> so 'fp' will be valid regardless of whether we hold the kernel lock or
> not.  So we should be able to take the kernel lock after the
> fd_getfile() call isn't it?

Should we?  I'd assume we can't unless somebody can explain the
contrary.  My point is the following: by releasing the KERNEL_LOCK() we
allow other parts of the kernel: syscalls and fault handlers to mess
with this vnode.

> > > But I suppose the kernel lock needs to be grabbed before we start
> > > looking at the vnode?
> > 
> > Yes, or to say it otherwise not released.
> 
> So the problem is that while we have an 'fp', its f_data member points
> to a vnode that has already been put on the freelist and therefore has
> v_usecount set to zero?  How does that happen?

I don't know.  I'm trying to be conservative to be able to concentrate
on amaps & anons.  I'd rather keep all the rest under a single
KERNEL_LOCK().

Hopefully this can be revisited soon.

> > > Your diff makes the locking a bit convoluted, but I suppose adding a
> > > KERNEL_UNLOCK() before every "goto out" is worse?
> > 
> > I tried to keep the diff as small as possible to not obfuscate the change.
> > If we want cleaner code we can move the !ANON case in a different function.
> 
> Splitting would be hard because of the "goto is_anon".
> 
> > > > Index: uvm/uvm_mmap.c
> > > > ===
> > > > RCS file: /cvs/src/sys/uvm/uvm_mmap.c,v
> > > > retrieving revision 1.161
> > > > diff -u -p -r1.161 uvm_mmap.c
> > > > --- uvm/uvm_mmap.c  4 Mar 2020 21:15:39 -   1.161
> > > > +++ uvm/uvm_mmap.c  28 Sep 2020 09:48:26 -
> > > > @@ -288,8 +288,11 @@ sys_mmap(struct proc *p, void *v, regist
> > > >  
> > > > /* check for file mappings (i.e. not anonymous) and verify 
> > > > file. */
> > > > if ((flags & MAP_ANON) == 0) {
> > > > -   if ((fp = fd_getfile(fdp, fd)) == NULL)
> > > > -   return (EBADF);
> > > > +   KERNEL_LOCK();
> > > > +   if ((fp = fd_getfile(fdp, fd)) == NULL) {
> > > > +   error = EBADF;
> > > > +   goto out;
> > > > +   }
> > > >  
> > > > if (fp->f_type != DTYPE_VNODE) {
> > > > error = ENODEV; /* only mmap vnodes! */
> > > > @@ -313,6 +316,7 @@ sys_mmap(struct proc *p, void *v, regist
> > > > flags |= MAP_ANON;
> > > > FRELE(fp, p);
> > > > fp = NULL;
> > > > +   KERNEL_UNLOCK();
> > > > goto is_anon;
> > > > }
> > > >  
> > > > @@ -362,9 +366,7 @@ sys_mmap(struct proc *p, void *v, regist
> > > >  * EPERM.
> > > >  */
> > > > if (fp->f_flag & FWRITE) {
> > > > -   KERNEL_LOCK();
> > > > error = VOP_GETATTR(vp, &va, 
> > > > p->p_ucred, p);
> > > > -   KERNEL_UNLOCK();
> > > > if (error)
> > > > goto out;
> > > > if ((va.v

Re: mmap: Do not push KERNEL_LOCK() too far

2020-10-03 Thread Mark Kettenis
> Date: Fri, 2 Oct 2020 10:32:27 +0200
> From: Martin Pieuchot 
> 
> On 01/10/20(Thu) 21:44, Mark Kettenis wrote:
> > > Date: Thu, 1 Oct 2020 14:10:56 +0200
> > > From: Martin Pieuchot 
> > > 
> > > While studying a bug report from naddy@ in 2017 when testing guenther@'s
> > > amap/anon locking diff I figured out that we have been too optimistic in
> > > the !MAP_ANON case.
> > > 
> > > The reported panic involves, I'd guess, a race between fd_getfile() and
> > > vref():
> > > 
> > >   panic: vref used where vget required
> > >   db_enter() at db_enter+0x5
> > >   panic() at panic+0x129
> > >   vref(ff03b20d29e8) at vref+0x5d
> > >   uvn_attach(101,ff03a5879dc0) at uvn_attach+0x11d
> > >   uvm_mmapfile(7,ff03a5879dc0,2,1,13,10012) at uvm_mmapfile+0x12c
> > >   sys_mmap(c50,8000225f82a0,1) at sys_mmap+0x604
> > >   syscall() at syscall+0x279
> > >   --- syscall (number 198) ---
> > >   end of kernel
> > > 
> > > Removing the KERNEL_LOCK() from file mapping was out of the scope of this
> > > previous work, so I'd like to go back to a single KERNEL_LOCK/UNLOCK dance
> > > in this code path to remove any false positive.
> > > 
> > > Note that this code is currently always run under KERNEL_LOCK() so this
> > > will only have effect once the syscall will be unlocked.
> > > 
> > > ok?
> > 
> > Hmm, I thought fd_getfile() was fully mpsafe.
> 
> It is to get a reference on `fp'.  However if the current thread
> releases the KERNEL_LOCK() before calling vref(9) it might lose a
> race.

I don't see the race.  The function returns a 'fp' with a reference,
so 'fp' will be valid regardless of whether we hold the kernel lock or
not.  So we should be able to take the kernel lock after the
fd_getfile() call isn't it?

> > But I suppose the kernel lock needs to be grabbed before we start
> > looking at the vnode?
> 
> Yes, or to say it otherwise not released.

So the problem is that while we have an 'fp', its f_data member points
to a vnode that has already been put on the freelist and therefore has
v_usecount set to zero?  How does that happen?

> > Your diff makes the locking a bit convoluted, but I suppose adding a
> > KERNEL_UNLOCK() before every "goto out" is worse?
> 
> I tried to keep the diff as small as possible to not obfuscate the change.
> If we want cleaner code we can move the !ANON case in a different function.

Splitting would be hard because of the "goto is_anon".

> > > Index: uvm/uvm_mmap.c
> > > ===
> > > RCS file: /cvs/src/sys/uvm/uvm_mmap.c,v
> > > retrieving revision 1.161
> > > diff -u -p -r1.161 uvm_mmap.c
> > > --- uvm/uvm_mmap.c4 Mar 2020 21:15:39 -   1.161
> > > +++ uvm/uvm_mmap.c28 Sep 2020 09:48:26 -
> > > @@ -288,8 +288,11 @@ sys_mmap(struct proc *p, void *v, regist
> > >  
> > >   /* check for file mappings (i.e. not anonymous) and verify file. */
> > >   if ((flags & MAP_ANON) == 0) {
> > > - if ((fp = fd_getfile(fdp, fd)) == NULL)
> > > - return (EBADF);
> > > + KERNEL_LOCK();
> > > + if ((fp = fd_getfile(fdp, fd)) == NULL) {
> > > + error = EBADF;
> > > + goto out;
> > > + }
> > >  
> > >   if (fp->f_type != DTYPE_VNODE) {
> > >   error = ENODEV; /* only mmap vnodes! */
> > > @@ -313,6 +316,7 @@ sys_mmap(struct proc *p, void *v, regist
> > >   flags |= MAP_ANON;
> > >   FRELE(fp, p);
> > >   fp = NULL;
> > > + KERNEL_UNLOCK();
> > >   goto is_anon;
> > >   }
> > >  
> > > @@ -362,9 +366,7 @@ sys_mmap(struct proc *p, void *v, regist
> > >* EPERM.
> > >*/
> > >   if (fp->f_flag & FWRITE) {
> > > - KERNEL_LOCK();
> > >   error = VOP_GETATTR(vp, &va, p->p_ucred, p);
> > > - KERNEL_UNLOCK();
> > >   if (error)
> > >   goto out;
> > >   if ((va.va_flags & (IMMUTABLE|APPEND)) == 0)
> > > @@ -390,9 +392,9 @@ sys_mmap(struct proc *p, void *v, regist
> > >   goto out;
> > >   }
> > >   }
> > > - KERNEL_LOCK();
> > >   error = uvm_mmapfile(&p->p_vmspace->vm_map, &addr, size, prot,
> > >   maxprot, flags, vp, pos, lim_cur(RLIMIT_MEMLOCK), p);
> > > + FRELE(fp, p);
> > >   KERNEL_UNLOCK();
> > >   } else {/* MAP_ANON case */
> > >   if (fd != -1)
> > > @@ -428,7 +430,10 @@ is_anon: /* label for SunOS style /dev/z
> > >   /* remember to add offset */
> > >   *retval = (register_t)(addr + pageoff);
> > >  
> > > + return (error);
> > > +
> > >  out:
> > > + KERNEL_UNLOCK();
> > >   if (fp)
> > >   FRELE(fp, p);
> > >   return (error);
> > > 
> > > 

Re: mmap: Do not push KERNEL_LOCK() too far

2020-10-02 Thread Martin Pieuchot
On 01/10/20(Thu) 21:44, Mark Kettenis wrote:
> > Date: Thu, 1 Oct 2020 14:10:56 +0200
> > From: Martin Pieuchot 
> > 
> > While studying a bug report from naddy@ in 2017 when testing guenther@'s
> > amap/anon locking diff I figured out that we have been too optimistic in
> > the !MAP_ANON case.
> > 
> > The reported panic involves, I'd guess, a race between fd_getfile() and
> > vref():
> > 
> >   panic: vref used where vget required
> >   db_enter() at db_enter+0x5
> >   panic() at panic+0x129
> >   vref(ff03b20d29e8) at vref+0x5d
> >   uvn_attach(101,ff03a5879dc0) at uvn_attach+0x11d
> >   uvm_mmapfile(7,ff03a5879dc0,2,1,13,10012) at uvm_mmapfile+0x12c
> >   sys_mmap(c50,8000225f82a0,1) at sys_mmap+0x604
> >   syscall() at syscall+0x279
> >   --- syscall (number 198) ---
> >   end of kernel
> > 
> > Removing the KERNEL_LOCK() from file mapping was out of the scope of this
> > previous work, so I'd like to go back to a single KERNEL_LOCK/UNLOCK dance
> > in this code path to remove any false positive.
> > 
> > Note that this code is currently always run under KERNEL_LOCK() so this
> > will only have effect once the syscall will be unlocked.
> > 
> > ok?
> 
> Hmm, I thought fd_getfile() was fully mpsafe.

It is to get a reference on `fp'.  However if the current thread
releases the KERNEL_LOCK() before calling vref(9) it might lose a
race.

> But I suppose the kernel lock needs to be grabbed before we start
> looking at the vnode?

Yes, or to say it otherwise not released.

> Your diff makes the locking a bit convoluted, but I suppose adding a
> KERNEL_UNLOCK() before every "goto out" is worse?

I tried to keep the diff as small as possible to not obfuscate the change.
If we want cleaner code we can move the !ANON case in a different function.

> > Index: uvm/uvm_mmap.c
> > ===
> > RCS file: /cvs/src/sys/uvm/uvm_mmap.c,v
> > retrieving revision 1.161
> > diff -u -p -r1.161 uvm_mmap.c
> > --- uvm/uvm_mmap.c  4 Mar 2020 21:15:39 -   1.161
> > +++ uvm/uvm_mmap.c  28 Sep 2020 09:48:26 -
> > @@ -288,8 +288,11 @@ sys_mmap(struct proc *p, void *v, regist
> >  
> > /* check for file mappings (i.e. not anonymous) and verify file. */
> > if ((flags & MAP_ANON) == 0) {
> > -   if ((fp = fd_getfile(fdp, fd)) == NULL)
> > -   return (EBADF);
> > +   KERNEL_LOCK();
> > +   if ((fp = fd_getfile(fdp, fd)) == NULL) {
> > +   error = EBADF;
> > +   goto out;
> > +   }
> >  
> > if (fp->f_type != DTYPE_VNODE) {
> > error = ENODEV; /* only mmap vnodes! */
> > @@ -313,6 +316,7 @@ sys_mmap(struct proc *p, void *v, regist
> > flags |= MAP_ANON;
> > FRELE(fp, p);
> > fp = NULL;
> > +   KERNEL_UNLOCK();
> > goto is_anon;
> > }
> >  
> > @@ -362,9 +366,7 @@ sys_mmap(struct proc *p, void *v, regist
> >  * EPERM.
> >  */
> > if (fp->f_flag & FWRITE) {
> > -   KERNEL_LOCK();
> > error = VOP_GETATTR(vp, &va, p->p_ucred, p);
> > -   KERNEL_UNLOCK();
> > if (error)
> > goto out;
> > if ((va.va_flags & (IMMUTABLE|APPEND)) == 0)
> > @@ -390,9 +392,9 @@ sys_mmap(struct proc *p, void *v, regist
> > goto out;
> > }
> > }
> > -   KERNEL_LOCK();
> > error = uvm_mmapfile(&p->p_vmspace->vm_map, &addr, size, prot,
> > maxprot, flags, vp, pos, lim_cur(RLIMIT_MEMLOCK), p);
> > +   FRELE(fp, p);
> > KERNEL_UNLOCK();
> > } else {/* MAP_ANON case */
> > if (fd != -1)
> > @@ -428,7 +430,10 @@ is_anon:   /* label for SunOS style /dev/z
> > /* remember to add offset */
> > *retval = (register_t)(addr + pageoff);
> >  
> > +   return (error);
> > +
> >  out:
> > +   KERNEL_UNLOCK();
> > if (fp)
> > FRELE(fp, p);
> > return (error);
> > 
> > 



Re: mmap: Do not push KERNEL_LOCK() too far

2020-10-01 Thread Mark Kettenis
> Date: Thu, 1 Oct 2020 14:10:56 +0200
> From: Martin Pieuchot 
> 
> While studying a bug report from naddy@ in 2017 when testing guenther@'s
> amap/anon locking diff I figured out that we have been too optimistic in
> the !MAP_ANON case.
> 
> The reported panic involves, I'd guess, a race between fd_getfile() and
> vref():
> 
>   panic: vref used where vget required
>   db_enter() at db_enter+0x5
>   panic() at panic+0x129
>   vref(ff03b20d29e8) at vref+0x5d
>   uvn_attach(101,ff03a5879dc0) at uvn_attach+0x11d
>   uvm_mmapfile(7,ff03a5879dc0,2,1,13,10012) at uvm_mmapfile+0x12c
>   sys_mmap(c50,8000225f82a0,1) at sys_mmap+0x604
>   syscall() at syscall+0x279
>   --- syscall (number 198) ---
>   end of kernel
> 
> Removing the KERNEL_LOCK() from file mapping was out of the scope of this
> previous work, so I'd like to go back to a single KERNEL_LOCK/UNLOCK dance
> in this code path to remove any false positive.
> 
> Note that this code is currently always run under KERNEL_LOCK() so this
> will only have effect once the syscall will be unlocked.
> 
> ok?

Hmm, I thought fd_getfile() was fully mpsafe.

But I suppose the kernel lock needs to be grabbed before we start
looking at the vnode?

Your diff makes the locking a bit convoluted, but I suppose adding a
KERNEL_UNLOCK() before every "goto out" is worse?


> Index: uvm/uvm_mmap.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_mmap.c,v
> retrieving revision 1.161
> diff -u -p -r1.161 uvm_mmap.c
> --- uvm/uvm_mmap.c4 Mar 2020 21:15:39 -   1.161
> +++ uvm/uvm_mmap.c28 Sep 2020 09:48:26 -
> @@ -288,8 +288,11 @@ sys_mmap(struct proc *p, void *v, regist
>  
>   /* check for file mappings (i.e. not anonymous) and verify file. */
>   if ((flags & MAP_ANON) == 0) {
> - if ((fp = fd_getfile(fdp, fd)) == NULL)
> - return (EBADF);
> + KERNEL_LOCK();
> + if ((fp = fd_getfile(fdp, fd)) == NULL) {
> + error = EBADF;
> + goto out;
> + }
>  
>   if (fp->f_type != DTYPE_VNODE) {
>   error = ENODEV; /* only mmap vnodes! */
> @@ -313,6 +316,7 @@ sys_mmap(struct proc *p, void *v, regist
>   flags |= MAP_ANON;
>   FRELE(fp, p);
>   fp = NULL;
> + KERNEL_UNLOCK();
>   goto is_anon;
>   }
>  
> @@ -362,9 +366,7 @@ sys_mmap(struct proc *p, void *v, regist
>* EPERM.
>*/
>   if (fp->f_flag & FWRITE) {
> - KERNEL_LOCK();
>   error = VOP_GETATTR(vp, &va, p->p_ucred, p);
> - KERNEL_UNLOCK();
>   if (error)
>   goto out;
>   if ((va.va_flags & (IMMUTABLE|APPEND)) == 0)
> @@ -390,9 +392,9 @@ sys_mmap(struct proc *p, void *v, regist
>   goto out;
>   }
>   }
> - KERNEL_LOCK();
>   error = uvm_mmapfile(&p->p_vmspace->vm_map, &addr, size, prot,
>   maxprot, flags, vp, pos, lim_cur(RLIMIT_MEMLOCK), p);
> + FRELE(fp, p);
>   KERNEL_UNLOCK();
>   } else {/* MAP_ANON case */
>   if (fd != -1)
> @@ -428,7 +430,10 @@ is_anon: /* label for SunOS style /dev/z
>   /* remember to add offset */
>   *retval = (register_t)(addr + pageoff);
>  
> + return (error);
> +
>  out:
> + KERNEL_UNLOCK();
>   if (fp)
>   FRELE(fp, p);
>   return (error);
> 
> 



mmap: Do not push KERNEL_LOCK() too far

2020-10-01 Thread Martin Pieuchot
While studying a bug report from naddy@ in 2017 when testing guenther@'s
amap/anon locking diff I figured out that we have been too optimistic in
the !MAP_ANON case.

The reported panic involves, I'd guess, a race between fd_getfile() and
vref():

  panic: vref used where vget required
  db_enter() at db_enter+0x5
  panic() at panic+0x129
  vref(ff03b20d29e8) at vref+0x5d
  uvn_attach(101,ff03a5879dc0) at uvn_attach+0x11d
  uvm_mmapfile(7,ff03a5879dc0,2,1,13,10012) at uvm_mmapfile+0x12c
  sys_mmap(c50,8000225f82a0,1) at sys_mmap+0x604
  syscall() at syscall+0x279
  --- syscall (number 198) ---
  end of kernel

Removing the KERNEL_LOCK() from file mapping was out of the scope of this
previous work, so I'd like to go back to a single KERNEL_LOCK/UNLOCK dance
in this code path to remove any false positive.

Note that this code is currently always run under KERNEL_LOCK() so this
will only have effect once the syscall will be unlocked.

ok?

Index: uvm/uvm_mmap.c
===
RCS file: /cvs/src/sys/uvm/uvm_mmap.c,v
retrieving revision 1.161
diff -u -p -r1.161 uvm_mmap.c
--- uvm/uvm_mmap.c  4 Mar 2020 21:15:39 -   1.161
+++ uvm/uvm_mmap.c  28 Sep 2020 09:48:26 -
@@ -288,8 +288,11 @@ sys_mmap(struct proc *p, void *v, regist
 
/* check for file mappings (i.e. not anonymous) and verify file. */
if ((flags & MAP_ANON) == 0) {
-   if ((fp = fd_getfile(fdp, fd)) == NULL)
-   return (EBADF);
+   KERNEL_LOCK();
+   if ((fp = fd_getfile(fdp, fd)) == NULL) {
+   error = EBADF;
+   goto out;
+   }
 
if (fp->f_type != DTYPE_VNODE) {
error = ENODEV; /* only mmap vnodes! */
@@ -313,6 +316,7 @@ sys_mmap(struct proc *p, void *v, regist
flags |= MAP_ANON;
FRELE(fp, p);
fp = NULL;
+   KERNEL_UNLOCK();
goto is_anon;
}
 
@@ -362,9 +366,7 @@ sys_mmap(struct proc *p, void *v, regist
 * EPERM.
 */
if (fp->f_flag & FWRITE) {
-   KERNEL_LOCK();
error = VOP_GETATTR(vp, &va, p->p_ucred, p);
-   KERNEL_UNLOCK();
if (error)
goto out;
if ((va.va_flags & (IMMUTABLE|APPEND)) == 0)
@@ -390,9 +392,9 @@ sys_mmap(struct proc *p, void *v, regist
goto out;
}
}
-   KERNEL_LOCK();
error = uvm_mmapfile(&p->p_vmspace->vm_map, &addr, size, prot,
maxprot, flags, vp, pos, lim_cur(RLIMIT_MEMLOCK), p);
+   FRELE(fp, p);
KERNEL_UNLOCK();
} else {/* MAP_ANON case */
if (fd != -1)
@@ -428,7 +430,10 @@ is_anon:   /* label for SunOS style /dev/z
/* remember to add offset */
*retval = (register_t)(addr + pageoff);
 
+   return (error);
+
 out:
+   KERNEL_UNLOCK();
if (fp)
FRELE(fp, p);
return (error);