Re: NET_LOCK() pr_sysctl

2017-01-18 Thread Martin Pieuchot
On 16/01/17(Mon) 23:53, Alexander Bluhm wrote:
> On Mon, Jan 16, 2017 at 08:34:43PM +0100, Alexander Bluhm wrote:
> > If I implement the same trick for newp, I can avoid the "netlock
> > locking against myself" with sysctl on memory mapped files over
> > NFS.  Of course other copyin/copyout paths like pf(4) ioctl(2) still
> > have to be checked.  IPsec pfkey seem to use the sysctl mechanism.
> 
> Hrvoje Popovski has tested the diff and found some ugly
> pmap_unwire: wiring for pmap 0xff00075f5210 va 0x7f7d5000 didn't 
> change!
> kernel printfs.  The happens when sysctl(8) writes a value.
> 
> If oldp and newp are in the same page, I have called uvm_vsunlock()
> twice on the same address.  I have added a simple check that does
> not cover complicated overlappings but catches the common case.
> 
> Also I think PROT_READ for the newp should be enough.
> Does anybody know, why the oldp is mapped PROT_READ | PROT_WRITE?
> Is PROT_WRITE not sufficient?

I don't think this is the way to go.  I'd prefer a solution that work
for the other code paths as well.



Re: NET_LOCK() pr_sysctl

2017-01-16 Thread Mateusz Guzik
On Mon, Jan 16, 2017 at 07:34:43PM +, Alexander Bluhm wrote:
> On Mon, Jan 09, 2017 at 11:55:55PM +0100, Alexander Bluhm wrote:
> > On Thu, Dec 22, 2016 at 01:38:17AM +0100, Mateusz Guzik wrote:
> > > In this particular case, what happens if the access results in a page
> > > fault and the area comes from a nfs mapped file? If network i/o is done
> > > from the same context, this should result in 'locking against myself'
> > > assertion failure.
> > 
> > I have written a program the sets a sysctl value from a memory
> > mapped file mounted on NFS.  As expected it panics when NET_LOCK()
> > is enabled.
> 
> I was wondering why my test program did only crash during copyin
> but never for copyout.  For sysctl(2) the copyout does never sleep
> as the memory is wired.  This is done to avoid races when collecting
> data for the oldp.

Once more, I'm not familiar with the codebase, only skimmed through so
apologies for mistakes (if any).

I think the current code is buggy and the approach needs to be adjusted
to work.

On a more SMP-ified kernel a concurrently running thread can munlock the
area or mmap something over it. Then vsunlock executed over it will
cause trouble. The same checks used during munlock have to be repeated.

Even the current kernel has the problem - since NET_LOCK is a sleepable
lock, the caller can be put off the cpu and the newly scheduled thread
can do aforementioned shenaningans. Likely there is a similar story for
other sysctl handlers.

With spurious vsunlock taken care off, we are back to recursive locking
due to the area not faulted in.

If you want to stick to holding NET_LOCK over copyin/copyout, I suggest
introducing _nofault variants - that is, instead of faulting the page
in, just return an error. The kernel did what it should by wiring
appropriate pages. If you play games by changing the same mappings, the
failure is on you. I.e. it will not affect legitimate programs.

However, this slows down the general sysctl interface for a corner case.
At the very least, a dedicated helper for wiring pages can be introduced
and it would be executed by sysctls interested in the facility.


> 
> If I implement the same trick for newp, I can avoid the "netlock
> locking against myself" with sysctl on memory mapped files over
> NFS.  Of course other copyin/copyout paths like pf(4) ioctl(2) still
> have to be checked.  IPsec pfkey seem to use the sysctl mechanism.
> 
> Note that the variable dolock was always 1, so I removed it.
> 
> ok?
> 
> bluhm
> 
> Index: kern/kern_sysctl.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_sysctl.c,v
> retrieving revision 1.320
> diff -u -p -r1.320 kern_sysctl.c
> --- kern/kern_sysctl.c11 Nov 2016 18:59:09 -  1.320
> +++ kern/kern_sysctl.c16 Jan 2017 18:49:20 -
> @@ -157,7 +157,7 @@ sys_sysctl(struct proc *p, void *v, regi
>   syscallarg(void *) new;
>   syscallarg(size_t) newlen;
>   } */ *uap = v;
> - int error, dolock = 1;
> + int error;
>   size_t savelen = 0, oldlen = 0;
>   sysctlfn *fn;
>   int name[CTL_MAXNAME];
> @@ -219,30 +219,41 @@ sys_sysctl(struct proc *p, void *v, regi
>   if (SCARG(uap, oldlenp) &&
>   (error = copyin(SCARG(uap, oldlenp), , sizeof(oldlen
>   return (error);
> - if (SCARG(uap, old) != NULL) {
> + if (SCARG(uap, old) != NULL || SCARG(uap, new) != NULL) {
>   if ((error = rw_enter(_lock, RW_WRITE|RW_INTR)) != 0)
>   return (error);
> - if (dolock) {
> - if (atop(oldlen) > uvmexp.wiredmax - uvmexp.wired) {
> - rw_exit_write(_lock);
> - return (ENOMEM);
> - }
> - error = uvm_vslock(p, SCARG(uap, old), oldlen,
> - PROT_READ | PROT_WRITE);
> - if (error) {
> - rw_exit_write(_lock);
> - return (error);
> - }
> + }
> + if (SCARG(uap, old) != NULL) {
> + if (atop(oldlen) > uvmexp.wiredmax - uvmexp.wired) {
> + error = ENOMEM;
> + goto unlock;
>   }
> + error = uvm_vslock(p, SCARG(uap, old), oldlen,
> + PROT_READ | PROT_WRITE);
> + if (error)
> + goto unlock;
>   savelen = oldlen;
>   }
> + if (SCARG(uap, new) != NULL) {
> + if (atop(SCARG(uap, newlen)) > uvmexp.wiredmax - uvmexp.wired) {
> + error = ENOMEM;
> + goto unwire;
> + }
> + error = uvm_vslock(p, SCARG(uap, new), SCARG(uap, newlen),
> + PROT_READ | PROT_WRITE);
> + if (error)
> + goto unwire;
> + }
>   error = (*fn)([1], SCARG(uap, 

Re: NET_LOCK() pr_sysctl

2017-01-16 Thread Hrvoje Popovski
On 16.1.2017. 23:53, Alexander Bluhm wrote:
> Hrvoje Popovski has tested the diff and found some ugly
> pmap_unwire: wiring for pmap 0xff00075f5210 va 0x7f7d5000 didn't 
> change!
> kernel printfs.  The happens when sysctl(8) writes a value.
> 
> If oldp and newp are in the same page, I have called uvm_vsunlock()
> twice on the same address.  I have added a simple check that does
> not cover complicated overlappings but catches the common case.
> 
> Also I think PROT_READ for the newp should be enough.
> Does anybody know, why the oldp is mapped PROT_READ | PROT_WRITE?
> Is PROT_WRITE not sufficient?
> 
> bluhm

Hi,

with this new diff i don't see any logs and box seems stable ...




Re: NET_LOCK() pr_sysctl

2017-01-16 Thread Alexander Bluhm
On Mon, Jan 16, 2017 at 08:34:43PM +0100, Alexander Bluhm wrote:
> If I implement the same trick for newp, I can avoid the "netlock
> locking against myself" with sysctl on memory mapped files over
> NFS.  Of course other copyin/copyout paths like pf(4) ioctl(2) still
> have to be checked.  IPsec pfkey seem to use the sysctl mechanism.

Hrvoje Popovski has tested the diff and found some ugly
pmap_unwire: wiring for pmap 0xff00075f5210 va 0x7f7d5000 didn't change!
kernel printfs.  The happens when sysctl(8) writes a value.

If oldp and newp are in the same page, I have called uvm_vsunlock()
twice on the same address.  I have added a simple check that does
not cover complicated overlappings but catches the common case.

Also I think PROT_READ for the newp should be enough.
Does anybody know, why the oldp is mapped PROT_READ | PROT_WRITE?
Is PROT_WRITE not sufficient?

bluhm

Index: kern/kern_sysctl.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_sysctl.c,v
retrieving revision 1.320
diff -u -p -r1.320 kern_sysctl.c
--- kern/kern_sysctl.c  11 Nov 2016 18:59:09 -  1.320
+++ kern/kern_sysctl.c  16 Jan 2017 22:37:42 -
@@ -157,7 +157,7 @@ sys_sysctl(struct proc *p, void *v, regi
syscallarg(void *) new;
syscallarg(size_t) newlen;
} */ *uap = v;
-   int error, dolock = 1;
+   int error;
size_t savelen = 0, oldlen = 0;
sysctlfn *fn;
int name[CTL_MAXNAME];
@@ -219,30 +219,45 @@ sys_sysctl(struct proc *p, void *v, regi
if (SCARG(uap, oldlenp) &&
(error = copyin(SCARG(uap, oldlenp), , sizeof(oldlen
return (error);
-   if (SCARG(uap, old) != NULL) {
+   if (SCARG(uap, old) != NULL || SCARG(uap, new) != NULL) {
if ((error = rw_enter(_lock, RW_WRITE|RW_INTR)) != 0)
return (error);
-   if (dolock) {
-   if (atop(oldlen) > uvmexp.wiredmax - uvmexp.wired) {
-   rw_exit_write(_lock);
-   return (ENOMEM);
-   }
-   error = uvm_vslock(p, SCARG(uap, old), oldlen,
-   PROT_READ | PROT_WRITE);
-   if (error) {
-   rw_exit_write(_lock);
-   return (error);
-   }
+   }
+   if (SCARG(uap, old) != NULL) {
+   if (atop(oldlen) > uvmexp.wiredmax - uvmexp.wired) {
+   error = ENOMEM;
+   goto unlock;
}
+   error = uvm_vslock(p, SCARG(uap, old), oldlen,
+   PROT_READ | PROT_WRITE);
+   if (error)
+   goto unlock;
savelen = oldlen;
}
+   if (SCARG(uap, new) != NULL) {
+   if (atop(SCARG(uap, newlen)) > uvmexp.wiredmax - uvmexp.wired) {
+   error = ENOMEM;
+   goto unwire;
+   }
+   error = uvm_vslock(p, SCARG(uap, new), SCARG(uap, newlen),
+   PROT_READ);
+   if (error)
+   goto unwire;
+   }
error = (*fn)([1], SCARG(uap, namelen) - 1, SCARG(uap, old),
, SCARG(uap, new), SCARG(uap, newlen), p);
-   if (SCARG(uap, old) != NULL) {
-   if (dolock)
-   uvm_vsunlock(p, SCARG(uap, old), savelen);
+   if (SCARG(uap, new) != NULL && (SCARG(uap, old) == NULL ||
+   (trunc_page((vaddr_t)SCARG(uap, new)) <
+   trunc_page((vaddr_t)SCARG(uap, old)) ||
+   (round_page((vaddr_t)SCARG(uap, new) + SCARG(uap, newlen)) >
+   round_page((vaddr_t)SCARG(uap, old) + savelen)
+   uvm_vsunlock(p, SCARG(uap, new), SCARG(uap, newlen));
+ unwire:
+   if (SCARG(uap, old) != NULL)
+   uvm_vsunlock(p, SCARG(uap, old), savelen);
+ unlock:
+   if (SCARG(uap, old) != NULL || SCARG(uap, new) != NULL)
rw_exit_write(_lock);
-   }
if (error)
return (error);
if (SCARG(uap, oldlenp))



Re: NET_LOCK() pr_sysctl

2017-01-16 Thread Alexander Bluhm
On Mon, Jan 09, 2017 at 11:55:55PM +0100, Alexander Bluhm wrote:
> On Thu, Dec 22, 2016 at 01:38:17AM +0100, Mateusz Guzik wrote:
> > In this particular case, what happens if the access results in a page
> > fault and the area comes from a nfs mapped file? If network i/o is done
> > from the same context, this should result in 'locking against myself'
> > assertion failure.
> 
> I have written a program the sets a sysctl value from a memory
> mapped file mounted on NFS.  As expected it panics when NET_LOCK()
> is enabled.

I was wondering why my test program did only crash during copyin
but never for copyout.  For sysctl(2) the copyout does never sleep
as the memory is wired.  This is done to avoid races when collecting
data for the oldp.

If I implement the same trick for newp, I can avoid the "netlock
locking against myself" with sysctl on memory mapped files over
NFS.  Of course other copyin/copyout paths like pf(4) ioctl(2) still
have to be checked.  IPsec pfkey seem to use the sysctl mechanism.

Note that the variable dolock was always 1, so I removed it.

ok?

bluhm

Index: kern/kern_sysctl.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_sysctl.c,v
retrieving revision 1.320
diff -u -p -r1.320 kern_sysctl.c
--- kern/kern_sysctl.c  11 Nov 2016 18:59:09 -  1.320
+++ kern/kern_sysctl.c  16 Jan 2017 18:49:20 -
@@ -157,7 +157,7 @@ sys_sysctl(struct proc *p, void *v, regi
syscallarg(void *) new;
syscallarg(size_t) newlen;
} */ *uap = v;
-   int error, dolock = 1;
+   int error;
size_t savelen = 0, oldlen = 0;
sysctlfn *fn;
int name[CTL_MAXNAME];
@@ -219,30 +219,41 @@ sys_sysctl(struct proc *p, void *v, regi
if (SCARG(uap, oldlenp) &&
(error = copyin(SCARG(uap, oldlenp), , sizeof(oldlen
return (error);
-   if (SCARG(uap, old) != NULL) {
+   if (SCARG(uap, old) != NULL || SCARG(uap, new) != NULL) {
if ((error = rw_enter(_lock, RW_WRITE|RW_INTR)) != 0)
return (error);
-   if (dolock) {
-   if (atop(oldlen) > uvmexp.wiredmax - uvmexp.wired) {
-   rw_exit_write(_lock);
-   return (ENOMEM);
-   }
-   error = uvm_vslock(p, SCARG(uap, old), oldlen,
-   PROT_READ | PROT_WRITE);
-   if (error) {
-   rw_exit_write(_lock);
-   return (error);
-   }
+   }
+   if (SCARG(uap, old) != NULL) {
+   if (atop(oldlen) > uvmexp.wiredmax - uvmexp.wired) {
+   error = ENOMEM;
+   goto unlock;
}
+   error = uvm_vslock(p, SCARG(uap, old), oldlen,
+   PROT_READ | PROT_WRITE);
+   if (error)
+   goto unlock;
savelen = oldlen;
}
+   if (SCARG(uap, new) != NULL) {
+   if (atop(SCARG(uap, newlen)) > uvmexp.wiredmax - uvmexp.wired) {
+   error = ENOMEM;
+   goto unwire;
+   }
+   error = uvm_vslock(p, SCARG(uap, new), SCARG(uap, newlen),
+   PROT_READ | PROT_WRITE);
+   if (error)
+   goto unwire;
+   }
error = (*fn)([1], SCARG(uap, namelen) - 1, SCARG(uap, old),
, SCARG(uap, new), SCARG(uap, newlen), p);
-   if (SCARG(uap, old) != NULL) {
-   if (dolock)
-   uvm_vsunlock(p, SCARG(uap, old), savelen);
+   if (SCARG(uap, new) != NULL)
+   uvm_vsunlock(p, SCARG(uap, new), SCARG(uap, newlen));
+ unwire:
+   if (SCARG(uap, old) != NULL)
+   uvm_vsunlock(p, SCARG(uap, old), savelen);
+ unlock:
+   if (SCARG(uap, old) != NULL || SCARG(uap, new) != NULL)
rw_exit_write(_lock);
-   }
if (error)
return (error);
if (SCARG(uap, oldlenp))



Re: NET_LOCK() pr_sysctl

2017-01-09 Thread Alexander Bluhm
On Thu, Dec 22, 2016 at 01:38:17AM +0100, Mateusz Guzik wrote:
> In this particular case, what happens if the access results in a page
> fault and the area comes from a nfs mapped file? If network i/o is done
> from the same context, this should result in 'locking against myself'
> assertion failure.

I have written a program the sets a sysctl value from a memory
mapped file mounted on NFS.  As expected it panics when NET_LOCK()
is enabled.

panic: rw_enter: netlock locking against myself
Stopped at  Debugger+0x7:   leave
   TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
*45785  40072  0 0x3  00  mmap-sysctl

ddb{0}> trace
Debugger(d09f4dbd,f57706d0,d09cc44c,f57706d0,0) at Debugger+0x7
panic(d09cc44c,d09d634f,f5770700,f57706fc,0) at panic+0x71
rw_enter(d0b4ef38,1,f5770784,d04a8b3d,d953c00c) at rw_enter+0x1b4
rw_enter_write(d0b4ef38,1,0,d03cc0ce,d0bbcf80) at rw_enter_write+0x3c
sosend(d9569730,0,0,d976aa00,0) at sosend+0xf5
nfs_send(d9569730,d957bd00,d976aa00,d953c00c,d954a5b0) at nfs_send+0x8a
nfs_request(d96fda84,6,f57708ac,0,d96fda84) at nfs_request+0x3a3
nfs_readrpc(d96fda84,f5770930,0,0,39fcef3f) at nfs_readrpc+0x14e
nfs_doio(d954a5b0,d9540738,0,2000,d9540738) at nfs_doio+0x321
nfs_bioread(d96fda84,f5770b38,0,d97fa840,d9540738) at nfs_bioread+0x4ff
nfs_read(f5770ae0,3,33,d30b5870,d9745ed4) at nfs_read+0x35
VOP_READ(d96fda84,f5770b38,0,d97fa840,d30b5870) at VOP_READ+0x3f
uvn_io(d9745ed4,f5770bc0,1,2,0) at uvn_io+0x2d5
uvn_get(d9745ed4,0,0,f5770cf8,f5770d00) at uvn_get+0x234
uvm_fault(d9581d24,8a1b1000,0,1,f5770d5c) at uvm_fault+0xf8e
trap() at trap+0x723
--- trap (number 4) ---

bluhm



Re: NET_LOCK() pr_sysctl

2016-12-21 Thread Mateusz Guzik
On Tue, Dec 20, 2016 at 05:37:20PM +, Alexander Bluhm wrote:
> Obviosly a NET_LOCK() is missing in tcp_sysctl().
> 
> I think it is better to place the lock into net_sysctl() where all
> the protocol sysctls are called via pr_sysctl.  Then we don't have
> to decide each case individually.  As calling sysctl(2) is in the
> slow path, doing fine grained locking has no benefit.  Many sysctl
> cases copy out a struct.  Having a lock around that keeps the struct
> consistent.
> 

Holding locks across copyout/copyin is always fishy.

In this particular case, what happens if the access results in a page
fault and the area comes from a nfs mapped file? If network i/o is done
from the same context, this should result in 'locking against myself'
assertion failure.

That said, I'm not exactly familiar with the area, so maybe that's a
false alarm.

-- 
Mateusz Guzik 



NET_LOCK() pr_sysctl

2016-12-20 Thread Alexander Bluhm
Hi,

I have seen spl softnet assert failures like this.

splassert: sowwakeup: want 1 have 0
Starting stack trace...
sowwakeup(1,0,d09d8bd7,d03ba9f9,40) at sowwakeup+0x3f
sowwakeup(db549818,d6dc850e,f54d9be8,0,dae674d4) at sowwakeup+0x3f
soisdisconnected(db549818,0,db549818,2,f54d9c00) at soisdisconnected+0x2c
tcp_close(dae674d4,35,ff94,bc51bc0a,a88c) at tcp_close+0x97
tcp_ident(0,f54d9ef8,cf7e2a50,20c,1) at tcp_ident+0x302
tcp_sysctl(f54d9ed4,1,0,f54d9ef8,cf7e2a50) at tcp_sysctl+0x28b
sys_sysctl(d9f0188c,f54d9f5c,f54d9f7c,0,f54d9fa8) at sys_sysctl+0x20e
syscall() at syscall+0x250

Obviosly a NET_LOCK() is missing in tcp_sysctl().

I think it is better to place the lock into net_sysctl() where all
the protocol sysctls are called via pr_sysctl.  Then we don't have
to decide each case individually.  As calling sysctl(2) is in the
slow path, doing fine grained locking has no benefit.  Many sysctl
cases copy out a struct.  Having a lock around that keeps the struct
consistent.

ok?

bluhm

Index: kern/uipc_domain.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/uipc_domain.c,v
retrieving revision 1.46
diff -u -p -r1.46 uipc_domain.c
--- kern/uipc_domain.c  22 Nov 2016 10:32:31 -  1.46
+++ kern/uipc_domain.c  20 Dec 2016 14:57:26 -
@@ -165,7 +165,7 @@ net_sysctl(int *name, u_int namelen, voi
 {
struct domain *dp;
struct protosw *pr;
-   int family, protocol;
+   int s, error, family, protocol;
 
/*
 * All sysctl names at this level are nonterminal.
@@ -199,18 +199,26 @@ net_sysctl(int *name, u_int namelen, voi
 #ifdef MPLS
/* XXX WARNING: big fat ugly hack */
/* stupid net.mpls is special as it does not have a protocol */
-   if (family == PF_MPLS)
-   return (dp->dom_protosw[0].pr_sysctl(name + 1, namelen - 1,
-   oldp, oldlenp, newp, newlen));
+   if (family == PF_MPLS) {
+   NET_LOCK(s);
+   error = (dp->dom_protosw[0].pr_sysctl)(name + 1, namelen - 1,
+   oldp, oldlenp, newp, newlen);
+   NET_UNLOCK(s);
+   return (error);
+   }
 #endif
 
if (namelen < 3)
return (EISDIR);/* overloaded */
protocol = name[1];
for (pr = dp->dom_protosw; pr < dp->dom_protoswNPROTOSW; pr++)
-   if (pr->pr_protocol == protocol && pr->pr_sysctl)
-   return ((*pr->pr_sysctl)(name + 2, namelen - 2,
-   oldp, oldlenp, newp, newlen));
+   if (pr->pr_protocol == protocol && pr->pr_sysctl) {
+   NET_LOCK(s);
+   error = (*pr->pr_sysctl)(name + 2, namelen - 2,
+   oldp, oldlenp, newp, newlen);
+   NET_UNLOCK(s);
+   return (error);
+   }
return (ENOPROTOOPT);
 }
 
Index: net/rtsock.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/rtsock.c,v
retrieving revision 1.211
diff -u -p -r1.211 rtsock.c
--- net/rtsock.c19 Dec 2016 08:36:49 -  1.211
+++ net/rtsock.c20 Dec 2016 15:12:06 -
@@ -1563,12 +1563,14 @@ int
 sysctl_rtable(int *name, u_int namelen, void *where, size_t *given, void *new,
 size_t newlen)
 {
-   int  i, s, error = EINVAL;
+   int  i, error = EINVAL;
u_char   af;
struct walkarg   w;
struct rt_tableinfo  tableinfo;
u_inttableid = 0;
 
+   NET_ASSERT_LOCKED();
+
if (new)
return (EPERM);
if (namelen < 3 || namelen > 4)
@@ -1588,7 +1590,6 @@ sysctl_rtable(int *name, u_int namelen, 
} else
tableid = curproc->p_p->ps_rtableid;
 
-   s = splsoftnet();
switch (w.w_op) {
case NET_RT_DUMP:
case NET_RT_FLAGS:
@@ -1611,25 +1612,20 @@ sysctl_rtable(int *name, u_int namelen, 
case NET_RT_STATS:
error = sysctl_rdstruct(where, given, new,
, sizeof(rtstat));
-   splx(s);
return (error);
case NET_RT_TABLE:
tableid = w.w_arg;
-   if (!rtable_exists(tableid)) {
-   splx(s);
+   if (!rtable_exists(tableid))
return (ENOENT);
-   }
tableinfo.rti_tableid = tableid;
tableinfo.rti_domainid = rtable_l2(tableid);
error = sysctl_rdstruct(where, given, new,
, sizeof(tableinfo));
-   splx(s);
return (error);
case NET_RT_IFNAMES:
error = sysctl_ifnames();
break;
}
-   splx(s);
free(w.w_tmem, M_RTABLE, 0);
w.w_needed += w.w_given;