Re: NET_LOCK() pr_sysctl
On 16/01/17(Mon) 23:53, Alexander Bluhm wrote: > On Mon, Jan 16, 2017 at 08:34:43PM +0100, Alexander Bluhm wrote: > > If I implement the same trick for newp, I can avoid the "netlock > > locking against myself" with sysctl on memory mapped files over > > NFS. Of course other copyin/copyout paths like pf(4) ioctl(2) still > > have to be checked. IPsec pfkey seem to use the sysctl mechanism. > > Hrvoje Popovski has tested the diff and found some ugly > pmap_unwire: wiring for pmap 0xff00075f5210 va 0x7f7d5000 didn't > change! > kernel printfs. The happens when sysctl(8) writes a value. > > If oldp and newp are in the same page, I have called uvm_vsunlock() > twice on the same address. I have added a simple check that does > not cover complicated overlappings but catches the common case. > > Also I think PROT_READ for the newp should be enough. > Does anybody know, why the oldp is mapped PROT_READ | PROT_WRITE? > Is PROT_WRITE not sufficient? I don't think this is the way to go. I'd prefer a solution that work for the other code paths as well.
Re: NET_LOCK() pr_sysctl
On Mon, Jan 16, 2017 at 07:34:43PM +, Alexander Bluhm wrote: > On Mon, Jan 09, 2017 at 11:55:55PM +0100, Alexander Bluhm wrote: > > On Thu, Dec 22, 2016 at 01:38:17AM +0100, Mateusz Guzik wrote: > > > In this particular case, what happens if the access results in a page > > > fault and the area comes from a nfs mapped file? If network i/o is done > > > from the same context, this should result in 'locking against myself' > > > assertion failure. > > > > I have written a program the sets a sysctl value from a memory > > mapped file mounted on NFS. As expected it panics when NET_LOCK() > > is enabled. > > I was wondering why my test program did only crash during copyin > but never for copyout. For sysctl(2) the copyout does never sleep > as the memory is wired. This is done to avoid races when collecting > data for the oldp. Once more, I'm not familiar with the codebase, only skimmed through so apologies for mistakes (if any). I think the current code is buggy and the approach needs to be adjusted to work. On a more SMP-ified kernel a concurrently running thread can munlock the area or mmap something over it. Then vsunlock executed over it will cause trouble. The same checks used during munlock have to be repeated. Even the current kernel has the problem - since NET_LOCK is a sleepable lock, the caller can be put off the cpu and the newly scheduled thread can do aforementioned shenaningans. Likely there is a similar story for other sysctl handlers. With spurious vsunlock taken care off, we are back to recursive locking due to the area not faulted in. If you want to stick to holding NET_LOCK over copyin/copyout, I suggest introducing _nofault variants - that is, instead of faulting the page in, just return an error. The kernel did what it should by wiring appropriate pages. If you play games by changing the same mappings, the failure is on you. I.e. it will not affect legitimate programs. However, this slows down the general sysctl interface for a corner case. At the very least, a dedicated helper for wiring pages can be introduced and it would be executed by sysctls interested in the facility. > > If I implement the same trick for newp, I can avoid the "netlock > locking against myself" with sysctl on memory mapped files over > NFS. Of course other copyin/copyout paths like pf(4) ioctl(2) still > have to be checked. IPsec pfkey seem to use the sysctl mechanism. > > Note that the variable dolock was always 1, so I removed it. > > ok? > > bluhm > > Index: kern/kern_sysctl.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_sysctl.c,v > retrieving revision 1.320 > diff -u -p -r1.320 kern_sysctl.c > --- kern/kern_sysctl.c11 Nov 2016 18:59:09 - 1.320 > +++ kern/kern_sysctl.c16 Jan 2017 18:49:20 - > @@ -157,7 +157,7 @@ sys_sysctl(struct proc *p, void *v, regi > syscallarg(void *) new; > syscallarg(size_t) newlen; > } */ *uap = v; > - int error, dolock = 1; > + int error; > size_t savelen = 0, oldlen = 0; > sysctlfn *fn; > int name[CTL_MAXNAME]; > @@ -219,30 +219,41 @@ sys_sysctl(struct proc *p, void *v, regi > if (SCARG(uap, oldlenp) && > (error = copyin(SCARG(uap, oldlenp), , sizeof(oldlen > return (error); > - if (SCARG(uap, old) != NULL) { > + if (SCARG(uap, old) != NULL || SCARG(uap, new) != NULL) { > if ((error = rw_enter(_lock, RW_WRITE|RW_INTR)) != 0) > return (error); > - if (dolock) { > - if (atop(oldlen) > uvmexp.wiredmax - uvmexp.wired) { > - rw_exit_write(_lock); > - return (ENOMEM); > - } > - error = uvm_vslock(p, SCARG(uap, old), oldlen, > - PROT_READ | PROT_WRITE); > - if (error) { > - rw_exit_write(_lock); > - return (error); > - } > + } > + if (SCARG(uap, old) != NULL) { > + if (atop(oldlen) > uvmexp.wiredmax - uvmexp.wired) { > + error = ENOMEM; > + goto unlock; > } > + error = uvm_vslock(p, SCARG(uap, old), oldlen, > + PROT_READ | PROT_WRITE); > + if (error) > + goto unlock; > savelen = oldlen; > } > + if (SCARG(uap, new) != NULL) { > + if (atop(SCARG(uap, newlen)) > uvmexp.wiredmax - uvmexp.wired) { > + error = ENOMEM; > + goto unwire; > + } > + error = uvm_vslock(p, SCARG(uap, new), SCARG(uap, newlen), > + PROT_READ | PROT_WRITE); > + if (error) > + goto unwire; > + } > error = (*fn)([1], SCARG(uap,
Re: NET_LOCK() pr_sysctl
On 16.1.2017. 23:53, Alexander Bluhm wrote: > Hrvoje Popovski has tested the diff and found some ugly > pmap_unwire: wiring for pmap 0xff00075f5210 va 0x7f7d5000 didn't > change! > kernel printfs. The happens when sysctl(8) writes a value. > > If oldp and newp are in the same page, I have called uvm_vsunlock() > twice on the same address. I have added a simple check that does > not cover complicated overlappings but catches the common case. > > Also I think PROT_READ for the newp should be enough. > Does anybody know, why the oldp is mapped PROT_READ | PROT_WRITE? > Is PROT_WRITE not sufficient? > > bluhm Hi, with this new diff i don't see any logs and box seems stable ...
Re: NET_LOCK() pr_sysctl
On Mon, Jan 16, 2017 at 08:34:43PM +0100, Alexander Bluhm wrote: > If I implement the same trick for newp, I can avoid the "netlock > locking against myself" with sysctl on memory mapped files over > NFS. Of course other copyin/copyout paths like pf(4) ioctl(2) still > have to be checked. IPsec pfkey seem to use the sysctl mechanism. Hrvoje Popovski has tested the diff and found some ugly pmap_unwire: wiring for pmap 0xff00075f5210 va 0x7f7d5000 didn't change! kernel printfs. The happens when sysctl(8) writes a value. If oldp and newp are in the same page, I have called uvm_vsunlock() twice on the same address. I have added a simple check that does not cover complicated overlappings but catches the common case. Also I think PROT_READ for the newp should be enough. Does anybody know, why the oldp is mapped PROT_READ | PROT_WRITE? Is PROT_WRITE not sufficient? bluhm Index: kern/kern_sysctl.c === RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_sysctl.c,v retrieving revision 1.320 diff -u -p -r1.320 kern_sysctl.c --- kern/kern_sysctl.c 11 Nov 2016 18:59:09 - 1.320 +++ kern/kern_sysctl.c 16 Jan 2017 22:37:42 - @@ -157,7 +157,7 @@ sys_sysctl(struct proc *p, void *v, regi syscallarg(void *) new; syscallarg(size_t) newlen; } */ *uap = v; - int error, dolock = 1; + int error; size_t savelen = 0, oldlen = 0; sysctlfn *fn; int name[CTL_MAXNAME]; @@ -219,30 +219,45 @@ sys_sysctl(struct proc *p, void *v, regi if (SCARG(uap, oldlenp) && (error = copyin(SCARG(uap, oldlenp), , sizeof(oldlen return (error); - if (SCARG(uap, old) != NULL) { + if (SCARG(uap, old) != NULL || SCARG(uap, new) != NULL) { if ((error = rw_enter(_lock, RW_WRITE|RW_INTR)) != 0) return (error); - if (dolock) { - if (atop(oldlen) > uvmexp.wiredmax - uvmexp.wired) { - rw_exit_write(_lock); - return (ENOMEM); - } - error = uvm_vslock(p, SCARG(uap, old), oldlen, - PROT_READ | PROT_WRITE); - if (error) { - rw_exit_write(_lock); - return (error); - } + } + if (SCARG(uap, old) != NULL) { + if (atop(oldlen) > uvmexp.wiredmax - uvmexp.wired) { + error = ENOMEM; + goto unlock; } + error = uvm_vslock(p, SCARG(uap, old), oldlen, + PROT_READ | PROT_WRITE); + if (error) + goto unlock; savelen = oldlen; } + if (SCARG(uap, new) != NULL) { + if (atop(SCARG(uap, newlen)) > uvmexp.wiredmax - uvmexp.wired) { + error = ENOMEM; + goto unwire; + } + error = uvm_vslock(p, SCARG(uap, new), SCARG(uap, newlen), + PROT_READ); + if (error) + goto unwire; + } error = (*fn)([1], SCARG(uap, namelen) - 1, SCARG(uap, old), , SCARG(uap, new), SCARG(uap, newlen), p); - if (SCARG(uap, old) != NULL) { - if (dolock) - uvm_vsunlock(p, SCARG(uap, old), savelen); + if (SCARG(uap, new) != NULL && (SCARG(uap, old) == NULL || + (trunc_page((vaddr_t)SCARG(uap, new)) < + trunc_page((vaddr_t)SCARG(uap, old)) || + (round_page((vaddr_t)SCARG(uap, new) + SCARG(uap, newlen)) > + round_page((vaddr_t)SCARG(uap, old) + savelen) + uvm_vsunlock(p, SCARG(uap, new), SCARG(uap, newlen)); + unwire: + if (SCARG(uap, old) != NULL) + uvm_vsunlock(p, SCARG(uap, old), savelen); + unlock: + if (SCARG(uap, old) != NULL || SCARG(uap, new) != NULL) rw_exit_write(_lock); - } if (error) return (error); if (SCARG(uap, oldlenp))
Re: NET_LOCK() pr_sysctl
On Mon, Jan 09, 2017 at 11:55:55PM +0100, Alexander Bluhm wrote: > On Thu, Dec 22, 2016 at 01:38:17AM +0100, Mateusz Guzik wrote: > > In this particular case, what happens if the access results in a page > > fault and the area comes from a nfs mapped file? If network i/o is done > > from the same context, this should result in 'locking against myself' > > assertion failure. > > I have written a program the sets a sysctl value from a memory > mapped file mounted on NFS. As expected it panics when NET_LOCK() > is enabled. I was wondering why my test program did only crash during copyin but never for copyout. For sysctl(2) the copyout does never sleep as the memory is wired. This is done to avoid races when collecting data for the oldp. If I implement the same trick for newp, I can avoid the "netlock locking against myself" with sysctl on memory mapped files over NFS. Of course other copyin/copyout paths like pf(4) ioctl(2) still have to be checked. IPsec pfkey seem to use the sysctl mechanism. Note that the variable dolock was always 1, so I removed it. ok? bluhm Index: kern/kern_sysctl.c === RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_sysctl.c,v retrieving revision 1.320 diff -u -p -r1.320 kern_sysctl.c --- kern/kern_sysctl.c 11 Nov 2016 18:59:09 - 1.320 +++ kern/kern_sysctl.c 16 Jan 2017 18:49:20 - @@ -157,7 +157,7 @@ sys_sysctl(struct proc *p, void *v, regi syscallarg(void *) new; syscallarg(size_t) newlen; } */ *uap = v; - int error, dolock = 1; + int error; size_t savelen = 0, oldlen = 0; sysctlfn *fn; int name[CTL_MAXNAME]; @@ -219,30 +219,41 @@ sys_sysctl(struct proc *p, void *v, regi if (SCARG(uap, oldlenp) && (error = copyin(SCARG(uap, oldlenp), , sizeof(oldlen return (error); - if (SCARG(uap, old) != NULL) { + if (SCARG(uap, old) != NULL || SCARG(uap, new) != NULL) { if ((error = rw_enter(_lock, RW_WRITE|RW_INTR)) != 0) return (error); - if (dolock) { - if (atop(oldlen) > uvmexp.wiredmax - uvmexp.wired) { - rw_exit_write(_lock); - return (ENOMEM); - } - error = uvm_vslock(p, SCARG(uap, old), oldlen, - PROT_READ | PROT_WRITE); - if (error) { - rw_exit_write(_lock); - return (error); - } + } + if (SCARG(uap, old) != NULL) { + if (atop(oldlen) > uvmexp.wiredmax - uvmexp.wired) { + error = ENOMEM; + goto unlock; } + error = uvm_vslock(p, SCARG(uap, old), oldlen, + PROT_READ | PROT_WRITE); + if (error) + goto unlock; savelen = oldlen; } + if (SCARG(uap, new) != NULL) { + if (atop(SCARG(uap, newlen)) > uvmexp.wiredmax - uvmexp.wired) { + error = ENOMEM; + goto unwire; + } + error = uvm_vslock(p, SCARG(uap, new), SCARG(uap, newlen), + PROT_READ | PROT_WRITE); + if (error) + goto unwire; + } error = (*fn)([1], SCARG(uap, namelen) - 1, SCARG(uap, old), , SCARG(uap, new), SCARG(uap, newlen), p); - if (SCARG(uap, old) != NULL) { - if (dolock) - uvm_vsunlock(p, SCARG(uap, old), savelen); + if (SCARG(uap, new) != NULL) + uvm_vsunlock(p, SCARG(uap, new), SCARG(uap, newlen)); + unwire: + if (SCARG(uap, old) != NULL) + uvm_vsunlock(p, SCARG(uap, old), savelen); + unlock: + if (SCARG(uap, old) != NULL || SCARG(uap, new) != NULL) rw_exit_write(_lock); - } if (error) return (error); if (SCARG(uap, oldlenp))
Re: NET_LOCK() pr_sysctl
On Thu, Dec 22, 2016 at 01:38:17AM +0100, Mateusz Guzik wrote: > In this particular case, what happens if the access results in a page > fault and the area comes from a nfs mapped file? If network i/o is done > from the same context, this should result in 'locking against myself' > assertion failure. I have written a program the sets a sysctl value from a memory mapped file mounted on NFS. As expected it panics when NET_LOCK() is enabled. panic: rw_enter: netlock locking against myself Stopped at Debugger+0x7: leave TIDPIDUID PRFLAGS PFLAGS CPU COMMAND *45785 40072 0 0x3 00 mmap-sysctl ddb{0}> trace Debugger(d09f4dbd,f57706d0,d09cc44c,f57706d0,0) at Debugger+0x7 panic(d09cc44c,d09d634f,f5770700,f57706fc,0) at panic+0x71 rw_enter(d0b4ef38,1,f5770784,d04a8b3d,d953c00c) at rw_enter+0x1b4 rw_enter_write(d0b4ef38,1,0,d03cc0ce,d0bbcf80) at rw_enter_write+0x3c sosend(d9569730,0,0,d976aa00,0) at sosend+0xf5 nfs_send(d9569730,d957bd00,d976aa00,d953c00c,d954a5b0) at nfs_send+0x8a nfs_request(d96fda84,6,f57708ac,0,d96fda84) at nfs_request+0x3a3 nfs_readrpc(d96fda84,f5770930,0,0,39fcef3f) at nfs_readrpc+0x14e nfs_doio(d954a5b0,d9540738,0,2000,d9540738) at nfs_doio+0x321 nfs_bioread(d96fda84,f5770b38,0,d97fa840,d9540738) at nfs_bioread+0x4ff nfs_read(f5770ae0,3,33,d30b5870,d9745ed4) at nfs_read+0x35 VOP_READ(d96fda84,f5770b38,0,d97fa840,d30b5870) at VOP_READ+0x3f uvn_io(d9745ed4,f5770bc0,1,2,0) at uvn_io+0x2d5 uvn_get(d9745ed4,0,0,f5770cf8,f5770d00) at uvn_get+0x234 uvm_fault(d9581d24,8a1b1000,0,1,f5770d5c) at uvm_fault+0xf8e trap() at trap+0x723 --- trap (number 4) --- bluhm
Re: NET_LOCK() pr_sysctl
On Tue, Dec 20, 2016 at 05:37:20PM +, Alexander Bluhm wrote: > Obviosly a NET_LOCK() is missing in tcp_sysctl(). > > I think it is better to place the lock into net_sysctl() where all > the protocol sysctls are called via pr_sysctl. Then we don't have > to decide each case individually. As calling sysctl(2) is in the > slow path, doing fine grained locking has no benefit. Many sysctl > cases copy out a struct. Having a lock around that keeps the struct > consistent. > Holding locks across copyout/copyin is always fishy. In this particular case, what happens if the access results in a page fault and the area comes from a nfs mapped file? If network i/o is done from the same context, this should result in 'locking against myself' assertion failure. That said, I'm not exactly familiar with the area, so maybe that's a false alarm. -- Mateusz Guzik