Re: armv7 cache flushing: don't take shortcuts
On Mon, Aug 15, 2016 at 09:56:09PM +0200, Mark Kettenis wrote: > The functions that clean/invalidate the caches by virtual address, > bail out after cleaning 32k worth of data. The 32k matches the L1 > cache of most of the CPUs we current run on. But the Cortex-A7 has an > integrated L2 cache that is larger. And if you only flush it > partially you may get into trouble. And now that we actually use the > cache that matters. Many of the more recent ARMv7 CPUs include such a > L2 cache. And some of them even have L1 caches that are larger than > 32k. So drop the shortcut and simply clean/invalidate what we were > asked to clean/invalidate. Most of the calls should be covering a > single page or less anyway. > > This fixes the core dumps and illegal instructions that I see when > booting from a SATA disk. Just saw this commited. It makes Cubieboard2 fully useable so far. Kernel rebuild with fs on ahci: (...) ld -T ldscript --warn-common -nopie -S -o bsd ${SYSTEM_HEAD} vers.o ${OBJS} textdatabss dec hex 3744040 139412 479308 4362760 429208 25m50.10s real17m18.26s user 1m28.06s system Just as a comparison, it takes around 20 min on Wandboard with fs on nfs and around 23 min on BeagleBone Black with fs also on nfs. Thank you. -- U-Boot SPL 2016.07 (Aug 05 2016 - 23:44:57) DRAM: 1024 MiB CPU: 91200Hz, AXI/AHB/APB: 3/2/2 Trying to boot from MMC1 U-Boot 2016.07 (Aug 05 2016 - 23:44:57 -0600) Allwinner Technology CPU: Allwinner A20 (SUN7I) Model: Cubietech Cubieboard2 I2C: ready DRAM: 1 GiB MMC: SUNXI SD/MMC: 0 *** Warning - bad CRC, using default environment In:serial Out: serial Err: serial SCSI: Target spinup took 0 ms. AHCI 0001.0100 32 slots 1 ports 3 Gbps 0x1 impl SATA mode flags: ncq stag pm led clo only pmp pio slum part ccc apst Net: eth0: ethernet@01c5 starting USB... USB0: USB EHCI 1.00 USB1: USB OHCI 1.0 USB2: USB EHCI 1.00 USB3: USB OHCI 1.0 scanning bus 0 for devices... 1 USB Device(s) found scanning bus 2 for devices... 1 USB Device(s) found Hit any key to stop autoboot: 0 => => setenv devnum 0 => run scsi_boot scanning bus for devices... Device 0: (0:0) Vendor: ATA Prod.: TOSHIBA MK1235GS Rev: PV01 Type: Hard Disk Capacity: 114473.4 MB = 111.7 GB (234441648 x 512) Found 1 device(s). Device 0: (0:0) Vendor: ATA Prod.: TOSHIBA MK1235GS Rev: PV01 Type: Hard Disk Capacity: 114473.4 MB = 111.7 GB (234441648 x 512) ... is now current device Scanning scsi 0:1... Found EFI removable media binary efi/boot/bootarm.efi reading efi/boot/bootarm.efi 65276 bytes read in 23 ms (2.7 MiB/s) libfdt fdt_check_header(): FDT_ERR_BADMAGIC ## Starting EFI application at 0x4200 ... Scanning disks on scsi... Scanning disks on usb... Scanning disks on mmc... MMC Device 1 not found MMC Device 2 not found MMC Device 3 not found Found 6 disks >> OpenBSD/armv7 BOOTARM 0.1 boot> booting sd0a:/bsd: 3743840+139408+479308 [64+501824+238352]=0x4e3de0 OpenBSD/armv7 booting ... arg0 0x4000 arg1 0x10bb arg2 0x4800 Allocating page tables freestart = 0x407e4000, free_pages = 260124 (0x0003f81c) IRQ stack: p0x40812000 v0xc0812000 ABT stack: p0x40813000 v0xc0813000 UND stack: p0x40814000 v0xc0814000 SVC stack: p0x40815000 v0xc0815000 Creating L1 page table at 0x407e4000 Mapping kernel Constructing L2 page tables undefined page pmap [ using 740612 bytes of bsd ELF symbol table ] board type: 4283 Copyright (c) 1982, 1986, 1989, 1991, 1993 The Regents of the University of California. All rights reserved. Copyright (c) 1995-2016 OpenBSD. All rights reserved. http://www.OpenBSD.org OpenBSD 6.0-current (GENERIC) #1: Mon Aug 15 19:34:05 BRT 2016 dbolgher...@wbs.my.domain:/usr/src/sys/arch/armv7/compile/GENERIC real mem = 1073741824 (1024MB) avail mem = 104448 (996MB) mainbus0 at root: Cubietech Cubieboard2 cpu0 at mainbus0: ARM Cortex A7 rev 4 (ARMv7 core) cpu0: DC enabled IC enabled WB disabled EABT branch prediction enabled cpu0: 32KB(32b/l,2way) I-cache, 32KB(64b/l,4way) wr-back D-cache cortex0 at mainbus0 sunxi0 at mainbus0 sxipio0 at sunxi0: 175 pins sxiccmu0 at sunxi0 gpio0 at sxipio0: 18 pins gpio1 at sxipio0: 24 pins gpio2 at sxipio0: 25 pins gpio3 at sxipio0: 28 pins gpio4 at sxipio0: 12 pins gpio5 at sxipio0: 6 pins gpio6 at sxipio0: 12 pins gpio7 at sxipio0: 28 pins gpio8 at sxipio0: 22 pins agtimer0 at mainbus0: tick rate 24000 KHz simplebus0 at mainbus0: "soc" ehci0 at simplebus0 usb0 at ehci0: USB revision 2.0 uhub0 at usb0 "Allwinner EHCI root hub" rev 2.00/1.00 addr 1 sxiahci0 at simplebus0: AHCI 1.1 sxiahci0: port 0: 3.0Gb/s scsibus0 at sxiahci0: 32 targets sd0 at scsibus0 targ 0 lun 0:SCSI3 0/direct fixed naa.5391d4f841be sd0: 114473MB, 512 bytes/sector, 234441648 sectors ehci1 at simplebus0 usb1 at ehci1: USB revision 2.0 uhub1 at usb1 "Allwinner EHCI root hub" rev 2.00/1.00 addr 1 sxidog0 at simplebus0 sxirtc0 at simplebus0
Re: finer grained art locking again
> On 16 Aug 2016, at 08:28, Mark Ketteniswrote: > >> Date: Tue, 16 Aug 2016 08:21:42 +1000 >> From: Jonathan Matthew >> >> On Mon, Aug 15, 2016 at 02:47:29PM +0200, Mark Kettenis wrote: Date: Mon, 15 Aug 2016 22:08:16 +1000 From: Jonathan Matthew This removes ART's reliance on the kernel lock to serialise updates. I sent out an earlier version of this prior to 6.0, but it didn't make it in. Since then, we've decided to go with rwlocks in the packet processing path, so here's a version with an rwlock instead of a mutex. >>> >>> Merely out of curiousity: is there any reason to only use write locks >>> here? Naively you'd think this would be a case where making use of >>> the multiple-reader, single-writer property of rwlocks would make >>> sense... >> >> The read operations that need to be fast don't go through the lock >> at all. We could use read locks for rtable_walk, but we're unlikely >> to see concurrent rtable_walks, and I'd prefer to keep the lock use >> as simple as possible for now. art_walk modifies the table refcnts, which are serialised by the lock. taking a write lock is still necessary. > Thanks for the explanation.
Re: finer grained art locking again
> Date: Tue, 16 Aug 2016 08:21:42 +1000 > From: Jonathan Matthew> > On Mon, Aug 15, 2016 at 02:47:29PM +0200, Mark Kettenis wrote: > > > Date: Mon, 15 Aug 2016 22:08:16 +1000 > > > From: Jonathan Matthew > > > > > > This removes ART's reliance on the kernel lock to serialise > > > updates. I sent out an earlier version of this prior to 6.0, > > > but it didn't make it in. Since then, we've decided to go with > > > rwlocks in the packet processing path, so here's a version with > > > an rwlock instead of a mutex. > > > > Merely out of curiousity: is there any reason to only use write locks > > here? Naively you'd think this would be a case where making use of > > the multiple-reader, single-writer property of rwlocks would make > > sense... > > The read operations that need to be fast don't go through the lock > at all. We could use read locks for rtable_walk, but we're unlikely > to see concurrent rtable_walks, and I'd prefer to keep the lock use > as simple as possible for now. Thanks for the explanation.
Re: finer grained art locking again
On Mon, Aug 15, 2016 at 02:47:29PM +0200, Mark Kettenis wrote: > > Date: Mon, 15 Aug 2016 22:08:16 +1000 > > From: Jonathan Matthew> > > > This removes ART's reliance on the kernel lock to serialise updates. > > I sent out an earlier version of this prior to 6.0, but it didn't make it > > in. > > Since then, we've decided to go with rwlocks in the packet processing path, > > so here's a version with an rwlock instead of a mutex. > > Merely out of curiousity: is there any reason to only use write locks > here? Naively you'd think this would be a case where making use of > the multiple-reader, single-writer property of rwlocks would make > sense... The read operations that need to be fast don't go through the lock at all. We could use read locks for rtable_walk, but we're unlikely to see concurrent rtable_walks, and I'd prefer to keep the lock use as simple as possible for now.
Re: rwlock physlock mem.c
On Mon, Aug 15, 2016 at 06:10:55PM -0400, Ted Unangst wrote: > same change as for i386, but now for arm and sparcs. > ok mlarkin > Index: arm/arm/mem.c > === > RCS file: /cvs/src/sys/arch/arm/arm/mem.c,v > retrieving revision 1.16 > diff -u -p -r1.16 mem.c > --- arm/arm/mem.c 15 Aug 2016 22:01:59 - 1.16 > +++ arm/arm/mem.c 15 Aug 2016 22:10:03 - > @@ -81,6 +81,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -89,7 +90,6 @@ > > extern char *memhook;/* poor name! */ > caddr_t zeropage; > -int physlock; > > /* open counter for aperture */ > #ifdef APERTURE > @@ -142,6 +142,7 @@ mmclose(dev_t dev, int flag, int mode, s > int > mmrw(dev_t dev, struct uio *uio, int flags) > { > + static struct rwlock physlock = RWLOCK_INITIALIZER("mmrw"); > vaddr_t o, v; > size_t c; > struct iovec *iov; > @@ -150,14 +151,10 @@ mmrw(dev_t dev, struct uio *uio, int fla > > if (minor(dev) == DEV_MEM) { > /* lock against other uses of shared vmmap */ > - while (physlock > 0) { > - physlock++; > - error = tsleep((caddr_t), PZERO | PCATCH, > - "mmrw", 0); > - if (error) > - return (error); > - } > - physlock = 1; > + error = rw_enter(, RW_WRITE | RW_INTR); > + if (error) > + return (error); > + > } > while (uio->uio_resid > 0 && error == 0) { > iov = uio->uio_iov; > @@ -216,9 +213,7 @@ mmrw(dev_t dev, struct uio *uio, int fla > } > } > if (minor(dev) == DEV_MEM) { > - if (physlock > 1) > - wakeup((caddr_t)); > - physlock = 0; > + rw_exit(); > } > return (error); > } > Index: sparc/sparc/mem.c > === > RCS file: /cvs/src/sys/arch/sparc/sparc/mem.c,v > retrieving revision 1.27 > diff -u -p -r1.27 mem.c > --- sparc/sparc/mem.c 15 Aug 2016 22:01:59 - 1.27 > +++ sparc/sparc/mem.c 15 Aug 2016 22:10:03 - > @@ -47,6 +47,7 @@ > #include > #include > #include > +#include > #include > > #include > @@ -86,24 +87,19 @@ mmclose(dev_t dev, int flag, int mode, s > int > mmrw(dev_t dev, struct uio *uio, int flags) > { > + static struct rwlock physlock = RWLOCK_INITIALIZER("mmrw"); > off_t o; > paddr_t pa; > vaddr_t va; > size_t c; > struct iovec *iov; > int error = 0; > - static int physlock; > > if (minor(dev) == 0) { > /* lock against other uses of shared mem_page */ > - while (physlock > 0) { > - physlock++; > - error = tsleep((caddr_t), PZERO | PCATCH, > - "mmrw", 0); > - if (error) > - return (error); > - } > - physlock = 1; > + error = rw_enter(, RW_WRITE | RW_INTR); > + if (error) > + return (error); > if (mem_page == 0) > mem_page = uvm_km_valloc_wait(kernel_map, NBPG); > if (mem_page == 0) > @@ -193,9 +189,7 @@ mmrw(dev_t dev, struct uio *uio, int fla > } > if (minor(dev) == 0) { > unlock: > - if (physlock > 1) > - wakeup((caddr_t)); > - physlock = 0; > + rw_exit(); > } > return (error); > } > Index: sparc64/sparc64/mem.c > === > RCS file: /cvs/src/sys/arch/sparc64/sparc64/mem.c,v > retrieving revision 1.16 > diff -u -p -r1.16 mem.c > --- sparc64/sparc64/mem.c 15 Aug 2016 22:01:59 - 1.16 > +++ sparc64/sparc64/mem.c 15 Aug 2016 22:10:03 - > @@ -47,6 +47,7 @@ > #include > #include > #include > +#include > #include > > #include > @@ -75,24 +76,19 @@ mmclose(dev_t dev, int flag, int mode, s > int > mmrw(dev_t dev, struct uio *uio, int flags) > { > + static struct rwlock physlock = RWLOCK_INITIALIZER("mmrw"); > vaddr_t o, v; > size_t c; > struct iovec *iov; > int error = 0; > - static int physlock; > vm_prot_t prot; > extern caddr_t vmmap; > > if (minor(dev) == 0) { > /* lock against other uses of shared vmmap */ > - while (physlock > 0) { > - physlock++; > - error = tsleep((caddr_t), PZERO | PCATCH, > - "mmrw", 0); > - if (error) > - return (error); > - } > - physlock = 1; > + error = rw_enter(, RW_WRITE | RW_INTR); > + if (error) > +
rwlock physlock mem.c
same change as for i386, but now for arm and sparcs. Index: arm/arm/mem.c === RCS file: /cvs/src/sys/arch/arm/arm/mem.c,v retrieving revision 1.16 diff -u -p -r1.16 mem.c --- arm/arm/mem.c 15 Aug 2016 22:01:59 - 1.16 +++ arm/arm/mem.c 15 Aug 2016 22:10:03 - @@ -81,6 +81,7 @@ #include #include #include +#include #include #include @@ -89,7 +90,6 @@ extern char *memhook;/* poor name! */ caddr_t zeropage; -int physlock; /* open counter for aperture */ #ifdef APERTURE @@ -142,6 +142,7 @@ mmclose(dev_t dev, int flag, int mode, s int mmrw(dev_t dev, struct uio *uio, int flags) { + static struct rwlock physlock = RWLOCK_INITIALIZER("mmrw"); vaddr_t o, v; size_t c; struct iovec *iov; @@ -150,14 +151,10 @@ mmrw(dev_t dev, struct uio *uio, int fla if (minor(dev) == DEV_MEM) { /* lock against other uses of shared vmmap */ - while (physlock > 0) { - physlock++; - error = tsleep((caddr_t), PZERO | PCATCH, - "mmrw", 0); - if (error) - return (error); - } - physlock = 1; + error = rw_enter(, RW_WRITE | RW_INTR); + if (error) + return (error); + } while (uio->uio_resid > 0 && error == 0) { iov = uio->uio_iov; @@ -216,9 +213,7 @@ mmrw(dev_t dev, struct uio *uio, int fla } } if (minor(dev) == DEV_MEM) { - if (physlock > 1) - wakeup((caddr_t)); - physlock = 0; + rw_exit(); } return (error); } Index: sparc/sparc/mem.c === RCS file: /cvs/src/sys/arch/sparc/sparc/mem.c,v retrieving revision 1.27 diff -u -p -r1.27 mem.c --- sparc/sparc/mem.c 15 Aug 2016 22:01:59 - 1.27 +++ sparc/sparc/mem.c 15 Aug 2016 22:10:03 - @@ -47,6 +47,7 @@ #include #include #include +#include #include #include @@ -86,24 +87,19 @@ mmclose(dev_t dev, int flag, int mode, s int mmrw(dev_t dev, struct uio *uio, int flags) { + static struct rwlock physlock = RWLOCK_INITIALIZER("mmrw"); off_t o; paddr_t pa; vaddr_t va; size_t c; struct iovec *iov; int error = 0; - static int physlock; if (minor(dev) == 0) { /* lock against other uses of shared mem_page */ - while (physlock > 0) { - physlock++; - error = tsleep((caddr_t), PZERO | PCATCH, - "mmrw", 0); - if (error) - return (error); - } - physlock = 1; + error = rw_enter(, RW_WRITE | RW_INTR); + if (error) + return (error); if (mem_page == 0) mem_page = uvm_km_valloc_wait(kernel_map, NBPG); if (mem_page == 0) @@ -193,9 +189,7 @@ mmrw(dev_t dev, struct uio *uio, int fla } if (minor(dev) == 0) { unlock: - if (physlock > 1) - wakeup((caddr_t)); - physlock = 0; + rw_exit(); } return (error); } Index: sparc64/sparc64/mem.c === RCS file: /cvs/src/sys/arch/sparc64/sparc64/mem.c,v retrieving revision 1.16 diff -u -p -r1.16 mem.c --- sparc64/sparc64/mem.c 15 Aug 2016 22:01:59 - 1.16 +++ sparc64/sparc64/mem.c 15 Aug 2016 22:10:03 - @@ -47,6 +47,7 @@ #include #include #include +#include #include #include @@ -75,24 +76,19 @@ mmclose(dev_t dev, int flag, int mode, s int mmrw(dev_t dev, struct uio *uio, int flags) { + static struct rwlock physlock = RWLOCK_INITIALIZER("mmrw"); vaddr_t o, v; size_t c; struct iovec *iov; int error = 0; - static int physlock; vm_prot_t prot; extern caddr_t vmmap; if (minor(dev) == 0) { /* lock against other uses of shared vmmap */ - while (physlock > 0) { - physlock++; - error = tsleep((caddr_t), PZERO | PCATCH, - "mmrw", 0); - if (error) - return (error); - } - physlock = 1; + error = rw_enter(, RW_WRITE | RW_INTR); + if (error) + return (error); } while (uio->uio_resid > 0 && error == 0) { int n; @@ -169,9 +165,7 @@ mmrw(dev_t dev, struct uio *uio, int fla } if (minor(dev) == 0)
com(4) UART type probing
It looks like we're going to use com(4) on armv7 after all. Given that we have a device tree there, we have a pretty good idea what UART type we're driving. So we should be able to avoid the flaky probing code in com(4). Here is a diff that makes the driver only do the probing if the uart type was set to COM_UART_UNKNOWN at the start. This is the default if you don't explicitly set it in the bus glue part of the driver. There are a few bits of glue code that do set te uart type. Of those only arch/octeon/dev/cn30xxuart.c sets it to a value that makes com(4) go through the probing codepath. Can somebody verify that this diff doesn't break octeon? Thanks, Mark Index: dev/ic/com.c === RCS file: /cvs/src/sys/dev/ic/com.c,v retrieving revision 1.158 diff -u -p -r1.158 com.c --- dev/ic/com.c14 Sep 2014 14:17:24 - 1.158 +++ dev/ic/com.c15 Aug 2016 21:23:58 - @@ -1583,6 +1583,7 @@ com_attach_subr(struct com_softc *sc) { bus_space_tag_t iot = sc->sc_iot; bus_space_handle_t ioh = sc->sc_ioh; + int probe = 0; u_int8_t lcr; sc->sc_ier = 0; @@ -1616,7 +1617,7 @@ com_attach_subr(struct com_softc *sc) /* * Skip specific probes if attachment code knows it already. */ - if (sc->sc_uarttype == COM_UART_UNKNOWN) + if (sc->sc_uarttype == COM_UART_UNKNOWN) { switch (bus_space_read_1(iot, ioh, com_iir) >> 6) { case 0: sc->sc_uarttype = COM_UART_16450; @@ -1631,8 +1632,11 @@ com_attach_subr(struct com_softc *sc) sc->sc_uarttype = COM_UART_UNKNOWN; break; } + probe = 1; + } - if (sc->sc_uarttype == COM_UART_16550A) { /* Probe for ST16650s */ + /* Probe for ST16650s */ + if (probe && sc->sc_uarttype == COM_UART_16550A) { bus_space_write_1(iot, ioh, com_lcr, lcr | LCR_DLAB); if (bus_space_read_1(iot, ioh, com_efr) == 0) { bus_space_write_1(iot, ioh, com_efr, EFR_CTS); @@ -1647,7 +1651,8 @@ com_attach_subr(struct com_softc *sc) } #if 0 /* until com works with large FIFOs */ - if (sc->sc_uarttype == COM_UART_ST16650V2) { /* Probe for XR16850s */ + /* Probe for XR16850s */ + if (probe && sc->sc_uarttype == COM_UART_ST16650V2) { u_int8_t dlbl, dlbh; /* Enable latch access and get the current values. */ @@ -1670,7 +1675,8 @@ com_attach_subr(struct com_softc *sc) } #endif - if (sc->sc_uarttype == COM_UART_16550A) { /* Probe for TI16750s */ + /* Probe for TI16750s */ + if (probe && sc->sc_uarttype == COM_UART_16550A) { bus_space_write_1(iot, ioh, com_lcr, lcr | LCR_DLAB); bus_space_write_1(iot, ioh, com_fifo, FIFO_ENABLE | FIFO_ENABLE_64BYTE); @@ -1686,7 +1692,9 @@ com_attach_subr(struct com_softc *sc) /* Reset the LCR (latch access is probably enabled). */ bus_space_write_1(iot, ioh, com_lcr, lcr); - if (sc->sc_uarttype == COM_UART_16450) { /* Probe for 8250 */ + + /* Probe for 8250 */ + if (probe && sc->sc_uarttype == COM_UART_16450) { u_int8_t scr0, scr1, scr2; scr0 = bus_space_read_1(iot, ioh, com_scratch);
Re: fuse cache shenanigans
Yeah, ok in that context, sure.. since this is userspace shit the caching if any should probably happen there. On Mon, Aug 15, 2016 at 2:20 PM, Ted Unangstwrote: > Bob Beck wrote: > > Note - NFS has similar behaviour ;) > > > > at least within a directory. - so this isn't tht "unusual" for non-local > > > > I'm wondering if this isn't a bit premature Have you looked for other > side > > effects of this removal? > > I think the nature of FUSE is that it's used with even "faker" filesystems, > such that the backend may not even have filesystem like semantics. We > should > be careful what we assume. > > Also, the fuse daemon can probably do a better job of caching because it > has > more specialized knowledge. It can cache more than names, too. >
Re: fuse cache shenanigans
Bob Beck wrote: > Note - NFS has similar behaviour ;) > > at least within a directory. - so this isn't tht "unusual" for non-local > > I'm wondering if this isn't a bit premature Have you looked for other side > effects of this removal? I think the nature of FUSE is that it's used with even "faker" filesystems, such that the backend may not even have filesystem like semantics. We should be careful what we assume. Also, the fuse daemon can probably do a better job of caching because it has more specialized knowledge. It can cache more than names, too.
Re: fuse cache shenanigans
Note - NFS has similar behaviour ;) at least within a directory. - so this isn't tht "unusual" for non-local I'm wondering if this isn't a bit premature Have you looked for other side effects of this removal? On Mon, Aug 15, 2016 at 1:52 PM, Ted Unangstwrote: > Martin Natano wrote: > > Watch this: > > > > $ doas sshfs natano@localhost:/tmp /mnt > > $ cat /mnt/foo > > cat: /mnt/foo: No such file or directory > > $ echo bar > /tmp/foo > > $ cat /mnt/foo > > cat: /mnt/foo: No such file or directory > > $ touch /mnt/foo > > $ cat /mnt/foo > > bar > > $ > > > > There is no sense in doing caching in fusefs. In case of a non-local > > filesystem the tree can change behind our back without us having a > > chance to purge the cache. > > > > "The only winning move is not to play." Ok? > > ok > >
armv7 cache flushing: don't take shortcuts
The functions that clean/invalidate the caches by virtual address, bail out after cleaning 32k worth of data. The 32k matches the L1 cache of most of the CPUs we current run on. But the Cortex-A7 has an integrated L2 cache that is larger. And if you only flush it partially you may get into trouble. And now that we actually use the cache that matters. Many of the more recent ARMv7 CPUs include such a L2 cache. And some of them even have L1 caches that are larger than 32k. So drop the shortcut and simply clean/invalidate what we were asked to clean/invalidate. Most of the calls should be covering a single page or less anyway. This fixes the core dumps and illegal instructions that I see when booting from a SATA disk. ok? Index: arch/arm/arm/cpufunc_asm_armv7.S === RCS file: /cvs/src/sys/arch/arm/arm/cpufunc_asm_armv7.S,v retrieving revision 1.13 diff -u -p -r1.13 cpufunc_asm_armv7.S --- arch/arm/arm/cpufunc_asm_armv7.S6 Aug 2016 16:46:25 - 1.13 +++ arch/arm/arm/cpufunc_asm_armv7.S15 Aug 2016 19:45:53 - @@ -103,8 +103,6 @@ ENTRY(armv7_tlb_flushD) i_inc .req r3 ENTRY(armv7_icache_sync_range) ldr ip, .Larmv7_icache_line_size - cmp r1, #0x8000 - movcs r1, #0x8000 /* XXX needs to match cache size... */ ldr ip, [ip] sub r1, r1, #1 /* Don't overrun */ sub r3, ip, #1 @@ -136,8 +134,6 @@ ENTRY(armv7_icache_sync_all) ENTRY(armv7_dcache_wb_range) ldr ip, .Larmv7_dcache_line_size - cmp r1, #0x8000 - movcs r1, #0x8000 /* XXX needs to match cache size... */ ldr ip, [ip] sub r1, r1, #1 /* Don't overrun */ sub r3, ip, #1 @@ -155,8 +151,6 @@ ENTRY(armv7_dcache_wb_range) ENTRY(armv7_idcache_wbinv_range) ldr ip, .Larmv7_idcache_line_size - cmp r1, #0x8000 - movcs r1, #0x8000 /* XXX needs to match cache size... */ ldr ip, [ip] sub r1, r1, #1 /* Don't overrun */ sub r3, ip, #1 @@ -177,8 +171,6 @@ ENTRY(armv7_idcache_wbinv_range) ENTRY(armv7_dcache_wbinv_range) ldr ip, .Larmv7_dcache_line_size - cmp r1, #0x8000 - movcs r1, #0x8000 /* XXX needs to match cache size... */ ldr ip, [ip] sub r1, r1, #1 /* Don't overrun */ sub r3, ip, #1 @@ -198,8 +190,6 @@ ENTRY(armv7_dcache_wbinv_range) ENTRY(armv7_dcache_inv_range) ldr ip, .Larmv7_dcache_line_size - cmp r1, #0x8000 - movcs r1, #0x8000 /* XXX needs to match cache size... */ ldr ip, [ip] sub r1, r1, #1 /* Don't overrun */ sub r3, ip, #1
Re: fuse cache shenanigans
Martin Natano wrote: > Watch this: > > $ doas sshfs natano@localhost:/tmp /mnt > $ cat /mnt/foo > cat: /mnt/foo: No such file or directory > $ echo bar > /tmp/foo > $ cat /mnt/foo > cat: /mnt/foo: No such file or directory > $ touch /mnt/foo > $ cat /mnt/foo > bar > $ > > There is no sense in doing caching in fusefs. In case of a non-local > filesystem the tree can change behind our back without us having a > chance to purge the cache. > > "The only winning move is not to play." Ok? ok
fuse cache shenanigans
Watch this: $ doas sshfs natano@localhost:/tmp /mnt $ cat /mnt/foo cat: /mnt/foo: No such file or directory $ echo bar > /tmp/foo $ cat /mnt/foo cat: /mnt/foo: No such file or directory $ touch /mnt/foo $ cat /mnt/foo bar $ There is no sense in doing caching in fusefs. In case of a non-local filesystem the tree can change behind our back without us having a chance to purge the cache. "The only winning move is not to play." Ok? natano Index: miscfs/fuse/fuse_lookup.c === RCS file: /cvs/src/sys/miscfs/fuse/fuse_lookup.c,v retrieving revision 1.12 diff -u -p -r1.12 fuse_lookup.c --- miscfs/fuse/fuse_lookup.c 12 Aug 2016 20:18:44 - 1.12 +++ miscfs/fuse/fuse_lookup.c 15 Aug 2016 10:01:41 - @@ -64,9 +64,6 @@ fusefs_lookup(void *v) (cnp->cn_nameiop == DELETE || cnp->cn_nameiop == RENAME)) return (EROFS); - if ((error = cache_lookup(vdp, vpp, cnp)) >= 0) - return (error); - if (flags & ISDOTDOT) { /* got ".." */ nid = dp->parent; @@ -120,10 +117,8 @@ fusefs_lookup(void *v) * Write access to directory required to delete files. */ error = VOP_ACCESS(vdp, VWRITE, cred, cnp->cn_proc); - if (error != 0) { - fb_delete(fbuf); - return (error); - } + if (error) + goto out; cnp->cn_flags |= SAVENAME; } @@ -132,10 +127,8 @@ fusefs_lookup(void *v) /* * Write access to directory required to delete files. */ - if ((error = VOP_ACCESS(vdp, VWRITE, cred, cnp->cn_proc)) != 0) { - fb_delete(fbuf); - return (error); - } + if ((error = VOP_ACCESS(vdp, VWRITE, cred, cnp->cn_proc)) != 0) + goto out; if (nid == VTOI(vdp)->ufs_ino.i_number) { error = EISDIR; @@ -187,10 +180,8 @@ fusefs_lookup(void *v) update_vattr(fmp->mp, >fb_vattr); - if (error) { - fb_delete(fbuf); - return (error); - } + if (error) + goto out; if (vdp != NULL && vdp->v_type == VDIR) VTOI(tdp)->parent = dp->ufs_ino.i_number; @@ -204,10 +195,6 @@ fusefs_lookup(void *v) } out: - if ((cnp->cn_flags & MAKEENTRY) && nameiop != CREATE && - nameiop != DELETE) - cache_enter(vdp, *vpp, cnp); - fb_delete(fbuf); return (error); } Index: miscfs/fuse/fuse_vnops.c === RCS file: /cvs/src/sys/miscfs/fuse/fuse_vnops.c,v retrieving revision 1.29 diff -u -p -r1.29 fuse_vnops.c --- miscfs/fuse/fuse_vnops.c12 Aug 2016 20:18:44 - 1.29 +++ miscfs/fuse/fuse_vnops.c15 Aug 2016 10:01:41 - @@ -857,7 +857,6 @@ fusefs_reclaim(void *v) * Remove the inode from its hash chain. */ ufs_ihashrem(>ufs_ino); - cache_purge(vp); free(ip, M_FUSEFS, 0); vp->v_data = NULL; @@ -1386,11 +1385,9 @@ fusefs_rmdir(void *v) goto out; } - cache_purge(dvp); vput(dvp); dvp = NULL; - cache_purge(ITOV(ip)); fb_delete(fbuf); out: if (dvp)
httpd "Persistent connection issue in FastCGI", anyone?
Hi, has somebody else seen something like "Persistent connection issue in FastCGI" as described in https://github.com/reyk/httpd/issues/49 ? Reyk
Re: typo in bytgpio(4) manpage
On Mon, Aug 15, 2016 at 02:15:27PM -0300, Daniel Bolgheroni wrote: > Typo. > > s/drirect/direct/ It's in - Thank you! > Index: bytgpio.4 > === > RCS file: /cvs/src/share/man/man4/bytgpio.4,v > retrieving revision 1.2 > diff -u -p -r1.2 bytgpio.4 > --- bytgpio.4 28 Mar 2016 20:08:56 - 1.2 > +++ bytgpio.4 15 Aug 2016 17:11:31 - > @@ -27,7 +27,7 @@ The > .Nm > driver provides support for the GPIO controllers found on Intel's Bay > Trail SoC. > -It does not provide drirect device driver entry points but makes its > +It does not provide direct device driver entry points but makes its > functions available to > .Xr acpi 4 . > .Sh SEE ALSO > > -- > db
typo in bytgpio(4) manpage
Typo. s/drirect/direct/ Index: bytgpio.4 === RCS file: /cvs/src/share/man/man4/bytgpio.4,v retrieving revision 1.2 diff -u -p -r1.2 bytgpio.4 --- bytgpio.4 28 Mar 2016 20:08:56 - 1.2 +++ bytgpio.4 15 Aug 2016 17:11:31 - @@ -27,7 +27,7 @@ The .Nm driver provides support for the GPIO controllers found on Intel's Bay Trail SoC. -It does not provide drirect device driver entry points but makes its +It does not provide direct device driver entry points but makes its functions available to .Xr acpi 4 . .Sh SEE ALSO -- db
Re: ed(1) exit without saving if output file is a command
On Mon, Aug 15, 2016 at 06:26:44PM +0200, Theo Buehler wrote: > On Mon, Aug 15, 2016 at 03:33:38PM +0200, Jérôme FRGACIC wrote: > > Hello tech, > > > > I recently use ed(1) to transmit some input lines to another command. > > However, I remark that after the 'w' command, I can exit ed without any > > warnings even if the data were not saved into a file. > > > > Here is an example : > > > > $ ed > > P > > *a > > A simple line. > > . > > *w !sed 's/^/#/' > > #A simple line. > > 15 > > *q > > $ # No warning before exit > > > > This behaviour seems not conform to the Open Group Base Specifications > > Issue 6 (IEEE Std 1003.1, 2004 Edition). I would suggest this diff to > > solve this problem. > > POSIX says the following about the quit command: > >The q command shall cause ed to exit. If the buffer has changed since >the last time the entire buffer was written, the user shall be warned, >as described previously. > > where 'previously' means: > >If changes have been made in the buffer since the last w command that >wrote the entire buffer, ed shall warn the user if an attempt is made >to destroy the editor buffer via the e or q commands. >[...] > > You did write the entire buffer, namely to the stdin of the sed command, > and afterwards you didn't change the buffer anymore, so I think this is > compliant with what is quoted above. > > FWIW, the GNU ed port (ged) behaves in exactly the same way. > > Could you please be more specific what part of the standard you have in > mind? Oh, there's this in the description of the w command, sorry: If file begins with '!', the rest of the line shall be taken to be a shell command line whose standard input shall be the addressed lines. Such a shell command line shall not be remembered as the current pathname. This usage of the write command with '!' shall not be considered as a ``last w command that wrote the entire buffer'', as described previously; thus, this alone shall not prevent the warning to the user if an attempt is made to destroy the editor buffer via the e or q commands. Please ignore my previous objection. > > > > Index: main.c > > === > > RCS file: /cvs/src/bin/ed/main.c,v > > retrieving revision 1.57 > > diff -r1.57 main.c > > 862c862 > > < else if (addr == addr_last) > > --- > > > else if (addr == addr_last && *fnp != '!') > > > > > > PS : I haven't subcribe to the tech mailing list, so please add me as > > recipient if you reply. > > > > > > Kind regards, > > > > > > Jérôme FRGACIC > > >
Re: ed(1) exit without saving if output file is a command
On Mon, Aug 15, 2016 at 03:33:38PM +0200, Jérôme FRGACIC wrote: > Hello tech, > > I recently use ed(1) to transmit some input lines to another command. > However, I remark that after the 'w' command, I can exit ed without any > warnings even if the data were not saved into a file. > > Here is an example : > > $ ed > P > *a > A simple line. > . > *w !sed 's/^/#/' > #A simple line. > 15 > *q > $ # No warning before exit > > This behaviour seems not conform to the Open Group Base Specifications > Issue 6 (IEEE Std 1003.1, 2004 Edition). I would suggest this diff to > solve this problem. POSIX says the following about the quit command: The q command shall cause ed to exit. If the buffer has changed since the last time the entire buffer was written, the user shall be warned, as described previously. where 'previously' means: If changes have been made in the buffer since the last w command that wrote the entire buffer, ed shall warn the user if an attempt is made to destroy the editor buffer via the e or q commands. [...] You did write the entire buffer, namely to the stdin of the sed command, and afterwards you didn't change the buffer anymore, so I think this is compliant with what is quoted above. FWIW, the GNU ed port (ged) behaves in exactly the same way. Could you please be more specific what part of the standard you have in mind? > > Index: main.c > === > RCS file: /cvs/src/bin/ed/main.c,v > retrieving revision 1.57 > diff -r1.57 main.c > 862c862 > < else if (addr == addr_last) > --- > > else if (addr == addr_last && *fnp != '!') > > > PS : I haven't subcribe to the tech mailing list, so please add me as > recipient if you reply. > > > Kind regards, > > > Jérôme FRGACIC >
ed(1) exit without saving if output file is a command
Hello tech, I recently use ed(1) to transmit some input lines to another command. However, I remark that after the 'w' command, I can exit ed without any warnings even if the data were not saved into a file. Here is an example : $ ed P *a A simple line. . *w !sed 's/^/#/' #A simple line. 15 *q $ # No warning before exit This behaviour seems not conform to the Open Group Base Specifications Issue 6 (IEEE Std 1003.1, 2004 Edition). I would suggest this diff to solve this problem. Index: main.c === RCS file: /cvs/src/bin/ed/main.c,v retrieving revision 1.57 diff -r1.57 main.c 862c862 < else if (addr == addr_last) --- > else if (addr == addr_last && *fnp != '!') PS : I haven't subcribe to the tech mailing list, so please add me as recipient if you reply. Kind regards, Jérôme FRGACIC
Re: httpd: be stricter with TLS configuration
On Mon, Aug 15, 2016 at 11:33:18PM +1000, Joel Sing wrote: > On Monday 15 August 2016 13:04:43 Reyk Floeter wrote: > > On Sat, Aug 13, 2016 at 02:57:14AM +1000, Joel Sing wrote: > > > The following diff makes httpd stricter with respect to TLS configuration: > > > > > > - Do not allow TLS and non-TLS to be configured on the same port. > > > - Do not allow TLS options to be specified without a TLS listener. > > > - Ensure that TLS options are the same when a server is specified on the > > > > > > same address/port. > > > > > > Currently, these configurations are permitted but do not work as intended. > > > > > > This also factors out (and reuses) the server matching code that was > > > already duplicated. > > > > > > ok? > > > > - I think server_match() and server_tls_cmp() can both live in > > server.c (server_match() somewhere close to server_foreach() - this > > match function can be used for at least one other case outside of > > parse.y). > > I've moved server_tls_cmp() to server.c, however I'm not sure it makes sense > for server_match() since it operates on conf (a global declared in parse.y) > and I do not see any identical matching (the closest seems to be the code in > server_privinit(), but it still differs). > you can use env-> instead of conf-> there. I think server_privinit() could be reworked to use it. In other words: I don't think that server_match() is something that is specific to the parser, so it shouldn't be in the parse.y file. Reyk > > - As discussed before, for consistency with the config, please use > > "tls" instead of "TLS" in the log messages. > > Agreed, I'll handle this in a separate commit. > > > FYI, The SNI diff doesn't like the tls_cert_file and tls_key_file > > checks in server_tls_cmp(), as they now become valid, but they can be > > removed/changed later. > > Yes, they are independent diffs and we'll need to relax the restrictions > slightly when we enable SNI. > > > Otherwise OK reyk@ > > Thanks. --
bpf(4), csignal() and selwakup()
I'd like to make sure that bpf_tap(9) does not grab the KERNEL_LOCK(). The reason is to reduce the potential lock ordering problems within PF. I'm currently using a mutex to serialize buffer changes, but since bpf_wakeup() will still need the KERNEL_LOCK(), I'm using a task for that. Diff below only introduce this task. I'd like to commit it first to make sure possible regressions are fixed before playing with the mutex diff. Note that an acceptable race can occur with this diff. The value of ``bd_async'' and/or ``bd_sig'' might change between the bpf_wakeup() call and the check. ok? Index: net/bpf.c === RCS file: /cvs/src/sys/net/bpf.c,v retrieving revision 1.147 diff -u -p -r1.147 bpf.c --- net/bpf.c 15 Aug 2016 07:20:14 - 1.147 +++ net/bpf.c 15 Aug 2016 13:55:04 - @@ -57,6 +57,8 @@ #include #include #include +#include +#include #include #include @@ -103,6 +105,7 @@ int bpf_setif(struct bpf_d *, struct ifr intbpfpoll(dev_t, int, struct proc *); intbpfkqfilter(dev_t, struct knote *); void bpf_wakeup(struct bpf_d *); +void bpf_wakeup_cb(void *); void bpf_catchpacket(struct bpf_d *, u_char *, size_t, size_t, void (*)(const void *, void *, size_t), struct timeval *); void bpf_reset_d(struct bpf_d *); @@ -345,6 +348,7 @@ bpfopen(dev_t dev, int flag, int mode, s bd->bd_unit = unit; bd->bd_bufsize = bpf_bufsize; bd->bd_sig = SIGIO; + task_set(>bd_wake_task, bpf_wakeup_cb, bd); if (flag & FNONBLOCK) bd->bd_rtout = -1; @@ -515,12 +519,28 @@ bpfread(dev_t dev, struct uio *uio, int void bpf_wakeup(struct bpf_d *d) { - wakeup((caddr_t)d); + /* +* As long as csignal() and selwakeup() need to be protected +* by the KERNEL_LOCK() we have to delay the wakeup to +* another context to keep the hot path KERNEL_LOCK()-free. +*/ + bpf_get(d); + task_add(systq, >bd_wake_task); +} + +void +bpf_wakeup_cb(void *xd) +{ + struct bpf_d *d = xd; + + KERNEL_ASSERT_LOCKED(); + + wakeup(d); if (d->bd_async && d->bd_sig) - csignal(d->bd_pgid, d->bd_sig, - d->bd_siguid, d->bd_sigeuid); + csignal(d->bd_pgid, d->bd_sig, d->bd_siguid, d->bd_sigeuid); selwakeup(>bd_sel); + bpf_put(d); } int Index: net/bpfdesc.h === RCS file: /cvs/src/sys/net/bpfdesc.h,v retrieving revision 1.30 diff -u -p -r1.30 bpfdesc.h --- net/bpfdesc.h 30 Mar 2016 12:33:10 - 1.30 +++ net/bpfdesc.h 15 Aug 2016 13:55:04 - @@ -42,8 +42,6 @@ #ifdef _KERNEL -#include - /* * Descriptor associated with each open bpf file. */ @@ -90,6 +88,8 @@ struct bpf_d { struct selinfo bd_sel; /* bsd select info */ int bd_unit;/* logical unit number */ LIST_ENTRY(bpf_d) bd_list; /* descriptor list */ + + struct task bd_wake_task; /* delay csignal() and selwakeup() */ }; /*
Re: httpd: be stricter with TLS configuration
On Monday 15 August 2016 13:04:43 Reyk Floeter wrote: > On Sat, Aug 13, 2016 at 02:57:14AM +1000, Joel Sing wrote: > > The following diff makes httpd stricter with respect to TLS configuration: > > > > - Do not allow TLS and non-TLS to be configured on the same port. > > - Do not allow TLS options to be specified without a TLS listener. > > - Ensure that TLS options are the same when a server is specified on the > > > > same address/port. > > > > Currently, these configurations are permitted but do not work as intended. > > > > This also factors out (and reuses) the server matching code that was > > already duplicated. > > > > ok? > > - I think server_match() and server_tls_cmp() can both live in > server.c (server_match() somewhere close to server_foreach() - this > match function can be used for at least one other case outside of > parse.y). I've moved server_tls_cmp() to server.c, however I'm not sure it makes sense for server_match() since it operates on conf (a global declared in parse.y) and I do not see any identical matching (the closest seems to be the code in server_privinit(), but it still differs). > - As discussed before, for consistency with the config, please use > "tls" instead of "TLS" in the log messages. Agreed, I'll handle this in a separate commit. > FYI, The SNI diff doesn't like the tls_cert_file and tls_key_file > checks in server_tls_cmp(), as they now become valid, but they can be > removed/changed later. Yes, they are independent diffs and we'll need to relax the restrictions slightly when we enable SNI. > Otherwise OK reyk@ Thanks.
Re: finer grained art locking again
> Date: Mon, 15 Aug 2016 22:08:16 +1000 > From: Jonathan Matthew> > This removes ART's reliance on the kernel lock to serialise updates. > I sent out an earlier version of this prior to 6.0, but it didn't make it in. > Since then, we've decided to go with rwlocks in the packet processing path, > so here's a version with an rwlock instead of a mutex. Merely out of curiousity: is there any reason to only use write locks here? Naively you'd think this would be a case where making use of the multiple-reader, single-writer property of rwlocks would make sense... > Index: art.c > === > RCS file: /cvs/src/sys/net/art.c,v > retrieving revision 1.22 > diff -u -p -r1.22 art.c > --- art.c 19 Jul 2016 10:51:44 - 1.22 > +++ art.c 14 Aug 2016 11:10:45 - > @@ -152,6 +152,7 @@ art_alloc(unsigned int rtableid, unsigne > > ar->ar_off = off; > ar->ar_rtableid = rtableid; > + rw_init(>ar_lock, "art"); > > return (ar); > } > @@ -378,7 +379,7 @@ art_insert(struct art_root *ar, struct a > struct art_node *node; > int i, j; > > - KERNEL_ASSERT_LOCKED(); > + rw_assert_wrlock(>ar_lock); > KASSERT(plen >= 0 && plen <= ar->ar_alen); > > at = srp_get_locked(>ar_root); > @@ -482,7 +483,7 @@ art_delete(struct art_root *ar, struct a > struct art_node *node; > int i, j; > > - KERNEL_ASSERT_LOCKED(); > + rw_assert_wrlock(>ar_lock); > KASSERT(plen >= 0 && plen <= ar->ar_alen); > > at = srp_get_locked(>ar_root); > @@ -613,7 +614,7 @@ art_walk(struct art_root *ar, int (*f)(s > struct art_node *node; > int error = 0; > > - KERNEL_LOCK(); > + rw_enter_write(>ar_lock); > at = srp_get_locked(>ar_root); > if (at != NULL) { > art_table_ref(ar, at); > @@ -631,7 +632,7 @@ art_walk(struct art_root *ar, int (*f)(s > > art_table_free(ar, at); > } > - KERNEL_UNLOCK(); > + rw_exit_write(>ar_lock); > > return (error); > } > @@ -708,9 +709,9 @@ art_walk_apply(struct art_root *ar, > > if ((an != NULL) && (an != next)) { > /* this assumes an->an_dst is not used by f */ > - KERNEL_UNLOCK(); > + rw_exit_write(>ar_lock); > error = (*f)(an, arg); > - KERNEL_LOCK(); > + rw_enter_write(>ar_lock); > } > > return (error); > Index: art.h > === > RCS file: /cvs/src/sys/net/art.h,v > retrieving revision 1.14 > diff -u -p -r1.14 art.h > --- art.h 14 Jun 2016 04:42:02 - 1.14 > +++ art.h 14 Aug 2016 11:10:45 - > @@ -19,6 +19,8 @@ > #ifndef _NET_ART_H_ > #define _NET_ART_H_ > > +#include > + > #define ART_MAXLVL 32 /* We currently use 32 levels for IPv6. */ > > /* > @@ -26,6 +28,7 @@ > */ > struct art_root { > struct srp ar_root; /* First table */ > + struct rwlockar_lock; /* Serialise modifications */ > uint8_t ar_bits[ART_MAXLVL]; /* Per level stride */ > uint8_t ar_nlvl; /* Number of levels */ > uint8_t ar_alen; /* Address length in bits */ > Index: rtable.c > === > RCS file: /cvs/src/sys/net/rtable.c,v > retrieving revision 1.50 > diff -u -p -r1.50 rtable.c > --- rtable.c 19 Jul 2016 10:51:44 - 1.50 > +++ rtable.c 14 Aug 2016 11:10:45 - > @@ -515,6 +515,10 @@ static inline uint8_t*satoaddr(struct a > void rtentry_ref(void *, void *); > void rtentry_unref(void *, void *); > > +#ifndef SMALL_KERNEL > +void rtable_mpath_insert(struct art_node *, struct rtentry *); > +#endif > + > struct srpl_rc rt_rc = SRPL_RC_INITIALIZER(rtentry_ref, rtentry_unref, NULL); > > void > @@ -679,8 +683,6 @@ rtable_insert(unsigned int rtableid, str > unsigned int rt_flags; > int error = 0; > > - KERNEL_ASSERT_LOCKED(); > - > ar = rtable_get(rtableid, dst->sa_family); > if (ar == NULL) > return (EAFNOSUPPORT); > @@ -691,6 +693,7 @@ rtable_insert(unsigned int rtableid, str > return (EINVAL); > > rtref(rt); /* guarantee rtfree won't do anything during insert */ > + rw_enter_write(>ar_lock); > > #ifndef SMALL_KERNEL > /* Do not permit exactly the same dst/mask/gw pair. */ > @@ -766,15 +769,14 @@ rtable_insert(unsigned int rtableid, str > } > } > > - SRPL_INSERT_HEAD_LOCKED(_rc, >an_rtlist, rt, rt_next); > - > /* Put newly inserted entry at the right place. */ > - rtable_mpath_reprio(rtableid, dst, mask,
5GHz vs 2GHz APs
The wireless stack usually runs its scanning loop once per frequency band. It begins with 5GHz so that APs on this band are preferred. Within a band, an AP with the best RSSI (receive signal strength indicator) is chosen, after matching all other desired parameters such as the ESSID etc. I am not sure what the rationale was at the time this code was written but today there is a good reason to prefer 5GHz. This band is usually much less crowded and gives a much better experience as a result. Because 5GHz is scanned first we'll usually prefer this band. However, iwm(4) runs a scan across all bands at once so we pick an AP based on RSSI alone. In practice, we often end up choosing 2GHz APs instead of 5GHz APs, even in cases where 5GHz would work fine, just with a slightly lower RSSI. With this diff we prefer 5GHz APs with "good enough" RSSI over 2GHz APs. Index: ieee80211_node.c === RCS file: /cvs/src/sys/net80211/ieee80211_node.c,v retrieving revision 1.103 diff -u -p -r1.103 ieee80211_node.c --- ieee80211_node.c21 May 2016 09:07:11 - 1.103 +++ ieee80211_node.c15 Aug 2016 12:14:21 - @@ -622,12 +622,23 @@ ieee80211_end_scan(struct ifnet *ifp) ieee80211_free_node(ic, ni); continue; } - if (ieee80211_match_bss(ic, ni) == 0) { - if (selbs == NULL) - selbs = ni; - else if (ni->ni_rssi > selbs->ni_rssi) - selbs = ni; - } + if (ieee80211_match_bss(ic, ni) != 0) + continue; + + /* Pick the AP/IBSS match with the best RSSI. */ + if (selbs == NULL) + selbs = ni; + else if ((ic->ic_caps & IEEE80211_C_SCANALLBAND) && + IEEE80211_IS_CHAN_5GHZ(selbs->ni_chan) && + IEEE80211_IS_CHAN_2GHZ(ni->ni_chan) && + selbs->ni_rssi >= (ic->ic_max_rssi - (ic->ic_max_rssi / 4))) + /* +* Prefer 5GHz (with reasonable RSSI) over 2GHz since +* the 5GHz band is usually less saturated. +*/ + continue; + else if (ni->ni_rssi > selbs->ni_rssi) + selbs = ni; } if (selbs == NULL) goto notfound;
Re: per cpu memory in the kernel
> From: David Gwynne> Date: Sun, 14 Aug 2016 16:26:45 +1000 > > > On 13 Aug 2016, at 5:48 AM, Mark Kettenis wrote: > > > >> From: Martin Pieuchot > >> Date: Fri, 12 Aug 2016 20:44:04 +0200 > >> > >> On 08/11/16 06:43, David Gwynne wrote: > >>> ive been tinkering with per cpu memory in the kernel > >>> [...] > >> > >> I'd like to have more people comment on this because we need an MP-safe > >> way to handle counters in the network stack. > > > > I had a first look. Need to have a bit more time to digest this. > > > >>> im still debating whether the API should do protection against > >>> interrupts on the local cpu by handling spls for you. at the moment > >>> it is up to the caller to manually splfoo() and splx(), but im half > >>> convinced that cpumem_enter and _leave should do that on behalf of > >>> the caller. > >> > >> Let's see how it looks like when we have more usages of per cpu memory. > > > > The per-CPU counters API is likely to need it though. I'm less sure > > about the per-CPU memory API itself. > > there are a couple of arguments against it. > > one big reason i didnt put it in the api itself at the time was the > lack of MI splraise(). ive since fixed that on everything except arm > as far as i can tell. arm may be ok, but its hard to read and my > zaurus is dusty. it's worth figuring out though cos the pool parts > of the diff use splraise instead of a mutex per cpu. > > id also have to store the ipl somewhere inside struct cpumem, but be > clever about avoiding more indirection on the way to the per cpu > memory. so far the caller already has a pretty good idea of what > level it is or should be at, and generally keeps state on the stack > (unlike mutexes). And I suppose in most cases when updating the counters you'll already be at the desired spl. So I'm with mpi@ here. Let's stick to what you have now and revise after we have used it in a couple of places. > >>> ive added a wrapper around percpu memory for counters. basically > >>> you ask it for N counters, where each counter is a uint64_t. > >>> counters_enter will give you a per cpu reference to these N counters > >>> which you can increment and decrement as you wish. internally the > >>> api will version the per cpu counters so a reader can know when > >>> theyre consistent, which is important on 32bit archs (where 64bit > >>> ops arent necessarily atomic), or where you want several counters > >>> to be consistent with each other (like packet and byte counters). > >>> counters_read is provided to use the above machinery for a consistent > >>> read. > >> > >> Comment below. > >> > >>> secondly, there is a boot strapping problem with per cpu data > >>> structures, which is very apparent with the mbuf layer. the problem > >>> is we dont know how many cpus are in the system until we're halfway > >>> through attaching device drivers in the system. however, if we want > >>> to use percpu data structures during attach we need to know how > >>> many cpus we have. mbufs are allocated during attach, so we need > >>> to know how many cpus we have before attach. > >> > >> What about using ncpusfound? > > > > On sparc64 that gets called very early, before pmap_bootstrap(), > > because we need to allocate some per-CPU memory ;). > > > > On other systems we typically calculate it on the fly as we're > > attaching CPUs. That may not be early enough. > > i tried ncpusfound and MAXCPUS. ncpusfound is generally unavailable > before cpu_configure, and im trying to avoid wasting memory with > MAXCPUs. using global memory for the boot cpu and doing a "realloc" > after cpu_configure seems to work well. > > the second diff i mailed out has realloc continue to use the global > as the cpumem for cpu 0 and allocate more memory for 1 and up. this > means we only waste a pointer of memory for percpu memory that needs > to work straight after boot. I think that's perfectly acceptable. Hiwver, looking at the code xxx_realloc() might be a bit misleading. One could naively think that these interfaces could be used to change the size of the per-CPU memory area. But that's not the case isn't it? It also seems that you currently rely on each subsystem that uses per-CPU memory to call the appropriate xxx_realloc function itself. It might be feasable to do this automatically in percpu_init() by using ELF section magic to construct a list of CPUMEM_BOOT_MEMORY() declarations. But perhaps it is best to file that away as a future improvement for now. > on UP we will waste nothing cos we dont use struct cpumem, and avoid > needing to allocate anything further at runtime. > > > > >>> +uint64_t * > >>> +counters_enter(struct counters_ref *ref, struct cpumem *cm) > >>> +{ > >>> + ref->gen = cpumem_enter(cm); > >>> + (*ref->gen)++; /* make the generation number odd */ > >>> + return (ref->gen + 1); > >>> +} > >>> + > >>> +void > >>> +counters_leave(struct counters_ref *ref,
finer grained art locking again
This removes ART's reliance on the kernel lock to serialise updates. I sent out an earlier version of this prior to 6.0, but it didn't make it in. Since then, we've decided to go with rwlocks in the packet processing path, so here's a version with an rwlock instead of a mutex. Index: art.c === RCS file: /cvs/src/sys/net/art.c,v retrieving revision 1.22 diff -u -p -r1.22 art.c --- art.c 19 Jul 2016 10:51:44 - 1.22 +++ art.c 14 Aug 2016 11:10:45 - @@ -152,6 +152,7 @@ art_alloc(unsigned int rtableid, unsigne ar->ar_off = off; ar->ar_rtableid = rtableid; + rw_init(>ar_lock, "art"); return (ar); } @@ -378,7 +379,7 @@ art_insert(struct art_root *ar, struct a struct art_node *node; int i, j; - KERNEL_ASSERT_LOCKED(); + rw_assert_wrlock(>ar_lock); KASSERT(plen >= 0 && plen <= ar->ar_alen); at = srp_get_locked(>ar_root); @@ -482,7 +483,7 @@ art_delete(struct art_root *ar, struct a struct art_node *node; int i, j; - KERNEL_ASSERT_LOCKED(); + rw_assert_wrlock(>ar_lock); KASSERT(plen >= 0 && plen <= ar->ar_alen); at = srp_get_locked(>ar_root); @@ -613,7 +614,7 @@ art_walk(struct art_root *ar, int (*f)(s struct art_node *node; int error = 0; - KERNEL_LOCK(); + rw_enter_write(>ar_lock); at = srp_get_locked(>ar_root); if (at != NULL) { art_table_ref(ar, at); @@ -631,7 +632,7 @@ art_walk(struct art_root *ar, int (*f)(s art_table_free(ar, at); } - KERNEL_UNLOCK(); + rw_exit_write(>ar_lock); return (error); } @@ -708,9 +709,9 @@ art_walk_apply(struct art_root *ar, if ((an != NULL) && (an != next)) { /* this assumes an->an_dst is not used by f */ - KERNEL_UNLOCK(); + rw_exit_write(>ar_lock); error = (*f)(an, arg); - KERNEL_LOCK(); + rw_enter_write(>ar_lock); } return (error); Index: art.h === RCS file: /cvs/src/sys/net/art.h,v retrieving revision 1.14 diff -u -p -r1.14 art.h --- art.h 14 Jun 2016 04:42:02 - 1.14 +++ art.h 14 Aug 2016 11:10:45 - @@ -19,6 +19,8 @@ #ifndef _NET_ART_H_ #define _NET_ART_H_ +#include + #define ART_MAXLVL 32 /* We currently use 32 levels for IPv6. */ /* @@ -26,6 +28,7 @@ */ struct art_root { struct srp ar_root; /* First table */ + struct rwlockar_lock; /* Serialise modifications */ uint8_t ar_bits[ART_MAXLVL]; /* Per level stride */ uint8_t ar_nlvl; /* Number of levels */ uint8_t ar_alen; /* Address length in bits */ Index: rtable.c === RCS file: /cvs/src/sys/net/rtable.c,v retrieving revision 1.50 diff -u -p -r1.50 rtable.c --- rtable.c19 Jul 2016 10:51:44 - 1.50 +++ rtable.c14 Aug 2016 11:10:45 - @@ -515,6 +515,10 @@ static inline uint8_t *satoaddr(struct a void rtentry_ref(void *, void *); void rtentry_unref(void *, void *); +#ifndef SMALL_KERNEL +void rtable_mpath_insert(struct art_node *, struct rtentry *); +#endif + struct srpl_rc rt_rc = SRPL_RC_INITIALIZER(rtentry_ref, rtentry_unref, NULL); void @@ -679,8 +683,6 @@ rtable_insert(unsigned int rtableid, str unsigned int rt_flags; int error = 0; - KERNEL_ASSERT_LOCKED(); - ar = rtable_get(rtableid, dst->sa_family); if (ar == NULL) return (EAFNOSUPPORT); @@ -691,6 +693,7 @@ rtable_insert(unsigned int rtableid, str return (EINVAL); rtref(rt); /* guarantee rtfree won't do anything during insert */ + rw_enter_write(>ar_lock); #ifndef SMALL_KERNEL /* Do not permit exactly the same dst/mask/gw pair. */ @@ -766,15 +769,14 @@ rtable_insert(unsigned int rtableid, str } } - SRPL_INSERT_HEAD_LOCKED(_rc, >an_rtlist, rt, rt_next); - /* Put newly inserted entry at the right place. */ - rtable_mpath_reprio(rtableid, dst, mask, rt->rt_priority, rt); + rtable_mpath_insert(an, rt); #else error = EEXIST; #endif /* SMALL_KERNEL */ } leave: + rw_exit_write(>ar_lock); rtfree(rt); return (error); } @@ -792,6 +794,7 @@ rtable_delete(unsigned int rtableid, str struct rtentry *mrt; int npaths = 0; #endif /* SMALL_KERNEL */ + int
Re: Two sunxi driver diffs
> Date: Sun, 14 Aug 2016 13:59:00 +0200 (CEST) > From: Mark Kettenis> > Here are two diffs for the sunxi armv7 platform that I can't test > myself. The first one is for sxiahci(4), and changes it to use the > regulator API to power on the bus. The second one is for sxie(4), and > changes it to use the pinctrl API to select the appropriate pins. > > If anybody with a Allwinner A10/A10s/A13 board (Cubiboard, pcDuino, > Olinuxino A10/A10s/A13) could verify that this doesn't break their > SATA or Ethernet it would be greatly appreciated. FWIW, these diffs have committed now. The SATA diff has been tested, the Ethernet diff hasn't. So testing of that one is still welcome. But you can simply get the latest kernel sources from anoncvs or wait until the next snapshot. > Index: arch/armv7/sunxi/sxiahci.c > === > RCS file: /cvs/src/sys/arch/armv7/sunxi/sxiahci.c,v > retrieving revision 1.9 > diff -u -p -r1.9 sxiahci.c > --- arch/armv7/sunxi/sxiahci.c5 Aug 2016 19:00:25 - 1.9 > +++ arch/armv7/sunxi/sxiahci.c14 Aug 2016 11:21:34 - > @@ -33,9 +33,9 @@ > #include > #include > #include > -#include > > #include > +#include > #include > > #define SXIAHCI_CAP 0x > @@ -97,6 +97,7 @@ sxiahci_attach(struct device *parent, st > struct sxiahci_softc *sxisc = (struct sxiahci_softc *)self; > struct ahci_softc *sc = >sc; > struct fdt_attach_args *faa = aux; > + uint32_t target_supply; > uint32_t timo; > > if (faa->fa_nreg < 1) > @@ -165,8 +166,9 @@ sxiahci_attach(struct device *parent, st > SXIWRITE4(sc, SXIAHCI_RWC, 7); > > /* power up phy */ > - sxipio_setcfg(SXIAHCI_PWRPIN, SXIPIO_OUTPUT); > - sxipio_setpin(SXIAHCI_PWRPIN); > + target_supply = OF_getpropint(faa->fa_node, "target-supply", 0); > + if (target_supply) > + regulator_enable(target_supply); > > sc->sc_ih = arm_intr_establish_fdt(faa->fa_node, IPL_BIO, > ahci_intr, sc, sc->sc_dev.dv_xname); > @@ -190,7 +192,8 @@ sxiahci_attach(struct device *parent, st > irq: > arm_intr_disestablish(sc->sc_ih); > clrpwr: > - sxipio_clrpin(SXIAHCI_PWRPIN); > + if (target_supply) > + regulator_disable(target_supply); > dismod: > sxiccmu_disablemodule(CCMU_AHCI); > bus_space_unmap(sc->sc_iot, sc->sc_ioh, sc->sc_ios); > Index: arch/armv7/sunxi/sxie.c > === > RCS file: /cvs/src/sys/arch/armv7/sunxi/sxie.c,v > retrieving revision 1.19 > diff -u -p -r1.19 sxie.c > --- arch/armv7/sunxi/sxie.c 5 Aug 2016 22:19:23 - 1.19 > +++ arch/armv7/sunxi/sxie.c 14 Aug 2016 11:21:34 - > @@ -49,9 +49,9 @@ > #include > #include > #include > -#include > > #include > +#include > #include > > /* configuration registers */ > @@ -215,6 +215,8 @@ sxie_attach(struct device *parent, struc > if (faa->fa_nreg < 1) > return; > > + pinctrl_byname(faa->fa_node, "default"); > + > sc->sc_iot = faa->fa_iot; > > if (bus_space_map(sc->sc_iot, faa->fa_reg[0].addr, > @@ -273,11 +275,9 @@ sxie_attach(struct device *parent, struc > void > sxie_socware_init(struct sxie_softc *sc) > { > - int i, have_mac = 0; > + int have_mac = 0; > uint32_t reg; > > - for (i = 0; i < SXIPIO_EMAC_NPINS; i++) > - sxipio_setcfg(i, 2); /* mux pins to EMAC */ > sxiccmu_enablemodule(CCMU_EMAC); > > /* MII clock cfg */ > >
Re: MP-safe L2 "lookup" w/o atomic operation
On 11/08/16(Thu) 16:04, Martin Pieuchot wrote: > One of the remaining SMP issue with our routing table usage is to > guarantee that the L2 entry referenced by a RTF_GATEWAY route via > the ``rt_gwroute'' pointer wont be replaced/invalidated by another > CPU while we are filling the address field of an Ethernet frame. > > The most efficient way, performance wise, to do that is to make the > ``rt_gwroute'' pointer immutable during the lifetime of a RTF_GATEWAY > route. If we know that this pointer won't change and that the memory > it points to won't be freed as long as a CPU has reference to one of > the RTF_GATEWAY routes, we don't need any special protection inside > arp_resolve() and nd6_resolve(). > > Diff below achieve that by always resolving the next-hop entry before > inserting the corresponding RTF_GATEWAY route in the tree. In other > words rt_setgwroute() is no longer called in the sending path. > > To guarantee that a cached route won't be deleted while it is still > referenced a new flag, RTF_CACHED, is added to the route. arp(8) and > ndp(8) treat entry with this flag like RTF_LOCAL. > > Removing rt_setgwroute() from the sending path means that inserting a > RTF_GATEWAY route will now fail if the next-hop cannot be resolve. This > might introduce regression for some setups but I see that has an > improvement since the kernel no longer let you add a route that cannot > be used. > It also mean that stale routes need to be fixed whenever a new address > is configured. The diff does that and also plug an ifa leak that was > happening when the same address is configured twice on an ifa. > > Note that even with this diff there are *still* some MP-safeness issue > due to stale routes, they will be address in a later diff. Updated diff that: - Fix a panic reported by Hrvoje Popovski - Properly invalidate ARP/NDP entry when issuing a RTM_DELETE from userland, as discussed with bluhm@. For that I introduced RTM_INVALIDATE and use it instead of deleting L2 entries. With this we no longer generate deletion/insertion for L2 entries, of course that include RTF_CACHED. - This has been tested with a pppoe(4) setup to make sure that inserting a route with a magic 0.0.0.1 gateway works, pointed out by claudio@. More tests, comments, ok are welcome. Index: net/route.c === RCS file: /cvs/src/sys/net/route.c,v retrieving revision 1.313 diff -u -p -r1.313 route.c --- net/route.c 22 Jul 2016 11:03:30 - 1.313 +++ net/route.c 15 Aug 2016 08:58:39 - @@ -153,7 +153,9 @@ struct pool rtentry_pool; /* pool for r struct poolrttimer_pool; /* pool for rttimer structures */ void rt_timer_init(void); -void rt_setgwroute(struct rtentry *, u_int); +intrt_setgwroute(struct rtentry *, u_int); +void rt_putgwroute(struct rtentry *); +intrt_fixgwroute(struct rtentry *, void *, unsigned int); intrtflushclone1(struct rtentry *, void *, u_int); void rtflushclone(unsigned int, struct rtentry *); intrt_if_remove_rtdelete(struct rtentry *, void *, u_int); @@ -204,21 +206,20 @@ rtisvalid(struct rtentry *rt) if (rt == NULL) return (0); -#ifdef DIAGNOSTIC - if (ISSET(rt->rt_flags, RTF_GATEWAY) && (rt->rt_gwroute != NULL) && - ISSET(rt->rt_gwroute->rt_flags, RTF_GATEWAY)) - panic("next hop must be directly reachable"); -#endif - - if ((rt->rt_flags & RTF_UP) == 0) + if (!ISSET(rt->rt_flags, RTF_UP)) return (0); /* Routes attached to stale ifas should be freed. */ if (rt->rt_ifa == NULL || rt->rt_ifa->ifa_ifp == NULL) return (0); - if (ISSET(rt->rt_flags, RTF_GATEWAY) && !rtisvalid(rt->rt_gwroute)) - return (0); +#ifdef DIAGNOSTIC + if (ISSET(rt->rt_flags, RTF_GATEWAY)) { + KASSERT(rt->rt_gwroute != NULL); + KASSERT(ISSET(rt->rt_gwroute->rt_flags, RTF_UP)); + KASSERT(!ISSET(rt->rt_gwroute->rt_flags, RTF_GATEWAY)); + } +#endif /* DIAGNOSTIC */ return (1); } @@ -267,8 +268,6 @@ rt_match(struct sockaddr *dst, uint32_t return (rt); } -struct rtentry *_rtalloc(struct sockaddr *, uint32_t *, int, unsigned int); - #ifndef SMALL_KERNEL /* * Originated from bridge_hash() in if_bridge.c @@ -349,16 +348,10 @@ rt_hash(struct rtentry *rt, struct socka struct rtentry * rtalloc_mpath(struct sockaddr *dst, uint32_t *src, unsigned int rtableid) { - return (_rtalloc(dst, src, RT_RESOLVE, rtableid)); + return (rt_match(dst, src, RT_RESOLVE, rtableid)); } #endif /* SMALL_KERNEL */ -struct rtentry * -rtalloc(struct sockaddr *dst, int flags, unsigned int rtableid) -{ - return (_rtalloc(dst, NULL, flags, rtableid)); -} - /* * Look in the routing table for the best matching entry for * ``dst''. @@ -367,44 +360,35 @@ rtalloc(struct sockaddr *dst, int
Re: httpd: be stricter with TLS configuration
On Sat, Aug 13, 2016 at 02:57:14AM +1000, Joel Sing wrote: > The following diff makes httpd stricter with respect to TLS configuration: > > - Do not allow TLS and non-TLS to be configured on the same port. > - Do not allow TLS options to be specified without a TLS listener. > - Ensure that TLS options are the same when a server is specified on the > same address/port. > > Currently, these configurations are permitted but do not work as intended. > > This also factors out (and reuses) the server matching code that was already > duplicated. > > ok? > - I think server_match() and server_tls_cmp() can both live in server.c (server_match() somewhere close to server_foreach() - this match function can be used for at least one other case outside of parse.y). - As discussed before, for consistency with the config, please use "tls" instead of "TLS" in the log messages. FYI, The SNI diff doesn't like the tls_cert_file and tls_key_file checks in server_tls_cmp(), as they now become valid, but they can be removed/changed later. Otherwise OK reyk@ > Index: parse.y > === > RCS file: /cvs/src/usr.sbin/httpd/parse.y,v > retrieving revision 1.78 > diff -u -p -r1.78 parse.y > --- parse.y 21 Jun 2016 21:35:24 - 1.78 > +++ parse.y 12 Aug 2016 16:48:28 - > @@ -107,8 +107,10 @@ int host_if(const char *, struct addre > int host(const char *, struct addresslist *, > int, struct portrange *, const char *, int); > void host_free(struct addresslist *); > +struct server*server_match(struct server *, int); > struct server*server_inherit(struct server *, struct server_config *, > struct server_config *); > +int server_tls_cmp(struct server *, struct server *); > int getservice(char *); > int is_if_in_group(const char *, const char *); > > @@ -283,24 +285,13 @@ server : SERVER optmatch STRING{ > > TAILQ_INSERT_TAIL(>srv_hosts, srv_conf, entry); > } '{' optnl serveropts_l '}'{ > - struct server *s = NULL, *sn; > + struct server *s, *sn; > struct server_config*a, *b; > > srv_conf = >srv_conf; > > - TAILQ_FOREACH(s, conf->sc_servers, srv_entry) { > - if ((s->srv_conf.flags & > - SRVFLAG_LOCATION) == 0 && > - strcmp(s->srv_conf.name, > - srv->srv_conf.name) == 0 && > - s->srv_conf.port == srv->srv_conf.port && > - sockaddr_cmp( > - (struct sockaddr *)>srv_conf.ss, > - (struct sockaddr *)>srv_conf.ss, > - s->srv_conf.prefixlen) == 0) > - break; > - } > - if (s != NULL) { > + /* Check if the new server already exists. */ > + if (server_match(srv, 1) != NULL) { > yyerror("server \"%s\" defined twice", > srv->srv_conf.name); > serverconfig_free(srv_conf); > @@ -315,16 +306,39 @@ server : SERVER optmatch STRING{ > YYERROR; > } > > + if ((s = server_match(srv, 0)) != NULL) { > + if ((s->srv_conf.flags & SRVFLAG_TLS) != > + (srv->srv_conf.flags & SRVFLAG_TLS)) { > + yyerror("server \"%s\": TLS and " > + "non-TLS on same address/port", > + srv->srv_conf.name); > + serverconfig_free(srv_conf); > + free(srv); > + YYERROR; > + } > + if (server_tls_cmp(s, srv) != 0) { > + yyerror("server \"%s\": TLS " > + "configuration mismatch on same " > + "address/port", > + srv->srv_conf.name); > + serverconfig_free(srv_conf); > + free(srv); > + YYERROR; > + } > + } > + > if ((srv->srv_conf.flags & SRVFLAG_TLS) && > srv->srv_conf.tls_protocols == 0) { > - yyerror("no TLS
cvs(1): default to a)bort for empty commit log
This is one of the quirks of cvs that I have a hard time getting used to: $ cvs commit Log message unchanged or not specified a)bort, c)ontinue, e)dit, !)reuse this message unchanged for remaining dirs Action: (continue) I don't think I'll ever want to c)ontinue here because I don't want to commit anything without a log message. I exit the editor without writing the log file precisely because there might be something wrong, something I need to go double check. In that situation, I don't need the extra thrill of making sure that I choose a non-default option so I don't change anything. Thus, I suggest to default to abort here. Index: src/logmsg.c === RCS file: /var/cvs/src/gnu/usr.bin/cvs/src/logmsg.c,v retrieving revision 1.4 diff -u -p -r1.4 logmsg.c --- src/logmsg.c4 Mar 2012 04:05:15 - 1.4 +++ src/logmsg.c15 Aug 2016 03:25:27 - @@ -346,7 +346,7 @@ do_editor (dir, messagep, repository, ch { (void) printf ("\nLog message unchanged or not specified\n"); (void) printf ("a)bort, c)ontinue, e)dit, !)reuse this message unchanged for remaining dirs\n"); - (void) printf ("Action: (continue) "); + (void) printf ("Action: (abort) "); (void) fflush (stdout); line_length = get_line (, _chars_allocated, stdin); if (line_length < 0) @@ -358,14 +358,14 @@ do_editor (dir, messagep, repository, ch error (1, 0, "aborting"); } else if (line_length == 0 -|| *line == '\n' || *line == 'c' || *line == 'C') - break; - if (*line == 'a' || *line == 'A') +|| *line == '\n' || *line == 'a' || *line == 'A') { if (unlink_file (fname) < 0) error (0, errno, "warning: cannot remove temp file %s", fname); error (1, 0, "aborted by user"); } + if (*line == 'c' || *line == 'C') + break; if (*line == 'e' || *line == 'E') goto again; if (*line == '!')
pppoe(4) IPv6 example
Using fe80:: as gateway is not be a good idea since by default netstart(8) insert: fe80::/10 ::1 UGRS0 0 32768 8 lo0 As you can see this route is attached to lo0 *and* has RTF_GATEWAY. This contradicts what rt_setgwroute() wants and make the kernel unhappy, diff below fixes that. ok? Index: pppoe.4 === RCS file: /cvs/src/share/man/man4/pppoe.4,v retrieving revision 1.31 diff -u -p -r1.31 pppoe.4 --- pppoe.4 1 Apr 2016 05:47:34 - 1.31 +++ pppoe.4 15 Aug 2016 09:00:56 - @@ -102,7 +102,7 @@ inet 0.0.0.0 255.255.255.255 NONE \e dest 0.0.0.1 inet6 eui64 !/sbin/route add default -ifp pppoe0 0.0.0.1 -!/sbin/route add -inet6 default -ifp pppoe0 fe80:: +!/sbin/route add -inet6 default -ifp pppoe0 fe80::%pppoe0 .Ed .Pp The physical interface must also be marked
Re: Correct order of route removal
On 15/08/16(Mon) 10:25, Claudio Jeker wrote: > On Mon, Aug 15, 2016 at 08:42:06AM +0200, Martin Pieuchot wrote: > > On 08/08/16(Mon) 11:48, Martin Pieuchot wrote: > > > The rtable_walk() & prio bug I just sent a fix for should theoretically > > > not cause any trouble. Sadly it is piled on top of another bug for > > > which a fix is attached. > > > > > > When an interface is removed the current code starts by purging all its > > > corresponding route entries. This is wrong because the per-AF code has > > > some knowledge of which automagic route should be removed first. > > > > > > In other words, the rtable_walk() hang should never have been triggered > > > because the IPv4-specific code should take care of removing the > > > RTF_BROADCAST entry. I believe that this ordering problem is the reason > > > why error code are ignored in AF-specific code paths. > > > > > > Diff attached fixes that, ok? > > > > Anyone? > > > > IMO this is a bit of a catch-22 here. Until recently we did store less > interface specific stuff in the routing table and so it made sense to > clean up the routes first (the assumption was that this will first remove > all gateway routes and probably also the arp/ndp caches). This was done when no RTF_LOCAL/RTF_BROADCAST existed. > This makes > removing of the interface specific stuff simpler because the upper layers > are already clean. How does it make it simpler? >Now we turn this upside down but then we may get burned > because gateway routes are cleaned too late and put the routing table into > an inconsitent state. Which inconsistent state? Cached route are still referenced. > I know you removed a fair amount of references that > would cause crashes because of bad pointers. Need to update my mental > model of the rtable / ifa / ifp interaction to see if this is indeed not > possible. Sure, you need to look at in_ifinit() and in_purgeaddr(). Currently rt_ifa_dellocal() and the rt_ifa_del() are noop and return errors. That's why errors are not checked. > At the same time I wonder if we need multiple walks to make this > save and easier to understand... (replacing dragons with other smaller > dragons is not that helpful because they grow over time). I'm not sure to understand. > I try to find time tonight to look into this. Thanks.
Re: Correct order of route removal
On Mon, Aug 15, 2016 at 08:42:06AM +0200, Martin Pieuchot wrote: > On 08/08/16(Mon) 11:48, Martin Pieuchot wrote: > > The rtable_walk() & prio bug I just sent a fix for should theoretically > > not cause any trouble. Sadly it is piled on top of another bug for > > which a fix is attached. > > > > When an interface is removed the current code starts by purging all its > > corresponding route entries. This is wrong because the per-AF code has > > some knowledge of which automagic route should be removed first. > > > > In other words, the rtable_walk() hang should never have been triggered > > because the IPv4-specific code should take care of removing the > > RTF_BROADCAST entry. I believe that this ordering problem is the reason > > why error code are ignored in AF-specific code paths. > > > > Diff attached fixes that, ok? > > Anyone? > IMO this is a bit of a catch-22 here. Until recently we did store less interface specific stuff in the routing table and so it made sense to clean up the routes first (the assumption was that this will first remove all gateway routes and probably also the arp/ndp caches). This makes removing of the interface specific stuff simpler because the upper layers are already clean. Now we turn this upside down but then we may get burned because gateway routes are cleaned too late and put the routing table into an inconsitent state. I know you removed a fair amount of references that would cause crashes because of bad pointers. Need to update my mental model of the rtable / ifa / ifp interaction to see if this is indeed not possible. At the same time I wonder if we need multiple walks to make this save and easier to understand... (replacing dragons with other smaller dragons is not that helpful because they grow over time). I try to find time tonight to look into this. > > Index: net/if.c > > === > > RCS file: /cvs/src/sys/net/if.c,v > > retrieving revision 1.436 > > diff -u -p -r1.436 if.c > > --- net/if.c13 Jul 2016 16:45:19 - 1.436 > > +++ net/if.c22 Jul 2016 12:45:28 - > > @@ -931,7 +931,6 @@ if_detach(struct ifnet *ifp) > > #if NBPFILTER > 0 > > bpfdetach(ifp); > > #endif > > - rt_if_remove(ifp); > > rti_delete(ifp); > > #if NETHER > 0 && defined(NFSCLIENT) > > if (ifp->if_index == revarp_ifidx) > > @@ -944,6 +943,7 @@ if_detach(struct ifnet *ifp) > > #ifdef INET6 > > in6_ifdetach(ifp); > > #endif > > + rt_if_remove(ifp); > > #if NPF > 0 > > pfi_detach_ifnet(ifp); > > #endif > > @@ -1931,15 +1931,15 @@ ifioctl(struct socket *so, u_long cmd, c > > */ > > if (up) > > if_down(ifp); > > - rt_if_remove(ifp); > > rti_delete(ifp); > > #ifdef MROUTING > > vif_delete(ifp); > > #endif > > + in_ifdetach(ifp); > > #ifdef INET6 > > in6_ifdetach(ifp); > > #endif > > - in_ifdetach(ifp); > > + rt_if_remove(ifp); > > splx(s); > > } > > > -- :wq Claudio
Re: rtable_walk() hand and route prio
On Mon, Aug 15, 2016 at 08:41:52AM +0200, Martin Pieuchot wrote: > On 08/08/16(Mon) 11:42, Martin Pieuchot wrote: > > On the train back from n2k16 I found the real cause of the hang reported > > by Dimitris Papastamos [0] and exposed by our recent > > changes to the routing table. > > > > When an interface is removed/detached the kernel delete all the > > corresponding route entries. At this moment the interface is > > DOWN and the corresponding route as well. So the priority check > > should consider that. > > > > Without the diff attached or the workaround to stop iterating when > > an error occurs, the kernel would loop forever since it can't remove > > the RTF_BROADCAST entry. > > > > [0] https://marc.info/?l=openbsd-bugs=146909621511954=2 > > > > ok? > > Anyone? > This is correct. OK claudio@ IIRC this is not the first time we go burned by RTP_MASK so I wonder if we should use a macro to access prio that does the masking internally or removing the DOWN prio but then then all checks need to also check for RTF_UP... > > Index: net/route.c > > === > > RCS file: /cvs/src/sys/net/route.c,v > > retrieving revision 1.313 > > diff -u -p -r1.313 route.c > > --- net/route.c 22 Jul 2016 11:03:30 - 1.313 > > +++ net/route.c 8 Aug 2016 09:33:15 - > > @@ -873,7 +873,7 @@ rtrequest_delete(struct rt_addrinfo *inf > > * kernel. > > */ > > if ((rt->rt_flags & (RTF_LOCAL|RTF_BROADCAST)) && > > - prio != RTP_LOCAL) { > > + (prio & RTP_MASK) != RTP_LOCAL) { > > rtfree(rt); > > return (EINVAL); > > } > -- :wq Claudio
Re: a few userspace quad_t conversions
On Sun, Aug 14, 2016 at 09:59:21PM -0700, Philip Guenther wrote: > > du is an easy on, though the use of fts_number here reminded me that we > need to increase to a long long at some point like the other BSDs did. > > hexdump is...subtle, as it both creates format strings on the fly and > *rewrites* them! Turns out it was using %q for both of those, hah! The > display.c change is a straight forward quad_t-->int64_t conversion, while > the parse.c change is to use %ll instead of %q. > > systat just needs u_quad_t to be converted to uint64_t, but while we're > here let's convert bcopy() to memcpy() and bzero() to memset(). > > ok? QK, reads fine to me. See one comment below. > > PHilip > > Index: usr.bin/du/du.c > === > RCS file: /data/src/openbsd/src/usr.bin/du/du.c,v > retrieving revision 1.31 > diff -u -p -r1.31 du.c > --- usr.bin/du/du.c 10 Oct 2015 05:32:52 - 1.31 > +++ usr.bin/du/du.c 15 Aug 2016 04:08:25 - > @@ -50,7 +50,7 @@ > > > int linkchk(FTSENT *); > -void prtout(quad_t, char *, int); > +void prtout(int64_t, char *, int); > void usage(void); > > int > @@ -59,7 +59,7 @@ main(int argc, char *argv[]) > FTS *fts; > FTSENT *p; > long blocksize; > - quad_t totalblocks; > + int64_t totalblocks; > int ftsoptions, listfiles, maxdepth; > int Hflag, Lflag, cflag, hflag, kflag; > int ch, notused, rval; > @@ -177,7 +177,7 @@ main(int argc, char *argv[]) >* root of a traversal, display the total. >*/ > if (p->fts_level <= maxdepth) > - prtout((quad_t)howmany(p->fts_number, > + prtout(howmany(p->fts_number, > (unsigned long)blocksize), p->fts_path, > hflag); > break; > @@ -207,7 +207,7 @@ main(int argc, char *argv[]) > if (errno) > err(1, "fts_read"); > if (cflag) { > - prtout((quad_t)howmany(totalblocks, blocksize), "total", hflag); > + prtout(howmany(totalblocks, blocksize), "total", hflag); > } > fts_close(fts); > exit(rval); > @@ -301,17 +301,17 @@ linkchk(FTSENT *p) > } > > void > -prtout(quad_t size, char *path, int hflag) > +prtout(int64_t size, char *path, int hflag) > { > if (!hflag) > - (void)printf("%lld\t%s\n", (long long)size, path); > + (void)printf("%lld\t%s\n", size, path); > else { > char buf[FMT_SCALED_STRSIZE]; > > if (fmt_scaled(size * 512, buf) == 0) > (void)printf("%s\t%s\n", buf, path); > else > - (void)printf("%lld\t%s\n", (long long)size, path); > + (void)printf("%lld\t%s\n", size, path); > } > } > > Index: usr.bin/hexdump/display.c > === > RCS file: /data/src/openbsd/src/usr.bin/hexdump/display.c,v > retrieving revision 1.24 > diff -u -p -r1.24 display.c > --- usr.bin/hexdump/display.c 15 Mar 2016 04:19:13 - 1.24 > +++ usr.bin/hexdump/display.c 15 Aug 2016 04:16:35 - > @@ -100,7 +100,7 @@ display(void) > for (pr = endfu->nextpr; pr; pr = pr->nextpr) > switch(pr->flags) { > case F_ADDRESS: > - (void)printf(pr->fmt, (quad_t)eaddress); > + (void)printf(pr->fmt, (int64_t)eaddress); > break; > case F_TEXT: > (void)printf("%s", pr->fmt); > @@ -123,7 +123,7 @@ print(PR *pr, u_char *bp) > > switch(pr->flags) { > case F_ADDRESS: > - (void)printf(pr->fmt, (quad_t)address); > + (void)printf(pr->fmt, (int64_t)address); > break; > case F_BPAD: > (void)printf(pr->fmt, ""); > @@ -149,15 +149,15 @@ print(PR *pr, u_char *bp) > case F_INT: > switch(pr->bcnt) { > case 1: > - (void)printf(pr->fmt, (quad_t)*bp); > + (void)printf(pr->fmt, (int64_t)*bp); > break; > case 2: > memmove(, bp, sizeof(s2)); > - (void)printf(pr->fmt, (quad_t)s2); > + (void)printf(pr->fmt, (int64_t)s2); > break; > case 4: > memmove(, bp, sizeof(s4)); > - (void)printf(pr->fmt, (quad_t)s4); > + (void)printf(pr->fmt, (int64_t)s4); > break; > case 8: > memmove(, bp, sizeof(s8)); > @@ -180,15 +180,15 @@ print(PR *pr, u_char *bp) > case F_UINT: > switch(pr->bcnt) { >
Re: Correct order of route removal
On 08/08/16(Mon) 11:48, Martin Pieuchot wrote: > The rtable_walk() & prio bug I just sent a fix for should theoretically > not cause any trouble. Sadly it is piled on top of another bug for > which a fix is attached. > > When an interface is removed the current code starts by purging all its > corresponding route entries. This is wrong because the per-AF code has > some knowledge of which automagic route should be removed first. > > In other words, the rtable_walk() hang should never have been triggered > because the IPv4-specific code should take care of removing the > RTF_BROADCAST entry. I believe that this ordering problem is the reason > why error code are ignored in AF-specific code paths. > > Diff attached fixes that, ok? Anyone? > Index: net/if.c > === > RCS file: /cvs/src/sys/net/if.c,v > retrieving revision 1.436 > diff -u -p -r1.436 if.c > --- net/if.c 13 Jul 2016 16:45:19 - 1.436 > +++ net/if.c 22 Jul 2016 12:45:28 - > @@ -931,7 +931,6 @@ if_detach(struct ifnet *ifp) > #if NBPFILTER > 0 > bpfdetach(ifp); > #endif > - rt_if_remove(ifp); > rti_delete(ifp); > #if NETHER > 0 && defined(NFSCLIENT) > if (ifp->if_index == revarp_ifidx) > @@ -944,6 +943,7 @@ if_detach(struct ifnet *ifp) > #ifdef INET6 > in6_ifdetach(ifp); > #endif > + rt_if_remove(ifp); > #if NPF > 0 > pfi_detach_ifnet(ifp); > #endif > @@ -1931,15 +1931,15 @@ ifioctl(struct socket *so, u_long cmd, c >*/ > if (up) > if_down(ifp); > - rt_if_remove(ifp); > rti_delete(ifp); > #ifdef MROUTING > vif_delete(ifp); > #endif > + in_ifdetach(ifp); > #ifdef INET6 > in6_ifdetach(ifp); > #endif > - in_ifdetach(ifp); > + rt_if_remove(ifp); > splx(s); > } >
Re: rtable_walk() hand and route prio
On 08/08/16(Mon) 11:42, Martin Pieuchot wrote: > On the train back from n2k16 I found the real cause of the hang reported > by Dimitris Papastamos [0] and exposed by our recent > changes to the routing table. > > When an interface is removed/detached the kernel delete all the > corresponding route entries. At this moment the interface is > DOWN and the corresponding route as well. So the priority check > should consider that. > > Without the diff attached or the workaround to stop iterating when > an error occurs, the kernel would loop forever since it can't remove > the RTF_BROADCAST entry. > > [0] https://marc.info/?l=openbsd-bugs=146909621511954=2 > > ok? Anyone? > Index: net/route.c > === > RCS file: /cvs/src/sys/net/route.c,v > retrieving revision 1.313 > diff -u -p -r1.313 route.c > --- net/route.c 22 Jul 2016 11:03:30 - 1.313 > +++ net/route.c 8 Aug 2016 09:33:15 - > @@ -873,7 +873,7 @@ rtrequest_delete(struct rt_addrinfo *inf >* kernel. >*/ > if ((rt->rt_flags & (RTF_LOCAL|RTF_BROADCAST)) && > - prio != RTP_LOCAL) { > + (prio & RTP_MASK) != RTP_LOCAL) { > rtfree(rt); > return (EINVAL); > }