Re: armv7 cache flushing: don't take shortcuts

2016-08-15 Thread Daniel Bolgheroni
On Mon, Aug 15, 2016 at 09:56:09PM +0200, Mark Kettenis wrote:
> The functions that clean/invalidate the caches by virtual address,
> bail out after cleaning 32k worth of data.  The 32k matches the L1
> cache of most of the CPUs we current run on.  But the Cortex-A7 has an
> integrated L2 cache that is larger.  And if you only flush it
> partially you may get into trouble.  And now that we actually use the
> cache that matters.  Many of the more recent ARMv7 CPUs include such a
> L2 cache.  And some of them even have L1 caches that are larger than
> 32k.  So drop the shortcut and simply clean/invalidate what we were
> asked to clean/invalidate.  Most of the calls should be covering a
> single page or less anyway.
> 
> This fixes the core dumps and illegal instructions that I see when
> booting from a SATA disk.

Just saw this commited. It makes Cubieboard2 fully useable so far.
Kernel rebuild with fs on ahci:

(...)
ld -T ldscript --warn-common -nopie -S -o bsd ${SYSTEM_HEAD} vers.o ${OBJS}
textdatabss dec hex
3744040 139412  479308  4362760 429208
   25m50.10s real17m18.26s user 1m28.06s system

Just as a comparison, it takes around 20 min on Wandboard with fs on nfs and
around 23 min on BeagleBone Black with fs also on nfs.

Thank you.

--

U-Boot SPL 2016.07 (Aug 05 2016 - 23:44:57)
DRAM: 1024 MiB
CPU: 91200Hz, AXI/AHB/APB: 3/2/2
Trying to boot from MMC1


U-Boot 2016.07 (Aug 05 2016 - 23:44:57 -0600) Allwinner Technology

CPU:   Allwinner A20 (SUN7I)
Model: Cubietech Cubieboard2
I2C:   ready
DRAM:  1 GiB
MMC:   SUNXI SD/MMC: 0
*** Warning - bad CRC, using default environment

In:serial
Out:   serial
Err:   serial
SCSI:  Target spinup took 0 ms.
AHCI 0001.0100 32 slots 1 ports 3 Gbps 0x1 impl SATA mode
flags: ncq stag pm led clo only pmp pio slum part ccc apst
Net:   eth0: ethernet@01c5
starting USB...
USB0:   USB EHCI 1.00
USB1:   USB OHCI 1.0
USB2:   USB EHCI 1.00
USB3:   USB OHCI 1.0
scanning bus 0 for devices... 1 USB Device(s) found
scanning bus 2 for devices... 1 USB Device(s) found
Hit any key to stop autoboot:  0
=>
=> setenv devnum 0
=> run scsi_boot
scanning bus for devices...
  Device 0: (0:0) Vendor: ATA Prod.: TOSHIBA MK1235GS Rev: PV01
Type: Hard Disk
Capacity: 114473.4 MB = 111.7 GB (234441648 x 512)
Found 1 device(s).

Device 0: (0:0) Vendor: ATA Prod.: TOSHIBA MK1235GS Rev: PV01
Type: Hard Disk
Capacity: 114473.4 MB = 111.7 GB (234441648 x 512)
... is now current device
Scanning scsi 0:1...
Found EFI removable media binary efi/boot/bootarm.efi
reading efi/boot/bootarm.efi
65276 bytes read in 23 ms (2.7 MiB/s)
libfdt fdt_check_header(): FDT_ERR_BADMAGIC
## Starting EFI application at 0x4200 ...
Scanning disks on scsi...
Scanning disks on usb...
Scanning disks on mmc...
MMC Device 1 not found
MMC Device 2 not found
MMC Device 3 not found
Found 6 disks
>> OpenBSD/armv7 BOOTARM 0.1
boot>
booting sd0a:/bsd: 3743840+139408+479308 [64+501824+238352]=0x4e3de0

OpenBSD/armv7 booting ...
arg0 0x4000 arg1 0x10bb arg2 0x4800
Allocating page tables
freestart = 0x407e4000, free_pages = 260124 (0x0003f81c)
IRQ stack: p0x40812000 v0xc0812000
ABT stack: p0x40813000 v0xc0813000
UND stack: p0x40814000 v0xc0814000
SVC stack: p0x40815000 v0xc0815000
Creating L1 page table at 0x407e4000
Mapping kernel
Constructing L2 page tables
undefined page pmap [ using 740612 bytes of bsd ELF symbol table ]
board type: 4283
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2016 OpenBSD. All rights reserved.  http://www.OpenBSD.org

OpenBSD 6.0-current (GENERIC) #1: Mon Aug 15 19:34:05 BRT 2016
dbolgher...@wbs.my.domain:/usr/src/sys/arch/armv7/compile/GENERIC
real mem  = 1073741824 (1024MB)
avail mem = 104448 (996MB)
mainbus0 at root: Cubietech Cubieboard2
cpu0 at mainbus0: ARM Cortex A7 rev 4 (ARMv7 core)
cpu0: DC enabled IC enabled WB disabled EABT branch prediction enabled
cpu0: 32KB(32b/l,2way) I-cache, 32KB(64b/l,4way) wr-back D-cache
cortex0 at mainbus0
sunxi0 at mainbus0
sxipio0 at sunxi0: 175 pins
sxiccmu0 at sunxi0
gpio0 at sxipio0: 18 pins
gpio1 at sxipio0: 24 pins
gpio2 at sxipio0: 25 pins
gpio3 at sxipio0: 28 pins
gpio4 at sxipio0: 12 pins
gpio5 at sxipio0: 6 pins
gpio6 at sxipio0: 12 pins
gpio7 at sxipio0: 28 pins
gpio8 at sxipio0: 22 pins
agtimer0 at mainbus0: tick rate 24000 KHz
simplebus0 at mainbus0: "soc"
ehci0 at simplebus0
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Allwinner EHCI root hub" rev 2.00/1.00 addr 1
sxiahci0 at simplebus0: AHCI 1.1
sxiahci0: port 0: 3.0Gb/s
scsibus0 at sxiahci0: 32 targets
sd0 at scsibus0 targ 0 lun 0:  SCSI3 0/direct 
fixed naa.5391d4f841be
sd0: 114473MB, 512 bytes/sector, 234441648 sectors
ehci1 at simplebus0
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 "Allwinner EHCI root hub" rev 2.00/1.00 addr 1
sxidog0 at simplebus0
sxirtc0 at simplebus0

Re: finer grained art locking again

2016-08-15 Thread David Gwynne

> On 16 Aug 2016, at 08:28, Mark Kettenis  wrote:
> 
>> Date: Tue, 16 Aug 2016 08:21:42 +1000
>> From: Jonathan Matthew 
>> 
>> On Mon, Aug 15, 2016 at 02:47:29PM +0200, Mark Kettenis wrote:
 Date: Mon, 15 Aug 2016 22:08:16 +1000
 From: Jonathan Matthew 
 
 This removes ART's reliance on the kernel lock to serialise
 updates.  I sent out an earlier version of this prior to 6.0,
 but it didn't make it in.  Since then, we've decided to go with
 rwlocks in the packet processing path, so here's a version with
 an rwlock instead of a mutex.
>>> 
>>> Merely out of curiousity: is there any reason to only use write locks
>>> here?  Naively you'd think this would be a case where making use of
>>> the multiple-reader, single-writer property of rwlocks would make
>>> sense...
>> 
>> The read operations that need to be fast don't go through the lock
>> at all.  We could use read locks for rtable_walk, but we're unlikely
>> to see concurrent rtable_walks, and I'd prefer to keep the lock use
>> as simple as possible for now.

art_walk modifies the table refcnts, which are serialised by the lock. taking a 
write lock is still necessary.

> Thanks for the explanation.




Re: finer grained art locking again

2016-08-15 Thread Mark Kettenis
> Date: Tue, 16 Aug 2016 08:21:42 +1000
> From: Jonathan Matthew 
> 
> On Mon, Aug 15, 2016 at 02:47:29PM +0200, Mark Kettenis wrote:
> > > Date: Mon, 15 Aug 2016 22:08:16 +1000
> > > From: Jonathan Matthew 
> > > 
> > > This removes ART's reliance on the kernel lock to serialise
> > > updates.  I sent out an earlier version of this prior to 6.0,
> > > but it didn't make it in.  Since then, we've decided to go with
> > > rwlocks in the packet processing path, so here's a version with
> > > an rwlock instead of a mutex.
> > 
> > Merely out of curiousity: is there any reason to only use write locks
> > here?  Naively you'd think this would be a case where making use of
> > the multiple-reader, single-writer property of rwlocks would make
> > sense...
> 
> The read operations that need to be fast don't go through the lock
> at all.  We could use read locks for rtable_walk, but we're unlikely
> to see concurrent rtable_walks, and I'd prefer to keep the lock use
> as simple as possible for now.

Thanks for the explanation.



Re: finer grained art locking again

2016-08-15 Thread Jonathan Matthew
On Mon, Aug 15, 2016 at 02:47:29PM +0200, Mark Kettenis wrote:
> > Date: Mon, 15 Aug 2016 22:08:16 +1000
> > From: Jonathan Matthew 
> > 
> > This removes ART's reliance on the kernel lock to serialise updates.
> > I sent out an earlier version of this prior to 6.0, but it didn't make it 
> > in.
> > Since then, we've decided to go with rwlocks in the packet processing path,
> > so here's a version with an rwlock instead of a mutex.
> 
> Merely out of curiousity: is there any reason to only use write locks
> here?  Naively you'd think this would be a case where making use of
> the multiple-reader, single-writer property of rwlocks would make
> sense...

The read operations that need to be fast don't go through the lock at all.
We could use read locks for rtable_walk, but we're unlikely to see concurrent
rtable_walks, and I'd prefer to keep the lock use as simple as possible for now.



Re: rwlock physlock mem.c

2016-08-15 Thread Mike Larkin
On Mon, Aug 15, 2016 at 06:10:55PM -0400, Ted Unangst wrote:
> same change as for i386, but now for arm and sparcs.
> 

ok mlarkin

> Index: arm/arm/mem.c
> ===
> RCS file: /cvs/src/sys/arch/arm/arm/mem.c,v
> retrieving revision 1.16
> diff -u -p -r1.16 mem.c
> --- arm/arm/mem.c 15 Aug 2016 22:01:59 -  1.16
> +++ arm/arm/mem.c 15 Aug 2016 22:10:03 -
> @@ -81,6 +81,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -89,7 +90,6 @@
>  
>  extern char *memhook;/* poor name! */
>  caddr_t zeropage;
> -int physlock;
>  
>  /* open counter for aperture */
>  #ifdef APERTURE
> @@ -142,6 +142,7 @@ mmclose(dev_t dev, int flag, int mode, s
>  int
>  mmrw(dev_t dev, struct uio *uio, int flags)
>  {
> + static struct rwlock physlock = RWLOCK_INITIALIZER("mmrw");
>   vaddr_t o, v;
>   size_t c;
>   struct iovec *iov;
> @@ -150,14 +151,10 @@ mmrw(dev_t dev, struct uio *uio, int fla
>  
>   if (minor(dev) == DEV_MEM) {
>   /* lock against other uses of shared vmmap */
> - while (physlock > 0) {
> - physlock++;
> - error = tsleep((caddr_t), PZERO | PCATCH,
> - "mmrw", 0);
> - if (error)
> - return (error);
> - }
> - physlock = 1;
> + error = rw_enter(, RW_WRITE | RW_INTR);
> + if (error)
> + return (error);
> +
>   }
>   while (uio->uio_resid > 0 && error == 0) {
>   iov = uio->uio_iov;
> @@ -216,9 +213,7 @@ mmrw(dev_t dev, struct uio *uio, int fla
>   }
>   }
>   if (minor(dev) == DEV_MEM) {
> - if (physlock > 1)
> - wakeup((caddr_t));
> - physlock = 0;
> + rw_exit();
>   }
>   return (error);
>  }
> Index: sparc/sparc/mem.c
> ===
> RCS file: /cvs/src/sys/arch/sparc/sparc/mem.c,v
> retrieving revision 1.27
> diff -u -p -r1.27 mem.c
> --- sparc/sparc/mem.c 15 Aug 2016 22:01:59 -  1.27
> +++ sparc/sparc/mem.c 15 Aug 2016 22:10:03 -
> @@ -47,6 +47,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -86,24 +87,19 @@ mmclose(dev_t dev, int flag, int mode, s
>  int
>  mmrw(dev_t dev, struct uio *uio, int flags)
>  {
> + static struct rwlock physlock = RWLOCK_INITIALIZER("mmrw");
>   off_t o;
>   paddr_t pa;
>   vaddr_t va;
>   size_t c;
>   struct iovec *iov;
>   int error = 0;
> - static int physlock;
>  
>   if (minor(dev) == 0) {
>   /* lock against other uses of shared mem_page */
> - while (physlock > 0) {
> - physlock++;
> - error = tsleep((caddr_t), PZERO | PCATCH,
> - "mmrw", 0);
> - if (error)
> - return (error);
> - }
> - physlock = 1;
> + error = rw_enter(, RW_WRITE | RW_INTR);
> + if (error)
> + return (error);
>   if (mem_page == 0)
>   mem_page = uvm_km_valloc_wait(kernel_map, NBPG);
>   if (mem_page == 0)
> @@ -193,9 +189,7 @@ mmrw(dev_t dev, struct uio *uio, int fla
>   }
>   if (minor(dev) == 0) {
>  unlock:
> - if (physlock > 1)
> - wakeup((caddr_t));
> - physlock = 0;
> + rw_exit();
>   }
>   return (error);
>  }
> Index: sparc64/sparc64/mem.c
> ===
> RCS file: /cvs/src/sys/arch/sparc64/sparc64/mem.c,v
> retrieving revision 1.16
> diff -u -p -r1.16 mem.c
> --- sparc64/sparc64/mem.c 15 Aug 2016 22:01:59 -  1.16
> +++ sparc64/sparc64/mem.c 15 Aug 2016 22:10:03 -
> @@ -47,6 +47,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -75,24 +76,19 @@ mmclose(dev_t dev, int flag, int mode, s
>  int
>  mmrw(dev_t dev, struct uio *uio, int flags)
>  {
> + static struct rwlock physlock = RWLOCK_INITIALIZER("mmrw");
>   vaddr_t o, v;
>   size_t c;
>   struct iovec *iov;
>   int error = 0;
> - static int physlock;
>   vm_prot_t prot;
>   extern caddr_t vmmap;
>  
>   if (minor(dev) == 0) {
>   /* lock against other uses of shared vmmap */
> - while (physlock > 0) {
> - physlock++;
> - error = tsleep((caddr_t), PZERO | PCATCH,
> - "mmrw", 0);
> - if (error)
> - return (error);
> - }
> - physlock = 1;
> + error = rw_enter(, RW_WRITE | RW_INTR);
> + if (error)
> +  

rwlock physlock mem.c

2016-08-15 Thread Ted Unangst
same change as for i386, but now for arm and sparcs.

Index: arm/arm/mem.c
===
RCS file: /cvs/src/sys/arch/arm/arm/mem.c,v
retrieving revision 1.16
diff -u -p -r1.16 mem.c
--- arm/arm/mem.c   15 Aug 2016 22:01:59 -  1.16
+++ arm/arm/mem.c   15 Aug 2016 22:10:03 -
@@ -81,6 +81,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -89,7 +90,6 @@
 
 extern char *memhook;/* poor name! */
 caddr_t zeropage;
-int physlock;
 
 /* open counter for aperture */
 #ifdef APERTURE
@@ -142,6 +142,7 @@ mmclose(dev_t dev, int flag, int mode, s
 int
 mmrw(dev_t dev, struct uio *uio, int flags)
 {
+   static struct rwlock physlock = RWLOCK_INITIALIZER("mmrw");
vaddr_t o, v;
size_t c;
struct iovec *iov;
@@ -150,14 +151,10 @@ mmrw(dev_t dev, struct uio *uio, int fla
 
if (minor(dev) == DEV_MEM) {
/* lock against other uses of shared vmmap */
-   while (physlock > 0) {
-   physlock++;
-   error = tsleep((caddr_t), PZERO | PCATCH,
-   "mmrw", 0);
-   if (error)
-   return (error);
-   }
-   physlock = 1;
+   error = rw_enter(, RW_WRITE | RW_INTR);
+   if (error)
+   return (error);
+
}
while (uio->uio_resid > 0 && error == 0) {
iov = uio->uio_iov;
@@ -216,9 +213,7 @@ mmrw(dev_t dev, struct uio *uio, int fla
}
}
if (minor(dev) == DEV_MEM) {
-   if (physlock > 1)
-   wakeup((caddr_t));
-   physlock = 0;
+   rw_exit();
}
return (error);
 }
Index: sparc/sparc/mem.c
===
RCS file: /cvs/src/sys/arch/sparc/sparc/mem.c,v
retrieving revision 1.27
diff -u -p -r1.27 mem.c
--- sparc/sparc/mem.c   15 Aug 2016 22:01:59 -  1.27
+++ sparc/sparc/mem.c   15 Aug 2016 22:10:03 -
@@ -47,6 +47,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -86,24 +87,19 @@ mmclose(dev_t dev, int flag, int mode, s
 int
 mmrw(dev_t dev, struct uio *uio, int flags)
 {
+   static struct rwlock physlock = RWLOCK_INITIALIZER("mmrw");
off_t o;
paddr_t pa;
vaddr_t va;
size_t c;
struct iovec *iov;
int error = 0;
-   static int physlock;
 
if (minor(dev) == 0) {
/* lock against other uses of shared mem_page */
-   while (physlock > 0) {
-   physlock++;
-   error = tsleep((caddr_t), PZERO | PCATCH,
-   "mmrw", 0);
-   if (error)
-   return (error);
-   }
-   physlock = 1;
+   error = rw_enter(, RW_WRITE | RW_INTR);
+   if (error)
+   return (error);
if (mem_page == 0)
mem_page = uvm_km_valloc_wait(kernel_map, NBPG);
if (mem_page == 0)
@@ -193,9 +189,7 @@ mmrw(dev_t dev, struct uio *uio, int fla
}
if (minor(dev) == 0) {
 unlock:
-   if (physlock > 1)
-   wakeup((caddr_t));
-   physlock = 0;
+   rw_exit();
}
return (error);
 }
Index: sparc64/sparc64/mem.c
===
RCS file: /cvs/src/sys/arch/sparc64/sparc64/mem.c,v
retrieving revision 1.16
diff -u -p -r1.16 mem.c
--- sparc64/sparc64/mem.c   15 Aug 2016 22:01:59 -  1.16
+++ sparc64/sparc64/mem.c   15 Aug 2016 22:10:03 -
@@ -47,6 +47,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -75,24 +76,19 @@ mmclose(dev_t dev, int flag, int mode, s
 int
 mmrw(dev_t dev, struct uio *uio, int flags)
 {
+   static struct rwlock physlock = RWLOCK_INITIALIZER("mmrw");
vaddr_t o, v;
size_t c;
struct iovec *iov;
int error = 0;
-   static int physlock;
vm_prot_t prot;
extern caddr_t vmmap;
 
if (minor(dev) == 0) {
/* lock against other uses of shared vmmap */
-   while (physlock > 0) {
-   physlock++;
-   error = tsleep((caddr_t), PZERO | PCATCH,
-   "mmrw", 0);
-   if (error)
-   return (error);
-   }
-   physlock = 1;
+   error = rw_enter(, RW_WRITE | RW_INTR);
+   if (error)
+   return (error);
}
while (uio->uio_resid > 0 && error == 0) {
int n;
@@ -169,9 +165,7 @@ mmrw(dev_t dev, struct uio *uio, int fla
}
if (minor(dev) == 0) 

com(4) UART type probing

2016-08-15 Thread Mark Kettenis
It looks like we're going to use com(4) on armv7 after all.  Given
that we have a device tree there, we have a pretty good idea what UART
type we're driving.  So we should be able to avoid the flaky probing
code in com(4).  Here is a diff that makes the driver only do the
probing if the uart type was set to COM_UART_UNKNOWN at the start.
This is the default if you don't explicitly set it in the bus glue
part of the driver.  There are a few bits of glue code that do set te
uart type.  Of those only arch/octeon/dev/cn30xxuart.c sets it to a
value that makes com(4) go through the probing codepath.

Can somebody verify that this diff doesn't break octeon?

Thanks,

Mark


Index: dev/ic/com.c
===
RCS file: /cvs/src/sys/dev/ic/com.c,v
retrieving revision 1.158
diff -u -p -r1.158 com.c
--- dev/ic/com.c14 Sep 2014 14:17:24 -  1.158
+++ dev/ic/com.c15 Aug 2016 21:23:58 -
@@ -1583,6 +1583,7 @@ com_attach_subr(struct com_softc *sc)
 {
bus_space_tag_t iot = sc->sc_iot;
bus_space_handle_t ioh = sc->sc_ioh;
+   int probe = 0;
u_int8_t lcr;
 
sc->sc_ier = 0;
@@ -1616,7 +1617,7 @@ com_attach_subr(struct com_softc *sc)
/*
 * Skip specific probes if attachment code knows it already.
 */
-   if (sc->sc_uarttype == COM_UART_UNKNOWN)
+   if (sc->sc_uarttype == COM_UART_UNKNOWN) {
switch (bus_space_read_1(iot, ioh, com_iir) >> 6) {
case 0:
sc->sc_uarttype = COM_UART_16450;
@@ -1631,8 +1632,11 @@ com_attach_subr(struct com_softc *sc)
sc->sc_uarttype = COM_UART_UNKNOWN;
break;
}
+   probe = 1;
+   }
 
-   if (sc->sc_uarttype == COM_UART_16550A) { /* Probe for ST16650s */
+   /* Probe for ST16650s */
+   if (probe && sc->sc_uarttype == COM_UART_16550A) {
bus_space_write_1(iot, ioh, com_lcr, lcr | LCR_DLAB);
if (bus_space_read_1(iot, ioh, com_efr) == 0) {
bus_space_write_1(iot, ioh, com_efr, EFR_CTS);
@@ -1647,7 +1651,8 @@ com_attach_subr(struct com_softc *sc)
}
 
 #if 0  /* until com works with large FIFOs */
-   if (sc->sc_uarttype == COM_UART_ST16650V2) { /* Probe for XR16850s */
+   /* Probe for XR16850s */
+   if (probe && sc->sc_uarttype == COM_UART_ST16650V2) {
u_int8_t dlbl, dlbh;
 
/* Enable latch access and get the current values. */
@@ -1670,7 +1675,8 @@ com_attach_subr(struct com_softc *sc)
}
 #endif
 
-   if (sc->sc_uarttype == COM_UART_16550A) { /* Probe for TI16750s */
+   /* Probe for TI16750s */
+   if (probe && sc->sc_uarttype == COM_UART_16550A) {
bus_space_write_1(iot, ioh, com_lcr, lcr | LCR_DLAB);
bus_space_write_1(iot, ioh, com_fifo,
FIFO_ENABLE | FIFO_ENABLE_64BYTE);
@@ -1686,7 +1692,9 @@ com_attach_subr(struct com_softc *sc)
 
/* Reset the LCR (latch access is probably enabled). */
bus_space_write_1(iot, ioh, com_lcr, lcr);
-   if (sc->sc_uarttype == COM_UART_16450) { /* Probe for 8250 */
+
+   /* Probe for 8250 */
+   if (probe && sc->sc_uarttype == COM_UART_16450) {
u_int8_t scr0, scr1, scr2;
 
scr0 = bus_space_read_1(iot, ioh, com_scratch);



Re: fuse cache shenanigans

2016-08-15 Thread Bob Beck
Yeah, ok in that context, sure.. since this is userspace shit the caching
if any should probably happen there.


On Mon, Aug 15, 2016 at 2:20 PM, Ted Unangst  wrote:

> Bob Beck wrote:
> > Note - NFS has similar behaviour ;)
> >
> > at least within a directory.  - so this isn't tht "unusual" for non-local
> >
> > I'm wondering if this isn't a bit premature Have you looked for other
> side
> > effects of this removal?
>
> I think the nature of FUSE is that it's used with even "faker" filesystems,
> such that the backend may not even have filesystem like semantics. We
> should
> be careful what we assume.
>
> Also, the fuse daemon can probably do a better job of caching because it
> has
> more specialized knowledge. It can cache more than names, too.
>


Re: fuse cache shenanigans

2016-08-15 Thread Ted Unangst
Bob Beck wrote:
> Note - NFS has similar behaviour ;)
> 
> at least within a directory.  - so this isn't tht "unusual" for non-local
> 
> I'm wondering if this isn't a bit premature Have you looked for other side
> effects of this removal?

I think the nature of FUSE is that it's used with even "faker" filesystems,
such that the backend may not even have filesystem like semantics. We should
be careful what we assume.

Also, the fuse daemon can probably do a better job of caching because it has
more specialized knowledge. It can cache more than names, too.



Re: fuse cache shenanigans

2016-08-15 Thread Bob Beck
Note - NFS has similar behaviour ;)

at least within a directory.  - so this isn't tht "unusual" for non-local

I'm wondering if this isn't a bit premature Have you looked for other side
effects of this removal?


On Mon, Aug 15, 2016 at 1:52 PM, Ted Unangst  wrote:

> Martin Natano wrote:
> > Watch this:
> >
> >   $ doas sshfs natano@localhost:/tmp /mnt
> >   $ cat /mnt/foo
> >   cat: /mnt/foo: No such file or directory
> >   $ echo bar > /tmp/foo
> >   $ cat /mnt/foo
> >   cat: /mnt/foo: No such file or directory
> >   $ touch /mnt/foo
> >   $ cat /mnt/foo
> >   bar
> >   $
> >
> > There is no sense in doing caching in fusefs. In case of a non-local
> > filesystem the tree can change behind our back without us having a
> > chance to purge the cache.
> >
> > "The only winning move is not to play." Ok?
>
> ok
>
>


armv7 cache flushing: don't take shortcuts

2016-08-15 Thread Mark Kettenis
The functions that clean/invalidate the caches by virtual address,
bail out after cleaning 32k worth of data.  The 32k matches the L1
cache of most of the CPUs we current run on.  But the Cortex-A7 has an
integrated L2 cache that is larger.  And if you only flush it
partially you may get into trouble.  And now that we actually use the
cache that matters.  Many of the more recent ARMv7 CPUs include such a
L2 cache.  And some of them even have L1 caches that are larger than
32k.  So drop the shortcut and simply clean/invalidate what we were
asked to clean/invalidate.  Most of the calls should be covering a
single page or less anyway.

This fixes the core dumps and illegal instructions that I see when
booting from a SATA disk.

ok?


Index: arch/arm/arm/cpufunc_asm_armv7.S
===
RCS file: /cvs/src/sys/arch/arm/arm/cpufunc_asm_armv7.S,v
retrieving revision 1.13
diff -u -p -r1.13 cpufunc_asm_armv7.S
--- arch/arm/arm/cpufunc_asm_armv7.S6 Aug 2016 16:46:25 -   1.13
+++ arch/arm/arm/cpufunc_asm_armv7.S15 Aug 2016 19:45:53 -
@@ -103,8 +103,6 @@ ENTRY(armv7_tlb_flushD)
i_inc   .req r3
 ENTRY(armv7_icache_sync_range)
ldr ip, .Larmv7_icache_line_size
-   cmp r1, #0x8000
-   movcs   r1, #0x8000 /* XXX needs to match cache size... */
ldr ip, [ip]
sub r1, r1, #1  /* Don't overrun */
sub r3, ip, #1
@@ -136,8 +134,6 @@ ENTRY(armv7_icache_sync_all)
 
 ENTRY(armv7_dcache_wb_range)
ldr ip, .Larmv7_dcache_line_size
-   cmp r1, #0x8000
-   movcs   r1, #0x8000 /* XXX needs to match cache size... */
ldr ip, [ip]
sub r1, r1, #1  /* Don't overrun */
sub r3, ip, #1
@@ -155,8 +151,6 @@ ENTRY(armv7_dcache_wb_range)
 
 ENTRY(armv7_idcache_wbinv_range)
ldr ip, .Larmv7_idcache_line_size
-   cmp r1, #0x8000
-   movcs   r1, #0x8000 /* XXX needs to match cache size... */
ldr ip, [ip]
sub r1, r1, #1  /* Don't overrun */
sub r3, ip, #1
@@ -177,8 +171,6 @@ ENTRY(armv7_idcache_wbinv_range)
 
 ENTRY(armv7_dcache_wbinv_range)
ldr ip, .Larmv7_dcache_line_size
-   cmp r1, #0x8000
-   movcs   r1, #0x8000 /* XXX needs to match cache size... */
ldr ip, [ip]
sub r1, r1, #1  /* Don't overrun */
sub r3, ip, #1
@@ -198,8 +190,6 @@ ENTRY(armv7_dcache_wbinv_range)
 
 ENTRY(armv7_dcache_inv_range)
ldr ip, .Larmv7_dcache_line_size
-   cmp r1, #0x8000
-   movcs   r1, #0x8000 /* XXX needs to match cache size... */
ldr ip, [ip]
sub r1, r1, #1  /* Don't overrun */
sub r3, ip, #1



Re: fuse cache shenanigans

2016-08-15 Thread Ted Unangst
Martin Natano wrote:
> Watch this:
> 
>   $ doas sshfs natano@localhost:/tmp /mnt
>   $ cat /mnt/foo
>   cat: /mnt/foo: No such file or directory
>   $ echo bar > /tmp/foo
>   $ cat /mnt/foo
>   cat: /mnt/foo: No such file or directory
>   $ touch /mnt/foo
>   $ cat /mnt/foo
>   bar
>   $
> 
> There is no sense in doing caching in fusefs. In case of a non-local
> filesystem the tree can change behind our back without us having a
> chance to purge the cache.
> 
> "The only winning move is not to play." Ok?

ok



fuse cache shenanigans

2016-08-15 Thread Martin Natano
Watch this:

$ doas sshfs natano@localhost:/tmp /mnt
$ cat /mnt/foo
cat: /mnt/foo: No such file or directory
$ echo bar > /tmp/foo
$ cat /mnt/foo
cat: /mnt/foo: No such file or directory
$ touch /mnt/foo
$ cat /mnt/foo
bar
$

There is no sense in doing caching in fusefs. In case of a non-local
filesystem the tree can change behind our back without us having a
chance to purge the cache.

"The only winning move is not to play." Ok?

natano


Index: miscfs/fuse/fuse_lookup.c
===
RCS file: /cvs/src/sys/miscfs/fuse/fuse_lookup.c,v
retrieving revision 1.12
diff -u -p -r1.12 fuse_lookup.c
--- miscfs/fuse/fuse_lookup.c   12 Aug 2016 20:18:44 -  1.12
+++ miscfs/fuse/fuse_lookup.c   15 Aug 2016 10:01:41 -
@@ -64,9 +64,6 @@ fusefs_lookup(void *v)
(cnp->cn_nameiop == DELETE || cnp->cn_nameiop == RENAME))
return (EROFS);
 
-   if ((error = cache_lookup(vdp, vpp, cnp)) >= 0)
-   return (error);
-
if (flags & ISDOTDOT) {
/* got ".." */
nid = dp->parent;
@@ -120,10 +117,8 @@ fusefs_lookup(void *v)
 * Write access to directory required to delete files.
 */
error = VOP_ACCESS(vdp, VWRITE, cred, cnp->cn_proc);
-   if (error != 0) {
-   fb_delete(fbuf);
-   return (error);
-   }
+   if (error)
+   goto out;
 
cnp->cn_flags |= SAVENAME;
}
@@ -132,10 +127,8 @@ fusefs_lookup(void *v)
/*
 * Write access to directory required to delete files.
 */
-   if ((error = VOP_ACCESS(vdp, VWRITE, cred, cnp->cn_proc)) != 0) 
{
-   fb_delete(fbuf);
-   return (error);
-   }
+   if ((error = VOP_ACCESS(vdp, VWRITE, cred, cnp->cn_proc)) != 0)
+   goto out;
 
if (nid == VTOI(vdp)->ufs_ino.i_number) {
error = EISDIR;
@@ -187,10 +180,8 @@ fusefs_lookup(void *v)
 
update_vattr(fmp->mp, >fb_vattr);
 
-   if (error) {
-   fb_delete(fbuf);
-   return (error);
-   }
+   if (error)
+   goto out;
 
if (vdp != NULL && vdp->v_type == VDIR)
VTOI(tdp)->parent = dp->ufs_ino.i_number;
@@ -204,10 +195,6 @@ fusefs_lookup(void *v)
}
 
 out:
-   if ((cnp->cn_flags & MAKEENTRY) && nameiop != CREATE &&
-   nameiop != DELETE)
-   cache_enter(vdp, *vpp, cnp);
-
fb_delete(fbuf);
return (error);
 }
Index: miscfs/fuse/fuse_vnops.c
===
RCS file: /cvs/src/sys/miscfs/fuse/fuse_vnops.c,v
retrieving revision 1.29
diff -u -p -r1.29 fuse_vnops.c
--- miscfs/fuse/fuse_vnops.c12 Aug 2016 20:18:44 -  1.29
+++ miscfs/fuse/fuse_vnops.c15 Aug 2016 10:01:41 -
@@ -857,7 +857,6 @@ fusefs_reclaim(void *v)
 * Remove the inode from its hash chain.
 */
ufs_ihashrem(>ufs_ino);
-   cache_purge(vp);
 
free(ip, M_FUSEFS, 0);
vp->v_data = NULL;
@@ -1386,11 +1385,9 @@ fusefs_rmdir(void *v)
goto out;
}
 
-   cache_purge(dvp);
vput(dvp);
dvp = NULL;
 
-   cache_purge(ITOV(ip));
fb_delete(fbuf);
 out:
if (dvp)



httpd "Persistent connection issue in FastCGI", anyone?

2016-08-15 Thread Reyk Floeter
Hi,

has somebody else seen something like "Persistent connection issue in
FastCGI" as described in https://github.com/reyk/httpd/issues/49 ?

Reyk



Re: typo in bytgpio(4) manpage

2016-08-15 Thread Marcus Glocker
On Mon, Aug 15, 2016 at 02:15:27PM -0300, Daniel Bolgheroni wrote:

> Typo.
> 
> s/drirect/direct/

It's in - Thank you!
 
> Index: bytgpio.4
> ===
> RCS file: /cvs/src/share/man/man4/bytgpio.4,v
> retrieving revision 1.2
> diff -u -p -r1.2 bytgpio.4
> --- bytgpio.4   28 Mar 2016 20:08:56 -  1.2
> +++ bytgpio.4   15 Aug 2016 17:11:31 -
> @@ -27,7 +27,7 @@ The
>  .Nm
>  driver provides support for the GPIO controllers found on Intel's Bay
>  Trail SoC.
> -It does not provide drirect device driver entry points but makes its
> +It does not provide direct device driver entry points but makes its
>  functions available to
>  .Xr acpi 4 .
>  .Sh SEE ALSO
> 
> -- 
> db



typo in bytgpio(4) manpage

2016-08-15 Thread Daniel Bolgheroni
Typo.

s/drirect/direct/

Index: bytgpio.4
===
RCS file: /cvs/src/share/man/man4/bytgpio.4,v
retrieving revision 1.2
diff -u -p -r1.2 bytgpio.4
--- bytgpio.4   28 Mar 2016 20:08:56 -  1.2
+++ bytgpio.4   15 Aug 2016 17:11:31 -
@@ -27,7 +27,7 @@ The
 .Nm
 driver provides support for the GPIO controllers found on Intel's Bay
 Trail SoC.
-It does not provide drirect device driver entry points but makes its
+It does not provide direct device driver entry points but makes its
 functions available to
 .Xr acpi 4 .
 .Sh SEE ALSO

-- 
db



Re: ed(1) exit without saving if output file is a command

2016-08-15 Thread Theo Buehler
On Mon, Aug 15, 2016 at 06:26:44PM +0200, Theo Buehler wrote:
> On Mon, Aug 15, 2016 at 03:33:38PM +0200, Jérôme FRGACIC wrote:
> > Hello tech,
> > 
> > I recently use ed(1) to transmit some input lines to another command.
> > However, I remark that after the 'w' command, I can exit ed without any
> > warnings even if the data were not saved into a file.
> > 
> > Here is an example :
> > 
> > $ ed
> > P
> > *a
> > A simple line.
> > .
> > *w !sed 's/^/#/'
> > #A simple line.
> > 15
> > *q
> > $ # No warning before exit
> > 
> > This behaviour seems not conform to the Open Group Base Specifications
> > Issue 6 (IEEE Std 1003.1, 2004 Edition). I would suggest this diff to
> > solve this problem.
> 
> POSIX says the following about the quit command:
> 
>The q command shall cause ed to exit. If the buffer has changed since
>the last time the entire buffer was written, the user shall be warned,
>as described previously.
> 
> where 'previously' means:
> 
>If changes have been made in the buffer since the last w command that
>wrote the entire buffer, ed shall warn the user if an attempt is made
>to destroy the editor buffer via the e or q commands.
>[...]
> 
> You did write the entire buffer, namely to the stdin of the sed command,
> and afterwards you didn't change the buffer anymore, so I think this is
> compliant with what is quoted above.
> 
> FWIW, the GNU ed port (ged) behaves in exactly the same way.
> 
> Could you please be more specific what part of the standard you have in
> mind?

Oh, there's this in the description of the w command, sorry:

   If file begins with '!', the rest of the line shall be taken to be a
   shell command line whose standard input shall be the addressed lines.
   Such a shell command line shall not be remembered as the current
   pathname. This usage of the write command with '!'  shall not be
   considered as a ``last w command that wrote the entire buffer'', as
   described previously; thus, this alone shall not prevent the warning to
   the user if an attempt is made to destroy the editor buffer via the e
   or q commands.

Please ignore my previous objection.

> > 
> > Index: main.c
> > ===
> > RCS file: /cvs/src/bin/ed/main.c,v
> > retrieving revision 1.57
> > diff -r1.57 main.c
> > 862c862
> > <   else if (addr == addr_last)
> > ---
> > >   else if (addr == addr_last && *fnp != '!')
> > 
> > 
> > PS : I haven't subcribe to the tech mailing list, so please add me as
> > recipient if you reply.
> > 
> > 
> > Kind regards,
> > 
> > 
> > Jérôme FRGACIC
> > 
> 



Re: ed(1) exit without saving if output file is a command

2016-08-15 Thread Theo Buehler
On Mon, Aug 15, 2016 at 03:33:38PM +0200, Jérôme FRGACIC wrote:
> Hello tech,
> 
> I recently use ed(1) to transmit some input lines to another command.
> However, I remark that after the 'w' command, I can exit ed without any
> warnings even if the data were not saved into a file.
> 
> Here is an example :
> 
> $ ed
> P
> *a
> A simple line.
> .
> *w !sed 's/^/#/'
> #A simple line.
> 15
> *q
> $ # No warning before exit
> 
> This behaviour seems not conform to the Open Group Base Specifications
> Issue 6 (IEEE Std 1003.1, 2004 Edition). I would suggest this diff to
> solve this problem.

POSIX says the following about the quit command:

   The q command shall cause ed to exit. If the buffer has changed since
   the last time the entire buffer was written, the user shall be warned,
   as described previously.

where 'previously' means:

   If changes have been made in the buffer since the last w command that
   wrote the entire buffer, ed shall warn the user if an attempt is made
   to destroy the editor buffer via the e or q commands.
   [...]

You did write the entire buffer, namely to the stdin of the sed command,
and afterwards you didn't change the buffer anymore, so I think this is
compliant with what is quoted above.

FWIW, the GNU ed port (ged) behaves in exactly the same way.

Could you please be more specific what part of the standard you have in
mind?

> 
> Index: main.c
> ===
> RCS file: /cvs/src/bin/ed/main.c,v
> retrieving revision 1.57
> diff -r1.57 main.c
> 862c862
> < else if (addr == addr_last)
> ---
> > else if (addr == addr_last && *fnp != '!')
> 
> 
> PS : I haven't subcribe to the tech mailing list, so please add me as
> recipient if you reply.
> 
> 
> Kind regards,
> 
> 
> Jérôme FRGACIC
> 



ed(1) exit without saving if output file is a command

2016-08-15 Thread Jérôme FRGACIC
Hello tech,

I recently use ed(1) to transmit some input lines to another command.
However, I remark that after the 'w' command, I can exit ed without any
warnings even if the data were not saved into a file.

Here is an example :

$ ed
P
*a
A simple line.
.
*w !sed 's/^/#/'
#A simple line.
15
*q
$ # No warning before exit

This behaviour seems not conform to the Open Group Base Specifications
Issue 6 (IEEE Std 1003.1, 2004 Edition). I would suggest this diff to
solve this problem.

Index: main.c
===
RCS file: /cvs/src/bin/ed/main.c,v
retrieving revision 1.57
diff -r1.57 main.c
862c862
<   else if (addr == addr_last)
---
>   else if (addr == addr_last && *fnp != '!')


PS : I haven't subcribe to the tech mailing list, so please add me as
recipient if you reply.


Kind regards,


Jérôme FRGACIC



Re: httpd: be stricter with TLS configuration

2016-08-15 Thread Reyk Floeter
On Mon, Aug 15, 2016 at 11:33:18PM +1000, Joel Sing wrote:
> On Monday 15 August 2016 13:04:43 Reyk Floeter wrote:
> > On Sat, Aug 13, 2016 at 02:57:14AM +1000, Joel Sing wrote:
> > > The following diff makes httpd stricter with respect to TLS configuration:
> > > 
> > > - Do not allow TLS and non-TLS to be configured on the same port.
> > > - Do not allow TLS options to be specified without a TLS listener.
> > > - Ensure that TLS options are the same when a server is specified on the
> > > 
> > >   same address/port.
> > > 
> > > Currently, these configurations are permitted but do not work as intended.
> > > 
> > > This also factors out (and reuses) the server matching code that was
> > > already duplicated.
> > > 
> > > ok?
> > 
> > - I think server_match() and server_tls_cmp() can both live in
> > server.c (server_match() somewhere close to server_foreach() - this
> > match function can be used for at least one other case outside of
> > parse.y).
> 
> I've moved server_tls_cmp() to server.c, however I'm not sure it makes sense 
> for server_match() since it operates on conf (a global declared in parse.y) 
> and I do not see any identical matching (the closest seems to be the code in 
> server_privinit(), but it still differs).
>  

you can use env-> instead of conf-> there.

I think server_privinit() could be reworked to use it.

In other words: I don't think that server_match() is something that is
specific to the parser, so it shouldn't be in the parse.y file.

Reyk

> > - As discussed before, for consistency with the config, please use
> > "tls" instead of "TLS" in the log messages.
> 
> Agreed, I'll handle this in a separate commit.
>  
> > FYI, The SNI diff doesn't like the tls_cert_file and tls_key_file
> > checks in server_tls_cmp(), as they now become valid, but they can be
> > removed/changed later.
> 
> Yes, they are independent diffs and we'll need to relax the restrictions 
> slightly when we enable SNI.
> 
> > Otherwise OK reyk@
> 
> Thanks.

-- 



bpf(4), csignal() and selwakup()

2016-08-15 Thread Martin Pieuchot
I'd like to make sure that bpf_tap(9) does not grab the KERNEL_LOCK().
The reason is to reduce the potential lock ordering problems within PF.

I'm currently using a mutex to serialize buffer changes, but since
bpf_wakeup() will still need the KERNEL_LOCK(), I'm using a task for
that.

Diff below only introduce this task.  I'd like to commit it first to
make sure possible regressions are fixed before playing with the mutex
diff.

Note that an acceptable race can occur with this diff.  The value of
``bd_async'' and/or ``bd_sig'' might change between the bpf_wakeup()
call and the check.

ok?

Index: net/bpf.c
===
RCS file: /cvs/src/sys/net/bpf.c,v
retrieving revision 1.147
diff -u -p -r1.147 bpf.c
--- net/bpf.c   15 Aug 2016 07:20:14 -  1.147
+++ net/bpf.c   15 Aug 2016 13:55:04 -
@@ -57,6 +57,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -103,6 +105,7 @@ int bpf_setif(struct bpf_d *, struct ifr
 intbpfpoll(dev_t, int, struct proc *);
 intbpfkqfilter(dev_t, struct knote *);
 void   bpf_wakeup(struct bpf_d *);
+void   bpf_wakeup_cb(void *);
 void   bpf_catchpacket(struct bpf_d *, u_char *, size_t, size_t,
void (*)(const void *, void *, size_t), struct timeval *);
 void   bpf_reset_d(struct bpf_d *);
@@ -345,6 +348,7 @@ bpfopen(dev_t dev, int flag, int mode, s
bd->bd_unit = unit;
bd->bd_bufsize = bpf_bufsize;
bd->bd_sig = SIGIO;
+   task_set(>bd_wake_task, bpf_wakeup_cb, bd);
 
if (flag & FNONBLOCK)
bd->bd_rtout = -1;
@@ -515,12 +519,28 @@ bpfread(dev_t dev, struct uio *uio, int 
 void
 bpf_wakeup(struct bpf_d *d)
 {
-   wakeup((caddr_t)d);
+   /*
+* As long as csignal() and selwakeup() need to be protected
+* by the KERNEL_LOCK() we have to delay the wakeup to
+* another context to keep the hot path KERNEL_LOCK()-free.
+*/
+   bpf_get(d);
+   task_add(systq, >bd_wake_task);
+}
+
+void
+bpf_wakeup_cb(void *xd)
+{
+   struct bpf_d *d = xd;
+
+   KERNEL_ASSERT_LOCKED();
+
+   wakeup(d);
if (d->bd_async && d->bd_sig)
-   csignal(d->bd_pgid, d->bd_sig,
-   d->bd_siguid, d->bd_sigeuid);
+   csignal(d->bd_pgid, d->bd_sig, d->bd_siguid, d->bd_sigeuid);
 
selwakeup(>bd_sel);
+   bpf_put(d);
 }
 
 int
Index: net/bpfdesc.h
===
RCS file: /cvs/src/sys/net/bpfdesc.h,v
retrieving revision 1.30
diff -u -p -r1.30 bpfdesc.h
--- net/bpfdesc.h   30 Mar 2016 12:33:10 -  1.30
+++ net/bpfdesc.h   15 Aug 2016 13:55:04 -
@@ -42,8 +42,6 @@
 
 #ifdef _KERNEL
 
-#include 
-
 /*
  * Descriptor associated with each open bpf file.
  */
@@ -90,6 +88,8 @@ struct bpf_d {
struct selinfo  bd_sel; /* bsd select info */
int bd_unit;/* logical unit number */
LIST_ENTRY(bpf_d) bd_list;  /* descriptor list */
+
+   struct task bd_wake_task;   /* delay csignal() and selwakeup() */
 };
 
 /*



Re: httpd: be stricter with TLS configuration

2016-08-15 Thread Joel Sing
On Monday 15 August 2016 13:04:43 Reyk Floeter wrote:
> On Sat, Aug 13, 2016 at 02:57:14AM +1000, Joel Sing wrote:
> > The following diff makes httpd stricter with respect to TLS configuration:
> > 
> > - Do not allow TLS and non-TLS to be configured on the same port.
> > - Do not allow TLS options to be specified without a TLS listener.
> > - Ensure that TLS options are the same when a server is specified on the
> > 
> >   same address/port.
> > 
> > Currently, these configurations are permitted but do not work as intended.
> > 
> > This also factors out (and reuses) the server matching code that was
> > already duplicated.
> > 
> > ok?
> 
> - I think server_match() and server_tls_cmp() can both live in
> server.c (server_match() somewhere close to server_foreach() - this
> match function can be used for at least one other case outside of
> parse.y).

I've moved server_tls_cmp() to server.c, however I'm not sure it makes sense 
for server_match() since it operates on conf (a global declared in parse.y) 
and I do not see any identical matching (the closest seems to be the code in 
server_privinit(), but it still differs).
 
> - As discussed before, for consistency with the config, please use
> "tls" instead of "TLS" in the log messages.

Agreed, I'll handle this in a separate commit.
 
> FYI, The SNI diff doesn't like the tls_cert_file and tls_key_file
> checks in server_tls_cmp(), as they now become valid, but they can be
> removed/changed later.

Yes, they are independent diffs and we'll need to relax the restrictions 
slightly when we enable SNI.

> Otherwise OK reyk@

Thanks.



Re: finer grained art locking again

2016-08-15 Thread Mark Kettenis
> Date: Mon, 15 Aug 2016 22:08:16 +1000
> From: Jonathan Matthew 
> 
> This removes ART's reliance on the kernel lock to serialise updates.
> I sent out an earlier version of this prior to 6.0, but it didn't make it in.
> Since then, we've decided to go with rwlocks in the packet processing path,
> so here's a version with an rwlock instead of a mutex.

Merely out of curiousity: is there any reason to only use write locks
here?  Naively you'd think this would be a case where making use of
the multiple-reader, single-writer property of rwlocks would make
sense...

> Index: art.c
> ===
> RCS file: /cvs/src/sys/net/art.c,v
> retrieving revision 1.22
> diff -u -p -r1.22 art.c
> --- art.c 19 Jul 2016 10:51:44 -  1.22
> +++ art.c 14 Aug 2016 11:10:45 -
> @@ -152,6 +152,7 @@ art_alloc(unsigned int rtableid, unsigne
>  
>   ar->ar_off = off;
>   ar->ar_rtableid = rtableid;
> + rw_init(>ar_lock, "art");
>  
>   return (ar);
>  }
> @@ -378,7 +379,7 @@ art_insert(struct art_root *ar, struct a
>   struct art_node *node;
>   int  i, j;
>  
> - KERNEL_ASSERT_LOCKED();
> + rw_assert_wrlock(>ar_lock);
>   KASSERT(plen >= 0 && plen <= ar->ar_alen);
>  
>   at = srp_get_locked(>ar_root);
> @@ -482,7 +483,7 @@ art_delete(struct art_root *ar, struct a
>   struct art_node *node;
>   int  i, j;
>  
> - KERNEL_ASSERT_LOCKED();
> + rw_assert_wrlock(>ar_lock);
>   KASSERT(plen >= 0 && plen <= ar->ar_alen);
>  
>   at = srp_get_locked(>ar_root);
> @@ -613,7 +614,7 @@ art_walk(struct art_root *ar, int (*f)(s
>   struct art_node *node;
>   int  error = 0;
>  
> - KERNEL_LOCK();
> + rw_enter_write(>ar_lock);
>   at = srp_get_locked(>ar_root);
>   if (at != NULL) {
>   art_table_ref(ar, at);
> @@ -631,7 +632,7 @@ art_walk(struct art_root *ar, int (*f)(s
>  
>   art_table_free(ar, at);
>   }
> - KERNEL_UNLOCK();
> + rw_exit_write(>ar_lock);
>  
>   return (error);
>  }
> @@ -708,9 +709,9 @@ art_walk_apply(struct art_root *ar,
>  
>   if ((an != NULL) && (an != next)) {
>   /* this assumes an->an_dst is not used by f */
> - KERNEL_UNLOCK();
> + rw_exit_write(>ar_lock);
>   error = (*f)(an, arg);
> - KERNEL_LOCK();
> + rw_enter_write(>ar_lock);
>   }
>  
>   return (error);
> Index: art.h
> ===
> RCS file: /cvs/src/sys/net/art.h,v
> retrieving revision 1.14
> diff -u -p -r1.14 art.h
> --- art.h 14 Jun 2016 04:42:02 -  1.14
> +++ art.h 14 Aug 2016 11:10:45 -
> @@ -19,6 +19,8 @@
>  #ifndef _NET_ART_H_
>  #define _NET_ART_H_
>  
> +#include 
> +
>  #define ART_MAXLVL   32  /* We currently use 32 levels for IPv6. */
>  
>  /*
> @@ -26,6 +28,7 @@
>   */
>  struct art_root {
>   struct srp   ar_root;   /* First table */
> + struct rwlockar_lock;   /* Serialise modifications */
>   uint8_t  ar_bits[ART_MAXLVL];   /* Per level stride */
>   uint8_t  ar_nlvl;   /* Number of levels */
>   uint8_t  ar_alen;   /* Address length in bits */
> Index: rtable.c
> ===
> RCS file: /cvs/src/sys/net/rtable.c,v
> retrieving revision 1.50
> diff -u -p -r1.50 rtable.c
> --- rtable.c  19 Jul 2016 10:51:44 -  1.50
> +++ rtable.c  14 Aug 2016 11:10:45 -
> @@ -515,6 +515,10 @@ static inline uint8_t*satoaddr(struct a
>  void rtentry_ref(void *, void *);
>  void rtentry_unref(void *, void *);
>  
> +#ifndef SMALL_KERNEL
> +void rtable_mpath_insert(struct art_node *, struct rtentry *);
> +#endif
> +
>  struct srpl_rc rt_rc = SRPL_RC_INITIALIZER(rtentry_ref, rtentry_unref, NULL);
>  
>  void
> @@ -679,8 +683,6 @@ rtable_insert(unsigned int rtableid, str
>   unsigned int rt_flags;
>   int  error = 0;
>  
> - KERNEL_ASSERT_LOCKED();
> -
>   ar = rtable_get(rtableid, dst->sa_family);
>   if (ar == NULL)
>   return (EAFNOSUPPORT);
> @@ -691,6 +693,7 @@ rtable_insert(unsigned int rtableid, str
>   return (EINVAL);
>  
>   rtref(rt); /* guarantee rtfree won't do anything during insert */
> + rw_enter_write(>ar_lock);
>  
>  #ifndef SMALL_KERNEL
>   /* Do not permit exactly the same dst/mask/gw pair. */
> @@ -766,15 +769,14 @@ rtable_insert(unsigned int rtableid, str
>   }
>   }
>  
> - SRPL_INSERT_HEAD_LOCKED(_rc, >an_rtlist, rt, rt_next);
> -
>   /* Put newly inserted entry at the right place. */
> - rtable_mpath_reprio(rtableid, dst, mask, 

5GHz vs 2GHz APs

2016-08-15 Thread Stefan Sperling
The wireless stack usually runs its scanning loop once per frequency band.
It begins with 5GHz so that APs on this band are preferred. Within a band,
an AP with the best RSSI (receive signal strength indicator) is chosen,
after matching all other desired parameters such as the ESSID etc.

I am not sure what the rationale was at the time this code was written
but today there is a good reason to prefer 5GHz. This band is usually
much less crowded and gives a much better experience as a result.
Because 5GHz is scanned first we'll usually prefer this band.

However, iwm(4) runs a scan across all bands at once so we pick an AP based
on RSSI alone. In practice, we often end up choosing 2GHz APs instead of
5GHz APs, even in cases where 5GHz would work fine, just with a slightly
lower RSSI.

With this diff we prefer 5GHz APs with "good enough" RSSI over 2GHz APs.

Index: ieee80211_node.c
===
RCS file: /cvs/src/sys/net80211/ieee80211_node.c,v
retrieving revision 1.103
diff -u -p -r1.103 ieee80211_node.c
--- ieee80211_node.c21 May 2016 09:07:11 -  1.103
+++ ieee80211_node.c15 Aug 2016 12:14:21 -
@@ -622,12 +622,23 @@ ieee80211_end_scan(struct ifnet *ifp)
ieee80211_free_node(ic, ni);
continue;
}
-   if (ieee80211_match_bss(ic, ni) == 0) {
-   if (selbs == NULL)
-   selbs = ni;
-   else if (ni->ni_rssi > selbs->ni_rssi)
-   selbs = ni;
-   }
+   if (ieee80211_match_bss(ic, ni) != 0)
+   continue;
+
+   /* Pick the AP/IBSS match with the best RSSI. */
+   if (selbs == NULL)
+   selbs = ni;
+   else if ((ic->ic_caps & IEEE80211_C_SCANALLBAND) &&
+   IEEE80211_IS_CHAN_5GHZ(selbs->ni_chan) &&
+   IEEE80211_IS_CHAN_2GHZ(ni->ni_chan) &&
+   selbs->ni_rssi >= (ic->ic_max_rssi - (ic->ic_max_rssi / 4)))
+   /* 
+* Prefer 5GHz (with reasonable RSSI) over 2GHz since
+* the 5GHz band is usually less saturated.
+*/
+   continue;
+   else if (ni->ni_rssi > selbs->ni_rssi)
+   selbs = ni;
}
if (selbs == NULL)
goto notfound;



Re: per cpu memory in the kernel

2016-08-15 Thread Mark Kettenis
> From: David Gwynne 
> Date: Sun, 14 Aug 2016 16:26:45 +1000
> 
> > On 13 Aug 2016, at 5:48 AM, Mark Kettenis  wrote:
> > 
> >> From: Martin Pieuchot 
> >> Date: Fri, 12 Aug 2016 20:44:04 +0200
> >> 
> >> On 08/11/16 06:43, David Gwynne wrote:
> >>> ive been tinkering with per cpu memory in the kernel
> >>> [...]
> >> 
> >> I'd like to have more people comment on this because we need an MP-safe
> >> way to handle counters in the network stack.
> > 
> > I had a first look.  Need to have a bit more time to digest this.
> > 
> >>> im still debating whether the API should do protection against
> >>> interrupts on the local cpu by handling spls for you. at the moment
> >>> it is up to the caller to manually splfoo() and splx(), but im half
> >>> convinced that cpumem_enter and _leave should do that on behalf of
> >>> the caller.
> >> 
> >> Let's see how it looks like when we have more usages of per cpu memory.
> > 
> > The per-CPU counters API is likely to need it though.  I'm less sure
> > about the per-CPU memory API itself.
> 
> there are a couple of arguments against it.
> 
> one big reason i didnt put it in the api itself at the time was the
> lack of MI splraise(). ive since fixed that on everything except arm
> as far as i can tell. arm may be ok, but its hard to read and my
> zaurus is dusty. it's worth figuring out though cos the pool parts
> of the diff use splraise instead of a mutex per cpu.
> 
> id also have to store the ipl somewhere inside struct cpumem, but be
> clever about avoiding more indirection on the way to the per cpu
> memory. so far the caller already has a pretty good idea of what
> level it is or should be at, and generally keeps state on the stack
> (unlike mutexes).

And I suppose in most cases when updating the counters you'll already
be at the desired spl.

So I'm with mpi@ here.  Let's stick to what you have now and revise
after we have used it in a couple of places.

> >>> ive added a wrapper around percpu memory for counters. basically
> >>> you ask it for N counters, where each counter is a uint64_t.
> >>> counters_enter will give you a per cpu reference to these N counters
> >>> which you can increment and decrement as you wish. internally the
> >>> api will version the per cpu counters so a reader can know when
> >>> theyre consistent, which is important on 32bit archs (where 64bit
> >>> ops arent necessarily atomic), or where you want several counters
> >>> to be consistent with each other (like packet and byte counters).
> >>> counters_read is provided to use the above machinery for a consistent
> >>> read.
> >> 
> >> Comment below.
> >> 
> >>> secondly, there is a boot strapping problem with per cpu data
> >>> structures, which is very apparent with the mbuf layer. the problem
> >>> is we dont know how many cpus are in the system until we're halfway
> >>> through attaching device drivers in the system. however, if we want
> >>> to use percpu data structures during attach we need to know how
> >>> many cpus we have. mbufs are allocated during attach, so we need
> >>> to know how many cpus we have before attach.
> >> 
> >> What about using ncpusfound?
> > 
> > On sparc64 that gets called very early, before pmap_bootstrap(),
> > because we need to allocate some per-CPU memory ;).
> > 
> > On other systems we typically calculate it on the fly as we're
> > attaching CPUs.  That may not be early enough.
> 
> i tried ncpusfound and MAXCPUS. ncpusfound is generally unavailable
> before cpu_configure, and im trying to avoid wasting memory with
> MAXCPUs. using global memory for the boot cpu and doing a "realloc"
> after cpu_configure seems to work well.
> 
> the second diff i mailed out has realloc continue to use the global
> as the cpumem for cpu 0 and allocate more memory for 1 and up. this
> means we only waste a pointer of memory for percpu memory that needs
> to work straight after boot.

I think that's perfectly acceptable.

Hiwver, looking at the code xxx_realloc() might be a bit misleading.
One could naively think that these interfaces could be used to change
the size of the per-CPU memory area.  But that's not the case isn't it?

It also seems that you currently rely on each subsystem that uses
per-CPU memory to call the appropriate xxx_realloc function itself.
It might be feasable to do this automatically in percpu_init() by
using ELF section magic to construct a list of CPUMEM_BOOT_MEMORY()
declarations.  But perhaps it is best to file that away as a future
improvement for now.

> on UP we will waste nothing cos we dont use struct cpumem, and avoid
> needing to allocate anything further at runtime.
> 
> > 
> >>> +uint64_t *
> >>> +counters_enter(struct counters_ref *ref, struct cpumem *cm)
> >>> +{
> >>> + ref->gen = cpumem_enter(cm);
> >>> + (*ref->gen)++; /* make the generation number odd */
> >>> + return (ref->gen + 1);
> >>> +}
> >>> +
> >>> +void
> >>> +counters_leave(struct counters_ref *ref, 

finer grained art locking again

2016-08-15 Thread Jonathan Matthew
This removes ART's reliance on the kernel lock to serialise updates.
I sent out an earlier version of this prior to 6.0, but it didn't make it in.
Since then, we've decided to go with rwlocks in the packet processing path,
so here's a version with an rwlock instead of a mutex.


Index: art.c
===
RCS file: /cvs/src/sys/net/art.c,v
retrieving revision 1.22
diff -u -p -r1.22 art.c
--- art.c   19 Jul 2016 10:51:44 -  1.22
+++ art.c   14 Aug 2016 11:10:45 -
@@ -152,6 +152,7 @@ art_alloc(unsigned int rtableid, unsigne
 
ar->ar_off = off;
ar->ar_rtableid = rtableid;
+   rw_init(>ar_lock, "art");
 
return (ar);
 }
@@ -378,7 +379,7 @@ art_insert(struct art_root *ar, struct a
struct art_node *node;
int  i, j;
 
-   KERNEL_ASSERT_LOCKED();
+   rw_assert_wrlock(>ar_lock);
KASSERT(plen >= 0 && plen <= ar->ar_alen);
 
at = srp_get_locked(>ar_root);
@@ -482,7 +483,7 @@ art_delete(struct art_root *ar, struct a
struct art_node *node;
int  i, j;
 
-   KERNEL_ASSERT_LOCKED();
+   rw_assert_wrlock(>ar_lock);
KASSERT(plen >= 0 && plen <= ar->ar_alen);
 
at = srp_get_locked(>ar_root);
@@ -613,7 +614,7 @@ art_walk(struct art_root *ar, int (*f)(s
struct art_node *node;
int  error = 0;
 
-   KERNEL_LOCK();
+   rw_enter_write(>ar_lock);
at = srp_get_locked(>ar_root);
if (at != NULL) {
art_table_ref(ar, at);
@@ -631,7 +632,7 @@ art_walk(struct art_root *ar, int (*f)(s
 
art_table_free(ar, at);
}
-   KERNEL_UNLOCK();
+   rw_exit_write(>ar_lock);
 
return (error);
 }
@@ -708,9 +709,9 @@ art_walk_apply(struct art_root *ar,
 
if ((an != NULL) && (an != next)) {
/* this assumes an->an_dst is not used by f */
-   KERNEL_UNLOCK();
+   rw_exit_write(>ar_lock);
error = (*f)(an, arg);
-   KERNEL_LOCK();
+   rw_enter_write(>ar_lock);
}
 
return (error);
Index: art.h
===
RCS file: /cvs/src/sys/net/art.h,v
retrieving revision 1.14
diff -u -p -r1.14 art.h
--- art.h   14 Jun 2016 04:42:02 -  1.14
+++ art.h   14 Aug 2016 11:10:45 -
@@ -19,6 +19,8 @@
 #ifndef _NET_ART_H_
 #define _NET_ART_H_
 
+#include 
+
 #define ART_MAXLVL 32  /* We currently use 32 levels for IPv6. */
 
 /*
@@ -26,6 +28,7 @@
  */
 struct art_root {
struct srp   ar_root;   /* First table */
+   struct rwlockar_lock;   /* Serialise modifications */
uint8_t  ar_bits[ART_MAXLVL];   /* Per level stride */
uint8_t  ar_nlvl;   /* Number of levels */
uint8_t  ar_alen;   /* Address length in bits */
Index: rtable.c
===
RCS file: /cvs/src/sys/net/rtable.c,v
retrieving revision 1.50
diff -u -p -r1.50 rtable.c
--- rtable.c19 Jul 2016 10:51:44 -  1.50
+++ rtable.c14 Aug 2016 11:10:45 -
@@ -515,6 +515,10 @@ static inline uint8_t  *satoaddr(struct a
 void   rtentry_ref(void *, void *);
 void   rtentry_unref(void *, void *);
 
+#ifndef SMALL_KERNEL
+void   rtable_mpath_insert(struct art_node *, struct rtentry *);
+#endif
+
 struct srpl_rc rt_rc = SRPL_RC_INITIALIZER(rtentry_ref, rtentry_unref, NULL);
 
 void
@@ -679,8 +683,6 @@ rtable_insert(unsigned int rtableid, str
unsigned int rt_flags;
int  error = 0;
 
-   KERNEL_ASSERT_LOCKED();
-
ar = rtable_get(rtableid, dst->sa_family);
if (ar == NULL)
return (EAFNOSUPPORT);
@@ -691,6 +693,7 @@ rtable_insert(unsigned int rtableid, str
return (EINVAL);
 
rtref(rt); /* guarantee rtfree won't do anything during insert */
+   rw_enter_write(>ar_lock);
 
 #ifndef SMALL_KERNEL
/* Do not permit exactly the same dst/mask/gw pair. */
@@ -766,15 +769,14 @@ rtable_insert(unsigned int rtableid, str
}
}
 
-   SRPL_INSERT_HEAD_LOCKED(_rc, >an_rtlist, rt, rt_next);
-
/* Put newly inserted entry at the right place. */
-   rtable_mpath_reprio(rtableid, dst, mask, rt->rt_priority, rt);
+   rtable_mpath_insert(an, rt);
 #else
error = EEXIST;
 #endif /* SMALL_KERNEL */
}
 leave:
+   rw_exit_write(>ar_lock);
rtfree(rt);
return (error);
 }
@@ -792,6 +794,7 @@ rtable_delete(unsigned int rtableid, str
struct rtentry  *mrt;
int  npaths = 0;
 #endif /* SMALL_KERNEL */
+   int   

Re: Two sunxi driver diffs

2016-08-15 Thread Mark Kettenis
> Date: Sun, 14 Aug 2016 13:59:00 +0200 (CEST)
> From: Mark Kettenis 
> 
> Here are two diffs for the sunxi armv7 platform that I can't test
> myself.  The first one is for sxiahci(4), and changes it to use the
> regulator API to power on the bus.  The second one is for sxie(4), and
> changes it to use the pinctrl API to select the appropriate pins.
> 
> If anybody with a Allwinner A10/A10s/A13 board (Cubiboard, pcDuino,
> Olinuxino A10/A10s/A13) could verify that this doesn't break their
> SATA or Ethernet it would be greatly appreciated.

FWIW, these diffs have committed now.  The SATA diff has been tested,
the Ethernet diff hasn't.  So testing of that one is still welcome.
But you can simply get the latest kernel sources from anoncvs or wait
until the next snapshot.

> Index: arch/armv7/sunxi/sxiahci.c
> ===
> RCS file: /cvs/src/sys/arch/armv7/sunxi/sxiahci.c,v
> retrieving revision 1.9
> diff -u -p -r1.9 sxiahci.c
> --- arch/armv7/sunxi/sxiahci.c5 Aug 2016 19:00:25 -   1.9
> +++ arch/armv7/sunxi/sxiahci.c14 Aug 2016 11:21:34 -
> @@ -33,9 +33,9 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  #include 
> +#include 
>  #include 
>  
>  #define  SXIAHCI_CAP 0x
> @@ -97,6 +97,7 @@ sxiahci_attach(struct device *parent, st
>   struct sxiahci_softc *sxisc = (struct sxiahci_softc *)self;
>   struct ahci_softc *sc = >sc;
>   struct fdt_attach_args *faa = aux;
> + uint32_t target_supply;
>   uint32_t timo;
>  
>   if (faa->fa_nreg < 1)
> @@ -165,8 +166,9 @@ sxiahci_attach(struct device *parent, st
>   SXIWRITE4(sc, SXIAHCI_RWC, 7);
>  
>   /* power up phy */
> - sxipio_setcfg(SXIAHCI_PWRPIN, SXIPIO_OUTPUT);
> - sxipio_setpin(SXIAHCI_PWRPIN);
> + target_supply = OF_getpropint(faa->fa_node, "target-supply", 0);
> + if (target_supply)
> + regulator_enable(target_supply);
>  
>   sc->sc_ih = arm_intr_establish_fdt(faa->fa_node, IPL_BIO,
>   ahci_intr, sc, sc->sc_dev.dv_xname);
> @@ -190,7 +192,8 @@ sxiahci_attach(struct device *parent, st
>  irq:
>   arm_intr_disestablish(sc->sc_ih);
>  clrpwr:
> - sxipio_clrpin(SXIAHCI_PWRPIN);
> + if (target_supply)
> + regulator_disable(target_supply);
>  dismod:
>   sxiccmu_disablemodule(CCMU_AHCI);
>   bus_space_unmap(sc->sc_iot, sc->sc_ioh, sc->sc_ios);
> Index: arch/armv7/sunxi/sxie.c
> ===
> RCS file: /cvs/src/sys/arch/armv7/sunxi/sxie.c,v
> retrieving revision 1.19
> diff -u -p -r1.19 sxie.c
> --- arch/armv7/sunxi/sxie.c   5 Aug 2016 22:19:23 -   1.19
> +++ arch/armv7/sunxi/sxie.c   14 Aug 2016 11:21:34 -
> @@ -49,9 +49,9 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  #include 
> +#include 
>  #include 
>  
>  /* configuration registers */
> @@ -215,6 +215,8 @@ sxie_attach(struct device *parent, struc
>   if (faa->fa_nreg < 1)
>   return;
>  
> + pinctrl_byname(faa->fa_node, "default");
> +
>   sc->sc_iot = faa->fa_iot;
>  
>   if (bus_space_map(sc->sc_iot, faa->fa_reg[0].addr,
> @@ -273,11 +275,9 @@ sxie_attach(struct device *parent, struc
>  void
>  sxie_socware_init(struct sxie_softc *sc)
>  {
> - int i, have_mac = 0;
> + int have_mac = 0;
>   uint32_t reg;
>  
> - for (i = 0; i < SXIPIO_EMAC_NPINS; i++)
> - sxipio_setcfg(i, 2); /* mux pins to EMAC */
>   sxiccmu_enablemodule(CCMU_EMAC);
>  
>   /* MII clock cfg */
> 
> 



Re: MP-safe L2 "lookup" w/o atomic operation

2016-08-15 Thread Martin Pieuchot
On 11/08/16(Thu) 16:04, Martin Pieuchot wrote:
> One of the remaining SMP issue with our routing table usage is to
> guarantee that the L2 entry referenced by a RTF_GATEWAY route via
> the ``rt_gwroute'' pointer wont be replaced/invalidated by another
> CPU while we are filling the address field of an Ethernet frame.
> 
> The most efficient way, performance wise, to do that is to make the
> ``rt_gwroute'' pointer immutable during the lifetime of a RTF_GATEWAY
> route.  If we know that this pointer won't change and that the memory
> it points to won't be freed as long as a CPU has reference to one of
> the RTF_GATEWAY routes, we don't need any special protection inside
> arp_resolve() and nd6_resolve().
> 
> Diff below achieve that by always resolving the next-hop entry before
> inserting the corresponding RTF_GATEWAY route in the tree.  In other
> words rt_setgwroute() is no longer called in the sending path.
> 
> To guarantee that a cached route won't be deleted while it is still
> referenced a new flag, RTF_CACHED, is added to the route.  arp(8) and
> ndp(8) treat entry with this flag like RTF_LOCAL.
> 
> Removing rt_setgwroute() from the sending path means that inserting a
> RTF_GATEWAY route will now fail if the next-hop cannot be resolve.  This
> might introduce regression for some setups but I see that has an
> improvement since the kernel no longer let you add a route that cannot
> be used.
> It also mean that stale routes need to be fixed whenever a new address
> is configured.  The diff does that and also plug an ifa leak that was
> happening when the same address is configured twice on an ifa.
> 
> Note that even with this diff there are *still* some MP-safeness issue
> due to stale routes, they will be address in a later diff.

Updated diff that:

 - Fix a panic reported by Hrvoje Popovski 

 - Properly invalidate ARP/NDP entry when issuing a RTM_DELETE from
   userland, as discussed with bluhm@.  For that I introduced
   RTM_INVALIDATE and use it instead of deleting L2 entries.  With
   this we no longer generate deletion/insertion for L2 entries, of
   course that include RTF_CACHED.

 - This has been tested with a pppoe(4) setup to make sure that
   inserting a route with a magic 0.0.0.1 gateway works, pointed out
   by claudio@.

More tests, comments, ok are welcome.

Index: net/route.c
===
RCS file: /cvs/src/sys/net/route.c,v
retrieving revision 1.313
diff -u -p -r1.313 route.c
--- net/route.c 22 Jul 2016 11:03:30 -  1.313
+++ net/route.c 15 Aug 2016 08:58:39 -
@@ -153,7 +153,9 @@ struct pool rtentry_pool;   /* pool for r
 struct poolrttimer_pool;   /* pool for rttimer structures */
 
 void   rt_timer_init(void);
-void   rt_setgwroute(struct rtentry *, u_int);
+intrt_setgwroute(struct rtentry *, u_int);
+void   rt_putgwroute(struct rtentry *);
+intrt_fixgwroute(struct rtentry *, void *, unsigned int);
 intrtflushclone1(struct rtentry *, void *, u_int);
 void   rtflushclone(unsigned int, struct rtentry *);
 intrt_if_remove_rtdelete(struct rtentry *, void *, u_int);
@@ -204,21 +206,20 @@ rtisvalid(struct rtentry *rt)
if (rt == NULL)
return (0);
 
-#ifdef DIAGNOSTIC
-   if (ISSET(rt->rt_flags, RTF_GATEWAY) && (rt->rt_gwroute != NULL) &&
-   ISSET(rt->rt_gwroute->rt_flags, RTF_GATEWAY))
-   panic("next hop must be directly reachable");
-#endif
-
-   if ((rt->rt_flags & RTF_UP) == 0)
+   if (!ISSET(rt->rt_flags, RTF_UP))
return (0);
 
/* Routes attached to stale ifas should be freed. */
if (rt->rt_ifa == NULL || rt->rt_ifa->ifa_ifp == NULL)
return (0);
 
-   if (ISSET(rt->rt_flags, RTF_GATEWAY) && !rtisvalid(rt->rt_gwroute))
-   return (0);
+#ifdef DIAGNOSTIC
+   if (ISSET(rt->rt_flags, RTF_GATEWAY)) {
+   KASSERT(rt->rt_gwroute != NULL);
+   KASSERT(ISSET(rt->rt_gwroute->rt_flags, RTF_UP));
+   KASSERT(!ISSET(rt->rt_gwroute->rt_flags, RTF_GATEWAY));
+   }
+#endif /* DIAGNOSTIC */
 
return (1);
 }
@@ -267,8 +268,6 @@ rt_match(struct sockaddr *dst, uint32_t 
return (rt);
 }
 
-struct rtentry *_rtalloc(struct sockaddr *, uint32_t *, int, unsigned int);
-
 #ifndef SMALL_KERNEL
 /*
  * Originated from bridge_hash() in if_bridge.c
@@ -349,16 +348,10 @@ rt_hash(struct rtentry *rt, struct socka
 struct rtentry *
 rtalloc_mpath(struct sockaddr *dst, uint32_t *src, unsigned int rtableid)
 {
-   return (_rtalloc(dst, src, RT_RESOLVE, rtableid));
+   return (rt_match(dst, src, RT_RESOLVE, rtableid));
 }
 #endif /* SMALL_KERNEL */
 
-struct rtentry *
-rtalloc(struct sockaddr *dst, int flags, unsigned int rtableid)
-{
-   return (_rtalloc(dst, NULL, flags, rtableid));
-}
-
 /*
  * Look in the routing table for the best matching entry for
  * ``dst''.
@@ -367,44 +360,35 @@ rtalloc(struct sockaddr *dst, int 

Re: httpd: be stricter with TLS configuration

2016-08-15 Thread Reyk Floeter
On Sat, Aug 13, 2016 at 02:57:14AM +1000, Joel Sing wrote:
> The following diff makes httpd stricter with respect to TLS configuration:
> 
> - Do not allow TLS and non-TLS to be configured on the same port.
> - Do not allow TLS options to be specified without a TLS listener.
> - Ensure that TLS options are the same when a server is specified on the
>   same address/port.
> 
> Currently, these configurations are permitted but do not work as intended.
> 
> This also factors out (and reuses) the server matching code that was already
> duplicated.
> 
> ok?
> 

- I think server_match() and server_tls_cmp() can both live in
server.c (server_match() somewhere close to server_foreach() - this
match function can be used for at least one other case outside of
parse.y).

- As discussed before, for consistency with the config, please use
"tls" instead of "TLS" in the log messages. 

FYI, The SNI diff doesn't like the tls_cert_file and tls_key_file
checks in server_tls_cmp(), as they now become valid, but they can be
removed/changed later.

Otherwise OK reyk@

> Index: parse.y
> ===
> RCS file: /cvs/src/usr.sbin/httpd/parse.y,v
> retrieving revision 1.78
> diff -u -p -r1.78 parse.y
> --- parse.y   21 Jun 2016 21:35:24 -  1.78
> +++ parse.y   12 Aug 2016 16:48:28 -
> @@ -107,8 +107,10 @@ int   host_if(const char *, struct addre
>  int   host(const char *, struct addresslist *,
>   int, struct portrange *, const char *, int);
>  void  host_free(struct addresslist *);
> +struct server*server_match(struct server *, int);
>  struct server*server_inherit(struct server *, struct server_config *,
>   struct server_config *);
> +int   server_tls_cmp(struct server *, struct server *);
>  int   getservice(char *);
>  int   is_if_in_group(const char *, const char *);
>  
> @@ -283,24 +285,13 @@ server  : SERVER optmatch STRING{
>  
>   TAILQ_INSERT_TAIL(>srv_hosts, srv_conf, entry);
>   } '{' optnl serveropts_l '}'{
> - struct server   *s = NULL, *sn;
> + struct server   *s, *sn;
>   struct server_config*a, *b;
>  
>   srv_conf = >srv_conf;
>  
> - TAILQ_FOREACH(s, conf->sc_servers, srv_entry) {
> - if ((s->srv_conf.flags &
> - SRVFLAG_LOCATION) == 0 &&
> - strcmp(s->srv_conf.name,
> - srv->srv_conf.name) == 0 &&
> - s->srv_conf.port == srv->srv_conf.port &&
> - sockaddr_cmp(
> - (struct sockaddr *)>srv_conf.ss,
> - (struct sockaddr *)>srv_conf.ss,
> - s->srv_conf.prefixlen) == 0)
> - break;
> - }
> - if (s != NULL) {
> + /* Check if the new server already exists. */
> + if (server_match(srv, 1) != NULL) {
>   yyerror("server \"%s\" defined twice",
>   srv->srv_conf.name);
>   serverconfig_free(srv_conf);
> @@ -315,16 +306,39 @@ server  : SERVER optmatch STRING{
>   YYERROR;
>   }
>  
> + if ((s = server_match(srv, 0)) != NULL) {
> + if ((s->srv_conf.flags & SRVFLAG_TLS) != 
> + (srv->srv_conf.flags & SRVFLAG_TLS)) {
> + yyerror("server \"%s\": TLS and "
> + "non-TLS on same address/port",
> + srv->srv_conf.name);
> + serverconfig_free(srv_conf);
> + free(srv);
> + YYERROR;
> + }
> + if (server_tls_cmp(s, srv) != 0) {
> + yyerror("server \"%s\": TLS "
> + "configuration mismatch on same "
> + "address/port",
> + srv->srv_conf.name);
> + serverconfig_free(srv_conf);
> + free(srv);
> + YYERROR;
> + }
> + }
> +
>   if ((srv->srv_conf.flags & SRVFLAG_TLS) &&
>   srv->srv_conf.tls_protocols == 0) {
> - yyerror("no TLS 

cvs(1): default to a)bort for empty commit log

2016-08-15 Thread Theo Buehler
This is one of the quirks of cvs that I have a hard time getting used to:

$ cvs commit

Log message unchanged or not specified
a)bort, c)ontinue, e)dit, !)reuse this message unchanged for remaining dirs
Action: (continue)

I don't think I'll ever want to c)ontinue here because I don't want to
commit anything without a log message.  I exit the editor without
writing the log file precisely because there might be something wrong,
something I need to go double check.  In that situation, I don't need
the extra thrill of making sure that I choose a non-default option so I
don't change anything.

Thus, I suggest to default to abort here.

Index: src/logmsg.c
===
RCS file: /var/cvs/src/gnu/usr.bin/cvs/src/logmsg.c,v
retrieving revision 1.4
diff -u -p -r1.4 logmsg.c
--- src/logmsg.c4 Mar 2012 04:05:15 -   1.4
+++ src/logmsg.c15 Aug 2016 03:25:27 -
@@ -346,7 +346,7 @@ do_editor (dir, messagep, repository, ch
{
(void) printf ("\nLog message unchanged or not specified\n");
(void) printf ("a)bort, c)ontinue, e)dit, !)reuse this message 
unchanged for remaining dirs\n");
-   (void) printf ("Action: (continue) ");
+   (void) printf ("Action: (abort) ");
(void) fflush (stdout);
line_length = get_line (, _chars_allocated, stdin);
if (line_length < 0)
@@ -358,14 +358,14 @@ do_editor (dir, messagep, repository, ch
error (1, 0, "aborting");
}
else if (line_length == 0
-|| *line == '\n' || *line == 'c' || *line == 'C')
-   break;
-   if (*line == 'a' || *line == 'A')
+|| *line == '\n' || *line == 'a' || *line == 'A')
{
if (unlink_file (fname) < 0)
error (0, errno, "warning: cannot remove temp file %s", 
fname);
error (1, 0, "aborted by user");
}
+   if (*line == 'c' || *line == 'C')
+   break;  
if (*line == 'e' || *line == 'E')
goto again;
if (*line == '!')



pppoe(4) IPv6 example

2016-08-15 Thread Martin Pieuchot
Using fe80:: as gateway is not be a good idea since by default
netstart(8) insert:

fe80::/10  ::1  UGRS0  0 32768 8 lo0  

As you can see this route is attached to lo0 *and* has RTF_GATEWAY.

This contradicts what rt_setgwroute() wants and make the kernel unhappy,
diff below fixes that.

ok?

Index: pppoe.4
===
RCS file: /cvs/src/share/man/man4/pppoe.4,v
retrieving revision 1.31
diff -u -p -r1.31 pppoe.4
--- pppoe.4 1 Apr 2016 05:47:34 -   1.31
+++ pppoe.4 15 Aug 2016 09:00:56 -
@@ -102,7 +102,7 @@ inet 0.0.0.0 255.255.255.255 NONE \e
 dest 0.0.0.1
 inet6 eui64
 !/sbin/route add default -ifp pppoe0 0.0.0.1
-!/sbin/route add -inet6 default -ifp pppoe0 fe80::
+!/sbin/route add -inet6 default -ifp pppoe0 fe80::%pppoe0
 .Ed
 .Pp
 The physical interface must also be marked



Re: Correct order of route removal

2016-08-15 Thread Martin Pieuchot
On 15/08/16(Mon) 10:25, Claudio Jeker wrote:
> On Mon, Aug 15, 2016 at 08:42:06AM +0200, Martin Pieuchot wrote:
> > On 08/08/16(Mon) 11:48, Martin Pieuchot wrote:
> > > The rtable_walk() & prio bug I just sent a fix for should theoretically
> > > not cause any trouble.  Sadly it is piled on top of another bug for
> > > which a fix is attached.
> > > 
> > > When an interface is removed the current code starts by purging all its
> > > corresponding route entries.  This is wrong because the per-AF code has
> > > some knowledge of which automagic route should be removed first.
> > > 
> > > In other words, the rtable_walk() hang should never have been triggered
> > > because the IPv4-specific code should take care of removing the
> > > RTF_BROADCAST entry.  I believe that this ordering problem is the reason
> > > why error code are ignored in AF-specific code paths.
> > > 
> > > Diff attached fixes that, ok?
> > 
> > Anyone?
> > 
> 
> IMO this is a bit of a catch-22 here. Until recently we did store less
> interface specific stuff in the routing table and so it made sense to
> clean up the routes first (the assumption was that this will first remove
> all gateway routes and probably also the arp/ndp caches). 

This was done when no RTF_LOCAL/RTF_BROADCAST existed.

>   This makes
> removing of the interface specific stuff simpler because the upper layers
> are already clean.

How does it make it simpler?

>Now we turn this upside down but then we may get burned
> because gateway routes are cleaned too late and put the routing table into
> an inconsitent state.

Which inconsistent state?  Cached route are still referenced.

>   I know you removed a fair amount of references that
> would cause crashes because of bad pointers. Need to update my mental
> model of the rtable / ifa / ifp interaction to see if this is indeed not
> possible.

Sure, you need to look at in_ifinit() and in_purgeaddr().  Currently
rt_ifa_dellocal() and the rt_ifa_del() are noop and return errors.
That's why errors are not checked.

>   At the same time I wonder if we need multiple walks to make this
> save and easier to understand... (replacing dragons with other smaller
> dragons is not that helpful because they grow over time).

I'm not sure to understand.

> I try to find time tonight to look into this.

Thanks.



Re: Correct order of route removal

2016-08-15 Thread Claudio Jeker
On Mon, Aug 15, 2016 at 08:42:06AM +0200, Martin Pieuchot wrote:
> On 08/08/16(Mon) 11:48, Martin Pieuchot wrote:
> > The rtable_walk() & prio bug I just sent a fix for should theoretically
> > not cause any trouble.  Sadly it is piled on top of another bug for
> > which a fix is attached.
> > 
> > When an interface is removed the current code starts by purging all its
> > corresponding route entries.  This is wrong because the per-AF code has
> > some knowledge of which automagic route should be removed first.
> > 
> > In other words, the rtable_walk() hang should never have been triggered
> > because the IPv4-specific code should take care of removing the
> > RTF_BROADCAST entry.  I believe that this ordering problem is the reason
> > why error code are ignored in AF-specific code paths.
> > 
> > Diff attached fixes that, ok?
> 
> Anyone?
> 

IMO this is a bit of a catch-22 here. Until recently we did store less
interface specific stuff in the routing table and so it made sense to
clean up the routes first (the assumption was that this will first remove
all gateway routes and probably also the arp/ndp caches). This makes
removing of the interface specific stuff simpler because the upper layers
are already clean. Now we turn this upside down but then we may get burned
because gateway routes are cleaned too late and put the routing table into
an inconsitent state. I know you removed a fair amount of references that
would cause crashes because of bad pointers. Need to update my mental
model of the rtable / ifa / ifp interaction to see if this is indeed not
possible. At the same time I wonder if we need multiple walks to make this
save and easier to understand... (replacing dragons with other smaller
dragons is not that helpful because they grow over time).

I try to find time tonight to look into this.

> > Index: net/if.c
> > ===
> > RCS file: /cvs/src/sys/net/if.c,v
> > retrieving revision 1.436
> > diff -u -p -r1.436 if.c
> > --- net/if.c13 Jul 2016 16:45:19 -  1.436
> > +++ net/if.c22 Jul 2016 12:45:28 -
> > @@ -931,7 +931,6 @@ if_detach(struct ifnet *ifp)
> >  #if NBPFILTER > 0
> > bpfdetach(ifp);
> >  #endif
> > -   rt_if_remove(ifp);
> > rti_delete(ifp);
> >  #if NETHER > 0 && defined(NFSCLIENT)
> > if (ifp->if_index == revarp_ifidx)
> > @@ -944,6 +943,7 @@ if_detach(struct ifnet *ifp)
> >  #ifdef INET6
> > in6_ifdetach(ifp);
> >  #endif
> > +   rt_if_remove(ifp);
> >  #if NPF > 0
> > pfi_detach_ifnet(ifp);
> >  #endif
> > @@ -1931,15 +1931,15 @@ ifioctl(struct socket *so, u_long cmd, c
> >  */
> > if (up)
> > if_down(ifp);
> > -   rt_if_remove(ifp);
> > rti_delete(ifp);
> >  #ifdef MROUTING
> > vif_delete(ifp);
> >  #endif
> > +   in_ifdetach(ifp);
> >  #ifdef INET6
> > in6_ifdetach(ifp);
> >  #endif
> > -   in_ifdetach(ifp);
> > +   rt_if_remove(ifp);
> > splx(s);
> > }
> >  
> 

-- 
:wq Claudio



Re: rtable_walk() hand and route prio

2016-08-15 Thread Claudio Jeker
On Mon, Aug 15, 2016 at 08:41:52AM +0200, Martin Pieuchot wrote:
> On 08/08/16(Mon) 11:42, Martin Pieuchot wrote:
> > On the train back from n2k16 I found the real cause of the hang reported
> > by Dimitris Papastamos [0] and exposed by our recent
> > changes to the routing table.
> > 
> > When an interface is removed/detached the kernel delete all the
> > corresponding route entries.  At this moment the interface is
> > DOWN and the corresponding route as well.  So the priority check
> > should consider that.
> > 
> > Without the diff attached or the workaround to stop iterating when
> > an error occurs, the kernel would loop forever since it can't remove
> > the RTF_BROADCAST entry.
> > 
> > [0] https://marc.info/?l=openbsd-bugs=146909621511954=2
> > 
> > ok?
> 
> Anyone?
> 

This is correct. OK claudio@

IIRC this is not the first time we go burned by RTP_MASK so I wonder if we
should use a macro to access prio that does the masking internally or
removing the DOWN prio but then then all checks need to also check for
RTF_UP...

> > Index: net/route.c
> > ===
> > RCS file: /cvs/src/sys/net/route.c,v
> > retrieving revision 1.313
> > diff -u -p -r1.313 route.c
> > --- net/route.c 22 Jul 2016 11:03:30 -  1.313
> > +++ net/route.c 8 Aug 2016 09:33:15 -
> > @@ -873,7 +873,7 @@ rtrequest_delete(struct rt_addrinfo *inf
> >  * kernel.
> >  */
> > if ((rt->rt_flags & (RTF_LOCAL|RTF_BROADCAST)) &&
> > -   prio != RTP_LOCAL) {
> > +   (prio & RTP_MASK) != RTP_LOCAL) {
> > rtfree(rt);
> > return (EINVAL);
> > }
> 

-- 
:wq Claudio



Re: a few userspace quad_t conversions

2016-08-15 Thread Martin Natano
On Sun, Aug 14, 2016 at 09:59:21PM -0700, Philip Guenther wrote:
> 
> du is an easy on, though the use of fts_number here reminded me that we 
> need to increase to a long long at some point like the other BSDs did.
> 
> hexdump is...subtle, as it both creates format strings on the fly and 
> *rewrites* them!  Turns out it was using %q for both of those, hah!  The 
> display.c change is a straight forward quad_t-->int64_t conversion, while 
> the parse.c change is to use %ll instead of %q.
> 
> systat just needs u_quad_t to be converted to uint64_t, but while we're 
> here let's convert bcopy() to memcpy() and bzero() to memset().
> 
> ok?

QK, reads fine to me. See one comment below.


> 
> PHilip
> 
> Index: usr.bin/du/du.c
> ===
> RCS file: /data/src/openbsd/src/usr.bin/du/du.c,v
> retrieving revision 1.31
> diff -u -p -r1.31 du.c
> --- usr.bin/du/du.c   10 Oct 2015 05:32:52 -  1.31
> +++ usr.bin/du/du.c   15 Aug 2016 04:08:25 -
> @@ -50,7 +50,7 @@
>  
>  
>  int   linkchk(FTSENT *);
> -void  prtout(quad_t, char *, int);
> +void  prtout(int64_t, char *, int);
>  void  usage(void);
>  
>  int
> @@ -59,7 +59,7 @@ main(int argc, char *argv[])
>   FTS *fts;
>   FTSENT *p;
>   long blocksize;
> - quad_t totalblocks;
> + int64_t totalblocks;
>   int ftsoptions, listfiles, maxdepth;
>   int Hflag, Lflag, cflag, hflag, kflag;
>   int ch, notused, rval;
> @@ -177,7 +177,7 @@ main(int argc, char *argv[])
>* root of a traversal, display the total.
>*/
>   if (p->fts_level <= maxdepth)
> - prtout((quad_t)howmany(p->fts_number,
> + prtout(howmany(p->fts_number,
>   (unsigned long)blocksize), p->fts_path,
>   hflag);
>   break;
> @@ -207,7 +207,7 @@ main(int argc, char *argv[])
>   if (errno)
>   err(1, "fts_read");
>   if (cflag) {
> - prtout((quad_t)howmany(totalblocks, blocksize), "total", hflag);
> + prtout(howmany(totalblocks, blocksize), "total", hflag);
>   }
>   fts_close(fts);
>   exit(rval);
> @@ -301,17 +301,17 @@ linkchk(FTSENT *p)
>  }
>  
>  void
> -prtout(quad_t size, char *path, int hflag)
> +prtout(int64_t size, char *path, int hflag)
>  {
>   if (!hflag)
> - (void)printf("%lld\t%s\n", (long long)size, path);
> + (void)printf("%lld\t%s\n", size, path);
>   else {
>   char buf[FMT_SCALED_STRSIZE];
>  
>   if (fmt_scaled(size * 512, buf) == 0)
>   (void)printf("%s\t%s\n", buf, path);
>   else
> - (void)printf("%lld\t%s\n", (long long)size, path);
> + (void)printf("%lld\t%s\n", size, path);
>   }
>  }
>  
> Index: usr.bin/hexdump/display.c
> ===
> RCS file: /data/src/openbsd/src/usr.bin/hexdump/display.c,v
> retrieving revision 1.24
> diff -u -p -r1.24 display.c
> --- usr.bin/hexdump/display.c 15 Mar 2016 04:19:13 -  1.24
> +++ usr.bin/hexdump/display.c 15 Aug 2016 04:16:35 -
> @@ -100,7 +100,7 @@ display(void)
>   for (pr = endfu->nextpr; pr; pr = pr->nextpr)
>   switch(pr->flags) {
>   case F_ADDRESS:
> - (void)printf(pr->fmt, (quad_t)eaddress);
> + (void)printf(pr->fmt, (int64_t)eaddress);
>   break;
>   case F_TEXT:
>   (void)printf("%s", pr->fmt);
> @@ -123,7 +123,7 @@ print(PR *pr, u_char *bp)
>  
>   switch(pr->flags) {
>   case F_ADDRESS:
> - (void)printf(pr->fmt, (quad_t)address);
> + (void)printf(pr->fmt, (int64_t)address);
>   break;
>   case F_BPAD:
>   (void)printf(pr->fmt, "");
> @@ -149,15 +149,15 @@ print(PR *pr, u_char *bp)
>   case F_INT:
>   switch(pr->bcnt) {
>   case 1:
> - (void)printf(pr->fmt, (quad_t)*bp);
> + (void)printf(pr->fmt, (int64_t)*bp);
>   break;
>   case 2:
>   memmove(, bp, sizeof(s2));
> - (void)printf(pr->fmt, (quad_t)s2);
> + (void)printf(pr->fmt, (int64_t)s2);
>   break;
>   case 4:
>   memmove(, bp, sizeof(s4));
> - (void)printf(pr->fmt, (quad_t)s4);
> + (void)printf(pr->fmt, (int64_t)s4);
>   break;
>   case 8:
>   memmove(, bp, sizeof(s8));
> @@ -180,15 +180,15 @@ print(PR *pr, u_char *bp)
>   case F_UINT:
>   switch(pr->bcnt) {
>

Re: Correct order of route removal

2016-08-15 Thread Martin Pieuchot
On 08/08/16(Mon) 11:48, Martin Pieuchot wrote:
> The rtable_walk() & prio bug I just sent a fix for should theoretically
> not cause any trouble.  Sadly it is piled on top of another bug for
> which a fix is attached.
> 
> When an interface is removed the current code starts by purging all its
> corresponding route entries.  This is wrong because the per-AF code has
> some knowledge of which automagic route should be removed first.
> 
> In other words, the rtable_walk() hang should never have been triggered
> because the IPv4-specific code should take care of removing the
> RTF_BROADCAST entry.  I believe that this ordering problem is the reason
> why error code are ignored in AF-specific code paths.
> 
> Diff attached fixes that, ok?

Anyone?

> Index: net/if.c
> ===
> RCS file: /cvs/src/sys/net/if.c,v
> retrieving revision 1.436
> diff -u -p -r1.436 if.c
> --- net/if.c  13 Jul 2016 16:45:19 -  1.436
> +++ net/if.c  22 Jul 2016 12:45:28 -
> @@ -931,7 +931,6 @@ if_detach(struct ifnet *ifp)
>  #if NBPFILTER > 0
>   bpfdetach(ifp);
>  #endif
> - rt_if_remove(ifp);
>   rti_delete(ifp);
>  #if NETHER > 0 && defined(NFSCLIENT)
>   if (ifp->if_index == revarp_ifidx)
> @@ -944,6 +943,7 @@ if_detach(struct ifnet *ifp)
>  #ifdef INET6
>   in6_ifdetach(ifp);
>  #endif
> + rt_if_remove(ifp);
>  #if NPF > 0
>   pfi_detach_ifnet(ifp);
>  #endif
> @@ -1931,15 +1931,15 @@ ifioctl(struct socket *so, u_long cmd, c
>*/
>   if (up)
>   if_down(ifp);
> - rt_if_remove(ifp);
>   rti_delete(ifp);
>  #ifdef MROUTING
>   vif_delete(ifp);
>  #endif
> + in_ifdetach(ifp);
>  #ifdef INET6
>   in6_ifdetach(ifp);
>  #endif
> - in_ifdetach(ifp);
> + rt_if_remove(ifp);
>   splx(s);
>   }
>  



Re: rtable_walk() hand and route prio

2016-08-15 Thread Martin Pieuchot
On 08/08/16(Mon) 11:42, Martin Pieuchot wrote:
> On the train back from n2k16 I found the real cause of the hang reported
> by Dimitris Papastamos [0] and exposed by our recent
> changes to the routing table.
> 
> When an interface is removed/detached the kernel delete all the
> corresponding route entries.  At this moment the interface is
> DOWN and the corresponding route as well.  So the priority check
> should consider that.
> 
> Without the diff attached or the workaround to stop iterating when
> an error occurs, the kernel would loop forever since it can't remove
> the RTF_BROADCAST entry.
> 
> [0] https://marc.info/?l=openbsd-bugs=146909621511954=2
> 
> ok?

Anyone?

> Index: net/route.c
> ===
> RCS file: /cvs/src/sys/net/route.c,v
> retrieving revision 1.313
> diff -u -p -r1.313 route.c
> --- net/route.c   22 Jul 2016 11:03:30 -  1.313
> +++ net/route.c   8 Aug 2016 09:33:15 -
> @@ -873,7 +873,7 @@ rtrequest_delete(struct rt_addrinfo *inf
>* kernel.
>*/
>   if ((rt->rt_flags & (RTF_LOCAL|RTF_BROADCAST)) &&
> - prio != RTP_LOCAL) {
> + (prio & RTP_MASK) != RTP_LOCAL) {
>   rtfree(rt);
>   return (EINVAL);
>   }