Re: Infinite loop in uvm protection mapping

2020-10-19 Thread Philip Guenther
On Mon, Oct 19, 2020 at 3:13 PM Tom Rollet  wrote:

> Hi,
>
> I'm starting to help in the development of the dt device.
>
> I'm stuck on permission handling of memory. I'm trying to allocate a
> page in kernel with read/write protections, fill the allocated page
> with data then change the permissions to  read/exec.
>
> Snippet of my code:
>
>   addr = uvm_km_alloc(kernel_map, PAGE_SIZE);
>
>  [...] (memcpy data in allocated page)
>
>   uvm_map_protect(kernel_map, addr, addr + PAGE_SIZE, PROT_READ
>  | PROT_EXEC, FALSE)))
>

This is same usage as seen in the 'sti' driver...which is on hppa only, so
while it's presumably the correct usage of uvm_km_alloc() and
uvm_map_protect(), I don't think uvm_map_protect() has been used on
kernel-space on amd64 (or possibly all non-hppa archs) before in OpenBSD.
Whee?


It triggers the following error at boot time when executing
> the uvm_map_protect function.
>
> uvm_fault(0x81fb2c90, 0x7ffec0008000, 0, 2) -> e kernel: page fault
> trap, code=0 Stopped atpmap_write_protect+0x1f5:  lock andq
> $-0x3,0(%rdi)
>
> Trace:
>
> pmap_write_protect(82187b28,80002255b000,80002255c000,
>  5,50e8b70481f4f622,fd81b6567e70) at pmap_write_protect+0x212
> uvm_map_protect(82129ae0,80002255b000,80002255c000
>  ,5,0,82129ae0) at uvm_map_protect+0x501
> dt_alloc_kprobe(815560e0,80173900,e7ef01a2855152cc,
>  82395c98,0,815560e0) at dt_alloc_kprobe+0x1ff
> dt_prov_kprobe_init(2333e28db00d3edd,0,82121150,0,0,
>  824d9008) at dt_prov_kprobe_init+0x1d9
> dtattach(1,821fb384,f,1,c2ee1c3f472154e,2dda28) at dtattach+0x5d
> main(0,0,0,0,0,1) at main+0x419
>
> The problem comes from the loop in pmap_write_protect
> (sys/arch/amd64/amd64/pmap.c:2108) that is executed
> infinity in my case.
>
> Entry of function pmap_write_protect:
>  sva:  80002250A000
>  eva:  80002250B000
>
> After &= PG_FRAME (line 2098-2099)
>  sva= F80002250A000
>  eva= F80002250B000
>
>   loop:  (ligne 2108)
>
>   first iteration:
>  va   = F80002250A000
>  eva = F80002250B000
>  blockend = 080012240
>
...

> Does anyone have an idea how to fix this issue?


So, blockend is clearly wrong for va and eva.  I suspect the use of
L2_FRAME here:
   blockend = (va & L2_FRAME) + NBPD_L2;

is wrong here and it should be
   blockend = (va & VA_SIGN_NEG(L2_FRAME)) + NBPD_L2;

or some equivalent expression to keep all the bits above the frame.


Philip Guenther


Re: [PATCH] umb(4) fix for X20 (DW5821e) in Dell Latitude 7300

2020-10-19 Thread Bryan Vyhmeister
On Fri, Oct 02, 2020 at 12:33:15PM -0700, Bryan Vyhmeister wrote:
> On Wed, Sep 30, 2020 at 04:05:51PM -0700, Bryan Vyhmeister wrote:
> > Gerhard Roth provided a patch to me to get the Qualcomm Snapdragon X20
> > umb(4) interface in my Dell Latitude 7300 working. It is also known as a
> > Dell DW5821e interface. I have been using this patch for months now with
> > AT&T Wireless in the US and it works great just as I have been used to
> > on my ThinkPad X270 with the older umb(4) interface. This is how it
> > shows up now in my dmesg:
> > 
> > umb0 at uhub0 port 16 "Dell Inc. DW5821e Snapdragon X20 LTE" rev
> > 3.10/3.18 addr 2
> > 
> > I would love to get this committed at some point so I no longer have to
> > keep compiling a new kernel for every snapshot to have umb(4) working.
> 
> Thanks to jmc@ for suggesting a man page diff as well. Both the diff for
> if_umb.c and umb.4 are below.

Ping.

Bryan



Index: share/man/man4/umb.4
===
RCS file: /cvs/src/share/man/man4/umb.4,v
retrieving revision 1.11
diff -u -p -r1.11 umb.4
--- share/man/man4/umb.412 May 2020 13:03:52 -  1.11
+++ share/man/man4/umb.42 Oct 2020 19:30:53 -
@@ -44,6 +44,7 @@ PIN again even if the system was reboote
 The following devices should work:
 .Pp
 .Bl -tag -width Ds -offset indent -compact
+.It Dell DW5821e
 .It Ericsson H5321gw and N5321gw
 .It Fibocom L831-EAU
 .It Medion Mobile S4222 (MediaTek OEM)
Index: sys/dev/usb/if_umb.c
===
RCS file: /cvs/src/sys/dev/usb/if_umb.c,v
retrieving revision 1.36
diff -u -p -r1.36 if_umb.c
--- sys/dev/usb/if_umb.c22 Jul 2020 02:16:02 -  1.36
+++ sys/dev/usb/if_umb.c2 Oct 2020 19:30:53 -
@@ -1,4 +1,4 @@
-/* $OpenBSD: if_umb.c,v 1.36 2020/07/22 02:16:02 dlg Exp $ */
+/* $OpenBSD: if_umb.c,v 1.34 2020/05/04 14:41:03 gerhard Exp $ */
 
 /*
  * Copyright (c) 2016 genua mbH
@@ -225,6 +225,26 @@ const struct cfattach umb_ca = {
 int umb_delay = 4000;
 
 /*
+ * Normally, MBIM devices are detected by their interface class and subclass.
+ * But for some models that have multiple configurations, it is better to
+ * match by vendor and product id so that we can select the desired
+ * configuration ourselves.
+ *
+ * OTOH, some devices identifiy themself als an MBIM device but fail to speak
+ * the MBIM protocol.
+ */
+struct umb_products {
+   struct usb_devno dev;
+   int  confno;
+};
+const struct umb_products umb_devs[] = {
+   { { USB_VENDOR_DELL, 0x81d7 }, 2 }, /* XXX FIXME */
+};
+
+#define umb_lookup(vid, pid)   \
+   ((const struct umb_products *)usb_lookup(umb_devs, vid, pid))
+
+/*
  * These devices require an "FCC Authentication" command.
  */
 const struct usb_devno umb_fccauth_devs[] = {
@@ -263,6 +283,8 @@ umb_match(struct device *parent, void *m
struct usb_attach_arg *uaa = aux;
usb_interface_descriptor_t *id;
 
+   if (umb_lookup(uaa->vendor, uaa->product) != NULL)
+   return UMATCH_VENDOR_PRODUCT;
if (!uaa->iface)
return UMATCH_NONE;
if ((id = usbd_get_interface_descriptor(uaa->iface)) == NULL)
@@ -315,6 +337,43 @@ umb_attach(struct device *parent, struct
sc->sc_ctrl_ifaceno = uaa->ifaceno;
ml_init(&sc->sc_tx_ml);
 
+   if (uaa->configno < 0) {
+   /*
+* In case the device was matched by VID/PID instead of
+* InterfaceClass/InterfaceSubClass, we have to pick the
+* correct configuration ourself.
+*/
+   uaa->configno = umb_lookup(uaa->vendor, uaa->product)->confno;
+   DPRINTF("%s: switching to config #%d\n", DEVNAM(sc),
+   uaa->configno);
+   status = usbd_set_config_no(sc->sc_udev, uaa->configno, 1);
+   if (status) {
+   printf("%s: failed to switch to config #%d: %s\n",
+   DEVNAM(sc), uaa->configno, usbd_errstr(status));
+   goto fail;
+   }
+   usbd_delay_ms(sc->sc_udev, 200);
+
+   /*
+* Need to do some manual setups that usbd_probe_and_attach()
+* would do for us otherwise.
+*/
+   uaa->nifaces = uaa->device->cdesc->bNumInterface;
+   for (i = 0; i < uaa->nifaces; i++) {
+   if (usbd_iface_claimed(sc->sc_udev, i))
+   continue;
+   id = 
usbd_get_interface_descriptor(&uaa->device->ifaces[i]);
+   if (id != NULL && id->bInterfaceClass == UICLASS_CDC &&
+   id->bInterfaceSubClass ==
+   UISUBCLASS_MOBILE_BROADBAND_INTERFACE_MODEL) {
+   uaa->iface = &uaa->device->ifaces[i];
+  

Re: net.inet.ip.forwarding=0 vs lo(4)

2020-10-19 Thread David Gwynne
On Mon, Oct 19, 2020 at 07:16:46PM +1000, David Gwynne wrote:
> On Mon, Oct 19, 2020 at 10:03:29AM +0100, Stuart Henderson wrote:
> > On 2020/10/19 11:47, David Gwynne wrote:
> > > On Sun, Oct 18, 2020 at 08:57:34PM +0100, Stuart Henderson wrote:
> > > > On 2020/10/18 14:04, David Gwynne wrote:
> > > > > the problem i'm hitting is that i have a multihomed box where the
> > > > > service it provides listens on an IP address that's assigned to lo1.
> > > > > it's a host running a service, it's not a router, so the
> > > > > net.inet.ip.forwarding sysctl is not set to 1.
> > > > 
> > > > I ran into this, I just turned on the forwarding sysctl to avoid the
> > > > problem.
> > > > 
> > > > > i came up with this diff, which adds even more special casing for
> > > > > loopback interfaces. it says addreesses on loopbacks are globally
> > > > > reachable, even if ip forwarding is disabled.
> > > > 
> > > > I don't see why loopbacks should be special. Another place this
> > > > might show up is services running on carp addresses (I haven't updated
> > > > those machines yet but there's a fair chance they'll be affected too).
> > > > I would prefer an explicit sysctl to disable "strong host model".
> > > 
> > > loopback is already special. if a packet comes from an loopback
> > > interface, we allow it to talk to any IP on the local machine. i think
> > > this is mostly to cope with the semantic we've had where local traffic
> > > get's tied to a loopback interface instead of going anywhere near the
> > > physical ones.
> > > 
> > > carp is also special.
> > > 
> > > let me paste the ip_laddr function instead of the diff to it, it's a bit
> > > more obvious what's going on:
> > 
> > Thanks, that will already work for the machines I was thinking of then.
> > 
> > > back to loopback and receiving packets. loopback is special because it
> > > is not connected to the outside world. it is impossible to send a packet
> > > via a loopback interface from another host, so configuring a globally
> > > (externally) routable IP on it is currently pointless unless you enable
> > > forwarding. i think making loopback more special and allowing it
> > > to be globally reachable makes sense. i can't think of any downsides to
> > > this at the moment, except that the behaviour would be subtle/not
> > > obvious
> > 
> > ok, so it makes sense for this to be independent of any possible
> > separate lever.
> > 
> > > is there a need to configure a globally reachable IP on a non-loopback
> > > interface on a host (not router)? if so, then i'd be more convinced that
> > > we need a separate lever to pull.
> > 
> > I'm not using it this way, but here's a scenario.
> > 
> > Say there are a couple of webservers with addresses from a carp on
> > ethernet/vlan, with a link to their upstream router on some separate
> > interface. They announce the carp prefix into ospf.
> 
> so carp is just being used to elect a webserver as a master, and then
> the result of that election is fed upstream.
> 
> > They aren't routing themselves so the only reason to have forwarding=1
> > is to have them use "weak host model".
> > 
> > With forwarding=0 I think they'll have to use "stub router no" otherwise
> > everything will be announced high metric (rather than being dependent on
> > carp state), but ospfd explicitly handles this; it's marked in parse.y
> > with "/* allow to force non stub mode */".
> 
> so is a Big Global Lever what you want here? if you enable weak host
> mode, all IPs on the host will be addressible from all legs of the
> host. would it make more sense to configure specific interfaces as
> holding globally addressible IPs?
> 
> if my understanding of your scenario is right, you could configure
> the carp interface with the weak or globally accessible flag. in
> my situation i could configure that on lo1.

such a diff looks like this. it adds a "global" flag that you can set on
interfaces.

Index: sbin/ifconfig/ifconfig.c
===
RCS file: /cvs/src/sbin/ifconfig/ifconfig.c,v
retrieving revision 1.429
diff -u -p -r1.429 ifconfig.c
--- sbin/ifconfig/ifconfig.c7 Oct 2020 14:38:54 -   1.429
+++ sbin/ifconfig/ifconfig.c20 Oct 2020 00:12:06 -
@@ -468,6 +468,8 @@ const structcmd {
{ "-autoconfprivacy",   IFXF_INET6_NOPRIVACY,   0,  setifxflags },
{ "soii",   -IFXF_INET6_NOSOII, 0,  setifxflags },
{ "-soii",  IFXF_INET6_NOSOII,  0,  setifxflags },
+   { "global", IFXF_GLOBAL,0,  setifxflags },
+   { "-global",-IFXF_GLOBAL,   0,  setifxflags },
 #ifndef SMALL
{ "hwfeatures", NEXTARG0,   0,  printifhwfeatures },
{ "metric", NEXTARG,0,  setifmetric },
@@ -675,7 +677,7 @@ const structcmd {
"\7RUNNING\10NOARP\11PROMISC\12ALLMULTI\13OACTIVE\14SIMPLEX"\
"\15LINK0\16LINK1\17LINK2\20MULTICAST"   

Infinite loop in uvm protection mapping

2020-10-19 Thread Tom Rollet

Hi,

I'm starting to help in the development of the dt device.

I'm stuck on permission handling of memory. I'm trying to allocate a
page in kernel with read/write protections, fill the allocated page
with data then change the permissions to  read/exec.

Snippet of my code:

 addr = uvm_km_alloc(kernel_map, PAGE_SIZE);

    [...] (memcpy data in allocated page)

 uvm_map_protect(kernel_map, addr, addr + PAGE_SIZE, PROT_READ
                                            | PROT_EXEC, FALSE)))

It triggers the following error at boot time when executing
the uvm_map_protect function.

uvm_fault(0x81fb2c90, 0x7ffec0008000, 0, 2) -> e kernel: page fault
trap, code=0 Stopped at    pmap_write_protect+0x1f5:  lock andq
$-0x3,0(%rdi)

Trace:

pmap_write_protect(82187b28,80002255b000,80002255c000,
    5,50e8b70481f4f622,fd81b6567e70) at pmap_write_protect+0x212
uvm_map_protect(82129ae0,80002255b000,80002255c000
    ,5,0,82129ae0) at uvm_map_protect+0x501
dt_alloc_kprobe(815560e0,80173900,e7ef01a2855152cc,
    82395c98,0,815560e0) at dt_alloc_kprobe+0x1ff
dt_prov_kprobe_init(2333e28db00d3edd,0,82121150,0,0,
    824d9008) at dt_prov_kprobe_init+0x1d9
dtattach(1,821fb384,f,1,c2ee1c3f472154e,2dda28) at dtattach+0x5d
main(0,0,0,0,0,1) at main+0x419

The problem comes from the loop in pmap_write_protect
(sys/arch/amd64/amd64/pmap.c:2108) that is executed
infinity in my case.

Entry of function pmap_write_protect:
    sva:  80002250A000
    eva:  80002250B000

After &= PG_FRAME (line 2098-2099)
    sva= F80002250A000
    eva= F80002250B000

 loop:  (ligne 2108)

     first iteration:
        va       = F80002250A000
        eva     = F80002250B000
        blockend = 080012240

    second iteration:
        va       = 080012240
        eva     = F80002250B000
        blockend = 080022240

We can see that the problem is that the "va" variable has lost
her 4 upper set bits (48 to 51).
So now the comparison between "va" and "eva" is always false and it
loops indefinitely because "va" 4 upper bits are clean each iteration.

The fix that I found is to clear the 4 same bits in "eva".
It can be done with this mask:
      new_mask =  F000
      PG_FRAME =  000FF000


Diff with quick dirty fix:

diff --git a/sys/arch/amd64/amd64/pmap.c b/sys/arch/amd64/amd64/pmap.c
index 10ab3da2949..2783c9d26a5 100644
--- a/sys/arch/amd64/amd64/pmap.c
+++ b/sys/arch/amd64/amd64/pmap.c
@@ -2105,10 +2105,12 @@ pmap_write_protect(struct pmap *pmap, vaddr_t sva, 
vaddr_t eva, vm_prot_t prot)

    if ((eva - sva > 32 * PAGE_SIZE) && sva < VM_MIN_KERNEL_ADDRESS)
    shootall = 1;

-   for (va = sva; va < eva ; va = blockend) {
-   blockend = (va & L2_FRAME) + NBPD_L2;
-   if (blockend > eva)
-   blockend = eva;
+    vaddr_t tmp_eva = eva & 0xF000UL;
+    vaddr_t tmp_sva = sva & 0xF000UL;
+    for (va = tmp_sva; va < tmp_eva ; va = blockend) {
+    blockend = (va & L2_FRAME) + NBPD_L2;
+    if (blockend > tmp_eva)
+    blockend = tmp_eva;

    /*
 * XXXCDC: our PTE mappings should never be write-protected!


The problem is solved at boot time (I am able to boot without crashing).
But during the execution, we I jump on the page
(with read and exec permissions), the OS completely freeze without
error message, I can't launch the kernel debugger and I don't know how
debug it further.

Does anyone have an idea how to fix this issue?

Thanks

--
Tom Rollet



Re: push NET_LOCK() down in pf_ioctl.c

2020-10-19 Thread Alexandr Nedvedicky
Hello Klemens,


> > the change is fairly large, but mostly mechanical.
> Relocating malloc(9) and pool(9) seems good but other pf_*() calls are
> now locked inconsistently after your diff.
> 
> Given that only reason about "allocations" this is either an oversight
> and should be fixed or you need to provide more explanation.
> 
> See inline for the spots I'm talking about.

thanks for catching this. there was unfinished clean up in a branch
I took diff from.

Currently we need to keep pf_rm_rule() under both locks. The function
might be calling pf_tag_unref(), pf_dynaddr_remove()... which alter lists,
which are currently supposed to be protected by PF_LOCK()/NET_LOCK().

updated diff is below. pf_rm_rule() is being called with both locks held.


thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
index ef7d995e5a7..3a75cd05005 100644
--- a/sys/net/pf_ioctl.c
+++ b/sys/net/pf_ioctl.c
@@ -1006,10 +1006,10 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
return (EACCES);
}
 
-   NET_LOCK();
switch (cmd) {
 
case DIOCSTART:
+   NET_LOCK();
PF_LOCK();
if (pf_status.running)
error = EEXIST;
@@ -1025,9 +1025,11 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
DPFPRINTF(LOG_NOTICE, "pf: started");
}
PF_UNLOCK();
+   NET_UNLOCK();
break;
 
case DIOCSTOP:
+   NET_LOCK();
PF_LOCK();
if (!pf_status.running)
error = ENOENT;
@@ -1038,6 +1040,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
DPFPRINTF(LOG_NOTICE, "pf: stopped");
}
PF_UNLOCK();
+   NET_UNLOCK();
break;
 
case DIOCGETQUEUES: {
@@ -1045,6 +1048,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
struct pf_queuespec *qs;
u_int32_tnr = 0;
 
+   NET_LOCK();
PF_LOCK();
pq->ticket = pf_main_ruleset.rules.active.ticket;
 
@@ -1056,6 +1060,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
}
pq->nr = nr;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
 
@@ -1064,10 +1069,12 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
struct pf_queuespec *qs;
u_int32_tnr = 0;
 
+   NET_LOCK();
PF_LOCK();
if (pq->ticket != pf_main_ruleset.rules.active.ticket) {
error = EBUSY;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
 
@@ -1078,10 +1085,12 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
if (qs == NULL) {
error = EBUSY;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
memcpy(&pq->queue, qs, sizeof(pq->queue));
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
 
@@ -1091,10 +1100,12 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
u_int32_tnr;
int  nbytes;
 
+   NET_LOCK();
PF_LOCK();
if (pq->ticket != pf_main_ruleset.rules.active.ticket) {
error = EBUSY;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
nbytes = pq->nbytes;
@@ -1107,6 +1118,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
if (qs == NULL) {
error = EBUSY;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
memcpy(&pq->queue, qs, sizeof(pq->queue));
@@ -1121,6 +1133,7 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
if (error == 0)
pq->nbytes = nbytes;
PF_UNLOCK();
+   NET_UNLOCK();
break;
}
 
@@ -1128,38 +1141,44 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, 
struct proc *p)
struct pfioc_queue  *q = (struct pfioc_queue *)addr;
struct pf_queuespec *qs;
 
-   PF_LOCK();
-   if (q->ticket != pf_main_ru

Re: pf route-to issues

2020-10-19 Thread David Gwynne
On Mon, Oct 19, 2020 at 12:33:25PM +0100, Stuart Henderson wrote:
> On 2020/10/19 19:53, David Gwynne wrote:
> > On Mon, Oct 19, 2020 at 09:34:31AM +0100, Stuart Henderson wrote:
> > > On 2020/10/19 15:35, David Gwynne wrote:
> > > > every few years i try and use route-to in pf, and every time it
> > > > goes badly. i tried it again last week in a slightly different
> > > > setting, and actually tried to understand the sharp edges i hit
> > > > this time instead of giving up. it turns out there are 2 or 3
> > > > different things together that have cause me trouble, which is why
> > > > the diff below is so big.
> > > 
> > > I used to route-to/reply-to quite a lot at places with poor internet
> > > connections to split traffic between lines (mostly those have better
> > > connections now so I don't need it as often). It worked as I expected -
> > > but I only ever used it with the interface specified.
> > 
> > cool. did it work beyond the first packet in a connection?
> 
> It must have done. The webcams would have utterly broken the rest of
> traffic if it hadn't :)


> 
> > > I mostly used it with pppoe interfaces so the peer address was unknown
> > > at ruleset load time. (I was lucky and had static IPs my side, but the
> > > ISP side was variable). I relied on the fact that once packets are
> > > directed at a point-point interface there's only one place for them to
> > > go. I didn't notice that ":peer" might be useful here (and the syntax
> > > 'route-to pppoe1:peer@pppoe1' is pretty awkward so I probably wouldn't
> > > have come up with it), I had 0.0.0.1@pppoe1, 0.0.0.2@pppoe2 etc
> > > (though actually I think it works with $any_random_address@pppoeX).
> > 
> > yes. i was trying to use it with peers over ethernet, and always
> > struggled with the syntax.
> > 
> > > > the first and i would argue most fundamental problem is a semantic
> > > > problem. if you ask a random person who has some clue about networks
> > > > and routing what they would expect the "argument" to route-to or
> > > > reply-to to be, they would say "a nexthop address" or "a gateway
> > > > address". eg, say i want to force packets to a specific backend
> > > > server without using NAT, i would write a rule like this:
> > > > 
> > > >   n_servers="192.0.2.128/27"
> > > >   pass out on $if_internal to $n_servers route-to 192.168.0.1
> > > > 
> > > > pfctl will happily parse this, shove it into the kernel, let you read
> > > > the rules back out again with pfctl -sr, and it all looks plausible, but
> > > > it turns out that it's using the argument to route-to as an interface
> > > > name. because rulesets can refer to interfaces that don't exist yet, pf
> > > > just passes the IP address around as a string, hoping i'll plug in an
> > > > interface with a driver name that looks like an ip address. i spent
> > > > literally a day trying to figure out why a rule like this wasn't
> > > > working.
> > > 
> > > I don't think I tried this, but the pf.conf(5) BNF syntax suggests it's
> > > supposed to work. So either doc or implementation bug there.
> > 
> > im leaning toward implementation bug.
> > 
> > >  route  = ( "route-to" | "reply-to" | "dup-to" )
> > >   ( routehost | "{" routehost-list "}" )
> > >   [ pooltype ]
> > > 
> > >  routehost-list = routehost [ [ "," ] routehost-list ]
> > > 
> > >  routehost  = host | host "@" interface-name |
> > >   "(" interface-name [ address [ "/" mask-bits ] ] ")"
> > > 
> > > > the second problem is that the pf_route calls from pfsync don't
> > > > have all the information it is supposed to have. more specifically,
> > > > an ifp pointer isn't set which leads to a segfault. the ifp pointer
> > > > isn't set because pfsync doesnt track which interface a packet is
> > > > going out, it assumes the ip layer will get it right again later, or a
> > > > rule provided something usable.
> > > > 
> > > > the third problem is that pf_route relies on information from rules to
> > > > work correctly. this is a problem in a pfsync environment because you
> > > > cannot have the same ruleset on both firewalls 100% of the time, which
> > > > means you cannot have route-to/reply-to behave consistently on a pair of
> > > > firwalls 100% of the time.
> > > 
> > > I didn't run into this because pppoe(4) and pfsync/carp don't really
> > > go well together, but ouch!
> > > 
> > > > all of this together makes things work pretty obviously and smoothly.
> > > > in my opinion anyway. route-to now works more like rdr-to, it just
> > > > feels like it changes the address used for the route lookup rather
> > > > than changing the actual IP address in the packet. it also works
> > > > predictably in a pfsync pair, which is great from the point of view of
> > > > high availability.
> > > > 
> > > > the main caveat is that it's not backward compatible. if you're already
> > > > using route-to, you will need to tweak your rules to have them par

Re: basename(3) should have non-const arg, says POSIX

2020-10-19 Thread Todd C . Miller
On Mon, 19 Oct 2020 22:06:52 +0200, Christian Weisgerber wrote:

> The patch below aligns the function prototypes with POSIX.  All
> resulting warnings "passing 'const char *' to parameter of type
> 'char *' discards qualifiers" in the base system have been cleaned
> up.  It successfully passes "make release".  For good measure I'm
> also running a package bulk build with it as we speak.

OK millert@

 - todd



Re: basename(3) should have non-const arg, says POSIX

2020-10-19 Thread Stefan Sperling
On Mon, Oct 19, 2020 at 10:06:52PM +0200, Christian Weisgerber wrote:
> [Picking this up again after a month:]
> 
> Our basename(3) and dirname(3) take a const argument:
> 
>   char*basename(const char *);
>   char*dirname(const char *);
> 
> POSIX says otherwise...
> 
>   char *basename(char *path);
>   char *dirname(char *path);
> 
> ... and explicitly says the functions may modify the input string.
> 
> 
> Our functions were const-ified in 1999 by espie@:
> 
>   proper const semantics for dirname & basename.
>   (this follows FreeBSD and Linux. Single Unix 2 is still illogical)
> 
> Well, four years ago, FreeBSD finally switched to the POSIX prototypes
> https://svnweb.freebsd.org/base/head/include/libgen.h?revision=303451&view=markup
> and in fact now has an implementation that modifies the string.
> 
> Linux (GNU libc) has a bizarro solution where you get a const basename()
> and no dirname() if you just include , but POSIX basename()
> and dirname() if you instead or also include .
> 
> This is a portability trap.  Code written on OpenBSD may not be
> prepared for basename() or dirname() to splat a '\0' into the input
> string, despite the warning in the man page.  This is not hypothetical.
> Both Got and OpenCVS have fallen victim to the unportable assumption.
> 
> 
> The patch below aligns the function prototypes with POSIX.  All
> resulting warnings "passing 'const char *' to parameter of type
> 'char *' discards qualifiers" in the base system have been cleaned
> up.  It successfully passes "make release".  For good measure I'm
> also running a package bulk build with it as we speak.
> 
> OK?

Ok by me.

> Index: include/libgen.h
> ===
> RCS file: /cvs/src/include/libgen.h,v
> retrieving revision 1.9
> diff -u -p -r1.9 libgen.h
> --- include/libgen.h  25 Jan 2019 00:19:25 -  1.9
> +++ include/libgen.h  11 Sep 2020 20:41:34 -
> @@ -22,8 +22,8 @@
>  #include 
>  
>  __BEGIN_DECLS
> -char *basename(const char *);
> -char *dirname(const char *);
> +char *basename(char *);
> +char *dirname(char *);
>  __END_DECLS
>  
>  #endif /* _LIBGEN_H_ */
> Index: lib/libc/gen/basename.3
> ===
> RCS file: /cvs/src/lib/libc/gen/basename.3,v
> retrieving revision 1.24
> diff -u -p -r1.24 basename.3
> --- lib/libc/gen/basename.3   25 Jan 2019 00:19:25 -  1.24
> +++ lib/libc/gen/basename.3   11 Sep 2020 20:46:30 -
> @@ -23,7 +23,7 @@
>  .Sh SYNOPSIS
>  .In libgen.h
>  .Ft char *
> -.Fn basename "const char *path"
> +.Fn basename "char *path"
>  .Sh DESCRIPTION
>  The
>  .Fn basename
> Index: lib/libc/gen/basename.c
> ===
> RCS file: /cvs/src/lib/libc/gen/basename.c,v
> retrieving revision 1.16
> diff -u -p -r1.16 basename.c
> --- lib/libc/gen/basename.c   25 Jan 2019 00:19:25 -  1.16
> +++ lib/libc/gen/basename.c   11 Sep 2020 20:43:13 -
> @@ -22,7 +22,7 @@
>  #include 
>  
>  char *
> -basename(const char *path)
> +basename(char *path)
>  {
>   static char bname[PATH_MAX];
>   size_t len;
> Index: lib/libc/gen/dirname.3
> ===
> RCS file: /cvs/src/lib/libc/gen/dirname.3,v
> retrieving revision 1.23
> diff -u -p -r1.23 dirname.3
> --- lib/libc/gen/dirname.38 Mar 2019 17:33:23 -   1.23
> +++ lib/libc/gen/dirname.311 Sep 2020 20:47:08 -
> @@ -23,7 +23,7 @@
>  .Sh SYNOPSIS
>  .In libgen.h
>  .Ft char *
> -.Fn dirname "const char *path"
> +.Fn dirname "char *path"
>  .Sh DESCRIPTION
>  The
>  .Fn dirname
> Index: lib/libc/gen/dirname.c
> ===
> RCS file: /cvs/src/lib/libc/gen/dirname.c,v
> retrieving revision 1.16
> diff -u -p -r1.16 dirname.c
> --- lib/libc/gen/dirname.c25 Jan 2019 00:19:25 -  1.16
> +++ lib/libc/gen/dirname.c11 Sep 2020 20:43:34 -
> @@ -24,7 +24,7 @@
>  /* A slightly modified copy of this file exists in libexec/ld.so */
>  
>  char *
> -dirname(const char *path)
> +dirname(char *path)
>  {
>   static char dname[PATH_MAX];
>   size_t len;
> -- 
> Christian "naddy" Weisgerber  na...@mips.inka.de
> 
> 



basename(3) should have non-const arg, says POSIX

2020-10-19 Thread Christian Weisgerber
[Picking this up again after a month:]

Our basename(3) and dirname(3) take a const argument:

  char*basename(const char *);
  char*dirname(const char *);

POSIX says otherwise...

  char *basename(char *path);
  char *dirname(char *path);

... and explicitly says the functions may modify the input string.


Our functions were const-ified in 1999 by espie@:

  proper const semantics for dirname & basename.
  (this follows FreeBSD and Linux. Single Unix 2 is still illogical)

Well, four years ago, FreeBSD finally switched to the POSIX prototypes
https://svnweb.freebsd.org/base/head/include/libgen.h?revision=303451&view=markup
and in fact now has an implementation that modifies the string.

Linux (GNU libc) has a bizarro solution where you get a const basename()
and no dirname() if you just include , but POSIX basename()
and dirname() if you instead or also include .

This is a portability trap.  Code written on OpenBSD may not be
prepared for basename() or dirname() to splat a '\0' into the input
string, despite the warning in the man page.  This is not hypothetical.
Both Got and OpenCVS have fallen victim to the unportable assumption.


The patch below aligns the function prototypes with POSIX.  All
resulting warnings "passing 'const char *' to parameter of type
'char *' discards qualifiers" in the base system have been cleaned
up.  It successfully passes "make release".  For good measure I'm
also running a package bulk build with it as we speak.

OK?


Index: include/libgen.h
===
RCS file: /cvs/src/include/libgen.h,v
retrieving revision 1.9
diff -u -p -r1.9 libgen.h
--- include/libgen.h25 Jan 2019 00:19:25 -  1.9
+++ include/libgen.h11 Sep 2020 20:41:34 -
@@ -22,8 +22,8 @@
 #include 
 
 __BEGIN_DECLS
-char   *basename(const char *);
-char   *dirname(const char *);
+char   *basename(char *);
+char   *dirname(char *);
 __END_DECLS
 
 #endif /* _LIBGEN_H_ */
Index: lib/libc/gen/basename.3
===
RCS file: /cvs/src/lib/libc/gen/basename.3,v
retrieving revision 1.24
diff -u -p -r1.24 basename.3
--- lib/libc/gen/basename.3 25 Jan 2019 00:19:25 -  1.24
+++ lib/libc/gen/basename.3 11 Sep 2020 20:46:30 -
@@ -23,7 +23,7 @@
 .Sh SYNOPSIS
 .In libgen.h
 .Ft char *
-.Fn basename "const char *path"
+.Fn basename "char *path"
 .Sh DESCRIPTION
 The
 .Fn basename
Index: lib/libc/gen/basename.c
===
RCS file: /cvs/src/lib/libc/gen/basename.c,v
retrieving revision 1.16
diff -u -p -r1.16 basename.c
--- lib/libc/gen/basename.c 25 Jan 2019 00:19:25 -  1.16
+++ lib/libc/gen/basename.c 11 Sep 2020 20:43:13 -
@@ -22,7 +22,7 @@
 #include 
 
 char *
-basename(const char *path)
+basename(char *path)
 {
static char bname[PATH_MAX];
size_t len;
Index: lib/libc/gen/dirname.3
===
RCS file: /cvs/src/lib/libc/gen/dirname.3,v
retrieving revision 1.23
diff -u -p -r1.23 dirname.3
--- lib/libc/gen/dirname.3  8 Mar 2019 17:33:23 -   1.23
+++ lib/libc/gen/dirname.3  11 Sep 2020 20:47:08 -
@@ -23,7 +23,7 @@
 .Sh SYNOPSIS
 .In libgen.h
 .Ft char *
-.Fn dirname "const char *path"
+.Fn dirname "char *path"
 .Sh DESCRIPTION
 The
 .Fn dirname
Index: lib/libc/gen/dirname.c
===
RCS file: /cvs/src/lib/libc/gen/dirname.c,v
retrieving revision 1.16
diff -u -p -r1.16 dirname.c
--- lib/libc/gen/dirname.c  25 Jan 2019 00:19:25 -  1.16
+++ lib/libc/gen/dirname.c  11 Sep 2020 20:43:34 -
@@ -24,7 +24,7 @@
 /* A slightly modified copy of this file exists in libexec/ld.so */
 
 char *
-dirname(const char *path)
+dirname(char *path)
 {
static char dname[PATH_MAX];
size_t len;
-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



Re: Non-const basename: usr.bin/cvs

2020-10-19 Thread Todd C . Miller
On Sat, 17 Oct 2020 00:15:54 +0200, Christian Weisgerber wrote:

> Accommodate POSIX basename(3) that takes a non-const parameter and
> may modify the string buffer.
>
> There were only two compiler warnings about discarded const, but
> there are numerous instances where the code assumes non-POSIX
> semantics for basename() and dirname().  Given that there is at
> least a FreeBSD port of OpenCVS, cleaning this up is more than
> cosmetic.

OK millert@

 - todd



Re: arm64 ddb: decode "udf" instruction

2020-10-19 Thread Mark Kettenis
> Date: Mon, 19 Oct 2020 20:03:01 +0200
> From: Christian Weisgerber 
> 
> This decodes the UDF ("permanently undefined") instruction in ddb's
> arm64 disassembler.  The particular immediate16 format appears to
> be unique to this instruction.
> 
> OK?  Or don't bother?

ok kettenis@

> Index: arch/arm64/arm64/disasm.c
> ===
> RCS file: /cvs/src/sys/arch/arm64/arm64/disasm.c,v
> retrieving revision 1.2
> diff -u -p -r1.2 disasm.c
> --- arch/arm64/arm64/disasm.c 11 Sep 2020 09:27:10 -  1.2
> +++ arch/arm64/arm64/disasm.c 19 Oct 2020 16:17:55 -
> @@ -3107,6 +3107,11 @@ OP4FUNC(op_tbz, b5, b40, imm14, Rt)
>   PRINTF("\n");
>  }
>  
> +OP1FUNC(op_udf, imm16)
> +{
> + PRINTF("udf\t#0x%"PRIx64"\n", imm16);
> +}
> +
>  OP4FUNC(op_udiv, sf, Rm, Rn, Rd)
>  {
>   PRINTF("udiv\t%s, %s, %s\n",
> @@ -3668,6 +3673,8 @@ struct insn_info {
>   {{ 5,16}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}}
>  #define FMT_IMM16_LL \
>   {{ 5,16}, { 0, 2}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}}
> +#define FMT_IMM16_UDF\
> + {{ 0,16}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}}
>  #define FMT_OP0_OP1_CRN_CRM_OP2_RT   \
>   {{19, 2}, {16, 3}, {12, 4}, { 8, 4}, { 5, 3}, { 0, 5}, { 0, 0}, { 0, 0}}
>  #define FMT_IMM7_RT2_RN_RT   \
> @@ -3786,6 +3793,7 @@ static const struct insn_info insn_table
>   { 0xd800, 0xdac1, FMT_Z_M_RN_RD,   op_pacia },
>   { 0xcc00, 0x4e284800, FMT_M_D_RN_RD,   op_simd_aes },
>   { 0x8c00, 0x5e280800, FMT_OP3_RN_RD,   op_simd_sha_reg2 },
> + { 0x, 0x, FMT_IMM16_UDF,   op_udf },
>   { 0xfff8f01f, 0xd500401f, FMT_OP1_CRM_OP2, op_msr_imm },
>   { 0xfff8, 0xd508, FMT_OP1_CRN_CRM_OP2_RT,  op_sys },
>   { 0xfff8, 0xd528, FMT_OP1_CRN_CRM_OP2_RT,  op_sysl },
> -- 
> Christian "naddy" Weisgerber  na...@mips.inka.de
> 
> 



arm64 ddb: decode "udf" instruction

2020-10-19 Thread Christian Weisgerber
This decodes the UDF ("permanently undefined") instruction in ddb's
arm64 disassembler.  The particular immediate16 format appears to
be unique to this instruction.

OK?  Or don't bother?

Index: arch/arm64/arm64/disasm.c
===
RCS file: /cvs/src/sys/arch/arm64/arm64/disasm.c,v
retrieving revision 1.2
diff -u -p -r1.2 disasm.c
--- arch/arm64/arm64/disasm.c   11 Sep 2020 09:27:10 -  1.2
+++ arch/arm64/arm64/disasm.c   19 Oct 2020 16:17:55 -
@@ -3107,6 +3107,11 @@ OP4FUNC(op_tbz, b5, b40, imm14, Rt)
PRINTF("\n");
 }
 
+OP1FUNC(op_udf, imm16)
+{
+   PRINTF("udf\t#0x%"PRIx64"\n", imm16);
+}
+
 OP4FUNC(op_udiv, sf, Rm, Rn, Rd)
 {
PRINTF("udiv\t%s, %s, %s\n",
@@ -3668,6 +3673,8 @@ struct insn_info {
{{ 5,16}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}}
 #define FMT_IMM16_LL   \
{{ 5,16}, { 0, 2}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}}
+#define FMT_IMM16_UDF  \
+   {{ 0,16}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}, { 0, 0}}
 #define FMT_OP0_OP1_CRN_CRM_OP2_RT \
{{19, 2}, {16, 3}, {12, 4}, { 8, 4}, { 5, 3}, { 0, 5}, { 0, 0}, { 0, 0}}
 #define FMT_IMM7_RT2_RN_RT \
@@ -3786,6 +3793,7 @@ static const struct insn_info insn_table
  { 0xd800, 0xdac1, FMT_Z_M_RN_RD,   op_pacia },
  { 0xcc00, 0x4e284800, FMT_M_D_RN_RD,   op_simd_aes },
  { 0x8c00, 0x5e280800, FMT_OP3_RN_RD,   op_simd_sha_reg2 },
+ { 0x, 0x, FMT_IMM16_UDF,   op_udf },
  { 0xfff8f01f, 0xd500401f, FMT_OP1_CRM_OP2, op_msr_imm },
  { 0xfff8, 0xd508, FMT_OP1_CRN_CRM_OP2_RT,  op_sys },
  { 0xfff8, 0xd528, FMT_OP1_CRN_CRM_OP2_RT,  op_sysl },
-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



utpms(4): less jumpy mouse movement

2020-10-19 Thread Tobias Heider
Hi,

i noticed that the mouse movement on my powermac can be pretty jittery at times.
One of the reasons I have identified is our use of a position change threshold.

The driver ignores all finger position changes below a certain threshold.
If the finger position change is > threshold it is used to calculate a new mouse
pointer position from the value of the change.
I think it would be better to use the difference of the finger position delta
minus threshold for those calculations, otherwise the minimal mouse position
change is also > threshold which seems wrong.

feedback, ok?

Index: utpms.c
===
RCS file: /cvs/src/sys/dev/usb/utpms.c,v
retrieving revision 1.10
diff -u -p -r1.10 utpms.c
--- utpms.c 11 Sep 2020 19:18:01 -  1.10
+++ utpms.c 19 Oct 2020 16:55:53 -
@@ -638,8 +638,8 @@ detect_pos(int *sensors, int n_sensors, 
if (sensors[i] >= threshold) {
if (i == 0 || sensors[i - 1] < threshold)
*fingers_ret += 1;
-   s += sensors[i];
-   w += sensors[i] * i;
+   s += sensors[i] - threshold;
+   w += (sensors[i] - threshold) * i;
}
}
 



Re: arm64, armv7: proper illegal instruction

2020-10-19 Thread Mark Kettenis
> From: "Theo de Raadt" 
> Date: Mon, 19 Oct 2020 09:25:30 -0600
> 
> Mark Kettenis  wrote:
> 
> > > Date: Mon, 19 Oct 2020 16:36:14 +0200
> > > From: Christian Weisgerber 
> > > 
> > > Belatedly, ARM has taken a slice of the reserved opcode space and
> > > assigned it as a properly defined illegal instruction, udf #imm16.
> > > (Armv8 Architecture Reference Manual, edition F.c, section C6.2.335).
> > > Clang already knows about it.
> > > 
> > > We really should use this instead of picking something ad-hoc out
> > > of the opcode space.
> > > 
> > > I have verified that this builds on arm64, produces a SIGILL in
> > > userland, and drops me into ddb in the kernel.
> > > 
> > > armv7 has an equivalent instruction.  kettenis@ confirms it builds
> > > and SIGILLs there.
> > > 
> > > OK?
> > 
> > So on armv7 there is an additional consideration.  The architecture
> > defines tow instruction sets: A32 and T32 (Thumb).  A32 instructions
> > are 32-bit but T32 instructions can be 16-bit.  If an attacker can
> > switch the CPU into T32 mode, it will interpret this UDF instruction
> > as two different instructions.  We may have to consider how "bad"
> > these two instructions are and maybe tune that #imm16 accordingly.
> 
> If the attacker has turned on thumb in the kernel I think all causes
> are lost ... aren't there thousands of thumb gadgets?

Probably.

So 0xa000f7f0 happens to be the 32-bit Thumb UDF instruction.  If you
jump into the middle of that you get an ADR instruction.  SoI don't
think that one was particularly "safe".  So I think naddy's patch is
fine.




Re: arm64, armv7: proper illegal instruction

2020-10-19 Thread Theo de Raadt
Mark Kettenis  wrote:

> > Date: Mon, 19 Oct 2020 16:36:14 +0200
> > From: Christian Weisgerber 
> > 
> > Belatedly, ARM has taken a slice of the reserved opcode space and
> > assigned it as a properly defined illegal instruction, udf #imm16.
> > (Armv8 Architecture Reference Manual, edition F.c, section C6.2.335).
> > Clang already knows about it.
> > 
> > We really should use this instead of picking something ad-hoc out
> > of the opcode space.
> > 
> > I have verified that this builds on arm64, produces a SIGILL in
> > userland, and drops me into ddb in the kernel.
> > 
> > armv7 has an equivalent instruction.  kettenis@ confirms it builds
> > and SIGILLs there.
> > 
> > OK?
> 
> So on armv7 there is an additional consideration.  The architecture
> defines tow instruction sets: A32 and T32 (Thumb).  A32 instructions
> are 32-bit but T32 instructions can be 16-bit.  If an attacker can
> switch the CPU into T32 mode, it will interpret this UDF instruction
> as two different instructions.  We may have to consider how "bad"
> these two instructions are and maybe tune that #imm16 accordingly.

If the attacker has turned on thumb in the kernel I think all causes
are lost ... aren't there thousands of thumb gadgets?



Re: arm64, armv7: proper illegal instruction

2020-10-19 Thread Mark Kettenis
> Date: Mon, 19 Oct 2020 16:36:14 +0200
> From: Christian Weisgerber 
> 
> Belatedly, ARM has taken a slice of the reserved opcode space and
> assigned it as a properly defined illegal instruction, udf #imm16.
> (Armv8 Architecture Reference Manual, edition F.c, section C6.2.335).
> Clang already knows about it.
> 
> We really should use this instead of picking something ad-hoc out
> of the opcode space.
> 
> I have verified that this builds on arm64, produces a SIGILL in
> userland, and drops me into ddb in the kernel.
> 
> armv7 has an equivalent instruction.  kettenis@ confirms it builds
> and SIGILLs there.
> 
> OK?

So on armv7 there is an additional consideration.  The architecture
defines tow instruction sets: A32 and T32 (Thumb).  A32 instructions
are 32-bit but T32 instructions can be 16-bit.  If an attacker can
switch the CPU into T32 mode, it will interpret this UDF instruction
as two different instructions.  We may have to consider how "bad"
these two instructions are and maybe tune that #imm16 accordingly.

> Index: lib/csu/aarch64/md_init.h
> ===
> RCS file: /cvs/src/lib/csu/aarch64/md_init.h,v
> retrieving revision 1.9
> diff -u -p -r1.9 md_init.h
> --- lib/csu/aarch64/md_init.h 15 Oct 2020 16:30:21 -  1.9
> +++ lib/csu/aarch64/md_init.h 19 Oct 2020 11:57:02 -
> @@ -115,5 +115,5 @@
>   "   svc #0  \n" \
>   "   dsb nsh \n" \
>   "   isb \n" \
> - "   .word 0xa000f7f0 /* illegal */  \n" \
> + "   udf #0  \n" \
>   ".previous");
> Index: lib/csu/arm/md_init.h
> ===
> RCS file: /cvs/src/lib/csu/arm/md_init.h,v
> retrieving revision 1.16
> diff -u -p -r1.16 md_init.h
> --- lib/csu/arm/md_init.h 15 Oct 2020 16:30:23 -  1.16
> +++ lib/csu/arm/md_init.h 19 Oct 2020 13:23:00 -
> @@ -159,5 +159,5 @@
>   "   swi #0  \n" \
>   "   dsb nsh \n" \
>   "   isb \n" \
> - "   .word 0xa000f7f0 /* illegal */  \n" \
> + "   udf #0  \n" \
>   ".previous");
> Index: lib/libc/arch/aarch64/sys/tfork_thread.S
> ===
> RCS file: /cvs/src/lib/libc/arch/aarch64/sys/tfork_thread.S,v
> retrieving revision 1.5
> diff -u -p -r1.5 tfork_thread.S
> --- lib/libc/arch/aarch64/sys/tfork_thread.S  18 Oct 2020 14:28:16 -  
> 1.5
> +++ lib/libc/arch/aarch64/sys/tfork_thread.S  19 Oct 2020 11:59:32 -
> @@ -43,6 +43,6 @@ ENTRY(__tfork_thread)
>   mov x0, x3
>   blr x2
>   SYSTRAP(__threxit)
> - .word   0xa000f7f0  /* illegal on all cpus? */
> + udf #0
>   .cfi_endproc
>  END(__tfork_thread)
> Index: lib/libc/arch/arm/sys/tfork_thread.S
> ===
> RCS file: /cvs/src/lib/libc/arch/arm/sys/tfork_thread.S,v
> retrieving revision 1.5
> diff -u -p -r1.5 tfork_thread.S
> --- lib/libc/arch/arm/sys/tfork_thread.S  18 Oct 2020 14:28:17 -  
> 1.5
> +++ lib/libc/arch/arm/sys/tfork_thread.S  19 Oct 2020 13:23:35 -
> @@ -37,5 +37,5 @@ ENTRY(__tfork_thread)
>   mov pc, r2
>   nop
>   SYSTRAP(__threxit)
> - .word   0xa000f7f0  /* illegal on all cpus? */
> + udf #0
>  END(__tfork_thread)
> Index: sys/arch/arm/arm/sigcode.S
> ===
> RCS file: /cvs/src/sys/arch/arm/arm/sigcode.S,v
> retrieving revision 1.9
> diff -u -p -r1.9 sigcode.S
> --- sys/arch/arm/arm/sigcode.S13 Mar 2020 08:46:50 -  1.9
> +++ sys/arch/arm/arm/sigcode.S19 Oct 2020 13:23:55 -
> @@ -72,7 +72,7 @@ _C_LABEL(esigcode):
>  
>   .globl  sigfill
>  sigfill:
> - .word   0xa000f7f0  /* illegal on all cpus? */
> + udf #0
>  esigfill:
>  
>   .data
> Index: sys/arch/arm64/arm64/locore.S
> ===
> RCS file: /cvs/src/sys/arch/arm64/arm64/locore.S,v
> retrieving revision 1.31
> diff -u -p -r1.31 locore.S
> --- sys/arch/arm64/arm64/locore.S 13 Mar 2020 00:14:38 -  1.31
> +++ sys/arch/arm64/arm64/locore.S 19 Oct 2020 12:02:23 -
> @@ -366,7 +366,7 @@ _C_LABEL(esigcode):
>  
>   .globl  sigfill
>  sigfill:
> - .word   0xa000f7f0  /* FIXME: illegal on all cpus? */
> + udf #0
>  esigfill:
>  
>   .data
> -- 
> Christian "naddy" Weisgerber  na...@mips.inka.de
> 
> 



Re: arm64, armv7: proper illegal instruction

2020-10-19 Thread Theo de Raadt
Thanks for finding a better instruction!

ok deraadt

Christian Weisgerber  wrote:

> Belatedly, ARM has taken a slice of the reserved opcode space and
> assigned it as a properly defined illegal instruction, udf #imm16.
> (Armv8 Architecture Reference Manual, edition F.c, section C6.2.335).
> Clang already knows about it.
> 
> We really should use this instead of picking something ad-hoc out
> of the opcode space.
> 
> I have verified that this builds on arm64, produces a SIGILL in
> userland, and drops me into ddb in the kernel.
> 
> armv7 has an equivalent instruction.  kettenis@ confirms it builds
> and SIGILLs there.
> 
> OK?
> 
> Index: lib/csu/aarch64/md_init.h
> ===
> RCS file: /cvs/src/lib/csu/aarch64/md_init.h,v
> retrieving revision 1.9
> diff -u -p -r1.9 md_init.h
> --- lib/csu/aarch64/md_init.h 15 Oct 2020 16:30:21 -  1.9
> +++ lib/csu/aarch64/md_init.h 19 Oct 2020 11:57:02 -
> @@ -115,5 +115,5 @@
>   "   svc #0  \n" \
>   "   dsb nsh \n" \
>   "   isb \n" \
> - "   .word 0xa000f7f0 /* illegal */  \n" \
> + "   udf #0  \n" \
>   ".previous");
> Index: lib/csu/arm/md_init.h
> ===
> RCS file: /cvs/src/lib/csu/arm/md_init.h,v
> retrieving revision 1.16
> diff -u -p -r1.16 md_init.h
> --- lib/csu/arm/md_init.h 15 Oct 2020 16:30:23 -  1.16
> +++ lib/csu/arm/md_init.h 19 Oct 2020 13:23:00 -
> @@ -159,5 +159,5 @@
>   "   swi #0  \n" \
>   "   dsb nsh \n" \
>   "   isb \n" \
> - "   .word 0xa000f7f0 /* illegal */  \n" \
> + "   udf #0  \n" \
>   ".previous");
> Index: lib/libc/arch/aarch64/sys/tfork_thread.S
> ===
> RCS file: /cvs/src/lib/libc/arch/aarch64/sys/tfork_thread.S,v
> retrieving revision 1.5
> diff -u -p -r1.5 tfork_thread.S
> --- lib/libc/arch/aarch64/sys/tfork_thread.S  18 Oct 2020 14:28:16 -  
> 1.5
> +++ lib/libc/arch/aarch64/sys/tfork_thread.S  19 Oct 2020 11:59:32 -
> @@ -43,6 +43,6 @@ ENTRY(__tfork_thread)
>   mov x0, x3
>   blr x2
>   SYSTRAP(__threxit)
> - .word   0xa000f7f0  /* illegal on all cpus? */
> + udf #0
>   .cfi_endproc
>  END(__tfork_thread)
> Index: lib/libc/arch/arm/sys/tfork_thread.S
> ===
> RCS file: /cvs/src/lib/libc/arch/arm/sys/tfork_thread.S,v
> retrieving revision 1.5
> diff -u -p -r1.5 tfork_thread.S
> --- lib/libc/arch/arm/sys/tfork_thread.S  18 Oct 2020 14:28:17 -  
> 1.5
> +++ lib/libc/arch/arm/sys/tfork_thread.S  19 Oct 2020 13:23:35 -
> @@ -37,5 +37,5 @@ ENTRY(__tfork_thread)
>   mov pc, r2
>   nop
>   SYSTRAP(__threxit)
> - .word   0xa000f7f0  /* illegal on all cpus? */
> + udf #0
>  END(__tfork_thread)
> Index: sys/arch/arm/arm/sigcode.S
> ===
> RCS file: /cvs/src/sys/arch/arm/arm/sigcode.S,v
> retrieving revision 1.9
> diff -u -p -r1.9 sigcode.S
> --- sys/arch/arm/arm/sigcode.S13 Mar 2020 08:46:50 -  1.9
> +++ sys/arch/arm/arm/sigcode.S19 Oct 2020 13:23:55 -
> @@ -72,7 +72,7 @@ _C_LABEL(esigcode):
>  
>   .globl  sigfill
>  sigfill:
> - .word   0xa000f7f0  /* illegal on all cpus? */
> + udf #0
>  esigfill:
>  
>   .data
> Index: sys/arch/arm64/arm64/locore.S
> ===
> RCS file: /cvs/src/sys/arch/arm64/arm64/locore.S,v
> retrieving revision 1.31
> diff -u -p -r1.31 locore.S
> --- sys/arch/arm64/arm64/locore.S 13 Mar 2020 00:14:38 -  1.31
> +++ sys/arch/arm64/arm64/locore.S 19 Oct 2020 12:02:23 -
> @@ -366,7 +366,7 @@ _C_LABEL(esigcode):
>  
>   .globl  sigfill
>  sigfill:
> - .word   0xa000f7f0  /* FIXME: illegal on all cpus? */
> + udf #0
>  esigfill:
>  
>   .data
> -- 
> Christian "naddy" Weisgerber  na...@mips.inka.de
> 



arm64, armv7: proper illegal instruction

2020-10-19 Thread Christian Weisgerber
Belatedly, ARM has taken a slice of the reserved opcode space and
assigned it as a properly defined illegal instruction, udf #imm16.
(Armv8 Architecture Reference Manual, edition F.c, section C6.2.335).
Clang already knows about it.

We really should use this instead of picking something ad-hoc out
of the opcode space.

I have verified that this builds on arm64, produces a SIGILL in
userland, and drops me into ddb in the kernel.

armv7 has an equivalent instruction.  kettenis@ confirms it builds
and SIGILLs there.

OK?

Index: lib/csu/aarch64/md_init.h
===
RCS file: /cvs/src/lib/csu/aarch64/md_init.h,v
retrieving revision 1.9
diff -u -p -r1.9 md_init.h
--- lib/csu/aarch64/md_init.h   15 Oct 2020 16:30:21 -  1.9
+++ lib/csu/aarch64/md_init.h   19 Oct 2020 11:57:02 -
@@ -115,5 +115,5 @@
"   svc #0  \n" \
"   dsb nsh \n" \
"   isb \n" \
-   "   .word 0xa000f7f0 /* illegal */  \n" \
+   "   udf #0  \n" \
".previous");
Index: lib/csu/arm/md_init.h
===
RCS file: /cvs/src/lib/csu/arm/md_init.h,v
retrieving revision 1.16
diff -u -p -r1.16 md_init.h
--- lib/csu/arm/md_init.h   15 Oct 2020 16:30:23 -  1.16
+++ lib/csu/arm/md_init.h   19 Oct 2020 13:23:00 -
@@ -159,5 +159,5 @@
"   swi #0  \n" \
"   dsb nsh \n" \
"   isb \n" \
-   "   .word 0xa000f7f0 /* illegal */  \n" \
+   "   udf #0  \n" \
".previous");
Index: lib/libc/arch/aarch64/sys/tfork_thread.S
===
RCS file: /cvs/src/lib/libc/arch/aarch64/sys/tfork_thread.S,v
retrieving revision 1.5
diff -u -p -r1.5 tfork_thread.S
--- lib/libc/arch/aarch64/sys/tfork_thread.S18 Oct 2020 14:28:16 -  
1.5
+++ lib/libc/arch/aarch64/sys/tfork_thread.S19 Oct 2020 11:59:32 -
@@ -43,6 +43,6 @@ ENTRY(__tfork_thread)
mov x0, x3
blr x2
SYSTRAP(__threxit)
-   .word   0xa000f7f0  /* illegal on all cpus? */
+   udf #0
.cfi_endproc
 END(__tfork_thread)
Index: lib/libc/arch/arm/sys/tfork_thread.S
===
RCS file: /cvs/src/lib/libc/arch/arm/sys/tfork_thread.S,v
retrieving revision 1.5
diff -u -p -r1.5 tfork_thread.S
--- lib/libc/arch/arm/sys/tfork_thread.S18 Oct 2020 14:28:17 -  
1.5
+++ lib/libc/arch/arm/sys/tfork_thread.S19 Oct 2020 13:23:35 -
@@ -37,5 +37,5 @@ ENTRY(__tfork_thread)
mov pc, r2
nop
SYSTRAP(__threxit)
-   .word   0xa000f7f0  /* illegal on all cpus? */
+   udf #0
 END(__tfork_thread)
Index: sys/arch/arm/arm/sigcode.S
===
RCS file: /cvs/src/sys/arch/arm/arm/sigcode.S,v
retrieving revision 1.9
diff -u -p -r1.9 sigcode.S
--- sys/arch/arm/arm/sigcode.S  13 Mar 2020 08:46:50 -  1.9
+++ sys/arch/arm/arm/sigcode.S  19 Oct 2020 13:23:55 -
@@ -72,7 +72,7 @@ _C_LABEL(esigcode):
 
.globl  sigfill
 sigfill:
-   .word   0xa000f7f0  /* illegal on all cpus? */
+   udf #0
 esigfill:
 
.data
Index: sys/arch/arm64/arm64/locore.S
===
RCS file: /cvs/src/sys/arch/arm64/arm64/locore.S,v
retrieving revision 1.31
diff -u -p -r1.31 locore.S
--- sys/arch/arm64/arm64/locore.S   13 Mar 2020 00:14:38 -  1.31
+++ sys/arch/arm64/arm64/locore.S   19 Oct 2020 12:02:23 -
@@ -366,7 +366,7 @@ _C_LABEL(esigcode):
 
.globl  sigfill
 sigfill:
-   .word   0xa000f7f0  /* FIXME: illegal on all cpus? */
+   udf #0
 esigfill:
 
.data
-- 
Christian "naddy" Weisgerber  na...@mips.inka.de



Re: mixerctl names

2020-10-19 Thread Alexandre Ratchov
On Sat, Oct 17, 2020 at 05:52:58PM +0200, Jan Stary wrote:
> Currently, mixerctl.conf(5) says
> 
>   Most devices have a number of digital to analogue converters
>   (DACs), used for sound playback, and each DAC has a corresponding
>   output mixer. The mixers are labelled “mix” or “sel”.
> 
> That doesn't seem to be the case, at least not universaly
> as the wording seems to imply. For example, this is
> mixerctl output on a Thinkpad T400:
> 
>   inputs.dac-0:1=222,222
>   inputs.dac-2:3=222,222
>   inputs.beep=0
>   record.adc-2:3_source=mic2
>   record.adc-2:3=219,219
>   record.adc-0:1_source=mic
>   record.adc-0:1=219,219
>   outputs.hp_source=dac-0:1
>   outputs.hp_boost=on
>   inputs.mic=189,189
>   outputs.mic_dir=input-vr80
>   outputs.spkr_source=dac-2:3
>   outputs.spkr_eapd=on
>   inputs.mic2=189,189
>   outputs.hp_sense=unplugged
>   outputs.mic_sense=unplugged
>   outputs.master=240,240
>   outputs.master.mute=off
>   outputs.master.slaves=
>   record.volume=240,240
>   record.volume.mute=off
>   record.volume.slaves=
>   record.enable=sysctl
> 
> Apparently, it has two DACS (for the speakers and the headphones).
> The current wording might confuse the user into thinking he has
> no output mixer, but the
> 
>   inputs.dac-0:1=222,222
>   inputs.dac-2:3=222,222
> 
> do control the respective volumes,
> while no "mix" or "sel" exists.
> 
> Similarly for recording via the two ADCs.
> 

It depends on the hardware. Basically, azalia exposes the schematic of
the mixer and, in turn, the driver exposes all "widgets" to the upper
layers, with a generated name. This is very similar to uaudio.

Certain devices have no "mix_xxx" widgets others have. On one of my
machines:

$ doas mixerctl | egrep '(inputs|outputs).mix' | wc -l 
  20

desktop computers have many jacks and tend to have many
mixers. The "sel_xxx" widgets are rare but still exist.



Re: pf route-to issues

2020-10-19 Thread Stuart Henderson
On 2020/10/19 19:53, David Gwynne wrote:
> On Mon, Oct 19, 2020 at 09:34:31AM +0100, Stuart Henderson wrote:
> > On 2020/10/19 15:35, David Gwynne wrote:
> > > every few years i try and use route-to in pf, and every time it
> > > goes badly. i tried it again last week in a slightly different
> > > setting, and actually tried to understand the sharp edges i hit
> > > this time instead of giving up. it turns out there are 2 or 3
> > > different things together that have cause me trouble, which is why
> > > the diff below is so big.
> > 
> > I used to route-to/reply-to quite a lot at places with poor internet
> > connections to split traffic between lines (mostly those have better
> > connections now so I don't need it as often). It worked as I expected -
> > but I only ever used it with the interface specified.
> 
> cool. did it work beyond the first packet in a connection?

It must have done. The webcams would have utterly broken the rest of
traffic if it hadn't :)

> > I mostly used it with pppoe interfaces so the peer address was unknown
> > at ruleset load time. (I was lucky and had static IPs my side, but the
> > ISP side was variable). I relied on the fact that once packets are
> > directed at a point-point interface there's only one place for them to
> > go. I didn't notice that ":peer" might be useful here (and the syntax
> > 'route-to pppoe1:peer@pppoe1' is pretty awkward so I probably wouldn't
> > have come up with it), I had 0.0.0.1@pppoe1, 0.0.0.2@pppoe2 etc
> > (though actually I think it works with $any_random_address@pppoeX).
> 
> yes. i was trying to use it with peers over ethernet, and always
> struggled with the syntax.
> 
> > > the first and i would argue most fundamental problem is a semantic
> > > problem. if you ask a random person who has some clue about networks
> > > and routing what they would expect the "argument" to route-to or
> > > reply-to to be, they would say "a nexthop address" or "a gateway
> > > address". eg, say i want to force packets to a specific backend
> > > server without using NAT, i would write a rule like this:
> > > 
> > >   n_servers="192.0.2.128/27"
> > >   pass out on $if_internal to $n_servers route-to 192.168.0.1
> > > 
> > > pfctl will happily parse this, shove it into the kernel, let you read
> > > the rules back out again with pfctl -sr, and it all looks plausible, but
> > > it turns out that it's using the argument to route-to as an interface
> > > name. because rulesets can refer to interfaces that don't exist yet, pf
> > > just passes the IP address around as a string, hoping i'll plug in an
> > > interface with a driver name that looks like an ip address. i spent
> > > literally a day trying to figure out why a rule like this wasn't
> > > working.
> > 
> > I don't think I tried this, but the pf.conf(5) BNF syntax suggests it's
> > supposed to work. So either doc or implementation bug there.
> 
> im leaning toward implementation bug.
> 
> >  route  = ( "route-to" | "reply-to" | "dup-to" )
> >   ( routehost | "{" routehost-list "}" )
> >   [ pooltype ]
> > 
> >  routehost-list = routehost [ [ "," ] routehost-list ]
> > 
> >  routehost  = host | host "@" interface-name |
> >   "(" interface-name [ address [ "/" mask-bits ] ] ")"
> > 
> > > the second problem is that the pf_route calls from pfsync don't
> > > have all the information it is supposed to have. more specifically,
> > > an ifp pointer isn't set which leads to a segfault. the ifp pointer
> > > isn't set because pfsync doesnt track which interface a packet is
> > > going out, it assumes the ip layer will get it right again later, or a
> > > rule provided something usable.
> > > 
> > > the third problem is that pf_route relies on information from rules to
> > > work correctly. this is a problem in a pfsync environment because you
> > > cannot have the same ruleset on both firewalls 100% of the time, which
> > > means you cannot have route-to/reply-to behave consistently on a pair of
> > > firwalls 100% of the time.
> > 
> > I didn't run into this because pppoe(4) and pfsync/carp don't really
> > go well together, but ouch!
> > 
> > > all of this together makes things work pretty obviously and smoothly.
> > > in my opinion anyway. route-to now works more like rdr-to, it just
> > > feels like it changes the address used for the route lookup rather
> > > than changing the actual IP address in the packet. it also works
> > > predictably in a pfsync pair, which is great from the point of view of
> > > high availability.
> > > 
> > > the main caveat is that it's not backward compatible. if you're already
> > > using route-to, you will need to tweak your rules to have them parse.
> > > however, i doubt anyone is using this stuff because it feels very broken
> > > to me.
> > 
> > Do you expect this to work with a bracketed "address" to defer lookup
> > until rule evaluation time? i.e.
> > 
> > pass out proto tcp to any p

Re: pf route-to issues

2020-10-19 Thread David Gwynne
On Mon, Oct 19, 2020 at 12:28:19PM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> 
> > > 
> > > it seems to me 'route-to vs. pfsync' still needs more thought.  the
> > > next-hop IP address in route-to may be different for each PF box
> > > linked by pfsync(4). To be honest I have no answer to address this
> > > issue at the moment.
> > 
> > i have thought about that a little bit. we could play with what the
> > argument to route-to means. rather than requiring it to be a directly
> > connected host/gateway address, we could interpret it as a destination
> > address, and use the gateway for that destination as the nexthop.
> > 
> > eg, if i have the following routing table on frontend a:
> > 
> > Internet:
> > DestinationGatewayFlags   Refs  Use   Mtu  Prio 
> > Iface
> > default192.168.96.33  UGS6   176171 - 8 
> > vmx0 
> > 224/4  127.0.0.1  URS00 32768 8 lo0 
> >  
> > 10.0.0.0/30192.168.0.1UGS00 - 8 
> > gre0 
> 
> > 
> > and this routing table on frontend b:
> > 
> > Internet:
> > DestinationGatewayFlags   Refs  Use   Mtu  Prio 
> > Iface
> > default192.168.96.33  UGS987548 - 8 
> > aggr0
> > 224/4  127.0.0.1  URS00 32768 8 lo0 
> >  
> > 10.0.0.0/30192.168.0.3UGS00 - 8 
> > gre0 
> 
> > 
> > if gre0 on both frontends pointed at different legs on the same backend
> > server, i could write a pf rule like this:
> > 
> > pass out to port 80 route-to 10.0.0.1
> > 
> > 10.0.0.1 would then end up as the rt_addr field in pf_state and
> > pfsync_state.
> > 
> > both frontend a and b would lookup the route to 10.0.0.1, and then
> > use 192.168.0.1 and 192.168.0.3 as the gateway address respectively.
> > both would end up pushing the packet over their gre link to the
> > same backend. the same semantic would work if the link to the backend
> > was over ethernet instead of a tunnel.
> > 
> > > > thoughts?
> > > > 
> > > 
> > > What you've said makes sense. However I still feel pfsync(4)
> > > does not play well with route-to.
> > 
> > maybe your opinion is different if the above makes sense?
> > 
> 
> Thanks for detailed explanation. This is good enough to make me happy.
> The remaining questions on this are sort of 'homework for me' to poke
> to PF source code, for example:
>   are we doing route look up for every packet? or route look up
>   is performed when state is created/imported? (and we cache
>   outbound interface + next-hop along the state)

the "destination" address is determined when the state is created and
stored in rt_addr if pf_state. the route lookup using that address is
done per packet in pf_route.

>   also what happens when route does not exist on pfsync peer, which
>   receives state? How admin will discover state failed to import?

the route lookup or interface lookup fails and the packet is dropped.
there are no counters to show this is happening though :(

> Anyway, your plan above looks solid to me now. It's certainly more 
> flexible
> (?reliable?) to select route to particular destination, than using pair of
> interface,next-hop.

cool, i'll keep working on it then.

> 
> thanks and
> regards
> sashan



Re: pf route-to issues

2020-10-19 Thread Alexandr Nedvedicky
Hello,


> > 
> > it seems to me 'route-to vs. pfsync' still needs more thought.  the
> > next-hop IP address in route-to may be different for each PF box
> > linked by pfsync(4). To be honest I have no answer to address this
> > issue at the moment.
> 
> i have thought about that a little bit. we could play with what the
> argument to route-to means. rather than requiring it to be a directly
> connected host/gateway address, we could interpret it as a destination
> address, and use the gateway for that destination as the nexthop.
> 
> eg, if i have the following routing table on frontend a:
> 
> Internet:
> DestinationGatewayFlags   Refs  Use   Mtu  Prio Iface
> default192.168.96.33  UGS6   176171 - 8 vmx0 
> 224/4  127.0.0.1  URS00 32768 8 lo0  
> 10.0.0.0/30192.168.0.1UGS00 - 8 gre0 

> 
> and this routing table on frontend b:
> 
> Internet:
> DestinationGatewayFlags   Refs  Use   Mtu  Prio Iface
> default192.168.96.33  UGS987548 - 8 aggr0
> 224/4  127.0.0.1  URS00 32768 8 lo0  
> 10.0.0.0/30192.168.0.3UGS00 - 8 gre0 

> 
> if gre0 on both frontends pointed at different legs on the same backend
> server, i could write a pf rule like this:
> 
> pass out to port 80 route-to 10.0.0.1
> 
> 10.0.0.1 would then end up as the rt_addr field in pf_state and
> pfsync_state.
> 
> both frontend a and b would lookup the route to 10.0.0.1, and then
> use 192.168.0.1 and 192.168.0.3 as the gateway address respectively.
> both would end up pushing the packet over their gre link to the
> same backend. the same semantic would work if the link to the backend
> was over ethernet instead of a tunnel.
> 
> > > thoughts?
> > > 
> > 
> > What you've said makes sense. However I still feel pfsync(4)
> > does not play well with route-to.
> 
> maybe your opinion is different if the above makes sense?
> 

Thanks for detailed explanation. This is good enough to make me happy.
The remaining questions on this are sort of 'homework for me' to poke
to PF source code, for example:
are we doing route look up for every packet? or route look up
is performed when state is created/imported? (and we cache
outbound interface + next-hop along the state)

also what happens when route does not exist on pfsync peer, which
receives state? How admin will discover state failed to import?

Anyway, your plan above looks solid to me now. It's certainly more flexible
(?reliable?) to select route to particular destination, than using pair of
interface,next-hop.

thanks and
regards
sashan



Re: pf route-to issues

2020-10-19 Thread David Gwynne
On Mon, Oct 19, 2020 at 09:34:31AM +0100, Stuart Henderson wrote:
> On 2020/10/19 15:35, David Gwynne wrote:
> > every few years i try and use route-to in pf, and every time it
> > goes badly. i tried it again last week in a slightly different
> > setting, and actually tried to understand the sharp edges i hit
> > this time instead of giving up. it turns out there are 2 or 3
> > different things together that have cause me trouble, which is why
> > the diff below is so big.
> 
> I used to route-to/reply-to quite a lot at places with poor internet
> connections to split traffic between lines (mostly those have better
> connections now so I don't need it as often). It worked as I expected -
> but I only ever used it with the interface specified.

cool. did it work beyond the first packet in a connection?

> I mostly used it with pppoe interfaces so the peer address was unknown
> at ruleset load time. (I was lucky and had static IPs my side, but the
> ISP side was variable). I relied on the fact that once packets are
> directed at a point-point interface there's only one place for them to
> go. I didn't notice that ":peer" might be useful here (and the syntax
> 'route-to pppoe1:peer@pppoe1' is pretty awkward so I probably wouldn't
> have come up with it), I had 0.0.0.1@pppoe1, 0.0.0.2@pppoe2 etc
> (though actually I think it works with $any_random_address@pppoeX).

yes. i was trying to use it with peers over ethernet, and always
struggled with the syntax.

> > the first and i would argue most fundamental problem is a semantic
> > problem. if you ask a random person who has some clue about networks
> > and routing what they would expect the "argument" to route-to or
> > reply-to to be, they would say "a nexthop address" or "a gateway
> > address". eg, say i want to force packets to a specific backend
> > server without using NAT, i would write a rule like this:
> > 
> >   n_servers="192.0.2.128/27"
> >   pass out on $if_internal to $n_servers route-to 192.168.0.1
> > 
> > pfctl will happily parse this, shove it into the kernel, let you read
> > the rules back out again with pfctl -sr, and it all looks plausible, but
> > it turns out that it's using the argument to route-to as an interface
> > name. because rulesets can refer to interfaces that don't exist yet, pf
> > just passes the IP address around as a string, hoping i'll plug in an
> > interface with a driver name that looks like an ip address. i spent
> > literally a day trying to figure out why a rule like this wasn't
> > working.
> 
> I don't think I tried this, but the pf.conf(5) BNF syntax suggests it's
> supposed to work. So either doc or implementation bug there.

im leaning toward implementation bug.

>  route  = ( "route-to" | "reply-to" | "dup-to" )
>   ( routehost | "{" routehost-list "}" )
>   [ pooltype ]
> 
>  routehost-list = routehost [ [ "," ] routehost-list ]
> 
>  routehost  = host | host "@" interface-name |
>   "(" interface-name [ address [ "/" mask-bits ] ] ")"
> 
> > the second problem is that the pf_route calls from pfsync don't
> > have all the information it is supposed to have. more specifically,
> > an ifp pointer isn't set which leads to a segfault. the ifp pointer
> > isn't set because pfsync doesnt track which interface a packet is
> > going out, it assumes the ip layer will get it right again later, or a
> > rule provided something usable.
> > 
> > the third problem is that pf_route relies on information from rules to
> > work correctly. this is a problem in a pfsync environment because you
> > cannot have the same ruleset on both firewalls 100% of the time, which
> > means you cannot have route-to/reply-to behave consistently on a pair of
> > firwalls 100% of the time.
> 
> I didn't run into this because pppoe(4) and pfsync/carp don't really
> go well together, but ouch!
> 
> > all of this together makes things work pretty obviously and smoothly.
> > in my opinion anyway. route-to now works more like rdr-to, it just
> > feels like it changes the address used for the route lookup rather
> > than changing the actual IP address in the packet. it also works
> > predictably in a pfsync pair, which is great from the point of view of
> > high availability.
> > 
> > the main caveat is that it's not backward compatible. if you're already
> > using route-to, you will need to tweak your rules to have them parse.
> > however, i doubt anyone is using this stuff because it feels very broken
> > to me.
> 
> Do you expect this to work with a bracketed "address" to defer lookup
> until rule evaluation time? i.e.
> 
> pass out proto tcp to any port 22 route-to (pppoe1:peer)

in my opinion route-to should be able to take whatever rdr-to accepts.
however, i just tried it and it doesnt currently work, but i'm sure i
can figure it out.

> I think that will be all that's needed to allow converting the pppoe
> use case. I don't have a multiple pppoe setup handy bu

Re: pf route-to issues

2020-10-19 Thread David Gwynne
On Mon, Oct 19, 2020 at 09:46:19AM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> disclaimer: I have no chance to run pfsync on production, I'm very
> inexperienced with pfsync(4).

i wrote the defer code in pfsync, and i think i wrote the code in
pfsync that calls pf_route badly, so noones perfect :)

> 
> > 
> > the third problem is that pf_route relies on information from rules to
> > work correctly. this is a problem in a pfsync environment because you
> > cannot have the same ruleset on both firewalls 100% of the time, which
> > means you cannot have route-to/reply-to behave consistently on a pair of
> > firwalls 100% of the time.
> > 
> > my solution to both these problems is reduce the amount of information
> > pf_route needs to work with, to make sure that the info it does need
> > is in the pf state structure, and that pfsync handles it properly.
> > 
> > if we limit the information needed for pf_route to a nexthop address,
> > and which direction the address is used, this is doable. both the
> > pf_state and pfsync_state structs already contain an address to store a
> > nexthop in, i just had to move the route-to direction from the rule into
> > the state. this is easy with pf_state, but i used a spare pad field in
> > pfsync_state for this.
> > 
> 
> it seems to me 'route-to vs. pfsync' still needs more thought.  the
> next-hop IP address in route-to may be different for each PF box
> linked by pfsync(4). To be honest I have no answer to address this
> issue at the moment.

i have thought about that a little bit. we could play with what the
argument to route-to means. rather than requiring it to be a directly
connected host/gateway address, we could interpret it as a destination
address, and use the gateway for that destination as the nexthop.

eg, if i have the following routing table on frontend a:

Internet:
DestinationGatewayFlags   Refs  Use   Mtu  Prio Iface
default192.168.96.33  UGS6   176171 - 8 vmx0 
224/4  127.0.0.1  URS00 32768 8 lo0  
10.0.0.0/30192.168.0.1UGS00 - 8 gre0 
127/8  127.0.0.1  UGRS   00 32768 8 lo0  
127.0.0.1  127.0.0.1  UHhl   1  142 32768 1 lo0  
192.168.0.0192.168.0.0UHl00 - 1 gre0 
192.168.0.1192.168.0.0UHh11 - 8 gre0 
192.168.96.32/27   192.168.96.34  UCn2   122849 - 4 vmx0 
192.168.96.33  00:00:5e:00:01:47  UHLch  114611 - 3 vmx0 
192.168.96.34  00:50:56:a1:73:91  UHLl   0   362231 - 1 vmx0 
192.168.96.60  fe:e1:ba:d0:74:ef  UHLc   059690 - 3 vmx0 
192.168.96.63  192.168.96.34  UHb00 - 1 vmx0 

and this routing table on frontend b:

Internet:
DestinationGatewayFlags   Refs  Use   Mtu  Prio Iface
default192.168.96.33  UGS987548 - 8 aggr0
224/4  127.0.0.1  URS00 32768 8 lo0  
10.0.0.0/30192.168.0.3UGS00 - 8 gre0 
127/8  127.0.0.1  UGRS   00 32768 8 lo0  
127.0.0.1  127.0.0.1  UHhl   1   62 32768 1 lo0  
192.168.0.2192.168.0.2UHl00 - 1 gre0 
192.168.0.3192.168.0.2UHh11 - 8 gre0 
192.168.96.32/27   192.168.96.55  UCn362186 - 4 aggr0
192.168.96.33  00:00:5e:00:01:47  UHLch  1 7442 - 3 aggr0
192.168.96.35  00:23:42:d0:56:8e  UHLc   0 1905 - 3 aggr0
192.168.96.55  fe:e1:ba:d0:e0:83  UHLl   0   178119 - 1 aggr0
192.168.96.60  fe:e1:ba:d0:74:ef  UHLc   031021 - 3 aggr0
192.168.96.63  192.168.96.55  UHb00 - 1 aggr0

if gre0 on both frontends pointed at different legs on the same backend
server, i could write a pf rule like this:

pass out to port 80 route-to 10.0.0.1

10.0.0.1 would then end up as the rt_addr field in pf_state and
pfsync_state.

both frontend a and b would lookup the route to 10.0.0.1, and then
use 192.168.0.1 and 192.168.0.3 as the gateway address respectively.
both would end up pushing the packet over their gre link to the
same backend. the same semantic would work if the link to the backend
was over ethernet instead of a tunnel.

> > thoughts?
> > 
> 
> What you've said makes sense. However I still feel pfsync(4)
> does not play well with route-to.

maybe your opinion is different if the above makes sense?

> thanks and
> regards
> sashan

no, thank you for reading my long email.

cheers,
dlg



Re: net.inet.ip.forwarding=0 vs lo(4)

2020-10-19 Thread David Gwynne
On Mon, Oct 19, 2020 at 10:03:29AM +0100, Stuart Henderson wrote:
> On 2020/10/19 11:47, David Gwynne wrote:
> > On Sun, Oct 18, 2020 at 08:57:34PM +0100, Stuart Henderson wrote:
> > > On 2020/10/18 14:04, David Gwynne wrote:
> > > > the problem i'm hitting is that i have a multihomed box where the
> > > > service it provides listens on an IP address that's assigned to lo1.
> > > > it's a host running a service, it's not a router, so the
> > > > net.inet.ip.forwarding sysctl is not set to 1.
> > > 
> > > I ran into this, I just turned on the forwarding sysctl to avoid the
> > > problem.
> > > 
> > > > i came up with this diff, which adds even more special casing for
> > > > loopback interfaces. it says addreesses on loopbacks are globally
> > > > reachable, even if ip forwarding is disabled.
> > > 
> > > I don't see why loopbacks should be special. Another place this
> > > might show up is services running on carp addresses (I haven't updated
> > > those machines yet but there's a fair chance they'll be affected too).
> > > I would prefer an explicit sysctl to disable "strong host model".
> > 
> > loopback is already special. if a packet comes from an loopback
> > interface, we allow it to talk to any IP on the local machine. i think
> > this is mostly to cope with the semantic we've had where local traffic
> > get's tied to a loopback interface instead of going anywhere near the
> > physical ones.
> > 
> > carp is also special.
> > 
> > let me paste the ip_laddr function instead of the diff to it, it's a bit
> > more obvious what's going on:
> 
> Thanks, that will already work for the machines I was thinking of then.
> 
> > back to loopback and receiving packets. loopback is special because it
> > is not connected to the outside world. it is impossible to send a packet
> > via a loopback interface from another host, so configuring a globally
> > (externally) routable IP on it is currently pointless unless you enable
> > forwarding. i think making loopback more special and allowing it
> > to be globally reachable makes sense. i can't think of any downsides to
> > this at the moment, except that the behaviour would be subtle/not
> > obvious
> 
> ok, so it makes sense for this to be independent of any possible
> separate lever.
> 
> > is there a need to configure a globally reachable IP on a non-loopback
> > interface on a host (not router)? if so, then i'd be more convinced that
> > we need a separate lever to pull.
> 
> I'm not using it this way, but here's a scenario.
> 
> Say there are a couple of webservers with addresses from a carp on
> ethernet/vlan, with a link to their upstream router on some separate
> interface. They announce the carp prefix into ospf.

so carp is just being used to elect a webserver as a master, and then
the result of that election is fed upstream.

> They aren't routing themselves so the only reason to have forwarding=1
> is to have them use "weak host model".
> 
> With forwarding=0 I think they'll have to use "stub router no" otherwise
> everything will be announced high metric (rather than being dependent on
> carp state), but ospfd explicitly handles this; it's marked in parse.y
> with "/* allow to force non stub mode */".

so is a Big Global Lever what you want here? if you enable weak host
mode, all IPs on the host will be addressible from all legs of the
host. would it make more sense to configure specific interfaces as
holding globally addressible IPs?

if my understanding of your scenario is right, you could configure
the carp interface with the weak or globally accessible flag. in
my situation i could configure that on lo1.

dlg



Re: net.inet.ip.forwarding=0 vs lo(4)

2020-10-19 Thread Stuart Henderson
On 2020/10/19 11:47, David Gwynne wrote:
> On Sun, Oct 18, 2020 at 08:57:34PM +0100, Stuart Henderson wrote:
> > On 2020/10/18 14:04, David Gwynne wrote:
> > > the problem i'm hitting is that i have a multihomed box where the
> > > service it provides listens on an IP address that's assigned to lo1.
> > > it's a host running a service, it's not a router, so the
> > > net.inet.ip.forwarding sysctl is not set to 1.
> > 
> > I ran into this, I just turned on the forwarding sysctl to avoid the
> > problem.
> > 
> > > i came up with this diff, which adds even more special casing for
> > > loopback interfaces. it says addreesses on loopbacks are globally
> > > reachable, even if ip forwarding is disabled.
> > 
> > I don't see why loopbacks should be special. Another place this
> > might show up is services running on carp addresses (I haven't updated
> > those machines yet but there's a fair chance they'll be affected too).
> > I would prefer an explicit sysctl to disable "strong host model".
> 
> loopback is already special. if a packet comes from an loopback
> interface, we allow it to talk to any IP on the local machine. i think
> this is mostly to cope with the semantic we've had where local traffic
> get's tied to a loopback interface instead of going anywhere near the
> physical ones.
> 
> carp is also special.
> 
> let me paste the ip_laddr function instead of the diff to it, it's a bit
> more obvious what's going on:

Thanks, that will already work for the machines I was thinking of then.

> back to loopback and receiving packets. loopback is special because it
> is not connected to the outside world. it is impossible to send a packet
> via a loopback interface from another host, so configuring a globally
> (externally) routable IP on it is currently pointless unless you enable
> forwarding. i think making loopback more special and allowing it
> to be globally reachable makes sense. i can't think of any downsides to
> this at the moment, except that the behaviour would be subtle/not
> obvious

ok, so it makes sense for this to be independent of any possible
separate lever.

> is there a need to configure a globally reachable IP on a non-loopback
> interface on a host (not router)? if so, then i'd be more convinced that
> we need a separate lever to pull.

I'm not using it this way, but here's a scenario.

Say there are a couple of webservers with addresses from a carp on
ethernet/vlan, with a link to their upstream router on some separate
interface. They announce the carp prefix into ospf.

They aren't routing themselves so the only reason to have forwarding=1
is to have them use "weak host model".

With forwarding=0 I think they'll have to use "stub router no" otherwise
everything will be announced high metric (rather than being dependent on
carp state), but ospfd explicitly handles this; it's marked in parse.y
with "/* allow to force non stub mode */".



kqueue_scan() refactoring

2020-10-19 Thread Martin Pieuchot
Diff below is the second part of the refactoring [0] that got committed
then reverted 6 months ago.

The idea of this refactoring is to move the accounting and markers used
when collecting events to a context (struct kqueue_scan_state) such that
the collecting routine (kqueue_scan()) can be called multiple times.

Please test this especially with X and report back.

[0] https://marc.info/?l=openbsd-tech&m=158739418817156&w=2

Index: kern/kern_event.c
===
RCS file: /cvs/src/sys/kern/kern_event.c,v
retrieving revision 1.143
diff -u -p -r1.143 kern_event.c
--- kern/kern_event.c   11 Oct 2020 07:11:59 -  1.143
+++ kern/kern_event.c   19 Oct 2020 08:47:00 -
@@ -905,7 +905,6 @@ kqueue_scan(struct kqueue_scan_state *sc
 
nkev = 0;
kevp = kev;
-
count = maxevents;
if (count == 0)
goto done;
@@ -921,7 +920,8 @@ retry:
 
s = splhigh();
if (kq->kq_count == 0) {
-   if (tsp != NULL && !timespecisset(tsp)) {
+   if ((tsp != NULL && !timespecisset(tsp)) ||
+   scan->kqs_nevent != 0) {
splx(s);
error = 0;
goto done;
@@ -937,18 +937,32 @@ retry:
goto done;
}
 
-   TAILQ_INSERT_TAIL(&kq->kq_head, &scan->kqs_end, kn_tqe);
+   /*
+* Put the end marker in the queue to limit the scan to the events
+* that are currently active.  This prevents events from being
+* recollected if they reactivate during scan.
+*
+* If a partial scan has been performed already but no events have
+* been collected, reposition the end marker to make any new events
+* reachable.
+*/
+   if (!scan->kqs_queued) {
+   TAILQ_INSERT_TAIL(&kq->kq_head, &scan->kqs_end, kn_tqe);
+   scan->kqs_queued = 1;
+   } else if (scan->kqs_nevent == 0) {
+   TAILQ_REMOVE(&kq->kq_head, &scan->kqs_end, kn_tqe);
+   TAILQ_INSERT_TAIL(&kq->kq_head, &scan->kqs_end, kn_tqe);
+   }
+
TAILQ_INSERT_HEAD(&kq->kq_head, &scan->kqs_start, kn_tqe);
while (count) {
kn = TAILQ_NEXT(&scan->kqs_start, kn_tqe);
if (kn->kn_filter == EVFILT_MARKER) {
if (kn == &scan->kqs_end) {
-   TAILQ_REMOVE(&kq->kq_head, &scan->kqs_end,
-   kn_tqe);
TAILQ_REMOVE(&kq->kq_head, &scan->kqs_start,
kn_tqe);
splx(s);
-   if (count == maxevents)
+   if (scan->kqs_nevent == 0)
goto retry;
goto done;
}
@@ -984,6 +998,9 @@ retry:
*kevp = kn->kn_kevent;
kevp++;
nkev++;
+   count--;
+   scan->kqs_nevent++;
+
if (kn->kn_flags & EV_ONESHOT) {
splx(s);
kn->kn_fop->f_detach(kn);
@@ -1009,7 +1026,6 @@ retry:
knote_release(kn);
}
kqueue_check(kq);
-   count--;
if (nkev == KQ_NEVENTS) {
splx(s);
 #ifdef KTRACE
@@ -1026,7 +1042,6 @@ retry:
break;
}
}
-   TAILQ_REMOVE(&kq->kq_head, &scan->kqs_end, kn_tqe);
TAILQ_REMOVE(&kq->kq_head, &scan->kqs_start, kn_tqe);
splx(s);
 done:
@@ -1059,15 +1074,21 @@ void
 kqueue_scan_finish(struct kqueue_scan_state *scan)
 {
struct kqueue *kq = scan->kqs_kq;
+   int s;
 
KASSERT(scan->kqs_start.kn_filter == EVFILT_MARKER);
KASSERT(scan->kqs_start.kn_status == KN_PROCESSING);
KASSERT(scan->kqs_end.kn_filter == EVFILT_MARKER);
KASSERT(scan->kqs_end.kn_status == KN_PROCESSING);
 
+   if (scan->kqs_queued) {
+   scan->kqs_queued = 0;
+   s = splhigh();
+   TAILQ_REMOVE(&kq->kq_head, &scan->kqs_end, kn_tqe);
+   splx(s);
+   }
KQRELE(kq);
 }
-
 
 /*
  * XXX
Index: sys/event.h
===
RCS file: /cvs/src/sys/sys/event.h,v
retrieving revision 1.46
diff -u -p -r1.46 event.h
--- sys/event.h 11 Oct 2020 07:11:59 -  1.46
+++ sys/event.h 19 Oct 2020 08:39:51 -
@@ -204,6 +204,9 @@ struct kqueue_scan_state {
struct kqueue   *kqs_kq;/* kqueue of this scan */
struct knote kqs_start; /* start marker */
struct knote kqs_end;   /* end marker */
+   int  kqs_nevent;/* number of events collected */
+   int  kqs_queued;/*

Re: pf route-to issues

2020-10-19 Thread Stuart Henderson
On 2020/10/19 15:35, David Gwynne wrote:
> every few years i try and use route-to in pf, and every time it
> goes badly. i tried it again last week in a slightly different
> setting, and actually tried to understand the sharp edges i hit
> this time instead of giving up. it turns out there are 2 or 3
> different things together that have cause me trouble, which is why
> the diff below is so big.

I used to route-to/reply-to quite a lot at places with poor internet
connections to split traffic between lines (mostly those have better
connections now so I don't need it as often). It worked as I expected -
but I only ever used it with the interface specified.

I mostly used it with pppoe interfaces so the peer address was unknown
at ruleset load time. (I was lucky and had static IPs my side, but the
ISP side was variable). I relied on the fact that once packets are
directed at a point-point interface there's only one place for them to
go. I didn't notice that ":peer" might be useful here (and the syntax
'route-to pppoe1:peer@pppoe1' is pretty awkward so I probably wouldn't
have come up with it), I had 0.0.0.1@pppoe1, 0.0.0.2@pppoe2 etc
(though actually I think it works with $any_random_address@pppoeX).

> the first and i would argue most fundamental problem is a semantic
> problem. if you ask a random person who has some clue about networks
> and routing what they would expect the "argument" to route-to or
> reply-to to be, they would say "a nexthop address" or "a gateway
> address". eg, say i want to force packets to a specific backend
> server without using NAT, i would write a rule like this:
> 
>   n_servers="192.0.2.128/27"
>   pass out on $if_internal to $n_servers route-to 192.168.0.1
> 
> pfctl will happily parse this, shove it into the kernel, let you read
> the rules back out again with pfctl -sr, and it all looks plausible, but
> it turns out that it's using the argument to route-to as an interface
> name. because rulesets can refer to interfaces that don't exist yet, pf
> just passes the IP address around as a string, hoping i'll plug in an
> interface with a driver name that looks like an ip address. i spent
> literally a day trying to figure out why a rule like this wasn't
> working.

I don't think I tried this, but the pf.conf(5) BNF syntax suggests it's
supposed to work. So either doc or implementation bug there.

 route  = ( "route-to" | "reply-to" | "dup-to" )
  ( routehost | "{" routehost-list "}" )
  [ pooltype ]

 routehost-list = routehost [ [ "," ] routehost-list ]

 routehost  = host | host "@" interface-name |
  "(" interface-name [ address [ "/" mask-bits ] ] ")"

> the second problem is that the pf_route calls from pfsync don't
> have all the information it is supposed to have. more specifically,
> an ifp pointer isn't set which leads to a segfault. the ifp pointer
> isn't set because pfsync doesnt track which interface a packet is
> going out, it assumes the ip layer will get it right again later, or a
> rule provided something usable.
> 
> the third problem is that pf_route relies on information from rules to
> work correctly. this is a problem in a pfsync environment because you
> cannot have the same ruleset on both firewalls 100% of the time, which
> means you cannot have route-to/reply-to behave consistently on a pair of
> firwalls 100% of the time.

I didn't run into this because pppoe(4) and pfsync/carp don't really
go well together, but ouch!

> all of this together makes things work pretty obviously and smoothly.
> in my opinion anyway. route-to now works more like rdr-to, it just
> feels like it changes the address used for the route lookup rather
> than changing the actual IP address in the packet. it also works
> predictably in a pfsync pair, which is great from the point of view of
> high availability.
> 
> the main caveat is that it's not backward compatible. if you're already
> using route-to, you will need to tweak your rules to have them parse.
> however, i doubt anyone is using this stuff because it feels very broken
> to me.

Do you expect this to work with a bracketed "address" to defer lookup
until rule evaluation time? i.e.

pass out proto tcp to any port 22 route-to (pppoe1:peer)

I think that will be all that's needed to allow converting the pppoe
use case. I don't have a multiple pppoe setup handy but I can probably
hack together some sort of test.

I've also used route-to with squid "transparent" proxying (shown in
the pkg-readme), I don't do that any more but I can put a squid test
together easily enough.

> @@ -1842,37 +1833,18 @@ pfrule: action dir logquick interface 
>   decide_address_family($7.src.host, &r.af);
>   decide_address_family($7.dst.host, &r.af);
>  
> - if ($8.route.rt) {
> + if ($8.rt) {
> + if ($8.rt != PF_DUPTO && !r.direction) {
> 

Re: uao_init() cleanup

2020-10-19 Thread Mark Kettenis
> Date: Mon, 19 Oct 2020 10:00:49 +0200
> From: Martin Pieuchot 
> 
> uao_init() is called from uvm_km_init() which itself is called by
> uvm_init().  None of the *init() functions in UVM have a guard, so be
> coherent and remove this one.
> 
> ok?

Yes, that seems redundant.

ok kettenis@

> Index: uvm/uvm_aobj.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_aobj.c,v
> retrieving revision 1.87
> diff -u -p -r1.87 uvm_aobj.c
> --- uvm/uvm_aobj.c22 Sep 2020 14:31:08 -  1.87
> +++ uvm/uvm_aobj.c13 Oct 2020 09:25:20 -
> @@ -788,12 +788,6 @@ uao_create(vsize_t size, int flags)
>  void
>  uao_init(void)
>  {
> - static int uao_initialized;
> -
> - if (uao_initialized)
> - return;
> - uao_initialized = TRUE;
> -
>   /*
>* NOTE: Pages for this pool must not come from a pageable
>* kernel map!
> 
> 



UVM fault check

2020-10-19 Thread Martin Pieuchot
uvm_fault() is one of the most contended "entry point" of the kernel.
To reduce this contention I'm carefully refactoring this code to be able
to push the KERNEL_LOCK() inside the fault handler.

The first aim of this project would be to get the upper layer faults
(cases 1A and 1B) out of ze big lock.  As these faults do not involve
`uobj' the scope of this project should be limited to serializing amap
changes without the KERNEL_LOCK().

The diff below moves the first part of uvm_fault() into its own
function: uvm_fault_check().  It is inspired by/imitates the current
code structure of NetBSD's fault handler. 

This diff should not have any functional change.

I hope it helps build better understanding of this area.

Comments?  Oks?

Index: uvm/uvm_fault.c
===
RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
retrieving revision 1.102
diff -u -p -r1.102 uvm_fault.c
--- uvm/uvm_fault.c 29 Sep 2020 11:47:41 -  1.102
+++ uvm/uvm_fault.c 12 Oct 2020 09:01:05 -
@@ -472,114 +472,101 @@ uvmfault_update_stats(struct uvm_faultin
}
 }
 
-/*
- *   F A U L T   -   m a i n   e n t r y   p o i n t
- */
+struct uvm_faultctx {
+   /*
+* the following members are set up by uvm_fault_check() and
+* read-only after that.
+*/
+   vm_prot_t enter_prot;
+   vaddr_t startva;
+   int npages;
+   int centeridx;
+   boolean_t narrow;
+   boolean_t wired;
+   paddr_t pa_flags;
+};
 
 /*
- * uvm_fault: page fault handler
+ * uvm_fault_check: check prot, handle needs-copy, etc.
  *
- * => called from MD code to resolve a page fault
- * => VM data structures usually should be unlocked.   however, it is
- * possible to call here with the main map locked if the caller
- * gets a write lock, sets it recursive, and then calls us (c.f.
- * uvm_map_pageable).   this should be avoided because it keeps
- * the map locked off during I/O.
+ * 1. lookup entry.
+ * 2. check protection.
+ * 3. adjust fault condition (mainly for simulated fault).
+ * 4. handle needs-copy (lazy amap copy).
+ * 5. establish range of interest for neighbor fault (aka pre-fault).
+ * 6. look up anons (if amap exists).
+ * 7. flush pages (if MADV_SEQUENTIAL)
+ *
+ * => called with nothing locked.
+ * => if we fail (result != 0) we unlock everything.
+ * => initialize/adjust many members of flt.
  */
-#define MASK(entry) (UVM_ET_ISCOPYONWRITE(entry) ? \
-~PROT_WRITE : PROT_MASK)
 int
-uvm_fault(vm_map_t orig_map, vaddr_t vaddr, vm_fault_t fault_type,
-vm_prot_t access_type)
+uvm_fault_check(struct uvm_faultinfo *ufi, struct uvm_faultctx *flt,
+struct vm_anon ***ranons, vm_prot_t access_type)
 {
-   struct uvm_faultinfo ufi;
-   vm_prot_t enter_prot;
-   boolean_t wired, narrow, promote, locked, shadowed;
-   int npages, nback, nforw, centeridx, result, lcv, gotpages, ret;
-   vaddr_t startva, currva;
-   voff_t uoff;
-   paddr_t pa, pa_flags;
struct vm_amap *amap;
struct uvm_object *uobj;
-   struct vm_anon *anons_store[UVM_MAXRANGE], **anons, *anon, *oanon;
-   struct vm_page *pages[UVM_MAXRANGE], *pg, *uobjpage;
+   int nback, nforw;
 
-   anon = NULL;
-   pg = NULL;
-
-   uvmexp.faults++;/* XXX: locking? */
-   TRACEPOINT(uvm, fault, vaddr, fault_type, access_type, NULL);
-
-   /* init the IN parameters in the ufi */
-   ufi.orig_map = orig_map;
-   ufi.orig_rvaddr = trunc_page(vaddr);
-   ufi.orig_size = PAGE_SIZE;  /* can't get any smaller than this */
-   if (fault_type == VM_FAULT_WIRE)
-   narrow = TRUE;  /* don't look for neighborhood
-* pages on wire */
-   else
-   narrow = FALSE; /* normal fault */
-
-   /* "goto ReFault" means restart the page fault from ground zero. */
-ReFault:
/* lookup and lock the maps */
-   if (uvmfault_lookup(&ufi, FALSE) == FALSE) {
+   if (uvmfault_lookup(ufi, FALSE) == FALSE) {
return (EFAULT);
}
 
 #ifdef DIAGNOSTIC
-   if ((ufi.map->flags & VM_MAP_PAGEABLE) == 0)
+   if ((ufi->map->flags & VM_MAP_PAGEABLE) == 0)
panic("uvm_fault: fault on non-pageable map (%p, 0x%lx)",
-   ufi.map, vaddr);
+   ufi->map, ufi->orig_rvaddr);
 #endif
 
/* check protection */
-   if ((ufi.entry->protection & access_type) != access_type) {
-   uvmfault_unlockmaps(&ufi, FALSE);
+   if ((ufi->entry->protection & access_type) != access_type) {
+   uvmfault_unlockmaps(ufi, FALSE);
return (EACCES);
}
 
/*
 * "enter_prot" is the protection we want to enter the page in at.
 * for certain pages (e.g. copy-on-write pages) this protection can
-* be more strict than ufi.en

uao_init() cleanup

2020-10-19 Thread Martin Pieuchot
uao_init() is called from uvm_km_init() which itself is called by
uvm_init().  None of the *init() functions in UVM have a guard, so be
coherent and remove this one.

ok?

Index: uvm/uvm_aobj.c
===
RCS file: /cvs/src/sys/uvm/uvm_aobj.c,v
retrieving revision 1.87
diff -u -p -r1.87 uvm_aobj.c
--- uvm/uvm_aobj.c  22 Sep 2020 14:31:08 -  1.87
+++ uvm/uvm_aobj.c  13 Oct 2020 09:25:20 -
@@ -788,12 +788,6 @@ uao_create(vsize_t size, int flags)
 void
 uao_init(void)
 {
-   static int uao_initialized;
-
-   if (uao_initialized)
-   return;
-   uao_initialized = TRUE;
-
/*
 * NOTE: Pages for this pool must not come from a pageable
 * kernel map!



Re: pf route-to issues

2020-10-19 Thread Alexandr Nedvedicky
Hello,

disclaimer: I have no chance to run pfsync on production, I'm very
inexperienced with pfsync(4).


> 
> the third problem is that pf_route relies on information from rules to
> work correctly. this is a problem in a pfsync environment because you
> cannot have the same ruleset on both firewalls 100% of the time, which
> means you cannot have route-to/reply-to behave consistently on a pair of
> firwalls 100% of the time.
> 
> my solution to both these problems is reduce the amount of information
> pf_route needs to work with, to make sure that the info it does need
> is in the pf state structure, and that pfsync handles it properly.
> 
> if we limit the information needed for pf_route to a nexthop address,
> and which direction the address is used, this is doable. both the
> pf_state and pfsync_state structs already contain an address to store a
> nexthop in, i just had to move the route-to direction from the rule into
> the state. this is easy with pf_state, but i used a spare pad field in
> pfsync_state for this.
> 

it seems to me 'route-to vs. pfsync' still needs more thought.  the
next-hop IP address in route-to may be different for each PF box
linked by pfsync(4). To be honest I have no answer to address this
issue at the moment.

> 
> thoughts?
> 

What you've said makes sense. However I still feel pfsync(4)
does not play well with route-to.


thanks and
regards
sashan