Re: hid_is_collection() sync with NetBSD

2021-01-11 Thread Marcus Glocker
On Sun, 10 Jan 2021 00:04:21 +0100
Marcus Glocker  wrote:

> On Sat, 9 Jan 2021 21:51:03 +0100 (CET)
> Mark Kettenis  wrote:
> 
> > > Date: Sat, 9 Jan 2021 10:38:20 +0100
> > > From: Marcus Glocker 
> > > 
> > > I have not much clue about HID, but when we did some testing for
> > > the new ujoy(4) driver it turned out that the PS4 controller
> > > doesn't get handled correctly by hid.c:hid_is_collection().  This
> > > made me peek in to the NetBSD code where I could find an update
> > > in this function.
> > > 
> > > Syncing it up makes the PS4 controller attachment work correctly,
> > > while not breaking my other HID devices.
> > > 
> > > The change was introduced in NetBSD with hid.c revision 1.25 on
> > > the 19.06.2006, obviously as part of a bigger change, not saying
> > > something about the hid_is_collection() update itself:
> > > 
> > > --
> > > 
> > > Initial import of bluetooth stack on behalf of Iain Hibbert.
> > > (plunky@, NetBSD Foundation Membership still pending.)  This stack
> > > was written by Iain under sponsorship from Itronix Inc.
> > > 
> > > The stack includes support for rfcomm networking (networking via
> > > your bluetooth enabled cell phone), hid devices (keyboards/mice),
> > > and headsets.
> > > 
> > > Drivers for both PCMCIA and USB bluetooth controllers are
> > > included.
> > > 
> > > --
> > > 
> > > http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/usb/Attic/hid.c.diff?r1=1.24=1.25_with_tag=MAIN
> > > 
> > > More testers?  Comments?  OKs?
> > 
> > Probably needs some testing with touchpad and touchscreen devices.  
> 
> Hmm, yes.  I just quickly tested this on my X1 with ums(4)
> touchscreen. Beside that it still works, the diff seems to fix
> something in the ums attach.  Before the diff:
> 
> uhidev0 at uhub0 port 10 configuration 1 interface 0 "Raydium
> Corporation Raydium Touch System" rev 2.01/1.08 addr 4 uhidev0:
> iclass 3/0, 68 report ids uhid0 at uhidev0 reportid 1: input=0,
> output=63, feature=0 uhid1 at uhidev0 reportid 2: input=63, output=0,
> feature=0 uhid2 at uhidev0 reportid 3: input=0, output=63, feature=0
> uhid3 at uhidev0 reportid 4: input=0, output=63, feature=0
> uhid4 at uhidev0 reportid 5: input=0, output=63, feature=0
> uhid5 at uhidev0 reportid 6: input=63, output=0, feature=0
> uhid6 at uhidev0 reportid 7: input=0, output=63, feature=0
> uhid7 at uhidev0 reportid 8: input=0, output=63, feature=0
> uhid8 at uhidev0 reportid 9: input=63, output=0, feature=0
> ums0 at uhidev0 reportid 10: 1 button, tip
> wsmouse2 at ums0 mux 0
> uhid9 at uhidev0 reportid 11: input=0, output=0, feature=1
> uhid10 at uhidev0 reportid 12: input=0, output=0, feature=255
> uhid11 at uhidev0 reportid 13: input=0, output=0, feature=255
> ums1 at uhidev0 reportid 68
> ums1: mouse has no X report
> 
> I didn't notice before, but ums1 seems not to make sense, since there
> is no second ums device attached here.
> 
> After the diff the reportid 68 just turns in another uhid of uhidev0
> which makes more sense I guess:
> 
> uhidev0 at uhub0 port 10 configuration 1 interface 0 "Raydium
> Corporation Raydium Touch System" rev 2.01/1.08 addr 4 uhidev0:
> iclass 3/0, 68 report ids uhid0 at uhidev0 reportid 1: input=0,
> output=63, feature=0 uhid1 at uhidev0 reportid 2: input=63, output=0,
> feature=0 uhid2 at uhidev0 reportid 3: input=0, output=63, feature=0
> uhid3 at uhidev0 reportid 4: input=0, output=63, feature=0
> uhid4 at uhidev0 reportid 5: input=0, output=63, feature=0
> uhid5 at uhidev0 reportid 6: input=63, output=0, feature=0
> uhid6 at uhidev0 reportid 7: input=0, output=63, feature=0
> uhid7 at uhidev0 reportid 8: input=0, output=63, feature=0
> uhid8 at uhidev0 reportid 9: input=63, output=0, feature=0
> ums0 at uhidev0 reportid 10: 1 button, tip
> wsmouse2 at ums0 mux 0
> uhid9 at uhidev0 reportid 11: input=0, output=0, feature=1
> uhid10 at uhidev0 reportid 12: input=0, output=0, feature=255
> uhid11 at uhidev0 reportid 13: input=0, output=0, feature=255
> uhid12 at uhidev0 reportid 68: input=0, output=0, feature=255

jcs@ found that it breaks imt(4), and maybe ims(4).  So unfortunately
that doesn't seem to be a solution either.

> > > Index: dev/hid/hid.c
> > > ===
> > > RCS file: /cvs/src/sys/dev/hid/hid.c,v
> > > retrieving revision 1.3
> > > diff -u -p -u -p -r1.3 hid.c
> > > --- dev/hid/hid.c 4 Jun 2020 23:03:43 -   1.3
> > > +++ dev/hid/hid.c 9 Jan 2021 08:43:30 -
> > > @@ -630,6 +630,25 @@ hid_get_udata(const uint8_t *buf, int le
> > >  return (hid_get_data_sub(buf, len, loc, 0));
> > >  }
> > >  
> > > +/*
> > > + * hid_is_collection(desc, size, id, usage)
> > > + *
> > > + * This function is broken in the following way.
> > > + *
> > > + * It is used to discover if the given 'id' is part of 'usage'
> > > collection
> > > + * in the descriptor in order to match report id against device
> > > type.
> > > + *
> > > + * The semantics of hid_start_parse() means though, 

Wireguard: can't remove multiple peers at once.

2021-01-11 Thread Yuichiro NAITO


Hi.

I have set up multiple peers in a wg0 interface,
and tried to remove more than one peers at once.
Ifconfig(1) only removes the first peer.

Command line was like following.

```
# ifconfig wg0 -wgpeer  -wgpeer  -wgpeer 
```

Only  was removed.

I think next peer pointer isn't calculated in case of removing peer
in sys/net/if_wg.c: wg_ioctl_set() function.

I have tried following patch that can fix this problem.
Is it OK?

diff --git a/sys/net/if_wg.c b/sys/net/if_wg.c
index c534f966363..c21f883269f 100644
--- a/sys/net/if_wg.c
+++ b/sys/net/if_wg.c
@@ -2270,7 +2270,7 @@ wg_ioctl_set(struct wg_softc *sc, struct wg_data_io *data)
 
/* Peer must have public key */
if (!(peer_o.p_flags & WG_PEER_HAS_PUBLIC))
-   continue;
+   goto next_peer;
 
/* 0 = latest protocol, 1 = this protocol */
if (peer_o.p_protocol_version != 0) {
@@ -2283,7 +2283,7 @@ wg_ioctl_set(struct wg_softc *sc, struct wg_data_io *data)
/* Get local public and check that peer key doesn't match */
if (noise_local_keys(>sc_local, public, NULL) == 0 &&
bcmp(public, peer_o.p_public, WG_KEY_SIZE) == 0)
-   continue;
+   goto next_peer;
 
/* Lookup peer, or create if it doesn't exist */
if ((peer = wg_peer_lookup(sc, peer_o.p_public)) == NULL) {
@@ -2291,7 +2291,7 @@ wg_ioctl_set(struct wg_softc *sc, struct wg_data_io *data)
 * Also, don't create a new one if we only want to
 * update. */
if (peer_o.p_flags & (WG_PEER_REMOVE|WG_PEER_UPDATE))
-   continue;
+   goto next_peer;
 
if ((peer = wg_peer_create(sc,
peer_o.p_public)) == NULL) {
@@ -2303,7 +2303,7 @@ wg_ioctl_set(struct wg_softc *sc, struct wg_data_io *data)
/* Remove peer and continue if specified */
if (peer_o.p_flags & WG_PEER_REMOVE) {
wg_peer_destroy(peer);
-   continue;
+   goto next_peer;
}
 
if (peer_o.p_flags & WG_PEER_HAS_ENDPOINT)
@@ -2333,6 +2333,11 @@ wg_ioctl_set(struct wg_softc *sc, struct wg_data_io 
*data)
}
 
peer_p = (struct wg_peer_io *)aip_p;
+   continue;
+   next_peer:
+   aip_p = _p->p_aips[0];
+   aip_p += peer_o.p_aips_count;
+   peer_p = (struct wg_peer_io *)aip_p;
}
 
 error:

—
Yuichiro NAITO
naito.yuich...@gmail.com



Re: pf log user and group

2021-01-11 Thread Vitaliy Makkoveev
ok mvs@

> On 11 Jan 2021, at 19:49, Alexander Bluhm  wrote:
> 
> Hi,
> 
> Sometimes an uid is logged in pflog(4) although the logopt of the
> rule does not specify it.  Check the option again for the log rule
> in case another rule has triggered a socket lookup.  Remove logopt
> group, it is not documented and cannot work as struct pfloghdr does
> not contain a gid.  Rename PF_LOG_SOCKET_LOOKUP to PF_LOG_USER to
> express what it does.  The lookup involved is only an implemntation
> detail.
> 
> ok?
> 
> bluhm
> 
> Index: sys/net/if_pflog.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_pflog.c,v
> retrieving revision 1.91
> diff -u -p -r1.91 if_pflog.c
> --- sys/net/if_pflog.c28 Aug 2020 12:01:48 -  1.91
> +++ sys/net/if_pflog.c11 Jan 2021 14:44:55 -
> @@ -253,9 +253,9 @@ pflog_packet(struct pf_pdesc *pd, u_int8
>   strlcpy(hdr.ruleset, ruleset->anchor->name,
>   sizeof(hdr.ruleset));
>   }
> - if (trigger->log & PF_LOG_SOCKET_LOOKUP && !pd->lookup.done)
> + if (trigger->log & PF_LOG_USER && !pd->lookup.done)
>   pd->lookup.done = pf_socket_lookup(pd);
> - if (pd->lookup.done > 0) {
> + if (trigger->log & PF_LOG_USER && pd->lookup.done > 0) {
>   hdr.uid = pd->lookup.uid;
>   hdr.pid = pd->lookup.pid;
>   } else {
> Index: sys/net/pfvar.h
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/pfvar.h,v
> retrieving revision 1.497
> diff -u -p -r1.497 pfvar.h
> --- sys/net/pfvar.h   14 Oct 2020 19:22:14 -  1.497
> +++ sys/net/pfvar.h   11 Jan 2021 14:46:54 -
> @@ -156,7 +156,7 @@ enum  { PF_ADDR_ADDRMASK, PF_ADDR_NOROUTE
> 
> #define   PF_LOG  0x01
> #define   PF_LOG_ALL  0x02
> -#define  PF_LOG_SOCKET_LOOKUP0x04
> +#define  PF_LOG_USER 0x04
> #define   PF_LOG_FORCE0x08
> #define   PF_LOG_MATCHES  0x10
> 
> Index: sbin/pfctl/parse.y
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sbin/pfctl/parse.y,v
> retrieving revision 1.707
> diff -u -p -r1.707 parse.y
> --- sbin/pfctl/parse.y16 Dec 2020 18:01:16 -  1.707
> +++ sbin/pfctl/parse.y11 Jan 2021 14:44:46 -
> @@ -2409,8 +2409,7 @@ logopts : logopt{ $$ = 
> $1; }
> 
> logopt: ALL   { $$.log = PF_LOG_ALL; $$.logif = 0; }
>   | MATCHES   { $$.log = PF_LOG_MATCHES; $$.logif = 0; }
> - | USER  { $$.log = PF_LOG_SOCKET_LOOKUP; $$.logif = 0; }
> - | GROUP { $$.log = PF_LOG_SOCKET_LOOKUP; $$.logif = 0; }
> + | USER  { $$.log = PF_LOG_USER; $$.logif = 0; }
>   | TO string {
>   const char  *errstr;
>   u_inti;
> Index: sbin/pfctl/pfctl_parser.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sbin/pfctl/pfctl_parser.c,v
> retrieving revision 1.344
> diff -u -p -r1.344 pfctl_parser.c
> --- sbin/pfctl/pfctl_parser.c 29 Dec 2020 19:50:28 -  1.344
> +++ sbin/pfctl/pfctl_parser.c 11 Jan 2021 14:44:26 -
> @@ -795,7 +795,7 @@ print_rule(struct pf_rule *r, const char
>   printf("%sall", count++ ? ", " : "");
>   if (r->log & PF_LOG_MATCHES)
>   printf("%smatches", count++ ? ", " : "");
> - if (r->log & PF_LOG_SOCKET_LOOKUP)
> + if (r->log & PF_LOG_USER)
>   printf("%suser", count++ ? ", " : "");
>   if (r->logif)
>   printf("%sto pflog%u", count++ ? ", " : "",
> 



Re: pf log user and group

2021-01-11 Thread Alexandr Nedvedicky
Hello,

looks good to me.

OK sashan

On Mon, Jan 11, 2021 at 05:49:10PM +0100, Alexander Bluhm wrote:
> Hi,
> 
> Sometimes an uid is logged in pflog(4) although the logopt of the
> rule does not specify it.  Check the option again for the log rule
> in case another rule has triggered a socket lookup.  Remove logopt
> group, it is not documented and cannot work as struct pfloghdr does
> not contain a gid.  Rename PF_LOG_SOCKET_LOOKUP to PF_LOG_USER to
> express what it does.  The lookup involved is only an implemntation
> detail.
> 
> ok?
> 
> bluhm
> 



Re: pf log user and group

2021-01-11 Thread Klemens Nanni
On Mon, Jan 11, 2021 at 05:49:10PM +0100, Alexander Bluhm wrote:
> Sometimes an uid is logged in pflog(4) although the logopt of the
> rule does not specify it.  Check the option again for the log rule
> in case another rule has triggered a socket lookup.  Remove logopt
> group, it is not documented and cannot work as struct pfloghdr does
> not contain a gid.  Rename PF_LOG_SOCKET_LOOKUP to PF_LOG_USER to
> express what it does.  The lookup involved is only an implemntation
> detail.
OK kn



pf log user and group

2021-01-11 Thread Alexander Bluhm
Hi,

Sometimes an uid is logged in pflog(4) although the logopt of the
rule does not specify it.  Check the option again for the log rule
in case another rule has triggered a socket lookup.  Remove logopt
group, it is not documented and cannot work as struct pfloghdr does
not contain a gid.  Rename PF_LOG_SOCKET_LOOKUP to PF_LOG_USER to
express what it does.  The lookup involved is only an implemntation
detail.

ok?

bluhm

Index: sys/net/if_pflog.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_pflog.c,v
retrieving revision 1.91
diff -u -p -r1.91 if_pflog.c
--- sys/net/if_pflog.c  28 Aug 2020 12:01:48 -  1.91
+++ sys/net/if_pflog.c  11 Jan 2021 14:44:55 -
@@ -253,9 +253,9 @@ pflog_packet(struct pf_pdesc *pd, u_int8
strlcpy(hdr.ruleset, ruleset->anchor->name,
sizeof(hdr.ruleset));
}
-   if (trigger->log & PF_LOG_SOCKET_LOOKUP && !pd->lookup.done)
+   if (trigger->log & PF_LOG_USER && !pd->lookup.done)
pd->lookup.done = pf_socket_lookup(pd);
-   if (pd->lookup.done > 0) {
+   if (trigger->log & PF_LOG_USER && pd->lookup.done > 0) {
hdr.uid = pd->lookup.uid;
hdr.pid = pd->lookup.pid;
} else {
Index: sys/net/pfvar.h
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/pfvar.h,v
retrieving revision 1.497
diff -u -p -r1.497 pfvar.h
--- sys/net/pfvar.h 14 Oct 2020 19:22:14 -  1.497
+++ sys/net/pfvar.h 11 Jan 2021 14:46:54 -
@@ -156,7 +156,7 @@ enum{ PF_ADDR_ADDRMASK, PF_ADDR_NOROUTE
 
 #definePF_LOG  0x01
 #definePF_LOG_ALL  0x02
-#definePF_LOG_SOCKET_LOOKUP0x04
+#definePF_LOG_USER 0x04
 #definePF_LOG_FORCE0x08
 #definePF_LOG_MATCHES  0x10
 
Index: sbin/pfctl/parse.y
===
RCS file: /data/mirror/openbsd/cvs/src/sbin/pfctl/parse.y,v
retrieving revision 1.707
diff -u -p -r1.707 parse.y
--- sbin/pfctl/parse.y  16 Dec 2020 18:01:16 -  1.707
+++ sbin/pfctl/parse.y  11 Jan 2021 14:44:46 -
@@ -2409,8 +2409,7 @@ logopts   : logopt{ $$ = 
$1; }
 
 logopt : ALL   { $$.log = PF_LOG_ALL; $$.logif = 0; }
| MATCHES   { $$.log = PF_LOG_MATCHES; $$.logif = 0; }
-   | USER  { $$.log = PF_LOG_SOCKET_LOOKUP; $$.logif = 0; }
-   | GROUP { $$.log = PF_LOG_SOCKET_LOOKUP; $$.logif = 0; }
+   | USER  { $$.log = PF_LOG_USER; $$.logif = 0; }
| TO string {
const char  *errstr;
u_inti;
Index: sbin/pfctl/pfctl_parser.c
===
RCS file: /data/mirror/openbsd/cvs/src/sbin/pfctl/pfctl_parser.c,v
retrieving revision 1.344
diff -u -p -r1.344 pfctl_parser.c
--- sbin/pfctl/pfctl_parser.c   29 Dec 2020 19:50:28 -  1.344
+++ sbin/pfctl/pfctl_parser.c   11 Jan 2021 14:44:26 -
@@ -795,7 +795,7 @@ print_rule(struct pf_rule *r, const char
printf("%sall", count++ ? ", " : "");
if (r->log & PF_LOG_MATCHES)
printf("%smatches", count++ ? ", " : "");
-   if (r->log & PF_LOG_SOCKET_LOOKUP)
+   if (r->log & PF_LOG_USER)
printf("%suser", count++ ? ", " : "");
if (r->logif)
printf("%sto pflog%u", count++ ? ", " : "",



OpenBSD Errata: January 11th, 2021 (nd6)

2021-01-11 Thread Sebastian Benoit
Errata patches for the kernel have been released for OpenBSD 6.7 and 6.8.

When an NDP entry is invalidated the associated layer 2 address is not
invalidated.

Binary updates for the amd64, i386, and arm64 platforms are available via
the syspatch utility. Source code patches can be found on the respective
errata page:

  https://www.openbsd.org/errata67.html
  https://www.openbsd.org/errata68.html

As these affect the kernel, a reboot will be needed after patching.



Re: uvm_fault: amap & anon locking

2021-01-11 Thread Mark Kettenis
> Date: Mon, 11 Jan 2021 11:35:17 -0300
> From: Martin Pieuchot 
> 
> On 31/12/20(Thu) 22:35, Mark Kettenis wrote:
> > > Date: Wed, 30 Dec 2020 11:19:41 -0300
> > > From: Martin Pieuchot 
> > > 
> > > Diff below adds some locking to UVM's amap & anon data structures that
> > > should be enough to get the upper part of the fault handler out of the
> > > KERNEL_LOCK().
> > > 
> > > This diff doesn't unlock the fault handler, I'd suggest to do this in a
> > > later step on an arch by arch basis.
> > > 
> > > This is a port of what exists in NetBSD.  A rwlock is attached to every
> > > amap and is shared with all its anon.  The same lock will be used by
> > > multiple amaps if they have anons in common.  This diff includes the new
> > > rw_obj_* API required to have reference-counted rwlocks.
> > > 
> > > Other than that a global rwlock is used to protect the list of amap and
> > > many pool have been converted to use a rwlock internally to not create
> > > lock ordering problem when allocations are made while holding a rwlock.
> > 
> > Can you explain what those lock odering problems are?  To me it seems
> > that a pool mutex should always be taken last, and that the pool
> > system doesn't need to enter the amap or anon code.
> 
> This is the same problem prevented by IPL_MPFLOOR but with a different
> approach.  Taking an interrupt whose handler grab the KERNEL_LOCK() while
> holding a mutex can lead to a deadlock if this mutex can be grabbed with
> the KERNEL_LOCK() held.
> 
> Using PR_RWLOCK was the approach introduced in guenther@'s diff to
> unlock mmap(2).  The way I understand it is that this code should only be
> used in process contexts and that's why PR_RWLOCK got introduced.

The downside of the PR_RWLOCK approach is that it introduces
additional sleeping points.  I think that's best avoided "deep" in the
kernel like the amap and anon implementation.

> This leads to me question whether PR_RWLOCK is worth the complexity.  Since
> then its used has spread across the tree.
> 
> > Removing the check in pool_get that PR_WAITOK is set for pools created
> > with PR_RWLOCK is a bit problematic from my perspective.  At the very
> > least we should adjust the man page.
> 
> This is necessary if we want to use PR_RWLOCK since uvm_analloc() does
> not sleep to be able to handle out-of-RAM conditions in the fault
> handler.
> 
> The alternative would be to use IPL_MPFLOOR when allocating anon & amap,
> this has the advantage of documenting that this IPL change is only
> "temporary".
> 
> Diff below does that.  If we agree on this approach I'd like to start
> collecting oks, for parts or for the whole.

I'd be happier with this approach.  Are you confident the switch
doesn't invalidate the testing that we have done already?

ok kettenis@

> Index: kern/init_main.c
> ===
> RCS file: /cvs/src/sys/kern/init_main.c,v
> retrieving revision 1.304
> diff -u -p -r1.304 init_main.c
> --- kern/init_main.c  1 Jan 2021 07:00:33 -   1.304
> +++ kern/init_main.c  11 Jan 2021 11:59:38 -
> @@ -232,6 +232,7 @@ main(void *framep)
>   KERNEL_LOCK_INIT();
>   SCHED_LOCK_INIT();
>  
> + rw_obj_init();
>   uvm_init();
>   disk_init();/* must come before autoconfiguration */
>   tty_init(); /* initialise tty's */
> Index: kern/kern_rwlock.c
> ===
> RCS file: /cvs/src/sys/kern/kern_rwlock.c,v
> retrieving revision 1.45
> diff -u -p -r1.45 kern_rwlock.c
> --- kern/kern_rwlock.c2 Mar 2020 17:07:49 -   1.45
> +++ kern/kern_rwlock.c11 Jan 2021 13:59:07 -
> @@ -19,6 +19,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -487,4 +488,124 @@ int
>  rrw_status(struct rrwlock *rrwl)
>  {
>   return (rw_status(>rrwl_lock));
> +}
> +
> +/*-
> + * Copyright (c) 2008 The NetBSD Foundation, Inc.
> + * All rights reserved.
> + *
> + * This code is derived from software contributed to The NetBSD Foundation
> + * by Andrew Doran.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *notice, this list of conditions and the following disclaimer in the
> + *documentation and/or other materials provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
> + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 
> LIMITED
> + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
> + * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
> + * BE 

Re: uvm_fault: amap & anon locking

2021-01-11 Thread Martin Pieuchot
On 31/12/20(Thu) 22:35, Mark Kettenis wrote:
> > Date: Wed, 30 Dec 2020 11:19:41 -0300
> > From: Martin Pieuchot 
> > 
> > Diff below adds some locking to UVM's amap & anon data structures that
> > should be enough to get the upper part of the fault handler out of the
> > KERNEL_LOCK().
> > 
> > This diff doesn't unlock the fault handler, I'd suggest to do this in a
> > later step on an arch by arch basis.
> > 
> > This is a port of what exists in NetBSD.  A rwlock is attached to every
> > amap and is shared with all its anon.  The same lock will be used by
> > multiple amaps if they have anons in common.  This diff includes the new
> > rw_obj_* API required to have reference-counted rwlocks.
> > 
> > Other than that a global rwlock is used to protect the list of amap and
> > many pool have been converted to use a rwlock internally to not create
> > lock ordering problem when allocations are made while holding a rwlock.
> 
> Can you explain what those lock odering problems are?  To me it seems
> that a pool mutex should always be taken last, and that the pool
> system doesn't need to enter the amap or anon code.

This is the same problem prevented by IPL_MPFLOOR but with a different
approach.  Taking an interrupt whose handler grab the KERNEL_LOCK() while
holding a mutex can lead to a deadlock if this mutex can be grabbed with
the KERNEL_LOCK() held.

Using PR_RWLOCK was the approach introduced in guenther@'s diff to
unlock mmap(2).  The way I understand it is that this code should only be
used in process contexts and that's why PR_RWLOCK got introduced.

This leads to me question whether PR_RWLOCK is worth the complexity.  Since
then its used has spread across the tree.

> Removing the check in pool_get that PR_WAITOK is set for pools created
> with PR_RWLOCK is a bit problematic from my perspective.  At the very
> least we should adjust the man page.

This is necessary if we want to use PR_RWLOCK since uvm_analloc() does
not sleep to be able to handle out-of-RAM conditions in the fault
handler.

The alternative would be to use IPL_MPFLOOR when allocating anon & amap,
this has the advantage of documenting that this IPL change is only
"temporary".

Diff below does that.  If we agree on this approach I'd like to start
collecting oks, for parts or for the whole.


Index: kern/init_main.c
===
RCS file: /cvs/src/sys/kern/init_main.c,v
retrieving revision 1.304
diff -u -p -r1.304 init_main.c
--- kern/init_main.c1 Jan 2021 07:00:33 -   1.304
+++ kern/init_main.c11 Jan 2021 11:59:38 -
@@ -232,6 +232,7 @@ main(void *framep)
KERNEL_LOCK_INIT();
SCHED_LOCK_INIT();
 
+   rw_obj_init();
uvm_init();
disk_init();/* must come before autoconfiguration */
tty_init(); /* initialise tty's */
Index: kern/kern_rwlock.c
===
RCS file: /cvs/src/sys/kern/kern_rwlock.c,v
retrieving revision 1.45
diff -u -p -r1.45 kern_rwlock.c
--- kern/kern_rwlock.c  2 Mar 2020 17:07:49 -   1.45
+++ kern/kern_rwlock.c  11 Jan 2021 13:59:07 -
@@ -19,6 +19,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -487,4 +488,124 @@ int
 rrw_status(struct rrwlock *rrwl)
 {
return (rw_status(>rrwl_lock));
+}
+
+/*-
+ * Copyright (c) 2008 The NetBSD Foundation, Inc.
+ * All rights reserved.
+ *
+ * This code is derived from software contributed to The NetBSD Foundation
+ * by Andrew Doran.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+ * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#defineRWLOCK_OBJ_MAGIC0x5aa3c85d
+struct rwlock_obj {
+   struct rwlock   ro_lock;