Re: AMD pcidevs updates

2022-11-29 Thread Laurence Tratt
On Tue, Nov 29, 2022 at 10:42:36PM +, Laurence Tratt wrote:

> The diff below adds some newish AMD elements to pcidevs.

As Mike Larkin kindly pointed out off-list, I sent a diff to the generated
file. Sorry!


Laurie

diff --git sys/dev/pci/pcidevs sys/dev/pci/pcidevs
index 2a395ab413a..158a3cc7640 100644
--- sys/dev/pci/pcidevs
+++ sys/dev/pci/pcidevs
@@ -780,6 +780,16 @@ product AMD 19_4X_IOMMU0x14b6  19h/4xh IOMMU
 product AMD 19_4X_HB_1 0x14b7  19h/4xh Host
 product AMD 19_4X_PCIE_1   0x14b9  19h/4xh PCIE
 product AMD 19_4X_PCIE_2   0x14ba  19h/4xh PCIE
+product AMD 19_6X_RC   0x14d8  19h/6xh Root Complex
+product AMD 19_6X_IOMMU0x14d9  19h/6xh IOMMU
+product AMD 19_6X_DF_1 0x14e0  19h Data Fabric
+product AMD 19_6X_DF_2 0x14e1  19h Data Fabric
+product AMD 19_6X_DF_3 0x14e2  19h Data Fabric
+product AMD 19_6X_DF_4 0x14e3  19h Data Fabric
+product AMD 19_6X_DF_5 0x14e4  19h Data Fabric
+product AMD 19_6X_DF_6 0x14e5  19h Data Fabric
+product AMD 19_6X_DF_7 0x14e6  19h Data Fabric
+product AMD 19_6X_DF_8 0x14e7  19h Data Fabric
 product AMD 14_HB  0x1510  14h Host
 product AMD 14_PCIE_1  0x1512  14h PCIE
 product AMD 14_PCIE_2  0x1513  14h PCIE



Re: mbufs growing in 7.2

2022-11-29 Thread Greg Steuck
Greg Steuck  writes:

> The watched kettle never boiled. No more crashes in over two weeks
> (instead of two in the first week). I tried a loop of alternating iperf3
> tcp and udp to no ill effect. I still see the growth in the metrics I
> reported, yet the system remained stable.
>
> I applied the patch below and am still collecting the metrics. I doubt
> they are responsible for the original problem.

This time the problem fired after 6 days of uptime. The system is
running 7.2 + igc off-by-one fix.

The symptoms are:

1 A single interface is "stuck", sometimes ping replies come back,
  incoming packets are visible in tcpdump, no reply packets
  appear in tcpdump (nor received on the other machine).
2 Other interfaces are fine to the point that I can ssh over one of
  them to debug.
3 The stuck interface remains stuck after ifconfig down/up.
4 The stuck interface remains stuck throughout pfctl -d/-e. This did
  reenable ping replies, that were stuck for a bit.
5 netstat -i shows large (and rising) value in Ofail column
6 netstat -m shows a number of 'mbuf 2112' stuck fairly high:
  5539 mbufs in use:
5441 mbufs allocated to data
7 mbufs allocated to packet headers
91 mbufs allocated to socket names and addresses
  40/112 mbuf 2048 byte clusters in use (current/peak)
  4510/6630 mbuf 2112 byte clusters in use (current/peak)
7 no established connections to speak of

The interface stickiness is mainly its inability to send higher level
protocol replies. E.g. a TCP connection from a remote system doesn't get
completed. Or SYN/ACK completes, but then I can see the data to the
machine and not even ACKs coming back. ktrace shows the application
writes the data which just never makes it down the stack to where
tcpdump would see it.

Obligatory graph of seemingly related counters: "mbufs in use",
"mbuf 2112 byte clusters", and "Ofail" counts

https://docs.google.com/spreadsheets/d/e/2PACX-1vRr61USv9VNvaIq9qEs8W1wy869ai6MwNmevDLmxLJOV3DaUBcrRUzwzNZP92syltrWfrmIUWq7qevG/pubchart?oid=202363413=interactive

There's a long tail to the left covering 5 days of mostly nothing
happening. The drop of "mbufs in use" from 7355 to 5739 is around the
time I removed the system from service and possibly when I cycled
ifconfig down/up (I have no record).

The system is still up and moved off to the side for debugging. It can
remain up for as long as we have things to try (and power utility
cooperates).

Thanks
Greg



Re: Unlock getsockopt(2)

2022-11-29 Thread Alexander Bluhm
On Wed, Nov 30, 2022 at 02:02:03AM +0300, Vitaliy Makkoveev wrote:
> On Mon, Nov 28, 2022 at 08:38:02PM +0300, Vitaliy Makkoveev wrote:
> > Subj.
> > 
> > At sockets layer we touch only per-socket data, which is solock()
> > protected().
> > 
> > At protocol layer, unix(4) and key management sockets have no
> > (*pr_ctloutput)() handlers. route_ctloutput() touches only per socket
> > data, which is solock() protected. inet{,6} globals are protected by
> > netlock, which is solock() backend for corresponding sockets.
> > 
> 
> Since getsockopt(2) and setsockopt(2) both follow the same
> (*pr_ctloutput)() handlers, it makes sense to unlock them both.

The scariest part is multicast forwarding.  ip_mforward() is called
with shared netlock and kernel lock from ip_input().  The kernel
lock is needed as multicast is not MP safe yet and forwarding changes
multicast data.

As long setsockopt(2) runs with exlusive netlock, kernel lock is
not necessary.

OK bluhm@



Re: Unlock nd6_ioctl(), push kernel lock into in6_ioctl_{get,change_ifaddr}()

2022-11-29 Thread Klemens Nanni
On Wed, Nov 30, 2022 at 02:25:46AM +0300, Vitaliy Makkoveev wrote:
> I like to have current "error =" notation for both mrt6_ioctl()
> and in6_ioctl() within in6_control().

Alright.

> Also, `data’ passed to in6_ioctl_change_ifaddr() is the local
> variable, kernel lock could be pushed deep down, just before
> netlock.

Sure.

OK?
---
Neighbour Discovery information is protected by the net lock, as
documented in nd6.h struct nd_ifinfo.

ndp(8) is the only SIOCGIFINFO_IN6 and SIOCGNBRINFO_IN6 user in base.

nd6_lookup(), also used in ICMP6 input and IPv6 forwarding, only needs
the net lock.

diff --git a/sys/netinet6/in6.c b/sys/netinet6/in6.c
index 44dfa3488ef..a34e3860916 100644
--- a/sys/netinet6/in6.c
+++ b/sys/netinet6/in6.c
@@ -213,9 +213,7 @@ in6_control(struct socket *so, u_long cmd, caddr_t data, 
struct ifnet *ifp)
break;
 #endif /* MROUTING */
default:
-   KERNEL_LOCK();
error = in6_ioctl(cmd, data, ifp, privileged);
-   KERNEL_UNLOCK();
break;
}
 
@@ -296,6 +294,7 @@ in6_ioctl_change_ifaddr(u_long cmd, caddr_t data, struct 
ifnet *ifp)
return (error);
}
 
+   KERNEL_LOCK();
NET_LOCK();
 
if (sa6 != NULL) {
@@ -402,6 +401,7 @@ in6_ioctl_change_ifaddr(u_long cmd, caddr_t data, struct 
ifnet *ifp)
 
 err:
NET_UNLOCK();
+   KERNEL_UNLOCK();
return (error);
 }
 
@@ -422,6 +422,7 @@ in6_ioctl_get(u_long cmd, caddr_t data, struct ifnet *ifp)
return (error);
}
 
+   KERNEL_LOCK();
NET_LOCK_SHARED();
 
if (sa6 != NULL) {
@@ -517,6 +518,7 @@ in6_ioctl_get(u_long cmd, caddr_t data, struct ifnet *ifp)
 
 err:
NET_UNLOCK_SHARED();
+   KERNEL_UNLOCK();
return (error);
 }
 



Re: Unlock nd6_ioctl(), push kernel lock into in6_ioctl_get()

2022-11-29 Thread Vitaliy Makkoveev
I like to have current "error =" notation for both mrt6_ioctl()
and in6_ioctl() within in6_control().

Also, `data’ passed to in6_ioctl_change_ifaddr() is the local
variable, kernel lock could be pushed deep down, just before
netlock.

> On 29 Nov 2022, at 16:35, Klemens Nanni  wrote:
> 
> On Wed, Nov 23, 2022 at 03:39:54PM +0300, Vitaliy Makkoveev wrote:
>> Could this be merged with the following non "Mechanical move" diff?
> 
> Here's a rebased and cleaned up diff.
> 
> Feedback? Objection? OK?
> 
> ---
> Neighbour Discovery information is protected by the net lock, as
> documented in nd6.h struct nd_ifinfo.
> 
> ndp(8) is the only SIOCGIFINFO_IN6 and SIOCGNBRINFO_IN6 user in base.
> 
> nd6_lookup(), also used in ICMP6 input and IPv6 forwarding, only needs
> the net lock.
> 
> diff --git a/sys/netinet6/in6.c b/sys/netinet6/in6.c
> index 44dfa3488ef..ad680278210 100644
> --- a/sys/netinet6/in6.c
> +++ b/sys/netinet6/in6.c
> @@ -205,48 +205,48 @@ in6_control(struct socket *so, u_long cmd, caddr_t 
> data, struct ifnet *ifp)
> 
>   switch (cmd) {
> #ifdef MROUTING
>   case SIOCGETSGCNT_IN6:
>   case SIOCGETMIFCNT_IN6:
>   KERNEL_LOCK();
>   error = mrt6_ioctl(so, cmd, data);
>   KERNEL_UNLOCK();
> - break;
> + return (error);
> #endif /* MROUTING */
>   default:
> - KERNEL_LOCK();
> - error = in6_ioctl(cmd, data, ifp, privileged);
> - KERNEL_UNLOCK();
> - break;
> + return (in6_ioctl(cmd, data, ifp, privileged));
>   }
> -
> - return error;
> }
> 
> int
> in6_ioctl(u_long cmd, caddr_t data, struct ifnet *ifp, int privileged)
> {
> + int error;
> +
>   if (ifp == NULL)
>   return (ENXIO);
> 
>   switch (cmd) {
>   case SIOCGIFINFO_IN6:
>   case SIOCGNBRINFO_IN6:
>   return (nd6_ioctl(cmd, data, ifp));
>   case SIOCGIFDSTADDR_IN6:
>   case SIOCGIFNETMASK_IN6:
>   case SIOCGIFAFLAG_IN6:
>   case SIOCGIFALIFETIME_IN6:
>   return (in6_ioctl_get(cmd, data, ifp));
>   case SIOCAIFADDR_IN6:
>   case SIOCDIFADDR_IN6:
>   if (!privileged)
>   return (EPERM);
> - return (in6_ioctl_change_ifaddr(cmd, data, ifp));
> + KERNEL_LOCK();
> + error = in6_ioctl_change_ifaddr(cmd, data, ifp);
> + KERNEL_UNLOCK();
> + return (error);
>   case SIOCSIFADDR:
>   case SIOCSIFDSTADDR:
>   case SIOCSIFBRDADDR:
>   case SIOCSIFNETMASK:
>   /*
>* Do not pass those ioctl to driver handler since they are not
>* properly set up. Instead just error out.
>*/
> @@ -417,16 +417,17 @@ in6_ioctl_get(u_long cmd, caddr_t data, struct ifnet 
> *ifp)
>   sa = sin6tosa(>ifr_addr);
>   if (sa->sa_family == AF_INET6) {
>   sa->sa_len = sizeof(struct sockaddr_in6);
>   error = in6_sa2sin6(sa, );
>   if (error)
>   return (error);
>   }
> 
> + KERNEL_LOCK();
>   NET_LOCK_SHARED();
> 
>   if (sa6 != NULL) {
>   error = in6_check_embed_scope(sa6, ifp->if_index);
>   if (error)
>   goto err;
>   error = in6_clear_scope_id(sa6, ifp->if_index);
>   if (error)
> @@ -512,16 +513,17 @@ in6_ioctl_get(u_long cmd, caddr_t data, struct ifnet 
> *ifp)
>   break;
> 
>   default:
>   panic("%s: invalid ioctl %lu", __func__, cmd);
>   }
> 
> err:
>   NET_UNLOCK_SHARED();
> + KERNEL_UNLOCK();
>   return (error);
> }
> 
> int
> in6_check_embed_scope(struct sockaddr_in6 *sa6, unsigned int ifidx)
> {
>   if (IN6_IS_ADDR_LINKLOCAL(>sin6_addr)) {
>   if (sa6->sin6_addr.s6_addr16[1] == 0) {
> 



Re: Unlock getsockopt(2)

2022-11-29 Thread Vitaliy Makkoveev
On Mon, Nov 28, 2022 at 08:38:02PM +0300, Vitaliy Makkoveev wrote:
> Subj.
> 
> At sockets layer we touch only per-socket data, which is solock()
> protected().
> 
> At protocol layer, unix(4) and key management sockets have no
> (*pr_ctloutput)() handlers. route_ctloutput() touches only per socket
> data, which is solock() protected. inet{,6} globals are protected by
> netlock, which is solock() backend for corresponding sockets.
> 

Since getsockopt(2) and setsockopt(2) both follow the same
(*pr_ctloutput)() handlers, it makes sense to unlock them both.

Index: sys/kern/syscalls.master
===
RCS file: /cvs/src/sys/kern/syscalls.master,v
retrieving revision 1.236
diff -u -p -r1.236 syscalls.master
--- sys/kern/syscalls.master9 Nov 2022 10:26:28 -   1.236
+++ sys/kern/syscalls.master29 Nov 2022 22:58:25 -
@@ -223,7 +223,7 @@
 103STD { int sys_sigreturn(struct sigcontext *sigcntxp); }
 104STD { int sys_bind(int s, const struct sockaddr *name, \
socklen_t namelen); }
-105STD { int sys_setsockopt(int s, int level, int name, \
+105STD NOLOCK  { int sys_setsockopt(int s, int level, int name, \
const void *val, socklen_t valsize); }
 106STD { int sys_listen(int s, int backlog); }
 107STD { int sys_chflagsat(int fd, const char *path, \
@@ -249,7 +249,7 @@
struct timespec *timeout); }
 117STD NOLOCK  { int sys_sendmmsg(int s,  struct mmsghdr *mmsg,\
unsigned int vlen, int flags); }
-118STD { int sys_getsockopt(int s, int level, int name, \
+118STD NOLOCK  { int sys_getsockopt(int s, int level, int name, \
void *val, socklen_t *avalsize); }
 119STD { int sys_thrkill(pid_t tid, int signum, void *tcb); }
 120STD NOLOCK  { ssize_t sys_readv(int fd, \



AMD pcidevs updates

2022-11-29 Thread Laurence Tratt
The diff below adds some newish AMD elements to pcidevs. Here's the diff of
them on my MSI board:

  -pchb0 at pci0 dev 0 function 0 vendor "AMD", unknown product 0x14d8 rev 0x00
  -vendor "AMD", unknown product 0x14d9 (class system subclass IOMMU, rev 0x00) 
at pci0 dev 0 function 2 not configured
  +pchb0 at pci0 dev 0 function 0 "AMD 19h/6xh Root Complex" rev 0x00
  +"AMD 19h/6xh IOMMU" rev 0x00 at pci0 dev 0 function 2 not configured
  -pchb6 at pci0 dev 24 function 0 vendor "AMD", unknown product 0x14e0 rev 0x00
  -pchb7 at pci0 dev 24 function 1 vendor "AMD", unknown product 0x14e1 rev 0x00
  -pchb8 at pci0 dev 24 function 2 vendor "AMD", unknown product 0x14e2 rev 0x00
  -pchb9 at pci0 dev 24 function 3 vendor "AMD", unknown product 0x14e3 rev 0x00
  -pchb10 at pci0 dev 24 function 4 vendor "AMD", unknown product 0x14e4 rev 
0x00
  -pchb11 at pci0 dev 24 function 5 vendor "AMD", unknown product 0x14e5 rev 
0x00
  -pchb12 at pci0 dev 24 function 6 vendor "AMD", unknown product 0x14e6 rev 
0x00
  -pchb13 at pci0 dev 24 function 7 vendor "AMD", unknown product 0x14e7 rev 
0x00
  +pchb6 at pci0 dev 24 function 0 "AMD 19h Data Fabric" rev 0x00
  +pchb7 at pci0 dev 24 function 1 "AMD 19h Data Fabric" rev 0x00
  +pchb8 at pci0 dev 24 function 2 "AMD 19h Data Fabric" rev 0x00
  +pchb9 at pci0 dev 24 function 3 "AMD 19h Data Fabric" rev 0x00
  +pchb10 at pci0 dev 24 function 4 "AMD 19h Data Fabric" rev 0x00
  +pchb11 at pci0 dev 24 function 5 "AMD 19h Data Fabric" rev 0x00
  +pchb12 at pci0 dev 24 function 6 "AMD 19h Data Fabric" rev 0x00
  +pchb13 at pci0 dev 24 function 7 "AMD 19h Data Fabric" rev 0x00

As the dmesg shows, there are several remaining unknown elements, and
several unconfigured, on this machine. The machine is, unfortunately, not
really usable as it loses wireless access and/or input to X (mouse/keyboard
stop working, but the display is fine) after about 15-45 minutes, at which
point only a hard reboot will restore them. Still, fewer "unknown products"
is a start!


Laurie

diff --git sys/dev/pci/pcidevs.h sys/dev/pci/pcidevs.h
index cb1f344909a..7df027538cd 100644
--- sys/dev/pci/pcidevs.h
+++ sys/dev/pci/pcidevs.h
@@ -785,6 +785,16 @@
 #definePCI_PRODUCT_AMD_19_4X_HB_1  0x14b7  /* 19h/4xh Host 
*/
 #definePCI_PRODUCT_AMD_19_4X_PCIE_10x14b9  /* 19h/4xh PCIE 
*/
 #definePCI_PRODUCT_AMD_19_4X_PCIE_20x14ba  /* 19h/4xh PCIE 
*/
+#definePCI_PRODUCT_AMD_19_6X_RC0x14d8  /* 19h/6xh Root 
Complex */
+#definePCI_PRODUCT_AMD_19_6X_IOMMU 0x14d9  /* 19h/6xh 
IOMMU */
+#definePCI_PRODUCT_AMD_19_6X_DF_1  0x14e0  /* 19h Data 
Fabric */
+#definePCI_PRODUCT_AMD_19_6X_DF_2  0x14e1  /* 19h Data 
Fabric */
+#definePCI_PRODUCT_AMD_19_6X_DF_3  0x14e2  /* 19h Data 
Fabric */
+#definePCI_PRODUCT_AMD_19_6X_DF_4  0x14e3  /* 19h Data 
Fabric */
+#definePCI_PRODUCT_AMD_19_6X_DF_5  0x14e4  /* 19h Data 
Fabric */
+#definePCI_PRODUCT_AMD_19_6X_DF_6  0x14e5  /* 19h Data 
Fabric */
+#definePCI_PRODUCT_AMD_19_6X_DF_7  0x14e6  /* 19h Data 
Fabric */
+#definePCI_PRODUCT_AMD_19_6X_DF_8  0x14e7  /* 19h Data 
Fabric */
 #definePCI_PRODUCT_AMD_14_HB   0x1510  /* 14h Host */
 #definePCI_PRODUCT_AMD_14_PCIE_1   0x1512  /* 14h PCIE */
 #definePCI_PRODUCT_AMD_14_PCIE_2   0x1513  /* 14h PCIE */



OpenBSD 7.2-current (GENERIC.MP) #15: Tue Nov 29 22:26:24 GMT 2022
ltr...@phase.tratt.net:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 33390379008 (31843MB)
avail mem = 32360992768 (30861MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.5 @ 0xa9b2a000 (43 entries)
bios0: vendor American Megatrends International, LLC. version "1.30" date 
10/12/2022
bios0: Micro-Star International Co., Ltd. MS-7D67
efi0 at bios0: UEFI 2.8
efi0: American Megatrends rev 0x5001a
acpi0 at bios0: ACPI 6.4Undefined scope: \\_SB_.PCI0.GPP7.UP00.DP40.UP00.DP68

acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP IVRS SSDT SSDT FIDT MCFG HPET WDRT UEFI FPDT VFCT TPM2 
SSDT CRAT CDIT BGRT SSDT SSDT SSDT SSDT WSMT APIC SSDT SSDT SSDT SSDT SSDT SSDT 
SSDT SSDT
acpi0: wakeup devices GPP3(S4) GPP4(S4) GPP5(S4) GPP6(S4) GP17(S4) XHC0(S4) 
XHC1(S4) XHC2(S4) GPP0(S4) GPP1(S4) GPP2(S4) GPP7(S4) GPP8(S4)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimcfg0 at acpi0
acpimcfg0: addr 0xf000, bus 0-127
acpihpet0 at acpi0: 14318180 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 9 7900X 12-Core Processor, 4700.00 MHz, 19-61-02
cpu0: 

Unlock nd6_ioctl(), push kernel lock into in6_ioctl_get()

2022-11-29 Thread Klemens Nanni
On Wed, Nov 23, 2022 at 03:39:54PM +0300, Vitaliy Makkoveev wrote:
> Could this be merged with the following non "Mechanical move" diff?

Here's a rebased and cleaned up diff.

Feedback? Objection? OK?

---
Neighbour Discovery information is protected by the net lock, as
documented in nd6.h struct nd_ifinfo.

ndp(8) is the only SIOCGIFINFO_IN6 and SIOCGNBRINFO_IN6 user in base.

nd6_lookup(), also used in ICMP6 input and IPv6 forwarding, only needs
the net lock.

diff --git a/sys/netinet6/in6.c b/sys/netinet6/in6.c
index 44dfa3488ef..ad680278210 100644
--- a/sys/netinet6/in6.c
+++ b/sys/netinet6/in6.c
@@ -205,48 +205,48 @@ in6_control(struct socket *so, u_long cmd, caddr_t data, 
struct ifnet *ifp)
 
switch (cmd) {
 #ifdef MROUTING
case SIOCGETSGCNT_IN6:
case SIOCGETMIFCNT_IN6:
KERNEL_LOCK();
error = mrt6_ioctl(so, cmd, data);
KERNEL_UNLOCK();
-   break;
+   return (error);
 #endif /* MROUTING */
default:
-   KERNEL_LOCK();
-   error = in6_ioctl(cmd, data, ifp, privileged);
-   KERNEL_UNLOCK();
-   break;
+   return (in6_ioctl(cmd, data, ifp, privileged));
}
-
-   return error;
 }
 
 int
 in6_ioctl(u_long cmd, caddr_t data, struct ifnet *ifp, int privileged)
 {
+   int error;
+
if (ifp == NULL)
return (ENXIO);
 
switch (cmd) {
case SIOCGIFINFO_IN6:
case SIOCGNBRINFO_IN6:
return (nd6_ioctl(cmd, data, ifp));
case SIOCGIFDSTADDR_IN6:
case SIOCGIFNETMASK_IN6:
case SIOCGIFAFLAG_IN6:
case SIOCGIFALIFETIME_IN6:
return (in6_ioctl_get(cmd, data, ifp));
case SIOCAIFADDR_IN6:
case SIOCDIFADDR_IN6:
if (!privileged)
return (EPERM);
-   return (in6_ioctl_change_ifaddr(cmd, data, ifp));
+   KERNEL_LOCK();
+   error = in6_ioctl_change_ifaddr(cmd, data, ifp);
+   KERNEL_UNLOCK();
+   return (error);
case SIOCSIFADDR:
case SIOCSIFDSTADDR:
case SIOCSIFBRDADDR:
case SIOCSIFNETMASK:
/*
 * Do not pass those ioctl to driver handler since they are not
 * properly set up. Instead just error out.
 */
@@ -417,16 +417,17 @@ in6_ioctl_get(u_long cmd, caddr_t data, struct ifnet *ifp)
sa = sin6tosa(>ifr_addr);
if (sa->sa_family == AF_INET6) {
sa->sa_len = sizeof(struct sockaddr_in6);
error = in6_sa2sin6(sa, );
if (error)
return (error);
}
 
+   KERNEL_LOCK();
NET_LOCK_SHARED();
 
if (sa6 != NULL) {
error = in6_check_embed_scope(sa6, ifp->if_index);
if (error)
goto err;
error = in6_clear_scope_id(sa6, ifp->if_index);
if (error)
@@ -512,16 +513,17 @@ in6_ioctl_get(u_long cmd, caddr_t data, struct ifnet *ifp)
break;
 
default:
panic("%s: invalid ioctl %lu", __func__, cmd);
}
 
 err:
NET_UNLOCK_SHARED();
+   KERNEL_UNLOCK();
return (error);
 }
 
 int
 in6_check_embed_scope(struct sockaddr_in6 *sa6, unsigned int ifidx)
 {
if (IN6_IS_ADDR_LINKLOCAL(>sin6_addr)) {
if (sa6->sin6_addr.s6_addr16[1] == 0) {



Re: pfsync panic after pf_purge backout

2022-11-29 Thread Hrvoje Popovski
On 28.11.2022. 17:07, Alexandr Nedvedicky wrote:
> diff below should avoid panic above (and similar panics in pfsync_q_del().
> It also prints some 'error' system message buffer (a.k.a. dmesg)
> 
> We panic because we attempt to remove state from psync queue which is
> already empty. the pfsync_q_del() must be acting on some stale information
> kept in `st` argument (state).
> 
> the information we print into dmesg buffer should help us understand
> what's going on. At the moment I can not explain how does it come
> there is a state which claims its presence on state queue, while the
> queue in question is empty.
> 
> I'd like to ask you to give a try diff below and repeat your test.
> Let it run for some time and collect 'dmesg' output for me after usual
> uptime-to-panic elapses during a test run.


Hi,

here's panic with WITESS, this diff and this one
https://www.mail-archive.com/tech@openbsd.org/msg72582.html

I will leave box in ddb ...


wsmouse1 at ums1 mux 0
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
107228145326
32609891)nosync: no unlinked: no timeout: PFTM_TCP_OPENING
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
132937190089
77519715)nosync: no unlinked: no timeout: PFTM_UDP_MULTIPLE
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
124393720902
11468387)nosync: no unlinked: no timeout: PFTM_TCP_CLOSED
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
182347109632
42042467)nosync: no unlinked: no timeout: PFTM_TCP_CLOSED
<4>com1: 1 silo overflow, 0 ibuf overflows
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
142245292777
58899299)nosync: yes unlinked: no timeout: PFTM_UDP_MULTIPLE
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
475207607577
1004003)nosync: no unlinked: no timeout: PFTM_TCP_FIN_WAIT
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
159470263971
51003747)nosync: no unlinked: no timeout: PFTM_TCP_CLOSED
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
134712868555
08329571)nosync: no unlinked: no timeout: PFTM_TCP_FIN_WAIT
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
790090587538
2371427)nosync: no unlinked: no timeout: PFTM_TCP_CLOSED
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
543286072470
0808291)nosync: no unlinked: no timeout: PFTM_TCP_CLOSED
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
924950197971
1276131)nosync: no unlinked: no timeout: PFTM_TCP_CLOSED
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
139134763774
04146787)nosync: no unlinked: no timeout: PFTM_TCP_CLOSED
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
930062379002
4590435)nosync: no unlinked: no timeout: PFTM_UDP_MULTIPLE
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
125019156702
63792739)nosync: no unlinked: no timeout: PFTM_TCP_CLOSED
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
134887042707
43536739)nosync: no unlinked: no timeout: PFTM_TCP_CLOSED
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
155369141176
78236771)nosync: no unlinked: no timeout: PFTM_TCP_FIN_WAIT
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
100393240929
61031267)nosync: no unlinked: no timeout: PFTM_TCP_CLOSED
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
909085788793
5661155)nosync: no unlinked: no timeout: PFTM_UDP_MULTIPLE
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
909085788793
5661155)nosync: no unlinked: no timeout: PFTM_UDP_MULTIPLE
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
909085788793
5661155)nosync: no unlinked: no timeout: PFTM_UDP_MULTIPLE
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
909085788793
5661155)nosync: no unlinked: no timeout: PFTM_UDP_MULTIPLE
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
909085788793
5661155)nosync: no unlinked: no timeout: PFTM_UDP_MULTIPLE
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
182376159588
61972579)nosync: no unlinked: no timeout: PFTM_UDP_MULTIPLE
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
159637401507
14238051)nosync: no unlinked: no timeout: PFTM_TCP_FIN_WAIT
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
242206588258
3172195)nosync: no unlinked: no timeout: PFTM_TCP_CLOSED
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
105537141411
11420003)nosync: no unlinked: no timeout: PFTM_TCP_OPENING
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
152424297451
57538915)nosync: no unlinked: no timeout: PFTM_TCP_CLOSED
<3>pfsync: pfsync_q_del: stale queue (PFSYNC_S_UPD_C) in state (id
498918805181
3434467)nosync: no unlinked: no timeout: PFTM_TCP_FIN_WAIT
<3>pfsync: pfsync_q_del: stale queue 

Re: Improve rpki-client warnings on .mft files

2022-11-29 Thread Claudio Jeker
On Tue, Nov 29, 2022 at 11:16:25AM +0100, Theo Buehler wrote:
> On Tue, Nov 29, 2022 at 10:55:02AM +0100, Claudio Jeker wrote:
> > On Mon, Nov 28, 2022 at 07:40:40PM +0100, Theo Buehler wrote:
> > > On Mon, Nov 28, 2022 at 07:18:24PM +0100, Claudio Jeker wrote:
> > > > On Mon, Nov 28, 2022 at 06:02:56PM +0100, Theo Buehler wrote:
> > > > > On Mon, Nov 28, 2022 at 05:14:48PM +0100, Claudio Jeker wrote:
> > > > > > On Mon, Nov 28, 2022 at 04:50:24PM +0100, Theo Buehler wrote:
> > > > > > > On Mon, Nov 28, 2022 at 04:02:11PM +0100, Claudio Jeker wrote:
> > > > > > > > Since a long time any problem that caused rpki-client to not 
> > > > > > > > load a
> > > > > > > > manifest resulted in the non helpful:
> > > > > > > > rpki-client: 
> > > > > > > > rpki.afrinic.net/repository/member_repository/F36505B2/0569917622D711ED862FD6E0F1222468/0nALpPtwFyntPHjkS8xt-VQrqLw.mft:
> > > > > > > >  no valid mft available
> > > > > > > > 
> > > > > > > > This hides in most cases the cause why the manifest verificatin 
> > > > > > > > failed.
> > > > > > > > The following diff exposes the error from valid_x509() and with 
> > > > > > > > that some
> > > > > > > > manifest errors change to e.g.:
> > > > > > > > rpki-client: 
> > > > > > > > parent.rov.koenvanhove.nl/repo/KoenvanHove/1/C1F7424F77FBF669FB750C6DC7B649C5969DCD55.mft:
> > > > > > > >  CRL has expired
> > > > > > > > 
> > > > > > > > or if the CRL is missing
> > > > > > > > 
> > > > > > > > rpki-client: 
> > > > > > > > repo.pedjoeang.group/repo/localname/0/EF79F8E55B6A248EF9CF4CE70FF60C017BF1A3B0.mft:
> > > > > > > >  unable to get certificate CRL
> > > > > > > > 
> > > > > > > > If the certificate is pointing to a manifest that does not 
> > > > > > > > exist the old
> > > > > > > > "no valid mft available" is shown.
> > > > > > > > 
> > > > > > > > I tried to keep original behaviour as much as possible but I 
> > > > > > > > think
> > > > > > > > filemode can be further improved. And maybe we can remove 
> > > > > > > > verbose from
> > > > > > > > other warnings as well.
> > > > > > > 
> > > > > > > I like this a lot.
> > > > > > > 
> > > > > > > I was wondering if valid_x509() should have a const char **errstr
> > > > > > > instead of an int * as last argument. valid_x509() could do the
> > > > > > > conversion from error code to error string itself. This way we 
> > > > > > > don't
> > > > > > > have to sprinkle X509_verify_cert_error_string() calls everywhere.
> > > > > > > 
> > > > > > > Or we could introduce a warn_invalid_x509(const char *str, int 
> > > > > > > err) that
> > > > > > > does the conversion from err using 
> > > > > > > X509_verify_cert_error_string().
> > > > > > > 
> > > > > > > One downside of X509_verify_cert_error_string() is that it isn't 
> > > > > > > thread
> > > > > > > safe since it might return a pointer to a static buffer -- it 
> > > > > > > should
> > > > > > > not, but who can be sure...
> > > > > > 
> > > > > > Tough call. It may also help other code paths to do the same. But 
> > > > > > in many
> > > > > > cases a dynamic buffer would be needed.
> > > > > > 
> > > > > > Not sure if it makes sense to introduce warn_invalid_x509(). What I 
> > > > > > don't
> > > > > > like is the verbose check before the warning. I wonder if we still 
> > > > > > need
> > > > > > that. My last run has 11 failed roas and 51 failed mfts. The mft 
> > > > > > errors
> > > > > > already show up. Shouldn't the roa errors be shown as well?
> > > > > 
> > > > > They should. Unless I'm completely confusing myself, this is a bug in
> > > > > the diff and all the added if (verbose > 1) should be dropped.
> > > > > 
> > > > > If the last argument (nowarn) of valid_x509() was 0 (everywhere except
> > > > > in proc_parser_pre()), valid_x509() would print the error 
> > > > > independently
> > > > > of verbose. verbose > 1 would force printing the warning also for 
> > > > > mfts,
> > > > > but there it would be drowned in the other noise.
> > > > > 
> > > > > - if (!valid_x509(file, ctx, x509, a, crl, 0)) {
> > > > > 
> > > > > ...
> > > > > 
> > > > > - if (!nowarn || verbose > 1)
> > > > > - warnx("%s: %s", file, 
> > > > > X509_verify_cert_error_string(c));
> > > > > 
> > > > 
> > > > Indeed. Better diff below.
> > > > Still thinking about the idea with the const char **.
> > > 
> > > One of the main reasons for suggesting it was the amount of awkward line
> > > wrapping. There's now much less of this. We can easily switch if you
> > > should change your mind.
> > > 
> > > ok tb
> > 
> > Now that X509_verify_cert_error_string() is always returning a constant
> > string lets return the error string instead.
> 
> Fine with me. Couple of simple comments below, then it's
> 
> ok tb
> 
> (happy to review again if you prefer that)
> 
> > Index: filemode.c
> > ===
> > RCS file: /cvs/src/usr.sbin/rpki-client/filemode.c,v
> > retrieving revision 1.17
> > diff 

Re: Improve rpki-client warnings on .mft files

2022-11-29 Thread Theo Buehler
On Tue, Nov 29, 2022 at 10:55:02AM +0100, Claudio Jeker wrote:
> On Mon, Nov 28, 2022 at 07:40:40PM +0100, Theo Buehler wrote:
> > On Mon, Nov 28, 2022 at 07:18:24PM +0100, Claudio Jeker wrote:
> > > On Mon, Nov 28, 2022 at 06:02:56PM +0100, Theo Buehler wrote:
> > > > On Mon, Nov 28, 2022 at 05:14:48PM +0100, Claudio Jeker wrote:
> > > > > On Mon, Nov 28, 2022 at 04:50:24PM +0100, Theo Buehler wrote:
> > > > > > On Mon, Nov 28, 2022 at 04:02:11PM +0100, Claudio Jeker wrote:
> > > > > > > Since a long time any problem that caused rpki-client to not load 
> > > > > > > a
> > > > > > > manifest resulted in the non helpful:
> > > > > > > rpki-client: 
> > > > > > > rpki.afrinic.net/repository/member_repository/F36505B2/0569917622D711ED862FD6E0F1222468/0nALpPtwFyntPHjkS8xt-VQrqLw.mft:
> > > > > > >  no valid mft available
> > > > > > > 
> > > > > > > This hides in most cases the cause why the manifest verificatin 
> > > > > > > failed.
> > > > > > > The following diff exposes the error from valid_x509() and with 
> > > > > > > that some
> > > > > > > manifest errors change to e.g.:
> > > > > > > rpki-client: 
> > > > > > > parent.rov.koenvanhove.nl/repo/KoenvanHove/1/C1F7424F77FBF669FB750C6DC7B649C5969DCD55.mft:
> > > > > > >  CRL has expired
> > > > > > > 
> > > > > > > or if the CRL is missing
> > > > > > > 
> > > > > > > rpki-client: 
> > > > > > > repo.pedjoeang.group/repo/localname/0/EF79F8E55B6A248EF9CF4CE70FF60C017BF1A3B0.mft:
> > > > > > >  unable to get certificate CRL
> > > > > > > 
> > > > > > > If the certificate is pointing to a manifest that does not exist 
> > > > > > > the old
> > > > > > > "no valid mft available" is shown.
> > > > > > > 
> > > > > > > I tried to keep original behaviour as much as possible but I think
> > > > > > > filemode can be further improved. And maybe we can remove verbose 
> > > > > > > from
> > > > > > > other warnings as well.
> > > > > > 
> > > > > > I like this a lot.
> > > > > > 
> > > > > > I was wondering if valid_x509() should have a const char **errstr
> > > > > > instead of an int * as last argument. valid_x509() could do the
> > > > > > conversion from error code to error string itself. This way we don't
> > > > > > have to sprinkle X509_verify_cert_error_string() calls everywhere.
> > > > > > 
> > > > > > Or we could introduce a warn_invalid_x509(const char *str, int err) 
> > > > > > that
> > > > > > does the conversion from err using X509_verify_cert_error_string().
> > > > > > 
> > > > > > One downside of X509_verify_cert_error_string() is that it isn't 
> > > > > > thread
> > > > > > safe since it might return a pointer to a static buffer -- it should
> > > > > > not, but who can be sure...
> > > > > 
> > > > > Tough call. It may also help other code paths to do the same. But in 
> > > > > many
> > > > > cases a dynamic buffer would be needed.
> > > > > 
> > > > > Not sure if it makes sense to introduce warn_invalid_x509(). What I 
> > > > > don't
> > > > > like is the verbose check before the warning. I wonder if we still 
> > > > > need
> > > > > that. My last run has 11 failed roas and 51 failed mfts. The mft 
> > > > > errors
> > > > > already show up. Shouldn't the roa errors be shown as well?
> > > > 
> > > > They should. Unless I'm completely confusing myself, this is a bug in
> > > > the diff and all the added if (verbose > 1) should be dropped.
> > > > 
> > > > If the last argument (nowarn) of valid_x509() was 0 (everywhere except
> > > > in proc_parser_pre()), valid_x509() would print the error independently
> > > > of verbose. verbose > 1 would force printing the warning also for mfts,
> > > > but there it would be drowned in the other noise.
> > > > 
> > > > -   if (!valid_x509(file, ctx, x509, a, crl, 0)) {
> > > > 
> > > > ...
> > > > 
> > > > -   if (!nowarn || verbose > 1)
> > > > -   warnx("%s: %s", file, 
> > > > X509_verify_cert_error_string(c));
> > > > 
> > > 
> > > Indeed. Better diff below.
> > > Still thinking about the idea with the const char **.
> > 
> > One of the main reasons for suggesting it was the amount of awkward line
> > wrapping. There's now much less of this. We can easily switch if you
> > should change your mind.
> > 
> > ok tb
> 
> Now that X509_verify_cert_error_string() is always returning a constant
> string lets return the error string instead.

Fine with me. Couple of simple comments below, then it's

ok tb

(happy to review again if you prefer that)

> 
> -- 
> :wq Claudio
> 
> Index: extern.h
> ===
> RCS file: /cvs/src/usr.sbin/rpki-client/extern.h,v
> retrieving revision 1.161
> diff -u -p -r1.161 extern.h
> --- extern.h  26 Nov 2022 12:02:36 -  1.161
> +++ extern.h  29 Nov 2022 09:36:29 -
> @@ -629,7 +629,7 @@ intvalid_filename(const char *, size_
>  int   valid_uri(const char *, size_t, const char *);
>  int   valid_origin(const char *, const char *);
>  int

Re: Improve rpki-client warnings on .mft files

2022-11-29 Thread Claudio Jeker
On Mon, Nov 28, 2022 at 07:40:40PM +0100, Theo Buehler wrote:
> On Mon, Nov 28, 2022 at 07:18:24PM +0100, Claudio Jeker wrote:
> > On Mon, Nov 28, 2022 at 06:02:56PM +0100, Theo Buehler wrote:
> > > On Mon, Nov 28, 2022 at 05:14:48PM +0100, Claudio Jeker wrote:
> > > > On Mon, Nov 28, 2022 at 04:50:24PM +0100, Theo Buehler wrote:
> > > > > On Mon, Nov 28, 2022 at 04:02:11PM +0100, Claudio Jeker wrote:
> > > > > > Since a long time any problem that caused rpki-client to not load a
> > > > > > manifest resulted in the non helpful:
> > > > > > rpki-client: 
> > > > > > rpki.afrinic.net/repository/member_repository/F36505B2/0569917622D711ED862FD6E0F1222468/0nALpPtwFyntPHjkS8xt-VQrqLw.mft:
> > > > > >  no valid mft available
> > > > > > 
> > > > > > This hides in most cases the cause why the manifest verificatin 
> > > > > > failed.
> > > > > > The following diff exposes the error from valid_x509() and with 
> > > > > > that some
> > > > > > manifest errors change to e.g.:
> > > > > > rpki-client: 
> > > > > > parent.rov.koenvanhove.nl/repo/KoenvanHove/1/C1F7424F77FBF669FB750C6DC7B649C5969DCD55.mft:
> > > > > >  CRL has expired
> > > > > > 
> > > > > > or if the CRL is missing
> > > > > > 
> > > > > > rpki-client: 
> > > > > > repo.pedjoeang.group/repo/localname/0/EF79F8E55B6A248EF9CF4CE70FF60C017BF1A3B0.mft:
> > > > > >  unable to get certificate CRL
> > > > > > 
> > > > > > If the certificate is pointing to a manifest that does not exist 
> > > > > > the old
> > > > > > "no valid mft available" is shown.
> > > > > > 
> > > > > > I tried to keep original behaviour as much as possible but I think
> > > > > > filemode can be further improved. And maybe we can remove verbose 
> > > > > > from
> > > > > > other warnings as well.
> > > > > 
> > > > > I like this a lot.
> > > > > 
> > > > > I was wondering if valid_x509() should have a const char **errstr
> > > > > instead of an int * as last argument. valid_x509() could do the
> > > > > conversion from error code to error string itself. This way we don't
> > > > > have to sprinkle X509_verify_cert_error_string() calls everywhere.
> > > > > 
> > > > > Or we could introduce a warn_invalid_x509(const char *str, int err) 
> > > > > that
> > > > > does the conversion from err using X509_verify_cert_error_string().
> > > > > 
> > > > > One downside of X509_verify_cert_error_string() is that it isn't 
> > > > > thread
> > > > > safe since it might return a pointer to a static buffer -- it should
> > > > > not, but who can be sure...
> > > > 
> > > > Tough call. It may also help other code paths to do the same. But in 
> > > > many
> > > > cases a dynamic buffer would be needed.
> > > > 
> > > > Not sure if it makes sense to introduce warn_invalid_x509(). What I 
> > > > don't
> > > > like is the verbose check before the warning. I wonder if we still need
> > > > that. My last run has 11 failed roas and 51 failed mfts. The mft errors
> > > > already show up. Shouldn't the roa errors be shown as well?
> > > 
> > > They should. Unless I'm completely confusing myself, this is a bug in
> > > the diff and all the added if (verbose > 1) should be dropped.
> > > 
> > > If the last argument (nowarn) of valid_x509() was 0 (everywhere except
> > > in proc_parser_pre()), valid_x509() would print the error independently
> > > of verbose. verbose > 1 would force printing the warning also for mfts,
> > > but there it would be drowned in the other noise.
> > > 
> > > - if (!valid_x509(file, ctx, x509, a, crl, 0)) {
> > > 
> > > ...
> > > 
> > > - if (!nowarn || verbose > 1)
> > > - warnx("%s: %s", file, X509_verify_cert_error_string(c));
> > > 
> > 
> > Indeed. Better diff below.
> > Still thinking about the idea with the const char **.
> 
> One of the main reasons for suggesting it was the amount of awkward line
> wrapping. There's now much less of this. We can easily switch if you
> should change your mind.
> 
> ok tb

Now that X509_verify_cert_error_string() is always returning a constant
string lets return the error string instead.

-- 
:wq Claudio

Index: extern.h
===
RCS file: /cvs/src/usr.sbin/rpki-client/extern.h,v
retrieving revision 1.161
diff -u -p -r1.161 extern.h
--- extern.h26 Nov 2022 12:02:36 -  1.161
+++ extern.h29 Nov 2022 09:36:29 -
@@ -629,7 +629,7 @@ int  valid_filename(const char *, size_
 int valid_uri(const char *, size_t, const char *);
 int valid_origin(const char *, const char *);
 int valid_x509(char *, X509_STORE_CTX *, X509 *, struct auth *,
-   struct crl *, int);
+   struct crl *, const char **);
 int valid_rsc(const char *, struct cert *, struct rsc *);
 int valid_econtent_version(const char *, const ASN1_INTEGER *);
 int valid_aspa(const char *, struct cert *, struct aspa *);
Index: filemode.c

Re: Get rid of UVM_VNODE_CANPERSIST

2022-11-29 Thread Martin Pieuchot
On 28/11/22(Mon) 15:04, Mark Kettenis wrote:
> > Date: Wed, 23 Nov 2022 17:33:26 +0100
> > From: Martin Pieuchot 
> > 
> > On 23/11/22(Wed) 16:34, Mark Kettenis wrote:
> > > > Date: Wed, 23 Nov 2022 10:52:32 +0100
> > > > From: Martin Pieuchot 
> > > > 
> > > > On 22/11/22(Tue) 23:40, Mark Kettenis wrote:
> > > > > > Date: Tue, 22 Nov 2022 17:47:44 +
> > > > > > From: Miod Vallat 
> > > > > > 
> > > > > > > Here is a diff.  Maybe bluhm@ can try this on the macppc machine 
> > > > > > > that
> > > > > > > triggered the original "vref used where vget required" problem?
> > > > > > 
> > > > > > On a similar machine it panics after a few hours with:
> > > > > > 
> > > > > > panic: uvn_flush: PGO_SYNCIO return 'try again' error (impossible)
> > > > > > 
> > > > > > The trace (transcribed by hand) is
> > > > > > uvn_flush+0x820
> > > > > > uvm_vnp_terminate+0x79
> > > > > > vclean+0xdc
> > > > > > vgonel+0x70
> > > > > > getnewvnode+0x240
> > > > > > ffs_vget+0xcc
> > > > > > ffs_inode_alloc+0x13c
> > > > > > ufs_makeinode+0x94
> > > > > > ufs_create+0x58
> > > > > > VOP_CREATE+0x48
> > > > > > vn_open+0x188
> > > > > > doopenat+0x1b4
> > > > > 
> > > > > Ah right, there is another path where we end up with a refcount of
> > > > > zero.  Should be fixable, but I need to think about this for a bit.
> > > > 
> > > > Not sure to understand what you mean with refcount of 0.  Could you
> > > > elaborate?
> > > 
> > > Sorry, I was thinking ahead a bit.  I'm pretty much convinced that the
> > > issue we're dealing with is a race between a vnode being
> > > recycled/cleaned and the pagedaemon paging out pages associated with
> > > that same vnode.
> > > 
> > > The crashes we've seen before were all in the pagedaemon path where we
> > > end up calling into the VFS layer with a vnode that has v_usecount ==
> > > 0.  My "fix" avoids that, but hits the issue that when we are in the
> > > codepath that is recycling/cleaning the vnode, we can't use vget() to
> > > get a reference to the vnode since it checks that the vnode isn't in
> > > the process of being cleaned.
> > > 
> > > But if we avoid that issue (by for example) skipping the vget() call
> > > if the UVM_VNODE_DYING flag is set, we run into the same scenario
> > > where we call into the VFS layer with v_usecount == 0.  Now that may
> > > not actually be a problem, but I need to investigate this a bit more.
> > 
> > When the UVM_VNODE_DYING flag is set the caller always own a valid
> > reference to the vnode.  Either because it is in the process of cleaning
> > it via  uvm_vnp_terminate() or because it uvn_detach() has been called
> > which means the reference to the vnode hasn't been dropped yet.  So I
> > believe `v_usecount' for such vnode is positive.
> 
> I don't think so.  The vnode that can be recycled is sitting on the
> freelist with v_usecount == 0.  When getnewvnode() decides to recycle
> a vnode it takes it off the freelist and calls vgonel(), which i turn
> calls vclean(), which only increases v_usecount if it is non-zero.  So
> at the point where uvn_vnp_terminate() gets called v_usecount
> definitely is 0.
> 
> That said, the vnode is no longer on the freelist at that point and
> since UVM_VNODE_DYING is set, uvn_vnp_uncache() will return
> immediately without calling vref() to get another reference.  So that
> is fine.
> 
> > > Or maybe calling into the VFS layer with a vnode that has v_usecount
> > > == 0 is perfectly fine and we should do the vget() dance I propose in
> > > uvm_vnp_unache() instead of in uvn_put().
> > 
> > I'm not following.  uvm_vnp_uncache() is always called with a valid
> > vnode, no?
> 
> Not sure what you mean with a "valid vnode"; uvm_vnp_uncache() checks
> the UVM_VNODE_VALID flag at the start, which suggests that it can be
> called in cases where that flag is not set.  But it will unlock and
> return immediately in that case, so it should be safe.
> 
> Anyway, I think I have convinced myself that in the case where the
> pagedaemon ends up calling uvn_put() for a persisting vnode vm object,
> we do need to get a refence before calling uvn_io().  Otherwise we run
> the risk of the vnode being recycled under our feet while we're doing
> I/O.
> 
> So here is an updated diff that checks the UVM_VNODE_DYING flag and
> skips the refcount manipulation if it is set.
> 
> What do you think?

If this works, I'm more than happy and definitively ok with it.

> Index: uvm/uvm_vnode.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_vnode.c,v
> retrieving revision 1.130
> diff -u -p -r1.130 uvm_vnode.c
> --- uvm/uvm_vnode.c   20 Oct 2022 13:31:52 -  1.130
> +++ uvm/uvm_vnode.c   28 Nov 2022 14:01:20 -
> @@ -899,11 +899,21 @@ uvn_cluster(struct uvm_object *uobj, vof
>  int
>  uvn_put(struct uvm_object *uobj, struct vm_page **pps, int npages, int flags)
>  {
> - int retval;
> + struct uvm_vnode *uvn = (struct uvm_vnode *)uobj;
> + int dying, retval;
>  
>