Support for Banana PI BPI-M5 confusion on ftp.openbsd.org vs www.openbsd.org

2023-03-02 Thread kod code
Hello,

at
https://ftp.openbsd.org/pub/OpenBSD/7.2/arm64/INSTALL.arm64
under
"OpenBSD System Requirements and Supported Devices:"
...
"Amlogic G12B/SM1"
the Banana PI BPI-M5 isn't listed.

Whereas at
https://www.openbsd.org/arm64.html
under
"OpenBSD/arm64 runs on the following hardware:"
...
"Amlogic G12B/SM1"
it can be found.

Please let me know which page is accurate.

Thank you!
-kodcode



Re: Dell Wyse 3040 acpitz vs tipmic

2023-03-02 Thread Mark Kettenis
> Date: Mon, 27 Feb 2023 10:00:25 +1000
> From: David Gwynne 
> 
> On Sun, Feb 26, 2023 at 01:28:04PM +0100, Mark Kettenis wrote:
> > > Date: Sun, 26 Feb 2023 18:13:18 +1000
> > > From: David Gwynne 
> 
> yeesh, i should have proofread my email before i sent it. sorry about
> making it harder to read than it should have been.
> 
> > > i picked up a couple of Dell Wyse 3040 boxes, which are very cute, i
> > > like them a lot. however, i have to disable acpitz to be able to
> > > use them because the driver gets stuck during attach.
> > > 
> > > acpitz_attach does a read of all the temperatures. the read
> > > of _TMP ends up talking to tipmic(4) via tipmic_thermal_opreg_handler().
> > > tipmic_thermal_opreg_handler has a loop on line 335 waiting for
> > > sc->sc_stat_adc to change, but that value is only set from tipmic_intr.
> > > acpitz_attach is running while the kernel is cold, and it appears that
> > > the interrupt handler never runs, so that value never changes, and
> > > acpitz blocks. also because it's cold, the timeout on the tsleep doesn't
> > > do anything. thanks to patrick for helping me on the acpi side of things
> > > so we could figure this out.
> > 
> > A better approach might be to make sure that while we're cold,
> > tipmic_thermal_opreg_handler() polls for completion.  Something like:
> > 
> >     while (sc->sc_stat_adc == 0) {
> >         if (cold) {
> >             delay(1000);
> >             tipmic_intr(sc);
> >         } else {
> >             if (tsleep_nsec(&sc->sc_stat_adc, PRIBIO, "tipmic",
> >                 SEC_TO_NSEC(1))) {
> >                 ...
> >             }
> >         }
> >     }
> > 
> > 
> > > i tried deferring basically all of acpitz_attach to when kthreads are
> > > running, and that works well enough to get to userland.
> > > 
> > > is that reasonable?
> > 
> > The problem is that you can't really know whether AML accesses the
> > opregion while cold.
> 
> good point. the diff below works in this situation and is less
> intrusive.

ok kettenis@

> > > also, shortly after dwiic complains about short reads and the kernel
> > > locks up again. i'll have to plug it in and transcribe the exact
> > > errors. i think that's a separate problem though.
> > 
> > Yes, dwiic(4) has always been somewhat problematic.  Transactions seem
> > to fail randomly on some platforms like the atom system you're looking
> > at but also on my Ampere eMAG system.
> 
> fun. i managed to catch some of the dwiic stuff via dmesg before
> it locked up:
> 
> dwiic0: timed out waiting for tx_empty intr
> dwiic0: timed out waiting for rx_full intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x5b
> dwiic0: timed out waiting for tx_empty intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x01
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x01
> dwiic0: timed out waiting for rx_full intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x5a
> dwiic0: timed out waiting for rx_full intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x50
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x01
> dwiic0: timed out waiting for bus idle
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out waiting for stop intr
> dwiic0: timed out reading remaining 1
> tipmic0: can't read register 0x01
> dwiic0: timed out waiting for bus idle
> 
> Index: tipmic.c
> ===
> RCS file: /cvs/src/sys/dev/acpi/tipmic.c,v
> retrieving revision 1.7
> diff -u -p -r1.7 tipmic.c
> --- tipmic.c  6 Apr 2022 18:59:27 -   1.7
> +++ tipmic.c  26 Feb 2023 23:56:04 -
> @@ -276,6 +276,25 @@ struct tipmic_regmap tipmic_thermal_regm
>   { 0x18, TIPMIC_SYSTEMP_HI, TIPMIC_SYSTEMP_LO }
>  };
>  
> +static int
> +tipmic_wait_adc(struct tipmic_softc *sc)
> +{
> +	int i;
> +
> +	if (!cold) {
> +		return (tsleep_nsec(&sc->sc_stat_adc, PRIBIO, "tipmic",
> +		    SEC_TO_NSEC(1)));
> +	}
> +
> +	for (i = 0; i < 1000; i++) {
> +		delay(1000);
> +		if (tipmic_intr(sc) == 1)
> +			return (0);
> +	}
> +
> +	return (EWOULDBLOCK);
> +}
> +
>  int
>  tipmic_thermal_opreg_handler(void *cookie, int iodir, uint64_t address,
>  int size, uint64_t *value)
> @@ -333,8 +352,7 @@ tipmic_thermal_opreg_handler(void *cooki
>   splx(s);
>  
>   while (sc->sc_stat_adc == 0) {
> - if (tsleep_nsec(&sc->sc_stat_adc, PRIBIO, "tipmic",
> - SEC_TO_NSEC(1))) {
> + if (tipmic_wait_adc(sc)) {
>   
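
The bounded polling loop used while the kernel is cold can be tried outside the kernel. What follows is a minimal userland sketch under stated assumptions, not driver code: poll_until_done() and countdown_intr() are hypothetical names, the callback stands in for tipmic_intr(), and the delay(1000) call from the diff is only noted in a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an interrupt handler: returns 1 once the
 * awaited work has completed, 0 otherwise. */
typedef int (*poll_fn)(void *);

/*
 * Call the handler by hand up to max_polls times instead of sleeping.
 * Returns 0 on completion and EWOULDBLOCK when the budget runs out.
 */
int
poll_until_done(poll_fn poll, void *arg, int max_polls)
{
	int i;

	assert(poll != NULL);
	for (i = 0; i < max_polls; i++) {
		/* the kernel diff calls delay(1000) at this point */
		if (poll(arg) == 1)
			return (0);
	}
	return (EWOULDBLOCK);
}

/* Example handler: reports completion once the counter has reached zero. */
int
countdown_intr(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (0);
	}
	return (1);
}
```

With a counter of three, countdown_intr() reports completion on its fourth call, so a budget of ten polls succeeds and a budget of three returns EWOULDBLOCK, which mirrors the bounded loop in tipmic_wait_adc().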

Re: possible segmentation violation in login_radius

2023-03-02 Thread Todd C . Miller
On Thu, 02 Mar 2023 17:28:01 +0100, "Peter J. Philipp" wrote:

> I just looked up RADIUS in RFC 2865 and on page 15 it reads:
>
> ->
>Length
>
>   The Length field is two octets.  It indicates the length of the
>   packet including the Code, Identifier, Length, Authenticator and
>   Attribute fields.  Octets outside the range of the Length field
>   MUST be treated as padding and ignored on reception.  If the
>   packet is shorter than the Length field indicates, it MUST be
>   silently discarded.  The minimum length is 20 and maximum length
>   is 4096.
>
> <-
>
> Notice the silent discard, so perhaps we should try again?  I adjusted
> the patch for this, but I can't test.

I think it is simplest to just retry if rad_recv() returns -1 until
we hit the max retries.  Here's a diff relative to the earlier one
I just committed that simplifies the retry logic a bit by removing
the use of timedout in raddauth().

 - todd

Index: libexec/login_radius/raddauth.c
===
RCS file: /cvs/src/libexec/login_radius/raddauth.c,v
retrieving revision 1.31
diff -u -p -u -r1.31 raddauth.c
--- libexec/login_radius/raddauth.c 2 Mar 2023 16:13:57 -   1.31
+++ libexec/login_radius/raddauth.c 2 Mar 2023 16:50:27 -
@@ -114,13 +114,10 @@
 char *radius_dir = RADIUS_DIR;
 char auth_secret[MAXSECRETLEN+1];
 volatile sig_atomic_t timedout;
-int alt_retries;
-int retries;
+in_port_t radius_port;
+in_addr_t auth_server;
 int sockfd;
 int timeout;
-in_addr_t alt_server;
-in_addr_t auth_server;
-in_port_t radius_port;
 
 typedef struct {
u_char  code;
@@ -158,6 +155,8 @@ raddauth(char *username, char *class, ch
struct servent *svp;
struct sockaddr_in sin;
struct sigaction sa;
+   int alt_retries, retries;
+   in_addr_t alt_server;
const char *errstr;
 
memset(_pwstate, 0, sizeof(_pwstate));
@@ -268,24 +267,20 @@ raddauth(char *username, char *class, ch
sa.sa_flags = 0;        /* don't restart system calls */
(void)sigaction(SIGALRM, &sa, NULL);
 retry:
-   if (timedout) {
-   timedout = 0;
-   if (--retries <= 0) {
-   /*
-* If we ran out of tries but there is an alternate
-* server, switch to it and try again.
-*/
-   if (alt_retries) {
-   auth_server = alt_server;
-   retries = alt_retries;
-   alt_retries = 0;
-   getsecret();
-   } else
-   warnx("no response from authentication server");
-   }
+   if (retries-- <= 0) {
+   /*
+* If we ran out of tries but there is an alternate
+* server, switch to it and try again.
+*/
+   if (alt_retries) {
+   auth_server = alt_server;
+   retries = alt_retries;
+   alt_retries = 0;
+   getsecret();
+   } else
+   warnx("no response from authentication server");
}
-
-   if (retries > 0) {
+   if (retries >= 0) {
rad_request(req_id, userstyle, passwd, auth_port, vector,
pwstate);
 
@@ -324,9 +319,11 @@ retry:
passwd = "";
break;
 
+   case -1:
+   /* Timeout or bad packet, retry. */
+   goto retry;
+
default:
-   if (timedout)
-   goto retry;
snprintf(_pwstate, sizeof(_pwstate),
"invalid response type %d\n", i);
*emsg = _pwstate;
@@ -460,12 +457,16 @@ rad_recv(char *state, char *challenge, u
(struct sockaddr *)&sin, &salen);
alarm(0);
if (total_length < AUTH_HDR_LEN) {
-   if (timedout)
+   if (timedout) {
+   timedout = 0;
return(-1);
+   }
errx(1, "bogus auth packet from server");
}
-   if (ntohs(auth.length) > total_length)
-   errx(1, "bogus auth packet from server");
+   if (ntohs(auth.length) > total_length) {
+   /* RFC 2865 says to silently discard short packets. */
+   return(-1);
+   }
 
if (sin.sin_addr.s_addr != auth_server)
errx(1, "bogus authentication server");
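
The retry-then-fall-back-to-the-alternate-server idea can be modelled without any networking. This is a simplified sketch, not a transcription of raddauth(): try_servers(), fake_attempt() and struct tally are hypothetical names, servers are plain array indexes, and the exact number of attempts per server may differ from what the diff above does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one request/response attempt against a server.
 * Returns -1 on timeout or discarded packet, else a response code. */
typedef int (*attempt_fn)(int server, void *arg);

/*
 * Try the primary server up to `retries` times, then the alternate
 * server up to `alt_retries` times.  Returns the first response that
 * is not -1, or -1 if every attempt failed.
 */
int
try_servers(attempt_fn attempt, void *arg, int primary, int retries,
    int alt, int alt_retries)
{
	int server = primary;
	int r;

	assert(attempt != NULL);
	for (;;) {
		if (retries-- <= 0) {
			if (alt_retries == 0)
				return (-1);	/* nobody answered */
			/* out of tries: switch to the alternate server */
			server = alt;
			retries = alt_retries - 1;
			alt_retries = 0;
		}
		r = attempt(server, arg);
		if (r != -1)
			return (r);
	}
}

/* Test double: counts attempts per server; only `answering` replies. */
struct tally {
	int calls[2];
	int answering;
};

int
fake_attempt(int server, void *arg)
{
	struct tally *t = arg;

	t->calls[server]++;
	return (server == t->answering ? 2 : -1);
}
```

Because every -1 from the attempt is handled the same way, a timeout and a silently discarded packet both just consume one retry, which is the simplification proposed in this message.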



Re: possible segmentation violation in login_radius

2023-03-02 Thread Peter J. Philipp
On Thu, Mar 02, 2023 at 09:31:57AM -0700, Theo de Raadt wrote:
> Using a global variable like that is poor style.

OK, I'm gonna give it one more attempt:

In RFC 2865 there is no auth code for discarding a message but there is a
255 reserved value which we may be able to use as a hack.  Refer to page
14 of RFC 2865.

The updated patch returns that 255 from rad_recv(); the switch/case catches
it and does a goto retry.

Again I don't have a test network for this.

Best Regards,
-peter


Index: raddauth.c
===
RCS file: /cvs/src/libexec/login_radius/raddauth.c,v
retrieving revision 1.30
diff -u -p -u -r1.30 raddauth.c
--- raddauth.c  28 Jun 2019 13:32:53 -  1.30
+++ raddauth.c  2 Mar 2023 16:46:37 -
@@ -105,6 +105,7 @@
 #define PW_CLIENT_PORT_ID   5
 #define PW_PORT_MESSAGE     18
 #define PW_STATE            24
+#define PW_SILENT_DISCARD   255 /* Reserved in RFC 2865 */
 
 #ifndef RADIUS_DIR
 #define RADIUS_DIR "/etc/raddb"
@@ -324,6 +325,10 @@ retry:
passwd = "";
break;
 
+   case PW_SILENT_DISCARD:
+   goto retry;
+   break;
+
default:
if (timedout)
goto retry;
@@ -451,17 +456,22 @@ rad_recv(char *state, char *challenge, u
struct sockaddr_in sin;
u_char recv_vector[AUTH_VECTOR_LEN], test_vector[AUTH_VECTOR_LEN];
MD5_CTX context;
+   ssize_t total_length;
 
salen = sizeof(sin);
 
alarm(timeout);
-   if ((recvfrom(sockfd, &auth, sizeof(auth), 0,
-   (struct sockaddr *)&sin, &salen)) < AUTH_HDR_LEN) {
+   total_length = recvfrom(sockfd, &auth, sizeof(auth), 0,
+   (struct sockaddr *)&sin, &salen);
+   alarm(0);
+   if (total_length < AUTH_HDR_LEN) {
if (timedout)
return(-1);
errx(1, "bogus auth packet from server");
}
-   alarm(0);
+   if (ntohs(auth.length) > total_length) {
+   return (PW_SILENT_DISCARD);
+   }
 
if (sin.sin_addr.s_addr != auth_server)
errx(1, "bogus authentication server");



Re: possible segmentation violation in login_radius

2023-03-02 Thread Theo de Raadt
Using a global variable like that is poor style.



Re: possible segmentation violation in login_radius

2023-03-02 Thread Peter J. Philipp
On Thu, Mar 02, 2023 at 09:09:31AM -0700, Todd C. Miller wrote:
> On Thu, 02 Mar 2023 09:07:38 -0700, "Theo de Raadt" wrote:
> 
> > +   if (auth.length > total_length)
> >
> > Isn't auth.length a network byte order value?
> 
> Ah yes, good catch; it needs an ntohs().
> 
>  - todd

Hi,

I just looked up RADIUS in RFC 2865 and on page 15 it reads:

->
   Length

  The Length field is two octets.  It indicates the length of the
  packet including the Code, Identifier, Length, Authenticator and
  Attribute fields.  Octets outside the range of the Length field
  MUST be treated as padding and ignored on reception.  If the
  packet is shorter than the Length field indicates, it MUST be
  silently discarded.  The minimum length is 20 and maximum length
  is 4096.

<-

Notice the silent discard, so perhaps we should try again?  I adjusted
the patch for this, but I can't test.

Best Regards,
-peter


Index: raddauth.c
===
RCS file: /cvs/src/libexec/login_radius/raddauth.c,v
retrieving revision 1.30
diff -u -p -u -r1.30 raddauth.c
--- raddauth.c  28 Jun 2019 13:32:53 -  1.30
+++ raddauth.c  2 Mar 2023 16:24:02 -
@@ -114,6 +114,7 @@
 char *radius_dir = RADIUS_DIR;
 char auth_secret[MAXSECRETLEN+1];
 volatile sig_atomic_t timedout;
+int silent_discard;
 int alt_retries;
 int retries;
 int sockfd;
@@ -325,7 +326,7 @@ retry:
break;
 
default:
-   if (timedout)
+   if (timedout || silent_discard)
goto retry;
snprintf(_pwstate, sizeof(_pwstate),
"invalid response type %d\n", i);
@@ -451,17 +452,24 @@ rad_recv(char *state, char *challenge, u
struct sockaddr_in sin;
u_char recv_vector[AUTH_VECTOR_LEN], test_vector[AUTH_VECTOR_LEN];
MD5_CTX context;
+   ssize_t total_length;
 
salen = sizeof(sin);
+   silent_discard = 0;
 
alarm(timeout);
-   if ((recvfrom(sockfd, &auth, sizeof(auth), 0,
-   (struct sockaddr *)&sin, &salen)) < AUTH_HDR_LEN) {
+   total_length = recvfrom(sockfd, &auth, sizeof(auth), 0,
+   (struct sockaddr *)&sin, &salen);
+   alarm(0);
+   if (total_length < AUTH_HDR_LEN) {
if (timedout)
return(-1);
errx(1, "bogus auth packet from server");
}
-   alarm(0);
+   if (ntohs(auth.length) > total_length) {
+   silent_discard = 1;
+   return(-1);
+   }
 
if (sin.sin_addr.s_addr != auth_server)
errx(1, "bogus authentication server");
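
The Length rules quoted from RFC 2865 can be checked in isolation. Below is a minimal sketch, assuming the packet is available as a raw byte buffer; rad_valid_length() and the RAD_* constants are hypothetical names and are not part of raddauth.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAD_HDR_LEN	20	/* code + id + length + 16-byte authenticator */
#define RAD_MAX_LEN	4096	/* maximum packet length per RFC 2865 */

/*
 * Return the number of valid octets in a received RADIUS packet, or 0
 * if the packet should be silently discarded.  `received` is the byte
 * count reported by the receive call.
 */
size_t
rad_valid_length(const uint8_t *pkt, size_t received)
{
	size_t claimed;

	assert(pkt != NULL);
	if (received < RAD_HDR_LEN)
		return (0);
	/* the Length field is two octets in network byte order */
	claimed = ((size_t)pkt[2] << 8) | pkt[3];
	if (claimed < RAD_HDR_LEN || claimed > RAD_MAX_LEN)
		return (0);
	/* packet shorter than the Length field says: discard */
	if (claimed > received)
		return (0);
	/* octets beyond `claimed` are padding and are ignored */
	return (claimed);
}
```

A caller would then hash or parse only the returned number of octets, so a Length field supplied by the network can never make it read past the data that was actually received.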



Re: possible segmentation violation in login_radius

2023-03-02 Thread Todd C . Miller
On Thu, 02 Mar 2023 09:07:38 -0700, "Theo de Raadt" wrote:

> +   if (auth.length > total_length)
>
> Isn't auth.length a network byte order value?

Ah yes, good catch; it needs an ntohs().

 - todd

Index: libexec/login_radius/raddauth.c
===
RCS file: /cvs/src/libexec/login_radius/raddauth.c,v
retrieving revision 1.30
diff -u -p -u -r1.30 raddauth.c
--- libexec/login_radius/raddauth.c 28 Jun 2019 13:32:53 -  1.30
+++ libexec/login_radius/raddauth.c 2 Mar 2023 16:08:41 -
@@ -451,17 +451,21 @@ rad_recv(char *state, char *challenge, u
struct sockaddr_in sin;
u_char recv_vector[AUTH_VECTOR_LEN], test_vector[AUTH_VECTOR_LEN];
MD5_CTX context;
+   ssize_t total_length;
 
salen = sizeof(sin);
 
alarm(timeout);
-   if ((recvfrom(sockfd, &auth, sizeof(auth), 0,
-   (struct sockaddr *)&sin, &salen)) < AUTH_HDR_LEN) {
+   total_length = recvfrom(sockfd, &auth, sizeof(auth), 0,
+   (struct sockaddr *)&sin, &salen);
+   alarm(0);
+   if (total_length < AUTH_HDR_LEN) {
if (timedout)
return(-1);
errx(1, "bogus auth packet from server");
}
-   alarm(0);
+   if (ntohs(auth.length) > total_length)
+   errx(1, "bogus auth packet from server");
 
if (sin.sin_addr.s_addr != auth_server)
errx(1, "bogus authentication server");



Re: possible segmentation violation in login_radius

2023-03-02 Thread Peter J. Philipp
On Thu, Mar 02, 2023 at 08:56:10AM -0700, Todd C. Miller wrote:
> The following patch should fix the problem, can you try it out?
> 
>  - todd

Hi Todd,

thanks for the quick patch that was really awesome!  I modified it a little
to use ntohs(auth.length) in the length check.  Other than that it reads
great and compiles.  I don't have a radius setup here at the moment so I
can't test it.

Best Regards,
-peter


Index: raddauth.c
===
RCS file: /cvs/src/libexec/login_radius/raddauth.c,v
retrieving revision 1.30
diff -u -p -u -r1.30 raddauth.c
--- raddauth.c  28 Jun 2019 13:32:53 -  1.30
+++ raddauth.c  2 Mar 2023 16:05:20 -
@@ -451,17 +451,21 @@ rad_recv(char *state, char *challenge, u
struct sockaddr_in sin;
u_char recv_vector[AUTH_VECTOR_LEN], test_vector[AUTH_VECTOR_LEN];
MD5_CTX context;
+   ssize_t total_length;
 
salen = sizeof(sin);
 
alarm(timeout);
-   if ((recvfrom(sockfd, &auth, sizeof(auth), 0,
-   (struct sockaddr *)&sin, &salen)) < AUTH_HDR_LEN) {
+   total_length = recvfrom(sockfd, &auth, sizeof(auth), 0,
+   (struct sockaddr *)&sin, &salen);
+   alarm(0);
+   if (total_length < AUTH_HDR_LEN) {
if (timedout)
return(-1);
errx(1, "bogus auth packet from server");
}
-   alarm(0);
+   if (ntohs(auth.length) > total_length)
+   errx(1, "bogus auth packet from server");
 
if (sin.sin_addr.s_addr != auth_server)
errx(1, "bogus authentication server");



Re: possible segmentation violation in login_radius

2023-03-02 Thread Theo de Raadt
+   if (auth.length > total_length)

Isn't auth.length a network byte order value?
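
To illustrate the byte-order point: a 16-bit field copied straight off the wire has to go through ntohs() before it is compared with a host-order count. A minimal sketch follows; wire_length() is a hypothetical helper, not a function in raddauth.c.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a two-octet length as it appears in a packet (network byte
 * order, most significant octet first) and return it in host order.
 */
uint16_t
wire_length(const unsigned char *wire)
{
	uint16_t raw;

	assert(wire != NULL);
	/* raw now holds the bytes exactly as they were sent */
	memcpy(&raw, wire, sizeof(raw));
	/* convert to host order before any numeric comparison */
	return (ntohs(raw));
}
```

On a little-endian machine the unconverted value for the octets 0x00 0x14 would read as 5120 rather than 20, so a comparison against the received byte count would give the wrong answer.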



Re: possible segmentation violation in login_radius

2023-03-02 Thread Todd C . Miller
The following patch should fix the problem, can you try it out?

 - todd

Index: libexec/login_radius/raddauth.c
===
RCS file: /cvs/src/libexec/login_radius/raddauth.c,v
retrieving revision 1.30
diff -u -p -u -r1.30 raddauth.c
--- libexec/login_radius/raddauth.c 28 Jun 2019 13:32:53 -  1.30
+++ libexec/login_radius/raddauth.c 2 Mar 2023 15:54:18 -
@@ -451,17 +451,21 @@ rad_recv(char *state, char *challenge, u
struct sockaddr_in sin;
u_char recv_vector[AUTH_VECTOR_LEN], test_vector[AUTH_VECTOR_LEN];
MD5_CTX context;
+   ssize_t total_length;
 
salen = sizeof(sin);
 
alarm(timeout);
-   if ((recvfrom(sockfd, &auth, sizeof(auth), 0,
-   (struct sockaddr *)&sin, &salen)) < AUTH_HDR_LEN) {
+   total_length = recvfrom(sockfd, &auth, sizeof(auth), 0,
+   (struct sockaddr *)&sin, &salen);
+   alarm(0);
+   if (total_length < AUTH_HDR_LEN) {
if (timedout)
return(-1);
errx(1, "bogus auth packet from server");
}
-   alarm(0);
+   if (auth.length > total_length)
+   errx(1, "bogus auth packet from server");
 
if (sin.sin_addr.s_addr != auth_server)
errx(1, "bogus authentication server");



possible segmentation violation in login_radius

2023-03-02 Thread pjp
>Synopsis:  possible segmentation violation in login radius
>Category:  system
>Environment:
System  : OpenBSD 7.2
Details : OpenBSD 7.2 (GENERIC.MP) #2: Thu Nov 24 23:53:03 MST 2022
 
r...@syspatch-72-arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP

Architecture: OpenBSD.arm64
Machine : arm64
>Description:
While bored and reading through tech@, I noticed that someone was using a
RADIUS server.  So I wanted to see if they were using login_radius(8), and
the answer was no.  But while there I got stuck reading the code :}.

I saw a possible segmentation violation in the MD5 code, in raddauth.c line 473:

473 MD5Update(&context, (u_char *)&auth, ntohs(auth.length));

This length comes from the network payload, and if it is larger than the size
of auth, MD5Update() will read beyond auth.

125 typedef struct {
126 u_char  code;
127 u_char  id;
128 u_short length;
129 u_char  vector[AUTH_VECTOR_LEN];
130 u_char  data[4096 - AUTH_HDR_LEN];
131 } auth_hdr_t;

that is the size of auth.
>How-To-Repeat:
Could this be used as a DoS by flooding a host while someone is logging in?
I made a test program that shows the segmentation fault:

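#include <sys/types.h>
#include <md5.h>
#include <stdint.h>
#include <stdlib.h>

#define LENGTH 4096

int
main(void)
{
	char auth[LENGTH];
	MD5_CTX context;
	uint8_t test_vector[MD5_DIGEST_LENGTH];

	MD5Init(&context);
	/* deliberately hash twice the buffer size to read beyond auth */
	MD5Update(&context, (u_char *)&auth, LENGTH * 2);
	MD5Final(test_vector, &context);

	exit(0);
}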

pjp@polarstern$ ./testprog
Segmentation fault (core dumped) 

>Fix:
It is pretty insane here not to use IPSEC, but this is just a workaround.  
The right thing to do would be to get the value of length from recvfrom() 
and use that.



dmesg:
see earlier posts last month.



Re: Memory issue on desktop at home, probably Radeon related

2023-03-02 Thread Landry Breuil
On Thu, Mar 02, 2023 at 10:26:55AM +0100, Marc Espie wrote:
> Every few days, my desktop at home suddenly slows down. Looking at top,
> I can see that literally everything is being forced into swap (I have 16G
> of memory, when the problem starts, I usually have roughly 5-7G of memory 
> used, so very far below the limit, and I can see it drain into swap, 
> until there's 100K of memory used and everything else in swap).
> 
> I haven't tinkered with bufcachepercent.
> 
> At that point the machine is completely frozen, and I have no option besides
> physically powering it off.
> 
> If I catch it before that, I can try to reboot it to avoid fsck.
> 
> Same kind of workload I use at home and at work: lots of chrome with images, 
> lots of image/video displays with mpv, lots of youtube, and also a game
> that wants webgl (elvenar) which doesn't work on firefox at all.
> 
> The big difference is that my home box is an amd with a radeon TURKS adapter
> (r300 I think ?)
> 
> I suspect something in the memory allocator of dri wants memory with dma
> constraints, and somehow, the pagedaemon decides to put everything into swap.
> 
> I've talked a bit to kettenis@, and followed a red herring through to
> instrumenting the memory allocation routines that this radeon does NOT
> abuse (namely alloc_pages and friends: allocates a grand total of one single
> page). I will try to look at the ttm stuff next, probably.
> 
> Since this takes usually a day or three to trigger, I don't know when this
> started exactly, but this has been going on for at least a few months.
> 
> Just in case somebody has a bright idea, or is experiencing similar issues.

I'm having a very similar issue on my old optiplex radeondrm @work w/
8GB RAM (sorry no dmesg for now), running top -SH in a term i see
pagedaemon going berserk before things slow down to a halt. Sometimes it
recovers after a minute or two, sometimes i give up and powercycle.
Happens most days and sometimes several times a day. Been
happening for 1 year maybe.
Discussed it a bit with claudio@ who told me it was somewhat already
known, about having a 'special region for the first 4GB of RAM' that
apparently goes low on available free mem and advised talking to mpi@ :).

I know this doesn't help, but if i have some precise guidance i can try
to extract info from the box by breaking into ddb, i should have serial
somewhere.

Landry



Memory issue on desktop at home, probably Radeon related

2023-03-02 Thread Marc Espie
Every few days, my desktop at home suddenly slows down. Looking at top,
I can see that literally everything is being forced into swap (I have 16G
of memory, when the problem starts, I usually have roughly 5-7G of memory 
used, so very far below the limit, and I can see it drain into swap, 
until there's 100K of memory used and everything else in swap).

I haven't tinkered with bufcachepercent.

At that point the machine is completely frozen, and I have no option besides
physically powering it off.

If I catch it before that, I can try to reboot it to avoid fsck.

Same kind of workload I use at home and at work: lots of chrome with images, 
lots of image/video displays with mpv, lots of youtube, and also a game
that wants webgl (elvenar) which doesn't work on firefox at all.

The big difference is that my home box is an amd with a radeon TURKS adapter
(r300 I think ?)

I suspect something in the memory allocator of dri wants memory with dma
constraints, and somehow, the pagedaemon decides to put everything into swap.

I've talked a bit to kettenis@, and followed a red herring through to
instrumenting the memory allocation routines that this radeon does NOT
abuse (namely alloc_pages and friends: allocates a grand total of one single
page). I will try to look at the ttm stuff next, probably.

Since this takes usually a day or three to trigger, I don't know when this
started exactly, but this has been going on for at least a few months.

Just in case somebody has a bright idea, or is experiencing similar issues.

Here's my dmesg.

OpenBSD 7.2-current (GENERIC.MP) #1074: Thu Feb 23 12:15:52 MST 2023
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17108107264 (16315MB)
avail mem = 16570232832 (15802MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xeb600 (49 entries)
bios0: vendor American Megatrends Inc. version "0601" date 12/25/2012
bios0: ASUSTeK COMPUTER INC. CM1435
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT MCFG HPET MSDM SSDT SSDT IVRS BGRT
acpi0: wakeup devices SBAZ(S4) PS2K(S4) OHC1(S4) EHC1(S4) OHC2(S4) EHC2(S4) OHC3(S4) EHC3(S4) OHC4(S4) XHC0(S4) XHC1(S4) PE21(S4) RLAN(S4) PE22(S4) PE23(S4) PCE2(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 16 (boot processor)
cpu0: AMD A10-5700 APU with Radeon(tm) HD Graphics, 3417.18 MHz, 15-10-01
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,TCE,NODEID,TBM,TOPEXT,CPCTR,ITSC,BMI1
cpu0: 16KB 64b/line 4-way D-cache, 64KB 64b/line 2-way I-cache, 2MB 64b/line 16-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 100MHz
cpu0: mwait min=64, max=64, IBE
cpu1 at mainbus0: apid 17 (application processor)
cpu1: AMD A10-5700 APU with Radeon(tm) HD Graphics, 3417.30 MHz, 15-10-01
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,TCE,NODEID,TBM,TOPEXT,CPCTR,ITSC,BMI1
cpu1: 16KB 64b/line 4-way D-cache, 64KB 64b/line 2-way I-cache, 2MB 64b/line 16-way L2 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 18 (application processor)
cpu2: AMD A10-5700 APU with Radeon(tm) HD Graphics, 3417.34 MHz, 15-10-01
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,TCE,NODEID,TBM,TOPEXT,CPCTR,ITSC,BMI1
cpu2: 16KB 64b/line 4-way D-cache, 64KB 64b/line 2-way I-cache, 2MB 64b/line 16-way L2 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 19 (application processor)
cpu3: AMD A10-5700 APU with Radeon(tm) HD Graphics, 3417.62 MHz, 15-10-01
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,TCE,NODEID,TBM,TOPEXT,CPCTR,ITSC,BMI1
cpu3: 16KB 64b/line 4-way D-cache, 64KB 64b/line 2-way I-cache, 2MB 64b/line 16-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 5 pa 0xfec0, version 21, 24 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xe000, bus 0-255