Re: NVIDIA / Mellanox ConnectX-6 Lx support

2023-09-06 Thread Jonathan Gray
On Wed, Sep 06, 2023 at 05:38:28PM +0200, Olivier Croquin wrote:
> Hi,
> 
> recently, we have bought servers with Mellanox Ethernet Adaptors of the
> MT2894
> Family [ConnectX‑6 Lx] :
>  - OCP3 form factor : MCX631432AS-ADAI
>  - PCIe form factor : MCX631102AS-ADAT
> 
> We want to run OpenBSD on those servers.
> But, looking at the source code, we figured out that the MT2894 family is
> not
> supported.
> 
> I wrote these "quick and dirty patch", hopping that sys/dev/pci/if_mcx.c
> includes
> everything that ConnectX-6 LX cards need to operate :

thanks, committed

the .h files are generated by an awk script
modify pcidevs then run make in sys/dev/pci

> 
> ===
> --- sys/dev/pci/if_mcx.c Mon Nov 21 23:48:32 2022
> +++ sys/dev/pci/if_mcx.c Thu Aug 10 09:44:21 2023
> @@ -2657,6 +2657,7 @@ static const struct pci_matchid mcx_devices[] = {
>   { PCI_VENDOR_MELLANOX, PCI_PRODUCT_MELLANOX_MT28800VF },
>   { PCI_VENDOR_MELLANOX, PCI_PRODUCT_MELLANOX_MT28908 },
>   { PCI_VENDOR_MELLANOX, PCI_PRODUCT_MELLANOX_MT2892  },
> + { PCI_VENDOR_MELLANOX, PCI_PRODUCT_MELLANOX_MT2894  },
>  };
> 
>  struct mcx_eth_proto_capability {
> 
> ===
> --- sys/dev/pci/pcidevs.h Sun Mar 19 03:38:43 2023
> +++ sys/dev/pci/pcidevs.h Thu Aug 10 09:44:21 2023
> @@ -7261,6 +7261,7 @@
>  #define PCI_PRODUCT_MELLANOX_MT28908 0x101b /* ConnectX-6 */
>  #define PCI_PRODUCT_MELLANOX_MT28908VF 0x101c /* ConnectX-6 VF */
>  #define PCI_PRODUCT_MELLANOX_MT2892 0x101d /* ConnectX-6 Dx */
> +#define PCI_PRODUCT_MELLANOX_MT2894 0x101f /* ConnectX-6 Lx */
>  #define PCI_PRODUCT_MELLANOX_CONNECTX_EN 0x6368 /* ConnectX EN */
> 
>  /* Mentor */
> 
> ===
> --- sys/dev/pci/pcidevs_data.h Sun Mar 19 03:38:43 2023
> +++ sys/dev/pci/pcidevs_data.h Thu Aug 10 09:44:21 2023
> @@ -26184,6 +26184,10 @@ static const struct pci_known_product
> pci_known_produc
>  "ConnectX-6 Dx",
>   },
>   {
> +PCI_VENDOR_MELLANOX, PCI_PRODUCT_MELLANOX_MT2894,
> +"ConnectX-6 Lx",
> + },
> + {
>  PCI_VENDOR_MELLANOX, PCI_PRODUCT_MELLANOX_CONNECTX_EN,
>  "ConnectX EN",
>   },
> 
> Thanks to this patch, the cards are recognized by the system. The first
> tests
> don't show any problem. But I'm not sure of anything, because I didn't
> check
> if_mcx.c with the datasheet of the cards (and I'm not able to do this).
> 
> I would like to know if it could be possible to add an official support
> of the Mellanox MT2894 family (ConnectX-6 Lx) in the upcoming OpenBSD
> release ?
> 
> Best regards
> Olivier.
> 
> PS : for your information, already posted this morning on misc mailing list
> (https://marc.info/?l=openbsd-misc=169398814929531=2) but it wasn't
> maybe the best mailing list.
> 



Re: Use counters_read(9) from ddb(4)

2023-09-06 Thread Alexander Bluhm
On Wed, Sep 06, 2023 at 12:23:33PM -0500, Scott Cheloha wrote:
> On Wed, Sep 06, 2023 at 01:04:19PM +0100, Martin Pieuchot wrote:
> > Debugging OOM is hard.  UVM uses per-CPU counters and sadly
> > counters_read(9) needs to allocate memory.  This is not acceptable in
> > ddb(4).  As a result I cannot see the content of UVM counters in OOM
> > situations.
> > 
> > Diff below introduces a *_static() variant of counters_read(9) that
> > takes a secondary buffer to avoid calling malloc(9).  Is it fine?  Do
> > you have a better idea?  Should we make it the default or using the
> > stack might be a problem?
> 
> Instead of adding a second interface I think we could get away with
> just extending counters_read(9) to take a scratch buffer as an optional
> fourth parameter:
> 
> void
> counters_read(struct cpumem *cm, uint64_t *output, unsigned int n,
> uint64_t *scratch);
> 
> "scratch"?  "temp"?  "tmp"?

scratch is fine for me

> This kinda looks like a case where we could annotate these pointers
> with 'restrict', but I have never fully understood when 'restrict' is
> appropriate vs. when it is overkill or useless.

restrict scares me

#ifdef MULTIPROCESSOR

>  void
> -counters_read(struct cpumem *cm, uint64_t *output, unsigned int n)
> +counters_read(struct cpumem *cm, uint64_t *output, unsigned int n,
> +uint64_t *scratch)

#else

>  void
> -counters_read(struct cpumem *cm, uint64_t *output, unsigned int n)
> +counters_read(struct cpumem *cm, uint64_t *output, uint64_t *scratch,
> +unsigned int n)

Here scratch and n are swapped.  Build a ramdisk kernel :-)

bluhm



Re: Use counters_read(9) from ddb(4)

2023-09-06 Thread Klemens Nanni
On Wed, Sep 06, 2023 at 12:23:33PM -0500, Scott Cheloha wrote:
> On Wed, Sep 06, 2023 at 01:04:19PM +0100, Martin Pieuchot wrote:
> > Debugging OOM is hard.  UVM uses per-CPU counters and sadly
> > counters_read(9) needs to allocate memory.  This is not acceptable in
> > ddb(4).  As a result I cannot see the content of UVM counters in OOM
> > situations.
> > 
> > Diff below introduces a *_static() variant of counters_read(9) that
> > takes a secondary buffer to avoid calling malloc(9).  Is it fine?  Do
> > you have a better idea?  Should we make it the default or using the
> > stack might be a problem?
> 
> Instead of adding a second interface I think we could get away with
> just extending counters_read(9) to take a scratch buffer as an optional
> fourth parameter:
> 
> void
> counters_read(struct cpumem *cm, uint64_t *output, unsigned int n,
> uint64_t *scratch);
> 
> "scratch"?  "temp"?  "tmp"?

Just "buf".

> 
> Anyway, a NULL scratch means "allocate this for me", otherwise you're
> saying you've brought your own.  Obviously the contents of scratch are
> undefined upon return.

That's precisely what I did some a year ago to get rid of counter_read()'s
sleep point in order to replace an rwlock with a mutex in network code.

It was OK, but I think the approach that required it stalled... not sure.

The diff mechanically added a NULL argument to all callers so that future
diffs would actually change semantics along with the code that needs it.

Here are the manual bits I wrote, in case it helps.

diff --git a/share/man/man9/counters_alloc.9 b/share/man/man9/counters_alloc.9
index 1de8bffc639..b8e1438d5fd 100644
--- a/share/man/man9/counters_alloc.9
+++ b/share/man/man9/counters_alloc.9
@@ -64,6 +64,7 @@
 .Fa "struct cpumem *cm"
 .Fa "uint64_t *counters"
 .Fa "unsigned int ncounters"
+.Fa "uint64_t *buf"
 .Fc
 .Ft void
 .Fn counters_zero "struct cpumem *cm" "unsigned int ncounters"
@@ -191,6 +192,12 @@ The sum of the counters is written to the
 array.
 The number of counters is specified with
 .Fa ncounters .
+.Fa buf
+may point to a buffer used to temporarily hold
+.Fa counters .
+If
+.Dv NULL ,
+one will be dynamically allocated and freed.
 .Pp
 .Fn counters_zero
 iterates over each CPU's set of counters referenced by



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-06 Thread Mischa

On 2023-09-06 19:38, Dave Voutila wrote:

Mischa  writes:

On 2023-09-06 05:36, Dave Voutila wrote:

Mischa  writes:

On 2023-09-05 14:27, Dave Voutila wrote:

Mike Larkin  writes:


On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:

On 2023-09-04 18:58, Mischa wrote:
> On 2023-09-04 18:55, Mischa wrote:

/snip


> > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
> > this way, before it would choke on 2-3.
> >
> > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>
> I do still get the same message on the console, but the machine isn't
> freezing up.
>
> [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
> MAP_STACK
Starting 30 VMs this way caused the machine to become 
unresponsive

again,
but nothing on the console. :(
Mischa

Were you seeing these uvm errors before this diff? If so, this
isn't
causing the problem and something else is.
I don't believe we solved any of the underlying uvm issues in 
Bruges

last year. Mischa, can you test with just the latest
snapshot/-current?
I'd imagine starting and stopping many vm's now is exacerbating the
issue because of the fork/exec for devices plus the ioctl to do a 
uvm

share into the device process address space.


If this diff causes the errors to occur, and without the diff it's
fine, then
we need to look into that.
Also I think a pid number in that printf might be useful, I'll see
what I can
find. If it's not vmd causing this and rather some other process
then that
would be good to know also.

Sadly it looks like that printf doesn't spit out the offending
pid. :(

Just to confirm I am seeing this behavior on the latest snap
without
the patch as well.

Since this diff isn't the cause, I've committed it. Thanks for
testing. I'll see if I can reproduce your MAP_STACK issues.

Just started 10 VMs with sleep 2, machine freezes, but nothing on 
the

console. :(

For now, I'd recommend spacing out vm launches. I'm pretty sure it's
related to the uvm corruption we saw last year when creating, 
starting,

and destroying vm's rapidly in a loop.


That could very well be the case. I will adjust my start script, so
far I've got good results with a 10 second sleep.

Is there some additional debugging I can turn that makes sense for
this? I can easily replicate.



Highly doubtful if the issue is what I think. The only thing would be
making sure you're running in a way to see any panic and drop into
ddb. If you're using X or not on the the primary console or serial
connection it might just appear as a deadlocked system during a panic.


I am using the console via iDRAC, there isn't any information anymore. 
:(


Mischa



Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-06 Thread Dave Voutila


Mischa  writes:

> On 2023-09-06 05:36, Dave Voutila wrote:
>> Mischa  writes:
>>> On 2023-09-05 14:27, Dave Voutila wrote:
 Mike Larkin  writes:

> On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:
>> On 2023-09-04 18:58, Mischa wrote:
>> > On 2023-09-04 18:55, Mischa wrote:
 /snip

>> > > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
>> > > this way, before it would choke on 2-3.
>> > >
>> > > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>> >
>> > I do still get the same message on the console, but the machine isn't
>> > freezing up.
>> >
>> > [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
>> > MAP_STACK
>> Starting 30 VMs this way caused the machine to become unresponsive
>> again,
>> but nothing on the console. :(
>> Mischa
> Were you seeing these uvm errors before this diff? If so, this
> isn't
> causing the problem and something else is.
 I don't believe we solved any of the underlying uvm issues in Bruges
 last year. Mischa, can you test with just the latest
 snapshot/-current?
 I'd imagine starting and stopping many vm's now is exacerbating the
 issue because of the fork/exec for devices plus the ioctl to do a uvm
 share into the device process address space.

> If this diff causes the errors to occur, and without the diff it's
> fine, then
> we need to look into that.
> Also I think a pid number in that printf might be useful, I'll see
> what I can
> find. If it's not vmd causing this and rather some other process
> then that
> would be good to know also.
 Sadly it looks like that printf doesn't spit out the offending
 pid. :(
>>> Just to confirm I am seeing this behavior on the latest snap
>>> without
>>> the patch as well.
>> Since this diff isn't the cause, I've committed it. Thanks for
>> testing. I'll see if I can reproduce your MAP_STACK issues.
>>
>>> Just started 10 VMs with sleep 2, machine freezes, but nothing on the
>>> console. :(
>> For now, I'd recommend spacing out vm launches. I'm pretty sure it's
>> related to the uvm corruption we saw last year when creating, starting,
>> and destroying vm's rapidly in a loop.
>
> That could very well be the case. I will adjust my start script, so
> far I've got good results with a 10 second sleep.
>
> Is there some additional debugging I can turn that makes sense for
> this? I can easily replicate.
>

Highly doubtful if the issue is what I think. The only thing would be
making sure you're running in a way to see any panic and drop into
ddb. If you're using X or not on the the primary console or serial
connection it might just appear as a deadlocked system during a panic.

-dv



Re: Use counters_read(9) from ddb(4)

2023-09-06 Thread Scott Cheloha
On Wed, Sep 06, 2023 at 01:04:19PM +0100, Martin Pieuchot wrote:
> Debugging OOM is hard.  UVM uses per-CPU counters and sadly
> counters_read(9) needs to allocate memory.  This is not acceptable in
> ddb(4).  As a result I cannot see the content of UVM counters in OOM
> situations.
> 
> Diff below introduces a *_static() variant of counters_read(9) that
> takes a secondary buffer to avoid calling malloc(9).  Is it fine?  Do
> you have a better idea?  Should we make it the default or using the
> stack might be a problem?

Instead of adding a second interface I think we could get away with
just extending counters_read(9) to take a scratch buffer as an optional
fourth parameter:

void
counters_read(struct cpumem *cm, uint64_t *output, unsigned int n,
uint64_t *scratch);

"scratch"?  "temp"?  "tmp"?

Anyway, a NULL scratch means "allocate this for me", otherwise you're
saying you've brought your own.  Obviously the contents of scratch are
undefined upon return.

This kinda looks like a case where we could annotate these pointers
with 'restrict', but I have never fully understood when 'restrict' is
appropriate vs. when it is overkill or useless.

Index: ./kern/subr_percpu.c
===
RCS file: /cvs/src/sys/kern/subr_percpu.c,v
retrieving revision 1.10
diff -u -p -r1.10 subr_percpu.c
--- ./kern/subr_percpu.c3 Oct 2022 14:10:53 -   1.10
+++ ./kern/subr_percpu.c6 Sep 2023 17:18:46 -
@@ -159,17 +159,19 @@ counters_free(struct cpumem *cm, unsigne
 }
 
 void
-counters_read(struct cpumem *cm, uint64_t *output, unsigned int n)
+counters_read(struct cpumem *cm, uint64_t *output, unsigned int n,
+uint64_t *scratch)
 {
struct cpumem_iter cmi;
-   uint64_t *gen, *counters, *temp;
+   uint64_t *gen, *counters, *temp = scratch;
uint64_t enter, leave;
unsigned int i;
 
for (i = 0; i < n; i++)
output[i] = 0;
 
-   temp = mallocarray(n, sizeof(uint64_t), M_TEMP, M_WAITOK);
+   if (scratch == NULL)
+   temp = mallocarray(n, sizeof(uint64_t), M_TEMP, M_WAITOK);
 
gen = cpumem_first(, cm);
do {
@@ -202,7 +204,8 @@ counters_read(struct cpumem *cm, uint64_
gen = cpumem_next(, cm);
} while (gen != NULL);
 
-   free(temp, M_TEMP, n * sizeof(uint64_t));
+   if (scratch == NULL)
+   free(temp, M_TEMP, n * sizeof(uint64_t));
 }
 
 void
@@ -305,7 +308,8 @@ counters_free(struct cpumem *cm, unsigne
 }
 
 void
-counters_read(struct cpumem *cm, uint64_t *output, unsigned int n)
+counters_read(struct cpumem *cm, uint64_t *output, uint64_t *scratch,
+unsigned int n)
 {
uint64_t *counters;
unsigned int i;
Index: ./kern/kern_sysctl.c
===
RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
retrieving revision 1.418
diff -u -p -r1.418 kern_sysctl.c
--- ./kern/kern_sysctl.c16 Jul 2023 03:01:31 -  1.418
+++ ./kern/kern_sysctl.c6 Sep 2023 17:18:47 -
@@ -519,7 +519,7 @@ kern_sysctl(int *name, u_int namelen, vo
unsigned int i;
 
memset(, 0, sizeof(mbs));
-   counters_read(mbstat, counters, MBSTAT_COUNT);
+   counters_read(mbstat, counters, nitems(counters), NULL);
for (i = 0; i < MBSTAT_TYPES; i++)
mbs.m_mtypes[i] = counters[i];
 
Index: ./kern/subr_evcount.c
===
RCS file: /cvs/src/sys/kern/subr_evcount.c,v
retrieving revision 1.15
diff -u -p -r1.15 subr_evcount.c
--- ./kern/subr_evcount.c   5 Dec 2022 08:58:49 -   1.15
+++ ./kern/subr_evcount.c   6 Sep 2023 17:18:47 -
@@ -101,7 +101,7 @@ evcount_sysctl(int *name, u_int namelen,
 {
int error = 0, s, nintr, i;
struct evcount *ec;
-   u_int64_t count;
+   u_int64_t count, scratch;
 
if (newp != NULL)
return (EPERM);
@@ -129,7 +129,7 @@ evcount_sysctl(int *name, u_int namelen,
if (ec == NULL)
return (ENOENT);
if (ec->ec_percpu != NULL) {
-   counters_read(ec->ec_percpu, , 1);
+   counters_read(ec->ec_percpu, , 1, );
} else {
s = splhigh();
count = ec->ec_count;
Index: ./net/if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.707
diff -u -p -r1.707 if.c
--- ./net/if.c  18 Aug 2023 08:10:16 -  1.707
+++ ./net/if.c  6 Sep 2023 17:18:49 -
@@ -2797,7 +2797,8 @@ if_getdata(struct ifnet *ifp, struct if_
if (ifp->if_counters != NULL) {
uint64_t counters[ifc_ncounters];
 
-   counters_read(ifp->if_counters, counters, nitems(counters));
+   counters_read(ifp->if_counters, 

Re: vmd/vmm: remove an ioctl from the vcpu hotpath, go brrr

2023-09-06 Thread Mischa

On 2023-09-06 05:36, Dave Voutila wrote:

Mischa  writes:

On 2023-09-05 14:27, Dave Voutila wrote:

Mike Larkin  writes:


On Mon, Sep 04, 2023 at 07:57:18PM +0200, Mischa wrote:

On 2023-09-04 18:58, Mischa wrote:
> On 2023-09-04 18:55, Mischa wrote:

/snip


> > Adding the sleep 2 does indeed help. I managed to get 20 VMs started
> > this way, before it would choke on 2-3.
> >
> > Do I only need the unpatched kernel or also the vmd/vmctl from snap?
>
> I do still get the same message on the console, but the machine isn't
> freezing up.
>
> [umd173152/210775 sp=7a5f577a1780 inside 702698535000-702698d34fff: not
> MAP_STACK
Starting 30 VMs this way caused the machine to become unresponsive
again,
but nothing on the console. :(
Mischa

Were you seeing these uvm errors before this diff? If so, this
isn't
causing the problem and something else is.

I don't believe we solved any of the underlying uvm issues in Bruges
last year. Mischa, can you test with just the latest 
snapshot/-current?

I'd imagine starting and stopping many vm's now is exacerbating the
issue because of the fork/exec for devices plus the ioctl to do a uvm
share into the device process address space.


If this diff causes the errors to occur, and without the diff it's
fine, then
we need to look into that.
Also I think a pid number in that printf might be useful, I'll see
what I can
find. If it's not vmd causing this and rather some other process
then that
would be good to know also.

Sadly it looks like that printf doesn't spit out the offending
pid. :(


Just to confirm I am seeing this behavior on the latest snap without
the patch as well.


Since this diff isn't the cause, I've committed it. Thanks for
testing. I'll see if I can reproduce your MAP_STACK issues.


Just started 10 VMs with sleep 2, machine freezes, but nothing on the
console. :(


For now, I'd recommend spacing out vm launches. I'm pretty sure it's
related to the uvm corruption we saw last year when creating, starting,
and destroying vm's rapidly in a loop.


That could very well be the case. I will adjust my start script, so far 
I've got good results with a 10 second sleep.


Is there some additional debugging I can turn that makes sense for this? 
I can easily replicate.


Mischa



NVIDIA / Mellanox ConnectX-6 Lx support

2023-09-06 Thread Olivier Croquin
Hi,

recently, we have bought servers with Mellanox Ethernet Adaptors of the
MT2894
Family [ConnectX‑6 Lx] :
 - OCP3 form factor : MCX631432AS-ADAI
 - PCIe form factor : MCX631102AS-ADAT

We want to run OpenBSD on those servers.
But, looking at the source code, we figured out that the MT2894 family is
not
supported.

I wrote these "quick and dirty patch", hopping that sys/dev/pci/if_mcx.c
includes
everything that ConnectX-6 LX cards need to operate :

===
--- sys/dev/pci/if_mcx.c Mon Nov 21 23:48:32 2022
+++ sys/dev/pci/if_mcx.c Thu Aug 10 09:44:21 2023
@@ -2657,6 +2657,7 @@ static const struct pci_matchid mcx_devices[] = {
  { PCI_VENDOR_MELLANOX, PCI_PRODUCT_MELLANOX_MT28800VF },
  { PCI_VENDOR_MELLANOX, PCI_PRODUCT_MELLANOX_MT28908 },
  { PCI_VENDOR_MELLANOX, PCI_PRODUCT_MELLANOX_MT2892  },
+ { PCI_VENDOR_MELLANOX, PCI_PRODUCT_MELLANOX_MT2894  },
 };

 struct mcx_eth_proto_capability {

===
--- sys/dev/pci/pcidevs.h Sun Mar 19 03:38:43 2023
+++ sys/dev/pci/pcidevs.h Thu Aug 10 09:44:21 2023
@@ -7261,6 +7261,7 @@
 #define PCI_PRODUCT_MELLANOX_MT28908 0x101b /* ConnectX-6 */
 #define PCI_PRODUCT_MELLANOX_MT28908VF 0x101c /* ConnectX-6 VF */
 #define PCI_PRODUCT_MELLANOX_MT2892 0x101d /* ConnectX-6 Dx */
+#define PCI_PRODUCT_MELLANOX_MT2894 0x101f /* ConnectX-6 Lx */
 #define PCI_PRODUCT_MELLANOX_CONNECTX_EN 0x6368 /* ConnectX EN */

 /* Mentor */

===
--- sys/dev/pci/pcidevs_data.h Sun Mar 19 03:38:43 2023
+++ sys/dev/pci/pcidevs_data.h Thu Aug 10 09:44:21 2023
@@ -26184,6 +26184,10 @@ static const struct pci_known_product
pci_known_produc
 "ConnectX-6 Dx",
  },
  {
+PCI_VENDOR_MELLANOX, PCI_PRODUCT_MELLANOX_MT2894,
+"ConnectX-6 Lx",
+ },
+ {
 PCI_VENDOR_MELLANOX, PCI_PRODUCT_MELLANOX_CONNECTX_EN,
 "ConnectX EN",
  },

Thanks to this patch, the cards are recognized by the system. The first
tests
don't show any problem. But I'm not sure of anything, because I didn't
check
if_mcx.c with the datasheet of the cards (and I'm not able to do this).

I would like to know if it could be possible to add an official support
of the Mellanox MT2894 family (ConnectX-6 Lx) in the upcoming OpenBSD
release ?

Best regards
Olivier.

PS : for your information, already posted this morning on misc mailing list
(https://marc.info/?l=openbsd-misc=169398814929531=2) but it wasn't
maybe the best mailing list.


Re: bsd.port.mk.5: Ev for NO_ARCH

2023-09-06 Thread Ingo Schwarze
Hi Caspar,

Caspar Schutijser wrote on Wed, Sep 06, 2023 at 01:46:45PM +0200:

> The patch below marks up "NO_ARCH" consistently with the other
> variables. I noticed because on man.openbsd.org, NO_ARCH looks off
> compared to the others:
> https://man.openbsd.org/bsd.port.mk.5#MULTI_PACKAGES

Indeed, keeping markup correct and consistent is worthwhile.

Incorrect markup not only prevents consistent formatting across all
output devices but also harms search functionality.

For example, none of these currently works as expected:

 - typing ":t NO_ARCH" while in "man bsd.port.mk"
 - man -O tag=NO_ARCH bsd.port.mk
 - man -k Ev=NO_ARCH
 - man -k any=NO_ARCH
 - https://man.openbsd.org/bsd.port.mk#NO_ARCH

> Comments or OKs?

OK schwarze@
  Ingo

> Index: bsd.port.mk.5
> ===
> RCS file: /cvs/src/share/man/man5/bsd.port.mk.5,v
> retrieving revision 1.604
> diff -u -p -r1.604 bsd.port.mk.5
> --- bsd.port.mk.5 5 Sep 2023 23:45:53 -   1.604
> +++ bsd.port.mk.5 6 Sep 2023 11:44:22 -
> @@ -2447,7 +2447,7 @@ below.
>  Especially read the part about
>  .Ev ONLY_FOR_ARCHS
>  when some of the packages only exist for some architectures.
> -.It NO_ARCH
> +.It Ev NO_ARCH
>  Location for arch-independent packages.
>  Defaults to
>  .Sq no-arch .
> 



Re: Add Intel Optane P1600X to pcidevs

2023-09-06 Thread Jonathan Gray
On Sun, Sep 03, 2023 at 03:08:13PM +0200, Andreas Bartelt wrote:
> Hi,
> 
> forgot to include the corresponding dmesg output:
> nvme1 at pci20 dev 0 function 0 "Intel P1600X" rev 0x00: msix, NVMe 1.1
> nvme1: INTEL SSDPEK1A118GA, firmware U5110550, serial BTOC14120X9P118B
> 
> Could this please be committed?

thanks, committed



Use counters_read(9) from ddb(4)

2023-09-06 Thread Martin Pieuchot
Debugging OOM is hard.  UVM uses per-CPU counters and sadly
counters_read(9) needs to allocate memory.  This is not acceptable in
ddb(4).  As a result I cannot see the content of UVM counters in OOM
situations.

Diff below introduces a *_static() variant of counters_read(9) that
takes a secondary buffer to avoid calling malloc(9).  Is it fine?  Do
you have a better idea?  Should we make it the default or using the
stack might be a problem?

Thanks,
Martin

Index: kern/subr_percpu.c
===
RCS file: /cvs/src/sys/kern/subr_percpu.c,v
retrieving revision 1.10
diff -u -p -r1.10 subr_percpu.c
--- kern/subr_percpu.c  3 Oct 2022 14:10:53 -   1.10
+++ kern/subr_percpu.c  6 Sep 2023 11:54:31 -
@@ -161,15 +161,25 @@ counters_free(struct cpumem *cm, unsigne
 void
 counters_read(struct cpumem *cm, uint64_t *output, unsigned int n)
 {
-   struct cpumem_iter cmi;
-   uint64_t *gen, *counters, *temp;
-   uint64_t enter, leave;
+   uint64_t *temp;
unsigned int i;
 
for (i = 0; i < n; i++)
output[i] = 0;
 
temp = mallocarray(n, sizeof(uint64_t), M_TEMP, M_WAITOK);
+   counters_read_static(cm, output, n, temp);
+   free(temp, M_TEMP, n * sizeof(uint64_t));
+}
+
+void
+counters_read_static(struct cpumem *cm, uint64_t *output, unsigned int n,
+uint64_t *temp)
+{
+   struct cpumem_iter cmi;
+   uint64_t *gen, *counters;
+   uint64_t enter, leave;
+   unsigned int i;
 
gen = cpumem_first(, cm);
do {
@@ -201,8 +211,6 @@ counters_read(struct cpumem *cm, uint64_
 
gen = cpumem_next(, cm);
} while (gen != NULL);
-
-   free(temp, M_TEMP, n * sizeof(uint64_t));
 }
 
 void
Index: sys/percpu.h
===
RCS file: /cvs/src/sys/sys/percpu.h,v
retrieving revision 1.8
diff -u -p -r1.8 percpu.h
--- sys/percpu.h28 Aug 2018 15:15:02 -  1.8
+++ sys/percpu.h6 Sep 2023 11:52:55 -
@@ -114,6 +114,8 @@ struct cpumem   *counters_alloc(unsigned i
 struct cpumem  *counters_alloc_ncpus(struct cpumem *, unsigned int);
 voidcounters_free(struct cpumem *, unsigned int);
 voidcounters_read(struct cpumem *, uint64_t *, unsigned int);
+voidcounters_read_static(struct cpumem *, uint64_t *,
+unsigned int, uint64_t *);
 voidcounters_zero(struct cpumem *, unsigned int);
 
 static inline uint64_t *
Index: uvm/uvm_meter.c
===
RCS file: /cvs/src/sys/uvm/uvm_meter.c,v
retrieving revision 1.49
diff -u -p -r1.49 uvm_meter.c
--- uvm/uvm_meter.c 18 Aug 2023 09:18:52 -  1.49
+++ uvm/uvm_meter.c 6 Sep 2023 11:53:02 -
@@ -249,11 +249,12 @@ uvm_total(struct vmtotal *totalp)
 void
 uvmexp_read(struct uvmexp *uexp)
 {
-   uint64_t counters[exp_ncounters];
+   uint64_t counters[exp_ncounters], temp[exp_ncounters];
 
memcpy(uexp, , sizeof(*uexp));
 
-   counters_read(uvmexp_counters, counters, exp_ncounters);
+   counters_read_static(uvmexp_counters, counters, exp_ncounters,
+   temp);
 
/* stat counters */
uexp->faults = (int)counters[faults];



bsd.port.mk.5: Ev for NO_ARCH

2023-09-06 Thread Caspar Schutijser
Hi,

The patch below marks up "NO_ARCH" consistently with the other
variables. I noticed because on man.openbsd.org, NO_ARCH looks off
compared to the others:
https://man.openbsd.org/bsd.port.mk.5#MULTI_PACKAGES

Comments or OKs?

Caspar

Index: bsd.port.mk.5
===
RCS file: /cvs/src/share/man/man5/bsd.port.mk.5,v
retrieving revision 1.604
diff -u -p -r1.604 bsd.port.mk.5
--- bsd.port.mk.5   5 Sep 2023 23:45:53 -   1.604
+++ bsd.port.mk.5   6 Sep 2023 11:44:22 -
@@ -2447,7 +2447,7 @@ below.
 Especially read the part about
 .Ev ONLY_FOR_ARCHS
 when some of the packages only exist for some architectures.
-.It NO_ARCH
+.It Ev NO_ARCH
 Location for arch-independent packages.
 Defaults to
 .Sq no-arch .



wsdisplay: disable keyboard backlight with screen burner

2023-09-06 Thread Tobias Heider
Hi,

the diff below disables and restores the keyboard backlight together with
the screen on idle timeout to save a bit of battery.

ok?

diff 848795b17df6d7aac8fe7242132657e294ce39df 
0bb6b11cdeac4d4755e336594acf830b859e9d34
commit - 848795b17df6d7aac8fe7242132657e294ce39df
commit + 0bb6b11cdeac4d4755e336594acf830b859e9d34
blob - a0a3fd25717cce5888c4488c8fc8f35a5e43afe3
blob + bb5bb0ae7c868750c9f598a5bf6ef1142ab2942a
--- sys/dev/wscons/wsdisplay.c
+++ sys/dev/wscons/wsdisplay.c
@@ -156,6 +156,11 @@ struct wsdisplay_softc {
 void   wsdisplay_burner_setup(struct wsdisplay_softc *, struct wsscreen *);
 void   wsdisplay_burner(void *v);
 
+extern int (*wskbd_get_backlight)(struct wskbd_backlight *);
+extern int (*wskbd_set_backlight)(struct wskbd_backlight *);
+
+void   wsdisplay_burn_kbd(struct wsdisplay_softc *, int);
+
 struct wsdisplay_softc {
struct device sc_dv;
 
@@ -177,6 +182,7 @@ struct wsdisplay_softc {
int sc_burnout; /* current sc_burner delay (ms) */
int sc_burnman; /* nonzero if screen blanked */
int sc_burnflags;
+   struct wskbd_backlight  sc_kbdbl_saved;
 #endif
 
int sc_isconsole;
@@ -1224,6 +1230,7 @@ wsdisplay_internal_ioctl(struct wsdisplay_softc *sc, s
return (EOPNOTSUPP);
(*sc->sc_accessops->burn_screen)(sc->sc_accesscookie,
 *(u_int *)data, sc->sc_burnflags);
+   wsdisplay_burn_kbd(sc, *(u_int *)data == WSDISPLAYIO_VIDEO_ON);
sc->sc_burnman = *(u_int *)data == WSDISPLAYIO_VIDEO_OFF;
break;
 
@@ -2409,6 +2416,7 @@ wsdisplay_burner(void *v)
if (sc->sc_accessops->burn_screen) {
(*sc->sc_accessops->burn_screen)(sc->sc_accesscookie,
sc->sc_burnman, sc->sc_burnflags);
+   wsdisplay_burn_kbd(sc, sc->sc_burnman);
s = spltty();
if (sc->sc_burnman) {
sc->sc_burnout = sc->sc_burnoutintvl;
@@ -2533,6 +2541,30 @@ wsdisplay_brightness_cycle(struct device *dev)
wsdisplay_brightness_step(dev, 1);
 }
 
+#ifdef HAVE_BURNER_SUPPORT
+void
+wsdisplay_burn_kbd(struct wsdisplay_softc *sc, int on)
+{
+   struct wskbd_backlight bl;
+
+   if (wskbd_set_backlight == NULL ||
+   wskbd_get_backlight == NULL)
+   return;
+
+   if (!on) {
+   (*wskbd_get_backlight)(>sc_kbdbl_saved);
+   bl = sc->sc_kbdbl_saved;
+   bl.curval = bl.min;
+   } else {
+   bl = sc->sc_kbdbl_saved;
+   if (bl.max == 0)
+   return;
+   }
+
+   (*wskbd_set_backlight)();
+}
+#endif
+
 #ifdef HAVE_WSMOUSED_SUPPORT
 /*
  * wsmoused(8) support functions



Re: riscv64 possible alignment issue?

2023-09-06 Thread Peter J. Philipp
On Tue, Sep 05, 2023 at 11:40:42PM +0200, Peter J. Philipp wrote:
> Hi,
> 
> I'm porting OpenBSD to the Mango Pi D1.  Most of the work is done but now
> is the time to try to get a RAMDISK kernel to boot.
> 
> I'm having an issue with atomic_store_64() as shown here (line 1192):
> 
>1184 for (; va < DMAP_MAX_ADDRESS && pa < max_pa;
>1185 pa += L1_SIZE, va += L1_SIZE, l1_slot++) {
>1186 KASSERT(l1_slot < Ln_ENTRIES);
>1187
>1188 /* gigapages */
>1189 pn = (pa / PAGE_SIZE);
>1190 entry = PTE_KERN;
>1191 entry |= (pn << PTE_PPN0_S);
>1192 atomic_store_64([l1_slot], entry);
>1193 }
> 
> The D1 seems to hang on this call.  To me that seems like the store hangs that
> I did on powerpc64 when memory was not aligned.  When I replaced the 
> AMO instruction with a simple C replacement it worked.  So I wonder if the 
> following is correct or not:
> 
>   entry = _ALIGN(PTE_KERN);
> 
> I looked at riscv-privileged-202111203.pdf (page 79) which is for Sv32 and
> a few pages down is the Sv39 (page 84).  But I can't make sense of it.  And
> my Mona Lisa book that I have from Waterman mentions on page 101:
> 
> ''Misaligned address exceptions occur when the effective address isn't 
> divisible
> by the access size - for example, amoadd.w with an address of 0x12.''
> 
> I don't think in the bootstrapping of pmap that traps are turned on yet,
> but I could be mistaken, hence the hang.

There is something weird which I can't figure out, this document is golden
to find the truth:  

https://github.com/riscv/riscv-isa-manual/releases/download/Priv-v1.12/riscv-privileged-20211203.pdf

OK on page 79 is the Sv32 (32-bit specification), it is kinda irrelevant
for us but there is something bothering me, and we must see into this so that
we can talk about Sv39 (64-bit) which is on page 84, and Sv48 which is on
page 86. 

Sv32:

on page 80, figure 4.17 is the 32-bit physical address translation (based on
satp register's MODE) and PPN0 is 10 bits here.  PPN1 is 12 bits.  PPN is
the physical page number.  And the 0 and 1 is the levels of what sort of
pages these are.  4096 byte size is level 0, and level 1 is 4MB (megapages)
size. PTESIZE=4

Sv39:

on page 85, figure 4.20, is the 64-bit (Sv39) physical address translation
based on the satp register which is 64 bits.  PPN0 is 9 bits here, PPN1 is
9 bits and PPN2 is 26 bits.  These three levels represent 4096 byte physical
page numbers (level 0), 2 MB megapages PPN's (level 1) and 1 GB gigapages
PPN's.  "each of which must be virtually and physically aligned to a
boundary equal to its size.  A page-fault exception is raised if the physical
address is insufficently aligned". PTESIZE=8

Sv48:

on page 86, figure 4.23, is a 64-bit (Sv48) physical address translation
based on the satp register again which is 64 bits.  They have 4 levels of
PPN numbers; level 0 pages (9 bits), level 1 megapages (9 bits), 
level 2 gigapages (9 bits) and also introduce 512 GB terapages on level 3
(17 bits).  PTESIZE=8

OK now that I have repeated the document I have 3 questions:

1. AFAIK, we ONLY support Sv39 right now, is this true?
2. From /usr/src/sys/arch/riscv64/include/pte.h:

#define PTE_PPN0_S  10

This is the PPN shift value...why is it 10 and not 9?  We're using Sv39.
If this is wrong then PTE_PPN[123] also have to be adjusted.

3. Why are we talking about 4 levels of PPN's when there is only 3 levels in
the Sv39 definition?

I put some debugging into my kernel this morning and added a printf in
pmap's pmap_bootstrap_dmap() and had it just before the atomic function:

>> OpenBSD/riscv64 BOOTRISCV64 1.5
boot> boot
cannot open sd0a:/etc/random.seed: No such file or directory
booting sd0a:/bsd: 2152904+693716+8907000+538152 
[181604+122+283464+186377]=0xfa1420
bootargs: 
aligned_entry=10001000,pn=4,ae%pn=1000,[l1s]=ffc00100ba00

This is what it showed me.  In my tests instead of _ALIGN()'ing PTE_KERN I
round_page() the entry value to make it 4096 page aligned (10001000),
I also show the physmem in page numbers 0x4 which is 262144 pages for
1 GB RAM.  aligned_entry % pn == 0x1000, which didn't fit exactly into this.
and finally [l1_slot] which gave this odd looking number:  

0xffc00100ba00

This is not a page aligned offset, on any physical page number.  L1_slot I
think was 320 from earlier debugs and I left it alone in my tests.  Let's see,
here it is:

before atomic store l1=ffc00100a000 l1_slot=320,[ffc00100aa00], entry = 
10c7

So I'm wondering about these misalignments and the questions above.


> If you have another idea why I get these let me know, I'm going to bed now
> and won't be up until 6AM or later tomorrow.  I'm letting my kernel compile
> on the slow QEMU host overnight.

Sorry about this, tb@, I'll try to refrain from mentioning my bed times.

Best Regards,
-peter


--