Re: Bug report for Gigabyte_R152_P30 - Multiprocessor boot fails with kernel panic

2022-06-14 Thread Mark Kettenis
> From: kiltz 
> Date: Tue, 14 Jun 2022 13:27:03 +0200
> 
> Dear Mark,
> first of all, thanks again for your efforts!
> After testing the new snapshot, a total of 64 cores are recognized (see
> the attached screenshot) before the kernel panics again.
> Since the Ampere Altra Q80-33 processor has 80 cores, I suspect
> there are only two possibilities: either the kernel hits a brick wall
> of sorts after initializing 64 cores, or the limit was somehow set
> to 64 cores?
> Best wishes,

Hi Stefan,

I don't understand what's happening here.  To help us out, could you
please:

1. Boot the machine with the single-processor kernel by typing
   "bsd.sp" at the boot> prompt.

2. Send me the output of the "eeprom -p" command.

3. Send me the files in the /var/db/acpi directory.

4. Send the dmesg output.

Feel free to send that output to me (kette...@openbsd.org) and Patrick
(patr...@openbsd.org) in private if you have concerns about
sending it to a public mailing list.

Thanks,

Mark



Re: Bug report for Gigabyte_R152_P30 - Multiprocessor boot fails with kernel panic

2022-06-14 Thread kiltz
Hello all,
thanks once more for the support, you people are truly great!
I will have access to the machine later today and will certainly
give it a spin!
And of course, if all works out, I will provide you with a dmesg of the
successful start. I will definitely keep you posted!
Best wishes,

 Stefan

On Mon, June 13, 2022 11:08 pm, Patrick Wildt wrote:
> Am Mon, Jun 13, 2022 at 07:44:24PM +0200 schrieb Mark Kettenis:
>
>>> From: kiltz 
>>> Date: Mon, 13 Jun 2022 18:12:27 +0200
>>>
>>>
>>> Dear Mark,
>>> first of all, thank you very much for your explanations, the diff and,
>>> indeed, the ultra swift reply! That helps us a lot already.
>>> A snapshot with a higher value of max CPUs out of the box, of course,
>>> would be the proverbial icing on the cake. Probably a strange question,
>>> but I'll hazard it anyway: should we monitor the snapshot directory
>>> (/pub/OpenBSD/snapshots), or is
>>> there a quicker way to find out what your fellow developers think?
>>> Again, many thanks for your help and best wishes,
>>>
>>
>> Hi Stefan,
>>
>>
>> Theo put that diff in snapshots.  I suspect that tomorrow's snapshot
>> will have it.  You can easily tell, since all 80 CPUs should attach with
>> that diff.
>>
>> Cheers,
>>
>>
>> Mark
>>
>
> And it's nice to hear that SP install already worked.  I remember
> booting it up on an Oracle machine with an Ampere Altra which led to
> messages like
>
> agintcmsi0 at agintc0: unsupported type 0x001700026f31
>
> See http://ix.io/3GEX
>
>
> While I had a diff somewhere to 'fix that', I never got the timer
> interrupt to fire.
>
> That you already had an SP install means all that should be fine.
> If this change/new snap works, I'd be interested to read a full
> dmesg!
>
> Cheers,
> Patrick
>
>




Re: Bug report for Gigabyte_R152_P30 - Multiprocessor boot fails with kernel panic

2022-06-13 Thread Patrick Wildt
Am Mon, Jun 13, 2022 at 07:44:24PM +0200 schrieb Mark Kettenis:
> > From: kiltz 
> > Date: Mon, 13 Jun 2022 18:12:27 +0200
> > 
> > Dear Mark,
> > first of all, thank you very much for your explanations, the diff
> > and, indeed, the ultra swift reply!
> > That helps us a lot already.
> > A snapshot with a higher value of max CPUs out of the box, of course,
> > would be the proverbial icing on the cake.
> > Probably a strange question, but I'll hazard it anyway: should we
> > monitor the snapshot directory (/pub/OpenBSD/snapshots), or is
> > there a quicker way to find out what your fellow developers think?
> > Again, many thanks for your help and best wishes,
> 
> Hi Stefan,
> 
> Theo put that diff in snapshots.  I suspect that tomorrow's snapshot
> will have it.  You can easily tell, since all 80 CPUs should attach
> with that diff.
> 
> Cheers,
> 
> Mark

And it's nice to hear that SP install already worked.  I remember
booting it up on an Oracle machine with an Ampere Altra which led
to messages like

agintcmsi0 at agintc0: unsupported type 0x001700026f31

See http://ix.io/3GEX

While I had a diff somewhere to 'fix that', I never got the timer
interrupt to fire.

That you already had an SP install means all that should be fine.
If this change/new snap works, I'd be interested to read a full
dmesg!

Cheers,
Patrick



Re: Bug report for Gigabyte_R152_P30 - Multiprocessor boot fails with kernel panic

2022-06-13 Thread Mark Kettenis
> From: kiltz 
> Date: Mon, 13 Jun 2022 18:12:27 +0200
> 
> Dear Mark,
> first of all, thank you very much for your explanations, the diff
> and, indeed, the ultra swift reply!
> That helps us a lot already.
> A snapshot with a higher value of max CPUs out of the box, of course,
> would be the proverbial icing on the cake.
> Probably a strange question, but I'll hazard it anyway: should we
> monitor the snapshot directory (/pub/OpenBSD/snapshots), or is
> there a quicker way to find out what your fellow developers think?
> Again, many thanks for your help and best wishes,

Hi Stefan,

Theo put that diff in snapshots.  I suspect that tomorrow's snapshot
will have it.  You can easily tell, since all 80 CPUs should attach
with that diff.

Cheers,

Mark

> - 
> Dr.-Ing. Stefan Kiltz
> 
> Otto-von-Guericke University of Magdeburg
> ITI Research Group on
> Multimedia and Security
> Universitaetsplatz 2
> 39106 Magdeburg
> Germany
> 
> Tel: +49-391-67-52838
> Fax: +49-391-67-18110
> 
> eMail: ki...@iti.cs.uni-magdeburg.de
> 
> 
> 
> 
> 
> On 13 Jun 2022, at 17:20, Mark Kettenis wrote:
> 
> >> From: kiltz 
> >> Date: Mon, 13 Jun 2022 14:46:39 +0200
> >
> > Hi Stefan,
> >
> >> Dear kind people at OpenBSD.org,
> >> we want to run OpenBSD as a firewall system on a Gigabyte R152_P30
> >> with the following specifications:
> >>
> >>   Ampere Altra Q80-33 processor (80 cores, 3.3 GHz)
> >>   512 GB RAM (3200 MHz ECC-reg.)
> >>   2 x 480 GB SSD SATA 6 Gb/s 2.5''
> >>   Dual-Port 1 GbE (RJ-45)
> >>   IPMI 2.0 Baseboard Management Controller (BMC)
> >>   1 x PCIe4.0 x16 (FHHL)
> >>   1 x PCIe3.0 x16 OCP2.0 (occupied)
> >>   1 x USB 3.0 (front), 3 x USB 3.0 (rear), 1 x VGA (rear)
> >>
> >> We tried both:
> >> - official stable 7.1 (/pub/OpenBSD/7.1/arm64) and
> >> - snapshot from 6th of June 2022 (/pub/OpenBSD/snapshots/arm64)
> >>
> >> The repeatable result is a working install in single CPU/Core
> >> installation mode, then a kernel panic after the first reboot with
> >> the mp kernel. We use the serial-to-LAN console provided by the
> >> IPMI/BMC card.
> >> Attached you will find screenshots from:
> >>
> >> - the last 49 columns of the reboot into mp kernel
> >> (Screenshot_boot_after_install_Gigabyte_R152_P30 at 2022-06-13
> >> 13-51-00.png),
> >> - the ddb trace output (Screenshot ddb_trace_2022-06-13  
> >> 14-02-11.png),
> >> - the ddb ps output (Screenshot ddb_ps_at 2022-06-13 14-03-25.png),
> >> - the ddb show panic output (Screenshot ddb_show_panic_at 2022-06-13
> >> 14-04-28.png)
> >> - the ddb show registers output (Screenshot ddb_show_registers_at
> >> 2022-06-13 14-06-34.png)
> >>
> >> Due to the nature of the early boot panic, the kernel output is not
> >> accessible to us.
> >>
> >> Interestingly, FreeBSD only supports them in their current release,
> >> the stable fails with a similar panic. They seem to have found a fix
> >> of sorts. But we very much prefer OpenBSD for the firewalling role of
> >> aforementioned system.
> >>
> >> Of course we support your effort so if you need more info from us
> >> regarding the circumstances, we will happily try and supply the
> >> required information.
> >
> > The immediate problem is that OpenBSD currently supports a maximum of
> > 32 CPUs.  That limit is a bit arbitrary, so the diff below bumps it to
> > 128.  You could try building a GENERIC.MP kernel with this diff after
> > booting the GENERIC (bsd.sp) single-processor kernel.  I'll see what
> > my fellow developers think about bumping MAXCPUS.  Depending on the
> > outcome of that a snapshot with this change may be available in a few
> > days.
> >
> > I'm not sure how well OpenBSD/arm64 scales to 80 CPUs.  Probably not
> > very well but I guess there is only one way to find out...
> >
> > Cheers,
> >
> > Mark
> >
> >
> > Index: arch/arm64/include/cpu.h
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/arm64/include/cpu.h,v
> > retrieving revision 1.25
> > diff -u -p -r1.25 cpu.h
> > --- arch/arm64/include/cpu.h    23 Mar 2022 23:36:35 -  1.25
> > +++ arch/arm64/include/cpu.h    13 Jun 2022 15:09:32 -
> > @@ -184,7 +184,7 @@ extern struct cpu_info *cpu_info_list;
> >  #define CPU_INFO_FOREACH(cii, ci)   for (cii = 0, ci = cpu_info_list; \
> >                                      ci != NULL; ci = ci->ci_next)
> >  #define CPU_INFO_UNIT(ci)   ((ci)->ci_dev ? (ci)->ci_dev->dv_unit : 0)
> > -#define MAXCPUS 32
> > +#define MAXCPUS 128
> >
> >  extern struct cpu_info *cpu_info[MAXCPUS];
> >
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0.14 (Darwin)
> 
> iEYEARECAAYFAmKnYesACgkQuLKZPfaiT0iDDgCfXC6QIWGHzkMyWxPKHCaTkYwR
> AXUAnjLiJX1RyuqrMejk4AT2s5X99fmi
> =pRhT
> -----END PGP SIGNATURE-----
> 



Re: Bug report for Gigabyte_R152_P30 - Multiprocessor boot fails with kernel panic

2022-06-13 Thread kiltz

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dear Mark,
first of all, thank you very much for your explanations, the diff
and, indeed, the ultra swift reply!

That helps us a lot already.
A snapshot with a higher value of max CPUs out of the box, of course,
would be the proverbial icing on the cake.
Probably a strange question, but I'll hazard it anyway: should we
monitor the snapshot directory (/pub/OpenBSD/snapshots), or is
there a quicker way to find out what your fellow developers think?

Again, many thanks for your help and best wishes,

 Stefan

- 
Dr.-Ing. Stefan Kiltz

Otto-von-Guericke University of Magdeburg
ITI Research Group on
Multimedia and Security
Universitaetsplatz 2
39106 Magdeburg
Germany

Tel: +49-391-67-52838
Fax: +49-391-67-18110

eMail: ki...@iti.cs.uni-magdeburg.de





On 13 Jun 2022, at 17:20, Mark Kettenis wrote:

>> From: kiltz
>> Date: Mon, 13 Jun 2022 14:46:39 +0200
>
> Hi Stefan,
>
>> Dear kind people at OpenBSD.org,
>> we want to run OpenBSD as a firewall system on a Gigabyte R152_P30
>> with the following specifications:
>>
>>   Ampere Altra Q80-33 processor (80 cores, 3.3 GHz)
>>   512 GB RAM (3200 MHz ECC-reg.)
>>   2 x 480 GB SSD SATA 6 Gb/s 2.5''
>>   Dual-Port 1 GbE (RJ-45)
>>   IPMI 2.0 Baseboard Management Controller (BMC)
>>   1 x PCIe4.0 x16 (FHHL)
>>   1 x PCIe3.0 x16 OCP2.0 (occupied)
>>   1 x USB 3.0 (front), 3 x USB 3.0 (rear), 1 x VGA (rear)
>>
>> We tried both:
>> - official stable 7.1 (/pub/OpenBSD/7.1/arm64) and
>> - snapshot from 6th of June 2022 (/pub/OpenBSD/snapshots/arm64)
>>
>> The repeatable result is a working install in single CPU/Core
>> installation mode, then a kernel panic after the first reboot with
>> the mp kernel. We use the serial-to-LAN console provided by the
>> IPMI/BMC card.
>> Attached you will find screenshots from:
>>
>> - the last 49 columns of the reboot into mp kernel
>> (Screenshot_boot_after_install_Gigabyte_R152_P30 at 2022-06-13
>> 13-51-00.png),
>> - the ddb trace output (Screenshot ddb_trace_2022-06-13 14-02-11.png),
>> - the ddb ps output (Screenshot ddb_ps_at 2022-06-13 14-03-25.png),
>> - the ddb show panic output (Screenshot ddb_show_panic_at 2022-06-13
>> 14-04-28.png)
>> - the ddb show registers output (Screenshot ddb_show_registers_at
>> 2022-06-13 14-06-34.png)
>>
>> Due to the nature of the early boot panic, the kernel output is not
>> accessible to us.
>>
>> Interestingly, FreeBSD only supports them in their current release,
>> the stable fails with a similar panic. They seem to have found a fix
>> of sorts. But we very much prefer OpenBSD for the firewalling role of
>> aforementioned system.
>>
>> Of course we support your effort so if you need more info from us
>> regarding the circumstances, we will happily try and supply the
>> required information.
>
> The immediate problem is that OpenBSD currently supports a maximum of
> 32 CPUs.  That limit is a bit arbitrary, so the diff below bumps it to
> 128.  You could try building a GENERIC.MP kernel with this diff after
> booting the GENERIC (bsd.sp) single-processor kernel.  I'll see what
> my fellow developers think about bumping MAXCPUS.  Depending on the
> outcome of that a snapshot with this change may be available in a few
> days.
>
> I'm not sure how well OpenBSD/arm64 scales to 80 CPUs.  Probably not
> very well but I guess there is only one way to find out...
>
> Cheers,
>
> Mark
>
>
> Index: arch/arm64/include/cpu.h
> ===================================================================
> RCS file: /cvs/src/sys/arch/arm64/include/cpu.h,v
> retrieving revision 1.25
> diff -u -p -r1.25 cpu.h
> --- arch/arm64/include/cpu.h    23 Mar 2022 23:36:35 -  1.25
> +++ arch/arm64/include/cpu.h    13 Jun 2022 15:09:32 -
> @@ -184,7 +184,7 @@ extern struct cpu_info *cpu_info_list;
>  #define CPU_INFO_FOREACH(cii, ci)   for (cii = 0, ci = cpu_info_list; \
>                                      ci != NULL; ci = ci->ci_next)
>  #define CPU_INFO_UNIT(ci)   ((ci)->ci_dev ? (ci)->ci_dev->dv_unit : 0)
> -#define MAXCPUS 32
> +#define MAXCPUS 128
>
>  extern struct cpu_info *cpu_info[MAXCPUS];



-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.14 (Darwin)

iEYEARECAAYFAmKnYesACgkQuLKZPfaiT0iDDgCfXC6QIWGHzkMyWxPKHCaTkYwR
AXUAnjLiJX1RyuqrMejk4AT2s5X99fmi
=pRhT
-----END PGP SIGNATURE-----



Re: Bug report for Gigabyte_R152_P30 - Multiprocessor boot fails with kernel panic

2022-06-13 Thread Mark Kettenis
> From: kiltz 
> Date: Mon, 13 Jun 2022 14:46:39 +0200

Hi Stefan,

> Dear kind people at OpenBSD.org,
> we want to run OpenBSD as a firewall system on a Gigabyte R152_P30  
> with the following specifications:
> 
>   Ampere Altra Q80-33 processor (80 cores, 3.3 GHz)
>   512 GB RAM (3200 MHz ECC-reg.)
>   2 x 480 GB SSD SATA 6 Gb/s 2.5''
>   Dual-Port 1 GbE (RJ-45)
>   IPMI 2.0 Baseboard Management Controller (BMC)
>   1 x PCIe4.0 x16 (FHHL)
>   1 x PCIe3.0 x16 OCP2.0 (occupied)
>   1 x USB 3.0 (front), 3 x USB 3.0 (rear), 1 x VGA (rear)
> 
> We tried both:
> - official stable 7.1 (/pub/OpenBSD/7.1/arm64) and
> - snapshot from 6th of June 2022 (/pub/OpenBSD/snapshots/arm64)
> 
> The repeatable result is a working install in single CPU/Core
> installation mode, then a kernel panic after the first reboot with
> the mp kernel. We use the serial-to-LAN console provided by the
> IPMI/BMC card.
> Attached you will find screenshots from:
> 
> - the last 49 columns of the reboot into mp kernel  
> (Screenshot_boot_after_install_Gigabyte_R152_P30 at 2022-06-13  
> 13-51-00.png),
> - the ddb trace output (Screenshot ddb_trace_2022-06-13 14-02-11.png),
> - the ddb ps output (Screenshot ddb_ps_at 2022-06-13 14-03-25.png),
> - the ddb show panic output (Screenshot ddb_show_panic_at 2022-06-13  
> 14-04-28.png)
> - the ddb show registers output (Screenshot ddb_show_registers_at  
> 2022-06-13 14-06-34.png)
> 
> Due to the nature of the early boot panic, the kernel output is not  
> accessible to us.
> 
> Interestingly, FreeBSD only supports them in their current release,  
> the stable fails with a similar panic. They seem to have found a fix  
> of sorts. But we very much prefer OpenBSD for the firewalling role of  
> aforementioned system.
> 
> Of course we support your effort so if you need more info from us  
> regarding the circumstances, we will happily try and supply the  
> required information.

The immediate problem is that OpenBSD currently supports a maximum of
32 CPUs.  That limit is a bit arbitrary, so the diff below bumps it to
128.  You could try building a GENERIC.MP kernel with this diff after
booting the GENERIC (bsd.sp) single-processor kernel.  I'll see what
my fellow developers think about bumping MAXCPUS.  Depending on the
outcome of that a snapshot with this change may be available in a few
days.

I'm not sure how well OpenBSD/arm64 scales to 80 CPUs.  Probably not
very well but I guess there is only one way to find out...

Cheers,

Mark


Index: arch/arm64/include/cpu.h
===================================================================
RCS file: /cvs/src/sys/arch/arm64/include/cpu.h,v
retrieving revision 1.25
diff -u -p -r1.25 cpu.h
--- arch/arm64/include/cpu.h    23 Mar 2022 23:36:35 -  1.25
+++ arch/arm64/include/cpu.h    13 Jun 2022 15:09:32 -
@@ -184,7 +184,7 @@ extern struct cpu_info *cpu_info_list;
 #define CPU_INFO_FOREACH(cii, ci)   for (cii = 0, ci = cpu_info_list; \
                                     ci != NULL; ci = ci->ci_next)
 #define CPU_INFO_UNIT(ci)   ((ci)->ci_dev ? (ci)->ci_dev->dv_unit : 0)
-#define MAXCPUS 32
+#define MAXCPUS 128
 
 extern struct cpu_info *cpu_info[MAXCPUS];
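

To illustrate why a compile-time MAXCPUS caps the number of CPUs that
can attach, here is a minimal userland sketch.  It is not the actual
OpenBSD code: the sketch_* functions, cpu_storage and the ci_cpuid
field are hypothetical stand-ins, and only MAXCPUS, cpu_info[],
cpu_info_list, ci_next and CPU_INFO_FOREACH are taken from the diff
above.  Note that the sketch refuses extra CPUs gracefully, whereas
the machine in this report panicked, so it only shows where the limit
comes from, not how the kernel behaves when it is exceeded.

```c
#include <assert.h>
#include <stddef.h>

/* Value before the diff; the diff raises it to 128. */
#define MAXCPUS 32

/* Simplified, hypothetical stand-in for the kernel's struct cpu_info. */
struct cpu_info {
	struct cpu_info *ci_next;	/* next attached CPU in the list */
	int ci_cpuid;			/* hypothetical: index into cpu_info[] */
};

static struct cpu_info cpu_storage[MAXCPUS];	/* backing storage (sketch only) */
struct cpu_info *cpu_info[MAXCPUS];		/* fixed-size table, as in cpu.h */
struct cpu_info *cpu_info_list;			/* head of the attached list */
static struct cpu_info *cpu_info_tail;		/* tail, to append in order */

#define CPU_INFO_FOREACH(cii, ci)	for (cii = 0, ci = cpu_info_list; \
					    ci != NULL; ci = ci->ci_next)

/* Try to attach one CPU; returns 0 on success, -1 if there is no slot. */
int
sketch_cpu_attach(int cpuid)
{
	struct cpu_info *ci;

	if (cpuid < 0 || cpuid >= MAXCPUS)
		return -1;		/* table too small for this CPU */
	if (cpu_info[cpuid] != NULL)
		return -1;		/* already attached */

	ci = &cpu_storage[cpuid];
	ci->ci_cpuid = cpuid;
	ci->ci_next = NULL;
	cpu_info[cpuid] = ci;

	if (cpu_info_tail == NULL)
		cpu_info_list = ci;
	else
		cpu_info_tail->ci_next = ci;
	cpu_info_tail = ci;
	return 0;
}

/* Attach CPUs 0..ncpufound-1; returns how many actually got a slot. */
int
sketch_attach_all(int ncpufound)
{
	int i, attached = 0;

	for (i = 0; i < ncpufound; i++)
		if (sketch_cpu_attach(i) == 0)
			attached++;
	return attached;
}

/* Walk the list with CPU_INFO_FOREACH and count the attached CPUs. */
int
sketch_count_attached(void)
{
	struct cpu_info *ci;
	int cii, n = 0;

	CPU_INFO_FOREACH(cii, ci) {
		(void)cii;		/* iterator is unused in this sketch */
		n++;
	}
	assert(n <= MAXCPUS);
	return n;
}
```

With MAXCPUS at 32, calling sketch_attach_all(80) attaches only the
first 32 CPUs.  Rebuilding the sketch with MAXCPUS set to 128, as the
diff does, leaves room for all 80.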