Re: OpenBSD 7.1 - hangs after userland upgrade on server hardware

2022-05-01 Thread Andrew Lemin
Hi Stuart,

Good to hear from you. Hope you are well.

Yes, sorry, I am stretching the terms a little there; by userland I meant
third-party packages.
Following https://www.openbsd.org/faq/upgrade71.html, I ran the usual steps:
'sysupgrade' (auto reboot), 'sysmerge', 'reboot', and 'pkg_add -u' (after
which it no longer boots).

So nothing unusual. Only after the 'pkg_add -u' command completed
successfully did the box start hanging, which is very strange as this only
touches third-party packages. It makes me think that some GUI-related
packages have changed and, the machine being headless, that is causing a
problem.

To make things awkward, I also tried booting the upgraded install into
single-user mode (I still have the upgraded 7.1 SSD; I reinstalled 7.0 on a
different SSD). I can get to the maintenance prompt, but it hangs before I
can run any commands. The timing varies by a few seconds, I'd guess:
sometimes I can type a whole command, other times only the first few
characters. So the hang occurs somewhat randomly, but always within a few
seconds of reaching the single-user command prompt.

If I boot from the 7.1 installer, as mentioned, it seems to hang after I
have configured the network interfaces. At least that is as far as I have
been able to get before it freezes.
I don't think that looking at the installer code is going to give any
useful clues, as the fault is not specific to the installer (the
successfully upgraded system also hangs on normal boot and single-user boot).

Watching the normal boot of the upgraded 7.1 disk, the freeze seems to
happen at different places along the boot logs (variance is maybe ~40 boot
messages).
Once I can shut the firewall down again I will swap the disks back and
video the boot multiple times.
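
As an alternative to video, a minimal sketch for comparing where each attempt
stops, assuming the serial console output of every boot is saved to its own
file (the function name and the boot-*.log file names are hypothetical):

```shell
#!/bin/sh
# Print the last non-empty line of each captured console log, so the
# point where each boot attempt stopped can be compared side by side.
# The log file names are hypothetical; pass whatever captures exist.
last_boot_lines() {
    for log in "$@"; do
        # Serial captures often contain carriage returns; strip them,
        # then let awk remember the most recent non-blank line.
        last=$(tr -d '\r' < "$log" | awk 'NF { line = $0 } END { print line }')
        printf '%s: %s\n' "$log" "$last"
    done
}

# Example: last_boot_lines boot-1.log boot-2.log boot-3.log
```

If the last lines differ widely between captures, that would match the
variance described above rather than pointing at one driver.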

Thanks for your time :)

*dmesg from the re-installed 7.0;*
OpenBSD 7.0 (GENERIC.MP) #6: Mon Apr 4 00:47:02 MDT 2022
r...@syspatch-70-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8485744640 (8092MB)
avail mem = 8212533248 (7832MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xec280 (28 entries)
bios0: vendor American Megatrends Inc. version "3.1" date 05/14/2018
bios0: Supermicro X10SLV-Q
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SSDT SSDT SSDT SSDT MCFG HPET SSDT SSDT DMAR
acpi0: wakeup devices PEG0(S4) PEGP(S4) PEG1(S4) PEGP(S4) PEG2(S4) PEGP(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) PXSX(S4) GLAN(S4) EHC1(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E3-1280 v3 @ 3.60GHz, 3600.69 MHz, 06-3c-03
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Xeon(R) CPU E3-1280 v3 @ 3.60GHz, 3600.01 MHz, 06-3c-03
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Xeon(R) CPU E3-1280 v3 @ 3.60GHz, 3600.01 MHz, 06-3c-03
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Xeon(R) CPU E3-1280 v3 @ 3.60GHz, 3600.01 MHz, 06-3c-03
cpu3:

Re: OpenBSD 7.1 - hangs after userland upgrade on server hardware

2022-05-01 Thread Stuart Henderson
On 2022-05-01, Andrew Lemin wrote:
> Hi all,
>
> I am totally stumped with issues while upgrading/installing 7.1 and I need
> some help!
>
> Server; Supermicro X10SLV-Q (Intel Q87 Express), Xeon E3-1280 v3, 8G RAM,
> Mellanox 10G NIC
>
> This server has been running OpenBSD flawlessly for years. I followed the
> upgrade instructions and was able to reboot fine onto the 7.1 kernel (I
> rebooted a couple of times on the 7.1 kernel in fact). However, after I ran
> 'pkg_add -u' to upgrade all of userland to 7.1, the machine started hanging
> during boot.
>
> The hang looked like an IO problem as it would always hang around the disk
> setup stages.
> I went into the BIOS and tried optimised defaults and failsafe defaults, but
> no luck.
>
> I also downloaded a fresh copy and tried installing 7.1 from flash, however
> the 7.1 installer also hangs. It hangs in the same place every time after
> selecting 'done' to the networking config.
> As I have a Mellanox card in here, I removed the NIC, but the hang
> continues, so it's not that.
>
> I get nothing to debug, it just freezes. I have reinstalled 7.0 which is
> still working perfectly so this is not a hardware fault.
>
> Is there anything I can do to increase the verbosity to see what driver it
> is trying to load before the hang?
>
> Other information, this is a totally headless machine, with a Xeon CPU
> without any onboard GPU. It has a console connection with
> console-redirection in the bios, and I have to set the tty params during
> boot to interact over console. Otherwise everything else is standard.

If you can copy the console output (from boot loader to hang) from serial
console into an email, that might give some clues.

dmesg from 7.0 might be useful too.

It's a bit unclear how you upgraded (pkg_add -u is for packages rather than
userland parts of the OS) - normally you would upgrade kernel and userland
such that you'd boot onto new kernel+userland at the same time, then update
packages.
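
On the verbosity question, a minimal sketch of one way to get more autoconf
output, assuming the boot prompt is already reachable over the serial
console. 'boot -c' enters boot_config(8) before devices attach, and its
'verbose' command toggles verbose autoconf messages:

```
boot> boot -c
UKC> verbose
UKC> quit
```

The extra output may or may not show the cause, since the freeze reported
here also occurs after the kernel has finished attaching devices.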




OpenBSD 7.1 - hangs after userland upgrade on server hardware

2022-05-01 Thread Andrew Lemin
Hi all,

I am totally stumped with issues while upgrading/installing 7.1 and I need
some help!

Server; Supermicro X10SLV-Q (Intel Q87 Express), Xeon E3-1280 v3, 8G RAM,
Mellanox 10G NIC

This server has been running OpenBSD flawlessly for years. I followed the
upgrade instructions and was able to reboot fine onto the 7.1 kernel (I
rebooted a couple of times on the 7.1 kernel in fact). However, after I ran
'pkg_add -u' to upgrade all of userland to 7.1, the machine started hanging
during boot.

The hang looked like an IO problem as it would always hang around the disk
setup stages.
I went into the BIOS and tried optimised defaults and failsafe defaults, but
no luck.

I also downloaded a fresh copy and tried installing 7.1 from flash, however
the 7.1 installer also hangs. It hangs in the same place every time after
selecting 'done' to the networking config.
As I have a Mellanox card in here, I removed the NIC, but the hang
continues, so it's not that.

I get nothing to debug, it just freezes. I have reinstalled 7.0 which is
still working perfectly so this is not a hardware fault.

Is there anything I can do to increase the verbosity to see what driver it
is trying to load before the hang?

Other information: this is a totally headless machine, with a Xeon CPU
without any onboard GPU. It has a console connection with
console redirection in the BIOS, and I have to set the tty params during
boot to interact over console. Otherwise everything else is standard.
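
For reference, the tty parameters typed at the boot prompt can usually be
kept in /etc/boot.conf instead. A sketch, assuming the console is on com0 at
115200 baud (both are placeholders, not this machine's confirmed settings):

```
stty com0 115200
set tty com0
```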

Thanks for your time,
Best regards Andy.