Re: wsdisplay_switch2: not switching

2023-05-28 Thread Theo de Raadt
Klemens Nanni  wrote:

> Snapshots with 'disable inteldrm' to reduce corruption/hangs on an
> Intel T14 gen 3 always print the following on shutdown/reboot:
> 
>   syncing disks... done
>   wsdisplay_switch2: not switching
>   rebooting...
> 
> Unmodified bsd.mp does not show this.
> 
> It is always a single "wsdisplay_switch2: not switching" line, i.e. never
> "wsdisplay_switch1" or "wsdisplay_switch3" as wsdisplay also provides.
> 
> I do not observe any other misbehaviour wrt. this, reboot/shutdown works.
> 
> Is this a bug or expected behaviour when manually forcing efifb(4) in UKC?
> The wsdisplay code returns EINVAL when logging this, so it reads like an
> error case to me, but I don't know anything about wsdisplay.

x13s gives this also, so it has nothing to do with "disable inteldrm".  It
is clearly the suspend/resume wscons code acting on non-drm framebuffers.

That said, no one is going to be interested in non-drm amd64 effects gained
from using config -e.  Relying on config -e to make things work is a great
way to avoid hunting for the real problem.  It's why I have occasionally
argued for removing config -e, because of how often it is used as a crutch.



Re: wsdisplay_switch2: not switching

2023-05-28 Thread Mark Kettenis
> Date: Sun, 28 May 2023 12:08:35 +
> From: Klemens Nanni 
> 
> Snapshots with 'disable inteldrm' to reduce corruption/hangs on an
> Intel T14 gen 3 always print the following on shutdown/reboot:
> 
>   syncing disks... done
>   wsdisplay_switch2: not switching
>   rebooting...
> 
> Unmodified bsd.mp does not show this.
> 
> It is always a single "wsdisplay_switch2: not switching" line, i.e. never
> "wsdisplay_switch1" or "wsdisplay_switch3" as wsdisplay also provides.
> 
> I do not observe any other misbehaviour wrt. this, reboot/shutdown works.
> 
> Is this a bug or expected behaviour when manually forcing efifb(4) in UKC?
> The wsdisplay code returns EINVAL when logging this, so it reads like an
> error case to me, but I don't know anything about wsdisplay.

Should not happen, but the code in question is a bit of a maze that even
I don't understand.

Feel free to debug what is going wrong.


wsdisplay_switch2: not switching

2023-05-28 Thread Klemens Nanni
Snapshots with 'disable inteldrm' to reduce corruption/hangs on an
Intel T14 gen 3 always print the following on shutdown/reboot:

syncing disks... done
wsdisplay_switch2: not switching
rebooting...

Unmodified bsd.mp does not show this.

It is always a single "wsdisplay_switch2: not switching" line, i.e. never
"wsdisplay_switch1" or "wsdisplay_switch3" as wsdisplay also provides.

I do not observe any other misbehaviour wrt. this, reboot/shutdown works.

Is this a bug or expected behaviour when manually forcing efifb(4) in UKC?
The wsdisplay code returns EINVAL when logging this, so it reads like an
error case to me, but I don't know anything about wsdisplay.


OpenBSD 7.3-current (GENERIC.MP) #1203: Sat May 27 09:44:55 MDT 2023
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 51214807040 (48842MB)
avail mem = 49642991616 (47343MB)
User Kernel Config
UKC> disable inteldrm
240 inteldrm* disabled
UKC> exit
Continuing...
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.4 @ 0x900a3000 (80 entries)
bios0: vendor LENOVO version "N3MET12W (1.11 )" date 02/09/2023
bios0: LENOVO 21AHCTO1WW
efi0 at bios0: UEFI 2.7
efi0: Lenovo rev 0x1110
acpi0 at bios0: ACPI 6.3
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT SSDT TPM2 HPET APIC MCFG ECDT SSDT 
SSDT SSDT SSDT SSDT SSDT LPIT WSMT SSDT DBGP DBG2 NHLT MSDM SSDT BATB DMAR SSDT 
SSDT SSDT ASF! BGRT PHAT UEFI FPDT
acpi0: wakeup devices PEG0(S4) PEGP(S4) PEGP(S4) PEG2(S4) PEGP(S4) GLAN(S4) 
XHCI(S3) XDCI(S4) HDAS(S4) CNVW(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) 
RP03(S4) PXSX(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 1920 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.32 MHz, 06-9a-03
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,WAITPKG,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 38MHz
cpu0: mwait min=64, max=64, C-substates=0.2.0.2.0.1.0.1, IBE
cpu1 at mainbus0: apid 8 (application processor)
cpu1: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.33 MHz, 06-9a-03
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu1: smt 0, core 4, package 0
cpu2 at mainbus0: apid 16 (application processor)
cpu2: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.33 MHz, 06-9a-03
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu2: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 18MB 64b/line 12-way L3 cache
cpu2: smt 0, core 8, package 0
cpu3 at mainbus0: apid 24 (application processor)
cpu3: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.32 MHz, 06-9a-03
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu3: 48KB 

Re: OpenBSD 7.3 under KVM results in fatal protection fault in supervisor mode

2023-05-28 Thread Henryk Paluch

Hello, all!

On 5/27/23 07:49, Mike Larkin wrote:

> I don't know what's wrong with atapi CD emulation on wdc(4), my recommendation
> would be to move the cd to a vioscsi device instead of wdc.

Yes, we know various workarounds, but a more detailed look shows that there
is kernel memory corruption that is somehow related to ATAPI timeouts,
leading to a trap when accessing xfer->chp ...


I built a stable-7.3 kernel with this patch:

--- dev/ic/wdc.c 31 Dec 2019 10:05:32 -  1.136
+++ dev/ic/wdc.c 28 May 2023 08:24:04 -
@@ -883,8 +883,10 @@ wdcstart(struct channel_softc *chp)
        return;
    }
 
+   printf("HP: xfer=%p orig chp=%p\n",xfer,chp);
    /* adjust chp, in case we have a shared queue */
    chp = xfer->chp;
+   printf("HP: xfer=%p xfer->chp=%p\n",xfer,chp);
 
    if ((chp->ch_flags & WDCF_ACTIVE) != 0 ) {
        return; /* channel already active */


And here is what I got:

... lots of messages with occasional timeout messages ...
HP: xfer=0xfd807e020c38 orig chp=0x8007c168
HP: xfer=0xfd807e020c38 xfer->chp=0x8007c168
HP: xfer=0xfd807e020c38 orig chp=0x8007c168
HP: xfer=0xfd807e020c38 xfer->chp=0x6e1e3d12d428657b
kernel: protection fault trap, code=0
Stopped at  wdcstart+0x49:  movl    0x58(%r15),%eax
ddb> trace
wdcstart(8007c168,8007c168,8007c168,fd807e020c38,10,800021707a90)
 at wdcstart+0x49
wdc_atapi_the_machine(8007c168,fd807e020c38,2,8007c168,8007c168,fd807e020c38)
 at wdc_atapi_the_machine+0x14a
wdc_atapi_intr(8007c168,fd807e020c38,1,8007c168,fd807e020c38,8007c168)
 at wdc_atapi_intr+0x47
wdcintr(8007c168,8007c168,0,0,6,1)
 at wdcintr+0xae
intr_handler(800021707bf0,80065500,80065680,0,81212216,800021707be0)
 at intr_handler+0x26
Xintr_ioapic_edge14_untramp(0,ff9c,81483770,0,8000216cd060,75f3d68ef248)
 at Xintr_ioapic_edge14_untramp+0x18f
ndinitat(8000216cd060,ff9c,2e96cea10,75f3d68ef248,0,8000216cd060)
 at ndinitat
syscall(800021707ef0,800021707ef0,0,8000216cd060,0,0)
 at syscall+0x201
Xsyscall(6,26,5,26,2e96cea10,0)
 at Xsyscall+0x128
end of kernel
end trace frame: 0x75f3d68ef2f0, count: -9
ddb>


So it seems that part of the xfer structure is overwritten under some rare
condition.


The question is how to find what is causing that corruption.
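
One possible way to narrow this down, sketched here in userland C rather than as a kernel patch, is to keep an inverted shadow copy of the pointer next to it and verify the pair before every dereference. A mismatch is then reported at the first check after the overwrite instead of at the eventual trap. Everything below is an assumption made for illustration: struct channel, struct xfer, chp_check and the xfer_* helpers are hypothetical stand-ins, not the real wdc(4) layout or API.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins; NOT the real wdc(4) structures. */
struct channel {
	int ch_flags;
};

struct xfer {
	struct channel *chp;       /* the field seen overwritten in the trace */
	uintptr_t       chp_check; /* shadow copy, stored bit-inverted */
};

/* Store the channel pointer together with its inverted shadow copy. */
static void
xfer_set_chp(struct xfer *x, struct channel *chp)
{
	x->chp = chp;
	x->chp_check = ~(uintptr_t)chp;
}

/* Return 1 if the pointer still matches its shadow copy, 0 otherwise. */
static int
xfer_chp_ok(const struct xfer *x)
{
	return ((uintptr_t)x->chp == (uintptr_t)~x->chp_check);
}

/* Fetch the channel; report corruption instead of dereferencing garbage. */
static struct channel *
xfer_get_chp(const struct xfer *x)
{
	if (!xfer_chp_ok(x)) {
		printf("xfer %p: chp corrupted (%p)\n",
		    (const void *)x, (void *)x->chp);
		return NULL;
	}
	return x->chp;
}
```

Calling a check like xfer_chp_ok() at several points along the timeout path would only bracket the moment of the overwrite; it does not by itself identify the writer.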

Best regards
  --Henryk Paluch