On 6/9/25 18:12, Paolo Bonzini wrote:
On 6/9/25 15:23, Andrey Zhadchenko wrote:
When hotplugging vCPUs to Windows VMs, we observed a strange instance
crash on Intel(R) Xeon(R) CPU E3-1230 v6:
panic hyper-v: arg1='0x3e', arg2='0x46d359bbdff', arg3='0x56d359bbdff', arg4='0x0', arg5='0x0'

Presumably, Windows thinks that the hotplugged CPU is not "equivalent enough" to the previous ones. The problem lies within MSR 3a. During startup,
Windows assigns some value to this register. During the hotplug it
expects a similar value on the new vCPU in MSR 3a. But by default it
is zero.

If I understand correctly, you checked that it's Windows that writes 0x40005 to the MSR on non-hotplugged CPUs.

    CPU 0/KVM-16856   [007] .......   380.398695: kvm_msr: msr_read 3a = 0x0
    CPU 0/KVM-16856   [007] .......   380.398696: kvm_msr: msr_write 3a = 0x40005
    CPU 3/KVM-16859   [001] .......   380.398914: kvm_msr: msr_read 3a = 0x0
    CPU 3/KVM-16859   [001] .......   380.398914: kvm_msr: msr_write 3a = 0x40005
    CPU 2/KVM-16858   [006] .......   380.398963: kvm_msr: msr_read 3a = 0x0
    CPU 2/KVM-16858   [006] .......   380.398964: kvm_msr: msr_write 3a = 0x40005
    CPU 1/KVM-16857   [004] .......   380.399007: kvm_msr: msr_read 3a = 0x0
    CPU 1/KVM-16857   [004] .......   380.399007: kvm_msr: msr_write 3a = 0x40005

This is a random check happening, like the one below:

    CPU 0/KVM-16856   [001] .......   384.497714: kvm_msr: msr_read 3a = 0x40005
    CPU 0/KVM-16856   [001] .......   384.497716: kvm_msr: msr_read 3a = 0x40005
    CPU 1/KVM-16857   [007] .......   384.934791: kvm_msr: msr_read 3a = 0x40005
    CPU 1/KVM-16857   [007] .......   384.934793: kvm_msr: msr_read 3a = 0x40005
    CPU 2/KVM-16858   [002] .......   384.977871: kvm_msr: msr_read 3a = 0x40005
    CPU 2/KVM-16858   [002] .......   384.977873: kvm_msr: msr_read 3a = 0x40005
    CPU 3/KVM-16859   [006] .......   385.021217: kvm_msr: msr_read 3a = 0x40005
    CPU 3/KVM-16859   [006] .......   385.021220: kvm_msr: msr_read 3a = 0x40005
    CPU 4/KVM-17500   [002] .......   453.733743: kvm_msr: msr_read 3a = 0x0        <- new vcpu, Windows wants to see 0x40005 here instead of default value>
    CPU 4/KVM-17500   [002] .......   453.733745: kvm_msr: msr_read 3a = 0x0
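
To spot this kind of mismatch in a longer trace, something like the sketch below might help. It is only an illustration: the regular expression assumes the kvm_msr line layout shown above, and the helper names (last_values, odd_vcpus) are made up for this example.

```python
import re
from collections import Counter

# Matches ftrace kvm_msr lines in the layout quoted above, e.g.
#   CPU 4/KVM-17500 [002] ....... 453.733743: kvm_msr: msr_read 3a = 0x0
# (assumption: other kernels/trace options may format lines differently)
LINE_RE = re.compile(
    r"CPU (?P<vcpu>\d+)/KVM-\d+\s+\[\d+\]\s+\S+\s+[\d.]+:\s+"
    r"kvm_msr: msr_(?P<op>read|write) (?P<msr>[0-9a-f]+) = (?P<val>0x[0-9a-f]+)"
)


def last_values(trace_text, msr=0x3A):
    """Return {vcpu: last value seen for `msr`}, following trace order."""
    seen = {}
    for m in LINE_RE.finditer(trace_text):
        if int(m.group("msr"), 16) == msr:
            seen[int(m.group("vcpu"))] = int(m.group("val"), 16)
    return seen


def odd_vcpus(trace_text, msr=0x3A):
    """Return the vCPUs whose last value differs from the most common one."""
    values = last_values(trace_text, msr)
    if not values:
        return {}
    common, _ = Counter(values.values()).most_common(1)[0]
    return {vcpu: val for vcpu, val in values.items() if val != common}


SAMPLE = """
CPU 0/KVM-16856   [001] .......   384.497714: kvm_msr: msr_read 3a = 0x40005
CPU 1/KVM-16857   [007] .......   384.934791: kvm_msr: msr_read 3a = 0x40005
CPU 2/KVM-16858   [002] .......   384.977871: kvm_msr: msr_read 3a = 0x40005
CPU 3/KVM-16859   [006] .......   385.021217: kvm_msr: msr_read 3a = 0x40005
CPU 4/KVM-17500   [002] .......   453.733743: kvm_msr: msr_read 3a = 0x0
"""

if __name__ == "__main__":
    for vcpu, val in odd_vcpus(SAMPLE).items():
        print(f"vCPU {vcpu}: msr 3a = {val:#x}")
```

On the excerpt above this flags only the hotplugged vCPU 4.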

Bit #18 probably means that Intel SGX is supported, because disabling
it via CPU arguments results in successful hotplug (and msr value 0x5).
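
For reference, the two values can be decoded with a small sketch like the one below. The bit positions follow the IA32_FEATURE_CONTROL (MSR 0x3a) layout as I understand it from the Intel SDM; the short labels are my own descriptive names, not official identifiers.

```python
# Bit positions of IA32_FEATURE_CONTROL (MSR 0x3a). The labels are
# illustrative names chosen for this example, not official ones.
FEATURE_CONTROL_BITS = {
    0: "LOCK",
    1: "VMX_INSIDE_SMX",
    2: "VMX_OUTSIDE_SMX",
    17: "SGX_LAUNCH_CONTROL",
    18: "SGX_ENABLE",
    20: "LMCE",
}


def decode(value):
    """Return the labels of the known bits that are set in `value`."""
    return [
        name
        for bit, name in sorted(FEATURE_CONTROL_BITS.items())
        if (value >> bit) & 1
    ]


if __name__ == "__main__":
    for value in (0x40005, 0x5, 0x0):
        print(f"{value:#x}: {decode(value)}")
```

With this mapping, 0x40005 is lock + VMX outside SMX + SGX enable, and 0x5 is the same value without the SGX bit.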

What is the trace like in this case?  Does Windows "accept" 0x0 and write 0x5?

Does anything in edk2 run during the hotplug process (on real hardware it does, because the whole hotplug is managed via SMM)? If so, maybe that could be a better place to write the value.

So many questions, but I'd really prefer to avoid this hack if the only reason for it is SGX...

This problem was originally reported in the scope of
    https://gitlab.com/qemu-project/qemu/-/issues/2669
and is fairly reproducible on
  vendor_id    : GenuineIntel
  cpu family    : 6
  model        : 158
  model name    : Intel(R) Xeon(R) CPU E3-1230 v6 @ 3.50GHz
  stepping    : 9
  microcode    : 0xf4
We are blocked completely without this patch on our test
cluster with this hardware.

The BSOD is the following:

    MULTIPROCESSOR_CONFIGURATION_NOT_SUPPORTED (3e)
    The system has multiple processors, but they are asymmetric in relation
    to one another. In order to be symmetric all processors must be of
    the same type and level. For example, trying to mix a Pentium level
    processor with an 80486 would cause this BugCheck.
    Arguments:
    Arg1: 0000046d359bbdff
    Arg2: 0000056d359bbdff
    Arg3: 0000000000000000
    Arg4: 0000000000000000
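
As a quick aid for reading the two arguments, the sketch below XORs them to show where they differ. It only does the arithmetic; what Windows encodes in each bit of these values is not stated anywhere in this thread, and the helper name differing_bits is made up for the example.

```python
# Arg1 and Arg2 from the bugcheck above.
ARG1 = 0x0000046D359BBDFF
ARG2 = 0x0000056D359BBDFF


def differing_bits(a, b):
    """Return the positions of the bits in which `a` and `b` differ."""
    diff = a ^ b
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


if __name__ == "__main__":
    print(f"xor  = {ARG1 ^ ARG2:#x}")
    print(f"bits = {differing_bits(ARG1, ARG2)}")
```

The two values differ in exactly one bit (bit 40), which at least fits the picture of a single feature being seen differently on the new vCPU.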

    STACK_TEXT:
    ffff9b81`085768e0 fffff802`adadfa45 : ffff9b81`085771b8 00000000`00000000 ffff9b81`08577160 00000000`00000004 : nt!KiStartDynamicProcessor+0x417
    ffff9b81`085770e0 fffff809`d2c11c08 : ffffab8c`dbbcb820 ffffab8c`e4561e40 ffffab8c`e4561e40 fffff809`d2c0c340 : nt!KeStartDynamicProcessor+0x69
    ffff9b81`08577110 fffff809`d2be4363 : 00000000`00000001 fffff802`ad6e0000 00000000`00000004 fffff802`00000004 : ACPI!ACPIProcessorStartDevice+0x275b8
    ffff9b81`085771a0 fffff809`d2ac98e2 : 00000000`00000007 ffffab8c`e2f14970 ffffab8c`e2783c60 00000000`00000000 : ACPI!ACPIDispatchIrp+0x223
    (Inline Function) --------`-------- : --------`-------- --------`-------- --------`-------- --------`-------- : Wdf01000!FxIrp::CallDriver+0x14 [d:\rs1\minkernel\wdf\framework\shared\inc\private\km\fxirpkm.hpp @ 85]
    ffff9b81`08577220 fffff809`d2acc431 : ffffab8c`e2783c60 ffffab8c`e2f14970 00000000`00000002 00000000`00000000 : Wdf01000!FxPkgFdo::PnpSendStartDeviceDownTheStackOverload+0xd2 [d:\rs1\minkernel\wdf\framework\shared\irphandlers\pnp\fxpkgfdo.cpp @ 1100]
    ffff9b81`08577290 fffff809`d2ac6a89 : ffffab8c`e2f14970 00000000`00000106 00000000`00000105 fffff809`d2b43290 : Wdf01000!FxPkgPnp::PnpEventInitStarting+0x11 [d:\rs1\minkernel\wdf\framework\shared\irphandlers\pnp\pnpstatemachine.cpp @ 1328]
    (Inline Function) --------`-------- : --------`-------- --------`-------- --------`-------- --------`-------- : Wdf01000!FxPkgPnp::PnpEnterNewState+0xda [d:\rs1\minkernel\wdf\framework\shared\irphandlers\pnp\pnpstatemachine.cpp @ 1234]
    ffff9b81`085772c0 fffff809`d2ac41a8 : ffffab8c`e2f14ac8 ffff9b81`00000000 ffffab8c`e2f14aa0 00000000`00000001 : Wdf01000!FxPkgPnp::PnpProcessEventInner+0x1c9 [d:\rs1\minkernel\wdf\framework\shared\irphandlers\pnp\pnpstatemachine.cpp @ 1150]
    ffff9b81`08577370 fffff809`d2ad6e9e : 00000000`00000000 ffff9b81`08577479 00000000`00000000 ffffab8c`e2a40270 : Wdf01000!FxPkgPnp::PnpProcessEvent+0x158 [d:\rs1\minkernel\wdf\framework\shared\irphandlers\pnp\pnpstatemachine.cpp @ 933]
    ffff9b81`08577410 fffff809`d2aa3e7f : ffffab8c`e2f14970 ffff9b81`08577479 00000000`00000000 ffffab8c`e2783c60 : Wdf01000!FxPkgPnp::_PnpStartDevice+0x1e [d:\rs1\minkernel\wdf\framework\shared\irphandlers\pnp\fxpkgpnp.cpp @ 1845]
    ffff9b81`08577440 fffff809`d2aa34f5 : ffffab8c`e2783c60 ffffab8c`e2f14970 ffffab8c`e2783c60 fffff802`00000003 : Wdf01000!FxPkgPnp::Dispatch+0xef [d:\rs1\minkernel\wdf\framework\shared\irphandlers\pnp\fxpkgpnp.cpp @ 654]
    (Inline Function) --------`-------- : --------`-------- --------`-------- --------`-------- --------`-------- : Wdf01000!DispatchWorker+0xdf [d:\rs1\minkernel\wdf\framework\shared\core\fxdevice.cpp @ 1572]
    (Inline Function) --------`-------- : --------`-------- --------`-------- --------`-------- --------`-------- : Wdf01000!FxDevice::Dispatch+0xeb [d:\rs1\minkernel\wdf\framework\shared\core\fxdevice.cpp @ 1586]
    ffff9b81`085774e0 fffff802`ad908d7d : ffffab8c`e2f10e20 ffff9b81`08577604 00000000`00000000 00000000`00000000 : Wdf01000!FxDevice::DispatchWithLock+0x155 [d:\rs1\minkernel\wdf\framework\shared\core\fxdevice.cpp @ 1430]
    ffff9b81`085775d0 fffff802`ad5512f6 : ffffab8c`e4561e40 00000000`00000001 ffffab8c`e2b95bf0 00000000`00000000 : nt!PnpAsynchronousCall+0xe5
    ffff9b81`08577610 fffff802`ad57f738 : 00000000`00000000 ffffab8c`e4561e40 fffff802`ad550e14 fffff802`ad550e14 : nt!PnpSendIrp+0x92
    ffff9b81`08577680 fffff802`ad9084c7 : ffffab8c`e28b8190 ffffab8c`e2b95bf0 00000000`00000000 00000000`00000000 : nt!PnpStartDevice+0x88
    ffff9b81`08577710 fffff802`ad8ff8c3 : ffffab8c`e28b8190 ffff9b81`085778e0 00000000`00000000 ffffab8c`e28b8190 : nt!PnpStartDeviceNode+0xdb
    ffff9b81`085777a0 fffff802`ad96670d : ffffab8c`e28b8190 00000000`00000001 00000000`00000001 ffffab8c`dba25d30 : nt!PipProcessStartPhase1+0x53
    ffff9b81`085777e0 fffff802`ad9063ae : ffffab8c`e227d990 00000000`00000000 ffff9b81`08577b19 fffff802`ad966c17 : nt!PipProcessDevNodeTree+0x401
    ffff9b81`08577a60 fffff802`ad550176 : 00000001`00000003 00000000`00000000 00000000`00000000 00000000`00000000 : nt!PiProcessReenumeration+0xa6
    ffff9b81`08577ab0 fffff802`ad4ff6b9 : ffffab8c`dc66e800 fffff802`ad7ae380 fffff802`ad8502c0 fffff802`ad8502c0 : nt!PnpDeviceActionWorker+0x166
    ffff9b81`08577b80 fffff802`ad5957b9 : ffffab8c`dc66e800 00000000`00000080 ffffab8c`db6b33c0 ffffab8c`dc66e800 : nt!ExpWorkerThread+0xe9
    ffff9b81`08577c10 fffff802`ad5f6966 : ffff9b81`08100180 ffffab8c`dc66e800 fffff802`ad595778 00000000`00000000 : nt!PspSystemThreadStartup+0x41
    ffff9b81`08577c60 00000000`00000000 : ffff9b81`08578000 ffff9b81`08572000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

Linux by itself handles this well and assigns the MSRs properly (we observe
the corresponding set_msr on the hotplugged CPU).

Den
