I got around to manually narrowing down the bisection by re-ordering the
patches in the range already obtained. This led me to

* cda846f x86, realmode: read cr4 and EFER from kernel for 64-bit
trampoline

I also got around to wiring up a real serial port on my Intel test box
and using it to handle Xen debug keys. Dumping the registers
of a stuck HVM VCPU shows that EAX and CR4 still hold the same value. That
would indicate that code execution got at least to the place that
assigns CR4 but not much further (EAX would get replaced quite soon).

So the contents written into CR4 were: 0x1407f0. My first suspect was
the PGE flag, since that appears to depend on the PG flag in CR0 being
set first. However, masking that off had no effect. What turned out to
be the offender was the SMEP (Supervisor Mode Execution Protection) flag
(bit 20), which is also set in the CR4 contents that seem to be passed in
by Xen. With that flag manually masked off in trampoline_64.S:startup_32,
all APs start successfully again.
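
For illustration, the CR4 value above can be decoded with a few lines of
Python. This is only a sketch: the bit positions follow the CR4 layout in
the Intel SDM, and the names decode_cr4 and CR4_SMEP are made up for this
example (they are not taken from the kernel or from Xen).

```python
# Bit positions of CR4 flags as laid out in the Intel SDM.
# Reserved bits and flags introduced later are left out on purpose.
CR4_BITS = {
    0: "VME", 1: "PVI", 2: "TSD", 3: "DE",
    4: "PSE", 5: "PAE", 6: "MCE", 7: "PGE",
    8: "PCE", 9: "OSFXSR", 10: "OSXMMEXCPT",
    13: "VMXE", 14: "SMXE",
    16: "FSGSBASE", 17: "PCIDE", 18: "OSXSAVE",
    20: "SMEP",
}

CR4_SMEP = 1 << 20  # hypothetical constant name for this sketch


def decode_cr4(value):
    """Return the names of all known CR4 flags set in value, lowest bit first."""
    return [name for bit, name in sorted(CR4_BITS.items()) if value & (1 << bit)]


cr4 = 0x1407f0                 # value observed on the stuck VCPU
flags = decode_cr4(cr4)        # includes both "PGE" and "SMEP"
masked = cr4 & ~CR4_SMEP       # what CR4 would hold with SMEP masked off

if __name__ == "__main__":
    print("set flags:", " ".join(flags))
    print("without SMEP: %#x" % masked)
```

The masking step mirrors the manual workaround described above, only done
on the numeric value rather than in the trampoline code itself.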

Now the question is probably whether the realmode code should be more
conservative or whether it is the responsibility of the hypervisor to
hide this from the system. All the more so because, to my understanding,
the SMEP bit in CR4 should not be set at all on this CPU, as CPUID leaf 7
does not indicate support in bit 7 of EBX (I looked at that after booting
into bare-metal mode).
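
A rough way to repeat that CPUID check is sketched below. It assumes a
raw EBX value from CPUID leaf 7 (sub-leaf 0) obtained by some other means,
or alternatively the "flags" line of /proc/cpuinfo, where Linux lists
"smep" when the feature is reported. The function names are hypothetical
and only serve this example.

```python
SMEP_EBX_BIT = 7   # CPUID.(EAX=7,ECX=0):EBX bit 7 reports SMEP support
CR4_SMEP = 1 << 20  # SMEP enable bit in CR4


def ebx_reports_smep(ebx):
    """True if a raw EBX value from CPUID leaf 7 has the SMEP bit set."""
    return bool(ebx & (1 << SMEP_EBX_BIT))


def cpuinfo_reports_smep(cpuinfo_text):
    """True if the first 'flags' line of /proc/cpuinfo-style text lists smep."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return "smep" in value.split()
    return False


def cr4_matches_cpuid(cr4, smep_supported):
    """False when CR4 has SMEP set although CPUID does not report support."""
    return smep_supported or not (cr4 & CR4_SMEP)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            supported = cpuinfo_reports_smep(f.read())
    except OSError:
        supported = False  # not on Linux, or the file is unreadable
    print("SMEP reported by CPUID:", supported)
    print("CR4 0x1407f0 consistent:", cr4_matches_cpuid(0x1407f0, supported))
```

On a CPU without SMEP support, the last check would flag the CR4 value
from this report as inconsistent, which is the mismatch described above.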

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1157757

Title:
  [Regression] Stuck CPU1-x when booting as Xen HVM guest on certain
  Intel hosts
