Re: XSA-446 relevance on Intel

Andrew Cooper Tue, 12 Dec 2023 02:57:14 -0800

On 12/12/2023 9:43 am, James Dingwall wrote:
> Hi,
>
> We were experiencing a crash during PV domU boot on several different models
> of hardware but all with Intel CPUs.  The Xen version was based on stable-4.15
> at 4a4daf6bddbe8a741329df5cc8768f7dec664aed (XSA-444) with some local
> patches.  Since updating the branch to b918c4cdc7ab2c1c9e9a9b54fa9d9c595913e028
> (XSA-446) we have not observed the same crash.


That range covers:

1f5f515da0f6 - iommu/amd-vi: use correct level for quarantine domain page tables
b918c4cdc7ab - x86/spec-ctrl: Remove conditional IRQs-on-ness for INT $0x80/0x82 paths

So yeah, not much in the way of change.

> The occurrence was on 1-2% of boots and we couldn't determine a particular
> sequence of events that would trigger it.  The kernel is based on Ubuntu's
> 5.15.0-91 tag but we also observed the same with -85.  Due to the low
> frequency it is possible that we simply haven't observed it again since
> updating our Xen build.
>
> If I have followed the early startup correctly, this is happening shortly after
> detection of possible CPU vulnerabilities and patching in alternative instructions.
> As the RIP was native_irq_return_iret and XSA-446 related to interrupt management,
> I wondered if it was possible that despite "Xen is not believed to be vulnerable
> in default configurations on CPUs from other hardware vendors." there could
> be some conditions in which an Intel CPU is affected?

In short, XSA-446 isn't plausibly related.  It's completely internal to
Xen, with no alteration to guest state.

It is an error for Linux to have ended up in native_irq_return_iret.  A PV
Linux kernel cannot return to itself with an IRET instruction, and must use
HYPERCALL_iret instead.

In recent versions of Linux, this is fixed up as one of the very first
actions a PV kernel takes, but on older versions of Linux, any
interrupt or exception taken early enough during boot was fatal in this way.


This part of the backtrace is odd:

[    0.398962]  ? native_iret+0x7/0x7
[    0.398967]  ? insn_decode+0x79/0x100
[    0.398975]  ? insn_decode+0xcf/0x100
[    0.398980]  optimize_nops+0x68/0x150

as it's not clear how we've ended up wanting to return to the kernel
in the first place.  However, it's most likely a page fault, as
optimize_nops() is making changes at arbitrary locations.
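As a rough illustration of why optimize_nops() writes to arbitrary places in kernel text, here is a toy Python model of NOP-run coalescing.  It is an assumption-laden sketch, not the kernel's code: the real optimize_nops() lives in arch/x86/kernel/alternative.c, decodes live text with insn_decode() (hence its appearance in the backtrace), and differs between kernel versions.  Here the instructions are handed in pre-decoded, and the NOP table is the standard set of multi-byte NOP encodings Intel documents; which table a given kernel uses is not assumed.  The names P6_NOPS, fill_nops and optimize_nops_sketch are hypothetical.

```python
# Standard x86 multi-byte NOP encodings, indexed by length in bytes.
P6_NOPS = {
    1: bytes([0x90]),
    2: bytes([0x66, 0x90]),
    3: bytes([0x0F, 0x1F, 0x00]),
    4: bytes([0x0F, 0x1F, 0x40, 0x00]),
    5: bytes([0x0F, 0x1F, 0x44, 0x00, 0x00]),
    6: bytes([0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00]),
    7: bytes([0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00]),
    8: bytes([0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),
}


def fill_nops(length: int) -> bytes:
    """Cover `length` bytes with as few NOP instructions as possible."""
    out = bytearray()
    while length > 0:
        n = min(length, max(P6_NOPS))
        out += P6_NOPS[n]
        length -= n
    return bytes(out)


def optimize_nops_sketch(insns: list[bytes]) -> bytes:
    """Replace each run of single-byte NOPs (0x90) with long NOPs.

    `insns` is a list of already-decoded instructions (hypothetical input
    format; the kernel decodes the bytes itself).  The total length is
    preserved, so only the bytes inside each NOP run are rewritten, wherever
    that run happens to sit.
    """
    out = bytearray()
    run = 0
    for insn in insns:
        if insn == b"\x90":
            run += 1          # extend the current NOP run
            continue
        out += fill_nops(run)  # flush any pending run before a real insn
        run = 0
        out += insn
    out += fill_nops(run)      # flush a trailing run
    return bytes(out)


# call rel32; nop; nop; nop; ret
patched = optimize_nops_sketch([bytes.fromhex("e800000000"),
                                b"\x90", b"\x90", b"\x90", b"\xc3"])
print(patched.hex())
```

The three single-byte NOPs collapse into one 3-byte NOP in the middle of the sequence, so the write lands wherever the run was found rather than at a fixed patch site.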

It is possible that a change in visible features has altered the
behaviour enough to avoid the crash, but if everything is still the same
as far as you can tell, then it's likely just chance that you haven't
seen it again.
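The 1-2% rate from the report makes "just chance" easy to quantify.  The sketch below assumes each boot crashes independently with a fixed probability p, which is a simplification (the real rate may vary by machine and timing); the function names are illustrative only.

```python
import math


def p_no_crash(p: float, boots: int) -> float:
    """Probability of zero crashes in `boots` independent boots, assuming
    each boot crashes with probability `p`."""
    return (1.0 - p) ** boots


def boots_for_confidence(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that (1 - p)**n <= 1 - confidence, i.e. the number of
    consecutive clean boots after which a bug with per-boot crash rate `p`
    would have shown up at least once with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


for p in (0.01, 0.02):
    print(f"p={p:.2f}: P(no crash in 100 boots)={p_no_crash(p, 100):.2f}, "
          f"clean boots needed for 95%: {boots_for_confidence(p)}")
```

At a 1% crash rate, 100 clean boots would still happen by luck about 37% of the time, and roughly 300 consecutive clean boots are needed before "not seen since" carries 95% confidence (about 150 at 2%).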

This is definitely a Linux bug, so I suspect something bad has been
backported into Ubuntu.

~Andrew
