On Wed, Sep 27, 2023 at 10:21:44AM +0200, Jan Beulich wrote:
> On 15.09.2023 09:43, Roger Pau Monne wrote:
> > The current logic to choose the preferred reboot method is based on the mode
> > Xen has been booted into, so if the box is booted from UEFI, the preferred
> > reboot method will be to use the ResetSystem() run time service call.
> > 
> > However, that method seems to be largely untested, and quite often leads to a
> > result similar to:
> > 
> > Hardware Dom0 shutdown: rebooting machine
> > ----[ Xen-4.18-unstable  x86_64  debug=y  Tainted:   C    ]----
> > CPU:    0
> > RIP:    e008:[<0000000000000017>] 0000000000000017
> > RFLAGS: 0000000000010202   CONTEXT: hypervisor
> > [...]
> > Xen call trace:
> >    [<0000000000000017>] R 0000000000000017
> >    [<ffff83207eff7b50>] S ffff83207eff7b50
> >    [<ffff82d0403525aa>] F machine_restart+0x1da/0x261
> >    [<ffff82d04035263c>] F apic_wait_icr_idle+0/0x37
> >    [<ffff82d040233689>] F smp_call_function_interrupt+0xc7/0xcb
> >    [<ffff82d040352f05>] F call_function_interrupt+0x20/0x34
> >    [<ffff82d04033b0d5>] F do_IRQ+0x150/0x6f3
> >    [<ffff82d0402018c2>] F common_interrupt+0x132/0x140
> >    [<ffff82d040283d33>] F arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129
> >    [<ffff82d04028436c>] F arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7
> >    [<ffff82d04032a549>] F arch/x86/domain.c#idle_loop+0xec/0xee
> > 
> > ****************************************
> > Panic on CPU 0:
> > FATAL TRAP: vector = 6 (invalid opcode)
> > ****************************************
> > 
> > In most cases this does still lead to a reboot, but that's not reliable.
> > 
> > Change the default reboot preference to prefer ACPI over UEFI if available
> > and not in reduced hardware mode.
> > 
> > This is in line with what Linux does, so it's unlikely to cause issues on
> > current and future hardware, since there's a much higher chance of vendors
> > testing hardware with Linux rather than Xen.
> > 
> > Add a special case for one Acer model that does require being rebooted using
> > ResetSystem().  See Linux commit 0082517fa4bce for rationale.
> > 
> > I'm not aware of ACPI reboot causing issues on boxes that do have properly
> > implemented ResetSystem() methods.
> 
> A data point from a new system I'm still in the process of setting up: The
> ACPI reboot method, as used by Linux, unconditionally means a warm reboot.
> The EFI method, otoh, properly distinguishes "reboot=warm" from our default
> of explicitly requesting cold reboot. (Without taking the EFI path, I
> assume our write to the relevant BDA location simply has no effect, since
> this is a legacy BIOS thing, and the system apparently defaults to warm
> reboot when using the ACPI method.)

This is unfortunate, but IMO not as bad as getting a #UD or any
other fault while attempting a reboot.  We can always force this
system to use UEFI reboot, if that works better than ACPI.
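To spell out the warm/cold difference: a minimal sketch of how a "reboot=warm" request can be expressed on each path. It assumes the UEFI EFI_RESET_TYPE values and the legacy BIOS Data Area reset-flag convention; the function names are hypothetical, not Xen's code.  ResetSystem() takes the reset type as an explicit argument, while the ACPI reset register write carries no warm/cold parameter at all.  On that path the BDA flag written beforehand is the only hint, so firmware that ignores it will do whatever it defaults to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EFI_RESET_TYPE values as defined by the UEFI specification. */
enum efi_reset_type {
    EFI_RESET_COLD = 0,   /* EfiResetCold */
    EFI_RESET_WARM = 1,   /* EfiResetWarm */
};
static_assert(EFI_RESET_WARM == 1, "EfiResetWarm is 1 per the UEFI spec");

/* Legacy BIOS reset flag in the BDA (physical address 0x472): 0x1234 asks
 * the BIOS for a warm boot (skip the POST memory test).  UEFI firmware is
 * free to ignore it. */
#define BDA_RESET_FLAG_ADDR 0x472u
#define BDA_RESET_FLAG_WARM 0x1234u
#define BDA_RESET_FLAG_COLD 0x0000u

/* UEFI path (hypothetical helper): the mode is passed to ResetSystem(). */
static enum efi_reset_type efi_reset_type_for(bool warm)
{
    return warm ? EFI_RESET_WARM : EFI_RESET_COLD;
}

/* ACPI path (hypothetical helper): the reset register write takes no mode,
 * so the only hint is the value stored in the BDA flag beforehand. */
static uint16_t bda_reset_flag_for(bool warm)
{
    return warm ? BDA_RESET_FLAG_WARM : BDA_RESET_FLAG_COLD;
}
```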

> Clearly, as a secondary effect, this system adds to my personal experience
> of so far EFI reboot consistently working on all x86 hardware I have (had)
> direct access to. (That said, this is the first non-Intel system, which
> likely biases my overall experience.)

I can try to gather some data.  I can at least tell you that the Intel
NUC11TNHi7 TGL also hits a fault when attempting UEFI reboot.
The above crash was from a Dell PowerEdge R6625.  I do recall seeing
this with other boxes in the Citrix lab, but don't know the exact
models.  I'm quite sure other downstreams can provide similar
feedback.

I think it's clear now that using ResetSystem() when booted from UEFI
is not mandated by the UEFI specification, so I still stand by this
patch and think we should select the default reboot method that has
the highest chance of succeeding.
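For reference, a minimal sketch of the default selection the patch describes.  All identifiers (`enum reboot_method`, `struct platform`, `default_reboot_method()`, the quirk table) are hypothetical and not Xen's actual code.  The quirk entry's product string is a placeholder for the Acer model from Linux commit 0082517fa4bce, and falling back to the keyboard controller when neither ACPI nor UEFI is usable is an assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names, illustrative only. */
enum reboot_method { REBOOT_ACPI, REBOOT_EFI, REBOOT_KBD };

/* Hypothetical summary of what is known about the platform at boot. */
struct platform {
    bool efi_booted;          /* started from UEFI, so ResetSystem() usable */
    bool acpi_available;      /* ACPI tables present and ACPI not disabled */
    bool acpi_reduced_hw;     /* FADT reports ACPI reduced hardware mode */
    const char *dmi_vendor;   /* DMI system vendor string (may be NULL) */
    const char *dmi_product;  /* DMI product name string (may be NULL) */
};

/* Systems that must keep using ResetSystem().  The product string is a
 * placeholder, not the real model name. */
static const struct {
    const char *vendor;
    const char *product;
} efi_reboot_quirks[] = {
    { "Acer", "<affected model>" },
};

static enum reboot_method default_reboot_method(const struct platform *p)
{
    assert(p != NULL);

    /* 1. Quirked systems use ResetSystem() when booted from UEFI. */
    if (p->efi_booted && p->dmi_vendor && p->dmi_product) {
        for (size_t i = 0;
             i < sizeof(efi_reboot_quirks) / sizeof(efi_reboot_quirks[0]); i++)
            if (strcmp(p->dmi_vendor, efi_reboot_quirks[i].vendor) == 0 &&
                strcmp(p->dmi_product, efi_reboot_quirks[i].product) == 0)
                return REBOOT_EFI;
    }

    /* 2. Prefer ACPI unless it is absent or in reduced hardware mode. */
    if (p->acpi_available && !p->acpi_reduced_hw)
        return REBOOT_ACPI;

    /* 3. Otherwise use UEFI if booted from it... */
    if (p->efi_booted)
        return REBOOT_EFI;

    /* ...else (assumption) the legacy keyboard controller reset. */
    return REBOOT_KBD;
}
```

The quirk check comes first so a listed system keeps using ResetSystem() even when ACPI reboot would otherwise be preferred.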

Thanks, Roger.
