On Tue, Oct 03, 2023 at 01:35:25PM +0200, Roger Pau Monné wrote:
> On Wed, Sep 27, 2023 at 10:21:44AM +0200, Jan Beulich wrote:
> > On 15.09.2023 09:43, Roger Pau Monne wrote:
> > > The current logic to chose the preferred reboot method is based on the 
> > > mode Xen
> > > has been booted into, so if the box is booted from UEFI, the preferred 
> > > reboot
> > > method will be to use the ResetSystem() run time service call.
> > > 
> > > However, that method seems to be widely untested, and quite often leads 
> > > to a
> > > result similar to:
> > > 
> > > Hardware Dom0 shutdown: rebooting machine
> > > ----[ Xen-4.18-unstable  x86_64  debug=y  Tainted:   C    ]----
> > > CPU:    0
> > > RIP:    e008:[<0000000000000017>] 0000000000000017
> > > RFLAGS: 0000000000010202   CONTEXT: hypervisor
> > > [...]
> > > Xen call trace:
> > >    [<0000000000000017>] R 0000000000000017
> > >    [<ffff83207eff7b50>] S ffff83207eff7b50
> > >    [<ffff82d0403525aa>] F machine_restart+0x1da/0x261
> > >    [<ffff82d04035263c>] F apic_wait_icr_idle+0/0x37
> > >    [<ffff82d040233689>] F smp_call_function_interrupt+0xc7/0xcb
> > >    [<ffff82d040352f05>] F call_function_interrupt+0x20/0x34
> > >    [<ffff82d04033b0d5>] F do_IRQ+0x150/0x6f3
> > >    [<ffff82d0402018c2>] F common_interrupt+0x132/0x140
> > >    [<ffff82d040283d33>] F 
> > > arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129
> > >    [<ffff82d04028436c>] F 
> > > arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7
> > >    [<ffff82d04032a549>] F arch/x86/domain.c#idle_loop+0xec/0xee
> > > 
> > > ****************************************
> > > Panic on CPU 0:
> > > FATAL TRAP: vector = 6 (invalid opcode)
> > > ****************************************
> > > 
> > > Which in most cases does lead to a reboot, however that's unreliable.
> > > 
> > > Change the default reboot preference to prefer ACPI over UEFI if 
> > > available and
> > > not in reduced hardware mode.
> > > 
> > > This is in line to what Linux does, so it's unlikely to cause issues on 
> > > current
> > > and future hardware, since there's a much higher chance of vendors testing
> > > hardware with Linux rather than Xen.
> > > 
> > > Add a special case for one Acer model that does require being rebooted 
> > > using
> > > ResetSystem().  See Linux commit 0082517fa4bce for rationale.
> > > 
> > > I'm not aware of using ACPI reboot causing issues on boxes that do have
> > > properly implemented ResetSystem() methods.
> > 
> > A data point from a new system I'm still in the process of setting up: The
> > ACPI reboot method, as used by Linux, unconditionally means a warm reboot.
> > The EFI method, otoh, properly distinguishes "reboot=warm" from our default
> > of explicitly requesting cold reboot. (Without taking the EFI path, I
> > assume our write to the relevant BDA location simply has no effect, for
> > this being a legacy BIOS thing, and the system apparently defaults to warm
> > reboot when using the ACPI method.)
> 
> This is unfortunate, but IMO not as worse as getting a #UD or any
> other fault while attempting a reboot.  We can always force this
> system to use UEFI reboot, if that does work better than ACPI.
> 
> > Clearly, as a secondary effect, this system adds to my personal experience
> > of so far EFI reboot consistently working on all x86 hardware I have (had)
> > direct access to. (That said, this is the first non-Intel system, which
> > likely biases my overall experience.)
> 
> I can try to gather some data, I can at least tell you that the Intel
> NUC11TNHi7 TGL does also hit a fault when attempting UEFI reboot.
> The above crash was from a Dell PowerEdge R6625.  I do recall seeing
> this with other boxes on the Citrix lab, but don't know the exact
> models.  I'm quite sure other downstreams can provide similar
> feedback.

As a further data point, Dasharo [0] a coreboot downstream was also
providing a firmware with a broken ResetSystem() method, and they
didn't notice until someone reported errors on Xen reboot:

https://github.com/Dasharo/edk2/pull/99/commits/dee75be10ac9387168bd3a8cad0f1ec6e372129a

It's quite clear no one is testing ResetSystem(), the UEFI spec
doesn't mandate using it, and we are just hurting ourselves by forcing
its usage.

Regards, Roger.

[0] https://github.com/Dasharo

Reply via email to