[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud
Upstream has been working with me to try to determine what is going on here. The conclusion is that we believe that firmware is piggy-backing on the ARM_SMCCC_ARCH_WORKAROUND calls, and is clobbering some of the registers in the x4-x17 range. The patch series mentioned in Comment #9 happens to avoid consuming those registers, hiding the issue. Apparently clobbering those registers was previously OK, but the SMCCCv1.1 update mandates that x4-x17 be preserved, which the firmware authors may have missed. -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1931728 Title: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1931728/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
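To make the register-preservation point above concrete, here is a minimal Python model of the check. It is only an illustration, not kernel or firmware code: real SMC calls are issued from assembly, and `fake_firmware_call` is a hypothetical stand-in for a firmware handler. The only thing taken from the comment above is the rule that x4-x17 must come back unchanged.

```python
# Illustrative model only; real SMC calls are made from assembly, not Python.
# Registers the callee must preserve under SMCCC v1.1, per the comment above.
PRESERVED = [f"x{n}" for n in range(4, 18)]  # x4..x17 inclusive


def clobbered_registers(before, after):
    """Return the x4-x17 registers whose value changed across a call."""
    return [r for r in PRESERVED if before.get(r) != after.get(r)]


def fake_firmware_call(regs, scratch=()):
    """Hypothetical handler: copy regs and overwrite the 'scratch' registers."""
    out = dict(regs)
    out["x0"] = 0  # result register, allowed to change
    for r in scratch:
        out[r] = 0xDEADBEEF
    return out


regs = {f"x{n}": n for n in range(18)}
compliant = fake_firmware_call(regs)
buggy = fake_firmware_call(regs, scratch=("x5", "x16"))

print(clobbered_registers(regs, compliant))  # []
print(clobbered_registers(regs, buggy))      # ['x5', 'x16']
```

A caller that keeps live values in x4-x17 across the call is only safe against the first kind of handler, which would fit with the patch series hiding the problem by not using those registers.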
[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud
Note that bare metal boots also fail the same way w/ the MDS workaround enabled (failing VM boots all had a newer kernel running on the host):

EFI stub: Booting Linux Kernel...
EFI stub: EFI_RNG_PROTOCOL unavailable, no randomness supplied
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...
[0.00] Booting Linux on physical CPU 0x00 [0x503f0002]
[0.00] Linux version 4.15.0-147-generic (buildd@bos02-arm64-076) (gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)) #151-Ubuntu SMP Fri Jun 18 19:18:37 UTC 2021 (Ubuntu 4.15.0-147.151-generic 4.15.18)
[0.00] efi: Getting EFI parameters from FDT:
[0.00] efi: EFI v2.70 by American Megatrends
[0.00] efi: ACPI 2.0=0xbff596 SMBIOS 3.0=0xbff686fd98 ESRT=0xbff1a3a018
[0.00] esrt: Reserving ESRT space from 0x00bff1a3a018 to 0x00bff1a3a078.
[0.00] ACPI: Early table checksum verification disabled
[0.00] ACPI: RSDP 0x00BFF596 24 (v02 ALASKA)
[0.00] ACPI: XSDT 0x00BFF5960028 94 (v01 ALASKA A M I 01072009 AMI 00010013)
[0.00] ACPI: FACP 0x00BFF59600C0 000114 (v06 Ampere eMAG 0003 INTL 20190509)
[0.00] ACPI: DSDT 0x00BFF59601D8 0077CD (v05 ALASKA A M I 0001 INTL 20190509)
[0.00] ACPI: FIDT 0x00BFF59679A8 9C (v01 ALASKA A M I 01072009 AMI 00010013)
[0.00] ACPI: DBG2 0x00BFF5967A48 61 (v00 Ampere eMAG INTL 20190509)
[0.00] ACPI: GTDT 0x00BFF5967AB0 000108 (v02 Ampere eMAG 0001 INTL 20190509)
[0.00] ACPI: IORT 0x00BFF5967BB8 000BCC (v00 Ampere eMAG INTL 20190509)
[0.00] ACPI: MCFG 0x00BFF5968788 AC (v01 Ampere eMAG 0001 INTL 20190509)
[0.00] ACPI: SSDT 0x00BFF5968838 2D (v02 Ampere eMAG 0001 INTL 20190509)
[0.00] ACPI: SPMI 0x00BFF5968868 41 (v05 ALASKA A M I AMI. )
[0.00] ACPI: APIC 0x00BFF59688B0 000A68 (v04 Ampere eMAG AMP. 0113)
[0.00] ACPI: PCCT 0x00BFF5969318 0005D0 (v01 Ampere eMAG 0003 0113)
[0.00] ACPI: BERT 0x00BFF59698E8 30 (v01 Ampere eMAG 0003 INTL 20190509)
[0.00] ACPI: HEST 0x00BFF5969918 000328 (v01 Ampere eMAG 0003 INTL 20190509)
[0.00] ACPI: SPCR 0x00BFF5969C40 50 (v02 A M I APTIO V 01072009 AMI. 0005000D)
[0.00] ACPI: PPTT 0x00BFF5969C90 000CB8 (v01 Ampere eMAG 0003 0113)
[0.00] ACPI: SPCR: console: pl011,mmio32,0x1260,115200
[0.00] ACPI: NUMA: Failed to initialise from firmware
[0.00] NUMA: Faking a node at [mem 0x9000-0x00bf]
[0.00] NUMA: NODE_DATA [mem 0xbe7d00-0xbeafff]
[0.00] Zone ranges:
[0.00] DMA [mem 0x9000-0x]
[0.00] Normal [mem 0x0001-0x00bf]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00] node 0: [mem 0x9000-0x91ff]
[0.00] node 0: [mem 0x9200-0x928f]
[0.00] node 0: [mem 0x9290-0xfffb]
[0.00] node 0: [mem 0xfffc-0x]
[0.00] node 0: [mem 0x00088000-0x000f]
[0.00] node 0: [mem 0x0088-0x00bff5913fff]
[0.00] node 0: [mem 0x00bff5914000-0x00bff595]
[0.00] node 0: [mem 0x00bff596-0x00bff59d]
[0.00] node 0: [mem 0x00bff59e-0x00bff7de]
[0.00] node 0: [mem 0x00bff7df-0x00bff7e5]
[0.00] node 0: [mem 0x00bff7e6-0x00bff7ff]
[0.00] node 0: [mem 0x00bff800-0x00bf]
[0.00] Initmem setup node 0 [mem 0x9000-0x00bf]
[0.00] psci: probing for conduit method from ACPI.
[0.00] psci: PSCIv1.1 detected in firmware.
[0.00] psci: Using standard PSCI v0.2 function IDs
[0.00] psci: MIGRATE_INFO_TYPE not supported.
[0.00] psci: SMC Calling Convention v1.1
[0.00] random: get_random_bytes called from start_kernel+0xa8/0x478 with crng_init=0
[0.00] percpu: Embedded 25 pages/cpu s62232 r8192 d31976 u102400
[0.00] Detected PIPT I-cache on CPU0
[0.00] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware
[0.00] CPU features: enabling workaround for Speculative Store Bypass Disable
[0.00] CPU features: detected: Kernel page table isolation (KPTI)
[0.00] Built 1 zonelists, mobility grouping on. Total pages: 65995776
[0.00] Policy zone: Normal
[0.00] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.15.0-147-generic root
[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud
** Changed in: linux (Ubuntu) Status: New => Triaged
[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud
The changeset referenced in Comment #8 was part of a 21-part changeset shown here: https://patches.linaro.org/cover/141739/ I was able to backport these changes to our 4.15 kernel and get it booting w/ the MDS workaround enabled. One option is to SRU those (and a number of follow-up Fixes: changes for them). Presumably that's of some security benefit, though I'm not sure how significant of one. It also isn't clear exactly why those changes are needed for compat with the firmware MDS workaround.
[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud
** Also affects: linux (Ubuntu) Importance: Undecided Status: New ** Changed in: qemu (Ubuntu) Status: New => Invalid
[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud
I found that booting an upstream 4.18 kernel in the guest trips this problem, while an upstream 5.4 does not. I bisected and found that this commit seems to be the relevant change:

commit 3b7142752e4bee153df6db4a76ca104ef0d7c0b4 (refs/bisect/bad)
Author: Mark Rutland
Date: Wed Jul 11 14:56:45 2018 +0100

    arm64: convert native/compat syscall entry to C

    Now that the syscall invocation logic is in C, we can migrate the rest of the syscall entry logic over, so that the entry assembly needn't look at the register values at all.

    The SVE reset across syscall logic now unconditionally clears TIF_SVE, but sve_user_disable() will only write back to CPACR_EL1 when SVE is actually enabled.

    Signed-off-by: Mark Rutland
    Reviewed-by: Catalin Marinas
    Reviewed-by: Dave Martin
    Cc: Will Deacon
    Signed-off-by: Will Deacon
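For readers unfamiliar with bisection, the search described above can be sketched as a binary search for the first commit at which the guest boots. This is a simplified model, not how git bisect is implemented: it assumes a linear history, and the commit labels and the `boots` predicate are hypothetical placeholders for "build this kernel and try booting the guest".

```python
def first_fixed(commits, boots):
    """Binary search for the first commit at which boots(commit) is True.

    Assumes commits are in history order, commits[0] fails, commits[-1]
    boots, and every commit after the behaviour change also boots.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] fails, commits[hi] boots
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical linear history; the labels are placeholders, not real commit IDs.
history = [f"c{i}" for i in range(100)]
fix_index = 37  # pretend the behaviour changes at this commit


def boots(commit):
    """Stand-in for building the kernel at 'commit' and booting the guest."""
    return int(commit[1:]) >= fix_index


print(first_fixed(history, boots))  # c37
```

Each step halves the remaining range, so even the tens of thousands of commits between two kernel releases need only a handful of test boots. Note that the search here looks for where the failure goes away rather than where it was introduced.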
[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud
I suppose the next step is to see if we can figure out why this only seems to impact bionic guests. I'll see if I can get access to the system again and try and identify the relevant difference.
[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud
I've confirmed that enabling MDS mitigation causes the problem to occur - it does not occur with it disabled. Firmware version 11.05.116
[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud
I've got access to an eMAG system now, so I'm going to try to reproduce there. I see the MDS mitigation setting, so I can try flipping that as well. I don't have any theory as to why that would help/hurt, but no reason not to try it.
Re: [Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud
On Mon, Jun 14, 2021 at 11:23:06PM -, dann frazier wrote:
> btw, this bug says bionic guests fails to boot - do we know if other
> Ubuntu guests versions are OK?

Thanks for your efforts looking into this so far.

Yeah, I've only seen it in bionic. I tried spawning instances for all currently supported releases, and only bionic failed:

ubuntu@juju-806ee7-stg-proposed-migration-43:~$ for ip in $(openstack server list | awk -F'[= |]+' '/bos01-arm64/ { print $6 }'); do timeout 60s ssh -oUserKnownHostsFile=/dev/null -oStrictHostKeyChecking=no ubuntu@${ip} 'lsb_release -a; uptime'; done
Warning: Permanently added '10.43.128.5' (ECDSA) to the list of known hosts.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.7 LTS
Release: 16.04
Codename: xenial
No LSB modules are available.
08:21:32 up 3 days, 21:33, 0 users, load average: 0.00, 0.00, 0.00
Warning: Permanently added '10.43.128.22' (ECDSA) to the list of known hosts.
Distributor ID: Ubuntu
Description: Ubuntu Impish Indri (development branch)
Release: 21.10
Codename: impish
No LSB modules are available.
08:21:34 up 3 days, 21:33, 0 users, load average: 0.00, 0.00, 0.00
Warning: Permanently added '10.43.128.8' (ECDSA) to the list of known hosts.
Distributor ID: Ubuntu
Description: Ubuntu 21.04
Release: 21.04
Codename: hirsute
No LSB modules are available.
08:21:36 up 3 days, 21:33, 0 users, load average: 0.00, 0.00, 0.00
Warning: Permanently added '10.43.128.4' (ECDSA) to the list of known hosts.
Distributor ID: Ubuntu
Description: Ubuntu 20.10
Release: 20.10
Codename: groovy
No LSB modules are available.
08:21:38 up 3 days, 21:33, 0 users, load average: 0.00, 0.00, 0.00
Warning: Permanently added '10.43.128.15' (ECDSA) to the list of known hosts.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal
No LSB modules are available.
08:21:40 up 3 days, 21:33, 0 users, load average: 0.03, 0.01, 0.00
ssh: connect to host 10.43.128.20 port 22: No route to host  # that's the bionic instance

and I double checked they all ended up on an eMAG.

(I was trying all releases with a reboot loop to try to reproduce the other issue Julian mentioned on ubuntu-devel, but failed to cause it to happen ...)

What do you think about the "MDS mitigation" BIOS setting idea as a difference between the working/broken installations, is it worth trying to get IS to flip that? Seems like it's probably sane to have it on, but it'd maybe be useful info.

Cheers,

--
Iain Lane [ i...@orangesquash.org.uk ]
Debian Developer [ la...@debian.org ]
Ubuntu Developer [ la...@ubuntu.com ]
[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud
IS gave me a copy of the libvirt xml for the guest, and I used it to build a VM as similar as possible (minus some details about storage/NIC). However I was still unable to reproduce on a ThunderX2 host. I asked a contact over at Ampere Computing to see if this symptom was something they'd seen before, but it didn't ring any bells. We are using host-passthrough CPUs, so one difference would be underlying CPU. I noticed that the failing guest log shows KPTI as a cpu feature but my (working) one doesn't. I tried forcing KPTI on (kpti=1) on the guest to compensate, but that also did not trigger the failure. I couldn't really spot any other interesting differences in the log. btw, this bug says bionic guests fails to boot - do we know if other Ubuntu guests versions are OK?
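The kind of log comparison described above (spotting that only one boot log reports KPTI) can be sketched in a few lines of Python. This is just an illustration: `cpu_feature_lines` is a hypothetical helper, and the "working" excerpt is made up for the example. Only the two "CPU features:" messages are taken from the failing boot log in this bug.

```python
def cpu_feature_lines(log_text):
    """Collect the message part of every 'CPU features:' line in a kernel log."""
    marker = "CPU features: "
    found = set()
    for line in log_text.splitlines():
        idx = line.find(marker)
        if idx != -1:
            found.add(line[idx + len(marker):].strip())
    return found


failing_log = """\
[0.00] Detected PIPT I-cache on CPU0
[0.00] CPU features: enabling workaround for Speculative Store Bypass Disable
[0.00] CPU features: detected: Kernel page table isolation (KPTI)
"""

# Hypothetical excerpt standing in for the working guest's log.
working_log = """\
[0.00] Detected PIPT I-cache on CPU0
[0.00] CPU features: enabling workaround for Speculative Store Bypass Disable
"""

only_failing = cpu_feature_lines(failing_log) - cpu_feature_lines(working_log)
print(sorted(only_failing))  # ['detected: Kernel page table isolation (KPTI)']
```

Taking the set difference in both directions gives a quick list of features that differ between hosts, which is easier to scan than a full diff of two boot logs whose addresses and timings differ on every line.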
[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud
I don't have access to an eMAG system at the moment, but I tried and failed to reproduce on a ThunderX2 box by running xenial + hwe kernel + cloud-archive:queens.
[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud
This is very strange. It looks like the guest has managed to start /init in the initramfs. /init is a simple shell script that just creates a few directories/mounts sysfs/proc, before printing "Loading, please wait" - which we do not see. I've tried to simulate this by removing /init from the initramfs, removing /bin/sh, and removing the loader (/lib/ld-linux-aarch64.so.1), but none cause the boot to fail in *this* way. I've asked for some more info from the scalingstack setup - the libvirt xml/qemu command with the hopes of reproducing it in a debug environment.