[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud

2021-08-16 Thread dann frazier
Upstream has been working with me to determine what is going on here.
We believe the firmware is piggy-backing on the
ARM_SMCCC_ARCH_WORKAROUND calls and clobbering some of the registers in
the x4-x17 range. The patch series mentioned in Comment #9 happens to
avoid consuming those registers, which hides the issue. Apparently
clobbering those registers was previously OK, but the SMCCC v1.1 update
mandates that x4-x17 be preserved, which the firmware authors may have
missed.
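
As an illustration of the diagnosis above, here is a minimal Python sketch
of the check it implies: compare the x4-x17 registers before and after a
firmware call and report any that changed. The register snapshots and the
`clobbered` helper are hypothetical and the values are made up; this is
not code from the kernel or the firmware.

```python
# Registers that, per the comment above, SMCCC v1.1 expects a call to preserve.
PRESERVED = [f"x{n}" for n in range(4, 18)]  # x4 .. x17


def clobbered(before, after):
    """Return the preserved registers whose values differ across the call."""
    return [reg for reg in PRESERVED if before[reg] != after[reg]]


# Hypothetical snapshots taken around a firmware call (values are made up).
before = {f"x{n}": n for n in range(18)}
# x0 may legitimately change (it carries the return value); x5 should not.
after = dict(before, x0=1, x5=0xDEAD)

print(clobbered(before, after))  # prints ['x5']
```

A kernel that never relies on x4-x17 surviving the call would not notice
the difference, which matches the observation that the patch series only
hides the issue.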

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1931728

Title:
  [scalingstack bos01] bionic (arm64) instances always fail to boot on
  eMAGs in this cloud

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1931728/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud

2021-07-19 Thread dann frazier
Note that bare metal boots also fail the same way w/ the MDS workaround
enabled (failing VM boots all had a newer kernel running on the host):

EFI stub: Booting Linux Kernel...
EFI stub: EFI_RNG_PROTOCOL unavailable, no randomness supplied
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...
[0.00] Booting Linux on physical CPU 0x00 [0x503f0002]
[0.00] Linux version 4.15.0-147-generic (buildd@bos02-arm64-076) (gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)) #151-Ubuntu SMP Fri Jun 18 19:18:37 UTC 2021 (Ubuntu 4.15.0-147.151-generic 4.15.18)
[0.00] efi: Getting EFI parameters from FDT:
[0.00] efi: EFI v2.70 by American Megatrends
[0.00] efi:  ACPI 2.0=0xbff596  SMBIOS 3.0=0xbff686fd98  ESRT=0xbff1a3a018
[0.00] esrt: Reserving ESRT space from 0x00bff1a3a018 to 0x00bff1a3a078.
[0.00] ACPI: Early table checksum verification disabled
[0.00] ACPI: RSDP 0x00BFF596 24 (v02 ALASKA)
[0.00] ACPI: XSDT 0x00BFF5960028 94 (v01 ALASKA A M I 01072009 AMI  00010013)
[0.00] ACPI: FACP 0x00BFF59600C0 000114 (v06 Ampere eMAG 0003 INTL 20190509)
[0.00] ACPI: DSDT 0x00BFF59601D8 0077CD (v05 ALASKA A M I 0001 INTL 20190509)
[0.00] ACPI: FIDT 0x00BFF59679A8 9C (v01 ALASKA A M I 01072009 AMI  00010013)
[0.00] ACPI: DBG2 0x00BFF5967A48 61 (v00 Ampere eMAG  INTL 20190509)
[0.00] ACPI: GTDT 0x00BFF5967AB0 000108 (v02 Ampere eMAG 0001 INTL 20190509)
[0.00] ACPI: IORT 0x00BFF5967BB8 000BCC (v00 Ampere eMAG  INTL 20190509)
[0.00] ACPI: MCFG 0x00BFF5968788 AC (v01 Ampere eMAG 0001 INTL 20190509)
[0.00] ACPI: SSDT 0x00BFF5968838 2D (v02 Ampere eMAG 0001 INTL 20190509)
[0.00] ACPI: SPMI 0x00BFF5968868 41 (v05 ALASKA A M I  AMI. )
[0.00] ACPI: APIC 0x00BFF59688B0 000A68 (v04 Ampere eMAG  AMP. 0113)
[0.00] ACPI: PCCT 0x00BFF5969318 0005D0 (v01 Ampere eMAG 0003  0113)
[0.00] ACPI: BERT 0x00BFF59698E8 30 (v01 Ampere eMAG 0003 INTL 20190509)
[0.00] ACPI: HEST 0x00BFF5969918 000328 (v01 Ampere eMAG 0003 INTL 20190509)
[0.00] ACPI: SPCR 0x00BFF5969C40 50 (v02 A M I  APTIO V  01072009 AMI. 0005000D)
[0.00] ACPI: PPTT 0x00BFF5969C90 000CB8 (v01 Ampere eMAG 0003  0113)
[0.00] ACPI: SPCR: console: pl011,mmio32,0x1260,115200
[0.00] ACPI: NUMA: Failed to initialise from firmware
[0.00] NUMA: Faking a node at [mem 0x9000-0x00bf]
[0.00] NUMA: NODE_DATA [mem 0xbe7d00-0xbeafff]
[0.00] Zone ranges:
[0.00]   DMA  [mem 0x9000-0x]
[0.00]   Normal   [mem 0x0001-0x00bf]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x9000-0x91ff]
[0.00]   node   0: [mem 0x9200-0x928f]
[0.00]   node   0: [mem 0x9290-0xfffb]
[0.00]   node   0: [mem 0xfffc-0x]
[0.00]   node   0: [mem 0x00088000-0x000f]
[0.00]   node   0: [mem 0x0088-0x00bff5913fff]
[0.00]   node   0: [mem 0x00bff5914000-0x00bff595]
[0.00]   node   0: [mem 0x00bff596-0x00bff59d]
[0.00]   node   0: [mem 0x00bff59e-0x00bff7de]
[0.00]   node   0: [mem 0x00bff7df-0x00bff7e5]
[0.00]   node   0: [mem 0x00bff7e6-0x00bff7ff]
[0.00]   node   0: [mem 0x00bff800-0x00bf]
[0.00] Initmem setup node 0 [mem 0x9000-0x00bf]
[0.00] psci: probing for conduit method from ACPI.
[0.00] psci: PSCIv1.1 detected in firmware.
[0.00] psci: Using standard PSCI v0.2 function IDs
[0.00] psci: MIGRATE_INFO_TYPE not supported.
[0.00] psci: SMC Calling Convention v1.1
[0.00] random: get_random_bytes called from start_kernel+0xa8/0x478 with crng_init=0
[0.00] percpu: Embedded 25 pages/cpu s62232 r8192 d31976 u102400
[0.00] Detected PIPT I-cache on CPU0
[0.00] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware
[0.00] CPU features: enabling workaround for Speculative Store Bypass Disable
[0.00] CPU features: detected: Kernel page table isolation (KPTI)
[0.00] Built 1 zonelists, mobility grouping on.  Total pages: 65995776
[0.00] Policy zone: Normal
[0.00] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.15.0-147-generic 

[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud

2021-06-25 Thread dann frazier
** Changed in: linux (Ubuntu)
   Status: New => Triaged


[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud

2021-06-24 Thread dann frazier
The changeset referenced in Comment #8 was part of a 21-part series,
shown here:
  https://patches.linaro.org/cover/141739/

I was able to backport these changes to our 4.15 kernel and get it
booting w/ the MDS workaround enabled. One option is to SRU those (and a
number of follow-up Fixes: changes for them). Presumably that's of some
security benefit, though I'm not sure how significant of one. It also
isn't clear exactly why those changes are needed for compat with the
firmware MDS workaround.


[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud

2021-06-24 Thread dann frazier
** Also affects: linux (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: qemu (Ubuntu)
   Status: New => Invalid


[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud

2021-06-23 Thread dann frazier
I found that booting an upstream 4.18 kernel in the guest trips this
problem, while an upstream 5.4 does not. I bisected and found that this
commit seems to be the relevant change:

commit 3b7142752e4bee153df6db4a76ca104ef0d7c0b4 (refs/bisect/bad)
Author: Mark Rutland 
Date:   Wed Jul 11 14:56:45 2018 +0100

arm64: convert native/compat syscall entry to C

Now that the syscall invocation logic is in C, we can migrate the rest
of the syscall entry logic over, so that the entry assembly needn't look
at the register values at all.

The SVE reset across syscall logic now unconditionally clears TIF_SVE,
but sve_user_disable() will only write back to CPACR_EL1 when SVE is
actually enabled.

Signed-off-by: Mark Rutland 
Reviewed-by: Catalin Marinas 
Reviewed-by: Dave Martin 
Cc: Will Deacon 
Signed-off-by: Will Deacon 
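
For readers unfamiliar with bisection, the search above can be sketched
in a few lines of Python. This is only an illustration of the technique,
not the procedure actually used here: the commit names and the
`changed` predicate are hypothetical, and the sketch assumes the commits
are ordered oldest to newest with exactly one point where the behaviour
flips.

```python
from typing import Callable, Sequence


def first_changed(commits: Sequence[str], changed: Callable[[str], bool]) -> str:
    """Return the first commit for which changed() is True.

    Assumes changed(commits[0]) is False, changed(commits[-1]) is True,
    and the result flips exactly once along the sequence.
    """
    lo, hi = 0, len(commits) - 1  # lo: known old behaviour, hi: known new
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if changed(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history of ten commits where the behaviour flips at "c6".
history = [f"c{n}" for n in range(10)]
print(first_changed(history, lambda c: int(c[1:]) >= 6))  # prints c6
```

Each step halves the remaining range, so even the thousands of commits
between two kernel releases need only a dozen or so test boots.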


[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud

2021-06-22 Thread dann frazier
I suppose the next step is to see if we can figure out why this only
seems to impact bionic guests. I'll see if I can get access to the
system again and try and identify the relevant difference.


[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud

2021-06-17 Thread dann frazier
I've confirmed that enabling MDS mitigation causes the problem to occur
- it does not occur with it disabled.

Firmware version 11.05.116


[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud

2021-06-16 Thread dann frazier
I've got access to an eMAG system now, so I'm going to try to reproduce
there. I see the MDS mitigation setting, so I can try flipping that as
well. I don't have any theory as to why that would help/hurt, but no
reason not to try it.


Re: [Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud

2021-06-15 Thread Iain Lane
On Mon, Jun 14, 2021 at 11:23:06PM -, dann frazier wrote:
> btw, this bug says bionic guests fail to boot - do we know if other
> Ubuntu guest versions are OK?

Thanks for your efforts looking into this so far.

Yeah, I've only seen it in bionic. I tried spawning instances for all 
currently supported releases, and only bionic failed:

ubuntu@juju-806ee7-stg-proposed-migration-43:~$ for ip in $(openstack server list | awk -F'[= |]+' '/bos01-arm64/ { print $6 }'); do timeout 60s ssh -oUserKnownHostsFile=/dev/null -oStrictHostKeyChecking=no ubuntu@${ip} 'lsb_release -a; uptime'; done
Warning: Permanently added '10.43.128.5' (ECDSA) to the list of known hosts.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.7 LTS
Release:        16.04
Codename:       xenial
No LSB modules are available.
 08:21:32 up 3 days, 21:33,  0 users,  load average: 0.00, 0.00, 0.00
Warning: Permanently added '10.43.128.22' (ECDSA) to the list of known hosts.
Distributor ID: Ubuntu
Description:    Ubuntu Impish Indri (development branch)
Release:        21.10
Codename:       impish
No LSB modules are available.
 08:21:34 up 3 days, 21:33,  0 users,  load average: 0.00, 0.00, 0.00
Warning: Permanently added '10.43.128.8' (ECDSA) to the list of known hosts.
Distributor ID: Ubuntu
Description:    Ubuntu 21.04
Release:        21.04
Codename:       hirsute
No LSB modules are available.
 08:21:36 up 3 days, 21:33,  0 users,  load average: 0.00, 0.00, 0.00
Warning: Permanently added '10.43.128.4' (ECDSA) to the list of known hosts.
Distributor ID: Ubuntu
Description:    Ubuntu 20.10
Release:        20.10
Codename:       groovy
No LSB modules are available.
 08:21:38 up 3 days, 21:33,  0 users,  load average: 0.00, 0.00, 0.00
Warning: Permanently added '10.43.128.15' (ECDSA) to the list of known hosts.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:        20.04
Codename:       focal
No LSB modules are available.
 08:21:40 up 3 days, 21:33,  0 users,  load average: 0.03, 0.01, 0.00
ssh: connect to host 10.43.128.20 port 22: No route to host  # that's the bionic instance

and I double checked they all ended up on an eMAG.
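
If this check needs repeating, the captured output can be condensed into
one line per instance. The following Python sketch is only a convenience
under stated assumptions: the `summarise` helper is hypothetical, and it
relies on the exact ssh and lsb_release message wording seen in the
output above.

```python
def summarise(output):
    """Map each instance IP to its codename, or None if ssh failed."""
    results = {}
    current = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Warning: Permanently added '"):
            # ssh connected; the IP is the first single-quoted field.
            current = line.split("'")[1]
            results[current] = None
        elif line.startswith("Codename:") and current is not None:
            results[current] = line.split(":", 1)[1].strip()
        elif line.startswith("ssh: connect to host "):
            # Connection failure: the fifth word is the IP.
            results[line.split()[4]] = None
            current = None
    return results


# Shortened sample taken from the output above.
sample = """\
Warning: Permanently added '10.43.128.5' (ECDSA) to the list of known hosts.
Codename:   xenial
ssh: connect to host 10.43.128.20 port 22: No route to host
"""
print(summarise(sample))
```

Instances that map to None are the ones that never answered on port 22,
which here is only the bionic one.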

(I was trying all releases with a reboot loop to try to reproduce the 
other issue Julian mentioned on ubuntu-devel, but failed to cause it to 
happen ...)

What do you think about the "MDS mitigation" BIOS setting idea as a 
difference between the working/broken installations, is it worth trying 
to get IS to flip that?  Seems like it's probably sane to have it on, 
but it'd maybe be useful info.

Cheers,

-- 
Iain Lane  [ i...@orangesquash.org.uk ]
Debian Developer   [ la...@debian.org ]
Ubuntu Developer   [ la...@ubuntu.com ]


[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud

2021-06-14 Thread dann frazier
IS gave me a copy of the libvirt XML for the guest, and I used it to
build a VM as similar as possible (minus some details about
storage/NIC). However, I was still unable to reproduce on a ThunderX2
host. I asked a contact over at Ampere Computing whether this symptom
was something they'd seen before, but it didn't ring any bells. We are
using host-passthrough CPUs, so one difference would be the underlying
CPU. I noticed that the failing guest log shows KPTI as a CPU feature
but my (working) one doesn't. I tried forcing KPTI on (kpti=1) in the
guest to compensate, but that also did not trigger the failure. I
couldn't really spot any other interesting differences in the log.

btw, this bug says bionic guests fail to boot - do we know if other
Ubuntu guest versions are OK?


[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud

2021-06-11 Thread dann frazier
I don't have access to an eMAG system at the moment, but I tried and
failed to reproduce on a ThunderX2 box by running xenial + hwe kernel +
cloud-archive:queens.


[Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud

2021-06-11 Thread dann frazier
This is very strange. It looks like the guest has managed to start /init
in the initramfs. /init is a simple shell script that just creates a few
directories and mounts sysfs and proc before printing "Loading, please
wait" - which we do not see. I've tried to simulate this by removing
/init from the initramfs, removing /bin/sh, and removing the loader
(/lib/ld-linux-aarch64.so.1), but none of these cause the boot to fail
in *this* way. I've asked for some more info from the scalingstack
setup - the libvirt XML and qemu command line - in the hope of
reproducing it in a debug environment.
