** Description changed:
- Placeholder
- To be improved
+ [Impact]
+ * Currently, users cannot perform multiple kernel kexec loads on AWS
+ Nitro instances (KVM-based); after the 2nd or 3rd kexec, an initrd
+ corruption is observed, with the following signature:
+
+ Initramfs unpacking failed: junk within compressed archive
+ [...]
+ Kernel panic - not syncing: No working init found.
+ Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
+ CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc7-gpiccoli+ #26
+ Hardware name: Amazon EC2 t3.large/, BIOS 1.0 10/16/2017
+ Call Trace:
+ dump_stack+0x6d/0x9a
+ ? csum_partial_copy_generic+0x150/0x170
+ panic+0x101/0x2e3
+ ? do_execve+0x25/0x30
+ ? rest_init+0xb0/0xb0
+ kernel_init+0xfb/0x100
+ ret_from_fork+0x35/0x40
+
+ * After investigation (see comment 2), we found that the Amazon ena
+ network driver does not provide a shutdown() handler, so the device may
+ still perform a DMA transaction to a previously valid address while the
+ new kernel boots, which would then corrupt kernel memory. The following
+ patch was proposed and fixed the issue, allowing 1000 kexecs to be
+ executed successfully with no issues observed: 428c491332bc ("net: ena:
+ Add PCI shutdown handler to allow safe kexec")
+ [ git.kernel.org/linus/428c491332bc ].
+
+ * Hence, we are hereby requesting an SRU for this patch. It was tested
+ successfully on all supported series (4.4, 4.15 and 5.3) on Amazon Nitro
+ instances, and was reviewed/acked by the ena driver team and by a kexec
+ developer from another distro. It is worth mentioning that we also
+ proposed an upstream multi-vendor discussion about this issue:
+ marc.info/?l=kexec&m=158299605013194
+
+ [Test case]
+
+ * The basic test procedure is to perform multiple kexecs sequentially.
+ AWS does not provide a full console, so in case of failure one can
+ check the instance screenshot, or use pstore/ramoops to collect the
+ dmesg in a preserved memory area after a crash. The commands used to
+ perform kexec are:
+
+ kexec -l <kernel file> --initrd <initrd file> --reuse-cmdline
+ systemctl kexec
+
+ Alternatively, one could use "--append=" instead of "--reuse-cmdline"
+ if a change in the kexec command line is desired; also, to execute the
+ kexec-loaded kernel, both "kexec -e" and "systemctl kexec" are equally
+ valid.
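
As a concrete illustration of the two commands above, here is a minimal sketch that stages the currently running kernel and then executes it. The /boot/vmlinuz-<release> and /boot/initrd.img-<release> file names are an assumption (the usual Ubuntu layout), and the DRY_RUN switch and the run, load_kernel and exec_kernel helpers are hypothetical names introduced only for this example. By default the script only prints the commands; set DRY_RUN=0 to actually run them as root.

```shell
#!/bin/sh
# Sketch only: stage the running kernel for kexec, then jump into it.
# Assumption: Ubuntu-style file names under /boot; override KERNEL/INITRD if needed.
REL="$(uname -r)"
KERNEL="${KERNEL:-/boot/vmlinuz-$REL}"
INITRD="${INITRD:-/boot/initrd.img-$REL}"

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stage the kernel; --reuse-cmdline reuses the command line of the running kernel.
load_kernel() {
    run kexec -l "$KERNEL" --initrd "$INITRD" --reuse-cmdline
}

# Ask systemd to shut the system down and boot the staged kernel.
exec_kernel() {
    run systemctl kexec
}

load_kernel
exec_kernel
```

To change the command line instead of reusing it, replace --reuse-cmdline in load_kernel with --append="..." as described above.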
+
+ * In comment 3 we proposed a script/approach to auto-test kexecs, which
+ was used here to perform 1000 kexecs with the proposed patch.
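
For readers without access to comment 3, the general idea of an automated kexec loop can be sketched as follows. This is NOT the script from comment 3, only a hypothetical illustration: it assumes the script is started once per boot (for example from a systemd unit), and the counter file path, the TARGET and RUN_KEXEC_LOOP variables and the helper names are all made up for this example.

```shell
#!/bin/sh
# Sketch only, NOT the script from comment 3: count kexecs in a file and
# kexec again on each boot until TARGET kexecs have been performed.
# Assumptions: started once per boot; paths and variable names are hypothetical.
COUNT_FILE="${COUNT_FILE:-/var/tmp/kexec-count}"
TARGET="${TARGET:-1000}"

# Print the stored count, treating a missing file as 0.
read_count() {
    if [ -f "$COUNT_FILE" ]; then
        cat "$COUNT_FILE"
    else
        echo 0
    fi
}

# Decide what to do on this boot. Prints "kexec" (and records one more
# kexec in the counter file) or "done" once TARGET has been reached.
next_action() {
    n="$(read_count)"
    if [ "$n" -lt "$TARGET" ]; then
        echo $((n + 1)) > "$COUNT_FILE"
        echo kexec
    else
        echo done
    fi
}

# Only act when explicitly enabled, so that running this file is harmless.
if [ "${RUN_KEXEC_LOOP:-0}" = "1" ] && [ "$(next_action)" = "kexec" ]; then
    REL="$(uname -r)"
    kexec -l "/boot/vmlinuz-$REL" --initrd "/boot/initrd.img-$REL" --reuse-cmdline
    systemctl kexec
fi
```

With such a loop, a boot that hits the initrd corruption panics before the script runs again, so the value left in the counter file indicates how many kexecs were attempted before the failure.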
+
+ [Regression Potential]
+
+ * Although the patch proposed here introduces a PCI shutdown handler, it
+ keeps the remove handler identical and bases shutdown strongly on
+ ena_remove(), changing only the netdev handling, following other
+ upstream drivers. It was extensively tested and presented no issues.
+ Also, it is self-contained and affects only one driver, so other cloud
+ providers and non-cloud environments are not affected by the patch.
+
+ * A potential regression could manifest as a delay or issue on the
+ reboot/shutdown path, and only if the ena driver is in use.
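
Since a regression would only be visible where the ena driver is in use, one way to check a given system is to look for PCI devices bound to that driver in sysfs. The sketch below assumes sysfs is mounted at /sys; SYSFS_ROOT and ena_in_use are hypothetical names used only for this example.

```shell
#!/bin/sh
# Sketch only: report whether any PCI device is bound to the ena driver.
# Assumption: sysfs is mounted at /sys; SYSFS_ROOT is a hypothetical
# override that makes the function easy to try against a test directory.
ena_in_use() {
    drv="${SYSFS_ROOT:-/sys}/bus/pci/drivers/ena"
    # Bound devices appear as entries named by PCI address, e.g. 0000:00:05.0.
    for dev in "$drv"/*:*; do
        [ -e "$dev" ] && return 0
    done
    return 1
}

if ena_in_use; then
    echo "ena driver in use"
else
    echo "ena driver not in use"
fi
```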
** Changed in: linux (Ubuntu Xenial)
Status: Confirmed => In Progress
** Changed in: linux (Ubuntu Bionic)
Status: Confirmed => In Progress
** Changed in: linux (Ubuntu Eoan)
Status: Confirmed => In Progress
** Changed in: linux (Ubuntu Focal)
Status: Confirmed => In Progress
--
https://bugs.launchpad.net/bugs/1869948
Title:
Multiple Kexec in AWS Nitro instances fail