Bug#931644: Buster kernel entropy pool too low on VM boot
I've upgraded my VMs to the 10.3 point release and can confirm that cryptographic services (SSH and others) start quite rapidly now on system boot. Thanks, all! -Michael
Bug#931644: Buster kernel entropy pool too low on VM boot
Apologies for the late reply. I can certainly test on some of my VMs if you're willing to provide packages. Having read Linus' explanation of deriving jitter entropy from the CPU's cycle counter: while I'm no cryptographer, I have some concerns about the quality of the entropy this patch will generate on hypervisors that virtualize the time stamp counter. In my environment, I know I can instruct Xen to never virtualize the TSC (https://xenbits.xen.org/docs/unstable/man/xen-tscmode.7.html), which would probably benefit the patch, but AWS and other public-cloud users may not have that option. -Michael
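For reference, the Xen behavior mentioned above is a per-guest setting. A minimal sketch of the relevant domU config fragment, assuming the xl toolstack and the tsc_mode semantics documented at the link above (the value shown is illustrative, not from this report):

```
# Hypothetical domU .cfg fragment: tsc_mode = "native" asks Xen not to
# emulate RDTSC for this guest, exposing the raw hardware cycle counter.
# Other documented values: "default", "always_emulate", "native_paravirt".
tsc_mode = "native"
```

Whether the jitter patch benefits from this would still need measuring; "native" also has correctness caveats after live migration, per the man page.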
Bug#931644: Buster kernel entropy pool too low on VM boot
> The release notes for buster do mention this issue and provide a
> link to:
>
> https://wiki.debian.org/BoottimeEntropyStarvation
>
> which has your Haveged solution as one of its suggestions.

D'oh! Serves me right for just skimming the release notes, then.

After doing some in-depth reading, this is a problem for the Linux community at large. Wow. While I'm glad the kernel is getting choosier about where and how it harvests entropy, and I can personally live with the ~30 seconds added to VM boot times, it could be painful to, for example, bootstrap a Linux guest on AWS for the first time and wait for the initial SSH keys to be created. It will be interesting to see how this evolves over time.

In the meantime, as this is not actually a kernel defect, I suppose this bug can be closed. -Michael
Bug#931644: Buster kernel entropy pool too low on VM boot
Package: linux-image-4.19.0-5-amd64
Version: 4.19.0-5

Issue:
==
After upgrading to Debian Buster, Xen PV guests' entropy pool is too low to start cryptographic services in a timely manner. This results in 30+ second delays in the startup of services such as SSH. If I connect to the VM's virtual VNC console and move the mouse during boot, the system very rapidly collects entropy, and crypto-dependent services like SSH start with no delay. The symptoms are identical to a bug I reported for Debian 9 (bug #897917).

Workaround:
===
Install `haveged`. If another RNG feeds the entropy pool, the VM and its services boot as expected.
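As a quick way to observe the starvation described above, the kernel's current entropy estimate can be read from procfs. A minimal sketch, assuming a Linux guest with the standard /proc interface:

```shell
#!/bin/sh
# Print the kernel's entropy estimate. Values near zero during early boot
# match the symptoms above; a pool fed by haveged sits much higher.
entropy=$(cat /proc/sys/kernel/random/entropy_avail)
echo "entropy_avail: ${entropy}"
```

Watching this value while moving the mouse in the VNC console should show it climbing, consistent with the behavior reported.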
Bug#903821: 4.9.110-1 Xen PV boot workaround
I've tested the workaround successfully: I added `pti=off` to my kernel's boot arguments, updated GRUB, and the system started as intended.

Benoît,

Just to be sure, since you're loading your guests' kernels directly like that, you're passing pti=off via the `extra` config line in your domU config files, right? I.e.:

extra = 'elevator=noop pti=off'

On Tue, 2018-07-17 at 00:39 +0200, Benoît Tonnerre wrote:
> Hi,
>
> I tested this workaround: I confirm that it works on a Xen host, but
> not on a Xen guest.
> If you try to start a VM with the latest kernel, i.e. these parameters
> in the cfg file:
>
> #
> # Kernel + memory size
> #
> kernel = '/boot/vmlinuz-4.9.0-7-amd64'
> extra = 'elevator=noop'
> ramdisk = '/boot/initrd.img-4.9.0-7-amd64'
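To double-check that the workaround actually took effect inside the guest, the running kernel can be inspected. A small sketch (the sysfs vulnerabilities file only exists on newer x86 kernel builds, hence the fallback to the command line):

```shell
#!/bin/sh
# Confirm whether pti=off reached the running kernel.
if [ -r /sys/devices/system/cpu/vulnerabilities/meltdown ]; then
  # Typically reads "Vulnerable" when PTI is disabled,
  # "Mitigation: PTI" when page-table isolation is active.
  cat /sys/devices/system/cpu/vulnerabilities/meltdown
fi
# Verify the option made it onto the boot command line at all.
if grep -qw 'pti=off' /proc/cmdline; then
  echo "pti=off is on the kernel command line"
else
  echo "pti=off not found on the kernel command line"
fi
```

If the `extra =` line in the domU cfg was not updated, the second check will report the option missing even after a reboot.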
Bug#903767: Stretch kernel 4.9.110-1 boot-loops with Xen Hypervisor 4.8
This also apparently affects at least PV guests. Upgrading a PV domU to kernel 4.9.110-1 and rebooting yields the following output via xl's console:

Loading Linux 4.9.0-6-amd64 ...
Loading Linux 4.9.0-7-amd64 ...
Loading initial ramdisk ...
[    0.128044] dmi: Firmware registration failed.
[    1.408778] dmi-sysfs: dmi entry is absent.
[    1.427758] general protection fault: [#1] SMP
[    1.427767] Modules linked in:
[    1.427778] CPU: 0 PID: 1 Comm: init Not tainted 4.9.0-7-amd64 #1 Debian 4.9.110-1
[    1.427789] task: 88000ee36040 task.stack: c90040068000
[    1.427798] RIP: e030:[] [] ret_from_fork+0x2d/0x70
[    1.427815] RSP: e02b:c9004006bf50 EFLAGS: 00010006
[    1.427823] RAX: 0002175f5000 RBX: 816076d0 RCX: ea310e1f
[    1.427833] RDX: 0002 RSI: 0002 RDI: c9004006bf58
[    1.427840] RBP: R08: R09: 88000a9e5000
[    1.427847] R10: 8080808080808080 R11: fefefefefefefeff R12:
[    1.427854] R13: 179f3966a73fde7b R14: 06f99905e8f3edfb R15: cf60f5f9fd8e4751
[    1.427866] FS: () GS:88000fc0() knlGS:
[    1.427873] CS: e033 DS: ES: CR0: 80050033
[    1.427879] CR2: 7ffd37e72eb9 CR3: 0a9f4000 CR4: 00042660
[    1.427889] Stack:
[    1.427943] Call Trace:
[    1.427951] Code: c7 e8 b8 fe a8 ff 48 85 db 75 2f 48 89 e7 e8 5b ed 9e ff 50 90 0f 20 d8 65 48 0b 04 25 e0 02 01 00 78 08 65 88 04 25 e7 02 01 00 <0f> 22 d8 58 66 66 90 66 66 90 e9 c1 07 00 00 4c 89 e7 eb 11 e8
[    1.428148] RIP [] ret_from_fork+0x2d/0x70
[    1.428160] RSP
[    1.428168] ---[ end trace cb1a96e88a7c4794 ]---
[    1.428298] Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
[    1.428316] Kernel Offset: disabled

(Some register values were lost to console line-wrapping and progress-meter redraws; the trace is reproduced as captured.)

Note that the guest bootloader being used here is pvGRUB; not sure whether that is relevant, but I thought I would include it. Rolling the domU back to the previous kernel (i.e. selecting the previous kernel via pvGRUB) allows the VM to boot normally.
Bug#903767: Stretch kernel 4.9.110-1 boot-loops with Xen Hypervisor 4.8
Package: linux-image-4.9.0-7-amd64
Version: 4.9.110-1

Description:
After installing the latest Stretch kernel, 4.9.110-1, on a server running Xen Hypervisor 4.8, bootstrapping the kernel fails. GRUB loads the hypervisor as normal, which then attempts to load the Dom0 kernel. Once that process starts, the system simply reboots. Nothing is output to the console after Xen does its thing (the screen goes black), so I cannot offer any insight into what the kernel may be doing before it fails.

If I roll back to the previous Stretch kernel, linux-image-4.9.0-6-amd64 (4.9.88-1+deb9u1), the hypervisor starts the Dom0 kernel as normal and the system boots successfully.

Setup:
==
Xen Hypervisor version: 4.8.3+xsa267+shim4.10.1+xsa267-1+deb9u9
Kernel version: linux-image-4.9.0-7-amd64 (4.9.110-1)
Bug#897917: Stretch kernel 4.9.88-1 breaks startup of RPC, KDC services
On further investigation, Arne's absolutely right. I upgraded the kernel back to 4.9.88-1 from Debian Security and installed 'haveged' (another random number generator). Everything started quickly and normally after a reboot. Turns out I hadn't noticed this on any of my other virtual servers because they're all running haveged anyway.

So, Workarounds:
1. Roll back the kernel to 4.9.82-1+deb9u3, OR
2. Install another RNG, such as 'haveged'.
Bug#897917: Stretch kernel 4.9.88-1 breaks startup of RPC, KDC services
Package: linux-image-4.9.0-6-amd64
Version: 4.9.88-1

Issue:
==
Kernel "linux-image-4.9.0-6-amd64," version 4.9.88-1, breaks systemd startup of RPC and Kerberos KDC services.

Description:
After upgrading to the latest Stretch kernel (4.9.88-1), RPC and KDC services time out during the boot process. This issue is being seen on a Kerberos KDC that is also an NFS client. Kerberos authentication and encryption are used with NFS in this environment, and this KDC provides the Kerberos services for that to work. The network is functional prior to these services starting, which is proper. After the server has booted completely, I can issue `service krb5-kdc restart` and, after a short delay, the KDC service starts normally.

Not sure if this is a kernel bug, a systemd bug, or something else. Since the kernel package was the only thing upgraded before the issue started, I'm leaning toward the kernel.

Relevant output from /var/log/syslog:
-
May  4 09:03:17 systemd[1]: rpc-svcgssd.service: Start operation timed out. Terminating.
May  4 09:03:17 systemd[1]: Failed to start RPC security service for NFS server.
May  4 09:03:17 systemd[1]: rpc-svcgssd.service: Unit entered failed state.
May  4 09:03:17 systemd[1]: rpc-svcgssd.service: Failed with result 'timeout'.
May  4 09:03:17 systemd[1]: rpc-gssd.service: Start operation timed out. Terminating.
May  4 09:03:17 systemd[1]: Failed to start RPC security service for NFS client and server.
May  4 09:03:17 systemd[1]: rpc-gssd.service: Unit entered failed state.
May  4 09:03:17 systemd[1]: rpc-gssd.service: Failed with result 'timeout'.
May  4 09:03:20 systemd[1]: krb5-kdc.service: Start operation timed out. Terminating.
May  4 09:03:20 systemd[1]: Failed to start Kerberos 5 Key Distribution Center.
May  4 09:03:20 systemd[1]: krb5-kdc.service: Unit entered failed state.
May  4 09:03:20 systemd[1]: krb5-kdc.service: Failed with result 'timeout'.
Workaround:
===
Rolling back to Stretch kernel 4.9.82-1+deb9u3 fixes the issue.

Setup:
==
1. KDC package: krb5-kdc 1.15-1+deb9u1
2. NFS package: nfs-common 1:1.3.4-2.1
3. Kernel: linux-image-4.9.0-6-amd64 4.9.88-1
4. Systemd version: 232-25+deb9u3
5. Server is a 64-bit Xen PV domU
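If rolling the kernel back is not an option, the systemd start timeout itself can be relaxed so the services survive the wait for entropy. A hypothetical drop-in (the path and the 300-second value are illustrative, not from this report):

```
# /etc/systemd/system/krb5-kdc.service.d/timeout.conf
# Give the KDC more time to block on the RNG before systemd terminates it.
[Service]
TimeoutStartSec=300
```

After creating the drop-in, `systemctl daemon-reload` makes systemd pick it up. This only papers over the delay; a userspace entropy daemon like haveged addresses the cause.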
Bug#887676: Stretch kernel 4.9.0-5-amd64 breaks Xen PVH support
Aha! Okay, that certainly explains some things. I didn't realize PVH was a "use at your own risk" tech preview in Xen 4.8 and kernel 4.9. Luckily, my infrastructure didn't rely on PVH to start with; I can go back to conventional PV or HVM with no problem. Thanks for investigating! -Michael
Bug#887676: Stretch kernel 4.9.0-5-amd64 breaks Xen PVH support
Package: linux-image-amd64
Version: 4.9+80+deb9u

Not sure if this needs to go to the Debian Kernel team or the Debian Xen team, so please feel free to reclassify as necessary. I'm leaning toward this being a kernel bug, as the Xen packages had not changed when this issue was introduced; only the kernel changed.

Description:
The latest Stretch kernel (4.9.0-5-amd64), released per DSA-4082-1, breaks Xen PVH domU support. Booting the domU with pvh = '1' set in its config file gets the boot process as far as pyGRUB, but once the kernel itself begins to boot, the domU immediately crashes. Even with 'quiet' removed from the kernel's boot options, the kernel outputs nothing before it dies. Thinking this was possibly the work of the Meltdown mitigation, I added `pti=off` to the domU's kernel boot options. No effect.

Steps to reproduce:
===
1. Run a Xen PV domU in PVH mode (pvh = '1' in the domU config file),
2. Upgrade the domU's kernel from 4.9.0-4-amd64 to 4.9.0-5-amd64,
3. Reboot the domU using the latest kernel.

Workaround:
===
I could find no workaround with this kernel. I either had to disable Xen PVH mode (set pvh = '0') for the domU or roll back to kernel 4.9.0-4-amd64.

Setup:
==
- Dom0 kernel: 4.9.0-5-amd64
- DomU kernel: 4.9.0-5-amd64
- Xen hypervisor version: xen-hypervisor-4.8-amd64 (4.8.2+xsa245-0+deb9u1)
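For anyone reproducing this, the mode toggle lives in the domU config file. A minimal sketch of the relevant lines, assuming the xl config syntax used with Xen 4.8 (the kernel path is illustrative):

```
# Hypothetical domU .cfg excerpt: pvh = '1' selects the PVH tech-preview
# mode that crashes under 4.9.0-5-amd64; pvh = '0' is the fall-back to
# conventional PV described in the workaround above.
pvh    = '1'
kernel = '/boot/vmlinuz-4.9.0-5-amd64'   # path assumed for illustration
```

Flipping the value and running `xl create` against the same config is enough to switch between the crashing and working configurations.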