Bug#931644: Buster kernel entropy pool too low on VM boot

2020-02-08 Thread Michael J. Redd
I've upgraded my VMs to the 10.3 point release and can confirm that
cryptographic services (SSH and others) start quite rapidly now on
system boot.

Thanks, all!

-Michael



Bug#931644: Buster kernel entropy pool too low on VM boot

2020-02-06 Thread Michael J. Redd
Apologies for the late reply. I can certainly test on some of my VMs if
you're willing to provide packages.

Reading over Linus' explanation of deriving jitter from the CPU's cycle
counter, while I'm no cryptographer, I might have some concerns about
the quality of the entropy that will be generated by this patch on
hypervisors that virtualize the time stamp counter. In my environment,
I know I can instruct Xen to never virtualize the TSC (
https://xenbits.xen.org/docs/unstable/man/xen-tscmode.7.html), which
would probably benefit the patch, but AWS and other public cloud users
may not have that option.
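
For reference, the Xen-side setting I have in mind is per-domU. A sketch, with the option name taken from the xen-tscmode man page linked above:

# domU cfg fragment: always execute RDTSC against the hardware TSC
# rather than emulating it
tsc_mode = "native"

(The default, tsc_mode = "default", lets Xen decide when to emulate.)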

-Michael



Bug#931644: Buster kernel entropy pool too low on VM boot

2019-07-11 Thread Michael J. Redd


> The release notes for buster do mention this issue and provide a
> link to:
> 
> https://wiki.debian.org/BoottimeEntropyStarvation
> 
> which has your Haveged solution as one of its suggestions.
> 

D'oh! Serves me right for just skimming the release notes, then. After
doing some in-depth reading, this is a problem for the Linux community
at large. Wow. While I'm glad the kernel's getting choosier about where
and how to harvest entropy and can personally live with the ~30 seconds
added to VM boot times, it could be painful to, for example, bootstrap
a Linux guest on AWS for the first time and wait for the initial SSH
keys to be created.

Will be interesting to see how this evolves over time. In the meantime,
as this is not actually a kernel defect, I suppose this bug can be
closed.

-Michael



Bug#931644: Buster kernel entropy pool too low on VM boot

2019-07-08 Thread Michael J. Redd
Package: linux-image-4.19.0-5-amd64
Version: 4.19.0-5

Issue:
==

After upgrading to Debian Buster, Xen PV guests' entropy pool is too
low to start cryptographic services in a timely manner. This results in
30+ second delays in the startup of services such as SSH. If I connect
to the VM's virtual VNC console and move the mouse during boot, the
system very rapidly collects entropy and crypto-dependent services like
SSH start with no delay.

The symptoms are identical to a bug I reported for Debian 9 (bug
#897917).

Workaround:
===

Install `haveged`. If another RNG feeds the entropy pool, the VM and
its services boot as expected.
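
For anyone else hitting this, the starvation is easy to observe from the guest itself. A sketch, using the standard procfs counter:

```shell
# Show the kernel's available entropy estimate; on an affected idle PV
# guest this tends to sit near zero until something feeds the pool.
cat /proc/sys/kernel/random/entropy_avail

# Install haveged; Debian enables and starts the service automatically.
apt-get install haveged

# Re-check: the counter should climb quickly once haveged is running.
cat /proc/sys/kernel/random/entropy_avail
```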



Bug#903821: 4.9.110-1 Xen PV boot workaround

2018-07-16 Thread Michael J. Redd
I've tested the workaround successfully. Added `pti=off` to my kernel's
boot arguments, updated GRUB, and it started as intended.
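
For the record, the dom0-side change amounts to the usual GRUB edit; a sketch, assuming the stock Debian layout:

# /etc/default/grub: add pti=off to the kernel options
# (keep whatever options are already listed there)
GRUB_CMDLINE_LINUX="pti=off"

# then regenerate grub.cfg
update-grub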

Benoît,

Just to be sure, since you're loading your guests' kernels directly
like that, you're passing pti=off via the `extra` config line in your
domU config files, right? I.e.

extra = 'elevator=noop pti=off'



On Tue, 2018-07-17 at 00:39 +0200, Benoît Tonnerre wrote:
> Hi, 
> 
> I tested this workaround: I confirm that it works on a Xen host, but
> not on a Xen guest.
> If you try to start a VM with the latest kernel, i.e. these parameters
> in the cfg file:
> 
> #
> #  Kernel + memory size
> #
> kernel  = '/boot/vmlinuz-4.9.0-7-amd64'
> extra   = 'elevator=noop'
> ramdisk = '/boot/initrd.img-4.9.0-7-amd64'



Bug#903767: Stretch kernel 4.9.110-1 boot-loops with Xen Hypervisor 4.8

2018-07-14 Thread Michael J. Redd
This also apparently affects at least PV guests. Upgrading a PV domU to
kernel 4.9.110-1 and rebooting yields the following output via xl's
console:


Loading Linux 4.9.0-6-amd64 ...
Loading Linux 4.9.0-7-amd64 ...
Loading initial ramdisk ...
[0.128044] dmi: Firmware registration failed.
[1.408778] dmi-sysfs: dmi entry is absent.
[1.427758] general protection fault:  [#1] SMP
[1.427767] Modules linked in:
[1.427778] CPU: 0 PID: 1 Comm: init Not tainted 4.9.0-7-amd64 #1 Debian 4.9.110-1
[1.427789] task: 88000ee36040 task.stack: c90040068000
[1.427798] RIP: e030:[]  [] ret_from_fork+0x2d/0x70
[1.427815] RSP: e02b:c9004006bf50  EFLAGS: 00010006
[1.427823] RAX: 0002175f5000 RBX: 816076d0 RCX: ea310e1f
[1.427833] RDX: 0002 RSI: 0002 RDI: c9004006bf58
[1.427840] RBP:  R08:  R09: 88000a9e5000
[1.427847] R10: 8080808080808080 R11: fefefefefefefeff R12: 
[1.427854] R13: 179f3966a73fde7b R14: 06f99905e8f3edfb R15: cf60f5f9fd8e4751
[1.427866] FS:  () GS:88000fc0() knlGS:
[1.427873] CS:  e033 DS:  ES:  CR0: 80050033
[1.427879] CR2: 7ffd37e72eb9 CR3: 0a9f4000 CR4: 00042660
[1.427889] Stack:
[1.427893]
[1.427906]
[1.427921]
[1.427943] Call Trace:
[1.427951] Code: c7 e8 b8 fe a8 ff 48 85 db 75 2f 48 89 e7 e8 5b ed 9e ff 50 90 0f 20 d8 65 48 0b 04 25 e0 02 01 00 78 08 65 88 04 25 e7 02 01 00 <0f> 22 d8 58 66 66 90 66 66 90 e9 c1 07 00 00 4c 89 e7 eb 11 e8
[1.428148] RIP  [] ret_from_fork+0x2d/0x70
[1.428160]  RSP 
[1.428168] ---[ end trace cb1a96e88a7c4794 ]---
[1.428298] Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
[1.428298] 
[1.428316] Kernel Offset: disabled


Note that the guest bootloader in use here is pvGRUB; not sure if
that's relevant, but I thought I'd include it. Similarly, rolling the
domU back to the previous kernel (selecting it via pvGRUB) allows the
VM to boot normally.



Bug#903767: Stretch kernel 4.9.110-1 boot-loops with Xen Hypervisor 4.8

2018-07-14 Thread Michael J. Redd
Package: linux-image-4.9.0-7-amd64
Version: 4.9.110-1

Description:
============

After installing the latest Stretch kernel, 4.9.110-1, on a server
running Xen Hypervisor 4.8, bootstrapping the kernel fails. GRUB loads
the hypervisor as normal, which then attempts to load the Dom0 kernel.
Once that process starts, the system simply reboots. Nothing is output
to the console after Xen does its thing (screen goes black), so I
cannot offer any insights into what the kernel may be doing before it
fails.

If I roll back to the previous Stretch kernel, linux-image-4.9.0-6-amd64
(4.9.88-1+deb9u1), the hypervisor starts the Dom0 kernel as normal and
the system boots successfully.

Setup:
==

Xen Hypervisor version: 4.8.3+xsa267+shim4.10.1+xsa267-1+deb9u9
Kernel version: linux-image-4.9.0-7-amd64 (4.9.110-1)



Bug#897917: Stretch kernel 4.9.88-1 breaks startup of RPC, KDC services

2018-05-05 Thread Michael J. Redd
On further investigation, Arne's absolutely right. I upgraded the
kernel back to 4.9.88-1 from Debian Security and installed 'haveged'
(another random number generator). Everything started quickly and
normally after a reboot. Turns out I hadn't noticed this on any of my
other virtual servers because they're all running haveged anyway.

So,

Workarounds:
============
1. Roll back kernel to 4.9.82-1+deb9u3
OR
2. Install another RNG, such as 'haveged'



Bug#897917: Stretch kernel 4.9.88-1 breaks startup of RPC, KDC services

2018-05-04 Thread Michael J. Redd
Package: linux-image-4.9.0-6-amd64
Version: 4.9.88-1

Issue:
==

Kernel "linux-image-4.9.0-6-amd64," version 4.9.88-1, breaks systemd
startup of RPC and Kerberos KDC services.

Description:
============

After upgrading to the latest Stretch kernel (4.9.88-1), RPC and KDC
services time out during the boot process. This is happening on a
Kerberos KDC that is also an NFS client. Kerberos authentication and
encryption are used with NFS in this environment, and this KDC provides
the Kerberos services that make that work.

The network is up before these services start, as expected.

After the server has booted completely, I can issue `service krb5-kdc
restart` and, after a short delay, the KDC service starts normally.

Not sure if this is a kernel bug, a systemd bug, or something else.
Since the kernel package was the only thing that was upgraded before
the issue started, I'm leaning toward the kernel.

Relevant output from /var/log/syslog:
-

May  4 09:03:17  systemd[1]: rpc-svcgssd.service: Start operation timed out. Terminating.
May  4 09:03:17  systemd[1]: Failed to start RPC security service for NFS server.
May  4 09:03:17  systemd[1]: rpc-svcgssd.service: Unit entered failed state.
May  4 09:03:17  systemd[1]: rpc-svcgssd.service: Failed with result 'timeout'.
May  4 09:03:17  systemd[1]: rpc-gssd.service: Start operation timed out. Terminating.
May  4 09:03:17  systemd[1]: Failed to start RPC security service for NFS client and server.
May  4 09:03:17  systemd[1]: rpc-gssd.service: Unit entered failed state.
May  4 09:03:17  systemd[1]: rpc-gssd.service: Failed with result 'timeout'.
May  4 09:03:20  systemd[1]: krb5-kdc.service: Start operation timed out. Terminating.
May  4 09:03:20  systemd[1]: Failed to start Kerberos 5 Key Distribution Center.
May  4 09:03:20  systemd[1]: krb5-kdc.service: Unit entered failed state.
May  4 09:03:20  systemd[1]: krb5-kdc.service: Failed with result 'timeout'.

Workaround:
===

Rolling back to Stretch kernel 4.9.82-1+deb9u3 fixes the issue.

Setup:
==

1. KDC package: krb5-kdc 1.15-1+deb9u1
2. NFS package: nfs-common 1:1.3.4-2.1
3. Kernel: linux-image-4.9.0-6-amd64 4.9.88-1
4. Systemd version: 232-25+deb9u3
5. Server is a 64-bit Xen PV domU



Bug#887676: Stretch kernel 4.9.0-5-amd64 breaks Xen PVH support

2018-01-20 Thread Michael J. Redd
Aha! Okay, that certainly explains some things. I didn't realize PVH
was a "use at your own risk" tech preview in Xen 4.8 and kernel 4.9.
Luckily, my infrastructure didn't rely on PVH to start with; I can go
back to conventional PV or HVM with no problem.

Thanks for investigating!

-Michael



Bug#887676: Stretch kernel 4.9.0-5-amd64 breaks Xen PVH support

2018-01-18 Thread Michael J. Redd
Package: linux-image-amd64
Version: 4.9+80+deb9u

Not sure if this needs to go to the Debian Kernel team or Debian Xen
team, so please feel free to reclassify as necessary. I'm leaning
toward this being a kernel bug, as the Xen packages had not changed
when this issue was introduced; only the kernel changed.

Description:
============

Latest Stretch kernel (4.9.0-5-amd64), released per DSA-4082-1, breaks
Xen PVH domU support. Booting the domU with pvh = '1' set in its config
file gets the boot process as far as pyGRUB, but once the kernel itself
begins to boot, the domU immediately crashes. Even with 'quiet' removed
from the kernel's boot options, the kernel outputs nothing before it
dies.

Thinking this was possibly the work of the Meltdown mitigation, I added
`pti=off` to the domU's kernel boot options. No effect.

Steps to reproduce:
===

1. Run Xen PV domU in PVH mode (PVH = '1' in domU config file),
2. Upgrade domU's kernel from 4.9.0-4-amd64 to 4.9.0-5-amd64,
3. Reboot domU using latest kernel.
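
For completeness, the PVH toggle in the domU config file is the only
relevant line; a sketch, with all other settings elided:

# domU cfg fragment: run this PV guest in PVH mode
pvh = '1'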

Workaround:
===

I could find no workaround with this kernel. I either had to disable
Xen PVH mode (set PVH = '0') for the domU or roll back to using kernel
4.9.0-4-amd64.

Setup:
==

- Dom0 kernel: 4.9.0-5-amd64
- DomU kernel: 4.9.0-5-amd64
- Xen hypervisor version: xen-hypervisor-4.8-amd64 (4.8.2+xsa245-0+deb9u1)