Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel

2021-09-21 Thread Chuck Zmudzinski

On 8/24/2021 7:12 PM, Ben Hutchings wrote:

On Tue, Aug 24, 2021 at 03:27:19PM -0400, Phillip Susi wrote:

Ben Hutchings  writes:


I think a proper fix would be one of:

a. If the Xen virtual keyboard driver is advertising capabilities it
doesn't have, stop it doing that.
b. Change the implementation of modalias attributes to allow longer
values.

It's not clear to me whether the Xen driver is advertising correctly or
not.  If it is, then the solution should be b, but that may be too
disruptive a change to the kernel.  So a reasonable workaround might
be:

c. Change the input subsystem to limit the length of the
capabilities part of the modalias.

The problem with a) is that the Xen keyboard is not a physical keyboard
and so it has no way of knowing what keys it actually has.  It is a fake
input device designed to pass through whatever input the Xen hypervisor
sends down.  As such, any key could come in.  If it doesn't advertise
that it has all of these keys, then they would not be accepted by
libinput when the hypervisor sends them down.

Right, that's what I feared.

xen-kbdfront is setting the bits for keys in the ranges [KEY_ESC,
KEY_UNKNOWN) and [KEY_OK, KEY_MAX), which I think works out to 654
keys and 2362 bytes in the modalias.
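
As a rough cross-check of the figures above, the sketch below recomputes
them. It assumes the key code values from the kernel's input-event-codes.h
(KEY_ESC = 1, KEY_UNKNOWN = 240, KEY_OK = 0x160, KEY_MAX = 0x2ff) and
assumes that each advertised key is written into the modalias as uppercase
hex followed by a comma. The function names are made up for illustration.

```python
# Key codes as assumed from include/uapi/linux/input-event-codes.h.
KEY_ESC = 1
KEY_UNKNOWN = 240
KEY_OK = 0x160
KEY_MAX = 0x2FF


def advertised_keys():
    """Key codes in the half-open ranges [KEY_ESC, KEY_UNKNOWN) and [KEY_OK, KEY_MAX)."""
    return list(range(KEY_ESC, KEY_UNKNOWN)) + list(range(KEY_OK, KEY_MAX))


def key_bits_length(keys):
    """Bytes needed if every key is written as uppercase hex plus a comma (assumed format)."""
    return sum(len(f"{code:X},") for code in keys)


if __name__ == "__main__":
    keys = advertised_keys()
    print(len(keys), key_bits_length(keys))  # prints "654 2362"
```

Under those assumptions the result agrees with the 654 keys and 2362 bytes
quoted above, which already exceeds a 2048-byte buffer before any other
variable is added.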


This seems to be the heart of the problem: libinput was designed
assuming that all keyboards can and must report what keys are actually
present, and then libinput tries to cram that information into the
modalias rather than some other sysfs attribute as it should ( or not at
all... I still don't see how this information is actually supposed to be
useful to userspace ).

I think modaliases aren't intended to be interpreted by user-space,
other than processing wildcards when matching to modules.

For input devices, the same information is available through other
variables in the uevent, in a more compact form.  The information *is*
useful for user-space; e.g. in initramfs-tools we recognise keyboard
devices and add their drivers to the initramfs but ignore other input
devices.


As for b), the problem isn't with the modalias attribute itself, but
when the kernel tries to copy it into the environment block for the udev
callout.  The environment block is only a single page, and so limited to
4 KB.  And that's for everything else that goes into the environment,
not just the modalias.

Text-based sysfs attributes are limited to a page, but udev receives
uevents through netlink, not sysfs.

The current limit on the environment of a uevent appears to be 2 KB
(UEVENT_BUFFER_SIZE defined in <linux/kobject.h>).  That seems like it
*might* be easier to change, so long as user-space doesn't have a
similar limit.
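
To make the size argument concrete, here is a minimal model of the uevent
environment buffer. It assumes each variable is stored as a KEY=VALUE string
followed by a NUL byte and that the whole set must fit in the buffer. The
helper names and the placeholder modalias are hypothetical; only the 2048
and 4096 limits and the 2362-byte key list come from this thread.

```python
def uevent_env_size(env):
    """Bytes used by KEY=VALUE strings, each followed by a NUL terminator."""
    return sum(len(f"{key}={value}".encode()) + 1 for key, value in env.items())


def fits(env, buffer_size):
    """True if the whole environment fits into a buffer of buffer_size bytes."""
    return uevent_env_size(env) <= buffer_size


if __name__ == "__main__":
    # Placeholder modalias: a prefix plus 2362 filler bytes standing in for the key list.
    env = {
        "PRODUCT": "1/5853//0",
        "NAME": '"Xen Virtual Keyboard"',
        "MODALIAS": "input:" + "x" * 2362,
    }
    for limit in (2048, 4096):
        print(limit, fits(env, limit))
```

With these placeholder values the environment does not fit in 2048 bytes but
does fit in 4096 bytes.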

I looked into systemd/udev, and it seems to use an 8 KB buffer for
receiving uevents:

https://sources.debian.org/src/systemd/247.9-1/src/libsystemd/sd-device/device-monitor.c/?hl=390#L390

But as a first step I think increasing the kernel buffer size to 4 KB
would be enough.  Perhaps someone could test whether this patch to the
domU kernel makes udev happier:

--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -30,7 +30,7 @@
 
 #define UEVENT_HELPER_PATH_LEN		256
 #define UEVENT_NUM_ENVP			64	/* number of env pointers */
-#define UEVENT_BUFFER_SIZE		2048	/* buffer for the variables */
+#define UEVENT_BUFFER_SIZE		4096	/* buffer for the variables */
 
 #ifdef CONFIG_UEVENT_HELPER
 /* path to the userspace helper executed on an event */
--- END ---

?

Ben.



Even though this patch has been tested to apparently fix this bug and
the bug has been elevated to important and tagged patch and upstream,
AFAICT there is no action yet upstream or anywhere else after more than
three weeks. Is this patch dead as a possible fix for this bug?

Best wishes,

Chuck



Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel

2021-08-26 Thread Chuck Zmudzinski

On 8/26/2021 8:01 AM, Chuck Zmudzinski wrote:

[...]



Results of more tests with the patched kernel:

1. Boot on dom0 - works normally, can create VMs, run Linux containers, etc.
2. Boot in Xen PV - works normally
3. Boot on bare hardware - works normally

I do not see any issues with the patched kernel on my system.

Cheers,

Chuck



Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel

2021-08-26 Thread Chuck Zmudzinski

On 8/24/2021 7:12 PM, Ben Hutchings wrote:


[...]



I tested this patch on my Xen HVM bullseye system and
it appears 4k is enough for the UEVENT_BUFFER_SIZE
to accommodate the Xen Virtual Keyboard's large
modalias. I needed to follow the instructions in
the Kernel team's handbook for changing the ABI
name of the kernel for the build to succeed with
the patch. I just bumped it from 8 to 8.1.

Results:

1. No coldplug failure reported at boot time.

2. With the patch the system can write uevent
data to sysfs for the Xen Virtual Keyboard device.

With the current 5.10.0-8 kernel:

chuckz@debian:~$ cat /sys/devices/virtual/input/input2/uevent
chuckz@debian:~$

With the patched kernel with a change to the ABI version from 8 to 8.1:

chuckz@debian:~$ uname -r
5.10.0-8.1-amd64
chuckz@debian:~$ cat /sys/devices/virtual/input/input2/uevent
PRODUCT=1/5853//0
NAME="Xen Virtual Keyboard"
PHYS="xenbus/device/vkbd/0"
PROP=0
EV=3
KEY=7fff  ...
MODALIAS=input:b0001v5853pe-e0,1,k71,72... really long MODALIAS
---

So I think a test of the installation media in a Xen HVM with the
4k buffer in the kernel is the next step.

I would also like to test a live CD in a Xen HVM with this patch.
It was also reported to fail to boot in a Xen HVM on the
debian-user list.

BTW, my compliments to the Debian Kernel Team for the
excellent handbook on building kernels for Debian. It is
easy to understand and made it very easy for me to
build and test the patch even though I have not built
a Linux kernel in many years, and I never built a Debian
kernel before.

All the best,

Chuck



Bug#988776: Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel

2021-08-25 Thread Chuck Zmudzinski

On 8/25/2021 4:16 PM, Phillip Susi wrote:

Chuck Zmudzinski  writes:

If it doesn't work, I am also willing to try approach a by patching
the Linux kernel xen-kbdfront driver by removing the for loops that
advertise those 654 keys. I tend to agree with Phillip that this is
totally unnecessary, but I suppose I could be wrong about that.
I read the discussion Phillip had with the Xen developers and they
seemed to want to keep the Xen keyboard driver as it is.

That was the first thing I tried and the libinput maintainer pointed out
that if you don't advertise the keys, you can't use the keys.  In other
words, somebody presses that key on their keyboard and the domU won't
recognize it.



Well, good news: it looks like Ben's patch works. I just tested it in my full
install in a Xen HVM domU and all looks good. I did not see the Coldplug
failure at the beginning of the boot - it is hard to miss in the bright red
letters on the console, and even more convincing is the fact that another
symptom of the bug is gone. This bug manifests itself in udev not being
able to write uevent data to sysfs for the Xen Virtual Keyboard. With
Ben's patch of increasing the UEVENT_BUFFER_SIZE from 2048 to 4096,
udev can write its uevent data to sysfs for the Xen Virtual Keyboard:

With the current 5.10.0-8 kernel:

chuckz@debian:~$ cat /sys/devices/virtual/input/input2/uevent
chuckz@debian:~$

With the patched kernel with a change to the ABI version from 8 to 8.1:

chuckz@debian:~$ uname -r
5.10.0-8.1-amd64
chuckz@debian:~$ cat /sys/devices/virtual/input/input2/uevent
PRODUCT=1/5853//0
NAME="Xen Virtual Keyboard"
PHYS="xenbus/device/vkbd/0"
PROP=0
EV=3
KEY=7fff  ...
MODALIAS=input:b0001v5853pe-e0,1,k71,72... really long MODALIAS
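
The same check can be scripted. This is only a sketch: it assumes the uevent
file contains plain KEY=VALUE lines as in the output above, and it treats an
empty or unreadable file as "no data". The function names are made up for
illustration, and the input2 path is the one from my system and may differ
elsewhere.

```python
from pathlib import Path


def parse_uevent(text):
    """Turn the KEY=VALUE lines of a sysfs uevent file into a dict."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            env[key] = value
    return env


def read_uevent(path):
    """Return the parsed uevent file, or {} if it is empty or cannot be read."""
    try:
        return parse_uevent(Path(path).read_text())
    except OSError:
        return {}


if __name__ == "__main__":
    env = read_uevent("/sys/devices/virtual/input/input2/uevent")
    print("variables:", sorted(env))
    print("MODALIAS length:", len(env.get("MODALIAS", "")))
```

An empty result would correspond to the unpatched case shown above, and a
long MODALIAS to the patched one.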

I expect with that patch the installation media will work
in a Xen HVM domU.

Cheers,

Chuck



Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel

2021-08-25 Thread Chuck Zmudzinski

On 8/24/2021 7:12 PM, Ben Hutchings wrote:


[...]



I tried this patch but the build failed after running for over an hour. I am
not sure why, as I have not built a Linux kernel in many years. So I will do
this:

1) Try to build the unmodified kernel on my system just to be sure I
am building the kernel correctly and that my hardware is OK. Once
I could not build the Linux kernel until I replaced a bad memory
card.

2) If that succeeds, I will try the patch with a bump to the ABI version.

From the output of the failed build and what I read in the section on
the Debian kernel ABI name, I think that the system detected an
ABI change and so it failed. The build was checking symbols when
it failed.

This will take a little while because it takes over an hour to build the
kernel on my system.

Chuck



Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel

2021-08-25 Thread Chuck Zmudzinski

On 8/25/2021 12:45 PM, Chuck Zmudzinski wrote:

[...]


The build failed with an error. I used the test-patches script to start the
build:

chuckz@debian:~/linuxdata/sources-bullseye/kernel/linux-5.10.46$ bash debian/bin/test-patches ../patch

with Ben's patch to UEVENT_BUFFER_SIZE in ../patch.

The build was running for over an hour and then failed with the last few lines
on the console as:

RT_SYMBOL
zl10039_attach                                   module: drivers/media/dvb-frontends/zl10039, 

Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel

2021-08-25 Thread Chuck Zmudzinski

On 8/24/2021 7:12 PM, Ben Hutchings wrote:

[...]




I will try it in my bullseye Xen HVM DomU.

I am not sure how to rebuild the installation media with a patched
systemd, but I can patch my installed Xen HVM DomU system
with a patched systemd with the increased buffer size and see if the
Coldplug failure early in the boot process goes away. If so, then it
is likely this patch to systemd would also fix the installation media.

If it doesn't work, I am also willing to try approach a by patching
the Linux kernel xen-kbdfront driver by removing the for loops that
advertise those 654 keys. I tend to agree with Phillip that this is
totally unnecessary, but I suppose I could be wrong about that.
I read the discussion Phillip had with the Xen developers and they
seemed to want to keep the Xen keyboard driver as it is.

Chuck



Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel

2021-08-25 Thread Phillip Susi


Chuck Zmudzinski  writes:
> If it doesn't work, I am also willing to try approach a by patching
> the Linux kernel xen-kbdfront driver by removing the for loops that
> advertise those 654 keys. I tend to agree with Phillip that this is
> totally unnecessary, but I suppose I could be wrong about that.
> I read the discussion Phillip had with the Xen developers and they
> seemed to want to keep the Xen keyboard driver as it is.

That was the first thing I tried and the libinput maintainer pointed out
that if you don't advertise the keys, you can't use the keys.  In other
words, somebody presses that key on their keyboard and the domU won't
recognize it.



Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel

2021-08-25 Thread Chuck Zmudzinski

On 8/25/2021 10:54 AM, Ben Hutchings wrote:

On Tue, 2021-08-24 at 15:19 -0400, Chuck Zmudzinski wrote:

On 8/24/2021 1:12 PM, Ben Hutchings wrote:

[...]


I think a proper fix would be one of:

a. If the Xen virtual keyboard driver is advertising capabilities it
 doesn't have, stop it doing that.
b. Change the implementation of modalias attributes to allow longer
 values.

It's not clear to me whether the Xen driver is advertising correctly or
not.  If it is, then the solution should be b, but that may be too
disruptive a change to the kernel.  So a reasonable workaround might
be:

c. Change the input subsystem to limit the length of the
 capabilities part of the modalias.


Ben.


So workaround c would not involve disruptions to the kernel or
systemd? Workaround c seems too disruptive for stable to me,
but maybe could go into unstable and eventually into testing.

I don't think it would be very disruptive.  It might require a kernel
ABI bump, but we do those regularly during a stable release.  And this
bug is severe enough that I think a fix would be suitable for Debian
stable.


A problem with the approach of fixing this bug in the Xen
keyboard driver is that the fix must be implemented in the underlying
Dom0 system, which could be almost anything - another Linux distro
or Debian stable or oldstable. Any fix upstream would probably get into
a bullseye Dom0, but not oldstable Dom0, but perhaps it could be
provided as a backport for anyone who is still on oldstable for their
Xen Dom0.

[...]

I agree that we need to fix this for domU independently of any protocol
change to allow discovery of which keys the underlying input device
has.  So we can't solve this with approach a.


Ben.



Actually, now I think my comments about approach a are wrong. I was thinking
the Linux kernel was reading the modalias of the Xen Virtual Keyboard
through some interface provided by Xen - the hypervisor or libxl or some
such component running in Dom0. After further investigation, now I think the
modalias of the Xen Virtual Keyboard is coming from here:

https://github.com/torvalds/linux/blob/6e764bcd1cf72a2846c0e53d3975a09b242c04c9/drivers/input/misc/xen-kbdfront.c#L257

This is the xen-kbdfront.c driver, which is part of the Linux kernel.

At line 257 of that driver, we have:

        for (i = KEY_ESC; i < KEY_UNKNOWN; i++)
            __set_bit(i, kbd->keybit);
        for (i = KEY_OK; i < KEY_MAX; i++)
            __set_bit(i, kbd->keybit);

This is advertising too many keys, making the modalias absurdly large.
The Xen virtual keyboard driver in the Linux kernel has been doing
this at least since 2011 when the Xen virtual keyboard driver was
moved to its current location in the Linux kernel source tree.

So this can probably be fixed in the Linux kernel without any patches
to the Xen hypervisor or libxl running in Dom0. Probably just
removing those two for loops would fix it.

Chuck



Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel

2021-08-25 Thread Ben Hutchings
On Wed, 2021-08-25 at 12:45 -0400, Chuck Zmudzinski wrote:
[...]
> 
> I will try it in my bullseye Xen HVM DomU.
> 
> I am not sure how to rebuild the installation media with a patched
> systemd, but I can patch my installed Xen HVM DomU system
> with a patched systemd with the increased buffer size and see if the
> Coldplug failure early in the boot process goes away. If so, then it
> is likely this patch to systemd would also fix the installation media.
[...]

Sorry for not being clear - this is a patch for the kernel. 
Instructions for rebuilding the kernel package are at
.

I agree that you should check whether this fixes the coldplug error
before we try rebuilding the installer.

Ben.

-- 
Ben Hutchings
Design a system any fool can use, and only a fool will want to use it.




Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel

2021-08-25 Thread Ben Hutchings
On Tue, 2021-08-24 at 15:19 -0400, Chuck Zmudzinski wrote:
> On 8/24/2021 1:12 PM, Ben Hutchings wrote:
[...]

> > I think a proper fix would be one of:
> > 
> > a. If the Xen virtual keyboard driver is advertising capabilities it
> > doesn't have, stop it doing that.
> > b. Change the implementation of modalias attributes to allow longer
> > values.
> > 
> > It's not clear to me whether the Xen driver is advertising correctly or
> > not.  If it is, then the solution should be b, but that may be too
> > disruptive a change to the kernel.  So a reasonable workaround might
> > be:
> > 
> > c. Change the input subsystem to limit the length of the
> > capabilities part of the modalias.
> > 
> > 
> > Ben.
> > 
> 
> So workaround c would not involve disruptions to the kernel or
> systemd? Workaround c seems too disruptive for stable to me,
> but maybe could go into unstable and eventually into testing.

I don't think it would be very disruptive.  It might require a kernel
ABI bump, but we do those regularly during a stable release.  And this
bug is severe enough that I think a fix would be suitable for Debian
stable.

> A problem with the approach of fixing this bug in the Xen
> keyboard driver is that the fix must be implemented in the underlying
> Dom0 system, which could be almost anything - another Linux distro
> or Debian stable or oldstable. Any fix upstream would probably get into
> a bullseye Dom0, but not oldstable Dom0, but perhaps it could be
> provided as a backport for anyone who is still on oldstable for their
> Xen Dom0.
[...]

I agree that we need to fix this for domU independently of any protocol
change to allow discovery of which keys the underlying input device
has.  So we can't solve this with approach a.


Ben.

-- 
Ben Hutchings
Design a system any fool can use, and only a fool will want to use it.




Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel

2021-08-24 Thread Chuck Zmudzinski

On 8/24/2021 1:12 PM, Ben Hutchings wrote:

On Tue, 2021-08-24 at 10:56 -0400, Chuck Zmudzinski wrote:

On 5/24/2021 3:30 AM, Michael Biebl wrote:

Hi Phillip

Am 24.05.2021 um 06:19 schrieb Cyril Brulebois:

trigger to cold plug all devices.  Both scripts are set -e.  The Xen
Virtual Keyboard driver and at least one other driver have always failed
to trigger due to having absurdly long modalias, but the error used to
be ignored.  The kernel now returns the error to udevadm

So this is a change in behaviour in the kernel?
What happens if you boot the installed system? Does udevadm trigger
fail there as well?

I feel a bit uneasy changing the udev start script this late in the
release cycle (especially when it appears like covering up an issue
someplace else).

I'll let Marco make the judgement on this though, as he has the most
experience with those udev udeb start scripts as the original author.

Michael


After reviewing Phillip's message at

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983357#43

which seems to point to the root cause of this bug, I can add:

On my Xen HVM DomU I see the absurdly long modalias for the Xen
Virtual keyboard that seems to be causing this crash in sysfs at

/sys/devices/virtual/input/input2/modalias

But at /sys/devices/vkbd-0/modalias, I see just 'xen:vkbd', which would
probably not result in an error in the udev script if this was also
written as the modalias at /sys/devices/virtual/input/input2/modalias

So the Xen virtual keyboard appears more than once in sysfs, and
modalias is not the same in the different places. This seems
to be a problem.

They are two different devices, and they should have different
modaliases.

Linux has code for discovering devices on each kind of bus, including
virtual buses, and that code creates "bus devices" such as vkbd-0.  At
this point the kernel doesn't know what the device is capable of.  The
modalias for a bus device carries some identifying information that can
be used to select a driver module for it.

The driver does know what the device is capable of, and how to use it.
It will normally create one or more "class devices" that support a
particular set of operations; in this case input device operations.
Class devices typically don't have modaliases, since they don't need
another layer of drivers on top.  However, for input devices the
modalias carries information about the device's capabilities.  These
may trigger loading of the evdev or joydev module.


I understand the correct way to fix this bug is by modifying the
Xen virtual keyboard (and any other devices that might cause
this crash) and not the start-udev script on the netinst
installation media, which is so far the only available workaround.
Hopefully Xen will accept a fix if we can come up with one.

[...]

I think a proper fix would be one of:

a. If the Xen virtual keyboard driver is advertising capabilities it
doesn't have, stop it doing that.
b. Change the implementation of modalias attributes to allow longer
values.

It's not clear to me whether the Xen driver is advertising correctly or
not.  If it is, then the solution should be b, but that may be too
disruptive a change to the kernel.  So a reasonable workaround might
be:

c. Change the input subsystem to limit the length of the
capabilities part of the modalias.


Ben.



So workaround c would not involve disruptions to the kernel or
systemd? Workaround c seems too disruptive for stable to me,
but maybe could go into unstable and eventually into testing.

A problem with the approach of fixing this bug in the Xen
keyboard driver is that the fix must be implemented in the underlying
Dom0 system, which could be almost anything - another Linux distro
or Debian stable or oldstable. Any fix upstream would probably get into
a bullseye Dom0, but not oldstable Dom0, but perhaps it could be
provided as a backport for anyone who is still on oldstable for their
Xen Dom0.

Anyway, I will look into the Xen virtual keyboard capabilities. The
only capability I can think of that would be useful in this context is that
it supports live migration of a VM through some sort of hot-swapping
capability. If it has that capability, a workaround to support it would be
good. But if it does not have that capability or if such a capability is
not needed for a keyboard, then it should probably stop advertising
itself as being able or needing to do that. Ultimately, it is up to Xen to
decide if they are going to make changes to its virtual keyboard.

Chuck



Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel

2021-08-24 Thread Ben Hutchings
On Tue, Aug 24, 2021 at 03:27:19PM -0400, Phillip Susi wrote:
> 
> Ben Hutchings  writes:
> 
> > I think a proper fix would be one of:
> >
> > a. If the Xen virtual keyboard driver is advertising capabilities it
> > >    doesn't have, stop it doing that.
> > b. Change the implementation of modalias attributes to allow longer
> > >    values.
> >
> > It's not clear to me whether the Xen driver is advertising correctly or
> > not.  If it is, then the solution should be b, but that may be too
> > disruptive a change to the kernel.  So a reasonable workaround might
> > be:
> >
> > c. Change the input subsystem to limit the length of the
> > >    capabilities part of the modalias.
> 
> The problem with a) is that the Xen keyboard is not a physical keyboard
> and so it has no way of knowing what keys it actually has.  It is a fake
> input device designed to pass through whatever input the Xen hypervisor
> sends down.  As such, any key could come in.  If it doesn't advertise
> that it has all of these keys, then they would not be accepted by
> libinput when the hypervisor sends them down.

Right, that's what I feared.

xen-kbdfront is setting the bits for keys in the ranges [KEY_ESC,
KEY_UNKNOWN) and [KEY_OK, KEY_MAX), which I think works out to 654
keys and 2362 bytes in the modalias.
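
The arithmetic behind those two figures can be checked with a short sketch. It assumes the key code values from include/uapi/linux/input-event-codes.h (KEY_ESC = 1, KEY_UNKNOWN = 240, KEY_OK = 0x160, KEY_MAX = 0x2ff) and assumes each set bit is printed in the modalias as upper-case hex followed by a comma. The helper names hex_entry_len(), count_range() and xen_kbd_key_stats() are made up for this illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdio.h>

/* Key code values, assumed from include/uapi/linux/input-event-codes.h. */
#define KEY_ESC      1
#define KEY_UNKNOWN  240
#define KEY_OK       0x160
#define KEY_MAX      0x2ff

/* Bytes needed to print one key code as upper-case hex plus a comma. */
static int hex_entry_len(int code)
{
    return snprintf(NULL, 0, "%X,", code);
}

/* Count the keys in the half-open range [lo, hi) and add the size of
 * their printed form to *bytes. */
static int count_range(int lo, int hi, int *bytes)
{
    assert(lo <= hi);
    for (int code = lo; code < hi; code++)
        *bytes += hex_entry_len(code);
    return hi - lo;
}

/* Totals for the two ranges named above: returns the number of keys and
 * stores the size of the key list in *bytes. */
static int xen_kbd_key_stats(int *bytes)
{
    *bytes = 0;
    return count_range(KEY_ESC, KEY_UNKNOWN, bytes)
         + count_range(KEY_OK, KEY_MAX, bytes);
}
```

Calling xen_kbd_key_stats(&bytes) from a small main() gives the key count as the return value and the size of the key list in bytes. The rest of the modalias (bus, vendor, product and the other capability bitmaps) comes on top of that.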

> This seems to be the heart of the problem: libinput was designed
> assuming that all keyboards can and must report what keys are actually
> present, and then libinput tries to cram that information into the
> modalias rather than some other sysfs attribute as it should ( or not at
> all... I still don't see how this information is actually supposed to be
> useful to userspace ).

I think modaliases aren't intended to be interpreted by user-space,
other than processing wildcards when matching to modules.

For input devices, the same information is available through other
variables in the uevent, in a more compact form.  The information *is*
useful for user-space; e.g. in initramfs-tools we recognise keyboard
devices and add their drivers to the initramfs but ignore other input
devices.

> As for b), the problem isn't with the modalias attribute itself, but
> when the kernel tries to copy it into the environment block for the udev
> callout.  The environment block is only a single page, and so limited to
> 4 KB.  And that's for everything else that goes into the environment,
> not just the modalias.

Text-based sysfs attributes are limited to a page, but udev receives
uevents through netlink, not sysfs.

The current limit on the environment of a uevent appears to be 2 KB
(UEVENT_BUFFER_SIZE defined in <linux/kobject.h>).  That seems like it
*might* be easier to change, so long as user-space doesn't have a
similar limit.
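
To make the limit concrete, here is a rough user-space sketch of a fixed-size environment buffer that refuses a variable which does not fit. It is not the kernel's code: struct env_buf and env_add() are hypothetical names, and returning -ENOMEM on overflow is an assumption about how such a helper would report the failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for a uevent environment buffer. */
struct env_buf {
    char   *data;   /* caller-provided storage */
    size_t  size;   /* capacity in bytes, e.g. 2048 or 4096 */
    size_t  len;    /* bytes used so far, including NUL terminators */
};

/* Append one formatted "KEY=value" string plus its NUL terminator.
 * Returns 0 on success, or -ENOMEM if the string would be truncated. */
static int env_add(struct env_buf *env, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(env->len <= env->size);
    va_start(ap, fmt);
    n = vsnprintf(env->data + env->len, env->size - env->len, fmt, ap);
    va_end(ap);

    if (n < 0 || (size_t)n >= env->size - env->len)
        return -ENOMEM;             /* does not fit; len is left unchanged */
    env->len += (size_t)n + 1;      /* keep the terminator between entries */
    return 0;
}
```

With a 2048-byte buffer, a MODALIAS whose key list alone is 2362 bytes is rejected, while the same string fits in 4096 bytes with room left for the other variables.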

I looked into systemd/udev, and it seems to use an 8 KB buffer for
receiving uevents:

https://sources.debian.org/src/systemd/247.9-1/src/libsystemd/sd-device/device-monitor.c/?hl=390#L390

But as a first step I think increasing the kernel buffer size to 4 KB
would be enough.  Perhaps someone could test whether this patch to the
domU kernel makes udev happier:

--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -30,7 +30,7 @@
 
 #define UEVENT_HELPER_PATH_LEN		256
 #define UEVENT_NUM_ENVP			64	/* number of env pointers */
-#define UEVENT_BUFFER_SIZE		2048	/* buffer for the variables */
+#define UEVENT_BUFFER_SIZE		4096	/* buffer for the variables */
 
 #ifdef CONFIG_UEVENT_HELPER
 /* path to the userspace helper executed on an event */
--- END ---

?

Ben.

-- 
Ben Hutchings
Design a system any fool can use, and only a fool will want to use it.




Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel

2021-08-24 Thread Phillip Susi


Ben Hutchings  writes:

> I think a proper fix would be one of:
>
> a. If the Xen virtual keyboard driver is advertising capabilities it
>    doesn't have, stop it doing that.
> b. Change the implementation of modalias attributes to allow longer
>    values.
>
> It's not clear to me whether the Xen driver is advertising correctly or
> not.  If it is, then the solution should be b, but that may be too
> disruptive a change to the kernel.  So a reasonable workaround might
> be:
>
> c. Change the input subsystem to limit the length of the
>    capabilities part of the modalias.

The problem with a) is that the Xen keyboard is not a physical keyboard
and so it has no way of knowing what keys it actually has.  It is a fake
input device designed to pass through whatever input the Xen hypervisor
sends down.  As such, any key could come in.  If it doesn't advertise
that it has all of these keys, then they would not be accepted by
libinput when the hypervisor sends them down.

This seems to be the heart of the problem: libinput was designed
assuming that all keyboards can and must report what keys are actually
present, and then libinput tries to cram that information into the
modalias rather than some other sysfs attribute as it should ( or not at
all... I still don't see how this information is actually supposed to be
useful to userspace ).

As for b), the problem isn't with the modalias attribute itself, but
when the kernel tries to copy it into the environment block for the udev
callout.  The environment block is only a single page, and so limited to
4 KB.  And that's for everything else that goes into the environment,
not just the modalias.



Bug#983357: Bug#988776: Bug#983357: Netinst crashes xen domU when loading kernel

2021-08-24 Thread Ben Hutchings
On Tue, 2021-08-24 at 10:56 -0400, Chuck Zmudzinski wrote:
> On 5/24/2021 3:30 AM, Michael Biebl wrote:
> > Hi Phillip
> > 
> > Am 24.05.2021 um 06:19 schrieb Cyril Brulebois:
> > > > trigger to cold plug all devices.  Both scripts are set -e.  The Xen
> > > > Virtual Keyboard driver and at least one other driver have always failed
> > > > to trigger due to having an absurdly long modalias, but the error used to
> > > > be ignored.  The kernel now returns the error to udevadm.
> > 
> > So this is a change in behaviour in the kernel?
> > What happens if you boot the installed system? Does udevadm trigger 
> > fail there as well?
> > 
> > I feel a bit uneasy changing the udev start script this late in the 
> > release cycle (especially when it appears like covering up an issue 
> > someplace else).
> > 
> > I'll let Marco make the judgement on this though, as he has the most 
> > experience with those udev udeb start scripts as the original author.
> > 
> > Michael
> > 
> 
> After reviewing Phillip's message at
> 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983357#43
> 
> which seems to point to the root cause of this bug, I can add:
> 
> On my Xen HVM DomU I see the absurdly long modalias for the Xen
> Virtual keyboard that seems to be causing this crash in sysfs at
> 
> /sys/devices/virtual/input/input2/modalias
> 
> But at /sys/devices/vkbd-0/modalias, I see just 'xen:vkbd', which would
> probably not result in an error in the udev script if this was also
> written as the modalias at /sys/devices/virtual/input/input2/modalias
>
> So the Xen virtual keyboard appears more than once in sysfs, and
> modalias is not the same in the different places. This seems
> to be a problem.

They are two different devices, and they should have different
modaliases.

Linux has code for discovering devices on each kind of bus, including
virtual buses, and that code creates "bus devices" such as vkbd-0.  At
this point the kernel doesn't know what the device is capable of.  The
modalias for a bus device carries some identifying information that can
be used to select a driver module for it.

The driver does know what the device is capable of, and how to use it.
It will normally create one or more "class devices" that support a
particular set of operations; in this case input device operations. 
Class devices typically don't have modaliases, since they don't need
another layer of drivers on top.  However, for input devices the
modalias carries information about the device's capabilities.  These
may trigger loading of the evdev or joydev module.
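
One way to compare the two devices on a running domU is to measure the modalias of each. The sketch below only reads a sysfs attribute and reports the length of its first line. modalias_length() is a name invented for this example, and the two paths are the ones reported earlier in this thread, so they may differ on other systems.

```c
#include <assert.h>
#include <stdio.h>

/* Return the length of the first line of the file at path (without the
 * trailing newline), or -1 if the file cannot be opened. */
static long modalias_length(const char *path)
{
    FILE *f;
    long len = 0;
    int c;

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    while ((c = fgetc(f)) != EOF && c != '\n')
        len++;
    fclose(f);
    return len;
}
```

Calling it on /sys/devices/vkbd-0/modalias (the bus device) and on /sys/devices/virtual/input/input2/modalias (the input class device) should show a short value for the former and a much longer one for the latter.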

> I understand the correct way to fix this bug is by modifying the
> Xen virtual keyboard (and any other devices that might cause
> this crash) and not the start-udev script on the netinst
> installation media, which is so far the only available workaround.
> Hopefully Xen will accept a fix if we can come up with one.
[...]

I think a proper fix would be one of:

a. If the Xen virtual keyboard driver is advertising capabilities it
   doesn't have, stop it doing that.
b. Change the implementation of modalias attributes to allow longer
   values.

It's not clear to me whether the Xen driver is advertising correctly or
not.  If it is, then the solution should be b, but that may be too
disruptive a change to the kernel.  So a reasonable workaround might
be:

c. Change the input subsystem to limit the length of the
   capabilities part of the modalias.


Ben.

-- 
Ben Hutchings
73.46% of all statistics are made up.

