You have been subscribed to a public bug by Dan Streetman (ddstreet):
[Impact]
Long-running services overflow the sd_bus->cookie counter, causing further
communication with org.freedesktop.systemd1 to stall.
[Description]
Systemd dbus messages include a "cookie" value that uniquely identifies them in
their bus context. This value is obtained from the bus header, and is incremented
for each message exchanged on the same bus object. Services that run for
long periods of time and keep communicating through dbus can eventually
overflow the cookie value, after which further messages to the
org.freedesktop.systemd1 dbus fail. This can leave these services
unresponsive, as they get stuck trying to communicate with invalid bus cookie
values.
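
The failure mode can be modeled in a few lines of C. This is a simplified
sketch, not the sd-bus source: struct toy_bus and toy_seal() are hypothetical
names, and the assumption that a cookie outside the 32-bit range is rejected
with EOPNOTSUPP is inferred from the "Operation not supported" error in the
test run further down.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a bus object; only the counter matters here. */
struct toy_bus {
        uint64_t cookie;   /* incremented once per outgoing message */
};

/* Assumption: the message format only has room for a 32-bit cookie, so a
 * counter value above UINT32_MAX cannot be used and the send is refused. */
static int toy_seal(struct toy_bus *bus, uint64_t *ret_cookie) {
        assert(bus);
        assert(ret_cookie);

        uint64_t next = ++bus->cookie;   /* the counter never goes back */

        if (next > UINT32_MAX)
                return -EOPNOTSUPP;

        *ret_cookie = next;
        return 0;
}
```

In this model, once the counter passes UINT32_MAX nothing ever brings it back
into range, so the bus object stays unusable until the process is restarted.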
This issue has been fixed upstream by the commit below:
- sd-bus: deal with cookie overruns (1f82f5bb4237)
$ git describe --contains 1f82f5bb4237
v242-rc1~228
$ rmadison systemd
systemd | 229-4ubuntu4 | xenial | source, ...
systemd | 229-4ubuntu21.27 | xenial-security | source, ...
systemd | 229-4ubuntu21.27 | xenial-updates | source, ...
systemd | 229-4ubuntu21.28 | xenial-proposed | source, ...
systemd | 237-3ubuntu10 | bionic | source, ...
systemd | 237-3ubuntu10.38 | bionic-security | source, ...
systemd | 237-3ubuntu10.39 | bionic-updates | source, ...
systemd | 237-3ubuntu10.40 | bionic-proposed | source, ... <----
systemd | 242-7ubuntu3 | eoan | source, ...
Releases starting with Eoan already have this fix.
[Test Case]
There doesn't seem to be an easy test case for this, as the cookie values start
at zero and won't overflow until they reach (1<<32). Users have reported
hitting this on Kubernetes clusters that had been running continuously for long
periods (~5 months).
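
To get a feel for the numbers, the helpers below relate a steady message rate
to the time needed to exhaust the 32-bit cookie space. This is only a
back-of-the-envelope sketch: the function names are made up for illustration,
a constant message rate is assumed, and "~5 months" is taken as 150 days.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 32-bit cookie values available before the counter overruns. */
#define COOKIE_SPACE (UINT64_C(1) << 32)

/* Seconds until overrun at a steady rate of msgs_per_sec messages. */
static double seconds_to_overrun(double msgs_per_sec) {
        assert(msgs_per_sec > 0);
        return (double) COOKIE_SPACE / msgs_per_sec;
}

/* Steady message rate needed to overrun within the given number of days. */
static double rate_for_days(double days) {
        assert(days > 0);
        return (double) COOKIE_SPACE / (days * 86400.0);
}
```

Under these assumptions, rate_for_days(150) comes out at roughly 331 messages
per second, and even a tight loop sending 4096 messages per second would need
seconds_to_overrun(4096) = 1048576 seconds, or about 12 days, to overflow on
its own.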
Using GDB, we can construct an artificial test case to test the cookie
overflow. The test case below performs the following steps:
1. Create a new system bus object through sd_bus_default_system()
2. Allocate and append a new method_call message to the bus
3. Send the message through sd_bus_call()
4. Handle the response message and free up the message objects
It's essentially the example code from the
sd_bus_message_new_method_call() manpage, with one minor modification: the
call is repeated in a loop, so that the bus cookie value keeps incrementing.
We step in with GDB when the cookie reaches 0x10000 and set its value to
0xffffff00, leaving only about 0x100 values before the overflow, so the test
program fails shortly afterwards. An example test run on an impacted system:
ubuntu@bionic:~$ gcc -Wall test.c -o cookie -lsystemd -g
ubuntu@bionic:~$ gdb --batch --command=test.gdb --args ./cookie
Breakpoint 1 at 0xe61: file test.c, line 38.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
(16s) cookie: 0x00010000 reply-cookie: 0x00010000
Breakpoint 1, print_unit_path (bus=0x555555757290) at test.c:38
38 r = sd_bus_message_new_method_call(bus, &m,
$1 = 0x10000
$2 = 0xffffff00
Call failed: Operation not supported
Sleeping and retrying...
Call failed: Invalid argument
Assertion 'm->n_ref > 0' failed at ../src/libsystemd/sd-bus/bus-message.c:934,
function sd_bus_message_unref(). Aborting.
Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=0x6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
To compile and debug the test case above, libsystemd-dev and libsystemd0-dbgsym
are required.
Both test.c and test.gdb source code are attached to this LP bug.
[Regression Potential]
This fix changes the way the cookie is incremented. We now have a reduced
number of available values, since the patch uses the high-order bit to
indicate whether the counter has overflowed. Potential issues could arise
from two distinct messages ending up with the same cookie value, or
from cookie reuse not being handled properly. In practice, this shouldn't
cause serious problems, as most dbus messages should not stall long enough for
an overlap to occur in the 2^31 space. The patch is already present in other
stable Ubuntu series and upstream, and has been validated and tested through the
systemd test suite and autopkgtests.
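
The scheme described above can be sketched roughly as follows. This is an
illustrative model rather than the upstream patch itself: the pending-reply
lookup is reduced to a plain array scan, is_pending() and the signature of
next_cookie() are assumptions made for this example, and the -EBUSY return
value is a choice made here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* High-order bit of the 32-bit cookie. After an overrun the counter restarts
 * here, so every later cookie has this bit set. */
#define COOKIE_CYCLED (UINT32_C(1) << 31)

/* Advance a cookie while staying inside the 32-bit range. On overrun,
 * restart at COOKIE_CYCLED instead of zero, leaving 2^31 usable values. */
static uint64_t cookie_inc(uint64_t cookie) {
        if (cookie >= UINT32_MAX)
                return COOKIE_CYCLED;
        return cookie + 1;
}

/* Hypothetical lookup: is a reply still expected for this cookie? */
static bool is_pending(const uint64_t *pending, size_t n, uint64_t cookie) {
        for (size_t i = 0; i < n; i++)
                if (pending[i] == cookie)
                        return true;
        return false;
}

/* Pick the cookie that follows 'current', skipping values still in use.
 * Returns 0 and stores the result in *ret, or -EBUSY if none is free. */
static int next_cookie(uint64_t current, const uint64_t *pending,
                       size_t n_pending, uint64_t *ret) {
        assert(ret);

        uint64_t candidate = cookie_inc(current);

        /* Values with the high bit set may already have been handed out in
         * an earlier cycle, so only those need the reuse check. */
        if (candidate & COOKIE_CYCLED) {
                for (uint32_t i = 0; i < COOKIE_CYCLED; i++) {
                        if (!is_pending(pending, n_pending, candidate)) {
                                *ret = candidate;
                                return 0;
                        }
                        candidate = cookie_inc(candidate);
                }
                return -EBUSY;
        }

        *ret = candidate;
        return 0;
}
```

With this model, a counter sitting at 0xffffffff moves to 0x80000000 rather
than out of range, and a cookie that still has a reply outstanding is skipped
in favour of the next free one.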
** Affects: systemd (Ubuntu)
Importance: Undecided
Status: Fix Released
** Affects: systemd (Ubuntu Xenial)
Importance: High
Assignee: Heitor Alves de Siqueira (halves)
Status: In Progress
** Affects: systemd (Ubuntu Bionic)
Importance: High
Assignee: Heitor Alves de Siqueira (halves)
Status: In Progress
** Tags: sts sts-sponsor-ddstreet
--
cookie overruns can cause org.freedesktop.systemd1 dbus to hang
https://bugs.launchpad.net/bugs/1876600