[Bug 1896614] Re: Race condition when starting dbus services
This bug was fixed in the package systemd - 237-3ubuntu10.43

---
systemd (237-3ubuntu10.43) bionic; urgency=medium

  [ Guilherme G. Piccoli ]
  * d/p/lp1830746-bump-mlock-ulimit-to-64Mb.patch:
    - Bump the memlock limit to match Focal and newer releases
      (LP: #1830746)
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=61adb797642f3dd2e5c14f7914c2949c665cefe8

  [ Victor Manuel Tapia King ]
  * d/p/lp1896614-core-Avoid-race-when-starting-dbus-services.patch:
    - Fix race when starting dbus services (LP: #1896614)
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=373cb6ccd6978a7112bbfd7e5cf4f703a9f8448e

  [ Dan Streetman ]
  * d/t/*,
    d/p/lp1892358/0001-test-increase-qemu-timeout-for-TEST-08-and-TEST-09.patch,
    d/p/lp1892358/0002-test-increase-timeout-for-TEST-17-UDEV-WANTS.patch,
    d/p/lp1892358/0003-test-increase-qemu-timeout-for-TEST-18-and-TEST-19.patch:
    - Increase QEMU_TIMEOUT on 'upstream' autopkgtest tests
    - Pull latest tests from newer releases to fix false negatives
    - Blacklist flaky 'upstream' TEST-03 (LP: #1892358)
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=9fd8391c2499e163515b629a8ca5790898fc599d
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=d1756b3e1c3e625ed7162cff4909e7a29c315051
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=37f8d73516a84e85e4057d6a92204b4a174af718
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=229ed2076eb773efc548035262b8b8009bf89207
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=f2d7b1f952667316cc07a4b3c5010e66ace07a90
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=659befe61bbfeb7afc9efa24458c9745412d7c6d

 -- Victor Manuel Tapia King  Wed, 07 Oct 2020 16:30:03 -0400

** Changed in: systemd (Ubuntu Bionic)
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1896614

Title:
  Race condition when starting dbus services

To manage notifications about this bug go to:
https://bugs.launchpad.net/systemd/+bug/1896614/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1896614] Re: Race condition when starting dbus services
** Tags removed: verification-needed verification-needed-bionic
** Tags added: verification-done verification-done-bionic
[Bug 1896614] Re: Race condition when starting dbus services
# VERIFICATION

Note: As a reminder, the issue here is a race condition between any DBUS service and systemctl daemon-reload: systemd adds the DBUS filter (AddMatch) that watches for a name change only after that change has already happened. I'll be using systemd-logind as the DBUS service in my reproducer.

Using the following reproducer:

for i in $(seq 1 1000); do echo $i; ssh $SERVER 'sudo systemctl daemon-reload & sudo systemctl restart systemd-logind'; done

- With systemd=237-3ubuntu10.42 (-updates), after a few runs, systemd-logind is stuck as a running job and ssh is not responsive. DBUS messages[1] show that the AddMatch filter is set by systemd after systemd-logind has acquired its final name (org.freedesktop.login1).

- With systemd=237-3ubuntu10.43 (-proposed), systemd-logind does not get stuck and everything continues to work. In a scenario[2] where the systemd DBUS AddMatch message arrives after the final systemd-logind NameOwnerChanged, systemd is able to catch up thanks to the GetNameOwner call introduced in the patch.

[1] https://pastebin.ubuntu.com/p/NxRNX9bwCP/
[2] https://pastebin.ubuntu.com/p/jpKpW3g2bK/
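The catch-up step mentioned above (after subscribing, ask the bus who currently owns the name) can be imitated from a shell with busctl. This is only a sketch of the pattern, not the code in the patch: parse_owner and name_owner are hypothetical helper names, and the parsing assumes busctl prints a string reply in the form s ":1.42".

```shell
#!/bin/sh
# Illustrative only: mimics the "query the current owner" step with busctl.
# parse_owner and name_owner are hypothetical helper names.

# Extract the unique name from a busctl string reply such as:  s ":1.42"
parse_owner() {
    sed -n 's/^s "\(.*\)"$/\1/p'
}

# Ask the bus daemon who currently owns the well-known name given as $1.
name_owner() {
    busctl call org.freedesktop.DBus /org/freedesktop/DBus \
        org.freedesktop.DBus GetNameOwner s "$1" | parse_owner
}
```

For example, `name_owner org.freedesktop.login1` should print the unique connection name of systemd-logind when the name is owned, and print nothing when it is not (busctl reports an error in that case).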
[Bug 1896614] Re: Race condition when starting dbus services
Hello Victor, or anyone else affected,

Accepted systemd into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/237-3ubuntu10.43 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and what testing has been performed on the package, and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

** Changed in: systemd (Ubuntu Bionic)
       Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-bionic
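Since the verification comment has to state which package version was tested, a small check like the following may help confirm whether a machine already has the fixed version. It is a rough sketch: version_at_least is a hypothetical helper, and it relies on GNU sort -V, which only approximates Debian version ordering (dpkg --compare-versions is the authoritative tool on an Ubuntu system).

```shell
#!/bin/sh
# First systemd version in bionic that carries the fix, per this bug.
FIXED_VERSION='237-3ubuntu10.43'

# version_at_least CANDIDATE MINIMUM
# Succeeds when CANDIDATE sorts at or after MINIMUM under `sort -V`.
# Hypothetical helper; approximates, but is not, dpkg's comparison.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

A possible use is `version_at_least "$(dpkg-query -W -f='${Version}' systemd)" "$FIXED_VERSION" && echo "fix present"`.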
[Bug 1896614] Re: Race condition when starting dbus services
** Description changed:

  [impact]

  In certain scenarios, such as high load environments or when "systemctl
  daemon-reload" runs at the same time a dbus service is starting (e.g.
  systemd-logind), systemd is not able to track properly when the service
  has started, keeping the job 'running' forever.

  [test case]

  set up a 1-cpu VM with Bionic, and configure the system with a ssh key
- so the user can ssh to localhost. Then run:
+ so the user can ssh to localhost. Then run something like:

- ubuntu@lp1896614-b:~$ while timeout 5 ssh localhost true; do echo
- 'reloading'; sudo systemctl restart systemd-logind & sudo systemctl
- daemon-reload; done
+ $ while timeout 5 ssh localhost true; do echo 'reloading'; sudo
+ systemctl restart systemd-logind & sudo systemctl daemon-reload; done

- that should exit the while loop after only a few attempts. At that
- point, there should be a running job for systemd-logind, and any logins
- attempted after the bug is reproduced should also hang waiting for the
- systemd-logind job to complete, e.g.:
+ if that doesn't work try:
+
+ $ while timeout 5 ssh localhost true; do echo 'reloading'; sudo sh -c
+ 'systemctl restart systemd-logind & systemctl daemon-reload'; done
+
+
+ once the reproducer exits the while loop, there should be a running job for systemd-logind, and any logins attempted after the bug is reproduced should also hang waiting for the systemd-logind job to complete, e.g.:

  ubuntu@lp1896614-b:~$ systemctl list-jobs
  JOB UNIT                   TYPE  STATE
  525 systemd-logind.service start running
  669 session-6.scope        start waiting
  664 session-5.scope        start waiting

  3 jobs listed.

  [regression potential]

  any regression would likely involve services that are Type=dbus failing
  to complete starting. as with any systemd change, regressions could also
  involve assertion failures in systemd which causes it to exit.

  [scope]

  this is needed only for bionic.

  this is fixed upstream with commit
  a5a8776ae5e4244b7f5acb2a1bfbe6e0b4d8a870 which is included starting in
  v243, so it is included already in focal and later.

  (per upstream bug) this was introduced by upstream commit
  75152a4d6aedbfd3ee8b2d5782b9edf27407622a which was included starting in
  v237, so this bug is not present in xenial or earlier.

  [original description]

  In certain scenarios, such as high load environments or when "systemctl
  daemon-reload" runs at the same time a dbus service is starting (e.g.
  systemd-logind), systemd is not able to track properly when the service
  has started, keeping the job 'running' forever.

  The issue appears when systemd runs the "AddMatch" dbus method call to
  track the service's "NameOwnerChanged" signal after that signal has
  already been emitted.

  A working instance would look like this:
  https://pastebin.ubuntu.com/p/868J6WBRQx/

  A failing instance would be:
  https://pastebin.ubuntu.com/p/HhJZ4p8dT5/

  I've been able to reproduce the issue on Bionic (237-3ubuntu10.42)
  running:

  sudo systemctl daemon-reload & sudo systemctl restart systemd-logind
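The check at the end of the test case (is there a start job stuck in the running state?) can be automated with a small filter over the systemctl list-jobs output. This is a minimal sketch assuming the four-column JOB UNIT TYPE STATE layout shown in the description; running_start_jobs is a hypothetical helper name.

```shell
#!/bin/sh
# Print the unit name of every job that is of type "start" and still in
# state "running". Reads `systemctl list-jobs` output on stdin.
# Hypothetical helper; assumes the columns are JOB UNIT TYPE STATE.
running_start_jobs() {
    # Rows of interest begin with a numeric job id; the header line and
    # the "N jobs listed." trailer do not match all three conditions.
    awk '$1 ~ /^[0-9]+$/ && $3 == "start" && $4 == "running" { print $2 }'
}
```

After the reproducer loop exits, `systemctl list-jobs | running_start_jobs` would be expected to print systemd-logind.service on an affected system and nothing on a fixed one.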
[Bug 1896614] Re: Race condition when starting dbus services
** Changed in: systemd
       Status: Unknown => Fix Released
[Bug 1896614] Re: Race condition when starting dbus services
** Description changed:

+ [impact]
+
+ In certain scenarios, such as high load environments or when "systemctl
+ daemon-reload" runs at the same time a dbus service is starting (e.g.
+ systemd-logind), systemd is not able to track properly when the service
+ has started, keeping the job 'running' forever.
+
+ [test case]
+
+ set up a 1-cpu VM with Bionic, and configure the system with a ssh key
+ so the user can ssh to localhost. Then run:
+
+ ubuntu@lp1896614-b:~$ while timeout 5 ssh localhost true; do echo
+ 'reloading'; sudo systemctl restart systemd-logind & sudo systemctl
+ daemon-reload; done
+
+ that should exit the while loop after only a few attempts. At that
+ point, there should be a running job for systemd-logind, and any logins
+ attempted after the bug is reproduced should also hang waiting for the
+ systemd-logind job to complete, e.g.:
+
+ ubuntu@lp1896614-b:~$ systemctl list-jobs
+ JOB UNIT                   TYPE  STATE
+ 525 systemd-logind.service start running
+ 669 session-6.scope        start waiting
+ 664 session-5.scope        start waiting
+
+ 3 jobs listed.
+
+ [regression potential]
+
+ any regression would likely involve services that are Type=dbus failing
+ to complete starting. as with any systemd change, regressions could also
+ involve assertion failures in systemd which causes it to exit.
+
+ [scope]
+
+ this is needed only for bionic.
+
+ TBD - needed for xenial?
+
+ this is fixed upstream with commit
+ a5a8776ae5e4244b7f5acb2a1bfbe6e0b4d8a870 which is included starting in
+ v243, so it is included already in focal and later.
+
+ [original description]
+
  In certain scenarios, such as high load environments or when "systemctl
  daemon-reload" runs at the same time a dbus service is starting (e.g.
  systemd-logind), systemd is not able to track properly when the service
  has started, keeping the job 'running' forever.

  The issue appears when systemd runs the "AddMatch" dbus method call to
  track the service's "NameOwnerChanged" signal after that signal has
  already been emitted.

  A working instance would look like this:
  https://pastebin.ubuntu.com/p/868J6WBRQx/

  A failing instance would be:
  https://pastebin.ubuntu.com/p/HhJZ4p8dT5/

  I've been able to reproduce the issue on Bionic (237-3ubuntu10.42)
  running:

  sudo systemctl daemon-reload & sudo systemctl restart systemd-logind

** Also affects: systemd via
   https://github.com/systemd/systemd/issues/12956
   Importance: Unknown
       Status: Unknown

** Description changed:

  [impact]

  In certain scenarios, such as high load environments or when "systemctl
  daemon-reload" runs at the same time a dbus service is starting (e.g.
  systemd-logind), systemd is not able to track properly when the service
  has started, keeping the job 'running' forever.

  [test case]

  set up a 1-cpu VM with Bionic, and configure the system with a ssh key
  so the user can ssh to localhost. Then run:

  ubuntu@lp1896614-b:~$ while timeout 5 ssh localhost true; do echo
  'reloading'; sudo systemctl restart systemd-logind & sudo systemctl
  daemon-reload; done

  that should exit the while loop after only a few attempts. At that
  point, there should be a running job for systemd-logind, and any logins
  attempted after the bug is reproduced should also hang waiting for the
  systemd-logind job to complete, e.g.:

  ubuntu@lp1896614-b:~$ systemctl list-jobs
- JOB UNIT TYPE STATE
+ JOB UNIT TYPE STATE
  525 systemd-logind.service start running
  669 session-6.scope        start waiting
  664 session-5.scope        start waiting

  3 jobs listed.

  [regression potential]

  any regression would likely involve services that are Type=dbus failing
  to complete starting. as with any systemd change, regressions could also
  involve assertion failures in systemd which causes it to exit.

  [scope]

  this is needed only for bionic.

- TBD - needed for xenial?
-
  this is fixed upstream with commit
  a5a8776ae5e4244b7f5acb2a1bfbe6e0b4d8a870 which is included starting in
  v243, so it is included already in focal and later.
+
+ (per upstream bug) this was introduced by upstream commit
+ 75152a4d6aedbfd3ee8b2d5782b9edf27407622a which was included starting in
+ v237, so this bug is not present in xenial or earlier.

  [original description]

  In certain scenarios, such as high load environments or when "systemctl
  daemon-reload" runs at the same time a dbus service is starting (e.g.
  systemd-logind), systemd is not able to track properly when the service
  has started, keeping the job 'running' forever.

  The issue appears when systemd runs the "AddMatch" dbus method call to
  track the service's "NameOwnerChanged" signal after that signal has
  already been emitted.

  A working instance would look like this:
  https://pastebin.ubuntu.com/p/868J6WBRQx/

  A failing instance would be:
  https://pastebin.ubuntu.com/p/HhJZ4p8dT5/

  I've been able to reproduce the issue on
[Bug 1896614] Re: Race condition when starting dbus services
** Changed in: systemd (Ubuntu Bionic)
     Assignee: (unassigned) => Victor Tapia (vtapia)

** Changed in: systemd (Ubuntu Bionic)
   Importance: Undecided => Medium

** Changed in: systemd (Ubuntu Bionic)
       Status: New => In Progress

** Changed in: systemd (Ubuntu)
       Status: New => Fix Released
[Bug 1896614] Re: Race condition when starting dbus services
In the original report, the issue happened randomly on boot when a service[1] triggered a reload while systemd-logind was starting, resulting in a list of queued jobs that were never executed. The issue can also happen under high load conditions, as reported upstream:

https://github.com/systemd/systemd/issues/12956

To simplify the reproducer I went with systemd-logind+daemon-reload, but it can be done with any other dbus service.

[1]
[Unit]
Description=Disable unattended upgrades
After=network-online.target local-fs.target

[Service]
Type=oneshot
ExecStart=/bin/bash -c "/bin/chmod 644 /etc/cron.daily/apt-compat ; /bin/systemctl disable apt-daily-upgrade.timer apt-daily.timer ; /bin/systemctl stop apt-daily-upgrade.timer apt-daily.timer"

[Install]
WantedBy=multi-user.target

** Bug watch added: github.com/systemd/systemd/issues #12956
   https://github.com/systemd/systemd/issues/12956
[Bug 1896614] Re: Race condition when starting dbus services
Restarting systemd-logind is not safe, as existing sessions can be logged out. Performing daemon-reload mid-boot is also not safe.

Can you explain the use case and why these actions are performed together, racing each other?