Next reboot after the user@0.service start/stop cycle was a quick one. It looks as something gets messed up upon logging in. And if not resolved, the system has problems shutting down.
On 4 October 2013 19:31, Toms Seisums <toms.seis...@gmail.com> wrote: > @Andrey, the full log can be found here, the link was there before also: > http://pastebin.com/wbr04AQw (systemd-207) > It's quite silent about the cause for the timeout, though. > > I've been debugging this for three days now, and I've crystallized a > little bit of information. > > 1. If I boot the system and stay on login screen, no matter when I hit > Ctrl+Alt+Del (ASAP or after an hour), the system will always reboot > normally. > 2. If I login, and instantly type `reboot`, in about 75% the system will > reboot normally, the other 25% being slow, resulting in user@0.servicetimeout. > 2.1. If I login and wait for some time, the next reboot is going to be > slow. > 3. I'm using Arch, and they had a bug report already there - > https://bugs.archlinux.org/task/37007, it has come down to that > systemd-208 fixes it, though, I've updated and the problem still remains. > 4. I've attempted numerous daemon configurations, but the results are > inconsistent. Due to that Arch issue being closed and not reopened, I did > this, in order to find out whether running daemons aren't at fault. > 4.1. I, for some reboot's disabled every daemon (not also running them > after the system booted up), and the reboots/shutdowns were still slow. > 4.1.1. The Ctrl+Alt+Del before logging in, is still fast. > 5. Downgrading to systemd-204 works flawlessly, also removing the issue > that's described in next point. > 6. Whenever I boot with systemd-207 or systemd-208 and login, I am > presented with: > > Failed to open private bus connection: Failed to connect to socket > /run/user/0/dbus/user_bus_socket: No such file or directory > > 6.1. The issue appears to already be reported by @Andrey - here: > http://lists.freedesktop.org/archives/systemd-devel/2013-September/013393.html > 7. Apparently, I specially set up a system running in VBox, also an Arch > Linux, that didn't have the problem neither running with systemd-207, > neither has it with systemd-208 (that actually led me to disable the > daemons). > 7.1. I did encounter it once though, but since then, it hasn't happened. > 8. If, before running shutdown, I manually: `systemctl stop user@0.service`, > the system shuts down prefectly. > > Here are some logs of shutdown with systemd-208: > > 1. Quick, Ctrl+Alt+Del before logging in. (1: > http://pastebin.com/5X0sUHCV) > 2. Slow, executed a little later after logging in. (2: > http://pastebin.com/tjnLNJKH) > 3. Slow, executed by Ctrl+Alt+Del after logging in, doing some tasks, > then `logout`. (3: http://pastebin.com/4hEmRYen) > 4. I wanted to catch a fast reboot with a full login and full daemons > running, but couldn't manage to - they all were slow. > 5. It also happens if I log in with a different user than root and push > Ctrl+Alt+Del. (4: http://pastebin.com/U36WqDzP) > 5.1. This is just a test for you @Lennart, just to see if a different > user changes anything. > 6. `systemctl stop user@0.service` to `reboot`, that results in a quick > one. (5: http://pastebin.com/aJ0s32MR) (I, for a second thought that > Apache might be causing some issues, because it almost always gets killed, > so, I disabled it for some tests, this log has no Apache information - > that's just in case you're wondering why the logs don't match. More on > Apache adventures below...) > 6.1. I reran it after Apache tests to clarify, quick reboot the first > time (9: http://pastebin.com/HcUA72AE). The second time. (10: > http://pastebin.com/rf96QiGA) The third time. (11: > http://pastebin.com/bpFu1sQK) All in a row. I did 2 more to test for the > next point, all quick. > 6.2. This doesn't seem to change the SFTP behavior, though. > > An extra piece of information I can provide, is that, this appears to > happen to break SFTP. If I boot, then connect from my other PC with SFTP > (as root), and initiate a reboot from tty0, my SFTP connection will not be > dropped. It will actually never even timeout. I could restart my Linux > machine for about 100times, and the SFTP window will not even blink, > thinking that it's connected. Only when I refresh the SFTP window, I get an > error message, asking me to reconnect. With systemd-204 there is no such an > issue - it disconnects immediately. Though, on my VBox Arch, there wasn't > such an issue with systemd-207, nor with systemd-208. > > > And @Lennart, you asked, I delivered, here is a log with > user@0.servicestart/stop with 0.5second intervals between start/stop calls. I > stripped > the booting sequence, starting the log off at about where the system ended > it's boot. (12: http://sharetext.org/zJc, the link is different because I > exceeded pastebins free limit, doh). > Nothing special there, except that dbus thingy. > > --- > > Apache: > > So, noticing that Apache gets killed most of time, and systemd-204 also > had some delays where it waited for Apache to exit, I thought that the > problem is with Apache. > > I started this debugging scenario by simply doing a "systemctl stop httpd" > into "reboot", that made my system to reboot quickly. (6: > http://pastebin.com/4NgxVAeP) > > Though, one `httpd` instance still remains (occasionally, when I > "systemctl stop httpd", "ps aux" immediately after - will still display one > httpd process), that appears to actually be a bug in Apache's (I guess), > that appears to be the one that has to be explicitly killed at the end of > reboot process. > > By partially confirming, that this is because of Apache, I stopped it upon > next boot, disabled it, and rebooted. Starting freshly with "ps aux" > reporting no signs of httpd, I attempted to "reboot" - guess what? That's a > long one. Three times in a row with no httpd upon boot - all reboots were > long, I guess that's not Apache then. > > While writing the above and debugging, after having enabled Apache back > and stopping it upon boot, even though previous reboots executed > immediately, one didn't. (7: http://pastebin.com/3FnikCX9) Apparently the > next one after it also didn't. (8: http://pastebin.com/pnWhuQWC) > > --- > > P.S. While debugging this, and trying to link some of the problems to some > of the daemons, I'm starting to feel as if I'd hit a delusion spree. While > occasionally it reboots fast, it might as well be just the random > quick-reboot I've been hunting before. Just that stopping those daemons > makes me think the problem is tied to them. > > > --- > > Loads of information, some of it are probably total duplicates, but I hope > I've given enough and that you'll be able to understand what I've provided. > More importantly, I hope that this will help to resolve these issues. > > > On 1 October 2013 03:43, Lennart Poettering <lenn...@poettering.net>wrote: > >> On Mon, 30.09.13 18:55, Toms Seisums (toms.seis...@gmail.com) wrote: >> >> > It appears that history repeats itself. >> > >> > Once (July 2013) a similar issue was reported here, by different user: >> > >> http://lists.freedesktop.org/archives/systemd-devel/2013-July/012283.html >> > >> > I'm also using Arch Linux, systemd-207, linux-3.11.2. >> > >> > On shutdown and reboot, user@0.service occasionally freezes my system >> until >> > it times out and gets killed. >> > >> > Steps to reproduce: >> > 1) Boot >> > 2) Login >> > 3) Execute either systemctl reboot or reboot or shutdown -h now or >> whatever. >> > >> > I've went through the steps here, to get a proper shutdown log: >> > >> http://freedesktop.org/wiki/Software/systemd/Debugging/#shutdowncompleteseventually >> > >> > And, the shutdown log can be found here: http://pastebin.com/wbr04AQw >> > >> > The actual lines related to issue: >> > >> > #2223 [ 74.387130] systemd[1]: Got D-Bus request: >> > org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local >> > #2224 [ 163.814925] systemd[1]: user@0.service stopping timed out. >> Killing. >> > #2225 [ 163.815447] systemd[1]: user@0.service changed stop-sigterm -> >> > stop-sigkill >> > #2226 [ 163.816063] systemd[1]: Received SIGCHLD from PID 326 >> (systemd). >> > #2227 [ 163.816093] systemd[1]: Got SIGCHLD for process 326 (systemd) >> > #2228 [ 163.816153] systemd[1]: Child 326 died (code=killed, >> status=9/KILL) >> > #2229 [ 163.816158] systemd[1]: Child 326 belongs to user@0.service >> > #2230 [ 163.816171] systemd[1]: user@0.service: main process exited, >> > code=killed, status=9/KILL >> > #2231 [ 163.816183] systemd[1]: user@0.service changed stop-sigkill -> >> > failed >> > >> > I currently haven't found out a pattern for this. I have attempted to >> quit >> > some processes before initiating the shutdown/reboot, but the result is >> > random. >> > >> > About 95% of shutdowns/reboots it will be a long one. >> > 3% it will be noticeably longer than the quickest one, but shorter than >> the >> > above one. >> > Remaining 2%, the shutdown will be very quick. >> >> Hmm, this is the systemd user instance for root which fails to shut >> down. Can you reproduce this if you start/stop "user@0.service" quickly >> in a loop? >> >> Lennart >> >> -- >> Lennart Poettering - Red Hat, Inc. >> > > > > -- > Toms Seisums > > - (+371) 26431391 > > May the force be with you, always! > -- Toms Seisums - (+371) 26431391 May the force be with you, always!
_______________________________________________ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel