Re: [systemd-devel] Bad accelerometer values cause incorrect screen rotation
On Thu, Sep 5, 2019 at 9:00 PM Bastien Nocera wrote:
> Daniel, if you run into many more problems, there's also the
> possibility of adding a boot argument to disable the accelerometer (or
> maybe its effects?), either in iio-sensor-proxy or gnome-shell.

Thanks for the suggestion; manually adding something through the bootloader menu may indeed be a bit more practical than the laptop acrobatics workaround. For cases where we know which driver is used, this can probably already be done by adding a modprobe.blacklist= boot arg.

I appreciate the quick action on the HP laptop case. Let's see how much that reduces the problem occurrence rate.

Thanks!
Daniel

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
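For illustration, a hedged sketch of the modprobe.blacklist= idea (hp_accel is an assumed driver name here; the right module depends on the machine, and the argument itself is appended in the bootloader menu):

```shell
# Decide whether this boot already carries the (assumed) blacklist
# argument for hp_accel on the kernel command line. This only inspects
# state; it does not change anything.
if grep -q 'modprobe\.blacklist=hp_accel' /proc/cmdline 2>/dev/null; then
    msg="hp_accel blacklisted for this boot"
else
    msg="hp_accel not blacklisted for this boot"
fi
echo "$msg"
```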
Re: [systemd-devel] Bad accelerometer values cause incorrect screen rotation
On Thu, Sep 5, 2019 at 6:07 PM Bastien Nocera wrote:
> I've read through this, and I'm happy blacklisting the hp_accel driver
> in code. For the other devices, I'd rather leave it as-is.

That would indeed avoid most problem cases that I've seen, and the current case - probably enough to stop me grumbling for another year or so until this happens again in some other context :) So I support that idea.

Do you have any preference on where we blacklist it? In the hwdb it's quite easy to match DMI vendor HP & driver lis3lv02d, but we'd really want a new way of saying "ignore the accelerometer", as ACCEL_POSITION=base doesn't seem like the right way to express that. Or we could blacklist it in iio-sensor-proxy, but since there's no mention of hp_accel in the udev properties for the device (you just get the driver as lis3lv02d), you'd need to grab the DMI vendor name from /sys/class/dmi/id or something like that.

> > When this unfortunate situation happens, the user experience is
> > really terrible. Except for workarounds that involve going to the
> > command line, the best workaround under GNOME seems to be to
> > physically rotate the device into a position that causes the screen
> > orientation to be normal/unrotated, then while maintaining and
> > holding the device in that highly awkward position with one hand,
> > try your very best to manipulate the mouse cursor with your other
> > hand and navigate the menu to enable Orientation Lock.
>
> FYI, Windows+O in GNOME to toggle the orientation lock setting.

Good to know, thanks! I just tried, though, and it's also seriously difficult... Especially because the Windows key is quite a distance from the O key, it's really hard to press this key combo with one hand when you're busy trying to hold the device at a fixed angle in an awkward position with your other hand.

> Where would we get this information? From the same DSDT that doesn't
> have enough information? That doesn't sound like a good idea.
My initial idea is DMI/DSDT plus a whitelist. I realise it's not ideal, but I'm trying to think towards something that (in my eyes) would be better than the current state.

> If we disable iio-sensor-proxy's functionality by default, I'll be sent
> more bug reports than I already receive from folks where the sensor
> drivers aren't working or not compiled in, so that's a big no-no from
> me.

In my eyes, having some users that accidentally don't get their screens rotated by the accelerometer (with a relatively simple fix of whitelisting the product) is a better outcome than having some users that go through the miserable experience of having their screen rotated incorrectly (which is hard to recover from and tricky for a developer to fix without physical device access). This may just be a difference of opinion.

> Also, it would be pretty trivial changing the default GNOME
> configuration to have the accelerometer pegged to the default
> orientation.

I appreciate the suggestion, especially if it's trivial, but I don't understand what you wrote here - can you explain a bit more?

Thanks,
Daniel
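A sketch of the /sys/class/dmi/id idea mentioned above - grabbing the vendor string directly, since the udev properties of the lis3lv02d device don't name hp_accel (the path is the standard sysfs DMI location, but availability varies by platform):

```shell
# Read the DMI system vendor, falling back gracefully on platforms
# (e.g. containers or non-x86 machines) where the sysfs DMI directory
# is absent.
vendor=$(cat /sys/class/dmi/id/sys_vendor 2>/dev/null || echo "unknown")
echo "DMI vendor: $vendor"
```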
[systemd-devel] Bad accelerometer values cause incorrect screen rotation
Hi,

Over the years we've seen a bunch of reports of systems that automatically rotate the display to some incorrect orientation, based on trusting accelerometer data values which were not interpreted correctly. I have another affected system in hand here.

When this unfortunate situation happens, the user experience is really terrible. Except for workarounds that involve going to the command line, the best workaround under GNOME seems to be to physically rotate the device into a position that causes the screen orientation to be normal/unrotated, then while maintaining and holding the device in that highly awkward position with one hand, try your very best to manipulate the mouse cursor with your other hand and navigate the menu to enable Orientation Lock.

Since the effects of this issue when it bites are so bad, and because it seems like we aren't winning the "quirk the accelerometer" game here, I'm wondering if it's time for us to restrict this default setting of automatic rotation based on accelerometer data to only situations where:

1. The product is actually designed to be usable when rotated, and
2. We have a higher degree of confidence that we're actually interpreting the accelerometer data correctly

Why are we not winning? Why can't we fix this properly? I think we're suffering largely through applying this auto-rotation behaviour to all accelerometer data, including from setups where previously nobody really cared if the data was misinterpreted, or where the data was specifically interpreted for a different context (we're specifically interested in measuring the physical orientation of the screen, but accelerometers have other uses too).

Windows 10 (and presumably 8) does have the automatic screen rotation feature based on accelerometer data, but it seems to apply to fewer products.
For example, it does not apply automatic rotation to the Quanta NL3 classmate nor to the HP EliteBook 840 G3, two systems that I have in hand that both required specific engineering on Linux after real users had already run into the horrible automatic-incorrect-rotation described above:

https://github.com/systemd/systemd/commit/ebf482e7cdabfc1266a86ec8a5f92a964ce08afe
hp_accel: fix accelerometer orientation for EliteBook 840 (patch posted today, no link yet)

The challenge here is a lack of standardization of how accelerometers are installed relative to the screen, and a lack of a standard way of accessing model-specific data that gives us this info. Without any better options we've been trying to create and maintain our own databases, for example systemd's 60-sensor.hwdb and the Linux kernel's hp_accel.c, but that's turning out to be problematic because:

1. The database entries are mostly created retroactively - usually, entries are created when a tech-savvy user steps forward to share the required data, after one or more users have already been bitten by the issue. This is sub-standard.

2. We try to distinguish models for different quirks by hoping that DMI data will serve this purpose, but we don't know how to do that reliably, so sometimes we even apply the wrong quirks. Two recent examples:

https://bugzilla.redhat.com/show_bug.cgi?id=1717712 (more on this case below)
hp_accel: fix accelerometer orientation for EliteBook 840 (patch posted today, no link yet)

Bastien once made the suggestion that we could fish the model-to-quirk mapping from the Windows drivers, but I can't find anything in the HP driver. On the HP EliteBook 840 the device is not even exposed as a sensor under Windows and I can't find any way of accessing the data or making it auto-rotate - maybe they don't even have such a mapping?
The only Windows application of this sensor seems to be automatic hard disk head parking, which presumably just detects sudden movements in any direction.

We did recently work with some Acer all-in-one PCs which had an accelerometer that also provided working auto-rotation under Windows out of the box, while again producing the wrong and awkward behaviour on Linux. Thanks to vendor contacts we did discover the scheme used, and now automatically detect the accelerometer orientation on such products:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f38ab20b749da84e3df1f8c9240ddc791b0d5983

However, we then found DSDTs with this orientation data that far predated this patch's existence. So not a great win; our solution was not made in a timely fashion.

ACPI offers something that might help - PLD can be used to describe the physical orientation of product components. But I don't think we've seen any examples of this data being provided by vendors for accelerometers.

I see the latest development of having the hwdb specify whether the accelerometer is in the base or the display of the device. This was implemented for dealing with a device with accelerometers in both positions (https://github.com/hadess/iio-sensor-proxy/pull/262) - clearly the screen rotation should only follow the accelerometer in the display.
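To make the "interpreting the data correctly" problem concrete: on Linux the quirk databases express the sensor's installation as a 3x3 mount matrix, and screen orientation is then classified from the transformed reading. A self-contained sketch follows - the axis conventions, thresholds, and sample values are purely illustrative, not iio-sensor-proxy's exact algorithm:

```python
def apply_mount_matrix(matrix, raw):
    """Transform a raw (x, y, z) accelerometer reading into the screen's
    frame of reference using a row-major 3x3 mount matrix (the shape of
    data the quirk entries carry)."""
    return tuple(sum(m * v for m, v in zip(row, raw)) for row in matrix)

def orientation(x, y, threshold=0.75):
    """Classify orientation, assuming gravity pulls along -y (values in g)
    when the device is upright; illustrative convention only."""
    if abs(x) < threshold and abs(y) < threshold:
        return "undefined"                 # lying flat: keep previous state
    if abs(y) >= abs(x):
        return "normal" if y <= 0 else "bottom-up"
    return "left-up" if x <= 0 else "right-up"

identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
rot180 = ((-1, 0, 0), (0, -1, 0), (0, 0, 1))  # sensor installed rotated 180 degrees

# A sensor mounted rotated 180 degrees reports this raw value when the
# device is physically upright:
raw = (-0.1, 0.98, 0.05)

x, y, _ = apply_mount_matrix(rot180, raw)     # correct quirk: classified upright
print(orientation(x, y))
x, y, _ = apply_mount_matrix(identity, raw)   # missing/wrong quirk: screen flips
print(orientation(x, y))
```

This is exactly the failure mode described above: the data itself is fine, but without the right per-model matrix the classification comes out rotated.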
Re: [systemd-devel] Debugging active timers that do not trigger
On Thu, Nov 15, 2018 at 7:04 PM Michal Koutný wrote:
> @Daniel, is it possible there are some daemon-reloads running
> concurrently with the timer? More precisely, can it happen the timer
> expires exactly when systemd reloads?

I don't think so. The journal only shows a single "systemd[1]: Reloading." message, and that happened as part of our initramfs scripts, before the real-root systemd was run.

Daniel
Re: [systemd-devel] Debugging active timers that do not trigger
On Thu, Nov 8, 2018 at 6:46 PM Andrei Borzenkov wrote:
> It is possible that system never ends booting. Do you have any pending
> jobs (systemctl list-jobs)? What "systemctl is-system-running" says?

Thanks for the suggestion! It sounds like a good one - I did reproduce this on first boot, and we do have a known issue in that area affecting systemd's perception of boot completion:

https://gitlab.gnome.org/GNOME/gdm/issues/439

Unfortunately I wasn't able to leave the system in that state after all, so I can't check directly any more, but I'll do more testing along these lines.

Thanks
Daniel
[systemd-devel] Debugging active timers that do not trigger
Hi,

On Endless we have the following eos-autoupdater.timer:

[Unit]
Description=Endless OS Automatic Update Timer
Documentation=man:eos-autoupdater(8)
ConditionKernelCommandLine=!endless.live_boot
ConditionKernelCommandLine=ostree

[Timer]
OnBootSec=15m
OnUnitInactiveSec=1h
RandomizedDelaySec=30min

[Install]
WantedBy=multi-user.target

This ordinarily works fine, but we have seen a couple of random, rare occasions where this timer doesn't trigger the target eos-autoupdater.service. I have one case here in front of me now, with details below. In the list-timers output you can see it has "n/a" for NEXT/LAST etc. There is no evidence of eos-autoupdater.service having started at any point in the journal (nor any crashes).

This is not a major concern as it seems to only happen rarely, and fixes itself upon reboot. Also, so far we have only reproduced this on systemd-237; it's hard to judge whether it's fixed in a newer version due to the low occurrence rate of the issue. But I would be curious if there are any easy debugging steps I can follow when we see this - I'll leave the system running in this state for a couple of days in case there are suggestions.

$ systemctl status eos-autoupdater.timer
● eos-autoupdater.timer - Endless OS Automatic Update Timer
   Loaded: loaded (/lib/systemd/system/eos-autoupdater.timer; enabled; vendor preset: enabled)
   Active: active (elapsed) since Wed 2018-11-07 15:11:14 CST; 23h ago
  Trigger: n/a
     Docs: man:eos-autoupdater(8)

Nov 07 15:11:14 endless systemd[1]: Started Endless OS Automatic Update Timer.
$ systemctl status eos-autoupdater.service
● eos-autoupdater.service - Endless OS Automatic Updater
   Loaded: loaded (/lib/systemd/system/eos-autoupdater.service; indirect; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:eos-autoupdater(8)

$ systemctl list-timers
NEXT                        LEFT          LAST                        PASSED   UNIT                         ACTIVATES
Thu 2018-11-08 15:34:06 CST 1h 17min left Wed 2018-11-07 15:26:02 CST 22h ago  systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Thu 2018-11-08 17:10:45 CST 2h 54min left Thu 2018-11-08 14:10:44 CST 5min ago eos-phone-home.timer         eos-phone-home.service
Mon 2018-11-12 00:00:00 CST 3 days left   n/a                         n/a      fstrim.timer                 fstrim.service
n/a                         n/a           n/a                         n/a      eos-autoupdater.timer        eos-autoupdater.service
n/a                         n/a           Wed 2018-11-07 15:27:05 CST 22h ago  systemd-readahead-done.timer systemd-readahead-done.service

$ systemctl show eos-autoupdater.timer
Unit=eos-autoupdater.service
NextElapseUSecMonotonic=infinity
LastTriggerUSecMonotonic=0
Result=success
AccuracyUSec=1min
RandomizedDelayUSec=30min
Persistent=no
WakeSystem=no
RemainAfterElapse=yes
Id=eos-autoupdater.timer
Names=eos-autoupdater.timer
Requires=sysinit.target
WantedBy=multi-user.target
Conflicts=shutdown.target
Before=timers.target multi-user.target eos-autoupdater.service shutdown.target
After=sysinit.target
Triggers=eos-autoupdater.service
Documentation=man:eos-autoupdater(8)
Description=Endless OS Automatic Update Timer
LoadState=loaded
ActiveState=active
SubState=elapsed
FragmentPath=/lib/systemd/system/eos-autoupdater.timer
UnitFileState=enabled
UnitFilePreset=enabled
StateChangeTimestamp=Wed 2018-11-07 15:26:36 CST
StateChangeTimestampMonotonic=934682450
InactiveExitTimestamp=Wed 2018-11-07 15:11:14 CST
InactiveExitTimestampMonotonic=13380144
ActiveEnterTimestamp=Wed 2018-11-07 15:11:14 CST
ActiveEnterTimestampMonotonic=13380144
ActiveExitTimestampMonotonic=0
InactiveEnterTimestampMonotonic=0
CanStart=yes
CanStop=yes
CanReload=no
CanIsolate=no
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnFailureJobMode=replace
IgnoreOnIsolate=no
NeedDaemonReload=no
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Wed 2018-11-07 15:11:14 CST
ConditionTimestampMonotonic=13380053
AssertTimestamp=Wed 2018-11-07 15:11:14 CST
AssertTimestampMonotonic=13380122
Transient=no
Perpetual=no
StartLimitIntervalUSec=10s
StartLimitBurst=5
StartLimitAction=none
FailureAction=none
SuccessAction=none
InvocationID=c1bf78021112483db79c39221fd58d80
CollectMode=inactive

$ ls -l /var/lib/systemd/timers/
total 0
-rw-r--r-- 1 root root 0 Nov  7 15:11 stamp-fstrim.timer

Thanks
Daniel
Re: [systemd-devel] [PATCH] man: document kill behavior after the main process exits
On Thu, Apr 23, 2015 at 9:32 AM, Lennart Poettering lenn...@poettering.net wrote:
> > +    <title>Beyond the main process</title>
> > +
> > +    <para>The <varname>KillMode=</varname> option primarily defines
> > +    behavior up until the point where the main process has gone away.
> > +    systemd expects that when killed with the signal specified by
> > +    <varname>KillSignal=</varname>, the main process will kill and
> > +    reap all the other processes in the control group before
> > +    exiting itself.
>
> Well, I don't think this is right. I mean, systemd doesn't really
> expect this. It's completely OK if daemons leave children around in
> this case.

I could avoid the word "expect", but I think it's worth mentioning, as those discarded children might not be designed to accept 2 SIGTERMs in normal conditions. For example, any child process that uses glib and exits the mainloop from the SIGTERM handler does not really respond well here - it drops the SIGTERM handler after the first one, so the second SIGTERM will cause an immediate/unclean shutdown, which is not completely OK from the view of the child.

> KillMode= is actually very much about the time after the main process
> died. If KillMode=process is specified systemd should not send any
> signal to anything but the main process, and that applies to both
> SIGTERM and the following SIGKILL:

I agree, which is why I specifically only talk about the cgroup/mixed kill modes.

> > +    <para>If <option>KillMode=control-group</option>, systemd will
> > +    then send a second <varname>KillSignal=</varname> signal to the
> > +    remaining processes, which will then be followed by a
> > +    <constant>SIGKILL</constant> if processes are still around, even
> > +    if <option>SendSIGKILL=no</option>.</para>
>
> Hmm, no? SendSIGKILL=no should have the effect of not sending any
> SIGKILL at all. Anything else would be a bug.

Must be a bug then; I confirmed this is actually what happens by adding logging to the kill syscall implementation in the kernel.
> > +    <para>Or, if <option>KillMode=mixed</option>, systemd will
> > +    directly send <constant>SIGKILL</constant> to all remaining members
> > +    of the control group, regardless of the
> > +    <varname>SendSIGKILL=</varname> preference.</para>
>
> Hmm? No, not at all. If you use mixed, then SIGTERM is sent to the
> main process of the daemon, and SIGKILL to *all* processes of the
> daemon if there are any left after the main process exited.

That's exactly what I wrote - all of this falls under a paragraph explaining what happens when the main process has already gone. I guess I need to improve the wording.

Thanks for your feedback
Daniel
[systemd-devel] [PATCH] man: document kill behavior after the main process exits
While looking at the exact behavior of how systemd stops services, I encountered some behavior that wasn't clear from reading the man page. Specifically, if the main process exits before its children, the child processes will actually receive a second SIGTERM. If that doesn't kill them, they will later receive a SIGKILL too, even if SendSIGKILL=no. Add some notes about this.
---
 man/systemd.kill.xml | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

Thanks for helping me to get to the bottom of this in the thread "Zombie process still exists after stopping gdm.service".

unit_kill_context() has a comment which is relevant here:

                /* FIXME: For now, we will not wait for the
                 * cgroup members to die, simply because
                 * cgroup notification is unreliable. It
                 * doesn't work at all in containers, and
                 * outside of containers it can be confused
                 * easily by leaving directories in the
                 * cgroup. */

                /* wait_for_exit = true; */

When this is fixed, assumed to happen soon, the precise behaviour seen in the discussion will change slightly (in terms of timing). So I have carefully written this documentation patch in a way that does not go into the timing details. The text changed below should therefore be true both before and after that FIXME is resolved.

diff --git a/man/systemd.kill.xml b/man/systemd.kill.xml
index e57f0e7..10232fb 100644
--- a/man/systemd.kill.xml
+++ b/man/systemd.kill.xml
@@ -154,8 +154,9 @@
         <term><varname>SendSIGKILL=</varname></term>

         <listitem><para>Specifies whether to send
         <constant>SIGKILL</constant> to remaining processes after a
-        timeout, if the normal shutdown procedure left processes of
-        the service around. Takes a boolean value. Defaults to yes.
+        timeout, if the normal shutdown procedure didn't succeed in
+        shutting down the main process. Takes a boolean value.
+        Defaults to yes.
         </para></listitem>
       </varlistentry>

@@ -163,6 +164,31 @@
   </refsect1>

   <refsect1>
+    <title>Beyond the main process</title>
+
+    <para>The <varname>KillMode=</varname> option primarily defines
+    behavior up until the point where the main process has gone away.
+    systemd expects that when killed with the signal specified by
+    <varname>KillSignal=</varname>, the main process will kill and
+    reap all the other processes in the control group before
+    exiting itself. If that doesn't happen, and the main process
+    exits with other processes still running in the control group,
+    systemd gets a bit more heavy-handed:</para>
+
+    <para>If <option>KillMode=control-group</option>, systemd will
+    then send a second <varname>KillSignal=</varname> signal to the
+    remaining processes, which will then be followed by a
+    <constant>SIGKILL</constant> if processes are still around, even
+    if <option>SendSIGKILL=no</option>.</para>
+
+    <para>Or, if <option>KillMode=mixed</option>, systemd will
+    directly send <constant>SIGKILL</constant> to all remaining members
+    of the control group, regardless of the
+    <varname>SendSIGKILL=</varname> preference.</para>
+
+  </refsect1>
+
+  <refsect1>
     <title>See Also</title>
     <para>
       <citerefentry><refentrytitle>systemd</refentrytitle><manvolnum>1</manvolnum></citerefentry>,
-- 
2.1.0
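To make the documented interaction concrete, here is a hypothetical unit fragment (mydaemon and its path are placeholder names). Per the patch text above, with these settings one might expect SIGKILL to never be sent - yet processes left in the control group after the main process exits can still receive one:

```
# Hypothetical service configuration; not a real unit.
[Service]
ExecStart=/usr/local/bin/mydaemon
KillMode=control-group
KillSignal=SIGTERM
SendSIGKILL=no
```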
Re: [systemd-devel] Zombie process still exists after stopping gdm.service
On Mon, Apr 20, 2015 at 6:29 PM, Lennart Poettering lenn...@poettering.net wrote:
> Sure, we don't want to keep track of which processes we already
> killed, to distinguish them from the processes newly created in the
> time between our sending of SIGTERM and receiving SIGCHLD for the main
> process. We assume that if we get SIGCHLD for the main process that
> the daemon is down, and everything that is left over then is auxiliary
> stuff we can kill.

OK, doesn't sound unreasonable. Once we get to the end of this topic, I'll submit a documentation patch to make that a bit clearer.

So, of the 3 signals (TERM, TERM, KILL) sent to gdm-simple-slave within a total time of 0.01s, we have good explanations for the first 2. The 3rd one (KILL) is still suspicious to me though. It is sent 0.4ms after the preceding SIGTERM; here is what happens in the code:

1. gdm's main process exits due to the first SIGTERM. systemd becomes aware in service_sigchld_event(), and responds as follows:

        case SERVICE_STOP_SIGTERM:
        case SERVICE_STOP_SIGKILL:
                if (!control_pid_good(s))
                        service_enter_stop_post(s, f);

2. Inside service_enter_stop_post(), there is no command to execute, so we call:

        service_enter_signal(s, SERVICE_FINAL_SIGTERM, SERVICE_SUCCESS);

3. service_enter_signal() calls unit_kill_context() to send the second SIGTERM. Looking at what happens inside unit_kill_context(): there is no main process, nor control process, so we go straight to the cgroup killing. The cgroup kill happens without error, and we reach the end of the function:

        return wait_for_exit;

wait_for_exit was not modified from its initial value (false) during the course of the function, so false is returned here.

4. Back in service_enter_signal(), since unit_kill_context() returned false, we do not arm the timer. Without hesitation, systemd goes directly and sends SIGKILL.
        } else if (state == SERVICE_FINAL_SIGTERM)
                service_enter_signal(s, SERVICE_FINAL_SIGKILL, SERVICE_SUCCESS);

I can understand that once the main PID goes away, systemd feels welcome to get heavy-handed with the remaining processes. But doing SIGTERM and then immediately SIGKILL just a few microseconds later seems strange - why not go straight for the SIGKILL?

There's a comment in unit_kill_context() which looks relevant here:

                /* FIXME: For now, we will not wait for the
                 * cgroup members to die, simply because
                 * cgroup notification is unreliable. It
                 * doesn't work at all in containers, and
                 * outside of containers it can be confused
                 * easily by leaving directories in the
                 * cgroup. */

                /* wait_for_exit = true; */

If that were uncommented, the above behaviour would be different.

Daniel
Re: [systemd-devel] Zombie process still exists after stopping gdm.service
On Mon, Apr 20, 2015 at 6:04 PM, Lennart Poettering lenn...@poettering.net wrote:
> > I have stepped through and I think that systemd is being too
> > aggressive. Still running with the default KillMode=cgroup, here is
> > what happens:
> >
> > 1. service_enter_stop() is entered which calls:
> >    service_enter_signal(s, SERVICE_STOP_SIGTERM, SERVICE_SUCCESS);
> >
> > 2. service_enter_signal sends SIGTERM to all gdm processes.
>
> No, if you use KillMode=mixed (as you say you do) it will only send
> SIGTERM to the main process of gdm.

Only bleeding edge gdm has KillMode=mixed. I'm using a slightly older version which has the default KillMode=cgroup. Sorry for the confusion.

> > 3. gdm simple-slave's signal handler triggers, which causes the
> > mainloop to exit, and it starts to kill and wait for the X server
> > death. I'm not exactly sure why, but quitting the glib mainloop
> > also causes the signal handler to be destroyed, so sigaction() is
> > called here to return SIGTERM to its default behaviour.
> >
> > 4. Moments later we arrive in systemd's service_sigchld_event(),
> > presumably because the main gdm process exited due to SIGTERM.
> > s->main_pid == pid.
>
> If PID 1 gets the SIGCHLD for the main process then it assumes the
> service has finished correctly, and will kill the rest that might
> remain.

Even if we already killed the rest just a few milliseconds ago (in #2)?

> > 7. To make things even worse, after sending the SIGTERMs,
> > service_enter_signal hits:
> >    } else if (state == SERVICE_FINAL_SIGTERM)
> >            service_enter_signal(s, SERVICE_FINAL_SIGKILL, SERVICE_SUCCESS);
>
> Hmm? if we managed to kill something we'll arm the timeout and wait
> for sigchld or cgroup empty or similar. These shortcuts only take
> place if we couldn't kill anything because there was nothing. And
> hence the second killing will have no effect either, but at least we
> go through the state engine...
I added logging to sys_kill at the kernel level, and I definitely observe systemctl stop gdm causing PID 1 to kill gdm-simple-slave 3 times (TERM, TERM, KILL) within the space of a few milliseconds. I will look closer tomorrow to explain in more detail what is going on at the code level.

Thanks for your help!
Daniel
Re: [systemd-devel] Zombie process still exists after stopping gdm.service
On Mon, Apr 20, 2015 at 8:24 AM, Lennart Poettering lenn...@poettering.net wrote:
> On Sun, 19.04.15 09:34, Andrei Borzenkov (arvidj...@gmail.com) wrote:
> > On Fri, 17 Apr 2015 14:04:18 -0600, Daniel Drake dr...@endlessm.com wrote:
> > > Hi, I'm investigating why "systemctl stop gdm; Xorg" usually
> > > fails. The new X process complains that X is still running.
> > > Here's what I think is happening:
> > >
> > > 1. systemd sends SIGTERM to gdm to stop the service
> > > 2. gdm exits - it has a simple SIGTERM handler which just quits
> > > the mainloop without doing any cleanup (as far as I can see, it
> > > doesn't make any attempt to kill the child X server)
> > > 3. X exits because of PR_SET_PDEATHSIG (i.e. it's set to be
> > > automatically killed when the parent goes away). The killed
> > > process enters defunct state and is reparented to PID 1,
> > > presumably also moving it out of the gdm cgroup.
> >
> > No, it remains in cgroup. Otherwise systemd service management
> > would not be possible at all ...
> >
> > > 4. systemd notes that gdm's cgroup is empty and decides that gdm
> > > is now successfully stopped.
> >
> > I looked at display-manager.service here and it sets
> > KillMode=process. That is better explanation to your observation.
>
> Hmm, it does? It does not on Fedora. Also display-manager.service is
> just an alias to gdm.service on Fedora. Daniel, can you check with
> systemctl cat gdm what your distro configures there?

gdm git does have KillMode=mixed, but the slightly old gdm I'm running here also does not have any KillMode assignment.

I'm investigating further at the moment. I've found a mistake in what I wrote earlier - when gdm receives SIGTERM it *does* do a kill()/waitpid() on the child X server. However, the process seems to disappear before waitpid() returns - currently trying to understand why. Ideas welcome.

Thanks for the help.
Daniel
Re: [systemd-devel] Zombie process still exists after stopping gdm.service
On Mon, Apr 20, 2015 at 9:04 AM, Lennart Poettering lenn...@poettering.net wrote:
> maybe the main gdm process is not the one waiting, but a worker
> process is, and the main process kills the worker process without the
> worker process handling that nicely?

Not really. I removed all the process-killing code from gdm and the problem is still there.

I have stepped through and I think that systemd is being too aggressive. Still running with the default KillMode=cgroup, here is what happens:

1. service_enter_stop() is entered which calls:

        service_enter_signal(s, SERVICE_STOP_SIGTERM, SERVICE_SUCCESS);

2. service_enter_signal sends SIGTERM to all gdm processes.

3. gdm simple-slave's signal handler triggers, which causes the mainloop to exit, and it starts to kill and wait for the X server death. I'm not exactly sure why, but quitting the glib mainloop also causes the signal handler to be destroyed, so sigaction() is called here to return SIGTERM to its default behaviour.

4. Moments later we arrive in systemd's service_sigchld_event(), presumably because the main gdm process exited due to SIGTERM. s->main_pid == pid. We respond as follows:

        case SERVICE_STOP_SIGTERM:
        case SERVICE_STOP_SIGKILL:
                if (!control_pid_good(s))
                        service_enter_stop_post(s, f);

5. Inside service_enter_stop_post(), there is no command to execute, so we call:

        service_enter_signal(s, SERVICE_FINAL_SIGTERM, SERVICE_SUCCESS);

6. service_enter_signal causes all remaining gdm processes to receive SIGTERM again, only moments after the previous one. As gdm simple-slave now has the default SIGTERM handler (instant death), it dies before it has finished the X server cleanup :(

7. To make things even worse, after sending the SIGTERMs, service_enter_signal hits:

        } else if (state == SERVICE_FINAL_SIGTERM)
                service_enter_signal(s, SERVICE_FINAL_SIGKILL, SERVICE_SUCCESS);

So, moments after sending 2 SIGTERMs, SIGKILL is sent to all gdm processes.
There does not seem to be any consideration of giving the process some time to respond to SIGTERMs, nor of the fact that I have hacked gdm.service to have SendSIGKILL=no as an experiment.

Daniel
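The one-shot handler behaviour described in step 3 is the crux: once the handler is gone, the second SIGTERM hits the default action and kills the process mid-cleanup. A small self-contained sketch that reproduces just that mechanism, with plain Python signal calls standing in for the glib mainloop behaviour:

```python
import os
import signal
import time

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    os.close(r)

    def on_term(signum, frame):
        # Mimic the mainloop dropping the handler: fall back to the
        # default action, so a second SIGTERM kills us outright.
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
        os.write(w, b"h")           # tell the parent the handler ran

    signal.signal(signal.SIGTERM, on_term)
    os.write(w, b"r")               # handler installed, ready
    while True:
        time.sleep(1)               # "cleanup" that never gets to finish
else:
    os.close(w)
    os.read(r, 1)                        # wait for the handler install
    os.kill(pid, signal.SIGTERM)         # first TERM: handled, child survives
    os.read(r, 1)                        # handler has run and reset itself
    os.kill(pid, signal.SIGTERM)         # second TERM: default action, death
    _, status = os.waitpid(pid, 0)
    print(os.WIFSIGNALED(status), os.WTERMSIG(status) == signal.SIGTERM)
```

The pipe is only there to avoid racing the second signal against the handler; the essential point is that two SIGTERMs in quick succession are fatal to a process whose handler is effectively one-shot.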
[systemd-devel] Zombie process still exists after stopping gdm.service
Hi,

I'm investigating why "systemctl stop gdm; Xorg" usually fails. The new X process complains that X is still running. Here's what I think is happening:

1. systemd sends SIGTERM to gdm to stop the service
2. gdm exits - it has a simple SIGTERM handler which just quits the mainloop without doing any cleanup (as far as I can see, it doesn't make any attempt to kill the child X server)
3. X exits because of PR_SET_PDEATHSIG (i.e. it's set to be automatically killed when the parent goes away). The killed process enters defunct state and is reparented to PID 1, presumably also moving it out of the gdm cgroup.
4. systemd notes that gdm's cgroup is empty and decides that gdm is now successfully stopped.
5. systemctl returns and now Xorg is launched immediately. Xorg reads the PID of the old Xorg process from /tmp, and notices that that PID is still in use (it is still an unreaped zombie) because kill() doesn't return an error. Xorg aborts, thinking that it is already running.
6. Moments later, systemd reaps the zombie. Oops, too late.

Does that make sense? I wonder how it is best to fix this.

Is it a bug that systemd decided that gdm.service had stopped before it had reaped zombie processes that originally belonged to gdm?

Is it a gdm bug that killing gdm doesn't make any attempt to reap X before going away itself? (they chose PR_SET_PDEATHSIG to do something similar, but maybe we have to argue that it is not quite sufficient)

Thanks
Daniel
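Step 5 - kill(pid, 0) succeeding for an unreaped zombie - can be reproduced directly. A minimal sketch of the mechanism that the stale-PID check trips over:

```python
import os
import time

pid = os.fork()
if pid == 0:
    os._exit(0)                 # child exits immediately...

time.sleep(0.2)                 # ...and sits unreaped as a zombie

os.kill(pid, 0)                 # succeeds: the zombie's PID still "exists",
                                # which is exactly what fools a stale-PID check

os.waitpid(pid, 0)              # reap the zombie (what systemd does
                                # "moments later" in the report above)
try:
    os.kill(pid, 0)
    gone = False
except ProcessLookupError:      # only now is the PID really gone
    gone = True
print(gone)
```

In other words, the window between process death and reaping is indistinguishable from "still running" to anyone probing with kill(pid, 0).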
Re: [systemd-devel] [PATCH] udevd: fix synchronization with settle when handling inotify events
On Sat, Apr 11, 2015 at 5:13 AM, David Herrmann dh.herrm...@gmail.com wrote:
> Nice catch! There's indeed a small race between handling inotify and queuing up the change-event. We need to re-loop there. One day we should switch to sd-event to avoid such bugs... I mean the symptom is inherent to queuing up events while handling them. Meh!

Thanks for reviewing this. Reading your comment, I wonder if there is a small bug in the solution here. Sometimes we may handle inotify events without generating change events. After my change, we will loop again, but there may be no events pending, in which case we will block on the 3-second timeout before completing the next loop iteration and replying to settle's ping message. Do you agree? Should I improve this to only do the extra loop iteration in the case where we generated change events, or somehow make the next loop iteration have a timeout of 0 (non-blocking)?

Thanks
Daniel
[systemd-devel] [PATCH] udevd: fix synchronization with settle when handling inotify events
udev uses inotify to implement a scheme where, when the user closes a writable device node, a change uevent is forcefully generated. In the case of block devices, it actually requests a partition rescan.

This currently can't be synchronized with "udevadm settle", i.e. this is not reliable in a script:

  sfdisk --change-id /dev/sda 1 81
  udevadm settle
  mount /dev/sda1 /foo

The settle call doesn't synchronize there, so at the same time we try to mount the device, udevd is busy removing the partition device nodes and re-adding them again. The mount call often happens in that moment where the partition node has been removed but not re-added yet.

This exact issue was fixed long ago:
http://git.kernel.org/cgit/linux/hotplug/udev.git/commit/?id=bb38678e3ccc02bcd970ccde3d8166a40edf92d3
but that fix is no longer valid now that sequence numbers are no longer used.

Fix this by forcing another mainloop iteration after handling inotify events before unblocking settle. If the inotify event caused us to generate a change event, we'll pick that up in the following loop iteration, before we reach the end of the loop where we respond to settle's control message, unblocking it.
---
 src/udev/udevd.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/udev/udevd.c b/src/udev/udevd.c
index 830aedd..dfecef8 100644
--- a/src/udev/udevd.c
+++ b/src/udev/udevd.c
@@ -1504,9 +1504,22 @@ int main(int argc, char *argv[]) {
                         continue;

                 /* device node watch */
-                if (is_inotify)
+                if (is_inotify) {
                         handle_inotify(udev);

+                        /*
+                         * settle might be waiting on us to determine the queue
+                         * state. If we just handled an inotify event, we might have
+                         * generated a change event, but we won't have queued up
+                         * the resultant uevent yet.
+                         *
+                         * Before we go ahead and potentially tell settle that the
+                         * queue is empty, let's loop one more time to update the
+                         * queue state again before deciding.
+                         */
+                        continue;
+                }
+
                 /* tell settle that we are busy or idle, this needs to be before the
                  * PING handling */
-- 
2.1.0
Re: [systemd-devel] Reliably waiting for udevd to finish processing triggered events
Hi,

On Sun, Mar 8, 2015 at 3:50 PM, Lennart Poettering lenn...@poettering.net wrote:
> On Fri, 06.03.15 14:22, Daniel Drake (dr...@endlessm.com) wrote:
>
> To my knowledge newer versions don't do this anymore and actively watch drm devices coming.

I'm describing the behaviour of the newer version here. The issue is current. It does watch drm devices, but if it gets an indication that all udev events have been processed and there is still no usable drm device, it will give up on drm and launch into text mode.

> No, applications should not watch the queue. And the file is internal to udev anyway. If you watch it, you get to keep the pieces.

The plymouth behaviour I described is achieved by using the public libudev API: udev_queue_get_queue_is_empty() and (the exact equivalent of) udev_queue_get_fd().

Daniel
[systemd-devel] Reliably waiting for udevd to finish processing triggered events
Hi,

I'm looking at some issues with the plymouth boot splash system, and why it intermittently fails to get graphics on screen.

plymouth watches for the creation of drm display devices during boot. If it finds one, it starts a graphical splash and that is that. However, if the system finishes loading drivers and no drm device is available, it falls back onto a fbdev-based splash or a text-based boot. Once it has made that choice there is no turning back; it basically ignores drm devices if they become available later.

In order to know when the system has finished loading drivers, plymouth does the same as udevadm settle - it uses udev APIs to inotify-monitor /run/udev, and it assumes that when the queue file is deleted, all driver load events have been processed. But there seem to be a couple of problems associated with this.

Firstly, plymouth does the above when it loads in the initramfs. The initramfs will trigger udev events for all devices, but if systemd finds the root filesystem before plymouth finds the drm device, udevd is immediately killed by systemd as it changes to switch-root.target. udevd has not processed the drm device at this point, so udev_device_get_is_initialized() returns false when plymouth inquires. As udevd is killed, it removes /run/udev/queue in its exit path; plymouth sees this and (like udevsettle would) assumes that this apparently empty queue means that driver loading is complete. But no drm devices are available and initialized, so it falls back to textual boot for the rest of boot.

The killing of udev seems heavy-handed here, and the way it removes the queue file on exit (without first at least going through the already-pending events) seems to kill any possibility of a program like udevsettle or plymouth knowing if udev finished loading all drivers while the initramfs transitions to the real root.

Secondly, there is a race during startup.
udevd launches and it actually removes /run/udev/queue (if it were to exist) in the first iteration of the mainloop - even before it has checked whether any events are available to process. Anyway, we would normally expect the queue to be empty here; it is only after udevd has started up that systemd then goes on to run udevadm trigger and generate events for udevd to handle.

In the case where plymouth is run from the real root (instead of the initramfs), once trigger has exited, systemd starts plymouth, which then starts immediately using udev_queue_get_queue_is_empty() to do the detection described above. If plymouth happens to do that before udevd has gotten around to processing the first event generated by udevadm trigger, the queue is reported as empty (udevd has not created the marker yet), so plymouth concludes that driver loading has completed. Oops.

I believe the same race exists with udevadm settle; if it is launched at that same moment it could hit the same race. The only difference is that udevadm settle uses some internal udev API that actually sends a ping to udevd before it checks the queue status. That likely reduces the probability of the race, but I think it is still there, as I can't see any guarantee that udevd would create the queue file before responding to the ping (it only creates the queue file at the start of the next iteration of the main loop, assuming that it had noted the pending events in the previous iteration where it also handled the ping).

If there's a way of running udevadm trigger and then reliably knowing that udevd has finished processing those events, I haven't found it. Any hints much appreciated.

Thanks,
Daniel
[systemd-devel] systemd-fsck-root semantics
Hi,

I'm trying to understand dracut/systemd fsck behaviour, in the context of an ext4 root filesystem mounted read-only from dracut, remaining read-only even when the system is fully booted (kiosk-style).

I see that systemd's fstab-generator rightly creates a mount unit for /sysroot from the initramfs, and causes e2fsck to be run on it from inside the dracut initramfs, before it is mounted. So far so good.

Then the system continues booting, switches root, and then systemd-fsck-root.service starts from the root fs, and runs fsck on / again. This is the bit I don't understand - we already checked from the initramfs, why check again now? There used to be a marker file in /run to let systemd know that the initramfs already checked it, but that was removed in commit 956eaf2b8d6c024705ddadc7393bc707de02.

Also, systemd-fsck-root.service in itself seems a little questionable: is it really safe in any context to run fsck on a mounted partition? That could modify data structures which have already been cached in memory by the kernel fs driver. In fact, e2fsck refuses to run on partitions that are mounted, even ones that are ro.

Thanks for any clarification.
Daniel
Re: [systemd-devel] systemd-fsck-root semantics
On Wed, Jul 2, 2014 at 1:13 PM, Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl wrote:
> Thinking about it, I'm not sure how the new systemd would know that systemd-fsck@dev-something.service from the initramfs is the same thing as systemd-fsck-root.service. Maybe that's the problem? Currently systemd-fsck-root.service does nothing if / is mounted rw, which of course is used by almost everybody, so I think you might be using codepaths that are rarely tested.

If I'm reading things right, the default behaviour (when no hints are supplied on the kernel cmdline) is actually:

1. systemd runs fsck on root from the initramfs
2. systemd mounts the root fs ro
3. switch-root onto the real system
4. systemd-fsck-root runs
5. systemd-remount-fs remounts / as rw

Also just noticed another interesting thing - systemd-fsck-root.service is only loaded dynamically, when /etc/fstab has a non-zero passno for /. So maybe the idea is that anyone running a regular and modern dracut/systemd setup sets passno=0 for / in fstab, with the knowledge that fsck of / is done by the initramfs.

Daniel
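For illustration, a root entry with the fsck pass number (the sixth fstab field) set to 0, on the assumption described above that the initramfs has already checked the device; the UUID below is a placeholder:

```
# <device>                                 <mount>  <type>  <options>  <dump>  <passno>
UUID=0a1b2c3d-0000-0000-0000-000000000000  /        ext4    defaults   1       0
```

With passno 0 the fstab-generator does not pull in systemd-fsck-root.service at all, matching the behaviour observed above.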
Re: [systemd-devel] systemd-fsck-root semantics
On Wed, Jul 2, 2014 at 1:36 PM, Lennart Poettering lenn...@poettering.net wrote:
>> Then the system continues booting, switches root, and then systemd-fsck-root.service starts from the root fs, and runs fsck on / again. This is the bit I don't understand - we already checked from the initramfs, why check again now?
>
> I think the idea is that the unit is still around, hence won't get started a second time.

dracut doesn't include systemd-fsck-root in the initramfs. I think there is good reason for that - systemd-fsck-root causes fsck to run on /, but at this point in the initramfs, / is a ramdisk and the thing that needs checking is at /sysroot.

>> Also, systemd-fsck-root.service in itself seems a little questionable, is it really safe in any context to run fsck on a mounted partition? That could modify data structures which have already been cached in memory in the kernel fs driver. In fact, e2fsck refuses to run on partitions that are mounted, even ones that are ro.
>
> Well this is how things were traditionally done on initrd-less systems. It's really a horrible thing to do, and people really shouldn't do it. I certainly wouldn't run my systems like that.

I agree, but am a little worried that systemd might do this kind of by default. I now realise that this is a distro choice; they should probably set passno=0 in fstab. I wonder if they actually do...

Daniel
Re: [systemd-devel] udev interferes with sfdisk partition changes
On Mon, Nov 19, 2012 at 7:57 PM, Lennart Poettering lenn...@poettering.net wrote:
> parted is actually capable of doing this properly and settles the device. Have you looked into that?

Looks like as of version 3.0, parted can no longer resize partitions. The functionality got dropped.

Daniel
[systemd-devel] udev interferes with sfdisk partition changes
Hi,

At OLPC we use sfdisk to grow a partition from the initramfs on first boot. However, we are finding this to be unreliable. Working with Fedora 18, systemd-195, and OLPC XO-4 hardware.

The core of the problem seems to be udev's response to BLKRRPART - the ioctl used to ask the kernel to re-read the partition table. When BLKRRPART is executed, udevadm monitor shows KERNEL events for the device partitions being removed and then added again - even when no partition change has happened. A quick and easy way to fire off a BLKRRPART (and nothing else):

  sfdisk -R /dev/mmcblk1

By inserting some printks in the kernel, I can see that firing off BLKRRPART causes systemd-udevd to open and close the device, presumably in response to the KERNEL events mentioned above, even when no partitioning changes have happened.

With that background, here is what happens when sfdisk is run to modify the partition table on a device which is fully settled and is not mounted:

1. sfdisk fires off BLKRRPART immediately, before making any changes, to check that the device isn't in use.
2. This causes the KERNEL events, and udev opens and closes the device shortly after.
3. sfdisk writes the modified new partition table.
4. sfdisk fires BLKRRPART again, asking the kernel to note the new partition table.

The problem here is that #2 runs in parallel, being a separate process. On a quite regular basis, the sequence actually happens like this:

1. sfdisk fires off BLKRRPART immediately, before making any changes, to check that the device isn't in use.
2. This causes the KERNEL events, and udev opens the device.
3. sfdisk writes the modified new partition table.
4. sfdisk fires BLKRRPART again, asking the kernel to note the new partition table. This fails, since the device is open.
5. udev closes the device.

In such a case, sfdisk has failed to update the partition setup visible to the user, claiming that the device is probably mounted or something, for no immediately obvious reason (eh? the device wasn't in use by anyone!).

This can be reproduced within seconds by running a script in a loop that uses sfdisk to do the following to an unmounted and otherwise unused SD card:

1. Make the first partition a little smaller.
2. Grow the first partition to its full size.

Within seconds you hit the failure condition noted above.

How can/should this be corrected? It's annoying that udev is performing the device open here, but given the events from the kernel, it's probably a sensible thing to do. Should the kernel not be generating these events when the partition table hasn't changed? Or should sfdisk acknowledge these possible races and try these ioctls a few times in a loop before bailing out? Maybe this is all a consequence of the kernel's lack of a "lock this device down, I want to partition it" interface?

Thanks
Daniel
Re: [systemd-devel] Transient hostname default behaviour
On Mon, Oct 29, 2012 at 7:19 PM, Lennart Poettering lenn...@poettering.net wrote:
>> One more thing to add: It looks like /etc/sysconfig/network is still being parsed even though the above link suggests otherwise. Putting HOSTNAME=myhostname in /etc/sysconfig/network sets the default transient hostname.
>
> Hmm. That sounds like NM or so is reading the file and applying it?

No, systemd does. In git it doesn't, but v195 does read /etc/sysconfig/network on Fedora if /etc/hostname is no good. That caused some of the above confusion.

My other problem was that I did not have /etc/hostname available early enough during boot. Still can't explain all the behaviour I was seeing, but with that fixed, things seem to be behaving.

It looks like some NM work may be pending here:
https://bugzilla.redhat.com/show_bug.cgi?id=831735

And I filed a bug for the dhclient issue identified by Zbyszek:
https://bugzilla.redhat.com/show_bug.cgi?id=871521

Thanks
Daniel
[systemd-devel] Argument quoting in Exec lines
Hi,

Not sure whether to submit a bug report or documentation patch for this.

ExecStart=/usr/bin/foo --arg1=foo bar

causes foo to be run with 2 command line args:

1. --arg1=foo
2. bar

Not what I was hoping for. Whereas:

ExecStart=/usr/bin/foo "--arg1=foo bar"

does what I want, just 1 command line arg:

1. --arg1=foo bar

Took me a while to figure that out. Is this the desired behaviour?

Thanks,
Daniel
Re: [systemd-devel] Set environment variable system-wide
On Wed, Aug 8, 2012 at 10:56 AM, Lennart Poettering lenn...@poettering.net wrote:
> I don't think anything can be considered clean if it involves setting system-wide env vars. There must be another way to teach Python optimization system-wide...

I have yet to find the other way that you mention. Anyway, I can agree that this is more of a python problem than a systemd one.

Thanks for the info!
Daniel
[systemd-devel] Set environment variable system-wide
Hi,

I thought I read somewhere that systemd offers a mechanism to set an environment variable system-wide - i.e. the variable assignment will be present in all the processes started by systemd. But I can't find where I read this, or how to use it. Does this functionality exist, or am I getting confused with something else?

Thanks
Daniel
Re: [systemd-devel] Set environment variable system-wide
On Mon, Aug 6, 2012 at 3:09 PM, Kay Sievers k...@vrfy.org wrote:
> systemctl set-environment ... ?

Maybe that's what I read about. In this case I'm looking to set it in early boot though, so that it affects all spawned processes from the very start. Is there a nice way of doing this?

> But it's in almost all use cases wrong to use anything like that isn't broken unix legacy that expects it that way.

I'm having trouble parsing that sentence. In this case I'm looking for a clean way to set PYTHONOPTIMIZE system-wide (to enable Python's optimized bytecode usage).

Daniel
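Since no system-wide mechanism turned up in the thread, one workaround of the era is to set the variable per service, in each unit that needs it. An illustrative fragment - the daemon path is made up:

```
[Service]
# Per-service rather than system-wide, but supported by systemd of this era
Environment=PYTHONOPTIMIZE=1
ExecStart=/usr/bin/my-python-daemon
```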
[systemd-devel] Diagnosing hang on reboot
Hi,

We're encountering a systemd hang on reboot which is proving hard to debug, on the OLPC XO platform (systemd-44 on Fedora 17). It doesn't happen every time, but it is frequent: when running a system that reboots once every 2-3 minutes, it reproduces within an hour (usually much quicker).

Can anyone suggest debugging techniques for the following situation, or are there similar-sounding bug reports already that might provide clues?

- /sbin/reboot is run, and exits with code 0, without producing any output on stderr or stdout.
- The reboot process is definitely initiated, because plymouth's shutdown screen comes up, and the serial console getty is stopped.
- The hang happens with the plymouth shutdown splash on-screen, and the system continues responding to keypresses (showing/hiding the plymouth splash).
- Disabling the plymouth shutdown splash doesn't solve the hang, and no interesting messages appear on the console either.
- The system no longer responds to sysrq over serial (even when the kernel sysrq_always_enabled parameter is used).
- The shutdown scripts in /usr/lib/systemd/system-shutdown are not called.
- Enabling systemd debugging via kernel parameters systemd.log_level=debug systemd.log_target=kmsg causes the hang not to happen (left a system reboot-looping with this configuration for 24 hours without hitting the issue).

Any tips appreciated. This is perhaps unlikely to be a systemd issue, because when we reboot from a normal session, we don't hit this issue (but I think systemd could help us find the problem?). We hit this issue when rebooting after running our manufacturing tests, which aim to hammer the system very hard and activate as many components as possible (microphone, camera, screen, disk, RAM check, ...).

These tests are activated as follows:

1. During boot, runin-check.service (runs early) notes that the laptop's manufacturing data says that the system should run manufacturing tests rather than starting a real session. The runin-check program then calls "systemctl isolate runin.target".
2. runin.target starts the runin-main program, which opens an X session and kicks off all kinds of tests.

Here are the debug logs from a successful boot-to-reboot cycle (when things work OK):
http://dev.laptop.org/~dsd/20120704/runin-verbose.txt

At 15.991475, runin-check runs "systemctl isolate runin.target".
At 18.969571, runin tests start.
At 30.505676, runin tests fail and the reboot process is initiated. (I deliberately triggered the fail so that I don't have to wait a long time for the reboot to happen.)
At 36.818280, /sbin/reboot is called by runin.
At 46.956082, the scripts in /usr/lib/systemd/system-shutdown are called.

Here are the relevant service/target files:

runin-check.service:

[Unit]
Description=Check whether to run OLPC run-in tests
DefaultDependencies=no
Requires=olpc-configure.service
After=olpc-configure.service
Before=basic.target

[Service]
Type=oneshot
ExecStart=/runin/runin-check

[Install]
WantedBy=basic.target

runin.target:

[Unit]
Description=OLPC run-in tests
AllowIsolate=true
DefaultDependencies=no
Requires=runin.service
After=olpc-configure.service
Wants=plymouth-quit.service plymouth-quit-wait.service

runin.service:

[Unit]
Description=OLPC run-in tests
DefaultDependencies=no
Wants=udev-settle.service
After=udev-settle.service plymouth-quit.service plymouth-quit-wait.service

[Service]
ExecStart=/runin/runin-main

Any help appreciated; this is currently the last blocking bug we have preventing our latest software image (our first systemd-based release!) from entering mass-production in the factory.

Thanks!
Daniel
Re: [systemd-devel] Showing plymouth shutdown splash earlier during shutdown process
On Fri, Jun 1, 2012 at 4:03 AM, Michal Schmidt mschm...@redhat.com wrote:
> On 05/31/2012 05:46 PM, Daniel Drake wrote:
>> In the case of reboot (or poweroff), what does this mean? plymouth-reboot.service is queued to start, and prefdm.service is queued to stop. What does After= mean in this context, who comes first?
>
> 'man systemd.unit' says:
> "If one unit with an ordering dependency on another unit is shut down while the latter is started up, the shut down is ordered before the start-up regardless whether the ordering dependency is actually of type After= or Before=."

Thanks for pointing that out.

>> It is like it is waiting for those services to stop before executing. How can I find out why?
>
> Based on the above rule, check all the ordering dependencies the unit has:
> systemctl show -p After -p Before plymouth-reboot.service

I followed this down a couple of levels and didn't find the answer. Probably need to go further, I'll see if I can find some time to do that soon.

Thanks
Daniel
Re: [systemd-devel] Showing plymouth shutdown splash earlier during shutdown process
On Wed, Apr 11, 2012 at 10:51 AM, Daniel Drake d...@laptop.org wrote:
> On Wed, Apr 11, 2012 at 9:42 AM, Lennart Poettering lenn...@poettering.net wrote:
>>> I tried modifying e.g. plymouth-reboot.service to have:
>>> Before=reboot.service shutdown.target umount.target final.target reboot.target
>>
>> That suggests that the plymouth client tool is not waiting for the operation to finish but just asynchronously queueing the request, which is something that should be fixed in plymouth.
>
> You're probably right, but before we get there, even with the above Before= change, systemd seems to be starting plymouth-reboot.service rather late in the process. Logs from a reboot with the Before= change made as above:
> http://dev.laptop.org/~dsd/20120411/shutdown2.txt
> Any ideas?

Bump. I filed a bug for the plymouth-quitting-before-command-processed issue:
https://bugs.freedesktop.org/show_bug.cgi?id=50544
and I worked around it locally.

But still, the plymouth splash is being shown late in the process, as shown in the above log. plymouth-reboot.service has:

After=getty@tty1.service prefdm.service plymouth-start.service
Before=reboot.service

In the case of reboot (or poweroff), what does this mean? plymouth-reboot.service is queued to start, and prefdm.service is queued to stop. What does After= mean in this context, who comes first?

Either way, plymouth-reboot.service seems to be run a long time after prefdm finishes - about 3.5 seconds. And after running it a few times I am seeing that it *always* starts after a whole bunch of other services have been stopped - in the above log: diskspacerecover.service, alsa-store.service, systemd-random-seed-save.service, and maybe more. It is like it is waiting for those services to stop before executing. How can I find out why?

Thanks
Daniel
Re: [systemd-devel] [PATCH] Don't skip bind mounts on shutdown
Hi,

On Wed, Apr 25, 2012 at 9:46 AM, Daniel Drake d...@laptop.org wrote:
> This reverts commits d72238fcb34abc81aca97c5fb15888708ee937d3 and f3accc08. OLPC runs / as a bind-mount, so this must be remounted RO during shutdown to avoid corruption. As Lennart can't recall the exact reasons for making the shutdown code skip bind mounts, revert to previous behaviour to solve the issue for OLPC.
> http://lists.freedesktop.org/archives/systemd-devel/2012-April/004957.html

Any news on this patch?

Thanks
Daniel

---
 src/core/umount.c | 19 ++-----------------
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/src/core/umount.c b/src/core/umount.c
index 488e1e4..85b7824 100644
--- a/src/core/umount.c
+++ b/src/core/umount.c
@@ -37,7 +37,6 @@ typedef struct MountPoint {
         char *path;
         dev_t devnum;
-        bool skip_ro;
         LIST_FIELDS (struct MountPoint, mount_point);
 } MountPoint;

@@ -72,8 +71,6 @@ static int mount_points_list_get(MountPoint **head) {
         for (i = 1;; i++) {
                 int k;
                 MountPoint *m;
-                char *root;
-                bool skip_ro;

                 path = p = NULL;

@@ -81,7 +78,7 @@ static int mount_points_list_get(MountPoint **head) {
                            "%*s "       /* (1) mount id */
                            "%*s "       /* (2) parent id */
                            "%*s "       /* (3) major:minor */
-                           "%ms "       /* (4) root */
+                           "%*s "       /* (4) root */
                            "%ms "       /* (5) mount point */
                            "%*s"        /* (6) mount options */
                            "%*[^-]"     /* (7) optional fields */
@@ -90,8 +87,7 @@ static int mount_points_list_get(MountPoint **head) {
                            "%*s "       /* (10) mount source */
                            "%*s"        /* (11) mount options 2 */
                            "%*[^\n]",   /* some rubbish at the end */
-                           root,
-                           path)) != 2) {
+                           path)) != 1) {

                         if (k == EOF)
                                 break;

@@ -101,11 +97,6 @@ static int mount_points_list_get(MountPoint **head) {
                         continue;
                 }

-                /* If we encounter a bind mount, don't try to remount
-                 * the source dir too early */
-                skip_ro = !streq(root, "/");
-                free(root);
-
                 p = cunescape(path);
                 free(path);

@@ -131,7 +122,6 @@ static int mount_points_list_get(MountPoint **head) {
                 }

                 m->path = p;
-                m->skip_ro = skip_ro;
                 LIST_PREPEND(MountPoint, mount_point, *head, m);
         }

@@ -448,11 +438,6 @@ static int mount_points_list_remount_read_only(MountPoint **head, bool *changed)

         LIST_FOREACH_SAFE(mount_point, m, n, *head) {

-                if (m->skip_ro) {
-                        n_failed++;
-                        continue;
-                }
-
                 /* Trying to remount read-only */
                 if (mount(NULL, m->path, NULL, MS_MGC_VAL|MS_REMOUNT|MS_RDONLY, NULL) == 0) {
                         if (changed)
-- 
1.7.10
[systemd-devel] [PATCH] Don't skip bind mounts on shutdown
This reverts commits d72238fcb34abc81aca97c5fb15888708ee937d3 and f3accc08.

OLPC runs / as a bind-mount, so this must be remounted RO during shutdown to avoid corruption. As Lennart can't recall the exact reasons for making the shutdown code skip bind mounts, revert to previous behaviour to solve the issue for OLPC.

http://lists.freedesktop.org/archives/systemd-devel/2012-April/004957.html
---
 src/core/umount.c | 19 ++-----------------
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/src/core/umount.c b/src/core/umount.c
index 488e1e4..85b7824 100644
--- a/src/core/umount.c
+++ b/src/core/umount.c
@@ -37,7 +37,6 @@ typedef struct MountPoint {
         char *path;
         dev_t devnum;
-        bool skip_ro;
         LIST_FIELDS (struct MountPoint, mount_point);
 } MountPoint;

@@ -72,8 +71,6 @@ static int mount_points_list_get(MountPoint **head) {
         for (i = 1;; i++) {
                 int k;
                 MountPoint *m;
-                char *root;
-                bool skip_ro;

                 path = p = NULL;

@@ -81,7 +78,7 @@ static int mount_points_list_get(MountPoint **head) {
                            "%*s "       /* (1) mount id */
                            "%*s "       /* (2) parent id */
                            "%*s "       /* (3) major:minor */
-                           "%ms "       /* (4) root */
+                           "%*s "       /* (4) root */
                            "%ms "       /* (5) mount point */
                            "%*s"        /* (6) mount options */
                            "%*[^-]"     /* (7) optional fields */
@@ -90,8 +87,7 @@ static int mount_points_list_get(MountPoint **head) {
                            "%*s "       /* (10) mount source */
                            "%*s"        /* (11) mount options 2 */
                            "%*[^\n]",   /* some rubbish at the end */
-                           root,
-                           path)) != 2) {
+                           path)) != 1) {

                         if (k == EOF)
                                 break;

@@ -101,11 +97,6 @@ static int mount_points_list_get(MountPoint **head) {
                         continue;
                 }

-                /* If we encounter a bind mount, don't try to remount
-                 * the source dir too early */
-                skip_ro = !streq(root, "/");
-                free(root);
-
                 p = cunescape(path);
                 free(path);

@@ -131,7 +122,6 @@ static int mount_points_list_get(MountPoint **head) {
                 }

                 m->path = p;
-                m->skip_ro = skip_ro;
                 LIST_PREPEND(MountPoint, mount_point, *head, m);
         }

@@ -448,11 +438,6 @@ static int mount_points_list_remount_read_only(MountPoint **head, bool *changed)

         LIST_FOREACH_SAFE(mount_point, m, n, *head) {

-                if (m->skip_ro) {
-                        n_failed++;
-                        continue;
-                }
-
                 /* Trying to remount read-only */
                 if (mount(NULL, m->path, NULL, MS_MGC_VAL|MS_REMOUNT|MS_RDONLY, NULL) == 0) {
                         if (changed)
-- 
1.7.10
Re: [systemd-devel] Safe handling of root filesystem on shutdown
Hi Lennart,

On Thu, Apr 12, 2012 at 8:46 AM, Daniel Drake d...@laptop.org wrote:
> The mmcblk0p2 message above suggests that / is being re-mounted readonly, and also on next boot the system no longer complains about / not being cleanly unmounted. Tested with 3 reboots to be sure.
>
> Reverting these commits seems like a good solution to me. If you go ahead with this, I'd also appreciate it if you could apply the fix to the F17 package next time you are touching things there.

Bump :) Can these patches be reverted then? If it makes your life easier, I've attached a patch to do so.

At this point I'd also like to get this sorted in F17 sooner rather than later. If you don't object, I'll patch this into the F17/F18 packages and submit an update once it is fixed in systemd git.

Thanks,
Daniel

0001-Don-t-skip-bind-mounts-on-shutdown.patch
Description: Binary data
Re: [systemd-devel] Safe handling of root filesystem on shutdown
On Thu, Apr 12, 2012 at 4:56 AM, Lennart Poettering
lenn...@poettering.net wrote:
> I think I added this logic primarily to make the shutdown loop quiet.
> However I must admit that that's just a guess, and since my commit
> message is disappointingly inconclusive about this I am a bit lost...
>
> If you revert f3accc08, do things look good for you then? Do you get
> any log spew on shutdown?

I had to revert d72238fcb34abc81aca97c5fb15888708ee937d3 first. Then I
reverted f3accc08, and modified systemd-shutdown to log to kmsg so that
I could see the messages before power-down:

[  441.206413] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[  441.239944] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[  441.263633] systemd-shutdown[1]: Unmounting file systems.
[  441.280554] systemd-shutdown[1]: Unmounted /var/lib/random-seed.
[  441.297471] systemd-shutdown[1]: Unmounted /var/lib/dhclient.
[  441.320312] systemd-shutdown[1]: Unmounted /var/lib/dbus.
[  441.340072] systemd-shutdown[1]: Unmounted /dev/hugepages.
[  441.355911] systemd-shutdown[1]: Unmounted /sys/kernel/debug.
[  441.372049] systemd-shutdown[1]: Unmounted /dev/mqueue.
[  441.387525] systemd-shutdown[1]: Unmounted /home.
[  441.751119] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[  441.831283] systemd-shutdown[1]: Disabling swaps.
[  441.846084] systemd-shutdown[1]: Detaching loop devices.
[  441.864999] systemd-shutdown[1]: Detaching DM devices.
[  442.965933] ACPI: Preparing to enter system sleep state S5
[  443.080153] Power down.

The mmcblk0p2 message above suggests that / is being re-mounted
readonly, and also on next boot the system no longer complains about /
not being cleanly unmounted. Tested with 3 reboots to be sure.

Reverting these commits seems like a good solution to me. If you go
ahead with this, I'd also appreciate it if you could apply the fix to
the F17 package next time you are touching things there.

Thanks!
Daniel
[systemd-devel] Safe handling of root filesystem on shutdown
Hi,

On OLPC laptops we are seeing that ext4 complains on every boot that
the filesystem wasn't cleanly unmounted. systemd debug logs of a
shutdown would seem to agree: I can't see where it attempts to remount
/ read-only, as was done with sysvinit.

http://dev.laptop.org/~dsd/20120411/shutdown.txt

Can anyone point out how this is supposed to work, i.e. where is the
code that looks after the / mount during shutdown/reboot?

We do have a bit of a strange fs layout, where our root fs is kept
inside /versions/pristine/X on the root partition. The initramfs takes
care of this with some bind-mount and chroot tricks so that it looks
'normal' afterwards, but maybe something along these lines is confusing
systemd.

Thanks,
Daniel
[systemd-devel] Showing plymouth shutdown splash earlier during shutdown process
Hi,

As can be seen in my logs of a reboot:
http://dev.laptop.org/~dsd/20120411/shutdown.txt

the plymouth shutdown splash is being shown really quite late. As
systemd shuts down fantastically fast, this means that our pretty
shutdown graphic is not being drawn on OLPC laptops. Sometimes the
image is drawn partially, and sometimes it is not drawn at all. (A few
seconds of systemd output text is always visible, though.)

Is there a way to make the plymouth shutdown screen appear earlier? I
tried modifying e.g. plymouth-reboot.service to have:

Before=reboot.service shutdown.target umount.target final.target reboot.target

However, this didn't produce any noticeable difference.

Thanks,
Daniel
Re: [systemd-devel] Safe handling of root filesystem on shutdown
On Wed, Apr 11, 2012 at 9:40 AM, Lennart Poettering
lenn...@poettering.net wrote:
> So on shutdown, after stopping all services, we execute
> systemd-shutdown as PID 1, replacing the normal systemd process. This
> is useful to drop all references to files on disk, so that we can
> remount the disk r/o even on upgrades. systemd-shutdown is basically a
> single loop that tries to umount/read-only mount all file systems it
> finds, as long as this changes the list of active mounts. This code
> also disables all swaps and detaches DM/loop devices in the same loop.

Thanks as always for the fast and good explanation! With that pointer,
I found the problem, see below.

>> We do have a bit of a strange fs layout, where our root fs is kept
>> inside /versions/pristine/X on the root partition. The initramfs
>> takes care of this with some bind-mount and chroot tricks so that it
>> looks 'normal' afterwards, but maybe something along these lines is
>> confusing systemd.
>
> chroot()? Meh, you should not use chroot for these kinds of things...

Actually, we don't use chroot directly. Here's what happens: dracut
mounts the root fs at /sysroot, then in a pre-pivot dracut trigger OLPC
does:

  mkdir /vsysroot
  mount --bind /sysroot/versions/run/6 /vsysroot
  umount /sysroot
  NEWROOT=/vsysroot

Dracut then goes ahead and performs switch_root on $NEWROOT to pivot
onto the real system. (Happy to hear advice on a nicer way to do this.)

When the system finishes booting, /proc/self/mountinfo looks like:
http://dev.laptop.org/~dsd/20120411/mountinfo.txt

Now, in systemd-shutdown we reach mount_points_list_get() in umount.c,
which does:

  /* If we encounter a bind mount, don't try to remount
   * the source dir too early */
  skip_ro = !streq(root, "/");

Hence skip_ro gets set to 1 for our / mount.
mount_points_list_remount_read_only() then ignores the / mount and
leaves it as RW during shutdown.

I don't really understand the reasoning for the above behaviour for
bind mounts. Would it be acceptable to special-case this condition when
the path in question is /, so that skip_ro does not get set? Or are
there other options available?

Thanks,
Daniel
Re: [systemd-devel] Showing plymouth shutdown splash earlier during shutdown process
On Wed, Apr 11, 2012 at 9:42 AM, Lennart Poettering
lenn...@poettering.net wrote:
>> I tried modifying e.g. plymouth-reboot.service to have:
>> Before=reboot.service shutdown.target umount.target final.target
>> reboot.target
>
> That suggests that the plymouth client tool is not waiting for the
> operation to finish but just asynchronously queueing the request,
> which is something that should be fixed in plymouth.

You're probably right, but before we get there: even with the above
Before= change, systemd seems to be starting plymouth-reboot.service
rather late in the process. Logs from a reboot with the Before= change
made as above:

http://dev.laptop.org/~dsd/20120411/shutdown2.txt

Any ideas?

Thanks
Daniel
Re: [systemd-devel] [PATCH] Makefile.am: reduce linked libraries
On Tue, Apr 10, 2012 at 6:21 AM, Kay Sievers k...@vrfy.org wrote:
> Libattr and libcap are gone now from the tools which do not need them:
> http://cgit.freedesktop.org/systemd/systemd/commit/?id=d7832d2c6e0ef5f2839a2296c1cc2fc85c7d9632

Great! Thanks for slimming up my initramfs a bit :)

Daniel
Re: [systemd-devel] [PATCH] Makefile.am: reduce linked libraries
Hi,

On Tue, Nov 1, 2011 at 4:39 PM, Lennart Poettering
lenn...@poettering.net wrote:
> Hmm, let me see if I get this right: with this patch applied we'd
> build cap and selinux support into libsystemd-basic.la, but we
> wouldn't link against the respective libraries, but instead do that in
> the binaries which pull in the .la?
>
> I am not sure I like this. I mean, I understand the goal, and it's a
> good one, but I think if we do this we should do this properly, and
> split up util.c so that the stuff that uses caps and selinux is
> independent of the rest and can be pulled in individually as needed.

This is true for the libcap case: libcap is only used by util.c, so it
is easy to split out. But with selinux included, the task is more
complicated. For example, label.c (part of libsystemd-basic) also uses
libselinux, so we would need to move it out somewhere else (let's say
we put it in a new library: libsystemd-extra). But the label_ functions
are used in several places inside util.c itself. Things are tangled. If
I were to go down this path further, I think we'd end up moving a huge
amount of stuff to libsystemd-extra.

Instead, do any of the following options make sense?

- Special-case systemd-timestamp because it's used in the initramfs.
  Instead of linking against libsystemd-basic, just pull util.c
  directly into the compilation and link with -lrt.
- Create a new shared library used in compilation
  (libsystemd-verybasic?), initially containing only the time-related
  functions used by systemd-timestamp. Link systemd-timestamp against
  that, and be happy.
- While linking executables (or immediately after), perform some checks
  to see if the linked libraries are *really* necessary, and if they
  aren't, drop the links. vim does this via the attached script.

Thanks,
Daniel
Re: [systemd-devel] [PATCH] Makefile.am: reduce linked libraries
On Wed, Apr 4, 2012 at 10:36 AM, Daniel Drake d...@laptop.org wrote:
> But with selinux included, the task is more complicated. For example,
> label.c (part of libsystemd-basic) also uses libselinux, so we need
> to move it out somewhere else (let's say we put it in a new library:
> libsystemd-extra). But the label_ functions are used in several
> places inside util.c itself. Things are tangled. If I were to go down
> this path further, I think we'd end up moving a huge amount of stuff
> to libsystemd-extra.

I just realised that udevd links against libselinux, so even if we fix
systemd-timestamp I still won't be winning on that front - and I don't
see an easy way to keep udevd out of a dracut initramfs. However,
dropping the link against libcap (which also includes libattr) would be
nice. Here is a patch to do that.

Now that udev is included in systemd, I will use this opportunity to
moan a little about the next dependency lover that gets included in the
initramfs: udevadm.

/usr/bin/udevadm
	linux-gate.so.1 =>  (0xb771f000)
	libselinux.so.1 => /lib/libselinux.so.1 (0xb76b9000)
	libblkid.so.1 => /lib/libblkid.so.1 (0xb768f000)
	libkmod.so.2 => /lib/libkmod.so.2 (0xb7677000)
	librt.so.1 => /lib/librt.so.1 (0xb766e000)
	libc.so.6 => /lib/libc.so.6 (0xb74be000)
	libdl.so.2 => /lib/libdl.so.2 (0xb74b9000)
	/lib/ld-linux.so.2 (0x4610a000)
	libuuid.so.1 => /lib/libuuid.so.1 (0xb74b3000)
	liblzma.so.5 => /lib/liblzma.so.5 (0xb748a000)
	libz.so.1 => /lib/libz.so.1 (0xb7474000)
	libpthread.so.0 => /lib/libpthread.so.0 (0xb7459000)

Don't suppose there is any obvious reduction possible here?

Thanks,
Daniel
Re: [systemd-devel] [PATCH] Makefile.am: reduce linked libraries
On Wed, Apr 4, 2012 at 10:53 AM, Daniel Drake d...@laptop.org wrote:
> However, dropping the link against libcap (which also includes
> libattr) would be nice. Here is a patch to do that.
Re: [systemd-devel] [PATCH] Makefile.am: reduce linked libraries
On Wed, Apr 4, 2012 at 2:02 PM, Kay Sievers k...@vrfy.org wrote:
> Right, when udevadm is there, then there is udevd, which definitely
> needs all of them.

That's a good point - and if udevd really needs them, then there's no
escaping. So I guess there is nothing to gain here :(

Daniel
Re: [systemd-devel] systemd hangs on shutdown
On Tue, Oct 11, 2011 at 1:44 AM, Lennart Poettering
lenn...@poettering.net wrote:
> So this is the big issue here, I believe. If you look at 87.293308
> you'll see that tmp.mount is suddenly mounted again for some reason,
> which systemd then takes as a hint to get rid of
> poweroff.target/poweroff.service, since they conflict with that.
>
> The key to the mystery here is figuring out why systemd suddenly sees
> those mount points coming back. It would be good to figure out what
> the mount table is when that happens.

Thanks for looking carefully at this! It looks like the problem is that
we had /tmp mounted as tmpfs, then mounted as tmpfs again on top. We've
had this for a long time (unintentionally), but it hadn't surfaced as
an issue until now - we didn't even realise. After removing the
duplicate mount setup so that /tmp is only mounted once, the system
shuts down.

Thanks,
Daniel
[systemd-devel] Excessive linking of systemd-timestamp
Hi,

Running systemd-36 on Fedora 16.

$ ldd /lib/systemd/systemd-timestamp
	linux-gate.so.1 =>  (0x00a84000)
	libselinux.so.1 => /lib/libselinux.so.1 (0x0059c000)
	libcap.so.2 => /lib/libcap.so.2 (0x00901000)
	librt.so.1 => /lib/librt.so.1 (0x00a6a000)
	libc.so.6 => /lib/libc.so.6 (0x0011)
	/lib/ld-linux.so.2 (0x009c1000)
	libdl.so.2 => /lib/libdl.so.2 (0x00f09000)
	libattr.so.1 => /lib/libattr.so.1 (0x00f23000)
	libpthread.so.0 => /lib/libpthread.so.0 (0x007be000)

The excessive linking of this tiny application is challenging my
efforts to keep our initramfs slim for our embedded setup. dracut
includes this app in the initramfs by default, and to satisfy its
requirements it results in all those libraries getting added too.
Could this be reduced? I guess all it needs is libc.

Thanks,
Daniel
Re: [systemd-devel] systemd hangs on shutdown
On Thu, Sep 29, 2011 at 2:29 PM, Daniel Drake d...@laptop.org wrote:
> Full log of startup, shutdown, and sysrq dumps at point of first hang
> (before systemd-stdout-syslog-bridge.service wakeup), and second
> hang, then sysrq dumps again:
> http://dev.laptop.org/~dsd/20110929/systemd-shutdown-hang-debug.txt

I've tried to correlate this with the systemd source and unit files,
and I think I might have found something of relevance.

One of the last links in the chain is that poweroff.service gets
started and calls "systemctl --force poweroff", right?

In my log, poweroff.service gets installed to be run:

[   57.887771] systemd[1]: Installed new job poweroff.service/start as 243

but never gets run. By that I mean: when other services are queued to
be started, they later get started with "About to execute" messages,
e.g.

[   57.941081] systemd[1]: Installed new job alsa-store.service/start as 250
[   60.373390] systemd[1]: About to execute: /sbin/alsactl store
[   60.450713] systemd[1]: Forked /sbin/alsactl as 1505
[   60.456367] systemd[1]: alsa-store.service changed dead -> start

However, poweroff.service never gets any of the 'about to execute',
'forked' or 'dead -> start' messages. It actually gets stopped for some
reason, perhaps before it has had a chance to do its thing?

[   57.887771] systemd[1]: Installed new job poweroff.service/start as 243
<snip>
[   87.312551] systemd[1]: Installed new job poweroff.service/stop as 347
[   87.340953] systemd[1]: Job poweroff.service/stop finished, result=done

However, I think it should have been ready to run. From
poweroff.service, its requirements/dependencies are:

Requires=shutdown.target umount.target final.target
After=shutdown.target umount.target final.target

and all of those seem to have finished:

[   87.255264] systemd[1]: Job shutdown.target/start finished, result=done
[   87.284275] systemd[1]: Job final.target/start finished, result=done
[   87.353693] systemd[1]: Job umount.target/stop finished, result=done

Am I onto something here, or am I going in the wrong direction?
Debugging tips much appreciated.

cheers
Daniel
Re: [systemd-devel] Problems with rootfs over nfs
On 15 May 2011 15:16, Kay Sievers kay.siev...@vrfy.org wrote:
> Just a first quick check of an issue we ran into with ATA disks:
> what's in /proc/sys/kernel/hotplug before you shut down? Or what's
> CONFIG_UEVENT_HELPER_PATH in your kernel setup? It must be "" (empty)
> on modern systems, otherwise the kernel will try to exec() binaries
> all the time and keep the system's rootfs busy.

I'm also having trouble shutting down with systemd, and I have
CONFIG_UEVENT_HELPER_PATH=/sbin/hotplug, so I'll try this solution.
Thanks.

Just a quick question: is the same also true for Fedora 14
(upstart-1.2, udev-161)? i.e. can and should that config option be
cleared under that setup too? I guess so, given that /sbin/hotplug
doesn't even exist.

Thanks,
Daniel
Re: [systemd-devel] Failed to connect to socket /org/freedesktop/systemd1/private
On 10 May 2011 22:26, Daniel Drake d...@laptop.org wrote:
> If I log the error after it tries to connect to /run/systemd/private,
> I get:
>   Failed to connect to socket /run/systemd/private: No such file or directory
> Indeed, there's nothing at that path, and the only thing in
> /run/systemd/ is an empty directory at /run/systemd/ask-password

For completeness: this is a conflict between systemd and Fedora's
readonly-root system. Filed at
https://bugzilla.redhat.com/show_bug.cgi?id=704783

Now systemd is a bit happier :)

Daniel
Re: [systemd-devel] Failed to connect to socket /org/freedesktop/systemd1/private
On 9 May 2011 22:46, Lennart Poettering lenn...@poettering.net wrote:
> You are lacking autofs4 support in the kernel. You should fix this
> first.

I'm not; autofs4 is present. We're working on figuring out why systemd
complains at https://bugs.freedesktop.org/show_bug.cgi?id=36993

> Your udev in your initrd is a different version than on your system.
> You should really fix that too.

Ah, that explains that part. Done.

> /etc/mtab is not a symlink to /proc/mounts. Please fix.

Done.

> Your systemd userspace seems to be out-of-date, and not from the same
> package that installed /bin/systemd. i.e. the abstract namespace
> socket /org/freedesktop/systemd1/private was used in older systemd
> versions, but has since moved to /run/systemd/private. Your userspace
> still tries to access the old socket, but systemd 26 (which you
> appear to be running) uses the new one.

This image is freshly made, starting from an empty disk, from Fedora
rawhide as of yesterday, so there should be no mixing going on. But
I've debugged it a little to see why it's using the wrong path:

We are reaching bus_connect() in dbus-common.c. It first tries to
connect to /run/systemd/private, but this fails. It then falls back on
/org/freedesktop/systemd1/private. This fails too, so it returns an
error, and only the last error (the one about
/org/freedesktop/systemd1/private) gets logged.

If I log the error after it tries to connect to /run/systemd/private,
I get:

  Failed to connect to socket /run/systemd/private: No such file or directory

Indeed, there's nothing at that path, and the only thing in
/run/systemd/ is an empty directory at /run/systemd/ask-password.

So I then looked at the systemd side of things. bus_init_private() in
dbus.c does create this socket just fine (via dbus_server_listen). So
the question is when and why does it disappear? I sprinkled debug
statements throughout the code and determined that the socket
disappears after manager_loop() has iterated around 47 times. Does
this give you any ideas?

Any suggestions for next debugging steps? Is there an easy way to make
manager_loop() log exactly what it does on each iteration?

Thanks,
Daniel
Re: [systemd-devel] systemd fails to boot OLPC XO-1.5
On 7 May 2011 23:43, Daniel Drake d...@laptop.org wrote:
> On 7 May 2011 23:30, Kay Sievers kay.siev...@vrfy.org wrote:
>> You need capabilities in your kernel, or comment its use out, in the
>> service file.
>
> I think I have capabilities in my kernel: CONFIG_SECURITY=y, which
> means security/capability.c gets compiled in. Were you thinking of
> something else?

Commenting out CapabilityBoundingSet from systemd-kmsg-syslogd.service
does fix the issue and allow boot to continue. Thanks!

Is this a systemd bug (maybe it should ignore CapabilityBoundingSet
lines when capabilities aren't available?) or do I need to decide
between hacking systemd unit files or going with this requirement?

I looked further. The systemd.exec man page pointed me to the
capabilities(7) man page, which says:

  Removing capabilities from the bounding set is only supported if file
  capabilities are compiled into the kernel
  (CONFIG_SECURITY_FILE_CAPABILITIES).

That option doesn't exist in the kernel any more; it was removed by:

  commit b3a222e52e4d4be77cc4520a57af1a4a0d8222d1
  Author: Serge E. Hallyn se...@us.ibm.com
  Date:   Mon Nov 23 16:21:30 2009 -0600

      remove CONFIG_SECURITY_FILE_CAPABILITIES compile option

That commit made it unconditionally on, in agreement with this part of
security/Makefile in modern kernels:

  # always enable default capabilities
  obj-y += commoncap.o

So, I don't think it's possible to build a kernel without capabilities
support. The problem must be something else (but commenting out those
CapabilityBoundingSet lines does work around the problem). Any ideas /
next debugging steps?

I filed a bug for the /sys/kernel/security problem:
https://bugs.freedesktop.org/show_bug.cgi?id=36993

Thanks,
Daniel
[systemd-devel] Failed to connect to socket /org/freedesktop/systemd1/private
Hi,

Another systemd error I am encountering is:

  Failed to get D-Bus connection: Failed to connect to socket
  /org/freedesktop/systemd1/private: Connection refused

The message appears a lot throughout boot; full logs here:
http://dev.laptop.org/~dsd/20110509/systemd-boot.txt

It also means I can't run systemctl to diagnose other issues I am
having:

  # systemctl status sys-kernel-security.automount
  Failed to get D-Bus connection: Failed to connect to socket
  /org/freedesktop/systemd1/private: Connection refused

Any thoughts or things to check for?

Thanks,
Daniel