Re: [systemd-devel] Bad accelerometer values cause incorrect screen rotation

2019-09-06 Thread Daniel Drake
On Thu, Sep 5, 2019 at 9:00 PM Bastien Nocera  wrote:
> Daniel, if you run into many more problems, there's also the
> possibility of adding a boot argument to disable the accelerometer (or
> maybe its effects?), either in iio-sensor-proxy or gnome-shell.

Thanks for the suggestion. Manually adding something through the
bootloader menu may indeed be a bit more practical than the laptop
acrobatics workaround.
For cases where we know which driver is used this can probably already
be done, by adding a modprobe.blacklist= boot arg.

I appreciate the quick action on the HP laptop case. Let's see how
much that reduces the problem occurrence rate. Thanks!

Daniel
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel

Re: [systemd-devel] Bad accelerometer values cause incorrect screen rotation

2019-09-05 Thread Daniel Drake
On Thu, Sep 5, 2019 at 6:07 PM Bastien Nocera  wrote:
> I've read through this, and I'm happy blacklisting the hp_accel driver
> in code. For the other devices, I'd rather leave it as-is.

That would indeed avoid most of the problem cases that I've seen, including
the current one, which is probably enough to stop me grumbling for another
year or so until this happens again in some other context :)
So I support that idea. Do you have any preference on where we blacklist it?

In the hwdb it's quite easy to match DMI vendor HP & driver lis3lv02d.
But we'd really want a new way of saying "ignore the accelerometer" as
ACCEL_POSITION=base doesn't seem like the right way to express that.

Or we could blacklist it in iio-sensor-proxy but since there's no
mention of hp_accel in the udev properties for the device (you just
get the driver as lis3lv02d), you'd need to grab the DMI vendor
name from /sys/class/dmi/id or something like that.
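A minimal sketch of the iio-sensor-proxy-side option, assuming the udev driver name is `lis3lv02d` and that HP machines report either `HP` or `Hewlett-Packard` in `/sys/class/dmi/id/sys_vendor`. The helper names are hypothetical, not existing iio-sensor-proxy functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of a sysfs attribute into buf, without the newline.
 * Returns false if the file cannot be read. */
bool read_sysfs_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return false;
    bool ok = fgets(buf, (int)len, f) != NULL;
    fclose(f);
    if (ok)
        buf[strcspn(buf, "\n")] = '\0';
    return ok;
}

/* Assumption: HP machines use one of these two DMI vendor strings. */
bool is_hp_vendor(const char *sys_vendor)
{
    return strcmp(sys_vendor, "HP") == 0 ||
           strcmp(sys_vendor, "Hewlett-Packard") == 0;
}

/* Hypothetical policy: ignore lis3lv02d accelerometers on HP machines,
 * since udev only reports the generic driver name, not hp_accel. */
bool should_ignore_accel(const char *sys_vendor, const char *driver)
{
    if (!sys_vendor || !driver)
        return false;
    return is_hp_vendor(sys_vendor) && strcmp(driver, "lis3lv02d") == 0;
}

/* Convenience wrapper that reads the vendor from sysfs. */
bool should_ignore_accel_on_this_machine(const char *driver)
{
    char vendor[128];
    if (!read_sysfs_line("/sys/class/dmi/id/sys_vendor", vendor, sizeof vendor))
        return false; /* no DMI data: don't ignore the sensor */
    return should_ignore_accel(vendor, driver);
}
```

Matching on the vendor string is what lets this work without any hp_accel-specific udev property. The cost is that it also catches any other lis3lv02d user on an HP machine.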

> > When this unfortunate situation happens, the user experience is
> > really
> > terrible. Except for workarounds that involve going to the command
> > line, the best workaround under GNOME seems to be to physically
> > rotate
> > the device into a position that causes the screen orientation to be
> > normal/unrotated, then while maintaining and holding the device in
> > that highly awkward position with one hand, try your very best to
> > manipulate the mouse cursor with your other hand and navigate the
> > menu
> > to enable Orientation Lock.
>
> FYI, Windows+O in GNOME to toggle the orientation lock setting.

Good to know, thanks! I just tried though and it's also seriously
difficult... Especially because the Windows key is quite a distance
from the O key, it's really hard to press this key combo with one hand
when you're busy trying to sustain the device at a fixed angle in an
awkward position with your other hand.

> Where would we get this information? From the same DSDT that doesn't
> have enough information? That doesn't sound like a good idea.

My initial idea is DMI/DSDT plus a whitelist; I realise it's not ideal
but I'm trying to think towards something that (in my eyes) would be
better than the current state.

> If we disable iio-sensor-proxy's functionality by default, I'll be sent
> more bug reports than I already receive from folks where the sensor
> drivers aren't working or not compiled in, so that's a big no-no from
> me.

In my eyes, having some users that accidentally don't get their screens
rotated by the accelerometer (with a relatively simple fix of
whitelisting the product) is a better outcome than having some users
that go through the miserable experience of having your screen rotated
incorrectly (which is hard to recover from and tricky for a developer
to fix without physical device access). This may just be a difference
of opinion.

> Also, it would be pretty trivial changing the default GNOME
> configuration to have the accelerometer pegged to the default
> orientation.

I appreciate the suggestion, especially if it's trivial, but I don't
understand what you wrote here; can you explain a bit more?

Thanks,
Daniel

[systemd-devel] Bad accelerometer values cause incorrect screen rotation

2019-09-05 Thread Daniel Drake
Hi,

Over the years we've seen a bunch of reports of systems that
automatically rotate the display to some incorrect orientation, based
on trusting some accelerometer data values which were not interpreted
correctly. I have another affected system in hand here.

When this unfortunate situation happens, the user experience is really
terrible. Except for workarounds that involve going to the command
line, the best workaround under GNOME seems to be to physically rotate
the device into a position that causes the screen orientation to be
normal/unrotated, then while maintaining and holding the device in
that highly awkward position with one hand, try your very best to
manipulate the mouse cursor with your other hand and navigate the menu
to enable Orientation Lock.

Since the effects of this issue when it bites are so bad, and because
it seems like we aren't winning the "quirk the accelerometer" game
here, I'm wondering if it's time for us to restrict this default
setting of automatic rotation based on accelerometer data to only
situations where:
 1. The product is actually designed to be usable when rotated, and
 2. We have a higher degree of confidence that we're actually
interpreting the accelerometer data correctly
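The two conditions could be expressed as a default-off gate. This is only a sketch: the product whitelist and the `mount_matrix_vetted` flag are hypothetical, and where that information would come from (hwdb, DMI, ACPI) is exactly the open question here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of the machine and its accelerometer. */
struct accel_device {
    const char *product_name;   /* e.g. the DMI product name */
    bool mount_matrix_vetted;   /* orientation data known to be correct */
};

/* Condition 1: products designed to be used rotated.
 * Placeholder entries only; these are not real products. */
static const char *const rotatable_products[] = {
    "Example Convertible 11",
    "Example Tablet 8",
};

bool product_is_rotatable(const char *name)
{
    if (!name)
        return false;
    for (size_t i = 0; i < sizeof rotatable_products / sizeof rotatable_products[0]; i++)
        if (strcmp(name, rotatable_products[i]) == 0)
            return true;
    return false;
}

/* Auto-rotation is on by default only if both conditions hold;
 * everything else keeps the screen in its normal orientation. */
bool autorotate_enabled_by_default(const struct accel_device *dev)
{
    return product_is_rotatable(dev->product_name) && dev->mount_matrix_vetted;
}
```

The trade-off is that a rotatable product missing from the list does not auto-rotate until it is added, which fails safe rather than upside down.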


Why are we not winning? Why can't we fix this properly?

I think we're suffering largely through applying this auto-rotation
behaviour to all accelerometer data, from setups where previously
nobody really cared if the data was misinterpreted, or the data was
specifically interpreted for a different context (we're specifically
interested in measuring the physical orientation of the screen, but
accelerometers have other uses too).

Windows 10 (and presumably 8) does have the automatic screen rotation
feature based on accelerometer data, but it seems to apply to fewer
products. For example it does not apply automatic rotation to the
Quanta NL3 classmate nor to the HP EliteBook 840 G3, two systems that
I have in hand that both required specific engineering on Linux after
real users had already run into the horrible
automatic-incorrect-rotation described above:
https://github.com/systemd/systemd/commit/ebf482e7cdabfc1266a86ec8a5f92a964ce08afe
hp_accel: fix accelerometer orientation for EliteBook 840 (patch
posted today, no link yet)

The challenge here is a lack of standardization of how accelerometers
are installed relative to the screen, and a lack of a standard way of
accessing model-specific data that gives us this info. Without any
better options we've been trying to create and maintain our own
databases, for example systemd's 60-sensor.hwdb and the Linux kernel's
hp_accel.c, but that's turning out to be problematic because:
 1. The database entries are mostly created retroactively - usually,
entries are created when a tech-savvy user steps forward to share the
required data, after one or more users have already been bitten by the
issue. This is sub-standard.
 2. We rely on DMI data to distinguish between models that need
different quirks, hoping it will serve this purpose, but we don't know
how to do that reliably either, so sometimes we even apply the
wrong quirks. Two recent examples:
https://bugzilla.redhat.com/show_bug.cgi?id=1717712 (more on this case below)
hp_accel: fix accelerometer orientation for EliteBook 840 (patch
posted today, no link yet)
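To illustrate why a wrong entry is so visible: the per-model data in these databases is essentially an axis remapping (such as `ACCEL_MOUNT_MATRIX` in 60-sensor.hwdb) applied before the orientation is derived. The simplified sketch below assumes a device frame where an upright screen reads about +9.81 m/s² on y and a left-edge-up device reads negative x. The orientation names are borrowed from iio-sensor-proxy, but the threshold classifier is an illustration, not its actual algorithm:

```c
#include <assert.h>

enum orientation { ORIENT_UNDEFINED, ORIENT_NORMAL, ORIENT_BOTTOM_UP,
                   ORIENT_LEFT_UP, ORIENT_RIGHT_UP };

/* out = m * in: remap raw sensor axes into the device frame. */
void apply_mount_matrix(const double m[3][3], const double in[3], double out[3])
{
    for (int i = 0; i < 3; i++) {
        out[i] = 0.0;
        for (int j = 0; j < 3; j++)
            out[i] += m[i][j] * in[j];
    }
}

static double absd(double v) { return v < 0 ? -v : v; }

/* Toy classifier: whichever of x/y carries most of gravity decides the
 * orientation; if neither exceeds the threshold the device is lying flat. */
enum orientation classify(const double v[3], double threshold)
{
    double ax = absd(v[0]), ay = absd(v[1]);

    if (ax < threshold && ay < threshold)
        return ORIENT_UNDEFINED;
    if (ay >= ax)
        return v[1] > 0 ? ORIENT_NORMAL : ORIENT_BOTTOM_UP;
    return v[0] > 0 ? ORIENT_RIGHT_UP : ORIENT_LEFT_UP;
}
```

With the identity matrix an upright reading classifies as normal; a quirk that wrongly negates y turns the very same reading into bottom-up, i.e. a screen rotated by 180 degrees.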

Bastien once made the suggestion that we could fish the model-to-quirk
mapping from the Windows drivers, but I can't find anything in the HP
driver. On HP EliteBook 840 the device is not even exposed as a sensor
under Windows and I can't find any way of accessing the data or making
it auto-rotate - maybe they don't even have such a mapping? The only
Windows application of this sensor seems to be automatic hard disk
head parking, which presumably just detects sudden movements in any
direction.

We did recently work with some Acer all-in-one PCs which had an
accelerometer which also provided working auto-rotation under Windows
out of the box, while again producing the wrong and awkward behaviour
on Linux. Thanks to vendor contacts we did discover the scheme used,
and now automatically detect the accelerometer orientation on such
products.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f38ab20b749da84e3df1f8c9240ddc791b0d5983
However, we then found DSDTs with this orientation data that far
predated this patch's existence. So not a great win; our solution was
not made in a timely fashion.

ACPI offers something that might help - _PLD can be used to describe
the physical orientation of product components. But I don't think
we've seen any examples of this data being provided by vendors for
accelerometers.

I see the latest development of having the hwdb specify whether the
accelerometer is in the base or the display of the device. This was
implemented for dealing with a device with accelerometers in both
positions (https://github.com/hadess/iio-sensor-proxy/pull/262) -
clearly the screen rotation should only follow thy 

Re: [systemd-devel] Debugging active timers that do not trigger

2018-11-16 Thread Daniel Drake
On Thu, Nov 15, 2018 at 7:04 PM Michal Koutný  wrote:
> @Daniel, is it possible there are some daemon-reloads running
> concurrently with the timer? More precisely, can it happen the timer
> expires exactly when systemd reloads?

I don't think so. The journal only shows a single "systemd[1]:
Reloading." message and that happened as part of our initramfs
scripts, before the real-root systemd was run.

Daniel


Re: [systemd-devel] Debugging active timers that do not trigger

2018-11-11 Thread Daniel Drake
On Thu, Nov 8, 2018 at 6:46 PM Andrei Borzenkov  wrote:
> It is possible that system never ends booting. Do you have any pending
> jobs (systemctl list-jobs)? What "systemctl is-system-running" says?

Thanks for the suggestion! It sounds like a good one - I did reproduce
this on first boot and we do have a known issue in that area affecting
systemd's perception of boot completion.
https://gitlab.gnome.org/GNOME/gdm/issues/439

Unfortunately I wasn't able to leave the system in that state after
all, so I can't check directly any more, but I'll do more testing
along these lines.

Thanks
Daniel


[systemd-devel] Debugging active timers that do not trigger

2018-11-07 Thread Daniel Drake
Hi,

On Endless we have the following eos-autoupdater.timer:

  [Unit]
  Description=Endless OS Automatic Update Timer
  Documentation=man:eos-autoupdater(8)
  ConditionKernelCommandLine=!endless.live_boot
  ConditionKernelCommandLine=ostree

  [Timer]
  OnBootSec=15m
  OnUnitInactiveSec=1h
  RandomizedDelaySec=30min

  [Install]
  WantedBy=multi-user.target

This ordinarily works fine, but we have seen a couple of random, rare
occasions where this timer doesn't trigger the target
eos-autoupdater.service. I have one case here in front of me now with
details below.

In the list-timers output you can see it has "n/a" for NEXT/LAST etc.
There is no evidence of eos-autoupdater.service having started at any
point in the journal (nor any crashes).
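As a rough model of how NEXT can end up as n/a for this unit: OnBootSec= is relative to boot, so it fires at most once per boot, and OnUnitInactiveSec= is relative to when eos-autoupdater.service last became inactive, so it has no base time until the service has run once. The sketch below is a simplification, not systemd's timer code; the `boot_trigger_used` flag is hypothetical, and the "stuck" case is only one state that would be consistent with NextElapseUSecMonotonic=infinity and LastTriggerUSecMonotonic=0, not a diagnosis:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define USEC_PER_MIN (60ULL * 1000000ULL)
#define USEC_INFINITY UINT64_MAX /* shown as "infinity" / "n/a" */

/* Simplified model of the monotonic triggers in eos-autoupdater.timer.
 * All times are microseconds since boot. */
struct timer_model {
    uint64_t on_boot;                /* OnBootSec=15m */
    uint64_t on_unit_inactive;       /* OnUnitInactiveSec=1h */
    bool boot_trigger_used;          /* hypothetical: boot trigger no longer applies */
    uint64_t service_inactive_since; /* 0 = service never ran and stopped */
};

/* Returns the next elapse time, or USEC_INFINITY if no trigger applies.
 * random_delay stands in for the RandomizedDelaySec= offset, in [0, 30min). */
uint64_t next_elapse(const struct timer_model *t, uint64_t random_delay)
{
    uint64_t next = USEC_INFINITY;

    if (!t->boot_trigger_used && t->on_boot > 0)
        next = t->on_boot;

    /* No base time to add OnUnitInactiveSec= to if the service never ran. */
    if (t->on_unit_inactive > 0 && t->service_inactive_since > 0) {
        uint64_t v = t->service_inactive_since + t->on_unit_inactive;
        if (v < next)
            next = v;
    }

    return next == USEC_INFINITY ? USEC_INFINITY : next + random_delay;
}
```

In this model, if the boot trigger were ever dropped without having fired, nothing would be left to schedule the service, and the timer would sit in "elapsed" until the next boot.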

This is not a major concern as it seems to only happen rarely, and
fixes itself upon reboot. Also so far we have only reproduced this on
systemd-237; it's hard to judge whether it's fixed in a newer version
due to the low occurrence rate of the issue. But I would be curious if
there are any easy debugging steps I can follow when we see this -
I'll leave the system running in this state for a couple of days in
case there are suggestions.

$ systemctl status eos-autoupdater.timer
● eos-autoupdater.timer - Endless OS Automatic Update Timer
   Loaded: loaded (/lib/systemd/system/eos-autoupdater.timer; enabled; vendor preset: enabled)
   Active: active (elapsed) since Wed 2018-11-07 15:11:14 CST; 23h ago
  Trigger: n/a
     Docs: man:eos-autoupdater(8)

Nov 07 15:11:14 endless systemd[1]: Started Endless OS Automatic Update Timer.

$ systemctl status eos-autoupdater.service
● eos-autoupdater.service - Endless OS Automatic Updater
   Loaded: loaded (/lib/systemd/system/eos-autoupdater.service; indirect; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:eos-autoupdater(8)


$ systemctl list-timers
NEXT                         LEFT          LAST                         PASSED   UNIT                         ACTIVATES
Thu 2018-11-08 15:34:06 CST  1h 17min left Wed 2018-11-07 15:26:02 CST  22h ago  systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Thu 2018-11-08 17:10:45 CST  2h 54min left Thu 2018-11-08 14:10:44 CST  5min ago eos-phone-home.timer         eos-phone-home.service
Mon 2018-11-12 00:00:00 CST  3 days left   n/a                          n/a      fstrim.timer                 fstrim.service
n/a                          n/a           n/a                          n/a      eos-autoupdater.timer        eos-autoupdater.service
n/a                          n/a           Wed 2018-11-07 15:27:05 CST  22h ago  systemd-readahead-done.timer systemd-readahead-done.service



$ systemctl show eos-autoupdater.timer
Unit=eos-autoupdater.service
NextElapseUSecMonotonic=infinity
LastTriggerUSecMonotonic=0
Result=success
AccuracyUSec=1min
RandomizedDelayUSec=30min
Persistent=no
WakeSystem=no
RemainAfterElapse=yes
Id=eos-autoupdater.timer
Names=eos-autoupdater.timer
Requires=sysinit.target
WantedBy=multi-user.target
Conflicts=shutdown.target
Before=timers.target multi-user.target eos-autoupdater.service shutdown.target
After=sysinit.target
Triggers=eos-autoupdater.service
Documentation=man:eos-autoupdater(8)
Description=Endless OS Automatic Update Timer
LoadState=loaded
ActiveState=active
SubState=elapsed
FragmentPath=/lib/systemd/system/eos-autoupdater.timer
UnitFileState=enabled
UnitFilePreset=enabled
StateChangeTimestamp=Wed 2018-11-07 15:26:36 CST
StateChangeTimestampMonotonic=934682450
InactiveExitTimestamp=Wed 2018-11-07 15:11:14 CST
InactiveExitTimestampMonotonic=13380144
ActiveEnterTimestamp=Wed 2018-11-07 15:11:14 CST
ActiveEnterTimestampMonotonic=13380144
ActiveExitTimestampMonotonic=0
InactiveEnterTimestampMonotonic=0
CanStart=yes
CanStop=yes
CanReload=no
CanIsolate=no
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnFailureJobMode=replace
IgnoreOnIsolate=no
NeedDaemonReload=no
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Wed 2018-11-07 15:11:14 CST
ConditionTimestampMonotonic=13380053
AssertTimestamp=Wed 2018-11-07 15:11:14 CST
AssertTimestampMonotonic=13380122
Transient=no
Perpetual=no
StartLimitIntervalUSec=10s
StartLimitBurst=5
StartLimitAction=none
FailureAction=none
SuccessAction=none
InvocationID=c1bf78021112483db79c39221fd58d80
CollectMode=inactive


$ ls -l /var/lib/systemd/timers/
total 0
-rw-r--r-- 1 root root 0 Nov  7 15:11 stamp-fstrim.timer


Thanks
Daniel


Re: [systemd-devel] [PATCH] man: document kill behavior after the main process exits

2015-04-23 Thread Daniel Drake
On Thu, Apr 23, 2015 at 9:32 AM, Lennart Poettering
<lenn...@poettering.net> wrote:
> > +<title>Beyond the main process</title>
> > +
> > + <para>The <varname>KillMode=</varname> option primarily defines
> > + behavior up until the point where the main process has gone away.
> > + systemd expects that when killed with the signal specified by
> > + <varname>KillSignal=</varname>, the main process will kill and
> > + reap all the other processes in the control group before
> > + exiting itself.
>
> Well, I don't think this is right. I mean, systemd doesn't really
> expect this. It's completely OK if daemons leave children around in
> this case.

I could avoid the word expect but I think it's worth mentioning as
those discarded children might not be designed to accept 2 SIGTERMs in
normal conditions.

For example, any child process that uses glib and exits the mainloop
from the SIGTERM handler does not really respond well here - it drops
the SIGTERM handler after the first one, so the second SIGTERM will
cause an immediate/unclean shutdown, which is not completely OK from
the view of the child.
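The failure mode described here can be reproduced without glib. In this sketch `SA_RESETHAND` stands in for whatever glib does when the main loop quits (glib's actual mechanism differs): the first SIGTERM runs the graceful handler, after which the disposition is back to `SIG_DFL`, so a second SIGTERM terminates the process at once, skipping any cleanup still in progress.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t term_count = 0;

/* Graceful path: in a real program this would ask the main loop to quit
 * and start cleanup (e.g. killing and reaping a child X server). */
static void on_sigterm(int signo)
{
    (void)signo;
    term_count++;
}

/* One-shot handler: SA_RESETHAND puts SIGTERM back to SIG_DFL as the
 * handler is entered, so a second SIGTERM kills the process outright. */
int install_one_shot_sigterm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return sigaction(SIGTERM, &sa, NULL);
}

/* 1 if SIGTERM currently has the default (fatal) disposition,
 * 0 if a handler is installed, -1 on error. */
int sigterm_is_default(void)
{
    struct sigaction cur;
    if (sigaction(SIGTERM, NULL, &cur) != 0)
        return -1;
    return cur.sa_handler == SIG_DFL;
}
```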

> KillMode= is actually very much about the time after the main process
> died. If KillMode=process is specified systemd should not send any
> signal to anything but the main process, and that applies to both
> SIGTERM and the following SIGKILL:

I agree, which is why I specifically only talk about the cgroup/mixed
kill modes.

> > + <para>If <option>KillMode=control-group</option>, systemd will
> > + then send a second <varname>KillSignal=</varname> signal to the
> > + remaining processes, which will then be followed by a
> > + <constant>SIGKILL</constant> if processes are still around, even
> > + if <option>SendSIGKILL=no</option>.</para>
>
> Hmm, no? SendSIGKILL=no should have the effect of not sending any
> SIGKILL at all. Anything else would be a bug.

Must be a bug then; I confirmed this is actually what happens by
adding logging to the kill syscall implementation in the kernel.

> > + <para>Or, if <option>KillMode=mixed</option>, systemd will
> > + directly send <constant>SIGKILL</constant> to all remaining members
> > + of the control group, regardless of the
> > + <varname>SendSIGKILL=</varname> preference.</para>
>
> Hmm? No, not at all. If you use mixed, then SIGTERM is sent to
> the main process of the daemon, and SIGKILL to *all* processes of the
> daemon if there are any left after the main process exited.

That's exactly what I wrote - all of this falls under a paragraph
explaining what happens when the main process has already gone. I
guess I need to improve the wording.

Thanks for your feedback

Daniel


[systemd-devel] [PATCH] man: document kill behavior after the main process exits

2015-04-23 Thread Daniel Drake
While looking at the exact behavior of how systemd stops services,
I encountered some behavior that wasn't clear from reading the man
page.

Specifically, if the main process exits before its children, the child
processes will actually receive a second SIGTERM. If that doesn't
kill them, they will later receive a SIGKILL too, even if
SendSIGKILL=no. Add some notes about this.
---
 man/systemd.kill.xml | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

Thanks for helping me to get to the bottom of this in the thread:
  Zombie process still exists after stopping gdm.service

unit_kill_context() has a comment which is relevant here:

   /* FIXME: For now, we will not wait for the
* cgroup members to die, simply because
* cgroup notification is unreliable. It
* doesn't work at all in containers, and
* outside of containers it can be confused
* easily by leaving directories in the
* cgroup. */

   /* wait_for_exit = true; */

When this is fixed, assumed to happen soon, the precise behaviour seen
in the discussion will change slightly (in terms of timing).

So I have carefully written this documentation patch in a way that does
not go into the timing details. The text changed below should therefore
be true both before and after that FIXME is resolved.

diff --git a/man/systemd.kill.xml b/man/systemd.kill.xml
index e57f0e7..10232fb 100644
--- a/man/systemd.kill.xml
+++ b/man/systemd.kill.xml
@@ -154,8 +154,9 @@
 <term><varname>SendSIGKILL=</varname></term>
 <listitem><para>Specifies whether to send
 <constant>SIGKILL</constant> to remaining processes after a
-timeout, if the normal shutdown procedure left processes of
-the service around. Takes a boolean value. Defaults to yes.
+timeout, if the normal shutdown procedure didn't succeed in
+shutting down the main process. Takes a boolean value.
+Defaults to yes.
 </para></listitem>
   </varlistentry>
 
@@ -163,6 +164,31 @@
   </refsect1>
 
   <refsect1>
+<title>Beyond the main process</title>
+
+ <para>The <varname>KillMode=</varname> option primarily defines
+ behavior up until the point where the main process has gone away.
+ systemd expects that when killed with the signal specified by
+ <varname>KillSignal=</varname>, the main process will kill and
+ reap all the other processes in the control group before
+ exiting itself. If that doesn't happen, and the main process
+ exits with other processes still running in the control group,
+ systemd gets a bit more heavy-handed:</para>
+
+ <para>If <option>KillMode=control-group</option>, systemd will
+ then send a second <varname>KillSignal=</varname> signal to the
+ remaining processes, which will then be followed by a
+ <constant>SIGKILL</constant> if processes are still around, even
+ if <option>SendSIGKILL=no</option>.</para>
+
+ <para>Or, if <option>KillMode=mixed</option>, systemd will
+ directly send <constant>SIGKILL</constant> to all remaining members
+ of the control group, regardless of the
+ <varname>SendSIGKILL=</varname> preference.</para>
+
+  </refsect1>
+
+  <refsect1>
   <title>See Also</title>
   <para>
  <citerefentry><refentrytitle>systemd</refentrytitle><manvolnum>1</manvolnum></citerefentry>,
-- 
2.1.0



Re: [systemd-devel] Zombie process still exists after stopping gdm.service

2015-04-21 Thread Daniel Drake
On Mon, Apr 20, 2015 at 6:29 PM, Lennart Poettering
<lenn...@poettering.net> wrote:
> Sure, we don't want to keep track of which processes we already
> killed, to distinguish them from the processes newly created in the
> time between our sending of SIGTERM and receiving SIGCHLD for the main
> process.
>
> We assume that if we get SIGCHLD for the main process that the daemon
> is down, and everything that is left over then is auxiliary stuff we
> can kill.

OK, doesn't sound unreasonable. Once we get to the end of this topic,
I'll submit a documentation patch to make that a bit clearer.

So, of the 3 signals (TERM, TERM, KILL) sent to gdm-simple-slave
within a total time of 0.01s, we have good explanations for the first
2.

The 3rd one (KILL) is still suspicious to me though. It is sent 0.4ms
after the preceding SIGTERM, here is what happens in the code:

1. gdm's main process exits due to the first SIGTERM. systemd becomes
aware in service_sigchld_event(), and responds as follows:

case SERVICE_STOP_SIGTERM:
case SERVICE_STOP_SIGKILL:
if (!control_pid_good(s))
service_enter_stop_post(s, f);

2. Inside service_enter_stop_post(), there is no command to execute, so we call:
service_enter_signal(s, SERVICE_FINAL_SIGTERM, SERVICE_SUCCESS);

3. service_enter_signal calls unit_kill_context() to send the second
SIGTERM. Looking at what happens inside unit_kill_context(): there is
no main process, nor control process, so we go straight to the cgroup
killing. The cgroup kill happens without error, and we reach the end
of the function:

return wait_for_exit;

wait_for_exit was not modified from its initial value (false) during
the course of the function, so false is returned here.

4. Back in service_enter_signal, since unit_kill_context returned
false, we do not arm the timer. Without hesitation systemd goes
directly and sends SIGKILL.

} else if (state == SERVICE_FINAL_SIGTERM)
service_enter_signal(s, SERVICE_FINAL_SIGKILL, SERVICE_SUCCESS);


I can understand that once the main PID goes away, systemd feels
welcome to get heavy-handed with the remaining processes. But sending
SIGTERM and then SIGKILL only a fraction of a millisecond later
seems strange - why not go straight for the SIGKILL?

There's a comment in unit_kill_context() which looks relevant here:

/* FIXME: For now, we will not wait for the
 * cgroup members to die, simply because
 * cgroup notification is unreliable. It
 * doesn't work at all in containers, and
 * outside of containers it can be confused
 * easily by leaving directories in the
 * cgroup. */

/* wait_for_exit = true; */

If that were uncommented, the above behaviour would be different.
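A simplified model of the sequence walked through above, from the point of view of a child process that outlives the main process (KillMode=control-group, no ExecStopPost=). It only encodes the transitions described in this mail; the `wait_for_exit` branch reflects what one would expect if the FIXME were resolved and is not taken from systemd source. The function and struct names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Signals PID 1 sends to a child that outlives the service's main process. */
struct signal_seq {
    int sig[4];
    int n;
};

static void record(struct signal_seq *s, int sig)
{
    if (s->n < 4)
        s->sig[s->n++] = sig;
}

/* wait_for_exit: what unit_kill_context() returns after FINAL_SIGTERM.
 * child_exits_in_time: the child dies before a (hypothetical) timeout.
 * send_sigkill: the SendSIGKILL= setting. */
struct signal_seq model_stop(bool wait_for_exit, bool child_exits_in_time,
                             bool send_sigkill)
{
    struct signal_seq s = { {0}, 0 };

    record(&s, SIGTERM); /* SERVICE_STOP_SIGTERM: whole control group */
    /* Main process exits -> service_sigchld_event() ->
     * service_enter_stop_post() -> SERVICE_FINAL_SIGTERM. */
    record(&s, SIGTERM); /* SERVICE_FINAL_SIGTERM: remaining processes */

    if (!wait_for_exit) {
        /* No timer armed: straight on to SERVICE_FINAL_SIGKILL.
         * SendSIGKILL= is not consulted on this path. */
        record(&s, SIGKILL);
        return s;
    }

    /* Expected behaviour with a timer armed: SIGKILL only on timeout,
     * and only if SendSIGKILL=yes. */
    if (!child_exits_in_time && send_sigkill)
        record(&s, SIGKILL);
    return s;
}
```

With `wait_for_exit` false the model reproduces the observed TERM, TERM, KILL regardless of SendSIGKILL=; with it true, a child that exits after the second SIGTERM would never see the SIGKILL.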

Daniel


Re: [systemd-devel] Zombie process still exists after stopping gdm.service

2015-04-20 Thread Daniel Drake
On Mon, Apr 20, 2015 at 6:04 PM, Lennart Poettering
<lenn...@poettering.net> wrote:
> > I have stepped through and I think that systemd is being too
> > aggressive. Still running with the default KillMode=control-group, here is
> > what happens:
> >
> > 1. service_enter_stop() is entered which calls:
> > service_enter_signal(s, SERVICE_STOP_SIGTERM, SERVICE_SUCCESS);
> >
> > 2. service_enter_signal sends SIGTERM to all gdm processes.
>
> No, if you use KillMode=mixed (as you say you do) it will only send
> SIGTERM to the main process of gdm.

Only bleeding edge gdm has KillMode=mixed. I'm using a slightly older
version which has the default KillMode=control-group. Sorry for the
confusion.

> > 3. gdm simple-slave's signal handler triggers, which causes the
> > mainloop to exit, and it starts to kill and wait for the X server
> > death. I'm not exactly sure why, but quitting the glib mainloop also
> > causes the signal handler to be destroyed, so sigaction() is called
> > here to return SIGTERM to its default behaviour.
> >
> > 4. Moments later we arrive in systemd's service_sigchld_event(),
> > presumably because the main gdm process exited due to SIGTERM.
> > s->main_pid == pid.
>
> If PID 1 gets the SIGCHLD for the main process then it assumes the
> service has finished correctly, and will kill the rest that might remain.

Even if we already killed the rest just a few milliseconds ago (in #2)?

> > 7. To make things even worse, after sending the SIGTERMs,
> > service_enter_signal hits:
> > } else if (state == SERVICE_FINAL_SIGTERM)
> > service_enter_signal(s, SERVICE_FINAL_SIGKILL, SERVICE_SUCCESS);
>
> Hmm? if we managed to kill something we'll arm the timeout and wait
> for sigchld or cgroup empty or similar.
>
> These shortcuts only take place if we couldn't kill anything because
> there was nothing. And hence the second killing will have no effect
> either, but at least we go through the state engine...

I added logging to sys_kill at the kernel level, and I definitely
observe systemctl stop gdm causing PID 1 to kill gdm-simple-slave 3
times (TERM, TERM, KILL) within the space of a few milliseconds.
I will look closer tomorrow to explain in more detail what is going on
at the code level.

Thanks for your help!
Daniel


Re: [systemd-devel] Zombie process still exists after stopping gdm.service

2015-04-20 Thread Daniel Drake
On Mon, Apr 20, 2015 at 8:24 AM, Lennart Poettering
<lenn...@poettering.net> wrote:
> On Sun, 19.04.15 09:34, Andrei Borzenkov (arvidj...@gmail.com) wrote:
>
> > On Fri, 17 Apr 2015 14:04:18 -0600
> > Daniel Drake <dr...@endlessm.com> writes:
> >
> > > Hi,
> > >
> > > I'm investigating why systemctl stop gdm; Xorg usually fails. The
> > > new X process complains that X is still running.
> > >
> > > Here's what I think is happening:
> > >
> > > 1. systemd sends SIGTERM to gdm to stop the service
> > >
> > > 2. gdm exits - it has a simple SIGTERM handler which just quits the
> > > mainloop without doing any cleanup (as far as I can see, it doesn't
> > > make any attempt to kill the child X server)
> > >
> > > 3. X exits because of PR_SET_PDEATHSIG (i.e. it's set to be
> > > automatically killed when the parent goes away). The killed process
> > > enters defunct state and is reparented to PID 1, presumably also
> > > moving it out of the gdm cgroup.
> > >
> >
> > No, it remains in cgroup. Otherwise systemd service management would
> > not be possible at all ...
> >
> > > 4. systemd notes that gdm's cgroup is empty and decides that gdm is
> > > now successfully stopped.
> > >
> >
> > I looked at display-manager.service here and it sets KillMode=process.
> > That is better explanation to your observation.
>
> Hmm, it does? It does not on Fedora. Also display-manager.service is
> just an alias to gdm.service on Fedora.
>
> Daniel, can you check with systemctl cat gdm what your distro
> configures there?

gdm git does have KillMode=mixed, but the slightly old gdm I'm running
here also does not have any KillMode assignment.

I'm investigating further at the moment. I've found a mistake in what
I wrote earlier - when gdm receives SIGTERM it *does* do a
kill/waitpid() on the child X server.
However the process seems to disappear before waitpid() returns -
currently trying to understand why. Ideas welcome.

Thanks for the help.
Daniel


Re: [systemd-devel] Zombie process still exists after stopping gdm.service

2015-04-20 Thread Daniel Drake
On Mon, Apr 20, 2015 at 9:04 AM, Lennart Poettering
<lenn...@poettering.net> wrote:
> maybe the main gdm process is not the one waiting, but a worker
> process is, and the main process kills the worker process without the
> worker process handling that nicely?

Not really. I removed all the process-killing code from gdm and the
problem is still there.

I have stepped through and I think that systemd is being too
aggressive. Still running with the default KillMode=control-group, here is
what happens:

1. service_enter_stop() is entered which calls:
service_enter_signal(s, SERVICE_STOP_SIGTERM, SERVICE_SUCCESS);

2. service_enter_signal sends SIGTERM to all gdm processes.

3. gdm simple-slave's signal handler triggers, which causes the
mainloop to exit, and it starts to kill and wait for the X server
death. I'm not exactly sure why, but quitting the glib mainloop also
causes the signal handler to be destroyed, so sigaction() is called
here to return SIGTERM to its default behaviour.

4. Moments later we arrive in systemd's service_sigchld_event(),
presumably because the main gdm process exited due to SIGTERM.
s->main_pid == pid. We respond as follows:

case SERVICE_STOP_SIGTERM:
case SERVICE_STOP_SIGKILL:
if (!control_pid_good(s))
service_enter_stop_post(s, f);

5. Inside service_enter_stop_post(), there is no command to execute, so we call:
service_enter_signal(s, SERVICE_FINAL_SIGTERM, SERVICE_SUCCESS);

6. service_enter_signal causes all remaining gdm processes to receive
SIGTERM again, only moments after the previous one. As gdm
simple-slave now has the default SIGTERM handler (instant death), it
dies, before it has finished the X server cleanup :(

7. To make things even worse, after sending the SIGTERMs,
service_enter_signal hits:
} else if (state == SERVICE_FINAL_SIGTERM)
service_enter_signal(s, SERVICE_FINAL_SIGKILL, SERVICE_SUCCESS);

So, moments after sending 2 SIGTERMs, SIGKILL is sent to all gdm
processes. There does not seem to be any consideration of giving the
process some time to respond to SIGTERMs, nor the fact that I have
hacked gdm.service to have SendSIGKILL=no as an experiment.

Daniel


[systemd-devel] Zombie process still exists after stopping gdm.service

2015-04-17 Thread Daniel Drake
Hi,

I'm investigating why systemctl stop gdm; Xorg usually fails. The
new X process complains that X is still running.

Here's what I think is happening:

1. systemd sends SIGTERM to gdm to stop the service

2. gdm exits - it has a simple SIGTERM handler which just quits the
mainloop without doing any cleanup (as far as I can see, it doesn't
make any attempt to kill the child X server)

3. X exits because of PR_SET_PDEATHSIG (i.e. it's set to be
automatically killed when the parent goes away). The killed process
enters defunct state and is reparented to PID 1, presumably also
moving it out of the gdm cgroup.

4. systemd notes that gdm's cgroup is empty and decides that gdm is
now successfully stopped.

5. systemctl returns and now Xorg is launched immediately. Xorg reads
the PID of the old Xorg process from /tmp, and notices that that PID
is still in use (it is still an unreaped zombie) because kill()
doesn't return an error. Xorg aborts thinking that it is already
running.

6. Moments later, systemd reaps the zombie. Oops, too late.


Does that make sense?
I wonder how best to fix this. Is it a bug that systemd decided
that gdm.service had stopped before it had reaped zombie processes
that originally belonged to gdm?

Is it a gdm bug that killing gdm doesn't make any attempt to reap X
before going away itself? (they chose PR_SET_PDEATHSIG to do something
similar, but maybe we have to argue that it is not quite sufficient)

Thanks
Daniel


Re: [systemd-devel] [PATCH] udevd: fix synchronization with settle when handling inotify events

2015-04-13 Thread Daniel Drake
On Sat, Apr 11, 2015 at 5:13 AM, David Herrmann dh.herrm...@gmail.com wrote:
 Nice catch!

 There's indeed a small race between handling inotify and queuing up
 the change-event. We need to re-loop there. One day we should switch
 to sd-event to avoid such bugs... I mean the symptom is inherent to
 queuing up events while handling them. Meh!

Thanks for reviewing this. Reading your comment, I wonder if there is
a small bug in the solution here.

Sometimes we may handle inotify events, but without generating change
events. After my change, we will loop again, but there may be no
events pending, in which case we will block on the 3 second timeout
before completing the next loop iteration and replying to settle's
ping message.

Do you agree? Should I improve this to only do the extra loop
iteration in the case where we generated change events, or somehow
make the next loop iteration have timeout 0 (non-blocking)?

Thanks
Daniel


[systemd-devel] [PATCH] udevd: fix synchronization with settle when handling inotify events

2015-04-06 Thread Daniel Drake
udev uses inotify to implement a scheme where when the user closes
a writable device node, a change uevent is forcefully generated.
In the case of block devices, it actually requests a partition rescan.

This currently can't be synchronized with udevadm settle, i.e. this
is not reliable in a script:

 sfdisk --change-id /dev/sda 1 81
 udevadm settle
 mount /dev/sda1 /foo

The settle call doesn't synchronize there, so at the same time we try
to mount the device, udevd is busy removing the partition device nodes and
re-adding them. The mount call often happens at the moment when the
partition node has been removed but not yet re-added.

This exact issue was fixed long ago:
http://git.kernel.org/cgit/linux/hotplug/udev.git/commit/?id=bb38678e3ccc02bcd970ccde3d8166a40edf92d3

but that fix is no longer valid now that sequence numbers are no longer
used.

Fix this by forcing another mainloop iteration after handling inotify events
before unblocking settle. If the inotify event caused us to generate a
change event, we'll pick that up in the following loop iteration, before
we reach the end of the loop where we respond to settle's control message,
unblocking it.
---
 src/udev/udevd.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/udev/udevd.c b/src/udev/udevd.c
index 830aedd..dfecef8 100644
--- a/src/udev/udevd.c
+++ b/src/udev/udevd.c
@@ -1504,9 +1504,22 @@ int main(int argc, char *argv[]) {
 continue;
 
 /* device node watch */
-if (is_inotify)
+if (is_inotify) {
 handle_inotify(udev);
 
+/*
+ * settle might be waiting on us to determine the queue
+ * state. If we just handled an inotify event, we might have
+ * generated a change event, but we won't have queued up
+ * the resultant uevent yet.
+ *
+ * Before we go ahead and potentially tell settle that the
+ * queue is empty, let's loop one more time to update the
+ * queue state again before deciding.
+ */
+continue;
+}
+
 /* tell settle that we are busy or idle, this needs to be before the
  * PING handling
  */
-- 
2.1.0



Re: [systemd-devel] Reliably waiting for udevd to finish processing triggered events

2015-03-09 Thread Daniel Drake
Hi,

On Sun, Mar 8, 2015 at 3:50 PM, Lennart Poettering
lenn...@poettering.net wrote:
 On Fri, 06.03.15 14:22, Daniel Drake (dr...@endlessm.com) wrote:
 To my knowledge newer versions don't do this anymore and actively
 watch drm devices coming.

I'm describing the behaviour of the newer version here. The issue is
current. It does watch drm devices but if it gets indication that all
udev events have been processed and still there is no usable drm
device, it will give up on drm and launch into text mode.

 No, applications should not watch the queue. And the file is internal
 to udev anyway. If you watch it, you get to keep the pieces.

The plymouth behaviour I described is achieved by using the public
libudev API, udev_queue_get_queue_is_empty() and (the exact equivalent
of) udev_queue_get_fd().

Daniel


[systemd-devel] Reliably waiting for udevd to finish processing triggered events

2015-03-06 Thread Daniel Drake
Hi,

I'm looking at some issues with the plymouth boot splash system, and
why it intermittently fails to get graphics on screen.

plymouth watches for the creation of drm display devices during boot.
If it finds one, it starts a graphical splash and that is that.
However, if the system finishes loading drivers and no drm device is
available, it falls back to an fbdev-based splash or a text-based
boot. Once it has made that choice there is no turning back; it
basically ignores drm devices if they become available later.

In order to know when the system has finished loading drivers,
plymouth does the same as udevadm settle - it uses udev APIs to
inotify-monitor /run/udev, and it assumes that when the queue file is
deleted, all driver load events have been processed. But there seem to
be a couple of problems associated with this.

Firstly, plymouth does the above when it loads in the initramfs. The
initramfs will trigger udev events for all devices, but if systemd
finds the root filesystem before plymouth finds the drm device, udevd
is immediately killed by systemd as it changes to switch-root.target.
udevd has not processed the drm device at this point, so
udev_device_get_is_initialized() returns false when plymouth inquires.
As udevd is killed, it removes /run/udev/queue in its exit path;
plymouth sees this and (like udevsettle would) assumes that this
apparently empty queue means that driver loading is complete. But no
drm devices are available and initialized, so it falls back to textual
boot for the rest of boot.

The killing of udev seems heavy-handed here, and the way it removes
the queue file on exit (without first at least going through the
already-pending events) seems to kill any possibility of a program
like udevsettle or plymouth knowing if udev finished loading all
drivers while the initramfs transitions to the real root.


Secondly, there is a race during startup. udevd launches and it
actually removes /run/udev/queue (if it were to exist) in the first
iteration of the mainloop - even before it checked if any events were
available to process. Anyway, we would normally expect the queue to be
empty here; it is only after udevd has started up that systemd then
goes on to run udevadm trigger and generate events for udevd to
handle.

In the case where plymouth is run from the real root (instead of the
initramfs), once trigger has exited, systemd starts plymouth, which
then starts immediately using udev_queue_get_queue_is_empty() to do
the detection described above. If plymouth happens to do that before
udevd has gotten around to processing the first event generated by
udevtrigger, the queue is reported as empty (udevd has not created the
marker yet), so plymouth concludes that driver loading has completed.
Oops.

I believe the same race exists with udevadm settle: if it is
launched at that same moment it could hit the same race. The only
difference is that udevadm settle uses some internal udev API that
actually sends a ping to udevd before it checks the queue status. That
likely reduces the probability of the race, but I think it is still
there, as I can't see any guarantee that udevd would create the queue
file before responding to the ping (it only creates the queue file at
the start of the next iteration of the main loop, assuming that it had
noted the pending events in the previous iteration where it also
handled the ping).


If there's a way of running udevadm trigger and then reliably
knowing that udevd has finished processing those events, I haven't
found it. Any hints much appreciated.

Thanks,
Daniel


[systemd-devel] systemd-fsck-root semantics

2014-07-02 Thread Daniel Drake
Hi,

I'm trying to understand dracut/systemd fsck behaviour, in the context
of an ext4 filesystem root mounted read-only from dracut, remaining
read-only even when the system is fully booted (kiosk-style).

I see that systemd's fstab-generator rightly creates a mount unit for
/sysroot from the initramfs, and causes e2fsck to be run on it from
inside the dracut initramfs, before it is mounted. So far so good.


Then the system continues booting, switches root, and then
systemd-fsck-root.service starts from the root fs, and runs fsck on /
again. This is the bit I don't understand - we already checked from
the initramfs, why check again now?

There used to be a marker file in /run to let systemd know that the
initramfs already checked it, but that was removed in commit
956eaf2b8d6c024705ddadc7393bc707de02.

Also, systemd-fsck-root.service in itself seems a little questionable,
is it really safe in any context to run fsck on a mounted partition?
That could modify data structures which have already been cached in
memory in the kernel fs driver. In fact, e2fsck refuses to run on
partitions that are mounted, even ones that are ro.

Thanks for any clarification.
Daniel


Re: [systemd-devel] systemd-fsck-root semantics

2014-07-02 Thread Daniel Drake
On Wed, Jul 2, 2014 at 1:13 PM, Zbigniew Jędrzejewski-Szmek
zbys...@in.waw.pl wrote:
 Thinking about it, I'm not sure how the new systemd would know that
 systemd-fsck@dev-something.service from the initramfs is the same
 thing as systemd-fsck-root.service. Maybe that's the problem?

 Currently systemd-fsck-root.service does nothing if / is mounted rw,
 which of course is used by almost everybody, so I think you might
 be using codepaths that are rarely tested.

If I'm reading things right, the default behaviour (when no hints are
supplied on the kernel command line) is actually:
 1. systemd runs fsck on root from initramfs
 2. systemd mounts root fs ro
 3. switch-root onto real system
 4. systemd-fsck-root runs
 5. systemd-remount-fs remounts / as rw

Also just noticed another interesting thing -
systemd-fsck-root.service is only loaded dynamically, when /etc/fstab
has a non-zero passno for /. So maybe the idea is that anyone running
a regular and modern dracut/systemd setup sets passno=0 for / in
fstab, with the knowledge that fsck of / is done by the initramfs.

Daniel


Re: [systemd-devel] systemd-fsck-root semantics

2014-07-02 Thread Daniel Drake
On Wed, Jul 2, 2014 at 1:36 PM, Lennart Poettering
lenn...@poettering.net wrote:
 Then the system continues booting, switches root, and then
 systemd-fsck-root.service starts from the root fs, and runs fsck on /
 again. This is the bit I don't understand - we already checked from
 the initramfs, why check again now?

 I think the idea is that the unit is still around, hence won't get
 started a second time.

dracut doesn't include systemd-fsck-root in the initramfs. I think
there is good reason for that - systemd-fsck-root causes fsck to run
on /, but at this point in the initramfs, / is a ramdisk and the thing
that needs checking is at /sysroot.

 Also, systemd-fsck-root.service in itself seems a little questionable,
 is it really safe in any context to run fsck on a mounted partition?
 That could modify data structures which have already been cached in
 memory in the kernel fs driver. In fact, e2fsck refuses to run on
 partitions that are mounted, even ones that are ro.

 Well this is how things were traditionally done on initrd-less
 systems. It's really a horrible thing to do, and people really shouldn't
 do it. I certainly wouldn't run my systems like that.

I agree, but am a little worried that systemd might do this more or less
by default. I now realise that this is a distro choice; they should
probably set passno=0 in fstab. I wonder if they actually do...

Daniel


Re: [systemd-devel] udev interferes with sfdisk partition changes

2012-11-26 Thread Daniel Drake
On Mon, Nov 19, 2012 at 7:57 PM, Lennart Poettering
lenn...@poettering.net wrote:
 parted is actually capable of doing this properly and settles the
 device. Have you looked into that?

Looks like as of version 3.0, parted can no longer resize partitions.
The functionality got dropped.

Daniel


[systemd-devel] udev interferes with sfdisk partition changes

2012-11-14 Thread Daniel Drake
Hi,

At OLPC we use sfdisk to grow a partition from the initramfs on first boot.

However, we are finding this to be unreliable. We are working with
Fedora 18, systemd-195, and OLPC XO-4 hardware.

The core of the problem seems to be udev's response to BLKRRPART - the
ioctl used to ask the kernel to re-read the partition table. When
BLKRRPART is executed, udevadm monitor shows KERNEL events for the
device partitions being removed, and then added again - even when no
partition change has happened.

A quick and easy way to fire off a BLKRRPART (and nothing else):
sfdisk -R /dev/mmcblk1

By inserting some printks in the kernel, I can see that firing off
BLKRRPART causes systemd-udevd to open and close the device,
presumably in response to the KERNEL events mentioned above, even when
no partitioning changes have happened.

With that background, here is what happens when sfdisk is run to
modify the partition table on a device which is fully settled and is
not mounted:

1. sfdisk fires off BLKRRPART immediately, before making any changes,
to check that the device isn't in use.
2. This causes the KERNEL events, and udev opens and closes the device
shortly after.
3. sfdisk writes the modified new partition table
4. sfdisk fires BLKRRPART again asking the kernel to note the new
partition table

The problem here is that #2 runs in parallel, being a separate
process. On a quite regular basis, the sequence actually happens like
this:

1. sfdisk fires off BLKRRPART immediately, before making any changes,
to check that the device isn't in use.
2. This causes the KERNEL events, and udev opens the device.
3. sfdisk writes the modified new partition table
4. sfdisk fires BLKRRPART again asking the kernel to note the new
partition table. This fails, since the device is open.
5. udev closes the device.

In such a case, sfdisk has failed to update the partition setup
visible to the user, claiming that the device is probably mounted or
something, for no immediately obvious reason (eh? the device wasn't
in use by anyone!).

This can be reproduced within seconds by running a script in a loop
that uses sfdisk to do the following to an unmounted and otherwise
unused SD card:
 1. Make the first partition a little smaller
 2. Grow the first partition to its full size

Within seconds you hit the failure condition noted above.


How can/should this be corrected?

It's annoying that udev is performing the device open here, but given
the events from the kernel, it's probably a sensible thing to do.

Should the kernel not be generating these events when the partition
table hasn't changed?

Or should sfdisk acknowledge these possible races and try these ioctls
a few times in a loop before bailing out?

Maybe this is all a consequence of the kernel's lack of a "lock this
device down, I want to partition it" interface?

Thanks
Daniel


Re: [systemd-devel] Transient hostname default behaviour

2012-10-30 Thread Daniel Drake
On Mon, Oct 29, 2012 at 7:19 PM, Lennart Poettering
lenn...@poettering.net wrote:
 One more thing to add:

 It looks like /etc/sysconfig/network is still being parsed even though
 the above link suggests otherwise. Putting HOSTNAME=myhostname in
 /etc/sysconfig/network sets the default transient hostname. Hmm.

 That sounds like NM or so is reading the file and applying it?

No, systemd does. In git it doesn't, but v195 does read
/etc/sysconfig/network on Fedora if /etc/hostname is no good.
That caused some of the above confusion.

My other problem was that I did not have /etc/hostname available early
enough during boot.

Still can't explain all the behaviour I was seeing, but with that
fixed, things seem to be behaving.

It looks like some NM work may be pending here:
https://bugzilla.redhat.com/show_bug.cgi?id=831735

And I filed a bug for the dhclient issue identified by Zbyszek
https://bugzilla.redhat.com/show_bug.cgi?id=871521

Thanks
Daniel


[systemd-devel] Argument quoting in Exec lines

2012-09-04 Thread Daniel Drake
Hi,

Not sure whether to submit a bug report or documentation patch for this.

ExecStart=/usr/bin/foo --arg1="foo bar"

Causes foo to be run with 2 command line args:
 1. --arg1="foo
 2. bar"

Not what I was hoping for.


Whereas:

ExecStart=/usr/bin/foo "--arg1=foo bar"

does what I want, just 1 command line arg:
 1. --arg1=foo bar



Took me a while to figure that out. Is this the desired behaviour?

Thanks,
Daniel


Re: [systemd-devel] Set environment variable system-wide

2012-08-08 Thread Daniel Drake
On Wed, Aug 8, 2012 at 10:56 AM, Lennart Poettering
lenn...@poettering.net wrote:
 I don't think anything can be considered clean if it involves setting
 system-wide env vars. There must be another way to teach Python
 optimization system-wide...

I have yet to find the other way that you mention.
Anyway, I can agree that this is more of a python problem than a systemd one.

Thanks for the info!

Daniel


[systemd-devel] Set environment variable system-wide

2012-08-06 Thread Daniel Drake
Hi,

I thought I read somewhere that systemd offers a mechanism to set an
environment variable system-wide - i.e. the variable assignment will
be present in all the processes started by systemd.

But I can't find where I read this, or how to use it.

Does this functionality exist or am I getting confused with something else?

Thanks
Daniel


Re: [systemd-devel] Set environment variable system-wide

2012-08-06 Thread Daniel Drake
On Mon, Aug 6, 2012 at 3:09 PM, Kay Sievers k...@vrfy.org wrote:
 systemctl set-environment ... ?

Maybe that's what I read about.

In this case I'm looking to set it in early boot though, so that it
affects all spawned processes from the very start. Is there a nice way
of doing this?

 But it's in almost all use cases wrong to use anything like that isn't
 broken unix legacy that expects it that way.

I'm having trouble parsing that sentence. In this case I'm looking for
a clean way to set PYTHONOPTIMIZE system-wide (to enable Python's
optimized bytecode usage).

Daniel


[systemd-devel] Diagnosing hang on reboot

2012-07-04 Thread Daniel Drake
Hi,

We're encountering a systemd hang on reboot which is proving hard to
debug, on the OLPC XO platform (systemd-44 on Fedora 17). It doesn't
happen every time, but it is frequent: when running a system that
reboots once every 2-3 minutes, it reproduces within an hour (usually
much quicker). Can anyone suggest debugging techniques for the
following situation, or are there similar-sounding bug reports already
that might provide clues?

- /sbin/reboot is run, and exits with code 0, without producing any
output on stderr or stdout.

- the reboot process is definitely initiated, because plymouth's
shutdown screen comes up, and the serial console getty is stopped

- the hang happens with the plymouth shutdown splash on-screen, and
the system continues responding to keypresses (showing/hiding the
plymouth splash)

- disabling the plymouth shutdown splash doesn't solve the hang, and
no interesting messages appear on the console either

- the system no longer responds to sysrq over serial (even when the
kernel sysrq_always_enabled parameter is used)

- the shutdown scripts in /usr/lib/systemd/system-shutdown are not called

- enabling systemd debugging via kernel parameters
systemd.log_level=debug systemd.log_target=kmsg causes the hang not
to happen (left a system reboot-looping with this configuration for 24
hours without hitting the issue)

Any tips appreciated.


This is perhaps unlikely to be a systemd issue, because when we reboot
from a normal session, we don't hit this issue (but I think systemd
could help us find the problem?). We hit this issue when rebooting
after running our manufacturing tests, which aim to hammer the system
very hard and activate as many components as possible (microphone,
camera, screen, disk, RAM check, ...). These tests are activated as
follows:

 1. During boot, runin-check.service (runs early) notes that the
laptop's manufacturing data says that the system should run
manufacturing tests rather than starting a real session. The
runin-check program then calls systemctl isolate runin.target
 2. runin.target starts the runin-main program which opens an X
session and kicks off all kinds of tests

Here are the debug logs from a successful boot-to-reboot cycle (when
things work OK):
http://dev.laptop.org/~dsd/20120704/runin-verbose.txt
At 15.991475, runin-check runs systemctl isolate runin.target
At 18.969571, runin tests start
At 30.505676, runin tests fail and the reboot process is initiated. (I
deliberately triggered the fail so that I don't have to wait a long
time for the reboot to happen)
At 36.818280, /sbin/reboot is called by runin
At 46.956082 the scripts in /usr/lib/systemd/system-shutdown are called


Here are the relevant service/target files:

runin-check.service:

[Unit]
Description=Check whether to run OLPC run-in tests
DefaultDependencies=no
Requires=olpc-configure.service
After=olpc-configure.service
Before=basic.target

[Service]
Type=oneshot
ExecStart=/runin/runin-check

[Install]
WantedBy=basic.target




runin.target:

[Unit]
Description=OLPC run-in tests
AllowIsolate=true
DefaultDependencies=no
Requires=runin.service
After=olpc-configure.service
Wants=plymouth-quit.service plymouth-quit-wait.service




runin.service:


[Unit]
Description=OLPC run-in tests
DefaultDependencies=no
Wants=udev-settle.service
After=udev-settle.service plymouth-quit.service plymouth-quit-wait.service

[Service]
ExecStart=/runin/runin-main


Any help appreciated; this is currently the last blocking bug we have
preventing our latest software image (our first systemd-based
release!) from entering mass-production in the factory.

Thanks!
Daniel


Re: [systemd-devel] Showing plymouth shutdown splash earlier during shutdown process

2012-06-04 Thread Daniel Drake
On Fri, Jun 1, 2012 at 4:03 AM, Michal Schmidt mschm...@redhat.com wrote:
 On 05/31/2012 05:46 PM, Daniel Drake wrote:

 In the case of reboot (or poweroff), what does this mean?
 plymouth-reboot.service is queued to start, and prefdm.service is
 queued to stop. What does After= mean in this context, who comes
 first?


 'man systemd.unit' says:
 If one unit with an ordering dependency on another unit is shut down while
 the latter is started up, the shut down is ordered before the start-up
 regardless whether the ordering dependency is actually of type After= or
 Before=.

Thanks for pointing that out.

 It is like it is waiting for those services to stop before executing.

 How can I find out why?

 Based on the above rule, check all the ordering dependencies the unit has:
 systemctl show -p After -p Before plymouth-reboot.service

I followed this down a couple of levels and didn't find the answer.
Probably need to go further, I'll see if I can find some time to do
that soon.

Thanks
Daniel


Re: [systemd-devel] Showing plymouth shutdown splash earlier during shutdown process

2012-05-31 Thread Daniel Drake
On Wed, Apr 11, 2012 at 10:51 AM, Daniel Drake d...@laptop.org wrote:
 On Wed, Apr 11, 2012 at 9:42 AM, Lennart Poettering
 lenn...@poettering.net wrote:
 I tried modifying e.g. plymouth-reboot.service to have:
 Before=reboot.service shutdown.target umount.target final.target reboot.target

 That suggests that the plymouth client tool is not waiting for the
 operation to finish but just asynchronously queueing the request, which
 is something that should be fixed in plymouth.

 You're probably right, but before we get there, even with the above
 Before= change, systemd seems to be starting plymouth-reboot.service
 rather late in the process. Logs from a reboot with the Before= change
 made as above:

 http://dev.laptop.org/~dsd/20120411/shutdown2.txt

 Any ideas?

Bump.
I filed a bug for the plymouth-quitting-before-command-processed
issue: https://bugs.freedesktop.org/show_bug.cgi?id=50544
and I worked around it locally.

But still, the plymouth splash is being shown late in the process, as
shown in the above log.

plymouth-reboot.service has

After=getty@tty1.service prefdm.service plymouth-start.service
Before=reboot.service


In the case of reboot (or poweroff), what does this mean?
plymouth-reboot.service is queued to start, and prefdm.service is
queued to stop. What does After= mean in this context, who comes
first?

Either way, plymouth-reboot.service seems to be run a long time after
prefdm finishes - about 3.5 seconds.
And after running it a few times I am seeing that it *always* starts
after a whole bunch of other services have been stopped - in the above
log: diskspacerecover.service, alsa-store.service,
systemd-random-seed-save.service, and maybe more. It is like it is
waiting for those services to stop before executing. How can I find
out why?

Thanks
Daniel


Re: [systemd-devel] [PATCH] Don't skip bind mounts on shutdown

2012-05-30 Thread Daniel Drake
Hi,

On Wed, Apr 25, 2012 at 9:46 AM, Daniel Drake d...@laptop.org wrote:
 This reverts commits d72238fcb34abc81aca97c5fb15888708ee937d3 and
 f3accc08.

 OLPC runs / as a bind-mount, so this must be remounted RO during
 shutdown to avoid corruption.

 As Lennart can't recall the exact reasons for making the shutdown
 code skip bind mounts, revert to previous behaviour to solve the
 issue for OLPC.

 http://lists.freedesktop.org/archives/systemd-devel/2012-April/004957.html

Any news on this patch?

Thanks
Daniel




[systemd-devel] [PATCH] Don't skip bind mounts on shutdown

2012-04-25 Thread Daniel Drake
This reverts commits d72238fcb34abc81aca97c5fb15888708ee937d3 and
f3accc08.

OLPC runs / as a bind-mount, so this must be remounted RO during
shutdown to avoid corruption.

As Lennart can't recall the exact reasons for making the shutdown
code skip bind mounts, revert to previous behaviour to solve the
issue for OLPC.

http://lists.freedesktop.org/archives/systemd-devel/2012-April/004957.html
---
 src/core/umount.c |   19 ++-----------------
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/src/core/umount.c b/src/core/umount.c
index 488e1e4..85b7824 100644
--- a/src/core/umount.c
+++ b/src/core/umount.c
@@ -37,7 +37,6 @@
 typedef struct MountPoint {
         char *path;
         dev_t devnum;
-        bool skip_ro;
         LIST_FIELDS (struct MountPoint, mount_point);
 } MountPoint;
 
@@ -72,8 +71,6 @@ static int mount_points_list_get(MountPoint **head) {
         for (i = 1;; i++) {
                 int k;
                 MountPoint *m;
-                char *root;
-                bool skip_ro;
 
                 path = p = NULL;
 
@@ -81,7 +78,7 @@ static int mount_points_list_get(MountPoint **head) {
                                 "%*s "       /* (1) mount id */
                                 "%*s "       /* (2) parent id */
                                 "%*s "       /* (3) major:minor */
-                                "%ms "       /* (4) root */
+                                "%*s "       /* (4) root */
                                 "%ms "       /* (5) mount point */
                                 "%*s"        /* (6) mount options */
                                 "%*[^-]"     /* (7) optional fields */
@@ -90,8 +87,7 @@ static int mount_points_list_get(MountPoint **head) {
                                 "%*s"        /* (10) mount source */
                                 "%*s"        /* (11) mount options 2 */
                                 "%*[^\n]",   /* some rubbish at the end */
-                                &root,
-                                &path)) != 2) {
+                                &path)) != 1) {
                         if (k == EOF)
                                 break;
 
@@ -101,11 +97,6 @@ static int mount_points_list_get(MountPoint **head) {
                         continue;
                 }
 
-                /* If we encounter a bind mount, don't try to remount
-                 * the source dir too early */
-                skip_ro = !streq(root, "/");
-                free(root);
-
                 p = cunescape(path);
                 free(path);
 
@@ -131,7 +122,6 @@ static int mount_points_list_get(MountPoint **head) {
                 }
 
                 m->path = p;
-                m->skip_ro = skip_ro;
                 LIST_PREPEND(MountPoint, mount_point, *head, m);
         }
 
@@ -448,11 +438,6 @@ static int mount_points_list_remount_read_only(MountPoint **head, bool *changed)
 
         LIST_FOREACH_SAFE(mount_point, m, n, *head) {
 
-                if (m->skip_ro) {
-                        n_failed++;
-                        continue;
-                }
-
                 /* Trying to remount read-only */
                 if (mount(NULL, m->path, NULL, MS_MGC_VAL|MS_REMOUNT|MS_RDONLY, NULL) == 0) {
                         if (changed)
-- 
1.7.10



Re: [systemd-devel] Safe handling of root filesystem on shutdown

2012-04-19 Thread Daniel Drake
Hi Lennart,

On Thu, Apr 12, 2012 at 8:46 AM, Daniel Drake d...@laptop.org wrote:
> The mmcblk0p2 message above suggests that / is being re-mounted
> readonly, and also on next boot the system no longer complains about /
> not being cleanly unmounted. Tested with 3 reboots to be sure.
>
> Reverting these commits seems like a good solution to me. If you go
> ahead with this, I'd also appreciate it if you could apply the fix to
> the F17 package next time you are touching things there.

Bump :)
Can these patches be reverted then?
If it makes your life easier, I've attached a patch to do so.

At this point I'd also like to get this sorted in F17 sooner rather
than later. If you don't object, I'll patch this into the F17/F18
packages and submit an update once it is fixed in systemd git.

Thanks,
Daniel


0001-Don-t-skip-bind-mounts-on-shutdown.patch
Description: Binary data


Re: [systemd-devel] Safe handling of root filesystem on shutdown

2012-04-12 Thread Daniel Drake
On Thu, Apr 12, 2012 at 4:56 AM, Lennart Poettering
lenn...@poettering.net wrote:
> I think I added this logic primarily to make the shutdown loop quiet.
>
> However I must admit that that's just a guess and since my commit
> message is disappointingly inconclusive about this I am a bit lost...
>
> If you revert f3accc08, do things look good for you then? Do you get any
> log spew on shutdown?

I had to revert d72238fcb34abc81aca97c5fb15888708ee937d3 first.
Then I reverted f3accc08, and modified systemd-shutdown to log to kmsg
so that I could see the messages before power-down.

[  441.206413] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[  441.239944] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[  441.263633] systemd-shutdown[1]: Unmounting file systems.
[  441.280554] systemd-shutdown[1]: Unmounted /var/lib/random-seed.
[  441.297471] systemd-shutdown[1]: Unmounted /var/lib/dhclient.
[  441.320312] systemd-shutdown[1]: Unmounted /var/lib/dbus.
[  441.340072] systemd-shutdown[1]: Unmounted /dev/hugepages.
[  441.355911] systemd-shutdown[1]: Unmounted /sys/kernel/debug.
[  441.372049] systemd-shutdown[1]: Unmounted /dev/mqueue.
[  441.387525] systemd-shutdown[1]: Unmounted /home.
[  441.751119] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[  441.831283] systemd-shutdown[1]: Disabling swaps.
[  441.846084] systemd-shutdown[1]: Detaching loop devices.
[  441.864999] systemd-shutdown[1]: Detaching DM devices.
[  442.965933] ACPI: Preparing to enter system sleep state S5
[  443.080153] Power down.

The mmcblk0p2 message above suggests that / is being re-mounted
readonly, and also on next boot the system no longer complains about /
not being cleanly unmounted. Tested with 3 reboots to be sure.
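For reference, getting systemd-shutdown's messages into the kernel log as above amounts to writing priority-prefixed lines to /dev/kmsg. Below is a minimal C sketch of that idea. It assumes the common `<N>ident[pid]: message` layout. `kmsg_log()` is a hypothetical helper, not the change actually made to systemd-shutdown:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write one record in the format /dev/kmsg accepts,
 * "<priority>ident[pid]: message\n". The "<N>" prefix sets the log level
 * (6 = info); the rest is free text. Returns 0 on success, -1 on a
 * formatting error, truncation or short write. */
static int kmsg_log(int fd, int priority, const char *ident, pid_t pid,
                    const char *msg) {
        char buf[512];
        int n = snprintf(buf, sizeof(buf), "<%d>%s[%ld]: %s\n",
                         priority, ident, (long) pid, msg);

        if (n < 0 || (size_t) n >= sizeof(buf))
                return -1;
        return write(fd, buf, (size_t) n) == (ssize_t) n ? 0 : -1;
}
```

On a real system the fd would come from `open("/dev/kmsg", O_WRONLY)`. Taking the fd as a parameter keeps the sketch testable without root.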

Reverting these commits seems like a good solution to me. If you go
ahead with this, I'd also appreciate it if you could apply the fix to
the F17 package next time you are touching things there.

Thanks!
Daniel


[systemd-devel] Safe handling of root filesystem on shutdown

2012-04-11 Thread Daniel Drake
Hi,

On OLPC laptops we are seeing that ext4 complains on every boot that
the filesystem wasn't cleanly unmounted.

Looking at systemd debug logs of a shutdown would seem to agree, I
can't see where it attempts to remount / read-only as was done with
sysvinit.

http://dev.laptop.org/~dsd/20120411/shutdown.txt

Can anyone point out how this is supposed to work - where is the code
that looks after the / mount during shutdown/reboot?

We do have a bit of a strange fs-layout, where our root fs is kept
inside /versions/pristine/X on the root partition. The initramfs takes
care of this with some bind-mount and chroot tricks so that it looks
'normal' afterwards, but maybe something along these lines is
confusing systemd.

Thanks,
Daniel


[systemd-devel] Showing plymouth shutdown splash earlier during shutdown process

2012-04-11 Thread Daniel Drake
Hi,

As can be seen in my logs of a reboot:
http://dev.laptop.org/~dsd/20120411/shutdown.txt

The plymouth shutdown splash is being shown really quite late.

As systemd shuts down fantastically fast, this means that our pretty
shutdown graphic is not being drawn on OLPC laptops. Sometimes the
image is drawn partially, and sometimes it is not drawn at all. (A few
seconds of systemd output text is always visible, though.)

Is there a way to make the plymouth shutdown screen appear earlier?

I tried modifying e.g. plymouth-reboot.service to have:
Before=reboot.service shutdown.target umount.target final.target reboot.target

However, this didn't produce any noticeable difference.

Thanks,
Daniel


Re: [systemd-devel] Safe handling of root filesystem on shutdown

2012-04-11 Thread Daniel Drake
On Wed, Apr 11, 2012 at 9:40 AM, Lennart Poettering
lenn...@poettering.net wrote:
> So on shutdown after stopping all services we execute systemd-shutdown
> as PID 1 replacing the normal systemd process. This is useful to drop
> all references to files on disk, so that we can remount the disk r/o
> even on upgrades.
>
> systemd-shutdown is basically a single loop that tries to
> umount/read-only mount all file systems it finds as long as this changes
> the list of active mounts. This code also disables all swaps and detaches
> DM/loop devices in the same loop.
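The loop described above is a fixed-point iteration: keep making passes over the mount table until a pass changes nothing. Below is a toy C sketch of that idea. It assumes a mount can only go away once nothing is stacked on top of it. `struct toy_mount` and its helpers are hypothetical. The real loop also tries read-only remounts, swapoff and loop/DM detaching in each pass:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mount-table entry: index of the mount it sits on (-1 if
 * none) and whether it is still mounted. */
struct toy_mount {
        int parent;
        bool mounted;
};

/* A mount is busy while any still-mounted entry is stacked on top of it. */
static bool has_mounted_child(const struct toy_mount *m, size_t n, size_t i) {
        for (size_t j = 0; j < n; j++)
                if (m[j].mounted && m[j].parent == (int) i)
                        return true;
        return false;
}

/* One pass: unmount everything that is not busy right now.
 * Returns how many entries changed state. */
static size_t unmount_pass(struct toy_mount *m, size_t n) {
        size_t changed = 0;

        for (size_t i = 0; i < n; i++)
                if (m[i].mounted && !has_mounted_child(m, n, i)) {
                        m[i].mounted = false;
                        changed++;
                }
        return changed;
}

/* Repeat passes while they make progress (stop at the fixed point).
 * Returns the number of productive passes. */
static unsigned unmount_until_stable(struct toy_mount *m, size_t n) {
        unsigned passes = 0;

        while (unmount_pass(m, n) > 0)
                passes++;
        return passes;
}
```

Take a chain root, child, grandchild listed parent-first. Each pass can only release the innermost remaining mount, so three productive passes are needed. Listed child-first, a single pass releases all three. The order changes the number of passes but not the end state.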

Thanks as always for the fast and good explanation! With that pointer,
I found the problem, see below.

> > We do have a bit of a strange fs-layout, where our root fs is kept
> > inside /versions/pristine/X on the root partition. The initramfs takes
> > care of this with some bind-mount and chroot tricks so that it looks
> > 'normal' afterwards, but maybe something along these lines is
> > confusing systemd.
>
> chroot()? Meh, you should not use chroot for these kinds of things...

Actually, we don't use chroot directly. Here's what happens:

dracut mounts the root fs at /sysroot, then in a pre-pivot dracut
trigger OLPC does:

mkdir /vsysroot
mount --bind /sysroot/versions/run/6 /vsysroot
umount /sysroot
NEWROOT=/vsysroot

Dracut then goes ahead and performs switch_root on $NEWROOT to pivot
onto the real system.

(Happy to hear advice on a nicer way to do this)

When the system finishes booting, /proc/self/mountinfo looks like:
http://dev.laptop.org/~dsd/20120411/mountinfo.txt

Now, in systemd-shutdown we reach mount_points_list_get() in umount.c,
which does:

/* If we encounter a bind mount, don't try to remount
 * the source dir too early */
skip_ro = !streq(root, "/");

Hence skip_ro gets set to 1 for our / mount.

mount_points_list_remount_read_only() then ignores the / mount and
leaves it as RW during shutdown.

I don't really understand the reasoning for the above behaviour of
bind mounts. Would it be acceptable to special-case this condition if
the path in question is / so that skip_ro does not get set? Or are
there other options available?
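The logic under discussion can be sketched in C in two steps. First, pull fields (4) root and (5) mount point out of a mountinfo line. Then decide skip_ro either as the quoted code does or with the proposed special case for /. The helper names and the sample line in the test are illustrative assumptions. The sample is modelled on the bind mount of /sysroot/versions/run/6 described above, not copied from the linked mountinfo.txt. Unlike systemd, the sketch does not decode octal escapes with cunescape():

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Extract field (4) root and field (5) mount point from one line of
 * /proc/self/mountinfo, using the allocating %ms conversion. On success
 * returns 0 and the caller frees both strings; on failure returns -1 and
 * leaves both pointers NULL. */
static int parse_mountinfo_line(const char *line, char **root, char **path) {
        *root = *path = NULL;
        /* skip (1) mount id, (2) parent id, (3) major:minor */
        if (sscanf(line, "%*s %*s %*s %ms %ms", root, path) != 2) {
                free(*root);
                free(*path);
                *root = *path = NULL;
                return -1;
        }
        return 0;
}

/* Behaviour of the quoted code: a root other than "/" marks a bind mount,
 * which the read-only remount pass then skips. */
static bool skip_ro_current(const char *root) {
        return strcmp(root, "/") != 0;
}

/* The special case asked about (hypothetical): keep skipping bind mounts
 * in general, but never skip the mount at "/", so that it still gets
 * remounted read-only. */
static bool skip_ro_special_case(const char *root, const char *path) {
        return strcmp(root, "/") != 0 && strcmp(path, "/") != 0;
}
```

For a root filesystem bind-mounted from /versions/run/6, the current rule skips it. The special case does not skip it, while other bind mounts stay skipped.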

Thanks,
Daniel


Re: [systemd-devel] Showing plymouth shutdown splash earlier during shutdown process

2012-04-11 Thread Daniel Drake
On Wed, Apr 11, 2012 at 9:42 AM, Lennart Poettering
lenn...@poettering.net wrote:
> > I tried modifying e.g. plymouth-reboot.service to have:
> > Before=reboot.service shutdown.target umount.target final.target reboot.target
>
> That suggests that the plymouth client tool is not waiting for the
> operation to finish but just asynchronously queueing the request, which
> is something that should be fixed in plymouth.

You're probably right, but before we get there, even with the above
Before= change, systemd seems to be starting plymouth-reboot.service
rather late in the process. Logs from a reboot with the Before= change
made as above:

http://dev.laptop.org/~dsd/20120411/shutdown2.txt

Any ideas?

Thanks
Daniel


Re: [systemd-devel] [PATCH] Makefile.am: reduce linked libraries

2012-04-10 Thread Daniel Drake
On Tue, Apr 10, 2012 at 6:21 AM, Kay Sievers k...@vrfy.org wrote:
> Libattr and libcap are gone now from the tools which do not need them:
>  http://cgit.freedesktop.org/systemd/systemd/commit/?id=d7832d2c6e0ef5f2839a2296c1cc2fc85c7d9632

Great! Thanks for slimming up my initramfs a bit :)

Daniel


Re: [systemd-devel] [PATCH] Makefile.am: reduce linked libraries

2012-04-04 Thread Daniel Drake
Hi,

On Tue, Nov 1, 2011 at 4:39 PM, Lennart Poettering
lenn...@poettering.net wrote:
> Hmm, let me see if I get this right: with this patch applied we'd build
> cap and selinux support into libsystemd-basic.la, but we wouldn't link
> against the respective libraries but instead do that in the binaries
> which pull in the .la?
>
> I am not sure I like this. I mean, I understand the goal, and it's a
> good one, but I think if we do this we should do this properly, and
> split up util.c so that the stuff that uses caps and selinux is
> independent of the rest and can be pulled in individually as needed.

This is true for the libcap case - libcap is only used by util.c so is
easy to split out.

But with selinux included, the task is more complicated. For example,
label.c (part of libsystemd-basic) also uses libselinux, so we need to
move it out somewhere else (let's say we put it in a new library:
libsystemd-extra). But the label_ functions are used several places
inside util.c itself. Things are tangled. If I were to go down this
path further I think we'd end up moving a huge amount of stuff to
libsystemd-extra.

Instead, do any of the following options make sense?

- Special-case systemd-timestamp because it's used in the initramfs.
Instead of linking against libsystemd-basic, just pull in util.c
directly into the compilation and link with -lrt.

- Create a new shared library used in compilation
(libsystemd-verybasic?), initially only containing the time-related
functions used by systemd-timestamp. Link systemd-timestamp against
that, and be happy.

- While linking executables (or immediately after), perform some
checks to see if the linked libraries are *really* necessary, and if
they aren't, drop the links. vim does this via the attached script.

Thanks,
Daniel


Re: [systemd-devel] [PATCH] Makefile.am: reduce linked libraries

2012-04-04 Thread Daniel Drake
On Wed, Apr 4, 2012 at 10:36 AM, Daniel Drake d...@laptop.org wrote:
> But with selinux included, the task is more complicated. For example,
> label.c (part of libsystemd-basic) also uses libselinux, so we need to
> move it out somewhere else (let's say we put it in a new library:
> libsystemd-extra). But the label_ functions are used several places
> inside util.c itself. Things are tangled. If I were to go down this
> path further I think we'd end up moving a huge amount of stuff to
> libsystemd-extra.

I just realised that udevd links against libselinux, so even if we fix
systemd-timestamp I still won't be winning on that front, and I don't
see an easy way to keep udevd out of a dracut initramfs.

However, dropping the link against libcap (which also includes
libattr) would be nice. Here is a patch to do that.


Now that udev is included in systemd I will use this opportunity to
moan a little about the next dependency lover that gets included in
the initramfs: udevadm.

/usr/bin/udevadm
linux-gate.so.1 =>  (0xb771f000)
libselinux.so.1 => /lib/libselinux.so.1 (0xb76b9000)
libblkid.so.1 => /lib/libblkid.so.1 (0xb768f000)
libkmod.so.2 => /lib/libkmod.so.2 (0xb7677000)
librt.so.1 => /lib/librt.so.1 (0xb766e000)
libc.so.6 => /lib/libc.so.6 (0xb74be000)
libdl.so.2 => /lib/libdl.so.2 (0xb74b9000)
/lib/ld-linux.so.2 (0x4610a000)
libuuid.so.1 => /lib/libuuid.so.1 (0xb74b3000)
liblzma.so.5 => /lib/liblzma.so.5 (0xb748a000)
libz.so.1 => /lib/libz.so.1 (0xb7474000)
libpthread.so.0 => /lib/libpthread.so.0 (0xb7459000)

Don't suppose there is any obvious reduction possible here?

Thanks,
Daniel


Re: [systemd-devel] [PATCH] Makefile.am: reduce linked libraries

2012-04-04 Thread Daniel Drake
On Wed, Apr 4, 2012 at 10:53 AM, Daniel Drake d...@laptop.org wrote:
> However, dropping the link against libcap (which also includes
> libattr) would be nice. Here is a patch to do that.


Re: [systemd-devel] [PATCH] Makefile.am: reduce linked libraries

2012-04-04 Thread Daniel Drake
On Wed, Apr 4, 2012 at 2:02 PM, Kay Sievers k...@vrfy.org wrote:
> Right, when udevadm is there, then there is udevd, which definitely
> needs all of them.

That's a good point, and if udevd really needs them, then there's no escaping it.
So I guess there is nothing to gain here :(

Daniel


Re: [systemd-devel] systemd hangs on shutdown

2011-10-13 Thread Daniel Drake
On Tue, Oct 11, 2011 at 1:44 AM, Lennart Poettering
lenn...@poettering.net wrote:
> So this is the big issue here, I believe. If you look at 87.293308
> you'll see that tmp.mount is suddenly mounted again for some reason,
> which systemd then takes as a hint to get rid of
> poweroff.target/poweroff.service, since they conflict with that.
>
> The key to the mystery here is figuring out why systemd suddenly sees
> those mount points coming back. It would be good to figure out what the
> mount table is when that happens.

Thanks for looking carefully at this!

It looks like the problem is that we had /tmp mounted as tmpfs, then
mounted as tmpfs again on top. We've had this for a long time
(unintentionally), but it hadn't surfaced as an issue until now - we
didn't even realise.

After removing the duplicate mount setup so that /tmp is only mounted
once, the system shuts down.

Thanks,
Daniel


[systemd-devel] Excessive linking of systemd-timestamp

2011-10-13 Thread Daniel Drake
Hi,

Running systemd-36 on Fedora 16.

$ ldd /lib/systemd/systemd-timestamp
linux-gate.so.1 =>  (0x00a84000)
libselinux.so.1 => /lib/libselinux.so.1 (0x0059c000)
libcap.so.2 => /lib/libcap.so.2 (0x00901000)
librt.so.1 => /lib/librt.so.1 (0x00a6a000)
libc.so.6 => /lib/libc.so.6 (0x0011)
/lib/ld-linux.so.2 (0x009c1000)
libdl.so.2 => /lib/libdl.so.2 (0x00f09000)
libattr.so.1 => /lib/libattr.so.1 (0x00f23000)
libpthread.so.0 => /lib/libpthread.so.0 (0x007be000)


The excessive linking of this tiny application is challenging my
efforts to keep our initramfs slim for our embedded setup. dracut
includes this app in the initramfs by default, and to satisfy its
requirements it results in all those libraries getting added too.

Could this be reduced? I guess all it needs is libc.
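As a rough indication of how little such a tool needs, here is a sketch of a timestamp printer that uses only clock_gettime(). It assumes the tool's job is to print the current realtime and monotonic clocks in microseconds. The output format here is an assumption, not systemd-timestamp's documented behaviour. With glibc before 2.17, clock_gettime() lives in librt, so -lrt (the librt.so.1 entry above) would still be needed there:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Read one clock and convert it to microseconds (0 if the read fails). */
static uint64_t clock_usec(clockid_t clock) {
        struct timespec ts;

        if (clock_gettime(clock, &ts) < 0)
                return 0;
        return (uint64_t) ts.tv_sec * UINT64_C(1000000) +
               (uint64_t) ts.tv_nsec / 1000;
}

/* Print "<realtime usec> <monotonic usec>\n"; returns fprintf's result. */
static int print_dual_timestamp(FILE *out) {
        return fprintf(out, "%llu %llu\n",
                       (unsigned long long) clock_usec(CLOCK_REALTIME),
                       (unsigned long long) clock_usec(CLOCK_MONOTONIC));
}
```

Built with a current glibc, a program around this should need little beyond libc and the dynamic loader.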

Thanks,
Daniel


Re: [systemd-devel] systemd hangs on shutdown

2011-09-29 Thread Daniel Drake
On Thu, Sep 29, 2011 at 2:29 PM, Daniel Drake d...@laptop.org wrote:
> Full log of startup, shutdown, and sysrq dumps at point of first hang
> (before systemd-stdout-syslog-bridge.service wakeup), and second hang,
> then sysrq dumps again:
> http://dev.laptop.org/~dsd/20110929/systemd-shutdown-hang-debug.txt

I've tried to correspond this to the systemd source and unit files and
I think I might have found something of relevance.

One of the last links in the chain is that poweroff.service gets
started and calls systemctl --force poweroff, right?

In my log, poweroff.service gets installed to be run:

[   57.887771] systemd[1]: Installed new job poweroff.service/start as 243

but never gets run. By that I mean: When other services are queued to
be started, they later get started with 'About to execute' messages,
e.g.:

[   57.941081] systemd[1]: Installed new job alsa-store.service/start as 250
[   60.373390] systemd[1]: About to execute: /sbin/alsactl store
[   60.450713] systemd[1]: Forked /sbin/alsactl as 1505
[   60.456367] systemd[1]: alsa-store.service changed dead -> start

However, the poweroff.service never gets any of the 'about to
execute', 'forked' or 'dead -> start' messages. It actually gets stopped
for some reason, perhaps before it has had a chance to do its thing?

[   57.887771] systemd[1]: Installed new job poweroff.service/start as 243
snip
[   87.312551] systemd[1]: Installed new job poweroff.service/stop as 347
[   87.340953] systemd[1]: Job poweroff.service/stop finished, result=done

However, I think it should have been ready to run. From
poweroff.service, its requirements/dependencies are:
  Requires=shutdown.target umount.target final.target
  After=shutdown.target umount.target final.target

and all of those seem to have finished:
[   87.255264] systemd[1]: Job shutdown.target/start finished, result=done
[   87.284275] systemd[1]: Job final.target/start finished, result=done
[   87.353693] systemd[1]: Job umount.target/stop finished, result=done
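The reasoning here is that a start job ordered After= other units may run once none of those units still has a pending job. Below is a toy C model of just that check, using hypothetical types. It deliberately ignores Conflicts=, which can cancel a queued job outright regardless of ordering. Per Lennart's diagnosis of this hang, a conflict is what removed poweroff.service here:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-simplified job states. */
enum toy_job_state { TOY_NO_JOB, TOY_JOB_WAITING, TOY_JOB_DONE };

struct toy_unit {
        const char *name;
        enum toy_job_state job;
};

/* A start job ordered After= the given units may run once none of them
 * still has an unfinished job; units with no job at all do not block it.
 * Conflicts= is not modelled. */
static bool start_job_runnable(const struct toy_unit *after, size_t n) {
        for (size_t i = 0; i < n; i++)
                if (after[i].job == TOY_JOB_WAITING)
                        return false;
        return true;
}
```

Fed the three targets from the log, all finished, the model says the job could run. That is why the cause has to be looked for elsewhere.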

Am I onto something here, or am I going in the wrong direction?
Debugging tips much appreciated.

cheers
Daniel


Re: [systemd-devel] Problems with rootfs over nfs

2011-05-16 Thread Daniel Drake
On 15 May 2011 15:16, Kay Sievers kay.siev...@vrfy.org wrote:
> Just a first quick check of an issue we ran into with ATA disks:
> what's in /proc/sys/kernel/hotplug before you shut down? Or what's
> CONFIG_UEVENT_HELPER in your kernel setup, it must be ="" on modern
> systems, otherwise the kernel will try to exec() binaries all the
> time and keep the system's rootfs busy.

I'm also having trouble shutting down with systemd, and I have
CONFIG_UEVENT_HELPER_PATH=/sbin/hotplug
So I'll try this solution. Thanks.

Just a quick question: is the same also true for Fedora 14
(upstart-1.2, udev-161)? i.e. can and should that config option be
cleared under that setup too? I guess so, given that /sbin/hotplug
doesn't even exist.
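Checking the first suggestion programmatically just means reading /proc/sys/kernel/hotplug and seeing whether it names a helper. Below is a minimal C sketch. The function name is hypothetical, and the path is a parameter so the check can be pointed at any file:

```c
#include <ctype.h>
#include <stdio.h>

/* Returns 1 if `file` (normally /proc/sys/kernel/hotplug) is empty or
 * holds only whitespace, 0 if it names a helper such as /sbin/hotplug,
 * and -1 if it cannot be opened (e.g. the kernel does not expose it). */
static int uevent_helper_unset(const char *file) {
        char buf[256] = "";
        FILE *f = fopen(file, "r");

        if (!f)
                return -1;
        if (!fgets(buf, sizeof(buf), f))
                buf[0] = '\0';          /* empty file */
        fclose(f);

        for (const char *p = buf; *p != '\0'; p++)
                if (!isspace((unsigned char) *p))
                        return 0;       /* some helper path is configured */
        return 1;
}
```

With CONFIG_UEVENT_HELPER_PATH=/sbin/hotplug as above, `uevent_helper_unset("/proc/sys/kernel/hotplug")` would be expected to return 0 unless something clears the value at boot.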

Thanks,
Daniel


Re: [systemd-devel] Failed to connect to socket /org/freedesktop/systemd1/private

2011-05-14 Thread Daniel Drake
On 10 May 2011 22:26, Daniel Drake d...@laptop.org wrote:
> If I log the error after it tries to connect to /run/systemd/private, I get:
> Failed to connect to socket /run/systemd/private: No such file or directory
>
> Indeed, there's nothing at that path, and the only thing in
> /run/systemd/ is an empty directory at /run/systemd/ask-password

For completeness, this is a conflict between systemd and Fedora's
readonly-root system. Filed at
https://bugzilla.redhat.com/show_bug.cgi?id=704783

Now systemd is a bit happier :)

Daniel


Re: [systemd-devel] Failed to connect to socket /org/freedesktop/systemd1/private

2011-05-10 Thread Daniel Drake
On 9 May 2011 22:46, Lennart Poettering lenn...@poettering.net wrote:
> You are lacking autofs4 support in the kernel. You should fix this first.

I'm not; autofs4 is present. We're working on figuring out why
systemd complains at
https://bugs.freedesktop.org/show_bug.cgi?id=36993

> Your udev in your initrd is a different version than on your system. You
> should really fix that too.

Ah, that explains that part. Done.

> /etc/mtab is not a symlink to /proc/mounts. Please fix.

Done.

> Your systemd userspace seems to be out-of-date, and not from the same
> package that installed /bin/systemd. i.e. the abstract namespace socket
> /org/freedesktop/systemd1/private was used in older systemd versions,
> but has since moved to /run/systemd/private. Your userspace still tries
> to access the old socket, but systemd 26 (which you appear to be
> running) uses the new one.

This image is freshly made, starting from an empty disk, from Fedora
rawhide as of yesterday, so there should be no mixing going on.

But I've debugged it a little to see why it's using the wrong path:

We are reaching bus_connect() in dbus-common.c.
It first tries to connect to /run/systemd/private, but this fails.
It then falls back on /org/freedesktop/systemd1/private. This fails
too. So it then returns an error, and only the last error (which
was about /org/freedesktop/systemd1/private) gets logged.
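The try-then-fall-back pattern described here, and why only the second error surfaces, can be sketched with plain AF_UNIX sockets. This is a hypothetical illustration, not systemd's bus_connect(). It keeps both error codes, so the first failure (ENOENT for a missing /run/systemd/private) is not hidden behind the second (ECONNREFUSED for an abstract name nobody is listening on):

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect an AF_UNIX stream socket to `name`: a filesystem path, or, if
 * `abstract` is non-zero, a name in Linux's abstract namespace (stored
 * after a leading NUL byte). Returns a connected fd or -errno. */
static int unix_connect(const char *name, int abstract) {
        struct sockaddr_un sa;
        size_t len = strlen(name);

        if (len + 1 > sizeof(sa.sun_path))
                return -ENAMETOOLONG;
        memset(&sa, 0, sizeof(sa));
        sa.sun_family = AF_UNIX;
        memcpy(sa.sun_path + (abstract ? 1 : 0), name, len);

        /* path + trailing NUL, or leading NUL + name: len + 1 bytes either way */
        socklen_t salen = (socklen_t) (offsetof(struct sockaddr_un, sun_path) + len + 1);

        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
                return -errno;
        if (connect(fd, (const struct sockaddr *) &sa, salen) < 0) {
                int r = -errno;
                close(fd);
                return r;
        }
        return fd;
}

/* Try `path` first, then `abstract_name`. Both error codes are handed back
 * so a caller can log the first failure instead of only the last one. */
static int connect_with_fallback(const char *path, const char *abstract_name,
                                 int *first_err, int *second_err) {
        *first_err = *second_err = 0;

        int fd = unix_connect(path, 0);
        if (fd >= 0)
                return fd;
        *first_err = fd;

        fd = unix_connect(abstract_name, 1);
        if (fd < 0)
                *second_err = fd;
        return fd;
}
```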

If I log the error after it tries to connect to /run/systemd/private, I get:
Failed to connect to socket /run/systemd/private: No such file or directory

Indeed, there's nothing at that path, and the only thing in
/run/systemd/ is an empty directory at /run/systemd/ask-password

So I then looked on the systemd side of things, bus_init_private() in
dbus.c does create this socket just fine (via dbus_server_listen()).
So the question is when and why does it disappear? I sprinkled debug
statements throughout the code and determined that the socket
disappears after manager_loop() has iterated around 47 times.

Does this give you any ideas? Any suggestions for next debugging
steps? Is there an easy way to make manager_loop() log exactly what it
does on each iteration?

Thanks,
Daniel


Re: [systemd-devel] systemd fails to boot OLPC XO-1.5

2011-05-09 Thread Daniel Drake
On 7 May 2011 23:43, Daniel Drake d...@laptop.org wrote:
> On 7 May 2011 23:30, Kay Sievers kay.siev...@vrfy.org wrote:
> > You need capabilities in your kernel, or comment its use out, in the
> > service file.
>
> I think I have capabilities in my kernel: CONFIG_SECURITY=y which
> means security/capability.c gets compiled in. Were you thinking of
> something else?
>
> Commenting out CapabilityBoundingSet from systemd-kmsg-syslogd.service
> does fix the issue and allow boot to continue. Thanks!
>
> Is this a systemd bug (maybe it should ignore CapabilityBoundingSet
> lines when capabilities aren't available?) or do I need to decide
> between hacking systemd unit files or going with this requirement?

I looked further.

systemd.exec man page pointed me to capabilities(7) man page. That man
page says:

   Removing capabilities from the bounding set is only supported if file
   capabilities are compiled into the kernel (CONFIG_SECURITY_FILE_CAPABILITIES).

That option doesn't exist in the kernel any more, it was removed by:

commit b3a222e52e4d4be77cc4520a57af1a4a0d8222d1
Author: Serge E. Hallyn se...@us.ibm.com
Date:   Mon Nov 23 16:21:30 2009 -0600

remove CONFIG_SECURITY_FILE_CAPABILITIES compile option

That commit made it unconditionally enabled, in agreement with this part
of security/Makefile in modern kernels:

# always enable default capabilities
obj-y   += commoncap.o

So, I don't think it's possible to build a kernel without capabilities
support. The problem must be something else (but commenting out those
CapabilityBoundingSet lines does work around the problem). Any ideas /
next debugging steps?

I filed a bug for the /sys/kernel/security problem:
https://bugs.freedesktop.org/show_bug.cgi?id=36993

Thanks,
Daniel


[systemd-devel] Failed to connect to socket /org/freedesktop/systemd1/private

2011-05-09 Thread Daniel Drake
Hi,

Another systemd error I am encountering is:

Failed to get D-Bus connection: Failed to connect to socket
/org/freedesktop/systemd1/private: Connection refused

The message appears a lot throughout boot, full logs here:
http://dev.laptop.org/~dsd/20110509/systemd-boot.txt

It also means I can't run systemctl to diagnose other issues I am having:

# systemctl status sys-kernel-security.automount
Failed to get D-Bus connection: Failed to connect to socket
/org/freedesktop/systemd1/private: Connection refused

Any thoughts or things to check for?

Thanks,
Daniel