Re: [systemd-devel] [PATCH 1/3] generators: rename add_{root, usr}_mount to add_{sysroot, sysroot_usr}_mount

2015-05-03 Thread Lennart Poettering
On Sat, 02.05.15 13:16, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:

 This makes it obvious that those functions are only usable in the
 initramfs.
 
 Also, add a warning when noauto, nofail, or automount is used for the
 root fs, instead of silently ignoring. Using those options would be a
 sign of significant misconfiguration, and if we bother to check for
 them, then let's go all the way and complain.
 
 Other various small cleanups and reformattings elsewhere.

Sounds all good to me!

 ---
  src/fstab-generator/fstab-generator.c | 21 +
  src/shared/generator.c| 21 -
  src/shared/generator.h| 17 +
  3 files changed, 38 insertions(+), 21 deletions(-)
 
 diff --git a/src/fstab-generator/fstab-generator.c b/src/fstab-generator/fstab-generator.c
 index 7aee3359e7..664ee2aa6f 100644
 --- a/src/fstab-generator/fstab-generator.c
 +++ b/src/fstab-generator/fstab-generator.c
 @@ -176,6 +176,7 @@ static int write_idle_timeout(FILE *f, const char *where, const char *opts) {
  
  return 0;
  }
 +
  static int add_mount(
  const char *what,
  const char *where,
 @@ -213,10 +214,14 @@ static int add_mount(
  return 0;
  
  if (path_equal(where, "/")) {
 -/* The root disk is not an option */
 -automount = false;
 -noauto = false;
 -nofail = false;
 +if (noauto)
 +log_warning("Ignoring \"noauto\" for root device");
 +if (nofail)
 +log_warning("Ignoring \"nofail\" for root device");
 +if (automount)
 +log_warning("Ignoring automount option for root device");
 +
 +noauto = nofail = automount = false;
  }
  
  name = unit_name_from_path(where, ".mount");
 @@ -419,7 +424,7 @@ static int parse_fstab(bool initrd) {
  return r;
  }
  
 -static int add_root_mount(void) {
 +static int add_sysroot_mount(void) {
  _cleanup_free_ char *what = NULL;
  const char *opts;
  
 @@ -453,7 +458,7 @@ static int add_root_mount(void) {
   "/proc/cmdline");
  }
  
 -static int add_usr_mount(void) {
 +static int add_sysroot_usr_mount(void) {
  _cleanup_free_ char *what = NULL;
  const char *opts;
  
 @@ -600,9 +605,9 @@ int main(int argc, char *argv[]) {
  
  /* Always honour root= and usr= in the kernel command line if we are in an initrd */
  if (in_initrd()) {
 -r = add_root_mount();
 +r = add_sysroot_mount();
  if (r == 0)
 -r = add_usr_mount();
 +r = add_sysroot_usr_mount();
  }
  
  /* Honour /etc/fstab only when that's enabled */
 diff --git a/src/shared/generator.c b/src/shared/generator.c
 index 569b25bb7c..7b2f846175 100644
 --- a/src/shared/generator.c
 +++ b/src/shared/generator.c
 @@ -32,13 +32,13 @@
  
  int generator_write_fsck_deps(
  FILE *f,
 -const char *dest,
 +const char *dir,
  const char *what,
  const char *where,
  const char *fstype) {
  
  assert(f);
 -assert(dest);
 +assert(dir);
  assert(what);
  assert(where);
  
 @@ -58,10 +58,10 @@ int generator_write_fsck_deps(
  return log_warning_errno(r, "Checking was requested for %s, but fsck.%s cannot be used: %m", what, fstype);
  }
  
 -if (streq(where, "/")) {
 +if (path_equal(where, "/")) {
  char *lnk;
  
 -lnk = strjoina(dest, "/" SPECIAL_LOCAL_FS_TARGET ".wants/systemd-fsck-root.service");
 +lnk = strjoina(dir, "/" SPECIAL_LOCAL_FS_TARGET ".wants/systemd-fsck-root.service");
  
  mkdir_parents(lnk, 0755);
  if (symlink(SYSTEM_DATA_UNIT_PATH "/systemd-fsck-root.service", lnk) < 0)
 @@ -75,17 +75,20 @@ int generator_write_fsck_deps(
  return log_oom();
  
  fprintf(f,
 -"RequiresOverridable=%s\n"
 -"After=%s\n",
 -fsck,
 +"RequiresOverridable=%1$s\n"
 +"After=%1$s\n",
  fsck);
  }
  
  return 0;
  }
  
 -int generator_write_timeouts(const char *dir, const char *what, const char *where,
 - const char *opts, char **filtered) {
 +int generator_write_timeouts(
 +const char *dir,
 +const char *what,
 +const char *where,
 +const char *opts,
 +char **filtered) {
  
  /* Allow configuration how long we wait for a device that
   * backs a mount point to show 

Re: [systemd-devel] [PATCH 2/3] Allow $SYSTEMD_PRETEND_INITRD to override initramfs detection

2015-05-03 Thread Lennart Poettering
On Sat, 02.05.15 13:16, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:

 When testing generators and other utilities, it is extremely useful
 to be able to trigger initramfs behaviour.

Hmm, what about the following solution: instead of checking with
access() for /etc/initrd-release and then also checking with statfs()
on / whether the root disk is writable, maybe we should just
immediately invoke statfs() on /etc/initrd-release and check the
results. If the call returns ENOENT we know that the file doesn't
exist, and if it returns useful data we can verify if it's tmpfs. 

Now, with that in place to test initrd code something like this
suffices:

touch /etc/initrd-release /run/initrd-release
mount --bind /run/initrd-release /etc/initrd-release

As that would result in a file at /etc/initrd-release that is backed
by tmpfs.
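
To make the check concrete, here is a minimal Python sketch of finding out which file system backs a path. It is an illustration only: the real check would be a single statfs() call in C. Python's os.statvfs() does not expose the file system type, so the sketch swaps in a lookup in /proc/self/mountinfo instead. All function names are made up for this example.

```python
import os


def parse_mountinfo(text):
    """Return (mount_point, fstype) pairs, in mount order, from mountinfo text."""
    mounts = []
    for line in text.splitlines():
        # The fields left of " - " end with optional tags; the fstype follows it.
        left, sep, right = line.partition(" - ")
        if not sep:
            continue
        left_fields = left.split()
        right_fields = right.split()
        if len(left_fields) < 5 or not right_fields:
            continue
        mounts.append((left_fields[4], right_fields[0]))
    return mounts


def fstype_of(path, mounts):
    """Pick the fstype of the longest mount point that contains `path`."""
    best_len, best_type = -1, None
    for mount_point, fstype in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same mount point win.
            if len(mount_point) >= best_len:
                best_len, best_type = len(mount_point), fstype
    return best_type


def looks_like_initrd(path="/etc/initrd-release", mountinfo="/proc/self/mountinfo"):
    """True if `path` exists and is backed by tmpfs (hypothetical helper)."""
    if not os.path.exists(path):
        return False  # corresponds to statfs() failing with ENOENT
    with open(mountinfo) as f:
        return fstype_of(path, parse_mountinfo(f.read())) == "tmpfs"
```

With the bind mount from above in place, the mount table gains an entry for /etc/initrd-release of type tmpfs, which is what the lookup would pick up.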

In general I'd be very conservative when adding new APIs (which is
basically what $SYSTEMD_PRETEND_INITRD would become), especially if we
only need them for debugging, they are quite dangerous, and we
have other options too...

I hope that makes sense?

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [Q] About supporting nested systemd daemon

2015-05-03 Thread Lennart Poettering
On Thu, 30.04.15 15:42, Alban Crequy (al...@endocode.com) wrote:

  systemd-nspawn nowadays mounts all hierarchies into the container, but
  mounts all controller hierarchies read-only, and of the name=systemd
  hierarchy mounts everything read-only, except the subtree the
  container is allowed to manage. That way only the cgroup tree the
  container needs access to is writable to it. That solution however
  does not hide the cgroup tree. A process running inside the container
  can still go and explore the tree and its attributes. However, all
  other groups will appear empty to it, since processes not in the
  container PID namespaces will be suppressed when reading the member
  process list.
 
 To sum up what systemd-nspawn is currently mounting in the container:
 - /sys/fs/cgroup/systemd/  --  mounted RO
 - /sys/fs/cgroup/systemd/machine.slice/machine-xxx.scope/  -- mounted RW
 - /sys/fs/cgroup/cpu,cpuacct/  --  mounted RO
 - etc. for other cgroup hierarchies  --  mounted RO

Correct.

 In order to let systemd in the container restrict cpu, memory, etc. on
 some of its services (see manpage systemd.resource-control(5)), rkt
 would like systemd-nspawn to mount a subtree of some hierarchy
 (cpu,cpuacct, memory) in read-write mode.

That's really not a safe thing to do right now... the kernel isn't
ready for this, as cgroups access is an all-or-nothing thing
currently: if you have access to a cgroup and can create child cgroups
in it you have access to *all* attributes you like, the dangerous ones
as well as the not so dangerous ones.

 Are there any issues with changing the systemd-nspawn mounts in the
 following way:
 - /sys/fs/cgroup/systemd/  --  mounted RO
 - /sys/fs/cgroup/systemd/machine.slice/machine-xxx.scope/  -- mounted RW
 - /sys/fs/cgroup/cpu,cpuacct/  --  mounted RO
 - /sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-xxx.scope/  -- mounted RW
 - etc. for other cgroup hierarchies.
 
 Iago wrote two experimental patches on systemd-nspawn to try that and
 it worked. Delegate=yes was enabled in systemd-nspawn in order to test
 this:
 https://github.com/endocode/systemd/commits/iaguis/delegate
 
 But I would like to know what is missing to make this safe (or if it
 is already safe to do).

Well, nspawn does actually not make any guarantees about security
currently. Since we pass CAP_SYS_ADMIN by default to the containers
people can mount whatever they want and remount things freely from
within. Hence, opening this up would not make things much worse.

That said, I am a bit concerned about opening this up by default. Even
though containers are insecure we should try to be safe wherever we
can if it doesn't affect usability too much. 

Adding a new cmdline switch for all of this sounds not too attractive
though, but maybe a --delegate switch would be OK, which would open up
all controllers to the containers. It would then have a similar effect
on the containers as Delegate=yes has for service processes...

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] Sending a SIGABRT to PID1

2015-05-03 Thread Lennart Poettering
On Sun, 03.05.15 17:54, Víctor Fernández (vfr...@gmail.com) wrote:

 Ok, Thanks for your reply.
 
 But, just out of curiosity, why does the init process go down with a SIGABRT
 but not with a SIGKILL (9), given that this is a signal which cannot be caught,
 blocked or ignored?

The kernel refuses to deliver SIGKILL to PID 1. I mean, it's a signal
that cannot be caught by userspace anyway, and hence it would without
exception result in machine halt, and hence the kernel eats it up...

I am pretty sure it's pretty irrelevant whether the kernel delivers it
or not, but they chose not to...
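
The "cannot be caught by userspace" part is easy to check from an ordinary process. A minimal Python sketch follows (it says nothing about PID 1 itself, and the helper name is invented for illustration): trying to install a handler for SIGKILL is rejected, while SIGABRT accepts one.

```python
import signal


def can_install_handler(signum):
    """Return True if a handler for `signum` can be installed.

    The previous handler is restored afterwards. Must be called from the
    main thread, since Python only allows signal.signal() there.
    """
    try:
        previous = signal.signal(signum, lambda signo, frame: None)
    except OSError:
        # The kernel refuses handlers for SIGKILL (and SIGSTOP).
        return False
    signal.signal(signum, previous)
    return True


if __name__ == "__main__":
    print("SIGABRT catchable:", can_install_handler(signal.SIGABRT))
    print("SIGKILL catchable:", can_install_handler(signal.SIGKILL))
```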

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] Sending a SIGABRT to PID1

2015-05-03 Thread Lennart Poettering
On Sun, 03.05.15 19:10, Mantas Mikulėnas (graw...@gmail.com) wrote:

 On Sun, May 3, 2015 at 6:54 PM, Víctor Fernández vfr...@gmail.com wrote:
 
  Ok, Thanks for your reply.
 
  But, just out of curiosity, why does the init process go down with a SIGABRT
  but not with a SIGKILL (9), given that this is a signal which cannot be caught,
  blocked or ignored?
 
 
 pid 1 is allowed to catch SIGKILL, and usually does so, so that you can
 sigkill everything (e.g. Alt+SysRq+I) and still have a working system
 afterwards.

Hmm, it is allowed to catch SIGKILL? That would be news to me, and
systemd certainly doesn't. Do you have any reference?

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] Journald logging handler for Python 3 and AsyncIO integration

2015-05-03 Thread Lennart Poettering
On Sat, 02.05.15 15:12, Ludovic Gasc (gml...@gmail.com) wrote:

2. We make heavy use of the AsyncIO module for async patterns in Python,
especially for I/O: https://docs.python.org/3/library/asyncio.html
In the source code of python-systemd, I've seen that you use C glue to
interact with journald, but I don't understand how my Python daemon
process and journald communicate: Unix sockets? Some other mechanism?
Depending on the mechanism, it could have an impact on us.

The communication is via an AF_UNIX/SOCK_DGRAM socket in the file
system.

I am very sure that logging should not be asynchronous non-blocking
IO. It's about reliably getting out log messages at the right times,
and that's a property you lose if you enqueue logs non-blocking.

I don't think it's a good idea to log asynchronously in any
programming language. I mean, there's a reason why libc syslog() is
blocking too.

If you are afraid of blocking logging, then make sure to use large
socket buffers. The journal client library and journald will try to
use very large buffers, but this doesn't always work if the client is
unprivileged. Also, you might have to increase the number of datagrams
that may be queued, via /proc/sys/net/unix/max_dgram_qlen.
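
A minimal Python sketch of the socket-buffer part, for illustration only. It creates an AF_UNIX/SOCK_DGRAM socket and asks for a larger send buffer. The 8 MiB figure and the helper names are assumptions made up for this example, not what the journal client library uses. The kernel may clamp the request for unprivileged processes, so the effective size is read back.

```python
import socket

REQUESTED_SNDBUF = 8 * 1024 * 1024  # assumed value, purely illustrative


def make_datagram_socket(sndbuf=REQUESTED_SNDBUF):
    """Create an AF_UNIX datagram socket and request a large send buffer."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    return sock


def effective_sndbuf(sock):
    """Read back the send buffer size the kernel actually granted."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


def max_dgram_qlen(path="/proc/sys/net/unix/max_dgram_qlen"):
    """Return the datagram queue length limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    s = make_datagram_socket()
    print("granted SO_SNDBUF:", effective_sndbuf(s))
    print("max_dgram_qlen:", max_dgram_qlen())
    s.close()
```

If the granted size comes back much smaller than requested, that is the unprivileged case mentioned above.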


Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] [PATCH 3/3] Use a stamp file to avoid running systemd-fsck-root.service twice

2015-05-03 Thread Lennart Poettering
On Sun, 03.05.15 18:06, Andrei Borzenkov (arvidj...@gmail.com) wrote:

  On Sat, 02.05.15 13:16, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) 
  wrote:
  
  So, the last time we discussed this we figured we should do this
  differently, and simply generate systemd-fsck-root.service in the
  initrd as well, that uses a different command line internally. The end
  result would then be that we can do without flag file, and always have
  the guarantee that systemd-fsck-root.service is the service that
  fsck'ed the root file system, regardless whether in initrd or not.
  
 
 systemd-fsck@.service has explicit dependency on
 systemd-fsck-root.service so other mounts (/usr, anything else?) will
 be serialized after it. Currently they can run in parallel.

 Not that I think it is a big problem, but it is at least something to consider.

One option could be to introduce a new target root-fs-ready.target
that is pulled in from the host OS but not in the initrd. s-f-r.s
would order itself before it, and s-f@.s after. Hence, if the target
is in the initial transaction then it will effectively serialize
things as we want if no initrd is in the mix, and if it is missing
from the initial transaction then everything will be parallelized.

That said, I don't think it's worth it. I'd just accept that things in
the initrd are as parallel or serial as they are on the host...
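
For what it's worth, the effect of such a target can be modelled in a few lines. This is a toy Python sketch, not systemd's transaction logic; it only assumes that an ordering dependency matters when both units are part of the transaction. The /usr fsck unit name and the helper are invented for illustration, and root-fs-ready.target is the hypothetical target from above.

```python
def is_ordered_before(first, second, transaction, ordering):
    """True if `first` must finish before `second` starts.

    `ordering` holds (before, after) pairs; a pair only counts when both
    of its units are in `transaction`.
    """
    active = [(b, a) for b, a in ordering if b in transaction and a in transaction]
    seen, stack = set(), [first]
    while stack:
        unit = stack.pop()
        for before, after in active:
            if before == unit and after not in seen:
                if after == second:
                    return True
                seen.add(after)
                stack.append(after)
    return False


ORDERING = [
    ("systemd-fsck-root.service", "root-fs-ready.target"),
    ("root-fs-ready.target", "systemd-fsck@usr.service"),
]
FSCKS = {"systemd-fsck-root.service", "systemd-fsck@usr.service"}

host = FSCKS | {"root-fs-ready.target"}  # target pulled in on the host
initrd = FSCKS                           # target absent in the initrd
```

On the host set the two fsck units come out serialized through the target; on the initrd set no ordering between them remains.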

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] pam_systemd.so indirectly calling pam_acct_mgmt

2015-05-03 Thread Lennart Poettering
On Sat, 02.05.15 07:01, Stephen Gallagher (sgall...@redhat.com) wrote:

  Well, I guess for now. But note that eventually we hope to move most
  programs invoked from .desktop into this as systemd services. This
  then means that the actual sessions will become pretty empty, with
  only stubs remaining that trigger services off this user instance of
  systemd.
  
 
 If you do that, you will still need some way to invoke PAM with
 different service identities otherwise you'll be implementing a
 pretty severe vulnerability into the system. If all services are
 authorized by the same PAM service, it amounts to removing the
 ability for administrators to differentiate which actions a
 particular user is allowed to perform.

Well, if you are enough logged in to run arbitrary scripts (like gdm,
ssh or cron allow you to), then you are in, for whatever you want to
do, there's no way around that, and having different PAM services
could only hide that fact, but not avoid it...

The admin still has a lot of control over how you can log in though. For
example, gdm will still use PAM to check if you are allowed to log in
graphically, on a seat. If that's denied, then the login will be
refused. Only if you managed to log in can you also use the systemd
user instance.

Also note that lingering is something that needs to be turned on
with privileges. If you don't have the privs to turn this on, you
cannot make use of this feature, and the user instance of systemd is
strictly reference counted by your PAM sessions, which means that as soon
as you have logged out from all your terminals/graphical seats you also
lose the user instance.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] networkd: Is auto-negotiation turned off when specifying parameters in a link file?

2015-05-03 Thread Lennart Poettering
On Sat, 02.05.15 12:00, Paul Menzel (paulepan...@users.sourceforge.net) wrote:

  /etc/udev/rules.d/10-speed1G-enp1s6.rules
  ACTION=="add", SUBSYSTEM=="net", RUN+="/usr/sbin/ethtool -s enp1s6 advertise 0x20"
  
  :03 systemd[1]: Starting Network Service...
  :05 systemd-networkd[1612]: enp1s6  : link configured
  :05 systemd-networkd[1612]: enp1s6  : gained carrier
  :06 systemd-networkd[1612]: enp1s6  : lost carrier
  :09 systemd-networkd[1612]: enp1s6  : gained carrier
  
  ~~~
  
  /etc/udev/rules.d/10-speed1G-enp1s6.rules-
  
  :15 systemd[1]: Starting Network Service...
  :17 systemd-networkd[1633]: enp1s6  : link configured
  :17 systemd-networkd[1633]: enp1s6  : gained carrier
 
 So in your case, `gained carrier` is indeed shown earlier saving two
 seconds. The next message probably indicates a problem with the driver.
 
 Poma, what Linux kernel do you use?
 
 Lennart, is poma’s test sufficient to show that integrating an
 `advertise` command(?) into systemd-networkd would be useful?

Hmm? Not sure I understand the test, but if I got it right then it
shows that using ethtool like this slows things down by 3s?


Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] systemd-nspawn --template: should it delete /etc/hostname?

2015-05-03 Thread Lennart Poettering
On Fri, 01.05.15 19:38, Kai Krakow (hurikha...@gmail.com) wrote:

 Hello!
 
 If I create a new machine by cloning using systemd-nspawn --template, should 
 it remove etc/hostname? It already creates a new machine-id etc, and the 
 hostname should probably not be set for a new container in this case, 
 regardless of whether the template is a real template or a cloned machine.

Well, we don't touch the images really at all, we also leave
/etc/machine-id in place and everything else that could identify the
machine (such as MAC or IP configuration, ...)

People should not misunderstand --template (or --ephemeral or
machinectl clone) as something that would really change the identity of
a system, and I think it might be problematic if we started to do so,
since we could never implement this fully: there will always
be more and more to patch, and we shouldn't have that much app-specific
patching code in nspawn.

 Thoughts?
 
 I suppose something similar should be possible for statically configured IP
 addresses as an option, though I wouldn't know how to implement that because
 systemd-networkd doesn't expect that information at a well-defined location.

For the case of the hostname things are relatively easy: if you do not
set the hostname from inside the container, then the hostname will be
inherited from the container manager, and be the same as the container
name you pick with -M (or whatever is derived automatically from -D
if you do not use -M). Hence, if you simply remove /etc/hostname from
your container, then using --template/--ephemeral/machinectl clone
will work the best possible way: the image you create with that will
always use the new container name you use as host name...

Now for MAC addresses things are similarly automatic: the MAC addresses
are hashed from the container name, hence should change automatically
when you run the container under a new name. 

For IP addresses assigned via DHCP, networkd's DHCP server should probably
pick the address using a hash of the client's MAC address or so, so
that the IP address normally stays stable, but changes when the
instance is cloned under a new name. (Currently the DHCP server
assigns the client addresses randomly though.)

And the machine ID should support a mode that works similarly, i.e. a
way to boot with a stable ID that changes automatically when
cloned. Maybe generate it from the container manager as a hash of the
host's machine ID and the container name, or so.

I added TODO list items for the latter two.
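
To illustrate the idea of identifiers that stay stable for one container but change when it is cloned under a new name, here is a minimal Python sketch. It is not the algorithm nspawn uses; SHA-256, the salt strings and the function names are assumptions picked for this example.

```python
import hashlib


def derive_machine_id(host_machine_id, container_name):
    """Derive a 32-hex-digit ID from the host's machine ID and the container name."""
    data = ("machine-id:" + host_machine_id + ":" + container_name).encode()
    return hashlib.sha256(data).hexdigest()[:32]


def derive_mac(container_name):
    """Derive a MAC address from the container name alone."""
    digest = hashlib.sha256(("mac:" + container_name).encode()).digest()
    octets = bytearray(digest[:6])
    octets[0] &= 0xFE  # clear the multicast bit: unicast address
    octets[0] |= 0x02  # set the locally administered bit
    return ":".join("%02x" % o for o in octets)
```

Running the same container name twice gives the same values; a clone started under a different name gets different ones without anything inside the image being patched.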

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] Sending a SIGABRT to PID1

2015-05-03 Thread Víctor Fernández
Ok, Thanks for your reply.

But, just out of curiosity, why does the init process go down with a SIGABRT
but not with a SIGKILL (9), given that this is a signal which cannot be caught,
blocked or ignored?

PS: I will definitely not try the command above.

2015-05-03 17:22 GMT+02:00 Lennart Poettering lenn...@poettering.net:

 On Sun, 03.05.15 17:18, Víctor Fernández (vfr...@gmail.com) wrote:

  Hello
 
  I'm currently using a Manjaro distribution (derived from Arch). While running
  some tests, I've discovered that sending SIGABRT (6) to PID 1 (systemd) will
  cause the system to enter an unstable mode:
 
  after doing this, the system restarts the graphical server (at least, it asks
  me to log in again), and if you resend the SIGABRT, the system goes into
  kernel panic mode.
 
  Here is the code I've tested (executed with sudo, of course).
 
  echo "int main(){kill(1,6);kill(1,6);}" > a.c && gcc a.c && sudo ./a.out
 
  It does not appear to be a very large problem (since root permissions are
  required), but I think it is undesirable behaviour.
 
  Is this really a bug?

 Well, there are tons of ways how you can break your system if you are
 root. For example:

   dd if=/dev/urandom of=/dev/sda

 We cannot (and actually should not) try to prevent the user from
 shooting his own foot if he really desires to do so.

 Lennart

 --
 Lennart Poettering, Red Hat



Re: [systemd-devel] [PATCH 3/3] Use a stamp file to avoid running systemd-fsck-root.service twice

2015-05-03 Thread Lennart Poettering
On Sat, 02.05.15 13:16, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:

So, the last time we discussed this we figured we should do this
differently, and simply generate systemd-fsck-root.service in the
initrd as well, that uses a different command line internally. The end
result would then be that we can do without flag file, and always have
the guarantee that systemd-fsck-root.service is the service that
fsck'ed the root file system, regardless whether in initrd or not.

Harald, can you comment?

 In the initramfs, we run systemd-fsck@sysroot-device.service.
 In the real system we run systemd-fsck-root.service. It is hard
 to pass the information that the latter should not run if the first
 succeeded using unit state only.
 
 - in the real system, we need a synchronization point between the fsck
   for root and other fscks, to express the dependency to run this
   systemd-fsck@.service before all other systemd-fsck@ units. We
   cannot express it directly, because there are no wildcard
   dependencies. We could use a target as a synchronization point, but
   then we would have to provide drop-ins to order
   systemd-fsck@-.service before the target, and all others after it,
   which becomes messy. The currently used alternative of having a
   special unit (systemd-fsck-root.service) makes it easy to express
   this dependency, and seems to be the best solution.
 
 - we cannot use systemd-fsck-root.service in the initramfs, because
   other fsck units should not be ordered after it. In the real system,
   the root device is always checked and mounted before other filesystems,
   but in the initramfs this doesn't have to be true: /sysroot might be
   stacked on other filesystems and devices.
 
 - the name of the root device can legitimately be specified in a
   different way in the initramfs (on the kernel command line, or
   automatically discovered through GPT), and in the real fs (in /etc/fstab).
   Even if we didn't need systemd-fsck-root.service as a synchronization
   point, it would be hard to ensure the same instance parameter is
   provided for systemd-fsck@.service in the initramfs and the real
   system.
 
 Let's use a side channel to pass this information.
 /run/systemd/fsck-root-done is touched after fsck in the initramfs
 succeeds, through an ExecStartPost line in a drop-in for
 systemd-fsck@sysroot.service.
 
 https://bugzilla.redhat.com/show_bug.cgi?id=1201979
 ---
  src/shared/generator.c | 7 +++
  units/systemd-fsck-root.service.in | 1 +
  2 files changed, 8 insertions(+)
 
 diff --git a/src/shared/generator.c b/src/shared/generator.c
 index 7b2f846175..a71222d1cb 100644
 --- a/src/shared/generator.c
 +++ b/src/shared/generator.c
 @@ -78,6 +78,13 @@ int generator_write_fsck_deps(
  "RequiresOverridable=%1$s\n"
  "After=%1$s\n",
  fsck);
 +
 +if (in_initrd() && path_equal(where, "/sysroot"))
 +return write_drop_in_format(dir, fsck, 50, "stamp",
 +"# Automatically generated by %s\n\n"
 +"[Service]\n"
 +"ExecStartPost=-/bin/touch /run/systemd/fsck-root-done\n",
 +program_invocation_short_name);
  }
  
  return 0;
 diff --git a/units/systemd-fsck-root.service.in b/units/systemd-fsck-root.service.in
 index 3617abf04a..48dacc841c 100644
 --- a/units/systemd-fsck-root.service.in
 +++ b/units/systemd-fsck-root.service.in
 @@ -11,6 +11,7 @@ Documentation=man:systemd-fsck-root.service(8)
  DefaultDependencies=no
  Before=local-fs.target shutdown.target
  ConditionPathIsReadWrite=!/
 +ConditionPathExists=!/run/systemd/fsck-root-done
  
  [Service]
  Type=oneshot
 -- 
 2.3.5
 


Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] mount crypto_LUKS device in conatiner

2015-05-03 Thread Lennart Poettering
On Fri, 01.05.15 11:39, arnaud gaboury (arnaud.gabo...@gmail.com) wrote:

 My container will need access to a Luks encrypted device (/dev/sdd4)
 for its DB.

Only very select devices are accessible from inside containers, more
specifically the ones where it is fully safe to share them between
multiple containers and the host. /dev/random and /dev/null are of
this kind, however device mapper (DM) devices are not. 

This is a limitation of the Linux kernel really, it does not support
proper device virtualization for things like this, and probably never
will.

Or in other words: LVM and DM (and thus LUKS) are something you can
use on the host only, sorry.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] udev interface naming for SR-IOV VFs

2015-05-03 Thread Lennart Poettering
On Fri, 01.05.15 11:04, Dan Kenigsberg (dan...@redhat.com) wrote:

 On Mon, Apr 20, 2015 at 08:43:21PM +0200, Lennart Poettering wrote:
  On Fri, 17.04.15 14:19, Nir Soffer (nir...@gmail.com) wrote:
  
   - You may wait for unrelated events that happen to trigger at the same
     time, and keep waiting after the new interfaces are ready.
   
   I think you need something like:
   
   while True:
       try:
           udevadm.settle(1)
       except udevadm.Timeout:
           pass
       else:
           if all devices are ready:
               break
       time.sleep(1)
  
  Please never use udevadm settle in new code.
 
 Could you explain why? Is it because we are not sure if our events
 have not been queued when settle is called, or something more dramatic
 that should be documented in udevadm(1)?

Well, when people use udev settle they do so usually because they
assume that after they called it all devices of the kind they are
looking for have shown up, and that all is good then. But that's really
not how devices work these days, they can come and go at any time, and
at boot we have no idea at what time they will all have appeared, as
many of the subsystems (including USB or things like iSCSI for
example) can take pretty much any time they want before the devices
pop up after powering on.

Now of course, if you care only about SR-IOV and you know you
triggered your devices manually right before it, then yes, what was
triggered will have been processed at time of udev settle returning,
and you are hence safe -- but even then it's actually not really doing
what you really want it to do: it will settle until *all* devices
currently being probed have finished being probed, which might be
substantially more than what you are looking for.

In all cases the right way to implement device handling in clients is
to actually subscribe to things and wait precisely for the devices
you need, and not any longer.
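
A minimal Python sketch of that approach, with the event source left abstract. In real code the events would come from a udev monitor that was subscribed before the devices were triggered; here they are just an iterable of (action, name) pairs, and every name in the example (the function, the device names) is hypothetical.

```python
def wait_for_devices(wanted, already_present, events):
    """Consume events until every device in `wanted` is present.

    `already_present` covers devices that appeared before we subscribed;
    `events` yields (action, name) pairs such as ("add", "vf0").
    Returns the number of events consumed, or None if the stream ended first.
    """
    missing = set(wanted) - set(already_present)
    if not missing:
        return 0
    for count, (action, name) in enumerate(events, start=1):
        if action == "add":
            missing.discard(name)
        elif action == "remove" and name in wanted:
            missing.add(name)
        if not missing:
            return count  # stop right away; unrelated devices are not waited for
    return None
```

The point of the design is the early return: the caller is unblocked as soon as its own devices are there, regardless of what else is still being probed.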

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] [PATCH 3/3] Use a stamp file to avoid running systemd-fsck-root.service twice

2015-05-03 Thread Zbigniew Jędrzejewski-Szmek
On Sun, May 03, 2015 at 06:06:58PM +0300, Andrei Borzenkov wrote:
 On Sun, 3 May 2015 16:17:15 +0200
 Lennart Poettering lenn...@poettering.net wrote:
 
  On Sat, 02.05.15 13:16, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) 
  wrote:
  
  So, the last time we discussed this we figured we should do this
  differently, and simply generate systemd-fsck-root.service in the
  initrd as well, that uses a different command line internally. The end
  result would then be that we can do without flag file, and always have
  the guarantee that systemd-fsck-root.service is the service that
  fsck'ed the root file system, regardless whether in initrd or not.
  
 
 systemd-fsck@.service has explicit dependency on
 systemd-fsck-root.service so other mounts (/usr, anything else?) will
 be serialized after it. Currently they can run in parallel.
 
 Not that I think it is a big problem, but it is at least something to consider.
Yeah, that's the main wart. I tried to outline it in the second bullet
point below.

I prepared a patch to generate systemd-fsck-root.service in
generator_write_fsck_deps() first, but I wasn't happy with the result.
If we ignore the dependency issue, it might be judged more elegant,
since it just uses unit state to pass information.

  Harald, can you comment?

Zbyszek

   In the initramfs, we run systemd-fsck@sysroot-device.service.
   In the real system we run systemd-fsck-root.service. It is hard
   to pass the information that the latter should not run if the first
   succeeded using unit state only.
   
   - in the real system, we need a synchronization point between the fsck
 for root and other fscks, to express the dependency to run this
 systemd-fsck@.service before all other systemd-fsck@ units. We
 cannot express it directly, because there are no wildcard
 dependencies. We could use a target as a synchronization point, but
 then we would have to provide drop-ins to order
 systemd-fsck@-.service before the target, and all others after it,
 which becomes messy. The currently used alternative of having a
 special unit (systemd-fsck-root.service) makes it easy to express
 this dependency, and seems to be the best solution.
   
   - we cannot use systemd-fsck-root.service in the initramfs, because
 other fsck units should not be ordered after it. In the real system,
 the root device is always checked and mounted before other filesystems,
 but in the initramfs this doesn't have to be true: /sysroot might be
 stacked on other filesystems and devices.
   
   - the name of the root device can legitimately be specified in a
 different way in the initramfs (on the kernel command line, or
 automatically discovered through GPT), and in the real fs (in 
   /etc/fstab).
 Even if we didn't need systemd-fsck-root.service as a synchronization
 point, it would be hard to ensure the same instance parameter is
 provided for systemd-fsck@.service in the initramfs and the real
 system.
   
   Let's use a side channel to pass this information.
   /run/systemd/fsck-root-done is touched after fsck in the initramfs
   succeeds, through an ExecStartPost line in a drop-in for
   systemd-fsck@sysroot.service.
   
   https://bugzilla.redhat.com/show_bug.cgi?id=1201979
   ---
src/shared/generator.c | 7 +++
units/systemd-fsck-root.service.in | 1 +
2 files changed, 8 insertions(+)
   
   diff --git a/src/shared/generator.c b/src/shared/generator.c
   index 7b2f846175..a71222d1cb 100644
   --- a/src/shared/generator.c
   +++ b/src/shared/generator.c
   @@ -78,6 +78,13 @@ int generator_write_fsck_deps(
 "RequiresOverridable=%1$s\n"
 "After=%1$s\n",
 fsck);
   +
   +if (in_initrd() && path_equal(where, "/sysroot"))
   +        return write_drop_in_format(dir, fsck, 50, "stamp",
   +                "# Automatically generated by %s\n\n"
   +                "[Service]\n"
   +                "ExecStartPost=-/bin/touch /run/systemd/fsck-root-done\n",
   +                program_invocation_short_name);
}

return 0;
   diff --git a/units/systemd-fsck-root.service.in b/units/systemd-fsck-root.service.in
   index 3617abf04a..48dacc841c 100644
   --- a/units/systemd-fsck-root.service.in
   +++ b/units/systemd-fsck-root.service.in
   @@ -11,6 +11,7 @@ Documentation=man:systemd-fsck-root.service(8)
DefaultDependencies=no
Before=local-fs.target shutdown.target
ConditionPathIsReadWrite=!/
   +ConditionPathExists=!/run/systemd/fsck-root-done

[Service]
Type=oneshot
   -- 
   2.3.5
   
   ___
   systemd-devel mailing list
   systemd-devel@lists.freedesktop.org
   

Re: [systemd-devel] Sending a SIGABRT to PID1

2015-05-03 Thread Lennart Poettering
On Sun, 03.05.15 17:18, Víctor Fernández (vfr...@gmail.com) wrote:

 Hello
 
 I'm right now using a Manjaro distribution (derived from Arch). While doing
 some tests, I've discovered that sending SIGABRT (6) to PID 1 (systemd)
 causes the system to enter an unstable state:
 
 after doing this, the system restarts the graphical server (at least, it asks
 me to log in again), and if you resend the SIGABRT, the system goes into a
 kernel panic.
 
 Here is the code I've tested (executing as sudo, of course).
 
 echo "int main(){kill(1,6);kill(1,6);}" > a.c && gcc a.c && sudo ./a.out
 
 It appears not to be a very large problem (since root permissions are
 required), but I think it is undesirable behaviour.
 
 Is this really a bug?

Well, there are tons of ways how you can break your system if you are
root. For example:

  dd if=/dev/urandom of=/dev/sda

We cannot (and actually should not) try to prevent the user from
shooting his own foot if he really desires to do so.

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH 3/3] Use a stamp file to avoid running systemd-fsck-root.service twice

2015-05-03 Thread Andrei Borzenkov
On Sun, 3 May 2015 16:17:15 +0200
Lennart Poettering lenn...@poettering.net wrote:

 On Sat, 02.05.15 13:16, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:
 
 So, the last time we discussed this we figured we should do this
 differently, and simply generate systemd-fsck-root.service in the
 initrd as well, that uses a different command line internally. The end
 result would then be that we can do without flag file, and always have
 the guarantee that systemd-fsck-root.service is the service that
 fsck'ed the root file system, regardless whether in initrd or not.
 

systemd-fsck@.service has an explicit dependency on
systemd-fsck-root.service, so other mounts (/usr, anything else?) will
be serialized after it. Currently they can run in parallel.

Not that I think it is a big problem, but it is at least something to consider.

 Harald, can you comment?
 
  In the initramfs, we run systemd-fsck@sysroot-device.service.
  In the real system we run systemd-fsck-root.service. It is hard
  to pass the information that the latter should not run if the first
  succeeded using unit state only.
  
  - in the real system, we need a synchronization point between the fsck
for root and other fscks, to express the dependency to run this
systemd-fsck@.service before all other systemd-fsck@ units. We
cannot express it directly, because there are no wildcard
dependencies. We could use a target as a synchronization point, but
then we would have to provide drop-ins to order
systemd-fsck@-.service before the target, and all others after it,
which becomes messy. The currently used alternative of having a
special unit (systemd-fsck-root.service) makes it easy to express
this dependency, and seems to be the best solution.
  
  - we cannot use systemd-fsck-root.service in the initramfs, because
other fsck units should not be ordered after it. In the real system,
the root device is always checked and mounted before other filesystems,
but in the initramfs this doesn't have to be true: /sysroot might be
stacked on other filesystems and devices.
  
  - the name of the root device can legitimately be specified in a
different way in the initramfs (on the kernel command line, or
automatically discovered through GPT), and in the real fs (in /etc/fstab).
Even if we didn't need systemd-fsck-root.service as a synchronization
point, it would be hard to ensure the same instance parameter is
provided for systemd-fsck@.service in the initramfs and the real
system.
  
  Let's use a side channel to pass this information.
  /run/systemd/fsck-root-done is touched after fsck in the initramfs
  succeeds, through an ExecStartPost line in a drop-in for
  systemd-fsck@sysroot.service.
  
  https://bugzilla.redhat.com/show_bug.cgi?id=1201979
  ---
   src/shared/generator.c | 7 +++
   units/systemd-fsck-root.service.in | 1 +
   2 files changed, 8 insertions(+)
  
  diff --git a/src/shared/generator.c b/src/shared/generator.c
  index 7b2f846175..a71222d1cb 100644
  --- a/src/shared/generator.c
  +++ b/src/shared/generator.c
  @@ -78,6 +78,13 @@ int generator_write_fsck_deps(
   "RequiresOverridable=%1$s\n"
   "After=%1$s\n",
   fsck);
  +
  +if (in_initrd() && path_equal(where, "/sysroot"))
  +        return write_drop_in_format(dir, fsck, 50, "stamp",
  +                "# Automatically generated by %s\n\n"
  +                "[Service]\n"
  +                "ExecStartPost=-/bin/touch /run/systemd/fsck-root-done\n",
  +                program_invocation_short_name);
   }
   
   return 0;
  diff --git a/units/systemd-fsck-root.service.in b/units/systemd-fsck-root.service.in
  index 3617abf04a..48dacc841c 100644
  --- a/units/systemd-fsck-root.service.in
  +++ b/units/systemd-fsck-root.service.in
  @@ -11,6 +11,7 @@ Documentation=man:systemd-fsck-root.service(8)
   DefaultDependencies=no
   Before=local-fs.target shutdown.target
   ConditionPathIsReadWrite=!/
  +ConditionPathExists=!/run/systemd/fsck-root-done
   
   [Service]
   Type=oneshot
  -- 
  2.3.5
  
 
 
 Lennart
 



[systemd-devel] Sending a SIGABRT to PID1

2015-05-03 Thread Víctor Fernández
Hello

I'm right now using a Manjaro distribution (derived from Arch). While doing
some tests, I've discovered that sending SIGABRT (6) to PID 1 (systemd)
causes the system to enter an unstable state:

after doing this, the system restarts the graphical server (at least, it asks
me to log in again), and if you resend the SIGABRT, the system goes into a
kernel panic.

Here is the code I've tested (executing as sudo, of course).

echo "int main(){kill(1,6);kill(1,6);}" > a.c && gcc a.c && sudo ./a.out

It appears not to be a very large problem (since root permissions are
required), but I think it is undesirable behaviour.

Is this really a bug?

Thanks!


Re: [systemd-devel] [PATCH 3/3] Use a stamp file to avoid running systemd-fsck-root.service twice

2015-05-03 Thread Andrei Borzenkov
On Sun, 3 May 2015 15:33:56 +
Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl wrote:

 On Sun, May 03, 2015 at 06:06:58PM +0300, Andrei Borzenkov wrote:
  On Sun, 3 May 2015 16:17:15 +0200
  Lennart Poettering lenn...@poettering.net wrote:
  
   On Sat, 02.05.15 13:16, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:
   
   So, the last time we discussed this we figured we should do this
   differently, and simply generate systemd-fsck-root.service in the
   initrd as well, that uses a different command line internally. The end
   result would then be that we can do without flag file, and always have
   the guarantee that systemd-fsck-root.service is the service that
   fsck'ed the root file system, regardless whether in initrd or not.
   
  
  systemd-fsck@.service has an explicit dependency on
  systemd-fsck-root.service, so other mounts (/usr, anything else?) will
  be serialized after it. Currently they can run in parallel.
  
  Not that I think it is a big problem, but it is at least something to consider.
 Yeah, that's the main wart. I tried to outline it in the second bullet
 point below.
 

I was not sure about stacked filesystems; do you mean something like
root on a loop mount?

 I prepared a patch to generate systemd-fsck-root.service in
 generator_write_fsck_deps() first, but I wasn't happy with the result.
 If we ignore the dependency issue, it might be judged more elegant,
 since it just uses unit state to pass information.
 
   Harald, can you comment?
 
 Zbyszek
 
In the initramfs, we run systemd-fsck@sysroot-device.service.
In the real system we run systemd-fsck-root.service. It is hard
to pass the information that the latter should not run if the first
succeeded using unit state only.

- in the real system, we need a synchronization point between the fsck
  for root and other fscks, to express the dependency to run this
  systemd-fsck@.service before all other systemd-fsck@ units. We
  cannot express it directly, because there are no wildcard
  dependencies. We could use a target as a synchronization point, but
  then we would have to provide drop-ins to order
  systemd-fsck@-.service before the target, and all others after it,
  which becomes messy. The currently used alternative of having a
  special unit (systemd-fsck-root.service) makes it easy to express
  this dependency, and seems to be the best solution.

- we cannot use systemd-fsck-root.service in the initramfs, because
  other fsck units should not be ordered after it. In the real system,
  the root device is always checked and mounted before other filesystems,
  but in the initramfs this doesn't have to be true: /sysroot might be
  stacked on other filesystems and devices.

- the name of the root device can legitimately be specified in a
  different way in the initramfs (on the kernel command line, or
  automatically discovered through GPT), and in the real fs (in /etc/fstab).
  Even if we didn't need systemd-fsck-root.service as a synchronization
  point, it would be hard to ensure the same instance parameter is
  provided for systemd-fsck@.service in the initramfs and the real
  system.

Let's use a side channel to pass this information.
/run/systemd/fsck-root-done is touched after fsck in the initramfs
succeeds, through an ExecStartPost line in a drop-in for
systemd-fsck@sysroot.service.

https://bugzilla.redhat.com/show_bug.cgi?id=1201979
---
 src/shared/generator.c | 7 +++
 units/systemd-fsck-root.service.in | 1 +
 2 files changed, 8 insertions(+)

diff --git a/src/shared/generator.c b/src/shared/generator.c
index 7b2f846175..a71222d1cb 100644
--- a/src/shared/generator.c
+++ b/src/shared/generator.c
@@ -78,6 +78,13 @@ int generator_write_fsck_deps(
 "RequiresOverridable=%1$s\n"
 "After=%1$s\n",
 fsck);
+
+if (in_initrd() && path_equal(where, "/sysroot"))
+        return write_drop_in_format(dir, fsck, 50, "stamp",
+                "# Automatically generated by %s\n\n"
+                "[Service]\n"
+                "ExecStartPost=-/bin/touch /run/systemd/fsck-root-done\n",
+                program_invocation_short_name);
 }
 
 return 0;
diff --git a/units/systemd-fsck-root.service.in b/units/systemd-fsck-root.service.in
index 3617abf04a..48dacc841c 100644
--- a/units/systemd-fsck-root.service.in
+++ b/units/systemd-fsck-root.service.in
@@ -11,6 +11,7 @@ Documentation=man:systemd-fsck-root.service(8)
 DefaultDependencies=no
 Before=local-fs.target shutdown.target
 

Re: [systemd-devel] Sending a SIGABRT to PID1

2015-05-03 Thread Mantas Mikulėnas
On Sun, May 3, 2015 at 6:18 PM, Víctor Fernández vfr...@gmail.com wrote:

 Hello

 I'm right now using a Manjaro distribution (derived from Arch). While doing
 some tests, I've discovered that sending SIGABRT (6) to PID 1 (systemd)
 causes the system to enter an unstable state:

 after doing this, the system restarts the graphical server (at least, it asks
 me to log in again), and if you resend the SIGABRT, the system goes into a
 kernel panic.

 Here is the code I've tested (executing as sudo, of course).

 echo "int main(){kill(1,6);kill(1,6);}" > a.c && gcc a.c && sudo ./a.out

 It appears not to be a very large problem (since root permissions are
 required), but I think it is undesirable behaviour.

 Is this really a bug?


No, it is not a bug that a program crashes *when you ask it to crash*. The
whole point of SIGABRT is that it kills the program immediately, when the
program calls abort() after detecting some serious inconsistency.

It is also not a bug that the kernel panics when init exits. Several things
depend on the existence of PID 1.

-- 
Mantas Mikulėnas graw...@gmail.com


Re: [systemd-devel] Sending a SIGABRT to PID1

2015-05-03 Thread Mantas Mikulėnas
On Sun, May 3, 2015 at 6:54 PM, Víctor Fernández vfr...@gmail.com wrote:

 Ok, Thanks for your reply.

 But, just out of curiosity, why does the init process go down with a SIGABRT
 but not with a SIGKILL (9), given that the latter is a signal which cannot be
 caught, blocked or ignored?


pid 1 is shielded from SIGKILL by the kernel: signals for which init has not
installed a handler are never delivered to it, so you can sigkill everything
(e.g. Alt+SysRq+I) and still have a working system afterwards.

Meanwhile, things like SIGABRT or SIGSEGV or SIGILL actually mean that
something *abnormal* happened – if a program receives them, it's *supposed
to* crash. So systemd catches these signals but enters crash mode
immediately.

-- 
Mantas Mikulėnas graw...@gmail.com


Re: [systemd-devel] systemd-nspawn: cannot join existing macvlan

2015-05-03 Thread Kai Krakow
Kai Krakow hurikha...@gmail.com wrote:

Hello again!

Amended below...

 I'm not sure about this but I suspect that I cannot start a second nspawn
 container with --network-macvlan when another nspawn instance has created
 it before:
 
 # systemd-nspawn -b --network-macvlan=enp4s0
 Spawning container gentoo-mysql-base on /var/lib/machines/gentoo-mysql-base.
 Press ^] three times within 1s to kill container.
 Failed to add new macvlan interfaces: File exists
 
 To my surprise it works when adding machines to machines.target. While you
 cannot start them through means of systemd because of the same error, it
 works during boot of the whole system: All containers boot up properly -
 but stop one and you cannot restart it.
 
 So it looks like there's an unintentional race condition during boot which
 allows to create this interface but when the system is up, it no longer
 works because the race condition is no longer present.
 
 systemd-nspawn should probably just allow joining existing macvlan
 bridges. I would fix it in the code but I don't know the implications why
 this check is in there in the first place.
 
 A second fix should maybe do something about such race conditions, if it is
 one. I suspect there are cases where the interface presence check
 actually makes sense.

I installed something which is called a stable v219 snapshot. I could not 
find out which changes are included, though:

*systemd-219_p112 (26 Apr 2015)

  26 Apr 2015; Mike Gilbert flop...@gentoo.org +systemd-219_p112.ebuild:
  Add a snapshot from the v219-stable branch upstream.

The behavior described above has changed with this snapshot: Machines using 
macvlan no longer start, not even at boot-up (which worked before).

The error is still the same:

# systemd-nspawn -b --link-journal=try-guest --network-macvlan=enp4s0 --bind=/usr/portage --bind-ro=/usr/src --machine=test
Spawning container test on /var/lib/machines/test.
Press ^] three times within 1s to kill container.
Failed to add new macvlan interfaces: File exists

I still don't think that systemd-nspawn should insist on creating the host-
side macvlan bridge and fail, if it cannot. It should just accept that it is 
already there.

Actually I even created this device in the host with networkd because by 
design macvlan and parent device cannot communicate with each other without 
switch support and won't communicate directly locally either. Thus, you need 
to attach a host-side macvlan device to your physical parent device to 
communicate with the other virtual MAC addresses on the same host, and then 
setup your IP configuration on this device.

Of course one could argue that this is a security feature of nspawn to 
isolate containers and hosts from each other. So maybe, put an option to 
allow nspawn to join an existing macvlan, maybe --network-join-macvlan.

-- 
Replies to list only preferred.
