Re: [systemd-devel] [PATCH 1/3] generators: rename add_{root, usr}_mount to add_{sysroot, sysroot_usr}_mount
On Sat, 02.05.15 13:16, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:

This makes it obvious that those functions are only usable in the initramfs.

Also, add a warning when noauto, nofail, or automount is used for the root fs, instead of silently ignoring them. Using those options would be a sign of significant misconfiguration, and if we bother to check for them, then let's go all the way and complain.

Various other small cleanups and reformattings elsewhere.

Sounds all good to me!

---
 src/fstab-generator/fstab-generator.c | 21 +
 src/shared/generator.c                | 21 -
 src/shared/generator.h                | 17 +
 3 files changed, 38 insertions(+), 21 deletions(-)

diff --git a/src/fstab-generator/fstab-generator.c b/src/fstab-generator/fstab-generator.c
index 7aee3359e7..664ee2aa6f 100644
--- a/src/fstab-generator/fstab-generator.c
+++ b/src/fstab-generator/fstab-generator.c
@@ -176,6 +176,7 @@ static int write_idle_timeout(FILE *f, const char *where, const char *opts) {
 
         return 0;
 }
+
 static int add_mount(
                 const char *what,
                 const char *where,
@@ -213,10 +214,14 @@ static int add_mount(
                 return 0;
 
         if (path_equal(where, "/")) {
-                /* The root disk is not an option */
-                automount = false;
-                noauto = false;
-                nofail = false;
+                if (noauto)
+                        log_warning("Ignoring \"noauto\" for root device");
+                if (nofail)
+                        log_warning("Ignoring \"nofail\" for root device");
+                if (automount)
+                        log_warning("Ignoring automount option for root device");
+
+                noauto = nofail = automount = false;
         }
 
         name = unit_name_from_path(where, ".mount");
@@ -419,7 +424,7 @@ static int parse_fstab(bool initrd) {
         return r;
 }
 
-static int add_root_mount(void) {
+static int add_sysroot_mount(void) {
         _cleanup_free_ char *what = NULL;
         const char *opts;
 
@@ -453,7 +458,7 @@ static int add_root_mount(void) {
                          "/proc/cmdline");
 }
 
-static int add_usr_mount(void) {
+static int add_sysroot_usr_mount(void) {
         _cleanup_free_ char *what = NULL;
         const char *opts;
 
@@ -600,9 +605,9 @@ int main(int argc, char *argv[]) {
 
         /* Always honour root= and usr= in the kernel command line if we are in an initrd */
         if (in_initrd()) {
-                r = add_root_mount();
+                r = add_sysroot_mount();
                 if (r == 0)
-                        r = add_usr_mount();
+                        r = add_sysroot_usr_mount();
         }
 
         /* Honour /etc/fstab only when that's enabled */
diff --git a/src/shared/generator.c b/src/shared/generator.c
index 569b25bb7c..7b2f846175 100644
--- a/src/shared/generator.c
+++ b/src/shared/generator.c
@@ -32,13 +32,13 @@
 
 int generator_write_fsck_deps(
                 FILE *f,
-                const char *dest,
+                const char *dir,
                 const char *what,
                 const char *where,
                 const char *fstype) {
 
         assert(f);
-        assert(dest);
+        assert(dir);
         assert(what);
         assert(where);
 
@@ -58,10 +58,10 @@ int generator_write_fsck_deps(
                         return log_warning_errno(r, "Checking was requested for %s, but fsck.%s cannot be used: %m", what, fstype);
         }
 
-        if (streq(where, "/")) {
+        if (path_equal(where, "/")) {
                 char *lnk;
 
-                lnk = strjoina(dest, "/" SPECIAL_LOCAL_FS_TARGET ".wants/systemd-fsck-root.service");
+                lnk = strjoina(dir, "/" SPECIAL_LOCAL_FS_TARGET ".wants/systemd-fsck-root.service");
 
                 mkdir_parents(lnk, 0755);
                 if (symlink(SYSTEM_DATA_UNIT_PATH "/systemd-fsck-root.service", lnk) < 0)
@@ -75,17 +75,20 @@ int generator_write_fsck_deps(
                         return log_oom();
 
                 fprintf(f,
-                        "RequiresOverridable=%s\n"
-                        "After=%s\n",
-                        fsck,
+                        "RequiresOverridable=%1$s\n"
+                        "After=%1$s\n",
                         fsck);
         }
 
         return 0;
 }
 
-int generator_write_timeouts(const char *dir, const char *what, const char *where,
-                             const char *opts, char **filtered) {
+int generator_write_timeouts(
+                const char *dir,
+                const char *what,
+                const char *where,
+                const char *opts,
+                char **filtered) {
 
         /* Allow configuration how long we wait for a device that
          * backs a mount point to show
Re: [systemd-devel] [PATCH 2/3] Allow $SYSTEMD_PRETEND_INITRD to override initramfs detection
On Sat, 02.05.15 13:16, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:

When testing generators and other utilities, it is extremely useful to be able to trigger initramfs behaviour.

Hmm, what about the following solution: instead of checking with access() for /etc/initrd-release and then also checking with statfs() on / whether the root disk is writable, maybe we should just immediately invoke statfs() on /etc/initrd-release and check the results. If the call returns ENOENT we know that the file doesn't exist, and if it returns useful data we can verify whether it's on tmpfs.

Now, with that in place, something like this suffices to test initrd code:

  touch /etc/initrd-release /run/initrd-release
  mount --bind /run/initrd-release /etc/initrd-release

That would result in a file at /etc/initrd-release that is backed by tmpfs.

In general I'd be very conservative when adding new APIs (which is basically what $SYSTEMD_PRETEND_INITRD would become), especially if we only need them for debugging, if they are quite dangerous, and when we have other options too...

I hope that makes sense?

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
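The check proposed above can be sketched roughly as follows. This is only an illustration in Python, not systemd's implementation: the function names are made up for this example, the ctypes call assumes a Linux libc where f_type is a long and the first field of struct statfs (true on common architectures such as x86-64 and arm64), and treating ramfs like tmpfs is an assumption on top of what the mail says.

```python
import ctypes
import errno
import os

TMPFS_MAGIC = 0x01021994  # from <linux/magic.h>
RAMFS_MAGIC = 0x858458F6  # from <linux/magic.h>


def fs_magic(path):
    """Return the f_type of the filesystem backing path, or None if path is missing.

    Assumption: f_type is a long at offset 0 of struct statfs.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    buf = ctypes.create_string_buffer(256)  # comfortably larger than struct statfs
    if libc.statfs(os.fsencode(path), buf) != 0:
        err = ctypes.get_errno()
        if err == errno.ENOENT:
            return None
        raise OSError(err, os.strerror(err), path)
    return ctypes.c_long.from_buffer(buf).value & 0xFFFFFFFF


def looks_like_initrd(path="/etc/initrd-release", magic_of=fs_magic):
    """Hypothetical helper: one statfs() call answers both questions at once."""
    magic = magic_of(path)
    if magic is None:  # ENOENT: no initrd-release file at all
        return False
    return magic in (TMPFS_MAGIC, RAMFS_MAGIC)
```

With the bind mount from the mail in place, fs_magic("/etc/initrd-release") would be expected to report the tmpfs magic, since the file is then backed by /run.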
Re: [systemd-devel] [Q] About supporting nested systemd daemon
On Thu, 30.04.15 15:42, Alban Crequy (al...@endocode.com) wrote:

systemd-nspawn nowadays mounts all hierarchies into the container, but mounts all controller hierarchies read-only, and of the name=systemd hierarchy mounts everything read-only, except the subtree the container is allowed to manage. That way only the cgroup tree the container needs access to is writable to it. That solution however does not hide the cgroup tree. A process running inside the container can still go and explore the tree and its attributes. However, all other groups will appear empty to it, since processes not in the container PID namespaces will be suppressed when reading the member process list.

To sum up what systemd-nspawn is currently mounting in the container:

- /sys/fs/cgroup/systemd/ -- mounted RO
- /sys/fs/cgroup/systemd/machine.slice/machine-xxx.scope/ -- mounted RW
- /sys/fs/cgroup/cpu,cpuacct/ -- mounted RO
- etc. for other cgroup hierarchies -- mounted RO

Correct.

In order to let systemd in the container restrict cpu, memory, etc. on some of its services (see manpage systemd.resource-control(5)), rkt would like systemd-nspawn to mount a subtree of some hierarchy (cpu,cpuacct, memory) in read-write mode.

That's really not a safe thing to do right now... The kernel isn't ready for this, as cgroups access is an all-or-nothing thing currently: if you have access to a cgroup and can create child cgroups in it, you have access to *all* attributes you like, the dangerous ones as well as the not so dangerous ones.

Are there any issues with changing the systemd-nspawn mounts in the following way:

- /sys/fs/cgroup/systemd/ -- mounted RO
- /sys/fs/cgroup/systemd/machine.slice/machine-xxx.scope/ -- mounted RW
- /sys/fs/cgroup/cpu,cpuacct/ -- mounted RO
- /sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-xxx.scope/ -- mounted RW
- etc. for other cgroup hierarchies.

Iago wrote two experimental patches on systemd-nspawn to try that and it worked. Delegate=yes was enabled in systemd-nspawn in order to test this:

https://github.com/endocode/systemd/commits/iaguis/delegate

But I would like to know what is missing to make this safe (or if it is already safe to do).

Well, nspawn does actually not make any guarantees about security currently. Since we pass CAP_SYS_ADMIN by default to the containers, people can mount whatever they want and remount things freely from within. Hence, opening this up would not make things much worse.

That said, I am a bit concerned about opening this up by default. Even though containers are insecure we should try to be safe wherever we can if it doesn't affect usability too much.

Adding a new cmdline switch for all of this sounds not too attractive though, but maybe a --delegate switch would be OK, which would open up all controllers to the containers. It would then have a similar effect on the containers as Delegate=yes has for service processes...

Lennart

-- 
Lennart Poettering, Red Hat
Re: [systemd-devel] Sending a SIGABRT to PID1
On Sun, 03.05.15 17:54, Víctor Fernández (vfr...@gmail.com) wrote:

Ok, thanks for your reply. But, just out of curiosity, why does the init process go down with a SIGABRT and not with a SIGKILL (9), this being a signal which cannot be caught, blocked or ignored?

The kernel refuses to deliver SIGKILL to PID 1. I mean, it's a signal that cannot be caught by userspace anyway, and hence it would without exception result in a machine halt, and hence the kernel eats it up...

I am pretty sure it's pretty irrelevant whether the kernel delivers it or not, but they chose not to...

Lennart

-- 
Lennart Poettering, Red Hat
Re: [systemd-devel] Sending a SIGABRT to PID1
On Sun, 03.05.15 19:10, Mantas Mikulėnas (graw...@gmail.com) wrote:

On Sun, May 3, 2015 at 6:54 PM, Víctor Fernández vfr...@gmail.com wrote:

Ok, thanks for your reply. But, just out of curiosity, why does the init process go down with a SIGABRT and not with a SIGKILL (9), this being a signal which cannot be caught, blocked or ignored?

pid 1 is allowed to catch SIGKILL, and usually does so, so that you can sigkill everything (e.g. Alt+SysRq+I) and still have a working system afterwards.

Hmm, it is allowed to catch SIGKILL? That would be news to me, and systemd certainly doesn't. Do you have any reference?

Lennart

-- 
Lennart Poettering, Red Hat
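For readers who want to check the "cannot be caught" part for themselves, here is a small Python sketch. It only shows that an ordinary process is refused when it tries to install a SIGKILL handler, while SIGABRT is accepted; it says nothing about the kernel's special treatment of PID 1. The helper name can_install_handler is made up for this example, and it has to run in the main thread.

```python
import signal


def can_install_handler(signum):
    """Return True if this process is allowed to install a handler for signum."""
    try:
        previous = signal.signal(signum, lambda signo, frame: None)
    except OSError:
        # The kernel rejects handlers for SIGKILL (and SIGSTOP) with EINVAL.
        return False
    if previous is not None:
        signal.signal(signum, previous)  # restore whatever was set before
    return True


if __name__ == "__main__":
    for sig in (signal.SIGABRT, signal.SIGKILL):
        print(sig.name, can_install_handler(sig))
```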
Re: [systemd-devel] Journald logging handler for Python 3 and AsyncIO integration
On Sat, 02.05.15 15:12, Ludovic Gasc (gml...@gmail.com) wrote:

2. We heavily use the AsyncIO module to get an async pattern in Python, especially for I/O: https://docs.python.org/3/library/asyncio.html In the source code of python-systemd, I've seen that you use C glue to interact with journald, but I don't understand how my Python daemon process and journald communicate: unix sockets? Some other mechanism? Depending on the mechanism, it could have an impact for us.

The communication is via an AF_UNIX/SOCK_DGRAM socket in the file system.

I am very sure that logging should not be asynchronous non-blocking IO. It's about reliably getting out log messages at the right times, and that's a property you lose if you enqueue logs non-blocking. I don't think it's a good idea in any programming language to log asynchronously. I mean, there's a reason why libc syslog() is blocking too.

If you are afraid of blocking logging, then make sure to use large socket buffers. The journal client library and journald will try to use very large buffers, but this doesn't always work if the client is unprivileged. Also you might have to increase the number of datagrams that may be queued with /proc/sys/net/unix/max_dgram_qlen.

Lennart

-- 
Lennart Poettering, Red Hat
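To make the advice above concrete, here is a minimal Python sketch of a blocking datagram logger with an enlarged send buffer. It is not the python-systemd implementation. It assumes journald's native socket lives at /run/systemd/journal/socket and accepts simple KEY=value lines; the helper names and the 8 MiB request are illustrative choices, and values containing newlines (which need a different framing) are deliberately rejected.

```python
import socket

JOURNAL_SOCKET = "/run/systemd/journal/socket"  # assumed path of journald's native socket


def encode_fields(**fields):
    """Serialize single-line fields as KEY=value lines, one per field."""
    chunks = []
    for key, value in fields.items():
        value = str(value)
        if "\n" in value:
            raise ValueError("multi-line values are not handled in this sketch")
        chunks.append(f"{key.upper()}={value}\n")
    return "".join(chunks).encode("utf-8")


def open_log_socket(sndbuf=8 * 1024 * 1024):
    """Create a blocking AF_UNIX/SOCK_DGRAM socket and ask for a large send buffer.

    For unprivileged clients the kernel silently caps the request at
    net.core.wmem_max, which is why large buffers do not always take effect.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    return sock


def log(sock, message, priority=6):
    """Send one entry; sendto() blocks if the receiver's queue is full."""
    sock.sendto(encode_fields(MESSAGE=message, PRIORITY=priority), JOURNAL_SOCKET)
```

Reading the option back with sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF) shows how much of the requested buffer the kernel actually granted.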
Re: [systemd-devel] [PATCH 3/3] Use a stamp file to avoid running systemd-fsck-root.service twice
On Sun, 03.05.15 18:06, Andrei Borzenkov (arvidj...@gmail.com) wrote:

On Sat, 02.05.15 13:16, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:

So, the last time we discussed this we figured we should do this differently, and simply generate systemd-fsck-root.service in the initrd as well, that uses a different command line internally. The end result would then be that we can do without a flag file, and always have the guarantee that systemd-fsck-root.service is the service that fsck'ed the root file system, regardless of whether in the initrd or not.

systemd-fsck@.service has an explicit dependency on systemd-fsck-root.service, so other mounts (/usr, anything else?) will be serialized after it. Currently they can run in parallel. Not that I think it is a big problem, but it is at least something to consider.

One option could be to introduce a new target root-fs-ready.target that is pulled in from the host OS but not in the initrd. s-f-r.s would order itself before it, and s-f@.s after. Hence, if the target is in the initial transaction then it will effectively serialize things as we want if no initrd is in the mix, and if it is missing from the initial transaction then everything will be parallelized.

That said, I don't think it's worth it. I'd just accept that things in the initrd are as parallel or serial as they are on the host...

Lennart

-- 
Lennart Poettering, Red Hat
Re: [systemd-devel] pam_systemd.so indirectly calling pam_acct_mgmt
On Sat, 02.05.15 07:01, Stephen Gallagher (sgall...@redhat.com) wrote:

Well, I guess for now. But note that eventually we hope to move most programs invoked from .desktop into this as systemd services. This then means that the actual sessions will become pretty empty, with only stubs remaining that trigger services off this user instance of systemd.

If you do that, you will still need some way to invoke PAM with different service identities, otherwise you'll be implementing a pretty severe vulnerability in the system. If all services are authorized by the same PAM service, it amounts to removing the ability for administrators to differentiate which actions a particular user is allowed to perform.

Well, if you are logged in enough to run arbitrary scripts (like gdm, ssh or cron allow you to), then you are in, for whatever you want to do. There's no way around that, and having different PAM services could only hide that fact, but not avoid it...

The admin still has a lot of control over how you can log in though. For example, gdm will still use PAM to check if you are allowed to log in graphically, on a seat. If that's denied, then the login will be refused. Only if you managed to log in can you also use the systemd user instance.

Also note that lingering is something that needs to be turned on with privileges. If you don't have the privs to turn this on, you cannot make use of this feature, and the user instance of systemd is strictly reference counted by your PAM sessions, which means that as soon as you have logged out from all your terminals/graphical seats you have also lost the user instance.

Lennart

-- 
Lennart Poettering, Red Hat
Re: [systemd-devel] networkd: Is auto-negotiation turned off when specifying parameters in a link file?
On Sat, 02.05.15 12:00, Paul Menzel (paulepan...@users.sourceforge.net) wrote:

/etc/udev/rules.d/10-speed1G-enp1s6.rules
ACTION=="add", SUBSYSTEM=="net", RUN+="/usr/sbin/ethtool -s enp1s6 advertise 0x20"

:03 systemd[1]: Starting Network Service...
:05 systemd-networkd[1612]: enp1s6 : link configured
:05 systemd-networkd[1612]: enp1s6 : gained carrier
:06 systemd-networkd[1612]: enp1s6 : lost carrier
:09 systemd-networkd[1612]: enp1s6 : gained carrier
~~~
/etc/udev/rules.d/10-speed1G-enp1s6.rules-

:15 systemd[1]: Starting Network Service...
:17 systemd-networkd[1633]: enp1s6 : link configured
:17 systemd-networkd[1633]: enp1s6 : gained carrier

So in your case, `gained carrier` is indeed shown earlier, saving two seconds. The next message probably indicates a problem with the driver. Poma, what Linux kernel do you use?

Lennart, is poma’s test sufficient to show that integrating an `advertise` command(?) into systemd-networkd would be useful?

Hmm? Not sure I understand the test, but if I got it right then it shows that using ethtool like this slows things down by 3s?

Lennart

-- 
Lennart Poettering, Red Hat
Re: [systemd-devel] systemd-nspawn --template: should it delete /etc/hostname?
On Fri, 01.05.15 19:38, Kai Krakow (hurikha...@gmail.com) wrote:

Hello! If I create a new machine by cloning using systemd-nspawn --template, should it remove etc/hostname? It already creates a new machine-id etc, and the hostname should probably not be set for a new container in this case, regardless of whether the template is a real template or a cloned machine.

Well, we don't really touch the images at all. We also leave /etc/machine-id in place, and everything else that could identify the machine (such as MAC or IP configuration, ...).

People should not misunderstand --template (or --ephemeral or machinectl clone) as something that would really change the identity of a system, and I think it might be problematic if we started to do so, since we could never implement this fully: there will always be more and more to patch, and we shouldn't have that much app-specific patching code in nspawn.

Thoughts? I suppose something similar should be possible for statically configured IP addresses as an option, though I wouldn't know how to implement that because systemd-networkd doesn't expect that information at a well-defined location.

For the case of the hostname things are relatively easy: if you do not set the hostname from inside the container, then the hostname will be inherited from the container manager, and be the same as the container name you pick with -M (or whatever is derived automatically from -D if you do not use -M). Hence, if you simply remove /etc/hostname from your container, then using --template/--ephemeral/machinectl clone will work the best possible way: the image you create with that will always use the new container name as its host name...

Now, for MAC addresses things are similarly automatic: the MAC addresses are hashed from the container name, hence should change automatically when you run the container under a new name.

For IP addresses assigned via DHCP, networkd's DHCP server should probably pick the address using a hash of the client's MAC address or so, so that the IP address normally stays stable, but changes when the instance is cloned under a new name. (Currently the DHCP server assigns the client addresses randomly though.)

And the machine ID should support a mode that works similarly, i.e. a way in which we can boot with a stable ID that changes automatically when cloned. Maybe generate it from the container manager as a hash of the host's machine ID and the container name, or so.

I added TODO list items for the latter two.

Lennart

-- 
Lennart Poettering, Red Hat
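The idea of identifiers that stay stable for one container name but change when the container is cloned under another name can be sketched as below. This is purely illustrative: nspawn's real derivation uses its own hash and inputs, and both function names and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib


def derive_machine_id(host_machine_id, container_name):
    """Hypothetical: 128-bit ID (32 hex chars) from host machine ID plus container name."""
    material = b"\0".join([host_machine_id.encode(), container_name.encode()])
    return hashlib.sha256(material).hexdigest()[:32]


def derive_mac(container_name):
    """Hypothetical: stable MAC address hashed from the container name."""
    raw = bytearray(hashlib.sha256(container_name.encode()).digest()[:6])
    # Clear the multicast bit and set the locally-administered bit so the
    # result is a valid unicast address that cannot clash with vendor MACs.
    raw[0] = (raw[0] & 0xFE) | 0x02
    return ":".join(f"{octet:02x}" for octet in raw)
```

Running the same image as "web1" and as "web2" then yields two different machine IDs and MAC addresses, while restarting "web1" keeps both values unchanged.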
Re: [systemd-devel] Sending a SIGABRT to PID1
Ok, thanks for your reply. But, just out of curiosity, why does the init process go down with a SIGABRT and not with a SIGKILL (9), this being a signal which cannot be caught, blocked or ignored?

PS: I will definitely not try the command above.

2015-05-03 17:22 GMT+02:00 Lennart Poettering lenn...@poettering.net:

On Sun, 03.05.15 17:18, Víctor Fernández (vfr...@gmail.com) wrote:

Hello

I'm right now using a Manjaro distribution (derived from Arch). While running some tests, I've discovered that sending SIGABRT (6) to PID 1 (systemd) causes the system to enter an unstable state: after doing this, the system restarts the graphical server (at least, it asks me to log in again), and if you resend the SIGABRT, the system goes into a kernel panic. Here is the code I've tested (executed with sudo, of course).

  echo "int main(){kill(1,6);kill(1,6);}" > a.c
  gcc a.c
  sudo ./a.out

It appears not to be a very large problem (since root permissions are required), but I think it is undesirable behaviour. Is this really a bug?

Well, there are tons of ways how you can break your system if you are root. For example:

  dd if=/dev/urandom of=/dev/sda

We cannot (and actually should not) try to prevent the user from shooting his own foot if he really desires to do so.

Lennart

-- 
Lennart Poettering, Red Hat
Re: [systemd-devel] [PATCH 3/3] Use a stamp file to avoid running systemd-fsck-root.service twice
On Sat, 02.05.15 13:16, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:

So, the last time we discussed this we figured we should do this differently, and simply generate systemd-fsck-root.service in the initrd as well, that uses a different command line internally. The end result would then be that we can do without a flag file, and always have the guarantee that systemd-fsck-root.service is the service that fsck'ed the root file system, regardless of whether in the initrd or not.

Harald, can you comment?

In the initramfs, we run systemd-fsck@sysroot-device.service. In the real system we run systemd-fsck-root.service. It is hard to pass the information that the latter should not run if the first succeeded using unit state only.

- in the real system, we need a synchronization point between the fsck for root and other fscks, to express the dependency to run this systemd-fsck@.service before all other systemd-fsck@ units. We cannot express it directly, because there are no wildcard dependencies. We could use a target as a synchronization point, but then we would have to provide drop-ins to order systemd-fsck@-.service before the target, and all others after it, which becomes messy. The currently used alternative of having a special unit (systemd-fsck-root.service) makes it easy to express this dependency, and seems to be the best solution.

- we cannot use systemd-fsck-root.service in the initramfs, because other fsck units should not be ordered after it. In the real system, the root device is always checked and mounted before other filesystems, but in the initramfs this doesn't have to be true: /sysroot might be stacked on other filesystems and devices.

- the name of the root device can legitimately be specified in a different way in the initramfs (on the kernel command line, or automatically discovered through GPT), and in the real fs (in /etc/fstab). Even if we didn't need systemd-fsck-root.service as a synchronization point, it would be hard to ensure the same instance parameter is provided for systemd-fsck@.service in the initramfs and the real system.

Let's use a side channel to pass this information. /run/systemd/fsck-root-done is touched after fsck in the initramfs succeeds, through an ExecStartPost line in a drop-in for systemd-fsck@sysroot.service.

https://bugzilla.redhat.com/show_bug.cgi?id=1201979
---
 src/shared/generator.c             | 7 +++
 units/systemd-fsck-root.service.in | 1 +
 2 files changed, 8 insertions(+)

diff --git a/src/shared/generator.c b/src/shared/generator.c
index 7b2f846175..a71222d1cb 100644
--- a/src/shared/generator.c
+++ b/src/shared/generator.c
@@ -78,6 +78,13 @@ int generator_write_fsck_deps(
                         "RequiresOverridable=%1$s\n"
                         "After=%1$s\n",
                         fsck);
+
+                if (in_initrd() && path_equal(where, "/sysroot"))
+                        return write_drop_in_format(dir, fsck, 50, "stamp",
+                                                    "# Automatically generated by %s\n\n"
+                                                    "[Service]\n"
+                                                    "ExecStartPost=-/bin/touch /run/systemd/fsck-root-done\n",
+                                                    program_invocation_short_name);
         }
 
         return 0;
diff --git a/units/systemd-fsck-root.service.in b/units/systemd-fsck-root.service.in
index 3617abf04a..48dacc841c 100644
--- a/units/systemd-fsck-root.service.in
+++ b/units/systemd-fsck-root.service.in
@@ -11,6 +11,7 @@ Documentation=man:systemd-fsck-root.service(8)
 DefaultDependencies=no
 Before=local-fs.target shutdown.target
 ConditionPathIsReadWrite=!/
+ConditionPathExists=!/run/systemd/fsck-root-done
 
 [Service]
 Type=oneshot
-- 
2.3.5

Lennart

-- 
Lennart Poettering, Red Hat
Re: [systemd-devel] mount crypto_LUKS device in conatiner
On Fri, 01.05.15 11:39, arnaud gaboury (arnaud.gabo...@gmail.com) wrote:

My container will need access to a LUKS-encrypted device (/dev/sdd4) for its DB.

Only very select devices are accessible from inside containers, more specifically the ones where it is fully safe to share them between multiple containers and the host. /dev/random and /dev/null are of this kind, however device mapper (DM) devices are not. This is really a limitation of the Linux kernel: it does not support proper device virtualization for things like this, and probably never will.

Or in other words: LVM and DM (and thus LUKS) are something you can use on the host only, sorry.

Lennart

-- 
Lennart Poettering, Red Hat
Re: [systemd-devel] udev interface naming for SR-IOV VFs
On Fri, 01.05.15 11:04, Dan Kenigsberg (dan...@redhat.com) wrote:

On Mon, Apr 20, 2015 at 08:43:21PM +0200, Lennart Poettering wrote:

On Fri, 17.04.15 14:19, Nir Soffer (nir...@gmail.com) wrote:

- You may wait for unrelated events that happen to trigger at the same time, waiting after the new interfaces are ready.

I think you need something like:

    while True:
        try:
            udevadm.settle(1)
        except udevadm.Timeout:
            pass
        else:
            if all devices are ready:
                break
        time.sleep(1)

Please never use udevadm settle in new code.

Could you explain why? Is it because we are not sure if our events have not been queued when settle is called, or something more dramatic that should be documented in udevadm(1)?

Well, when people use udev settle they usually do so because they assume that after they called it all devices of the kind they are looking for have shown up, and that all is good then. But that's really not how devices work these days: they can come and go at any time, and at boot we have no idea at what time they will all have appeared, as many of the subsystems (including USB or things like iSCSI for example) can take pretty much any time they want before the devices pop up after powering on.

Now of course, if you care only about SR-IOV and you know you triggered your devices manually right before it, then yes, what was triggered will have been processed by the time udev settle returns, and you are hence safe -- but even then it's actually not really doing what you want it to do: it will settle until *all* devices currently being probed have finished being probed, which might be substantially more than what you are looking for.

In all cases the right way to implement device handling in clients is to actually subscribe to things and wait for precisely the devices you need, and not any longer.

Lennart

-- 
Lennart Poettering, Red Hat
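As a rough illustration of "subscribe and wait for precisely the devices you need", the following stdlib-only Python sketch listens on the kernel uevent netlink socket and returns as soon as a named network interface is added. Several things here are assumptions for the sake of a self-contained example: the function names are made up, and it listens to the raw kernel multicast group, whose events arrive before udev has finished processing its rules. A real client would more likely use a libudev monitor (for example through a binding such as pyudev) to see fully processed events.

```python
import socket

NETLINK_KOBJECT_UEVENT = 15  # from <linux/netlink.h>; the socket module has no constant for it
KERNEL_GROUP = 1             # multicast group that carries raw kernel uevents


def parse_kernel_uevent(data):
    """Parse a kernel uevent ("action@devpath\\0KEY=value\\0...") into a dict."""
    event = {}
    for part in data.split(b"\0")[1:]:  # element 0 is the "action@devpath" header
        key, sep, value = part.partition(b"=")
        if sep:
            event[key.decode()] = value.decode(errors="replace")
    return event


def wait_for_interface(name, timeout=30.0):
    """Block until the kernel announces network interface `name`.

    Subscribe first, then trigger the device creation, otherwise the event
    can be missed. The timeout applies to each recv(), and a quiet period
    longer than that raises socket.timeout.
    """
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM, NETLINK_KOBJECT_UEVENT)
    try:
        sock.bind((0, KERNEL_GROUP))  # pid 0: let the kernel pick our port ID
        sock.settimeout(timeout)
        while True:
            event = parse_kernel_uevent(sock.recv(65536))
            if (event.get("ACTION") == "add"
                    and event.get("SUBSYSTEM") == "net"
                    and event.get("INTERFACE") == name):
                return event
    finally:
        sock.close()
```

Unlike a settle loop, this returns when the one interface of interest appears and ignores everything else that happens to be probing at the same time.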
Re: [systemd-devel] [PATCH 3/3] Use a stamp file to avoid running systemd-fsck-root.service twice
On Sun, May 03, 2015 at 06:06:58PM +0300, Andrei Borzenkov wrote:

On Sun, 3 May 2015 16:17:15 +0200, Lennart Poettering lenn...@poettering.net wrote:

On Sat, 02.05.15 13:16, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:

So, the last time we discussed this we figured we should do this differently, and simply generate systemd-fsck-root.service in the initrd as well, that uses a different command line internally. The end result would then be that we can do without a flag file, and always have the guarantee that systemd-fsck-root.service is the service that fsck'ed the root file system, regardless of whether in the initrd or not.

systemd-fsck@.service has an explicit dependency on systemd-fsck-root.service, so other mounts (/usr, anything else?) will be serialized after it. Currently they can run in parallel. Not that I think it is a big problem, but it is at least something to consider.

Yeah, that's the main wart. I tried to outline it in the second bullet point below. I prepared a patch to generate systemd-fsck-root.service in generator_write_fsck_deps() first, but I wasn't happy with the result. If we ignore the dependency issue, it might be judged more elegant, since it just uses unit state to pass information.

Harald, can you comment?

Zbyszek

In the initramfs, we run systemd-fsck@sysroot-device.service. In the real system we run systemd-fsck-root.service. It is hard to pass the information that the latter should not run if the first succeeded using unit state only.

- in the real system, we need a synchronization point between the fsck for root and other fscks, to express the dependency to run this systemd-fsck@.service before all other systemd-fsck@ units. We cannot express it directly, because there are no wildcard dependencies. We could use a target as a synchronization point, but then we would have to provide drop-ins to order systemd-fsck@-.service before the target, and all others after it, which becomes messy. The currently used alternative of having a special unit (systemd-fsck-root.service) makes it easy to express this dependency, and seems to be the best solution.

- we cannot use systemd-fsck-root.service in the initramfs, because other fsck units should not be ordered after it. In the real system, the root device is always checked and mounted before other filesystems, but in the initramfs this doesn't have to be true: /sysroot might be stacked on other filesystems and devices.

- the name of the root device can legitimately be specified in a different way in the initramfs (on the kernel command line, or automatically discovered through GPT), and in the real fs (in /etc/fstab). Even if we didn't need systemd-fsck-root.service as a synchronization point, it would be hard to ensure the same instance parameter is provided for systemd-fsck@.service in the initramfs and the real system.

Let's use a side channel to pass this information. /run/systemd/fsck-root-done is touched after fsck in the initramfs succeeds, through an ExecStartPost line in a drop-in for systemd-fsck@sysroot.service.

https://bugzilla.redhat.com/show_bug.cgi?id=1201979
---
 src/shared/generator.c             | 7 +++
 units/systemd-fsck-root.service.in | 1 +
 2 files changed, 8 insertions(+)

diff --git a/src/shared/generator.c b/src/shared/generator.c
index 7b2f846175..a71222d1cb 100644
--- a/src/shared/generator.c
+++ b/src/shared/generator.c
@@ -78,6 +78,13 @@ int generator_write_fsck_deps(
                         "RequiresOverridable=%1$s\n"
                         "After=%1$s\n",
                         fsck);
+
+                if (in_initrd() && path_equal(where, "/sysroot"))
+                        return write_drop_in_format(dir, fsck, 50, "stamp",
+                                                    "# Automatically generated by %s\n\n"
+                                                    "[Service]\n"
+                                                    "ExecStartPost=-/bin/touch /run/systemd/fsck-root-done\n",
+                                                    program_invocation_short_name);
         }
 
         return 0;
diff --git a/units/systemd-fsck-root.service.in b/units/systemd-fsck-root.service.in
index 3617abf04a..48dacc841c 100644
--- a/units/systemd-fsck-root.service.in
+++ b/units/systemd-fsck-root.service.in
@@ -11,6 +11,7 @@ Documentation=man:systemd-fsck-root.service(8)
 DefaultDependencies=no
 Before=local-fs.target shutdown.target
 ConditionPathIsReadWrite=!/
+ConditionPathExists=!/run/systemd/fsck-root-done
 
 [Service]
 Type=oneshot
-- 
2.3.5
Re: [systemd-devel] Sending a SIGABRT to PID1
On Sun, 03.05.15 17:18, Víctor Fernández (vfr...@gmail.com) wrote:

Hello

I'm right now using a Manjaro distribution (derived from Arch). While running some tests, I've discovered that sending SIGABRT (6) to PID 1 (systemd) causes the system to enter an unstable state: after doing this, the system restarts the graphical server (at least, it asks me to log in again), and if you resend the SIGABRT, the system goes into a kernel panic. Here is the code I've tested (executed with sudo, of course).

  echo "int main(){kill(1,6);kill(1,6);}" > a.c
  gcc a.c
  sudo ./a.out

It appears not to be a very large problem (since root permissions are required), but I think it is undesirable behaviour. Is this really a bug?

Well, there are tons of ways how you can break your system if you are root. For example:

  dd if=/dev/urandom of=/dev/sda

We cannot (and actually should not) try to prevent the user from shooting his own foot if he really desires to do so.

Lennart

-- 
Lennart Poettering, Red Hat
Re: [systemd-devel] [PATCH 3/3] Use a stamp file to avoid running systemd-fsck-root.service twice
On Sun, 3 May 2015 16:17:15 +0200 Lennart Poettering lenn...@poettering.net wrote:

On Sat, 02.05.15 13:16, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:

So, the last time we discussed this we figured we should do this differently, and simply generate systemd-fsck-root.service in the initrd as well, using a different command line internally. The end result would then be that we can do without the flag file, and always have the guarantee that systemd-fsck-root.service is the service that fsck'ed the root file system, regardless of whether it ran in the initrd or not.

systemd-fsck@.service has an explicit dependency on systemd-fsck-root.service, so other mounts (/usr, anything else?) will be serialized after it. Currently they can run in parallel. Not that I think it is a big problem, but it is at least something to consider.

Harald, can you comment?

In the initramfs, we run systemd-fsck@sysroot-device.service. In the real system we run systemd-fsck-root.service. It is hard to pass the information that the latter should not run if the first succeeded using unit state only.

- in the real system, we need a synchronization point between the fsck for root and other fscks, to express the dependency to run this systemd-fsck@.service before all other systemd-fsck@ units. We cannot express it directly, because there are no wildcard dependencies. We could use a target as a synchronization point, but then we would have to provide drop-ins to order systemd-fsck@-.service before the target, and all others after it, which becomes messy. The currently used alternative of having a special unit (systemd-fsck-root.service) makes it easy to express this dependency, and seems to be the best solution.

- we cannot use systemd-fsck-root.service in the initramfs, because other fsck units should not be ordered after it. In the real system, the root device is always checked and mounted before other filesystems, but in the initramfs this doesn't have to be true: /sysroot might be stacked on other filesystems and devices.

- the name of the root device can legitimately be specified in a different way in the initramfs (on the kernel command line, or automatically discovered through GPT), and in the real fs (in /etc/fstab). Even if we didn't need systemd-fsck-root.service as a synchronization point, it would be hard to ensure the same instance parameter is provided for systemd-fsck@.service in the initramfs and the real system.

Let's use a side channel to pass this information. /run/systemd/fsck-root-done is touched after fsck in the initramfs succeeds, through an ExecStartPost line in a drop-in for systemd-fsck@sysroot.service.

https://bugzilla.redhat.com/show_bug.cgi?id=1201979
---
 src/shared/generator.c             | 7 +++
 units/systemd-fsck-root.service.in | 1 +
 2 files changed, 8 insertions(+)

diff --git a/src/shared/generator.c b/src/shared/generator.c
index 7b2f846175..a71222d1cb 100644
--- a/src/shared/generator.c
+++ b/src/shared/generator.c
@@ -78,6 +78,13 @@ int generator_write_fsck_deps(
                         "RequiresOverridable=%1$s\n"
                         "After=%1$s\n",
                         fsck);
+
+                if (in_initrd() && path_equal(where, "/sysroot"))
+                        return write_drop_in_format(dir, fsck, 50, "stamp",
+                                "# Automatically generated by %s\n\n"
+                                "[Service]\n"
+                                "ExecStartPost=-/bin/touch /run/systemd/fsck-root-done\n",
+                                program_invocation_short_name);
         }
 
         return 0;
diff --git a/units/systemd-fsck-root.service.in b/units/systemd-fsck-root.service.in
index 3617abf04a..48dacc841c 100644
--- a/units/systemd-fsck-root.service.in
+++ b/units/systemd-fsck-root.service.in
@@ -11,6 +11,7 @@ Documentation=man:systemd-fsck-root.service(8)
 DefaultDependencies=no
 Before=local-fs.target shutdown.target
 ConditionPathIsReadWrite=!/
+ConditionPathExists=!/run/systemd/fsck-root-done
 
 [Service]
 Type=oneshot
-- 
2.3.5

Lennart
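The side channel proposed in the commit message can be illustrated with a minimal C sketch. This is not code from the patch: in the patch the stamp is created by /bin/touch through ExecStartPost and evaluated by systemd through ConditionPathExists=!. The helper names touch_stamp() and root_fsck_needed() are hypothetical and only mimic those semantics for an arbitrary path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: create the stamp file if it is missing, roughly
 * what "/bin/touch <path>" does. Returns 0 on success, -errno on error. */
int touch_stamp(const char *path) {
        assert(path);

        int fd = open(path, O_WRONLY | O_CREAT | O_CLOEXEC | O_NOCTTY, 0644);
        if (fd < 0)
                return -errno;

        close(fd);
        return 0;
}

/* Hypothetical helper mirroring "ConditionPathExists=!<path>": the
 * later check is only needed while the stamp file is absent. */
bool root_fsck_needed(const char *path) {
        assert(path);

        return access(path, F_OK) != 0;
}
```

In this sketch the first stage would call touch_stamp() once its check succeeds, and the second stage would skip its own check whenever root_fsck_needed() returns false. Because /run is a tmpfs, such a stamp would not survive a reboot, which is what makes it usable as a per-boot flag.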
[systemd-devel] Sending a SIGABRT to PID1
Hello

I'm currently using a Manjaro distribution (derived from Arch). While running some tests, I've discovered that sending SIGABRT (6) to PID 1 (systemd) causes the system to enter an unstable mode: after doing this, the system restarts the graphical server (at least, it asks me to log in again), and if you resend the SIGABRT, the system goes into a kernel panic.

Here is the code I've tested (executing as sudo, of course).

echo "int main(){kill(1,6);kill(1,6);}" > a.c
gcc a.c
sudo ./a.out

It does not appear to be a very large problem (since root permissions are required), but I think it is undesirable behaviour. Is this really a bug?

Thanks!
Re: [systemd-devel] [PATCH 3/3] Use a stamp file to avoid running systemd-fsck-root.service twice
On Sun, 3 May 2015 15:33:56 + Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl wrote:

On Sun, May 03, 2015 at 06:06:58PM +0300, Andrei Borzenkov wrote:

On Sun, 3 May 2015 16:17:15 +0200 Lennart Poettering lenn...@poettering.net wrote:

On Sat, 02.05.15 13:16, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:

So, the last time we discussed this we figured we should do this differently, and simply generate systemd-fsck-root.service in the initrd as well, using a different command line internally. The end result would then be that we can do without the flag file, and always have the guarantee that systemd-fsck-root.service is the service that fsck'ed the root file system, regardless of whether it ran in the initrd or not.

systemd-fsck@.service has an explicit dependency on systemd-fsck-root.service, so other mounts (/usr, anything else?) will be serialized after it. Currently they can run in parallel. Not that I think it is a big problem, but it is at least something to consider.

Yeah, that's the main wart. I tried to outline it in the second bullet point below.

I was not sure about stacked filesystems; do you mean something like root on a loop mount?

I prepared a patch to generate systemd-fsck-root.service in generator_write_fsck_deps() first, but I wasn't happy with the result. If we ignore the dependency issue, it might be judged more elegant, since it just uses unit state to pass information.

Harald, can you comment?

Zbyszek

In the initramfs, we run systemd-fsck@sysroot-device.service. In the real system we run systemd-fsck-root.service. It is hard to pass the information that the latter should not run if the first succeeded using unit state only.

- in the real system, we need a synchronization point between the fsck for root and other fscks, to express the dependency to run this systemd-fsck@.service before all other systemd-fsck@ units. We cannot express it directly, because there are no wildcard dependencies. We could use a target as a synchronization point, but then we would have to provide drop-ins to order systemd-fsck@-.service before the target, and all others after it, which becomes messy. The currently used alternative of having a special unit (systemd-fsck-root.service) makes it easy to express this dependency, and seems to be the best solution.

- we cannot use systemd-fsck-root.service in the initramfs, because other fsck units should not be ordered after it. In the real system, the root device is always checked and mounted before other filesystems, but in the initramfs this doesn't have to be true: /sysroot might be stacked on other filesystems and devices.

- the name of the root device can legitimately be specified in a different way in the initramfs (on the kernel command line, or automatically discovered through GPT), and in the real fs (in /etc/fstab). Even if we didn't need systemd-fsck-root.service as a synchronization point, it would be hard to ensure the same instance parameter is provided for systemd-fsck@.service in the initramfs and the real system.

Let's use a side channel to pass this information. /run/systemd/fsck-root-done is touched after fsck in the initramfs succeeds, through an ExecStartPost line in a drop-in for systemd-fsck@sysroot.service.

https://bugzilla.redhat.com/show_bug.cgi?id=1201979
---
 src/shared/generator.c             | 7 +++
 units/systemd-fsck-root.service.in | 1 +
 2 files changed, 8 insertions(+)

diff --git a/src/shared/generator.c b/src/shared/generator.c
index 7b2f846175..a71222d1cb 100644
--- a/src/shared/generator.c
+++ b/src/shared/generator.c
@@ -78,6 +78,13 @@ int generator_write_fsck_deps(
                         "RequiresOverridable=%1$s\n"
                         "After=%1$s\n",
                         fsck);
+
+                if (in_initrd() && path_equal(where, "/sysroot"))
+                        return write_drop_in_format(dir, fsck, 50, "stamp",
+                                "# Automatically generated by %s\n\n"
+                                "[Service]\n"
+                                "ExecStartPost=-/bin/touch /run/systemd/fsck-root-done\n",
+                                program_invocation_short_name);
         }
 
         return 0;
diff --git a/units/systemd-fsck-root.service.in b/units/systemd-fsck-root.service.in
index 3617abf04a..48dacc841c 100644
--- a/units/systemd-fsck-root.service.in
+++ b/units/systemd-fsck-root.service.in
@@ -11,6 +11,7 @@ Documentation=man:systemd-fsck-root.service(8)
 DefaultDependencies=no
 Before=local-fs.target shutdown.target
Re: [systemd-devel] Sending a SIGABRT to PID1
On Sun, May 3, 2015 at 6:18 PM, Víctor Fernández vfr...@gmail.com wrote:

Hello

I'm currently using a Manjaro distribution (derived from Arch). While running some tests, I've discovered that sending SIGABRT (6) to PID 1 (systemd) causes the system to enter an unstable mode: after doing this, the system restarts the graphical server (at least, it asks me to log in again), and if you resend the SIGABRT, the system goes into a kernel panic.

Here is the code I've tested (executing as sudo, of course).

echo "int main(){kill(1,6);kill(1,6);}" > a.c
gcc a.c
sudo ./a.out

It does not appear to be a very large problem (since root permissions are required), but I think it is undesirable behaviour. Is this really a bug?

No, it is not a bug that a program crashes *when you ask it to crash*. The whole point of SIGABRT is that it kills the program immediately, when the program calls abort() after detecting some serious inconsistency.

It is also not a bug that the kernel panics when init exits. Several things depend on the existence of PID 1.

-- 
Mantas Mikulėnas graw...@gmail.com
Re: [systemd-devel] Sending a SIGABRT to PID1
On Sun, May 3, 2015 at 6:54 PM, Víctor Fernández vfr...@gmail.com wrote:

Ok, thanks for your reply. But, just out of curiosity, why does the init process go down with a SIGABRT and not with a SIGKILL (9), given that the latter is a signal which cannot be caught, blocked or ignored?

PID 1 is special-cased by the kernel: the only signals delivered to it are those for which it has installed a handler. SIGKILL cannot be handled, so it never takes init down, and you can SIGKILL everything (e.g. Alt+SysRq+I) and still have a working system afterwards.

Meanwhile, things like SIGABRT or SIGSEGV or SIGILL actually mean that something *abnormal* happened: if a program receives them, it's *supposed to* crash. So systemd catches these signals but enters crash mode immediately.

-- 
Mantas Mikulėnas graw...@gmail.com
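The difference between the two kinds of signal can be sketched in C for an ordinary process. This is only an illustration under stated assumptions, not systemd's crash handler: the names install_crash_handler(), pending_crash_signal() and sigkill_is_catchable() are made up, and the handler merely records the signal where a real init would switch to its crash mode.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Last "abnormal" signal seen by the handler; 0 means none so far. */
static volatile sig_atomic_t crash_signal = 0;

/* Hypothetical handler: only records the signal number. */
static void crash_handler(int sig) {
        crash_signal = sig;
}

/* Install the handler for the signals named in the thread.
 * Returns 0 on success, -1 if any sigaction() call fails. */
int install_crash_handler(void) {
        static const int fatal[] = { SIGABRT, SIGSEGV, SIGILL };
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = crash_handler;
        int r = sigemptyset(&sa.sa_mask);
        assert(r == 0);
        (void) r;

        for (size_t i = 0; i < sizeof(fatal) / sizeof(fatal[0]); i++)
                if (sigaction(fatal[i], &sa, NULL) < 0)
                        return -1;

        return 0;
}

/* Returns the recorded signal number, or 0 if none was caught. */
int pending_crash_signal(void) {
        return crash_signal;
}

/* Try to install a handler for SIGKILL; the kernel rejects this,
 * so the function is expected to return 0 (false). */
int sigkill_is_catchable(void) {
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = crash_handler;
        sigemptyset(&sa.sa_mask);

        return sigaction(SIGKILL, &sa, NULL) == 0;
}
```

After install_crash_handler(), a raise(SIGABRT) runs the handler and execution continues, whereas the attempt to register a handler for SIGKILL is refused by sigaction().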
Re: [systemd-devel] systemd-nspawn: cannot join existing macvlan
Kai Krakow hurikha...@gmail.com wrote:

Hello again! Amended below...

I'm not sure about this, but I suspect that I cannot start a second nspawn container with --network-macvlan when another nspawn instance has created it before:

# systemd-nspawn -b --network-macvlan=enp4s0
Spawning container gentoo-mysql-base on /var/lib/machines/gentoo-mysql-base.
Press ^] three times within 1s to kill container.
Failed to add new macvlan interfaces: File exists

To my surprise it works when adding machines to machines.target. While you cannot start them manually through systemd because of the same error, it works during boot of the whole system: all containers boot up properly, but stop one and you cannot restart it.

So it looks like there's an unintentional race condition during boot which allows this interface to be created, but once the system is up it no longer works because the race condition is no longer present.

systemd-nspawn should probably just allow joining existing macvlan bridges. I would fix it in the code, but I don't know why this check is there in the first place or what the implications of removing it are. A second fix should maybe do something about such race conditions, if it is one. I suspect there are cases where the interface presence check actually makes sense.

I installed something which is called a stable v219 snapshot; I could not find out which changes are included, though:

*systemd-219_p112 (26 Apr 2015)

  26 Apr 2015; Mike Gilbert flop...@gentoo.org +systemd-219_p112.ebuild:
  Add a snapshot from the v219-stable branch upstream.

The behavior described above has changed with this snapshot: machines using macvlan no longer start, not even at boot-up (which worked before). The error is still the same:

# systemd-nspawn -b --link-journal=try-guest --network-macvlan=enp4s0 --bind=/usr/portage --bind-ro=/usr/src --machine=test
Spawning container test on /var/lib/machines/test.
Press ^] three times within 1s to kill container.
Failed to add new macvlan interfaces: File exists

I still don't think that systemd-nspawn should insist on creating the host-side macvlan bridge and fail if it cannot. It should just accept that it is already there.

Actually, I even created this device on the host with networkd, because by design a macvlan and its parent device cannot communicate with each other without switch support, and won't communicate directly locally either. Thus, you need to attach a host-side macvlan device to your physical parent device to communicate with the other virtual MAC addresses on the same host, and then set up your IP configuration on this device.

Of course one could argue that this is a security feature of nspawn to isolate containers and hosts from each other. So maybe add an option to allow nspawn to join an existing macvlan, perhaps --network-join-macvlan.

-- 
Replies to list only preferred.