Re: [systemd-devel] Troubleshooting Failed Nspawn Starts
On Sun, Aug 16, 2015 at 9:29 AM, Lennart Poettering lenn...@poettering.net wrote:
> Could you file a github RFE issue, asking for support for watchdog
> keep-alive message send stuff in PID 1 and nspawn, and watchdog
> keep-alive message receive stuff in nspawn? I think it would make a
> lot of sense to add this!

https://github.com/systemd/systemd/issues/997

(I didn't see a way to tag it as RFE, so I suppose somebody will be helpful and do this after the fact.)

-- Rich
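For context, the mechanism the RFE asks to extend to nspawn is the per-service watchdog that already exists for ordinary services. A minimal sketch (unit interval and binary name are just examples):

    [Service]
    Type=notify
    WatchdogSec=30s
    ExecStart=/usr/bin/mydaemon
    # mydaemon must ping PID 1 with sd_notify(0, "WATCHDOG=1") at least
    # every 30s; if the pings stop, systemd marks the service failed
    # (and can restart it, per the usual Restart= settings).

The RFE is essentially for nspawn to forward such keep-alives from the container's PID 1 out to the host's.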
[systemd-devel] Troubleshooting Failed Nspawn Starts
Occasionally I'll have nspawn containers that freeze up while they're loading. What is the best way to troubleshoot these and get useful info to the devs? This is on systemd-218, on Gentoo.

Also, is there any way to detect these freezes, perhaps getting the service that launches the container to at least fail? Short of installing nagios/etc., something like this is hard to spot right now.

Example of a frozen container:

systemctl status mariadb-contain
● mariadb-contain.service - mariadb container
   Loaded: loaded (/etc/systemd/system/mariadb-contain.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2015-08-10 07:21:48 EDT; 37min ago
     Docs: man:systemd-nspawn(1)
 Main PID: 1033 (systemd-nspawn)
   Status: Container running.
   CGroup: /system.slice/mariadb-contain.service
           ├─1033 /usr/bin/systemd-nspawn --quiet --keep-unit --boot --link-journal=guest --directory=/sstorage3/cont...
           ├─1044 /usr/lib/systemd/systemd
           └─system.slice
             ├─systemd-journald.service
             │ └─1407 /usr/lib/systemd/systemd-journald
             └─systemd-journal-flush.service
               └─1340 /usr/bin/journalctl --flush

Aug 10 07:57:45 rich64 systemd-nspawn[1033]: [1.5K blob data]
Aug 10 07:57:55 rich64 systemd-nspawn[1033]: [1.5K blob data]
Aug 10 07:58:05 rich64 systemd-nspawn[1033]: [1.5K blob data]
Aug 10 07:58:14 rich64 systemd-nspawn[1033]: [1.5K blob data]
Aug 10 07:58:24 rich64 systemd-nspawn[1033]: [1.5K blob data]
Aug 10 07:58:34 rich64 systemd-nspawn[1033]: [1.6K blob data]
Aug 10 07:58:44 rich64 systemd-nspawn[1033]: [1.6K blob data]
Aug 10 07:58:54 rich64 systemd-nspawn[1033]: [1.5K blob data]
Aug 10 07:59:04 rich64 systemd-nspawn[1033]: [1.5K blob data]
Aug 10 07:59:13 rich64 systemd-nspawn[1033]: [1.6K blob data]

Journal for this boot:

Aug 10 07:20:04 mariadb systemd-journal[16]: Journal stopped
-- Reboot --
Aug 10 07:21:54 mariadb systemd-journal[13]: Runtime journal is using 8.0M (max allowed 793.0M, trying to leave 1.1G fre
Aug 10 07:21:57 mariadb systemd-journal[13]: Permanent journal is using 544.0M (max allowed 4.0G, trying to leave 4.0G f
Aug 10 07:22:02 mariadb systemd-journal[13]: Time spent on flushing to /var is 5.461887s for 2 entries.
Aug 10 07:22:04 mariadb systemd-journal[13]: Journal started
Aug 10 07:22:07 mariadb systemd[1]: Starting Flush Journal to Persistent Storage...
Aug 10 07:23:07 mariadb systemd-journal[19]: Runtime journal is using 8.0M (max allowed 793.0M, trying to leave 1.1G fre
Aug 10 07:23:10 mariadb systemd-journal[19]: Permanent journal is using 544.0M (max allowed 4.0G, trying to leave 4.0G f
Aug 10 07:23:21 mariadb systemd-journal[19]: Time spent on flushing to /var is 10.996266s for 2 entries.
Aug 10 07:23:21 mariadb systemd-journal[19]: Journal started
Aug 10 07:23:07 mariadb systemd[1]: systemd-journald.service watchdog timeout (limit 1min)!
Aug 10 07:24:27 mariadb systemd[1]: Time has been changed

-- Rich
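When a container wedges like this, a few host-side commands are generally useful for poking at it (the machine name is whatever machinectl list reports; I'll write mariadb-contain for illustration):

    machinectl list
    machinectl status mariadb-contain        # leader PID, cgroup tree, recent log lines
    journalctl -M mariadb-contain -b         # the container's own journal, readable from the host thanks to --link-journal=guest
    systemd-cgls -u mariadb-contain.service  # what's actually running under the unit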
Re: [systemd-devel] fstab generator and nfs-client.target
On Mon, Jul 27, 2015 at 10:51 AM, Lennart Poettering lenn...@poettering.net wrote:
> If you are looking for a way to start this service only when an NFS
> mount is attempted, then I must disappoint you: there's currently no
> way to do this nicely, as you can neither express "applies only to
> NFS", nor is there a way to actively pull in things from remote
> mounts.

Presumably if the .mount unit had Wants=nfs-common.target or something similar it would pull it in when it mounts, right? That would probably require a change to the generator to detect nfs shares and add this. Whether you want to do it that way is a different matter.

> (There's a reason for both limitations: we try to avoid pull-deps on
> mounts, since we want to keep the effect of manually ordered
> /bin/mount invocations, and systemd-ordered .mount activations as
> close as possible.)

Would it actually affect ordering? If the mount wants nfs-common and is after remote-fs-pre.target, and nfs-common is before remote-fs-pre.target, then won't nfs-common be started before any remote filesystems are mounted? The only change would be that the mount wants nfs-common, so that it is started when it otherwise might not have been. Or am I missing something?

-- Rich
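The ordering argument above can be tried by hand today with a drop-in on any one nfs mount unit (the mount unit name here is hypothetical, and whether the right target is nfs-common.target or nfs-client.target depends on the distro's nfs-utils packaging):

    # /etc/systemd/system/mnt-data.mount.d/nfs-deps.conf
    [Unit]
    Wants=nfs-client.target
    After=remote-fs-pre.target

With that in place the target gets pulled in whenever the mount starts, while ordering still flows through remote-fs-pre.target as before.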
[systemd-devel] fstab generator and nfs-client.target
I noticed that mount units for nfs shares created by the fstab generator do not have Wants=nfs-client.target or similar. That means that if nothing else in your configuration pulls in nfs-client, nfs shares will still get mounted, but services like rpc-statd-notify.service won't run. Would it make sense to have the generator add nfs-client.target as a Wants= of the nfs mount units it creates?

-- Rich
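To make the proposal concrete: for an fstab line like "server:/export /mnt/data nfs defaults 0 0", the generated unit would gain one line. A sketch of the idea, not current generator output:

    # /run/systemd/generator/mnt-data.mount
    [Unit]
    Wants=nfs-client.target   # <- the proposed addition
    After=remote-fs-pre.target

    [Mount]
    What=server:/export
    Where=/mnt/data
    Type=nfs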
Re: [systemd-devel] Waiting for nspawn services
On Mon, Oct 27, 2014 at 10:49 AM, Lennart Poettering lenn...@poettering.net wrote:
> In general I think making use of socket notification here would be
> the much better option, as it removes the entire need for ordering
> things here. nspawn already supports socket activation just fine. If
> your mysql container would use this, then you could start the entire
> mysql container at the same time as the mysql client without any
> further complexity or synchronization, and it would just work.

Is socket activation supported for nspawn containers that use network namespaces? Incoming connections would not be pointed at the host IP, but at the container's IP, which the host wouldn't otherwise be listening on, since the interface for it does not yet exist. Or do I need to move everything to different port numbers and use the host IP?

-- Rich
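For readers following along, the socket-activated container setup being suggested looks roughly like this on the host side (unit names and paths are hypothetical; this is a sketch of the documented nspawn socket-activation support, not a tested configuration):

    # mariadb-contain.socket
    [Socket]
    ListenStream=3306

    [Install]
    WantedBy=sockets.target

    # mariadb-contain.service
    [Service]
    ExecStart=/usr/bin/systemd-nspawn --boot --keep-unit -D /srv/mariadb
    # nspawn passes the listening fd through to the container's init,
    # which can hand it to a matching mysqld socket unit inside.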
Re: [systemd-devel] Waiting for nspawn services
On Mon, Oct 27, 2014 at 11:32 AM, Lennart Poettering lenn...@poettering.net wrote:
> Network namespaces are relevant for the process that originally binds
> the sockets. In the case of socket-activated containers that would be
> the host. If you then pass the fds into the containers and those are
> locked into their own namespaces, then any sockets they create and
> bind would be from their own namespace, but the one they got passed
> in would still be from the original host namespace. If they then
> accept a connection on that passed-in socket, that connection socket
> would also be part of the same host namespace -- not of the
> container's.

In case it wasn't clear - I'm talking about network namespaces with ethernet bridging - not just an isolated network namespace without any network access at all. That said, I could certainly see why the latter would be useful.

So, if the host is 10.0.0.1, then mysql would normally listen on 10.0.0.2:3306. One of my goals here was to keep everything running on its native port and a dedicated IP to minimize configuration. For example, I can run ssh on 10.0.0.2 and let it have port 22, and not worry about the other 3 containers running ssh on port 22.

I suppose I could have systemd listen on 10.0.0.1:x and pass that connection over to mysql. However, then I need to point services to 10.0.0.1 and not 10.0.0.2. This is why I alluded to it being useful to be able to depend on services on remote hosts. I completely agree that doing this in a clean way without resorting to polling would involve a bit of work. My own workaround in this case was basically going to amount to polling.

-- Rich
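The "have systemd listen on 10.0.0.1:x and pass the connection over" approach can be done with stock tooling via systemd-socket-proxyd, at the cost of exactly the drawback mentioned above (clients must point at the host IP). A sketch, with hypothetical unit names and the addresses from this thread:

    # mysql-proxy.socket
    [Socket]
    ListenStream=10.0.0.1:3306

    [Install]
    WantedBy=sockets.target

    # mysql-proxy.service
    [Unit]
    Requires=mariadb-contain.service
    After=mariadb-contain.service

    [Service]
    ExecStart=/usr/lib/systemd/systemd-socket-proxyd 10.0.0.2:3306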
[systemd-devel] Waiting for nspawn services
One of the useful options in nspawn is the ability to boot the init within the container using -b, especially if that init happens to be systemd. However, I could not find any easy way to set a dependency on a service within a container.

Example use case:
Unit 1 boots an nspawn container that runs mysql.
Unit 2 launches a service that depends on mysql, or it might even be another container that depends on mysql.

I could put together a script that pings mysql until it is up, but the original mysql unit already has to make the determination as to whether the service is ready, so this is redundant. Also, that is a solution specific to a single daemon, while the problem is generic.

I can think of a few possible ways to solve this:
1. Have a way to actually specify a dependency on a unit within a container.
2. Have a generic wait program that can wait for any unit to start within a container, or perhaps even on a remote host.
3. Have a way for nspawn to delay becoming online until all services inside have become online.

Actually, being able to express unit dependencies across machines might be useful on many levels, but I'll be happy enough just to be able to handle containers on a single host for now.

-- Rich
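For completeness, the daemon-specific polling workaround I mentioned would look something like this, as a drop-in on the dependent unit (unit names and container IP are examples):

    # /etc/systemd/system/myapp.service.d/wait-for-mysql.conf
    [Unit]
    Requires=mysql-contain.service
    After=mysql-contain.service

    [Service]
    ExecStartPre=/bin/sh -c 'until mysqladmin --host=10.0.0.2 ping >/dev/null 2>&1; do sleep 1; done'

It works, but it hard-codes knowledge of one daemon, which is exactly the redundancy complained about above.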
Re: [systemd-devel] networkd losing dhcp lease with dracut / nfs root
On Sun, Jun 29, 2014 at 10:27 AM, Tom Gundersen t...@jklm.no wrote:
> On Sat, Jun 28, 2014 at 11:29 AM, Tom Gundersen t...@jklm.no wrote:
>> Your analysis is correct. networkd is not updating the lft. We
>> should change two things: dracut (or whatever is being used on your
>> machine) should set an infinite lifetime when using NFS root (IMHO),
>> and networkd should update the lft (and in particular force-set it
>> to infinite if CriticalConnection is being used). The latter is on
>> my TODO.
>
> I just pushed a fix for this in networkd, please let me know if you
> are still having issues.

Did this make it into 215? If so, I'm still seeing odd behavior, though it no longer crashes.

I have a 5min dhcp lease (for testing). If I set CriticalConnection then it sets valid_lft to forever, and if not it starts at 300s - this seems right.

At 150 seconds left it renews the DHCP lease, but does not update valid_lft. A minute later it renews again, but still does not update valid_lft. 51 seconds after that it renews once more, and this time it does update valid_lft.

So the interface never drops, but it isn't really maintaining valid_lft at all the points where it could. I don't know what would have happened if it hadn't gotten the lease at that last renewal - at that point there was around 30s left. I guess I could test that if necessary by shutting down the dhcp server.

Rich
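In case it helps anyone reproduce this, the renewals and the kernel's view can be watched side by side with nothing fancier than (interface name as on my box):

    watch -n 1 ip -4 addr show dev eth0   # valid_lft counting down (or not being refreshed)
    journalctl -u systemd-networkd -f     # networkd logging each DHCP renewal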
[systemd-devel] networkd losing dhcp lease with dracut / nfs root
I'm running systemd-212 and dracut-037, on a diskless box with an nfs root and pxe boot. After a number of updates I noticed that the box would freeze up after 24h of uptime - almost exactly. This behavior is the same whether I have systemd-networkd running or not (it is configured to set up any interface matching e* with dhcp).

I traced this to the dhcp lease time - if I set the lease to 10min the box freezes in 10min, with errors spewing to the network console shortly afterwards about not being able to reach the nfs server. After some research, I suspect it is the result of:
https://bugzilla.redhat.com/show_bug.cgi?id=1097523

I monitored the box more closely and discovered that with a 10 minute lease the box renews the lease after 5 minutes. However, if I run "watch ip addr", the box counts down the valid_lft from 600 seconds to 1 second, with no change after 5 minutes. If I disable systemd-networkd then the box doesn't renew the lease at all, and valid_lft counts down just the same. I suspect that systemd-networkd is renewing the lease but not updating the valid_lft on the interface, and thus after the original lease expires the kernel brings the address down.

The only other thing that is odd is that my interface has two IPs assigned, and I have no idea where one is coming from:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:01:2e:31:04:dc brd ff:ff:ff:ff:ff:ff
    inet 200.0.0.0/24 brd 200.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.0.10/24 brd 192.168.0.255 scope global dynamic eth0
       valid_lft 220sec preferred_lft 220sec
    inet6 fe80::201:2eff:fe31:4dc/64 scope link
       valid_lft forever preferred_lft forever

Clearly systemd-networkd is managing 192.168.0.10:

Jun 27 23:12:43 mythliv2 systemd-networkd[442]: eth0: link is up
Jun 27 23:12:43 mythliv2 systemd-networkd[442]: eth0: carrier on
Jun 27 23:12:43 mythliv2 systemd[1]: Started Network Service.
Jun 27 23:12:43 mythliv2 systemd-networkd[442]: eth0: DHCPv4 address 192.168.0.10/24 via 192.168.0.101
Jun 27 23:12:43 mythliv2 systemd-networkd[442]: eth0: link configured

I'm not sure where the other IP (200.0.0.0/24) is coming from - it shows up even if I don't enable systemd-networkd, so perhaps dracut is setting it up. I'm not sure if its valid_lft of forever is causing any confusion though.

My network config:

[Match]
Name=e*

[Network]
DHCP=yes

[DHCPv4]
CriticalConnection=yes

(I get the same behavior if I drop the CriticalConnection=yes.)

Any thoughts as to what is going wrong here? I'm happy to test patches/etc.

Rich
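A note for anyone debugging a similar setup: address add/delete events can be streamed live, which makes it easy to see whether dracut's initramfs networking or networkd is the one touching a given address (plain iproute2, nothing systemd-specific):

    ip monitor address   # prints each address change on any interface as it happens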