Re: [systemd-devel] Troubleshooting Failed Nspawn Starts

2015-08-20 Thread Rich Freeman
On Sun, Aug 16, 2015 at 9:29 AM, Lennart Poettering
lenn...@poettering.net wrote:
> Could you file a github RFE issue, asking for support for watchdog
> keep-alive message send stuff in PID 1 and nspawn, and watchdog
> keep-alive message receive stuff in nspawn? I think it would make a
> lot of sense to add this!

https://github.com/systemd/systemd/issues/997

(I didn't see a way to tag as RFE so I suppose somebody will be
helpful and do this after the fact.)
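
For context, what I'm hoping for is that once nspawn can forward the
keep-alives, the container unit on the host could just use the
existing watchdog options, something like this (a sketch of a unit
that does not work today; paths and names are only from my setup):

[Service]
ExecStart=/usr/bin/systemd-nspawn --quiet --keep-unit --boot --directory=/path/to/container
WatchdogSec=2min
Restart=on-failure

so a frozen container would get killed and restarted instead of
sitting in "active (running)" forever.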

--
Rich


[systemd-devel] Troubleshooting Failed Nspawn Starts

2015-08-10 Thread Rich Freeman
Occasionally I'll have nspawn containers that freeze up while they're
booting.  What is the best way to troubleshoot these and get useful
info to the devs?

This is on systemd-218, on Gentoo.

Also, is there any way to detect these freezes, perhaps by getting the
service that launches the container to at least fail?  Short of
installing nagios or something similar, freezes like this are hard to
spot right now.
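
The closest thing to a workaround I've found is polling the container
from the host, along these lines (assuming your systemctl supports the
--machine= option and the is-system-running verb; mariadb is just my
container's machine name):

timeout 30 systemctl --machine=mariadb is-system-running || echo "container looks hung"

but that's exactly the kind of external monitoring I was hoping to avoid.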

Example of a frozen container:

systemctl status mariadb-contain
● mariadb-contain.service - mariadb container
   Loaded: loaded (/etc/systemd/system/mariadb-contain.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2015-08-10 07:21:48 EDT; 37min ago
 Docs: man:systemd-nspawn(1)
 Main PID: 1033 (systemd-nspawn)
   Status: Container running.
   CGroup: /system.slice/mariadb-contain.service
   ├─1033 /usr/bin/systemd-nspawn --quiet --keep-unit --boot --link-journal=guest --directory=/sstorage3/cont...
   ├─1044 /usr/lib/systemd/systemd
   └─system.slice
 ├─systemd-journald.service
 │ └─1407 /usr/lib/systemd/systemd-journald
 └─systemd-journal-flush.service
   └─1340 /usr/bin/journalctl --flush

Aug 10 07:57:45 rich64 systemd-nspawn[1033]: [1.5K blob data]
Aug 10 07:57:55 rich64 systemd-nspawn[1033]: [1.5K blob data]
Aug 10 07:58:05 rich64 systemd-nspawn[1033]: [1.5K blob data]
Aug 10 07:58:14 rich64 systemd-nspawn[1033]: [1.5K blob data]
Aug 10 07:58:24 rich64 systemd-nspawn[1033]: [1.5K blob data]
Aug 10 07:58:34 rich64 systemd-nspawn[1033]: [1.6K blob data]
Aug 10 07:58:44 rich64 systemd-nspawn[1033]: [1.6K blob data]
Aug 10 07:58:54 rich64 systemd-nspawn[1033]: [1.5K blob data]
Aug 10 07:59:04 rich64 systemd-nspawn[1033]: [1.5K blob data]
Aug 10 07:59:13 rich64 systemd-nspawn[1033]: [1.6K blob data]


Journal for this boot:
Aug 10 07:20:04 mariadb systemd-journal[16]: Journal stopped
-- Reboot --
Aug 10 07:21:54 mariadb systemd-journal[13]: Runtime journal is using 8.0M (max allowed 793.0M, trying to leave 1.1G fre
Aug 10 07:21:57 mariadb systemd-journal[13]: Permanent journal is using 544.0M (max allowed 4.0G, trying to leave 4.0G f
Aug 10 07:22:02 mariadb systemd-journal[13]: Time spent on flushing to /var is 5.461887s for 2 entries.
Aug 10 07:22:04 mariadb systemd-journal[13]: Journal started
Aug 10 07:22:07 mariadb systemd[1]: Starting Flush Journal to Persistent Storage...
Aug 10 07:23:07 mariadb systemd-journal[19]: Runtime journal is using 8.0M (max allowed 793.0M, trying to leave 1.1G fre
Aug 10 07:23:10 mariadb systemd-journal[19]: Permanent journal is using 544.0M (max allowed 4.0G, trying to leave 4.0G f
Aug 10 07:23:21 mariadb systemd-journal[19]: Time spent on flushing to /var is 10.996266s for 2 entries.
Aug 10 07:23:21 mariadb systemd-journal[19]: Journal started
Aug 10 07:23:07 mariadb systemd[1]: systemd-journald.service watchdog timeout (limit 1min)!
Aug 10 07:24:27 mariadb systemd[1]: Time has been changed


--
Rich


Re: [systemd-devel] fstab generator and nfs-client.target

2015-07-27 Thread Rich Freeman
On Mon, Jul 27, 2015 at 10:51 AM, Lennart Poettering
lenn...@poettering.net wrote:

> If you are looking for a way to start this service only when an NFS
> mount is attempted, then I must disappoint you: there's currently no
> way to do this nicely, as you can neither express "applies only to
> NFS", nor is there a way to actively pull in things from remote
> mounts.

Presumably if the .mount unit had Wants=nfs-common.target or something
similar it would pull it in when it mounts, right?  That would
probably require a change to the generator to detect NFS shares and
add the dependency.

Whether you want to do it that way is a different matter.


> (There's a reason for both limitations: we try to avoid pull-deps on
> mounts, since we want to keep the effect of manually ordered
> /bin/mount invocations, and systemd-ordered .mount activations as
> close as possible.)


Would it actually affect ordering?

If the mount wants nfs-common and is ordered after remote-fs-pre.target,
and nfs-common is ordered before remote-fs-pre.target, then won't
nfs-common be started before any remote filesystems are mounted?  The
only change would be that the mount wants nfs-common, so that it gets
started when it otherwise might not have been.
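
To spell out the relationships I have in mind (a sketch; the unit and
mount names are only examples, and nfs-common stands for whatever
target the NFS client services hang off of):

# nfs-common.target (or nfs-client.target)
[Unit]
Before=remote-fs-pre.target

# generated mnt-data.mount for a hypothetical NFS share
[Unit]
Wants=nfs-common.target
After=remote-fs-pre.target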

Or am I missing something?

--
Rich


[systemd-devel] fstab generator and nfs-client.target

2015-07-24 Thread Rich Freeman
I noticed that mount units for NFS shares created by the fstab
generator do not have Wants=nfs-client.target or similar.

That means that if you don't explicitly pull in nfs-client.target
elsewhere in your configuration, NFS shares will get mounted, but
services like rpc-statd-notify.service won't run.

Would it make sense to have the generator add nfs-client.target as a
Wants= to the NFS mount units it creates?
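
As a concrete illustration (host name and paths made up), an fstab
line like

filer:/export/data  /mnt/data  nfs  defaults  0 0

currently becomes a mount unit that is ordered against the remote-fs
targets but pulls in nothing NFS-specific, so rpc-statd-notify.service
and friends never get started unless nfs-client.target is enabled by
hand.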

--
Rich


Re: [systemd-devel] Waiting for nspawn services

2014-10-27 Thread Rich Freeman
On Mon, Oct 27, 2014 at 10:49 AM, Lennart Poettering
lenn...@poettering.net wrote:
> In general I think making use of socket notification here would be the
> much better option, as it removes the entire need for ordering things
> here. nspawn already supports socket activation just fine. If your
> mysql container would use this, then you could start the entire mysql
> container at the same time as the mysql client without any further
> complexity or synchronization, and it would just work.


Is socket activation supported for nspawn containers that use network
namespaces?  Incoming connections would not be pointed at the host IP,
but at the container's IP, which the host wouldn't otherwise be
listening on since the interface for it does not yet exist.

Or do I need to move everything to different port numbers and use the host IP?

--
Rich


Re: [systemd-devel] Waiting for nspawn services

2014-10-27 Thread Rich Freeman
On Mon, Oct 27, 2014 at 11:32 AM, Lennart Poettering
lenn...@poettering.net wrote:
> Network namespaces are relevant for the process that originally binds
> the sockets. In the case of socket-activated containers that would be
> the host. If you then pass the fds into the containers and those are
> locked into their own namespaces, then any sockets they create and
> bind would be from their own namespace, but the one they got passed in
> would still be from the original host namespace. If they then accept a
> connection on that passed-in socket that connection socket would also
> be part of the same host namespace -- not of the container's.


In case it wasn't clear - I'm talking about network namespaces with
ethernet bridging - not just an isolated network namespace without any
network access at all.  That said, I could certainly see why the
latter would be useful.

So, if the host is 10.0.0.1, then mysql would normally listen on
10.0.0.2:3306.  One of my goals here was to keep everything running on
its native port and dedicated IP to minimize configuration.  For
example, I can run ssh on 10.0.0.2 and let it have port 22, and not
worry about the other 3 containers running ssh on port 22.

I suppose I could have systemd listen on 10.0.0.1:x and pass that
connection over to mysql.  However, then I need to point services to
10.0.0.1 and not 10.0.0.2.
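
If I went that route, I assume the host side would look roughly like
this (a sketch based on my reading of the docs - the unit names, IP
and paths are made up, and I haven't actually tried it):

mysql-contain.socket:
[Socket]
ListenStream=10.0.0.1:3306

mysql-contain.service:
[Service]
ExecStart=/usr/bin/systemd-nspawn --quiet --keep-unit --boot --directory=/containers/mysql

and then something inside the container would still have to be taught
to take over the passed-in socket, which is more reconfiguration than
I was hoping for.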

This is why I alluded to it being useful to be able to depend on
services on remote hosts.  I completely agree that doing this in a
clean way without resorting to polling would involve a bit of work.
My own workaround in this case was basically going to amount to
polling.

--
Rich


[systemd-devel] Waiting for nspawn services

2014-10-25 Thread Rich Freeman
One of the useful options in nspawn is the ability to boot the init
within the container using -b, especially if that init happens to be
systemd.

However, I could not find any easy way to set a dependency on a
service within a container.

Example use case:
Unit 1 boots an nspawn container that runs mysql
Unit 2 launches a service that depends on mysql, or it might even be
another container that depends on mysql.

I could put together a script that pings mysql until it is up, but the
original mysql unit already has to make the determination as to
whether the service is ready, so this is redundant.  Also, that is a
solution specific to a single daemon, while the problem is generic.
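
(For the record, the polling hack I had in mind is roughly this, as an
ExecStartPre= on the dependent unit - the container IP is just an
example and credentials are omitted:

ExecStartPre=/bin/sh -c 'until mysqladmin --host=10.0.0.2 ping >/dev/null 2>&1; do sleep 1; done'

plus a generous TimeoutStartSec= so systemd doesn't give up on it too
early.)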

I could think of a few possible ways to solve this.

1.  Have a way to actually specify a dependency on a unit within a container.
2.  Have a generic wait program that can wait for any unit to start
within a container, or perhaps even on a remote host.
3.  Have a way for nspawn to delay becoming online until all services
inside have become online.

Actually, being able to express unit dependencies across machines
might be useful on many levels, but I'll be happy enough just to be
able to handle containers on a single host for now.

--
Rich


Re: [systemd-devel] networkd losing dhcp lease with dracut / nfs root

2014-07-14 Thread Rich Freeman
On Sun, Jun 29, 2014 at 10:27 AM, Tom Gundersen t...@jklm.no wrote:
> On Sat, Jun 28, 2014 at 11:29 AM, Tom Gundersen t...@jklm.no wrote:
>> Your analysis is correct. networkd is not updating the lft.
>>
>> We should change two things: dracut (or whatever is being used on your
>> machine) should set an infinite lifetime when using NFS root (IMHO),
>> and networkd should update the lft (and in particular force-set it to
>> infinite if CriticalConnection is being used).
>>
>> The latter is on my TODO.
>
> I just pushed a fix for this in networkd, please let me know if you
> are still having issues.

Did this make it into 215?  If so, I'm still seeing odd behavior,
though it no longer crashes.

I have a 5-minute DHCP lease (for testing).

If I set CriticalConnection then it sets valid_lft to forever, and if
not it starts at 300s - this seems right.

At 150 seconds left it renews the DHCP lease, but does not update valid_lft.
A minute later it renews again, and still does not update valid_lft.
51 seconds after that it renews once more, and this time it does update valid_lft.

So, the interface never drops, but networkd isn't really maintaining
valid_lft at every point where it could.  I don't know what would have
happened if the renewal at that last update hadn't succeeded - at that
point there was around 30s left.  I guess I could test that if
necessary by shutting down the DHCP server.
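
In case it helps anyone reproduce this, I'm just correlating the
journal with the kernel's view of the address, roughly (eth0 is my
interface; adjust as needed):

watch -n 5 'ip -4 addr show dev eth0 | grep valid_lft'

alongside journalctl -u systemd-networkd.service -f to see when the
lease messages show up.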

Rich


[systemd-devel] networkd losing dhcp lease with dracut / nfs root

2014-06-27 Thread Rich Freeman
I'm running systemd-212 and dracut-037 on a diskless box with an NFS
root and PXE boot.

After a number of updates I noticed that the box would freeze up after
almost exactly 24 hours of uptime.  This behavior is the same whether
I have systemd-networkd running or not (it is configured to set up any
interface matching e* with DHCP).

I traced this to the DHCP lease time - if I set the lease to 10
minutes, the box freezes in 10 minutes, with errors about not being
able to reach the NFS server spewing to the network console shortly
after.
After some research, I suspect it is the result of:
https://bugzilla.redhat.com/show_bug.cgi?id=1097523

I monitored the box more closely and discovered that with a 10-minute
lease the box does renew the lease after 5 minutes.  However, if I
run 'watch ip addr', valid_lft counts down from 600 seconds to 1
second with no change at the 5-minute renewal.

If I disable systemd-networkd then the box doesn't renew the lease at
all, and valid_lft counts down just the same.

I suspect that systemd-networkd is renewing the lease but not updating
the valid_lft on the interface, and thus when the original lease
expires the kernel drops the address.

The only other thing that is odd is that my interface has two IPs
assigned, and I have no idea where one is coming from:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:01:2e:31:04:dc brd ff:ff:ff:ff:ff:ff
inet 200.0.0.0/24 brd 200.0.0.255 scope global eth0
   valid_lft forever preferred_lft forever
inet 192.168.0.10/24 brd 192.168.0.255 scope global dynamic eth0
   valid_lft 220sec preferred_lft 220sec
inet6 fe80::201:2eff:fe31:4dc/64 scope link
   valid_lft forever preferred_lft forever

Clearly systemd-networkd is managing 192.168.0.10:
Jun 27 23:12:43 mythliv2 systemd-networkd[442]: eth0: link is up
Jun 27 23:12:43 mythliv2 systemd-networkd[442]: eth0: carrier on
Jun 27 23:12:43 mythliv2 systemd[1]: Started Network Service.
Jun 27 23:12:43 mythliv2 systemd-networkd[442]: eth0: DHCPv4 address 192.168.0.10/24 via 192.168.0.101
Jun 27 23:12:43 mythliv2 systemd-networkd[442]: eth0: link configured

I'm not sure where the other IP is coming from - it shows up even if I
don't enable systemd-networkd, so perhaps dracut is setting it up.
I'm not sure if its valid_lft of forever is causing any confusion
though.

My network config:
[Match]
Name=e*

[Network]
DHCP=yes

[DHCPv4]
CriticalConnection=yes

(I get the same behavior if I drop the CriticalConnection=yes)

Any thoughts as to what is going wrong here?  I'm happy to test patches/etc.

Rich