Re: [systemd-devel] Activation socket overwritten while socket service running

2017-04-25 Thread Lennart Poettering
On Tue, 25.04.17 10:17, Trent Lloyd (tr...@lloyd.id.au) wrote:

> I had two general thoughts,
> 
> (a) Can I make a unit change to "improve" the situation, for example adding
> PartOf=avahi-daemon.service to avahi-daemon.socket.  I have noticed that
> CUPS and Docker (err, moby) seem to ship this though most other things
> don't; however that does seem to take away the ability to actually use
> socket activation since they'll always activate together making it mostly
> pointless (even though in most cases with avahi specifically, we want to
> actually startup and not wait to be activated so the network side is
> active).  So this seems non-ideal and I think probably this doesn't make
> sense.  It does make me wonder if others had the same thought-process and/or
> problems though.

I figure a BindsTo=avahi-daemon.socket in avahi-daemon.service
wouldn't be too bad. That way if the service is stopped the socket is
stopped too.

> (b) Would it make sense to improve systemd to monitor the socket status and
> alert or "exit" the service (making it eligible to be restarted,
> particularly it would be restarted automatically when avahi-daemon.service
> is again started) if it is no longer actually bound to the on disk path.  Or
> otherwise improve the situation directly from the systemd side in some way,
> such as checking the socket status at least when the service is restarting.
> Restarting avahi-daemon.socket does in fact restart avahi-daemon.service (if
> nothing else by way of necessity I guess, since a new FD has to be passed
> in) but the reverse is not true by default (which does make sense for
> short-lived activated services: you don't want to re-bind the socket every
> time, since that leaves a window in which the service is unavailable)

I figure it would be OK for PID 1 to use inotify to watch for the
socket node to be removed/replaced in the file system, and if that
happens we could place the socket unit in a failure state or so.

Would be happy to take a patch for that!
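
As a rough illustration of the inotify idea (in Python via ctypes, not
PID 1's C code), the sketch below watches the directory that holds the
socket node and reports create/delete events for that name. The helper
names watch_directory and read_events are made up for this example, and
the constants are copied from the Linux <sys/inotify.h> header; this is
Linux-only and assumes a libc that exports inotify_init1().

```python
import ctypes
import os
import socket
import struct
import tempfile

# Event bits from <sys/inotify.h> (Linux).
IN_MOVED_FROM = 0x00000040
IN_MOVED_TO = 0x00000080
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200
IN_NONBLOCK = os.O_NONBLOCK  # same value as O_NONBLOCK on Linux

libc = ctypes.CDLL(None, use_errno=True)
libc.inotify_init1.argtypes = [ctypes.c_int]
libc.inotify_init1.restype = ctypes.c_int
libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]
libc.inotify_add_watch.restype = ctypes.c_int


def watch_directory(directory):
    """Return a non-blocking inotify fd watching name changes in directory."""
    fd = libc.inotify_init1(IN_NONBLOCK)
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init1 failed")
    mask = IN_CREATE | IN_DELETE | IN_MOVED_FROM | IN_MOVED_TO
    if libc.inotify_add_watch(fd, os.fsencode(directory), mask) < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, "inotify_add_watch failed")
    return fd


def read_events(fd):
    """Return the queued events as a list of (mask, name) tuples."""
    try:
        data = os.read(fd, 4096)
    except BlockingIOError:
        return []  # nothing queued
    header = struct.Struct("iIII")  # wd, mask, cookie, len
    events, offset = [], 0
    while offset < len(data):
        _wd, mask, _cookie, name_len = header.unpack_from(data, offset)
        offset += header.size
        name = data[offset:offset + name_len].split(b"\0", 1)[0]
        offset += name_len
        events.append((mask, os.fsdecode(name)))
    return events


# Demo: a second process-like actor unlinks and re-binds the socket path.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "socket")

original = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
original.bind(path)

watch_fd = watch_directory(workdir)

os.unlink(path)
replacement = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
replacement.bind(path)

events = read_events(watch_fd)
deleted = any(mask & IN_DELETE and name == "socket" for mask, name in events)
created = any(mask & IN_CREATE and name == "socket" for mask, name in events)
print("deleted:", deleted, "created:", created)
```

Watching the parent directory rather than the node itself is a choice
made for this sketch: it lets the watcher see the replacement node being
created as well as the old one being removed.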

> From my view the "real problem" is that the issue is entirely invisible.
> The socket does not work, there are no errors visible on either the socket
> or service and restarting avahi-daemon.service does not fix it.  Restarting
> avahi-daemon.socket does fix it and I appreciate that, but I think that is
> confusing in many cases.  I am feeling that having the socket service exit
> if the path becomes invalid may be a sensible improvement but I thought I'd
> float the idea before working on it.

Yeah, it is indeed not easy to debug issues like that. I'd be open to
making this more discoverable, by for example refusing to start a
service whose sockets have disappeared in the file system (or maybe
warn but permit), but I am not aware of any nice way how we could
detect that the fs node still maps to the same socket we originally
created. To my knowledge there is no Linux/UNIX API for something like
that...
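
A partial heuristic, sketched here in Python rather than in PID 1's C:
record the (st_dev, st_ino) pair of the node right after bind() and
compare it against the path later. This only shows that the node at the
path has changed. It does not prove which socket a node belongs to, and
the stat after bind() is itself racy. The helper names node_identity and
still_original are hypothetical.

```python
import os
import socket
import stat
import tempfile


def node_identity(path):
    """Return (device, inode) of the socket node at path."""
    st = os.lstat(path)
    if not stat.S_ISSOCK(st.st_mode):
        raise ValueError("not a socket node: %s" % path)
    return (st.st_dev, st.st_ino)


def still_original(path, identity):
    """Heuristic: True if path still names the node recorded at bind time."""
    try:
        return node_identity(path) == identity
    except (FileNotFoundError, ValueError):
        return False


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "socket")

# The socket unit's listener; it stays open for the whole demo.
listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
listener.bind(path)
recorded = node_identity(path)

before = still_original(path, recorded)

# Something else unlinks the node and binds a fresh socket at the same path.
os.unlink(path)
intruder = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
intruder.bind(path)

after = still_original(path, recorded)
print(before, after)
```

A check like this could run before a service is started, to warn or
refuse when the recorded node is gone.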

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Activation socket overwritten while socket service running

2017-04-24 Thread Trent Lloyd

Hi Folks,

I ran into an issue when doing local development on Avahi where after 
the following sequence of events I could no longer connect to 
/run/avahi-daemon/socket activated by avahi-daemon.socket

 (1) systemctl disable avahi-daemon.service
 (2) systemctl stop avahi-daemon.service
 (3) sudo avahi-daemon --debug, then exit the process
 (4) systemctl start avahi-daemon.service
 [side note: if you try to reproduce this there is a chance a hostname 
lookup will trigger and connect to avahi-daemon.socket and activate it, 
meaning avahi-daemon won't manually start.  You'll have to avoid that; 
stopping avahi-daemon.socket would invalidate the issues noted below]


When avahi-daemon is started manually (without systemd), it does not 
receive an FD and instead unlinks and replaces the socket. 
avahi-daemon.socket is left running and is never restarted, so the now 
stale socket (no longer linked to the on-disk file) is still passed to 
avahi-daemon.service when it is started again, and the socket does not 
work.  A restart of avahi-daemon.socket does fix that.
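
The effect can be reproduced without systemd or Avahi. In the Python
sketch below, one listener stands in for avahi-daemon.socket and a
second one for the manually started daemon. The names bind_listener and
has_pending_connection are made up for this example, and the path is a
temporary one rather than /run/avahi-daemon/socket.

```python
import os
import socket
import tempfile


def bind_listener(path, replace=False):
    """Bind a non-blocking AF_UNIX stream listener at path.

    With replace=True, unlink any existing node first, as a daemon that
    was not handed an FD typically does.
    """
    if replace and os.path.exists(path):
        os.unlink(path)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(1)
    sock.setblocking(False)
    return sock


def has_pending_connection(listener):
    """Return True if a client is queued on this listener."""
    try:
        conn, _addr = listener.accept()
    except BlockingIOError:
        return False
    conn.close()
    return True


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "socket")

activation = bind_listener(path)            # stands in for avahi-daemon.socket
manual = bind_listener(path, replace=True)  # stands in for the manual run

# Clients resolve the path, so they reach whichever socket owns the
# current node, not the older socket that is still open.
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(path)

reached_manual = has_pending_connection(manual)
reached_activation = has_pending_connection(activation)
print("manual:", reached_manual, "activation:", reached_activation)
```

Once the manual listener is closed as well, nothing is listening behind
the path any more, while the activation socket remains open and keeps
being handed to the service.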


I'm wondering if I can improve this situation and what the "Right Way" 
is.  I think it's a little muddy generally, as effectively I am messing 
with the system outside of systemd, but given that the generally 
recommended socket activation workflow is to bind the socket if it's not 
passed in, this seems fairly likely to happen in some cases.  And the 
result is surprising and confusing to the average user on inspection: 
the socket exists but does not work, even after restarting 
avahi-daemon.service.
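
For reference, the "use the passed FD, otherwise bind it yourself" 
workflow looks roughly like the Python sketch below. It follows the 
LISTEN_PID / LISTEN_FDS convention documented in sd_listen_fds(3), with 
passed descriptors starting at 3. The function names passed_fds and 
get_listener are hypothetical, and this is not Avahi's actual code.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first passed descriptor, per sd_listen_fds(3)


def passed_fds(environ=os.environ, pid=None):
    """Return the descriptors passed by the service manager, if any."""
    pid = os.getpid() if pid is None else pid
    # The variables only apply to the process they were set for.
    if environ.get("LISTEN_PID") != str(pid):
        return []
    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []
    count = max(count, 0)
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def get_listener(path, environ=os.environ):
    """Return (listener, activated) for an AF_UNIX stream socket at path."""
    fds = passed_fds(environ)
    if fds:
        # Socket-activated: adopt the already bound and listening socket.
        return socket.socket(fileno=fds[0]), True
    # Started by hand: replace whatever node is at the path. This is the
    # step that orphans a socket the service manager still holds open.
    if os.path.exists(path):
        os.unlink(path)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(16)
    return sock, False
```

The unlink in the fallback branch is what makes a manual run destructive 
for a socket unit that is still active.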



I had two general thoughts,

(a) Can I make a unit change to "improve" the situation, for example 
adding PartOf=avahi-daemon.service to avahi-daemon.socket.  I have 
noticed that CUPS and Docker (err, moby) seem to ship this though most 
other things don't; however that does seem to take away the ability to 
actually use socket activation since they'll always activate together 
making it mostly pointless (even though in most cases with avahi 
specifically, we want to actually startup and not wait to be activated 
so the network side is active).  So this seems non-ideal and I think 
probably this doesn't make sense.  It does make me wonder if others had 
the same thought-process and/or problems though.


(b) Would it make sense to improve systemd to monitor the socket status 
and alert or "exit" the service (making it eligible to be restarted, 
particularly it would be restarted automatically when 
avahi-daemon.service is again started) if it is no longer actually bound 
to the on disk path.  Or otherwise improve the situation directly from 
the systemd side in some way, such as checking the socket status at 
least when the service is restarting.  Restarting avahi-daemon.socket 
does in fact restart avahi-daemon.service (if nothing else by way of 
necessity I guess, since a new FD has to be passed in) but the reverse 
is not true by default (which does make sense for short-lived activated 
services: you don't want to re-bind the socket every time, since that 
leaves a window in which the service is unavailable)



From my view the "real problem" is that the issue is entirely 
invisible.  The socket does not work, there are no errors visible on 
either the socket or service and restarting avahi-daemon.service does 
not fix it.  Restarting avahi-daemon.socket does fix it and I appreciate 
that, but I think that is confusing in many cases.  I am feeling that 
having the socket service exit if the path becomes invalid may be a 
sensible improvement but I thought I'd float the idea before working on it.


Any input appreciated.

Cheers,
Trent