Hi Folks,

I ran into an issue while doing local development on Avahi: after the following sequence of events I could no longer connect to /run/avahi-daemon/socket, the socket activated by avahi-daemon.socket:
 (1) systemctl disable avahi-daemon.service
 (2) systemctl stop avahi-daemon.service
 (3) sudo avahi-daemon --debug, then exit the process
 (4) systemctl start avahi-daemon.service
[Side note: if you try to reproduce this, there is a chance a hostname lookup will connect to avahi-daemon.socket and activate it, in which case avahi-daemon won't start manually in step (3). You'll have to avoid that; stopping avahi-daemon.socket to work around it would invalidate the issue noted below. The commands I used to inspect the resulting state follow.]
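For what it's worth, here is roughly how I look at the state after step (4). Nothing here is required for the reproduction; it just shows that everything looks healthy even though it isn't:

    ls -l /run/avahi-daemon/socket                               # the path exists and looks normal
    sudo ss -xlp | grep avahi                                    # avahi-daemon shows up as listening
    systemctl status avahi-daemon.socket avahi-daemon.service    # both units report active
    # ...and yet connecting to the on-disk path does not reach the daemon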

When avahi-daemon is started manually (outside systemd), it does not receive an FD, so it unlinks the socket path and binds a replacement. avahi-daemon.socket is left running and is never restarted, so the now-stale socket (no longer linked to the on-disk file) is still passed to avahi-daemon.service when the service is started again, and that socket does not work. Restarting avahi-daemon.socket does fix it.
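To illustrate the fallback I mean, this is the usual shape of the socket-activation pattern (a generic sketch, not Avahi's actual code); it is the bind-it-yourself branch that leaves systemd holding an FD for an inode that is no longer linked at the path:

    /* generic socket-activation fallback sketch; link with -lsystemd */
    #include <systemd/sd-daemon.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <string.h>
    #include <unistd.h>

    static int get_listen_socket(const char *path) {
        int n = sd_listen_fds(0);
        if (n >= 1)
            return SD_LISTEN_FDS_START;         /* use the FD systemd passed in */

        /* No FD from systemd: bind the path ourselves.  The unlink() below is
         * what orphans the FD that avahi-daemon.socket is still holding. */
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        struct sockaddr_un sa;
        memset(&sa, 0, sizeof(sa));
        sa.sun_family = AF_UNIX;
        strncpy(sa.sun_path, path, sizeof(sa.sun_path) - 1);

        unlink(path);
        if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) < 0 || listen(fd, SOMAXCONN) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }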

I'm wondering whether I can improve this situation and what the "Right Way" is. It's a little muddy in general, since I am effectively messing with the system outside of systemd, but given that the commonly recommended socket-activation workflow is to bind the socket yourself if one is not passed in, this does not seem unlikely to happen in some cases. And the result is surprising and confusing on inspection for the average user: the socket exists but does not work, even after restarting avahi-daemon.service.


I had two general thoughts:

(a) Can I make a unit change to "improve" the situation, for example adding PartOf=avahi-daemon.service to avahi-daemon.socket? I have noticed that CUPS and Docker (err, moby) ship this, though most other things don't. However, that seems to take away the ability to actually use socket activation, since the two units will always activate together, which makes it mostly pointless (even though in Avahi's case specifically we usually want to start up immediately rather than wait to be activated, so that the network side is active). So this seems non-ideal and probably doesn't make sense, but it does make me wonder whether others have had the same thought process and/or problems. A sketch of what such a change would look like is below.
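For reference, option (a) would amount to a drop-in roughly like this (untested; e.g. via systemctl edit avahi-daemon.socket):

    # /etc/systemd/system/avahi-daemon.socket.d/override.conf
    [Unit]
    # Stop/restart of avahi-daemon.service now propagates to the socket unit,
    # so the socket gets re-bound whenever the service restarts; on-demand
    # activation is effectively gone, since stopping the service also stops the socket.
    PartOf=avahi-daemon.service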

(b) Would it make sense to improve systemd to monitor the socket's status and alert, or "exit" the socket unit (making it eligible to be restarted; in particular it would then be restarted automatically when avahi-daemon.service is next started), if it is no longer actually bound to the on-disk path? Or otherwise improve the situation from the systemd side in some other way, such as checking the socket's status at least when the service is restarting. Restarting avahi-daemon.socket does in fact restart avahi-daemon.service (if nothing else out of necessity, I guess, since a new FD has to be passed in), but the reverse is not true by default (which does make sense for short-lived activated services: you don't want to re-bind the socket every time, and doing so leaves a window during which the service is unavailable). A rough sketch of the kind of check I mean follows.
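To make (b) a bit more concrete, the check I have in mind is something like the following standalone sketch: make a probe connection to the on-disk path and see whether it lands on the FD we are holding. This assumes exclusive access to the listening FD, which is not true once the service holds it too, so a real implementation inside systemd would need to be smarter; it is only meant to show the idea:

    /* Sketch: does the path still route to this listening FD? */
    #include <poll.h>
    #include <stdbool.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    static bool listener_still_reachable(int listen_fd, const char *path) {
        int probe = socket(AF_UNIX, SOCK_STREAM, 0);
        if (probe < 0)
            return false;

        struct sockaddr_un sa;
        memset(&sa, 0, sizeof(sa));
        sa.sun_family = AF_UNIX;
        strncpy(sa.sun_path, path, sizeof(sa.sun_path) - 1);

        /* If nothing is bound at the path, connect() fails outright; if some
         * other socket replaced it, the connection never shows up on listen_fd. */
        if (connect(probe, (struct sockaddr *) &sa, sizeof(sa)) < 0) {
            close(probe);
            return false;
        }

        struct pollfd pfd = { .fd = listen_fd, .events = POLLIN };
        bool reachable = poll(&pfd, 1, 100) > 0 && (pfd.revents & POLLIN);

        if (reachable) {
            /* drain the probe connection we just queued */
            int c = accept(listen_fd, NULL, NULL);
            if (c >= 0)
                close(c);
        }
        close(probe);
        return reachable;
    }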


From my point of view the "real problem" is that the issue is entirely invisible: the socket does not work, there are no errors visible on either the socket or the service, and restarting avahi-daemon.service does not fix it. Restarting avahi-daemon.socket does fix it, and I appreciate that, but I think that is confusing in many cases. I'm leaning towards having the socket unit exit if the path becomes invalid as a sensible improvement, but I thought I'd float the idea before working on it.

Any input appreciated.

Cheers,
Trent