Re: s6, listen(8), etc.
I think we might be approaching diminishing returns on this thread, so please don't take offense if i don't reply too much beyond this. I do appreciate your taking the time to document your thought process here, though. thanks! A couple minor comments: On Thu 2016-09-01 08:34:18 -0400, Laurent Bercot wrote: > I agree with this principle entirely. This is good design. The question > now becomes: how does the daemon know what fd corresponds to what use? > > The simplest, natural answer, used for instance by UCSPI clients and > servers, is: the fd numbers are hardcoded, they're a part of the daemon > interface and that should be documented. > So, in your example, the kresd documentation should say something like: > > - fd 3 must be a datagram socket, that's where kresd will read its > UDP DNS queries from. > - fds 4 and 5 must be listening stream sockets, that's where kresd will > read its TCP DNS queries from. kresd use two sockets for this in order to > allow one of them to be set up over TLS. > - fd 6 must be a listening Unix domain stream socket, that's where kresd > will read its control messages from. This doesn't allow for the creation of multiple different sockets e.g. for a daemon expected to listen on distinct IPv6 and IPv4 addresses, but not on all available addresses. > LISTEN_FD_MAP=udp=3,tcp=4,tls=5,control=6 I'd want the map the other way around: from number to use case, if i was going to go for a re-design like this, though i suppose it's roughly equivalent to have udp=3,udp=4,tcp=5 as long as you don't mind having repeated keys. but meh, this is basically equivalent. > Or does it? The API still constrains you: even if you can provide labels, > you still don't send a map, you send a number of fds and a list of labels. > So you still have to make sure the fds you send are 3, 4, 5, 6 ! And > if your "listen" program happens to have fd 3 open when it's launched > (and there may be very good reasons to have some fd already open when you > run a daemon), well, it has to overwrite it with the udp socket it opens, > because that's what the API forces you to do. If I want the daemon to run > with some open fd that's not explicitly passed via LISTEN_FDS (for instance > if the daemon doesn't have to know it has it open) then I have to make sure > that my fd is at least 3+$LISTEN_FDS. Talk about unwieldy. this seems no more unwieldy (probably less unwieldy) than the convention to always use FD 6 and FD 7. what if i wanted to have some other fd open on those numbers for some other reason? i'd have to move it out of the way to use the s6 convention. Note that i'm not claiming this is terrible, it's only a minor hassle. but it's the same minor hassle as what you're describing here, no? > And from the daemon's side? it has to parse the contents of $LISTEN_FDNAMES, > which is a colon-separated list of labels; that's not really easier than > parsing my suggested LISTEN_FD_MAP string. right, it's equivalent. > So, LISTEN_FDS doesn't know if it wants to pass a list of hardcoded fd > numbers, or a map with labels; it ends up doing a bit of both, badly, and > getting the worst of both worlds. eh? on the contrary: if your daemon doesn't need to distinguish between any of its sockets -- if all sockets are standard listening sockets (it can still inspect the socket itself to distinguish between datagram and stream) then it doesn't care about anything but LISTEN_FDs. If the daemon receives some sockets (like a local control socket) that need to be treated differently, then it needs to look at the map. Your proposed LISTEN_FD_MAP looks precisely equivalent to LISTEN_FDNAMES for this purpose, with only minor syntactic differences. The non-syntactic difference, of course, is that multiple daemons written and running today actually already know how to parse LISTEN_FDNAMES, whether because they link to libsystemd or do the simple parsing themselves. if LISTEN_FD_MAP had gotten there first, i would have made listen(8) implement that labeling scheme. > Oh, and I haven't entirely given up my argument of political agenda. > The only reason you could have for designing such a whimsical interface, > apart from obvious lack of experience in software design, is to encourage > daemon authors to use the sd_listen_fds_with_names() interface (and so, > link against libsystemd), because it's too painful to fully implement it > yourself as a daemon author - you have to handle the case when > LISTEN_FD_NAMES is set, the case when it's not set, etc. etc. (i don't think that baseless slights against the authors of other software advance your cause here) on the technical merits, i really don't see it as difficult as you're making it out to be. When this convention is in use, your daemon already knows whether it needs to parse LISTEN_FDNAMES, or whether it can get away with just looking at LISTEN_FDS. in any case, both will be set. And there is absolutely no need to link against libsystemd.
Re: s6, listen(8), etc.
On 01/09/16 15:43, Roger Pate wrote: On Thu, Sep 1, 2016 at 8:34 AM, Laurent Bercot wrote: OK, now let's have a look at LISTEN_FDS. I also find these particular implementation details a poor choice. I was going to recommend a different environment convention, but then I saw the pre-existing convention was copied exactly. If I was Daniel, I'd create something better. But I'm not sure there's enough interest/need to warrant it. (Daemons currently written to expect LISTEN_FDS could have a chain-loaded conversion program.) Not that I'm particularly knowledgeable here; s6's fdholding seems able to fill this niche already. FD holding is a very general mechanism and requires a protocol between the FD holding daemon and the client(s). A "./run" script can access the FD holding daemon, but this isn't the purpose of the systemd style socket activation. Socket activation is a very old idea in Unix and commonly implemented the inetd superserver. This works well for forking servers with low setup overhead. There several problems with forking one process per connection. * For some protocols forking a process per request is too much overhead. * Some daemons perform expensive setup operations e.g. OpenSSH used to generate an ephemeral SSHv1 server key during startup. A socket requires very few resources compared to a running process especially a process offering a non-trivial service yet only the bound socket is required for clients to initiate a connection. This tempted Apple to not just create and bind sockets in the superserver but also accept the (first) connection inside the superserver before they spawn the service process. The problem is that now the superserver has to pass the bound socket and the accepted connected socket to the service process. This requires a protocol between the superserver and the service. Both Apple and systemd "strongly recommend" applications to link against their libraries resulting in an annoying vendor lock-in. On a classic Unix server running a few services this is unnecessary complexity, but these days most unixoid systems are powered by a single Lithium battery cell. Launchd went too far too quickly and the original implementation requires Mach IPC. In a launchd world every service is always available and the processes implementing it are spawned on demand. There is even a transaction concept enabling launchd to reclaim resources from cooperating services unless they're inside a transaction. This design works well on a laptop or smartphone. It fails spectacularly in the face of "evil legacy software" which doesn't follow the "one true way". The systemd APIs look like they try to follow a middle ground, but they end up making a mess out of things. Have a look at https://ewontfix.com/15/ if want to know more about systemd's design flaws.
Re: runit kill runsv
[sorry for replying late, catching up] * Laurent Bercot [20160627 18:05]: > On 27/06/2016 14:02, Joan Picanyol i Puig wrote: > >However, couldn't they know whether their child did not cease to run > >because > >of a signal they sent? > > I'm not sure about runsv, but s6-supervise is a state machine, and the > service state only goes from UP to FINISH when the supervisor receives a > SIGCHLD. The state does not change at all after the supervisor sent a > signal: it sent a signal, yeah, so what - it's entirely up to the daemon > what to do with that signal. I understand: supervisors only exec() processes and propagate signals, they have no saying in nor can expect what their effect is. > There's an exception for SIGSTOP because stopped daemons won't die > before you SIGCONT them, but that's it; even sending SIGKILL won't > make s6-supervise change states. Of course, if you send SIGKILL, > you're going to receive a SIGCHLD very soon, and *that* will trigger a > state change. Given that SIGKILL shares with SIGSTOP the fact that they can't be caught (and thus supervisors can assume a forthcoming SIGCHLD) signals (pun intended) that the exception should be extended? > >No, but neither can the admin enforce this policy automatically and > >portably using current supervisors. Other than the "dedicated user/login > >class/cgroup" scheme proposed by Jan (which can be considered best > >practice anyway), it'd be nice if they exposed this somehow (hand-waving > >SMOP ahead: duplicate the pid field in ./status and remove the working > >copy only when receiving a down signal). > > No need to duplicate the pid field: if s6-supervise dies before the service > goes down, the pid field in supervise/status is left unchanged, so it still > contains the correct pid. I suspect runsv works the same. Ah, ok, it didn't occur to me that pid 0 in supervise/status could be used to mean "never run or got SIGCHLD" > I guess a partial mitigation strategy could be "if supervise/status exists > and its pid field is nonzero when the supervisor starts, warn that an > instance of the daemon may still be running and print its pid". Do you > think it would be worth the effort? As well as the warning (which would make troubleshooting easier and might have probably avoided this thread), a robust automation enabling ui (in s6-svstat / s6-svok) would round this additional feature and make it yet more useful. keep up the good work -- pica
Re: s6, listen(8), etc.
On Thu, Sep 1, 2016 at 8:34 AM, Laurent Bercot wrote: > OK, now let's have a look at LISTEN_FDS. I also find these particular implementation details a poor choice. I was going to recommend a different environment convention, but then I saw the pre-existing convention was copied exactly. If I was Daniel, I'd create something better. But I'm not sure there's enough interest/need to warrant it. (Daemons currently written to expect LISTEN_FDS could have a chain-loaded conversion program.) Not that I'm particularly knowledgeable here; s6's fdholding seems able to fill this niche already.
Re: s6, listen(8), etc.
On 01/09/2016 07:52, Daniel Kahn Gillmor wrote: I think you might have misunderstood the description of the convention, actually. There's no unix domain socket for receiving the descriptors at all. Rather, the descriptors are already open and present in the process. OK, after reading your message, then the sd_listen_fds man page, it appears that I have indeed misunderstood it. My bad. It my defence, it really is not clear from that page whether the descriptors are already open in the process that invokes sd_listen_fds(), or whether sd_listen_fds actually connects to systemd to receive them. I understood the latter, and I will agree it was my prejudice speaking; the former is much more reasonable. So let's dive a bit deeper. exec listen -udp::53 \ -tcp::53 \ -tcp:label=tls:853 \ -unix:label=control,mode=0600:/run/kresd/control \ chpst -u kresd -p 1 \ /usr/sbin/kresd This means kresd doesn't need to know about dropping privileges, opening listening ports, or resource constraints at all (listen and chpst take care of that), but kresd can still retain in-memory state that might be useful for handling multiple connections with no exec() ever involved. I agree with this principle entirely. This is good design. The question now becomes: how does the daemon know what fd corresponds to what use? The simplest, natural answer, used for instance by UCSPI clients and servers, is: the fd numbers are hardcoded, they're a part of the daemon interface and that should be documented. So, in your example, the kresd documentation should say something like: - fd 3 must be a datagram socket, that's where kresd will read its UDP DNS queries from. - fds 4 and 5 must be listening stream sockets, that's where kresd will read its TCP DNS queries from. kresd use two sockets for this in order to allow one of them to be set up over TLS. - fd 6 must be a listening Unix domain stream socket, that's where kresd will read its control messages from. To me, that approach is good enough. But you could reasonably argue that's unwieldy. So the next step is to have a map, a simple key-value store that maps identifiers to fd numbers. And the simplest implementation of a key-value store is the environment. So, the kresd documentation could say something like this: kresd will read the values of the UDP_FD, TCP_FD, TLS_FD and CONTROL_FD environment variables. Those variables should contain an integer which represents an open file descriptor. kresd will use the descriptor in $UDP_FD to receive DNS queries over a datagram socket. etc. etc. That's much friendlier to the person who writes the run script for kresd, and that's also pretty easy to implement for the kresd author: read an integer in an environment variable. All in all, it sounds like a good solution when the daemon has more than 2 or 3 fds to handle. But, you say, that's not generic enough a mechanism! A process manager cannot pass a flock of environment variables to a daemon, the names of which depend on the daemon entirely! It has to settle on a few standardized environment variable names. Well, first, yes, it definitely can pass a flock of environment variables, that's what a run script is for. I'm pretty sure systemd even has syntactic sugar to change the daemon's environment before spawning it. With s6, you can also store your variable set into the filesystem and read it via s6-envdir. Second, even if you don't want to do that, it's a simple map, and a map is easily passed as a single variable in the environment, provided you reserve a separator character. Let's say your process manager will pass the LISTEN_FD_MAP environment variable to daemons, and labels aren't allowed to use commas: LISTEN_FD_MAP=udp=3,tcp=4,tls=5,control=6 It's a bit more work for the daemon, but you can still relatively easily extract a map from labels to fd numbers from a single environment variable. OK, now let's have a look at LISTEN_FDS. LISTEN_FDS is designed to be usable without LISTEN_FDNAMES, i.e. as a list. And in that case, you have a single piece of information: the number of open fds. Which means: - the starting fd has to be hardcoded (3) - the fds have to be consecutive. So here you'd have LISTEN_FDS=4 and the fds would be 3,4,5,6. The daemon has to hardcode that 3 is udp, 4 is tcp, 5 is tls and 6 is control - it's exactly as unwieldy for the run script author, and it's less flexible because neither the daemon nor the script author can even choose what fds are allocated! The numbers are enforced by the API. This is not good. Right, so there's the LISTEN_FDNAMES mechanism to help. With LISTEN_FDNAMES, we have a map: the daemon gets labels to help it identify the fds. This avoids any number hardcoding, this makes it more flexible for the run script author. Or does it? The API still constrains you: even if you can provide labels, you still don't send a map, you send a number of fds and a list of labels.
Re: [PATCH 1/3] correct typo
On Tue, Aug 30, 2016 at 01:15:20PM +0200, Laurent Bercot wrote: > >I will be happy to send a 0/n introductory message (git send-email > >--compose) for any future proposed series, i was not aware that was the > >custom here. Indeed, i wrote this series because i could find no clear > >instructions for how to propose improvements to runit's codebase. > > I'm not sure. This is more of a discussion/questions list than a > patch-sending list; as far as I'm concerned I don't like to receive > patches without some discussion before or at least a little context, but > for runit you'd have to check with the new runit maintainer, if there's > one - Gerrit, before you leave, please designate one, and if the answer > is "it's maintained by Debian/whateverdistro these days", then we'll know > the right place to send patches is the Debian/whateverdistro ML (but the > actual maintainers should still subscribe to our list and read it). Hi, I'm not leaving, just getting older.. Although runit now has a new Debian maintainer, I intend to keep my hands on the runit original upstream package, most probably not applying many changes to it, if at all. I consider it very stable and usable. There's one itch https://bugs.debian.org/641632 actually reported by Daniel back then. A proposed patch is available in the devground branch in the git repository. But with this bug we're actually living for years already, seems to happen very rarely. > >Alternately, if there's a better way to propose changes than sending > >mail to this list, i'd be happy to rewrite code.html to document it, but > >i don't know what it is. > > I'm sorry I don't have a clearer answer. Until we know more precisely how > the future of runit is organized, it's kinda in a limbo state. I'm not against patches to runit, as long as submitters are not demotivated by the fact that it may take some time for me to respond, and probably most of the patches won't make it into my git repository. As I did contributions to git end of 2005 and the years after that, I'd be fine with submitting patches through the mailing list, as they still do. OTOH there's some discussion about the development workflow through mailboxes on the git list lately https://marc.info/?t=14713666535&r=1&w=2 Regards, Gerrit.
Re: [PATCH 1/3] correct typo
On Mon, Aug 29, 2016 at 09:10:24PM -0400, Daniel Kahn Gillmor wrote: > --- > package/CHANGES | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/package/CHANGES b/package/CHANGES > index da4ea27..d3c419b 100644 > --- a/package/CHANGES > +++ b/package/CHANGES > @@ -146,7 +146,7 @@ Sun, 16 Apr 2006 12:26:50 + > 1.4.1 > Mon, 20 Mar 2006 18:54:41 + >* doc/faq.html: typos; add usercontrol, userservices; minor. > - * src/uidgid.h: use uid_t, git_t (fix setting of multiple groups with > + * src/uidgid.h: use uid_t, gid_t (fix setting of multiple groups with > dietlibc, thx Tino Keitel, http://bugs.debian.org/356016) Ha! This is a nice one, Freudian slip. I guess now we know approximately when runit version control was migrated from CVS to git ;). Regards, Gerrit.