Re: s6, listen(8), etc.

2016-09-01 Thread Daniel Kahn Gillmor
I think we might be approaching diminishing returns on this thread, so
please don't take offense if i don't reply too much beyond this.  I do
appreciate your taking the time to document your thought process here,
though.  thanks!

A couple minor comments:

On Thu 2016-09-01 08:34:18 -0400, Laurent Bercot wrote:
>   I agree with this principle entirely. This is good design. The question
> now becomes: how does the daemon know what fd corresponds to what use?
>
>   The simplest, natural answer, used for instance by UCSPI clients and
> servers, is: the fd numbers are hardcoded, they're a part of the daemon
> interface and that should be documented.
>   So, in your example, the kresd documentation should say something like:
>
>   - fd 3 must be a datagram socket, that's where kresd will read its
> UDP DNS queries from.
>   - fds 4 and 5 must be listening stream sockets, that's where kresd will
> read its TCP DNS queries from. kresd uses two sockets for this in order to
> allow one of them to be set up over TLS.
>   - fd 6 must be a listening Unix domain stream socket, that's where kresd
> will read its control messages from.

This doesn't allow for the creation of multiple different sockets
e.g. for a daemon expected to listen on distinct IPv6 and IPv4
addresses, but not on all available addresses.

>   LISTEN_FD_MAP=udp=3,tcp=4,tls=5,control=6

I'd want the map the other way around: from number to use case, if i
were going to go for a re-design like this.  though i suppose it's
roughly equivalent to have udp=3,udp=4,tcp=5 as long as you don't mind
having repeated keys.  but meh, it's basically a wash either way.
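For what it's worth, repeated keys are easy enough to tolerate on the
consuming side. A sketch (LISTEN_FD_MAP and the parser name are
hypothetical, taken from this discussion rather than any real interface):

```python
def parse_fd_map(value):
    """Parse a hypothetical LISTEN_FD_MAP string, tolerating repeated keys.

    Repeated labels accumulate, so each use case maps to the list of fds
    serving it rather than a single number.
    """
    result = {}
    for entry in value.split(","):
        label, _, fd = entry.partition("=")
        result.setdefault(label, []).append(int(fd))
    return result
```

With that, "udp=3,udp=4,tcp=5" parses to {"udp": [3, 4], "tcp": [5]},
so a daemon listening on distinct v4 and v6 addresses is no problem.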

>   Or does it? The API still constrains you: even if you can provide labels,
> you still don't send a map, you send a number of fds and a list of labels.
> So you still have to make sure the fds you send are 3, 4, 5, 6 ! And
> if your "listen" program happens to have fd 3 open when it's launched
> (and there may be very good reasons to have some fd already open when you
> run a daemon), well, it has to overwrite it with the udp socket it opens,
> because that's what the API forces you to do. If I want the daemon to run
> with some open fd that's not explicitly passed via LISTEN_FDS (for instance
> if the daemon doesn't have to know it has it open) then I have to make sure
> that my fd is at least 3+$LISTEN_FDS. Talk about unwieldy.

this seems no more unwieldy (probably less unwieldy) than the convention
to always use FD 6 and FD 7.  what if i wanted to have some other fd
open on those numbers for some other reason?  i'd have to move it out of
the way to use the s6 convention.  Note that i'm not claiming this is
terrible, it's only a minor hassle.  but it's the same minor hassle as
what you're describing here, no?

>   And from the daemon's side? it has to parse the contents of $LISTEN_FDNAMES,
> which is a colon-separated list of labels; that's not really easier than
> parsing my suggested LISTEN_FD_MAP string.

right, it's equivalent.

>   So, LISTEN_FDS doesn't know if it wants to pass a list of hardcoded fd
> numbers, or a map with labels; it ends up doing a bit of both, badly, and
> getting the worst of both worlds.

eh?  on the contrary: if your daemon doesn't need to distinguish between
any of its sockets -- if all sockets are standard listening sockets
(it can still inspect the socket itself to distinguish between datagram
and stream) -- then it doesn't care about anything but LISTEN_FDS.

If the daemon receives some sockets (like a local control socket) that
need to be treated differently, then it needs to look at the map.  Your
proposed LISTEN_FD_MAP looks precisely equivalent to LISTEN_FDNAMES for
this purpose, with only minor syntactic differences.  The non-syntactic
difference, of course, is that multiple daemons written and running
today actually already know how to parse LISTEN_FDNAMES, whether because
they link to libsystemd or do the simple parsing themselves.  if
LISTEN_FD_MAP had gotten there first, i would have made listen(8)
implement that labeling scheme.

>   Oh, and I haven't entirely given up my argument of political agenda.
> The only reason you could have for designing such a whimsical interface,
> apart from obvious lack of experience in software design, is to encourage
> daemon authors to use the sd_listen_fds_with_names() interface (and so,
> link against libsystemd), because it's too painful to fully implement it
> yourself as a daemon author - you have to handle the case when
> LISTEN_FD_NAMES is set, the case when it's not set, etc. etc.

(i don't think that baseless slights against the authors of other
software advance your cause here.)  on the technical merits, i really
don't see it as being as difficult as you're making it out to be.  When this
convention is in use, your daemon already knows whether it needs to
parse LISTEN_FDNAMES, or whether it can get away with just looking at
LISTEN_FDS.  in any case, both will be set.  And there is absolutely no
need to link against libsystemd.
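To back up that last claim: the whole convention fits in a few lines of
any language, with no libsystemd anywhere. A sketch (fd numbering and
the "unknown" fallback follow the sd_listen_fds(3) description; the
environment is passed in as a dict purely for illustration):

```python
import os

SD_LISTEN_FDS_START = 3  # first inherited fd, per the LISTEN_FDS convention

def listen_fds_with_names(environ):
    """Return (name, fd) pairs for sockets passed via the LISTEN_FDS convention.

    fds without a corresponding label get "unknown", mirroring
    sd_listen_fds_with_names(); names may legitimately repeat.
    """
    # The fds are only meant for us if LISTEN_PID matches our own pid.
    if environ.get("LISTEN_PID") != str(os.getpid()):
        return []
    count = int(environ.get("LISTEN_FDS", "0"))
    names = environ.get("LISTEN_FDNAMES", "").split(":")
    return [(names[i] if i < len(names) and names[i] else "unknown",
             SD_LISTEN_FDS_START + i)
            for i in range(count)]
```

A daemon that doesn't care about labels just ignores the names; one that
does gets its map, either way without linking anything.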


Re: s6, listen(8), etc.

2016-09-01 Thread Jan Bramkamp

On 01/09/16 15:43, Roger Pate wrote:

On Thu, Sep 1, 2016 at 8:34 AM, Laurent Bercot
 wrote:

 OK, now let's have a look at LISTEN_FDS.


I also find these particular implementation details a poor choice.  I
was going to recommend a different environment convention, but then I
saw the pre-existing convention was copied exactly.

If I were Daniel, I'd create something better.  But I'm not sure
there's enough interest/need to warrant it.  (Daemons currently
written to expect LISTEN_FDS could have a chain-loaded conversion
program.)

Not that I'm particularly knowledgeable here; s6's fdholding seems
able to fill this niche already.


FD holding is a very general mechanism and requires a protocol between 
the FD-holding daemon and its client(s). A "./run" script can talk to the 
FD-holding daemon, but that isn't what systemd-style socket activation 
is for.


Socket activation is a very old idea in Unix, commonly implemented by 
the inetd superserver. This works well for forking servers with low 
setup overhead, but there are several problems with forking one process 
per connection.


 * For some protocols forking a process per request is too much overhead.

 * Some daemons perform expensive setup operations e.g. OpenSSH used to 
generate an ephemeral SSHv1 server key during startup.
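The inetd pattern being discussed boils down to something like this
sketch (not inetd's actual code): the superserver owns the listening
socket, pays one fork() per connection, and places the conversation on
the child's stdin/stdout:

```python
import os
import socket

def serve_inetd_style(listener, handler):
    """Fork one child per accepted connection, inetd-style.

    inetd would exec() the service after the dup2() calls; this sketch
    just calls handler(), which reads and writes via fds 0 and 1.
    """
    while True:
        conn, _addr = listener.accept()
        if os.fork() == 0:             # child: serve exactly one connection
            listener.close()           # child doesn't need the listening socket
            os.dup2(conn.fileno(), 0)  # connection becomes stdin...
            os.dup2(conn.fileno(), 1)  # ...and stdout, as inetd arranges
            handler()
            os._exit(0)
        conn.close()                   # parent: straight back to accept()
```

The fork()/exec() on every connection is exactly the per-request
overhead the first bullet point complains about.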


A socket requires very few resources compared to a running process, 
especially a process offering a non-trivial service, yet only the bound 
socket is required for clients to initiate a connection. This tempted 
Apple not just to create and bind sockets in the superserver, but also to 
accept the (first) connection inside the superserver before spawning the 
service process. The problem is that the superserver now has to pass both 
the bound socket and the accepted connection socket to the service 
process. This requires a protocol between the superserver and the 
service. Both Apple and systemd "strongly recommend" that applications 
link against their libraries, resulting in an annoying vendor lock-in.
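That protocol ultimately rests on one Unix primitive: passing open
descriptors as SCM_RIGHTS ancillary data over a Unix-domain socket. A
minimal sketch (socket.send_fds/recv_fds need Python 3.9+; the pipe
stands in for an accepted connection being handed over):

```python
import os
import socket

# "superserver" and "service" ends of a Unix-domain channel
superserver, service = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

r, w = os.pipe()  # stand-in for the accepted connection to hand over

# superserver side: send the descriptor alongside a one-byte message
socket.send_fds(superserver, [b"c"], [r])

# service side: receive a *new* fd referring to the same open file
msg, fds, _flags, _addr = socket.recv_fds(service, 1, 1)

os.write(w, b"hello")          # data written at the superserver end...
greeting = os.read(fds[0], 5)  # ...is readable through the received fd
```

Nothing here requires a vendor library; the lock-in comes from the
bookkeeping conventions layered on top, not from the primitive.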


On a classic Unix server running a few services this is unnecessary 
complexity, but these days most unixoid systems are powered by a single 
Lithium battery cell. Launchd went too far too quickly and the original 
implementation requires Mach IPC. In a launchd world every service is 
always available and the processes implementing it are spawned on 
demand. There is even a transaction concept enabling launchd to reclaim 
resources from cooperating services unless they're inside a transaction. 
This design works well on a laptop or smartphone. It fails spectacularly 
in the face of "evil legacy software" which doesn't follow the "one true 
way".


The systemd APIs look like they try to follow a middle ground, but they 
end up making a mess of things. Have a look at 
https://ewontfix.com/15/ if you want to know more about systemd's design flaws.


Re: runit kill runsv

2016-09-01 Thread Joan Picanyol i Puig
[sorry for replying late, catching up]

* Laurent Bercot  [20160627 18:05]:
> On 27/06/2016 14:02, Joan Picanyol i Puig wrote:
> >However, couldn't they know whether their child did not cease to run
> >because of a signal they sent?
> 
>  I'm not sure about runsv, but s6-supervise is a state machine, and the
> service state only goes from UP to FINISH when the supervisor receives a
> SIGCHLD. The state does not change at all after the supervisor sent a
> signal: it sent a signal, yeah, so what - it's entirely up to the daemon
> what to do with that signal.

I understand: supervisors only exec() processes and propagate signals;
they have no say in, nor can they anticipate, what their effect is.

> There's an exception for SIGSTOP because stopped daemons won't die
> before you SIGCONT them, but that's it; even sending SIGKILL won't
> make s6-supervise change states. Of course, if you send SIGKILL,
> you're going to receive a SIGCHLD very soon, and *that* will trigger a
> state change.

Given that SIGKILL shares with SIGSTOP the fact that it can't be caught
(and thus supervisors can assume a forthcoming SIGCHLD), doesn't that
signal (pun intended) that the exception should be extended?

> >No, but neither can the admin enforce this policy automatically and
> >portably using current supervisors. Other than the "dedicated user/login
> >class/cgroup" scheme proposed by Jan (which can be considered best
> >practice anyway), it'd be nice if they exposed this somehow (hand-waving
> >SMOP ahead: duplicate the pid field in ./status and remove the working
> >copy only when receiving a down signal).
> 
>  No need to duplicate the pid field: if s6-supervise dies before the service
> goes down, the pid field in supervise/status is left unchanged, so it still
> contains the correct pid. I suspect runsv works the same.

Ah, ok, it didn't occur to me that pid 0 in supervise/status could be
used to mean "never ran or got SIGCHLD".

>  I guess a partial mitigation strategy could be "if supervise/status exists
> and its pid field is nonzero when the supervisor starts, warn that an
> instance of the daemon may still be running and print its pid". Do you
> think it would be worth the effort?

As well as the warning (which would make troubleshooting easier and
would probably have avoided this thread), a robust automation-enabling
UI (in s6-svstat / s6-svok) would round out this feature and make it
yet more useful.

keep up the good work
-- 
pica


Re: s6, listen(8), etc.

2016-09-01 Thread Roger Pate
On Thu, Sep 1, 2016 at 8:34 AM, Laurent Bercot
 wrote:
>  OK, now let's have a look at LISTEN_FDS.

I also find these particular implementation details a poor choice.  I
was going to recommend a different environment convention, but then I
saw the pre-existing convention was copied exactly.

If I were Daniel, I'd create something better.  But I'm not sure
there's enough interest/need to warrant it.  (Daemons currently
written to expect LISTEN_FDS could have a chain-loaded conversion
program.)

Not that I'm particularly knowledgeable here; s6's fdholding seems
able to fill this niche already.


Re: s6, listen(8), etc.

2016-09-01 Thread Laurent Bercot

On 01/09/2016 07:52, Daniel Kahn Gillmor wrote:

I think you might have misunderstood the description of the convention,
actually.  There's no unix domain socket for receiving the descriptors
at all.  Rather, the descriptors are already open and present in the
process.


 OK, after reading your message, then the sd_listen_fds man page, it
appears that I have indeed misunderstood it. My bad. In my defence,
it really is not clear from that page whether the descriptors are already
open in the process that invokes sd_listen_fds(), or whether sd_listen_fds
actually connects to systemd to receive them. I understood the latter, and
I will agree it was my prejudice speaking; the former is much more
reasonable. So let's dive a bit deeper.



exec listen -udp::53 \
-tcp::53 \
-tcp:label=tls:853 \
-unix:label=control,mode=0600:/run/kresd/control \
chpst -u kresd -p 1 \
/usr/sbin/kresd

This means kresd doesn't need to know about dropping privileges, opening
listening ports, or resource constraints at all (listen and chpst take
care of that), but kresd can still retain in-memory state that might be
useful for handling multiple connections with no exec() ever involved.


 I agree with this principle entirely. This is good design. The question
now becomes: how does the daemon know what fd corresponds to what use?

 The simplest, natural answer, used for instance by UCSPI clients and
servers, is: the fd numbers are hardcoded, they're a part of the daemon
interface and that should be documented.
 So, in your example, the kresd documentation should say something like:

 - fd 3 must be a datagram socket, that's where kresd will read its
UDP DNS queries from.
 - fds 4 and 5 must be listening stream sockets, that's where kresd will
read its TCP DNS queries from. kresd uses two sockets for this in order to
allow one of them to be set up over TLS.
 - fd 6 must be a listening Unix domain stream socket, that's where kresd
will read its control messages from.

 To me, that approach is good enough. But you could reasonably argue that's
unwieldy. So the next step is to have a map, a simple key-value store that
maps identifiers to fd numbers. And the simplest implementation of a
key-value store is the environment. So, the kresd documentation could say
something like this:

 kresd will read the values of the UDP_FD, TCP_FD, TLS_FD and CONTROL_FD
environment variables. Those variables should contain an integer which
represents an open file descriptor. kresd will use the descriptor in
$UDP_FD to receive DNS queries over a datagram socket. etc. etc.

 That's much friendlier to the person who writes the run script for kresd,
and that's also pretty easy to implement for the kresd author: read an
integer in an environment variable. All in all, it sounds like a good
solution when the daemon has more than 2 or 3 fds to handle.
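 On the daemon side, adopting such a descriptor really is just a lookup
plus an adoption. A sketch of the scheme described above (UDP_FD etc.
are the hypothetical per-daemon variables from this message;
socket.socket(fileno=...) auto-detects family and type on Python 3.7+):

```python
import os
import socket

def socket_from_env(var):
    """Adopt an inherited socket whose fd number is published in $var."""
    fd = int(os.environ[var])
    # Family and type are read back from the fd itself, so the run script
    # is free to bind whatever kind of socket it wants.
    return socket.socket(fileno=fd)
```

The run script sets UDP_FD=... before exec()ing the daemon, and the
daemon calls socket_from_env("UDP_FD"), socket_from_env("CONTROL_FD"),
and so on for each descriptor it documents.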

 But, you say, that's not generic enough a mechanism! A process manager
cannot pass a flock of environment variables to a daemon, the names of
which depend on the daemon entirely! It has to settle on a few standardized
environment variable names.
 Well, first, yes, it definitely can pass a flock of environment variables,
that's what a run script is for. I'm pretty sure systemd even has syntactic
sugar to change the daemon's environment before spawning it. With s6,
you can also store your variable set into the filesystem and read it via
s6-envdir.
 Second, even if you don't want to do that, it's a simple map, and a map
is easily passed as a single variable in the environment, provided you
reserve a separator character. Let's say your process manager will pass
the LISTEN_FD_MAP environment variable to daemons, and labels aren't
allowed to use commas:
 LISTEN_FD_MAP=udp=3,tcp=4,tls=5,control=6
 It's a bit more work for the daemon, but you can still relatively easily
extract a map from labels to fd numbers from a single environment variable.
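 That "bit more work" is roughly the following (again with LISTEN_FD_MAP
as the hypothetical variable, and labels free of ',' and '='):

```python
def parse_listen_fd_map(value):
    """Split a hypothetical LISTEN_FD_MAP value into a {label: fd} dict."""
    result = {}
    for entry in value.split(","):
        label, _, fd = entry.partition("=")
        result[label] = int(fd)
    return result
```

One split on the reserved separator, one split per entry, and the
daemon has its map.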

 OK, now let's have a look at LISTEN_FDS.

 LISTEN_FDS is designed to be usable without LISTEN_FDNAMES, i.e. as a
list. And in that case, you have a single piece of information: the number
of open fds. Which means:
 - the starting fd has to be hardcoded (3)
 - the fds have to be consecutive.
 So here you'd have LISTEN_FDS=4 and the fds would be 3,4,5,6. The daemon
has to hardcode that 3 is udp, 4 is tcp, 5 is tls and 6 is control - it's
exactly as unwieldy for the run script author, and it's less flexible
because neither the daemon nor the script author can even choose what fds
are allocated! The numbers are enforced by the API. This is not good.
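 In code, the bare-list case really does reduce to a count, which is the
whole problem (3 is the hardcoded starting fd from the convention; this
is a sketch, not libsystemd's implementation):

```python
def listen_fds(environ):
    """With only LISTEN_FDS, all a daemon learns is a count: fds 3..3+N-1."""
    count = int(environ.get("LISTEN_FDS", "0"))
    return list(range(3, 3 + count))
```

Everything else -- which fd is udp, which is control -- has to be
hardcoded knowledge shared between the daemon and the run script.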

 Right, so there's the LISTEN_FDNAMES mechanism to help. With LISTEN_FDNAMES,
we have a map: the daemon gets labels to help it identify the fds. This
avoids any number hardcoding, this makes it more flexible for the run
script author.
 Or does it? The API still constrains you: even if you can provide labels,
you still don't send a map, you send a number of fds and a list of labels.

Re: [PATCH 1/3] correct typo

2016-09-01 Thread Gerrit Pape
On Tue, Aug 30, 2016 at 01:15:20PM +0200, Laurent Bercot wrote:
> >I will be happy to send a 0/n introductory message (git send-email
> >--compose) for any future proposed series, i was not aware that was the
> >custom here.  Indeed, i wrote this series because i could find no clear
> >instructions for how to propose improvements to runit's codebase.
> 
>  I'm not sure. This is more of a discussion/questions list than a
> patch-sending list; as far as I'm concerned I don't like to receive
> patches without some discussion before or at least a little context, but
> for runit you'd have to check with the new runit maintainer, if there's
> one - Gerrit, before you leave, please designate one, and if the answer
> is "it's maintained by Debian/whateverdistro these days", then we'll know
> the right place to send patches is the Debian/whateverdistro ML (but the
> actual maintainers should still subscribe to our list and read it).

Hi, I'm not leaving, just getting older..  Although runit now has a new
Debian maintainer, I intend to keep my hands on the original upstream
runit package, most probably not applying many changes to it, if any at
all.  I consider it very stable and usable.  There's one itch

 https://bugs.debian.org/641632

actually reported by Daniel back then.  A proposed patch is available in
the devground branch of the git repository.  But we have actually been
living with this bug for years already; it seems to happen very rarely.

> >Alternately, if there's a better way to propose changes than sending
> >mail to this list, i'd be happy to rewrite code.html to document it, but
> >i don't know what it is.
> 
>  I'm sorry I don't have a clearer answer. Until we know more precisely how
> the future of runit is organized, it's kinda in a limbo state.

I'm not against patches to runit, as long as submitters are not
demotivated by the fact that it may take some time for me to respond,
and probably most of the patches won't make it into my git repository.

As I contributed to git at the end of 2005 and in the years after that,
I'd be fine with submitting patches through the mailing list, as they
still do.  OTOH there has been some discussion on the git list lately
about the mailbox-based development workflow

 https://marc.info/?t=14713666535=1=2

Regards, Gerrit.