Re: Preliminary version of s6-rc available

2015-08-23 Thread Guillermo
Hello,

I have new issues with the current s6-rc git head (after yesterday's
bugfixes), discovered with the following scenario: a service database
with only two longruns, producersvc and loggersvc, the latter
being the former's logger. Loggersvc's service definition directory
had only 'consumer-for', 'run' and 'type' files, the run script being:

#!/bin/execlineb -P
redirfd -w 1 /home/test/logfile
s6-log t 1

This means no readiness notification for this service.

So the issues:

* s6-rc-fdholder-filler appears to have a bug when creating
identifiers for the writing end of the pipe between producersvc and
loggersvc:

$ s6-fdholder-list <scan directory>/s6rc-fdholder/s
pipe:s6rc-r-loggersvc
pipe:s6rc-w-loggersvc5\0xdaU

I also saw this with longer pipelines; identifiers for the reading
ends were OK, identifiers for the writing ends ended with random
characters. I didn't try to start producersvc, since I expected it to
fail trying to retrieve the nonexistent pipe:s6rc-w-loggersvc file
descriptor.

* s6-rc was unable to start loggersvc. More specifically, 's6-rc -v3
change loggersvc' produced this output:

s6-rc: info: bringing selected services up
s6-rc: info: processing service s6rc-fdholder: already up
s6-rc: warning: unable to access <live state dir
symlink>/scandir/loggersvc/notification-fd: No such file or directory
s6-rc: info: processing service loggersvc: starting
s6-ftrigrd: fatal: unable to sync with client: Broken pipe
s6-svlisten1: fatal: unable to ftrigr_startf: Connection timed out
s6-rc: warning: unable to start service loggersvc: command exited 111

However, a manual 's6-svc -uwu <scan directory>/loggersvc' successfully
started the service, and the following test showed that it worked:

$ execlineb -c 's6-fdholder-retrieve <scan directory>/s6rc-fdholder/s
pipe:s6rc-w-loggersvc5\0xdaU fdmove 1 0 echo Test message'

(3 times)

$ cat logfile | s6-tai64nlocal
2015-08-23 18:09:10.822137309  Test message
2015-08-23 18:09:16.871541383  Test message
2015-08-23 18:09:18.219259082  Test message

So I'd have to conclude the problem is in s6-rc, although I didn't see
anything obvious that could launch an s6-svlisten1 process.

Thanks,
G.


Re: Preliminary version of s6-rc available

2015-08-22 Thread Laurent Bercot

 Should be all fixed, thanks!

--
 Laurent


Re: Preliminary version of s6-rc available

2015-08-22 Thread Laurent Bercot

On 22/08/2015 08:26, Colin Booth wrote:

I run my s6 stuff in slashpackage configuration so I missed the
s6-fdholder-filler issue. The slashpackage build puts full paths into all
generated run scripts, so I'm a little surprised it isn't doing that
for standard FHS layouts.


 FHS doesn't guarantee absolute paths. If you don't
--enable-slashpackage, the build system doesn't use absolute paths
and simply assumes your executables are reachable via PATH search.

 Unexported executables are a problem for FHS: by definition, they
must not be accessible via PATH, so they have to be called with an
absolute path anyway. This is a problem when using staging
directories, but FHS can't do any better.

 Here, I had simply forgotten to give the correct prefix to the
s6-fdholder-filler invocation, so the PATH search failed as it is
supposed to.

--
 Laurent



Re: Preliminary version of s6-rc available

2015-08-22 Thread Colin Booth
On Fri, Aug 21, 2015 at 6:36 PM, Guillermo gdiazhartu...@gmail.com wrote:
 Hello,

 I have the following issues with the current s6-rc git head (last
 commit 8bdcc09f699a919b500885f00db15cd0764cebe1):
(snip)


I run my s6 stuff in slashpackage configuration so I missed the
s6-fdholder-filler issue. The slashpackage build puts full paths into all
generated run scripts, so I'm a little surprised it isn't doing that
for standard FHS layouts.

All the uid/gid stuff I've verified as failing in the same ways. I'd
also expect the gid directories either to be their own directories
instead of symlinks, or to have a single access directory that both
the uid and gid entries link to. I also don't know s6-fdholder's
rules well enough, but does it treat uid 0 specially, or if you
specify a non-root uid do you also need to specify root?

Lastly, it appears I had never run `s6-rc-db pipeline longrun' before.
From the source, it's failing in the if (buffer_flush(buffer_1)) call.
I may be wrong, but I think removing the if test and just forcing the
flush is what you want.

Cheers!

-- 
If the doors of perception were cleansed every thing would appear to
man as it is, infinite. For man has closed himself up, till he sees
all things thru' narrow chinks of his cavern.
  --  William Blake


Re: Preliminary version of s6-rc available

2015-08-21 Thread Guillermo
Hello,

I have the following issues with the current s6-rc git head (last
commit 8bdcc09f699a919b500885f00db15cd0764cebe1):

* s6-rc-compile doesn't copy the 'nosetsid' file from the service
definition directory of a longrun to the compiled database directory.

* s6-rc-compile produces an error if it is given the -u option with
more than one user ID. More precisely, if it is called with '-u
uid1,uid2,uid3,...', the error is 's6-rc-compile: fatal: unable to
symlink uid1 to <compiled DB
dir>/servicedirs/s6rc-fdholder/data/rules/uid/uid2: File exists'.

* s6-rc-compile produces an error if it is given the -g option without
the -u option, and otherwise produces rules directories that look
wrong to me (or I didn't understand them). More precisely, if it is
called with '-g gid1,gid2,gid3,...' and no -u option, the error is
's6-rc-compile: fatal: unable to mkdir <compiled DB
dir>/servicedirs/s6rc-fdholder/data/rules/uid/0/env: No such file or
directory'. And if it is called with '-u user -g gid1,gid2,gid3,...',
then:

  + Both s6rc-fdholder and s6rc-oneshot-runner have gid1, gid2, gid3,
... directories, but they show up in data/rules/uid, and

  + s6rc-fdholder has symlinks gid1, gid2, gid3, ... under
data/rules/gid, pointing to data/rules/uid/user, but
s6rc-oneshot-runner has an empty data/rules/gid.

* 's6-rc-db pipeline' displays the expected result, but outputs a
's6-rc-db: fatal: unable to write to stdout: Success' message at the
end.

* Starting s6rc-fdholder produces an 's6-ipcclient: fatal: unable to
exec s6-rc-fdholder-filler: No such file or directory' error. I guess
that's because it lives in libexecdir, which isn't normally in
s6-svscan's PATH, so the run script should probably use the full path?
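
To illustrate that last point, the difference would be something like
this (the /usr/libexec prefix and the surrounding invocation are
hypothetical, not the actual generated script):

# current behavior: relies on a PATH search, which fails under s6-svscan
s6-ipcclient s s6-rc-fdholder-filler
# with a full path, PATH no longer matters
s6-ipcclient s /usr/libexec/s6-rc-fdholder-filler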

Thanks,
G.


Re: Preliminary version of s6-rc available

2015-07-19 Thread Guillermo
2015-07-12 2:59 GMT-03:00 Laurent Bercot:

  s6-rc is available to play with.
 [...] I decided to publish what's already there, so you can test it and
 give feedback while I'm working on the rest. You can compile service
 definition directories, look into the compiled database, and run the
 service manager. It works on my machines, all that's missing is the
 live update capability.

Hi,

Well, I haven't been very lucky with oneshots. First, the #!execline
shebang with no absolute path doesn't work on my system, even if the
execlineb program can be found via the PATH environment variable.
Neither does #!bash, #!python, or any similar construct. If I run
a script from the shell with such a shebang line I get a 'bad
interpreter: No such file or directory' message. And s6-supervise
fails too:

s6-supervise (child): fatal: unable to exec run: No such file or directory
s6-supervise <servicedir name>: warning: unable to spawn ./run -
waiting 10 seconds

And because s6rc-oneshot-runner has a run script with an #!execline
shebang, it cannot start, and therefore oneshots don't work :)
However, I was able to work around this in two ways: either by just
modifying s6rc-oneshot-runner's run script in the servicedirs/
subdirectory of the live state directory, or by using Linux
binfmt_misc magic[1]. So now I'm really curious about how the
#!execline shebang worked on your test systems.
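
For the record, the binfmt_misc route was a registration roughly like
this (assuming binfmt_misc is mounted, the generated scripts start
with '#!execlineb', and execlineb lives in /usr/bin - adjust to your
system); it matches the magic at offset 0 and hands the script to the
real interpreter:

echo ':execlineb:M::#!execlineb::/usr/bin/execlineb:' \
  > /proc/sys/fs/binfmt_misc/register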

But once I could get s6rc-oneshot-runner to start, I ran into another
problem. s6-rc change then failed to run my test oneshot with this
message:

s6-ipcclient: fatal: unable to connect to
/path-to/live/servicedirs/s6rc-oneshot-runne<gibberish>: No such file or
directory
s6-rc: warning: unable to start service <oneshot name>: command exited 111

Here /path-to/live/ stands for the full path of the live state
directory, and the <gibberish> was really a string of random
characters. I suppose this was meant to be the path to
s6rc-oneshot-runner's local socket, but somehow ended up being
gibberish instead. So oneshots still don't work for me :(

Longruns without a logger work for me as expected, and I haven't tried
loggers, bundles and dependencies yet.

Now some other general comments:

* It looks like s6-rc-compile ignores symbolic links to service
definition directories in the source directories specified in the
command line; they seem to have to be real subdirectories. I don't
know if this is deliberate or not, but I'd like symlinks to be allowed
too, just like s6-svscan allows symbolic links to service directories
in its scan directory.

* I'm curious about why it is required to also have a producer file
pointing back from the logger, instead of just a logger file in the
producer's service definition directory. Is it related to the 'parsing
sucks' issue?

* It doesn't really bother me that much, but it might be worth making
down files optional for oneshots, with an absent file being the same
as one containing 'exit', just like finish files are optional for
longruns.

* I second this:

2015-07-14 13:23 GMT-03:00 Colin Booth:

 s6-rc-init: remove the uid 0 restriction to allow non-privileged
 accounts to set up supervision trees.

I test new versions of s6 on an entirely non-root supervision tree,
with services that can be run by that user, separate from the
system-wide (privileged) supervision tree, if any. And it is also
the way I'm testing s6-rc now. But, independently of any potential
use cases, I really see it this way: s6-svscan and s6-supervise are
already installed with mode 0755 and can therefore happily run as any
user besides root. So it is possible to build a non-root supervision
tree, and if some services refuse to run because of 'permission
denied' errors, they will be gracefully dealt with just like any
other failure mode; the user will know via the supervision tree logs,
and no harm is done. So if a non-root supervision tree is allowed, why
not a service manager on top of it, too?

2015-07-16 19:16 GMT-03:00 Laurent Bercot:

  I understand. I guess I can make s6-rc-init and s6-rc 0755 while
 keeping them in /sbin, where Joe User isn't supposed to find them.

It would be nice if s6rc-oneshot-runner's data/rules directory (for
s6-ipcserver-access on the local socket) could also be changed, so it
doesn't allow only root. For example, allow the user s6-rc-init ran as
instead (or in addition to root), or allow the specification of an
allowed user, or a complete rulesdir / rulesfile, with a -u, -i or -x
option to s6-rc-compile or s6-rc-init. The user checked against the
data/rules rulesdir would be the one s6-rc was run as, right? So it
defines which user is allowed to run oneshots?
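
For instance, if that directory follows the usual s6 accessrules
format, allowing one extra user should just be a matter of (uid 1000
illustrative):

mkdir -p data/rules/uid/1000
touch data/rules/uid/1000/allow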

And finally, for the record, it appears that OpenRC doesn't mount /run
as noexec, so at least Gentoo in the non-systemd configuration, and
probably other [GNU/]Linux distributions with OpenRC as part of their
init systems, won't have any problems with service directories under
/run.

Cheers!
G.

[1] http://www.kernel.org/doc/Documentation/binfmt_misc.txt


Re: Preliminary version of s6-rc available

2015-07-19 Thread Laurent Bercot

On 19/07/2015 20:13, Guillermo wrote:

Well, I haven't been very lucky with oneshots. First, the #!execline
shebang with no absolute path doesn't work on my system, even if the
execlineb program can be found via the PATH environment variable.
Neither does #!bash, #!python, or any similar construct. If I run
a script from the shell with such a shebang line I get a 'bad
interpreter: No such file or directory' message.


 Looks like your kernel can't do PATH searches.
 The #!execline shebang worked on Linux 3.10.62 and 3.19.1. But yeah,
it's not standard, so I'll find a way to put absolute paths there, no
big deal.



/path-to/live/servicedirs/s6rc-oneshot-runne<gibberish>: No such file or
directory
s6-rc: warning: unable to start service <oneshot name>: command exited 111

Here /path-to/live/ stands for the full path of the live state
directory, and the <gibberish> was really a string of random
characters. I suppose this was meant to be the path to
s6rc-oneshot-runner's local socket, but somehow ended up being
gibberish instead. So oneshots still don't work for me :(


 I committed a few quick changes lately, I probably messed up some
string copying/termination. I'll investigate and fix this.



* It looks like s6-rc-compile ignores symbolic links to service
definition directories in the source directories specified in the
command line; they seem to have to be real subdirectories. I don't
know if this is deliberate or not, but I'd like symlinks to be allowed
too, just like s6-svscan allows symbolic links to service directories
in its scan directory.


 It was deliberate because I didn't want to read the same subdirectory
twice if there's a symlink to a subdirectory in the same source
directory. But you're right, this is not a good reason, I will remove
the check. Symlinks to a subdirectory in the same place will cause a
duplicate service definition error, though.



* I'm curious about why it is required to also have a producer file
pointing back from the logger, instead of just a logger file in the
producer's service definition directory. Is it related to the 'parsing
sucks' issue?


 It's just so that if the compiler encounters the logger before the
producer, it knows right away that it is involved in a logged service
and doesn't have to do a special pass later on to adjust service
directory names.
 It also doubles as a small database consistency check, and adds
clarity for the reader of the source.

 

* It doesn't really bother me that much, but it might be worth making
down files optional for oneshots, with an absent file being the same
as one containing 'exit', just like finish files are optional for
longruns.


 Right. You can have empty down files already for this purpose; I guess
I could make them entirely optional.
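
 For illustration, creating the empty file is all it takes today
(oneshot name hypothetical):

touch source/myoneshot/down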



The user checked against the
data/rules rulesdir would be the one s6-rc was run as, right? So it
defines which user is allowed to run oneshots?


 Yes. And indeed, allowing s6-rc to be run by normal users implies
changing the configuration of s6rc-oneshot-runner. I'll work on it.



And finally, for the record, it appears that OpenRC doesn't mount /run
as noexec, so at least Gentoo in the non-systemd configuration, and
probably other [GNU/]Linux distributions with OpenRC as part of their
init systems, won't have any problems with service directories under
/run.


 That's good news !

 Thanks a lot for the feedback ! I have a nice week of work ahead of me...

--
 Laurent



Re: Preliminary version of s6-rc available

2015-07-19 Thread Colin Booth
On Fri, Jul 17, 2015 at 10:13 AM, Claes Wallin (韋嘉誠)
skar...@clacke.user.lysator.liu.se wrote:
 On 17-Jul-2015 12:49 am, Colin Booth cathe...@gmail.com wrote:

 Depending on your cron, users might be able to simply put an @reboot
 s6-svscan in their user crontab. I don't see many drawbacks with that.

There's nothing managing the per-user s6-svscan if it dies during
normal system runtime, which defeats the entire purpose of using a
supervision framework in the first place. With process supervision, at
some point your supervision tree must have PID 1 bringing the tree
back up (be it an inittab entry, s6-svscan running as init, runit
managing runsvdir and so on), otherwise you're only playing tricks with
daemonization. Using @reboot crontab entries is a clever way around
the reboot case, but like I said above, it doesn't protect the
supervision root process outside of that event.
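
For reference, the @reboot approach would be a user crontab entry
along these lines (scan directory path illustrative):

@reboot /usr/bin/s6-svscan $HOME/.scandir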

I actually think that systemd-based systems can have a correctly
supervised non-privileged supervision tree through the use of loginctl
enable-linger and daemon-ish unit files. So you could bring up your
supervision tree that way, or just forego the process supervisor and
write directly against systemd. However, I don't have any systemd hosts
lying around to test that on, and even if I did, s6-rc and systemd
both cover the same operational space.
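
If I were to try it, I'd expect something like the following (unit
name and paths are guesses, not a tested configuration): a user unit
in ~/.config/systemd/user/svscan.service,

[Unit]
Description=per-user s6-svscan supervision tree

[Service]
ExecStart=/usr/bin/s6-svscan %h/.scandir
Restart=always

[Install]
WantedBy=default.target

enabled with `systemctl --user enable --now svscan.service', plus
`loginctl enable-linger username' so the tree outlives login sessions.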

Cheers!

-- 
If the doors of perception were cleansed every thing would appear to
man as it is, infinite. For man has closed himself up, till he sees
all things thru' narrow chinks of his cavern.
  --  William Blake


Re: Preliminary version of s6-rc available

2015-07-17 Thread Laurent Bercot

On 17/07/2015 09:26, Rafal Bisingier wrote:

So I run them as a service with a sleep BIG in the
finish script (it's usually unimportant if this runs at the same hours
every day). I can have this sleep in the main process itself, but it
isn't really its job


 I also use a supervision infrastructure as a cron-like tool. In those
cases, I put everything in the run script:
 if { periodic-task } sleep $BIG

 periodic-task's run time is usually more or less negligible compared
to $BIG, and I'm not expecting to be controlling it with signals anyway
- but I like being able to kill the sleep if I want to run
periodic-task again earlier for some reason. So I don't mind executing
a short-lived (even if it takes an hour or so) process in a child, and
then having the run script exec into the sleep. And since
periodic-task exits before the sleep, it doesn't block resources
needlessly.
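
 Spelled out as a complete run script (interval hardcoded for
illustration, since execline doesn't substitute $BIG by itself):

#!/bin/execlineb -P
# run periodic-task in a child; on success, exec into the sleep.
# killing the sleep makes the supervisor rerun the whole script.
if { periodic-task }
sleep 86400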

 Whereas if your sleep is running in the finish script, you have no
way to control it. You stay in a limbo state for $BIG and your service
is basically unresponsive that whole time; it's reported as down (or
'finish' with runit) even though that is its normal, running state. I
find this ugly.

 What do you think ? Is putting your periodic-task in a child an
envisionable solution for you, or do you absolutely need to exec into
the interpreters ?

--
 Laurent



Re: Preliminary version of s6-rc available

2015-07-16 Thread Laurent Bercot

On 16/07/2015 19:22, Colin Booth wrote:

You're right, ./run is up, and being in ./finish doesn't count as up.
At work we use a lot of runit and have a lot more services that do
cleanup in their ./finish scripts so I'm more used to the runit
handling of down statuses (up for ./run, finish for ./finish, and down
for not running). My personal setup, which is pretty much all on s6
(though migrated from runit), only has informational logging in the
./finish scripts so it's rare for my services to ever be in that
interim state for long enough for anything to notice.


 I did some analysis back in the day, and my conclusion was that
admins really wanted to know whether their service was up as opposed
to... not up; and the finish script is clearly not up. I did not
foresee a situation like a service manager, where you would need to
wait for a 'really down' event.



As for notification, maybe 'd' for when ./run dies, and 'D' for when
./finish ends. Though since s6-supervise SIGKILLs long-running
./finish scripts, it encourages people to do their cleanup elsewhere
and as such removes the main reason why you'd want to be notified
when your service is really down. If the s6-supervise timer wasn't
there, I'd definitely suggest sending some message when ./finish went
away.


 Yes, I've gotten some flak for the decision to put a hard time limit
on ./finish execution, and I'm not 100% convinced it's the right
decision - but I'm almost 100% convinced it's less wrong than just
allowing ./finish to block forever.

 ./finish is a destroyer, just like close() or free(). It is nigh
impossible to define sensical semantics that allow a destroyer to fail,
because if it does, then what do you do ? void free() is the right
prototype; int close() is a historical mistake.
 Same with ./finish ; and nobody tests ./finish's exit code and that's
okay, but since ./finish is a user-provided script, it has many more
failure modes than just exiting nonzero - in particular, it can hang
(or simply run for ages). The problem is that while it's alive, the
service is still down, and that's not what the admin wants.
Long-running ./finish scripts are almost always a mistake. And that's
why s6-supervise kills ./finish scripts so brutally.

 I think the only satisfactory answer would be to leave it to the user :
keep killing ./finish scripts on a short timer by default, but have
a configuration option to change the timer or remove it entirely. And
with such an option, a burial notification when ./finish ends becomes
a possibility.
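
 A sketch of what such a knob could look like (hypothetical file name;
value in milliseconds, 0 meaning never kill):

echo 5000 > servicedir/timeout-finish   # SIGKILL ./finish after 5s
echo 0 > servicedir/timeout-finish      # let ./finish run forever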



Ah, gotcha. I was sending explicit timeout values in my s6-rc commands,
not using timeout-up and timeout-down files. Assuming -tN is the
global value, then passing that along definitely makes sense, if
only to bring its behavior in line with the behavior of
timeout-up and timeout-down.


 Those pesky little s6-svlisten1 processes will get nerfed.



Part of my job entails dealing with development servers where
automatic deploys happen pretty frequently but service definitions
don't change too often. So having non-privileged access to a subsection
of the supervision tree is more important than having non-privileged
access to the pre- and post- compiled offline stuff.


 I understand. I guess I can make s6-rc-init and s6-rc 0755 while
keeping them in /sbin, where Joe User isn't supposed to find them.



By the way, that's less secure than running a full non-privileged
subtree.


 Oh, absolutely. It's just that a full setuidgid subtree isn't very
common - but for your use case, a full user service database makes
perfect sense.

--
 Laurent



Re: Preliminary version of s6-rc available

2015-07-14 Thread Colin Booth
On Mon, Jul 13, 2015 at 3:20 PM, Laurent Bercot ska-skaw...@skarnet.org wrote:
  Ah, so that's why you didn't like the 'must not exist yet' requirement.
 OK, got it.
  Yeah, mounting another tmpfs inside the noexec tmpfs can work, thanks
 for the idea. It's still ugly, but a bit less ugly than the other choices.
 I don't see anything inherently bad in nesting tmpfses either, it's just a
 small waste of resources - and distros that insist on having /run noexec
 are probably not the ones that care about thrifty resource management.

It's the (least) ugly option that I can think of. Like I said, not
great but better than the alternative. It does give some nice per-user
isolation as well if you're running multiple sub-trees.
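
For reference, the nested mount is just something like (mountpoint
illustrative):

mkdir -p /run/s6
mount -t tmpfs -o exec,nosuid,mode=0755 tmpfs /run/s6

and then s6-rc-init gets pointed at a not-yet-existing subdirectory of
it, which is where the awkwardness comes in.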

  s6-rc obviously won't mount a tmpfs itself, since the operation is
 system-specific. I will simply document that some distros like to have
 /run noexec and suggest that workaround.

And s6-rc shouldn't be responsible for handling the creation and
mounting of its tmpfs, system specific or not. That's the
responsibility of the system administrator or the package maintainer.


  Yes, I'm going to change that. 'absent' was to ensure that s6-rc-init
 was really called early at boot time in a clean tmpfs, but absent|empty
 should be fine too.

A fresh, empty tmpfs is probably cleaner than a freshly created
directory in a dirty tmpfs (like /run can be), at least if you're
running s6-svscan in non-pid1 mode.

  Landmines indeed. Services aren't guaranteed to keep the same numbers
 from one compiled database to another, so you may well have shuffled the live
 state without noticing, and your next s6-rc change could have very
 unexpected results.

Everything seemed to work out OK, but live-updating stuff without
adjusting the state file seemed dicey.

  But yes, bundle and dependency changes are easy. The hard part is when
 atomic services change, and that's when I need a whiteboard with tables
 and flowcharts everywhere to keep track of what to do in every case.

Yeah, that'll be a bit harder. Good luck with your whiteboarding.


  Please mention them. If you're having trouble with the tools, so will
 other people.

Most of the stuff has been handled by my closer reading of s6-rc -a,
plus the changes to s6-rc list; simply familiarizing myself with
the tools and their output has helped a lot. I did find a few bugs,
documentation or otherwise:

s6-rc-db: `[-d] dependencies servicename' exits 1 if you pass it a
bundle. Interestingly, `all-dependencies servicename' shows the full
dependency tree if you pass it a bundle, and the docs make no special
mention of bundles, so I'm guessing that the failure when checking
dependencies of bundles is a bug and that the docs are correct.

s6-rc-init.html: 'Typical usage' could be misread by someone who
hasn't been working with s6 for a while as saying that s6-rc-init
should be run before the catch-all logger is set up.

index.html: the Discussion location is listed twice.

s6-rc.html: longrun transitions for services without notification
support should say that the service is considered to be up as soon as
s6-supervise has forked and executed ./run. This deals with an
ambiguity for non-supervision experts, who may not think of the
run script as part of the service. This might be talked about in the
s6 docs, but it's important and should be repeated if that is the
case.

s6-rc.html: note that s6-rc will block indefinitely when starting
services with notification support unless a timeout is set. Similar to
the above, dry-running commands will tell you what's going on under
the hood, but otherwise it's a bit of a black box.
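
For example, something like this bounds the transition instead of
waiting forever (timeout in milliseconds; service name illustrative):

s6-rc -t 5000 -u change mylongrun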

s6-rc: if you run `s6-rc -utN change service' and the timeout occurs,
s6-rc -da list still reports the service down (as per the docs), but
subsequent runs of `s6-rc -u change service' complain about not being
able to remove the down file. I'd expect a service that timed out on
startup to still have the down file, since s6-rc-compile.html notes
that down files are used to mark services that s6-rc considers to be
down. Maybe make the removal of the down file the last thing the
startup routine does instead of the first, since I'd consider
interrupting or killing a call to s6-rc the same as timing out (and as
such it shouldn't change the reported state). -dtN has the same
behavior (putting the down file in place before calling s6-svc), but
in that case erring on the side of down feels correct.

s6-svc: -Dd doesn't seem to take finish scripts into account. Not a
bug per se, but somewhat surprising since a run script is considered
to be part of the service. Initially I thought this was an s6-rc
timeout bug, which is why I noticed it here originally.

s6-rc: Unless there's a really good reason not to, -tN should pass
along its timeout value to the forked s6-svc and s6-svlisten1
processes, if for no other reason than to keep impatient
administrators with misbehaving processes and too-low shutdown
timeouts from spawning tons and tons of orphaned s6-svlisten1
processes.


Re: Preliminary version of s6-rc available

2015-07-13 Thread Laurent Bercot

On 13/07/2015 17:35, Colin Booth wrote:

Those options are all bad. My workaround was to mount a new tmpfs
inside of /run (one that wasn't noexec), but that made using s6-rc
annoying due to the 'no directory' requirement. I don't think there's
anything inherently bad about nesting mounts in this way, though I
could be mistaken.


 Ah, so that's why you didn't like the 'must not exist yet' requirement.
OK, got it.
 Yeah, mounting another tmpfs inside the noexec tmpfs can work, thanks
for the idea. It's still ugly, but a bit less ugly than the other choices.
I don't see anything inherently bad in nesting tmpfses either, it's just a
small waste of resources - and distros that insist on having /run noexec
are probably not the ones that care about thrifty resource management.

 s6-rc obviously won't mount a tmpfs itself, since the operation is
system-specific. I will simply document that some distros like to have
/run noexec and suggest that workaround.



My suggestion is for one of: changing the s6-rc-init behavior to
accept an empty or absent directory as a valid target instead of just
absent


 Yes, I'm going to change that. 'absent' was to ensure that s6-rc-init
was really called early at boot time in a clean tmpfs, but absent|empty
should be fine too.



Hm, either the documentation or my reading skills need work (and I'm
not really sure which).


 When in doubt, I'll improve the doc: a good doc should be understandable
even by people with uncertain reading skills. :)



Actually, assuming you're only making bundle and dependency changes,
it looks like swapping out db, n, and resolve.cdb from under s6-rc's
nose works. I'd be unsurprised if there were some landmines in doing
that, but it worked for hot-updating my service sequence.


 Landmines indeed. Services aren't guaranteed to keep the same numbers
from one compiled database to another, so you may well have shuffled the live
state without noticing, and your next s6-rc change could have very
unexpected results.

 But yes, bundle and dependency changes are easy. The hard part is when
atomic services change, and that's when I need a whiteboard with tables
and flowcharts everywhere to keep track of what to do in every case.



Glad to hear it. So far s6-rc feels like what I'd expect from a
supervision-oriented rc system. There are some issues that I haven't
mentioned, but I'm pretty sure those are mostly due to unfamiliarity
with the tools more than anything else.


 Please mention them. If you're having trouble with the tools, so will
other people.

--
 Laurent



Re: Preliminary version of s6-rc available

2015-07-12 Thread Colin Booth
On Sat, Jul 11, 2015 at 10:59 PM, Laurent Bercot
ska-skaw...@skarnet.org wrote:


  So I decided to publish what's already there, so you can test it and
 give feedback while I'm working on the rest. You can compile service
 definition directories, look into the compiled database, and run the
 service manager. It works on my machines, all that's missing is the
 live update capability.

The requirement for the s6-rc-init live directory to not exist is
awkward if you're trying to go with the defaults on a distro system,
since /run is mounted noexec. It's pretty easy to work around, but
then the defaults are broken on distro systems.

It'd be nice if s6-rc-db contents printed a newline after the last
service for single-item bundles:
root@radon:/run/s6# s6-rc-db -l /run/s6/s6-rc/ contents 1
getty-5root@radon:/run/s6# s6-rc-db -l /run/s6/s6-rc/ contents 2
getty-6root@radon:/run/s6# s6-rc-db -l /run/s6/s6-rc/ contents 3
getty-5
getty-6
root@radon:/run/s6#

  Bug-reports more than welcome: they are in demand!

s6-rc -da list doesn't seem to work right. At least, it doesn't work
like I'd expect, which is to show all services that are down. Given
two longruns getty-5 and getty-6, with getty-5 up and getty-6 down,
I'd expect s6-rc -da list to show getty-6 (and s6-rc -ua list to only
show getty-5). Currently -da list and -ua list show the same thing:
root@radon:/run/s6# s6-rc -l /run/s6/s6-rc -da list
getty-5
root@radon:/run/s6# s6-rc -l /run/s6/s6-rc -ua list
getty-5

s6-svstat shows the correct status of the world:
root@radon:/run/s6# s6-svstat service/getty-5/
up (pid 8944) 301 seconds
root@radon:/run/s6# s6-svstat service/getty-6/
down (signal SIGTERM) 206 seconds

`s6-rc -ua change' also doesn't seem to do what I'd expect. `s6-rc -da
change' brings down all running services, `s6-rc -pda change' brings
down all running services and then starts all stopped services.
Following that logic I'd expect `s6-rc -ua change' to start all
stopped services, however it instead appears to do nothing. My guess
is that it's related to the issues with -a above and that -a is only
ever returning the things in the up group.

Not exactly a bug, but the docs are wrong: the index page points to
s6-rc-upgrade when it should point to s6-rc-update.

 --
  Laurent

Lastly, I know you're working on it but s6-rc-update will be much
appreciated. Having to tear down the entire supervision tree, delete
the compiled and live directories, and then re-initialize everything
with s6-rc-init is awkward to say the least. Especially with the above
issues involving s6-rc-init and not being able to overwrite the
contents of directories if they exist.

That's everything I've found in an hour or two of messing around. I
haven't done anything with oneshots or larger dependency trees yet; so
far it's just been a few getty processes and some wrapper bundles.

Cheers!

-- 
If the doors of perception were cleansed every thing would appear to
man as it is, infinite. For man has closed himself up, till he sees
all things thru' narrow chinks of his cavern.
  --  William Blake