RE: Determining the running init

2016-05-05 Thread James Powell
From: Steve Litt
Sent: Monday, May 2, 2016 4:18 PM
To: supervision@list.skarnet.org
Subject: Re: Determining the running init

On Mon, 02 May 2016 16:20:20 -0400
James Cloos  wrote:

> I don't have a runit system up right now.
> 
> Could someone show me what:
> 
>   lsof -a -p 1 -d txt
> 
> outputs when run on a runit-based system?
> 
> Thanks!
> 
> -JimC

[root@mydesk ~]# lsof -a -p 1 -d txt
COMMAND PID USER  FD   TYPE DEVICE SIZE/OFF    NODE NAME
runit 1 root txt    REG    8,1   754872 2364332 /usr/bin/runit
[root@mydesk ~]#

Beware that this is Runit as installed by Void Linux. I suppose
theoretically a different type of Runit installation could yield a
different result.
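
On Linux the same question can also be answered without lsof, by reading the /proc/1/exe symlink, which names the executable that the txt row above reports. The following is a minimal sketch under that assumption; the function name running_init and its proc_root parameter are illustrative only, and reading the link for PID 1 normally needs root, just like the lsof call.

```python
import os

def running_init(proc_root="/proc"):
    """Return the path of the executable running as PID 1.

    Reads the <proc_root>/1/exe symlink. proc_root is a parameter
    only so the function can be pointed at a fake tree for testing.
    """
    return os.readlink(os.path.join(proc_root, "1", "exe"))
```

On the Void Linux machine shown above this would be expected to return the same /usr/bin/runit path that lsof printed; other runit installations may differ.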

SteveT

Steve Litt 
April 2016 featured book: Rapid Learning for the 21st Century
http://www.troubleshooters.com/rl21


Runit installations vary by implementation, but most follow the same 
guidelines for stage 2. Stage 1 varies depending on how the scripts are 
set up: you can have a monolithic stage 1 or a modular stage 1.

-Jim


RE: OpenRC findings (was: Some suggestions about s6 and s6-rc)

2015-09-21 Thread James Powell
A lot of problems still stem from the fact that most scripts are still written 
in Bash with sysvinit-isms embedded within. OpenRC and the service init scripts 
could be modernized further to use s6, runit, or daemontools-encore style 
commands, and possibly even execline to some extent.

However, OpenRC does things sysvinit never got to or could without getting very 
hackish with scripts. It was a step in the right direction, but more and 
further steps could be taken to further extend OpenRC possibly with a 
supervisor the same way OpenRC extended sysvinit in handling startup of system 
and services.

But that's my $0.02.

From: Laurent Bercot
Sent: 9/20/2015 7:07 PM
To: supervision@list.skarnet.org
Subject: OpenRC findings (was: Some suggestions about s6 and s6-rc)

On 20/09/2015 11:12, Laurent Bercot wrote:
>   Interesting, thanks for the notice. I'll have to download OpenRC and
> perform experiments to see exactly what it's doing.

  So, I downloaded OpenRC, compiled it - the build process makes a lot
of annoying assumptions, such as "ncurses is there and has been
installed with pkg-config" - and experimented with it.

  I'll spare you the details, but give the essential results, which
are... interesting, to say the least.

  * Strictly speaking, I was wrong: on a high level, rc_parallel
does honor the dependency graph. OpenRC only starts a subprocess
for a service when all the subprocesses for the service's dependencies
have exited. Functionally, it's looking good. Which is strange,
because with a fundamentally serial design, it shouldn't be able to
do that... so I explored more.

  * The way OpenRC implements rc_parallel is this: instead of starting
one subprocess at a time like in the serial case, it starts all the
subprocesses for the whole dependency chain at the same time, and lets
them sort themselves out.

  * The way a subprocess knows when it can run: it attempts to take a
lock on a specific file for that subprocess. When it manages to grab
the lock, it means that all its dependencies have completed.
But it doesn't attempt to grab the lock in blocking way. Instead...
it uses a nonblocking flock() call that it loops around with a 20 ms
sleep in-between two calls.

  In other words, for every service that is going to be brought up
(or down) but is waiting for a dependency to complete, OpenRC spawns
a new process that polls on a lock file every 20 milliseconds.
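
For readers who want to see the pattern being criticised, here is a minimal sketch of a polled non-blocking flock(). It is not OpenRC's code; the function name poll_for_lock and the max_tries parameter are made up for illustration, and only the 20 ms interval comes from the description above.

```python
import fcntl
import time

def poll_for_lock(fd, interval=0.02, max_tries=None):
    """Spin on a non-blocking flock() until it succeeds.

    Returns the number of failed attempts before the lock was taken,
    or None if max_tries attempts all failed.
    """
    tries = 0
    while max_tries is None or tries < max_tries:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return tries
        except BlockingIOError:
            # Someone else holds the lock: nap, then ask again.
            tries += 1
            time.sleep(interval)
    return None
```

Every waiter running this loop wakes up fifty times per second whether or not anything has changed, and nothing decides which waiter wins when the lock is finally released.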

  This is EVIL and irresponsible. One poll is bad enough, but N
parallel polls, N being the number of processes waiting for
dependencies? This can quickly amount to a non-negligible load,
for big dependency graphs.

  And it's not even correct! You can't poll a lock. Another OpenRC
invocation at the wrong time may very well grab the lock that has
just been released by the dependencies, and the process polling on
that lock will fail to take it, again. Race conditions like this are
how you corrupt a database. This is Oracle-level coding, boys, I
hope I'm not infringing a EULA here.

  But of course, you can't block on a lock either, because if you do,
you can't implement a timeout. There's a solution to that: fork a
thread, or a process. The s6lock library does that. Among all the
processes spawned by an OpenRC invocation, including at least one
shell script per service, forking one to handle timed locks would
not have been a bad idea. But polling the lock? No, just no.
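
One way to get a timed lock without polling is sketched below: a forked helper blocks in flock() while the parent waits on a pipe with a timeout. This is only an illustration of the "fork a process" idea, not the s6lock implementation; lock_with_timeout is a hypothetical name. It relies on flock() locks belonging to the open file description, which the forked child shares with the parent.

```python
import fcntl
import os
import select
import signal

def lock_with_timeout(fd, timeout):
    """Take an exclusive flock on fd, giving up after `timeout` seconds.

    Returns True if the lock is now held through fd, False otherwise.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Helper: block until the lock is granted, then tell the parent.
        os.close(r)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX)
            os.write(w, b"1")
        finally:
            os._exit(0)
    os.close(w)
    # Parent: sleep until the helper reports success or time runs out.
    ready, _, _ = select.select([r], [], [], timeout)
    got = bool(ready) and os.read(r, 1) == b"1"
    if not got:
        os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    os.close(r)
    if not got:
        # The helper may have won the lock just after the timeout.
        fcntl.flock(fd, fcntl.LOCK_UN)
    return got
```

The parent sleeps in select() for the whole wait, so no wakeups happen until the lock is granted or the timeout expires.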

  Even with a correct solution such as "use s6lock_* functions or
similar to attempt a lock grab", this would not entirely solve the
problem, because a concurrent OpenRC invocation could also attempt
to grab the same lock and win, keeping the service in an unfinished
state! Lock-grabbers have no priority, there's no guaranteed order
in which they will get the lock.

  "One lock per service, released when the service has no more pending
dependencies" is just not the right mechanism. It's not powerful
enough to guarantee correct behaviour. Especially when you poll the
lock, which you should *never* do.

  So, even if I was wrong in my initial assessment, and OpenRC does
honor the dependency graph when the rc_parallel option is used, I
still cannot recommend the use of that option, because as for now,
its implementation is an ugly, ugly hack. And after what I've seen
of OpenRC, and the bloated straces that result from invoking it,
I'm even reluctant to recommend it as a serial service manager. :(

--
  Laurent



RE: Some suggestions about s6 and s6-rc

2015-09-20 Thread James Powell
Well generally I hatched the idea in my head when I started looking at how 
OpenRC can use stacked runlevels and it made me think of the typical stage-2 
script which acts as a master controller script to start the supervisor and 
execute everything in the service directory. Because s6 provides a supervisor 
and sysvinit provides the executables to bring the system up and/or down, and 
OpenRC can provide stacked runlevels executing services in parallel, why not 
use all three in synchronization with each other to perform vital functions 
betwixt and between the other.

Example:

Typically a service like dbus is started with the stage-2 script executing the 
/service directory as a whole, but controlling dbus directly requires some 
abstract commands specific to s6 but nothing else. My idea was why not 
eliminate the /service directory and have everything run in parallel via 
OpenRC, but instead of calling the supervisor to execute everything as just s6, 
wrap the s6 commands via the start, stop, status, and restart calls in each 
script. This way OpenRC and s6 utilize equal control over the service, the 
service runs with supervision, and runs in parallel start up with the necessary 
state tracker checks of service in the s6 scripts separate from OpenRC.

Basically, s6 gets micromanaged. It sounds complex, but in thinking, it's 
relatively human in thinking like BSDinit is. You don't technically eliminate 
stage-2, but create multiple stage-2-like instances unique to each service for 
tighter control.

Am I making any sense here? Is this workable to some extent? I'm just trying to 
think outside the box here.

-Jim

From: Laurent Bercot<mailto:ska-supervis...@skarnet.org>
Sent: 9/19/2015 7:52 AM
To: supervision@list.skarnet.org<mailto:supervision@list.skarnet.org>
Subject: Re: Some suggestions about s6 and s6-rc

On 19/09/2015 14:52, James Powell wrote:
> I don't see it, rc_parallel, as entirely broken, that is if you
> follow proper scripting techniques and create the proper dependency
> prestarts.

  Even if you do, it's not guaranteed to work as long as you don't
have a way to notify readiness. In the serial case, OpenRC starts a
subprocess to start a service, and readiness is assumed when the
subprocess exits. That defers readiness test to the subprocess, which
is perfectly reasonable.
  With rc_parallel, you just don't wait for the subprocess to exit.
I haven't studied the code in detail, but without any readiness
notification system, there's no way it's going to respect the
dependency graph. It's basically "start everything at the same
time, and yolo". Which defeats the purpose of a dependency-based
service manager.
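
To make "a way to notify readiness" concrete, here is a minimal sketch of one possible convention: the service writes a newline to a pipe when it is ready, and the manager waits for that newline with a timeout. The convention and the name wait_ready are assumptions for illustration; this is not taken from the OpenRC or s6 sources.

```python
import os
import select

def wait_ready(fd, timeout):
    """Return True if a newline arrives on fd within `timeout` seconds.

    A timeout, or end-of-file without a newline (the service died
    before becoming ready), both count as "not ready".
    """
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return False
    return b"\n" in os.read(fd, 64)
```

A manager would only start the dependents of a service once wait_ready() has returned True for it, instead of assuming readiness when the starting subprocess exits.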


> I've often wondered if services started via OpenRC could be run
> wrapped to s6, such as instead of scripting to start the daemon
> normally via direct execution, you start it wrapped via OpenRC by
> executing the s6 run script and stopped by the finish script within
> the OpenRC script acting as a manager layer.

  I think that's what the "supervisor=s6" variable does.
  See https://github.com/OpenRC/openrc/blob/master/s6-guide.md

--
  Laurent


RE: [ale] systemd talk from July has slide deck online now

2015-09-09 Thread James Powell
That support does launch s6, granted, but what I'm getting at is this:

Use OpenRC to launch each individual service with a wrapped s6 service script 
using a sort of stacked runlevel schema. Each service launches against s6, but 
rather than be controlled strictly by s6, it's controlled through OpenRC with 
the exception of the supervisor which can restart a failed service using s6 
while still maintaining control through OpenRC.

-James

From: post-sysv<mailto:boycottsyst...@openmailbox.org>
Sent: 9/8/2015 9:01 PM
To: a...@ale.org<mailto:a...@ale.org>
Cc: supervision@list.skarnet.org<mailto:supervision@list.skarnet.org>
Subject: Re: [ale] systemd talk from July has slide deck online now

On 09/08/2015 11:44 PM, James Powell wrote:
> I have wondered if OpenRC could have scripts that hook into s6 to start 
> services via s6 and run in supervision mode while having OpenRC manage order 
> of process startup and sysvinit handles the remainder of init functionality. 
> In a way you combine sysvinit, OpenRC, and s6 in a harmonious union. It's one 
> way of doing things, but it does give insight into mixing different services 
> to run a system.

OpenRC, since 0.16, *does* in fact support s6 integration:
https://github.com/OpenRC/openrc/blob/master/s6-guide.md


RE: [ale] systemd talk from July has slide deck online now

2015-09-08 Thread James Powell
I have wondered if OpenRC could have scripts that hook into s6 to start 
services via s6 and run in supervision mode while having OpenRC manage order of 
process startup and sysvinit handles the remainder of init functionality. In a 
way you combine sysvinit, OpenRC, and s6 in a harmonious union. It's one way of 
doing things, but it does give insight into mixing different services to run a 
system.

-James

From: Colin Booth
Sent: 9/8/2015 6:22 PM
To: a...@ale.org; supervision@list.skarnet.org
Subject: Re: [ale] systemd talk from July has slide deck online now

> Hi Jim,
>
> Here are some of the many good inits:
>
> * runit: Serves as both PID1 and typical daemontools-style service
>   manager.
>
Runit is a pid1 init, which runs a service manager as a non-pid1
supervision tree root. runit runs a script called `1' to handle all
post-boot initialization oneshots, followed by a script called `2'
which serves to exec into your supervision scanner (runsvdir), and
runs `3' to handle all system teardown.
>
> * s6: Very advanced daemontools-style service manager. Requires some
>   other PID1. Sysvinit will fill that bill (without any of the init
>   scripts: Just one line in /etc/inittab). Personally, I used
>   Suckless-Init  to implement PID1, and LittKit to provide
>   deterministic startup order of services.
>
s6-svscan is designed to be a stage 2 init and supervision root but
can happily run as a non-init process under another service. Compared
to runit, where pid1 is runit, and it forks out different scripts for
each stage, when run as an init s6 uses a script for stage 1, execs
into s6-svscan for stage 2, and uses a script again for stage 3. I use
s6-svscan as an init at home and as a runsvdir/svscan (daemontools)
non-pid1 supervision scanner at work.
>
> * s6-rc: This is coming out this month: I haven't used it. From what I
>   understand, this has raised the bar by combining a top quality PID1
>   with the s6 service manager.
>
s6-rc has nothing to do with pid1 init and everything to do with
solving some major problems that supervision suites have. The base s6
service already handles init if you want it to. s6-rc is a system for
handling service ordering dependencies and the ability to call oneshot
scripts from inside of a supervision context. It is true that s6-init
is a lot friendlier when coupled with s6-rc since it allows you to
move most system initialization out of the very delicate stage1
period, but to say it's an init system for s6 does both s6 and itself
a disservice. If anything, s6 is the init and supervisor, s6-rc is the
service manager.
>
> * nosh: Another PID1 plus daemontools-style service manager. Its
>   runscripts require a special language, I was unable to compile it
>   eight months ago. Judging from the many things I've heard its author
>   saying, this should be an excellent init if you can get it running.
>
As far as I know it, nosh started as a new-school init for FreeBSD,
since launchd was never going to happen, and upstart was linux-only.
As it stands, nosh is playing in the same space as upstart, launchd,
and systemd. I haven't used it (I use bsd init + s6 on my one freebsd
system, and straight bsd init on my openbsd router) but I'm sure it's
solid.
>
> * Epoch: Trivially easy init system with declarative config file
>   instead of init scripts, run scripts, or unit files. If I need to
>   alt-init a computer in two hours, I'll be using Epoch.
>
I haven't used it but epoch would make a great init for a supervisor
that wasn't capable of doing that itself. daemontools or
daemontools-encore would do great with epoch since 95+% of your
non-boot stuff will be handled by daemontools already. Jonathan, feel
free to correct me if I'm wrong.
>
> * OpenRC: A sort of advanced version of what sysvinit should have been.
>   I've used it a couple times. I understand it has the ability to do
>   all the same stuff as systemd achieves with its socket-activation. Is
>   not capable of automatically rerunning crashed services.
>
OpenRC is a service manager that handles the service half of init
(your rc scripts). It isn't a pid1 but instead hooks into your init,
supervisors, etc to manage service grouping, ordering, and the like.
>
> * RichFelker + LittKit + daemontools-encore: If you want the simplest
>   possible init: One that absolutely anybody can understand and
>   troubleshoot, this is it. Includes a 16 lines-of-C PID1. You can
>   replace RichFelker with Suckless-Init if you want your PID1 to listen
>   for SIGCHLD, SIGINT and SIGUSR1 and do the right thing, substitute
>   the 83 lines-of-C Suckless Init. You'll need to write your own
>   shutdown script, which isn't particularly difficult. If you want to
>   know what RichFelker PID1 is, see the bottom of this page:
>   http://ewontfix.com/14/
>
Rich Felker's init and suckless init both have one major flaw that
keeps 

RE: runit maintenance - Re: patch: sv check should wait when svrun is not ready

2015-06-18 Thread James Powell
You could always craft a rudimentary Makefile, autogen.sh, and configure script 
set in the tree root. Then again, I, like others, just use install -v XXX and 
cp -a to put the files into directories as needed anyway, and retune the 
default sv scan directory with a patch.

Sent from my Windows Phone

From: Gerrit Pape<mailto:p...@smarden.org>
Sent: 6/18/2015 7:54 AM
To: supervision@list.skarnet.org<mailto:supervision@list.skarnet.org>
Subject: runit maintenance - Re: patch: sv check should wait when svrun is not 
ready

Hi all,

I'm around, but currently not very active in my open source activities.

runit started in 2001 and was tracked in CVS.  After I took over Debian
maintainership of git in 2005, I converted the repository to git.  The
complete history is available since then

 git clone http://smarden.org/git/runit.git/

Every here and then I think about doing some work on runit, like
integrating patches, but find the file layout in the repo, the build and
installation process, so archaic that I stop again, most of the time.
It's about 13 years old..

Those two things are the two top items on my TODO list.

Regards, Gerrit.


On Thu, Jun 18, 2015 at 10:35:45AM +, James Byrne wrote:
 Similarly, I have two patches which I submitted to the mailing list in 
 February which I would like to get merged:

 http://www.mail-archive.com/supervision@list.skarnet.org/msg00500.html

 http://www.mail-archive.com/supervision@list.skarnet.org/msg00501.html

 It would be useful if Gerrit could respond to confirm whether he is still 
 accepting patches to runit or planning to do any future releases.

 Regards,

 James

 -Original Message-
 From: supervision@list.skarnet.org [mailto:supervision@list.skarnet.org] On 
 Behalf Of Avery Payne
 Sent: 16 June 2015 18:43
 To: Buck Evan
 Cc: supervision@list.skarnet.org
 Subject: Re: patch: sv check should wait when svrun is not ready

 I'm not the maintainer of any C code, anywhere.  While I do host a mirror or 
 two on bitbucket, I only do humble scripts, sorry.  Gerrit is around, he's 
 just a bit elusive.

 On 6/16/2015 9:37 AM, Buck Evan wrote:
  I'd still like to get this merged.
 
  Avery: are you the current maintainer?
  I haven't seen Gerrit Pape on the list.
 
  On Tue, Feb 17, 2015 at 4:49 PM, Buck Evan <b...@yelp.com> wrote:
 
  On Tue, Feb 17, 2015 at 4:20 PM, Avery Payne
  <avery.p.pa...@gmail.com> wrote:
  
   On 2/17/2015 11:02 AM, Buck Evan wrote:
  
   I think there's only three cases here:
  
   1. Users that would have gotten immediate failure, and no amount of
      spinning would help. These users will see their error delayed by
      $SVWAIT seconds, but no other difference.
   2. Users that would have gotten immediate failure, but could have
      gotten a success within $SVWAIT seconds. All of these users will
      of course be glad of the change.
   3. Users that would not have gotten immediate failure. None of these
      users will see the slightest change in behavior.
  
   Do you have a particular scenario in mind when you mention breaking
   lots of existing installations elsewhere due to a default behavior
   change? I don't see that there is any case this change would break.
  snip
 
  Thanks for the thoughtful reply Avery. My background is also
  maintaining business software, although putting it in those terms
  gives me horrific visions of java servlets and soap protocols.
 
   I have to look at it from a viewpoint of what is everything
  else in the system expecting when this code is called.  This
  means thinking in terms of code-as-API, so that calls elsewhere
  don't break.
 
  As a matter of API, sv-check does sometimes take up to $SVWAIT
  seconds to fail.
  Any caller to sv-check will be expecting this (strictly limited)
  delay, in the exceptional case.
  My patch just extends this existing, documented behavior to the
  special case of unable to open supervise/ok.
  The API is unchanged, just the amount of time to return the result
  is changed.
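
The behaviour being argued for can be sketched as a retry loop with a deadline: a probe that cannot even open its control file is treated as "not ready yet" and retried until the wait expires, instead of failing at once. This is an illustration only, not the sv source or the patch; check_with_wait and probe are hypothetical names, and wait_seconds stands in for $SVWAIT.

```python
import time

def check_with_wait(probe, wait_seconds, interval=0.05):
    """Call probe() until it returns True or wait_seconds have passed.

    An OSError from probe() (for example, a control file that does not
    exist yet) is treated like "not ready" and retried.
    """
    deadline = time.monotonic() + wait_seconds
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # supervisor not started yet: keep waiting
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The return value is the same in every case as with an immediate failure; only the time taken to report a failure changes, and it stays bounded by wait_seconds plus one interval.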
 
   This happens because the use of sv check (child) follows the
  convention of check, and either succeed fast or fail fast, ...
 
  Either you're confused about what sv-check does, or I'm confused about
  what you're saying.
  sv-check generally doesn't fail fast (except in the special case I'm
  trying to make no longer fail fast -- svrun is not started).
  Generally it will spin for $SVWAIT seconds before failing.
 
   Without that fast-fail, the logged hint never occurs; the
  sysadmin now has to figure out which of three possible services in
  a dependency chain are causing the hang.
 
  Even if I put the above issue aside, you wouldn't get a hang,
  you'd get the 

RE: comparison

2015-06-16 Thread James Powell
And supervision-scripts has been that generic profile that can be used in 99% 
of situations. Sadly, Avery, I wish we could have had your work about 8 years 
ago. The UNIX world might have been a vastly different place.

Sent from my Windows Phone

From: Avery Payne<mailto:avery.p.pa...@gmail.com>
Sent: 6/16/2015 11:26 AM
To: supervision@list.skarnet.org<mailto:supervision@list.skarnet.org>
Subject: Re: comparison

On 6/16/2015 5:22 AM, James Powell wrote:
 Very true, but something always seems to say something along the lines of if 
 we had done #2 years ago, we might have avoided a huge mess that now exists.
Agreed.
 The same applies to init systems. If there are ready to use feet wetting, 
 taste testing scripts ready to go, the job of importing things just gets 
 easier on the distribution.
Also agreed.  Actually, there's some discussion on the mailing list from
a few months back about this.
 
 From: Steve Litt<mailto:sl...@troubleshooters.com>
 Sent: 6/16/2015 4:45 AM
 To: supervision@list.skarnet.org<mailto:supervision@list.skarnet.org>
 Subject: Re: comparison

 On Tue, 16 Jun 2015 04:05:29 -0700
 James Powell <james4...@hotmail.com> wrote:

 I agree Laurent. Though, even though complete init+supervision
 systems like Runit exist, it's been nearly impossible to get a
 foothold with any alternatives to sysvinit and systemd effectively. I
 think one of the major setbacks has been the lack of ready-to-use
 script sets, like those included with OpenRC, various rehashes of
 sysvinit and bsdinit scripts, and systemd units just aren't there
 ready to go.
The true problem is that each daemon needs its own special environment
variables, command flags, and other gobbledygook that is specific to
getting it up and running, and a master catalog of all settings doesn't
exist.  Compounding that is the normal and inevitable need for each
supervision author to do their own thing, in their own way, so tools get
renamed, flags get mapped, return codes aren't consistent.  That's just
the framework, we haven't talked about run scripts yet.  Who wants to
write hundreds of scripts?  Each hand-cobbled script is an error-prone
task, and that implies the potential for hundreds of errors, bugs,
strange behaviors, etc.

This is the _entire_ reason for supervision-scripts.  It was meant to be
a generic one size fits most solution to providing prefabricated run
scripts, easing or removing the burden for package maintainers, system
builders, etc.  All of the renaming and flags and options and
environment settings and other things are abstracted away as variables
that are correctly set for whatever environment you have.  With all of
that out of the way, it becomes much easier to actually write scripts to
launch things under multiple environments.  A single master script
handles it all, reduces debugging, and can be easily swapped out to
support chainload launchers from s6 and nosh.

The opposite end of this is Laurent's proposal to compile the scripts so
they are built into existence.  If I'm understanding / imagining this
correctly, this would take all of the settings and with a makefile
bake each script into existence with all of the steps and settings
needed.  It would in effect provide the same thing I am doing but it
would make it static to the environment. There's nothing wrong with the
approach, and the end result is the same.

The only difference between Laurent's approach and mine, is that
Laurent's would need to re-bake your scripts if your framework
changes; in my project, you simply run a single script and all of the
needed settings change on the fly.  I'm not sure of the pros/cons to
either approach, as I would hazard a guess that any system switching
between frameworks may also require a reboot if a new init is desired.

Here's the rub: in both cases, the settings for each
service/daemon/whatever are key to getting things running.  Again, we
come back to the idea of a master catalog of settings.  If it existed,
then half of the problem would be resolved.  There are lots of examples
out there, but, they're not all in one place.

So I try to toil over supervision-scripts when I get time, and make that
catalog.  Even if people don't like what I'm doing with the master run
script itself, that doesn't matter.  *What matters is that I've managed
to capture the settings for the various daemons, along with some
annotations*.  Because I took the time to support envdir, and the
settings for each daemon are stored in this format, those settings can
be extracted and used elsewhere.  I'm slowly creating that master
catalog in a plaintext format that can be read and processed easily.
This is the real, hidden value of supervision-scripts.
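
As an illustration of how settings kept in envdir form can be pulled out for use elsewhere, here is a minimal reader. It follows the daemontools envdir convention as I understand it (one file per variable, the first line is the value, trailing spaces and tabs dropped, NUL bytes become newlines, an empty file means "unset"). The function name read_envdir is hypothetical and this is not code from supervision-scripts.

```python
import os

def read_envdir(path):
    """Return {variable: value} for an envdir-style directory.

    A value of None marks a variable that should be unset.
    """
    settings = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if name.startswith(".") or not os.path.isfile(full):
            continue
        with open(full, "rb") as fh:
            data = fh.read()
        if not data:
            settings[name] = None  # empty file: remove the variable
            continue
        first_line = data.split(b"\n", 1)[0]
        value = first_line.rstrip(b" \t").replace(b"\0", b"\n")
        settings[name] = value.decode("utf-8", "replace")
    return settings
```

The resulting dictionary can be dumped to whatever format another tool wants, which is the sense in which the per-daemon settings form a reusable catalog.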

By the way, I'm going to bite the bullet and switch off of MPL 2.0 soon,
probably by month-end.


RE: comparison

2015-06-16 Thread James Powell
You really do have to cater to get a foothold in the door any more.

Sent from my Windows Phone

From: Laurent Bercot<mailto:ska-supervis...@skarnet.org>
Sent: 6/16/2015 4:24 PM
To: supervision@list.skarnet.org<mailto:supervision@list.skarnet.org>
Subject: Re: comparison

On 16/06/2015 22:32, post-sysv wrote:
 Soon systemd arrives with its promise of being a unified userspace
 toolkit that systems developers can supposedly just plug in and
 integrate without hassle to get X, Y and Z advantages. No more
 writing initscripts, no more setting policy because systemd will do
 as much as it can for you. A lazy package maintainer's dream,
 ostensibly.

 Everyone hops on the bandwagon and there you are. Now we get to hear
 about how systemd solves so many long-standing problems with the
 distributions, but I can't shake the feeling that many of them were
 self-inflicted through indifference and/or incompetence.

  I think you nailed it exactly. Inertia is the most potent force in
the universe.
  Basically, to get supervision suites - or anything else - adopted by
distributions, the only necessary thing is to do all their work for
them.

--
  Laurent


RE: staggering runsv startup

2015-06-04 Thread James Powell
If runit had the ability to order processes like OpenRC where you have:

before=
after=

setups, you could order the entire tree structure.
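
As a rough sketch of what such ordering amounts to, before/after declarations can be turned into a start order with a topological sort. The dictionary layout and the name start_order below are invented for illustration; this is neither OpenRC's syntax nor anything runit provides.

```python
from graphlib import TopologicalSorter

def start_order(services):
    """Return service names in an order that honours before/after hints.

    services maps a name to a dict with optional "before" and "after"
    lists of other service names.
    """
    sorter = TopologicalSorter()
    for name, spec in services.items():
        # "after": those services must come first.
        sorter.add(name, *spec.get("after", ()))
        # "before": this service must come first.
        for later in spec.get("before", ()):
            sorter.add(later, name)
    return list(sorter.static_order())
```

For example, start_order({"dbus": {}, "network": {"before": ["sshd"]}, "sshd": {"after": ["dbus"]}}) places both dbus and network ahead of sshd; contradictory declarations make graphlib raise CycleError.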

The problem with sv check is the command often can only check the status of the 
service.

Sent from my Windows Phone

From: Steve Litt<mailto:sl...@troubleshooters.com>
Sent: 6/4/2015 3:33 PM
To: supervision@list.skarnet.org<mailto:supervision@list.skarnet.org>
Subject: Re: staggering runsv startup

On Fri, 05 Jun 2015 00:10:05 +0200
Laurent Bercot <ska-supervis...@skarnet.org> wrote:

   What you really want is a real service manager that works on top
 of a process supervision system and that would manage a complete,
 ordered initialization sequence for you.

   Steve is saying that process supervisors are lacking real service
 management capabilities, and he's right. Process supervision does
 not offer service management; service management is more complex and
 one layer above.

I'm not familiar with the definitions of service managers and process
supervisors, but the simple ability to declare the order of initial,
bootup startup, would go a long way.

I'm not saying such ordering would free me from needing to check for
required other services being ready in this service's run script. I'm
just saying that ordering would eliminate the tendency of such checks
to result in one service coming up per cycle around /service, and would
produce reasonable boot times, because in most cases the required
services *would* be up and functioning by the time the service that
needs them is started.



   There are some tools to accomplish service management on top of
 process supervision. One that I like is anopa:
 http://jjacky.com/anopa/ but it's designed to work with s6, not runit.

   I'm also working on a service management system for s6, that should
 hit beta soon.

This is very good news. Ordering is sorely needed in the daemontools
world, and I think your new S6 service management system would be the
first to do that without kludges like I enumerated in other emails.

Thanks,

SteveT

Steve Litt
June 2015 featured book: The Key to Everyday Excellence
http://www.troubleshooters.com/key


RE: move s6 to github?

2015-04-21 Thread James Powell
I think s6 is fine how it is. Currently many related projects exist on GitHub 
that can help or assist s6, but they are not developed solely on GitHub. 
Runit-for-LFS was based in GoogleCode for logistical reasons until Google 
pulled the plug on us.

I also see s6 not as a true Cathedral, but as a focused project with a 
Benevolent Dictator for Life at the helm guiding the project carefully. It is 
neither cathedral nor bazaar, but a unit of a larger bazaar.

Thanks,
James

Sent from my Windows Phone

From: Laurent Bercot<mailto:ska-supervis...@skarnet.org>
Sent: 4/21/2015 6:24 PM
To: supervision@list.skarnet.org<mailto:supervision@list.skarnet.org>
Subject: Re: move s6 to github?

On 22/04/2015 02:58, Buck Evan wrote:
 Just to set my own expectations, may I send pull requests to github, or
 must I send patches here?

  Please send patches here. Even better, please start a design discussion
about the feature/change you want before writing a patch, unless it is
very small. And above all, please don't get upset if I say no. :)
(If I say no, I will explain why.)


 I brought up the bazaar because you criticized systemd as neglecting "The
 bazaar approach that has made the free software ecosystem what it is
 today;", which made me think s6 would embrace the bazaar in contrast.
 http://skarnet.org/software/s6/systemd.html

  Hm. I can see how it is misleading.

  I actually do not support bazaar as a *development model for a project*.
I believe that quality software can only be written by keeping a tight grip
on what goes in, with a clear vision about the scope and design of the project,
and that can only be achieved with very small teams. Free software following
the bazaar development model is notoriously bad at quality control.

  However, I also believe that a project scope should be limited, and I very
much support the blossoming of as many small-scope projects as can be, and
total freedom about the interfaces and communication points between all those
projects. That is what I call the bazaar approach that has made the free
software ecosystem what it is today: everybody can write software that interacts
with other software on their machine, in the way they choose. I support bazaar
as an *application creation model for an existing system*. To me, that is what
free software is about.

  systemd, unsurprisingly, gets both levels wrong. It has a large developer
base so no coherent vision and bad quality control, *and* it has an insanely
large scope and tries to enforce the use of its own interfaces for new
software development, essentially proprietarizing it.


 While I agree that lines-of-code should not grow fast, I would enjoy seeing
 user uptake grow much more quickly, and I believe that's part of your
 project goal, someday.

  Oh, yes. But users don't have to be developers.


 I think the cpython project shows that populism and restraint aren't
 mutually exclusive features.
 It's a very bazaar-oriented project, but also quite conservative about
 feature creep

  The problem with Python is not feature-creep. It's its existence in the first
place.

--
  Laurent



RE: first round of optional dependency support

2015-01-15 Thread James Powell
Service scripts often need a lot of setup code before the actual daemon is 
executed. My question is, does it provide a fail-safe solution to dependency 
trees?

Shutdown is only an issue if you need a finish script, otherwise the service 
supervisor will execute the kill signal and bring things down.

Sent from my Windows Phone

From: Avery Payne<mailto:avery.p.pa...@gmail.com>
Sent: 1/15/2015 9:11 PM
To: supervision@list.skarnet.org<mailto:supervision@list.skarnet.org>
Subject: first round of optional dependency support

Ok, admittedly I'm excited because it works.

The High Points:

+ It works (ok, yeah, it took me long enough.)
+ Framework-neutral grammar for bringing services up and checking them, no
case-switches needed
+ Uses symlinks (of course) to declare dependencies in a tidy ./needs
directory
+ Can do chain dependencies, where A needs B, B needs C, C starts as a
consequence of starting A
+ Chain dependencies are naive, having no concept of each other beyond what
is in their ./needs directory, so you do NOT need to declare the kitchen
sink when setting up ./needs
+ Is entirely optional, it is off by default, so you get the existing
behavior until enabled
+ Simple activation, you enable it by writing a 1 to a file
+ Smart enough to notice missing definitions or /service entries, a script
will fail until fixed

The So-So Points:

~ Framework grammar makes the working assumption that it follows a
tool-command-service format.  This might be a problem for future frameworks
that require tool-service-command or other grammars.
~ Some distro maintainers may have situations where they compile out
something that will be defined in a ./needs, or may compile in something
that is missing from ./needs; this mismatch will bring tears, but for now,
I am assuming that things are sane enough that these inter-dependencies
will remain intact.
~ I'm not happy with handling of ./env settings, it could have been cleaner
~ Oversight of dependencies is based on the assumption that the supervisor
for the dependency will keep the service propped up and running.
~ Once enabled, you need to start or restart services.  It doesn't affect
running services.
~ Currently, starting a script sends the up command, then checks.  Maybe it
should do check, then up, then check?  That feels wrong - at what point
does it turn into turtles all the way down?

The Low Points:

- Not true dependency management.  It only tackles start-up, not shut-down,
and won't monitor a chain of dependencies for failures or restarts.
- Enormous code bloat.  By the time I finished with the bulk of exception
handling, I felt like I ran a marathon...twice.  The resulting script is
*multiple* times the size of the others.
- The number of dependent commands needed in user-space to run the script
has gone up...way up.  Every additional user-space tool included is
another "does your install have X?" that ultimately limits things -
especially embedded devices.  Did I mention bloat earlier?
- Way too many failure tests, which means...way too many failure paths.
This makes testing much harder.
- There's a bug (or two) lurking in there, my gut tells me so
- Relative pathing is fine for a static install inside of /etc, but what
happens when users try to spawn off their own user-controlled services?  I
smell a security hole in the making...

The Plan:

This will become a part of avahi-daemon and forked-daapd definitions, but
disabled by default.  From everyone else's perspectives, it will function
like it always did, until enabled.  With sv/.env/ENABLE_NEEDS set to 1, for
example, a launch of forked-daapd will bring up avahi-daemon, and
avahi-daemon will bring up dbus.
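
As a rough illustration of the chain behaviour described above, here is a
minimal sketch in POSIX shell. It is not the supervision-scripts code: the
function name start_with_needs and the UP_CMD hook are assumptions invented
for this example, and UP_CMD only stands in for whatever the framework uses
to bring a service up.

```sh
#!/bin/sh
# Hypothetical hook: the command used to bring one service up.
# The default just prints what would be started.
UP_CMD=${UP_CMD:-echo up}

# Hypothetical helper: start everything listed in <svcdir>/needs first,
# depth-first, then the service itself.
start_with_needs() {
    svc=$1
    # Each entry in ./needs is expected to be a symlink to a service directory.
    for dep in "$svc"/needs/*; do
        # An unmatched glob stays literal: neither a link nor a file, so skip it.
        [ -L "$dep" ] || [ -e "$dep" ] || continue
        # A dangling symlink means a missing definition: fail until fixed.
        [ -d "$dep" ] || { echo "missing definition: $dep" >&2; return 1; }
        # Recurse in a subshell so the caller's variables are not clobbered.
        ( start_with_needs "$dep" ) || return 1
    done
    $UP_CMD "$(basename "$svc")"
}
```

With A/needs/B and B/needs/C in place, start_with_needs A starts C, then B,
then A. The sketch has no cycle detection and will issue the up command once
per path that reaches a shared dependency, so it relies on "up" being
harmless to repeat.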

Constructive criticism welcome.  I ask that Laurent leaves his flamethrower
at home - the urge to burn it with fire to purify the project may be
strong. ;)


RE: [announce] s6-2.0.1.0

2015-01-14 Thread James Powell
Going to pull the new version as soon as I can to test in LFS.

Thanks Laurent.

Sent from my Windows Phone

From: Laurent Bercot <ska-supervis...@skarnet.org>
Sent: 1/13/2015 6:15 PM
To: supervision@list.skarnet.org
Subject: [announce] s6-2.0.1.0


  Greetings,

  s6-2.0.1.0 is out.

  It fixes the issue that some people encountered with parallel builds
(make -j2).

  There's a new program, s6-applyuidgid, that is a more generic
s6-setuidgid, meant to be called by programs that want to drop
privileges automatically. (The s6-networking superservers do this.)

  The support for readiness notification has been improved: if a server
supports it and has been launched under s6-notifywhenup, then
s6-svwait -U correctly waits for readiness in every case.

  http://skarnet.org/software/s6/
  git://git.skarnet.org/s6

  Enjoy,
  Bug-reports welcome.

--
  Laurent


RE: redoing the layout of things

2015-01-09 Thread James Powell
Avery, as a good rule of thumb, stick to what you know is best and do what 
you feel is the simplest method with the greatest functionality. If hidden 
directories work to help things work better, use them. Besides, if it already 
works, why change it?

Sent from my Windows Phone

From: Avery Payne <avery.p.pa...@gmail.com>
Sent: 1/9/2015 2:21 PM
To: supervision@list.skarnet.org
Subject: redoing the layout of things

On Thu, Jan 8, 2015 at 3:08 PM, Luke Diamand l...@diamand.org wrote:

 On 08/01/15 17:53, Avery Payne wrote:

 The use of hidden directories was done for administrative and aesthetic
 reasons.  The rationale was that the various templates and scripts and
 utilities shouldn't be mixed in while looking at a display of the various
 definitions.


 Why shouldn't they be mixed in? Surely better to see everything clearly
 and plainly, than to hide some parts away where people won't expect to find
 them. I think this may confuse people, especially if they use tools that
 ignore hidden directories.


Ok, I'll take this as part of the consideration.


 Move everything down one level then?


I've given it a bit of thought.  I would be willing to remove the dots.
However, the current naming convention would create confusion if you were
to eliminate the support directories altogether.  Keep in mind the purpose
of the directories was to separate out functionality and clearly define
what a group of things does; a service template is vastly different from a
logging template.  The script names were meant as a reminder to how they
are used, along with the directories.  This is why there is a run-svlogd,
and not a log-svlogd.  However, I suppose I could rename things to better
match their intended use.  And while I don't want to drop the prefix (for
reasons of clarity when writing the script) as long as the directories
remain, I'm willing to drop those as well.  The proposal would be, inside
of sv/, something like:

/bin
/bin/use-daemontools
/bin/use-runit
/bin/use-s6
/env
/env/PATH
/env/FRAMEWORK
/env/ENABLE_DEPENDS
/finish
/finish/clean
/finish/notify
/finish/force
/log
/log/multilogd
/log/svlogd
/log/s6-log
/log/logger
/log/socklog
/run
/run/envdir
/run/getty
/run/user-service
/(definition 1)
/(definition 2)

...and so on, without the dots.  I'm not wild about the messy appearance
it will give but if it makes adoption easier, then I'll do it.  That, and
we now have five words that are reserved and can never be used by any
service (although I doubt that a service would use any of the above),
because the names exist alongside the definitions.  That was another reason
I wanted dot-files - it was one less thing to worry about, one less issue
that needed attention.

Good thing the bulk of the definitions are symlinks...makes it easy to
switch the directory name. ;)


RE: thoughts on rudimentary dependency handling

2015-01-08 Thread James Powell
I'll be following this intently as I have a project I'm working on that will 
use s6 heavily even discretely.

Sent from my Windows Phone

From: Avery Payne <avery.p.pa...@gmail.com>
Sent: 1/7/2015 11:58 PM
To: supervision@list.skarnet.org
Subject: Re: thoughts on rudimentary dependency handling

On Wed, Jan 7, 2015 at 6:53 PM, Laurent Bercot ska-supervis...@skarnet.org
wrote:

  Unfortunately, the envdir tool, which I use to abstract away the daemons
 and settings, only chain-loads; it would be nice if it had a persistence
 mechanism, so that I could load once for the scope of the shell script.


  Here's an ugly hack that allows you to do that using envdir:
 set -a
 eval $({ env; envdir ../.env env; } | grep -vF -e _= -e SHLVL= | sort |
 uniq -u)
 set +a


Thanks!  When I can carve out a bit of time this week I'll put it in and
finish up the few bits needed.  Most of the dependency loop is already
written, I just didn't have a somewhat clean way of pulling in the
$CMDWHATEVER settings without repeatedly reading ./env over and over.


  It only works for variables you add, though, not for variables you remove.


It will work fine. I'm attempting to pre-load values that will remain
constant inside the scope of the script, so there isn't a need to change
them at runtime.
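
For comparison with the hack above, here is a minimal sketch that reads an
envdir-style directory straight into the current shell without calling envdir
at all. The function name load_envdir is made up for this example. It follows
the envdir file convention (file name is the variable name, first line is the
value, a zero-length file removes the variable), so unlike the eval approach
it also covers removals. It does not strip trailing blanks from values the
way envdir does.

```sh
# Hypothetical helper: import an envdir-style directory into the current
# shell. Call it once at the top of a script instead of chain-loading.
load_envdir() {
    for _f in "$1"/*; do
        [ -f "$_f" ] || continue
        _name=${_f##*/}
        # Skip file names that are not valid shell variable names.
        case $_name in
            ''|[0-9]*|*[!A-Za-z0-9_]*) continue ;;
        esac
        if [ -s "$_f" ]; then
            # Only the first line is used; a missing final newline is fine.
            IFS= read -r _value < "$_f" || :
            export "$_name=$_value"
        else
            # A zero-length file means "remove this variable".
            unset "$_name"
        fi
    done
}
```

A script would call load_envdir ../.env once and then use the variables
directly for the rest of its run.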


RE: s6 init-stage1

2015-01-06 Thread James Powell
The problem of using sockets rather than named pipes is that each UNIX socket 
requires more POSIX shared memory increasing the system resource base 
requirements. Named pipes just use normal process memory which keeps system 
requirements less. How Lennart failed to mention that in the systemd 
presentation is insane.

Sent from my Windows Phone

From: post-sysv <boycottsyst...@openmailbox.org>
Sent: 1/6/2015 12:03 PM
To: Laurent Bercot <ska-supervis...@skarnet.org>
Cc: supervision@list.skarnet.org
Subject: Re: s6 init-stage1

On 01/06/2015 07:48 AM, Laurent Bercot wrote:
 Interesting. Thanks for the heads-up - I had heard of tsort, but didn't
 know exactly what it does.

  However, I'd like a tool that knows what steps it can parallelize.
 A sequential output is great for function names in a piece of code,
 but for services, the point is to start as many as possible in
 parallel, and minimize the amount of synchronization points.

 For instance, given
 1 2
 3 4
 meaning 2 should happen after 1, and 4 should happen after 3,
 tsort gives
 1
 3
 2
 4
 but instead, I need something like
 1 3
 2 4
 because 1 and 3 can happen in parallel, and same for 2 and 4.

  AFAICT, tsort cannot do that. (make might not be able to either,
 but since it's more complex, it's harder to tell.)
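
The grouping asked for above can be sketched with awk and sort. This is only
an illustration, not an existing tool: the function name parallel_waves is
invented here. It reads the same "before after" pairs that tsort accepts and
prints one line per wave, where a node's wave is one more than the latest
wave among its prerequisites.

```sh
# Hypothetical helper: print one line per wave of nodes that can start
# together. Input on stdin: one "before after" pair per line.
parallel_waves() {
    awk '
        NF == 2 {
            level[$1] += 0; level[$2] += 0      # register both nodes at wave 0
            if ($1 != $2) { from[++n] = $1; to[n] = $2 }
        }
        END {
            for (node in level) count++
            # Longest-path relaxation: a node sits one wave after its latest
            # prerequisite. An acyclic graph settles within "count" passes.
            for (pass = 0; pass <= count; pass++) {
                changed = 0
                for (i = 1; i <= n; i++)
                    if (level[to[i]] < level[from[i]] + 1) {
                        level[to[i]] = level[from[i]] + 1
                        changed = 1
                    }
                if (!changed) break
            }
            if (changed) {
                print "parallel_waves: dependency cycle" > "/dev/stderr"
                exit 1
            }
            for (node in level) print level[node], node
        }
    ' | sort -k1,1n -k2,2 | awk '
        NR > 1 && $1 != prev { printf "\n"; sep = "" }
        { prev = $1; printf "%s%s", sep, $2; sep = " " }
        END { if (NR > 0) printf "\n" }
    '
}
```

Feeding it the two pairs from the example (1 2 and 3 4) gives "1 3" on the
first line and "2 4" on the second. Waves are a coarser schedule than strictly
necessary, since 4 only has to wait for 3 and not for 1, but they keep the
number of synchronization points small.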


  About that. Actually, I'm not even certain if there exists a service
manager that *actually* starts processes in parallel. Usually what I've
noticed is that most of the time what is really meant is that services are
started asynchronously, or at best concurrently.

  Debian and other formerly sysvinit-based distributions had what was known
as a Makefile-style concurrent boot. To the best of my knowledge, this was
done using a combination of LSB initscript headers through insserv, and a
program called startpar.

  Reading the source code of startpar, I was surprised to see that it does
its job through a primitive form of socket activation in the run() function
where it allocates a so-called preload socket and determines exit status by
its availability for connection. Secondary routines include meddling with
ptys and file descriptors to curb interleaving and make sure the execution
state is clean and free of potentially blocking operations.

  Makes me wonder if Poettering ever read it, though his ostensible
inspiration was from launchd. That said, it does show that the systemd
supporters have overhyped the novelty of socket activation (inetd) even more
significantly than I had previously thought. Someone should make note of this.

  In any event, I'm under the impression that most so-called parallel
service starters are really ones that start asynchronously in a clean
execution state, as true parallelism and even concurrency sounds conceptually
quite difficult, particularly when you keep in mind that many boot processes
are I/O-bound, primarily. systemd itself has a complex dependency system at
its backbone, with socket activation not being a mandatory thing from what
I've learned. It also blocks on occasion to fulfill start jobs, so evidently
it has synchronization methods that are contrary to its claims.

  If someone can clarify this issue or point to any concurrent/parallel
schemes for starting services at boot time that have been implemented, that
would be appreciated.


RE: runit-scripts gone, supervision-scripts progress

2015-01-02 Thread James Powell
Hey Laurent,

Over at LQ, I'm working on importing s6 into LFS again, but this time at a 
slower pace. I was hoping to also see about using the native LFS utilities as 
much as possible and only include the init-shim tools (halt, shutdown, pause, 
and runlevel scripts and binaries) from Runit-For-LFS for low level system 
management if possible to avoid using more extras.

I have had a thought: why not include symlinkable functionality for halt, 
poweroff, shutdown, and reboot directly in s6-svscanctl, and move s6-pause into 
s6 itself to simplify the packages? You could even have a configure trigger, 
--with-s6-pause, to enable or disable it during build. Just a suggestion, but no 
biggie.

Anyways, I'll be posting more frequently about getting init-stage-1/2/3 drafted 
correctly and in execline script language. Avery, maybe you can share your notes 
on this with me as well, if possible.

Thanks,
Jim

Sent from my Windows Phone

From: Laurent Bercot <ska-supervis...@skarnet.org>
Sent: 1/2/2015 4:59 AM
To: supervision@list.skarnet.org
Subject: Re: runit-scripts gone, supervision-scripts progress


  Hi Avery,
  Happy new year to you !

  Congratulations on the achievements so far, even if they're not reaching
the bar you set for yourself.

  Just a little note:

 + The ./finish concept needs development and refinement.

 + Need to incorporate some kind of alerting or reporting mechanism into
 ./finish, so that the sysadmin receives notifications

  ./finish is a delicate beast. It is not only run when the admin brings
the service down, which is fine, but also when the service stops in an
untimely fashion; and the service cannot start again as long as ./finish
is running. So, if anything time-consuming, or worse, blocking, happens
in ./finish, the service can be totally hosed.
  Services should do all their necessary work in ./run, before executing
into the long-lived process: when they are in ./run, it's a known and
manageable state, they are up, even if they are not ready yet. But in
./finish, it's kind of a limbo state that shouldn't be drawn out. The
service is down, but it's still doing something, can't be brought up
right now, etc. Having a service stuck in finish state is about as
infuriating as having a process stuck in D state on Linux.

  s6-supervise has a built-in protection against misbehaving ./finish
scripts: if ./finish is still around after 5 seconds, it kills it.
(With a SIGKILL. When a service is down is not the time to be polite.)
AFAICT, runsv does not have such a protection, which makes it even more
important to pay attention when writing ./finish scripts.

  One way or the other, ./finish should only be used sparingly, for clean-up
duties that absolutely need to happen when the long-lived process has died:
removing stale or temporary files, for instance. Those should be brief
operations and absolutely cannot block.
  So, if you're implementing reporting in ./finish, make sure you are using
fast, non-blocking commands that just fail (possibly logging an error
message) if they have trouble doing their job.
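
  A minimal sketch of that kind of clean-up, under stated assumptions: the
function name finish_cleanup, the directory argument and the *.pid / *.tmp
patterns are all invented for this example and are not taken from any of the
script collections discussed here. The point is only the shape: one quick,
non-interactive command, an error message if it fails, and an immediate
return.

```sh
#!/bin/sh
# Hypothetical ./finish helper: remove stale files and return right away.
finish_cleanup() {
    rundir=$1
    # rm -f never prompts and does not complain about missing files.
    if ! rm -f -- "$rundir"/*.pid "$rundir"/*.tmp 2>/dev/null; then
        echo "finish: could not clean $rundir" >&2
    fi
    # A failed clean-up is logged, not retried, so ./finish stays brief.
    return 0
}
```

  A ./finish script built on this would call finish_cleanup with the
service's runtime directory and then exit, leaving any reporting to something
outside the supervised service.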

  The way I would implement reporting wouldn't be based on ./finish, but on
an external set of processes listening to down/up/ready notifications in
/service/foobar/event. It would only work with s6, though.

--
  Laurent



RE: New linux distribution using runit

2014-12-07 Thread James Powell
Void has been on our radar for several months now. It actually was a systemd 
distribution, but reverted to Runit and eudev after several months time.

Sent from my Windows Phone

From: Martin Födinger <foedinger.mar...@gmail.com>
Sent: 12/6/2014 3:26 AM
To: supervision@list.skarnet.org
Subject: New linux distribution using runit

Hi

I wanted to report that the Void linux distribution is also using runit
as its default init scheme. The distribution was written from scratch in
2008.

Here is a link to the homepage of the project: http://www.voidlinux.eu/

Have a nice day,
Martin



RE: Transition to supervision-scripts

2014-11-06 Thread James Powell
 Date: Mon, 3 Nov 2014 10:41:05 -0800
 Subject: Transition to supervision-scripts
 From: avery.p.pa...@gmail.com
 To: supervision@list.skarnet.org
 
 The transition is complete and all framework-specific dependencies are
 being replaced with generic redirects.  The runit-scripts repository will
 be deleted January 1st.

Currently the runit-for-lfs project has been studying how to get better control 
of some of the scripts we use as well. Our project is still mostly incomplete, 
with several services still unable to function. Avery, we actually used some of 
your scripts as a reference point, including the use of chpst to run a service 
under a specific user:group. However, we still have a long way to go in 
stabilizing the services. Thank you for hosting them.

Currently, sshd and winbindd just will not operate regardless of what is used 
or done. sshd complains the log service is not working and the service is down. 
Winbindd won't even start even if nmbd and smbd are both started as well. We 
even have issues getting kdm to login, unless it is working but auto-reloads 
itself. Rsyncd doesn't work either.

I'm still wanting to push for a RC2 release by December but at the rate scripts 
are still not working this may not happen, and I still have a lot of other 
scripts to get working as well.

If anyone could get a checkout of the svn at the runit-for-lfs project page 
https://code.google.com/p/runit-for-lfs/ and see where we're going wrong, it 
would greatly help.

Thanks.
Jim
  

RE: License selection for process scripts

2014-10-24 Thread James Powell
I licensed runit-for-lfs under the MIT license because I wanted people to have 
the choice of contributing without compulsion. They can reuse the work, but we 
still have to be referenced only if work was derived.

Sent from my Windows Phone

From: Laurent Bercot <ska-supervis...@skarnet.org>
Sent: 10/23/2014 3:07 PM
To: supervision@list.skarnet.org
Subject: Re: License selection for process scripts

  Licensing is only an issue if you want to distribute code together with
pre-existing licensed code. Since it's unlikely that your scripts will be
distributed together with runit, s6, daemontools, perp, or anything else,
the fact that they all have different licenses should not be a problem for
you - you don't have to worry about compatibility, you can just choose
whatever license you like best for your script packages.

  As for what kind of license is the best, it's a highly religious subject
and I very much wish to avoid this kind of debate on the mailing-list.
  I've found the wikipedia article at
http://en.wikipedia.org/wiki/Comparison_of_free_and_open-source_software_licenses
as well as http://opensource.org/licenses to be nice starting points.

--
  Laurent



RE: [PATCH] Implement stage 4 (execve handoff)

2014-08-01 Thread James Powell
Might be good for a bootdisk scenario in some ways as FINNIX does. However 
outside that, I would have only speculation for a real time system. Still 
intriguing to say the least. 

 Date: Fri, 1 Aug 2014 10:20:10 +
 From: p...@smarden.org
 To: supervision@list.skarnet.org
 Subject: Re: [PATCH] Implement stage 4 (execve handoff)
 
 On Tue, Jul 29, 2014 at 12:01:36PM -0700, Ryan Finnie wrote:
  This patch adds stage 4, an optional stage which is run after stage 3.
  If /etc/runit/4 is found and executable, runit will execve() into it,
  giving it control of PID 1.  Finnix (http://www.finnix.org/) uses runit
  as its init system, and uses this patch as a way to pivot root back into
  a ramdisk upon shutdown, so it may cleanly umount the union mount which
  houses runit.
 
 Hi Ryan, I'm not sure how interesting this is for general use.  It's
 the first time I hear about the need for a stage 4.
 
 And actually I wouldn't call it so, because runit no longer runs but got
 replaced.  Finally, when considering an updated version, documentation
 is missing ;).
 
 Regards, Gerrit.
  

RE: Rare runsv logging problem

2014-07-25 Thread James Powell
My question is why are you running Upstart? Runit has its own init so Upstart 
is pointless. Runit's binary should maintain runsv. It could also be that the 
run script is handling something improperly.

Sent from my Windows Phone

From: Caleb Spare <cesp...@gmail.com>
Sent: 7/25/2014 5:16 PM
To: supervision@list.skarnet.org
Subject: Rare runsv logging problem

Hi,

I've been using runit for a while now and it has been mostly
wonderful. I'm noticing a persistent issue and I'm not sure how to
debug it.

On the servers we're running Ubuntu and we use runit 2.1.1 via the
default package that comes with the distro. Upstart runs runsvdir and
we use runit to manage all of our application processes. Each
application has a simple ./run and ./log/run; the latter execs svlogd
(this is all a typical configuration, as I understand it).

The problem I'm seeing is that, very occasionally, runsv will get into
a bad state where svlogd is not running. (I'm not sure if it fails to
start svlogd or if this happens later on after it has been running
properly.) When the problem occurs, pstree shows something like this:

runsvdir-+-runsv-+-foo---5*[{foo}]
 |   `-svlogd
 |-runsv-+-bar---21*[{bar}]
 |   `-svlogd
 `-runsv---baz---250*[{baz}]

Here you can see that the baz process does not have an associated
svlogd process. Further:

$ sudo sv s foo
run: foo: (pid 4885) 526260s; run: log: (pid 875) 526517s
$ sudo sv s baz
run: baz: (pid 2337) 2983swarning: baz: unable to open supervise/ok:
file does not exist
; run: log: (pid 2337) 2983s

Two strange things there: the warning about supervise/ok and also that
the pid for 'log' is the same as for 'baz'.

When runsv is in this bad state, the output from baz goes right to
runsvdir and ends up in /var/log/upstart/runsvdir.log.

The fix I've been using is to 'sv d baz' and then kill the offending
runsv process. Runsvdir will quickly restart it and then everything
will be working:

runsvdir-+-runsv-+-foo---5*[{foo}]
 |   `-svlogd
 |-runsv-+-baz---25*[{baz}]
 |   `-svlogd
 `-runsv-+-bar---20*[{bar}]
 `-svlogd

I'm unsure what causes this rare problem. We only do simple things
with the runit: sv {t,d,u}. When we deploy services, we rsync a
directory from elsewhere on the box into /etc/services/name and then
'sv t name'. That source dir only has ./run, ./finish, and
./log/run.

Any ideas of what we might be doing wrong, or how to otherwise avoid
this issue? Or if not, what I could do to further debug?

Sorry for the long email; I wanted to be thorough in my description
and avoid making assumptions about what could be causing this problem.

Thanks,
Caleb Spare
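
One way to catch the symptom described above before anyone notices missing
logs is to look for the tell-tale in the status output: the service and its
log reporting the same pid. The following is only a sketch. The function name
same_pid_symptom is invented, and it works purely on the text format shown in
the sv output above, without touching runit itself.

```sh
# Hypothetical check: read the status output for one service on stdin and
# succeed (exit 0) when the first two "(pid N)" fields carry the same pid.
same_pid_symptom() {
    awk '
        {
            line = $0
            # Collect every "(pid N)" occurrence, across all input lines.
            while (match(line, /\(pid [0-9]+\)/)) {
                pid[++n] = substr(line, RSTART + 5, RLENGTH - 6)
                line = substr(line, RSTART + RLENGTH)
            }
        }
        END { exit !(n >= 2 && pid[1] == pid[2]) }
    '
}
```

Piping the baz status shown above into the function reports the symptom,
while the foo status does not. A periodic job could run the check for each
service and raise an alert, which would at least narrow down when the bad
state begins.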


RE: Rare runsv logging problem

2014-07-25 Thread James Powell
Another thing could be that the service may not need a log. I've directed a lot 
of unwanted output to /dev/null.

Can you post one of your run files as an example?

Sent from my Windows Phone


Runit design for LFS.

2014-07-24 Thread James Powell
Figured I'd get some feedback on this:

I'm currently crafting the next major release of Runit for LFS as a replacement 
for both the LFS-Bootscripts and BLFS-Bootscripts packages, which I plan to dub 
LFS-Runit-Services.

The focus of the design will center around a stand-alone stage-1 for booting 
the system equivalent of using LFS-Bootscripts plus the Random Number Generator 
entropy, as well as basic core networking.

Stage-2 will comprise all previous sysvinit operated services, with the 
exception of BLFS's LSB scripts, which are nothing more than trigger scripts run 
by a stage-1 process. This will save a lot of time and effort with the next 
implementation. These stage scripts are redrafts and imported commands from 
VoidLinux and ArchIgnited with some customizations specifically for LFS.

As a result I've also had to import the service scripts for LSB, several 
handling scripts from LFS-Bootscripts for system control, and other things as 
well. I'm currently in the process of drafting a Makefile to install and 
uninstall scripts as needed, and rework the BLFS-services Makefile to perform 
equal actions for the service run files in creating and removing symlinks 
between /var/services (/etc/runit/runsvdir/current symlink) and /etc/sv. The 
goal is to make a complete swap-out from the book.
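
The symlink handling described above could look roughly like the following
shell sketch. It is not the project's Makefile: the function names and the
SVDIR and SERVICEDIR variables are assumptions for illustration, defaulting
to the two paths named in the paragraph.

```sh
# Hypothetical defaults: service definitions live in SVDIR, and the
# supervisor scans SERVICEDIR. Both can be overridden by the caller.
SVDIR=${SVDIR:-/etc/sv}
SERVICEDIR=${SERVICEDIR:-/var/services}

# Enable a service by linking its definition into the scanned directory.
enable_service() {
    [ -d "$SVDIR/$1" ] || { echo "no definition: $SVDIR/$1" >&2; return 1; }
    # Already linked: do nothing, so ln cannot create a link inside the target.
    [ -L "$SERVICEDIR/$1" ] && return 0
    ln -s "$SVDIR/$1" "$SERVICEDIR/$1"
}

# Disable a service by removing only the symlink; the definition stays.
disable_service() {
    [ -L "$SERVICEDIR/$1" ] || return 0
    rm -- "$SERVICEDIR/$1"
}
```

Install and uninstall targets in a Makefile could then call these for each
service name, keeping /etc/sv as the single place where definitions are
edited.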

My goal is to have at least a minimal service set of scripts out for release by 
the first part of August if all goes well.
  

Runit work we've been up to.

2014-07-15 Thread James Powell
Even though our work with s6 is on hold at the moment, I wanted to take some 
time to let you all know that we've been working on our Runit-for-LFS 
implementation, and we've hit several milestones worth mentioning.

1. We've solved our long-running Stage 3 shutdown issue with drives not being 
properly dismounted.

2. We've reduced overhead of re-copying many init scripts for basic command 
executions for Stage 1 down to simple commands in Stage 1 and Stage 3.

3. We've developed a full FHS installation method.

4. All currently available scripts are working properly. All scripts are 
available from our topic on the LinuxQuestions forums here: 
http://www.linuxquestions.org/questions/linux-from-scratch-13/runit-for-lfs-without-sysvinit-official-release-4175506569/
and are free software.

5. And lastly, we've succeeded in several test phases to determine the 
longevity of a new init system, and have determined Runit passes all our 
current criteria, and now we need help redeploying it out for other 
distributions.

The only question I have left is for Gerrit Pape and that is if Runit is still 
under any level of development upstream even if maintenance only, and are there 
any outstanding issues still in need of attention.

- Jim