Re: interesting claims

2019-05-19 Thread Guillermo
On Sun, 19 May 2019 at 8:24, fungal-net wrote:
>
> [...]
> This is Adélie adelielinux.org
> installation on HD.  Although it is confusing to me how they set this up
> still, after months of following its development (beta3), there is
> sysvinit on the first steps of booting then OpenRC takes over, and then
> s6-supervisor handles everything running.  It is like a fruit punch in
> my eyes.

As far as I can tell, at the moment it has a sysvinit + OpenRC setup,
plus an s6 supervision tree anchored to process #1. Unlike other
distributions, the getty processes are part of the supervision tree,
and placed there by an Adélie-specific OpenRC service in the sysinit
runlevel.

> One of the reasons I am trying to learn more about init in general and
> porting s6 to a different system is to use either Adélie or Void-musl
> and have pure s6 on them.

I believe Adélie is heading towards supporting that.

> Both s6/s6-rc and 66 pkgs are available through void's repositories but
> s6-rc has been modified and I haven't been able to get it to work.

Really? As far as I can tell, Void's s6-rc is the unmodified upstream
package, and Void's 66 is the unmodified package from Obarun's
repository.

> Void uses arch-like /bin /sbin --> /usr/bin, Adélie has more traditional
> 4 separate directories.

Yeah, that's both /usr-merge and bin-sbin merge. A whole discussion in itself.

G.


Re: interesting claims

2019-05-19 Thread fungal-net
Guillermo:
>> But although I got curious what "kill -9 -1" would do to different
>> systems I don't see the usefulness of this.
> 
> Since you actually went ahead and did it, and reported the results,
> for me it was interesting to see if they matched what theory says that
> would happen. They did (assuming that what you wrote about the s6 case
> means that the system more or less reconstructed itself).

I am glad some of you can tell more than I can about this, and since you
did, I tried my weirdest setup. This is an Adélie (adelielinux.org)
installation on HD. Even after months of following its development (beta3),
how they set this up is still confusing to me: there is sysvinit in the
first steps of booting, then OpenRC takes over, and then an s6
supervision tree handles everything running. It is like a fruit punch in
my eyes. For those who don't know, this is built on musl.

Running # kill -9 -1 on tty1 brought me back to the tty1 login screen, with 5 more
ttys active. So everything is respawned almost instantly, leaving a system
just as if it had just booted. Doing the same from a terminal in X had
exactly the same outcome.

> Thanks,
> G.

One of the reasons I am trying to learn more about init in general, and
about porting s6 to a different system, is to use either Adélie or Void-musl
and have pure s6 on them. Recent efforts with Void failed, except for
building an Arch kernel and installing Obarun's packages into Void. Not
very clean, but it has worked for months. Dracut is a thing I still need to
learn about, as obstacle #1.

Both the s6/s6-rc and 66 packages are available through Void's repositories,
but s6-rc has been modified and I haven't been able to get it to work.
Void uses an Arch-like layout where /bin and /sbin link to /usr/bin; Adélie
has the more traditional 4 separate directories.



Re: interesting claims

2019-05-18 Thread Guillermo
On Sat, 18 May 2019 at 13:26, fungal-net wrote:
>
> >>>  OpenRC: Nice,
> >>>init
> >>> |_ zsh
> >>>when I exited the shell there was nothing but a dead cursor on my 
> >>> screen
> > [...]
> >> May I ask what was this setup like? You made a different entry for
> >> sysvinit, presumably with the customary getty processes configured in
> >> /etc/inittab 'respawn' entries, judging by your results, so how was
> >> the OpenRC case different?
> >
> > i also wondered whether he used openrc-init here ?
> > [...]
> I remember seeing this although I may have mixed it up.  I have a few
> Artix-OpenRC images and an older Manjaro-OpenRC which was a predecessor.
>  Running both again didn't produce this result.  They just froze with a
> dash on the top left of the screen, didn't poweroff. So I am puzzled now
> what I mixed up.

Ah, the OpenRC variant of Artix. That might explain it. Apparently,
this does mean 'pure' OpenRC indeed, i.e. openrc-init and
openrc-shutdown in addition to the service manager. I didn't know
there were distributions that used this setup.

openrc-init, just like Suckless init, does not currently supervise any
other process, so this test seems to have put the VM in a coma by
killing every process but #1 (after the only apparent survivor, zsh,
exited).

> But although I got curious what "kill -9 -1" would do to different
> systems I don't see the usefulness of this.

Since you actually went ahead and did it, and reported the results,
for me it was interesting to see if they matched what theory says that
would happen. They did (assuming that what you wrote about the s6 case
means that the system more or less reconstructed itself).

Thanks,
G.


Re: interesting claims

2019-05-18 Thread fungal-net
The tests I did were on live images run as VMs.

Jeff:
> 18.05.2019, 00:58, "Guillermo" :
>>>  OpenRC: Nice,
>>>    init
>>> |_ zsh
>>>    when I exited the shell there was nothing but a dead cursor on my screen
> 
> in this case the shell is not signaled since "-1" does not signal the sending
> process.
>> May I ask what was this setup like? You made a different entry for
>> sysvinit, presumably with the customary getty processes configured in
>> /etc/inittab 'respawn' entries, judging by your results, so how was
>> the OpenRC case different?
> 
> i also wondered whether he used openrc-init here ?
> in that case he may have also used openrc's "supervise-daemon" util
> which do not get restarted after they were terminated by the kill -1 -9
> blast and hence cannot respawn the gettys. looks like you were pretty
> hosed when you quit the super-user zsh (which sent the kill blast via
> its "kill" builtin) ?

I remember seeing this, although I may have mixed it up. I have a few
Artix-OpenRC images and an older Manjaro-OpenRC one, which was a predecessor.
Running both again didn't produce this result. They just froze with a
dash in the top left of the screen and didn't power off. So now I am
puzzled about what I mixed up.

> you should provide more information on the used init here as openrc
> is not an init per se and works well with sysv + busybox init, runit, ...

That is clearly the case with OpenRC in some early Refracta images I have,
but I didn't use them. The Devuan version of OpenRC works as an additional
service manager on top of sysvinit. In Artix, any sysv-type scripts must be
limited to the early parts of booting.

> 
>>>  sysV: init and 6 ttys with shell ... nothing can kill it that I know off.
> 
> what do you mean here ?
> were the gettys respawned by SysV init or did they not die at all ?
> where did you send the signal from ?
> i would assume from a super-user zsh on a console tty ?

I am pretty sure I used a Devuan/Miyo image for this one, and I am pretty
sure the gettys were respawned each time I tried it again, as their PIDs
were higher numbered.

For runit I used one Artix and one Void image; they seem to behave the same.

But although I got curious what "kill -9 -1" would do to different
systems, I don't see the usefulness of this. What could possibly,
without intention, do such a thing to a system? An
intruder/virus/trojan trying to mess up your system? I can't see
software malfunctioning in a way that does something like this.

My initial question was what it would be like to kill things down to
pid 1 and whether you can rebuild from there; a tty is still needed,
or an ssh daemon, to access such a system. And this only reverses
stage 2, right?



Re: interesting claims

2019-05-17 Thread Jeff
18.05.2019, 00:58, "Guillermo" :
>>  OpenRC: Nice,
>>    init
>> |_ zsh
>>    when I exited the shell there was nothing but a dead cursor on my screen

in this case the shell is not signaled since "-1" does not signal the sending
process.
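Jeff's point about the sender can be illustrated with a small model. The helper below, `kill_all_targets`, is hypothetical and does not call kill(2); it only filters a list of PIDs according to the rule described in the Linux kill(2) man page: kill(-1, sig) reaches every process the caller may signal except pid 1 and, on Linux, the caller itself. Permission checks are assumed to have been applied already.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* A model, not a call to kill(2): given the PIDs the sender is already
 * permitted to signal, copy into `out` those that kill(-1, sig) would
 * reach on Linux. pid 1 is always skipped, and on Linux so is the
 * sender itself, which is why a root shell survives its own blast. */
size_t kill_all_targets(const pid_t *pids, size_t n, pid_t sender,
                        pid_t *out)
{
    size_t k = 0;

    assert(n == 0 || (pids != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        if (pids[i] == 1 || pids[i] == sender)
            continue;               /* init and the sender are spared */
        out[k++] = pids[i];
    }
    return k;
}
```

With PIDs {1, 42, 100, 101} and the root shell as PID 100, only 42 and 101 would be hit, which matches the zsh still being alive after the OpenRC test.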

> May I ask what was this setup like? You made a different entry for
> sysvinit, presumably with the customary getty processes configured in
> /etc/inittab 'respawn' entries, judging by your results, so how was
> the OpenRC case different?

i also wondered whether he used openrc-init here?
in that case he may have also used openrc's "supervise-daemon" util,
whose instances do not get restarted after being terminated by the kill -9 -1
blast and hence cannot respawn the gettys. looks like you were pretty
hosed when you quit the super-user zsh (which sent the kill blast via
its "kill" builtin)?

you should provide more information on the used init here as openrc
is not an init per se and works well with sysv + busybox init, runit, ...

>>  sysV: init and 6 ttys with shell ... nothing can kill it that I know off.

what do you mean here ?
were the gettys respawned by SysV init or did they not die at all ?
where did you send the signal from ?
i would assume from a super-user zsh on a console tty ?



Re: interesting claims

2019-05-17 Thread Guillermo
Hi,

On Fri, 17 May 2019 at 8:22, fungal-net wrote:
>
> OpenRC: Nice,
>   init
>|_ zsh
>   when I exited the shell there was nothing but a dead cursor on my screen

May I ask what was this setup like? You made a different entry for
sysvinit, presumably with the customary getty processes configured in
/etc/inittab 'respawn' entries, judging by your results, so how was
the OpenRC case different?

> sysV: init and 6 ttys with shell ... nothing can kill it that I know off.

Thanks,
G.


Re: interesting claims

2019-05-17 Thread fungal-net



Laurent Bercot:
> I'm not sure I understand your question, but I think there are
> really two different questions here; I'll try to reformulate them,
> correct me if I'm wrong.
> 
> 1. Is booting a system a linear process where every step is
> reversible?

Well, since I wasn't speaking from experience, I was asking whether it could
possibly be reversible. The answer did help me understand that what may
be theoretically possible is most likely unnecessary. It's like driving
on very busy one-way streets around a block when a parking space
becomes available 5 car lengths behind you: it may be quicker to go
around the block (big fat luck).

> 2. Is it possible to restart a system "from scratch" without
> rebooting?
> 
> The answer to both questions is "not really, but it doesn't matter".
> 
> [...]
>
> Stage 1 isn't reversible; once it's done, you never touch it again,
> you don't need to "reverse" it. It would be akin to also unloading
> the kernel from memory before shutting down - it's just not necessary.

But if you can unload it you can reload it or load a different one?

> [...]
> 
> - If you want to kill every process but pid 1 and have the system
> reconstruct itself from there, then yes, it is possible, and that is
> the whole point of having a supervision tree rooted in pid 1. When
> you kill every process, the supervision tree respawns, so you always
> have a certain set of services running, and the system can always
> recover from whatever you throw at it. Try it: grab a machine with
> a supervision tree and a root shell, run "kill -9 -1", see what
> happens.

Very interesting:
Runit: I've never seen anything power off so fast (Void faster than Artix)
OpenRC: Nice,
  init
   |_ zsh
  when I exited the shell there was nothing but a dead cursor on my screen

s6/66: Good morning! It was as if I had rebooted and was looking at my login prompt.

sysV: init and 6 ttys with shells... nothing can kill it that I know of.

sys.239.D: I hate to say, same behavior as s6/66

***BSD: I must research to find the equivalent of kill -9 -1, but it
seemed like OpenRC behavior.

> 
> -- 
> Laurent
> 
> 


Re: interesting claims

2019-05-16 Thread Dewayne Geraghty
Thanks Laurent, the additional insight is appreciated (you should have a
skarnet.org/software/ page for "Insights and Philosophy" ;) )

A few years ago I managed a small outsourcing operation; the systems were
rebooted when UPS batteries were replaced or a disk mirror problem arose.
Because we used FreeBSD's jails, we could nicely monitor and control the
environment(s); monit helped. I'm looking to do the same this time, but
with HardenedBSD and s6. We're in pursuit of both reliability and
resilience in a headless environment ;)

Sidenote: for Linux folk if curious about "jails" (ie not a chroot jail)
1) https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/jails.html

2) https://www.freebsd.org/cgi/man.cgi?query=jail&apropos=0&sektion=0&manpath=FreeBSD+12.0-RELEASE+and+Ports&arch=default&format=html



Re: interesting claims

2019-05-16 Thread Jeff
16.05.2019, 10:31, "Laurent Bercot" :
>> The Question: As a newbie outsider I wonder, after following the
>> discussion of supervision and tasks on stages (1,2,3), that there is a
>> restrictive linear progression that prevents reversal. In terms of pid1
>> that I may not totally understand, is there a way that an admin can
>> reduce the system back to pid1 and restart processes instead of taking
>> the system down and restarting? If a glitch is found, usually it is
>> corrected and we find it simple to just do a reboot. What if you can
>> fix the problem and do it on the fly. The question would be why (or why
>> not), and I am not sure I can answer it, but if you theoretically can do
>> so, then can you also kill pid2 while pid10 is still running. With my
>> limited vision I see stages as one-way check valves in a series of fluid
>> linear flow.

take a look at (the now defunct) depinit:
http://sf.net/p/depinit/
http://depinit.sf.net/

it is said to provide very extended rollback of dependencies
(so extended gettys will not work with it according to the docs).

> Stage 1 isn't reversible; once it's done, you never touch it again,
> you don't need to "reverse" it. It would be akin to also unloading
> the kernel from memory before shutting down - it's just not necessary.

indeed.
and when something fails in that first stage a super-user rescue shell
should be started to fix it instead of any services that depend on it.
(stupid example: sethostname failed for some reason, spawn a rescue
shell for the admin to do something about it ;-).
in such cases it has to be considered whether this failure is important
enough to justify interrupting the boot phase.

if not: start as many other services as possible,
output/log an error message, keep calm, and carry on,
things can be handled when a getty is up.
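This policy can be sketched as follows, assuming a stage 1 made of ordered steps that each report success or failure. The names (`struct boot_step`, `run_stage1`, and the two example steps) are hypothetical; no init mentioned in this thread is claimed to work this way. A critical failure stops early and asks for a rescue shell; anything else is logged and booting carries on.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stage-1 step: a name, an action returning 0 on success,
 * and whether a failure is serious enough to interrupt booting. */
struct boot_step {
    const char *name;
    int (*run)(void);
    int critical;
};

enum boot_outcome { BOOT_CONTINUE, BOOT_RESCUE };

/* Stand-ins for real steps, for demonstration only. */
int example_step_ok(void) { return 0; }
int example_step_fails(void) { return -1; }

/* Run the steps in order. A critical failure stops the sequence and
 * tells the caller a rescue shell is needed; a non-critical one is
 * logged to stderr and booting carries on, so a getty can still come up. */
enum boot_outcome run_stage1(const struct boot_step *steps, int n,
                             int *failures)
{
    assert(failures != NULL && (n == 0 || steps != NULL));
    *failures = 0;
    for (int i = 0; i < n; i++) {
        if (steps[i].run() == 0)
            continue;
        (*failures)++;
        if (steps[i].critical) {
            fprintf(stderr, "stage 1: %s failed, rescue shell needed\n",
                    steps[i].name);
            return BOOT_RESCUE;
        }
        fprintf(stderr, "stage 1: %s failed, carrying on\n", steps[i].name);
    }
    return BOOT_CONTINUE;
}
```

Marking a step such as sethostname as non-critical lets the boot reach a getty despite the failure; marking it critical turns the same failure into a request for a rescue shell.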

> stage 4

i would prefer to call it "stage 3b", since stage 4 would start after
stages 3a + 3b, i.e. when process #1 execs into another executable, which
may be required in connection with an initramfs. anopa provides such a
stage 4 execline script.

> - If you want to kill every process but pid 1 and have the system
> reconstruct itself from there, then yes, it is possible, and that is
> the whole point of having a supervision tree rooted in pid 1. When
> you kill every process, the supervision tree respawns, so you always
> have a certain set of services running, and the system can always
> recover from whatever you throw at it. Try it: grab a machine with
> a supervision tree and a root shell, run "kill -9 -1", see what happens.

i wonder what happens if process #1 reacts to, say, SIGTERM
by starting the shutdown phase and doing a reboot afterwards.
what if process #1 is signaled "accidentally" by kill -TERM 1
(as we saw in preceding posts, -1 will not reach it)?
nothing is restarted and the system goes down instead, since
it is assumed that the signal was not sent "accidentally".

in the case of a process #1 not supervising anything, with the supervisor
running at PID > 1, killing everything "accidentally"
(via kill(-1, SIGKILL), for example) bricks the system, and the reset
button has to be used:

only a privileged process can reach everything with PID > 1 that
way. there seems to be something wrong that should be fixed ASAP.
in the case of process #1 respawning the supervisor:
it restarts everything, maybe the "accident" happens again, and so on ...
this could lead to the system being caught in such an "endless" loop.
maybe this can also only get fixed by powering down ...
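One common mitigation for such a loop is to slow down respawns when a process keeps dying right after it starts. The sketch below assumes a hypothetical policy (exponential backoff capped at 64 seconds, giving up after a fixed number of quick deaths in a row); `struct restart_policy` and `next_restart_delay` are illustrative names, and none of the inits discussed here is claimed to do exactly this.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical restart policy for one supervised process. */
struct restart_policy {
    time_t min_uptime;     /* runs shorter than this count as quick deaths */
    int max_quick_deaths;  /* stop respawning after this many in a row */
    int quick_deaths;      /* current streak of quick deaths */
};

/* Called after the process exited. Returns how many seconds to wait
 * before respawning it, or -1 to stop respawning and leave the problem
 * to an admin instead of looping forever. */
long next_restart_delay(struct restart_policy *p, time_t started,
                        time_t died)
{
    int shift;

    assert(p != NULL && died >= started);
    if (died - started >= p->min_uptime) {
        p->quick_deaths = 0;        /* it ran long enough: reset the streak */
        return 0;
    }
    if (++p->quick_deaths >= p->max_quick_deaths)
        return -1;
    shift = p->quick_deaths - 1;    /* back off 1, 2, 4 ... seconds */
    if (shift > 6)
        shift = 6;                  /* cap the delay at 64 seconds */
    return 1L << shift;
}
```

Giving up is only reasonable for ordinary services: if pid 1 applied it to the one process it keeps alive, it would recreate the bricked-machine case above.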

non-supervising process #1: same, but worse: the reset button has to
be used, state is lost, filesystems are not unmounted cleanly and what not.

but with a supervising process #1, the system can also be prevented
from entering the shutdown phase cleanly.



Re: interesting claims

2019-05-16 Thread Laurent Bercot

The Question:  As a newbie outsider I wonder, after following the
discussion of supervision and tasks on stages (1,2,3), that there is a
restrictive linear progression that prevents reversal.  In terms of pid1
that I may not totally understand, is there a way that an admin can
reduce the system back to pid1 and restart processes instead of taking
the system down and restarting?  If a glitch is found, usually it is
corrected and we find it simple to just do a reboot.  What if you can
fix the problem and do it on the fly.  The question would be why (or why
not), and I am not sure I can answer it, but if you theoretically can do
so, then can you also kill pid2 while pid10 is still running.  With my
limited vision I see stages as one-way check valves in a series of fluid
linear flow.


I'm not sure I understand your question, but I think there are
really two different questions here; I'll try to reformulate them,
correct me if I'm wrong.

1. Is booting a system a linear process where every step is
reversible?
2. Is it possible to restart a system "from scratch" without
rebooting?

The answer to both questions is "not really, but it doesn't matter".

We've been talking a lot about stages 1, 2 and 3 (and sometimes 4)
lately because I've been working on s6-linux-init, which focuses on
booting and especially on stage 1. But it's a very narrow, very
specific thing to focus on. Stage 1 is a critical part of the booting
process, obviously, and has to be done right, but once it is, you
can basically forget about it.

Most of the machine's lifetime, including most of the booting
sequence, happens in stage 2. Stage 1 is just early preparation, the
very basic minimum things you should be able to assume, such as
"there is a supervision tree running and I can add services to it";
for all intents and purposes, stage 2 is where you will be working,
even if your focus is to bring the machine up, e.g. if you're writing
a service manager.

Stage 1 isn't reversible; once it's done, you never touch it again,
you don't need to "reverse" it. It would be akin to also unloading
the kernel from memory before shutting down - it's just not necessary.

Stage 2 is where things happen. But what happens in stage 2 isn't
really reversible either: there is still a certain amount of one-time
initialization that needs to be done at boot time and doesn't need to
be undone at shutdown time. Booting and shutting down can be made
symmetric up to a point, but never entirely; the most obvious example
is mounting filesystems. There is a point in the boot sequence where
the filesystems are mounted; however, *unmounting* filesystems cannot
be done at the symmetrical point in the shutdown sequence - it has to
be done at the very end of the shutdown sequence, in stage 4, right before
the power goes off. Why? Because during shutdown, you may still have
user processes running that prevent filesystems from being unmounted,
so you can only unmount filesystems after killing everything, which
happens at the end. Whereas during the boot sequence, you don't have
random user processes yet, you have a much more controlled
environment.
Booting and shutting down can't be made 100% symmetric. But that's
not a problem, because *symmetry is not a goal*. The goal of the
boot sequence is to make the machine operational; the goal of the
shutdown sequence is to make sure the plug can be pulled without
causing problems.

Symmetry makes sense in a service manager, because it helps to
see a service as being "up" or "down", and there is a hierarchy of
dependencies between services that makes it natural to bring services
"up" or "down" in a certain, reversible order. But service management
isn't all there is, and in the bigger picture, a machine's lifetime
isn't perfectly symmetrical. And that's okay.

As for restarting a system from scratch without rebooting, the
question is what you want to achieve.

- If you want to be able to go through the whole shutdown procedure
with bringing down services etc. but *not* the actual hardware reboot,
and bringing up the whole system again from pid 1, yes, it is
theoretically possible, but not particularly useful. The shutdown
procedure is designed to make the system ready for poweroff, and it's
quite a waste if you're not going to poweroff. The boot procedure is
designed to get the system from a just-powered-on state to a fully
operational state, and it's also quite a waste if the system is
already fully operational. There aren't many problems to which doing
this is the right solution.

- If you want to kill every process but pid 1 and have the system
reconstruct itself from there, then yes, it is possible, and that is
the whole point of having a supervision tree rooted in pid 1. When
you kill every process, the supervision tree respawns, so you always
have a certain set of services running, and the system can always
recover from whatever you throw at it. Try it: grab a machine with
a supervision tree and a root shell, run "kill -9 -1", see what happens.

Re: interesting claims

2019-05-15 Thread fungal-net
I apologize for interrupting, and for making my presence known at the
same time, as my level of technical expertise should restrict me to
being a silent entry-level student; but in all my searches I have not
found a good answer. (Introduction at the end.)

The Question: As a newbie outsider, after following the discussion of
supervision and tasks in stages (1, 2, 3), I wonder whether there is a
restrictive linear progression that prevents reversal. In terms of pid 1,
which I may not totally understand, is there a way that an admin can
reduce the system back to pid 1 and restart processes instead of taking
the system down and restarting? If a glitch is found, it is usually
corrected and we find it simple to just do a reboot. What if you could
fix the problem on the fly instead? The question would be why (or why
not), and I am not sure I can answer it, but if you theoretically can do
so, can you also kill pid 2 while pid 10 is still running? With my
limited vision I see stages as one-way check valves in a series of fluid
linear flow.

In reference to the 95% reliability model, which I can understand, I
believe systemd works on a 50% reliability basis. If there is one thing it
does well, it is cleaning up the mess its own design constantly creates,
without bothering the admin. It is like a wealthy homeowner who eats
chocolates, throwing the wrappers on the floor while walking through the
house and having servants clean up behind him. He is always in a
clean house. The extremes are sealing the house to keep dust out, or
cleaning up every week or two and letting it breathe some fresh air.
I think the fallacy with supervision is trying to anticipate everything
that can possibly happen, when you can't. Can a user without any admin
privileges be allowed to compile and run software and have 100% of
available resources to do so? How efficient is a system that mandates a
cap on resources?


--
Introduction: I don't like to eavesdrop and just read/listen to discussions
without people realizing I am here too, so I am making my presence
known. I run a blog, sysdfree.wordpress.com, and I have been introduced
to s6 and runit in the past couple of years through using Obarun, Void,
and Artix, and by reading a few articles by Steve Litt. I am fascinated
that in the world of free and open software, merit counts for so little
compared to corporate budgets and marketing. My aim is not to write my
own init system, nor even to hack the one I use, but to find out why
large corporate projects would fund a mediocre system and promote it,
almost by force, while what is superior remains relatively unknown.
I understand that there are merits in working quietly and nearly alone,
but still. I have a hunch that control, of software design and of users,
may have something to do with the "source of funding".


PS  I promise to remain quiet and learn before I speak again.





Re: interesting claims

2019-05-15 Thread Steve Litt
On Thu, 16 May 2019 01:22:14 +0200
Oliver Schad  wrote:

> On Wed, 15 May 2019 13:22:48 -0400
> Steve Litt  wrote:
> 
> > The preceding's true for you, but not for everyone. Some
> > people, like myself, are perfectly happy with a 95% reliable
> > system. I reboot once every 2 to 4 weeks to get rid of accumulated
> > state, or as a troubleshooting diagnostic test. I don't think I'm
> > alone. Some people need 100% reliable, some don't.  
> 
> That is a strange point of view: 

Not strange at all. In a tradeoff between reliability and simplicity,
some people will sacrifice some of the former to get some of the
latter.

> there might be people who doesn't
> need computers at all. So we shouldn't program anything? 

The preceding analogy makes no sense in the current context.

> So if there
> are people outside who needs a higher quality and Laurant wants to
> target them, then he needs to deliver that and it makes sense

for Laurent to program to their higher standards because that's what he
wants to do. It would also make sense for somebody to make something
simpler, but with lower reliability.

> argument with that requirement.

I don't understand the preceding phrase in the current context.

There's a tradeoff between product A, which has the utmost in
reliability and a fairly simple architecture, and product B, which is
fairly reliable and has the utmost in simplicity. In contrast to A and
B, there's product C, whose reliability is between A and B, but which is
much less simple than A and B. Then there's product D, which is
unreliable and whose architecture is an unholy mess. When viewed over
the entire spectrum, the differences between A and B could reasonably be
termed a "family quarrel". Absent from the entire discussion are people
who don't need A, B, C or D.

SteveT


Re: interesting claims

2019-05-15 Thread Oliver Schad
On Wed, 15 May 2019 13:22:48 -0400
Steve Litt  wrote:

> The preceding's true for you, but not for everyone. Some
> people, like myself, are perfectly happy with a 95% reliable system. I
> reboot once every 2 to 4 weeks to get rid of accumulated state, or as
> a troubleshooting diagnostic test. I don't think I'm alone. Some
> people need 100% reliable, some don't.

That is a strange point of view: there might be people who don't need
computers at all. So we shouldn't program anything? So if there are
people out there who need higher quality and Laurent wants to target
them, then he needs to deliver that, and it makes sense to argue from
that requirement.

Best Regards
Oli

-- 
Automatic-Server AG •
Oliver Schad
Geschäftsführer
Turnerstrasse 2
9000 St. Gallen | Schweiz

www.automatic-server.com | oliver.sc...@automatic-server.com
Tel: +41 71 511 31 11 | Mobile: +41 76 330 03 47




Re: interesting claims

2019-05-15 Thread Steve Litt
On Wed, 01 May 2019 18:13:53 +
"Laurent Bercot"  wrote:

> >So Laurent's words from http://skarnet.org/software/s6/ were just
> >part of a very minor family quarrel, not a big deal, and nothing to
> >get worked up over.  
> 
>   This very minor family quarrel is the whole difference between
> having and not having a 100% reliable system, which is the whole
> point of supervision.

The preceding's true for you, but not for everyone. Some
people, like myself, are perfectly happy with a 95% reliable system. I
reboot once every 2 to 4 weeks to get rid of accumulated state, or as a
troubleshooting diagnostic test. I don't think I'm alone. Some people
need 100% reliable, some don't.

My liking of supervision is not 100% reliability, but instead 95%
reliability that is also simple, understandable, and lets me write
daemons that don't have to background themselves. I don't think I'm
alone.

 
>   Yes, obviously sinit and ewontfix init are greatly superior to
> systemd, sysvinit or what have you. 

Which is why I call it a family quarrel. Some in our family have a
strong viewpoint on whether PID1 supervises at least one process, and
some don't. But outside our family, most are happy with systemd, which
of course makes most of us retch.

> That is a low bar to clear. And
> the day we're happy with low bars is the day we start getting
> complacent and writing mediocre software.

I'd call it a not-highest bar, not a low bar. Systemd is a low bar.
> 
>   Also, you are misrepresenting my position - this is not the first
> time, and it's not the first time I'm asking you to do better.
> I've never said that the supervision had to be done by pid 1, actually
> I insist on the exact opposite: the supervisor *does not* have to
> be pid 1. What I am saying, however, is that pid 1 must supervise
> *at least one process*, which is a very different thing.

I'm sorry. Either I didn't know the preceding, or I forgot it. And
supervising one process in PID1 makes a lot more sense than packing an
entire supervisor in PID1.


>   s6-svscan is not a supervisor. It can supervise s6-supervise
> processes, yes - that's a part of being suitable as pid 1 - but it's
> not the same as being able to supervise any daemon, which is much
> harder because "any daemon" is not a known quantity.

I understand now.

>   Supervising a process you control is simple; supervising a process
> you don't know the behaviour of, which is what the job of a
> "supervisor" is, is more complex.

I understand now.

Thanks,

SteveT


Re: interesting claims

2019-05-01 Thread Laurent Bercot

So Laurent's words from http://skarnet.org/software/s6/ were just part
of a very minor family quarrel, not a big deal, and nothing to get
worked up over.


 This very minor family quarrel is the whole difference between having
and not having a 100% reliable system, which is the whole point of
supervision.

 Yes, obviously sinit and ewontfix init are greatly superior to
systemd, sysvinit or what have you. That is a low bar to clear. And
the day we're happy with low bars is the day we start getting
complacent and writing mediocre software.

 Also, you are misrepresenting my position - this is not the first
time, and it's not the first time I'm asking you to do better.
I've never said that the supervision had to be done by pid 1, actually
I insist on the exact opposite: the supervisor *does not* have to
be pid 1. What I am saying, however, is that pid 1 must supervise
*at least one process*, which is a very different thing.

 s6-svscan is not a supervisor. It can supervise s6-supervise
processes, yes - that's a part of being suitable as pid 1 - but it's
not the same as being able to supervise any daemon, which is much
harder because "any daemon" is not a known quantity.
 Supervising a process you control is simple; supervising a process
you don't know the behaviour of, which is what the job of a
"supervisor" is, is more complex.

 In future presentations, I will make sure to pinpoint the difference.
Yes, that is a detail, but this detail is what allows us to make
pid 1 both simple (not having the whole supervision logic in pid 1)
and correct (covering the case where all processes die).
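To make the distinction concrete, here is a minimal sketch of a pid 1 that supervises exactly one process it controls and otherwise only reaps zombies, assuming that process is the root of a supervision tree. It is illustrative only: in s6 the arrangement is different (s6-svscan itself runs as pid 1), and `spawn_child` and `supervise_one` are hypothetical names. The `max_spawns` bound exists only so the sketch can be tested outside pid 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork and exec argv[0]; returns the child's pid, or -1 if fork failed. */
static pid_t spawn_child(char *const argv[])
{
    pid_t pid = fork();

    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);                       /* exec failed */
    }
    return pid;
}

/* Sketch of a pid 1 main loop: keep one known process alive and reap
 * every other child, orphans included. A real pid 1 would loop forever
 * and retry failed forks. Returns how many times the process started. */
int supervise_one(char *const argv[], int max_spawns)
{
    int spawns;
    pid_t child;

    assert(argv != NULL && argv[0] != NULL && max_spawns > 0);
    child = spawn_child(argv);
    if (child < 0)
        return 0;
    spawns = 1;
    for (;;) {
        int status;
        pid_t dead = waitpid(-1, &status, 0);   /* reap any child */

        if (dead < 0) {
            if (errno == EINTR)
                continue;
            return spawns;                      /* ECHILD: nothing left to wait for */
        }
        if (dead != child)
            continue;                           /* an unrelated zombie, now reaped */
        if (spawns >= max_spawns)
            return spawns;
        child = spawn_child(argv);              /* the supervised process died: respawn */
        if (child < 0)
            return spawns;
        spawns++;
    }
}
```

Called with `{"/bin/sh", "-c", "exit 0", NULL}` and `max_spawns` set to 3, the shell is started three times and `supervise_one` returns 3. Dropping the respawn branch leaves a reap-only pid 1, the case where killing every other process leaves nothing to bring the system back.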

--
 Laurent



Re: interesting claims

2019-05-01 Thread Steve Litt
On Mon, 29 Apr 2019 21:19:58 +0200
Jeff  wrote:

> i came across some interesting claims recently. on
> http://skarnet.org/software/s6/
> it reads
> 
> "suckless init is incorrect, because it has no supervision
> capabilities, and thus, killing all processes but init can brick the
> machine."

Oh, that.

First of all, Suckless Init is a PID1 that forks an rc script and then
hangs around reaping zombies, but it's not an entire init system. You
could make it a complete init system by using the forked rc file to run
supervision systems such as daemontools-encore and the supervision part
of runit and s6. And of course you'd need a shutdown script that PID1
can call when it gets signals to reboot or poweroff. So Suckless Init is
the PID1 part of an init system. It's 83 lines of C. It's not an entire
init system.
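The structure Steve describes (fork an rc script, then sit in a loop reaping zombies, with signals requesting poweroff or reboot) might look something like this C sketch. It is not sinit's source: the function name and the use of SIGUSR1 for poweroff and SIGINT for reboot are assumptions for illustration, and a real pid 1 would run its shutdown script instead of returning.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* "Fork rc, then reap zombies" pid 1 sketch (hypothetical name).
   Assumed signal meanings: SIGUSR1 = poweroff, SIGINT = reboot.
   Returns the signal that requested shutdown, or -1 if fork failed. */
int run_rc_and_reap(char *const rc_argv[])
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    sigaddset(&set, SIGUSR1);
    sigaddset(&set, SIGINT);
    sigprocmask(SIG_BLOCK, &set, NULL);     /* handle them synchronously */

    pid_t rc = fork();
    if (rc < 0)
        return -1;
    if (rc == 0) {
        sigprocmask(SIG_UNBLOCK, &set, NULL);   /* rc starts with a clean mask */
        execv(rc_argv[0], rc_argv);
        _exit(127);
    }

    for (;;) {
        int sig;
        if (sigwait(&set, &sig) != 0)
            continue;
        if (sig == SIGCHLD) {
            /* Reap every zombie, ours or orphaned. Nothing is respawned. */
            while (waitpid(-1, NULL, WNOHANG) > 0)
                ;
            continue;
        }
        return sig;                  /* SIGUSR1 or SIGINT */
    }
}
```

Nothing in the loop ever starts a process again: once every child has died, only pid 1 is left, which is the situation the skarnet.org sentence refers to.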

There are three philosophies: 

1) The supervision should be done by PID1: Supported by Laurent Bercot

2) The supervision should be done outside of PID1: Perhaps supported by
   Rich Felker in his http://ewontfix.com/14/ blog.

3) Either is acceptable and greatly superior to systemd, sysvinit,
   upstart, etc. This is supported by most people who like process
   supervision.

So Laurent's words from http://skarnet.org/software/s6/ were just part
of a very minor family quarrel, not a big deal, and nothing to get
worked up over.

SteveT


Re: interesting claims

2019-05-01 Thread Steve Litt
On Mon, 29 Apr 2019 21:19:58 +0200
Jeff  wrote:

> i came across some interesting claims recently. on
> http://skarnet.org/software/s6/
> it reads
> 
> "suckless init is incorrect, because it has no supervision
> capabilities, and thus, killing all processes but init can brick the
> machine."

Oh, that.

First of all, Suckless Init is a PID1 that forks an rc script and then
hangs around reaping zombies. You could use that rc file to run
supervision systems such as daemontools-encore and the supervision part
of runit and s6. So Suckless Init is the PID1 part of an init system.
It's 83 lines of C.

There are three philosophies: 

1) The supervision should be done by PID1: Supported by Laurent Bercot

2) The supervision should be done outside of PID1: Perhaps supported by
   Rich Felker in his http://ewontfix.com/about/

> 
> a rather bold claim IMO !
> where was the "correct" init behaviour specified ?
> where can i learn how a "correct" init has to operate ?
> or is it true since s6-svscan already provides such respawn
> capabilities ? ;-)
> 
> there is actually NO need for a "correct" working init implementation
> to provide respawn capabilities at all IMO.
> this can easily be done in/by a subprocess and has 2 advantages:
> 
> - it simplifies the init implementation
> 
> - process #1 is the default subprocess reaper on any unix
>   implementation and hence a lot of terminated zombie subprocesses
>   get assigned to it, subprocesses that were not started by it.
>   if it has respawn capabilities it has to find out if any of these
>   recently assigned but elsewhere terminated subprocesses is one of its
>   own children to be respawned. if it has lots of services to respawn
>   this means lots of unnecessary work that could also be done
>   in/by a subprocess as well.
> 
> when do you kill a non supervised process running with UID 0
> "accidentally" ? when calling kill ( -1, SIGTERM ) ?
> the kernel protects special/important processes in this case from
> being killed "accidentally", that's true.
> but where do we usually see that ? in the shutdown stage, i guess.
> and that's exactly where one wants to kill all processes with PID > 1
> (sometimes excluding the calling process since it has to complete
> more tasks). or when going into single user mode.
> 
> so this looks like a rather artificial and constructed argument for
> the necessity of respawn functionality in an init implementation IMO.
> 
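The shutdown-stage kill Jeff mentions is commonly done as "SIGTERM, grace period, SIGKILL", which might be sketched in C as below. The function name and grace period are hypothetical. A shutdown stage would pass -1 as the target; on Linux, kill(-1, sig) signals every process the caller may signal except pid 1 and the caller itself, so no extra exclusion logic is needed. The target is a parameter here only so the sketch can be tried on a single child instead of the whole system.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* "SIGTERM, grace period, SIGKILL" sketch (hypothetical name).
   Returns the number of the caller's children reaped, or -1 on error.
   The reaping loop waits for ALL of the caller's children, which is
   what the target == -1 case wants. */
int terminate_then_kill(pid_t target, unsigned grace_seconds)
{
    /* ESRCH only means there was nothing (left) to signal. */
    if (kill(target, SIGTERM) < 0 && errno != ESRCH)
        return -1;
    sleep(grace_seconds);           /* let well-behaved processes exit */
    if (kill(target, SIGKILL) < 0 && errno != ESRCH)
        return -1;

    /* Collect the dead; run as pid 1, this includes reparented orphans. */
    int reaped = 0;
    for (;;) {
        if (waitpid(-1, NULL, 0) > 0)
            reaped++;
        else if (errno != EINTR)
            break;                  /* ECHILD: no children left */
    }
    return reaped;
}
```

If this is run with target -1 from anything other than pid 1, and pid 1 respawns nothing, what is left afterwards is pid 1 alone.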



-- 
SteveT

Steve Litt 
January 2019 featured book: Troubleshooting: Just the Facts
http://www.troubleshooters.com/tjust


Re: interesting claims

2019-04-30 Thread Jonathan de Boyne Pollard

Jeff:


where can i learn how a "correct" init has to operate ?



See https://unix.stackexchange.com/a/197472/5132 for starters.


Re: interesting claims

2019-04-30 Thread Laurent Bercot

"suckless init is incorrect, because it has no supervision capabilities,
and thus, killing all processes but init can brick the machine."

a rather bold claim IMO !
where was the "correct" init behaviour specified ?
where can i learn how a "correct" init has to operate ?


For instance:
https://archive.fosdem.org/2017/schedule/event/s6_supervision/
https://www.youtube.com/watch?v=I7qE43KK5bY&t=7591
https://www.reddit.com/r/linux/comments/2dx7k3/s6_skarnetorg_small_secure_supervision_software/cjxc1hj/?context=3


 Or, as Guillermo mentioned, several posts in the ML archive.

 init is a subject that little study has been put into (though it
is also the subject of a whole lot of talk, which says something
about whether people would rather talk or study). But I think you'll
find that things are different around here.



or is it true since s6-svscan already provides such respawn
capabilities ? ;-)


Do not mistake causes for consequences. Things are not correct
because s6 does them; s6 does things because they are correct.



there is actually NO need for a "correct" working init implementation
to provide respawn capabilities at all IMO.


Then you are free to use one of the many incorrect inits out there,
including sinit, Rich Felker's init, dumb-init, and others. You are
definitely not alone with your opinion. However, you sound interested
in process supervision, which is part of the more general idea that a
machine should be made as reliable as possible *at all times* and
*under any circumstances*; if you subscribe to that idea, then you
will understand why init must supervise at least 1 process.



so this looks like a rather artificial and constructed argument for
the necessity of respawn functionality in an init implementation IMO.


 Maybe you've never bricked a device because init didn't respawn
anything. I have. The "rather artificial and constructed argument"
happened to me in real life, and it was a significant inconvenience.

--
 Laurent


Re: interesting claims

2019-04-29 Thread Guillermo
El lun., 29 abr. 2019 a las 16:46, Jeff escribió:
>
> "suckless init is incorrect, because it has no supervision capabilities,
> and thus, killing all processes but init can brick the machine."
>
> a rather bold claim IMO !
> where was the "correct" init behaviour specified ?
> where can i learn how a "correct" init has to operate ?
> [...]
> there is actually NO need for a "correct" working init implementation
> to provide respawn capabilities at all IMO.

This was discussed on the mailing list, and you'll be able to find
relevant messages in the archives; the last part of the sentence
you quoted should clarify what "correct" means in this context. But to
recap:

* A failure mode is identified (the machine becomes unusable and
requires a hard reboot), along with the condition that triggers it
(death of all processes except #1).
* The condition can be triggered explicitly with a kill(-1, SIGKILL)
call in a process with root privileges, so by definition it is not an
impossible condition, but this is not the only way to trigger it.
Processes can die for a variety of reasons.
* A program with "respawn capabilities" running as process 1 can
avoid entering this failure mode; a program that does not have those
capabilities cannot.

Nothing more, nothing less. This is not a statement about how likely
this failure mode is, only that it exists. An init system may or may
not choose to prevent it; this is a design choice (and the use of
"correct" will give you an idea of what the author of this particular
software package thinks). A person may or may not decide to use an
init system that doesn't; this is a matter of preference.

G.