Re: interesting claims
On Sun, 19 May 2019 at 8:24, fungal-net wrote:
>
> [...]
> This is Adélie adelielinux.org
> installation on HD. Although it is confusing to me how they set this up
> still, after months of following its development (beta3), there is
> sysvinit on the first steps of booting then OpenRC takes over, and then
> s6-supervisor handles everything running. It is like a fruit punch in
> my eyes.

As far as I can tell, at the moment it has a sysvinit + OpenRC setup,
plus an s6 supervision tree anchored to process #1. Unlike other
distributions, the getty processes are part of the supervision tree,
and placed there by an Adélie-specific OpenRC service in the sysinit
runlevel.

> One of the reasons I am trying to learn more about init in general and
> porting s6 to a different system is to use either Adélie or Void-musl
> and have pure s6 on them.

I believe Adélie is heading towards supporting that.

> Both s6/s6-rc and 66 pkgs are available through void's repositories but
> s6-rc has been modified and I haven't been able to get it to work.

Really? As far as I can tell, Void's s6-rc is the unmodified upstream
package, and Void's 66 is the unmodified package from Obarun's
repository.

> Void uses arch-like /bin /sbin --> /usr/bin, Adélie has more traditional
> 4 separate directories.

Yeah, that's both /usr-merge and bin-sbin merge. A whole discussion in
itself.

G.
Re: interesting claims
Guillermo: >> But although I got curious what "kill -9 -1" would do to different >> systems I don't see the usefulness of this. > > Since you actually went ahead and did it, and reported the results, > for me it was interesting to see if they matched what theory says that > would happen. They did (assuming that what you wrote about the s6 case > means that the system more or less reconstructed itself). I am glad some of you can tell more than I can about this, and since you did I tried my weirdest of setups. This is Adélie adelielinux.org installation on HD. Although it is confusing to me how they set this up still, after months of following its development (beta3), there is sysvinit on the first steps of booting then OpenRC takes over, and then s6-supervisor handles everything running. It is like a fruit punch in my eyes. For those that don't know this is built on musl. # kill -9 -1 on tty1 brought me back to tty1 login screen with 5 more ttys active. So everything is respawned almost instantly to a system just like it had just booted. Doing the same from terminal on X had the same exact outcome. > Thanks, > G. One of the reasons I am trying to learn more about init in general and porting s6 to a different system is to use either Adélie or Void-musl and have pure s6 on them. Recent efforts with void failed, except for using arch kernel building and installing Obarun's pkgs into void. Not very clean but works for months. Dracut is a thing I still need to learn about as obstacle #1. Both s6/s6-rc and 66 pkgs are available through void's repositories but s6-rc has been modified and I haven't been able to get it to work. Void uses arch-like /bin /sbin --> /usr/bin, Adélie has more traditional 4 separate directories.
Re: interesting claims
On Sat, 18 May 2019 at 13:26, fungal-net wrote:
>
> >>> OpenRC: Nice,
> >>>init
> >>> |_ zsh
> >>>when I exited the shell there was nothing but a dead cursor on my
> >>> screen
> > [...]
> >> May I ask what was this setup like? You made a different entry for
> >> sysvinit, presumably with the customary getty processes configured in
> >> /etc/inittab 'respawn' entries, judging by your results, so how was
> >> the OpenRC case different?
> >
> > i also wondered whether he used openrc-init here ?
> > [...]
> I remember seeing this although I may have mixed it up. I have a few
> Artix-OpenRC images and an older Manjaro-OpenRC which was a predecessor.
> Running both again didn't produce this result. They just froze with a
> dash on the top left of the screen, didn't poweroff. So I am puzzled now
> what I mixed up.

Ah, the OpenRC variant of Artix. That might explain it. Apparently,
this does mean 'pure' OpenRC indeed, i.e. openrc-init and
openrc-shutdown in addition to the service manager. I didn't know there
were distributions that used this setup. openrc-init, just like
Suckless init, does not currently supervise any other process, so this
test seems to have put the VM in a coma by killing every process but #1
(after the only apparent survivor, zsh, exited).

> But although I got curious what "kill -9 -1" would do to different
> systems I don't see the usefulness of this.

Since you actually went ahead and did it, and reported the results,
for me it was interesting to see if they matched what theory says
would happen. They did (assuming that what you wrote about the s6 case
means that the system more or less reconstructed itself).

Thanks,
G.
Re: interesting claims
The tests I did were on live images run as vm-s Jeff: > 18.05.2019, 00:58, "Guillermo" : >>> OpenRC: Nice, >>> init >>> |_ zsh >>> when I exited the shell there was nothing but a dead cursor on my screen > > in this case the shell is not signaled since "-1" does not signal the sending > process. >> May I ask what was this setup like? You made a different entry for >> sysvinit, presumably with the customary getty processes configured in >> /etc/inittab 'respawn' entries, judging by your results, so how was >> the OpenRC case different? > > i also wondered whether he used openrc-init here ? > in that case he may have also used openrc's "supervise-daemon" util > which do not get restarted after they were terminated by the kill -1 -9 > blast and hence cannot respawn the gettys. looks like you were pretty > hosed when you quit the super-user zsh (which sent the kill blast via > its "kill" builtin) ? I remember seeing this although I may have mixed it up. I have a few Artix-OpenRC images and an older Manjaro-OpenRC which was a predecessor. Running both again didn't produce this result. They just froze with a dash on the top left of the screen, didn't poweroff. So I am puzzled now what I mixed up. > you should provide more information on the used init here as openrc > is not an init per se and works well with sysv + busybox init, runit, ... This is clearly the case of OpenRC in some early Refracta images I have, I didn't use them. The Devuan version of OpenRC works as an additional service supervisor. In Artix if there are sysv type of scripts must be limited in the early parts of booting. > >>> sysV: init and 6 ttys with shell ... nothing can kill it that I know off. > > what do you mean here ? > were the gettys respawned by SysV init or did they not die at all ? > where did you send the signal from ? > i would assume from a super-user zsh on a console tty ? 
I am pretty sure I used a Devuan/Miyo image on this one, and I am pretty sure they were respawned time after time of trying it again, as pids were higher numbered. For runit I used one Artix and one void, they seem to behave the same. But although I got curious what "kill -9 -1" would do to different systems I don't see the usefulness of this. What could possibly, without intention, do such a thing to a system? An intruder/virus/trojan trying to mess up your system? I can't see that software would malfunction to do something like this. My initial inquiry was what it would be like killing things and going down to 1 and whether you can rebuild from there, still a tty is needed, or an ssh serving daemon to access such system. And this is only just reversing stage 2, right?
Re: interesting claims
18.05.2019, 00:58, "Guillermo" : >> OpenRC: Nice, >> init >> |_ zsh >> when I exited the shell there was nothing but a dead cursor on my screen in this case the shell is not signaled since "-1" does not signal the sending process. > May I ask what was this setup like? You made a different entry for > sysvinit, presumably with the customary getty processes configured in > /etc/inittab 'respawn' entries, judging by your results, so how was > the OpenRC case different? i also wondered whether he used openrc-init here ? in that case he may have also used openrc's "supervise-daemon" util which do not get restarted after they were terminated by the kill -1 -9 blast and hence cannot respawn the gettys. looks like you were pretty hosed when you quit the super-user zsh (which sent the kill blast via its "kill" builtin) ? you should provide more information on the used init here as openrc is not an init per se and works well with sysv + busybox init, runit, ... >> sysV: init and 6 ttys with shell ... nothing can kill it that I know off. what do you mean here ? were the gettys respawned by SysV init or did they not die at all ? where did you send the signal from ? i would assume from a super-user zsh on a console tty ?
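The point about "-1" not signaling the sender can be illustrated with a small model. This is only a sketch of the Linux behaviour of kill(-1, sig) when called by root, not a call into the kernel: the function name `kill_all_targets` and the process table below are hypothetical, chosen to mirror the reported "init |_ zsh" result.

```python
def kill_all_targets(pids, sender):
    """Model which processes kill(-1, sig) reaches on Linux when root calls it.

    Assumption: every process is signalled except pid 1 and the caller.
    """
    return sorted(p for p in pids if p != 1 and p != sender)


# Hypothetical process table: init (1), three gettys, and the root zsh (300)
# whose "kill" builtin sends the signal.
table = {1, 200, 201, 202, 300}
survivors = table - set(kill_all_targets(table, sender=300))
print(sorted(survivors))  # [1, 300]
```

In this model only init and the sending shell remain, so if process #1 respawns nothing, exiting that shell leaves nothing to log in with.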
Re: interesting claims
Hi,

On Fri, 17 May 2019 at 8:22, fungal-net wrote:
>
> OpenRC: Nice,
> init
>|_ zsh
> when I exited the shell there was nothing but a dead cursor on my screen

May I ask what was this setup like? You made a different entry for
sysvinit, presumably with the customary getty processes configured in
/etc/inittab 'respawn' entries, judging by your results, so how was
the OpenRC case different?

> sysV: init and 6 ttys with shell ... nothing can kill it that I know off.

Thanks,
G.
Re: interesting claims
Laurent Bercot: > I'm not sure I understand your question, but I think there are > really two different questions here; I'll try to reformulate them, > correct me if I'm wrong. > > 1. Is booting a system a linear process where every step is > reversible? Well, assuming it wasn't from experience I was asking if it could possibly be reversible. The answer did help me understand that what may be theoretically possible it is most likely unnecessary, like having very busy one-way streets around a building block and a parking space became available 5 car lengths behind you. It may be quicker to go around the block (big fat luck). > 2. Is it possible to restart a system "from scratch" without > rebooting? > > The answer to both questions is "not really, but it doesn't matter". > >. . . . >. . . . . > > Stage 1 isn't reversible; once it's done, you never touch it again, > you don't need to "reverse" it. It would be akin to also unloading > the kernel from memory before shutting down - it's just not necessary. But if you can unload it you can reload it or load a different one? > . . . . . > . . . . > > - If you want to kill every process but pid 1 and have the system > reconstruct itself from there, then yes, it is possible, and that is > the whole point of having a supervision tree rooted in pid 1. When > you kill every process, the supervision tree respawns, so you always > have a certain set of services running, and the system can always > recover from whatever you throw at it. Try it: grab a machine with > a supervision tree and a root shell, run "kill -9 -1", see what > happens. Very interesting: Runit: I've never seen anything poweroff so fast (void faster than artix) OpenRC: Nice, init |_ zsh when I exited the shell there was nothing but a dead cursor on my screen S6/66: Goodmorning, it is like I had rebooted and was looking at my login: sysV: init and 6 ttys with shell ... nothing can kill it that I know off. 
sys.239.D: I hate to say, same behavior as s6/66 ***BSD: I must research to find the equivalent to kill -9 -1 but it seemed like openrc behavior. > > -- > Laurent > >
Re: interesting claims
Thanks Laurent, the additional insight is appreciated (you should have a skarnet.org/software/ page for "Insights and Philosophy" ;) ) A few years ago I managed a small outsource, the systems were rebooted when UPS batteries were replaced or a disk mirror problem arose. Because we used FreeBSD's jails we could nicely monitor and control the environment(s), monit helped. I'm looking to do the same this time, but with HardenedBSD and s6. We're in pursuit of both reliability and resilience in a headless environment ;) Sidenote: for Linux folk if curious about "jails" (ie not a chroot jail) 1) https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/jails.html 2) https://www.freebsd.org/cgi/man.cgi?query=jail&apropos=0&sektion=0&manpath=FreeBSD+12.0-RELEASE+and+Ports&arch=default&format=html
Re: interesting claims
16.05.2019, 10:31, "Laurent Bercot" : >> The Question: As a newbie outsider I wonder, after following the >> discussion of supervision and tasks on stages (1,2,3), that there is a >> restrictive linear progression that prevents reversal. In terms of pid1 >> that I may not totally understand, is there a way that an admin can >> reduce the system back to pid1 and restart processes instead of taking >> the system down and restarting? If a glitch is found, usually it is >> corrected and we find it simple to just do a reboot. What if you can >> fix the problem and do it on the fly. The question would be why (or why >> not), and I am not sure I can answer it, but if you theoretically can do >> so, then can you also kill pid2 while pid10 is still running. With my >> limited vision I see stages as one-way check valves in a series of fluid >> linear flow. take a look at (the now defunct) depinit: http://sf.net/p/depinit/ http://depinit.sf.net/ it is said to provide very extended rollback of dependencies (so extended gettys will not work with it according to the docs). > Stage 1 isn't reversible; once it's done, you never touch it again, > you don't need to "reverse" it. It would be akin to also unloading > the kernel from memory before shutting down - it's just not necessary. indeed. and when something fails in that first stage a super-user rescue shell should be started to fix it instead of any services that depend on it. (stupid example: sethostname failed for some reason, spawn a rescue shell for the admin to do something about it ;-). in such cases it has to be considered whether this failure important enough to justify interuption of the boot phase. if not: start as much other services as possible, output/log an error message, keep calm, and carry on, things can be handled when a getty is up. > stage 4 i would prefer to call it "stage 3b" since stage 4 would be start after stage3a + b, i. e. 
process #1 execs into another executable, maybe required in connection with initramfs, anopa provides such a stage 4 execline script. > - If you want to kill every process but pid 1 and have the system > reconstruct itself from there, then yes, it is possible, and that is > the whole point of having a supervision tree rooted in pid 1. When > you kill every process, the supervision tree respawns, so you always > have a certain set of services running, and the system can always > recover from whatever you throw at it. Try it: grab a machine with > a supervision tree and a root shell, run "kill -9 -1", see what happens. i wonder what happens if process #1 reacts to, say SIGTERM, by starting the shutdown phase and doing reboot afterwards. what if process #1 is signaled "accidently" by kill -TERM 1 (as we saw in preceding posts -1 will not reach it). nothing is restarted and the system goes down instead since it is assumed that the signal was not sent "accidently". in the case of a process #1 not supervising anything, supervisor runs with 1 < PID when killing everything "accidently" (via kill ( -1, SIGKILL ) for example), system is bricked, reset button has to be used: only a privileged process can reach everything with PID > 1 that way. there seems to be something wrong that should be fixed ASAP. in the case of process #1 respawning the supervisor: it restarts everything, maybe the "accident" happens again, and so on ... could lead to the system being caught in such an "endless" loop. maybe this can also only get fixed by powering down ... non supervising process #1: same, but worse: reset button has to be used, state is lost, fs are not unmounted cleanly and what not. but in the situation of a supervising process #1 it can also be possible to be prevented from entering the shutdown phase cleanly.
Re: interesting claims
The Question: As a newbie outsider I wonder, after following the discussion of supervision and tasks on stages (1,2,3), that there is a restrictive linear progression that prevents reversal. In terms of pid1 that I may not totally understand, is there a way that an admin can reduce the system back to pid1 and restart processes instead of taking the system down and restarting? If a glitch is found, usually it is corrected and we find it simple to just do a reboot. What if you can fix the problem and do it on the fly. The question would be why (or why not), and I am not sure I can answer it, but if you theoretically can do so, then can you also kill pid2 while pid10 is still running. With my limited vision I see stages as one-way check valves in a series of fluid linear flow. I'm not sure I understand your question, but I think there are really two different questions here; I'll try to reformulate them, correct me if I'm wrong. 1. Is booting a system a linear process where every step is reversible? 2. Is it possible to restart a system "from scratch" without rebooting? The answer to both questions is "not really, but it doesn't matter". We've been talking a lot about stages 1, 2 and 3 (and sometimes 4) lately because I've been working on s6-linux-init, which focuses on booting and especially on stage 1. But it's a very narrow, very specific thing to focus on. Stage 1 is a critical part of the booting process, obviously, and has to be done right, but once it is, you can basically forget about it. Most of the machine's lifetime, including most of the booting sequence, happens in stage 2. Stage 1 is just early preparation, the very basic minimum things you should be able to assume, such as "there is a supervision tree running and I can add services to it"; for all intents and purposes, stage 2 is where you will be working, even if your focus is to bring the machine up, e.g. if you're writing a service manager. 
Stage 1 isn't reversible; once it's done, you never touch it again,
you don't need to "reverse" it. It would be akin to also unloading
the kernel from memory before shutting down - it's just not necessary.

Stage 2 is where things happen. But what happens in stage 2 isn't really
reversible either: there is still a certain amount of one-time
initialization that needs to be done at boot time and doesn't need to be
undone at shutdown time. Booting and shutting down can be made symmetric
up to a point, but never entirely; the most obvious example is mounting
filesystems. There is a point in the boot sequence where the filesystems
are mounted; however, *unmounting* filesystems cannot be done at the
symmetrical point in the shutdown sequence - it has to be done at the
very end of the shutdown sequence, in stage 4, right before the power
goes off. Why? Because during shutdown, you may still have user processes
running, that prevent filesystems from being unmounted, so you can only
unmount filesystems after killing everything, which happens at the end.
Whereas during the boot sequence, you don't have random user processes
yet, you have a much more controlled environment.

Booting and shutting down can't be made 100% symmetric. But that's not
a problem, because *symmetry is not a goal*. The goal of the boot
sequence is to make the machine operational; the goal of the shutdown
sequence is to make sure the plug can be pulled without causing problems.
Symmetry makes sense in a service manager, because it helps to see a
service as being "up" or "down", and there is a hierarchy of dependencies
between services that make it natural to bring services "up" or "down"
in a certain, reversible order. But service management isn't all there
is, and in the bigger picture, a machine's lifetime isn't perfectly
symmetrical. And that's okay.

As for restarting a system from scratch without rebooting, the question
is what you want to achieve.
- If you want to be able to go through the whole shutdown procedure with bringing down services etc. but *not* the actual hardware reboot, and bringing up the whole system again from pid 1, yes, it is theoretically possible, but not particularly useful. The shutdown procedure is designed to make the system ready for poweroff, and it's quite a waste if you're not going to poweroff. The boot procedure is designed to get the system from a just-powered-on state to a fully operational state, and it's also quite a waste if the system is already fully operational. There aren't many problems which doing this is the right solution to. - If you want to kill every process but pid 1 and have the system reconstruct itself from there, then yes, it is possible, and that is the whole point of having a supervision tree rooted in pid 1. When you kill every process, the supervision tree respawns, so you always have a certain set of services running, and the system can always recover from whatever you throw at it. Try it: grab a machine with a supervision tree and a root shell, run "kill -9 -1", see wh
Re: interesting claims
I apologize for interrupting, and also make my presence known at the same time, as my level of technical expertise should restrict me to being a silent entry level student, but in all my searches I have not gotten a good answer. (introduction at the end) The Question: As a newbie outsider I wonder, after following the discussion of supervision and tasks on stages (1,2,3), that there is a restrictive linear progression that prevents reversal. In terms of pid1 that I may not totally understand, is there a way that an admin can reduce the system back to pid1 and restart processes instead of taking the system down and restarting? If a glitch is found, usually it is corrected and we find it simple to just do a reboot. What if you can fix the problem and do it on the fly. The question would be why (or why not), and I am not sure I can answer it, but if you theoretically can do so, then can you also kill pid2 while pid10 is still running. With my limited vision I see stages as one-way check valves in a series of fluid linear flow. In reference to the 95% reliability model which I can understand, I believe systemd works on 50% reliability basis. If there is a thing it does well is to clean up the mess its own design constantly creates, without bothering the admin. It is like a wealthy home owner who eats chocolates throwing the wrappers on the floor while walking through the house and having servants cleaning up behind him. He is always in a clean house. The extremes being having the house sealed to prevent dust coming in, or clean up every week or two and let it breath some fresh air. I think the fallacy with supervision is if you try to anticipate anything that can possibly happen when you can't. Can the user without any admin privileges be allowed to compile and run software and have 100% of available resources to do so? How efficient is a system that mandates a cap on resources? 
-- Introduction: I don't like to eavesdrop and just read/listen discussion without people realizing I am here too, so I am making my presence known. I run a blog sysdfree.wordpress.com and I have been introduced to s6 and runit in the past couple of years through using Obarun, Void, and Artix, and by reading a few articles by Steve Litt. I am fascinated that in the world of open and free software meritocracy is really low when compared to corporate budgets and marketing. My aim is not to write my own init system, not even hack the one I use, but find the reasons why would large corporate projects fund a mediocre system, and promote it, almost by force, while what is superior remains relatively unknown. I understand that there are merits in working quietly and nearly alone, but still. I have a hunch that control, of software design and users, may have something to do with the "source of funding". PS I promise to remain quiet and learn before I speak again.
Re: interesting claims
On Thu, 16 May 2019 01:22:14 +0200 Oliver Schad wrote: > On Wed, 15 May 2019 13:22:48 -0400 > Steve Litt wrote: > > > The preceding's true for you, but not for everyone. Some > > people, like myself, are perfectly happy with a 95% reliable > > system. I reboot once every 2 to 4 weeks to get rid of accumulated > > state, or as a troubleshooting diagnostic test. I don't think I'm > > alone. Some people need 100% reliable, some don't. > > That is a strange point of view: Not strange at all. In a tradeoff between reliability and simplicity, some people will sacrifice some off the former to get some of the latter. > there might be people who doesn't > need computers at all. So we shouldn't program anything? The preceding analogy makes no sense in the current context. > So if there > are people outside who needs a higher quality and Laurant wants to > target them, then he needs to deliver that and it makes sense for Laurant to program to their higher standards because that's what he wants to do. It would also make sense for somebody to make something simpler, but with lower reliability. > argument with that requirement. I don't understand the preceding phrase in the current context. There's a tradeoff between the product A, which has the utmost in reliability and a fairly simple architecture, and product B, which is fairly reliable and has the utmost in simplicity. In contrast to A and B, there's product C whose reliability is between A and B, but which is much less simple than A and B. Then there's productD, which is unreliable and whose architecture is an unholy mess. When viewed over the entire spectrum, the differences in A and B could reasonably be termed a "family quarrel". Absent from the entire discussion are people who don't need A, B, C or D. SteveT
Re: interesting claims
On Wed, 15 May 2019 13:22:48 -0400
Steve Litt wrote:

> The preceding's true for you, but not for everyone. Some
> people, like myself, are perfectly happy with a 95% reliable system. I
> reboot once every 2 to 4 weeks to get rid of accumulated state, or as
> a troubleshooting diagnostic test. I don't think I'm alone. Some
> people need 100% reliable, some don't.

That is a strange point of view: there might be people who doesn't need
computers at all. So we shouldn't program anything? So if there are
people outside who needs a higher quality and Laurant wants to target
them, then he needs to deliver that and it makes sense to argument with
that requirement.

Best Regards
Oli

--
Automatic-Server AG • Oliver Schad
Geschäftsführer
Turnerstrasse 2
9000 St. Gallen | Schweiz
www.automatic-server.com | oliver.sc...@automatic-server.com
Tel: +41 71 511 31 11 | Mobile: +41 76 330 03 47
Re: interesting claims
On Wed, 01 May 2019 18:13:53 + "Laurent Bercot" wrote: > >So Laurent's words from http://skarnet.org/software/s6/ were just > >part of a very minor family quarrel, not a big deal, and nothing to > >get worked up over. > > This very minor family quarrel is the whole difference between > having and not having a 100% reliable system, which is the whole > point of supervision. The preceding's true for you, but not for everyone. Some people, like myself, are perfectly happy with a 95% reliable system. I reboot once every 2 to 4 weeks to get rid of accumulated state, or as a troubleshooting diagnostic test. I don't think I'm alone. Some people need 100% reliable, some don't. My liking of supervision is not 100% reliability, but instead 95% reliability that is also simple, understandable, and lets me write daemons that don't have to background themselves. I don't think I'm alone. > Yes, obviously sinit and ewontfix init are greatly superior to > systemd, sysvinit or what have you. Which is why I call it a family quarrel. Some in our family have a strong viewpoint on whether PID1 supervises at least one process, and some don't. But outside our family, most are happy with systemd, which of course makes most of us retch. > That is a low bar to clear. And > the day we're happy with low bars is the day we start getting > complacent and writing mediocre software. I'd call it a not-highest bar, not a low bar. Systemd is a low bar. > > Also, you are misrepresenting my position - this is not the first > time, and it's not the first time I'm asking you to do better. > I've never said that the supervision had to be done by pid 1, actually > I insist on the exact opposite: the supervisor *does not* have to > be pid 1. What I am saying, however, is that pid 1 must supervise > *at least one process*, which is a very different thing. I'm sorry. Either I didn't know the preceding, or I forgot it. 
And supervising one process in PID1 makes a lot more sense than packing an entire supervisor in PID1. > s6-svscan is not a supervisor. It can supervise s6-supervise > processes, yes - that's a part of being suitable as pid 1 - but it's > not the same as being able to supervise any daemon, which is much > harder because "any daemon" is not a known quantity. I understand now. > Supervising a process you control is simple; supervising a process > you don't know the behaviour of, which is what the job of a > "supervisor" is, is more complex. I understand now. Thanks, SteveT
Re: interesting claims
So Laurent's words from http://skarnet.org/software/s6/ were just part of a very minor family quarrel, not a big deal, and nothing to get worked up over. This very minor family quarrel is the whole difference between having and not having a 100% reliable system, which is the whole point of supervision. Yes, obviously sinit and ewontfix init are greatly superior to systemd, sysvinit or what have you. That is a low bar to clear. And the day we're happy with low bars is the day we start getting complacent and writing mediocre software. Also, you are misrepresenting my position - this is not the first time, and it's not the first time I'm asking you to do better. I've never said that the supervision had to be done by pid 1, actually I insist on the exact opposite: the supervisor *does not* have to be pid 1. What I am saying, however, is that pid 1 must supervise *at least one process*, which is a very different thing. s6-svscan is not a supervisor. It can supervise s6-supervise processes, yes - that's a part of being suitable as pid 1 - but it's not the same as being able to supervise any daemon, which is much harder because "any daemon" is not a known quantity. Supervising a process you control is simple; supervising a process you don't know the behaviour of, which is what the job of a "supervisor" is, is more complex. In future presentations, I will make sure to pinpoint the difference. Yes, that is a detail, but this detail is what allows us to make pid 1 both simple (not having the whole supervision logic in pid 1) and correct (covering the case where all processes die). -- Laurent
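The idea that supervising one process you control is simple can be sketched in a few lines. This is a hypothetical Python illustration, not how s6-svscan is implemented: the function name `supervise` and the `runs` limit are assumptions, and a real supervisor would loop forever instead of stopping.

```python
import subprocess
import sys


def supervise(argv, runs):
    """Start argv and start it again every time it exits.

    `runs` exists only so that this sketch terminates.
    """
    exit_codes = []
    for _ in range(runs):
        child = subprocess.Popen(argv)
        exit_codes.append(child.wait())  # block until the child dies
    return exit_codes


# The "service" here is a child that exits at once with status 3.
codes = supervise([sys.executable, "-c", "raise SystemExit(3)"], runs=2)
print(codes)  # [3, 3]
```

The loop knows exactly what it started and only has to wait for that one child. Handling an arbitrary daemon (readiness, backgrounding, restart policy) is where the extra complexity of a full supervisor comes from.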
Re: interesting claims
On Mon, 29 Apr 2019 21:19:58 +0200
Jeff wrote:

> i came across some interesting claims recently. on
> http://skarnet.org/software/s6/
> it reads
>
> "suckless init is incorrect, because it has no supervision
> capabilities, and thus, killing all processes but init can brick the
> machine."

Oh, that.

First of all, Suckless Init is a PID1 that forks an rc script and then
hangs around reaping zombies, but it's not an entire init system. You
could make it a complete init system by using the forked rc file to run
supervision systems such as daemontools-encore and the supervision part
of runit and s6. And of course you'd need a shutdown script that PID1
can call when it gets signals to reboot or poweroff.

So Suckless Init is the PID1 part of an init system. It's 83 lines of C.
It's not an entire init system.

There are three philosophies:

1) The supervision should be done by PID1: Supported by Laurent Bercot

2) The supervision should be done outside of PID1: Perhaps supported by
   Rich Felker in his http://ewontfix.com/14/ blog.

3) Either is acceptable and greatly superior to systemd, sysvinit,
   upstart, etc. This is supported by most people who like process
   supervision.

So Laurent's words from http://skarnet.org/software/s6/ were just part
of a very minor family quarrel, not a big deal, and nothing to get
worked up over.

SteveT
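The "hangs around reaping zombies" part of such a PID1 can be sketched as follows. This is a Unix-only Python illustration under stated assumptions, not the Suckless Init source (which is C): `spawn` and `reap_all` are hypothetical names, and the short-lived children stand in for whatever the rc script would start.

```python
import os


def spawn(exit_status):
    """Fork a child that exits at once with the given status."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_status)  # child: terminate without cleanup handlers
    return pid


def reap_all():
    """Wait for every child and return {pid: exit status}."""
    reaped = {}
    while True:
        try:
            pid, status = os.wait()  # blocks until some child terminates
        except ChildProcessError:
            return reaped  # no children left to wait for
        reaped[pid] = os.WEXITSTATUS(status)


children = [spawn(0), spawn(7)]
result = reap_all()
print(result[children[0]], result[children[1]])  # 0 7
```

Note that nothing here restarts anything: the loop collects exit statuses and moves on, which is the behaviour the quoted claim from the s6 page is about.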
Re: interesting claims
On Mon, 29 Apr 2019 21:19:58 +0200 Jeff wrote:

> i came across some interesting claims recently. on
> http://skarnet.org/software/s6/
> it reads
>
> "suckless init is incorrect, because it has no supervision
> capabilities, and thus, killing all processes but init can brick the
> machine."

Oh, that.

First of all, Suckless Init is a PID1 that forks an rc script and then
hangs around reaping zombies. You could use that rc script to run
supervision systems such as daemontools-encore and the supervision part
of runit and s6.

So Suckless Init is the PID1 part of an init system. It's 83 lines of C.

There are three philosophies:

1) The supervision should be done by PID1: Supported by Laurent Bercot

2) The supervision should be done outside of PID1: Perhaps supported by
   Rich Felker in his http://ewontfix.com/about/

>
> a rather bold claim IMO !
> where was the "correct" init behaviour specified ?
> where can i learn how a "correct" init has to operate ?
> or is it true since s6-svscan already provides such respawn
> capabilities ? ;-)
>
> there is actually NO need for a "correct" working init implementation
> to provide respawn capabilities at all IMO.
> this can easily be done in/by a subprocess and has 2 advantages:
>
> - it simplifies the init implementation
>
> - process #1 is the default subprocess reaper on any unix
>   implementation and hence a lot of terminated zombie subprocesses
>   get assigned to it, subprocesses that were not started by it.
>   if it has respawn capabilities it has to find out if any of these
>   recently assigned but elsewhere terminated subprocesses is one of its
>   own children to be respawned. if it has lots of services to respawn
>   this means lots of unnecessary work that could also be done
>   in/by a subprocess as well.
>
> when do you kill a non supervised process running with UID 0
> "accidentally" ? when calling kill ( -1, SIGTERM ) ?
> the kernel protects special/important processes in this case from
> being killed "accidentally", that's true.
> but where do we usually see that ? in the shutdown stage, i guess.
> and that's exactly where one wants to kill all processes with PID > 1
> (sometimes excluding the calling process since it has to complete
> more tasks). or when going into single user mode.
>
> so this looks like a rather artificial and constructed argument for
> the necessity of respawn functionality in an init implementation IMO.
>

--
SteveT

Steve Litt
January 2019 featured book: Troubleshooting: Just the Facts
http://www.troubleshooters.com/tjust
Re: interesting claims
Jeff:
> where can i learn how a "correct" init has to operate ?

See https://unix.stackexchange.com/a/197472/5132 for starters.
Re: interesting claims
> "suckless init is incorrect, because it has no supervision
> capabilities, and thus, killing all processes but init can brick the
> machine."
>
> a rather bold claim IMO !
> where was the "correct" init behaviour specified ?
> where can i learn how a "correct" init has to operate ?

For instance:
https://archive.fosdem.org/2017/schedule/event/s6_supervision/
https://www.youtube.com/watch?v=I7qE43KK5bY&t=7591
https://www.reddit.com/r/linux/comments/2dx7k3/s6_skarnetorg_small_secure_supervision_software/cjxc1hj/?context=3

Or, as Guillermo mentioned, several posts in the ML archive.

init is a subject that little study has been put into (though it is
also the subject of a whole lot of talk, which says something about
whether people would rather talk or study). But I think you'll find
that things are different around here.

> or is it true since s6-svscan already provides such respawn
> capabilities ? ;-)

Do not mistake causes for consequences. Things are not correct because
s6 does them; s6 does things because they are correct.

> there is actually NO need for a "correct" working init implementation
> to provide respawn capabilities at all IMO.

Then you are free to use one of the many incorrect inits out there,
including sinit, Rich Felker's init, dumb-init, and others. You are
definitely not alone with your opinion.

However, you sound interested in process supervision, which is part of
the more general idea that a machine should be made as reliable as
possible *at all times* and *under any circumstances*; if you subscribe
to that idea, then you will understand why init must supervise at least
1 process.

> so this looks like a rather artificial and constructed argument for
> the necessity of respawn functionality in an init implementation IMO.

Maybe you've never bricked a device because init didn't respawn
anything. I have. The "rather artificial and constructed argument"
happened to me in real life, and it was a significant inconvenience.

--
Laurent
Re: interesting claims
On Mon, 29 Apr 2019 at 16:46, Jeff wrote:
>
> "suckless init is incorrect, because it has no supervision capabilities,
> and thus, killing all processes but init can brick the machine."
>
> a rather bold claim IMO !
> where was the "correct" init behaviour specified ?
> where can i learn how a "correct" init has to operate ?
> [...]
> there is actually NO need for a "correct" working init implementation
> to provide respawn capabilities at all IMO.

This was discussed on the mailing list; you'll be able to find the
relevant messages in the archives, and the last part of the sentence
you quoted should clarify what "correct" means in this context. But to
recap:

* A failure mode is identified (the machine becomes unusable and
  requires a hard reboot), along with the condition that triggers it
  (death of all processes except #1).
* The condition can be triggered explicitly with a kill(-1, SIGKILL)
  call in a process with root privileges, so by definition it is not an
  impossible condition. This is not the only way to trigger it, though:
  processes can die for a variety of reasons.
* A program with "respawn capabilities" running as process 1 can avoid
  entering this failure mode; a program that does not have those
  capabilities cannot.

Nothing more, nothing less. This is not a statement about how likely
this failure mode is, only that it exists. An init system may or may
not prevent it; this is a design choice (and the usage of "correct"
will give you an idea of what the author of this particular software
package thinks). A person may or may not decide to use an init system
that doesn't; this is a matter of preference.

G.
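The recap can be turned into a small toy model. The Python sketch below
does not start or kill any real process; it only computes which
processes come back after everything except process 1 has died, given
who restarts whom. The function name recover, the DEMO_TREE layout and
the process names in it are all hypothetical.

```python
# Hypothetical layout: one scanner process that restarts two services.
DEMO_TREE = {"scanner": ["getty-tty1", "sshd"]}


def recover(tree, init_children):
    """Return the set of processes running again after all but #1 died.

    tree maps a process name to the names it restarts when they die.
    init_children lists the names that process 1 itself restarts.
    """
    running = set()
    pending = list(init_children)
    while pending:
        name = pending.pop()
        if name in running:
            continue
        running.add(name)                   # restarted by its parent
        pending.extend(tree.get(name, ()))  # which restarts its own children
    return running
```

With init_children set to ["scanner"], recover(DEMO_TREE, ...) returns
all three names; with an empty init_children it returns the empty set,
which corresponds to the failure mode described in the first bullet.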
interesting claims
i came across some interesting claims recently. on
http://skarnet.org/software/s6/
it reads

"suckless init is incorrect, because it has no supervision
capabilities, and thus, killing all processes but init can brick the
machine."

a rather bold claim IMO !
where was the "correct" init behaviour specified ?
where can i learn how a "correct" init has to operate ?
or is it true since s6-svscan already provides such respawn
capabilities ? ;-)

there is actually NO need for a "correct" working init implementation
to provide respawn capabilities at all IMO.
this can easily be done in/by a subprocess and has 2 advantages:

- it simplifies the init implementation

- process #1 is the default subprocess reaper on any unix
  implementation and hence a lot of terminated zombie subprocesses
  get assigned to it, subprocesses that were not started by it.
  if it has respawn capabilities it has to find out if any of these
  recently assigned but elsewhere terminated subprocesses is one of its
  own children to be respawned. if it has lots of services to respawn
  this means lots of unnecessary work that could also be done
  in/by a subprocess as well.

when do you kill a non supervised process running with UID 0
"accidentally" ? when calling kill ( -1, SIGTERM ) ?
the kernel protects special/important processes in this case from
being killed "accidentally", that's true.
but where do we usually see that ? in the shutdown stage, i guess.
and that's exactly where one wants to kill all processes with PID > 1
(sometimes excluding the calling process since it has to complete
more tasks). or when going into single user mode.

so this looks like a rather artificial and constructed argument for
the necessity of respawn functionality in an init implementation IMO.