Re: [gentoo-dev] init script guidelines
On Wed, 2005-08-31 at 08:13 +0100, Roy Marples wrote:
> Attached is a patch to baselayout-1.12.0_pre6-r3 that allows this.
> Basically when an init script calls start-stop-daemon --start then we
> log what it started (and hopefully a pidfile) in
> ${svcdir}/daemons/${myservice}

in pre7 :)

--
Roy Marples <[EMAIL PROTECTED]>
Gentoo Linux Developer
Re: [gentoo-dev] init script guidelines
maillog: 31/08/2005-09:05:51(+0100): Roy Marples types
> On Wed, 2005-08-31 at 08:13 +0100, Roy Marples wrote:
> > Attached is a patch to baselayout-1.12.0_pre6-r3 that allows this.
> > Basically when an init script calls start-stop-daemon --start then we
> > log what it started (and hopefully a pidfile) in
> > ${svcdir}/daemons/${myservice}
>
> Forgot to attach a patch for depscan.sh

Not related, but why not apply this as well, while you're at it:

--- /sbin/depscan.sh 2005-08-25 17:28:51.0 +0900
+++ /sbin/depscan.sh 2005-08-31 17:21:37.0 +0900
@@ -1,7 +1,7 @@
 #!/bin/bash
 # Copyright 1999-2004 Gentoo Foundation
 # Distributed under the terms of the GNU General Public License v2
-# $Header$
+# $Header: $

 source /etc/init.d/functions.sh

--
Georgi Georgiev
Depart in pieces, i.e., split.
[EMAIL PROTECTED]
+81(90)2877-8845

--
gentoo-dev@gentoo.org mailing list
Re: [gentoo-dev] init script guidelines
On Wed, 2005-08-31 at 08:13 +0100, Roy Marples wrote:
> Attached is a patch to baselayout-1.12.0_pre6-r3 that allows this.
> Basically when an init script calls start-stop-daemon --start then we
> log what it started (and hopefully a pidfile) in
> ${svcdir}/daemons/${myservice}

Forgot to attach a patch for depscan.sh

Roy

--- depscan.sh 2005-08-17 22:04:34.0 +0100
+++ /sbin/depscan.sh 2005-08-31 06:25:11.0 +0100
@@ -16,7 +16,7 @@
 	fi
 fi

-for x in softscripts snapshot options \
+for x in softscripts snapshot options daemons \
 	started starting inactive stopping failed \
 	exclusive exitcodes ; do
 	if [[ ! -d "${svcdir}/${x}" ]] ; then
Re: [gentoo-dev] init script guidelines
On Tue, 2005-08-23 at 16:09 +0200, Paul de Vrieze wrote:
> What I would really like to see in the init system is a way that
> initscripts can check whether the services they are responsible for are
> still running and then adjust their status accordingly, along with some
> nice output. This would then allow the execution of rc-status to give
> proper information of actually running daemons, and the "rc" command the
> possibility to actually bring online all daemons that should be running.
>
> Paul

Attached is a patch to baselayout-1.12.0_pre6-r3 that allows this.

Basically when an init script calls start-stop-daemon --start then we log what it started (and hopefully a pidfile) in ${svcdir}/daemons/${myservice}

When its status is asked for (either init.d/foo status or rc-status) then we load this daemon file and check to see if the given daemons are still running. If not then we call init.d/foo stop. We do this instead of just marking the daemon as stopped in case there is any clean-up code that needs to be run by the init script.

For this to work well, start-stop-daemon needs to be used correctly, and for starting the daemon, not just for stopping it (as most init scripts seem to do). sshd is a popular init script and is on most Gentoo'ers' systems, so I've attached a patch to show how an init script should use start-stop-daemon so this works correctly.

What do people think about this? Is this worthwhile, and is it worth fixing all the init scripts in the tree to use start-stop-daemon correctly AND for starting up?

Thanks

Roy

--- rc-status 2005-08-01 21:26:00.0 +0100
+++ /bin/rc-status 2005-08-31 07:57:15.0 +0100
@@ -31,6 +31,7 @@

 # grab settings from conf.d/rc
 source /etc/conf.d/rc
+source "${svclib}/sh/rc-daemon.sh"

 # Parse command line options
 #
@@ -157,10 +158,19 @@
 # Now collect information about the status of the various services; whether
 #
 # they're started, broken, or failed. Put all of this into arrays.
 #
-# Read services from ${svcdir}/{started,failed,broken}
+if [[ -x ${svcdir}/started ]]; then
+	started=$(ls ${svcdir}/started)
+	# If we're root then update service statuses incase any naughty daemons
+	# stopped running without our say so
+	if [[ ${EUID} == 0 ]]; then
+		for service in ${started}; do
+			update_service_status "${service}"
+		done
+		started=$(ls ${svcdir}/started)
+	fi
+fi
 [[ -x ${svcdir}/starting ]] && starting=$(ls ${svcdir}/starting)
 [[ -x ${svcdir}/inactive ]] && inactive=$(ls ${svcdir}/inactive)
-[[ -x ${svcdir}/started ]] && started=$(ls ${svcdir}/started)
 [[ -x ${svcdir}/stopping ]] && stopping=$(ls ${svcdir}/stopping)

--- runscript.sh 2005-08-21 18:08:24.0 +0100
+++ /sbin/runscript.sh 2005-08-31 07:59:30.0 +0100
@@ -413,6 +413,10 @@
 	# to work with the printed " * status: foo".
 	local efunc="" state=""

+	# If we are effectively root, check to see if required daemons are running
+	# and update our status accordingly
+	[[ ${EUID} == 0 ]] && update_service_status "${myservice}"
+
 	if service_starting "${myservice}" ; then
 		efunc="einfo"
 		state="starting"

--- rc-daemon.sh 2005-08-30 07:22:39.0 +0100
+++ /lib/rcscripts/sh/rc-daemon.sh 2005-08-31 07:53:14.0 +0100
@@ -19,6 +19,7 @@
 RC_GOT_DAEMON="yes"

 [[ ${RC_GOT_FUNCTIONS} != "yes" ]] && source /sbin/functions.sh
+[[ ${RC_GOT_SERVICES} != "yes" ]] && source "${svclib}/sh/rc-services.sh"

 RC_RETRY_KILL="no"
 RC_RETRY_TIMEOUT=1
@@ -285,14 +286,45 @@
 	return "${retval}"
 }

+# void update_service_status(char *service)
+#
+# Loads the service state file and ensures that all listed daemons are still
+# running - hopefully on their correct pids too
+# If not, we stop the service
+update_service_status() {
+	local service="$1" daemonfile="${svcdir}/daemons/$1" i
+	local -a RC_DAEMONS=() RC_PIDFILES=()
+
+	# We only care about marking started services as stopped if the daemon(s)
+	# for it are no longer running
+	! service_started "${service}" && return
+	[[ ! -f ${daemonfile} ]] && return
+
+	# OK, now check that every daemon launched is active
+	# If the --start command was any good a pidfile was specified too
+	source "${daemonfile}"
+	for (( i=0; i<[EMAIL PROTECTED]; i++ )); do
+		if ! is_daemon_running ${RC_DAEMONS[i]} "${RC_PIDFILES[i]}" ; then
+			if [[ -e "/etc/init.d/${service}" ]]; then
+				/etc/init.d/"${service}" stop &>/dev/null
+				break
+			fi
+		fi
+	done
+}
+
 # int start-stop-daemon(...)
 #
 # Provide a wrapper to start-stop-daemon
 # Return the result of start_daemon or stop_daemon depending on
 # how we are called
 start-stop-daemon() {
-	local args=$( requote "$@" )
-	local cmd pidfile pid stopping signal nothing=false
+	local args=$( requote "$@" ) r
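
The patch calls is_daemon_running but its body is not shown in this message. Purely as an illustration (this is not the baselayout implementation), a pidfile liveness check could look something like the sketch below. The name pidfile_alive is hypothetical, and the sketch assumes the pidfile holds a single numeric PID on its first line.

```sh
# Hypothetical helper, not part of baselayout.
# pidfile_alive PIDFILE
# Succeeds if PIDFILE is readable and its first line is the PID of a
# process that can currently be signalled.
pidfile_alive() {
	[ -r "$1" ] || return 1
	pid=$(head -n 1 "$1")
	case $pid in
		'' | *[!0-9]*) return 1 ;;	# empty or not a plain number
	esac
	# Signal 0 delivers nothing; it only tests whether the PID can be
	# signalled. It also fails for a live process owned by another user,
	# which is one reason to run such checks as root (the patch gates its
	# status update on EUID == 0).
	kill -0 "$pid" 2>/dev/null
}
```

A caller might use it as "pidfile_alive /var/run/sshd.pid || /etc/init.d/sshd stop", where the pidfile path is only an assumed example. A check like this cannot tell whether the PID still belongs to the same daemon, which is presumably why the patch records the daemon name alongside the pidfile.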
Re: [gentoo-dev] init script guidelines
On Tuesday 19 July 2005 20:00, Roy Marples wrote:
> On Tue, 2005-07-19 at 12:42 -0400, Eric Brown wrote:
> > The real problem is not that the daemons don't return errors, but that our init scripts do not make reasonable attempts to verify service startup. If a Gentoo init script claims that a service started, it should make an effort to check that the processes are actually running shortly after the script is run, even if start-stop-daemon says the parent process initialized. Relying on the return value of start-stop-daemon is simply insufficient for some services.
>
> I agree.
>
> In fact, rc-services.sh (/lib/rcscripts/sh) has been totally re-written for the baselayout-1.12.x branch. It now intercepts calls to start-stop-daemon and checks if the daemon is still active after a default time of 0.1 (adjustable) seconds. If not, then we assume the daemon failed. This solves many existing bugs :)
>
> Also, we kill any rogue processes and do other such checks when a stop call to start-stop-daemon is made - which is handy for when asterisk fails to start and leaves mpg123 processes lying around :)
>
> Check it out when baselayout-1.12.0pre1 hits portage!
>
> Caveat: some init scripts abuse start-stop-daemon. One example is the courier scripts, which all pass the env program as a daemon. This is easily worked around, but we fail badly if env then calls a shell script which in turn launches a daemon. Of all the server stuff I run, only courier has this issue - but there may be other programs too. Basically start-stop-daemon should only call daemons!

What I would really like to see in the init system is a way that initscripts can check whether the services they are responsible for are still running and then adjust their status accordingly, along with some nice output. This would then allow rc-status to give proper information about the daemons that are actually running, and give the "rc" command the possibility to actually bring online all daemons that should be running.

Paul

--
Paul de Vrieze
Gentoo Developer
Mail: [EMAIL PROTECTED]
Homepage: http://www.devrieze.net
RE: [gentoo-dev] init script guidelines
On Tue, 2005-07-19 at 23:53 +0200, Martin Schlemmer wrote:
> I know Roy already did the sleep check in rc-services.sh which is small,
> and I think fairly acceptable

0.1 seconds by default. This is adjustable in /etc/conf.d/rc

Roy
Re: [gentoo-dev] init script guidelines
Roy Marples wrote:
> On Tue, 2005-07-19 at 12:42 -0400, Eric Brown wrote:
> > The real problem is not that the daemons don't return errors, but that our init scripts do not make reasonable attempts to verify service startup. If a Gentoo init script claims that a service started, it should make an effort to check that the processes are actually running shortly after the script is run, even if start-stop-daemon says the parent process initialized. Relying on the return value of start-stop-daemon is simply insufficient for some services.
>
> I agree.
>
> In fact, rc-services.sh (/lib/rcscripts/sh) has been totally re-written for the baselayout-1.12.x branch. It now intercepts calls to start-stop-daemon and checks if the daemon is still active after a default time of 0.1 (adjustable) seconds. If not, then we assume the daemon failed. This solves many existing bugs :)
>
> Also, we kill any rogue processes and do other such checks when a stop call to start-stop-daemon is made - which is handy for when asterisk fails to start and leaves mpg123 processes lying around :)
>
> Check it out when baselayout-1.12.0pre1 hits portage!
>
> Caveat: some init scripts abuse start-stop-daemon. One example is the courier scripts, which all pass the env program as a daemon. This is easily worked around, but we fail badly if env then calls a shell script which in turn launches a daemon. Of all the server stuff I run, only courier has this issue - but there may be other programs too. Basically start-stop-daemon should only call daemons!
>
> http://bugs.gentoo.org/show_bug.cgi?id=98745
>
> Roy

What about defining two additional functions, check_startup() and check_shutdown(), intended to be filled in by the package maintainer? The rc scripts can call these to check whether a service has started/stopped or not. If not, they wait and retry until a timeout is reached.

This also opens the road to centralized policies for the waits between checks, like:

(1,1,1,1,1,1)
(1,2,3,4,5,6)
(1,2,4,8,16,32)

and other nice stuff.

Francesco
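
A minimal sketch of how such a wait policy could drive a per-package check, assuming the schedule is given as a space-separated list of delays in seconds. The helper name retry_with_schedule is made up for illustration; check_startup is the hook proposed above and would have to be supplied by the init script.

```sh
# Hypothetical helper, not part of baselayout.
# retry_with_schedule "DELAY1 DELAY2 ..." COMMAND [ARGS...]
# Sleeps for each delay in turn and runs COMMAND after each sleep.
# Returns 0 as soon as COMMAND succeeds, 1 once the schedule is exhausted.
retry_with_schedule() {
	schedule=$1
	shift
	for delay in $schedule; do
		sleep "$delay"
		if "$@"; then
			return 0
		fi
	done
	return 1
}
```

With this, the three policies above become the schedules "1 1 1 1 1 1", "1 2 3 4 5 6" and "1 2 4 8 16 32", for example: retry_with_schedule "1 2 4 8 16 32" check_startup. The total timeout is simply the sum of the delays, so the doubling schedule gives up after 63 seconds.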
RE: [gentoo-dev] init script guidelines
On Tue, 2005-07-19 at 14:40 -0400, Chris Gianelloni wrote:
> On Tue, 2005-07-19 at 14:08 -0400, Eric Brown wrote:
> > My point is that Snort and Apache are not alone in this, so I suppose quite a few upstream developers just disagree with us on what proper initialization means. Why should our users suffer?
>
> They shouldn't, but that doesn't mean implementing some half-baked hack to resolve the situation. It might be better to instead patch the daemon in question and send the patches upstream. Upstream developers (usually) are much more willing to make changes when you've done the work for them... ;]

I know Roy already did the sleep check in rc-services.sh, which is small and I think fairly acceptable, but like Mike said, you cannot make it longer and then do it for all, as some arches are just too slow, and I'm going to guess we have less than 10% of services with this issue?

Personally I think the issue should be taken on a per-package basis, and if somebody sees an issue, open a bug against snort/apache/whatever to do a timeout, and then check some way or other if it's actually started.

For the developer awareness issue ... it's not always such an open/shut case. I can't remember what had this issue, but some daemon only displayed this issue on slower boxes, and not on the faster ones, so it really will depend on what type of hardware the developer has. So yeah, better awareness by adding a section to the developer manual or something to the test for new developers might help, but it is not foolproof.

--
Martin Schlemmer
Re: [gentoo-dev] init script guidelines
On Tue, 2005-07-19 at 16:43 -0400, Michael Cummings wrote:
> not to detract from the discussion, but...anyone else notice this?

He quoted me. His text was above mine. People have met me. They know I exist. Though Eric might be a figment of my shattered subconscious psyche. Who knows? :P

> On Tue, 19 Jul 2005 14:40:01 -0400
> Chris Gianelloni <[EMAIL PROTECTED]> wrote:
>
> > They shouldn't, but that doesn't mean implementing some half-baked hack to resolve the situation. It might be better to instead patch the daemon in question and send the patches upstream. Upstream developers (usually) are much more willing to make changes when you've done the work for them... ;]
>
> On Tue, 19 Jul 2005 15:39:16 -0400
> "Eric Brown" <[EMAIL PROTECTED]> wrote:
>
> > They shouldn't, but that doesn't mean implementing some half-baked hack to resolve the situation. It might be better to instead patch the daemon in question and send the patches upstream. Upstream developers (usually) are much more willing to make changes when you've done the work for them... ;]
>
> I'm beginning to suspect Eric and Chris are the same person. Prove they aren't - show evidence of them independently in the same room at the same time ;)
>
> (and being a mid-stream developer, I know *I* like working patches more than 'fix your junk, it broke' reports)

--
Chris Gianelloni
Release Engineering - Strategic Lead/QA Manager
Games - Developer
Gentoo Linux
Re: [gentoo-dev] init script guidelines
not to detract from the discussion, but...anyone else notice this?

On Tue, 19 Jul 2005 14:40:01 -0400
Chris Gianelloni <[EMAIL PROTECTED]> wrote:

> They shouldn't, but that doesn't mean implementing some half-baked hack to resolve the situation. It might be better to instead patch the daemon in question and send the patches upstream. Upstream developers (usually) are much more willing to make changes when you've done the work for them... ;]

On Tue, 19 Jul 2005 15:39:16 -0400
"Eric Brown" <[EMAIL PROTECTED]> wrote:

> They shouldn't, but that doesn't mean implementing some half-baked hack to resolve the situation. It might be better to instead patch the daemon in question and send the patches upstream. Upstream developers (usually) are much more willing to make changes when you've done the work for them... ;]

I'm beginning to suspect Eric and Chris are the same person. Prove they aren't - show evidence of them independently in the same room at the same time ;)

(and being a mid-stream developer, I know *I* like working patches more than 'fix your junk, it broke' reports)
Re: [gentoo-dev] init script guidelines
On Tuesday 19 July 2005 02:08 pm, Eric Brown wrote:
> I do see how timing could be an issue for sleeps, but I would personally
> much rather have a timeout variable in conf.d somewhere rather than no
> check at all.

because you're only looking at one side of the race condition

your check goes to sleep for 3 seconds ... then the service starts up but because it's a slow CPU, it takes 10 seconds to get to the config file parsing where it fails and exits silently ... when the check wakes back up it goes 'hey, service is still running, all is good'
-mike
RE: [gentoo-dev] init script guidelines
Not everyone can patch them; more people would be capable of writing half-baked hacks that resolve most of the issues. Anyway, I guess the new baselayout sounds promising here.

> > My point is that Snort and Apache are not alone in this, so I suppose quite a few upstream developers just disagree with us on what proper initialization means. Why should our users suffer?
>
> They shouldn't, but that doesn't mean implementing some half-baked hack to resolve the situation. It might be better to instead patch the daemon in question and send the patches upstream. Upstream developers (usually) are much more willing to make changes when you've done the work for them... ;]
RE: [gentoo-dev] init script guidelines
On Tue, 2005-07-19 at 14:08 -0400, Eric Brown wrote:
> My point is that Snort and Apache are not alone in this, so I suppose
> quite a few upstream developers just disagree with us on what proper
> initialization means. Why should our users suffer?

They shouldn't, but that doesn't mean implementing some half-baked hack to resolve the situation. It might be better to instead patch the daemon in question and send the patches upstream. Upstream developers (usually) are much more willing to make changes when you've done the work for them... ;]

--
Chris Gianelloni
Release Engineering - Strategic Lead/QA Manager
Games - Developer
Gentoo Linux
Re: [gentoo-dev] init script guidelines
Eric Brown wrote:
> Services that use Gentoo init scripts often report a status of [started] or [OK] even though they fail to start. The most recent bug like this that I've found is with snort. If you have a bad rule, snort will initialize, the rc-scripts will give it an [OK] status, and then it will die once it parses the rules.
>
> The real problem is not that the daemons don't return errors, but that our init scripts do not make reasonable attempts to verify service startup. If a Gentoo init script claims that a service started, it should make an effort to check that the processes are actually running shortly after the script is run, even if start-stop-daemon says the parent process initialized. Relying on the return value of start-stop-daemon is simply insufficient for some services.
>
> I am aware that there are services that can monitor the status of other services (app-admin/mon?) but I think this issue is a little different. If an ebuild developer is aware that an error condition can commonly occur shortly after a daemon initializes, why not attempt to catch those errors? Most of them could probably be caught by simply checking to see if the process is still running shortly after the script is run.
>
> I propose increasing developer awareness of this problem, perhaps through some formal guidelines for ebuild developers. At the very least, I would like to see these bugs being acknowledged in bugs.gentoo.org instead of getting the same old upstream/it's not our fault response. We are responsible for our init scripts, and they are important to our users.
>
> I have 2 ideas for the actual implementation:
>
> 1) Some kind of check() function in the init.d script, or a generic check() function that just checks with ps | grep. This might typically be called after having the init script sleep for a certain amount of time.
>
> 2) Some kind of special init script that checks registered daemons after all services have started (i.e. it depends on all daemons, or they are put into its config file). With this scheme we could avoid excessive sleeping during startup (to keep it fast), and perhaps even keep using service-specific check() functions.
>
> Does anyone else think this idea is worth looking into?

http://bugs.gentoo.org/show_bug.cgi?id=90471

We managed this by checking for the socket that mysql always creates on *nix. But there is a timeout of five seconds: if there is neither an error message nor a socket in that time, the script assumes the server started. I'm the first to say that this needs to be improved, but it's a start.
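
As a rough sketch of the socket wait described above (not the actual mysql init script), a polling loop with a timeout could look like this. The function name wait_for_socket is hypothetical, and the loop only reports whether the socket appeared; what to assume when the timeout expires is left to the caller.

```sh
# Hypothetical helper, not the actual mysql init script.
# wait_for_socket PATH TIMEOUT
# Polls once a second until PATH is a Unix socket or TIMEOUT seconds
# have elapsed. Returns 0 if the socket exists, non-zero otherwise.
wait_for_socket() {
	sock=$1
	remaining=$2
	while [ "$remaining" -gt 0 ]; do
		[ -S "$sock" ] && return 0
		sleep 1
		remaining=$((remaining - 1))
	done
	# One last look after the final sleep.
	[ -S "$sock" ]
}
```

An init script might call it as "wait_for_socket /var/run/mysqld/mysqld.sock 5", where the socket path is an assumed default and should come from the server configuration. The same caveat raised elsewhere in this thread applies: five seconds may be too short on slow hardware.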
RE: [gentoo-dev] init script guidelines
A few responses: (Please forgive the lack of normal formatting)

1) To Chris Gianelloni

I really do agree that it's silly for a daemon to lie about its initialization status. However, after actually having taken some of these issues upstream (in particular Apache 1.3), I realized that the upstream devs don't always consider these to be bugs. In Apache's case, it's a bug, but one that's never going to be fixed in 1.3 (2.0 supposedly fixes it). I think there was one case where pure-ftpd actually fixed one of these bugs when I reported it.

My point is that Snort and Apache are not alone in this, so I suppose quite a few upstream developers just disagree with us on what proper initialization means. Why should our users suffer?

2) To Mike Frysinger

Most of these services are pretty common, and the suckage is usually limited to this area of initialization =)

I do see how timing could be an issue for sleeps, but I would personally much rather have a timeout variable in conf.d somewhere rather than no check at all. I would also much rather have a simple check be performed that produced false positives itself (which is what the init scripts are doing now), as long as it cut down on the total number of false positives.

3) To anyone else

So far it looks like developer awareness is the best we can do? What about making standard functions or check services available to help developers who are aware and need to use them? Even if developers just become willing to add checks, that would be great. Right now most devs simply rely on upstream (although I think upstream should certainly be a part of each case).
Re: [gentoo-dev] init script guidelines
On Tue, 2005-07-19 at 12:42 -0400, Eric Brown wrote:
> The real problem is not that the daemons don't return errors, but that our init scripts do not make reasonable attempts to verify service startup. If a Gentoo init script claims that a service started, it should make an effort to check that the processes are actually running shortly after the script is run, even if start-stop-daemon says the parent process initialized. Relying on the return value of start-stop-daemon is simply insufficient for some services.

I agree.

In fact, rc-services.sh (/lib/rcscripts/sh) has been totally re-written for the baselayout-1.12.x branch. It now intercepts calls to start-stop-daemon and checks if the daemon is still active after a default time of 0.1 (adjustable) seconds. If not, then we assume the daemon failed. This solves many existing bugs :)

Also, we kill any rogue processes and do other such checks when a stop call to start-stop-daemon is made - which is handy for when asterisk fails to start and leaves mpg123 processes lying around :)

Check it out when baselayout-1.12.0pre1 hits portage!

Caveat: some init scripts abuse start-stop-daemon. One example is the courier scripts, which all pass the env program as a daemon. This is easily worked around, but we fail badly if env then calls a shell script which in turn launches a daemon. Of all the server stuff I run, only courier has this issue - but there may be other programs too. Basically start-stop-daemon should only call daemons!

http://bugs.gentoo.org/show_bug.cgi?id=98745

Roy
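
The start-then-recheck idea can be illustrated with a small stand-alone sketch. This is not the rc-services.sh code: it does not wrap start-stop-daemon, and it assumes the command stays in the foreground (a daemon that forks and detaches would need a pidfile instead). The names start_and_verify and verified_pid are hypothetical, and the fractional sleep assumes a sleep that accepts decimals, as GNU sleep does.

```sh
# Hypothetical helper, not the baselayout implementation.
# start_and_verify DELAY COMMAND [ARGS...]
# Launches COMMAND in the background, waits DELAY seconds, then checks
# that it is still running. On success the PID is left in verified_pid.
start_and_verify() {
	delay=$1
	shift
	verified_pid=
	"$@" &
	pid=$!
	sleep "$delay"
	if kill -0 "$pid" 2>/dev/null; then
		verified_pid=$pid
		return 0
	fi
	# The process already exited: collect its status, report failure.
	wait "$pid" 2>/dev/null
	return 1
}
```

For example, "start_and_verify 0.1 some-daemon --foreground" (a made-up command line) would fail if the process died within the first 0.1 seconds. As the replies in this thread point out, a daemon that dies after the delay has passed is still reported as started, so the delay trades boot time against false positives.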
Re: [gentoo-dev] init script guidelines
On Tuesday 19 July 2005 12:42 pm, Eric Brown wrote:
> The real problem is not that the daemons don't return errors, but that
> our init scripts do not make reasonable attempts to verify service startup.

i'd disagree ... if a service sucks, it sucks

adding some code to try and guess whether the service actually started is a roundabout (and by no means fool proof) way of doing things ... it may give correct results sometimes, but i imagine it'll also be susceptible to false positives

> If a Gentoo init script claims that a service started, it should make an
> effort to check that the processes are actually running shortly after the
> script is run

how do you define 'short' ? really anything that relies on some timeout value like this is a flawed design ... just cause your smokin fast amd64 should complete in .1 seconds doesnt mean my not-very-smokin-fast-at-all arm netwinder can complete inside of 3 seconds

> Relying on the return
> value of start-stop-daemon is simply insufficient for some services.

then those services should not be using ssd

> I propose increasing developer awareness of this problem, perhaps
> through some formal guidelines for ebuild developers.

this seems to be the only feasible approach (and one i'm all for)
-mike
Re: [gentoo-dev] init script guidelines
On Tue, 2005-07-19 at 12:42 -0400, Eric Brown wrote:
> Services that use Gentoo init scripts often report a status of [started] or [OK] even though they fail to start. The most recent bug like this that I've found is with snort. If you have a bad rule, snort will initialize, the rc-scripts will give it an [OK] status, and then it will die once it parses the rules.

So snort shouldn't be giving the OK until it really is OK.

> The real problem is not that the daemons don't return errors, but that our init scripts do not make reasonable attempts to verify service startup. If a Gentoo init script claims that a service started, it should make an effort to check that the processes are actually running shortly after the script is run, even if start-stop-daemon says the parent process initialized. Relying on the return value of start-stop-daemon is simply insufficient for some services.

Not really. An init script is simply a script. It doesn't guarantee anything other than what the service told it. If a service is returning status codes when it really hasn't completed its initialization, that is a bug in that service, not in the init script code. While code might need to be adjusted in the init script, this will most likely require patches to the upstream sources.

> I am aware that there are services that can monitor the status of other services (app-admin/mon?) but I think this issue is a little different. If an ebuild developer is aware that an error condition can commonly occur shortly after a daemon initializes, why not attempt to catch those errors? Most of them could probably be caught by simply checking to see if the process is still running shortly after the script is run.

I agree with you that we should catch the errors, but running another check is simply a waste of time. The service should not ever show a completed state until it is completed. It shouldn't ever be like "Yes, snort worked.. oh wait, no it didn't." That is even more confusing for users.

> I propose increasing developer awareness of this problem, perhaps through some formal guidelines for ebuild developers. At the very least, I would like to see these bugs being acknowledged in bugs.gentoo.org instead of getting the same old upstream/it's not our fault response. We are responsible for our init scripts, and they are important to our users.

You really need to take this up with the developers in question, as this is not a global matter, but really a matter with specific packages. Those are bugs in those packages. If the ebuild maintainers are refusing to resolve issues in the init scripts, which are definitely Gentoo works, please take it up with user relations or attempt to provide a fix for the problem.

> I have 2 ideas for the actual implementation:
>
> 1) Some kind of check() function in the init.d script, or a generic check() function that just checks with ps | grep. This might typically be called after having the init script sleep for a certain amount of time.

I would object to this. Having a function to check the status of a service for all of the possible services, when it is only a few that are showing this error, is a bad idea. It adds extra load on all developers that have any init scripts, and is unnecessary in most cases.

> 2) Some kind of special init script that checks registered daemons after all services have started (i.e. it depends on all daemons, or they are put into its config file). With this scheme we could avoid excessive sleeping during startup (to keep it fast), and perhaps even keep using service-specific check() functions.

This would require much more knowledge on the end-user's part. Plus, it will need to be aware of init script dependencies. All in all, it sounds like a bad patch for a situation.

--
Chris Gianelloni
Release Engineering - Strategic Lead/QA Manager
Games - Developer
Gentoo Linux
[gentoo-dev] init script guidelines
Services that use Gentoo init scripts often report a status of [started] or [OK] even though they fail to start. The most recent bug like this that I've found is with snort. If you have a bad rule, snort will initialize, the rc-scripts will give it an [OK] status, and then it will die once it parses the rules.

The real problem is not that the daemons don't return errors, but that our init scripts do not make reasonable attempts to verify service startup. If a Gentoo init script claims that a service started, it should make an effort to check that the processes are actually running shortly after the script is run, even if start-stop-daemon says the parent process initialized. Relying on the return value of start-stop-daemon is simply insufficient for some services.

I am aware that there are services that can monitor the status of other services (app-admin/mon?) but I think this issue is a little different. If an ebuild developer is aware that an error condition can commonly occur shortly after a daemon initializes, why not attempt to catch those errors? Most of them could probably be caught by simply checking to see if the process is still running shortly after the script is run.

I propose increasing developer awareness of this problem, perhaps through some formal guidelines for ebuild developers. At the very least, I would like to see these bugs being acknowledged in bugs.gentoo.org instead of getting the same old upstream/it's not our fault response. We are responsible for our init scripts, and they are important to our users.

I have 2 ideas for the actual implementation:

1) Some kind of check() function in the init.d script, or a generic check() function that just checks with ps | grep. This might typically be called after having the init script sleep for a certain amount of time.

2) Some kind of special init script that checks registered daemons after all services have started (i.e. it depends on all daemons, or they are put into its config file). With this scheme we could avoid excessive sleeping during startup (to keep it fast), and perhaps even keep using service-specific check() functions.

Does anyone else think this idea is worth looking into?