Re: [gentoo-dev] init script guidelines

2005-08-31 Roy Marples
On Wed, 2005-08-31 at 08:13 +0100, Roy Marples wrote:
> Attached is a patch to baselayout-1.12.0_pre6-r3 that allows this.
> Basically when an init script calls start-stop-daemon --start then we
> log what it started (and hopefully a pidfile) in
> ${svcdir}/daemons/${myservice}

in pre7 :)

-- 
Roy Marples <[EMAIL PROTECTED]>
Gentoo Linux Developer




Re: [gentoo-dev] init script guidelines

2005-08-31 Georgi Georgiev
maillog: 31/08/2005-09:05:51(+0100): Roy Marples types
> On Wed, 2005-08-31 at 08:13 +0100, Roy Marples wrote:
> > Attached is a patch to baselayout-1.12.0_pre6-r3 that allows this.
> > Basically when an init script calls start-stop-daemon --start then we
> > log what it started (and hopefully a pidfile) in
> > ${svcdir}/daemons/${myservice}
> 
> Forgot to attach a patch for depscan.sh

Not related, but why not apply this as well, while you're at it:

--- /sbin/depscan.sh	2005-08-25 17:28:51.0 +0900
+++ /sbin/depscan.sh	2005-08-31 17:21:37.0 +0900
@@ -1,7 +1,7 @@
 #!/bin/bash
 # Copyright 1999-2004 Gentoo Foundation
 # Distributed under the terms of the GNU General Public License v2
-# $Header$
+# $Header: $
 
 source /etc/init.d/functions.sh
 
-- 
 /   Georgi Georgiev/ Depart in pieces, i.e., split./
\ [EMAIL PROTECTED]\   \
 /  +81(90)2877-8845/   /
-- 
gentoo-dev@gentoo.org mailing list



Re: [gentoo-dev] init script guidelines

2005-08-31 Roy Marples
On Wed, 2005-08-31 at 08:13 +0100, Roy Marples wrote:
> Attached is a patch to baselayout-1.12.0_pre6-r3 that allows this.
> Basically when an init script calls start-stop-daemon --start then we
> log what it started (and hopefully a pidfile) in
> ${svcdir}/daemons/${myservice}

Forgot to attach a patch for depscan.sh

Roy
--- depscan.sh	2005-08-17 22:04:34.0 +0100
+++ /sbin/depscan.sh	2005-08-31 06:25:11.0 +0100
@@ -16,7 +16,7 @@
 	fi
 fi
 
-for x in softscripts snapshot options \
+for x in softscripts snapshot options daemons \
 	started starting inactive stopping failed \
 	exclusive exitcodes ; do
 	if [[ ! -d "${svcdir}/${x}" ]] ; then


Re: [gentoo-dev] init script guidelines

2005-08-31 Roy Marples
On Tue, 2005-08-23 at 16:09 +0200, Paul de Vrieze wrote:
> What I would really like to see in the init system is a way that 
> initscripts can check whether the services they are responsible for are 
> still running and then adjust their status accordingly, along with some 
> nice output. This would then allow the execution of rc-status to give 
> proper information of actually running daemons, and the "rc" command the 
> possibility to actually bring online all daemons that should be running.
> 
> Paul
> 

Attached is a patch to baselayout-1.12.0_pre6-r3 that allows this.
Basically when an init script calls start-stop-daemon --start then we
log what it started (and hopefully a pidfile) in
${svcdir}/daemons/${myservice}

When its status is asked for (either init.d/foo status or rc-status),
we load this daemon file and check whether the given daemons are
still running. If not, we call init.d/foo stop. We do this instead
of just marking the daemon as stopped in case there is any clean-up code
that needs to be run by the init script.
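A minimal sketch of the record-then-verify idea in POSIX sh. It assumes a simplified daemon file with one "daemon pidfile" pair per line; the real baselayout file is sourced as the bash arrays RC_DAEMONS and RC_PIDFILES, and the names record_daemon, pid_alive and service_daemons_ok are hypothetical.

```shell
# Hypothetical sketch of recording and re-checking a service's daemons.
# Assumed layout: ${svcdir}/daemons/<service> holds "daemon pidfile" lines.
svcdir="${svcdir:-/tmp/svcdir-demo}"

# record_daemon SERVICE DAEMON PIDFILE: remember a daemon we just started.
record_daemon() {
	mkdir -p "${svcdir}/daemons"
	printf '%s %s\n' "$2" "$3" >> "${svcdir}/daemons/$1"
}

# pid_alive PIDFILE: succeed if PIDFILE holds the pid of a live process.
# kill -0 sends no signal; it only tests that the pid exists and that we
# are allowed to signal it (one reason to run such checks as root).
pid_alive() {
	local pid
	pid=$(cat "$1" 2>/dev/null)
	case "${pid}" in
		''|*[!0-9]*) return 1 ;;   # missing, empty or non-numeric pidfile
	esac
	kill -0 "${pid}" 2>/dev/null
}

# service_daemons_ok SERVICE: fail if any recorded daemon has gone away.
service_daemons_ok() {
	local daemon pidfile
	[ -f "${svcdir}/daemons/$1" ] || return 0   # nothing recorded, nothing to check
	while read -r daemon pidfile; do
		pid_alive "${pidfile}" || return 1
	done < "${svcdir}/daemons/$1"
	return 0
}
```

A status check would then run something like `service_daemons_ok foo || /etc/init.d/foo stop`, mirroring the choice of calling stop rather than just flipping the state, so that any clean-up code in the init script still runs.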

For this to work well, start-stop-daemon needs to be used correctly for
starting as well, not just for stopping (which is all most init scripts seem
to use it for). sshd is a popular init script and is on most Gentooers'
systems, so I've attached a patch to show how init scripts should use
start-stop-daemon so this works correctly.

What do people think about this? Is it worthwhile fixing all the init
scripts in the tree to use start-stop-daemon correctly AND for starting
up?

Thanks

Roy
--- rc-status	2005-08-01 21:26:00.0 +0100
+++ /bin/rc-status	2005-08-31 07:57:15.0 +0100
@@ -31,6 +31,7 @@
 
 # grab settings from conf.d/rc
 source /etc/conf.d/rc
+source "${svclib}/sh/rc-daemon.sh"
 
 
 #  Parse command line options  #
@@ -157,10 +158,19 @@
 #  Now collect information about the status of the various services; whether   #
 #  they're started, broken, or failed.  Put all of this into arrays.   #
 
-# Read services from ${svcdir}/{started,failed,broken}
+if [[ -x ${svcdir}/started ]]; then
+	started=$(ls ${svcdir}/started)
+	# If we're root then update service statuses in case any naughty daemons
+	# stopped running without our say so
+	if [[ ${EUID} == 0 ]]; then
+		for service in ${started}; do
+			update_service_status "${service}"
+		done
+		started=$(ls ${svcdir}/started)
+	fi
+fi
 [[ -x ${svcdir}/starting ]] && starting=$(ls ${svcdir}/starting)
 [[ -x ${svcdir}/inactive ]] && inactive=$(ls ${svcdir}/inactive)
-[[ -x ${svcdir}/started ]] && started=$(ls ${svcdir}/started)
 [[ -x ${svcdir}/stopping ]] && stopping=$(ls ${svcdir}/stopping)
 
 
--- runscript.sh	2005-08-21 18:08:24.0 +0100
+++ /sbin/runscript.sh	2005-08-31 07:59:30.0 +0100
@@ -413,6 +413,10 @@
 	# to work with the printed " * status:  foo".
 	local efunc="" state=""
 
+	# If we are effectively root, check to see if required daemons are running
+	# and update our status accordingly
+	[[ ${EUID} == 0 ]] && update_service_status "${myservice}"
+
 	if service_starting "${myservice}" ; then
 		efunc="einfo"
 		state="starting"
--- rc-daemon.sh	2005-08-30 07:22:39.0 +0100
+++ /lib/rcscripts/sh/rc-daemon.sh	2005-08-31 07:53:14.0 +0100
@@ -19,6 +19,7 @@
 RC_GOT_DAEMON="yes"
 
 [[ ${RC_GOT_FUNCTIONS} != "yes" ]] && source /sbin/functions.sh
+[[ ${RC_GOT_SERVICES} != "yes" ]] && source "${svclib}/sh/rc-services.sh"
 
 RC_RETRY_KILL="no"
 RC_RETRY_TIMEOUT=1
@@ -285,14 +286,45 @@
 	return "${retval}"
 }
 
+# void update_service_status(char *service)
+#
+# Loads the service state file and ensures that all listed daemons are still
+# running - hopefully on their correct pids too
+# If not, we stop the service
+update_service_status() {
+	local service="$1" daemonfile="${svcdir}/daemons/$1" i
+	local -a RC_DAEMONS=() RC_PIDFILES=()
+
+	# We only care about marking started services as stopped if the daemon(s)
+	# for it are no longer running
+	! service_started "${service}" && return
+	[[ ! -f ${daemonfile} ]] && return
+
+	# OK, now check that every daemon launched is active
+	# If the --start command was any good a pidfile was specified too
+	source "${daemonfile}"
+	for (( i=0; i<${#RC_DAEMONS[@]}; i++ )); do
+		if ! is_daemon_running ${RC_DAEMONS[i]} "${RC_PIDFILES[i]}" ; then
+			if [[ -e "/etc/init.d/${service}" ]]; then
+				/etc/init.d/"${service}" stop &>/dev/null
+				break
+			fi
+		fi
+	done
+}
+
 # int start-stop-daemon(...)
 #
 # Provide a wrapper to start-stop-daemon
 # Return the result of start_daemon or stop_daemon depending on
 # how we are called
 start-stop-daemon() {
-	local args=$( requote "$@" )
-	local cmd pidfile pid stopping signal nothing=false
+	local args=$( requote "$@" ) r

Re: [gentoo-dev] init script guidelines

2005-08-23 Paul de Vrieze
On Tuesday 19 July 2005 20:00, Roy Marples wrote:
> On Tue, 2005-07-19 at 12:42 -0400, Eric Brown wrote:
> > The real problem is not that the daemons don't return errors, but
> > that our init scripts do not make reasonable attempts to verify
> > service startup.  If a Gentoo init script claims that a service
> > started, it should make an effort to check that the processes are
> > actually running shortly after the script is run, even if
> > start-stop-daemon says the parent process initialized.  Relying on
> > the return value of start-stop-daemon is simply insufficient for some
> > services.
>
> I agree.
>
> In fact, rc-services.sh (/lib/rcscripts/sh) has been totally rewritten
> for the baselayout-1.12.x branch. It now intercepts calls to
> start-stop-daemon and checks if the daemon is still active after a
> default time of 0.1 (adjustable) seconds. If not, then we assume the
> daemon failed. This solves many existing bugs :)
>
> Also, we kill any rogue processes and perform other such checks when a
> stop call to start-stop-daemon is made - which is handy for when
> asterisk fails to start and leaves mpg123 processes lying around :)
>
> Check it out when baselayout-1.12.0pre1 hits portage!
>
> Caveat: some init scripts abuse start-stop-daemon. One example is the
> courier scripts, which pass the env program as a daemon. This is
> easily worked around, but we fail badly if env then calls a shell
> script which in turn launches a daemon. Of all the server stuff I run,
> only courier has this issue - but there may be other programs too.
> Basically start-stop-daemon should only call daemons!

What I would really like to see in the init system is a way that 
initscripts can check whether the services they are responsible for are 
still running and then adjust their status accordingly, along with some 
nice output. This would then allow the execution of rc-status to give 
proper information of actually running daemons, and the "rc" command the 
possibility to actually bring online all daemons that should be running.

Paul

-- 
Paul de Vrieze
Gentoo Developer
Mail: [EMAIL PROTECTED]
Homepage: http://www.devrieze.net




RE: [gentoo-dev] init script guidelines

2005-07-19 Roy Marples
On Tue, 2005-07-19 at 23:53 +0200, Martin Schlemmer wrote:
> I know Roy already did the sleep check in rc-services.sh which is small,
> and I think fairly acceptable

0.1 seconds by default. This is adjustable in /etc/conf.d/rc

Roy




Re: [gentoo-dev] init script guidelines

2005-07-19 Francesco R
Roy Marples wrote:

>On Tue, 2005-07-19 at 12:42 -0400, Eric Brown wrote:
>
>>The real problem is not that the daemons don't return errors, but that our
>>init scripts do not make reasonable attempts to verify service startup.  If a
>>Gentoo init script claims that a service started, it should make an effort to
>>check that the processes are actually running shortly after the script is
>>run, even if start-stop-daemon says the parent process initialized.  Relying
>>on the return value of start-stop-daemon is simply insufficient for some
>>services.
>
>I agree.
>
>In fact, rc-services.sh (/lib/rcscripts/sh) has been totally rewritten
>for the baselayout-1.12.x branch. It now intercepts calls to
>start-stop-daemon and checks if the daemon is still active after a
>default time of 0.1 (adjustable) seconds. If not, then we assume the
>daemon failed. This solves many existing bugs :)
>
>Also, we kill any rogue processes and perform other such checks when a stop
>call to start-stop-daemon is made - which is handy for when asterisk fails to
>start and leaves mpg123 processes lying around :)
>
>Check it out when baselayout-1.12.0pre1 hits portage!
>
>Caveat: some init scripts abuse start-stop-daemon. One example is the
>courier scripts, which pass the env program as a daemon. This is easily
>worked around, but we fail badly if env then calls a shell script which
>in turn launches a daemon. Of all the server stuff I run, only courier
>has this issue - but there may be other programs too. Basically
>start-stop-daemon should only call daemons!
>
>http://bugs.gentoo.org/show_bug.cgi?id=98745
>
>Roy

What about defining two additional functions,

check_startup() and check_shutdown(),

intended to be filled in by the package maintainer?
The rc scripts could call these to check whether a service has
started/stopped or not.
If not, they would wait and retry until a timeout is reached.

This would also open the way to centralized policies for the waits
between checks, like:
(1,1,1,1,1,1) (1,2,3,4,5,6) (1,2,4,8,16,32) and other nice stuff.
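A minimal sketch of such a retry loop in POSIX sh. The wait_for helper and the POLICY_* variable names are hypothetical; check_startup would be the per-package hook proposed above, and each policy is simply a list of delays in seconds.

```shell
# wait_for "DELAYS" CMD [ARGS...]
# Sleep for each delay in turn and run CMD after every sleep; succeed as
# soon as CMD succeeds, fail once the list of delays is exhausted.
wait_for() {
	local delays="$1" d
	shift
	for d in ${delays}; do   # unquoted on purpose: split the delay list
		sleep "${d}"
		"$@" && return 0
	done
	return 1
}

# The three policies above, as delay lists (seconds):
POLICY_CONSTANT="1 1 1 1 1 1"       # 6 s worst case
POLICY_LINEAR="1 2 3 4 5 6"         # 21 s worst case
POLICY_EXPONENTIAL="1 2 4 8 16 32"  # 63 s worst case
```

An init script would then call e.g. `wait_for "${POLICY_EXPONENTIAL}" check_startup`. Because a policy is just a string, a central default could in principle live in a conf.d file and be overridden per service; where such a variable would live is an assumption, not something settled here.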

Francesco



RE: [gentoo-dev] init script guidelines

2005-07-19 Martin Schlemmer
On Tue, 2005-07-19 at 14:40 -0400, Chris Gianelloni wrote:
> On Tue, 2005-07-19 at 14:08 -0400, Eric Brown wrote:
> > My point is that Snort and Apache are not alone in this, so I suppose
> > quite a few upstream developers just disagree with us on what proper
> > initialization means.  Why should our users suffer?
> 
> They shouldn't, but that doesn't mean implementing some half-baked hack
> to resolve the situation.  It might be better to instead patch the
> daemon in question and send the patches upstream.  Upstream developers
> (usually) are much more willing to make changes when you've done the
> work for them... ;]
> 

I know Roy already did the sleep check in rc-services.sh, which is small,
and I think fairly acceptable, but like Mike said, you cannot make it
longer and then do it for all, as some arches are just too slow, and I'm
going to guess that fewer than 10% of services have this issue?

Personally I think the issue should be taken on a per-package basis, and
if somebody sees an issue, open a bug against snort/apache/whatever to
add a timeout, and then check in some way or other whether it actually started.

For the developer awareness issue ... it's not always such an open-and-shut
case.  I can't remember what had this issue, but some daemon only
showed it on slower boxes, not on faster ones, so it really will
depend entirely on what type of hardware the developer has.  So yeah,
better awareness, by adding a section to the developer manual or
something to the test for new developers, might help, but it is not
foolproof.


-- 
Martin Schlemmer





Re: [gentoo-dev] init script guidelines

2005-07-19 Chris Gianelloni
On Tue, 2005-07-19 at 16:43 -0400, Michael Cummings wrote:
> not to detract from the discussion, but...anyone else notice this?

He quoted me.  His text was above mine.

People have met me.  They know I exist.  Though Eric might be a figment
of my shattered subconscious psyche.  Who knows?  :P

> On Tue, 19 Jul 2005 14:40:01 -0400
> Chris Gianelloni <[EMAIL PROTECTED]> wrote:
> 
> > They shouldn't, but that doesn't mean implementing some half-baked
> > hack to resolve the situation.  It might be better to instead patch
> > the daemon in question and send the patches upstream.  Upstream
> > developers (usually) are much more willing to make changes when you've
> > done the work for them... ;]
> > 
> 
> On Tue, 19 Jul 2005 15:39:16 -0400
> "Eric Brown" <[EMAIL PROTECTED]> wrote:
> > 
> > They shouldn't, but that doesn't mean implementing some half-baked
> > hack to resolve the situation.  It might be better to instead patch
> > the daemon in question and send the patches upstream.  Upstream
> > developers (usually) are much more willing to make changes when you've
> > done the work for them... ;]
> > 
> 
> I'm beginning to suspect Eric and Chris are the same person. Prove they
> aren't - show evidence of them independently in the same room at the
> same time ;)
> 
> (and being a mid-stream developer, I know *I* like working patches more
> than 'fix your junk, it broke' reports)
-- 
Chris Gianelloni
Release Engineering - Strategic Lead/QA Manager
Games - Developer
Gentoo Linux




Re: [gentoo-dev] init script guidelines

2005-07-19 Michael Cummings
not to detract from the discussion, but...anyone else notice this?

On Tue, 19 Jul 2005 14:40:01 -0400
Chris Gianelloni <[EMAIL PROTECTED]> wrote:

> They shouldn't, but that doesn't mean implementing some half-baked
> hack to resolve the situation.  It might be better to instead patch
> the daemon in question and send the patches upstream.  Upstream
> developers (usually) are much more willing to make changes when you've
> done the work for them... ;]
> 

On Tue, 19 Jul 2005 15:39:16 -0400
"Eric Brown" <[EMAIL PROTECTED]> wrote:
> 
> They shouldn't, but that doesn't mean implementing some half-baked
> hack to resolve the situation.  It might be better to instead patch
> the daemon in question and send the patches upstream.  Upstream
> developers (usually) are much more willing to make changes when you've
> done the work for them... ;]
> 

I'm beginning to suspect Eric and Chris are the same person. Prove they
aren't - show evidence of them independently in the same room at the
same time ;)

(and being a mid-stream developer, I know *I* like working patches more
than 'fix your junk, it broke' reports)



Re: [gentoo-dev] init script guidelines

2005-07-19 Mike Frysinger
On Tuesday 19 July 2005 02:08 pm, Eric Brown wrote:
> I do see how timing could be an issue for sleeps, but I would personally
> much rather have a timeout variable in conf.d somewhere rather than no
> check at all.

because you're only looking at one side of the race condition

your check goes to sleep for 3 seconds ... then the service starts up but 
because it's a slow CPU, it takes 10 seconds to get to the config file 
parsing where it fails and exits silently ... when the check wakes back up it 
goes 'hey, service is still running, all is good'
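The race can be reproduced in a few lines of sh. In this sketch, fake_daemon is a hypothetical stand-in for a daemon that stays up for a while and then dies during config parsing, and naive_check is a hypothetical fixed-delay liveness check; the timings are arbitrary.

```shell
# Stand-in daemon: survives for 1 s, then "fails to parse its config"
# and exits silently with status 1.
fake_daemon() {
	sleep 1
	exit 1
}

# Fixed-delay check: sleep, then only ask whether the pid still exists.
naive_check() {
	local pid="$1" delay="$2"
	sleep "${delay}"
	kill -0 "${pid}" 2>/dev/null
}

fake_daemon &     # runs in a background subshell
pid=$!
if naive_check "${pid}" 0.1; then
	echo "check says: started"   # the check passes while the daemon is still up
fi
status=0
wait "${pid}" || status=$?
echo "daemon exit status: ${status}"   # non-zero: it failed after all
```

Raising the delay only moves the window: a 3 second check still passes on a machine where the failure happens at 10 seconds, which is exactly the scenario described above.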
-mike



RE: [gentoo-dev] init script guidelines

2005-07-19 Eric Brown
Not everyone can patch them; more people would be capable of writing
half-baked hacks that resolve most of the issues.

Anyway I guess the new baselayout sounds promising here.

> My point is that Snort and Apache are not alone in this, so I suppose
> quite a few upstream developers just disagree with us on what proper
> initialization means.  Why should our users suffer?

They shouldn't, but that doesn't mean implementing some half-baked hack
to resolve the situation.  It might be better to instead patch the
daemon in question and send the patches upstream.  Upstream developers
(usually) are much more willing to make changes when you've done the
work for them... ;]






RE: [gentoo-dev] init script guidelines

2005-07-19 Chris Gianelloni
On Tue, 2005-07-19 at 14:08 -0400, Eric Brown wrote:
> My point is that Snort and Apache are not alone in this, so I suppose
> quite a few upstream developers just disagree with us on what proper
> initialization means.  Why should our users suffer?

They shouldn't, but that doesn't mean implementing some half-baked hack
to resolve the situation.  It might be better to instead patch the
daemon in question and send the patches upstream.  Upstream developers
(usually) are much more willing to make changes when you've done the
work for them... ;]

-- 
Chris Gianelloni
Release Engineering - Strategic Lead/QA Manager
Games - Developer
Gentoo Linux




Re: [gentoo-dev] init script guidelines

2005-07-19 Francesco R
Eric Brown wrote:

> Services that use Gentoo init scripts often report a status of [started] or
> [OK] even though they fail to start.  The most recent bug like this that I've
> found is with snort.  If you have a bad rule, snort will initialize, the
> rc-scripts will give it an [OK] status, and then it will die once it parses
> the rules.
>
> The real problem is not that the daemons don't return errors, but that our
> init scripts do not make reasonable attempts to verify service startup.  If a
> Gentoo init script claims that a service started, it should make an effort to
> check that the processes are actually running shortly after the script is
> run, even if start-stop-daemon says the parent process initialized.  Relying
> on the return value of start-stop-daemon is simply insufficient for some
> services.
>
> I am aware that there are services that can monitor the status of other
> services (app-admin/mon?) but I think this issue is a little different.  If
> an ebuild developer is aware of an error condition that can commonly occur
> shortly after a daemon initializes, why not attempt to catch those errors?
> Most of them could probably be caught by simply checking to see if the
> process is still running shortly after the script is run.
>
> I propose increasing developer awareness of this problem, perhaps through
> some formal guidelines for ebuild developers.  At the very least, I would
> like to see these bugs being acknowledged in bugs.gentoo.org instead of
> getting the same old upstream/it's not our fault response.  We are
> responsible for our init scripts, and they are important to our users.
>
> I have 2 ideas for the actual implementation:
>
> 1) Some kind of check() function in the init.d script, or a generic check()
> function that just checks with ps | grep.  This might typically be called
> after having the init script sleep for a certain amount of time.
>
> 2) Some kind of special init script that checks registered daemons after
> all services have started (i.e. it depends on all daemons, or they are put
> into its config file).  With this scheme we could avoid excessive sleeping
> during startup (to keep it fast), and perhaps even keep using
> service-specific check() functions.
>
> Does anyone else think this idea is worth looking into?

http://bugs.gentoo.org/show_bug.cgi?id=90471

We handled this by checking for the socket that mysql always creates on *nix.
But with a timeout of five seconds: if there is neither an error message nor
a socket within that time, the script assumes the server started.
I'm the first to say that this needs to be improved, but it's a start.
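A rough sketch of that approach in POSIX sh. The function name wait_for_socket and its arguments are hypothetical, and instead of watching for an error message it treats the server process exiting as the failure signal, which is a swapped-in simplification. As in the mysql script described above, reaching the timeout is taken as success.

```shell
# wait_for_socket SOCKET PID TIMEOUT_SECONDS
# Poll every 0.1 s: succeed as soon as SOCKET exists, fail as soon as PID
# is gone, and after TIMEOUT_SECONDS assume the server started anyway.
wait_for_socket() {
	local sock="$1" pid="$2" ticks
	ticks=$(( $3 * 10 ))   # number of 0.1 s polls
	while [ "${ticks}" -gt 0 ]; do
		[ -S "${sock}" ] && return 0               # socket is up: started
		kill -0 "${pid}" 2>/dev/null || return 1   # server died: failed
		sleep 0.1
		ticks=$(( ticks - 1 ))
	done
	return 0   # timed out with the server still alive: assume started
}
```

It would be called as e.g. `wait_for_socket /var/run/mysqld/mysqld.sock "${pid}" 5`, where the socket path is only an example. The final `return 0` is the weak point acknowledged above: a server that is still initialising after five seconds is reported as started.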





RE: [gentoo-dev] init script guidelines

2005-07-19 Eric Brown
A few responses:
(Please forgive the lack of normal formatting)

1) To Chris Gianelloni

I really do agree that it's silly for a daemon to lie about its
initialization status.  However, after actually having taken some of
these issues upstream (in particular Apache 1.3), I realized that the
upstream devs don't always consider these to be bugs.  In
Apache's case, it's a bug, but one that's never going to be fixed in 1.3
(2.0 supposedly fixes it).  I think there was one case where pure-ftpd
actually fixed one of these bugs when I reported it.

My point is that Snort and Apache are not alone in this, so I suppose
quite a few upstream developers just disagree with us on what proper
initialization means.  Why should our users suffer?


2) To Mike Frysinger

Most of these services are pretty common, and the suckage is usually
limited to this area of initialization =)

I do see how timing could be an issue for sleeps, but I would personally
much rather have a timeout variable in conf.d somewhere rather than no
check at all.

I would also much rather have a simple check be performed that produced
false positives itself (which is what the init scripts are doing now),
as long as it cut down on the total number of false positives.


3) To anyone else

So far it looks like developer awareness is the best we can do?
What about making standard functions or check services available to help
developers who are aware and need to use them?

Even if developers just become willing to add checks, that would be
great.  Right now most devs simply rely on upstream (although I think
upstream should certainly be a part of each case).




Re: [gentoo-dev] init script guidelines

2005-07-19 Roy Marples
On Tue, 2005-07-19 at 12:42 -0400, Eric Brown wrote:


> The real problem is not that the daemons don't return errors, but that our
> init scripts do not make reasonable attempts to verify service startup.  If a
> Gentoo init script claims that a service started, it should make an effort to
> check that the processes are actually running shortly after the script is
> run, even if start-stop-daemon says the parent process initialized.  Relying
> on the return value of start-stop-daemon is simply insufficient for some
> services.

I agree.

In fact, rc-services.sh (/lib/rcscripts/sh) has been totally rewritten
for the baselayout-1.12.x branch. It now intercepts calls to
start-stop-daemon and checks if the daemon is still active after a
default time of 0.1 (adjustable) seconds. If not, then we assume the
daemon failed. This solves many existing bugs :)
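A minimal sketch of that kind of check in POSIX sh. This is not the rc-services.sh code; start_and_verify and START_PID are hypothetical names, and the sketch only suits a program that stays in the foreground, because it watches the pid it launched itself rather than one read back from a pidfile. It uses kill -0 rather than the ps | grep approach Eric suggests, since kill -0 tests one exact pid.

```shell
# start_and_verify DELAY CMD [ARGS...]
# Launch CMD in the background, wait DELAY seconds, and fail if it has
# already exited. On success the pid is left in START_PID.
START_PID=""
start_and_verify() {
	local delay="$1"
	shift
	"$@" &
	START_PID=$!
	sleep "${delay}"
	if kill -0 "${START_PID}" 2>/dev/null; then
		return 0
	fi
	wait "${START_PID}" 2>/dev/null || true   # reap the dead child
	START_PID=""
	return 1
}
```

Real daemons usually fork and let the parent exit, so a check like this has to look at the pid written to the daemon's pidfile instead, which is why a correct --pidfile argument to start-stop-daemon matters.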

Also, we kill any rogue processes and perform other such checks when a stop
call to start-stop-daemon is made - which is handy for when asterisk fails to
start and leaves mpg123 processes lying around :)

Check it out when baselayout-1.12.0pre1 hits portage!

Caveat: some init scripts abuse start-stop-daemon. One example is the
courier scripts, which pass the env program as a daemon. This is easily
worked around, but we fail badly if env then calls a shell script which
in turn launches a daemon. Of all the server stuff I run, only courier
has this issue - but there may be other programs too. Basically
start-stop-daemon should only call daemons!

http://bugs.gentoo.org/show_bug.cgi?id=98745
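A small sh demonstration of why tracking a wrapper goes wrong. launcher is a hypothetical stand-in for a shell script that starts the real daemon in the background and exits: the pid the caller knows about disappears almost immediately, while the real process keeps running under a different pid.

```shell
# launcher PIDFILE: start the "real daemon" (here: sleep) in the
# background, write its pid to PIDFILE, then return, like a wrapper
# script that forks a daemon and exits.
launcher() {
	sleep 5 &
	echo $! > "$1"
}

pidfile=$(mktemp)
launcher "${pidfile}" &   # the only pid the caller sees is this one
wrapper_pid=$!
wait "${wrapper_pid}"     # the wrapper finishes almost at once
daemon_pid=$(cat "${pidfile}")

kill -0 "${wrapper_pid}" 2>/dev/null || echo "tracked pid ${wrapper_pid}: gone"
kill -0 "${daemon_pid}" 2>/dev/null && echo "real daemon ${daemon_pid}: running"

kill "${daemon_pid}"      # clean up the demo process
rm -f "${pidfile}"
```

A liveness check that watches wrapper_pid would declare the service dead while it is in fact running, which is the failure mode in the caveat above.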

Roy




Re: [gentoo-dev] init script guidelines

2005-07-19 Mike Frysinger
On Tuesday 19 July 2005 12:42 pm, Eric Brown wrote:
> The real problem is not that the daemons don't return errors, but that
> our init scripts do not make reasonable attempts to verify service startup.

i'd disagree ... if a service sucks, it sucks

adding some code to try and guess whether the service actually started is a
roundabout (and by no means foolproof) way of doing things ... it may give
correct results sometimes, but i imagine it'll also be susceptible to
false positives

> If a Gentoo init script claims that a service started, it should make an
> effort to check that the processes are actually running shortly after the
> script is run

how do you define 'short' ?  really anything that relies on some timeout value
like this is a flawed design ... just cause your smokin fast amd64 should
complete in .1 seconds doesn't mean my not-very-smokin-fast-at-all arm
netwinder can complete inside of 3 seconds

> Relying on the return
> value of start-stop-daemon is simply insufficient for some services.

then those services should not be using ssd

> I propose increasing developer awareness of this problem, perhaps
> through some formal guidelines for ebuild developers.

this seems to be the only feasible approach (and one i'm all for)
-mike



Re: [gentoo-dev] init script guidelines

2005-07-19 Chris Gianelloni
On Tue, 2005-07-19 at 12:42 -0400, Eric Brown wrote:
> Services that use Gentoo init scripts often report a status of [started] or
> [OK] even though they fail to start.  The most recent bug like this that I've
> found is with snort.  If you have a bad rule, snort will initialize, the
> rc-scripts will give it an [OK] status, and then it will die once it parses
> the rules.

So snort shouldn't be giving the OK until it really is OK.
>  
> The real problem is not that the daemons don't return errors, but that our
> init scripts do not make reasonable attempts to verify service startup.  If a
> Gentoo init script claims that a service started, it should make an effort to
> check that the processes are actually running shortly after the script is
> run, even if start-stop-daemon says the parent process initialized.  Relying
> on the return value of start-stop-daemon is simply insufficient for some
> services.

Not really.  An init script is simply a script.  It doesn't guarantee
anything other than what the service told it.  If a service is returning
status codes when it really hasn't completed its initialization, that is
a bug in that service, not in the init script code.  While code might
need to be adjusted in the init script, this will most likely require
patches to the upstream sources.
>  
> I am aware that there are services that can monitor the status of other
> services (app-admin/mon?) but I think this issue is a little different.  If
> an ebuild developer is aware of an error condition that can commonly occur
> shortly after a daemon initializes, why not attempt to catch those errors?
> Most of them could probably be caught by simply checking to see if the
> process is still running shortly after the script is run.

I agree with you that we should catch the errors, but running another
check is simply a waste of time.  The service should not ever show a
completed state until it is completed.  It shouldn't ever be like "Yes,
snort worked.. oh wait, no it didn't."  That is even more
confusing for users.

> I propose increasing developer awareness of this problem, perhaps through
> some formal guidelines for ebuild developers.  At the very least, I would
> like to see these bugs being acknowledged in bugs.gentoo.org instead of
> getting the same old upstream/it's not our fault response.  We are
> responsible for our init scripts, and they are important to our users.

You really need to take this up with the developers in question, as this
is not a global matter, but really a matter with specific packages.
Those are bugs in those packages.  If the ebuild maintainers are
refusing to resolve issues in the init scripts, which are definitely
Gentoo's work, please take it up with user relations or attempt to
provide a fix for the problem.
>  
> I have 2 ideas for the actual implementation:
>
> 1) Some kind of check() function in the init.d script, or a generic check()
> function that just checks with ps | grep.  This might typically be called
> after having the init script sleep for a certain amount of time.

I would object to this.  Having a function to check the status of a
service for all of the possible services, when it is only a few that are
showing this error, is a bad idea.  It adds extra load on all developers
that have any init scripts, and is unnecessary in most cases.
>  
> 2) Some kind of special init script that checks registered daemons after
> all services have started (i.e. it depends on all daemons, or they are put
> into its config file).  With this scheme we could avoid excessive sleeping
> during startup (to keep it fast), and perhaps even keep using
> service-specific check() functions.

This would require much more knowledge on the end-user's part.  Plus, it
will need to be aware of init script dependencies.  All in all, it
sounds like a bad patch for a situation.

-- 
Chris Gianelloni
Release Engineering - Strategic Lead/QA Manager
Games - Developer
Gentoo Linux




[gentoo-dev] init script guidelines

2005-07-19 Eric Brown

Services that use Gentoo init scripts often report a status of [started] or
[OK] even though they fail to start.  The most recent bug like this that I've
found is with snort.  If you have a bad rule, snort will initialize, the
rc-scripts will give it an [OK] status, and then it will die once it parses
the rules.

The real problem is not that the daemons don't return errors, but that our
init scripts do not make reasonable attempts to verify service startup.  If a
Gentoo init script claims that a service started, it should make an effort to
check that the processes are actually running shortly after the script is
run, even if start-stop-daemon says the parent process initialized.  Relying
on the return value of start-stop-daemon is simply insufficient for some
services.

I am aware that there are services that can monitor the status of other
services (app-admin/mon?) but I think this issue is a little different.  If
an ebuild developer is aware of an error condition that can commonly occur
shortly after a daemon initializes, why not attempt to catch those errors?
Most of them could probably be caught by simply checking to see if the
process is still running shortly after the script is run.

I propose increasing developer awareness of this problem, perhaps through
some formal guidelines for ebuild developers.  At the very least, I would
like to see these bugs being acknowledged in bugs.gentoo.org instead of
getting the same old upstream/it's not our fault response.  We are
responsible for our init scripts, and they are important to our users.

I have 2 ideas for the actual implementation:

1) Some kind of check() function in the init.d script, or a generic check()
function that just checks with ps | grep.  This might typically be called
after having the init script sleep for a certain amount of time.

2) Some kind of special init script that checks registered daemons after
all services have started (i.e. it depends on all daemons, or they are put
into its config file).  With this scheme we could avoid excessive sleeping
during startup (to keep it fast), and perhaps even keep using
service-specific check() functions.

Does anyone else think this idea is worth looking into?
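A rough sketch of idea 2 in POSIX sh, showing how a single deferred sweep replaces per-service sleeps. The registry format (one "service pidfile" pair per line) and the name check_registered are assumptions made for illustration.

```shell
# check_registered REGISTRY DELAY
# Sleep once, then print every service whose pidfile does not name a live
# process. Succeeds only if no registered service has failed.
check_registered() {
	local registry="$1" service pidfile pid failed=0
	sleep "$2"
	while read -r service pidfile; do
		pid=$(cat "${pidfile}" 2>/dev/null)
		case "${pid}" in
			''|*[!0-9]*) pid="" ;;   # missing or garbage pidfile
		esac
		if [ -z "${pid}" ] || ! kill -0 "${pid}" 2>/dev/null; then
			echo "${service}"
			failed=1
		fi
	done < "${registry}"
	return "${failed}"
}
```

Only one delay is paid for the whole boot rather than one per service, although choosing that delay runs into the same timing objections as any fixed sleep.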