Hello David,

David Powell wrote:
> [ re-sending to avoid opensolaris bounce ]
>
> Mats,
>
> On Mon, Nov 13, 2006 at 08:06:43PM +0100, Mats Kindahl wrote:
>> #!/sbin/sh
>>
>> . /lib/svc/share/smf_include.sh
>>
>> NAME=inadyn
>> DAEMON=/usr/local/bin/$NAME
>> .
>> .
>> .
>> case $1 in
>> start)
>>     if [ ! -x $DAEMON ]; then
>>         echo "$DAEMON is not executable, not starting"
>>         exit $SMF_EXIT_ERR_CONFIG
>>     fi
>>     if [ ! -r $CONF ]; then
>>         echo "$CONF not readable, not starting"
>>         exit $SMF_EXIT_ERR_CONFIG
>>     fi
>>     if pgrep $NAME >/dev/null; then
>>         echo "'$NAME' is already running"
>>     else
>>         $DAEMON &
>>         sleep 1
>>     fi
>>     ;;
>> .
>> .
>> .
>> esac
>>
>> exit $SMF_EXIT_OK
>>
>> When I call the 'inadyn-client' script directly, the daemon starts well
>> and keeps going without a glitch, but when svc.startd tries to start it,
>> it decides somehow that "all processes in the service exited". AFAICT,
>> this is not the case.
>
> As others have pointed out, pgrep will find the script itself, and
> needs to be more selective. It sounds like you have done so and
> achieved some degree of success.

Yes, it worked fine to solve the immediate problem, but as you point
out, there are other problems (some of which I managed to solve, albeit
not in an entirely satisfactory manner).

> There are deeper problems, though. The start method interface
> requires that when you return SMF_EXIT_OK, your service has been
> started and is ready to offer service. Services which depend on
> yours can be started as soon as you return; if you aren't really
> ready, they might fail. Backgrounding the daemon and waiting a
> second is, frankly, a hack that won't always work. That said, you
> can get away with this as long as your service doesn't have
> dependents, or the dependents are robust enough to tolerate a lack of
> availability.

Yes, backgrounding the daemon and waiting a second is definitely a hack.
The original solution was to pass the --background switch to inadyn,
which forks and reassigns the PPID of the background process to init
(i.e., PID 1). Since I thought this might be the reason for the problem
(i.e., the exiting fork of the inadyn process was considered "part of
the service", while the re-parented process was not), I rewrote the
script to background the service itself instead.

> The more serious problem is that if the daemon (or an unfortunately
> named program also found by pgrep) is already running, your service
> will fail once again because "all processes in the service exited".
> SMF can only monitor the health of processes it starts, and moreover
> has no way to know that some other set of processes should be
> important to it. If your start method thinks it has found another
> copy of the daemon running, it should return SMF_EXIT_ERR_NOSMF. The
> service will still fail, but will do so for a legitimate,
> understandable reason.

OK. I was wondering what to return. I saw a note that you should return
an error in this case, but I couldn't figure out which error code to
return.
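
A minimal sketch of the check David describes, assuming an exact-name
match with pgrep -x is selective enough. The helper name start_status is
hypothetical, and the fallback values for SMF_EXIT_OK and
SMF_EXIT_ERR_NOSMF are assumptions that only apply where
smf_include.sh is not available:

```shell
# Sketch only: decide which status the start method should exit with.
# On Solaris the SMF_EXIT_* values come from smf_include.sh.
if [ -r /lib/svc/share/smf_include.sh ]; then
    . /lib/svc/share/smf_include.sh
fi
# Assumed fallbacks, used only when the include file is missing.
: "${SMF_EXIT_OK:=0}"
: "${SMF_EXIT_ERR_NOSMF:=99}"

# Hypothetical helper: $1 is the daemon name.
# Returns SMF_EXIT_ERR_NOSMF if a process with exactly that name is
# already running (it was not started by SMF), SMF_EXIT_OK otherwise.
start_status() {
    # -x asks for an exact name match, so 'inadyn-client' (the method
    # script itself) does not count as a running 'inadyn'.
    if pgrep -x "$1" >/dev/null 2>&1; then
        echo "'$1' is already running outside SMF, not starting" >&2
        return "$SMF_EXIT_ERR_NOSMF"
    fi
    return "$SMF_EXIT_OK"
}
```

In the start) branch this could be used as
`start_status "$NAME" || exit $?` before launching the daemon. It does
not address the readiness problem; it only makes the failure explicit.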

As you pointed out, this was exactly what was happening: the script
exited immediately, and svc.startd tried to stop the service by using
:kill, which of course failed since the process was out of its control.
What I did was rewrite the service manifest to issue a
".../inadyn-client stop" instead, which would then kill the process,
wherever it was, and then restart it under the control of svc.startd.

>
>> So, the questions are:
>> - How do svc.startd decide if a process is in the service?
>
> Every process and subprocess started from the start method of a
> service belongs to that service, unless one of those processes
> specifically requests otherwise for its children.

What happens if the process is re-parented? Is that considered
"requesting otherwise for its children"? Looking at the output from
ptree -c, I see the following:

-bash-3.00$ ptree -c `pgrep -x inadyn`
[process contract 1]
  1     /sbin/init
    [process contract 4]
      7     /lib/svc/bin/svc.startd
        [process contract 111]
          1285  /usr/local/bin/inadyn --background --input_file /etc/inadyn.con

Following up with ps:

-bash-3.00$ ps -o 'pid ppid comm' -p 1285
  PID  PPID COMMAND
 1285     1 /usr/local/bin/inadyn

This shows that re-parenting (the PPID is now 1) does not seem to affect
whether the process is considered part of the service. It seems to have
to do with processes and contracts, which is something I'm not familiar
with, so I guess I have to read up on those pages.

Anyway, thanks for all the help! I have some reading to catch up on.

Best wishes,
Mats Kindahl
-- 
Mats Kindahl
Replication Team
MySQL AB, www.mysql.com
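
The contract membership visible in the ptree -c output above can be
pulled out mechanically. This is a rough sketch, assuming the output has
the simple single-chain layout shown in the message; the helper name
contract_of is hypothetical, and on Solaris 10 an awk that supports -v
(nawk or /usr/xpg4/bin/awk) may be needed in place of plain awk:

```shell
# Hypothetical helper: read `ptree -c` output on stdin and print the ID
# of the most recently opened process contract above the given PID.
# Assumes a single ancestor chain like the one in the message.
contract_of() {
    awk -v pid="$1" '
        # Header lines look like: [process contract 111]
        $1 == "[process" && $2 == "contract" {
            id = $3
            sub(/\]$/, "", id)      # strip the closing bracket
            next
        }
        # Process lines start with the PID.
        $1 == pid { print id; exit }
    '
}
```

For example, `ptree -c 1285 | contract_of 1285` would report the
contract holding the daemon, which can be compared with the PPID from ps
to see that the two are independent.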