Mats,

On Wed, Nov 15, 2006 at 08:12:31PM +0100, Mats Kindahl wrote:
> Yes, backgrounding the daemon and waiting a second is definitely a hack.
> The original solution was to use a switch --background to inadyn, which
> forks and reassigns the PPID of the background process to init (i.e.,
> PID 1). Since I thought this might be the reason for the problem (i.e.,
> the exiting fork of the inadyn process was considered as "part of the
> service", while the re-parented process was not), I rewrote the script
> to background the service instead.

  The --background behavior you describe is 100% safe, and is in fact
  how many Solaris services function.  Rather than overloading
  sessions, process groups, or simple parentage with additional
  fault-management semantics, we created the notion of a "process
  contract" which is used to track the processes in a service.  If you
  use 'ptree -c', you can see how we track even those processes which
  have been reparented to init.  [ Looking ahead, it appears you've
  already discovered ptree... ]

  If you want your service to be robust in the ways I mentioned, the
  easiest way to do that is to make sure 'daemon --background' doesn't
  exit until the child is ready.  If it exits earlier than that, and
  you can't or don't want to modify the daemon, the alternative is to
  include something in your start method which waits while explicitly
  testing for availability.
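
  A minimal sketch of such a wait loop, assuming a POSIX-compatible
  shell (on Solaris 10, ksh or /usr/xpg4/bin/sh rather than the legacy
  Bourne shell).  The helper name wait_until_ready, the 30-second limit
  and the pgrep-based check are illustrative assumptions; a real
  availability test depends on what "ready" means for the daemon.

```shell
#!/bin/sh
# Hypothetical helper: run a readiness check once per second until it
# succeeds or the given number of attempts is used up.
# usage: wait_until_ready <max_tries> <command> [args...]
wait_until_ready() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        # The check is whatever command the caller passed in.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Possible use in a start method (assumed, not taken from the original
# script): start the daemon, then refuse to report success until the
# check passes.
#
#   /usr/local/bin/inadyn --background ...
#   wait_until_ready 30 pgrep -x inadyn || exit 1
```

  With a loop like this, the start method returns only once the check
  has passed, or fails after the limit instead of sleeping for a fixed
  second and hoping.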

> OK. I was wondering about what to return. I saw a note that you should
> return an error in this case, but I couldn't see or figure out what
> error code to return.
> 
> As you pointed out, this was exactly what was happening: the script
> exited immediately, and svc.startd tried to stop the service by using
> :kill, which of course failed since the process was out of its control.
> What I did was rewrite the service manifest to issue a
> ".../inadyn-client stop" instead, which would then kill the process,
> wherever it was, and then restart it under the control of svc.startd.

  Oooh, SMF wins.  I like it :)

> >> So, the questions are:
> >> - How does svc.startd decide if a process is in the service?
> > 
> >   Every process and subprocess started from the start method of a
> >   service belongs to that service, unless one of those processes
> >   specifically requests otherwise for its children.
> 
> What happens if the process is re-parented? Is that considered as
> "requesting otherwise for its children"? Looking at the output from
> ptree -c, I see the following:
> 
> -bash-3.00$ ptree -c `pgrep -x inadyn`
> [process contract 1]
>   1     /sbin/init
>     [process contract 4]
>       7     /lib/svc/bin/svc.startd
>         [process contract 111]
>           1285  /usr/local/bin/inadyn --background --input_file /etc/inadyn.con
> 
> And following up with a:
> 
> -bash-3.00$ ps -o 'pid ppid comm' -p 1285
>   PID  PPID COMMAND
>  1285     1 /usr/local/bin/inadyn
> 
> This clearly demonstrates that re-parenting does not affect whether the
> process is considered part of the service. It seems to have to do with
> processes and contracts, which is something I'm not familiar with, so I
> guess I have to read up on those pages.

  The contract(4) and process(4) man pages talk about contracts and
  process contracts, respectively.  (Unfortunately, they are oriented
  more towards developers, and will need some improvement before they
  can be considered good tutorial material.)  In short, the key
  characteristics of process contract membership are:

    A process's process contract never changes

    A forked process, by default, is added to its parent's process
    contract

    A new process contract can be created on fork (with the forked
    child being placed in the new process contract)

  The process which requests that a new process contract be created on
  fork becomes the "owner" of that contract, and can request that
  events be delivered when membership of that contract changes.
  Additionally, if a process exits through unnatural causes (e.g. it
  encounters an uncorrectable hardware error or dumps core), special
  events can be delivered.

  When svc.startd forks to exec a service's start method, it creates a
  new contract for it.  Unless that method or one of its progeny
  creates a new process contract (by consuming the interfaces
  documented in process(4) and libcontract(3lib), or by using the
  ctrun(1) utility), all these processes will be contained in the
  process contract created by svc.startd, and will be recognized as
  part of the "service".

  svc.startd then manages those services by waiting for events.  If you
  are curious about how svc.startd uses process contracts, you can use
  ctstat(1) to examine the contracts' parameters, and ctwatch(1) to
  observe events as they occur.
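
  As a rough sketch of looking up the contract for a single process,
  assuming a POSIX-compatible shell and the 'ctid' output field of
  Solaris ps(1).  The helper name contract_of is hypothetical, and the
  function simply gives up on systems without ctstat(1).

```shell
#!/bin/sh
# Hypothetical helper: print the process contract ID for a PID, then
# show that contract's parameters with ctstat(1).  Solaris only.
contract_of() {
    if [ $# -ne 1 ]; then
        echo "usage: contract_of <pid>" >&2
        return 2
    fi
    # Process contracts are a Solaris facility; bail out elsewhere.
    if ! command -v ctstat >/dev/null 2>&1; then
        echo "ctstat not found; this sketch needs Solaris" >&2
        return 1
    fi
    # 'ctid=' asks ps for the contract ID column with no header line.
    ctid=$(ps -o ctid= -p "$1") || return 1
    # Unquoted expansion drops the column padding around the number.
    ctid=$(echo $ctid)
    [ -n "$ctid" ] || return 1
    echo "pid $1 is in process contract $ctid"
    # -v prints the contract's parameters; -i selects it by ID.
    ctstat -v -i "$ctid"
}

# Example, using the PID from the ptree output quoted earlier:
#   contract_of 1285
```

  Passing the reported contract ID to ctwatch(1) would then show
  events for that contract as they occur.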

  Of course, if you are really curious, there's always the source:

  http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/svc/startd/

  Dave

