[smf-discuss] unintuitive behaviour by SMF

Darren Reed Thu, 15 Jun 2006 16:44:06 +0800

David Powell wrote:

>On Wed, Jun 14, 2006 at 09:56:09PM +0800, Darren Reed wrote:
>
>>So what do I think is wrong here?
>>1) "svcadm disable ntp" has no effect on anything
>>
>
>  False.  It disabled the service.  We assume you want your services
>  shut down cleanly, and part of that is waiting for a partially
>  started service to finish starting cleanly.  The NTP start method has
>  a timeout of 30 minutes; perhaps that should be shortened.
>


Ugh...30 minutes?  How many others have timeouts like this?
I'd say 3 seconds was bordering on too long for ntpdate to return.
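
One way to answer "how many others have timeouts like this?" on a given system is to scan each service's method timeouts. What follows is a minimal Python sketch. It assumes that `svcprop <fmri>` prints one `pg/prop type value` line per property, and that method timeouts appear as `count` properties named `<method>/timeout_seconds` (e.g. `start/timeout_seconds`). The helper names and the sample text are illustrative, not output captured from a real system. The 3-second limit and the 1800-second (30-minute) value come from this thread.

```python
import subprocess


def parse_svcprop(text: str) -> dict[str, tuple[str, str]]:
    """Parse lines of the assumed form 'pg/prop type value' into a dict."""
    props: dict[str, tuple[str, str]] = {}
    for line in text.splitlines():
        parts = line.split(None, 2)  # the value itself may contain spaces
        if len(parts) == 3:
            name, ptype, value = parts
            props[name] = (ptype, value)
    return props


def long_method_timeouts(props: dict[str, tuple[str, str]],
                         limit_seconds: int) -> dict[str, int]:
    """Return {method: seconds} for every method timeout above the limit."""
    found: dict[str, int] = {}
    for name, (ptype, value) in props.items():
        pg, _, prop = name.partition("/")
        if prop == "timeout_seconds" and ptype == "count":
            seconds = int(value)
            if seconds > limit_seconds:
                found[pg] = seconds
    return found


def service_timeouts(fmri: str, limit_seconds: int) -> dict[str, int]:
    """Query one service with svcprop (Solaris only) and report long timeouts."""
    out = subprocess.run(["svcprop", fmri], capture_output=True,
                         text=True, check=True).stdout
    return long_method_timeouts(parse_svcprop(out), limit_seconds)


if __name__ == "__main__":
    # Illustrative input modelled on the 30-minute NTP start timeout above.
    sample = ("start/exec astring /path/to/method\n"
              "start/timeout_seconds count 1800\n"
              "stop/timeout_seconds count 60\n")
    print(long_method_timeouts(parse_svcprop(sample), 3))
    # prints {'start': 1800, 'stop': 60}
```

On a live system, `service_timeouts` could be looped over the FMRIs printed by `svcs -H -o fmri`.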

>>2) "svcs -x ntp" does not make sense.
>>  "Start method is running" but it is "offline".
>>  This should probably be "I'm disabling myself"
>>
>
>  True, it could print a better message, especially since the "Start
>  method is running" and "offline" combination is printed in both
>  the starting-up and shutting-down case.  svcs -l isn't ambiguous in
>  this case, though.
>
>
>>3) there is no output from "svcadm" telling me this
>>
>
>  That is because svcadm disable is completely asynchronous.  If you
>  want synchronous behavior, you need to use the -s option.  It will
>  wait until the effect of the requested operation is completed, or
>  give you an error if something is going to prevent that from
>  happening.
>

From a certain perspective, svcadm being synchronous or
asynchronous is just an implementation detail.  Or to put
it a different way, when svcadm exits, I expect the service
to either have been shut down or be shutting down.
To me, anything else means there is something wrong and
svcadm should be telling me "something is not right."
If this is a Big Complicated App, then I would expect it to
be "shutting down" or "on the way to that state".

Or, in short, if the service has failed to achieve any kind
of state transition by the time svcadm returns, then something
is not right and svcadm should inform whatever invoked it that
this was the case.

>>In a conversation with someone last year, they put it
>>to me that if you were putting this in a script, you
>>would just issue the "disable" and no more problems.
>>This is clear evidence that just doing "svcadm disable"
>>is not nearly reliable enough of an action for that
>>kind of treatment.
>>
>
>  If the commands executed after the disable depend on the service not
>  running, then that's incorrect.  You need to use the -s option.
>
>  Otherwise, disable is sufficient.  Subject to the definition of the
>  service, the service will be stopped as soon as possible.
>

I think what I'm looking for here is that if the service can't
be stopped 'immediately', then I want to know why not,
and I'd like svcadm to tell me that by way of an exit code
or similar.  If I'm going to use "-s" then I want to be
able to specify a timeout.  I would be happy if that meant
that svcadm could say "your timeout is too short" for a
Big Complicated Application but for others, no.
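
The behaviour asked for here can be approximated outside svcadm: request the disable, wait a bounded time, and report failure through the exit status. The following is a minimal Python sketch, not an existing SMF tool. It assumes that `svcs -H -o state,nstate <fmri>` prints the current and next state as two words, with `-` as the next state once no transition is pending. The function names and the exit codes (0 for disabled, 1 for timed out) are hypothetical choices. The clock and sleep functions are injectable so that the polling logic can be exercised without a Solaris host.

```python
import subprocess
import sys
import time
from typing import Callable

State = tuple[str, str]  # (state, next_state) as assumed from svcs output


def query_state(fmri: str) -> State:
    """Ask svcs for the current and next state of one service instance."""
    out = subprocess.run(["svcs", "-H", "-o", "state,nstate", fmri],
                         capture_output=True, text=True, check=True).stdout
    state, nstate = out.split()
    return state, nstate


def wait_for_state(query: Callable[[], State], target: str, timeout: float,
                   interval: float = 1.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep
                   ) -> tuple[bool, State]:
    """Poll until the service settles in `target` or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        current = query()
        if current == (target, "-"):  # settled, no transition pending
            return True, current
        if clock() >= deadline:
            return False, current
        sleep(interval)


def disable_with_timeout(fmri: str, timeout: float) -> int:
    """Request a disable, then wait; return a shell-style exit status."""
    subprocess.run(["svcadm", "disable", fmri], check=True)
    ok, (state, nstate) = wait_for_state(lambda: query_state(fmri),
                                         "disabled", timeout)
    if not ok:
        print(f"{fmri}: still {state} (next: {nstate}) after {timeout}s",
              file=sys.stderr)
        return 1
    return 0
```

A script would call, for example, `sys.exit(disable_with_timeout("ntp", 10))`. Unlike `svcadm disable -s`, this gives up after the timeout and tells the caller which state the service was stuck in.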

>>My experience to date is that when everything is right
>>and working fine, SMF is nice to deal with.  But when
>>something isn't working right (especially network services)
>>then it is far, far worse than what we were expected to do
>>with prior Solaris.
>>
>
>  I think that's a little harsh, if not outright incorrect.  With
>  previous versions of Solaris, you had no way to get this information
>  at all.  When "something wasn't working right" and xntpd hung during
>  start-up, you had no way to find out.  If it died, it died silently
>  -- not to mention there was no way to query the system to see if
>  xntpd was in fact supposed to be running.
>

My basis for comparison is previously I would just use ps
to find things that were running but "hung" and use kill until
the things disappeared and restart them manually because
there was no other real way.  For better or worse, it worked
as intended - processes you wanted to die went away and
if you knew how to kill them off, you probably knew how (or
how to find out) to restart them.

So far as diagnosing problems goes, we'd look to see what
ended up in logs maintained by syslogd.  Despite there being
log files maintained by SMF, that is still necessary.  What
is collected in the SMF log files isn't nearly useful enough
to rely on by itself - but that isn't SMF's fault.  So
diagnosing why a service failed to start is largely the same,
except that we have SMF to tell us that it isn't working,
rather than some monitoring script/agent.

>>Personally, I don't care if the service is still doing
>>its start method or something else.  If disable (and -t)
>>are the only proper hooks to stop a service then that
>>is what I expect it to do, not futz around and pretend.
>>
>
>  There is no pretending going on here.  The service will be stopped.
>  SMF is simply doing what the service definition for NTP is telling it
>  to do.  It can't make assumptions about why you are shutting down the
>  service, or whether you care how the service is shut down.
>
>  Ultimately, svcadm is an administrative interface, not an interface
>  for exacting vengeance on pids you don't want to see anymore.  If I'm
>  disabling a Big Complicated Application, I want SMF to go through the
>  paces, or else my app might not come up again.
>

I agree and understand this part of how it works.  What would
be nice is if it were possible for SMF to know the difference between
a Big Complicated Application and a Small Inconsequential Thing,
and could thus know whether or not it was OK to terminate processes
belonging to a specific service without prejudice (using kill -TERM
and/or kill -KILL).

>  Perhaps we need a property that tells SMF that a service's start
>  method is interruptible and should be short-circuited when the
>  service is disabled while it's running.
>

Yes, I think this would help a lot (and was going to suggest it
myself) as I'm sure this won't be the only start method that
can become hung but also be safely interrupted.
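
The proposed "interruptible start method" property might behave roughly as follows. This is a minimal Python sketch of the policy only, and it rests on several assumptions:

- The `interruptible` flag stands in for a hypothetical service property that does not exist in SMF.
- The start method is modelled as a single child process. svc.startd actually tracks a service's processes through process contracts.
- The escalation from SIGTERM to SIGKILL after a grace period mirrors the kill -TERM / kill -KILL sequence described earlier in this message.

```python
import signal
import subprocess


def stop_start_method(proc: subprocess.Popen, interruptible: bool,
                      grace: float = 5.0) -> str:
    """Handle a disable request that arrives while a start method runs.

    `interruptible` models a hypothetical per-service property.
    """
    if not interruptible:
        # Current behaviour: let the start method finish (or hit its timeout).
        proc.wait()
        return "waited"
    proc.send_signal(signal.SIGTERM)  # ask the method to exit cleanly first
    try:
        proc.wait(timeout=grace)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX systems
        proc.wait()
        return "killed"
```

For example, `stop_start_method(subprocess.Popen(["sleep", "60"]), interruptible=True)` returns `"terminated"` almost immediately. The non-interruptible path blocks until the child exits on its own, which is the behaviour that surprised me with NTP.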

Darren
