[smf-discuss] Restart after SMF start timeout

Liane Praza Fri, 18 Apr 2008 11:03:41 -0700

Renaud Manus wrote:
> This is RFE 6450661 "start method failures should be configurable"


Yep, mostly.  For RFE 6450661 (and its friends 6197273 and 6219078, 
which should probably be consolidated), I haven't yet seen a specific 
request that wouldn't be better solved by more intelligent restart 
detection by startd rather than allowing significant tweaking of the 
algorithm through configuration.

So, I would like a clarification on the scenario below.

> Neil Garthwaite wrote:
>> Hi,
>>
>> After reading the svc.startd(1M) man page and in particular "SERVICE  
>> FAILURE", I'm trying to find if I can influence
>>
>> "...
>>       If three method failures happen in a row, or if the  service
>>       is restarting more than once a second, svc.startd places the
>>       service in the maintenance state.
>> ..."
>>
>> It appears that even a transient service gets retried three times if  
>> the start fails to exit within the start timeout, i.e. the start times  
>> out. Basically, I have an SMF service which works fine. However, I'm  
>> now injecting some faults to determine how robust it is and in this  
>> regard the start method checks to see if the application is really up  
>> and available for work before exiting from the start method.
>>
>> In one particular fault injected case, my SMF service consumes all of  
>> its start timeout and then times out. However, it gets restarted 3  
>> times before entering into a maintenance state. In this regard if all  
>> of start timeout is consumed I would simply like the SMF service to  
>> enter maintenance state and not get retried three times.
>>
>> So, is there a property I can use to influence the number of retries  
>> for start.

If we had general fail-once semantics (that is, any failure of this 
service would cause it to enter maintenance), would that satisfy your 
request?

Though, as an aside, I am interested in how this works in real life.  If 
the transient service fails, and enters maintenance, what will the 
administrator do differently than your stop method script to clean up so 
that they don't have to reboot to repair the service?

(Most of the requests I've heard for fail-once semantics have been from 
administrators rather than service authors, because the administrators 
don't trust that the services can properly clean up after themselves.)

>>
>> My only thought to achieve only one start is to keep a count of the  
>> consumed time, i.e. ksh variable $SECONDS and then exit with  
>> $SMF_EXIT_ERR_CONFIG or $SMF_EXIT_ERR_FATAL just before the service  
>> times out.

Yep, that's probably the best workaround for now.
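
A minimal sketch of that workaround, under some assumptions: the 
readiness probe (app_is_ready), the READY_FILE path, and the 
START_BUDGET and POLL_INTERVAL values are all hypothetical placeholders, 
not part of SMF.  It counts sleep intervals rather than reading ksh's 
$SECONDS so that it also runs in shells without that variable; in ksh, 
comparing $SECONDS against the budget should work equally well.

```shell
#!/bin/sh
# Start method sketch: give up with a fatal exit code before svc.startd's
# start timeout expires, instead of letting the method time out.

# Pick up the real SMF_EXIT_* values on Solaris; otherwise fall back to
# the values that smf_include.sh defines.
if [ -r /lib/svc/share/smf_include.sh ]; then
    . /lib/svc/share/smf_include.sh
fi
: "${SMF_EXIT_OK:=0}"
: "${SMF_EXIT_ERR_FATAL:=95}"

# Hypothetical tunables: keep START_BUDGET below the start method's
# timeout_seconds so this script exits before startd kills it.
START_BUDGET=${START_BUDGET:-50}
POLL_INTERVAL=${POLL_INTERVAL:-2}

# Hypothetical readiness probe: replace with a real check of the application.
app_is_ready() {
    [ -f "${READY_FILE:-/var/run/myapp.ready}" ]
}

# Poll the probe until it succeeds or roughly $1 seconds of sleeping have
# passed, sleeping $2 seconds between attempts.
# Returns 0 if the application became ready, 1 if the budget ran out.
wait_for_app() {
    budget=$1
    interval=$2
    elapsed=0
    while [ "$elapsed" -lt "$budget" ]; do
        if app_is_ready; then
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}

start_method() {
    # Launch the application here (omitted in this sketch).
    if wait_for_app "$START_BUDGET" "$POLL_INTERVAL"; then
        return "$SMF_EXIT_OK"
    fi
    echo "application not ready after ${START_BUDGET}s, giving up" >&2
    return "$SMF_EXIT_ERR_FATAL"
}

# Run only when invoked as "<script> start", the usual exec string form.
if [ "${1:-}" = "start" ]; then
    start_method
    exit $?
fi
```

The counter only tracks time spent sleeping, so a slow probe needs a 
smaller START_BUDGET to leave headroom before the real timeout.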

>>
>> I was hoping a transient service would be exempt from being restarted  
>> three times but it appears not. I would appreciate any thoughts on how  
>> to achieve only one start or other suggestions to my thought of  
>> keeping a count within the start method.

liane
