changeset:   4415:cdbe70f24e8d
tag:         tip
parent:      4408:c8588d280ae7
user:        [email protected]
date:        Wed Aug 14 14:11:44 2013 +0530
summary:     amfnd: correlate clc scripts response event based on comp name [#514]

changeset:   4414:45a4f1fd7f41
branch:      opensaf-4.3.x
parent:      4412:a5f5c8cc982a
user:        [email protected]
date:        Wed Aug 14 14:10:44 2013 +0530
summary:     amfnd: correlate clc scripts response event based on comp name [#514]

changeset:   4413:872897ba7e1f
branch:      opensaf-4.2.x
parent:      4410:b1e1071ba046
user:        [email protected]
date:        Wed Aug 14 14:08:58 2013 +0530
summary:     amfnd: correlate clc scripts response event based on comp name [#514]



---

** [tickets:#514] Amfnd: Component cleanup fail**

**Status:** fixed
**Created:** Mon Jul 22, 2013 08:15 AM UTC by hano
**Last Updated:** Tue Aug 06, 2013 01:24 PM UTC
**Owner:** Praveen

Annotated syslog (debug patch used):

Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 'safComp=xxx,safSu=SC-1,safSg=1,safApp=APP1' faulted due to 'csiSetcallbackTimeout' : Recovery is 'suFailover'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 'safSi=xxx-2N-1,safApp=APP1' ACTIVE to 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP1' Presence State INSTANTIATED => TERMINATING
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' faulted due to 'csiSetcallbackTimeout' : Recovery is 'suFailover'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 'safSi=yyy-2N-1,safApp=APP2' ACTIVE to 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:47 SC-1 osafamfnd[21575]: exec '/opt/scsv/lib/yyy.sh cleanup', timeout 10000 ms
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP2' Presence State INSTANTIATED => TERMINATING
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigning 'safSi=xxx-2N-1,safApp=APP1' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 'safSi=xxx-2N-1,safApp=APP1' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigning 'safSi=yyy-2N-1,safApp=APP2' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 'safSi=yyy-2N-1,safApp=APP2' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:47 SC-1 osafamfnd[21574]: exec '/opt/vdchsv/bin/xxx.sh cleanup', timeout 10000 ms
Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removing 'safSi=xxx-2N-1,safApp=APP1' from 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removed 'safSi=xxx-2N-1,safApp=APP1' from 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removing 'safSi=yyy-2N-1,safApp=APP2' from 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removed 'safSi=yyy-2N-1,safApp=APP2' from 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:48 SC-1 osafamfnd[12904]: Child process 21575 terminated normally with exit status 0
Jul 19 07:37:48 SC-1 osafamfnd[21595]: exec '/opt/scsv/lib/yyy.sh instantiate', timeout 10000 ms
Jul 19 07:37:48 SC-1 osafamfnd[12904]: Child process 21574 terminated normally with exit status 0
Jul 19 07:37:48 SC-1 osafamfnd[21599]: exec '/opt/vdchsv/bin/xxx.sh instantiate', timeout 10000 ms
Jul 19 07:37:58 SC-1 osafamfnd[12904]: Timeout waiting for child process 21595 to terminate
Jul 19 07:37:58 SC-1 osafamfnd[12904]: Timeout waiting for child process 21599 to terminate

>>hafe: timeout waiting for instantiate scripts

Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Instantiation of 'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' failed
Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Reason:'Script did not exit within time'

>>hafe: execution of the instantiate script fails with a timeout after 10 sec; this is OK and a component issue.
>>hafe: the cleanup command is issued, as can be seen below.

Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Cleanup of 'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' failed
Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Reason:'Script did not exit within time'

>>hafe: because of the bug described, the second timeout is correlated with the wrong component, and since that
>>hafe: component is in TERMINATING state, it enters TERMINATION_FAILED state, as seen below.

Jul 19 07:37:58 SC-1 osafamfnd[21724]: exec '/opt/scsv/lib/yyy.sh cleanup', timeout 10000 ms
Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP2' Presence State TERMINATING => TERMINATION_FAILED
Jul 19 07:37:58 SC-1 osafamfnd[12904]: Child process 21724 terminated normally with exit status 0


Here amfnd correlates the second instantiate timeout (child process 21599, the xxx instantiate script) with the wrong component (yyy) and concludes that yyy's cleanup has failed. The component then enters the TERMINATION_FAILED presence state, which is a final state that requires manual intervention.
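The changesets above describe the fix as "correlate clc scripts response event based on comp name". The following is a minimal sketch of that idea under a simplified model. It assumes that each CLC script response carries the name of the component the script was launched for, together with the CLC command it ran, and that the node director looks the component up by that name. `Component`, `ClcResponse`, `correlate` and `handle` are hypothetical names for illustration only. They are not the actual amfnd data structures or functions. The stale-response check (comparing the event's command with the component's pending command) is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Component:
    """Hypothetical, simplified view of a component tracked by the node director."""
    name: str
    pending_cmd: Optional[str] = None   # CLC command whose script is still running
    termination_failed: bool = False


@dataclass(frozen=True)
class ClcResponse:
    """Hypothetical script-completion event that carries the component name."""
    comp_name: str
    cmd: str          # "instantiate" or "cleanup"
    timed_out: bool


def correlate(comps: Dict[str, Component], ev: ClcResponse) -> Optional[Component]:
    """Find the component by the name carried in the event.

    Returns None when the component is unknown or is no longer waiting for
    this command (a stale response), instead of picking some other component.
    """
    comp = comps.get(ev.comp_name)
    if comp is None or comp.pending_cmd != ev.cmd:
        return None
    return comp


def handle(comps: Dict[str, Component], ev: ClcResponse) -> str:
    comp = correlate(comps, ev)
    if comp is None:
        return f"Ignored stale {ev.cmd} response for {ev.comp_name}"
    if not ev.timed_out:
        comp.pending_cmd = None
        return f"{ev.cmd} of {comp.name} succeeded"
    if ev.cmd == "instantiate":
        # A failed instantiation is followed by a cleanup of the same component.
        comp.pending_cmd = "cleanup"
        return f"Instantiation of {comp.name} failed"
    # Only a timeout of this component's own cleanup script reaches this point.
    comp.pending_cmd = None
    comp.termination_failed = True
    return f"Cleanup of {comp.name} failed"


# Replay of the logged scenario: both instantiate scripts time out.
comps = {
    "yyy": Component("yyy", pending_cmd="instantiate"),
    "xxx": Component("xxx", pending_cmd="instantiate"),
}
print(handle(comps, ClcResponse("yyy", "instantiate", timed_out=True)))  # Instantiation of yyy failed
print(handle(comps, ClcResponse("xxx", "instantiate", timed_out=True)))  # Instantiation of xxx failed
print(comps["yyy"].termination_failed)                                  # False
```

Because the lookup key is the name carried by the event, the second timeout (for xxx) can only affect xxx. Whether yyy ends up termination-failed is then decided solely by the response of yyy's own cleanup script.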



---

Sent from sourceforge.net because [email protected] is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
