- Description has changed:

Diff:

~~~~

--- old
+++ new
@@ -1 +1,45 @@
-When amfnd is to run component cleanup script, the exec in the child process 
is not run and a timeout after 10 seconds is delivered to amfnd. Probably the 
child process "hangs" when closing file descriptors before exec. A patch that 
will use alarm in the child process is being tested.
+Annotated syslog, debug patch used:
+
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 
'safComp=xxx,safSu=SC-1,safSg=1,safApp=APP1' faulted due to 
'csiSetcallbackTimeout' : Recovery is 'suFailover'
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 
'safSi=xxx-2N-1,safApp=APP1' ACTIVE to 'safSu=SC-1,safSg=1,safApp=APP1'
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP1' 
Presence State INSTANTIATED => TERMINATING
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 
'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' faulted due to 
'csiSetcallbackTimeout' : Recovery is 'suFailover'
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 
'safSi=yyy-2N-1,safApp=APP2' ACTIVE to 'safSu=SC-1,safSg=1,safApp=APP2'
+Jul 19 07:37:47 SC-1 osafamfnd[21575]: exec '/opt/scsv/lib/yyy.sh cleanup', 
timeout 10000 ms
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP2' 
Presence State INSTANTIATED => TERMINATING
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigning 
'safSi=xxx-2N-1,safApp=APP1' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP1'
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 
'safSi=xxx-2N-1,safApp=APP1' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP1'
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigning 
'safSi=yyy-2N-1,safApp=APP2' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP2'
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 
'safSi=yyy-2N-1,safApp=APP2' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP2'
+Jul 19 07:37:47 SC-1 osafamfnd[21574]: exec '/opt/vdchsv/bin/xxx.sh cleanup', 
timeout 10000 ms
+Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removing 
'safSi=xxx-2N-1,safApp=APP1' from 'safSu=SC-1,safSg=1,safApp=APP1'
+Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removed 'safSi=xxx-2N-1,safApp=APP1' 
from 'safSu=SC-1,safSg=1,safApp=APP1'
+Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removing 
'safSi=yyy-2N-1,safApp=APP2' from 'safSu=SC-1,safSg=1,safApp=APP2'
+Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removed 'safSi=yyy-2N-1,safApp=APP2' 
from 'safSu=SC-1,safSg=1,safApp=APP2'
+Jul 19 07:37:48 SC-1 osafamfnd[12904]: Child process 21575 terminated normally 
with exit status 0
+Jul 19 07:37:48 SC-1 osafamfnd[21595]: exec '/opt/scsv/lib/yyy.sh 
instantiate', timeout 10000 ms
+Jul 19 07:37:48 SC-1 osafamfnd[12904]: Child process 21574 terminated normally 
with exit status 0
+Jul 19 07:37:48 SC-1 osafamfnd[21599]: exec '/opt/vdchsv/bin/xxx.sh 
instantiate', timeout 10000 ms
+Jul 19 07:37:58 SC-1 osafamfnd[12904]: Timeout waiting for child process 21595 
to terminate
+Jul 19 07:37:58 SC-1 osafamfnd[12904]: Timeout waiting for child process 21599 
to terminate
+
+>>hafe: timeout waiting for instantiate scripts
+
+Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Instantiation of 
'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' failed
+Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Reason:'Script did not exit within 
time'
+
+>>hafe: execution of the instantiate script fails with a timeout after 10 sec; 
this is OK and is a component issue.
+>>hafe: a cleanup command is issued, as can be seen below.
+
+Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Cleanup of 
'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' failed
+Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Reason:'Script did not exit within 
time'
+
+>>hafe: because of the bug described, the second timeout is correlated with 
the wrong component and since that
+>>hafe: component is in TERMINATING state it enters TERMINATION-FAILED state 
as seen below.
+
+Jul 19 07:37:58 SC-1 osafamfnd[21724]: exec '/opt/scsv/lib/yyy.sh cleanup', 
timeout 10000 ms
+Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP2' 
Presence State TERMINATING => TERMINATION_FAILED
+Jul 19 07:37:58 SC-1 osafamfnd[12904]: Child process 21724 terminated normally 
with exit status 0
+
+
+Here amfnd interprets the second instantiate timeout incorrectly: it 
correlates the timeout with the wrong component and thinks that cleanup has 
failed. The component enters the TERMINATION-FAILED presence state, which is a 
final state that requires manual intervention.

~~~~




---

**[tickets:#514] Amfnd: Component cleanup fail**

**Status:** accepted
**Created:** Mon Jul 22, 2013 08:15 AM UTC by hano
**Last Updated:** Tue Aug 06, 2013 10:53 AM UTC
**Owner:** hano

Annotated syslog, debug patch used:

Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 
'safComp=xxx,safSu=SC-1,safSg=1,safApp=APP1' faulted due to 
'csiSetcallbackTimeout' : Recovery is 'suFailover'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 'safSi=xxx-2N-1,safApp=APP1' 
ACTIVE to 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP1' 
Presence State INSTANTIATED => TERMINATING
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 
'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' faulted due to 
'csiSetcallbackTimeout' : Recovery is 'suFailover'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 'safSi=yyy-2N-1,safApp=APP2' 
ACTIVE to 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:47 SC-1 osafamfnd[21575]: exec '/opt/scsv/lib/yyy.sh cleanup', 
timeout 10000 ms
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP2' 
Presence State INSTANTIATED => TERMINATING
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigning 
'safSi=xxx-2N-1,safApp=APP1' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 'safSi=xxx-2N-1,safApp=APP1' 
QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigning 
'safSi=yyy-2N-1,safApp=APP2' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 'safSi=yyy-2N-1,safApp=APP2' 
QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:47 SC-1 osafamfnd[21574]: exec '/opt/vdchsv/bin/xxx.sh cleanup', 
timeout 10000 ms
Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removing 'safSi=xxx-2N-1,safApp=APP1' 
from 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removed 'safSi=xxx-2N-1,safApp=APP1' 
from 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removing 'safSi=yyy-2N-1,safApp=APP2' 
from 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removed 'safSi=yyy-2N-1,safApp=APP2' 
from 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:48 SC-1 osafamfnd[12904]: Child process 21575 terminated normally 
with exit status 0
Jul 19 07:37:48 SC-1 osafamfnd[21595]: exec '/opt/scsv/lib/yyy.sh instantiate', 
timeout 10000 ms
Jul 19 07:37:48 SC-1 osafamfnd[12904]: Child process 21574 terminated normally 
with exit status 0
Jul 19 07:37:48 SC-1 osafamfnd[21599]: exec '/opt/vdchsv/bin/xxx.sh 
instantiate', timeout 10000 ms
Jul 19 07:37:58 SC-1 osafamfnd[12904]: Timeout waiting for child process 21595 
to terminate
Jul 19 07:37:58 SC-1 osafamfnd[12904]: Timeout waiting for child process 21599 
to terminate

>>hafe: timeout waiting for instantiate scripts

Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Instantiation of 
'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' failed
Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Reason:'Script did not exit within 
time'

>>hafe: execution of the instantiate script fails with a timeout after 10 sec; 
>>this is OK and is a component issue.
>>hafe: a cleanup command is issued, as can be seen below.

Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Cleanup of 
'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' failed
Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Reason:'Script did not exit within 
time'

>>hafe: because of the bug described, the second timeout is correlated with the 
>>wrong component, and since that
>>hafe: component is in TERMINATING state, it enters TERMINATION-FAILED state, 
>>as seen below.

Jul 19 07:37:58 SC-1 osafamfnd[21724]: exec '/opt/scsv/lib/yyy.sh cleanup', 
timeout 10000 ms
Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP2' 
Presence State TERMINATING => TERMINATION_FAILED
Jul 19 07:37:58 SC-1 osafamfnd[12904]: Child process 21724 terminated normally 
with exit status 0
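
The annotations above rest on matching each "Timeout waiting for child process" line to the `exec` line that carries the same PID. That matching can be checked mechanically. The following is a minimal Python sketch, not part of amfnd: the regular expressions and the `timed_out_commands` helper are illustrative assumptions, and the log lines are copied from the syslog above with the e-mail line wrapping undone.

```python
import re

# Log lines copied from the syslog above (e-mail line wrapping undone).
LOG = """\
Jul 19 07:37:48 SC-1 osafamfnd[21595]: exec '/opt/scsv/lib/yyy.sh instantiate', timeout 10000 ms
Jul 19 07:37:48 SC-1 osafamfnd[21599]: exec '/opt/vdchsv/bin/xxx.sh instantiate', timeout 10000 ms
Jul 19 07:37:58 SC-1 osafamfnd[12904]: Timeout waiting for child process 21595 to terminate
Jul 19 07:37:58 SC-1 osafamfnd[12904]: Timeout waiting for child process 21599 to terminate
"""

# The debug patch logs the exec line from the child, so the PID in brackets
# is the PID of the child process that runs the script.
EXEC_RE = re.compile(r"osafamfnd\[(\d+)\]: exec '([^']+)'")
TIMEOUT_RE = re.compile(r"Timeout waiting for child process (\d+)")


def timed_out_commands(log):
    """Return (pid, command) for every child process that timed out."""
    commands = {}  # child PID -> command line it was started with
    result = []
    for line in log.splitlines():
        match = EXEC_RE.search(line)
        if match:
            commands[int(match.group(1))] = match.group(2)
            continue
        match = TIMEOUT_RE.search(line)
        if match:
            pid = int(match.group(1))
            result.append((pid, commands.get(pid, "<unknown>")))
    return result


for pid, command in timed_out_commands(LOG):
    print(pid, command)
```

Both timed-out PIDs map to `instantiate` commands, so by this reading neither timeout should have been reported as a failed cleanup.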


Here amfnd interprets the second instantiate timeout incorrectly: it 
correlates the timeout with the wrong component and thinks that cleanup has 
failed. The component enters the TERMINATION-FAILED presence state, which is a 
final state that requires manual intervention.
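
One way to avoid this kind of mix-up is to key every pending command on the PID of its child process, so that a timeout can only be attributed to the component and command that started that child. The following is a minimal Python sketch of that idea, not amfnd's actual code: the `Command` and `PendingCommands` names are hypothetical, the PIDs are taken from the log above, and the pairing of each script with a component is inferred from the anonymised names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    component: str  # DN of the component the script was run for
    kind: str       # "instantiate" or "cleanup"


class PendingCommands:
    """Hypothetical registry: child PID -> command that started it."""

    def __init__(self):
        self._by_pid = {}

    def started(self, pid, component, kind):
        # Record the owner of the child at the moment it is started.
        self._by_pid[pid] = Command(component, kind)

    def timed_out(self, pid):
        # Remove and return the owning command, or None for an unknown PID.
        # The lookup never depends on the order in which timeouts arrive.
        return self._by_pid.pop(pid, None)


pending = PendingCommands()
pending.started(21595, "safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2", "instantiate")
pending.started(21599, "safComp=xxx,safSu=SC-1,safSg=1,safApp=APP1", "instantiate")

first = pending.timed_out(21595)
second = pending.timed_out(21599)
print(first.kind, first.component)
print(second.kind, second.component)
```

With this bookkeeping the second timeout resolves to an instantiate command for the xxx component, rather than to a cleanup of the yyy component.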


