- Description has changed:
Diff:
~~~~
--- old
+++ new
@@ -1 +1,45 @@
-When amfnd is to run component cleanup script, the exec in the child process
is not run and a timeout after 10 seconds is delivered to amfnd. Probably the
child process "hangs" when closing file descriptors before exec. A patch that
will use alarm in the child process is being tested.
+Annotated syslog, debug patch used:
+
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO
'safComp=xxx,safSu=SC-1,safSg=1,safApp=APP1' faulted due to
'csiSetcallbackTimeout' : Recovery is 'suFailover'
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned
'safSi=xxx-2N-1,safApp=APP1' ACTIVE to 'safSu=SC-1,safSg=1,safApp=APP1'
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP1'
Presence State INSTANTIATED => TERMINATING
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO
'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' faulted due to
'csiSetcallbackTimeout' : Recovery is 'suFailover'
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned
'safSi=yyy-2N-1,safApp=APP2' ACTIVE to 'safSu=SC-1,safSg=1,safApp=APP2'
+Jul 19 07:37:47 SC-1 osafamfnd[21575]: exec '/opt/scsv/lib/yyy.sh cleanup',
timeout 10000 ms
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP2'
Presence State INSTANTIATED => TERMINATING
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigning
'safSi=xxx-2N-1,safApp=APP1' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP1'
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned
'safSi=xxx-2N-1,safApp=APP1' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP1'
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigning
'safSi=yyy-2N-1,safApp=APP2' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP2'
+Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned
'safSi=yyy-2N-1,safApp=APP2' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP2'
+Jul 19 07:37:47 SC-1 osafamfnd[21574]: exec '/opt/vdchsv/bin/xxx.sh cleanup',
timeout 10000 ms
+Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removing
'safSi=xxx-2N-1,safApp=APP1' from 'safSu=SC-1,safSg=1,safApp=APP1'
+Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removed 'safSi=xxx-2N-1,safApp=APP1'
from 'safSu=SC-1,safSg=1,safApp=APP1'
+Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removing
'safSi=yyy-2N-1,safApp=APP2' from 'safSu=SC-1,safSg=1,safApp=APP2'
+Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removed 'safSi=yyy-2N-1,safApp=APP2'
from 'safSu=SC-1,safSg=1,safApp=APP2'
+Jul 19 07:37:48 SC-1 osafamfnd[12904]: Child process 21575 terminated normally
with exit status 0
+Jul 19 07:37:48 SC-1 osafamfnd[21595]: exec '/opt/scsv/lib/yyy.sh
instantiate', timeout 10000 ms
+Jul 19 07:37:48 SC-1 osafamfnd[12904]: Child process 21574 terminated normally
with exit status 0
+Jul 19 07:37:48 SC-1 osafamfnd[21599]: exec '/opt/vdchsv/bin/xxx.sh
instantiate', timeout 10000 ms
+Jul 19 07:37:58 SC-1 osafamfnd[12904]: Timeout waiting for child process 21595
to terminate
+Jul 19 07:37:58 SC-1 osafamfnd[12904]: Timeout waiting for child process 21599
to terminate
+
+>>hafe: timeout waiting for instantiate scripts
+
+Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Instantiation of
'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' failed
+Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Reason:'Script did not exit within
time'
+
+>>hafe: execution of the instantiate script fails with a timeout after 10 sec;
this is OK and is a component issue.
+>>hafe: cleanup command issued as can be seen below.
+
+Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Cleanup of
'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' failed
+Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Reason:'Script did not exit within
time'
+
+>>hafe: because of the bug described, the second timeout is correlated with
the wrong component and since that
+>>hafe: component is in TERMINATING state it enters TERMINATION-FAILED state
as seen below.
+
+Jul 19 07:37:58 SC-1 osafamfnd[21724]: exec '/opt/scsv/lib/yyy.sh cleanup',
timeout 10000 ms
+Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP2'
Presence State TERMINATING => TERMINATION_FAILED
+Jul 19 07:37:58 SC-1 osafamfnd[12904]: Child process 21724 terminated normally
with exit status 0
+
+
+Here amfnd correlates the second instantiate timeout with the
wrong component and concludes that cleanup has failed. The component enters the
TERMINATION-FAILED presence state, which is a final state that requires manual
intervention.
~~~~
---
** [tickets:#514] Amfnd: Component cleanup fail**
**Status:** accepted
**Created:** Mon Jul 22, 2013 08:15 AM UTC by hano
**Last Updated:** Tue Aug 06, 2013 10:53 AM UTC
**Owner:** hano
Annotated syslog, debug patch used:
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO
'safComp=xxx,safSu=SC-1,safSg=1,safApp=APP1' faulted due to
'csiSetcallbackTimeout' : Recovery is 'suFailover'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 'safSi=xxx-2N-1,safApp=APP1'
ACTIVE to 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP1'
Presence State INSTANTIATED => TERMINATING
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO
'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' faulted due to
'csiSetcallbackTimeout' : Recovery is 'suFailover'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 'safSi=yyy-2N-1,safApp=APP2'
ACTIVE to 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:47 SC-1 osafamfnd[21575]: exec '/opt/scsv/lib/yyy.sh cleanup',
timeout 10000 ms
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP2'
Presence State INSTANTIATED => TERMINATING
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigning
'safSi=xxx-2N-1,safApp=APP1' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 'safSi=xxx-2N-1,safApp=APP1'
QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigning
'safSi=yyy-2N-1,safApp=APP2' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 'safSi=yyy-2N-1,safApp=APP2'
QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:47 SC-1 osafamfnd[21574]: exec '/opt/vdchsv/bin/xxx.sh cleanup',
timeout 10000 ms
Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removing 'safSi=xxx-2N-1,safApp=APP1'
from 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removed 'safSi=xxx-2N-1,safApp=APP1'
from 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removing 'safSi=yyy-2N-1,safApp=APP2'
from 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removed 'safSi=yyy-2N-1,safApp=APP2'
from 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:48 SC-1 osafamfnd[12904]: Child process 21575 terminated normally
with exit status 0
Jul 19 07:37:48 SC-1 osafamfnd[21595]: exec '/opt/scsv/lib/yyy.sh instantiate',
timeout 10000 ms
Jul 19 07:37:48 SC-1 osafamfnd[12904]: Child process 21574 terminated normally
with exit status 0
Jul 19 07:37:48 SC-1 osafamfnd[21599]: exec '/opt/vdchsv/bin/xxx.sh
instantiate', timeout 10000 ms
Jul 19 07:37:58 SC-1 osafamfnd[12904]: Timeout waiting for child process 21595
to terminate
Jul 19 07:37:58 SC-1 osafamfnd[12904]: Timeout waiting for child process 21599
to terminate
>>hafe: timeout waiting for instantiate scripts
Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Instantiation of
'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' failed
Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Reason:'Script did not exit within
time'
>>hafe: execution of the instantiate script fails with a timeout after 10 sec;
>>hafe: this is OK and is a component issue.
>>hafe: cleanup command issued as can be seen below.
Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Cleanup of
'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' failed
Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Reason:'Script did not exit within
time'
>>hafe: because of the bug described, the second timeout is correlated with the
>>hafe: wrong component, and since that component is in TERMINATING state it
>>hafe: enters TERMINATION-FAILED state as seen below.
Jul 19 07:37:58 SC-1 osafamfnd[21724]: exec '/opt/scsv/lib/yyy.sh cleanup',
timeout 10000 ms
Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP2'
Presence State TERMINATING => TERMINATION_FAILED
Jul 19 07:37:58 SC-1 osafamfnd[12904]: Child process 21724 terminated normally
with exit status 0
Here amfnd correlates the second instantiate timeout with the
wrong component and concludes that cleanup has failed. The component enters the
TERMINATION-FAILED presence state, which is a final state that requires manual
intervention.
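
To illustrate the kind of correlation that goes wrong here, the following is a minimal sketch, not amfnd code and not the actual fix. It assumes each outstanding script execution is recorded under the PID of its child process, so that a timeout for a given PID can only be attributed to the component and command that started that child. The names `PendingCommand` and `CommandTracker` are hypothetical; the PIDs and component names are taken from the syslog above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PendingCommand:
    """One outstanding script execution (hypothetical record, not an amfnd type)."""
    component: str  # DN of the component the script belongs to
    command: str    # "instantiate", "cleanup", ...


class CommandTracker:
    """Maps child PIDs to the component and command that started them."""

    def __init__(self) -> None:
        self._pending: Dict[int, PendingCommand] = {}

    def started(self, pid: int, component: str, command: str) -> None:
        # Record the child as soon as it has been forked.
        self._pending[pid] = PendingCommand(component, command)

    def finished(self, pid: int) -> Optional[PendingCommand]:
        # Called on exit or timeout; returns the entry for exactly this PID,
        # or None if the PID is unknown or was already handled.
        return self._pending.pop(pid, None)


tracker = CommandTracker()
tracker.started(21595, "safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2", "instantiate")
tracker.started(21599, "safComp=xxx,safSu=SC-1,safSg=1,safApp=APP1", "instantiate")

# Both timeouts arrive back to back, as in the log at 07:37:58.
first = tracker.finished(21595)
second = tracker.finished(21599)
print(first)
print(second)
```

With this lookup, the timeout for child 21599 resolves to the xxx component's instantiate command, so it could not be reported as a failed cleanup of the yyy component.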