- **status**: review --> fixed
- **assigned_to**: hano --> Praveen
- **Milestone**: future --> 4.2.5
---
**[tickets:#514] Amfnd: Component cleanup fail**
**Status:** fixed
**Created:** Mon Jul 22, 2013 08:15 AM UTC by hano
**Last Updated:** Tue Aug 06, 2013 01:24 PM UTC
**Owner:** Praveen
Annotated syslog (debug patch used):
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO
'safComp=xxx,safSu=SC-1,safSg=1,safApp=APP1' faulted due to
'csiSetcallbackTimeout' : Recovery is 'suFailover'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 'safSi=xxx-2N-1,safApp=APP1'
ACTIVE to 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP1'
Presence State INSTANTIATED => TERMINATING
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO
'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' faulted due to
'csiSetcallbackTimeout' : Recovery is 'suFailover'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 'safSi=yyy-2N-1,safApp=APP2'
ACTIVE to 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:47 SC-1 osafamfnd[21575]: exec '/opt/scsv/lib/yyy.sh cleanup',
timeout 10000 ms
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP2'
Presence State INSTANTIATED => TERMINATING
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigning
'safSi=xxx-2N-1,safApp=APP1' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 'safSi=xxx-2N-1,safApp=APP1'
QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigning
'safSi=yyy-2N-1,safApp=APP2' QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:47 SC-1 osafamfnd[12904]: NO Assigned 'safSi=yyy-2N-1,safApp=APP2'
QUIESCED to 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:47 SC-1 osafamfnd[21574]: exec '/opt/vdchsv/bin/xxx.sh cleanup',
timeout 10000 ms
Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removing 'safSi=xxx-2N-1,safApp=APP1'
from 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removed 'safSi=xxx-2N-1,safApp=APP1'
from 'safSu=SC-1,safSg=1,safApp=APP1'
Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removing 'safSi=yyy-2N-1,safApp=APP2'
from 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:48 SC-1 osafamfnd[12904]: NO Removed 'safSi=yyy-2N-1,safApp=APP2'
from 'safSu=SC-1,safSg=1,safApp=APP2'
Jul 19 07:37:48 SC-1 osafamfnd[12904]: Child process 21575 terminated normally
with exit status 0
Jul 19 07:37:48 SC-1 osafamfnd[21595]: exec '/opt/scsv/lib/yyy.sh instantiate',
timeout 10000 ms
Jul 19 07:37:48 SC-1 osafamfnd[12904]: Child process 21574 terminated normally
with exit status 0
Jul 19 07:37:48 SC-1 osafamfnd[21599]: exec '/opt/vdchsv/bin/xxx.sh
instantiate', timeout 10000 ms
Jul 19 07:37:58 SC-1 osafamfnd[12904]: Timeout waiting for child process 21595
to terminate
Jul 19 07:37:58 SC-1 osafamfnd[12904]: Timeout waiting for child process 21599
to terminate
>>hafe: timeout waiting for instantiate scripts
Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Instantiation of
'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' failed
Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Reason:'Script did not exit within
time'
>>hafe: execution of the instantiate script fails with a timeout after 10 sec;
>>hafe: this is OK and a component issue.
>>hafe: cleanup command issued as can be seen below.
Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Cleanup of
'safComp=yyy,safSu=SC-1,safSg=1,safApp=APP2' failed
Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO Reason:'Script did not exit within
time'
>>hafe: because of the bug described, the second timeout is correlated with the
>>hafe: wrong component, and since that component is in TERMINATING state it
>>hafe: enters TERMINATION-FAILED state as seen below.
Jul 19 07:37:58 SC-1 osafamfnd[21724]: exec '/opt/scsv/lib/yyy.sh cleanup',
timeout 10000 ms
Jul 19 07:37:58 SC-1 osafamfnd[12904]: NO 'safSu=SC-1,safSg=1,safApp=APP2'
Presence State TERMINATING => TERMINATION_FAILED
Jul 19 07:37:58 SC-1 osafamfnd[12904]: Child process 21724 terminated normally
with exit status 0
Here amfnd correlates the second instantiate timeout with the wrong component
and concludes that its cleanup has failed. The component enters the
TERMINATION-FAILED presence state, which is a final state that requires manual
intervention.
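
The mismatch can be checked directly from the syslog by pairing each "Timeout waiting for child process" line with the "exec" line logged under the same PID. The following is a minimal sketch for reading the log, not amfnd code and not the fix itself; the regular expressions and the function name `timed_out_commands` are assumptions made for illustration, and the log excerpt is copied from above with wrapped lines rejoined.

```python
import re

# Excerpt of the syslog above, with wrapped lines rejoined.
LOG = """\
Jul 19 07:37:48 SC-1 osafamfnd[21595]: exec '/opt/scsv/lib/yyy.sh instantiate', timeout 10000 ms
Jul 19 07:37:48 SC-1 osafamfnd[21599]: exec '/opt/vdchsv/bin/xxx.sh instantiate', timeout 10000 ms
Jul 19 07:37:58 SC-1 osafamfnd[12904]: Timeout waiting for child process 21595 to terminate
Jul 19 07:37:58 SC-1 osafamfnd[12904]: Timeout waiting for child process 21599 to terminate
"""

# Hypothetical patterns, written to match the debug-patch log format shown above.
EXEC_RE = re.compile(r"osafamfnd\[(\d+)\]: exec '(\S+) (\w+)'")
TIMEOUT_RE = re.compile(r"Timeout waiting for child process (\d+)")


def timed_out_commands(log):
    """Return (pid, script, command) for every child process that timed out."""
    started = {}  # child PID -> (script path, command argument)
    result = []
    for line in log.splitlines():
        match = EXEC_RE.search(line)
        if match:
            started[int(match.group(1))] = (match.group(2), match.group(3))
            continue
        match = TIMEOUT_RE.search(line)
        if match:
            pid = int(match.group(1))
            # "?" marks a timeout whose exec line is not in the excerpt.
            script, command = started.get(pid, ("?", "?"))
            result.append((pid, script, command))
    return result


for pid, script, command in timed_out_commands(LOG):
    print(pid, script, command)
```

Both timed-out children (21595 and 21599) were started with the `instantiate` argument, so neither timeout belongs to a cleanup script.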