The problem seems to start here:
Dec 13 12:19:56 SLES2-2 osafsmfd[2331]: ER amf_quiesced_state_handler oi deactivate FAILED

SMF tries to clear the class implementers and something fails (there is no
logging of exactly what fails).
SMF does not report the failure to AMF, so AMF goes on to activate the other
SMF, and the IMM class activation fails there (since the old active failed to
clear them, I guess).

I have checked how other services handle quiesced, and many of them just
ignore errors when releasing IMM class implementers.

The question is how we should handle them. In the SMF case we are releasing
three different class implementers, and if the first one succeeds and the
second fails, we never attempt to release the third. Should we just continue,
ignoring errors, and release all of them anyway?
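
To make the "continue anyway" option concrete, here is a minimal sketch of the idea. It is not the actual smfd code: `Result`, `release_all` and the `release_one` callback are hypothetical names for illustration only. In the real daemon the callback would wrap the IMM OI release call for one class. The point is that every release is attempted and each failure is logged by class name, so the log shows what failed.

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical result type standing in for the real IMM return code.
enum class Result { Ok, Failed };

// Try to release every class implementer, even if an earlier release fails.
// Returns the names of the classes whose release failed, so the caller can
// decide whether to report an error to AMF or just carry on.
std::vector<std::string> release_all(
    const std::vector<std::string>& class_names,
    const std::function<Result(const std::string&)>& release_one) {
  std::vector<std::string> failed;
  for (const auto& name : class_names) {
    if (release_one(name) != Result::Ok) {
      // Log exactly which release failed, then keep going.
      std::cerr << "release of class implementer '" << name << "' failed\n";
      failed.push_back(name);
    }
  }
  return failed;
}
```

With three placeholder class names and a callback that fails on the second one, all three releases are still attempted and the returned list contains only the second name. Whether a non-empty list should then be reported to AMF as a failed quiesced assignment is the open question above.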
/Bertil
From: Hrishikesh [mailto:[email protected]]
Sent: den 13 december 2013 13:00
To: [email protected]
Subject: [tickets] [opensaf:tickets] #657 SMF component got faulted during
middleware rollback from CS-4667 to CS-3796
________________________________
[tickets:#657]<http://sourceforge.net/p/opensaf/tickets/657/> SMF component got
faulted during middleware rollback from CS-4667 to CS-3796
Status: unassigned
Created: Fri Dec 13, 2013 11:59 AM UTC by Hrishikesh
Last Updated: Fri Dec 13, 2013 11:59 AM UTC
Owner: nobody
Changesets: OpenSAF 4.2 CS-3796, OpenSAF 4.4 CS-4667
Setup: Cluster with 4 nodes of SUSE 64-bit
Scenario: A middleware upgrade was done between the above versions of OpenSAF.
The upgrade from 6.2SP2 to 4.4 was successful (done in the order
PL-4, PL-3, SC-2, SC-1). Rollback was triggered, SC-1 was rolled back
successfully, and OpenSAF was started.
Then smfd (SC-2) triggered a switchover to make SC-1 active.
Then "amf_active_state_handler oi got FAILED to activate" and the SMF component
faulted on SC-1, and a failover happened during the switchover. Thus the
rollback failed.
Trace logs are attached.
Snippet of syslog from the node (SC-1) that was rolled back to 6.2SP2:
Dec 13 12:19:52 SLES2-1 osafimmnd[2888]: Ccb 14 COMMITTED (SMFSERVICE)
Dec 13 12:19:52 SLES2-1 osafamfd[2936]: Cold sync complete!
Dec 13 12:19:58 SLES2-1 osafsmfd[3028]: amf_active_state_handler oi activate FAILED
Dec 13 12:19:58 SLES2-1 osafamfnd[2946]: 'safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'csiSetcallbackFailed(12)' : Recovery is 'nodeFailfast(6)'
Dec 13 12:19:58 SLES2-1 osafamfnd[2946]: safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:csiSetcallbackFailed(12) Recovery is:nodeFailfast(6)
Dec 13 12:19:58 SLES2-1 osafamfnd[2946]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast
Dec 13 12:19:58 SLES2-1 opensaf_reboot: Rebooting local node
===========
Syslog from the initially active node (when the rollback started), SC-2:
Dec 13 12:19:56 SLES2-2 osafsmfd[2331]: NO STEP: Rolling back node reboot step completed safSmfStep=0004,safSmfProc=OpenSAF-upgrade,safSmfCampaign=campaign,safApp=safSmfService
Dec 13 12:19:56 SLES2-2 osafsmfd[2331]: NO PROC: Step safSmfStep=0003 needs switchover, let other controller take over
Dec 13 12:19:56 SLES2-2 osafamfd[2290]: NO safSi=SC-2N,safApp=OpenSAF Swap initiated
Dec 13 12:19:56 SLES2-2 osafamfnd[2300]: NO Assigning 'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Dec 13 12:19:56 SLES2-2 osafsmfd[2331]: ER amf_quiesced_state_handler oi deactivate FAILED
Dec 13 12:19:56 SLES2-2 osafimmnd[2235]: NO Implementer disconnected 26 <285, 2020f> (safMsgGrpService)
Dec 13 12:19:56 SLES2-2 osafimmnd[2235]: NO Implementer disconnected 31 <3, 2020f> (safLogService)
Dec 13 12:19:56 SLES2-2 osafimmnd[2235]: NO Implementer disconnected 30 <298, 2020f> (safEvtService)
Dec 13 12:19:57 SLES2-2 osafimmnd[2235]: NO Implementer disconnected 28 <299, 2020f> (safLckService)
Dec 13 12:19:57 SLES2-2 osafimmnd[2235]: NO Implementer disconnected 27 <300, 2020f> (safCheckPointService)
Dec 13 12:19:57 SLES2-2 osafimmnd[2235]: NO Implementer disconnected 29 <6, 2020f> (safClmService)
Dec 13 12:19:58 SLES2-2 osafamfnd[2300]: NO Assigned 'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Dec 13 12:19:58 SLES2-2 osafimmnd[2235]: NO Implementer connected: 47 (safMsgGrpService) <0, 2010f>
Dec 13 12:19:58 SLES2-2 osafimmnd[2235]: NO Implementer disconnected 44 <0, 2010f> (@safLogService)
Dec 13 12:19:58 SLES2-2 osafimmnd[2235]: NO Implementer connected: 48 (safCheckPointService) <0, 2010f>
Dec 13 12:19:58 SLES2-2 osafimmnd[2235]: NO Implementer connected: 49 (safLckService) <0, 2010f>
Dec 13 12:19:58 SLES2-2 osafimmnd[2235]: NO Implementer connected: 50 (safClmService) <0, 2010f>
Dec 13 12:19:58 SLES2-2 osafimmnd[2235]: NO Implementer connected: 51 (safEvtService) <0, 2010f>
Dec 13 12:19:58 SLES2-2 osafimmnd[2235]: NO Implementer connected: 52 (safLogService) <0, 2010f>
Dec 13 12:20:03 SLES2-2 osaffmd[2215]: NO Role: ACTIVE, Node Down for node id: 2010f
Dec 13 12:20:03 SLES2-2 osaffmd[2215]: NO Failover occurred in the middle of switchover
Dec 13 12:20:03 SLES2-2 osaffmd[2215]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131599, SupervisionTime = 60
Dec 13 12:20:03 SLES2-2 kernel: [ 525.408031] TIPC: Resetting link <1.1.2:eth0-1.1.1:eth0>, peer not responding
Dec 13 12:20:03 SLES2-2 kernel: [ 525.408035] TIPC: Lost link <1.1.2:eth0-1.1.1:eth0> on network plane A
Dec 13 12:20:03 SLES2-2 kernel: [ 525.408038] TIPC: Lost contact with <1.1.1>
Dec 13 12:20:03 SLES2-2 osafimmd[2225]: WA IMMND DOWN on active controller f1 detected at standby immd!! f2. Possible failover
Dec 13 12:20:03 SLES2-2 osafamfd[2290]: safAmfNode=SC-1,safAmfCluster=myAmfCluster OperState ENABLED => DISABLED
Dec 13 12:20:03 SLES2-2 osafamfd[2290]: WA State change notification lost for 'safAmfNode=SC-1,safAmfCluster=myAmfCluster'
Dec 13 12:20:03 SLES2-2 osafamfd[2290]: NO Node 'SC-1' left the cluster
Dec 13 12:20:03 SLES2-2 osafamfd[2290]: safSu=SC-1,safSg=NoRed,safApp=OpenSAF OperState ENABLED => DISABLED
Dec 13 12:20:03 SLES2-2 osafamfd[2290]: WA State change notification lost for 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF'
==============
________________________________
Sent from sourceforge.net because
[email protected]<mailto:[email protected]>
is subscribed to https://sourceforge.net/p/opensaf/tickets/
To unsubscribe from further messages, a project admin can change settings at
https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a
mailing list, you can unsubscribe from the mailing list.
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets