Can you reproduce this?
If so, please activate tracing in SMF and attach the trace logs so we can see 
in more detail what failed.
To activate tracing, run "pkill -USR2 osafsmfd" on both SC nodes before you 
start the rollback.
The trace logs are written to /var/log/opensaf/osafsmfd on both SCs.
Please also include the complete syslogs from both SCs.
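
The steps above could be scripted roughly as follows. This is only a sketch 
under assumptions: the pkill command and the trace path are the ones quoted 
above, while the syslog path /var/log/messages, the archive name and the 
function names are hypothetical and may differ on your nodes. Run "trace" on 
both SCs before the rollback and "collect" on both SCs afterwards (reading the 
logs may need root).

```python
import subprocess
import sys
import tarfile
from pathlib import Path

SMF_TRACE = Path("/var/log/opensaf/osafsmfd")  # trace location quoted above
SYSLOG = Path("/var/log/messages")             # assumption: syslog path on the SCs


def trace_command():
    """Return the command quoted above that activates osafsmfd tracing."""
    return ["pkill", "-USR2", "osafsmfd"]


def collect_logs(archive, paths=(SMF_TRACE, SYSLOG)):
    """Bundle the log paths that exist into a gzip tarball.

    Returns the names that were actually added, so a missing log is visible.
    """
    added = []
    with tarfile.open(archive, "w:gz") as tar:
        for path in map(Path, paths):
            if path.exists():
                tar.add(path, arcname=path.name)
                added.append(path.name)
    return added


if __name__ == "__main__":
    if sys.argv[1:] == ["trace"]:
        # pkill exits non-zero if osafsmfd is not running; do not raise on that.
        subprocess.run(trace_command(), check=False)
    elif sys.argv[1:] == ["collect"]:
        print(collect_logs("smf-logs.tar.gz"))
    else:
        print("usage: smf_logs.py trace|collect")
```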

/Bertil

From: Hrishikesh [mailto:[email protected]]
Sent: 16 December 2013 06:51
To: [opensaf:tickets]
Subject: [opensaf:tickets] #657 SMF component got faulted during middleware 
rollback from CS-4667 to CS-3796


  *   Description has changed:

Diff:

--- old
+++ new
@@ -3,13 +3,13 @@
 SetUp: Cluster with 4nodes of SUSE 64bit
 Scenario: Middleware upgrade was done between the above versions of opensaf.

-Upgrade from 6.2SP2 to 4.4 was successful(upgrade was done in order of PL-4,PL-3,SC-2,SC-1). Rollback was triggered and SC-1 was rolledback successfully and opensaf was started.
+Upgrade from Opensaf changeset 3796 to Changeset 4667 was successful(upgrade was done in order of PL-4,PL-3,SC-2,SC-1). Rollback was triggered and SC-1 was rolledback successfully and opensaf was started.
 Then smfd(SC-2) triggered switchover to make SC-1 active.
 Then "amf_active_state_handler oi got FAILED to activate"  and SMF comp got faulted on SC-1 and failover happened during switchover. Thus rollback got failed.

 Trace Logs are attached.

-Snippet of syslog from node (SC-1) which was rolled back to 6.2SP2:
+Snippet of syslog from node (SC-1) which was rolled back to Changeset 3796:
 ===========
 Dec 13 12:19:52 SLES2-1 osafimmnd[2888]: Ccb 14 COMMITTED (SMFSERVICE)
 Dec 13 12:19:52 SLES2-1 osafamfd[2936]: Cold sync complete!

________________________________

[tickets:#657]<http://sourceforge.net/p/opensaf/tickets/657/> SMF component got 
faulted during middleware rollback from CS-4667 to CS-3796

Status: unassigned
Created: Fri Dec 13, 2013 11:59 AM UTC by Hrishikesh
Last Updated: Fri Dec 13, 2013 11:59 AM UTC
Owner: nobody

Changesets: OpenSAF 4.2 is CS-3796, OpenSAF 4.4 is CS-4667

Setup: cluster with 4 nodes running 64-bit SUSE
Scenario: a middleware upgrade was done between the above versions of OpenSAF.

The upgrade from OpenSAF changeset 3796 to changeset 4667 was successful (nodes 
were upgraded in the order PL-4, PL-3, SC-2, SC-1). A rollback was then 
triggered; SC-1 was rolled back successfully and OpenSAF was started on it.
smfd on SC-2 then triggered a switchover to make SC-1 active.
During the switchover, osafsmfd on SC-1 logged "amf_active_state_handler oi 
activate FAILED", the SMF component on SC-1 was faulted, and a failover 
occurred. The rollback therefore failed.

Trace logs are attached.

Snippet of syslog from the node (SC-1) that was rolled back to changeset 3796:
===========
Dec 13 12:19:52 SLES2-1 osafimmnd[2888]: Ccb 14 COMMITTED (SMFSERVICE)
Dec 13 12:19:52 SLES2-1 osafamfd[2936]: Cold sync complete!

Dec 13 12:19:58 SLES2-1 osafsmfd[3028]: amf_active_state_handler oi
activate FAILED
Dec 13 12:19:58 SLES2-1 osafamfnd[2946]:
'safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to
'csiSetcallbackFailed(12)' : Recovery is 'nodeFailfast(6)'
Dec 13 12:19:58 SLES2-1 osafamfnd[2946]:
safComp=SMF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due
to:csiSetcallbackFailed(12) Recovery is:nodeFailfast(6)
Dec 13 12:19:58 SLES2-1 osafamfnd[2946]: Rebooting OpenSAF NodeId =
131343 EE Name = , Reason: Component faulted: recovery is node failfast
Dec 13 12:19:58 SLES2-1 opensaf_reboot: Rebooting local node
===========

Syslog from the initially active node (active when the rollback started), SC-2:
=======================
Dec 13 12:19:56 SLES2-2 osafsmfd[2331]: NO STEP: Rolling back node
reboot step completed
safSmfStep=0004,safSmfProc=OpenSAF-
upgrade,safSmfCampaign=campaign,safApp=safSmfService
Dec 13 12:19:56 SLES2-2 osafsmfd[2331]: NO PROC: Step safSmfStep=0003
needs switchover, let other controller take over
Dec 13 12:19:56 SLES2-2 osafamfd[2290]: NO safSi=SC-2N,safApp=OpenSAF
Swap initiated
Dec 13 12:19:56 SLES2-2 osafamfnd[2300]: NO Assigning
'safSi=SC-2N,safApp=OpenSAF' QUIESCED to
'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Dec 13 12:19:56 SLES2-2 osafsmfd[2331]: ER amf_quiesced_state_handler oi
deactivate FAILED
Dec 13 12:19:56 SLES2-2 osafimmnd[2235]: NO Implementer disconnected 26
<285, 2020f> (safMsgGrpService)
Dec 13 12:19:56 SLES2-2 osafimmnd[2235]: NO Implementer disconnected 31
<3, 2020f> (safLogService)
Dec 13 12:19:56 SLES2-2 osafimmnd[2235]: NO Implementer disconnected 30
<298, 2020f> (safEvtService)
Dec 13 12:19:57 SLES2-2 osafimmnd[2235]: NO Implementer disconnected 28
<299, 2020f> (safLckService)
Dec 13 12:19:57 SLES2-2 osafimmnd[2235]: NO Implementer disconnected 27
<300, 2020f> (safCheckPointService)
Dec 13 12:19:57 SLES2-2 osafimmnd[2235]: NO Implementer disconnected 29
<6, 2020f> (safClmService)
Dec 13 12:19:58 SLES2-2 osafamfnd[2300]: NO Assigned
'safSi=SC-2N,safApp=OpenSAF' QUIESCED to
'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Dec 13 12:19:58 SLES2-2 osafimmnd[2235]: NO Implementer connected: 47
(safMsgGrpService) <0, 2010f>
Dec 13 12:19:58 SLES2-2 osafimmnd[2235]: NO Implementer disconnected 44
<0, 2010f> (@safLogService)
Dec 13 12:19:58 SLES2-2 osafimmnd[2235]: NO Implementer connected: 48
(safCheckPointService) <0, 2010f>
Dec 13 12:19:58 SLES2-2 osafimmnd[2235]: NO Implementer connected: 49
(safLckService) <0, 2010f>
Dec 13 12:19:58 SLES2-2 osafimmnd[2235]: NO Implementer connected: 50
(safClmService) <0, 2010f>
Dec 13 12:19:58 SLES2-2 osafimmnd[2235]: NO Implementer connected: 51
(safEvtService) <0, 2010f>
Dec 13 12:19:58 SLES2-2 osafimmnd[2235]: NO Implementer connected: 52
(safLogService) <0, 2010f>
Dec 13 12:20:03 SLES2-2 osaffmd[2215]: NO Role: ACTIVE, Node Down for
node id: 2010f
Dec 13 12:20:03 SLES2-2 osaffmd[2215]: NO Failover occurred in the
middle of switchover
Dec 13 12:20:03 SLES2-2 osaffmd[2215]: Rebooting OpenSAF NodeId = 131343
EE Name = , Reason: Received Node Down for peer controller, OwnNodeId =
131599, SupervisionTime = 60
Dec 13 12:20:03 SLES2-2 kernel: [ 525.408031] TIPC: Resetting link
<1.1.2:eth0-1.1.1:eth0>, peer not responding
Dec 13 12:20:03 SLES2-2 kernel: [ 525.408035] TIPC: Lost link
<1.1.2:eth0-1.1.1:eth0> on network plane A
Dec 13 12:20:03 SLES2-2 kernel: [ 525.408038] TIPC: Lost contact with
<1.1.1>
Dec 13 12:20:03 SLES2-2 osafimmd[2225]: WA IMMND DOWN on active
controller f1 detected at standby immd!! f2. Possible failover
Dec 13 12:20:03 SLES2-2 osafamfd[2290]:
safAmfNode=SC-1,safAmfCluster=myAmfCluster OperState ENABLED =>
DISABLED
Dec 13 12:20:03 SLES2-2 osafamfd[2290]: WA State change notification
lost for 'safAmfNode=SC-1,safAmfCluster=myAmfCluster'
Dec 13 12:20:03 SLES2-2 osafamfd[2290]: NO Node 'SC-1' left the cluster
Dec 13 12:20:03 SLES2-2 osafamfd[2290]:
safSu=SC-1,safSg=NoRed,safApp=OpenSAF OperState ENABLED =>
DISABLED
Dec 13 12:20:03 SLES2-2 osafamfd[2290]: WA State change notification
lost for 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF'
==============
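
The syslog excerpts above are hard-wrapped by the mail client, so a single 
entry spans several lines. As a rough aid for reading them, here is a minimal 
Python sketch that re-joins wrapped entries and keeps those mentioning ER, 
FAILED or faulted. It assumes every entry starts with a "Mon DD HH:MM:SS" 
timestamp; the function names are hypothetical, and joining with a space may 
leave a stray space where a DN was wrapped in the middle of a token.

```python
import re

# A new syslog entry starts with a timestamp such as "Dec 13 12:19:58 ".
ENTRY_START = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")
# Markers seen in the excerpts above: ER severity, FAILED, faulted/Faulted.
SUSPICIOUS = re.compile(r" ER |FAILED|[Ff]aulted")


def unwrap(lines):
    """Join mail-wrapped continuation lines onto the entry they belong to."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or set(line) == {"="}:
            continue  # skip blank lines and "=====" separators
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def suspicious(entries):
    """Return the entries that contain an error or fault marker."""
    return [entry for entry in entries if SUSPICIOUS.search(entry)]


if __name__ == "__main__":
    sample = [
        "Dec 13 12:19:52 SLES2-1 osafamfd[2936]: Cold sync complete!",
        "Dec 13 12:19:58 SLES2-1 osafsmfd[3028]: amf_active_state_handler oi",
        "activate FAILED",
        "===========",
    ]
    for entry in suspicious(unwrap(sample)):
        print(entry)
```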

________________________________

Sent from sourceforge.net because you indicated interest in 
https://sourceforge.net/p/opensaf/tickets/657/

To unsubscribe from further messages, please visit 
https://sourceforge.net/auth/subscriptions/



---

** [tickets:#657] SMF component got faulted during middleware rollback from 
CS-4667 to CS-3796**

**Status:** unassigned
**Created:** Fri Dec 13, 2013 11:59 AM UTC by Hrishikesh
**Last Updated:** Mon Dec 16, 2013 05:50 AM UTC
**Owner:** nobody
