- **summary**: smf: two hours is spent on step undoing state --> smf: step undoing is in progress forever until cluster reset
- Attachments have changed:

Diff:

~~~~

--- old
+++ new
@@ -1 +1,2 @@
+1353.tgz (475.2 kB; application/octet-stream)
 messages_step_undo (111.1 kB; application/octet-stream)

~~~~

- **Comment**:

**Steps to reproduce:**

1) Execute a middleware upgrade (5.0 -> 5.1) by running the upgrade campaign.
2) On the node being upgraded (SC-2), keep the new RPMs (5.1) empty so that the node comes up without an OpenSAF installation.

**Observations:**

1) Node SC-2 rebooted for the upgrade.
2) Because SC-2 did not rejoin the cluster within 10 minutes, SMF initiated step undoing and began rolling back the node reboot step.

Below is the snippet:

Sep 12 13:13:30 SLES-M-SLOT-1 osafamfd[2528]: NO Node 'SC-2' left the cluster
Sep 12 13:13:33 SLES-M-SLOT-1 osaffmd[2467]: NO Node Down event for node id 2020f:
Sep 12 13:13:33 SLES-M-SLOT-1 osaffmd[2467]: NO Current role: ACTIVE
Sep 12 13:13:33 SLES-M-SLOT-1 osaffmd[2467]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131343, SupervisionTime = 60

Sep 12 13:23:38 SLES-M-SLOT-1 osafsmfd[2583]: NO SmfUpgradeStep::nodeReboot: the following nodes has not been correctly rebooted
Sep 12 13:23:38 SLES-M-SLOT-1 osafsmfd[2583]: NO Node safAmfNode=SC-2,safAmfCluster=myAmfCluster
Sep 12 13:23:38 SLES-M-SLOT-1 osafsmfd[2583]: ER Fails to reboot node safAmfNode=SC-2,safAmfCluster=myAmfCluster
Sep 12 13:23:38 SLES-M-SLOT-1 osafsmfd[2583]: ER Step execution failed, Try undoing the step

Sep 12 13:23:38 SLES-M-SLOT-1 osafsmfd[2583]: NO SmfStepStateUndoing::execute start undoing step.
Sep 12 13:23:38 SLES-M-SLOT-1 osafsmfd[2583]: NO STEP: Rolling back node reboot step safSmfStep=0002,safSmfProc=OpenSAF-upgrade,safSmfCampaign=UpgradeCampaign_7.0_7.1,safApp=safSmfService


3) Step undoing is in progress forever until cluster reset.
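
The 10-minute wait in observation 2 can be cross-checked against the syslog timestamps quoted above. The following is a minimal Python sketch, not part of the ticket: the helper name `parse_syslog_time` and the placeholder year are assumptions introduced only for illustration, since syslog timestamps carry no year.

```python
from datetime import datetime

# Syslog timestamps omit the year, so an arbitrary placeholder year is
# prepended purely to make parsing unambiguous (assumption, not log data).
PLACEHOLDER_YEAR = "2000"


def parse_syslog_time(stamp: str) -> datetime:
    """Parse a syslog-style 'Mon DD HH:MM:SS' timestamp."""
    return datetime.strptime(f"{PLACEHOLDER_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


# Timestamps copied from the snippet above.
node_left = parse_syslog_time("Sep 12 13:13:30")      # Node 'SC-2' left the cluster
reboot_failed = parse_syslog_time("Sep 12 13:23:38")  # Fails to reboot node

elapsed = reboot_failed - node_left
print(elapsed)  # 0:10:08
```

The gap of just over ten minutes between the node leaving the cluster and SMF reporting the failed reboot is consistent with the 10-minute wait described in observation 2.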

Attachments:

1) Syslog and SMF traces from both controllers.



---

**[tickets:#1353] smf: step undoing is in progress forever until cluster reset**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Apr 28, 2015 01:33 PM UTC by Neelakanta Reddy
**Last Updated:** Wed May 04, 2016 07:25 PM UTC
**Owner:** nobody
**Attachments:**

- [1353.tgz](https://sourceforge.net/p/opensaf/tickets/1353/attachment/1353.tgz) (475.2 kB; application/octet-stream)
- [messages_step_undo](https://sourceforge.net/p/opensaf/tickets/1353/attachment/messages_step_undo) (111.1 kB; application/octet-stream)


Test description:
1. A rolling middleware upgrade (4.5 -> 4.6) campaign is run.
2. On one of the nodes being upgraded (PL-4), the new RPMs (4.6) are kept empty, so the node comes up without an OpenSAF installation.
3. The step rollback takes approximately two hours before the campaign is reported as EXECUTION_FAILED.
4. The syslog of SC-1 is attached.

Apr 24 18:36:55 SLES1 osafamfd[2289]: NO Node 'PL-4' left the cluster
Apr 24 18:36:55 SLES1 osafimmnd[2237]: NO Implementer connected: 47 (MsgQueueService132111) <2280, 2010f>
Apr 24 18:36:55 SLES1 osafimmnd[2237]: NO Implementer locally disconnected. Marking it as doomed 47 <2280, 2010f> (MsgQueueService132111)
Apr 24 18:36:55 SLES1 osafimmnd[2237]: NO Implementer disconnected 47 <2280, 2010f> (MsgQueueService132111)
Apr 24 18:36:58 SLES1 kernel: [  172.812065] TIPC: Resetting link <1.1.1:eth0-1.1.4:eth0>, peer not responding
Apr 24 18:36:58 SLES1 kernel: [  172.812071] TIPC: Lost link <1.1.1:eth0-1.1.4:eth0> on network plane A
Apr 24 18:36:58 SLES1 kernel: [  172.812075] TIPC: Lost contact with <1.1.4>
Apr 24 18:37:15 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
Apr 24 18:37:36 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster

-------------------
--------------
----------------------

Apr 24 20:36:00 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
Apr 24 20:36:22 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
Apr 24 20:36:44 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
Apr 24 20:37:06 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
Apr 24 20:37:28 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
Apr 24 20:37:28 SLES1 osafsmfd[2318]: NO no node destination found whitin time limit for node safAmfNode=PL-4,safAmfCluster=myAmfCluster
Apr 24 20:37:28 SLES1 osafsmfd[2318]: NO no node destination found for node safAmfNode=PL-4,safAmfCluster=myAmfCluster
Apr 24 20:37:28 SLES1 osafsmfd[2318]: ER Failed to online install old bundles
Apr 24 20:37:28 SLES1 osafsmfd[2318]: ER Step undoing failed
Apr 24 20:37:28 SLES1 osafsmfd[2318]: NO Step safSmfStep=0004 in procedure safSmfProc=OpenSAF-upgrade failed, step result 5
Apr 24 20:37:28 SLES1 osafsmfd[2318]: NO CAMP: Procedure safSmfProc=OpenSAF-upgrade returned FAILED
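
The reported two-hour rollback and the retry cadence can be estimated from the SC-1 syslog excerpt above. This is a minimal Python sketch, not part of the ticket: it embeds a few of the quoted lines, and the names `LOG_EXCERPT`, `LINE_RE`, `parse_line`, and the placeholder year are assumptions for illustration only.

```python
import re
from datetime import datetime

# A few lines copied from the SC-1 syslog excerpt above.
LOG_EXCERPT = """\
Apr 24 18:36:55 SLES1 osafamfd[2289]: NO Node 'PL-4' left the cluster
Apr 24 20:36:00 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
Apr 24 20:36:22 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
Apr 24 20:36:44 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
Apr 24 20:37:06 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
Apr 24 20:37:28 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
Apr 24 20:37:28 SLES1 osafsmfd[2318]: ER Step undoing failed
"""

# Matches "<Mon DD HH:MM:SS> <host> <process[pid]>: <message>".
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+\S+:\s+(.*)$")

# Syslog omits the year; an arbitrary placeholder is used for parsing only.
PLACEHOLDER_YEAR = "2000"


def parse_line(line):
    """Return (timestamp, message) for a syslog line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    when = datetime.strptime(f"{PLACEHOLDER_YEAR} {m.group(1)}", "%Y %b %d %H:%M:%S")
    return when, m.group(2)


entries = [e for e in map(parse_line, LOG_EXCERPT.splitlines()) if e is not None]

# Seconds between consecutive "Failed to get node dest" retries.
retries = [t for t, msg in entries if "Failed to get node dest" in msg]
retry_gaps = [int((b - a).total_seconds()) for a, b in zip(retries, retries[1:])]

# Time from the node leaving the cluster until step undoing is declared failed.
node_left = next(t for t, msg in entries if "left the cluster" in msg)
undo_failed = next(t for t, msg in entries if "Step undoing failed" in msg)
total = undo_failed - node_left

print(retry_gaps)  # [22, 22, 22, 22]
print(total)       # 2:00:33
```

The regular 22-second gaps and the total of roughly two hours suggest that SMF polls for the node destination at a fixed interval until a long time limit expires. The actual interval and limit used by osafsmfd are not stated in the ticket.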




