- **summary**: smf: two hours is spent on step undoing state --> smf: step undoing is in progress forever until cluster reset
- Attachments have changed:
Diff:
~~~~
--- old
+++ new
@@ -1 +1,2 @@
+1353.tgz (475.2 kB; application/octet-stream)
messages_step_undo (111.1 kB; application/octet-stream)
~~~~
- **Comment**:
**Steps to reproduce:**
1) Run the middleware upgrade campaign (5.0->5.1).
2) On the node being upgraded (SC-2), the new RPMs (5.1) are left empty, so the node comes up without an OpenSAF installation.
**Observations:**
1) Node SC-2 rebooted for the upgrade.
2) Because SC-2 did not rejoin the cluster within 10 minutes, SMF initiated step undoing, starting with the rollback of the node reboot step.
Below is the snippet:
Sep 12 13:13:30 SLES-M-SLOT-1 osafamfd[2528]: NO Node 'SC-2' left the cluster
Sep 12 13:13:33 SLES-M-SLOT-1 osaffmd[2467]: NO Node Down event for node id 2020f:
Sep 12 13:13:33 SLES-M-SLOT-1 osaffmd[2467]: NO Current role: ACTIVE
Sep 12 13:13:33 SLES-M-SLOT-1 osaffmd[2467]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Received Node Down for peer controller, OwnNodeId = 131343, SupervisionTime = 60
Sep 12 13:23:38 SLES-M-SLOT-1 osafsmfd[2583]: NO SmfUpgradeStep::nodeReboot: the following nodes has not been correctly rebooted
Sep 12 13:23:38 SLES-M-SLOT-1 osafsmfd[2583]: NO Node safAmfNode=SC-2,safAmfCluster=myAmfCluster
Sep 12 13:23:38 SLES-M-SLOT-1 osafsmfd[2583]: ER Fails to reboot node safAmfNode=SC-2,safAmfCluster=myAmfCluster
Sep 12 13:23:38 SLES-M-SLOT-1 osafsmfd[2583]: ER Step execution failed, Try undoing the step
Sep 12 13:23:38 SLES-M-SLOT-1 osafsmfd[2583]: NO SmfStepStateUndoing::execute start undoing step.
Sep 12 13:23:38 SLES-M-SLOT-1 osafsmfd[2583]: NO STEP: Rolling back node reboot step safSmfStep=0002,safSmfProc=OpenSAF-upgrade,safSmfCampaign=UpgradeCampaign_7.0_7.1,safApp=safSmfService
3) Step undoing remains in progress indefinitely, until the cluster is reset.
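As a cross-check of the 10-minute figure in observation 2, the time between the "Node 'SC-2' left the cluster" entry (13:13:30) and the "Fails to reboot node" entry (13:23:38) can be computed from the syslog timestamps. The following Python sketch is illustrative only. Syslog stamps omit the year, so a placeholder year is assumed; that does not change differences between stamps from the same year.

```python
from datetime import datetime

# Assumption: syslog stamps carry no year, so a placeholder year is used.
# It does not affect differences between two stamps from the same year.
PLACEHOLDER_YEAR = 2000


def parse_syslog_stamp(stamp: str) -> datetime:
    """Parse a syslog-style stamp such as 'Sep 12 13:13:30'."""
    return datetime.strptime(f"{PLACEHOLDER_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def elapsed_seconds(start: str, end: str) -> int:
    """Whole seconds between two syslog stamps from the same year."""
    delta = parse_syslog_stamp(end) - parse_syslog_stamp(start)
    return int(delta.total_seconds())


# SC-2 left the cluster, then SMF reported that the node failed to reboot.
reboot_wait = elapsed_seconds("Sep 12 13:13:30", "Sep 12 13:23:38")
print(reboot_wait, divmod(reboot_wait, 60))  # 608 (10, 8)
```

The result is 608 s (10 min 8 s), which is consistent with SMF giving up on the reboot after roughly 10 minutes and then starting the undo.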
Attachments:
1) Syslog and SMF traces of both controllers.
---
**[tickets:#1353] smf: step undoing is in progress forever until cluster reset**
**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Apr 28, 2015 01:33 PM UTC by Neelakanta Reddy
**Last Updated:** Wed May 04, 2016 07:25 PM UTC
**Owner:** nobody
**Attachments:**
- [1353.tgz](https://sourceforge.net/p/opensaf/tickets/1353/attachment/1353.tgz) (475.2 kB; application/octet-stream)
- [messages_step_undo](https://sourceforge.net/p/opensaf/tickets/1353/attachment/messages_step_undo) (111.1 kB; application/octet-stream)
Test description:
1. A rolling middleware upgrade campaign (4.5->4.6) is run.
2. On one of the nodes being upgraded (PL-4), the new RPMs (4.6) are left empty, so the node comes up without an OpenSAF installation.
3. The step rollback takes approximately two hours before the campaign is reported as EXECUTION_FAILED.
4. Attaching the syslog of SC-1:
Apr 24 18:36:55 SLES1 osafamfd[2289]: NO Node 'PL-4' left the cluster
Apr 24 18:36:55 SLES1 osafimmnd[2237]: NO Implementer connected: 47 (MsgQueueService132111) <2280, 2010f>
Apr 24 18:36:55 SLES1 osafimmnd[2237]: NO Implementer locally disconnected. Marking it as doomed 47 <2280, 2010f> (MsgQueueService132111)
Apr 24 18:36:55 SLES1 osafimmnd[2237]: NO Implementer disconnected 47 <2280, 2010f> (MsgQueueService132111)
Apr 24 18:36:58 SLES1 kernel: [ 172.812065] TIPC: Resetting link <1.1.1:eth0-1.1.4:eth0>, peer not responding
Apr 24 18:36:58 SLES1 kernel: [ 172.812071] TIPC: Lost link <1.1.1:eth0-1.1.4:eth0> on network plane A
Apr 24 18:36:58 SLES1 kernel: [ 172.812075] TIPC: Lost contact with <1.1.4>
Apr 24 18:37:15 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
Apr 24 18:37:36 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
-------------------
--------------
----------------------
Apr 24 20:36:00 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
Apr 24 20:36:22 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
Apr 24 20:36:44 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
Apr 24 20:37:06 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
Apr 24 20:37:28 SLES1 osafsmfd[2318]: NO Failed to get node dest for clm node safNode=PL-4,safCluster=myClmCluster
Apr 24 20:37:28 SLES1 osafsmfd[2318]: NO no node destination found whitin time limit for node safAmfNode=PL-4,safAmfCluster=myAmfCluster
Apr 24 20:37:28 SLES1 osafsmfd[2318]: NO no node destination found for node safAmfNode=PL-4,safAmfCluster=myAmfCluster
Apr 24 20:37:28 SLES1 osafsmfd[2318]: ER Failed to online install old bundles
Apr 24 20:37:28 SLES1 osafsmfd[2318]: ER Step undoing failed
Apr 24 20:37:28 SLES1 osafsmfd[2318]: NO Step safSmfStep=0004 in procedure safSmfProc=OpenSAF-upgrade failed, step result 5
Apr 24 20:37:28 SLES1 osafsmfd[2318]: NO CAMP: Procedure safSmfProc=OpenSAF-upgrade returned FAILED
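The SC-1 syslog suggests the following behavior during the undo. SMF repeatedly fails to look up PL-4's node destination, about every 22 s. It gives up only when a time limit expires, and here that took about two hours.

The following is a minimal Python sketch of such a deadline-bounded poll. It is not OpenSAF's actual implementation. The names `wait_for_node_dest`, `get_node_dest` and `FakeClock` are hypothetical. The clock and sleep functions are injected so the loop can be exercised without real waiting, and the 600 s limit is an assumed value for illustration only.

```python
from typing import Callable, List, Optional


def wait_for_node_dest(
    get_node_dest: Callable[[], Optional[str]],  # hypothetical lookup; None = not found
    time_limit_s: float,
    poll_interval_s: float,
    now: Callable[[], float],
    sleep: Callable[[float], None],
) -> Optional[str]:
    """Poll the lookup until it succeeds or the time limit expires.

    Returns the destination, or None on timeout.
    """
    deadline = now() + time_limit_s
    while True:
        dest = get_node_dest()
        if dest is not None:
            return dest
        remaining = deadline - now()
        if remaining <= 0:
            # Analogous to the "no node destination found ... time limit" log entry.
            return None
        # Never sleep past the deadline.
        sleep(min(poll_interval_s, remaining))


class FakeClock:
    """Simulated clock so the loop runs instantly in a test."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


# A node that never comes back, polled every 22 s (the interval seen in the
# log) against an assumed 600 s limit.
clock = FakeClock()
attempts: List[float] = []


def never_found() -> Optional[str]:
    attempts.append(clock.now())
    return None


result = wait_for_node_dest(never_found, 600, 22, clock.now, clock.sleep)
print(result, len(attempts), clock.now())  # None 29 600.0
```

To run against real time, pass `time.monotonic` and `time.sleep` as `now` and `sleep`. The log shows lookups roughly every 22 s from 18:37:15 to 20:37:28. That pattern is consistent with a loop of this shape running against a limit of about two hours, which would explain why the campaign took that long to reach EXECUTION_FAILED.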