Ticket [#2160] will add support for differentiating between a hung node and a 
stopped node; no additional documentation will be needed.
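
To make the intended distinction concrete, here is a minimal sketch of the decision. It is not OpenSAF code and does not describe how #2160 is implemented: `PeerState`, `should_fence`, and the `stonith_enabled` flag are hypothetical names used only for illustration. The assumption is that a peer which stopped OpenSAF in an orderly way should not be fenced, while a hung peer should.

```python
from enum import Enum


class PeerState(Enum):
    """Hypothetical classification of why the peer controller went down."""
    STOPPED = "stopped"  # peer shut OpenSAF down in an orderly way
    HUNG = "hung"        # peer went silent without an orderly shutdown


def should_fence(peer_state: PeerState, stonith_enabled: bool) -> bool:
    """Return True only when STONITH is enabled and the peer is hung.

    Assumption for illustration: an orderly stop never warrants fencing.
    """
    return stonith_enabled and peer_state is PeerState.HUNG


print(should_fence(PeerState.STOPPED, stonith_enabled=True))  # False
print(should_fence(PeerState.HUNG, stonith_enabled=True))     # True
```

Under this assumption, the scenario reported in #2094 (OpenSAF stopped on the standby) would fall into the `STOPPED` case and would not trigger a reboot.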


---

**[tickets:#2094] Standby controller goes for reboot on stopping openSaf with STONITH enabled cluster**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Wed Oct 05, 2016 07:28 AM UTC by Chani Srivastava
**Last Updated:** Wed Nov 02, 2016 11:08 AM UTC
**Owner:** nobody


OS : Ubuntu 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 2-node cluster (both controllers), remote fencing enabled

Steps:
1. Bring up OpenSAF on two nodes
2. Enable STONITH
3. Stop OpenSAF on the standby

Result: the active controller triggers a reboot of the standby.

SC-1 Syslog

Oct  5 13:01:23 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:4, 
dest:565215202263055)
Oct  5 13:01:23 SC-1 osafimmnd[5545]: NO Global discard node received for 
nodeId:2020f pid:3579
Oct  5 13:01:23 SC-1 osafimmnd[5545]: NO Implementer disconnected 14 <0, 
2020f(down)> (@safAmfService2020f)
Oct  5 13:01:24 SC-1 osafamfd[5592]: **NO Node 'SC-2' left the cluster**
Oct  5 13:01:24 SC-1 osaffmd[5526]: NO Node Down event for node id 2020f:
Oct  5 13:01:24 SC-1 osaffmd[5526]: NO Current role: ACTIVE
Oct  5 13:01:24 SC-1 osaffmd[5526]: **Rebooting OpenSAF NodeId = 131599 EE Name 
= SC-2, Reason: Received Node Down for peer controller, OwnNodeId = 131343, 
SupervisionTime = 60
Oct  5 13:01:25 SC-1 external/libvirt[5893]: [5906]: notice: Domain SC-2 was 
stopped**
Oct  5 13:01:27 SC-1 kernel: [ 5355.132093] tipc: Resetting link 
<1.1.1:eth0-1.1.2:eth0>, peer not responding
Oct  5 13:01:27 SC-1 kernel: [ 5355.132123] tipc: Lost link 
<1.1.1:eth0-1.1.2:eth0> on network plane A
Oct  5 13:01:27 SC-1 kernel: [ 5355.132126] tipc: Lost contact with <1.1.2>
Oct  5 13:01:27 SC-1 external/libvirt[5893]: [5915]: notice: Domain SC-2 was 
started
Oct  5 13:01:42 SC-1 kernel: [ 5370.557180] tipc: Established link 
<1.1.1:eth0-1.1.2:eth0> on network plane A
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:3, 
dest:565217457979407)
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO New IMMND process is on STANDBY 
Controller at 2020f
Oct  5 13:01:42 SC-1 osafimmd[5535]: WA IMMND on controller (not currently 
coord) requests sync
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO Node 2020f request sync sync-pid:1176 
epoch:0
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Announce sync, epoch:4
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO Successfully announced sync. New ruling 
epoch:4
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_READY --> 
IMM_SERVER_SYNC_SERVER
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Oct  5 13:01:43 SC-1 osafimmloadd: NO Sync starting
Oct  5 13:01:43 SC-1 osafimmloadd: IN Synced 346 objects in total
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 
18430
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Epoch set to 4 in ImmModel
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at 
node 2010f old epoch: 3  new epoch:4
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at 
node 2020f old epoch: 0  new epoch:4
Oct  5 13:01:43 SC-1 osafimmloadd: NO Sync ending normally
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER 
--> IMM_SERVER_READY
Oct  5 13:01:43 SC-1 osafamfd[5592]: NO Received node_up from 2020f: msg_id 1
Oct  5 13:01:43 SC-1 osafamfd[5592]: NO Node 'SC-2' joined the cluster
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer connected: 16 
(MsgQueueService131599) <467, 2010f>
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer locally disconnected. 
Marking it as doomed 16 <467, 2010f> (MsgQueueService131599)
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer disconnected 16 <467, 
2010f> (MsgQueueService131599)
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Peer up on node 0x2020f
Oct  5 13:01:44 SC-1 osaffmd[5526]: NO clm init OK
Oct  5 13:01:44 SC-1 osafimmd[5535]: NO MDS event from svc_id 24 (change:5, 
dest:13)
Oct  5 13:01:44 SC-1 osaffmd[5526]: NO Peer clm node name: SC-2
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Got peer info request from node 0x2020f 
with role STANDBY
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Got peer info response from node 
0x2020f with role STANDBY
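
A note for reading the log above: osaffmd prints node IDs in decimal (131599, 131343), while the other daemons print them in hex (2020f, 2010f). The short Python check below is only a reading aid, and the helper name `to_log_hex` is made up for this example. It confirms that the rebooted node 131599 is the same node as 2020f (SC-2), and that OwnNodeId 131343 is 2010f.

```python
# Decimal node IDs copied from the osaffmd "Rebooting OpenSAF" line above.
REBOOTED_NODE_ID = 131599  # "NodeId = 131599 EE Name = SC-2"
OWN_NODE_ID = 131343       # "OwnNodeId = 131343"


def to_log_hex(node_id: int) -> str:
    """Format a decimal node ID as lowercase hex without a 0x prefix."""
    return format(node_id, "x")


print(to_log_hex(REBOOTED_NODE_ID))  # 2020f
print(to_log_hex(OWN_NODE_ID))       # 2010f
```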



