Agreed. "Suggestion 1", documenting that the admin needs to perform a CLM admin
lock of the standby, is a good suggestion. The node will then no longer be a
member of the cluster and will not be affected by remote fencing.
---
** [tickets:#2094] Standby controller goes for reboot on stopping openSaf with
STONITH enabled cluster**
**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Wed Oct 05, 2016 07:28 AM UTC by Chani Srivastava
**Last Updated:** Tue Nov 08, 2016 11:49 AM UTC
**Owner:** nobody
OS : Ubuntu 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 2-node cluster (both controllers) Remote fencing enabled
Steps:
1. Bring up OpenSAF on two nodes
2. Enable STONITH
3. Stop OpenSAF on the standby
Observed: the active controller triggers a reboot of the standby.
SC-1 syslog:
Oct 5 13:01:23 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:4,
dest:565215202263055)
Oct 5 13:01:23 SC-1 osafimmnd[5545]: NO Global discard node received for
nodeId:2020f pid:3579
Oct 5 13:01:23 SC-1 osafimmnd[5545]: NO Implementer disconnected 14 <0,
2020f(down)> (@safAmfService2020f)
Oct 5 13:01:24 SC-1 osafamfd[5592]: **NO Node 'SC-2' left the cluster**
Oct 5 13:01:24 SC-1 osaffmd[5526]: NO Node Down event for node id 2020f:
Oct 5 13:01:24 SC-1 osaffmd[5526]: NO Current role: ACTIVE
Oct 5 13:01:24 SC-1 osaffmd[5526]: **Rebooting OpenSAF NodeId = 131599 EE Name
= SC-2, Reason: Received Node Down for peer controller, OwnNodeId = 131343,
SupervisionTime = 60
Oct 5 13:01:25 SC-1 external/libvirt[5893]: [5906]: notice: Domain SC-2 was
stopped**
Oct 5 13:01:27 SC-1 kernel: [ 5355.132093] tipc: Resetting link
<1.1.1:eth0-1.1.2:eth0>, peer not responding
Oct 5 13:01:27 SC-1 kernel: [ 5355.132123] tipc: Lost link
<1.1.1:eth0-1.1.2:eth0> on network plane A
Oct 5 13:01:27 SC-1 kernel: [ 5355.132126] tipc: Lost contact with <1.1.2>
Oct 5 13:01:27 SC-1 external/libvirt[5893]: [5915]: notice: Domain SC-2 was
started
Oct 5 13:01:42 SC-1 kernel: [ 5370.557180] tipc: Established link
<1.1.1:eth0-1.1.2:eth0> on network plane A
Oct 5 13:01:42 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:3,
dest:565217457979407)
Oct 5 13:01:42 SC-1 osafimmd[5535]: NO New IMMND process is on STANDBY
Controller at 2020f
Oct 5 13:01:42 SC-1 osafimmd[5535]: WA IMMND on controller (not currently
coord) requests sync
Oct 5 13:01:42 SC-1 osafimmd[5535]: NO Node 2020f request sync sync-pid:1176
epoch:0
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Announce sync, epoch:4
Oct 5 13:01:43 SC-1 osafimmd[5535]: NO Successfully announced sync. New ruling
epoch:4
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_READY -->
IMM_SERVER_SYNC_SERVER
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Oct 5 13:01:43 SC-1 osafimmloadd: NO Sync starting
Oct 5 13:01:43 SC-1 osafimmloadd: IN Synced 346 objects in total
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE
18430
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Epoch set to 4 in ImmModel
Oct 5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at
node 2010f old epoch: 3 new epoch:4
Oct 5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at
node 2020f old epoch: 0 new epoch:4
Oct 5 13:01:43 SC-1 osafimmloadd: NO Sync ending normally
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER
--> IMM_SERVER_READY
Oct 5 13:01:43 SC-1 osafamfd[5592]: NO Received node_up from 2020f: msg_id 1
Oct 5 13:01:43 SC-1 osafamfd[5592]: NO Node 'SC-2' joined the cluster
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer connected: 16
(MsgQueueService131599) <467, 2010f>
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer locally disconnected.
Marking it as doomed 16 <467, 2010f> (MsgQueueService131599)
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer disconnected 16 <467,
2010f> (MsgQueueService131599)
Oct 5 13:01:44 SC-1 osafrded[5518]: NO Peer up on node 0x2020f
Oct 5 13:01:44 SC-1 osaffmd[5526]: NO clm init OK
Oct 5 13:01:44 SC-1 osafimmd[5535]: NO MDS event from svc_id 24 (change:5,
dest:13)
Oct 5 13:01:44 SC-1 osaffmd[5526]: NO Peer clm node name: SC-2
Oct 5 13:01:44 SC-1 osafrded[5518]: NO Got peer info request from node 0x2020f
with role STANDBY
Oct 5 13:01:44 SC-1 osafrded[5518]: NO Got peer info response from node
0x2020f with role STANDBY
---