I think the procedure for stopping OpenSAF in a controlled way is to first lock
the node using CLM. The CLM lock admin operation removes the node from
cluster membership. Then it should be safe to stop OpenSAF on that node without
getting fenced, i.e. we should not fence a node that we lost contact with if
the node was not a member of the cluster.
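
To make the proposed rule concrete, here is a minimal sketch in Python. It is only a toy model of the idea above, not how osaffmd or CLM are actually implemented; the `Cluster` class and its `clm_lock` and `should_fence` methods are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of CLM membership as seen by the active controller."""
    members: set = field(default_factory=set)

    def clm_lock(self, node: str) -> None:
        # Models the effect of the CLM lock admin operation:
        # the node is removed from cluster membership.
        self.members.discard(node)

    def should_fence(self, lost_node: str) -> bool:
        # Proposed rule: fence a node we lost contact with only if it
        # was still a cluster member when contact was lost.
        return lost_node in self.members


cluster = Cluster(members={"SC-1", "SC-2"})

# Stopping OpenSAF on SC-2 without locking first: SC-2 is still a member,
# so losing contact with it would lead to fencing (the case in this ticket).
fence_without_lock = cluster.should_fence("SC-2")

# Controlled stop: lock SC-2 first, then losing contact is harmless.
cluster.clm_lock("SC-2")
fence_after_lock = cluster.should_fence("SC-2")

print(fence_without_lock, fence_after_lock)
```

With this rule, the only change for the operator is the extra lock step before stopping OpenSAF; a node that disappears while still a member is fenced exactly as today.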
---
**[tickets:#2094] Standby controller goes for reboot on stopping openSaf with
STONITH enabled cluster**
**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Wed Oct 05, 2016 07:28 AM UTC by Chani Srivastava
**Last Updated:** Thu Oct 06, 2016 11:43 AM UTC
**Owner:** nobody
OS : Ubuntu 64bit
Changeset : 7997 (5.1.FC)
Setup : 2-node cluster (both controllers), remote fencing enabled
Steps:
1. Bring up OpenSAF on both nodes
2. Enable STONITH
3. Stop OpenSAF on the standby controller
Observed: the active controller triggers a reboot of the standby.
SC-1 syslog:
Oct 5 13:01:23 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:4,
dest:565215202263055)
Oct 5 13:01:23 SC-1 osafimmnd[5545]: NO Global discard node received for
nodeId:2020f pid:3579
Oct 5 13:01:23 SC-1 osafimmnd[5545]: NO Implementer disconnected 14 <0,
2020f(down)> (@safAmfService2020f)
Oct 5 13:01:24 SC-1 osafamfd[5592]: **NO Node 'SC-2' left the cluster**
Oct 5 13:01:24 SC-1 osaffmd[5526]: NO Node Down event for node id 2020f:
Oct 5 13:01:24 SC-1 osaffmd[5526]: NO Current role: ACTIVE
Oct 5 13:01:24 SC-1 osaffmd[5526]: **Rebooting OpenSAF NodeId = 131599 EE Name
= SC-2, Reason: Received Node Down for peer controller, OwnNodeId = 131343,
SupervisionTime = 60
Oct 5 13:01:25 SC-1 external/libvirt[5893]: [5906]: notice: Domain SC-2 was
stopped**
Oct 5 13:01:27 SC-1 kernel: [ 5355.132093] tipc: Resetting link
<1.1.1:eth0-1.1.2:eth0>, peer not responding
Oct 5 13:01:27 SC-1 kernel: [ 5355.132123] tipc: Lost link
<1.1.1:eth0-1.1.2:eth0> on network plane A
Oct 5 13:01:27 SC-1 kernel: [ 5355.132126] tipc: Lost contact with <1.1.2>
Oct 5 13:01:27 SC-1 external/libvirt[5893]: [5915]: notice: Domain SC-2 was
started
Oct 5 13:01:42 SC-1 kernel: [ 5370.557180] tipc: Established link
<1.1.1:eth0-1.1.2:eth0> on network plane A
Oct 5 13:01:42 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:3,
dest:565217457979407)
Oct 5 13:01:42 SC-1 osafimmd[5535]: NO New IMMND process is on STANDBY
Controller at 2020f
Oct 5 13:01:42 SC-1 osafimmd[5535]: WA IMMND on controller (not currently
coord) requests sync
Oct 5 13:01:42 SC-1 osafimmd[5535]: NO Node 2020f request sync sync-pid:1176
epoch:0
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Announce sync, epoch:4
Oct 5 13:01:43 SC-1 osafimmd[5535]: NO Successfully announced sync. New ruling
epoch:4
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_READY -->
IMM_SERVER_SYNC_SERVER
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Oct 5 13:01:43 SC-1 osafimmloadd: NO Sync starting
Oct 5 13:01:43 SC-1 osafimmloadd: IN Synced 346 objects in total
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE
18430
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Epoch set to 4 in ImmModel
Oct 5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at
node 2010f old epoch: 3 new epoch:4
Oct 5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at
node 2020f old epoch: 0 new epoch:4
Oct 5 13:01:43 SC-1 osafimmloadd: NO Sync ending normally
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER
--> IMM_SERVER_READY
Oct 5 13:01:43 SC-1 osafamfd[5592]: NO Received node_up from 2020f: msg_id 1
Oct 5 13:01:43 SC-1 osafamfd[5592]: NO Node 'SC-2' joined the cluster
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer connected: 16
(MsgQueueService131599) <467, 2010f>
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer locally disconnected.
Marking it as doomed 16 <467, 2010f> (MsgQueueService131599)
Oct 5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer disconnected 16 <467,
2010f> (MsgQueueService131599)
Oct 5 13:01:44 SC-1 osafrded[5518]: NO Peer up on node 0x2020f
Oct 5 13:01:44 SC-1 osaffmd[5526]: NO clm init OK
Oct 5 13:01:44 SC-1 osafimmd[5535]: NO MDS event from svc_id 24 (change:5,
dest:13)
Oct 5 13:01:44 SC-1 osaffmd[5526]: NO Peer clm node name: SC-2
Oct 5 13:01:44 SC-1 osafrded[5518]: NO Got peer info request from node 0x2020f
with role STANDBY
Oct 5 13:01:44 SC-1 osafrded[5518]: NO Got peer info response from node
0x2020f with role STANDBY