For the scenario-2, 
-> Management software e.g. SWN other than opensafd issued reboot on standby 
controller. From opensaf perspecitve , the standby controller might be healthy 
member of a cluster. But from the SWN perspecitve, node needs to be repaired 
and reboot is invoked.

-> When reboot command is invoked by SWN, all services in configured runlevel 
shall be stopped in the order.

-> Once the opensafd stop script is invoked on standby controller, active 
controller detects that the standby controller is in healthy state and remote 
fencing shall be done.

-> As part of remote fencing, the node shall be hard rebooted, which doesn't 
give chance for other services in runlevel to be stopped gracefully.

-> If the SWN has a database service ( e.g. drbd) which is to be stopped after 
opensafd stop, the database service stop script shall not be invoked as remote 
fencing is done. This may result in bad state for the other management software 
e.g. SWN.

 Suggestion :

1)  Either opensaf shall document that admin needs to perform clm admin lock of 
standby controller before repairing. OR
2)  FM should detect the difference between opensafd stop and hung opensaf 
processes. As part of opensafd stop, peer fmd on standby contoller can update 
fmd on active controller that opensafd on standby is going gracefully. 



---

** [tickets:#2094] Standby controller goes for reboot on stopping openSaf with 
STONITH enabled cluster**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Wed Oct 05, 2016 07:28 AM UTC by Chani Srivastava
**Last Updated:** Tue Nov 08, 2016 11:49 AM UTC
**Owner:** nobody


OS : Ubuntu 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 2-node cluster (both controllers) Remote fencing enabled

Steps:
1. Bring up OpenSaf on two nodes 
2. Enable STONITH
3. Stop opensaf on Standby

Active controller triggers reboot of standby

SC-1 Syslog

Oct  5 13:01:23 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:4, 
dest:565215202263055)
Oct  5 13:01:23 SC-1 osafimmnd[5545]: NO Global discard node received for 
nodeId:2020f pid:3579
Oct  5 13:01:23 SC-1 osafimmnd[5545]: NO Implementer disconnected 14 <0, 
2020f(down)> (@safAmfService2020f)
Oct  5 13:01:24 SC-1 osafamfd[5592]: **NO Node 'SC-2' left the cluster**
Oct  5 13:01:24 SC-1 osaffmd[5526]: NO Node Down event for node id 2020f:
Oct  5 13:01:24 SC-1 osaffmd[5526]: NO Current role: ACTIVE
Oct  5 13:01:24 SC-1 osaffmd[5526]: **Rebooting OpenSAF NodeId = 131599 EE Name 
= SC-2, Reason: Received Node Down for peer controller, OwnNodeId = 131343, 
SupervisionTime = 60
Oct  5 13:01:25 SC-1 external/libvirt[5893]: [5906]: notice: Domain SC-2 was 
stopped**
Oct  5 13:01:27 SC-1 kernel: [ 5355.132093] tipc: Resetting link 
<1.1.1:eth0-1.1.2:eth0>, peer not responding
Oct  5 13:01:27 SC-1 kernel: [ 5355.132123] tipc: Lost link 
<1.1.1:eth0-1.1.2:eth0> on network plane A
Oct  5 13:01:27 SC-1 kernel: [ 5355.132126] tipc: Lost contact with <1.1.2>
Oct  5 13:01:27 SC-1 external/libvirt[5893]: [5915]: notice: Domain SC-2 was 
started
Oct  5 13:01:42 SC-1 kernel: [ 5370.557180] tipc: Established link 
<1.1.1:eth0-1.1.2:eth0> on network plane A
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:3, 
dest:565217457979407)
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO New IMMND process is on STANDBY 
Controller at 2020f
Oct  5 13:01:42 SC-1 osafimmd[5535]: WA IMMND on controller (not currently 
coord) requests sync
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO Node 2020f request sync sync-pid:1176 
epoch:0
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Announce sync, epoch:4
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO Successfully announced sync. New ruling 
epoch:4
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_READY --> 
IMM_SERVER_SYNC_SERVER
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Oct  5 13:01:43 SC-1 osafimmloadd: NO Sync starting
Oct  5 13:01:43 SC-1 osafimmloadd: IN Synced 346 objects in total
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 
18430
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Epoch set to 4 in ImmModel
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at 
node 2010f old epoch: 3  new epoch:4
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at 
node 2020f old epoch: 0  new epoch:4
Oct  5 13:01:43 SC-1 osafimmloadd: NO Sync ending normally
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER 
--> IMM_SERVER_READY
Oct  5 13:01:43 SC-1 osafamfd[5592]: NO Received node_up from 2020f: msg_id 1
Oct  5 13:01:43 SC-1 osafamfd[5592]: NO Node 'SC-2' joined the cluster
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer connected: 16 
(MsgQueueService131599) <467, 2010f>
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer locally disconnected. 
Marking it as doomed 16 <467, 2010f> (MsgQueueService131599)
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer disconnected 16 <467, 
2010f> (MsgQueueService131599)
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Peer up on node 0x2020f
Oct  5 13:01:44 SC-1 osaffmd[5526]: NO clm init OK
Oct  5 13:01:44 SC-1 osafimmd[5535]: NO MDS event from svc_id 24 (change:5, 
dest:13)
Oct  5 13:01:44 SC-1 osaffmd[5526]: NO Peer clm node name: SC-2
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Got peer info request from node 0x2020f 
with role STANDBY
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Got peer info response from node 
0x2020f with role STANDBY




---

Sent from sourceforge.net because [email protected] is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
------------------------------------------------------------------------------
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

Reply via email to