[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping OpenSAF with STONITH-enabled cluster

2017-03-01 Thread Chani Srivastava
- **status**: unassigned --> duplicate
- **Comment**:

Closing as duplicate of #2160



---

** [tickets:#2094] Standby controller goes for reboot on stopping OpenSAF with STONITH-enabled cluster**

**Status:** duplicate
**Milestone:** 5.2.RC1
**Created:** Wed Oct 05, 2016 07:28 AM UTC by Chani Srivastava
**Last Updated:** Wed Mar 01, 2017 06:46 AM UTC
**Owner:** nobody


OS: Ubuntu 64-bit
Changeset: 7997 (5.1.FC)
Setup: 2-node cluster (both controllers), remote fencing enabled

Steps:
1. Bring up OpenSAF on both nodes
2. Enable STONITH
3. Stop OpenSAF on the standby controller

The active controller then triggers a reboot of the standby.
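
For reference, the steps map to roughly the following commands (a sketch only; the STONITH/remote-fencing configuration itself is assumed to already be in place and is not shown):

    # on SC-1 and SC-2
    $ /etc/init.d/opensafd start

    # on SC-2, once it has settled as the standby controller
    $ /etc/init.d/opensafd stop

    # observed result: SC-1 fences SC-2 via the libvirt STONITH plugin
    # (see "Domain SC-2 was stopped" / "was started" in the syslog below)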

SC-1 Syslog

Oct  5 13:01:23 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:4, dest:565215202263055)
Oct  5 13:01:23 SC-1 osafimmnd[5545]: NO Global discard node received for nodeId:2020f pid:3579
Oct  5 13:01:23 SC-1 osafimmnd[5545]: NO Implementer disconnected 14 <0, 2020f(down)> (@safAmfService2020f)
Oct  5 13:01:24 SC-1 osafamfd[5592]: **NO Node 'SC-2' left the cluster**
Oct  5 13:01:24 SC-1 osaffmd[5526]: NO Node Down event for node id 2020f:
Oct  5 13:01:24 SC-1 osaffmd[5526]: NO Current role: ACTIVE
Oct  5 13:01:24 SC-1 osaffmd[5526]: **Rebooting OpenSAF NodeId = 131599 EE Name = SC-2, Reason: Received Node Down for peer controller, OwnNodeId = 131343, SupervisionTime = 60
Oct  5 13:01:25 SC-1 external/libvirt[5893]: [5906]: notice: Domain SC-2 was stopped**
Oct  5 13:01:27 SC-1 kernel: [ 5355.132093] tipc: Resetting link <1.1.1:eth0-1.1.2:eth0>, peer not responding
Oct  5 13:01:27 SC-1 kernel: [ 5355.132123] tipc: Lost link <1.1.1:eth0-1.1.2:eth0> on network plane A
Oct  5 13:01:27 SC-1 kernel: [ 5355.132126] tipc: Lost contact with <1.1.2>
Oct  5 13:01:27 SC-1 external/libvirt[5893]: [5915]: notice: Domain SC-2 was started
Oct  5 13:01:42 SC-1 kernel: [ 5370.557180] tipc: Established link <1.1.1:eth0-1.1.2:eth0> on network plane A
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO MDS event from svc_id 25 (change:3, dest:565217457979407)
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO New IMMND process is on STANDBY Controller at 2020f
Oct  5 13:01:42 SC-1 osafimmd[5535]: WA IMMND on controller (not currently coord) requests sync
Oct  5 13:01:42 SC-1 osafimmd[5535]: NO Node 2020f request sync sync-pid:1176 epoch:0
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Announce sync, epoch:4
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO Successfully announced sync. New ruling epoch:4
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_READY --> IMM_SERVER_SYNC_SERVER
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
Oct  5 13:01:43 SC-1 osafimmloadd: NO Sync starting
Oct  5 13:01:43 SC-1 osafimmloadd: IN Synced 346 objects in total
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 18430
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Epoch set to 4 in ImmModel
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at node 2010f old epoch: 3  new epoch:4
Oct  5 13:01:43 SC-1 osafimmd[5535]: NO ACT: New Epoch for IMMND process at node 2020f old epoch: 0  new epoch:4
Oct  5 13:01:43 SC-1 osafimmloadd: NO Sync ending normally
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO SERVER STATE: IMM_SERVER_SYNC_SERVER --> IMM_SERVER_READY
Oct  5 13:01:43 SC-1 osafamfd[5592]: NO Received node_up from 2020f: msg_id 1
Oct  5 13:01:43 SC-1 osafamfd[5592]: NO Node 'SC-2' joined the cluster
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer connected: 16 (MsgQueueService131599) <467, 2010f>
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer locally disconnected. Marking it as doomed 16 <467, 2010f> (MsgQueueService131599)
Oct  5 13:01:43 SC-1 osafimmnd[5545]: NO Implementer disconnected 16 <467, 2010f> (MsgQueueService131599)
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Peer up on node 0x2020f
Oct  5 13:01:44 SC-1 osaffmd[5526]: NO clm init OK
Oct  5 13:01:44 SC-1 osafimmd[5535]: NO MDS event from svc_id 24 (change:5, dest:13)
Oct  5 13:01:44 SC-1 osaffmd[5526]: NO Peer clm node name: SC-2
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Got peer info request from node 0x2020f with role STANDBY
Oct  5 13:01:44 SC-1 osafrded[5518]: NO Got peer info response from node 0x2020f with role STANDBY






[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping OpenSAF with STONITH-enabled cluster

2017-02-28 Thread Hans Nordebäck
I suggest closing this ticket as a duplicate of ticket #2160.




[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping OpenSAF with STONITH-enabled cluster

2016-11-11 Thread Hans Nordebäck
Agree, "Suggestion: 1" document that admin needs to perform clm admin lock of 
standby is a good suggestion. The node will then not be a member of the cluster 
and not affected by remote fencing
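
For the documentation, the lock step could look roughly like the following (a sketch assuming the CLM node DN used in the OpenSAF sample configuration; operation id 2 is the CLM admin LOCK):

    # CLM admin lock of the standby controller SC-2
    $ immadm -o 2 safNode=SC-2,safCluster=myClmCluster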



[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping OpenSAF with STONITH-enabled cluster

2016-11-08 Thread Hans Nordebäck
In a), running /etc/init.d/opensafd stop does not reboot the node; it stops OpenSAF on that node and saClmNodeIsMember is set to false. The active controller will then not perform remote fencing of that node.
In b), a "graceful" reboot after opensafd stop should work fine without any involvement of the remote fencing functionality.
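
One way to check the membership flag referred to here (a sketch, again assuming the sample CLM node DN) is to read the CLM node's runtime attribute with immlist:

    # on the active controller; expect the attribute to be 0 (false)
    # once opensafd has been stopped on SC-2 or SC-2 has been CLM-locked
    $ immlist -a saClmNodeIsMember safNode=SC-2,safCluster=myClmCluster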



[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping OpenSAF with STONITH-enabled cluster

2016-11-07 Thread Srikanth R
There are two scenarios in which "opensafd stop" is invoked on an OpenSAF controller:

SCENARIO 1: The /etc/init.d/opensafd script is invoked manually at the command prompt while the system is up and running.
SCENARIO 2: Software on the controller (other than opensafd) invokes "reboot", and opensafd stop is then invoked at run level 3 or higher.

With the patch submitted for #2160:

a) In scenario 1 the node will be rebooted if the administrator does not invoke the CLM admin operation. This is fine.

b) In scenario 2 the remaining run level services will not be stopped gracefully, because the node will be rebooted abruptly right after opensafd stop when the admin has not invoked the CLM admin operation. So, with the #2160 fix, will opensafd as HA software no longer support a graceful reboot on the standby controller?
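
If scenario 2 must keep working, one conceivable workaround (purely a hypothetical sketch, not something OpenSAF provides; the node name is assumed to match the CLM node RDN used in the samples) is to wrap the OS reboot so that the CLM lock is applied before the run level change stops opensafd:

    #!/bin/sh
    # hypothetical reboot wrapper for the standby controller SC-2
    immadm -o 2 safNode=SC-2,safCluster=myClmCluster   # CLM admin lock first
    /sbin/reboot   # the run level change then stops opensafd without triggering fencing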



[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping OpenSAF with STONITH-enabled cluster

2016-11-02 Thread Hans Nordebäck
Ticket [#2160] will add support for differentiating between a hung and a stopped node; no additional documentation will be needed.



[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping OpenSAF with STONITH-enabled cluster

2016-11-02 Thread Chani Srivastava
Can you provide documentation on how to stop OpenSAF in a controlled manner, so that I can close the ticket?



[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping OpenSAF with STONITH-enabled cluster

2016-10-13 Thread Hans Nordebäck
Split brain can only happen between the system controllers.




[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping OpenSAF with STONITH-enabled cluster

2016-10-13 Thread Chani Srivastava
Is STONITH applicable only to controllers? No reboot was observed when stopping OpenSAF on a payload node.





[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping OpenSAF with STONITH-enabled cluster

2016-10-06 Thread Anders Widell
I think the procedure for stopping OpenSAF in a controlled way is to first lock the node using CLM. The CLM lock admin operation will remove the node from cluster membership. Then it should be safe to stop OpenSAF on that node without it getting fenced, i.e. we should not fence a node that we lost contact with if that node was not a member of the cluster.
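
A sketch of that sequence for the standby SC-2, assuming the CLM node DN from the OpenSAF sample configuration (operation id 2 is the CLM admin LOCK, 1 the UNLOCK):

    # 1. remove SC-2 from cluster membership
    $ immadm -o 2 safNode=SC-2,safCluster=myClmCluster

    # 2. stop OpenSAF on SC-2; no fencing is expected now
    $ /etc/init.d/opensafd stop

    # 3. later, after starting OpenSAF on SC-2 again, re-admit it
    $ /etc/init.d/opensafd start
    $ immadm -o 1 safNode=SC-2,safCluster=myClmCluster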


[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping OpenSAF with STONITH-enabled cluster

2016-10-06 Thread Mathi Naickan
This seems to be a case of differentiating a hung node from a node on which the middleware has been stopped.

Is there any standard means to detect a "hung" node? If there is such a mechanism, then upon receiving "NODE_DOWN", i.e. the event below,

 "Oct 5 13:01:24 SC-1 osaffmd[5526]: NO Node Down event for node id 2020f:"

FM could use (say) a libvirt command to detect whether the node is hung or running healthy. If it is running healthy, then a reboot using STONITH could be avoided.

OpenSAF did support the use case of "/etc/init.d/opensafd stop without OS reboot". Should we continue to support that?


[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping OpenSAF with STONITH-enabled cluster

2016-10-06 Thread Hans Nordebäck
This is the same behaviour as when running without STONITH or PLM. Even without STONITH, OpenSAF tries to reboot the standby controller on opensafd stop, but it needs either PLM or STONITH for the reboot to succeed. Perhaps there is a need to stop OpenSAF without triggering remote fencing? Is this an upgrade case? Perhaps we should create an enhancement ticket for this?

