[tickets] [opensaf:tickets] #1897 Incorrect ER messages in syslog

2016-10-06 Thread Long HB Nguyen
- **status**: unassigned --> assigned - **Comment**: I think amfd should use LOG_NO in this case since it returns TRY_AGAIN. This would also be consistent with the si_swap() function in sg_nway_fsm.cc. --- ** [tickets:#1897] Incorrect ER messages in syslog** **Status:** assigned **Milestone:**
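The suggestion above (log at notice level when the operation returns TRY_AGAIN, rather than at error level) can be sketched roughly as follows. This is a minimal illustration, not the amfd code: it assumes that LOG_NO and LOG_ER correspond to the syslog priorities LOG_NOTICE and LOG_ERR, and the `Rc` enum, `priority_for()` and `log_rejected()` are hypothetical names made up for the example.

```cpp
#include <syslog.h>  // LOG_ERR, LOG_NOTICE, syslog()

// Hypothetical stand-ins for SAF AIS return codes; the values are illustrative.
enum class Rc { Ok = 1, TryAgain = 6, BadOperation = 20 };

// Pick a syslog priority for a rejected admin operation: a retryable
// result is reported as a notice, anything else as an error.
int priority_for(Rc rc) {
  return rc == Rc::TryAgain ? LOG_NOTICE : LOG_ERR;
}

// Hypothetical helper that logs the rejection at the chosen priority.
void log_rejected(const char *what, Rc rc) {
  syslog(priority_for(rc), "%s rejected, rc=%d", what, static_cast<int>(rc));
}
```

With this split, a caller that will simply retry does not leave an ER entry in syslog for what is an expected, transient condition.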

[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping openSaf with STONITH enabled cluster

2016-10-06 Thread Anders Widell
I think the procedure for stopping OpenSAF in a controlled way is to first lock the node using CLM. The CLM lock admin operation will remove the node from cluster membership. Then it should be safe to stop OpenSAF on that node without getting fenced - i.e. we should not fence a node that we lost

[tickets] [opensaf:tickets] #2099 msgd got crashed on Standby controller during cold_sync

2016-10-06 Thread Ritu Raj
--- ** [tickets:#2099] msgd got crashed on Standby controller during cold_sync** **Status:** unassigned **Milestone:** 5.2.FC **Created:** Thu Oct 06, 2016 12:07 PM UTC by Ritu Raj **Last Updated:** Thu Oct 06, 2016 12:07 PM UTC **Owner:** nobody **Attachments:** -

[tickets] [opensaf:tickets] #2097 Both controllers went for reboot while recovering from split brain

2016-10-06 Thread Hans Nordebäck
A question: how is the network configured? I assume tipc is used for the opensaf cluster and there is a separate interface for tcp and stonith (the backplane)? How are the interfaces brought down? It seems that even the "backplane" interface is down. --- ** [tickets:#2097] Both controllers

[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping openSaf with STONITH enabled cluster

2016-10-06 Thread Mathi Naickan
This seems to be a case of differentiating a hung node versus a node on which the middleware is stopped. Is there any standard means to detect a "hung" node? If there is such a mechanism to detect a hung node, then upon receiving "NODE_DOWN" i.e. below event "Oct 5 13:01:24 SC-1

[tickets] [opensaf:tickets] #2097 Both controllers went for reboot while recovering from split brain

2016-10-06 Thread Chani Srivastava
The command --- virsh --connect=qemu+tcp://192.168.122.1/system list --all displays Id Name State 26 PL-3 running 32 SC-1 running 33 SC-2

[tickets] [opensaf:tickets] #2094 Standby controller goes for reboot on stopping openSaf with STONITH enabled cluster

2016-10-06 Thread Hans Nordebäck
This is the same behaviour as running without stonith or PLM. Without stonith opensaf tries to reboot the standby controller at opensafd stop, but needs either PLM or stonith to succeed. Perhaps it is needed to stop opensaf and not trigger remote fencing? Is this an upgrade case? Perhaps we

[tickets] [opensaf:tickets] #2097 Both controllers went for reboot while recovering from split brain

2016-10-06 Thread Hans Nordebäck
As TCP is used above, also check that listen_tcp = 1 is set in the /etc/libvirt/libvirtd.conf file (on the host). --- ** [tickets:#2097] Both controllers went for reboot while recovering from split brain** **Status:** unassigned **Milestone:** 5.2.FC **Created:** Thu Oct 06, 2016 04:58 AM UTC by

[tickets] [opensaf:tickets] #2097 Both controllers went for reboot while recovering from split brain

2016-10-06 Thread Hans Nordebäck
I had a quick look at the logs: Oct 6 10:34:42 SC-1 stonith: [3391]: CRIT: external_reset_req: 'libvirt reset' for host node failed with rc 1 Oct 6 10:34:42 SC-1 opensaf_reboot: Rebooting remote node SC-2 using stonith failed, rc: 5 Oct 6 10:34:42 SC-1 osaffmd[1117]: node reboot failure:

[tickets] [opensaf:tickets] #2098 amfnd: amfnd doesn't exit at opensafd stop

2016-10-06 Thread Hans Nordebäck
On the real system OPENSAF_TERMTIMEOUT has been changed to 1000, and systemd times out opensafd after 5 minutes. --- ** [tickets:#2098] amfnd: amfnd doesn't exit at opensafd stop** **Status:** unassigned **Milestone:** 5.2.FC **Created:** Thu Oct 06, 2016 09:02 AM UTC by Hans Nordebäck **Last

[tickets] [opensaf:tickets] #2098 amfnd: amfnd doesn't exit at opensafd stop

2016-10-06 Thread Praveen
Default term time during OpenSAF stop is (nid.conf): # Specifies how long "rootd stop" should wait before stop has considered to fail OPENSAF_TERMTIMEOUT=60 In AppConfig-nwayactive.xml for clean up script saAmfCtDefClcCliTimeout is 10 seconds. Are these values the same? AMFND is exiting even if
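The timeouts quoted in this thread nest inside each other, so it can help to check which limit expires first. The sketch below is only an illustration: it assumes OPENSAF_TERMTIMEOUT is expressed in seconds, and `outer_fires_first()` and the constant names are hypothetical, chosen for the example.

```cpp
#include <chrono>

using std::chrono::seconds;

// Returns true when the outer limit (for example a service manager's stop
// timeout) expires before the inner one, so the inner limit never takes effect.
constexpr bool outer_fires_first(seconds inner, seconds outer) {
  return outer < inner;
}

// Values quoted in the thread; treating OPENSAF_TERMTIMEOUT as seconds
// is an assumption made for this sketch.
constexpr seconds kDefaultTermTimeout{60};     // nid.conf default
constexpr seconds kChangedTermTimeout{1000};   // value reported on the real system
constexpr seconds kSystemdStopTimeout = std::chrono::minutes{5};
```

Under these assumptions, `outer_fires_first(kChangedTermTimeout, kSystemdStopTimeout)` is true (300 s is shorter than 1000 s), which matches the report that systemd times out opensafd first, while with the default of 60 the OpenSAF limit would expire first.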

[tickets] [opensaf:tickets] #2098 amfnd: amfnd doesn't exit at opensafd stop

2016-10-06 Thread Hans Nordebäck
--- ** [tickets:#2098] amfnd: amfnd doesn't exit at opensafd stop** **Status:** unassigned **Milestone:** 5.2.FC **Created:** Thu Oct 06, 2016 09:02 AM UTC by Hans Nordebäck **Last Updated:** Thu Oct 06, 2016 09:02 AM UTC **Owner:** nobody amfnd doesn't exit at opensafd stop. This problem

[tickets] [opensaf:tickets] #1872 smf: ER messages when handling smfnd node up

2016-10-06 Thread elunlen
- **status**: accepted --> unassigned - **assigned_to**: elunlen --> nobody --- ** [tickets:#1872] smf: ER messages when handling smfnd node up** **Status:** unassigned **Milestone:** 5.0.2 **Created:** Fri Jun 10, 2016 08:41 AM UTC by elunlen **Last Updated:** Thu Oct 06, 2016 07:07 AM UTC

[tickets] [opensaf:tickets] #1872 smf: ER messages when handling smfnd node up

2016-10-06 Thread elunlen
Is this related to [#1969]? --- ** [tickets:#1872] smf: ER messages when handling smfnd node up** **Status:** accepted **Milestone:** 5.0.2 **Created:** Fri Jun 10, 2016 08:41 AM UTC by elunlen **Last Updated:** Tue Sep 20, 2016 05:43 PM UTC **Owner:** elunlen The following error messages is

[tickets] [opensaf:tickets] #2041 Msg: saMsgInitialize is returning continuous TRY_AGAINS after mqsv ndrestarts in backward compatability.

2016-10-06 Thread Hung Nguyen
- **Component**: imm --> msg --- ** [tickets:#2041] Msg: saMsgInitialize is returning continuous TRY_AGAINS after mqsv ndrestarts in backward compatability.** **Status:** unassigned **Milestone:** 5.0.2 **Created:** Fri Sep 16, 2016 12:13 PM UTC by Madhurika Koppula **Last Updated:** Tue Sep