Re: [devel] [PATCH 1/1] clmd: Increase message priority of CLMSV_CLMS_MDS_NODE_EVT to be sent to main thread [#2842]

2018-04-25 Thread Ravi Sekhar Reddy Konda
Hi Minh, Ack for the patch (code review only). Regards, Ravi -----Original Message----- From: Minh Chau [mailto:minh.c...@dektech.com.au] Sent: Thursday, April 26, 2018 4:52 AM To: anders.wid...@ericsson.com; hans.nordeb...@ericsson.com; ravisekhar.ko...@oracle.com Cc: opensaf-devel@lists.source

[devel] [PATCH 0/1] Review Request for clmd: Increase message priority of CLMSV_CLMS_MDS_NODE_EVT to be sent to main thread [#2842]

2018-04-25 Thread Minh Chau
Summary: clmd: Increase message priority of CLMSV_CLMS_MDS_NODE_EVT to be sent to main thread [#2842] Review request for Ticket(s): 2842 Peer Reviewer(s): *** Anders, Hans, Ravi Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected branch(es): develop Development branch: ticket-2

[devel] [PATCH 1/1] clmd: Increase message priority of CLMSV_CLMS_MDS_NODE_EVT to be sent to main thread [#2842]

2018-04-25 Thread Minh Chau
When the standby controller is stopped and started, stopping the node generates the MDS event CLMSV_CLMS_MDS_NODE_EVT. This event is sent to the main thread with NORMAL priority. When the node is started again, other events such as CLMSV_CLUSTER_JOIN_REQ are sent with HIGH priority. The r

[devel] [PATCH 0/1] Review Request for msgd: handle abrupt restart of remote node [#2840]

2018-04-25 Thread Alex Jones
Summary: msgd: handle abrupt restart of remote node [#2840] Review request for Ticket(s): 2840 Peer Reviewer(s): Srinivas Pull request to: Affected branch(es): develop Development branch: ticket-2840 Base revision: dd6a9bfe9d897fe9cc3a70e21d7e066b7a727d44 Personal repository: git://git.code.sf.net

[devel] [PATCH 1/1] msgd: handle abrupt restart of remote node [#2840]

2018-04-25 Thread Alex Jones
Sometimes when a remote node restarts abruptly, queues that were created on that node cannot be opened again when that node comes up. There is a race condition when the remote node goes down between msgd receiving the CLM and MDS events indicating node down, and immd removing the implemente

Re: [devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]

2018-04-25 Thread Hans Nordeback
Hi Alex, ok I'll check if there is a problem, but immnd is restartable and should be restarted after the nid phase is finished. After the nid phase the system should be in a "well defined" state. That was one of the reasons fifo monitoring was added to the nid phase. /HansN On 04/25/20

Re: [devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]

2018-04-25 Thread Alex Jones
Hi Hans, I understand. But, what if it doesn't fail in the nid phase? If you run this command in your setup: "systemctl start opensafd; sleep 2; pkill -KILL immnd", does immnd get restarted? And does opensafd successfully come up according to systemd? Alex On 04/25/

Re: [devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]

2018-04-25 Thread Hans Nordebäck
Hi Alex, the reboot should only happen if REBOOT_ON_FAIL_TIMEOUT is set (i.e. not 0). I checked the latest version; the reboot works fine if e.g. immnd fails in the nid phase and REBOOT_ON_FAIL_TIMEOUT is set. /Thanks HansN From: Alex Jones [mailto:ajo...@rbbn.com] Sent: den 25 april 2018 15:0

Re: [devel] [PATCH 1/1] nid: restart opensafd on failure when systemd enabled [#2839]

2018-04-25 Thread Alex Jones
Hi Hans, There must be a hole here, then. Because in our setup, if dtmd or immnd crashes early in the startup process, the node doesn't reboot, and the executables are not restarted. If I set "Restart=on-failure" it works fine. Can you test this in your setup to see if y

[devel] [PATCH 1/1] rded: prevent unnecessary takeover [#2843]

2018-04-25 Thread Gary Lee
rded should not automatically include itself in the cluster member list. Instead, it should rely solely on AMFND service up, so that the count is consistent across nodes. Also adjust some split-brain prevention related values. More time is required to ensure we have an accurate view of clust

[devel] [PATCH 0/1] Review Request for rded: prevent unnecessary takeover [#2843]

2018-04-25 Thread Gary Lee
Summary: rded: prevent unnecessary takeover [#2843] Review request for Ticket(s): 2843 Peer Reviewer(s): Anders, Hans, Ravi Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE *** Affected branch(es): develop Development branch: ticket-2843 Base revision: dd6a9bfe9d897fe9cc3a70e21d7e066b7a72