[devel] [PATCH 1/1] fmd: improve failover response time [#3008]

2019-02-17 Thread Gary Lee
Improve failover response time if split brain prevention is enabled but FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is set to 0. Also, return immediately if node promotion fails to avoid sending active role to RDA --- src/fm/fmd/fm_rda.cc | 14 +- 1 file changed, 9 insertions(+), 5

[devel] [PATCH 0/1] Review Request for fmd: improve failover response time [#3008]

2019-02-17 Thread Gary Lee
revision 6deba4a116fd04687b0f4f7fdc9cf96db1a77234 Author: Gary Lee Date: Mon, 18 Feb 2019 15:34:02 +1100 fmd: improve failover response time [#3008] Improve failover response time if split brain prevention is enabled but FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is set to 0. Also, return immediately if node promotion fail

Re: [devel] [PATCH 1/1] osaf: Call opensaf_quick_reboot if failed to set active role in consensus [#3001]

2019-02-15 Thread Gary Lee
Ack, review only. > On 15 Feb 2019, at 5:51 pm, Minh Chau wrote: > > --- > src/fm/fmd/fm_rda.cc | 4 ++-- > src/rde/rded/rde_main.cc | 8 +++- > src/rde/rded/role.cc | 8 > 3 files changed, 9 insertions(+), 11 deletions(-) > > diff --git a/src/fm/fmd/fm_rda.cc

[devel] [PATCH 0/1] Review Request for base: fix compiler warnings [#3006]

2019-02-08 Thread Gary Lee
955487e18b Author: Gary Lee Date: Sat, 9 Feb 2019 14:02:13 +1100 base: fix warnings [#3006] fix warnings about unused variables and add SA_RESTART Complete diffstat: -- src/base/daemon.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) Testin

[devel] [PATCH 6/6] rded: fix leak when processing takeover_request [#3006]

2019-02-07 Thread Gary Lee
a memory leak may occur if a takeover_request is processed after split brain prevention is disabled at runtime --- src/rde/rded/rde_main.cc | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc index f511814..bb17133 100644

[devel] [PATCH 5/6] base: add daemon_sighup_install [#3006]

2019-02-07 Thread Gary Lee
refactor amfd/fmd/rded to use daemon_sighup_install --- src/amf/amfd/main.cc | 27 +-- src/base/daemon.c| 36 +++- src/base/daemon.h| 3 +++ src/fm/fmd/fm_main.cc| 28 +---

[devel] [PATCH 3/6] amfd: reload split brain prevention parameters on SIGHUP [#3006]

2019-02-07 Thread Gary Lee
--- src/amf/amfd/cb.h | 1 + src/amf/amfd/main.cc | 32 src/amf/amfd/osaf-amfd.in | 1 + 3 files changed, 34 insertions(+) diff --git a/src/amf/amfd/cb.h b/src/amf/amfd/cb.h index 7ac743e..89cf15d 100644 --- a/src/amf/amfd/cb.h +++

[devel] [PATCH 0/6] Review Request for osaf: allow split brain prevention parameter changes at runtime V2 [#3006]

2019-02-07 Thread Gary Lee
anges from V1: add daemon_sighup_install after review fix memory leak in rded -- revision 32573b69eae74db365b4842ff41c00789d6f295d Author: Gary Lee Date: Fri, 8 Feb 2019 14:20:25 +1100 rded: fix leak when processing takeover_request [#3006] a memory leak may occur if a takeover_request is

[devel] [PATCH 4/6] rded: reload split brain prevention parameters on SIGHUP [#3006]

2019-02-07 Thread Gary Lee
If enabled at runtime and this node is active, promote this node in consensus service. If disabled at runtime, watch threads will terminate gracefully when the plugin exits after losing connectivity to the consensus service. --- src/rde/rded/osaf-rded.in | 1 + src/rde/rded/rde_main.cc | 59

Re: [devel] [PATCH 0/4] Review Request for osaf: allow split brain prevention parameter changes at runtime [#3006]

2019-02-06 Thread Gary Lee
brain prevention when it's running, when it's done we might have to document this runtime change. Thanks Minh On 4/2/19 9:41 pm, Gary Lee wrote: Summary: osaf: add ability to reload config from fmd.conf [#3006] Review request for Ticket(s): 3006 Peer Reviewer(s): Hans, Minh Pull request

[devel] [PATCH 4/4] rded: reload split brain prevention parameters on SIGHUP [#3006]

2019-02-04 Thread Gary Lee
If enabled at runtime and this node is active, promote this node in consensus service. If disabled at runtime, watch threads will terminate gracefully when the plugin exits after losing connectivity to the consensus service. --- src/rde/rded/osaf-rded.in | 1 + src/rde/rded/rde_main.cc | 59

[devel] [PATCH 2/4] fmd: reload split brain prevention parameters on SIGHUP [#3006]

2019-02-04 Thread Gary Lee
--- src/fm/fmd/fm_main.cc | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/fm/fmd/fm_main.cc b/src/fm/fmd/fm_main.cc index b473b68..ea70f24 100644 --- a/src/fm/fmd/fm_main.cc +++ b/src/fm/fmd/fm_main.cc @@ -306,6 +306,8 @@ int main(int argc, char *argv[]) { if

[devel] [PATCH 1/4] osaf: add ability to reload config from fmd.conf [#3006]

2019-02-04 Thread Gary Lee
Add ReloadConfiguration() function - when called it will read fmd.conf and look for 'export FMS_X=', and overwrite current environment variable settings in the caller. This allows split brain prevention parameters to be changed at runtime without a node restart. ---
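
The overwrite step described above can be sketched roughly as follows. This is not the ReloadConfiguration() from the patch: `ApplyFmsSettings` and its map argument are hypothetical names, and the sketch assumes the `export FMS_X=` lines from fmd.conf have already been parsed into name/value pairs.

```cpp
#include <stdlib.h>

#include <map>
#include <string>

// Hypothetical helper (not the patch's ReloadConfiguration()).
// Copies every FMS_-prefixed setting into the environment of the calling
// process, overwriting any existing value, and returns how many variables
// were written. Keys without the FMS_ prefix are ignored.
int ApplyFmsSettings(const std::map<std::string, std::string>& settings) {
  int applied = 0;
  for (const auto& kv : settings) {
    if (kv.first.compare(0, 4, "FMS_") != 0) continue;
    // POSIX setenv(); the last argument (1) means "overwrite if present".
    if (setenv(kv.first.c_str(), kv.second.c_str(), 1) == 0) ++applied;
  }
  return applied;
}
```

Because only the caller's environment changes, each daemon that depends on these parameters would need to run such a step itself.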

[devel] [PATCH 0/4] Review Request for osaf: allow split brain prevention parameter changes at runtime [#3006]

2019-02-04 Thread Gary Lee
revision b5206b54fbc5462eaf6f0599d2c449f22087635d Author: Gary Lee Date: Mon, 4 Feb 2019 21:33:11 +1100 rded: reload split brain prevention parameters on SIGHUP [#3006] If enabled at runtime and this node is active, promote this node in consensus service. If disabled at runtime, watch threads will terminate

[devel] [PATCH 3/4] amfd: reload split brain prevention parameters on SIGHUP [#3006]

2019-02-04 Thread Gary Lee
--- src/amf/amfd/cb.h | 1 + src/amf/amfd/main.cc | 32 src/amf/amfd/osaf-amfd.in | 1 + 3 files changed, 34 insertions(+) diff --git a/src/amf/amfd/cb.h b/src/amf/amfd/cb.h index 7ac743e..89cf15d 100644 --- a/src/amf/amfd/cb.h +++

[devel] [PATCH 0/1] Review Request for rded: process takeover request without delay [#3005]

2019-01-28 Thread Gary Lee
revision 09af1afd94395624247750749bdf1f7c9e325fd4 Author: Gary Lee Date: Tue, 29 Jan 2019 15:08:15 +1100 rded: process takeover request without delay [#3005] Currently, a takeover request is not sent to the main thread immediately so that MDS messages related to topology changes are processed first. If the plugin

[devel] [PATCH 1/1] rded: process takeover request without delay [#3005]

2019-01-28 Thread Gary Lee
Currently, a takeover request is not sent to the main thread immediately so that MDS messages related to topology changes are processed first. If the plugin informs us it has lost connectivity to the consensus service by returning 'UNDEFINED', or we prioritise the current active SC, then we

[devel] [PATCH 0/2] Review Request for osaf: update etcd v2 plugin [#3003]

2019-01-23 Thread Gary Lee
revision d3dba70d552d21717bf83a17c5f7ef4e653f230a Author: Gary Lee Date: Thu, 24 Jan 2019 12:11:36 +1100 osaf: update sample plugin [#3003] revision 26f675ea92b6b4ba9045182d083dcf607e6d123a Author: Gary Lee Date: Thu, 24 Jan 2019 12:11:36 +1100 osaf: update etcd v2 plugin [#3003] 'etcdctl watch' will return if

[devel] [PATCH 1/2] osaf: update etcd v2 plugin [#3003]

2019-01-23 Thread Gary Lee
'etcdctl watch' will return if connection to the etcd server is lost. If that occurs, send a 'fake' takeover request to rded so rded will reboot the node. This is in alignment with the etcd v3 plugin. --- src/osaf/consensus/plugins/etcd.plugin | 29 + 1 file changed,

[devel] [PATCH 2/2] osaf: update sample plugin [#3003]

2019-01-23 Thread Gary Lee
--- src/osaf/consensus/plugins/sample.plugin | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/src/osaf/consensus/plugins/sample.plugin b/src/osaf/consensus/plugins/sample.plugin index fc4c54c..cadb9e0 100644 --- a/src/osaf/consensus/plugins/sample.plugin

Re: [devel] [PATCH 0/5] Review Request for rded: add relaxed node promotion feature [#2996]

2019-01-21 Thread Gary Lee
in amfd. I intentionally separated it out from [5], so [5] has rded changes only. On 21/1/19 9:43 pm, Minh Hon Chau wrote: Hi Gary, I'm trying to understand the patch 3/5 and 4/5, there seems to be logic of *relaxed mode* left in 3/5 and 4/5. Thanks Minh On 21/1/19 2:52 pm, Gary Lee

[devel] [PATCH 2/5] fmd: add configuration parameters [#2996]

2019-01-20 Thread Gary Lee
Add parameters FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE and FMS_RELAXED_NODE_PROMOTION. --- src/fm/fmd/fmd.conf | 17 + 1 file changed, 17 insertions(+) diff --git a/src/fm/fmd/fmd.conf b/src/fm/fmd/fmd.conf index 9a106bf..209e484 100644 --- a/src/fm/fmd/fmd.conf +++

[devel] [PATCH 4/5] amfd: allow node to remain active if peer SC can be seen [#2996]

2019-01-20 Thread Gary Lee
If relaxed node promotion is enabled, allow a SC to remain active if the peer SC can be seen, even if access to the consensus service is lost. --- src/amf/amfd/ndfsm.cc | 2 +- src/amf/amfd/ndproc.cc | 13 +++-- src/amf/amfd/proc.h| 2 +- 3 files changed, 13 insertions(+), 4

[devel] [PATCH 3/5] osaf: allow active SC to be preferred during network split [#2996]

2019-01-20 Thread Gary Lee
Add FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE option to allow active SC to be preferred during a network split. The default behavior is to prefer the larger partition to maintain existing behaviour. Add configuration support for FMS_RELAXED_NODE_PROMOTION. --- src/osaf/consensus/consensus.cc | 39

[devel] [PATCH 5/5] rded: add relaxed node promotion feature [#2996]

2019-01-20 Thread Gary Lee
Allow promotion of node to active at cluster startup, even if the consensus service is unavailable, if the peer SC can be seen. During normal cluster operation, if the consensus service becomes unavailable but the peer SC can still be seen, allow the existing active SC to remain active. A new
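
The rule described above can be read as a small predicate. The following is only a sketch under stated assumptions: `ClusterView` and `ConsensusLossTolerated` are illustrative names that do not come from the patch, and the normal promotion path that applies while the consensus service is reachable is not modelled.

```cpp
// Hypothetical snapshot of what one SC can observe (illustrative names).
struct ClusterView {
  bool consensus_reachable;        // can we talk to the consensus service?
  bool relaxed_promotion_enabled;  // mirrors FMS_RELAXED_NODE_PROMOTION
  bool peer_sc_visible;            // can the peer SC be seen?
};

// Returns true if losing the consensus service does not, by itself, stop
// this SC from becoming or remaining active. When the consensus service is
// reachable the question does not arise, so the function returns true.
bool ConsensusLossTolerated(const ClusterView& v) {
  if (v.consensus_reachable) return true;
  return v.relaxed_promotion_enabled && v.peer_sc_visible;
}
```

With relaxed promotion disabled, or with the peer SC out of sight, an unreachable consensus service still blocks the active role in this sketch.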

[devel] [PATCH 1/5] osaf: update etcd3 to poll instead of watch [#2996]

2019-01-20 Thread Gary Lee
The 'watch' command does not return if the etcd server goes down. We need to poll the etcd server to properly check we still have connectivity to the etcd server. --- src/osaf/consensus/plugins/etcd3.plugin | 50 ++--- 1 file changed, 40 insertions(+), 10 deletions(-)

[devel] [PATCH 0/5] Review Request for rded: add relaxed node promotion feature [#2996]

2019-01-20 Thread Gary Lee
revision 9a681198810be2e2ad3f512ff966fe1d9eceb1ab Author: Gary Lee Date: Mon, 21 Jan 2019 14:35:49 +1100 rded: add relaxed node promotion feature [#2996] Allow promotion of node to active at cluster startup, even if the consensus service is unavailable, if the peer SC can be seen. During normal cluster operation, if the

Re: [devel] [PATCH 1/1] amfd: Fix misordered and dropped item in job queue [#2981]

2018-12-11 Thread Gary Lee
Hi Minh ack with minor comments: - it seems like we have the wrong data structure here, maybe fix in an enhancement. - try to simplify the else statement, eg a single place that calls dequeue/queue? On 10/12/18 4:44 pm, Minh Chau wrote: --- src/amf/amfd/imm.cc | 54

[devel] [PATCH 0/2] Review Request for amfd: checkpoint node state to standby [#2971]

2018-11-25 Thread Gary Lee
revision 846d1b4410f47f808f7f29cdba8e4abec167d99d Author: Gary Lee Date: Mon, 26 Nov 2018 17:18:55 +1100 amfd: set userData [#2971] Depending on timing, it's possible for node_info.member to be set after this ccb callback. We should populate userData anyway, in case the active validates this callback and

[devel] [PATCH 1/2] amfd: checkpoint node state to standby [#2971]

2018-11-25 Thread Gary Lee
we need to checkpoint change to node_info.member to the standby --- src/amf/amfd/ndfsm.cc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc index 301de83..aeac062 100644 --- a/src/amf/amfd/ndfsm.cc +++ b/src/amf/amfd/ndfsm.cc @@ -821,6 +821,9 @@

[devel] [PATCH 2/2] amfd: set userData [#2971]

2018-11-25 Thread Gary Lee
Depending on timing, it's possible for node_info.member to be set after this ccb callback. We should populate userData anyway, in case the active validates this callback and then a SC failover to the standby occurs. --- src/amf/amfd/node.cc | 1 + 1 file changed, 1 insertion(+) diff --git

Re: [devel] [PATCH 1/1] amf: should not run check_nodes_after_reinit_imm() out of main [#2972]

2018-11-23 Thread Gary Lee
ack Thanks On 23/11/18 10:00 pm, thuan.tran wrote: --- src/amf/amfd/imm.cc | 4 src/amf/amfd/main.cc | 2 ++ 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc index e990288..82d2b13 100644 --- a/src/amf/amfd/imm.cc +++

[devel] [PATCH 0/1] Review Request for amfd: add node to failover_list before calling SetState [#2963]

2018-11-15 Thread Gary Lee
revision c43ae9d97d169cc4a3b57da14ed9191dca8dfba5 Author: Gary Lee Date: Fri, 16 Nov 2018 06:33:18 + amfd: add node to failover_list before calling SetState [#2963] node must be added to failover_list before SetState() is called. If the state is 'end', then it will be deleted by SetState(). Otherwise, we

[devel] [PATCH 1/1] amfd: add node to failover_list before calling SetState [#2963]

2018-11-15 Thread Gary Lee
node must be added to failover_list before SetState() is called. If the state is 'end', then it will be deleted by SetState(). Otherwise, we will leave a node in 'End' state mistakenly in failover_list. --- src/amf/amfd/ckpt_dec.cc | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff

Re: [devel] [PATCH 1/1] amfd: do not assert when events are received in End state [#2963]

2018-11-15 Thread Gary Lee
Hi Minh Please ignore. I will send another patch. On 16/11/18 4:16 pm, Gary Lee wrote: On the standby, receiving events in 'End' state is not entirely unexpected, depending on when the checkpoint arrives from the active. We should just log a message instead of asserting. 2018-11-16 00:49

[devel] [PATCH 0/1] Review Request for amfd: do not assert when events are received in End state [#2963]

2018-11-15 Thread Gary Lee
revision 2cec33e87290d99797cc3aa71314388825aa0c7f Author: Gary Lee Date: Fri, 16 Nov 2018 05:00:58 + amfd: do not assert when events are received in End state [#2963] On the standby, receiving events in 'End' state is not entirely unexpected, depending on when the checkpoint arrives from the active. We s

[devel] [PATCH 1/1] amfd: do not assert when events are received in End state [#2963]

2018-11-15 Thread Gary Lee
On the standby, receiving events in 'End' state is not entirely unexpected, depending on when the checkpoint arrives from the active. We should just log a message instead of asserting. 2018-11-16 00:49:22.096 SC-2 osafamfd[252]: NO Node 'PL-3' not found in failover_list. Create new entry

[devel] [PATCH 0/1] Review Request for amfd: set node failover state correctly on standby [#2963]

2018-11-15 Thread Gary Lee
revision 4935b51bce5f313925006d1cf1cf1c46c71f0abb Author: Gary Lee Date: Thu, 15 Nov 2018 23:30:54 + amfd: set node failover state correctly on standby [#2963] Complete diffstat: -- src/amf/amfd/node_state_machine.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Testin

[devel] [PATCH 1/1] amfd: set node failover state correctly on standby [#2963]

2018-11-15 Thread Gary Lee
--- src/amf/amfd/node_state_machine.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/amf/amfd/node_state_machine.cc b/src/amf/amfd/node_state_machine.cc index 478ad2a48..c5d86d33c 100644 --- a/src/amf/amfd/node_state_machine.cc +++ b/src/amf/amfd/node_state_machine.cc

Re: [devel] [PATCH 1/1] amfd: Give assignment for pre-instantiated su after the node joins cluster [#2960]

2018-11-15 Thread Gary Lee
Hi Minh Ack (review) > On 14 Nov 2018, at 3:37 pm, Minh Chau wrote: > > If the SU is unlock-in/unlock before the node joins cluster, the SU is > instantiated > and in unlocked state. However, when the node completes joining the cluster, > amfd > assumes all applications' SU uninstantiated

[devel] [PATCH 0/2] Review Request for osaf: ensure takeover_requests have a lease [#2954]

2018-11-12 Thread Gary Lee
ment: Canh noticed this problem and did the initial patches revision 81650ab440f6b140fd4d5d485d13e573b4e070c0 Author: Gary Lee Date: Tue, 13 Nov 2018 07:30:45 + osaf: update etcd2 and sample plugins [#2954] add timeout parameter to set and set_if_prev revision f3a64996eeb739904c7d8e1abcf4f4

[devel] [PATCH 1/2] osaf: ensure takeover_requests have a lease [#2954]

2018-11-12 Thread Gary Lee
In CreateTakeoverRequest(), if the initial attempt fails, then the takeover_request is created without a lease. Furthermore, when the takeover_request result is set, it is being set without a lease, and the takeover_request is not automatically removed. Add parameter to KeyValue::Set, and

[devel] [PATCH 2/2] osaf: update etcd2 and sample plugins [#2954]

2018-11-12 Thread Gary Lee
add timeout parameter to set and set_if_prev --- src/osaf/consensus/plugins/etcd.plugin | 20 src/osaf/consensus/plugins/sample.plugin | 16 ++-- 2 files changed, 22 insertions(+), 14 deletions(-) diff --git a/src/osaf/consensus/plugins/etcd.plugin

[devel] [PATCH 1/1] amfd: check consensus service is writable [#2957]

2018-11-12 Thread Gary Lee
A check to make sure the consensus service is writable (ie. the SC is in a partition with quorum) is present in avd_node_failover(). However, [#2918] means this function is not always being called. We need to move it. --- src/amf/amfd/ndfsm.cc | 1 + src/amf/amfd/ndproc.cc | 10 +++---

[devel] [PATCH 0/1] Review Request for amfd: check consensus service is writable [#2957]

2018-11-12 Thread Gary Lee
revision 34dc36653385a670198acdbc66f4b53699524696 Author: Gary Lee Date: Tue, 13 Nov 2018 06:23:52 + amfd: check consensus service is writable [#2957] A check to make sure the consensus service is writable (ie. the SC is in a partition with quorum) is present in avd_node_failover(). However, [#2918] means thi

Re: [devel] [PATCH 1/1] amf: active amfd should check nodes after reinit with imm [#2949]

2018-11-12 Thread Gary Lee
Hi Thuan Ack with one comment (review only), I guess avd_check_nodes_after_renit_imn() should be avd_check_nodes_after_reinit_imm()? I'll change it and push on your behalf. Thanks Gary On 12/11/18 7:25 pm, thuan.tran wrote: - When AMFD got IMM BAD_HANDLE, it will try to finalize

[devel] [PATCH 1/1] amfd: reset snd_msg_id in LostFound state [#2952]

2018-11-01 Thread Gary Lee
If a PL rejoins the main network partition before the node failover timer expires, it is told to reboot by AMFD. AMFND thinks it has become headless and resets rcv_msg_id to 0, and shows this when it receives the reboot msg from AMFD: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason:

[devel] [PATCH 0/1] Review Request for amfd: reset snd_msg_id in LostFound state [#2952]

2018-11-01 Thread Gary Lee
revision 2ddb0f3a9bc401afcecf7e17f5a629a709e27c48 Author: Gary Lee Date: Fri, 2 Nov 2018 04:57:55 + amfd: reset snd_msg_id in LostFound state [#2952] If a PL rejoins the main network partition before the node failover timer expires, it is told to reboot by AMFD. AMFND thinks it has become headless

Re: [devel] [PATCH 1/1] amfd: ensure node_sync_window_closed is set [#2946]

2018-10-30 Thread Gary Lee
>> >> -Original Message- >> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] >> Sent: 30 October 2018 12:40 >> To: Nagendra Kumar; 'Gary Lee'; hans.nordeb...@ericsson.com >> Cc: opensaf-devel@lists.sourceforge.net >> Subject: Re: [PATCH

Re: [devel] [PATCH 4/4] amfd: add support for delaying node failover [#2918]

2018-10-29 Thread Gary Lee
t find when the following code will be called: +LostFound::LostFound(NodeStateMachine *fsm) : +LostRebooting::LostRebooting(NodeStateMachine *fsm) : Thanks -Nagu -Original Message- From: Gary Lee [mailto:gary@dektech.com.au] Sent: 24 October 2018 17:57 To: hans.nordeb...@ericsson.com;

Re: [devel] [PATCH 1/1] amfd: ensure node_sync_window_closed is set [#2946]

2018-10-29 Thread Gary Lee
Hi Nagu Thanks for the quick review! ‘Also modify avd_count_node_up() not to count standby SC’ - I guess you figured out the NOT shouldn’t be there :) Thanks > On 29 Oct 2018, at 7:19 pm, Nagendra Kumar wrote: > > Also modify avd_count_node_up() not to count standby SC

[devel] [PATCH 1/1] amfd: ensure node_sync_window_closed is set [#2946]

2018-10-29 Thread Gary Lee
If all nodes are synced after headless, the timer is stopped but node_sync_window_closed is never set to true. Later on, if a node becomes split from the main network and rejoins, it will send a headless sync to amfd. amfd will go into a never ending loop of processing the message, putting back

[devel] [PATCH 0/1] Review Request for amfd: ensure node_sync_window_closed is set V2 [#2946]

2018-10-29 Thread Gary Lee
revision ee709671ab47d8b2b8c4b9d7f322bd138d787d6e Author: Gary Lee Date: Mon, 29 Oct 2018 06:36:46 + amfd: ensure node_sync_window_closed is set [#2946] If all nodes are synced after headless, the timer is stopped but node_sync_window_closed is never set to true. Later on, if a node becomes split fro

[devel] [PATCH 0/1] Review Request for amfd: ensure node_sync_window_closed is set [#2946]

2018-10-28 Thread Gary Lee
revision 1c3df910cf48ff60a2fcdf1b7bf1f7a6dec5c967 Author: Gary Lee Date: Mon, 29 Oct 2018 04:20:48 + amfd: ensure node_sync_window_closed is set [#2946] If all nodes are synced after headless, the timer is stopped but node_sync_window_closed is never set to true. Later on, if a node becomes split from the main network a

[devel] [PATCH 1/1] amfnd: change log message severity [#2945]

2018-10-24 Thread Gary Lee
--- src/amf/amfnd/clm.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/amf/amfnd/clm.cc b/src/amf/amfnd/clm.cc index f1f65bcef..06eb229c7 100644 --- a/src/amf/amfnd/clm.cc +++ b/src/amf/amfnd/clm.cc @@ -124,7 +124,7 @@ static void clm_to_amf_node(void) { error =

[devel] [PATCH 0/1] Review Request for amfnd: change log message severity [#2945]

2018-10-24 Thread Gary Lee
revision c7de076e4efbcc2c4822e7ad4f8eafa0cdf61f46 Author: Gary Lee Date: Thu, 25 Oct 2018 05:24:25 + amfnd: change log message severity [#2945] Complete diffstat: -- src/amf/amfnd/clm.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Testing Commands: - *** LIST THE COMMAND

[devel] [PATCH 4/4] amfd: add support for delaying node failover [#2918]

2018-10-24 Thread Gary Lee
OpenSAF has relied on reliable, redundant links between nodes in a cluster. This can no longer be assumed in virtualised environments. In order to avoid duplicate assignments, we need to delay node failover in environments where temporary network partitioning is expected. When delayed node

[devel] [PATCH 0/4] Review Request for amfd: add support for delaying node failover [#2918]

2018-10-24 Thread Gary Lee
t for more details and a state diagram is available there. revision 7e04f9bc5aea4f5580e3bdf0551b37c05bfc4025 Author: Gary Lee Date: Wed, 24 Oct 2018 11:37:04 + amfd: add support for delaying node failover [#2918] OpenSAF has relied on reliable, redundant links between nodes in a cluster

[devel] [PATCH 2/4] amfnd: allow reboot from any director [#2918]

2018-10-24 Thread Gary Lee
allow reboot msg to be sent from any director, for split brain recovery situations --- src/amf/amfnd/mds.cc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/amf/amfnd/mds.cc b/src/amf/amfnd/mds.cc index 1ee24cf5b..d179ff40e 100644 --- a/src/amf/amfnd/mds.cc +++

[devel] [PATCH 1/4] amfd: add class definitions for new timers [#2918]

2018-10-24 Thread Gary Lee
osafAmfDelayNodeFailoverTimeout - the number of seconds we wait after MDS down is received before we consider it truly down. osafAmfDelayNodeFailoverNodeUpWait - the number of seconds we wait for Node Up after receiving MDS up, before we send reboot to the node. After sending reboot to a node,

[devel] [PATCH 3/4] amfd: add checkpointing of node failover state [#2918]

2018-10-24 Thread Gary Lee
--- src/amf/amfd/chkop.cc| 10 ++ src/amf/amfd/ckpt.h | 3 ++- src/amf/amfd/ckpt_dec.cc | 40 +++- src/amf/amfd/ckpt_enc.cc | 26 -- src/amf/amfd/ckpt_msg.h | 1 + 5 files changed, 76 insertions(+), 4 deletions(-)

[devel] [PATCH 1/4] amfd: add class definitions for new timers [#2918]

2018-10-24 Thread Gary Lee
osafAmfDelayNodeFailoverTimeout - the number of seconds we wait after MDS down is received before we consider it truly down. osafAmfDelayNodeFailoverNodeUpWait - the number of seconds we wait for Node Up after receiving MDS up, before we send reboot to the node. After sending reboot to a node,

[devel] [PATCH 0/4] Review Request for Test Case 1:

2018-10-24 Thread Gary Lee
t for more details and a state diagram is available there. revision 7e04f9bc5aea4f5580e3bdf0551b37c05bfc4025 Author: Gary Lee Date: Wed, 24 Oct 2018 11:37:04 + amfd: add support for delaying node failover [#2918] OpenSAF has relied on reliable, redundant links between nodes in a cluster

[devel] [PATCH 3/4] amfd: add checkpointing of node failover state [#2918]

2018-10-24 Thread Gary Lee
--- src/amf/amfd/chkop.cc| 10 ++ src/amf/amfd/ckpt.h | 3 ++- src/amf/amfd/ckpt_dec.cc | 40 +++- src/amf/amfd/ckpt_enc.cc | 26 -- src/amf/amfd/ckpt_msg.h | 1 + 5 files changed, 76 insertions(+), 4 deletions(-)

[devel] [PATCH 0/1] Review Request for rded: fence only one SC if split brain is detected [#2935]

2018-10-23 Thread Gary Lee
OpenSAF services: y; Core libraries: n; Samples: n; Tests: n; Other: n. Comments (indicate scope for each "y" above): - revision 3c530e1f8cd1ed64826ac0fe4d461809af9c5d7b Author: Gary

[devel] [PATCH 1/1] rded: fence only one SC if split brain is detected [#2935]

2018-10-23 Thread Gary Lee
Keep the SC with the earlier boot time alive, if split brain is detected. In the unlikely event that the boot up time is equal, the node with the lower ID survives. --- src/rde/rded/rde_cb.h| 2 ++ src/rde/rded/rde_main.cc | 21 - src/rde/rded/rde_mds.cc | 22
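
The tie-break described above amounts to an ordering on (boot time, node ID). A minimal sketch, assuming each SC knows both values for itself and for its peer; `ScInfo` and `SelfSurvives` are hypothetical names, not taken from rde_main.cc or rde_mds.cc:

```cpp
#include <cstdint>

// Hypothetical description of one system controller (illustrative only).
struct ScInfo {
  uint64_t boot_time;  // when the node booted; smaller means earlier
  uint32_t node_id;
};

// Returns true if `self` is the SC that stays alive when split brain is
// detected: the earlier boot time wins, and if the boot times are equal
// the lower node ID wins.
bool SelfSurvives(const ScInfo& self, const ScInfo& peer) {
  if (self.boot_time != peer.boot_time) {
    return self.boot_time < peer.boot_time;
  }
  return self.node_id < peer.node_id;
}
```

As long as the two node IDs differ and both SCs evaluate the same pair of values, exactly one of them concludes that it survives, so only one SC is fenced.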

Re: [devel] [PATCH 0/2] Review Request for amf: Handle the excessive assignments for 2N, Nway Active, NoRed due to split brain V2 [#2929]

2018-10-15 Thread Gary Lee
Hi Minh ack (review only). Perhaps you could remove the TRACE_LEAVE() though. Thanks Gary On 16/10/18 09:16, Minh Chau wrote: Summary: amf: Add new susi fsm EXCESSIVE state to handle excessive assignment due to splitbrain V2 [#2929] Review request for Ticket(s): 2929 Peer Reviewer(s):

Re: [devel] [PATCH 1/1] amf: retry when implementer set returned ERR_EXIST [#2921]

2018-10-07 Thread Gary Lee
Hi One comment inline. /Gary On 3/10/18, 7:00 am, "Minh Hon Chau" wrote: Hi Thang, It looks no harm to me if we do some retries to set implementer. You may need Gary/Hans to have a look, since it relates to consensus case. Thanks Minh On

[devel] [PATCH 1/3] base: remove use of SIGHUP to toggle INFO messages [#2923]

2018-09-21 Thread Gary Lee
We need to use SIGHUP for reload of configuration --- src/base/daemon.c| 5 +++-- src/base/logtrace.cc | 22 -- 2 files changed, 3 insertions(+), 24 deletions(-) diff --git a/src/base/daemon.c b/src/base/daemon.c index 4a37c4174..cdde7fde0 100644 --- a/src/base/daemon.c

[devel] [PATCH 0/3] Review Request for fmd: enable reload of configuration without restart V3 [#2923]

2018-09-21 Thread Gary Lee
addressed V2 comments, removed use of regex revision 0900ae075f9bbdc2fc7e8bdb87affd343e621d8d Author: Gary Lee Date: Fri, 21 Sep 2018 04:16:53 + fmd: enable reload of configuration without restart [#2923] Only FMS_PROMOTE_ACTIVE_TIMER, FMS_NODE_ISOLATION_TIMEOUT and FMS_ACTIVATION_SUPERVI

Re: [devel] [PATCH 2/3] base: add config file reader [#2923]

2018-09-20 Thread Gary Lee
. /Thanks HansN On 09/20/2018 02:09 PM, Gary Lee wrote: > Hi Hans > > Just tested it with 4.9.3 and it works fine. > > It doesn't link with 4.8.5 > > config_file_reader.cc:(.text+0x1c8): undefined reference to `std::regex_token_iterator<__gn

Re: [devel] [PATCH 2/3] base: add config file reader [#2923]

2018-09-20 Thread Gary Lee
ect: Re: [PATCH 2/3] base: add config file reader [#2923] ack, review only. Some comments below. /Thanks HansN On 09/19/2018 05:42 AM, Gary Lee wrote: > Some configuration attributes are read by OpenSAF daemons as >

Re: [devel] [PATCH 2/3] base: add config file reader [#2923]

2018-09-20 Thread Gary Lee
er [#2923] ack, review only. Some comments below. /Thanks HansN On 09/19/2018 05:42 AM, Gary Lee wrote: > Some configuration attributes are read by OpenSAF daemons as > environment variables. eg. > > export FMS_PROMOTE_ACTIVE_TIMER=0 >

Re: [devel] [PATCH 1/1] smf: improve CcbApplyCallback() to avoid NULL access [#2927]

2018-09-20 Thread Gary Lee
Ack (review only) -Original Message- From: "thuan.tran" Date: Monday, 17 September 2018 at 9:39 pm To: , gary Cc: , "thuan.tran" Subject: [PATCH 1/1] smf: improve CcbApplyCallback() to avoid NULL access [#2927] Current CcbApplyCallback() is not safe, may access NULL pointer

[devel] [PATCH 3/3] fmd: enable reload of configuration without restart [#2923]

2018-09-18 Thread Gary Lee
Only FMS_PROMOTE_ACTIVE_TIMER, FMS_NODE_ISOLATION_TIMEOUT and FMS_ACTIVATION_SUPERVISION_TIMER are currently supported. These values can be changed in fmd.conf and take effect by sending SIGHUP to fmd. --- src/fm/fmd/fm_cb.h | 2 ++ src/fm/fmd/fm_main.cc | 70
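
A common way to wire up a SIGHUP-triggered reload is to let the signal handler set a flag and have the main loop act on it. The sketch below shows that general pattern only, not the code in fm_main.cc; `InstallSighupHandler` and `ConsumeReloadRequest` are hypothetical names.

```cpp
#include <signal.h>
#include <string.h>

// Set by the handler, cleared by the main loop. sig_atomic_t is the type
// that may safely be written from a signal handler.
static volatile sig_atomic_t reload_requested = 0;

static void OnSighup(int /*signo*/) { reload_requested = 1; }

// Installs the SIGHUP handler. SA_RESTART asks the kernel to restart
// interrupted system calls instead of failing them with EINTR.
bool InstallSighupHandler() {
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = OnSighup;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  return sigaction(SIGHUP, &sa, nullptr) == 0;
}

// Called from the main loop: returns true once per pending request, at
// which point the caller would re-read fmd.conf.
bool ConsumeReloadRequest() {
  if (!reload_requested) return false;
  reload_requested = 0;
  return true;
}
```

Keeping the handler down to a single flag write avoids calling functions that are not async-signal-safe; the file parsing happens later in normal context. A daemon built around poll() would more likely wake its loop through a file descriptor, which this sketch leaves out.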

[devel] [PATCH 0/3] Review Request for fmd: enable reload of configuration without restart V2 [#2923]

2018-09-18 Thread Gary Lee
revision a297b2e05fd428f2088c4d1c345ae6052fdf51f1 Author: Gary Lee Date: Wed, 19 Sep 2018 03:33:52 + fmd: enable reload of configuration without restart [#2923] Only FMS_PROMOTE_ACTIVE_TIMER, FMS_NODE_ISOLATION_TIMEOUT and FMS_ACTIVATION_SUPERVISION_TIMER are currently supported. These values can be changed i

[devel] [PATCH 1/3] base: remove use of SIGHUP to toggle INFO messages [#2923]

2018-09-18 Thread Gary Lee
We need to use SIGHUP for reload of configuration --- src/base/daemon.c| 5 +++-- src/base/logtrace.cc | 22 -- 2 files changed, 3 insertions(+), 24 deletions(-) diff --git a/src/base/daemon.c b/src/base/daemon.c index 4a37c4174..cdde7fde0 100644 --- a/src/base/daemon.c

[devel] [PATCH 2/3] base: add config file reader [#2923]

2018-09-18 Thread Gary Lee
Some configuration attributes are read by OpenSAF daemons as environment variables. eg. export FMS_PROMOTE_ACTIVE_TIMER=0 There is no easy way to reload these values without a restart. ConfigFileReader will parse these files looking for 'export VAR=VAL' and store them into a map, so a daemon can

Re: [devel] [PATCH 0/3] Review Request for fmd: enable reload of configuration without restart [#2923]

2018-09-18 Thread Gary Lee
Sorry, I will send out a V2 soon. Please ignore. I haven't handled the case where you send SIGHUP to a process that doesn't explicitly handle it. On 18/09/18 18:13, Gary Lee wrote: Summary: base: remove use of SIGHUP to toggle INFO messages [#2923] Review request for Ticket(s): 2923 Peer

[devel] [PATCH 3/3] fmd: enable reload of configuration without restart [#2923]

2018-09-18 Thread Gary Lee
Only FMS_PROMOTE_ACTIVE_TIMER, FMS_NODE_ISOLATION_TIMEOUT and FMS_ACTIVATION_SUPERVISION_TIMER are currently supported. These values can be changed in fmd.conf and take effect by sending SIGHUP to fmd. --- src/fm/fmd/fm_cb.h | 2 ++ src/fm/fmd/fm_main.cc | 70

[devel] [PATCH 2/3] base: add config file reader [#2923]

2018-09-18 Thread Gary Lee
Some configuration attributes are read by OpenSAF daemons as environment variables. eg. export FMS_PROMOTE_ACTIVE_TIMER=0 There is no easy way to reload these values without a restart. ConfigFileReader will parse these files looking for 'export VAR=VAL' and store them into a map, so a daemon can

[devel] [PATCH 0/3] Review Request for fmd: enable reload of configuration without restart [#2923]

2018-09-18 Thread Gary Lee
revision 093496e7869f7dc205ec9f176d5be30d75564c34 Author: Gary Lee Date: Tue, 18 Sep 2018 08:05:40 + fmd: enable reload of configuration without restart [#2923] Only FMS_PROMOTE_ACTIVE_TIMER, FMS_NODE_ISOLATION_TIMEOUT and FMS_ACTIVATION_SUPERVISION_TIMER are currently supported. These values can be changed i

[devel] [PATCH 1/3] base: remove use of SIGHUP to toggle INFO messages [#2923]

2018-09-18 Thread Gary Lee
We need to use SIGHUP for reload of configuration --- src/base/logtrace.cc | 22 -- 1 file changed, 22 deletions(-) diff --git a/src/base/logtrace.cc b/src/base/logtrace.cc index c1a194f60..8908c1ff3 100644 --- a/src/base/logtrace.cc +++ b/src/base/logtrace.cc @@ -75,22 +75,6

Re: [devel] [PATCH 1/1] amfd: reboot nodes that report conflicting 2N active assignments [#2920]

2018-09-03 Thread Gary Lee
) - OpenSAF Support and Services  - Original Message - Subject: Re: [PATCH 1/1] amfd: reboot nodes that report conflicting 2N active assignments [#2920] From: "Gary Lee" Date: 9/3/18 1:36 pm To: nagen...@hasolutions.in, hans.nordeb...@ericsson.com,

Re: [devel] [PATCH 1/1] amfd: reboot nodes that report conflicting 2N active assignments [#2920]

2018-09-03 Thread Gary Lee
Solutions Pvt. Ltd. (www.hasolutions.in) - OpenSAF Support and Services  - Original Message - Subject: Re: [PATCH 1/1] amfd: reboot nodes that report conflicting 2N active assignments [#2920] From: "Gary Lee" Date: 9/3/18 1:36 pm To: nagen...@haso

[devel] [PATCH 0/1] Review Request for hans.nordeb...@ericsson.com, minh.c...@dektech.com.au, nagen...@hasolutions.in

2018-09-03 Thread Gary Lee
" above): - revision 1cb5f00bb25729129ee7dc2edf11790a0debbd11 Author: Gary Lee Date: Fri, 31 Aug 2018 06:37:03 + amfd: reboot nodes that report conflicting 2N active assignments [#2920] After a split network event, both SCs can reboot endlessly, due to this assertion: 2018-08-29 18:05:34.689 SC-

[devel] [PATCH 1/1] amfd: reboot nodes that report conflicting 2N active assignments [#2920]

2018-09-03 Thread Gary Lee
After a split network event, both SCs can reboot endlessly, due to this assertion: 2018-08-29 18:05:34.689 SC-2 osafamfd[263]: src/amf/amfd/sg_2n_fsm.cc:596: avd_sg_2n_act_susi: Assertion 'a_susi_1->su == a_susi_2->su' failed. 2018-08-29 18:05:34.695 SC-2 osafamfnd[273]: ER AMFD has

Re: [devel] [PATCH 1/1] amfd: reboot nodes that report conflicting 2N active assignments [#2920]

2018-09-03 Thread Gary Lee
Hi Nagendra On 03/09/18 17:50, nagen...@hasolutions.in wrote: Hi Gary, I have few questions: 1. Do we really want to reboot both the nodes in case of conflicts? That's a good question. A cluster reboot should also be considered? I have proposed both nodes as it's somewhere in between. Keep

[devel] [PATCH 0/1] Review Request for amfd: reboot nodes that report conflicting 2N active assignments [#2920]

2018-08-31 Thread Gary Lee
" above): - revision 1cb5f00bb25729129ee7dc2edf11790a0debbd11 Author: Gary Lee Date: Fri, 31 Aug 2018 06:37:03 + amfd: reboot nodes that report conflicting 2N active assignments [#2920] After a split network event, both SCs can reboot endlessly, due to this

[devel] [PATCH 1/1] amfd: reboot nodes that report conflicting 2N active assignments [#2920]

2018-08-31 Thread Gary Lee
After a split network event, both SCs can reboot endlessly, due to this assertion: 2018-08-29 18:05:34.689 SC-2 osafamfd[263]: src/amf/amfd/sg_2n_fsm.cc:596: avd_sg_2n_act_susi: Assertion 'a_susi_1->su == a_susi_2->su' failed. 2018-08-29 18:05:34.695 SC-2 osafamfnd[273]: ER AMFD has

Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

2018-08-27 Thread Gary Lee
/2018 11:52 PM, Gary Lee wrote: NOTICE: This email was received from an EXTERNAL sender Hi Alex Thanks, it looks much better. So I tried ‘killall

Re: [devel] [PATCH 1/1] amfd: Set SA_AMF_READINESS_IN_SERVICE for qualified SU after cluster startup timeout [#2916]

2018-08-22 Thread Gary Lee
Hi Minh ack (review) Thanks Gary On 22/08/18 20:23, Minh Chau wrote: In the scenario of single step upgrade where the UNLOCK-IN/UNLOCK admin op are issued to a SU hosted on non-active node while cluster startup timer is active and not all ncs SU on that node are fully assigned. In such case,

[devel] [PATCH 0/1] Review Request for osaf: make takeover request expiration time configurable [#2917]

2018-08-21 Thread Gary Lee
9a2 Author: Gary Lee Date: Wed, 22 Aug 2018 04:37:33 + osaf: make takeover request expiration time configurable [#2917] Complete diffstat: -- src/fm/fmd/fmd.conf | 4 src/osaf/consensus/consensus.cc | 24 src/osaf/consensus/consens

[devel] [PATCH 1/1] osaf: make takeover request expiration time configurable [#2917]

2018-08-21 Thread Gary Lee
--- src/fm/fmd/fmd.conf | 4 src/osaf/consensus/consensus.cc | 24 src/osaf/consensus/consensus.h | 4 ++-- 3 files changed, 22 insertions(+), 10 deletions(-) diff --git a/src/fm/fmd/fmd.conf b/src/fm/fmd/fmd.conf index 9aff54970..9a106bf90 100644 ---

[devel] [PATCH 0/1] Review Request for osaf: modify log severity level in Consensus::Demote [#2912]

2018-08-16 Thread Gary Lee
n OpenSAF services n Core libraries y Samples n Tests n Other n Comments (indicate scope for each "y" above): - revision 46e2be0e8ce001bc7349b4556847b0aa7c427772 Au

[devel] [PATCH 1/1] osaf: modify log severity level in Consensus::Demote [#2912]

2018-08-16 Thread Gary Lee
All callers of Consensus::Demote() already log an error if the return code is not SA_AIS_OK. A warning message will suffice. --- src/osaf/consensus/consensus.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc

Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

2018-08-15 Thread Gary Lee
Gary From: Alex Jones Organization: Ribbon Date: Thursday, 16 August 2018 at 3:41 am To: Gary Lee , , , Cc: Subject: Re: [PATCH 1/1] amf: add support for container/contained [#70] G'day Gary, I see you were adding the XML file dynamically with "immcfg -f". I hadn't tri

Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

2018-08-13 Thread Gary Lee
here. But, #2 (rejecting all but NWay-active for container) should already be in there. Is there a specific test you ran that didn't work? Alex On 08/13/2018 02:43 AM, Gary Lee wrote: NOTICE: This email was received from

Re: [devel] [PATCH 1/1] amf: add support for container/contained [#70]

2018-08-13 Thread Gary Lee
Hi Alex Some initial comments: 0. Is it possible to split up the patch into amfd / amfnd / common / samples. Just makes it easier to reply inline. 1. Please compile the container demo by default, and make amf_container_script world executable. Eg. diff --git a/samples/amf/Makefile.am

Re: [devel] [PATCH 2/2] rded: perform KV store operations outside main thread [#2905]

2018-08-07 Thread Gary Lee
Hi Hans Please see reply inline. Thanks On 07/08/18 17:04, Hans Nordeback wrote: Hi Gary, ack, review only. Minor comments below. /Thanks HansN On 08/01/2018 06:49 AM, Gary Lee wrote: Sometimes the 'watch' command in the KV plugin will not return a takeover request, if the KV store does
