Improve failover response time if split brain prevention is enabled
but FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is set to 0.
Also, return immediately if node promotion fails to avoid
sending active role to RDA
---
src/fm/fmd/fm_rda.cc | 14 +-
1 file changed, 9 insertions(+), 5
revision 6deba4a116fd04687b0f4f7fdc9cf96db1a77234
Author: Gary Lee
Date: Mon, 18 Feb 2019 15:34:02 +1100
fmd: improve failover response time [#3008]
Improve failover response time if split brain prevention is enabled
but FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is set to 0.
Also, return immediately if node promotion fail
Ack, review only.
> On 15 Feb 2019, at 5:51 pm, Minh Chau wrote:
>
> ---
> src/fm/fmd/fm_rda.cc | 4 ++--
> src/rde/rded/rde_main.cc | 8 +++-
> src/rde/rded/role.cc | 8
> 3 files changed, 9 insertions(+), 11 deletions(-)
>
> diff --git a/src/fm/fmd/fm_rda.cc
955487e18b
Author: Gary Lee
Date: Sat, 9 Feb 2019 14:02:13 +1100
base: fix warnings [#3006]
fix warnings about unused variables and add SA_RESTART
Complete diffstat:
--
src/base/daemon.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
Testin
a memory leak may occur if a takeover_request is processed after
split brain prevention is disabled at runtime
---
src/rde/rded/rde_main.cc | 8 +---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index f511814..bb17133 100644
refactor amfd/fmd/rded to use daemon_sighup_install
---
src/amf/amfd/main.cc | 27 +--
src/base/daemon.c | 36 +++-
src/base/daemon.h | 3 +++
src/fm/fmd/fm_main.cc | 28 +---
---
src/amf/amfd/cb.h | 1 +
src/amf/amfd/main.cc | 32
src/amf/amfd/osaf-amfd.in | 1 +
3 files changed, 34 insertions(+)
diff --git a/src/amf/amfd/cb.h b/src/amf/amfd/cb.h
index 7ac743e..89cf15d 100644
--- a/src/amf/amfd/cb.h
+++
anges from V1:
add daemon_sighup_install after review
fix memory leak in rded
--
revision 32573b69eae74db365b4842ff41c00789d6f295d
Author: Gary Lee
Date: Fri, 8 Feb 2019 14:20:25 +1100
rded: fix leak when processing takeover_request [#3006]
a memory leak may occur if a takeover_request is
If enabled at runtime and this node is active, promote this node
in consensus service.
If disabled at runtime, watch threads will terminate gracefully when
the plugin exits after losing connectivity to the consensus service.
---
src/rde/rded/osaf-rded.in | 1 +
src/rde/rded/rde_main.cc | 59
brain prevention when it's
running, when it's done we might have to document this runtime change.
Thanks
Minh
On 4/2/19 9:41 pm, Gary Lee wrote:
Summary: osaf: add ability to reload config from fmd.conf [#3006]
Review request for Ticket(s): 3006
Peer Reviewer(s): Hans, Minh
Pull request
If enabled at runtime and this node is active, promote this node
in consensus service.
If disabled at runtime, watch threads will terminate gracefully when
the plugin exits after losing connectivity to the consensus service.
---
src/rde/rded/osaf-rded.in | 1 +
src/rde/rded/rde_main.cc | 59
---
src/fm/fmd/fm_main.cc | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/fm/fmd/fm_main.cc b/src/fm/fmd/fm_main.cc
index b473b68..ea70f24 100644
--- a/src/fm/fmd/fm_main.cc
+++ b/src/fm/fmd/fm_main.cc
@@ -306,6 +306,8 @@ int main(int argc, char *argv[]) {
if
Add ReloadConfiguration() function - when called it will
read fmd.conf and look for 'export FMS_X=', and overwrite
current environment variable settings in the caller.
This allows split brain prevention parameters to be changed at
runtime without a node restart.
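A minimal sketch of the reload idea described above, not the OpenSAF implementation: scan a config stream for 'export NAME=VALUE' lines, keep the names that start with a given prefix, and overwrite the caller's environment. The helper names ParseExports and ApplyToEnvironment are hypothetical, and quoting or trailing comments in values are deliberately not handled.

```cpp
#include <cstdlib>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper, not the OpenSAF code: collect every
// 'export NAME=VALUE' line whose NAME starts with `prefix`.
std::map<std::string, std::string> ParseExports(std::istream& in,
                                                const std::string& prefix) {
  std::map<std::string, std::string> vars;
  const std::string kExport = "export ";
  std::string line;
  while (std::getline(in, line)) {
    // Skip blank lines and comment lines (first non-blank char is '#').
    const auto start = line.find_first_not_of(" \t");
    if (start == std::string::npos || line[start] == '#') continue;
    // Only lines that begin with the 'export ' keyword are considered.
    if (line.compare(start, kExport.size(), kExport) != 0) continue;
    const auto name_pos = start + kExport.size();
    const auto eq = line.find('=', name_pos);
    if (eq == std::string::npos) continue;
    const std::string name = line.substr(name_pos, eq - name_pos);
    if (name.compare(0, prefix.size(), prefix) != 0) continue;
    vars[name] = line.substr(eq + 1);  // value is taken verbatim
  }
  return vars;
}

// Overwrite the caller's environment with the parsed settings.
void ApplyToEnvironment(const std::map<std::string, std::string>& vars) {
  for (const auto& kv : vars) {
    setenv(kv.first.c_str(), kv.second.c_str(), 1);  // 1 = overwrite
  }
}
```

A daemon could call ParseExports on an std::ifstream opened on its config file and then ApplyToEnvironment, so that later getenv() calls see the new values.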
---
revision b5206b54fbc5462eaf6f0599d2c449f22087635d
Author: Gary Lee
Date: Mon, 4 Feb 2019 21:33:11 +1100
rded: reload split brain prevention parameters on SIGHUP [#3006]
If enabled at runtime and this node is active, promote this node
in consensus service.
If disabled at runtime, watch threads will terminate
---
src/amf/amfd/cb.h | 1 +
src/amf/amfd/main.cc | 32
src/amf/amfd/osaf-amfd.in | 1 +
3 files changed, 34 insertions(+)
diff --git a/src/amf/amfd/cb.h b/src/amf/amfd/cb.h
index 7ac743e..89cf15d 100644
--- a/src/amf/amfd/cb.h
+++
revision 09af1afd94395624247750749bdf1f7c9e325fd4
Author: Gary Lee
Date: Tue, 29 Jan 2019 15:08:15 +1100
rded: process takeover request without delay [#3005]
Currently, a takeover request is not sent to the main thread immediately
so that MDS messages related to topology changes are processed first.
If the plugin
Currently, a takeover request is not sent to the main thread immediately
so that MDS messages related to topology changes are processed first.
If the plugin informs us it has lost connectivity to the consensus service
by returning 'UNDEFINED', or we prioritise the current active SC,
then we
revision d3dba70d552d21717bf83a17c5f7ef4e653f230a
Author: Gary Lee
Date: Thu, 24 Jan 2019 12:11:36 +1100
osaf: update sample plugin [#3003]
revision 26f675ea92b6b4ba9045182d083dcf607e6d123a
Author: Gary Lee
Date: Thu, 24 Jan 2019 12:11:36 +1100
osaf: update etcd v2 plugin [#3003]
'etcdctl watch' will return if
'etcdctl watch' will return if connection to the etcd server is lost.
If that occurs, send a 'fake' takeover request to rded so rded
will reboot the node. This is in alignment with the etcd v3 plugin.
---
src/osaf/consensus/plugins/etcd.plugin | 29 +
1 file changed,
---
src/osaf/consensus/plugins/sample.plugin | 20 +++-
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/src/osaf/consensus/plugins/sample.plugin
b/src/osaf/consensus/plugins/sample.plugin
index fc4c54c..cadb9e0 100644
--- a/src/osaf/consensus/plugins/sample.plugin
in
amfd. I intentionally separated it out from [5], so [5] has rded changes
only.
On 21/1/19 9:43 pm, Minh Hon Chau wrote:
Hi Gary,
I'm trying to understand the patch 3/5 and 4/5, there seems to be
logic of *relaxed mode* left in 3/5 and 4/5.
Thanks
Minh
On 21/1/19 2:52 pm, Gary Lee
Add parameters FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE and
FMS_RELAXED_NODE_PROMOTION.
---
src/fm/fmd/fmd.conf | 17 +
1 file changed, 17 insertions(+)
diff --git a/src/fm/fmd/fmd.conf b/src/fm/fmd/fmd.conf
index 9a106bf..209e484 100644
--- a/src/fm/fmd/fmd.conf
+++
If relaxed node promotion is enabled, allow a SC to remain
active if the peer SC can be seen, even if access to the
consensus service is lost.
---
src/amf/amfd/ndfsm.cc | 2 +-
src/amf/amfd/ndproc.cc | 13 +++--
src/amf/amfd/proc.h | 2 +-
3 files changed, 13 insertions(+), 4
Add FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE option to allow
active SC to be preferred during a network split. The default
behavior is to prefer the larger partition to maintain
existing behaviour.
Add configuration support for FMS_RELAXED_NODE_PROMOTION.
---
src/osaf/consensus/consensus.cc | 39
Allow promotion of node to active at cluster startup, even if the
consensus service is unavailable, if the peer SC can be seen.
During normal cluster operation, if the consensus service becomes
unavailable but the peer SC can still be seen, allow the existing
active SC to remain active.
A new
The 'watch' command does not return if the etcd server goes down.
We need to poll the etcd server to properly check we still have
connectivity to the etcd server.
---
src/osaf/consensus/plugins/etcd3.plugin | 50 ++---
1 file changed, 40 insertions(+), 10 deletions(-)
revision 9a681198810be2e2ad3f512ff966fe1d9eceb1ab
Author: Gary Lee
Date: Mon, 21 Jan 2019 14:35:49 +1100
rded: add relaxed node promotion feature [#2996]
Allow promotion of node to active at cluster startup, even if the
consensus service is unavailable, if the peer SC can be seen.
During normal cluster operation, if the
Hi Minh
ack with minor comments:
- it seems like we have the wrong data structure here, maybe fix in an
enhancement.
- try to simplify the else statement, eg a single place that calls
dequeue/queue?
On 10/12/18 4:44 pm, Minh Chau wrote:
---
src/amf/amfd/imm.cc | 54
revision 846d1b4410f47f808f7f29cdba8e4abec167d99d
Author: Gary Lee
Date: Mon, 26 Nov 2018 17:18:55 +1100
amfd: set userData [#2971]
Depending on timing, it's possible for node_info.member to be set
after this ccb callback. We should populate userData anyway, in case
the active validates this callback and
we need to checkpoint change to node_info.member to the
standby
---
src/amf/amfd/ndfsm.cc | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/amf/amfd/ndfsm.cc b/src/amf/amfd/ndfsm.cc
index 301de83..aeac062 100644
--- a/src/amf/amfd/ndfsm.cc
+++ b/src/amf/amfd/ndfsm.cc
@@ -821,6 +821,9 @@
Depending on timing, it's possible for node_info.member to be set
after this ccb callback. We should populate userData anyway, in case
the active validates this callback and then a SC failover to the
standby occurs.
---
src/amf/amfd/node.cc | 1 +
1 file changed, 1 insertion(+)
diff --git
ack
Thanks
On 23/11/18 10:00 pm, thuan.tran wrote:
---
src/amf/amfd/imm.cc | 4
src/amf/amfd/main.cc | 2 ++
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/src/amf/amfd/imm.cc b/src/amf/amfd/imm.cc
index e990288..82d2b13 100644
--- a/src/amf/amfd/imm.cc
+++
revision c43ae9d97d169cc4a3b57da14ed9191dca8dfba5
Author: Gary Lee
Date: Fri, 16 Nov 2018 06:33:18 +
amfd: add node to failover_list before calling SetState [#2963]
node must be added to failover_list before SetState() is called.
If the state is 'end', then it will be deleted by SetState().
Otherwise, we
node must be added to failover_list before SetState() is called.
If the state is 'end', then it will be deleted by SetState().
Otherwise, we will leave a node in 'End' state mistakenly in
failover_list.
---
src/amf/amfd/ckpt_dec.cc | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff
Hi Minh
Please ignore. I will send another patch.
On 16/11/18 4:16 pm, Gary Lee wrote:
On the standby, receiving events in 'End' state is not entirely unexpected,
depending on when the checkpoint arrives from the active. We should just log
a message instead of asserting.
2018-11-16 00:49
revision 2cec33e87290d99797cc3aa71314388825aa0c7f
Author: Gary Lee
Date: Fri, 16 Nov 2018 05:00:58 +
amfd: do not assert when events are received in End state [#2963]
On the standby, receiving events in 'End' state is not entirely unexpected,
depending
on when the checkpoint arrives from the active. We s
On the standby, receiving events in 'End' state is not entirely unexpected,
depending on when the checkpoint arrives from the active. We should just log
a message instead of asserting.
2018-11-16 00:49:22.096 SC-2 osafamfd[252]: NO Node 'PL-3' not found in
failover_list. Create new entry
revision 4935b51bce5f313925006d1cf1cf1c46c71f0abb
Author: Gary Lee
Date: Thu, 15 Nov 2018 23:30:54 +
amfd: set node failover state correctly on standby [#2963]
Complete diffstat:
--
src/amf/amfd/node_state_machine.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Testin
---
src/amf/amfd/node_state_machine.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/amf/amfd/node_state_machine.cc
b/src/amf/amfd/node_state_machine.cc
index 478ad2a48..c5d86d33c 100644
--- a/src/amf/amfd/node_state_machine.cc
+++ b/src/amf/amfd/node_state_machine.cc
Hi Minh
Ack (review)
> On 14 Nov 2018, at 3:37 pm, Minh Chau wrote:
>
> If the SU is unlock-in/unlock before the node joins cluster, the SU is
> instantiated and in unlocked state. However, when the node completes joining
> the cluster, amfd assumes all applications' SU uninstantiated
ment: Canh noticed this problem and did
the initial patches
revision 81650ab440f6b140fd4d5d485d13e573b4e070c0
Author: Gary Lee
Date: Tue, 13 Nov 2018 07:30:45 +
osaf: update etcd2 and sample plugins [#2954]
add timeout parameter to set and set_if_prev
revision f3a64996eeb739904c7d8e1abcf4f4
In CreateTakeoverRequest(), if the initial attempt fails,
then the takeover_request is created without a lease.
Furthermore, when the takeover_request result is set,
it is being set without a lease, and the takeover_request
is not automatically removed.
Add parameter to KeyValue::Set, and
add timeout parameter to set and set_if_prev
---
src/osaf/consensus/plugins/etcd.plugin | 20
src/osaf/consensus/plugins/sample.plugin | 16 ++--
2 files changed, 22 insertions(+), 14 deletions(-)
diff --git a/src/osaf/consensus/plugins/etcd.plugin
A check to make sure the consensus service is writable (ie. the SC is in a
partition with quorum) is present in avd_node_failover(). However, [#2918]
means this function is not always being called.
We need to move it.
---
src/amf/amfd/ndfsm.cc | 1 +
src/amf/amfd/ndproc.cc | 10 +++---
revision 34dc36653385a670198acdbc66f4b53699524696
Author: Gary Lee
Date: Tue, 13 Nov 2018 06:23:52 +
amfd: check consensus service is writable [#2957]
A check to make sure the consensus service is writable (ie. the SC is in a
partition with quorum)
is present in avd_node_failover(). However, [#2918] means thi
Hi Thuan
Ack with one comment (review only), I guess
avd_check_nodes_after_renit_imn() and should be
avd_check_nodes_after_reinit_imm()?
I'll change it and push on your behalf.
Thanks
Gary
On 12/11/18 7:25 pm, thuan.tran wrote:
- When AMFD got IMM BAD_HANDLE, it will try to finalize
If a PL rejoins the main network partition before the node failover timer
expires, it is told to reboot by AMFD. AMFND thinks it has become headless and
resets rcv_msg_id to 0, and shows this when it receives the reboot msg from
AMFD:
Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason:
revision 2ddb0f3a9bc401afcecf7e17f5a629a709e27c48
Author: Gary Lee
Date: Fri, 2 Nov 2018 04:57:55 +
amfd: reset snd_msg_id in LostFound state [#2952]
If a PL rejoins the main network partition before the node failover timer
expires, it is told to reboot by AMFD. AMFND thinks it has become headless
>>
>> -Original Message-
>> From: Minh Hon Chau [mailto:minh.c...@dektech.com.au]
>> Sent: 30 October 2018 12:40
>> To: Nagendra Kumar; 'Gary Lee'; hans.nordeb...@ericsson.com
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH
t find when the following code will be called:
+LostFound::LostFound(NodeStateMachine *fsm) :
+LostRebooting::LostRebooting(NodeStateMachine *fsm) :
Thanks
-Nagu
-Original Message-
From: Gary Lee [mailto:gary@dektech.com.au]
Sent: 24 October 2018 17:57
To: hans.nordeb...@ericsson.com;
Hi Nagu
Thanks for the quick review!
‘Also modify avd_count_node_up() not to count standby SC’ - I guess you figured
out the NOT shouldn’t be there :)
Thanks
> On 29 Oct 2018, at 7:19 pm, Nagendra Kumar wrote:
>
> Also modify avd_count_node_up() not to count standby SC
If all nodes are synced after headless, the timer is stopped
but node_sync_window_closed is never set to true.
Later on, if a node becomes split from the main network and
rejoins, it will send a headless sync to amfd.
amfd will go into a never ending loop of processing the message,
putting back
revision ee709671ab47d8b2b8c4b9d7f322bd138d787d6e
Author: Gary Lee
Date: Mon, 29 Oct 2018 06:36:46 +
amfd: ensure node_sync_window_closed is set [#2946]
If all nodes are synced after headless, the timer is stopped
but node_sync_window_closed is never set to true.
Later on, if a node becomes split fro
revision 1c3df910cf48ff60a2fcdf1b7bf1f7a6dec5c967
Author: Gary Lee
Date: Mon, 29 Oct 2018 04:20:48 +
amfd: ensure node_sync_window_closed is set [#2946]
If all nodes are synced after headless, the timer is stopped
but node_sync_window_closed is never set to true.
Later on, if a node becomes split from the main network a
---
src/amf/amfnd/clm.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/amf/amfnd/clm.cc b/src/amf/amfnd/clm.cc
index f1f65bcef..06eb229c7 100644
--- a/src/amf/amfnd/clm.cc
+++ b/src/amf/amfnd/clm.cc
@@ -124,7 +124,7 @@ static void clm_to_amf_node(void) {
error =
revision c7de076e4efbcc2c4822e7ad4f8eafa0cdf61f46
Author: Gary Lee
Date: Thu, 25 Oct 2018 05:24:25 +
amfnd: change log message severity [#2945]
Complete diffstat:
--
src/amf/amfnd/clm.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Testing Commands:
-
*** LIST THE COMMAND
OpenSAF has relied on reliable, redundant links between nodes in a cluster.
This can no longer be assumed in virtualised environments.
In order to avoid duplicate assignments, we need to delay
node failover in environments where temporary network partitioning is expected.
When delayed node
t for more details and a state diagram is available there.
revision 7e04f9bc5aea4f5580e3bdf0551b37c05bfc4025
Author: Gary Lee
Date: Wed, 24 Oct 2018 11:37:04 +
amfd: add support for delaying node failover [#2918]
OpenSAF has relied on reliable, redundant links between nodes in a cluster
allow reboot msg to be sent from any director, for
split brain recovery situations
---
src/amf/amfnd/mds.cc | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/amf/amfnd/mds.cc b/src/amf/amfnd/mds.cc
index 1ee24cf5b..d179ff40e 100644
--- a/src/amf/amfnd/mds.cc
+++
osafAmfDelayNodeFailoverTimeout - the number of seconds we wait
after MDS down is received before we consider it truly down.
osafAmfDelayNodeFailoverNodeUpWait - the number of seconds we
wait for Node Up after receiving MDS up, before we send reboot
to the node. After sending reboot to a node,
---
src/amf/amfd/chkop.cc | 10 ++
src/amf/amfd/ckpt.h | 3 ++-
src/amf/amfd/ckpt_dec.cc | 40 +++-
src/amf/amfd/ckpt_enc.cc | 26 --
src/amf/amfd/ckpt_msg.h | 1 +
5 files changed, 76 insertions(+), 4 deletions(-)
osafAmfDelayNodeFailoverTimeout - the number of seconds we wait
after MDS down is received before we consider it truly down.
osafAmfDelayNodeFailoverNodeUpWait - the number of seconds we
wait for Node Up after receiving MDS up, before we send reboot
to the node. After sending reboot to a node,
t for more details and a state diagram is available there.
revision 7e04f9bc5aea4f5580e3bdf0551b37c05bfc4025
Author: Gary Lee
Date: Wed, 24 Oct 2018 11:37:04 +
amfd: add support for delaying node failover [#2918]
OpenSAF has relied on reliable, redundant links between nodes in a cluster
---
src/amf/amfd/chkop.cc | 10 ++
src/amf/amfd/ckpt.h | 3 ++-
src/amf/amfd/ckpt_dec.cc | 40 +++-
src/amf/amfd/ckpt_enc.cc | 26 --
src/amf/amfd/ckpt_msg.h | 1 +
5 files changed, 76 insertions(+), 4 deletions(-)
OpenSAF services y
Core libraries n
Samples n
Tests n
Other n
Comments (indicate scope for each "y" above):
-
revision 3c530e1f8cd1ed64826ac0fe4d461809af9c5d7b
Author: Gary
Keep the SC with the earlier boot time alive, if split brain is
detected. In the unlikely event that the boot up time
is equal, the node with the lower ID survives.
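The survivor rule above can be sketched as a single ordered comparison. This is an illustration only: the ControllerInfo struct and its field names are hypothetical and are not taken from rded.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical record; the field names are illustrative only.
struct ControllerInfo {
  uint32_t node_id;
  uint64_t boot_time;  // e.g. seconds since the epoch at boot
};

// Returns true if `self` should stay alive when split brain is detected:
// the earlier boot time wins, and the lower node ID breaks a tie.
bool SelfSurvives(const ControllerInfo& self, const ControllerInfo& peer) {
  return std::tie(self.boot_time, self.node_id) <
         std::tie(peer.boot_time, peer.node_id);
}
```

Comparing the (boot_time, node_id) pair lexicographically means exactly one of two SCs with distinct node IDs survives, whichever side evaluates the rule.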
---
src/rde/rded/rde_cb.h | 2 ++
src/rde/rded/rde_main.cc | 21 -
src/rde/rded/rde_mds.cc | 22
Hi Minh
ack (review only). Perhaps you could remove the TRACE_LEAVE() though.
Thanks
Gary
On 16/10/18 09:16, Minh Chau wrote:
Summary: amf: Add new susi fsm EXCESSIVE state to handle excessive assignment
due to splitbrain V2 [#2929]
Review request for Ticket(s): 2929
Peer Reviewer(s):
Hi
One comment inline.
/Gary
On 3/10/18, 7:00 am, "Minh Hon Chau" wrote:
Hi Thang,
It looks no harm to me if we do some retries to set implementer.
You may need Gary/Hans to have a look, since it relates to consensus case.
Thanks
Minh
On
We need to use SIGHUP for reload of configuration
---
src/base/daemon.c | 5 +++--
src/base/logtrace.cc | 22 --
2 files changed, 3 insertions(+), 24 deletions(-)
diff --git a/src/base/daemon.c b/src/base/daemon.c
index 4a37c4174..cdde7fde0 100644
--- a/src/base/daemon.c
addressed V2 comments, removed use of regex
revision 0900ae075f9bbdc2fc7e8bdb87affd343e621d8d
Author: Gary Lee
Date: Fri, 21 Sep 2018 04:16:53 +
fmd: enable reload of configuration without restart [#2923]
Only FMS_PROMOTE_ACTIVE_TIMER, FMS_NODE_ISOLATION_TIMEOUT and
FMS_ACTIVATION_SUPERVI
. /Thanks HansN
On 09/20/2018 02:09 PM, Gary Lee wrote:
> Hi Hans
>
> Just tested it with 4.9.3 and it works fine.
>
> It doesn't link with 4.8.5
>
> config_file_reader.cc:(.text+0x1c8): undefined reference to
`std::regex_token_iterator<__gn
ect: Re: [PATCH 2/3] base: add config file reader [#2923]
ack, review only. Some comments below. /Thanks HansN
On 09/19/2018 05:42 AM, Gary Lee wrote:
> Some configuration attributes are read by OpenSAF daemons as
>
er [#2923]
ack, review only. Some comments below. /Thanks HansN
On 09/19/2018 05:42 AM, Gary Lee wrote:
> Some configuration attributes are read by OpenSAF daemons as
> environment variables. eg.
>
> export FMS_PROMOTE_ACTIVE_TIMER=0
>
Ack (review only)
-Original Message-
From: "thuan.tran"
Date: Monday, 17 September 2018 at 9:39 pm
To: , gary
Cc: , "thuan.tran"
Subject: [PATCH 1/1] smf: improve CcbApplyCallback() to avoid NULL access
[#2927]
Current CcbApplyCallback() is not safe, may access NULL pointer
Only FMS_PROMOTE_ACTIVE_TIMER, FMS_NODE_ISOLATION_TIMEOUT and
FMS_ACTIVATION_SUPERVISION_TIMER are currently supported.
These values can be changed in fmd.conf and take effect
by sending SIGHUP to fmd.
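A minimal sketch of the SIGHUP-driven reload pattern, assuming a plain POSIX sigaction() handler that only sets a flag for the main loop to act on. The function names are hypothetical; the actual fmd code may wake its main loop through a different mechanism.

```cpp
#include <signal.h>

// Set by the handler, polled by the main loop. Only async-signal-safe
// work (a flag store) happens inside the handler itself.
static volatile sig_atomic_t reload_requested = 0;

static void OnSighup(int) { reload_requested = 1; }

// Install the SIGHUP handler. SA_RESTART asks the kernel to restart
// most interrupted system calls instead of failing them with EINTR.
// Returns true on success.
bool InstallSighupHandler() {
  struct sigaction sa {};
  sa.sa_handler = OnSighup;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  return sigaction(SIGHUP, &sa, nullptr) == 0;
}

// Called from the main loop: returns true if a reload was requested
// since the last call, and clears the request.
bool ConsumeReloadRequest() {
  if (!reload_requested) return false;
  reload_requested = 0;
  return true;
}
```

When ConsumeReloadRequest() returns true, the main loop would re-read the config file and apply the supported timer values; the file parsing itself is outside this sketch.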
---
src/fm/fmd/fm_cb.h | 2 ++
src/fm/fmd/fm_main.cc | 70
revision a297b2e05fd428f2088c4d1c345ae6052fdf51f1
Author: Gary Lee
Date: Wed, 19 Sep 2018 03:33:52 +
fmd: enable reload of configuration without restart [#2923]
Only FMS_PROMOTE_ACTIVE_TIMER, FMS_NODE_ISOLATION_TIMEOUT and
FMS_ACTIVATION_SUPERVISION_TIMER are currently supported.
These values can be changed i
We need to use SIGHUP for reload of configuration
---
src/base/daemon.c | 5 +++--
src/base/logtrace.cc | 22 --
2 files changed, 3 insertions(+), 24 deletions(-)
diff --git a/src/base/daemon.c b/src/base/daemon.c
index 4a37c4174..cdde7fde0 100644
--- a/src/base/daemon.c
Some configuration attributes are read by OpenSAF daemons as
environment variables. eg.
export FMS_PROMOTE_ACTIVE_TIMER=0
There is no easy way to reload these values without a restart.
ConfigFileReader will parse these files looking for 'export VAR=VAL'
and store them into a map, so a daemon can
Sorry, I will send out a V2 soon. Please ignore. I haven't handled the
case where you send SIGHUP to a process that doesn't explicitly handle it.
On 18/09/18 18:13, Gary Lee wrote:
Summary: base: remove use of SIGHUP to toggle INFO messages [#2923]
Review request for Ticket(s): 2923
Peer
Only FMS_PROMOTE_ACTIVE_TIMER, FMS_NODE_ISOLATION_TIMEOUT and
FMS_ACTIVATION_SUPERVISION_TIMER are currently supported.
These values can be changed in fmd.conf and take effect
by sending SIGHUP to fmd.
---
src/fm/fmd/fm_cb.h | 2 ++
src/fm/fmd/fm_main.cc | 70
Some configuration attributes are read by OpenSAF daemons as
environment variables. eg.
export FMS_PROMOTE_ACTIVE_TIMER=0
There is no easy way to reload these values without a restart.
ConfigFileReader will parse these files looking for 'export VAR=VAL'
and store them into a map, so a daemon can
revision 093496e7869f7dc205ec9f176d5be30d75564c34
Author: Gary Lee
Date: Tue, 18 Sep 2018 08:05:40 +
fmd: enable reload of configuration without restart [#2923]
Only FMS_PROMOTE_ACTIVE_TIMER, FMS_NODE_ISOLATION_TIMEOUT and
FMS_ACTIVATION_SUPERVISION_TIMER are currently supported.
These values can be changed i
We need to use SIGHUP for reload of configuration
---
src/base/logtrace.cc | 22 --
1 file changed, 22 deletions(-)
diff --git a/src/base/logtrace.cc b/src/base/logtrace.cc
index c1a194f60..8908c1ff3 100644
--- a/src/base/logtrace.cc
+++ b/src/base/logtrace.cc
@@ -75,22 +75,6
)
- OpenSAF Support and Services
- Original Message -
Subject: Re: [PATCH 1/1] amfd: reboot nodes that report
conflicting 2N active assignments [#2920]
From: "Gary Lee"
Date: 9/3/18 1:36 pm
To: nagen...@hasolutions.in, hans.nordeb...@ericsson.com,
Solutions Pvt. Ltd. (www.hasolutions.in)
- OpenSAF Support and Services
- Original Message -
Subject: Re: [PATCH 1/1] amfd: reboot nodes that report
conflicting 2N active assignments [#2920]
From: "Gary Lee"
Date: 9/3/18 1:36 pm
To: nagen...@haso
" above):
-
revision 1cb5f00bb25729129ee7dc2edf11790a0debbd11
Author: Gary Lee
Date: Fri, 31 Aug 2018 06:37:03 +
amfd: reboot nodes that report conflicting 2N active assignments [#2920]
After a split network event, both SCs can reboot endlessly,
due to this assertion:
2018-08-29 18:05:34.689 SC-
After a split network event, both SCs can reboot endlessly,
due to this assertion:
2018-08-29 18:05:34.689 SC-2 osafamfd[263]: src/amf/amfd/sg_2n_fsm.cc:596:
avd_sg_2n_act_susi: Assertion 'a_susi_1->su == a_susi_2->su' failed.
2018-08-29 18:05:34.695 SC-2 osafamfnd[273]: ER AMFD has
Hi Nagendra
On 03/09/18 17:50, nagen...@hasolutions.in wrote:
Hi Gary,
I have few questions:
1. Do we really want to reboot both the nodes in case of conflicts?
That's a good question. A cluster reboot should also be considered? I
have proposed both nodes as it's somewhere in between. Keep
" above):
-
revision 1cb5f00bb25729129ee7dc2edf11790a0debbd11
Author: Gary Lee
Date: Fri, 31 Aug 2018 06:37:03 +
amfd: reboot nodes that report conflicting 2N active assignments [#2920]
After a split network event, both SCs can reboot endlessly,
due to this
After a split network event, both SCs can reboot endlessly,
due to this assertion:
2018-08-29 18:05:34.689 SC-2 osafamfd[263]: src/amf/amfd/sg_2n_fsm.cc:596:
avd_sg_2n_act_susi: Assertion 'a_susi_1->su == a_susi_2->su' failed.
2018-08-29 18:05:34.695 SC-2 osafamfnd[273]: ER AMFD has
/2018 11:52 PM, Gary Lee wrote:
NOTICE: This email was received from an EXTERNAL sender
Hi Alex
Thanks, it looks much better.
So I tried ‘killall
Hi Minh
ack (review)
Thanks Gary
On 22/08/18 20:23, Minh Chau wrote:
In the scenario of single step upgrade where the UNLOCK-IN/UNLOCK
admin op are issued to a SU hosted on non-active node while cluster
startup timer is active and not all ncs SU on that node are fully
assigned. In such case,
9a2
Author: Gary Lee
Date: Wed, 22 Aug 2018 04:37:33 +
osaf: make takeover request expiration time configurable [#2917]
Complete diffstat:
--
src/fm/fmd/fmd.conf | 4
src/osaf/consensus/consensus.cc | 24
src/osaf/consensus/consens
---
src/fm/fmd/fmd.conf | 4
src/osaf/consensus/consensus.cc | 24
src/osaf/consensus/consensus.h | 4 ++--
3 files changed, 22 insertions(+), 10 deletions(-)
diff --git a/src/fm/fmd/fmd.conf b/src/fm/fmd/fmd.conf
index 9aff54970..9a106bf90 100644
---
n
OpenSAF services n
Core libraries y
Samples n
Tests n
Other n
Comments (indicate scope for each "y" above):
-
revision 46e2be0e8ce001bc7349b4556847b0aa7c427772
Au
All callers of Consensus::Demote() already log an error if the return code is
not SA_AIS_OK.
A warning message will suffice.
---
src/osaf/consensus/consensus.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc
Gary
From: Alex Jones
Organization: Ribbon
Date: Thursday, 16 August 2018 at 3:41 am
To: Gary Lee , ,
,
Cc:
Subject: Re: [PATCH 1/1] amf: add support for container/contained [#70]
G'day Gary,
I see you were adding the XML file dynamically with "immcfg -f". I hadn't
tri
here. But, #2 (rejecting
all but NWay-active for container) should already be in there. Is
there a specific test you ran that didn't work?
Alex
On 08/13/2018 02:43 AM, Gary Lee wrote:
NOTICE: This email was received from
Hi Alex
Some initial comments:
0. Is it possible to split up the patch into amfd / amfnd / common / samples?
It just makes it easier to reply inline.
1. Please compile the container demo by default, and make amf_container_script
world executable.
Eg.
diff --git a/samples/amf/Makefile.am
Hi Hans
Please see reply inline.
Thanks
On 07/08/18 17:04, Hans Nordeback wrote:
Hi Gary,
ack, review only. Minor comments below. /Thanks HansN
On 08/01/2018 06:49 AM, Gary Lee wrote:
Sometimes the 'watch' command in the KV plugin will not return
a takeover request, if the KV store does