Hi Gary,

My mistake.

I tried turning on trace logs for amfd and rded. The osafamfd log file (snippet 
below) does show the consensus code being hit, plus it shows that the 
etcd3.plugin is being used.
Many thanks for your help!!!

<143>1 2020-06-02T14:46:44.537709-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2435"] 25708:osaf/consensus/consensus.cc:275 >> Consensus
<143>1 2020-06-02T14:46:44.537722-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2436"] 25708:base/getenv.cc:139 TR FMS_SPLIT_BRAIN_PREVENTION = 1
<143>1 2020-06-02T14:46:44.537728-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2437"] 25708:base/getenv.cc:51 TR FMS_KEYVALUE_STORE_PLUGIN_CMD = '/etc/opensaf/etcd3.plugin'
<143>1 2020-06-02T14:46:44.537735-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2438"] 25708:base/getenv.cc:146 TR FMS_USE_REMOTE_FENCING is not set; using default value 0
<143>1 2020-06-02T14:46:44.537741-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2439"] 25708:base/getenv.cc:146 TR FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is not set; using default value 1
<143>1 2020-06-02T14:46:44.537746-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2440"] 25708:base/getenv.cc:146 TR FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE_MDS_WAIT_TIME is not set; using default value 4
<143>1 2020-06-02T14:46:44.537752-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2441"] 25708:base/getenv.cc:146 TR FMS_RELAXED_NODE_PROMOTION is not set; using default value 0
<143>1 2020-06-02T14:46:44.537757-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2442"] 25708:base/getenv.cc:51 TR FMS_CONF_FILE = '/etc/opensaf/fmd.conf'
<143>1 2020-06-02T14:46:44.537763-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2443"] 25708:base/getenv.cc:139 TR FMS_TAKEOVER_REQUEST_VALID_TIME = 20
<143>1 2020-06-02T14:46:44.537771-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2444"] 25708:osaf/consensus/consensus.cc:0 << Consensus
<143>1 2020-06-02T14:46:44.537779-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2445"] 25708:osaf/consensus/consensus.cc:30 >> PromoteThisNode
<143>1 2020-06-02T14:46:44.537808-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2447"] 25708:osaf/consensus/consensus.cc:393 >> CheckForExistingTakeoverRequest
<143>1 2020-06-02T14:46:44.537838-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2450"] 25708:osaf/consensus/consensus.cc:607 >> ReadTakeoverRequest
<143>1 2020-06-02T14:46:44.537881-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2455"] 25708:osaf/consensus/consensus.cc:275 >> Consensus
<143>1 2020-06-02T14:46:44.537887-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2456"] 25708:base/getenv.cc:139 TR FMS_SPLIT_BRAIN_PREVENTION = 1
<143>1 2020-06-02T14:46:44.537892-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2457"] 25708:base/getenv.cc:51 TR FMS_KEYVALUE_STORE_PLUGIN_CMD = '/etc/opensaf/etcd3.plugin'
<143>1 2020-06-02T14:46:44.537898-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2458"] 25708:base/getenv.cc:146 TR FMS_USE_REMOTE_FENCING is not set; using default value 0
<143>1 2020-06-02T14:46:44.537904-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2459"] 25708:base/getenv.cc:146 TR FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is not set; using default value 1
<143>1 2020-06-02T14:46:44.53791-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2460"] 25708:base/getenv.cc:146 TR FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE_MDS_WAIT_TIME is not set; using default value 4
<143>1 2020-06-02T14:46:44.537915-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2461"] 25708:base/getenv.cc:146 TR FMS_RELAXED_NODE_PROMOTION is not set; using default value 0
<143>1 2020-06-02T14:46:44.537921-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2462"] 25708:base/getenv.cc:51 TR FMS_CONF_FILE = '/etc/opensaf/fmd.conf'
<143>1 2020-06-02T14:46:44.537926-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2463"] 25708:base/getenv.cc:139 TR FMS_TAKEOVER_REQUEST_VALID_TIME = 20
<143>1 2020-06-02T14:46:44.537932-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2464"] 25708:osaf/consensus/consensus.cc:0 << Consensus
<143>1 2020-06-02T14:46:44.537964-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2466"] 25708:osaf/consensus/key_value.cc:23 >> Execute
<143>1 2020-06-02T14:46:44.601834-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2469"] 25708:osaf/consensus/key_value.cc:45 TR Executed '/etc/opensaf/etcd3.plugin get "takeover_request"', returning 1
<143>1 2020-06-02T14:46:44.60184-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2470"] 25708:osaf/consensus/key_value.cc:0 << Execute
<143>1 2020-06-02T14:46:44.601846-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2471"] 25708:osaf/consensus/key_value.cc:59 TR Read ''
<143>1 2020-06-02T14:46:44.601899-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2473"] 25708:osaf/consensus/key_value.cc:0 << Get
<143>1 2020-06-02T14:46:44.601905-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2474"] 25708:osaf/consensus/consensus.cc:615 TR Could not read takeover request (7)
<143>1 2020-06-02T14:46:44.60191-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2475"] 25708:osaf/consensus/consensus.cc:0 << ReadTakeoverRequest
<143>1 2020-06-02T14:46:44.601916-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2476"] 25708:osaf/consensus/consensus.cc:0 << CheckForExistingTakeoverRequest
<143>1 2020-06-02T14:46:44.60197-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2478"] 25708:osaf/consensus/key_value.cc:154 >> Lock
<143>1 2020-06-02T14:46:44.601992-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2480"] 25708:osaf/consensus/consensus.cc:275 >> Consensus
…

Regards,
David



From: Hoyt, David
Sent: Monday, June 1, 2020 5:05 PM
To: Gary Lee <gary....@dektech.com.au>
Cc: opensaf-users@lists.sourceforge.net
Subject: RE: opensaf and etcd

Hi Gary,

Sorry, I’m still not getting it.

I have updated the etcd.conf file on each of the SC nodes:
ETCD_NAME="etcd1"  # value is "etcd2" on SC-2 node
ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
ETCD_LISTEN_CLIENT_URLS=http://localhost:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://0.0.0.0:2380"
ETCD_ADVERTISE_CLIENT_URLS=http://localhost:2379
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER="etcd1=http://123.45.6.14:2380,etcd2=http://123.45.6.100:2380,etcd3=http://123.78.9.18:2380"

I updated the fmd.conf file as follows:

export FMS_SPLIT_BRAIN_PREVENTION=1
export FMS_KEYVALUE_STORE_PLUGIN_CMD=/etc/opensaf/etcd3.plugin

iptables was also updated:
iptables -I INPUT -p tcp --dport 2379 -j ACCEPT -m comment --comment "etcd client communication"
iptables -I INPUT -p tcp --dport 2380 -j ACCEPT -m comment --comment "etcd server to server communication"


etcd is running on each node. Note: I could only get it started from the 
command line with the various options. Is there a way to start etcd and have it 
read these options from the etcd.conf file? When I tried just running etcd from 
the command line, the values for the various options were incorrect.
Here's an example of how I started it:
]# /bin/etcd --name etcd1 --data-dir=/var/lib/etcd \
     --initial-advertise-peer-urls http://123.45.6.14:2380 \
     --listen-peer-urls http://123.45.6.14:2380 \
     --listen-client-urls http://123.45.6.14:2379 \
     --advertise-client-urls http://123.45.6.14:2379 \
     --initial-cluster-token etcd-cluster \
     --initial-cluster etcd1=http://123.45.6.14:2380,etcd2=http://123.45.6.100:2380,etcd3=http://123.78.9.18:2380 \
     --initial-cluster-state new
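
On reading the options from etcd.conf: one possible approach, assuming the file holds plain KEY=value lines like the etcd.conf shown above. etcd accepts each command-line flag as an ETCD_-prefixed environment variable, so the file can be sourced with auto-export switched on before etcd is started. The function name load_etcd_conf and the path /etc/etcd/etcd.conf are assumptions, not something taken from this thread; packaged installs typically get the same effect through a systemd EnvironmentFile.

```shell
#!/bin/sh
# Sketch only: export every KEY=value line of an etcd.conf-style file into
# the environment so that etcd can pick the values up as ETCD_* variables.
# load_etcd_conf is a hypothetical helper name.
load_etcd_conf() {
    conf_file="$1"
    [ -r "$conf_file" ] || { echo "cannot read $conf_file" >&2; return 1; }
    set -a            # auto-export every variable assigned while sourcing
    . "$conf_file"
    set +a
}

# Usage (the path is an assumption; adjust it to where etcd.conf lives):
#   load_etcd_conf /etc/etcd/etcd.conf && exec /bin/etcd
```

Flags given on the command line take precedence over the environment, so any option passed explicitly still wins.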

When I started opensaf, I did see the following log being generated:
Jun  1 09:10:38 dhoyt-ha-1 osafrded[12686]: NO Connectivity to consensus service established

I started etcd only on the two SC nodes, as there was an issue with the 3rd node.
]# etcdctl cluster-health
member 3d66d0999b62239d is healthy: got healthy result from http://123.45.6.100:2379
member 80de35916ce53b47 is unreachable: no available published client urls
member ce6170fb51239c3e is healthy: got healthy result from http://123.45.6.14:2379
]# etcdctl member list
3d66d0999b62239d: name=etcd2 peerURLs=http://123.45.6.100:2380 clientURLs=http://172.23.8.100:2379 isLeader=false
80de35916ce53b47: name=etcd3 peerURLs=http://123.78.9.18:2380 clientURLs= isLeader=false
ce6170fb51239c3e: name=etcd1 peerURLs=http://123.45.6.14:2380 clientURLs=http://172.23.8.14:2379 isLeader=true
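
Since the etcd cluster only has to keep a quorum, a quick way to read the cluster-health output above is to count the healthy members against the majority threshold. The following is a rough sketch that parses the "member ... is healthy" lines printed by etcdctl cluster-health; the function name check_quorum is hypothetical and the output format is assumed to match what is shown above.

```shell
#!/bin/sh
# Sketch only: read `etcdctl cluster-health` output on stdin, count the
# members reported healthy, and compare against a majority of all members.
# check_quorum is a hypothetical helper name.
check_quorum() {
    awk '
        /^member /              { total++ }
        /^member .* is healthy/ { healthy++ }
        END {
            quorum = int(total / 2) + 1
            printf "%d/%d healthy, quorum %d: %s\n", healthy, total, quorum,
                   (healthy >= quorum ? "ok" : "LOST")
            exit (healthy >= quorum ? 0 : 1)
        }'
}

# Usage:
#   etcdctl cluster-health | check_quorum
```

For the three-member output above, with two members healthy and one unreachable, this prints "2/3 healthy, quorum 2: ok" and returns 0; losing one more member would drop the cluster below quorum.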


At this point, what should happen?
I checked via etcdctl to see if the /opensaf/ directory exists and it doesn’t.
]# etcdctl ls /opensaf
]# etcdctl ls /
]#

Am I still missing some configuration?

Regards,
David




From: Gary Lee <gary....@dektech.com.au>
Sent: Monday, June 1, 2020 2:58 AM
To: Hoyt, David <dh...@rbbn.com>
Cc: opensaf-users@lists.sourceforge.net
Subject: Re: opensaf and etcd

________________________________
NOTICE: This email was received from an EXTERNAL sender
________________________________

Hi

In the plugin, there's some text that describes how to do it. But it assumes 
you have etcdctl installed on the SC. The actual cluster can be elsewhere.

"If you have configured etcd to run elsewhere,
please add the '--endpoints' option to etcdctl in the plugin."
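
As a rough illustration of the '--endpoints' hint, the value is a comma-separated list of client URLs, one per etcd member. The helper below builds that list from member addresses; the name etcd_endpoints, the ETCD_CLIENT_PORT variable and the http scheme are assumptions for this sketch, and how the shipped etcd3.plugin actually invokes etcdctl is not shown in this thread.

```shell
#!/bin/sh
# Sketch only: build the comma-separated value for etcdctl's --endpoints
# option from a list of member addresses.
# etcd_endpoints and ETCD_CLIENT_PORT are hypothetical names.
etcd_endpoints() {
    port="${ETCD_CLIENT_PORT:-2379}"   # 2379 is the client port used in this thread
    list=""
    for host in "$@"; do
        # Append with a comma only when the list is already non-empty.
        list="${list:+$list,}http://$host:$port"
    done
    printf '%s\n' "$list"
}

# Usage inside a plugin, for example:
#   etcdctl --endpoints="$(etcd_endpoints 123.45.6.14 123.45.6.100 123.78.9.18)" get "takeover_request"
```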

If you don't want to install etcdctl on the SCs, then you could write a custom 
plugin that uses the REST interface provided by etcd. There's a sample.plugin 
file that describes the 'API' that must be implemented.

Gary
________________________________
From: Hoyt, David <dh...@rbbn.com>
Sent: 01 June 2020 13:31
To: Gary Lee <gary....@dektech.com.au>
Cc: Opensaf-users@lists.sourceforge.net
Subject: Re: opensaf and etcd

Sorry, hit send before I was finished.
My question was:
Does opensaf already have the code in place to communicate with the etcd 
members if the etcd cluster is outside of opensaf's cluster?
-David

________________________________
From: Hoyt, David <dh...@rbbn.com>
Sent: Sunday, May 31, 2020 11:28:51 PM
To: Gary Lee <gary....@dektech.com.au>
Cc: Opensaf-users@lists.sourceforge.net
Subject: Re: opensaf and etcd

Actually, we're leaning towards having the etcd cluster outside of the opensaf 
cluster. But that scenario I described came up for discussion too.
Now if the etcd cluster is outside the opensaf cluster, after enabling 
opensaf's etcd option, is it just a matter of providing the IPs (plus port) for 
each of the etcd members?
Does opensaf have the code

________________________________
From: Gary Lee <gary....@dektech.com.au>
Sent: Sunday, May 31, 2020 11:14:28 PM
To: Hoyt, David <dh...@rbbn.com>
Cc: Opensaf-users@lists.sourceforge.net
Subject: Re: opensaf and etcd

Hi David

I guess that hybrid approach should work. opensaf doesn’t really care about how 
the etcd cluster is configured, as long as a quorum in the etcd cluster can be 
maintained during the entire lifetime.

Gary
—
________________________________
From: Hoyt, David <dh...@rbbn.com>
Sent: Monday, June 1, 2020 1:09:34 PM
To: Gary Lee <gary....@dektech.com.au>
Cc: Opensaf-users@lists.sourceforge.net
Subject: Re: opensaf and etcd

Ok, so what if I have 2 SC nodes where etcd is running locally on each, and 
then a 3rd etcd node outside of the opensaf cluster. Is this etcd config valid? 
If so, what would opensaf's etcd configuration look like?
Or does the etcd cluster have to be either:
- all within the opensaf cluster
or
- all outside the opensaf cluster
-David




From: Gary Lee <gary....@dektech.com.au>
Sent: Thursday, May 28, 2020 9:35 PM
To: Hoyt, David <dh...@rbbn.com>; opensaf-users@lists.sourceforge.net
Subject: Re: opensaf and etcd



Hi David



There are some docs AndersW wrote in these tickets, for background info:



https://sourceforge.net/p/opensaf/tickets/64/

https://sourceforge.net/p/opensaf/tickets/2795/



You have to decide whether your etcd cluster is internal or external to your 
OpenSAF cluster.



Basically, there are hooks in the OpenSAF code to obtain a "lock" in etcd 
before a node is promoted to Active.

The sample etcd3.plugin assumes there's an etcd instance running locally on 
each SC.



The configuration is in fmd.conf



# To enable split brain prevention, change to 1

export FMS_SPLIT_BRAIN_PREVENTION=1



# Full path to key-value store plugin
export FMS_KEYVALUE_STORE_PLUGIN_CMD=/full/path/to/etcd3.plugin
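
A small sanity check for these two settings might look like the sketch below. It assumes fmd.conf consists of shell-style export lines, as in the excerpt above, and simply verifies that split brain prevention is set to 1 and that the plugin path points at an executable file. The function name check_fmd_conf is hypothetical and is not part of OpenSAF.

```shell
#!/bin/sh
# Sketch only: verify that an fmd.conf-style file enables split brain
# prevention and names an executable key-value store plugin.
# check_fmd_conf is a hypothetical helper name.
check_fmd_conf() {
    conf_file="$1"
    (
        # Source in a subshell so the caller's environment is left untouched.
        . "$conf_file" || exit 1
        [ "${FMS_SPLIT_BRAIN_PREVENTION:-0}" = "1" ] || {
            echo "FMS_SPLIT_BRAIN_PREVENTION is not 1" >&2; exit 1; }
        [ -x "${FMS_KEYVALUE_STORE_PLUGIN_CMD:-}" ] || {
            echo "plugin not executable: ${FMS_KEYVALUE_STORE_PLUGIN_CMD:-unset}" >&2; exit 1; }
    )
}

# Usage:
#   check_fmd_conf /etc/opensaf/fmd.conf && echo "fmd.conf looks sane"
```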



Gary




_______________________________________________
Opensaf-users mailing list
Opensaf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-users
