Hi Gary,

My mistake.

I tried turning on trace logs for amfd and rded. The osafamfd log file (snippet below) does show the consensus code being hit, plus it shows that the etcd3.plugin is being used. Many thanks for your help!!!

<143>1 2020-06-02T14:46:44.537709-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2435"] 25708:osaf/consensus/consensus.cc:275 >> Consensus
<143>1 2020-06-02T14:46:44.537722-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2436"] 25708:base/getenv.cc:139 TR FMS_SPLIT_BRAIN_PREVENTION = 1
<143>1 2020-06-02T14:46:44.537728-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2437"] 25708:base/getenv.cc:51 TR FMS_KEYVALUE_STORE_PLUGIN_CMD = '/etc/opensaf/etcd3.plugin'
<143>1 2020-06-02T14:46:44.537735-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2438"] 25708:base/getenv.cc:146 TR FMS_USE_REMOTE_FENCING is not set; using default value 0
<143>1 2020-06-02T14:46:44.537741-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2439"] 25708:base/getenv.cc:146 TR FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is not set; using default value 1
<143>1 2020-06-02T14:46:44.537746-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2440"] 25708:base/getenv.cc:146 TR FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE_MDS_WAIT_TIME is not set; using default value 4
<143>1 2020-06-02T14:46:44.537752-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2441"] 25708:base/getenv.cc:146 TR FMS_RELAXED_NODE_PROMOTION is not set; using default value 0
<143>1 2020-06-02T14:46:44.537757-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2442"] 25708:base/getenv.cc:51 TR FMS_CONF_FILE = '/etc/opensaf/fmd.conf'
<143>1 2020-06-02T14:46:44.537763-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2443"] 25708:base/getenv.cc:139 TR FMS_TAKEOVER_REQUEST_VALID_TIME = 20
<143>1 2020-06-02T14:46:44.537771-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2444"] 25708:osaf/consensus/consensus.cc:0 << Consensus
<143>1 2020-06-02T14:46:44.537779-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2445"] 25708:osaf/consensus/consensus.cc:30 >> PromoteThisNode
<143>1 2020-06-02T14:46:44.537808-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2447"] 25708:osaf/consensus/consensus.cc:393 >> CheckForExistingTakeoverRequest
<143>1 2020-06-02T14:46:44.537838-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2450"] 25708:osaf/consensus/consensus.cc:607 >> ReadTakeoverRequest
<143>1 2020-06-02T14:46:44.537881-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2455"] 25708:osaf/consensus/consensus.cc:275 >> Consensus
<143>1 2020-06-02T14:46:44.537887-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2456"] 25708:base/getenv.cc:139 TR FMS_SPLIT_BRAIN_PREVENTION = 1
<143>1 2020-06-02T14:46:44.537892-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2457"] 25708:base/getenv.cc:51 TR FMS_KEYVALUE_STORE_PLUGIN_CMD = '/etc/opensaf/etcd3.plugin'
<143>1 2020-06-02T14:46:44.537898-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2458"] 25708:base/getenv.cc:146 TR FMS_USE_REMOTE_FENCING is not set; using default value 0
<143>1 2020-06-02T14:46:44.537904-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2459"] 25708:base/getenv.cc:146 TR FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is not set; using default value 1
<143>1 2020-06-02T14:46:44.53791-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2460"] 25708:base/getenv.cc:146 TR FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE_MDS_WAIT_TIME is not set; using default value 4
<143>1 2020-06-02T14:46:44.537915-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2461"] 25708:base/getenv.cc:146 TR FMS_RELAXED_NODE_PROMOTION is not set; using default value 0
<143>1 2020-06-02T14:46:44.537921-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2462"] 25708:base/getenv.cc:51 TR FMS_CONF_FILE = '/etc/opensaf/fmd.conf'
<143>1 2020-06-02T14:46:44.537926-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2463"] 25708:base/getenv.cc:139 TR FMS_TAKEOVER_REQUEST_VALID_TIME = 20
<143>1 2020-06-02T14:46:44.537932-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2464"] 25708:osaf/consensus/consensus.cc:0 << Consensus
<143>1 2020-06-02T14:46:44.537964-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2466"] 25708:osaf/consensus/key_value.cc:23 >> Execute
<143>1 2020-06-02T14:46:44.601834-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2469"] 25708:osaf/consensus/key_value.cc:45 TR Executed '/etc/opensaf/etcd3.plugin get "takeover_request"', returning 1
<143>1 2020-06-02T14:46:44.60184-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2470"] 25708:osaf/consensus/key_value.cc:0 << Execute
<143>1 2020-06-02T14:46:44.601846-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2471"] 25708:osaf/consensus/key_value.cc:59 TR Read ''
<143>1 2020-06-02T14:46:44.601899-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2473"] 25708:osaf/consensus/key_value.cc:0 << Get
<143>1 2020-06-02T14:46:44.601905-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2474"] 25708:osaf/consensus/consensus.cc:615 TR Could not read takeover request (7)
<143>1 2020-06-02T14:46:44.60191-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2475"] 25708:osaf/consensus/consensus.cc:0 << ReadTakeoverRequest
<143>1 2020-06-02T14:46:44.601916-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2476"] 25708:osaf/consensus/consensus.cc:0 << CheckForExistingTakeoverRequest
<143>1 2020-06-02T14:46:44.60197-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2478"] 25708:osaf/consensus/key_value.cc:154 >> Lock
<143>1 2020-06-02T14:46:44.601992-04:00 ems osafamfd 25708 osafamfd [meta sequenceId="2480"] 25708:osaf/consensus/consensus.cc:275 >> Consensus
…

Regards,
David

From: Hoyt, David
Sent: Monday, June 1, 2020 5:05 PM
To: Gary Lee <gary....@dektech.com.au>
Cc: opensaf-users@lists.sourceforge.net
Subject: RE: opensaf and etcd

Hi Gary,

Sorry, I’m still not getting it.
I have updated the etcd.conf file on each of the SC nodes:

ETCD_NAME="etcd1" # value is "etcd2" on SC-2 node
ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
ETCD_LISTEN_CLIENT_URLS=http://localhost:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://0.0.0.0:2380"
ETCD_ADVERTISE_CLIENT_URLS=http://localhost:2379
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER="etcd1=http://123.45.6.14:2380,etcd2=http://123.45.6.100:2380,etcd3=http://123.78.9.18:2380"

I updated the fmd.conf file as follows:

export FMS_SPLIT_BRAIN_PREVENTION=1
export FMS_KEYVALUE_STORE_PLUGIN_CMD=/etc/opensaf/etcd3.plugin

iptables was also updated:

iptables -I INPUT -p tcp --dport 2379 -j ACCEPT -m comment --comment "etcd client communication"
iptables -I INPUT -p tcp --dport 2380 -j ACCEPT -m comment --comment "etcd server to server communication"

etcd is running on each node. Note: I could only get it started from the command line with the various options. Is there a way to start etcd and have it read these options from the etcd.conf file? When I tried just running etcd from the command line, the values for the various options were incorrect. Here's an example of how I started it:

]# /bin/etcd --name etcd1 --data-dir=/var/lib/etcd --initial-advertise-peer-urls http://123.45.6.14:2380 --listen-peer-urls http://123.45.6.14:2380 --listen-client-urls http://123.45.6.14:2379 --advertise-client-urls http://123.45.6.14:2379 --initial-cluster-token etcd-cluster --initial-cluster etcd1=http://123.45.6.14:2380,etcd2=http://123.45.6.100:2380,etcd3=http://123.78.9.18:2380 --initial-cluster-state new

When I started opensaf, I did see the following log being generated:

Jun 1 09:10:38 dhoyt-ha-1 osafrded[12686]: NO Connectivity to consensus service established

I only started etcd on the two SC nodes, as there was an issue with the 3rd node.
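On the etcd.conf question: etcd does not read that KEY=VALUE file itself. Every etcd command-line flag can also be supplied as an environment variable (`--listen-peer-urls` becomes `ETCD_LISTEN_PEER_URLS`), and distribution systemd units typically load etcd.conf as an environment file, which would explain why running `etcd` by hand from a shell ignores it. A minimal Python sketch, assuming the file is at /etc/etcd/etcd.conf and uses simple KEY=VALUE lines like the ones above; `parse_env_lines` and `start_etcd` are hypothetical helpers, not part of etcd or OpenSAF:

```python
import os
import subprocess
from typing import Dict, Iterable


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comment lines."""
    env: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if value[:1] in ("'", '"'):
            # Quoted value: keep the text up to the matching closing quote.
            end = value.find(value[0], 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # Unquoted value: drop a trailing " # comment", if any.
            value = value.split(" #", 1)[0].strip()
        env[key] = value
    return env


def start_etcd(conf: str = "/etc/etcd/etcd.conf",
               binary: str = "/bin/etcd") -> subprocess.Popen:
    """Start etcd without flags, passing the file's ETCD_* settings via its environment."""
    with open(conf, encoding="utf-8") as fh:
        settings = parse_env_lines(fh)
    etcd_vars = {k: v for k, v in settings.items() if k.startswith("ETCD_")}
    return subprocess.Popen([binary], env={**os.environ, **etcd_vars})
```

If etcd was installed from a distribution package, `systemctl start etcd` would normally do the equivalent, provided the unit points at the same etcd.conf.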
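Gary's remark in this thread that OpenSAF only needs the etcd cluster to keep a quorum explains why two running members are enough for now (etcd3 shows as unreachable in the cluster-health output below). etcd uses Raft, so a cluster of n voting members needs a strict majority, floor(n/2) + 1, to elect a leader and accept writes. A small Python sketch of that arithmetic; the function names are illustrative only:

```python
def quorum(members: int) -> int:
    """Smallest strict majority of an etcd (Raft) cluster with `members` voting members."""
    if members < 1:
        raise ValueError("cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be down while a quorum is still available."""
    return members - quorum(members)


def has_quorum(members: int, running: int) -> bool:
    """True if `running` healthy members can still elect a leader and commit writes."""
    return running >= quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} members: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
    # The cluster in this thread: etcd1 and etcd2 on the SCs, etcd3 elsewhere.
    print("2 of 3 running:", has_quorum(3, 2))  # etcd3 down
    print("1 of 3 running:", has_quorum(3, 1))  # etcd3 and one SC down
```

By the same arithmetic a two-member etcd cluster tolerates no failures, so it is the third member outside the SCs that lets either SC go down without etcd losing quorum.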
]# etcdctl cluster-health
member 3d66d0999b62239d is healthy: got healthy result from http://123.45.6.100:2379
member 80de35916ce53b47 is unreachable: no available published client urls
member ce6170fb51239c3e is healthy: got healthy result from http://123.45.6.14:2379

]# etcdctl member list
3d66d0999b62239d: name=etcd2 peerURLs=http://123.45.6.100:2380 clientURLs=http://172.23.8.100:2379 isLeader=false
80de35916ce53b47: name=etcd3 peerURLs=http://123.78.9.18:2380 clientURLs= isLeader=false
ce6170fb51239c3e: name=etcd1 peerURLs=http://123.45.6.14:2380 clientURLs=http://172.23.8.14:2379 isLeader=true

At this point, what should happen? I checked via etcdctl to see if the /opensaf/ directory exists, and it doesn’t.

]# etcdctl ls /opensaf
]# etcdctl ls /
]#

Am I still missing some configuration?

Regards,
David

From: Gary Lee <gary....@dektech.com.au>
Sent: Monday, June 1, 2020 2:58 AM
To: Hoyt, David <dh...@rbbn.com>
Cc: opensaf-users@lists.sourceforge.net
Subject: Re: opensaf and etcd
________________________________
NOTICE: This email was received from an EXTERNAL sender
________________________________

Hi

In the plugin, there's some text that describes how to do it, but it assumes you have etcdctl installed on the SC. The actual cluster can be elsewhere.

"If you have configured etcd to run elsewhere, please add the '--endpoints' option to etcdctl in the plugin."

If you don't want to install etcdctl on the SCs, then you could write a custom plugin that uses the REST interface provided by etcd. There's a sample.plugin file that describes the 'API' that must be implemented.
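Gary's '--endpoints' suggestion also covers David's question elsewhere in this thread about providing the IPs (plus port) of each etcd member: etcdctl's v3 API accepts a comma-separated list of client URLs and can fail over between them. A minimal Python sketch that builds such a command, assuming the plugin calls etcdctl with the v3 API and that every member, including etcd3, serves clients on port 2379; the endpoint list and function names are hypothetical:

```python
import os
import subprocess
from typing import List, Sequence

# Hypothetical client URLs: the SC addresses from the etcd command line above,
# plus etcd3, assuming it also serves clients on port 2379.
ENDPOINTS = (
    "http://123.45.6.14:2379",
    "http://123.45.6.100:2379",
    "http://123.78.9.18:2379",
)


def etcdctl_get_argv(key: str, endpoints: Sequence[str] = ENDPOINTS) -> List[str]:
    """Build an etcdctl command that reads `key` from any of the listed members."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    return ["etcdctl", "--endpoints=" + ",".join(endpoints),
            "get", key, "--print-value-only"]


def etcdctl_get(key: str,
                endpoints: Sequence[str] = ENDPOINTS) -> subprocess.CompletedProcess:
    """Run it with ETCDCTL_API=3, which etcdctl 3.3 and older need to use the v3 API."""
    env = dict(os.environ, ETCDCTL_API="3")
    return subprocess.run(etcdctl_get_argv(key, endpoints), env=env,
                          capture_output=True, text=True, check=False)


if __name__ == "__main__":
    print(" ".join(etcdctl_get_argv("takeover_request")))
```

The v2/v3 split also bears on the empty `etcdctl ls /opensaf` result above: `ls` is a v2 command, and etcd 3.x keeps v2 and v3 keys in separate stores, so keys written through the v3 API will not show up there. `ETCDCTL_API=3 etcdctl get "" --prefix --keys-only` lists v3 keys instead.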
Gary

________________________________
From: Hoyt, David <dh...@rbbn.com>
Sent: 01 June 2020 13:31
To: Gary Lee <gary....@dektech.com.au>
Cc: Opensaf-users@lists.sourceforge.net
Subject: Re: opensaf and etcd

Sorry, hit send before I was finished. My question was: Does opensaf already have the code in place to communicate with the etcd members if the etcd cluster is outside of opensaf's cluster?

-David

________________________________
From: Hoyt, David <dh...@rbbn.com>
Sent: Sunday, May 31, 2020 11:28:51 PM
To: Gary Lee <gary....@dektech.com.au>
Cc: Opensaf-users@lists.sourceforge.net
Subject: Re: opensaf and etcd

Actually, we're leaning towards having the etcd cluster outside of the opensaf cluster. But that scenario I described came up for discussion too.

Now if the etcd cluster is outside the opensaf cluster, after enabling opensaf's etcd option, is it just a matter of providing the IPs (plus port) for each of the etcd members? Does opensaf have the code

________________________________
From: Gary Lee <gary....@dektech.com.au>
Sent: Sunday, May 31, 2020 11:14:28 PM
To: Hoyt, David <dh...@rbbn.com>
Cc: Opensaf-users@lists.sourceforge.net
Subject: Re: opensaf and etcd

Hi David

I guess that hybrid approach should work.
opensaf doesn’t really care about how the etcd cluster is configured, as long as a quorum in the etcd cluster can be maintained during the entire lifetime.

Gary

________________________________
From: Hoyt, David <dh...@rbbn.com>
Sent: Monday, June 1, 2020 1:09:34 PM
To: Gary Lee <gary....@dektech.com.au>
Cc: Opensaf-users@lists.sourceforge.net
Subject: Re: opensaf and etcd

Ok, so what if I have 2 SC nodes where etcd is running locally on each, and then a 3rd etcd node outside of the opensaf cluster? Is this etcd config valid? If so, what would opensaf's etcd configuration look like? Or does the etcd cluster have to be either:
- all within the opensaf cluster, or
- all outside the opensaf cluster?

-David

From: Gary Lee <gary....@dektech.com.au>
Sent: Thursday, May 28, 2020 9:35 PM
To: Hoyt, David <dh...@rbbn.com>; opensaf-users@lists.sourceforge.net
Subject: Re: opensaf and etcd

Hi David

There are some docs AndersW wrote in these tickets, for background info:
https://sourceforge.net/p/opensaf/tickets/64/
https://sourceforge.net/p/opensaf/tickets/2795/

You have to decide whether your etcd cluster is internal or external to your OpenSAF cluster. Basically, there are hooks in the OpenSAF code to obtain a "lock" in etcd before a node is promoted to Active. The sample etcd3.plugin assumes there's an etcd instance running locally on each SC.
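The hook Gary mentions is visible in the trace at the top of this thread: the consensus code runs the configured plugin as an external command (`Executed '/etc/opensaf/etcd3.plugin get "takeover_request"', returning 1`) and then uses what it printed (`Read ''`). A minimal Python sketch of that call pattern, assuming only what the trace shows (subcommand and key as arguments, an exit status, a value on stdout); `run_plugin` is a hypothetical name, and this is not OpenSAF's own key_value.cc:

```python
import os
import subprocess
from typing import Sequence, Tuple

# FMS_KEYVALUE_STORE_PLUGIN_CMD as set in fmd.conf on the SCs in this thread.
PLUGIN = "/etc/opensaf/etcd3.plugin"


def run_plugin(args: Sequence[str], plugin: str = PLUGIN,
               timeout: float = 10.0) -> Tuple[int, str]:
    """Run '<plugin> <args...>' and return (exit status, stdout minus trailing newline)."""
    proc = subprocess.run([plugin, *args], capture_output=True, text=True,
                          timeout=timeout, check=False)
    return proc.returncode, proc.stdout.rstrip("\n")


if __name__ == "__main__" and os.access(PLUGIN, os.X_OK):
    # Same request as in the trace: read the current takeover request, if any.
    status, value = run_plugin(["get", "takeover_request"])
    print(f"returning {status}")
    print(f"Read '{value}'")
```

Running the plugin by hand on an SC in the same way is a quick check of what it returns outside OpenSAF.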
The configuration is in fmd.conf:

# To enable split brain prevention, change to 1
export FMS_SPLIT_BRAIN_PREVENTION=1
# Full path to key-value store plugin
export FMS_KEYVALUE_STORE_PLUGIN_CMD=/full/path/to/etcd3.plugin

Gary

________________________________
Notice: This e-mail together with any attachments may contain information of Ribbon Communications Inc. that is confidential and/or proprietary for the sole use of the intended recipient. Any review, disclosure, reliance or distribution by others or forwarding without express permission is strictly prohibited. If you are not the intended recipient, please notify the sender immediately and then delete all copies, including any attachments.
________________________________

_______________________________________________
Opensaf-users mailing list
Opensaf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-users