Re: [devel] [PATCH 0/5] Review Request for split-brain: select active SC from largest network partition V3 [#2795]

2018-04-13 Thread Hans Nordebäck

Hi,



On 04/12/2018 04:15 PM, Gary Lee wrote:

Hi


On 12/04/18 23:34, Anders Widell wrote:

Ack with comments:

* There is no need to use "const" when passing function arguments by 
value. E.g. the argument "const uint64_t cluster_size" should be 
"uint64_t cluster_size".




[GL] Sure, but it doesn't do any harm, and would stop accidental 
assignments (that would be lost anyway).
[HansN] perhaps these functions should be const member functions? E.g. 
SaAisErrorT PromoteThisNode(bool graceful_takeover, uint64_t 
cluster_size) const;


* You assume that all nodes in the cluster have synchronized clocks 
(probably using NTP). Would it be possible to use an expiration time 
for the etcd key instead of writing a time stamp in the value, so 
that etcd automatically deletes the takeover request when it expires? 
That way we would not require synchronized clocks.




[GL] Good idea. I did question why I hadn't use TTL/lease once I had 
finished the ticket. :-) Will see what I can do!



regards,
Anders Widell

On 04/11/2018 09:35 AM, Gary Lee wrote:
Summary: split-brain: select active SC from largest network 
partition V3 [#2795]

Review request for Ticket(s): 2795
Peer Reviewer(s): Anders, Ravi, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2795
Base revision: 1c302a300e449e8a8527671fbd6c7f4e2b41e95d
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area   Impact y/n

  Docs    n
  Build system    n
  RPM/packaging   n
  Configuration files n
  Startup scripts n
  SAF services    n
  OpenSAF services    y
  Core libraries  y
  Samples n
  Tests   n
  Other   n


Comments (indicate scope for each "y" above):
-

*** Changes from V2: ***

fmd: made cluster_size atomic
fmd: wait 3 seconds before promoting to active, to allow topology 
events to be processed first

osaf: add check for existing takeover request, before trying to lock
etcdv3 plugin: reliablity improvements


revision c7bc78656d5de11f6147727bd8612274fb6e438f
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:16:46 +1000

rded: adapt to new Consensus API [#2795]

- add 3 new internal message:

RDE_MSG_NODE_UP
RDE_MSG_NODE_DOWN
RDE_MSG_TAKEOVER_REQUEST_CALLBACK

- subscribe to AMFND service up events to keep track of the number
   of cluster members

- listen for takeover requests in KV store



revision 4899e5d0f5abdff8f15eca8ad17d3b13b6a00393
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:16:18 +1000

fmd: adapt to new Consensus API [#2795]



revision 812a315af21df06b2f9fdcc3d8fd5b7bbad3e550
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:15:41 +1000

amfd: adapt to new Consensus API [#2795]



revision b8a37c1b8965826e5faffbfebc44a84bdb6433a1
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:14:39 +1000

osaf: add lock takeover request fuction [#2795]

- add create and set (if previous value matches) functions to 
KeyValue class
- add Consensus::MonitorTakeoverRequest() function for use by RDE to 
answer takeover requests
- add Consensus::CreateTakeoverRequest() - before a SC is promoted 
to active, it will
   create a takeover request in the KV store. An existing SC can 
reject the lock takeover




revision 955be872ba5887b1b521eac9f7732dd3f6afc593
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:13:45 +1000

osaf: extend API to include a create key and an enhanced set key 
function [#2795]


- add create_key function (fails if key already exists)
- add setkey_match_prev function (set value if previous value matches)
- add missing quotes
- add etcd3.plugin



Added Files:

  src/osaf/consensus/plugins/etcd3.plugin


Complete diffstat:
--
  src/amf/amfd/role.cc |   2 +-
  src/fm/fmd/fm_cb.h   |   2 +-
  src/fm/fmd/fm_main.cc    |  26 +-
  src/fm/fmd/fm_mds.cc |   2 +
  src/fm/fmd/fm_rda.cc |  27 +-
  src/osaf/consensus/consensus.cc  | 435 
++-

  src/osaf/consensus/consensus.h   |  55 +++-
  src/osaf/consensus/key_value.cc  | 105 +---
  src/osaf/consensus/key_value.h   |  19 +-
  src/osaf/consensus/plugins/etcd.plugin   |  86 +-
  src/osaf/consensus/plugins/etcd3.plugin  | 366 
++

  src/osaf/consensus/plugins/sample.plugin |  67 -
  src/rde/rded/rde_cb.h    |  12 +-
  src/rde/rded/rde_main.cc |  75 --
  src/rde/rded/rde_mds.cc  |  39 ++-
  src/rde/rded/rde_rda.cc  |   2 +-
  src/rde/rded/role.cc   

Re: [devel] [PATCH 1/1] imma: Do not finalize previously instialized privateOmHandle in saImmOiAugmentCcbInitialize [#2827]

2018-04-13 Thread Vu Minh Nguyen
Hi Hoa,

Ack from me with minor suggestion, started with [Vu].

Regards, Vu

> -Original Message-
> From: Hoa Le [mailto:hoa...@dektech.com.au]
> Sent: Friday, April 6, 2018 10:52 AM
> To: ravisekhar.ko...@oracle.com; vu.m.ngu...@dektech.com.au;
> zoran.milinko...@ericsson.com
> Cc: opensaf-devel@lists.sourceforge.net; Hoa Le 
> Subject: [PATCH 1/1] imma: Do not finalize previously instialized
> privateOmHandle in saImmOiAugmentCcbInitialize [#2827]
> 
> Currently in saImmOiAugmentCcbInitialize, if OI augmented ccb handle
> initialization request does not result in OK or TRY_AGAIN, the
> privateOmHandle will be finalized and the private OM will be destroyed.
> 
> That is fine if the privateOmHandle is initialized in the current
> session. But if the privateOmHandle was already initialized in another
> previous callback, which means the privateOmHandle record was added to
> imma_oi_ccb_record of the corresponding client node, finalizing this
> privateOmHandle without cleaning-up imma_oi_ccb_record may call up
> unexpected behavior of the Implementer.
> 
> This patch checks if the privateOmHandle is the previous initialized
> privateOmHandle to avoid finalizing it. This privateOmHandle will be
> automatically finalized and cleanup-up in CCB abort or CCB apply.
> ---
>  src/imm/agent/imma_oi_api.cc | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/src/imm/agent/imma_oi_api.cc b/src/imm/agent/imma_oi_api.cc
> index 29fb39d..187e891 100644
> --- a/src/imm/agent/imma_oi_api.cc
> +++ b/src/imm/agent/imma_oi_api.cc
> @@ -4098,9 +4098,11 @@ done:
>  TRACE("ownerHandle:%llx", *ownerHandle);
> 
>} else if (privateOmHandle) {
> -osafassert(immsv_om_handle_finalize);
> -immsv_om_handle_finalize(
> -privateOmHandle); /* Also finalizes admo handles & ccb handles*/
> +if (privateOmHandle != ccb_oi_record->privateAugOmHandle) {
[Vu] Is it good to combine 02 `if` conditions into one? Such as:
else if (privateOmHandle && privateOmHandle !=
ccb_oi_record->privateAugOmHandle) {
osafassert(immsv_om_handle_finalize);
immsv_om_handle_finalize(privateOmHandle);
}

> +  osafassert(immsv_om_handle_finalize);
> +  immsv_om_handle_finalize(
> +  privateOmHandle); /* Also finalizes admo handles & ccb
handles*/
> +}
>}
> 
>if (out_evt) {
> --
> 2.7.4



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 0/5] Review Request for split-brain: select active SC from largest network partition V3 [#2795]

2018-04-13 Thread Gary Lee

Hi Hans

Yes, they could be declared const member functions, as they generally 
don't change anything in the object. The changes are actually in the KV 
store.


But I guess we could potentially mislead callers about the intentions of 
the functions though.


What do you think?

/Gary

On 13/04/18 16:16, Hans Nordebäck wrote:

Hi,



On 04/12/2018 04:15 PM, Gary Lee wrote:

Hi


On 12/04/18 23:34, Anders Widell wrote:

Ack with comments:

* There is no need to use "const" when passing function arguments by 
value. E.g. the argument "const uint64_t cluster_size" should be 
"uint64_t cluster_size".




[GL] Sure, but it doesn't do any harm, and would stop accidental 
assignments (that would be lost anyway).
[HansN] perhaps these functions should be const member functions? E.g. 
SaAisErrorT PromoteThisNode(bool graceful_takeover, uint64_t 
cluster_size) const;


* You assume that all nodes in the cluster have synchronized clocks 
(probably using NTP). Would it be possible to use an expiration time 
for the etcd key instead of writing a time stamp in the value, so 
that etcd automatically deletes the takeover request when it 
expires? That way we would not require synchronized clocks.




[GL] Good idea. I did question why I hadn't use TTL/lease once I had 
finished the ticket. :-) Will see what I can do!



regards,
Anders Widell

On 04/11/2018 09:35 AM, Gary Lee wrote:
Summary: split-brain: select active SC from largest network 
partition V3 [#2795]

Review request for Ticket(s): 2795
Peer Reviewer(s): Anders, Ravi, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2795
Base revision: 1c302a300e449e8a8527671fbd6c7f4e2b41e95d
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area   Impact y/n

  Docs    n
  Build system    n
  RPM/packaging   n
  Configuration files n
  Startup scripts n
  SAF services    n
  OpenSAF services    y
  Core libraries  y
  Samples n
  Tests   n
  Other   n


Comments (indicate scope for each "y" above):
-

*** Changes from V2: ***

fmd: made cluster_size atomic
fmd: wait 3 seconds before promoting to active, to allow topology 
events to be processed first

osaf: add check for existing takeover request, before trying to lock
etcdv3 plugin: reliablity improvements


revision c7bc78656d5de11f6147727bd8612274fb6e438f
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:16:46 +1000

rded: adapt to new Consensus API [#2795]

- add 3 new internal message:

RDE_MSG_NODE_UP
RDE_MSG_NODE_DOWN
RDE_MSG_TAKEOVER_REQUEST_CALLBACK

- subscribe to AMFND service up events to keep track of the number
   of cluster members

- listen for takeover requests in KV store



revision 4899e5d0f5abdff8f15eca8ad17d3b13b6a00393
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:16:18 +1000

fmd: adapt to new Consensus API [#2795]



revision 812a315af21df06b2f9fdcc3d8fd5b7bbad3e550
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:15:41 +1000

amfd: adapt to new Consensus API [#2795]



revision b8a37c1b8965826e5faffbfebc44a84bdb6433a1
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:14:39 +1000

osaf: add lock takeover request fuction [#2795]

- add create and set (if previous value matches) functions to 
KeyValue class
- add Consensus::MonitorTakeoverRequest() function for use by RDE 
to answer takeover requests
- add Consensus::CreateTakeoverRequest() - before a SC is promoted 
to active, it will
   create a takeover request in the KV store. An existing SC can 
reject the lock takeover




revision 955be872ba5887b1b521eac9f7732dd3f6afc593
Author:    Gary Lee 
Date:    Wed, 11 Apr 2018 17:13:45 +1000

osaf: extend API to include a create key and an enhanced set key 
function [#2795]


- add create_key function (fails if key already exists)
- add setkey_match_prev function (set value if previous value matches)
- add missing quotes
- add etcd3.plugin



Added Files:

  src/osaf/consensus/plugins/etcd3.plugin


Complete diffstat:
--
  src/amf/amfd/role.cc |   2 +-
  src/fm/fmd/fm_cb.h   |   2 +-
  src/fm/fmd/fm_main.cc    |  26 +-
  src/fm/fmd/fm_mds.cc |   2 +
  src/fm/fmd/fm_rda.cc |  27 +-
  src/osaf/consensus/consensus.cc  | 435 
++-

  src/osaf/consensus/consensus.h   |  55 +++-
  src/osaf/consensus/key_value.cc  | 105 +---
  src/osaf/consensus/key_value.h   |  19 +-
  src/osaf/consensus/plugins/etcd.plugin   |  86 +-
  src/osaf/consensus/plugins/etcd3.plugin  | 366 

Re: [devel] [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

2018-04-13 Thread Ravi Sekhar Reddy Konda
HI Hans,

 

The use case that we are addressing here is link flickering  when remote 
fencing is not enabled, Also remote fencing using Stonith is valid only in 
Virtualization environments. I have not tested using Stonith enabled as the use 
case is in the case where remote fencing is disabled.

 

Thanks,

Ravi 

 

 

From: Hans Nordebäck [mailto:hans.nordeb...@ericsson.com] 
Sent: Friday, April 13, 2018 1:10 AM
To: ravi-sekhar ; Anders Widell 

Cc: opensaf-devel@lists.sourceforge.net
Subject: SV: [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

 

Hi Ravi,

 

I think stonith, implemented in ticket #1859, handles this case. This 
"flickering" was one the (manual) tests verifying the added stonith support.

It is important to have a separate interface for stonith, to be able to perform 
the remote fencing, similar to use a back plane.

Have you tested with stonith enabled? 

 

/Regards HansN 

  _  

Från: ravi-sekhar mailto:ravisekhar.ko...@oracle.com"ravisekhar.ko...@oracle.com>
Skickat: den 12 april 2018 15:29:13
Till: Hans Nordebäck; Anders Widell
Kopia: HYPERLINK 
"mailto:opensaf-devel@lists.sourceforge.net"opensaf-devel@lists.sourceforge.net;
 ravi-sekhar
Ämne: [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833] 

 

---
 scripts/opensaf_reboot | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
index df65c26..b219c39 100644
--- a/scripts/opensaf_reboot
+++ b/scripts/opensaf_reboot
@@ -37,6 +37,9 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH
 if [ -f "$pkgsysconfdir/fmd.conf" ]; then
   . "$pkgsysconfdir/fmd.conf"
 fi
+if [ -f "$pkgsysconfdir/nid.conf" ]; then
+  . "$pkgsysconfdir/nid.conf"
+fi
 
 NODE_ID_FILE=$pkglocalstatedir/node_id
 
@@ -118,7 +121,17 @@ else
 # uncomment the following line if debugging errors that keep 
restarting the node
 # exit 0
 
+    # If the application is using different interface for cluster 
communication, please
+    # add your application specific isolation commands here
+
 logger -t "opensaf_reboot" "Rebooting local node; 
timeout=$OPENSAF_REBOOT_TIMEOUT"
+  
+    # Isolate the node
+    if [ "$MDS_TRANSPORT" = "TIPC" ]; then
+   tipc-config -bd eth:$TIPC_ETH_IF
+    else
+   $icmd pkill -STOP osafdtmd
+    fi
 
 # Start a reboot supervision background process. Note that a 
similar
 # supervision is also done in the opensaf_reboot() function in 
LEAP.
@@ -128,12 +141,6 @@ else
 (sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" > 
"/proc/sysrq-trigger") &
 fi
 
-   # Stop some important opensaf processes to prevent bad things 
from happening
-   $icmd pkill -STOP osafamfwd
-   $icmd pkill -STOP osafamfnd
-   $icmd pkill -STOP osafamfd
-   $icmd pkill -STOP osaffmd
-
 # Flush OpenSAF internal log server messages to disk.
 $bindir/osaflog --flush
 
-- 
1.9.1
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] dtm: Add --delete option to osaflog command for deleting log streams [#2837]

2018-04-13 Thread Anders Widell
Make it possible to delete log streams in the internal OpenSAF log server.  This
will free up resources for log streams that are no longer used, as well as make
it possible to create a new log stream with the same name but different
configuration options (max file size and number of backups).
---
 src/dtm/README  |  7 --
 src/dtm/tools/osaflog.cc| 49 +
 src/dtm/transport/log_server.cc | 19 ++--
 src/dtm/transport/log_server.h  |  2 ++
 4 files changed, 59 insertions(+), 18 deletions(-)

diff --git a/src/dtm/README b/src/dtm/README
index c671cfd39..430ff1950 100644
--- a/src/dtm/README
+++ b/src/dtm/README
@@ -185,8 +185,11 @@ Options:
   server to disk even when no LOGSTREAM
   is specified.
 --print   print the messages stored on disk for the
-  specified LOGSTREAM. This option is default
-  when no option is specified.
+  specified LOGSTREAM(s). This option is the
+  default when no option is specified.
+--delete  Delete the specified LOGSTREAM(s) by
+  removing allocated resources in the log
+  server. Does not delete log files from disk.
 --max-file-size=SIZE  Set the maximum size of the log file to
   SIZE bytes. The log file will be rotated
   when it exceeds this size. Suffixes k, M and
diff --git a/src/dtm/tools/osaflog.cc b/src/dtm/tools/osaflog.cc
index 572c68ad1..c5946eea7 100644
--- a/src/dtm/tools/osaflog.cc
+++ b/src/dtm/tools/osaflog.cc
@@ -47,6 +47,7 @@ bool Flush();
 base::UnixServerSocket* CreateSocket();
 uint64_t Random64Bits(uint64_t seed);
 bool PrettyPrint(const std::string& log_stream);
+bool Delete(const std::string& log_stream);
 std::list OpenLogFiles(const std::string& log_stream);
 std::string PathName(const std::string& log_stream, int suffix);
 uint64_t GetInode(int fd);
@@ -61,21 +62,23 @@ int main(int argc, char** argv) {
   struct option long_options[] = {{"max-file-size", required_argument, 0, 'm'},
   {"max-backups", required_argument, 0, 'b'},
   {"flush", no_argument, 0, 'f'},
-  {"print", required_argument, nullptr, 'p'},
+  {"print", no_argument, nullptr, 'p'},
+  {"delete", no_argument, nullptr, 'd'},
   {0, 0, 0, 0}};
 
   uint64_t max_file_size = 0;
   uint64_t max_backups = 0;
-  char *pretty_print_argument = NULL;
   int option = 0;
 
   int long_index = 0;
   bool flush_result =  true;
   bool print_result =  true;
+  bool delete_result =  true;
   bool max_file_size_result = true;
   bool number_of_backups_result = true;
   bool flush_set = false;
   bool pretty_print_set = false;
+  bool delete_set = false;
   bool max_file_size_set = false;
   bool max_backups_set = false;
 
@@ -88,10 +91,12 @@ int main(int argc, char** argv) {
long_options, _index)) != -1) {
 switch (option) {
  case 'p':
-   pretty_print_argument = optarg;
pretty_print_set = true;
flush_set = true;
  break;
+ case 'd':
+   delete_set = true;
+ break;
  case 'f':
flush_set = true;
  break;
@@ -115,12 +120,13 @@ int main(int argc, char** argv) {
 }
   }
 
-  if (argc - optind == 1) {
- flush_result = Flush();
- flush_set = false;
- print_result = PrettyPrint(argv[optind]);
- pretty_print_set = false;
-  } else if (argc - optind > 1) {
+  if (argc > optind && !pretty_print_set && !delete_set) {
+pretty_print_set = true;
+flush_set = true;
+  }
+
+  if ((argc <= optind && (pretty_print_set || delete_set)) ||
+  (pretty_print_set && delete_set)) {
  PrintUsage(argv[0]);
  exit(EXIT_FAILURE);
   }
@@ -129,7 +135,14 @@ int main(int argc, char** argv) {
  flush_result = Flush();
   }
   if (pretty_print_set == true) {
- print_result = PrettyPrint(pretty_print_argument);
+while (print_result && optind < argc) {
+  print_result = PrettyPrint(argv[optind++]);
+}
+  }
+  if (delete_set == true) {
+while (delete_result && optind < argc) {
+  delete_result = Delete(argv[optind++]);
+}
   }
   if (max_backups_set == true) {
  number_of_backups_result = NoOfBackupFiles(max_backups);
@@ -138,7 +151,7 @@ int main(int argc, char** argv) {
  max_file_size_result = MaxTraceFileSize(max_file_size);
   }
   if (flush_result && print_result && max_file_size_result &&
-  number_of_backups_result)
+  delete_result && number_of_backups_result)
  exit(EXIT_SUCCESS);
   exit(EXIT_FAILURE);
 }
@@ 

[devel] [PATCH 0/1] Review Request for dtm: Add --delete option to osaflog command for deleting log streams [#2837]

2018-04-13 Thread Anders Widell
Summary: dtm: Add --delete option to osaflog command for deleting log streams 
[#2837]
Review request for Ticket(s): 2837
Peer Reviewer(s): Ravi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2837
Base revision: b13a65123bfddcc6f5105fe340131e3bd8a5ac70
Personal repository: git://git.code.sf.net/u/anders-w/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesn
 OpenSAF servicesy
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-

revision ac0eada7b92e78ac7b804ed108a20d8d4442fb8f
Author: Anders Widell 
Date:   Fri, 13 Apr 2018 17:18:34 +0200

dtm: Add --delete option to osaflog command for deleting log streams [#2837]

Make it possible to delete log streams in the internal OpenSAF log server.  This
will free up resources for log streams that are no longer used, as well as make
it possible to create a new log stream with the same name but different
configuration options (max file size and number of backups).



Complete diffstat:
--
 src/dtm/README  |  7 --
 src/dtm/tools/osaflog.cc| 49 +
 src/dtm/transport/log_server.cc | 19 ++--
 src/dtm/transport/log_server.h  |  2 ++
 4 files changed, 59 insertions(+), 18 deletions(-)


Testing Commands:
-

Enable immnd trace
osaflog --set-file-size=1k osafimmnd
osaflog --delete osafimmnd


Testing, Expected Results:
--

The new setting (file-size=1k) shall be applied to the immnd trace log.


Conditions of Submission:
-

Ack from reviewer(s), or on 2018-04-19


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net

Re: [devel] [PATCH 0/6] Review Request for split-brain: select active SC from largest network partition V4 [#2795]

2018-04-13 Thread Anders Widell

Ack for the series.

regards,

Anders Widell


On 04/13/2018 01:50 PM, Gary Lee wrote:

Summary: split-brain: select active SC from largest network partition V4 [#2795]
Review request for Ticket(s): 2795
Peer Reviewer(s): Hans, Ravi, Anders
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2795
Base revision: b13a65123bfddcc6f5105fe340131e3bd8a5ac70
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area   Impact y/n

  Docsn
  Build systemn
  RPM/packaging   n
  Configuration files n
  Startup scripts n
  SAF servicesn
  OpenSAF servicesy
  Core libraries  y
  Samples n
  Tests   n
  Other   n


Comments (indicate scope for each "y" above):
-

Note: Patches 1 to 4 are identical to V3.

Patch 5 (osaf: remove timestamp from takeover request) is the diff
between V3 and V4.

revision c8ca1da49c61d207cc7c1fa706f35fb3950b492d
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

osaf: remove timestamp from takeover request [#2795]

* update create() in the plugins to include a timeout parameter
* remove timestamp from the takeover request and utilise the
   built-in timeout functionality in the KV store



revision 62f7dea83a9aa0a1391668521b73a736dfe25ce8
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

rded: adapt to new Consensus API [#2795]

- add 3 new internal message:

RDE_MSG_NODE_UP
RDE_MSG_NODE_DOWN
RDE_MSG_TAKEOVER_REQUEST_CALLBACK

- subscribe to AMFND service up events to keep track of the number
   of cluster members

- listen for takeover requests in KV store



revision 360cb1c47af846b910dfc15ebaa08e6659786d11
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

fmd: adapt to new Consensus API [#2795]



revision 55b98a2187fbdcf030497f9f51044a782639a53b
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

amfd: adapt to new Consensus API [#2795]



revision 86a62aafa4a20cb7036c18280706c58863edbca7
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

osaf: add lock takeover request fuction [#2795]

- add create and set (if previous value matches) functions to KeyValue class
- add Consensus::MonitorTakeoverRequest() function for use by RDE to answer 
takeover requests
- add Consensus::CreateTakeoverRequest() - before a SC is promoted to active, 
it will
   create a takeover request in the KV store. An existing SC can reject the 
lock takeover



revision a7f1aa369e3796dde15fd11e5c00df657c90a8f2
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

osaf: extend API to include a create key and an enhanced set key function 
[#2795]

- add create_key function (fails if key already exists)
- add setkey_match_prev function (set value if previous value matches)
- add missing quotes
- add etcd3.plugin



Added Files:

  src/osaf/consensus/plugins/etcd3.plugin


Complete diffstat:
--
  src/amf/amfd/role.cc |   2 +-
  src/fm/fmd/fm_cb.h   |   2 +-
  src/fm/fmd/fm_main.cc|  26 +--
  src/fm/fmd/fm_mds.cc |   2 +
  src/fm/fmd/fm_rda.cc |  27 ++-
  src/osaf/consensus/consensus.cc  | 390 ++-
  src/osaf/consensus/consensus.h   |  54 -
  src/osaf/consensus/key_value.cc  | 106 ++---
  src/osaf/consensus/key_value.h   |  20 +-
  src/osaf/consensus/plugins/etcd.plugin   |  88 ++-
  src/osaf/consensus/plugins/etcd3.plugin  | 389 ++
  src/osaf/consensus/plugins/sample.plugin |  69 +-
  src/rde/rded/rde_cb.h|  12 +-
  src/rde/rded/rde_main.cc |  75 --
  src/rde/rded/rde_mds.cc  |  39 +++-
  src/rde/rded/rde_rda.cc  |   2 +-
  src/rde/rded/role.cc |  46 ++--
  src/rde/rded/role.h  |   2 +-
  18 files changed, 1168 insertions(+), 183 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  n  n
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review 

Re: [devel] [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

2018-04-13 Thread Anders Widell

A question: why did you remove the "pkill -STOP osafamfwd" etc commands?

regards,

Anders Widell


On 04/12/2018 03:29 PM, ravi-sekhar wrote:

---
  scripts/opensaf_reboot | 19 +--
  1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
index df65c26..b219c39 100644
--- a/scripts/opensaf_reboot
+++ b/scripts/opensaf_reboot
@@ -37,6 +37,9 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH
  if [ -f "$pkgsysconfdir/fmd.conf" ]; then
. "$pkgsysconfdir/fmd.conf"
  fi
+if [ -f "$pkgsysconfdir/nid.conf" ]; then
+  . "$pkgsysconfdir/nid.conf"
+fi
  
  NODE_ID_FILE=$pkglocalstatedir/node_id
  
@@ -118,7 +121,17 @@ else

# uncomment the following line if debugging errors that keep 
restarting the node
# exit 0
  
+# If the application is using different interface for cluster communication, please

+# add your application specific isolation commands here
+
logger -t "opensaf_reboot" "Rebooting local node; 
timeout=$OPENSAF_REBOOT_TIMEOUT"
+
+# Isolate the node
+if [ "$MDS_TRANSPORT" = "TIPC" ]; then
+   tipc-config -bd eth:$TIPC_ETH_IF
+else
+   $icmd pkill -STOP osafdtmd
+fi
  
  		# Start a reboot supervision background process. Note that a similar

# supervision is also done in the opensaf_reboot() function in 
LEAP.
@@ -128,12 +141,6 @@ else
(sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" > 
"/proc/sysrq-trigger") &
fi
  
-		# Stop some important opensaf processes to prevent bad things from happening

-   $icmd pkill -STOP osafamfwd
-   $icmd pkill -STOP osafamfnd
-   $icmd pkill -STOP osafamfd
-   $icmd pkill -STOP osaffmd
-
# Flush OpenSAF internal log server messages to disk.
$bindir/osaflog --flush
  



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

2018-04-13 Thread Hans Nordebäck

Hi Ravi,


stonith is not only valid for virutalized environment, I assume stonith 
supports other e.g. ipmi in a legacy environment. The probability for 
"flickering" may be higher in a virtualized environment,


but for redundancy there should be two interfaces configured, which is 
the normal configuration in legacy. If the problem in this ticket is 
solved by using stonith I don't see a need for adding this patch.


BTW do this patch work when stonith is enabled?

/Regards HansN


On 04/13/2018 10:59 AM, Ravi Sekhar Reddy Konda wrote:


HI Hans,

The use case that we are addressing here is link flickering  when 
remote fencing is not enabled, Also remote fencing using Stonith is 
valid only in Virtualization environments. I have not tested using 
Stonith enabled as the use case is in the case where remote fencing is 
disabled.


Thanks,

Ravi

*From:*Hans Nordebäck [mailto:hans.nordeb...@ericsson.com]
*Sent:* Friday, April 13, 2018 1:10 AM
*To:* ravi-sekhar ; Anders Widell 


*Cc:* opensaf-devel@lists.sourceforge.net
*Subject:* SV: [PATCH 1/1] osaf: Isolate the node in the 
opensaf_reboot [#2833]


Hi Ravi,

I think stonith, implemented in ticket #1859, handles this case. This 
"flickering" was one the (manual) tests verifying the added stonith 
support.


It is important to have a separate interface for stonith, to be able 
to perform the remote fencing, similar to use a back plane.


Have you tested with stonith enabled?

/Regards HansN



*Från:*ravi-sekhar >

*Skickat:* den 12 april 2018 15:29:13
*Till:* Hans Nordebäck; Anders Widell
*Kopia:* opensaf-devel@lists.sourceforge.net 
; ravi-sekhar

*Ämne:* [PATCH 1/1] osaf: Isolate the node in the opensaf_reboot [#2833]

---
 scripts/opensaf_reboot | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/scripts/opensaf_reboot b/scripts/opensaf_reboot
index df65c26..b219c39 100644
--- a/scripts/opensaf_reboot
+++ b/scripts/opensaf_reboot
@@ -37,6 +37,9 @@ export LD_LIBRARY_PATH=$libdir:$LD_LIBRARY_PATH
 if [ -f "$pkgsysconfdir/fmd.conf" ]; then
   . "$pkgsysconfdir/fmd.conf"
 fi
+if [ -f "$pkgsysconfdir/nid.conf" ]; then
+  . "$pkgsysconfdir/nid.conf"
+fi

 NODE_ID_FILE=$pkglocalstatedir/node_id

@@ -118,7 +121,17 @@ else
 # uncomment the following line if debugging errors 
that keep restarting the node

 # exit 0

+    # If the application is using different interface for 
cluster communication, please

+    # add your application specific isolation commands here
+
 logger -t "opensaf_reboot" "Rebooting local node; 
timeout=$OPENSAF_REBOOT_TIMEOUT"

+
+    # Isolate the node
+    if [ "$MDS_TRANSPORT" = "TIPC" ]; then
+   tipc-config -bd eth:$TIPC_ETH_IF
+    else
+   $icmd pkill -STOP osafdtmd
+    fi

 # Start a reboot supervision background process. Note 
that a similar
 # supervision is also done in the opensaf_reboot() 
function in LEAP.

@@ -128,12 +141,6 @@ else
 (sleep "$OPENSAF_REBOOT_TIMEOUT"; echo -n "b" 
> "/proc/sysrq-trigger") &

 fi

-   # Stop some important opensaf processes to prevent bad 
things from happening

-   $icmd pkill -STOP osafamfwd
-   $icmd pkill -STOP osafamfnd
-   $icmd pkill -STOP osafamfd
-   $icmd pkill -STOP osaffmd
-
 # Flush OpenSAF internal log server messages to disk.
 $bindir/osaflog --flush

--
1.9.1



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 0/6] Review Request for split-brain: select active SC from largest network partition V4 [#2795]

2018-04-13 Thread Gary Lee

Sorry, it should read:

Note: Patches 1 to 5 are identical to V3.

Patch 6 (osaf: remove timestamp from takeover request) is the diff
between V3 and V4.


On 13/04/18 21:50, Gary Lee wrote:

Summary: split-brain: select active SC from largest network partition V4 [#2795]
Review request for Ticket(s): 2795
Peer Reviewer(s): Hans, Ravi, Anders
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2795
Base revision: b13a65123bfddcc6f5105fe340131e3bd8a5ac70
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area   Impact y/n

  Docsn
  Build systemn
  RPM/packaging   n
  Configuration files n
  Startup scripts n
  SAF servicesn
  OpenSAF servicesy
  Core libraries  y
  Samples n
  Tests   n
  Other   n


Comments (indicate scope for each "y" above):
-

Note: Patches 1 to 4 are identical to V3.

Patch 5 (osaf: remove timestamp from takeover request) is the diff
between V3 and V4.

revision c8ca1da49c61d207cc7c1fa706f35fb3950b492d
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

osaf: remove timestamp from takeover request [#2795]

* update create() in the plugins to include a timeout parameter
* remove timestamp from the takeover request and utilise the
   built-in timeout functionality in the KV store



revision 62f7dea83a9aa0a1391668521b73a736dfe25ce8
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

rded: adapt to new Consensus API [#2795]

- add 3 new internal message:

RDE_MSG_NODE_UP
RDE_MSG_NODE_DOWN
RDE_MSG_TAKEOVER_REQUEST_CALLBACK

- subscribe to AMFND service up events to keep track of the number
   of cluster members

- listen for takeover requests in KV store



revision 360cb1c47af846b910dfc15ebaa08e6659786d11
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

fmd: adapt to new Consensus API [#2795]



revision 55b98a2187fbdcf030497f9f51044a782639a53b
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

amfd: adapt to new Consensus API [#2795]



revision 86a62aafa4a20cb7036c18280706c58863edbca7
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

osaf: add lock takeover request fuction [#2795]

- add create and set (if previous value matches) functions to KeyValue class
- add Consensus::MonitorTakeoverRequest() function for use by RDE to answer 
takeover requests
- add Consensus::CreateTakeoverRequest() - before a SC is promoted to active, 
it will
   create a takeover request in the KV store. An existing SC can reject the 
lock takeover



revision a7f1aa369e3796dde15fd11e5c00df657c90a8f2
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

osaf: extend API to include a create key and an enhanced set key function 
[#2795]

- add create_key function (fails if key already exists)
- add setkey_match_prev function (set value if previous value matches)
- add missing quotes
- add etcd3.plugin



Added Files:

  src/osaf/consensus/plugins/etcd3.plugin


Complete diffstat:
--
  src/amf/amfd/role.cc |   2 +-
  src/fm/fmd/fm_cb.h   |   2 +-
  src/fm/fmd/fm_main.cc|  26 +--
  src/fm/fmd/fm_mds.cc |   2 +
  src/fm/fmd/fm_rda.cc |  27 ++-
  src/osaf/consensus/consensus.cc  | 390 ++-
  src/osaf/consensus/consensus.h   |  54 -
  src/osaf/consensus/key_value.cc  | 106 ++---
  src/osaf/consensus/key_value.h   |  20 +-
  src/osaf/consensus/plugins/etcd.plugin   |  88 ++-
  src/osaf/consensus/plugins/etcd3.plugin  | 389 ++
  src/osaf/consensus/plugins/sample.plugin |  69 +-
  src/rde/rded/rde_cb.h|  12 +-
  src/rde/rded/rde_main.cc |  75 --
  src/rde/rded/rde_mds.cc  |  39 +++-
  src/rde/rded/rde_rda.cc  |   2 +-
  src/rde/rded/role.cc |  46 ++--
  src/rde/rded/role.h  |   2 +-
  18 files changed, 1168 insertions(+), 183 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  n  n
powerpc n  n
powerpc64 

[devel] [PATCH 1/6] osaf: extend API to include a create key and an enhanced set key function [#2795]

2018-04-13 Thread Gary Lee
- add create_key function (fails if key already exists)
- add setkey_match_prev function (set value if previous value matches)
- add missing quotes
- add etcd3.plugin
---
 src/osaf/consensus/plugins/etcd.plugin   |  86 +++-
 src/osaf/consensus/plugins/etcd3.plugin  | 366 +++
 src/osaf/consensus/plugins/sample.plugin |  67 +-
 3 files changed, 501 insertions(+), 18 deletions(-)
 create mode 100644 src/osaf/consensus/plugins/etcd3.plugin

diff --git a/src/osaf/consensus/plugins/etcd.plugin 
b/src/osaf/consensus/plugins/etcd.plugin
index 586059b32..6ed85ac92 100644
--- a/src/osaf/consensus/plugins/etcd.plugin
+++ b/src/osaf/consensus/plugins/etcd.plugin
@@ -29,7 +29,7 @@ readonly etcd_timeout="5s"
 #   0 - success,  is echoed to stdout
 #   non-zero - failure
 get() {
-  readonly key=$1
+  readonly key="$1"
 
   if value=$(etcdctl $etcd_options --timeout $etcd_timeout get 
"$directory$key" 2>&1)
   then
@@ -49,8 +49,8 @@ get() {
 #   0 - success
 #   non-zero - failure
 setkey() {
-  readonly key=$1
-  readonly value=$2
+  readonly key="$1"
+  readonly value="$2"
 
   if etcdctl $etcd_options --timeout $etcd_timeout set "$directory$key" \
 "$value" >/dev/null
@@ -61,6 +61,58 @@ setkey() {
   fi
 }
 
+# create
+#   create  and set to  in key-value store. Fails if the key
+#   already exists
+# params:
+#   $1 - 
+#   $2 - 
+# returns:
+#   0 - success
+#   1 - already exists
+#   2 or above - other failure
+create_key() {
+  readonly key="$1"
+  readonly value="$2"
+
+  if output=$(etcdctl $etcd_options --timeout $etcd_timeout mk 
"$directory$key" \
+"$value" 2>&1)
+  then
+return 0
+  else
+if echo $output | grep "already exists"
+then
+  return 1
+fi
+  fi
+
+  return 2
+}
+
+# set
+#   set  to  in key-value store, if the existing value matches
+#   
+# params:
+#   $1 - 
+#   $2 - 
+#   $3 - 
+# returns:
+#   0 - success
+#   non-zero - failure
+setkey_match_prev() {
+  readonly key="$1"
+  readonly value="$2"
+  readonly prev="$3"
+
+  if etcdctl $etcd_options --timeout $etcd_timeout set "$directory$key" \
+"$value" --swap-with-value "$prev" >/dev/null
+  then
+return 0
+  else
+return 1
+  fi
+}
+
 # erase
 #   erase  in key-value store
 # params:
@@ -69,7 +121,7 @@ setkey() {
 #   0 - success
 #   non-zero - failure
 erase() {
-  readonly key=$1
+  readonly key="$1"
 
   if etcdctl $etcd_options --timeout $etcd_timeout \
 rm "$directory$key" >/dev/null 2>&1
@@ -90,8 +142,8 @@ erase() {
 #   2 or above - other failure
 # NOTE: if lock is already acquired by , then timeout is extended
 lock() {
-  readonly owner=$1
-  readonly timeout=$2
+  readonly owner="$1"
+  readonly timeout="$2"
 
   if etcdctl $etcd_options --timeout $etcd_timeout \
 mk "$directory$keyname" "$owner" \
@@ -145,7 +197,7 @@ lock_owner() {
 #   2 or above - other failure
 #
 unlock() {
-  readonly owner=$1
+  readonly owner="$1"
   readonly forced=${2:-false}
 
   if [ "$forced" = false ]; then
@@ -185,7 +237,7 @@ unlock() {
 #   0 - success,  is echoed to stdout
 #   non-zero - failure
 watch() {
-  readonly key=$1
+  readonly key="$1"
 
   if value=$(etcdctl $etcd_options --timeout $etcd_timeout \
 watch "$directory$key" 2>&1)
@@ -216,6 +268,22 @@ case "$1" in
 setkey "$2" "$3"
 exit $?
 ;;
+  set_if_prev)
+if [ "$#" -ne 4 ]; then
+  echo "Usage: $0 set   "
+  exit 1
+fi
+setkey_match_prev "$2" "$3" "$4"
+exit $?
+;;
+  create)
+if [ "$#" -ne 3 ]; then
+  echo "Usage: $0 create  "
+  exit 1
+fi
+create_key "$2" "$3"
+exit $?
+;;
   erase)
 if [ "$#" -ne 2 ]; then
   echo "Usage: $0 erase "
@@ -269,7 +337,7 @@ case "$1" in
 exit $?
 ;;
   *)
-echo "Usage: $0 {get|set|erase|lock|unlock|lock_owner|watch|watch_lock}"
+echo "Usage: $0 
{get|set|create|set_if_prev|erase|lock|unlock|lock_owner|watch|watch_lock}"
 ;;
 esac
 
diff --git a/src/osaf/consensus/plugins/etcd3.plugin 
b/src/osaf/consensus/plugins/etcd3.plugin
new file mode 100644
index 0..451440567
--- /dev/null
+++ b/src/osaf/consensus/plugins/etcd3.plugin
@@ -0,0 +1,366 @@
+#!/usr/bin/env bash
+#  -*- OpenSAF  -*-
+#
+# (C) Copyright 2018 Ericsson AB 2018 - All Rights Reserved.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+# or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
+# under the GNU Lesser General Public License Version 2.1, February 1999.
+# The complete license can be accessed from the following location:
+# http://opensource.org/licenses/lgpl-license.php
+# See the Copying file included with the OpenSAF distribution for full
+# licensing terms.
+#
+# Please note: this API is subject to change and may be modified
+# in a future version of OpenSAF. Future API versions may not be
+# backward compatible. This plugin may need to be adapted.
+
+readonly 

[devel] [PATCH 3/6] amfd: adapt to new Consensus API [#2795]

2018-04-13 Thread Gary Lee
---
 src/amf/amfd/role.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amf/amfd/role.cc b/src/amf/amfd/role.cc
index c8aa9cf1f..790983ee7 100644
--- a/src/amf/amfd/role.cc
+++ b/src/amf/amfd/role.cc
@@ -1217,7 +1217,7 @@ uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb) {
   osaf_mutex_unlock_ordie(_reinit_mutex);
 
   Consensus consensus_service;
-  rc = consensus_service.PromoteThisNode();
+  rc = consensus_service.PromoteThisNode(false, 0);
   if (rc != SA_AIS_OK) {
 LOG_ER("Unable to set active controller in consensus service");
 osafassert(false);
-- 
2.14.1


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 4/6] fmd: adapt to new Consensus API [#2795]

2018-04-13 Thread Gary Lee
---
 src/fm/fmd/fm_cb.h|  2 +-
 src/fm/fmd/fm_main.cc | 26 +-
 src/fm/fmd/fm_mds.cc  |  2 ++
 src/fm/fmd/fm_rda.cc  | 27 ++-
 4 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/src/fm/fmd/fm_cb.h b/src/fm/fmd/fm_cb.h
index cfa50d883..010ab735a 100644
--- a/src/fm/fmd/fm_cb.h
+++ b/src/fm/fmd/fm_cb.h
@@ -100,7 +100,7 @@ struct FM_CB {
 
   std::atomic peer_sc_up{false};
   bool well_connected{false};
-  uint64_t cluster_size{};
+  std::atomic cluster_size{};
   struct timespec last_well_connected{};
   struct timespec node_isolation_timeout{};
   SaClmHandleT clm_hdl{};
diff --git a/src/fm/fmd/fm_main.cc b/src/fm/fmd/fm_main.cc
index 73c9b9ccd..3371ec5e8 100644
--- a/src/fm/fmd/fm_main.cc
+++ b/src/fm/fmd/fm_main.cc
@@ -551,21 +551,12 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT 
*fm_mbx_evt) {
* trigerred quicker than the node_down event
* has been received.
*/
-  if (fm_cb->role == PCS_RDA_STANDBY) {
-const std::string current_active =
-consensus_service.CurrentActive();
-if (current_active.compare(osaf_extended_name_borrow(
-_cb->peer_clm_node_name)) == 0) {
-  // update consensus service, before fencing old active controller
-  consensus_service.DemoteCurrentActive();
-}
-  }
 
   if (fm_cb->use_remote_fencing) {
 if (fm_cb->peer_node_terminated == false) {
   // if peer_sc_up is true then
   // the node has come up already
-  if (fm_cb->peer_sc_up == false && fm_cb->immnd_down == true) {
+  if (consensus_service.IsEnabled() == false) {
 opensaf_reboot(fm_cb->peer_node_id,
(char *)fm_cb->peer_clm_node_name.value,
"Received Node Down for peer controller");
@@ -580,8 +571,7 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT 
*fm_mbx_evt) {
 fm_cb->mutex_.Lock();
 peer_node_name = fm_cb->peer_node_name;
 fm_cb->mutex_.Unlock();
-opensaf_reboot(fm_cb->peer_node_id,
-   peer_node_name.c_str(),
+opensaf_reboot(fm_cb->peer_node_id, peer_node_name.c_str(),
"Received Node Down for peer controller");
   }
   if (!((fm_cb->role == PCS_RDA_ACTIVE) &&
@@ -632,12 +622,6 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT 
*fm_mbx_evt) {
 }
 
 Consensus consensus_service;
-const std::string current_active = consensus_service.CurrentActive();
-if (current_active.compare(
-osaf_extended_name_borrow(_cb->peer_clm_node_name)) == 0) {
-  // update consensus service, before fencing old active controller
-  consensus_service.DemoteCurrentActive();
-}
 
 /* Now. Try resetting other blade */
 fm_cb->role = PCS_RDA_ACTIVE;
@@ -645,7 +629,8 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT 
*fm_mbx_evt) {
 LOG_NO("Reseting peer controller node id: %x",
unsigned(fm_cb->peer_node_id));
 if (fm_cb->use_remote_fencing) {
-  if (fm_cb->peer_node_terminated == false) {
+  if (fm_cb->peer_node_terminated == false &&
+  consensus_service.IsEnabled() == false) {
 opensaf_reboot(fm_cb->peer_node_id,
(char *)fm_cb->peer_clm_node_name.value,
"Received Node Down for peer controller");
@@ -658,8 +643,7 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT 
*fm_mbx_evt) {
   fm_cb->mutex_.Lock();
   peer_node_name = fm_cb->peer_node_name;
   fm_cb->mutex_.Unlock();
-  opensaf_reboot(fm_cb->peer_node_id,
- peer_node_name.c_str(),
+  opensaf_reboot(fm_cb->peer_node_id, peer_node_name.c_str(),
  "Received Node Down for Active peer");
 }
 fm_rda_set_role(fm_cb, PCS_RDA_ACTIVE);
diff --git a/src/fm/fmd/fm_mds.cc b/src/fm/fmd/fm_mds.cc
index 277a357d2..60db5dab1 100644
--- a/src/fm/fmd/fm_mds.cc
+++ b/src/fm/fmd/fm_mds.cc
@@ -373,6 +373,7 @@ static uint32_t fm_mds_node_evt(FM_CB *cb,
 case NCSMDS_NODE_DOWN:
   if (cb->cluster_size != 0) {
 --cb->cluster_size;
+TRACE("cluster_size %llu", (unsigned long long)cb->cluster_size);
 TRACE("Node down event for node id %x, cluster size is now: %llu",
   node_evt->node_id, (unsigned long long)cb->cluster_size);
 check_for_node_isolation(cb);
@@ -397,6 +398,7 @@ static uint32_t fm_mds_node_evt(FM_CB *cb,
 
 case NCSMDS_NODE_UP:
   ++cb->cluster_size;
+  TRACE("cluster_size %llu", (unsigned long long)cb->cluster_size);
   TRACE("Node up event for node id %x, cluster size is now: %llu",
 node_evt->node_id, (unsigned 

[devel] [PATCH 5/6] rded: adapt to new Consensus API [#2795]

2018-04-13 Thread Gary Lee
- add 3 new internal message:

RDE_MSG_NODE_UP
RDE_MSG_NODE_DOWN
RDE_MSG_TAKEOVER_REQUEST_CALLBACK

- subscribe to AMFND service up events to keep track of the number
  of cluster members

- listen for takeover requests in KV store
---
 src/rde/rded/rde_cb.h| 12 ++--
 src/rde/rded/rde_main.cc | 75 ++--
 src/rde/rded/rde_mds.cc  | 39 +
 src/rde/rded/rde_rda.cc  |  2 +-
 src/rde/rded/role.cc | 46 -
 src/rde/rded/role.h  |  2 +-
 6 files changed, 137 insertions(+), 39 deletions(-)

diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h
index fc100849a..f5ad689c3 100644
--- a/src/rde/rded/rde_cb.h
+++ b/src/rde/rded/rde_cb.h
@@ -19,12 +19,13 @@
 #define RDE_RDED_RDE_CB_H_
 
 #include 
+#include 
 #include "base/osaf_utility.h"
 #include "mds/mds_papi.h"
 #include "rde/agent/rda_papi.h"
+#include "rde/common/rde_rda_common.h"
 #include "rde/rded/rde_amf.h"
 #include "rde/rded/rde_rda.h"
-#include "rde/common/rde_rda_common.h"
 
 /*
  **  RDE_CONTROL_BLOCK
@@ -39,7 +40,9 @@ struct RDE_CONTROL_BLOCK {
   bool task_terminate;
   RDE_RDA_CB rde_rda_cb;
   RDE_AMF_CB rde_amf_cb;
-  bool monitor_lock_thread_running;
+  bool monitor_lock_thread_running{false};
+  bool monitor_takeover_req_thread_running{false};
+  std::set cluster_members{};
 };
 
 enum RDE_MSG_TYPE {
@@ -47,7 +50,10 @@ enum RDE_MSG_TYPE {
   RDE_MSG_PEER_DOWN = 2,
   RDE_MSG_PEER_INFO_REQ = 3,
   RDE_MSG_PEER_INFO_RESP = 4,
-  RDE_MSG_NEW_ACTIVE_CALLBACK = 5
+  RDE_MSG_NEW_ACTIVE_CALLBACK = 5,
+  RDE_MSG_NODE_UP = 6,
+  RDE_MSG_NODE_DOWN = 7,
+  RDE_MSG_TAKEOVER_REQUEST_CALLBACK = 8
 };
 
 struct rde_peer_info {
diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index 78e7256c1..b395312d6 100644
--- a/src/rde/rded/rde_main.cc
+++ b/src/rde/rded/rde_main.cc
@@ -17,6 +17,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -28,17 +29,16 @@
 #include 
 #include 
 #include 
-#include "osaf/consensus/consensus.h"
+#include "base/conf.h"
 #include "base/daemon.h"
 #include "base/logtrace.h"
+#include "base/ncs_main_papi.h"
 #include "base/osaf_poll.h"
 #include "mds/mds_papi.h"
-#include "base/ncs_main_papi.h"
 #include "nid/agent/nid_api.h"
-#include 
+#include "osaf/consensus/consensus.h"
 #include "rde/rded/rde_cb.h"
 #include "rde/rded/role.h"
-#include "base/conf.h"
 
 #define RDA_MAX_CLIENTS 32
 
@@ -47,13 +47,15 @@ enum { FD_TERM = 0, FD_AMF = 1, FD_MBX, FD_RDA_SERVER, 
FD_CLIENT_START };
 static void SendPeerInfoResp(MDS_DEST mds_dest);
 static void CheckForSplitBrain(const rde_msg *msg);
 
-const char *rde_msg_name[] = {
-"-",
-"RDE_MSG_PEER_UP(1)",
-"RDE_MSG_PEER_DOWN(2)",
-"RDE_MSG_PEER_INFO_REQ(3)",
-"RDE_MSG_PEER_INFO_RESP(4)",
-};
+const char *rde_msg_name[] = {"-",
+  "RDE_MSG_PEER_UP(1)",
+  "RDE_MSG_PEER_DOWN(2)",
+  "RDE_MSG_PEER_INFO_REQ(3)",
+  "RDE_MSG_PEER_INFO_RESP(4)",
+  "RDE_MSG_NEW_ACTIVE_CALLBACK(5)"
+  "RDE_MSG_NODE_UP(6)",
+  "RDE_MSG_NODE_DOWN(7)",
+  "RDE_MSG_TAKEOVER_REQUEST_CALLBACK(8)"};
 
 static RDE_CONTROL_BLOCK _rde_cb;
 static RDE_CONTROL_BLOCK *rde_cb = &_rde_cb;
@@ -127,14 +129,16 @@ static void handle_mbx_event() {
   LOG_NO("New active controller notification from consensus service");
 
   if (role->role() == PCS_RDA_ACTIVE) {
-if (my_node.compare(active_controller) != 0) {
+if (my_node.compare(active_controller) != 0 &&
+active_controller.empty() == false) {
   // we are meant to be active, but consensus service doesn't think so
   LOG_WA("Role does not match consensus service. New controller: %s",
-active_controller.c_str());
+ active_controller.c_str());
   if (consensus_service.IsRemoteFencingEnabled() == false) {
 LOG_ER("Probable split-brain. Rebooting this node");
+
 opensaf_reboot(0, nullptr,
-  "Split-brain detected by consensus service");
+   "Split-brain detected by consensus service");
   }
 }
 
@@ -144,6 +148,44 @@ static void handle_mbx_event() {
   }
   break;
 }
+case RDE_MSG_NODE_UP:
+  rde_cb->cluster_members.insert(msg->fr_node_id);
+  TRACE("cluster_size %zu", rde_cb->cluster_members.size());
+  break;
+case RDE_MSG_NODE_DOWN:
+  rde_cb->cluster_members.erase(msg->fr_node_id);
+  TRACE("cluster_size %zu", rde_cb->cluster_members.size());
+  break;
+case RDE_MSG_TAKEOVER_REQUEST_CALLBACK: {
+  rde_cb->monitor_takeover_req_thread_running = false;
+
+  if (role->role() == PCS_RDA_ACTIVE) {
+LOG_NO("Received takeover request. Our network size is %zu",
+   

[devel] [PATCH 6/6] osaf: remove timestamp from takeover request [#2795]

2018-04-13 Thread Gary Lee
* update create() in the plugins to include a timeout parameter
* remove timestamp from the takeover request and utilise the
  built-in timeout functionality in the KV store
---
 src/osaf/consensus/consensus.cc  | 165 ---
 src/osaf/consensus/consensus.h   |  21 ++--
 src/osaf/consensus/key_value.cc  |   5 +-
 src/osaf/consensus/key_value.h   |   3 +-
 src/osaf/consensus/plugins/etcd.plugin   |  12 ++-
 src/osaf/consensus/plugins/etcd3.plugin  |  39 ++--
 src/osaf/consensus/plugins/sample.plugin |  10 +-
 7 files changed, 124 insertions(+), 131 deletions(-)

diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc
index f2245dd01..a6248136d 100644
--- a/src/osaf/consensus/consensus.cc
+++ b/src/osaf/consensus/consensus.cc
@@ -25,7 +25,7 @@
 const std::string Consensus::kTakeoverRequestKeyname = "takeover_request";
 
 SaAisErrorT Consensus::PromoteThisNode(const bool graceful_takeover,
-const uint64_t cluster_size) {
+   const uint64_t cluster_size) {
   TRACE_ENTER();
   SaAisErrorT rc;
 
@@ -89,18 +89,8 @@ SaAisErrorT Consensus::PromoteThisNode(const bool 
graceful_takeover,
 }
 
 if (take_over_request_created == true) {
-  SaAisErrorT rc1;
-
-  // remove takeover request
-  rc1 = KeyValue::Erase(kTakeoverRequestKeyname);
-  retries = 0;
-  while (rc1 != SA_AIS_OK && retries < kMaxRetry) {
-++retries;
-std::this_thread::sleep_for(kSleepInterval);
-rc1 = KeyValue::Erase(kTakeoverRequestKeyname);
-  }
-
-  if (rc1 != SA_AIS_OK) {
+  rc = RemoveTakeoverRequest();
+  if (rc != SA_AIS_OK) {
 LOG_WA("Could not remove takeover request");
   }
 }
@@ -115,6 +105,23 @@ SaAisErrorT Consensus::PromoteThisNode(const bool 
graceful_takeover,
   return rc;
 }
 
+SaAisErrorT Consensus::RemoveTakeoverRequest() {
+  TRACE_ENTER();
+  SaAisErrorT rc;
+  uint32_t retries = 0;
+
+  // remove takeover request
+  rc = KeyValue::Erase(kTakeoverRequestKeyname);
+  retries = 0;
+  while (rc != SA_AIS_OK && retries < kMaxRetry) {
+++retries;
+std::this_thread::sleep_for(kSleepInterval);
+rc = KeyValue::Erase(kTakeoverRequestKeyname);
+  }
+
+  return rc;
+}
+
 SaAisErrorT Consensus::Demote(const std::string& node) {
   TRACE_ENTER();
   if (use_consensus_ == false) {
@@ -273,6 +280,8 @@ void Consensus::MonitorTakeoverRequest(ConsensusCallback 
callback,
 }
 
 void Consensus::CheckForExistingTakeoverRequest() {
+  TRACE_ENTER();
+
   SaAisErrorT rc;
   std::vector tokens;
   rc = ReadTakeoverRequest(tokens);
@@ -281,22 +290,17 @@ void Consensus::CheckForExistingTakeoverRequest() {
 return;
   }
 
-  // get expiration
-  const uint64_t expiration_timestamp = strtoull(
-  tokens[static_cast(TakeoverElements::TIMESTAMP)].c_str(),
-  0, 10);
+  LOG_NO("A takeover request is in progress");
 
-  // wait until expiration is over
-  int64_t expiration = expiration_timestamp - CurrentTime();
-  if (expiration > 0 && expiration <= kTakeoverValidTime) {
-// @todo check if the takeover request has been deleted already?
-LOG_NO("A takeover request is in progress"
-   " (expiring in %" PRId64 " seconds)",
-   expiration);
-std::chrono::seconds sleep_duration(expiration);
-std::this_thread::sleep_for(sleep_duration);
-  } else {
-LOG_WA("Invalid expiration time (%" PRIu64 ")", expiration_timestamp);
+  uint32_t retries = 0;
+  // wait up to approximately 10 seconds, or until the takeover request is gone
+  rc = ReadTakeoverRequest(tokens);
+  while (rc == SA_AIS_OK &&
+ retries < kMaxTakeoverRetry) {
+++retries;
+TRACE("Takeover request still present");
+std::this_thread::sleep_for(kSleepInterval);
+rc = ReadTakeoverRequest(tokens);
   }
 }
 
@@ -306,24 +310,23 @@ SaAisErrorT Consensus::CreateTakeoverRequest(const 
std::string& current_owner,
   TRACE_ENTER();
 
   // Format of takeover request:
-  // "expiration_timecurrent_ownerproposed_owner
+  // "current_ownerproposed_owner
   // proposed_owner_cluster_sizestatus"
   // status := [UNDEFINED, NEW, REJECTED, ACCEPTED]
 
   std::string takeover_request;
-  // request to expire in 20 seconds
-  uint64_t timestamp = CurrentTime() + kTakeoverValidTime;
 
   takeover_request =
-  std::to_string(timestamp) + " " + current_owner + " " +
-  base::Conf::NodeName() + " " + std::to_string(cluster_size) + " " +
+  current_owner + " " + base::Conf::NodeName() + " " +
+  std::to_string(cluster_size) + " " +
   TakeoverStateStr[static_cast(TakeoverState::NEW)];
 
   TRACE("Takeover request: \"%s\"", takeover_request.c_str());
 
   SaAisErrorT rc;
   uint32_t retries = 0;
-  rc = KeyValue::Create(kTakeoverRequestKeyname, takeover_request);
+  rc = KeyValue::Create(kTakeoverRequestKeyname, takeover_request,
+kTakeoverValidTime);
   while (rc == SA_AIS_ERR_FAILED_OPERATION && 

[devel] [PATCH 2/6] osaf: add lock takeover request fuction [#2795]

2018-04-13 Thread Gary Lee
- add create and set (if previous value matches) functions to KeyValue class
- add Consensus::MonitorTakeoverRequest() function for use by RDE to answer 
takeover requests
- add Consensus::CreateTakeoverRequest() - before a SC is promoted to active, 
it will
  create a takeover request in the KV store. An existing SC can reject the lock 
takeover
---
 src/osaf/consensus/consensus.cc | 435 ++--
 src/osaf/consensus/consensus.h  |  55 -
 src/osaf/consensus/key_value.cc | 105 +++---
 src/osaf/consensus/key_value.h  |  19 +-
 4 files changed, 511 insertions(+), 103 deletions(-)

diff --git a/src/osaf/consensus/consensus.cc b/src/osaf/consensus/consensus.cc
index cc04f3518..f2245dd01 100644
--- a/src/osaf/consensus/consensus.cc
+++ b/src/osaf/consensus/consensus.cc
@@ -15,13 +15,17 @@
 #include "osaf/consensus/consensus.h"
 #include 
 #include 
+#include 
 #include 
 #include "base/conf.h"
 #include "base/getenv.h"
 #include "base/logtrace.h"
 #include "base/ncssysf_def.h"
 
-SaAisErrorT Consensus::PromoteThisNode() {
+const std::string Consensus::kTakeoverRequestKeyname = "takeover_request";
+
+SaAisErrorT Consensus::PromoteThisNode(const bool graceful_takeover,
+const uint64_t cluster_size) {
   TRACE_ENTER();
   SaAisErrorT rc;
 
@@ -29,6 +33,10 @@ SaAisErrorT Consensus::PromoteThisNode() {
 return SA_AIS_OK;
   }
 
+  // check if there is an existing takeover requests, we cannot
+  // attempt to lock until that is complete
+  CheckForExistingTakeoverRequest();
+
   uint32_t retries = 0;
   rc = KeyValue::Lock(base::Conf::NodeName(), kLockTimeout);
   while (rc == SA_AIS_ERR_TRY_AGAIN && retries < kMaxRetry) {
@@ -39,33 +47,30 @@ SaAisErrorT Consensus::PromoteThisNode() {
   }
 
   if (rc == SA_AIS_ERR_EXIST) {
+// there's a chance the lock has been released since the lock attempt
 // get the current active controller
-std::string current_active("");
-retries = 0;
-rc = KeyValue::LockOwner(current_active);
-while (rc != SA_AIS_OK && retries < kMaxRetry) {
-  ++retries;
-  std::this_thread::sleep_for(kSleepInterval);
-  rc = KeyValue::LockOwner(current_active);
-}
-if (rc != SA_AIS_OK) {
-  LOG_ER("Failed to get current lock owner. Will attempt to lock anyway");
+std::string current_active = CurrentActive();
+
+if (current_active.empty() == true) {
+  LOG_WA("Failed to get current lock owner. Will attempt to lock anyway");
 }
 
+bool take_over_request_created = false;
 LOG_NO("Current active controller is %s", current_active.c_str());
 
-// there's a chance the lock has been released since the lock attempt
 if (current_active.empty() == false) {
-  // remove current active controller's lock and fence it
-  retries = 0;
-  rc = KeyValue::Unlock(current_active);
-  while (rc == SA_AIS_ERR_TRY_AGAIN && retries < kMaxRetry) {
-LOG_IN("Trying to unlock");
-++retries;
-std::this_thread::sleep_for(kSleepInterval);
-rc = KeyValue::Unlock(current_active);
+  if (graceful_takeover == true) {
+rc = CreateTakeoverRequest(current_active, base::Conf::NodeName(),
+   cluster_size);
+if (rc != SA_AIS_OK) {
+  LOG_WA("Takeover request failed (%d)", rc);
+  return rc;
+}
+take_over_request_created = true;
   }
 
+  // remove current active controller's lock and fence it
+  rc = Demote(current_active);
   if (rc == SA_AIS_OK) {
 FenceNode(current_active);
   } else {
@@ -82,6 +87,23 @@ SaAisErrorT Consensus::PromoteThisNode() {
   std::this_thread::sleep_for(kSleepInterval);
   rc = KeyValue::Lock(base::Conf::NodeName(), kLockTimeout);
 }
+
+if (take_over_request_created == true) {
+  SaAisErrorT rc1;
+
+  // remove takeover request
+  rc1 = KeyValue::Erase(kTakeoverRequestKeyname);
+  retries = 0;
+  while (rc1 != SA_AIS_OK && retries < kMaxRetry) {
+++retries;
+std::this_thread::sleep_for(kSleepInterval);
+rc1 = KeyValue::Erase(kTakeoverRequestKeyname);
+  }
+
+  if (rc1 != SA_AIS_OK) {
+LOG_WA("Could not remove takeover request");
+  }
+}
   }
 
   if (rc == SA_AIS_OK) {
@@ -93,43 +115,23 @@ SaAisErrorT Consensus::PromoteThisNode() {
   return rc;
 }
 
-SaAisErrorT Consensus::Demote(const std::string& node = "") {
+SaAisErrorT Consensus::Demote(const std::string& node) {
   TRACE_ENTER();
   if (use_consensus_ == false) {
 return SA_AIS_OK;
   }
 
-  SaAisErrorT rc = SA_AIS_ERR_FAILED_OPERATION;
-  uint32_t retries = 0;
-
-  // check current active node
-  std::string current_active;
-  rc = KeyValue::LockOwner(current_active);
-  while (rc != SA_AIS_OK && retries < kMaxRetry) {
-++retries;
-std::this_thread::sleep_for(kSleepInterval);
-rc = KeyValue::LockOwner(current_active);
-  }
-
-  if (rc != SA_AIS_OK) {
-LOG_ER("Failed to get 

[devel] [PATCH 0/6] Review Request for split-brain: select active SC from largest network partition V4 [#2795]

2018-04-13 Thread Gary Lee
Summary: split-brain: select active SC from largest network partition V4 [#2795]
Review request for Ticket(s): 2795
Peer Reviewer(s): Hans, Ravi, Anders 
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2795
Base revision: b13a65123bfddcc6f5105fe340131e3bd8a5ac70
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesn
 OpenSAF servicesy 
 Core libraries  y 
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-

Note: Patches 1 to 4 are identical to V3.

Patch 5 (osaf: remove timestamp from takeover request) is the diff
between V3 and V4.

revision c8ca1da49c61d207cc7c1fa706f35fb3950b492d
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

osaf: remove timestamp from takeover request [#2795]

* update create() in the plugins to include a timeout parameter
* remove timestamp from the takeover request and utilise the
  built-in timeout functionality in the KV store



revision 62f7dea83a9aa0a1391668521b73a736dfe25ce8
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

rded: adapt to new Consensus API [#2795]

- add 3 new internal message:

RDE_MSG_NODE_UP
RDE_MSG_NODE_DOWN
RDE_MSG_TAKEOVER_REQUEST_CALLBACK

- subscribe to AMFND service up events to keep track of the number
  of cluster members

- listen for takeover requests in KV store



revision 360cb1c47af846b910dfc15ebaa08e6659786d11
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

fmd: adapt to new Consensus API [#2795]



revision 55b98a2187fbdcf030497f9f51044a782639a53b
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

amfd: adapt to new Consensus API [#2795]



revision 86a62aafa4a20cb7036c18280706c58863edbca7
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

osaf: add lock takeover request fuction [#2795]

- add create and set (if previous value matches) functions to KeyValue class
- add Consensus::MonitorTakeoverRequest() function for use by RDE to answer 
takeover requests
- add Consensus::CreateTakeoverRequest() - before a SC is promoted to active, 
it will
  create a takeover request in the KV store. An existing SC can reject the lock 
takeover



revision a7f1aa369e3796dde15fd11e5c00df657c90a8f2
Author: Gary Lee 
Date:   Fri, 13 Apr 2018 21:40:24 +1000

osaf: extend API to include a create key and an enhanced set key function 
[#2795]

- add create_key function (fails if key already exists)
- add setkey_match_prev function (set value if previous value matches)
- add missing quotes
- add etcd3.plugin



Added Files:

 src/osaf/consensus/plugins/etcd3.plugin


Complete diffstat:
--
 src/amf/amfd/role.cc |   2 +-
 src/fm/fmd/fm_cb.h   |   2 +-
 src/fm/fmd/fm_main.cc|  26 +--
 src/fm/fmd/fm_mds.cc |   2 +
 src/fm/fmd/fm_rda.cc |  27 ++-
 src/osaf/consensus/consensus.cc  | 390 ++-
 src/osaf/consensus/consensus.h   |  54 -
 src/osaf/consensus/key_value.cc  | 106 ++---
 src/osaf/consensus/key_value.h   |  20 +-
 src/osaf/consensus/plugins/etcd.plugin   |  88 ++-
 src/osaf/consensus/plugins/etcd3.plugin  | 389 ++
 src/osaf/consensus/plugins/sample.plugin |  69 +-
 src/rde/rded/rde_cb.h|  12 +-
 src/rde/rded/rde_main.cc |  75 --
 src/rde/rded/rde_mds.cc  |  39 +++-
 src/rde/rded/rde_rda.cc  |   2 +-
 src/rde/rded/role.cc |  46 ++--
 src/rde/rded/role.h  |   2 +-
 18 files changed, 1168 insertions(+), 183 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  n  n
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is 

[devel] [PATCH 1/2] base: Add support for setting UnixServerSocket file permissions [#2838]

2018-04-13 Thread Anders Widell
---
 src/base/unix_server_socket.cc | 14 +++---
 src/base/unix_server_socket.h  |  7 +--
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/src/base/unix_server_socket.cc b/src/base/unix_server_socket.cc
index 620efce9f..8a5290d24 100644
--- a/src/base/unix_server_socket.cc
+++ b/src/base/unix_server_socket.cc
@@ -23,8 +23,9 @@
 
 namespace base {
 
-UnixServerSocket::UnixServerSocket(const std::string& path, Mode mode)
-: UnixSocket{path, mode} {}
+UnixServerSocket::UnixServerSocket(const std::string& path, Mode mode,
+   mode_t permissions)
+: UnixSocket{path, mode}, permissions_{permissions} {}
 
 UnixServerSocket::UnixServerSocket(const sockaddr_un& addr, socklen_t addrlen,
Mode mode)
@@ -46,7 +47,14 @@ bool UnixServerSocket::OpenHook(int sock) {
 close(tmp_sock);
 if (connect_result != 0 && connect_errno == ECONNREFUSED) Unlink();
   }
-  return bind(sock, addr(), addrlen()) == 0;
+  int bind_result = bind(sock, addr(), addrlen());
+  int chmod_result = 0;
+  if (bind_result == 0 && permissions_ != 0 && !IsAbstract()) {
+do {
+  chmod_result = chmod(path(), permissions_);
+} while (chmod_result == -1 && errno == EINTR);
+  }
+  return bind_result == 0 && chmod_result == 0;
 }
 
 void UnixServerSocket::CloseHook() { Unlink(); }
diff --git a/src/base/unix_server_socket.h b/src/base/unix_server_socket.h
index 6e108a1e1..e2cebbd5e 100644
--- a/src/base/unix_server_socket.h
+++ b/src/base/unix_server_socket.h
@@ -18,6 +18,7 @@
 #ifndef BASE_UNIX_SERVER_SOCKET_H_
 #define BASE_UNIX_SERVER_SOCKET_H_
 
+#include 
 #include 
 #include "base/unix_socket.h"
 
@@ -29,8 +30,9 @@ class UnixServerSocket : public UnixSocket {
   // Set the path name for this server socket. Note that this call does not
   // create the socket - you need to call Send() or Recv() for that to happen.
   // Abstract addresses are supported by passing '\0' as the first byte in @a
-  // path.
-  UnixServerSocket(const std::string& path, Mode mode);
+  // path. If @a permissions is not zero, change the file permissions to this
+  // value after the socket has been created.
+  UnixServerSocket(const std::string& path, Mode mode, mode_t permissions = 0);
   // Set the socket address for this server socket. Note that this call does 
not
   // create the socket - you need to call Send() or Recv() for that to happen.
   UnixServerSocket(const sockaddr_un& addr, socklen_t addrlen, Mode mode);
@@ -44,6 +46,7 @@ class UnixServerSocket : public UnixSocket {
 
  private:
   void Unlink();
+  mode_t permissions_;
 };
 
 }  // namespace base
-- 
2.13.3


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0/2] Review Request for dtm: Make the osaftransportd socket world-writable [#2838]

2018-04-13 Thread Anders Widell
Summary: base: Add support for setting UnixServerSocket file permissions [#2838]
Review request for Ticket(s): 2838
Peer Reviewer(s): Ravi
Pull request to: 
Affected branch(es): develop
Development branch: ticket-2838
Base revision: b13a65123bfddcc6f5105fe340131e3bd8a5ac70
Personal repository: git://git.code.sf.net/u/anders-w/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesn
 OpenSAF servicesy
 Core libraries  y
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-

revision 5dcaf67087e852a9682d00c481351fd51faef268
Author: Anders Widell 
Date:   Fri, 13 Apr 2018 15:15:16 +0200

dtm: Make the osaftransportd socket world-writable [#2838]

Allow OpenSAF agent libraries running in application processes to send trace
messages to osaftransportd, regardless of what user-id and group-id the
application process is running with.



revision 2713c82f25d9dfdaa988819ef0fde2a0942369f9
Author: Anders Widell 
Date:   Fri, 13 Apr 2018 13:55:50 +0200

base: Add support for setting UnixServerSocket file permissions [#2838]



Complete diffstat:
--
 src/base/unix_server_socket.cc  | 14 +++---
 src/base/unix_server_socket.h   |  7 +--
 src/dtm/transport/log_server.cc |  3 ++-
 3 files changed, 18 insertions(+), 6 deletions(-)


Testing Commands:
-

ls -l /var/lib/opensaf


Testing, Expected Results:
--

/var/lib/opensaf/osaf_log.sock shall be world-writable.


Conditions of Submission:
-

Ack from reviewer(s), or on 2018-04-19


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 2/2] dtm: Make the osaftransportd socket world-writable [#2838]

2018-04-13 Thread Anders Widell
Allow OpenSAF agent libraries running in application processes to send trace
messages to osaftransportd, regardless of what user-id and group-id the
application process is running with.
---
 src/dtm/transport/log_server.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/dtm/transport/log_server.cc b/src/dtm/transport/log_server.cc
index f7ccdf765..5a1121cc3 100644
--- a/src/dtm/transport/log_server.cc
+++ b/src/dtm/transport/log_server.cc
@@ -34,7 +34,8 @@ LogServer::LogServer(int term_fd)
 : term_fd_{term_fd},
   max_backups_{9},
   max_file_size_{5 * 1024 * 1024},
-  log_socket_{Osaflog::kServerSocketPath, base::UnixSocket::kNonblocking},
+  log_socket_{Osaflog::kServerSocketPath, base::UnixSocket::kNonblocking,
+  0777},
   log_streams_{},
   current_stream_{new LogStream{"mds.log", 1, 5 * 1024 * 1024}},
   no_of_log_streams_{1} {
-- 
2.13.3


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel