date:20180119

Re: [devel] [PATCH 1/1] imm: removed extra OpenSafSmfConfig in xml [#2762]

2018-01-19 Thread Lennart Lund

Ack

Thanks
Lennart

> -Original Message-
> From: Srinivas [mailto:srinivas.mangip...@oracle.com]
> Sent: den 19 januari 2018 09:48
> To: Lennart Lund ; Rafael Odzakow
> ; ravisekhar.ko...@oracle.com;
> syam.tall...@oracle.com
> Cc: opensaf-devel@lists.sourceforge.net; Srinivas
> 
> Subject: [PATCH 1/1] imm: removed extra OpenSafSmfConfig in xml [#2762]
> 
> ---
>  src/smf/config/smfsv_classes.xml | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/smf/config/smfsv_classes.xml
> b/src/smf/config/smfsv_classes.xml
> index 06a6f52..6423246 100644
> --- a/src/smf/config/smfsv_classes.xml
> +++ b/src/smf/config/smfsv_classes.xml
> @@ -349,7 +349,7 @@
>  smfNodeRebootCmd
>  SA_STRING_T
>  SA_CONFIG
> -SA_WRITABLEOpenSafSmfConfig
> +SA_WRITABLE
>  
>  
>  smfInactivatePbeDuringUpgrade
> --
> 2.7.4


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Re: [devel] [PATCH 0/5] Review Request for Add support for split brain prevention V2 [#64]

2018-01-19 Thread Anders Widell


Ack from me also, with comments:

* I think my major comment is that I had originally envisioned that you 
would use the "etcdctl lock" command (in the V3 API) and that the active 
SC would hold the lock for as long as it is active. The lock would not 
be needed for reading. Your approach of only creating the lock when you 
wish to change active controller could be fine though. However, you 
shouldn't need the lock for reading - only when you wish to update the 
active controller. Regarding the Watch: I think you should have the 
watch on the lock instead of (or in addition to) the data you are 
protecting. At a fail-over, the old standby would acquire the lock and 
wait a while to give the old active enough time to detect that a 
fail-over is pending (it notices that the lock has been created). The 
old active would then be able to remove the lock and prevent the 
fail-over from happening. We can look into this in the next iteration 
(next release) and keep it as it is for now.


* You ought to utilise the test-and-set functionality in the etcd v2 
protocol, in the cases where you are changing the value of a key and 
know (think you know) the previous value. unlock is an example of this, 
fail-over probably also. We could add this later but I think you should 
at least extend the plugin API already now, so that it takes a "previous 
value" parameter where applicable.


* You have a try-again loop when you acquire the lock, but if the 
maximum number of retries have been done then you continue as if the 
lock was acquired successfully. It doesn't seem to be correct?


* It is not obvious (to me) that no more Watch thread can be created 
simultaneously. Could you add a flag that keeps track of if there is an 
existing thread, and add assert statements checking that there is no 
existing thread when you call MonitorActive() to create a new one?


* As Hans points out below, it seems that it is also possible that the 
watch thread could disappear silently in some error case.


* As already pointed out by Hans, we should store our keys in some 
directory in the etcd database, so that the same database can be used 
for other purposes as well. I think the plugin (shell script) should add 
a directory prefix to the key.


Also see one comment inline below, marked AndersW>

regards,
Anders Widell


On 01/19/2018 01:47 PM, Hans Nordebäck wrote:

Hi Gary, ack, with some comments:


- One test case is missing in the implementation sketch, when using 
TIPC, if some TIPC bearers are disabled and as etcd/raft is using TCP, 
we should verify this works.


AndersW> Split-brain should not be possible, however the current 
algorithm will not guarantee that the active SC will be in the largest 
partition in case TIPC connectivity is broken (partitioned). So it could 
happen that a single isolated node (from TIPC point of view) is the 
active SC, even though a larger TIPC partition exists. I think this 
could be solved by writing the size of the cluster into the lock. An 
existing active SC shall reject a fail-over if it is being initiated 
from a node in a smaller partition.




- The Lock uses TTL, I think the holder of the lock needs to update 
the TTL until is unlocked, or there is a

chance that some parts of the code will run with the lock not taken.

- what if consensus_service.MonitorActive threadFunction gets 
connnection timeout, this thread seems to only log a message and returns.


- sleep(3) in MonitorCallback should be changed.

- Perhaps we should have a directory structure in etcd  instead of 
using /, e.g /opensaf/clm/opensaf_active_controller and perhaps we 
should use only one key/value pair with TTL and use test and set.


- In Consensus::FenceNode, UINT_MAX is used as argument to 
opensaf_reboot, do this work?



/Thanks HansN


On 01/19/2018 12:39 PM, Gary Lee wrote:

Summary: Add support for split brain prevention V2 [#64]
Review request for Ticket(s): 64
Peer Reviewer(s): Anders, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-64
Base revision: e1e0d2c0dc45a5ca7789f19d58dde0a41ed19354
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area   Impact y/n

  Docs    y
  Build system    y
  RPM/packaging   n
  Configuration files n
  Startup scripts n
  SAF services    y
  OpenSAF services    y
  Core libraries  n
  Samples n
  Tests   n
  Other   n


Comments (indicate scope for each "y" above):
-

Changes from V1:

* fixed most cppcheck/cpplint errors in osaf/consensus
* disable self-fencing if remote-fencing is enabled
* reboot active controller if it loses quorum (write access)
* better error handling

revision 7ab1280243311058a6848c4da2b9738ab73dc861
Author:    Gary Lee

Re: [devel] [PATCH 0/5] Review Request for Add support for split brain prevention V2 [#64]

2018-01-19 Thread Hans Nordebäck


Hi Gary, ack, with some comments:


- One test case is missing in the implementation sketch, when using 
TIPC, if some TIPC bearers are disabled and as etcd/raft is using TCP, 
we should verify this works.


- The Lock uses TTL, I think the holder of the lock needs to update the 
TTL until is unlocked, or there is a

chance that some parts of the code will run with the lock not taken.

- what if consensus_service.MonitorActive threadFunction gets 
connnection timeout, this thread seems to only log a message and returns.


- sleep(3) in MonitorCallback should be changed.

- Perhaps we should have a directory structure in etcd  instead of using 
/, e.g /opensaf/clm/opensaf_active_controller and perhaps we should use 
only one key/value pair with TTL and use test and set.


- In Consensus::FenceNode, UINT_MAX is used as argument to 
opensaf_reboot, do this work?



/Thanks HansN


On 01/19/2018 12:39 PM, Gary Lee wrote:

Summary: Add support for split brain prevention V2 [#64]
Review request for Ticket(s): 64
Peer Reviewer(s): Anders, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-64
Base revision: e1e0d2c0dc45a5ca7789f19d58dde0a41ed19354
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area   Impact y/n

  Docsy
  Build systemy
  RPM/packaging   n
  Configuration files n
  Startup scripts n
  SAF servicesy
  OpenSAF servicesy
  Core libraries  n
  Samples n
  Tests   n
  Other   n


Comments (indicate scope for each "y" above):
-

Changes from V1:

* fixed most cppcheck/cpplint errors in osaf/consensus
* disable self-fencing if remote-fencing is enabled
* reboot active controller if it loses quorum (write access)
* better error handling

revision 7ab1280243311058a6848c4da2b9738ab73dc861
Author: Gary Lee 
Date:   Fri, 19 Jan 2018 22:29:42 +1100

doc: update README and makefiles [#64]



revision 625928304450399548c353473dea631a44aeecbe
Author: Gary Lee 
Date:   Fri, 19 Jan 2018 22:28:59 +1100

fmd: update consensus service during controller failover [#64]



revision 42539da74893d5ce246242cf8d33c7875ea50fe8
Author: Gary Lee 
Date:   Fri, 19 Jan 2018 22:26:19 +1100

amfd: update consensus service when performing SI swap [#64]

When a node goes down and split-brain prevention is enabled,
check that we still have write access to the consensus service.
If not and fencing is disabled, reboot the node to prevent
split brain.



revision b6fcd4bede291ba5996b838c3fb784842648581e
Author: Gary Lee 
Date:   Fri, 19 Jan 2018 22:23:14 +1100

rded: add split brain prevention support [#64]

* consult with consensus service before promoting node to active
* add watch thread and self-fence if it detects active controller
   has been changed (if remote fencing is disabled)



revision 656b670a91a10e385604c98366239a28cde925f7
Author: Gary Lee 
Date:   Fri, 19 Jan 2018 22:22:53 +1100

osaf: add consensus API [#64]



Added Files:

  src/osaf/consensus/Makefile
  src/osaf/consensus/keyvalue.cc
  src/osaf/consensus/keyvalue.h
  src/osaf/consensus/plugins/etcd.plugin
  src/osaf/consensus/plugins/sample.plugin
  src/osaf/consensus/service.cc
  src/osaf/consensus/service.h


Complete diffstat:
--
  00-README.conf   |  56 +++
  Makefile.am  |   4 +-
  src/amf/amfd/ndproc.cc   |  12 +-
  src/amf/amfd/osaf-amfd.in|   4 +
  src/amf/amfd/role.cc |  35 +++-
  src/fm/Makefile.am   |   1 +
  src/fm/fmd/fm_main.cc|  37 -
  src/fm/fmd/fm_rda.cc |  28 
  src/fm/fmd/fmd.conf  |   8 +
  src/osaf/Makefile.am |   8 +-
  src/osaf/consensus/Makefile  |  18 ++
  src/osaf/consensus/keyvalue.cc   | 174 +++
  src/osaf/consensus/keyvalue.h|  57 +++
  src/osaf/consensus/plugins/etcd.plugin   | 220 
  src/osaf/consensus/plugins/sample.plugin | 163 ++
  src/osaf/consensus/service.cc| 277 +++
  src/osaf/consensus/service.h |  75 +
  src/rde/Makefile.am  |   3 +-
  src/rde/rded/osaf-rded.in|   4 +
  src/rde/rded/rde_cb.h|   3 +-
  src/rde/rded/rde_main.cc |  35 +++-
  src/rde/rded/role.cc |  47 +-
  src/rde/rded/role.h  |   2 +
  23 files changed, 1248 insertions(+), 23 deletions(-)


Testing

[devel] [PATCH 4/5] fmd: update consensus service during controller failover [#64]

2018-01-19 Thread Gary Lee

---
 src/fm/Makefile.am|  1 +
 src/fm/fmd/fm_main.cc | 37 +++--
 src/fm/fmd/fm_rda.cc  | 28 
 src/fm/fmd/fmd.conf   |  8 
 4 files changed, 68 insertions(+), 6 deletions(-)

diff --git a/src/fm/Makefile.am b/src/fm/Makefile.am
index d48a9146c..0f254b94f 100644
--- a/src/fm/Makefile.am
+++ b/src/fm/Makefile.am
@@ -49,4 +49,5 @@ bin_osaffmd_SOURCES = \
 bin_osaffmd_LDADD = \
lib/libSaAmf.la \
lib/libSaClm.la \
+   lib/libosaf_common.la \
lib/libopensaf_core.la
diff --git a/src/fm/fmd/fm_main.cc b/src/fm/fmd/fm_main.cc
index db8395ee7..74517b3b5 100644
--- a/src/fm/fmd/fm_main.cc
+++ b/src/fm/fmd/fm_main.cc
@@ -28,7 +28,8 @@ This file contains the main() routine for FM.
 #include 
 #include "base/daemon.h"
 #include "base/logtrace.h"
-
+#include "base/osaf_extended_name.h"
+#include "osaf/consensus/service.h"
 #include "nid/agent/nid_api.h"
 #include "fm.h"
 #include "base/osaf_time.h"
@@ -553,6 +554,8 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT 
*fm_mbx_evt)
TRACE_ENTER();
switch (fm_mbx_evt->evt_code) {
case FM_EVT_NODE_DOWN:
+   {
+   Consensus consensus_service;
LOG_NO("Current role: %s", role_string[fm_cb->role]);
if ((fm_mbx_evt->node_id == fm_cb->peer_node_id)) {
/* Check whether node(AMF) initialization is done */
@@ -593,15 +596,27 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT 
*fm_mbx_evt)
 * trigerred quicker than the node_down event
 * has been received.
 */
+   if (fm_cb->role == PCS_RDA_STANDBY) {
+   const std::string current_active = 
consensus_service.CurrentActive();
+   if (current_active.compare(
+   
osaf_extended_name_borrow(_cb->peer_node_name)) == 0) {
+   // update consensus service, 
before fencing old active controller
+   
consensus_service.DemoteCurrentActive();
+   }
+   }
+
if (fm_cb->use_remote_fencing) {
if (fm_cb->peer_node_terminated ==
false) {
+   // if peer_sc_up is true then
+   // the node has come up already
+   if (fm_cb->peer_sc_up == false 
&& fm_cb->immnd_down == true) {
opensaf_reboot(
-   fm_cb->peer_node_id,
-   (char *)fm_cb
-   ->peer_clm_node_name
-   .value,
-   "Received Node Down for 
peer controller");
+   fm_cb->peer_node_id,
+   (char *)fm_cb
+   
->peer_clm_node_name.value,
+   "Received Node Down for 
peer controller");
+   }
} else {
LOG_NO(
"Peer node %s is 
terminated, fencing will not be performed",
@@ -624,6 +639,7 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT 
*fm_mbx_evt)
}
}
}
+   }
break;
 
case FM_EVT_PEER_UP:
@@ -659,6 +675,15 @@ static void fm_mbx_msg_handler(FM_CB *fm_cb, FM_EVT 
*fm_mbx_evt)
0, NULL,
"Failover occurred, but this node is not 
yet ready");
}
+
+   Consensus consensus_service;
+   const std::string current_active = 
consensus_service.CurrentActive();
+   if (current_active.compare(
+   
osaf_extended_name_borrow(_cb->peer_node_name)) == 0) {
+   // update consensus service, before fencing old 
active controller
+   consensus_service.DemoteCurrentActive();
+   }
+
/* Now. Try resetting other blade */
fm_cb->role = PCS_RDA_ACTIVE;
 
diff --git a/src/fm/fmd/fm_rda.cc b/src/fm/fmd/fm_rda.cc
index

[devel] [PATCH 5/5] doc: update README and makefiles [#64]

2018-01-19 Thread Gary Lee

---
 00-README.conf   | 56 
 Makefile.am  |  4 +++-
 src/osaf/Makefile.am |  8 ++--
 3 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/00-README.conf b/00-README.conf
index 5de286225..007872d98 100644
--- a/00-README.conf
+++ b/00-README.conf
@@ -654,3 +654,59 @@ on each node, except on the active node. This file 
indicates that a cluster
 reboot is in progress and all nodes needs to delay their start, this to give
 the active a lead.
 
+Split-Brain Prevention with Consensus Service
+=
+
+OpenSAF implements split-brain prevention by utilizing a consensus service that
+implements a replicated state machine. The consensus service uses quorum to
+prevent state changes in network partitions that don't include more than half
+of the nodes in the cluster. In network partitions containing
+half of the nodes or less, the state is either read-only or unavailable.
+Thus, it is important to keep in mind that the consensus service by itself
+does not prevent the presence of multiple active system
+controller nodes. In the case when the network has been split up into 
partitions
+and the current active system controller no longer has write access to the
+state machine, OpenSAF relies on some additional mechanism like fencing to
+ensure that the current active system controller disappears before a new
+active system controller can be chosen among the nodes that do have write
+access to the replicated state machine. If fencing is not available, the old
+active system controller can detect that it has lost write
+access and step down from its active role.
+
+The consensus service can be implemented, for example, using the RAFT 
algorithm.
+When using RAFT, there are mainly three possibilities:
+
+1. The RAFT servers run on the same nodes as OpenSAF
+2. The RAFT servers run on a subset of the OpenSAF nodes
+3. The RAFT servers run on an external set of nodes, outside of the
+   OpenSAF cluster
+
+The consensus services relies on a plugin to communicate with a distributed
+key-value store database. This plugin must still function according to the
+API when the network has split up into partitions.
+The plugin interface is defined in src/osaf/consensus/plugins/sample.plugin
+
+An implementation for etcdv2 is provided. It assumes etcd is installed
+and configured on all system controllers. In clusters where
+there are only two system controllers, it is highly recommended to
+configure etcd so it runs on at least three nodes to facilitate
+a majority vote with failure tolerance.
+
+Other implementations of a distributed key-value store service
+can be used, provided as it implements the interface documented in 
sample.plugin
+
+To enable split-brain prevention, edit fmd.conf and update accordingly:
+
+export FMS_SPLIT_BRAIN_PREVENTION=1
+export FMS_KEYVALUE_STORE_PLUGIN_CMD=/usr/local/lib/opensaf/etcd.plugin
+
+As discussed, the key-value store does not need to reside on the same nodes
+as OpenSAF. In such a configuration, an appropriate plugin that handles
+the communication with a remotely located key-value store, must be provided.
+
+If remote fencing is enabled, then it will be used to fence a node that the
+consensus service believes should not be active. Otherwise, rded/amfd will
+initiate a 'self-fencing' by rebooting the node, if it determines the node
+should no longer be active according to the consensus service, to prevent
+a split-brain situation.
+
diff --git a/Makefile.am b/Makefile.am
index bcfd844cd..57c2585a8 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -159,7 +159,9 @@ dist_osaf_execbin_SCRIPTS += \
$(top_srcdir)/scripts/opensaf_reboot \
$(top_srcdir)/scripts/opensaf_sc_active \
$(top_srcdir)/scripts/opensaf_scale_out \
-   $(top_srcdir)/scripts/plm_scale_out
+   $(top_srcdir)/scripts/plm_scale_out \
+   $(top_srcdir)/src/osaf/consensus/plugins/etcd.plugin
+# TODO remove above line before pushing
 
 include $(top_srcdir)/src/ais/Makefile.am
 include $(top_srcdir)/src/base/Makefile.am
diff --git a/src/osaf/Makefile.am b/src/osaf/Makefile.am
index 05b78c988..10bbe427b 100644
--- a/src/osaf/Makefile.am
+++ b/src/osaf/Makefile.am
@@ -16,7 +16,9 @@
 
 noinst_HEADERS += \
src/osaf/immutil/immutil.h \
-   src/osaf/saflog/saflog.h
+   src/osaf/saflog/saflog.h \
+   src/osaf/consensus/keyvalue.h \
+   src/osaf/consensus/service.h
 
 pkglib_LTLIBRARIES += lib/libosaf_common.la
 
@@ -33,7 +35,9 @@ lib_libosaf_common_la_LDFLAGS = \
 
 lib_libosaf_common_la_SOURCES = \
src/osaf/immutil/immutil.c \
-   src/osaf/saflog/saflog.c
+   src/osaf/saflog/saflog.c \
+   src/osaf/consensus/keyvalue.cc \
+   src/osaf/consensus/service.cc
 
 nodist_EXTRA_lib_libosaf_common_la_SOURCES = dummy.cc
 
-- 
2.14.1


--
Check out the vibrant tech community on

[devel] [PATCH 3/5] amfd: update consensus service when performing SI swap [#64]

2018-01-19 Thread Gary Lee

When a node goes down and split-brain prevention is enabled,
check that we still have write access to the consensus service.
If not and fencing is disabled, reboot the node to prevent
split brain.
---
 src/amf/amfd/ndproc.cc| 12 +++-
 src/amf/amfd/osaf-amfd.in |  4 
 src/amf/amfd/role.cc  | 35 ++-
 3 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/src/amf/amfd/ndproc.cc b/src/amf/amfd/ndproc.cc
index 0c6316627..df68b3dbf 100644
--- a/src/amf/amfd/ndproc.cc
+++ b/src/amf/amfd/ndproc.cc
@@ -32,8 +32,8 @@
  */
 
 #include "osaf/immutil/immutil.h"
+#include "osaf/consensus/service.h"
 #include "base/logtrace.h"
-
 #include "amf/amfd/amfd.h"
 #include "amf/amfd/imm.h"
 #include "amf/amfd/cluster.h"
@@ -1221,5 +1221,15 @@ void avd_node_failover(AVD_AVND *node) {
   avd_pg_node_csi_del_all(avd_cb, node);
   avd_node_down_mw_susi_failover(avd_cb, node);
   avd_node_down_appl_susi_failover(avd_cb, node);
+
+  Consensus consensus_service;
+  if (consensus_service.IsRemoteFencingEnabled() == false &&
+  consensus_service.IsWritable() == false) {
+// remote fencing is disabled and we have lost write access
+// reboot this node to prevent split brain
+opensaf_reboot(0, nullptr,
+  "Quorum lost. Rebooting this node to prevent split-brain");
+  }
+
   TRACE_LEAVE();
 }
diff --git a/src/amf/amfd/osaf-amfd.in b/src/amf/amfd/osaf-amfd.in
index 45c5ab9e4..26a77ef52 100644
--- a/src/amf/amfd/osaf-amfd.in
+++ b/src/amf/amfd/osaf-amfd.in
@@ -28,6 +28,10 @@ else
. $pkgsysconfdir/amfd.conf
 fi 
 
+if [ -f "$pkgsysconfdir/fmd.conf" ]; then
+  . "$pkgsysconfdir/fmd.conf"
+fi
+
 binary=$pkglibdir/$osafprog
 pidfile=$pkgpiddir/$osafprog.pid
 lockfile=$lockdir/$initscript
diff --git a/src/amf/amfd/role.cc b/src/amf/amfd/role.cc
index 865d89d94..aac90eb13 100644
--- a/src/amf/amfd/role.cc
+++ b/src/amf/amfd/role.cc
@@ -38,6 +38,7 @@
 #include "osaf/immutil/immutil.h"
 #include "base/logtrace.h"
 #include "rde/agent/rda_papi.h"
+#include "osaf/consensus/service.h"
 
 #include "amf/amfd/amfd.h"
 #include "amf/amfd/imm.h"
@@ -1085,6 +1086,12 @@ uint32_t amfd_switch_actv_qsd(AVD_CL_CB *cb) {
 avd_d2n_msg_dequeue(cb);
   }
 
+  Consensus consensus_service;
+  rc = consensus_service.DemoteThisNode();
+  if (rc != SA_AIS_OK) {
+LOG_ER("Failed to demote this node from consensus service");
+  }
+
   TRACE_LEAVE();
   return NCSCC_RC_SUCCESS;
 }
@@ -1209,13 +1216,21 @@ uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb) {
   cb->avail_state_avd = SA_AMF_HA_ACTIVE;
   osaf_mutex_unlock_ordie(_reinit_mutex);
 
+  Consensus consensus_service;
+  rc = consensus_service.BeginActivePromotion();
+  if (rc != SA_AIS_OK) {
+LOG_ER("Unable to set active controller in consensus service");
+osafassert(false);
+  }
+
   /* Declare this standby as Active. Set Vdest role role */
   if (NCSCC_RC_SUCCESS !=
   (status = avd_mds_set_vdest_role(cb, SA_AMF_HA_ACTIVE))) {
 LOG_ER("Switch Standby --> Active FAILED, MDS role set failed");
 cb->swap_switch = false;
 avd_d2d_chg_role_rsp(cb, NCSCC_RC_FAILURE, SA_AMF_HA_ACTIVE);
-return NCSCC_RC_FAILURE;
+status = NCSCC_RC_FAILURE;
+goto done;
   }
 
   /* Time to send fail-over messages to all the AVND's */
@@ -1240,7 +1255,8 @@ uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb) {
 } else {
   cb->swap_switch = false;
   avd_d2d_chg_role_rsp(cb, NCSCC_RC_FAILURE, SA_AMF_HA_ACTIVE);
-  return NCSCC_RC_FAILURE;
+  status = NCSCC_RC_FAILURE;
+  goto done;
 }
   }
 
@@ -1259,7 +1275,8 @@ uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb) {
  in avd_imm_reinit_bg_thread.*/
 } else {
   avd_d2d_chg_role_rsp(cb, NCSCC_RC_FAILURE, SA_AMF_HA_ACTIVE);
-  return NCSCC_RC_FAILURE;
+  status = NCSCC_RC_FAILURE;
+  goto done;
 }
   } else
 osaf_mutex_unlock_ordie(_reinit_mutex);
@@ -1274,7 +1291,8 @@ uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb) {
 LOG_ER("Switch Standby --> Active, clm track start failed");
 Fifo::queue(new ClmTrackStart());
 avd_d2d_chg_role_rsp(cb, NCSCC_RC_FAILURE, SA_AMF_HA_ACTIVE);
-return NCSCC_RC_FAILURE;
+status = NCSCC_RC_FAILURE;
+goto done;
   }
 
   /* Send the message to other avd for role change rsp as success */
@@ -1291,8 +1309,15 @@ uint32_t amfd_switch_stdby_actv(AVD_CL_CB *cb) {
 }
   }
 
+  status = NCSCC_RC_SUCCESS;
+done:
+  rc = consensus_service.EndActivePromotion();
+  if (rc != SA_AIS_OK) {
+LOG_ER("Unable to remove lock in consensus service");
+  }
+
   TRACE_LEAVE();
-  return NCSCC_RC_SUCCESS;
+  return status;
 }
 
 /\
-- 
2.14.1


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___

[devel] [PATCH 2/5] rded: add split brain prevention support [#64]

2018-01-19 Thread Gary Lee

* consult with consensus service before promoting node to active
* add watch thread and self-fence if it detects active controller
  has been changed (if remote fencing is disabled)
---
 src/rde/Makefile.am   |  3 ++-
 src/rde/rded/osaf-rded.in |  4 
 src/rde/rded/rde_cb.h |  3 ++-
 src/rde/rded/rde_main.cc  | 35 ++-
 src/rde/rded/role.cc  | 47 ++-
 src/rde/rded/role.h   |  2 ++
 6 files changed, 86 insertions(+), 8 deletions(-)

diff --git a/src/rde/Makefile.am b/src/rde/Makefile.am
index c967f9fc4..182f347ab 100644
--- a/src/rde/Makefile.am
+++ b/src/rde/Makefile.am
@@ -58,7 +58,8 @@ bin_osafrded_SOURCES = \
 
 bin_osafrded_LDADD = \
lib/libSaAmf.la \
-   lib/libopensaf_core.la
+   lib/libopensaf_core.la \
+   lib/libosaf_common.la
 
 bin_rdegetrole_CPPFLAGS = \
$(AM_CPPFLAGS)
diff --git a/src/rde/rded/osaf-rded.in b/src/rde/rded/osaf-rded.in
index 1c1786c8d..1697936a7 100644
--- a/src/rde/rded/osaf-rded.in
+++ b/src/rde/rded/osaf-rded.in
@@ -28,6 +28,10 @@ else
. $pkgsysconfdir/rde.conf
 fi 
 
+if [ -f "$pkgsysconfdir/fmd.conf" ]; then
+  . "$pkgsysconfdir/fmd.conf"
+fi
+
 binary=$pkglibdir/$osafprog
 pidfile=$pkgpiddir/$osafprog.pid
 tracefile=$pkglogdir/$osafprog.log
diff --git a/src/rde/rded/rde_cb.h b/src/rde/rded/rde_cb.h
index d2a3d46b2..83f35c691 100644
--- a/src/rde/rded/rde_cb.h
+++ b/src/rde/rded/rde_cb.h
@@ -45,7 +45,8 @@ enum RDE_MSG_TYPE {
   RDE_MSG_PEER_UP = 1,
   RDE_MSG_PEER_DOWN = 2,
   RDE_MSG_PEER_INFO_REQ = 3,
-  RDE_MSG_PEER_INFO_RESP = 4
+  RDE_MSG_PEER_INFO_RESP = 4,
+  RDE_MSG_NEW_ACTIVE_CALLBACK = 5
 };
 
 struct rde_peer_info {
diff --git a/src/rde/rded/rde_main.cc b/src/rde/rded/rde_main.cc
index 0298bf3ff..ba472fc6a 100644
--- a/src/rde/rded/rde_main.cc
+++ b/src/rde/rded/rde_main.cc
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include "osaf/consensus/service.h"
 #include "base/daemon.h"
 #include "base/logtrace.h"
 #include "base/osaf_poll.h"
@@ -37,6 +38,7 @@
 #include 
 #include "rde/rded/rde_cb.h"
 #include "rde/rded/role.h"
+#include "base/conf.h"
 
 #define RDA_MAX_CLIENTS 32
 
@@ -92,10 +94,6 @@ static void handle_mbx_event() {
   TRACE_ENTER();
 
   msg = reinterpret_cast(ncs_ipc_non_blk_recv(_cb->mbx));
-  TRACE("Received %s from node 0x%x with state %s. My state is %s",
-rde_msg_name[msg->type], msg->fr_node_id,
-Role::to_string(msg->info.peer_info.ha_role),
-Role::to_string(role->role()));
 
   switch (msg->type) {
 case RDE_MSG_PEER_INFO_REQ:
@@ -118,6 +116,32 @@ static void handle_mbx_event() {
 case RDE_MSG_PEER_DOWN:
   LOG_NO("Peer down on node 0x%x", msg->fr_node_id);
   break;
+   case RDE_MSG_NEW_ACTIVE_CALLBACK:
+  {
+const std::string my_node = base::Conf::NodeName();
+
+// get current active controller
+Consensus consensus_service;
+std::string active_controller = consensus_service.CurrentActive();
+
+LOG_NO("New active controller notification from consensus service");
+
+if (role->role() == PCS_RDA_ACTIVE) {
+  if (my_node.compare(active_controller) != 0) {
+// we are meant to be active, but consensus service doesn't think 
so
+LOG_ER("Role does not match consensus service. New controller: %s",
+  active_controller.c_str());
+if (consensus_service.IsRemoteFencingEnabled() == false ) {
+  LOG_ER("Probable split brain. Rebooting this node");
+  opensaf_reboot(0, nullptr, "Split-brain detected by consensus 
service");
+}
+  }
+
+  // register for callback
+  consensus_service.MonitorActive(Role::MonitorCallback, rde_cb->mbx);
+}
+  }
+  break;
 default:
   LOG_ER("%s: discarding unknown message type %u", __FUNCTION__, 
msg->type);
   break;
@@ -205,11 +229,12 @@ int main(int argc, char *argv[]) {
   NCS_SEL_OBJ mbx_sel_obj;
   RDE_RDA_CB *rde_rda_cb = _cb->rde_rda_cb;
   int term_fd;
-
   opensaf_reboot_prepare();
 
   daemonize(argc, argv);
 
+  base::Conf::InitNodeName();
+
   if (initialize_rde() != NCSCC_RC_SUCCESS) goto init_failed;
 
   mbx_sel_obj = ncs_ipc_get_sel_obj(_cb->mbx);
diff --git a/src/rde/rded/role.cc b/src/rde/rded/role.cc
index f7511f0d8..cccbb299f 100644
--- a/src/rde/rded/role.cc
+++ b/src/rde/rded/role.cc
@@ -27,7 +27,9 @@
 #include "base/process.h"
 #include "base/time.h"
 #include "base/ncs_main_papi.h"
+#include "base/ncssysf_def.h"
 #include "rde/rded/rde_cb.h"
+#include "osaf/consensus/service.h"
 
 const char* const Role::role_names_[] = {"Undefined", "ACTIVE","STANDBY",
  "QUIESCED",  "QUIESCING", "Invalid"};
@@ -42,6 +44,23 @@ const char* Role::to_string(PCS_RDA_ROLE role) {
  : role_names_[0];
 }
 
+void Role::MonitorCallback(const std::string& new_value, SYSF_MBX mbx)
+{
+  TRACE_ENTER();
+
+

[devel] [PATCH 0/5] Review Request for Add support for split brain prevention V2 [#64]

2018-01-19 Thread Gary Lee

Summary: Add support for split brain prevention V2 [#64] 
Review request for Ticket(s): 64
Peer Reviewer(s): Anders, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-64
Base revision: e1e0d2c0dc45a5ca7789f19d58dde0a41ed19354
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area   Impact y/n

 Docsy 
 Build systemy 
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesy 
 OpenSAF servicesy 
 Core libraries  n 
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-

Changes from V1:

* fixed most cppcheck/cpplint errors in osaf/consensus
* disable self-fencing if remote-fencing is enabled
* reboot active controller if it loses quorum (write access)
* better error handling

revision 7ab1280243311058a6848c4da2b9738ab73dc861
Author: Gary Lee 
Date:   Fri, 19 Jan 2018 22:29:42 +1100

doc: update README and makefiles [#64]



revision 625928304450399548c353473dea631a44aeecbe
Author: Gary Lee 
Date:   Fri, 19 Jan 2018 22:28:59 +1100

fmd: update consensus service during controller failover [#64]



revision 42539da74893d5ce246242cf8d33c7875ea50fe8
Author: Gary Lee 
Date:   Fri, 19 Jan 2018 22:26:19 +1100

amfd: update consensus service when performing SI swap [#64]

When a node goes down and split-brain prevention is enabled,
check that we still have write access to the consensus service.
If not and fencing is disabled, reboot the node to prevent
split brain.



revision b6fcd4bede291ba5996b838c3fb784842648581e
Author: Gary Lee 
Date:   Fri, 19 Jan 2018 22:23:14 +1100

rded: add split brain prevention support [#64]

* consult with consensus service before promoting node to active
* add watch thread and self-fence if it detects active controller
  has been changed (if remote fencing is disabled)



revision 656b670a91a10e385604c98366239a28cde925f7
Author: Gary Lee 
Date:   Fri, 19 Jan 2018 22:22:53 +1100

osaf: add consensus API [#64]



Added Files:

 src/osaf/consensus/Makefile
 src/osaf/consensus/keyvalue.cc
 src/osaf/consensus/keyvalue.h
 src/osaf/consensus/plugins/etcd.plugin
 src/osaf/consensus/plugins/sample.plugin
 src/osaf/consensus/service.cc
 src/osaf/consensus/service.h


Complete diffstat:
--
 00-README.conf   |  56 +++
 Makefile.am  |   4 +-
 src/amf/amfd/ndproc.cc   |  12 +-
 src/amf/amfd/osaf-amfd.in|   4 +
 src/amf/amfd/role.cc |  35 +++-
 src/fm/Makefile.am   |   1 +
 src/fm/fmd/fm_main.cc|  37 -
 src/fm/fmd/fm_rda.cc |  28 
 src/fm/fmd/fmd.conf  |   8 +
 src/osaf/Makefile.am |   8 +-
 src/osaf/consensus/Makefile  |  18 ++
 src/osaf/consensus/keyvalue.cc   | 174 +++
 src/osaf/consensus/keyvalue.h|  57 +++
 src/osaf/consensus/plugins/etcd.plugin   | 220 
 src/osaf/consensus/plugins/sample.plugin | 163 ++
 src/osaf/consensus/service.cc| 277 +++
 src/osaf/consensus/service.h |  75 +
 src/rde/Makefile.am  |   3 +-
 src/rde/rded/osaf-rded.in|   4 +
 src/rde/rded/rde_cb.h|   3 +-
 src/rde/rded/rde_main.cc |  35 +++-
 src/rde/rded/role.cc |  47 +-
 src/rde/rded/role.h  |   2 +
 23 files changed, 1248 insertions(+), 23 deletions(-)


Testing Commands:
-
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y  
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___

[devel] [PATCH 1/5] osaf: add consensus API [#64]

2018-01-19 Thread Gary Lee

---
 src/osaf/consensus/Makefile  |  18 ++
 src/osaf/consensus/keyvalue.cc   | 174 +++
 src/osaf/consensus/keyvalue.h|  57 +++
 src/osaf/consensus/plugins/etcd.plugin   | 220 
 src/osaf/consensus/plugins/sample.plugin | 163 ++
 src/osaf/consensus/service.cc| 277 +++
 src/osaf/consensus/service.h |  75 +
 7 files changed, 984 insertions(+)
 create mode 100644 src/osaf/consensus/Makefile
 create mode 100644 src/osaf/consensus/keyvalue.cc
 create mode 100644 src/osaf/consensus/keyvalue.h
 create mode 100644 src/osaf/consensus/plugins/etcd.plugin
 create mode 100644 src/osaf/consensus/plugins/sample.plugin
 create mode 100644 src/osaf/consensus/service.cc
 create mode 100644 src/osaf/consensus/service.h

diff --git a/src/osaf/consensus/Makefile b/src/osaf/consensus/Makefile
new file mode 100644
index 0..a2c8bc9dd
--- /dev/null
+++ b/src/osaf/consensus/Makefile
@@ -0,0 +1,18 @@
+#  -*- OpenSAF  -*-
+#
+# (C) Copyright 2018 The OpenSAF Foundation
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+# or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
+# under the GNU Lesser General Public License Version 2.1, February 1999.
+# The complete license can be accessed from the following location:
+# http://opensource.org/licenses/lgpl-license.php
+# See the Copying file included with the OpenSAF distribution for full
+# licensing terms.
+#
+# Author(s): Ericsson AB
+#
+
+all:
+   $(MAKE) -C ../.. lib/libconsensus.la
diff --git a/src/osaf/consensus/keyvalue.cc b/src/osaf/consensus/keyvalue.cc
new file mode 100644
index 0..860a8657f
--- /dev/null
+++ b/src/osaf/consensus/keyvalue.cc
@@ -0,0 +1,174 @@
+/*  -*- OpenSAF  -*-
+ *
+ * (C) Copyright 2018 The OpenSAF Foundation
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. This file and program are licensed
+ * under the GNU Lesser General Public License Version 2.1, February 1999.
+ * The complete license can be accessed from the following location:
+ * http://opensource.org/licenses/lgpl-license.php
+ * See the Copying file included with the OpenSAF distribution for full
+ * licensing terms.
+ *
+ * Author(s): Ericsson AB
+ *
+ */
+#include "osaf/consensus/keyvalue.h"
+#include "base/logtrace.h"
+#include "base/getenv.h"
+#include "base/conf.h"
+
+int KeyValue::Execute(const std::string& command, std::string& output) {
+  TRACE_ENTER();
+  constexpr size_t buf_size = 128;
+  std::array buffer;
+  FILE* pipe = popen(command.c_str(), "r");
+  if (pipe == nullptr) {
+return 1;
+  }
+  output = "";
+  while (feof(pipe) == 0) {
+if (fgets(buffer.data(), buf_size, pipe) != nullptr) {
+  output += buffer.data();
+}
+  }
+  const int exit_code = pclose(pipe);
+  if (output.empty() == false && isspace(output.back()) != 0) {
+// remove newline at end of output
+output.pop_back();
+  }
+  TRACE("Executed '%s', returning %d", command.c_str(), exit_code);
+  return exit_code;
+}
+
+SaAisErrorT KeyValue::Get(const std::string& key, std::string& value) {
+  TRACE_ENTER();
+
+  const std::string kv_store_cmd = base::GetEnv(
+"FMS_KEYVALUE_STORE_PLUGIN_CMD", "");
+  const std::string command(kv_store_cmd + " get " + key);
+  int rc = KeyValue::Execute(command, value);
+  TRACE("Read '%s'", value.c_str());
+
+  if (rc == 0) {
+return SA_AIS_OK;
+  } else {
+return SA_AIS_ERR_FAILED_OPERATION;
+  }
+}
+
+SaAisErrorT KeyValue::Set(const std::string& key, const std::string& value) {
+  TRACE_ENTER();
+
+  const std::string kv_store_cmd = base::GetEnv(
+"FMS_KEYVALUE_STORE_PLUGIN_CMD", "");
+  const std::string command(kv_store_cmd + " set " + key + " " + value);
+  std::string output;
+  int rc = KeyValue::Execute(command, output);
+
+  if (rc == 0) {
+return SA_AIS_OK;
+  } else {
+return SA_AIS_ERR_FAILED_OPERATION;
+  }
+}
+
+SaAisErrorT KeyValue::Erase(const std::string& key) {
+  TRACE_ENTER();
+
+  const std::string kv_store_cmd = base::GetEnv(
+"FMS_KEYVALUE_STORE_PLUGIN_CMD", "");
+  const std::string command(kv_store_cmd + " erase " + key);
+  std::string output;
+  int rc = KeyValue::Execute(command, output);
+
+  if (rc == 0) {
+return SA_AIS_OK;
+  } else {
+return SA_AIS_ERR_FAILED_OPERATION;
+  }
+}
+
+SaAisErrorT KeyValue::Lock(const std::string& owner,
+ const unsigned int timeout) {
+  TRACE_ENTER();
+
+  const std::string kv_store_cmd = base::GetEnv(
+"FMS_KEYVALUE_STORE_PLUGIN_CMD", "");
+  const std::string command(kv_store_cmd + " lock " + owner + " " +
+std::to_string(timeout));
+  std::string output;
+  int rc =

Re: [devel] [PATCH 0/6] Review Request for dtm: Derive Node ID from IPv4 address [#2758]

2018-01-19 Thread Anders Widell


I will update the README file before pushing.

Regarding CHASSIS_ID and SUBSLOT_ID: the values 2 and 15 are used as 
default values only, in the case when /etc/opensaf/chassis_id and 
/etc/opensaf/subslot_id are missing. If the files are present then the 
values are configured in the files are used.


regards,

Anders Widell


On 01/19/2018 09:57 AM, Ravi Sekhar Reddy Konda wrote:

Hi Anders,

Ack, reviewed & tested (combinations also)
Update the README(if you are pushing #2759 immediately, you can do after #2759)
one minor comment, in generate_nodeid you hardcoded chassis_id & subslot_id
  + CHASSIS_ID=2
+   SUBSLOT_ID=15
  
I understand we are not taking them as config values, but better to define them in #defines


Thanks,
Ravi

-Original Message-
From: Anders Widell [mailto:anders.wid...@ericsson.com]
Sent: Friday, January 12, 2018 5:54 PM
To: Ravi Sekhar Reddy Konda 
Cc: opensaf-devel@lists.sourceforge.net; Anders Widell 

Subject: [PATCH 0/6] Review Request for dtm: Derive Node ID from IPv4 address 
[#2758]

Summary: dtm: Derive Node ID from IPv4 address [#2758] Review request for 
Ticket(s): 2758 Peer Reviewer(s): Ravi Pull request to:
Affected branch(es): develop
Development branch: ticket-2758
Base revision: 34a070372ff7cfe3caae7ec4e11a6681e19cdf31
Personal repository: git://git.code.sf.net/u/anders-w/review


Impacted area   Impact y/n

  Docsn
  Build systemn
  RPM/packaging   n
  Configuration files n
  Startup scripts y
  SAF servicesy
  OpenSAF servicesy
  Core libraries  y
  Samples n
  Tests   y
  Other   n


Comments (indicate scope for each "y" above):
-

revision c9161898793a57ce1db18be5343b9ea2a372271e
Author: Anders Widell 
Date:   Fri, 12 Jan 2018 13:08:48 +0100

nid: Make chassis_id, slot_id and subslot_id in /etc/opensaf optional [#2758]

The files chassis_id, slot_id and subslot_id in /etc/opensaf no longer have to 
be present. When they are missing, OpenSAF will derive the Node ID from the 
TIPC address or the IPv4 address of the node.



revision 2500da3505f42aec23cc51eb0ec58091dadb14ea
Author: Anders Widell 
Date:   Fri, 12 Jan 2018 13:08:48 +0100

msg: Allow any unsigned 32-bit value to be used as Node ID [#2758]



revision bc8ef895e4eb480416ab841d8c4ce1bf152fad82
Author: Anders Widell 
Date:   Fri, 12 Jan 2018 13:08:48 +0100

lck: Allow any unsigned 32-bit value to be used as Node ID [#2758]



revision 629672485cba6356c73ca31d3eea5d475e1b4cf8
Author: Anders Widell 
Date:   Fri, 12 Jan 2018 13:08:48 +0100

clm: Allow any unsigned 32-bit value to be used as Node ID [#2758]

Also fix the CLM API tests so that they longer assume that there is a node with 
node ID 0x2010f in the cluster.



revision c10a688016d950ab16d09e08c63519de1c56e449
Author: Anders Widell 
Date:   Fri, 12 Jan 2018 13:08:48 +0100

ckpt: Allow any unsigned 32-bit value to be used as Node ID [#2758]



revision 8b56c8eec0da20b95c46772b682b23413e666dfd
Author: Anders Widell 
Date:   Fri, 12 Jan 2018 13:08:48 +0100

dtm: Derive Node ID from IPv4 address [#2758]

If the /var/lib/opensaf/node_id file doesn't exist when DTM starts, DTM will 
create the file and use the IPv4 address as node ID. When using IPv6, the file 
must still be configured manually. IPv6 support may be added in a future ticket.



Complete diffstat:
--
  src/ckpt/apitest/test_cpsv.h|  2 +-
  src/ckpt/ckptnd/cpnd_res.c  | 18 ++---
  src/ckpt/common/cpsv_evt.c  |  2 +-
  src/clm/apitest/clmtest.cc  |  6 ++
  src/clm/apitest/tet_saClmClusterNodeGet.cc  | 23 +++---
  src/clm/apitest/tet_saClmClusterNodeGetAsync.cc | 15 ++--
  src/clm/apitest/tet_saClmClusterTrack.cc|  2 +-
  src/clm/apitest/tet_saClmClusterTrackStop.cc|  4 +-
  src/clm/apitest/tet_saClmDispatch.cc|  6 +-
  src/clm/apitest/tet_saClmResponse.cc|  2 +-
  src/dtm/dtmnd/dtm.h |  1 +
  src/dtm/dtmnd/dtm_main.cc   | 98 -
  src/dtm/dtmnd/dtm_node_db.cc|  7 ++
  src/dtm/dtmnd/dtm_read_config.cc|  6 --
  src/lck/apitest/tet_gla.c   |  4 +-
  src/lck/apitest/tet_gld.c   |  4 +-
  src/lck/apitest/tet_glnd.c  |  2 +-
  src/lck/apitest/tet_glsv.h  |  2 +-
  src/msg/apitest/tet_mqa.c   |  4 +-
  src/msg/apitest/tet_mqsv.h  |  2 +-
  src/nid/opensafd.in

Re: [devel] [PATCH 0/1] Review Request for removed extra OpenSafSmfConfig in xml [#2762]

2018-01-19 Thread Syam Prasad Talluri

ACK from my Side.

Thanks,
Syam.
-Original Message-
From: Srinivas [mailto:srinivas.mangip...@oracle.com] 
Sent: Friday, January 19, 2018 2:18 PM
To: lennart.l...@ericsson.com; rafael.odza...@ericsson.com; 
ravisekhar.ko...@oracle.com; syam.tall...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net; Srinivas 

Subject: [PATCH 0/1] Review Request for removed extra OpenSafSmfConfig in xml 
[#2762]

Summary: imm: removed extra OpenSafSmfConfig in xml [#2762] Review request for 
Ticket(s): 2762 Peer Reviewer(s): lennart.l...@ericsson.com , Ravi, Shyam, 
rafael.odza...@ericsson.com Pull request to: *** LIST THE PERSON WITH PUSH 
ACCESS HERE *** Affected branch(es): develop Development branch: ticket-2762 
Base revision: e1e0d2c0dc45a5ca7789f19d58dde0a41ed19354
Personal repository: git://git.code.sf.net/u/sri-mangipudy/review

Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files y
 Startup scripts n
 SAF servicesn
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n

Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 9ed9e7a38e39d41d3698d5de9dbac6f07417693f
Author: Srinivas 
Date:   Fri, 19 Jan 2018 12:51:46 +0530

imm: removed extra OpenSafSmfConfig in xml [#2762]

Complete diffstat:
--
 src/smf/config/smfsv_classes.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Testing Commands:
-
run immxml-configure, extra OpenSafSmfConfig should not be present in the 
resultant xml.

Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***

Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***

Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  n  n
powerpc n  n
powerpc64   n  n

Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]

Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list

Re: [devel] [PATCH 0/5] Review Request for smf: Add capability to redo CCBs that fail [#1398]

2018-01-19 Thread Lennart Lund

Hi

Thanks for doing an early review.

I have started to create a test application for this IMM object handler. It 
will be located in the same directory as the demo applications and being built 
the same way. The test application will make it possible to test this new 
handler independently of SMF.
I have found some problems that I am fixing. The most seriously is that in the 
lower layer API (imm_om_api) handles are given when the objects are created and 
cannot be restored without creating a new object. This means that recovery if 
for example BAD_HANDLE does not work. It is also means that a new 
ModelModification object has to be created for each CCB.

Also, see below for some comments/answers regarding the points below.

Thanks
Lennart

> -Original Message-
> From: Syam Prasad Talluri [mailto:syam.tall...@oracle.com]
> Sent: den 18 januari 2018 09:30
> To: Lennart Lund ; Vijay Roy
> 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: RE: [devel] [PATCH 0/5] Review Request for smf: Add capability to
> redo CCBs that fail [#1398]
> 
> Hi Lennart,
> 
>   I understand you are planning to send some more patches. Here are my
> initial observations
> 
>   1. In om_handle.cc we are reading the env variable
> IMMOMCPP_TRACE_PATHNAME, we need to add about this in the ReadMe
> file or remove this env
[Lennart] Good finding, this has to be removed

>   2. Like immccb operations do we need to implement wrappers for
> saImmOmAccessorGet_2, saImmOmAdminOperationInvoke_2 which are
> used by SMF.
[Lennart] I agree, since we have the same problem with recovery handling etc. 
also here. However even if handlers for these OM APIs should be easier to 
implement and some of the code created for CCB handling is common it is 
projects that probably has to be handled one by one.

>   3. Wrappers are added for IMM OM, do we need to have same kind of
> wrapper for IMM OI implementation as well ?
[Lennart] Yes, that would be really good to do but it will be more complicated 
and a bigger "project" than OM handling.
In the log service there is a handler for the configuration object that is 
based on some ideas that can be used for an OI handler

>   4. In om_handle.h is there any special reason to keep the below functions
>   void SetDummy(int i) { dummy_ = 1; }
>   int GetDummy(void) { return dummy_; }
[Lennart] No, I have removed them

>   5. There are TODO's in some files, I hope you will address them before final
> commit.
[Lennart] Yes. There is also a lot of LLDTEST tagged logs, traces and even code 
that also will be handled

>   6. Looks like Wrappers implemented are generic Hence we can use these in
> other services also and these can be moved to a common place going
> forward.
[Lennart] Yes, it is generic but let's use it with SMF first. If it works well 
it can be moved to a generic place and become library code. This also requires 
that the lower layer API wrappers are made generic library code.
> 
>   We will try to integrate the existing SMF OM Interface with these Wrappers
> .
> 
> 
> Thanks,
> Syam.
> -Original Message-
> From: Lennart Lund [mailto:lennart.l...@ericsson.com]
> Sent: Tuesday, January 16, 2018 8:11 PM
> To: Vijay Roy 
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: [devel] [PATCH 0/5] Review Request for smf: Add capability to redo
> CCBs that fail [#1398]
> 
> Summary: smf: Add capability to redo CCBs that fail [#1398] Review request
> for Ticket(s): 1398 Peer Reviewer(s): *** LIST THE TECH REVIEWER(S) /
> MAINTAINER(S) HERE *** Pull request to: *** LIST THE PERSON WITH PUSH
> ACCESS HERE *** Affected branch(es): develop Development branch: ticket-
> 1398 Base revision: d082d2fb2b26437bbe6860e8efff8479748378c2
> Personal repository: git://git.code.sf.net/u/elunlen/review
> 
> 
> Impacted area   Impact y/n
> 
>  Docsn
>  Build systemn
>  RPM/packaging   n
>  Configuration files n
>  Startup scripts n
>  SAF servicesy
>  OpenSAF servicesn
>  Core libraries  n
>  Samples n
>  Tests   n
>  Other   n
> 
> NOTE: Patch(es) contain lines longer than 80 characers
> 
> Comments (indicate scope for each "y" above):
> -
> Note:
> Logging and tracing is still tagged LLDTEST. This will be removed For testing
> use the demo programs as before
> 
> revision 011fc764956ff8892f577b44a7174384c85cc878
> Author:   Lennart Lund 
> Date: Tue, 16 Jan 2018 15:28:10 +0100
> 
> smf: Add capability to redo CCBs that fail [#1398]
> 
> Object modify is added
> ccbdemo_modify updated with demo code for object modify
> 
> Also some cleanup and bug fixes
> 
> 
> 
> revision 5c2931adb8b8f5d030b869a48e0fcb91ceeda3ac
> Author:   Lennart Lund

Re: [devel] [PATCH 0/6] Review Request for dtm: Derive Node ID from IPv4 address [#2758]

2018-01-19 Thread Ravi Sekhar Reddy Konda

Hi Anders,

Ack, reviewed & tested (combinations also)
Update the README(if you are pushing #2759 immediately, you can do after #2759) 
one minor comment, in generate_nodeid you hardcoded chassis_id & subslot_id
 +  CHASSIS_ID=2
+   SUBSLOT_ID=15
 
I understand we are not taking them as config values, but better to define them 
in #defines

Thanks,
Ravi

-Original Message-
From: Anders Widell [mailto:anders.wid...@ericsson.com] 
Sent: Friday, January 12, 2018 5:54 PM
To: Ravi Sekhar Reddy Konda 
Cc: opensaf-devel@lists.sourceforge.net; Anders Widell 

Subject: [PATCH 0/6] Review Request for dtm: Derive Node ID from IPv4 address 
[#2758]

Summary: dtm: Derive Node ID from IPv4 address [#2758] Review request for 
Ticket(s): 2758 Peer Reviewer(s): Ravi Pull request to: 
Affected branch(es): develop
Development branch: ticket-2758
Base revision: 34a070372ff7cfe3caae7ec4e11a6681e19cdf31
Personal repository: git://git.code.sf.net/u/anders-w/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts y
 SAF servicesy
 OpenSAF servicesy
 Core libraries  y
 Samples n
 Tests   y
 Other   n


Comments (indicate scope for each "y" above):
-

revision c9161898793a57ce1db18be5343b9ea2a372271e
Author: Anders Widell 
Date:   Fri, 12 Jan 2018 13:08:48 +0100

nid: Make chassis_id, slot_id and subslot_id in /etc/opensaf optional [#2758]

The files chassis_id, slot_id and subslot_id in /etc/opensaf no longer have to 
be present. When they are missing, OpenSAF will derive the Node ID from the 
TIPC address or the IPv4 address of the node.



revision 2500da3505f42aec23cc51eb0ec58091dadb14ea
Author: Anders Widell 
Date:   Fri, 12 Jan 2018 13:08:48 +0100

msg: Allow any unsigned 32-bit value to be used as Node ID [#2758]



revision bc8ef895e4eb480416ab841d8c4ce1bf152fad82
Author: Anders Widell 
Date:   Fri, 12 Jan 2018 13:08:48 +0100

lck: Allow any unsigned 32-bit value to be used as Node ID [#2758]



revision 629672485cba6356c73ca31d3eea5d475e1b4cf8
Author: Anders Widell 
Date:   Fri, 12 Jan 2018 13:08:48 +0100

clm: Allow any unsigned 32-bit value to be used as Node ID [#2758]

Also fix the CLM API tests so that they longer assume that there is a node with 
node ID 0x2010f in the cluster.



revision c10a688016d950ab16d09e08c63519de1c56e449
Author: Anders Widell 
Date:   Fri, 12 Jan 2018 13:08:48 +0100

ckpt: Allow any unsigned 32-bit value to be used as Node ID [#2758]



revision 8b56c8eec0da20b95c46772b682b23413e666dfd
Author: Anders Widell 
Date:   Fri, 12 Jan 2018 13:08:48 +0100

dtm: Derive Node ID from IPv4 address [#2758]

If the /var/lib/opensaf/node_id file doesn't exist when DTM starts, DTM will 
create the file and use the IPv4 address as node ID. When using IPv6, the file 
must still be configured manually. IPv6 support may be added in a future ticket.



Complete diffstat:
--
 src/ckpt/apitest/test_cpsv.h|  2 +-
 src/ckpt/ckptnd/cpnd_res.c  | 18 ++---
 src/ckpt/common/cpsv_evt.c  |  2 +-
 src/clm/apitest/clmtest.cc  |  6 ++
 src/clm/apitest/tet_saClmClusterNodeGet.cc  | 23 +++---
 src/clm/apitest/tet_saClmClusterNodeGetAsync.cc | 15 ++--
 src/clm/apitest/tet_saClmClusterTrack.cc|  2 +-
 src/clm/apitest/tet_saClmClusterTrackStop.cc|  4 +-
 src/clm/apitest/tet_saClmDispatch.cc|  6 +-
 src/clm/apitest/tet_saClmResponse.cc|  2 +-
 src/dtm/dtmnd/dtm.h |  1 +
 src/dtm/dtmnd/dtm_main.cc   | 98 -
 src/dtm/dtmnd/dtm_node_db.cc|  7 ++
 src/dtm/dtmnd/dtm_read_config.cc|  6 --
 src/lck/apitest/tet_gla.c   |  4 +-
 src/lck/apitest/tet_gld.c   |  4 +-
 src/lck/apitest/tet_glnd.c  |  2 +-
 src/lck/apitest/tet_glsv.h  |  2 +-
 src/msg/apitest/tet_mqa.c   |  4 +-
 src/msg/apitest/tet_mqsv.h  |  2 +-
 src/nid/opensafd.in | 49 ++---
 21 files changed, 149 insertions(+), 110 deletions(-)


Testing Commands:
-

Start OpenSAF with IPv4 TCP transport, without configuring /etc/opensaf/slot_id 
or /var/lib/opensaf/node_id


Testing, Expected Results:
--

OpenSAF shall use the IPv4 address as Node ID.


Conditions of Submission:
-

Ack from

[devel] [PATCH 0/1] Review Request for removed extra OpenSafSmfConfig in xml [#2762]

2018-01-19 Thread Srinivas

Summary: imm: removed extra OpenSafSmfConfig in xml [#2762]
Review request for Ticket(s): 2762
Peer Reviewer(s): lennart.l...@ericsson.com , Ravi, Shyam, 
rafael.odza...@ericsson.com 
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-2762
Base revision: e1e0d2c0dc45a5ca7789f19d58dde0a41ed19354
Personal repository: git://git.code.sf.net/u/sri-mangipudy/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files y
 Startup scripts n
 SAF servicesn
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   n


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 9ed9e7a38e39d41d3698d5de9dbac6f07417693f
Author: Srinivas 
Date:   Fri, 19 Jan 2018 12:51:46 +0530

imm: removed extra OpenSafSmfConfig in xml [#2762]



Complete diffstat:
--
 src/smf/config/smfsv_classes.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


Testing Commands:
-
run immxml-configure, extra OpenSafSmfConfig should not be present in the 
resultant xml.

Testing, Expected Results:
--
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-
*** HOW MANY DAYS BEFORE PUSHING, CONSENSUS ETC ***


Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  n  n
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Re: [devel] [PATCH 1/1] imm: removed extra OpenSafSmfConfig in xml [#2762]

Re: [devel] [PATCH 0/5] Review Request for Add support for split brain prevention V2 [#64]

Re: [devel] [PATCH 0/5] Review Request for Add support for split brain prevention V2 [#64]

[devel] [PATCH 4/5] fmd: update consensus service during controller failover [#64]

[devel] [PATCH 5/5] doc: update README and makefiles [#64]

[devel] [PATCH 3/5] amfd: update consensus service when performing SI swap [#64]

[devel] [PATCH 2/5] rded: add split brain prevention support [#64]

[devel] [PATCH 0/5] Review Request for Add support for split brain prevention V2 [#64]

[devel] [PATCH 1/5] osaf: add consensus API [#64]

Re: [devel] [PATCH 0/6] Review Request for dtm: Derive Node ID from IPv4 address [#2758]

Re: [devel] [PATCH 0/1] Review Request for removed extra OpenSafSmfConfig in xml [#2762]

Re: [devel] [PATCH 0/5] Review Request for smf: Add capability to redo CCBs that fail [#1398]

Re: [devel] [PATCH 0/6] Review Request for dtm: Derive Node ID from IPv4 address [#2758]

[devel] [PATCH 0/1] Review Request for removed extra OpenSafSmfConfig in xml [#2762]

14 matches

Site Navigation

Mail list logo

Footer information